Parts that work:
- Java. While I considered Perl for a while, I now know at least enough about regular expressions in Java to be dangerous.
- Breaking the problem down, and solving the easy parts first.
- Premature data modelling. I was focused on targeting a relational data model. While there will ultimately be a relational database, concentrating on what I had specified for that model and generating load files for it was putting the cart before the horse. Derive an object model first, tweak as needed, and then decide how the data should sit relationally.
- Lack of a scorecard. While I had processed the data in order of increasing difficulty, the capability to do so was not designed into the rather ad-hoc software. The program needs a serious redesign to filter and attack the problem in a more systematic way. Being able to measure progress is a side-effect of a good design in that respect.