More software development lessons from my side project

Last month I wrote about migrating the syracuse codebase to Neo4j and changing the hosting from Heroku to Digital Ocean.

Since then I’ve finished adding the remaining types of content that I have to the UI, so you can now see information on corporate finance activities, senior appointment activities and location-related activities (e.g. adding a new site, exiting a territory). This is all part of building up a picture of how an organization evolves over time using information extracted from unstructured data sources.

I want to write about two things that came up while I was doing this, which reminded me of why some things are good to do and some aren’t!

CSV, CSV, CSV

The bad news was that adding the location and appointment activities into the UI showed that there were some inconsistencies in how the different types were represented in the data. The good news was that the inconsistencies weren’t too hard to fix. All the data was stored as RDF triples in JSON-LD format, which made it pretty trivial to regenerate. It would have been a lot harder if the data had been stored in a structured database. Once you start getting data into a database, even the smallest schema change can get very complicated to handle. So I’m glad I followed the advice of one of the smartest developers I ever worked with: don’t assume you need a database. If a CSV file can handle your requirements, then go with that.

In fact, one feature I implemented does use a CSV file as its storage. It’s easier than using a database for now. My preferred approach for data handling, in order of preference, is:

  1. CSV
  2. JSON
  3. Database
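
To give a flavor of what "CSV as storage" can look like, here is a minimal Python sketch. It is not the code from the app: the column names and the helper functions `append_row` and `load_rows` are made up for illustration, and it assumes a single process writing to the file.

```python
import csv
from pathlib import Path

# Hypothetical columns, purely for illustration.
FIELDNAMES = ["organization", "activity_type", "date"]


def append_row(path, row):
    """Append one record, writing the header first if the file is new or empty."""
    path = Path(path)
    needs_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        if needs_header:
            writer.writeheader()
        writer.writerow(row)


def load_rows(path):
    """Return every record as a dict; an absent file just means no records yet."""
    path = Path(path)
    if not path.exists():
        return []
    with path.open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

Every value comes back as a string, so any type conversion is up to the caller. That limitation, plus the lack of concurrent writes, is usually the signal that it is time to move down the list.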

Test, Test, Test

Adding these new data types into the graph made the already quite messy code even messier. It was ripe for a refactor. I then had to do a fair amount of work to get things working right, which involved a lot of refreshing a web page, tweaking some code, and repeating.

My initial instinct was to think that I didn’t have time to write any tests.

But guess what… it was only when I finally wrote some tests that I got to the bottom of the problems and fixed them all. All it took was two integration tests and I quickly fixed the issues. You don’t need 100% code coverage to make testing worthwhile. Even the process of putting together some test data for these integration tests helped to identify where some of the bugs were.
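
As a rough illustration of how little it takes, here is a sketch in Python of a couple of tests driven by a small hand-built data set. Everything in it is hypothetical: `SAMPLE_ACTIVITIES`, the activity type names and the `build_timeline` function are stand-ins, not the real app code, and a real integration test would go through the actual view and database layers instead.

```python
import unittest

# Hypothetical test data: one record per activity type, deliberately out of date order.
SAMPLE_ACTIVITIES = [
    {"organization": "Acme", "type": "location", "date": "2023-03-01"},
    {"organization": "Acme", "type": "corporate_finance", "date": "2023-01-10"},
    {"organization": "Acme", "type": "appointment", "date": "2023-02-15"},
    {"organization": "Other Co", "type": "location", "date": "2023-01-01"},
]


def build_timeline(activities, organization):
    """Stand-in for the code under test: one organization's activities, oldest first."""
    relevant = [a for a in activities if a["organization"] == organization]
    # ISO-format date strings sort correctly as plain strings.
    return sorted(relevant, key=lambda a: a["date"])


class TimelineTests(unittest.TestCase):
    def test_every_activity_type_is_included(self):
        timeline = build_timeline(SAMPLE_ACTIVITIES, "Acme")
        self.assertEqual(
            {a["type"] for a in timeline},
            {"corporate_finance", "appointment", "location"},
        )

    def test_timeline_is_in_date_order(self):
        timeline = build_timeline(SAMPLE_ACTIVITIES, "Acme")
        dates = [a["date"] for a in timeline]
        self.assertEqual(dates, sorted(dates))
```

Saved as a file, this runs with `python -m unittest`. Writing out the sample data by hand is the useful part: it forces you to decide what each activity type should look like, and that is where inconsistencies show up.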

The app is currently live at https://syracuse.1145.am. I hope you enjoy it. The web app is running on a Digital Ocean droplet and the backend database is in Neo4j’s AuraDB free tier. So there is a fair amount of traffic going backwards and forwards, which means the app isn’t super fast. But hopefully it gives a flavor.