Software Development

Styling with django-allauth

I recently added django-allauth to Syracuse.

I wanted to implement some simple styling, and it was surprisingly tricky to piece together the relevant information from different Stack Overflow and Medium posts. So here is what I ended up with, in case it’s useful for others.

The app source code including allauth is in the allauth_v01 tag. The relevant commit that included allauth is here.

Briefly, the necessary changes are:

settings.py

  1. Install allauth per the quickstart docs. I didn’t add the SOCIALACCOUNT_PROVIDERS piece as those can be set up via the admin interface as shown in this tutorial.
  2. The DIRS section tells Django where to look locally for templates. If it doesn’t find a template there, it falls back to other magical places, which include the templates that ship with the allauth library.
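For reference, the settings changes might look something like the sketch below. The INSTALLED_APPS entries come from the allauth quickstart; the `templates` path and everything else is an assumption about your project layout, not Syracuse’s actual settings file:

```python
# settings.py -- a minimal sketch, not the project's actual settings.
from pathlib import Path

BASE_DIR = Path(__file__).resolve().parent.parent

INSTALLED_APPS = [
    # ... Django contrib apps ...
    "django.contrib.sites",
    "allauth",
    "allauth.account",
    "allauth.socialaccount",
    # plus any provider apps, e.g. "allauth.socialaccount.providers.google"
]

TEMPLATES = [
    {
        "BACKEND": "django.template.backends.django.DjangoTemplates",
        # Django checks DIRS first; if a template isn't found here it
        # falls back to app template directories, including allauth's.
        "DIRS": [BASE_DIR / "templates"],
        "APP_DIRS": True,
        "OPTIONS": {
            "context_processors": [
                # allauth needs the request in the template context
                "django.template.context_processors.request",
            ],
        },
    },
]
```

Because DIRS is searched first, any template you copy into your local `templates` directory shadows the allauth version of the same file.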

Then it’s just a case of finding which template you need from the allauth library. In my case I just wanted to apply the same basic formatting as the other pages in the app to the allauth pages. So I just had to copy https://github.com/pennersr/django-allauth/blob/main/allauth/templates/allauth/layouts/base.html to an equivalent location within my local templates directory and add one line at the top:

{% include 'layouts/main-styling.html' %}

Simple when you know how, right?

Up until this point the styling file had been living in the relevant app directory in the project, so I moved it to a shared location in templates where it can be accessed by any part of the project. The rest of the updates in the commit are:

  • moving this file and updating the rest of Syracuse to use the new file location
  • implementing a little snippet that shows an appropriate message and links depending on whether the user is logged in
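For the logged-in/logged-out snippet, a minimal sketch of what such a template fragment might look like (the URL names `account_login`, `account_logout` and `account_signup` are the ones allauth registers by default; the surrounding markup is illustrative):

```html
{% if user.is_authenticated %}
  <p>Logged in as {{ user.username }}.
     <a href="{% url 'account_logout' %}">Log out</a></p>
{% else %}
  <p><a href="{% url 'account_login' %}">Log in</a> or
     <a href="{% url 'account_signup' %}">sign up</a></p>
{% endif %}
```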

More software development lessons from my side project

Last month I wrote about migrating the syracuse codebase to Neo4j and changing the hosting from Heroku to Digital Ocean.

Since then I’ve finished adding the remaining types of content that I have to the UI, so you can now see information on corporate finance activities, senior appointment activities and location-related activities (e.g. adding a new site, exiting a territory). This is all part of building up a picture of how an organization evolves over time using information extracted from unstructured data sources.

I want to write about two things that came up while I was doing this, which reminded me why some things are good to do and some aren’t!

CSV, CSV, CSV

The bad news was that adding the location and appointment activities into the UI showed some inconsistencies in how the different types were represented in the data. The good news was that the inconsistencies weren’t too hard to fix. All the data was stored as RDF triples in JSON-LD format, which made it pretty trivial to regenerate. It would have been a lot harder if the data had been stored in a structured database: once you start getting data into a database, even the smallest schema change can get very complicated to handle. So I’m glad I follow the advice of one of the smartest developers I ever worked with: don’t assume you need a database; if a CSV file can handle your requirements then go with that.

In fact, one feature I implemented does use a CSV file as its storage. Easier than using a database for now. My preferred approach for data handling is:

  1. CSV
  2. JSON
  3. Database
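As a flavor of the CSV-first approach, here is a hedged sketch of the kind of helper I mean. The field names and function names are made up for illustration; the point is that `csv.DictWriter`/`csv.DictReader` from the standard library get you append-and-read storage in a few lines, with no schema migrations to worry about:

```python
import csv
from pathlib import Path

# Illustrative columns -- not the app's real data model.
FIELDS = ["name", "activity_type", "date"]

def append_record(path, record):
    """Append one row to the CSV file, writing a header if the file is new."""
    p = Path(path)
    is_new = not p.exists()
    with p.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(record)

def load_records(path):
    """Read all rows back as a list of dicts keyed by the header row."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))
```

If the requirements later outgrow flat rows, the same data can be loaded into JSON or a database; the CSV just stops you paying that cost up front.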

Test, Test, Test

Adding these new data types into the graph made the already quite messy code even messier. It was ripe for a refactor. I then had to do a fair amount of work to get things working correctly, which involved a lot of refreshing a web page, tweaking some code, and repeating.

My initial instinct was to think that I didn’t have time to write any tests.

But guess what… it was only when I finally wrote some tests that I got to the bottom of the problems. All it took was two integration tests to fix the issues. You don’t need 100% code coverage to make testing worthwhile: even the process of putting together some test data for these integration tests helped identify where some of the bugs were.
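As a flavor of how little testing it takes to be worthwhile, here is a sketch using Python’s unittest against a stand-in function. The real project’s functions look different; the stand-in just groups activity records by organization, which is the kind of data-shaping logic where assembling test data flushes out bugs:

```python
import unittest

# Hypothetical stand-in for the real graph-building code: collect the
# activity types mentioned for each organization.
def activities_by_org(activities):
    grouped = {}
    for act in activities:
        grouped.setdefault(act["org"], []).append(act["type"])
    return grouped

class ActivityGroupingTests(unittest.TestCase):
    def test_activities_for_same_org_group_together(self):
        acts = [{"org": "Accel", "type": "investment"},
                {"org": "Accel", "type": "appointment"}]
        self.assertEqual(activities_by_org(acts),
                         {"Accel": ["investment", "appointment"]})

    def test_different_orgs_stay_separate(self):
        acts = [{"org": "Accel", "type": "investment"},
                {"org": "Sequoia", "type": "investment"}]
        self.assertEqual(len(activities_by_org(acts)), 2)
```

Running these with `python -m unittest` exercises both checks; two small tests like this are often enough to pin down where a data pipeline is going wrong.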

The app is currently live at https://syracuse.1145.am; I hope you enjoy it. The web app runs on a Digital Ocean droplet and the backend database is on Neo4j’s AuraDB free tier, so there is a fair amount of traffic going back and forth, which means the app isn’t super fast. But hopefully it gives a flavor.


Syracuse update: Django and Neo4j

Time for another update in my side project, following on from https://alanbuxton.wordpress.com/2023/08/05/rip-it-up-and-start-again-without-ripping-it-up-and-starting-again/

Since that post I’ve implemented a new backend with Neo4j and created an app for accessing the data [1]. It’s here: https://github.com/alanbuxton/syracuse-neo. The early commits have decent messages showing my baby steps in getting from one capability to the next [2].

The previous app stored each topic collection separately in a Postgres database. A topic collection is the set of stories taken from one article; there can be several stories within one article. This was OK as a starting point, but the point of this project is to connect the dots between different entities based on NLP-ing articles, so really I needed a graph database to plug things together.

Heroku doesn’t support Neo4j, so I’ve moved the app to a Digital Ocean VM that uses Neo4j’s free-tier AuraDB. It’s hosted at http://syracuse.1145.am and has just enough data in it to fit within the AuraDB limits.

The graph visualization is done with vis.js. This turned out to be pretty straightforward to code: you just need some JavaScript for the nodes and some JavaScript for the edges, and so long as your ids are all unique everything seems to just work.
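As a sketch of what that means on the server side: a Django view can build the nodes and edges as plain dicts and serialize them for vis.js. The function and field names below are illustrative assumptions; the only things vis.js actually expects are the `id`/`label` keys on nodes and the `from`/`to` keys on edges, with every node `id` unique:

```python
import json

def graph_payload(mentions, relationships):
    """Build the nodes/edges JSON that vis.js consumes.

    `mentions` is an iterable of (uri, label) pairs and `relationships`
    an iterable of (from_uri, to_uri, label) triples -- illustrative
    shapes, not the app's real data model.
    """
    nodes = [{"id": uri, "label": label} for uri, label in mentions]
    ids = {n["id"] for n in nodes}
    # vis.js requires unique node ids; duplicates break the network.
    assert len(ids) == len(nodes), "node ids must be unique"
    # Drop edges that point at nodes we aren't rendering.
    edges = [{"from": f, "to": t, "label": lbl}
             for f, t, lbl in relationships
             if f in ids and t in ids]
    return json.dumps({"nodes": nodes, "edges": edges})
```

The resulting JSON can be dropped straight into `new vis.DataSet(...)` calls in the page’s JavaScript.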

Visualizing this sort of data in a graph makes it a lot more immediate than before. I just want to share a few entertaining images to show one feature I worked on.

The underlying data has a node for every time an entity (e.g. an organization) is mentioned. This is intentional, because when processing an article you can’t tell whether the name of a company in one article refers to the same company as a similarly-named company in a different article [3]. So each node in the graph is a mention of an organization, and there is some separate logic to figure out whether two nodes are the same organization or not. For example, if two mentions have a similar name and industry then they are likely the same organization.
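A minimal sketch of that kind of “same as” grouping, assuming (purely for illustration) that an exact match on normalized name and industry is enough to merge two mentions; the real logic is fuzzier than this:

```python
from collections import defaultdict

def merge_same_as(mentions):
    """Group organization mentions into 'uber nodes'.

    Each mention is a dict with (at least) "name" and "industry" keys.
    Mentions sharing a normalized name and industry are treated as the
    same organization -- a crude stand-in for the real matching logic.
    """
    groups = defaultdict(list)
    for m in mentions:
        key = (m["name"].strip().lower(), m["industry"].strip().lower())
        groups[key].append(m)
    # One merged node per group, remembering how many mentions it absorbed.
    return [{"label": members[0]["name"], "mention_count": len(members)}
            for members in groups.values()]
```

Rendering one merged node per group, instead of one node per mention, is what untangles the hairball below.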

This sometimes led to a ball of lines that looks like Pig-Pen’s hair.

On the plus side, this does make for a soothing visual as the graph library tries to move the nodes about into a sensible shape. With a bit of ambient sound this could make a good relaxation video.

But, pretty though this may be, it’s hard to read. So I implemented an ‘uber node’ that is the result of clubbing together all the “same as” nodes. It’s a lot more readable, see below:

Below is an example of the same graph after all the Accel Partners nodes had been combined together.

Next steps:

  1. Implement the other types of topic collections into this graph (e.g. people appointments, opening new locations)
  2. Implement a feature to easily flag any incorrect relationships or entities (which can then feed back into the ML training)

Thanks for reading!

Notes

  1. With thanks to https://github.com/neo4j-examples/paradise-papers-django for the example app, which gave me a starting point for working with graph data in Neo4j. ↩︎
  2. For example, git checkout b2791fdb439c18026585bced51091f6c6dcd4f72 is a good one for complete newbies to see some basic interactions between Django and Neo4j. ↩︎
  3. Also this type of reconciliation is a difficult problem – I have some experience of it from this project: https://theybuyforyou.eu/business-cases/ – so it’s safer to process the articles and then have a separate process for combining the topics together. ↩︎