Software Development

Syracuse update: Django and Neo4j

Time for another update on my side project, following on from https://alanbuxton.wordpress.com/2023/08/05/rip-it-up-and-start-again-without-ripping-it-up-and-starting-again/

Since that post I’ve implemented a new backend with Neo4j and created an app for accessing the data [1]. It’s here: https://github.com/alanbuxton/syracuse-neo. The early commits have decent messages showing my baby steps in getting from one capability to the next [2].

The previous app stored each topic collection separately in a Postgres database. A topic collection is the set of stories taken from one article, and there could be several stories within one article. This was ok as a starting point, but the point of this project is to connect the dots between different entities based on NLPing articles, so really I needed a graph database to plug things together.

Heroku doesn’t support Neo4j so I’ve moved the app to a Digital Ocean VM that uses Neo4j’s free tier AuraDB. It’s hosted here http://syracuse.1145.am and has just enough data in it to fit into the AuraDB limits.

The graph visualization is done with vis.js. This turned out to be pretty straightforward to code: you just need some JavaScript for the nodes and some JavaScript for the edges. So long as your ids are all unique, everything seems to just work.
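As a rough illustration of the "nodes plus edges with unique ids" idea, here is a minimal Python sketch of how a Django view might prepare data for vis.js. It is not the code from the syracuse-neo repo: the function name `build_vis_payload`, the input shapes and the sample ids are all assumptions made for this example. The only thing taken from vis.js itself is the shape of the output, where nodes carry `id` and `label` and edges carry `from` and `to`.

```python
import json


def build_vis_payload(mentions, relationships):
    """Build vis.js-style node and edge lists (hypothetical helper).

    mentions: list of dicts with "id" and "name" keys (assumed shape).
    relationships: list of (source_id, target_id, label) tuples (assumed shape).
    """
    nodes = []
    seen_ids = set()
    for mention in mentions:
        node_id = mention["id"]
        # Fail early on duplicate ids rather than letting the graph misbehave.
        if node_id in seen_ids:
            raise ValueError(f"Duplicate node id: {node_id}")
        seen_ids.add(node_id)
        nodes.append({"id": node_id, "label": mention["name"]})

    edges = []
    for index, (source, target, label) in enumerate(relationships):
        # Every edge must point at nodes that actually exist.
        if source not in seen_ids or target not in seen_ids:
            raise ValueError(f"Edge refers to unknown node: {source} -> {target}")
        edges.append(
            {"id": f"edge-{index}", "from": source, "to": target, "label": label}
        )

    return {"nodes": nodes, "edges": edges}


if __name__ == "__main__":
    payload = build_vis_payload(
        [
            {"id": "org-1", "name": "Accel Partners"},
            {"id": "org-2", "name": "Example Co"},  # made-up organization
        ],
        [("org-1", "org-2", "investor")],  # made-up relationship label
    )
    # The JSON can be embedded in a template and handed to vis.js.
    print(json.dumps(payload, indent=2))
```

On the browser side the two lists would then be passed to vis.js as its nodes and edges data.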

Visualizing this sort of data in a graph makes it a lot more immediate than before. I just want to share a few entertaining images to show one feature I worked on.

The underlying data has a node for every time an entity (e.g. organization) is mentioned. This is intentional because when processing an article you can’t tell whether the name of a company in one article is the same company as a similarly-named company in a different article [3]. So each node in the graph is a mention of an organization and then there is some separate logic to figure out whether two nodes are the same organization or not. For example, if it’s a similar name and industry then it’s likely that the two are the same organization.
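To make the "similar name and industry" idea concrete, here is a small sketch in Python. It is an assumption-heavy illustration, not the matching logic the project actually uses: the function names, the dict shape, the 0.85 threshold and the choice of `difflib.SequenceMatcher` for name similarity are all mine.

```python
from difflib import SequenceMatcher


def normalise(name):
    """Lower-case and collapse whitespace so trivial differences don't matter."""
    return " ".join(name.lower().split())


def probably_same_org(mention_a, mention_b, name_threshold=0.85):
    """Guess whether two mentions refer to the same organization.

    Each mention is assumed to be a dict with "name" and "industry" keys.
    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    name_score = SequenceMatcher(
        None, normalise(mention_a["name"]), normalise(mention_b["name"])
    ).ratio()
    same_industry = normalise(mention_a["industry"]) == normalise(
        mention_b["industry"]
    )
    # Require both signals: a similar name alone is not enough.
    return name_score >= name_threshold and same_industry


if __name__ == "__main__":
    first = {"name": "Accel Partners", "industry": "Venture Capital"}
    second = {"name": "Accel Partners LLP", "industry": "venture capital"}
    print(probably_same_org(first, second))
```

Keeping this check separate from article processing means the threshold or the similarity measure can change without reprocessing any articles.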

This sometimes led to a ball of lines that looks like Pig-Pen’s hair.

On the plus side, this does make for a soothing visual as the graph library tries to move the nodes about into a sensible shape. With a bit of ambient sounds this could make a good relaxation video.

But, pretty though this may be, it’s hard to read. So I implemented an ‘uber node’ that is the result of clubbing together all the “same as” nodes. The result is a lot more readable.

Below is an example of the same graph after all the Accel Partners nodes had been combined together.
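For anyone curious how clubbing "same as" nodes together might work, one option is to treat the "same as" links as a graph of their own and group the connected nodes. The sketch below uses a small union-find for that. It is my illustration of the general technique, not the implementation in the repo, and the function names and input shapes are assumptions.

```python
from collections import defaultdict


def build_uber_nodes(node_ids, same_as_pairs):
    """Group node ids linked by "same as" pairs (hypothetical helper).

    Returns a dict mapping a representative id to the sorted member ids.
    """
    parent = {node_id: node_id for node_id in node_ids}

    def find(node_id):
        # Walk up to the root, shortening the path as we go.
        while parent[node_id] != node_id:
            parent[node_id] = parent[parent[node_id]]
            node_id = parent[node_id]
        return node_id

    for left, right in same_as_pairs:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            # Keep the smaller id as the root so results are deterministic.
            if root_right < root_left:
                root_left, root_right = root_right, root_left
            parent[root_right] = root_left

    groups = defaultdict(list)
    for node_id in node_ids:
        groups[find(node_id)].append(node_id)
    return {root: sorted(members) for root, members in groups.items()}


def collapse_edges(edges, uber_nodes):
    """Re-point (source, target) edges at uber nodes, dropping duplicates."""
    member_to_uber = {
        member: root for root, members in uber_nodes.items() for member in members
    }
    collapsed = set()
    for source, target in edges:
        new_source, new_target = member_to_uber[source], member_to_uber[target]
        if new_source != new_target:  # skip edges inside one uber node
            collapsed.add((new_source, new_target))
    return sorted(collapsed)


if __name__ == "__main__":
    ids = ["a1", "a2", "a3", "b1"]  # made-up mention ids
    uber = build_uber_nodes(ids, [("a1", "a2"), ("a2", "a3")])
    print(uber)
    print(collapse_edges([("a1", "b1"), ("a2", "b1"), ("a3", "a1")], uber))
```

Three mentions that each had their own edge to another node end up as one uber node with a single edge, which is what tames the ball of lines.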

Next steps:

  1. Implement the other types of topic collections into this graph (e.g. people appointments, opening new locations)
  2. Implement a feature to easily flag any incorrect relationships or entities (which can then feed back into the ML training)

Thanks for reading!

Notes

  1. With thanks to the https://github.com/neo4j-examples/paradise-papers-django for their example app which gave a starting point for working with graph data in Neo4j. ↩︎
  2. For example, git checkout b2791fdb439c18026585bced51091f6c6dcd4f72 is a good one for complete newbies to see some basic interactions between Django and Neo4j. ↩︎
  3. Also this type of reconciliation is a difficult problem – I have some experience of it from this project: https://theybuyforyou.eu/business-cases/ – so it’s safer to process the articles and then have a separate process for combining the topics together. ↩︎
Business

The UK is prioritizing startup investors over startup entrepreneurs

And I think it’s a huge missed opportunity.

The UK Prime Minister, Rishi Sunak, has exciting ambitions for the role of tech in the UK’s future. He went so far as to put a joke in binary on the door of Number 10 Downing Street. It seems the priority is to attract external investment.

I’m a big fan of deep-pocketed investors arriving with bucketloads of cash and generating loads of jobs.

But I would also like to see an environment where startup entrepreneurs and early stage staff go on to build 2nd, 3rd and even more companies. I’d like to see a virtuous circle which encourages home-grown entrepreneurship.

Sadly, the current regime favours investors so much that it discourages serial entrepreneurs from building a vibrant startup ecosystem in the UK.

Here’s a story about how great government support is for UK startup investors:

I was recently involved in a pre-Series A funding round. It was into the 7 figures, so not a tiny round, but equally not a huge institutional round. The investors came from various countries around the world. Investors could choose to receive preference shares or ordinary shares. We don’t need to go into the details of what the differences are between these types of share. It’s enough to understand that preference shares are better for investors than ordinary shares. The clue is in the name.

Sure enough, the non-UK-taxpayers all chose preference shares. No surprises there. The UK taxpayers all chose ordinary shares. Bizarre behaviour. Why would you choose to have a share that would likely make you less money?

The answer is a pretty amazing tax incentive for UK taxpayers: SEIS/EIS or [Seed] Enterprise Investment Scheme. It gives an investor in early stage companies a credit on their income tax bill. It also comes with extra downside protection in case the investment goes under.

The UK taxpayers chose ordinary shares to guarantee that they could benefit from this EIS tax incentive [1]. Let that sink in: the tax regime is so pro-startup-investor that, as an investor, you’re better off limiting the value of your investment so you can maximise the tax benefits. Still, as they say, “don’t hate the player”, so I am absolutely not surprised that people chose what they did.

So much for the incentives available to investors. What about the entrepreneurs and startup employees actually doing the work?

There used to be something called Entrepreneur’s Relief. This was a preferential rate on the capital gains tax that business owners would otherwise have to pay when selling (parts of) their business. It used to have a very generous lifetime limit of £10m. It’s now been rebranded as Business Asset Disposal Relief and the lifetime limit reduced to £1m [2]. Employees typically get EMI share options that share similar capital gains tax benefits to Business Asset Disposal Relief.

“So what”, you might say, “all these tax breaks are for the 1% of the 1%, why should I care?”

For sure, this is a very niche concern, which presumably is why the rules are the way they are. But, if you want to take a long-term view and build up a tech startup ecosystem, then surely you want to encourage more serial startup founders, rather than discouraging them?

To be clear: I’m not advocating for the abolition of anything here, nor for free money for anyone. I’m simply pointing out an imbalance that the UK needs to fix if it wants to encourage serial entrepreneurs. As things stand, UK taxpayers in 2023 are better off getting a regular income and then using the money they earn to make EIS investments, rather than doing the hard graft of creating a business themselves.

I can understand, if you’re taking a transactional view of individual taxpayer decisions, why you’d want to prioritise investors over entrepreneurs and employees. An investor is making a conscious decision to invest in startup A vs index fund B (or whatever) with every investment. On the other hand, tax planning is something that factors into the decision-making of approximately zero first-time founders and early stage employees. It’s the second-time (and later) founders where this kicks in. All that expertise and energy is actively encouraged to become semi-retired rather than starting new businesses.

This is not an isolated case. There are other examples which show how the UK takes startup entrepreneurs for granted these days:

  1. R&D tax credits are becoming less attractive for SMEs.
  2. The future fund that was supposed to encourage startups but didn’t: Some of the deal terms are awful, the sort of thing a VC’s lawyers would come up with and your lawyers would tell you to never agree to.
  3. Oh, and even rebranding “Entrepreneur’s Relief” (exciting, aspirational) to “Business Asset Disposal Relief” (boring, administrative). What genius came up with that change? Why consciously go to the effort of changing something that sounds cool to something that sounds tedious?

I hope this whole area gets a good going over and somehow we move more towards an environment that makes it easier for startup entrepreneurs to build business after business after business.

Footnotes

  1. There was some debate amongst the lawyers about whether you could structure preference shares in such a way that they would just about qualify for EIS, but none of these investors were prepared to take the risk. ↩︎
  2. As a comparison, £1m is the annual limit for EIS investments (or £2m if in “knowledge-intensive” firms) ↩︎
Anthropology

How long before Google kills British English? (And does it matter?)

I’m (slowly) learning an excellent Chinese Karaoke song. I’ve so far got the chorus and most of the first verse. My process is: I print out the pinyin and the Chinese characters and gradually try to recognise the characters. I was recently looking at 裤子 – here is what Google Translate has to say about these characters:

The translation is “Pants”, or the American English version of the word. If you squint carefully, you can see there is another option for the British English version: “Trousers”, but how many people are going to do that?

It reminds me of a big debate I had with my kids a few years back about what these are called:

They insisted “Ladybug”. Took a while to get them to accept “Ladybird”.

I wonder what Google Translate thinks. Sure enough, if I use Google Translate to translate “Ladybird” to Chinese and then back to English I get “Ladybug”:

Again, with a tangential nod to “Ladybird” but, frankly, the damage has already been done.

Sharing this because it’s an interesting story about changing language: if the tools the people use to translate into English only consider American English then how long before British English disappears? And does it matter?

Let’s try one more for fun: Pavement. Hmmm, what do we have here:

It’s using the American meaning of the word to translate to the “road surface”. Squinters will notice that you can find a “sidewalk”/”pavement” option.

Translate 路面 back into English with Google Translate and you get ‘pavement’ (as in road surface):

“Sidewalk” gets translated to 人行道, which when translated back into English only gives the option for “Sidewalk”, not for “Pavement”:

When a word can have one meaning in British English and one in American English, Google treats the American version as the default [1].

So this is not just about translating into English. It’s also about translating from English.

Seems that from Google’s point of view, American English is the default option, and only if there isn’t an American English meaning of a word will it grudgingly accept that British English exists [2].

I mean …. where will this end? Language and spellings are one thing… but will Google eventually try to convince the world that the US date format should be treated as the standard? Doesn’t bear thinking about 😀

I wonder whose side ChatGPT is on in this subtle manipulation of language….

Ah, phew. All is not lost. At least ChatGPT hasn’t been absorbed into the Google Borg of American English. Though…. as it trains on more and more text that people have written with Google Translate…. how long will ChatGPT be able to hold out before it is assimilated? Is resistance futile?

Footnotes

  1. I am reliably informed by my Chinese friends that 路面 is a reasonable word to use for the whole wider concept of a road, so you could use this word to describe a trip hazard that a person might fall over or a bump that a car might encounter. But evidently Google is leaning towards the sense of ‘where the vehicles go’ ↩︎
  2. I’m also reliably informed by my Chinese friends that state-provided schools in (mainland) China tend to use British English books rather than American English. Oh the irony. ↩︎