I recently wrote a piece for my company blog about why Transformers are a better fit for spend classification projects than older ML techniques.
That was a theoretical post that discussed things like sub-word tokenization and self-attention and how these architectural features should be expected to deliver improvements over older ML approaches.
During the Jubilee Weekend, I thought I’d have a go at a simple real-world test to see how much of a difference this all really makes in the spend classification use case. The code is here: https://github.com/alanbuxton/tbfy-cpv-classifier-poc
TL;DR – Bidirectional LSTM is a world away from Support Vector Machines but Transformers have the edge over Bi-LSTM. In particular they are more tolerant of spelling inconsistencies.
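To give a feel for why sub-word tokenization might help with spelling inconsistencies, here is a toy sketch. It is not the code from the repo: the two vocabularies and the example words are made up for illustration, and the splitting routine is a simplified greedy longest-match approach in the style of WordPiece rather than any real library's tokenizer.

```python
# Toy illustration only: the vocabularies and example words are hypothetical.

def word_level_tokens(text, vocab):
    """Whole-word lookup: any word not in the vocabulary becomes [UNK]."""
    return [w if w in vocab else "[UNK]" for w in text.lower().split()]


def subword_tokens(text, vocab):
    """Greedy longest-match-first splitting, in the style of WordPiece."""
    tokens = []
    for word in text.lower().split():
        start, pieces = 0, []
        while start < len(word):
            end, piece = len(word), None
            # Try the longest remaining substring first, then shrink it.
            while end > start:
                candidate = word[start:end]
                if start > 0:
                    candidate = "##" + candidate  # marks a word continuation
                if candidate in vocab:
                    piece = candidate
                    break
                end -= 1
            if piece is None:
                # No known piece fits, so the whole word is unknown.
                pieces = ["[UNK]"]
                break
            pieces.append(piece)
            start = end
        tokens.extend(pieces)
    return tokens


WORD_VOCAB = {"laptop", "computer", "stationery"}
SUBWORD_VOCAB = {"laptop", "computer", "station", "##ery", "##ary", "##s"}

for text in ["stationery", "stationary", "laptops"]:
    print(text, word_level_tokens(text, WORD_VOCAB), subword_tokens(text, SUBWORD_VOCAB))
```

With a whole-word vocabulary, the variant spelling "stationary" is simply unknown. With sub-word pieces, both spellings share the "station" piece, so a model still has something familiar to work with. This is only one plausible contributor to the behaviour described above, not a full explanation.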
This is an update of the code I wrote for this post: https://alanbuxton.wordpress.com/2021/10/25/transformers-vs-spend-classification/ in which I trained the Transformer for 20 epochs. This time it was 15 epochs. FWIW the 20-epoch version was better at handling the ‘mobile office’ example, which suggests that more training would give better results. But for the purposes of the current blog post there wasn’t any need to go further.