Machine Learning

ML Topic Extraction Update

This is an update to https://alanbuxton.wordpress.com/2022/01/19/first-steps-in-natural-language-topic-understanding. It’s scratching an itch I have about using machine learning to pick out useful information from text articles on topics like who is being appointed to a new senior role in a company, or which companies are launching new products in new regions. My first try, along with a review of the various existing approaches out there, is summarised here: https://alanbuxton.wordpress.com/2021/09/21/transformers-for-use-oriented-entity-extraction/.

After this recent nonsense about whether language models are sentient or not, I’ve decided to use language that doesn’t imply any level of consciousness or intelligence. So I’m not going to be using the word “understanding” any more. The algorithm clearly doesn’t understand the text it is being given in the same way that a human understands text.

Since the previous version of the topic extraction system, I have implemented logic that uses constituency parsing and networkx graphs to better model the relationships among the different entities. This went a long way towards improving the quality of the results, but the Appointment topic extraction, for example, still struggles in two particular cases:

  • When lots of people are being appointed to one role (e.g. a lot of people being announced as partners)
  • When one person is taking on a new role that someone else is leaving (e.g. “Jane Smith is taking on the CEO role that Peter Franklin has stepped down from”)
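
To make the second case concrete, here is a minimal sketch of how the relationships in that sentence might be held as a small labelled graph. It is an illustration only, not the code in the app: the edge labels (`taking_on`, `stepped_down_from`) and helper names are invented, and plain dictionaries stand in for a networkx graph so the sketch has no dependencies.

```python
from collections import defaultdict

# Hypothetical edges for the sentence:
# "Jane Smith is taking on the CEO role that Peter Franklin has stepped down from"
# Each edge is (source entity, relation label, target entity).
EDGES = [
    ("Jane Smith", "taking_on", "CEO"),
    ("Peter Franklin", "stepped_down_from", "CEO"),
]


def build_graph(edges):
    """Index edges by target so we can ask who is linked to a role, and how."""
    incoming = defaultdict(list)
    for source, relation, target in edges:
        incoming[target].append((source, relation))
    return incoming


def people_by_relation(graph, role):
    """Group the people attached to a role by the relation linking them to it."""
    grouped = defaultdict(list)
    for person, relation in graph.get(role, []):
        grouped[relation].append(person)
    return dict(grouped)


graph = build_graph(EDGES)
ceo_links = people_by_relation(graph, "CEO")
print(ceo_links)
```

The point of keeping the relation on the edge is that two people can be attached to the same role without being merged into one appointment. The hard part, which this sketch does not attempt, is deriving those edges reliably from the parse.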

At this point the post-processing is pretty complex. Instead of going any further with this approach I’m going back to square one. I once saw a maxim along the lines of “once your rules get complex, it’s best to replace them with machine learning”. This will mean throwing away a lot of code, so emotionally it’s hard to do. And there is an open question about how much more labelled data the algorithm would need to learn these relationships accurately. But it will be fun to find out.

A simplified version of the app, covering Appointments (senior hires and fires) and Locations (setting up a new HQ, launching in a new location) is available on Heroku at https://syracuse-1145.herokuapp.com/. Feedback more than welcome.

Machine Learning, Software Development, Supply Chain Management

Comparison of Transformers vs older ML architectures in Spend Classification

I recently wrote a piece for my company blog about why Transformers are a better machine learning technology to use in your spend classification projects compared to older ML techniques.

That was a theoretical post that discussed things like sub-word tokenization and self-attention and how these architectural features should be expected to deliver improvements over older ML approaches.

During the Jubilee Weekend, I thought I’d have a go at doing some real-world tests. I wanted to do a simple test to see how much of a difference this all really makes in the spend classification use case. The code is here: https://github.com/alanbuxton/tbfy-cpv-classifier-poc

TL;DR – Bidirectional LSTM is a world away from Support Vector Machines but Transformers have the edge over Bi-LSTM. In particular they are more tolerant of spelling inconsistencies.
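
A rough illustration of why sub-word tokenization helps with spelling inconsistencies: the sketch below is a toy greedy longest-match splitter in the style of WordPiece. It is not the tokenizer used in the repo, and the vocabulary is invented for the example; real models learn theirs from data.

```python
def subword_tokenize(word, vocab):
    """Greedy longest-match-first split of one word into sub-word pieces.

    Pieces that continue a word are prefixed with '##', as in WordPiece.
    Returns ['[UNK]'] if some part of the word cannot be matched.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return ["[UNK]"]
        pieces.append(match)
        start = end
    return pieces


# Toy vocabulary, invented for illustration only.
VOCAB = {"station", "##ery", "##r", "##y"}

correct = subword_tokenize("stationery", VOCAB)
misspelt = subword_tokenize("stationry", VOCAB)
print(correct, misspelt)
```

Both spellings share the leading `station` piece, so the model still gets a useful signal from the misspelt version. A whole-word vocabulary would simply treat `stationry` as unknown.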

This is an update of the code I did for this post: https://alanbuxton.wordpress.com/2021/10/25/transformers-vs-spend-classification/ in which I trained the Transformer for 20 epochs. In this case it was 15 epochs. FWIW the 20-epoch version was better at handling the ‘mobile office’ example. This does indicate that better results will be achieved with more training. But for the purposes of the current blog post there wasn’t any need to go further.

Machine Learning, Software Development

Analyzing a WhatsApp group chat with Seaborn, NetworkX and Transformers

We had a company shutdown recently. Simfoni operates ‘Anytime Anywhere’, which means that anyone can work whatever hours they feel are appropriate from wherever they want to. Every quarter we mandate a full company shutdown over a long weekend to make sure that we all take time away from work at the same time and come back to a clear inbox.

For me this meant a bunch of playing with my kids and hanging out in the garden.

But it also meant playing with some fun tech courtesy of a brief challenge I was set: what insights could I generate quickly from a WhatsApp chat list?

I had a go using some of my favourite tools: Seaborn for easy data visualization, Hugging Face Transformers for ML insights and NetworkX for graph analysis.

You can find the repo here: https://github.com/alanbuxton/whatsapp-analysis
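
For a flavour of the first step in this kind of analysis, here is a minimal sketch of turning an exported chat into per-sender counts and "who posts after whom" pairs, which could then be fed to Seaborn or a graph library. It is not taken from the repo. The line format in the regex is an assumption (WhatsApp exports vary by phone and locale), and the sample messages are made up.

```python
import re
from collections import Counter

# Assumed export line format (it varies by phone and locale):
#   "12/05/2022, 21:14 - Alice: Anyone around this weekend?"
LINE_RE = re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}, \d{1,2}:\d{2} - ([^:]+): (.*)$")


def parse_chat(lines):
    """Return (sender, text) tuples; unmatched lines continue the previous message.

    System messages (e.g. someone joining) are not handled specially here.
    """
    messages = []
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            messages.append([match.group(1), match.group(2)])
        elif messages:
            messages[-1][1] += "\n" + line
    return [(sender, text) for sender, text in messages]


# Made-up sample data.
SAMPLE = [
    "12/05/2022, 21:14 - Alice: Anyone around this weekend?",
    "12/05/2022, 21:15 - Bob: Yes!",
    "Saturday works best",
    "12/05/2022, 21:17 - Alice: Great",
]

messages = parse_chat(SAMPLE)
per_sender = Counter(sender for sender, _ in messages)

# Count how often one sender posts directly after a different sender.
senders = [sender for sender, _ in messages]
follows = Counter(
    (prev, cur) for prev, cur in zip(senders, senders[1:]) if prev != cur
)
print(per_sender, follows)
```

The `follows` counts are a crude proxy for replies, but they are enough to build a weighted directed graph of who tends to respond to whom.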

Enterprise Software, Machine Learning, Product Management

First steps in Natural Language Topic Understanding

In https://alanbuxton.wordpress.com/2021/09/21/transformers-for-use-oriented-entity-extraction/ I showed how transformers allowed me to build something more advanced than the generic entity extraction systems that are publicly available out there.

The next step was to see if I could do something useful with this. In past lives customers have told me about the importance of tracking certain signals or events in a company’s lifecycle, e.g. making an acquisition, expanding to a new territory, making a new senior hire etc.

So I gave it a go, initially looking purely at whether I could train an algorithm to pick out key staffing changes. Results below are 20 random topics pulled from my first attempt showing the good, bad and ugly. The numbers are the confidence scores that the algorithm chose for each entity in the topic.

I’ll give myself a B for a decent first prototype.

I do wonder who else out there is working on this sort of thing. From what I can see in the market ML is used to classify articles (e.g. “this article is about a new hire”) but I couldn’t see any commercial offering that goes to the level of “which org hired who into what role”.

If I were to take this further I would be training specialist models on each different type of topic. I wonder if there is something like a T5-style model to rule them all that can handle all this kind of intelligent detailed topic understanding?

Title: OSE Immunotherapeutics Announces the Appointment of Dominique Costantini as Interim CEO Following the Departure of Alexis Peyroles
Url: https://www.businesswire.com/news/home/20220116005013/en/OSE-Immunotherapeutics-Announces-the-Appointment-of-Dominique-Costantini-as-Interim-CEO-Following-the-Departure-of-Alexis-Peyroles
Who | What | Role | Org | Effective When
Alexis Peyroles (0.9846990705) | departure (0.943598628) | Chief Executive Officer (0.9995111823) | OSE Immunotherapeutics SA (0.9983804822) | immediately (0.9876502156)
Dominique Costantini (0.9990960956) | appointed (0.9998416901) | interim Chief Executive Officer (0.9983062148) | OSE Immunotherapeutics SA (0.9983804822) | immediately (0.9876502156)
Alexis Peyroles (0.993326962) | departure (0.9623697996) | Chief Executive Officer (0.9994782805) | OSE Immunotherapeutics SA (0.9968072176) | (none)
Dominique Costantini (0.9989916682) | appointed (0.9993845224) | interim Chief Executive Officer (0.9982660413) | OSE Immunotherapeutics SA (0.9968072176) | (none)
Assessment: Topic is duplicated without the ‘effective immediately’ piece – should only keep the most granular topics
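
One possible way to "only keep the most granular topics" is to drop any topic whose filled-in fields all appear, with the same values, in a topic that has more fields filled in. The sketch below assumes each topic is a dictionary keyed by hypothetical field names (`who`, `what`, `role`, `org`, `when`) and compares the extracted text only, ignoring confidence scores. It is a sketch of the idea, not what the prototype does.

```python
FIELDS = ("who", "what", "role", "org", "when")


def is_subsumed(topic, other):
    """True if `other` carries every value in `topic` plus at least one more."""
    filled = [f for f in FIELDS if topic.get(f)]
    same_values = all(topic[f] == other.get(f) for f in filled)
    other_filled = sum(1 for f in FIELDS if other.get(f))
    return same_values and other_filled > len(filled)


def keep_most_granular(topics):
    """Drop topics that are a strict subset of another topic."""
    return [
        t for t in topics
        if not any(is_subsumed(t, o) for o in topics if o is not t)
    ]


# Text values from the OSE Immunotherapeutics result, scores stripped.
topics = [
    {"who": "Alexis Peyroles", "what": "departure",
     "role": "Chief Executive Officer", "org": "OSE Immunotherapeutics SA",
     "when": "immediately"},
    {"who": "Dominique Costantini", "what": "appointed",
     "role": "interim Chief Executive Officer",
     "org": "OSE Immunotherapeutics SA", "when": "immediately"},
    {"who": "Alexis Peyroles", "what": "departure",
     "role": "Chief Executive Officer", "org": "OSE Immunotherapeutics SA"},
    {"who": "Dominique Costantini", "what": "appointed",
     "role": "interim Chief Executive Officer",
     "org": "OSE Immunotherapeutics SA"},
]

kept = keep_most_granular(topics)
print(len(kept))
```

Note that this only removes strict subsets. Two topics with identical fields would both survive, so exact duplicates would need a separate pass.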

Title: Barclays appoints managing directors for Australia investment banking unit
Url: https://www.reuters.com/markets/funds/barclays-appoints-managing-directors-australia-investment-banking-unit-2022-01-17/
Who | What | Role | Org | Effective When
Duncan Connellan (0.988427639) | appointed (0.9996656179) | managing directors (0.9994463921) | Britain ‘s Barclays Plc (0.9851405621) | (none)
Duncan Beattie (0.9959402084) | appointed (0.9996656179) | managing directors (0.9994463921) | Britain ‘s Barclays Plc (0.9851405621) | (none)
Assessment: Pulled out the two key items but: didn’t do a great job of the Entity (Britain’s Barclays Plc was treated as one entity) and doesn’t understand the pluralised role name. Model was not trained to look for where the role is based, so haven’t identified that these roles are specifically in Australia

Title: Trulioo Appoints Michael Ramsbacker as Chief Product Officer
Url: https://www.prweb.com/releases/trulioo_appoints_michael_ramsbacker_as_chief_product_officer/prweb18439306.htm
Who | What | Role | Org | Effective When
Michael Ramsbacker (0.999671936) | appointment (0.9997799993) | Chief Product Officer (0.9999740124) | Trulioo (0.9999925494) | (none)
Assessment: Got it right

Title: Elastrin Therapeutics Announces Newly Formed Scientific Advisory Board
Url: https://www.businesswire.com/news/home/20220117005220/en/Elastrin-Therapeutics-Announces-Newly-Formed-Scientific-Advisory-Board
Who | What | Role | Org | Effective When
Dr. Pedro M. Quintana Diez (0.9665058851) | (none) | chairman (0.9933767915) | Elastrin Therapeutics Inc. (0.9841426611) | (none)
Dr. Pedro M. Quintana Diez (0.9665058851) | (none) | Scientific Advisory Board (0.9952206612) | Elastrin Therapeutics Inc. (0.9841426611) | (none)
Assessment: Correctly extracts key info that Dr Quintana Diez is chairman of the new Scientific Advisory Board but treats these as two roles rather than as one

Title: Toshiba Appoints Andrew McDaniel to Lead Its European Retail Business
Url: https://www.businesswire.com/news/home/20220117005027/en/Toshiba-Appoints-Andrew-McDaniel-to-Lead-Its-European-Retail-Business
Who | What | Role | Org | Effective When
Andrew McDaniel (0.9996804595) | (none) | senior vice president of Europe (0.9983366132) | Toshiba Global Commerce Solutions (0.9999386668) | January 15 , 2022 (0.9999966621)
Andrew McDaniel (0.9996804595) | (none) | managing director (0.9998098612) | Toshiba Global Commerce Solutions (0.9999386668) | January 15 , 2022 (0.9999966621)
Assessment: Got it right

Title: Cairn Real Estate Holdings Appoints Mark Johnson President of JPAR® – Real Estate
Url: https://www.prweb.com/releases/cairn_real_estate_holdings_appoints_mark_johnson_president_of_jpar_real_estate/prweb18437732.htm
Who | What | Role | Org | Effective When
Mark Johnson (0.9998755455) | appointment (0.955047369) | (none) | JPAR® – Real Estate (0.9999427795) | (none)
Assessment: Correctly pulls out the appointment but doesn’t identify the role

Title: Fiona Macfarlane and Andrea Nicholls appointed to HSBC Bank Canada Board of Directors
Url: https://www.businesswire.com/news/home/20220117005321/en/Fiona-Macfarlane-and-Andrea-Nicholls-appointed-to-HSBC-Bank-Canada-Board-of-Directors
Who | What | Role | Org | Effective When
Fiona Macfarlane (0.9959855676) | appointed (0.9996260405) | non-executive directors (0.9942650795) | HSBC Bank Canada Board of Directors (0.9947710037) | (none)
Andrea Nicholls (0.9999670982) | appointed (0.9996260405) | non-executive directors (0.9942650795) | HSBC Bank Canada Board of Directors (0.9947710037) | (none)
Assessment: Got it right

Title: Digital Mountain Announces Industry Veteran Calvin Weeks Joining Team as Director of Digital Forensics & Cybersecurity
Url: https://www.prweb.com/releases/2022/1/prweb18416336.htm
Who | What | Role | Org | Effective When
Calvin Weeks (0.999994576) | (none) | Director , Digital Forensics & Cybersecurity (0.999989152) | Digital Mountain , Inc. (0.9999924898) | (none)
Assessment: Got the role right but didn’t get the ‘what’

Title: MiniCo Insurance Announces Two Strategic Leadership Promotions
Url: https://www.prweb.com/releases/minico_insurance_announces_two_strategic_leadership_promotions/prweb18437565.htm
Who | What | Role | Org | Effective When
Rick Krouner (0.9899243116) | named (0.9960696697) | President (0.9988073111) | MiniCo Insurance Agency ( MiniCo ) (0.9878121018) | (none)
Jim Henry (0.9995553493) | named (0.9960696697) | Specialty Programs division (0.9527196288) | MiniCo Insurance Agency ( MiniCo ) (0.9878121018) | (none)
Jim Henry (0.9995553493) | named (0.9960696697) | National Programs division (0.9757707119) | MiniCo Insurance Agency ( MiniCo ) (0.9878121018) | (none)
Jim Henry (0.9995553493) | named (0.9960696697) | President (0.9988151789) | MiniCo Insurance Agency ( MiniCo ) (0.9878121018) | (none)
Assessment: Similar to the Elastrin story it pulls out the title and the division but treats them as different roles; also only assigns one of the found roles to Mr Krouner. Also is a bit ‘greedy’ at identifying the Org – the part in parentheses is redundant
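
The ‘greedy’ Org issue is the kind of thing a small post-processing rule could tidy up. As a hedged sketch (the function name is made up, and this is not part of the prototype), a trailing parenthesised alias can be stripped with a regular expression:

```python
import re

# Matches one parenthesised chunk at the very end of the string,
# e.g. " ( MiniCo )" in "MiniCo Insurance Agency ( MiniCo )".
TRAILING_ALIAS_RE = re.compile(r"\s*\([^()]*\)\s*$")


def strip_trailing_alias(org):
    """Remove a trailing '( alias )' from an extracted organisation name."""
    return TRAILING_ALIAS_RE.sub("", org)


cleaned = strip_trailing_alias("MiniCo Insurance Agency ( MiniCo )")
print(cleaned)
```

Only a parenthetical at the end of the name is removed, so a name with brackets in the middle is left alone. Rules like this are exactly the post-processing that tends to pile up, though, so it may be better handled by more training data.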

Title: Stertil-Koni Names Supply Chain Sales Pro Scott Steinhardt as Vice President of Sales
Url: https://www.prweb.com/releases/stertil_koni_names_supply_chain_sales_pro_scott_steinhardt_as_vice_president_of_sales/prweb18430929.htm
Who | What | Role | Org | Effective When
Scott Steinhardt (0.9999918938) | joined (0.9999970198) | Vice President of Sales (0.9999983311) | Stertil-Koni (0.9999969602) | (none)
Assessment: Got it right