Simplified history of NLP Transformers

(Some notes I made recently, posted here in case they are of interest to others. See the tables below.)

The transformers story was kicked off by the “Attention is all you need” paper, published in mid-2017 (see the “Key papers” section below). This eventually led to real-world deployments such as Google using transformers to improve its search in 2019/2020 and Microsoft using them to simplify writing code in 2021 (see the “Real-world use of Transformers” section below).

For the rest of us, Huggingface has been producing some great code libraries for working with transformers. The library was under heavy development in 2018-2019 and was renamed twice in that period, an indicator of how much in flux this area was at the time, but it’s fair to say that it has stabilised a lot over the past year. See the “Major Huggingface releases” section below.

Another recent data point: Coursera’s Deep Learning Specialisation was based around using Google Brain’s Trax. As of October 2021, Coursera has announced that the transformers part of the course now uses Huggingface (in addition to doing some of the course with Trax).

It feels like transformers have now reached a level of maturity at which it makes sense to embed them into more real-world use cases. We will inevitably have to go through the phases of the Gartner Hype Cycle, where inflated expectations lead to despair, so it’s important not to let expectations get too far ahead of reality. Even with that caveat in mind, now is a great time to be experimenting with Huggingface’s transformers.

Key papers

Jun 2017: “Attention is all you need” published
Oct 2018: “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding” published
Jul 2019: “RoBERTa: A Robustly Optimized BERT Pretraining Approach” published
May 2020: “Language Models are Few-Shot Learners” published, describing GPT-3

Real-world use of Transformers

Nov 2018: Google open-sources the BERT code
Oct 2019: Google starts rolling out BERT in search
May 2020: OpenAI introduces GPT-3
Oct 2020: Google is using BERT on “almost every English-language query”
May 2021: Microsoft introduces GPT-3 into Power Apps

Major Huggingface Releases

Nov 2018: Initial 0.1.2 release of pytorch-pretrained-bert
Jul 2019: v1.0 of their pytorch-transformers library (including a name change from pytorch-pretrained-bert to pytorch-transformers)
Sep 2019: v2.0, this time including a name change from pytorch-transformers to, simply, transformers
Jun 2020: v3.0 of transformers
Nov 2020: v4.0 of transformers
