(Some notes I made recently, posted here in case they’re of interest to others – see the tables below.)
The transformers story was kicked off by the “Attention Is All You Need” paper, published in mid-2017 (see the “Key Papers” section below). This eventually led to real-world use cases such as Google using transformers to improve its search in 2019/2020 and Microsoft using transformers to simplify writing code in 2021 (see the “Real-world use of Transformers” section below).
For the rest of us, Huggingface has been producing some great code libraries for working with transformers. The main library was under heavy development in 2018–2019, including being renamed twice – an indicator of how much in flux this area was at the time – but it’s fair to say that things have stabilised a lot over the past year. See the “Major Huggingface Releases” section below.
Another recent data point: Coursera’s Natural Language Processing Specialisation was based around Google Brain’s Trax (https://github.com/google/trax). As of October 2021, Coursera has announced that (in addition to doing some of the course with Trax) the transformers part now uses Huggingface.
Feels like transformers are now at a level of maturity where it makes sense to embed them into more real-world use cases. We will inevitably have to go through the Gartner Hype Cycle phases of inflated expectations followed by disillusionment, so it’s important not to let expectations get too far ahead of reality. But even with that caveat in mind, now is a great time to be doing some experimentation with Huggingface’s transformers.
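To make that concrete, here’s a minimal sketch of the kind of experiment that’s now easy, using the library’s high-level pipeline API in a recent v4.x release. The example sentences are mine, and the default model the pipeline downloads is chosen by the library and may change between releases.

```python
# A minimal sketch, assuming a recent transformers v4.x install
# (pip install transformers) plus a backend such as PyTorch.
from transformers import pipeline

# Build a sentiment-analysis pipeline; with no model specified, the
# library downloads its default English sentiment model on first use.
classifier = pipeline("sentiment-analysis")

# Example sentences, made up for illustration.
results = classifier([
    "Transformer libraries have stabilised a lot over the past year.",
    "Letting expectations run ahead of reality ends in disappointment.",
])

# Each result is a dict like {'label': 'POSITIVE', 'score': 0.99...}.
for result in results:
    print(result)
```

The same pipeline function covers other tasks too (“question-answering”, “summarization”, “fill-mask”, and so on), which is a big part of why it’s such a low-friction way to experiment.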
Key Papers
| Date | Paper | Link |
| --- | --- | --- |
| Jun 2017 | “Attention Is All You Need” | https://arxiv.org/abs/1706.03762 |
| Oct 2018 | “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding” | https://arxiv.org/abs/1810.04805 |
| Jul 2019 | “RoBERTa: A Robustly Optimized BERT Pretraining Approach” | https://arxiv.org/abs/1907.11692 |
| May 2020 | “Language Models are Few-Shot Learners”, introducing GPT-3 | https://arxiv.org/abs/2005.14165 |
Real-world use of Transformers
| Date | Event | Link |
| --- | --- | --- |
| Nov 2018 | Google open-sources BERT code | https://ai.googleblog.com/2018/11/open-sourcing-bert-state-of-art-pre.html |
| Oct 2019 | Google starts rolling out BERT for search | https://searchengineland.com/faq-all-about-the-bert-algorithm-in-google-search-324193 |
| May 2020 | OpenAI introduces GPT-3 | https://en.wikipedia.org/wiki/GPT-3 |
| Oct 2020 | Google says BERT is used on “almost every English-language query” | https://searchengineland.com/google-bert-used-on-almost-every-english-query-342193 |
| May 2021 | Microsoft introduces GPT-3 into Power Apps | https://powerapps.microsoft.com/en-us/blog/introducing-power-apps-ideas-ai-powered-assistance-now-helps-anyone-create-apps-using-natural-language/ |
Major Huggingface Releases
| Date | Release | Link |
| --- | --- | --- |
| Nov 2018 | Initial 0.1.2 release of pytorch-pretrained-bert | https://github.com/huggingface/transformers/releases/tag/v0.1.2 |
| Jul 2019 | v1.0 of pytorch-transformers (renamed from pytorch-pretrained-bert) | https://github.com/huggingface/transformers/releases/tag/v1.0.0 |
| Sep 2019 | v2.0, this time renamed from pytorch-transformers to, simply, transformers | https://github.com/huggingface/transformers/releases/tag/v2.0.0 |
| Jun 2020 | v3.0 of transformers | https://github.com/huggingface/transformers/releases/tag/v3.0.0 |
| Nov 2020 | v4.0 of transformers | https://github.com/huggingface/transformers/releases/tag/v4.0.0 |
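A footnote to the releases above: major versions can include breaking changes, so before following any tutorial it’s worth checking which version you actually have installed. A quick sketch, assuming the library was installed via pip:

```python
# Quick sanity check of the installed library version
# (assumes: pip install transformers).
import transformers

print(transformers.__version__)  # e.g. something in the 4.x series as of late 2021
```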