Python script to handle company abbreviations

A while back I was doing some work to clean up company names. Wikipedia has a useful page listing company abbreviations, but I couldn't find a simple way to use that information in a Python script. So after some downloading and wrangling of data from this Wikipedia page, here is a link to the code I ended up with. I'm posting it here in case it is of use to someone else out there.

Some useful factoids I picked up along the way:

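  1. Take a lot of care with Unicode. Characters that look the same (depending on the font) can be completely different characters. For example: 'KT vs КТ'.lower() == 'kt vs кт'. The first KT is Latin and the second is Cyrillic, and each one lowercases within its own script, so the two never match.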
  2. Some abbreviations can appear at the beginning of a name, not just at the end. For example: ENEL RUSSIA PJSC vs PJSC “AEROFLOT”.
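The look-alike problem in point 1 is easy to miss by eye. Below is a minimal sketch (not the linked code) that makes it visible with the standard `unicodedata` module. The helpers `non_ascii_chars` and `scripts_used` are hypothetical names, and `scripts_used` is only a heuristic: it takes the first word of each letter's Unicode name rather than the official Unicode Script property.

```python
import unicodedata


def non_ascii_chars(text: str) -> list[tuple[str, str, str]]:
    """List (character, code point, Unicode name) for each non-ASCII character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if not ch.isascii()
    ]


def scripts_used(text: str) -> set[str]:
    """Rough script check: the first word of each letter's Unicode name.

    'K' is 'LATIN CAPITAL LETTER K' -> 'LATIN'; U+041A is
    'CYRILLIC CAPITAL LETTER KA' -> 'CYRILLIC'. A heuristic only.
    """
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in text
        if ch.isalpha()
    }


name = "KT vs \u041a\u0422"  # the second "KT" is Cyrillic, identical in many fonts
for ch, code, uname in non_ascii_chars(name):
    print(ch, code, uname)
# К U+041A CYRILLIC CAPITAL LETTER KA
# Т U+0422 CYRILLIC CAPITAL LETTER TE

if len(scripts_used(name)) > 1:
    print("mixed scripts:", sorted(scripts_used(name)))  # mixed scripts: ['CYRILLIC', 'LATIN']
```

Note that Unicode normalisation (for example `unicodedata.normalize("NFKC", ...)`) does not fold Cyrillic К into Latin K, because they are distinct characters rather than compatibility variants, so flagging mixed-script names for manual review is often the safer option.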
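Point 2 means a name cleaner has to look at both ends of the name. Here is a minimal sketch of that idea, assuming a small hypothetical `ABBREVIATIONS` set (the real list would come from the Wikipedia data) and a hypothetical `strip_legal_form` helper. It is not the code linked above, just an illustration of prefix and suffix stripping.

```python
# Hypothetical sample of legal-form abbreviations; in practice this set would
# be built from the Wikipedia data.
ABBREVIATIONS = {"PJSC", "OAO", "LLC", "LTD", "PLC", "GMBH", "INC"}

# Quote characters that often wrap the core name, e.g. PJSC “AEROFLOT”.
QUOTES = "\"'“”«»„"


def _key(token: str) -> str:
    """Normalise a token for lookup: drop dots and commas, then uppercase.

    So 'Ltd.', 'L.T.D.' and 'ltd' all map to 'LTD'.
    """
    return token.replace(".", "").replace(",", "").upper()


def strip_legal_form(name: str) -> tuple[str, list[str]]:
    """Remove known abbreviations from both ends of a company name.

    Returns (cleaned_name, removed_tokens).
    """
    tokens = name.split()
    removed: list[str] = []
    # Leading abbreviations, e.g. 'PJSC “AEROFLOT”'.
    while tokens and _key(tokens[0]) in ABBREVIATIONS:
        removed.append(tokens.pop(0))
    # Trailing abbreviations, e.g. 'ENEL RUSSIA PJSC'.
    while tokens and _key(tokens[-1]) in ABBREVIATIONS:
        removed.append(tokens.pop())
    # Tidy leftover commas, spaces and wrapping quotes.
    cleaned = " ".join(tokens).strip(QUOTES + " ,")
    return cleaned, removed


print(strip_legal_form("ENEL RUSSIA PJSC"))  # ('ENEL RUSSIA', ['PJSC'])
print(strip_legal_form("PJSC “AEROFLOT”"))   # ('AEROFLOT', ['PJSC'])
```

One design caveat: short abbreviations such as AG or SA can also be genuine initials at the start of a name, so a fuller version might keep a separate, stricter list of abbreviations that are allowed to be stripped from the front.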
