Canonicalizing locations from user input
I've been working on improving (well, redoing) the LJ directory, which hasn't gotten any love in ages. The new design is wonderful and will be the subject of a future post, but I wanted to share this table first:
| Location as entered | Count | Canonical city |
| --- | --- | --- |
| RU--Москва | 76190 | (Moscow) |
| RU--Moscow | 70213 | (Moscow) |
| RU-- | 70188 | |
| RU--Санкт-Петербург | 14703 | (Saint Petersburg) |
| RU--Saint-Petersburg | 4844 | (Saint Petersburg) |
| RU--Питер | 4614 | (Saint Petersburg) |
| RU--SPb | 4209 | (Saint Petersburg) |
| RU--москва | 3743 | (Moscow) |
| RU--Новосибирск | 2887 | |
| RU--Екатеринбург | 2429 | |
| RU--Novosibirsk | 2345 | |
| RU--Moskow | 2232 | (Moscow) |
| RU--СПб | 2170 | (Saint Petersburg) |
| RU--Msk | 2012 | (Moscow) |
| RU--St.Petersburg | 1866 | (Saint Petersburg) |
| RU--St. Petersburg | 1533 | (Saint Petersburg) |
| RU--Нижний Новгород | 1503 | |
| RU--Samara | 1497 | |
| RU--Самара | 1349 | |
| RU--Ростов-на-Дону | 1214 | |
| RU--Челябинск | 1201 | |
| RU--Казань | 1150 | |
| RU--Уфа | 1057 | |
| RU-Moscow-Moscow | 1055 | (Moscow) |
| RU--Иркутск | 1036 | |
| RU-Москва-Москва | 1033 | (Moscow) |
| RU--Воронеж | 1028 | |
| RU--Калининград | 999 | |
| RU--Kazan | 965 | |
| RU--Ufa | 956 | |
| RU--Петербург | 954 | (Saint Petersburg) |
| RU--Красноярск | 950 | |
| RU--Vladivostok | 936 | |
| RU--Краснодар | 935 | |
| RU--Kaliningrad | 932 | |
| RU--Владивосток | 923 | |
| RU--Пермь | 913 | |
| RU--Ekaterinburg | 898 | |
| RU--Perm | 866 | |
| RU--Omsk | 820 | |
The data for Australia is also bad, but nowhere near this bad.
Clearly some canonicalization is in order! (Don't worry, I'll never change what appears on profile pages, just how searches are grouped.)
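
To make the idea concrete, here is a minimal sketch of what grouping by a canonical city could look like. It is not the directory's actual code: the `ALIASES` table, `canonical_city`, and `group_counts` are hypothetical names made up for illustration, the aliases are just the annotated rows from the table above, and it assumes every location string has the shape `COUNTRY-STATE-CITY` with no hyphen inside the state part.

```python
from collections import Counter

# Hypothetical alias table (case-folded variant -> canonical city),
# filled in from the annotated rows in the table above.
ALIASES = {
    "москва": "Moscow",
    "moscow": "Moscow",
    "moskow": "Moscow",
    "msk": "Moscow",
    "санкт-петербург": "Saint Petersburg",
    "saint-petersburg": "Saint Petersburg",
    "петербург": "Saint Petersburg",
    "питер": "Saint Petersburg",
    "spb": "Saint Petersburg",
    "спб": "Saint Petersburg",
    "st.petersburg": "Saint Petersburg",
    "st. petersburg": "Saint Petersburg",
}


def canonical_city(location: str) -> str:
    """Return the grouping key for a 'COUNTRY-STATE-CITY' string.

    Assumption: the state part never contains a hyphen, so splitting on
    the first two hyphens leaves the whole city (hyphens included) intact.
    """
    _country, _state, city = location.split("-", 2)
    city = city.strip()
    # casefold() handles both Latin and Cyrillic case differences.
    return ALIASES.get(city.casefold(), city)


def group_counts(rows):
    """Sum counts per canonical city; the stored strings are left untouched."""
    totals = Counter()
    for location, count in rows:
        totals[canonical_city(location)] += count
    return totals


rows = [
    ("RU--Москва", 76190),
    ("RU--Moscow", 70213),
    ("RU--москва", 3743),
    ("RU-Moscow-Moscow", 1055),
    ("RU--СПб", 2170),
    ("RU--Ростов-на-Дону", 1214),
]
for city, total in group_counts(rows).most_common():
    print(city, total)
```

Only the search grouping key changes here; whatever the user typed stays as-is for display. Cities without an entry in the alias table (Новосибирск vs. Novosibirsk, say) would still be counted separately until someone adds them.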