When working on one small project (but interesting, nevertheless), we needed to solve an interesting request. The customer was a Russian company and there was a requirement for fast database search by city name, so that the search returned the correct results when entering the name in both cyrillic or latin alphabets. E.g. search for ‘Mосква’ must return the same results as search for ‘Moskva’. In other words, we have to search in both, the cyrillic names and the names [transliterated] (https://en.wikipedia.org/wiki/Transliteration) from cyrillic.
The easiest solution would be to add a database column with transliterated latin names of cities, and when querying, search in both columns - cyrillic and latin. A more optimal solution would be to query only one column - the transliterated latin name. The client’s input would be first programmatically transliterated into latin and then the DB would be queried. But how to do the transliteration? PostgreSQL does not provide native transliteration, but nothing prevents us from writing our own transliteration function. The [rules of transliteration] (https://en.wikipedia.org/wiki/Scientific_transliteration_of_Cyrillic) from cyrillic to latin are quite simple, it is basically just string substitution:
CREATE OR REPLACE FUNCTION cyrillic_transliterate(p_string text)
RETURNS character varying AS $BODY$
SELECT replace( replace( replace( replace(
replace( replace( replace( replace( translate(lower($1),
'абвгдеёзийклмнопрстуфхцэы', 'abvgdeezijklmnoprstufхcey'), 'ж', 'zh'),
'ч', 'ch'), 'ш', 'sh'), 'щ', 'shh'), 'ъ', ''), 'ю', 'yu'), 'я', 'ya'), 'ь', '');
$BODY$
LANGUAGE SQL IMMUTABLE COST 100;
ALTER FUNCTION cyrillic_transliterate(p_string text) OWNER TO blog;
The usage of the function is simple. In our case (since it is a static database codelist), we have created the following materialized view:
CREATE MATERIALIZED VIEW MV_CITYREGION AS
SELECT
city.id AS cityId,
city.name AS cityName,
city.population AS population,
region.id AS regionId,
region.name AS regionName,
lower(cyrillic_transliterate(city.name)) AS cityTransliterated,
city.name || ' (' || region.name || ')' AS cityRegion
FROM city
INNER JOIN region ON city.regionid = region.id
ORDER BY city.name, region.name, city.id;
Now we can query the database:
select
cr.cityName,
cr.cityTransliterated,
cr.cityregion
from MV_CITYREGION cr
where
cityTransliterated = 'moskva'
order by
cr.population desc,
cr.cityname asc;
cityname | citytransliterated | cityregion
----------+--------------------+----------------------------
Москва | moskva | Москва (Москва)
Москва | moskva | Москва (Кировская область)
Москва | moskva | Москва (Псковская область)
Москва | moskva | Москва (Тверская область)
(4 rows)
Another option would be, for example, to create [expression index] (https://www.postgresql.org/docs/current/static/indexes-expressional.html) using our cyrillic transliteration function.