nls: Auto-generate 11692 compose sequences

Most accented characters, and many others, can be decomposed programmatically, yet the Compose file, which is thousands of lines long, is currently curated entirely by hand. This commit addresses that by generating a large number of Compose sequences with a Python script, generate_compose.py. The comments inside that file describe how it works in more detail.
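To illustrate what "decomposed programmatically" can mean here, the following is a minimal sketch and not the code in generate_compose.py. It uses Python's standard unicodedata module to split a precomposed character into a base letter and a combining mark, then formats a Compose-style line. The names MARK_KEYSYMS, canonical_pair and compose_line, and the choice of which key stands for each mark, are assumptions made for this example only.

```python
import unicodedata

# Hypothetical, deliberately tiny mapping from combining marks to the
# keysym typed for them. The tables used by the real script are not shown
# in this description, so treat these entries as assumptions.
MARK_KEYSYMS = {
    "\u0301": "apostrophe",  # COMBINING ACUTE ACCENT
    "\u0300": "grave",       # COMBINING GRAVE ACCENT
    "\u0308": "quotedbl",    # COMBINING DIAERESIS
}


def canonical_pair(ch):
    """Return (base, mark) if ch canonically decomposes into exactly
    two code points, otherwise None."""
    decomp = unicodedata.decomposition(ch)
    # An empty string means "no decomposition"; a leading "<tag>" marks a
    # compatibility decomposition, which is skipped here.
    if not decomp or decomp.startswith("<"):
        return None
    parts = [chr(int(cp, 16)) for cp in decomp.split()]
    if len(parts) != 2:
        return None
    return parts[0], parts[1]


def compose_line(ch):
    """Format one Compose-style line for ch, or return None if this
    sketch does not know how to type its parts."""
    pair = canonical_pair(ch)
    if pair is None:
        return None
    base, mark = pair
    # Only ASCII letters are handled, because their keysym name is the
    # letter itself.
    if mark not in MARK_KEYSYMS or not (base.isascii() and base.isalpha()):
        return None
    return f'<Multi_key> <{MARK_KEYSYMS[mark]}> <{base}> : "{ch}"'


if __name__ == "__main__":
    for ch in "éèüA":
        print(repr(ch), "->", compose_line(ch))
```

Characters with no canonical two-part decomposition (such as plain "A") or with only a compatibility decomposition return None in this sketch; a real generator would need far larger tables plus the exclusion list mentioned below.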

This PR does not change the meaning of existing sequences; where necessary, generate_compose.py contains a list of exclusions to ensure this. The only exceptions are:

  • The existing sequences <Multi_key> <U223C> <slash>: "≁" and <Multi_key> <approximate> <slash>: "≇" clashed; U223C and approximate are synonyms, but the repository tests did not detect this. The second sequence has been corrected to <Multi_key> <U2245> <slash>: "≇" in order to resolve the conflict.
  • A few sequences that produced non-NFC strings now produce the NFC-normalized form of those strings.
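Both exceptions above are the kind of thing a small check can surface. The sketch below is illustrative only and is not taken from the repository tests or from generate_compose.py. It assumes a hypothetical alias table, KEYSYM_ALIASES, that records approximate as a synonym of U223C (as stated above), and hypothetical helpers find_clashes and non_nfc that operate on (key sequence, result) pairs.

```python
import unicodedata

# Hypothetical alias table: maps a keysym name to the name it is a synonym
# of. Only the pair mentioned in this description is listed.
KEYSYM_ALIASES = {"approximate": "U223C"}


def canonical(seq):
    """Rewrite each keysym in a sequence to its canonical name."""
    return tuple(KEYSYM_ALIASES.get(k, k) for k in seq)


def find_clashes(entries):
    """Return (sequence, first_result, other_result) for every pair of
    entries whose sequences are equal after alias resolution but whose
    results differ."""
    seen = {}
    clashes = []
    for seq, result in entries:
        key = canonical(seq)
        if key in seen and seen[key] != result:
            clashes.append((key, seen[key], result))
        else:
            seen.setdefault(key, result)
    return clashes


def non_nfc(entries):
    """Return the entries whose result string is not already in NFC."""
    return [
        (seq, result)
        for seq, result in entries
        if unicodedata.normalize("NFC", result) != result
    ]


if __name__ == "__main__":
    entries = [
        (("Multi_key", "U223C", "slash"), "\u2241"),        # ≁
        (("Multi_key", "approximate", "slash"), "\u2247"),  # ≇
        (("Multi_key", "U2245", "slash"), "\u2247"),        # corrected form
    ]
    print(find_clashes(entries))
    print(non_nfc([(("Multi_key", "apostrophe", "e"), "e\u0301")]))
```

With the alias resolved, the first two entries map to the same key sequence with different results, which is the clash described in the first bullet; the third entry shows that the corrected sequence no longer collides.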
