Locale-specific numbers

One of our channel members mentioned a failure in parsing “-1” to an int given a Locale. That’s … fascinating, actually, as I (dreamreal) was unaware of any numbering systems under which that would fail. So, as any programmer would, I wanted a program to show me the locales for which “-1” wasn’t the proper representation for “negative one.”
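The channel member's actual code isn't shown here, so the following is only a guess at what such a failure might look like, assuming the parse went through `java.text.NumberFormat`. The class name `ParseNegativeOne` and the helper `tryParse` are made up for illustration, and which locales accept or reject an ASCII “-1” depends on the JDK version and its locale data.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;
import java.util.OptionalLong;

// Hypothetical sketch: try to parse text as an integer under a given locale.
public class ParseNegativeOne {

    // Returns the parsed value, or empty if the locale's integer format
    // rejects the text outright. Note that NumberFormat.parse(String) only
    // needs to consume a leading portion of the input, so "1abc" parses as 1.
    static OptionalLong tryParse(String text, Locale locale) {
        try {
            Number parsed = NumberFormat.getIntegerInstance(locale).parse(text);
            return OptionalLong.of(parsed.longValue());
        } catch (ParseException e) {
            return OptionalLong.empty();
        }
    }

    public static void main(String[] args) {
        // A few sample locales; the results for the non-US ones may differ
        // between JDK releases, so no expected output is listed here.
        Locale[] samples = {
            Locale.US,
            Locale.forLanguageTag("fi"),
            Locale.forLanguageTag("fa")
        };
        for (Locale locale : samples) {
            System.out.println(locale.toLanguageTag() + ": " + tryParse("-1", locale));
        }
    }
}
```

An empty result for a locale would mean its integer format did not recognize the ASCII hyphen-minus followed by an ASCII digit as a number.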

Being a programmer, I … immediately ran to an AI (Claude, specifically) and had it generate a table of the possibilities, along with the associated locales, that did not fit the “normal” representation of “-1” – where, well, my locale (US English) was “normal.”

If you’re not an English speaker, please recognize the humor here: I’m well aware that Urdu readers would think their representation was “normal” and “-1” was not. Or, well, I am aware now and wasn’t before. My use of “normal” is entirely meant to poke fun at my own English-centric expectations, because it didn’t occur to me that negative numbers written with Arabic numerals might not look the same everywhere.

Anyway, this is what came out of it, for my Java 25 installation. The table shows the representation (the best that I could get WordPress to trivially display it, at least), the Unicode code points that make it up, and then a list of the locales that emitted that particular representation. I didn’t bother including the plain “-1” (U+002D, U+0031) because, well, it’s a ginormous list and this table is long enough already.

Representation	Character Analysis	Locales (168 total)
`-߁`	`len:2, chars:U+002D,U+07C1`	nqo (N’Ko), nqo_GN (N’Ko (Guinea)), nqo_GN_#Nkoo (N’Ko (N’Ko, Guinea))
`-१`	`len:2, chars:U+002D,U+0967`	bgc (Haryanvi), bgc_IN (Haryanvi (India)), bgc_IN_#Deva (Haryanvi (Devanagari, India)), bho (Bhojpuri), bho_IN (Bhojpuri (India)), bho_IN_#Deva (Bhojpuri (Devanagari, India)), mr (Marathi), mr_IN (Marathi (India)), mr_IN_#Deva (Marathi (Devanagari, India)), ne (Nepali), ne_IN (Nepali (India)), ne_NP (Nepali (Nepal)), ne_NP_#Deva (Nepali (Devanagari, Nepal)), raj (Rajasthani), raj_IN (Rajasthani (India)), raj_IN_#Deva (Rajasthani (Devanagari, India)), sa (Sanskrit), sa_IN (Sanskrit (India)), sa_IN_#Deva (Sanskrit (Devanagari, India))
`-১`	`len:2, chars:U+002D,U+09E7`	as (Assamese), as_IN (Assamese (India)), as_IN_#Beng (Assamese (Bangla, India)), bn (Bangla), bn_BD (Bangla (Bangladesh)), bn_BD_#Beng (Bangla (Bangla, Bangladesh)), bn_IN (Bangla (India)), mni (Manipuri), mni_IN (Manipuri (India)), mni_IN_#Beng (Manipuri (Bangla, India)), mni__#Beng (Manipuri (Bangla))
`-๑`	`len:2, chars:U+002D,U+0E51`	th_TH_TH_#u-nu-thai (Thai (Thailand, TH, Thai Digits))
`-༡`	`len:2, chars:U+002D,U+0F21`	dz (Dzongkha), dz_BT (Dzongkha (Bhutan)), dz_BT_#Tibt (Dzongkha (Tibetan, Bhutan))
`-၁`	`len:2, chars:U+002D,U+1041`	my (Burmese), my_MM (Burmese (Myanmar (Burma))), my_MM_#Mymr (Burmese (Myanmar, Myanmar (Burma)))
`-᱑`	`len:2, chars:U+002D,U+1C51`	sat (Santali), sat_IN (Santali (India)), sat_IN_#Olck (Santali (Ol Chiki, India)), sat__#Olck (Santali (Ol Chiki))
`-١`	`len:3, chars:U+061C,U+002D,U+0661`	ar_BH (Arabic (Bahrain)), ar_DJ (Arabic (Djibouti)), ar_EG (Arabic (Egypt)), ar_EG_#Arab (Arabic (Arabic, Egypt)), ar_ER (Arabic (Eritrea)), ar_IL (Arabic (Israel)), ar_IQ (Arabic (Iraq)), ar_JO (Arabic (Jordan)), ar_KM (Arabic (Comoros)), ar_KW (Arabic (Kuwait)), ar_LB (Arabic (Lebanon)), ar_MR (Arabic (Mauritania)), ar_OM (Arabic (Oman)), ar_PS (Arabic (Palestinian Territories)), ar_QA (Arabic (Qatar)), ar_SA (Arabic (Saudi Arabia)), ar_SD (Arabic (Sudan)), ar_SO (Arabic (Somalia)), ar_SS (Arabic (South Sudan)), ar_SY (Arabic (Syria)), ar_TD (Arabic (Chad)), ar_YE (Arabic (Yemen)), sd (Sindhi), sd_IN (Sindhi (India)), sd_PK (Sindhi (Pakistan)), sd_PK_#Arab (Sindhi (Arabic, Pakistan)), sd__#Arab (Sindhi (Arabic))
`-1`	`len:3, chars:U+200E,U+002D,U+0031`	ar (Arabic), ar_001 (Arabic (world)), ar_AE (Arabic (United Arab Emirates)), ar_DZ (Arabic (Algeria)), ar_EH (Arabic (Western Sahara)), ar_LY (Arabic (Libya)), ar_MA (Arabic (Morocco)), ar_TN (Arabic (Tunisia)), he (Hebrew), he_IL (Hebrew (Israel)), he_IL_#Hebr (Hebrew (Hebrew, Israel)), ur (Urdu), ur_PK (Urdu (Pakistan)), ur_PK_#Arab (Urdu (Arabic, Pakistan))
`-۱`	`len:4, chars:U+200E,U+002D,U+200E,U+06F1`	ks (Kashmiri), ks_IN (Kashmiri (India)), ks_IN_#Arab (Kashmiri (Arabic, India)), ks__#Arab (Kashmiri (Arabic)), lrc (Northern Luri), lrc_IQ (Northern Luri (Iraq)), lrc_IR (Northern Luri (Iran)), lrc_IR_#Arab (Northern Luri (Arabic, Iran)), mzn (Mazanderani), mzn_IR (Mazanderani (Iran)), mzn_IR_#Arab (Mazanderani (Arabic, Iran)), pa_PK_#Arab (Punjabi (Arabic, Pakistan)), pa__#Arab (Punjabi (Arabic)), ps (Pashto), ps_AF (Pashto (Afghanistan)), ps_AF_#Arab (Pashto (Arabic, Afghanistan)), ps_PK (Pashto (Pakistan)), ur_IN (Urdu (India)), uz_AF_#Arab (Uzbek (Arabic, Afghanistan)), uz__#Arab (Uzbek (Arabic))
`−۱`	`len:3, chars:U+200E,U+2212,U+06F1`	fa (Persian), fa_AF (Persian (Afghanistan)), fa_IR (Persian (Iran)), fa_IR_#Arab (Persian (Arabic, Iran))
`-١`	`len:3, chars:U+200F,U+002D,U+0661`	ckb (Central Kurdish), ckb_IQ (Central Kurdish (Iraq)), ckb_IQ_#Arab (Central Kurdish (Arabic, Iraq)), ckb_IR (Central Kurdish (Iran))
`−1`	`len:2, chars:U+2212,U+0031`	et (Estonian), et_EE (Estonian (Estonia)), et_EE_#Latn (Estonian (Latin, Estonia)), eu (Basque), eu_ES (Basque (Spain)), eu_ES_#Latn (Basque (Latin, Spain)), fi (Finnish), fi_FI (Finnish (Finland)), fi_FI_#Latn (Finnish (Latin, Finland)), fo (Faroese), fo_DK (Faroese (Denmark)), fo_FO (Faroese (Faroe Islands)), fo_FO_#Latn (Faroese (Latin, Faroe Islands)), gsw (Swiss German), gsw_CH (Swiss German (Switzerland)), gsw_CH_#Latn (Swiss German (Latin, Switzerland)), gsw_FR (Swiss German (France)), gsw_LI (Swiss German (Liechtenstein)), hr (Croatian), hr_BA (Croatian (Bosnia & Herzegovina)), hr_HR (Croatian (Croatia)), hr_HR_#Latn (Croatian (Latin, Croatia)), ksh (Colognian), ksh_DE (Colognian (Germany)), ksh_DE_#Latn (Colognian (Latin, Germany)), lt (Lithuanian), lt_LT (Lithuanian (Lithuania)), lt_LT_#Latn (Lithuanian (Latin, Lithuania)), nb (Norwegian Bokmål), nb_NO (Norwegian Bokmål (Norway)), nb_NO_#Latn (Norwegian Bokmål (Latin, Norway)), nb_SJ (Norwegian Bokmål (Svalbard & Jan Mayen)), nn (Norwegian Nynorsk), nn_NO (Norwegian Nynorsk (Norway)), nn_NO_#Latn (Norwegian Nynorsk (Latin, Norway)), no (Norwegian), no_NO (Norwegian (Norway)), no_NO_#Latn (Norwegian (Latin, Norway)), no_NO_NY (Norwegian (Norway, Nynorsk)), rm (Romansh), rm_CH (Romansh (Switzerland)), rm_CH_#Latn (Romansh (Latin, Switzerland)), se (Northern Sami), se_FI (Northern Sami (Finland)), se_NO (Northern Sami (Norway)), se_NO_#Latn (Northern Sami (Latin, Norway)), se_SE (Northern Sami (Sweden)), sl (Slovenian), sl_SI (Slovenian (Slovenia)), sl_SI_#Latn (Slovenian (Latin, Slovenia)), sv (Swedish), sv_AX (Swedish (Åland Islands)), sv_FI (Swedish (Finland)), sv_SE (Swedish (Sweden)), sv_SE_#Latn (Swedish (Latin, Sweden))
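
The program that generated the table isn't reproduced in this post, so here is a minimal sketch of how a table like it might be built. It assumes the representations come from `NumberFormat.getIntegerInstance(locale).format(-1)`; the class name `NegativeOneByLocale` and its helpers are invented for this example, and the rows it prints will vary with the JDK and its locale data.

```java
import java.text.NumberFormat;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical sketch: group the JDK's available locales by how they format -1.
public class NegativeOneByLocale {

    // Render a string as "len:N, chars:U+XXXX,..." so that invisible
    // characters (directional marks, for example) become visible.
    // len counts UTF-16 code units; chars lists Unicode code points.
    static String describe(String s) {
        String codePoints = s.codePoints()
                .mapToObj(cp -> String.format(Locale.ROOT, "U+%04X", cp))
                .collect(Collectors.joining(","));
        return "len:" + s.length() + ", chars:" + codePoints;
    }

    // Map each distinct formatted form of -1 to the locales that produce it.
    static Map<String, List<String>> groupByRepresentation() {
        Map<String, List<String>> groups = new TreeMap<>();
        for (Locale locale : Locale.getAvailableLocales()) {
            String formatted = NumberFormat.getIntegerInstance(locale).format(-1);
            groups.computeIfAbsent(formatted, key -> new ArrayList<>())
                  .add(locale + " (" + locale.getDisplayName(Locale.ENGLISH) + ")");
        }
        // Sort each locale list alphabetically for stable output.
        groups.values().forEach(list -> list.sort(null));
        return groups;
    }

    public static void main(String[] args) {
        groupByRepresentation().forEach((representation, locales) -> {
            // Skip the plain ASCII form, as the table above does.
            if (representation.equals("-1")) {
                return;
            }
            System.out.println(representation + "\t" + describe(representation)
                    + "\t" + String.join(", ", locales));
        });
    }
}
```

Printing the code points next to the raw string is what makes rows such as U+200E,U+002D,U+0031 distinguishable from the plain “-1”, since the two look identical on screen.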
