[Cuis-dev] Thoughts about symbols

Andres Valloud ten at smallinteger.com
Sun Dec 1 07:22:22 PST 2024


Looks nicer!

On 12/1/24 6:40 AM, Juan Vuletich via Cuis-dev wrote:
> On 12/1/2024 5:54 AM, Andres Valloud via Cuis-dev wrote:
>> So I read the updated comments, and they read in part:
>>
>> ==============================
>> A String is an indexed collection of Characters. In Cuis, Characters 
>> are Unicode Code Points. In an instance of String, all the Characters 
>> must be in the first 256 CodePoints (0 to 255), the Latin-1 set. See 
>> also UnicodeString.
>> ==============================
>>
>> You're saying it yourself: characters are code points, and String code 
>> points must be in Latin-1. Presumably this means the Unicode blocks 
>> "Basic Latin" plus "Latin-1 Supplement" (U+0000 to U+00FF), and *not* 
>> ISO-8859-1 (the "Latin alphabet no. 1" standard, which is where search 
>> engines send you if you look up "Latin-1").
>>
>> I say this because if "Latin-1" is to be interpreted as "ISO-8859-1", 
>> then the comment is not true, because ISO-8859-1 leaves many code 
>> points undefined (0x00 to 0x1F and 0x7F to 0x9F, per the Wikipedia 
>> page). Nothing I can see stops the zero code point from being stored 
>> in an instance of String, for example.
>>
>> I really think looking at String as "sequence of unsigned byte code 
>> points" is much better.  The characters you get out of that should be 
>> Unicode in all cases for the sake of simplicity.
>>
>> Per the relevant Wikipedia pages, I believe that Latin-1 (meaning 
>> ISO-8859-1) and Unicode (meaning Basic Latin plus Latin-1 Supplement) 
>> match wherever Latin-1 is defined. However, I didn't check this 
>> exhaustively.
>>
> 
> Yes. That was not carefully stated. I just pushed some tweaks to the 
> comments.
> 
> Thanks,
> 
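The "sequence of unsigned byte code points" view of String, and the claim that Latin-1 and Unicode agree wherever Latin-1 is defined, can be checked exhaustively in a few lines. This is a minimal sketch in Python rather than Cuis Smalltalk, and the helper names (byte_to_char, latin1_matches_unicode) are hypothetical. One caveat: Python's "latin-1" codec maps all 256 byte values, including the control positions that ISO-8859-1 itself leaves undefined, so the check confirms the byte-to-code-point model rather than the standard's exact graphic repertoire.

```python
# Hypothetical illustration, not Cuis code: treat a String as a sequence of
# unsigned bytes (0-255) and check that each byte, decoded as Latin-1, is
# the Unicode character with the same code point.

def byte_to_char(b: int) -> str:
    """Decode one unsigned byte with Python's 'latin-1' codec."""
    return bytes([b]).decode("latin-1")

def latin1_matches_unicode() -> bool:
    """Exhaustively compare every byte value with its Unicode code point."""
    return all(ord(byte_to_char(b)) == b for b in range(256))

def fits_in_byte_string(ch: str) -> bool:
    """True if the character can be stored in a byte-per-character string."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

if __name__ == "__main__":
    print(latin1_matches_unicode())       # True: all 256 values agree
    print(ord(byte_to_char(0)))           # 0: the zero code point is storable
    print(fits_in_byte_string("\u00e9"))  # True: U+00E9 is in Latin-1 Supplement
    print(fits_in_byte_string("\u0100"))  # False: U+0100 starts Latin Extended-A
```

Under this model, "String" simply means "every code point is below 256", and anything from U+0100 upward (Latin Extended-A included) needs UnicodeString.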
