[Cuis-dev] Thoughts about symbols
Andres Valloud
ten at smallinteger.com
Sun Dec 1 07:22:22 PST 2024
Looks nicer!
On 12/1/24 6:40 AM, Juan Vuletich via Cuis-dev wrote:
> On 12/1/2024 5:54 AM, Andres Valloud via Cuis-dev wrote:
>> So I read the updated comments, and they read in part:
>>
>> ==============================
>> A String is an indexed collection of Characters. In Cuis, Characters
>> are Unicode Code Points. In an instance of String, all the Characters
>> must be in the first 255 CodePoints, the Latin-1 set. See also
>> UnicodeString.
>> ==============================
>>
>> You're saying it yourself: characters are code points. String code
>> points must be in Latin-1. To be sure, this must mean "Basic Latin"
>> plus "Latin-1 Supplement" in Unicode (U+0000 to U+00FF), and *not*
>> ISO-8859-1 (the "Latin alphabet no. 1" standard, which is where search
>> engines send you if you look up "Latin-1").
>>
>> I say this because if "Latin-1" is to be interpreted as "ISO-8859-1",
>> then the comment is not true because ISO-8859-1 has a lot of undefined
>> code points (per the Wikipedia page). I do not see anything stopping
>> the storage of the zero code point into an instance of String, for
>> example.
>>
>> I really think looking at String as "sequence of unsigned byte code
>> points" is much better. The characters you get out of that should be
>> Unicode in all cases for the sake of simplicity.
>>
>> Per the relevant Wikipedia pages, I believe that Latin-1 (meaning
>> ISO-8859-1) and Unicode (meaning Basic Latin plus Latin-1 Supplement)
>> match wherever Latin-1 is defined. However, I didn't check this
>> exhaustively.
>>
>
> Yes. That was not carefully stated. I just pushed some tweaks to the
> comments.
>
> Thanks,
>
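The quoted claim that ISO-8859-1 leaves many positions undefined can be made concrete with a short Python sketch. Python is used only for illustration; this is not Cuis code. The sketch relies on the fact that the positions ISO/IEC 8859-1 leaves without a graphic character, 0x00 to 0x1F and 0x7F to 0x9F, are exactly the byte values whose Unicode counterparts have general category Cc (control). Code point zero is among them, so a String that can hold it does not fit a reading of the comment as "ISO-8859-1 only".

```python
import unicodedata


def undefined_latin1_positions():
    """Return the byte values 0..255 that ISO/IEC 8859-1 leaves without a
    graphic character; their Unicode counterparts are control characters (Cc)."""
    return [b for b in range(256) if unicodedata.category(chr(b)) == "Cc"]


undefined = undefined_latin1_positions()
print(len(undefined))                         # 65
print(hex(undefined[0]), hex(undefined[-1]))  # 0x0 0x9f
```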
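The quoted suggestion to view String as a "sequence of unsigned byte code points" can be sketched the same way. The class below, ByteCodePointString, is a hypothetical Python illustration and not Cuis's String. Each stored byte is taken directly as a Unicode code point from 0 to 255, so reading an element needs no decoding table. Only code points that fit in one byte are accepted.

```python
class ByteCodePointString:
    """Hypothetical sketch: a string stored as unsigned bytes, where each
    byte *is* a Unicode code point in the range 0..255 (no encoding step)."""

    def __init__(self, text=""):
        self._bytes = bytearray()
        for ch in text:
            self._bytes.append(self._check(ord(ch)))

    @staticmethod
    def _check(code_point):
        # Only the first 256 Unicode code points fit in one unsigned byte.
        if not 0 <= code_point <= 255:
            raise ValueError(f"U+{code_point:04X} does not fit in a byte")
        return code_point

    def __len__(self):
        return len(self._bytes)

    def __getitem__(self, index):
        # The byte value is used directly as the Unicode code point.
        return chr(self._bytes[index])

    def __setitem__(self, index, ch):
        self._bytes[index] = self._check(ord(ch))
```

With this design, ByteCodePointString("café")[3] is "é" (U+00E9, stored as byte 0xE9). Assigning "\x00" to an element succeeds like any other byte value. ByteCodePointString("€") raises ValueError because U+20AC does not fit in a byte.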
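The quoted belief that ISO-8859-1 and Unicode agree wherever Latin-1 is defined, which was not checked exhaustively, can be checked over all 256 byte values in Python. There is one caveat: Python's built-in "latin-1" codec maps every byte value, including the undefined control positions, to the Unicode code point with the same number. The check therefore covers the defined positions and the control positions together. The sketch compares both directions, and an empty result means the two mappings agree everywhere.

```python
def latin1_unicode_mismatches():
    """Return the byte values 0..255 where Python's latin-1 codec and the
    Unicode code point of the same number disagree, in either direction."""
    mismatches = []
    for b in range(256):
        decoded = bytes([b]).decode("latin-1")  # byte -> str
        encoded = chr(b).encode("latin-1")      # str -> byte
        if decoded != chr(b) or encoded != bytes([b]):
            mismatches.append(b)
    return mismatches


print(latin1_unicode_mismatches())  # []
```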