[Cuis-dev] Thoughts about symbols

Sun Dec 1 06:40:43 PST 2024

On 12/1/2024 5:54 AM, Andres Valloud via Cuis-dev wrote:
> So I read the updated comments, and they read in part:
>
> ==============================
> A String is an indexed collection of Characters. In Cuis, Characters 
> are Unicode Code Points. In an instance of String, all the Characters 
> must be in the first 255 CodePoints, the Latin-1 set. See also 
> UnicodeString.
> ==============================
>
> You're saying it yourself: characters are code points.  String code 
> points must be in Latin-1.  To be sure, this must mean "Basic Latin" 
> plus "Latin-1 Supplement" in Unicode, and *not* ISO-8859-1 (the "Latin 
> alphabet no. 1" standard, which is where search engines send you if 
> you look up "Latin-1").
>
> I say this because if "Latin-1" is to be interpreted as "ISO-8859-1", 
> then the comment is not true because ISO-8859-1 has a lot of undefined 
> code points (per the Wikipedia page).  I do not see anything stopping 
> the storage of the zero code point into an instance of String, for 
> example.
>
> I really think looking at String as "sequence of unsigned byte code 
> points" is much better.  The characters you get out of that should be 
> Unicode in all cases for the sake of simplicity.
>
> Per the relevant Wikipedia pages, I believe that Latin-1 (meaning 
> ISO-8859-1) and Unicode (meaning Basic Latin plus Latin-1 Supplement) 
> match wherever Latin-1 is defined.  However, I didn't check this 
> exhaustively.
>

Yes. That was not carefully stated. I just pushed some tweaks to the 
comments.
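
The "didn't check this exhaustively" part can be checked mechanically. 
Below is a minimal sketch in Python rather than Cuis. It assumes Python's 
"latin-1" codec is an acceptable stand-in: that codec follows the IANA 
ISO-8859-1 charset, i.e. ISO/IEC 8859-1 plus the C0 and C1 control codes. 
The names DEFINED, mismatches and undefined are only for illustration.

```python
import unicodedata

# Byte values that ISO/IEC 8859-1 itself assigns to graphic characters:
# 0x20-0x7E and 0xA0-0xFF. The remaining values are left undefined by
# that standard.
DEFINED = list(range(0x20, 0x7F)) + list(range(0xA0, 0x100))

# Decode every possible byte as Latin-1 and collect any byte whose
# resulting Unicode code point differs from the byte value.
mismatches = [
    b for b in range(256)
    if ord(bytes([b]).decode("latin-1")) != b
]

# Byte values outside the defined ranges. In Unicode these are all
# control characters (general category "Cc").
undefined = [b for b in range(256) if b not in DEFINED]
all_controls = all(unicodedata.category(chr(b)) == "Cc" for b in undefined)

print("mismatches:", mismatches)
print("defined:", len(DEFINED), "undefined:", len(undefined))
print("undefined are all Unicode controls:", all_controls)
```

If the assumption holds, every byte value decodes to the code point with 
the same number, so a byte-per-code-point String can hold any of code 
points 0 through 255, including the zero code point mentioned above.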

Thanks,

-- 
Juan Vuletich
cuis.st
github.com/jvuletich
researchgate.net/profile/Juan-Vuletich
independent.academia.edu/JuanVuletich
patents.justia.com/inventor/juan-manuel-vuletich
linkedin.com/in/juan-vuletich-75611b3
twitter.com/JuanVuletich