[Cuis-dev] Thoughts about symbols
Juan Vuletich
juan at cuis.st
Sun Dec 1 06:40:43 PST 2024
On 12/1/2024 5:54 AM, Andres Valloud via Cuis-dev wrote:
> So I read the updated comments, and they read in part:
>
> ==============================
> A String is an indexed collection of Characters. In Cuis, Characters
> are Unicode Code Points. In an instance of String, all the Characters
> must be in the first 255 CodePoints, the Latin-1 set. See also
> UnicodeString.
> ==============================
>
> You're saying it yourself: characters are code points. String code
> points must be in Latin-1. To be sure, this must mean "Basic Latin"
> plus "Latin-1 Supplement" in Unicode (U+0000 to U+00FF), and *not*
> ISO-8859-1 (the "Latin alphabet no. 1" standard, which is where
> search engines send you if you look up "Latin-1").
>
> I say this because if "Latin-1" is to be interpreted as "ISO-8859-1",
> then the comment is not true because ISO-8859-1 has a lot of undefined
> code points (per the Wikipedia page). I do not see anything stopping
> the storage of the zero code point into an instance of String, for
> example.
>
> I really think looking at String as "sequence of unsigned byte code
> points" is much better. The characters you get out of that should be
> Unicode in all cases for the sake of simplicity.
>
> Per the relevant Wikipedia pages, I believe that Latin-1 (meaning
> ISO-8859-1) and Unicode (meaning Basic Latin plus Latin-1 Supplement)
> match wherever Latin-1 is defined. However, I didn't check this
> exhaustively.
>
Yes. That was not carefully stated. I just pushed some tweaks to the
comments.
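
The match between ISO-8859-1 and the first 256 Unicode code points can be checked exhaustively. Below is a minimal sketch in Python, not Cuis code. It assumes Python's built-in "latin-1" codec, which maps all 256 byte values, including the control ranges that the ISO/IEC 8859-1 standard itself leaves undefined. The function names are made up for illustration.

```python
import unicodedata


def latin1_matches_unicode():
    """True if every byte value 0..255, decoded with Python's latin-1
    codec, yields the Unicode code point with the same numeric value."""
    return all(ord(bytes([b]).decode("latin-1")) == b for b in range(256))


def fits_in_byte_string(text):
    """True if every code point in text is below 256, i.e. the text could
    be stored as a sequence of unsigned byte code points."""
    return all(ord(ch) < 256 for ch in text)


def control_code_points():
    """Code points below 256 that Unicode classifies as control characters
    (general category Cc). These are the positions that ISO/IEC 8859-1
    does not assign graphic characters to."""
    return [cp for cp in range(256) if unicodedata.category(chr(cp)) == "Cc"]


if __name__ == "__main__":
    print(latin1_matches_unicode())
    print(fits_in_byte_string("caf\u00e9"), fits_in_byte_string("\u0100"))
    print(len(control_code_points()))
```

Under these assumptions, the zero code point is simply one of the byte values that decodes to itself, which fits the "sequence of unsigned byte code points" reading of String.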
Thanks,
--
Juan Vuletich
cuis.st
github.com/jvuletich
researchgate.net/profile/Juan-Vuletich
independent.academia.edu/JuanVuletich
patents.justia.com/inventor/juan-manuel-vuletich
linkedin.com/in/juan-vuletich-75611b3
twitter.com/JuanVuletich