[Cuis-dev] isSeparator
Juan Vuletich
juan at cuis.st
Mon May 13 07:58:43 PDT 2024
Hi Ezequiel,
On top of that,
https://www.unicode.org/Public/5.1.0/ucd/UCD.html#General_Category_Values defines
3 general categories for "separators" (Zs, Zl and Zp), but code point 00A0
"NO-BREAK SPACE" is marked as "Separator, Space" (Zs), just like 0020, although
it does not 'separate' words in the common sense...
It would be great to have a better classification according to
UnicodeData.txt (see Character class >> #initialize). I think a set of
new testing methods to classify characters as "whitespace", "separator",
"non drawable", etc., would be a good addition, if you are in the mood.
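As a quick cross-check outside the image, here is a minimal Python sketch of what such testing methods could look like. It relies only on the standard unicodedata module (which reflects UnicodeData.txt for whatever Unicode version the interpreter ships). The function names are hypothetical, and treating the <noBreak> decomposition tag as "does not separate words" is my assumption, not something the standard states in those words.

```python
import unicodedata

# The three "separator" general categories: space, line, paragraph.
SEPARATOR_CATEGORIES = {"Zs", "Zl", "Zp"}


def general_category(code_point: int) -> str:
    """Two-letter general category from the Unicode Character Database."""
    return unicodedata.category(chr(code_point))


def is_unicode_separator(code_point: int) -> bool:
    """True for anything the UCD files under Zs, Zl or Zp."""
    return general_category(code_point) in SEPARATOR_CATEGORIES


def is_no_break(code_point: int) -> bool:
    """True when the decomposition mapping carries the <noBreak> tag.

    Assumption: we read this tag as "looks like a space but does not
    separate words", which covers 00A0.
    """
    return unicodedata.decomposition(chr(code_point)).startswith("<noBreak>")


def is_word_separator(code_point: int) -> bool:
    """Hypothetical stricter test: a separator that is not no-break."""
    return is_unicode_separator(code_point) and not is_no_break(code_point)


if __name__ == "__main__":
    # Code points taken from the list quoted further down in this thread,
    # plus the Ogham space mark (0x1680).
    for cp in (32, 9, 10, 13, 12, 160, 8192, 8203, 8239, 8287, 12288, 0x1680):
        print(f"{cp:>6} U+{cp:04X} {general_category(cp)} "
              f"separator={is_unicode_separator(cp)} "
              f"word_separator={is_word_separator(cp)}")
```

Note that tab, LF, FF and CR come out as Cc (control) and 8203 (ZERO WIDTH SPACE) as Cf (format), so a category-only test does not cover everything #isSeparator accepts today; the two notions really are different.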
WRT #isSeparator, I think that, unless we rename it as something like
#isSeparatorInSmalltalkCode, it is best to keep the "is whitespace,
separator, non-zero width" semantics it already has. Adding additional
codepoints that satisfy these criteria is ok though.
Thanks,
On 5/8/2024 12:31 PM, Ezequiel Birman via Cuis-dev wrote:
> And of course I forgot there are a lot more visible separators, like
> the middle dots in ancient Roman texts, Phoenician and Aegean
> scripts... Currently `isSeparator` is being used during parsing, case
> conversions, trimming, etc., sometimes meaning blank, i.e. non-drawable,
> and sometimes meaning any word separator, whether drawable or not.
>
> I'll add an isBlank or isDrawable for my use case, but let me know
> what you think about adding Unicode space-like separators to isSeparator.
>
> --
> Eze
>
> On Wed, 8 May 2024 at 15:37, Ezequiel Birman <ebirman77 at gmail.com> wrote:
>
> Lately I've started tinkering with text morphs and I was wondering
> about UnicodeCodePoint >> #isSeparator. I needed to (in)validate
> non-drawable codepoints, including control sequences, but the
> current implementation doesn't include the codepoints for thin
> space, hair space, em space, etc. Is that on purpose? For what it's
> worth, I gathered all the non-drawable codepoints (maybe some are
> still missing):
>
> ^ `#(32 9 10 13 12 160 8192 8193 8194 8195 8196 8197 8198 8199
> 8200 8201 8202 8203 8239 8287 12288)` statePointsTo: value
>
> Also, I learned that there is one separator that *is* drawable:
> The Ogham space mark. Probably, it should be included too, unless
> I am misunderstanding the semantics of isSeparator.
>
> I should have added comments describing each codepoint; will do ASAP.
>
> --
> Eze
>
--
Juan Vuletich
cuis.st
github.com/jvuletich
researchgate.net/profile/Juan-Vuletich
independent.academia.edu/JuanVuletich
patents.justia.com/inventor/juan-manuel-vuletich
linkedin.com/in/juan-vuletich-75611b3
twitter.com/JuanVuletich