[Cuis-dev] isSeparator

Juan Vuletich juan at cuis.st
Mon May 13 07:58:43 PDT 2024


Hi Ezequiel,

On top of that,
https://www.unicode.org/Public/5.1.0/ucd/UCD.html#General_Category_Values defines
three general categories for separators (Zs, Zl and Zp), but code point U+00A0
NO-BREAK SPACE is classified as "Separator, Space" (Zs), just like U+0020,
even though it does not 'separate' words in the usual sense; its purpose is
to keep them together on the same line...
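
You can check this by querying the general categories directly. This is a minimal sketch in Python, not Cuis code. It assumes only the standard `unicodedata` module, whose tables follow a more recent Unicode version than the 5.1.0 page above. `describe` and `SEPARATOR_CATEGORIES` are hypothetical names used for illustration.

```python
import unicodedata

# The three "separator" general categories defined in UnicodeData.txt.
SEPARATOR_CATEGORIES = {
    "Zs": "Separator, Space",
    "Zl": "Separator, Line",
    "Zp": "Separator, Paragraph",
}


def describe(cp: int) -> str:
    """Return 'U+XXXX NAME: category (meaning)' for a code point (hypothetical helper)."""
    ch = chr(cp)
    category = unicodedata.category(ch)
    name = unicodedata.name(ch, "<unnamed>")
    meaning = SEPARATOR_CATEGORIES.get(category, "not a separator")
    return f"U+{cp:04X} {name}: {category} ({meaning})"


for cp in (0x0020, 0x00A0, 0x2028, 0x2029):
    print(describe(cp))
# U+0020 SPACE: Zs (Separator, Space)
# U+00A0 NO-BREAK SPACE: Zs (Separator, Space)
# U+2028 LINE SEPARATOR: Zl (Separator, Line)
# U+2029 PARAGRAPH SEPARATOR: Zp (Separator, Paragraph)
```

SPACE and NO-BREAK SPACE both land in Zs. The general category alone therefore cannot tell a word-separating space from a word-joining one.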

It would be great to have a better classification based on
UnicodeData.txt (see Character class >> #initialize). I think a set of
new testing methods to classify characters as "whitespace", "separator",
"non-drawable", etc., would be a good addition, if you are in the mood.

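The sketch below roughly illustrates how such testing methods could be derived from the general category. It is written in Python, not Cuis Smalltalk. The predicate names are hypothetical stand-ins for whatever selectors Cuis might adopt. The definition of "non-drawable" used here is an assumption, not a settled rule: separators plus control and format characters, minus OGHAM SPACE MARK.

```python
import unicodedata

OGHAM_SPACE_MARK = 0x1680  # category Zs, but rendered as a visible stroke


def is_unicode_separator(cp: int) -> bool:
    """Hypothetical separator test: general category Zs, Zl or Zp."""
    return unicodedata.category(chr(cp)) in ("Zs", "Zl", "Zp")


def is_whitespace(cp: int) -> bool:
    """Hypothetical whitespace test: delegates to Python's str.isspace(),
    which accepts category Zs plus characters whose bidi class is WS, B or S."""
    return chr(cp).isspace()


def is_non_drawable(cp: int) -> bool:
    """Hypothetical non-drawable test (assumed definition): separators,
    control (Cc) and format (Cf) characters, except the visible Ogham
    space mark."""
    if cp == OGHAM_SPACE_MARK:
        return False
    return is_unicode_separator(cp) or unicodedata.category(chr(cp)) in ("Cc", "Cf")


for cp in (0x0009, 0x0020, 0x00A0, 0x1680, 0x200B):
    print(f"U+{cp:04X}", is_unicode_separator(cp), is_whitespace(cp), is_non_drawable(cp))
```

The three predicates disagree on TAB, ZERO WIDTH SPACE and OGHAM SPACE MARK. This mirrors the mixed uses of #isSeparator that Ezequiel describes further down the thread.
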
WRT #isSeparator, I think that, unless we rename it to something like
#isSeparatorInSmalltalkCode, it is best to keep its current "is whitespace,
separator, non-zero width" semantics. Adding more code points that meet
these criteria is OK, though.
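
This minimal Python sketch checks which code points from Ezequiel's list meet that criterion. It assumes Python's str.isspace() as a stand-in for "whitespace, separator, non-zero width". The list is copied from his message, and `PROPOSED`, `label` and `is_blank` are hypothetical names.

```python
import unicodedata

# Code points from Ezequiel's message (decimal, as in his Smalltalk literal).
PROPOSED = [32, 9, 10, 13, 12, 160, 8192, 8193, 8194, 8195, 8196, 8197,
            8198, 8199, 8200, 8201, 8202, 8203, 8239, 8287, 12288]


def label(cp: int) -> str:
    """Decimal value, U+ notation, general category and name of a code point."""
    ch = chr(cp)
    # Control characters such as TAB have no name in Python's unicodedata.
    name = unicodedata.name(ch, "<control>")
    return f"{cp:>5}  U+{cp:04X}  {unicodedata.category(ch)}  {name}"


def is_blank(cp: int) -> bool:
    """Hypothetical stand-in for "whitespace, separator, non-zero width".

    str.isspace() accepts category Zs and the WS/B/S bidi classes, and
    rejects ZERO WIDTH SPACE (category Cf).
    """
    return chr(cp).isspace()


# Proposed code points that fail the stand-in criterion.
rejected = [cp for cp in PROPOSED if not is_blank(cp)]
# Code points the stand-in accepts that the proposal does not list.
missing = [cp for cp in range(0x110000) if is_blank(cp) and cp not in PROPOSED]

for cp in PROPOSED:
    print(label(cp), "" if is_blank(cp) else "<- fails")
print("accepted but not listed:")
for cp in missing:
    print(label(cp))
```

Under this stand-in, only 8203 (ZERO WIDTH SPACE) is rejected, which fits the non-zero-width condition. Several code points are accepted but not listed:

- 11 (vertical tab)
- 28-31 (the ASCII information separators)
- 133 (next line)
- 8232/8233 (LINE/PARAGRAPH SEPARATOR)
- 5760 (OGHAM SPACE MARK)

The Ogham mark shows the limit of the proxy. It is a Zs space yet drawable, so whether #isSeparator should include it remains a design decision.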

Thanks,

On 5/8/2024 12:31 PM, Ezequiel Birman via Cuis-dev wrote:
> And of course I forgot there are many more visible separators, like
> the middle dots in ancient Roman texts and those in Phoenician and
> Aegean scripts... Currently `isSeparator` is used during parsing, case
> conversions, trimming, etc. Sometimes it means blank, i.e. non-drawable,
> and sometimes it means any word separator, whether drawable or not.
>
> I'll add an isBlank or isDrawable for my use case, but let me know
> what you think about adding Unicode space-like separators to isSeparator.
>
> -- 
> Eze
>
> On Wed, 8 May 2024 at 15:37, Ezequiel Birman <ebirman77 at gmail.com>
> wrote:
>
>     Lately I've started tinkering with text morphs, and I was wondering
>     about UnicodeCodePoint >> #isSeparator. I needed to (in)validate
>     non-drawable code points, including control characters, but the
>     current implementation doesn't include the code points for thin
>     space, hair space, em space, etc. Is that on purpose? For what it's
>     worth, I gathered all the non-drawable code points I could find
>     (maybe some are still missing):
>
>     ^ `#(32 9 10 13 12 160 8192 8193 8194 8195 8196 8197 8198 8199
>     8200 8201 8202 8203 8239 8287 12288)` statePointsTo: value
>
>     Also, I learned that there is one separator that *is* drawable:
>     the Ogham space mark (U+1680). It should probably be included too,
>     unless I am misunderstanding the semantics of isSeparator.
>
>     I should have added comments describing each code point; I will do so ASAP.
>
>     -- 
>     Eze
>


-- 
Juan Vuletich
cuis.st
github.com/jvuletich
researchgate.net/profile/Juan-Vuletich
independent.academia.edu/JuanVuletich
patents.justia.com/inventor/juan-manuel-vuletich
linkedin.com/in/juan-vuletich-75611b3
twitter.com/JuanVuletich
