[Cuis-dev] Non-ASCII characters in source code files Was: TrueType font import problems

Bernhard Pieber bernhard at pieber.com
Thu Sep 14 13:19:14 PDT 2023


Hi Juan,

Thanks for the clarification. I can now understand the changes in String>>#iso8859s15ToRTFEncoding in 0b58898 and in StyleSet>>#createFeaturesParagraphStyleSet and StyleSet>>#createSampleParagraphStyleSet in 57ce260. It is really great how well this works!

The changes in RTFConversionTest>>#textSample6 in c1d67b5 and in StyledTextTest>>#testOldInstanceDeserialization in 57ce260 look dubious to me, though.

I decided to create the PR for StyledTextEditor anyway:
https://github.com/Cuis-Smalltalk/StyledTextEditor/pull/7

With these changes STE can be loaded again. There are no more test failures and errors than there were before the font improvement. Feel free to improve my code. ;-)

Cheers,
Bernhard


------- Original Message -------
Juan Vuletich via Cuis-dev <cuis-dev at lists.cuis.st> wrote on Thursday, 14 September 2023 at 19:38:


> 
> 
> Hi Bernhard,
> 
> On 9/14/2023 10:16 AM, Bernhard Pieber via Cuis-dev wrote:
> 
> > Hi Juan,
> > 
> > Thanks for the explanation. Did I understand correctly that old ISO-8859-15-encoded source files should be converted to UTF-8 before they are loaded in Unicode-enabled Cuis? Or does Cuis somehow do that automatically if possible?
> 
> 
> Almost all the code is pure ASCII. Nothing to be done here.
> 
> The non-ASCII parts are usually "invalid UTF-8", so they can't be
> mistaken for UTF-8 content. Cuis converts them on the fly to the new
> UnicodeString objects. Then, when you save, the new file is UTF-8. So,
> the conversion is done automatically. There is a very small risk of some
> old non-ASCII content being mistaken for UTF-8, leading to wrong code.
> But it is really small; a bit of checking should be enough.
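A minimal sketch of the on-the-fly fallback described above, written in Python rather than Smalltalk purely for illustration. It is not Cuis's implementation, and the function name `decode_utf8_or_latin9` is hypothetical. At each non-ASCII byte it tries to decode one complete, strictly valid UTF-8 sequence; if that fails, the single byte is read as ISO-8859-15. The last usage line shows the small risk mentioned above: the ISO-8859-15 bytes for "Ã©" also happen to form valid UTF-8 and come out as "é".

```python
def decode_utf8_or_latin9(data: bytes) -> str:
    """Hypothetical helper: decode valid UTF-8 sequences as UTF-8 and
    read any byte that does not start one as ISO-8859-15 (Latin-9)."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:  # ASCII is identical in both encodings
            out.append(chr(b))
            i += 1
            continue
        # Expected UTF-8 sequence length from the lead byte (0 = not a lead byte).
        if 0xC2 <= b <= 0xDF:
            length = 2
        elif 0xE0 <= b <= 0xEF:
            length = 3
        elif 0xF0 <= b <= 0xF4:
            length = 4
        else:
            length = 0
        if length and i + length <= len(data):
            try:
                # Strict decoding rejects bad continuation bytes,
                # overlong forms and surrogates.
                out.append(data[i:i + length].decode("utf-8"))
                i += length
                continue
            except UnicodeDecodeError:
                pass
        # Not valid UTF-8 at this position: take this one byte as ISO-8859-15.
        out.append(bytes([b]).decode("iso8859_15"))
        i += 1
    return "".join(out)


# Usage: legacy Latin-9 bytes, real UTF-8, and the ambiguous case.
assert decode_utf8_or_latin9(b"A\xa2\xa4") == "A\u00a2\u20ac"            # 'A¢€'
assert decode_utf8_or_latin9("A\u00a2\u20ac".encode("utf-8")) == "A\u00a2\u20ac"
assert decode_utf8_or_latin9(b"\xc3\xa9") == "\u00e9"                     # 'Ã©' read as 'é'
```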
> 
> > IIUC the files on https://github.com/Cuis-Smalltalk/StyledTextEditor should have been converted to UTF-8 already. If so, I still don't understand why the string literal in RTFExporting.pck.st was changed when the package file was resaved. (I did not use my repo for the test.)
> 
> 
> That's what I described above: the instances are auto-converted on load,
> and then saved in UTF-8 format.
> 
> > How did you do the conversion? Did you use some external tool? (I could not find any code for this in Cuis except for UnicodeString>>#fromBytesStream:, but there are no senders.)
> 
> 
> No. I just loaded, saved, and checked that everything looked OK. It did.
> For instance, #nextUtf8BytesAndCodePointInto:into: ends up calling
> #utf8BytesAndCodePointFor:byte2:byte3:byte4:into:into:. Check the
> comments in these methods. I had hoped they would be informative enough.
> 
> > Regarding the method iso8859s15ToRTFEncoding, I am pretty sure this is the correct string from the comment (Test for Cent and Euro characters):
> > self assert: 'A¢€' iso8859s15ToRTFEncoding = 'A\u162?\u8364?'
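The expected string in that assertion follows RTF's `\uN?` convention: each non-ASCII character is written as `\u` plus its code point in decimal, followed by one fallback character (`?`) for readers that don't understand `\u`. Below is a minimal Python sketch of what a polymorphic method like #toRTFEncoding could compute. The name `to_rtf_escapes` is hypothetical and this is not a port of String>>#iso8859s15ToRTFEncoding. It assumes the default `\uc1` (one fallback character), escapes `\`, `{` and `}` because RTF treats them as control characters, writes code points above 32767 as negative numbers since `\uN` takes a signed 16-bit value, and does not handle characters outside the Basic Multilingual Plane.

```python
def to_rtf_escapes(text: str) -> str:
    """Hypothetical helper: return text as 7-bit RTF, using \\uN? for
    non-ASCII characters (assumes the default \\uc1, one fallback char)."""
    parts = []
    for ch in text:
        cp = ord(ch)
        if ch in "\\{}":
            parts.append("\\" + ch)  # escape RTF control characters
        elif cp < 128:
            parts.append(ch)  # plain ASCII passes through unchanged
        elif cp <= 0xFFFF:
            # \uN takes a signed 16-bit value, so code points above 32767
            # are written as negative numbers.
            n = cp if cp < 0x8000 else cp - 0x10000
            parts.append(f"\\u{n}?")
        else:
            raise ValueError("characters outside the BMP are not handled in this sketch")
    return "".join(parts)


# Usage: the Cent/Euro case from the assertion above ('A¢€').
assert to_rtf_escapes("A\u00a2\u20ac") == "A\\u162?\\u8364?"
```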
> 
> 
> Cool.
> 
> > Instead of #iso8859s15ToRTFEncoding a new method #toRTFEncoding or #asRTF polymorphic to String and UnicodeString is probably needed, right?
> 
> 
> Yes. That sounds like a good idea. Still, I'd check recent RTF
> documentation. I'd be really surprised if they don't handle UTF-8
> encoding as part of the standard. If they do, maybe all that can simply
> be removed and replaced with UTF-8 handling.
> 
> > Cheers,
> > Bernhard
> 
> 
> Cheers,
> 
> --
> Juan Vuletich
> cuis.st
> github.com/jvuletich
> researchgate.net/profile/Juan-Vuletich
> independent.academia.edu/JuanVuletich
> patents.justia.com/inventor/juan-manuel-vuletich
> linkedin.com/in/juan-vuletich-75611b3
> twitter.com/JuanVuletich
> 
> --
> Cuis-dev mailing list
> Cuis-dev at lists.cuis.st
> https://lists.cuis.st/mailman/listinfo/cuis-dev

