[Cuis-dev] Using OSProcess package

Juan Vuletich juan at cuis.st
Wed Dec 14 06:42:34 PST 2022


Hi Dave,

On 12/12/2022 12:57 PM, David T. Lewis via Cuis-dev wrote:
> Thanks Juan. If anyone sees problems with Unicode strings in OSProcess,
> please mention it here on the list. My understanding of Unicode is
> not good, so if there are problems with string handling I may not
> see or understand the issues without some help.
>
> Thanks,
> Dave

Well, I can annoy you a bit with Unicode stuff :) !

Everything that follows I tried on my Intel Mac, using OpenSmalltalk Cog
from 2022-11-21. I'm using a fresh, updated Cuis with the current
OSProcess. I believe results would be similar on Linux and Windows, and
I can try if you find it useful. I also believe results would be similar
in Squeak.

1) Let's add this method to ByteArray:
asHackedUtf8BytesInAString
    "The hideous selector is on purpose. Performance is bad."
    ^ String newFrom: (self asArray collect: [ :b | Character numericValue: b ])

2) In your default Cuis folder, create a file named 'aa.txt'.

3) Keep the console you started Cuis from visible, and run these
examples in a Workspace. ASCII stuff works as usual:
us1 := 'ls a*.txt'.
OSProcess command: us1.
OSProcess command: us1 asUtf8Bytes asHackedUtf8BytesInAString.
The first command sends the ASCII string. The second command calls our
'hacked string with utf8' method. The result of this call is exactly the
same string, because the UTF-8 encoding of ASCII text is exactly the
same bytes.
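The byte-level claim can be checked outside the image. This is a minimal
sketch in Python rather than Smalltalk, and the variable names are only
illustrative. It shows that an ASCII-only command has identical ASCII
and UTF-8 encodings, so reading those bytes back one character per byte
gives the original text.

```python
# ASCII-only command, same text as us1 in the example above.
command = "ls a*.txt"

utf8_bytes = command.encode("utf-8")
ascii_bytes = command.encode("ascii")

# For pure ASCII text the two encodings are byte-for-byte identical.
same_bytes = utf8_bytes == ascii_bytes

# Reading the UTF-8 bytes back as one character per byte
# gives the original text again.
roundtrip = utf8_bytes.decode("latin-1")

print(same_bytes, roundtrip == command)
```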

4) To make things more interesting, now create a file with this name:
'agüita.txt'. If necessary, copy and paste the name from here.
us1 := 'ls a*.txt'.
OSProcess command: us1.
OSProcess command: us1 asUtf8Bytes asHackedUtf8BytesInAString.
No big deal. Works as before. The result includes both files.

5) What if we actually want to use non-ascii characters in our command? 
Try these:
us2 := 'ls agüita.txt'.
OSProcess command: us2.
OSProcess command: us2 asUtf8Bytes asHackedUtf8BytesInAString.
With the first command, on my Mac I get `ls: ag?ita.txt: No such file or
directory`. It seems that $ü was converted into $? by macOS, right?
But the second example works well, and it finds our file. The reason is
that what the OS wants is the UTF-8 bytes.
The first example, taken as bytes, is not valid UTF-8, and macOS
replaces invalid bytes with $?.
In UTF-8 encoding, $ü takes two bytes, and this horribly hacked String: 
'ls agÃŒita.txt' has exactly the required bytes. It would be nicer if a 
ByteArray was accepted by the plugin. But given that it insists on 
requiring a String, a hacked String with the appropriate bytes makes it 
work.
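To make the bytes concrete, here is a rough sketch in Python (not
Smalltalk). The function hacked_utf8_string is a hypothetical stand-in
for `asUtf8Bytes asHackedUtf8BytesInAString`, and the assumption that
the unhacked String reaches the OS as one byte per character, with $ü as
0xFC, is mine. The glyphs shown for the hacked string depend on which
single-byte charset is used to display it: ISO-8859-15 gives the
'ÃŒ' quoted above, while Latin-1 would show 'Ã¼'. The bytes are
the same either way.

```python
def hacked_utf8_string(text, charset="latin-1"):
    """Hypothetical stand-in for asUtf8Bytes asHackedUtf8BytesInAString:
    encode as UTF-8, then reinterpret every byte as one character."""
    return text.encode("utf-8").decode(charset)

command = "ls ag\u00fcita.txt"          # 'ls agüita.txt'

# In UTF-8, U+00FC (ü) takes two bytes: 0xC3 0xBC.
u_umlaut_bytes = "\u00fc".encode("utf-8")
utf8_bytes = command.encode("utf-8")

# Same bytes, shown through two different single-byte charsets.
hacked_latin1 = hacked_utf8_string(command)
hacked_latin9 = hacked_utf8_string(command, "iso8859_15")

# Assumption: the plain String is passed as one byte per character,
# so ü goes out as the single byte 0xFC, which is not valid UTF-8.
raw_bytes = command.encode("latin-1")
try:
    raw_bytes.decode("utf-8")
    raw_is_valid_utf8 = True
except UnicodeDecodeError:
    raw_is_valid_utf8 = False

print(u_umlaut_bytes, hacked_latin9, raw_is_valid_utf8)
```

The hacked string has one character more than the original command,
because the single character $ü became two bytes.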
These hacked Strings that make no sense if viewed as Characters, but 
that contain the appropriate UTF-8 bytes of some reasonable 
UnicodeString should never leak beyond code that is working directly 
with VM or external services, like OSProcess. A selector such as 
#asZeroTerminatedUtf8Bytes seems to make sense. It could also be used in 
Squeak, as the same problems should happen.

6) Finally, and at the risk of this being too much for a single email, 
what happens if we use wildcards?
us3 := 'ls agüita*'.
OSProcess command: us3.
OSProcess command: us3 asUtf8Bytes asHackedUtf8BytesInAString.
On the Mac, neither of these finds the file. This could be different on
Linux or Windows.
Some Unicode strings can be expressed in more than one way. Unicode 
speaks about "normalization forms". I implemented them in Cuis: #asNFC 
and #asNFD. So I tried:
OSProcess command: us3 asNFC asUtf8Bytes asHackedUtf8BytesInAString.
OSProcess command: us3 asNFD asUtf8Bytes asHackedUtf8BytesInAString.
The Mac happens to like #asNFD and not #asNFC. Not sure about other OSs.
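As an illustration of why the normalization form can matter, this
Python sketch (again not Smalltalk, with unicodedata.normalize standing
in for #asNFC and #asNFD) shows that the two forms of the same name give
different UTF-8 bytes. Anything that compares names byte by byte would
see them as different. Whether that is exactly what happens inside the
Mac shell is an assumption I have not verified.

```python
import unicodedata

name = "ag\u00fcita"                     # 'agüita'

nfc = unicodedata.normalize("NFC", name)  # ü as one code point, U+00FC
nfd = unicodedata.normalize("NFD", name)  # 'u' followed by U+0308

nfc_bytes = nfc.encode("utf-8")           # contains 0xC3 0xBC
nfd_bytes = nfd.encode("utf-8")           # contains 'u' 0xCC 0x88

# Same text for a human reader, different code points and bytes.
print(len(nfc), len(nfd), nfc_bytes, nfd_bytes)
```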

In Cuis, I tried to minimize the need to deal with the details of
Unicode, but in code that deals with the external world, it is still
needed.

HTH,

-- 
Juan Vuletich
cuis.st
github.com/jvuletich
researchgate.net/profile/Juan-Vuletich
independent.academia.edu/JuanVuletich
patents.justia.com/inventor/juan-manuel-vuletich
linkedin.com/in/juan-vuletich-75611b3
twitter.com/JuanVuletich


