[Cuis-dev] fileout. proposed 2 new methods for strict file chunks reading

Juan Vuletich juan at jvuletich.org
Mon Oct 25 07:39:15 PDT 2021


Hi Nicola, Hernán,

This is my take. I tried to be explicit and clear about 'terminator' vs.
'separator', and also added the other two implementors of #upTo: in the
Stream hierarchy. I slightly tweaked your tests, and used them almost
verbatim for the other two implementors.

Please review.

Thanks,

On 10/25/2021 7:35 AM, Nicola Mingotti via Cuis-dev wrote:
>
> Hi Juan,
>
> 1. I corrected the bug you found, added other test cases and made them
> symmetric between 'upTo' and 'upToStrict'. There are 2 files attached:
> one for the tests and one collecting the changes to System-Files.
>
> 2. About names ('terminator', 'separator'), I see your point. I am open
> to any naming scheme. My motivation for asking for this enhancement of
> 'upTo' comes entirely from log parsing, so it would also not be
> inappropriate to name the boolean parameter something like
> "logReaderMode". It would be long, but easy to recognize for people
> involved in this kind of business. To be honest, I don't dislike
> "strict" either.
>
>
> bye
> Nicola
>
>
>
> On 10/24/21 18:14, Juan Vuletich wrote:
>> Hi Nicola,
>>
>> On 10/23/2021 6:56 PM, Nicola Mingotti via Cuis-dev wrote:
>>>
>>> Hi Juan,
>>>
>>> To the best of my current understanding, I can provide this:
>>>
>>> 1. Fileout for tests in BaseImageTests
>>
>> Much better!
>>
>>> 2. A few fileouts for the new methods and method names
>>
>> I still think the focus should be on terminator vs. separator, 
>> especially on method and argument names. See 
>> https://en.wikipedia.org/wiki/Newline :
>>
>> "Interpretation
>> Two ways to view newlines, both of which are self-consistent, are 
>> that newlines either separate lines or that they terminate lines. If 
>> a newline is considered a separator, there will be no newline after 
>> the last line of a file. Some programs have problems processing the 
>> last line of a file if it is not terminated by a newline. On the 
>> other hand, programs that expect newline to be used as a separator 
>> will interpret a final newline as starting a new (empty) line. 
>> Conversely, if a newline is considered a terminator, all text lines 
>> including the last are expected to be terminated by a newline. If the 
>> final character sequence in a text file is not a newline, the final 
>> line of the file may be considered to be an improper or incomplete 
>> text line, or the file may be considered to be improperly truncated. "
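The two interpretations can be made concrete with a small sketch. Python is used here purely for illustration (the thread's own code is Cuis Smalltalk), and the function names are hypothetical. Under the separator view, a trailing newline produces an extra empty line. Under the terminator view, any text after the last newline is reported separately as an incomplete record instead of being counted as a line.

```python
def split_as_separated(text, sep="\n"):
    """Separator view: every newline separates two lines, so a trailing
    newline produces a final empty line."""
    return text.split(sep)


def split_as_terminated(text, term="\n"):
    """Terminator view: only text followed by a newline is a complete line.
    Answer (complete_lines, incomplete_tail); the tail is '' when the text
    ends with a terminator."""
    parts = text.split(term)
    # The last element is whatever follows the final terminator (or the
    # whole text if there is none), so it is never a complete line.
    return parts[:-1], parts[-1]


if __name__ == "__main__":
    for sample in ["line-1\nline-2\n", "line-1\nline-2\nline-TRAP"]:
        print(repr(sample))
        print("  separated: ", split_as_separated(sample))
        print("  terminated:", split_as_terminated(sample))
```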
>>
>>> 3. I could not reproduce the bug you mention; maybe I did not
>>> understand it well. If you could send me a failing example, it
>>> would be helpful.
>>>
>>
>> Sure. The following test fails even after fixing the obvious bug:
>>
>> testUpToStrict3
>>     "A file longer than 2000 chars with no $X: a strict read must answer nil."
>>     | path fs read |
>>     path := 'test-{1}.txt' format: {(Float pi * 10e10) floor}.
>>     path asFileEntry fileContents: ((1 to: 100) inject: '' into: [ :prev :each |
>>         prev, 'A lot of stuff, needs over 2000 chars! ' ]).
>>     fs := path asFileEntry readStream.
>>     read := fs upTo: $X strict: true.
>>     fs close.
>>     self assert: read isNil.
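The failing test writes about 3,900 characters, which is more than one read buffer. A strict read therefore has to keep scanning across buffer refills. When no terminator turns up, it must rewind to where the read started, not to the start of the last buffer. Below is a sketch of that logic in Python over a byte stream. The 2048-byte chunk size and the name `up_to_strict` are assumptions for illustration, not Cuis's actual buffer size or API. Like `upTo:`, it takes a single-element terminator.

```python
import io

CHUNK_SIZE = 2048  # assumed buffer size, for illustration only


def up_to_strict(stream, terminator, chunk_size=CHUNK_SIZE):
    """Answer the bytes before `terminator` and leave the stream just past it.
    If the terminator never appears, answer None and restore the original
    position, no matter how many chunks were scanned."""
    if len(terminator) != 1:
        raise ValueError("terminator must be a single byte")
    start = stream.tell()
    collected = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            # End of data without a terminator: leave the stream untouched.
            stream.seek(start)
            return None
        index = chunk.find(terminator)
        if index >= 0:
            collected += chunk[:index]
            # Reposition right after the terminator, not at the end of the chunk.
            stream.seek(start + len(collected) + 1)
            return bytes(collected)
        collected += chunk


if __name__ == "__main__":
    data = b"A lot of stuff, needs over 2000 chars! " * 100
    s = io.BytesIO(data)
    print(up_to_strict(s, b"X"), s.tell())
```

Remembering `start` once, before the loop, is what keeps a failed search from leaving the stream at a buffer boundary.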
>>
>>>
>>> bye
>>> Nicola
>>>
>>
>> Cheers,
>>
>>>
>>> On 10/23/21 02:26, Nicola Mingotti wrote:
>>>>
>>>> Hi Juan, give me a bit of time to read your references. I thought
>>>> what I sent were test methods; clearly I am missing part of the story.
>>>>
>>>> There shouldn't be any concatenation with nil and, for God's sake,
>>>> NO partial records. That is exactly what I wanted to avoid; apologies.
>>>>
>>>> Tomorrow I will probably be out for Linux Day; I will update you
>>>> when possible.
>>>>
>>>>
>>>> bye
>>>> Nicola
>>>>
>>>>
>>>>
>>>>
>>>> On 10/23/21 01:20, Juan Vuletich wrote:
>>>>> Hi Folks,
>>>>>
>>>>> The main point here is not "strict vs. legacy", "logically correct 
>>>>> vs incorrect" or anything like that at all.
>>>>>
>>>>> The point is "separator vs. terminator", and how using a 
>>>>> terminator instead of a separator allows processing files while 
>>>>> they are still being written to. (And this has really no relation 
>>>>> with running on a server or any other kind of machine.)
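Here is a rough sketch of why a terminator helps with files that are still being written. The hypothetical `read_complete_records` below consumes only records that end in a newline. A partially written last record stays in the file until a later call, after the writer has finished it. The sketch is in Python for illustration, and the writer is simulated by appending to a temporary file.

```python
import os
import tempfile


def read_complete_records(f):
    """Answer the newline-terminated records available from binary file `f`,
    without their newlines. A trailing record with no newline yet is not
    consumed: the position is moved back to its start so that a later call
    can pick it up once the writer has finished it."""
    records = []
    while True:
        start = f.tell()
        line = f.readline()  # includes the b"\n" when one was found
        if not line:
            break  # nothing more to read right now
        if not line.endswith(b"\n"):
            f.seek(start)  # incomplete record: leave it for next time
            break
        records.append(line[:-1])
    return records


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "growing.log")
    with open(path, "wb") as w:
        w.write(b"rec-1\nrec-2\nrec-")  # the writer stops mid-record
    with open(path, "rb") as r:
        print(read_complete_records(r))
        with open(path, "ab") as w:
            w.write(b"3\n")  # the writer finishes the record
        print(read_complete_records(r))
```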
>>>>>
>>>>> Besides, Nicola, your code has a bug when recursing on the terminator:
>>>>> it will answer the previous partial last record concatenated with nil.
>>>>>
>>>>> Finally, please take a look at TestCase, SUnit and
>>>>> BaseImageTests.pck.st to see what we mean by a "test".
>>>>>
>>>>> Thanks,
>>>>>
>>>>> On 10/22/2021 12:18 PM, Nicola Mingotti via Cuis-dev wrote:
>>>>>>
>>>>>> Hi Hernan,
>>>>>>
>>>>>> We will have the opportunity to work together on larger problems;
>>>>>> this one is too small. It would take more time to talk than to do
>>>>>> things ;)
>>>>>>
>>>>>> I have a proposed version: I rewrote the methods and wrote the test.
>>>>>> I kept a good part of the original code, which may have been tuned
>>>>>> for efficiency over time.
>>>>>>
>>>>>> The upToLegacy method can of course be eliminated; it is there only
>>>>>> for reference.
>>>>>>
>>>>>> upTo: XXX --- now calls --->  upTo: XXX strict: false
>>>>>>
>>>>>> upTo: XXX strict: XXX ------ is recursive, it needs an extra 
>>>>>> helper method to remember a parameter (Scheme recursion style)  
>>>>>> ----> upTo: XXX strict: XXX posMemo: xxxx
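Hernán asked that upTo: and the strict variant share one implementation. That could look roughly like the sketch below: one function with a `strict` flag, with the lenient entry point simply delegating with `strict=False`. It is Python for illustration only. The names loosely mirror the Smalltalk selectors but are hypothetical, and a loop with a remembered start position stands in for the recursive posMemo: helper.

```python
import io


def up_to(stream, delimiter, strict=False):
    """Shared implementation for both behaviours on a text stream.
    Lenient (strict=False): answer everything up to `delimiter`, or up to
    end of stream if it never appears (the classic upTo: behaviour).
    Strict (strict=True): if `delimiter` never appears, answer None and
    restore the starting position."""
    start = stream.tell()
    chars = []
    while True:
        ch = stream.read(1)
        if ch == "":  # end of stream, no delimiter found
            if strict:
                stream.seek(start)
                return None
            return "".join(chars)
        if ch == delimiter:  # the delimiter is consumed, not answered
            return "".join(chars)
        chars.append(ch)


def up_to_lenient(stream, delimiter):
    # The legacy entry point only delegates, so the scanning loop exists once.
    return up_to(stream, delimiter, strict=False)
```

Keeping the loop in one place means a fix to the scanning logic benefits both behaviours, which is the duplication concern raised in the thread.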
>>>>>>
>>>>>> See the attached fileout.
>>>>>>
>>>>>>
>>>>>> bye
>>>>>> Nicola
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 10/21/21 19:49, Hernan Wilkinson wrote:
>>>>>>> ok, let me know. I wish we could do it together but my agenda 
>>>>>>> (and I guess yours) is almost always full...
>>>>>>>
>>>>>>> On Thu, Oct 21, 2021 at 2:32 PM Nicola Mingotti 
>>>>>>> <nmingotti at gmail.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>     Hi Hernan,
>>>>>>>
>>>>>>>     OK, let me try; I have been talking about it for too many days.
>>>>>>>
>>>>>>>     I will let you know soon
>>>>>>>
>>>>>>>     bye
>>>>>>>     Nicola
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>     On 10/21/21 19:02, Hernan Wilkinson wrote:
>>>>>>>>     Hi Nicola,
>>>>>>>>      if you could refactor upTo: to use the same code as
>>>>>>>>     strictUpTo: and write the tests to check that everything
>>>>>>>>     works as expected, that would be great!
>>>>>>>>      I would not use the Linux stdlib names for those
>>>>>>>>     messages, nor the C function names; it is not necessary...
>>>>>>>>      If you do not have the time to do it, I can give it a try
>>>>>>>>     if you wish.
>>>>>>>>
>>>>>>>>     Cheers!
>>>>>>>>     Hernan.
>>>>>>>>
>>>>>>>>     On Thu, Oct 21, 2021 at 12:47 PM Nicola Mingotti
>>>>>>>>     <nmingotti at gmail.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>         Hi Hernan,
>>>>>>>>
>>>>>>>>         . Forget the code and the test. I can rewrite it from
>>>>>>>>         scratch, with tests. I actually changed the existing code
>>>>>>>>         out of "politeness" ;)
>>>>>>>>
>>>>>>>>         . For me it is very important to have this matter fixed,
>>>>>>>>         properly and for the future. It is not good to have standard
>>>>>>>>         library functionality scattered across my application packages.
>>>>>>>>
>>>>>>>>         . Since I found that the Linux standard library has functions
>>>>>>>>         that do exactly what I want, I will use their names, "getline"
>>>>>>>>         and "getdelim", to avoid confusion and reuse existing function
>>>>>>>>         names.
>>>>>>>>
>>>>>>>>         . If you really dislike these functions, I can put them in
>>>>>>>>         OSProcess and maybe just link the C versions, for Linux/BSD
>>>>>>>>         only. That is how valuable I think they are in a server
>>>>>>>>         environment.
>>>>>>>>
>>>>>>>>         . To fix this I need maybe 1-2 days. Whether I need to link
>>>>>>>>         the C functions, I don't know, since I have never tried.
>>>>>>>>
>>>>>>>>         So, let me know: if you are not against these functions, I am
>>>>>>>>         open to implementing them properly.
>>>>>>>>
>>>>>>>>
>>>>>>>>         ===== Extra considerations whose reading is secondary ==================
>>>>>>>>
>>>>>>>>         . Your fix was one step in the right direction but not
>>>>>>>>         enough: you also need to move the stream pointer back to
>>>>>>>>         the last existing $A. In other words: too complex. A good
>>>>>>>>         method must do all of its chores, not leave the dirty
>>>>>>>>         business and special cases to us.
>>>>>>>>
>>>>>>>>         . I understand the value of concision, a small core, etc.
>>>>>>>>         On the other hand, I run Cuis on servers. The most important
>>>>>>>>         things on servers are files and sockets; you read from them
>>>>>>>>         all of the time. Reading must be easy and idiot-proof, rock
>>>>>>>>         solid, and as resistant to concurrent processing as possible.
>>>>>>>>
>>>>>>>>         . I see that the Python and Ruby standard libraries do it
>>>>>>>>         wrong too: a bit better than Cuis 'upTo' does, but still
>>>>>>>>         badly. They leave you the '\n' at the end, but if another
>>>>>>>>         process is still writing 'f1.txt', Ruby and Python consume
>>>>>>>>         the half-baked record!
>>>>>>>>         -------- Linux
>>>>>>>>         $> printf 'line-1\nline-2\nline-TRAP' > f1.txt
>>>>>>>>         # python
>>>>>>>>         $> python3.9 -c "f=open('f1.txt','r'); print(f.readlines())"
>>>>>>>>         => ['line-1\n', 'line-2\n', 'line-TRAP']
>>>>>>>>         # ruby
>>>>>>>>         $> ruby -e "f=open('f1.txt','r'); puts f.readlines().to_s"
>>>>>>>>         => ["line-1\n", "line-2\n", "line-TRAP"]
>>>>>>>>         # both Python and Ruby consumed the half-baked record! bad!
>>>>>>>>         ---------------------------------------------------------
>>>>>>>>
>>>>>>>>         . The C and Common Lisp standard libraries have a way to do
>>>>>>>>         it right:
>>>>>>>>         -) CL read-line.
>>>>>>>>         http://www.lispworks.com/documentation/HyperSpec/Body/f_rd_lin.htm#read-line
>>>>>>>>
>>>>>>>>         -) C getline.
>>>>>>>>         https://man7.org/linux/man-pages/man3/getline.3.html
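For comparison, Common Lisp's `read-line` returns a second value, missing-newline-p. It is true when the line was ended by end of file rather than by a newline. A loose Python analogue of that interface is sketched below (the function name is hypothetical). With it, a caller can decide to leave a flagged record for later instead of processing it.

```python
import io


def read_line(stream):
    """Loose analogue of Common Lisp READ-LINE's two values.
    Answer (line_without_newline, missing_newline_p); at end of stream
    answer (None, True)."""
    line = stream.readline()
    if line == "":
        return None, True
    if line.endswith("\n"):
        return line[:-1], False
    return line, True  # ended by end of file: possibly a half-written record


if __name__ == "__main__":
    s = io.StringIO("line-1\nline-2\nline-TRAP")
    for _ in range(4):
        print(read_line(s))
```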
>>>>>>>>
>>>>>>>>         . I understand I am probably the only one running Cuis on
>>>>>>>>         servers, so I am the first to run into a few particular
>>>>>>>>         problems.
>>>>>>>>
>>>>>>>>         . In my opinion, Cuis on the server can be a good match: so
>>>>>>>>         far I have 2 small company services running and one big
>>>>>>>>         project in continuous development. Time will tell. Sturdiness,
>>>>>>>>         understandability and ease of modification were my top
>>>>>>>>         priorities. So far things are at least working.
>>>>>>>>
>>>>>>>>         ======================================================
>>>>>>>>
>>>>>>>>         bye
>>>>>>>>         Nicola
>>>>>>>>
>>>>>>>>
>>>>>>>>         On 10/21/21 14:53, Hernan Wilkinson wrote:
>>>>>>>>>         Hi Nicola,
>>>>>>>>>          I see your point regarding the functionality of
>>>>>>>>>         upTo:, but you can easily work around it using
>>>>>>>>>         #peekBack. Using your example:
>>>>>>>>>         -----
>>>>>>>>>         s _ 'hello-1Ahello-2Ahel'.
>>>>>>>>>         '/tmp/test.txt' asFileEntry fileContents: s.
>>>>>>>>>
>>>>>>>>>         st1 _ '/tmp/test.txt' asFileEntry readStream .
>>>>>>>>>
>>>>>>>>>         st1 upTo: $A. " 'hello-1' "
>>>>>>>>>         st1 upTo: $A. " 'hello-2' "
>>>>>>>>>         st1 upTo: $A. " 'hel' "
>>>>>>>>>         (st1 atEnd and: [ st1 peekBack ~= $A ]) ifTrue: [ self error: 'End of file without delimiter' ].
>>>>>>>>>         ------
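Hernán's check can be mirrored in Python to see what it detects. The sketch is illustrative only. `up_to`, `at_end` and `peek_back` are hypothetical stand-ins for upTo:, atEnd and peekBack, and `peek_back` relies on `io.StringIO`, where `tell()` is a plain character index. As in the Smalltalk version, the check only reports the problem after the fact: the partial record has already been consumed.

```python
import io


def up_to(stream, delimiter):
    """Lenient read like upTo:: answer the characters before `delimiter`
    (which is consumed), or everything up to end of stream."""
    chars = []
    while True:
        ch = stream.read(1)
        if ch == "" or ch == delimiter:
            return "".join(chars)
        chars.append(ch)


def at_end(stream):
    pos = stream.tell()
    result = stream.read(1) == ""
    stream.seek(pos)
    return result


def peek_back(stream):
    """Answer the previously read character without moving the position."""
    pos = stream.tell()
    if pos == 0:
        return None
    stream.seek(pos - 1)
    return stream.read(1)  # reading it moves the position back to pos


def ended_without_delimiter(stream, delimiter):
    # Same test as: st1 atEnd and: [ st1 peekBack ~= $A ]
    return at_end(stream) and peek_back(stream) != delimiter
```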
>>>>>>>>>          Regarding my concern of adding this functionality to
>>>>>>>>>         Cuis, we are trying to have a compact set of classes
>>>>>>>>>         and methods to reduce complexity (or at least not
>>>>>>>>>         increase it) and help newcomers to understand it and
>>>>>>>>>         oldies to remember it :-) . We are also trying to add
>>>>>>>>>         more and more tests because it is the only way to keep
>>>>>>>>>         a system from becoming a legacy one and to reduce the
>>>>>>>>>         fear it produces to change something.
>>>>>>>>>          The strictUpTo:startPos: you are sending is almost a
>>>>>>>>>         copy of the upTo: method, with a few lines changed.
>>>>>>>>>         Even though the functionality makes sense (although
>>>>>>>>>         right now you are the only one needing it and as I
>>>>>>>>>         said, you can use peekBack to overcome it),
>>>>>>>>>         adding that method adds repeated code which in the
>>>>>>>>>         long term makes it more difficult to understand and
>>>>>>>>>         maintain, even more because it does not have tests.
>>>>>>>>>          So I hope you understand that as maintainers of Cuis,
>>>>>>>>>         we want to stay true to the goals I mentioned before
>>>>>>>>>         and keep Cuis as clean and simple as possible. If you
>>>>>>>>>         can refactor what you sent to avoid having repeated
>>>>>>>>>         code with #upTo: and add tests that verify the
>>>>>>>>>         functionality of both methods (strictUpTo: and upTo:),
>>>>>>>>>         that will make our task easier and meet the goals we
>>>>>>>>>         have. If you think this does not make sense to you, or
>>>>>>>>>         you do not have the time to do it, it is completely
>>>>>>>>>         understandable and in that case I suggest for you to
>>>>>>>>>         have it as an extension of the StandardFileStream
>>>>>>>>>         class or just use the peekBack message as I showed.
>>>>>>>>>          I hope you understand my concern and agree with me.
>>>>>>>>>         If not, please let me know.
>>>>>>>>>
>>>>>>>>>         Cheers!
>>>>>>>>>         Hernan.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>         On Tue, Oct 19, 2021 at 10:32 AM Nicola Mingotti
>>>>>>>>>         <nmingotti at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>             Hi Hernan,
>>>>>>>>>
>>>>>>>>>             In all frankness, I would wipe out the old
>>>>>>>>>             'upTo' because its behavior is a bit "wild".
>>>>>>>>>
>>>>>>>>>             On the other hand, I understand it may create
>>>>>>>>>             backward-compatibility problems; that is why, for
>>>>>>>>>             the moment, I propose adding a new method that
>>>>>>>>>             behaves a bit better.
>>>>>>>>>
>>>>>>>>>             I hope this example explains the problem:
>>>>>>>>>             -------------------------------------------------------
>>>>>>>>>             s _ 'hello-1Ahello-2Ahel'.
>>>>>>>>>             '/tmp/test.txt' asFileEntry fileContents: s.
>>>>>>>>>
>>>>>>>>>             st1 _ '/tmp/test.txt' asFileEntry readStream .
>>>>>>>>>
>>>>>>>>>             st1 upTo: $A. " 'hello-1' "
>>>>>>>>>             st1 upTo: $A. " 'hello-2' "
>>>>>>>>>             st1 upTo: $A. " 'hel' "         "(*)"
>>>>>>>>>             ------------------------------------------------------
>>>>>>>>>             (*) There is no way to tell whether you actually
>>>>>>>>>             found an "A"-terminated block or just hit the end
>>>>>>>>>             of file.
>>>>>>>>>             (*) If you hit the end of file, you consume an
>>>>>>>>>             incomplete record. This is another problem: maybe
>>>>>>>>>             another process was about to finish writing that
>>>>>>>>>             record, but you will never know.
>>>>>>>>>
>>>>>>>>>             Maybe there is another method around that behaves
>>>>>>>>>             like 'strictUpTo'; if there is, I did not find it, sorry.
>>>>>>>>>
>>>>>>>>>             IMHO, on a scale of importance from 0 to 10, this
>>>>>>>>>             method rates >= 8 for a programmer.
>>>>>>>>>             I would definitely not put it into an external
>>>>>>>>>             package; it is too fundamental.
>>>>>>>>>
>>>>>>>>>             bye
>>>>>>>>>             Nicola
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>             On 10/19/21 14:44, Hernan Wilkinson wrote:
>>>>>>>>>>             Hi Nicola!
>>>>>>>>>>              I was wondering, why are you suggesting adding
>>>>>>>>>>             them to the base? Is it not enough to implement
>>>>>>>>>>             them as an extension in your package?
>>>>>>>>>>              Also, I think that any new functionality should
>>>>>>>>>>             come with its corresponding tests to help the
>>>>>>>>>>             maintenance and understanding of the functionality.
>>>>>>>>>>
>>>>>>>>>>             Cheers!
>>>>>>>>>>             Hernan.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>             On Tue, Oct 19, 2021 at 7:04 AM Nicola Mingotti
>>>>>>>>>>             via Cuis-dev <cuis-dev at lists.cuis.st> wrote:
>>>>>>>>>>
>>>>>>>>>>                 Hi Juan, guys,
>>>>>>>>>>
>>>>>>>>>>                 I would like to add to Cuis the 2 methods I
>>>>>>>>>>                 attach here; one is a helper method.
>>>>>>>>>>
>>>>>>>>>>                 -----------
>>>>>>>>>>                 StandardFileStream strictUpTo: delim.
>>>>>>>>>>                 -----------
>>>>>>>>>>
>>>>>>>>>>                 Unlike 'upTo: delim', this method:
>>>>>>>>>>                 1. Does not return anything if it does not find
>>>>>>>>>>                 'delim'.
>>>>>>>>>>                 2. Does not advance the position on the stream
>>>>>>>>>>                 if it does not find 'delim'.
>>>>>>>>>>                 3. If it finds 'delim', returns a chunk that
>>>>>>>>>>                 includes it.
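The three rules above can be sketched in Python for illustration. `strict_up_to` is a hypothetical name. For brevity it reads the rest of the stream in one go; an implementation meant for large log files would avoid that by scanning in chunks. Per rule 3, the answered chunk keeps the delimiter, unlike upTo:.

```python
import io


def strict_up_to(stream, delim):
    """Sketch of the three rules:
    1. answer None if `delim` is not found;
    2. in that case leave the stream position unchanged;
    3. otherwise answer the chunk *including* `delim` and move past it."""
    start = stream.tell()
    rest = stream.read()  # whole remainder, for brevity only
    index = rest.find(delim)
    if index < 0:
        stream.seek(start)  # rule 2: nothing consumed
        return None
    chunk = rest[: index + len(delim)]
    stream.seek(start + len(chunk))  # position just after the delimiter
    return chunk


if __name__ == "__main__":
    s = io.StringIO("hello-1Ahello-2Ahel")
    for _ in range(3):
        print(repr(strict_up_to(s, "A")))
```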
>>>>>>>>>>
>>>>>>>>>>                 I am parsing log files at the moment, and this is
>>>>>>>>>>                 very useful.
>>>>>>>>>>
>>>>>>>>>>                 NOTE: So far I have tested it only on small files.
>>>>>>>>>>
>>>>>>>>>>                 bye
>>>>>>>>>>                 Nicola
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>                 -- 
>>>>>>>>>>                 Cuis-dev mailing list
>>>>>>>>>>                 Cuis-dev at lists.cuis.st
>>>>>>>>>>                 https://lists.cuis.st/mailman/listinfo/cuis-dev
>>>>>>>>>>
>>>>>>>>>>
>>>>>
>>>>> -- 
>>>>> Juan Vuletich
>>>>> www.cuis-smalltalk.org
>>>>> https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev
>>>>> https://github.com/jvuletich
>>>>> https://www.linkedin.com/in/juan-vuletich-75611b3
>>>>> @JuanVuletich
>>>>
>>>
>>
>>
>> -- 
>> Juan Vuletich
>> www.cuis-smalltalk.org
>> https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev
>> https://github.com/jvuletich
>> https://www.linkedin.com/in/juan-vuletich-75611b3
>> @JuanVuletich
>


-- 
Juan Vuletich
www.cuis-smalltalk.org
https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev
https://github.com/jvuletich
https://www.linkedin.com/in/juan-vuletich-75611b3
@JuanVuletich

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: UnsavedChangesTo-BaseImageTests-jmv.002.cs.st
URL: <http://lists.cuis.st/mailman/archives/cuis-dev/attachments/20211025/3895204a/attachment-0002.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 4941-upTodelimiterIsTerminator-NicolaMingotti-JuanVuletich-2021Oct25-09h28m-jmv.001.cs.st
URL: <http://lists.cuis.st/mailman/archives/cuis-dev/attachments/20211025/3895204a/attachment-0003.ksh>

