[Cuis-dev] fileout. proposed 2 new methods for strict file chunks reading
Juan Vuletich
juan at jvuletich.org
Mon Oct 25 07:39:15 PDT 2021
Hi Nicola, Hernán,
This is my take. I tried to be explicit and clear with 'terminator' vs.
'separator', and also added the other two implementors of #upTo: in the
Stream hierarchy. Slightly tweaked your tests, and used them almost
verbatim for the other two implementors.
Please review.
Thanks,
On 10/25/2021 7:35 AM, Nicola Mingotti via Cuis-dev wrote:
>
> Hi Juan,
>
> 1. I corrected the bug you found, added other test cases and made them
> symmetric
> between 'upTo' and 'upToStrict'. There are 2 files attached, one for
> tests one to collect changes to System-Files.
>
> 2. about names, 'terminator', 'separator', i see your point. I am open
> to any
> naming scheme. The motivation that pushes me to ask this enhancement
> of 'upTo' is totally based on log parsing. So, It wouldn't be
> inappropriate also to name the
> boolean parameter something like "logReaderMode". It would be long,
> but easy to detect
> for people involved in this kind of business. I don't dislike also
> "strict" to be honest.
>
>
> bye
> Nicola
>
>
>
> On 10/24/21 18:14, Juan Vuletich wrote:
>> Hi Nicola,
>>
>> On 10/23/2021 6:56 PM, Nicola Mingotti via Cuis-dev wrote:
>>>
>>> Hi Juan,
>>>
>>> At the best of my current undestanding I can provide this:
>>>
>>> 1. Fileout for tests in BaseImageTests
>>
>> Much better!
>>
>>> 2. A few fileout for new methods and method names
>>
>> I still think the focus should be on terminator vs. separator,
>> especially on method and argument names. See
>> https://en.wikipedia.org/wiki/Newline :
>>
>> "Interpretation
>> Two ways to view newlines, both of which are self-consistent, are
>> that newlines either separate lines or that they terminate lines. If
>> a newline is considered a separator, there will be no newline after
>> the last line of a file. Some programs have problems processing the
>> last line of a file if it is not terminated by a newline. On the
>> other hand, programs that expect newline to be used as a separator
>> will interpret a final newline as starting a new (empty) line.
>> Conversely, if a newline is considered a terminator, all text lines
>> including the last are expected to be terminated by a newline. If the
>> final character sequence in a text file is not a newline, the final
>> line of the file may be considered to be an improper or incomplete
>> text line, or the file may be considered to be improperly truncated. "
>>
>>> 3. I could not replicate the bug you say, I did not understand well
>>> maybe. if you could send me
>>> a fail example it would be helpful.
>>>
>>
>> Sure. The following test fails even after fixing the obvious bug:
>>
>> testUpToStrict3
>> | path fs read |
>> path := 'test-{1}.txt' format: {(Float pi * 10e10) floor. } .
>> path asFileEntry fileContents: ((1 to: 100) inject: '' into: [
>> :prev :each | prev, 'A lot of stuff, needs over 2000 chars! ']).
>> fs := path asFileEntry readStream.
>> read := fs upTo: $X strict: true.
>> self assert: (read = nil).
>> fs close.
>>
>>>
>>> bye
>>> Nicola
>>>
>>
>> Cheers,
>>
>>>
>>> On 10/23/21 02:26, Nicola Mingotti wrote:
>>>>
>>>> Hi Juan, let me a bit of time to read your references, I thought
>>>> what I sent were test methods,
>>>> clearly i miss part of the story.
>>>>
>>>> There shouldn't be any concatenation of nil and for God sake NO
>>>> partial records.
>>>> This is what I wanted to avoid, apologies.
>>>>
>>>> Tomorrow i will probably be out for the Linux day, i will update
>>>> when possible.
>>>>
>>>>
>>>> bye
>>>> Nicola
>>>>
>>>>
>>>>
>>>>
>>>> On 10/23/21 01:20, Juan Vuletich wrote:
>>>>> Hi Folks,
>>>>>
>>>>> The main point here is not "strict vs. legacy", "logically correct
>>>>> vs incorrect" or anything like that at all.
>>>>>
>>>>> The point is "separator vs. terminator", and how using a
>>>>> terminator instead of a separator allows processing files while
>>>>> they are still being written to. (And this has really no relation
>>>>> with running on a server or any other kind of machine.)
>>>>>
>>>>> Besides, Nicola, your code has a bug when recurring on terminator:
>>>>> it will answer the previous partial last record concatenated with nil.
>>>>>
>>>>> Finally, please take a look at TestCase,SUnit and
>>>>> BaseImageTests.pck.st to see what we mean by a "test".
>>>>>
>>>>> Thanks,
>>>>>
>>>>> On 10/22/2021 12:18 PM, Nicola Mingotti via Cuis-dev wrote:
>>>>>>
>>>>>> Hi Hernan,
>>>>>>
>>>>>> We will have opportunity to work together on larger problems,
>>>>>> this is too small.
>>>>>> It would take more time to talk than to do things ;)
>>>>>>
>>>>>> I have a proposed version. I rewrote the methods. wrote the test.
>>>>>> I kept a good part
>>>>>> of the original code which may have evolved for efficiency over time.
>>>>>>
>>>>>> upToLegacy method can of course be eliminated. it is there only
>>>>>> for reference.
>>>>>>
>>>>>> upTo: XXX --- now calls ---> upTo: XXX strict: false
>>>>>>
>>>>>> upTo: XXX strict: XXX ------ is recursive, it needs an extra
>>>>>> helper method to remember a parameter (Scheme recursion style)
>>>>>> ----> upTo: XXX strict: XXX posMemo: xxxx
>>>>>>
>>>>>> See attached fileout
>>>>>>
>>>>>>
>>>>>> bye
>>>>>> Nicola
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 10/21/21 19:49, Hernan Wilkinson wrote:
>>>>>>> ok, let me know. I wish we could do it together but my agenda
>>>>>>> (and I guess yours) is almost always full...
>>>>>>>
>>>>>>> On Thu, Oct 21, 2021 at 2:32 PM Nicola Mingotti
>>>>>>> <nmingotti at gmail.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi Hernan,
>>>>>>>
>>>>>>> ok, let me try, it is too many days i am talking about it.
>>>>>>>
>>>>>>> I will let you know soon
>>>>>>>
>>>>>>> bye
>>>>>>> Nicola
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 10/21/21 19:02, Hernan Wilkinson wrote:
>>>>>>>> Hi Nicolas,
>>>>>>>> if you could refactor upTo: to use the same code as
>>>>>>>> strictUpTo: and write the tests to check that everything
>>>>>>>> works as expected, that would be great!
>>>>>>>> I would not use the names of the Linux stdlib for those
>>>>>>>> messages nor the C functions, it is not necessary...
>>>>>>>> If you do not have the time to do it, I can give it a try
>>>>>>>> if you wish.
>>>>>>>>
>>>>>>>> Cheers!
>>>>>>>> Hernan.
>>>>>>>>
>>>>>>>> On Thu, Oct 21, 2021 at 12:47 PM Nicola Mingotti
>>>>>>>> <nmingotti at gmail.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi Hernan,
>>>>>>>>
>>>>>>>> . forget the code and test. I can rewrite it from
>>>>>>>> scratch with test. I actually changed
>>>>>>>> existing code for "politeness" ;)
>>>>>>>>
>>>>>>>> . for me it is very important to have this matter
>>>>>>>> fixed, well and for the future.
>>>>>>>> It is not good to have standard lib functionality
>>>>>>>> disseminated in my application packages.
>>>>>>>>
>>>>>>>> . since I found Linux stdlib has a function to do well
>>>>>>>> what i want i will use that name(s)
>>>>>>>> to avoid confusion and recycle already existing
>>>>>>>> function names. "getline" and "getdelim".
>>>>>>>>
>>>>>>>> . if you really dislike this functions I can put them
>>>>>>>> in OSProcess and maybe
>>>>>>>> just link the C version only for Linux/BSD. So much I
>>>>>>>> think they are valuable in the server environment.
>>>>>>>>
>>>>>>>> . to fix this i need maybe 1-2 days. If i need to link
>>>>>>>> the C functions I don't know, since I never tried.
>>>>>>>>
>>>>>>>> So, let me know, if you are not against these functions
>>>>>>>> I am open to implement them well.
>>>>>>>>
>>>>>>>>
>>>>>>>> ===== Extra considerations whose reading is secondary
>>>>>>>> ==================
>>>>>>>>
>>>>>>>> . your fix was one step in the right direction but not
>>>>>>>> enough, you also need to
>>>>>>>> bring back the stream pointer to the last existant $A.
>>>>>>>> This is to say: too complex.
>>>>>>>> A good method must do all its chore, not leave us back
>>>>>>>> the dirty business and special conditions.
>>>>>>>>
>>>>>>>> . I understand the concision, small core etc. On the
>>>>>>>> other side, i
>>>>>>>> run Cuis on the servers. the most important thing
>>>>>>>> there is on servers are files and
>>>>>>>> sockets. You must read from there all of the time. It
>>>>>>>> must be easy and idiot proof,
>>>>>>>> rock solid and resistant to concurrent processing as
>>>>>>>> far as possible.
>>>>>>>>
>>>>>>>> . I see that Python and Ruby standard library do it
>>>>>>>> wrong, at bit better than Cuis 'upTo' does.
>>>>>>>> but still bad. They leave you the '\n' at the end, but,
>>>>>>>> if any process goes on writing
>>>>>>>> 'f1.txt' Ruby and Python lost the half backed record !
>>>>>>>> -------- Linux
>>>>>>>> $> printf 'line-1\nline-2\nline-TRAP' > f1.txt
>>>>>>>> # python
>>>>>>>> $> python3.9 -c "f=open('f1.txt','r');
>>>>>>>> print(f.readlines())"
>>>>>>>> => ['line-1\n', 'line-2\n', 'line-TRAP']
>>>>>>>> # ruby
>>>>>>>> $> ruby -e "f=open('f1.txt','r'); puts
>>>>>>>> f.readlines().to_s; "
>>>>>>>> => ["line-1\n", "line-2\n", "line-TRAP"]
>>>>>>>> # both Python and Ruby ate the half backed record ! bad !
>>>>>>>> ---------------------------------------------------------
>>>>>>>>
>>>>>>>> . C and CommonLisp standard libraries have a way to do
>>>>>>>> it right:
>>>>>>>> -) CL read-line.
>>>>>>>> http://www.lispworks.com/documentation/HyperSpec/Body/f_rd_lin.htm#read-line
>>>>>>>>
>>>>>>>> -) C getline.
>>>>>>>> https://man7.org/linux/man-pages/man3/getline.3.html
>>>>>>>>
>>>>>>>> . I understand I am probably the only one running Cuis
>>>>>>>> in the server so I am the first
>>>>>>>> to step into a few particular problems.
>>>>>>>>
>>>>>>>> . In my opinion Cuis in the Server can be a good match,
>>>>>>>> up to now i have 2 small
>>>>>>>> company services working and a big one project in
>>>>>>>> continuous development.
>>>>>>>> Time will tell. Sturdiness, undertandability and ease
>>>>>>>> of modification were my top priority.
>>>>>>>> Up to now things are at least working.
>>>>>>>>
>>>>>>>> ======================================================
>>>>>>>>
>>>>>>>> bye
>>>>>>>> Nicola
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 10/21/21 14:53, Hernan Wilkinson wrote:
>>>>>>>>> Hi Nicola,
>>>>>>>>> I see your point regarding the functionality of
>>>>>>>>> upTo:, but you can easily overcome that using
>>>>>>>>> #peekBack. Using you example:
>>>>>>>>> -----
>>>>>>>>> s _ 'hello-1Ahello-2Ahel'.
>>>>>>>>> '/tmp/test.txt' asFileEntry fileContents: s.
>>>>>>>>>
>>>>>>>>> st1 _ '/tmp/test.txt' asFileEntry readStream .
>>>>>>>>>
>>>>>>>>> st1 upTo: $A. " 'hello-1' "
>>>>>>>>> st1 upTo: $A. " 'hello-2' "
>>>>>>>>> st1 upTo: $A. " 'hel' "
>>>>>>>>> (st1 atEnd and: [ st1 peekBack ~= $A ]) ifTrue: [ self
>>>>>>>>> error: 'End of file without delimiter ].
>>>>>>>>> ------
>>>>>>>>> Regarding my concern of adding this functionality to
>>>>>>>>> Cuis, we are trying to have a compact set of classes
>>>>>>>>> and methods to reduce complexity (or at least not
>>>>>>>>> increase it) and help newcomers to understand it and
>>>>>>>>> oldies to remember it :-) . We are also trying to add
>>>>>>>>> more and more tests because it is the only way to keep
>>>>>>>>> a system from becoming a legacy one and to reduce the
>>>>>>>>> fear it produces to change something.
>>>>>>>>> The strictUpTo:startPos: you are sending is almost a
>>>>>>>>> copy of the upTo: method, with a few lines changed.
>>>>>>>>> Even though the functionality makes sense (although
>>>>>>>>> right now you are the only one needing it and as I
>>>>>>>>> said, you can use peekBack to overcome it),
>>>>>>>>> adding that method adds repeated code which in the
>>>>>>>>> long term makes it more difficult to understand and
>>>>>>>>> maintain, even more because it does not have tests.
>>>>>>>>> So I hope you understand that as maintainers of Cuis,
>>>>>>>>> we want to be loyal to the goals I mentioned before
>>>>>>>>> and keep Cuis as clean and simple as possible. If you
>>>>>>>>> can refactor what you sent to avoid having repeated
>>>>>>>>> code with #upTo: and add tests that verify the
>>>>>>>>> functionality of both methods (strictUpTo: and upTo:),
>>>>>>>>> that will make our task easier and meet the goals we
>>>>>>>>> have. If you think this does not make sense to you, or
>>>>>>>>> you do not have the time to do it, it is completely
>>>>>>>>> understandable and in that case I suggest for you to
>>>>>>>>> have it as an extension of the StandardFileStream
>>>>>>>>> class or just use the peekBack message as I showed.
>>>>>>>>> I hope you understand my concern and agree with me.
>>>>>>>>> If not, please let me know.
>>>>>>>>>
>>>>>>>>> Cheers!
>>>>>>>>> Hernan.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Oct 19, 2021 at 10:32 AM Nicola Mingotti
>>>>>>>>> <nmingotti at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Hernan,
>>>>>>>>>
>>>>>>>>> In all frankness, in I would wipe out the old
>>>>>>>>> 'upTo' because its behavior is a bit "wild".
>>>>>>>>>
>>>>>>>>> On the other side, I understand it may create
>>>>>>>>> problems in retro-compatibility, that is why for
>>>>>>>>> the moment i propose to add a new method which
>>>>>>>>> behaves a bit better.
>>>>>>>>>
>>>>>>>>> I hope this example explains the problem:
>>>>>>>>> -------------------------------------------------------
>>>>>>>>> s _ 'hello-1Ahello-2Ahel'.
>>>>>>>>> '/tmp/test.txt' asFileEntry fileContents: s.
>>>>>>>>>
>>>>>>>>> st1 _ '/tmp/test.txt' asFileEntry readStream .
>>>>>>>>>
>>>>>>>>> st1 upTo: $A. " 'hello-1' "
>>>>>>>>> st1 upTo: $A. " 'hello-2' "
>>>>>>>>> st1 upTo: $A. " 'hel' " "(*)"
>>>>>>>>> ------------------------------------------------------
>>>>>>>>> (*) You can't establish in any way if you actually
>>>>>>>>> found an "A" terminated block or just hit the end
>>>>>>>>> of file
>>>>>>>>> (*) If you hit the end of file you eat an
>>>>>>>>> incomplete record, this is another problem, maybe
>>>>>>>>> another process
>>>>>>>>> was going to end writing that record but you will
>>>>>>>>> never know.
>>>>>>>>>
>>>>>>>>> Maybe there is another method around that performs
>>>>>>>>> similarly to 'strictUpTp', if there is I did not
>>>>>>>>> find it, sorry.
>>>>>>>>>
>>>>>>>>> IMHO, In a scale of importance from 0 to 10, this
>>>>>>>>> method, for a programmer, >= 8.
>>>>>>>>> I would definitely not put it into an external
>>>>>>>>> package, too much fundamental.
>>>>>>>>>
>>>>>>>>> bye
>>>>>>>>> Nicola
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 10/19/21 14:44, Hernan Wilkinson wrote:
>>>>>>>>>> Hi Nicola!
>>>>>>>>>> I was wondering, why are you suggesting adding
>>>>>>>>>> them to the base? Is it not enough to implement
>>>>>>>>>> them as an extension in your package?
>>>>>>>>>> Also, I think that any new functionality should
>>>>>>>>>> come with its corresponding tests to help the
>>>>>>>>>> maintenance and understanding of the functionality.
>>>>>>>>>>
>>>>>>>>>> Cheers!
>>>>>>>>>> Hernan.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 19, 2021 at 7:04 AM Nicola Mingotti
>>>>>>>>>> via Cuis-dev <cuis-dev at lists.cuis.st> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Juan, guys,
>>>>>>>>>>
>>>>>>>>>> I would like to add to Cuis the 2 methods i
>>>>>>>>>> attach here. One is a helper method.
>>>>>>>>>>
>>>>>>>>>> -----------
>>>>>>>>>> StandardFileStream strictUpTo: delim.
>>>>>>>>>> -----------
>>>>>>>>>>
>>>>>>>>>> Differently from 'upTo: delim' this method:
>>>>>>>>>> 1. Does not return stuff if it does not find
>>>>>>>>>> 'delim'.
>>>>>>>>>> 2. Does not upgrade the position on the
>>>>>>>>>> stream if does not find 'delim'.
>>>>>>>>>> 3. If it finds 'delim' returns a chunk that
>>>>>>>>>> includes it.
>>>>>>>>>>
>>>>>>>>>> I am parsing log files at the moment, this is
>>>>>>>>>> very much useful.
>>>>>>>>>>
>>>>>>>>>> NOTE. Up to now I tested only on small files.
>>>>>>>>>>
>>>>>>>>>> bye
>>>>>>>>>> Nicola
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Cuis-dev mailing list
>>>>>>>>>> Cuis-dev at lists.cuis.st
>>>>>>>>>> https://lists.cuis.st/mailman/listinfo/cuis-dev
>>>>>>>>>>
>>>>>>>>>>
>>>>>
>>>>> --
>>>>> Juan Vuletich
>>>>> www.cuis-smalltalk.org
>>>>> https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev
>>>>> https://github.com/jvuletich
>>>>> https://www.linkedin.com/in/juan-vuletich-75611b3
>>>>> @JuanVuletich
>>>>
>>>
>>
>>
>> --
>> Juan Vuletich
>> www.cuis-smalltalk.org
>> https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev
>> https://github.com/jvuletich
>> https://www.linkedin.com/in/juan-vuletich-75611b3
>> @JuanVuletich
>
--
Juan Vuletich
www.cuis-smalltalk.org
https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev
https://github.com/jvuletich
https://www.linkedin.com/in/juan-vuletich-75611b3
@JuanVuletich
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.cuis.st/mailman/archives/cuis-dev/attachments/20211025/3895204a/attachment-0001.htm>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: UnsavedChangesTo-BaseImageTests-jmv.002.cs.st
URL: <http://lists.cuis.st/mailman/archives/cuis-dev/attachments/20211025/3895204a/attachment-0002.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 4941-upTodelimiterIsTerminator-NicolaMingotti-JuanVuletich-2021Oct25-09h28m-jmv.001.cs.st
URL: <http://lists.cuis.st/mailman/archives/cuis-dev/attachments/20211025/3895204a/attachment-0003.ksh>
More information about the Cuis-dev
mailing list