[Cuis-dev] fileout. proposed 2 new methods for strict file chunks reading
Nicola Mingotti
nmingotti at gmail.com
Mon Oct 25 09:09:38 PDT 2021
Hi guys,
Juan, I saw your changes, i think i understood.
My only perplexity is about the parameter slot: 'delimiterIsTerminator',
i guess
if it has to be a question it should be 'isDelimiterTerminator' or
'isDelimiterATerminator' .
I am not native English so my opinion on this weights as a feather.
bye
Nicola
On 10/25/21 16:39, Juan Vuletich wrote:
> Hi Nicola, Hernán,
>
> This is my take. I tried to be explicit and clear with 'terminator'
> vs. 'separator', and also added the other two implementors of #upTo:
> in the Stream hierarchy. Slightly tweaked your tests, and used them
> almost verbatim for the other two implementors.
>
> Please review.
>
> Thanks,
>
> On 10/25/2021 7:35 AM, Nicola Mingotti via Cuis-dev wrote:
>>
>> Hi Juan,
>>
>> 1. I corrected the bug you found, added other test cases and made
>> them symmetric
>> between 'upTo' and 'upToStrict'. There are 2 files attached, one for
>> tests one to collect changes to System-Files.
>>
>> 2. about names, 'terminator', 'separator', i see your point. I am
>> open to any
>> naming scheme. The motivation that pushes me to ask this enhancement
>> of 'upTo' is totally based on log parsing. So, It wouldn't be
>> inappropriate also to name the
>> boolean parameter something like "logReaderMode". It would be long,
>> but easy to detect
>> for people involved in this kind of business. I don't dislike also
>> "strict" to be honest.
>>
>>
>> bye
>> Nicola
>>
>>
>>
>> On 10/24/21 18:14, Juan Vuletich wrote:
>>> Hi Nicola,
>>>
>>> On 10/23/2021 6:56 PM, Nicola Mingotti via Cuis-dev wrote:
>>>>
>>>> Hi Juan,
>>>>
>>>> At the best of my current undestanding I can provide this:
>>>>
>>>> 1. Fileout for tests in BaseImageTests
>>>
>>> Much better!
>>>
>>>> 2. A few fileout for new methods and method names
>>>
>>> I still think the focus should be on terminator vs. separator,
>>> especially on method and argument names. See
>>> https://en.wikipedia.org/wiki/Newline :
>>>
>>> "Interpretation
>>> Two ways to view newlines, both of which are self-consistent, are
>>> that newlines either separate lines or that they terminate lines. If
>>> a newline is considered a separator, there will be no newline after
>>> the last line of a file. Some programs have problems processing the
>>> last line of a file if it is not terminated by a newline. On the
>>> other hand, programs that expect newline to be used as a separator
>>> will interpret a final newline as starting a new (empty) line.
>>> Conversely, if a newline is considered a terminator, all text lines
>>> including the last are expected to be terminated by a newline. If
>>> the final character sequence in a text file is not a newline, the
>>> final line of the file may be considered to be an improper or
>>> incomplete text line, or the file may be considered to be improperly
>>> truncated. "
>>>
>>>> 3. I could not replicate the bug you say, I did not understand well
>>>> maybe. if you could send me
>>>> a fail example it would be helpful.
>>>>
>>>
>>> Sure. The following test fails even after fixing the obvious bug:
>>>
>>> testUpToStrict3
>>> | path fs read |
>>> path := 'test-{1}.txt' format: {(Float pi * 10e10) floor. } .
>>> path asFileEntry fileContents: ((1 to: 100) inject: '' into: [
>>> :prev :each | prev, 'A lot of stuff, needs over 2000 chars! ']).
>>> fs := path asFileEntry readStream.
>>> read := fs upTo: $X strict: true.
>>> self assert: (read = nil).
>>> fs close.
>>>
>>>>
>>>> bye
>>>> Nicola
>>>>
>>>
>>> Cheers,
>>>
>>>>
>>>> On 10/23/21 02:26, Nicola Mingotti wrote:
>>>>>
>>>>> Hi Juan, let me a bit of time to read your references, I thought
>>>>> what I sent were test methods,
>>>>> clearly i miss part of the story.
>>>>>
>>>>> There shouldn't be any concatenation of nil and for God sake NO
>>>>> partial records.
>>>>> This is what I wanted to avoid, apologies.
>>>>>
>>>>> Tomorrow i will probably be out for the Linux day, i will update
>>>>> when possible.
>>>>>
>>>>>
>>>>> bye
>>>>> Nicola
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 10/23/21 01:20, Juan Vuletich wrote:
>>>>>> Hi Folks,
>>>>>>
>>>>>> The main point here is not "strict vs. legacy", "logically
>>>>>> correct vs incorrect" or anything like that at all.
>>>>>>
>>>>>> The point is "separator vs. terminator", and how using a
>>>>>> terminator instead of a separator allows processing files while
>>>>>> they are still being written to. (And this has really no relation
>>>>>> with running on a server or any other kind of machine.)
>>>>>>
>>>>>> Besides, Nicola, your code has a bug when recurring on
>>>>>> terminator: it will answer the previous partial last record
>>>>>> concatenated with nil.
>>>>>>
>>>>>> Finally, please take a look at TestCase,SUnit and
>>>>>> BaseImageTests.pck.st to see what we mean by a "test".
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> On 10/22/2021 12:18 PM, Nicola Mingotti via Cuis-dev wrote:
>>>>>>>
>>>>>>> Hi Hernan,
>>>>>>>
>>>>>>> We will have opportunity to work together on larger problems,
>>>>>>> this is too small.
>>>>>>> It would take more time to talk than to do things ;)
>>>>>>>
>>>>>>> I have a proposed version. I rewrote the methods. wrote the
>>>>>>> test. I kept a good part
>>>>>>> of the original code which may have evolved for efficiency over
>>>>>>> time.
>>>>>>>
>>>>>>> upToLegacy method can of course be eliminated. it is there only
>>>>>>> for reference.
>>>>>>>
>>>>>>> upTo: XXX --- now calls ---> upTo: XXX strict: false
>>>>>>>
>>>>>>> upTo: XXX strict: XXX ------ is recursive, it needs an extra
>>>>>>> helper method to remember a parameter (Scheme recursion style)
>>>>>>> ----> upTo: XXX strict: XXX posMemo: xxxx
>>>>>>>
>>>>>>> See attached fileout
>>>>>>>
>>>>>>>
>>>>>>> bye
>>>>>>> Nicola
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 10/21/21 19:49, Hernan Wilkinson wrote:
>>>>>>>> ok, let me know. I wish we could do it together but my agenda
>>>>>>>> (and I guess yours) is almost always full...
>>>>>>>>
>>>>>>>> On Thu, Oct 21, 2021 at 2:32 PM Nicola Mingotti
>>>>>>>> <nmingotti at gmail.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi Hernan,
>>>>>>>>
>>>>>>>> ok, let me try, it is too many days i am talking about it.
>>>>>>>>
>>>>>>>> I will let you know soon
>>>>>>>>
>>>>>>>> bye
>>>>>>>> Nicola
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 10/21/21 19:02, Hernan Wilkinson wrote:
>>>>>>>>> Hi Nicolas,
>>>>>>>>> if you could refactor upTo: to use the same code as
>>>>>>>>> strictUpTo: and write the tests to check that everything
>>>>>>>>> works as expected, that would be great!
>>>>>>>>> I would not use the names of the Linux stdlib for those
>>>>>>>>> messages nor the C functions, it is not necessary...
>>>>>>>>> If you do not have the time to do it, I can give it a try
>>>>>>>>> if you wish.
>>>>>>>>>
>>>>>>>>> Cheers!
>>>>>>>>> Hernan.
>>>>>>>>>
>>>>>>>>> On Thu, Oct 21, 2021 at 12:47 PM Nicola Mingotti
>>>>>>>>> <nmingotti at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Hernan,
>>>>>>>>>
>>>>>>>>> . forget the code and test. I can rewrite it from
>>>>>>>>> scratch with test. I actually changed
>>>>>>>>> existing code for "politeness" ;)
>>>>>>>>>
>>>>>>>>> . for me it is very important to have this matter
>>>>>>>>> fixed, well and for the future.
>>>>>>>>> It is not good to have standard lib functionality
>>>>>>>>> disseminated in my application packages.
>>>>>>>>>
>>>>>>>>> . since I found Linux stdlib has a function to do well
>>>>>>>>> what i want i will use that name(s)
>>>>>>>>> to avoid confusion and recycle already existing
>>>>>>>>> function names. "getline" and "getdelim".
>>>>>>>>>
>>>>>>>>> . if you really dislike this functions I can put them
>>>>>>>>> in OSProcess and maybe
>>>>>>>>> just link the C version only for Linux/BSD. So much I
>>>>>>>>> think they are valuable in the server environment.
>>>>>>>>>
>>>>>>>>> . to fix this i need maybe 1-2 days. If i need to link
>>>>>>>>> the C functions I don't know, since I never tried.
>>>>>>>>>
>>>>>>>>> So, let me know, if you are not against these
>>>>>>>>> functions I am open to implement them well.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ===== Extra considerations whose reading is secondary
>>>>>>>>> ==================
>>>>>>>>>
>>>>>>>>> . your fix was one step in the right direction but not
>>>>>>>>> enough, you also need to
>>>>>>>>> bring back the stream pointer to the last existant $A.
>>>>>>>>> This is to say: too complex.
>>>>>>>>> A good method must do all its chore, not leave us back
>>>>>>>>> the dirty business and special conditions.
>>>>>>>>>
>>>>>>>>> . I understand the concision, small core etc. On the
>>>>>>>>> other side, i
>>>>>>>>> run Cuis on the servers. the most important thing
>>>>>>>>> there is on servers are files and
>>>>>>>>> sockets. You must read from there all of the time. It
>>>>>>>>> must be easy and idiot proof,
>>>>>>>>> rock solid and resistant to concurrent processing as
>>>>>>>>> far as possible.
>>>>>>>>>
>>>>>>>>> . I see that Python and Ruby standard library do it
>>>>>>>>> wrong, at bit better than Cuis 'upTo' does.
>>>>>>>>> but still bad. They leave you the '\n' at the end,
>>>>>>>>> but, if any process goes on writing
>>>>>>>>> 'f1.txt' Ruby and Python lost the half backed record !
>>>>>>>>> -------- Linux
>>>>>>>>> $> printf 'line-1\nline-2\nline-TRAP' > f1.txt
>>>>>>>>> # python
>>>>>>>>> $> python3.9 -c "f=open('f1.txt','r');
>>>>>>>>> print(f.readlines())"
>>>>>>>>> => ['line-1\n', 'line-2\n', 'line-TRAP']
>>>>>>>>> # ruby
>>>>>>>>> $> ruby -e "f=open('f1.txt','r'); puts
>>>>>>>>> f.readlines().to_s; "
>>>>>>>>> => ["line-1\n", "line-2\n", "line-TRAP"]
>>>>>>>>> # both Python and Ruby ate the half backed record ! bad !
>>>>>>>>> ---------------------------------------------------------
>>>>>>>>>
>>>>>>>>> . C and CommonLisp standard libraries have a way to do
>>>>>>>>> it right:
>>>>>>>>> -) CL read-line.
>>>>>>>>> http://www.lispworks.com/documentation/HyperSpec/Body/f_rd_lin.htm#read-line
>>>>>>>>>
>>>>>>>>> -) C getline.
>>>>>>>>> https://man7.org/linux/man-pages/man3/getline.3.html
>>>>>>>>>
>>>>>>>>> . I understand I am probably the only one running Cuis
>>>>>>>>> in the server so I am the first
>>>>>>>>> to step into a few particular problems.
>>>>>>>>>
>>>>>>>>> . In my opinion Cuis in the Server can be a good
>>>>>>>>> match, up to now i have 2 small
>>>>>>>>> company services working and a big one project in
>>>>>>>>> continuous development.
>>>>>>>>> Time will tell. Sturdiness, undertandability and ease
>>>>>>>>> of modification were my top priority.
>>>>>>>>> Up to now things are at least working.
>>>>>>>>>
>>>>>>>>> ======================================================
>>>>>>>>>
>>>>>>>>> bye
>>>>>>>>> Nicola
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 10/21/21 14:53, Hernan Wilkinson wrote:
>>>>>>>>>> Hi Nicola,
>>>>>>>>>> I see your point regarding the functionality of
>>>>>>>>>> upTo:, but you can easily overcome that using
>>>>>>>>>> #peekBack. Using you example:
>>>>>>>>>> -----
>>>>>>>>>> s _ 'hello-1Ahello-2Ahel'.
>>>>>>>>>> '/tmp/test.txt' asFileEntry fileContents: s.
>>>>>>>>>>
>>>>>>>>>> st1 _ '/tmp/test.txt' asFileEntry readStream .
>>>>>>>>>>
>>>>>>>>>> st1 upTo: $A. " 'hello-1' "
>>>>>>>>>> st1 upTo: $A. " 'hello-2' "
>>>>>>>>>> st1 upTo: $A. " 'hel' "
>>>>>>>>>> (st1 atEnd and: [ st1 peekBack ~= $A ]) ifTrue: [
>>>>>>>>>> self error: 'End of file without delimiter ].
>>>>>>>>>> ------
>>>>>>>>>> Regarding my concern of adding this functionality to
>>>>>>>>>> Cuis, we are trying to have a compact set of classes
>>>>>>>>>> and methods to reduce complexity (or at least not
>>>>>>>>>> increase it) and help newcomers to understand it and
>>>>>>>>>> oldies to remember it :-) . We are also trying to add
>>>>>>>>>> more and more tests because it is the only way to
>>>>>>>>>> keep a system from becoming a legacy one and to
>>>>>>>>>> reduce the fear it produces to change something.
>>>>>>>>>> The strictUpTo:startPos: you are sending is almost a
>>>>>>>>>> copy of the upTo: method, with a few lines changed.
>>>>>>>>>> Even though the functionality makes sense (although
>>>>>>>>>> right now you are the only one needing it and as I
>>>>>>>>>> said, you can use peekBack to overcome it),
>>>>>>>>>> adding that method adds repeated code which in the
>>>>>>>>>> long term makes it more difficult to understand and
>>>>>>>>>> maintain, even more because it does not have tests.
>>>>>>>>>> So I hope you understand that as maintainers of
>>>>>>>>>> Cuis, we want to be loyal to the goals I mentioned
>>>>>>>>>> before and keep Cuis as clean and simple as possible.
>>>>>>>>>> If you can refactor what you sent to avoid having
>>>>>>>>>> repeated code with #upTo: and add tests that verify
>>>>>>>>>> the functionality of both methods (strictUpTo: and
>>>>>>>>>> upTo:), that will make our task easier and meet the
>>>>>>>>>> goals we have. If you think this does not make sense
>>>>>>>>>> to you, or you do not have the time to do it, it is
>>>>>>>>>> completely understandable and in that case I suggest
>>>>>>>>>> for you to have it as an extension of the
>>>>>>>>>> StandardFileStream class or just use the peekBack
>>>>>>>>>> message as I showed.
>>>>>>>>>> I hope you understand my concern and agree with me.
>>>>>>>>>> If not, please let me know.
>>>>>>>>>>
>>>>>>>>>> Cheers!
>>>>>>>>>> Hernan.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 19, 2021 at 10:32 AM Nicola Mingotti
>>>>>>>>>> <nmingotti at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi Hernan,
>>>>>>>>>>
>>>>>>>>>> In all frankness, in I would wipe out the old
>>>>>>>>>> 'upTo' because its behavior is a bit "wild".
>>>>>>>>>>
>>>>>>>>>> On the other side, I understand it may create
>>>>>>>>>> problems in retro-compatibility, that is why for
>>>>>>>>>> the moment i propose to add a new method which
>>>>>>>>>> behaves a bit better.
>>>>>>>>>>
>>>>>>>>>> I hope this example explains the problem:
>>>>>>>>>> -------------------------------------------------------
>>>>>>>>>> s _ 'hello-1Ahello-2Ahel'.
>>>>>>>>>> '/tmp/test.txt' asFileEntry fileContents: s.
>>>>>>>>>>
>>>>>>>>>> st1 _ '/tmp/test.txt' asFileEntry readStream .
>>>>>>>>>>
>>>>>>>>>> st1 upTo: $A. " 'hello-1' "
>>>>>>>>>> st1 upTo: $A. " 'hello-2' "
>>>>>>>>>> st1 upTo: $A. " 'hel' " "(*)"
>>>>>>>>>> ------------------------------------------------------
>>>>>>>>>> (*) You can't establish in any way if you
>>>>>>>>>> actually found an "A" terminated block or just
>>>>>>>>>> hit the end of file
>>>>>>>>>> (*) If you hit the end of file you eat an
>>>>>>>>>> incomplete record, this is another problem, maybe
>>>>>>>>>> another process
>>>>>>>>>> was going to end writing that record but you will
>>>>>>>>>> never know.
>>>>>>>>>>
>>>>>>>>>> Maybe there is another method around that
>>>>>>>>>> performs similarly to 'strictUpTp', if there is I
>>>>>>>>>> did not find it, sorry.
>>>>>>>>>>
>>>>>>>>>> IMHO, In a scale of importance from 0 to 10, this
>>>>>>>>>> method, for a programmer, >= 8.
>>>>>>>>>> I would definitely not put it into an external
>>>>>>>>>> package, too much fundamental.
>>>>>>>>>>
>>>>>>>>>> bye
>>>>>>>>>> Nicola
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 10/19/21 14:44, Hernan Wilkinson wrote:
>>>>>>>>>>> Hi Nicola!
>>>>>>>>>>> I was wondering, why are you suggesting adding
>>>>>>>>>>> them to the base? Is it not enough to implement
>>>>>>>>>>> them as an extension in your package?
>>>>>>>>>>> Also, I think that any new functionality should
>>>>>>>>>>> come with its corresponding tests to help the
>>>>>>>>>>> maintenance and understanding of the functionality.
>>>>>>>>>>>
>>>>>>>>>>> Cheers!
>>>>>>>>>>> Hernan.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 19, 2021 at 7:04 AM Nicola Mingotti
>>>>>>>>>>> via Cuis-dev <cuis-dev at lists.cuis.st> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Juan, guys,
>>>>>>>>>>>
>>>>>>>>>>> I would like to add to Cuis the 2 methods i
>>>>>>>>>>> attach here. One is a helper method.
>>>>>>>>>>>
>>>>>>>>>>> -----------
>>>>>>>>>>> StandardFileStream strictUpTo: delim.
>>>>>>>>>>> -----------
>>>>>>>>>>>
>>>>>>>>>>> Differently from 'upTo: delim' this method:
>>>>>>>>>>> 1. Does not return stuff if it does not find
>>>>>>>>>>> 'delim'.
>>>>>>>>>>> 2. Does not upgrade the position on the
>>>>>>>>>>> stream if does not find 'delim'.
>>>>>>>>>>> 3. If it finds 'delim' returns a chunk that
>>>>>>>>>>> includes it.
>>>>>>>>>>>
>>>>>>>>>>> I am parsing log files at the moment, this
>>>>>>>>>>> is very much useful.
>>>>>>>>>>>
>>>>>>>>>>> NOTE. Up to now I tested only on small files.
>>>>>>>>>>>
>>>>>>>>>>> bye
>>>>>>>>>>> Nicola
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Cuis-dev mailing list
>>>>>>>>>>> Cuis-dev at lists.cuis.st
>>>>>>>>>>> https://lists.cuis.st/mailman/listinfo/cuis-dev
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Juan Vuletich
>>>>>> www.cuis-smalltalk.org
>>>>>> https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev
>>>>>> https://github.com/jvuletich
>>>>>> https://www.linkedin.com/in/juan-vuletich-75611b3
>>>>>> @JuanVuletich
>>>>>
>>>>
>>>
>>>
>>> --
>>> Juan Vuletich
>>> www.cuis-smalltalk.org
>>> https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev
>>> https://github.com/jvuletich
>>> https://www.linkedin.com/in/juan-vuletich-75611b3
>>> @JuanVuletich
>>
>
>
> --
> Juan Vuletich
> www.cuis-smalltalk.org
> https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev
> https://github.com/jvuletich
> https://www.linkedin.com/in/juan-vuletich-75611b3
> @JuanVuletich
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.cuis.st/mailman/archives/cuis-dev/attachments/20211025/3c565f98/attachment-0001.htm>
More information about the Cuis-dev
mailing list