[Cuis-dev] fileout. proposed 2 new methods for strict file chunks reading

Hernan Wilkinson hernan.wilkinson at 10pines.com
Tue Oct 26 05:53:45 PDT 2021


Hi all!
 great work guys!! Nice tests too!!
 Can I suggest changing the name of the tests to be more declarative? I
think that will help a lot.

 Nicola, about 'delimiterIsTerminator', that is a statement, not a question
so Juan's selection on that keyword is correct from my perspective.

Cheers!
Hernan

On Mon, Oct 25, 2021 at 1:09 PM Nicola Mingotti <nmingotti at gmail.com> wrote:

>
> Hi guys,
>
> Juan, I saw your changes, i think i understood.
>
> My only perplexity is about the parameter slot: 'delimiterIsTerminator', i
> guess
> if it has to be a question it should be 'isDelimiterTerminator' or
> 'isDelimiterATerminator' .
>
> I am not native English so my opinion on this weights as a feather.
>
> bye
> Nicola
>
>
>
>
> On 10/25/21 16:39, Juan Vuletich wrote:
>
> Hi Nicola, Hernán,
>
> This is my take. I tried to be explicit and clear with 'terminator' vs.
> 'separator', and also added the other two implementors of #upTo: in the
> Stream hierarchy. Slightly tweaked your tests, and used them almost
> verbatim for the other two implementors.
>
> Please review.
>
> Thanks,
>
> On 10/25/2021 7:35 AM, Nicola Mingotti via Cuis-dev wrote:
>
>
> Hi Juan,
>
> 1. I corrected the bug you found, added other test cases and made them
> symmetric
> between 'upTo' and 'upToStrict'. There are 2 files attached, one for tests
> one to collect changes to System-Files.
>
> 2. about names, 'terminator', 'separator', i see your point. I am open to
> any
> naming scheme. The motivation that pushes me to ask this enhancement
> of 'upTo' is totally based on log parsing. So, It wouldn't be
> inappropriate also to name the
> boolean parameter something like "logReaderMode". It would be long, but
> easy to detect
> for people involved in this kind of business. I don't dislike also
> "strict" to be honest.
>
>
> bye
> Nicola
>
>
>
> On 10/24/21 18:14, Juan Vuletich wrote:
>
> Hi Nicola,
>
> On 10/23/2021 6:56 PM, Nicola Mingotti via Cuis-dev wrote:
>
>
> Hi Juan,
>
> At the best of my current undestanding I can provide this:
>
> 1. Fileout for tests in BaseImageTests
>
>
> Much better!
>
> 2. A few fileout for new methods and method names
>
>
> I still think the focus should be on terminator vs. separator, especially
> on method and argument names. See https://en.wikipedia.org/wiki/Newline :
>
> "Interpretation
> Two ways to view newlines, both of which are self-consistent, are that
> newlines either separate lines or that they terminate lines. If a newline
> is considered a separator, there will be no newline after the last line of
> a file. Some programs have problems processing the last line of a file if
> it is not terminated by a newline. On the other hand, programs that expect
> newline to be used as a separator will interpret a final newline as
> starting a new (empty) line. Conversely, if a newline is considered a
> terminator, all text lines including the last are expected to be terminated
> by a newline. If the final character sequence in a text file is not a
> newline, the final line of the file may be considered to be an improper or
> incomplete text line, or the file may be considered to be improperly
> truncated. "
>
> 3. I could not replicate the bug you say, I did not understand well maybe.
> if you could send me
> a fail example it would be helpful.
>
>
> Sure. The following test fails even after fixing the obvious bug:
>
> testUpToStrict3
>     | path fs read |
>     path := 'test-{1}.txt' format: {(Float pi * 10e10) floor. } .
>     path asFileEntry fileContents: ((1 to: 100) inject: '' into: [ :prev
> :each | prev, 'A lot of stuff, needs over 2000 chars! ']).
>     fs := path asFileEntry readStream.
>     read := fs upTo: $X strict: true.
>     self assert: (read =  nil).
>     fs close.
>
>
> bye
> Nicola
>
>
> Cheers,
>
>
> On 10/23/21 02:26, Nicola Mingotti wrote:
>
>
> Hi Juan, let me a bit of time to read your references, I thought what I
> sent were test methods,
> clearly i miss part of the story.
>
> There shouldn't be any concatenation of nil and for God sake NO partial
> records.
> This is what I wanted to avoid, apologies.
>
> Tomorrow i will probably be out for the Linux day, i will update when
> possible.
>
>
> bye
> Nicola
>
>
>
>
> On 10/23/21 01:20, Juan Vuletich wrote:
>
> Hi Folks,
>
> The main point here is not "strict vs. legacy", "logically correct vs
> incorrect" or anything like that at all.
>
> The point is "separator vs. terminator", and how using a terminator
> instead of a separator allows processing files while they are still being
> written to. (And this has really no relation with running on a server or
> any other kind of machine.)
>
> Besides, Nicola, your code has a bug when recurring on terminator: it will
> answer the previous partial last record concatenated with nil.
>
> Finally, please take a look at TestCase,SUnit and BaseImageTests.pck.st
> to see what we mean by a "test".
>
> Thanks,
>
> On 10/22/2021 12:18 PM, Nicola Mingotti via Cuis-dev wrote:
>
>
> Hi Hernan,
>
> We will have opportunity to work together on larger problems, this is too
> small.
> It would take more time to talk than to do things ;)
>
> I have a proposed version. I rewrote the methods. wrote the test. I kept a
> good part
> of the original code which may have evolved for efficiency over time.
>
> upToLegacy method can of course be eliminated. it is there only for
> reference.
>
> upTo: XXX --- now calls --->  upTo: XXX strict: false
>
> upTo: XXX strict: XXX ------ is recursive, it needs an extra helper method
> to remember a parameter (Scheme recursion style)  ----> upTo: XXX strict:
> XXX posMemo: xxxx
>
> See attached fileout
>
>
> bye
> Nicola
>
>
>
>
>
> On 10/21/21 19:49, Hernan Wilkinson wrote:
>
> ok, let me know. I wish we could do it together but my agenda (and I guess
> yours) is almost always full...
>
> On Thu, Oct 21, 2021 at 2:32 PM Nicola Mingotti <nmingotti at gmail.com>
> wrote:
>
>>
>> Hi Hernan,
>>
>> ok, let me try, it is too many days i am talking about it.
>>
>> I will let you know soon
>>
>> bye
>> Nicola
>>
>>
>>
>>
>> On 10/21/21 19:02, Hernan Wilkinson wrote:
>>
>> Hi Nicolas,
>>  if you could refactor upTo: to use the same code as strictUpTo: and
>> write the tests to check that everything works as expected, that would be
>> great!
>>  I would not use the names of the Linux stdlib for those messages nor the
>> C functions, it is not necessary...
>>  If you do not have the time to do it, I can give it a try if you wish.
>>
>> Cheers!
>> Hernan.
>>
>>
>> On Thu, Oct 21, 2021 at 12:47 PM Nicola Mingotti <nmingotti at gmail.com>
>> wrote:
>>
>>>
>>> Hi Hernan,
>>>
>>> . forget the code and test. I can rewrite it from scratch with test. I
>>> actually changed
>>> existing code for "politeness" ;)
>>>
>>> . for me it is very important to have this matter fixed, well and for
>>> the future.
>>> It is not good to have standard lib functionality disseminated in my
>>> application packages.
>>>
>>> . since I found Linux stdlib has a function to do well what i want i
>>> will use that name(s)
>>> to avoid confusion and recycle already existing function names.
>>> "getline" and "getdelim".
>>>
>>> . if you really dislike this functions I can put them in OSProcess and
>>> maybe
>>> just link the C version only for Linux/BSD. So much I think they are
>>> valuable in the server environment.
>>>
>>> . to fix this i need maybe 1-2 days. If i need to link the C functions I
>>> don't know, since I never tried.
>>>
>>> So, let me know, if you are not against these functions I am open to
>>> implement them well.
>>>
>>>
>>> ===== Extra considerations whose reading is secondary ==================
>>>
>>> . your fix was one step in the right direction but not enough, you also
>>> need to
>>> bring back the stream pointer to the last existant $A. This is to say:
>>> too complex.
>>> A good method must do all its chore, not leave us back the dirty
>>> business and special conditions.
>>>
>>> . I understand the concision, small core etc. On the other side, i
>>> run Cuis on the servers.  the most important thing there is on servers
>>> are files and
>>> sockets. You must read from there all of the time. It must be easy and
>>> idiot proof,
>>> rock solid and resistant to concurrent processing as far as possible.
>>>
>>> . I see that Python and Ruby standard library do it wrong, at bit better
>>> than Cuis 'upTo' does.
>>> but still bad. They leave you the '\n' at the end, but, if any process
>>> goes on writing
>>> 'f1.txt' Ruby and Python lost the half backed record !
>>> -------- Linux
>>> $> printf 'line-1\nline-2\nline-TRAP' > f1.txt
>>> # python
>>> $> python3.9 -c "f=open('f1.txt','r'); print(f.readlines())"
>>> => ['line-1\n', 'line-2\n', 'line-TRAP']
>>> # ruby
>>> $> ruby -e "f=open('f1.txt','r'); puts f.readlines().to_s;  "
>>> => ["line-1\n", "line-2\n", "line-TRAP"]
>>> # both Python and Ruby ate the half backed record ! bad !
>>> ---------------------------------------------------------
>>>
>>> . C and CommonLisp standard libraries have a way to do it right:
>>> -) CL read-line.
>>> http://www.lispworks.com/documentation/HyperSpec/Body/f_rd_lin.htm#read-line
>>> -) C getline. https://man7.org/linux/man-pages/man3/getline.3.html
>>>
>>> . I understand I am probably the only one running Cuis in the server so
>>> I am the first
>>> to step into a few particular problems.
>>>
>>> . In my opinion Cuis in the Server can be a good match, up to now i have
>>> 2 small
>>> company services working and a big one project in continuous
>>> development.
>>> Time will tell. Sturdiness, undertandability and ease of modification
>>> were my top priority.
>>> Up to now things are at least working.
>>>
>>> ======================================================
>>>
>>> bye
>>> Nicola
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 10/21/21 14:53, Hernan Wilkinson wrote:
>>>
>>> Hi Nicola,
>>>  I see your point regarding the functionality of upTo:, but you can
>>> easily overcome that using #peekBack. Using you example:
>>> -----
>>> s _ 'hello-1Ahello-2Ahel'.
>>> '/tmp/test.txt' asFileEntry fileContents: s.
>>>
>>> st1 _ '/tmp/test.txt' asFileEntry readStream .
>>>
>>> st1 upTo: $A. " 'hello-1' "
>>> st1 upTo: $A. " 'hello-2' "
>>> st1 upTo: $A. " 'hel' "
>>> (st1 atEnd and: [ st1 peekBack ~= $A ]) ifTrue: [ self error: 'End of
>>> file without delimiter ].
>>> ------
>>>
>>>  Regarding my concern of adding this functionality to Cuis, we are
>>> trying to have a compact set of classes and methods to reduce complexity
>>> (or at least not increase it) and help newcomers to understand it and
>>> oldies to remember it :-) . We are also trying to add more and more tests
>>> because it is the only way to keep a system from becoming a legacy one and
>>> to reduce the fear it produces to change something.
>>>  The strictUpTo:startPos: you are sending is almost a copy of the upTo:
>>> method, with a few lines changed. Even though the functionality makes sense
>>> (although right now you are the only one needing it and as I said, you can
>>> use peekBack to overcome it), adding that method adds repeated code which
>>> in the long term makes it more difficult to understand and maintain, even
>>> more because it does not have tests.
>>>  So I hope you understand that as maintainers of Cuis, we want to be
>>> loyal to the goals I mentioned before and keep Cuis as clean and simple as
>>> possible. If you can refactor what you sent to avoid having repeated code
>>> with #upTo: and add tests that verify the functionality of both methods
>>> (strictUpTo: and upTo:), that will make our task easier and meet the goals
>>> we have. If you think this does not make sense to you, or you do not have
>>> the time to do it, it is completely understandable and in that case I
>>> suggest for you to have it as an extension of the StandardFileStream class
>>> or just use the peekBack message as I showed.
>>>  I hope you understand my concern and agree with me. If not, please let
>>> me know.
>>>
>>> Cheers!
>>> Hernan.
>>>
>>>
>>> On Tue, Oct 19, 2021 at 10:32 AM Nicola Mingotti <nmingotti at gmail.com>
>>> wrote:
>>>
>>>>
>>>> Hi Hernan,
>>>>
>>>> In all frankness, in I would wipe out the old 'upTo' because its
>>>> behavior is a bit "wild".
>>>>
>>>> On the other side, I understand it may create problems in
>>>> retro-compatibility, that is why for
>>>> the moment i propose to add a new method which behaves a bit better.
>>>>
>>>> I hope this example explains the problem:
>>>> -------------------------------------------------------
>>>> s _ 'hello-1Ahello-2Ahel'.
>>>> '/tmp/test.txt' asFileEntry fileContents: s.
>>>>
>>>> st1 _ '/tmp/test.txt' asFileEntry readStream .
>>>>
>>>> st1 upTo: $A. " 'hello-1' "
>>>> st1 upTo: $A. " 'hello-2' "
>>>> st1 upTo: $A. " 'hel' "         "(*)"
>>>> ------------------------------------------------------
>>>> (*) You can't establish in any way if you actually found an "A"
>>>> terminated block or just hit the end of file
>>>> (*) If you hit the end of file you eat an incomplete record, this is
>>>> another problem, maybe another process
>>>> was going to end writing that record but you will never know.
>>>>
>>>> Maybe there is another method around that performs similarly to
>>>> 'strictUpTp', if there is I did not find it, sorry.
>>>>
>>>> IMHO, In a scale of importance from 0 to 10, this method, for a
>>>> programmer, >= 8.
>>>> I would definitely not put it into an external package, too much
>>>> fundamental.
>>>>
>>>> bye
>>>> Nicola
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 10/19/21 14:44, Hernan Wilkinson wrote:
>>>>
>>>> Hi Nicola!
>>>>  I was wondering, why are you suggesting adding them to the base? Is it
>>>> not enough to implement them as an extension in your package?
>>>>  Also, I think that any new functionality should come with its
>>>> corresponding tests to help the maintenance and understanding of the
>>>> functionality.
>>>>
>>>> Cheers!
>>>> Hernan.
>>>>
>>>>
>>>> On Tue, Oct 19, 2021 at 7:04 AM Nicola Mingotti via Cuis-dev <
>>>> cuis-dev at lists.cuis.st> wrote:
>>>>
>>>>> Hi Juan, guys,
>>>>>
>>>>> I would like to add to Cuis the 2 methods i attach here. One is a
>>>>> helper method.
>>>>>
>>>>> -----------
>>>>> StandardFileStream strictUpTo: delim.
>>>>> -----------
>>>>>
>>>>> Differently from 'upTo: delim' this method:
>>>>> 1. Does not return stuff if it does not find 'delim'.
>>>>> 2. Does not upgrade the position on the stream if does not find
>>>>> 'delim'.
>>>>> 3. If it finds 'delim' returns a chunk that includes it.
>>>>>
>>>>> I am parsing log files at the moment, this is very much useful.
>>>>>
>>>>> NOTE. Up to now I tested only on small files.
>>>>>
>>>>> bye
>>>>> Nicola
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Cuis-dev mailing list
>>>>> Cuis-dev at lists.cuis.st
>>>>> https://lists.cuis.st/mailman/listinfo/cuis-dev
>>>>>
>>>>
>>>>
> --
> Juan Vuletichwww.cuis-smalltalk.orghttps://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Devhttps://github.com/jvuletichhttps://www.linkedin.com/in/juan-vuletich-75611b3
> @JuanVuletich
>
>
>
>
>
> --
> Juan Vuletichwww.cuis-smalltalk.orghttps://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Devhttps://github.com/jvuletichhttps://www.linkedin.com/in/juan-vuletich-75611b3
> @JuanVuletich
>
>
>
>
> --
> Juan Vuletichwww.cuis-smalltalk.orghttps://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Devhttps://github.com/jvuletichhttps://www.linkedin.com/in/juan-vuletich-75611b3
> @JuanVuletich
>
>
>

-- 
<https://10pines.com/>Hernán WilkinsonSoftware Developer & Coach

Alem 896, Floor 6, Buenos Aires, Argentina

+54 11 6091 3125

@HernanWilkinson
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.cuis.st/mailman/archives/cuis-dev/attachments/20211026/3349c74d/attachment-0001.htm>


More information about the Cuis-dev mailing list