[Cuis-dev] Unwind mechanism during termination is broken and inconsistent

Juan Vuletich juan at jvuletich.org
Thu Apr 29 12:34:26 PDT 2021


Hi Jaromir,

Yesterday I found some time to finally integrate your great fixes to 
#terminate & friends. While doing this, I turned many of your examples 
into tests. Then I did this one, based on your non-local-return examples:

test1ATerminate
     "Terminate suspended process.
     Test all nested unwind blocks are correctly unwound;
     all unwind blocks halfway     through their execution should be 
completed."

     | p a |
     a _ Array new: 4 withAll: false.
     p := [
             self aux1A: a
         ] newProcess.
     p resume.
     Processor yield.
     "make sure p is suspended and none of the unwind blocks has 
finished yet"
     self assert: p isSuspended.
     a noneSatisfy: [ :b | b ].
     "now terminate the process and make sure all unwind blocks have 
finished"
     p terminate.
     self assert: p isTerminated.
     self assert: a first & a third.
     self deny: a second | a fourth.

aux1A: anArray
     [
         self aux1Ainner: anArray.
     ] ensure: [
         anArray at: 3 put: true].
     anArray at: 4 put: true

aux1Ainner: anArray
         [ ] ensure: [
             [Processor activeProcess suspend] ensure: [
                 ^anArray at: 1 put: true].
             anArray at: 2 put: true]

But this test fail. The 3rd entry doesn't get updated, even if it is in 
an ensured block. I think this is a bug, and I wrote a fix.

The attached zip file with all the relevant code. Please review my tweak 
to #terminate, and its effect on this test. If you agree with it, I'll 
integrate it all.

Thanks,

On 4/14/2021 5:37 AM, Jaromir Matas via Cuis-dev wrote:
>
> Hi Juan, hi all,
>
> I’m enclosing an amended list of examples with fixed #fork and ‘self 
> error’ patterns so that you can use them right away.
>
> > - The notes for a few examples speak of crashes in Cuis. I pushed 
> some fixes a few days ago,
>
> > and now cmd-. works as expected. The examples behave the same as in 
> Squeak.
>
> Yes, it’s perfect :) I removed the crash notes from the example list.
>
> A little correction: I haven’t realized Martin McClure was talking 
> from a Visual Works position - they have different semantics of 
> #valueUninterruptably (at least in 8.3 from 2017, using a semaphore).
>
> The Squeak semantics of #valueUninterruptably is “don’t let anything 
> kill me” (using a non-local return in the unwind block).
>
> Best regards,
>
> Jaromir
>
> *From: *Jaromir Matas via Cuis-dev <mailto:cuis-dev at lists.cuis.st>
> *Sent: *Wednesday, April 14, 2021 0:20
> *To: *Juan Vuletich <mailto:juan at jvuletich.org>; Discussion of Cuis 
> Smalltalk <mailto:cuis-dev at lists.cuis.st>
> *Cc: *Jaromir Matas <mailto:mail at jaromir.net>; Hernan Wilkinson 
> <mailto:hernan.wilkinson at 10pines.com>
> *Subject: *Re: [Cuis-dev] Unwind mechanism during termination is 
> broken and inconsistent
>
> Hi Juan,
>
> >
>
> > Hi Jaromir, hi all,
>
> >
>
> > This is a most impressive work!
>
> Many thanks for your encouragement - I'm really glad you like it :)
>
> >
>
> > It made me remember this talk given by Martin McClure in Smalltalks 
> 2019: https://www.youtube.com/watch?v=AvM5YrjK9AE . At minute 22:00, 
> Martin states the behavior he expects from #terminate, and shows a 
> test. I tried his test in Cuis, and sometimes (but not always) it 
> failed. Jaromir, with your code, Martin's test now pass! In addition, 
> I ran the existing Process tests, and none is made to fail.
>
> Very interesting: I haven't touched the part for releasing critical 
> sections (I thought it was an independent part I wanted to revisit 
> later) but I'm relieved to hear it works properly with the proposed 
> changes :)
>
> In the talk Martin McClure also mentioned #valueUninterruptably. I've 
> noticed Cuis doesn't implement it but I think with the suggested 
> #terminate it could be implemented safely (fingers crossed).
>
> BlockClosure >> valueUninterruptably
>
>                ^ self ifCurtailed: [^ self]
>
> At the moment however, Squeak and Pharo suffer a horrible damage to 
> the image if you try:
>
>                [ self error ] valueUninterruptably
>
> and later e.g.:
>
>                Semaphore new waitTimeoutMSecs: 50
>
> The worst part is the user won't notice anything and a simple code 
> will run as if nothing happened - until an error or a crash happens. 
> One of only a few ways to see something's off is e.g. Squeak's 
> `Processor activeProcess = Project current uiProcess` which all of a 
> sudden answers false (!)
>
> But that's just Squeak/Pharo; Cuis is not affected as badly because it 
> doesn't use the `effectiveProcess` mechanism.
>
> Martin McClure also mentioned #valueUnpreemptively; funny thing is it 
> doesn't work as designed because there's a bug in Process>>#priority 
> which will leave the elevated process run at highest priority until it 
> reaches a suspension point instead of bumping the priority back lower 
> before returning from #valueUnpreemptively. It's discussed in this 
> thread: 
> http://forum.world.st/The-Inbox-Kernel-jar-1368-mcz-tp5126894p5126982.html 
> . The conclusion is a new primitive for #priority is needed.
>
> >
>
> > I also tried all your tests and examples. They cover a really broad 
> and exhaustive set of scenarios! A few comments:
>
> > - Many examples don't work in Cuis, because we made #fork answer nil 
> (and not the forked process). The reason for this is explained in the 
> comment in #fork. The pattern is instead of "p := [stuff] fork." do "p 
> := [stuff] newProcess. p resume." With this change, all the examples 
> work as expected.
>
> Oops, apologies, I noticed the fork pattern (very reasonable), I 
> modified my #fork to return the process to avoid the need to modify 
> all the examples I created for Squeak :) Same for 'self error' 
> unfortunately - for Cuis
>
>                self error
>
> in the examples has to change to:
>
>                self error: 'unwind error'
>
> sorry...
>
> > - The notes for a few examples speak of crashes in Cuis. I pushed 
> some fixes a few days ago, and now cmd-. works as expected. The 
> examples behave the same as in Squeak. Please pull your repos.
>
> >
>
> I will, thanks!
>
> > Additionally, this remark of you (Jaromir) I don't understand:
>
> > /////
>
> > ad 1) The #isTerminated condition `suspendedContext pc > 
> suspendedContext startpc` is always met after executing the first 
> instruction of the bottom context – generally not a sign of a 
> terminated process.
>
> > \\\\\\
>
> > I don't see that condition neither in Squeak (#19435) nor Cuis (#4567).
>
> >
>
> My bad explaining; I was talking about #isTerminated implementation in 
> Cuis (and Pharo) using this old condition - on the last line of:
>
> Process >> isTerminated
>
>                "Answer true if terminated, i.e. can never be resumed 
> again, because have nothing to run."
>
>                self isRunning ifTrue: [^ false].
>
>                ^suspendedContext isNil
>
>                               or: [ suspendedContext pc isNil
>
>                                              or: ["If the 
> suspendedContext is the bottomContext it is the block in 
> Process>>newProcess.
>
>                                                             If so, and 
> the pc is greater than the startpc, the bock has alrteady sent and 
> returned
>
>                                                             from value 
> and there is nothing more to do."
>
>                                                             
> suspendedContext isBottomContext
>
>                                                                            
> and: [
>
>                                                                                           
> suspendedContext pc > suspendedContext startpc]]]
>
> Squeak improved this condition a while ago but it's still insufficient 
> :) It's discussed here: 
> http://forum.world.st/The-Inbox-Kernel-jar-1376-mcz-tp5127335p5127341.html
>
> > In any case, I think this is a great contribution, It perfectly 
> fixes a lot of broken behavior, and doesn't seem to bring problems. 
> I'm all for integrating it.
>
> >
>
> > Jaromir, do you intend to turn the rest of your examples into tests? 
> I think we should also do that, so we integrate all your experiments 
> and what you learnt.
>
> >
>
> I'd love to but I realized I don't know how to build tests involving 
> raising errors and pressing Abandon on the Debugger :D If you could 
> point me in the right direction it'd save me a lot of trying :)
>
> Thanks a lot for all your comments. I'm happy you like the suggestion 
> and would be delighted to see it integrated.
>
> > Thanks!
>
> >
>
> Best regards,
>
> Jaromir
>
> *From: *Juan Vuletich <mailto:juan at jvuletich.org>
> *Sent: *Tuesday, April 13, 2021 20:45
> *To: *Discussion of Cuis Smalltalk <mailto:cuis-dev at lists.cuis.st>
> *Cc: *Hernan Wilkinson <mailto:hernan.wilkinson at 10pines.com>; Jaromir 
> Matas <mailto:mail at jaromir.net>
> *Subject: *Re: [Cuis-dev] Unwind mechanism during termination is 
> broken and inconsistent
>
> Hi Jaromir, hi all,
>
> This is a most impressive work!
>
> It made me remember this talk given by Martin McClure in Smalltalks 
> 2019: https://www.youtube.com/watch?v=AvM5YrjK9AE . At minute 22:00, 
> Martin states the behavior he expects from #terminate, and shows a 
> test. I tried his test in Cuis, and sometimes (but not always) it 
> failed. Jaromir, with your code, Martin's test now pass! In addition, 
> I ran the existing Process tests, and none is made to fail.
>
> I also tried all your tests and examples. They cover a really broad 
> and exhaustive set of scenarios! A few comments:
> - Many examples don't work in Cuis, because we made #fork answer nil 
> (and not the forked process). The reason for this is explained in the 
> comment in #fork. The pattern is instead of "p := [stuff] fork." do "p 
> := [stuff] newProcess. p resume." With this change, all the examples 
> work as expected.
> - The notes for a few examples speak of crashes in Cuis. I pushed some 
> fixes a few days ago, and now cmd-. works as expected. The examples 
> behave the same as in Squeak. Please pull your repos.
>
> Additionally, this remark of you (Jaromir) I don't understand:
> /////
> ad 1) The #isTerminated condition `suspendedContext pc > 
> suspendedContext startpc` is always met after executing the first 
> instruction of the bottom context – generally not a sign of a 
> terminated process.
> \\\\\\
> I don't see that condition neither in Squeak (#19435) nor Cuis (#4567).
>
> In any case, I think this is a great contribution, It perfectly fixes 
> a lot of broken behavior, and doesn't seem to bring problems. I'm all 
> for integrating it.
>
> Jaromir, do you intend to turn the rest of your examples into tests? I 
> think we should also do that, so we integrate all your experiments and 
> what you learnt.
>
> Thanks!
>
> -- 
> Juan Vuletich
> www.cuis-smalltalk.org  <http://www.cuis-smalltalk.org>
> https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev
> https://github.com/jvuletich
> https://www.linkedin.com/in/juan-vuletich-75611b3
> @JuanVuletich
>
>
>
> On 4/13/2021 6:02 AM, Jaromir Matas via Cuis-dev wrote:
>
>     Hi Hernan, hi all,
>
>     I'm enclosing a slightly improved version of #terminate along with
>     a few semantic tests covering basic nested unwind scenarios
>     (terminating an active, suspended, blocked and ready processes in
>     nested unwind blocks). More tests can be based on examples in the
>     enclosed collection - for scenarios involving non-local returns
>     and errors during termination and especially their combinations. A
>     few examples explore resiliency of the code exposed to pathologic
>     situations (nested termination or errors).
>
>     Thanks again for your time. I look forward to your response.
>
>     best,
>
>     Jaromir
>
>     *From: *Jaromir Matas via Cuis-dev <mailto:cuis-dev at lists.cuis.st>
>     *Sent: *Sunday, April 11, 2021 10:51
>     *To: *Hernan Wilkinson <mailto:hernan.wilkinson at 10pines.com>;
>     Discussion of Cuis Smalltalk <mailto:cuis-dev at lists.cuis.st>
>     *Cc: *Jaromir Matas <mailto:mail at jaromir.net>
>     *Subject: *Re: [Cuis-dev] Unwind mechanism during termination is
>     broken and inconsistent
>
>     Hi Hernan,
>
>     Thank you very much for your immediate response.
>
>     > Hi Jaromir,
>
>     >  thank you for sharing this with us! It looks very interesting.
>
>     >  As you can imagine a change of this magnitude and impact has to
>     be analyzed in detail and tested rigorously, and that will take time.
>
>     Oh absolutely, I'm aware of that and wouldn't expect anything less :)
>
>     >  It would help us deeply if there are tests that reproduce the
>     errors, do you have them?
>
>     Yes, I have a set of Squeak tests I'm planning to extend and make
>     presentable ASAP. I'll "translate" them to Cuis too.
>
>     >  If not, could you write them? I know it is a difficult problem
>     to test but if we could have automated tests for this it would
>     help a lot.
>
>     >  Also, it would help to know how you tested it.
>
>     That's important: I tested solely for 'correct' semantics in a
>     series of simple scenarios I'll send you.
>
>     >  For example, have you tried in stress conditions?
>
>     Nope, I'm not that experienced unfortunately :)
>
>     >  for example with an app that uses many processes (ie. seaside or
>     any web framework). Please, do not take this question as
>     disrespectful, but again,
>
>     >  it is a change with a big impact and we need to be sure it will
>     keep the current behavior (but the bugs of course) and it will not
>     introduce new bugs.
>
>     >  I'm also wondering if there could be systems that are
>     implemented as these bugs you mention were "features" ...
>
>     That's an interesting question. I realized the sooner bugs this
>     deep in the system are fixed the less chance someone learns to
>     'live' with them, crippling their code, or even uses them as
>     'features'. I just hope it's not the case. How would we know?
>     Well... actually here's one: while testing it on Pharo I found two
>     tests that sort of 'legitimize' or use the Unwind error (bug) to
>     achieve certain behavior... (if I interpret the tests correctly).
>
>     Thanks again for your questions!
>
>     Jaromir
>
>     >  Thanks!
>
>     >  Hernan.
>
>     *From: *Hernan Wilkinson <mailto:hernan.wilkinson at 10pines.com>
>     *Sent: *Sunday, April 11, 2021 1:47
>     *To: *Discussion of Cuis Smalltalk <mailto:cuis-dev at lists.cuis.st>
>     *Cc: *Jaromir Matas <mailto:mail at jaromir.net>
>     *Subject: *Re: [Cuis-dev] Unwind mechanism during termination is
>     broken and inconsistent
>
>     Hi Jaromir,
>
>      thank you for sharing this with us! It looks very interesting.
>
>      As you can imagine a change of this magnitude and impact has to
>     be analyzed in detail and tested rigorously, and that will take
>     time. (Sadly Juan and me are really busy at this time so it will
>     even take more time :-) )
>
>      It would help us deeply if there are tests that reproduce the
>     errors, do you have them? If not, could you write them? I know it
>     is a difficult problem to test but if we could have automated
>     tests for this it would help a lot.
>
>      Also, it would help to know how you tested it. For example, have
>     you tried in stress conditions? for example with an app that uses
>     many processes (ie. seaside or any web framework). Please, do not
>     take this question as disrespectful, but again, it is a change
>     with a big impact and we need to be sure it will keep the current
>     behavior (but the bugs of course) and it will not introduce new bugs.
>
>      I'm also wondering if there could be systems that are implemented
>     as these bugs you mention were "features" ...
>
>      Thanks!
>
>      Hernan.
>
>     On Sat, Apr 10, 2021 at 4:20 PM Jaromir Matas via Cuis-dev
>     <cuis-dev at lists.cuis.st <mailto:cuis-dev at lists.cuis.st>> wrote:
>
>         Hi All,
>
>         I'd like to present to you a rewrite of #terminate and
>         #isTerminated that fixes a few bugs and inconsistencies in
>         process termination. I hoped it might interest you.
>
>         I'm a Smalltalk enthusiast with education background in
>         math/CS. I've been experimenting with processes in Squeak
>         lately and discovered a few bugs (or at least inconsistencies)
>         in process termination and would like to offer and discuss a
>         solution for Cuis.
>
>         The bugs are not unique to Cuis; Squeak/Pharo inherited them
>         too and to a degree even Visual Works and VA are affected. The
>         proposal presented here doesn't copy any VW or VA solution but
>         rather implements a different approach :)
>
>         Before boring you to death I'll list the bugs:
>
>         1. #isTerminated falsely reports almost any bottom context as
>         terminated
>
>         2. an active process termination bug causes an image freeze
>
>         3. a /nested/ unwind bug: ensure blocks may get skipped during
>         unwind
>
>         4. a failure to complete /nested/ unwind blocks halfway thru
>         execution
>
>         5. a failure to correctly execute a non-local return or an
>         error in an unwind block
>
>         6. inconsistent semantics of protected blocks unwind during
>         active vs. suspended process termination
>
>         The current implementation of #terminate uses three different
>         approaches to terminate a process:
>
>         - the active process is terminated via a 'standard' unwind
>         algorithm used in context #unwindTo: or #resume:,
>
>         - a suspended process termination attempts completing unwind
>         blocks halfway through their execution first using
>         #runUntilErrorOrReturnFrom:
>
>         - and after that the termination continues unwinding the
>         remaining unwind blocks using the simulation algorithm of #popTo:
>
>         This approach /looks/ inconsistent and indeed leads to
>         inconsistencies, undesirable behavior and an instability
>         mentioned above.
>
>         The Idea: In my view the easiest and most consistent solution
>         is to simply extend the existing mechanism for completing
>         halfway-through unwind blocks and let it deal with *all*
>         unwind blocks. To make this approach applicable to terminating
>         the active process, we suspend it first and then terminate it
>         as a suspended process.
>
>         A commented code is enclosed.
>
>         I know it's pretty difficult to get a feedback on an ancient
>         code; I'll be all the more grateful for your inputs.
>
>         Best regards,
>
>         Jaromir
>
>         PS:
>
>         Note the change in newProcess - `Processor activeProcess
>         terminate` is replaced by `Processor activeProcess suspend`
>         because there's no need for `terminate` at the bottom context
>         ("terminate = unwind + suspend") and because it would lead to
>         an infinite loop combined with my proposed changes to #terminate.
>
>         PPS:
>
>         More on the bugs:
>
>         ad 1) The #isTerminated condition `suspendedContext pc >
>         suspendedContext startpc` is always met after executing the
>         first instruction of the bottom context – generally not a sign
>         of a terminated process.
>
>         ad 2) Explained in [1]. Concerns e.g. examples like
>
>                       [ [ Processor activeProcess terminate ] ensure:
>         [ Processor activeProcess terminate ] ] fork.
>
>         ad 3-4) Explained in detail in [2].
>
>         One example to illustrate the bug:
>
>         | p |
>
>         p := [
>
>                       [
>
>                                    [ ] ensure: [
>
>                                                  [ ] ensure: [
>
>                                                               
>         Processor activeProcess suspend.
>
>                                                               
>         Transcript show: 'x1'].
>
>                                                  Transcript show: 'x2']
>
>                       ] ensure: [
>
>                                    Transcript show: 'x3']
>
>         ] newProcess.
>
>         p resume.
>
>         Processor yield.
>
>         p terminate
>
>         ===> x1
>
>         The unwind procedure prints just x1 and skips not only x2 but
>         x3 as well ! You'd like to see them all.
>
>         ad 5) This happens in cases like (better save your image
>         before trying this):
>
>                       [self error: 'e1'] ensure: [^2]      
>          "discovered by Christoph Thiede"
>
>         and
>
>                       [self error: 'e1'] ensure: [self error: 'e2']
>
>         These are generally the types of situations causing the Unwind
>         errors. The root cause is that the unwinding is done via
>         simulation (#popTo) rather than 'directly'; the problem is
>         during the simulated execution of unwind blocks a non-local
>         return forwards the execution into a wrong stack - resulting
>         in the Unwind errors.
>
>         ad 6) I just prefer a unified approach... unless I somehow
>         overlooked a reason for two different approaches (I hope not).
>
>         [1]
>         http://forum.world.st/A-bug-in-active-process-termination-crashing-image-td5128186.html
>
>         [2]
>         http://forum.world.st/Another-bug-in-Process-gt-gt-terminate-in-unwinding-contexts-td5128171.html#a5128178
>
>         -- 
>         Cuis-dev mailing list
>         Cuis-dev at lists.cuis.st <mailto:Cuis-dev at lists.cuis.st>
>         https://lists.cuis.st/mailman/listinfo/cuis-dev
>
>
>     -- 
>
>     <https://10pines.com/>
>
>
>       Hernán Wilkinson
>
>
>         Software Developer, Teacher & Coach
>
>     Alem 896, Floor 6, Buenos Aires, Argentina
>
>     +54 11 6091 3125
>
>     @HernanWilkinson
>


-- 
Juan Vuletich
www.cuis-smalltalk.org
https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev
https://github.com/jvuletich
https://www.linkedin.com/in/juan-vuletich-75611b3
@JuanVuletich

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.cuis.st/mailman/archives/cuis-dev/attachments/20210429/4c8355c3/attachment-0001.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Terminate-fixesAndTests.zip
Type: application/x-zip-compressed
Size: 9334 bytes
Desc: not available
URL: <http://lists.cuis.st/mailman/archives/cuis-dev/attachments/20210429/4c8355c3/attachment-0001.bin>


More information about the Cuis-dev mailing list