[Cuis-dev] Unwind mechanism during termination is broken and inconsistent

Tue Apr 13 11:45:17 PDT 2021

Hi Jaromir, hi all,

This is most impressive work!

It reminded me of the talk Martin McClure gave at Smalltalks 
2019: https://www.youtube.com/watch?v=AvM5YrjK9AE . At minute 22:00, 
Martin states the behavior he expects from #terminate, and shows a test. 
I tried his test in Cuis, and sometimes (but not always) it failed. 
Jaromir, with your code, Martin's test now passes! In addition, I ran the 
existing Process tests, and none of them fails.

I also tried all your tests and examples. They cover a really broad and 
exhaustive set of scenarios! A few comments:
- Many examples don't work in Cuis, because we made #fork answer nil 
(and not the forked process). The reason for this is explained in the 
comment in #fork. The pattern in Cuis is, instead of "p := [stuff] fork.", 
to write "p := [stuff] newProcess. p resume." With this change, all the 
examples work as expected.
- The notes for a few examples speak of crashes in Cuis. I pushed some 
fixes a few days ago, and now cmd-. works as expected. The examples 
behave the same as in Squeak. Please pull your repos.
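
For the #fork point above, a minimal sketch of the pattern (the block 
body and the variable name p are placeholders for illustration, not 
taken from Jaromir's examples):

    | p |
    "In Cuis #fork answers nil, so create the process explicitly to keep a reference to it."
    p := [ (Delay forSeconds: 5) wait ] newProcess.
    p resume.
    "p refers to the new process, so it can be terminated later."
    p terminate.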

Additionally, there is one remark of yours, Jaromir, that I don't understand:
/////
ad 1) The #isTerminated condition `suspendedContext pc > 
suspendedContext startpc` is always met after executing the first 
instruction of the bottom context – generally not a sign of a terminated 
process.
\\\\\\
I don't see that condition in either Squeak (#19435) or Cuis (#4567).

In any case, I think this is a great contribution. It perfectly fixes a 
lot of broken behavior, and doesn't seem to bring problems. I'm all for 
integrating it.

Jaromir, do you intend to turn the rest of your examples into tests? I 
think we should do that too, so that we integrate all your experiments 
and what you learnt.

Thanks!

-- 
Juan Vuletich
www.cuis-smalltalk.org
https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev
https://github.com/jvuletich
https://www.linkedin.com/in/juan-vuletich-75611b3
@JuanVuletich

On 4/13/2021 6:02 AM, Jaromir Matas via Cuis-dev wrote:
>
> Hi Hernan, hi all,
>
> I'm enclosing a slightly improved version of #terminate along with a 
> few semantic tests covering basic nested unwind scenarios (terminating 
> active, suspended, blocked and ready processes in nested unwind 
> blocks). More tests can be based on the examples in the enclosed 
> collection - for scenarios involving non-local returns and errors 
> during termination, and especially their combinations. A few examples 
> explore the resilience of the code when exposed to pathological 
> situations (nested termination or errors).
>
> Thanks again for your time. I look forward to your response.
>
> best,
>
> Jaromir
>
> *From: *Jaromir Matas via Cuis-dev <mailto:cuis-dev at lists.cuis.st>
> *Sent: *Sunday, April 11, 2021 10:51
> *To: *Hernan Wilkinson <mailto:hernan.wilkinson at 10pines.com>; 
> Discussion of Cuis Smalltalk <mailto:cuis-dev at lists.cuis.st>
> *Cc: *Jaromir Matas <mailto:mail at jaromir.net>
> *Subject: *Re: [Cuis-dev] Unwind mechanism during termination is 
> broken and inconsistent
>
> Hi Hernan,
>
> Thank you very much for your immediate response.
>
> > Hi Jaromir,
>
> >  thank you for sharing this with us! It looks very interesting.
>
> >  As you can imagine a change of this magnitude and impact has to be 
> analyzed in detail and tested rigorously, and that will take time.
>
> Oh absolutely, I'm aware of that and wouldn't expect anything less :)
>
> >  It would help us deeply if there are tests that reproduce the 
> errors, do you have them?
>
> Yes, I have a set of Squeak tests I'm planning to extend and make 
> presentable ASAP. I'll "translate" them to Cuis too.
>
> >  If not, could you write them? I know it is a difficult problem to 
> test but if we could have automated tests for this it would help a lot.
>
> >  Also, it would help to know how you tested it.
>
> That's important: I tested solely for 'correct' semantics in a series 
> of simple scenarios I'll send you.
>
> >  For example, have you tried in stress conditions?
>
> Nope, I'm not that experienced unfortunately :)
>
> >  for example with an app that uses many processes (e.g. Seaside or 
> any web framework). Please, do not take this question as 
> disrespectful, but again,
>
> >  it is a change with a big impact and we need to be sure it will 
> keep the current behavior (except for the bugs, of course) and it will 
> not introduce new bugs.
>
> >  I'm also wondering if there could be systems that are implemented 
> as if these bugs you mention were "features" ...
>
> That's an interesting question. I realized the sooner bugs this deep 
> in the system are fixed the less chance someone learns to 'live' with 
> them, crippling their code, or even uses them as 'features'. I just 
> hope it's not the case. How would we know? Well... actually here's 
> one: while testing it on Pharo I found two tests that sort of 
> 'legitimize' or use the Unwind error (bug) to achieve certain 
> behavior... (if I interpret the tests correctly).
>
> Thanks again for your questions!
>
> Jaromir
>
> >  Thanks!
>
> >  Hernan.
>
> *From: *Hernan Wilkinson <mailto:hernan.wilkinson at 10pines.com>
> *Sent: *Sunday, April 11, 2021 1:47
> *To: *Discussion of Cuis Smalltalk <mailto:cuis-dev at lists.cuis.st>
> *Cc: *Jaromir Matas <mailto:mail at jaromir.net>
> *Subject: *Re: [Cuis-dev] Unwind mechanism during termination is 
> broken and inconsistent
>
> Hi Jaromir,
>
>  thank you for sharing this with us! It looks very interesting.
>
>  As you can imagine a change of this magnitude and impact has to be 
> analyzed in detail and tested rigorously, and that will take time. 
> (Sadly Juan and I are really busy at this time so it will take even 
> more time :-) )
>
>  It would help us deeply if there are tests that reproduce the errors, 
> do you have them? If not, could you write them? I know it is a 
> difficult problem to test but if we could have automated tests for 
> this it would help a lot.
>
>  Also, it would help to know how you tested it. For example, have you 
> tried in stress conditions? For example with an app that uses many 
> processes (e.g. Seaside or any web framework). Please, do not take this 
> question as disrespectful, but again, it is a change with a big impact 
> and we need to be sure it will keep the current behavior (except for 
> the bugs, of course) and it will not introduce new bugs.
>
>  I'm also wondering if there could be systems that are implemented as 
> if these bugs you mention were "features" ...
>
>  Thanks!
>
>  Hernan.
>
> On Sat, Apr 10, 2021 at 4:20 PM Jaromir Matas via Cuis-dev 
> <cuis-dev at lists.cuis.st <mailto:cuis-dev at lists.cuis.st>> wrote:
>
>     Hi All,
>
>     I'd like to present to you a rewrite of #terminate and
>     #isTerminated that fixes a few bugs and inconsistencies in process
>     termination. I hoped it might interest you.
>
>     I'm a Smalltalk enthusiast with an educational background in math/CS.
>     I've been experimenting with processes in Squeak lately and
>     discovered a few bugs (or at least inconsistencies) in process
>     termination and would like to offer and discuss a solution for Cuis.
>
>     The bugs are not unique to Cuis; Squeak/Pharo inherited them too
>     and to a degree even VisualWorks and VA are affected. The
>     proposal presented here doesn't copy any VW or VA solution but
>     rather implements a different approach :)
>
>     Before boring you to death I'll list the bugs:
>
>     1. #isTerminated falsely reports almost any bottom context as
>     terminated
>
>     2. an active process termination bug causes an image freeze
>
>     3. a /nested/ unwind bug: ensure blocks may get skipped during unwind
>
>     4. a failure to complete /nested/ unwind blocks halfway thru
>     execution
>
>     5. a failure to correctly execute a non-local return or an error
>     in an unwind block
>
>     6. inconsistent semantics of protected blocks unwind during active
>     vs. suspended process termination
>
>     The current implementation of #terminate uses three different
>     approaches to terminate a process:
>
>     - the active process is terminated via a 'standard' unwind
>     algorithm used in context #unwindTo: or #resume:,
>
>     - a suspended process termination attempts completing unwind
>     blocks halfway through their execution first using
>     #runUntilErrorOrReturnFrom:
>
>     - and after that the termination continues unwinding the remaining
>     unwind blocks using the simulation algorithm of #popTo:
>
>     This approach /looks/ inconsistent and indeed leads to the
>     inconsistencies, undesirable behavior and instability mentioned
>     above.
>
>     The Idea: In my view the easiest and most consistent solution is
>     to simply extend the existing mechanism for completing
>     halfway-through unwind blocks and let it deal with *all* unwind
>     blocks. To make this approach applicable to terminating the active
>     process, we suspend it first and then terminate it as a suspended
>     process.
>
>     Commented code is enclosed.
>
>     I know it's pretty difficult to get feedback on ancient code;
>     I'll be all the more grateful for your input.
>
>     Best regards,
>
>     Jaromir
>
>     PS:
>
>     Note the change in newProcess - `Processor activeProcess
>     terminate` is replaced by `Processor activeProcess suspend`
>     because there's no need for `terminate` at the bottom context
>     ("terminate = unwind + suspend") and because it would lead to an
>     infinite loop combined with my proposed changes to #terminate.
>
>     PPS:
>
>     More on the bugs:
>
>     ad 1) The #isTerminated condition `suspendedContext pc >
>     suspendedContext startpc` is always met after executing the first
>     instruction of the bottom context – generally not a sign of a
>     terminated process.
>
>     ad 2) Explained in [1]. Concerns e.g. examples like
>
>         [ [ Processor activeProcess terminate ] ensure: [ Processor activeProcess terminate ] ] fork.
>
>     ad 3-4) Explained in detail in [2].
>
>     One example to illustrate the bug:
>
>     | p |
>     p := [
>         [
>             [ ] ensure: [
>                 [ ] ensure: [
>                     Processor activeProcess suspend.
>                     Transcript show: 'x1'].
>                 Transcript show: 'x2']
>         ] ensure: [
>             Transcript show: 'x3']
>     ] newProcess.
>     p resume.
>     Processor yield.
>     p terminate
>
>     ===> x1
>
>     The unwind procedure prints just x1 and skips not only x2 but x3
>     as well! You'd like to see them all.
>
>     ad 5) This happens in cases like (better save your image before
>     trying this):
>
>         [self error: 'e1'] ensure: [^2]        "discovered by Christoph Thiede"
>
>     and
>
>                   [self error: 'e1'] ensure: [self error: 'e2']
>
>     These are generally the types of situations causing the Unwind
>     errors. The root cause is that the unwinding is done via
>     simulation (#popTo:) rather than 'directly'; the problem is that
>     during the simulated execution of unwind blocks a non-local return
>     forwards the execution into the wrong stack, resulting in the
>     Unwind errors.
>
>     ad 6) I just prefer a unified approach... unless I somehow
>     overlooked a reason for two different approaches (I hope not).
>
>     [1]
>     http://forum.world.st/A-bug-in-active-process-termination-crashing-image-td5128186.html
>
>     [2]
>     http://forum.world.st/Another-bug-in-Process-gt-gt-terminate-in-unwinding-contexts-td5128171.html#a5128178
>
>     -- 
>     Cuis-dev mailing list
>     Cuis-dev at lists.cuis.st <mailto:Cuis-dev at lists.cuis.st>
>     https://lists.cuis.st/mailman/listinfo/cuis-dev
>
>
> -- 
>
> <https://10pines.com/>
>
>
>   Hernán Wilkinson
>
>
>     Software Developer, Teacher & Coach
>
> Alem 896, Floor 6, Buenos Aires, Argentina
>
> +54 11 6091 3125
>
> @HernanWilkinson
>