[Cuis-dev] Unwind mechanism during termination is broken and inconsistent

Jaromir Matas mail at jaromir.net
Tue Apr 13 15:20:15 PDT 2021


Hi Juan,

>
> Hi Jaromir, hi all,
>
> This is a most impressive work!

Many thanks for your encouragement - I'm really glad you like it :)

>
> It made me remember this talk given by Martin McClure in Smalltalks 2019: https://www.youtube.com/watch?v=AvM5YrjK9AE . At minute 22:00, Martin states the behavior he expects from #terminate, and shows a test. I tried his test in Cuis, and sometimes (but not always) it failed. Jaromir, with your code, Martin's test now pass! In addition, I ran the existing Process tests, and none is made to fail.

Very interesting: I haven't touched the part for releasing critical sections (I thought it was an independent part I wanted to revisit later) but I'm relieved to hear it works properly with the proposed changes :)

In the talk Martin McClure also mentioned #valueUninterruptably. I've noticed Cuis doesn't implement it but I think with the suggested #terminate it could be implemented safely (fingers crossed).

BlockClosure >> valueUninterruptably
               ^ self ifCurtailed: [^ self]

At the moment however, Squeak and Pharo suffer a horrible damage to the image if you try:

               [ self error ] valueUninterruptably

and later e.g.:

               Semaphore new waitTimeoutMSecs: 50

The worst part is the user won't notice anything and a simple code will run as if nothing happened - until an error or a crash happens. One of only a few ways to see something's off is e.g. Squeak's `Processor activeProcess = Project current uiProcess` which all of a sudden answers false (!)

But that's just Squeak/Pharo; Cuis is not affected as badly because it doesn't use the `effectiveProcess` mechanism.

Martin McClure also mentioned #valueUnpreemptively; funny thing is it doesn't work as designed because there's a bug in Process>>#priority which will leave the elevated process run at highest priority until it reaches a suspension point instead of bumping the priority back lower before returning from #valueUnpreemptively. It's discussed in this thread: http://forum.world.st/The-Inbox-Kernel-jar-1368-mcz-tp5126894p5126982.html . The conclusion is a new primitive for #priority is needed.

>
> I also tried all your tests and examples. They cover a really broad and exhaustive set of scenarios! A few comments:
> - Many examples don't work in Cuis, because we made #fork answer nil (and not the forked process). The reason for this is explained in the comment in #fork. The pattern is instead of "p := [stuff] fork." do "p := [stuff] newProcess. p resume." With this change, all the examples work as expected.

Oops, apologies, I noticed the fork pattern (very reasonable), I modified my #fork to return the process to avoid the need to modify all the examples I created for Squeak :) Same for 'self error' unfortunately - for Cuis

               self error

in the examples has to change to:

               self error: 'unwind error'

sorry...


> - The notes for a few examples speak of crashes in Cuis. I pushed some fixes a few days ago, and now cmd-. works as expected. The examples behave the same as in Squeak. Please pull your repos.
>

I will, thanks!


> Additionally, this remark of you (Jaromir) I don't understand:
> /////
> ad 1) The #isTerminated condition `suspendedContext pc > suspendedContext startpc` is always met after executing the first instruction of the bottom context – generally not a sign of a terminated process.
> \\\\\\
> I don't see that condition neither in Squeak (#19435) nor Cuis (#4567).
>

My bad explaining; I was talking about #isTerminated implementation in Cuis (and Pharo) using this old condition - on the last line of:

Process >> isTerminated
               "Answer true if terminated, i.e. can never be resumed again, because have nothing to run."
               self isRunning ifTrue: [^ false].
               ^suspendedContext isNil
                              or: [ suspendedContext pc isNil
                                             or: ["If the suspendedContext is the bottomContext it is the block in Process>>newProcess.
                                                            If so, and the pc is greater than the startpc, the bock has alrteady sent and returned
                                                            from value and there is nothing more to do."
                                                            suspendedContext isBottomContext
                                                                           and: [
                                                                                          suspendedContext pc > suspendedContext startpc]]]

Squeak improved this condition a while ago but it's still insufficient :) It's discussed here: http://forum.world.st/The-Inbox-Kernel-jar-1376-mcz-tp5127335p5127341.html


> In any case, I think this is a great contribution, It perfectly fixes a lot of broken behavior, and doesn't seem to bring problems. I'm all for integrating it.
>
> Jaromir, do you intend to turn the rest of your examples into tests? I think we should also do that, so we integrate all your experiments and what you learnt.
>

I'd love to but I realized I don't know how to build tests involving raising errors and pressing Abandon on the Debugger :D If you could point me in the right direction it'd save me a lot of trying :)

Thanks a lot for all your comments. I'm happy you like the suggestion and would be delighted to see it integrated.

> Thanks!
>

Best regards,

Jaromir




From: Juan Vuletich<mailto:juan at jvuletich.org>
Sent: Tuesday, April 13, 2021 20:45
To: Discussion of Cuis Smalltalk<mailto:cuis-dev at lists.cuis.st>
Cc: Hernan Wilkinson<mailto:hernan.wilkinson at 10pines.com>; Jaromir Matas<mailto:mail at jaromir.net>
Subject: Re: [Cuis-dev] Unwind mechanism during termination is broken and inconsistent

Hi Jaromir, hi all,

This is a most impressive work!

It made me remember this talk given by Martin McClure in Smalltalks 2019: https://www.youtube.com/watch?v=AvM5YrjK9AE . At minute 22:00, Martin states the behavior he expects from #terminate, and shows a test. I tried his test in Cuis, and sometimes (but not always) it failed. Jaromir, with your code, Martin's test now pass! In addition, I ran the existing Process tests, and none is made to fail.

I also tried all your tests and examples. They cover a really broad and exhaustive set of scenarios! A few comments:
- Many examples don't work in Cuis, because we made #fork answer nil (and not the forked process). The reason for this is explained in the comment in #fork. The pattern is instead of "p := [stuff] fork." do "p := [stuff] newProcess. p resume." With this change, all the examples work as expected.
- The notes for a few examples speak of crashes in Cuis. I pushed some fixes a few days ago, and now cmd-. works as expected. The examples behave the same as in Squeak. Please pull your repos.

Additionally, this remark of you (Jaromir) I don't understand:
/////
ad 1) The #isTerminated condition `suspendedContext pc > suspendedContext startpc` is always met after executing the first instruction of the bottom context – generally not a sign of a terminated process.
\\\\\\
I don't see that condition neither in Squeak (#19435) nor Cuis (#4567).

In any case, I think this is a great contribution, It perfectly fixes a lot of broken behavior, and doesn't seem to bring problems. I'm all for integrating it.

Jaromir, do you intend to turn the rest of your examples into tests? I think we should also do that, so we integrate all your experiments and what you learnt.

Thanks!


--

Juan Vuletich

www.cuis-smalltalk.org<http://www.cuis-smalltalk.org>

https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev

https://github.com/jvuletich

https://www.linkedin.com/in/juan-vuletich-75611b3

@JuanVuletich


On 4/13/2021 6:02 AM, Jaromir Matas via Cuis-dev wrote:
Hi Hernan, hi all,

I'm enclosing a slightly improved version of #terminate along with a few semantic tests covering basic nested unwind scenarios (terminating an active, suspended, blocked and ready processes in nested unwind blocks). More tests can be based on examples in the enclosed collection - for scenarios involving non-local returns and errors during termination and especially their combinations. A few examples explore resiliency of the code exposed to pathologic situations (nested termination or errors).

Thanks again for your time. I look forward to your response.
best,
Jaromir

From: Jaromir Matas via Cuis-dev<mailto:cuis-dev at lists.cuis.st>
Sent: Sunday, April 11, 2021 10:51
To: Hernan Wilkinson<mailto:hernan.wilkinson at 10pines.com>; Discussion of Cuis Smalltalk<mailto:cuis-dev at lists.cuis.st>
Cc: Jaromir Matas<mailto:mail at jaromir.net>
Subject: Re: [Cuis-dev] Unwind mechanism during termination is broken and inconsistent

Hi Hernan,

Thank you very much for your immediate response.

> Hi Jaromir,
>  thank you for sharing this with us! It looks very interesting.
>  As you can imagine a change of this magnitude and impact has to be analyzed in detail and tested rigorously, and that will take time.

Oh absolutely, I'm aware of that and wouldn't expect anything less :)

>  It would help us deeply if there are tests that reproduce the errors, do you have them?

Yes, I have a set of Squeak tests I'm planning to extend and make presentable ASAP. I'll "translate" them to Cuis too.

>  If not, could you write them? I know it is a difficult problem to test but if we could have automated tests for this it would help a lot.
>  Also, it would help to know how you tested it.

That's important: I tested solely for 'correct' semantics in a series of simple scenarios I'll send you.

>  For example, have you tried in stress conditions?

Nope, I'm not that experienced unfortunately :)

>  for example with an app that uses many processes (ie. seaside or any web framework). Please, do not take this question as disrespectful, but again,
>  it is a change with a big impact and we need to be sure it will keep the current behavior (but the bugs of course) and it will not introduce new bugs.
>  I'm also wondering if there could be systems that are implemented as these bugs you mention were "features" ...

That's an interesting question. I realized the sooner bugs this deep in the system are fixed the less chance someone learns to 'live' with them, crippling their code, or even uses them as 'features'. I just hope it's not the case. How would we know? Well... actually here's one: while testing it on Pharo I found two tests that sort of 'legitimize' or use the Unwind error (bug) to achieve certain behavior... (if I interpret the tests correctly).

Thanks again for your questions!

Jaromir

>  Thanks!
>  Hernan.

From: Hernan Wilkinson<mailto:hernan.wilkinson at 10pines.com>
Sent: Sunday, April 11, 2021 1:47
To: Discussion of Cuis Smalltalk<mailto:cuis-dev at lists.cuis.st>
Cc: Jaromir Matas<mailto:mail at jaromir.net>
Subject: Re: [Cuis-dev] Unwind mechanism during termination is broken and inconsistent

Hi Jaromir,
 thank you for sharing this with us! It looks very interesting.
 As you can imagine a change of this magnitude and impact has to be analyzed in detail and tested rigorously, and that will take time. (Sadly Juan and me are really busy at this time so it will even take more time :-) )
 It would help us deeply if there are tests that reproduce the errors, do you have them? If not, could you write them? I know it is a difficult problem to test but if we could have automated tests for this it would help a lot.
 Also, it would help to know how you tested it. For example, have you tried in stress conditions? for example with an app that uses many processes (ie. seaside or any web framework). Please, do not take this question as disrespectful, but again, it is a change with a big impact and we need to be sure it will keep the current behavior (but the bugs of course) and it will not introduce new bugs.
 I'm also wondering if there could be systems that are implemented as these bugs you mention were "features" ...

 Thanks!
 Hernan.


On Sat, Apr 10, 2021 at 4:20 PM Jaromir Matas via Cuis-dev <cuis-dev at lists.cuis.st<mailto:cuis-dev at lists.cuis.st>> wrote:
Hi All,

I'd like to present to you a rewrite of #terminate and #isTerminated that fixes a few bugs and inconsistencies in process termination. I hoped it might interest you.

I'm a Smalltalk enthusiast with education background in math/CS. I've been experimenting with processes in Squeak lately and discovered a few bugs (or at least inconsistencies) in process termination and would like to offer and discuss a solution for Cuis.

The bugs are not unique to Cuis; Squeak/Pharo inherited them too and to a degree even Visual Works and VA are affected. The proposal presented here doesn't copy any VW or VA solution but rather implements a different approach :)

Before boring you to death I'll list the bugs:

1. #isTerminated falsely reports almost any bottom context as terminated

2. an active process termination bug causes an image freeze

3. a nested unwind bug: ensure blocks may get skipped during unwind

4. a failure to complete nested unwind blocks halfway thru execution

5. a failure to correctly execute a non-local return or an error in an unwind block

6. inconsistent semantics of protected blocks unwind during active vs. suspended process termination

The current implementation of #terminate uses three different approaches to terminate a process:
- the active process is terminated via a 'standard' unwind algorithm used in context #unwindTo: or #resume:,
- a suspended process termination attempts completing unwind blocks halfway through their execution first using #runUntilErrorOrReturnFrom:
- and after that the termination continues unwinding the remaining unwind blocks using the simulation algorithm of #popTo:

This approach looks inconsistent and indeed leads to inconsistencies, undesirable behavior and an instability mentioned above.

The Idea: In my view the easiest and most consistent solution is to simply extend the existing mechanism for completing halfway-through unwind blocks and let it deal with all unwind blocks. To make this approach applicable to terminating the active process, we suspend it first and then terminate it as a suspended process.

A commented code is enclosed.

I know it's pretty difficult to get a feedback on an ancient code; I'll be all the more grateful for your inputs.

Best regards,

Jaromir


PS:
Note the change in newProcess - `Processor activeProcess terminate` is replaced by `Processor activeProcess suspend` because there's no need for `terminate` at the bottom context ("terminate = unwind + suspend") and because it would lead to an infinite loop combined with my proposed changes to #terminate.

PPS:
More on the bugs:
ad 1) The #isTerminated condition `suspendedContext pc > suspendedContext startpc` is always met after executing the first instruction of the bottom context – generally not a sign of a terminated process.

ad 2) Explained in [1]. Concerns e.g. examples like
              [ [ Processor activeProcess terminate ] ensure: [ Processor activeProcess terminate ] ] fork.

ad 3-4) Explained in detail in [2].

One example to illustrate the bug:

| p |
p := [
              [
                           [ ] ensure: [
                                         [ ] ensure: [
                                                       Processor activeProcess suspend.
                                                       Transcript show: 'x1'].
                                         Transcript show: 'x2']
              ] ensure: [
                           Transcript show: 'x3']
] newProcess.
p resume.
Processor yield.
p terminate

===> x1

The unwind procedure prints just x1 and skips not only x2 but x3 as well ! You'd like to see them all.

ad 5) This happens in cases like (better save your image before trying this):
              [self error: 'e1'] ensure: [^2]        "discovered by Christoph Thiede"
and
              [self error: 'e1'] ensure: [self error: 'e2']
These are generally the types of situations causing the Unwind errors. The root cause is that the unwinding is done via simulation (#popTo) rather than 'directly'; the problem is during the simulated execution of unwind blocks a non-local return forwards the execution into a wrong stack - resulting in the Unwind errors.

ad 6) I just prefer a unified approach... unless I somehow overlooked a reason for two different approaches (I hope not).


[1] http://forum.world.st/A-bug-in-active-process-termination-crashing-image-td5128186.html
[2] http://forum.world.st/Another-bug-in-Process-gt-gt-terminate-in-unwinding-contexts-td5128171.html#a5128178

--
Cuis-dev mailing list
Cuis-dev at lists.cuis.st<mailto:Cuis-dev at lists.cuis.st>
https://lists.cuis.st/mailman/listinfo/cuis-dev


--
[https://10pines.github.io/email-signature/10pines-firma@2x.png]<https://10pines.com/>
Hernán Wilkinson
Software Developer, Teacher & Coach

Alem 896, Floor 6, Buenos Aires, Argentina

+54 11 6091 3125

@HernanWilkinson


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.cuis.st/mailman/archives/cuis-dev/attachments/20210413/04e8439a/attachment-0001.htm>


More information about the Cuis-dev mailing list