[Cuis-dev] Unwind mechanism during termination is broken and inconsistent

Sun May 16 13:26:47 PDT 2021

Hi Juan,

I'm enclosing an updated changeset (please disregard the previous one); I made a stupid assumption about #runUntilErrorOrReturnFrom:. The new changeset leaves it untouched and adds its replica called #runUnwindUntilErrorOrReturnFrom: with the required functionality.

I'd like to point out a little issue with the proposed semantics of nested errors during termination:

Below’s a simple example. If you abandon the first debugger it means you want to terminate the execution and unwind the terminated process but an error happens during the unwind. You get another debugger reporting the error and asking you what to do next - to debug the error or proceed. But proceeding a termination really means to proceed with the *unwind business only* and not to proceed with the original computation. Here's my point: Hitting Proceed will resume the original computation beyond the unwind blocks but Abandon will only finish the unwind procedure which is exactly what we want. In Squeak I simply disabled the Proceed button but in Cuis I couldn't see such a simple solution so I left it unresolved.

[self error: 'x1'] ensure: [
    [self error: 'x2'] ensure: [
        Transcript show: '2'].
    Transcript show: '1'].
Transcript show: '0'

---> you want to see '21' but proceeding the debugger will give you '210' where '0' is printed outside of unwind blocks.

So just that you know there's still a little inconsistency in the termination semantics.

One more question: do you have an idea why BlockCannotReturn error is considered resumable? I'd say it's unresumable by definition…? Thanks :)

best regards,

Jaromir

From: Jaromir Matas via Cuis-dev<mailto:cuis-dev at lists.cuis.st>
Sent: Wednesday, May 12, 2021 17:02
To: Discussion of Cuis Smalltalk<mailto:cuis-dev at lists.cuis.st>
Cc: Jaromir Matas<mailto:mail at jaromir.net>; Juan Vuletich<mailto:juan at jvuletich.org>; Hernan Wilkinson<mailto:hernan.wilkinson at 10pines.com>
Subject: Re: [Cuis-dev] Unwind mechanism during termination is broken and inconsistent

Hi Juan,

I apologize for responding so late to your great comments. I felt I had to educate myself in error handling first :)

> m1
>    | a |
>    a := Array new: 3.
>    self m2: a.
>    a at: 3 put: true.
>    a print.
>
>  m2: a   "A"
>    [1 + 2] ensure: [
>      [ 3 + 4 ] ensure: [    "*1"
>        true ifTrue: [
>           ^a at: 1 put: true ]]. "*2"
>      a at: 2 put: true ]
>
>  In this example, the *1ensure is there only to guarantee that *2is ran, even if [3+4] happens to fail. If [3+4] it runs without problems, the result should be exactly the same as:
>
>  m2: a   "B"
>    [1 + 2] ensure: [
>      3 + 4.
>      true ifTrue: [
>        ^a at: 1 put: true ].
>      a at: 2 put: true ]
>
>  Applying the same argument, the result should be the same as:
>
>  m2: a   "C"
>    1 + 2.
>    3 + 4.
>    true ifTrue: [
>      ^a at: 1 put: true ].
>    a at: 2 put: true
>
>  In implementation C it is clear that a second isNil. So, the same should be the case for B and A.

This is a beautiful example! Thanks a lot, very enlightening. I fully agree, my suggestion was incorrect. The logic you are showing says to leave the block where the non-local return executes without unwinding but to continue unwinding after that block. Very interesting!

> x := nil.
> [self error: 'x1'] ensure: [
>   [self error: 'x2'] ensure: [
>     [self error: 'x3'] ensure: [
>       x:=3].
>     x:=2].
>   x:=1].
> x
>
> I think this case is very similar to the one above.. For example, if we proceed the first debugger (the x1 error), the x2 debugger opens. If we abandon it, we are abandoning the execution of the first ensured block (that includes the x := 1 assignment at the end). So, no assignment is done. I think that the behavior of our fix is correct in this case. No need to simulate "resuming".

In this case I'd disagree. Abandoning is currently implemented and interpreted as terminating so I chose to follow this logic and made abandoning equivalent to sending #terminate, meaning the unwind will continue and execute the second error, open a second debugger etc. Please try out the enclosed changeset; it also follows the non-local logic you suggested above. We've also discussed the issue with Christoph Thiede here: http://forum.world.st/Solving-multiple-termination-bugs-summary-proposal-tp5128285p5129113.html

The main idea behind the extended version of #terminate is using the return value from #runUntilErrorOrReturnFrom: to continue the unwind based on the exception found.

I've also fixed a few minor bug in the current implementation:

1) using suspendedContext variable during unwind is an incorrect approach that may lead to leaving unterminated context chains behind which the garbage collector wouldn't recycle; I'm using a local temp 'top' instead.

2) I removed the part looking for an 'inner' context halfway through execution - it's unnecessary and theoretically even incorrect - and use outerMost as the unwind context that needs to be completed.

3) the condition in the middle (`ifNil: [^self]`) became unnecessary after your fix.

4) there was an issue with infinite recursion in case of #cannotReturn and #doesNotUnderstand - this has been fixed by explicitly excluding the two exceptions from returning.

5) #cannotReturn generally causes a VM crash if it attempts to return - this is fixed by looping #cannotReturn back to itself and the way out leads via Abandoning the debugger window (even accidental pressing Proceed crashes your image so I guess it's worth fixing in every case)

6) #runUntilErrorOrReturnFrom: required a slight modification - I suggest either modify it or we can create a copy with this modified behavior; I chose to modify it because no current method would be affected by the change.

As another example illustrating the changes you can try:

    a := Array new: 4 withAll: false.
    p := [
             [
            [] ensure: [
                           [ ^2 ] ensure: [
                                                      a at: 1 put: true].
                           a at: 2 put: true]
                           ] ensure: [a at: 3 put: true].
                           a at: 4 put: true
        ] newProcess.
    p resume.

It crashes the image without amending #cannotReturn but works well with the enclosed changeset and returns
correctly a equal to #(true false true false).

Also this example is catastrophic but dealt with safely with the new changeset:

   [^2] newProcess resume

I haven't built any testcases yet but I definitely plan to if you find this interesting.

Many thanks for discussing this issue; any comments will be greatly appreciated :)

best,

Jaromir

From: Juan Vuletich via Cuis-dev<mailto:cuis-dev at lists.cuis.st>
Sent: Friday, April 30, 2021 19:32
To: Discussion of Cuis Smalltalk<mailto:cuis-dev at lists.cuis.st>
Cc: Juan Vuletich<mailto:juan at jvuletich.org>; Jaromir Matas<mailto:mail at jaromir.net>; Hernan Wilkinson<mailto:hernan.wilkinson at 10pines.com>
Subject: Re: [Cuis-dev] Unwind mechanism during termination is broken and inconsistent

Hi Jaromir,

(inline)
On 4/30/2021 7:01 AM, Jaromir Matas via Cuis-dev wrote:
Hi Juan,

Awesome! I remember the idea of using the result of #runUntilErrorOrReturnFrom crossed my mind for a fleeting moment… and then I lost it ;) With your permission I’d like to use your fix in my Squeak version as well.

I'm glad you like it. Yes, of course. In these low level details, it is a good thing to have as much common code as possible!

Besides, in general, all the code we publish for Cuis is MIT license, and free for any use.

I’ve tried to rewrite your #test1ATerminate without the method calls – and indeed it passes… and that’s why I missed that – it was too simple; when using sends is where your fix comes to the rescue – THANKS!

    | p a |
    a := Array new: 4 withAll: false.
    p := [
             [
                              [ ] ensure: [
                                             [Processor activeProcess suspend] ensure: [
                                                           ^a at: 1 put: true].    "line L1"
                                             a at: 2 put: true]                     "line L2"
                              ] ensure: [a at: 3 put: true].
                              a at: 4 put: true
        ] newProcess.
    p resume.
    Processor yield.
    "make sure p is suspended and none of the unwind blocks has finished yet"
    self assert: p isSuspended.
    a noneSatisfy: [ :b | b ].
    "now terminate the process and make sure all unwind blocks have finished"
    p terminate.
    self assert: p isTerminated.
    self assert: a first & a third.
    self assert: (a second | a fourth) not.

Yes it does. I just thought that a more real-life like test of non local returns should also include actual method calls!
I’d like to raise a question here: I feel the second item, on line L2 should ideally execute too because it’s inside an unwind block halfway through it’s termination. The problem is though the non-local return at line L1 invokes it’s own unwind algorithm in #resume:through: which ignores halfway through unwind blocks – the reason for that is #resume:through: operates on the active process’s stack which makes it extremely difficult to unwind halfway through blocks. I tried to apply a similar tactics like in termination (control the unwind from another stack) and it works well but it’s very intrusive… I may open a separate discussion on that later to share the results. Do you think it may be worth exploring or it’s just not worth the bother?

Well. This is not just in the case of process #terminate, right? To play with this without involving process handling, but including actual method calls I just tried this:

m1
    | a |
    a := Array new: 3.
    self m2: a.
    a at: 3 put: true.
    a print.

m2: a      "A"
    [1 + 2] ensure: [
        [ 3 + 4 ] ensure: [       "*1"
            true ifTrue: [
                ^a at: 1 put: true ]]. "*2"
        a at: 2 put: true ]

In this example, the *1ensure is there only to guarantee that *2is ran, even if [3+4] happens to fail. If [3+4] it runs without problems, the result should be exactly the same as:

m2: a     "B"
    [1 + 2] ensure: [
        3 + 4.
        true ifTrue: [
            ^a at: 1 put: true ].
        a at: 2 put: true ]

Applying the same argument, the result should be the same as:

m2: a     "C"
    1 + 2.
    3 + 4.
    true ifTrue: [
        ^a at: 1 put: true ].
    a at: 2 put: true

In implementation C it is clear that a second isNil. So, the same should be the case for B and A.

I think that an ensured block should be guaranteed to run without external interference. But if it decides on its own to exit before running all its statements, it is it's own decision.

Another issue: I considered using the error part of the result of #runUntilErrorOrReturnFrom: to deal with situations like this (careful – crashes the Cuis image without the terminate fix; with the fix it works “ok”):

x := nil.
[self error: 'x1'] ensure: [
    [self error: 'x2'] ensure: [
        [self error: 'x3'] ensure: [
            x:=3].
        x:=2].
    x:=1].
x

Here you have nested errors and the question is: If we abandon the Debugger window, what do we want to see as a result? Without the fix the image crashes badly with unwind errors, with the fix however the Debugger closes without unwinding – it’s a consequence of # runUntilErrorOrReturnFrom: behavior – it returns errors rather than opens a debugger (and leaves the decision with the user). So what do we want to see as a result – keep opening debugger windows and abandoning them manually or ignoring the errors and executing the assignments? That sounds like “resuming” rather than abandoning to me so at the moment I don’t know and will have to think about it. I just didn’t want to complicate the #terminate prematurely :)

I think this case is very similar to the one above.. For example, if we proceed the first debugger (the x1 error), the x2 debugger opens. If we abandon it, we are abandoning the execution of the first ensured block (that includes the x := 1 assignment at the end). So, no assignment is done. I think that the behavior of our fix is correct in this case. No need to simulate "resuming".

Juan, many thanks again, I’ll study your tests and learn from them.

I'm happy to be of help. There's not nothing in those tests that could be new to you. All I did was to add nested method calls, and add the #resume cases, that already did work with the fix. Let me thank you. You did a great analysis of the issues at hand, and your fix is a great contribution.

I'll integrate it right now. If any further analysis provides additional changes, we'll integrate them too.

Best regards,

jaromir

Cheers,

--

Juan Vuletich

www.cuis-smalltalk.org<http://www.cuis-smalltalk.org>

https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev

https://github.com/jvuletich

https://www.linkedin.com/in/juan-vuletich-75611b3

@JuanVuletich

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.cuis.st/mailman/archives/cuis-dev/attachments/20210516/d581f38b/attachment-0001.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 4588-CuisCore-JaromirMatas-2021May09-20h43m-jar.002.cs.st
Type: application/octet-stream
Size: 7484 bytes
Desc: 4588-CuisCore-JaromirMatas-2021May09-20h43m-jar.002.cs.st
URL: <http://lists.cuis.st/mailman/archives/cuis-dev/attachments/20210516/d581f38b/attachment-0001.obj>