[Cuis-dev] Unwind mechanism during termination is broken and inconsistent

Sat Apr 10 12:20:16 PDT 2021

Hi All,

I'd like to present a rewrite of #terminate and #isTerminated that fixes a few bugs and inconsistencies in process termination. I hope it will interest you.

I'm a Smalltalk enthusiast with an educational background in math/CS. I've been experimenting with processes in Squeak lately and discovered a few bugs (or at least inconsistencies) in process termination, and I would like to offer and discuss a solution for Cuis.

The bugs are not unique to Cuis; Squeak and Pharo have them too, and to a degree even VisualWorks and VA are affected. The proposal presented here doesn't copy any VW or VA solution but rather implements a different approach :)

Before boring you to death I'll list the bugs:

1. #isTerminated falsely reports almost any bottom context as terminated

2. an active process termination bug causes an image freeze

3. a nested unwind bug: ensure blocks may get skipped during unwind

4. a failure to complete nested unwind blocks halfway through execution

5. a failure to correctly execute a non-local return or an error in an unwind block

6. inconsistent semantics of protected blocks unwind during active vs. suspended process termination

The current implementation of #terminate uses three different approaches to terminate a process:
- the active process is terminated via the 'standard' unwind algorithm used in context #unwindTo: or #resume:;
- termination of a suspended process first attempts to complete any unwind blocks caught halfway through their execution, using #runUntilErrorOrReturnFrom:;
- after that, the termination continues unwinding the remaining unwind blocks using the simulation algorithm of #popTo:.

This approach looks inconsistent and indeed leads to the inconsistencies, undesirable behavior and instability listed above.

The Idea: In my view the easiest and most consistent solution is to simply extend the existing mechanism for completing halfway-through unwind blocks and let it deal with all unwind blocks. To make this approach applicable to terminating the active process, we suspend it first and then terminate it as a suspended process.

Commented code is enclosed.

I know it's pretty difficult to get feedback on ancient code; I'll be all the more grateful for your input.

Best regards,

Jaromir

PS:
Note the change in newProcess: `Processor activeProcess terminate` is replaced by `Processor activeProcess suspend`, because there's no need for `terminate` at the bottom context ("terminate = unwind + suspend") and because it would lead to an infinite loop when combined with my proposed changes to #terminate.
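
For readers without the change set at hand, the newProcess change has roughly the following shape. This is only a sketch: it assumes BlockClosure>>newProcess is built on Process class>>forContext:priority: and #asContext, as in Squeak-family images, and it leaves out the method comment and any primitive pragma. The enclosed change set is authoritative.

```smalltalk
newProcess
    "Sketch only; the surrounding method shape is an assumption."
    ^ Process
        forContext: [
            self value.
            "was: Processor activeProcess terminate"
            Processor activeProcess suspend ] asContext
        priority: Processor activePriority
```

Once the receiver block has returned there is nothing left to unwind at the bottom context, so suspending the process is all that remains to be done.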

PPS:
More on the bugs:
ad 1) The #isTerminated condition `suspendedContext pc > suspendedContext startpc` is always met after executing the first instruction of the bottom context – generally not a sign of a terminated process.

ad 2) Explained in [1]. It concerns examples such as
    [ [ Processor activeProcess terminate ] ensure: [ Processor activeProcess terminate ] ] fork.

ad 3-4) Explained in detail in [2].

One example to illustrate the bug:

| p |
p := [
    [
        [ ] ensure: [
            [ ] ensure: [
                Processor activeProcess suspend.
                Transcript show: 'x1'].
            Transcript show: 'x2']
    ] ensure: [
        Transcript show: 'x3']
] newProcess.
p resume.
Processor yield.
p terminate

===> x1

The unwind procedure prints just x1 and skips not only x2 but x3 as well! You'd like to see them all.

ad 5) This happens in cases like (better save your image before trying this):
    [self error: 'e1'] ensure: [^2]        "discovered by Christoph Thiede"
and
    [self error: 'e1'] ensure: [self error: 'e2']
These are generally the kinds of situations that cause the Unwind errors. The root cause is that the unwinding is done via simulation (#popTo:) rather than 'directly': during the simulated execution of unwind blocks, a non-local return forwards execution into the wrong stack, resulting in the Unwind errors.

ad 6) I just prefer a unified approach... unless I somehow overlooked a reason for two different approaches (I hope not).

[1] http://forum.world.st/A-bug-in-active-process-termination-crashing-image-td5128186.html
[2] http://forum.world.st/Another-bug-in-Process-gt-gt-terminate-in-unwinding-contexts-td5128171.html#a5128178

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 4527-CuisCore-JaromirMatas-2021Apr08-21h20m-jar.001.cs.st
Type: application/octet-stream
Size: 4236 bytes
Desc: 4527-CuisCore-JaromirMatas-2021Apr08-21h20m-jar.001.cs.st
URL: <http://lists.cuis.st/mailman/archives/cuis-dev/attachments/20210410/1e5e2e7e/attachment.obj>