[Cuis-dev] Unwind mechanism during termination is broken and inconsistent
Jaromir Matas
mail at jaromir.net
Tue Apr 13 02:02:29 PDT 2021
Hi Hernan, hi all,
I'm enclosing a slightly improved version of #terminate along with a few semantic tests covering basic nested unwind scenarios (terminating active, suspended, blocked and ready processes inside nested unwind blocks). More tests can be based on the examples in the enclosed collection, for scenarios involving non-local returns and errors during termination, and especially their combinations. A few examples explore the resilience of the code when exposed to pathological situations (nested termination or errors).
Thanks again for your time. I look forward to your response.
best,
Jaromir
From: Jaromir Matas via Cuis-dev <cuis-dev at lists.cuis.st>
Sent: Sunday, April 11, 2021 10:51
To: Hernan Wilkinson <hernan.wilkinson at 10pines.com>; Discussion of Cuis Smalltalk <cuis-dev at lists.cuis.st>
Cc: Jaromir Matas <mail at jaromir.net>
Subject: Re: [Cuis-dev] Unwind mechanism during termination is broken and inconsistent
Hi Hernan,
Thank you very much for your immediate response.
> Hi Jaromir,
> thank you for sharing this with us! It looks very interesting.
> As you can imagine a change of this magnitude and impact has to be analyzed in detail and tested rigorously, and that will take time.
Oh absolutely, I'm aware of that and wouldn't expect anything less :)
> It would help us deeply if there are tests that reproduce the errors, do you have them?
Yes, I have a set of Squeak tests I'm planning to extend and make presentable ASAP. I'll "translate" them to Cuis too.
> If not, could you write them? I know it is a difficult problem to test but if we could have automated tests for this it would help a lot.
> Also, it would help to know how you tested it.
That's important: I tested solely for 'correct' semantics in a series of simple scenarios I'll send you.
> For example, have you tried in stress conditions?
Nope, I'm not that experienced unfortunately :)
> for example with an app that uses many processes (e.g. Seaside or any web framework). Please, do not take this question as disrespectful, but again,
> it is a change with a big impact and we need to be sure it will keep the current behavior (except for the bugs, of course) and that it will not introduce new bugs.
> I'm also wondering if there could be systems that are implemented as if the bugs you mention were "features" ...
That's an interesting question. I realized that the sooner bugs this deep in the system are fixed, the less chance there is that someone learns to 'live' with them, crippling their code, or even uses them as 'features'. I just hope it's not the case. How would we know? Well... actually here's one: while testing it on Pharo I found two tests that sort of 'legitimize' or use the Unwind error (bug) to achieve certain behavior... (if I interpret the tests correctly).
Thanks again for your questions!
Jaromir
> Thanks!
> Hernan.
From: Hernan Wilkinson <hernan.wilkinson at 10pines.com>
Sent: Sunday, April 11, 2021 1:47
To: Discussion of Cuis Smalltalk <cuis-dev at lists.cuis.st>
Cc: Jaromir Matas <mail at jaromir.net>
Subject: Re: [Cuis-dev] Unwind mechanism during termination is broken and inconsistent
Hi Jaromir,
thank you for sharing this with us! It looks very interesting.
As you can imagine a change of this magnitude and impact has to be analyzed in detail and tested rigorously, and that will take time. (Sadly Juan and I are really busy at this time so it will take even more time :-) )
It would help us deeply if there are tests that reproduce the errors, do you have them? If not, could you write them? I know it is a difficult problem to test but if we could have automated tests for this it would help a lot.
Also, it would help to know how you tested it. For example, have you tried in stress conditions? for example with an app that uses many processes (e.g. Seaside or any web framework). Please, do not take this question as disrespectful, but again, it is a change with a big impact and we need to be sure it will keep the current behavior (except for the bugs, of course) and that it will not introduce new bugs.
I'm also wondering if there could be systems that are implemented as if the bugs you mention were "features" ...
Thanks!
Hernan.
On Sat, Apr 10, 2021 at 4:20 PM Jaromir Matas via Cuis-dev <cuis-dev at lists.cuis.st> wrote:
Hi All,
I'd like to present to you a rewrite of #terminate and #isTerminated that fixes a few bugs and inconsistencies in process termination. I hope it might interest you.
I'm a Smalltalk enthusiast with an educational background in math/CS. I've been experimenting with processes in Squeak lately and discovered a few bugs (or at least inconsistencies) in process termination and would like to offer and discuss a solution for Cuis.
The bugs are not unique to Cuis; Squeak/Pharo inherited them too and to a degree even VisualWorks (VW) and VA are affected. The proposal presented here doesn't copy any VW or VA solution but rather implements a different approach :)
Before boring you to death I'll list the bugs:
1. #isTerminated falsely reports almost any bottom context as terminated
2. an active process termination bug causes an image freeze
3. a nested unwind bug: ensure blocks may get skipped during unwind
4. a failure to complete nested unwind blocks halfway through execution
5. a failure to correctly execute a non-local return or an error in an unwind block
6. inconsistent semantics of protected blocks unwind during active vs. suspended process termination
The current implementation of #terminate uses three different approaches to terminate a process:
- the active process is terminated via a 'standard' unwind algorithm used in context #unwindTo: or #resume:,
- a suspended process termination first attempts to complete unwind blocks that are halfway through their execution, using #runUntilErrorOrReturnFrom:,
- and after that the termination continues unwinding the remaining unwind blocks using the simulation algorithm of #popTo:.
This approach looks inconsistent and indeed leads to the inconsistencies, undesirable behavior and instability listed above.
The Idea: In my view the easiest and most consistent solution is to simply extend the existing mechanism for completing halfway-through unwind blocks and let it deal with all unwind blocks. To make this approach applicable to terminating the active process, we suspend it first and then terminate it as a suspended process.
Commented code is enclosed.
I know it's pretty difficult to get feedback on such ancient code; I'll be all the more grateful for your input.
Best regards,
Jaromir
PS:
Note the change in newProcess - `Processor activeProcess terminate` is replaced by `Processor activeProcess suspend` because there's no need for `terminate` at the bottom context ("terminate = unwind + suspend") and because it would lead to an infinite loop combined with my proposed changes to #terminate.
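For readers without the changeset at hand, the change described in this PS might look roughly like the sketch below (a method on BlockClosure). The method body is a simplified paraphrase of how newProcess is commonly written in Squeak-family images and is an assumption, not a copy of the enclosed code; only the replaced statement comes from the text above.

```smalltalk
newProcess
	"Simplified sketch (assumed shape, not the enclosed changeset).
	 Answer a Process running the code in the receiver. The process is not scheduled."
	^ Process
		forContext: [
			self value.
			"Previously: Processor activeProcess terminate.
			 At the bottom context there is nothing left to unwind, so suspending is enough."
			Processor activeProcess suspend ] asContext
		priority: Processor activePriority
```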
PPS:
More on the bugs:
ad 1) The #isTerminated condition `suspendedContext pc > suspendedContext startpc` is always met after executing the first instruction of the bottom context – generally not a sign of a terminated process.
ad 2) Explained in [1]. It concerns examples such as
[ [ Processor activeProcess terminate ] ensure: [ Processor activeProcess terminate ] ] fork.
ad 3-4) Explained in detail in [2].
One example to illustrate the bug:
| p |
p := [
    [
        [ ] ensure: [
            [ ] ensure: [
                Processor activeProcess suspend.
                Transcript show: 'x1'].
            Transcript show: 'x2']
    ] ensure: [
        Transcript show: 'x3']
] newProcess.
p resume.
Processor yield.
p terminate
===> x1
The unwind procedure prints just x1 and skips not only x2 but x3 as well! You'd like to see all three printed.
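The same scenario can be turned into a check that does not depend on the Transcript. This is only a sketch: `log` is a name introduced here for illustration, and the expected result assumes the semantics proposed in this message (every pending unwind block runs to completion). Per the description above, the current #terminate would leave only 'x1' in the collection.

```smalltalk
| p log |
log := OrderedCollection new.	"hypothetical helper; records which unwind blocks ran"
p := [
    [
        [ ] ensure: [
            [ ] ensure: [
                Processor activeProcess suspend.
                log add: 'x1'].
            log add: 'x2']
    ] ensure: [
        log add: 'x3']
] newProcess.
p resume.
Processor yield.	"let p run until it suspends itself inside the innermost ensure block"
p terminate.
log asArray	"expected with the proposed #terminate: #('x1' 'x2' 'x3')"
```

Evaluating this in a workspace and inspecting the last expression shows which unwind blocks were completed during termination.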
ad 5) This happens in cases like (better save your image before trying this):
[self error: 'e1'] ensure: [^2] "discovered by Christoph Thiede"
and
[self error: 'e1'] ensure: [self error: 'e2']
These are generally the types of situations causing the Unwind errors. The root cause is that the unwinding is done via simulation (#popTo:) rather than 'directly': during the simulated execution of unwind blocks, a non-local return forwards the execution into the wrong stack, resulting in the Unwind errors.
ad 6) I just prefer a unified approach... unless I somehow overlooked a reason for two different approaches (I hope not).
[1] http://forum.world.st/A-bug-in-active-process-termination-crashing-image-td5128186.html
[2] http://forum.world.st/Another-bug-in-Process-gt-gt-terminate-in-unwinding-contexts-td5128171.html#a5128178
-------------- next part --------------
A non-text attachment was scrubbed...
Name: BaseImageTests-jar.001.cs.st
Type: application/octet-stream
Size: 6182 bytes
Desc: BaseImageTests-jar.001.cs.st
URL: <http://lists.cuis.st/mailman/archives/cuis-dev/attachments/20210413/593551fb/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 4527-CuisCore-JaromirMatas-2021Apr08-21h20m-jar.002.cs.st
Type: application/octet-stream
Size: 4353 bytes
Desc: 4527-CuisCore-JaromirMatas-2021Apr08-21h20m-jar.002.cs.st
URL: <http://lists.cuis.st/mailman/archives/cuis-dev/attachments/20210413/593551fb/attachment-0003.obj>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Process - terminate examples Cuis.txt
URL: <http://lists.cuis.st/mailman/archives/cuis-dev/attachments/20210413/593551fb/attachment-0001.txt>