[Cuis-dev] Unwind mechanism during termination is broken and inconsistent
Juan Vuletich
juan at jvuletich.org
Tue Apr 13 11:45:17 PDT 2021
Hi Jaromir, hi all,
This is a most impressive work!
It reminded me of the talk Martin McClure gave at Smalltalks 2019:
https://www.youtube.com/watch?v=AvM5YrjK9AE . At minute 22:00, Martin
states the behavior he expects from #terminate and shows a test.
I tried his test in Cuis, and sometimes (but not always) it failed.
Jaromir, with your code, Martin's test now passes! In addition, I ran the
existing Process tests, and none of them fail.
I also tried all your tests and examples. They cover a really broad and
exhaustive set of scenarios! A few comments:
- Many examples don't work in Cuis, because we made #fork answer nil
(and not the forked process). The reason for this is explained in the
comment in #fork. Instead of "p := [stuff] fork.", the pattern is
"p := [stuff] newProcess. p resume." With this change, all the examples
work as expected.
- The notes for a few examples mention crashes in Cuis. I pushed some
fixes a few days ago, and now cmd-. works as expected. The examples
now behave the same as in Squeak. Please pull your repos.
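A minimal sketch of the #fork substitution from the first comment, in the
same workspace style as the examples. The Delay and the Transcript message
are only placeholders for "stuff"; the Delay just keeps the process alive
long enough to be terminated:

	| p |
	"In Cuis, [ ... ] fork answers nil, so keep the process answered by #newProcess"
	p := [ (Delay forSeconds: 10) wait.
		Transcript show: 'not reached' ] newProcess.
	"Schedule it explicitly, as #fork would"
	p resume.
	"Let p run until it blocks on the Delay"
	Processor yield.
	"p still refers to the process, so it can be terminated later"
	p terminate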
Additionally, I don't understand this remark of yours (Jaromir):
/////
ad 1) The #isTerminated condition `suspendedContext pc >
suspendedContext startpc` is always met after executing the first
instruction of the bottom context – generally not a sign of a terminated
process.
\\\\\\
I don't see that condition in either Squeak (#19435) or Cuis (#4567).
In any case, I think this is a great contribution. It fixes a lot of
broken behavior perfectly, and doesn't seem to bring problems. I'm all
for integrating it.
Jaromir, do you intend to turn the rest of your examples into tests? I
think we should do that too, so that we integrate all your experiments
and what you learned.
Thanks!
--
Juan Vuletich
www.cuis-smalltalk.org
https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev
https://github.com/jvuletich
https://www.linkedin.com/in/juan-vuletich-75611b3
@JuanVuletich
On 4/13/2021 6:02 AM, Jaromir Matas via Cuis-dev wrote:
>
> Hi Hernan, hi all,
>
> I'm enclosing a slightly improved version of #terminate along with a
> few semantic tests covering basic nested unwind scenarios (terminating
> active, suspended, blocked and ready processes in nested unwind
> blocks). More tests can be based on the examples in the enclosed
> collection, covering scenarios involving non-local returns and errors
> during termination and especially their combinations. A few examples
> explore the resiliency of the code when exposed to pathological
> situations (nested termination or errors).
>
> Thanks again for your time. I look forward to your response.
>
> best,
>
> Jaromir
>
> *From: *Jaromir Matas via Cuis-dev <mailto:cuis-dev at lists.cuis.st>
> *Sent: *Sunday, April 11, 2021 10:51
> *To: *Hernan Wilkinson <mailto:hernan.wilkinson at 10pines.com>;
> Discussion of Cuis Smalltalk <mailto:cuis-dev at lists.cuis.st>
> *Cc: *Jaromir Matas <mailto:mail at jaromir.net>
> *Subject: *Re: [Cuis-dev] Unwind mechanism during termination is
> broken and inconsistent
>
> Hi Hernan,
>
> Thank you very much for your immediate response.
>
> > Hi Jaromir,
>
> > thank you for sharing this with us! It looks very interesting.
>
> > As you can imagine a change of this magnitude and impact has to be
> analyzed in detail and tested rigorously, and that will take time.
>
> Oh absolutely, I'm aware of that and wouldn't expect anything less :)
>
> > It would help us deeply if there are tests that reproduce the
> errors, do you have them?
>
> Yes, I have a set of Squeak tests I'm planning to extend and make
> presentable ASAP. I'll "translate" them to Cuis too.
>
> > If not, could you write them? I know it is a difficult problem to
> test but if we could have automated tests for this it would help a lot.
>
> > Also, it would help to know how you tested it.
>
> That's important: I tested solely for 'correct' semantics in a series
> of simple scenarios I'll send you.
>
> > For example, have you tried in stress conditions?
>
> Nope, I'm not that experienced unfortunately :)
>
> > for example with an app that uses many processes (e.g. Seaside or
> any web framework). Please, do not take this question as
> disrespectful, but again,
>
> > it is a change with a big impact and we need to be sure it will
> keep the current behavior (except for the bugs, of course) and that
> it will not introduce new bugs.
>
> > I'm also wondering if there could be systems that are implemented
> as if these bugs you mention were "features" ...
>
> That's an interesting question. I realized that the sooner bugs this
> deep in the system are fixed, the less chance someone learns to 'live'
> with them, crippling their code, or even uses them as 'features'. I
> just hope that's not the case. How would we know? Well... actually,
> here's one: while testing it on Pharo I found two tests that sort of
> 'legitimize' or use the Unwind error (bug) to achieve certain
> behavior... (if I interpret the tests correctly).
>
> Thanks again for your questions!
>
> Jaromir
>
> > Thanks!
>
> > Hernan.
>
> *From: *Hernan Wilkinson <mailto:hernan.wilkinson at 10pines.com>
> *Sent: *Sunday, April 11, 2021 1:47
> *To: *Discussion of Cuis Smalltalk <mailto:cuis-dev at lists.cuis.st>
> *Cc: *Jaromir Matas <mailto:mail at jaromir.net>
> *Subject: *Re: [Cuis-dev] Unwind mechanism during termination is
> broken and inconsistent
>
> Hi Jaromir,
>
> thank you for sharing this with us! It looks very interesting.
>
> As you can imagine, a change of this magnitude and impact has to be
> analyzed in detail and tested rigorously, and that will take time.
> (Sadly, Juan and I are really busy at the moment, so it will take
> even more time :-) )
>
> It would help us deeply if there are tests that reproduce the errors,
> do you have them? If not, could you write them? I know it is a
> difficult problem to test but if we could have automated tests for
> this it would help a lot.
>
> Also, it would help to know how you tested it. For example, have you
> tried it under stress conditions? For example, with an app that uses
> many processes (e.g. Seaside or any web framework). Please, do not
> take this question as disrespectful, but again, it is a change with a
> big impact and we need to be sure it will keep the current behavior
> (except for the bugs, of course) and that it will not introduce new
> bugs.
>
> I'm also wondering if there could be systems that are implemented as
> if these bugs you mention were "features" ...
>
> Thanks!
>
> Hernan.
>
> On Sat, Apr 10, 2021 at 4:20 PM Jaromir Matas via Cuis-dev
> <cuis-dev at lists.cuis.st <mailto:cuis-dev at lists.cuis.st>> wrote:
>
> Hi All,
>
> I'd like to present to you a rewrite of #terminate and
> #isTerminated that fixes a few bugs and inconsistencies in process
> termination. I hope it might interest you.
>
> I'm a Smalltalk enthusiast with an educational background in math/CS.
> I've been experimenting with processes in Squeak lately and
> discovered a few bugs (or at least inconsistencies) in process
> termination and would like to offer and discuss a solution for Cuis.
>
> The bugs are not unique to Cuis; Squeak/Pharo inherited them too
> and to a degree even VisualWorks and VA are affected. The
> proposal presented here doesn't copy any VW or VA solution but
> rather implements a different approach :)
>
> Before boring you to death I'll list the bugs:
>
> 1. #isTerminated falsely reports almost any bottom context as
> terminated
>
> 2. an active process termination bug causes an image freeze
>
> 3. a /nested/ unwind bug: ensure blocks may get skipped during unwind
>
> 4. a failure to complete /nested/ unwind blocks that are halfway
> through execution
>
> 5. a failure to correctly execute a non-local return or an error
> in an unwind block
>
> 6. inconsistent semantics of protected blocks unwind during active
> vs. suspended process termination
>
> The current implementation of #terminate uses three different
> approaches to terminate a process:
>
> - the active process is terminated via a 'standard' unwind
> algorithm used in context #unwindTo: or #resume:,
>
> - a suspended process termination attempts completing unwind
> blocks halfway through their execution first using
> #runUntilErrorOrReturnFrom:
>
> - and after that the termination continues unwinding the remaining
> unwind blocks using the simulation algorithm of #popTo:
>
> This approach /looks/ inconsistent and indeed leads to the
> inconsistencies, undesirable behavior and instability mentioned
> above.
>
> The Idea: In my view the easiest and most consistent solution is
> to simply extend the existing mechanism for completing
> halfway-through unwind blocks and let it deal with *all* unwind
> blocks. To make this approach applicable to terminating the active
> process, we suspend it first and then terminate it as a suspended
> process.
>
> Commented code is enclosed.
>
> I know it's pretty difficult to get feedback on ancient code;
> I'll be all the more grateful for your input.
>
> Best regards,
>
> Jaromir
>
> PS:
>
> Note the change in newProcess - `Processor activeProcess
> terminate` is replaced by `Processor activeProcess suspend`
> because there's no need for `terminate` at the bottom context
> ("terminate = unwind + suspend") and because it would lead to an
> infinite loop combined with my proposed changes to #terminate.
>
> PPS:
>
> More on the bugs:
>
> ad 1) The #isTerminated condition `suspendedContext pc >
> suspendedContext startpc` is always met after executing the first
> instruction of the bottom context – generally not a sign of a
> terminated process.
>
> ad 2) Explained in [1]. Concerns e.g. examples like
>
> [ [ Processor activeProcess terminate ] ensure: [
> Processor activeProcess terminate ] ] fork.
>
> ad 3-4) Explained in detail in [2].
>
> One example to illustrate the bug:
>
>     | p |
>     p := [
>         [
>             [ ] ensure: [
>                 [ ] ensure: [
>                     Processor activeProcess suspend.
>                     Transcript show: 'x1'].
>                 Transcript show: 'x2']
>         ] ensure: [
>             Transcript show: 'x3']
>     ] newProcess.
>     p resume.
>     Processor yield.
>     p terminate
>
> ===> x1
>
> The unwind procedure prints just x1 and skips not only x2 but x3
> as well! You'd expect to see all three.
>
> ad 5) This happens in cases like (better save your image before
> trying this):
>
> [self error: 'e1'] ensure: [^2] "discovered by Christoph Thiede"
>
> and
>
> [self error: 'e1'] ensure: [self error: 'e2']
>
> These are generally the types of situations that cause the Unwind
> errors. The root cause is that the unwinding is done via
> simulation (#popTo:) rather than 'directly'; the problem is that
> during the simulated execution of unwind blocks, a non-local return
> forwards the execution into the wrong stack, resulting in the
> Unwind errors.
>
> ad 6) I just prefer a unified approach... unless I somehow
> overlooked a reason for two different approaches (I hope not).
>
> [1]
> http://forum.world.st/A-bug-in-active-process-termination-crashing-image-td5128186.html
>
> [2]
> http://forum.world.st/Another-bug-in-Process-gt-gt-terminate-in-unwinding-contexts-td5128171.html#a5128178
>
> --
> Cuis-dev mailing list
> Cuis-dev at lists.cuis.st <mailto:Cuis-dev at lists.cuis.st>
> https://lists.cuis.st/mailman/listinfo/cuis-dev
>
>
> --
> <https://10pines.com/>
> Hernán Wilkinson
> Software Developer, Teacher & Coach
> Alem 896, Floor 6, Buenos Aires, Argentina
> +54 11 6091 3125
> @HernanWilkinson
>