[Cuis-dev] Unwind mechanism during termination is broken and inconsistent
Juan Vuletich
juan at jvuletich.org
Thu Apr 29 12:34:26 PDT 2021
Hi Jaromir,
Yesterday I found some time to finally integrate your great fixes to
#terminate & friends. While doing this, I turned many of your examples
into tests. Then I did this one, based on your non-local-return examples:
test1ATerminate
"Terminate suspended process.
Test all nested unwind blocks are correctly unwound;
all unwind blocks halfway through their execution should be
completed."
| p a |
a _ Array new: 4 withAll: false.
p := [
self aux1A: a
] newProcess.
p resume.
Processor yield.
"make sure p is suspended and none of the unwind blocks has
finished yet"
self assert: p isSuspended.
a noneSatisfy: [ :b | b ].
"now terminate the process and make sure all unwind blocks have
finished"
p terminate.
self assert: p isTerminated.
self assert: a first & a third.
self deny: a second | a fourth.
aux1A: anArray
[
self aux1Ainner: anArray.
] ensure: [
anArray at: 3 put: true].
anArray at: 4 put: true
aux1Ainner: anArray
[ ] ensure: [
[Processor activeProcess suspend] ensure: [
^anArray at: 1 put: true].
anArray at: 2 put: true]
But this test fail. The 3rd entry doesn't get updated, even if it is in
an ensured block. I think this is a bug, and I wrote a fix.
The attached zip file with all the relevant code. Please review my tweak
to #terminate, and its effect on this test. If you agree with it, I'll
integrate it all.
Thanks,
On 4/14/2021 5:37 AM, Jaromir Matas via Cuis-dev wrote:
>
> Hi Juan, hi all,
>
> I’m enclosing an amended list of examples with fixed #fork and ‘self
> error’ patterns so that you can use them right away.
>
> > - The notes for a few examples speak of crashes in Cuis. I pushed
> some fixes a few days ago,
>
> > and now cmd-. works as expected. The examples behave the same as in
> Squeak.
>
> Yes, it’s perfect :) I removed the crash notes from the example list.
>
> A little correction: I haven’t realized Martin McClure was talking
> from a Visual Works position - they have different semantics of
> #valueUninterruptably (at least in 8.3 from 2017, using a semaphore).
>
> The Squeak semantics of #valueUninterruptably is “don’t let anything
> kill me” (using a non-local return in the unwind block).
>
> Best regards,
>
> Jaromir
>
> *From: *Jaromir Matas via Cuis-dev <mailto:cuis-dev at lists.cuis.st>
> *Sent: *Wednesday, April 14, 2021 0:20
> *To: *Juan Vuletich <mailto:juan at jvuletich.org>; Discussion of Cuis
> Smalltalk <mailto:cuis-dev at lists.cuis.st>
> *Cc: *Jaromir Matas <mailto:mail at jaromir.net>; Hernan Wilkinson
> <mailto:hernan.wilkinson at 10pines.com>
> *Subject: *Re: [Cuis-dev] Unwind mechanism during termination is
> broken and inconsistent
>
> Hi Juan,
>
> >
>
> > Hi Jaromir, hi all,
>
> >
>
> > This is a most impressive work!
>
> Many thanks for your encouragement - I'm really glad you like it :)
>
> >
>
> > It made me remember this talk given by Martin McClure in Smalltalks
> 2019: https://www.youtube.com/watch?v=AvM5YrjK9AE . At minute 22:00,
> Martin states the behavior he expects from #terminate, and shows a
> test. I tried his test in Cuis, and sometimes (but not always) it
> failed. Jaromir, with your code, Martin's test now pass! In addition,
> I ran the existing Process tests, and none is made to fail.
>
> Very interesting: I haven't touched the part for releasing critical
> sections (I thought it was an independent part I wanted to revisit
> later) but I'm relieved to hear it works properly with the proposed
> changes :)
>
> In the talk Martin McClure also mentioned #valueUninterruptably. I've
> noticed Cuis doesn't implement it but I think with the suggested
> #terminate it could be implemented safely (fingers crossed).
>
> BlockClosure >> valueUninterruptably
>
> ^ self ifCurtailed: [^ self]
>
> At the moment however, Squeak and Pharo suffer a horrible damage to
> the image if you try:
>
> [ self error ] valueUninterruptably
>
> and later e.g.:
>
> Semaphore new waitTimeoutMSecs: 50
>
> The worst part is the user won't notice anything and a simple code
> will run as if nothing happened - until an error or a crash happens.
> One of only a few ways to see something's off is e.g. Squeak's
> `Processor activeProcess = Project current uiProcess` which all of a
> sudden answers false (!)
>
> But that's just Squeak/Pharo; Cuis is not affected as badly because it
> doesn't use the `effectiveProcess` mechanism.
>
> Martin McClure also mentioned #valueUnpreemptively; funny thing is it
> doesn't work as designed because there's a bug in Process>>#priority
> which will leave the elevated process run at highest priority until it
> reaches a suspension point instead of bumping the priority back lower
> before returning from #valueUnpreemptively. It's discussed in this
> thread:
> http://forum.world.st/The-Inbox-Kernel-jar-1368-mcz-tp5126894p5126982.html
> . The conclusion is a new primitive for #priority is needed.
>
> >
>
> > I also tried all your tests and examples. They cover a really broad
> and exhaustive set of scenarios! A few comments:
>
> > - Many examples don't work in Cuis, because we made #fork answer nil
> (and not the forked process). The reason for this is explained in the
> comment in #fork. The pattern is instead of "p := [stuff] fork." do "p
> := [stuff] newProcess. p resume." With this change, all the examples
> work as expected.
>
> Oops, apologies, I noticed the fork pattern (very reasonable), I
> modified my #fork to return the process to avoid the need to modify
> all the examples I created for Squeak :) Same for 'self error'
> unfortunately - for Cuis
>
> self error
>
> in the examples has to change to:
>
> self error: 'unwind error'
>
> sorry...
>
> > - The notes for a few examples speak of crashes in Cuis. I pushed
> some fixes a few days ago, and now cmd-. works as expected. The
> examples behave the same as in Squeak. Please pull your repos.
>
> >
>
> I will, thanks!
>
> > Additionally, this remark of you (Jaromir) I don't understand:
>
> > /////
>
> > ad 1) The #isTerminated condition `suspendedContext pc >
> suspendedContext startpc` is always met after executing the first
> instruction of the bottom context – generally not a sign of a
> terminated process.
>
> > \\\\\\
>
> > I don't see that condition neither in Squeak (#19435) nor Cuis (#4567).
>
> >
>
> My bad explaining; I was talking about #isTerminated implementation in
> Cuis (and Pharo) using this old condition - on the last line of:
>
> Process >> isTerminated
>
> "Answer true if terminated, i.e. can never be resumed
> again, because have nothing to run."
>
> self isRunning ifTrue: [^ false].
>
> ^suspendedContext isNil
>
> or: [ suspendedContext pc isNil
>
> or: ["If the
> suspendedContext is the bottomContext it is the block in
> Process>>newProcess.
>
> If so, and
> the pc is greater than the startpc, the bock has alrteady sent and
> returned
>
> from value
> and there is nothing more to do."
>
>
> suspendedContext isBottomContext
>
>
> and: [
>
>
> suspendedContext pc > suspendedContext startpc]]]
>
> Squeak improved this condition a while ago but it's still insufficient
> :) It's discussed here:
> http://forum.world.st/The-Inbox-Kernel-jar-1376-mcz-tp5127335p5127341.html
>
> > In any case, I think this is a great contribution, It perfectly
> fixes a lot of broken behavior, and doesn't seem to bring problems.
> I'm all for integrating it.
>
> >
>
> > Jaromir, do you intend to turn the rest of your examples into tests?
> I think we should also do that, so we integrate all your experiments
> and what you learnt.
>
> >
>
> I'd love to but I realized I don't know how to build tests involving
> raising errors and pressing Abandon on the Debugger :D If you could
> point me in the right direction it'd save me a lot of trying :)
>
> Thanks a lot for all your comments. I'm happy you like the suggestion
> and would be delighted to see it integrated.
>
> > Thanks!
>
> >
>
> Best regards,
>
> Jaromir
>
> *From: *Juan Vuletich <mailto:juan at jvuletich.org>
> *Sent: *Tuesday, April 13, 2021 20:45
> *To: *Discussion of Cuis Smalltalk <mailto:cuis-dev at lists.cuis.st>
> *Cc: *Hernan Wilkinson <mailto:hernan.wilkinson at 10pines.com>; Jaromir
> Matas <mailto:mail at jaromir.net>
> *Subject: *Re: [Cuis-dev] Unwind mechanism during termination is
> broken and inconsistent
>
> Hi Jaromir, hi all,
>
> This is a most impressive work!
>
> It made me remember this talk given by Martin McClure in Smalltalks
> 2019: https://www.youtube.com/watch?v=AvM5YrjK9AE . At minute 22:00,
> Martin states the behavior he expects from #terminate, and shows a
> test. I tried his test in Cuis, and sometimes (but not always) it
> failed. Jaromir, with your code, Martin's test now pass! In addition,
> I ran the existing Process tests, and none is made to fail.
>
> I also tried all your tests and examples. They cover a really broad
> and exhaustive set of scenarios! A few comments:
> - Many examples don't work in Cuis, because we made #fork answer nil
> (and not the forked process). The reason for this is explained in the
> comment in #fork. The pattern is instead of "p := [stuff] fork." do "p
> := [stuff] newProcess. p resume." With this change, all the examples
> work as expected.
> - The notes for a few examples speak of crashes in Cuis. I pushed some
> fixes a few days ago, and now cmd-. works as expected. The examples
> behave the same as in Squeak. Please pull your repos.
>
> Additionally, this remark of you (Jaromir) I don't understand:
> /////
> ad 1) The #isTerminated condition `suspendedContext pc >
> suspendedContext startpc` is always met after executing the first
> instruction of the bottom context – generally not a sign of a
> terminated process.
> \\\\\\
> I don't see that condition neither in Squeak (#19435) nor Cuis (#4567).
>
> In any case, I think this is a great contribution, It perfectly fixes
> a lot of broken behavior, and doesn't seem to bring problems. I'm all
> for integrating it.
>
> Jaromir, do you intend to turn the rest of your examples into tests? I
> think we should also do that, so we integrate all your experiments and
> what you learnt.
>
> Thanks!
>
> --
> Juan Vuletich
> www.cuis-smalltalk.org <http://www.cuis-smalltalk.org>
> https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev
> https://github.com/jvuletich
> https://www.linkedin.com/in/juan-vuletich-75611b3
> @JuanVuletich
>
>
>
> On 4/13/2021 6:02 AM, Jaromir Matas via Cuis-dev wrote:
>
> Hi Hernan, hi all,
>
> I'm enclosing a slightly improved version of #terminate along with
> a few semantic tests covering basic nested unwind scenarios
> (terminating an active, suspended, blocked and ready processes in
> nested unwind blocks). More tests can be based on examples in the
> enclosed collection - for scenarios involving non-local returns
> and errors during termination and especially their combinations. A
> few examples explore resiliency of the code exposed to pathologic
> situations (nested termination or errors).
>
> Thanks again for your time. I look forward to your response.
>
> best,
>
> Jaromir
>
> *From: *Jaromir Matas via Cuis-dev <mailto:cuis-dev at lists.cuis.st>
> *Sent: *Sunday, April 11, 2021 10:51
> *To: *Hernan Wilkinson <mailto:hernan.wilkinson at 10pines.com>;
> Discussion of Cuis Smalltalk <mailto:cuis-dev at lists.cuis.st>
> *Cc: *Jaromir Matas <mailto:mail at jaromir.net>
> *Subject: *Re: [Cuis-dev] Unwind mechanism during termination is
> broken and inconsistent
>
> Hi Hernan,
>
> Thank you very much for your immediate response.
>
> > Hi Jaromir,
>
> > thank you for sharing this with us! It looks very interesting.
>
> > As you can imagine a change of this magnitude and impact has to
> be analyzed in detail and tested rigorously, and that will take time.
>
> Oh absolutely, I'm aware of that and wouldn't expect anything less :)
>
> > It would help us deeply if there are tests that reproduce the
> errors, do you have them?
>
> Yes, I have a set of Squeak tests I'm planning to extend and make
> presentable ASAP. I'll "translate" them to Cuis too.
>
> > If not, could you write them? I know it is a difficult problem
> to test but if we could have automated tests for this it would
> help a lot.
>
> > Also, it would help to know how you tested it.
>
> That's important: I tested solely for 'correct' semantics in a
> series of simple scenarios I'll send you.
>
> > For example, have you tried in stress conditions?
>
> Nope, I'm not that experienced unfortunately :)
>
> > for example with an app that uses many processes (ie. seaside or
> any web framework). Please, do not take this question as
> disrespectful, but again,
>
> > it is a change with a big impact and we need to be sure it will
> keep the current behavior (but the bugs of course) and it will not
> introduce new bugs.
>
> > I'm also wondering if there could be systems that are
> implemented as these bugs you mention were "features" ...
>
> That's an interesting question. I realized the sooner bugs this
> deep in the system are fixed the less chance someone learns to
> 'live' with them, crippling their code, or even uses them as
> 'features'. I just hope it's not the case. How would we know?
> Well... actually here's one: while testing it on Pharo I found two
> tests that sort of 'legitimize' or use the Unwind error (bug) to
> achieve certain behavior... (if I interpret the tests correctly).
>
> Thanks again for your questions!
>
> Jaromir
>
> > Thanks!
>
> > Hernan.
>
> *From: *Hernan Wilkinson <mailto:hernan.wilkinson at 10pines.com>
> *Sent: *Sunday, April 11, 2021 1:47
> *To: *Discussion of Cuis Smalltalk <mailto:cuis-dev at lists.cuis.st>
> *Cc: *Jaromir Matas <mailto:mail at jaromir.net>
> *Subject: *Re: [Cuis-dev] Unwind mechanism during termination is
> broken and inconsistent
>
> Hi Jaromir,
>
> thank you for sharing this with us! It looks very interesting.
>
> As you can imagine a change of this magnitude and impact has to
> be analyzed in detail and tested rigorously, and that will take
> time. (Sadly Juan and me are really busy at this time so it will
> even take more time :-) )
>
> It would help us deeply if there are tests that reproduce the
> errors, do you have them? If not, could you write them? I know it
> is a difficult problem to test but if we could have automated
> tests for this it would help a lot.
>
> Also, it would help to know how you tested it. For example, have
> you tried in stress conditions? for example with an app that uses
> many processes (ie. seaside or any web framework). Please, do not
> take this question as disrespectful, but again, it is a change
> with a big impact and we need to be sure it will keep the current
> behavior (but the bugs of course) and it will not introduce new bugs.
>
> I'm also wondering if there could be systems that are implemented
> as these bugs you mention were "features" ...
>
> Thanks!
>
> Hernan.
>
> On Sat, Apr 10, 2021 at 4:20 PM Jaromir Matas via Cuis-dev
> <cuis-dev at lists.cuis.st <mailto:cuis-dev at lists.cuis.st>> wrote:
>
> Hi All,
>
> I'd like to present to you a rewrite of #terminate and
> #isTerminated that fixes a few bugs and inconsistencies in
> process termination. I hoped it might interest you.
>
> I'm a Smalltalk enthusiast with education background in
> math/CS. I've been experimenting with processes in Squeak
> lately and discovered a few bugs (or at least inconsistencies)
> in process termination and would like to offer and discuss a
> solution for Cuis.
>
> The bugs are not unique to Cuis; Squeak/Pharo inherited them
> too and to a degree even Visual Works and VA are affected. The
> proposal presented here doesn't copy any VW or VA solution but
> rather implements a different approach :)
>
> Before boring you to death I'll list the bugs:
>
> 1. #isTerminated falsely reports almost any bottom context as
> terminated
>
> 2. an active process termination bug causes an image freeze
>
> 3. a /nested/ unwind bug: ensure blocks may get skipped during
> unwind
>
> 4. a failure to complete /nested/ unwind blocks halfway thru
> execution
>
> 5. a failure to correctly execute a non-local return or an
> error in an unwind block
>
> 6. inconsistent semantics of protected blocks unwind during
> active vs. suspended process termination
>
> The current implementation of #terminate uses three different
> approaches to terminate a process:
>
> - the active process is terminated via a 'standard' unwind
> algorithm used in context #unwindTo: or #resume:,
>
> - a suspended process termination attempts completing unwind
> blocks halfway through their execution first using
> #runUntilErrorOrReturnFrom:
>
> - and after that the termination continues unwinding the
> remaining unwind blocks using the simulation algorithm of #popTo:
>
> This approach /looks/ inconsistent and indeed leads to
> inconsistencies, undesirable behavior and an instability
> mentioned above.
>
> The Idea: In my view the easiest and most consistent solution
> is to simply extend the existing mechanism for completing
> halfway-through unwind blocks and let it deal with *all*
> unwind blocks. To make this approach applicable to terminating
> the active process, we suspend it first and then terminate it
> as a suspended process.
>
> A commented code is enclosed.
>
> I know it's pretty difficult to get a feedback on an ancient
> code; I'll be all the more grateful for your inputs.
>
> Best regards,
>
> Jaromir
>
> PS:
>
> Note the change in newProcess - `Processor activeProcess
> terminate` is replaced by `Processor activeProcess suspend`
> because there's no need for `terminate` at the bottom context
> ("terminate = unwind + suspend") and because it would lead to
> an infinite loop combined with my proposed changes to #terminate.
>
> PPS:
>
> More on the bugs:
>
> ad 1) The #isTerminated condition `suspendedContext pc >
> suspendedContext startpc` is always met after executing the
> first instruction of the bottom context – generally not a sign
> of a terminated process.
>
> ad 2) Explained in [1]. Concerns e.g. examples like
>
> [ [ Processor activeProcess terminate ] ensure:
> [ Processor activeProcess terminate ] ] fork.
>
> ad 3-4) Explained in detail in [2].
>
> One example to illustrate the bug:
>
> | p |
>
> p := [
>
> [
>
> [ ] ensure: [
>
> [ ] ensure: [
>
>
> Processor activeProcess suspend.
>
>
> Transcript show: 'x1'].
>
> Transcript show: 'x2']
>
> ] ensure: [
>
> Transcript show: 'x3']
>
> ] newProcess.
>
> p resume.
>
> Processor yield.
>
> p terminate
>
> ===> x1
>
> The unwind procedure prints just x1 and skips not only x2 but
> x3 as well ! You'd like to see them all.
>
> ad 5) This happens in cases like (better save your image
> before trying this):
>
> [self error: 'e1'] ensure: [^2]
> "discovered by Christoph Thiede"
>
> and
>
> [self error: 'e1'] ensure: [self error: 'e2']
>
> These are generally the types of situations causing the Unwind
> errors. The root cause is that the unwinding is done via
> simulation (#popTo) rather than 'directly'; the problem is
> during the simulated execution of unwind blocks a non-local
> return forwards the execution into a wrong stack - resulting
> in the Unwind errors.
>
> ad 6) I just prefer a unified approach... unless I somehow
> overlooked a reason for two different approaches (I hope not).
>
> [1]
> http://forum.world.st/A-bug-in-active-process-termination-crashing-image-td5128186.html
>
> [2]
> http://forum.world.st/Another-bug-in-Process-gt-gt-terminate-in-unwinding-contexts-td5128171.html#a5128178
>
> --
> Cuis-dev mailing list
> Cuis-dev at lists.cuis.st <mailto:Cuis-dev at lists.cuis.st>
> https://lists.cuis.st/mailman/listinfo/cuis-dev
>
>
> --
>
> <https://10pines.com/>
>
>
> Hernán Wilkinson
>
>
> Software Developer, Teacher & Coach
>
> Alem 896, Floor 6, Buenos Aires, Argentina
>
> +54 11 6091 3125
>
> @HernanWilkinson
>
--
Juan Vuletich
www.cuis-smalltalk.org
https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev
https://github.com/jvuletich
https://www.linkedin.com/in/juan-vuletich-75611b3
@JuanVuletich
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.cuis.st/mailman/archives/cuis-dev/attachments/20210429/4c8355c3/attachment-0001.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Terminate-fixesAndTests.zip
Type: application/x-zip-compressed
Size: 9334 bytes
Desc: not available
URL: <http://lists.cuis.st/mailman/archives/cuis-dev/attachments/20210429/4c8355c3/attachment-0001.bin>
More information about the Cuis-dev
mailing list