[Cuis-dev] Exception handler blocks with non-local returns

Mon Nov 4 22:15:06 PST 2019

Hi,

On 11/4/19 08:27, Phil B wrote:
>     Are we writing code to help others help us, or just to please
>     ourselves?
> 
> Both... what would be the point of contributing if we didn't like the 
> result ourselves?

Sorry, I meant something else, more like if we write stuff thinking 
about more than just ourselves, chances are others will have a better 
time with it, so when it comes time for them to do their bit and help 
the collective the result will be better for everyone.

>        What are the long term consequences of those two approaches?

So if instead there's no concern for how others do their bit, i.e. if 
the code isn't written to help others manage it better, eventually 
change stops because the cost is too much to bear.

> That's why I'm arguing for the functionality.  Eliminating the 
> functionality that is provided by '^' has short and long term 
> consequences that I think are problematic.

I think we agreed on this particular bit already :).

> I can provide code for one example and talk through a couple others:
> 
> 1) In OMeta, there are a few places where exception handlers capture 
> rule failures.  In one case 
> (https://github.com/pbella/OMeta-Cuis/commit/4884384de762ca8508b35a8952054f2333b40f26#diff-51b2428bf35caf95ff9fd8b0b88d37d8 
> - see line 677 of OMeta2Extensions.pck.st 
> <http://OMeta2Extensions.pck.st>) the non-local return is pointless: it 
> has nothing to do with handling the exception so it makes perfect sense 
> to remove it since we would be returning that value had the exception 
> not occurred.

Right, that should be removed, it's overly forceful for what needs to 
happen.

> However in 
> https://github.com/pbella/OMeta-Cuis/blob/4884384de762ca8508b35a8952054f2333b40f26/OMeta2Preload.st#L544 
> and 
> https://github.com/pbella/OMeta-Cuis/blob/4884384de762ca8508b35a8952054f2333b40f26/OMeta2Preload.st#L606 
> the non-local return is related to handling the exception (i.e. there is 
> nothing more we can do in the current method, return to sender) so it 
> seems to make sense to keep the functionality in the handler.

Ok, that makes sense.

> Juan 
> showed that this can be worked around by effectively creating an 'outer' 
> handler around the handler block which performs the non-local return and 
> it does work.  But how is this an improvement?  The non-local return is 
> still being done, just not from within the exception handler where it 
> really belongs.  So now the exception framework doesn't have a full 
> picture of what's going on and the code is a bit less straightforward.

I can see the motivation for wanting non-local return behavior in 
exception handlers.  Sometimes it's just really convenient.

> 2) Networking code.  Sometimes you will have a method whose job it is to 
> send/receive some data which makes a lot of implicit assumptions which 
> can be invalidated at a moment's notice.  Connections (esp. http) have a 
> tendency to just disappear which is to be expected and normal.  When 
> that happens, there's often no point in either trying to continue in the 
> current method (you can't do anything to 'fix' it... you no longer have 
> a connection to send to/receive from) or raise an exception up the call 
> stack (the sender can't do anything either): just stop what you were 
> doing and return to the sender whatever the most appropriate return 
> value is. (i.e. the failure should terminate the send/receive operation 
> and the sender can continue but sometimes doesn't need or want to know 
> that it failed)

I'm more familiar with how a message such as #receive would be 
implemented in some socket class, than how applications tend to write 
code that uses that functionality.  So, suppose you had a method like this:

doTheHttpThing

	| data |
	self prepareToDoTheHttpThing.
	data := [self socket receive] on: SocketError do: [:ex | ^nil].
	data preprocess filter blah blah.
	^data

So now the problem that the on:do: is dealing with is what happens when 
the socket disconnects from under the client code.  Then #receive fails, 
there is no recourse, and the answer should be nil.  One could do the 
wrap around like this:

	data := [self socket receive] on: SocketError do: [:ex | nil].
	data isNil ifTrue: [^nil].

and I'd be tempted to write it that way now because who knows that ^nil 
does today.  But suppose that no, that one would rather not have extra 
statements.  Then one might want something like [:ex | ex methodReturn: 
nil] instead.

In this case, why couldn't that be written like this?

	| data |
	^[
		self prepareToDoTheHttpThing.
		data := self socket receive.
		data preprocess filter blah blah.
		data
	] on: SocketError do: [:ex | nil]

Now there's no non-local return as far as exceptions are concerned, and 
by the way the code is faster in all cases because non-local return is 
expensive.

Once I wrote a multi-process web spider, and I remember writing the code 
so that the actual networking interaction happened in a very small place 
that could be controlled easily with patterns such as the above.  Is the 
problem that application code doesn't always factor that nicely to allow 
that implementation strategy?  I do not have enough of a sample to tell.

> For example, I have a periodic task which involves ~10k http requests.  
> A couple hundred of those will fail each run for various reasons: 
> network errors, server down, server errors, page errors etc.  There are 
> exception handlers at different levels of connection/request handling 
> that decide if the error is something that needs to be specifically 
> handled / retried or if terminating the request (or perhaps it was 
> terminated on us) and returning some specific value to the sender is the 
> appropriate result.  These are almost always non-local returns since 
> we're failing right in the middle of processing a request which is no 
> longer valid.

I suspect this is one of those cases where the code just doesn't factor 
nicely, right?

> 3) FFI code and dealing with the 'outside world' more generally.  
> Similar situation as 2: something fails either in an expected place or 
> in an expected way.  The current method can't continue but the sender 
> can.  Examples: a non-critical log file or database can't be found or 
> has a problem... (maybe) log something to transcript and return from the 
> method in question.  A backup file can't be copied to a secondary backup 
> location... we probably aren't connected to the network so ignore it and 
> try again the next time we get called.  I have several of these: fatal 
> to the current method but the sender doesn't care.
> 
> The thing all of these examples have in common is that they are all 
> essentially resumable, fatal exceptions... that need to resume somewhere 
> else.

Maybe more like critical exceptions that are not fatal simply because of 
the context in which they occur.

You know, it just occurred to me that having such exceptions implement a 
#defaultAction method isn't really useful, either.  Suppose you could 
arrange the code so that the non-fatal yet critical exceptions were 
modeled by a particular class.  Instead of a million exception handlers 
everywhere doing something like

	[...] on: NonFatalIOError do: [:ex | ex methodReturn: nil]

one would like #defaultAction to do that.  However, #defaultAction 
cannot do that now because a non-local return in the #defaultAction 
method of the exception does not do what is required.

If exceptions implemented #methodReturn: instead, then it might be 
possible for #defaultAction to be written like this:

NonFatalIOError>>defaultAction

	self methodReturn: nil

This is not perfect yet, and one still needs to mark the return point 
with on:do:, but maybe there's a way to write even less code here.

>     Yeah, I can see this in general.  I'd even phrase it as "from a best
>     practices standpoint, we recommend avoiding non-local returns in
>     exception handler blocks, but if you choose to do that then using the
>     #methodReturn: method provided for that effect helps the maintainers of
>     the exception framework keep things working as intended because said
>     method *defines* what it means to do a non-local return in this
>     context".
> 
> I'm fine with that.

Awesome, this is in the spirit of my initial comments above.

>     But let's wait until we get there, and especially after we get a feel
>     for the most difficult cleanup cases.  There's no need to put that down
>     in stone right now.
> 
> C'mon folks, the train is at the station and I'm on board.  Let's get 
> moving... ;-)

:)... let's just finish getting the evidence, then we can write the 
commentary in the right place with full confidence.

> I've already started in my code as you've sold me on it being something 
> to avoid when not needed.  I've been holding off going beyond that as I 
> don't think we've reached agreement as to the best course of action 
> yet.

I'm not convinced that the system code is beyond cleaning up... I'll 
take a look and see what.

Andres.