[Cuis-dev] Questions triggered by #forceChangesToDisk

Phil B pbpublist at gmail.com
Tue Sep 17 02:10:42 PDT 2019


Andres,

On Tue, Sep 17, 2019 at 3:28 AM Andres Valloud via Cuis-dev <
cuis-dev at lists.cuis.st> wrote:

> Hey Phil, this is neat :).  Let's play the VM development game a bit, I
> think it's helpful to at least give an idea of what it's like.  The same
> principles can be applied to any program, and IME the results are good.
>
>
Works for me!


> On 9/16/19 23:10, Phil B wrote:
> > Taking a half step back, I ask the question: what's it supposed to be
> > doing?  According to
> >
> https://github.com/Geal/Squeak-VM/blob/master/platforms/Mac%20OS/vm/Documentation/3.2.2%20Release%20Notes.rtf
> > the file flush primitive was added in (classic) VM 3.0.5 and 'now
> > actually flushes the file via an OS call' as of 3.0.6.
>
> Interesting: how do the dates of the primFlush: method (circa 2001) and
> the 3.0.6 VM correlate?  Is this a case of the Smalltalk image hacks
> going stale while the VM changes away?  Interesting: both primFlush: and
> that VM are essentially contemporaneous, because the VM is from about 2002.
>

Yes, but also keep in mind that VM and image changes haven't necessarily
been in sync since at least whenever the separate VM team started up (not
sure when that occurred).  For example, there has been discussion on
vm-dev about Sista VM development, but primary Squeak development isn't
doing anything with it yet.  Image support often lags a bit, since the VM
changes need to exist before the image can support them, though it's
entirely likely that the image has led on some changes.


> Why would primFlush: retain the comment about xyzOS not doing flush when
> the VM release notes insist that flushing now flushes?


This is one of the downsides to having separate development teams: left
hand vs. right hand and all that.


>   Note also the
> reference to CodeWarrior 5.3, but according to this:
>
> https://en.wikipedia.org/wiki/CodeWarrior
>
> that only ran on Windows and Mac.  Also, surely CodeWarrior usage is
> super obsolete by now.  What's going on in here?...
>

While CodeWarrior is obsolete in this century for new development, it was
pretty much *the* go-to development platform for Macs as of PPC support (it
was the first, and for a long time only, mainstream compiler supporting
PPC) until OS X.  And as is typical, many devs clung to it until the bitter
end (when supporting OS 9 and prior was no longer viable.)


>
> Side comment: you know, back then there was a POSIX / Single Unix
> Specification, so all that was necessary was to write the VM to POSIX
> (mostly, on Windows you have to do a bit of work for that).  However,
> maybe it can be excused because POSIX / SUS was rather new at the time.
>
> https://en.wikipedia.org/wiki/POSIX
>
> Maybe at the time there wasn't a decent SDK on Mac... no idea.  In any
> case, that's not the case today.
>

Just because there was a specification doesn't mean that it was fully or
even correctly supported throughout most of the 90's even by those who were
supposedly POSIX compliant.  A big part of open source development work was
in dealing with some rather significant deviations between platforms... and
that was just on Unix.  Also, GNU (and therefore Linux) was notably
incomplete as well as incompatible by design on one or two things IIRC.
That said, Mac and Windows were not POSIX compliant in any meaningful way
back then. (Windows had a POSIX compliance subsystem for NT but I don't
recall anyone I worked with ever using it... I think it was more a
marketing bullet point than anything that saw serious use back then.  I
don't think Mac had anything to say on the POSIX front until OS X.)

It's similar to today with HTML standards:  everyone is compliant... kinda,
sorta... to varying degrees... with inadvertent and deliberate deviations.

> [1] "fflush() can fail for the same reasons write() can so errors
> > must be checked but sqFileFlush() must support being called on
> > readonly files for historical reasons so EBADF is ignored"... so
> > there's one example of how it could fail but for this particular
> > failure case it is ignored in the C code
> That's interesting, I'd verify whether a 5 line C program shows fflush()
> fails with EBADF when given a file open for reading only.  POSIX says
> EBADF is returned when the file handle isn't valid, but what if you do
> pass in a valid file handle?  Shouldn't fflush() be a no-op then?
>

I would expect that it depends on the underlying object the handle
references: a local file (on what kind of filesystem), a network file (of
what kind), etc.  That's where the 'other errors' would come into play.
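
A minimal sketch of the small experiment suggested above: open a file for
reading only, call fflush() on it, and record whether it fails and with
which errno.  The helper name probe_fflush_readonly and its return
convention are made up for illustration.  No particular outcome is assumed
here, since the result may differ by C library and by the kind of object
behind the stream.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: open `path` read-only, call fflush() on the stream,
 * and report what happened.
 * Returns 1 if the file could not be opened; otherwise returns whatever
 * fflush() returned (0 on success, EOF on failure).
 * *err_out is set to errno when fflush() fails, and to 0 otherwise. */
int probe_fflush_readonly(const char *path, int *err_out)
{
    assert(path != NULL && err_out != NULL);
    *err_out = 0;

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return 1;

    errno = 0;
    int rc = fflush(f);
    if (rc == EOF)
        *err_out = errno;   /* compare against EBADF to answer the question */

    fclose(f);
    return rc;
}
```

Running this against files on different filesystems (local, network
mounted) would be one way to see whether EBADF, or anything else, actually
shows up in practice.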


> > platforms/win32/plugins/FilePlugin/sqWin32FilePrims.c (which
> > calls FlushFileBuffers(FILE_HANDLE(f)) and ignores the return code)
>
> Ignoring return codes is not good at all.  In particular, you can get
> into all sorts of problems by doing that in MSDN land.
>

That could be due either to the errors reported being different enough that
meaningfully supporting/reporting them would have been a chore, or to simple
laziness.  And of course everyone knows fflush never fails in the real
world (see your famous last words comment ;-)
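
For contrast with ignoring the result, here is a rough sketch of a flush
wrapper that reports failures while tolerating EBADF, along the lines of
the sqFileFlush() comment quoted earlier.  The name flush_checked is
hypothetical and this is not the VM's actual code; it only covers the
standard C fflush() side.  A Windows version would presumably check the
return value of FlushFileBuffers() and consult GetLastError() instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical wrapper around fflush().
 * Returns 0 on success or when the failure is the tolerated EBADF case;
 * otherwise returns the errno value left by the failed fflush(). */
int flush_checked(FILE *f)
{
    assert(f != NULL);

    errno = 0;
    if (fflush(f) == 0)
        return 0;

    int err = errno;        /* save before anything else can change it */
    if (err == EBADF)
        return 0;           /* tolerated, e.g. stream not open for writing */
    return err;             /* any other failure is reported to the caller */
}
```

The caller can then decide whether a nonzero result should fail the
primitive rather than having the error silently disappear.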


>
> > It looks like it will never fail on Windows (regardless of the fact that
> > the call might have)
>
> In MSDN, any time you see "call GetLastError() to see what happened",
> effectively that means any of these circumstances can happen:
>
> https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes
>
> Note the numbers go up to 15999.  Ok fine they are not all used, but
> still.  And look at this text:
>
> "System Error Codes are very broad: each one can occur in one of many
> hundreds of locations in the system. Consequently, the descriptions of
> these codes cannot be very specific. Use of these codes requires some
> amount of investigation and analysis. You need to note both the
> programmatic and the runtime context in which these errors occur.
> Because these codes are defined in WinError.h for anyone to use,
> sometimes the codes are returned by non-system software. And sometimes
> the code is returned by a function deep in the stack and far removed
> from your code that is handling the error."
>
> In practice, this means "no MSDN function documentation page will list
> what errors can occur when calling it", which means "anything goes".
> This is very much unlike POSIX, i.e. very much unhelpful.
>

See my comment about POSIX compliance above ;-)


>
> For instance, why is it that I need to care that ReadFile() and
> WriteFile() may fail with ERROR_NO_SYSTEM_RESOURCES when attempting an
> I/O operation at least 64 MB - 32 KB + 16 bytes in size (this figure is
> undocumented), but only when that I/O occurs on mapped drives, and even
> if the mapped drive is local to the machine?
>
> Because e.g.: loading or saving the image fails.
>
> Ah, right.  So now that *ONE* error condition needs special handling.
> Great.  Only 15998 possible values to go.
>
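
One common way to sidestep a per-call size limit like the one described
above is to split a large transfer into bounded pieces.  The sketch below
shows the idea with POSIX write(); the function name write_chunked and the
choice of chunk size are assumptions for illustration, and a Windows
version would presumably loop over WriteFile() in the same fashion.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write `len` bytes from `buf` to `fd`, never asking
 * for more than `chunk` bytes in a single write() call.
 * Returns 0 on success, or an errno value on failure. */
int write_chunked(int fd, const void *buf, size_t len, size_t chunk)
{
    assert(buf != NULL && chunk > 0);

    const char *p = buf;
    while (len > 0) {
        size_t n = len < chunk ? len : chunk;
        ssize_t w = write(fd, p, n);
        if (w < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing: retry */
            return errno;
        }
        if (w == 0)
            return EIO;         /* no progress: give up rather than spin */
        p += w;                 /* short writes are fine, just advance */
        len -= (size_t)w;
    }
    return 0;
}
```

The chunk size would need to stay comfortably below whatever limit the
platform imposes, which in the case above is undocumented.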

One thing that can be said in defense of Windows VM development: prior to
OS X, Mac VM support was probably just about as distinct.  Mac moved to
Unix and Windows stayed put so now it's the odd OS out on this front.


>
> > but can fail everywhere else depending on the rules
> > of the particular OS.  On Linux, over a dozen possible error codes are
> > given (one of them is the ignored EBADF case) as well as a note that a
> > variety of additional errors can occur depending on the particular
> > object the file descriptor represents.  So reasons: many.
>
> Yes, however, there are O(10) possible errors, not O(10^4).  It's a huge
> difference with the MSDN world.  I'd rather write against the Unix
> subsystem / standard C library on Windows.
>
> > I believe the #forceChangesToDisk hack had a different objective.  The
> > other hack(s) are dealing with flush failure, #forceChangesToDisk
> > appears to predate flush support and/or to deal with the reality at the
> > time that flush alone often wasn't a complete solution.
>
> Ok, so if that's true, then we're dealing with bit rot.
>
> This shows that it is incredibly important to be completely thorough,
> because it is at that time that a good understanding of the entire
> problem is in anyone's head.  If you are not thorough today, someone
> else will have to recreate your state of mind tomorrow.  Overall,
> everybody goes slower.
>

Yep, that's why I'm always complaining about documentation.  What seems
obvious today often won't be six months or more from now.  The frustrating
thing is that even though we're being thorough, we could still easily be
wrong since we're trying to piece together intent with incomplete
information.  It's better than guessing but worse than if someone had just
written...a ...few...more... words of documentation ;-)


>
> > I'm as baffled by that cryptic comment as you.
>
> Might as well delete it, then.  It serves no good purpose if it can't be
> tied to anything concrete.
>

I'm OK with that.


> > I did a little more general search trying to find something, anything
> > that might point in a direction that leads to clarity... nothing so far.
>
> The reference in POSIX says fflush() flushes, and the MSDN reference
> says FlushFileBuffers() flushes.  If that covers all platforms, the
> comment needs to go because misbehavior means it's not your problem and
> you can file a bug against the spec.  Provided, of course, that you are as
> certain as you can be that the relevant API is being used correctly, and
> that you can recreate the problem in a small, standalone C program,
> which you will attach to the bug report :).
>
> Andres.