[Cuis-dev] Inconsistent #= and #hash

Juan Vuletich juan at jvuletich.org
Wed Jun 12 04:09:08 PDT 2019


On 6/12/2019 1:19 AM, Luciano Notarfrancesco wrote:
> Awesome, thanks! (That's a google-suggested automatic reply, first 
> time I try it, but really thanks).

:)

BTW, be sure to update to #3798. With this and previous updates, the 
following runs in 20 minutes on my PC (without halting):

instances := Object allSubInstances.
{ 'Total: '. instances size} print.
Time now print.
1 to: instances size do: [:i|
     i \\ 1000 = 0 ifTrue: [ i print ].
     a := instances at: i.
     aHash := a hash.
     i+1 to: instances size do: [:j|
         b := instances at: j.
         (a = b and: [(aHash = b hash and: [b = a]) not]) ifTrue: [self halt]]].
Time now print.

BTW, I found the StackSizeWatcher in ProcessBrowser (by Hernán 
Wilkinson) extremely useful for dealing with the bugs this snippet made 
visible.

>
> So to make it clear, the current implementation of Collection>>hash 
> implies that if two Collections are equal, then they must at least 1) 
> be the same species, and 2) have equal elements. New Collections can 
> have more requirements in order to be equal, but those two are 
> necessary otherwise the hashes won't match. This is the essence of a 
> Collection's identity.

Agreed.
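
A minimal workspace sketch of why those two conditions are necessary. 
This is not the actual Cuis source: the block name sketchHash and the 
combining step with #bitXor: are assumptions made for illustration only. 
A hash seeded with the species and folded over the elements is 
guaranteed to match only for collections that agree on both:

```smalltalk
| sketchHash a b |
"Hypothetical stand-in for Collection>>hash, for illustration only."
sketchHash := [ :aCollection | | h |
	"Seed with the species, so different species start from different values."
	h := aCollection species hash.
	"Fold in the hash of every element."
	aCollection do: [ :each | h := h bitXor: each hash ].
	h ].
a := OrderedCollection with: 3 with: 4.
b := OrderedCollection with: 3 with: 4.
"Same species and equal elements, so the two sketch hashes agree."
((sketchHash value: a) = (sketchHash value: b)) print.
```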

> BTW, do you think 'aCollection includes: anObject' should work without 
> errors for arbitrary anObject instance of any class?

Yes. Any error is a bug, or at least a smell, and we'd discuss it... 
Well, no. I'm adding this as I reach the end of your message; see my 
comments there.

> I had a similar problem with algebraic structures (what I call 
> Domains), and I ended up implementing #includes: to work for arbitrary 
> objects, and a new message #contains: (not the best name, might think 
> a better one in the future) that is faster and simpler and assumes the 
> objects are of a certain type.

I guess I'd call it something like #containsDomain:, including in the 
selector the assumption about the type. A matter of style, of course.

> I don't think we need two messages like that for Collections, but I 
> still don't know what's the correct way of thinking about 
> Collection>>includes:, and if we think that it should work for 
> arbitrary objects we can try a similar script to test 'aCollection 
> includes: Object new' in all instances of Collections... without 
> trying that, I know that Interval fails for anything that is not a 
> Number, not sure if should be fixed or not. What do you think?

Many collections can hold any kind of object. Those should not fail when 
evaluating `aCollection includes: anObject`, for any object. Others are 
specific to some kind of content, like String, ByteArray, FloatArray, 
etc. It is OK for those to raise an error when evaluating `aCollection 
includes: anObject`, if trying to add that anObject would also fail. I 
see no problem there. Still, I wouldn't object if people prefer to just 
make #includes: answer false in those cases.
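
The check you suggest could be sketched roughly like this. It is an 
untested sketch: the temporaries probe and failing are names I made up, 
and it only records which classes raise an Error, without deciding 
whether each case is a bug:

```smalltalk
| probe failing |
"An arbitrary object that no collection is expected to contain."
probe := Object new.
"Classes whose instances raised an Error on #includes:."
failing := Set new.
Collection allSubInstances do: [ :each |
	[ each includes: probe ]
		on: Error
		do: [ :ex | failing add: each class ] ].
failing print.
```

Classes that show up in the result and hold only one kind of content 
(String, ByteArray, etc.) would fall under the "OK to raise an error" 
case above. The others are the ones worth a look.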

> On Tue, Jun 11, 2019 at 5:25 PM Juan Vuletich <juan at jvuletich.org> wrote:
>
>     On 6/11/2019 8:49 AM, Luciano Notarfrancesco via Cuis-dev wrote:
>     > I did this to fish for inconsistencies:
>     >
>     > instances _ OrderedCollection new.
>     > Object withAllSubclassesDo: [:each| instances addAll: each allInstances].
>     > instances _ instances asArray.
>     >
>     > 1 to: instances size do: [:i|
>     >     a _ instances at: i.
>     >     aHash _ a hash.
>     >     i+1 to: instances size do: [:j|
>     >         b _ instances at: j.
>     >         (a = b and: [(aHash = b hash and: [b = a]) not]) ifTrue: [self halt]]]
>     >
>     > (I typed it manually here, I hope I copied it right, but you get the
>     > idea.)
>     >
>     > Found out this:
>     > - FeatureRequest forgets to implement hash;
>     > - #() = Semaphore new, but not the other way around, and hashes also
>     > differ;
>     > - similarly, LinkedList new = Semaphore new;
>     > - RunArray new = Object new fails because it assumes the argument is a
>     > Collection and sends isSequenceable;
>     > - RunArray new = Text new, but hashes differ;
>     > - Set new = Dictionary new but hashes differ;
>     >
>     > And there are more, I stopped before finishing.
>     >
>     > I don't know how to fix some of those, and the implications of
>     > changing the behavior of #= or #hash are not obvious in some cases.
>     > But I think we should change Collection>>hash to set the initial value
>     > to 0 instead of 'self species hash', and that would fix two of the
>     > issues above. What do you think? Other ideas?
>
>     I just pushed fixes to most of them. For Semaphore and RunArray it is a
>     matter of implementing and honoring #species. For Set, it was making
>     `aDictionary is: #Set` answer false. For the others it was adding new
>     #hash or #= methods. I don't expect much breakage.
>
>     I'll run now a tweaked (hopefully faster) version of your script.
>
>     Cheers,

Cheers,

-- 
Juan Vuletich
www.cuis-smalltalk.org
https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev
https://github.com/jvuletich
https://www.linkedin.com/in/juan-vuletich-75611b3
@JuanVuletich
