<!DOCTYPE html>
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <p>Hi Luciano,</p>
    <p>I wonder what advantage you see in having integers be their own
      #hash. I'm having trouble thinking of any reason that would be
      more desirable than using something like hashMultiply.<br>
    </p>
    <p>In general, a hash function maps some key to an index in a table.
      In Smalltalk, a hash function consists of two parts. The first
      part is the #hash method. The second part is whatever the
      collection decides to do to map the result of the #hash method to
      indices in its table.</p>
    <p>So sure, you could move hashMultiply from the hash method to the
      collections and get the same result -- for SmallIntegers. But
      classes like Set and Dictionary are intended to have keys of any
      kind of object, not just SmallIntegers. Some objects produce
      better-scrambled #hash values than integers do, and for those
      objects sending hashMultiply in the collection is an extra
      unnecessary step (and in some cases might even make things worse,
      but I haven't looked deeply into the theory for that).</p>
    <p>What has worked best for me is to make the #hash method of all
      objects answer SmallIntegers, spread across the entire
      non-negative SmallInteger range in a manner that "looks random" as
      much as possible. Then the collections can typically just modulo
      the hash value with their table size in order to complete the hash
      function, and get good results. <br>
    </p>
    <p>Regards,</p>
    <p>-Martin<br>
    </p>
    <div class="moz-cite-prefix">On 1/14/25 11:10 PM, Luciano
      Notarfrancesco via Cuis-dev wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAL5GDypV8AJO4z5XOncqK0n3HjVb18XcosrYS4in6AJ=LN=zHg@mail.gmail.com">
      <meta http-equiv="content-type" content="text/html; charset=UTF-8">
      <div dir="auto">Yes, I came across that issue with hashed
        collections of consecutive integers some years ago, and that’s
        why we changed SmallInteger>>hash in Cuis. However, I
        think it might be better to change Set>>scanFor: instead
        (and in the appropriate subclasses like Dictionary, etc), put
        the call to hashMultiply there (or “shuffle” the hash in some
        other way), and make integers be their own hash again.</div>
      <div dir="auto"><br>
      </div>
      <div>
        <div class="gmail_quote gmail_quote_container">
          <div dir="ltr" class="gmail_attr">On Wed, Jan 15, 2025 at
            06:14 Nicolas Cellier <<a
              href="mailto:nicolas.cellier.aka.nice@gmail.com"
              moz-do-not-send="true" class="moz-txt-link-freetext">nicolas.cellier.aka.nice@gmail.com</a>>
            wrote:<br>
          </div>
          <blockquote class="gmail_quote"
style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">
            <div dir="ltr">
              <div>
                <div>This time it seems a bit different though, the
                  bench is not dominated by GC, but by hash collisions.<br>
                  <div>
                    <div><br>
                      Cuis Set uses the old fashioned capacity:</div>
                    <div>sizeFor: nElements<br>
                           "Large enough size to hold nElements with
                      some slop (see fullCheck)"<br>
                           nElements <= 0 ifTrue: [^ 1].<br>
                           ^ nElements+1*4//3 <br>
                    </div>
                  </div>
                  <div>Squeak Set uses more advanced #<b>goodPrimeAtLeast:</b>
                    for determining the capacity<b>.<br>
                    </b>So we would expect Squeak to be better... But
                    no, it isn't.<b><br>
                    </b></div>
                  <div><b><br>
                    </b></div>
                  <div>First observe that we have:<br>
                  </div>
                  <div><br>
                  </div>
                  <div>    (hashes sorted = (1 to: hashes size)
                    asArray).</div>
                  <div><br>
                  </div>
                  <div>Hence, we try to put consecutives SmallInteger in
                    a Set.<br>
                  </div>
                  <div><br>
                  </div>
                  <div>If I bench Eliot's test, but with gradually
                    growing size (1<<18-1),
                    (1<<19-1),(1<<20-1), (1<<21-1)<br>
                  </div>
                  <div>I get:<br>
                     '6.87 per second. 146 milliseconds per run.
                    13.48116 % GC time.'<br>
                     '3.23 per second. 310 milliseconds per run.
                    14.60311 % GC time.'<br>
                     '1.56 per second. 641 milliseconds per run. 15.1462
                    % GC time.'</div>
                  <div> '0.087 per second. 11.5 seconds per run. 0.04346
                    % GC time.'</div>
                  <div><br>
                  </div>
                  <div>So the cost grows linearly with size up to 2^20,<br>
                    but 2^21 costs us a factor x20 instead of expected
                    x2,<br>
                    that's 10 times more than expected,<br>
                  </div>
                  <div>and GC cost drops in percentage...<br>
                  </div>
                  <div>
                    <div><br>
                    </div>
                    <div>I won't even try (1<<22-1) on my slow
                      machine...</div>
                  </div>
                  <div>What happens ?<br>
                    <br>
                  </div>
                  <div>Simple : the answer is in
                    SmallInteger>>hash ^self<br>
                  </div>
                  <div>The identity hashes are varying from 1 to
                    1<<22-1.</div>
                  <div>Up to 1<<20-1, we do have not so many
                    consecutives hashes statistically.<br>
                  </div>
                  <div>But we have way more consecutive with
                    1<<21-1 elements (amonf 1<<22-1 possible
                    hashes).<br>
                  </div>
                  <div>And since candidate index in scanFor: is simply
                    object hash\\array size+1,</div>
                  <div>once we have a collision (and we will have until
                    the array size grows above 1<<22-2), <br>
                    since we have already filled consecutive slots with
                    consecutives hashes,</div>
                  <div>we are getting long collision chains.<br>
                    <br>
                  </div>
                  <div>So why is Cuis better behaved?<br>
                  </div>
                  <div>Because Integer>>hash will return self
                    hashMultiply...<br>
                  </div>
                  <div>Note that above 1<<54, consecutive Integer
                    hashes will show quite some collisions because first
                    convert asFloat, and many consecutive LargeInteger
                    convert to the same Float.<br>
                  </div>
                  <div><br>
                  </div>
                  <div>Just redefine SmallInteger hash>>^self
                    hashMultiply, and you'll get more decent benchmark.<br>
                  </div>
                  <div>Except, that you can't do it interactively,
                    because you have to rehash all hashed collection
                    before Morphic takes the hand back.</div>
                </div>
              </div>
            </div>
            <div dir="ltr">
              <div>
                <div>
                  <div><br>
                    <br>
                  </div>
                  <div>Nicolas<br>
                  </div>
                </div>
              </div>
            </div>
            <br>
            <div class="gmail_quote">
              <div dir="ltr" class="gmail_attr">Le mar. 14 janv. 2025
                à 21:55, Nicolas Cellier <<a
                  href="mailto:nicolas.cellier.aka.nice@gmail.com"
                  target="_blank" moz-do-not-send="true"
                  class="moz-txt-link-freetext">nicolas.cellier.aka.nice@gmail.com</a>>
                a écrit :<br>
              </div>
              <blockquote class="gmail_quote"
style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">
                <div dir="ltr">That remind me this thread:<br>
                  <br>
                  <a
href="https://lists.cuis.st/mailman/archives/cuis-dev/2024-March/008555.html"
                    target="_blank" moz-do-not-send="true"
                    class="moz-txt-link-freetext">https://lists.cuis.st/mailman/archives/cuis-dev/2024-March/008555.html</a></div>
                <br>
                <div class="gmail_quote">
                  <div dir="ltr" class="gmail_attr">Le mar. 14 janv.
                    2025 à 20:38, David T Lewis via Squeak-dev <<a
href="mailto:squeak-dev@lists.squeakfoundation.org" target="_blank"
                      moz-do-not-send="true"
                      class="moz-txt-link-freetext">squeak-dev@lists.squeakfoundation.org</a>>
                    a écrit :<br>
                  </div>
                  <blockquote class="gmail_quote"
style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">
                    <p dir="auto">Reported by Eliot on squeak-dev:<br>
                      <a
href="https://lists.squeakfoundation.org/archives/list/squeak-dev@lists.squeakfoundation.org/thread/ABGLWILW6YJ2NPAHICWSLCOUIAKXKLFL/"
                        rel="nofollow" target="_blank"
                        moz-do-not-send="true"
                        class="moz-txt-link-freetext">https://lists.squeakfoundation.org/archives/list/squeak-dev@lists.squeakfoundation.org/thread/ABGLWILW6YJ2NPAHICWSLCOUIAKXKLFL/</a></p>
                    <p dir="auto">Copied from the original report:</p>
                    <p dir="auto">Hi All,</p>
                    <p dir="auto">   recently the identityHash
                      implementation in the VM has been upgraded. </p>
                    <p dir="auto">Open Smalltalk Cog[Spur] VM
                      [CoInterpreterPrimitives VMMaker.oscog-eem.3500]
                      5.20250103.0325<br>
                      platform sources revision VM: 202501030325 <a
                        href="mailto:eliot@Machado.local"
                        target="_blank" moz-do-not-send="true"
                        class="moz-txt-link-freetext">eliot@Machado.local</a>:oscogvm
                      Date: Thu Jan 2 19:25:19 2025 CommitHash: 5e05e38</p>
                    <p dir="auto">It now uses an xorshift register PNG
                      that cycles through all possible 2^22 - 1 valid
                      identity hashes in 2^22 - 1 iterations. One way to
                      test this is to create a set, and loop creating
                      2^22-1 objects, adding their hashes to the set,
                      and answering the size of the set.</p>
                    <p dir="auto">So let's do this and see how long it
                      takes at the same time. e.g. this is on my 2021 M1
                      Max MBP</p>
                    <p dir="auto">| size |<br>
                      { [| hashes |<br>
                          hashes := Set new: (2 raisedTo: 22).<br>
                          1 to: (2 raisedTo: 22) - 1 do:<br>
        [:ign| hashes add: Object new identityHash].<br>
                          size := hashes size] timeToRun.<br>
                          size.<br>
                          (2 raisedTo: 22) - 1 }. #(450 4194303 4194303)</p>
                    <p dir="auto">also #(483 4194303 4194303)</p>
                    <p dir="auto">So avoiding grows this takes less than
                      half a second.</p>
                    <p dir="auto">However, if we just use Set new and
                      rely on it growing we get approximately a 1,500 to
                      1,800 fold slowdown:</p>
                    <p dir="auto">| size |<br>
                      { [| hashes |<br>
                          hashes := Set new.<br>
                          1 to: (2 raisedTo: 22) - 1 do:<br>
        [:ign| hashes add: Object new identityHash].<br>
                          size := hashes size] timeToRun.<br>
                          size.<br>
    (2 raisedTo: 22) - 1 }. #(768992 4194303 4194303)</p>
                    <p dir="auto">also #(800874 4194303 4194303).</p>
                    <p dir="auto">768992 / 483.0 = 1592<br>
                      800874 / 450.0 1780</p>
                    <p dir="auto">Now Cuis has moved away from allowing
                      stretchy collections to have their capacity
                      initialized with new:. One has to
                      use newWithRoomForMoreThan:. So</p>
                    <p dir="auto">| size |<br>
                      { [| hashes |<br>
                      hashes := Set newWithRoomForMoreThan: (2 raisedTo:
                      22).<br>
                      1 to: (2 raisedTo: 22) - 1 do:<br>
                      [:ign| hashes add: Object new identityHash].<br>
                      size := hashes size] timeToRun.<br>
                      size.<br>
                      (2 raisedTo: 22) - 1 }. #(506 4194303 4194303)</p>
                    <p dir="auto">BUT!!!! If we just use new and allow
                      Cuis to grow the set we get e.g.</p>
                    <p dir="auto">| size |<br>
                      { [| hashes |<br>
                      hashes := Set new.<br>
                      1 to: (2 raisedTo: 22) - 1 do:<br>
                      [:ign| hashes add: Object new identityHash].<br>
                      size := hashes size] timeToRun.<br>
                      size.<br>
                      (2 raisedTo: 22) - 1 }. #(725 4194303 4194303) .</p>
                    <p dir="auto">This is at least 1,000 times faster
                      than Squeak.  Something is seriously wrong with
                      the Squeak code.</p>
                    <p dir="auto">768992 / 725.0 = 1061</p>
                    <p dir="auto"><strong>,,,^..^,,,</strong><br>
                      Happy New Year! Eliot</p>
                    <p style="font-size:small;color:rgb(102,102,102)">—<br>
                      Reply to this email directly, <a
href="https://github.com/squeak-smalltalk/squeak-object-memory/issues/129"
                        target="_blank" moz-do-not-send="true">view it
                        on GitHub</a>, or <a
href="https://github.com/notifications/unsubscribe-auth/BFYAK6LM7BP4BHBELJMTK4L2KVRQPAVCNFSM6AAAAABVFUFTZWVHI2DSMVQWIX3LMV43ASLTON2WKOZSG44DQMJTG42DCMA"
                        target="_blank" moz-do-not-send="true">unsubscribe</a>.<br>
                      You are receiving this because you are subscribed
                      to this thread.<img height="1" width="1" alt=""
                        moz-do-not-send="true"><span
style="font-size:0px;display:none;overflow:hidden;opacity:0;width:0px;height:0px;max-width:0px;max-height:0px;color:transparent">Message
                        ID: <span><squeak-smalltalk/squeak-object-memory/issues/129</span><span>@</span><span>github</span><span>.</span><span>com></span></span></p>
                    Squeak-dev mailing list -- <a
href="mailto:squeak-dev@lists.squeakfoundation.org" target="_blank"
                      moz-do-not-send="true"
                      class="moz-txt-link-freetext">squeak-dev@lists.squeakfoundation.org</a><br>
                    To unsubscribe send an email to <a
href="mailto:squeak-dev-leave@lists.squeakfoundation.org"
                      target="_blank" moz-do-not-send="true"
                      class="moz-txt-link-freetext">squeak-dev-leave@lists.squeakfoundation.org</a></blockquote>
                </div>
              </blockquote>
            </div>
            Squeak-dev mailing list -- <a
              href="mailto:squeak-dev@lists.squeakfoundation.org"
              target="_blank" moz-do-not-send="true"
              class="moz-txt-link-freetext">squeak-dev@lists.squeakfoundation.org</a><br>
            To unsubscribe send an email to <a
              href="mailto:squeak-dev-leave@lists.squeakfoundation.org"
              target="_blank" moz-do-not-send="true"
              class="moz-txt-link-freetext">squeak-dev-leave@lists.squeakfoundation.org</a></blockquote>
        </div>
      </div>
      <br>
      <fieldset class="moz-mime-attachment-header"></fieldset>
    </blockquote>
  </body>
</html>