[Cuis-dev] performance of OrderedCollection #new vs. #new:

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Sun Mar 3 15:31:58 PST 2024


Hi Christian,
concerning the thresholds, the OpenSmalltalk VM stops allocating in eden
for 2^16 slots and above; it allocates in oldSpace instead:

     numSlots > self maxSlotsForNewSpaceAlloc
         ifTrue:
             [numSlots > self maxSlotsForAlloc ifTrue:
                 [coInterpreter primitiveFailFor: PrimErrUnsupported.
                 ^nil].
             newObj := self allocateSlotsInOldSpace: numSlots format: instSpec classIndex: classIndex]
         ifFalse:
             [newObj := self allocateSlots: numSlots format: instSpec classIndex: classIndex].

Hence the ~66,000 threshold you observe with #new: (it is actually 65,536).

For 41,000, we start with size 10 and double the size at each growth; after
the 12th growth we get a size of 2^12 * 10 = 4096 * 10 = 40,960.
At the next growth (when adding the 40,961st element), the array size
doubles to 81,920 and gets over the 65,535 threshold.
Hence it is essentially the same threshold that we observe.
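The growth sequence above can be sketched quickly (a simplification, assuming only that the collection starts with capacity 10 and doubles its internal array on each grow; the helper name is made up):

```python
# Capacity sequence of a collection that starts at 10 slots and doubles
# each time it grows (simplified model of the image-side growth code).
def capacities(start=10, threshold=65536):
    caps = [start]
    while caps[-1] <= threshold:      # stop once we have crossed the threshold
        caps.append(caps[-1] * 2)
    return caps

caps = capacities()
# After 12 doublings the capacity is 2^12 * 10 = 40,960 -- still in eden.
# The next doubling yields 81,920 slots, over the 2^16 = 65,536 limit,
# so that allocation goes to oldSpace.
```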

For 82,000, this gets more interesting. This time, 163,840 slots are
allocated.
With 8 bytes per slot (assuming a 64-bit VM), we are getting just over a
mebibyte.
No time to dig further into the VM source code, but it might be related to
object memory growth...
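Back of the envelope, under the same 64-bit assumption:

```python
# 82,000 elements force two more doublings past the 40,960-slot array.
slots = 40960 * 2 * 2            # 163,840 slots
bytes_allocated = slots * 8      # 8 bytes per slot on a 64-bit VM
mib = bytes_allocated / 2**20    # 1.25 MiB, just over a mebibyte
```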

Now let's observe some interesting figures in Squeak:

[Array new: 65535] bench.
 '1,650 per second. 605 microseconds per run. 60.35841 % GC time.'

[Array new: 65536] bench.
 '226 per second. 4.42 milliseconds per run. 91.64405 % GC time.'

Notice the high percentage spent in GC once we are in the old space: the
cost is likely dominated by GC.

And in Cuis:

[Array new: 65535] bench.
 '2.02 k runs per second' .
[Array new: 65536] bench.
 '1.31 k runs per second' .

Ah ah! The Cuis image is much smaller, hence a full GC is much cheaper!

I guess that Pharo images are much larger, hence the cost...

So,
- the cost is dominated by GC in OpenSmalltalk images
- starting at 2^16 slots and above, allocation happens in oldSpace, and
that ends up triggering full GCs
- the larger the image, the less efficient the allocation
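For readers outside Smalltalk, the #new-versus-#new: distinction is roughly analogous to growing a dynamic array by appending versus preallocating it. A loose Python sketch of the benchmark below (function names and the size are illustrative only, not from the original script):

```python
import timeit

def grow(size):
    """Append one element at a time, letting the list grow (like #new)."""
    xs = []
    for i in range(size):
        xs.append(i)
    return xs

def prealloc(size):
    """Preallocate the backing store up front (like #new: size)."""
    xs = [None] * size
    for i in range(size):
        xs[i] = i
    return xs

size = 10_000
t_grow = timeit.timeit(lambda: grow(size), number=100)
t_prealloc = timeit.timeit(lambda: prealloc(size), number=100)
print(f"grow: {t_grow:.4f}s  prealloc: {t_prealloc:.4f}s")
```

Of course, CPython's allocator and GC behave nothing like Spur's eden/oldSpace split, so only the shape of the comparison carries over, not the thresholds.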

VW's memory policy is better optimized than OpenSmalltalk's, at least for
this benchmark.
Probably a virtue of its large space (segments reserved for large objects).

Nicolas

On Sun, Mar 3, 2024 at 19:36, Christian Haider via Cuis-dev <
cuis-dev at lists.cuis.st> wrote:

> Hi,
>
> I was going to complain that #new: on OrderedCollection was removed.
>
> I thought that it is common wisdom that #new: is a very important
> optimization when allocating OrderedCollections, especially big ones.
>
> Therefore, I measured the differences with the following script:
>
> | time time1 time2 |
> (0 to: 100000 by: 1000) collect: [:size |
>      time := time1 := time2 := 0.
>      1000 timesRepeat: [
>            time := time + (Time microsecondsToRun: [
>                 | list |
>                 list := OrderedCollection new.
>                 1 to: size do: [:i | list add: i]])].
>      time1 := time / 1000.
>      time := 0.
>      1000 timesRepeat: [
>            time := time + (Time microsecondsToRun: [
>                 | list |
>                 list := OrderedCollection new: size.
>                 1 to: size do: [:i | list add: i]])].
>      time2 := time / 1000.
>      Array with: size with: time1 with: time2]
>
> I ran the script with the current Cuis 6.3, Pharo 10, Squeak 6.0 and VW
> 9.3.1.
>
> The results are in the attached file from which I created the charts below
> with Excel.
>
> The sizes of the created collections are on the x-axis, while the y-axis
> shows microseconds.
>
>
>
> The results are interesting!
>
>    1. As expected, VW #new: is the fastest and grows linearly with the
>    collection size.
>    2. As expected, #new is slower than #new: (except for Cuis – more
>    below).
>    3. There are certain threshold sizes beyond which creation is
>    consistently slower (look at 40,000 and 82,000 for #new:).
>    4. The slowness of Pharo is surprising(?)
>
> Extremely surprising is Cuis. The next chart shows just Cuis and Squeak to
> spread the scale a bit.
>
> In Cuis, both methods have about the same performance and grow linearly
> with the size!!!
>
>
>
> How is this possible? What is the magic? Or is the measurement wrong?
>
> I am quite impressed.
>
> And I am not going to complain about the missing #new: :-).
>
> Happy hacking,
>
> Christian
>
> --
> Cuis-dev mailing list
> Cuis-dev at lists.cuis.st
> https://lists.cuis.st/mailman/listinfo/cuis-dev
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image002.png
Type: image/png
Size: 60052 bytes
Desc: not available
URL: <http://lists.cuis.st/mailman/archives/cuis-dev/attachments/20240304/8e562c27/attachment-0002.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image006.png
Type: image/png
Size: 43991 bytes
Desc: not available
URL: <http://lists.cuis.st/mailman/archives/cuis-dev/attachments/20240304/8e562c27/attachment-0003.png>

