[Cuis-dev] performance of OrderedCollection #new vs. #new:

Mon Mar 4 09:53:23 PST 2024

Hi Folks,

Interesting questions, and an even better answer!

It looks like any allocation in old space will trigger a GC. Is this 
right? Is it needed?

It also looks like a two-level design, with leaves of size 2^16-1 or 
so, is in order...

Cheers,

On 3/3/2024 8:31 PM, Nicolas Cellier via Cuis-dev wrote:
> Hi Christian,
> concerning the thresholds, the OpenSmalltalk VM stops allocating in 
> eden for 2^16 slots and above.
> It allocates in oldSpace instead:
>
>      numSlots > self maxSlotsForNewSpaceAlloc
>          ifTrue:
>              [numSlots > self maxSlotsForAlloc ifTrue:
>                  [coInterpreter primitiveFailFor: PrimErrUnsupported.
>                  ^nil].
>              newObj := self allocateSlotsInOldSpace: numSlots format: instSpec classIndex: classIndex]
>          ifFalse:
>              [newObj := self allocateSlots: numSlots format: instSpec classIndex: classIndex].
>
> Hence the 66,000 threshold you observe with new: (it's 65536).
>
> For 41000, we start with size 10 and double the size at each growth, so 
> at the 12th growth we get a size of 2^12 * 10 = 4096 * 10 = 40,960.
> At the next growth (adding the 40,961st element), the array size goes 
> over the 65,535 threshold.
> Hence it's about the same threshold that we observe.
>
> For 82000, this gets more interesting. This time, the size allocated 
> is 163,840 slots.
> With 8 bytes per slot (assuming a 64-bit VM), we're getting just over a 
> MiB (1.25 MiB).
> No time to dig more in VM source code, but it might be related to 
> object memory growth...
>
> Now let's observe some interesting figures in Squeak:
>
> [Array new: 65535] bench.
>  '1,650 per second. 605 microseconds per run. 60.35841 % GC time.'
>
> [Array new: 65536] bench.
>  '226 per second. 4.42 milliseconds per run. 91.64405 % GC time.'
>
> Notice the high percentage of time spent in GC once we're in old space: 
> the cost is likely to be dominated by GC.
>
> And in Cuis:
>
> [Array new: 65535] bench.
>  '2.02 k runs per second' .
> [Array new: 65536] bench.
>  '1.31 k runs per second' .
>
> Aha! The Cuis image is much smaller, hence a full GC is much cheaper!
>
> I guess that Pharo images are much larger, hence the cost...
>
> So,
> - the cost is dominated by GC in OpenSmalltalk images
> - starting at 2^16 slots and above, the object is allocated in 
> oldSpace, and that ends up triggering a full GC
> - the larger the image, the less efficient the allocation
>
> VW's memory policy is better optimized than OpenSmalltalk's, at least 
> for this benchmark.
> Probably a virtue of large space (segments reserved for large objects).
>
> Nicolas
>
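
The growth arithmetic quoted above can be checked with a small workspace 
sketch. It assumes, as Nicolas describes, an initial capacity of 10 that 
doubles at each growth, and a switch to old space at 2^16 slots. The 
variable names are made up for illustration, and the actual 
OrderedCollection growth code may differ in detail.

```smalltalk
"Count the doublings needed before the backing array reaches 2^16 slots."
| capacity growths |
capacity := 10.      "assumed initial capacity"
growths := 0.
[capacity < 65536] whileTrue: [
	capacity := capacity * 2.
	growths := growths + 1].
{growths. capacity}  "13 growths, 81920 slots"
```

So the 12th growth gives 40,960 slots, still in new space, and the 13th 
gives 81,920, which is the first allocation to land in old space.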

-- 
Juan Vuletich
cuis.st
github.com/jvuletich
researchgate.net/profile/Juan-Vuletich
independent.academia.edu/JuanVuletich
patents.justia.com/inventor/juan-manuel-vuletich
linkedin.com/in/juan-vuletich-75611b3
twitter.com/JuanVuletich