[Cuis-dev] Some more Bag tweaks

Luciano Notarfrancesco luchiano at gmail.com
Sat May 14 02:21:41 PDT 2022


I added a comment with a warning about reproducibility to
Collection>>#atRandom:, Set>>#atRandom: and Bag>>#atRandom:. I also moved
the empty check to the top of the methods in Collection>>#atRandom: and
Bag>>#atRandom:. The idea behind this is that '0 atRandom: aGenerator'
produces a different error, and doing 'self emptyCheck' first is better
because if at some point we change Collection>>#errorEmptyCollection to
signal a new exception, say EmptyCollectionException, we won't need to
change these methods. Finally, the last line in those two methods should
never be executed unless the collection is broken (i.e. size returns n,
but do: iterates over fewer than n elements), so I just do "self error:
'collection invariants broken'".
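
For illustration, the generic Collection>>#atRandom: described above might
look roughly like the sketch below. It is not the code from the attached
changeset: the temporaries n and i and the comment wording are made up for
this example, and it assumes Integer>>#atRandom: answers an integer between
1 and the receiver.

```smalltalk
atRandom: aGenerator
	"Answer a random element of the receiver, picked using aGenerator.
	Warning: the answer depends on the iteration order of #do:, which is
	not guaranteed for hashed collections, so results may not be
	reproducible even when the generator is seeded identically."

	| n i |
	"Empty check first, so an empty receiver always goes through
	#errorEmptyCollection instead of whatever '0 atRandom:' signals."
	self emptyCheck.
	n := self size atRandom: aGenerator.
	i := 0.
	"Walk the elements and answer the n-th one visited."
	self do: [ :each |
		(i := i + 1) = n ifTrue: [ ^ each ] ].
	"Only reached if #size and #do: disagree (hypothetical broken collection)."
	self error: 'collection invariants broken'
```

The sketch only relies on the receiver understanding #size and #do:, so a
subclass with more structure (Bag, for example) can still override it with
something faster.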

Let me know if you have any more suggestions, I really want the kernel
classes to be absolutely perfect.

Thanks!!
Luciano

On Fri, May 13, 2022 at 11:01 PM Juan Vuletich <JuanVuletich at zoho.com>
wrote:

> What I think is the more important issue is that of reproducibility of
> results. Again, something that someone using hashed collections should be
> aware of, but a comment wouldn't hurt.
>
> On 5/13/2022 7:48 PM, Luciano Notarfrancesco via Cuis-dev wrote:
>
> Hm, yes, it could fail to be reproducible if the Collection is iterated in
> different order in two successive calls to do:. Good point. This might
> happen if the collection is rehashed between calls, as you say. Same
> problem could happen in Set with the implementation that we have been using
> for some years now. I use this extensively in tests in my math project and
> I didn’t run into problems, but I think it’s a good idea to add a comment.
>
> However I don’t think it could affect the uniformity of the distribution.
> If you take random samples from a collection and reorder it between each
> sample, it should still be uniform, assuming the generator is really
> random. Since our generators are not truly random, and knowing the internal
> state of the generator and the algorithm, it is possible to do a trick and
> reorder it each time you take a sample in such a way that would always
> return the same element… but I don’t think it’s a real problem, I think if
> the collection is rehashed it would still look uniform. I’ll think more
> about it tomorrow when I’m more awake, tho.
>
> The implementation of Collection>>#atRandom: is mostly for completeness,
> in practice we reimplement it in subclasses more efficiently, exploiting
> the structure of each type of collection.
>
>
> On Sat, 14 May 2022 at 5:26 AM Juan Vuletich <JuanVuletich at zoho.com>
> wrote:
>
> Well, given that there is no actual guarantee of the iteration order,
> there is no guarantee of the distribution of the picks.
>
> I know it is not likely, but we can't prove it is impossible that this
> strategy always answers the same element. All we need is extremely bad luck
> so the collections are rehashed following the RNG! In practice, what is
> possible is that the distribution is not exactly uniform.
>
> Another problem is with reproducibility of the results. In tests, or any
> other situation where you need to guarantee the same results, it is usually
> enough to use the same seeds for the RNGs. In these cases, results would
> not be reproducible, because there is no guarantee of sequencing between
> successive runs. Especially if the image is restarted, or the data is
> recreated, in a different image.
>
> What I'd assume is that the user knows what they are doing. Maybe a
> warning in a comment in those methods is in order.
>
> Thanks,
>
>
> On 5/13/2022 7:11 PM, Luciano Notarfrancesco via Cuis-dev wrote:
>
> What do you mean by good random properties? It should be uniformly
> distributed. Do you see any problem in Collection>>#atRandom: or
> Bag>>#atRandom:?
>
> Thanks!
> Luciano
>
> On Sat, 14 May 2022 at 4:57 AM Juan Vuletich <JuanVuletich at zoho.com>
> wrote:
>
> Anyone using #atRandom: on a non-sequenceable collection should be aware
> that no good random properties can be guaranteed, right?
>
> Anyway, just pushed to GitHub.
>
> Thanks,
>
>
> On 5/9/2022 7:56 AM, Luciano Notarfrancesco via Cuis-dev wrote:
>
> Juan, please don't forget to look at the changeset in my previous mail
> when you have time.
>
> Here are some additional tweaks. I reimplemented
> Collection>>#identityIncludes: using #allSatisfy: instead of #do:, in this
> way it is fast for Bags too, and it mirrors the implementation of
> Collection>>#includes:.
>
> I also implemented a fast Bag>>#atRandom:, and implemented a general
> Collection>>#atRandom:. Originally #atRandom: was not implemented in
> Collection, so I implemented a generic version that only assumes the
> collection understands #size and #do:.
>
> Thanks,
> Luciano
>
> On Tue, May 3, 2022 at 7:54 AM Luciano Notarfrancesco <luchiano at gmail.com>
> wrote:
>
> Here are some more methods that take advantage of the structure of a Bag
> (#allSatisfy:, #anySatisfy:, #max:, #min:, #sum:, etc).
>
> Also made some tweaks to some methods in Collection to call existing
> methods instead of reimplementing, in order to simplify the changes in Bag
> (otherwise, for example, I'd have to implement #sum, #sum: and
> #sum:ifEmpty: in Bag instead of only implementing #sum:ifEmpty:). And I
> changed Collection>>#product to produce an error when the collection is
> empty instead of returning 1 (to be consistent with Collection>>#sum:).
>
> All base image tests pass, but please review.
>
> Also, while running tests I got a walkback on BitBltCanvasEngine, see the
> attached log.
>
>
>
> --
> Juan Vuletich
> www.cuis-smalltalk.org
> https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev
> https://github.com/jvuletich
> https://www.linkedin.com/in/juan-vuletich-75611b3
> https://independent.academia.edu/JuanVuletich
> https://www.researchgate.net/profile/Juan-Vuletich
> https://patents.justia.com/inventor/juan-manuel-vuletich
> https://twitter.com/JuanVuletich
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 5164-RandomTweaks-LucianoEstebanNotarfrancesco-2022May14-08h19m-len.001.cs.st
Type: application/octet-stream
Size: 2571 bytes
Desc: not available
URL: <http://lists.cuis.st/mailman/archives/cuis-dev/attachments/20220514/64abefe1/attachment-0001.obj>

