[Cuis-dev] Bag>>#sum:ifEmpty:

Martin McClure martin at hand2mouse.com
Wed Nov 2 23:45:36 PDT 2022


Hi Luciano,

Well, a lot depends on exactly which combination of properties one 
wants. There are many ways to implement collections, and each reasonable 
way is better at some things and worse than others.

Arrays can already be concurrently modified while iterating. (Arrays 
allow duplicate entries, violating my third requirement, which I now 
realize should only apply to collections which do not allow duplicates.)

Hashed collections (let's say Set, but Dictionary is similar) as 
implemented in Smalltalk-80 (open addressing with linear probing) have 
two problem points for iterating while modifying -- deletions can move 
an existing element earlier in the table, and additions can cause the 
table to grow, scrambling everything. The deletion problem can be 
handled by using collision buckets instead of linear probing, which is 
not quite as fast because more cache line reads, but still O(1). The 
grow problem can be handled more easily. A grow replaces the table and 
copies the elements over to the new table. If the iterator caches the 
table in a temporary variable then it can do the iteration over the 
original table even if the base collection has moved on to a new table. 
The old table will be released when the iteration is done.

One of the nice things about Smalltalk is that if there isn't a 
collection class that meets the your exact needs, you have the ability 
to implement your own.

Regards,
-Martin

On 11/2/22 23:09, Luciano Notarfrancesco via Cuis-dev wrote:
> Hi Martin,
> That’s very interesting. I guess this was not the best example, since 
> not being able to modify a collection while it’s iterating doesn’t 
> really have to do with the essence of the collections or their 
> messages, but more with a limitation of the implementation.
> Is there a way to change existing collections to support modification 
> while iterating without loosing performance?
>
> On Thu, 3 Nov 2022 at 12:44 Martin McClure <martin at hand2mouse.com> wrote:
>
>     Hi Luciano,
>
>     I agree that restricting #sum: to be used with a block that is a
>     function is reasonable, and probably best.
>
>     But I disagree about changing a collection while iterating it. I
>     have often wanted a collection that could be modified while
>     iterating it. And, given how much time I've spent writing stuff
>     that has to work in a multi-threaded environment, I've probably
>     implemented that sort of collection at least once. My idea of
>     desirable semantics for this kind of collection are:
>     1) The iteration /will/ include each element that is present
>     throughout the iteration.
>     2) It is /unpredictable/ whether any elements added or removed
>     during the iteration are included in the iteration.
>     3) The same element (same being equal or identical, depending on
>     the collection) will appear at most once in the iteration.
>
>     There are a number of cases where this is useful.
>
>     * The iterating thread may want to add or remove elements while it
>     iterates.
>     A queue is a simple example of this kind of collection, where the
>     iterator is unconditionally removing elements as they're iterated,
>     while some other entity may be concurrently adding elements. This
>     can be generalized to conditional removal of elements that are
>     iterated, or conditional adding of elements based on elements found.
>
>     * Modification of the collection and iteration of the collection
>     might happen in separate threads. There are times when a "more or
>     less up-to-date" iteration of the contents is fine.
>
>     Regards,
>     -Martin
>
>     On 11/2/22 21:57, Luciano Notarfrancesco via Cuis-dev wrote:
>>     Yes, the method assumes the block is a function (it maps one
>>     element to only one value). We talked about this here a couple of
>>     times. The Collection protocol is very “functional”, specially
>>     messages like #collect:, #select:, #detect:, etc, and using them
>>     with blocks that are functions is the most natural way to use
>>     them, and I don’t think we loose anything by making the
>>     assumption that the blocks for these messages are functions.
>>     Moreover, the whole point of Bag is to be able to efficiently
>>     store elements with lots of repetitions, possibly millions of
>>     repetitions that otherwise wouldn’t even fit in memory. If #sum:
>>     iterated over all the repetitions it would defeat the original
>>     purpose of Bag. And I argue that this is not necessary because no
>>     one would ever call #sum: with a block that is not a function, at
>>     least I never had the need to do that, and it feels unnatural to
>>     use #sum: in that way and I’d use #do: instead.
>>     If you think this assumption is strange, think about all the
>>     other assumptions that collections make. For example, you cannot
>>     change a collection while you’re iterating it. It would just feel
>>     wrong to change a collection while you iterate it, we don’t need
>>     to try to support that because who would do that?
>>
>>
>>     On Thu, 3 Nov 2022 at 03:12 Hernán Wilkinson
>>     <hernan.wilkinson at 10pines.com> wrote:
>>
>>         How does that optimization work?
>>         Because I thought about evaluating the block and
>>         multiplying that for the number of elements, but that makes
>>         sense if the block returns always the same value por the same
>>         element, if it does not then it will not work...
>>
>>
>>         On Wed, Nov 2, 2022 at 3:24 PM Juan Vuletich <juan at cuis.st>
>>         wrote:
>>
>>             On 11/1/2022 11:23 PM, Hernán Wilkinson via Cuis-dev wrote:
>>>             yeap, the current implementation is not correct.
>>>             Juan, attached is a change set that fixes it and another
>>>             with the related tests.
>>>
>>>             Cheers!
>>>             Hernan.
>>>
>>>             On Tue, Nov 1, 2022 at 3:39 AM Luciano Notarfrancesco
>>>             <luchiano at gmail.com> wrote:
>>>
>>>                 I’m afk right now so I cannot check, but it sounds
>>>                 like I made a mistake. Of course it should be the
>>>                 value of aBlock at each element times the number of
>>>                 occurrences.
>>>
>>>                 On Tue, 1 Nov 2022 at 07:33 Hernán Wilkinson
>>>                 <hernan.wilkinson at 10pines.com> wrote:
>>>
>>>                     Hi,
>>>                      the implementation of Bag>>#sum: aBlock
>>>                     ifEmpty: emptyBlock does not use the parameter
>>>                     aBlock at all and assumes that each of the
>>>                     elements answers the message *
>>>                     @Luciano Notarfrancesco
>>>                     <mailto:luchiano at gmail.com> the implementation
>>>                     is yours and it is very new? Is there a reason
>>>                     you did that way?
>>>
>>>                     Thanks
>>>                     Hernan
>>>
>>>
>>>                     -- 
>>>                     *Hernán Wilkinson
>>>                     Agile Software Development, Teaching & Coaching*
>>>                     *Phone: +54-011*-4893-2057
>>>                     *Twitter: @HernanWilkinson*
>>>                     *site: http://www.10Pines.com
>>>                     <http://www.10pines.com/>*
>>>                     Address: Alem 896, Floor 6, Buenos Aires, Argentina
>>>
>>>
>>>
>>>             -- 
>>>             *Hernán Wilkinson
>>>             Agile Software Development, Teaching & Coaching*
>>>             *Phone: +54-011*-4893-2057
>>>             *Twitter: @HernanWilkinson*
>>>             *site: http://www.10Pines.com <http://www.10pines.com/>*
>>>             Address: Alem 896, Floor 6, Buenos Aires, Argentina
>>
>>             Hi Hernán,
>>
>>             Your fix disables the optimization Luciano did. I chose
>>             to fix Luciano's code. Did the same for #product: (same
>>             bug). Integrated your tests, and added another one for
>>             #product:
>>
>>             Thanks,
>>
>>             -- 
>>             Juan Vuletich
>>             cuis.st  <http://cuis.st>
>>             github.com/jvuletich  <http://github.com/jvuletich>
>>             researchgate.net/profile/Juan-Vuletich  <http://researchgate.net/profile/Juan-Vuletich>
>>             independent.academia.edu/JuanVuletich  <http://independent.academia.edu/JuanVuletich>
>>             patents.justia.com/inventor/juan-manuel-vuletich  <http://patents.justia.com/inventor/juan-manuel-vuletich>
>>             linkedin.com/in/juan-vuletich-75611b3  <http://linkedin.com/in/juan-vuletich-75611b3>
>>             twitter.com/JuanVuletich  <http://twitter.com/JuanVuletich>
>>
>>
>>
>>         -- 
>>         *Hernán Wilkinson
>>         Agile Software Development, Teaching & Coaching*
>>         *Phone: +54-011*-4893-2057
>>         *Twitter: @HernanWilkinson*
>>         *site: http://www.10Pines.com <http://www.10pines.com/>*
>>         Address: Alem 896, Floor 6, Buenos Aires, Argentina
>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.cuis.st/mailman/archives/cuis-dev/attachments/20221102/2c719e17/attachment-0001.htm>


More information about the Cuis-dev mailing list