[Cuis-dev] Really fast Fibonacci function

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Sat Dec 7 01:29:15 PST 2019


Hi all,
For image based Karatsuba on Spur64 VM, the breakpoint is at about 500
bytes (4k-bits), but it's a very flat transition rather than a break.
This is all implemented in Squeak trunk if you want to play with.
Note that I already gained a huge speed up for naive multiplication by
using 32-bits limbs instead of 8-bits limbs in the VM.
I'm curious, what was the limb size on HPS giving the 384 bits breakpoint?
For VM implementation of Karatsuba, we can do horrible things like reusing
pre-allocated memory...

I had fun with more divide and conquer methods for / (Burnikel Ziegler) and
the most beautiful is sqrt (Zimmerman) which beats squared for huge
numbers. See posts in this serie:
http://smallissimo.blogspot.com/2019/05/tuning-large-arithmetic-thresholds.html
.

It would be nice to make those accelerated arithmetic optional, and it
would be nice too if other pairs of eyes could review and improve code in
this area.

Le sam. 7 déc. 2019 à 07:11, Andres Valloud via Cuis-dev <
cuis-dev at lists.cuis.st> a écrit :

> I played with these things quite a bit over the years.  The identity
> F(a+b) = F(a+1) F(b) + F(a) F(b-1) can be used to speed up calculations.
>   In particular, the idea is to split the index n = a+b so that a >= b
> and a = 2^k.  Then you can cache F(2^k + 1), F(2^k), and F(2^k - 1) to
> speed things up (once you have two in any of these triplets, the other
> one is essentially free).  Since these indices are known in advance, no
> dictionary is necessary.  Note this approach requires very little
> storage space, as opposed to the more naive way of breaking n into e.g.
> (n / 2) ceiling and (n / 2) floor.
>
> I saw this about 20 years ago and implemented it in Squeak 1.x or 2.x.
> Another bit that came from that work: look at F(5^n) mod 100 and 1000.
> Fascinating, no?
>
> Calculating Fn using that recursion plus caching was a bit faster than
> Dijkstra's recursion (which, by virtue of its shape, is not as cache
> friendly).  However, when starting from scratch, Dijsktra's recursion
> was a bit faster.  I did some of this around 2008.
>
> A very important implementation strategy for this kind of work is to
> implement Karatsuba's multiplication.  I did that in the HPS world, and
> it paid off once you got past 384 bit x 384 bit multiplications or so.
> Very quickly, Karatsuba's multiplication complexity of about n^1.6
> soundly beats the usual elementary n^2 multiplication algorithm, even if
> Karatsuba is implemented in the image while the n^2 algorithm is a VM
> primitive (that, in the HPS case, I also (re)wrote).  I have no idea
> what the breakeven point is in the Cog world.
>
> By the way, the simple grade school multiplication algorithm is just
> fine as a default VM implementation because the usual multiplication
> case involves relatively small large integers, and in that case the
> simple algorithm wins because of its simplicity.
>
> As a suggestion, you can reimplement LargeInteger>>* to do a size test
> then delegate to the usual primitives if the sizes are small enough,
> otherwise you do Karatsuba to get smaller integers then try again.
> However, I think it's cleaner to implement something like
> Integer>>karatsubaTimes: (refining it in the subclasses appropriately),
> then you can send that new message from the places you know matter.
>
> A VM implementation of Karatsuba's algorithm comes with subtle problems
> because it needs to allocate multiple intermediate objects.  This means
> such multiplications will tax the memory manager.  It's possible that
> allocations will fail at any point, and having to stop the calculation
> in the middle will induce stability problems (e.g. Karatsuba can almost
> fill memory completely and then fail --- now the image doesn't have
> enough space to even react to the memory emergency, and your system
> crashes).  It's not that trivial to make a robust VM implementation.  In
> contrast, the grade school multiplication has optimal memory allocation
> behavior, so if it fails due to an out of memory condition then you know
> why and what to do.  Moreover, it's not that hard to implement Karatsuba
> in the image, and again the big performance gain comes from n^1.6 vs
> n^2.  Soon enough even the constants won't matter.
>
> On 12/6/19 20:26, Agustín Sansone via Cuis-dev wrote:
> > Hi all,
> >
> > I was exploring the /mathematical functions/ category of the class
> > /Integer/ and a I didn't find the very well-known Fibonacci function
> > ("/f(n)=f(n-1)+f(n-2)"/) , so I decided to implement it. I think this
> > will be interesting due to the complexity I found (wich is O(log(n))!!).
> >
> >   * A simple way to implement it is by writing the direct recursive
> >     mathematical recurrence relation. This would be exponential.
> >   * An iterative solution wouldn't do all this repeated work, therefore
> >     it would be lineal.
> >   * I used a dynamic-programming-divide-and-conquer-aproach wich
> >     computes the n-th Fibonacci number in O(log(n)). I think from this
> >     comment you can see why it works:
> >
> https://math.stackexchange.com/questions/1834242/derivation-of-fibonacci-logn-time-sequence
> .
> >
> > Here are some running times:
> > exponential --> [1 to: 40 do: [:elem | elem fibonacciNumber]] timeToRun
> >    38536
> > lineal -->           [1 to: 10000 do: [:elem | elem fibonacciNumber]]
> > timeToRun.   25558
> > logarithmic -->  [1 to: 10000 do: [:elem | elem fibonacciNumber]]
> > timeToRun.   1471
> >
> > I leave you all three implementations, let me know what you think,
> > Agustín
> >
> >
> >
> >
> --
> Cuis-dev mailing list
> Cuis-dev at lists.cuis.st
> https://lists.cuis.st/mailman/listinfo/cuis-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.cuis.st/mailman/archives/cuis-dev/attachments/20191207/53bebf02/attachment-0001.htm>


More information about the Cuis-dev mailing list