[Cuis-dev] Really fast Fibonacci function
Andres Valloud
ten at smallinteger.com
Sat Dec 7 18:26:14 PST 2019
Hi,
> For image based Karatsuba on Spur64 VM, the breakpoint is at about 500
> bytes (4k-bits), but it's a very flat transition rather than a break.
Yes, I also saw the gentle transition.
> This is all implemented in Squeak trunk if you want to play with.
> Note that I already gained a huge speed up for naive multiplication by
> using 32-bits limbs instead of 8-bits limbs in the VM.
> I'm curious, what was the limb size on HPS giving the 384 bits breakpoint?
I (re)wrote all of the large integer arithmetic for HPS. Now it uses
integers the size of pointers for all platforms. As to the algorithms,
I threw the book (i.e. Knuth) at it.
> For VM implementation of Karatsuba, we can do horrible things like
> reusing pre-allocated memory...
Ok, is there a memory manager designed to do these horrible things easy?
I think I can hack just like anyone else, but why?
> I had fun with more divide and conquer methods for / (Burnikel Ziegler)
> and the most beautiful is sqrt (Zimmerman) which beats squared for huge
> numbers. See posts in this serie:
> http://smallissimo.blogspot.com/2019/05/tuning-large-arithmetic-thresholds.html.
I'll take a look and see what.
> It would be nice to make those accelerated arithmetic optional, and it
> would be nice too if other pairs of eyes could review and improve code
> in this area.
I agree on collaboration, and I think it needs to start by having code
not live in one particular dialect. Do you know if the Squeak code you
speak of is portable? And, how portable? I do not recall seeing any
such code, I have no knowledge and thus no opinion.
Andres.
>
> Le sam. 7 déc. 2019 à 07:11, Andres Valloud via Cuis-dev
> <cuis-dev at lists.cuis.st <mailto:cuis-dev at lists.cuis.st>> a écrit :
>
> I played with these things quite a bit over the years. The identity
> F(a+b) = F(a+1) F(b) + F(a) F(b-1) can be used to speed up
> calculations.
> In particular, the idea is to split the index n = a+b so that a >= b
> and a = 2^k. Then you can cache F(2^k + 1), F(2^k), and F(2^k - 1) to
> speed things up (once you have two in any of these triplets, the other
> one is essentially free). Since these indices are known in advance, no
> dictionary is necessary. Note this approach requires very little
> storage space, as opposed to the more naive way of breaking n into e.g.
> (n / 2) ceiling and (n / 2) floor.
>
> I saw this about 20 years ago and implemented it in Squeak 1.x or 2.x.
> Another bit that came from that work: look at F(5^n) mod 100 and 1000.
> Fascinating, no?
>
> Calculating Fn using that recursion plus caching was a bit faster than
> Dijkstra's recursion (which, by virtue of its shape, is not as cache
> friendly). However, when starting from scratch, Dijsktra's recursion
> was a bit faster. I did some of this around 2008.
>
> A very important implementation strategy for this kind of work is to
> implement Karatsuba's multiplication. I did that in the HPS world, and
> it paid off once you got past 384 bit x 384 bit multiplications or so.
> Very quickly, Karatsuba's multiplication complexity of about n^1.6
> soundly beats the usual elementary n^2 multiplication algorithm,
> even if
> Karatsuba is implemented in the image while the n^2 algorithm is a VM
> primitive (that, in the HPS case, I also (re)wrote). I have no idea
> what the breakeven point is in the Cog world.
>
> By the way, the simple grade school multiplication algorithm is just
> fine as a default VM implementation because the usual multiplication
> case involves relatively small large integers, and in that case the
> simple algorithm wins because of its simplicity.
>
> As a suggestion, you can reimplement LargeInteger>>* to do a size test
> then delegate to the usual primitives if the sizes are small enough,
> otherwise you do Karatsuba to get smaller integers then try again.
> However, I think it's cleaner to implement something like
> Integer>>karatsubaTimes: (refining it in the subclasses appropriately),
> then you can send that new message from the places you know matter.
>
> A VM implementation of Karatsuba's algorithm comes with subtle problems
> because it needs to allocate multiple intermediate objects. This means
> such multiplications will tax the memory manager. It's possible that
> allocations will fail at any point, and having to stop the calculation
> in the middle will induce stability problems (e.g. Karatsuba can almost
> fill memory completely and then fail --- now the image doesn't have
> enough space to even react to the memory emergency, and your system
> crashes). It's not that trivial to make a robust VM
> implementation. In
> contrast, the grade school multiplication has optimal memory allocation
> behavior, so if it fails due to an out of memory condition then you
> know
> why and what to do. Moreover, it's not that hard to implement
> Karatsuba
> in the image, and again the big performance gain comes from n^1.6 vs
> n^2. Soon enough even the constants won't matter.
>
> On 12/6/19 20:26, Agustín Sansone via Cuis-dev wrote:
> > Hi all,
> >
> > I was exploring the /mathematical functions/ category of the class
> > /Integer/ and a I didn't find the very well-known Fibonacci function
> > ("/f(n)=f(n-1)+f(n-2)"/) , so I decided to implement it. I think
> this
> > will be interesting due to the complexity I found (wich is
> O(log(n))!!).
> >
> > * A simple way to implement it is by writing the direct recursive
> > mathematical recurrence relation. This would be exponential.
> > * An iterative solution wouldn't do all this repeated work,
> therefore
> > it would be lineal.
> > * I used a dynamic-programming-divide-and-conquer-aproach wich
> > computes the n-th Fibonacci number in O(log(n)). I think from
> this
> > comment you can see why it works:
> >
> https://math.stackexchange.com/questions/1834242/derivation-of-fibonacci-logn-time-sequence.
> >
> > Here are some running times:
> > exponential --> [1 to: 40 do: [:elem | elem fibonacciNumber]]
> timeToRun
> > 38536
> > lineal --> [1 to: 10000 do: [:elem | elem
> fibonacciNumber]]
> > timeToRun. 25558
> > logarithmic --> [1 to: 10000 do: [:elem | elem fibonacciNumber]]
> > timeToRun. 1471
> >
> > I leave you all three implementations, let me know what you think,
> > Agustín
> >
> >
> >
> >
> --
> Cuis-dev mailing list
> Cuis-dev at lists.cuis.st <mailto:Cuis-dev at lists.cuis.st>
> https://lists.cuis.st/mailman/listinfo/cuis-dev
>
>
More information about the Cuis-dev
mailing list