[Cuis-dev] Really fast Fibonacci function

Andres Valloud ten at smallinteger.com
Sat Dec 7 18:26:14 PST 2019


Hi,

> For image based Karatsuba on Spur64 VM, the breakpoint is at about 500 
> bytes (4k-bits), but it's a very flat transition rather than a break.

Yes, I also saw the gentle transition.

> This is all implemented in Squeak trunk if you want to play with.
> Note that I already gained a huge speed up for naive multiplication by 
> using 32-bits limbs instead of 8-bits limbs in the VM.
> I'm curious, what was the limb size on HPS giving the 384 bits breakpoint?

I (re)wrote all of the large integer arithmetic for HPS.  Now it uses 
integers the size of pointers for all platforms.  As to the algorithms, 
I threw the book (i.e. Knuth) at it.

> For VM implementation of Karatsuba, we can do horrible things like 
> reusing pre-allocated memory...

Ok, is there a memory manager designed to do these horrible things easy? 
  I think I can hack just like anyone else, but why?

> I had fun with more divide and conquer methods for / (Burnikel Ziegler) 
> and the most beautiful is sqrt (Zimmerman) which beats squared for huge 
> numbers. See posts in this serie: 
> http://smallissimo.blogspot.com/2019/05/tuning-large-arithmetic-thresholds.html.
I'll take a look and see what.

> It would be nice to make those accelerated arithmetic optional, and it 
> would be nice too if other pairs of eyes could review and improve code 
> in this area.

I agree on collaboration, and I think it needs to start by having code 
not live in one particular dialect.  Do you know if the Squeak code you 
speak of is portable?  And, how portable?  I do not recall seeing any 
such code, I have no knowledge and thus no opinion.

Andres.

> 
> Le sam. 7 déc. 2019 à 07:11, Andres Valloud via Cuis-dev 
> <cuis-dev at lists.cuis.st <mailto:cuis-dev at lists.cuis.st>> a écrit :
> 
>     I played with these things quite a bit over the years.  The identity
>     F(a+b) = F(a+1) F(b) + F(a) F(b-1) can be used to speed up
>     calculations.
>        In particular, the idea is to split the index n = a+b so that a >= b
>     and a = 2^k.  Then you can cache F(2^k + 1), F(2^k), and F(2^k - 1) to
>     speed things up (once you have two in any of these triplets, the other
>     one is essentially free).  Since these indices are known in advance, no
>     dictionary is necessary.  Note this approach requires very little
>     storage space, as opposed to the more naive way of breaking n into e.g.
>     (n / 2) ceiling and (n / 2) floor.
> 
>     I saw this about 20 years ago and implemented it in Squeak 1.x or 2.x.
>     Another bit that came from that work: look at F(5^n) mod 100 and 1000.
>     Fascinating, no?
> 
>     Calculating Fn using that recursion plus caching was a bit faster than
>     Dijkstra's recursion (which, by virtue of its shape, is not as cache
>     friendly).  However, when starting from scratch, Dijsktra's recursion
>     was a bit faster.  I did some of this around 2008.
> 
>     A very important implementation strategy for this kind of work is to
>     implement Karatsuba's multiplication.  I did that in the HPS world, and
>     it paid off once you got past 384 bit x 384 bit multiplications or so.
>     Very quickly, Karatsuba's multiplication complexity of about n^1.6
>     soundly beats the usual elementary n^2 multiplication algorithm,
>     even if
>     Karatsuba is implemented in the image while the n^2 algorithm is a VM
>     primitive (that, in the HPS case, I also (re)wrote).  I have no idea
>     what the breakeven point is in the Cog world.
> 
>     By the way, the simple grade school multiplication algorithm is just
>     fine as a default VM implementation because the usual multiplication
>     case involves relatively small large integers, and in that case the
>     simple algorithm wins because of its simplicity.
> 
>     As a suggestion, you can reimplement LargeInteger>>* to do a size test
>     then delegate to the usual primitives if the sizes are small enough,
>     otherwise you do Karatsuba to get smaller integers then try again.
>     However, I think it's cleaner to implement something like
>     Integer>>karatsubaTimes: (refining it in the subclasses appropriately),
>     then you can send that new message from the places you know matter.
> 
>     A VM implementation of Karatsuba's algorithm comes with subtle problems
>     because it needs to allocate multiple intermediate objects.  This means
>     such multiplications will tax the memory manager.  It's possible that
>     allocations will fail at any point, and having to stop the calculation
>     in the middle will induce stability problems (e.g. Karatsuba can almost
>     fill memory completely and then fail --- now the image doesn't have
>     enough space to even react to the memory emergency, and your system
>     crashes).  It's not that trivial to make a robust VM
>     implementation.  In
>     contrast, the grade school multiplication has optimal memory allocation
>     behavior, so if it fails due to an out of memory condition then you
>     know
>     why and what to do.  Moreover, it's not that hard to implement
>     Karatsuba
>     in the image, and again the big performance gain comes from n^1.6 vs
>     n^2.  Soon enough even the constants won't matter.
> 
>     On 12/6/19 20:26, Agustín Sansone via Cuis-dev wrote:
>      > Hi all,
>      >
>      > I was exploring the /mathematical functions/ category of the class
>      > /Integer/ and a I didn't find the very well-known Fibonacci function
>      > ("/f(n)=f(n-1)+f(n-2)"/) , so I decided to implement it. I think
>     this
>      > will be interesting due to the complexity I found (wich is
>     O(log(n))!!).
>      >
>      >   * A simple way to implement it is by writing the direct recursive
>      >     mathematical recurrence relation. This would be exponential.
>      >   * An iterative solution wouldn't do all this repeated work,
>     therefore
>      >     it would be lineal.
>      >   * I used a dynamic-programming-divide-and-conquer-aproach wich
>      >     computes the n-th Fibonacci number in O(log(n)). I think from
>     this
>      >     comment you can see why it works:
>      >
>     https://math.stackexchange.com/questions/1834242/derivation-of-fibonacci-logn-time-sequence.
>      >
>      > Here are some running times:
>      > exponential --> [1 to: 40 do: [:elem | elem fibonacciNumber]]
>     timeToRun
>      >    38536
>      > lineal -->           [1 to: 10000 do: [:elem | elem
>     fibonacciNumber]]
>      > timeToRun.   25558
>      > logarithmic -->  [1 to: 10000 do: [:elem | elem fibonacciNumber]]
>      > timeToRun.   1471
>      >
>      > I leave you all three implementations, let me know what you think,
>      > Agustín
>      >
>      >
>      >
>      >
>     -- 
>     Cuis-dev mailing list
>     Cuis-dev at lists.cuis.st <mailto:Cuis-dev at lists.cuis.st>
>     https://lists.cuis.st/mailman/listinfo/cuis-dev
> 
> 


More information about the Cuis-dev mailing list