20210718, 21:07  #45 
Sep 2016
101011000_{2} Posts 
Worth mentioning that using balanced representation cancellation puts you squarely into the "works for average case, but not all cases" category. It is no longer 100% reliable because it's always possible to construct an input that does not destructively cancel out.
For library writers, this is considered a taboo. For more specialized applications like LL where you know you're unlikely to hit those pathological inputs, it's fine. Throw some checksums over it, and it can be made at least as reliable as the hardware itself even if mathematically it isn't 100% provably correct. But even without balanced representation, FFTs still destroy NTTs for smaller products that fit into cache in terms of pure speed. Last fiddled with by Mysticial on 20210718 at 21:08 
20210718, 21:54  #46  
∂^{2}ω=0
Sep 2002
República de California
2×7^{3}×17 Posts 
Quote:
"According to statistical thermodynamics, there is a nonzero probability that that piece of chalk will spontaneously reassemble itself and leap back up into my hand." But instead of waiting for such to occur, he went to the blacboard and grabbed a fresh piece of chalk with with to finish the day's lecture. Our present use case is not nearly so extreme, but as you note, as long as we have reliable errorchecking mechanisms, we need not worry about lowprobability pathological inputs, we simply put in place procedures to deal with them as they come. Hit a dangerously high roundoff or Gerbiczcheck error? Rerun from last checkpoint file and see if it recurs. If so, take the lastcheckpoint residue and randomly rotate it, or rerun at slightly larger FFT length, or whathaveyou. 

20210719, 15:44  #47 
Jul 2021
3·11 Posts 
Right now trying to implement balanced representation in my FFT code. Still made some bug that I can't catch out, searching for it.
But already can tell in advance that there is not much point in doing this improvement inside my code. Because I'm not planning to opensource my library, as it was only a part of learning process. Already what is necessary was answered for my self by people on this thread. For example I'm pretty sure that after implementing this balanced representation I can significantly rise number of used bits from 4 bits per word to desired 1719 bits. And this rise of bits definitely will give great speedup to my FFT. So thanks for pointing out balanced representation. After this speedup finally my FFT will become about the same speed as my NTT. I'll write back if I manage to implement correctly balanced representation and catch current bug. 
20210720, 08:01  #48 
Jul 2021
3×11 Posts 
I observe very strange behavior.
After implementing balanced representation, for some reason I observe following behavior: For example for 2^18 bytes input numbers there is wrong multiplication result if I use 15 and less bits per word, although observed error is very small. Then for 16 17 18 19 bits per word I get correct result and for more than 19 bits I get wrong result again. So for midrange of bits per word I get correct result and for smaller and larger bits I get wrong result. For large bits I can understand that there might be a large error. By for smallrange values I don't understand why its happenning. It might be a bug in my code, that I can't find where it is. Or some other reason that I can't explain why. If it is a bug then I can't explain why I get correct results for part of vector  for example there are 1000 elements in total and I get 230 elements in the beginning of vector with correct value, and 231th value is incorrect. Also if it is a bug then it will be unexplainable why for 16 17 18 19 bits I get correct answer. Strange that some bug might be not critical bug, that in some cases gives correct result. 
20210720, 12:42  #49 
Jun 2003
2^{5}·7·23 Posts 
One thing I could think of: When you balance the most significant word, you might have to carry out a 1 into the next word. Whether or not this needs to be done could change based on the bits per word.
There might be other corner cases related to negative numbers as well. 
20210720, 18:06  #50  
Tribal Bullet
Oct 2004
6727_{8} Posts 
Quote:
The other thing to watch out for with balanced representation is that it need not be unique. In theory with B bits per word all of your array entries x_i should have 2^(B1) <= x_i < 2^(B1) , but your carry propagation can subtract 1 from the next word in the array and add 2^B to x_i, or add 1 and subtract 2^B from x_i. This leads to array entries that may be too large or too small but can still represent a multipleprecision modulus exactly. But the carry propagation must be smart enough to expect extra work converting from balanced representation, and the final carry out of the leading array entry has to be handled carefully because it participates in the modulo operation. As the LL test performs its final iterations the last few residues have long strings of largest or smallest possible x_i and this increases the odds of a bug or an overflow spoiling the computation Last fiddled with by jasonp on 20210720 at 18:10 

20210721, 05:55  #51  
Jul 2021
33_{10} Posts 
Quote:
The strange thing is that I get following picture: for example for 2^18bytes large numbers for 14 bit words I get first 10% of answer correctly, then next word is errored, significantly errored. For 15 bit words there first 30% correct and then error for 16 bits there first 60% correct and then error. And for 17 bits everything is correct. For 18, 19 bits also all is correct. And then for 20 bits I get errors due to overflow of 0.5 error limit, this case is correct case of the error. All numbers and percents above I took approximately, I don't remember by heart what where exact numbers but they looked approximately like those. Last fiddled with by moytrage on 20210721 at 06:03 

Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
PRP on gpu is faster that on cpu  indomit  Information & Answers  4  20201007 10:50 
faster than LL?  paulunderwood  Miscellaneous Math  13  20160802 00:05 
My CPU is getting faster and faster ;)  lidocorc  Software  2  20081108 09:26 
Faster way to do LLT?  1260  Miscellaneous Math  23  20050904 07:12 
Faster than LL?  clowns789  Miscellaneous Math  3  20040527 23:39 