grub-devel.gnu.org archive mirror
 help / color / mirror / Atom feed
* Reed-Solomon optimised
@ 2011-11-13 22:41 Vladimir 'φ-coder/phcoder' Serbinenko
  2011-11-14 10:26 ` Vladimir 'φ-coder/phcoder' Serbinenko
  0 siblings, 1 reply; 2+ messages in thread
From: Vladimir 'φ-coder/phcoder' Serbinenko @ 2011-11-13 22:41 UTC (permalink / raw)
  To: The development of GRUB 2

[-- Attachment #1: Type: text/plain, Size: 2100 bytes --]

Hello, all. When I was implementing raidz, I thought of a more
convenient, compact and faster representation of GF(256). Using it I
could make raid6_recover more compact and Reed-Solomon more fast. Later
is very important since we have to run Reed-Solomon check (but not
necessarily much slower recovery) on every boot. I expected 8-fold
speedup but got 16-18-fold one. The idea was the following:
- In standard representation of GF(256) it's fast to add 2 numbers: it's
just a XOR.
- We also know that every non-zero number in GF(256) is represented as a
power of X: a=X^s
So if we know that a=X^s and b=X^t then
c=ab=X^sX^t=X^(s+t)
Since there are only 255 such elements for each one of them we can
precompute s s.t. X^s=a and we can also precompute powers of X. By
choosing s in interval 0...254 we ensure that 0<=s+t<=508, so we need to
store only 508 elements and then we can multiply any 2 elements by first
checking that both are non-zero (and output 0 if any of them is), one
addition and 3 table lookups (which are probably in cache given their
tiny size) which is much faster than the loop with shifts.
Same idea is usable for GF(2^n) up to n=16. Currently in the code we
have GF(256), GF(2^32), GF(2^64) and GF(2^128).
GF(2^32) and GF(2^64) are used for CRC-32/CRC-64 where they are implicit
with all 256 values precomputed.
GF(2^128) is used in cryptodisk and ZFS-crypto. In cryptodisk it's used
in LRW and XTS. In LRW thanks to the trick from LRW we need only 2
multiplications per sector. XTS does only multiplications with X which
are fast (although the version in GRUB could be optimised by shifting
uint64 and not uint8 at time but it needs care to avoid endianness
problems).
ZFS-crypto in GCM on the other hand probably has the problem of slow
GF(2^128) multiplication. It's possible to optimise it somehow using
some precomputation but it hasn't been done.
Also if you notice any other ways to speed up the boot or make critical
modules or kernel more compact, please share your ideas.

-- 
Regards
Vladimir 'φ-coder/phcoder' Serbinenko



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 294 bytes --]

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: Reed-Solomon optimised
  2011-11-13 22:41 Reed-Solomon optimised Vladimir 'φ-coder/phcoder' Serbinenko
@ 2011-11-14 10:26 ` Vladimir 'φ-coder/phcoder' Serbinenko
  0 siblings, 0 replies; 2+ messages in thread
From: Vladimir 'φ-coder/phcoder' Serbinenko @ 2011-11-14 10:26 UTC (permalink / raw)
  To: The development of GRUB 2

[-- Attachment #1: Type: text/plain, Size: 2492 bytes --]

On 13.11.2011 23:41, Vladimir 'φ-coder/phcoder' Serbinenko wrote:
> Hello, all. When I was implementing raidz, I thought of a more
> convenient, compact and faster representation of GF(256). Using it I
> could make raid6_recover more compact and Reed-Solomon more fast. Later
> is very important since we have to run Reed-Solomon check (but not
> necessarily much slower recovery) on every boot. I expected 8-fold
> speedup but got 16-18-fold one. The idea was the following:
> - In standard representation of GF(256) it's fast to add 2 numbers: it's
> just a XOR.
> - We also know that every non-zero number in GF(256) is represented as a
> power of X: a=X^s
> So if we know that a=X^s and b=X^t then
> c=ab=X^sX^t=X^(s+t)
> Since there are only 255 such elements for each one of them we can
> precompute s s.t. X^s=a and we can also precompute powers of X. By
> choosing s in interval 0...254 we ensure that 0<=s+t<=508, so we need to
> store only 508 elements and then we can multiply any 2 elements by first
> checking that both are non-zero (and output 0 if any of them is), one
> addition and 3 table lookups (which are probably in cache given their
> tiny size) which is much faster than the loop with shifts.
> Same idea is usable for GF(2^n) up to n=16. Currently in the code we
> have GF(256), GF(2^32), GF(2^64) and GF(2^128).
> GF(2^32) and GF(2^64) are used for CRC-32/CRC-64 where they are implicit
> with all 256 values precomputed.
Correction,only CRC with irreducible polynomial is connected with GF.
Other CRC are connected with other rings which aren't even integral
domains. And it looks like the polynoms in the CRCs used by GRUB aren't
irreducible. Sorry for this mistake.
> GF(2^128) is used in cryptodisk and ZFS-crypto. In cryptodisk it's used
> in LRW and XTS. In LRW thanks to the trick from LRW we need only 2
> multiplications per sector. XTS does only multiplications with X which
> are fast (although the version in GRUB could be optimised by shifting
> uint64 and not uint8 at time but it needs care to avoid endianness
> problems).
> ZFS-crypto in GCM on the other hand probably has the problem of slow
> GF(2^128) multiplication. It's possible to optimise it somehow using
> some precomputation but it hasn't been done.
> Also if you notice any other ways to speed up the boot or make critical
> modules or kernel more compact, please share your ideas.
>


-- 
Regards
Vladimir 'φ-coder/phcoder' Serbinenko



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 294 bytes --]

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2011-11-14 10:26 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-11-13 22:41 Reed-Solomon optimised Vladimir 'φ-coder/phcoder' Serbinenko
2011-11-14 10:26 ` Vladimir 'φ-coder/phcoder' Serbinenko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).