public inbox for linux-mtd@lists.infradead.org
* JFFS3 & performance
@ 2004-12-16 13:20 Joakim Tjernlund
  2004-12-16 14:27 ` Artem B. Bityuckiy
                   ` (2 more replies)
  0 siblings, 3 replies; 196+ messages in thread
From: Joakim Tjernlund @ 2004-12-16 13:20 UTC (permalink / raw)
  To: Linux MTD mailing list

Hi List

I am a long-time JFFS2 user (and developer), but I haven't been following
JFFS2 for a year or two. I have noticed that JFFS3 has begun and I figured
I would offer a few thoughts.

1) Consider changing the start seed for crc32 from 0 to -1. Zero
   is not a good start seed for crc32.

2) Consider another checksum algorithm. Crc32 is very expensive
   and JFFS2 suffered severely in the early days. Now that crc32 is
   heavily optimized that problem is less visible, but crc32 is still
   expensive. Maybe an Adler32 checksum is good enough, or a crc16?

3) Don't calculate an Adler32 checksum when compressing with zlib.
   JFFS2 already has its own checksum.
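
To illustrate point 1, here is a minimal sketch. It is not JFFS2 code: the
name crc32_seeded and the plain bitwise loop are assumptions made for
illustration, using the common reflected crc32 polynomial 0xEDB88320. With a
start seed of 0, any run of leading zero bytes leaves the crc at 0, so those
bytes are effectively not covered; with a seed of -1 (0xffffffff) they change
the result.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise crc32 (reflected polynomial 0xEDB88320) with the start seed
 * passed in and no final inversion, so the effect of the seed is visible.
 * Illustrative only; real code would use a table-driven version.
 */
uint32_t crc32_seeded(uint32_t seed, const unsigned char *buf, size_t len)
{
	uint32_t crc = seed;
	size_t i;
	int bit;

	for (i = 0; i < len; i++) {
		crc ^= buf[i];
		for (bit = 0; bit < 8; bit++)
			/* shift out one bit; XOR in the polynomial if it was set */
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return crc;
}
```

Calling it with seed 0 on one zero byte and on eight zero bytes gives 0 both
times, while seed 0xffffffff gives two different values.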

 Jocke

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2004-12-16 13:20 Joakim Tjernlund
@ 2004-12-16 14:27 ` Artem B. Bityuckiy
  2004-12-16 14:45   ` Joakim Tjernlund
  2004-12-17 11:33 ` David Vrabel
  2004-12-21 14:38 ` Jörn Engel
  2 siblings, 1 reply; 196+ messages in thread
From: Artem B. Bityuckiy @ 2004-12-16 14:27 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Linux MTD mailing list

On Thu, 16 Dec 2004, Joakim Tjernlund wrote:

> Hi List
> 
> I am a long time JFFS2 user(and developer), but I haven't been following
> JFFS2 for a year or two. I have noticed that JFFS3 has begun and I figured
> I would offer a few thoughts.
> 
> 1) Consider changing the start seed to crc32 from 0 to -1. Zero
>    is not a good start seed for crc32
If you do point 2 this will not be needed :-)
 
> 
> 2) Consider another checksum algorithm. Crc32 is very expensive
>    and JFFS2 suffered severely in the early days. Now that crc32 is
>    very optimized that problem is less visible, but crc32 is still
>    expensive. Maybe an Adler32 checksum is good enough or a crc16?
IMHO, NAND and ECC NOR are additionally protected by ECC, so that sounds
reasonable. Plain NOR is reliable, so that is reasonable too, IMHO.

> 
> 3) Don't calculate an Adler32 checksum when compressing with zlib.
>    JFFS2 already has its own checksum.
This really seems reasonable. Personally I haven't done that, but I guess it
is possible to ask zlib not to add Adler32 checksums, right?


I'll add your ideas to jffs3/TODO if you don't mind :-)

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.


* RE: JFFS3 & performance
  2004-12-16 14:27 ` Artem B. Bityuckiy
@ 2004-12-16 14:45   ` Joakim Tjernlund
  2004-12-16 14:50     ` Artem B. Bityuckiy
  2004-12-16 17:53     ` Jörn Engel
  0 siblings, 2 replies; 196+ messages in thread
From: Joakim Tjernlund @ 2004-12-16 14:45 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: Linux MTD mailing list

> On Thu, 16 Dec 2004, Joakim Tjernlund wrote:
> 
> > Hi List
> > 
> > I am a long time JFFS2 user(and developer), but I haven't been following
> > JFFS2 for a year or two. I have noticed that JFFS3 has begun and I figured
> > I would offer a few thoughts.
> > 
> > 1) Consider changing the start seed to crc32 from 0 to -1. Zero
> >    is not a good start seed for crc32
> If you do point 2 this will not be needed :-)

Yes, but even if you do point 2, you still have to choose a start seed, and it
should be != 0 regardless of which checksum you choose.

>  
> > 
> > 2) Consider another checksum algorithm. Crc32 is very expensive
> >    and JFFS2 suffered severely in the early days. Now that crc32 is
> >    very optimized that problem is less visible, but crc32 is still
> >    expensive. Maybe an Adler32 checksum is good enough or a crc16?
> IMHO, NAND/ECC NOR are additionally protected by ECCs so that sounds 
> reasonable. NORs are reliable, so that is reasonable too, IMHO.

Exactly.

> 
> > 
> > 3) Don't calculate an Adler32 checksum when compressing with zlib.
> >    JFFS2 already has its own checksum.
> This really seems reasonable. Personally I haven't done that, but I guess it
> is possible to ask zlib not to add Adler32 checksums, right?

Yes, I tried this once, but that was too long ago for me to remember how
I did it. Currently JFFS2 skips the adler32 check on read.
> 
> 
> I'll put your ideas to jffs3/TODO if you don't mind :-)

I don't mind :)

 Jocke


* RE: JFFS3 & performance
  2004-12-16 14:45   ` Joakim Tjernlund
@ 2004-12-16 14:50     ` Artem B. Bityuckiy
  2004-12-16 15:00       ` Joakim Tjernlund
  2004-12-16 17:53     ` Jörn Engel
  1 sibling, 1 reply; 196+ messages in thread
From: Artem B. Bityuckiy @ 2004-12-16 14:50 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Linux MTD mailing list

On Thu, 16 Dec 2004, Joakim Tjernlund wrote:

> > On Thu, 16 Dec 2004, Joakim Tjernlund wrote:
> > 
> > > Hi List
> > > 
> > > I am a long time JFFS2 user(and developer), but I haven't been following
> > > JFFS2 for a year or two. I have noticed that JFFS3 has begun and I figured
> > > I would offer a few thoughts.
> > > 
> > > 1) Consider changing the start seed to crc32 from 0 to -1. Zero
> > >    is not a good start seed for crc32
> > If you do point 2 this will not be needed :-)
> 
> Yes, but if you do point 2, you still have to consider start seed and it should
> be != 0 regardless of what checksum you choose.

Have you implemented any of your ideas? :)

> 
> >  
> > > 
> > > 2) Consider another checksum algorithm. Crc32 is very expensive
> > >    and JFFS2 suffered severely in the early days. Now that crc32 is
> > >    very optimized that problem is less visible, but crc32 is still
> > >    expensive. Maybe an Adler32 checksum is good enough or a crc16?
> > IMHO, NAND/ECC NOR are additionally protected by ECCs so that sounds 
> > reasonable. NORs are reliable, so that is reasonable too, IMHO.
> 
> Exactly.
> 
> > 
> > > 
> > > 3) Don't calculate an Adler32 checksum when compressing with zlib.
> > >    JFFS2 already has its own checksum.
> > This really seems reasonable. Personally I haven't done that, but I guess it
> > is possible to ask zlib not to add Adler32 checksums, right?
> 
> Yes, I tried this once, but that was too long ago for me to remember how
> I did it. Currently JFFS2 skips the adler32 check on read.
> > 
> > 
> > I'll put your ideas to jffs3/TODO if you don't mind :-)
> 
> I don't mind :)

Done.

> 
>  Jocke
> 

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.


* RE: JFFS3 & performance
  2004-12-16 14:50     ` Artem B. Bityuckiy
@ 2004-12-16 15:00       ` Joakim Tjernlund
  0 siblings, 0 replies; 196+ messages in thread
From: Joakim Tjernlund @ 2004-12-16 15:00 UTC (permalink / raw)
  To: Artem Bityuckiy; +Cc: Linux MTD mailing list

> On Thu, 16 Dec 2004, Joakim Tjernlund wrote:
> 
> > > On Thu, 16 Dec 2004, Joakim Tjernlund wrote:
> > > 
> > > > Hi List
> > > > 
> > > > I am a long time JFFS2 user(and developer), but I haven't been following
> > > > JFFS2 for a year or two. I have noticed that JFFS3 has begun and I figured
> > > > I would offer a few thoughts.
> > > > 
> > > > 1) Consider changing the start seed to crc32 from 0 to -1. Zero
> > > >    is not a good start seed for crc32
> > > If you do point 2 this will not be needed :-)
> > 
> > Yes, but if you do point 2, you still have to consider start seed and it should
> > be != 0 regardless of what checksum you choose.
> 
> Have you implemented any of your ideas? :)

No. I once had the removal of the Adler32 checksum done, but it is gone now.
I could possibly do that again, along with an optimized crc16 (I did the crc32
in the current kernel, so it shouldn't be hard) or adler32 checksum algorithm.
It will have to wait, as I have plenty to do currently.

 Jocke


* Re: JFFS3 & performance
  2004-12-16 14:45   ` Joakim Tjernlund
  2004-12-16 14:50     ` Artem B. Bityuckiy
@ 2004-12-16 17:53     ` Jörn Engel
  2004-12-16 18:42       ` Artem B. Bityuckiy
  2004-12-21 14:45       ` Artem B. Bityuckiy
  1 sibling, 2 replies; 196+ messages in thread
From: Jörn Engel @ 2004-12-16 17:53 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Linux MTD mailing list

On Thu, 16 December 2004 15:45:00 +0100, Joakim Tjernlund wrote:
> > On Thu, 16 Dec 2004, Joakim Tjernlund wrote:
> 
> > > 2) Consider another checksum algorithm. Crc32 is very expensive
> > >    and JFFS2 suffered severely in the early days. Now that crc32 is
> > >    very optimized that problem is less visible, but crc32 is still
> > >    expensive. Maybe an Adler32 checksum is good enough or a crc16?
> > IMHO, NAND/ECC NOR are additionally protected by ECCs so that sounds 
> > reasonable. NORs are reliable, so that is reasonable too, IMHO.
> 
> Exactly.

I'd vote against adler32 and in favor of crc16 or crc24.  Crc24 would
be slightly safer, but still much faster than crc33 (crc32 is a
misnomer, actually).

Code should be something like this for crc24:

u32 crc24(void *m, size_t len)
{
	u32 ret=0;
	char *s=m;
	size_t i;
	for (i=0; i<len; i++) {
		ret <<= 8;
		ret += s[i];
		ret %= 0xfffffd;
	}
	return ret;
}

Completely untested, written from scratch just now.  But apart from
the bugs, it should be reasonably fast.

Jörn

-- 
Fancy algorithms are buggier than simple ones, and they're much harder
to implement. Use simple algorithms as well as simple data structures.
-- Rob Pike


* Re: JFFS3 & performance
  2004-12-16 17:53     ` Jörn Engel
@ 2004-12-16 18:42       ` Artem B. Bityuckiy
  2004-12-16 19:15         ` Jörn Engel
  2004-12-21 14:45       ` Artem B. Bityuckiy
  1 sibling, 1 reply; 196+ messages in thread
From: Artem B. Bityuckiy @ 2004-12-16 18:42 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Linux MTD mailing list, Joakim Tjernlund

On Thu, 16 Dec 2004, [iso-8859-1] Jörn Engel wrote:

And several other aspects:
 For the node's common header, which is only 8 bytes long, it is reasonable
to use a very, very simple CRC.
 For the node headers (which follow the common header, see jffs[23].h) we may
possibly try crc16, since they are also not very long.
 For direntry node data (the name of the direntry), which is no longer than 255
symbols (at least currently), we may also use crc16, I guess.
 The inode data is <= PAGE_SIZE (mostly 4K), so it may need something stronger.
 Other nodes' data, like summary or ICP (they do not exist yet, but I hope they
are going to appear), may be longer, but is in any case limited by the JFFS2
erase block size. This may be protected by something stronger...

Frankly speaking, I'm not an expert in CRC issues. Maybe somebody can
consider this issue in a more formal way, using some criteria, etc. Maybe
it is possible to evaluate the probability that a CRC "misses" an error on
typical NOR/NAND if we know the data lengths...?

> On Thu, 16 December 2004 15:45:00 +0100, Joakim Tjernlund wrote:
> > > On Thu, 16 Dec 2004, Joakim Tjernlund wrote:
> > 
> > > > 2) Consider another checksum algorithm. Crc32 is very expensive
> > > >    and JFFS2 suffered severely in the early days. Now that crc32 is
> > > >    very optimized that problem is less visible, but crc32 is still
> > > >    expensive. Maybe an Adler32 checksum is good enough or a crc16?
> > > IMHO, NAND/ECC NOR are additionally protected by ECCs so that sounds 
> > > reasonable. NORs are reliable, so that is reasonable too, IMHO.
> > 
> > Exactly.
> 
> I'd vote against adler32 and in favor of crc16 or crc24.  Crc24 would
> be slightly safer, but still much faster than crc33 (crc32 is a
> misnomer, actually).
> 
> Code should be something like this for crc24:
> 
> u32 crc24(void *m, size_t len)
> {
> 	u32 ret=0;
> 	char *s=m;
> 	size_t i;
> 	for (i=0; i<len; i++) {
> 		ret <<= 8;
> 		ret += s[i];
> 		ret %= 0xfffffd;
> 	}
> 	return ret;
> }
> 
> Completely untested, written from scratch just now.  But apart from
> the bugs, it should be reasonably fast.
> 
> Jörn
> 
> -- 
> Fancy algorithms are buggier than simple ones, and they're much harder
> to implement. Use simple algorithms as well as simple data structures.
> -- Rob Pike
> 

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.


* Re: JFFS3 & performance
  2004-12-16 18:42       ` Artem B. Bityuckiy
@ 2004-12-16 19:15         ` Jörn Engel
  2004-12-16 19:49           ` Jörn Engel
  0 siblings, 1 reply; 196+ messages in thread
From: Jörn Engel @ 2004-12-16 19:15 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: Linux MTD mailing list, Joakim Tjernlund

On Thu, 16 December 2004 18:42:02 +0000, Artem B. Bityuckiy wrote:
> 
> And several other aspects:
>  For the node's common header, which is only 8 bytes long, it is reasonable
> to use a very, very simple CRC.
>  For the node headers (which follow the common header, see jffs[23].h) we may
> possibly try crc16, since they are also not very long.
>  For direntry node data (the name of the direntry), which is no longer than 255
> symbols (at least currently), we may also use crc16, I guess.
>  The inode data is <= PAGE_SIZE (mostly 4K), so it may need something stronger.
>  Other nodes' data, like summary or ICP (they do not exist yet, but I hope they
> are going to appear), may be longer, but is in any case limited by the JFFS2
> erase block size. This may be protected by something stronger...
> 
> Frankly speaking, I'm not an expert in CRC issues. Maybe somebody can
> consider this issue in a more formal way, using some criteria, etc. Maybe
> it is possible to evaluate the probability that a CRC "misses" an error on
> typical NOR/NAND if we know the data lengths...?

Not being an expert either, I can at least leave my half-knowledge
here.  If someone knows things better than I do, please correct me.


Principle of crc:
A crc is nothing but the remainder of an integer division.  That
simple.


Example crc4:
crc4(data) = data%13;

The divisor was picked such that the remainder always fits into 4
bits.  Natural choice is a prime number, and 13 happens to be the
biggest prime that fits into 4 bits.  Non-prime numbers work as well,
but primes generally give you a better feeling.

Since most cpus don't support division of arbitrary-length data, the
division is implemented in a loop, just like my crc24 example:

> > u32 crc24(void *m, size_t len)
> > {
> > 	u32 ret=0;
> > 	char *s=m;
> > 	size_t i;
> > 	for (i=0; i<len; i++) {
> > 		ret <<= 8;
> > 		ret += s[i];
> > 		ret %= 0xfffffd;
> > 	}
> > 	return ret;
> > }

It is trivial to prove that crc4 detects any possible one-bit error.
It doesn't detect all possible two-bit or higher errors, though, so
crc4 should be considered pretty weak.
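
The one-bit claim can be checked mechanically. A small sketch, assuming the
crc4 above is literally data % 13 on a 32-bit word (the helper names are made
up for illustration): flipping bit k changes the value by plus or minus 2^k,
and no power of two is a multiple of 13, so the remainder always changes. Two
flipped bits can cancel, for example bits 0 and 6 when both start out as 0,
because 2^6 + 2^0 = 65 = 5 * 13.

```c
#include <stdint.h>

/* The "crc4" from the text: remainder of an integer division by 13. */
unsigned crc4(uint32_t data)
{
	return data % 13;
}

/* Returns 1 if every single-bit flip of data changes crc4(data). */
int crc4_catches_single_flips(uint32_t data)
{
	int bit;

	for (bit = 0; bit < 32; bit++)
		if (crc4(data ^ (UINT32_C(1) << bit)) == crc4(data))
			return 0;
	return 1;
}

/* Returns 1 if flipping bits i and j of data goes unnoticed by crc4. */
int crc4_misses_double_flip(uint32_t data, int i, int j)
{
	uint32_t bad = data ^ (UINT32_C(1) << i) ^ (UINT32_C(1) << j);

	return crc4(bad) == crc4(data);
}
```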


Example crc32:
Most people have learned many boring details about this algorithm,
esp. about highly optimized variants of it.  In principle, it works
just as above with a 32-bit prime number as divisor instead.  On
32-bit machines, you cannot even shift the remainder by a single bit
without losing precision, so the algorithm goes through lots of pain
just to make it work in software and even more pain to make it
somewhat fast.
Only hardware-designers can come up with something like that.

Like crc4, crc32 will obviously detect every possible 1-bit error.  It
appears as if it also detects all possible 2-bit and 3-bit errors, but
no one could show me a formal proof yet.  People simply rely on the
fact that many people tried for quite some time to find an undetected
3-bit error and couldn't find any.  When dealing with ethernet, you
might find some relevant documentation.
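
One caveat worth a sketch: the crc32 used on Ethernet is usually described
as a division of polynomials over GF(2), where each subtraction is an XOR
with no carries, rather than an ordinary integer division. The toy below
shows only that carry-less remainder. The function name gf2_mod and the
5-bit divisor 0x13 (x^4 + x + 1) are picked for illustration and are not
taken from any real crc.

```c
#include <stdint.h>

/*
 * Remainder of data divided by poly over GF(2): the same long division
 * as with integers, but every subtraction is an XOR, so there are no
 * carries or borrows between bit positions.
 */
uint32_t gf2_mod(uint32_t data, uint32_t poly)
{
	int deg = 31;	/* position of the highest set bit in poly */
	int bit;

	if (poly == 0)
		return data;	/* no divisor, nothing to reduce */
	while (!((poly >> deg) & 1))
		deg--;
	for (bit = 31; bit >= deg; bit--)
		if ((data >> bit) & 1)
			data ^= poly << (bit - deg);
	return data;
}
```

With poly 0x13, gf2_mod(0x26, 0x13) is 0 because 0x26 is 0x13 shifted left by
one, whereas the integer remainder 0x26 % 13 is not 0. A single set bit never
reduces to 0 with this divisor, but bits 0 and 15 set together do.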


Example crc24:
See above for the code.  Remainder fits into 24 bits, so you can shift
by 8 bits on 32-bit machines.  Process a full byte per loop cycle
without any complex code.  Nice.


Example crc16:
Similar, but remainder fits into 16 bits.  As a result, you can
process two bytes per loop cycle, so it should be about twice as fast.


I have no idea about how good either crc24 or crc16 are wrt. detecting
n-bit errors.  But even if they don't detect every single error, crc16
will only miss one out of about 2^16 (65521 is the largest prime)
n-bit errors and crc24 will miss one out of about 2^24 (0xfffffd)
n-bit errors.  Maybe they also catch _all_ 2-bit errors, I just don't
know.


Jörn

PS: Now you've done it.  I'll implement crc24 and crc16 and benchmark
them against adler32.  Darn you!

-- 
Fantasy is more important than knowledge. Knowledge is limited,
while fantasy embraces the whole world.
-- Albert Einstein


* Re: JFFS3 & performance
  2004-12-16 19:15         ` Jörn Engel
@ 2004-12-16 19:49           ` Jörn Engel
  2004-12-16 19:58             ` Joakim Tjernlund
  2004-12-16 20:02             ` Joakim Tjernlund
  0 siblings, 2 replies; 196+ messages in thread
From: Jörn Engel @ 2004-12-16 19:49 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: Joakim Tjernlund, Linux MTD mailing list

On Thu, 16 December 2004 20:15:00 +0100, Jörn Engel wrote:
> 
> PS: Now you've done it.  I'll implement crc24 and crc16 and benchmark
> them against adler32.  Darn you!

Testcase going through 45MB of data in chunks of 4k.  Machine is
PIII-1166 with warm caches.

Doing three runs and discarding the fastest and slowest ones:

crc32:
real    0m0.214s
user    0m0.133s
sys     0m0.076s

adler32:
real    0m0.128s
user    0m0.061s
sys     0m0.066s

crc24:
real    0m0.969s
user    0m0.882s
sys     0m0.073s

crc16:
real    0m0.382s
user    0m0.312s
sys     0m0.061s


Looks like those cold hard numbers beat the crap out of my previous
argument.  So unless someone can seriously optimize the functions below,
just pick adler32.


uint32_t crc24(uint32_t crc, const void *_s, size_t len)
{
	const char *s = _s;
	uint32_t ret = crc;
	strlen(s);
	for (; len; len--,s++) {
		ret <<= 8;
		ret += *s;
		ret %= 0xfffffd;
	}
	return ret;
}

uint32_t crc16(uint32_t crc, const void *_s, size_t len)
{
	const uint16_t *s = _s;
	uint32_t ret = crc;
	for (; len>1; len-=2,s++) {
		ret <<= 16;
		ret += *s;
		ret %= 65521;
	}
	return ret;
}


Just in order to be complete, here is a variant of reiserfs' r5 hash.
No one should seriously use it for error detection, but the results are
nice for comparison:
real    0m0.207s
user    0m0.134s
sys     0m0.067s


Jörn

-- 
"Translations are and will always be problematic. They inflict violence 
upon two languages." (translation from German)


* RE: JFFS3 & performance
  2004-12-16 19:49           ` Jörn Engel
@ 2004-12-16 19:58             ` Joakim Tjernlund
  2004-12-16 20:46               ` Jörn Engel
  2004-12-16 20:02             ` Joakim Tjernlund
  1 sibling, 1 reply; 196+ messages in thread
From: Joakim Tjernlund @ 2004-12-16 19:58 UTC (permalink / raw)
  To: Jörn Engel, Artem B. Bityuckiy; +Cc: Linux MTD mailing list

> On Thu, 16 December 2004 20:15:00 +0100, Jörn Engel wrote:
> >
> > PS: Now you've done it.  I'll implement crc24 and crc16 and benchmark
> > them against adler32.  Darn you!
>
> Testcase going through 45MB of data in chunks of 4k.  Machine is
> PIII-1166 with warm caches.
>
> Doing three runs and discarding the fastest and slowest ones:
>
> crc32:
> real    0m0.214s
> user    0m0.133s
> sys     0m0.076s
>
> adler32:
> real    0m0.128s
> user    0m0.061s
> sys     0m0.066s
>
> crc24:
> real    0m0.969s
> user    0m0.882s
> sys     0m0.073s
>
> crc16:
> real    0m0.382s
> user    0m0.312s
> sys     0m0.061s
>
>
> Looks like those cold hard numbers beat the crap out of my previous
> argument.  So unless someone can seriously optimize below functions,
> just pick adler32.

A table driven crc24/crc16 is faster, I think.

>
>
> uint32_t crc24(uint32_t crc, const void *_s, size_t len)
> {
> 	const char *s = _s;
> 	uint32_t ret = crc;
> 	strlen(s);

Why strlen()?

> 	for (; len; len--,s++) {
> 		ret <<= 8;
> 		ret += *s;
> 		ret %= 0xfffffd;
> 	}
> 	return ret;
> }

This is probably faster (on PPC).
It depends on the gcc version as well.

uint32_t crc24(uint32_t crc, const void *_s, size_t len)
{
	const char *s = _s-1;
	uint32_t ret = crc;
	if (len)
		do {
			ret <<= 8;
			ret += *++s;
			ret %= 0xfffffd;
		} while (--len);
	return ret;
}
>
> uint32_t crc16(uint32_t crc, const void *_s, size_t len)
> {
> 	const uint16_t *s = _s;
> 	uint32_t ret = crc;
> 	for (; len>1; len-=2,s++) {
> 		ret <<= 16;
> 		ret += *s;
> 		ret %= 65521;
> 	}
> 	return ret;
> }
>
>
> Just in order to be complete, here is a variant of reiserfs' r5 hash.
> Noone should seriously use it for error detection, but the results are
> nice for comparison:
> real    0m0.207s
> user    0m0.134s
> sys     0m0.067s
>
>
> Jörn
>
> --
> "Translations are and will always be problematic. They inflict violence
> upon two languages." (translation from German)


* RE: JFFS3 & performance
  2004-12-16 19:49           ` Jörn Engel
  2004-12-16 19:58             ` Joakim Tjernlund
@ 2004-12-16 20:02             ` Joakim Tjernlund
  2004-12-16 20:37               ` Thomas Gleixner
  1 sibling, 1 reply; 196+ messages in thread
From: Joakim Tjernlund @ 2004-12-16 20:02 UTC (permalink / raw)
  To: Jörn Engel, Artem B. Bityuckiy; +Cc: Linux MTD mailing list

> On Thu, 16 December 2004 20:15:00 +0100, Jörn Engel wrote:
> >
> > PS: Now you've done it.  I'll implement crc24 and crc16 and benchmark
> > them against adler32.  Darn you!
>
> Testcase going through 45MB of data in chunks of 4k.  Machine is
> PIII-1166 with warm caches.
>
> Doing three runs and discarding the fastest and slowest ones:
>
> crc32:
> real    0m0.214s
> user    0m0.133s
> sys     0m0.076s
>
> adler32:
> real    0m0.128s
> user    0m0.061s
> sys     0m0.066s
>
> crc24:
> real    0m0.969s
> user    0m0.882s
> sys     0m0.073s
>
> crc16:
> real    0m0.382s
> user    0m0.312s
> sys     0m0.061s
>
>
> Looks like those cold hard numbers beat the crap out of my previous
> argument.  So unless someone can seriously optimize below functions,
> just pick adler32.

hmm, crc32 is much faster than crc16/crc24. That means there is plenty of room
for optimization. One probably needs to find a table-driven algorithm and do
a little unrolling.

 Jocke


* RE: JFFS3 & performance
  2004-12-16 20:02             ` Joakim Tjernlund
@ 2004-12-16 20:37               ` Thomas Gleixner
  2004-12-16 20:51                 ` Jörn Engel
  0 siblings, 1 reply; 196+ messages in thread
From: Thomas Gleixner @ 2004-12-16 20:37 UTC (permalink / raw)
  To: Joakim.Tjernlund; +Cc: Linux MTD mailing list

On Thu, 2004-12-16 at 21:02 +0100, Joakim Tjernlund wrote:
> hmm, crc32 is much faster than crc16/crc24. That means there is plenty of room
> for optimization. One probably needs to find a table-driven algorithm and do
> a little unrolling.
> 

static unsigned short crc16tab[] = {
        0x0000, 0xc0c1, 0xc181, 0x0140, 0xc301, 0x03c0, 0x0280, 0xc241,
        0xc601, 0x06c0, 0x0780, 0xc741, 0x0500, 0xc5c1, 0xc481, 0x0440,
        0xcc01, 0x0cc0, 0x0d80, 0xcd41, 0x0f00, 0xcfc1, 0xce81, 0x0e40,
        0x0a00, 0xcac1, 0xcb81, 0x0b40, 0xc901, 0x09c0, 0x0880, 0xc841,
        0xd801, 0x18c0, 0x1980, 0xd941, 0x1b00, 0xdbc1, 0xda81, 0x1a40,
        0x1e00, 0xdec1, 0xdf81, 0x1f40, 0xdd01, 0x1dc0, 0x1c80, 0xdc41,
        0x1400, 0xd4c1, 0xd581, 0x1540, 0xd701, 0x17c0, 0x1680, 0xd641,
        0xd201, 0x12c0, 0x1380, 0xd341, 0x1100, 0xd1c1, 0xd081, 0x1040,
        0xf001, 0x30c0, 0x3180, 0xf141, 0x3300, 0xf3c1, 0xf281, 0x3240,
        0x3600, 0xf6c1, 0xf781, 0x3740, 0xf501, 0x35c0, 0x3480, 0xf441,
        0x3c00, 0xfcc1, 0xfd81, 0x3d40, 0xff01, 0x3fc0, 0x3e80, 0xfe41,
        0xfa01, 0x3ac0, 0x3b80, 0xfb41, 0x3900, 0xf9c1, 0xf881, 0x3840,
        0x2800, 0xe8c1, 0xe981, 0x2940, 0xeb01, 0x2bc0, 0x2a80, 0xea41,
        0xee01, 0x2ec0, 0x2f80, 0xef41, 0x2d00, 0xedc1, 0xec81, 0x2c40,
        0xe401, 0x24c0, 0x2580, 0xe541, 0x2700, 0xe7c1, 0xe681, 0x2640,
        0x2200, 0xe2c1, 0xe381, 0x2340, 0xe101, 0x21c0, 0x2080, 0xe041,
        0xa001, 0x60c0, 0x6180, 0xa141, 0x6300, 0xa3c1, 0xa281, 0x6240,
        0x6600, 0xa6c1, 0xa781, 0x6740, 0xa501, 0x65c0, 0x6480, 0xa441,
        0x6c00, 0xacc1, 0xad81, 0x6d40, 0xaf01, 0x6fc0, 0x6e80, 0xae41,
        0xaa01, 0x6ac0, 0x6b80, 0xab41, 0x6900, 0xa9c1, 0xa881, 0x6840,
        0x7800, 0xb8c1, 0xb981, 0x7940, 0xbb01, 0x7bc0, 0x7a80, 0xba41,
        0xbe01, 0x7ec0, 0x7f80, 0xbf41, 0x7d00, 0xbdc1, 0xbc81, 0x7c40,
        0xb401, 0x74c0, 0x7580, 0xb541, 0x7700, 0xb7c1, 0xb681, 0x7640,
        0x7200, 0xb2c1, 0xb381, 0x7340, 0xb101, 0x71c0, 0x7080, 0xb041,
        0x5000, 0x90c1, 0x9181, 0x5140, 0x9301, 0x53c0, 0x5280, 0x9241,
        0x9601, 0x56c0, 0x5780, 0x9741, 0x5500, 0x95c1, 0x9481, 0x5440,
        0x9c01, 0x5cc0, 0x5d80, 0x9d41, 0x5f00, 0x9fc1, 0x9e81, 0x5e40,
        0x5a00, 0x9ac1, 0x9b81, 0x5b40, 0x9901, 0x59c0, 0x5880, 0x9841,
        0x8801, 0x48c0, 0x4980, 0x8941, 0x4b00, 0x8bc1, 0x8a81, 0x4a40,
        0x4e00, 0x8ec1, 0x8f81, 0x4f40, 0x8d01, 0x4dc0, 0x4c80, 0x8c41,
        0x4400, 0x84c1, 0x8581, 0x4540, 0x8701, 0x47c0, 0x4680, 0x8641,
        0x8201, 0x42c0, 0x4380, 0x8341, 0x4100, 0x81c1, 0x8081, 0x4040,
};

/*
* Calculate crc16
*/
unsigned short crc16 (unsigned char *buf, size_t len)
{
        unsigned short crc = 0xffff;
        while (len--)
                crc = (unsigned short) (crc >> 8) ^ 
		     crc16tab[(crc ^ (unsigned short)(*buf++)) & 0xff];
        return crc;
}
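
The table above does not have to be typed in by hand. As a sketch, assuming it
is the usual reflected crc16 table for polynomial 0xa001 (the first entries
0x0000, 0xc0c1, 0xc181 match that), it can be generated at startup. The
function names here are hypothetical, and crc16_with_table is the same loop as
above with the table passed in.

```c
#include <stddef.h>

/* Fill tab[] with the reflected crc16 table for polynomial 0xa001. */
void crc16_make_table(unsigned short tab[256])
{
	unsigned i;
	int bit;

	for (i = 0; i < 256; i++) {
		unsigned short crc = (unsigned short)i;

		/* push the byte through 8 single-bit division steps */
		for (bit = 0; bit < 8; bit++)
			crc = (crc & 1) ? (unsigned short)((crc >> 1) ^ 0xa001)
					: (unsigned short)(crc >> 1);
		tab[i] = crc;
	}
}

/* Same byte-at-a-time loop as above, start value 0xffff. */
unsigned short crc16_with_table(const unsigned short tab[256],
				const unsigned char *buf, size_t len)
{
	unsigned short crc = 0xffff;

	while (len--)
		crc = (unsigned short)((crc >> 8) ^ tab[(crc ^ *buf++) & 0xff]);
	return crc;
}
```

Generating the table trades 512 bytes of initialized data for a short setup
loop, which may or may not matter on a given target.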


* Re: JFFS3 & performance
  2004-12-16 19:58             ` Joakim Tjernlund
@ 2004-12-16 20:46               ` Jörn Engel
  0 siblings, 0 replies; 196+ messages in thread
From: Jörn Engel @ 2004-12-16 20:46 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Linux MTD mailing list

On Thu, 16 December 2004 20:58:55 +0100, Joakim Tjernlund wrote:
> > crc24:
> > real    0m0.969s
> > user    0m0.882s
> > sys     0m0.073s
> >
> >
> > Looks like those cold hard numbers beat the crap out of my previous
> > argument.  So unless someone can seriously optimize below functions,
> > just pick adler32.
> 
> A table driven crc24/crc16 is faster, I think.

Sure.  I'm just too lazy to create one.  (hint, hint)

> > uint32_t crc24(uint32_t crc, const void *_s, size_t len)
> > {
> > 	const char *s = _s;
> > 	uint32_t ret = crc;
> > 	strlen(s);
> 
> Why strlen()?

Leftover from my quick'n'dirty hacking.  Numbers without the strlen:
real    0m0.952s
user    0m0.878s
sys     0m0.068s

Not a big difference.

> 
> > 	for (; len; len--,s++) {
> > 		ret <<= 8;
> > 		ret += *s;
> > 		ret %= 0xfffffd;
> > 	}
> > 	return ret;
> > }
> 
> This is probably faster (on PPC).
> It depends on the gcc version as well.
>
> uint32_t crc24(uint32_t crc, const void *_s, size_t len)
> {
> 	const char *s = _s-1;
> 	uint32_t ret = crc;
> 	if (len)
> 		do {
> 			ret <<= 8;
> 			ret += *++s;
> 			ret %= 0xfffffd;
> 		} while (--len);
> 	return ret;
> }

Not on i386, though.  So unless the tables really make a difference,
just don't bother.
real    0m0.954s
user    0m0.880s
sys     0m0.064s

Jörn

-- 
People will accept your ideas much more readily if you tell them
that Benjamin Franklin said it first.
-- unknown


* Re: JFFS3 & performance
  2004-12-16 20:37               ` Thomas Gleixner
@ 2004-12-16 20:51                 ` Jörn Engel
  2004-12-16 21:02                   ` Thomas Gleixner
  2004-12-16 21:06                   ` Joakim Tjernlund
  0 siblings, 2 replies; 196+ messages in thread
From: Jörn Engel @ 2004-12-16 20:51 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Linux MTD mailing list, Joakim.Tjernlund

On Thu, 16 December 2004 21:37:32 +0100, Thomas Gleixner wrote:
> 
> static unsigned short crc16tab[] = {
>         0x0000, 0xc0c1, 0xc181, 0x0140, 0xc301, 0x03c0, 0x0280, 0xc241,
>         0xc601, 0x06c0, 0x0780, 0xc741, 0x0500, 0xc5c1, 0xc481, 0x0440,
>         0xcc01, 0x0cc0, 0x0d80, 0xcd41, 0x0f00, 0xcfc1, 0xce81, 0x0e40,
>         0x0a00, 0xcac1, 0xcb81, 0x0b40, 0xc901, 0x09c0, 0x0880, 0xc841,
>         0xd801, 0x18c0, 0x1980, 0xd941, 0x1b00, 0xdbc1, 0xda81, 0x1a40,
>         0x1e00, 0xdec1, 0xdf81, 0x1f40, 0xdd01, 0x1dc0, 0x1c80, 0xdc41,
>         0x1400, 0xd4c1, 0xd581, 0x1540, 0xd701, 0x17c0, 0x1680, 0xd641,
>         0xd201, 0x12c0, 0x1380, 0xd341, 0x1100, 0xd1c1, 0xd081, 0x1040,
>         0xf001, 0x30c0, 0x3180, 0xf141, 0x3300, 0xf3c1, 0xf281, 0x3240,
>         0x3600, 0xf6c1, 0xf781, 0x3740, 0xf501, 0x35c0, 0x3480, 0xf441,
>         0x3c00, 0xfcc1, 0xfd81, 0x3d40, 0xff01, 0x3fc0, 0x3e80, 0xfe41,
>         0xfa01, 0x3ac0, 0x3b80, 0xfb41, 0x3900, 0xf9c1, 0xf881, 0x3840,
>         0x2800, 0xe8c1, 0xe981, 0x2940, 0xeb01, 0x2bc0, 0x2a80, 0xea41,
>         0xee01, 0x2ec0, 0x2f80, 0xef41, 0x2d00, 0xedc1, 0xec81, 0x2c40,
>         0xe401, 0x24c0, 0x2580, 0xe541, 0x2700, 0xe7c1, 0xe681, 0x2640,
>         0x2200, 0xe2c1, 0xe381, 0x2340, 0xe101, 0x21c0, 0x2080, 0xe041,
>         0xa001, 0x60c0, 0x6180, 0xa141, 0x6300, 0xa3c1, 0xa281, 0x6240,
>         0x6600, 0xa6c1, 0xa781, 0x6740, 0xa501, 0x65c0, 0x6480, 0xa441,
>         0x6c00, 0xacc1, 0xad81, 0x6d40, 0xaf01, 0x6fc0, 0x6e80, 0xae41,
>         0xaa01, 0x6ac0, 0x6b80, 0xab41, 0x6900, 0xa9c1, 0xa881, 0x6840,
>         0x7800, 0xb8c1, 0xb981, 0x7940, 0xbb01, 0x7bc0, 0x7a80, 0xba41,
>         0xbe01, 0x7ec0, 0x7f80, 0xbf41, 0x7d00, 0xbdc1, 0xbc81, 0x7c40,
>         0xb401, 0x74c0, 0x7580, 0xb541, 0x7700, 0xb7c1, 0xb681, 0x7640,
>         0x7200, 0xb2c1, 0xb381, 0x7340, 0xb101, 0x71c0, 0x7080, 0xb041,
>         0x5000, 0x90c1, 0x9181, 0x5140, 0x9301, 0x53c0, 0x5280, 0x9241,
>         0x9601, 0x56c0, 0x5780, 0x9741, 0x5500, 0x95c1, 0x9481, 0x5440,
>         0x9c01, 0x5cc0, 0x5d80, 0x9d41, 0x5f00, 0x9fc1, 0x9e81, 0x5e40,
>         0x5a00, 0x9ac1, 0x9b81, 0x5b40, 0x9901, 0x59c0, 0x5880, 0x9841,
>         0x8801, 0x48c0, 0x4980, 0x8941, 0x4b00, 0x8bc1, 0x8a81, 0x4a40,
>         0x4e00, 0x8ec1, 0x8f81, 0x4f40, 0x8d01, 0x4dc0, 0x4c80, 0x8c41,
>         0x4400, 0x84c1, 0x8581, 0x4540, 0x8701, 0x47c0, 0x4680, 0x8641,
>         0x8201, 0x42c0, 0x4380, 0x8341, 0x4100, 0x81c1, 0x8081, 0x4040,
> };
> 
> /*
> * Calculate crc16
> */
> unsigned short crc16 (unsigned char *buf, size_t len)
> {
>         unsigned short crc = 0xffff;
>         while (len--)
>                 crc = (unsigned short) (crc >> 8) ^ 
> 		     crc16tab[(crc ^ (unsigned short)(*buf++)) & 0xff];
>         return crc;
> }
> 

Thomas, you respond faster than I can think!

real    0m0.395s
user    0m0.326s
sys     0m0.067s

The algorithm doesn't run that fast, though.

Jörn

-- 
I can say that I spend most of my time fixing bugs even if I have lots
of new features to implement in mind, but I give bugs more priority.
-- Andrea Arcangeli, 2000


* Re: JFFS3 & performance
  2004-12-16 20:51                 ` Jörn Engel
@ 2004-12-16 21:02                   ` Thomas Gleixner
  2004-12-16 21:06                   ` Joakim Tjernlund
  1 sibling, 0 replies; 196+ messages in thread
From: Thomas Gleixner @ 2004-12-16 21:02 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Joakim.Tjernlund, Linux MTD mailing list

On Thu, 2004-12-16 at 21:51 +0100, Jörn Engel wrote:
> real    0m0.395s
> user    0m0.326s
> sys     0m0.067s
> 
> The algorithm doesn't run that fast, though.

Makes sense, as the modulo in your loop is provided by the div
instruction on x86. I used this on a CPU where no modulo result was
available.
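
On such a CPU the modulo by 0xfffffd can also be done without any division.
A sketch, with made-up function names and unsigned bytes assumed: as long as
ret stays below 0xfffffd, ret * 256 + byte equals
(ret >> 16) * 2^24 + (ret & 0xffff) * 256 + byte, and 2^24 leaves remainder 3
when divided by 0xfffffd, so one multiply by 3 and one conditional subtract
are enough. Whether that beats div on a given machine would have to be
measured.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC24_PRIME 0xfffffdu

/* Reference: the division-based loop, using unsigned bytes. */
uint32_t crc24_div(uint32_t crc, const void *_s, size_t len)
{
	const unsigned char *s = _s;
	uint32_t ret = crc % CRC24_PRIME;

	for (; len; len--, s++) {
		ret <<= 8;
		ret += *s;
		ret %= CRC24_PRIME;
	}
	return ret;
}

/* Same result without a division inside the loop. */
uint32_t crc24_nodiv(uint32_t crc, const void *_s, size_t len)
{
	const unsigned char *s = _s;
	uint32_t ret = crc % CRC24_PRIME;	/* once, so that ret < prime */

	for (; len; len--, s++) {
		/* 2^24 == 3 (mod 0xfffffd): fold the top 8 bits back in */
		ret = 3 * (ret >> 16) + ((ret & 0xffff) << 8) + *s;
		/* the sum is below 2 * prime, so one subtract suffices */
		if (ret >= CRC24_PRIME)
			ret -= CRC24_PRIME;
	}
	return ret;
}
```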

tglx


* RE: JFFS3 & performance
  2004-12-16 20:51                 ` Jörn Engel
  2004-12-16 21:02                   ` Thomas Gleixner
@ 2004-12-16 21:06                   ` Joakim Tjernlund
  2004-12-16 21:22                     ` Jörn Engel
  1 sibling, 1 reply; 196+ messages in thread
From: Joakim Tjernlund @ 2004-12-16 21:06 UTC (permalink / raw)
  To: Jörn Engel, Thomas Gleixner; +Cc: Linux MTD mailing list

> > unsigned short crc16 (unsigned char *buf, size_t len)
> > {
> >         unsigned short crc = 0xffff;
> >         while (len--)
> >                 crc = (unsigned short) (crc >> 8) ^ 
> > 		     crc16tab[(crc ^ (unsigned short)(*buf++)) & 0xff];
> >         return crc;
> > }
> > 
> 
> Thomas, you respond faster than I can think!
> 
> real    0m0.395s
> user    0m0.326s
> sys     0m0.067s
> 
> The algorithm doesn't run that fast, though.

There is still more one can do, look in lib/crc32.c in the kernel to see
what I mean. It is tricky to get everything right, but just unrolling it a bit will help.

 Jocke

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2004-12-16 21:06                   ` Joakim Tjernlund
@ 2004-12-16 21:22                     ` Jörn Engel
  2004-12-16 22:06                       ` Joakim Tjernlund
  0 siblings, 1 reply; 196+ messages in thread
From: Jörn Engel @ 2004-12-16 21:22 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Thomas Gleixner, Linux MTD mailing list

On Thu, 16 December 2004 22:06:21 +0100, Joakim Tjernlund wrote:
> 
> There is still more one can do, look in lib/crc32.c in the kernel to see
> what I mean. It is tricky to get everything right, but just unrolling it a bit will help.

Sure.  But I won't do it tonight anymore and the rest of the week is
booked.

In related news, I just had a look at adler32 code.  It's pretty
obvious why it runs so fast and why we shouldn't use it as a checksum:
o It splits data into chunks of 5552 bytes.
o Inside each chunk, it basically implements the two weak checksums
  from rsync. (And those really are weak.  I once tried using them for
  a hash table and the result was horrible.)
o At the end of each chunk, it implements crc16 on both weak
  checksums.

With jffs3 nodes containing less than 5552 bytes each, the crc16
operation, a simple modulo, is done but once.  So in effect, we end up
with the weak checksums from rsync, which cause tons of collisions in
hash tables.  Pretty nasty for a checksum.

Istr reading something from one of the two authors that basically
states the adler32 weakness for short files.  But for long files, the
checksum is getting strong enough and much faster.  Too bad that our
"files" are short.
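
For reference, the structure described above can be sketched as follows.
This is a simplified version written for illustration, not the zlib
source: two running sums are accumulated per chunk of at most 5552 bytes,
and the modulo 65521 is applied only at the end of each chunk. The names
adler32_sketch, ADLER_BASE and ADLER_NMAX are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_BASE 65521u /* largest prime below 2^16 */
#define ADLER_NMAX 5552u  /* bytes that can be summed before a modulo is needed */

uint32_t adler32_sketch(const unsigned char *buf, size_t len)
{
	uint32_t a = 1, b = 0;

	while (len > 0) {
		size_t chunk = len < ADLER_NMAX ? len : ADLER_NMAX;

		len -= chunk;
		/* Inner loop: two plain additions per byte, no modulo. */
		while (chunk--) {
			a += *buf++;
			b += a;
		}
		/* The only expensive step, done once per chunk. */
		a %= ADLER_BASE;
		b %= ADLER_BASE;
	}
	return (b << 16) | a;
}
```

For a buffer shorter than 5552 bytes the outer loop runs exactly once, so
the result is the two plain sums followed by a single modulo each.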

Jörn

-- 
Optimizations always bust things, because all optimizations are, in
the long haul, a form of cheating, and cheaters eventually get caught.
-- Larry Wall 

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: JFFS3 & performance
  2004-12-16 21:22                     ` Jörn Engel
@ 2004-12-16 22:06                       ` Joakim Tjernlund
  2004-12-17 10:25                         ` Jörn Engel
  0 siblings, 1 reply; 196+ messages in thread
From: Joakim Tjernlund @ 2004-12-16 22:06 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Thomas Gleixner, Linux MTD mailing list

> On Thu, 16 December 2004 22:06:21 +0100, Joakim Tjernlund wrote:
> > 
> > There is still more one can do, look in lib/crc32.c in the kernel to see
> > what I mean. It is tricky to get everything right, but just unrolling it a bit will help.
> 
> Sure.  But I won't do it tonight anymore and the rest of the week is
> booked.

Same here :(

> 
> In related news, I just had a look at adler32 code.  It's pretty
> obvious why it runs so fast and why we shouldn't use it as a checksum:
> o It splits data into chunks of 5552 bytes.
> o Inside each chunk, it basically implements the two weak checksums
>   from rsync. (And those really are weak.  I once tried using them for
>   a hash table and the result was horrible.)
> o At the end of each chunk, it implements crc16 on both weak
>   checksums.
> 
> With jffs3 nodes containing less than 5552 bytes each, the crc16
> operation, a simple modulo, is done but once.  So in effect, we end up
> with the weak checksums from rsync, which cause tons of collisions in
> hash tables.  Pretty nasty for a checksum.
> 
> Istr reading something from one of the two authors that basically
> states the adler32 weakness for short files.  But for long files, the
> checksum is getting strong enough and much faster.  Too bad that our
> "files" are short.

I too remember reading about that. So a new optimized crc16 is probably the best bet.
It should do better than crc32 once it is done.

hmm, I remember that the networking code has some crc/checksum algorithm too.
Maybe it is possible to use that?

 Jocke

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2004-12-16 22:06                       ` Joakim Tjernlund
@ 2004-12-17 10:25                         ` Jörn Engel
  2004-12-17 10:44                           ` Joakim Tjernlund
  0 siblings, 1 reply; 196+ messages in thread
From: Jörn Engel @ 2004-12-17 10:25 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Thomas Gleixner, Linux MTD mailing list

On Thu, 16 December 2004 23:06:41 +0100, Joakim Tjernlund wrote:
> > On Thu, 16 December 2004 22:06:21 +0100, Joakim Tjernlund wrote:
> > 
> > Istr reading something from one of the two authors that basically
> > states the adler32 weakness for short files.  But for long files, the
> > checksum is getting strong enough and much faster.  Too bad that our
> > "files" are short.
> 
> I too remember reading about that. So a new optimized crc16 is probably the best bet.
> It should do better than crc32 once it is done.

Agreed.

> hmm, I remember that the networking code has some crc/checksum algorithm too.
> Maybe it is possible to use that?

include/linux/jhash.h

Looks rather complicated.  Iirc it was introduced to avoid DoS attacks
against the networking code.  With the old hashing algorithm
(include/linux/hash.h?), it was easy for an attacker to predict into
which hash bucket any given packet would go.  So it could just
generate tons of packets that would all hash to the same bucket.  End
result is that the hash table is pointless and performance is the same
as with a linked list.

Then there's ethernet checksumming, which is crc32.

TCP/IP should have some checksums as well, but I'm not familiar with
those right now.

Jörn

-- 
Schrödinger's cat is <BLINK>not</BLINK> dead.
-- Illiad

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: JFFS3 & performance
  2004-12-17 10:25                         ` Jörn Engel
@ 2004-12-17 10:44                           ` Joakim Tjernlund
  2004-12-17 10:56                             ` Artem B. Bityuckiy
  0 siblings, 1 reply; 196+ messages in thread
From: Joakim Tjernlund @ 2004-12-17 10:44 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Thomas Gleixner, Linux MTD mailing list

> 
> On Thu, 16 December 2004 23:06:41 +0100, Joakim Tjernlund wrote:
> > > On Thu, 16 December 2004 22:06:21 +0100, Joakim Tjernlund wrote:
> > > 
> > > Istr reading something from one of the two authors that basically
> > > states the adler32 weakness for short files.  But for long files, the
> > > checksum is getting strong enough and much faster.  Too bad that our
> > > "files" are short.
> > 
> > I too remember reading about that. So a new optimized crc16 is probably the best bet.
> > It should do better than crc32 once it is done.
> 
> Agreed.
> 
> > hmm, I remember that the networking code has some crc/checksum algorithm too.
> > Maybe it is possible to use that?
> 
> include/linux/jhash.h
> 
> Looks rather complicated.  Iirc it was introduced to avoid DoS attacks
> against the networking code.  With the old hashing algorithm
> (include/linux/hash.h?), it was easy for an attacker to predict into
> which hash bucket any given packet would go.  So it could just
> generate tons of packets that would all hash to the same bucket.  End
> result is that the hash table is pointless and performance is the same
> as with a linked list.
> 
> Then there's ethernet checksumming, which is crc32.
> 
> TCP/IP should have some checksums as well, but I'm not familiar with
> those right now.

Those are the ones I am thinking about, csum_partial and csum_partial_copy_generic.
They do not look like they can be used without modification.

Another idea: calculate the CRC backwards (from the end of the buffer to the beginning).
That gives better L1 cache behaviour.

 Jocke

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: JFFS3 & performance
  2004-12-17 10:56                             ` Artem B. Bityuckiy
@ 2004-12-17 10:46                               ` jasmine
  2004-12-17 11:01                                 ` Artem B. Bityuckiy
  2004-12-17 11:10                               ` Joakim Tjernlund
  2004-12-17 11:20                               ` Jörn Engel
  2 siblings, 1 reply; 196+ messages in thread
From: jasmine @ 2004-12-17 10:46 UTC (permalink / raw)
  To: Artem B. Bityuckiy
  Cc: Linux MTD mailing list, Thomas Gleixner, Joakim Tjernlund



On Fri, 17 Dec 2004, Artem B. Bityuckiy wrote:

>> Another idea: calculate the CRC backwards (from the end of the buffer to the beginning).
>> That gives better L1 cache behaviour.
>>
> Why (may be some URL)? For all archs?

Because the process to use the buffer after the CRC will probably start 
from the beginning.  Processing the CRC from the beginning to the end will 
tend to leave the L1 cache full of the end of the buffer and thus the next 
process to use it will need to reload the L1.

-J.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: JFFS3 & performance
  2004-12-17 10:44                           ` Joakim Tjernlund
@ 2004-12-17 10:56                             ` Artem B. Bityuckiy
  2004-12-17 10:46                               ` jasmine
                                                 ` (2 more replies)
  0 siblings, 3 replies; 196+ messages in thread
From: Artem B. Bityuckiy @ 2004-12-17 10:56 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Linux MTD mailing list, Thomas Gleixner

> Another idea: calculate the CRC backwards (from the end of the buffer to the beginning).
> That gives better L1 cache behaviour.
>
Why (may be some URL)? For all archs? 

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: JFFS3 & performance
  2004-12-17 10:46                               ` jasmine
@ 2004-12-17 11:01                                 ` Artem B. Bityuckiy
  2004-12-17 11:19                                   ` Joakim Tjernlund
  0 siblings, 1 reply; 196+ messages in thread
From: Artem B. Bityuckiy @ 2004-12-17 11:01 UTC (permalink / raw)
  To: jasmine; +Cc: Linux MTD mailing list, Thomas Gleixner, Joakim Tjernlund

On Fri, 17 Dec 2004 jasmine@linuxgrrls.org wrote:

> 
> 
> On Fri, 17 Dec 2004, Artem B. Bityuckiy wrote:
> 
> >> Another idea: calculate the CRC backwards (from the end of the buffer to the beginning).
> >> That gives better L1 cache behaviour.
> >>
> > Why (may be some URL)? For all archs?
> 
> Because the process to use the buffer after the CRC will probably start 
> from the beginning.  Processing the CRC from the beginning to the end will 
> tend to leave the L1 cache full of the end of the buffer and thus the next 
> process to use it will need to reload the L1.
> 
> -J.
> 
Ok, I got it, thanks.

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: JFFS3 & performance
  2004-12-17 10:56                             ` Artem B. Bityuckiy
  2004-12-17 10:46                               ` jasmine
@ 2004-12-17 11:10                               ` Joakim Tjernlund
  2004-12-17 11:20                                 ` Artem B. Bityuckiy
  2004-12-17 11:20                               ` Jörn Engel
  2 siblings, 1 reply; 196+ messages in thread
From: Joakim Tjernlund @ 2004-12-17 11:10 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: Thomas Gleixner, Linux MTD mailing list

> 
> > Another idea: calculate the CRC backwards (from the end of the buffer to the beginning).
> > That gives better L1 cache behaviour.
> >
> Why (may be some URL)? For all archs? 

For example, when writing a buffer to flash you first CRC it from top to bottom.
The bottom part of the buffer is then newer than the top of the buffer in the
L1 cache, or the top part has already been evicted from the L1 cache, depending
on the size of the buffer and the size of the L1 cache.

Now you start again from the top of the buffer to actually write the buffer
to flash and you probably need to read the data from RAM into L1 cache again.

Reading from flash will cause similar effects.

If you calculate the CRC backwards you will avoid thrashing the cache needlessly.
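
A minimal sketch of the idea, assuming a plain bit-by-bit CRC-16 with the
reflected polynomial 0xA001 and a start value of 0xffff (this appears to
match the table posted earlier in the thread, but that is an assumption).
The names crc16_byte and crc16_backward are hypothetical. The result
differs from a forward CRC over the same buffer, so the write path and
the read path would both have to walk the buffer in the same direction.

```c
#include <stddef.h>
#include <stdint.h>

/* One byte of the reflected CRC-16 (polynomial 0xA001), bit by bit. */
static uint16_t crc16_byte(uint16_t crc, unsigned char data)
{
	int i;

	crc ^= data;
	for (i = 0; i < 8; i++)
		crc = (crc & 1) ? (crc >> 1) ^ 0xA001 : crc >> 1;
	return crc;
}

/* Walk the buffer from the last byte to the first, so the start of the
 * buffer is the most recently touched part when the function returns. */
uint16_t crc16_backward(const unsigned char *buf, size_t len)
{
	uint16_t crc = 0xffff;

	while (len--)
		crc = crc16_byte(crc, buf[len]);
	return crc;
}
```

Whether this actually helps depends on the buffer size relative to the L1
cache and on the prefetcher, so it needs benchmarking on real hardware.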

 Jocke

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: JFFS3 & performance
  2004-12-17 11:01                                 ` Artem B. Bityuckiy
@ 2004-12-17 11:19                                   ` Joakim Tjernlund
  2004-12-18 16:09                                     ` Jörn Engel
  0 siblings, 1 reply; 196+ messages in thread
From: Joakim Tjernlund @ 2004-12-17 11:19 UTC (permalink / raw)
  To: Artem B. Bityuckiy, jasmine; +Cc: Thomas Gleixner, Linux MTD mailing list

> On Fri, 17 Dec 2004 jasmine@linuxgrrls.org wrote:
> 
> > 
> > 
> > On Fri, 17 Dec 2004, Artem B. Bityuckiy wrote:
> > 
> > >> Another idea: calculate the CRC backwards (from the end of the buffer to the beginning).
> > >> That gives better L1 cache behaviour.
> > >>
> > > Why (may be some URL)? For all archs?
> > 
> > Because the process to use the buffer after the CRC will probably start 
> > from the beginning.  Processing the CRC from the beginning to the end will 
> > tend to leave the L1 cache full of the end of the buffer and thus the next 
> > process to use it will need to reload the L1.
> > 
> > -J.
> > 
> Ok, I got it, thanks.

I wish I read all new messages before I answer so I don't answer when someone else already has :(

I looked a little closer at csum_partial and I think JFFS3 can use it. You need
csum_fold as well:
seed = ~0;
crc = csum_fold(csum_partial(buff, len, seed));

Don't know if it is good enough for JFFS3 but it is fast.

 Jocke

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2004-12-17 10:56                             ` Artem B. Bityuckiy
  2004-12-17 10:46                               ` jasmine
  2004-12-17 11:10                               ` Joakim Tjernlund
@ 2004-12-17 11:20                               ` Jörn Engel
  2004-12-18 12:23                                 ` Artem B. Bityuckiy
  2 siblings, 1 reply; 196+ messages in thread
From: Jörn Engel @ 2004-12-17 11:20 UTC (permalink / raw)
  To: Artem B. Bityuckiy
  Cc: Linux MTD mailing list, Thomas Gleixner, Joakim Tjernlund

On Fri, 17 December 2004 10:56:22 +0000, Artem B. Bityuckiy wrote:
> 
> > Another idea: calculate the CRC backwards (from the end of the buffer to the beginning).
> > That gives better L1 cache behaviour.
> >
> Why (may be some URL)? For all archs? 

Sounds rather odd indeed.  Most CPUs I know about either have no
hardware-prefetch, can only prefetch forward or can prefetch both
forward and backward.  In either case, calculating the CRC backward
should either be the same speed or slower than going forward.

But I like to learn something new.

Jörn

-- 
I've never met a human being who would want to read 17,000 pages of
documentation, and if there was, I'd kill him to get him out of the
gene pool.
-- Joseph Costello

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: JFFS3 & performance
  2004-12-17 11:10                               ` Joakim Tjernlund
@ 2004-12-17 11:20                                 ` Artem B. Bityuckiy
  2004-12-22 13:36                                   ` Artem B. Bityuckiy
  0 siblings, 1 reply; 196+ messages in thread
From: Artem B. Bityuckiy @ 2004-12-17 11:20 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Thomas Gleixner, Linux MTD mailing list

On Fri, 17 Dec 2004, Joakim Tjernlund wrote:

> > 
> > > Another idea: calculate the CRC backwards (from the end of the buffer to the beginning).
> > > That gives better L1 cache behaviour.
> > >
> > Why (may be some URL)? For all archs? 
> 
> For example, when writing a buffer to flash you first CRC it from top to bottom.
> The bottom part of the buffer is then newer than the top of the buffer in the
> L1 cache, or the top part has already been evicted from the L1 cache, depending
> on the size of the buffer and the size of the L1 cache.
> 
> Now you start again from the top of the buffer to actually write the buffer
> to flash and you probably need to read the data from RAM into L1 cache again.
> 
> Reading from flash will cause similar effects.
> 
> If you calculate the CRC backwards you will avoid thrashing the cache needlessly.
> 
>  Jocke
> 
Ok, thanks, good idea, I'll put it to the TODO file too.
 

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2004-12-16 13:20 Joakim Tjernlund
  2004-12-16 14:27 ` Artem B. Bityuckiy
@ 2004-12-17 11:33 ` David Vrabel
  2004-12-17 15:34   ` Joakim Tjernlund
  2004-12-18 16:14   ` Jörn Engel
  2004-12-21 14:38 ` Jörn Engel
  2 siblings, 2 replies; 196+ messages in thread
From: David Vrabel @ 2004-12-17 11:33 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Linux MTD mailing list

Joakim Tjernlund wrote:
> 
> 2) Consider another checksum algorithm. Crc32 is very expensive
>    and JFFS2 suffered severely in the early days. Now that crc32 is
>    very optimized that problem is less visible, but crc32 is still
>    expensive. Maybe an Adler32 checksum is good enough or a crc16?

Does anyone have links to the profiling data that showed this?

(Also, I wouldn't have thought crc16 on 32 bit archs would have any 
significant performance benefits.  But since I don't know either the 
crc16 or crc32 algorithms... *shrug*)

David Vrabel
-- 
David Vrabel, Design Engineer

Arcom, Clifton Road           Tel: +44 (0)1223 411200 ext. 3233
Cambridge CB1 7EA, UK         Web: http://www.arcom.com/

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: JFFS3 & performance
  2004-12-17 11:33 ` David Vrabel
@ 2004-12-17 15:34   ` Joakim Tjernlund
  2004-12-18 16:14   ` Jörn Engel
  1 sibling, 0 replies; 196+ messages in thread
From: Joakim Tjernlund @ 2004-12-17 15:34 UTC (permalink / raw)
  To: David Vrabel; +Cc: Linux MTD mailing list

> Joakim Tjernlund wrote:
> > 
> > 2) Consider another checksum algorithm. Crc32 is very expensive
> >    and JFFS2 suffered severely in the early days. Now that crc32 is
> >    very optimized that problem is less visible, but crc32 is still
> >    expensive. Maybe an Adler32 checksum is good enough or a crc16?
> 
> > Does anyone have links to the profiling data that showed this?

I had, a long time ago. That was the reason I optimized the in-kernel
crc32() function to its current state.

> 
> (Also, I wouldn't have thought crc16 on 32 bit archs would have any 
> significant performance benefits.  But since I don't know either the 
> crc16 or crc32 algorithms... *shrug*)

Well, benchmarking the crc32() and csum_partial() functions in the
kernel should give a clue.

 Jocke

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2004-12-17 11:20                               ` Jörn Engel
@ 2004-12-18 12:23                                 ` Artem B. Bityuckiy
  0 siblings, 0 replies; 196+ messages in thread
From: Artem B. Bityuckiy @ 2004-12-18 12:23 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Linux MTD mailing list, Thomas Gleixner, Joakim Tjernlund

On Fri, 17 Dec 2004, Jörn Engel wrote:

> On Fri, 17 December 2004 10:56:22 +0000, Artem B. Bityuckiy wrote:
> > 
> > > Another idea: calculate the CRC backwards (from the end of the buffer to the beginning).
> > > That gives better L1 cache behaviour.
> > >
> > Why (may be some URL)? For all archs? 
> 
> Sounds rather odd indeed.  Most CPUs I know about either have no
> hardware-prefetch, can only prefetch forward or can prefetch both
> forward and backward.  In either case, calculating the CRC backward
> should either be the same speed or slower than going forward.
> 
> But I like to learn something new.
Hmm, I'm curious and exploring that too. Take a glimpse at
http://gcc.gnu.org/projects/prefetch.html
There are some useful references there.

> 
> Jörn
> 
> -- 
> I've never met a human being who would want to read 17,000 pages of
> documentation, and if there was, I'd kill him to get him out of the
> gene pool.
> -- Joseph Costello
> 

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2004-12-17 11:19                                   ` Joakim Tjernlund
@ 2004-12-18 16:09                                     ` Jörn Engel
  2004-12-18 16:26                                       ` Joakim Tjernlund
  0 siblings, 1 reply; 196+ messages in thread
From: Jörn Engel @ 2004-12-18 16:09 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: jasmine, Thomas Gleixner, Linux MTD mailing list

On Fri, 17 December 2004 12:19:55 +0100, Joakim Tjernlund wrote:
> 
> I looked a little closer at csum_partial and I think JFFS3 can use it. You need
> csum_fold as well:
> seed = ~0;
> crc = csum_fold(csum_partial(buff, len, seed));
> 
> Don't know if it is good enough for JFFS3 but it is fast.

It sure is hard to read.  Is there any version in C around?

Jörn

-- 
It's not whether you win or lose, it's how you place the blame.
-- unknown

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2004-12-17 11:33 ` David Vrabel
  2004-12-17 15:34   ` Joakim Tjernlund
@ 2004-12-18 16:14   ` Jörn Engel
  2004-12-18 16:25     ` Joakim Tjernlund
  2004-12-18 17:10     ` Joakim Tjernlund
  1 sibling, 2 replies; 196+ messages in thread
From: Jörn Engel @ 2004-12-18 16:14 UTC (permalink / raw)
  To: David Vrabel; +Cc: Linux MTD mailing list, Joakim Tjernlund

On Fri, 17 December 2004 11:33:42 +0000, David Vrabel wrote:
> 
> (Also, I wouldn't have thought crc16 on 32 bit archs would have any 
> significant performance benefits.  But since I don't know either the 
> crc16 or crc32 algorithms... *shrug*)

See my earlier posts in this thread for the algorithms.  Without
special optimizations, you can process 16 bits at a time for crc16 and
0 (zero) bits at a time for crc32.

Jörn

-- 
People will accept your ideas much more readily if you tell them
that Benjamin Franklin said it first.
-- unknown

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: JFFS3 & performance
  2004-12-18 16:14   ` Jörn Engel
@ 2004-12-18 16:25     ` Joakim Tjernlund
  2004-12-18 16:39       ` Jörn Engel
  2004-12-18 17:10     ` Joakim Tjernlund
  1 sibling, 1 reply; 196+ messages in thread
From: Joakim Tjernlund @ 2004-12-18 16:25 UTC (permalink / raw)
  To: Jörn Engel, David Vrabel; +Cc: Linux MTD mailing list

> On Fri, 17 December 2004 11:33:42 +0000, David Vrabel wrote:
> >
> > (Also, I wouldn't have thought crc16 on 32 bit archs would have any
> > significant performance benefits.  But since I don't know either the
> > crc16 or crc32 algorithms... *shrug*)
>
> See my earlier posts in this thread for the algorithms.  Without
> special optimizations, you can process 16 bits at a time for crc16 and
> 0 (zero) bits at a time for crc32.
>
> Jörn

Hi Jörn

Care to run this in your test program? Use the
same table as tglx posted. This will only work on LE machines
for now.

 Jocke

/* One table-driven step: consume the low byte of (crc ^ x). */
#define DO_CRC(x) crc = crc16tab[ (crc ^ (x)) & 255 ] ^ (crc>>8)

unsigned short crc16 (unsigned char *buf, size_t len)
{
  const unsigned short  *b =(unsigned short *)buf;
  unsigned short crc = 0xffff;

  if(len >= 2){
    /* load data 16 bits wide, xor data 16 bits wide. */
    size_t save_len = len & 1;   /* 1 if an odd byte is left over */
    len = len >> 1;              /* number of whole 16-bit words */
    do {
      crc ^= *b++;
      DO_CRC(0);
      DO_CRC(0);
    } while (--len);
    len = save_len;
  }
  /* And the last byte, if any */
  if(len){
      const unsigned char *p = (const unsigned char *)b;
      DO_CRC(*p);
  }
  return crc;
}
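
A possible byte-order-independent variant of the same idea, as a sketch
only. Instead of loading through an unsigned short pointer, it assembles
each 16-bit word from two bytes in little-endian order, which also avoids
unaligned loads. The table is built at first use from the reflected
polynomial 0xA001, which is assumed to be the one behind the table tglx
posted. The names crc16tab_sketch, crc16_init_table, DO_CRC16 and
crc16_portable are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

static uint16_t crc16tab_sketch[256];

/* Build the 256-entry table for the reflected polynomial 0xA001. */
static void crc16_init_table(void)
{
	unsigned int n, bit;

	for (n = 0; n < 256; n++) {
		uint16_t c = (uint16_t)n;

		for (bit = 0; bit < 8; bit++)
			c = (c & 1) ? (c >> 1) ^ 0xA001 : c >> 1;
		crc16tab_sketch[n] = c;
	}
}

#define DO_CRC16(x) (crc = crc16tab_sketch[(crc ^ (x)) & 255] ^ (crc >> 8))

uint16_t crc16_portable(const unsigned char *buf, size_t len)
{
	uint16_t crc = 0xffff;

	/* Entry 1 is non-zero once the table has been built. */
	if (crc16tab_sketch[1] == 0)
		crc16_init_table();

	for (; len >= 2; len -= 2, buf += 2) {
		/* Assemble the word as little-endian on any host. */
		crc ^= (uint16_t)(buf[0] | (buf[1] << 8));
		DO_CRC16(0);
		DO_CRC16(0);
	}
	/* Trailing odd byte, if any. */
	if (len)
		DO_CRC16(*buf);
	return crc;
}
```

The explicit byte assembly may cost some of the speed gained from the
16-bit load, so this would have to be timed against the version above.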

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: JFFS3 & performance
  2004-12-18 16:09                                     ` Jörn Engel
@ 2004-12-18 16:26                                       ` Joakim Tjernlund
  2004-12-18 16:52                                         ` Jörn Engel
  0 siblings, 1 reply; 196+ messages in thread
From: Joakim Tjernlund @ 2004-12-18 16:26 UTC (permalink / raw)
  To: Jörn Engel; +Cc: jasmine, Thomas Gleixner, Linux MTD mailing list

> On Fri, 17 December 2004 12:19:55 +0100, Joakim Tjernlund wrote:
> >
> > I looked a little closer at csum_partial and I think JFFS3 can use it. You need
> > csum_fold as well:
> > seed = ~0;
> > crc = csum_fold(csum_partial(buff, len, seed));
> >
> > Don't know if it is good enough for JFFS3 but it is fast.
>
> It sure is hard to read.  Is there any version in C around?
>
> Jörn

Found this in RFC1071:

       {
           /* Compute Internet Checksum for "count" bytes
            *         beginning at location "addr".
            */
       register long sum = 0;

        while( count > 1 )  {
           /*  This is the inner loop */
               sum += * (unsigned short) addr++;
               count -= 2;
       }

           /*  Add left-over byte, if any */
       if( count > 0 )
               sum += * (unsigned char *) addr;

           /*  Fold 32-bit sum to 16 bits */
       while (sum>>16)
           sum = (sum & 0xffff) + (sum >> 16);

       checksum = ~sum;
   }
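
The fragment above does not compile as quoted (the cast is missing a `*`
and the variables are not declared). A self-contained sketch of the same
algorithm could look like this. It is not the kernel's csum_partial or
csum_fold; the name inet_checksum is hypothetical, and the choice to
assemble each 16-bit word in big-endian order and to fold early against
overflow is an assumption made here for portability.

```c
#include <stddef.h>
#include <stdint.h>

uint16_t inet_checksum(const unsigned char *addr, size_t count)
{
	uint32_t sum = 0;

	while (count > 1) {
		/* Combine two bytes into one big-endian 16-bit word. */
		sum += ((uint32_t)addr[0] << 8) | addr[1];
		/* Fold early so the 32-bit accumulator cannot overflow. */
		if (sum & 0x80000000u)
			sum = (sum & 0xffff) + (sum >> 16);
		addr += 2;
		count -= 2;
	}

	/* Add the left-over byte, if any, padded with a zero byte. */
	if (count > 0)
		sum += (uint32_t)addr[0] << 8;

	/* Fold the 32-bit sum to 16 bits (end-around carry). */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

The inner loop is one addition per 16-bit word with no table lookup,
which is why it is fast. It is also why the checksum is weak: swapping
two 16-bit words in the buffer leaves the result unchanged.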

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2004-12-18 16:25     ` Joakim Tjernlund
@ 2004-12-18 16:39       ` Jörn Engel
  0 siblings, 0 replies; 196+ messages in thread
From: Jörn Engel @ 2004-12-18 16:39 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Linux MTD mailing list

On Sat, 18 December 2004 17:25:00 +0100, Joakim Tjernlund wrote:
> 
> Care to run this in your test program? Use the
> same table as tglx posted. This will only work on LE machines
> for now.

real    0m0.355s
user    0m0.285s
sys     0m0.066s

Jörn

-- 
Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it.
-- Brian W. Kernighan

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2004-12-18 16:26                                       ` Joakim Tjernlund
@ 2004-12-18 16:52                                         ` Jörn Engel
  0 siblings, 0 replies; 196+ messages in thread
From: Jörn Engel @ 2004-12-18 16:52 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: jasmine, Thomas Gleixner, Linux MTD mailing list

On Sat, 18 December 2004 17:26:59 +0100, Joakim Tjernlund wrote:
> > On Fri, 17 December 2004 12:19:55 +0100, Joakim Tjernlund wrote:
> >
> > It sure is hard to read.  Is there any version in C around?
> 
> Found this in RFC1071:

Thanks!

>        {
>            /* Compute Internet Checksum for "count" bytes
>             *         beginning at location "addr".
>             */
>        register long sum = 0;
> 
>         while( count > 1 )  {
>            /*  This is the inner loop */
>                sum += * (unsigned short) addr++;
>                count -= 2;
>        }
> 
>            /*  Add left-over byte, if any */
>        if( count > 0 )
>                sum += * (unsigned char *) addr;
> 
>            /*  Fold 32-bit sum to 16 bits */
>        while (sum>>16)
>            sum = (sum & 0xffff) + (sum >> 16);
> 
>        checksum = ~sum;
>    }

Even weaker than the weak checksums from rsync.  Hmm...

The algorithm should be faster than adler32, but also less secure.
Tough question now is how secure does the algorithm have to be?

o How often would we expect data corruption on the flash?
o How is it handled?  Mark the erase block as bad and continue?
o How many data corruptions occur before flash is declared broken?

In the end, if data gets replaced with completely random garbage, is
it good enough to miss one such occurrence out of 10?  Out of 1000?
1000000?

Jörn

-- 
Courage is not the absence of fear, but rather the judgement that
something else is more important than fear.
-- Ambrose Redmoon

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: JFFS3 & performance
  2004-12-18 16:14   ` Jörn Engel
  2004-12-18 16:25     ` Joakim Tjernlund
@ 2004-12-18 17:10     ` Joakim Tjernlund
  2004-12-18 17:19       ` Jörn Engel
  1 sibling, 1 reply; 196+ messages in thread
From: Joakim Tjernlund @ 2004-12-18 17:10 UTC (permalink / raw)
  To: Jörn Engel, David Vrabel; +Cc: Linux MTD mailing list

> On Fri, 17 December 2004 11:33:42 +0000, David Vrabel wrote:
> > 
> > (Also, I wouldn't have thought crc16 on 32 bit archs would have any 
> > significant performance benefits.  But since I don't know either the 
> > crc16 or crc32 algorithms... *shrug*)
> 
> See my earlier posts in this thread for the algorithms.  Without
> special optimizations, you can process 16 bits at a time for crc16 and
> 0 (zero) bits at a time for crc32.

hmm, what about the kernel version of crc32. I think that one should
be classified as 32 bits at a time. Is that the version you use in your tests?

   Jocke

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2004-12-18 17:10     ` Joakim Tjernlund
@ 2004-12-18 17:19       ` Jörn Engel
  2004-12-18 17:51         ` Joakim Tjernlund
  0 siblings, 1 reply; 196+ messages in thread
From: Jörn Engel @ 2004-12-18 17:19 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Linux MTD mailing list

On Sat, 18 December 2004 18:10:42 +0100, Joakim Tjernlund wrote:
> 
> hmm, what about the kernel version of crc32. I think that one should
> be classified as 32 bits at a time. Is that the version you use in your tests?

I use the zlib version on my machine, zlib 1.2.2.  It should have
about the same performance.  At least that's what I thought when I
last looked at the code.

Jörn

-- 
He that composes himself is wiser than he that composes a book.
-- B. Franklin

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: JFFS3 & performance
  2004-12-18 17:19       ` Jörn Engel
@ 2004-12-18 17:51         ` Joakim Tjernlund
  2004-12-18 17:59           ` Jörn Engel
  2004-12-18 18:09           ` Joakim Tjernlund
  0 siblings, 2 replies; 196+ messages in thread
From: Joakim Tjernlund @ 2004-12-18 17:51 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Linux MTD mailing list

> On Sat, 18 December 2004 18:10:42 +0100, Joakim Tjernlund wrote:
> > 
> > hmm, what about the kernel version of crc32. I think that one should
> > be classified as 32 bits at a time. Is that the version you use in your tests?
> 
> I use the zlib version on my machine, zlib 1.2.2.  It should have
> about the same performance.  At least that's what I thought when I
> last looked at the code.

I will have a look.

BTW, have you tested your crc16 for correctness?
It doesn't give the same result as my version or Thomas's.

uint32_t crc16(uint32_t crc, const void *_s, size_t len)
{
	const uint16_t *s = _s;
	uint32_t ret = crc;
	for (; len>1; len-=2,s++) {
		ret <<= 16;
		ret += *s;
		ret %= 65521;
	}
	return ret;
}

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2004-12-18 17:51         ` Joakim Tjernlund
@ 2004-12-18 17:59           ` Jörn Engel
  2004-12-18 18:13             ` Joakim Tjernlund
  2004-12-18 18:09           ` Joakim Tjernlund
  1 sibling, 1 reply; 196+ messages in thread
From: Jörn Engel @ 2004-12-18 17:59 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Linux MTD mailing list

On Sat, 18 December 2004 18:51:08 +0100, Joakim Tjernlund wrote:
> 
> BTW, have you tested your crc16 for correctness?
> It doesn't give the same result as mine or Thomas versions. 

Nope.  I completely ignored the remaining byte if len is odd and
didn't bother much about the usual corner cases.  Xor with ~0, off by
one, all that.  It's not performance relevant, so that can be done
later.

> uint32_t crc16(uint32_t crc, const void *_s, size_t len)
> {
> 	const uint16_t *s = _s;
> 	uint32_t ret = crc;
> 	for (; len>1; len-=2,s++) {
> 		ret <<= 16;
> 		ret += *s;
> 		ret %= 65521;
> 	}
> 	return ret;
> }

Jörn

-- 
Mundie uses a textbook tactic of manipulation: start with some
reasonable talk, and lead the audience to an unreasonable conclusion.
-- Bruce Perens

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: JFFS3 & performance
  2004-12-18 17:51         ` Joakim Tjernlund
  2004-12-18 17:59           ` Jörn Engel
@ 2004-12-18 18:09           ` Joakim Tjernlund
  1 sibling, 0 replies; 196+ messages in thread
From: Joakim Tjernlund @ 2004-12-18 18:09 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Linux MTD mailing list

> > On Sat, 18 December 2004 18:10:42 +0100, Joakim Tjernlund wrote:
> > > 
> > > hmm, what about the kernel version of crc32. I think that one should
> > > be classified as 32 bits at a time. Is that the version you use in your tests?
> > 
> > I use the zlib version on my machine, zlib 1.2.2.  It should have
> > about the same performance.  At least that's what I thought when I
> > last looked at the code.
> 
> I will have a look.

I only found zlib 1.2.1, which uses an algorithm similar to the kernel's, but the
tables are gigantic. I think the kernel version is a better match for JFFS.

 Jocke

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: JFFS3 & performance
  2004-12-18 17:59           ` Jörn Engel
@ 2004-12-18 18:13             ` Joakim Tjernlund
  2004-12-19  3:05               ` Jörn Engel
  0 siblings, 1 reply; 196+ messages in thread
From: Joakim Tjernlund @ 2004-12-18 18:13 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Linux MTD mailing list

> On Sat, 18 December 2004 18:51:08 +0100, Joakim Tjernlund wrote:
> >
> > BTW, have you tested your crc16 for correctness?
> > It doesn't give the same result as mine or Thomas versions.
>
> Nope.  I completely ignored the remainding byte if len is uneven and
> didn't bother much about the usual corner-cases.  Xor with ~0, off by
> one, all that.  It's not performance relevant, so that can be done
> later.

Yes, I noticed that but crc16(~0, "abcd", 4) does
not yield the same result as mine.

>
> > uint32_t crc16(uint32_t crc, const void *_s, size_t len)
> > {
> > 	const uint16_t *s = _s;
> > 	uint32_t ret = crc;
> > 	for (; len>1; len-=2,s++) {
> > 		ret <<= 16;
> > 		ret += *s;
> > 		ret %= 65521;
> > 	}
> > 	return ret;
> > }
>
> Jörn
>
> --
> Mundie uses a textbook tactic of manipulation: start with some
> reasonable talk, and lead the audience to an unreasonable conclusion.
> -- Bruce Perens

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2004-12-18 18:13             ` Joakim Tjernlund
@ 2004-12-19  3:05               ` Jörn Engel
  0 siblings, 0 replies; 196+ messages in thread
From: Jörn Engel @ 2004-12-19  3:05 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Linux MTD mailing list

On Sat, 18 December 2004 19:13:35 +0100, Joakim Tjernlund wrote:
> >
> > Nope.  I completely ignored the remainding byte if len is uneven and
> > didn't bother much about the usual corner-cases.  Xor with ~0, off by
> > one, all that.  It's not performance relevant, so that can be done
> > later.
> 
> Yes, I noticed that but crc16(~0, "abcd", 4) does
> not yield the same result as mine.

#include <stdio.h>

int main(void)
{
	/* Reads the four bytes of "abcd" as one host-endian int. */
	int *i = (void*)"abcd";
	printf("%d\n", *i % 65521);
	return 0;
}

17544

This result differs from both.  Maybe I can understand the difference
after getting some sleep.
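
One possible reading of the 17544 above, offered as a sketch and not as anything stated in the thread: on a little-endian host such as i386, reading "abcd" through an int pointer gives 0x64636261, while the crc16() loop consumes the 16-bit word "ab" first and "cd" second, and additionally mixes in the seed. The helper names below (le16, mod65521_words, mod65521_int) are hypothetical, and the loads are written out byte by byte so the result does not depend on the machine running it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read two bytes as a little-endian 16-bit word, independent of the host. */
static uint32_t le16(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8);
}

/* Same loop as the crc16() posted earlier, with explicit little-endian loads. */
uint32_t mod65521_words(uint32_t seed, const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint32_t ret = seed;

	assert(buf != NULL);
	for (; len > 1; len -= 2, p += 2) {
		ret <<= 16;		/* previous remainder moves to the high half */
		ret += le16(p);		/* next word fills the low half */
		ret %= 65521;
	}
	return ret;
}

/* What the small test program computes on a little-endian machine. */
uint32_t mod65521_int(const void *buf)
{
	const unsigned char *p = buf;
	uint32_t v = le16(p) | (le16(p + 2) << 16);

	return v % 65521;
}
```

With these definitions, mod65521_int("abcd") is 17544, mod65521_words(0, "abcd", 4) is 10348 and mod65521_words(~0u, "abcd", 4) is 13498, so both the word order and the seed change the result.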

Jörn

-- 
They laughed at Galileo.  They laughed at Copernicus.  They laughed at
Columbus. But remember, they also laughed at Bozo the Clown.
-- unknown

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2004-12-16 13:20 Joakim Tjernlund
  2004-12-16 14:27 ` Artem B. Bityuckiy
  2004-12-17 11:33 ` David Vrabel
@ 2004-12-21 14:38 ` Jörn Engel
  2 siblings, 0 replies; 196+ messages in thread
From: Jörn Engel @ 2004-12-21 14:38 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Linux MTD mailing list

On Thu, 16 December 2004 14:20:43 +0100, Joakim Tjernlund wrote:
> 
> 2) Consider another checksum algorithm. Crc32 is very expensive
>    and JFFS2 suffered severely in the early days. Now that crc32 is
>    very optimized that problem is less visible, but crc32 is still
>    expensive. Maybe an Adler32 checksum is good enough or a crc16?

Ok, after further thought, I actually like adler32.  It uses two
checksums that individually are very weak.  But the combination of the
two proves to be surprisingly strong.  Running my hash-table test,
adler32 is on par with the best hash function I could come up with
(and faster than any hash function in the kernel).

The hash-table test is quite nice, as it penalizes weak hash
functions.  Hash functions that run fast, but cause many collisions,
cause more time to be spent in search for a free slot.  So only
functions that are fast _and_ strong do well.  And adler32 does
extremely well.

So, unless you want to change the function to run backwards, I'd
propose to use adler32.
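
The hash-table test itself was not posted, so the following is only a guess at its general shape: keys are inserted into an open-addressing table with linear probing, and every occupied slot that has to be skipped counts against the hash function. All names (SLOTS, hash_const, hash_lowbyte, extra_probes) are hypothetical, and the two hash functions are toy stand-ins chosen to show the two extremes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLOTS 64	/* hypothetical table size */

typedef uint32_t (*hash_fn)(const void *buf, size_t len);

/* Deliberately weak: every key lands in the same slot. */
uint32_t hash_const(const void *buf, size_t len)
{
	(void)buf;
	(void)len;
	return 0;
}

/* Toy hash that happens to spread small consecutive keys perfectly. */
uint32_t hash_lowbyte(const void *buf, size_t len)
{
	const unsigned char *p = buf;

	return len ? p[0] : 0;
}

/*
 * Insert the keys 0..nkeys-1 with linear probing and return the number
 * of extra probes, i.e. occupied slots that had to be skipped.
 */
size_t extra_probes(hash_fn fn, uint32_t nkeys)
{
	unsigned char used[SLOTS];
	size_t probes = 0;
	uint32_t key;

	assert(nkeys <= SLOTS);
	memset(used, 0, sizeof(used));
	for (key = 0; key < nkeys; key++) {
		unsigned char bytes[4];
		uint32_t slot;

		bytes[0] = key & 0xff;
		bytes[1] = (key >> 8) & 0xff;
		bytes[2] = (key >> 16) & 0xff;
		bytes[3] = (key >> 24) & 0xff;

		slot = fn(bytes, sizeof(bytes)) % SLOTS;
		while (used[slot]) {
			slot = (slot + 1) % SLOTS;
			probes++;
		}
		used[slot] = 1;
	}
	return probes;
}
```

With four keys the constant hash costs 0 + 1 + 2 + 3 = 6 extra probes while the spreading hash costs none; timing the whole insert loop would then charge a function both for its own run time and for the collisions it causes.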

Jörn

-- 
/* Keep these two variables together */
int bar;

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2004-12-16 17:53     ` Jörn Engel
  2004-12-16 18:42       ` Artem B. Bityuckiy
@ 2004-12-21 14:45       ` Artem B. Bityuckiy
  2004-12-21 16:03         ` Jörn Engel
  1 sibling, 1 reply; 196+ messages in thread
From: Artem B. Bityuckiy @ 2004-12-21 14:45 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Linux MTD mailing list, Joakim Tjernlund

Hi, 

> Principle of crc:
> A crc is nothing but the remainder of an integer division.  That
> simple.
>
>
> Example crc4:
> crc4(data) = data%13;
Hmm, is it really correct? A CRC is a remainder, but in mod 2 arithmetic. I 
consulted http://www.cs.waikato.ac.nz/~312/crc.txt

And if so, your algorithm
u32 crc24(void *m, size_t len)
{
        u32 ret=0;
        char *s=m;
        size_t i;
        for (i=0; i<len; i++) {
                ret <<= 8;
                ret += s[i];
                ret %= 0xfffffd;
        }
        return ret;
}

doesn't look like something which is called CRC...

Do I misunderstand something?
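
To make the mod 2 point concrete, here is a minimal bit-at-a-time sketch of a 16-bit CRC. It assumes the polynomial x^16 + x^12 + x^5 + 1 (0x1021) and most-significant-bit-first processing; the name crc16_bitwise is made up for this example, and the bit order is not claimed to match the kernel's crc_ccitt(). The "subtraction" in the division is an XOR, so there are no carries, which is what separates it from the integer % used above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC: long division of the message by the generator polynomial,
 * with XOR (addition/subtraction without carry) in place of integer
 * subtraction.
 */
uint16_t crc16_bitwise(uint16_t crc, const void *buf, size_t len)
{
	const unsigned char *p = buf;
	int bit;

	assert(buf != NULL || len == 0);
	while (len--) {
		/* Bring the next message byte into the top of the register. */
		crc ^= (uint16_t)(*p++ << 8);
		for (bit = 0; bit < 8; bit++) {
			if (crc & 0x8000)
				/* Top bit set: "subtract" the polynomial. */
				crc = (uint16_t)((crc << 1) ^ 0x1021);
			else
				crc = (uint16_t)(crc << 1);
		}
	}
	return crc;
}
```

For the conventional check string "123456789" this gives 0x29B1 with a start value of 0xFFFF and 0x31C3 with a start value of 0.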

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2004-12-21 14:45       ` Artem B. Bityuckiy
@ 2004-12-21 16:03         ` Jörn Engel
  0 siblings, 0 replies; 196+ messages in thread
From: Jörn Engel @ 2004-12-21 16:03 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: Linux MTD mailing list, Joakim Tjernlund

On Tue, 21 December 2004 14:45:38 +0000, Artem B. Bityuckiy wrote:
>
> Hmm, is it really correct? CRC is ramainder, but in mod 2 arithmetic. I 
> consulted http://www.cs.waikato.ac.nz/~312/crc.txt
> 
> And if so, your algorithm
> [...]
> doesn't look like something which is called CRC...
> 
> Do I misunderstand something?

Nope, but you cleared my misunderstanding.  A very nice article.
Thanks!

PS: Still, adler32 or a variant is my current favorite.

Jörn

-- 
Premature optimization is the root of all evil.
-- Donald Knuth

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: JFFS3 & performance
  2004-12-17 11:20                                 ` Artem B. Bityuckiy
@ 2004-12-22 13:36                                   ` Artem B. Bityuckiy
  2004-12-22 14:03                                     ` Jörn Engel
  0 siblings, 1 reply; 196+ messages in thread
From: Artem B. Bityuckiy @ 2004-12-22 13:36 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Thomas Gleixner, Linux MTD mailing list

Hello,

We have started an interesting discussion about CRCs.
Several issues came up:
- on 32-bit architectures crc16 may not be particularly fast to calculate;
- the proposal to calculate the CRC starting from the end of the data;
- counter-arguments about forward/backward prefetching.

I decided to write a test to measure the speed of the CRCs experimentally.

Short description:
1. The test runs in kernel space for the sake of precision. We don't want 
to be affected by preemption, interrupts, etc.
2. The test is implemented as a kernel module and does the actual testing 
in the module init function. Of course it can also be linked into the 
kernel. In the latter case just see the kernel boot log.
3. Preemption and interrupts are disabled during the test.
4. The test uses the high-resolution timer (CPU cycle counter, etc.). As 
one can see, it is defined for many platforms (asm/timex.h).
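
For readers who want to try the same measurement pattern outside the kernel, here is a rough userspace analogue. It is only a sketch: clock() stands in for get_cycles() and is far coarser, nothing prevents preemption or interrupts, and bench(), struct bench_result and bytesum() are hypothetical names, with bytesum() as a placeholder for the checksum under test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef uint32_t (*csum_fn)(uint32_t seed, const void *buf, size_t len);

/* Trivial placeholder checksum so the sketch is self-contained. */
uint32_t bytesum(uint32_t seed, const void *buf, size_t len)
{
	const unsigned char *p = buf;

	while (len--)
		seed += *p++;
	return seed;
}

struct bench_result {
	uint32_t csum;	/* result of the last run, keeps the calls "used" */
	clock_t ticks;	/* elapsed clock() ticks for all timed iterations */
};

struct bench_result bench(csum_fn fn, const void *buf, size_t len,
			  unsigned int iterations)
{
	struct bench_result r;
	clock_t t1, t2;
	unsigned int j;

	assert(fn != NULL && buf != NULL);

	/* One untimed pass first, so every timed pass sees a warm cache. */
	r.csum = fn(0, buf, len);

	t1 = clock();
	for (j = 0; j < iterations; j++)
		r.csum = fn(0, buf, len);
	t2 = clock();

	r.ticks = t2 - t1;
	return r;
}
```

A caller would run bench() once per buffer size and per checksum function and compare the ticks; with short buffers the iteration count has to be large before clock() shows anything at all.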

For now I have added only the CRCs which are in linux/lib. They compute 
the CRC in the forward direction, of course.

I invite people to review the test and add more CRCs. Then we can try it 
on different platforms (I hope people here have different boards running 
Linux).

Note 1: crc32c and crc-ccitt must be explicitly enabled in the Linux 
configuration (or the corresponding modules loaded).

Note 2: I have inserted the file content inline in order to pass 
infradead.org's filtering.

Note 3: So far it has only been run on x86 with a 2.6.8 kernel. I hope it 
will work on other architectures/kernels.

-------------------------------------------------------------
/*
 *      This program is free software; you can redistribute it and/or modify
 *      it under the terms of the GNU General Public License as published by
 *      the Free Software Foundation; either version 2 of the License, or
 *      (at your option) any later version.
 *
 *      This program is distributed in the hope that it will be useful,
 *      but WITHOUT ANY WARRANTY; without even the implied warranty of
 *      MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *      GNU General Public License for more details.
 *
 *      You should have received a copy of the GNU General Public License
 *      along with this program; if not, write to the Free Software
 *      Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
 *
 *      Author: Artem B. Bityuckiy, dedekind@infradead.org
 */
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/pagemap.h>
#include <linux/spinlock.h>
#include <linux/crc32.h>
#include <linux/crc32c.h>
#include <linux/crc-ccitt.h>
#include <asm/timex.h>

/* The number of memory chunks to test */
#define MEM_CHUNKS 3

/* The test iterations number */
#define ITERATIONS 1

#define TEST_PREFIX "[crctst] "

/* 
 * Most architectures have some kind of high resolution time-stamp
 * counter and define the get_cycles() macro to access it.
 */
#define TIMESTAMP get_cycles()

/* 
 * The sizes of memory chunks for which CRCs should be tested.
 */
static int
memsizes[MEM_CHUNKS] = {32, PAGE_SIZE, 64*1024};

/* 
 * We perform actual testing in the module initialization function.
 */
static int __init
init_crctest(void)
{
	register int i, j;
	char *mem[MEM_CHUNKS];
	unsigned long flags;
	spinlock_t lock = SPIN_LOCK_UNLOCKED;
	int ret = 0;
	cycles_t ts1, ts2;

	/* Clear the pointers so the cleanup loop stops at the first unallocated chunk */
	memset(mem, 0, sizeof(mem));

	/* Allocate memory */
	for (i = 0; i < MEM_CHUNKS; i++) {
		if ((mem[i] = kmalloc(memsizes[i], GFP_KERNEL)) == NULL) {
			printk(KERN_ERR TEST_PREFIX "can't allocate %d bytes\n",
					memsizes[i]);
			ret = -ENOMEM;
			goto exit;
		}
	}
	
	/* 
	 * We do not want to be preempted during the test as well as do
	 * not want interrupts affect our results. Both of these are
	 * prevented by spin_lock_irqsave().
	 */
	spin_lock_irqsave(&lock, flags);
	
	/* Test 16 bit CRC CCITT */
	for (i = 0; i < MEM_CHUNKS; i++) {
		u16 crc;
		
		/* Do one fake pass to exclude CPU cache influence */
		crc = crc_ccitt(0xFFFF, mem[i], memsizes[i]);

		ts1 = TIMESTAMP;
		for (j = 0; j < ITERATIONS; j++) {
			crc = crc_ccitt(0xFFFF, mem[i], memsizes[i]); 
		}
		ts2 = TIMESTAMP;

		printk(KERN_NOTICE TEST_PREFIX "16-bit CRC CCITT %d bytes: "
			"ts1 %llu, ts2 %llu, delta %llu\n", memsizes[i],
			(unsigned long long)ts1, (unsigned long long)ts2,
			(unsigned long long)(ts2 - ts1));
	}

	/* Test crc32 */
	for (i = 0; i < MEM_CHUNKS; i++) {
		u32 crc;
		
		crc = crc32(0xFFFF, mem[i], memsizes[i]); 
		ts1 = TIMESTAMP;
		for (j = 0; j < ITERATIONS; j++) {
			crc = crc32(0xFFFF, mem[i], memsizes[i]); 
		}
		ts2 = TIMESTAMP;

		printk(KERN_NOTICE TEST_PREFIX "crc32 %d bytes: "
			"ts1 %llu, ts2 %llu, delta %llu\n", memsizes[i],
			(unsigned long long)ts1, (unsigned long long)ts2,
			(unsigned long long)(ts2 - ts1));
	}
	
	/* Test crc32c */
	for (i = 0; i < MEM_CHUNKS; i++) {
		u32 crc;
		
		crc = crc32c(0xFFFF, mem[i], memsizes[i]); 
		ts1 = TIMESTAMP;
		for (j = 0; j < ITERATIONS; j++) {
			crc = crc32c(0xFFFF, mem[i], memsizes[i]); 
		}
		ts2 = TIMESTAMP;

		printk(KERN_NOTICE TEST_PREFIX "crc32c %d bytes: "
			"ts1 %llu, ts2 %llu, delta %llu\n", memsizes[i],
			(unsigned long long)ts1, (unsigned long long)ts2,
			(unsigned long long)(ts2 - ts1));
	}
	spin_unlock_irqrestore(&lock, flags);
	
exit:
	for (i = 0; i < MEM_CHUNKS && mem[i] != NULL; i++)
		kfree(mem[i]);

	return ret;
}

module_init(init_crctest);

static void __exit
cleanup_crctest(void)
{
	return;
}

module_exit(cleanup_crctest);

MODULE_LICENSE ("GPL");
MODULE_AUTHOR ("Artem B. Bityuckiy");
MODULE_DESCRIPTION ("The CRC test");

-------------------------------------------------------------

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2004-12-22 13:36                                   ` Artem B. Bityuckiy
@ 2004-12-22 14:03                                     ` Jörn Engel
  2004-12-22 14:44                                       ` Artem B. Bityuckiy
  0 siblings, 1 reply; 196+ messages in thread
From: Jörn Engel @ 2004-12-22 14:03 UTC (permalink / raw)
  To: Artem B. Bityuckiy
  Cc: Linux MTD mailing list, Thomas Gleixner, Joakim Tjernlund

On Wed, 22 December 2004 13:36:47 +0000, Artem B. Bityuckiy wrote:
> 
> I decided to write a test to experimentally check the CRCs speed.

Looks good.  Thanks.

> For now I have added only CRCs which are in linux/lib. They do forward CRC 
> check of course.

How about adding adler32 and possibly this algorithm?  Engel32
(for lack of a better name) is basically adler32, but instead of a
modulo operation, it mirrors one of the two checksums (last bit
becomes first, etc.) and XORs both.

Doesn't have any measurable effect on i386.  Leaving out any of the
five mirror rounds makes the algorithm slightly worse than adler32 in
my hash-table test.  Maybe on some cpu either the modulo or the
mirroring is much slower.

uint32_t engel32(uint32_t engel, const void *_s, size_t len)
{
	const char *s = _s;
	uint32_t sum=engel, prod=engel;
	for (; len>=4; len-=4, s+=4) {
		sum += s[0];
		prod += sum;
		sum += s[1];
		prod += sum;
		sum += s[2];
		prod += sum;
		sum += s[3];
		prod += sum;
	}
	for (; len; len--, s++) {
		sum += *s;
		prod += sum;
	}
	sum = (sum&0x0000ffff)<<16^ (sum&0xffff0000)>>16;
	sum = (sum&0x00ff00ff)<<8 ^ (sum&0xff00ff00)>>8;
	sum = (sum&0x0f0f0f0f)<<4 ^ (sum&0xf0f0f0f0)>>4;
	sum = (sum&0x33333333)<<2 ^ (sum&0xcccccccc)>>2;
	sum = (sum&0x55555555)<<1 ^ (sum&0xaaaaaaaa)>>1;
	prod ^= sum;
	return prod;
}
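
As a small illustration of the mirroring step: the five mask-and-shift rounds above amount to a full 32-bit bit reversal (swap halves, then bytes, nibbles, bit pairs and single bits). The sketch below isolates that step as mirror32() and pairs it with a simplified byte-at-a-time variant, engel32_sketch(). Both names are hypothetical, and the variant reads the input as unsigned char, which is an assumption that differs from the posted code for bytes >= 0x80 on platforms where char is signed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the bit order of a 32-bit word: bit 0 becomes bit 31, etc. */
uint32_t mirror32(uint32_t v)
{
	v = (v & 0x0000ffff) << 16 | (v & 0xffff0000) >> 16;	/* halves */
	v = (v & 0x00ff00ff) << 8  | (v & 0xff00ff00) >> 8;	/* bytes */
	v = (v & 0x0f0f0f0f) << 4  | (v & 0xf0f0f0f0) >> 4;	/* nibbles */
	v = (v & 0x33333333) << 2  | (v & 0xcccccccc) >> 2;	/* bit pairs */
	v = (v & 0x55555555) << 1  | (v & 0xaaaaaaaa) >> 1;	/* single bits */
	return v;
}

/* Running sum and sum-of-sums, combined as prod XOR mirrored sum. */
uint32_t engel32_sketch(uint32_t seed, const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint32_t sum = seed, prod = seed;

	assert(buf != NULL || len == 0);
	while (len--) {
		sum += *p++;
		prod += sum;
	}
	return prod ^ mirror32(sum);
}
```

The mirroring puts the busy low bits of sum next to the quiet high bits of prod before the XOR, for example mirror32(0x12345678) is 0x1E6A2C48, and applying it twice returns the original value.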

Jörn

-- 
Victory in war is not repetitious.
-- Sun Tzu

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2004-12-22 14:03                                     ` Jörn Engel
@ 2004-12-22 14:44                                       ` Artem B. Bityuckiy
  2004-12-22 15:14                                         ` Jörn Engel
  0 siblings, 1 reply; 196+ messages in thread
From: Artem B. Bityuckiy @ 2004-12-22 14:44 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Linux MTD mailing list, Thomas Gleixner, Joakim Tjernlund

Hello,

> How about adding adler32 and possibly this algorithm?  Engel32
> (missing a better name) is basically adler32, but instead of a
> modula-operation, it mirrors one of the two checksums (last bit
> becomes first, etc.) and XORs both.
Adler32 was added. Engel32 too.

Did you investigate these algorithms theoretically? Why is engel32 weaker?

Would somebody like to add backward CRCs to the test?

I ran the test on i686 (just to see that it compiles and works). The results are:

[crctst] 16-bit CRC CCITT 32 bytes: ts1 14352216056936, ts2 14352216057276, delta 340
[crctst] 16-bit CRC CCITT 4096 bytes: ts1 14352237257556, ts2 14352237286332, delta 28776
[crctst] 16-bit CRC CCITT 65536 bytes: ts1 14352259870036, ts2 14352260333528, delta 463492
[crctst] crc32 32 bytes: ts1 14352282934664, ts2 14352282934980, delta 316
[crctst] crc32 4096 bytes: ts1 14352301401184, ts2 14352301429996, delta 28812
[crctst] crc32 65536 bytes: ts1 14352321296456, ts2 14352321726112, delta 429656
[crctst] crc32c 32 bytes: ts1 14352341677716, ts2 14352341678048, delta 332
[crctst] crc32c 4096 bytes: ts1 14352360411764, ts2 14352360436508, delta 24744
[crctst] crc32c 65536 bytes: ts1 14352380586256, ts2 14352381043780, delta 457524
[crctst] adler32 32 bytes: ts1 14352401248340, ts2 14352401248540, delta 200
[crctst] adler32 4096 bytes: ts1 14352420251940, ts2 14352420256744, delta 4804
[crctst] adler32 65536 bytes: ts1 14352439968500, ts2 14352440043460, delta 74960
[crctst] engel32 32 bytes: ts1 14352460224760, ts2 14352460224960, delta 200
[crctst] engel32 4096 bytes: ts1 14352479232376, ts2 14352479240700, delta 8324
[crctst] engel32 65536 bytes: ts1 14352499089344, ts2 14352499228820, delta 139476

New test version:
---------------------------------------------------------
/*
 *      This program is free software; you can redistribute it and/or modify
 *      it under the terms of the GNU General Public License as published by
 *      the Free Software Foundation; either version 2 of the License, or
 *      (at your option) any later version.
 *
 *      This program is distributed in the hope that it will be useful,
 *      but WITHOUT ANY WARRANTY; without even the implied warranty of
 *      MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *      GNU General Public License for more details.
 *
 *      You should have received a copy of the GNU General Public License
 *      along with this program; if not, write to the Free Software
 *      Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
 *
 *      Author: Artem B. Bityuckiy, dedekind@infradead.org
 *      Version: 1.1
 */
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/pagemap.h>
#include <linux/spinlock.h>
#include <linux/crc32.h>
#include <linux/crc32c.h>
#include <linux/crc-ccitt.h>
#include <asm/timex.h>

/* The number of memory chunks to test */
#define MEM_CHUNKS 3

/* The test iterations number */
#define ITERATIONS 1

#define TEST_PREFIX "[crctst] "

/* 
 * Most architectures have some kind of high resolution time-stamp
 * counter and define the get_cycles() macro to access it.
 */
#define TIMESTAMP get_cycles()

static unsigned long
adler32(unsigned long adler, const unsigned char *buf, size_t len);

static uint32_t
engel32(uint32_t engel, const void *_s, size_t len);

/* 
 * The sizes of memory chunks for which CRCs should be tested.
 */
static int
memsizes[MEM_CHUNKS] = {32, PAGE_SIZE, 64*1024};

/* 
 * We perform actual testing in the module initialization function.
 */
static int __init
init_crctest(void)
{
	register int i, j;
	char *mem[MEM_CHUNKS];
	unsigned long flags;
	spinlock_t lock = SPIN_LOCK_UNLOCKED;
	int ret = 0;
	cycles_t ts1, ts2;

	/* Clear the pointers so the cleanup loop stops at the first unallocated chunk */
	memset(mem, 0, sizeof(mem));

	/* Allocate memory */
	for (i = 0; i < MEM_CHUNKS; i++) {
		if ((mem[i] = kmalloc(memsizes[i], GFP_KERNEL)) == NULL) {
			printk(KERN_ERR TEST_PREFIX "can't allocate %d bytes\n",
					memsizes[i]);
			ret = -ENOMEM;
			goto exit;
		}
	}
	
	/* 
	 * We do not want to be preempted during the test as well as do
	 * not want interrupts affect our results. Both of these are
	 * prevented by spin_lock_irqsave().
	 */
	spin_lock_irqsave(&lock, flags);
	
	/* Test 16 bit CRC CCITT */
	for (i = 0; i < MEM_CHUNKS; i++) {
		u16 crc;
		
		/* Do one fake pass to exclude CPU cache influence */
		crc = crc_ccitt(0xFFFF, mem[i], memsizes[i]);

		ts1 = TIMESTAMP;
		for (j = 0; j < ITERATIONS; j++)
			crc = crc_ccitt(0xFFFF, mem[i], memsizes[i]); 
		ts2 = TIMESTAMP;

		printk(KERN_NOTICE TEST_PREFIX "16-bit CRC CCITT %d bytes: "
			"ts1 %llu, ts2 %llu, delta %llu\n", memsizes[i],
			(unsigned long long)ts1, (unsigned long long)ts2,
			(unsigned long long)(ts2 - ts1));
	}

	/* Test crc32 */
	for (i = 0; i < MEM_CHUNKS; i++) {
		u32 crc;
		
		crc = crc32(0xFFFFFFFF, mem[i], memsizes[i]); 
		ts1 = TIMESTAMP;
		for (j = 0; j < ITERATIONS; j++)
			crc = crc32(0xFFFFFFFF, mem[i], memsizes[i]); 
		ts2 = TIMESTAMP;

		printk(KERN_NOTICE TEST_PREFIX "crc32 %d bytes: "
			"ts1 %llu, ts2 %llu, delta %llu\n", memsizes[i],
			(unsigned long long)ts1, (unsigned long long)ts2,
			(unsigned long long)(ts2 - ts1));
	}
	
	/* Test crc32c */
	for (i = 0; i < MEM_CHUNKS; i++) {
		u32 crc;
		
		crc = crc32c(0xFFFFFFFF, mem[i], memsizes[i]); 
		ts1 = TIMESTAMP;
		for (j = 0; j < ITERATIONS; j++)
			crc = crc32c(0xFFFFFFFF, mem[i], memsizes[i]); 
		ts2 = TIMESTAMP;

		printk(KERN_NOTICE TEST_PREFIX "crc32c %d bytes: "
			"ts1 %llu, ts2 %llu, delta %llu\n", memsizes[i],
			(unsigned long long)ts1, (unsigned long long)ts2,
			(unsigned long long)(ts2 - ts1));
	}

	/* Test adler32 */
	for (i = 0; i < MEM_CHUNKS; i++) {
		unsigned long crc;
		
		crc = adler32(0xFFFFFFFF, mem[i], memsizes[i]); 
		ts1 = TIMESTAMP;
		for (j = 0; j < ITERATIONS; j++)
			crc = adler32(0xFFFFFFFF, mem[i], memsizes[i]); 
		ts2 = TIMESTAMP;

		printk(KERN_NOTICE TEST_PREFIX "adler32 %d bytes: "
			"ts1 %llu, ts2 %llu, delta %llu\n", memsizes[i],
			(unsigned long long)ts1, (unsigned long long)ts2,
			(unsigned long long)(ts2 - ts1));
	}

	/* Test engel32 */
	for (i = 0; i < MEM_CHUNKS; i++) {
		uint32_t crc;
		
		crc = engel32(0xFFFFFFFF, mem[i], memsizes[i]); 
		ts1 = TIMESTAMP;
		for (j = 0; j < ITERATIONS; j++)
			crc = engel32(0xFFFFFFFF, mem[i], memsizes[i]); 
		ts2 = TIMESTAMP;

		printk(KERN_NOTICE TEST_PREFIX "engel32 %d bytes: "
			"ts1 %llu, ts2 %llu, delta %llu\n", memsizes[i],
			(unsigned long long)ts1, (unsigned long long)ts2,
			(unsigned long long)(ts2 - ts1));
	}
	spin_unlock_irqrestore(&lock, flags);
	
exit:
	for (i = 0; i < MEM_CHUNKS && mem[i] != NULL; i++)
		kfree(mem[i]);

	return ret;
}

module_init(init_crctest);

static void __exit
cleanup_crctest(void)
{
	return;
}

module_exit(cleanup_crctest);

/*
 * Was borrowed from include/linux/zutil.h
 */
#define NMAX 5552
#define BASE 65521L
#define DO1(buf,i)  {s1 += buf[i]; s2 += s1;}
#define DO2(buf,i)  DO1(buf,i); DO1(buf,i+1);
#define DO4(buf,i)  DO2(buf,i); DO2(buf,i+2);
#define DO8(buf,i)  DO4(buf,i); DO4(buf,i+4);
#define DO16(buf)   DO8(buf,0); DO8(buf,8);

static unsigned long
adler32(unsigned long adler, const unsigned char *buf, size_t len)
{
    unsigned long s1 = adler & 0xffff;
    unsigned long s2 = (adler >> 16) & 0xffff;
    int k;

    if (buf == NULL) return 1L;

    while (len > 0) {
        k = len < NMAX ? len : NMAX;
        len -= k;
        while (k >= 16) {
            DO16(buf);
            buf += 16;
            k -= 16;
        }
        if (k != 0) do {
            s1 += *buf++;
            s2 += s1;
        } while (--k);
        s1 %= BASE;
        s2 %= BASE;
    }
    return (s2 << 16) | s1;
}

/*
 * Jorn Engel's algorithm.
 */
static uint32_t
engel32(uint32_t engel, const void *_s, size_t len)
{
	const char *s = _s;
	uint32_t sum=engel, prod=engel;
	for (; len>=4; len-=4, s+=4) {
		sum += s[0];
		prod += sum;
		sum += s[1];
		prod += sum;
		sum += s[2];
		prod += sum;
		sum += s[3];
		prod += sum;
	}
	for (; len; len--, s++) {
		sum += *s;
		prod += sum;
	}
	sum = (sum&0x0000ffff)<<16^ (sum&0xffff0000)>>16;
	sum = (sum&0x00ff00ff)<<8 ^ (sum&0xff00ff00)>>8;
	sum = (sum&0x0f0f0f0f)<<4 ^ (sum&0xf0f0f0f0)>>4;
	sum = (sum&0x33333333)<<2 ^ (sum&0xcccccccc)>>2;
	sum = (sum&0x55555555)<<1 ^ (sum&0xaaaaaaaa)>>1;
	prod ^= sum;
	return prod;
}

MODULE_LICENSE ("GPL");
MODULE_AUTHOR ("Artem B. Bityuckiy");
MODULE_DESCRIPTION ("The CRC test");

---------------------------------------------------------

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2004-12-22 14:44                                       ` Artem B. Bityuckiy
@ 2004-12-22 15:14                                         ` Jörn Engel
  2004-12-22 15:25                                           ` Artem B. Bityuckiy
  2004-12-22 15:30                                           ` Joakim Tjernlund
  0 siblings, 2 replies; 196+ messages in thread
From: Jörn Engel @ 2004-12-22 15:14 UTC (permalink / raw)
  To: Artem B. Bityuckiy
  Cc: Linux MTD mailing list, Thomas Gleixner, Joakim Tjernlund

On Wed, 22 December 2004 14:44:35 +0000, Artem B. Bityuckiy wrote:
> 
> > How about adding adler32 and possibly this algorithm?  Engel32
> > (missing a better name) is basically adler32, but instead of a
> > modula-operation, it mirrors one of the two checksums (last bit
> > becomes first, etc.) and XORs both.
> Adler32 was added. Engel32 too.
> 
> Did yiu investigate theoretically these algorithms? Why engel32 is weaker?

Yup, engel32 should be roughly the same.  Here are the details:

o Both algorithms calculate two checksums.  One is the sum of all input
  characters, the other is the sum of (input character x position),
  where the position is counted backwards.
  From here on, they are called "sum" and "product".

o Either sum or product is a weak checksum.

o The combination of sum and product is a strong checksum.

Differences show up in how sum and product are combined:

o Adler32 splits the 32bit checksum in 16 bits for sum and 16 bits for
  product.  Both are calculated separately in 32bit variables.  In the
  end, a modulo 65521 operation folds the higher bits of each checksum
  to the lower 16 bits and both parts are concatenated.

o Engel32 mirrors the sum and xors sum and product.

Problems of adler32:
o The product checksum grows faster than the sum checksum.  For some
  input data, the product holds more than 16 bits of relevant
  information, while the sum holds less.  After combining both, only
  16 bits of the product are used, so the result has less than 32 bits
  of relevant information.

Problems of engel32:
o When creating sum and product, high bits can never influence low
  bits, while low bits influence high bits.  Using modulo in adler32
  causes high bits to influence low bits as well.  For engel32, the
  high bits of product never influence low bits again.


In the end, I guess that neither of the two problems matter and
adler32 has the advantage of being well-known and proven.
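
The "sum of (input character x position)" description can be checked with a short sketch. It assumes start values of zero, so it is not a full adler32 or engel32; weighted_sum() and running_prod() are hypothetical names. A byte-at-a-time Adler-32 (adler32_sketch(), following the usual definition with sum starting at 1 and both halves reduced modulo 65521) is included for comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* "product" computed directly: each byte times its position from the end. */
uint32_t weighted_sum(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint32_t acc = 0;
	size_t i;

	for (i = 0; i < len; i++)
		acc += (uint32_t)p[i] * (uint32_t)(len - i);
	return acc;
}

/* The same value obtained the cheap way, as a running sum of sums. */
uint32_t running_prod(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint32_t sum = 0, prod = 0;

	while (len--) {
		sum += *p++;
		prod += sum;	/* every earlier byte is counted once more */
	}
	return prod;
}

/* Plain Adler-32, one byte per step, with no unrolling or deferred modulo. */
uint32_t adler32_sketch(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint32_t s1 = 1, s2 = 0;

	assert(buf != NULL || len == 0);
	while (len--) {
		s1 = (s1 + *p++) % 65521;
		s2 = (s2 + s1) % 65521;
	}
	return (s2 << 16) | s1;
}
```

For "abc" both routes give 97*3 + 98*2 + 99*1 = 586. Because sum starts at 1 in Adler-32, its upper half is the weighted sum plus len (modulo 65521); for the string "Wikipedia" the checksum is 0x11E60398.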

> Would somebody like to add backward CRCs to the test?

Here is backward engel32.  It runs about 1.5% faster than forward
engel32 in my hash test.  Other backward algorithms should be in the
same order (depending on hardware).

uint32_t engel32r(uint32_t engel, const void *_s, size_t len)
{
	const char *s = _s;
	uint32_t sum=engel, prod=engel;
	/* s stays fixed; len shrinks, so s[len-1] walks from the end to the start */
	for (; len>=4; len-=4) {
		sum += s[len-1];
		prod += sum;
		sum += s[len-2];
		prod += sum;
		sum += s[len-3];
		prod += sum;
		sum += s[len-4];
		prod += sum;
	}
	for (; len; len--) {
		sum += s[len-1];
		prod += sum;
	}
	sum = (sum&0x0000ffff)<<16^ (sum&0xffff0000)>>16;
	sum = (sum&0x00ff00ff)<<8 ^ (sum&0xff00ff00)>>8;
	sum = (sum&0x0f0f0f0f)<<4 ^ (sum&0xf0f0f0f0)>>4;
	sum = (sum&0x33333333)<<2 ^ (sum&0xcccccccc)>>2;
	sum = (sum&0x55555555)<<1 ^ (sum&0xaaaaaaaa)>>1;
	prod ^= sum;
	return prod;
}


> I ran test on i686 (just to see it is compiled and works). Results are:
> 
> [crctst] 16-bit CRC CCITT 32 bytes: ts1 14352216056936, ts2 14352216057276, delta 340
> [crctst] 16-bit CRC CCITT 4096 bytes: ts1 14352237257556, ts2 14352237286332, delta 28776
> [crctst] 16-bit CRC CCITT 65536 bytes: ts1 14352259870036, ts2 14352260333528, delta 463492
> [crctst] crc32 32 bytes: ts1 14352282934664, ts2 14352282934980, delta 316
> [crctst] crc32 4096 bytes: ts1 14352301401184, ts2 14352301429996, delta 28812
> [crctst] crc32 65536 bytes: ts1 14352321296456, ts2 14352321726112, delta 429656
> [crctst] crc32c 32 bytes: ts1 14352341677716, ts2 14352341678048, delta 332
> [crctst] crc32c 4096 bytes: ts1 14352360411764, ts2 14352360436508, delta 24744
> [crctst] crc32c 65536 bytes: ts1 14352380586256, ts2 14352381043780, delta 457524
> [crctst] adler32 32 bytes: ts1 14352401248340, ts2 14352401248540, delta 200
> [crctst] adler32 4096 bytes: ts1 14352420251940, ts2 14352420256744, delta 4804
> [crctst] adler32 65536 bytes: ts1 14352439968500, ts2 14352440043460, delta 74960
> [crctst] engel32 32 bytes: ts1 14352460224760, ts2 14352460224960, delta 200
> [crctst] engel32 4096 bytes: ts1 14352479232376, ts2 14352479240700, delta 8324
> [crctst] engel32 65536 bytes: ts1 14352499089344, ts2 14352499228820, delta 139476

Adler32 beats the hell out of every other algorithm.  Except for the
backwards part, it appears to be a clear winner.

Jörn

-- 
Rules of Optimization:
Rule 1: Don't do it.
Rule 2 (for experts only): Don't do it yet.
-- M.A. Jackson 

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2004-12-22 15:14                                         ` Jörn Engel
@ 2004-12-22 15:25                                           ` Artem B. Bityuckiy
  2004-12-22 16:08                                             ` Jörn Engel
  2004-12-22 20:22                                             ` xemc
  2004-12-22 15:30                                           ` Joakim Tjernlund
  1 sibling, 2 replies; 196+ messages in thread
From: Artem B. Bityuckiy @ 2004-12-22 15:25 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Linux MTD mailing list, Thomas Gleixner, Joakim Tjernlund

Jorn, Could you please provide adler32r?

engel32r added, newer test is:

--------------------------------------------------------------
/*
 *      This program is free software; you can redistribute it and/or modify
 *      it under the terms of the GNU General Public License as published by
 *      the Free Software Foundation; either version 2 of the License, or
 *      (at your option) any later version.
 *
 *      This program is distributed in the hope that it will be useful,
 *      but WITHOUT ANY WARRANTY; without even the implied warranty of
 *      MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *      GNU General Public License for more details.
 *
 *      You should have received a copy of the GNU General Public License
 *      along with this program; if not, write to the Free Software
 *      Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
 *
 *      Author: Artem B. Bityuckiy, dedekind@infradead.org
 *      Version: 1.2
 */
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/pagemap.h>
#include <linux/spinlock.h>
#include <linux/crc32.h>
#include <linux/crc32c.h>
#include <linux/crc-ccitt.h>
#include <asm/timex.h>

/* The number of memory chunks to test */
#define MEM_CHUNKS 3

/* The test iterations number */
#define ITERATIONS 1

#define TEST_PREFIX "[crctst] "

/* 
 * Most architectures have some kind of high resolution time-stamp
 * counter and define the get_cycles() macro to access it.
 */
#define TIMESTAMP get_cycles()

static unsigned long
adler32(unsigned long adler, const unsigned char *buf, size_t len);

static uint32_t
engel32(uint32_t engel, const void *_s, size_t len);

static uint32_t
engel32r(uint32_t engel, const void *_s, size_t len);

/* 
 * The sizes of memory chunks for which CRCs should be tested.
 */
static int
memsizes[MEM_CHUNKS] = {32, PAGE_SIZE, 64*1024};

/* 
 * We perform actual testing in the module initialization function.
 */
static int __init
init_crctest(void)
{
	register int i, j;
	char *mem[MEM_CHUNKS];
	unsigned long flags;
	spinlock_t lock = SPIN_LOCK_UNLOCKED;
	int ret = 0;
	cycles_t ts1, ts2;

	/* Clear the pointer array so the cleanup loop can stop at the first NULL */
	memset(mem, 0, sizeof(mem));

	/* Allocate memory */
	for (i = 0; i < MEM_CHUNKS; i++) {
		if ((mem[i] = kmalloc(memsizes[i], GFP_KERNEL)) == NULL) {
			printk(KERN_ERR TEST_PREFIX "can't allocate %d bytes\n",
					memsizes[i]);
			ret = -ENOMEM;
			goto exit;
		}
	}
	
	/* 
	 * We do not want to be preempted during the test as well as do
	 * not want interrupts affect our results. Both of these are
	 * prevented by spin_lock_irqsave().
	 */
	spin_lock_irqsave(&lock, flags);
	
	/* Test 16 bit CRC CCITT */
	for (i = 0; i < MEM_CHUNKS; i++) {
		u16 crc;
		
		/* Do one fake pass to exclude CPU cache influence */
		crc = crc_ccitt(0xFFFF, mem[i], memsizes[i]);

		ts1 = TIMESTAMP;
		for (j = 0; j < ITERATIONS; j++)
			crc = crc_ccitt(0xFFFF, mem[i], memsizes[i]); 
		ts2 = TIMESTAMP;

		printk(KERN_NOTICE TEST_PREFIX "16-bit CRC CCITT %d bytes: "
			"ts1 %llu, ts2 %llu, delta %llu\n", memsizes[i],
			(unsigned long long)ts1, (unsigned long long)ts2,
			(unsigned long long)(ts2 - ts1));
	}

	/* Test crc32 */
	for (i = 0; i < MEM_CHUNKS; i++) {
		u32 crc;
		
		crc = crc32(0xFFFFFFFF, mem[i], memsizes[i]); 
		ts1 = TIMESTAMP;
		for (j = 0; j < ITERATIONS; j++)
			crc = crc32(0xFFFFFFFF, mem[i], memsizes[i]); 
		ts2 = TIMESTAMP;

		printk(KERN_NOTICE TEST_PREFIX "crc32 %d bytes: "
			"ts1 %llu, ts2 %llu, delta %llu\n", memsizes[i],
			(unsigned long long)ts1, (unsigned long long)ts2,
			(unsigned long long)(ts2 - ts1));
	}
	
	/* Test crc32c */
	for (i = 0; i < MEM_CHUNKS; i++) {
		u32 crc;
		
		crc = crc32c(0xFFFFFFFF, mem[i], memsizes[i]); 
		ts1 = TIMESTAMP;
		for (j = 0; j < ITERATIONS; j++)
			crc = crc32c(0xFFFFFFFF, mem[i], memsizes[i]); 
		ts2 = TIMESTAMP;

		printk(KERN_NOTICE TEST_PREFIX "crc32c %d bytes: "
			"ts1 %llu, ts2 %llu, delta %llu\n", memsizes[i],
			(unsigned long long)ts1, (unsigned long long)ts2,
			(unsigned long long)(ts2 - ts1));
	}

	/* Test adler32 */
	for (i = 0; i < MEM_CHUNKS; i++) {
		unsigned long crc;
		
		crc = adler32(0xFFFFFFFF, mem[i], memsizes[i]); 
		ts1 = TIMESTAMP;
		for (j = 0; j < ITERATIONS; j++)
			crc = adler32(0xFFFFFFFF, mem[i], memsizes[i]); 
		ts2 = TIMESTAMP;

		printk(KERN_NOTICE TEST_PREFIX "adler32 %d bytes: "
			"ts1 %llu, ts2 %llu, delta %llu\n", memsizes[i],
			(unsigned long long)ts1, (unsigned long long)ts2,
			(unsigned long long)(ts2 - ts1));
	}

	/* Test engel32 */
	for (i = 0; i < MEM_CHUNKS; i++) {
		uint32_t crc;
		
		crc = engel32(0xFFFFFFFF, mem[i], memsizes[i]); 
		ts1 = TIMESTAMP;
		for (j = 0; j < ITERATIONS; j++)
			crc = engel32(0xFFFFFFFF, mem[i], memsizes[i]); 
		ts2 = TIMESTAMP;

		printk(KERN_NOTICE TEST_PREFIX "engel32 %d bytes: "
			"ts1 %llu, ts2 %llu, delta %llu\n", memsizes[i],
			(unsigned long long)ts1, (unsigned long long)ts2,
			(unsigned long long)(ts2 - ts1));
	}

	/* Test engel32r */
	for (i = 0; i < MEM_CHUNKS; i++) {
		uint32_t crc;
		
		crc = engel32r(0xFFFFFFFF, mem[i], memsizes[i]); 
		ts1 = TIMESTAMP;
		for (j = 0; j < ITERATIONS; j++)
			crc = engel32r(0xFFFFFFFF, mem[i], memsizes[i]); 
		ts2 = TIMESTAMP;

		printk(KERN_NOTICE TEST_PREFIX "engel32r %d bytes: "
			"ts1 %llu, ts2 %llu, delta %llu\n", memsizes[i],
			(unsigned long long)ts1, (unsigned long long)ts2,
			(unsigned long long)(ts2 - ts1));
	}
	spin_unlock_irqrestore(&lock, flags);
	
exit:
	for (i = 0; i < MEM_CHUNKS && mem[i] != NULL; i++)
		kfree(mem[i]);

	return ret;
}

module_init(init_crctest);

static void __exit
cleanup_crctest(void)
{
	return;
}

module_exit(cleanup_crctest);

/*
 * Was borrowed from include/linux/zutil.h
 */
#define NMAX 5552
#define BASE 65521L
#define DO1(buf,i)  {s1 += buf[i]; s2 += s1;}
#define DO2(buf,i)  DO1(buf,i); DO1(buf,i+1);
#define DO4(buf,i)  DO2(buf,i); DO2(buf,i+2);
#define DO8(buf,i)  DO4(buf,i); DO4(buf,i+4);
#define DO16(buf)   DO8(buf,0); DO8(buf,8);

static unsigned long
adler32(unsigned long adler, const unsigned char *buf, size_t len)
{
    unsigned long s1 = adler & 0xffff;
    unsigned long s2 = (adler >> 16) & 0xffff;
    int k;

    if (buf == NULL) return 1L;

    while (len > 0) {
        k = len < NMAX ? len : NMAX;
        len -= k;
        while (k >= 16) {
            DO16(buf);
            buf += 16;
            k -= 16;
        }
        if (k != 0) do {
            s1 += *buf++;
            s2 += s1;
        } while (--k);
        s1 %= BASE;
        s2 %= BASE;
    }
    return (s2 << 16) | s1;
}

/*
 * Jorn Engel's algorithm.
 */
static uint32_t
engel32(uint32_t engel, const void *_s, size_t len)
{
	const char *s = _s;
	uint32_t sum=engel, prod=engel;
	for (; len>=4; len-=4, s+=4) {
		sum += s[0];
		prod += sum;
		sum += s[1];
		prod += sum;
		sum += s[2];
		prod += sum;
		sum += s[3];
		prod += sum;
	}
	for (; len; len--, s++) {
		sum += *s;
		prod += sum;
	}
	sum = (sum&0x0000ffff)<<16^ (sum&0xffff0000)>>16;
	sum = (sum&0x00ff00ff)<<8 ^ (sum&0xff00ff00)>>8;
	sum = (sum&0x0f0f0f0f)<<4 ^ (sum&0xf0f0f0f0)>>4;
	sum = (sum&0x33333333)<<2 ^ (sum&0xcccccccc)>>2;
	sum = (sum&0x55555555)<<1 ^ (sum&0xaaaaaaaa)>>1;
	prod ^= sum;
	return prod;
}

static uint32_t
engel32r(uint32_t engel, const void *_s, size_t len)
{
        const char *s = _s;
        uint32_t sum=engel, prod=engel;
        /* Walk from the end: s stays fixed and len serves as the index. */
        for (; len>=4; len-=4) {
                sum += s[len-1];
                prod += sum;
                sum += s[len-2];
                prod += sum;
                sum += s[len-3];
                prod += sum;
                sum += s[len-4];
                prod += sum;
        }
        for (; len; len--) {
                sum += s[len-1];
                prod += sum;
        }
        sum = (sum&0x0000ffff)<<16^ (sum&0xffff0000)>>16;
        sum = (sum&0x00ff00ff)<<8 ^ (sum&0xff00ff00)>>8;
        sum = (sum&0x0f0f0f0f)<<4 ^ (sum&0xf0f0f0f0)>>4;
        sum = (sum&0x33333333)<<2 ^ (sum&0xcccccccc)>>2;
        sum = (sum&0x55555555)<<1 ^ (sum&0xaaaaaaaa)>>1;
        prod ^= sum;
        return prod;
}


MODULE_LICENSE ("GPL");
MODULE_AUTHOR ("Artem B. Bityuckiy");
MODULE_DESCRIPTION ("The CRC test");

--------------------------------------------------------------

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: JFFS3 & performance
  2004-12-22 15:14                                         ` Jörn Engel
  2004-12-22 15:25                                           ` Artem B. Bityuckiy
@ 2004-12-22 15:30                                           ` Joakim Tjernlund
  2004-12-22 15:37                                             ` Artem B. Bityuckiy
  2004-12-22 15:56                                             ` Jörn Engel
  1 sibling, 2 replies; 196+ messages in thread
From: Joakim Tjernlund @ 2004-12-22 15:30 UTC (permalink / raw)
  To: Jörn Engel, Artem B. Bityuckiy
  Cc: Thomas Gleixner, Linux MTD mailing list

> Here is backward engel32.  It runs about 1.5% faster than forward
> engel32 in my hash test.  Other backward algorithms should be in the
> same order (depending on hardware).

Hi again

Can you do
1) First run a forward and then a backward version on memory space >= 2*L1 cache
2) Then do two forward runs on the same memory.

Compare the delta between 1) and 2)
That should give a clue if it is worth having a backward version.
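
A rough userspace sketch of that comparison, in case it helps. Everything here is an assumption for illustration: sum_forward(), sum_backward() and time_two_passes() are made-up names, a plain byte sum stands in for the real checksums, and clock() is far coarser than get_cycles().

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Stand-in "checksum": a plain byte sum, walking up through memory. */
static uint32_t sum_forward(const unsigned char *d, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < len; i++)
		sum += d[i];
	return sum;
}

/* The same sum, walking down from the end of the buffer. */
static uint32_t sum_backward(const unsigned char *d, size_t len)
{
	uint32_t sum = 0;

	while (len)
		sum += d[--len];
	return sum;
}

/*
 * Time two consecutive passes over the same buffer.  The first pass is
 * always forward; the second is backward if second_backward is nonzero.
 * The combined sum is stored in *result so the work is not optimized away.
 */
static clock_t time_two_passes(const unsigned char *d, size_t len,
			       int second_backward, uint32_t *result)
{
	clock_t start = clock();
	uint32_t first = sum_forward(d, len);
	uint32_t second = second_backward ? sum_backward(d, len)
					  : sum_forward(d, len);
	clock_t end = clock();

	*result = first + second;
	return end - start;
}
```

Calling time_two_passes() once with second_backward set and once without, on a buffer at least twice the L1 size, gives the two deltas to compare. A single run will be noisy, so it would need repeating many times.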

> 
> Adler32 beats the hell out of every other algorithm.  Except for the
> backwards part, it appears to be a clear winner.

Have you looked at the assembler Engler32 generates? Every instruction counts
in such small loops.

 Jocke

PS.
 I am leaving for Christmas in 2 hours and will be offline more or less until January
 so I guess I won't be much help in the near future.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: JFFS3 & performance
  2004-12-22 15:30                                           ` Joakim Tjernlund
@ 2004-12-22 15:37                                             ` Artem B. Bityuckiy
  2004-12-22 15:47                                               ` Joakim Tjernlund
  2004-12-22 15:56                                             ` Jörn Engel
  1 sibling, 1 reply; 196+ messages in thread
From: Artem B. Bityuckiy @ 2004-12-22 15:37 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Linux MTD mailing list, Thomas Gleixner

> Hi again
> 
> Can you do
> 1) First run a forward and then a backward version on memory space >= 2*L1 cache
> 2) Then do two forward runs on the same memory.
> 
> Compare the delta between 1) and 2)
> That should give a clue if it is worth having a backward version.
We can do this, but frankly speaking I don't understand the goal. IMHO, we 
already believe that backward checking is better, provided the prefetching 
behaviour is equivalent. So, IMHO, we just need to get to grips with the 
prefetching issues...

> 
> Have you looked at the assembler Engler32 generates? Every instruction counts
> in such small loops.
Which arch?

> PS.
>  I am leaving for Christmas in 2 hours and will be offline more or less until January
>  so I guess I won't be much help in the near future.
OK, we can continue the discussion in January :-)

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: JFFS3 & performance
  2004-12-22 15:37                                             ` Artem B. Bityuckiy
@ 2004-12-22 15:47                                               ` Joakim Tjernlund
  2004-12-22 15:56                                                 ` Artem B. Bityuckiy
  2004-12-22 15:59                                                 ` jasmine
  0 siblings, 2 replies; 196+ messages in thread
From: Joakim Tjernlund @ 2004-12-22 15:47 UTC (permalink / raw)
  To: Artem Bityuckiy; +Cc: Linux MTD mailing list, Thomas Gleixner

> > Hi again
> > 
> > Can you do
> > 1) First run a forward and then a backward version on memory space >= 2*L1 cache
> > 2) Then do two forward runs on the same memory.
> > 
> > Compare the delta between 1) and 2)
> > That should give a clue if it is worth having a backward version.
> We can do this, but frankly speaking I don't understand the goal. IMHO, we 
> already believe that backward checking is better, provided the prefetching 
> behaviour is equivalent. So, IMHO, we just need to get to grips with the 
> prefetching issues...

Prefetching may work well for some archs, but for low end embedded CPUs
I am not so sure. You will probably need to add arch specific code also to
do prefetching. 

> 
> > 
> > Have you looked at the assembler Engler32 generates? Every instruction counts
> > in such small loops.
> Which arch?

Any arch, the ones you have. All the other checksums have been optimized very
carefully over time. Engler32 is brand new and there is probably room for some
improvement.

> 
> > PS.
> >  I am leaving for Christmas in 2 hours and will be offline more or less until January
> >  so I guess I won't be much help in the near future.
> OK, we can continue the discussion in January :-)

Yep, I may get e-mail access where I am going, so maybe I can do something.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: JFFS3 & performance
  2004-12-22 15:47                                               ` Joakim Tjernlund
@ 2004-12-22 15:56                                                 ` Artem B. Bityuckiy
  2004-12-22 16:09                                                   ` Jörn Engel
  2004-12-22 15:59                                                 ` jasmine
  1 sibling, 1 reply; 196+ messages in thread
From: Artem B. Bityuckiy @ 2004-12-22 15:56 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Linux MTD mailing list, Thomas Gleixner

Preliminarily I'm planning:
1. Use adler32 on headers
2. Leave crc32 on data

We may also try to play with the GCC prefetch builtin...
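
A minimal sketch of what such an experiment might look like with GCC's __builtin_prefetch(). The helper name sum_with_prefetch(), the byte-sum loop and the 64-byte PREFETCH_DISTANCE are all assumptions for illustration; a real distance would have to be tuned per architecture.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed lookahead in bytes; roughly one cache line on many CPUs. */
#define PREFETCH_DISTANCE 64

/*
 * Byte sum that asks for the data one PREFETCH_DISTANCE ahead of the
 * current position, once per PREFETCH_DISTANCE bytes.
 */
static uint32_t sum_with_prefetch(const unsigned char *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		if ((i % PREFETCH_DISTANCE) == 0 && i + PREFETCH_DISTANCE < len)
			/* rw = 0: read access; locality = 0: no need to keep it cached */
			__builtin_prefetch(buf + i + PREFETCH_DISTANCE, 0, 0);
		sum += buf[i];
	}
	return sum;
}
```

The prefetch is only a hint, so the result is the same with or without it. Whether it helps at all would have to be measured on each target.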

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2004-12-22 15:30                                           ` Joakim Tjernlund
  2004-12-22 15:37                                             ` Artem B. Bityuckiy
@ 2004-12-22 15:56                                             ` Jörn Engel
  2004-12-22 16:39                                               ` Joakim Tjernlund
  1 sibling, 1 reply; 196+ messages in thread
From: Jörn Engel @ 2004-12-22 15:56 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Thomas Gleixner, Linux MTD mailing list

On Wed, 22 December 2004 16:30:20 +0100, Joakim Tjernlund wrote:
> 
> Can you do
> 1) First run a forward and then a backward version on memory space >= 2*L1 cache
> 2) Then do two forward runs on the same memory.
> 
> Compare the delta between 1) and 2)
> That should give a clue if it is worth having a backward version.

In my private not-too-scientific hash-table test, I do a strlen()
before calling engel32.  The reverse version is 1.5% faster, even
though the strings are only around 80 bytes long.  That's well over
noise level.

> > Adler32 beats the hell out of every other algorithm.  Except for the
> > backwards part, it appears to be a clear winner.
> 
> Have you looked at the assembler Engler32 generates? Every instruction counts
> in such small loops.

Not yet.  Might be worth a try, though.  Since it's losing in direct
comparison, but equal in the hash test, it might be slightly stronger
than adler32.

BTW: I reject the name "Engler32".  Few names are worse than
"engel32", but that one is. ;)

Jörn

-- 
A defeated army first battles and then seeks victory.
-- Sun Tzu

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: JFFS3 & performance
  2004-12-22 15:47                                               ` Joakim Tjernlund
  2004-12-22 15:56                                                 ` Artem B. Bityuckiy
@ 2004-12-22 15:59                                                 ` jasmine
  2004-12-22 16:19                                                   ` Jörn Engel
  2004-12-22 16:21                                                   ` Artem B. Bityuckiy
  1 sibling, 2 replies; 196+ messages in thread
From: jasmine @ 2004-12-22 15:59 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Thomas Gleixner, Linux MTD mailing list



On Wed, 22 Dec 2004, Joakim Tjernlund wrote:

[going backwards]
> Prefetching may work well for some archs, but for low end embedded CPUs
> I am not so sure. You will probably need to add arch specific code also to
> do prefetching.

This is nothing to do with prefetching.

Imagine you have three functions, a(), b() and c().  They all
work through a block of data D[] which is of a size >> cache size.

a() runs, loading data into cache as it works through the data.
At the end of a()'s run, the cache predominantly contains data from
the end of D[], because that was the last part to be accessed.

b() then runs, and needs data from the start of D[], so the cache
discards all the lines it loaded for a() and reloads them.  At the
end of b()'s run, the cache predominantly contains data from the
last part of D[], again, because that was the last part to be accessed.

c() finally runs, needs the start of D[].  The cache dumps all those
lines once more and reloads them again.


Now:  what happens if b() starts from the end of D[]?  You save a little 
time because the data b() needs to start with is already in cache.  And 
you save a little more because at the end of b(), the cache is full of the 
start of D[], so c() is ready to run.

Does this clarify?
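
The effect described above can be counted with a toy model. This is a sketch under loud assumptions: an idealized fully associative LRU cache of CACHE_LINES lines, a D[] of DATA_LINES lines, and one access per line per pass. Real caches are set-associative and less tidy, and all names here are made up.

```c
#include <stddef.h>

#define CACHE_LINES 16	/* assumed cache capacity, in lines */
#define DATA_LINES  64	/* D[] is four times the size of the cache */

struct lru_cache {
	long tag[CACHE_LINES];		/* line held in each slot, -1 if empty */
	unsigned long used[CACHE_LINES];	/* time of last access, 0 if never */
	unsigned long clock;
	unsigned long misses;
};

static void cache_init(struct lru_cache *c)
{
	size_t i;

	for (i = 0; i < CACHE_LINES; i++) {
		c->tag[i] = -1;
		c->used[i] = 0;
	}
	c->clock = 0;
	c->misses = 0;
}

/* Access one line: a hit refreshes it, a miss evicts the least recently used. */
static void cache_touch(struct lru_cache *c, long line)
{
	size_t i, victim = 0;

	c->clock++;
	for (i = 0; i < CACHE_LINES; i++) {
		if (c->tag[i] == line) {
			c->used[i] = c->clock;
			return;
		}
		if (c->used[i] < c->used[victim])
			victim = i;
	}
	c->misses++;
	c->tag[victim] = line;
	c->used[victim] = c->clock;
}

/* One pass over all of D[], in either direction. */
static void pass(struct lru_cache *c, int backward)
{
	long i;

	for (i = 0; i < DATA_LINES; i++)
		cache_touch(c, backward ? DATA_LINES - 1 - i : i);
}

/* a() and c() always run forward; b() runs backward if b_backward is set. */
static unsigned long misses_for_three_passes(int b_backward)
{
	struct lru_cache c;

	cache_init(&c);
	pass(&c, 0);		/* a() */
	pass(&c, b_backward);	/* b() */
	pass(&c, 0);		/* c() */
	return c.misses;
}
```

In this model three forward passes miss on every access, while reversing b() saves one cache-full of misses in b() and another in c().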

-J.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2004-12-22 15:25                                           ` Artem B. Bityuckiy
@ 2004-12-22 16:08                                             ` Jörn Engel
  2004-12-22 20:22                                             ` xemc
  1 sibling, 0 replies; 196+ messages in thread
From: Jörn Engel @ 2004-12-22 16:08 UTC (permalink / raw)
  To: Artem B. Bityuckiy
  Cc: Linux MTD mailing list, Thomas Gleixner, Joakim Tjernlund

On Wed, 22 December 2004 15:25:23 +0000, Artem B. Bityuckiy wrote:
> 
> Jorn, Could you please provide adler32r?

I tried.  Hope this is correct.

#define BASE 65521L		/* largest prime smaller than 65536 */
#define NMAX 5552
/* NMAX is the largest n such that 255n(n+1)/2 + (n+1)(BASE-1) <= 2^32-1 */

#define DO1(buf,i)  {s1 += buf[i]; s2 += s1;}
#define DO2(buf,i)  DO1(buf,i); DO1(buf,i+1);
#define DO4(buf,i)  DO2(buf,i); DO2(buf,i+2);
#define DO8(buf,i)  DO4(buf,i); DO4(buf,i+4);
#define DO16(buf)   DO8(buf,0); DO8(buf,8);

/* ========================================================================= */
uint32_t adler32r(uint32_t adler, const char *buf, size_t len)
{
	unsigned long s1 = adler & 0xffff;
	unsigned long s2 = (adler >> 16) & 0xffff;
	int k;

	if (!buf)
		return 1L;

	buf += len;
	while (len > 0) {
		k = len < NMAX ? len : NMAX;
		len -= k;
		while (k >= 16) {
			buf -= 16;
			DO16(buf);
			k -= 16;
		}
		if (k != 0)
			do {
				s1 += *--buf;
				s2 += s1;
			} while (--k);
		s1 %= BASE;
		s2 %= BASE;
	}
	return (s2 << 16) | s1;
}


Jörn

-- 
Don't patch bad code, rewrite it.
-- Kernighan and Pike, according to Rusty

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2004-12-22 15:56                                                 ` Artem B. Bityuckiy
@ 2004-12-22 16:09                                                   ` Jörn Engel
  2004-12-22 16:17                                                     ` Artem B. Bityuckiy
  0 siblings, 1 reply; 196+ messages in thread
From: Jörn Engel @ 2004-12-22 16:09 UTC (permalink / raw)
  To: Artem B. Bityuckiy
  Cc: Linux MTD mailing list, Thomas Gleixner, Joakim Tjernlund

On Wed, 22 December 2004 15:56:23 +0000, Artem B. Bityuckiy wrote:
> 
> Preliminarily I'm planning:
> 1. Use adler32 on headers
> 2. Leave crc32 on data

Why keep crc32?

Jörn

-- 
Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it.
-- Brian W. Kernighan

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2004-12-22 16:09                                                   ` Jörn Engel
@ 2004-12-22 16:17                                                     ` Artem B. Bityuckiy
  2004-12-22 16:43                                                       ` Joakim Tjernlund
  2004-12-22 17:26                                                       ` Jörn Engel
  0 siblings, 2 replies; 196+ messages in thread
From: Artem B. Bityuckiy @ 2004-12-22 16:17 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Linux MTD mailing list, Thomas Gleixner, Joakim Tjernlund

On Wed, 22 Dec 2004, [iso-8859-1] Jörn Engel wrote:

> On Wed, 22 December 2004 15:56:23 +0000, Artem B. Bityuckiy wrote:
> > 
> > Preliminarily I'm planning:
> > 1. Use adler32 on headers
> > 2. Leave crc32 on data
> 
> Why keep crc32?
This is just my opinion: headers are small, so the probability of a
serious error appearing there is lower than the probability of an error
in the data. So it seems we may reduce the CRC strength for them.

In the case of data, I am just not sure whether it is correct to use a
weaker algorithm...
If somebody competent could soundly explain why we may reduce the CRC
strength there...

Anyway, telling zlib not to add its adler32 checksum is a good idea and I
think it should be done.
 
> Jörn
> 
> -- 
> Debugging is twice as hard as writing the code in the first place.
> Therefore, if you write the code as cleverly as possible, you are,
> by definition, not smart enough to debug it.
> -- Brian W. Kernighan
> 

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2004-12-22 15:59                                                 ` jasmine
@ 2004-12-22 16:19                                                   ` Jörn Engel
  2004-12-22 16:21                                                   ` Artem B. Bityuckiy
  1 sibling, 0 replies; 196+ messages in thread
From: Jörn Engel @ 2004-12-22 16:19 UTC (permalink / raw)
  To: jasmine; +Cc: Linux MTD mailing list, Thomas Gleixner, Joakim Tjernlund

On Wed, 22 December 2004 15:59:46 +0000, jasmine@linuxgrrls.org wrote:
> On Wed, 22 Dec 2004, Joakim Tjernlund wrote:
> [going backwards]
> >Prefetching may work well for some archs, but for low end embedded CPUs
> >I am not so sure. You will probably need to add arch specific code also to
> >do prefetching.
> 
> This is nothing to do with prefetching.

Maybe we could do both?  Adding a prefetch now and then inside the
loop, using <linux/prefetch.h> may help as well.

> Imagine you have three functions, a(), b() and c().  They all
> work through a block of data D[] which is of a size >> cache size.
> 
> a() runs, loading data into cache as it works through the data.
> At the end of a()'s run, the cache predominantly contains data from
> the end of D[], because that was the last part to be accessed.
> 
> b() then runs, and needs data from the start of D[], so the cache
> discards all the lines it loaded for a() and reloads them.  At the
> end of b()'s run, the cache predominantly contains data from the
> last part of D[], again, because that was the last part to be accessed.
> 
> c() finally runs, needs the start of D[].  The cache dumps all those
> lines once more and reloads them again.
> 
> 
> Now:  what happens if b() starts from the end of D[]?  You save a little 
> time because the data b() needs to start with is already in cache.  And 
> you save a little more because at the end of b(), the cache is full of the 
> start of D[], so c() is ready to run.
> 
> Does this clarify?

Sure does.  Actually, your earlier mail already did.  Very simple
concept, yet totally new to me.  Thanks!

Jörn

-- 
"Error protection by error detection and correction."
-- from a university class

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: JFFS3 & performance
  2004-12-22 15:59                                                 ` jasmine
  2004-12-22 16:19                                                   ` Jörn Engel
@ 2004-12-22 16:21                                                   ` Artem B. Bityuckiy
  1 sibling, 0 replies; 196+ messages in thread
From: Artem B. Bityuckiy @ 2004-12-22 16:21 UTC (permalink / raw)
  To: jasmine; +Cc: Linux MTD mailing list, Thomas Gleixner, Joakim Tjernlund

IMHO he understood that; you have already kindly explained it. He just meant 
that he would like to measure the benefit of walking the data from different 
ends...

jasmine@linuxgrrls.org wrote:
> This is nothing to do with prefetching.
> 
> Imagine you have three functions, a(), b() and c().  They all
> work through a block of data D[] which is of a size >> cache size.
> 
> a() runs, loading data into cache as it works through the data.
> At the end of a()'s run, the cache predominantly contains data from
> the end of D[], because that was the last part to be accessed.
> 
> b() then runs, and needs data from the start of D[], so the cache
> discards all the lines it loaded for a() and reloads them.  At the
> end of b()'s run, the cache predominantly contains data from the
> last part of D[], again, because that was the last part to be accessed.
> 
> c() finally runs, needs the start of D[].  The cache dumps all those
> lines once more and reloads them again.
> 
> 
> Now:  what happens if b() starts from the end of D[]?  You save a little 
> time because the data b() needs to start with is already in cache.  And 
> you save a little more because at the end of b(), the cache is full of 
> the start of D[], so c() is ready to run.
> 
> Does this clarify?
> 
> -J.
> 

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: JFFS3 & performance
  2004-12-22 15:56                                             ` Jörn Engel
@ 2004-12-22 16:39                                               ` Joakim Tjernlund
  2004-12-22 17:33                                                 ` Jörn Engel
  0 siblings, 1 reply; 196+ messages in thread
From: Joakim Tjernlund @ 2004-12-22 16:39 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Thomas Gleixner, Linux MTD mailing list

> On Wed, 22 December 2004 16:30:20 +0100, Joakim Tjernlund wrote:
> > 
> > Can you do
> > 1) First run a forward and then a backward version on memory space >= 2*L1 cache
> > 2) Then do two forward runs on the same memory.
> > 
> > Compare the delta between 1) and 2)
> > That should give a clue if it is worth having a backward version.
> 
> In my private not-too-scientific hash-table test, I do a strlen()
> before calling engel32.  The reverse version is 1.5% faster, even
> though the strings are in the 80byte arena.  That's well over noise
> level.

OK, then it might be worth considering a reverse checksum.

> 
> > > Adler32 beats the hell out of every other algorithm.  Except for the
> > > backwards part, it appears to be a clear winner.
> > 
> > Have you looked at the assembler Engler32 generates? Every instruction counts
> > in such small loops.
> 
> Not yet.  Might be worth a try, though.  Since it's losing in direct
> comparison, but equal in the hash test, it might be slightly stronger
> than adler32.

This generates a little better code on PPC.
Not tested.

uint32_t engel32r_new(uint32_t engel, const void *_s, size_t len)
{
	const char *s = _s + len;
	uint32_t sum=engel, prod=engel;
	size_t new_len = len >> 2; 
	len &=3;

	for ( ;new_len; --new_len) {
		sum += *--s;
		prod += sum;
		sum += *--s;
		prod += sum;
		sum += *--s;
		prod += sum;
		sum += *--s;
		prod += sum;
	}
	for (; len; len--) {
		sum += *--s;
		prod += sum;
	}
	sum = (sum&0x0000ffff)<<16^ (sum&0xffff0000)>>16;
	sum = (sum&0x00ff00ff)<<8 ^ (sum&0xff00ff00)>>8;
	sum = (sum&0x0f0f0f0f)<<4 ^ (sum&0xf0f0f0f0)>>4;
	sum = (sum&0x33333333)<<2 ^ (sum&0xcccccccc)>>2;
	sum = (sum&0x55555555)<<1 ^ (sum&0xaaaaaaaa)>>1;
	prod ^= sum;
	return prod;
}
> 
> BTW: I reject the name "Engler32".  Few names are worse than
> "engel32", but that one is. ;)

I am so sorry, I felt something was wrong when I wrote that but I didn't
check, my bad.

 Jocke

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: JFFS3 & performance
  2004-12-22 16:17                                                     ` Artem B. Bityuckiy
@ 2004-12-22 16:43                                                       ` Joakim Tjernlund
  2004-12-22 16:46                                                         ` Artem B. Bityuckiy
  2004-12-22 17:26                                                       ` Jörn Engel
  1 sibling, 1 reply; 196+ messages in thread
From: Joakim Tjernlund @ 2004-12-22 16:43 UTC (permalink / raw)
  To: Artem B. Bityuckiy, Jörn Engel
  Cc: Thomas Gleixner, Linux MTD mailing list

> On Wed, 22 Dec 2004, [iso-8859-1] Jörn Engel wrote:
>
> > On Wed, 22 December 2004 15:56:23 +0000, Artem B. Bityuckiy wrote:
> > >
> > > Preliminarily I'm planning:
> > > 1. Use adler32 on headers
> > > 2. Leave crc32 on data
> >
> > Why keep crc32?
> This is just my opinion: headers are small, so the probability of a
> serious error appearing there is lower than the probability of an error
> in the data. So it seems we may reduce the CRC strength for them.

Maybe, but it is the data CRC that costs. I don't think you will notice
the difference if you just change the header CRC to Adler32.

>
> In the case of data, I am just not sure whether it is correct to use a
> weaker algorithm...
> If somebody competent could soundly explain why we may reduce the CRC
> strength there...
>
> Anyway, telling zlib not to add its adler32 checksum is a good idea and I
> think it should be done.

Yes, that one is independent of which CRC algorithm we choose.

 Jocke

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: JFFS3 & performance
  2004-12-22 16:43                                                       ` Joakim Tjernlund
@ 2004-12-22 16:46                                                         ` Artem B. Bityuckiy
  0 siblings, 0 replies; 196+ messages in thread
From: Artem B. Bityuckiy @ 2004-12-22 16:46 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Linux MTD mailing list, Thomas Gleixner

On Wed, 22 Dec 2004, Joakim Tjernlund wrote:

> > On Wed, 22 Dec 2004, [iso-8859-1] Jörn Engel wrote:
> >
> > > On Wed, 22 December 2004 15:56:23 +0000, Artem B. Bityuckiy wrote:
> > > >
> > > > Preliminarily I'm planning:
> > > > 1. Use adler32 on headers
> > > > 2. Leave crc32 on data
> > >
> > > Why keep crc32?
> > This is just my opinion: headers are small, so the probability of a
> > serious error appearing there is lower than the probability of an error
> > in the data. So it seems we may reduce the CRC strength for them.
> 
> Maybe, but it is the data CRC that costs. I don't think you will notice
> the difference if you just change the header CRC to Adler32.
True...
> 
> >
> > In the case of data, I am just not sure whether it is correct to use a
> > weaker algorithm...
> > If somebody competent could soundly explain why we may reduce the CRC
> > strength there...
> >
> > Anyway, telling zlib not to add its adler32 checksum is a good idea and I
> > think it should be done.
> 
> Yes, that one is independent of which CRC algorithm we choose.
> 
>  Jocke
> 
> 

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2004-12-22 16:17                                                     ` Artem B. Bityuckiy
  2004-12-22 16:43                                                       ` Joakim Tjernlund
@ 2004-12-22 17:26                                                       ` Jörn Engel
  2004-12-22 18:14                                                         ` xemc
  1 sibling, 1 reply; 196+ messages in thread
From: Jörn Engel @ 2004-12-22 17:26 UTC (permalink / raw)
  To: Artem B. Bityuckiy
  Cc: Linux MTD mailing list, Thomas Gleixner, Joakim Tjernlund

On Wed, 22 December 2004 16:17:00 +0000, Artem B. Bityuckiy wrote:
> On Wed, 22 Dec 2004, [iso-8859-1] Jörn Engel wrote:
> > On Wed, 22 December 2004 15:56:23 +0000, Artem B. Bityuckiy wrote:
> > > 
> > > Preliminarily I'm planning:
> > > 1. Use adler32 on headers
> > > 2. Leave crc32 on data
> > 
> > Why keep crc32?
> This is just my opinion: headers are small, so the probability of a
> serious error appearing there is lower than the probability of an error
> in the data. So it seems we may reduce the CRC strength for them.
> 
> In the case of data, I am just not sure whether it is correct to use a
> weaker algorithm...
> If somebody competent could soundly explain why we may reduce the CRC
> strength there...

Let me try, despite my questionable competence.

adler32 is strong against 1-bit errors:
If any single bit flips, 2^n with 0<=n<8 will be added to or subtracted
from the "sum" checksum.  After modulo 65521, there will always be a
change in at least one bit of the checksum.

adler32 is relatively strong against 2-bit errors:
If any two bits flip, it would only go unnoticed by the "sum" checksum
if they are both at the same position within a byte, but in different
bytes, one bit flipping to 1, the other to 0.
This again would cause n*1 to be added to the "product" checksum and
m*1 to be subtracted, with n and m being the positions of both bytes.
After modulo 65521, such an event would go unnoticed only if n-m is a
multiple of 65521.  Unlikely in general and impossible with 4k nodes
of jffs2.
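
The compensating 2-bit case can be checked with a small userspace sketch. The assumptions: adler32_simple() is a plain byte-at-a-time Adler-32 written for this example (not the zlib code), NODE_SIZE stands in for a 4k node, and compensating_flip_detected() is a made-up helper that moves one set bit from byte n to byte m.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_BASE 65521u	/* largest prime smaller than 65536 */
#define NODE_SIZE  4096u	/* assumed node size, per the 4k figure above */

/* Byte-at-a-time Adler-32, reduced after every byte to keep it obvious. */
static uint32_t adler32_simple(const unsigned char *buf, size_t len)
{
	uint32_t s1 = 1, s2 = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		s1 = (s1 + buf[i]) % ADLER_BASE;
		s2 = (s2 + s1) % ADLER_BASE;
	}
	return (s2 << 16) | s1;
}

/*
 * Move one set bit from byte n to byte m (same bit position), which leaves
 * the "sum" half unchanged.  Returns 1 if the full checksum still changes.
 * The caller must pass m != n, both below NODE_SIZE, and bit below 8.
 */
static int compensating_flip_detected(size_t m, size_t n, unsigned int bit)
{
	unsigned char buf[NODE_SIZE];
	unsigned char mask = (unsigned char)(1u << bit);
	uint32_t before, after;
	size_t i;

	for (i = 0; i < NODE_SIZE; i++)
		buf[i] = (unsigned char)(i * 31u);	/* arbitrary filler */

	buf[m] &= (unsigned char)~mask;	/* bit clear in byte m ... */
	buf[n] |= mask;			/* ... and set in byte n */
	before = adler32_simple(buf, NODE_SIZE);

	buf[m] |= mask;			/* one bit flips to 1 */
	buf[n] &= (unsigned char)~mask;	/* the other flips to 0 */
	after = adler32_simple(buf, NODE_SIZE);

	return (before & 0xffff) == (after & 0xffff) && before != after;
}
```

Within a 4k buffer the "sum" half stays the same while the "product" half differs, because the shift in weight is a power of two times n-m, and 65521 is prime and larger than any possible n-m.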

For higher-level errors, adler32 should have the usual a-priori
chances of catching them.  Even for very short messages, the chances
of a miss are 1/2^16 or better (both checksums have 8 bits or more).
For a full 4k block with an average character value of 128, the "sum"
checksum will have 12+7 bits and the "product" checksum 11+12+7, both
folded to 16 bits after the modulo.  So full nodes have the expected
1/2^32 a-priori chance.
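
Spelled out, the bit counts above come from this rough estimate. The symbols are my own shorthand: n is the block length in bytes, \bar{b} the average byte value, s_1 the "sum" and s_2 the "product" before the modulo, with the start values ignored.

```latex
\begin{align*}
n &= 4096 = 2^{12}, \qquad \bar{b} = 128 = 2^{7} \\
s_1 &\approx n\,\bar{b} = 2^{12} \cdot 2^{7} = 2^{19} \\
s_2 &\approx \bar{b}\,\frac{n(n+1)}{2} \approx 2^{7} \cdot 2^{11} \cdot 2^{12} = 2^{30}
\end{align*}
```

Both values are well above 2^16, so after the modulo each half can be expected to use its full 16-bit range.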


Overall, it is not as good as crc32, but pretty close, esp. for long
data.  I would entrust my personal data to it, provided that the
device in general is good enough and the first adler32 checksum
failure (ignoring power-failure during writes) results in a message to
discard the broken flash chips or something similar.

Jörn

-- 
To recognize individual spam features you have to try to get into the
mind of the spammer, and frankly I want to spend as little time inside
the minds of spammers as possible.
-- Paul Graham

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2004-12-22 16:39                                               ` Joakim Tjernlund
@ 2004-12-22 17:33                                                 ` Jörn Engel
  0 siblings, 0 replies; 196+ messages in thread
From: Jörn Engel @ 2004-12-22 17:33 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Thomas Gleixner, Linux MTD mailing list

On Wed, 22 December 2004 17:39:08 +0100, Joakim Tjernlund wrote:
> 
> OK, then it might be worth considering a reverse checksum.

Looks like it.

> This generates a little better code on PPC.
> Not tested.

Thanks.  Imo this variant is less readable and doesn't help i386, so
I'll ignore it for now.  If an i386-optimized engel32 is in the
adler32 range, I'll get back to it.  If not, let's just drop the whole
idea.

> > BTW: I reject the name "Engler32".  Few names are worse than
> > "engel32", but that one is. ;)
> 
> I am so sorry, I felt something was wrong when I wrote that but I didn't
> check, my bad.

No offense taken.  Btw, if you know a better name, that would be nice.
There should be quite a few, I'm just too lazy to think of one.

Jörn

-- 
Fancy algorithms are slow when n is small, and n is usually small.
Fancy algorithms have big constants. Until you know that n is
frequently going to be big, don't get fancy.
-- Rob Pike

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2004-12-22 17:26                                                       ` Jörn Engel
@ 2004-12-22 18:14                                                         ` xemc
  2004-12-22 18:20                                                           ` Artem B. Bityuckiy
  2004-12-23 13:52                                                           ` Jörn Engel
  0 siblings, 2 replies; 196+ messages in thread
From: xemc @ 2004-12-22 18:14 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Linux MTD mailing list

> adler32 is strong against 1bit errors:
...
> adler32 is relatively strong against 2bit errors:
... 
> For higher level errors, adler32 should have the usual a-priori
> chances of catching them.
...
> Overall, it is not as good as crc32, but pretty close, esp. for long
> data.  I would entrust my personal data to it, provided that the
> device in general is good enough and the first adler32 checksum
> failure (ignoring power-failure during writes) results in a message to
> discard the broken flash chips or something similar.

Please forgive me if this is a naive question, but isn't this data
also protected by ECC in the first place?  (or is that just for NAND?)
 How strong is this ECC compared to CRC32 or Adler32?

If the ECC can handle a few bit errors, then wouldn't a simple
checksum handle the case where the file was completely corrupt or
partially written?

Please correct any wrong assumptions.
Thanks,
Mike

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2004-12-22 18:14                                                         ` xemc
@ 2004-12-22 18:20                                                           ` Artem B. Bityuckiy
  2004-12-23 13:52                                                           ` Jörn Engel
  1 sibling, 0 replies; 196+ messages in thread
From: Artem B. Bityuckiy @ 2004-12-22 18:20 UTC (permalink / raw)
  To: xemc; +Cc: Linux MTD mailing list

I haven't investigated this, but maybe I will. Anyway, it should be
done.

But not all flashes are protected by ECC, and I believe we want to have the
same node format for any flash. On the other hand, those that are not
protected are robust.

For now I can only speak about CRC32: with a properly selected polynomial it
protects against 1-bit and 2-bit errors and bursts of consecutive bit errors.
At least the papers I've read state this.
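
The burst property can be tried out in userspace.  This is only a sketch
under assumptions: it uses the common reflected polynomial 0xEDB88320
with initial value and final XOR of all ones (not necessarily the exact
convention JFFS2 uses), and crc32_missed_bursts() is a made-up helper.
A burst confined to 32 consecutive bits or fewer cannot be a multiple of
a degree-32 generator polynomial, so no such burst should slip through.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32: reflected polynomial 0xEDB88320, init and final XOR ~0. */
static uint32_t crc32_bitwise(const void *data, size_t len)
{
        const unsigned char *p = data;
        uint32_t crc = 0xFFFFFFFFu;
        size_t i;
        int k;

        for (i = 0; i < len; i++) {
                crc ^= p[i];
                for (k = 0; k < 8; k++)
                        crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
        }
        return crc ^ 0xFFFFFFFFu;
}

/* Flip bits [start, start + count) of buf, numbering bits LSB-first. */
static void flip_bits(unsigned char *buf, size_t start, size_t count)
{
        size_t b;

        for (b = start; b < start + count; b++)
                buf[b / 8] ^= (unsigned char)(1u << (b % 8));
}

/*
 * Flip `burst` consecutive bits (1..32) at every possible offset of a
 * len-byte buffer (len <= 256) and count the bursts CRC-32 fails to notice.
 */
static unsigned int crc32_missed_bursts(size_t len, unsigned int burst)
{
        static unsigned char buf[256];
        unsigned int missed = 0;
        uint32_t good;
        size_t i, start;

        if (burst == 0 || burst > 32)
                return 0;               /* outside the range this sketch covers */
        if (len > sizeof(buf))
                len = sizeof(buf);
        for (i = 0; i < len; i++)
                buf[i] = (unsigned char)(i * 31 + 5);
        good = crc32_bitwise(buf, len);

        for (start = 0; start + burst <= len * 8; start++) {
                flip_bits(buf, start, burst);
                if (crc32_bitwise(buf, len) == good)
                        missed++;
                flip_bits(buf, start, burst);   /* restore the buffer */
        }
        return missed;
}
```

This only exercises bursts in which every bit is flipped; the argument
holds for any error pattern that fits into a 32-bit window.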

On Wed, 22 Dec 2004, xemc wrote:

> > adler32 is strong against 1bit errors:
> ...
> > adler32 is relatively strong against 2bit errors:
> ... 
> > For higher level errors, adler32 should have the usual a-priori
> > chances of catching them.
> ...
> > Overall, it is not as good as crc32, but pretty close, esp. for long
> > data.  I would entrust my personal data to it, provided that the
> > device in general is good enough and the first adler32 checksum
> > failure (ignoring power-failure during writes) results in a message to
> > discard the broken flash chips or something similar.
> 
> Please forgive me if this is a naive question, but isn't this data
> also protected by ECC in the first place?  (or is that just for NAND?)
>  How strong is this ECC compared to CRC32 or Adler32?
> 
> If the ECC can handle a few bit errors, then wouldn't a simple
> checksum handle the case where the file was completely corrupt or
> partially written?
> 
> Please correct any wrong assumptions.
> Thanks,
> Mike

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2004-12-22 15:25                                           ` Artem B. Bityuckiy
  2004-12-22 16:08                                             ` Jörn Engel
@ 2004-12-22 20:22                                             ` xemc
  2004-12-22 20:43                                               ` xemc
  1 sibling, 1 reply; 196+ messages in thread
From: xemc @ 2004-12-22 20:22 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: Linux MTD mailing list

> engel32r added, newer test is:
...

I added the adler32r test mentioned in a later post.
Now, I tried this on my ARM-based machine.  First, it crashed on me
(NULL pointer deref.).  Then I commented out the line "memcmp(&mem[0],
'\0', MEM_CHUNKS * sizeof(char *));",  and it worked.

Here's some of my results:
# insmod /lib/modules/2.6.8.1-oscar/kernel/lib/crc-ccitt.ko
# insmod /lib/modules/2.6.8.1-oscar/kernel/lib/libcrc32c.ko
# insmod ./crc_test.ko
[crctst] 16-bit CRC CCITT 32 bytes: ts1 0, ts2 0, delta 0
[crctst] 16-bit CRC CCITT 4096 bytes: ts1 0, ts2 0, delta 0
[crctst] 16-bit CRC CCITT 65536 bytes: ts1 0, ts2 0, delta 0
[crctst] crc32 32 bytes: ts1 0, ts2 0, delta 0
[crctst] crc32 4096 bytes: ts1 0, ts2 0, delta 0
...

Here's the get_cycles definition for arm: (include/asm-arm/timex.h)
static inline cycles_t get_cycles (void)  {
        return 0;
}
Well, _that_ helps.   =]

Mike

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2004-12-22 20:22                                             ` xemc
@ 2004-12-22 20:43                                               ` xemc
  2004-12-22 20:49                                                 ` Jasmine Strong
  0 siblings, 1 reply; 196+ messages in thread
From: xemc @ 2004-12-22 20:43 UTC (permalink / raw)
  To: Linux MTD mailing list

If you're interested, here are some more meaningful results for an ARM
architecture (Sharp LH7A400):

# rmmod crc_test; insmod ./crc_test.ko
[crctst] 16-bit CRC CCITT 32 bytes: ts1 37059, ts2 37070, delta 11
[crctst] 16-bit CRC CCITT 4096 bytes: ts1 60342, ts2 61427, delta 1085
[crctst] 16-bit CRC CCITT 65536 bytes: ts1 31089, ts2 51127, delta 20038
[crctst] crc32 32 bytes: ts1 38319, ts2 38325, delta 6
[crctst] crc32 4096 bytes: ts1 57032, ts2 57458, delta 426
[crctst] crc32 65536 bytes: ts1 49367, ts2 58710, delta 9343
[crctst] crc32c 32 bytes: ts1 42039, ts2 42048, delta 9
[crctst] crc32c 4096 bytes: ts1 61670, ts2 62677, delta 1007
[crctst] crc32c 65536 bytes: ts1 64761, ts2 46754, delta 4294949289
[crctst] adler32 32 bytes: ts1 32352, ts2 32357, delta 5
[crctst] adler32 4096 bytes: ts1 51543, ts2 51811, delta 268
[crctst] adler32 65536 bytes: ts1 41608, ts2 48194, delta 6586
[crctst] adler32r 32 bytes: ts1 32140, ts2 32145, delta 5
[crctst] adler32r 4096 bytes: ts1 51663, ts2 51921, delta 258
[crctst] adler32r 65536 bytes: ts1 41654, ts2 47917, delta 6263
[crctst] engel32 32 bytes: ts1 32182, ts2 32188, delta 6
[crctst] engel32 4096 bytes: ts1 51549, ts2 51918, delta 369
[crctst] engel32 65536 bytes: ts1 43199, ts2 51347, delta 8148
[crctst] engel32r 32 bytes: ts1 35293, ts2 35298, delta 5
[crctst] engel32r 4096 bytes: ts1 54784, ts2 55173, delta 389
[crctst] engel32r 65536 bytes: ts1 44905, ts2 51102, delta 6197
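
One entry above, "crc32c 65536 bytes", reports a delta of 4294949289,
which is 2^32 - 18007, i.e. ts2 - ts1 went negative because the counter
wrapped between the two reads.  All timestamps in this log stay below
65536, which suggests (but does not prove) a 16-bit free-running timer;
the mail does not say what TIMESTAMP was changed to.  Under that
assumption, a wrap-safe delta can be sketched as follows (the function
names are made up):

```c
#include <stdint.h>

/* What the log shows: the difference taken at 32-bit width. */
static uint32_t delta_as_printed(uint32_t ts1, uint32_t ts2)
{
        return ts2 - ts1;
}

/*
 * Elapsed ticks on a free-running counter that is ASSUMED to be 16 bits
 * wide.  Correct only if the counter wrapped at most once in between.
 */
static uint16_t delta_wrap16(uint16_t ts1, uint16_t ts2)
{
        return (uint16_t)(ts2 - ts1);
}
```

With the values from the log, delta_wrap16(64761, 46754) gives 47529.
Even so, if the counter really is that narrow, the 65536-byte runs may
have wrapped more than once, in which case none of their deltas can be
trusted without a wider or slower timer.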

Mike

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2004-12-22 20:43                                               ` xemc
@ 2004-12-22 20:49                                                 ` Jasmine Strong
  0 siblings, 0 replies; 196+ messages in thread
From: Jasmine Strong @ 2004-12-22 20:49 UTC (permalink / raw)
  To: xemc; +Cc: Linux MTD mailing list


On 22 Dec 2004, at 20:43, xemc wrote:

> LH7A400

Sadly this is only an ARM9TDMI core and so misses out on the
(substantial) benefits that the extra instructions in the 9EJ and
11J architectures confer on such routines.  Packed arithmetic
makes a huge difference to CRC16.

I've been meaning to write some ARM SIMD routines for
checksumming; if I can get permission to release them, I'll let you
know.  (My employer is a bit paranoid...)

-J.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2004-12-22 18:14                                                         ` xemc
  2004-12-22 18:20                                                           ` Artem B. Bityuckiy
@ 2004-12-23 13:52                                                           ` Jörn Engel
  2004-12-23 17:02                                                             ` Artem B. Bityuckiy
  2005-01-06 10:08                                                             ` Artem B. Bityuckiy
  1 sibling, 2 replies; 196+ messages in thread
From: Jörn Engel @ 2004-12-23 13:52 UTC (permalink / raw)
  To: xemc; +Cc: Linux MTD mailing list

On Wed, 22 December 2004 12:14:18 -0600, xemc wrote:
> 
> Please forgive me if this is a naive question, but isn't this data
> also protected by ECC in the first place?  (or is that just for NAND?)

Just for NAND.  NOR is pretty reliable anyway, so we could just go
without a checksum.  Hard drive based filesystems usually do the same
and few people complain.
But as long as it's cheap enough, extra confidence doesn't hurt.

>  How strong is this ECC compared to CRC32 or Adler32?

No idea.

> If the ECC can handle a few bit errors, then wouldn't a simple
> checksum handle the case where the file was completely corrupt or
> partially written?

Correct.  Simple parity might be a nice reference as well.  It is
really bad at catching even-bit errors (2,4,6,...), but it's fast.
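
To make the even-bit weakness concrete, here is a small userspace
sketch of a single parity bit (illustration only; parity_bit() and
parity_detects() are made-up names, nothing like this exists in JFFS2):

```c
#include <stddef.h>

/* Single parity bit over a buffer: the XOR of all data bits. */
static unsigned int parity_bit(const void *data, size_t len)
{
        const unsigned char *p = data;
        unsigned char acc = 0;
        size_t i;

        for (i = 0; i < len; i++)
                acc ^= p[i];
        /* Fold the eight bits of acc down into bit 0. */
        acc ^= (unsigned char)(acc >> 4);
        acc ^= (unsigned char)(acc >> 2);
        acc ^= (unsigned char)(acc >> 1);
        return acc & 1u;
}

/* Returns 1 if XOR-ing `mask` into `byte` changes its parity bit. */
static int parity_detects(unsigned char byte, unsigned char mask)
{
        unsigned char damaged = (unsigned char)(byte ^ mask);

        return parity_bit(&byte, 1) != parity_bit(&damaged, 1);
}
```

A mask with an odd number of bits set (0x01, 0x07) is always caught; a
mask with an even number of bits set (0x03) never is.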

Jörn

-- 
"Translations are and will always be problematic. They inflict violence 
upon two languages." (translation from German)

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2004-12-23 13:52                                                           ` Jörn Engel
@ 2004-12-23 17:02                                                             ` Artem B. Bityuckiy
  2005-01-07 11:10                                                               ` Artem B. Bityuckiy
  2005-01-06 10:08                                                             ` Artem B. Bityuckiy
  1 sibling, 1 reply; 196+ messages in thread
From: Artem B. Bityuckiy @ 2004-12-23 17:02 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Linux MTD mailing list


I've created a new version of the test. Please review it. What else should
we add?

Fixed bug: memcmp -> memset.
Added buffer passes in different directions.
Added a few more features.

P.S. I didn't run it, only compiled it. My PC has just crashed and I'm
doing some recovery.

/*
 *      This program is free software; you can redistribute it and/or modify
 *      it under the terms of the GNU General Public License as published by
 *      the Free Software Foundation; either version 2 of the License, or
 *      (at your option) any later version.
 *
 *      This program is distributed in the hope that it will be useful,
 *      but WITHOUT ANY WARRANTY; without even the implied warranty of
 *      MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *      GNU General Public License for more details.
 *
 *      You should have received a copy of the GNU General Public License
 *      along with this program; if not, write to the Free Software
 *      Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
 *
 *      Author: Artem B. Bityuckiy, dedekind@infradead.org
 *      Version: 1.3
 */
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/vmalloc.h>
#include <linux/pagemap.h>
#include <linux/spinlock.h>
#include <linux/crc32.h>
#include <linux/crc32c.h>
#include <linux/crc-ccitt.h>
#include <asm/timex.h>

/* The number of memory chunks to test */
#define MEM_CHUNKS 4

/* The test iterations number */
#define ITERATIONS 1

/*
 * The size of vmalloc'ed array used to prune any test-related data
 * from the CPU data cache.
 */
#define TMPMEM_SIZE (1*1024*1024)

#define TEST_PREFIX "[crctst] "

/*
 * Most architectures have some kind of high-resolution time-stamp
 * counter and define the get_cycles() macro to access it.
 */
#define TIMESTAMP get_cycles()


/*
 * So as not to lose too many interrupts, we relax the system from
 * time to time during testing
 */
#define RELAX()                                                 \
        do {                                                    \
                spin_unlock_irqrestore(&lock, flags);           \
                yield();                                        \
                spin_lock_irqsave(&lock, flags);                \
        } while(0)

/* Print one result line; the trailing "\n" keeps each result on its own line. */
#define PRINT_RESULTS(name, bytes, ts1, ts2)                    \
        printk(KERN_NOTICE TEST_PREFIX name ", %d bytes: "      \
                "delta %llu, ts1 %llu, ts2 %llu\n", bytes,      \
                (unsigned long long)(ts2 - ts1),                \
                (unsigned long long)ts1,                        \
                (unsigned long long)ts2)


static unsigned long
adler32(unsigned long adler, const unsigned char *buf, size_t len);

static uint32_t
adler32r(uint32_t adler, const char *buf, size_t len);

static uint32_t
engel32(uint32_t engel, const void *_s, size_t len);

static uint32_t
engel32r(uint32_t engel, const void *_s, size_t len);

static void
trash_cache(void);

/*
 * The sizes of memory chunks for which CRCs should be tested.
 */
static int memsizes[MEM_CHUNKS] = {32, PAGE_SIZE, 32*1024, 64*1024};

static char *tmp_mem;


/*
 * We perform actual testing in the module initialization function.
 */
static int __init
init_crctest(void)
{
        register int i, j;
        char *mem[MEM_CHUNKS];
        unsigned long flags;
        spinlock_t lock = SPIN_LOCK_UNLOCKED;
        int ret = 0;
        cycles_t ts1, ts2;

        /* Clear mem[] first so that the exit path never kfree()s garbage. */
        memset(&mem[0], '\0', MEM_CHUNKS * sizeof(char *));

        if ((tmp_mem = vmalloc(TMPMEM_SIZE)) == NULL) {
                printk(KERN_ERR TEST_PREFIX "can't allocate %d bytes\n",
                                        TMPMEM_SIZE);
                ret = -ENOMEM;
                goto exit;
        }

        /* Allocate memory */
        for (i = 0; i < MEM_CHUNKS; i++) {
                if ((mem[i] = kmalloc(memsizes[i], GFP_KERNEL)) == NULL) {
                        printk(KERN_ERR TEST_PREFIX "can't allocate %d bytes\n",
                                        memsizes[i]);
                        ret = -ENOMEM;
                        goto exit;
                }
        }

        /*
         * We do not want to be preempted during the test as well as do
         * not want interrupts affect our results. Both of these are
         * prevented by spin_lock_irqsave().
         */
        spin_lock_irqsave(&lock, flags);

        /*
         * Now measure the difference between passing over each buffer
         * twice forward and passing over it once backward, then once
         * forward.  The inner loops use j, so the chunk index i is not
         * clobbered, and they stay within memsizes[i] bytes of mem[i].
         */
        for (i = 0; i < MEM_CHUNKS; i++) {

                /* Trash the CPU data cache */
                trash_cache();

                ts1 = TIMESTAMP;
                for (j = 0; j < memsizes[i]; j++)
                        mem[i][j] = mem[i][j] + 1;
                for (j = 0; j < memsizes[i]; j++)
                        mem[i][j] = mem[i][j] + 1;
                ts2 = TIMESTAMP;
                PRINT_RESULTS("Data pass both forward", memsizes[i], ts1, ts2);

                trash_cache();

                ts1 = TIMESTAMP;
                for (j = memsizes[i] - 1; j >= 0; j--)
                        mem[i][j] = mem[i][j] + 1;
                for (j = 0; j < memsizes[i]; j++)
                        mem[i][j] = mem[i][j] + 1;
                ts2 = TIMESTAMP;
                PRINT_RESULTS("Data pass backward/forward", memsizes[i], ts1, ts2);
        }

        RELAX();

        /* Test adler32 */
        for (i = 0; i < MEM_CHUNKS; i++) {
                unsigned long crc;

                crc = adler32(0xFFFFFFFF, mem[i], memsizes[i]);
                ts1 = TIMESTAMP;
                for (j = 0; j < ITERATIONS; j++)
                        crc = adler32(0xFFFFFFFF, mem[i], memsizes[i]);
                ts2 = TIMESTAMP;
                PRINT_RESULTS("adler32", memsizes[i], ts1, ts2);
        }

        RELAX();

        /* Test adler32r */
        for (i = 0; i < MEM_CHUNKS; i++) {
                unsigned long crc;

                crc = adler32r(0xFFFFFFFF, mem[i], memsizes[i]);
                ts1 = TIMESTAMP;
                for (j = 0; j < ITERATIONS; j++)
                        crc = adler32r(0xFFFFFFFF, mem[i], memsizes[i]);
                ts2 = TIMESTAMP;
                PRINT_RESULTS("adler32r", memsizes[i], ts1, ts2);
        }

        RELAX();

        /* Test engel32 */
        for (i = 0; i < MEM_CHUNKS; i++) {
                uint32_t crc;

                crc = engel32(0xFFFFFFFF, mem[i], memsizes[i]);
                ts1 = TIMESTAMP;
                for (j = 0; j < ITERATIONS; j++)
                        crc = engel32(0xFFFFFFFF, mem[i], memsizes[i]);
                ts2 = TIMESTAMP;
                PRINT_RESULTS("engel32", memsizes[i], ts1, ts2);
        }

        RELAX();

        /* Test engel32r */
        for (i = 0; i < MEM_CHUNKS; i++) {
                uint32_t crc;

                crc = engel32r(0xFFFFFFFF, mem[i], memsizes[i]);
                ts1 = TIMESTAMP;
                for (j = 0; j < ITERATIONS; j++)
                        crc = engel32r(0xFFFFFFFF, mem[i], memsizes[i]);
                ts2 = TIMESTAMP;
                PRINT_RESULTS("engel32r", memsizes[i], ts1, ts2);
        }

        RELAX();

        /* Test 16 bit CRC CCITT */
        for (i = 0; i < MEM_CHUNKS; i++) {
                u16 crc;

                /* Do one fake pass to exclude CPU cache influence */
                crc = crc_ccitt(0xFFFF, mem[i], memsizes[i]);

                ts1 = TIMESTAMP;
                for (j = 0; j < ITERATIONS; j++)
                        crc = crc_ccitt(0xFFFF, mem[i], memsizes[i]);
                ts2 = TIMESTAMP;
                PRINT_RESULTS("16-bit CRC CCITT", memsizes[i], ts1, ts2);
        }

        RELAX();

        /* Test crc32 */
        for (i = 0; i < MEM_CHUNKS; i++) {
                u32 crc;

                crc = crc32(0xFFFFFFFF, mem[i], memsizes[i]);
                ts1 = TIMESTAMP;
                for (j = 0; j < ITERATIONS; j++)
                        crc = crc32(0xFFFFFFFF, mem[i], memsizes[i]);
                ts2 = TIMESTAMP;
                PRINT_RESULTS("CRC32", memsizes[i], ts1, ts2);
        }

        RELAX();

        /* Test crc32c */
        for (i = 0; i < MEM_CHUNKS; i++) {
                u32 crc;

                crc = crc32c(0xFFFFFFFF, mem[i], memsizes[i]);
                ts1 = TIMESTAMP;
                for (j = 0; j < ITERATIONS; j++)
                        crc = crc32c(0xFFFFFFFF, mem[i], memsizes[i]);
                ts2 = TIMESTAMP;
                PRINT_RESULTS("CRC32c", memsizes[i], ts1, ts2);
        }

        spin_unlock_irqrestore(&lock, flags);

exit:
        if (tmp_mem != NULL)
                vfree(tmp_mem);

        for (i = 0; i < MEM_CHUNKS && mem[i] != NULL; i++)
                kfree(mem[i]);

        return ret;
}

module_init(init_crctest);

static void __exit
cleanup_crctest(void)
{
        return;
}

module_exit(cleanup_crctest);

/*
 * In order to prune our data from the CPU cache, we scan big data
 * array.
 */
static void
trash_cache(void) {
        register int i;
        for (i = 1; i < TMPMEM_SIZE; i++)
                tmp_mem[i-1] = tmp_mem[i] + 1;
}

/* ----------------------------------------------------------------------- */

/*
 * Borrowed from include/linux/zutil.h
 */
#define NMAX 5552
#define BASE 65521L
#define DO1(buf,i)  {s1 += buf[i]; s2 += s1;}
#define DO2(buf,i)  DO1(buf,i); DO1(buf,i+1);
#define DO4(buf,i)  DO2(buf,i); DO2(buf,i+2);
#define DO8(buf,i)  DO4(buf,i); DO4(buf,i+4);
#define DO16(buf)   DO8(buf,0); DO8(buf,8);

static unsigned long
adler32(unsigned long adler, const unsigned char *buf, size_t len)
{
    unsigned long s1 = adler & 0xffff;
    unsigned long s2 = (adler >> 16) & 0xffff;
    int k;

    if (buf == NULL) return 1L;

    while (len > 0) {
        k = len < NMAX ? len : NMAX;
        len -= k;
        while (k >= 16) {
            DO16(buf);
            buf += 16;
            k -= 16;
        }
        if (k != 0) do {
            s1 += *buf++;
            s2 += s1;
        } while (--k);
        s1 %= BASE;
        s2 %= BASE;
    }
    return (s2 << 16) | s1;
}

/*
 * Reverse version of adler32 (provided by Jorn Engel).
 */
static uint32_t
adler32r(uint32_t adler, const char *buf, size_t len)
{
        unsigned long s1 = adler & 0xffff;
        unsigned long s2 = (adler >> 16) & 0xffff;
        int k;

        if (!buf)
                return 1L;

        buf += len;
        while (len > 0) {
                k = len < NMAX ? len : NMAX;
                len -= k;
                while (k >= 16) {
                        buf -= 16;
                        DO16(buf);
                        k -= 16;
                }
                if (k != 0)
                        do {
                                s1 += *--buf;
                                s2 += s1;
                        } while (--k);
                s1 %= BASE;
                s2 %= BASE;
        }
        return (s2 << 16) | s1;
}

/*
 * Jorn Engel's algorithms.
 */
static uint32_t
engel32(uint32_t engel, const void *_s, size_t len)
{
        /* unsigned, so that bytes >= 0x80 are not added as negative values */
        const unsigned char *s = _s;
        uint32_t sum=engel, prod=engel;
        for (; len>=4; len-=4, s+=4) {
                sum += s[0];
                prod += sum;
                sum += s[1];
                prod += sum;
                sum += s[2];
                prod += sum;
                sum += s[3];
                prod += sum;
        }
        for (; len; len--, s++) {
                sum += *s;
                prod += sum;
        }
        sum = (sum&0x0000ffff)<<16^ (sum&0xffff0000)>>16;
        sum = (sum&0x00ff00ff)<<8 ^ (sum&0xff00ff00)>>8;
        sum = (sum&0x0f0f0f0f)<<4 ^ (sum&0xf0f0f0f0)>>4;
        sum = (sum&0x33333333)<<2 ^ (sum&0xcccccccc)>>2;
        sum = (sum&0x55555555)<<1 ^ (sum&0xaaaaaaaa)>>1;
        prod ^= sum;
        return prod;
}

static uint32_t
engel32r(uint32_t engel, const void *_s, size_t len)
{
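        /* unsigned, so that bytes >= 0x80 are not added as negative values */
        const unsigned char *s = _s;
        uint32_t sum=engel, prod=engel;
        /*
         * Walk from the end of the buffer towards the start: s stays
         * fixed and only len shrinks.  (Advancing s as well would keep
         * s+len constant and re-read the same last bytes every time.)
         */
        for (; len>=4; len-=4) {
                sum += s[len-1];
                prod += sum;
                sum += s[len-2];
                prod += sum;
                sum += s[len-3];
                prod += sum;
                sum += s[len-4];
                prod += sum;
        }
        for (; len; len--) {
                sum += s[len-1];
                prod += sum;
        }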
        sum = (sum&0x0000ffff)<<16^ (sum&0xffff0000)>>16;
        sum = (sum&0x00ff00ff)<<8 ^ (sum&0xff00ff00)>>8;
        sum = (sum&0x0f0f0f0f)<<4 ^ (sum&0xf0f0f0f0)>>4;
        sum = (sum&0x33333333)<<2 ^ (sum&0xcccccccc)>>2;
        sum = (sum&0x55555555)<<1 ^ (sum&0xaaaaaaaa)>>1;
        prod ^= sum;
        return prod;
}

MODULE_LICENSE ("GPL");
MODULE_AUTHOR ("Artem B. Bityuckiy");
MODULE_DESCRIPTION ("The CRC test");


--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2004-12-23 13:52                                                           ` Jörn Engel
  2004-12-23 17:02                                                             ` Artem B. Bityuckiy
@ 2005-01-06 10:08                                                             ` Artem B. Bityuckiy
  2005-01-08 20:14                                                               ` Jörn Engel
  1 sibling, 1 reply; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-06 10:08 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Linux MTD mailing list

On Thu, 23 Dec 2004, Jörn Engel wrote:

Jorn Engel: 
> NOR is pretty reliable anyway, so we could just go
> without a checksum.

I cannot agree with you. Checksums are required even for very reliable NOR
flashes, to be able to detect broken nodes which can appear after unclean
reboots.
My understanding is that this is the most important reason why CRCs are
needed. Media corruption is of lesser priority. From this perspective
we may easily use any checksum weaker than CRC32, but this checksum
must be good at detecting partially written nodes.

  
> Correct.  Simple parity might be a nice reference as well.  It is
> really bad at catching even-bit errors (2,4,6,...), but it's fast.

And please bear in mind that if we encounter an ECC error, this means an
error somewhere in the page. But this page may contain several JFFS3 nodes,
and we may recover some of them. So having a per-node CRC is a good idea even
if there is ECC (ECC is per-page).
For example, the board may have been rebooted uncleanly while writing the ECC
(the data was already written; ECC is written after the data). In this case
we may have correct data but just a wrong ECC. Having CRCs, we might recover
all JFFS3 nodes.
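
A minimal userspace sketch of the partially-written-node case, under
stated assumptions: the node layout (struct demo_node), the choice of
Adler-32 and the helper names are all hypothetical and are not the JFFS2
or JFFS3 on-flash format.  It assumes erased flash reads back as 0xFF and
that the checksum field itself reached the flash.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NODE_DATA_LEN 64        /* hypothetical payload size */

/* Hypothetical node: a checksum of the payload, then the payload. */
struct demo_node {
        uint32_t csum;
        unsigned char data[NODE_DATA_LEN];
};

/* Plain Adler-32, start value 1. */
static uint32_t demo_adler32(const unsigned char *buf, size_t len)
{
        uint32_t s1 = 1, s2 = 0;
        size_t i;

        for (i = 0; i < len; i++) {
                s1 = (s1 + buf[i]) % 65521u;
                s2 = (s2 + s1) % 65521u;
        }
        return (s2 << 16) | s1;
}

/*
 * Simulate an unclean reboot: only the first `written` payload bytes
 * reached the flash, the rest still reads as erased (0xFF).
 * Returns 1 if the node on flash passes its checksum, 0 otherwise.
 */
static int node_valid_after_write(size_t written)
{
        struct demo_node good, flash;
        size_t i;

        for (i = 0; i < NODE_DATA_LEN; i++)
                good.data[i] = (unsigned char)i;
        good.csum = demo_adler32(good.data, NODE_DATA_LEN);

        if (written > NODE_DATA_LEN)
                written = NODE_DATA_LEN;
        memset(&flash, 0xFF, sizeof(flash));    /* erased flash */
        flash.csum = good.csum;                 /* assume the header made it */
        memcpy(flash.data, good.data, written);

        return demo_adler32(flash.data, NODE_DATA_LEN) == flash.csum;
}
```

Only a completely written node validates here; any shorter write leaves
0xFF bytes in the payload and the checksum no longer matches.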

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-07 11:10                                                               ` Artem B. Bityuckiy
@ 2005-01-07 11:09                                                                 ` David Woodhouse
  2005-01-07 11:27                                                                   ` jasmine
  0 siblings, 1 reply; 196+ messages in thread
From: David Woodhouse @ 2005-01-07 11:09 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: joakim.tjernlund, MTD List

On Fri, 2005-01-07 at 11:10 +0000, Artem B. Bityuckiy wrote:
> I'm going to add support for architectures that do not have a
> CPU cycle counter (like ARM). In that case I'll use the jiffies value.

You can do better than that -- we have timers on ARM chips which are
better.

-- 
dwmw2

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2004-12-23 17:02                                                             ` Artem B. Bityuckiy
@ 2005-01-07 11:10                                                               ` Artem B. Bityuckiy
  2005-01-07 11:09                                                                 ` David Woodhouse
  0 siblings, 1 reply; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-07 11:10 UTC (permalink / raw)
  To: MTD List; +Cc: joakim.tjernlund

Hello Guys.

Please don't try the last test version I sent, and don't review it. It is
buggy. I'm going to create another one. I'm going to add support for
architectures that do not have a CPU cycle counter (like ARM). In
that case I'll use the jiffies value. But I don't like it, since we need to
have interrupts enabled during the test. Moreover, it is very inaccurate, so
we need to do lots of iterations of the CRC calculations. And we may be
interrupted, and the IRQ handler may put lots of garbage into the L1 caches
(both instruction and data caches).

So I don't know how far to trust a test which uses jiffies (especially when
we want to measure the difference between forward and backward CRC
calculations).
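
As a rough feel for "lots of iterations": if the only error were one
timer tick of quantization on the total, the required iteration count
could be estimated as below.  This is a back-of-the-envelope sketch with
a made-up helper name; the figures in the note after it (10 ms tick,
10 us per call) are assumptions, not measurements, and the estimate
ignores the cache pollution from interrupt handlers mentioned above.

```c
#include <stdint.h>

/*
 * Smallest iteration count for which one timer tick of uncertainty is
 * at most max_err_percent of the total measured time.
 * Returns 0 if per_call_ns or max_err_percent is 0.
 */
static uint64_t iterations_needed(uint64_t tick_ns, uint64_t per_call_ns,
                                  uint64_t max_err_percent)
{
        uint64_t num = tick_ns * 100u;
        uint64_t den = per_call_ns * max_err_percent;

        if (den == 0)
                return 0;
        return (num + den - 1) / den;   /* round up */
}
```

For example, with an assumed 10 ms tick (HZ=100) and an assumed 10 us per
checksum call, a 1% bound needs iterations_needed(10000000, 10000, 1),
i.e. 100000 calls per measurement.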

Any thoughts?

On Thu, 23 Dec 2004, Artem B. Bityuckiy wrote:
> 
> I've created a new version of the test. Please review it. What else should
> we add?
> 
> Fixed bug: memcmp -> memset.
> Added buffer passes in different directions.
> Added a few more features.
> 
> P.S. I didn't run it, only compiled it. My PC has just crashed and I'm
> doing some recovery.
> 
> /*
>  *      This program is free software; you can redistribute it and/or modify
>  *      it under the terms of the GNU General Public License as published by
>  *      the Free Software Foundation; either version 2 of the License, or
>  *      (at your option) any later version.
>  *
>  *      This program is distributed in the hope that it will be useful,
>  *      but WITHOUT ANY WARRANTY; without even the implied warranty of
>  *      MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>  *      GNU General Public License for more details.
>  *
>  *      You should have received a copy of the GNU General Public License
>  *      along with this program; if not, write to the Free Software
>  *      Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
>  *
>  *      Author: Artem B. Bityuckiy, dedekind@infradead.org
>  *      Version: 1.3
>  */
> #include <linux/kernel.h>
> #include <linux/module.h>
> #include <linux/vmalloc.h>
> #include <linux/pagemap.h>
> #include <linux/spinlock.h>
> #include <linux/crc32.h>
> #include <linux/crc32c.h>
> #include <linux/crc-ccitt.h>
> #include <asm/timex.h>
> 
> /* The number of memory chunks to test */
> #define MEM_CHUNKS 4
> 
> /* The test iterations number */
> #define ITERATIONS 1
> 
> /*
>  * The size of vmalloc'ed array used to prune any test-related data
>  * from the CPU data cache.
>  */
> #define TMPMEM_SIZE 1*1024*1024
> 
> #define TEST_PREFIX "[crctst] "
> 
> /*
>  * Most architectures have some kind of high-resolution time-stamp
>  * counter and define the get_cycles() macro to access it.
>  */
> #define TIMESTAMP get_cycles()
> 
> 
> /*
>  * So as not to lose too many interrupts, we relax the system from
>  * time to time during testing
>  */
> #define RELAX()                                                 \
>         do {                                                    \
>                 spin_unlock_irqrestore(&lock, flags);           \
>                 yield();                                        \
>                 spin_lock_irqsave(&lock, flags);                \
>         } while(0)
> 
> #define PRINT_RESULTS(name, bytes, ts1, ts2)                    \
>         printk(KERN_NOTICE TEST_PREFIX name ", %d bytes: "      \
>                 " delta %llu, ts1 %llu, ts2 %llu", bytes,       \
>                 (unsigned long long)(ts2 - ts1),                \
>                 (unsigned long long)ts1,                        \
>                 (unsigned long long)ts2)                        \
> 
> 
> static unsigned long
> adler32(unsigned long adler, const unsigned char *buf, size_t len);
> 
> static uint32_t
> adler32r(uint32_t adler, const char *buf, size_t len);
> 
> static uint32_t
> engel32(uint32_t engel, const void *_s, size_t len);
> 
> static uint32_t
> engel32r(uint32_t engel, const void *_s, size_t len);
> 
> static void
> trash_cache(void);
> 
> /*
>  * The sizes of memory chunks for which CRCs should be tested.
>  */
> static int memsizes[MEM_CHUNKS] = {32, PAGE_SIZE, 32*1024, 64*1024};
> 
> static char *tmp_mem;
> 
> 
> /*
>  * We perform actual testing in the module initialization function.
>  */
> static int __init
> init_crctest(void)
> {
>         register int i, j;
>         char *mem[MEM_CHUNKS];
>         unsigned long flags;
>         spinlock_t lock = SPIN_LOCK_UNLOCKED;
>         int ret = 0;
>         cycles_t ts1, ts2;
> 
>         /* Clear the pointers first: the exit path inspects mem[] */
>         memset(&mem[0], '\0', MEM_CHUNKS * sizeof(char *));
> 
>         if ((tmp_mem = vmalloc(TMPMEM_SIZE)) == NULL) {
>                 printk(KERN_ERR TEST_PREFIX "can't allocate %d bytes\n",
>                                         TMPMEM_SIZE);
>                 ret = -ENOMEM;
>                 goto exit;
>         }
> 
>         /* Allocate memory */
>         for (i = 0; i < MEM_CHUNKS; i++) {
>                 if ((mem[i] = kmalloc(memsizes[i], GFP_KERNEL)) == NULL) {
>                         printk(KERN_ERR TEST_PREFIX "can't allocate %d bytes\n",
>                                         memsizes[i]);
>                         ret = -ENOMEM;
>                         goto exit;
>                 }
>         }
> 
>         /*
>          * We do not want to be preempted during the test, nor do we
>          * want interrupts to affect our results. Both of these are
>          * prevented by spin_lock_irqsave().
>          */
>         spin_lock_irqsave(&lock, flags);
> 
>         /*
>          * Now we measure the difference between passing over an array
>          * twice forward and passing over it once backward and once
>          * forward.
>          */
>         for (i = 0; i < MEM_CHUNKS; i++) {
> 
>                 /* Trash the CPU data cache */
>                 trash_cache();
> 
>                 /* Use j as the inner index so the outer i survives */
>                 ts1 = TIMESTAMP;
>                 for (j = 0; j < memsizes[i]; j++)
>                         tmp_mem[j] = tmp_mem[j] + 1;
>                 for (j = 0; j < memsizes[i]; j++)
>                         tmp_mem[j] = tmp_mem[j] + 1;
>                 ts2 = TIMESTAMP;
>                 PRINT_RESULTS("Data pass both forward", memsizes[i], ts1, ts2);
> 
>                 trash_cache();
> 
>                 ts1 = TIMESTAMP;
>                 for (j = memsizes[i] - 1; j >= 0; j--)
>                         tmp_mem[j] = tmp_mem[j] + 1;
>                 for (j = 0; j < memsizes[i]; j++)
>                         tmp_mem[j] = tmp_mem[j] + 1;
>                 ts2 = TIMESTAMP;
>                 PRINT_RESULTS("Data pass backward/forward", memsizes[i], ts1, ts2);
>         }
> 
>         RELAX();
> 
>         /* Test adler32 */
>         for (i = 0; i < MEM_CHUNKS; i++) {
>                 unsigned long crc;
> 
>                 crc = adler32(0xFFFFFFFF, mem[i], memsizes[i]);
>                 ts1 = TIMESTAMP;
>                 for (j = 0; j < ITERATIONS; j++)
>                         crc = adler32(0xFFFFFFFF, mem[i], memsizes[i]);
>                 ts2 = TIMESTAMP;
>                 PRINT_RESULTS("adler32", memsizes[i], ts1, ts2);
>         }
> 
>         RELAX();
> 
>         /* Test adler32r */
>         for (i = 0; i < MEM_CHUNKS; i++) {
>                 unsigned long crc;
> 
>                 crc = adler32r(0xFFFFFFFF, mem[i], memsizes[i]);
>                 ts1 = TIMESTAMP;
>                 for (j = 0; j < ITERATIONS; j++)
>                         crc = adler32r(0xFFFFFFFF, mem[i], memsizes[i]);
>                 ts2 = TIMESTAMP;
>                 PRINT_RESULTS("adler32r", memsizes[i], ts1, ts2);
>         }
> 
>         RELAX();
> 
>         /* Test engel32 */
>         for (i = 0; i < MEM_CHUNKS; i++) {
>                 uint32_t crc;
> 
>                 crc = engel32(0xFFFFFFFF, mem[i], memsizes[i]);
>                 ts1 = TIMESTAMP;
>                 for (j = 0; j < ITERATIONS; j++)
>                         crc = engel32(0xFFFFFFFF, mem[i], memsizes[i]);
>                 ts2 = TIMESTAMP;
>                 PRINT_RESULTS("engel32", memsizes[i], ts1, ts2);
>         }
> 
>         RELAX();
> 
>         /* Test engel32r */
>         for (i = 0; i < MEM_CHUNKS; i++) {
>                 uint32_t crc;
> 
>                 crc = engel32r(0xFFFFFFFF, mem[i], memsizes[i]);
>                 ts1 = TIMESTAMP;
>                 for (j = 0; j < ITERATIONS; j++)
>                         crc = engel32r(0xFFFFFFFF, mem[i], memsizes[i]);
>                 ts2 = TIMESTAMP;
>                 PRINT_RESULTS("engel32r", memsizes[i], ts1, ts2);
>         }
> 
>         RELAX();
> 
>         /* Test 16 bit CRC CCITT */
>         for (i = 0; i < MEM_CHUNKS; i++) {
>                 u16 crc;
> 
>                 /* Do one fake pass to exclude CPU cache influence */
>                 crc = crc_ccitt(0xFFFF, mem[i], memsizes[i]);
> 
>                 ts1 = TIMESTAMP;
>                 for (j = 0; j < ITERATIONS; j++)
>                         crc = crc_ccitt(0xFFFF, mem[i], memsizes[i]);
>                 ts2 = TIMESTAMP;
>                 PRINT_RESULTS("16-bit CRC CCITT", memsizes[i], ts1, ts2);
>         }
> 
>         RELAX();
> 
>         /* Test crc32 */
>         for (i = 0; i < MEM_CHUNKS; i++) {
>                 u32 crc;
> 
>                 crc = crc32(0xFFFFFFFF, mem[i], memsizes[i]);
>                 ts1 = TIMESTAMP;
>                 for (j = 0; j < ITERATIONS; j++)
>                         crc = crc32(0xFFFFFFFF, mem[i], memsizes[i]);
>                 ts2 = TIMESTAMP;
>                 PRINT_RESULTS("CRC32", memsizes[i], ts1, ts2);
>         }
> 
>         RELAX();
> 
>         /* Test crc32c */
>         for (i = 0; i < MEM_CHUNKS; i++) {
>                 u32 crc;
> 
>                 crc = crc32c(0xFFFFFFFF, mem[i], memsizes[i]);
>                 ts1 = TIMESTAMP;
>                 for (j = 0; j < ITERATIONS; j++)
>                         crc = crc32c(0xFFFFFFFF, mem[i], memsizes[i]);
>                 ts2 = TIMESTAMP;
>                 PRINT_RESULTS("CRC32c", memsizes[i], ts1, ts2);
>         }
> 
>         spin_unlock_irqrestore(&lock, flags);
> 
> exit:
>         if (tmp_mem != NULL)
>                 vfree(tmp_mem);
> 
>         for (i = 0; i < MEM_CHUNKS && mem[i] != NULL; i++)
>                 kfree(mem[i]);
> 
>         return ret;
> }
> 
> module_init(init_crctest);
> 
> static void __exit
> cleanup_crctest(void)
> {
>         return;
> }
> 
> module_exit(cleanup_crctest);
> 
> /*
>  * In order to prune our data from the CPU cache, we scan a big data
>  * array.
>  */
> static void
> trash_cache(void) {
>         register int i;
>         for (i = 1; i < TMPMEM_SIZE; i++)
>                 tmp_mem[i-1] = tmp_mem[i] + 1;
> }
> 
> /* ----------------------------------------------------------------------- */
> 
> /*
>  * Was borrowed from include/linux/zutil.h
>  */
> #define NMAX 5552
> #define BASE 65521L
> #define DO1(buf,i)  {s1 += buf[i]; s2 += s1;}
> #define DO2(buf,i)  DO1(buf,i); DO1(buf,i+1);
> #define DO4(buf,i)  DO2(buf,i); DO2(buf,i+2);
> #define DO8(buf,i)  DO4(buf,i); DO4(buf,i+4);
> #define DO16(buf)   DO8(buf,0); DO8(buf,8);
> 
> static unsigned long
> adler32(unsigned long adler, const unsigned char *buf, size_t len)
> {
>     unsigned long s1 = adler & 0xffff;
>     unsigned long s2 = (adler >> 16) & 0xffff;
>     int k;
> 
>     if (buf == NULL) return 1L;
> 
>     while (len > 0) {
>         k = len < NMAX ? len : NMAX;
>         len -= k;
>         while (k >= 16) {
>             DO16(buf);
>             buf += 16;
>             k -= 16;
>         }
>         if (k != 0) do {
>             s1 += *buf++;
>             s2 += s1;
>         } while (--k);
>         s1 %= BASE;
>         s2 %= BASE;
>     }
>     return (s2 << 16) | s1;
> }
> 
> /*
>  * Reverse version of adler32 (provided by Jorn Engel).
>  */
> static uint32_t
> adler32r(uint32_t adler, const char *buf, size_t len)
> {
>         unsigned long s1 = adler & 0xffff;
>         unsigned long s2 = (adler >> 16) & 0xffff;
>         int k;
> 
>         if (!buf)
>                 return 1L;
> 
>         buf += len;
>         while (len > 0) {
>                 k = len < NMAX ? len : NMAX;
>                 len -= k;
>                 while (k >= 16) {
>                         buf -= 16;
>                         DO16(buf);
>                         k -= 16;
>                 }
>                 if (k != 0)
>                         do {
>                                 s1 += *--buf;
>                                 s2 += s1;
>                         } while (--k);
>                 s1 %= BASE;
>                 s2 %= BASE;
>         }
>         return (s2 << 16) | s1;
> }
> 
> /*
>  * Jorn Engel's algorithms.
>  */
> static uint32_t
> engel32(uint32_t engel, const void *_s, size_t len)
> {
>         const char *s = _s;
>         uint32_t sum=engel, prod=engel;
>         for (; len>=4; len-=4, s+=4) {
>                 sum += s[0];
>                 prod += sum;
>                 sum += s[1];
>                 prod += sum;
>                 sum += s[2];
>                 prod += sum;
>                 sum += s[3];
>                 prod += sum;
>         }
>         for (; len; len--, s++) {
>                 sum += *s;
>                 prod += sum;
>         }
>         sum = (sum&0x0000ffff)<<16^ (sum&0xffff0000)>>16;
>         sum = (sum&0x00ff00ff)<<8 ^ (sum&0xff00ff00)>>8;
>         sum = (sum&0x0f0f0f0f)<<4 ^ (sum&0xf0f0f0f0)>>4;
>         sum = (sum&0x33333333)<<2 ^ (sum&0xcccccccc)>>2;
>         sum = (sum&0x55555555)<<1 ^ (sum&0xaaaaaaaa)>>1;
>         prod ^= sum;
>         return prod;
> }
> 
> static uint32_t
> engel32r(uint32_t engel, const void *_s, size_t len)
> {
>         const char *s = _s;
>         uint32_t sum=engel, prod=engel;
>         /* Walk from the end: s stays fixed, only len shrinks */
>         for (; len>=4; len-=4) {
>                 sum += s[len-1];
>                 prod += sum;
>                 sum += s[len-2];
>                 prod += sum;
>                 sum += s[len-3];
>                 prod += sum;
>                 sum += s[len-4];
>                 prod += sum;
>         }
>         for (; len; len--) {
>                 sum += s[len-1];
>                 prod += sum;
>         }
>         sum = (sum&0x0000ffff)<<16^ (sum&0xffff0000)>>16;
>         sum = (sum&0x00ff00ff)<<8 ^ (sum&0xff00ff00)>>8;
>         sum = (sum&0x0f0f0f0f)<<4 ^ (sum&0xf0f0f0f0)>>4;
>         sum = (sum&0x33333333)<<2 ^ (sum&0xcccccccc)>>2;
>         sum = (sum&0x55555555)<<1 ^ (sum&0xaaaaaaaa)>>1;
>         prod ^= sum;
>         return prod;
> }
> 
> MODULE_LICENSE ("GPL");
> MODULE_AUTHOR ("Artem B. Bityuckiy");
> MODULE_DESCRIPTION ("The CRC test");
> 
> 
> --
> Best Regards,
> Artem B. Bityuckiy,
> St.-Petersburg, Russia.
> 
> ______________________________________________________
> Linux MTD discussion mailing list
> http://lists.infradead.org/mailman/listinfo/linux-mtd/
> 

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-07 11:09                                                                 ` David Woodhouse
@ 2005-01-07 11:27                                                                   ` jasmine
  2005-01-07 11:43                                                                     ` Artem B. Bityuckiy
  0 siblings, 1 reply; 196+ messages in thread
From: jasmine @ 2005-01-07 11:27 UTC (permalink / raw)
  To: David Woodhouse; +Cc: MTD List, joakim.tjernlund



On Fri, 7 Jan 2005, David Woodhouse wrote:

> You can do better than that -- we have timers on ARM chips which are
> better.

And ARMv6 has performance counters too.

-J.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-07 11:27                                                                   ` jasmine
@ 2005-01-07 11:43                                                                     ` Artem B. Bityuckiy
  2005-01-07 14:23                                                                       ` Artem B. Bityuckiy
  0 siblings, 1 reply; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-07 11:43 UTC (permalink / raw)
  To: jasmine; +Cc: joakim.tjernlund, David Woodhouse, MTD List

On Fri, 7 Jan 2005 jasmine@linuxgrrls.org wrote:

> 
> 
> On Fri, 7 Jan 2005, David Woodhouse wrote:
> 
> > You can do better than that -- we have timers on ARM chips which are
> > better.
> 
> And ARMv6 has performance counters too.
> 
> -J.
> 

I can only see asm-arm/timex.h, where get_cycles() always returns zero.
Maybe ARMv6 is not supported yet; I have no idea.

Anyway, it seems that almost all platforms implement get_cycles(),
except ARM and some x86 (i386, i486 ...). We don't care much about x86,
IMHO, so only the ARM case needs to be handled another way.

I'm no ARM expert, so any help is welcome. Are there any calls in
(ARM) Linux that we could use?
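
For illustration only, the "handle ARM another way" idea can be sketched
in plain user-space C: probe the cycle counter and fall back to a second
time source if the counter never moves. The names timestamp_fn,
choose_timestamp_source and both stub functions are hypothetical; none
of them is a kernel API.

```c
#include <stdint.h>

/* Hypothetical type: any function that returns a timestamp. */
typedef uint64_t (*timestamp_fn)(void);

/* Stand-in for a get_cycles() that is hard-wired to return zero. */
static uint64_t stub_cycles_always_zero(void)
{
        return 0;
}

/* Stand-in for a platform timer: advances by 100 ticks per read. */
static uint64_t stub_timer_ticks(void)
{
        static uint64_t ticks;

        ticks += 100;
        return ticks;
}

/*
 * Read the counter twice; if both reads are zero, treat it as
 * unimplemented and use the fallback source instead.
 */
static timestamp_fn
choose_timestamp_source(timestamp_fn counter, timestamp_fn fallback)
{
        uint64_t first = counter();
        uint64_t second = counter();

        if (first == 0 && second == 0)
                return fallback;
        return counter;
}
```

A compile-time switch would work just as well; the run-time probe is
only meant to show the decision being made.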

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-07 11:43                                                                     ` Artem B. Bityuckiy
@ 2005-01-07 14:23                                                                       ` Artem B. Bityuckiy
  2005-01-07 14:27                                                                         ` jasmine
  2005-01-07 14:31                                                                         ` Artem B. Bityuckiy
  0 siblings, 2 replies; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-07 14:23 UTC (permalink / raw)
  To: MTD List; +Cc: David Woodhouse, joakim.tjernlund, jasmine

[-- Attachment #1: Type: TEXT/PLAIN, Size: 12489 bytes --]

Hello,

here is the test. 

I added ARM support and use timers on ARM. I use the
system_timer->offset() call, which returns the content of the timer
registers converted to nanoseconds. One small problem: the system_timer
structure is not exported, so I failed to build a module on ARM.
But it is still possible to compile it into the kernel, and the test
will then be performed during Linux boot.

The new test version is attached. Please review it. If possible,
try it on any Linux board you have. MIPS, PPC and ARM are IMHO
especially interesting. If you have suggestions for additions to the
test, you are welcome :-)

I've run the test on ARM and x86.
My results show that Jorn's adler32r is the winner (adler32 is next).
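
As a reminder of what is being compared, here is a minimal user-space
sketch of Adler-32 computed front to back and back to front. It is a
simplified, strictly byte-reversed variant and is not identical to the
adler32r in the attachment, which steps backward in 16-byte blocks. The
names adler32_fwd, adler32_rev and ADLER_BASE are made up for this
sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_BASE 65521u       /* largest prime below 2^16 */

/* Plain Adler-32, one byte at a time, front to back. */
static uint32_t
adler32_fwd(uint32_t adler, const unsigned char *buf, size_t len)
{
        uint32_t s1 = adler & 0xffff;
        uint32_t s2 = (adler >> 16) & 0xffff;
        size_t i;

        for (i = 0; i < len; i++) {
                s1 = (s1 + buf[i]) % ADLER_BASE;
                s2 = (s2 + s1) % ADLER_BASE;
        }
        return (s2 << 16) | s1;
}

/* The same two sums, but the bytes are consumed back to front. */
static uint32_t
adler32_rev(uint32_t adler, const unsigned char *buf, size_t len)
{
        uint32_t s1 = adler & 0xffff;
        uint32_t s2 = (adler >> 16) & 0xffff;
        size_t i;

        for (i = len; i > 0; i--) {
                s1 = (s1 + buf[i - 1]) % ADLER_BASE;
                s2 = (s2 + s1) % ADLER_BASE;
        }
        return (s2 << 16) | s1;
}
```

The two functions are different checksums: the reversed one over a
buffer equals the forward one over the mirrored buffer, so writer and
reader have to agree on the direction.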

Strange results with the data walks: two backward walks are fastest.
This doesn't fit what we discussed about passing data forward and then
backward... Maybe a 128K array is too small?
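
For readers who want to try the four walk orders outside the kernel,
here is a rough user-space sketch. The idea behind the forward/backward
order is presumably that the second pass starts on the bytes the first
pass touched last, which are the ones most likely to still be cached.
The enum walk_order and all function names are hypothetical, and the
sketch contains no timing code.

```c
#include <stddef.h>

/* Touch every byte once, lowest address first. */
static void pass_forward(unsigned char *mem, size_t len)
{
        size_t i;

        for (i = 0; i < len; i++)
                mem[i] = mem[i] + 1;
}

/* Touch every byte once, highest address first. */
static void pass_backward(unsigned char *mem, size_t len)
{
        size_t i;

        for (i = len; i > 0; i--)
                mem[i - 1] = mem[i - 1] + 1;
}

/* The four combinations listed in the results: first pass, second pass. */
enum walk_order { FWD_FWD, BWD_BWD, BWD_FWD, FWD_BWD };

/* Run two passes over the buffer in the requested order. */
static void walk_twice(unsigned char *mem, size_t len, enum walk_order order)
{
        if (order == FWD_FWD || order == FWD_BWD)
                pass_forward(mem, len);
        else
                pass_backward(mem, len);

        if (order == FWD_FWD || order == BWD_FWD)
                pass_forward(mem, len);
        else
                pass_backward(mem, len);
}
```

Every order does the same amount of work (each byte is incremented
twice), so any difference in elapsed time comes from the access pattern
alone.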

P.S. Test is attached.

-----------------------------------------------------------------------
ARM (OMAP):

#cat /proc/cpuinfo
Processor       : ARM926EJ-Sid(wb) rev 3 (v5l)
BogoMIPS        : 107.72
Features        : swp half thumb fastmult edsp java
CPU implementer : 0x41
CPU architecture: 5TEJ
CPU variant     : 0x0
CPU part        : 0x926
CPU revision    : 3
Cache type      : VIPT write-back
Cache clean     : cp15 c7 ops
Cache lockdown  : format C
Cache format    : Harvard
I size          : 32768
I assoc         : 4
I line length   : 32
I sets          : 256
D size          : 16384
D assoc         : 4
D line length   : 32
D sets          : 128

[crctest] (ARM timers) Data pass both forward, 32 bytes: delta 10, ts1 
86284, ts2 86294
[crctest] (ARM timers) Data pass both backward, 32 bytes: delta 8, ts1 
165359, ts2 165367
[crctest] (ARM timers) Data pass backward/forward, 32 bytes: delta 9, ts1 
244402, ts2 244411
[crctest] (ARM timers) Data pass forward/backward, 32 bytes: delta 9, ts1 
323449, ts2 323458
[crctest] (ARM timers) Data pass both forward, 4096 bytes: delta 635, ts1 
402496, ts2 403131
[crctest] (ARM timers) Data pass both backward, 4096 bytes: delta 524, ts1 
482171, ts2 482695
[crctest] (ARM timers) Data pass backward/forward, 4096 bytes: delta 582, 
ts1 561735, ts2 562317
[crctest] (ARM timers) Data pass forward/backward, 4096 bytes: delta 577, 
ts1 641359, ts2 641936
[crctest] (ARM timers) Data pass both forward, 32768 bytes: delta 5219, 
ts1 720979, ts2 726198
[crctest] (ARM timers) Data pass both backward, 32768 bytes: delta 4361, 
ts1 805222, ts2 809583
[crctest] (ARM timers) Data pass backward/forward, 32768 bytes: delta 
4695, ts1 888609, ts2 893304
[crctest] (ARM timers) Data pass forward/backward, 32768 bytes: delta 
4682, ts1 972332, ts2 977014
[crctest] (ARM timers) Data pass both forward, 65536 bytes: delta 10487, 
ts1 1056042, ts2 1066529
[crctest] (ARM timers) Data pass both backward, 65536 bytes: delta 8769, 
ts1 1145579, ts2 1154348
[crctest] (ARM timers) Data pass backward/forward, 65536 bytes: delta 
9534, ts1 1233398, ts2 1242932
[crctest] (ARM timers) Data pass forward/backward, 65536 bytes: delta 
9522, ts1 1321982, ts2 1331504
[crctest] (ARM timers) Data pass both forward, 131072 bytes: delta 20970, 
ts1 1410557, ts2 1431527
[crctest] (ARM timers) Data pass both backward, 131072 bytes: delta 17533, 
ts1 1510579, ts2 1528112
[crctest] (ARM timers) Data pass backward/forward, 131072 bytes: delta 
19157, ts1 1607164, ts2 1626321
[crctest] (ARM timers) Data pass forward/backward, 131072 bytes: delta 
19145, ts1 1705375, ts2 1724520
[crctest] (ARM timers) adler32, 32 bytes: delta 10, ts1 1724634, ts2 
1724644
[crctest] (ARM timers) adler32, 4096 bytes: delta 662, ts1 4917, ts2 5579
[crctest] (ARM timers) adler32, 32768 bytes: delta 6529, ts1 6299, ts2 
12828
[crctest] (ARM timers) adler32, 65536 bytes: delta 13031, ts1 4287, ts2 
17318
[crctest] (ARM timers) adler32, 131072 bytes: delta 26044, ts1 10038, ts2 
36082
[crctest] (ARM timers) adler32r, 32 bytes: delta 10, ts1 6205, ts2 6215
[crctest] (ARM timers) adler32r, 4096 bytes: delta 639, ts1 6358, ts2 6997
[crctest] (ARM timers) adler32r, 32768 bytes: delta 6480, ts1 7711, ts2 
14191
[crctest] (ARM timers) adler32r, 65536 bytes: delta 12944, ts1 5599, ts2 
18543
[crctest] (ARM timers) adler32r, 131072 bytes: delta 25873, ts1 11246, ts2 
37119
[crctest] (ARM timers) engel32, 32 bytes: delta 11, ts1 7241, ts2 7252
[crctest] (ARM timers) engel32, 4096 bytes: delta 911, ts1 7422, ts2 8333
[crctest] (ARM timers) engel32, 32768 bytes: delta 8553, ts1 9255, ts2 
17808
[crctest] (ARM timers) engel32, 65536 bytes: delta 17086, ts1 9629, ts2 
26715
[crctest] (ARM timers) engel32, 131072 bytes: delta 34152, ts1 10361, ts2 
44513
[crctest] (ARM timers) engel32r, 32 bytes: delta 12, ts1 4644, ts2 4656
[crctest] (ARM timers) engel32r, 4096 bytes: delta 953, ts1 4815, ts2 5768
[crctest] (ARM timers) engel32r, 32768 bytes: delta 7589, ts1 6592, ts2 
14181
[crctest] (ARM timers) engel32r, 65536 bytes: delta 15175, ts1 5792, ts2 
20967
[crctest] (ARM timers) engel32r, 131072 bytes: delta 30345, ts1 4079, ts2 
34424
[crctest] (ARM timers) 16-bit CRC CCITT, 32 bytes: delta 24, ts1 4515, ts2 
4539
[crctest] (ARM timers) 16-bit CRC CCITT, 4096 bytes: delta 2817, ts1 4903, 
ts2 7720
[crctest] (ARM timers) 16-bit CRC CCITT, 32768 bytes: delta 23752, ts1 
10166, ts2 33918
[crctest] (ARM timers) 16-bit CRC CCITT, 65536 bytes: delta 47409, ts1 
8775, ts2 56184
[crctest] (ARM timers) 16-bit CRC CCITT, 131072 bytes: delta 94831, ts1 
15781, ts2 110612
[crctest] (ARM timers) CRC32, 32 bytes: delta 14, ts1 740, ts2 754
[crctest] (ARM timers) CRC32, 4096 bytes: delta 1402, ts1 971, ts2 2373
[crctest] (ARM timers) CRC32, 32768 bytes: delta 12502, ts1 3689, ts2 
16191
[crctest] (ARM timers) CRC32, 65536 bytes: delta 24847, ts1 8787, ts2 
33634
[crctest] (ARM timers) CRC32, 131072 bytes: delta 49724, ts1 8714, ts2 
58438
[crctest] (ARM timers) CRC32c, 32 bytes: delta 26, ts1 8563, ts2 8589
[crctest] (ARM timers) CRC32c, 4096 bytes: delta 2857, ts1 8953, ts2 11810
[crctest] (ARM timers) CRC32c, 32768 bytes: delta 24094, ts1 4298, ts2 
28392
[crctest] (ARM timers) CRC32c, 65536 bytes: delta 48066, ts1 13308, ts2 
61374
[crctest] (ARM timers) CRC32c, 131072 bytes: delta 96113, ts1 11096, ts2 
107209

-----------------------------------------------------------------------
i686

#cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 6
model name      : Celeron (Mendocino)
stepping        : 5
cpu MHz         : 534.805
cache size      : 128 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 mtrr pge mca cmov pat 
pse36 mmx fxsr
bogomips        : 1052.67

[crctst] (get_cycles) Data pass both forward, 32 bytes: delta 731, ts1 
5554559977084, ts2 5554559977815
[crctst] (get_cycles) Data pass both backward, 32 bytes: delta 755, ts1 
5554571204036, ts2 5554571204791
[crctst] (get_cycles) Data pass backward/forward, 32 bytes: delta 762, ts1 
5554582493116, ts2 5554582493878
[crctst] (get_cycles) Data pass forward/backward, 32 bytes: delta 575, ts1 
5554593729476, ts2 5554593730051
[crctst] (get_cycles) Data pass both forward, 4096 bytes: delta 70053, ts1 
5554604977980, ts2 5554605048033
[crctst] (get_cycles) Data pass both backward, 4096 bytes: delta 70096, 
ts1 5554616415540, ts2 5554616485636
[crctst] (get_cycles) Data pass backward/forward, 4096 bytes: delta 70125, 
ts1 5554627702308, ts2 5554627772433
[crctst] (get_cycles) Data pass forward/backward, 4096 bytes: delta 70126, 
ts1 5554639032212, ts2 5554639102338
[crctst] (get_cycles) Data pass both forward, 32768 bytes: delta 561878, 
ts1 5554650362692, ts2 5554650924570
[crctst] (get_cycles) Data pass both backward, 32768 bytes: delta 555688, 
ts1 5554662199516, ts2 5554662755204
[crctst] (get_cycles) Data pass backward/forward, 32768 bytes: delta 
552954, ts1 5554673970492, ts2 5554674523446
[crctst] (get_cycles) Data pass forward/backward, 32768 bytes: delta 
446164, ts1 5554685910812, ts2 5554686356976
[crctst] (get_cycles) Data pass both forward, 65536 bytes: delta 1091671, 
ts1 5554697756260, ts2 5554698847931
[crctst] (get_cycles) Data pass both backward, 65536 bytes: delta 1114579, 
ts1 5554710185588, ts2 5554711300167
[crctst] (get_cycles) Data pass backward/forward, 65536 bytes: delta 
1111689, ts1 5554722654316, ts2 5554723766005
[crctst] (get_cycles) Data pass forward/backward, 65536 bytes: delta 
1118957, ts1 5554735058076, ts2 5554736177033
[crctst] (get_cycles) Data pass both forward, 131072 bytes: delta 2131183, 
ts1 5554747495004, ts2 5554749626187
[crctst] (get_cycles) Data pass both backward, 131072 bytes: delta 
2081712, ts1 5554760991724, ts2 5554763073436
[crctst] (get_cycles) Data pass backward/forward, 131072 bytes: delta 
2211375, ts1 5554774343292, ts2 5554776554667
[crctst] (get_cycles) Data pass forward/backward, 131072 bytes: delta 
2237850, ts1 5554787816988, ts2 5554790054838
[crctst] (get_cycles) adler32, 32 bytes: delta 208, ts1 5554790070690, ts2 
5554790070898
[crctst] (get_cycles) adler32, 4096 bytes: delta 6464, ts1 5554797404641, 
ts2 5554797411105
[crctst] (get_cycles) adler32, 32768 bytes: delta 56771, ts1 
5554797649035, ts2 5554797705806
[crctst] (get_cycles) adler32, 65536 bytes: delta 113785, ts1 
5554798361404, ts2 5554798475189
[crctst] (get_cycles) adler32, 131072 bytes: delta 233520, ts1 
5554799374350, ts2 5554799607870
[crctst] (get_cycles) adler32r, 32 bytes: delta 216, ts1 5554800107526, 
ts2 5554800107742
[crctst] (get_cycles) adler32r, 4096 bytes: delta 5113, ts1 5554800150365, 
ts2 5554800155478
[crctst] (get_cycles) adler32r, 32768 bytes: delta 53081, ts1 
5554800379348, ts2 5554800432429
[crctst] (get_cycles) adler32r, 65536 bytes: delta 106798, ts1 
5554801010586, ts2 5554801117384
[crctst] (get_cycles) adler32r, 131072 bytes: delta 218664, ts1 
5554802198998, ts2 5554802417662
[crctst] (get_cycles) engel32, 32 bytes: delta 144, ts1 5554802654360, ts2 
5554802654504
[crctst] (get_cycles) engel32, 4096 bytes: delta 7693, ts1 5554802696176, 
ts2 5554802703869
[crctst] (get_cycles) engel32, 32768 bytes: delta 65239, ts1 
5554803294336, ts2 5554803359575
[crctst] (get_cycles) engel32, 65536 bytes: delta 130514, ts1 
5554803958262, ts2 5554804088776
[crctst] (get_cycles) engel32, 131072 bytes: delta 266126, ts1 
5554805379160, ts2 5554805645286
[crctst] (get_cycles) engel32r, 32 bytes: delta 150, ts1 5554806129841, 
ts2 5554806129991
[crctst] (get_cycles) engel32r, 4096 bytes: delta 7246, ts1 5554806153189, 
ts2 5554806160435
[crctst] (get_cycles) engel32r, 32768 bytes: delta 56813, ts1 
5554806224973, ts2 5554806281786
[crctst] (get_cycles) engel32r, 65536 bytes: delta 113411, ts1 
5554806402638, ts2 5554806516049
[crctst] (get_cycles) engel32r, 131072 bytes: delta 226662, ts1 
5554807652080, ts2 5554807878742
[crctst] (get_cycles) 16-bit CRC CCITT, 32 bytes: delta 316, ts1 
5554807901011, ts2 5554807901327
[crctst] (get_cycles) 16-bit CRC CCITT, 4096 bytes: delta 29361, ts1 
5554807970344, ts2 5554807999705
[crctst] (get_cycles) 16-bit CRC CCITT, 32768 bytes: delta 235446, ts1 
5554808413329, ts2 5554808648775
[crctst] (get_cycles) 16-bit CRC CCITT, 65536 bytes: delta 473151, ts1 
5554809580918, ts2 5554810054069
[crctst] (get_cycles) 16-bit CRC CCITT, 131072 bytes: delta 940992, ts1 
5554812013686, ts2 5554812954678
[crctst] (get_cycles) CRC32, 32 bytes: delta 242, ts1 5554813156313, ts2 
5554813156555
[crctst] (get_cycles) CRC32, 4096 bytes: delta 22075, ts1 5554813404036, 
ts2 5554813426111
[crctst] (get_cycles) CRC32, 32768 bytes: delta 176204, ts1 5554813816461, 
ts2 5554813992665
[crctst] (get_cycles) CRC32, 65536 bytes: delta 352496, ts1 5554814878041, 
ts2 5554815230537
[crctst] (get_cycles) CRC32, 131072 bytes: delta 697640, ts1 
5554817199646, ts2 5554817897286
[crctst] (get_cycles) CRC32c, 32 bytes: delta 268, ts1 5554818293652, ts2 
5554818293920
[crctst] (get_cycles) CRC32c, 4096 bytes: delta 26211, ts1 5554818362264, 
ts2 5554818388475
[crctst] (get_cycles) CRC32c, 32768 bytes: delta 213319, ts1 
5554818787865, ts2 5554819001184
[crctst] (get_cycles) CRC32c, 65536 bytes: delta 426616, ts1 
5554820096814, ts2 5554820523430
[crctst] (get_cycles) CRC32c, 131072 bytes: delta 859504, ts1 
5554822563046, ts2 5554823422550
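
To compare rows of different sizes it can help to normalise each delta
to a per-byte cost. A small sketch, with the hypothetical helper names
cost_per_byte and slowdown (the unit is whatever the timestamp source
counts: cycles on i686, timer units on ARM):

```c
/* Elapsed timer units divided by buffer size; 0.0 for an empty buffer. */
static double cost_per_byte(unsigned long long delta, unsigned long bytes)
{
        if (bytes == 0)
                return 0.0;
        return (double)delta / (double)bytes;
}

/* How many times larger the `slow` delta is than the `fast` one. */
static double slowdown(unsigned long long slow, unsigned long long fast)
{
        if (fast == 0)
                return 0.0;
        return (double)slow / (double)fast;
}
```

With the 131072-byte i686 rows above, CRC32 (delta 697640) costs about
5.3 cycles per byte and adler32r (delta 218664) about 1.7, a factor of
roughly 3.2. On the ARM board the same two rows (49724 and 25873)
differ by a factor of roughly 1.9.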

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

[-- Attachment #2: Type: TEXT/PLAIN, Size: 4079 bytes --]

/*
 *      This program is free software; you can redistribute it and/or modify
 *      it under the terms of the GNU General Public License as published by
 *      the Free Software Foundation; either version 2 of the License, or
 *      (at your option) any later version.
 *
 *      This program is distributed in the hope that it will be useful,
 *      but WITHOUT ANY WARRANTY; without even the implied warranty of
 *      MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *      GNU General Public License for more details.
 *
 *      You should have received a copy of the GNU General Public License
 *      along with this program; if not, write to the Free Software
 *      Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
 */
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/pagemap.h>
#include <linux/spinlock.h>
#include <linux/crc32.h>
#include <linux/crc32c.h>
#include <linux/crc-ccitt.h>
#include <asm/timex.h>

/* The number of memory chunks to test */
#define MEM_CHUNKS 3

/* The number of test iterations */
#define ITERATIONS 1

#define TEST_PREFIX "[crctst] "

/* 
 * Most architectures have some kind of high resolution time-stamp
 * counter and define the get_cycles() macro to access it.
 */
#define TIMESTAMP get_cycles()

/* 
 * The sizes of memory chunks for which CRCs should be tested.
 */
static int
memsizes[MEM_CHUNKS] = {32, PAGE_SIZE, 64*1024};

/* 
 * We perform actual testing in the module initialization function.
 */
static int __init
init_crctest(void)
{
	register int i, j;
	char *mem[MEM_CHUNKS];
	unsigned long flags;
	spinlock_t lock = SPIN_LOCK_UNLOCKED;
	int ret = 0;
	cycles_t ts1, ts2;

	/* Clear the pointers: the exit path stops at the first NULL */
	memset(&mem[0], '\0', MEM_CHUNKS * sizeof(char *));

	/* Allocate memory */
	for (i = 0; i < MEM_CHUNKS; i++) {
		if ((mem[i] = kmalloc(memsizes[i], GFP_KERNEL)) == NULL) {
			printk(KERN_ERR TEST_PREFIX "can't allocate %d bytes\n",
					memsizes[i]);
			ret = -ENOMEM;
			goto exit;
		}
	}
	
	/*
	 * We do not want to be preempted during the test, nor do we
	 * want interrupts to affect our results. Both of these are
	 * prevented by spin_lock_irqsave().
	 */
	spin_lock_irqsave(&lock, flags);
	
	/* Test 16 bit CRC CCITT */
	for (i = 0; i < MEM_CHUNKS; i++) {
		u16 crc;
		
		/* Do one fake pass to exclude CPU cache influence */
		crc = crc_ccitt(0xFFFF, mem[i], memsizes[i]);

		ts1 = TIMESTAMP;
		for (j = 0; j < ITERATIONS; j++) {
			crc = crc_ccitt(0xFFFF, mem[i], memsizes[i]); 
		}
		ts2 = TIMESTAMP;

		printk(KERN_NOTICE TEST_PREFIX "16-bit CRC CCITT %d bytes: "
			"ts1 %llu, ts2 %llu, delta %llu\n", memsizes[i],
			(unsigned long long)ts1, (unsigned long long)ts2,
			(unsigned long long)(ts2 - ts1));
	}

	/* Test crc32 */
	for (i = 0; i < MEM_CHUNKS; i++) {
		u32 crc;
		
		crc = crc32(0xFFFF, mem[i], memsizes[i]); 
		ts1 = TIMESTAMP;
		for (j = 0; j < ITERATIONS; j++) {
			crc = crc32(0xFFFF, mem[i], memsizes[i]); 
		}
		ts2 = TIMESTAMP;

		printk(KERN_NOTICE TEST_PREFIX "crc32 %d bytes: "
			"ts1 %llu, ts2 %llu, delta %llu\n", memsizes[i],
			(unsigned long long)ts1, (unsigned long long)ts2,
			(unsigned long long)(ts2 - ts1));
	}
	
	/* Test crc32c */
	for (i = 0; i < MEM_CHUNKS; i++) {
		u32 crc;
		
		crc = crc32c(0xFFFF, mem[i], memsizes[i]); 
		ts1 = TIMESTAMP;
		for (j = 0; j < ITERATIONS; j++) {
			crc = crc32c(0xFFFF, mem[i], memsizes[i]); 
		}
		ts2 = TIMESTAMP;

		printk(KERN_NOTICE TEST_PREFIX "crc32c %d bytes: "
			"ts1 %llu, ts2 %llu, delta %llu\n", memsizes[i],
			(unsigned long long)ts1, (unsigned long long)ts2,
			(unsigned long long)(ts2 - ts1));
	}
	spin_unlock_irqrestore(&lock, flags);
	
exit:
	for (i = 0; i < MEM_CHUNKS && mem[i] != NULL; i++)
		kfree(mem[i]);

	return ret;
}

module_init(init_crctest);

static void __exit
cleanup_crctest(void)
{
	return;
}

module_exit(cleanup_crctest);

MODULE_LICENSE ("GPL");
MODULE_AUTHOR ("Artem B. Bityuckiy");
MODULE_DESCRIPTION ("The CRC test");


^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-07 14:23                                                                       ` Artem B. Bityuckiy
@ 2005-01-07 14:27                                                                         ` jasmine
  2005-01-07 14:33                                                                           ` Artem B. Bityuckiy
  2005-01-07 14:31                                                                         ` Artem B. Bityuckiy
  1 sibling, 1 reply; 196+ messages in thread
From: jasmine @ 2005-01-07 14:27 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: joakim.tjernlund, MTD List, David Woodhouse


Hi Artem,

On Fri, 7 Jan 2005, Artem B. Bityuckiy wrote:

> #cat /proc/cpuinfo
> Processor       : ARM926EJ-Sid(wb) rev 3 (v5l)

This is an OMAP1623 or similar.  Make sure your tight loops are 
32-byte-aligned.
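
One possible way to act on this with GCC, as a sketch only: the aligned
function attribute pins the entry point of a function to a 32-byte
boundary, which matches the 32-byte cache line length in the cpuinfo
above. It does not by itself align the loop inside the function; GCC's
-falign-loops=32 option targets that. The function names below are
hypothetical, and whether any of this changes the numbers would have to
be measured.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * GCC/Clang extension: request that the first instruction of this
 * function starts on a 32-byte boundary.
 */
__attribute__((aligned(32)))
static uint32_t byte_sum_aligned(const unsigned char *buf, size_t len)
{
        uint32_t sum = 0;
        size_t i;

        for (i = 0; i < len; i++)
                sum += buf[i];
        return sum;
}

/* Report whether an address falls on a 32-byte boundary. */
static int is_32_byte_aligned(uintptr_t addr)
{
        return (addr & 31u) == 0;
}
```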

-J.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-07 14:23                                                                       ` Artem B. Bityuckiy
  2005-01-07 14:27                                                                         ` jasmine
@ 2005-01-07 14:31                                                                         ` Artem B. Bityuckiy
  1 sibling, 0 replies; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-07 14:31 UTC (permalink / raw)
  To: MTD List; +Cc: joakim.tjernlund, David Woodhouse, jasmine

[-- Attachment #1: Type: TEXT/PLAIN, Size: 130 bytes --]

I'm sorry, I attached the old version. Here is the latest one, which I used.

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

[-- Attachment #2: test --]
[-- Type: TEXT/PLAIN, Size: 13033 bytes --]

/*
 * 	Copyright (C)
 * 		Artem B. Bityuckiy, dedekind@infradead.org
 * 		Joern Engel, joern@wohnheim.fh-wedel.de
 * 
 *      This program is free software; you can redistribute it and/or modify
 *      it under the terms of the GNU General Public License as published by
 *      the Free Software Foundation; either version 2 of the License, or
 *      (at your option) any later version.
 *
 *      This program is distributed in the hope that it will be useful,
 *      but WITHOUT ANY WARRANTY; without even the implied warranty of
 *      MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *      GNU General Public License for more details.
 *
 *      You should have received a copy of the GNU General Public License
 *      along with this program; if not, write to the Free Software
 *      Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
 *
 *      Version: 1.5
 */
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>
#include <linux/pagemap.h>
#include <linux/spinlock.h>
#include <linux/crc32.h>
#include <linux/crc32c.h>
#include <linux/crc-ccitt.h>
#include <asm/timex.h>

#if defined CONFIG_ARM || defined CONFIG_ARM_THUMB

/*
 * On ARM we do not have a cycle counter, so use the system timer
 * instead.
 */
#include <asm/mach/time.h>
extern struct sys_timer *system_timer;

#define GET_CYCLES_SUPPORTED	0

#define ARCH_TIMESTAMP()	system_timer->offset()
#define arch_timestamp_t	unsigned long
#define ARCH_TEST_PREFIX	"[crctest] (ARM timers) "
#define ARCH_CAST_ULL(ts)	(unsigned long long)(ts)
#define ARCH_DELTA(ts1, ts2)	ARCH_CAST_ULL(ts2 - ts1)
#define ARCH_ITERATIONS		10

#else

#define GET_CYCLES_SUPPORTED	1

#endif

/*
 * Most architectures have some kind of high-resolution time-stamp
 * counter and define the get_cycles() macro to access it (asm/timex.h).
 * But some do not. For those we need to do something special. In case
 * of ARM we use timers.
 */
#if GET_CYCLES_SUPPORTED
/* Time-stamp function */
#define TIMESTAMP()	get_cycles()
/* Time-stamp type */
#define timestamp_t	cycles_t
/* The prefix for test's output */
#define TEST_PREFIX	"[crctest] (get_cycles) "
/* The macro to cast the time-stamp type to unsigned long long */
#define CAST_ULL(ts)	(unsigned long long)(ts)
/* The difference between two time-stamps (unsigned long long) */
#define DELTA(ts1, ts2)	CAST_ULL(ts2 - ts1)
/* The number of iterations of CRC calculation */
#define ITERATIONS	1 /* get_cycles is too accurate to iterate */
#else
#define TIMESTAMP()	ARCH_TIMESTAMP()
#define timestamp_t	arch_timestamp_t
#define TEST_PREFIX	ARCH_TEST_PREFIX
#define CAST_ULL(ts)	ARCH_CAST_ULL(ts)
#define DELTA(ts1, ts2)	ARCH_DELTA(ts1, ts2)
#define ITERATIONS	ARCH_ITERATIONS
#endif

/*
 * Tests are performed with interrupts and preemption disabled.
 */
static unsigned long irq_flags;
#define lock()							\
	do {							\
		preempt_disable();				\
		local_irq_save(irq_flags);			\
	} while (0)
/* Undo lock() in reverse order: restore interrupts, then preemption */
#define unlock()						\
	do {							\
		local_irq_restore(irq_flags);			\
		preempt_enable();				\
	} while (0)


/*
 * In order not to lose too many interrupts, we relax the system from
 * time to time during testing. This may be important if we perform
 * tests on a machine that must do other work.
 */
#define RELAX()								\
	do {								\
		unlock();						\
		cond_resched();						\
		lock();							\
	} while(0)
		
	
/* The test results output macro */
#define PRINT_RESULTS(name, bytes, ts1, ts2)				\
	do {								\
		printk(KERN_NOTICE TEST_PREFIX name ", %d bytes: "	\
			"delta %llu, ts1 %llu, ts2 %llu\n", bytes,	\
			DELTA(ts1, ts2), CAST_ULL(ts1), CAST_ULL(ts2));	\
	} while(0)

/*
 * The size of the vmalloc'ed array used to evict any test-related data
 * from the CPU data cache.
 */
#define TMPMEM_SIZE (1*1024*1024)

/* The number of memory chunks to test */
#define MEM_CHUNKS 5

static unsigned long
adler32(unsigned long adler, const unsigned char *buf, size_t len);

static uint32_t
adler32r(uint32_t adler, const char *buf, size_t len);

static uint32_t
engel32(uint32_t engel, const void *_s, size_t len);

static uint32_t
engel32r(uint32_t engel, const void *_s, size_t len);

static void
trash_cache(void);

/* 
 * The sizes of memory chunks for which CRCs should be tested.
 */
static int memsizes[MEM_CHUNKS] = {32, PAGE_SIZE, 32*1024, 64*1024, 128*1024};

/* 
 * The buffer which we use to trash the L1 cache
 */
static char *tmp_mem;


/* 
 * We perform actual testing in the module initialization function.
 */
static int __init
init_crctest(void)
{
	register int i, j;
	char *mem[MEM_CHUNKS];
	timestamp_t ts1, ts2;
	int ret = 0;

	/* Clear the pointers first so the exit path can tell what to free */
	memset(mem, 0, sizeof(mem));

	if ((tmp_mem = vmalloc(TMPMEM_SIZE)) == NULL) {
		printk(KERN_ERR TEST_PREFIX "can't allocate %d bytes\n",
					TMPMEM_SIZE);
		ret = -ENOMEM;
		goto exit;
	}

	/* Allocate memory */
	for (i = 0; i < MEM_CHUNKS; i++) {
		if ((mem[i] = kmalloc(memsizes[i], GFP_KERNEL)) == NULL) {
			printk(KERN_ERR TEST_PREFIX "can't allocate %d bytes\n",
					memsizes[i]);
			ret = -ENOMEM;
			goto exit;
		}
	}
	
	/*
	 * We do not want to be preempted during the test, nor do we
	 * want interrupts to affect our results.
	 */
	lock();

	/*
	 * Now measure the difference between passing over the arrays
	 * twice in the same direction (forward/forward, backward/backward)
	 * and once in each direction (backward/forward, forward/backward).
	 */
	for (i = 0; i < MEM_CHUNKS; i++) {


		/* Trash the CPU data cache */
		trash_cache();
		ts1 = TIMESTAMP();
		for (j = 0; j < memsizes[i]; j++)
			mem[i][j] = mem[i][j] + 1;
		for (j = 0; j < memsizes[i]; j++)
			mem[i][j] = mem[i][j] + 1;
		ts2 = TIMESTAMP();
		PRINT_RESULTS("Data pass both forward", memsizes[i], ts1, ts2);


		trash_cache();
		ts1 = TIMESTAMP();
		for (j = memsizes[i] - 1; j >= 0; j--)
			mem[i][j] = mem[i][j] + 1;
		for (j = memsizes[i] - 1; j >= 0; j--)
			mem[i][j] = mem[i][j] + 1;
		ts2 = TIMESTAMP();
		PRINT_RESULTS("Data pass both backward", memsizes[i], ts1, ts2);

		trash_cache();
		ts1 = TIMESTAMP();
		for (j = memsizes[i] - 1; j >= 0; j--)
			mem[i][j] = mem[i][j] + 1;
		for (j = 0; j < memsizes[i]; j++)
			mem[i][j] = mem[i][j] + 1;
		ts2 = TIMESTAMP();
		PRINT_RESULTS("Data pass backward/forward", memsizes[i], ts1, ts2);

		trash_cache();
		ts1 = TIMESTAMP();
		for (j = 0; j < memsizes[i]; j++)
			mem[i][j] = mem[i][j] + 1;
		for (j = memsizes[i] - 1; j >= 0; j--)
			mem[i][j] = mem[i][j] + 1;
		ts2 = TIMESTAMP();
		PRINT_RESULTS("Data pass forward/backward", memsizes[i], ts1, ts2);
	}
	
	/* 
	 * Test adler32 CRC.
	 */
	for (i = 0; i < MEM_CHUNKS; i++) {
		unsigned long crc;
		
		/* Do one fake pass to exclude CPU cache influence */
		crc = adler32(0xFFFFFFFF, mem[i], memsizes[i]); 
		
		ts1 = TIMESTAMP();
		for (j = 0; j < ITERATIONS; j++)
			crc = adler32(0xFFFFFFFF, mem[i], memsizes[i]); 
		ts2 = TIMESTAMP();
		PRINT_RESULTS("adler32", memsizes[i], ts1, ts2);

		RELAX();
	}
	
	/* 
	 * Test adler32r CRC.
	 */
	for (i = 0; i < MEM_CHUNKS; i++) {
		unsigned long crc;
		
		crc = adler32r(0xFFFFFFFF, mem[i], memsizes[i]); 
		
		ts1 = TIMESTAMP();
		for (j = 0; j < ITERATIONS; j++)
			crc = adler32r(0xFFFFFFFF, mem[i], memsizes[i]); 
		ts2 = TIMESTAMP();
		PRINT_RESULTS("adler32r", memsizes[i], ts1, ts2);

		RELAX();
	}
	
	/* 
	 * Test engel32 CRC.
	 */
	for (i = 0; i < MEM_CHUNKS; i++) {
		uint32_t crc;
		
		crc = engel32(0xFFFFFFFF, mem[i], memsizes[i]); 
		
		ts1 = TIMESTAMP();
		for (j = 0; j < ITERATIONS; j++)
			crc = engel32(0xFFFFFFFF, mem[i], memsizes[i]); 
		ts2 = TIMESTAMP();
		PRINT_RESULTS("engel32", memsizes[i], ts1, ts2);

		RELAX();
	}
	
	/* 
	 * Test engel32r CRC.
	 */
	for (i = 0; i < MEM_CHUNKS; i++) {
		uint32_t crc;
		
		crc = engel32r(0xFFFFFFFF, mem[i], memsizes[i]); 
		
		ts1 = TIMESTAMP();
		for (j = 0; j < ITERATIONS; j++)
			crc = engel32r(0xFFFFFFFF, mem[i], memsizes[i]); 
		ts2 = TIMESTAMP();
		PRINT_RESULTS("engel32r", memsizes[i], ts1, ts2);

		RELAX();
	}

	/* 
	 * Test 16 bit CRC CCITT.
	 */
	for (i = 0; i < MEM_CHUNKS; i++) {
		u16 crc;
		
		crc = crc_ccitt(0xFFFF, mem[i], memsizes[i]);

		ts1 = TIMESTAMP();
		for (j = 0; j < ITERATIONS; j++)
			crc = crc_ccitt(0xFFFF, mem[i], memsizes[i]); 
		ts2 = TIMESTAMP();

		PRINT_RESULTS("16-bit CRC CCITT", memsizes[i], ts1, ts2);

		RELAX();
	}
	
	/* 
	 * Test crc32 CRC.
	 */
	for (i = 0; i < MEM_CHUNKS; i++) {
		u32 crc;
		
		crc = crc32(0xFFFFFFFF, mem[i], memsizes[i]); 
		
		ts1 = TIMESTAMP();
		for (j = 0; j < ITERATIONS; j++)
			crc = crc32(0xFFFFFFFF, mem[i], memsizes[i]); 
		ts2 = TIMESTAMP();
		PRINT_RESULTS("CRC32", memsizes[i], ts1, ts2);

		RELAX();
	}
	
	/* Test crc32c */
	for (i = 0; i < MEM_CHUNKS; i++) {
		u32 crc;
		
		crc = crc32c(0xFFFFFFFF, mem[i], memsizes[i]); 
		
		ts1 = TIMESTAMP();
		for (j = 0; j < ITERATIONS; j++)
			crc = crc32c(0xFFFFFFFF, mem[i], memsizes[i]); 
		ts2 = TIMESTAMP();
		PRINT_RESULTS("CRC32c", memsizes[i], ts1, ts2);

		RELAX();
	}

	unlock();
	
exit:
	if (tmp_mem != NULL)
		vfree(tmp_mem);

	for (i = 0; i < MEM_CHUNKS && mem[i] != NULL; i++)
		kfree(mem[i]);

	return ret;
}

module_init(init_crctest);

static void __exit
cleanup_crctest(void)
{
	return;
}

module_exit(cleanup_crctest);

/*
 * To evict our data from the CPU cache, we walk through a big data
 * array.
 */
static void
trash_cache(void)
{
	register int i;
	for (i = 0; i < TMPMEM_SIZE; i++)
		tmp_mem[i] = tmp_mem[i] + 1;
}

/* ----------------------------------------------------------------------- */

/*
 * Was borrowed from include/linux/zutil.h
 * Copyright (C) 1995-1998 Jean-loup Gailly.
 */
#define NMAX 5552
#define BASE 65521L
#define DO1(buf,i)  {s1 += buf[i]; s2 += s1;}
#define DO2(buf,i)  DO1(buf,i); DO1(buf,i+1);
#define DO4(buf,i)  DO2(buf,i); DO2(buf,i+2);
#define DO8(buf,i)  DO4(buf,i); DO4(buf,i+4);
#define DO16(buf)   DO8(buf,0); DO8(buf,8);

static unsigned long
adler32(unsigned long adler, const unsigned char *buf, size_t len)
{
    unsigned long s1 = adler & 0xffff;
    unsigned long s2 = (adler >> 16) & 0xffff;
    int k;

    if (buf == NULL) return 1L;

    while (len > 0) {
        k = len < NMAX ? len : NMAX;
        len -= k;
        while (k >= 16) {
            DO16(buf);
            buf += 16;
            k -= 16;
        }
        if (k != 0) do {
            s1 += *buf++;
            s2 += s1;
        } while (--k);
        s1 %= BASE;
        s2 %= BASE;
    }
    return (s2 << 16) | s1;
}

/*
 * Reverse version of adler32 (provided by Jorn Engel).
 */
static uint32_t
adler32r(uint32_t adler, const char *buf, size_t len)
{
	unsigned long s1 = adler & 0xffff;
	unsigned long s2 = (adler >> 16) & 0xffff;
	int k;

	if (!buf)
		return 1L;

	buf += len;
	while (len > 0) {
		k = len < NMAX ? len : NMAX;
		len -= k;
		while (k >= 16) {
			buf -= 16;
			DO16(buf);
			k -= 16;
		}
		if (k != 0)
			do {
				s1 += *--buf;
				s2 += s1;
			} while (--k);
		s1 %= BASE;
		s2 %= BASE;
	}
	return (s2 << 16) | s1;
}

/*
 * Jorn Engel's algorithms.
 */
static uint32_t
engel32(uint32_t engel, const void *_s, size_t len)
{
	const char *s = _s;
	uint32_t sum=engel, prod=engel;
	for (; len>=4; len-=4, s+=4) {
		sum += s[0];
		prod += sum;
		sum += s[1];
		prod += sum;
		sum += s[2];
		prod += sum;
		sum += s[3];
		prod += sum;
	}
	for (; len; len--, s++) {
		sum += *s;
		prod += sum;
	}
	sum = (sum&0x0000ffff)<<16^ (sum&0xffff0000)>>16;
	sum = (sum&0x00ff00ff)<<8 ^ (sum&0xff00ff00)>>8;
	sum = (sum&0x0f0f0f0f)<<4 ^ (sum&0xf0f0f0f0)>>4;
	sum = (sum&0x33333333)<<2 ^ (sum&0xcccccccc)>>2;
	sum = (sum&0x55555555)<<1 ^ (sum&0xaaaaaaaa)>>1;
	prod ^= sum;
	return prod;
}

/*
 * Reverse version of engel32: consumes the buffer from the last byte
 * to the first.
 */
static uint32_t
engel32r(uint32_t engel, const void *_s, size_t len)
{
	const char *s = _s;
	uint32_t sum=engel, prod=engel;
	/* s stays put; only len shrinks, so s[len-1] walks backwards */
	for (; len>=4; len-=4) {
		sum += s[len-1];
		prod += sum;
		sum += s[len-2];
		prod += sum;
		sum += s[len-3];
		prod += sum;
		sum += s[len-4];
		prod += sum;
	}
	/* Up to three bytes remain at the start of the buffer */
	for (; len; len--) {
		sum += s[len-1];
		prod += sum;
	}
	sum = (sum&0x0000ffff)<<16^ (sum&0xffff0000)>>16;
	sum = (sum&0x00ff00ff)<<8 ^ (sum&0xff00ff00)>>8;
	sum = (sum&0x0f0f0f0f)<<4 ^ (sum&0xf0f0f0f0)>>4;
	sum = (sum&0x33333333)<<2 ^ (sum&0xcccccccc)>>2;
	sum = (sum&0x55555555)<<1 ^ (sum&0xaaaaaaaa)>>1;
	prod ^= sum;
	return prod;
}

/*MODULE_VERSION("1.5");
MODULE_LICENSE ("GPL");
MODULE_AUTHOR ("Artem B. Bityuckiy");
MODULE_DESCRIPTION ("The CRC test");*/


^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-07 14:27                                                                         ` jasmine
@ 2005-01-07 14:33                                                                           ` Artem B. Bityuckiy
  2005-01-07 14:37                                                                             ` jasmine
  0 siblings, 1 reply; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-07 14:33 UTC (permalink / raw)
  To: jasmine; +Cc: joakim.tjernlund, MTD List, David Woodhouse

On Fri, 7 Jan 2005 jasmine@linuxgrrls.org wrote:

> 
> Hi Artem,
> 
> On Fri, 7 Jan 2005, Artem B. Bityuckiy wrote:
> 
> > #cat /proc/cpuinfo
> > Processor       : ARM926EJ-Sid(wb) rev 3 (v5l)
> 
> This is an OMAP1623 or similar.  Make sure your tight loops are 
> 32-byte-aligned.
> 
> -J.
>
Hmm, why is this better? (I suppose you mean the cache line size.)
But making them word-aligned is good. Thanks.

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-07 14:33                                                                           ` Artem B. Bityuckiy
@ 2005-01-07 14:37                                                                             ` jasmine
  2005-01-07 14:43                                                                               ` Artem B. Bityuckiy
  2005-01-07 14:50                                                                               ` Artem B. Bityuckiy
  0 siblings, 2 replies; 196+ messages in thread
From: jasmine @ 2005-01-07 14:37 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: joakim.tjernlund, MTD List, David Woodhouse



On Fri, 7 Jan 2005, Artem B. Bityuckiy wrote:

> Hmm, why is this better? (I suppose you mean cache line size)

Because ARM926 has a compulsory cache penalty cycle on every line
traversal.  (This is why ARM926 is about 8% slower than ARM925 at
the same clock frequency.)  There is also a penalty on every eighth
page mapped in, because there are only eight microTLB entries and
there are wait states on the main TLB.

> But making them word-aligned is good.

Not word-aligned, cacheline-aligned.

-J.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-07 14:37                                                                             ` jasmine
@ 2005-01-07 14:43                                                                               ` Artem B. Bityuckiy
  2005-01-07 14:55                                                                                 ` jasmine
  2005-01-07 14:50                                                                               ` Artem B. Bityuckiy
  1 sibling, 1 reply; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-07 14:43 UTC (permalink / raw)
  To: jasmine; +Cc: joakim.tjernlund, MTD List, David Woodhouse

On Fri, 7 Jan 2005 jasmine@linuxgrrls.org wrote:

> 
> 
> On Fri, 7 Jan 2005, Artem B. Bityuckiy wrote:
> 
> > Hmm, why is this better? (I suppose you mean cache line size)
> 
> Because ARM926 has a compulsory cache penalty cycle on every line
> traversal.  (This is why ARM926 is about 8% slower than ARM925 at
> the same clock frequency.)  There is also a penalty on every eighth
> page mapped in, because there are only eight microTLB entries and
> there are wait states on the main TLB.
> 
> > But making them word-aligned is good.
> 
> Not word-aligned, cacheline-aligned.
> 
> -J.
> 
But I included those loops because of our discussions about
backward/forward CRC calculation and CPU cache benefits. CRCs are
calculated by reading bytes, so I don't think we should take the
cache line size into account in our particular case.

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-07 14:37                                                                             ` jasmine
  2005-01-07 14:43                                                                               ` Artem B. Bityuckiy
@ 2005-01-07 14:50                                                                               ` Artem B. Bityuckiy
  1 sibling, 0 replies; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-07 14:50 UTC (permalink / raw)
  To: jasmine; +Cc: joakim.tjernlund, MTD List, David Woodhouse

On Fri, 7 Jan 2005 jasmine@linuxgrrls.org wrote:

> 
> 
> On Fri, 7 Jan 2005, Artem B. Bityuckiy wrote:
> 
> > Hmm, why is this better? (I suppose you mean cache line size)
> 
> Because ARM926 has a compulsory cache penalty cycle on every line
> traversal.  (This is why ARM926 is about 8% slower than ARM925 at
> the same clock frequency.)  There is also a penalty on every eighth
> page mapped in, because there are only eight microTLB entries and
> there are wait states on the main TLB.
> 
> > But making them word-aligned is good.
> 
> Not word-aligned, cacheline-aligned.
> 
> -J.
> 
And it seems difficult to take into account a lot of platform-dependent 
things. Isn't it better to have one universal test?

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-07 14:43                                                                               ` Artem B. Bityuckiy
@ 2005-01-07 14:55                                                                                 ` jasmine
  2005-01-07 15:20                                                                                   ` Artem B. Bityuckiy
  0 siblings, 1 reply; 196+ messages in thread
From: jasmine @ 2005-01-07 14:55 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: joakim.tjernlund, David Woodhouse, jasmine, MTD List

> On Fri, 7 Jan 2005 jasmine@linuxgrrls.org wrote:
>
> But I included those loops because of our discussions about
> backward/forward CRC calculation and CPU cache benefits. CRCs are
> calculated by reading bytes, so I don't think we should take the
> cache line size into account in our particular case.

You're wrong, because:

i)  The instruction cache suffers from this penalty and is, in fact, the
    major issue here.  Most of the wasted cycles will be waiting for an
    instruction to arrive from the i-cache.

ii) All data accesses use the cache, even byte accesses.  (Byte accesses
    actually reach the processor's data port as word accesses in any
    case.)  Data is fetched from the interconnect into the cache in
    eight-word-long bursts in OMAP1623, regardless of how much the core
    has asked for.  Critical-word-first means that the core only has
    to wait for the first word to arrive, but there is an additional
    cycle of penalty as the line is latched. If your data area is
    aligned to an eight-word-boundary, and your algorithm works eight
    words to a stride, the branch will cover the penalty of traversing
    the cache line boundary.  This will be slightly faster.
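
A minimal userspace sketch of the data-side advice in (ii), assuming a 32-byte (eight-word) cache line and the GCC/Clang `aligned` attribute. The names `LINE_BYTES`, `demo_buf`, `line_aligned`, `sum_by_line` and `demo_sum` are hypothetical and do not come from the test module. Whether the outer-loop branch really hides the line-crossing penalty depends on the core and on what the compiler does with the loop, so treat this as an illustration of the layout, not a measured result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES 32	/* assumed: eight 32-bit words per cache line */

/* Buffer placed on a cache line boundary (GCC/Clang attribute). */
static unsigned char demo_buf[4 * LINE_BYTES]
	__attribute__((aligned(LINE_BYTES)));

/* Return nonzero if p sits on a cache line boundary. */
static int line_aligned(const void *p)
{
	return ((uintptr_t)p % LINE_BYTES) == 0;
}

/*
 * Sum bytes one cache line per outer iteration, so the outer loop
 * branch coincides with the move to the next line.
 * len must be a multiple of LINE_BYTES.
 */
static uint32_t sum_by_line(const unsigned char *buf, size_t len)
{
	uint32_t sum = 0;
	size_t off, k;

	for (off = 0; off < len; off += LINE_BYTES)
		for (k = 0; k < LINE_BYTES; k++)
			sum += buf[off + k];
	return sum;
}

/* Fill the demo buffer with 1s and sum it line by line. */
uint32_t demo_sum(void)
{
	size_t i;

	assert(line_aligned(demo_buf));
	for (i = 0; i < sizeof(demo_buf); i++)
		demo_buf[i] = 1;
	return sum_by_line(demo_buf, sizeof(demo_buf));
}
```

Calling `demo_sum()` returns the buffer size, since every byte is 1. In the kernel module the equivalent of `demo_buf` is the kmalloc'ed `mem[i]` buffer.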

Does this explain?

-J.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-07 14:55                                                                                 ` jasmine
@ 2005-01-07 15:20                                                                                   ` Artem B. Bityuckiy
  2005-01-07 15:24                                                                                     ` jasmine
  0 siblings, 1 reply; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-07 15:20 UTC (permalink / raw)
  To: jasmine; +Cc: MTD List

On Fri, 7 Jan 2005 jasmine@linuxgrrls.org wrote:

> > On Fri, 7 Jan 2005 jasmine@linuxgrrls.org wrote:
> >
> > But I included those loops because of our discussions about
> > backward/forward CRC calculation and CPU cache benefits. CRCs are
> > calculated by reading bytes, so I don't think we should take the
> > cache line size into account in our particular case.
> 
> You're wrong, because:
> 
> i)  The instruction cache suffers from this penalty and is, in fact, the
>     major issue here.  Most of the wasted cycles will be waiting for an
>     instruction to arrive from the i-cache.
> 
> ii) All data accesses use the cache, even byte accesses.  (Byte accesses
>     actually reach the processor's data port as word accesses in any
>     case.)  Data is fetched from the interconnect into the cache in
>     eight-word-long bursts in OMAP1623, regardless of how much the core
>     has asked for.  Critical-word-first means that the core only has
>     to wait for the first word to arrive, but there is an additional
>     cycle of penalty as the line is latched. If your data area is
>     aligned to an eight-word-boundary, and your algorithm works eight
>     words to a stride, the branch will cover the penalty of traversing
>     the cache line boundary.  This will be slightly faster.
> 
> Does this explain?

I'm sorry, not exactly. OK, could you please write the code you think is 
better? Currently it is:

/* Trash the CPU data cache */
trash_cache();
ts1 = TIMESTAMP();
for (j = 0; j < memsizes[i]; j++)
	mem[i][j] = mem[i][j] + 1;
for (j = 0; j < memsizes[i]; j++)
	mem[i][j] = mem[i][j] + 1;
ts2 = TIMESTAMP();

Here each mem[i] buffer is 32-byte aligned (kmalloc does this - see 
mm/slab.c, ARCH_KMALLOC_FLAGS definition).

So I suppose you are suggesting something like:

/* Trash the CPU data cache */
trash_cache();
ts1 = TIMESTAMP();
for (j = 0; j < memsizes[i]; j += L1_CACHE_BYTES)
	mem[i][j] = mem[i][j] + 1;
for (j = 0; j < memsizes[i]; j += L1_CACHE_BYTES)
	mem[i][j] = mem[i][j] + 1;
ts2 = TIMESTAMP();

? 

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-07 15:20                                                                                   ` Artem B. Bityuckiy
@ 2005-01-07 15:24                                                                                     ` jasmine
  2005-01-07 15:28                                                                                       ` Artem B. Bityuckiy
  2005-01-07 17:57                                                                                       ` Artem B. Bityuckiy
  0 siblings, 2 replies; 196+ messages in thread
From: jasmine @ 2005-01-07 15:24 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: MTD List



On Fri, 7 Jan 2005, Artem B. Bityuckiy wrote:

>> i)  The instruction cache suffers from this penalty and is, in fact, the
>>     major issue here.  Most of the wasted cycles will be waiting for an
>>     instruction to arrive from the i-cache.

> I'm sorry, not exactly. Ok, could you please write the code you think is
> better? Currently it is:

I don't know.  I guess you'd need to use inline assembler to shove the 
code around.  I don't know how to do that in Linux any more, but in the 
OS I'm used to it would look something like this:

/* Trash the CPU data cache */
trash_cache();
ts1 = TIMESTAMP();
asm(ALIGN_32);
for (j = 0; j < memsizes[i]; j++)
	mem[i][j] = mem[i][j] + 1;
asm(ALIGN_32);
for (j = 0; j < memsizes[i]; j++)
	mem[i][j] = mem[i][j] + 1;
ts2 = TIMESTAMP();
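
One way to approximate the `asm(ALIGN_32)` idea with GCC or Clang, offered as a sketch under stated assumptions and not as what the test module does: move the loop into its own function and request 32-byte alignment for that function's entry point. The names `touch_forward` and `entry_is_aligned` are hypothetical. This aligns the start of the function, not the loop head itself; GCC's `-falign-loops=32` option is another avenue for the loop head. On Thumb builds a function address carries a low mode bit, so the address check below only makes sense for ARM-mode or non-ARM code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Keep the loop in its own function and ask the compiler to start that
 * function on a 32-byte boundary. noinline stops the compiler from
 * merging the loop back into the caller, where the alignment is lost.
 */
__attribute__((noinline, aligned(32)))
void touch_forward(unsigned char *buf, size_t len)
{
	size_t j;

	for (j = 0; j < len; j++)
		buf[j] = buf[j] + 1;
}

/* Return nonzero if the function's entry point is 32-byte aligned. */
int entry_is_aligned(void)
{
	return ((uintptr_t)&touch_forward % 32) == 0;
}
```

The timed region would then call `touch_forward(mem[i], memsizes[i])` twice between the two `TIMESTAMP()` reads, at the cost of one function call per pass.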

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-07 15:24                                                                                     ` jasmine
@ 2005-01-07 15:28                                                                                       ` Artem B. Bityuckiy
  2005-01-07 15:31                                                                                         ` jasmine
  2005-01-07 17:57                                                                                       ` Artem B. Bityuckiy
  1 sibling, 1 reply; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-07 15:28 UTC (permalink / raw)
  To: jasmine; +Cc: MTD List

On Fri, 7 Jan 2005 jasmine@linuxgrrls.org wrote:

> 
> 
> On Fri, 7 Jan 2005, Artem B. Bityuckiy wrote:
> 
> >> i)  The instruction cache suffers from this penalty and is, in fact, the
> >>     major issue here.  Most of the wasted cycles will be waiting for an
> >>     instruction to arrive from the i-cache.
> 
> > I'm sorry, not exactly. Ok, could you please write the code you think is
> > better? Currently it is:
> 
> I don't know.  I guess you'd need to use inline assembler to shove the 
> code around.  I don't know how to do that in Linux any more, but in the 
> OS I'm used to it would look something like this:
> 
> /* Trash the CPU data cache */
> trash_cache();
> ts1 = TIMESTAMP();
> asm(ALIGN_32);
> for (j = 0; j < memsizes[i]; j++)
>  	mem[i][j] = mem[i][j] + 1;
> asm(ALIGN_32);
> for (j = 0; j < memsizes[i]; j++)
>         mem[i][j] = mem[i][j] + 1;
> ts2 = TIMESTAMP();
> 
Ah, you meant the *code* should be cache-line aligned ?!?

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-07 15:28                                                                                       ` Artem B. Bityuckiy
@ 2005-01-07 15:31                                                                                         ` jasmine
  2005-01-07 15:32                                                                                           ` Artem B. Bityuckiy
  0 siblings, 1 reply; 196+ messages in thread
From: jasmine @ 2005-01-07 15:31 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: MTD List



On Fri, 7 Jan 2005, Artem B. Bityuckiy wrote:

> Ah, you meant the *code* should be cache-line aligned ?!?

Yes!  The data array starts on a 32-byte boundary already, does it not?

-J.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-07 15:31                                                                                         ` jasmine
@ 2005-01-07 15:32                                                                                           ` Artem B. Bityuckiy
  0 siblings, 0 replies; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-07 15:32 UTC (permalink / raw)
  To: jasmine; +Cc: MTD List

On Fri, 7 Jan 2005 jasmine@linuxgrrls.org wrote:

> 
> 
> On Fri, 7 Jan 2005, Artem B. Bityuckiy wrote:
> 
> > Ah, you meant the *code* should be cache-line aligned ?!?
> 
> Yes!  The data array starts on a 32-byte boundary already, does it not?
I believe it does.

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-07 15:24                                                                                     ` jasmine
  2005-01-07 15:28                                                                                       ` Artem B. Bityuckiy
@ 2005-01-07 17:57                                                                                       ` Artem B. Bityuckiy
  1 sibling, 0 replies; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-07 17:57 UTC (permalink / raw)
  To: jasmine; +Cc: MTD List

On Fri, 7 Jan 2005 jasmine@linuxgrrls.org wrote:

> 
> 
> On Fri, 7 Jan 2005, Artem B. Bityuckiy wrote:
> 
> >> i)  The instruction cache suffers from this penalty and is, in fact, the
> >>     major issue here.  Most of the wasted cycles will be waiting for an
> >>     instruction to arrive from the i-cache.
> 
> > I'm sorry, not exactly. Ok, could you please write the code you think is
> > better? Currently it is:
> 
> I don't know.  I guess you'd need to use inline assembler to shove the 
> code around.  I don't know how to do that in Linux any more, but in the 
> OS I'm used to it would look something like this:
> 
> /* Trash the CPU data cache */
> trash_cache();
> ts1 = TIMESTAMP();
> asm(ALIGN_32);
> for (j = 0; j < memsizes[i]; j++)
>  	mem[i][j] = mem[i][j] + 1;
> asm(ALIGN_32);
> for (j = 0; j < memsizes[i]; j++)
>         mem[i][j] = mem[i][j] + 1;
> ts2 = TIMESTAMP();
> 
I don't know a simple way either. The only idea I have is something like putting 
these loops into 32-byte-aligned sections... So I'd prefer to leave it as it is. Does 
this really affect the results much?

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-06 10:08                                                             ` Artem B. Bityuckiy
@ 2005-01-08 20:14                                                               ` Jörn Engel
  2005-01-09 11:39                                                                 ` Artem B. Bityuckiy
  0 siblings, 1 reply; 196+ messages in thread
From: Jörn Engel @ 2005-01-08 20:14 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: Linux MTD mailing list

On Thu, 6 January 2005 10:08:16 +0000, Artem B. Bityuckiy wrote:
> On Thu, 23 Dec 2004, Jörn Engel wrote:
>
> > NOR is pretty reliable anyway, so we could just go
> > without a checksum.
> 
> I cannot agree with you. Checksums are required even for very reliable NOR 
> flashes, to be able to detect broken nodes, which can appear after unclean 
> reboots.
> My understanding is that this is the most important reason why CRCs are 
> needed. Media corruption is of lesser priority. From this perspective 
> we may easily use any checksum weaker than CRC32, but this checksum 
> must be good at detecting partially written nodes.

You win.  So I'll go and recheck adler32 wrt. detecting partially
written nodes.  There's always the a-priori chance of missing a
change, so we should try to cut that down as far as possible.  Since
adler32 has a bit less than 32 bits of non-redundant information, we
might want to cheat a little:

static uint32_t adler32_tailcheck(const void *buf, size_t len)
{
	uint32_t end = *(uint32_t*) (buf + len - 4); /* last word */
	return adler32(end, buf, len);
}

By using the last word of data as initial value, we put extra emphasis
on it.  For long data (full data nodes), this shouldn't make a
difference.  On the 12-Byte header, it might make a big one.
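
A self-contained userspace sketch of the tail-seeded checksum described above. It makes some assumptions that are not in the original: the last word is loaded with `memcpy` to avoid an unaligned access, buffers shorter than four bytes fall back to the standard seed of 1, and Adler-32 is a plain byte-at-a-time loop. The names `adler32_simple` and `last_word` are hypothetical helpers; only `adler32_tailcheck` comes from the message. The seed is split into two 16-bit halves, as the `adler32()` in the test module does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ADLER_BASE 65521U	/* largest prime below 2^16 */

/* Plain byte-at-a-time Adler-32 with a caller-supplied seed. */
uint32_t adler32_simple(uint32_t adler, const unsigned char *buf, size_t len)
{
	uint32_t s1 = adler & 0xffff;
	uint32_t s2 = (adler >> 16) & 0xffff;
	size_t i;

	for (i = 0; i < len; i++) {
		s1 = (s1 + buf[i]) % ADLER_BASE;
		s2 = (s2 + s1) % ADLER_BASE;
	}
	return (s2 << 16) | s1;
}

/* Load the final four bytes without assuming alignment. Needs len >= 4. */
uint32_t last_word(const unsigned char *buf, size_t len)
{
	uint32_t end;

	memcpy(&end, buf + len - 4, sizeof(end));
	return end;
}

/* Seed the checksum with the last word, so a missing tail changes the seed. */
uint32_t adler32_tailcheck(const unsigned char *buf, size_t len)
{
	if (len < 4)		/* assumption: fall back to the standard seed */
		return adler32_simple(1, buf, len);
	return adler32_simple(last_word(buf, len), buf, len);
}
```

For a 12-byte header whose last four bytes were never programmed (still 0xFF on erased flash), both the seed and the data differ from the fully written header, which is the extra emphasis on the tail that the message is after.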

> > Correct.  Simple parity might be a nice reference as well.  It is
> > really bad at catching even-bit errors (2,4,6,...), but it's fast.
> 
> And please bear in mind that an ECC error means an error 
> somewhere in the page. But this page may contain several JFFS3 nodes, and we 
> may recover some of them. So having a per-node CRC is a good idea even if 
> there is ECC (ECC is per-page).
> For example, the board may have been rebooted uncleanly while writing the ECC 
> (the data was already written; the ECC is written after the data). In this case we 
> may have correct data but just a wrong ECC. Having CRCs, we might recover 
> all JFFS3 nodes.

Correct, although I don't care too much.  If you pull the power while
writing, you'll lose data.  Yes, with some care you lose less data,
but some extra milliseconds of power would have done the same.
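
The per-node argument quoted above can be illustrated with a small userspace sketch. Everything about the layout is hypothetical: a "page" of three fixed-size nodes, each carrying its own Adler-32 over its payload. It is not the JFFS2 or JFFS3 node format. The point is only that a damaged node fails its own check while its neighbours in the same page still verify.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NODE_DATA  16	/* hypothetical fixed payload size */
#define NODE_COUNT 3	/* hypothetical number of nodes per page */

struct node {
	unsigned char data[NODE_DATA];
	uint32_t csum;		/* checksum over data[] only */
};

/* Byte-at-a-time Adler-32 with the standard seed of 1. */
static uint32_t adler32_simple(const unsigned char *buf, size_t len)
{
	uint32_t s1 = 1, s2 = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		s1 = (s1 + buf[i]) % 65521U;
		s2 = (s2 + s1) % 65521U;
	}
	return (s2 << 16) | s1;
}

/* Fill every node of the page and store its checksum. */
void page_write(struct node *page)
{
	int n;

	for (n = 0; n < NODE_COUNT; n++) {
		memset(page[n].data, 'A' + n, NODE_DATA);
		page[n].csum = adler32_simple(page[n].data, NODE_DATA);
	}
}

/* Return a bitmask of the nodes whose stored checksum still matches. */
unsigned page_scan(const struct node *page)
{
	unsigned good = 0;
	int n;

	for (n = 0; n < NODE_COUNT; n++)
		if (adler32_simple(page[n].data, NODE_DATA) == page[n].csum)
			good |= 1u << n;
	return good;
}
```

After `page_write()`, flipping a bit in the middle node's payload makes `page_scan()` report nodes 0 and 2 as still valid, even though a per-page ECC would flag the whole page.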

Jörn

-- 
Victory in war is not repetitious.
-- Sun Tzu

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-08 20:14                                                               ` Jörn Engel
@ 2005-01-09 11:39                                                                 ` Artem B. Bityuckiy
  2005-01-10 14:24                                                                   ` Jörn Engel
  0 siblings, 1 reply; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-09 11:39 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Linux MTD mailing list

Hello Joern,

Joern Engel wrote:
> You win.  So I'll go and recheck adler32 wrt. detecting partially
> written nodes.  There's always the a-priori chance of missing a
> change, so we should try to cut that down as far as possible.  Since
> adler32 has a bit less than 32 bits of non-redundant information, we
> might want to cheat a little:
> 
> static uint32_t adler32_tailcheck(const void *buf, size_t len)
> {
> 	uint32_t end = *(uint32_t*) (buf + len - 4); /* last word */
> 	return adler32(end, buf, len);
> }
> 
> By using the last word of data as initial value, we put extra emphasis
> on it.  For long data (full data nodes), this shouldn't make a
> difference.  On the 12-Byte header, it might make a big one.
Looks like a good idea!

We are not yet sure we are going to use the adler32 checksum; maybe engel32
or the reversed version. I hope the test I've written will clarify this once
people run it on different platforms. But preliminary results show that
adler32/adler32r is the winner. We will wait and see.

By the way, I added your name to the test's copyright notice, as you asked :-)
If you notice some inconsistency/bug there, feel free to fix it yourself :-)

> Correct, although I don't care too much.  If you pull the power while
> writing, you'll lose data.  Yes, with some care you lose less data,
> but some extra milliseconds of power would have done the same.
I suppose you are right if we are talking about errors due to unclean reboots.

But what about errors due to flash media corruption? This is typical for
NAND, where bad blocks appear from time to time. I do not know exactly how
this happens. But suppose that when it happens, just several bits on some
page(s) become permanently bogus; then it makes a big difference whether we
recover the good data from these pages or not. If we do not, we lose much
more. For example, we may lose the direntry of some important file (and the
file will disappear).
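
To illustrate what I mean by recovering good data, here is a minimal
sketch. The node layout (2-byte length, payload, 4-byte checksum) and all
the names are assumptions made up for the example, not the JFFS2 or JFFS3
format, and the checksum is a plain Adler-32 stand-in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical node: 2-byte LE payload length, payload, then a 4-byte
 * LE checksum over length + payload. */
#define NODE_OVERHEAD 6u

static uint32_t node_sum(const uint8_t *p, size_t len)
{
	uint32_t a = 1, b = 0;	/* Adler-32, reduced per byte */
	size_t i;

	for (i = 0; i < len; i++) {
		a = (a + p[i]) % 65521u;
		b = (b + a) % 65521u;
	}
	return (b << 16) | a;
}

static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Write one node at dst and return the number of bytes used. */
static size_t put_node(uint8_t *dst, const uint8_t *payload, size_t plen)
{
	uint32_t sum;
	int i;

	assert(plen <= 0xffff);
	dst[0] = (uint8_t)(plen & 0xff);
	dst[1] = (uint8_t)(plen >> 8);
	memcpy(dst + 2, payload, plen);
	sum = node_sum(dst, 2 + plen);
	for (i = 0; i < 4; i++)
		dst[2 + plen + i] = (uint8_t)(sum >> (8 * i));
	return NODE_OVERHEAD + plen;
}

/* Walk the nodes packed into one page; count those whose own checksum
 * still verifies and those that are damaged. */
static size_t count_good_nodes(const uint8_t *page, size_t page_len,
			       size_t *bad)
{
	size_t off = 0, good = 0;

	*bad = 0;
	while (off + NODE_OVERHEAD <= page_len) {
		size_t plen = (size_t)page[off] | (size_t)page[off + 1] << 8;

		if (off + NODE_OVERHEAD + plen > page_len)
			break;	/* erased space or implausible length */
		if (node_sum(page + off, 2 + plen) ==
		    get_le32(page + off + 2 + plen))
			good++;
		else
			(*bad)++;
		off += NODE_OVERHEAD + plen;
	}
	return good;
}
```

A flipped bit in one node leaves its neighbours on the same page
verifiable. The sketch does not handle a corrupted length field; a real
scanner would need to resynchronise somehow.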

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-09 11:39                                                                 ` Artem B. Bityuckiy
@ 2005-01-10 14:24                                                                   ` Jörn Engel
  0 siblings, 0 replies; 196+ messages in thread
From: Jörn Engel @ 2005-01-10 14:24 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: Linux MTD mailing list

On Sun, 9 January 2005 11:39:42 +0000, Artem B. Bityuckiy wrote:
> Joern Engel wrote:
> > 
> > static uint32_t adler32_tailcheck(const void *buf, size_t len)
> > {
> > 	uint32_t end = *(uint32_t*) (buf + len - 4); /* last word */
> > 	return adler32(end, buf, len);
> > }
> > 
> Looks like good idea!
> 
> We are not yet sure we are going to use the adler32 checksum; maybe engel32
> or the reversed version. I hope the test I've written will clarify this once
> people run it on different platforms. But preliminary results show that
> adler32/adler32r is the winner. We will wait and see.

Same trick should help engel32 as well.  Crc32 doesn't need it,
although it might be an idea to pick a non-zero initial value.
Rationale for that was in the excellent link you passed.

> By the way, I added your name to the test's copyright notice, as you asked :-)
> If you notice some inconsistency/bug there, feel free to fix it yourself :-)

Thanks. :)
Work has started again, so don't expect too much from me for a while.

> > Correct, although I don't care too much.  If you pull the power while
> > writing, you'll lose data.  Yes, with some care you lose less data,
> > but some extra milliseconds of power would have done the same.
> I suppose you are right if we speak about errors due to unclean reboots.

We do.

> But what about errors due to flash media corruption? This is typical for
> NAND, where bad blocks appear from time to time. I do not know exactly how
> this happens. But suppose that when it happens, just several bits on some
> page(s) become permanently bogus; then it makes a big difference whether we
> recover the good data from these pages or not. If we do not, we lose much
> more. For example, we may lose the direntry of some important file (and the
> file will disappear).

That is the GAU (German: the worst conceivable accident) for jffs2, no
doubt.  If at all possible, we MUST make sure that data is moved from
the bad blocks BEFORE they rot away!
As soon as we start to lose data due to rotten flash, the design is
broken.  Users might care enough to recover as much data as possible,
embedded systems can simply be put into the nearest trash bin.

So, I don't care much about how gracefully we handle this case.  Maybe
there is a 99% chance that only redundant/obsolete data is lost.
Doesn't matter.  The same thing will happen again and after 70 such
incidents, you're down to 50%.  So all you gained is some time before
the hardware hits the bin.  Oh golly!
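
The arithmetic behind that, as a minimal sketch (survive_all() is a
made-up name, and the incidents are assumed to be independent):

```c
#include <assert.h>

/* Probability of surviving n independent incidents, each of which is
 * survived with probability p: p multiplied by itself n times. */
static double survive_all(double p, unsigned n)
{
	double r = 1.0;

	assert(p >= 0.0 && p <= 1.0);
	while (n--)
		r *= p;
	return r;
}
```

survive_all(0.99, 70) comes out a little under 0.5, which is where the
"70 incidents" figure comes from.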

...

So either we can make sure this case never happens, or we can't.  It
depends on the type of flash, for sure, and it may be pretty hard with
some types.  But if it doesn't work at all, the flash is broken,
period.

Same with hard drives.  They also rot over time, but their internal
logic is good enough to hide that fact from you.  Until - one fine day
- there has been too much rot going on and it fails completely.  You
can protect against that with backups, raids or simply buying a new
drive every other year.  No fs will ever help you.

Jörn

-- 
He who knows others is wise.
He who knows himself is enlightened.
-- Lao Tsu

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
@ 2005-01-11 12:29 Artem B. Bityuckiy
  2005-01-11 14:37 ` Josh Boyer
  2005-01-11 21:51 ` Jörn Engel
  0 siblings, 2 replies; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-11 12:29 UTC (permalink / raw)
  To: joern; +Cc: David Woodhouse, MTD List

Ferenc Havasi:
>If I am right CRCs are not only against the effect of unclean reboots 
>but also to handle flash errors. On NAND flashes the ECC handles this
>problem but NOR doesn't have any error detection system. 

Joern Engel wrote:
>So either we can make sure this case never happens, or we can't.  It
>depends on the type of flash, for sure, and it may be pretty hard with
>some types.  But if it doesn't work at all, the flash is broken,
>period.

Hi, that is a really interesting question: *why* do we need CRCs in JFFS2?

Do we need the CRC *only* to handle unclean reboots? If so, we could possibly
handle that another way, by just putting some magic word at the end of the
node. Possibly there is no need for a CRC at all.

Joern's position is that if the flash has rotted, we *do not need* to do
anything special to try to recover data. Am I right?

I still think we need to do recovery in the case of NAND: mark the rotten
block bad and keep working, so we still need the CRC. In the case of NOR, we
possibly should just report an error and do nothing (we can't mark a block
bad there). I do not know about ECC NOR.

Comments?

P.S. By the way, we could put CRCs at the end of blocks (*after* the data).
In this case the CRC will be extremely strong at detecting unclean reboots,
won't it?
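
A minimal sketch of what I mean, reading "blocks" above as nodes. The
layout and the names node_seal()/node_is_complete() are invented for the
example, and it assumes the node is programmed from low to high addresses
so that the trailer is written last; erased flash reads as 0xff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRAILER_LEN 4u

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, all-ones seed and
 * final inversion). Slow, but short enough to read. */
static uint32_t crc32_simple(const uint8_t *p, size_t len)
{
	uint32_t crc = 0xffffffffu;
	size_t i;
	int k;

	for (i = 0; i < len; i++) {
		crc ^= p[i];
		for (k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Append the little-endian CRC right after the node body; this is the
 * last thing written. Returns the total node size. */
static size_t node_seal(uint8_t *node, size_t body_len)
{
	uint32_t crc;
	size_t i;

	assert(node != NULL);
	crc = crc32_simple(node, body_len);
	for (i = 0; i < TRAILER_LEN; i++)
		node[body_len + i] = (uint8_t)(crc >> (8 * i));
	return body_len + TRAILER_LEN;
}

/* A node whose write was interrupted still has erased (or partial)
 * bytes where the trailer should be, so the comparison fails. */
static int node_is_complete(const uint8_t *node, size_t body_len)
{
	uint32_t stored = 0;
	size_t i;

	for (i = 0; i < TRAILER_LEN; i++)
		stored |= (uint32_t)node[body_len + i] << (8 * i);
	return stored == crc32_simple(node, body_len);
}
```

A fixed magic word in the trailer would detect the interrupted write in
the same way, but it would say nothing about the body itself.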

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-11 12:29 JFFS3 & performance Artem B. Bityuckiy
@ 2005-01-11 14:37 ` Josh Boyer
  2005-01-11 21:51 ` Jörn Engel
  1 sibling, 0 replies; 196+ messages in thread
From: Josh Boyer @ 2005-01-11 14:37 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: David Woodhouse, MTD List

On Tue, 2005-01-11 at 06:29, Artem B. Bityuckiy wrote:
> Ferenc Havasi:
> >If I am right CRCs are not only against the effect of unclean reboots 
> >but also to handle flash errors. On NAND flashes the ECC handles this
> >problem but NOR doesn't have any error detection system. 
> 
> Joern Engel wrote:
> >So either we can make sure this case never happens, or we can't.  It
> >depends on the type of flash, for sure, and it may be pretty hard with
> >some types.  But if it doesn't work at all, the flash is broken,
> >period.
> 
> Hi, that is really interesting, *why* do we need CRCs in JFFS2?
> 
> Do we need CRC *only* to handle unclean reboots? If so, we may possibly 
> handle it another way, just putting some magic word at the end of node. 
> Possibly, no need for CRC at all.
> 
> Joern stands that if Flash got rotten, we *do not need* to do something 
> special trying to recover data. Am I right?
> 
> I still think we need to do recovery in case of NAND, mark the rotten 
> block bad and keep working, so we still need CRC. In case of NOR, we 
> possibly should just report error and do nothing (we can't mark block bad 
> there). Do not know about ECC NOR.

Just some info.

The only ECC NOR that I know of has transparent ECC that automagically
does the corrections for you.  There is no mechanism provided by the ECC
logic to let you know that an error occurred.  It's not like ECC on NAND.

So those chips are really almost identical to NOR in this kind of
discussion.  The CRCs are really the only way to detect any kind of
errors.

josh

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-11 12:29 JFFS3 & performance Artem B. Bityuckiy
  2005-01-11 14:37 ` Josh Boyer
@ 2005-01-11 21:51 ` Jörn Engel
  2005-01-12  0:06   ` Thomas Gleixner
                     ` (2 more replies)
  1 sibling, 3 replies; 196+ messages in thread
From: Jörn Engel @ 2005-01-11 21:51 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: David Woodhouse, MTD List

On Tue, 11 January 2005 12:29:33 +0000, Artem B. Bityuckiy wrote:
> Ferenc Havasi:
> >If I am right CRCs are not only against the effect of unclean reboots 
> >but also to handle flash errors. On NAND flashes the ECC handles this
> >problem but NOR doesn't have any error detection system. 
> 
> Joern Engel wrote:
> >So either we can make sure this case never happens, or we can't.  It
> >depends on the type of flash, for sure, and it may be pretty hard with
> >some types.  But if it doesn't work at all, the flash is broken,
> >period.
> 
> Hi, that is really interesting, *why* do we need CRCs in JFFS2?
> 
> Do we need CRC *only* to handle unclean reboots? If so, we may possibly 
> handle it another way, just putting some magic word at the end of node. 
> Possibly, no need for CRC at all.
> 
> Joern stands that if Flash got rotten, we *do not need* to do something 
> special trying to recover data. Am I right?

Pretty much.  Detecting the breakage is still a good thing, so we can
report an error.  There are non-embedded devices with flash and users
want to see the problem and replace their flash.  But apart from that,
don't try too hard to fix something that cannot be fixed.

> I still think we need to do recovery in case of NAND, mark the rotten 
> block bad and keep working, so we still need CRC. In case of NOR, we 
> possibly should just report error and do nothing (we can't mark block bad 
> there). Do not know about ECC NOR.
> 
> Comments?

Ok, here is my approach.

Claim: No mtd has problems with lost data due to bad blocks.  This is
a complete non-issue and jffs[23] doesn't care.

"No" means that less than .001% (add or remove some digits, depending
on your needs) of all devices have this problem.  In those cases, the
system just won't work and will be replaced.  Business as usual.

Now, if any particular flash doesn't match this requirement, the mtd
driver is supposed to mirror all blocks.  If either copy rots away,
the data can still be read back from the other block.  After GC, the
block can be marked as bad and everyone lives on happily ever after.
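
A minimal sketch of the read side of such mirroring.  Everything here
is hypothetical (the names, the checksum, passing the expected sum in
from outside); it only shows the fallback order, not a real mtd driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum mirror_result {
	MIRROR_PRIMARY_OK = 0,	/* primary copy verified */
	MIRROR_USED_COPY  = 1,	/* primary rotten, mirror verified */
	MIRROR_BOTH_BAD   = -1	/* nothing left to recover */
};

/* Stand-in checksum (Adler-32, reduced per byte); any checksum works. */
static uint32_t blk_sum(const uint8_t *p, size_t len)
{
	uint32_t a = 1, b = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		a = (a + p[i]) % 65521u;
		b = (b + a) % 65521u;
	}
	return (b << 16) | a;
}

/* Copy a verified block into out, preferring the primary copy. */
static enum mirror_result mirror_read(const uint8_t *primary,
				      const uint8_t *mirror, size_t len,
				      uint32_t want, uint8_t *out)
{
	assert(primary && mirror && out);
	if (blk_sum(primary, len) == want) {
		memcpy(out, primary, len);
		return MIRROR_PRIMARY_OK;
	}
	if (blk_sum(mirror, len) == want) {
		memcpy(out, mirror, len);
		return MIRROR_USED_COPY;
	}
	return MIRROR_BOTH_BAD;
}
```

MIRROR_USED_COPY is the cue to garbage-collect the data elsewhere and
then mark the rotten block bad.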

Capacity is half the original, but that's better than losing
/sbin/init now and then, right?  And maybe the flash is cheap enough
that no one cares.

Sane?

> P.S. By the way, we could put CRCs at the end of blocks (*after* the data).
> In this case the CRC will be extremely strong at detecting unclean reboots,
> won't it?

Interesting idea.  Will make the code slightly messy, but it should be
worth it.

Jörn

-- 
Do not stop an army on its way home.
-- Sun Tzu

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-11 21:51 ` Jörn Engel
@ 2005-01-12  0:06   ` Thomas Gleixner
  2005-01-12 16:59     ` Jörn Engel
  2005-01-12  9:15   ` Artem B. Bityuckiy
  2005-01-13 14:49   ` Artem B. Bityuckiy
  2 siblings, 1 reply; 196+ messages in thread
From: Thomas Gleixner @ 2005-01-12  0:06 UTC (permalink / raw)
  To: Jörn Engel; +Cc: MTD List, David Woodhouse

> Claim: No mtd has problems with lost data due to bad blocks.  This is
> a complete non-issue and jffs[23] doesn't care.

NAK 

NAND and ECC'ed NOR FLASH have an implicit property of occasional
bitflips, which have various causes. The ECC addon in the
chip/technology driver takes care of that and may even be able to
recover the resulting data loss.

ACK

NOR is significantly less prone to data loss and bit flips, so I agree
here to the extent of the following paragraph

A flash-aware filesystem has to be aware of the failure sensitivity of the
devices it is dealing with. An additional constraint is that FLASH
filesystems are normally used in systems which have totally different
environments and constraints than a desktop system.


> "No" means that less than .001% (add or remove some digits, depending
> on your needs) of all devices have this problem.  In those cases, the
> system just won't work and will be replaced.  Business as usual.
> 
> Now, if any particular flash doesn't match this requirement, the mtd
> driver is supposed to mirror all blocks.  If either copy rots away,
> the data can still be read back from the other block.  After GC, the
> block can be marked as bad and everyone lives on happily ever after.

Mirroring is not a complete solution to address the various
requirements. It's one out of many mechanisms which you might be able to
sell to some of your audience.
Moving the responsibility to a different layer is a really bad idea, as it
restricts the implementation of other filesystems on top of the same MTD
technology. E.g. YAFFS deals nicely with NAND oddities, and I totally refuse
to restrict the usage to a subset of capabilities. If your
implementation decision is to use mirroring, then provide it as a
separate feature/layer, which can be used on request.

Facts: 

- FLASH devices are error prone. The degree of error proneness depends on
the FLASH technology
- Devices have partially standardized mechanisms of error correction

Problem:

- Develop a filesystem aware of the above constraints
- Do not overemphasize error correction/protection at the expense of
performance; how much is needed depends on the FLASH technology
- Make it DAU (for non-native German speakers: Dumbest Assumed User)
safe

Incomplete solution requirements:

- Be aware of the error proneness you are dealing with
- Utilize the error correction / detection mechanisms and deal with the
information they provide. 
- Provide additional userspace fallback mechanisms and design patterns
which help to improve the failsafe goal
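
A minimal sketch of "deal with the information they provide". The enums
and classify_node() are hypothetical names for illustration, not an
existing MTD interface; the policy shown is just one plausible choice.

```c
#include <assert.h>

/* What the chip/technology driver reported for the page read. */
enum ecc_status { ECC_CLEAN, ECC_CORRECTED, ECC_UNCORRECTABLE };

/* What the filesystem does with a node read from that page. */
enum node_action { NODE_USE, NODE_USE_AND_SCRUB, NODE_DISCARD };

/* Combine the driver's ECC verdict with the node's own checksum. */
static enum node_action classify_node(enum ecc_status ecc, int node_sum_ok)
{
	assert(ecc == ECC_CLEAN || ecc == ECC_CORRECTED ||
	       ecc == ECC_UNCORRECTABLE);
	if (!node_sum_ok)
		return NODE_DISCARD;	/* the node itself is damaged */
	if (ecc == ECC_CLEAN)
		return NODE_USE;
	/* Page had bitflips but this node verified: use it, then move
	 * the data and let the block be tested or retired. */
	return NODE_USE_AND_SCRUB;
}
```

The ECC_UNCORRECTABLE case with a good node checksum covers an error
that landed elsewhere in the page, or in the ECC bytes themselves.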

> Capacity is half the original, but that's better than losing
> /sbin/init now and then, right?  And maybe the flash is cheap enough
> that noone cares.
> 
> Sane?

High volume systems in particular are very sensitive to the price aspect.
For them it is cheaper to invest in software development than to add an
extra $1 to the hardware.

tglx

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-11 21:51 ` Jörn Engel
  2005-01-12  0:06   ` Thomas Gleixner
@ 2005-01-12  9:15   ` Artem B. Bityuckiy
  2005-01-12 16:41     ` Jared Hulbert
  2005-01-12 18:10     ` Jörn Engel
  2005-01-13 14:49   ` Artem B. Bityuckiy
  2 siblings, 2 replies; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-12  9:15 UTC (permalink / raw)
  To: Jörn Engel; +Cc: David Woodhouse, MTD List

Hi Joern,

please read the paper
http://www.semicon.toshiba.co.jp/eng/prd/memory/doc/pdf/nand_applicationguide_e.pdf
I like this paper. I've just reread it and now I have no doubts that CRCs
are required on NAND. :-)

In short: errors are a normal phenomenon on NAND devices. Errors are mostly
handled by NAND ECC, but JFFS[23] MUST take care of failures and handle
them properly. There are both permanent and occasional errors. Blocks with
permanent errors must be marked bad, and it is good to recover the data...

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-12  9:15   ` Artem B. Bityuckiy
@ 2005-01-12 16:41     ` Jared Hulbert
  2005-01-12 17:02       ` Jörn Engel
  2005-01-12 17:14       ` Artem B. Bityuckiy
  2005-01-12 18:10     ` Jörn Engel
  1 sibling, 2 replies; 196+ messages in thread
From: Jared Hulbert @ 2005-01-12 16:41 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: David Woodhouse, MTD List

Clarification on NOR technology.  Remember that the ability to run
code XIP is effectively a requirement for a NOR chip.  This means no
read errors can leave the chip.  I don't see this changing in the
foreseeable future.  Any read errors that do occur would probably be
caused by a failed/incomplete program.

We probably do want to be able to easily retire, reprogram, and/or
test those blocks/pages that get read errors.  That would have to be
done in the filesystem, and the chip driver needs to be able to report
that a read error occurred.

Any chance of being able to store special files XIP in JFFS3?
Uncompressed, aligned, page-sized chunks, etc.


,Jared




^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-12  0:06   ` Thomas Gleixner
@ 2005-01-12 16:59     ` Jörn Engel
  2005-01-12 17:37       ` Thomas Gleixner
  0 siblings, 1 reply; 196+ messages in thread
From: Jörn Engel @ 2005-01-12 16:59 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: MTD List, David Woodhouse

Ok, you completely ignore the central issue.  Looks like I was unclear
about it.

What happens, if crucial data in flash gets corrupted and is unusable?
Say, /sbin/init.

If you don't have a solution for that case, I'm completely unexcited
about solutions to any lesser problems.

Can you make sure /sbin/init remains safe?

Jörn

-- 
The wise man seeks everything in himself; the ignorant man tries to get
everything from somebody else.
-- unknown

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-12 16:41     ` Jared Hulbert
@ 2005-01-12 17:02       ` Jörn Engel
  2005-01-12 17:06         ` David Woodhouse
  2005-01-12 17:22         ` Jared Hulbert
  2005-01-12 17:14       ` Artem B. Bityuckiy
  1 sibling, 2 replies; 196+ messages in thread
From: Jörn Engel @ 2005-01-12 17:02 UTC (permalink / raw)
  To: Jared Hulbert; +Cc: MTD List, David Woodhouse

On Wed, 12 January 2005 08:41:04 -0800, Jared Hulbert wrote:
> 
> Clarification on NOR technology.  Remember that the ability to run
> code XIP is effectively a requirement for a NOR chip.  This means no
> read errors can leave the chip.  I don't see this changing in the
> foreseeable future.  Any read errors that do occur would probably be
> caused by a failed/incomplete program.

Correct.  That requirement comes mostly from having to program the
memory controller before being able to use DRAM.  After the early
boot, it's just nice to have.

> We probably do want to be able to easily retire, reprogram, and/or
> test those blocks/pages that get read errors.  That would have to be
> done in the filesystem, and the chip driver needs to be able to report
> that a read error occurred.
> 
> Any chance of being able to store special files XIP in JFFS3?
> Uncompressed, aligned, page-sized chunks, etc.

Would this actually be an advantage?  Last time I looked, it was
cheaper to use more DRAM.  Most DSL routers I see advertised have a
1:4 or 1:8 ratio of NOR:DRAM, so it looks as if the prices haven't
changed much.

Jörn

-- 
Correctness comes second.
Features come third.
Performance comes last.
Maintainability is needed for all of them.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-12 17:02       ` Jörn Engel
@ 2005-01-12 17:06         ` David Woodhouse
  2005-01-12 17:11           ` Jörn Engel
  2005-01-12 17:22         ` Jared Hulbert
  1 sibling, 1 reply; 196+ messages in thread
From: David Woodhouse @ 2005-01-12 17:06 UTC (permalink / raw)
  To: Jörn Engel; +Cc: MTD List

On Wed, 2005-01-12 at 18:02 +0100, Jörn Engel wrote:
> Would this actually be an advantage?  Last time I looked, it was
> cheaper to use more DRAM.  Most DSL routers I see advertised have a
> 1:4 or 1:8 ratio of NOR:DRAM, so it looks as if the prices haven't
> changed much.

Cheaper and faster. But remember who Jared works for :)

Seriously though, XIP is a win on power -- it doesn't take constant
application of power to keep your data in flash. If you start comparing
prices of SRAM and flash, the game becomes a little different.

Of course, if you're _that_ concerned about the power budget one has to
wonder why you're using Linux, but then JFFS2 runs on eCos too...

-- 
dwmw2

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-12 17:06         ` David Woodhouse
@ 2005-01-12 17:11           ` Jörn Engel
  0 siblings, 0 replies; 196+ messages in thread
From: Jörn Engel @ 2005-01-12 17:11 UTC (permalink / raw)
  To: David Woodhouse; +Cc: MTD List

On Wed, 12 January 2005 17:06:02 +0000, David Woodhouse wrote:
> 
> Cheaper and faster. But remember who Jared works for :)

:)

> Seriously though, XIP is a win on power -- it doesn't take constant
> application of power to keep your data in flash. If you start comparing
> prices of SRAM and flash, the game becomes a little different.
> 
> Of course, if you're _that_ concerned about the power budget one has to
> wonder why you're using Linux, but then JFFS2 runs on eCos too...

In that case, most people I know use some 8bit or 16bit controllers
with less total code than jffs2 alone would weigh in.

But I also remember who you work for. :)

Jörn

-- 
Data expands to fill the space available for storage.
-- Parkinson's Law

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-12 16:41     ` Jared Hulbert
  2005-01-12 17:02       ` Jörn Engel
@ 2005-01-12 17:14       ` Artem B. Bityuckiy
  2005-01-12 22:30         ` Jared Hulbert
  1 sibling, 1 reply; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-12 17:14 UTC (permalink / raw)
  To: Jared Hulbert; +Cc: David Woodhouse, MTD List

On Wed, 12 Jan 2005, Jared Hulbert wrote:

Hi Jared,

> Clarification on NOR technology.  Remember that the ability to run
> code XIP is effectively a requirement for a NOR chip.  This means no
> read errors can leave the chip.  I don't see this changing in the
> foreseeable future.  Any read errors that do occur would probably be
> caused by a failed/incomplete program
> 
> We probably do want to be able to easily retire, reprogram, and/or
> test those blocks/pages that get read errors.  That would have to be
> done in the filesystem, and the chip driver needs to be able to report
> that a read error occurred.

What would you conclude from this (in the context of this discussion)? That
the CRC *must* *always* be checked in the case of NOR?

> 
> Any chance of being able store special files XIP in JFFS3? 
> Uncompressed aligned page sized chunks, etc.
> 
> 
> ,Jared
> 
> 
> 
> On Wed, 12 Jan 2005 09:15:42 +0000 (GMT), Artem B. Bityuckiy
> <dedekind@infradead.org> wrote:
> > Hi Joern,
> > 
> > please, read the paper
> > http://www.semicon.toshiba.co.jp/eng/prd/memory/doc/pdf/nand_applicationguide_e.pdf
> > I like this paper. I've just reread it and now I have no doubts that CRCs
> > are required on NAND. :-)
> > 
> > Shortly: errors are normal phenomena on NAND devices. Errors are mostly
> > handled by NAND ECCs, but
> > JFFS[23] MUST take care about failures and handle them properly. There
> > are permanent and occasional
> > errors exist. Blocks with permanent errors must be marked bad and it is
> > good to recover data...
> > 
> > > On Tue, 11 Jan 2005, Jörn Engel wrote:
> > > On Tue, 11 January 2005 12:29:33 +0000, Artem B. Bityuckiy wrote:
> > > > Ferenc Havasi:
> > > > >If I am right CRCs are not only against the effect of unclean reboots
> > > > >but also to handle flash errors. On NAND flashes the ECC handles this
> > > > >problem but NOR doesn't have any error detection system.
> > > >
> > > > Joern Engel wrote:
> > > > >So either we can make sure this case never happens, or we can't.  It
> > > > >depends on the type of flash, for sure, and it may be pretty hard with
> > > > >some types.  But if it doesn't work at all, the flash is broken,
> > > > >period.
> > > >
> > > > Hi, that is really interesting, *why* do we need CRCs in JFFS2?
> > > >
> > > > Do we need CRC *only* to handle unclean reboots? If so, we may possibly
> > > > handle it another way, just putting some magic word at the end of node.
> > > > Possibly, no need for CRC at all.
> > > >
> > > > Joern stands that if Flash got rotten, we *do not need* to do something
> > > > special trying to recover data. Am I right?
> > >
> > > Pretty much.  Detecting the breakage is still a good thing, so we can
> > > report an error.  There are non-embedded devices with flash and users
> > > want to see the problem and replace their flash.  But apart from that,
> > > don't try too hard to fix something that cannot be fixed.
> > >
> > > > I still think we need to do recovery in case of NAND, mark the rotten
> > > > block bad and keep working, so we still need CRC. In case of NOR, we
> > > > possibly should just report error and do nothing (we can't mark block bad
> > > > there). Do not know about ECC NOR.
> > > >
> > > > Comments?
> > >
> > > Ok, here is my approach.
> > >
> > > Claim: No mtd has problems with lost data due to bad blocks.  This is
> > > a complete non-issue and jffs[23] doesn't care.
> > >
> > > "No" means that less than .001% (add or remove some digits, depending
> > > on your needs) of all devices have this problem.  In those cases, the
> > > system just won't work and will be replaced.  Business as usual.
> > >
> > > Now, if any particular flash doesn't match this requirement, the mtd
> > > driver is supposed to mirror all blocks.  If either copy rots away,
> > > the data can still be read back from the other block.  After GC, the
> > > block can be marked as bad and everyone lives on happily ever after.
> > >
> > > Capacity is half the original, but that's better than losing
> > > /sbin/init now and then, right?  And maybe the flash is cheap enough
> > > that noone cares.
> > >
> > > Sane?
> > >
> > > > P.S. By the way, we could put CRCs at the end of blocks (*after* data) -
> > > > in this case CRC well be extremely strong detecting unclean reboots, isn't
> > > > it?
> > >
> > > Interesting idea.  Will make the code slightly messy, but it should be
> > > worth it.
> > >
> > > Jörn
> > >
> > > --
> > > Do not stop an army on its way home.
> > > -- Sun Tzu
> > >
> > 
> > --
> > Best Regards,
> > Artem B. Bityuckiy,
> > St.-Petersburg, Russia.
> > 
> > ______________________________________________________
> > Linux MTD discussion mailing list
> > http://lists.infradead.org/mailman/listinfo/linux-mtd/
> >
> 

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

* Re: JFFS3 & performance
  2005-01-12 17:02       ` Jörn Engel
  2005-01-12 17:06         ` David Woodhouse
@ 2005-01-12 17:22         ` Jared Hulbert
  2005-01-12 17:28           ` Artem B. Bityuckiy
  2005-01-12 17:34           ` David Woodhouse
  1 sibling, 2 replies; 196+ messages in thread
From: Jared Hulbert @ 2005-01-12 17:22 UTC (permalink / raw)
  To: Jörn Engel; +Cc: MTD List, David Woodhouse

> Correct.  That requirement comes mostly from having to program the
> memory controller before being able to use DRAM.  After the early
> boot, it's just nice to have.

Requirement in the marketplace I mean.  Most NOR chips are expected to
have 0 errors by those who buy them.

> Would this actually be an advantage?  Last time I looked, it was
> cheaper to use more DRAM.  Most DSL routers I see advertised have a
> 1:4 or 1:8 ratio of NOR:DRAM, so it looks as if the prices haven't
> changed much.

So you save more RAM than you use up in flash when doing XIP.  We've seen
1.5MiB of RAM saved at a cost of 1MiB of NOR.  This reduces the total
memory footprint of the system.  It can also make the difference
between a 16MiB and a 32MiB DRAM.  The end result is that XIP can
lower the BOM cost for a device.  It can also reduce the power
consumption such that your phone or PDA has a much longer standby
time.  The performance advantage of XIP for boot-up and application
launch can be quite noticeable.  Think of the flow of data to start
executing an application from JFFS2.  We copy the data 3 times in RAM?
decompress it, CRC it.  Compare that to mmap() in XIP cramfs.  It just
points to the flash address.

* Re: JFFS3 & performance
  2005-01-12 17:22         ` Jared Hulbert
@ 2005-01-12 17:28           ` Artem B. Bityuckiy
  2005-01-12 17:34           ` David Woodhouse
  1 sibling, 0 replies; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-12 17:28 UTC (permalink / raw)
  To: Jared Hulbert; +Cc: David Woodhouse, MTD List

On Wed, 12 Jan 2005, Jared Hulbert wrote:

> > Correct.  That requirement comes mostly from having to program the
> > memory controller before being able to use DRAM.  After the early
> > boot, it's just nice to have.
> 
> Requirement in the marketplace I mean.  Most NOR chips are expected to
> have 0 errors by those who buy them.
> 
> > Would this actually be an advantage?  Last time I looked, it was
> > cheaper to use more DRAM.  Most DSL routers I see advertised have a
> > 1:4 or 1:8 ratio of NOR:DRAM, so it looks as if the prices haven't
> > changed much.
> 
> So you save more RAM that you use up flash when doing XIP.  We've seen
> 1.5MiB of RAM saved at a cost of 1MiB of NOR.  This reduces the total
> memory footprint of the system.  It can also make the difference
> between 16MiB and a 32MiB DRAM.  The end result is that the XIP can
> lower the BOM cost for a device.  It also can reduce the power
> consumption such that your phone or PDA has a much longer standby
> time.  The performance advantage of XIP to boot up and application
> launch can be quite noticable.  Think of the flow of data to start
> executing an application from JFFS2.  We copy the data 3 times in RAM?
Theoretically, in the case of NOR, only once in the best case, when the node
is pristine: we just uncompress it directly from NOR flash to the RAM page.
In the worst case, twice: we uncompress non-pristine data to a temporary
buffer, then copy the valid data to the page.  And so forth for all nodes
containing data related to the requested range.

> decompress it, CRC it.  Compare that to mmap() in XIP cramfs.  It just
> points to the flash address.
> 

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

* Re: JFFS3 & performance
  2005-01-12 17:22         ` Jared Hulbert
  2005-01-12 17:28           ` Artem B. Bityuckiy
@ 2005-01-12 17:34           ` David Woodhouse
  2005-01-12 17:45             ` Dan Post
  1 sibling, 1 reply; 196+ messages in thread
From: David Woodhouse @ 2005-01-12 17:34 UTC (permalink / raw)
  To: Jared Hulbert; +Cc: MTD List

On Wed, 2005-01-12 at 09:22 -0800, Jared Hulbert wrote:
> Think of the flow of data to start executing an application from
> JFFS2.  We copy the data 3 times in RAM? decompress it, CRC it. 

We ought to be decompressing directly from flash to RAM. It takes 0.5x
flash, and 1x RAM. And the RAM usage is dynamic -- you can throw the
page away if memory is tight, and read it back again later.

>  Compare that to mmap() in XIP cramfs.  It just points to the flash
> address.

And takes twice the amount of flash. Yes, it uses less RAM, but I don't
see how it can reduce the ROM cost. 
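
To put rough numbers on that trade-off, here is a back-of-the-envelope
sketch. The 0.5 compression ratio is the 0.5x figure above; the image
size and the resident fraction are made-up inputs, not measurements, and
the function name is purely illustrative.

```python
def footprint(image_mib, xip, compression_ratio=0.5, resident_fraction=1.0):
    """Return (flash_mib, ram_mib) needed for one executable image.

    image_mib         -- uncompressed image size (hypothetical input)
    xip               -- True: stored uncompressed and executed from flash
    compression_ratio -- compressed/uncompressed size (0.5 as quoted above)
    resident_fraction -- share of pages held in RAM at once (non-XIP only)
    """
    if xip:
        # Mapped straight from flash: full-size image, no RAM copy.
        return image_mib, 0.0
    # Stored compressed; pages are decompressed into RAM on demand and
    # can be thrown away again when memory is tight.
    return image_mib * compression_ratio, image_mib * resident_fraction


print(footprint(1.0, xip=False))  # (0.5, 1.0)
print(footprint(1.0, xip=True))   # (1.0, 0.0)
```

Lowering resident_fraction models the "throw the page away" case: the
flash cost stays the same and only the RAM figure shrinks.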

I haven't seen a decent analysis of the real-life cost of startup -- the
CELF numbers at OLS included decompression of the kernel in their
startup timings.

-- 
dwmw2

* Re: JFFS3 & performance
  2005-01-12 16:59     ` Jörn Engel
@ 2005-01-12 17:37       ` Thomas Gleixner
  2005-01-12 18:17         ` Jörn Engel
  0 siblings, 1 reply; 196+ messages in thread
From: Thomas Gleixner @ 2005-01-12 17:37 UTC (permalink / raw)
  To: Jörn Engel; +Cc: David Woodhouse, MTD List

On Wed, 2005-01-12 at 17:59 +0100, Jörn Engel wrote:
> Ok, you completely ignore the central issue.  Looks like I was unclear
> about it.
> 
> What happens, if crucial data in flash gets corrupted and is unusable?
> Say, /sbin/init.

I totally agree that we must make sure that stuff does not get corrupted.

We have used MTD for embedded systems for a long time and I'm well aware
that data might be corrupted. So most of the systems have a ro partition
for crucial stuff which is only mounted rw for updates under safe
conditions. The application data lives on a different partition.

For critical systems we have a backup emergency root, which we can
activate when an update smashes the normal rootfs. There are other
userspace possibilities to take care of backups and fallback.

So I don't see the point of an enforced mirror in MTD which will bring
more complexity than it is worth, but I might have missed some points.

tglx

* Re: JFFS3 & performance
  2005-01-12 17:34           ` David Woodhouse
@ 2005-01-12 17:45             ` Dan Post
  2005-01-12 17:52               ` David Woodhouse
  0 siblings, 1 reply; 196+ messages in thread
From: Dan Post @ 2005-01-12 17:45 UTC (permalink / raw)
  To: David Woodhouse; +Cc: MTD List

On Wed, 12 Jan 2005 17:34:53 +0000, David Woodhouse <dwmw2@infradead.org> wrote:
> We ought to be decompressing directly from flash to RAM. It takes 0.5x
> flash, and 1x RAM. And the RAM usage is dynamic -- you can throw the
> page away if memory is tight, and read it back again later.

Which can cause severe application performance problems in
RAM-constrained systems.  Your system will slow to a crawl if the page
cache is thrashed a lot... and I've seen this problem a number of
times.  If the page is always present, there is no page cache penalty
for executing code.

> And takes twice the amount of flash. Yes, it uses less RAM, but I don't
> see how it can reduce the ROM cost.
> I haven't seen a decent analysis of the real-life cost of startup -- the
> CELF numbers at OLS included decompression of the kernel in their
> startup timings.

We'll see what we can do to post numbers.
(No pun intended.)

Dan

* Re: JFFS3 & performance
  2005-01-12 17:45             ` Dan Post
@ 2005-01-12 17:52               ` David Woodhouse
  0 siblings, 0 replies; 196+ messages in thread
From: David Woodhouse @ 2005-01-12 17:52 UTC (permalink / raw)
  To: Dan Post; +Cc: MTD List

On Wed, 2005-01-12 at 09:45 -0800, Dan Post wrote:
> Which can cause severe application performance problems in
> RAM-constrained systems.  Your system will slow to a crawl if the page
> cache is thrashed a lot... and I've seen this problem a number of
> times.  If the page is always present, there is no page cache penalty
> for executing code.

Remember, you're comparing with the case where that page wasn't present
at _all_ so the page is always absent. If you miscalculate your RAM
budget with XIP, you're fairly quickly screwed as you run out of memory.
If you miscalculate with non-XIP, you have a certain amount of slack
because pages can be discarded and re-read. Yes, that can be suboptimal
if you get into thrashing, but there's often going to be a reasonable
number of pages you can discard before you get to that.

> We'll see what we can do to post numbers.
> (No pun intended.)

:)

That'd be interesting. I can't imagine them showing that application XIP
really is a long-term win, but we may find that some kind of partial
XIP, or faulting parts of the kernel into RAM as we start up, or
something like that, may be beneficial. I have an open mind -- mostly
because I lack the wit to remember what I said before and hence be
consistent :)

-- 
dwmw2

* Re: JFFS3 & performance
  2005-01-12  9:15   ` Artem B. Bityuckiy
  2005-01-12 16:41     ` Jared Hulbert
@ 2005-01-12 18:10     ` Jörn Engel
  2005-01-12 18:27       ` Thomas Gleixner
  2005-01-12 18:33       ` Artem B. Bityuckiy
  1 sibling, 2 replies; 196+ messages in thread
From: Jörn Engel @ 2005-01-12 18:10 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: David Woodhouse, MTD List

On Wed, 12 January 2005 09:15:42 +0000, Artem B. Bityuckiy wrote:
> 
> please, read the paper 
> http://www.semicon.toshiba.co.jp/eng/prd/memory/doc/pdf/nand_applicationguide_e.pdf
> I like this paper.

Your ability to come up with excellent papers is astounding!

> I've just reread it and now I have no doubts that CRCs 
> are required on NAND. :-)
> 
> Shortly: errors are normal phenomena on NAND devices. Errors are mostly 
> handled by NAND ECCs, but
> JFFS[23] MUST take care about failures and handle them properly. There 
> are permanent and occasional
> errors exist. Blocks with permanent errors must be marked bad and it is 
> good to recover data...

Ok, let me distinguish between the different problems:


  Bad blocks
We definitely need to handle those, no doubt.  Problem is not that
some blocks _are_ bad, it's that they _become_ bad.  So, when and how
does this happen?


  Initial bad blocks
Should be simple to handle.  No crucial data was ever written to those
blocks, so we don't have a problem.


  Blocks that fail during use
Quote: "Therefore, blocks should be marked as bad and no longer
accessed if there is either a block erase failure or a page program
failure."

During erase, by definition those blocks don't hold crucial data.  Not
a problem.  Page program is slightly worse, but it only means that we
have to program a different block instead.  Make sure you don't use
the partial programming thing they hinted at and no crucial data is
lost.  Again, harmless.


  Permanent Failure
Those can be noticed during either erase or program, so the above
applies.  Harmless.


  Soft errors
They occur at a rate of 10^-10 or 2^-30 for 1-bit errors.  Those are
corrected by the ECC, so no data is lost.  Assuming two such incidents
are completely independent, 2-bit errors occur at a rate of 10^-20 or
2^-60.  Let's do some math.

Flash sizes today are in the 2^30 bit (128MiB) range.  Erase cycles
are about 1 Million or 2^20, so total bit writes per medium are about
2^50.  That means there is a 2^-10 remaining chance to experience a
2-bit error during the lifetime of the flash.

In other words, one out of 1000 flashes will have a non-recoverable
error sometime during its life cycle.  Doesn't exactly make me happy,
but it's not horrible either.  In cases where little data is written
to flash or other components (cpu, power regulators) die before the
flash does, this is even less of an issue.

For 3-bit errors, those that are not detected by ECC logic anymore,
the chances are 2^-40 or practically non-existent.

Well, mostly harmless.
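
The arithmetic above can be checked mechanically. A small sketch, using
the powers of two from the text (2^-30 soft error rate per bit, 2^30
bits, 2^20 erase cycles) and the same assumption that coincident bit
errors are independent. Strictly speaking the result is an expected
number of events over the device lifetime, which is close to the
probability when it is this small. All names are made up for
illustration.

```python
from fractions import Fraction

BIT_ERROR_RATE = Fraction(1, 2**30)  # 1-bit soft error rate per bit written
FLASH_BITS = 2**30                   # 128MiB device
ERASE_CYCLES = 2**20                 # about 1 million cycles

# Every bit is written once per erase cycle: 2^50 bit writes in total.
TOTAL_BIT_WRITES = FLASH_BITS * ERASE_CYCLES


def lifetime_chance(bits_in_error):
    """Expected number of k-bit errors over the lifetime of one flash."""
    # k independent flips coinciding: rate^k per bit written.
    return TOTAL_BIT_WRITES * BIT_ERROR_RATE ** bits_in_error


print(lifetime_chance(2))  # 1/1024, i.e. 2^-10: roughly one flash in 1000
print(lifetime_chance(3))  # 1/1099511627776, i.e. 2^-40
```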



What does that mean for this discussion?  On Toshiba's NAND flashes,
according to their claims, jffs2 checksums won't catch any errors that
wouldn't already be caught either by ECC or during write/erase.

Am I wrong?

Jörn

-- 
Measure. Don't tune for speed until you've measured, and even then
don't unless one part of the code overwhelms the rest.
-- Rob Pike

* Re: JFFS3 & performance
  2005-01-12 17:37       ` Thomas Gleixner
@ 2005-01-12 18:17         ` Jörn Engel
  0 siblings, 0 replies; 196+ messages in thread
From: Jörn Engel @ 2005-01-12 18:17 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: David Woodhouse, MTD List

On Wed, 12 January 2005 18:37:00 +0100, Thomas Gleixner wrote:
> On Wed, 2005-01-12 at 17:59 +0100, Jörn Engel wrote:
> > Ok, you completely ignore the central issue.  Looks like I was unclear
> > about it.
> > 
> > What happens, if crucial data in flash gets corrupted and is unusable?
> > Say, /sbin/init.
> 
> I totaly agree that it must be sure that stuff does not get corrupted. 
> 
> We use MTD for embedded systems since long and I'm well aware that data
> might be corrupted. So most of the systems have a ro partition for
> crucial stuff which only is mounted rw for updates under safe
> conditions. The application data is living on a different partition.
> 
> For critical systems we have a backup emergency root, which we can
> activate when a update smashes the normal rootfs. There are other
> userspace possibilities to take care of backups and fallback.
> 
> So I don't see the point of an enforced mirror in MTD which will bring
> more complexity than it is worth, but I might have missed some points.

See my last mail in this thread for the NAND analysis.  It basically states
that data corruption is nothing we have to worry about a lot.  Maybe a
little, depending on your preferred reliability.

Same goes for NOR, DRAM and hard drives, so I feel pretty safe.

Now, if some El-Cheapo manufacturer would come along and create an
ultra-cheap and ultra-unreliable new type of flash, we might need to
reconsider something like the mirroring, but only for this particular
type of flash.  Let's hope this never happens.

So again, I don't care too much, whether jffs2 can scavenge some bytes
of data from a rotten block, simply because this never happens (for
all practical purposes).  If it's simple enough, sure, go ahead.  If
it's complicated and ugly, please don't.

Jörn

-- 
It's just what we asked for, but not what we want!
-- anonymous

* Re: JFFS3 & performance
  2005-01-12 18:10     ` Jörn Engel
@ 2005-01-12 18:27       ` Thomas Gleixner
  2005-01-12 18:40         ` Jörn Engel
  2005-01-12 18:33       ` Artem B. Bityuckiy
  1 sibling, 1 reply; 196+ messages in thread
From: Thomas Gleixner @ 2005-01-12 18:27 UTC (permalink / raw)
  To: Jörn Engel; +Cc: MTD List, David Woodhouse

On Wed, 2005-01-12 at 19:10 +0100, Jörn Engel wrote:

> Ok, let me distinguish between the different problems:
>   Soft errors
>
> In other words, one out of 1000 flashes will have a non-recoverable
> error sometime during it's life cycle.  Doesn't exactly make me happy,
> but it's not horrible either.  In cases where little data is written
> to flash or other components (cpu, power regulators) die before the
> flash does, this is even less of an issue.
>
> What does that mean for this discussion?  On Toshiba's NAND flashes,
> according to their claims, jffs2 checksums won't catch any errors that
> wouldn't already be caught either by ECC or during write/erase.
> 
> Am I wrong?

No, using Reed-Solomon codes with a hardware decoder/encoder can
improve this significantly (3 - 12 bits depending on the
implementation), so your error probability goes to near 0.
Hardware encoders/decoders are easy to build into an FPGA/CPLD. There are
public VHDL sources available. The speed advantage over software ECC is
significant.

Missing error

There is another type of error, which came up lately with AND FLASH. If
you leave a block untouched in a range of blocks and erase/program the
surrounding blocks, then the untouched block is likely to have bitflips.
GC's tendency to collect clean blocks from time to time will cover this, but
if you have a big number of clean blocks you might get into trouble.

A strong argument for using separate partitions for static and dynamic
data.

tglx

* Re: JFFS3 & performance
  2005-01-12 18:10     ` Jörn Engel
  2005-01-12 18:27       ` Thomas Gleixner
@ 2005-01-12 18:33       ` Artem B. Bityuckiy
  2005-01-12 18:43         ` Jörn Engel
  1 sibling, 1 reply; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-12 18:33 UTC (permalink / raw)
  To: Jörn Engel; +Cc: David Woodhouse, MTD List

Joern,

> Your ability to come up with excellent papers is astounding!
Thanks :-)

>   Blocks that fail during use
> Quote: "Therefore, blocks should be marked as bad and no longer
> accessed if there is either a block erase failure or a page program
> failure."
>
> During erase, by definition those blocks don't hold crucial data.  Not
> a problem.  Page program is slightly worse, but it only means that we
> have to program a different block instead.  Make sure you don't use
> the partial programming thing they hinted at and no crucial data is
> lost.  Again, harmless.
>
>
>   Permanent Failure
> Those can be noticed during either erase or program, so the above
> applies.  Harmless.
Hmm. I am thinking about the intermediate stage. I mean that a block is
good at one moment, and it is bad at another moment. But *when* does it
become bad? Is this only during erase? If so, you're probably right. But
I'm not sure. Is there some stage when a block still contains crucial data,
but is already bad? If there is, the CRC will catch this.

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

* Re: JFFS3 & performance
  2005-01-12 18:27       ` Thomas Gleixner
@ 2005-01-12 18:40         ` Jörn Engel
  2005-01-12 18:42           ` David Woodhouse
                             ` (2 more replies)
  0 siblings, 3 replies; 196+ messages in thread
From: Jörn Engel @ 2005-01-12 18:40 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: MTD List, David Woodhouse

On Wed, 12 January 2005 19:27:59 +0100, Thomas Gleixner wrote:
> 
> Missing error
> 
> There is another type of error, which came up lately with AND FLASH. If
                                                           ^^
You mean NAND, right?

> you leave a block untouched in a range of blocks and erase/program the
> surrounding blocks, then the untouched block is likely to have bitflips.
> GC likeliness to GC clean blocks from time to time will cover this, but
> if you have a big number of clean blocks you might get into trouble.
> 
> A strong argument for using seperate partitions for static and dynamic
> data.

Wouldn't that also require a DMZ of some size between static and
dynamic data?  For example, 4MB static, 1MB unused, rest dynamic?

Jörn

-- 
If you're willing to restrict the flexibility of your approach,
you can almost always do something better.
-- John Carmack

* Re: JFFS3 & performance
  2005-01-12 18:40         ` Jörn Engel
@ 2005-01-12 18:42           ` David Woodhouse
  2005-01-12 18:43           ` Artem B. Bityuckiy
  2005-01-12 19:16           ` Thomas Gleixner
  2 siblings, 0 replies; 196+ messages in thread
From: David Woodhouse @ 2005-01-12 18:42 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Thomas Gleixner, MTD List

On Wed, 2005-01-12 at 19:40 +0100, Jörn Engel wrote:
> 
> > There is another type of error, which came up lately with AND FLASH.
> If
>                                                            ^^
> You mean NAND, right?

No, AND. 
http://www.renesas.com/fmwk.jsp?cnt=ag_and_flash_memory_root.jsp&fp=/products/memory/ag_and_flash_memory/&site=i

-- 
dwmw2

* Re: JFFS3 & performance
  2005-01-12 18:33       ` Artem B. Bityuckiy
@ 2005-01-12 18:43         ` Jörn Engel
  2005-01-12 18:45           ` Artem B. Bityuckiy
  0 siblings, 1 reply; 196+ messages in thread
From: Jörn Engel @ 2005-01-12 18:43 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: David Woodhouse, MTD List

On Wed, 12 January 2005 18:33:33 +0000, Artem B. Bityuckiy wrote:
>
> Hmm. I think about the intermediate stage. I mean that block is good at 
> one moment, and it is bad at another moment. But *when* it become bad? Is 
> this only during erase? If so, you're probably right. But I'm not sure. Is 
> there some stage when block still contain crucial data, but is already 
> bad? If it is, CRC will cach this.

According to the paper, all but soft errors only occur during erase
and write.  So we're safe ...

... or Toshiba lied to us.

Jörn

-- 
The only real mistake is the one from which we learn nothing.
-- John Powell

* Re: JFFS3 & performance
  2005-01-12 18:40         ` Jörn Engel
  2005-01-12 18:42           ` David Woodhouse
@ 2005-01-12 18:43           ` Artem B. Bityuckiy
  2005-01-12 19:16           ` Thomas Gleixner
  2 siblings, 0 replies; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-12 18:43 UTC (permalink / raw)
  To: Jörn Engel; +Cc: MTD List, Thomas Gleixner, David Woodhouse

On Wed, 12 Jan 2005, Jörn Engel wrote:

> On Wed, 12 January 2005 19:27:59 +0100, Thomas Gleixner wrote:
> > 
> > Missing error
> > 
> > There is another type of error, which came up lately with AND FLASH. If
>                                                            ^^
> You mean NAND, right?
Think AG-AND flashes. I don't know much about this technology either.

> 
> > you leave a block untouched in a range of blocks and erase/program the
> > surrounding blocks, then the untouched block is likely to have bitflips.
> > GC likeliness to GC clean blocks from time to time will cover this, but
> > if you have a big number of clean blocks you might get into trouble.
> > 
> > A strong argument for using seperate partitions for static and dynamic
> > data.
> 
> Wouldn't that also require a DMZ of some size between static and
> dynamic data?  For example, 4MB static, 1MB unused, rest dynamic?
> 
> Jörn
> 
> -- 
> If you're willing to restrict the flexibility of your approach,
> you can almost always do something better.
> -- John Carmack
> 

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

* Re: JFFS3 & performance
  2005-01-12 18:43         ` Jörn Engel
@ 2005-01-12 18:45           ` Artem B. Bityuckiy
  2005-01-12 18:58             ` Artem B. Bityuckiy
  0 siblings, 1 reply; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-12 18:45 UTC (permalink / raw)
  To: Jörn Engel; +Cc: David Woodhouse, MTD List

On Wed, 12 Jan 2005, Jörn Engel wrote:

> On Wed, 12 January 2005 18:33:33 +0000, Artem B. Bityuckiy wrote:
> >
> > Hmm. I think about the intermediate stage. I mean that block is good at 
> > one moment, and it is bad at another moment. But *when* it become bad? Is 
> > this only during erase? If so, you're probably right. But I'm not sure. Is 
> > there some stage when block still contain crucial data, but is already 
> > bad? If it is, CRC will cach this.
> 
> According to the paper, all but soft errors only occur during erase
> and write.  So we're safe ...
> 
> ... or Toshiba lied to us.
> 
Still not sure. This is not stated explicitly...
To be honest, I would be happy if you are right - this would simplify
things.

> Jörn
> 
> -- 
> The only real mistake is the one from which we learn nothing.
> -- John Powell
> 

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

* Re: JFFS3 & performance
  2005-01-12 18:45           ` Artem B. Bityuckiy
@ 2005-01-12 18:58             ` Artem B. Bityuckiy
  2005-01-12 19:50               ` Jörn Engel
  0 siblings, 1 reply; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-12 18:58 UTC (permalink / raw)
  To: Jörn Engel; +Cc: MTD List, David Woodhouse

Anyway, guys, this is a *crucial* question. It would be nice to come to a
consensus :-) I hope we are not discussing for discussion's sake :-) I hope
we want to get to the truth :-) ?

Could we summarize?

Does everybody agree that CRC is needed at least to detect problems *in
time* and report them?

If all agree, the question "do we need CRC for purposes other than
detecting unclean reboots?" may be closed. Is it?

The open question: do we need to try to recover data in case of errors?
For NOR this question is closed - errors are critical there.
For NAND - is it still open? Does anybody think we should recover?

By "recover" I mean trying to move the uncorrupted data from the block
which has become bad to another, good block.
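
To illustrate what such a recovery could look like, here is a minimal
sketch. It assumes every node carries its own checksum; the
(payload, checksum) pair and all function names are hypothetical and are
not the JFFS2 on-flash format or its code.

```python
import zlib


def make_node(payload: bytes):
    """Build a (payload, checksum) pair; illustrative layout only."""
    return payload, zlib.crc32(payload)


def salvage(block):
    """Split a block's nodes into those that still verify and those that don't."""
    good, corrupt = [], []
    for payload, stored_crc in block:
        if zlib.crc32(payload) == stored_crc:
            good.append((payload, stored_crc))
        else:
            corrupt.append((payload, stored_crc))
    return good, corrupt


def recover(bad_block, good_block):
    """Copy verifiable nodes into good_block; return how many nodes were lost."""
    good, corrupt = salvage(bad_block)
    good_block.extend(good)
    return len(corrupt)


block = [make_node(b"inode data"), make_node(b"dirent")]
# Simulate corruption of the first node's payload on the medium.
block[0] = (b"inode dat\x00", block[0][1])

target = []
lost = recover(block, target)
print(lost, [payload for payload, _ in target])
```

The count returned by recover() is what would have to be reported; the
surviving nodes end up in the good block and the old block can then be
marked bad.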

Comments from anybody, please.
Thanks.

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

* Re: JFFS3 & performance
  2005-01-12 18:40         ` Jörn Engel
  2005-01-12 18:42           ` David Woodhouse
  2005-01-12 18:43           ` Artem B. Bityuckiy
@ 2005-01-12 19:16           ` Thomas Gleixner
  2005-01-12 19:44             ` Jörn Engel
  2 siblings, 1 reply; 196+ messages in thread
From: Thomas Gleixner @ 2005-01-12 19:16 UTC (permalink / raw)
  To: Jörn Engel; +Cc: David Woodhouse, MTD List

On Wed, 2005-01-12 at 19:40 +0100, Jörn Engel wrote:
> On Wed, 12 January 2005 19:27:59 +0100, Thomas Gleixner wrote:
> > 
> > Missing error
> > 
> > There is another type of error, which came up lately with AND FLASH. If
>                                                            ^^
> You mean NAND, right?

No, I mean AND. AND is similar to NAND, it has the same interface, but a
different cell technology.

http://www.renesas.com/fmwk.jsp?cnt=ag_and_flash_memory_root.jsp&fp=/products/memory/ag_and_flash_memory/

> > you leave a block untouched in a range of blocks and erase/program the
> > surrounding blocks, then the untouched block is likely to have bitflips.
> > GC likeliness to GC clean blocks from time to time will cover this, but
> > if you have a big number of clean blocks you might get into trouble.
> > 
> > A strong argument for using seperate partitions for static and dynamic
> > data.
> 
> Wouldn't that also require a DMZ of some size between static and
> dynamic data?  For example, 4MB static, 1MB unused, rest dynamic?

No, it does not propagate across groups of blocks. E.g. inside of 0-255,
256-511, ... you have the trouble, but not between those regions.
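
One way to picture this is a per-block disturb counter that only counts
activity inside the block's own group. This is a rough sketch, not
existing MTD or JFFS2 code: the group size of 256 is taken from the
example above, the threshold is a made-up number, and the class and
method names are hypothetical.

```python
GROUP_SIZE = 256          # blocks per group, from the 0-255, 256-511 example
REFRESH_THRESHOLD = 1000  # made-up value; a real one would come from the vendor


class DisturbTracker:
    """Count how often each block's group neighbours were rewritten."""

    def __init__(self, num_blocks):
        self.disturb = [0] * num_blocks

    def note_rewrite(self, block):
        """Record that `block` was erased and programmed again."""
        start = (block // GROUP_SIZE) * GROUP_SIZE
        end = min(start + GROUP_SIZE, len(self.disturb))
        for other in range(start, end):
            if other != block:
                self.disturb[other] += 1  # same group, so it gets disturbed
        self.disturb[block] = 0           # freshly written, count restarts

    def needs_refresh(self):
        """Blocks that sat untouched through too many neighbour rewrites."""
        return [b for b, n in enumerate(self.disturb) if n >= REFRESH_THRESHOLD]
```

A GC pass could pick its next victim from needs_refresh() even if the
block is otherwise clean; blocks in other groups never show up there.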

tglx

* Re: JFFS3 & performance
  2005-01-12 19:16           ` Thomas Gleixner
@ 2005-01-12 19:44             ` Jörn Engel
  2005-01-12 19:53               ` Thomas Gleixner
  0 siblings, 1 reply; 196+ messages in thread
From: Jörn Engel @ 2005-01-12 19:44 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: David Woodhouse, MTD List

On Wed, 12 January 2005 20:16:41 +0100, Thomas Gleixner wrote:
> On Wed, 2005-01-12 at 19:40 +0100, Jörn Engel wrote:
> > 
> > Wouldn't that also require a DMZ of some size between static and
> > dynamic data?  For example, 4MB static, 1MB unused, rest dynamic?
> 
> No it does not propagate over blocks of blocks. e.g. inside of 0-255,
> 256-511, ... you have the trouble, but not between those regions.

Ah, ok.  So they simply decided to pack the blocks inside those groups
tighter.  Another way to handle it would be to treat those block
groups as jffs2 eraseblocks.  Should be quite similar to NOR block
sizes then.

Jörn

-- 
The story so far:
In the beginning the Universe was created.  This has made a lot
of people very angry and been widely regarded as a bad move.
-- Douglas Adams

* Re: JFFS3 & performance
  2005-01-12 18:58             ` Artem B. Bityuckiy
@ 2005-01-12 19:50               ` Jörn Engel
  0 siblings, 0 replies; 196+ messages in thread
From: Jörn Engel @ 2005-01-12 19:50 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: MTD List, David Woodhouse

On Wed, 12 January 2005 18:58:18 +0000, Artem B. Bityuckiy wrote:
> 
> Could we summarize?
> 
> Does Everybody agree that CRC is needed at least to detect problems *on 
> time* and report about them ?

s/CRC/checksum/

Agreed, with that change.

> If all agree, the question about "do we need CRC for purpose other then 
> detect unclean reboots?" may be closed. Is it?
> 
> The open question: do we need trying recover data in case of errors?
> For NOR this question is close - errors are critical there.
> For NAND - is it still open? Does somebody thinks we should recover?
> 
> Saying recover I mean trying to move not corrupted data from the block 
> which have become bad to another, good block.

If it's simple enough, sure.

Imo, it is more important to *report* and *remember* this problem,
though.  If such a problem ever occurs and it's not just a short
write, but real data corruption on the medium, the device is broken.
Trying to recover is fine, but staying silent definitely isn't.

Plus, we should note this information somewhere, preferably in every
single erase block.  On every new mount, it should be clear to anyone
possibly watching that the medium needs urgent replacement.

Jörn

-- 
It does not matter how slowly you go, so long as you do not stop.
-- Confucius

* Re: JFFS3 & performance
  2005-01-12 19:44             ` Jörn Engel
@ 2005-01-12 19:53               ` Thomas Gleixner
  2005-01-12 20:06                 ` Jörn Engel
  0 siblings, 1 reply; 196+ messages in thread
From: Thomas Gleixner @ 2005-01-12 19:53 UTC (permalink / raw)
  To: Jörn Engel; +Cc: David Woodhouse, MTD List

On Wed, 2005-01-12 at 20:44 +0100, Jörn Engel wrote:
> On Wed, 12 January 2005 20:16:41 +0100, Thomas Gleixner wrote:
> > On Wed, 2005-01-12 at 19:40 +0100, Jörn Engel wrote:
> > > 
> > > Wouldn't that also require a DMZ of some size between static and
> > > dynamic data?  For example, 4MB static, 1MB unused, rest dynamic?
> > 
> > No it does not propagate over blocks of blocks. e.g. inside of 0-255,
> > 256-511, ... you have the trouble, but not between those regions.
> 
> Ah, ok.  So they simply decided to pack the blocks inside those groups
> tighter.  Another way to handle it would be to treat those block
> groups as jffs2 eraseblocks.  Should be quite similar to NOR blocks
> sizes then.

Yes, but then we need the already discussed change in block handling and
accounting, which allows us to handle blocks which consist of multiple
subblocks, where one or more of the subblocks can be bad.

The current code treats blocks with multiple subblocks in a way, that
when one of the subblocks is or becomes bad, the whole block is treated
as bad.

So you mark e.g. 128MB bad, because one 16K sub block is bad. This is
extra annoying on AG-AND where the bad block rate can be quite high.
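
A minimal sketch of the per-subblock accounting described above, assuming a
hypothetical bitmap per block group. None of these names exist in the
current code; they are made up purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: a group of up to 64 physical sub-blocks handled as one
 * logical eraseblock. Not an existing JFFS2 structure. */
struct blockgroup {
	uint32_t nr_sub;	/* number of sub-blocks in the group (<= 64) */
	uint32_t sub_size;	/* bytes per sub-block */
	uint64_t bad_map;	/* bit i set => sub-block i is bad */
};

/* Mark a single sub-block bad without retiring the whole group. */
static void bg_mark_bad(struct blockgroup *bg, uint32_t idx)
{
	assert(idx < bg->nr_sub);
	bg->bad_map |= (uint64_t)1 << idx;
}

static int bg_is_bad(const struct blockgroup *bg, uint32_t idx)
{
	return (int)((bg->bad_map >> idx) & 1);
}

/* Bytes still usable in the group once bad sub-blocks are skipped. */
static uint64_t bg_usable_bytes(const struct blockgroup *bg)
{
	uint64_t good = 0;
	uint32_t i;

	for (i = 0; i < bg->nr_sub; i++)
		if (!bg_is_bad(bg, i))
			good++;
	return good * bg->sub_size;
}
```

With eight 16K sub-blocks and one of them marked bad, bg_usable_bytes()
reports seven sub-blocks' worth of space instead of zero.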

tglx

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-12 19:53               ` Thomas Gleixner
@ 2005-01-12 20:06                 ` Jörn Engel
  0 siblings, 0 replies; 196+ messages in thread
From: Jörn Engel @ 2005-01-12 20:06 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: David Woodhouse, MTD List

On Wed, 12 January 2005 20:53:21 +0100, Thomas Gleixner wrote:
> 
> Yes, but then we need the already discussed change in block handling and
> accounting, which allows us to handle blocks which consist of multiple
> subblocks, where one or more of the subblocks can be bad.
> 
> The current code treats blocks with multiple subblocks in a way, that
> when one of the subblocks is or becomes bad, the whole block is treated
> as bad.
> 
> So you mark e.g. 128MB bad, because one 16K sub block is bad. This is
> extra annoying on AG-AND where the bad block rate can be quite high.

Ouch!  Well, combining blocks to groups makes sense in other scenarios
as well.  With huge flashes, the number of blocks can also grow quite
large.  And I wouldn't want to introduce rb_tree structures for the
blocks.

Hard problem, though.

Jörn

-- 
When I am working on a problem I never think about beauty.  I think
only how to solve the problem.  But when I have finished, if the
solution is not beautiful, I know it is wrong.
-- R. Buckminster Fuller

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-12 17:14       ` Artem B. Bityuckiy
@ 2005-01-12 22:30         ` Jared Hulbert
  2005-01-12 22:43           ` Josh Boyer
                             ` (2 more replies)
  0 siblings, 3 replies; 196+ messages in thread
From: Jared Hulbert @ 2005-01-12 22:30 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: David Woodhouse, MTD List

> > Clarification on NOR technology.  Remember that the ability to run
> > code XIP is effectively a requirement for a NOR chip.  This means no
> > read errors can leave the chip.  I don't see this changing in the
> > foreseeable future.  Any read errors that do occur would probably be
> > caused by a failed/incomplete program
> >
> > We probably do want to be able to easily retire, reprogram, and/or
> > test those blocks/pages that get read errors.  That would have to be
> > done in the filesystem and the chip driver needs to be able report a
> > read error occured.
> 
> What would you conclude from this (in the context of disscussion)? That
> CRC *must* be *always* checked in case of NOR?

Retiring blocks/pages was an idea for the more lossy flash NAND, AND
etc.  If your media goes bad with NOR you probably can't boot anyway.

I'm thinking the opposite conclusion.  If I understand this correctly
most CRC's on NOR are wasted effort.  I don't claim to quite
understand JFFS2 architecture yet but it seems to me the data CRC's
are not needed for NOR, perhaps some of the other CRC's are not needed
as well.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-12 22:30         ` Jared Hulbert
@ 2005-01-12 22:43           ` Josh Boyer
  2005-01-12 22:55             ` Jared Hulbert
  2005-01-13  7:54             ` David Woodhouse
  2005-01-13  8:25           ` Artem B. Bityuckiy
  2005-01-13 15:09           ` Jörn Engel
  2 siblings, 2 replies; 196+ messages in thread
From: Josh Boyer @ 2005-01-12 22:43 UTC (permalink / raw)
  To: Jared Hulbert; +Cc: MTD List, David Woodhouse

On Wed, 2005-01-12 at 16:30, Jared Hulbert wrote:
> > > Clarification on NOR technology.  Remember that the ability to run
> > > code XIP is effectively a requirement for a NOR chip.  This means no
> > > read errors can leave the chip.  I don't see this changing in the
> > > foreseeable future.  Any read errors that do occur would probably be
> > > caused by a failed/incomplete program
> > >
> > > We probably do want to be able to easily retire, reprogram, and/or
> > > test those blocks/pages that get read errors.  That would have to be
> > > done in the filesystem and the chip driver needs to be able report a
> > > read error occured.
> > 
> > What would you conclude from this (in the context of disscussion)? That
> > CRC *must* be *always* checked in case of NOR?
> 
> Retiring blocks/pages was an idea for the more lossy flash NAND, AND
> etc.  If your media goes bad with NOR you probably can't boot anyway.

That's not true.  Eraseblocks can go bad during operation.  That doesn't
mean that the whole device returns bad data.

> 
> I'm thinking the opposite conclusion.  If I understand this correctly
> most CRC's on NOR are wasted effort.  I don't claim to quite
> understand JFFS2 architecture yet but it seems to me the data CRC's
> are not needed for NOR, perhaps some of the other CRC's are not needed
> as well.

CRCs are needed.  Or rather, some form of checksum is needed.  Bits flip
during operation on NOR as well.  I've seen it happen.  It's rare, but
as David put it in an IRC conversation "it's a sanity check on the
hardware".

There's sort of multiple threads on this topic, so maybe check some of
those.  We even got Joern to agree they're needed :).

josh

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-12 22:43           ` Josh Boyer
@ 2005-01-12 22:55             ` Jared Hulbert
  2005-01-13 15:50               ` Josh Boyer
  2005-01-13  7:54             ` David Woodhouse
  1 sibling, 1 reply; 196+ messages in thread
From: Jared Hulbert @ 2005-01-12 22:55 UTC (permalink / raw)
  To: Josh Boyer; +Cc: MTD List, David Woodhouse

> That's not true.  Eraseblocks can go bad during operation.  That doesn't
> mean that the whole device returns bad data.

Sure, during an erase...  How does a checksum help you here?

> CRCs are needed.  Or rather, some form of checksum is needed.  Bits flip
> during operation on NOR as well.  I've seen it happen.  It's rare, but
> as David put it in an IRC conversation "it's a sanity check on the
> hardware".
>
> There's sort of multiple threads on this topic, so maybe check some of
> those.  We even got Joern to agree they're needed :).

I respectfully disagree.  I don't think checksums are needed to
protect you from NOR read errors *unless* the checksums are the only
thing protecting the filesystem from bad things like crashes, power
failures, and bugs.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-12 22:43           ` Josh Boyer
  2005-01-12 22:55             ` Jared Hulbert
@ 2005-01-13  7:54             ` David Woodhouse
  1 sibling, 0 replies; 196+ messages in thread
From: David Woodhouse @ 2005-01-13  7:54 UTC (permalink / raw)
  To: Josh Boyer; +Cc: MTD List

On Wed, 2005-01-12 at 16:43 -0600, Josh Boyer wrote:
> CRCs are needed.  Or rather, some form of checksum is needed.  Bits flip
> during operation on NOR as well.  I've seen it happen.  It's rare, but
> as David put it in an IRC conversation "it's a sanity check on the
> hardware".

More than just the hardware -- it's not just bit flips. The name CRC is
often the canary which warns of corruption due to the arch having broken
fixups for misaligned loads/stores, for example.

Shit happens. You can either work that out for yourself, or just blindly
trust everything on the medium and end up with inconsistent data
structures and entirely bizarre and uninterpretable failure modes. 

-- 
dwmw2

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-12 22:30         ` Jared Hulbert
  2005-01-12 22:43           ` Josh Boyer
@ 2005-01-13  8:25           ` Artem B. Bityuckiy
  2005-01-13 15:09           ` Jörn Engel
  2 siblings, 0 replies; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-13  8:25 UTC (permalink / raw)
  To: Jared Hulbert; +Cc: David Woodhouse, MTD List

> I'm thinking the opposite conclusion.  If I understand this correctly
> most CRC's on NOR are wasted effort.  I don't claim to quite
> understand JFFS2 architecture yet but it seems to me the data CRC's
> are not needed for NOR, perhaps some of the other CRC's are not needed
> as well.
Hmm.

Suppose you have some device with NOR flash. That NOR flash stores your
files (a flash file system runs there, e.g. JFFS2). Suppose one of your
flash blocks becomes bad.

1. Scenario 1: You do not have any checksum, or you do not check it.
One of the network-related libraries is loaded corrupted. It runs and
starts sending private keys of your corporate clients to random addresses
(if your device is some router/switch etc.).
Lots of nasty things may happen.

2. Scenario 2: The filesystem detects the CRC error and realizes that this
error is due to media corruption. It prints an error message and returns
-EIO to the calling task which loads your library.

Yes, in both cases you throw your device/NOR flash in the rubbish bin, but
without a checksum you may end up with much more serious consequences.

P.S. But JFFS2 does not behave as I described in scenario 2. It just
ignores the bad node, prints the error message, but still returns OK. So
you still might hand private keys or other confidential information
to potential plotters. But JFFS2 is not supposed to be ideal :-)

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-11 21:51 ` Jörn Engel
  2005-01-12  0:06   ` Thomas Gleixner
  2005-01-12  9:15   ` Artem B. Bityuckiy
@ 2005-01-13 14:49   ` Artem B. Bityuckiy
  2005-01-13 15:05     ` Artem B. Bityuckiy
  2 siblings, 1 reply; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-13 14:49 UTC (permalink / raw)
  To: Jörn Engel; +Cc: David Woodhouse, MTD List

> > P.S. By the way, we could put CRCs at the end of blocks (*after* data) 
> > in this case CRC well be extremely strong detecting unclean reboots, 
> > isn't it?
>
> Interesting idea.  Will make the code slightly messy, but it should be
> worth it.
Now I like this idea more.

See the benefits:
1. We are able to use checksums weaker than CRC32 and still be good at
detecting nodes that are wrong due to an unclean reboot.
2. We have the ability, in principle, to distinguish between nodes
corrupted due to flash problems and nodes corrupted due to unclean reboots.
It is extremely good to be able to distinguish these. (JFFS2 has no such
ability.)

Examples:
	a. We will not print frightful messages like "CRC ERROR!!!!
You're going to die!" and will not make users worry if the error was
caused by an unclean reboot. Unclean reboots are quite a frequent thing.
	b. When we do iget() on a file and see corrupted nodes, we just
ignore them if they are due to an unclean reboot. But we return -EIO if
they are due to flash corruption. Currently JFFS2 happily ignores corrupted
nodes and keeps working with the corrupted file. That is no good.
It is better, for example, to reject opening a corrupted /lib/libc.o than
to open it anyway.
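
The iget() policy from example b could be sketched roughly like this. It is
a user-space illustration only; the node states and the function name are
assumptions and do not correspond to anything in JFFS2.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-node classification; JFFS2 has nothing like it. */
enum node_state {
	NODE_OK,
	NODE_UNCLEAN_REBOOT,	/* write was interrupted, node never valid */
	NODE_MEDIA_CORRUPT	/* node was valid once, the medium damaged it */
};

/*
 * Decide whether an inode may be opened.
 * Nodes torn by an unclean reboot are skipped silently; a single node
 * damaged by the medium makes the whole inode fail with -EIO.
 * On success, *valid holds the number of usable nodes.
 */
static int inode_check_nodes(const enum node_state *nodes, size_t n,
			     size_t *valid)
{
	size_t i, ok = 0;

	assert(valid != NULL);
	for (i = 0; i < n; i++) {
		switch (nodes[i]) {
		case NODE_OK:
			ok++;
			break;
		case NODE_UNCLEAN_REBOOT:
			break;		/* ignore, no scary message */
		case NODE_MEDIA_CORRUPT:
			return -EIO;	/* refuse to hand out bad data */
		}
	}
	*valid = ok;
	return 0;
}
```

The point of the split is that only the second kind of error reaches the
caller, so a torn write after power loss stays invisible to the user.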

Thoughts?

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-13 14:49   ` Artem B. Bityuckiy
@ 2005-01-13 15:05     ` Artem B. Bityuckiy
  2005-01-13 15:17       ` Jörn Engel
  0 siblings, 1 reply; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-13 15:05 UTC (permalink / raw)
  To: Jörn Engel; +Cc: MTD List, David Woodhouse

On Thu, 13 Jan 2005, Artem B. Bityuckiy wrote:

> > > P.S. By the way, we could put CRCs at the end of blocks (*after* data) 
> > > in this case CRC well be extremely strong detecting unclean reboots, 
> > > isn't it?
> >
> > Interesting idea.  Will make the code slightly messy, but it should be
> > worth it.
> Now I like this Idea more.
> 
> See the benefits:
> 1. We are able to use weaker then CRC32 checksums and still be good in 
> detecting wrong due to unclean reboot nodes.
> 2. We have principal ability to distingush between nodes corrupted due to 
> flash problems and due to unclean reboots.
> This is extreemly good do be able to distinguish. (JFFS2 has no such 
> ability). 
> 
> Examples:
> 	a. We will not print frightfull messges like "CRC ERROR!!!! 
> You'ra going to die!" and will not make users worry if the error was 
> cased by unclean reboot. Unclean reboots is qute frequent thing.
> 	b. When do iget() on file and see corrupted nodes, we just ignore 
> them if they are due to unclean reboot. But we return -EIO if they are due 
> to flash corruptions. Currently JFFS2 happily ignores corrupted nodes and 
> still keeps working with the corrupted file. That is no good.
> It is better, for example, to regect opening corrupted /lib/libc.o then 
> just open but corrupted file.
> 
> Thoughts?
Hmm, only one small note: putting the CRC at the end does not by itself
allow us to distinguish unclean reboots from flash media corruption :-) But
we still may do this by putting some magic bitmask at the end of nodes.
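
A rough sketch of the trailing-magic idea, assuming the node is written
front to back so the magic lands last, and that unwritten flash reads as
0xff. The magic value, the layout and the toy checksum are all made up for
illustration; a torn write of the magic bytes themselves is not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NODE_END_MAGIC 0x4a33u	/* hypothetical value, not a JFFS2 constant */

enum node_verdict { NODE_GOOD, NODE_INCOMPLETE, NODE_CORRUPT };

/* Stand-in checksum for illustration only; any real one would do. */
static uint32_t toy_sum(const uint8_t *p, size_t len)
{
	uint32_t s = 0;

	while (len--)
		s = s * 31 + *p++;
	return s;
}

/* Writer side: 4-byte checksum after the payload, end magic last. */
static void node_seal(uint8_t *node, size_t payload_len)
{
	uint8_t *tail = node + payload_len;
	uint32_t sum = toy_sum(node, payload_len);

	tail[0] = (uint8_t)sum;
	tail[1] = (uint8_t)(sum >> 8);
	tail[2] = (uint8_t)(sum >> 16);
	tail[3] = (uint8_t)(sum >> 24);
	tail[4] = (uint8_t)(NODE_END_MAGIC & 0xff);
	tail[5] = (uint8_t)(NODE_END_MAGIC >> 8);
}

/* Reader side: missing magic => torn write, bad sum => media problem. */
static enum node_verdict node_classify(const uint8_t *node, size_t payload_len)
{
	assert(node != NULL);

	const uint8_t *tail = node + payload_len;
	uint32_t stored = (uint32_t)tail[0] | ((uint32_t)tail[1] << 8) |
			  ((uint32_t)tail[2] << 16) | ((uint32_t)tail[3] << 24);
	uint32_t magic = (uint32_t)tail[4] | ((uint32_t)tail[5] << 8);

	if (magic != NODE_END_MAGIC)
		return NODE_INCOMPLETE;	/* write never reached the end */
	if (stored != toy_sum(node, payload_len))
		return NODE_CORRUPT;	/* complete write, damaged contents */
	return NODE_GOOD;
}
```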

> 
> --
> Best Regards,
> Artem B. Bityuckiy,
> St.-Petersburg, Russia.
> 

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-12 22:30         ` Jared Hulbert
  2005-01-12 22:43           ` Josh Boyer
  2005-01-13  8:25           ` Artem B. Bityuckiy
@ 2005-01-13 15:09           ` Jörn Engel
  2 siblings, 0 replies; 196+ messages in thread
From: Jörn Engel @ 2005-01-13 15:09 UTC (permalink / raw)
  To: Jared Hulbert; +Cc: MTD List, David Woodhouse

On Wed, 12 January 2005 14:30:32 -0800, Jared Hulbert wrote:
> 
> Retiring blocks/pages was an idea for the more lossy flash NAND, AND
> etc.  If your media goes bad with NOR you probably can't boot anyway.
> 
> I'm thinking the opposite conclusion.  If I understand this correctly
> most CRC's on NOR are wasted effort.  I don't claim to quite
> understand JFFS2 architecture yet but it seems to me the data CRC's
> are not needed for NOR, perhaps some of the other CRC's are not needed
> as well.

I thought the same, but Josh convinced me.  NOR *should* just work and
never have problems.  Checksums *should* be a wasted effort.  But for
some reason, both the flash vendor and quality assurance messed up and
broken flashes got soldered into products.  Nasty.

So at this point, checksums are quite nice because they effectively do
the job that quality assurance should have done.

Jörn

-- 
When you close your hand, you own nothing. When you open it up, you
own the whole world.
-- Li Mu Bai in Tiger & Dragon

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-13 15:05     ` Artem B. Bityuckiy
@ 2005-01-13 15:17       ` Jörn Engel
  2005-01-13 15:22         ` Artem B. Bityuckiy
  0 siblings, 1 reply; 196+ messages in thread
From: Jörn Engel @ 2005-01-13 15:17 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: MTD List, David Woodhouse

On Thu, 13 January 2005 15:05:16 +0000, Artem B. Bityuckiy wrote:
> On Thu, 13 Jan 2005, Artem B. Bityuckiy wrote:
> 
> > See the benefits:
> > 1. We are able to use weaker then CRC32 checksums and still be good in 
> > detecting wrong due to unclean reboot nodes.
> > 2. We have principal ability to distingush between nodes corrupted due to 
> > flash problems and due to unclean reboots.
> > This is extreemly good do be able to distinguish. (JFFS2 has no such 
> > ability). 

Right, very nice attributes.

> > Thoughts?
> Hmm, only one small note, putting CRC to the end does not allow us to 
> recognize unclean reboots and flash media corruptions :-) But we still may 
> do this putting some majic bitmask to the end of nodes.

static uint32_t jffs3_checksum(...)
{
	/* 0xffffffff is what erased flash reads back, so never emit it */
	uint32_t ret = do_jffs3_checksum(...);

	if (ret == 0xffffffff)
		return 0;
	return ret;
}

When reading the node, simply compare the checksum to 0xffffffff.  If
it matches, we know that
a) the checksum is wrong and 
b) it was due to incomplete write, not flash corruption.

Adler32 has the nice property that 0xffffffff is an impossible
checksum by design already, so the above code wouldn't be necessary.
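
To illustrate the Adler32 remark and the read-side check, here is a small
user-space sketch. adler32_simple() is a plain textbook version, not zlib's
optimized one, and csum_check() and the verdict names are hypothetical. It
does not cover a checksum field that was only partially written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD   65521u	/* largest prime below 2^16 */
#define CSUM_ERASED 0xffffffffu	/* what an unwritten flash word reads as */

/* Textbook Adler-32: both halves are reduced mod 65521. */
static uint32_t adler32_simple(const uint8_t *buf, size_t len)
{
	uint32_t a = 1, b = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		a = (a + buf[i]) % ADLER_MOD;
		b = (b + a) % ADLER_MOD;
	}
	/* Each half is < 65521 < 0xffff, so the result is never 0xffffffff. */
	assert(a < ADLER_MOD && b < ADLER_MOD);
	return (b << 16) | a;
}

enum csum_verdict { CSUM_OK, CSUM_NOT_WRITTEN, CSUM_MISMATCH };

/* Hypothetical read-side check following the two cases above. */
static enum csum_verdict csum_check(uint32_t stored,
				    const uint8_t *buf, size_t len)
{
	if (stored == CSUM_ERASED)
		return CSUM_NOT_WRITTEN;	/* incomplete write */
	if (stored != adler32_simple(buf, len))
		return CSUM_MISMATCH;		/* looks like real corruption */
	return CSUM_OK;
}
```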

Jörn

-- 
...one more straw can't possibly matter...
-- Kirby Bakken

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-13 15:17       ` Jörn Engel
@ 2005-01-13 15:22         ` Artem B. Bityuckiy
  2005-01-13 15:40           ` Jörn Engel
  0 siblings, 1 reply; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-13 15:22 UTC (permalink / raw)
  To: Jörn Engel; +Cc: MTD List, David Woodhouse

On Thu, 13 Jan 2005, [iso-8859-1] Jörn Engel wrote:

> On Thu, 13 January 2005 15:05:16 +0000, Artem B. Bityuckiy wrote:
> > On Thu, 13 Jan 2005, Artem B. Bityuckiy wrote:
> > 
> > > See the benefits:
> > > 1. We are able to use weaker then CRC32 checksums and still be good in 
> > > detecting wrong due to unclean reboot nodes.
> > > 2. We have principal ability to distingush between nodes corrupted due to 
> > > flash problems and due to unclean reboots.
> > > This is extreemly good do be able to distinguish. (JFFS2 has no such 
> > > ability). 
> 
> Right, very nice attributes.
> 
> > > Thoughts?
> > Hmm, only one small note, putting CRC to the end does not allow us to 
> > recognize unclean reboots and flash media corruptions :-) But we still may 
> > do this putting some majic bitmask to the end of nodes.
> 
> static uint32_t jffs3_checksum(...)
> {
> 	static uint32_t ret = do_jffs3_checksum(...);
> 	if (ret == 0xffffffff)
> 		return 0;
> 	return ret;
> }
> 
> When reading the node, simply compare the checksum to 0xffffffff.  If
> it matches, we know that
> a) the checksum is wrong and 
> b) it was due to incomplete write, not flash corruption.
> 
> Adler32 has the nice property that 0xffffffff is an impossible
> checksum by design already, so the above code wouldn't be necessary.
Yes, I thought so too. But imagine that, due to an unclean reboot, the last
CRC was only partially written. That is the problem.
> 
> Jörn
> 
> -- 
> ...one more straw can't possibly matter...
> -- Kirby Bakken
> 

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-13 15:22         ` Artem B. Bityuckiy
@ 2005-01-13 15:40           ` Jörn Engel
  2005-01-13 15:49             ` David Woodhouse
  0 siblings, 1 reply; 196+ messages in thread
From: Jörn Engel @ 2005-01-13 15:40 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: MTD List, David Woodhouse

On Thu, 13 January 2005 15:22:38 +0000, Artem B. Bityuckiy wrote:
>
> Yes, I thought so either. But imagine due to unclean reboot the last CRC 
> was not written completely. this is the problem.

Agreed, it's hard to fully protect against such races.

Here is another design I'm not very happy about.  But it might work:

1) Checksum errors on the last node are silently ignored - most likely
   unclean reboot.

2) Last node per erase block is a special endmarker node, similar to
   cleanmarkers.

3) During unmount, an endmarker is written to the end of all
   partially-filled erase blocks.

Rule 1 makes sure we can somewhat distinguish between both problems
and don't report any false positives.  Rule 2 makes sure, we don't
miss real flash corruption on the last node, in case it occurs to the
last node of a full erase block.  Rule 3 closes the next small hole.

As long as GC doesn't obsolete/erase the old node before something else
is written behind the new, GC'd node, this should be pretty safe.  But
it's horribly complicated.  Would be much nicer if the checksum itself
could distinguish between both cases.
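
The three rules boil down to a scan like the one below. This is only a
sketch under the assumption that an endmarker is just another node written
last, so a data node followed by an endmarker is no longer "the last node".
All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum scan_result {
	SCAN_CLEAN,	/* every node verified */
	SCAN_TORN_TAIL,	/* only the last node failed: assume unclean reboot */
	SCAN_CORRUPT	/* a node with something written behind it failed */
};

/*
 * csum_ok[i] is nonzero if node i of the erase block passed its checksum.
 * Nodes are given in write order, including any endmarker.
 */
static enum scan_result scan_block(const int *csum_ok, size_t n)
{
	enum scan_result res = SCAN_CLEAN;
	size_t i;

	assert(csum_ok != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (csum_ok[i])
			continue;
		if (i == n - 1)
			res = SCAN_TORN_TAIL;	/* rule 1: ignore silently */
		else
			return SCAN_CORRUPT;	/* rules 2 and 3 apply */
	}
	return res;
}
```

Once an endmarker has been written, a failing data node can never be the
last entry, so it is always reported as corruption.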

Jörn

-- 
To recognize individual spam features you have to try to get into the
mind of the spammer, and frankly I want to spend as little time inside
the minds of spammers as possible.
-- Paul Graham

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-13 15:40           ` Jörn Engel
@ 2005-01-13 15:49             ` David Woodhouse
  2005-01-13 15:53               ` Artem B. Bityuckiy
  2005-01-13 16:13               ` Jörn Engel
  0 siblings, 2 replies; 196+ messages in thread
From: David Woodhouse @ 2005-01-13 15:49 UTC (permalink / raw)
  To: Jörn Engel; +Cc: MTD List

On Thu, 2005-01-13 at 16:40 +0100, Jörn Engel wrote:
> 
> 2) Last node per erase block is a special endmarker node, similar to
>    cleanmarkers.
> 
> 3) During unmount, an endmarker is written to the end of all
>    partially-filled erase blocks.

This ties in quite nicely with the checkpointing where such a per-block
'endmarker' would contain a summary of what the mount code would glean
by reading through that eraseblock.

-- 
dwmw2

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-12 22:55             ` Jared Hulbert
@ 2005-01-13 15:50               ` Josh Boyer
  2005-01-13 18:30                 ` Jared Hulbert
  0 siblings, 1 reply; 196+ messages in thread
From: Josh Boyer @ 2005-01-13 15:50 UTC (permalink / raw)
  To: Jared Hulbert; +Cc: MTD List, David Woodhouse

On Wed, 2005-01-12 at 16:55, Jared Hulbert wrote:
> 
> I respectfully disagree.  I don't think checksums are needed to
> protect you from NOR read errors *unless* the checksums are the only
> thing protecting the filesystem from bad things like crashes, power
> failures, and bugs.

Guess we'll have to agree to disagree then :).  All I know is that I
want to be damn sure that the data I'm returning isn't totally screwed. 
Call me paranoid.  A checksum is the only way I know of doing that.

David summed it up pretty good.  Crap happens in a multitude of ways. 
Even a bug in the write method of the filesystem could cause invalid
data to be written.  A checksum can at least help point stuff like that
out.

Poo is icky.  I'd rather try to go around it than jump in the middle of
it. :)

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-13 15:49             ` David Woodhouse
@ 2005-01-13 15:53               ` Artem B. Bityuckiy
  2005-01-13 16:13               ` Jörn Engel
  1 sibling, 0 replies; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-13 15:53 UTC (permalink / raw)
  To: David Woodhouse; +Cc: MTD List

On Thu, 13 Jan 2005, David Woodhouse wrote:

> On Thu, 2005-01-13 at 16:40 +0100, Jörn Engel wrote:
> > 
> > 2) Last node per erase block is a special endmarker node, similar to
> >    cleanmarkers.
> > 
> > 3) During unmount, an endmarker is written to the end of all
> >    partially-filled erase blocks.
> 
> This ties in quite nicely with the checkpointing where such a per-block
> 'endmarker' would contain a summary of what the mount code would glean
> by reading through that eraseblock.
Heh, really. I've forgotten about summaries. They are our markers.
> 
> -- 
> dwmw2
> 
> 

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-13 15:49             ` David Woodhouse
  2005-01-13 15:53               ` Artem B. Bityuckiy
@ 2005-01-13 16:13               ` Jörn Engel
  2005-01-13 16:16                 ` Artem B. Bityuckiy
  2005-01-13 16:21                 ` Artem B. Bityuckiy
  1 sibling, 2 replies; 196+ messages in thread
From: Jörn Engel @ 2005-01-13 16:13 UTC (permalink / raw)
  To: David Woodhouse; +Cc: MTD List

On Thu, 13 January 2005 15:49:09 +0000, David Woodhouse wrote:
> On Thu, 2005-01-13 at 16:40 +0100, Jörn Engel wrote:
> > 
> > 2) Last node per erase block is a special endmarker node, similar to
> >    cleanmarkers.
> > 
> > 3) During unmount, an endmarker is written to the end of all
> >    partially-filled erase blocks.
> 
> This ties in quite nicely with the checkpointing where such a per-block
> 'endmarker' would contain a summary of what the mount code would glean
> by reading through that eraseblock.

Sounds like "design makes sense, please send patches". :)

Would this also allow us to use adler32/adler32r as checksum
algorithm?

Jörn

-- 
Write programs that do one thing and do it well. Write programs to work
together. Write programs to handle text streams, because that is a
universal interface. 
-- Doug MacIlroy

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-13 16:13               ` Jörn Engel
@ 2005-01-13 16:16                 ` Artem B. Bityuckiy
  2005-01-13 16:21                   ` Jörn Engel
  2005-01-13 16:21                 ` Artem B. Bityuckiy
  1 sibling, 1 reply; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-13 16:16 UTC (permalink / raw)
  To: Jörn Engel; +Cc: MTD List, David Woodhouse

On Thu, 13 Jan 2005, [iso-8859-1] Jörn Engel wrote:

> On Thu, 13 January 2005 15:49:09 +0000, David Woodhouse wrote:
> > On Thu, 2005-01-13 at 16:40 +0100, Jörn Engel wrote:
> > > 
> > > 2) Last node per erase block is a special endmarker node, similar to
> > >    cleanmarkers.
> > > 
> > > 3) During unmount, an endmarker is written to the end of all
> > >    partially-filled erase blocks.
> > 
> > This ties in quite nicely with the checkpointing where such a per-block
> > 'endmarker' would contain a summary of what the mount code would glean
> > by reading through that eraseblock.
> 
> Sounds like "design makes sense, please send patches". :)
> 
> Would this also allow us to use adler32/adler32r as checksum
> algorithm?
IMHO:
Keeping CRC at the end still makes sense.

> 
> Jörn
> 
> -- 
> Write programs that do one thing and do it well. Write programs to work
> together. Write programs to handle text streams, because that is a
> universal interface. 
> -- Doug MacIlroy
> 

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-13 16:16                 ` Artem B. Bityuckiy
@ 2005-01-13 16:21                   ` Jörn Engel
  2005-01-13 16:22                     ` Artem B. Bityuckiy
  0 siblings, 1 reply; 196+ messages in thread
From: Jörn Engel @ 2005-01-13 16:21 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: MTD List, David Woodhouse

On Thu, 13 January 2005 16:16:35 +0000, Artem B. Bityuckiy wrote:
>
> IMHO:
> Keeping CRC at the end still makes sence.

Sure, but does it have to be CRC?  Wouldn't adler32 be strong enough?

Jörn

-- 
A victorious army first wins and then seeks battle.
-- Sun Tzu

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-13 16:13               ` Jörn Engel
  2005-01-13 16:16                 ` Artem B. Bityuckiy
@ 2005-01-13 16:21                 ` Artem B. Bityuckiy
  2005-01-14 13:46                   ` Jamey Hicks
  1 sibling, 1 reply; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-13 16:21 UTC (permalink / raw)
  To: Jörn Engel; +Cc: MTD List, David Woodhouse

On Thu, 13 Jan 2005, [iso-8859-1] Jörn Engel wrote:

> On Thu, 13 January 2005 15:49:09 +0000, David Woodhouse wrote:
> > On Thu, 2005-01-13 at 16:40 +0100, Jörn Engel wrote:
> > > 
> > > 2) Last node per erase block is a special endmarker node, similar to
> > >    cleanmarkers.
> > > 
> > > 3) During unmount, an endmarker is written to the end of all
> > >    partially-filled erase blocks.
> > 
> > This ties in quite nicely with the checkpointing where such a per-block
> > 'endmarker' would contain a summary of what the mount code would glean
> > by reading through that eraseblock.
> 
> Sounds like "design makes sense, please send patches". :)
Design makes sense, patches make more sense, running our tests on
different platforms makes even more sense. :-) Now we have only x86/ARM
results. Need more...

> 
> Would this also allow us to use adler32/adler32r as checksum
> algorithm?
> 
> Jörn
> 
> -- 
> Write programs that do one thing and do it well. Write programs to work
> together. Write programs to handle text streams, because that is a
> universal interface. 
> -- Doug MacIlroy
> 

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-13 16:21                   ` Jörn Engel
@ 2005-01-13 16:22                     ` Artem B. Bityuckiy
  0 siblings, 0 replies; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-13 16:22 UTC (permalink / raw)
  To: Jörn Engel; +Cc: MTD List, David Woodhouse

On Thu, 13 Jan 2005, [iso-8859-1] Jörn Engel wrote:

> On Thu, 13 January 2005 16:16:35 +0000, Artem B. Bityuckiy wrote:
> >
> > IMHO:
> > Keeping CRC at the end still makes sence.
> 
> Sure, but does it have to be CRC?  Wouldn't adler32 be stong enough?
Sorry, this is not the first time you have corrected me. Yes, I mean a
checksum in general.
> 
> Jörn
> 
> -- 
> A victorious army first wins and then seeks battle.
> -- Sun Tzu
> 

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-13 15:50               ` Josh Boyer
@ 2005-01-13 18:30                 ` Jared Hulbert
  2005-01-13 18:36                   ` David Woodhouse
                                     ` (2 more replies)
  0 siblings, 3 replies; 196+ messages in thread
From: Jared Hulbert @ 2005-01-13 18:30 UTC (permalink / raw)
  To: Josh Boyer; +Cc: MTD List, David Woodhouse

> Guess we'll have to agree to disagree then :).  All I know is that I
> want to be damn sure that the data I'm returning isn't totally screwed.
> Call me paranoid.  A checksum is the only way I know of doing that.

Paranoid :)

Checksums may reduce the chance, as Artem says,  of having your device "start
sending private keys of your corporative clients to random addresses" 
because of flash blocks going bad.  Last I checked, read errors were a
very, very improbable event with high quality NOR and besides
checksuming the filesystem won't help protecting us from gamma rays
causing such a problem once the library is in RAM;)

To humor those of us willing to take our chances trusting the media
won't go bad, would it be possible to architect JFFS3 such that
disabling the checksumming or stripping it out is possible without
too much pain?

Given the performance we get out of even the fastest checksum
algorithms proposed and tested here, it seems checksumming data that
doesn't need it would be a significant performance bottleneck.  I see
this filesystem bottleneck as a big issue when trying to get Linux to
boot really fast, such as for a cell phone.  I understand that most
cell phones have NOR that can be counted on not to have read errors
and that a single successful Linux-based phone model could, in a matter
of weeks, become the source of the vast majority of running instances
of JFFS2/3 in the universe.  Some would see that as more support for
the need for checksums, but I think it says it's worth adding a
no-checksum option to serve this potential userbase that just doesn't
need the checksums but does need the speed.

I'm not trying to drag on this discussion just for kicks.  In fact I
think it's probably not worth replying to this message, unless there
is something really fantastic and new to add (For example "By George,
he's right!").  I think we all understand each other now.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-13 18:30                 ` Jared Hulbert
@ 2005-01-13 18:36                   ` David Woodhouse
  2005-01-13 19:06                     ` Jörn Engel
  2005-01-13 19:22                     ` Josh Boyer
  2005-01-13 18:55                   ` Artem B. Bityuckiy
  2005-01-28  6:08                   ` Eric W. Biederman
  2 siblings, 2 replies; 196+ messages in thread
From: David Woodhouse @ 2005-01-13 18:36 UTC (permalink / raw)
  To: Jared Hulbert; +Cc: MTD List

On Thu, 2005-01-13 at 10:30 -0800, Jared Hulbert wrote:
> I think it says it's worth adding a no-checksum option to serve this
> potential userbase that just doesn't need the checksums but does need
> the speed.

You may be right. That doesn't make me relish the thought of providing
support to people who do that though :)

Perhaps we could make it a mount option, and that way it's easy enough
for people to turn it on if they have problems, before we'll talk to
them about it.

-- 
dwmw2

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-13 18:30                 ` Jared Hulbert
  2005-01-13 18:36                   ` David Woodhouse
@ 2005-01-13 18:55                   ` Artem B. Bityuckiy
  2005-01-13 19:10                     ` Brian Fox
  2005-01-28  6:08                   ` Eric W. Biederman
  2 siblings, 1 reply; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-13 18:55 UTC (permalink / raw)
  To: Jared Hulbert; +Cc: MTD List, David Woodhouse

Maybe something like:
#ifdef I_AM_NOT_PARANOID
static inline uint32_t
checksum(void *buf)
{
	return 0;
}

static inline int
is_checksum_correct(uint32_t checksum, void *buf)
{
	return 1;
}
#else
static inline uint32_t
checksum(void *buf)
{
	return do_checksum(JFFS3_CRC_INITIAL, buf);
}

static inline int
is_checksum_correct(uint32_t checksum, void *buf)
{
	return checksum == do_checksum(JFFS3_CRC_INITIAL, buf);
}
#endif

:-) ?

On Thu, 13 Jan 2005, Jared Hulbert wrote:
> > Guess we'll have to agree to disagree then :).  All I know is that I
> > want to be damn sure that the data I'm returning isn't totally screwed.
> > Call me paranoid.  A checksum is the only way I know of doing that.
> 
> Paranoid :)
> 
> Checksums may reduce the chance, as Artem says,  of having your device "start
> sending private keys of your corporative clients to random addresses" 
> because of flash blocks going bad.  Last I checked, read errors were a
> very, very improbable event with high quality NOR and besides
> checksuming the filesystem won't help protecting us from gamma rays
> causing such a problem once the library is in RAM;)
> 
> To humor those of us willing to take our chances trusting the media
> won't go bad, would it be possible to architect JFFS3 such that
> disabling the checksumming or stripping it out is possible with out
> too much pain?
> 
> Given the performance we get out of even the fastest checksum
> algorithms proposed and tested here, it seems checksumming data that
> doesn't need it would be a significant performance bottleneck.  I see
> this filesystem bottleneck as a big issue when trying to get Linux to
> boot really fast, such as for a cell phone.  I understand that most
> cell phones have NOR that can be counted on not to have read errors
> and that a single successful Linux based phone model could, in matter
> of weeks, become the source of the vast majority of running instances
> of JFFS2/3 in the universe.  Some would see that as more support for
> the need for checksums, but I think it says it's worth adding a
> no-checksum option to serve this potential userbase that just doesn't
> need the checksums but does need the speed.
> 
> I'm not trying to drag on this discussion just for kicks.  In fact I
> think it's probably not worth replying to this message, unless there
> is something really fantasic and new to add (For example "By George,
> he's right!").  I think we all understand each other now.
> 

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-13 18:36                   ` David Woodhouse
@ 2005-01-13 19:06                     ` Jörn Engel
  2005-01-13 19:22                     ` Josh Boyer
  1 sibling, 0 replies; 196+ messages in thread
From: Jörn Engel @ 2005-01-13 19:06 UTC (permalink / raw)
  To: David Woodhouse; +Cc: MTD List

On Thu, 13 January 2005 18:36:53 +0000, David Woodhouse wrote:
> On Thu, 2005-01-13 at 10:30 -0800, Jared Hulbert wrote:
> > I think it says it's worth adding a no-checksum option to serve this
> > potential userbase that just doesn't need the checksums but does need
> > the speed.
> 
> You may be right. That doesn't make me relish the thought of providing
> support to people who do that though :)
> 
> Perhaps we could make it a mount option, and that way it's easy enough
> for people to turn it on if they have problems, before we'll talk to
> them about it.

How about making the checksum algorithm a config option?  "return 0"
will also speed up the writes.

If this change is needed to get some million cell phones converted to
jffs3, I'm all for it.

Jörn

-- 
But this is not to say that the main benefit of Linux and other GPL
software is lower-cost. Control is the main benefit--cost is secondary.
-- Bruce Perens

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-13 18:55                   ` Artem B. Bityuckiy
@ 2005-01-13 19:10                     ` Brian Fox
  2005-01-13 19:23                       ` Artem B. Bityuckiy
  0 siblings, 1 reply; 196+ messages in thread
From: Brian Fox @ 2005-01-13 19:10 UTC (permalink / raw)
  To: Artem B. Bityuckiy, Jared Hulbert; +Cc: David Woodhouse, MTD List

On 1/13/05 10:55 AM, "Artem B. Bityuckiy" <dedekind@infradead.org> wrote:

> May be smth like:
> #ifdef I_AM_NOT_PARANOID
> static inline uint32_t
> checksum(void *buf)
> {
> return 0;
> } 

Well, David says a mount time option, which means it should be available all
the time, and on a per mount basis.

So, why not have a checksum function jump table, per filesystem, and have
all of the callers use that?
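
A rough sketch of what such a per-filesystem jump table could look like in C. Every name here (struct csum_ops, struct fs_info, fs_set_checksumming) is hypothetical and not taken from the JFFS2/JFFS3 sources, and Adler-32 is only a stand-in for whichever algorithm ends up being chosen. The length argument is also an assumption, since the snippet quoted above passes only a buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-filesystem checksum operations. */
struct csum_ops {
	uint32_t (*compute)(const void *buf, size_t len);
	int (*verify)(uint32_t expected, const void *buf, size_t len);
};

/* Plain byte-at-a-time Adler-32, used only as a stand-in algorithm. */
static uint32_t adler32_compute(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint32_t a = 1, b = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		a = (a + p[i]) % 65521u;
		b = (b + a) % 65521u;
	}
	return (b << 16) | a;
}

static int adler32_verify(uint32_t expected, const void *buf, size_t len)
{
	return expected == adler32_compute(buf, len);
}

/* "Not paranoid" variant: generate nothing, accept everything. */
static uint32_t null_compute(const void *buf, size_t len)
{
	(void)buf;
	(void)len;
	return 0;
}

static int null_verify(uint32_t expected, const void *buf, size_t len)
{
	(void)expected;
	(void)buf;
	(void)len;
	return 1;
}

static const struct csum_ops csum_adler32 = { adler32_compute, adler32_verify };
static const struct csum_ops csum_none = { null_compute, null_verify };

/* Hypothetical per-mount state; callers always go through fs->csum. */
struct fs_info {
	const struct csum_ops *csum;
};

/* Would be called while parsing mount options (or on remount). */
static void fs_set_checksumming(struct fs_info *fs, int enabled)
{
	fs->csum = enabled ? &csum_adler32 : &csum_none;
}
```

Callers would then write fs->csum->compute(buf, len) and fs->csum->verify(sum, buf, len) and never need to know which mode the filesystem was mounted in. The price is one indirect call per node.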

Thanks,

Brian
-- 
-------------------------------------------------------------------
Brian J. Fox                                       netCelerant, Inc
CTO                                             www.netCelerant.com
bfox@netCelerant.com                             Ph: (805) 275-0214
-------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-13 18:36                   ` David Woodhouse
  2005-01-13 19:06                     ` Jörn Engel
@ 2005-01-13 19:22                     ` Josh Boyer
  1 sibling, 0 replies; 196+ messages in thread
From: Josh Boyer @ 2005-01-13 19:22 UTC (permalink / raw)
  To: David Woodhouse; +Cc: MTD List

On Thu, 2005-01-13 at 12:36, David Woodhouse wrote:
> On Thu, 2005-01-13 at 10:30 -0800, Jared Hulbert wrote:
> > I think it says it's worth adding a no-checksum option to serve this
> > potential userbase that just doesn't need the checksums but does need
> > the speed.
> 
> You may be right. That doesn't make me relish the thought of providing
> support to people who do that though :)
> 
> Perhaps we could make it a mount option, and that way it's easy enough
> for people to turn it on if they have problems, before we'll talk to
> them about it.

Sure, an option would be fine with me.  Just as long as there is the
choice for those of us in the tin-foil hats :).

josh

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-13 19:10                     ` Brian Fox
@ 2005-01-13 19:23                       ` Artem B. Bityuckiy
  0 siblings, 0 replies; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-13 19:23 UTC (permalink / raw)
  To: Brian Fox; +Cc: David Woodhouse, MTD List

We might as well.
We might even make it per-file using fcntl.
(C) Josh W. Boyer: implement xattr support and use it for these purposes.

On Thu, 13 Jan 2005, Brian Fox wrote:
> On 1/13/05 10:55 AM, "Artem B. Bityuckiy" <dedekind@infradead.org> wrote:
> 
> > May be smth like:
> > #ifdef I_AM_NOT_PARANOID
> > static inline uint32_t
> > checksum(void *buf)
> > {
> > return 0;
> > } 
> 
> Well, David says a mount time option, which means it should be available all
> the time, and on a per mount basis.
> 
> So, why not have a checksum function jump table, per filesystem, and have
> all of the callers use that?
> 
> Thanks,
> 
> Brian
> -- 
> -------------------------------------------------------------------
> Brian J. Fox                                       netCelerant, Inc
> CTO                                             www.netCelerant.com
> bfox@netCelerant.com                             Ph: (805) 275-0214
> -------------------------------------------------------------------
> 
> 
> 
> 

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-13 16:21                 ` Artem B. Bityuckiy
@ 2005-01-14 13:46                   ` Jamey Hicks
  2005-01-14 14:16                     ` Artem B. Bityuckiy
  0 siblings, 1 reply; 196+ messages in thread
From: Jamey Hicks @ 2005-01-14 13:46 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: MTD List, David Woodhouse

Artem B. Bityuckiy wrote:

> Design makes sence, patches makes more sence, running our test 
> different platforms makes even more sence. :-) Now we have only 
> x86/ARM results. Need more...
>
I'm running jffs2 on blackfin uClinux.  I could run jffs3.  What kind of 
test results are you looking for?

Jamey

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-14 13:46                   ` Jamey Hicks
@ 2005-01-14 14:16                     ` Artem B. Bityuckiy
  2005-01-18 15:50                       ` Joakim Tjernlund
  0 siblings, 1 reply; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-14 14:16 UTC (permalink / raw)
  To: Jamey Hicks; +Cc: MTD List, David Woodhouse

Of course it would be tactless to tell you to read the thread from the
beginning - it is large :-))

This thread has multiple subthreads. We began by discussing how to
optimize JFFS3: we are considering using the adler32 checksum instead of
CRC32, which is stronger but much slower. There are several candidates...

There was an interesting suggestion to calculate the checksum from the end
of the buffer, since that implies better L1 cache behaviour. Then there was
an argument against this idea: it was stated that some CPUs may be able to
prefetch data forward but not backward.

So we've written a test which just times the different checksums in both
the forward and backward directions.

Here is the reference where you could get this test:
http://lists.infradead.org/pipermail/linux-mtd/2005-January/011403.html

There are also two results: for ARM and x86.

The test works in kernel space and you should compile it as a module. ARM is
a special case: you should compile the test into the kernel (or export
the system_timer structure).

So if you run the test and provide the results, that would be good. We're
interested in any architecture except x86.

If you're curious, you may read the thread from the beginning. Here it is:
http://lists.infradead.org/pipermail/linux-mtd/2004-December/011137.html
It spans from December to January and is in the archive.

Of course, any suggestions are welcome.

On Fri, 14 Jan 2005, Jamey Hicks wrote:
> Artem B. Bityuckiy wrote:
> 
> > Design makes sence, patches makes more sence, running our test 
> > different platforms makes even more sence. :-) Now we have only 
> > x86/ARM results. Need more...
> >
> I'm running jffs2 on blackfin uClinux  I could run jffs3.  What kind of 
> test results are you looking for?
> 
> Jamey
> 
> 
> 

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: JFFS3 & performance
  2005-01-14 14:16                     ` Artem B. Bityuckiy
@ 2005-01-18 15:50                       ` Joakim Tjernlund
  2005-01-19 13:07                         ` Artem B. Bityuckiy
  0 siblings, 1 reply; 196+ messages in thread
From: Joakim Tjernlund @ 2005-01-18 15:50 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: MTD List

> 
> There was interesting offer to calculate checksum from the end of buffer 
> since it implies better L1 cache behaviour. Then there was another 
> argument against this idea - it was standed that some CPUs might be 
> capable to prefetch data forward and not capable to do it backward.

Sorry for being so quiet lately, work is a bit much currently.

Another idea is to calculate the checksum while reading/writing (much like
the networking code does). That would probably require support from the MTD layer
and require that the checksum be stored at the end of the JFFS3 node.

 Jocke

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: JFFS3 & performance
  2005-01-18 15:50                       ` Joakim Tjernlund
@ 2005-01-19 13:07                         ` Artem B. Bityuckiy
  2005-01-19 15:24                           ` Jörn Engel
  0 siblings, 1 reply; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-19 13:07 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: MTD List

On Tue, 18 Jan 2005, Joakim Tjernlund wrote:

> > 
> > There was interesting offer to calculate checksum from the end of buffer 
> > since it implies better L1 cache behaviour. Then there was another 
> > argument against this idea - it was standed that some CPUs might be 
> > capable to prefetch data forward and not capable to do it backward.
> 
> Sorry for being so quiet lately, work is a bit much currently.
> 
> Another idea is to calculate the checksum while reading/writing(much like
> the networking code does). That would probably require support from the MTD layer
> and that the checksum is stored at the end of the JFFS3 node.
Hmm. How would we do this? In the case of NAND we issue a
read command and provide a buffer, and it is up to the driver/hardware how
to read. It may use DMA.

>  Jocke
> 

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-19 13:07                         ` Artem B. Bityuckiy
@ 2005-01-19 15:24                           ` Jörn Engel
  2005-01-19 15:27                             ` Artem B. Bityuckiy
  0 siblings, 1 reply; 196+ messages in thread
From: Jörn Engel @ 2005-01-19 15:24 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: MTD List, Joakim Tjernlund

On Wed, 19 January 2005 13:07:39 +0000, Artem B. Bityuckiy wrote:
> On Tue, 18 Jan 2005, Joakim Tjernlund wrote:
> 
> > Another idea is to calculate the checksum while reading/writing(much like
> > the networking code does). That would probably require support from the MTD layer
> > and that the checksum is stored at the end of the JFFS3 node.
> Hmm. How to do this? In case of NAND we issue 
> read command and provide buffer, and it is up to driver/hw how to read. 
> One may use DMA.

Something like this?

int jffs3_write_with_checksum(void *data, ...)
{
	...
	if (mtd->write_with_checksum)
		return mtd->write_with_checksum(data, ...);
	mtd->write(data, ...);
	mtd->write(checksum(data), ...);
}

Jörn

-- 
Data dominates. If you've chosen the right data structures and organized
things well, the algorithms will almost always be self-evident. Data
structures, not algorithms, are central to programming.
-- Rob Pike

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-19 15:24                           ` Jörn Engel
@ 2005-01-19 15:27                             ` Artem B. Bityuckiy
  2005-01-19 15:32                               ` Jörn Engel
  0 siblings, 1 reply; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-19 15:27 UTC (permalink / raw)
  To: Jörn Engel; +Cc: MTD List, Joakim Tjernlund

On Wed, 19 Jan 2005, Jörn Engel wrote:

> On Wed, 19 January 2005 13:07:39 +0000, Artem B. Bityuckiy wrote:
> > On Tue, 18 Jan 2005, Joakim Tjernlund wrote:
> > 
> > > Another idea is to calculate the checksum while reading/writing(much like
> > > the networking code does). That would probably require support from the MTD layer
> > > and that the checksum is stored at the end of the JFFS3 node.
> > Hmm. How to do this? In case of NAND we issue 
> > read command and provide buffer, and it is up to driver/hw how to read. 
> > One may use DMA.
> 
> Something like this?
> 
> int jffs3_write_with_checksum(void *data, ...)
> {
> 	...
> 	if (mtd->write_with_checksum)
> 		return mtd->write_with_checksum(data, ...);
> 	mtd->write(data, ...);
> 	mtd->write(checksum(data), ...);
> }
> 
> Jörn
I don't get it. What is the goal? Are there any advantages?

> 
> -- 
> Data dominates. If you've chosen the right data structures and organized
> things well, the algorithms will almost always be self-evident. Data
> structures, not algorithms, are central to programming.
> -- Rob Pike
> 

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-19 15:27                             ` Artem B. Bityuckiy
@ 2005-01-19 15:32                               ` Jörn Engel
  2005-01-19 15:51                                 ` Artem B. Bityuckiy
  2005-01-19 19:58                                 ` Artem B. Bityuckiy
  0 siblings, 2 replies; 196+ messages in thread
From: Jörn Engel @ 2005-01-19 15:32 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: MTD List, Joakim Tjernlund

On Wed, 19 January 2005 15:27:53 +0000, Artem B. Bityuckiy wrote:
> > 
> > Something like this?
> > 
> > int jffs3_write_with_checksum(void *data, ...)
> > {
> > 	...
> > 	if (mtd->write_with_checksum)
> > 		return mtd->write_with_checksum(data, ...);
> > 	mtd->write(data, ...);
> > 	mtd->write(checksum(data), ...);
> > }
> > 
> > Jörn
> Can't get it - What is the goal? Some advandages?

As you explained, Joakim's proposal doesn't always make sense.
Basically, if the underlying driver uses DMA, there is no point.  If
it does memcpy(), then it's faster to checksum/copy in a single loop,
rather than do both things individually.

With the above code, an mtd driver can supply a converged
checksum/copy method, but doesn't have to.  If it doesn't, we do
regular copy (through mtd->write) and checksum instead.
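
To make the "single loop" point concrete, here is a minimal sketch of a converged copy/checksum routine in C. The function name copy_and_adler32 is made up for illustration, and Adler-32 is used only because it was one of the candidates discussed in this thread. A real driver would write to flash rather than to a plain memory buffer.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u

/*
 * Copy len bytes from src to dst and return the Adler-32 of the data.
 * Each source byte is loaded once and used for both the copy and the
 * checksum, instead of walking the buffer twice.
 */
static uint32_t copy_and_adler32(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	uint32_t a = 1, b = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		unsigned char c = s[i];

		d[i] = c;			/* the copy */
		a = (a + c) % ADLER_MOD;	/* the checksum */
		b = (b + a) % ADLER_MOD;
	}
	return (b << 16) | a;
}
```

Whether this actually beats a tuned memcpy() followed by a separate checksum pass depends on the CPU and cache, so it would have to be measured.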

Jörn

-- 
The wise man seeks everything in himself; the ignorant man tries to get
everything from somebody else.
-- unknown

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-19 15:32                               ` Jörn Engel
@ 2005-01-19 15:51                                 ` Artem B. Bityuckiy
  2005-01-19 16:31                                   ` Jörn Engel
  2005-01-19 19:58                                 ` Artem B. Bityuckiy
  1 sibling, 1 reply; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-19 15:51 UTC (permalink / raw)
  To: Jörn Engel; +Cc: MTD List, Joakim Tjernlund

On Wed, 19 Jan 2005, Jörn Engel wrote:

> On Wed, 19 January 2005 15:27:53 +0000, Artem B. Bityuckiy wrote:
> > > 
> > > Something like this?
> > > 
> > > int jffs3_write_with_checksum(void *data, ...)
> > > {
> > > 	...
> > > 	if (mtd->write_with_checksum)
> > > 		return mtd->write_with_checksum(data, ...);
> > > 	mtd->write(data, ...);
> > > 	mtd->write(checksum(data), ...);
> > > }
> > > 
> > > Jörn
> > Can't get it - What is the goal? Some advandages?
> 
> As you explained, Joakim's proposal doesn't always make sense.
> Basically, if the underlying drives uses DMA, there is no point.  If
> it does memcpy(), then it's faster to checksum/copy in a single loop,
> rather than do both things individually.
> 
> With the above code, an mtd driver can supply a converged
> checksum/copy method, but doesn't have to.  If it doesn't, we do
> regular copy (through mtd->write) and checksum instead.
> 
Hmm. I don't know if this is good.

In the case of networking, CRCs are always needed and the algorithm is
always the same.

In our case MTD is not dedicated to JFFS2. It is conceptually wrong to
implement this in the MTD layer. It implies one more operation for drivers...
And then someone may want to use another checksum algorithm... Implement yet
another operation? I have doubts (even understanding that this may lead to
better performance). More thought is needed.

> Jörn
> 
> -- 
> The wise man seeks everything in himself; the ignorant man tries to get
> everything from somebody else.
> -- unknown
> 

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-19 15:51                                 ` Artem B. Bityuckiy
@ 2005-01-19 16:31                                   ` Jörn Engel
  0 siblings, 0 replies; 196+ messages in thread
From: Jörn Engel @ 2005-01-19 16:31 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: MTD List, Joakim Tjernlund

On Wed, 19 January 2005 15:51:14 +0000, Artem B. Bityuckiy wrote:
> > 
> Hmm. Don't know if this good.
> 
> In case of network CRCs are always needed and the algorithm is the same.
> 
> In our case MTD is not dedicated to JFFS2. It is consceptually wrong to 
> implement this on MTD layer. This implies one more operation to drivers... 
> And then one may want to use another checksum algorith.... Implement yet 
> another operation? Doubts... (even understanding that this may lead to 
> better performance...). Thinking more needed.

Yup, it's fairly ugly.  Some ugliness in the name of performance is
fine, so benchmarks should decide.

Jörn

-- 
More computing sins are committed in the name of efficiency (without
necessarily achieving it) than for any other single reason - including
blind stupidity.
-- W. A. Wulf 

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-19 15:32                               ` Jörn Engel
  2005-01-19 15:51                                 ` Artem B. Bityuckiy
@ 2005-01-19 19:58                                 ` Artem B. Bityuckiy
  2005-01-20 14:35                                   ` Jörn Engel
  2005-01-21 22:46                                   ` Jared Hulbert
  1 sibling, 2 replies; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-19 19:58 UTC (permalink / raw)
  To: MTD List

Hello guys, just want to summarize. Here is what I think JFFS3 should do  
in case of checksum errors.


1 Flash media errors overview
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~

Both NOR and NAND flashes may have media errors. All errors may be divided
into 2 classes:
1. Permanent errors - the flash sector becomes bad.
2. Bit flips - data is corrupted in some sector, but the sector itself may
still be good.

1.1 NOR flash
    ~~~~~~~~~
NOR is supposed to be very reliable. Any error is considered critical.

1.2 NAND flash
    ~~~~~~~~~~
NAND is not so reliable. NAND usually protects each NAND page with ECC
codes. It is normal for NAND to have bad blocks.


2 Checksum errors and JFFS3 strategy
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The first requirement for JFFS3 is that it must distinguish between
checksum errors due to unclean reboots and those due to media errors. This is
very helpful in lots of situations, see below. I do not discuss here how
we can achieve it; it doesn't matter now - ways exist.

I consider 2 scenarios:
1. The user does not care about detecting errors as soon as they appear. For
example, the user has multimedia data on the filesystem and it is OK if JFFS3
reports errors later rather than as soon as possible, maybe on the next mount.
I will refer to this scenario as NOT_PARANOID.

2. The user cares about detecting errors at an early stage. For example, this
makes sense if the user worries that the device may do something bad when some
data is read corrupted (say libc.a is loaded corrupted and this causes some
crucial data to be erased). I will refer to this scenario as PARANOID.

These 2 scenarios imply 2 JFFS3 working modes.

2.1 NOR Flash
    ~~~~~~~~~
Recall, I assume we have a mechanism to detect partially written nodes (due
to unclean reboots) *without* checking the checksum.

2.1.1 NOT_PARANOID
      ~~~~~~~~~~~~
Checksums are neither generated nor checked.

2.1.2 PARANOID
      ~~~~~~~~
Checksums are always generated and checked.

2.2 NAND flash
    ~~~~~~~~~~

2.2.1 NOT_PARANOID
      ~~~~~~~~~~~~
Checksums are always generated, but checked only if there was an ECC error
during the NAND page read.

2.2.2 PARANOID
      ~~~~~~~~
Checksums are always generated and always checked.
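
The four cases in section 2 can be written down as two small predicates. This is only a sketch of the proposal as stated above. The enum and function names are invented for illustration and the ecc_error argument stands for "the NAND page read reported an ECC error".

```c
#include <stdbool.h>

enum flash_type { FLASH_NOR, FLASH_NAND };
enum csum_mode { MODE_NOT_PARANOID, MODE_PARANOID };

/* Should a checksum be generated when a node is written? */
static bool csum_generate(enum flash_type type, enum csum_mode mode)
{
	/* 2.1.1: NOR + NOT_PARANOID is the only case without checksums. */
	return mode == MODE_PARANOID || type == FLASH_NAND;
}

/* Should the checksum be verified when a node is read? */
static bool csum_check(enum flash_type type, enum csum_mode mode,
		       bool ecc_error)
{
	if (mode == MODE_PARANOID)
		return true;		/* 2.1.2 and 2.2.2 */
	/* 2.2.1: NAND checks only after an ECC error; 2.1.1: NOR never. */
	return type == FLASH_NAND && ecc_error;
}
```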

3. Read errors
   ~~~~~~~~~~~
If JFFS3 encounters a checksum error on read, it refuses to read the
corrupted file and reports -EIO to the caller.


4. Bad blocks
   ~~~~~~~~~~
NOR flash is not considered workable if there are bad blocks, so this is a
NAND-only section. For NAND, errors are expected by the nature of the
technology.

Read errors (either ECC or CRC) do not mean the block has become bad. They may
be just occasional bit flips which will be repaired by the next erase.

A bad erase or write status (if we work in write-verify mode) means the block
has become bad.

5. Data recovery
   ~~~~~~~~~~~~~
If JFFS3 fails to write data, it reads all valid data from this block and
writes it to another (good) block. Then the block is marked bad.

6. Checksum algorithm
   ~~~~~~~~~~~~~~~~~~
Pending issue. We want something faster than CRC32.

Appendix
~~~~~~~

JFFS2 uses CRC to detect errors and on any error it just rejects the node. This
is not the best behavior and we may fix this in JFFS3 (if it is ever
created).

Comments?


--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-19 19:58                                 ` Artem B. Bityuckiy
@ 2005-01-20 14:35                                   ` Jörn Engel
  2005-01-20 14:37                                     ` David Woodhouse
                                                       ` (2 more replies)
  2005-01-21 22:46                                   ` Jared Hulbert
  1 sibling, 3 replies; 196+ messages in thread
From: Jörn Engel @ 2005-01-20 14:35 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: MTD List

On Wed, 19 January 2005 19:58:10 +0000, Artem B. Bityuckiy wrote:
> 
> 1 Flash media errors overview
>   ~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> Both NOR and NAND flashes may have media errors. All errors may be divided 
> on 2 classes:
> 1. Permanent errors - flash sector become bad.
> 2. Bit flips - data is corrupted in some sector. But sector may still be 
> not bad.
> 
> 1.1 NOR flash
>     ~~~~~~~~~
> NOR is supposed to be very reliable. Any error is considered as critical.
> 
> 1.2 NAND flash
>     ~~~~~~~~~~
> NAND is not so reliable. NAND usually protects each NAND page by ECC 
> codes. It is normal to NAND to have bad blocks.
> 
> 
> 2 Checksum errors and JFFS3 strategy
>   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> The first requirement to JFFS3 is that it must distinguish between 
> checksum errors due to unclean reboots and due to media errors. This is 
> very helpful in lots of situations, see bellow. I do not discuss here how 
> we can achieve it, it doesn't matter now - there are ways exist.
> 
> I consider 2 scenarios:
> 1. User does not care about detecting errors as soon as they appear. For 
> example, user has multimedia data on the filesystem and it is OK if JFFS3 
> report about errors not as soon as possible, may be on the next mount. 
> Will refer this scenario as NOT_PARANOID.

RELAXED may be a better name.  Less likely to misread as PARANOID.

> 2. User care about detecting errors on early stage. For example it makes 
> sense if users cares about device may do something bad if some data is 
> read corrupted (like libc.a is loaded corrupted and this cases some 
> crucial data may be is erased). Will refer this scenario as PARANOID.
> 
> These 2 scenarious assume 2 JFFS3 working modes.
> 
> 2.1 NOR Flash
>     ~~~~~~~~~
> Recall, I assume we have mechanism do detect partially written nodes (due 
> to unclean reboots) *without* checking checksum.
> 
> 2.1.1 NOT_PARANOID
>       ~~~~~~~~~~~~
> Checksums are neither generated nor checked.
> 
> 2.1.2 PARANOID
>       ~~~~~~~~
> Checksums are always generated and checked.
> 
> 2.2 NAND flash
>     ~~~~~~~~~~
> 
> 2.2.1 NOT_PARANOID
>       ~~~~~~~~~~~~
> Checkums are always generated, but checked only if there was ECC error 
> during NAND page read.

I dislike this.  Imo, we should handle NOR and NAND the same way.
There are two strategies, that make some sense:

a) Never generate checksums.
b) Always generate checksums, but never check them.

Strategy b) sounds pretty stupid, but it optimizes the 90% case - read
- and allows the user to remount the filesystem to switch to PARANOID
mode.  So, we could go as you proposed, we could settle for either a)
or b) or we could allow both.  In that case I'd call a) the SLOPPY
case and b) the RELAXED, just to distinguish things.

Which one makes most sense?

> 2.2.2 PARANOID
>       ~~~~~~~~
> Checksums are always generated and always checked.
> 
> 3. Read errors
>    ~~~~~~~~~~~
> If JFFS3 encounter read checksum error, JFFS3 rejects to read the 
> corrupted file end reports -EIO to the caller.

Imo, jffs3 should also set the FS_IS_CORRUPTED flag.  More below.

> 4. Bad blocks
>    ~~~~~~~~~~
> NOR flash is not considered workable if there are bad blocks. So, this is 
> NAND-only section. For NAND errors are assumed by the NAND technology.
> 
> Read errors (either ECC or CRC) do not mean the block become bad. This may 
> be just occasional bit flips which will be repaired by the next erase.
> 
> Bad erase and write status (if we work in write-verify mode) mean block 
> become bad.
> 
> 5. Data recovery
>    ~~~~~~~~~~~~~
> If JFFS3 failed to write data it reads all valid data from this block and 
> writes it to another (good) block. Then block is marked bad.

We shouldn't read the data back.  Make sure it still exists in the
wbuf and use that instead.  After all the block just turned bad, so it
would be better if we don't depend on it.

> 6. Checksum algorithm
>    ~~~~~~~~~~~~~~~~~~
> Pending issue. It is wanted to have something faster then CRC32.
> 
> Appendix
> ~~~~~~~
> 
> JFFS2 uses CRC to detect errors and in any error it just reject node. This 
> is not the best behavior and we may fix this in JFFS3 (if it ever will be 
> created).

Jffs3 flags design draft:
o We create a new node-type for flags.  It just contains the 12 bytes
  header plus a 4-byte flags field.
o If possible, the flags node is the first node for all erase blocks.
  It effectively replaces the erase marker.
o Flags can only be set within the lifetime of a filesystem.

Optional:
o Flags can also be cleared.  For this, the flags node needs an
  additional versions field.

With this in place, we can set a flag when detecting a checksum error
due to flash corruption.  Reading this flag on mount should print out
a big warning.  In PARANOID mode, we could also refuse to mount, after
detecting this flag.
Main point is that as soon as we get the first flash corruption, the
flash cannot be trusted anymore.  People may wish to ignore this,
that's fine.  But others may wish to disable the complete device,
generate a call home, blink some red LEDs on the case or whatever.
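
A minimal sketch of the draft above, nothing more.  The struct names,
the flag bit value and both helper functions are made up for
illustration; the 12-byte header simply mirrors the JFFS2 common node
header, and endianness handling is ignored.

```c
#include <stdint.h>

/* 12-byte common header, mirroring the JFFS2 layout (assumption). */
struct jffs3_node_hdr {
	uint16_t magic;
	uint16_t nodetype;
	uint32_t totlen;	/* total node length, header included */
	uint32_t hdr_crc;	/* checksum over the fields above */
};

/* Hypothetical flags node: the header plus a 4-byte flags field. */
struct jffs3_flags_node {
	struct jffs3_node_hdr hdr;
	uint32_t flags;
};

#define JFFS3_FLAG_CORRUPTED	0x00000001u	/* hypothetical bit value */

/*
 * Flags can only be set, never cleared, so the filesystem-wide value
 * is the OR of the flags nodes found at the start of the erase blocks.
 */
static uint32_t jffs3_merge_flags(const struct jffs3_flags_node *nodes,
				  unsigned int count)
{
	uint32_t flags = 0;
	unsigned int i;

	for (i = 0; i < count; i++)
		flags |= nodes[i].flags;
	return flags;
}

/* Returns 0 if the mount may proceed, -1 if PARANOID mode refuses it. */
static int jffs3_mount_check(uint32_t flags, int paranoid)
{
	if ((flags & JFFS3_FLAG_CORRUPTED) && paranoid)
		return -1;
	return 0;	/* other modes would only print the big warning */
}
```

With the optional "flags can be cleared" variant, a plain OR is no
longer enough and the version field would have to pick the newest node.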

Jörn

-- 
Sometimes, asking the right question is already the answer.
-- Unknown

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-20 14:35                                   ` Jörn Engel
@ 2005-01-20 14:37                                     ` David Woodhouse
  2005-01-20 14:40                                       ` Jörn Engel
  2005-01-20 15:05                                     ` Artem B. Bityuckiy
  2005-01-21 22:33                                     ` Jared Hulbert
  2 siblings, 1 reply; 196+ messages in thread
From: David Woodhouse @ 2005-01-20 14:37 UTC (permalink / raw)
  To: Jörn Engel; +Cc: MTD List

On Thu, 2005-01-20 at 15:35 +0100, Jörn Engel wrote:
> 
> a) Never generate checksums.
> b) Always generate checksums, but never check them.
> 
> Strategy b) sounds pretty stupid, but it optimizes the 90% case - read
> - and allows the user to remount the filesystem to switch to PARANOID
> mode.  So, we could go as you proposed, we could settle for either a)
> or b) or we could allow both.  In that case I'd call a) the SLOPPY
> case and b) the RELAXED, just to distinguish things.
> 
> Which one makes most sense?

B


-- 
dwmw2

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-20 14:37                                     ` David Woodhouse
@ 2005-01-20 14:40                                       ` Jörn Engel
  0 siblings, 0 replies; 196+ messages in thread
From: Jörn Engel @ 2005-01-20 14:40 UTC (permalink / raw)
  To: David Woodhouse; +Cc: MTD List

On Thu, 20 January 2005 14:37:22 +0000, David Woodhouse wrote:
> On Thu, 2005-01-20 at 15:35 +0100, Jörn Engel wrote:
> > 
> > a) Never generate checksums.
> > b) Always generate checksums, but never check them.
> > 
> > Strategy b) sounds pretty stupid, but it optimizes the 90% case - read
> > - and allows the user to remount the filesystem to switch to PARANOID
> > mode.  So, we could go as you proposed, we could settle for either a)
> > or b) or we could allow both.  In that case I'd call a) the SLOPPY
> > case and b) the RELAXED, just to distinguish things.
> > 
> > Which one makes most sense?
> 
> B

I agree.  Deal.
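
For reference, the three modes could be written down roughly like this.
This is only a sketch; the enum and function names are invented here
and are not taken from any existing code.

```c
/* Hypothetical checksum modes, as named in this thread. */
enum jffs3_csum_mode {
	JFFS3_CSUM_SLOPPY,	/* a) never generate checksums */
	JFFS3_CSUM_RELAXED,	/* b) always generate, never check */
	JFFS3_CSUM_PARANOID	/* always generate, always check */
};

/* Write path: should a checksum be stored with the node? */
static int jffs3_csum_generate(enum jffs3_csum_mode mode)
{
	return mode != JFFS3_CSUM_SLOPPY;
}

/* Read path: should the stored checksum be verified? */
static int jffs3_csum_verify(enum jffs3_csum_mode mode)
{
	return mode == JFFS3_CSUM_PARANOID;
}

/* Remounting as PARANOID only helps if checksums already exist on flash. */
static int jffs3_can_remount_paranoid(enum jffs3_csum_mode mode)
{
	return jffs3_csum_generate(mode);
}
```

The last helper is the reason b) wins: data written in RELAXED mode can
still be checked after a remount, data written in SLOPPY mode cannot.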

Jörn

-- 
The strong give up and move away, while the weak give up and stay.
-- unknown

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-20 14:35                                   ` Jörn Engel
  2005-01-20 14:37                                     ` David Woodhouse
@ 2005-01-20 15:05                                     ` Artem B. Bityuckiy
  2005-01-20 15:27                                       ` Jörn Engel
  2005-01-21 22:33                                     ` Jared Hulbert
  2 siblings, 1 reply; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-20 15:05 UTC (permalink / raw)
  To: Jörn Engel; +Cc: MTD List

> I dislike this.  Imo, we should handle NOR and NAND the same way.
> There are two strategies, that make some sense:
> 
> a) Never generate checksums.
> b) Always generate checksums, but never check them.
> 
> Strategy b) sounds pretty stupid, but it optimizes the 90% case - read
> - and allows the user to remount the filesystem to switch to PARANOID
> mode.  So, we could go as you proposed, we could settle for either a)
> or b) or we could allow both.  In that case I'd call a) the SLOPPY
> case and b) the RELAXED, just to distinguish things.
Fully agree. Thanks for noting.

> 
> Which one makes most sense?
Possibly B is better...

> > 5. Data recovery
> >    ~~~~~~~~~~~~~
> > If JFFS3 fails to write data, it reads all valid data from this block and
> > writes it to another (good) block. Then the block is marked bad.
> 
> We shouldn't read the data back.  Make sure it still exists in the
> wbuf and use that instead.  After all the block just turned bad, so it
> would be better if we don't depend on it.
I mean this situation:
1. We write to the middle of a block, say to page 10, and detect an
error.
2. We then move pages 1-9 to another, good block, write our 10th page
(from the buffer) to that good block, and mark the bogus block bad.

I'm not sure this is reasonable. Maybe JFFS2's behaviour is better: it
just skips page 10 and writes the data to page 11. Then GC will move the
data, and the error will be detected during erase. In my opinion this is
not very reliable, but it is more workable if we have little space...

Another thing is the JFFS2 recovery function, which I didn't speak about.
It seems you dislike that JFFS2 tries to read the data from the bad page
first instead of just writing the data from the wbuf.

> 
> > 6. Checksum algorithm
> >    ~~~~~~~~~~~~~~~~~~
> > Pending issue. We want something faster than CRC32.
> > 
> > Appendix
> > ~~~~~~~
> > 
> > JFFS2 uses CRC to detect errors, and on any error it just rejects the node.
> > This is not the best behavior and we may fix this in JFFS3 (if it is ever
> > created).
> 
> Jffs3 flags design draft:
> o We create a new node-type for flags.  It just contains the 12 bytes
>   header plus a 4-byte flags field.
Hmm, having a distinct node-type for flags sounds good. Do you mean that we
could also store information like GIGs PIGs there (instead of having
it in each inode node, thus saving space)?
> o If possible, the flags node is the first node for all erase blocks.
>   It effectively replaces the erase marker.
Not sure this is a good idea, needs thinking...
> o Flags can only be set within the lifetime of a filesystem.
> 
> Optional:
> o Flags can also be cleared.  For this, the flags node needs an
>   additional versions field.
> 
> With this in place, we can set a flag when detecting a checksum error
> due to flash corruption.  Reading this flag on mount should print out
> a big warning.  In PARANOID mode, we could also refuse to mount, after
> detecting this flag.
> Main point is that as soon as we get the first flash corruption, the
> flash cannot be trusted anymore.  People may wish to ignore this,
> that's fine.  But others may wish to disable the complete device,
> generate a call home, blink some red LEDs on the case or whatever.
How about having slightly relaxed requirements for NAND?

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-20 15:05                                     ` Artem B. Bityuckiy
@ 2005-01-20 15:27                                       ` Jörn Engel
  2005-01-20 15:37                                         ` Artem B. Bityuckiy
  0 siblings, 1 reply; 196+ messages in thread
From: Jörn Engel @ 2005-01-20 15:27 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: MTD List

On Thu, 20 January 2005 18:05:17 +0300, Artem B. Bityuckiy wrote:
> 
> > > 5. Data recovery
> > >    ~~~~~~~~~~~~~
> > > If JFFS3 fails to write data, it reads all valid data from this block and
> > > writes it to another (good) block. Then the block is marked bad.
> > 
> > We shouldn't read the data back.  Make sure it still exists in the
> > wbuf and use that instead.  After all the block just turned bad, so it
> > would be better if we don't depend on it.
> I mean this situation:
> 1. We write to the middle of a block, say to page 10, and detect an
> error.
> 2. We then move pages 1-9 to another, good block, write our 10th page
> (from the buffer) to that good block, and mark the bogus block bad.

Sounds reasonable.

> I'm not sure this is reasonable. Maybe JFFS2's behaviour is better: it
> just skips page 10 and writes the data to page 11. Then GC will move the
> data, and the error will be detected during erase. In my opinion this is
> not very reliable, but it is more workable if we have little space...

Could also be reasonable.  Someone with deeper NAND knowledge (tglx?)
might know more.

> Another thing is the JFFS2 recovery function, which I didn't speak about.
> It seems you dislike that JFFS2 tries to read the data from the bad page
> first instead of just writing the data from the wbuf.

I don't trust the page that caused the write error.  Trusting the
other pages in the same block is fine with me.
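
In code, the policy would look roughly like the sketch below.  It is a
userspace illustration only: the geometry, the struct and the function
name are hypothetical, pages are counted from 0 rather than 1, and the
eraseblocks are plain memory instead of real flash.

```c
#include <string.h>

#define NAND_PAGE_SIZE	512	/* hypothetical geometry */
#define PAGES_PER_BLOCK	32

struct eraseblock {
	unsigned char page[PAGES_PER_BLOCK][NAND_PAGE_SIZE];
	int bad;
};

/*
 * Move the contents of a block whose write to 'failed_page' went wrong
 * into a fresh block, then mark the old block bad.
 */
static void recover_failed_write(struct eraseblock *failed,
				 struct eraseblock *good,
				 unsigned int failed_page,
				 const unsigned char *wbuf)
{
	unsigned int i;

	/* Pages written before the failure are still trusted. */
	for (i = 0; i < failed_page; i++)
		memcpy(good->page[i], failed->page[i], NAND_PAGE_SIZE);

	/* The failed page is never read back; its data comes from the wbuf. */
	memcpy(good->page[failed_page], wbuf, NAND_PAGE_SIZE);

	failed->bad = 1;
}
```

This only works if the wbuf contents are kept around until the write
has been confirmed.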

> > Jffs3 flags design draft:
> > o We create a new node-type for flags.  It just contains the 12 bytes
> >   header plus a 4-byte flags field.
> Hmm, having a distinct node-type for flags sounds good. Do you mean that we
> could also store information like GIGs PIGs there (instead of having
> it in each inode node, thus saving space)?

Not sure.  What are GIGs and PIGs?

> > o If possible, the flags node is the first node for all erase blocks.
> >   It effectively replaces the erase marker.
> > Not sure this is a good idea, needs thinking...

You may be right.  Flags could also be part of the summary nodes.  But
then, they should also be an independent node, in case we don't have a
summary node.  And in that case, we may as well replace the erase
marker.

Hmm.  More thinking...

> > o Flags can only be set within the lifetime of a filesystem.
> > 
> > Optional:
> > o Flags can also be cleared.  For this, the flags node needs an
> >   additional versions field.
> > 
> > With this in place, we can set a flag when detecting a checksum error
> > due to flash corruption.  Reading this flag on mount should print out
> > a big warning.  In PARANOID mode, we could also refuse to mount, after
> > detecting this flag.
> > Main point is that as soon as we get the first flash corruption, the
> > flash cannot be trusted anymore.  People may wish to ignore this,
> > that's fine.  But others may wish to disable the complete device,
> > generate a call home, blink some red LEDs on the case or whatever.
> How about having slightly relaxed requirements for NAND?

Nope, no way!  If Samsung aims to replace hard drives, sporadic data
corruption is not acceptable.

If flashes were merely floppy disk replacements, maybe.  But that's
not a very sexy goal.

Jörn

-- 
The competent programmer is fully aware of the strictly limited size of
his own skull; therefore he approaches the programming task in full
humility, and among other things he avoids clever tricks like the plague. 
-- Edsger W. Dijkstra

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-20 15:27                                       ` Jörn Engel
@ 2005-01-20 15:37                                         ` Artem B. Bityuckiy
  2005-01-20 16:13                                           ` Jörn Engel
  0 siblings, 1 reply; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-20 15:37 UTC (permalink / raw)
  To: Jörn Engel; +Cc: MTD List


> Not sure.  What are GIGs and PIGs?
Shit :-)
I meant PIDs and GIDs :-)

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-20 15:37                                         ` Artem B. Bityuckiy
@ 2005-01-20 16:13                                           ` Jörn Engel
  2005-01-20 16:31                                             ` Artem B. Bityuckiy
  0 siblings, 1 reply; 196+ messages in thread
From: Jörn Engel @ 2005-01-20 16:13 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: MTD List

On Thu, 20 January 2005 18:37:41 +0300, Artem B. Bityuckiy wrote:
> 
> > Not sure.  What are GIGs and PIGs?
> Shit :-)

First pigs, now this.  You have a filthy language. ;)

> I meant PIDs and GIDs :-)

Doesn't work.  Those are per inode, unless you want all files to have
the same owner.
What would work, on the other hand, is to separate inodes and data
nodes.  PID, GID and all the other inode fields are needed only once,
not for every chunk of data belonging to that inode.
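
A back-of-the-envelope sketch of what the split buys.  The field sizes
are assumed to match the corresponding jffs2_raw_inode fields (with
"PID" read as the owner UID); the struct and function names are
hypothetical, and a real inode node would also need ino, version and
checksum fields, which are left out on both sides of the comparison.

```c
#include <stddef.h>
#include <stdint.h>

/* Per-inode metadata, sized like the matching jffs2_raw_inode fields. */
struct jffs3_inode_meta {
	uint32_t mode;
	uint16_t uid;
	uint16_t gid;
	uint32_t isize;
	uint32_t atime;
	uint32_t mtime;
	uint32_t ctime;
};

#define NODE_HDR_BYTES	12	/* common node header */

/* Metadata bytes on flash if every data node repeats the metadata. */
static size_t meta_bytes_combined(size_t data_nodes)
{
	return data_nodes * sizeof(struct jffs3_inode_meta);
}

/* Metadata bytes if one separate inode node carries it, header included. */
static size_t meta_bytes_split(void)
{
	return NODE_HDR_BYTES + sizeof(struct jffs3_inode_meta);
}
```

The split pays for one extra node header, so it only wins for inodes
with more than one data node.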


And while we're at it, some other fields from the inodes should be
removed.

	jint32_t atime;      /* Last access time.  */
	jint32_t mtime;      /* Last modification time.  */
	jint32_t ctime;      /* Change time.  */

Who would want to mount jffs3 _without_ noatime?  In fact, look at
fs/jffs2/super.c:
        sb->s_flags = flags | MS_NOATIME;

For mtime and ctime, those could be collapsed into one.  Not sure if
that makes sense.

	jint32_t dsize;	     /* Size of the node's data. (after decompression) */

In most cases, dsize will be 4k.  So we can just uncompress into the
page and ignore the flag.  For the last page of a file, we can do the
same, even though the dsize is different.
In the remaining cases, we'd have to uncompress to a buffer of 4k (or
more, if we go for 64k nodes someday) and memcpy.

Might be worth the effort, even if it's just 4 bytes.

	uint8_t usercompr;   /* Compression algorithm requested by the user */

Not exactly sure how this is used.  The code appears to either ignore
it or to be buggy, if this is ever nonzero.  But I could be wrong.

Jörn

-- 
It does not matter how slowly you go, so long as you do not stop.
-- Confucius

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-20 16:13                                           ` Jörn Engel
@ 2005-01-20 16:31                                             ` Artem B. Bityuckiy
  2005-01-20 16:41                                               ` Jörn Engel
  0 siblings, 1 reply; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-20 16:31 UTC (permalink / raw)
  To: Jörn Engel; +Cc: MTD List

Jörn Engel wrote:
> Doesn't work.  Those are per inode, unless you want all files to have
> the same owner.
Surely. I thought you suggested introducing a new type of node which is
per-inode and contains inode flags and other information which we do not
want to copy to each inode node, like PID, GID and time stuff.

> What would work, on the other hand, is to separate inodes and data
> nodes.  PID, GID and all the other inode fields are needed only once,
> not for every chunk of data belonging to that inode.
> 
> 
> And while we're at it, some other fields from the inodes should be
> removed.
> 
> 	jint32_t atime;      /* Last access time.  */
> 	jint32_t mtime;      /* Last modification time.  */
> 	jint32_t ctime;      /* Change time.  */
Absolutely correct :-)

> 	jint32_t dsize;	     /* Size of the node's data. (after decompression) */
> 
> In most cases, dsize will be 4k.  So we can just uncompress into the
> page and ignore the flag.  For the last page of a file, we can do the
> same, even though the dsize is different.
> In the remaining cases, we'd have to uncompress to a buffer of 4k (or
> more, if we go for 64k nodes someday) and memcpy.
Hmm. I assume you propose to remove this field altogether? Not sure it is
a good idea.
When we build the fragtree we need to know how many bytes are in each
node. We do not want to uncompress the data to calculate this...

> Might be worth the effort, even if it's just 4 bytes.
> 
> 	uint8_t usercompr;   /* Compression algorithm requested by the user */
AFAIK, this is Ferenc Havasi's unfinished work. He was going to add
smart compression. It is just ignored now.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-20 16:31                                             ` Artem B. Bityuckiy
@ 2005-01-20 16:41                                               ` Jörn Engel
  2005-01-20 17:08                                                 ` Artem B. Bityuckiy
  0 siblings, 1 reply; 196+ messages in thread
From: Jörn Engel @ 2005-01-20 16:41 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: MTD List

On Thu, 20 January 2005 19:31:13 +0300, Artem B. Bityuckiy wrote:
> Jörn Engel wrote:
> > Doesn't work.  Those are per inode, unless you want all files to have
> > the same owner.
> Surely. I thought you suggested introducing a new type of node which is
> per-inode and contains inode flags and other information which we do not
> want to copy to each inode node, like PID, GID and time stuff.

Ah, yes.  Makes sense.

Well, the original proposal was about _filesystem_ flags.  Right now,
we have none, but I'd like to have the "something smells fishy,
replace your flash ASAP" flag.

> > What would work, on the other hand, is to separate inodes and data
> > nodes.  PID, GID and all the other inode fields are needed only once,
> > not for every chunk of data belonging to that inode.
> > 
> > 
> > And while we're at it, some other fields from the inodes should be
> > removed.
> > 
> > 	jint32_t atime;      /* Last access time.  */
> > 	jint32_t mtime;      /* Last modification time.  */
> > 	jint32_t ctime;      /* Change time.  */
> Absolutely correct :-)
> 
> > 	jint32_t dsize;	     /* Size of the node's data. (after decompression) */
> > 
> > In most cases, dsize will be 4k.  So we can just uncompress into the
> > page and ignore the flag.  For the last page of a file, we can do the
> > same, even though the dsize is different.
> > In the remaining cases, we'd have to uncompress to a buffer of 4k (or
> > more, if we go for 64k nodes someday) and memcpy.
> Hmm. I assume you propose to remove this field altogether? Not sure it is
> a good idea.
> When we build the fragtree we need to know how many bytes are in each
> node. We do not want to uncompress the data to calculate this...

Ouch!  Ok, looks like this one could stay.

Although...we do have the offset already.  So for the 4k case, it's
pretty simple to build the fragtree as well, as there is just one node
that could possibly be in the 4k range for it.

The remaining cases get quite a bit messier.  They remain the rare
case, so it may still be worth the effort, if anyone cares enough to
write the code.

> > 	uint8_t usercompr;   /* Compression algorithm requested by the user */
> AFAIK, this is Ferenc Havasi's unfinished work. He was going to add
> smart compression. It is just ignored now.

Ok, will ignore it then.

Jörn

-- 
There are two ways of constructing a software design: one way is to make
it so simple that there are obviously no deficiencies, and the other is
to make it so complicated that there are no obvious deficiencies.
-- C. A. R. Hoare

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-20 16:41                                               ` Jörn Engel
@ 2005-01-20 17:08                                                 ` Artem B. Bityuckiy
  2005-01-20 17:33                                                   ` Jörn Engel
  0 siblings, 1 reply; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-20 17:08 UTC (permalink / raw)
  To: Jörn Engel; +Cc: MTD List

Jörn Engel wrote:
> Ah, yes.  Makes sense.
> 
> Well, the original proposal was about _filesystem_ flags.  Right now,
> we have none, but I'd like to have the "something smells fishy,
> replace your flash ASAP" flag.
We still may do this with per-inode flags.

By the way, I'm not sure having times in our "inode flags node" is
reasonable. In this case we will need to update it very often and
possibly, will waste even more space (flag nodes have headers). If we
put only rarely changed data there, like UID and GID, we may save some
flash space... 

One more by the way, compression mode and whatever people implement in
xattr flags may go to that inode flags node.

And I still do not think we should print a large warning on every mount
if we found a bad block on NAND. For NOR, no doubt.

Hmm, how about this:
If we find a checksum error related to some inode X, we set the
corresponding flag in the "inode flag node" of inode X. Then we refuse any
work with this file: we do not allow the user to read or write it (but it
is still possible to remove it). But the inode may be only partially
corrupted, and if the user still wants to edit/read it, he should
explicitly call the corresponding xattr (I think xattr should be assumed
to be implemented in JFFS3) and clear that "corrupted" flag.
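
Roughly, as a sketch (the struct and all function names are
hypothetical, and the in-memory flag stands in for the on-flash "inode
flag node"):

```c
#include <errno.h>

/* Hypothetical per-inode state mirroring the "inode flag node". */
struct jffs3_inode_state {
	int corrupted;	/* set when a checksum error hits this inode */
};

/* Called when a checksum error is found in one of the inode's nodes. */
static void jffs3_mark_corrupted(struct jffs3_inode_state *f)
{
	f->corrupted = 1;
}

/* Called when the user explicitly clears the flag, e.g. via an xattr. */
static void jffs3_clear_corrupted(struct jffs3_inode_state *f)
{
	f->corrupted = 0;
}

/* Read and write are refused with -EIO while the flag is set. */
static int jffs3_check_rw(const struct jffs3_inode_state *f)
{
	return f->corrupted ? -EIO : 0;
}

/* Removing the file stays possible regardless of the flag. */
static int jffs3_check_unlink(const struct jffs3_inode_state *f)
{
	(void)f;
	return 0;
}
```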

Yes, on mount we print a large warning for NOR.

> Ouch!  Ok, looks like this one could stay.
> 
> Although...we do have the offset already.  So for the 4k case, it's
> pretty simple to build the fragtree as well, as there is just one node
> that could possibly be in the 4k range for it.
> 
> The remaining cases get quite a bit messier.  They remain the rare
> case, so it may still be worth the effort, if anyone cares enough to
> write the code.
> 

Anyway, nodes are not sorted on media. So to find the node with the
next offset we will need to do some search. This is time consuming. Or
we will need to keep nodes in an offset-sorted list, which I do not
consider reasonable, at least in the context of the current JFFS2
design (I imagine JFFS3 as the next JFFS2 generation, so JFFS3 will inherit
a lot from JFFS2, again, if JFFS3 ever happens). :-)

-- 
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-20 17:08                                                 ` Artem B. Bityuckiy
@ 2005-01-20 17:33                                                   ` Jörn Engel
  2005-01-20 17:57                                                     ` Artem B. Bityuckiy
  2005-01-21 14:30                                                     ` Josh Boyer
  0 siblings, 2 replies; 196+ messages in thread
From: Jörn Engel @ 2005-01-20 17:33 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: MTD List

On Thu, 20 January 2005 20:08:23 +0300, Artem B. Bityuckiy wrote:
>
> By the way, I'm not sure having times in our "inode flags node" is
> reasonable. In this case we will need to update it very often and
> possibly, will waste even more space (flag nodes have headers). If we
> put only rarely changed data there, like UID and GID, we may save some
> flash space... 

Makes some sense.  Needs more thinking...

> One more by the way, compression mode and whatever people implement in
> xattr flags may go to that inode flags node.

Extended attributes can be arbitrary, so they should go somewhere
else.  Basically, have a pseudo-directory for each inode and treat the
inodes inside that pseudo-directory as extended attributes.  Similar
to reiser4.  Didn't Josh plan to work in this area?

> And I still do not think we should print a large warning on every mount
> if we found a bad block on NAND. For NOR, no doubt.

What's so very special about NAND?  Either it's a useful storage
medium (with slightly different details), or it's not.  If it is, you
will never see this message.  If it's not, you shouldn't put any
serious data on it anyway.  In both cases, you should never see this
message.

If you do see the message, it means that you *thought* it would be a
useful storage medium, but you're *wrong*.  Telling you that your
conception of reality doesn't match reality itself is a service.

> > Ouch!  Ok, looks like this one could stay.
> > 
> > Although...we do have the offset already.  So for the 4k case, it's
> > pretty simple to build the fragtree as well, as there is just one node
> > that could possibly be in the 4k range for it.
> > 
> > The remaining cases get quite a bit messier.  They remain the rare
> > case, so it may still be worth the effort, if anyone cares enough to
> > write the code.
> > 
> 
> Anyway, nodes are not sorted on media. So to find the node with 
> next offset we will need to do some search. This is time consuming. Or
> we will need to keep nodes in offset-sorted list,

How about an rb_tree, sorted by the offset?  Just like the one in
fs/jffs2/nodelist.h, hm? ;)

You're right, it's not clear whether this is actually a useful
optimization.  But it's not trivial to dismiss either.
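
To make the trade-off concrete, here is a sketch of deriving the sizes
from offsets alone.  It uses a plain array and qsort() in place of an
rb_tree, assumes the nodes of one inode do not overlap and leave no
holes, and all names are invented for the example.

```c
#include <stdint.h>
#include <stdlib.h>

/* One scanned data node of a single inode; the size is not stored. */
struct scanned_node {
	uint32_t offset;	/* file offset the node's data starts at */
	uint32_t implied_dsize;	/* filled in by derive_dsizes() */
};

static int cmp_offset(const void *a, const void *b)
{
	const struct scanned_node *x = a;
	const struct scanned_node *y = b;

	return (x->offset > y->offset) - (x->offset < y->offset);
}

/*
 * Sort the nodes by offset, then take each node's size as the distance
 * to the next node, or to the end of the file for the last one.
 */
static void derive_dsizes(struct scanned_node *nodes, size_t count,
			  uint32_t isize)
{
	size_t i;

	qsort(nodes, count, sizeof(*nodes), cmp_offset);
	for (i = 0; i < count; i++) {
		uint32_t end = (i + 1 < count) ? nodes[i + 1].offset : isize;

		nodes[i].implied_dsize = end - nodes[i].offset;
	}
}
```

The sort is the extra search cost in question, and obsolete or
overlapping nodes would break the simple subtraction.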

Jörn

-- 
All art is but imitation of nature.
-- Lucius Annaeus Seneca

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-20 17:33                                                   ` Jörn Engel
@ 2005-01-20 17:57                                                     ` Artem B. Bityuckiy
  2005-01-21 12:44                                                       ` Jörn Engel
  2005-01-21 14:30                                                     ` Josh Boyer
  1 sibling, 1 reply; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-20 17:57 UTC (permalink / raw)
  To: Jörn Engel; +Cc: MTD List

> Extended attributes can be arbitrary, so they should go somewhere
> else.  Basically, have a pseudo-directory for each inode and treat the
> inodes inside that pseudo-directory as extended attributes.  Similar
> to reiser4.  Didn't Josh plan to work in this area?
Frankly speaking, I do not really understand you. But I do not care much,
as I have not thought about xattr. This is another topic. And I agree that
xattrs may be different, and what I proposed is not a generic solution.

> What's so very special about NAND?  Either it's a useful storage
> medium (with slightly different details), or it's not.  If it is, you
> will never see this message.  If it's not, you shouldn't put any
> serious data on it anyway.  In both cases, you should never see this
> message.
> 
> If you do see the message, it means that you *thought* it would be a
> useful storage medium, but you're *wrong*.  Telling you that your
> conception of reality doesn't match reality itself is a service.
I did not want to return to this :-)
We may print it for NAND too. If we provide a method to clear our flag,
the user may disable this.

> How about an rb_tree, sorted by the offset?  Just like the one in
> fs/jffs2/nodelist.h, hm? ;)
Yes, it is. But before you create the fragtree:
1. You need dsize to create the fragtree.
2. You have no structures which are sorted by offset, so it is not easy
to obtain dsize.

That's the problem. So, let us keep dsize alive :-)



^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-20 17:57                                                     ` Artem B. Bityuckiy
@ 2005-01-21 12:44                                                       ` Jörn Engel
  2005-01-21 13:13                                                         ` Artem B. Bityuckiy
  0 siblings, 1 reply; 196+ messages in thread
From: Jörn Engel @ 2005-01-21 12:44 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: MTD List

On Thu, 20 January 2005 20:57:33 +0300, Artem B. Bityuckiy wrote:
> 
> > > One more by the way, compression mode and whatever people
> > > implement in xattr flags may go to that inode flags node.
> >
> > Extended attributes can be arbitrary, so they should go somewhere
> > else.  Basically, have a pseudo-directory for each inode and treat the
> > inodes inside that pseudo-directory as extended attributes.  Similar
> > to reiser4.  Didn't Josh plan to work in this area?
> Frankly speaking, I do not really understand you. But I do not care much,
> as I have not thought about xattr. This is another topic. And I agree that
> xattrs may be different, and what I proposed is not a generic solution.

Basically, there are two kinds of file attributes.  First the old,
well-known and well-defined ones from original unix, see stat(2).
Those are part of the inode for any unix-style filesystem.  Plus any
arbitrary other attributes that people may wish for.  Since those are
neither well-known nor well-defined, we shouldn't statically allocate
space for them in the inode.

But I fear that you were talking about something completely different.
In that case, the "xattr" put me on the wrong track.

> We may print it for NAND too. If we provide a method to clear our flag,
> the user may disable this.

Sure.  I'm ok to confine this to PARANOID mode or make it otherwise
user-controllable.  Not everyone will want it, so it has to be a
tunable.

> > How about an rb_tree, sorted by the offset?  Just like the one in
> > fs/jffs2/nodelist.h, hm? ;)
> Yes, it is. But before you create the fragtree:
> 1. You need dsize to create the fragtree.
> 2. You have no structures which are sorted by offset, so it is not easy
> to obtain dsize.
> 
> That's the problem. So, let us keep dsize alive :-)

Still not convinced, but I'm too lazy to argue over four bytes.

Jörn

-- 
"Translations are and will always be problematic. They inflict violence 
upon two languages." (translation from German)

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: JFFS3 & performance
  2005-01-21 12:44                                                       ` Jörn Engel
@ 2005-01-21 13:13                                                         ` Artem B. Bityuckiy
  2005-01-21 13:42                                                           ` Jörn Engel
  0 siblings, 1 reply; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-21 13:13 UTC (permalink / raw)
  To: Jörn Engel; +Cc: MTD List

Jörn Engel wrote:
> Basically, there are two kinds of file attributes.  First the old,
> well-known and well-defined ones from original unix, see stat(2).
> Those are part of the inode for any unix-style filesystem.  Plus any
> arbitrary other attributes that people may wish for.  Since those are
> neither well-known nor well-defined, we shouldn't statically allocate
> space for them in the inode.
> 
> But I fear that you were talking about something completely different.
> In that case, the "xattr" put me on the wrong track.
I'm aware of xattr basics. Ok, let's put this out of mind for now :-)

> Still not convinced, but I'm too lazy to argue over four bytes.
I think this is not so important now.

I would like to create a TeX file and put this "JFFS3 checksum stuff" there.
This will be the "JFFS3 design issues file". It will be accessible via CVS and
people may edit it, contributing new ideas.

More questions which I would like to discuss are:

* JFFS3 memory consumption
* JFFS3 & locking

Maybe also:

* mmap "shared" issue
* xattr design

and possibly others.

After the discussion we can put a summary into that document. Possibly
the document will also be accessible on the web as a PDF.

Comments?

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

* Re: JFFS3 & performance
  2005-01-21 13:13                                                         ` Artem B. Bityuckiy
@ 2005-01-21 13:42                                                           ` Jörn Engel
  2005-01-21 13:52                                                             ` Artem B. Bityuckiy
  2005-01-21 14:04                                                             ` Artem B. Bityuckiy
  0 siblings, 2 replies; 196+ messages in thread
From: Jörn Engel @ 2005-01-21 13:42 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: MTD List

On Fri, 21 January 2005 13:13:14 +0000, Artem B. Bityuckiy wrote:
> 
> I would like to create a TeX file and put this "JFFS3 checksum stuff" there.
> This will be the "JFFS3 design issues" file. It will be accessible via CVS and
> people may edit it, contributing new ideas.

Sounds good.

> More questions which I would like to discuss are:
> 
> * JFFS3 memory consumption

Good idea.

> * JFFS3 & locking

I started a locking document for jffs2.  Below.

It doesn't follow any style.  As I didn't know a decent style for
locking documentation, I made up my own.  Feel free to improve it.

Jörn

-- 
When people work hard for you for a pat on the back, you've got
to give them that pat.
-- Robert Heinlein

Locking hierarchy:

Locks can belong to either an inode or a superblock.  "f" is used to
indicate an inode, "c" to indicate a superblock, as in most of the
code:
	struct jffs2_inode *f
	struct jffs2_sb *c

Each lock in the graph below is requested with all the locks above it
possibly held - as long as their indentation level is less.  So
c->inocache_lock can be requested with c->alloc_sem, f->sem and
c->erase_completion_lock held.  c->erase_free_sem is never requested
with any other locks held.

none
	c->alloc_sem
		f->sem
			c->erase_completion_lock
				c->inocache_lock
	c->erase_free_sem
	c->gc_thread_start



Protected data structures:

f->sem
	everything inside f?

c->alloc_sem
	everything in c but the below?

c->erase_completion_lock
	c->free_list
	c->erasing_list

c->inocache_lock
	c->inocache_list

c->erase_free_sem
	???

c->gc_thread_start
	nothing - just for synchronization
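
The indentation rule of the hierarchy above can be checked mechanically.
What follows is only a minimal sketch in Python, not anything taken from
JFFS2: HIERARCHY copies the graph as written (with the "none" root dropped),
and the hypothetical helper may_hold() returns, for each lock, the locks
that may already be held when it is requested, i.e. its ancestors in the
indentation tree.

```python
# Lock graph copied from the hierarchy above; one tab per nesting level.
HIERARCHY = """\
c->alloc_sem
\tf->sem
\t\tc->erase_completion_lock
\t\t\tc->inocache_lock
c->erase_free_sem
c->gc_thread_start
"""

def may_hold(hierarchy):
    """Map each lock to the list of locks that may already be held
    when it is requested, i.e. its ancestors in the indentation tree."""
    allowed = {}
    stack = []  # locks on the path from the root to the current line
    for line in hierarchy.splitlines():
        depth = len(line) - len(line.lstrip("\t"))
        name = line.strip()
        del stack[depth:]          # drop siblings and deeper entries
        allowed[name] = list(stack)
        stack.append(name)
    return allowed

ALLOWED = may_hold(HIERARCHY)
for lock, held in ALLOWED.items():
    print(f"{lock}: {', '.join(held) or 'none'}")
```

For c->inocache_lock this lists c->alloc_sem, f->sem and
c->erase_completion_lock, and for c->erase_free_sem it lists nothing,
matching the two examples given in the text.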

* Re: JFFS3 & performance
  2005-01-21 13:42                                                           ` Jörn Engel
@ 2005-01-21 13:52                                                             ` Artem B. Bityuckiy
  2005-01-21 14:00                                                               ` Jörn Engel
  2005-01-21 14:04                                                             ` Artem B. Bityuckiy
  1 sibling, 1 reply; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-21 13:52 UTC (permalink / raw)
  To: Jörn Engel; +Cc: MTD List

On Fri, 21 Jan 2005, Jörn Engel wrote:

> On Fri, 21 January 2005 13:13:14 +0000, Artem B. Bityuckiy wrote:
> > 
> > I would like to create a TeX file and put this "JFFS3 checksum stuff" there.
> > This will be the "JFFS3 design issues" file. It will be accessible via CVS and
> > people may edit it, contributing new ideas.
> 
> Sounds good.
> 
> > More questions which I would like to discuss are:
> > 
> > * JFFS3 memory consumption
> 
> Good idea.
> 
> > * JFFS3 & locking
> 
> I started a locking document for jffs2.  Below.
> 
> It doesn't follow any style.  As I didn't know a decent style for
> locking documentation, I made up my own.  Feel free to improve it.
> 
> Jörn
> 
> -- 
> When people work hard for you for a pat on the back, you've got
> to give them that pat.
> -- Robert Heinlein
> 
> Locking hierarchy:
> 
> Locks can belong to either an inode or a superblock.  "f" is used to
> indicate an inode, "c" to indicate a superblock, as in most of the
> code:
> 	struct jffs2_inode *f
> 	struct jffs2_sb *c
> 
> Each lock in the graph below is requested with all the locks above it
> possibly held - as long as their indentation level is less.  So
> c->inocache_lock can be requested with c->alloc_sem, f->sem and
> c->erase_completion_lock held.  c->erase_free_sem is never requested
> with any other locks held.
> 
> none
> 	c->alloc_sem
> 		f->sem
> 			c->erase_completion_lock
> 				c->inocache_lock
> 	c->erase_free_sem
> 	c->gc_thread_start
> 
> 
> 
> Protected data structures:
> 
> f->sem
> 	everything inside f?
> 
> c->alloc_sem
> 	everything in c but the below?
> 
> c->erase_completion_lock
> 	c->free_list
> 	c->erasing_list
> 
> c->inocache_lock
> 	c->inocache_list
> 
> c->erase_free_sem
> 	???
> 
> c->gc_thread_start
> 	nothing - just for synchronization
> 
I think the locking rules in JFFS2 are fuzzy and I would like to change
them. But right now I do not have any *real* idea, and I am not sure I
will have one.

P.S. Have you ever seen fs/jffs2/README.Locking?

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

* Re: JFFS3 & performance
  2005-01-21 13:52                                                             ` Artem B. Bityuckiy
@ 2005-01-21 14:00                                                               ` Jörn Engel
  0 siblings, 0 replies; 196+ messages in thread
From: Jörn Engel @ 2005-01-21 14:00 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: MTD List

On Fri, 21 January 2005 13:52:29 +0000, Artem B. Bityuckiy wrote:
> > 
> I think the locking rules in JFFS2 are fuzzy and I would like to change
> them. But right now I do not have any *real* idea, and I am not sure I
> will have one.
> 
> P.S. Have you ever seen fs/jffs2/README.Locking?

Ouch.  No.  Thanks.

Jörn

-- 
It does not matter how slowly you go, so long as you do not stop.
-- Confucius

* Re: JFFS3 & performance
  2005-01-21 13:42                                                           ` Jörn Engel
  2005-01-21 13:52                                                             ` Artem B. Bityuckiy
@ 2005-01-21 14:04                                                             ` Artem B. Bityuckiy
  2005-01-25 10:51                                                               ` Jörn Engel
  1 sibling, 1 reply; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-21 14:04 UTC (permalink / raw)
  To: Jörn Engel; +Cc: MTD List

Some of my comments below:

On Fri, 21 Jan 2005, Jörn Engel wrote:
> On Fri, 21 January 2005 13:13:14 +0000, Artem B. Bityuckiy wrote:
> > 
> > I would like to create a TeX file and put this "JFFS3 checksum stuff" there.
> > This will be the "JFFS3 design issues" file. It will be accessible via CVS and
> > people may edit it, contributing new ideas.
> 
> Sounds good.
> 
> > More questions which I would like to discuss are:
> > 
> > * JFFS3 memory consumption
> 
> Good idea.
> 
> > * JFFS3 & locking
> 
> I started a locking document for jffs2.  Below.
> 
> It doesn't follow any style.  As I didn't know a decent style for
> locking documentation, I made up my own.  Feel free to improve it.
> 
> Jörn
> 
> -- 
> When people work hard for you for a pat on the back, you've got
> to give them that pat.
> -- Robert Heinlein
> 
> Locking hierarchy:
> 
> Locks can belong to either an inode or a superblock.  "f" is used to
> indicate an inode, "c" to indicate a superblock, as in most of the
> code:
> 	struct jffs2_inode *f
> 	struct jffs2_sb *c
> 
> Each lock in the graph below is requested with all the locks above it
> possibly held - as long as their indentation level is less.  So
> c->inocache_lock can be requested with c->alloc_sem, f->sem and
> c->erase_completion_lock held.  c->erase_free_sem is never requested
> with any other locks held.
> 
> none
> 	c->alloc_sem
> 		f->sem
RULE: You cannot hold any f->sem when you are going to lock c->alloc_sem.

> 			c->erase_completion_lock
> 				c->inocache_lock
> 	c->erase_free_sem
> 	c->gc_thread_start
> 
> 
> 
> Protected data structures:
> 
> f->sem
> 	everything inside f?
Yes. Also, if you hold f->sem, it seems you may change f->inocache fields
without holding c->inocache_lock.

> 
> c->alloc_sem
> 	everything in c but the below?
No. In essence c->alloc_sem protects flash space. I believe this mutex is
misnamed, and it tends to lock code, not data.

> c->erase_completion_lock
Misnamed too. David said this is historical. It protects the
node_ref list.

> 	c->free_list
> 	c->erasing_list
> 
> c->inocache_lock
> 	c->inocache_list
Protects the inocache list. Also protects the objects in it. Protects ic->state.

> 
> c->erase_free_sem
> 	???
David hates this. It is there to protect node_refs - they are protected by
both c->erase_completion_lock and c->erase_free_sem. Most of the time you
use c->erase_completion_lock. But if you need to sleep, you use the mutex.

> 
> c->gc_thread_start
> 	nothing - just for synchronization


--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

* Re: JFFS3 & performance
  2005-01-20 17:33                                                   ` Jörn Engel
  2005-01-20 17:57                                                     ` Artem B. Bityuckiy
@ 2005-01-21 14:30                                                     ` Josh Boyer
  1 sibling, 0 replies; 196+ messages in thread
From: Josh Boyer @ 2005-01-21 14:30 UTC (permalink / raw)
  To: Jörn Engel; +Cc: MTD List

On Thu, 2005-01-20 at 11:33, Jörn Engel wrote:
> > One more by the way, compression mode and whatever people implement in
> > xattr flags may go to that inode flags node.
> 
> Extended attributes can be arbitrary, so they should go somewhere
> else.  Basically, have a pseudo-directory for each inode and treat the
> inodes inside that pseudo-directory as extended attributes.  Similar
> to reiser4.  Didn't Josh plan to work in this area?

<Delayed reaction>

What I'd like to do and what time allows are two separate things :). 
And time will be in very short supply for me in a while.

I'd be happy to review anything in this area.  And if I do get some
time, I'll look into it as well.  But I wouldn't hold your breath.

You know what they say about best laid plans...

</Delayed reaction>

josh

* Re: JFFS3 & performance
  2005-01-20 14:35                                   ` Jörn Engel
  2005-01-20 14:37                                     ` David Woodhouse
  2005-01-20 15:05                                     ` Artem B. Bityuckiy
@ 2005-01-21 22:33                                     ` Jared Hulbert
  2 siblings, 0 replies; 196+ messages in thread
From: Jared Hulbert @ 2005-01-21 22:33 UTC (permalink / raw)
  To: Jörn Engel; +Cc: MTD List

> a) Never generate checksums.
> b) Always generate checksums, but never check them.
> 
> Strategy b) sounds pretty stupid, but it optimizes the 90% case - read
> - and allows the user to remount the filesystem to switch to PARANOID
> mode.  So, we could go as you proposed, we could settle for either a)
> or b) or we could allow both.  In that case I'd call a) the SLOPPY
> case and b) the RELAXED, just to distinguish things.
> 
> Which one makes most sense?

Allowing for both makes the most sense.  Keep in mind that most other
filesystems aren't nearly as paranoid as JFFS2.
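
To make the three modes concrete, here is a minimal sketch in Python. The
mode names come from the quoted proposal; everything else (Mode,
write_node(), read_node(), and a node modelled as a (data, crc) tuple) is
hypothetical and only illustrates the behaviour, it is not JFFS2 or JFFS3
code.

```python
import zlib
from enum import Enum

class Mode(Enum):
    SLOPPY = "never generate checksums"
    RELAXED = "generate checksums, never check them"
    PARANOID = "generate and check checksums"

def write_node(data, mode):
    """Return (data, crc); crc is None when no checksum is generated."""
    crc = None if mode is Mode.SLOPPY else zlib.crc32(data)
    return data, crc

def read_node(node, mode):
    """Return the node data, verifying the checksum only in PARANOID mode."""
    data, crc = node
    if mode is Mode.PARANOID:
        if crc is None:
            raise ValueError("node was written without a checksum")
        if zlib.crc32(data) != crc:
            raise ValueError("checksum mismatch")
    return data

# A node written in RELAXED mode, then corrupted on the medium.
data, crc = write_node(b"hello flash", Mode.RELAXED)
corrupted = (b"hellO flash", crc)

print(read_node(corrupted, Mode.RELAXED))    # corruption goes unnoticed
try:
    read_node(corrupted, Mode.PARANOID)      # e.g. after a remount
except ValueError as err:
    print("PARANOID:", err)
```

The sketch also shows why b) is not as stupid as it sounds: nodes written in
RELAXED mode can still be verified later, while nodes written in SLOPPY mode
never can.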

* Re: JFFS3 & performance
  2005-01-19 19:58                                 ` Artem B. Bityuckiy
  2005-01-20 14:35                                   ` Jörn Engel
@ 2005-01-21 22:46                                   ` Jared Hulbert
  2005-01-21 23:54                                     ` Josh Boyer
  2005-01-22 13:03                                     ` Artem B. Bityuckiy
  1 sibling, 2 replies; 196+ messages in thread
From: Jared Hulbert @ 2005-01-21 22:46 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: MTD List

New idea.

Why should we waste time compressing the incompressible?  JFFS2
actually spent a lot of time compressing.

I can think of a few possible mechanisms:
a) extension based
        Don't compress files with .mpg, .jpg, .avi, .gz, etc.  User
        defined list?
b) test first data
        Compress the first X sized chunk of data to determine if the
        file is compressible.  Make determination and write data.
c) test node by node
        If the last node didn't compress, stop compressing file.

* Re: JFFS3 & performance
  2005-01-21 22:46                                   ` Jared Hulbert
@ 2005-01-21 23:54                                     ` Josh Boyer
  2005-01-22 13:03                                     ` Artem B. Bityuckiy
  1 sibling, 0 replies; 196+ messages in thread
From: Josh Boyer @ 2005-01-21 23:54 UTC (permalink / raw)
  To: Jared Hulbert; +Cc: MTD List

On Fri, 2005-01-21 at 14:46 -0800, Jared Hulbert wrote:
> New idea.
> 
> Why should we waste time compressing the incompressible?  JFFS2
> actually spent a lot of time compressing.
> 
> I can think of a few possible mechanisms:
> a) extension based
>         Don't compress files with .mpg, .jpg, .avi, .gz, etc.  User
>         defined list?
> b) test first data
>         Compress the first X sized chunk of data to determine if the
>         file is compressible.  Make determination and write data.
> c) test node by node
>         If the last node didn't compress, stop compressing file.

Both b) and c) don't work.  Binary files with large sections of zeros
between seemingly random data compress quite well.  With those schemes
you lose the compression benefit of all those zeros (or any other
repeating pattern).
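
A small experiment makes the point. This is only a sketch in Python using
the standard zlib module; the 4 KiB probe size and the synthetic file
(one chunk of pseudo-random bytes followed by zeros) are assumptions
chosen for illustration.

```python
import random
import zlib

CHUNK = 4096  # assumed probe/node size

def ratio(data):
    """Compressed size divided by original size."""
    return len(zlib.compress(data)) / len(data)

rng = random.Random(0)  # fixed seed so the run is repeatable
noise = bytes(rng.getrandbits(8) for _ in range(CHUNK))
blob = noise + bytes(15 * CHUNK)      # 4 KiB of noise, then 60 KiB of zeros

first = ratio(blob[:CHUNK])           # what scheme b) would look at
whole = ratio(blob)                   # what the file actually achieves
print(f"first chunk: {first:.2f}, whole file: {whole:.2f}")
```

The first chunk does not shrink at all, so a scheme that decides from it
would give up, even though the file as a whole compresses to a small
fraction of its size.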

Option a) could work, maybe.  You can argue that file extensions are
more of a Winders mechanism though.  None of those files have to end in
a specific extension on Linux.  E.g. someone can make a foo.gz, copy it
to foo.notgz, and gzip will still grok it.

But if the list was user definable as you suggested, then users can tune
to their specific usage.

Or, as has been suggested before, you could use xattrs to do per-file
compression.  This is probably the most generic option.

josh

* Re: JFFS3 & performance
  2005-01-21 22:46                                   ` Jared Hulbert
  2005-01-21 23:54                                     ` Josh Boyer
@ 2005-01-22 13:03                                     ` Artem B. Bityuckiy
  2005-01-22 22:04                                       ` David Woodhouse
  1 sibling, 1 reply; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-22 13:03 UTC (permalink / raw)
  To: Jared Hulbert; +Cc: MTD List

On Fri, 2005-01-21 at 14:46 -0800, Jared Hulbert wrote:
> New idea.
> 
> Why should we waste time compressing the incompressible?  JFFS2
> actually spent a lot of time compressing.
Jared, I believe you have opened a very important subject. I'll try to
list the drawbacks of compression in JFFS2:

1. In a lot of cases, as Jared said, we do "waste time compressing the
incompressible".

2. Worse, compression impacts the JFFS2 design globally. As you know, the
maximum amount of data JFFS2 may store in one inode node is the size of
a RAM page, most often 4K.

Q: Why do we restrict ourselves to this size?
A: Because of compression.

This is the only way to implement a quick read operation: VFS talks to
JFFS2 in terms of RAM pages - it reads pages of data and writes pages of
data. With one page of compressed data per node, we can uncompress the
node's data directly into the RAM page provided by VFS, without
extra copy operations.

Let me list the consequences of this:

2.1. We now have two flavors of inode nodes: pristine and non-pristine.
Pristine inode nodes are those that hold 4K of data. Non-pristine inode
nodes are those that hold less than 4K of data.

2.2. GC not only collects garbage, it has an additional task: it merges
non-pristine nodes to produce pristine nodes, thus optimizing future read
operations. This additional policy slows down and complicates the
Garbage Collector.

2.3. It is a known fact that JFFS2 keeps one small object in RAM for each
node on flash. So the more nodes we have, the more RAM we use. The
amount of memory (called in-core memory) JFFS2 needs is a linear function
of the number of nodes on flash. So, as flash chips grow, JFFS2's
chances of survival decrease.

Simple estimates show that if we have a 1 gigabyte flash partition with
JFFS2 and fill it with data, JFFS2 will eat:

~3 MB of RAM in the best, and probably unreachable, case.
~5 - 10 MB of RAM if your filesystem has many small files, like a Linux
root FS with its many files in /dev/, etc.

This is if you only mount JFFS2. If you start opening big files, you
need much more RAM.

So the need to split our data into 4K portions leads to more nodes on
flash, which in turn leads to more RAM being needed.
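
The linear relationship can be written down as a back-of-the-envelope
calculation. This is only a sketch in Python: the function name and the
16 bytes per in-RAM object are assumptions made for illustration, not
figures taken from the JFFS2 source, so the absolute numbers it prints are
not the estimates given above. Only the proportionality is the point.

```python
def incore_bytes(flash_bytes, avg_node_bytes, ref_bytes=16):
    """RAM used for per-node objects: one ref_bytes-sized object per node.
    ref_bytes=16 is an assumed placeholder, not a measured value."""
    nodes = flash_bytes // avg_node_bytes
    return nodes * ref_bytes

GIB = 1 << 30
MIB = 1 << 20
for avg in (4096, 2048, 1024):
    ram = incore_bytes(GIB, avg)
    print(f"avg node {avg:4d} B on flash -> {ram / MIB:.0f} MiB of RAM")
```

Under these assumptions, halving the average node size on flash doubles the
in-core memory, which is why many small files or well-compressed nodes cost
more RAM than the best case.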

2.4. JFFS2 compresses 4K portions independently, so in many cases the
compression ratio will not be very good.

Imagine we have a compressed .avi film. We split it into 4K portions and
compress these 4K pieces independently. Then we attach a 64-byte JFFS2
header to each 4K portion. Result: we waste a lot of space since we
have redundant inode node headers on flash, and we waste CPU trying to
compress data that does not compress.
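
Both effects, the per-node header and the cost of compressing each 4K piece
on its own, can be measured with a small sketch in Python using the standard
zlib module. The 64-byte header and 4K chunk size are the figures from the
text; stored_size() and the repetitive test data are hypothetical, and
storing a piece raw when compression does not shrink it is a modelling
assumption.

```python
import zlib

CHUNK = 4096    # data bytes per node (one RAM page)
HEADER = 64     # per-node header size quoted above

def stored_size(data, compress=True):
    """Bytes on flash when data is split into CHUNK-sized nodes, each
    compressed on its own and prefixed with a HEADER-byte header."""
    total = 0
    for off in range(0, len(data), CHUNK):
        piece = data[off:off + CHUNK]
        body = zlib.compress(piece) if compress else piece
        total += HEADER + min(len(body), len(piece))  # store raw if bigger
    return total

text = b"JFFS3 log-structured flash file system node\n" * 1500
text = text[:16 * CHUNK]                      # 64 KiB of repetitive data

per_node = stored_size(text)
one_stream = len(zlib.compress(text))
print(f"16 nodes compressed separately: {per_node} bytes")
print(f"one zlib stream, no headers:    {one_stream} bytes")
print(f"16 uncompressed nodes:          {stored_size(text, compress=False)} bytes")
```

For incompressible data such as the .avi example, the last line is the
relevant one: every node costs its full 4K plus the header, and the CPU time
spent on the compression attempt buys nothing.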

Of course, compression has advantages too.

My conclusion:
~~~~~~~~~~~~~~
Whether or not to have compression is a very important question, and it
should be carefully analyzed.

> 
> I can think of a few possible mechanisms:
> a) extension based
>         Don't compress files with .mpg, .jpg, .avi, .gz, etc.  User
>         defined list?
I believe it is *not JFFS2's job* to distinguish extensions. Worse,
it is not the kernel's business. Xattrs, again, may help here: the user
may disable or enable compression for any file.

> b) test first data
>         Compress the first X sized chunk of data to determine if the
>         file is compressible.  Make determination and write data.
This is already implemented in JFFS2. There is a "size" compression mode.
See the JFFS2 section of the kernel configuration.

> c) test node by node
>           If the last node didn't compress, stop compressing file.
Maybe...


I just want to stress that:

1. If we simply drop compression, we will have many opportunities to
simplify and optimize JFFS3. This will impact the JFFS3 design.

2. If we make compression optional, we should possibly implement the
ability to have more than 4K of data in inodes for which compression is
disabled.

* Re: JFFS3 & performance
  2005-01-22 13:03                                     ` Artem B. Bityuckiy
@ 2005-01-22 22:04                                       ` David Woodhouse
  2005-01-23 10:03                                         ` Artem B. Bityuckiy
  0 siblings, 1 reply; 196+ messages in thread
From: David Woodhouse @ 2005-01-22 22:04 UTC (permalink / raw)
  To: dedekind; +Cc: MTD List

On Sat, 2005-01-22 at 16:03 +0300, Artem B. Bityuckiy wrote:
> 2. Worse, compression impacts the JFFS2 design globally. As you know, the
> maximum amount of data JFFS2 may store in one inode node is the size of
> a RAM page, most often 4K.
> 
> Q: Why do we restrict ourselves to this size?
> A: Because of compression.

It's not just compression that restricts us to 4KiB. Compressing in
larger chunks isn't hard -- zisofs does it, for example. The real
problem is the larger buffers you need for garbage collection. It's not
impossible to fix that though.

We've always intended to have a per-inode attribute for user-specified
compression parameters; the most basic of which would be 'no
compression'. I originally intended to expose those as attributes in the
way that 'chattr' works, with an ioctl on the file -- but I suspect
xattrs would be a better approach nowadays.

Compression has other drawbacks though -- if you throw out compression
you can do fixed-size records, you can have a block-based architecture
and simplify your metadata, ...

-- 
dwmw2

* Re: JFFS3 & performance
  2005-01-22 22:04                                       ` David Woodhouse
@ 2005-01-23 10:03                                         ` Artem B. Bityuckiy
  2005-01-23 10:08                                           ` Artem B. Bityuckiy
  2005-01-23 11:04                                           ` David Woodhouse
  0 siblings, 2 replies; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-23 10:03 UTC (permalink / raw)
  To: David Woodhouse; +Cc: MTD List

On Sat, 2005-01-22 at 22:04 +0000, David Woodhouse wrote:
> Compression has other drawbacks though -- if you throw out compression
> you can do fixed-size records, you can have a block-based architecture
> and simplify your metadata, ...

Hmm. What do you mean? Do you mean a YAFFS-like architecture? Then we
had better do YAFFS2 rather than JFFS3 :-))

Seriously, I really can't imagine this. Do you mean just fixing
the size of inode node data at, say, 512 bytes? What would we do with
direntries?

Is it really feasible? Will we still have JFFS3 generation?

* Re: JFFS3 & performance
  2005-01-23 10:03                                         ` Artem B. Bityuckiy
@ 2005-01-23 10:08                                           ` Artem B. Bityuckiy
  2005-01-23 11:04                                           ` David Woodhouse
  1 sibling, 0 replies; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-23 10:08 UTC (permalink / raw)
  To: David Woodhouse; +Cc: MTD List

On Sun, 2005-01-23 at 13:03 +0300, Artem B. Bityuckiy wrote:
> On Sat, 2005-01-22 at 22:04 +0000, David Woodhouse wrote:
> > Compression has other drawbacks though -- if you throw out compression
> > you can do fixed-size records, you can have a block-based architecture
> > and simplify your metadata, ...
> 
> Hmm. What do you mean? Do you mean a YAFFS-like architecture? Then we
> had better do YAFFS2 rather than JFFS3 :-))
> 
> Seriously, I really can't imagine this. Do you mean just fixing
> the size of inode node data at, say, 512 bytes? What would we do with
> direntries?
> 
> Is it really feasible? Will we still have JFFS3 generation?
Ouch, I meant JFFS2 generation.

* Re: JFFS3 & performance
  2005-01-23 10:03                                         ` Artem B. Bityuckiy
  2005-01-23 10:08                                           ` Artem B. Bityuckiy
@ 2005-01-23 11:04                                           ` David Woodhouse
  2005-01-23 11:55                                             ` Artem B. Bityuckiy
  1 sibling, 1 reply; 196+ messages in thread
From: David Woodhouse @ 2005-01-23 11:04 UTC (permalink / raw)
  To: dedekind; +Cc: MTD List

On Sun, 2005-01-23 at 13:03 +0300, Artem B. Bityuckiy wrote:
> Hmm. What do you mean? Do you mean a YAFFS-like architecture? Then we
> had better do YAFFS2 rather than JFFS3 :-))

Something like that. If you're going to ditch the compression then you
might as well try to make some use of the structure. Having real on-
medium metadata lets you use a lot less RAM than a truly log-structured
file system.

I don't think that's necessarily the right approach though -- even with
the larger NAND sizes, we're short of space and compression is a really
useful tool for a general-purpose file system. 

YAFFS2 is already being worked on, for those applications for which it
makes more sense.

-- 
dwmw2

* Re: JFFS3 & performance
  2005-01-23 11:04                                           ` David Woodhouse
@ 2005-01-23 11:55                                             ` Artem B. Bityuckiy
  0 siblings, 0 replies; 196+ messages in thread
From: Artem B. Bityuckiy @ 2005-01-23 11:55 UTC (permalink / raw)
  To: David Woodhouse; +Cc: MTD List

On Sun, 2005-01-23 at 11:04 +0000, David Woodhouse wrote:
> Something like that. If you're going to ditch the compression then you
> might as well try to make some use of the structure. Having real on-
> medium metadata lets you use a lot less RAM than a truly log-structured
> file system.
My opinion is that we are doing JFFS3 and should keep its "pure"
log-structured nature intact. YAFFS is a different design and has its own
advantages and drawbacks.

> I don't think that's necessarily the right approach though -- even with
> the larger NAND sizes, we're short of space and compression is a really
> useful tool for a general-purpose file system. 
I agree. I think, as different people including you have already said,
we should provide the possibility to switch compression on and off for
individual files.

And possibly, we may:
1. Allow several independently compressed 4K chunks to be put into one
node when compression is on.
2. Allow more than 4K of data to be put into inodes with compression off.

* Re: JFFS3 & performance
  2005-01-21 14:04                                                             ` Artem B. Bityuckiy
@ 2005-01-25 10:51                                                               ` Jörn Engel
  2005-01-25 10:56                                                                 ` David Woodhouse
  0 siblings, 1 reply; 196+ messages in thread
From: Jörn Engel @ 2005-01-25 10:51 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: MTD List

On Fri, 21 January 2005 14:04:31 +0000, Artem B. Bityuckiy wrote:
> > 
> > none
> > 	c->alloc_sem
> > 		f->sem
> RULE: You cannot hold any f->sem when you are going to lock c->alloc_sem.

Or c->erase_completion_lock if you want to lock either of the above.
Or c->inocache_lock if you want to lock either of the above.

As a very simple static deadlock checker, you can draw a graph of
all locks, like I did.  If the graph has cycles, deadlock situations
are possible.
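
Such a checker fits in a few lines. The following is only a sketch in
Python, not an existing tool: find_cycle() is a hypothetical helper doing a
depth-first search over "held, then acquired" edges. GOOD encodes the
nesting from the hierarchy in this thread, and BAD adds one made-up
violation for demonstration.

```python
def find_cycle(order):
    """Return one cycle in the 'held -> acquired next' graph as a list
    of lock names, or None if the graph is acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    colour = {}
    path = []

    def visit(lock):
        colour[lock] = GREY
        path.append(lock)
        for nxt in order.get(lock, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:                      # back edge: cycle found
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        colour[lock] = BLACK
        return None

    for lock in list(order):
        if colour.get(lock, WHITE) == WHITE:
            found = visit(lock)
            if found:
                return found
    return None

# Edges taken from the hierarchy in this thread.
GOOD = {
    "c->alloc_sem": ["f->sem"],
    "f->sem": ["c->erase_completion_lock"],
    "c->erase_completion_lock": ["c->inocache_lock"],
}
# Hypothetical bug: some path takes c->alloc_sem while holding f->sem.
BAD = {**GOOD, "f->sem": ["c->erase_completion_lock", "c->alloc_sem"]}

print(find_cycle(GOOD))
print(find_cycle(BAD))
```

The hard part in practice is not the search but collecting the edges, which
means reading every code path that takes more than one lock.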

> > 			c->erase_completion_lock
> > 				c->inocache_lock
> > 	c->erase_free_sem
> > 	c->gc_thread_start
> > 
> > 
> > 
> > Protected data structures:
> > 
> > f->sem
> > 	everything inside f?
> Yes. Also, if you hold f->sem, it seems you may change f->inocache fields
> without holding c->inocache_lock.

That could be a problem.  Will take a while before I can check it.

> > c->alloc_sem
> > 	everything in c but the below?
> No. In essence c->alloc_sem protects flash space. I believe this mutex is
> misnamed, and it tends to lock code, not data.

Since that code writes to flash, it does protect data.  Makes sense,
although one could possibly move the lock down to individual erase
blocks.

> > c->erase_completion_lock
> Misnamed too. David said this is historical. It protects the
> node_ref list.

Ok.

> > 	c->free_list
> > 	c->erasing_list
> > 
> > c->inocache_lock
> > 	c->inocache_list
> Protects the inocache list. Also protects the objects in it. Protects ic->state.

Ok.

> > c->erase_free_sem
> > 	???
> David hates this. It is there to protect node_refs - they are protected by
> both c->erase_completion_lock and c->erase_free_sem. Most of the time you
> use c->erase_completion_lock. But if you need to sleep, you use the mutex.

Sounds like a good candidate for a patch.

Jörn

-- 
Das Aufregende am Schreiben ist es, eine Ordnung zu schaffen, wo
vorher keine existiert hat.
-- Doris Lessing

* Re: JFFS3 & performance
  2005-01-25 10:51                                                               ` Jörn Engel
@ 2005-01-25 10:56                                                                 ` David Woodhouse
  2005-01-25 11:13                                                                   ` Jörn Engel
  0 siblings, 1 reply; 196+ messages in thread
From: David Woodhouse @ 2005-01-25 10:56 UTC (permalink / raw)
  To: Jörn Engel; +Cc: MTD List

On Tue, 2005-01-25 at 11:51 +0100, Jörn Engel wrote:
> > RULE: You cannot hold any f->sem when you are going to lock c->alloc_sem.
> 
> Or c->erase_completion_lock if you want to lock either of the above.

That's obvious -- you can't lock semaphores while you hold spinlocks.
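
That rule is easy to state as a check. A minimal sketch in Python follows.
The KIND table is an assumption: the thread only implies which locks are
spinlocks and which are semaphores, so verify it against the JFFS2 source.
violations() is a hypothetical helper, not an existing tool.

```python
# Assumed lock kinds; the thread only implies them, check the JFFS2 source.
KIND = {
    "c->alloc_sem": "semaphore",
    "f->sem": "semaphore",
    "c->erase_free_sem": "semaphore",
    "c->erase_completion_lock": "spinlock",
    "c->inocache_lock": "spinlock",
}

def violations(pairs):
    """Return the (held, wanted) pairs that try to take a sleeping lock
    (semaphore) while a spinlock is held."""
    return [(held, wanted) for held, wanted in pairs
            if KIND[held] == "spinlock" and KIND[wanted] == "semaphore"]

attempts = [
    ("c->alloc_sem", "f->sem"),                         # fine
    ("f->sem", "c->erase_completion_lock"),             # fine
    ("c->erase_completion_lock", "c->alloc_sem"),       # not allowed
]
print(violations(attempts))
```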

> Or c->inocache_lock if you want to lock either of the above.

That (inocache_lock vs. erase_completion_lock) is documented in
README.Locking.

-- 
dwmw2

* Re: JFFS3 & performance
  2005-01-25 10:56                                                                 ` David Woodhouse
@ 2005-01-25 11:13                                                                   ` Jörn Engel
  0 siblings, 0 replies; 196+ messages in thread
From: Jörn Engel @ 2005-01-25 11:13 UTC (permalink / raw)
  To: David Woodhouse; +Cc: MTD List

On Tue, 25 January 2005 10:56:39 +0000, David Woodhouse wrote:
> 
> That (inocache_lock vs. erase_completion_lock) is documented in
> README.Locking.

Some of the locks I found are not.  Send a patch?

Jörn

-- 
ticks = jiffies;
while (ticks == jiffies);
ticks = jiffies;
-- /usr/src/linux/init/main.c

* Re: JFFS3 & performance
  2005-01-13 18:30                 ` Jared Hulbert
  2005-01-13 18:36                   ` David Woodhouse
  2005-01-13 18:55                   ` Artem B. Bityuckiy
@ 2005-01-28  6:08                   ` Eric W. Biederman
  2 siblings, 0 replies; 196+ messages in thread
From: Eric W. Biederman @ 2005-01-28  6:08 UTC (permalink / raw)
  To: Jared Hulbert; +Cc: David Woodhouse, MTD List

Jared Hulbert <jaredeh@gmail.com> writes:

> > Guess we'll have to agree to disagree then :).  All I know is that I
> > want to be damn sure that the data I'm returning isn't totally screwed.
> > Call me paranoid.  A checksum is the only way I know of doing that.
> 
> Paranoid :)

Only insane programmers are not paranoid.

> To humor those of us willing to take our chances trusting the media
> won't go bad, would it be possible to architect JFFS3 such that
> disabling the checksumming or stripping it out is possible with out
> too much pain?

But checksums protect against more than the reads going bad.  They
also protect against writes going bad.  With large volumes of use,
write errors are almost a certainty with NOR, and without a checksum
you can miss the fact that the error happened.  And a single bad write
is especially painful if you are writing compressed data.

If you just need reads, something like romfs or isofs, tuned for a
NOR flash chip, is probably better.

Eric

end of thread, other threads:[~2005-01-28  6:08 UTC | newest]

Thread overview: 196+ messages
2005-01-11 12:29 JFFS3 & performance Artem B. Bityuckiy
2005-01-11 14:37 ` Josh Boyer
2005-01-11 21:51 ` Jörn Engel
2005-01-12  0:06   ` Thomas Gleixner
2005-01-12 16:59     ` Jörn Engel
2005-01-12 17:37       ` Thomas Gleixner
2005-01-12 18:17         ` Jörn Engel
2005-01-12  9:15   ` Artem B. Bityuckiy
2005-01-12 16:41     ` Jared Hulbert
2005-01-12 17:02       ` Jörn Engel
2005-01-12 17:06         ` David Woodhouse
2005-01-12 17:11           ` Jörn Engel
2005-01-12 17:22         ` Jared Hulbert
2005-01-12 17:28           ` Artem B. Bityuckiy
2005-01-12 17:34           ` David Woodhouse
2005-01-12 17:45             ` Dan Post
2005-01-12 17:52               ` David Woodhouse
2005-01-12 17:14       ` Artem B. Bityuckiy
2005-01-12 22:30         ` Jared Hulbert
2005-01-12 22:43           ` Josh Boyer
2005-01-12 22:55             ` Jared Hulbert
2005-01-13 15:50               ` Josh Boyer
2005-01-13 18:30                 ` Jared Hulbert
2005-01-13 18:36                   ` David Woodhouse
2005-01-13 19:06                     ` Jörn Engel
2005-01-13 19:22                     ` Josh Boyer
2005-01-13 18:55                   ` Artem B. Bityuckiy
2005-01-13 19:10                     ` Brian Fox
2005-01-13 19:23                       ` Artem B. Bityuckiy
2005-01-28  6:08                   ` Eric W. Biederman
2005-01-13  7:54             ` David Woodhouse
2005-01-13  8:25           ` Artem B. Bityuckiy
2005-01-13 15:09           ` Jörn Engel
2005-01-12 18:10     ` Jörn Engel
2005-01-12 18:27       ` Thomas Gleixner
2005-01-12 18:40         ` Jörn Engel
2005-01-12 18:42           ` David Woodhouse
2005-01-12 18:43           ` Artem B. Bityuckiy
2005-01-12 19:16           ` Thomas Gleixner
2005-01-12 19:44             ` Jörn Engel
2005-01-12 19:53               ` Thomas Gleixner
2005-01-12 20:06                 ` Jörn Engel
2005-01-12 18:33       ` Artem B. Bityuckiy
2005-01-12 18:43         ` Jörn Engel
2005-01-12 18:45           ` Artem B. Bityuckiy
2005-01-12 18:58             ` Artem B. Bityuckiy
2005-01-12 19:50               ` Jörn Engel
2005-01-13 14:49   ` Artem B. Bityuckiy
2005-01-13 15:05     ` Artem B. Bityuckiy
2005-01-13 15:17       ` Jörn Engel
2005-01-13 15:22         ` Artem B. Bityuckiy
2005-01-13 15:40           ` Jörn Engel
2005-01-13 15:49             ` David Woodhouse
2005-01-13 15:53               ` Artem B. Bityuckiy
2005-01-13 16:13               ` Jörn Engel
2005-01-13 16:16                 ` Artem B. Bityuckiy
2005-01-13 16:21                   ` Jörn Engel
2005-01-13 16:22                     ` Artem B. Bityuckiy
2005-01-13 16:21                 ` Artem B. Bityuckiy
2005-01-14 13:46                   ` Jamey Hicks
2005-01-14 14:16                     ` Artem B. Bityuckiy
2005-01-18 15:50                       ` Joakim Tjernlund
2005-01-19 13:07                         ` Artem B. Bityuckiy
2005-01-19 15:24                           ` Jörn Engel
2005-01-19 15:27                             ` Artem B. Bityuckiy
2005-01-19 15:32                               ` Jörn Engel
2005-01-19 15:51                                 ` Artem B. Bityuckiy
2005-01-19 16:31                                   ` Jörn Engel
2005-01-19 19:58                                 ` Artem B. Bityuckiy
2005-01-20 14:35                                   ` Jörn Engel
2005-01-20 14:37                                     ` David Woodhouse
2005-01-20 14:40                                       ` Jörn Engel
2005-01-20 15:05                                     ` Artem B. Bityuckiy
2005-01-20 15:27                                       ` Jörn Engel
2005-01-20 15:37                                         ` Artem B. Bityuckiy
2005-01-20 16:13                                           ` Jörn Engel
2005-01-20 16:31                                             ` Artem B. Bityuckiy
2005-01-20 16:41                                               ` Jörn Engel
2005-01-20 17:08                                                 ` Artem B. Bityuckiy
2005-01-20 17:33                                                   ` Jörn Engel
2005-01-20 17:57                                                     ` Artem B. Bityuckiy
2005-01-21 12:44                                                       ` Jörn Engel
2005-01-21 13:13                                                         ` Artem B. Bityuckiy
2005-01-21 13:42                                                           ` Jörn Engel
2005-01-21 13:52                                                             ` Artem B. Bityuckiy
2005-01-21 14:00                                                               ` Jörn Engel
2005-01-21 14:04                                                             ` Artem B. Bityuckiy
2005-01-25 10:51                                                               ` Jörn Engel
2005-01-25 10:56                                                                 ` David Woodhouse
2005-01-25 11:13                                                                   ` Jörn Engel
2005-01-21 14:30                                                     ` Josh Boyer
2005-01-21 22:33                                     ` Jared Hulbert
2005-01-21 22:46                                   ` Jared Hulbert
2005-01-21 23:54                                     ` Josh Boyer
2005-01-22 13:03                                     ` Artem B. Bityuckiy
2005-01-22 22:04                                       ` David Woodhouse
2005-01-23 10:03                                         ` Artem B. Bityuckiy
2005-01-23 10:08                                           ` Artem B. Bityuckiy
2005-01-23 11:04                                           ` David Woodhouse
2005-01-23 11:55                                             ` Artem B. Bityuckiy
  -- strict thread matches above, loose matches on Subject: below --
2004-12-16 13:20 Joakim Tjernlund
2004-12-16 14:27 ` Artem B. Bityuckiy
2004-12-16 14:45   ` Joakim Tjernlund
2004-12-16 14:50     ` Artem B. Bityuckiy
2004-12-16 15:00       ` Joakim Tjernlund
2004-12-16 17:53     ` Jörn Engel
2004-12-16 18:42       ` Artem B. Bityuckiy
2004-12-16 19:15         ` Jörn Engel
2004-12-16 19:49           ` Jörn Engel
2004-12-16 19:58             ` Joakim Tjernlund
2004-12-16 20:46               ` Jörn Engel
2004-12-16 20:02             ` Joakim Tjernlund
2004-12-16 20:37               ` Thomas Gleixner
2004-12-16 20:51                 ` Jörn Engel
2004-12-16 21:02                   ` Thomas Gleixner
2004-12-16 21:06                   ` Joakim Tjernlund
2004-12-16 21:22                     ` Jörn Engel
2004-12-16 22:06                       ` Joakim Tjernlund
2004-12-17 10:25                         ` Jörn Engel
2004-12-17 10:44                           ` Joakim Tjernlund
2004-12-17 10:56                             ` Artem B. Bityuckiy
2004-12-17 10:46                               ` jasmine
2004-12-17 11:01                                 ` Artem B. Bityuckiy
2004-12-17 11:19                                   ` Joakim Tjernlund
2004-12-18 16:09                                     ` Jörn Engel
2004-12-18 16:26                                       ` Joakim Tjernlund
2004-12-18 16:52                                         ` Jörn Engel
2004-12-17 11:10                               ` Joakim Tjernlund
2004-12-17 11:20                                 ` Artem B. Bityuckiy
2004-12-22 13:36                                   ` Artem B. Bityuckiy
2004-12-22 14:03                                     ` Jörn Engel
2004-12-22 14:44                                       ` Artem B. Bityuckiy
2004-12-22 15:14                                         ` Jörn Engel
2004-12-22 15:25                                           ` Artem B. Bityuckiy
2004-12-22 16:08                                             ` Jörn Engel
2004-12-22 20:22                                             ` xemc
2004-12-22 20:43                                               ` xemc
2004-12-22 20:49                                                 ` Jasmine Strong
2004-12-22 15:30                                           ` Joakim Tjernlund
2004-12-22 15:37                                             ` Artem B. Bityuckiy
2004-12-22 15:47                                               ` Joakim Tjernlund
2004-12-22 15:56                                                 ` Artem B. Bityuckiy
2004-12-22 16:09                                                   ` Jörn Engel
2004-12-22 16:17                                                     ` Artem B. Bityuckiy
2004-12-22 16:43                                                       ` Joakim Tjernlund
2004-12-22 16:46                                                         ` Artem B. Bityuckiy
2004-12-22 17:26                                                       ` Jörn Engel
2004-12-22 18:14                                                         ` xemc
2004-12-22 18:20                                                           ` Artem B. Bityuckiy
2004-12-23 13:52                                                           ` Jörn Engel
2004-12-23 17:02                                                             ` Artem B. Bityuckiy
2005-01-07 11:10                                                               ` Artem B. Bityuckiy
2005-01-07 11:09                                                                 ` David Woodhouse
2005-01-07 11:27                                                                   ` jasmine
2005-01-07 11:43                                                                     ` Artem B. Bityuckiy
2005-01-07 14:23                                                                       ` Artem B. Bityuckiy
2005-01-07 14:27                                                                         ` jasmine
2005-01-07 14:33                                                                           ` Artem B. Bityuckiy
2005-01-07 14:37                                                                             ` jasmine
2005-01-07 14:43                                                                               ` Artem B. Bityuckiy
2005-01-07 14:55                                                                                 ` jasmine
2005-01-07 15:20                                                                                   ` Artem B. Bityuckiy
2005-01-07 15:24                                                                                     ` jasmine
2005-01-07 15:28                                                                                       ` Artem B. Bityuckiy
2005-01-07 15:31                                                                                         ` jasmine
2005-01-07 15:32                                                                                           ` Artem B. Bityuckiy
2005-01-07 17:57                                                                                       ` Artem B. Bityuckiy
2005-01-07 14:50                                                                               ` Artem B. Bityuckiy
2005-01-07 14:31                                                                         ` Artem B. Bityuckiy
2005-01-06 10:08                                                             ` Artem B. Bityuckiy
2005-01-08 20:14                                                               ` Jörn Engel
2005-01-09 11:39                                                                 ` Artem B. Bityuckiy
2005-01-10 14:24                                                                   ` Jörn Engel
2004-12-22 15:59                                                 ` jasmine
2004-12-22 16:19                                                   ` Jörn Engel
2004-12-22 16:21                                                   ` Artem B. Bityuckiy
2004-12-22 15:56                                             ` Jörn Engel
2004-12-22 16:39                                               ` Joakim Tjernlund
2004-12-22 17:33                                                 ` Jörn Engel
2004-12-17 11:20                               ` Jörn Engel
2004-12-18 12:23                                 ` Artem B. Bityuckiy
2004-12-21 14:45       ` Artem B. Bityuckiy
2004-12-21 16:03         ` Jörn Engel
2004-12-17 11:33 ` David Vrabel
2004-12-17 15:34   ` Joakim Tjernlund
2004-12-18 16:14   ` Jörn Engel
2004-12-18 16:25     ` Joakim Tjernlund
2004-12-18 16:39       ` Jörn Engel
2004-12-18 17:10     ` Joakim Tjernlund
2004-12-18 17:19       ` Jörn Engel
2004-12-18 17:51         ` Joakim Tjernlund
2004-12-18 17:59           ` Jörn Engel
2004-12-18 18:13             ` Joakim Tjernlund
2004-12-19  3:05               ` Jörn Engel
2004-12-18 18:09           ` Joakim Tjernlund
2004-12-21 14:38 ` Jörn Engel
