public inbox for linux-mtd@lists.infradead.org
* jffs2_scan_eraseblock() - errors
@ 2002-07-30 18:11 Curtis, Allen
  2002-07-30 22:21 ` David Woodhouse
  0 siblings, 1 reply; 20+ messages in thread
From: Curtis, Allen @ 2002-07-30 18:11 UTC (permalink / raw)
  To: 'linux-mtd@lists.infradead.org'

While testing power-fail operation of JFFS2, I can consistently produce this
error which occurs when the partition is mounted.

jffs2_scan_eraseblock() - Node at 0xXXXXXX {0x1985, 0xe002, 0x1985c002) has
invalid CRC)

I do not remember whether the error is always CRC related. The Node location
changes, but the hex values displayed within the parens are consistent.

Has anyone seen this before? Some concerns are:
1. Once the error occurs, it never gets fixed.
2. There does not appear to be a utility to fix errors.

We are using MVista 2.1, which is based on the 2.4.17 kernel. I was able to
produce this error with only 13 power-cycles.

Any help would be appreciated!
TIA

* Re: jffs2_scan_eraseblock() - errors
  2002-07-30 18:11 Curtis, Allen
@ 2002-07-30 22:21 ` David Woodhouse
  2002-07-31 10:34   ` Jörn Engel
  0 siblings, 1 reply; 20+ messages in thread
From: David Woodhouse @ 2002-07-30 22:21 UTC (permalink / raw)
  To: Curtis, Allen; +Cc: 'linux-mtd@lists.infradead.org'

Allen.Curtis@Thales-IFS.com said:
> While testing power-fail operation of JFFS2, I can consistently
> produce this error which occurs when the partition is mounted.

> jffs2_scan_eraseblock() - Node at 0xXXXXXX {0x1985, 0xe002,
> 0x1985c002) has invalid CRC)

> I do not remember whether the error is always CRC related. The Node
> location changes, but the hex values displayed within the parens are
> consistent.

> Has anyone seen this before? Some concerns are: 1. Once the error
> occurs, it never gets fixed. 2. There does not appear to be a utility
> to fix errors.

> We are using MVista 2.1, which is 2.4.17. I was able to produce this
> error with only 13 power-cycles. 

It's harmless. We were in the middle of writing that node when we lost
power, and hadn't finished writing the payload -- hence the CRC fails.
That's sort of what the CRC is there for. No data were lost -- either we
were garbage-collecting and the original copy of the data will still be on
the flash, or it was a write of new data which never made it to flash
because the write() call never returned -- just as if you pulled the plug a
moment sooner. 
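
A minimal sketch of why an interrupted write shows up as a CRC failure at the next mount. Everything here is illustrative: `struct demo_node`, `write_node_interrupted()` and `node_crc_ok()` are hypothetical names, the node layout is far simpler than a real JFFS2 node, and the bitwise CRC-32 is a generic one that is not claimed to match the exact variant JFFS2 uses. The only assumption about the flash is that unwritten bytes stay in the erased state (all 0xFF).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Generic bitwise CRC-32 (reflected polynomial 0xEDB88320), illustration only. */
static uint32_t crc32_simple(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

#define PAYLOAD_LEN 16

/* Hypothetical, simplified node: a payload CRC followed by the payload. */
struct demo_node {
    uint32_t data_crc;               /* CRC of the full intended payload */
    uint8_t  payload[PAYLOAD_LEN];
};

/* Simulate a write that loses power after `written` payload bytes.
 * Bytes that were never programmed keep the erased value 0xFF. */
static void write_node_interrupted(struct demo_node *flash,
                                   const uint8_t *data, size_t written)
{
    memset(flash, 0xFF, sizeof(*flash));
    flash->data_crc = crc32_simple(data, PAYLOAD_LEN);
    memcpy(flash->payload, data, written);
}

/* Conceptual mount-time check: recompute the payload CRC and compare. */
static int node_crc_ok(const struct demo_node *flash)
{
    return crc32_simple(flash->payload, PAYLOAD_LEN) == flash->data_crc;
}
```

Calling `write_node_interrupted()` with `written == PAYLOAD_LEN` gives a node that passes `node_crc_ok()`; with a smaller `written` the tail of the payload is still 0xFF and the check fails, which is the condition the scan reports.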

It's actually possible to avoid the cosmetic annoyance by marking the node 
as obsolete, so we don't check the CRC and hence don't bitch about it 
failing. Older versions of JFFS2 did that, as does the current version in 
2.4.19-rc3. But it's harmless and expected behaviour.


--
dwmw2

* RE: jffs2_scan_eraseblock() - errors
@ 2002-07-30 23:30 Curtis, Allen
  2002-07-30 23:42 ` David Woodhouse
  0 siblings, 1 reply; 20+ messages in thread
From: Curtis, Allen @ 2002-07-30 23:30 UTC (permalink / raw)
  To: 'David Woodhouse', Curtis, Allen
  Cc: 'linux-mtd@lists.infradead.org'

> > jffs2_scan_eraseblock() - Node at 0xXXXXXX {0x1985, 0xe002,
> > 0x1985c002) has invalid CRC)
> 
> 
> It's harmless. We were in the middle of writing that node when we lost
> power, and hadn't finished writing the payload -- hence the CRC fails.
> That's sort of what the CRC is there for. 

Questions then:
1. Does this node get reclaimed? I get this same message now with each
reboot.

2. It appears that the mount operation gets slower once this error condition
occurs. Is this possible/expected?

3. If the file-system becomes corrupted, what is the failure condition and
how do you correct it? Test for it?

* Re: jffs2_scan_eraseblock() - errors
  2002-07-30 23:30 jffs2_scan_eraseblock() - errors Curtis, Allen
@ 2002-07-30 23:42 ` David Woodhouse
  0 siblings, 0 replies; 20+ messages in thread
From: David Woodhouse @ 2002-07-30 23:42 UTC (permalink / raw)
  To: Curtis, Allen; +Cc: 'linux-mtd@lists.infradead.org'

Allen.Curtis@Thales-IFS.com said:
>  Questions then: 1. Does this node get reclaimed? I get this same
> message now with each reboot.

It'll get erased when that block gets garbage-collected.

> 2. It appears that the mount operation gets slower once this error
> condition occurs. Is this possible/expected?

Definitely not expected. I can't really see how it's possible. I'd say it's 
probably a coincidence.

> 3. If the file-system becomes corrupted, what is the failure condition
> and how do you correct it? Test for it?

JFFS2 is purely log-structured. There are few ways in which the file system 
as a whole can be declared to be 'corrupted'. The only real case I can 
think of is when a directory with children appears to have no links from 
the root inode, which is obviously a bug. In that case, we just delete the 
offending directory and all its children. I suppose we should really 
re-link it to /lost+found instead :)

--
dwmw2

* RE: jffs2_scan_eraseblock() - errors
@ 2002-07-30 23:59 Curtis, Allen
  2002-07-31  0:15 ` David Woodhouse
  2002-07-31  0:18 ` David Woodhouse
  0 siblings, 2 replies; 20+ messages in thread
From: Curtis, Allen @ 2002-07-30 23:59 UTC (permalink / raw)
  To: 'David Woodhouse', Curtis, Allen
  Cc: 'linux-mtd@lists.infradead.org'

> the root inode, which is obviously a bug. In that case, we just delete the
> offending directory and all its children. I suppose we should really
> re-link it to /lost+found instead :)

Yes, disappearing files could be annoying.

Has anyone characterized the performance of the JFFS* file-systems over
time? Our initial tests indicated that JFFS2 was faster than JFFS. After
using the JFFS2 file-system and maintaining a relatively consistent
utilization there was a 4X increase in the time required to complete a mount
operation. Is this expected? Is there a worst case? See below for some
measurements.

JFFS/JFFS2 mount performance (38% usage, fresh install), times in seconds:

        real     user   sys
JFFS    16.164   0.01   16.14
JFFS2    6.47    0.01    6.45

After being used (usage = 41%):

        real
JFFS2   1m3.583  (4X increase)
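
A small worked check of the ratios in the measurements above. The helper name `slowdown()` and the constant names are hypothetical; the numbers are copied from the table. Reading the "4X" figure as a comparison against the fresh JFFS mount is an assumption, since the message does not say which baseline was used.

```c
/* Mount times in seconds, copied from the measurements above. */
static const double JFFS_FRESH_S  = 16.164;
static const double JFFS2_FRESH_S = 6.47;
static const double JFFS2_USED_S  = 63.583;   /* 1m3.583 */

/* Ratio of a later mount time to an earlier baseline. */
static double slowdown(double after_s, double before_s)
{
    return after_s / before_s;
}
```

`slowdown(JFFS2_USED_S, JFFS_FRESH_S)` comes out a little under 4, while `slowdown(JFFS2_USED_S, JFFS2_FRESH_S)` is close to 10, so against the fresh JFFS2 mount the regression is larger than the quoted figure suggests.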

* Re: jffs2_scan_eraseblock() - errors
  2002-07-30 23:59 Curtis, Allen
@ 2002-07-31  0:15 ` David Woodhouse
  2002-07-31  0:18 ` David Woodhouse
  1 sibling, 0 replies; 20+ messages in thread
From: David Woodhouse @ 2002-07-31  0:15 UTC (permalink / raw)
  To: Curtis, Allen; +Cc: 'linux-mtd@lists.infradead.org'

Allen.Curtis@Thales-IFS.com said:
> Has anyone characterized the performance of the JFFS* file-systems
> over time? Our initial tests indicated that JFFS2 was faster than
> JFFS. After using the JFFS2 file-system and maintaining a relatively
> consistent utilization there was a 4X increase in the time required to
> complete a mount operation. Is this expected? Is there a worst case?
> See below for some measurements. 

It's expected. See the TODO file for notes on how we intend to fix it. 

--
dwmw2

* Re: jffs2_scan_eraseblock() - errors
  2002-07-30 23:59 Curtis, Allen
  2002-07-31  0:15 ` David Woodhouse
@ 2002-07-31  0:18 ` David Woodhouse
  1 sibling, 0 replies; 20+ messages in thread
From: David Woodhouse @ 2002-07-31  0:18 UTC (permalink / raw)
  To: Curtis, Allen; +Cc: 'linux-mtd@lists.infradead.org'

Allen.Curtis@Thales-IFS.com said:
>  After being used, usage = 41%: JFFS2	real	1m3.583	(4X increase) 

Btw, the most useful thing you can give here is profiling information. If 
you still have INOCACHE_HASHSIZE set to 1 in include/linux/jffs2_fs_sb.h 
then I can guess what's at the top of your profile. Increase it to 14 or 
128 or something for an instant win.
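
A rough sketch of why a hash size of 1 hurts, assuming the inode cache is a chained hash table keyed by inode number. The structures and function names below (`struct ino_table`, `table_add()`, `lookup_steps()`) are hypothetical and much simpler than the real JFFS2 code; `nbuckets` merely plays the role of INOCACHE_HASHSIZE.

```c
#include <stddef.h>

#define MAX_BUCKETS 128

/* Hypothetical stand-in for an inode cache entry. */
struct ino_entry {
    unsigned ino;
    struct ino_entry *next;
};

struct ino_table {
    size_t nbuckets;                        /* role of INOCACHE_HASHSIZE */
    struct ino_entry *bucket[MAX_BUCKETS];
};

/* nbuckets must be between 1 and MAX_BUCKETS. */
static void table_init(struct ino_table *t, size_t nbuckets)
{
    t->nbuckets = nbuckets;
    for (size_t i = 0; i < MAX_BUCKETS; i++)
        t->bucket[i] = NULL;
}

/* Push the entry onto the head of its bucket's chain. */
static void table_add(struct ino_table *t, struct ino_entry *e)
{
    size_t b = e->ino % t->nbuckets;
    e->next = t->bucket[b];
    t->bucket[b] = e;
}

/* Number of chain entries inspected while looking for `ino`. */
static size_t lookup_steps(const struct ino_table *t, unsigned ino)
{
    size_t steps = 0;
    for (const struct ino_entry *e = t->bucket[ino % t->nbuckets]; e; e = e->next) {
        steps++;
        if (e->ino == ino)
            break;
    }
    return steps;
}

/* Insert inodes 1..n from `pool`, then look each one up once. */
static size_t total_lookup_cost(size_t nbuckets, struct ino_entry *pool, unsigned n)
{
    struct ino_table t;
    size_t total = 0;

    table_init(&t, nbuckets);
    for (unsigned i = 0; i < n; i++) {
        pool[i].ino = i + 1;
        table_add(&t, &pool[i]);
    }
    for (unsigned ino = 1; ino <= n; ino++)
        total += lookup_steps(&t, ino);
    return total;
}
```

With one bucket every lookup walks a single chain holding all inodes, so the total work grows quadratically with the inode count; with 128 buckets each chain is roughly 1/128 of that length.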

--
dwmw2

* RE: jffs2_scan_eraseblock() - errors
@ 2002-07-31  0:38 Curtis, Allen
  2002-07-31  0:48 ` David Woodhouse
  0 siblings, 1 reply; 20+ messages in thread
From: Curtis, Allen @ 2002-07-31  0:38 UTC (permalink / raw)
  To: 'David Woodhouse', Curtis, Allen
  Cc: 'linux-mtd@lists.infradead.org'

> you still have INOCACHE_HASHSIZE set to 1 in include/linux/jffs2_fs_sb.h
> then I can guess what's at the top of your profile. Increase it to 14 or
> 128 or something for an instant win.

THX, anything like that for JFFS? Unfortunately that is what we are shipping
now and it gets VERY slow, even with our GCD nice level modification.

* Re: jffs2_scan_eraseblock() - errors
  2002-07-31  0:38 Curtis, Allen
@ 2002-07-31  0:48 ` David Woodhouse
  0 siblings, 0 replies; 20+ messages in thread
From: David Woodhouse @ 2002-07-31  0:48 UTC (permalink / raw)
  To: Curtis, Allen; +Cc: 'linux-mtd@lists.infradead.org'

Allen.Curtis@Thales-IFS.com said:
>  THX, anything like that for JFFS? Unfortunately that is what we are
> shipping now and it gets VERY slow, even with our GCD nice level
> modification. 

Dunno. You could try increasing JFFS_HASH_SIZE but it may not be a 
bottleneck there -- it's been a very long time since I've looked seriously 
at JFFS1 code. Again, profile data are useful.

--
dwmw2

* RE: jffs2_scan_eraseblock() - errors
@ 2002-07-31  1:15 Curtis, Allen
  2002-07-31  3:07 ` David Woodhouse
  0 siblings, 1 reply; 20+ messages in thread
From: Curtis, Allen @ 2002-07-31  1:15 UTC (permalink / raw)
  To: 'David Woodhouse', Curtis, Allen
  Cc: 'linux-mtd@lists.infradead.org'

> >  After being used, usage = 41%: JFFS2 real 1m3.583 (4X increase)
>
> Btw, the most useful thing you can give here is profiling information. If
> you still have INOCACHE_HASHSIZE set to 1 in include/linux/jffs2_fs_sb.h
> then I can guess what's at the top of your profile. Increase it to 14 or
> 128 or something for an instant win.

INOCACHE_HASHSIZE was set to 1. Changed this to 128 and....

real	47.258s
sys	47.25s

THX

JFFS is already set to 40. I will try changing this some other time.

* Re: jffs2_scan_eraseblock() - errors
  2002-07-31  1:15 Curtis, Allen
@ 2002-07-31  3:07 ` David Woodhouse
  0 siblings, 0 replies; 20+ messages in thread
From: David Woodhouse @ 2002-07-31  3:07 UTC (permalink / raw)
  To: Curtis, Allen; +Cc: 'linux-mtd@lists.infradead.org'

Allen.Curtis@Thales-IFS.com said:
>  INOCACHE_HASHSIZE was set to 1. Changed this to 128 and....
> real	47.258s sys	47.25s

Hmm. Out of interest, what happens if you apply this? Again, profile data 
are useful :)

DO NOT LEAVE THIS IN. It disables the CRC32 check on nodes during mount. 
It's the first (and easy) part of the proof-of-concept which involves 
moving said CRC32 check to later so the mount doesn't have to wait for it.

Index: scan.c
===================================================================
RCS file: /home/cvs/mtd/fs/jffs2/scan.c,v
retrieving revision 1.51.2.3
diff -u -p -r1.51.2.3 scan.c
--- scan.c	25 Jul 2002 20:49:06 -0000	1.51.2.3
+++ scan.c	31 Jul 2002 03:06:44 -0000
@@ -467,7 +467,7 @@ static int jffs2_scan_inode_node(struct 
 		ri.csize = 0;
 	}
 
-	if (ri.csize) {
+	if (0 && ri.csize) {
 		/* Check data CRC too */
 		unsigned char *dbuf;
 		__u32 crc;


--
dwmw2

* Re: jffs2_scan_eraseblock() - errors
  2002-07-30 22:21 ` David Woodhouse
@ 2002-07-31 10:34   ` Jörn Engel
  2002-07-31 11:45     ` David Woodhouse
  0 siblings, 1 reply; 20+ messages in thread
From: Jörn Engel @ 2002-07-31 10:34 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-mtd

On Tue, 30 July 2002 23:21:51 +0100, David Woodhouse wrote:
> It's actually possible to avoid the cosmetic annoyance by marking the node 
> as obsolete, so we don't check the CRC and hence don't bitch about it 
> failing. Older versions of JFFS2 did that, as does the current version in 
> 2.4.19-rc3. But it's harmless and expected behaviour.

How exactly is this done? On the next mount after the failed write? By
toggling bits in the status word of the node header?

If so, this will not work with any of the shiny STMicro chips with
hardware ECC, so I would have to tackle that one too, once we merge.

Jörn

-- 
Jörn Engel
mailto: joern@wohnheim.fh-wedel.de
http://wohnheim.fh-wedel.de/~joern
Phone: +49 179 6704074

* Re: jffs2_scan_eraseblock() - errors
  2002-07-31 10:34   ` Jörn Engel
@ 2002-07-31 11:45     ` David Woodhouse
  2002-07-31 11:57       ` Jörn Engel
  0 siblings, 1 reply; 20+ messages in thread
From: David Woodhouse @ 2002-07-31 11:45 UTC (permalink / raw)
  To: Jörn Engel; +Cc: linux-mtd

joern@wohnheim.fh-wedel.de said:
>  How exactly is this done? On the next mount after the failed write?
> By toggling bits in the status word of the node header?

By marking it obsolete. It's an optimisation.

> If so, this will not work with any of the shiny STMicro chips with
> hardware ECC, so I would have to tackle that one too, once we merge. 

Doesn't matter. Our other plans for deferring CRC checking will avoid the 
harmless complaint.


--
dwmw2

* Re: jffs2_scan_eraseblock() - errors
  2002-07-31 11:45     ` David Woodhouse
@ 2002-07-31 11:57       ` Jörn Engel
  2002-07-31 11:59         ` David Woodhouse
  0 siblings, 1 reply; 20+ messages in thread
From: Jörn Engel @ 2002-07-31 11:57 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-mtd

On Wed, 31 July 2002 12:45:52 +0100, David Woodhouse wrote:
> joern@wohnheim.fh-wedel.de said:
> >  How exactly is this done? On the next mount after the failed write?
> > By toggling bits in the status word of the node header?
> 
> By marking it obsolete. It's an optimisation.

And this marking as obsolete is done how? Does it involve writes to
the flash node, after it has already been written?

Sorry for me being clueless.

> > If so, this will not work with any of the shiny STMicro chips with
> > hardware ECC, so I would have to tackle that one too, once we merge. 
> 
> Doesn't matter. Our other plans for deferring CRC checking will avoid the 
> harmless complaint.

Ok, I can live with that.

Jörn

-- 
When in doubt, use brute force.
-- Ken Thompson

* Re: jffs2_scan_eraseblock() - errors
  2002-07-31 11:57       ` Jörn Engel
@ 2002-07-31 11:59         ` David Woodhouse
  2002-07-31 12:16           ` Jörn Engel
  0 siblings, 1 reply; 20+ messages in thread
From: David Woodhouse @ 2002-07-31 11:59 UTC (permalink / raw)
  To: Jörn Engel; +Cc: linux-mtd

joern@wohnheim.fh-wedel.de said:
>  And this marking as obsolete is done how? Does it involve writes to
> the flash node, after it has already been written?

> Sorry for me being clueless. 

Yep. There's a bit in the node type field which we clear to mark the node 
obsolete. And I've been very strict about making sure it's an optimisation 
_only_, and we never actually rely on being able to do it.
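
A minimal sketch of clearing a bit in place, assuming NOR-style flash where programming without an erase can only turn 1 bits into 0 bits. The flag value 0x2000 and the names `DEMO_NODE_ACCURATE`, `flash_program16()`, `mark_obsolete()` and `is_obsolete()` are assumptions made for illustration; 0xe002 is the node type value from the log line earlier in the thread.

```c
#include <stdint.h>

/* Assumed for illustration: the flag whose clearing marks a node obsolete. */
#define DEMO_NODE_ACCURATE 0x2000u

/* Programming without an erase can only clear bits, so the cell ends up
 * holding the AND of its old contents and the value written. */
static uint16_t flash_program16(uint16_t old_cell, uint16_t new_value)
{
    return (uint16_t)(old_cell & new_value);
}

/* Rewrite the node type in place with the flag cleared. */
static uint16_t mark_obsolete(uint16_t nodetype_on_flash)
{
    uint16_t wanted = (uint16_t)(nodetype_on_flash & ~DEMO_NODE_ACCURATE);
    return flash_program16(nodetype_on_flash, wanted);
}

static int is_obsolete(uint16_t nodetype)
{
    return !(nodetype & DEMO_NODE_ACCURATE);
}
```

Because the change only clears a bit, it needs no erase cycle; the same property means the bit cannot be set again short of erasing the whole block.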

Deletion of directory entries, for example, could perhaps have been done 
just by marking the original as obsolete, but instead we do it by writing a 
new dirent with the same name and inode #0. 
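
A sketch of deletion by appending, assuming the log is replayed in write order and the newest record for a name wins. `struct demo_dirent` and `resolve_name()` are hypothetical; ordering by array position stands in for whatever ordering the real on-flash format uses.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical log record: inode number 0 acts as the deletion marker. */
struct demo_dirent {
    const char *name;
    unsigned ino;
};

/* Replay the log oldest to newest; the last record seen for `name` wins.
 * Returns the inode number, or 0 if the name is absent or deleted. */
static unsigned resolve_name(const struct demo_dirent *log, size_t n,
                             const char *name)
{
    unsigned ino = 0;
    for (size_t i = 0; i < n; i++)
        if (strcmp(log[i].name, name) == 0)
            ino = log[i].ino;
    return ino;
}
```

The older record never has to be modified for the deletion to take effect, which is why the scheme keeps working on devices where written data cannot be touched again.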

--
dwmw2

* Re: jffs2_scan_eraseblock() - errors
  2002-07-31 11:59         ` David Woodhouse
@ 2002-07-31 12:16           ` Jörn Engel
  2002-07-31 12:17             ` David Woodhouse
  0 siblings, 1 reply; 20+ messages in thread
From: Jörn Engel @ 2002-07-31 12:16 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-mtd

On Wed, 31 July 2002 12:59:09 +0100, David Woodhouse wrote:
> Yep. There's a bit in the node type field which we clear to mark the node 
> obsolete. And I've been very strict about making sure it's an optimisation 
> _only_, and we never actually rely on being able to do it.

Ok, then the stuff should be caught by the command set. But some tests
don't hurt either.
Very wise of you, btw.

Jörn

-- 
Good warriors cause others to come to them and do not go to others.
-- Sun Tzu

* Re: jffs2_scan_eraseblock() - errors
  2002-07-31 12:16           ` Jörn Engel
@ 2002-07-31 12:17             ` David Woodhouse
  2002-07-31 13:07               ` Jörn Engel
  0 siblings, 1 reply; 20+ messages in thread
From: David Woodhouse @ 2002-07-31 12:17 UTC (permalink / raw)
  To: Jörn Engel; +Cc: linux-mtd

joern@wohnheim.fh-wedel.de said:
>  Ok, then the stuff should be caught by the command set. But some
> tests don't hurt either.

Should be caught by jffs2_can_mark_obsolete() which checks the device type.
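
A guess at the shape of such a guard, not the real jffs2_can_mark_obsolete(), whose body is not shown in this thread. The device classes, the 0x2000 flag value and both function names are hypothetical.

```c
/* Hypothetical device classes for illustration. */
enum demo_flash_type { DEMO_FLASH_NOR, DEMO_FLASH_ECC };

/* Assumed rule: only plain NOR allows rewriting an already written word. */
static int demo_can_mark_obsolete(enum demo_flash_type type)
{
    return type == DEMO_FLASH_NOR;
}

/* Try the optimisation. Returns 1 if the flag was cleared, 0 if skipped;
 * callers must behave correctly either way. */
static int demo_try_mark_obsolete(enum demo_flash_type type,
                                  unsigned short *nodetype)
{
    if (!demo_can_mark_obsolete(type))
        return 0;
    *nodetype = (unsigned short)(*nodetype & ~0x2000u);
    return 1;
}
```

Keeping the check in one place means the in-place rewrite is simply skipped on devices that cannot support it, consistent with treating it as an optimisation only.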

--
dwmw2

* Re: jffs2_scan_eraseblock() - errors
  2002-07-31 12:17             ` David Woodhouse
@ 2002-07-31 13:07               ` Jörn Engel
  0 siblings, 0 replies; 20+ messages in thread
From: Jörn Engel @ 2002-07-31 13:07 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-mtd

On Wed, 31 July 2002 13:17:46 +0100, David Woodhouse wrote:
> Should be caught by jffs2_can_mark_obsolete() which checks the device type.

True. Ok, never mind.

Jörn

-- 
Invincibility is in oneself, vulnerability is in the opponent.
-- Sun Tzu

* RE: jffs2_scan_eraseblock() - errors
@ 2002-07-31 23:02 Curtis, Allen
  2002-08-01 10:44 ` David Woodhouse
  0 siblings, 1 reply; 20+ messages in thread
From: Curtis, Allen @ 2002-07-31 23:02 UTC (permalink / raw)
  To: 'David Woodhouse', Curtis, Allen
  Cc: 'linux-mtd@lists.infradead.org'

> >  INOCACHE_HASHSIZE was set to 1. Changed this to 128 and....
> > real	47.258s sys	47.25s
> 
> It's the first (and easy) part of the proof-of-concept which involves
> moving said CRC32 check to later so the mount doesn't have to wait for it.

With the CRC patch:
real	36.7s

PS: The version of scan.c we are using is 1.57. The patch note was for
1.51.2.3.

* Re: jffs2_scan_eraseblock() - errors
  2002-07-31 23:02 Curtis, Allen
@ 2002-08-01 10:44 ` David Woodhouse
  0 siblings, 0 replies; 20+ messages in thread
From: David Woodhouse @ 2002-08-01 10:44 UTC (permalink / raw)
  To: Curtis, Allen; +Cc: jffs-dev

(Moved to JFFS list).

Allen.Curtis@Thales-IFS.com said:
> > real	47.258s

>  With the CRC patch:
>  real	36.7s

OK, that's respectable. I just need to finish the other part of the change 
-- to actually check the CRCs later on, rather than just leaving them 
unchecked. Make sure you take the patch back out of your tree now :)
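
One way the "check later" half could look, as a sketch only: the scan records each node as unchecked, and the CRC is computed the first time the node is actually needed. All names here (`struct demo_node_ref`, `scan_node()`, `node_usable()`, `crc_calls`) are hypothetical, and the bitwise CRC-32 is a generic one rather than JFFS2's own routine.

```c
#include <stddef.h>
#include <stdint.h>

static unsigned long crc_calls;   /* counts how often a CRC is computed */

/* Generic bitwise CRC-32 (reflected polynomial 0xEDB88320), illustration only. */
static uint32_t crc32_simple(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    crc_calls++;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

enum crc_state { CRC_UNCHECKED, CRC_GOOD, CRC_BAD };

struct demo_node_ref {
    const uint8_t *data;
    size_t len;
    uint32_t stored_crc;
    enum crc_state state;
};

/* Mount-time scan: remember where the node is, but skip the payload CRC. */
static void scan_node(struct demo_node_ref *ref, const uint8_t *data,
                      size_t len, uint32_t stored_crc)
{
    ref->data = data;
    ref->len = len;
    ref->stored_crc = stored_crc;
    ref->state = CRC_UNCHECKED;
}

/* First real use: verify once, then cache the verdict. */
static int node_usable(struct demo_node_ref *ref)
{
    if (ref->state == CRC_UNCHECKED)
        ref->state = (crc32_simple(ref->data, ref->len) == ref->stored_crc)
                         ? CRC_GOOD : CRC_BAD;
    return ref->state == CRC_GOOD;
}
```

The mount path then pays nothing for payload CRCs, and each node is verified at most once, whenever it is first read or garbage-collected.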

We may get a little bit more of a speedup when we do that too, as we'll also
stop building up the fragment lists for every inode on mount, but in fact
that code wasn't showing up very high on the profile last time I looked.

> PS: The version of scan.c we are using is 1.57. The patch note was for
> 1.51.2.3.

Doesn't matter; it's basically the same in both branches. 

I suspect that if you profile it now you'll find the most time is taken in 
your flash map driver's copy_from routine or memcpy. 

I need to implement the XIP scheme which will also allow JFFS2 to directly
access the flash through a kind of pageable ioremap (which handles the times
when the flash is in a mode other than read mode correctly) instead of 
having to read into a buffer and work from that. We get to use caches and 
burst reads from flash at that point too.

--
dwmw2
