jffs2 robustness against powerfailure

linuxppc-dev.lists.ozlabs.org archive mirror

* jffs2 robustness against powerfailure
@ 2005-10-14  9:35 David Jander
  2005-10-14 14:33 ` Mark Chambers
  2005-10-17 12:37 ` David Woodhouse
  0 siblings, 2 replies; 6+ messages in thread
From: David Jander @ 2005-10-14  9:35 UTC (permalink / raw)
  To: linux-mtd; +Cc: linuxppc-embedded

Hi,

We have a custom embedded Linux board, based on an MPC852T processor, running 
2.4.25 kernel from denx. Jffs2 has certain backported patches after cvs from 
03/2005.
I wanted to stress-test the flash using jffs2 and the "checkfs" 
tool which comes as part of the jffs2 sources. I set up a "power-cycle-box" as 
described in the README and started logging everything the system produced.
Since jffs2 claims to be robust against power-failures I set the threshold for 
maximum number of corrupt files allowed to 0. The test procedure rewrites all 
testfiles using a single write() call for each file, so that should be ok.
After 279 power-cycles, it stopped with a CRC error in "file13". Of course 
"file13" was the one being written to when power was cut off the last time.

Question: Is this a known shortcoming of jffs2, or must I assume that my 
hardware is broken?

The latter is relatively unlikely, as I will try to explain from the contents 
of the file:

diskles9:/flash # hexdump file13
0000000 0000 0300 0000 036d 0000 0942 0000 20b0
0000010 0000 08dd 0000 0715 0000 1da1 0000 043c
0000020 0000 05c2 0000 228d 0000 10ad 0000 1c35
...
00002e0 0000 14f1 0000 0d94 0000 1911 0000 12dd
00002f0 0000 09e9 0000 0686 0000 2380 0000 2294
0000300 0000 18f1 0000 01be 0000 25bb 0000 1af9
0000310 0000 1b94 0000 02b0 0000 2511 0000 1f79
0000320 0000 1f97 0000 0b53 0000 1eb7 0000 10bb
0000330 0000 2529 0000 2130 0000 0361 0000 0ff8
0000340 0000 1428 0000 10ab 0000 0364 0000 1b89
0000350 b110

As one can easily see, the first int (0x00000300) indicates the file-length, 
after which the 16-bit CRC should be placed. At offset 0000300 in the file 
there seems to be just more random data (a CRC of 0x0000 is unlikely and 
known wrong in this case).
At the end of the file (offset 0x0000350) there is something that looks more 
like a checksum.
Apparently the previous file was 0x0352 bytes long and the new file was going 
to be 0x0302 bytes long, but was never written completely. 
How come I get to see a valid file containing a mix of old and new data if 
it was written with a single write() call?
Shouldn't jffs2 throw away the new incomplete node and keep the old version of 
the file?
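
The layout described above can be sketched in a few lines of Python. This is
only an illustration: the function name is hypothetical, big-endian byte order
is assumed (the board is PowerPC), and the checksum algorithm used by checkfs
is not given in this message, so the sketch locates the checksum field without
verifying it.

```python
import struct

def describe_layout(data: bytes) -> dict:
    """Interpret a test file as: 32-bit length, payload, 16-bit checksum.

    Assumption taken from the message: the length field counts every byte
    that precedes the checksum, including the length field itself.
    """
    if len(data) < 6:
        raise ValueError("too short to hold a length field and a checksum")
    declared = struct.unpack_from(">I", data, 0)[0]
    expected_size = declared + 2  # payload plus the 16-bit checksum
    info = {
        "declared_length": declared,
        "expected_size": expected_size,
        "actual_size": len(data),
        "excess_bytes": len(data) - expected_size,
        "checksum_field": None,
    }
    if expected_size <= len(data):
        # Whatever sits where the checksum should be; it is not validated here.
        info["checksum_field"] = struct.unpack_from(">H", data, declared)[0]
    return info
```

For a 0x352-byte file whose header says 0x300, this reports an expected size
of 0x302 and 0x50 excess bytes, which matches the numbers in the dump.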

Can anyone explain what happened here??

Greetings,

-- 
David Jander

* Re: jffs2 robustness against powerfailure
  2005-10-14  9:35 jffs2 robustness against powerfailure David Jander
@ 2005-10-14 14:33 ` Mark Chambers
  2005-10-17  6:42   ` David Jander
  2005-10-17 12:37 ` David Woodhouse
  1 sibling, 1 reply; 6+ messages in thread
From: Mark Chambers @ 2005-10-14 14:33 UTC (permalink / raw)
  To: David Jander; +Cc: linuxppc-embedded

Well, I can tell you this, from bitter experience: chips do strange stuff
when power is coming or going. One thing that can happen is that addresses
get messed up, so writes go to the wrong place. You say your hardware is
good, but it may not have been thoroughly characterized for power-down
behavior. Probably the same chip that generates a power-up reset also
generates a reset when power is falling; check whether the trip voltage is
high enough.

You could rule out a power problem by running your tests so that you reset
the processor (shorting hreset or poreset somewhere) without power-cycling
the board, and see if the failures are the same.

Just my $.02,
Mark Chambers

* Re: jffs2 robustness against powerfailure
  2005-10-14 14:33 ` Mark Chambers
@ 2005-10-17  6:42   ` David Jander
  0 siblings, 0 replies; 6+ messages in thread
From: David Jander @ 2005-10-17  6:42 UTC (permalink / raw)
  To: linuxppc-embedded

On Friday 14 October 2005 16:33, Mark Chambers wrote:
>[...]
> Well, I can tell you this, from bitter experience:  Chips do strange stuff
> when power is coming or going.  One thing that can happen is addresses get
> messed up, so writes go to the wrong place.  You say your hardware is good,
> but it may not have been thoroughly characterized for power-down behavior.
> Probably the same chip that generates a power-up reset generates a reset
> when power is falling, check if the trip voltage is high enough.

There's a hardware watchdog which monitors both power-supplies and asserts 
reset in case of failure. It's reliable and it works.
But none of that matters. You seem to overlook three facts:
1.- The file being written to at the moment of power failure is always the 
file that has a CRC failure (if that happens) afterwards, not other files. So 
"writes go to the wrong place" is quite unlikely.
2.- If a piece of flash gets corrupted, there's always jffs2's CRC that 
should trip and detect that block as invalid.
3.- If there really were writes to the wrong place, I'd expect that to be 
noticeable by looking at the files. There is random data being written to the 
files, but fortunately not that random: It's all 32-bit integers from 
0...10000. That makes the chances of corrupt random data, or valid data 
written to the wrong place not being noticed actually quite small!
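
Point 3 can be turned into a mechanical check. The sketch below is an
assumption-laden illustration, not part of checkfs: the function name is
made up, the words are taken to be big-endian, and the upper bound of 10000
is treated as inclusive.

```python
import struct

MAX_VALUE = 10000  # bound stated above; treated as inclusive here

def out_of_range_words(payload: bytes, limit: int = MAX_VALUE) -> list:
    """Return (byte_offset, value) for each 32-bit word larger than limit."""
    usable = len(payload) - len(payload) % 4  # ignore a trailing partial word
    bad = []
    for offset in range(0, usable, 4):
        value = struct.unpack_from(">I", payload, offset)[0]
        if value > limit:
            bad.append((offset, value))
    return bad
```

Every word visible in the hexdump of file13 passes this check, which is why
the contents look like valid test data rather than garbage. Bytes read back
from erased flash (typically all 0xFF) would fail it.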

It looks to me very much like a jffs2 bug or design flaw, maybe a 
race-condition, but since I don't know jffs2 internals that much, I can't 
tell for sure. Isn't this a known issue?

> You could rule a power problem out by running your tests where you reset
> the processor (shorting hreset or poreset somewhere) but not power-cycling
> the board, and see if the failures are the same.

I could do that, but I fear it will give the same results.
OK, for science's sake I'll do that experiment.

Sincerely,

-- 
David Jander

* Re: jffs2 robustness against powerfailure
  2005-10-14  9:35 jffs2 robustness against powerfailure David Jander
  2005-10-14 14:33 ` Mark Chambers
@ 2005-10-17 12:37 ` David Woodhouse
  2005-10-19  8:10   ` David Jander
  1 sibling, 1 reply; 6+ messages in thread
From: David Woodhouse @ 2005-10-17 12:37 UTC (permalink / raw)
  To: David Jander; +Cc: linux-mtd, linuxppc-embedded

On Fri, 2005-10-14 at 11:35 +0200, David Jander wrote:
> We have a custom embedded Linux board, based on an MPC852T processor, running 
> 2.4.25 kernel from denx. Jffs2 has certain backported patches after cvs from 
> 03/2005.

That sounds like a recipe for pain. March 2005 wasn't a good time to
take a snapshot from CVS; that just happens to be the time that we
stopped bothering to make it build in obsolete kernels.

If you want _stable_ JFFS2 code, you should use the code which is in the
2.4.31 kernel, or use the code which is in the 2.6 kernel (perhaps
updated from current CVS). 

> How come I get to see a valid file containing a mix of old and new
> data if it was written with a single write() call?

Linux doesn't guarantee atomicity of writes larger than a single page,
but since your case is smaller than a page, it should have been atomic.

> Shouldn't jffs2 throw away the new incomplete node and keep the old
> version of the file?

Yes, it should. It's acceptable that there are extra data in the file
after 0x300 bytes, because the test program first does a write() call
and then a subsequent truncate() call. But it's not expected that the
0x300-byte write was not atomic; except in certain circumstances (like
reaching the end of an eraseblock and writing a smaller node there) you
should have seen all of it, or none. 
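
The write-then-truncate sequence can be illustrated on any ordinary
filesystem. This is a minimal sketch, not the checkfs source: the helper name
and the stop_before_truncate flag are hypothetical, and the flag merely stands
in for a power cut between the two calls. It shows only why old data past the
new length is acceptable; it says nothing about a partially written node.

```python
import os
import tempfile

def rewrite(path: str, new_data: bytes, stop_before_truncate: bool = False) -> None:
    """Overwrite the start of the file, then cut it to the new length."""
    fd = os.open(path, os.O_WRONLY)
    try:
        os.write(fd, new_data)           # step 1: new contents over the old ones
        if stop_before_truncate:
            return                       # stands in for power loss between steps
        os.ftruncate(fd, len(new_data))  # step 2: drop the tail of the old file
    finally:
        os.close(fd)

# Old file of 0x352 bytes, new contents of 0x302 bytes, as in the report.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file13")
    with open(path, "wb") as f:
        f.write(b"O" * 0x352)
    rewrite(path, b"N" * 0x302, stop_before_truncate=True)
    with open(path, "rb") as f:
        interrupted = f.read()           # new data followed by 0x50 old bytes
    rewrite(path, b"N" * 0x302)
    with open(path, "rb") as f:
        completed = f.read()             # exactly the new data
```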

Please could you reproduce on a sane kernel and show the output of the
checkfs program during your test just before the power down, and also if
possible take an image of the contents of the flash _before_ mounting it
again after the power cycle. I'd like to see precisely the log nodes
which were present on the flash. If it's difficult to take a snapshot
before remounting, then running with CONFIG_JFFS2_FS_DEBUG=1 and
capturing all the KERN_DEBUG output via a serial console would suffice.

-- 
dwmw2

* Re: jffs2 robustness against powerfailure
  2005-10-17 12:37 ` David Woodhouse
@ 2005-10-19  8:10   ` David Jander
  2005-10-19  9:50     ` David Woodhouse
  0 siblings, 1 reply; 6+ messages in thread
From: David Jander @ 2005-10-19  8:10 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-mtd, linuxppc-embedded

On Monday 17 October 2005 14:37, David Woodhouse wrote:
> On Fri, 2005-10-14 at 11:35 +0200, David Jander wrote:
> > We have a custom embedded Linux board, based on an MPC852T processor,
> > running 2.4.25 kernel from denx. Jffs2 has certain backported patches
> > after cvs from 03/2005.
>
> That sounds like a recipe for pain. March 2005 wasn't a good time to
> take a snapshot from CVS; that just happens to be the time that we
> stopped bothering to make it build in obsolete kernels.

That's why I posted to the linuxppc-embedded list: I know there are quite a 
few people using the same version (denx CVS kernel) who might also have had 
issues of this kind, although I mostly hear that it seems pretty stable and 
doesn't give problems.

> If you want _stable_ JFFS2 code, you should use the code which is in the
> 2.4.31 kernel, or use the code which is in the 2.6 kernel (perhaps
> updated from current CVS).

2.6 is not an option yet for the mpc8xx architecture, so I'll have to stick 
with either what I have now or 2.4.31, but I fear the tradeoff of using 
vanilla 2.4.31 jffs2 will be a much slower fs, prohibitively long mount 
times, etc. Am I right?

>[...]
> Please could you reproduce on a sane kernel and show the output of the
> checkfs program during your test just before the power down, and also if
> possible take an image of the contents of the flash _before_ mounting it
> again after the power cycle. I'd like to see precisely the log nodes
> which were present on the flash. If it's difficult to take a snapshot
> before remounting, then running with CONFIG_JFFS2_FS_DEBUG=1 and
> capturing all the KERN_DEBUG output via a serial console would suffice.

I am still busy doing experiments, please have a little patience.
So far I have turned on debug info in the same kernel as before, and I get 
literally tons of log info. My monitor script had a bug, so the board was 
reset a little too soon on several occasions (that shouldn't do any harm, 
should it?), and now I have a jffs2 image which produces a BUG() in gc.c 
line 139 when the system boots. This is not what I am looking for right now, 
and I still have to rule out the possibility that it happened due to other 
problems (RAM issues, etc). Once I finish sorting this out, I'd be glad to 
send you a few megabytes of debug output along with a "broken" jffs2 image if 
you like. Actually I'd be very grateful if you could take some time to look 
at it and give me your opinion, because I am still slightly clueless about 
jffs2.

Greetings,

-- 
David Jander

* Re: jffs2 robustness against powerfailure
  2005-10-19  8:10   ` David Jander
@ 2005-10-19  9:50     ` David Woodhouse
  0 siblings, 0 replies; 6+ messages in thread
From: David Woodhouse @ 2005-10-19  9:50 UTC (permalink / raw)
  To: David Jander; +Cc: linux-mtd, linuxppc-embedded

On Wed, 2005-10-19 at 10:10 +0200, David Jander wrote:
> 2.6 is not an option yet for the mpc8xx architecture, so I'll have to stick 
> with either what I have now or 2.4.31, but I fear the tradeoff of using 
> vanilla 2.4.31 jffs2 will be a much slower fs, prohibitively long mount 
> times, etc. Am I right?

If it's all running perfectly for you and you have no work to do, then
yes, perhaps you're right. But since that's evidently _not_ the case,
then no, I would disagree.

If I were you, the first thing I'd do would be to get a current kernel
working. It should only take a week or so -- porting from 2.4 to 2.6
really isn't that difficult.

> I am still busy doing experiments, please have a little patience.
> So far I have turned on debug info in the same kernel as before, and I get 
> literally tons of log info. My monitor script had a bug, so the board was 
> reset a little too soon on several occasions (that shouldn't do any harm, 
> should it?), and now I have a jffs2 image which produces a BUG() in gc.c 
> line 139 when the system boots.

That should never happen, regardless of when the board is reset.
Assuming it still happens with JFFS2 code I care about (either 2.4 or
2.6), please could I have a copy of this image?

The problem you first reported doesn't seem too worrying to me. Writes
aren't always atomic -- in fact the Linux VFS¹ _guarantees_ that writes
larger than a page are _not_ atomic, because it splits pages up to call
prepare_write() and commit_write() on each one.

JFFS2 will mostly write each page out in a single node, but when there
is only a small amount of space at the end of an eraseblock it will
split writes still further, filling the eraseblock with as much data as
possible before writing the remainder of the page into a new eraseblock.
I suspect that's what happened in the case you showed. 
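
The two levels of splitting can be sketched as follows. This is a simplified
model and not JFFS2 code: the function names are invented, a 4096-byte page
is assumed, and node headers and minimum node sizes are ignored.

```python
PAGE_SIZE = 4096  # assumed page size; not stated in this thread

def split_by_page(offset: int, length: int, page_size: int = PAGE_SIZE) -> list:
    """Split a write into (offset, length) pieces that never cross a page."""
    pieces = []
    end = offset + length
    while offset < end:
        chunk = min(end - offset, page_size - offset % page_size)
        pieces.append((offset, chunk))
        offset += chunk
    return pieces

def split_by_eraseblock_space(pieces: list, space_left: int,
                              block_capacity: int) -> list:
    """Split pieces again wherever the current eraseblock runs out of room."""
    if block_capacity <= 0:
        raise ValueError("block_capacity must be positive")
    nodes = []
    for offset, length in pieces:
        while length > 0:
            if space_left == 0:
                space_left = block_capacity  # continue in a fresh eraseblock
            chunk = min(length, space_left)
            nodes.append((offset, chunk))
            offset += chunk
            length -= chunk
            space_left -= chunk
    return nodes
```

A 0x302-byte write at offset 0 stays in one piece at the page level, but with
(say) only 0x200 bytes left in the current eraseblock it becomes two nodes of
0x200 and 0x102 bytes. A power cut between those two nodes would leave a file
that is part new and part old.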

-- 
dwmw2

¹ Assuming you use generic_file_write()
