public inbox for linux-mtd@lists.infradead.org
* JFFS2 on dataflash problem
@ 2005-02-02 15:00 Creech, Matthew
  2005-02-02 21:32 ` Ulf Samuelsson
  2005-02-02 22:22 ` Thomas Gleixner
  0 siblings, 2 replies; 3+ messages in thread
From: Creech, Matthew @ 2005-02-02 15:00 UTC
  To: linux-mtd, jffs-dev; +Cc: linux-arm-kernel

Hi,

Please forgive the cross-posting, but I'm not sure where exactly to go
with this problem.

I have an embedded device based on Atmel's AT91RM9200DK board, which is
using serial dataflash (AT45DB642).  I've allocated a JFFS2 partition to
store non-volatile data.  In testing I stumbled across a particular
problem that only occurs after heavy hammering on our device, but is
fairly consistent in how and when it occurs.  The pattern has been
narrowed down so that a script doing something like this:

while [ 1 ]; do
   cp /mnt/jffs2/$RANDOM_FILE /mnt/jffs2/$BLAH
   # File size has been tested between 8K and 64K
done

makes the problem occur within 24 to 36 hours.  So something about
copying one file over another one breaks things.  The "problem" here is
that every I/O operation having to do with the JFFS2 partition blocks
indefinitely.  For example, after running the test for 2 days, you can
log into the device and try to "ls" the contents of /mnt/jffs2, and your
shell will hang.  You can then log in on another terminal, but you'll get
another hang if you try to have any interaction with the JFFS2
partition.  So everything else seems to function normally, but JFFS2
just dies.  Also note that rebooting the device sometimes fixes things
right up (JFFS2 mounts fine and works properly as if nothing happened at
all), but sometimes the filesystem image is corrupt and refuses to
mount.

The system's specs are as follows:
AT91RM9200
AT45DB642 8MB serial dataflash device on SPI channel
2.4.25 kernel with VRS2 patchset, plus...
Andrew Victor's AT91 patchset (http://maxim.org.za/AT91RM9200/), plus...
Various snapshots of MTD taken over the past several months (no change)
Snapgear-3.2.0 userland (doubt this makes a difference)

As noted above, I've tried this with various MTD snapshots (using the
default MTD in 2.4.25 makes JFFS2 die almost immediately when doing
anything).  I've also recently compiled 2.6.10 with the AT91 patchset
(http://maxim.org.za/AT91RM9200/2.6/), but the *exact* same thing
happens.

The only kernel output I get is a few repetitions of this message, just
before the problem begins:

Node totlen on flash (0xffffffff) != totlen from node ref ([some
close-to-zero number])

My testing using raw dataflash access seems to rule out dataflash
issues, and _suggest_ that MTD isn't directly to blame, since no errors
occur when copying images to /dev/mtd/X then reading them back.  But I
can't rule anything out for sure.  It seems more likely that this is
some strange interaction between dataflash, MTD, and JFFS2, possibly
related to this device's 1056-byte blocksize; IIRC this was a problem in
the past that required some patching, since most code assumes a
power-of-two blocksize.
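
To illustrate why a 1056-byte page can trip up code written for power-of-two sizes, here is a minimal sketch in shell arithmetic.  The helper names (`page_of`, `byte_in_page`, `masked_byte_in_page`) are made up for this example and do not come from MTD or JFFS2; the last one only mimics the kind of masking shortcut that is valid for power-of-two sizes.

```shell
#!/bin/sh
# Illustration only: address arithmetic for a 1056-byte page.
PAGE_SIZE=1056

# Correct for any page size: divide and take the remainder.
page_of()      { echo $(( $1 / PAGE_SIZE )); }
byte_in_page() { echo $(( $1 % PAGE_SIZE )); }

# Shortcut that is only valid when PAGE_SIZE is a power of two.
# With 1056 it silently yields a wrong in-page offset.
masked_byte_in_page() { echo $(( $1 & (PAGE_SIZE - 1) )); }
```

For offset 1056, which is the first byte of page 1, the remainder form gives an in-page offset of 0, while the masked form gives 1024.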

These are all just guesses, though, which is why I'm posting this
message.  I'm wondering whether there are any other ideas I can try to
narrow this problem down; since testing it requires 1-2 days, it's
fruitless to just make random guesses and see if they "fix" things.  Any
suggestions you have are greatly appreciated!

Thanks for the help

-- 
Matthew L. Creech 


* Re: JFFS2 on dataflash problem
From: Ulf Samuelsson @ 2005-02-02 21:32 UTC
  To: Creech, Matthew, linux-mtd, jffs-dev, linux-arm-kernel

> I have an embedded device based on Atmel's AT91RM9200DK board, which is
> using serial dataflash (AT45DB642).  I've allocated a JFFS2 partition to
> store non-volatile data.  In testing I stumbled across a particular
> problem that only occurs after heavy hammering on our device, but is
> fairly consistent in how and when it occurs.  The pattern has been
> narrowed down so that a script doing something like this:

> while [ 1 ]; do
>   cp /mnt/jffs2/$RANDOM_FILE /mnt/jffs2/$BLAH
>   # File size has been tested between 8K and 64K
> done

If I wrote this script I would call it:

wear_out_dataflash_quickly.sh

The number of erase cycles in the dataflash is limited (as in any flash, to
be correct).
You can expect to reprogram it 50,000-100,000 times before the first errors
occur.
The second thing is that you need to do a block erase once the sum
of erases inside an 8-page block exceeds 10,000.
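
For a rough feel of how quickly such a loop uses up program cycles, the arithmetic can be sketched as below.  Everything in it is an assumption for illustration: the copy rate, the partition size in pages, perfectly even wear levelling, and the absence of compression or garbage-collection overhead.  `per_page_cycles` is a hypothetical helper, not an existing tool.

```shell
#!/bin/sh
# Rough wear estimate; every input is a placeholder for measured values.
PAGE_BYTES=1056   # dataflash page size mentioned in this thread

# per_page_cycles <copies_per_hour> <file_bytes> <hours> <partition_pages>
# Prints the average number of program cycles per page, assuming writes are
# spread evenly over the partition (ideal wear levelling, no GC overhead).
per_page_cycles() {
    pages_per_copy=$(( ($2 + PAGE_BYTES - 1) / PAGE_BYTES ))   # round up
    total_programs=$(( $1 * $3 * pages_per_copy ))
    echo $(( total_programs / $4 ))
}

# Example with made-up numbers: one 64 KiB copy per second for 36 hours
# into a 4096-page partition.
per_page_cycles 3600 65536 36 4096
```

The result can be compared against the endurance figures above once the placeholders are replaced with measured values.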

I am not at all sure that the MTD drivers/JFFS2 handle this (I did not look at
the code).
I assume that JFFS2 may be able to detect a bad write and map
the block out from time to time, which could explain why you can sometimes
recover.

If you really want to test the dataflash, write a CRC into the extra bytes
available (the page is 1024 + 32 bytes), then read the page back and check
the CRC.
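
A minimal sketch of that idea, using ordinary files as a stand-in for the device: `make_page` and `check_page` are hypothetical helpers, the CRC is whatever `cksum` produces, and storing it as space-padded ASCII in the 32 trailing bytes is an arbitrary choice for this example.  Writing the page image to the device and reading it back are left out, and the data file is assumed to hold at least 1024 bytes.

```shell
#!/bin/sh
# Illustration only: pack 1024 data bytes plus a CRC into a 1056-byte page image.
DATA_BYTES=1024
SPARE_BYTES=32

# make_page <data-file> <page-file>
# Writes the first 1024 bytes of the data file, followed by their cksum CRC
# as ASCII digits padded with spaces to 32 bytes.
make_page() {
    crc=$(head -c "$DATA_BYTES" "$1" | cksum | cut -d' ' -f1)
    {
        head -c "$DATA_BYTES" "$1"
        printf "%-${SPARE_BYTES}s" "$crc"
    } > "$2"
}

# check_page <page-file>
# Returns 0 if the CRC stored in the trailing 32 bytes matches the data.
check_page() {
    stored=$(tail -c "$SPARE_BYTES" "$1" | tr -d ' ')
    actual=$(head -c "$DATA_BYTES" "$1" | cksum | cut -d' ' -f1)
    [ "$stored" = "$actual" ]
}
```

Running `check_page` on every page read back from the device would flag pages whose contents no longer match what was written.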

Best Regards,
Ulf Samuelsson
ulf@a-t-m-e-l.com


* Re: JFFS2 on dataflash problem
From: Thomas Gleixner @ 2005-02-02 22:22 UTC
  To: Creech, Matthew; +Cc: linux-mtd

On Wed, 2005-02-02 at 10:00 -0500, Creech, Matthew wrote:
> Hi,
> 
> makes the problem occur within 24 to 36 hours.  So something about
> copying one file over another one breaks things.  The "problem" here is
> that every I/O operation having to do with the JFFS2 partition blocks
> indefinitely.  For example, after running the test for 2 days, you can
> log into the device and try to "ls" the contents of /mnt/jffs2, and your
> shell will hang.  You can then log in on another terminal, but you'll get
> another hang if you try to have any interaction with the JFFS2
> partition.  So everything else seems to function normally, but JFFS2
> just dies.  Also note that rebooting the device sometimes fixes things
> right up (JFFS2 mounts fine and works properly as if nothing happened at
> all), but sometimes the filesystem image is corrupt and refuses to
> mount.

Can you turn on JFFS2 debugging (level 1) in this case and log the debug
output over a serial console?
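
As a sketch of what that usually involves, assuming the standard JFFS2 verbosity option is available in the kernel being built, the relevant configuration fragment would be:

```
# Kernel .config fragment (assumed option name; verify against your tree)
CONFIG_JFFS2_FS_DEBUG=1
```

The debug messages are printed at a low priority, so the console log level probably has to be raised as well (for example with `dmesg -n 8`) before they show up on the serial console.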

tglx
