NAND BBT corruption on MPC83xx

From: Matthew L. Creech @ 2011-06-17 20:54 UTC
  To: linuxppc-dev

Hi, I posted this on the Linux-MTD list but haven't gotten any replies.
Since it looks like it could be MPC83xx-specific, I'm reposting here.
Rick Johnson noted a problem in fsl_elbc_nand.c back in May which
might be related:

http://lists.infradead.org/pipermail/linux-mtd/2011-May/035372.html

We've gotten some devices back from the field which all suffer from
this same problem on bootup when attaching UBI (these messages are
from U-Boot):

...
Bad block table found at page 524224, version 0x01
Bad block table found at page 524160, version 0x01
nand_bbt: ECC error while reading bad block table
...(long stream of bogus bad blocks)...
UBI: attaching mtd1 to ubi0
UBI: physical eraseblock size:   131072 bytes (128 KiB)
UBI: logical eraseblock size:    129024 bytes
UBI: smallest flash I/O unit:    2048
UBI: sub-page size:              512
UBI: VID header offset:          512 (aligned 512)
UBI: data offset:                2048
UBI error: vtbl_check: volume table check failed: record 0, error 9
UBI error: ubi_init: cannot attach mtd1
UBI error: ubi_init: UBI error: cannot initialize UBI, error -22
UBI init error -22
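
For reference, the two BBT page numbers in the log can be mapped back to
block numbers and flash offsets.  This is only a minimal sketch: the page
and eraseblock sizes are taken from the UBI lines above, and locate_page()
is a hypothetical helper name, not something from U-Boot or the kernel.

```python
PAGE_SIZE = 2048        # "smallest flash I/O unit" from the UBI log
BLOCK_SIZE = 131072     # "physical eraseblock size" from the UBI log

def locate_page(page):
    """Return (block number, page index within block, byte offset) for a page."""
    pages_per_block = BLOCK_SIZE // PAGE_SIZE
    block, page_in_block = divmod(page, pages_per_block)
    return block, page_in_block, page * PAGE_SIZE

for page in (524224, 524160):
    block, page_in_block, offset = locate_page(page)
    print(f"page {page}: block {block}, page-in-block {page_in_block}, "
          f"offset {offset:#010x}")
```

Both pages fall on the first page of blocks 8191 and 8190, which are the
last two blocks if the chip is treated as 1 GiB (8192 blocks of 128 KiB).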

Full console dumps from 2 devices are here:

http://mcreech.com/work/bbt-ecc-error.txt
http://mcreech.com/work/bbt-ecc-error2.txt

Another device encountered a slightly different error, which I
assume is due to the same underlying problem:

UBI error: init_volumes: not enough PEBs, required 8061, available 8059
UBI error: ubi_wl_init_scan: no enough physical eraseblocks (-2, need 1)

A full dump from that one is here:

http://mcreech.com/work/bbt-ecc-error3.txt

Are there any known issues that could cause the BBT to
become corrupt like this?

I noticed that the reported bad blocks were all aligned at multiples
of 0x80000 (with one exception).  Dump #1 shows:
 - One BBT has lots of bytes with their lowest 1 or 2 bits cleared
   (e.g. 0xfe instead of 0xff).  With 2 bits per block, each byte
   covers 4 blocks (4 x 128 KiB = 0x80000), which explains the
   every-4th-block alignment.
 - The other BBT shows only one factory-marked bad block at
   0x062e0000, which is presumably correct.  This is preserved in the
   bogus BBT, and is the only non-0x80000-aligned bad block in the table.
 - Only the first 1024 bytes of the BBT contain bogus info; the
   latter half of the BBT is all correct.
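
To illustrate the alignment, here is a rough sketch of how a BBT byte
decodes into bad-block offsets.  It assumes the 2-bits-per-block on-flash
layout (0b11 = good, lowest bit pair = first block of the byte) and the
128 KiB eraseblock size from the log; bad_block_offsets() and the sample
bytes are made up for illustration.

```python
BLOCK_SIZE = 128 * 1024                 # physical eraseblock size from the UBI log
BITS_PER_BLOCK = 2                      # assumed on-flash BBT encoding
BLOCKS_PER_BYTE = 8 // BITS_PER_BLOCK   # 4 blocks described by each byte

def bad_block_offsets(bbt):
    """Return flash offsets of all blocks whose 2-bit entry is not 0b11."""
    offsets = []
    for byte_index, value in enumerate(bbt):
        for slot in range(BLOCKS_PER_BYTE):
            entry = (value >> (slot * BITS_PER_BLOCK)) & 0b11
            if entry != 0b11:
                block = byte_index * BLOCKS_PER_BYTE + slot
                offsets.append(block * BLOCK_SIZE)
    return offsets

# Hypothetical table fragment: bytes 1 and 3 have their low bits cleared.
sample = bytes([0xff, 0xfe, 0xff, 0xfc])
print([hex(offset) for offset in bad_block_offsets(sample)])
```

Any byte with only its low bit pair disturbed marks the first of its 4
blocks as bad, so the resulting offsets are always multiples of 0x80000.
The block at 0x062e0000 is block 791, which sits in the top bit pair of
byte 197, so it does not follow that pattern.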

It seems like the original BBT somehow had up to 2 of the low-order
bits in each of its bytes cleared, either while in memory or when the BBT
was written to NAND.  Any ideas on what I can do to isolate the
problem?  Thanks in advance!
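
One way to quantify the pattern is to compare the two BBT copies byte by
byte and count which bit positions were lost.  This is only a sketch and
assumes both tables have already been dumped into byte strings;
cleared_bit_histogram() and last_differing_byte() are hypothetical helper
names, and the sample data is invented.

```python
from collections import Counter

def cleared_bit_histogram(good, suspect):
    """Count, per bit position, bits that are set in `good` but clear in `suspect`."""
    histogram = Counter()
    for good_byte, suspect_byte in zip(good, suspect):
        lost = good_byte & ~suspect_byte & 0xff
        for bit in range(8):
            if lost & (1 << bit):
                histogram[bit] += 1
    return histogram

def last_differing_byte(good, suspect):
    """Return the index of the last byte that differs, or None if identical."""
    differing = [i for i, (g, s) in enumerate(zip(good, suspect)) if g != s]
    return differing[-1] if differing else None

# Invented example data, not taken from the real dumps.
good = bytes([0xff] * 4)
suspect = bytes([0xfe, 0xfc, 0xff, 0xfe])
print(dict(cleared_bit_histogram(good, suspect)))
print(last_differing_byte(good, suspect))
```

If the histogram only ever shows bits 0 and 1, and the last differing
byte is below index 1024, that would match what I'm seeing in dump #1.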

More info on this board:
- MPC8313 SoC
- 1GB Samsung NAND flash (K9K8G08U0B)
- Linux 2.6.31
- U-Boot 2009.06

-- 
Matthew L. Creech
