NAND BBT corruption on MPC83xx

From: Matthew L. Creech @ 2011-06-15 19:48 UTC
  To: MTD list

Hi, I'm not sure whether this list or the U-Boot list is more
appropriate, but figured I'd start here and see if anyone can help.

We've gotten some devices back from the field which all suffer from
the same problem on bootup when attaching UBI (these messages are
from U-Boot):

...
Bad block table found at page 524224, version 0x01
Bad block table found at page 524160, version 0x01
nand_bbt: ECC error while reading bad block table
...(long stream of bogus bad blocks)...
UBI: attaching mtd1 to ubi0
UBI: physical eraseblock size:   131072 bytes (128 KiB)
UBI: logical eraseblock size:    129024 bytes
UBI: smallest flash I/O unit:    2048
UBI: sub-page size:              512
UBI: VID header offset:          512 (aligned 512)
UBI: data offset:                2048
UBI error: vtbl_check: volume table check failed: record 0, error 9
UBI error: ubi_init: cannot attach mtd1
UBI error: ubi_init: UBI error: cannot initialize UBI, error -22
UBI init error -22

A full console dump is here:

http://mcreech.com/work/bbt-ecc-error.txt

Question #1: Is the UBI error here attributable to the blocks which
are wrongly marked as bad?  I would assume that it's a red herring,
and I should focus on figuring out how the BBT got corrupted, but
figured I'd check first.

Question #2: Are there any known issues that could cause the BBT to
become corrupt like this?

I noticed that the reported bad blocks were all aligned at multiples
of 0x80000 (with one exception).  Dumping the last 2 blocks shows:
  - One BBT has lots of bytes with their lowest 1 or 2 bits cleared
(e.g. 0xfe instead of 0xff).  Assuming the usual 2-bits-per-block
layout, each byte covers four 128 KiB blocks, so this explains the
every-4th-block (0x80000) alignment.
  - The other BBT shows only one factory-marked bad block at
0x062e0000, which is presumably correct.  This entry is preserved in
the bogus BBT, and is the only non-0x80000-aligned bad block in the table.
  - Only the first 1024 bytes of the bogus BBT contain bogus info; the
latter half of that BBT is all correct.
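
To make the alignment argument concrete, here is a minimal sketch that
decodes a raw BBT dump into bad-block offsets.  It assumes the common
on-flash layout of 2 bits per block with 0b11 meaning "good" and any
other value meaning "bad", plus the 128 KiB eraseblock size from the
log above.  The function name and the sample bytes are hypothetical
and only illustrate the pattern; they are not taken from the actual
dump.

```python
BLOCK_SIZE = 128 * 1024  # physical eraseblock size reported by UBI


def bad_block_offsets(bbt, block_size=BLOCK_SIZE):
    """Return flash offsets of blocks that a 2-bit-per-block BBT marks bad.

    Assumption: each byte holds four entries, lowest bits first, and an
    entry of 0b11 means the block is good.
    """
    offsets = []
    for byte_index, value in enumerate(bbt):
        for slot in range(4):
            entry = (value >> (slot * 2)) & 0x3
            if entry != 0x3:
                block = byte_index * 4 + slot
                offsets.append(block * block_size)
    return offsets


# Hypothetical sample: a 2048-byte table (8192 blocks of 128 KiB = 1 GiB)
sample = bytearray([0xFF] * 2048)
sample[0] = 0xFE    # lowest bit cleared  -> first block of this byte is bad
sample[1] = 0xFC    # lowest two bits cleared
sample[197] = 0x3F  # top entry cleared   -> block 791, offset 0x062e0000

for offset in bad_block_offsets(sample):
    print(hex(offset))
```

Because each byte covers 4 x 128 KiB = 0x80000 bytes of flash, damage
confined to the two lowest bits of a byte can only produce bad blocks
at multiples of 0x80000, which matches the observed pattern.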

It seems like the original BBT somehow had 0-2 bits corrupted at the
low end of each of its bytes, either while in memory or when the BBT
was written to NAND.  Any ideas on what I can do to isolate the
problem?  Thanks in advance!
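
One way to start isolating this might be to compare the two dumped
tables byte by byte and count which direction the bits changed.  The
sketch below assumes both tables have already been dumped into byte
strings of equal length; the function name and sample data are
hypothetical.

```python
def diff_bbt(good, bogus):
    """Compare two BBT dumps.

    Returns (indexes of differing bytes, bits cleared 1->0, bits set 0->1),
    treating `good` as the reference copy.
    """
    differing = []
    cleared = 0
    newly_set = 0
    for index, (g, b) in enumerate(zip(good, bogus)):
        if g != b:
            differing.append(index)
            cleared += bin(g & ~b & 0xFF).count("1")
            newly_set += bin(~g & b & 0xFF).count("1")
    return differing, cleared, newly_set


# Hypothetical 8-byte excerpts, not taken from the real dumps
good = bytes([0xFF] * 8)
bogus = bytes([0xFE, 0xFC, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF])

differing, cleared, newly_set = diff_bbt(good, bogus)
print(differing, cleared, newly_set)
```

If every difference is a 1->0 change, that would at least be consistent
with bits being programmed (NAND programming can only clear bits)
rather than with random corruption of the in-memory copy, though it
does not prove either case.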

More info on this board:
- MPC8313 SoC
- 1GB Samsung NAND flash (K9K8G08U0B)
- Linux 2.6.31
- U-Boot 2009.06

-- 
Matthew L. Creech
