From: Huang Shijie <shijie8@gmail.com>
To: Elie De Brauwer <eliedebrauwer@gmail.com>
Cc: b32955@freescale.com, dwmw2@infradead.org,
linux-mtd@lists.infradead.org, dedekind1@gmail.com
Subject: Re: [PATCH v1] mtd: gpmi: Bitflip support in erased regions
Date: Wed, 11 Dec 2013 21:24:58 +0800
Message-ID: <20131211132455.GA1284@gmail.com>
In-Reply-To: <1386619091-23992-1-git-send-email-eliedebrauwer@gmail.com>
On Mon, Dec 09, 2013 at 08:58:10PM +0100, Elie De Brauwer wrote:
> Fixed cc to linux-mtd, please ignore my previous version.
>
> Hello all,
>
> I bumped into an issue on a custom board with an i.MX28 and a Micron
> MT29F4G08 NAND flash. My system running a 3.9.0 failed to boot during
> upgrade testing due to UBI errors related to bitflips in NAND:
>
> [ 3.831323] UBI warning: ubi_io_read: error -74 (ECC error) while reading 16384 bytes from PEB 443:245760, read only 16384 bytes, retry
> [ 3.845026] UBI warning: ubi_io_read: error -74 (ECC error) while reading 16384 bytes from PEB 443:245760, read only 16384 bytes, retry
> [ 3.858710] UBI warning: ubi_io_read: error -74 (ECC error) while reading 16384 bytes from PEB 443:245760, read only 16384 bytes, retry
> [ 3.872408] UBI error: ubi_io_read: error -74 (ECC error) while reading 16384 bytes from PEB 443:245760, read 16384 bytes
> ...
> [ 4.011529] UBIFS error (pid 36): ubifs_recover_leb: corrupt empty space LEB 27:237568, corruption starts at 9815
> [ 4.021897] UBIFS error (pid 36): ubifs_scanned_corruption: corruption at LEB 27:247383
> [ 4.030000] UBIFS error (pid 36): ubifs_scanned_corruption: first 6569 bytes from LEB 27:247383
Thanks a lot for this patch.
I have run into the "corrupt empty space" issue too.
>
> Diving a bit deeper with nanddump:
> root@(none):~# nanddump -a /dev/mtd8 > /dev/null
> ECC failed: 8
> ECC corrected: 0
> Number of bad blocks: 0
> Number of bbt blocks: 0
> Block size 262144, page size 4096, OOB size 224
> Dumping data starting at 0x00000000 and ending at 0x1ea00000...
> ECC: 1 corrected bitflip(s) at offset 0x042c2000
> ECC: 1 uncorrectable bitflip(s) at offset 0x06efe000
> root@(none):~# nanddump -s 116129792 -c --noecc -l 262144 /dev/mtd8
> ...
> 0x06efe6a0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 7f |................|
>
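For reference, the numbers in the two dumps line up with the UBI log. The sketch below works through the arithmetic using the geometry nanddump printed (block size 262144, page size 4096); the helper name `locate` and the struct are made up for illustration and are not part of any driver.

```c
#include <stdint.h>

/* Position of a flat NAND address within the erase-block/page geometry. */
struct nand_pos {
	uint32_t block;           /* erase block index (the PEB number in UBI terms) */
	uint32_t offset_in_block; /* byte offset from the start of that block */
	uint32_t page_in_block;   /* page index inside the block */
};

/* Hypothetical helper: split a flat address into block, offset and page. */
static struct nand_pos locate(uint64_t addr, uint32_t block_size,
			      uint32_t page_size)
{
	struct nand_pos p;

	p.block = (uint32_t)(addr / block_size);
	p.offset_in_block = (uint32_t)(addr % block_size);
	p.page_in_block = p.offset_in_block / page_size;
	return p;
}
```

Feeding it the failing offset 0x06efe000 gives block 443, offset 253952, page 62. Block 443 starts at 443 * 262144 = 116129792, which is the value passed to -s, and offset 253952 lies inside the 16384 bytes UBI tried to read from PEB 443:245760.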
> Which points to a well-known 'corrupt empty space' issue, which appears
> every now and then:
> - http://permalink.gmane.org/gmane.linux.drivers.mtd/46617
> - http://lists.infradead.org/pipermail/linux-mtd/2012-January/039254.html
>
> Hence I went on a quest to teach my NAND driver how to do this, gpmi-nand in
> question. The problem is that although on properly written data which gets
> streamed through the BCH block we get 16 bit ecc, if we erase a block we get
> like 0 bit ecc, since erase is a command, not a stream of data travelling
> through the BCH block. The BCH block (see i.MX28 reference manual chapters
> 15 GPMI and 16 BCH) can tell us of protected chunks:
> - if they are error free (if ecc data is present)
> - the amount of bitflips they contain (if ecc data is present)
> - if they are fully erased (all 0xFF's)
> - if they are uncorrectable (# bitflips > ecc_strength, or 0xFF with
> bitflips).
> In the current situation as soon as a single bitflip exists in a region
> where the parity information is all 0xFF (looking like it's erased) the
> block is marked as uncorrectable. Which is a pity since I can perform this
> kind of ECC by hand.
>
> Quote datasheet:
> "As the BCH decoder reads the data and parity blocks, it records a special condition, i.e.,
> that all of the bits of a payload data block or metadata block are one, including any associated
> parity bytes. The all-ones case for both parity and data indicates an erased block in the
> NAND device."
>
> Fortunately we can more or less tune this parameter by using the
> ERASE_THRESHOLD in HW_BCH_MODE register:
> "This value indicates the maximum number of zero bits on a flash page for
> it to be considered erased. For SLC NAND devices, this value should be
I hit the "corrupt empty space" issue with a Toshiba SLC NAND.
The spec tells us the threshold should be 0 for SLC NAND.
I will double-check it tomorrow.
> programmed to 0 (meaning that the entire page should consist of bytes of
> 0xFF. For MLC NAND devices, bit errors may occur on reads (even on blank
> pages), so this threshold can be used to tune the erased page checking
> algorithm."
>
> So as my solution I'm setting this erase threshold to the ecc_strength
> derived from the geometry, meaning that I will tolerate the same number of
> bitflips the BCH block would consider correctable.
> The side effect is that whenever I'm reading a page (gpmi_ecc_read_page() )
> which the BCH block marked as "erased" I need to take a software approach.
> The software approach is inspired by what is currently
> done in the omap2 driver (but not free from discussion). At that point I
> know that the page can contain up to ecc_strength bitflips, so I need to
The ecc_strength can be as high as 40 sometimes.
I really do not know what the proper value for ERASE_THRESHOLD is.
Maybe setting ERASE_THRESHOLD to 2 is OK?
I think ecc_strength is a little large.
> count and correct them if necessary. This obviously gives a slight overhead
> when compared to a normal read of erased pages but is more polite towards
> upper layers.
> On the other hand, the upper layers should also show some intelligence when
> it comes to reading erased pages which doesn't make much sense either.
>
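The software pass described above (count the zero bits in a page the BCH block flagged as erased, and treat the page as blank only if the count stays within the threshold) could look roughly like the sketch below. This is not the code from the patch; `fixup_erased_page` and `zero_bits` are names invented here, and the buffer is assumed to hold the raw page data.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count the zero bits in one byte (each one is a bitflip in an erased page). */
static unsigned int zero_bits(uint8_t b)
{
	uint8_t inv = (uint8_t)~b;
	unsigned int n = 0;

	while (inv) {
		inv &= (uint8_t)(inv - 1); /* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Hypothetical helper: if buf holds at most `threshold` zero bits, rewrite
 * it to all 0xFF and return the number of bitflips that were fixed.
 * Return -1 and leave buf untouched if there are more zero bits than that.
 */
static int fixup_erased_page(uint8_t *buf, size_t len, unsigned int threshold)
{
	unsigned int flips = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		if (buf[i] == 0xff)
			continue;
		flips += zero_bits(buf[i]);
		if (flips > threshold)
			return -1;
	}
	if (flips)
		memset(buf, 0xff, len);
	return (int)flips;
}
```

With the page from the nanddump above (all 0xff except one 0x7f byte) this returns 1 for any threshold of 1 or more. The threshold is the open question: passing ecc_strength or a small constant such as 2 only changes the last argument.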
> I considered alternatives based upon the 'let it fail as it does now, and
> try to intelligently figure out whether or not it's an erased page'
> possibly using additional byte in the metadata or something based
> on fuzzy rules, but this is actually the solution which ended up giving
> most certainty.
>
> I have tested this on a 3.9/i.MX28 and after applying this patch my board
> went from a stubbornly-whining-about-corrupt-empty-space to happily
> mounting the partition and even the trace of my stuck bit disappeared:
>
> root@(none):~# nanddump -a /dev/mtd8 > /dev/null
> ECC failed: 0
> ECC corrected: 1
> Number of bad blocks: 0
> Number of bbt blocks: 0
> Block size 262144, page size 4096, OOB size 224
> Dumping data starting at 0x00000000 and ending at 0x1ea00000...
> ECC: 1 corrected bitflip(s) at offset 0x042c2000
>
>
> I have also seen Pekon is eagerly trying to get the code removed from omap2,
> (e.g. http://lists.infradead.org/pipermail/linux-mtd/2013-July/047548.html )
> but even though his set of patches is currently in their 4th version I
> haven't seen any proper solution to handling bitflips in erased pages
> without iterating through them.
>
I will read it.
Please give us more time on this issue.
I will discuss it with our IC guy.
Thanks,
Huang Shijie