From: Huang Shijie <shijie8@gmail.com>
To: Elie De Brauwer <eliedebrauwer@gmail.com>
Cc: b32955@freescale.com, dwmw2@infradead.org,
linux-mtd@lists.infradead.org, dedekind1@gmail.com
Subject: Re: [PATCH v1] mtd: gpmi: Bitflip support in erased regions
Date: Wed, 11 Dec 2013 21:24:58 +0800
Message-ID: <20131211132455.GA1284@gmail.com>
In-Reply-To: <1386619091-23992-1-git-send-email-eliedebrauwer@gmail.com>

On Mon, Dec 09, 2013 at 08:58:10PM +0100, Elie De Brauwer wrote:
> Fixed cc to linux-mtd, please ignore my previous version.
>
> Hello all,
>
> I bumped into an issue on a custom board with an i.MX28 and a Micron
> MT29F4G08 NAND flash. My system, running a 3.9.0 kernel, failed to boot
> during upgrade testing due to UBI errors related to bitflips in NAND:
>
> [ 3.831323] UBI warning: ubi_io_read: error -74 (ECC error) while reading 16384 bytes from PEB 443:245760, read only 16384 bytes, retry
> [ 3.845026] UBI warning: ubi_io_read: error -74 (ECC error) while reading 16384 bytes from PEB 443:245760, read only 16384 bytes, retry
> [ 3.858710] UBI warning: ubi_io_read: error -74 (ECC error) while reading 16384 bytes from PEB 443:245760, read only 16384 bytes, retry
> [ 3.872408] UBI error: ubi_io_read: error -74 (ECC error) while reading 16384 bytes from PEB 443:245760, read 16384 bytes
> ...
> [ 4.011529] UBIFS error (pid 36): ubifs_recover_leb: corrupt empty space LEB 27:237568, corruption starts at 9815
> [ 4.021897] UBIFS error (pid 36): ubifs_scanned_corruption: corruption at LEB 27:247383
> [ 4.030000] UBIFS error (pid 36): ubifs_scanned_corruption: first 6569 bytes from LEB 27:247383

Thanks a lot for this patch.
I hit the "corrupt empty space" issue too.

>
> Diving a bit deeper with nanddump:
> root@(none):~# nanddump -a /dev/mtd8 > /dev/null
> ECC failed: 8
> ECC corrected: 0
> Number of bad blocks: 0
> Number of bbt blocks: 0
> Block size 262144, page size 4096, OOB size 224
> Dumping data starting at 0x00000000 and ending at 0x1ea00000...
> ECC: 1 corrected bitflip(s) at offset 0x042c2000
> ECC: 1 uncorrectable bitflip(s) at offset 0x06efe000
> root@(none):~# nanddump -s 116129792 -c --noecc -l 262144 /dev/mtd8
> ...
> 0x06efe6a0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 7f |................|
>
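
For reference, the -s argument above is simply the start of the erase block
that holds the uncorrectable offset. A minimal sketch of that arithmetic in C;
block_start() and block_index() are hypothetical helper names used only for
illustration, not nanddump or kernel functions:

```c
#include <stdint.h>

/* Hypothetical helper: start offset of the erase block containing addr. */
static uint64_t block_start(uint64_t addr, uint64_t block_size)
{
	return addr - (addr % block_size);
}

/* Hypothetical helper: index of the erase block containing addr. */
static uint64_t block_index(uint64_t addr, uint64_t block_size)
{
	return addr / block_size;
}
```

With addr = 0x06efe000 and the reported block size of 262144, block_start()
gives 116129792 and block_index() gives 443, which lines up with PEB 443 in
the UBI log if UBI is attached to this same partition.
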
> Which points to a well-known 'corrupt empty space' issue, which appears
> every now and then:
> - http://permalink.gmane.org/gmane.linux.drivers.mtd/46617
> - http://lists.infradead.org/pipermail/linux-mtd/2012-January/039254.html
>
> Hence I went on a quest to teach my NAND driver, gpmi-nand in this case,
> how to handle this. The problem is that although properly written data,
> which gets streamed through the BCH block, gets 16-bit ECC, an erased block
> effectively gets 0-bit ECC, since erase is a command, not a stream of data
> travelling through the BCH block. The BCH block (see i.MX28 reference
> manual chapters 15 GPMI and 16 BCH) can tell us of protected chunks:
> - if they are error free (if ecc data is present)
> - the amount of bitflips they contain (if ecc data is present)
> - if they are fully erased (all 0xFF's)
> - if they are uncorrectable (# bitflips > ecc_strength, or 0xFF with
> bitflips).
> In the current situation, as soon as a single bitflip exists in a region
> where the parity information is all 0xFF (looking like it's erased), the
> block is marked as uncorrectable. Which is a pity since I can perform this
> kind of ECC by hand.
>
> Quote datasheet:
> "As the BCH decoder reads the data and parity blocks, it records a special condition, i.e.,
> that all of the bits of a payload data block or metadata block are one, including any associated
> parity bytes. The all-ones case for both parity and data indicates an erased block in the
> NAND device."
>
> Fortunately we can more or less tune this parameter by using the
> ERASE_THRESHOLD in HW_BCH_MODE register:
> "This value indicates the maximum number of zero bits on a flash page for
> it to be considered erased. For SLC NAND devices, this value should be

I hit the "corrupt empty space" issue with a Toshiba SLC NAND.
The spec tells us it should be 0 for SLC NAND.
I will double-check it tomorrow.

> programmed to 0 (meaning that the entire page should consist of bytes of
> 0xFF. For MLC NAND devices, bit errors may occur on reads (even on blank
> pages), so this threshold can be used to tune the erased page checking
> algorithm."
>
> So as my solution I'm setting this erase threshold to the ecc_strength
> derived from the geometry, meaning that I will tolerate the same number of
> bitflips the BCH block would consider correctable.
> The side effect is that whenever I'm reading a page (gpmi_ecc_read_page())
> which the BCH block marked as "erased" I need to take a software approach.
> The software approach is inspired by what is currently
> done in the omap2 driver (but not free from discussion). At that point I
> know that the page can contain up to ecc_strength bitflips, so I need to

The ecc_strength can be 40 sometimes.
I really do not know what the proper value for ERASE_THRESHOLD is.
Maybe setting ERASE_THRESHOLD to 2 is OK?
I think ecc_strength is a little large.

> count and correct them if necessary. This obviously gives a slight overhead
> when compared to a normal read of erased pages but is more polite towards
> upper layers.
> On the other hand, the upper layers should also show some intelligence when
> it comes to reading erased pages which doesn't make much sense either.
>
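
The software check described above could look roughly like the following.
This is only a sketch under assumptions, not the code from the patch or from
the omap2 driver: count_zero_bits() and fixup_erased_page() are hypothetical
names, and the threshold is whatever value ends up in ERASE_THRESHOLD
(ecc_strength in the proposal, or something smaller such as 2).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: count the zero bits in buf. Counting stops after the
 * first byte that pushes the total above limit, so a programmed page is
 * rejected quickly.
 */
static unsigned int count_zero_bits(const uint8_t *buf, size_t len,
				    unsigned int limit)
{
	unsigned int zeros = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		uint8_t inv = (uint8_t)~buf[i];

		/* Clear the lowest set bit of inv until none remain. */
		while (inv) {
			inv &= (uint8_t)(inv - 1);
			zeros++;
		}
		if (zeros > limit)
			break;
	}
	return zeros;
}

/*
 * Hypothetical helper: treat buf as an erased page if it holds at most
 * threshold zero bits. Returns the number of bitflips corrected, or -1 if
 * the page has too many zero bits to be considered erased (buf is then
 * left untouched).
 */
static int fixup_erased_page(uint8_t *buf, size_t len, unsigned int threshold)
{
	unsigned int zeros = count_zero_bits(buf, len, threshold);

	if (zeros > threshold)
		return -1;
	if (zeros)
		memset(buf, 0xff, len);
	return (int)zeros;
}
```

A caller would run this only on pages the BCH block reported as erased, add
the return value to the corrected-bitflip statistics, and report an
uncorrectable error on -1.
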
> I considered alternatives based upon the 'let it fail as it does now, and
> try to intelligently figure out whether or not it's an erased page'
> approach, possibly using an additional byte in the metadata or something
> based on fuzzy rules, but this is actually the solution which ended up
> giving most certainty.
>
> I have tested this on a 3.9/i.MX28 and after applying this patch my board
> went from a stubbornly-whining-about-corrupt-empty-space to happily
> mounting the partition and even the trace of my stuck bit disappeared:
>
> root@(none):~# nanddump -a /dev/mtd8 > /dev/null
> ECC failed: 0
> ECC corrected: 1
> Number of bad blocks: 0
> Number of bbt blocks: 0
> Block size 262144, page size 4096, OOB size 224
> Dumping data starting at 0x00000000 and ending at 0x1ea00000...
> ECC: 1 corrected bitflip(s) at offset 0x042c2000
>
>
> I have also seen that Pekon is eagerly trying to get the code removed from
> omap2 (e.g. http://lists.infradead.org/pipermail/linux-mtd/2013-July/047548.html ),
> but even though his set of patches is currently in its 4th version I
> haven't seen any proper solution to handling bitflips in erased pages
> without iterating through them.
>

I will read it.
Please give us more time on this issue.
I will discuss it with our IC guy.

thanks
Huang Shijie