From: Liu Hui <onlyflyer@gmail.com>
To: Ricard Wanderlof <ricard.wanderlof@axis.com>
Cc: "linux-mtd@lists.infradead.org" <linux-mtd@lists.infradead.org>,
Jamie Lokier <jamie@shareable.org>
Subject: Re: Is it an atomic operation for writing a page in NAND flash
Date: Thu, 21 Jan 2010 09:37:43 +0800
Message-ID: <2c3b11251001201737m2dadd594jc6dc2df2002ee62b@mail.gmail.com>
In-Reply-To: <Pine.LNX.4.64.1001201646400.32263@lnxricardw.se.axis.com>
Very informative, it expands my view. Thanks!
2010/1/21 Ricard Wanderlof <ricard.wanderlof@axis.com>:
>
> On Wed, 20 Jan 2010, Liu Hui wrote:
>
>> Thank you very much, CRC is the real solution.
>>
>> But I don't understand, if a partial write happens, we use ECC to
>> correct the data, we will find the data can't be corrected, then
>> -EBADMSG will be returned (see nand_correct_data()), then we can know
>> this page is corrupted. IMHO, this works.
>
> The ECC algorithm used by mtd only produces correct results in the case of
> 0, 1 or 2 bit errors in the data. For more bit errors than that, the result
> is undefined.
>
> Let's assume a partial write occurs, which leads to 57 bit errors compared
> to what was originally supposed to be there. Since there are more than 2 bit
> errors, the algorithm output is undefined; it may say that the data can't be
> corrected, or it may say that the data is ok, or it may say that the data
> can be corrected; it's impossible to tell. As far as I understand, it is not
> uncommon for ECC to say the data is correct when in fact it isn't.
>
> A slightly trivial case:
>
> Again assuming the ECC algorithm used by mtd, the ECC bytes for a chunk of
> data where all the bytes have the same value are 0xFFFFFF, regardless of the
> actual value. So, say you have a page full of 0xA3; the ECC is then
> 0xFFFFFF. Now assume a partial write causes bit 2 of every byte to not
> change from 1 to 0 when programming. The result is a page full of 0xA7, in
> effect 256 bit errors (assuming a page size of 256 bytes, or at least,
> assuming an ECC calculation encompassing that many bytes). But the ECC will
> still be 0xFFFFFF, and the corresponding ECC calculation will say that the
> data is correct. That is, as I mentioned before, because the result of an
> ECC calculation on data with >2 bit errors is undefined.
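The effect described above can be reproduced with a toy model. This is only a sketch of a Hamming-style code over 256 bytes (22 parity bits, inverted so that erased flash carries 0xFFFFFF); the bit layout and the names ecc256() and check() are my own assumptions and are not meant to match nand_calculate_ecc() or nand_correct_data() in mtd.

```python
def parity(x: int) -> int:
    """Return 1 if x has an odd number of set bits, else 0."""
    return bin(x).count("1") & 1

def ecc256(data: bytes) -> int:
    """Toy Hamming-style ECC over 256 bytes: 22 parity bits in a 24-bit word."""
    assert len(data) == 256
    code, pos = 0, 0
    # Line parity: one pair of parity bits per bit of the byte index.
    for k in range(8):
        for want in (0, 1):
            acc = 0
            for i, b in enumerate(data):
                if (i >> k) & 1 == want:
                    acc ^= b
            code |= parity(acc) << pos
            pos += 1
    # Column parity: one pair of parity bits per bit of the bit index.
    total = 0
    for b in data:
        total ^= b
    for k in range(3):
        for want in (0, 1):
            mask = sum(1 << j for j in range(8) if (j >> k) & 1 == want)
            code |= parity(total & mask) << pos
            pos += 1
    # Invert so that erased flash (all 0xFF) carries ECC 0xFFFFFF.
    return ~code & 0xFFFFFF

def check(data: bytearray, stored_ecc: int) -> str:
    """Classify a page; fix it in place if the syndrome looks like one bit error."""
    syn = (stored_ecc ^ ecc256(bytes(data))) & 0x3FFFFF
    if syn == 0:
        return "ok"
    pairs = [(syn >> (2 * k)) & 3 for k in range(11)]
    if all(p in (1, 2) for p in pairs):
        # Exactly one bit set in every pair: decode the 11-bit error address.
        addr = sum((p >> 1) << k for k, p in enumerate(pairs))
        data[addr & 0xFF] ^= 1 << (addr >> 8)
        return "corrected"
    return "uncorrectable"

good = bytes([0xA3]) * 256
ecc = ecc256(good)                    # 0xFFFFFF for any constant-valued page

# Partial write: bit 2 was never programmed in any byte (256 bit errors).
partial = bytearray([0xA7]) * 256
partial_result = check(partial, ecc)  # "ok", although every byte is wrong

# Three bit errors look exactly like one bit error at a fourth position.
page = bytearray(good)
for byte_index in (1, 2, 4):
    page[byte_index] ^= 0x01
three_result = check(page, ecc)       # "corrected", but wrongly
wrong_bits = sum(bin(a ^ b).count("1") for a, b in zip(page, good))

print(hex(ecc), partial_result, three_result, wrong_bits)
```

In this model the three-bit case is "corrected" by flipping a fourth bit (byte 7), so the page ends up with four wrong bits and a clean ECC status. One- and two-bit errors are handled as expected; anything beyond that is luck.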
>
> Note that there are other ECC algorithms which can correct more error bits.
> For MLC flash it is recommended to use an algorithm which corrects 4 bit
> errors rather than a single bit error in a block of data. Such algorithms
> require more ECC bits though.
>
> One has a tendency to think of ECC as a checking algorithm. It is not. It
> corrects and detects bit errors under certain circumstances. Outside those
> circumstances it is worthless. For the case of the software algorithm used
> in mtd, it is worthless if there are more than 2 bit errors. A failed write
> could cause any number of bit errors, so it is worthless to check the result
> using the ECC algorithm. The normal failure mode of a nand flash chip is
> random single-bit errors with low probability. ECC handles this
> elegantly.
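For comparison, a checksum such as the CRC mentioned earlier in the thread is designed for detection. A minimal sketch, assuming CRC-32 from Python's zlib and a hypothetical record layout with the 4 CRC bytes appended to a 256-byte payload (this is my own illustration, not what mtd, UBI or any file system actually stores):

```python
import zlib

CRC_LEN = 4  # CRC-32 stored as 4 little-endian bytes (assumed layout)

def store(payload: bytes) -> bytes:
    """Return the payload with its CRC-32 appended."""
    return payload + zlib.crc32(payload).to_bytes(CRC_LEN, "little")

def is_intact(record: bytes) -> bool:
    """Recompute the CRC-32 of the payload and compare with the stored one."""
    payload, stored = record[:-CRC_LEN], record[-CRC_LEN:]
    return zlib.crc32(payload).to_bytes(CRC_LEN, "little") == stored

record = store(bytes([0xA3]) * 256)

# Same partial write as in the ECC example: bit 2 stays 1 in every data byte.
damaged = bytes(b | 0x04 for b in record[:-CRC_LEN]) + record[-CRC_LEN:]

print(is_intact(record), is_intact(damaged))
```

The constant-page corruption that the ECC model cannot see changes the CRC-32, so is_intact() rejects the damaged record. A CRC can still be fooled by an unlucky corruption, but only with a small probability rather than systematically.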
>
>
> Elaborating on this slightly, a devil's advocate would consider ECC
> worthless as a correction algorithm for a flash chip. Assume one bit error
> occurs, which is corrected by ECC. Then another bit error occurs in the same
> page. ECC then detects a failure. Then another bit error occurs. The ECC
> algorithm is now worthless: it may detect the error, it may say that the
> data is correct, or it may even try to correct it (erroneously).
>
> The reason that all this works in practice is that the probability of a bit
> error occurring is so low that the probability of two bit errors occurring
> in the same page is very low, in some respect lower than other failure modes
> in the system, so that we don't have to worry about it. (For example: What
> about bit errors occurring in RAM chips from cosmic radiation? It is a real
> risk, but so small that most systems don't have to worry about it.)
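To put rough numbers on this argument, here is a small sketch that assumes bit errors are independent and uses a made-up per-bit error probability of 1e-9. That value is purely illustrative and not taken from any datasheet; the function name prob_at_least() is also my own.

```python
from math import comb

def prob_at_least(n_bits: int, p: float, k_min: int, terms: int = 4) -> float:
    """Approximate P(at least k_min of n_bits are wrong) for independent errors.

    Sums the first few binomial terms directly instead of computing
    1 - P(fewer), which would lose all precision for tiny p.
    """
    return sum(
        comb(n_bits, k) * p**k * (1 - p) ** (n_bits - k)
        for k in range(k_min, k_min + terms)
    )

N_BITS = 256 * 8   # one 256-byte ECC block
P_BIT = 1e-9       # assumed per-bit error probability, illustration only

for k in (1, 2, 3):
    print(f"P(>= {k} bit errors) ~ {prob_at_least(N_BITS, P_BIT, k):.2e}")
```

With these assumed numbers, one bit error in a block is on the order of 2e-6, two on the order of 2e-12 and three on the order of 1e-18. Each additional error costs about six orders of magnitude, which is why the cases the ECC cannot handle are rare as long as errors really are independent.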
>
> It can be a real concern though, and that is why things like UBI provide
> so-called bit scrubbing: whenever it detects that ECC has done a bit
> correction in a block, it erases that block and rewrites the data [lots of
> details omitted here] so that the chance of two bit errors ever occurring in
> the same page will be very small indeed.
>
> Especially in these days of larger and larger flash chips resulting from
> shrinking chip geometries this is a problem that is getting worse and worse.
> It also tends to vary hugely among manufacturers.
>
> /Ricard
> --
> Ricard Wolf Wanderlöf ricardw(at)axis.com
> Axis Communications AB, Lund, Sweden www.axis.com
> Phone +46 46 272 2016 Fax +46 46 13 61 30
>
--
Thanks & Best Regards
Liu Hui