Is it an atomic operation for writing a page in NAND flash

public inbox for linux-mtd@lists.infradead.org

* Is it an atomic operation for writing a page in NAND flash
@ 2010-01-20  9:58 Liu Hui
  2010-01-20 10:13 ` Ricard Wanderlof
  2010-01-20 13:41 ` Jamie Lokier
  0 siblings, 2 replies; 19+ messages in thread
From: Liu Hui @ 2010-01-20  9:58 UTC (permalink / raw)
  To: linux-mtd

Hi guys,

This question has confused me for a long time. As far as I know,
writing a sector to a hard disk is atomic. That is to say, if a power
failure happens while a sector is being written, the sector will be
written completely or not at all.

For NAND flash, I haven't seen an atomicity guarantee in any
documentation. Could you please tell me whether writing a page to NAND
flash is atomic? This is very important for a transaction-based file
system.

--
Thanks & Best Regards
Liu Hui
--

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Is it an atomic operation for writing a page in NAND flash
  2010-01-20  9:58 Is it an atomic operation for writing a page in NAND flash Liu Hui
@ 2010-01-20 10:13 ` Ricard Wanderlof
  2010-01-20 13:11   ` Liu Hui
  2010-01-20 13:41 ` Jamie Lokier
  1 sibling, 1 reply; 19+ messages in thread
From: Ricard Wanderlof @ 2010-01-20 10:13 UTC (permalink / raw)
  To: Liu Hui; +Cc: linux-mtd@lists.infradead.org


On Wed, 20 Jan 2010, Liu Hui wrote:

> Hi guys,
>
> This is a question confused me for a long time. As I know, writing a
> sector for a hard disk is atomic. That is to say, when we are writing
> a sector to hard disk and power failure happen, the sector will be
> written completely or not at all.
>
> For NAND flash, I didn't see the atomic guarantee in any material.
> Could you please tell me if writing a page for NAND flash is atomic?
> This is very important for a transaction based file system.

It is my understanding that if you get a power failure while the nand 
flash chip is writing the page the page could get partly written.

The only way around something like this would be to monitor the power line 
prior to the supply regulator, and not start a write if it can be detected 
that a power failure has occurred and there is not enough power to 
complete the write.

/Ricard
-- 
Ricard Wolf Wanderlöf                           ricardw(at)axis.com
Axis Communications AB, Lund, Sweden            www.axis.com
Phone +46 46 272 2016                           Fax +46 46 13 61 30


* Re: Is it an atomic operation for writing a page in NAND flash
  2010-01-20 10:13 ` Ricard Wanderlof
@ 2010-01-20 13:11   ` Liu Hui
  2010-01-20 13:33     ` Jamie Lokier
  0 siblings, 1 reply; 19+ messages in thread
From: Liu Hui @ 2010-01-20 13:11 UTC (permalink / raw)
  To: Ricard Wanderlof; +Cc: linux-mtd@lists.infradead.org

Ricard,

Thank you for your confirmation and the good idea.

I thought about your idea before, that is, generating an interrupt
when a power failure happens and blocking any further write requests
in the interrupt handler. But this is a little complex.

Now I think I can use ECC to check for a partial write: if a write did
not finish, the ECC should be wrong, so we can detect the partial
write and discard it. Do you think this is a good idea?

Thanks,
Hui

-- 
Thanks & Best Regards
Liu Hui
--


* Re: Is it an atomic operation for writing a page in NAND flash
  2010-01-20 13:11   ` Liu Hui
@ 2010-01-20 13:33     ` Jamie Lokier
  2010-01-20 14:25       ` Liu Hui
  0 siblings, 1 reply; 19+ messages in thread
From: Jamie Lokier @ 2010-01-20 13:33 UTC (permalink / raw)
  To: Liu Hui; +Cc: linux-mtd@lists.infradead.org, Ricard Wanderlof

Liu Hui wrote:
> Richard,
> 
> Thank you for your confirmation and good idea.
> 
> I also think about your idea before, that is, when power failure
> happens, generate an interrupt and blocks any other write requests in
> interrupt handler. But this is a little complex.

Ideally, you would design the hardware so that power failure can be
detected early near the power input, but with enough on-board power
retention (i.e. capacitor) that there is guaranteed enough continuous
power for the CPU to react and the NAND chip to have enough stable
power to complete the write reliably.

There is no need for an interrupt, if you have a fast GPIO that you
can read before each write command that tells if the input power has
not dropped.
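
A minimal sketch of the "read a GPIO before each write" idea, in C. All
names here (struct nand_hooks, power_good, program_page and the fake_*
stand-ins) are hypothetical and invented for illustration; they are not
part of MTD or of any real driver. The sketch also assumes the board
holds enough charge to finish one page program once the flag has been
read as good.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical board hooks; neither name comes from MTD or a real driver. */
struct nand_hooks {
    bool (*power_good)(void);   /* reads the "input power OK" GPIO */
    int  (*program_page)(uint32_t page, const uint8_t *buf, size_t len);
};

enum { WRITE_OK = 0, WRITE_REFUSED = -1 };

/* Start a page program only while input power is still reported good. */
static int guarded_program_page(const struct nand_hooks *hw, uint32_t page,
                                const uint8_t *buf, size_t len)
{
    if (!hw->power_good())
        return WRITE_REFUSED;   /* supply already dropping: do not start */
    return hw->program_page(page, buf, len);
}

/* Stand-ins so the sketch can be exercised without hardware. */
static bool fake_power_ok = true;
static int  fake_pages_programmed = 0;

static bool fake_power_good(void)
{
    return fake_power_ok;
}

static int fake_program_page(uint32_t page, const uint8_t *buf, size_t len)
{
    (void)page; (void)buf; (void)len;
    fake_pages_programmed++;    /* pretend the page was programmed */
    return WRITE_OK;
}
```

With the stand-ins wired into a struct nand_hooks, a call made while
fake_power_ok is false returns WRITE_REFUSED and leaves
fake_pages_programmed unchanged. The cost is one GPIO read per page
program.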

> Now, I think I can use ECC to check the partial write, if a write was
> not finished, the ECC should be wrong, so we can detect this partial
> write and discard this write. Do you think this is a good idea?

It's good, but not perfect: In principle a power-failed write could
successfully store the correct bits including ECC so they read back
correctly, but with the cell charges not completely stable.  But I
guess that's rare enough that it is just included in the normal NAND
bad block possibilities.

-- Jamie


* Re: Is it an atomic operation for writing a page in NAND flash
  2010-01-20  9:58 Is it an atomic operation for writing a page in NAND flash Liu Hui
  2010-01-20 10:13 ` Ricard Wanderlof
@ 2010-01-20 13:41 ` Jamie Lokier
  2010-01-20 13:58   ` Artem Bityutskiy
  2010-01-20 14:01   ` Liu Hui
  1 sibling, 2 replies; 19+ messages in thread
From: Jamie Lokier @ 2010-01-20 13:41 UTC (permalink / raw)
  To: Liu Hui; +Cc: linux-mtd

Liu Hui wrote:
> This is a question confused me for a long time. As I know, writing a
> sector for a hard disk is atomic. That is to say, when we are writing
> a sector to hard disk and power failure happen, the sector will be
> written completely or not at all.

Are you sure about that?
I have never seen a reliable confirmation of it.

-- Jamie


* Re: Is it an atomic operation for writing a page in NAND flash
  2010-01-20 13:41 ` Jamie Lokier
@ 2010-01-20 13:58   ` Artem Bityutskiy
  2010-01-20 14:06     ` Liu Hui
  2010-01-20 14:01   ` Liu Hui
  1 sibling, 1 reply; 19+ messages in thread
From: Artem Bityutskiy @ 2010-01-20 13:58 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: linux-mtd, Liu Hui

On Wed, 2010-01-20 at 13:41 +0000, Jamie Lokier wrote:
> Liu Hui wrote:
> > This is a question confused me for a long time. As I know, writing a
> > sector for a hard disk is atomic. That is to say, when we are writing
> > a sector to hard disk and power failure happen, the sector will be
> > written completely or not at all.
> 
> Are you sure about that?
> I have never seen a reliable confirmation of it.

Theo sent a longish e-mail some time ago to fs-devel or lkml, explaining
why this is almost never true in practice.

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)


* Re: Is it an atomic operation for writing a page in NAND flash
  2010-01-20 13:41 ` Jamie Lokier
  2010-01-20 13:58   ` Artem Bityutskiy
@ 2010-01-20 14:01   ` Liu Hui
  1 sibling, 0 replies; 19+ messages in thread
From: Liu Hui @ 2010-01-20 14:01 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: linux-mtd

http://kerneltrap.org/node/6741
This article says: "Disks assure atomicity at the sector level.
This means that a write to a sector either goes through completely or
not at all."

I also know that some transaction-based file systems depend on the
atomic sector write feature of hard disks.

But I am not entirely sure about this...

-- 
Thanks & Best Regards
Liu Hui
--


* Re: Is it an atomic operation for writing a page in NAND flash
  2010-01-20 13:58   ` Artem Bityutskiy
@ 2010-01-20 14:06     ` Liu Hui
  2010-01-20 14:38       ` Artem Bityutskiy
  0 siblings, 1 reply; 19+ messages in thread
From: Liu Hui @ 2010-01-20 14:06 UTC (permalink / raw)
  To: dedekind1; +Cc: linux-mtd, Jamie Lokier

I didn't see this e-mail from Theo, and I can't find it in my
fs-devel mailing list either. Could you please share it with us?

Much appreciated!

-- 
Thanks & Best Regards
Liu Hui
--


* Re: Is it an atomic operation for writing a page in NAND flash
  2010-01-20 13:33     ` Jamie Lokier
@ 2010-01-20 14:25       ` Liu Hui
  2010-01-20 14:54         ` Ricard Wanderlof
  0 siblings, 1 reply; 19+ messages in thread
From: Liu Hui @ 2010-01-20 14:25 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: linux-mtd@lists.infradead.org, Ricard Wanderlof

OK. In fact, I am a software developer, and I want to find a way to
design a power-cut-safe FTL. I can't control the hardware design, so I
have to find a general way to ensure atomic operation.

> There is no need for an interrupt, if you have a fast GPIO that you
> can read before each write command that tells if the input power has
> not dropped.
This is no good for performance.

> It's good, but not perfect: In principle a power-failed write could
> successfully store the correct bits including ECC so they read back
> correctly, but with the cell charges not completely stable.  But I
> guess that's rare enough that it is just included in the normal NAND
> bad block possibilities.
OK, so ECC can detect a partial write but can't detect unstable cell
charges. I think this is enough, since NAND flash is an unstable medium.

Thanks for your information!
Hui



-- 
Thanks & Best Regards
Liu Hui
--


* Re: Is it an atomic operation for writing a page in NAND flash
  2010-01-20 14:06     ` Liu Hui
@ 2010-01-20 14:38       ` Artem Bityutskiy
  2010-01-20 14:46         ` Artem Bityutskiy
  0 siblings, 1 reply; 19+ messages in thread
From: Artem Bityutskiy @ 2010-01-20 14:38 UTC (permalink / raw)
  To: Liu Hui; +Cc: linux-mtd, Jamie Lokier

On Wed, 2010-01-20 at 22:06 +0800, Liu Hui wrote:
> I didn't see this e-mail of Theo and I also can't find it in my
> fs-devel mail list, could you please share it with us?
> 
> much appreciated!

Found it on lkml. Enjoy. IMO, very good writing.

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)


* Re: Is it an atomic operation for writing a page in NAND flash
  2010-01-20 14:38       ` Artem Bityutskiy
@ 2010-01-20 14:46         ` Artem Bityutskiy
  2010-01-20 14:52           ` Liu Hui
  0 siblings, 1 reply; 19+ messages in thread
From: Artem Bityutskiy @ 2010-01-20 14:46 UTC (permalink / raw)
  To: Liu Hui; +Cc: Jamie Lokier, linux-mtd

On Wed, 2010-01-20 at 16:38 +0200, Artem Bityutskiy wrote:
> On Wed, 2010-01-20 at 22:06 +0800, Liu Hui wrote:
> > I didn't see this e-mail of Theo and I also can't find it in my
> > fs-devel mail list, could you please share it with us?
> > 
> > much appreciated!
> 
> Found it on lkml. Enjoy. IMO, very good writing.

Sorry:
http://lkml.org/lkml/2009/8/24/156

(forgot to paste the link)

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)


* Re: Is it an atomic operation for writing a page in NAND flash
  2010-01-20 14:46         ` Artem Bityutskiy
@ 2010-01-20 14:52           ` Liu Hui
  0 siblings, 0 replies; 19+ messages in thread
From: Liu Hui @ 2010-01-20 14:52 UTC (permalink / raw)
  To: dedekind1; +Cc: Jamie Lokier, linux-mtd

Great, so kind of you.

Thanks,
Hui

-- 
Thanks & Best Regards
Liu Hui
--


* Re: Is it an atomic operation for writing a page in NAND flash
  2010-01-20 14:25       ` Liu Hui
@ 2010-01-20 14:54         ` Ricard Wanderlof
  2010-01-20 15:11           ` Liu Hui
  2010-01-20 16:17           ` David Parkinson
  0 siblings, 2 replies; 19+ messages in thread
From: Ricard Wanderlof @ 2010-01-20 14:54 UTC (permalink / raw)
  To: Liu Hui; +Cc: linux-mtd@lists.infradead.org, Jamie Lokier

On Wed, 20 Jan 2010, Liu Hui wrote:

>> It's good, but not perfect: In principle a power-failed write could
>> successfully store the correct bits including ECC so they read back
>> correctly, but with the cell charges not completely stable.  But I
>> guess that's rare enough that it is just included in the normal NAND
>> bad block possibilities.
> Ok, ECC can detect partial write but can't detect unstable cell
> charges, I think this is enough since NAND flash is unstable media.

ECC is designed to correct a small number of bits (1 bit for the software 
ECC algorithm used by mtd) and detect failure if a couple more bits are 
bad (2 bits for the mtd algorithm). Beyond that, the results cannot be 
trusted. That means that if there are, say, 16 incorrect bits in the 
data, the ECC algorithm will not necessarily indicate that there is a 
failure. It might very well indicate that there is a single bit that needs 
correction, or that all bits are correct. It is not a CRC.

The end result is that you can't say "if the ECC says it's ok, the data 
hasn't been corrupted" (which you could with a CRC). The only thing you 
can say (in the case of the mtd ECC algorithm) is "if there is a one-bit 
error in the data, the ECC will correct it" and "if there is a two-bit 
error in the data, the ECC will detect it".

If you really need that kind of check for data integrity, I suppose you 
could add a CRC algorithm to the ECC calculations already being performed 
by mtd, so that any change in the data would be flagged due to a 
mismatching CRC.
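
A rough sketch of such a CRC check in C. The layout (payload followed
by a 4-byte little-endian CRC in the same page buffer) and the names
page_seal and page_intact are assumptions made for illustration, not
something mtd provides. The CRC itself is the common CRC-32 with the
IEEE 802.3 polynomial, computed bit by bit for brevity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (IEEE 802.3 polynomial, reflected form 0xEDB88320). */
static uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical layout: the last 4 bytes of the buffer hold the CRC. */
#define CRC_LEN 4u

/* Compute the CRC of the payload and store it in the trailer. */
static void page_seal(uint8_t *page, size_t page_len)
{
    size_t payload = page_len - CRC_LEN;
    uint32_t crc = crc32_ieee(page, payload);
    for (size_t i = 0; i < CRC_LEN; i++)
        page[payload + i] = (uint8_t)(crc >> (8u * i));
}

/* Recompute the CRC after a read and compare it with the trailer. */
static bool page_intact(const uint8_t *page, size_t page_len)
{
    size_t payload = page_len - CRC_LEN;
    uint32_t stored = 0;
    for (size_t i = 0; i < CRC_LEN; i++)
        stored |= (uint32_t)page[payload + i] << (8u * i);
    return stored == crc32_ieee(page, payload);
}
```

The writer would call page_seal() before programming the page, and the
reader would call page_intact() after ECC correction has been applied.
A table-driven CRC would be the usual choice where speed matters.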

/Ricard
-- 
Ricard Wolf Wanderlöf                           ricardw(at)axis.com
Axis Communications AB, Lund, Sweden            www.axis.com
Phone +46 46 272 2016                           Fax +46 46 13 61 30


* Re: Is it an atomic operation for writing a page in NAND flash
  2010-01-20 14:54         ` Ricard Wanderlof
@ 2010-01-20 15:11           ` Liu Hui
  2010-01-20 16:09             ` Ricard Wanderlof
  2010-01-20 16:17           ` David Parkinson
  1 sibling, 1 reply; 19+ messages in thread
From: Liu Hui @ 2010-01-20 15:11 UTC (permalink / raw)
  To: Ricard Wanderlof; +Cc: linux-mtd@lists.infradead.org, Jamie Lokier

Thank you very much, CRC is the real solution.

But I don't understand: if a partial write happens and we use ECC to
correct the data, we will find that the data can't be corrected, and
-EBADMSG will be returned (see nand_correct_data()), so we know that
the page is corrupted. IMHO, this works.

Anyway, I will think about CRC seriously.

Thanks,
Hui

-- 
Thanks & Best Regards
Liu Hui
--


* Re: Is it an atomic operation for writing a page in NAND flash
  2010-01-20 15:11           ` Liu Hui
@ 2010-01-20 16:09             ` Ricard Wanderlof
  2010-01-21  1:37               ` Liu Hui
  0 siblings, 1 reply; 19+ messages in thread
From: Ricard Wanderlof @ 2010-01-20 16:09 UTC (permalink / raw)
  To: Liu Hui; +Cc: Jamie Lokier, Ricard Wanderlöf,
	linux-mtd@lists.infradead.org

On Wed, 20 Jan 2010, Liu Hui wrote:

> Thanks you very much, CRC is the real solution.
>
> But I don't understand, if a partial write happens, we use ECC to
> correct the data, we will find the data can't be corrected, then
> -EBADMSG will be returned(see nand_correct_data()), then we can know
> this page are corrupted. IMHO, this works.

Assuming the ECC algorithm used by mtd, it only produces correct results 
in the case of 0, 1 or 2 bit errors in the data. For more bit errors than 
that, the result is undefined.

Let's assume a partial write occurs, which leads to 57 bit errors compared 
to what was originally supposed to be there. Since there are more than 2 
bit errors, the algorithm output is undefined; it may say that the data 
can't be corrected, or it may say that the data is ok, or it may say that 
the data can be corrected; it's impossible to tell. As far as I 
understand, it is not uncommon for ECC to say the data is correct when in 
fact it isn't.

A slightly trivial case:

Again assuming the ECC algorithm used by mtd, the ECC bytes for a chunk of 
data where all the bytes have the same value are 0xFFFFFF, regardless of 
the actual value. So, say you have a page full of 0xA3; the ECC is then 
0xFFFFFF. Now, assume a partial write causes bit 2 of every byte to 
not change from 1 to 0 when programming. The result is a page full of 
0xA7, in effect, 256 bit errors (assuming a page size of 256 bytes, or at 
least, assuming an ECC calculation encompassing that many bytes). But the 
ECC will still be 0xFFFFFF, and the corresponding ECC calculation will say 
that the data is correct. That is, as I mentioned before, because the 
result of an ECC calculation on data with >2 bit errors is undefined.
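
The effect can be reproduced with a simplified Hamming code in C. This
is a sketch in the same family as the 1-bit-correcting code discussed
here, not the mtd implementation: the bit layout of the 24-bit result
is an assumption of this example and does not match the kernel's byte
order. The function names are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ECC_CHUNK 256u   /* bytes covered by one ECC calculation */

/*
 * Simplified Hamming code over 256 bytes (2048 bits). For every set
 * bit, its 11-bit index is XORed into "odd" and the complement of the
 * index into "even". The 22 parity bits are packed and inverted so
 * that the two unused top bits of the 24-bit value read as 1.
 */
static uint32_t hamming_ecc_256(const uint8_t *data)
{
    uint32_t odd = 0, even = 0;
    for (uint32_t i = 0; i < ECC_CHUNK * 8u; i++) {
        if (data[i / 8u] & (1u << (i % 8u))) {
            odd  ^= i;
            even ^= ~i & 0x7FFu;
        }
    }
    return ~((odd << 11) | even) & 0xFFFFFFu;
}

/* ECC of a chunk in which every byte has the same value. */
static uint32_t ecc_of_constant_chunk(uint8_t value)
{
    uint8_t chunk[ECC_CHUNK];
    memset(chunk, value, sizeof chunk);
    return hamming_ecc_256(chunk);
}
```

Both ecc_of_constant_chunk(0xA3) and ecc_of_constant_chunk(0xA7) return
0xFFFFFF, because every bit position that is set occurs an even number
of times in each parity group and cancels out. A comparison of stored
and recomputed ECC therefore sees no difference between the two chunks,
although they differ in 256 bits.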

Note that there are other ECC algorithms which can correct more error 
bits. For MLC flash it is recommended to use an algorithm which corrects 4 
bit errors rather than a single bit error in a block of data. Such 
algorithms require more ECC bits though.

One has a tendency to think of ECC as a checking algorithm. It is not. It 
corrects and detects bit errors under certain circumstances. Outside those 
circumstances it is worthless. For the case of the software algorithm used 
in mtd, it is worthless if there are more than 2 bit errors. A failed 
write could cause any number of bit errors, so it is worthless to check 
the result using the ECC algorithm. The normal failure mode of a nand 
flash chip is random single-bit errors with low probability. ECC 
handles this elegantly.

Elaborating on this slightly, a devil's advocate would consider ECC 
worthless as a correction algorithm for a flash chip. Assume one bit error 
occurs, which is corrected by ECC. Then another bit error occurs in the 
same page. ECC then detects a failure. Then another bit error occurs. The 
ECC algorithm is now worthless: it may detect the error, it may say that 
the data is correct, or it may even try to correct it (erroneously).

The reason that all this works in practice is that the probability of a 
bit error occurring is so low that the probability of two bit errors 
occurring in the same page is very low, in some respect lower than other 
failure modes in the system, so that we don't have to worry about it. (For 
example: What about bit errors occurring in RAM chips from cosmic 
radiation? It is a real risk, but so small that most systems don't have to 
worry about it.)

It can be a real concern though, and that is why things like UBI provide 
so-called bit scrubbing: whenever it detects that ECC has done a bit 
correction in a block, it erases that block and rewrites the data [lots of 
details omitted here] so that the chance of two bit errors ever occurring 
in the same page will be very small indeed.
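
The scrubbing decision can be sketched as a small policy function in C.
This is not UBI's actual code: the enum, the function name and the
convention that a negative count means "uncorrectable" are all
assumptions made for this illustration.

```c
/* What the caller should do with a page after reading it. */
enum read_action {
    READ_OK,        /* data good, nothing else to do */
    READ_OK_SCRUB,  /* data good after correction; rewrite the block soon */
    READ_FAILED     /* ECC reported an uncorrectable error */
};

/*
 * corrected_bits:  number of bits the ECC corrected on this read, or a
 *                  negative value if the ECC reported an uncorrectable
 *                  error (a convention of this sketch only).
 * scrub_threshold: corrections at or above this count trigger a scrub.
 *                  With 1-bit ECC this would be 1; with stronger ECC a
 *                  higher value tolerates a few bit flips.
 */
static enum read_action classify_read(int corrected_bits, int scrub_threshold)
{
    if (corrected_bits < 0)
        return READ_FAILED;
    if (corrected_bits >= scrub_threshold)
        return READ_OK_SCRUB;
    return READ_OK;
}
```

For example, classify_read(1, 1) asks for a scrub as soon as a single
bit has been corrected, which keeps a second error from accumulating in
the same page.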

Especially in these days of larger and larger flash chips resulting from 
shrinking chip geometries, this is a problem that is getting worse and worse. 
It also tends to vary hugely among manufacturers.

/Ricard
-- 
Ricard Wolf Wanderlöf                           ricardw(at)axis.com
Axis Communications AB, Lund, Sweden            www.axis.com
Phone +46 46 272 2016                           Fax +46 46 13 61 30


* Re: Is it an atomic operation for writing a page in NAND flash
  2010-01-20 14:54         ` Ricard Wanderlof
  2010-01-20 15:11           ` Liu Hui
@ 2010-01-20 16:17           ` David Parkinson
  2010-01-20 16:35             ` Ricard Wanderlof
  1 sibling, 1 reply; 19+ messages in thread
From: David Parkinson @ 2010-01-20 16:17 UTC (permalink / raw)
  To: linux-mtd

At 14:54 20/01/2010, Ricard Wanderlof wrote:
 >...
 >The end result is that you can't say "if the ECC says it's ok, the data
 >hasn't been corrupted" (which you could with a CRC).
 >...

Apologies for nit-picking (and small digression), but a CRC is no 
guarantee either.  Whilst error correcting codes have additional 
information so that small errors can be corrected both CRCs and ECCs 
work in the same way in detecting likely errors in the communications 
channel.  (It's all maths and statistics....).

A side question here is: have the check algorithms been matched to the 
characteristics of the MTDs?  For example a weakish radio signal is 
likely to have errors randomly distributed across the message.  With 
a magnetic disk drive the errors are likely to be caused by a blemish on 
the surface and therefore will come in bursts.  Some algorithms will 
be better than others in the respective cases.

Regards

David


* Re: Is it an atomic operation for writing a page in NAND flash
  2010-01-20 16:17           ` David Parkinson
@ 2010-01-20 16:35             ` Ricard Wanderlof
  2010-01-20 23:08               ` Charles Manning
  0 siblings, 1 reply; 19+ messages in thread
From: Ricard Wanderlof @ 2010-01-20 16:35 UTC (permalink / raw)
  To: David Parkinson; +Cc: linux-mtd@lists.infradead.org


On Wed, 20 Jan 2010, David Parkinson wrote:

> At 14:54 20/01/2010, Ricard Wanderlof wrote:
> >...
> >The end result is that you can't say "if the ECC says it's ok, the data
> >hasn't been corrupted" (which you could with a CRC).
> >...
>
> Apologies for nit-picking (and small digression), but a CRC is no
> guarantee either.  Whilst error correcting codes have additional
> information so that small errors can be corrected both CRCs and ECCs
> work in the same way in detecting likely errors in the communications
> channel.  (It's all maths and statistics....).

You are right of course. Indeed, any mapping of N bits to n bits (where 
N > n) must result in a number of bit patterns for N which map to 
identical bit patterns for n. Still, CRCs used for data checking are 
designed so that the different bit patterns for N that map to the same n 
are reasonably different from each other, so that a CRC is unlikely to 
show a correct result if there has been a 'typical' failure on the 
channel. At least the ECC algorithm used for mtd is not intended for that 
level of error detection; it is optimized for correcting single-bit errors.

> A side question here is have the check algorithms been matched to the
> characteristics of the MTDs?  For example a weakish radio signal is
> likely to have errors randomly distributed across the message.  With
> a magnetic disk drive the errors are likely be caused by a blemish on
> the surface and therefore will come in bursts.  Some algorithms will
> be better than others in the respective cases.

The algorithm used in mtd comes from Toshiba I think and was 
originally designed for an old 256-byte-page flash of theirs. But I would 
think all 1-bit-error-correction ECCs are basically the same.

I don't know, but I think the basic premise is that bit errors are rare, 
and when they do occur, they will be single bit errors occurring in random 
places. Indeed, the algorithm used seems to be ideally suited to this 
case.

/Ricard
-- 
Ricard Wolf Wanderlöf                           ricardw(at)axis.com
Axis Communications AB, Lund, Sweden            www.axis.com
Phone +46 46 272 2016                           Fax +46 46 13 61 30


* Re: Is it an atomic operation for writing a page in NAND flash
  2010-01-20 16:35             ` Ricard Wanderlof
@ 2010-01-20 23:08               ` Charles Manning
  0 siblings, 0 replies; 19+ messages in thread
From: Charles Manning @ 2010-01-20 23:08 UTC (permalink / raw)
  To: linux-mtd; +Cc: Ricard Wanderlof, David Parkinson

On Thursday 21 January 2010 05:35:09 Ricard Wanderlof wrote:
> On Wed, 20 Jan 2010, David Parkinson wrote:
> > At 14:54 20/01/2010, Ricard Wanderlof wrote:
> > >...
> > >The end result is that you can't say "if the ECC says it's ok, the data
> > >hasn't been corrupted" (which you could with a CRC).
> > >...
> >
> > Apologies for nit-picking (and small digression), but a CRC is no
> > guarantee either.  Whilst error correcting codes have additional
> > information so that small errors can be corrected both CRCs and ECCs
> > work in the same way in detecting likely errors in the communications
> > channel.  (It's all maths and statistics....).

While you might be technically and theoretically correct, in practical terms 
CRC is a copper-bottomed guarantee when compared with ECC.

A 32-bit CRC is very difficult to randomly spoof. An ECC is extremely easy to 
spoof with random errors. The difference is in the order of millions.
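
A back-of-the-envelope check of the "order of millions" figure, as a C
sketch. It rests on a crude model that is an assumption of this example:
heavy corruption is treated as producing a uniformly random check value.
For the 22-bit, 1-bit-correcting ECC, a read is accepted when the
syndrome is zero, when it points at one of the 2048 data bits, or when
it has a single bit set (an error in the ECC bytes themselves). For a
32-bit CRC only one value out of 2^32 matches.

```c
/* Fraction of random 22-bit syndromes a 1-bit-correcting ECC accepts. */
static double ecc_accept_fraction(void)
{
    const double syndromes = 4194304.0;            /* 2^22 */
    const double accepted  = 1.0 + 2048.0 + 22.0;  /* ok + data bit + ECC bit */
    return accepted / syndromes;
}

/* Fraction of random 32-bit values that match a given CRC-32. */
static double crc32_accept_fraction(void)
{
    return 1.0 / 4294967296.0;                     /* 2^-32 */
}

/* How many times more likely the ECC is to accept random garbage. */
static double ecc_vs_crc_ratio(void)
{
    return ecc_accept_fraction() / crc32_accept_fraction();
}
```

Under this model the ECC accepts roughly 5 in 10,000 random results,
the CRC roughly 2 in 10,000,000,000, and the ratio comes out at about
two million.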

The best protection for getting good NAND writes/erases is to make sure you 
don't launch a programming operation (write/erase) unless you know power is 
good. If your system has a "power OK" flag then check it before doing the write.

Don't wire the WP pin to hardware power fail flags either since a falling WP 
will abort the current write.


>
> You are right of course. Indeed, any mapping of N bits to n bits (where
> N > n) must result in a number of bit patterns for N which map to
> identical bit patterns for n. Still, CRCs used for data checking are
> designed so that the different bit patterns for N that map to the same n
> are reasonably different from each other, so that a CRC is unlikely to
> show a correct result if there has been a 'typical' failure on the
> channel. At least the ECC algorithm used for mtd is not intended for that
> level of error detection; it is optimized for correcting single-bit errors.
>
> > A side question here is have the check algorithms been matched to the
> > characteristics of the MTDs?  For example a weakish radio signal is
> > likely to have errors randomly distributed across the message.  With
> > a magnetic disk drive the errors are likely be caused by a blemish on
> > the surface and therefore will come in bursts.  Some algorithms will
> > be better than others in the respective cases.

Actually radio errors are often bursty due to interference. The electric fence 
clicks I get here knock out a few adjacent bits.

>
> The algorithm used in mtd comes from Toshiba I think and was
> originally designed for an old 256-byte-page flash of theirs. But I would
> think all 1-bit-error-correction ECCs are basically the same.

This still treats the pages as blocks of 256 bytes so if you have a 512-byte 
page it will be treated as two 256-byte ECC regions.
>
> I don't know, but I think the basic premise is that bit errors are rare,
> and when they do occur, they will be single bit errors occurring in random
> places. Indeed, the algorithm used seems to be ideally suited to this
> case.

It depends...

For MLC flash you can expect quite a few errors which is why there has been a 
shift to multi-bit ECC for these.  If you have, say, 4-bit ECC then you might 
choose to treat 2 or fewer errors as "no error".
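
A minimal sketch of that kind of policy, assuming a hypothetical ECC decoder that reports the number of bits it corrected, or None when correction failed. The names PageStatus and classify_read and the default thresholds are made up for the example.

```python
from enum import Enum
from typing import Optional


class PageStatus(Enum):
    OK = "ok"                        # few enough flips to ignore
    NEEDS_REWRITE = "needs rewrite"  # corrected, but close to the ECC limit
    UNCORRECTABLE = "uncorrectable"  # decoder gave up


def classify_read(corrected_bits: Optional[int],
                  ignore_threshold: int = 2,
                  ecc_strength: int = 4) -> PageStatus:
    """Map a decoder result onto a policy decision.

    corrected_bits is None when the (hypothetical) decoder reports failure,
    otherwise the number of bit errors it corrected in this ECC block.
    """
    if corrected_bits is None:
        return PageStatus.UNCORRECTABLE
    if not 0 <= corrected_bits <= ecc_strength:
        raise ValueError("decoder cannot correct that many bits")
    if corrected_bits <= ignore_threshold:
        return PageStatus.OK
    return PageStatus.NEEDS_REWRITE
```

With the defaults, 0 to 2 corrected bits count as "no error", 3 or 4 flag the page for a rewrite, and a decoder failure is reported as uncorrectable.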

There are basically two types of multi-bit ECC: RS and BCH. RS is more suited 
to "burst" errors like you'll see on CD or radio. BCH is more suited to 
random errors.

>
> /Ricard


* Re: Is it an atomic operation for writing a page in NAND flash
  2010-01-20 16:09             ` Ricard Wanderlof
@ 2010-01-21  1:37               ` Liu Hui
  0 siblings, 0 replies; 19+ messages in thread
From: Liu Hui @ 2010-01-21  1:37 UTC (permalink / raw)
  To: Ricard Wanderlof; +Cc: linux-mtd@lists.infradead.org, Jamie Lokier

Very informative, it expands my view. Thanks!

2010/1/21 Ricard Wanderlof <ricard.wanderlof@axis.com>:
>
> On Wed, 20 Jan 2010, Liu Hui wrote:
>
>> Thank you very much, CRC is the real solution.
>>
>> But I don't understand: if a partial write happens and we use ECC to
>> correct the data, we will find the data can't be corrected, then
>> -EBADMSG will be returned (see nand_correct_data()), so we can know
>> this page is corrupted. IMHO, this works.
>
> Assuming the ECC algorithm used by mtd, it only produces correct results in
> the case of 0, 1 or 2 bit errors in the data. For more bit errors than that,
> the result is undefined.
>
> Let's assume a partial write occurs, which leads to 57 bit errors compared
> to what was originally supposed to be there. Since there are more than 2 bit
> errors, the algorithm output is undefined; it may say that the data can't be
> corrected, or it may say that the data is ok, or it may say that the data
> can be corrected; it's impossible to tell. As far as I understand, it is not
> uncommon for ECC to say the data is correct when in fact it isn't.
>
> A slightly trivial case:
>
> Again assuming the ECC algorithm used by mtd, the ECC bytes for a chunk of
> data where all the bytes have the same value is 0xFFFFFF, regardless of the
> actual value. So, say you have a page full of 0xA3; the ECC is then
> 0xFFFFFF. Now, assume a partial write causes bit 2 of every byte to
> not change from 1 to 0 when programming. The result is a page full of 0xA7,
> in effect, 256 bit errors (assuming a page size of 256 bytes, or at least,
> assuming an ECC calculation encompassing that many bytes). But the ECC will
> still be 0xFFFFFF, and the corresponding ECC calculation will say that the
> data is correct. That is, as I mentioned before, because the result of an
> ECC calculation on data with >2 bit errors is undefined.
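
The all-same-bytes case can be checked with a small sketch. This is not the mtd code: it is a generic 22-bit Hamming parity over 256 bytes (16 line parities plus 6 column parities) with a bit layout chosen for the example, inverted so that all-zero parity reads as 0xFFFFFF. The function names are hypothetical.

```python
def parity(x: int) -> int:
    """Return 1 if x has an odd number of set bits, else 0."""
    return bin(x).count("1") & 1


def hamming_ecc_256(data: bytes) -> int:
    """Compute a 24-bit inverted Hamming code over exactly 256 bytes.

    The bit layout is an assumption for illustration and does not match
    the packing used by the kernel's software ECC.
    """
    if len(data) != 256:
        raise ValueError("expected exactly 256 bytes")
    bits = 0
    pos = 0
    # Line parity: for each of the 8 byte-address bits, one parity over the
    # bytes whose address has that bit clear and one over those with it set.
    for abit in range(8):
        for val in (0, 1):
            acc = 0
            for i, b in enumerate(data):
                if (i >> abit) & 1 == val:
                    acc ^= b
            bits |= parity(acc) << pos
            pos += 1
    # Column parity: same idea over the 3 bits of the bit index in a byte.
    all_xor = 0
    for b in data:
        all_xor ^= b
    for bbit in range(3):
        for val in (0, 1):
            mask = sum(1 << k for k in range(8) if (k >> bbit) & 1 == val)
            bits |= parity(all_xor & mask) << pos
            pos += 1
    return 0xFFFFFF ^ bits  # inverted; the two unused top bits stay set


if __name__ == "__main__":
    print(hex(hamming_ecc_256(bytes([0xA3]) * 256)))
    print(hex(hamming_ecc_256(bytes([0xA7]) * 256)))
```

Every parity group covers an even number of identical bytes, so each group XORs to zero and both pages end up with the same code, even though they differ in 256 bit positions.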
>
> Note that there are other ECC algorithms which can correct more error bits.
> For MLC flash it is recommended to use an algorithm which corrects 4 bit
> errors rather than a single bit error in a block of data. Such algorithms
> require more ECC bits though.
>
> One has a tendency to think of ECC as a checking algorithm. It is not. It
> corrects and detects bit errors under certain circumstances. Outside those
> circumstances it is worthless. For the case of the software algorithm used
> in mtd, it is worthless if there are more than 2 bit errors. A failed write
> could cause any number of bit errors, so it is worthless to check the result
> using the ECC algorithm. The normal failure mode of a nand flash chip is
> random single-bit errors with low probability. ECC handles this
> elegantly.
>
>
> Elaborating on this slightly, a devil's advocate would consider ECC
> worthless as a correction algorithm for a flash chip. Assume one bit error
> occurs, which is corrected by ECC. Then another bit error occurs in the same
> page. ECC then detects a failure. Then another bit error occurs. The ECC
> algorithm is now worthless: it may detect the error, it may say that the
> data is correct, or it may even try to correct it (erroneously).
>
> The reason that all this works in practice is that the probability of a bit
> error occurring is so low that the probability of two bit errors occurring
> in the same page is very low, in some respect lower than other failure modes
> in the system, so that we don't have to worry about it. (For example: What
> about bit errors occurring in RAM chips from cosmic radiation? It is a real
> risk, but so small that most systems don't have to worry about it.)
>
> It can be a real concern though, and that is why things like UBI provide
> so-called bit scrubbing: whenever it detects that ECC has done a bit
> correction in a block, it erases that block and rewrites the data [lots of
> details omitted here] so that the chance of two bit errors ever occurring in
> the same page will be very small indeed.
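
The scrubbing idea can be sketched with a toy in-memory model. This is not how UBI is implemented; ToyFlash, its methods, and the copy-before-erase ordering are all assumptions made for the example.

```python
class ToyFlash:
    """In-memory stand-in for a flash device (hypothetical, illustration only)."""

    def __init__(self, nblocks: int):
        self.nblocks = nblocks
        self.data = {}      # block number -> bytes; absent means erased
        self.bitflips = {}  # block number -> bit errors ECC had to correct

    def read(self, blk: int):
        """Return (data, corrected_bits), as an ECC-aware driver might."""
        return self.data[blk], self.bitflips.get(blk, 0)

    def erase(self, blk: int) -> None:
        self.data.pop(blk, None)
        self.bitflips.pop(blk, None)

    def write(self, blk: int, payload: bytes) -> None:
        if blk in self.data:
            raise ValueError("block must be erased before writing")
        self.data[blk] = payload
        self.bitflips[blk] = 0

    def free_block(self) -> int:
        """Return the lowest-numbered erased block."""
        return next(b for b in range(self.nblocks) if b not in self.data)


def scrub(flash: ToyFlash, blk: int) -> int:
    """Relocate blk if ECC corrected anything; return the block now holding the data."""
    payload, corrected = flash.read(blk)
    if corrected == 0:
        return blk
    new_blk = flash.free_block()
    flash.write(new_blk, payload)  # copy first so one good copy always exists
    flash.erase(blk)
    return new_blk
```

Copying to a spare block before erasing the old one is a design choice in this sketch, so that an interruption between the two steps still leaves a readable copy.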
>
> Especially in these days of larger and larger flash chips resulting from
> shrinking chip geometries this is a problem that is getting worse and worse.
> It also tends to vary hugely among manufacturers.
>
> /Ricard
> --
> Ricard Wolf Wanderlöf                           ricardw(at)axis.com
> Axis Communications AB, Lund, Sweden            www.axis.com
> Phone +46 46 272 2016                           Fax +46 46 13 61 30
>



-- 
Thanks & Best Regards
Liu Hui
--


Thread overview: 19+ messages
2010-01-20  9:58 Is it an atomic operation for writing a page in NAND flash Liu Hui
2010-01-20 10:13 ` Ricard Wanderlof
2010-01-20 13:11   ` Liu Hui
2010-01-20 13:33     ` Jamie Lokier
2010-01-20 14:25       ` Liu Hui
2010-01-20 14:54         ` Ricard Wanderlof
2010-01-20 15:11           ` Liu Hui
2010-01-20 16:09             ` Ricard Wanderlof
2010-01-21  1:37               ` Liu Hui
2010-01-20 16:17           ` David Parkinson
2010-01-20 16:35             ` Ricard Wanderlof
2010-01-20 23:08               ` Charles Manning
2010-01-20 13:41 ` Jamie Lokier
2010-01-20 13:58   ` Artem Bityutskiy
2010-01-20 14:06     ` Liu Hui
2010-01-20 14:38       ` Artem Bityutskiy
2010-01-20 14:46         ` Artem Bityutskiy
2010-01-20 14:52           ` Liu Hui
2010-01-20 14:01   ` Liu Hui
