Nandwrite's behavior in case of write failure


* Nandwrite's behavior in case of write failure
@ 2009-06-05  1:23 Nahor
  2009-06-05 12:31 ` Artem Bityutskiy
  0 siblings, 1 reply; 7+ messages in thread
From: Nahor @ 2009-06-05  1:23 UTC
  To: linux-mtd

Hi,

If the call to pwrite fails, nandwrite first tries to erase the block,
then to mark it as bad. If the erase fails, nandwrite aborts. If setting
the bad block flag fails, nandwrite just ignores it and goes on to the
next block.

My questions are:
- Why erase the block?
- Probably linked to the first question, why abort if erase fails? Why 
not just ignore it and rely on the bad block flag?
- Why ignore the bad block flag error? If nandwrite can't set it and 
just goes on, the caller (app or user) will think that everything is
good. But when reading the partition later, the user will get garbage
when reaching that page.


Thanks,
	Nahor


* Re: Nandwrite's behavior in case of write failure
  2009-06-05  1:23 Nandwrite's behavior in case of write failure Nahor
@ 2009-06-05 12:31 ` Artem Bityutskiy
  2009-06-06  2:40   ` Nahor
  0 siblings, 1 reply; 7+ messages in thread
From: Artem Bityutskiy @ 2009-06-05 12:31 UTC
  To: Nahor; +Cc: linux-mtd

On Thu, 2009-06-04 at 18:23 -0700, Nahor wrote:
> Hi,
> 
> If the call to pwrite fails, nandwrite first tries to erase the block,
> then to mark it as bad. If the erase fails, nandwrite aborts. If setting
> the bad block flag fails, nandwrite just ignores it and goes on to the
> next block.

I think this is a bug. It should not abort if the erase fails, but mark
the block as bad instead. But it should check the error code as well,
and mark it as bad only if the error was EIO. I.e., it should not
mark the block as bad on ENOMEM.

And yes, if mark_bad() fails, nandwrite should not ignore this.

This is what we do in ubiformat, I think. Also, ubiformat asks the user
if he wants to mark the block as bad, unless the -y option was used.
nandwrite could do something similar.
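
A minimal sketch of the error-code rule described above, assuming a
hypothetical helper named classify_failure() (it is not part of
mtd-utils). It only encodes the decision: EIO suggests the block itself
is faulty, while anything else (ENOMEM, for instance) is treated as a
reason to stop rather than to mark the block bad.

```c
#include <errno.h>

/* Hypothetical outcome of a failed write or erase; not an mtd-utils type. */
enum fail_action {
	FAIL_ABORT,	/* stop the tool and report the error */
	FAIL_MARK_BAD	/* the block itself looks faulty: mark it bad */
};

/*
 * Decide what to do with the errno value of a failed flash operation.
 * Only EIO points at the flash block; ENOMEM and friends do not.
 */
static enum fail_action classify_failure(int err)
{
	return (err == EIO) ? FAIL_MARK_BAD : FAIL_ABORT;
}
```

A caller would pass the errno value saved right after the failing call,
e.g. classify_failure(saved_errno), and abort on FAIL_ABORT.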

> My questions are:
> - Why erase the block?
Just in case. We should be very careful with marking blocks as bad,
because if you do this by mistake, you may lose your device. Indeed,
imagine you marked 100 blocks as bad by mistake, and you do not remember
the block numbers. How will you know which blocks you should then
unmark?

So the idea of erase is to check whether the block is really bad
or this is just a driver bug or something.

> - Probably linked to the first question, why abort if erase fails? Why 
> not just ignore it and rely on the bad block flag?
I guess this is a bug in nandwrite

> - Why ignore the bad block flag error? If nandwrite can't set it and 
> just goes on, the caller (app or user) will think that everything is
> good. But when reading the partition later, the user will get garbage
> when reaching that page.
And this must be a bug, IMO.

-- 
Best regards,
Artem Bityutskiy (Битюцкий Артём)


* Re: Nandwrite's behavior in case of write failure
  2009-06-05 12:31 ` Artem Bityutskiy
@ 2009-06-06  2:40   ` Nahor
  2009-06-07  8:57     ` Artem Bityutskiy
  0 siblings, 1 reply; 7+ messages in thread
From: Nahor @ 2009-06-06  2:40 UTC
  To: linux-mtd

Artem Bityutskiy wrote:
> On Thu, 2009-06-04 at 18:23 -0700, Nahor wrote:
>> Hi,
>>
>> If the call to pwrite fails, nandwrite first tries to erase the block,
>> then to mark it as bad. If the erase fails, nandwrite aborts. If setting
>> the bad block flag fails, nandwrite just ignores it and goes on to the
>> next block.
> 
> I think this is a bug. It should not abort if the erase fails, but mark
> the block as bad instead. But it should check the error code as well,
> and mark it as bad only if the error was EIO. I.e., it should not
> mark the block as bad on ENOMEM.
> 
> And yes, if mark_bad() fails, nandwrite should not ignore this.
> 
> This is what we do in ubiformat, I think. Also, ubiformat asks the user
> if he wants to mark the block as bad, unless the -y option was used.
> nandwrite could do something similar.

nandwrite has the -m/--markbad option instead of -y.
However, if the option is not set, the blocks are skipped without being
marked. I guess nandwrite should either ask the user or fail immediately.
Continuing seems useless.

>> My questions are:
>> - Why erase the block?
> Just in case. We should be very careful with marking blocks as bad,
> because if you do this by mistake, you may lose your device. Indeed,
> imagine you marked 100 blocks as bad by mistake, and you do not remember
> the block numbers. How will you know which blocks you should then
> unmark?

Well, one could unmark all of them and run nandtest, then flog oneself and
swear to pay more attention next time :)

More seriously, in my case, I want to use nandwrite for automatic updates
so I prefer that it marks too many blocks bad than have the update abort
because nandwrite wants to be conservative and exits instead of flagging
the offending block and continuing.

> So the idea of erase is to check whether the block is really bad
> or this is just a driver bug or something.

Do you mean that if the erase succeeds, nandwrite shouldn't mark the block
bad and just exit?

I'm not too familiar with NAND technology, but is it not possible that
a block can be erasable but not writable? At least nandtest thinks so, it
marks blocks as bad on either erase or write failures.

And so does ubiformat. flash_image() either exits or marks the block
bad if mtd_write fails. It doesn't try to erase it first.

	Nahor


* Re: Nandwrite's behavior in case of write failure
  2009-06-06  2:40   ` Nahor
@ 2009-06-07  8:57     ` Artem Bityutskiy
  2009-06-08 17:00       ` Nahor
  2009-06-08 20:43       ` Jehan Bing
  0 siblings, 2 replies; 7+ messages in thread
From: Artem Bityutskiy @ 2009-06-07  8:57 UTC
  To: Nahor; +Cc: linux-mtd

On Fri, 2009-06-05 at 19:40 -0700, Nahor wrote:
> Artem Bityutskiy wrote:
> > This is what we do in ubiformat, I think. Also, ubiformat asks the user
> > if he wants to mark the block as bad, unless the -y option was used.
> > nandwrite could do something similar.
> 
> nandwrite has the -m/--markbad option instead of -y.

OK.

> However, if the option is not set, the blocks are skipped without being
> marked. I guess nandwrite should either ask the user or fail immediately.
> Continuing seems useless.

Yes. It might as well suggest using -m/

> >> My questions are:
> >> - Why erase the block?
> > Just in case. We should be very careful with marking blocks as bad,
> > because if you do this by mistake, you may lose your device. Indeed,
> > imagine you marked 100 blocks as bad by mistake, and you do not remember
> > the block numbers. How will you know which blocks you should then
> > unmark?
> 
> Well, one could unmark all of them and run nandtest, then flog oneself and
> swear to pay more attention next time :)

The problem is that factory-marked bad blocks do not have to manifest
themselves easily. They may appear to work fine, and fail later, e.g.
when the temperature is higher.

> More seriously, in my case, I want to use nandwrite for automatic updates
> so I prefer that it marks too many blocks bad than have the update abort
> because nandwrite wants to be conservative and exits instead of flagging
> the offending block and continuing.

Then you probably need to fix the tool. I wonder if it would make sense
to use libmtd.c from UBI utils and re-write nandwrite? Of course
libmtd.c would need to be improved as well. But the benefit would be
shared code between ubiformat and nandwrite.

> > So the idea of erase is to check whether the block is really bad
> > or this is just a driver bug or something.
> 
> Do you mean that if the erase succeeds, nandwrite shouldn't mark the block
> bad and just exit?

I think it should do something like UBI is doing - erase the eraseblock,
then torture it by writing several patterns and reading them back. If
the eraseblock survived torturing, nandwrite may re-try writing,
otherwise it marks the eraseblock as bad.
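
A rough sketch of such a torture pass, under stated assumptions: the
block_ops callbacks, the fake in-memory block and the name
torture_block() are all hypothetical and exist only for illustration;
they are neither the UBI code nor a libmtd API, and the test patterns
are an arbitrary choice. The fake block mimics NAND in that an erase
sets every byte to 0xFF and a write can only clear bits.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical block callbacks; each returns 0 on success. */
struct block_ops {
	int (*erase)(void *ctx);
	int (*write)(void *ctx, const unsigned char *buf, size_t len);
	int (*read)(void *ctx, unsigned char *buf, size_t len);
};

/*
 * Erase the block, write a pattern, read it back, and repeat for each
 * pattern. Returns 0 if the block survived, 1 if the data read back
 * differs, and -1 if a callback itself reported an error.
 */
static int torture_block(const struct block_ops *ops, void *ctx,
			 unsigned char *scratch, unsigned char *readback,
			 size_t len)
{
	static const unsigned char patterns[] = { 0xA5, 0x5A, 0x00 };
	size_t i;

	for (i = 0; i < sizeof(patterns); i++) {
		if (ops->erase(ctx) != 0)
			return -1;
		memset(scratch, patterns[i], len);
		if (ops->write(ctx, scratch, len) != 0)
			return -1;
		if (ops->read(ctx, readback, len) != 0)
			return -1;
		if (memcmp(scratch, readback, len) != 0)
			return 1;
	}
	/* Leave the block erased so the caller can retry the real write. */
	return ops->erase(ctx) != 0 ? -1 : 0;
}

/* In-memory stand-in for an eraseblock, for demonstration only. */
#define FAKE_BLOCK_SIZE 16

struct fake_block {
	unsigned char data[FAKE_BLOCK_SIZE];
	unsigned char stuck_bits;	/* bits that always read back as 1 */
};

static int fake_erase(void *ctx)
{
	struct fake_block *b = ctx;

	memset(b->data, 0xFF, sizeof(b->data));
	return 0;
}

static int fake_write(void *ctx, const unsigned char *buf, size_t len)
{
	struct fake_block *b = ctx;
	size_t i;

	if (len > sizeof(b->data))
		return -1;
	for (i = 0; i < len; i++)
		b->data[i] &= buf[i];	/* programming only clears bits */
	return 0;
}

static int fake_read(void *ctx, unsigned char *buf, size_t len)
{
	struct fake_block *b = ctx;
	size_t i;

	if (len > sizeof(b->data))
		return -1;
	for (i = 0; i < len; i++)
		buf[i] = b->data[i] | b->stuck_bits;
	return 0;
}

static const struct block_ops fake_ops = { fake_erase, fake_write, fake_read };
```

With this shape, the caller would retry the original write when
torture_block() returns 0 and only consider marking the eraseblock bad
when it returns 1.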

> I'm not too familiar with NAND technology, but is it not possible that
> a block can be erasable but not writable? At least nandtest thinks so, it
> marks blocks as bad on either erase or write failures.

Yes, write and erase failures mean that the eraseblock is bad. But I think
marking a block as bad straight away is just dangerous. Who knows, maybe
this is a small glitch in a bus, or a software bug, or someone
corrupted the driver's memory, or whatever. This is why UBI is doing
eraseblock torturing before marking it as bad. And it is very careful
about error codes - only EIO code is considered as a reason to mark an
eraseblock as bad.

> And so does ubiformat. flash_image() either exits or marks the block
> bad if mtd_write fails. It doesn't try to erase it first.

Yes, this is not very good, I've added TODO there. It should torture the
eraseblock - I'll implement this later in libmtd.c.

-- 
Best regards,
Artem Bityutskiy (Битюцкий Артём)


* Re: Nandwrite's behavior in case of write failure
  2009-06-07  8:57     ` Artem Bityutskiy
@ 2009-06-08 17:00       ` Nahor
  2009-06-08 20:43       ` Jehan Bing
  1 sibling, 0 replies; 7+ messages in thread
From: Nahor @ 2009-06-08 17:00 UTC
  To: linux-mtd

Artem Bityutskiy wrote:
> I wonder if it would make sense
> to use libmtd.c from UBI utils and re-write nandwrite? Of course
> libmtd.c would need to be improved as well. But the benefit would be
> shared code between ubiformat and nandwrite.

I've been thinking that too. Unfortunately, I've too much work on my
hands already and I don't have the time for things that are not strictly
necessary. :(


* Re: Nandwrite's behavior in case of write failure
  2009-06-07  8:57     ` Artem Bityutskiy
  2009-06-08 17:00       ` Nahor
@ 2009-06-08 20:43       ` Jehan Bing
  2009-06-09 12:59         ` Artem Bityutskiy
  1 sibling, 1 reply; 7+ messages in thread
From: Jehan Bing @ 2009-06-08 20:43 UTC
  To: linux-mtd

Artem Bityutskiy wrote:
> Yes, write and erase failures mean that the eraseblock is bad. But I think
> marking a block as bad straight away is just dangerous. Who knows, maybe
> this is a small glitch in a bus, or a software bug, or someone
> corrupted the driver's memory, or whatever. This is why UBI is doing
> eraseblock torturing before marking it as bad. And it is very careful
> about error codes - only EIO code is considered as a reason to mark an
> eraseblock as bad.


Fixed broken behavior in case of write failure. More specifically:
- Only try to mark a block bad if the errors are EIO. Other errors
will abort the tool.
- Also abort the tool if the marking fails instead of ignoring it.

Signed-off-by: Jehan Bing <jehan@orb.com>

--- a/nandwrite.c	2009-06-08 13:31:14.000000000 -0700
+++ b/nandwrite.c	2009-06-08 13:33:32.000000000 -0700
@@ -586,6 +586,10 @@ int main(int argc, char * const argv[])
 			erase_info_t erase;
 
 			perror ("pwrite");
+			if (errno != EIO) {
+				goto closeall;
+			}
+
 			/* Must rewind to blockstart if we can */
 			rewind_blocks = (mtdoffset - blockstart) / meminfo.writesize; /* Not including the one we just attempted */
 			rewind_bytes = (rewind_blocks * meminfo.writesize) + readlen;
@@ -602,7 +606,9 @@ int main(int argc, char * const argv[])
 				(long)erase.start, (long)erase.start+erase.length-1);
 			if (ioctl(fd, MEMERASE, &erase) != 0) {
 				perror("MEMERASE");
-				goto closeall;
+				if (errno != EIO) {
+					goto closeall;
+				}
 			}
 
 			if (markbad) {
@@ -610,7 +616,7 @@ int main(int argc, char * const argv[])
 				fprintf(stderr, "Marking block at %08lx bad\n", (long)bad_addr);
 				if (ioctl(fd, MEMSETBADBLOCK, &bad_addr)) {
 					perror("MEMSETBADBLOCK");
-					/* But continue anyway */
+					goto closeall;
 				}
 			}
 			mtdoffset = blockstart + meminfo.erasesize;
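
The control flow after this patch can be sketched as a standalone
function. This is an illustration under assumptions, not the nandwrite
code: recover_from_write_error(), the recovery_ops callbacks and the
stubs are hypothetical names, with the callbacks standing in for the
MEMERASE and MEMSETBADBLOCK ioctls. The sketch takes the errno of the
failed write as an argument so that the caller can save it before
calling perror(), since a library call is allowed to change errno.

```c
#include <errno.h>
#include <stdbool.h>

enum next_step {
	STEP_ABORT,		/* corresponds to "goto closeall" */
	STEP_SKIP_BLOCK		/* move on to the next eraseblock */
};

/* Hypothetical callbacks; each returns 0, or -1 with errno set. */
struct recovery_ops {
	int (*erase_block)(void *ctx);
	int (*mark_bad)(void *ctx);
};

static enum next_step recover_from_write_error(int write_errno, bool markbad,
						const struct recovery_ops *ops,
						void *ctx)
{
	/* Anything other than EIO is not the block's fault: give up. */
	if (write_errno != EIO)
		return STEP_ABORT;

	/* An erase failure is tolerated only when it is EIO as well. */
	if (ops->erase_block(ctx) != 0 && errno != EIO)
		return STEP_ABORT;

	/* A failure to mark the block bad is no longer ignored. */
	if (markbad && ops->mark_bad(ctx) != 0)
		return STEP_ABORT;

	return STEP_SKIP_BLOCK;
}

/* Stubs used only to exercise the sketch. */
static int stub_ok(void *ctx) { (void)ctx; return 0; }
static int stub_fail_eio(void *ctx) { (void)ctx; errno = EIO; return -1; }
static int stub_fail_enomem(void *ctx) { (void)ctx; errno = ENOMEM; return -1; }
```

A caller would do something like "int saved = errno; perror("pwrite");"
and then pass saved as write_errno.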


* Re: Nandwrite's behavior in case of write failure
  2009-06-08 20:43       ` Jehan Bing
@ 2009-06-09 12:59         ` Artem Bityutskiy
  0 siblings, 0 replies; 7+ messages in thread
From: Artem Bityutskiy @ 2009-06-09 12:59 UTC
  To: Jehan Bing; +Cc: linux-mtd

On Mon, 2009-06-08 at 13:43 -0700, Jehan Bing wrote:
> Artem Bityutskiy wrote:
> > Yes, write and erase failures mean that the eraseblock is bad. But I think
> > marking a block as bad straight away is just dangerous. Who knows, maybe
> > this is a small glitch in a bus, or a software bug, or someone
> > corrupted the driver's memory, or whatever. This is why UBI is doing
> > eraseblock torturing before marking it as bad. And it is very careful
> > about error codes - only EIO code is considered as a reason to mark an
> > eraseblock as bad.
> 
> 
> Fixed broken behavior in case of write failure. More specifically:
> - Only try to mark a block bad if the errors are EIO. Other errors
> will abort the tool.
> - Also abort the tool if the marking fails instead of ignoring it.
> 
> Signed-off-by: Jehan Bing <jehan@orb.com>

Looks good to me, pushed, thanks.

-- 
Best regards,
Artem Bityutskiy (Битюцкий Артём)

