* Nandwrite's behavior in case of write failure
From: Nahor @ 2009-06-05 1:23 UTC
To: linux-mtd

Hi,

If the call to pwrite fails, nandwrite first tries to erase the block,
then to mark it as bad. If the erase fails, nandwrite aborts. If setting
the bad block flag fails, nandwrite just ignores it and goes to the next
block.

My questions are:
- Why erase the block?
- Probably linked to the first question, why abort if the erase fails?
  Why not just ignore it and rely on the bad block flag?
- Why ignore the bad block flag error? If nandwrite can't set it and
  just goes on, the caller (app or user) will think that everything is
  good. But when reading the partition later, the user will get garbage
  when reaching that page.

Thanks,
Nahor
* Re: Nandwrite's behavior in case of write failure
From: Artem Bityutskiy @ 2009-06-05 12:31 UTC
To: Nahor; +Cc: linux-mtd

On Thu, 2009-06-04 at 18:23 -0700, Nahor wrote:
> Hi,
>
> If the call to pwrite fails, nandwrite tries first to erase the block
> then to mark it as bad. If erase fails, nandwrite aborts. If setting the
> bad block flag fails, nandwrite just ignores it and goes to the next block.

I think this is a bug. It should not abort if the erase fails, but mark
the block as bad instead. But it should check the error code as well,
and mark the block as bad only if the error was EIO. I.e., it should not
mark the block as bad on ENOMEM.

And yes, if mark_bad() fails, nandwrite should not ignore this.

This is what we do in ubiformat, I think. Also, ubiformat asks the user
if he wants to mark the block as bad, unless the -y option was used.
nandwrite could do something similar.

> My questions are:
> - Why erase the block?

Just in case. We should be very careful with marking blocks as bad,
because if you do this by mistake, you may lose your device. Indeed,
imagine you marked 100 blocks as bad by mistake, and you do not remember
the block numbers. How will you know which blocks you should then
unmark?

So the idea of the erase is to check whether the block is really bad or
whether this is just a driver bug or something.

> - Probably linked to the first question, why abort if erase fails? Why
> not just ignore it and rely on the bad block flag?

I guess this is a bug in nandwrite.

> - Why ignore the bad block flag error? If nandwrite can't set it and
> just goes on, the caller (app or user) will think that everything is
> good. But when reading the partition later, the user will get garbage when
> reaching that page.

And this must be a bug, IMO.

--
Best regards,
Artem Bityutskiy (Битюцкий Артём)
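The error-code policy described in this reply (mark a block bad only on EIO, never on something like ENOMEM) can be sketched as a small classification helper. This is only an illustration: `classify_io_error`, the enum, and `check_policy` are hypothetical names and are not part of nandwrite or ubiformat.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical result type: what a flashing tool might do after a failed
 * write or erase, based only on the errno value it saved. */
enum err_policy {
	ERR_ABORT,              /* not a media error: stop and report it */
	ERR_MARK_BAD_CANDIDATE  /* EIO: the block may really be bad */
};

/* Only EIO is treated as evidence of a bad block. Anything else (ENOMEM,
 * EINVAL, ...) says nothing about the flash itself, so the block must
 * not be marked. */
static enum err_policy classify_io_error(int saved_errno)
{
	return (saved_errno == EIO) ? ERR_MARK_BAD_CANDIDATE : ERR_ABORT;
}

/* Usage: the two cases named in the thread. */
static void check_policy(void)
{
	assert(classify_io_error(EIO) == ERR_MARK_BAD_CANDIDATE);
	assert(classify_io_error(ENOMEM) == ERR_ABORT);
}
```

The helper takes a saved errno value as a parameter rather than reading `errno` directly, since library calls made between the failure and the check may overwrite it.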
* Re: Nandwrite's behavior in case of write failure
From: Nahor @ 2009-06-06 2:40 UTC
To: linux-mtd

Artem Bityutskiy wrote:
> On Thu, 2009-06-04 at 18:23 -0700, Nahor wrote:
>> Hi,
>>
>> If the call to pwrite fails, nandwrite tries first to erase the block
>> then to mark it as bad. If erase fails, nandwrite aborts. If setting the
>> bad block flag fails, nandwrite just ignores it and goes to the next block.
>
> I think this is a bug. It should not abort if erase fails, but mark
> the block as bad instead. But it should check the error code as well,
> and mark it as bad only if the error was EIO. I.e., it should not
> mark the block as bad on ENOMEM.
>
> And yes, if mark_bad() fails, nandwrite should not ignore this.
>
> This is what we do in ubiformat, I think. Also, ubiformat asks the user
> if he wants to mark the block as bad, unless the -y option was used.
> nandwrite could do something similar.

nandwrite has the -m/--markbad option instead of -y.
However, if the option is not set, the blocks are skipped without being
marked. I guess nandwrite should either ask the user or fail
immediately. Continuing seems useless.

>> My questions are:
>> - Why erase the block?
> Just in case. We should be very careful with marking blocks as bad,
> because if you do this by mistake, you may lose your device. Indeed,
> imagine you marked 100 blocks as bad by mistake, and you do not remember
> the block numbers. How will you know which blocks you should then
> unmark?

Well, one could unmark all of them and run nandtest, then flog oneself
and swear to pay more attention next time :)

More seriously, in my case, I want to use nandwrite for automatic
updates, so I would rather have it mark too many blocks bad than have
the update abort because nandwrite wants to be conservative and exits
instead of flagging the offending block and continuing.

> So the idea of erase is to check whether the block is really bad
> or this is just a driver bug or something.

Do you mean that if the erase succeeds, nandwrite shouldn't mark the
block bad and should just exit?
I'm not too familiar with NAND technology, but is it not possible that
a block can be erased but not written to? At least nandtest thinks so:
it marks blocks as bad on either erase or write failures.
And so does ubiformat. flash_image() either exits or marks the block
bad if mtd_write fails. It doesn't try to erase it first.

Nahor
* Re: Nandwrite's behavior in case of write failure
From: Artem Bityutskiy @ 2009-06-07 8:57 UTC
To: Nahor; +Cc: linux-mtd

On Fri, 2009-06-05 at 19:40 -0700, Nahor wrote:
> Artem Bityutskiy wrote:
> > This is what we do in ubiformat, I think. Also, ubiformat asks the user
> > if he wants to mark the block as bad, unless the -y option was used.
> > nandwrite could do something similar.
>
> nandwrite has the -m/--markbad option instead of -y.

OK.

> However, if the option is not set, the blocks are skipped without being
> marked. I guess nandwrite should either ask the user or fail immediately.
> Continuing seems useless.

Yes. It might as well suggest using -m/

> >> My questions are:
> >> - Why erase the block?
> > Just in case. We should be very careful with marking blocks as bad,
> > because if you do this by mistake, you may lose your device. Indeed,
> > imagine you marked 100 blocks as bad by mistake, and you do not remember
> > the block numbers. How will you know which blocks you should then
> > unmark?
>
> Well, one could unmark all of them and run nandtest, then flog oneself and
> swear to pay more attention next time :)

The problem is that factory-marked bad blocks do not have to manifest
themselves easily. They may appear to work fine, and fail later, e.g.
when the temperature is higher.

> More seriously, in my case, I want to use nandwrite for automatic updates
> so I prefer that it marks too many blocks bad than have the update abort
> because nandwrite wants to be conservative and exits instead of flagging
> the offending block and continuing.

Then you probably need to fix the tool. I wonder if it would make sense
to use libmtd.c from the UBI utils and rewrite nandwrite? Of course
libmtd.c would need to be improved as well. But the benefit would be
shared code between ubiformat and nandwrite.

> > So the idea of erase is to check whether the block is really bad
> > or this is just a driver bug or something.
>
> Do you mean that if the erase succeeds, nandwrite shouldn't mark the block
> bad and just exit?

I think it should do something like UBI is doing - erase the eraseblock,
then torture it by writing several patterns and reading them back. If
the eraseblock survives the torturing, nandwrite may retry the write;
otherwise it marks the eraseblock as bad.

> I'm not too familiar with the NAND technology but it is not possible that
> a block can be erasable but not written to? At least nandtest thinks so, it
> mark blocks as bad on either erase or write failures.

Yes, write and erase failures mean that the eraseblock is bad. But I
think marking a block as bad straight away is just dangerous. Who knows,
maybe this is a small glitch on a bus, or a software bug, or someone
corrupted the driver's memory, or whatever. This is why UBI tortures an
eraseblock before marking it as bad. And it is very careful about error
codes - only the EIO code is considered a reason to mark an eraseblock
as bad.

> And so does ubiformat. flash_image() either exits or marks the block
> bad if mtd_write fails. It doesn't try to erase it first.

Yes, this is not very good, I've added a TODO there. It should torture
the eraseblock - I'll implement this later in libmtd.c.

--
Best regards,
Artem Bityutskiy (Битюцкий Артём)
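The torturing idea described in this reply (erase, write several patterns, read them back, and only then decide) might look roughly like the sketch below. It is not UBI's or libmtd's code: `struct block_ops`, `torture_block`, the pattern values, and the in-memory `fake_block` device are all assumptions made so the example is self-contained. A real tool would wrap MEMERASE, pwrite and pread in the callbacks, and would still look at errno before treating a failure as a bad block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-eraseblock operations; each returns 0 on success. */
struct block_ops {
	int (*erase)(void *dev);
	int (*write)(void *dev, const unsigned char *buf, size_t len);
	int (*read)(void *dev, unsigned char *buf, size_t len);
};

enum torture_result {
	TORTURE_OK = 0,       /* block survived: the write may be retried */
	TORTURE_FAILED = 1,   /* block misbehaved: candidate for marking bad */
	TORTURE_NOT_RUN = -1  /* test could not run (e.g. out of memory) */
};

static enum torture_result torture_block(const struct block_ops *ops,
					 void *dev, size_t block_size)
{
	static const unsigned char patterns[] = { 0xA5, 0x5A, 0x00 };
	enum torture_result res = TORTURE_NOT_RUN;
	unsigned char *wbuf, *rbuf;
	size_t i;

	assert(ops != NULL && block_size > 0);
	wbuf = malloc(block_size);
	rbuf = malloc(block_size);
	if (wbuf == NULL || rbuf == NULL)
		goto out; /* not the flash's fault: do not report a bad block */

	res = TORTURE_FAILED;
	for (i = 0; i < sizeof(patterns); i++) {
		memset(wbuf, patterns[i], block_size);
		if (ops->erase(dev) != 0 ||
		    ops->write(dev, wbuf, block_size) != 0 ||
		    ops->read(dev, rbuf, block_size) != 0 ||
		    memcmp(wbuf, rbuf, block_size) != 0)
			goto out;
	}
	if (ops->erase(dev) == 0) /* leave the block erased for the retry */
		res = TORTURE_OK;
out:
	free(wbuf);
	free(rbuf);
	return res;
}

/* In-memory stand-in for one eraseblock, for demonstration only. */
#define FAKE_BLOCK_SIZE 64
struct fake_block {
	unsigned char data[FAKE_BLOCK_SIZE];
	int stuck; /* non-zero: byte 0 always reads back as 0xFF */
};

static int fake_erase(void *dev)
{
	struct fake_block *b = dev;
	memset(b->data, 0xFF, sizeof(b->data));
	return 0;
}

static int fake_write(void *dev, const unsigned char *buf, size_t len)
{
	struct fake_block *b = dev;
	if (len > sizeof(b->data))
		return -1;
	memcpy(b->data, buf, len);
	return 0;
}

static int fake_read(void *dev, unsigned char *buf, size_t len)
{
	struct fake_block *b = dev;
	if (len > sizeof(b->data))
		return -1;
	memcpy(buf, b->data, len);
	if (b->stuck)
		buf[0] = 0xFF;
	return 0;
}

static const struct block_ops fake_ops = { fake_erase, fake_write, fake_read };
```

Keeping "could not run the test" separate from "the block failed" follows the point made above about error codes: running out of memory is no reason to mark an eraseblock as bad.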
* Re: Nandwrite's behavior in case of write failure
From: Nahor @ 2009-06-08 17:00 UTC
To: linux-mtd

Artem Bityutskiy wrote:
> I wonder, it it would make sense
> to use libmtd.c from UBI utils and re-write nandwrite? Of course
> libmtd.c would need to be improved as well. But the benefit would be
> shared code between ubiformat and nandwrite.

I've been thinking that too. Unfortunately, I've too much work on my
hands already and I don't have the time for things that are not strictly
necessary. :(
* Re: Nandwrite's behavior in case of write failure
From: Jehan Bing @ 2009-06-08 20:43 UTC
To: linux-mtd

Artem Bityutskiy wrote:
> Yes, write and erase failure mean that the erasblock is bad. But I think
> marking a block as bad straight away is just dangerous. Who knows may be
> this is a small glitch in a bus, or a software bug, or some-one
> corrupted driver's memory, or whatever. This is why UBI is doing
> eraseblock torturing before marking it as bad. And it is very careful
> about error codes - only EIO code is considered as a reason to mark an
> eraseblock as bad.

Fixed broken behavior in case of write failure. More specifically:
- Only try to mark a block bad if the error is EIO. Other errors
  will abort the tool.
- Also abort the tool if the marking fails instead of ignoring it.

Signed-off-by: Jehan Bing <jehan@orb.com>

--- a/nandwrite.c 2009-06-08 13:31:14.000000000 -0700
+++ b/nandwrite.c 2009-06-08 13:33:32.000000000 -0700
@@ -586,6 +586,10 @@ int main(int argc, char * const argv[])
 				erase_info_t erase;
 
 				perror ("pwrite");
+				if (errno != EIO) {
+					goto closeall;
+				}
+
 				/* Must rewind to blockstart if we can */
 				rewind_blocks = (mtdoffset - blockstart) / meminfo.writesize; /* Not including the one we just attempted */
 				rewind_bytes = (rewind_blocks * meminfo.writesize) + readlen;
@@ -602,7 +606,9 @@ int main(int argc, char * const argv[])
 						(long)erase.start, (long)erase.start+erase.length-1);
 				if (ioctl(fd, MEMERASE, &erase) != 0) {
 					perror("MEMERASE");
-					goto closeall;
+					if (errno != EIO) {
+						goto closeall;
+					}
 				}
 
 				if (markbad) {
@@ -610,7 +616,7 @@ int main(int argc, char * const argv[])
 					fprintf(stderr, "Marking block at %08lx bad\n", (long)bad_addr);
 					if (ioctl(fd, MEMSETBADBLOCK, &bad_addr)) {
 						perror("MEMSETBADBLOCK");
-						/* But continue anyway */
+						goto closeall;
 					}
 				}
 				mtdoffset = blockstart + meminfo.erasesize;
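The control flow that the patch establishes after a failed pwrite can be modelled in isolation as follows. This is a simplified sketch rather than code from nandwrite.c: `recover_after_write_failure`, the `block_op` callback type, and the `op_*` stubs are hypothetical, with the callbacks standing in for the MEMERASE and MEMSETBADBLOCK ioctls.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum recovery {
	RECOVERY_ABORT,      /* corresponds to "goto closeall" in the patch */
	RECOVERY_NEXT_BLOCK  /* skip ahead and continue with the next block */
};

/* Hypothetical callback: returns 0 on success, -1 with errno set. */
typedef int (*block_op)(void *ctx);

static enum recovery recover_after_write_failure(int write_errno,
						 block_op erase,
						 block_op mark_bad,
						 void *ctx, bool markbad_opt)
{
	assert(erase != NULL && mark_bad != NULL);

	/* Only EIO suggests a bad block; anything else aborts. */
	if (write_errno != EIO)
		return RECOVERY_ABORT;

	/* A failed erase is tolerated only if it also failed with EIO. */
	if (erase(ctx) != 0 && errno != EIO)
		return RECOVERY_ABORT;

	/* With -m/--markbad, a failure to set the flag is no longer ignored. */
	if (markbad_opt && mark_bad(ctx) != 0)
		return RECOVERY_ABORT;

	return RECOVERY_NEXT_BLOCK;
}

/* Stub operations for demonstration only. */
static int op_ok(void *ctx) { (void)ctx; return 0; }
static int op_fail_eio(void *ctx) { (void)ctx; errno = EIO; return -1; }
static int op_fail_enomem(void *ctx) { (void)ctx; errno = ENOMEM; return -1; }
```

The sketch takes the pwrite error as a saved value instead of reading `errno` at the point of the check, because calls made in between may overwrite it. As in the patched tool, a block is still skipped without being marked when the mark-bad option is not set.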
* Re: Nandwrite's behavior in case of write failure
From: Artem Bityutskiy @ 2009-06-09 12:59 UTC
To: Jehan Bing; +Cc: linux-mtd

On Mon, 2009-06-08 at 13:43 -0700, Jehan Bing wrote:
> Artem Bityutskiy wrote:
> > Yes, write and erase failure mean that the erasblock is bad. But I think
> > marking a block as bad straight away is just dangerous. Who knows may be
> > this is a small glitch in a bus, or a software bug, or some-one
> > corrupted driver's memory, or whatever. This is why UBI is doing
> > eraseblock torturing before marking it as bad. And it is very careful
> > about error codes - only EIO code is considered as a reason to mark an
> > eraseblock as bad.
>
>
> Fixed broken behavior in case of write failure. More specifically:
> - Only try to mark a block bad if the errors are EIO. Other errors
> will abort the tool.
> - Also abort the tool if the marking fails instead of ignoring it.
>
> Signed-off-by: Jehan Bing <jehan@orb.com>

Looks good to me, pushed, thanks.

--
Best regards,
Artem Bityutskiy (Битюцкий Артём)