* ubi deadlock on .36+
From: Grazvydas Ignotas @ 2010-11-03 21:30 UTC
To: Artem Bityutskiy; +Cc: linux-mtd

Hi,

there seems to be some issue with NAND on my OMAP3 board that causes
CRC errors on 2.6.36 and 2.6.37-rc1. Those seem to be triggering a bug
in UBI that makes it loop forever (or very long) printing this:

uncorrectable error :
UBI error: ubi_io_read: error -74 (ECC error) while reading 512 bytes
from PEB 0:512, read 512 bytes
uncorrectable error :
UBI error: ubi_io_read: error -74 (ECC error) while reading 512 bytes
from PEB 68:512, read 512 bytes
UBI: run torture test for PEB 68
UBI: PEB 68 passed torture test, do not mark it a bad

here is full log of one minute session, after which I killed power:
http://notaz.gp2x.de/misc/pnd/logs/linux_20101103_ubi_lockup

Gražvydas

* Re: ubi deadlock on .36+
From: Artem Bityutskiy @ 2010-11-04 7:29 UTC
To: Grazvydas Ignotas; +Cc: linux-mtd

On Wed, 2010-11-03 at 23:30 +0200, Grazvydas Ignotas wrote:
> Hi,
>
> there seems to be some issue with NAND on my OMAP3 board that causes
> CRC errors on 2.6.36 and 2.6.37-rc1. Those seem to be triggering a bug
> in UBI that makes it loop forever (or very long) printing this:
>
> uncorrectable error :
> UBI error: ubi_io_read: error -74 (ECC error) while reading 512 bytes
> from PEB 0:512, read 512 bytes
> uncorrectable error :
> UBI error: ubi_io_read: error -74 (ECC error) while reading 512 bytes
> from PEB 68:512, read 512 bytes
> UBI: run torture test for PEB 68
> UBI: PEB 68 passed torture test, do not mark it a bad
>
>
> here is full log of one minute session, after which I killed power:
> http://notaz.gp2x.de/misc/pnd/logs/linux_20101103_ubi_lockup

Hmm, could you please enable UBI debugging and provide me the logs? See
here for some hints:
http://www.linux-mtd.infradead.org/doc/ubi.html#L_how_send_bugreport

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

* Re: ubi deadlock on .36+
From: Grazvydas Ignotas @ 2010-11-04 13:07 UTC
To: dedekind1; +Cc: linux-mtd

On Thu, Nov 4, 2010 at 9:29 AM, Artem Bityutskiy <dedekind1@gmail.com> wrote:
> On Wed, 2010-11-03 at 23:30 +0200, Grazvydas Ignotas wrote:
>> Hi,
>>
>> there seems to be some issue with NAND on my OMAP3 board that causes
>> CRC errors on 2.6.36 and 2.6.37-rc1. Those seem to be triggering a bug
>> in UBI that makes it loop forever (or very long) printing this:
>>
>> uncorrectable error :
>> UBI error: ubi_io_read: error -74 (ECC error) while reading 512 bytes
>> from PEB 0:512, read 512 bytes
>> uncorrectable error :
>> UBI error: ubi_io_read: error -74 (ECC error) while reading 512 bytes
>> from PEB 68:512, read 512 bytes
>> UBI: run torture test for PEB 68
>> UBI: PEB 68 passed torture test, do not mark it a bad
>>
>>
>> here is full log of one minute session, after which I killed power:
>> http://notaz.gp2x.de/misc/pnd/logs/linux_20101103_ubi_lockup
>
> Hmm, could you please enable UBI debugging and provide me the logs? See
> here for some hints:
> http://www.linux-mtd.infradead.org/doc/ubi.html#L_how_send_bugreport

done:
http://notaz.gp2x.de/misc/pnd/logs/linux_20101103_ubi_lockup2

* Re: ubi deadlock on .36+
From: Artem Bityutskiy @ 2010-11-13 12:37 UTC
To: Grazvydas Ignotas; +Cc: linux-mtd

On Thu, 2010-11-04 at 15:07 +0200, Grazvydas Ignotas wrote:
> On Thu, Nov 4, 2010 at 9:29 AM, Artem Bityutskiy <dedekind1@gmail.com> wrote:
> > On Wed, 2010-11-03 at 23:30 +0200, Grazvydas Ignotas wrote:
> >> Hi,
> >>
> >> there seems to be some issue with NAND on my OMAP3 board that causes
> >> CRC errors on 2.6.36 and 2.6.37-rc1. Those seem to be triggering a bug
> >> in UBI that makes it loop forever (or very long) printing this:
> >>
> >> uncorrectable error :
> >> UBI error: ubi_io_read: error -74 (ECC error) while reading 512 bytes
> >> from PEB 0:512, read 512 bytes
> >> uncorrectable error :
> >> UBI error: ubi_io_read: error -74 (ECC error) while reading 512 bytes
> >> from PEB 68:512, read 512 bytes
> >> UBI: run torture test for PEB 68
> >> UBI: PEB 68 passed torture test, do not mark it a bad
> >>
> >>
> >> here is full log of one minute session, after which I killed power:
> >> http://notaz.gp2x.de/misc/pnd/logs/linux_20101103_ubi_lockup
> >
> > Hmm, could you please enable UBI debugging and provide me the logs? See
> > here for some hints:
> > http://www.linux-mtd.infradead.org/doc/ubi.html#L_how_send_bugreport
>
> done:
> http://notaz.gp2x.de/misc/pnd/logs/linux_20101103_ubi_lockup2

But would it be possible to enable all UBI debugging messages?

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

* Re: ubi deadlock on .36+
From: Artem Bityutskiy @ 2010-11-13 13:15 UTC
To: Grazvydas Ignotas; +Cc: linux-mtd

On Sat, 2010-11-13 at 14:37 +0200, Artem Bityutskiy wrote:
> On Thu, 2010-11-04 at 15:07 +0200, Grazvydas Ignotas wrote:
> > On Thu, Nov 4, 2010 at 9:29 AM, Artem Bityutskiy <dedekind1@gmail.com> wrote:
> > > On Wed, 2010-11-03 at 23:30 +0200, Grazvydas Ignotas wrote:
> > >> Hi,
> > >>
> > >> there seems to be some issue with NAND on my OMAP3 board that causes
> > >> CRC errors on 2.6.36 and 2.6.37-rc1. Those seem to be triggering a bug
> > >> in UBI that makes it loop forever (or very long) printing this:
> > >>
> > >> uncorrectable error :
> > >> UBI error: ubi_io_read: error -74 (ECC error) while reading 512 bytes
> > >> from PEB 0:512, read 512 bytes
> > >> uncorrectable error :
> > >> UBI error: ubi_io_read: error -74 (ECC error) while reading 512 bytes
> > >> from PEB 68:512, read 512 bytes
> > >> UBI: run torture test for PEB 68
> > >> UBI: PEB 68 passed torture test, do not mark it a bad
> > >>
> > >>
> > >> here is full log of one minute session, after which I killed power:
> > >> http://notaz.gp2x.de/misc/pnd/logs/linux_20101103_ubi_lockup
> > >
> > > Hmm, could you please enable UBI debugging and provide me the logs? See
> > > here for some hints:
> > > http://www.linux-mtd.infradead.org/doc/ubi.html#L_how_send_bugreport
> >
> > done:
> > http://notaz.gp2x.de/misc/pnd/logs/linux_20101103_ubi_lockup2
>
> But would it be possible to enable all UBI debugging messages?

While trying to figure out what is happening in your system, I realized
one possible scenario which may confuse UBI. I've added a patch below.
This probably won't fix your issue (but you could try), I need more time
to think about what was happening. But a log with all messages (not only
I/O) would help. Thanks.

From 703ba5f120644fefef3cfed46c0d8ccf6a15b4ee Mon Sep 17 00:00:00 2001
From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Date: Sat, 13 Nov 2010 15:08:29 +0200
Subject: [PATCH] UBI: improve UBI robustness

When reading data from the flash, corrupt the buffer we are about to read to
before reading. The idea is to fix the following possible situation:

1. The buffer contains data from previous operation, e.g., read from another
   PEB previously. The data looks like expected, e.g., if we just do not read
   anything and return - the caller would not notice this. E.g., if we are
   reading a VID header, the buffer may contain a valid VID header from another
   PEB.
2. The driver is buggy and returns us success or -EBADMSG or -EUCLEAN, but it
   does not actually put any data to the buffer.

This may confuse UBI or upper layers - they may think the buffer contains
valid data while in fact it is just old data. This is especially possible
because UBI (and UBIFS) relies on CRC, and treats data as correct even in case
of ECC errors if the CRC is correct.

Try to prevent this situation by changing the first byte of the buffer.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
---
 drivers/mtd/ubi/io.c |   22 ++++++++++++++++++++++
 1 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/drivers/mtd/ubi/io.c b/drivers/mtd/ubi/io.c
index c2960ac..9ab1a33 100644
--- a/drivers/mtd/ubi/io.c
+++ b/drivers/mtd/ubi/io.c
@@ -146,6 +146,28 @@ int ubi_io_read(const struct ubi_device *ubi, void *buf, int pnum, int offset,
 	if (err)
 		return err;
 
+	/*
+	 * Deliberately corrupt the buffer to improve robustness. Indeed, if we
+	 * do not do this, the following may happen:
+	 * 1. The buffer contains data from previous operation, e.g., read from
+	 *    another PEB previously. The data looks like expected, e.g., if we
+	 *    just do not read anything and return - the caller would not
+	 *    notice this. E.g., if we are reading a VID header, the buffer may
+	 *    contain a valid VID header from another PEB.
+	 * 2. The driver is buggy and returns us success or -EBADMSG or
+	 *    -EUCLEAN, but it does not actually put any data to the buffer.
+	 *
+	 * This may confuse UBI or upper layers - they may think the buffer
+	 * contains valid data while in fact it is just old data. This is
+	 * especially possible because UBI (and UBIFS) relies on CRC, and
+	 * treats data as correct even in case of ECC errors if the CRC is
+	 * correct.
+	 *
+	 * Try to prevent this situation by changing the first byte of the
+	 * buffer.
+	 */
+	*((uint8_t *)buf) ^= 0xFF;
+
 	addr = (loff_t)pnum * ubi->peb_size + offset;
 retry:
 	err = ubi->mtd->read(ubi->mtd, addr, len, &read, buf);
-- 
1.7.2.3

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

* Re: ubi deadlock on .36+
From: Grazvydas Ignotas @ 2010-11-13 14:23 UTC
To: dedekind1; +Cc: linux-mtd

On Sat, Nov 13, 2010 at 3:15 PM, Artem Bityutskiy <dedekind1@gmail.com> wrote:
> While trying to figure out what is happening in your system, I realized
> one possible scenario which may confuse UBI. I've added a patch below.
> This probably won't fix your issue (but you could try), I need more time
> to think about what was happening. But a log with all messages (not only
> I/O) would help. Thanks.

Well I think I already know what's wrong with my driver - it has
subpage reads broken. So UBI tries to read a subpage, driver fails
there, then it runs a torture test on full PEB that passes (because
page reads work right), marks that PEB as good and retries the subpage
read that fails again, and the story repeats. Does that sound like
a reasonable scenario, or do you still want more debugging logs?

* Re: ubi deadlock on .36+
From: Artem Bityutskiy @ 2010-11-14 7:50 UTC
To: Grazvydas Ignotas; +Cc: linux-mtd

On Sat, 2010-11-13 at 16:23 +0200, Grazvydas Ignotas wrote:
> On Sat, Nov 13, 2010 at 3:15 PM, Artem Bityutskiy <dedekind1@gmail.com> wrote:
> > While trying to figure out what is happening in your system, I realized
> > one possible scenario which may confuse UBI. I've added a patch below.
> > This probably won't fix your issue (but you could try), I need more time
> > to think about what was happening. But a log with all messages (not only
> > I/O) would help. Thanks.
>
> Well I think I already know what's wrong with my driver - it has
> subpage reads broken. So UBI tries to read a subpage, driver fails
> there, then it runs a torture test on full PEB that passes (because
> page reads work right), marks that PEB as good and retries the subpage
> read that fails again, and the story repeats. Does that sound like
> a reasonable scenario, or do you still want more debugging logs?

Yeah, obviously you have driver problems, I'm just interested in
improving UBI's resilience.

-- 
Best Regards,
Artem Bityutskiy (Битюцкий Артём)