* UBI/UBFS: ubi_eba_read_leb() reporting unmapped LEB @ 2014-10-10 13:45 Steve B 2014-10-20 8:54 ` Steve B 2014-10-20 13:35 ` Artem Bityutskiy 0 siblings, 2 replies; 10+ messages in thread From: Steve B @ 2014-10-10 13:45 UTC (permalink / raw) To: linux-mtd Hi All, I've been running a reboot test on 5 devices and after about 7000 reboot files in one of our UBIFS volumes reports a corruption (SQUASHFS errors). I have been able to extract the mtd partition and pull it into a nandsim instance on my host machine so I can add some extra debug. Here's the error log when trying to load the file: [41308.683356] UBI DBG gen (pid 15044): read 4062 bytes from LEB 1:352:0 [41308.683358] UBI DBG eba (pid 15044): read 4062 bytes from offset 0 of LEB 1:352 (unmapped) [41308.683362] UBI DBG gen (pid 15044): read 4062 bytes from LEB 1:352:0 [41308.683364] UBI DBG eba (pid 15044): read 4062 bytes from offset 0 of LEB 1:352 (unmapped) [41308.683368] UBIFS error (pid 15044): ubifs_read_node: bad node type (255 but expected 1) [41308.683373] UBI DBG gen (pid 15044): test LEB 1:352 [41308.683377] UBIFS error (pid 15044): ubifs_read_node: bad node at LEB 352:0, LEB mapping status 0 [41308.683381] Not a node, first 24 bytes: [41308.683386] 00000000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ........................ [41308.683392] Pid: 15044, comm: md5sum Tainted: GF O 3.8.0-29-generic #42~precise1-Ubuntu [41308.683396] Call Trace: [41308.683411] [<ffffffffa0564a7d>] ubifs_read_node+0x26d/0x2d0 [ubifs] [41308.683419] [<ffffffffa0583338>] ubifs_tnc_read_node+0x148/0x160 [ubifs] [41308.683426] [<ffffffffa056816c>] ubifs_tnc_locate+0x1cc/0x1f0 [ubifs] [41308.683431] [<ffffffffa0559ade>] do_readpage+0x10e/0x410 [ubifs] [41308.683437] [<ffffffffa055a4ce>] ubifs_readpage+0x5e/0x580 [ubifs] [41308.683441] [<ffffffff81152de3>] ? __inc_zone_page_state+0x33/0x40 [41308.683444] [<ffffffff81135866>] ? add_to_page_cache_locked.part.26+0x76/0xd0 [41308.683447] [<ffffffff8113593d>] ? add_to_page_cache_locked+0x7d/0x90 [41308.683452] [<ffffffff81135edd>] do_generic_file_read.constprop.31+0x10d/0x440 [41308.683460] [<ffffffff81136f01>] generic_file_aio_read+0xe1/0x220 [41308.683469] [<ffffffff8119b0c3>] do_sync_read+0xa3/0xe0 [41308.683477] [<ffffffff8119b800>] vfs_read+0xb0/0x180 [41308.683485] [<ffffffff8119b922>] sys_read+0x52/0xa0 [41308.683492] [<ffffffff816fc8dd>] system_call_fastpath+0x1a/0x1f I have written a small tool to scan the headers in all PEBs and this confirms that LEB 352 doesn't exist for this volume. We are running a Linux kernel based off 3.4.0 This issue seems similar to an issue already raised on this mailing list: linux-mtd/2010-September/031837.html subject: ubi_eba_init_scan: cannot reserve enough PEBs I would like to know the best way to debug this issue, does anyone have the patch mentioned in the issue above to add more debugging in the right places? Our test setup has the facility of pulling the power on the unit if a GPIO line is toggled, with this we can test sensitive areas of the code, can anyone suggest where would be a good place to perform this action. Thanks for any help on this, Steve B ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: UBI/UBFS: ubi_eba_read_leb() reporting unmapped LEB 2014-10-10 13:45 UBI/UBFS: ubi_eba_read_leb() reporting unmapped LEB Steve B @ 2014-10-20 8:54 ` Steve B 2014-10-20 13:35 ` Artem Bityutskiy 1 sibling, 0 replies; 10+ messages in thread From: Steve B @ 2014-10-20 8:54 UTC (permalink / raw) To: linux-mtd Hi, Can anyone help me with this? Thanks, Steve On Fri, 10 Oct 2014, Steve B wrote: > Hi All, > > I've been running a reboot test on 5 devices and after about 7000 reboot > files in > one of our UBIFS volumes reports a corruption (SQUASHFS errors). I have been > able to extract the mtd partition and pull it into a nandsim instance on my > host > machine so I can add some extra debug. Here's the error log when trying to > load > the file: > > [41308.683356] UBI DBG gen (pid 15044): read 4062 bytes from LEB 1:352:0 > [41308.683358] UBI DBG eba (pid 15044): read 4062 bytes from offset 0 of LEB > 1:352 (unmapped) > [41308.683362] UBI DBG gen (pid 15044): read 4062 bytes from LEB 1:352:0 > [41308.683364] UBI DBG eba (pid 15044): read 4062 bytes from offset 0 of LEB > 1:352 (unmapped) > [41308.683368] UBIFS error (pid 15044): ubifs_read_node: bad node type (255 > but expected 1) > [41308.683373] UBI DBG gen (pid 15044): test LEB 1:352 > [41308.683377] UBIFS error (pid 15044): ubifs_read_node: bad node at LEB > 352:0, LEB mapping status 0 > [41308.683381] Not a node, first 24 bytes: > [41308.683386] 00000000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > ff ff ff ff ff ff ff ........................ > [41308.683392] Pid: 15044, comm: md5sum Tainted: GF O > 3.8.0-29-generic #42~precise1-Ubuntu > [41308.683396] Call Trace: > [41308.683411] [<ffffffffa0564a7d>] ubifs_read_node+0x26d/0x2d0 [ubifs] > [41308.683419] [<ffffffffa0583338>] ubifs_tnc_read_node+0x148/0x160 [ubifs] > [41308.683426] [<ffffffffa056816c>] ubifs_tnc_locate+0x1cc/0x1f0 [ubifs] > [41308.683431] [<ffffffffa0559ade>] do_readpage+0x10e/0x410 [ubifs] > [41308.683437] [<ffffffffa055a4ce>] ubifs_readpage+0x5e/0x580 [ubifs] > [41308.683441] [<ffffffff81152de3>] ? __inc_zone_page_state+0x33/0x40 > [41308.683444] [<ffffffff81135866>] ? > add_to_page_cache_locked.part.26+0x76/0xd0 > [41308.683447] [<ffffffff8113593d>] ? add_to_page_cache_locked+0x7d/0x90 > [41308.683452] [<ffffffff81135edd>] > do_generic_file_read.constprop.31+0x10d/0x440 > [41308.683460] [<ffffffff81136f01>] generic_file_aio_read+0xe1/0x220 > [41308.683469] [<ffffffff8119b0c3>] do_sync_read+0xa3/0xe0 > [41308.683477] [<ffffffff8119b800>] vfs_read+0xb0/0x180 > [41308.683485] [<ffffffff8119b922>] sys_read+0x52/0xa0 > [41308.683492] [<ffffffff816fc8dd>] system_call_fastpath+0x1a/0x1f > > I have written a small tool to scan the headers in all PEBs and this confirms > that LEB 352 doesn't exist for this volume. > > We are running a Linux kernel based off 3.4.0 > > This issue seems similar to an issue already raised on this mailing list: > linux-mtd/2010-September/031837.html > subject: ubi_eba_init_scan: cannot reserve enough PEBs > > I would like to know the best way to debug this issue, does anyone have the > patch > mentioned in the issue above to add more debugging in the right places? > > Our test setup has the facility of pulling the power on the unit if a GPIO > line > is toggled, with this we can test sensitive areas of the code, can anyone > suggest > where would be a good place to perform this action. > > Thanks for any help on this, > Steve B > > ______________________________________________________ > Linux MTD discussion mailing list > http://lists.infradead.org/mailman/listinfo/linux-mtd/ > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: UBI/UBFS: ubi_eba_read_leb() reporting unmapped LEB 2014-10-10 13:45 UBI/UBFS: ubi_eba_read_leb() reporting unmapped LEB Steve B 2014-10-20 8:54 ` Steve B @ 2014-10-20 13:35 ` Artem Bityutskiy 2014-10-20 13:48 ` Steve B 1 sibling, 1 reply; 10+ messages in thread From: Artem Bityutskiy @ 2014-10-20 13:35 UTC (permalink / raw) To: Steve B; +Cc: linux-mtd On Fri, 2014-10-10 at 14:45 +0100, Steve B wrote: > Our test setup has the facility of pulling the power on the unit if a GPIO line > is toggled, with this we can test sensitive areas of the code, can anyone suggest > where would be a good place to perform this action. Did you perform testing with UBI/UBIFS debug messages on? Or the messages in this e-mail were enable after the corruption happened? Artem. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: UBI/UBFS: ubi_eba_read_leb() reporting unmapped LEB 2014-10-20 13:35 ` Artem Bityutskiy @ 2014-10-20 13:48 ` Steve B 2014-10-20 14:05 ` Artem Bityutskiy 0 siblings, 1 reply; 10+ messages in thread From: Steve B @ 2014-10-20 13:48 UTC (permalink / raw) To: Artem Bityutskiy; +Cc: linux-mtd Hi Artem, Thanks for the reply. I have the kernel built with: CONFIG_MTD_UBI_DEBUG=y CONFIG_UBIFS_FS=y The failure was seen on an embedded device, I dumped the flash contents (raw format) and then re-constructed the image on my Linux PC using nandsim with UBI and UBIFS modules i've built with some extra debugging. The call stack in the log posted is the same as the one seen on the device that showed the original problem. Thanks, Steve On Mon, 20 Oct 2014, Artem Bityutskiy wrote: > On Fri, 2014-10-10 at 14:45 +0100, Steve B wrote: >> Our test setup has the facility of pulling the power on the unit if a GPIO line >> is toggled, with this we can test sensitive areas of the code, can anyone suggest >> where would be a good place to perform this action. > > Did you perform testing with UBI/UBIFS debug messages on? Or the > messages in this e-mail were enable after the corruption happened? > > Artem. > > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: UBI/UBFS: ubi_eba_read_leb() reporting unmapped LEB 2014-10-20 13:48 ` Steve B @ 2014-10-20 14:05 ` Artem Bityutskiy 2014-10-20 14:51 ` Steve B 0 siblings, 1 reply; 10+ messages in thread From: Artem Bityutskiy @ 2014-10-20 14:05 UTC (permalink / raw) To: Steve B; +Cc: linux-mtd On Mon, 2014-10-20 at 14:48 +0100, Steve B wrote: > Hi Artem, > > Thanks for the reply. > > I have the kernel built with: > CONFIG_MTD_UBI_DEBUG=y > CONFIG_UBIFS_FS=y > > The failure was seen on an embedded device, I dumped the flash > contents (raw format) and then re-constructed the image on my Linux PC > using nandsim with UBI and UBIFS modules i've built with some extra debugging. > The call stack in the log posted is the same as the one seen on the device > that showed the original problem. > I see. Would you post the full log, not just the extract near the failure message? >From what I see so far, UBIFS believes that the data of a file is stored in LEB 352. But UBI says: "nope, this LEB was unmapped". So the bug may be in both UBIFS and UBI. When we were writing this code years ago, we were stressing UBIFS power-cut tolerance a lot. We dead real power cuts, but we did emulated power cuts a lot more than real. And emulated power cuts only test UBIFS, not UBI. On the other hand, UBI is very simple comparing to UBIFS. BTW, do you have fastmap enabled or disabled? Are you doing deliberate power cut testing, or you hit this error by chance? In the former case, you may enable additional debugging, probably, depending on how many test-boxes are you stressing simultaneously. Also, are you 100% sure you "reconstructed" the image correctly, and you are not wasting time looking at irrelevant errors? Artem. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: UBI/UBFS: ubi_eba_read_leb() reporting unmapped LEB 2014-10-20 14:05 ` Artem Bityutskiy @ 2014-10-20 14:51 ` Steve B 2014-10-20 15:19 ` Artem Bityutskiy 0 siblings, 1 reply; 10+ messages in thread From: Steve B @ 2014-10-20 14:51 UTC (permalink / raw) To: Artem Bityutskiy; +Cc: linux-mtd The log on the target is very minimal as I was only capturing error logs, so don't have much info to share on that, but here is the log from the target: [ 13.178297] UBIFS error (pid 1273): ubifs_read_node: bad node type (255 but expected 1) [ 13.178481] UBIFS error (pid 1273): ubifs_read_node: bad node at LEB 39:0, LEB mapping status 0 [ 13.178668] Not a node, first 24 bytes: [ 13.178755] 00000000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ........................ [ 13.179093] [<80012b7c>] (unwind_backtrace+0x0/0x11c) from [<80131138>] (ubifs_read_node+0x25c/0x2a0) [ 13.179311] [<80131138>] (ubifs_read_node+0x25c/0x2a0) from [<8014d2ac>] (ubifs_tnc_read_node+0x60/0x198) [ 13.185906] [<8014d2ac>] (ubifs_tnc_read_node+0x60/0x198) from [<801343e0>] (ubifs_tnc_locate+0xa8/0x180) [ 13.195449] [<801343e0>] (ubifs_tnc_locate+0xa8/0x180) from [<80126e1c>] (do_readpage+0x1bc/0x448) [ 13.204388] [<80126e1c>] (do_readpage+0x1bc/0x448) from [<801284f8>] (ubifs_readpage+0x3e8/0x428) [ 13.213246] [<801284f8>] (ubifs_readpage+0x3e8/0x428) from [<8008e6e4>] (generic_file_aio_read+0x328/0x6f4) [ 13.222973] [<8008e6e4>] (generic_file_aio_read+0x328/0x6f4) from [<800b37c4>] (do_sync_read+0x98/0xd4) [ 13.232342] [<800b37c4>] (do_sync_read+0x98/0xd4) from [<800b3e0c>] (vfs_read+0xa4/0x134) [ 13.240501] [<800b3e0c>] (vfs_read+0xa4/0x134) from [<800b4190>] (sys_read+0x34/0x68) [ 13.248316] [<800b4190>] (sys_read+0x34/0x68) from [<8000e280>] (ret_fast_syscall+0x0/0x30) [ 13.256639] UBIFS error (pid 1273): do_readpage: cannot read page 991 of inode 91, error -22 [ 13.265085] UBIFS error (pid 1273): ubifs_read_node: bad node type (255 but expected 1) [ 13.273046] UBIFS error (pid 1273): ubifs_read_node: bad node at LEB 39:0, LEB mapping status 0 I'm currently trying to create a test to more reliably reproduce the issue, when I do I can share those logs. Fastmap doesn't exist in the kernel version we are running (3.4.0) The first test to show up this issue simple copies a file in a tight loop and power is pulled randomly on the board. I am now trying to refine the test to speed up the failure by randomly pulling the power in the NAND driver after a random number of erase/write calls, but I haven't been able to reproduce the issue with this. I'm fairly sure the reconstructed image is correct as it's producing the same backtrace as the original issue on the target. My program that scans the NAND flash dump for UBI headers confirms that the expected mapping is missing. On Mon, 20 Oct 2014, Artem Bityutskiy wrote: > On Mon, 2014-10-20 at 14:48 +0100, Steve B wrote: >> Hi Artem, >> >> Thanks for the reply. >> >> I have the kernel built with: >> CONFIG_MTD_UBI_DEBUG=y >> CONFIG_UBIFS_FS=y >> >> The failure was seen on an embedded device, I dumped the flash >> contents (raw format) and then re-constructed the image on my Linux PC >> using nandsim with UBI and UBIFS modules i've built with some extra debugging. >> The call stack in the log posted is the same as the one seen on the device >> that showed the original problem. >> > > I see. Would you post the full log, not just the extract near the > failure message? > >> From what I see so far, UBIFS believes that the data of a file is stored > in LEB 352. But UBI says: "nope, this LEB was unmapped". > > So the bug may be in both UBIFS and UBI. > > When we were writing this code years ago, we were stressing UBIFS > power-cut tolerance a lot. We dead real power cuts, but we did emulated > power cuts a lot more than real. And emulated power cuts only test > UBIFS, not UBI. On the other hand, UBI is very simple comparing to > UBIFS. > > BTW, do you have fastmap enabled or disabled? > > Are you doing deliberate power cut testing, or you hit this error by > chance? In the former case, you may enable additional debugging, > probably, depending on how many test-boxes are you stressing > simultaneously. > > Also, are you 100% sure you "reconstructed" the image correctly, and you > are not wasting time looking at irrelevant errors? > > Artem. > > > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: UBI/UBFS: ubi_eba_read_leb() reporting unmapped LEB 2014-10-20 14:51 ` Steve B @ 2014-10-20 15:19 ` Artem Bityutskiy 2014-10-20 15:25 ` Steve B 0 siblings, 1 reply; 10+ messages in thread From: Artem Bityutskiy @ 2014-10-20 15:19 UTC (permalink / raw) To: Steve B; +Cc: linux-mtd On Mon, 2014-10-20 at 15:51 +0100, Steve B wrote: > The log on the target is very minimal as I was only capturing error > logs, so don't have much info to share on that, but here is the log from > the target: > > [ 13.178297] UBIFS error (pid 1273): ubifs_read_node: bad node type (255 but expected 1) > [ 13.178481] UBIFS error (pid 1273): ubifs_read_node: bad node at LEB 39:0, LEB mapping status 0 > [ 13.178668] Not a node, first 24 bytes: > [ 13.178755] 00000000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ........................ > [ 13.179093] [<80012b7c>] (unwind_backtrace+0x0/0x11c) from [<80131138>] (ubifs_read_node+0x25c/0x2a0) > [ 13.179311] [<80131138>] (ubifs_read_node+0x25c/0x2a0) from [<8014d2ac>] (ubifs_tnc_read_node+0x60/0x198) > [ 13.185906] [<8014d2ac>] (ubifs_tnc_read_node+0x60/0x198) from [<801343e0>] (ubifs_tnc_locate+0xa8/0x180) > [ 13.195449] [<801343e0>] (ubifs_tnc_locate+0xa8/0x180) from [<80126e1c>] (do_readpage+0x1bc/0x448) > [ 13.204388] [<80126e1c>] (do_readpage+0x1bc/0x448) from [<801284f8>] (ubifs_readpage+0x3e8/0x428) > [ 13.213246] [<801284f8>] (ubifs_readpage+0x3e8/0x428) from [<8008e6e4>] (generic_file_aio_read+0x328/0x6f4) > [ 13.222973] [<8008e6e4>] (generic_file_aio_read+0x328/0x6f4) from [<800b37c4>] (do_sync_read+0x98/0xd4) > [ 13.232342] [<800b37c4>] (do_sync_read+0x98/0xd4) from [<800b3e0c>] (vfs_read+0xa4/0x134) > [ 13.240501] [<800b3e0c>] (vfs_read+0xa4/0x134) from [<800b4190>] (sys_read+0x34/0x68) > [ 13.248316] [<800b4190>] (sys_read+0x34/0x68) from [<8000e280>] (ret_fast_syscall+0x0/0x30) > [ 13.256639] UBIFS error (pid 1273): do_readpage: cannot read page 991 of inode 91, error -22 > [ 13.265085] UBIFS error (pid 1273): ubifs_read_node: bad node type (255 but expected 1) > [ 13.273046] UBIFS error (pid 1273): ubifs_read_node: bad node at LEB 39:0, LEB mapping status 0 > > I'm currently trying to create a test to more reliably reproduce the issue, > when I do I can share those logs. > > Fastmap doesn't exist in the kernel version we are running (3.4.0) > > The first test to show up this issue simple copies a file in a tight loop and > power is pulled randomly on the board. I am now trying to refine the test to speed > up the failure by randomly pulling the power in the NAND driver after a random > number of erase/write calls, but I haven't been able to reproduce the issue > with this. Being able to reproduce this would make it bughunting a relatively easy task. You could also try to establish a separate UBI power cut testing. Something like writing various patterns directly to the UBI volume, having power cut, and making sure that what you wrote before power cut is there is exactly what you wrote. UBI writes synchronously, so once the write returns, the data must be on the flash. And you should write in min. I/O unit sizes (on NAND page at a time). I guess you can have a mirror file (if you have a second media), or a fixed algorithm of how you select the LEB and offset to write. And you generate random power cuts, and then after boot-up you verify that the media contains valid data - everything, for which the write operation was finished, should be there. With this, you'd exclude the whole layer of UBIFS complexity. And if you are lucky, and the bug is in UBI, it will be much easier to catch and to fix. With a mirror file it is probably easy. You just do modify the mirror file, recored the last operation, sync the mirror file and the file where you recorded the operation, then you enable the random power cut, and start the operation on the UBI volume. The point is to not have power cut before the mirror file is being changed. Artem. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: UBI/UBFS: ubi_eba_read_leb() reporting unmapped LEB 2014-10-20 15:19 ` Artem Bityutskiy @ 2014-10-20 15:25 ` Steve B 2014-10-20 15:47 ` Artem Bityutskiy 0 siblings, 1 reply; 10+ messages in thread From: Steve B @ 2014-10-20 15:25 UTC (permalink / raw) To: Artem Bityutskiy; +Cc: linux-mtd Thank you very much for your suggestions Artem, gives me plenty to look into :) On Mon, 20 Oct 2014, Artem Bityutskiy wrote: > On Mon, 2014-10-20 at 15:51 +0100, Steve B wrote: >> The log on the target is very minimal as I was only capturing error >> logs, so don't have much info to share on that, but here is the log from >> the target: >> >> [ 13.178297] UBIFS error (pid 1273): ubifs_read_node: bad node type (255 but expected 1) >> [ 13.178481] UBIFS error (pid 1273): ubifs_read_node: bad node at LEB 39:0, LEB mapping status 0 >> [ 13.178668] Not a node, first 24 bytes: >> [ 13.178755] 00000000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ........................ >> [ 13.179093] [<80012b7c>] (unwind_backtrace+0x0/0x11c) from [<80131138>] (ubifs_read_node+0x25c/0x2a0) >> [ 13.179311] [<80131138>] (ubifs_read_node+0x25c/0x2a0) from [<8014d2ac>] (ubifs_tnc_read_node+0x60/0x198) >> [ 13.185906] [<8014d2ac>] (ubifs_tnc_read_node+0x60/0x198) from [<801343e0>] (ubifs_tnc_locate+0xa8/0x180) >> [ 13.195449] [<801343e0>] (ubifs_tnc_locate+0xa8/0x180) from [<80126e1c>] (do_readpage+0x1bc/0x448) >> [ 13.204388] [<80126e1c>] (do_readpage+0x1bc/0x448) from [<801284f8>] (ubifs_readpage+0x3e8/0x428) >> [ 13.213246] [<801284f8>] (ubifs_readpage+0x3e8/0x428) from [<8008e6e4>] (generic_file_aio_read+0x328/0x6f4) >> [ 13.222973] [<8008e6e4>] (generic_file_aio_read+0x328/0x6f4) from [<800b37c4>] (do_sync_read+0x98/0xd4) >> [ 13.232342] [<800b37c4>] (do_sync_read+0x98/0xd4) from [<800b3e0c>] (vfs_read+0xa4/0x134) >> [ 13.240501] [<800b3e0c>] (vfs_read+0xa4/0x134) from [<800b4190>] (sys_read+0x34/0x68) >> [ 13.248316] [<800b4190>] (sys_read+0x34/0x68) from [<8000e280>] (ret_fast_syscall+0x0/0x30) >> [ 13.256639] UBIFS error (pid 1273): do_readpage: cannot read page 991 of inode 91, error -22 >> [ 13.265085] UBIFS error (pid 1273): ubifs_read_node: bad node type (255 but expected 1) >> [ 13.273046] UBIFS error (pid 1273): ubifs_read_node: bad node at LEB 39:0, LEB mapping status 0 >> >> I'm currently trying to create a test to more reliably reproduce the issue, >> when I do I can share those logs. >> >> Fastmap doesn't exist in the kernel version we are running (3.4.0) >> >> The first test to show up this issue simple copies a file in a tight loop and >> power is pulled randomly on the board. I am now trying to refine the test to speed >> up the failure by randomly pulling the power in the NAND driver after a random >> number of erase/write calls, but I haven't been able to reproduce the issue >> with this. > > Being able to reproduce this would make it bughunting a relatively easy > task. > > You could also try to establish a separate UBI power cut testing. > Something like writing various patterns directly to the UBI volume, > having power cut, and making sure that what you wrote before power cut > is there is exactly what you wrote. UBI writes synchronously, so once > the write returns, the data must be on the flash. And you should write > in min. I/O unit sizes (on NAND page at a time). > > I guess you can have a mirror file (if you have a second media), or a > fixed algorithm of how you select the LEB and offset to write. And you > generate random power cuts, and then after boot-up you verify that the > media contains valid data - everything, for which the write operation > was finished, should be there. > > With this, you'd exclude the whole layer of UBIFS complexity. And if you > are lucky, and the bug is in UBI, it will be much easier to catch and to > fix. > > With a mirror file it is probably easy. You just do modify the mirror > file, recored the last operation, sync the mirror file and the file > where you recorded the operation, then you enable the random power cut, > and start the operation on the UBI volume. The point is to not have > power cut before the mirror file is being changed. > > Artem. > > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: UBI/UBFS: ubi_eba_read_leb() reporting unmapped LEB 2014-10-20 15:25 ` Steve B @ 2014-10-20 15:47 ` Artem Bityutskiy 2014-10-20 16:33 ` Steve B 0 siblings, 1 reply; 10+ messages in thread From: Artem Bityutskiy @ 2014-10-20 15:47 UTC (permalink / raw) To: Steve B; +Cc: linux-mtd On Mon, 2014-10-20 at 16:25 +0100, Steve B wrote: > Thank you very much for your suggestions Artem, gives me plenty to look > into :) May be this will be also a useful piece of data. Back several years ago, we saw ext4 not recovering after power cuts on eMMC. We knew ext4 is extremely complex, so we decided to go and test if eMMC does the right job. Namely, if I submit data, they sync, does eMMC make sure the data are on the media? Adrian (the other UBIFS author) implemented the eMMC power cut test. And he found that you may write to block A, they sync, then write to block B, and have power cut, and after reboot block A is corrupted. So interrupted write to block B affected a different block A. We reported to the eMMC vendor, and the vendor acknowledged the problem and we got newer eMMC, which was devoid of the issue. And our ext4 issues went away, it became a lot better. This is why I say may be stressing UBI could be a good idea. It may be waste of time too, though, if the problem is in UBIFS, or if the test won't reveal a problem in UBI because the test is not good enough. But it may also hit the nail quickly. Artem. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: UBI/UBFS: ubi_eba_read_leb() reporting unmapped LEB 2014-10-20 15:47 ` Artem Bityutskiy @ 2014-10-20 16:33 ` Steve B 0 siblings, 0 replies; 10+ messages in thread From: Steve B @ 2014-10-20 16:33 UTC (permalink / raw) To: Artem Bityutskiy; +Cc: linux-mtd Thanks again, plenty for me to thing about! On Mon, 20 Oct 2014, Artem Bityutskiy wrote: > On Mon, 2014-10-20 at 16:25 +0100, Steve B wrote: >> Thank you very much for your suggestions Artem, gives me plenty to look >> into :) > > May be this will be also a useful piece of data. > > Back several years ago, we saw ext4 not recovering after power cuts on > eMMC. > > We knew ext4 is extremely complex, so we decided to go and test if eMMC > does the right job. Namely, if I submit data, they sync, does eMMC make > sure the data are on the media? > > Adrian (the other UBIFS author) implemented the eMMC power cut test. And > he found that you may write to block A, they sync, then write to block > B, and have power cut, and after reboot block A is corrupted. So > interrupted write to block B affected a different block A. > > We reported to the eMMC vendor, and the vendor acknowledged the problem > and we got newer eMMC, which was devoid of the issue. And our ext4 > issues went away, it became a lot better. > > This is why I say may be stressing UBI could be a good idea. It may be > waste of time too, though, if the problem is in UBIFS, or if the test > won't reveal a problem in UBI because the test is not good enough. But > it may also hit the nail quickly. > > Artem. > > > ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2014-10-20 16:37 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-10-10 13:45 UBI/UBFS: ubi_eba_read_leb() reporting unmapped LEB Steve B 2014-10-20 8:54 ` Steve B 2014-10-20 13:35 ` Artem Bityutskiy 2014-10-20 13:48 ` Steve B 2014-10-20 14:05 ` Artem Bityutskiy 2014-10-20 14:51 ` Steve B 2014-10-20 15:19 ` Artem Bityutskiy 2014-10-20 15:25 ` Steve B 2014-10-20 15:47 ` Artem Bityutskiy 2014-10-20 16:33 ` Steve B
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox