* UBI/UBFS: ubi_eba_read_leb() reporting unmapped LEB
@ 2014-10-10 13:45 Steve B
2014-10-20 8:54 ` Steve B
2014-10-20 13:35 ` Artem Bityutskiy
0 siblings, 2 replies; 10+ messages in thread
From: Steve B @ 2014-10-10 13:45 UTC (permalink / raw)
To: linux-mtd
Hi All,
I've been running a reboot test on 5 devices and after about 7000 reboot files in
one of our UBIFS volumes reports a corruption (SQUASHFS errors). I have been
able to extract the mtd partition and pull it into a nandsim instance on my host
machine so I can add some extra debug. Here's the error log when trying to load
the file:
[41308.683356] UBI DBG gen (pid 15044): read 4062 bytes from LEB 1:352:0
[41308.683358] UBI DBG eba (pid 15044): read 4062 bytes from offset 0 of LEB 1:352 (unmapped)
[41308.683362] UBI DBG gen (pid 15044): read 4062 bytes from LEB 1:352:0
[41308.683364] UBI DBG eba (pid 15044): read 4062 bytes from offset 0 of LEB 1:352 (unmapped)
[41308.683368] UBIFS error (pid 15044): ubifs_read_node: bad node type (255 but expected 1)
[41308.683373] UBI DBG gen (pid 15044): test LEB 1:352
[41308.683377] UBIFS error (pid 15044): ubifs_read_node: bad node at LEB 352:0, LEB mapping status 0
[41308.683381] Not a node, first 24 bytes:
[41308.683386] 00000000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ........................
[41308.683392] Pid: 15044, comm: md5sum Tainted: GF O 3.8.0-29-generic #42~precise1-Ubuntu
[41308.683396] Call Trace:
[41308.683411] [<ffffffffa0564a7d>] ubifs_read_node+0x26d/0x2d0 [ubifs]
[41308.683419] [<ffffffffa0583338>] ubifs_tnc_read_node+0x148/0x160 [ubifs]
[41308.683426] [<ffffffffa056816c>] ubifs_tnc_locate+0x1cc/0x1f0 [ubifs]
[41308.683431] [<ffffffffa0559ade>] do_readpage+0x10e/0x410 [ubifs]
[41308.683437] [<ffffffffa055a4ce>] ubifs_readpage+0x5e/0x580 [ubifs]
[41308.683441] [<ffffffff81152de3>] ? __inc_zone_page_state+0x33/0x40
[41308.683444] [<ffffffff81135866>] ? add_to_page_cache_locked.part.26+0x76/0xd0
[41308.683447] [<ffffffff8113593d>] ? add_to_page_cache_locked+0x7d/0x90
[41308.683452] [<ffffffff81135edd>] do_generic_file_read.constprop.31+0x10d/0x440
[41308.683460] [<ffffffff81136f01>] generic_file_aio_read+0xe1/0x220
[41308.683469] [<ffffffff8119b0c3>] do_sync_read+0xa3/0xe0
[41308.683477] [<ffffffff8119b800>] vfs_read+0xb0/0x180
[41308.683485] [<ffffffff8119b922>] sys_read+0x52/0xa0
[41308.683492] [<ffffffff816fc8dd>] system_call_fastpath+0x1a/0x1f
I have written a small tool to scan the headers in all PEBs and this confirms
that LEB 352 doesn't exist for this volume.
We are running a Linux kernel based off 3.4.0
This issue seems similar to an issue already raised on this mailing list:
linux-mtd/2010-September/031837.html
subject: ubi_eba_init_scan: cannot reserve enough PEBs
I would like to know the best way to debug this issue, does anyone have the patch
mentioned in the issue above to add more debugging in the right places?
Our test setup has the facility of pulling the power on the unit if a GPIO line
is toggled, with this we can test sensitive areas of the code, can anyone suggest
where would be a good place to perform this action.
Thanks for any help on this,
Steve B
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: UBI/UBFS: ubi_eba_read_leb() reporting unmapped LEB
2014-10-10 13:45 UBI/UBFS: ubi_eba_read_leb() reporting unmapped LEB Steve B
@ 2014-10-20 8:54 ` Steve B
2014-10-20 13:35 ` Artem Bityutskiy
1 sibling, 0 replies; 10+ messages in thread
From: Steve B @ 2014-10-20 8:54 UTC (permalink / raw)
To: linux-mtd
Hi,
Can anyone help me with this?
Thanks,
Steve
On Fri, 10 Oct 2014, Steve B wrote:
> Hi All,
>
> I've been running a reboot test on 5 devices and after about 7000 reboot
> files in
> one of our UBIFS volumes reports a corruption (SQUASHFS errors). I have been
> able to extract the mtd partition and pull it into a nandsim instance on my
> host
> machine so I can add some extra debug. Here's the error log when trying to
> load
> the file:
>
> [41308.683356] UBI DBG gen (pid 15044): read 4062 bytes from LEB 1:352:0
> [41308.683358] UBI DBG eba (pid 15044): read 4062 bytes from offset 0 of LEB
> 1:352 (unmapped)
> [41308.683362] UBI DBG gen (pid 15044): read 4062 bytes from LEB 1:352:0
> [41308.683364] UBI DBG eba (pid 15044): read 4062 bytes from offset 0 of LEB
> 1:352 (unmapped)
> [41308.683368] UBIFS error (pid 15044): ubifs_read_node: bad node type (255
> but expected 1)
> [41308.683373] UBI DBG gen (pid 15044): test LEB 1:352
> [41308.683377] UBIFS error (pid 15044): ubifs_read_node: bad node at LEB
> 352:0, LEB mapping status 0
> [41308.683381] Not a node, first 24 bytes:
> [41308.683386] 00000000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ff ff ff ff ff ff ff ........................
> [41308.683392] Pid: 15044, comm: md5sum Tainted: GF O
> 3.8.0-29-generic #42~precise1-Ubuntu
> [41308.683396] Call Trace:
> [41308.683411] [<ffffffffa0564a7d>] ubifs_read_node+0x26d/0x2d0 [ubifs]
> [41308.683419] [<ffffffffa0583338>] ubifs_tnc_read_node+0x148/0x160 [ubifs]
> [41308.683426] [<ffffffffa056816c>] ubifs_tnc_locate+0x1cc/0x1f0 [ubifs]
> [41308.683431] [<ffffffffa0559ade>] do_readpage+0x10e/0x410 [ubifs]
> [41308.683437] [<ffffffffa055a4ce>] ubifs_readpage+0x5e/0x580 [ubifs]
> [41308.683441] [<ffffffff81152de3>] ? __inc_zone_page_state+0x33/0x40
> [41308.683444] [<ffffffff81135866>] ?
> add_to_page_cache_locked.part.26+0x76/0xd0
> [41308.683447] [<ffffffff8113593d>] ? add_to_page_cache_locked+0x7d/0x90
> [41308.683452] [<ffffffff81135edd>]
> do_generic_file_read.constprop.31+0x10d/0x440
> [41308.683460] [<ffffffff81136f01>] generic_file_aio_read+0xe1/0x220
> [41308.683469] [<ffffffff8119b0c3>] do_sync_read+0xa3/0xe0
> [41308.683477] [<ffffffff8119b800>] vfs_read+0xb0/0x180
> [41308.683485] [<ffffffff8119b922>] sys_read+0x52/0xa0
> [41308.683492] [<ffffffff816fc8dd>] system_call_fastpath+0x1a/0x1f
>
> I have written a small tool to scan the headers in all PEBs and this confirms
> that LEB 352 doesn't exist for this volume.
>
> We are running a Linux kernel based off 3.4.0
>
> This issue seems similar to an issue already raised on this mailing list:
> linux-mtd/2010-September/031837.html
> subject: ubi_eba_init_scan: cannot reserve enough PEBs
>
> I would like to know the best way to debug this issue, does anyone have the
> patch
> mentioned in the issue above to add more debugging in the right places?
>
> Our test setup has the facility of pulling the power on the unit if a GPIO
> line
> is toggled, with this we can test sensitive areas of the code, can anyone
> suggest
> where would be a good place to perform this action.
>
> Thanks for any help on this,
> Steve B
>
> ______________________________________________________
> Linux MTD discussion mailing list
> http://lists.infradead.org/mailman/listinfo/linux-mtd/
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: UBI/UBFS: ubi_eba_read_leb() reporting unmapped LEB
2014-10-10 13:45 UBI/UBFS: ubi_eba_read_leb() reporting unmapped LEB Steve B
2014-10-20 8:54 ` Steve B
@ 2014-10-20 13:35 ` Artem Bityutskiy
2014-10-20 13:48 ` Steve B
1 sibling, 1 reply; 10+ messages in thread
From: Artem Bityutskiy @ 2014-10-20 13:35 UTC (permalink / raw)
To: Steve B; +Cc: linux-mtd
On Fri, 2014-10-10 at 14:45 +0100, Steve B wrote:
> Our test setup has the facility of pulling the power on the unit if a GPIO line
> is toggled, with this we can test sensitive areas of the code, can anyone suggest
> where would be a good place to perform this action.
Did you perform testing with UBI/UBIFS debug messages on? Or the
messages in this e-mail were enable after the corruption happened?
Artem.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: UBI/UBFS: ubi_eba_read_leb() reporting unmapped LEB
2014-10-20 13:35 ` Artem Bityutskiy
@ 2014-10-20 13:48 ` Steve B
2014-10-20 14:05 ` Artem Bityutskiy
0 siblings, 1 reply; 10+ messages in thread
From: Steve B @ 2014-10-20 13:48 UTC (permalink / raw)
To: Artem Bityutskiy; +Cc: linux-mtd
Hi Artem,
Thanks for the reply.
I have the kernel built with:
CONFIG_MTD_UBI_DEBUG=y
CONFIG_UBIFS_FS=y
The failure was seen on an embedded device, I dumped the flash
contents (raw format) and then re-constructed the image on my Linux PC
using nandsim with UBI and UBIFS modules i've built with some extra debugging.
The call stack in the log posted is the same as the one seen on the device
that showed the original problem.
Thanks,
Steve
On Mon, 20 Oct 2014, Artem Bityutskiy wrote:
> On Fri, 2014-10-10 at 14:45 +0100, Steve B wrote:
>> Our test setup has the facility of pulling the power on the unit if a GPIO line
>> is toggled, with this we can test sensitive areas of the code, can anyone suggest
>> where would be a good place to perform this action.
>
> Did you perform testing with UBI/UBIFS debug messages on? Or the
> messages in this e-mail were enable after the corruption happened?
>
> Artem.
>
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: UBI/UBFS: ubi_eba_read_leb() reporting unmapped LEB
2014-10-20 13:48 ` Steve B
@ 2014-10-20 14:05 ` Artem Bityutskiy
2014-10-20 14:51 ` Steve B
0 siblings, 1 reply; 10+ messages in thread
From: Artem Bityutskiy @ 2014-10-20 14:05 UTC (permalink / raw)
To: Steve B; +Cc: linux-mtd
On Mon, 2014-10-20 at 14:48 +0100, Steve B wrote:
> Hi Artem,
>
> Thanks for the reply.
>
> I have the kernel built with:
> CONFIG_MTD_UBI_DEBUG=y
> CONFIG_UBIFS_FS=y
>
> The failure was seen on an embedded device, I dumped the flash
> contents (raw format) and then re-constructed the image on my Linux PC
> using nandsim with UBI and UBIFS modules i've built with some extra debugging.
> The call stack in the log posted is the same as the one seen on the device
> that showed the original problem.
>
I see. Would you post the full log, not just the extract near the
failure message?
>From what I see so far, UBIFS believes that the data of a file is stored
in LEB 352. But UBI says: "nope, this LEB was unmapped".
So the bug may be in both UBIFS and UBI.
When we were writing this code years ago, we were stressing UBIFS
power-cut tolerance a lot. We dead real power cuts, but we did emulated
power cuts a lot more than real. And emulated power cuts only test
UBIFS, not UBI. On the other hand, UBI is very simple comparing to
UBIFS.
BTW, do you have fastmap enabled or disabled?
Are you doing deliberate power cut testing, or you hit this error by
chance? In the former case, you may enable additional debugging,
probably, depending on how many test-boxes are you stressing
simultaneously.
Also, are you 100% sure you "reconstructed" the image correctly, and you
are not wasting time looking at irrelevant errors?
Artem.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: UBI/UBFS: ubi_eba_read_leb() reporting unmapped LEB
2014-10-20 14:05 ` Artem Bityutskiy
@ 2014-10-20 14:51 ` Steve B
2014-10-20 15:19 ` Artem Bityutskiy
0 siblings, 1 reply; 10+ messages in thread
From: Steve B @ 2014-10-20 14:51 UTC (permalink / raw)
To: Artem Bityutskiy; +Cc: linux-mtd
The log on the target is very minimal as I was only capturing error
logs, so don't have much info to share on that, but here is the log from
the target:
[ 13.178297] UBIFS error (pid 1273): ubifs_read_node: bad node type (255 but expected 1)
[ 13.178481] UBIFS error (pid 1273): ubifs_read_node: bad node at LEB 39:0, LEB mapping status 0
[ 13.178668] Not a node, first 24 bytes:
[ 13.178755] 00000000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ........................
[ 13.179093] [<80012b7c>] (unwind_backtrace+0x0/0x11c) from [<80131138>] (ubifs_read_node+0x25c/0x2a0)
[ 13.179311] [<80131138>] (ubifs_read_node+0x25c/0x2a0) from [<8014d2ac>] (ubifs_tnc_read_node+0x60/0x198)
[ 13.185906] [<8014d2ac>] (ubifs_tnc_read_node+0x60/0x198) from [<801343e0>] (ubifs_tnc_locate+0xa8/0x180)
[ 13.195449] [<801343e0>] (ubifs_tnc_locate+0xa8/0x180) from [<80126e1c>] (do_readpage+0x1bc/0x448)
[ 13.204388] [<80126e1c>] (do_readpage+0x1bc/0x448) from [<801284f8>] (ubifs_readpage+0x3e8/0x428)
[ 13.213246] [<801284f8>] (ubifs_readpage+0x3e8/0x428) from [<8008e6e4>] (generic_file_aio_read+0x328/0x6f4)
[ 13.222973] [<8008e6e4>] (generic_file_aio_read+0x328/0x6f4) from [<800b37c4>] (do_sync_read+0x98/0xd4)
[ 13.232342] [<800b37c4>] (do_sync_read+0x98/0xd4) from [<800b3e0c>] (vfs_read+0xa4/0x134)
[ 13.240501] [<800b3e0c>] (vfs_read+0xa4/0x134) from [<800b4190>] (sys_read+0x34/0x68)
[ 13.248316] [<800b4190>] (sys_read+0x34/0x68) from [<8000e280>] (ret_fast_syscall+0x0/0x30)
[ 13.256639] UBIFS error (pid 1273): do_readpage: cannot read page 991 of inode 91, error -22
[ 13.265085] UBIFS error (pid 1273): ubifs_read_node: bad node type (255 but expected 1)
[ 13.273046] UBIFS error (pid 1273): ubifs_read_node: bad node at LEB 39:0, LEB mapping status 0
I'm currently trying to create a test to more reliably reproduce the issue,
when I do I can share those logs.
Fastmap doesn't exist in the kernel version we are running (3.4.0)
The first test to show up this issue simple copies a file in a tight loop and
power is pulled randomly on the board. I am now trying to refine the test to speed
up the failure by randomly pulling the power in the NAND driver after a random
number of erase/write calls, but I haven't been able to reproduce the issue
with this.
I'm fairly sure the reconstructed image is correct as it's producing the same
backtrace as the original issue on the target. My program that scans the NAND
flash dump for UBI headers confirms that the expected mapping is missing.
On Mon, 20 Oct 2014, Artem Bityutskiy wrote:
> On Mon, 2014-10-20 at 14:48 +0100, Steve B wrote:
>> Hi Artem,
>>
>> Thanks for the reply.
>>
>> I have the kernel built with:
>> CONFIG_MTD_UBI_DEBUG=y
>> CONFIG_UBIFS_FS=y
>>
>> The failure was seen on an embedded device, I dumped the flash
>> contents (raw format) and then re-constructed the image on my Linux PC
>> using nandsim with UBI and UBIFS modules i've built with some extra debugging.
>> The call stack in the log posted is the same as the one seen on the device
>> that showed the original problem.
>>
>
> I see. Would you post the full log, not just the extract near the
> failure message?
>
>> From what I see so far, UBIFS believes that the data of a file is stored
> in LEB 352. But UBI says: "nope, this LEB was unmapped".
>
> So the bug may be in both UBIFS and UBI.
>
> When we were writing this code years ago, we were stressing UBIFS
> power-cut tolerance a lot. We dead real power cuts, but we did emulated
> power cuts a lot more than real. And emulated power cuts only test
> UBIFS, not UBI. On the other hand, UBI is very simple comparing to
> UBIFS.
>
> BTW, do you have fastmap enabled or disabled?
>
> Are you doing deliberate power cut testing, or you hit this error by
> chance? In the former case, you may enable additional debugging,
> probably, depending on how many test-boxes are you stressing
> simultaneously.
>
> Also, are you 100% sure you "reconstructed" the image correctly, and you
> are not wasting time looking at irrelevant errors?
>
> Artem.
>
>
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: UBI/UBFS: ubi_eba_read_leb() reporting unmapped LEB
2014-10-20 14:51 ` Steve B
@ 2014-10-20 15:19 ` Artem Bityutskiy
2014-10-20 15:25 ` Steve B
0 siblings, 1 reply; 10+ messages in thread
From: Artem Bityutskiy @ 2014-10-20 15:19 UTC (permalink / raw)
To: Steve B; +Cc: linux-mtd
On Mon, 2014-10-20 at 15:51 +0100, Steve B wrote:
> The log on the target is very minimal as I was only capturing error
> logs, so don't have much info to share on that, but here is the log from
> the target:
>
> [ 13.178297] UBIFS error (pid 1273): ubifs_read_node: bad node type (255 but expected 1)
> [ 13.178481] UBIFS error (pid 1273): ubifs_read_node: bad node at LEB 39:0, LEB mapping status 0
> [ 13.178668] Not a node, first 24 bytes:
> [ 13.178755] 00000000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ........................
> [ 13.179093] [<80012b7c>] (unwind_backtrace+0x0/0x11c) from [<80131138>] (ubifs_read_node+0x25c/0x2a0)
> [ 13.179311] [<80131138>] (ubifs_read_node+0x25c/0x2a0) from [<8014d2ac>] (ubifs_tnc_read_node+0x60/0x198)
> [ 13.185906] [<8014d2ac>] (ubifs_tnc_read_node+0x60/0x198) from [<801343e0>] (ubifs_tnc_locate+0xa8/0x180)
> [ 13.195449] [<801343e0>] (ubifs_tnc_locate+0xa8/0x180) from [<80126e1c>] (do_readpage+0x1bc/0x448)
> [ 13.204388] [<80126e1c>] (do_readpage+0x1bc/0x448) from [<801284f8>] (ubifs_readpage+0x3e8/0x428)
> [ 13.213246] [<801284f8>] (ubifs_readpage+0x3e8/0x428) from [<8008e6e4>] (generic_file_aio_read+0x328/0x6f4)
> [ 13.222973] [<8008e6e4>] (generic_file_aio_read+0x328/0x6f4) from [<800b37c4>] (do_sync_read+0x98/0xd4)
> [ 13.232342] [<800b37c4>] (do_sync_read+0x98/0xd4) from [<800b3e0c>] (vfs_read+0xa4/0x134)
> [ 13.240501] [<800b3e0c>] (vfs_read+0xa4/0x134) from [<800b4190>] (sys_read+0x34/0x68)
> [ 13.248316] [<800b4190>] (sys_read+0x34/0x68) from [<8000e280>] (ret_fast_syscall+0x0/0x30)
> [ 13.256639] UBIFS error (pid 1273): do_readpage: cannot read page 991 of inode 91, error -22
> [ 13.265085] UBIFS error (pid 1273): ubifs_read_node: bad node type (255 but expected 1)
> [ 13.273046] UBIFS error (pid 1273): ubifs_read_node: bad node at LEB 39:0, LEB mapping status 0
>
> I'm currently trying to create a test to more reliably reproduce the issue,
> when I do I can share those logs.
>
> Fastmap doesn't exist in the kernel version we are running (3.4.0)
>
> The first test to show up this issue simple copies a file in a tight loop and
> power is pulled randomly on the board. I am now trying to refine the test to speed
> up the failure by randomly pulling the power in the NAND driver after a random
> number of erase/write calls, but I haven't been able to reproduce the issue
> with this.
Being able to reproduce this would make it bughunting a relatively easy
task.
You could also try to establish a separate UBI power cut testing.
Something like writing various patterns directly to the UBI volume,
having power cut, and making sure that what you wrote before power cut
is there is exactly what you wrote. UBI writes synchronously, so once
the write returns, the data must be on the flash. And you should write
in min. I/O unit sizes (on NAND page at a time).
I guess you can have a mirror file (if you have a second media), or a
fixed algorithm of how you select the LEB and offset to write. And you
generate random power cuts, and then after boot-up you verify that the
media contains valid data - everything, for which the write operation
was finished, should be there.
With this, you'd exclude the whole layer of UBIFS complexity. And if you
are lucky, and the bug is in UBI, it will be much easier to catch and to
fix.
With a mirror file it is probably easy. You just do modify the mirror
file, recored the last operation, sync the mirror file and the file
where you recorded the operation, then you enable the random power cut,
and start the operation on the UBI volume. The point is to not have
power cut before the mirror file is being changed.
Artem.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: UBI/UBFS: ubi_eba_read_leb() reporting unmapped LEB
2014-10-20 15:19 ` Artem Bityutskiy
@ 2014-10-20 15:25 ` Steve B
2014-10-20 15:47 ` Artem Bityutskiy
0 siblings, 1 reply; 10+ messages in thread
From: Steve B @ 2014-10-20 15:25 UTC (permalink / raw)
To: Artem Bityutskiy; +Cc: linux-mtd
Thank you very much for your suggestions Artem, gives me plenty to look
into :)
On Mon, 20 Oct 2014, Artem Bityutskiy wrote:
> On Mon, 2014-10-20 at 15:51 +0100, Steve B wrote:
>> The log on the target is very minimal as I was only capturing error
>> logs, so don't have much info to share on that, but here is the log from
>> the target:
>>
>> [ 13.178297] UBIFS error (pid 1273): ubifs_read_node: bad node type (255 but expected 1)
>> [ 13.178481] UBIFS error (pid 1273): ubifs_read_node: bad node at LEB 39:0, LEB mapping status 0
>> [ 13.178668] Not a node, first 24 bytes:
>> [ 13.178755] 00000000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ........................
>> [ 13.179093] [<80012b7c>] (unwind_backtrace+0x0/0x11c) from [<80131138>] (ubifs_read_node+0x25c/0x2a0)
>> [ 13.179311] [<80131138>] (ubifs_read_node+0x25c/0x2a0) from [<8014d2ac>] (ubifs_tnc_read_node+0x60/0x198)
>> [ 13.185906] [<8014d2ac>] (ubifs_tnc_read_node+0x60/0x198) from [<801343e0>] (ubifs_tnc_locate+0xa8/0x180)
>> [ 13.195449] [<801343e0>] (ubifs_tnc_locate+0xa8/0x180) from [<80126e1c>] (do_readpage+0x1bc/0x448)
>> [ 13.204388] [<80126e1c>] (do_readpage+0x1bc/0x448) from [<801284f8>] (ubifs_readpage+0x3e8/0x428)
>> [ 13.213246] [<801284f8>] (ubifs_readpage+0x3e8/0x428) from [<8008e6e4>] (generic_file_aio_read+0x328/0x6f4)
>> [ 13.222973] [<8008e6e4>] (generic_file_aio_read+0x328/0x6f4) from [<800b37c4>] (do_sync_read+0x98/0xd4)
>> [ 13.232342] [<800b37c4>] (do_sync_read+0x98/0xd4) from [<800b3e0c>] (vfs_read+0xa4/0x134)
>> [ 13.240501] [<800b3e0c>] (vfs_read+0xa4/0x134) from [<800b4190>] (sys_read+0x34/0x68)
>> [ 13.248316] [<800b4190>] (sys_read+0x34/0x68) from [<8000e280>] (ret_fast_syscall+0x0/0x30)
>> [ 13.256639] UBIFS error (pid 1273): do_readpage: cannot read page 991 of inode 91, error -22
>> [ 13.265085] UBIFS error (pid 1273): ubifs_read_node: bad node type (255 but expected 1)
>> [ 13.273046] UBIFS error (pid 1273): ubifs_read_node: bad node at LEB 39:0, LEB mapping status 0
>>
>> I'm currently trying to create a test to more reliably reproduce the issue,
>> when I do I can share those logs.
>>
>> Fastmap doesn't exist in the kernel version we are running (3.4.0)
>>
>> The first test to show up this issue simple copies a file in a tight loop and
>> power is pulled randomly on the board. I am now trying to refine the test to speed
>> up the failure by randomly pulling the power in the NAND driver after a random
>> number of erase/write calls, but I haven't been able to reproduce the issue
>> with this.
>
> Being able to reproduce this would make it bughunting a relatively easy
> task.
>
> You could also try to establish a separate UBI power cut testing.
> Something like writing various patterns directly to the UBI volume,
> having power cut, and making sure that what you wrote before power cut
> is there is exactly what you wrote. UBI writes synchronously, so once
> the write returns, the data must be on the flash. And you should write
> in min. I/O unit sizes (on NAND page at a time).
>
> I guess you can have a mirror file (if you have a second media), or a
> fixed algorithm of how you select the LEB and offset to write. And you
> generate random power cuts, and then after boot-up you verify that the
> media contains valid data - everything, for which the write operation
> was finished, should be there.
>
> With this, you'd exclude the whole layer of UBIFS complexity. And if you
> are lucky, and the bug is in UBI, it will be much easier to catch and to
> fix.
>
> With a mirror file it is probably easy. You just do modify the mirror
> file, recored the last operation, sync the mirror file and the file
> where you recorded the operation, then you enable the random power cut,
> and start the operation on the UBI volume. The point is to not have
> power cut before the mirror file is being changed.
>
> Artem.
>
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: UBI/UBFS: ubi_eba_read_leb() reporting unmapped LEB
2014-10-20 15:25 ` Steve B
@ 2014-10-20 15:47 ` Artem Bityutskiy
2014-10-20 16:33 ` Steve B
0 siblings, 1 reply; 10+ messages in thread
From: Artem Bityutskiy @ 2014-10-20 15:47 UTC (permalink / raw)
To: Steve B; +Cc: linux-mtd
On Mon, 2014-10-20 at 16:25 +0100, Steve B wrote:
> Thank you very much for your suggestions Artem, gives me plenty to look
> into :)
May be this will be also a useful piece of data.
Back several years ago, we saw ext4 not recovering after power cuts on
eMMC.
We knew ext4 is extremely complex, so we decided to go and test if eMMC
does the right job. Namely, if I submit data, they sync, does eMMC make
sure the data are on the media?
Adrian (the other UBIFS author) implemented the eMMC power cut test. And
he found that you may write to block A, they sync, then write to block
B, and have power cut, and after reboot block A is corrupted. So
interrupted write to block B affected a different block A.
We reported to the eMMC vendor, and the vendor acknowledged the problem
and we got newer eMMC, which was devoid of the issue. And our ext4
issues went away, it became a lot better.
This is why I say may be stressing UBI could be a good idea. It may be
waste of time too, though, if the problem is in UBIFS, or if the test
won't reveal a problem in UBI because the test is not good enough. But
it may also hit the nail quickly.
Artem.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: UBI/UBFS: ubi_eba_read_leb() reporting unmapped LEB
2014-10-20 15:47 ` Artem Bityutskiy
@ 2014-10-20 16:33 ` Steve B
0 siblings, 0 replies; 10+ messages in thread
From: Steve B @ 2014-10-20 16:33 UTC (permalink / raw)
To: Artem Bityutskiy; +Cc: linux-mtd
Thanks again, plenty for me to thing about!
On Mon, 20 Oct 2014, Artem Bityutskiy wrote:
> On Mon, 2014-10-20 at 16:25 +0100, Steve B wrote:
>> Thank you very much for your suggestions Artem, gives me plenty to look
>> into :)
>
> May be this will be also a useful piece of data.
>
> Back several years ago, we saw ext4 not recovering after power cuts on
> eMMC.
>
> We knew ext4 is extremely complex, so we decided to go and test if eMMC
> does the right job. Namely, if I submit data, they sync, does eMMC make
> sure the data are on the media?
>
> Adrian (the other UBIFS author) implemented the eMMC power cut test. And
> he found that you may write to block A, they sync, then write to block
> B, and have power cut, and after reboot block A is corrupted. So
> interrupted write to block B affected a different block A.
>
> We reported to the eMMC vendor, and the vendor acknowledged the problem
> and we got newer eMMC, which was devoid of the issue. And our ext4
> issues went away, it became a lot better.
>
> This is why I say may be stressing UBI could be a good idea. It may be
> waste of time too, though, if the problem is in UBIFS, or if the test
> won't reveal a problem in UBI because the test is not good enough. But
> it may also hit the nail quickly.
>
> Artem.
>
>
>
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2014-10-20 16:37 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-10-10 13:45 UBI/UBFS: ubi_eba_read_leb() reporting unmapped LEB Steve B
2014-10-20 8:54 ` Steve B
2014-10-20 13:35 ` Artem Bityutskiy
2014-10-20 13:48 ` Steve B
2014-10-20 14:05 ` Artem Bityutskiy
2014-10-20 14:51 ` Steve B
2014-10-20 15:19 ` Artem Bityutskiy
2014-10-20 15:25 ` Steve B
2014-10-20 15:47 ` Artem Bityutskiy
2014-10-20 16:33 ` Steve B
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox