public inbox for u-boot@lists.denx.de
* [U-Boot] Bricked when trying to attach UBI
@ 2012-12-19 11:28 Luca Ceresoli
  2012-12-19 15:24 ` Andreas Bießmann
  2012-12-19 17:32 ` Vikram Narayanan
  0 siblings, 2 replies; 12+ messages in thread
From: Luca Ceresoli @ 2012-12-19 11:28 UTC (permalink / raw)
  To: u-boot

Hi all,

I am facing a problem with some boards that stop booting after some
weeks or months of normal usage because they are unable to attach UBI.
They do not boot anymore even after a power cycle; in other words, they
are totally bricked.
I don't know exactly what problem UBI has, but Linux can recover from
it while U-Boot apparently cannot.

The boards are DIG297 (dig297 board in mainline U-Boot), based on
OMAP3530 and equipped with a NAND flash (Micron MT29F2G16ABBEAHC) as
their only permanent storage.

U-Boot v2012.04.01 starts correctly. The bootcmd tries to load the
kernel from UBI, starting with the following commands:

echo Booting from nand ...
setenv bootargs console=ttyO2,115200n8 mtdparts=omap2-nand.0:768k(uboot),128k(reserved),128k(uboot-env),-(ubi) ubi.mtd=3 root=ubi0:rootfs ro rootfstype=ubifs ip=....
ubi part nand0,3
...

On "bricked" devices the output of the "ubi part nand0,3" command is:

Creating 1 MTD partitions on "nand0":
0x000000100000-0x000010000000 : "mtd=3"
UBI: attaching mtd1 to ubi0
UBI: physical eraseblock size:   131072 bytes (128 KiB)
UBI: logical eraseblock size:    129024 bytes
UBI: smallest flash I/O unit:    2048
UBI: sub-page size:              512
UBI: VID header offset:          512 (aligned 512)
UBI: data offset:                2048
UBI error: ubi_wl_init_scan: no enough physical eraseblocks (0, need 1)
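
The sizes in this log and in the mtdparts string can be cross-checked with a little arithmetic. The following is only a minimal sketch: the helper names are made up for illustration, and it assumes that the logical eraseblock is simply the physical eraseblock minus everything before the data offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the "ubi part nand0,3" output above. */
#define PEB_SIZE     131072u  /* physical eraseblock size, bytes */
#define DATA_OFFSET    2048u  /* UBI data offset, bytes */

/* Sizes from the mtdparts string, in KiB: uboot, reserved, uboot-env. */
static const uint32_t parts_before_ubi_kib[] = { 768, 128, 128 };

/* Hypothetical helper: bytes left in one eraseblock after the UBI headers. */
static uint32_t leb_size(uint32_t peb_size, uint32_t data_offset)
{
    assert(data_offset < peb_size);
    return peb_size - data_offset;
}

/* Hypothetical helper: byte offset at which the "ubi" partition starts. */
static uint32_t ubi_start_offset(void)
{
    size_t n = sizeof(parts_before_ubi_kib) / sizeof(parts_before_ubi_kib[0]);
    uint32_t kib = 0;

    for (size_t i = 0; i < n; i++)
        kib += parts_before_ubi_kib[i];
    return kib * 1024u;
}
```

With these inputs, leb_size(PEB_SIZE, DATA_OFFSET) gives the 129024 bytes reported as the logical eraseblock size, and ubi_start_offset() gives 0x100000, the start address printed for the "mtd=3" partition.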

Now the device is totally blocked, and power cycling does not change
the result.

The interesting thing is that if I load Linux (2.6.37 + OMAP patches +
board support patches) via TFTP and boot it with bootm, it correctly
attaches UBI (fixing any problem it may have) and boots correctly.
After that the board is unbricked: U-Boot can boot again normally from
NAND.

Without the ambition of understanding all UBI internals, I tried to
visually inspect the UBI code around the line where the error is
produced and compare it to the corresponding Linux sources. They looked
extremely similar, so I have no obvious hint as to why U-Boot and
Linux produce different results.

I also tried with an updated U-Boot master, but the error is still
there.

Obviously I have changed nothing in the UBI and MTD code, both in
U-Boot and in Linux.

Can you suggest a proper way to track the root of the problem, or to
bypass it?

Big thanks in advance,

Luca


* [U-Boot] Bricked when trying to attach UBI
  2012-12-19 11:28 [U-Boot] Bricked when trying to attach UBI Luca Ceresoli
@ 2012-12-19 15:24 ` Andreas Bießmann
  2012-12-19 15:56   ` Luca Ceresoli
  2012-12-19 17:32 ` Vikram Narayanan
  1 sibling, 1 reply; 12+ messages in thread
From: Andreas Bießmann @ 2012-12-19 15:24 UTC (permalink / raw)
  To: u-boot

Dear Luca Ceresoli,

On 19.12.2012 12:28, Luca Ceresoli wrote:
> Hi all,
> 
> I am facing a problem with some boards that do not boot after some
> weeks or months of normal usage, being unable to attach UBI. They do
> not boot anymore event after a power cycle, in other words they are
> totally bricked.
> I don't know exactly what problem UBI has, but it is recoverable by
> Linux, but apparently not by U-Boot.
> 
> The boards are DIG297 (dig297 board in mainline U-Boot), based on
> OMAP3530 and equipped with a NAND flash (Micron MT29F2G16ABBEAHC) as
> their unique permanent storage.
> 
> U-Boot v2012.04.01 starts correctly. The bootcmd tries to load the
> kernel from UBI, starting with the following commands:
> 
> echo Booting from nand ...
> setenv bootargs console=ttyO2,115200n8
> mtdparts=omap2-nand.0:768k(uboot),128k(reserved),128k(uboot-env),-(ubi)
> ubi.mtd=3 root=ubi0:rootfs ro rootfstype=ubifs ip=....
> ubi part nand0,3
> ...
> 
> On "bricked" devices the output of the "ubi part nand0,3" command is:
> 
> Creating 1 MTD partitions on "nand0":
> 0x000000100000-0x000010000000 : "mtd=3"
> UBI: attaching mtd1 to ubi0
> UBI: physical eraseblock size:   131072 bytes (128 KiB)
> UBI: logical eraseblock size:    129024 bytes
> UBI: smallest flash I/O unit:    2048
> UBI: sub-page size:              512
> UBI: VID header offset:          512 (aligned 512)
> UBI: data offset:                2048
> UBI error: ubi_wl_init_scan: no enough physical eraseblocks (0, need 1)
> 
> Now the device is totally blocked, and power cycling does not change
> the result.

Have you tried increasing the malloc arena in U-Boot
(CONFIG_SYS_MALLOC_LEN)?
We had errors like this before [1], [2] and [3], maybe others,
apparently with another error message, but please give it a try. We know
UBI recovery needs some RAM, and 1 MiB may not be enough.

> The interesting thing is that if I load Linux (2.6.37 + OMAP patches +
> board support patches) via TFTP and boot it with bootm, it correctly
> attaches UBI (fixing any problem it may have) and boots correctly.
> After that the board is unbricked: U-Boot can boot again normally from
> NAND.

The fact that Linux can recover with a quite old version points, for me,
towards 'environment constraints' such as too little memory in U-Boot.
Unfortunately the error messages in U-Boot's UBI code sometimes lack such
details (like -ENOMEM as in [1]).

Best regards

Andreas Bießmann

[1] http://thread.gmane.org/gmane.comp.boot-loaders.u-boot/124769
[2] http://thread.gmane.org/gmane.comp.boot-loaders.u-boot/145526
[3] http://thread.gmane.org/gmane.comp.boot-loaders.u-boot/145655


* [U-Boot] Bricked when trying to attach UBI
  2012-12-19 15:24 ` Andreas Bießmann
@ 2012-12-19 15:56   ` Luca Ceresoli
  2012-12-19 16:09     ` Andreas Bießmann
  0 siblings, 1 reply; 12+ messages in thread
From: Luca Ceresoli @ 2012-12-19 15:56 UTC (permalink / raw)
  To: u-boot

Hi Andreas,

Andreas Bießmann wrote:
...
>> Creating 1 MTD partitions on "nand0":
>> 0x000000100000-0x000010000000 : "mtd=3"
>> UBI: attaching mtd1 to ubi0
>> UBI: physical eraseblock size:   131072 bytes (128 KiB)
>> UBI: logical eraseblock size:    129024 bytes
>> UBI: smallest flash I/O unit:    2048
>> UBI: sub-page size:              512
>> UBI: VID header offset:          512 (aligned 512)
>> UBI: data offset:                2048
>> UBI error: ubi_wl_init_scan: no enough physical eraseblocks (0, need 1)
>>
>> Now the device is totally blocked, and power cycling does not change
>> the result.
>
> have you tried to increase the malloc arena in u-boot
> (CONIG_SYS_MALLOC_LEN)?
> We had errors like this before [1],[2] and [3], maybe others -
> apparently with another error message, but please give it a try. We know
> ubi recovery needs some ram and 1MiB may be not enough.

Thanks for your suggestion.

Unfortunately this does not seem to be the cause of my problem: I tried
increasing my CONFIG_SYS_MALLOC_LEN in include/configs/dig297.h from
(1024 << 10) to both (1024 << 12) and (1024 << 14), but without any
difference.
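
For reference, here is what those three shift expressions work out to. This is just a small sketch; bytes_to_mib() is a made-up helper, not something from the U-Boot tree.

```c
#include <assert.h>

/* The three CONFIG_SYS_MALLOC_LEN values tried on the dig297 board. */
#define MALLOC_LEN_ORIGINAL (1024ul << 10)
#define MALLOC_LEN_TRY_1    (1024ul << 12)
#define MALLOC_LEN_TRY_2    (1024ul << 14)

/* Hypothetical helper: convert a byte count to whole MiB. */
static unsigned long bytes_to_mib(unsigned long bytes)
{
    const unsigned long mib = 1024ul * 1024ul;

    assert(bytes % mib == 0);
    return bytes / mib;
}
```

So the arena went from 1 MiB to 4 MiB and then 16 MiB, and the attach fails the same way with all three.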

Luca


* [U-Boot] Bricked when trying to attach UBI
  2012-12-19 15:56   ` Luca Ceresoli
@ 2012-12-19 16:09     ` Andreas Bießmann
  2012-12-19 17:37       ` Luca Ceresoli
  0 siblings, 1 reply; 12+ messages in thread
From: Andreas Bießmann @ 2012-12-19 16:09 UTC (permalink / raw)
  To: u-boot

Hi Luca,

On 19.12.2012 16:56, Luca Ceresoli wrote:
> Hi Andreas,
> 
> Andreas Bießmann wrote:
> ...
>>> Creating 1 MTD partitions on "nand0":
>>> 0x000000100000-0x000010000000 : "mtd=3"
>>> UBI: attaching mtd1 to ubi0
>>> UBI: physical eraseblock size:   131072 bytes (128 KiB)
>>> UBI: logical eraseblock size:    129024 bytes
>>> UBI: smallest flash I/O unit:    2048
>>> UBI: sub-page size:              512
>>> UBI: VID header offset:          512 (aligned 512)
>>> UBI: data offset:                2048
>>> UBI error: ubi_wl_init_scan: no enough physical eraseblocks (0, need 1)
>>>
>>> Now the device is totally blocked, and power cycling does not change
>>> the result.
>>
>> have you tried to increase the malloc arena in u-boot
>> (CONIG_SYS_MALLOC_LEN)?
>> We had errors like this before [1],[2] and [3], maybe others -
>> apparently with another error message, but please give it a try. We know
>> ubi recovery needs some ram and 1MiB may be not enough.
> 
> Thanks for your suggestion.
> 
> Unfortunately this does not seem to be the cause of my problem: I tried
> increasing my CONFIG_SYS_MALLOC_LEN in include/configs/dig297.h from
> (1024 << 10) to both (1024 << 12) and (1024 << 14), but without any
> difference.

Well, ok ... The malloc arena is always my first thought when I read about
problems with UBI in U-Boot.
Have you looked at the differences between drivers/mtd/ubi/ in your U-Boot
and Linux trees? Maybe you can see something obviously different in
ubi_wl_init_scan()?

Best regards

Andreas Bießmann


* [U-Boot] Bricked when trying to attach UBI
  2012-12-19 11:28 [U-Boot] Bricked when trying to attach UBI Luca Ceresoli
  2012-12-19 15:24 ` Andreas Bießmann
@ 2012-12-19 17:32 ` Vikram Narayanan
  2012-12-19 18:22   ` Stefan Roese
  1 sibling, 1 reply; 12+ messages in thread
From: Vikram Narayanan @ 2012-12-19 17:32 UTC (permalink / raw)
  To: u-boot

On 12/19/2012 4:58 PM, Luca Ceresoli wrote:
> Hi all,
>
<snip>
> On "bricked" devices the output of the "ubi part nand0,3" command is:
>
> Creating 1 MTD partitions on "nand0":
> 0x000000100000-0x000010000000 : "mtd=3"
> UBI: attaching mtd1 to ubi0
> UBI: physical eraseblock size:   131072 bytes (128 KiB)
> UBI: logical eraseblock size:    129024 bytes
> UBI: smallest flash I/O unit:    2048
> UBI: sub-page size:              512
> UBI: VID header offset:          512 (aligned 512)
> UBI: data offset:                2048
> UBI error: ubi_wl_init_scan: no enough physical eraseblocks (0, need 1)

Just curious: what does the above command say when you try to attach an
empty partition? Does it result in the same error?

> Now the device is totally blocked, and power cycling does not change
> the result.
>
> The interesting thing is that if I load Linux (2.6.37 + OMAP patches +
> board support patches) via TFTP and boot it with bootm, it correctly
> attaches UBI (fixing any problem it may have) and boots correctly.
> After that the board is unbricked: U-Boot can boot again normally from
> NAND.
>
> Without the ambition of understanding all UBI internals, I tried to
> visually inspect the UBI code around the line where the error is
> produced and compare it to the corresponding Linux sources. They looked
> extremely similar, so I haven't and obvious hint of why U-Boot and
> Linux produce different results.
>
> I also tried with an updated U-Boot master, but the error is still
> there.
>
> Obviously I have changed nothing in the UBI and MTD code, both in
> U-Boot and in Linux.
>
> Can you suggest a proper way to track the root of the problem, or to
> bypass it?

I think it's the right time to sync the UBI code with the current kernel
tree, but it seems like a huge amount of work. Any suggestions?

Regards,
Vikram


* [U-Boot] Bricked when trying to attach UBI
  2012-12-19 16:09     ` Andreas Bießmann
@ 2012-12-19 17:37       ` Luca Ceresoli
  2012-12-20 12:44         ` Holger Brunck
  2012-12-20 16:02         ` Luca Ceresoli
  0 siblings, 2 replies; 12+ messages in thread
From: Luca Ceresoli @ 2012-12-19 17:37 UTC (permalink / raw)
  To: u-boot

Hi Andreas,

Andreas Bießmann wrote:
> Hi Luca,
>
> On 19.12.2012 16:56, Luca Ceresoli wrote:
>> Hi Andreas,
>>
>> Andreas Bießmann wrote:
>> ...
>>>> Creating 1 MTD partitions on "nand0":
>>>> 0x000000100000-0x000010000000 : "mtd=3"
>>>> UBI: attaching mtd1 to ubi0
>>>> UBI: physical eraseblock size:   131072 bytes (128 KiB)
>>>> UBI: logical eraseblock size:    129024 bytes
>>>> UBI: smallest flash I/O unit:    2048
>>>> UBI: sub-page size:              512
>>>> UBI: VID header offset:          512 (aligned 512)
>>>> UBI: data offset:                2048
>>>> UBI error: ubi_wl_init_scan: no enough physical eraseblocks (0, need 1)
>>>>
>>>> Now the device is totally blocked, and power cycling does not change
>>>> the result.
>>>
>>> have you tried to increase the malloc arena in u-boot
>>> (CONIG_SYS_MALLOC_LEN)?
>>> We had errors like this before [1],[2] and [3], maybe others -
>>> apparently with another error message, but please give it a try. We know
>>> ubi recovery needs some ram and 1MiB may be not enough.
>>
>> Thanks for your suggestion.
>>
>> Unfortunately this does not seem to be the cause of my problem: I tried
>> increasing my CONFIG_SYS_MALLOC_LEN in include/configs/dig297.h from
>> (1024 << 10) to both (1024 << 12) and (1024 << 14), but without any
>> difference.
>
> Well, ok ... Malloc arena is always my first thought if I read about
> problems with ubi in u-boot.
> Have you looked up the differences in drivers/mtd/ubi/ in your u-boot
> and linux tree? Maybe you can see something obviously different in the
> ubi_wl_init_scan()?

I did some days ago, but I double-checked now as you suggested. Indeed
there is an important difference: attach_by_scanning() (build.c) calls
ubi_wl_init_scan() and ubi_eba_init_scan() just like Linux does, but in
swapped order!

This swap dates back to:

commit d63894654df72b010de2abb4b3f07d0d755f65b6
Author: Holger Brunck <holger.brunck@keymile.com>
Date:   Mon Oct 10 13:08:19 2011 +0200

     UBI: init eba tables before wl when attaching a device

     This fixes that u-boot gets stuck when a bitflip was detected
     during "ubi part <ubi_device>". If a bitflip was detected UBI tries
     to copy the PEB to a different place. This needs that the eba table
     are initialized, but this was done after the wear levelling worker
     detects the bitflip. So changes the initialisation of these two
     tasks in u-boot.

     This is a u-boot specific patch and not needed in the linux layer,
     because due to commit 1b1f9a9d00447d
     UBI: Ensure that "background thread" operations are really executed
     we schedule these tasks in place and not as in linux after the inital
     task which schedule this new task is finished.

     Signed-off-by: Holger Brunck <holger.brunck@keymile.com>
     cc: Stefan Roese <sr@denx.de>
     Signed-off-by: Stefan Roese <sr@denx.de>

I tried reverting that commit and... surprise! U-Boot can now attach UBI
and boot properly!

But the cited commit actually fixed a bug that bit our board a few
months back, so it should not be reverted without thinking twice. Now
it has apparently introduced another bug. :-(

I'm Cc:ing the commit author for comments.

Nonetheless, I have evidence of a different behaviour between U-Boot
and Linux even before the two swapped functions are called.

What attach_by_scanning() does in Linux is (abbreviated):

static int attach_by_scanning(struct ubi_device *ubi)
{
         si = ubi_scan(ubi);
         ...fill ubi->some_fields...;
         err = ubi_read_volume_table(ubi, si);
         /* MARK */
         err = ubi_eba_init_scan(ubi, si); /* swapped in U-Boot */
         err = ubi_wl_init_scan(ubi, si);  /* swapped in U-Boot */
         ubi_scan_destroy_si(si);
         return 0;
}

See the two swapped calls.

At MARK, I printed some of the peb counters in *ubi, and I got
different results for ubi->avail_pebs between U-Boot and Linux:
U-Boot: UBI: POST_TBL: rsvd=2018, avail=21, beb_rsvd_{pebs,level}=0,0
Linux:  UBI: POST_TBL: rsvd=2018, avail=22, beb_rsvd_{pebs,level}=0,0

The printed values were equal before calling ubi_read_volume_table().
I have no idea where this difference comes from, nor whether it
can cause my troubles.
I will investigate further tomorrow by looking into ubi_read_volume_table().

Luca


* [U-Boot] Bricked when trying to attach UBI
  2012-12-19 17:32 ` Vikram Narayanan
@ 2012-12-19 18:22   ` Stefan Roese
  2012-12-19 18:47     ` Vikram Narayanan
  0 siblings, 1 reply; 12+ messages in thread
From: Stefan Roese @ 2012-12-19 18:22 UTC (permalink / raw)
  To: u-boot

On 12/19/2012 06:32 PM, Vikram Narayanan wrote:
>> On "bricked" devices the output of the "ubi part nand0,3" command is:
>>
>> Creating 1 MTD partitions on "nand0":
>> 0x000000100000-0x000010000000 : "mtd=3"
>> UBI: attaching mtd1 to ubi0
>> UBI: physical eraseblock size:   131072 bytes (128 KiB)
>> UBI: logical eraseblock size:    129024 bytes
>> UBI: smallest flash I/O unit:    2048
>> UBI: sub-page size:              512
>> UBI: VID header offset:          512 (aligned 512)
>> UBI: data offset:                2048
>> UBI error: ubi_wl_init_scan: no enough physical eraseblocks (0, need 1)
> 
> Just curious, What does the above command say when you try to attach an 
> empty partition. Does it result in the same error?
> 
>> Now the device is totally blocked, and power cycling does not change
>> the result.
>>
>> The interesting thing is that if I load Linux (2.6.37 + OMAP patches +
>> board support patches) via TFTP and boot it with bootm, it correctly
>> attaches UBI (fixing any problem it may have) and boots correctly.
>> After that the board is unbricked: U-Boot can boot again normally from
>> NAND.
>>
>> Without the ambition of understanding all UBI internals, I tried to
>> visually inspect the UBI code around the line where the error is
>> produced and compare it to the corresponding Linux sources. They looked
>> extremely similar, so I haven't and obvious hint of why U-Boot and
>> Linux produce different results.
>>
>> I also tried with an updated U-Boot master, but the error is still
>> there.
>>
>> Obviously I have changed nothing in the UBI and MTD code, both in
>> U-Boot and in Linux.
>>
>> Can you suggest a proper way to track the root of the problem, or to
>> bypass it?
> 
> I think its the right time to sync the UBI code with the current kernel 
> tree. But it seems like a huge work. Any suggestions?

Yes, syncing with the latest UBI/UBIFS code would be the best solution,
though trying an increased malloc area as suggested by Andreas
might be worth a shot.

And yes, this re-sync with the latest-and-greatest Linux code version is
of course a bigger task. It has been suggested to the celinux forum as part
of the "booting from a UBI volume" task:

http://lists.celinuxforum.org/pipermail/celinux-dev/2012-April/000543.html

But nothing has happened till now. Any volunteers? But please keep in
mind that intensive testing is required before the current (stable?)
code version can be replaced.

Thanks,
Stefan


* [U-Boot] Bricked when trying to attach UBI
  2012-12-19 18:22   ` Stefan Roese
@ 2012-12-19 18:47     ` Vikram Narayanan
  2012-12-19 18:57       ` Vikram Narayanan
  0 siblings, 1 reply; 12+ messages in thread
From: Vikram Narayanan @ 2012-12-19 18:47 UTC (permalink / raw)
  To: u-boot

On 12/19/2012 11:52 PM, Stefan Roese wrote:
<snip>
>> I think its the right time to sync the UBI code with the current kernel
>> tree. But it seems like a huge work. Any suggestions?
>
> Yes, syncing with the latest UBI/UBIFS code would be the best solution.
> Even though a try with an increased malloc area as suggested by Andreas
> might be a chance.
>
> And yes, this re-sync with the latest-and-greatest Linux code version is
> of course a bigger task. It has been suggest as part of booting from an
> UBI volume task to the celinux forum:
>
> http://lists.celinuxforum.org/pipermail/celinux-dev/2012-April/000543.html

Yeah. I asked some time back about the activity on this task.

> But nothing has happened till now. Any volunteers? But please keep in
> mind that intensive testing is required before the current (stable?)
> code version can be replaced.
>

Looks like the MTD layer might need to be patched up in some places
as well. What do you think?

Regards,
Vikram


* [U-Boot] Bricked when trying to attach UBI
  2012-12-19 18:47     ` Vikram Narayanan
@ 2012-12-19 18:57       ` Vikram Narayanan
  0 siblings, 0 replies; 12+ messages in thread
From: Vikram Narayanan @ 2012-12-19 18:57 UTC (permalink / raw)
  To: u-boot

On 12/20/2012 12:17 AM, Vikram Narayanan wrote:
> On 12/19/2012 11:52 PM, Stefan Roese wrote:
> <snip>
>>> I think its the right time to sync the UBI code with the current kernel
>>> tree. But it seems like a huge work. Any suggestions?
>>
>> Yes, syncing with the latest UBI/UBIFS code would be the best solution.
>> Even though a try with an increased malloc area as suggested by Andreas
>> might be a chance.
>>
>> And yes, this re-sync with the latest-and-greatest Linux code version is
>> of course a bigger task. It has been suggest as part of booting from an
>> UBI volume task to the celinux forum:
>>
>> http://lists.celinuxforum.org/pipermail/celinux-dev/2012-April/000543.html
>>
>
> Yeah. I had queried sometime back on the activity of this task.
>
>> But nothing has happened till now. Any volunteers? But please keep in
>> mind that intensive testing is required before the current (stable?)
>> code version can be replaced.
>>
>
> Looks like the MTD layer might needs to be patched up as well at some
> places. What do you think?

Maybe we should start some discussions and put forth some ideas, which
might eventually attract some volunteers.

What is your proposal for syncing with the latest code?
* Pick out changes from the kernel's git (pick out UBI-related commits
right from the point where the current U-Boot code is)
* Compare and move the code

Both are equally complicated, and the second option gives very little
chance of figuring out why a given change was added. Ideas are welcome.

Regards,
Vikram


* [U-Boot] Bricked when trying to attach UBI
  2012-12-19 17:37       ` Luca Ceresoli
@ 2012-12-20 12:44         ` Holger Brunck
  2012-12-20 16:02         ` Luca Ceresoli
  1 sibling, 0 replies; 12+ messages in thread
From: Holger Brunck @ 2012-12-20 12:44 UTC (permalink / raw)
  To: u-boot

Hi Luca,

On 12/19/2012 06:37 PM, Luca Ceresoli wrote:
> I had some days ago, but I double-checked now as you suggested. Indeed
> there is an important difference: attach_by_scanning() (build.c) calls
> ubi_wl_init_scan() and ubi_eba_init_scan() just like Linux does, but in
> a swapped order!
> 
> This swap dates back to:
> 
> commit d63894654df72b010de2abb4b3f07d0d755f65b6
> Author: Holger Brunck <holger.brunck@keymile.com>
> Date:   Mon Oct 10 13:08:19 2011 +0200
> 
>     UBI: init eba tables before wl when attaching a device
> 
>     This fixes that u-boot gets stuck when a bitflip was detected
>     during "ubi part <ubi_device>". If a bitflip was detected UBI tries
>     to copy the PEB to a different place. This needs that the eba table
>     are initialized, but this was done after the wear levelling worker
>     detects the bitflip. So changes the initialisation of these two
>     tasks in u-boot.
> 
>     This is a u-boot specific patch and not needed in the linux layer,
>     because due to commit 1b1f9a9d00447d
>     UBI: Ensure that "background thread" operations are really executed
>     we schedule these tasks in place and not as in linux after the inital
>     task which schedule this new task is finished.
> 
>     Signed-off-by: Holger Brunck <holger.brunck@keymile.com>
>     cc: Stefan Roese <sr@denx.de>
>     Signed-off-by: Stefan Roese <sr@denx.de>
> 
> I tried reverting that commit and... surprise! U-Boot can now attach UBI
> and boot properly!
> 

:-(

> But the cited commit actually fixed a bug that bite our board a few
> months back, so it should not be reverted without thinking twice. Now
> it apparently introduced another bug. :-(
> 

Yes, definitely.

I didn't read the whole thread, so I don't know what your exact problem is. On
my boards the UBI layer seems to work fine with the latest U-Boot. But I see a
general problem with the UBI layer in U-Boot. Let me try to summarize my view:

The UBI layer was initially copied from the Linux implementation. But the Linux
implementation relies on a background thread for some tasks, e.g. fixing
correctable errors. Because U-Boot is single threaded, there was one commit
that makes sure these background tasks are really executed (CC-ing
the author):
commit 1b1f9a9d00  UBI: Ensure that "background thread" operations are really
executed

U-Boot executes these background tasks immediately, but the Linux implementation
executes them later with the help of some synchronisation mechanism.
Therefore the two execute these tasks in a different order. My fix changed
the initialisation order of the EBA tables and the wear-leveling thread
to address my problem. But now it seems to cause a new problem on your side.

So the synchronisation mechanism in U-Boot for the UBI tasks that run in the
background on Linux is incorrect. How this could be fixed needs some deeper
analysis.
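
To illustrate the ordering issue, here is a toy model in C. It is only a sketch under simplifying assumptions: none of the toy_* names exist in U-Boot or Linux, the "work" is reduced to a single flag check, and it models only the bitflip case described in the commit message, not the new failure reported in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in U-Boot or Linux. */
struct toy_ubi {
    bool eba_ready;        /* set once the EBA tables are initialised */
    bool run_immediately;  /* true: work runs in place; false: deferred */
    bool work_pending;     /* a work item is queued but has not run yet */
    bool work_failed;      /* the work item ran before eba_ready was set */
};

/* Stand-in for "move a PEB that showed a bitflip"; needs the EBA tables. */
static void toy_bitflip_work(struct toy_ubi *u)
{
    if (!u->eba_ready)
        u->work_failed = true;
    u->work_pending = false;
}

/* Queue the work item; in immediate mode it runs right here. */
static void toy_schedule(struct toy_ubi *u)
{
    u->work_pending = true;
    if (u->run_immediately)
        toy_bitflip_work(u);
}

/* WL init notices a bitflip during attach and schedules the work. */
static void toy_wl_init(struct toy_ubi *u)
{
    toy_schedule(u);
}

static void toy_eba_init(struct toy_ubi *u)
{
    u->eba_ready = true;
}

/* Returns true if the work item ran with the EBA tables ready. */
static bool toy_attach(bool run_immediately, bool eba_first)
{
    struct toy_ubi u = { .run_immediately = run_immediately };

    if (eba_first) {
        toy_eba_init(&u);
        toy_wl_init(&u);
    } else {
        toy_wl_init(&u);
        toy_eba_init(&u);
    }

    /* Deferred work only runs once both init steps have finished. */
    if (u.work_pending)
        toy_bitflip_work(&u);

    assert(!u.work_pending);
    return !u.work_failed;
}
```

In this model, deferred execution succeeds with either init order, while immediate execution only succeeds when the EBA tables are initialised first. That is the situation my patch addressed by changing the order.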

Regards
Holger


* [U-Boot] Bricked when trying to attach UBI
  2012-12-19 17:37       ` Luca Ceresoli
  2012-12-20 12:44         ` Holger Brunck
@ 2012-12-20 16:02         ` Luca Ceresoli
  2013-01-02 14:37           ` Luca Ceresoli
  1 sibling, 1 reply; 12+ messages in thread
From: Luca Ceresoli @ 2012-12-20 16:02 UTC (permalink / raw)
  To: u-boot

Hi,

I'm Cc'ing the linux-mtd list as well as the authors of the Linux
commits cited below.

For these new readers: I reported a problem with U-Boot 2012.04.01 not
being able to attach an UBI partition in NAND, while Linux (2.6.37) can
attach and repair it.

It looks like an U-Boot bug, but I discovered strange things around the
chip->badblockbits variable (in the NAND code) by comparing the
relevant code in U-Boot and Linux.

Sorry for Cc'ing so many people, but while following this issue I was led
from one subsystem to another (and from U-Boot to Linux).

Previous discussion is here:
http://thread.gmane.org/gmane.comp.boot-loaders.u-boot/149624

Luca Ceresoli wrote:
> Hi Andreas,
>
> Andreas Bießmann wrote:
>> Hi Luca,
>>
>> On 19.12.2012 16:56, Luca Ceresoli wrote:
>>> Hi Andreas,
>>>
>>> Andreas Bießmann wrote:
>>> ...
>>>>> Creating 1 MTD partitions on "nand0":
>>>>> 0x000000100000-0x000010000000 : "mtd=3"
>>>>> UBI: attaching mtd1 to ubi0
>>>>> UBI: physical eraseblock size:   131072 bytes (128 KiB)
>>>>> UBI: logical eraseblock size:    129024 bytes
>>>>> UBI: smallest flash I/O unit:    2048
>>>>> UBI: sub-page size:              512
>>>>> UBI: VID header offset:          512 (aligned 512)
>>>>> UBI: data offset:                2048
>>>>> UBI error: ubi_wl_init_scan: no enough physical eraseblocks (0,
>>>>> need 1)
>>>>>
>>>>> Now the device is totally blocked, and power cycling does not change
>>>>> the result.
>>>>
>>>> have you tried to increase the malloc arena in u-boot
>>>> (CONIG_SYS_MALLOC_LEN)?
>>>> We had errors like this before [1],[2] and [3], maybe others -
>>>> apparently with another error message, but please give it a try. We
>>>> know
>>>> ubi recovery needs some ram and 1MiB may be not enough.
>>>
>>> Thanks for your suggestion.
>>>
>>> Unfortunately this does not seem to be the cause of my problem: I tried
>>> increasing my CONFIG_SYS_MALLOC_LEN in include/configs/dig297.h from
>>> (1024 << 10) to both (1024 << 12) and (1024 << 14), but without any
>>> difference.
>>
>> Well, ok ... Malloc arena is always my first thought if I read about
>> problems with ubi in u-boot.
>> Have you looked up the differences in drivers/mtd/ubi/ in your u-boot
>> and linux tree? Maybe you can see something obviously different in the
>> ubi_wl_init_scan()?
>
> I had some days ago, but I double-checked now as you suggested. Indeed
> there is an important difference: attach_by_scanning() (build.c) calls
> ubi_wl_init_scan() and ubi_eba_init_scan() just like Linux does, but in
> a swapped order!
>
> This swap dates back to:
>
> commit d63894654df72b010de2abb4b3f07d0d755f65b6
> Author: Holger Brunck <holger.brunck@keymile.com>
> Date:   Mon Oct 10 13:08:19 2011 +0200
>
>      UBI: init eba tables before wl when attaching a device
>
>      This fixes that u-boot gets stuck when a bitflip was detected
>      during "ubi part <ubi_device>". If a bitflip was detected UBI tries
>      to copy the PEB to a different place. This needs that the eba table
>      are initialized, but this was done after the wear levelling worker
>      detects the bitflip. So changes the initialisation of these two
>      tasks in u-boot.
>
>      This is a u-boot specific patch and not needed in the linux layer,
>      because due to commit 1b1f9a9d00447d
>      UBI: Ensure that "background thread" operations are really executed
>      we schedule these tasks in place and not as in linux after the inital
>      task which schedule this new task is finished.
>
>      Signed-off-by: Holger Brunck <holger.brunck@keymile.com>
>      cc: Stefan Roese <sr@denx.de>
>      Signed-off-by: Stefan Roese <sr@denx.de>
>
> I tried reverting that commit and... surprise! U-Boot can now attach UBI
> and boot properly!
>
> But the cited commit actually fixed a bug that bite our board a few
> months back, so it should not be reverted without thinking twice. Now
> it apparently introduced another bug. :-(
>
> I'm Cc:ing the commit author for comments.
>
> Nonetheless, I have evidence of a different behaviour between U-Boot
> and Linux even before the two swapped functions are called.
>
> What attach_by_scanning() does in Linux is (abbreviated):
>
> static int attach_by_scanning(struct ubi_device *ubi)
> {
>          si = ubi_scan(ubi);
>      ...fill ubi->some_fields...;
>          err = ubi_read_volume_table(ubi, si);
>      /* MARK */
>          err = ubi_eba_init_scan(ubi, si); /* swapped in U-Boot */
>          err = ubi_wl_init_scan(ubi, si);  /* swapped in U-Boot */
>          ubi_scan_destroy_si(si);
>          return 0;
> }
>
> See the two swapped calls.
>
> At MARK, I printed some of the peb counters in *ubi, and I got
> different results for ubi->avail_pebs between U-Boot and Linux:
> U-Boot: UBI: POST_TBL: rsvd=2018, avail=21, beb_rsvd_{pebs,level}=0,0
> Linux:  UBI: POST_TBL: rsvd=2018, avail=22, beb_rsvd_{pebs,level}=0,0
>
> The printed values were equal before calling ubi_read_volume_table().
> I have no idea about where this difference comes from, nor if this
> difference can cause my troubles.
> I will better investigate tomorrow looking into ubi_read_volume_table().

After half a day of debugging and an insane amount of printk()s added to
both U-Boot and Linux, I have some more hints to understand the problem.

The two different results quoted above show that U-Boot counted 21
available eraseblocks, while Linux counts 22. I am not sure if this can
cause my problem, but it's the first visible difference between U-Boot
and Linux.

This originates from ubi_scan() (scan.c): in U-Boot, it sets
si->bad_peb_count to 1, in Linux to 0. U-Boot's ubi_scan() is very
similar to Linux's, and the differences do not seem to be relevant in my
case. So let's dig down...

- ubi_scan() (scan.c) calls process_eb() (scan.c) for each EB
- process_eb() calls ubi_io_is_bad() (io.c), and if it returns >0 it
   increments si->bad_peb_count, which is what is happening to my board
   when executing U-Boot
- ubi_io_is_bad() calls mtd->block_isbad(), which points to
   nand_block_isbad() (nand_base.c)
- nand_block_isbad() is a wrapper to nand_block_checkbad() (nand_base.c)
- nand_block_checkbad() differs from the Linux code in something
   related to lazy bad block scanning (commit fb49454b1b6c7c6, Feb 2012),
   but this does not seem to change the behaviour I observe;
- nand_block_checkbad() calls either chip->block_bad() or
   nand_isbad_bbt(); I tracked only into the former, but I suspect the
   latter produces the same effects with regard to the problem I'm facing
- chip->block_bad() points to nand_block_bad() (nand_base.c)

nand_block_bad() (nand_base.c) does the following:
static int nand_block_bad(struct mtd_info *mtd, loff_t ofs, int getchip)
{
	...

         if (likely(chip->badblockbits == 8))
                 res = bad != 0xFF;
         else
                 res = hweight8(bad) < chip->badblockbits;

         if (getchip)
                 nand_release_device(mtd);

         return res;
}

I don't understand the algorithm, but the relevant variables have these
values:
U-Boot: nand_block_bad: chip->badblockbits=8, bad=0000, hweight8(bad)=0
Linux:  nand_block_bad: chip->badblockbits=0, bad=0000, hweight8(bad)=0
                                            ^

Obviously U-Boot and Linux produce different return values here.
This propagates up to si->bad_peb_count in ubi_scan(), and from there
it changes the behaviour of the following code, leading to a failed
attach in U-Boot and a successful attach in Linux.

chip->badblockbits in current Linux master is described as
"minimum number of set bits in a good block's bad block marker
position; i.e., BBM == 11110111b is not bad when badblockbits == 7".
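
To make the two cases concrete, here is a minimal standalone sketch of
the same decision. The helper names hweight8_sketch() and
marker_is_bad() are made up for illustration and are not part of the
NAND code; hweight8_sketch() only stands in for the kernel's hweight8()
(count of set bits in a byte).

```c
#include <stdint.h>

/* Count the set bits in one byte (stand-in for the kernel's hweight8()). */
int hweight8_sketch(uint8_t b)
{
	int n = 0;

	while (b) {
		n += b & 1;
		b >>= 1;
	}
	return n;
}

/*
 * Same decision as the tail of nand_block_bad() quoted above:
 * returns 1 if the marker byte means "bad block", 0 otherwise.
 */
int marker_is_bad(uint8_t bad, int badblockbits)
{
	if (badblockbits == 8)
		return bad != 0xFF;
	return hweight8_sketch(bad) < badblockbits;
}
```

With the values printed above (bad = 0x00), marker_is_bad(0x00, 8)
reports the block as bad, while marker_is_bad(0x00, 0) reports it as
good. Since a bit count is never negative, the second branch can never
report a bad block when badblockbits is 0. The example from the Linux
comment also holds: marker_is_bad(0xF7, 7) is 0, because 0xF7 has 7 bits
set.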

Still a bit obscure to me because I don't have a general picture.
Anyway, here's how its value comes to be different between U-Boot
(2012.04.01) and Linux (2.6.37).

Linux:
a) commit e0b58d0a7005, Feb 2010:
    mtd: nand: add ->badblockbits for minimum number of set bits in bad
    block byte
    declared the new variable and introduced in nand_get_flash_type()
    (nand_base.c) the following line:
      chip->badblockbits = 8;
b) commit c7b28e25cb9, Jul 2010:
    mtd: nand: refactor BB marker detection
    removed from nand_get_flash_type() (nand_base.c) the same line:
      chip->badblockbits = 8;
c) commit 26d9be11485e, Apr 2011:
    mtd: return badblockbits back
    restored in nand_get_flash_type() (nand_base.c) the following line:
      chip->badblockbits = 8;
   claiming it had been accidentally removed in commit b).

The version of Linux I'm using (2.6.37), contains commits a) and b), so
it has chip->badblockbits equal to 0. According to the log message of
commit c), this should be wrong, but the resulting kernel works!

The version of U-Boot (2012.04.01) contains the result of all 3 commits,
since

   commit 2a8e0fc8b3dc31a3c571e439fbf04b882c8986be
   Author: Christian Hitz <christian.hitz@aizo.com>
   Date:   Wed Oct 12 09:32:02 2011 +0200

       nand: Merge changes from Linux nand driver

       [backport from linux commit
           02f8c6aee8df3cdc935e9bdd4f2d020306035dbe]

       This patch synchronizes the nand driver with the Linux 3.0 state.

This looks like an improvement, but it bricks my board!

I could not resist, and without even trying to understand what I was
doing, I did in U-Boot's nand_get_flash_type() (nand_base.c):

-       chip->badblockbits = 8;
+       chip->badblockbits = 0;

and guess what? U-Boot attached UBI, loaded Linux from it and booted
successfully!

No, I don't think changing lines here and there without any real
understanding is a way to produce reliable software. But I'm unable
to understand why the software that should work better actually bricks
the board, while the other one runs fine. And how do I know what the
correct value for chip->badblockbits should be?

And last but most important: how can I properly fix U-Boot?

Thanks,
Luca


* [U-Boot] Bricked when trying to attach UBI
  2012-12-20 16:02         ` Luca Ceresoli
@ 2013-01-02 14:37           ` Luca Ceresoli
  0 siblings, 0 replies; 12+ messages in thread
From: Luca Ceresoli @ 2013-01-02 14:37 UTC (permalink / raw)
  To: u-boot

Luca Ceresoli wrote:
> Hi,
>
> I'm Cc'ing the linux-mtd list as well as the authors of the Linux
> commits cited below.
>
> For these new readers: I reported a problem with U-Boot 2012.04.01 not
> being able to attach an UBI partition in NAND, while Linux (2.6.37) can
> attach and repair it.
>
> It looks like an U-Boot bug, but I discovered strange things around the
> chip->badblockbits variable (in the NAND code) by comparing the
> relevant code in U-Boot and Linux.
>
> Sorry for Cc'ing so many people, but following this issue I was lead
> from one subsystem to another (and from U-Boot to Linux).
>
> Previous discussion is here:
> http://thread.gmane.org/gmane.comp.boot-loaders.u-boot/149624
>
> Luca Ceresoli wrote:
>> Hi Andreas,
>>
>> Andreas Bießmann wrote:
>>> Hi Luca,
>>>
>>> On 19.12.2012 16:56, Luca Ceresoli wrote:
>>>> Hi Andreas,
>>>>
>>>> Andreas Bießmann wrote:
>>>> ...
>>>>>> Creating 1 MTD partitions on "nand0":
>>>>>> 0x000000100000-0x000010000000 : "mtd=3"
>>>>>> UBI: attaching mtd1 to ubi0
>>>>>> UBI: physical eraseblock size:   131072 bytes (128 KiB)
>>>>>> UBI: logical eraseblock size:    129024 bytes
>>>>>> UBI: smallest flash I/O unit:    2048
>>>>>> UBI: sub-page size:              512
>>>>>> UBI: VID header offset:          512 (aligned 512)
>>>>>> UBI: data offset:                2048
>>>>>> UBI error: ubi_wl_init_scan: no enough physical eraseblocks (0,
>>>>>> need 1)
>>>>>>
>>>>>> Now the device is totally blocked, and power cycling does not change
>>>>>> the result.
>>>>>
>>>>> have you tried to increase the malloc arena in u-boot
>>>>> (CONIG_SYS_MALLOC_LEN)?
>>>>> We had errors like this before [1],[2] and [3], maybe others -
>>>>> apparently with another error message, but please give it a try. We
>>>>> know
>>>>> ubi recovery needs some ram and 1MiB may be not enough.
>>>>
>>>> Thanks for your suggestion.
>>>>
>>>> Unfortunately this does not seem to be the cause of my problem: I 
>>>> tried
>>>> increasing my CONFIG_SYS_MALLOC_LEN in include/configs/dig297.h from
>>>> (1024 << 10) to both (1024 << 12) and (1024 << 14), but without any
>>>> difference.
>>>
>>> Well, ok ... Malloc arena is always my first thought if I read about
>>> problems with ubi in u-boot.
>>> Have you looked up the differences in drivers/mtd/ubi/ in your u-boot
>>> and linux tree? Maybe you can see something obviously different in the
>>> ubi_wl_init_scan()?
>>
>> I had some days ago, but I double-checked now as you suggested. Indeed
>> there is an important difference: attach_by_scanning() (build.c) calls
>> ubi_wl_init_scan() and ubi_eba_init_scan() just like Linux does, but in
>> a swapped order!
>>
>> This swap dates back to:
>>
>> commit d63894654df72b010de2abb4b3f07d0d755f65b6
>> Author: Holger Brunck <holger.brunck@keymile.com>
>> Date:   Mon Oct 10 13:08:19 2011 +0200
>>
>>      UBI: init eba tables before wl when attaching a device
>>
>>      This fixes that u-boot gets stuck when a bitflip was detected
>>      during "ubi part <ubi_device>". If a bitflip was detected UBI tries
>>      to copy the PEB to a different place. This needs that the eba table
>>      are initialized, but this was done after the wear levelling worker
>>      detects the bitflip. So changes the initialisation of these two
>>      tasks in u-boot.
>>
>>      This is a u-boot specific patch and not needed in the linux layer,
>>      because due to commit 1b1f9a9d00447d
>>      UBI: Ensure that "background thread" operations are really executed
>>      we schedule these tasks in place and not as in linux after the 
>> inital
>>      task which schedule this new task is finished.
>>
>>      Signed-off-by: Holger Brunck <holger.brunck@keymile.com>
>>      cc: Stefan Roese <sr@denx.de>
>>      Signed-off-by: Stefan Roese <sr@denx.de>
>>
>> I tried reverting that commit and... surprise! U-Boot can now attach UBI
>> and boot properly!
>>
>> But the cited commit actually fixed a bug that bite our board a few
>> months back, so it should not be reverted without thinking twice. Now
>> it apparently introduced another bug. :-(
>>
>> I'm Cc:ing the commit author for comments.
>>
>> Nonetheless, I have evidence of a different behaviour between U-Boot
>> and Linux even before the two swapped functions are called.
>>
>> What attach_by_scanning() does in Linux is (abbreviated):
>>
>> static int attach_by_scanning(struct ubi_device *ubi)
>> {
>>          si = ubi_scan(ubi);
>>      ...fill ubi->some_fields...;
>>          err = ubi_read_volume_table(ubi, si);
>>      /* MARK */
>>          err = ubi_eba_init_scan(ubi, si); /* swapped in U-Boot */
>>          err = ubi_wl_init_scan(ubi, si);  /* swapped in U-Boot */
>>          ubi_scan_destroy_si(si);
>>          return 0;
>> }
>>
>> See the two swapped calls.
>>
>> At MARK, I printed some of the peb counters in *ubi, and I got
>> different results for ubi->avail_pebs between U-Boot and Linux:
>> U-Boot: UBI: POST_TBL: rsvd=2018, avail=21, beb_rsvd_{pebs,level}=0,0
>> Linux:  UBI: POST_TBL: rsvd=2018, avail=22, beb_rsvd_{pebs,level}=0,0
>>
>> The printed values were equal before calling ubi_read_volume_table().
>> I have no idea about where this difference comes from, nor if this
>> difference can cause my troubles.
>> I will better investigate tomorrow looking into ubi_read_volume_table().
>
> After half a day of debugging and an insane amount of printk()s added to
> both U-Boot and Linux, I have some more hints to understand the problem.
>
> The two different results quoted above show that U-Boot counted 21
> available eraseblocks, while Linux counts 22. I am not sure if this can
> cause my problem, but it's the first visible difference between U-Boot
> and Linux.
>
> This originates from ubi_scan() (scan.c): in U-Boot, it sets
> si->bad_peb_count to 1, in Linux to 0. U-Boot's ubi_scan() is very
> similar to Linux's, and the differences do not seem to relevant in my 
> case. So let's dig down...
>
> - ubi_scan() (scan.c) calls process_eb() (scan.c) for each EB
> - process_eb() calls ubi_io_is_bad() (io.c), and if it returns >0 it
>   increments si->bad_peb_count, which is what is happening to my board
>   when executing U-Boot
> - ubi_io_is_bad() calls mtd->block_isbad(), which points to
>   nand_block_isbad() (nand_base.c)
> - nand_block_isbad() is a wrapper to nand_block_checkbad() (nand_base.c)
> - nand_block_checkbad() differs from the Linux code in something
>   related to lazy bad block scanning (commit fb49454b1b6c7c6, Feb 2012),
>   but this does not seem to change the behaviour I observe;
> - nand_block_checkbad() calls either chip->block_bad() or
>   nand_isbad_bbt(); I tracked only into the former, but I suspect the
>   latter produces the same effects with regard to the problem I'm facing
> - chip->block_bad() points to nand_block_bad() (nand_base.c)
>
> nand_block_bad() (nand_base.c) does the following:
> static int nand_block_bad(struct mtd_info *mtd, loff_t ofs, int getchip)
> {
>     ...
>
>         if (likely(chip->badblockbits == 8))
>                 res = bad != 0xFF;
>         else
>                 res = hweight8(bad) < chip->badblockbits;
>
>         if (getchip)
>                 nand_release_device(mtd);
>
>         return res;
> }
>
> I don't understand the algorithm, but the relevant variables have these
> values:
> U-Boot: nand_block_bad: chip->badblockbits=8, bad=0000, hweight8(bad)=0
> Linux:  nand_block_bad: chip->badblockbits=0, bad=0000, hweight8(bad)=0
>                                            ^
>
> Obviously the U-Boot and Linux produce a different return value.
> This propagates up to ubi->bad_peb_count in ubi_scan(), and from there
> it changes the behaviour of the following code, leading to a block in
> U-Boot and a successful attach in Linux.
>
> chip->badblockbits in current Linux master is described as
> "minimum number of set bits in a good block's bad block marker
> position; i.e., BBM == 11110111b is not bad when badblockbits == 7".
>
> Still a bit obscure to me because I don't have a general picture.
> Anyway, here's how its value comes to be different between U-Boot
> (2012.04.01) and Linux (2.6.37).
>
> Linux:
> a) commit e0b58d0a7005, Feb 2010:
>    mtd: nand: add ->badblockbits for minimum number of set bits in bad
>    block byte
>    declared the new variable and introduced in nand_get_flash_type()
>    (nand_base.c) the following line:
>      chip->badblockbits = 8;
> b) commit c7b28e25cb9, Jul 2010:
>    mtd: nand: refactor BB marker detection
>    removed from nand_get_flash_type() (nand_base.c) the same line:
>      chip->badblockbits = 8;
> c) commit 26d9be11485e, Apr 2011:
>    mtd: return badblockbits back
>    restored in nand_get_flash_type() (nand_base.c) the following line:
>      chip->badblockbits = 8;
>   claiming it had been accidentally removed in commit b).
>
> The version of Linux I'm using (2.6.37), contains commits a) and b), so
> it has chip->badblockbits equal to 0. According to the log message of
> commit c), this should be wrong, but the resulting kernel works!
>
> The version of U-Boot (2012.04.01) contains the result of all 3 commits,
> since
>
>   commit 2a8e0fc8b3dc31a3c571e439fbf04b882c8986be
>   Author: Christian Hitz <christian.hitz@aizo.com>
>   Date:   Wed Oct 12 09:32:02 2011 +0200
>
>       nand: Merge changes from Linux nand driver
>
>       [backport from linux commit
>           02f8c6aee8df3cdc935e9bdd4f2d020306035dbe]
>
>       This patch synchronizes the nand driver with the Linux 3.0 state.
>
> This looks like an improvement, but it bricks my board!
>
> I could not resist, and without even trying to understand what I was
> doing, I did in U-Boot's nand_get_flash_type() (nand_base.c):
>
> -       chip->badblockbits = 8;
> +       chip->badblockbits = 0;
>
> and guess what? U-Boot attached UBI, loaded Linux from it and booted
> successfully!
>
> No, I don't think changing lines here and there without any real
> understanding is a way to produce reliable software. But I'm unable
> to understand why the software that should work better actually bricks
> the board and the other one runs fine? And how do I know what the
> correct value for chip->badblockbits should be?
>
> And last but most important: how can I properly fix U-Boot?

I had another look at the commit that swapped the calls to
ubi_eba_init_scan() and ubi_wl_init_scan(), and I noticed that it
changed the computation of the available PEB count.

In the original (pre-swap) code, running on a working board:

static int attach_by_scanning(struct ubi_device *ubi)
{
          si = ubi_scan(ubi);
      ...fill ubi->some_fields...;
          err = ubi_read_volume_table(ubi, si);

            /* here rsvd=2018, avail=22, beb_rsvd_{pebs,level}=0,0 */

          err = ubi_wl_init_scan(ubi, si);  /* swapped in U-Boot */

            /* here rsvd=2019, avail=21, beb_rsvd_{pebs,level}=0,0 ***** */

          err = ubi_eba_init_scan(ubi, si); /* swapped in U-Boot */
          ubi_scan_destroy_si(si);
          return 0;
}


In the current (post-swap) code, running on the same board:

static int attach_by_scanning(struct ubi_device *ubi)
{
          si = ubi_scan(ubi);
      ...fill ubi->some_fields...;
          err = ubi_read_volume_table(ubi, si);

            /* here rsvd=2018, avail=22, beb_rsvd_{pebs,level}=0,0 */

          err = ubi_eba_init_scan(ubi, si); /* swapped in U-Boot */

            /* here rsvd=2039, avail=1, beb_rsvd_{pebs,level}=20,20 ***** */

          err = ubi_wl_init_scan(ubi, si);  /* swapped in U-Boot */
          ubi_scan_destroy_si(si);
          return 0;
}

Notice the difference on the lines marked with "*****": after the swap,
the number of available PEBs at that point changed from 21 to 1.

According to the docs, UBI reserves some PEBs for bad PEB handling. By
default, in my 2048-PEB NAND, it reserved 20 PEBs, which is more than
enough to recover from a few bad PEBs. These should be counted as part
of the "available" PEBs. But current U-Boot (incorrectly?) thinks there
is only 1 available PEB. On a bricked board, it thinks there are 0, so
it cannot attach UBI.
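
The accounting can be modelled with a small standalone sketch. This is
not the UBI code: the names below are hypothetical, the constants are
inferred only from the counters printed above (the WL step takes 1 PEB,
the EBA step takes 1 PEB plus up to 20 for bad PEB handling), and the
capping of the bad PEB reserve when fewer PEBs are left is an
assumption.

```c
/* Counters as printed in the debug output above. */
struct peb_counts {
	int avail;    /* models ubi->avail_pebs */
	int rsvd;     /* models ubi->rsvd_pebs */
	int beb_rsvd; /* models ubi->beb_rsvd_pebs */
};

enum {
	WL_NEED   = 1,  /* inferred: rsvd went 2018 -> 2019 in the WL step */
	EBA_NEED  = 1,  /* inferred: EBA step took 21 PEBs, 20 for bad PEBs */
	BEB_LEVEL = 20, /* beb_rsvd_level as printed above */
};

/* Model of the WL step: fails if fewer than WL_NEED PEBs are available. */
int wl_reserve(struct peb_counts *c)
{
	if (c->avail < WL_NEED)
		return -1;
	c->avail -= WL_NEED;
	c->rsvd += WL_NEED;
	return 0;
}

/* Model of the EBA step: takes its own PEB, then the bad PEB reserve. */
int eba_reserve(struct peb_counts *c)
{
	if (c->avail < EBA_NEED)
		return -1;
	c->avail -= EBA_NEED;
	c->rsvd += EBA_NEED;
	/* Assumption: the reserve is capped by what is still available. */
	c->beb_rsvd = c->avail < BEB_LEVEL ? c->avail : BEB_LEVEL;
	c->avail -= c->beb_rsvd;
	c->rsvd += c->beb_rsvd;
	return 0;
}

/* Run both steps in either order; returns 0 on success, -1 on failure. */
int attach_model(struct peb_counts *c, int eba_first)
{
	if (eba_first)
		return (eba_reserve(c) || wl_reserve(c)) ? -1 : 0;
	return (wl_reserve(c) || eba_reserve(c)) ? -1 : 0;
}
```

Starting from rsvd=2018, avail=22, eba_reserve() gives rsvd=2039,
avail=1, beb_rsvd=20, and wl_reserve() gives rsvd=2019, avail=21, which
matches the printed values. If a board starts with one PEB fewer
(avail=21, an assumed value for a bricked board), the EBA-first order
leaves 0 PEBs for the WL step and the model fails, while the WL-first
order still succeeds.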

I have no fix for this, but I tried a simple workaround: instead of
using all the available space for my logical volumes, I created them
with a smaller size, leaving 32 unused PEBs. Now, in
attach_by_scanning(), I got:

pre-swap:  rsvd=1987, avail=53, beb_rsvd_{pebs,level}=0,0
post-swap: rsvd=2007, avail=33, beb_rsvd_{pebs,level}=20,20

The computed number of available PEBs is exactly 32 units bigger than it
used to be. This means that, even after the swap, U-Boot thinks there
are plenty of available PEBs.

To try to simulate a board that has bad blocks, I then marked some blocks as
bad using 'nand markbad' in U-Boot. The number of available PEBs decreases
accordingly, but is still >0 and U-Boot can attach UBI and boot.

So, it seems that leaving some unused PEBs is a workaround to this problem!
I'm not 100% sure this is ok and will go on to better understand the
problem. Any comments are welcome.

Luca


end of thread, other threads:[~2013-01-02 14:37 UTC | newest]

Thread overview: 12+ messages
2012-12-19 11:28 [U-Boot] Bricked when trying to attach UBI Luca Ceresoli
2012-12-19 15:24 ` Andreas Bießmann
2012-12-19 15:56   ` Luca Ceresoli
2012-12-19 16:09     ` Andreas Bießmann
2012-12-19 17:37       ` Luca Ceresoli
2012-12-20 12:44         ` Holger Brunck
2012-12-20 16:02         ` Luca Ceresoli
2013-01-02 14:37           ` Luca Ceresoli
2012-12-19 17:32 ` Vikram Narayanan
2012-12-19 18:22   ` Stefan Roese
2012-12-19 18:47     ` Vikram Narayanan
2012-12-19 18:57       ` Vikram Narayanan
