public inbox for linux-mtd@lists.infradead.org
 help / color / mirror / Atom feed
* UBI wl_tree_add problems after PEB scrubbed
@ 2012-11-30 15:05 Zach Sadecki
  2012-12-03 10:48 ` Artem Bityutskiy
  2012-12-03 14:46 ` Artem Bityutskiy
  0 siblings, 2 replies; 18+ messages in thread
From: Zach Sadecki @ 2012-11-30 15:05 UTC (permalink / raw)
  To: linux-mtd

Every time I see UBI scrub a PEB with fixable bit-flips (on my custom 
Freescale i.MX28 board) the background thread has problems shortly 
thereafter.  I'm not exactly sure where to start debugging this and I'm 
hoping someone can help point me in the right direction.  Below are 
kernel messages showing the problem from 2 different runs (in which both 
ended up with a hung CPU).  This is using kernel 3.7-rc7.

Also worth noting is that I had to modify the gpmi-nand driver to 
actually report max_bitflips back to the MTD layer to even get to this 
point (before that everything would just run along happily until it hit 
an uncorrectable ECC error).  I will submit a patch for this once 
everything seems OK...

Kernel messages:
[18444.860000] UBI: fixable bit-flip detected at PEB 2789
[18444.870000] UBI: schedule PEB 2789 for scrubbing
[18444.960000] UBI: fixable bit-flip detected at PEB 2789
[18445.120000] UBI: scrubbed PEB 2789 (LEB 3:544), data moved to PEB 1574
[18457.040000] UBI assert failed in wl_tree_add at 204 (pid 31)
[18457.040000] [<c000d8fc>] (unwind_backtrace+0x0/0xf4) from 
[<c0279758>] (wl_tree_add+0xb8/0xd0)
[18457.050000] [<c0279758>] (wl_tree_add+0xb8/0xd0) from [<c027a008>] 
(erase_worker+0x3bc/0x6c0)
[18457.060000] [<c027a008>] (erase_worker+0x3bc/0x6c0) from [<c027961c>] 
(do_work+0x8c/0x110)
[18457.070000] [<c027961c>] (do_work+0x8c/0x110) from [<c027adf4>] 
(ubi_thread+0xc4/0x148)
[18457.080000] [<c027adf4>] (ubi_thread+0xc4/0x148) from [<c00370c8>] 
(kthread+0xa0/0xac)
[18457.090000] [<c00370c8>] (kthread+0xa0/0xac) from [<c00092e8>] 
(ret_from_fork+0x14/0x2c)
[21386.860000] UBI: fixable bit-flip detected at PEB 2467
[21386.870000] UBI: schedule PEB 2467 for scrubbing
[21387.000000] UBI: fixable bit-flip detected at PEB 2467
[21387.190000] UBI: scrubbed PEB 2467 (LEB 3:490), data moved to PEB 1574
[21400.920000] UBI assert failed in wl_tree_add at 204 (pid 31)
[21400.920000] [<c000d8fc>] (unwind_backtrace+0x0/0xf4) from 
[<c0279758>] (wl_tree_add+0xb8/0xd0)
[21400.930000] [<c0279758>] (wl_tree_add+0xb8/0xd0) from [<c027a008>] 
(erase_worker+0x3bc/0x6c0)
[21400.940000] [<c027a008>] (erase_worker+0x3bc/0x6c0) from [<c027961c>] 
(do_work+0x8c/0x110)
[21400.950000] [<c027961c>] (do_work+0x8c/0x110) from [<c027adf4>] 
(ubi_thread+0xc4/0x148)
[21400.960000] [<c027adf4>] (ubi_thread+0xc4/0x148) from [<c00370c8>] 
(kthread+0xa0/0xac)
[21400.970000] [<c00370c8>] (kthread+0xa0/0xac) from [<c00092e8>] 
(ret_from_fork+0x14/0x2c)
[26168.070000] BUG: soft lockup - CPU#0 stuck for 22s! [ubi_bgt0d:31]
[26168.070000] irq event stamp: 27382734
[26168.070000] hardirqs last  enabled at (27382733): [<c0008e88>] 
__irq_svc+0x48/0x54
[26168.070000] hardirqs last disabled at (27382734): [<c0008e74>] 
__irq_svc+0x34/0x54
[26168.070000] softirqs last  enabled at (27382732): [<c001ea48>] 
__do_softirq+0x1ac/0x22c
[26168.070000] softirqs last disabled at (27382725): [<c001ebb0>] 
irq_exit+0x84/0x94
[26168.070000]
[26168.070000] Pid: 31, comm:            ubi_bgt0d
[26168.070000] CPU: 0    Not tainted  (3.7.0-rc7-itw #15)
[26168.070000] PC is at wl_tree_add+0x44/0xd0
[26168.070000] LR is at erase_worker+0x3bc/0x6c0
[26168.070000] pc : [<c02796e4>]    lr : [<c027a008>] psr: 80000013
[26168.070000] sp : c76d9e90  ip : c08ea20c  fp : c763add8
[26168.070000] r10: 00000000  r9 : 00000043  r8 : c763ad78
[26168.070000] r7 : c76d8000  r6 : c77073c0  r5 : c7707410  r4 : c7707408
[26168.070000] r3 : 00000046  r2 : 00000044  r1 : c7707348  r0 : c77073c0
[26168.070000] Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM 
Segment kernel
[26168.070000] Control: 0005317f  Table: 460a0000  DAC: 00000017
[26168.070000] [<c000d8fc>] (unwind_backtrace+0x0/0xf4) from 
[<c005fc94>] (watchdog_timer_fn+0x15c/0x1b0)
[26168.070000] [<c005fc94>] (watchdog_timer_fn+0x15c/0x1b0) from 
[<c003b24c>] (__run_hrtimer+0xd8/0x1e8)
[26168.070000] [<c003b24c>] (__run_hrtimer+0xd8/0x1e8) from [<c003b724>] 
(hrtimer_interrupt+0x104/0x348)
[26168.070000] [<c003b724>] (hrtimer_interrupt+0x104/0x348) from 
[<c0013390>] (mxs_timer_interrupt+0x20/0x30)
[26168.070000] [<c0013390>] (mxs_timer_interrupt+0x20/0x30) from 
[<c00603e4>] (handle_irq_event_percpu+0x70/0x24c)
[26168.070000] [<c00603e4>] (handle_irq_event_percpu+0x70/0x24c) from 
[<c00605fc>] (handle_irq_event+0x3c/0x5c)
[26168.070000] [<c00605fc>] (handle_irq_event+0x3c/0x5c) from 
[<c0062d70>] (handle_level_irq+0x90/0x120)
[26168.070000] [<c0062d70>] (handle_level_irq+0x90/0x120) from 
[<c0060360>] (generic_handle_irq+0x30/0x3c)
[26168.070000] [<c0060360>] (generic_handle_irq+0x30/0x3c) from 
[<c0009abc>] (handle_IRQ+0x30/0x84)
[26168.070000] [<c0009abc>] (handle_IRQ+0x30/0x84) from [<c0008760>] 
(icoll_handle_irq+0x34/0x48)
[26168.070000] [<c0008760>] (icoll_handle_irq+0x34/0x48) from 
[<c0008e84>] (__irq_svc+0x44/0x54)
[26168.070000] Exception stack(0xc76d9e48 to 0xc76d9e90)
[26168.070000] 9e40:                   c77073c0 c7707348 00000044 
00000046 c7707408 c7707410
[26168.070000] 9e60: c77073c0 c76d8000 c763ad78 00000043 00000000 
c763add8 c08ea20c c76d9e90
[26168.070000] 9e80: c027a008 c02796e4 80000013 ffffffff
[26168.070000] [<c0008e84>] (__irq_svc+0x44/0x54) from [<c02796e4>] 
(wl_tree_add+0x44/0xd0)
[26168.070000] [<c02796e4>] (wl_tree_add+0x44/0xd0) from [<c027a008>] 
(erase_worker+0x3bc/0x6c0)
[26168.070000] [<c027a008>] (erase_worker+0x3bc/0x6c0) from [<c027961c>] 
(do_work+0x8c/0x110)
[26168.070000] [<c027961c>] (do_work+0x8c/0x110) from [<c027adf4>] 
(ubi_thread+0xc4/0x148)
[26168.070000] [<c027adf4>] (ubi_thread+0xc4/0x148) from [<c00370c8>] 
(kthread+0xa0/0xac)
[26168.070000] [<c00370c8>] (kthread+0xa0/0xac) from [<c00092e8>] 
(ret_from_fork+0x14/0x2c)



[13111.940000] UBI: fixable bit-flip detected at PEB 2822
[13111.940000] UBI: schedule PEB 2822 for scrubbing
[13112.030000] UBI: fixable bit-flip detected at PEB 2822
[13112.200000] UBI: scrubbed PEB 2822 (LEB 3:634), data moved to PEB 2766
[13124.260000] UBI assert failed in wl_tree_add at 204 (pid 31)
[13124.270000] [<c000d8fc>] (unwind_backtrace+0x0/0xf4) from 
[<c027974c>] (wl_tree_add+0xb8/0xd0)
[13124.280000] [<c027974c>] (wl_tree_add+0xb8/0xd0) from [<c0279ffc>] 
(erase_worker+0x3bc/0x6c0)
[13124.290000] [<c0279ffc>] (erase_worker+0x3bc/0x6c0) from [<c0279610>] 
(do_work+0x8c/0x110)
[13124.300000] [<c0279610>] (do_work+0x8c/0x110) from [<c027ade8>] 
(ubi_thread+0xc4/0x148)
[13124.300000] [<c027ade8>] (ubi_thread+0xc4/0x148) from [<c00370c8>] 
(kthread+0xa0/0xac)
[13124.310000] [<c00370c8>] (kthread+0xa0/0xac) from [<c00092e8>] 
(ret_from_fork+0x14/0x2c)
[13361.320000] Unable to handle kernel NULL pointer dereference at 
virtual address 00000000
[13361.330000] pgd = c5c64000
[13361.330000] [00000000] *pgd=45c4f831, *pte=00000000, *ppte=00000000
[13361.340000] Internal error: Oops: 817 [#1] ARM
[13361.340000] CPU: 0    Not tainted  (3.7.0-rc7-itw #12)
[13361.340000] PC is at prot_queue_del+0x7c/0xcc
[13361.340000] LR is at ubi_wl_put_peb+0x1d8/0x2d8
[13361.340000] pc : [<c0279070>]    lr : [<c027ac24>] psr: 60000013
[13361.340000] sp : c6f07cd0  ip : c7720228  fp : 00000265
[13361.340000] r10: 00000f75  r9 : 00000003  r8 : c763add8
[13361.340000] r7 : c763ad70  r6 : c763a000  r5 : c7720228  r4 : c763ad74
[13361.340000] r3 : 00200200  r2 : c77202b9  r1 : 00000000  r0 : c763a000
[13361.340000] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM 
Segment user
[13361.340000] Control: 0005317f  Table: 45c64000  DAC: 00000015
[13361.340000] Process sqlite3 (pid: 6567, stack limit = 0xc6f061b8)
[13361.340000] Stack: (0xc6f07cd0 to 0xc6f08000)
[13361.340000] 7cc0:                                     c763ad74 
c027ac24 00000000 c0056088
[13361.340000] 7ce0: 00000000 c763acc4 c76cfc00 00000000 c763a000 
00000003 00000f75 00000265
[13361.340000] 7d00: 00000000 c0275a94 00000000 c03cf0c8 c6f07dc0 
c6e65000 00000265 c6e65000
[13361.340000] 7d20: c6cd6000 c6f07dc0 00000800 c019b45c c6f07dc0 
00000000 c6f07dc0 00000000
[13361.340000] 7d40: c6e65000 c01a779c 00000000 00000000 00000000 
00000000 c6e65b20 00000001
[13361.340000] 7d60: c6cd6000 00000265 c6e652f4 c6e652f4 c6e65000 
c6cd6108 c6cd6000 00000000
[13361.340000] 7d80: 000001b5 c6e65000 00009000 c6cd6000 c6f07dc0 
00000800 00000000 c01a7e30
[13361.340000] 7da0: c01ab440 00000007 00000000 c6f06000 c6c76180 
c6f06000 c6cd6024 c6e65a80
[13361.340000] 7dc0: 00000000 0001f000 00000010 00000265 c5c09ca8 
c6e65bd0 c6cd612c ffffffe4
[13361.340000] 7de0: 000001b5 00000000 c6cd6108 00000000 c6cd612c 
c6e65000 00000001 c018df00
[13361.340000] 7e00: c725bb20 00000003 00000060 00000004 00000080 
c6f06000 00000188 c6e652b4
[13361.340000] 7e20: 00000000 00000000 00000108 00000001 c725d020 
c00bc1d4 00000002 00000000
[13361.340000] 7e40: c725d020 c6d82400 c6e65000 c72f3b20 c72f40c0 
c725d020 000000a0 c018ffd8
[13361.340000] 7e60: c6c76180 60000013 c72f3d38 c6f06000 00000188 
00000047 000000a0 00000048
[13361.340000] 7e80: 00000000 00000001 22222222 c72f3d5c 22222222 
22222222 22222222 22222222
[13361.340000] 7ea0: c725d020 00000000 c725d020 c72f3b20 c72f40a8 
c6e65000 c6f07edc 00000048
[13361.340000] 7ec0: 00000000 c0193ffc 00000001 00000000 00000001 
00000001 c00ccad0 00200020
[13361.340000] 7ee0: 00000000 00000000 00000000 00000278 22222222 
22222222 00000000 c725d020
[13361.340000] 7f00: 00000000 c72f40a8 c6f07f38 c72f3b20 c6f06000 
00000000 00000001 c00ccb04
[13361.340000] 7f20: c6f07f38 000093c4 00000000 c6f3a000 c72f40a8 
c00ce958 c740ee70 c715cc78
[13361.340000] 7f40: 0febbc64 0000000e c6f3a01d 00000000 00000000 
c7138d20 c725d020 00000000
[13361.340000] 7f60: 00000002 00000000 00000000 60000010 00000000 
00000000 00000001 c0056088
[13361.340000] 7f80: c6f06000 0051da2c 00000000 0051d418 0051da2c 
00000000 0051d418 0000000a
[13361.340000] 7fa0: c00093c4 c0009240 0051da2c 00000000 0051da2c 
0051da2c 00000000 b6f2576c
[13361.340000] 7fc0: 0051da2c 00000000 0051d418 0000000a b6f24b24 
00000000 00000000 00000001
[13361.340000] 7fe0: 00000000 bec32f28 b6ed9064 b6deb62c 60000010 
0051da2c 00000000 00000000
[13361.340000] [<c0279070>] (prot_queue_del+0x7c/0xcc) from [<c027ac24>] 
(ubi_wl_put_peb+0x1d8/0x2d8)
[13361.340000] [<c027ac24>] (ubi_wl_put_peb+0x1d8/0x2d8) from 
[<c0275a94>] (ubi_eba_unmap_leb+0x90/0xa8)
[13361.340000] [<c0275a94>] (ubi_eba_unmap_leb+0x90/0xa8) from 
[<c019b45c>] (ubifs_leb_unmap+0x78/0xf4)
[13361.340000] [<c019b45c>] (ubifs_leb_unmap+0x78/0xf4) from 
[<c01a779c>] (ubifs_garbage_collect_leb+0x23c/0x7b0)
[13361.340000] [<c01a779c>] (ubifs_garbage_collect_leb+0x23c/0x7b0) from 
[<c01a7e30>] (ubifs_garbage_collect+0x120/0x3b8)
[13361.340000] [<c01a7e30>] (ubifs_garbage_collect+0x120/0x3b8) from 
[<c018df00>] (make_reservation+0x110/0x428)
[13361.340000] [<c018df00>] (make_reservation+0x110/0x428) from 
[<c018ffd8>] (ubifs_jnl_update+0x120/0x640)
[13361.340000] [<c018ffd8>] (ubifs_jnl_update+0x120/0x640) from 
[<c0193ffc>] (ubifs_unlink+0x168/0x260)
[13361.340000] [<c0193ffc>] (ubifs_unlink+0x168/0x260) from [<c00ccb04>] 
(vfs_unlink+0x84/0xfc)
[13361.340000] [<c00ccb04>] (vfs_unlink+0x84/0xfc) from [<c00ce958>] 
(do_unlinkat+0xd4/0x160)
[13361.340000] [<c00ce958>] (do_unlinkat+0xd4/0x160) from [<c0009240>] 
(ret_fast_syscall+0x0/0x38)
[13361.340000] Code: e59c2000 e59c1004 e59f3040 e5821004 (e5812000)
[13361.740000] ---[ end trace 8fb6e4d7822cc150 ]---
[57160.070000] BUG: soft lockup - CPU#0 stuck for 22s! [init:1]
[57160.070000] irq event stamp: 1311500
[57160.070000] hardirqs last  enabled at (1311499): [<c00654d8>] 
rcu_sched_qs+0x8c/0xc8
[57160.070000] hardirqs last disabled at (1311500): [<c03d15ec>] 
_raw_spin_lock_irq+0x24/0x54
[57160.070000] softirqs last  enabled at (1311474): [<c001ea48>] 
__do_softirq+0x1ac/0x22c
[57160.070000] softirqs last disabled at (1311465): [<c001ebb0>] 
irq_exit+0x84/0x94
[57160.070000]
[57160.070000] Pid: 1, comm:                 init
[57160.070000] CPU: 0    Tainted: G      D       (3.7.0-rc7-itw #12)
[57160.070000] PC is at wait_consider_task+0x4/0xb14
[57160.070000] LR is at do_wait+0x104/0x200
[57160.070000] pc : [<c001ac78>]    lr : [<c001b88c>] psr: 20000013
[57160.070000] sp : c7439f0c  ip : 00000005  fp : c7439f78
[57160.070000] r10: c7438000  r9 : c7439f34  r8 : c7430000
[57160.070000] r7 : c7439f60  r6 : c74300dc  r5 : 00000000  r4 : c6c728a0
[57160.070000] r3 : c6c72984  r2 : c6c728a0  r1 : 00000000  r0 : c7439f60
[57160.070000] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM 
Segment user
[57160.070000] Control: 0005317f  Table: 4779c000  DAC: 00000015
[57160.070000] [<c000d8fc>] (unwind_backtrace+0x0/0xf4) from 
[<c005fc94>] (watchdog_timer_fn+0x15c/0x1b0)
[57160.070000] [<c005fc94>] (watchdog_timer_fn+0x15c/0x1b0) from 
[<c003b24c>] (__run_hrtimer+0xd8/0x1e8)
[57160.070000] [<c003b24c>] (__run_hrtimer+0xd8/0x1e8) from [<c003b724>] 
(hrtimer_interrupt+0x104/0x348)
[57160.070000] [<c003b724>] (hrtimer_interrupt+0x104/0x348) from 
[<c0013390>] (mxs_timer_interrupt+0x20/0x30)
[57160.070000] [<c0013390>] (mxs_timer_interrupt+0x20/0x30) from 
[<c00603e4>] (handle_irq_event_percpu+0x70/0x24c)
[57160.070000] [<c00603e4>] (handle_irq_event_percpu+0x70/0x24c) from 
[<c00605fc>] (handle_irq_event+0x3c/0x5c)
[57160.070000] [<c00605fc>] (handle_irq_event+0x3c/0x5c) from 
[<c0062d70>] (handle_level_irq+0x90/0x120)
[57160.070000] [<c0062d70>] (handle_level_irq+0x90/0x120) from 
[<c0060360>] (generic_handle_irq+0x30/0x3c)
[57160.070000] [<c0060360>] (generic_handle_irq+0x30/0x3c) from 
[<c0009abc>] (handle_IRQ+0x30/0x84)
[57160.070000] [<c0009abc>] (handle_IRQ+0x30/0x84) from [<c0008760>] 
(icoll_handle_irq+0x34/0x48)
[57160.070000] [<c0008760>] (icoll_handle_irq+0x34/0x48) from 
[<c0008e84>] (__irq_svc+0x44/0x54)
[57160.070000] Exception stack(0xc7439ec0 to 0xc7439f08)
[57160.070000] 9ec0: c7439f60 00000000 c6c728a0 c6c72984 c6c728a0 
00000000 c74300dc c7439f60
[57160.070000] 9ee0: c7430000 c7439f34 c7438000 c7439f78 00000005 
c7439f0c c001b88c c001ac78
[57160.070000] 9f00: 20000013 ffffffff
[57160.070000] [<c0008e84>] (__irq_svc+0x44/0x54) from [<c001ac78>] 
(wait_consider_task+0x4/0xb14)
[57160.070000] [<c001ac78>] (wait_consider_task+0x4/0xb14) from 
[<c7438000>] (0xc7438000)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: UBI wl_tree_add problems after PEB scrubbed
  2012-11-30 15:05 UBI wl_tree_add problems after PEB scrubbed Zach Sadecki
@ 2012-12-03 10:48 ` Artem Bityutskiy
  2012-12-03 11:01   ` Artem Bityutskiy
  2012-12-03 11:19   ` Richard Weinberger
  2012-12-03 14:46 ` Artem Bityutskiy
  1 sibling, 2 replies; 18+ messages in thread
From: Artem Bityutskiy @ 2012-12-03 10:48 UTC (permalink / raw)
  To: Zach Sadecki, Richard Weinberger; +Cc: linux-mtd

[-- Attachment #1: Type: text/plain, Size: 1950 bytes --]

On Fri, 2012-11-30 at 09:05 -0600, Zach Sadecki wrote:
> Every time I see UBI scrub a PEB with fixable bit-flips (on my custom 
> Freescale i.MX28 board) the background thread has problems shortly 
> thereafter.  I'm not exactly sure where to start debugging this and I'm 
> hoping someone can help point me in the right direction.  Below are 
> kernel messages showing the problem from 2 different runs (in which both 
> ended up with a hung CPU).  This is using kernel 3.7-rc7.
> 
> Also worth noting is that I had to modify the gpmi-nand driver to 
> actually report max_bitflips back to the MTD layer to even get to this 
> point (before that everything would just run along happily until it hit 
> an uncorrectable ECC error).  I will submit a patch for this once 
> everything seems OK...

Ack, reproducible on nandsim with 

sudo sh -c 'echo 1 > /sys/kernel/debug/ubi/ubi0/tst_emulate_bitflips'

I did not confirm this by bisecting, but it seems it is fastmap that
broke it.

And looking at fastmap changes, I immediately see some thing completely
bogus, not related to this:


/**
 * __wl_get_peb - get a physical eraseblock.
 * @ubi: UBI device description object
 *
 * This function returns a physical eraseblock in case of success and a
 * negative error code in case of failure. Might sleep.
 */
static int __wl_get_peb(struct ubi_device *ubi)

Might sleep? Well, yes, because it calls 

ubi_self_check_all_ff()

But then why is this:

       spin_lock(&ubi->wl_lock);
       peb = __wl_get_peb(ubi);
       spin_unlock(&ubi->wl_lock);

Bogus.

Richard, could you please re-test fastmap with all debugging enabled?
I see at least one bug already.

Namely these ones: chk_gen  chk_io  tst_disable_bgt

Also, it seems UBI is completely broken ATM - it craps out immediately
on the first bit-flip. Let me revert fastmap and check if it is fastmap.

-- 
Best Regards,
Artem Bityutskiy

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: UBI wl_tree_add problems after PEB scrubbed
  2012-12-03 10:48 ` Artem Bityutskiy
@ 2012-12-03 11:01   ` Artem Bityutskiy
  2012-12-03 11:19   ` Richard Weinberger
  1 sibling, 0 replies; 18+ messages in thread
From: Artem Bityutskiy @ 2012-12-03 11:01 UTC (permalink / raw)
  To: Zach Sadecki; +Cc: Richard Weinberger, linux-mtd

[-- Attachment #1: Type: text/plain, Size: 372 bytes --]

On Mon, 2012-12-03 at 12:48 +0200, Artem Bityutskiy wrote:
> Also, it seems UBI is completely broken ATM - it craps out immediately
> on the first bit-flip. Let me revert fastmap and check if it is
> fastmap. 

When I revert fastmap the assert goes away but the WL thread craps out
differently. I will try to investigate it.

-- 
Best Regards,
Artem Bityutskiy

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: UBI wl_tree_add problems after PEB scrubbed
  2012-12-03 10:48 ` Artem Bityutskiy
  2012-12-03 11:01   ` Artem Bityutskiy
@ 2012-12-03 11:19   ` Richard Weinberger
  1 sibling, 0 replies; 18+ messages in thread
From: Richard Weinberger @ 2012-12-03 11:19 UTC (permalink / raw)
  To: dedekind1; +Cc: linux-mtd, Zach Sadecki

Am Mon, 03 Dec 2012 12:48:49 +0200
schrieb Artem Bityutskiy <dedekind1@gmail.com>:

> On Fri, 2012-11-30 at 09:05 -0600, Zach Sadecki wrote:
> > Every time I see UBI scrub a PEB with fixable bit-flips (on my
> > custom Freescale i.MX28 board) the background thread has problems
> > shortly thereafter.  I'm not exactly sure where to start debugging
> > this and I'm hoping someone can help point me in the right
> > direction.  Below are kernel messages showing the problem from 2
> > different runs (in which both ended up with a hung CPU).  This is
> > using kernel 3.7-rc7.
> > 
> > Also worth noting is that I had to modify the gpmi-nand driver to 
> > actually report max_bitflips back to the MTD layer to even get to
> > this point (before that everything would just run along happily
> > until it hit an uncorrectable ECC error).  I will submit a patch
> > for this once everything seems OK...
> 
> Ack, reproducible on nandsim with 
> 
> sudo sh -c 'echo 1 > /sys/kernel/debug/ubi/ubi0/tst_emulate_bitflips'
> 
> I did not confirm this by bisecting, but it seems it is fastmap that
> broke it.
> 
> And looking at fastmap changes, I immediately see some thing
> completely bogus, not related to this:
> 
> 
> /**
>  * __wl_get_peb - get a physical eraseblock.
>  * @ubi: UBI device description object
>  *
>  * This function returns a physical eraseblock in case of success and
> a
>  * negative error code in case of failure. Might sleep.
>  */
> static int __wl_get_peb(struct ubi_device *ubi)
> 
> Might sleep? Well, yes, because it calls 
> 
> ubi_self_check_all_ff()
> 
> But then why is this:
> 
>        spin_lock(&ubi->wl_lock);
>        peb = __wl_get_peb(ubi);
>        spin_unlock(&ubi->wl_lock);
> 
> Bogus.
> 
> Richard, could you please re-test fastmap with all debugging enabled?
> I see at least one bug already.
> 
> Namely these ones: chk_gen  chk_io  tst_disable_bgt
> 
> Also, it seems UBI is completely broken ATM - it craps out immediately
> on the first bit-flip. Let me revert fastmap and check if it is
> fastmap.

Okay, I'll look at it this afternoon.

Thanks,
//richard

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: UBI wl_tree_add problems after PEB scrubbed
  2012-11-30 15:05 UBI wl_tree_add problems after PEB scrubbed Zach Sadecki
  2012-12-03 10:48 ` Artem Bityutskiy
@ 2012-12-03 14:46 ` Artem Bityutskiy
  2012-12-03 15:02   ` Richard Weinberger
  1 sibling, 1 reply; 18+ messages in thread
From: Artem Bityutskiy @ 2012-12-03 14:46 UTC (permalink / raw)
  To: Zach Sadecki, Richard Weinberger; +Cc: linux-mtd

[-- Attachment #1: Type: text/plain, Size: 15832 bytes --]

/me looks at Richard...

OK, I confirm that this is fastmap to blame. Namely, this commit:

Author: Richard Weinberger <richard@nod.at>
Date:   Wed Sep 26 17:51:48 2012 +0200

    UBI: Add fastmap support to the WL sub-system
    
    To make fastmap possible the WL sub-system needs some
    changes.
    Mostly to support fastmaps pools.
    
    Signed-off-by: Richard Weinberger <richard@nod.at>
    Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>

If you revert it, bit-flips are handled properly.

Use the following for a time being:

$ git revert 76ac66e dbb7d2a 8199b90

On Fri, 2012-11-30 at 09:05 -0600, Zach Sadecki wrote:
> Every time I see UBI scrub a PEB with fixable bit-flips (on my custom 
> Freescale i.MX28 board) the background thread has problems shortly 
> thereafter.  I'm not exactly sure where to start debugging this and I'm 
> hoping someone can help point me in the right direction.  Below are 
> kernel messages showing the problem from 2 different runs (in which both 
> ended up with a hung CPU).  This is using kernel 3.7-rc7.
> 
> Also worth noting is that I had to modify the gpmi-nand driver to 
> actually report max_bitflips back to the MTD layer to even get to this 
> point (before that everything would just run along happily until it hit 
> an uncorrectable ECC error).  I will submit a patch for this once 
> everything seems OK...
> 
> Kernel messages:
> [18444.860000] UBI: fixable bit-flip detected at PEB 2789
> [18444.870000] UBI: schedule PEB 2789 for scrubbing
> [18444.960000] UBI: fixable bit-flip detected at PEB 2789
> [18445.120000] UBI: scrubbed PEB 2789 (LEB 3:544), data moved to PEB 1574
> [18457.040000] UBI assert failed in wl_tree_add at 204 (pid 31)
> [18457.040000] [<c000d8fc>] (unwind_backtrace+0x0/0xf4) from 
> [<c0279758>] (wl_tree_add+0xb8/0xd0)
> [18457.050000] [<c0279758>] (wl_tree_add+0xb8/0xd0) from [<c027a008>] 
> (erase_worker+0x3bc/0x6c0)
> [18457.060000] [<c027a008>] (erase_worker+0x3bc/0x6c0) from [<c027961c>] 
> (do_work+0x8c/0x110)
> [18457.070000] [<c027961c>] (do_work+0x8c/0x110) from [<c027adf4>] 
> (ubi_thread+0xc4/0x148)
> [18457.080000] [<c027adf4>] (ubi_thread+0xc4/0x148) from [<c00370c8>] 
> (kthread+0xa0/0xac)
> [18457.090000] [<c00370c8>] (kthread+0xa0/0xac) from [<c00092e8>] 
> (ret_from_fork+0x14/0x2c)
> [21386.860000] UBI: fixable bit-flip detected at PEB 2467
> [21386.870000] UBI: schedule PEB 2467 for scrubbing
> [21387.000000] UBI: fixable bit-flip detected at PEB 2467
> [21387.190000] UBI: scrubbed PEB 2467 (LEB 3:490), data moved to PEB 1574
> [21400.920000] UBI assert failed in wl_tree_add at 204 (pid 31)
> [21400.920000] [<c000d8fc>] (unwind_backtrace+0x0/0xf4) from 
> [<c0279758>] (wl_tree_add+0xb8/0xd0)
> [21400.930000] [<c0279758>] (wl_tree_add+0xb8/0xd0) from [<c027a008>] 
> (erase_worker+0x3bc/0x6c0)
> [21400.940000] [<c027a008>] (erase_worker+0x3bc/0x6c0) from [<c027961c>] 
> (do_work+0x8c/0x110)
> [21400.950000] [<c027961c>] (do_work+0x8c/0x110) from [<c027adf4>] 
> (ubi_thread+0xc4/0x148)
> [21400.960000] [<c027adf4>] (ubi_thread+0xc4/0x148) from [<c00370c8>] 
> (kthread+0xa0/0xac)
> [21400.970000] [<c00370c8>] (kthread+0xa0/0xac) from [<c00092e8>] 
> (ret_from_fork+0x14/0x2c)
> [26168.070000] BUG: soft lockup - CPU#0 stuck for 22s! [ubi_bgt0d:31]
> [26168.070000] irq event stamp: 27382734
> [26168.070000] hardirqs last  enabled at (27382733): [<c0008e88>] 
> __irq_svc+0x48/0x54
> [26168.070000] hardirqs last disabled at (27382734): [<c0008e74>] 
> __irq_svc+0x34/0x54
> [26168.070000] softirqs last  enabled at (27382732): [<c001ea48>] 
> __do_softirq+0x1ac/0x22c
> [26168.070000] softirqs last disabled at (27382725): [<c001ebb0>] 
> irq_exit+0x84/0x94
> [26168.070000]
> [26168.070000] Pid: 31, comm:            ubi_bgt0d
> [26168.070000] CPU: 0    Not tainted  (3.7.0-rc7-itw #15)
> [26168.070000] PC is at wl_tree_add+0x44/0xd0
> [26168.070000] LR is at erase_worker+0x3bc/0x6c0
> [26168.070000] pc : [<c02796e4>]    lr : [<c027a008>] psr: 80000013
> [26168.070000] sp : c76d9e90  ip : c08ea20c  fp : c763add8
> [26168.070000] r10: 00000000  r9 : 00000043  r8 : c763ad78
> [26168.070000] r7 : c76d8000  r6 : c77073c0  r5 : c7707410  r4 : c7707408
> [26168.070000] r3 : 00000046  r2 : 00000044  r1 : c7707348  r0 : c77073c0
> [26168.070000] Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM 
> Segment kernel
> [26168.070000] Control: 0005317f  Table: 460a0000  DAC: 00000017
> [26168.070000] [<c000d8fc>] (unwind_backtrace+0x0/0xf4) from 
> [<c005fc94>] (watchdog_timer_fn+0x15c/0x1b0)
> [26168.070000] [<c005fc94>] (watchdog_timer_fn+0x15c/0x1b0) from 
> [<c003b24c>] (__run_hrtimer+0xd8/0x1e8)
> [26168.070000] [<c003b24c>] (__run_hrtimer+0xd8/0x1e8) from [<c003b724>] 
> (hrtimer_interrupt+0x104/0x348)
> [26168.070000] [<c003b724>] (hrtimer_interrupt+0x104/0x348) from 
> [<c0013390>] (mxs_timer_interrupt+0x20/0x30)
> [26168.070000] [<c0013390>] (mxs_timer_interrupt+0x20/0x30) from 
> [<c00603e4>] (handle_irq_event_percpu+0x70/0x24c)
> [26168.070000] [<c00603e4>] (handle_irq_event_percpu+0x70/0x24c) from 
> [<c00605fc>] (handle_irq_event+0x3c/0x5c)
> [26168.070000] [<c00605fc>] (handle_irq_event+0x3c/0x5c) from 
> [<c0062d70>] (handle_level_irq+0x90/0x120)
> [26168.070000] [<c0062d70>] (handle_level_irq+0x90/0x120) from 
> [<c0060360>] (generic_handle_irq+0x30/0x3c)
> [26168.070000] [<c0060360>] (generic_handle_irq+0x30/0x3c) from 
> [<c0009abc>] (handle_IRQ+0x30/0x84)
> [26168.070000] [<c0009abc>] (handle_IRQ+0x30/0x84) from [<c0008760>] 
> (icoll_handle_irq+0x34/0x48)
> [26168.070000] [<c0008760>] (icoll_handle_irq+0x34/0x48) from 
> [<c0008e84>] (__irq_svc+0x44/0x54)
> [26168.070000] Exception stack(0xc76d9e48 to 0xc76d9e90)
> [26168.070000] 9e40:                   c77073c0 c7707348 00000044 
> 00000046 c7707408 c7707410
> [26168.070000] 9e60: c77073c0 c76d8000 c763ad78 00000043 00000000 
> c763add8 c08ea20c c76d9e90
> [26168.070000] 9e80: c027a008 c02796e4 80000013 ffffffff
> [26168.070000] [<c0008e84>] (__irq_svc+0x44/0x54) from [<c02796e4>] 
> (wl_tree_add+0x44/0xd0)
> [26168.070000] [<c02796e4>] (wl_tree_add+0x44/0xd0) from [<c027a008>] 
> (erase_worker+0x3bc/0x6c0)
> [26168.070000] [<c027a008>] (erase_worker+0x3bc/0x6c0) from [<c027961c>] 
> (do_work+0x8c/0x110)
> [26168.070000] [<c027961c>] (do_work+0x8c/0x110) from [<c027adf4>] 
> (ubi_thread+0xc4/0x148)
> [26168.070000] [<c027adf4>] (ubi_thread+0xc4/0x148) from [<c00370c8>] 
> (kthread+0xa0/0xac)
> [26168.070000] [<c00370c8>] (kthread+0xa0/0xac) from [<c00092e8>] 
> (ret_from_fork+0x14/0x2c)
> 
> 
> 
> [13111.940000] UBI: fixable bit-flip detected at PEB 2822
> [13111.940000] UBI: schedule PEB 2822 for scrubbing
> [13112.030000] UBI: fixable bit-flip detected at PEB 2822
> [13112.200000] UBI: scrubbed PEB 2822 (LEB 3:634), data moved to PEB 2766
> [13124.260000] UBI assert failed in wl_tree_add at 204 (pid 31)
> [13124.270000] [<c000d8fc>] (unwind_backtrace+0x0/0xf4) from 
> [<c027974c>] (wl_tree_add+0xb8/0xd0)
> [13124.280000] [<c027974c>] (wl_tree_add+0xb8/0xd0) from [<c0279ffc>] 
> (erase_worker+0x3bc/0x6c0)
> [13124.290000] [<c0279ffc>] (erase_worker+0x3bc/0x6c0) from [<c0279610>] 
> (do_work+0x8c/0x110)
> [13124.300000] [<c0279610>] (do_work+0x8c/0x110) from [<c027ade8>] 
> (ubi_thread+0xc4/0x148)
> [13124.300000] [<c027ade8>] (ubi_thread+0xc4/0x148) from [<c00370c8>] 
> (kthread+0xa0/0xac)
> [13124.310000] [<c00370c8>] (kthread+0xa0/0xac) from [<c00092e8>] 
> (ret_from_fork+0x14/0x2c)
> [13361.320000] Unable to handle kernel NULL pointer dereference at 
> virtual address 00000000
> [13361.330000] pgd = c5c64000
> [13361.330000] [00000000] *pgd=45c4f831, *pte=00000000, *ppte=00000000
> [13361.340000] Internal error: Oops: 817 [#1] ARM
> [13361.340000] CPU: 0    Not tainted  (3.7.0-rc7-itw #12)
> [13361.340000] PC is at prot_queue_del+0x7c/0xcc
> [13361.340000] LR is at ubi_wl_put_peb+0x1d8/0x2d8
> [13361.340000] pc : [<c0279070>]    lr : [<c027ac24>] psr: 60000013
> [13361.340000] sp : c6f07cd0  ip : c7720228  fp : 00000265
> [13361.340000] r10: 00000f75  r9 : 00000003  r8 : c763add8
> [13361.340000] r7 : c763ad70  r6 : c763a000  r5 : c7720228  r4 : c763ad74
> [13361.340000] r3 : 00200200  r2 : c77202b9  r1 : 00000000  r0 : c763a000
> [13361.340000] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM 
> Segment user
> [13361.340000] Control: 0005317f  Table: 45c64000  DAC: 00000015
> [13361.340000] Process sqlite3 (pid: 6567, stack limit = 0xc6f061b8)
> [13361.340000] Stack: (0xc6f07cd0 to 0xc6f08000)
> [13361.340000] 7cc0:                                     c763ad74 
> c027ac24 00000000 c0056088
> [13361.340000] 7ce0: 00000000 c763acc4 c76cfc00 00000000 c763a000 
> 00000003 00000f75 00000265
> [13361.340000] 7d00: 00000000 c0275a94 00000000 c03cf0c8 c6f07dc0 
> c6e65000 00000265 c6e65000
> [13361.340000] 7d20: c6cd6000 c6f07dc0 00000800 c019b45c c6f07dc0 
> 00000000 c6f07dc0 00000000
> [13361.340000] 7d40: c6e65000 c01a779c 00000000 00000000 00000000 
> 00000000 c6e65b20 00000001
> [13361.340000] 7d60: c6cd6000 00000265 c6e652f4 c6e652f4 c6e65000 
> c6cd6108 c6cd6000 00000000
> [13361.340000] 7d80: 000001b5 c6e65000 00009000 c6cd6000 c6f07dc0 
> 00000800 00000000 c01a7e30
> [13361.340000] 7da0: c01ab440 00000007 00000000 c6f06000 c6c76180 
> c6f06000 c6cd6024 c6e65a80
> [13361.340000] 7dc0: 00000000 0001f000 00000010 00000265 c5c09ca8 
> c6e65bd0 c6cd612c ffffffe4
> [13361.340000] 7de0: 000001b5 00000000 c6cd6108 00000000 c6cd612c 
> c6e65000 00000001 c018df00
> [13361.340000] 7e00: c725bb20 00000003 00000060 00000004 00000080 
> c6f06000 00000188 c6e652b4
> [13361.340000] 7e20: 00000000 00000000 00000108 00000001 c725d020 
> c00bc1d4 00000002 00000000
> [13361.340000] 7e40: c725d020 c6d82400 c6e65000 c72f3b20 c72f40c0 
> c725d020 000000a0 c018ffd8
> [13361.340000] 7e60: c6c76180 60000013 c72f3d38 c6f06000 00000188 
> 00000047 000000a0 00000048
> [13361.340000] 7e80: 00000000 00000001 22222222 c72f3d5c 22222222 
> 22222222 22222222 22222222
> [13361.340000] 7ea0: c725d020 00000000 c725d020 c72f3b20 c72f40a8 
> c6e65000 c6f07edc 00000048
> [13361.340000] 7ec0: 00000000 c0193ffc 00000001 00000000 00000001 
> 00000001 c00ccad0 00200020
> [13361.340000] 7ee0: 00000000 00000000 00000000 00000278 22222222 
> 22222222 00000000 c725d020
> [13361.340000] 7f00: 00000000 c72f40a8 c6f07f38 c72f3b20 c6f06000 
> 00000000 00000001 c00ccb04
> [13361.340000] 7f20: c6f07f38 000093c4 00000000 c6f3a000 c72f40a8 
> c00ce958 c740ee70 c715cc78
> [13361.340000] 7f40: 0febbc64 0000000e c6f3a01d 00000000 00000000 
> c7138d20 c725d020 00000000
> [13361.340000] 7f60: 00000002 00000000 00000000 60000010 00000000 
> 00000000 00000001 c0056088
> [13361.340000] 7f80: c6f06000 0051da2c 00000000 0051d418 0051da2c 
> 00000000 0051d418 0000000a
> [13361.340000] 7fa0: c00093c4 c0009240 0051da2c 00000000 0051da2c 
> 0051da2c 00000000 b6f2576c
> [13361.340000] 7fc0: 0051da2c 00000000 0051d418 0000000a b6f24b24 
> 00000000 00000000 00000001
> [13361.340000] 7fe0: 00000000 bec32f28 b6ed9064 b6deb62c 60000010 
> 0051da2c 00000000 00000000
> [13361.340000] [<c0279070>] (prot_queue_del+0x7c/0xcc) from [<c027ac24>] 
> (ubi_wl_put_peb+0x1d8/0x2d8)
> [13361.340000] [<c027ac24>] (ubi_wl_put_peb+0x1d8/0x2d8) from 
> [<c0275a94>] (ubi_eba_unmap_leb+0x90/0xa8)
> [13361.340000] [<c0275a94>] (ubi_eba_unmap_leb+0x90/0xa8) from 
> [<c019b45c>] (ubifs_leb_unmap+0x78/0xf4)
> [13361.340000] [<c019b45c>] (ubifs_leb_unmap+0x78/0xf4) from 
> [<c01a779c>] (ubifs_garbage_collect_leb+0x23c/0x7b0)
> [13361.340000] [<c01a779c>] (ubifs_garbage_collect_leb+0x23c/0x7b0) from 
> [<c01a7e30>] (ubifs_garbage_collect+0x120/0x3b8)
> [13361.340000] [<c01a7e30>] (ubifs_garbage_collect+0x120/0x3b8) from 
> [<c018df00>] (make_reservation+0x110/0x428)
> [13361.340000] [<c018df00>] (make_reservation+0x110/0x428) from 
> [<c018ffd8>] (ubifs_jnl_update+0x120/0x640)
> [13361.340000] [<c018ffd8>] (ubifs_jnl_update+0x120/0x640) from 
> [<c0193ffc>] (ubifs_unlink+0x168/0x260)
> [13361.340000] [<c0193ffc>] (ubifs_unlink+0x168/0x260) from [<c00ccb04>] 
> (vfs_unlink+0x84/0xfc)
> [13361.340000] [<c00ccb04>] (vfs_unlink+0x84/0xfc) from [<c00ce958>] 
> (do_unlinkat+0xd4/0x160)
> [13361.340000] [<c00ce958>] (do_unlinkat+0xd4/0x160) from [<c0009240>] 
> (ret_fast_syscall+0x0/0x38)
> [13361.340000] Code: e59c2000 e59c1004 e59f3040 e5821004 (e5812000)
> [13361.740000] ---[ end trace 8fb6e4d7822cc150 ]---
> [57160.070000] BUG: soft lockup - CPU#0 stuck for 22s! [init:1]
> [57160.070000] irq event stamp: 1311500
> [57160.070000] hardirqs last  enabled at (1311499): [<c00654d8>] 
> rcu_sched_qs+0x8c/0xc8
> [57160.070000] hardirqs last disabled at (1311500): [<c03d15ec>] 
> _raw_spin_lock_irq+0x24/0x54
> [57160.070000] softirqs last  enabled at (1311474): [<c001ea48>] 
> __do_softirq+0x1ac/0x22c
> [57160.070000] softirqs last disabled at (1311465): [<c001ebb0>] 
> irq_exit+0x84/0x94
> [57160.070000]
> [57160.070000] Pid: 1, comm:                 init
> [57160.070000] CPU: 0    Tainted: G      D       (3.7.0-rc7-itw #12)
> [57160.070000] PC is at wait_consider_task+0x4/0xb14
> [57160.070000] LR is at do_wait+0x104/0x200
> [57160.070000] pc : [<c001ac78>]    lr : [<c001b88c>] psr: 20000013
> [57160.070000] sp : c7439f0c  ip : 00000005  fp : c7439f78
> [57160.070000] r10: c7438000  r9 : c7439f34  r8 : c7430000
> [57160.070000] r7 : c7439f60  r6 : c74300dc  r5 : 00000000  r4 : c6c728a0
> [57160.070000] r3 : c6c72984  r2 : c6c728a0  r1 : 00000000  r0 : c7439f60
> [57160.070000] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM 
> Segment user
> [57160.070000] Control: 0005317f  Table: 4779c000  DAC: 00000015
> [57160.070000] [<c000d8fc>] (unwind_backtrace+0x0/0xf4) from 
> [<c005fc94>] (watchdog_timer_fn+0x15c/0x1b0)
> [57160.070000] [<c005fc94>] (watchdog_timer_fn+0x15c/0x1b0) from 
> [<c003b24c>] (__run_hrtimer+0xd8/0x1e8)
> [57160.070000] [<c003b24c>] (__run_hrtimer+0xd8/0x1e8) from [<c003b724>] 
> (hrtimer_interrupt+0x104/0x348)
> [57160.070000] [<c003b724>] (hrtimer_interrupt+0x104/0x348) from 
> [<c0013390>] (mxs_timer_interrupt+0x20/0x30)
> [57160.070000] [<c0013390>] (mxs_timer_interrupt+0x20/0x30) from 
> [<c00603e4>] (handle_irq_event_percpu+0x70/0x24c)
> [57160.070000] [<c00603e4>] (handle_irq_event_percpu+0x70/0x24c) from 
> [<c00605fc>] (handle_irq_event+0x3c/0x5c)
> [57160.070000] [<c00605fc>] (handle_irq_event+0x3c/0x5c) from 
> [<c0062d70>] (handle_level_irq+0x90/0x120)
> [57160.070000] [<c0062d70>] (handle_level_irq+0x90/0x120) from 
> [<c0060360>] (generic_handle_irq+0x30/0x3c)
> [57160.070000] [<c0060360>] (generic_handle_irq+0x30/0x3c) from 
> [<c0009abc>] (handle_IRQ+0x30/0x84)
> [57160.070000] [<c0009abc>] (handle_IRQ+0x30/0x84) from [<c0008760>] 
> (icoll_handle_irq+0x34/0x48)
> [57160.070000] [<c0008760>] (icoll_handle_irq+0x34/0x48) from 
> [<c0008e84>] (__irq_svc+0x44/0x54)
> [57160.070000] Exception stack(0xc7439ec0 to 0xc7439f08)
> [57160.070000] 9ec0: c7439f60 00000000 c6c728a0 c6c72984 c6c728a0 
> 00000000 c74300dc c7439f60
> [57160.070000] 9ee0: c7430000 c7439f34 c7438000 c7439f78 00000005 
> c7439f0c c001b88c c001ac78
> [57160.070000] 9f00: 20000013 ffffffff
> [57160.070000] [<c0008e84>] (__irq_svc+0x44/0x54) from [<c001ac78>] 
> (wait_consider_task+0x4/0xb14)
> [57160.070000] [<c001ac78>] (wait_consider_task+0x4/0xb14) from 
> [<c7438000>] (0xc7438000)
> 
> 
> ______________________________________________________
> Linux MTD discussion mailing list
> http://lists.infradead.org/mailman/listinfo/linux-mtd/

-- 
Best Regards,
Artem Bityutskiy

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: UBI wl_tree_add problems after PEB scrubbed
  2012-12-03 14:46 ` Artem Bityutskiy
@ 2012-12-03 15:02   ` Richard Weinberger
  2012-12-03 15:33     ` Artem Bityutskiy
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Weinberger @ 2012-12-03 15:02 UTC (permalink / raw)
  To: dedekind1; +Cc: linux-mtd, Zach Sadecki

Am Mon, 03 Dec 2012 16:46:00 +0200
schrieb Artem Bityutskiy <dedekind1@gmail.com>:

> /me looks at Richard...

/me looks at the code. ;)

> OK, I confirm that this is fastmap to blame. Namely, this commit:

CONFIG_MTD_UBI_FASTMAP is y and ubi.fm_autoconvert is 1, correct?
What tests are you using? ubi-tests?

Does the issue still happen if you disable ubi_self_check_all_ff() in
__wl_get_peb()
And yeah, that's a very stupid bug, I'm very sorry.

I'm running ubi-tests now for a while, self_check_ec() triggered once,
but I've never seen "UBI assert failed in wl_tree_add at 204" so far.

Thanks,
//richard

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: UBI wl_tree_add problems after PEB scrubbed
  2012-12-03 15:02   ` Richard Weinberger
@ 2012-12-03 15:33     ` Artem Bityutskiy
       [not found]       ` <1354564667-9549-1-git-send-email-richard@nod.at>
  0 siblings, 1 reply; 18+ messages in thread
From: Artem Bityutskiy @ 2012-12-03 15:33 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: linux-mtd, Zach Sadecki

[-- Attachment #1: Type: text/plain, Size: 922 bytes --]

On Mon, 2012-12-03 at 16:02 +0100, Richard Weinberger wrote:
> Am Mon, 03 Dec 2012 16:46:00 +0200
> schrieb Artem Bityutskiy <dedekind1@gmail.com>:
> 
> > /me looks at Richard...
> 
> /me looks at the code. ;)
> 
> > OK, I confirm that this is fastmap to blame. Namely, this commit:
> 
> CONFIG_MTD_UBI_FASTMAP is y and ubi.fm_autoconvert is 1, correct?
> What tests are you using? ubi-tests?

No, CONFIG_MTD_UBI_FASTMAP is "no", it is switched off.

I used integck. Here is the command I was using:

$ sudo modprobe nandsim && sudo modprobe ubi mtd=0 && sudo ubimkvol -N test -m /dev/ubi0 && sudo modprobe ubifs && sudo mount -t ubifs /dev/ubi0_0 /mnt/ubifs/ && cd /home/dedekind/git/mtd-utils/tests/fs-tests/integrity && sudo sh -c 'echo 1 > /sys/kernel/debug/ubi/ubi0/tst_emulate_bitflips' && sudo ./integck -n2 /mnt/ubifs/

(sorry, you can easily split it).

-- 
Best Regards,
Artem Bityutskiy

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/2] ubi: Remove PEB from free tree in get_peb_for_wl()
       [not found]       ` <1354564667-9549-1-git-send-email-richard@nod.at>
@ 2012-12-03 19:57         ` Richard Weinberger
  2012-12-04  8:34           ` Artem Bityutskiy
  2012-12-03 19:57         ` [PATCH 2/2] ubi: Dont call ubi_self_check_all_ff() in __wl_get_peb() Richard Weinberger
  2012-12-04 14:45         ` UBI Fastmap fixes for 3.7 Zach Sadecki
  2 siblings, 1 reply; 18+ messages in thread
From: Richard Weinberger @ 2012-12-03 19:57 UTC (permalink / raw)
  To: dedekind1; +Cc: Richard Weinberger, linux-mtd, zsadecki

If UBI is built without fastmap, get_peb_for_wl() has to
remove the PEB manially from the free tree.
Otherwise the requested PEB lives in two trees.

Reported-by: Zach Sadecki <zsadecki@itwatchdogs.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
---
 drivers/mtd/ubi/wl.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/mtd/ubi/wl.c b/drivers/mtd/ubi/wl.c
index da7b449..1f9f5f7 100644
--- a/drivers/mtd/ubi/wl.c
+++ b/drivers/mtd/ubi/wl.c
@@ -679,7 +679,13 @@ static struct ubi_wl_entry *get_peb_for_wl(struct ubi_device *ubi)
 #else
 static struct ubi_wl_entry *get_peb_for_wl(struct ubi_device *ubi)
 {
-	return find_wl_entry(ubi, &ubi->free, WL_FREE_MAX_DIFF);
+	struct ubi_wl_entry *e;
+
+	e = find_wl_entry(ubi, &ubi->free, WL_FREE_MAX_DIFF);
+	self_check_in_wl_tree(ubi, e, &ubi->free);
+	rb_erase(&e->u.rb, &ubi->free);
+
+	return e;
 }
 
 int ubi_wl_get_peb(struct ubi_device *ubi)
-- 
1.8.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/2] ubi: Dont call ubi_self_check_all_ff() in __wl_get_peb()
       [not found]       ` <1354564667-9549-1-git-send-email-richard@nod.at>
  2012-12-03 19:57         ` [PATCH 1/2] ubi: Remove PEB from free tree in get_peb_for_wl() Richard Weinberger
@ 2012-12-03 19:57         ` Richard Weinberger
  2012-12-04  7:55           ` Artem Bityutskiy
  2012-12-04  7:56           ` Artem Bityutskiy
  2012-12-04 14:45         ` UBI Fastmap fixes for 3.7 Zach Sadecki
  2 siblings, 2 replies; 18+ messages in thread
From: Richard Weinberger @ 2012-12-03 19:57 UTC (permalink / raw)
  To: dedekind1; +Cc: Richard Weinberger, linux-mtd, zsadecki

As ubi_self_check_all_ff() might sleep we are not allowed
to call it from atomic context.
For now we call it only from ubi_wl_get_peb().
There are some code paths where it would also make sense,
but these paths are currently atomic and only enabled
when fastmap is used.

Signed-off-by: Richard Weinberger <richard@nod.at>
---
 drivers/mtd/ubi/wl.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/mtd/ubi/wl.c b/drivers/mtd/ubi/wl.c
index 1f9f5f7..2144f61 100644
--- a/drivers/mtd/ubi/wl.c
+++ b/drivers/mtd/ubi/wl.c
@@ -498,7 +498,7 @@ out:
  * @ubi: UBI device description object
  *
  * This function returns a physical eraseblock in case of success and a
- * negative error code in case of failure. Might sleep.
+ * negative error code in case of failure.
  */
 static int __wl_get_peb(struct ubi_device *ubi)
 {
@@ -540,13 +540,6 @@ retry:
 	 * ubi_wl_get_peb() after removing e from the pool. */
 	prot_queue_add(ubi, e);
 #endif
-	err = ubi_self_check_all_ff(ubi, e->pnum, ubi->vid_hdr_aloffset,
-				    ubi->peb_size - ubi->vid_hdr_aloffset);
-	if (err) {
-		ubi_err("new PEB %d does not contain all 0xFF bytes", e->pnum);
-		return err;
-	}
-
 	return e->pnum;
 }
 
@@ -690,12 +683,19 @@ static struct ubi_wl_entry *get_peb_for_wl(struct ubi_device *ubi)
 
 int ubi_wl_get_peb(struct ubi_device *ubi)
 {
-	int peb;
+	int peb, err;
 
 	spin_lock(&ubi->wl_lock);
 	peb = __wl_get_peb(ubi);
 	spin_unlock(&ubi->wl_lock);
 
+	err = ubi_self_check_all_ff(ubi, peb, ubi->vid_hdr_aloffset,
+				    ubi->peb_size - ubi->vid_hdr_aloffset);
+	if (err) {
+		ubi_err("new PEB %d does not contain all 0xFF bytes", peb);
+		return err;
+	}
+
 	return peb;
 }
 #endif
-- 
1.8.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] ubi: Dont call ubi_self_check_all_ff() in __wl_get_peb()
  2012-12-03 19:57         ` [PATCH 2/2] ubi: Dont call ubi_self_check_all_ff() in __wl_get_peb() Richard Weinberger
@ 2012-12-04  7:55           ` Artem Bityutskiy
  2012-12-04  7:56           ` Artem Bityutskiy
  1 sibling, 0 replies; 18+ messages in thread
From: Artem Bityutskiy @ 2012-12-04  7:55 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: linux-mtd, zsadecki

[-- Attachment #1: Type: text/plain, Size: 622 bytes --]

On Mon, 2012-12-03 at 20:57 +0100, Richard Weinberger wrote:
> +	err = ubi_self_check_all_ff(ubi, peb, ubi->vid_hdr_aloffset,
> +				    ubi->peb_size - ubi->vid_hdr_aloffset);
> +	if (err) {
> +		ubi_err("new PEB %d does not contain all 0xFF bytes", peb);
> +		return err;
> +	}
> +

The original idea was to verify that all PEBs from the free list (tree
actually) are really filled with 0xFFs.

What I see is that with fastmap this is not done.

I think the code should be re-thought and re-structured to make the
verification happen with fastmap enabled as well.

-- 
Best Regards,
Artem Bityutskiy

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] ubi: Dont call ubi_self_check_all_ff() in __wl_get_peb()
  2012-12-03 19:57         ` [PATCH 2/2] ubi: Dont call ubi_self_check_all_ff() in __wl_get_peb() Richard Weinberger
  2012-12-04  7:55           ` Artem Bityutskiy
@ 2012-12-04  7:56           ` Artem Bityutskiy
  2012-12-04  8:15             ` Artem Bityutskiy
  1 sibling, 1 reply; 18+ messages in thread
From: Artem Bityutskiy @ 2012-12-04  7:56 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: linux-mtd, zsadecki

[-- Attachment #1: Type: text/plain, Size: 590 bytes --]

On Mon, 2012-12-03 at 20:57 +0100, Richard Weinberger wrote:
> As ubi_self_check_all_ff() might sleep we are not allowed
> to call it from atomic context.
> For now we call it only from ubi_wl_get_peb().
> There are some code paths where it would also make sense,
> but these paths are currently atomic and only enabled
> when fastmap is used.
> 
> Signed-off-by: Richard Weinberger <richard@nod.at>

Can you confirm that you enabled the 3 debugging options I sent and
tested both FM and non-FM configurations with those options enabled?

-- 
Best Regards,
Artem Bityutskiy

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] ubi: Dont call ubi_self_check_all_ff() in __wl_get_peb()
  2012-12-04  7:56           ` Artem Bityutskiy
@ 2012-12-04  8:15             ` Artem Bityutskiy
  2012-12-04  9:15               ` Richard Weinberger
  0 siblings, 1 reply; 18+ messages in thread
From: Artem Bityutskiy @ 2012-12-04  8:15 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: linux-mtd, zsadecki

[-- Attachment #1: Type: text/plain, Size: 1048 bytes --]

On Tue, 2012-12-04 at 09:56 +0200, Artem Bityutskiy wrote:
> On Mon, 2012-12-03 at 20:57 +0100, Richard Weinberger wrote:
> > As ubi_self_check_all_ff() might sleep we are not allowed
> > to call it from atomic context.
> > For now we call it only from ubi_wl_get_peb().
> > There are some code paths where it would also make sense,
> > but these paths are currently atomic and only enabled
> > when fastmap is used.
> > 
> > Signed-off-by: Richard Weinberger <richard@nod.at>
> 
> Can you confirm that you enabled the 3 debugging options I sent and
> tested both FM and non-FM configurations with those options enabled?

Let me elaborate why I am asking.

1. If you did the tests and with this fix the debugging stuff works
fine, then it is OK to take this patch now and let you improve the FM
debugging coverage a bit later.

2. If you did not verify, then the assumption is _may_ be broken anyway,
then I would prefer to not take this patch, because I consider it to be
a band aid.

-- 
Best Regards,
Artem Bityutskiy

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] ubi: Remove PEB from free tree in get_peb_for_wl()
  2012-12-03 19:57         ` [PATCH 1/2] ubi: Remove PEB from free tree in get_peb_for_wl() Richard Weinberger
@ 2012-12-04  8:34           ` Artem Bityutskiy
  2012-12-04 10:24             ` Richard Weinberger
  0 siblings, 1 reply; 18+ messages in thread
From: Artem Bityutskiy @ 2012-12-04  8:34 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: linux-mtd, zsadecki

[-- Attachment #1: Type: text/plain, Size: 1159 bytes --]

On Mon, 2012-12-03 at 20:57 +0100, Richard Weinberger wrote:
> If UBI is built without fastmap, get_peb_for_wl() has to
> remove the PEB manially from the free tree.
> Otherwise the requested PEB lives in two trees.
> 
> Reported-by: Zach Sadecki <zsadecki@itwatchdogs.com>
> Signed-off-by: Richard Weinberger <richard@nod.at>

Thanks Richard, I'll send this to Linus ASAP.

> diff --git a/drivers/mtd/ubi/wl.c b/drivers/mtd/ubi/wl.c
> index da7b449..1f9f5f7 100644
> --- a/drivers/mtd/ubi/wl.c
> +++ b/drivers/mtd/ubi/wl.c
> @@ -679,7 +679,13 @@ static struct ubi_wl_entry *get_peb_for_wl(struct ubi_device *ubi)
>  #else
>  static struct ubi_wl_entry *get_peb_for_wl(struct ubi_device *ubi)
>  {
> -	return find_wl_entry(ubi, &ubi->free, WL_FREE_MAX_DIFF);
> +	struct ubi_wl_entry *e;
> +
> +	e = find_wl_entry(ubi, &ubi->free, WL_FREE_MAX_DIFF);
> +	self_check_in_wl_tree(ubi, e, &ubi->free);

Looking at all for these 'self_check_in_wl_tree()', I think can kill
most of them and just make 'find_wl_entry()' complain if it did not find
anything. But this is a subject of a separate patch.

-- 
Best Regards,
Artem Bityutskiy

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] ubi: Dont call ubi_self_check_all_ff() in __wl_get_peb()
  2012-12-04  8:15             ` Artem Bityutskiy
@ 2012-12-04  9:15               ` Richard Weinberger
  0 siblings, 0 replies; 18+ messages in thread
From: Richard Weinberger @ 2012-12-04  9:15 UTC (permalink / raw)
  To: dedekind1; +Cc: linux-mtd, zsadecki

Am Tue, 04 Dec 2012 10:15:01 +0200
schrieb Artem Bityutskiy <dedekind1@gmail.com>:

> On Tue, 2012-12-04 at 09:56 +0200, Artem Bityutskiy wrote:
> > On Mon, 2012-12-03 at 20:57 +0100, Richard Weinberger wrote:
> > > As ubi_self_check_all_ff() might sleep we are not allowed
> > > to call it from atomic context.
> > > For now we call it only from ubi_wl_get_peb().
> > > There are some code paths where it would also make sense,
> > > but these paths are currently atomic and only enabled
> > > when fastmap is used.
> > > 
> > > Signed-off-by: Richard Weinberger <richard@nod.at>
> > 
> > Can you confirm that you enabled the 3 debugging options I sent and
> > tested both FM and non-FM configurations with those options enabled?
> 
> Let me elaborate why I am asking.
> 
> 1. If you did the tests and with this fix the debugging stuff works
> fine, then it is OK to take this patch now and let you improve the FM
> debugging coverage a bit later.

Of course I did the tests.
Namely chk_gen, chk_io, tst_emulate_bit_flips and tst_disable_bgt.
(Also in various combinations)

Please note, there is one issue regarding tst_disable_bgt and fastmap.
Fastmap needs the UBI background thread. Therefore ubi-tests fail if
fastmap is enabled and tst_disable_bgt=1.
For 3.8 I can try to rework my code such that the test passes...
But IMHO the tst_disable_bgt test is useless.

I'll rework the fastmap code paths such that ubi_self_check_all_ff()
can be used everywhere. This is planned for 3.8.
As we are at 3.7-rc8 now I get very nervous if I have to touch
much UBI/fastmap code.

Again, with CONFIG_MTD_UBI_FASTMAP=n I'm no longer able to reproduce
any UBI regression.
If there is still a regression, please report it to me I'll fix it
immediately.

Thanks,
//richard

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] ubi: Remove PEB from free tree in get_peb_for_wl()
  2012-12-04  8:34           ` Artem Bityutskiy
@ 2012-12-04 10:24             ` Richard Weinberger
  0 siblings, 0 replies; 18+ messages in thread
From: Richard Weinberger @ 2012-12-04 10:24 UTC (permalink / raw)
  To: dedekind1; +Cc: linux-mtd, zsadecki

Am Tue, 04 Dec 2012 10:34:39 +0200
schrieb Artem Bityutskiy <dedekind1@gmail.com>:
 > Looking at all for these 'self_check_in_wl_tree()', I think can kill
> most of them and just make 'find_wl_entry()' complain if it did not
> find anything. But this is a subject of a separate patch.
>

Yeah. I've also thought about that.
This is material for 3.8-rc0.

Thanks,
//richard 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: UBI Fastmap fixes for 3.7
       [not found]       ` <1354564667-9549-1-git-send-email-richard@nod.at>
  2012-12-03 19:57         ` [PATCH 1/2] ubi: Remove PEB from free tree in get_peb_for_wl() Richard Weinberger
  2012-12-03 19:57         ` [PATCH 2/2] ubi: Dont call ubi_self_check_all_ff() in __wl_get_peb() Richard Weinberger
@ 2012-12-04 14:45         ` Zach Sadecki
  2012-12-04 14:48           ` Richard Weinberger
  2 siblings, 1 reply; 18+ messages in thread
From: Zach Sadecki @ 2012-12-04 14:45 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: linux-mtd, dedekind1

On 12/03/2012 01:57 PM, Richard Weinberger wrote:
> This patch series contains two brown-paper-bag-bugfixes.
> I'm very sorry for them!
>
> [PATCH 1/2] ubi: Remove PEB from free tree in get_peb_for_wl()
> This one fixes a very odd issue with fastmap disabled.
> With fastmap disabled get_peb_for_wl() did not remove PEBs from
> the free list. Therefore the free, used and scrub trees got corrupted.
>
> [PATCH 2/2] ubi: Dont call ubi_self_check_all_ff() in __wl_get_peb()
> The issue fixed by that patch is also very nasty. I missed that
> ubi_self_check_all_ff() can sleep. So, we cannot call it from
> __wl_get_peb().
>
> Thanks,
> //richard
Thanks, guys.  An overnight test run has shown that these patches have 
fixed the UBI problems on my board.  Something else odd is happening, 
but that's for another thread...

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: UBI Fastmap fixes for 3.7
  2012-12-04 14:45         ` UBI Fastmap fixes for 3.7 Zach Sadecki
@ 2012-12-04 14:48           ` Richard Weinberger
  2012-12-04 14:56             ` Zach Sadecki
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Weinberger @ 2012-12-04 14:48 UTC (permalink / raw)
  To: Zach Sadecki; +Cc: linux-mtd, dedekind1

Am Tue, 4 Dec 2012 08:45:18 -0600
schrieb Zach Sadecki <zsadecki@itwatchdogs.com>:

> On 12/03/2012 01:57 PM, Richard Weinberger wrote:
> > This patch series contains two brown-paper-bag-bugfixes.
> > I'm very sorry for them!
> >
> > [PATCH 1/2] ubi: Remove PEB from free tree in get_peb_for_wl()
> > This one fixes a very odd issue with fastmap disabled.
> > With fastmap disabled get_peb_for_wl() did not remove PEBs from
> > the free list. Therefore the free, used and scrub trees got
> > corrupted.
> >
> > [PATCH 2/2] ubi: Dont call ubi_self_check_all_ff() in __wl_get_peb()
> > The issue fixed by that patch is also very nasty. I missed that
> > ubi_self_check_all_ff() can sleep. So, we cannot call it from
> > __wl_get_peb().
> >
> > Thanks,
> > //richard
> Thanks, guys.  An overnight test run has shown that these patches
> have fixed the UBI problems on my board.  Something else odd is
> happening, but that's for another thread...

Good to know.
Is the other issue also related to UBI?

Thanks,
//richard

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: UBI Fastmap fixes for 3.7
  2012-12-04 14:48           ` Richard Weinberger
@ 2012-12-04 14:56             ` Zach Sadecki
  0 siblings, 0 replies; 18+ messages in thread
From: Zach Sadecki @ 2012-12-04 14:56 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: linux-mtd, dedekind1

On 12/04/2012 08:48 AM, Richard Weinberger wrote:
> Am Tue, 4 Dec 2012 08:45:18 -0600
> schrieb Zach Sadecki <zsadecki@itwatchdogs.com>:
>
>> On 12/03/2012 01:57 PM, Richard Weinberger wrote:
>>> This patch series contains two brown-paper-bag-bugfixes.
>>> I'm very sorry for them!
>>>
>>> [PATCH 1/2] ubi: Remove PEB from free tree in get_peb_for_wl()
>>> This one fixes a very odd issue with fastmap disabled.
>>> With fastmap disabled get_peb_for_wl() did not remove PEBs from
>>> the free list. Therefore the free, used and scrub trees got
>>> corrupted.
>>>
>>> [PATCH 2/2] ubi: Dont call ubi_self_check_all_ff() in __wl_get_peb()
>>> The issue fixed by that patch is also very nasty. I missed that
>>> ubi_self_check_all_ff() can sleep. So, we cannot call it from
>>> __wl_get_peb().
>>>
>>> Thanks,
>>> //richard
>> Thanks, guys.  An overnight test run has shown that these patches
>> have fixed the UBI problems on my board.  Something else odd is
>> happening, but that's for another thread...
> Good to know.
> Is the other issue also related to UBI?
>
> Thanks,
> //richard

Yes, it's related, but it may be the gpmi driver.  I'm getting -74 ECC 
errors occasionally which I shouldn't ever, as bitflips are being 
reported and fixed correctly now and the bitflip_threshold is turned 
down enough that it'd take 3 simultaneously appearing bit errors to be 
uncorrectable.  When these ECC errors occur there appears to be some 
wacky behavior from UBI. Maybe it's normal/expected, but it sure looks 
odd to me.  Here's the kernel messages:

[49980.860000] UBI: fixable bit-flip detected at PEB 3310
[49980.870000] UBI: schedule PEB 3310 for scrubbing
[49980.970000] UBI: fixable bit-flip detected at PEB 3310
[49981.120000] UBI warning: ubi_io_read: error -74 (ECC error) while 
reading 120832 bytes from PEB 3261:4096, read only 120832 bytes, retry
[49981.200000] UBI warning: ubi_io_read: error -74 (ECC error) while 
reading 120832 bytes from PEB 3261:4096, read only 120832 bytes, retry
[49981.270000] UBI warning: ubi_io_read: error -74 (ECC error) while 
reading 120832 bytes from PEB 3261:4096, read only 120832 bytes, retry
[49981.340000] UBI error: ubi_io_read: error -74 (ECC error) while 
reading 120832 bytes from PEB 3261:4096, read 120832 bytes
[49981.360000] [<c000daec>] (unwind_backtrace+0x0/0x134) from 
[<c02838f0>] (ubi_io_read+0x180/0x338)
[49981.370000] [<c02838f0>] (ubi_io_read+0x180/0x338) from [<c0280e40>] 
(ubi_eba_copy_leb+0x540/0x710)
[49981.370000] [<c0280e40>] (ubi_eba_copy_leb+0x540/0x710) from 
[<c0286e00>] (wear_leveling_worker+0x4bc/0x9c8)
[49981.380000] [<c0286e00>] (wear_leveling_worker+0x4bc/0x9c8) from 
[<c028592c>] (do_work+0x8c/0x110)
[49981.390000] [<c028592c>] (do_work+0x8c/0x110) from [<c0287740>] 
(ubi_thread+0xd4/0x178)
[49981.400000] [<c0287740>] (ubi_thread+0xd4/0x178) from [<c0037580>] 
(kthread+0xa0/0xac)
[49981.410000] [<c0037580>] (kthread+0xa0/0xac) from [<c00092e8>] 
(ret_from_fork+0x14/0x2c)
[49981.420000] UBI warning: ubi_eba_copy_leb: error -74 while reading 
data back from PEB 3261
[49981.430000] UBI: run torture test for PEB 3261
[49982.100000] UBI: PEB 3261 passed torture test, do not mark it as bad
[49982.180000] UBI: fixable bit-flip detected at PEB 3310
[49982.330000] UBI: fixable bit-flip detected at PEB 3261
[49982.330000] UBI: run torture test for PEB 3261
[49983.000000] UBI: PEB 3261 passed torture test, do not mark it as bad
[49983.080000] UBI: fixable bit-flip detected at PEB 3310
[49983.240000] UBI: fixable bit-flip detected at PEB 3261
[49983.240000] UBI: run torture test for PEB 3261
[49983.910000] UBI: PEB 3261 passed torture test, do not mark it as bad
[49984.000000] UBI: fixable bit-flip detected at PEB 3310
[49984.170000] UBI warning: ubi_io_read: error -74 (ECC error) while 
reading 126976 bytes from PEB 3261:4096, read only 126976 bytes, retry
[49984.260000] UBI warning: ubi_io_read: error -74 (ECC error) while 
reading 126976 bytes from PEB 3261:4096, read only 126976 bytes, retry
[49984.340000] UBI warning: ubi_io_read: error -74 (ECC error) while 
reading 126976 bytes from PEB 3261:4096, read only 126976 bytes, retry
[49984.420000] UBI error: ubi_io_read: error -74 (ECC error) while 
reading 126976 bytes from PEB 3261:4096, read 126976 bytes
[49984.430000] [<c000daec>] (unwind_backtrace+0x0/0x134) from 
[<c02838f0>] (ubi_io_read+0x180/0x338)
[49984.440000] [<c02838f0>] (ubi_io_read+0x180/0x338) from [<c0280e40>] 
(ubi_eba_copy_leb+0x540/0x710)
[49984.450000] [<c0280e40>] (ubi_eba_copy_leb+0x540/0x710) from 
[<c0286e00>] (wear_leveling_worker+0x4bc/0x9c8)
[49984.460000] [<c0286e00>] (wear_leveling_worker+0x4bc/0x9c8) from 
[<c028592c>] (do_work+0x8c/0x110)
[49984.470000] [<c028592c>] (do_work+0x8c/0x110) from [<c0287740>] 
(ubi_thread+0xd4/0x178)
[49984.480000] [<c0287740>] (ubi_thread+0xd4/0x178) from [<c0037580>] 
(kthread+0xa0/0xac)
[49984.490000] [<c0037580>] (kthread+0xa0/0xac) from [<c00092e8>] 
(ret_from_fork+0x14/0x2c)
[49984.500000] UBI warning: ubi_eba_copy_leb: error -74 while reading 
data back from PEB 3261
[49984.500000] UBI: run torture test for PEB 3261
[49985.170000] UBI: PEB 3261 passed torture test, do not mark it as bad
[49985.260000] UBI: fixable bit-flip detected at PEB 3310
[49985.440000] UBI warning: ubi_io_read: error -74 (ECC error) while 
reading 126976 bytes from PEB 3261:4096, read only 126976 bytes, retry
[49985.530000] UBI warning: ubi_io_read: error -74 (ECC error) while 
reading 126976 bytes from PEB 3261:4096, read only 126976 bytes, retry
[49985.610000] UBI warning: ubi_io_read: error -74 (ECC error) while 
reading 126976 bytes from PEB 3261:4096, read only 126976 bytes, retry
[49985.690000] UBI error: ubi_io_read: error -74 (ECC error) while 
reading 126976 bytes from PEB 3261:4096, read 126976 bytes
[49985.710000] [<c000daec>] (unwind_backtrace+0x0/0x134) from 
[<c02838f0>] (ubi_io_read+0x180/0x338)
[49985.710000] [<c02838f0>] (ubi_io_read+0x180/0x338) from [<c0280e40>] 
(ubi_eba_copy_leb+0x540/0x710)
[49985.720000] [<c0280e40>] (ubi_eba_copy_leb+0x540/0x710) from 
[<c0286e00>] (wear_leveling_worker+0x4bc/0x9c8)
[49985.730000] [<c0286e00>] (wear_leveling_worker+0x4bc/0x9c8) from 
[<c028592c>] (do_work+0x8c/0x110)
[49985.740000] [<c028592c>] (do_work+0x8c/0x110) from [<c0287740>] 
(ubi_thread+0xd4/0x178)
[49985.750000] [<c0287740>] (ubi_thread+0xd4/0x178) from [<c0037580>] 
(kthread+0xa0/0xac)
[49985.760000] [<c0037580>] (kthread+0xa0/0xac) from [<c00092e8>] 
(ret_from_fork+0x14/0x2c)
[49985.770000] UBI warning: ubi_eba_copy_leb: error -74 while reading 
data back from PEB 3261
[49985.780000] UBI: run torture test for PEB 3261
[49986.450000] UBI: PEB 3261 passed torture test, do not mark it as bad
[49986.530000] UBI: fixable bit-flip detected at PEB 3310
[49986.710000] UBI: fixable bit-flip detected at PEB 3261
[49986.710000] UBI: run torture test for PEB 3261
[49987.380000] UBI: PEB 3261 passed torture test, do not mark it as bad
[49987.500000] UBI: fixable bit-flip detected at PEB 3310
[49987.700000] UBI: fixable bit-flip detected at PEB 3261
[49987.710000] UBI: run torture test for PEB 3261
[49988.370000] UBI: PEB 3261 passed torture test, do not mark it as bad
[49988.450000] UBI: fixable bit-flip detected at PEB 3310
[49988.620000] UBI: scrubbed PEB 3310 (LEB 3:807), data moved to PEB 3261

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2012-12-04 14:56 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-11-30 15:05 UBI wl_tree_add problems after PEB scrubbed Zach Sadecki
2012-12-03 10:48 ` Artem Bityutskiy
2012-12-03 11:01   ` Artem Bityutskiy
2012-12-03 11:19   ` Richard Weinberger
2012-12-03 14:46 ` Artem Bityutskiy
2012-12-03 15:02   ` Richard Weinberger
2012-12-03 15:33     ` Artem Bityutskiy
     [not found]       ` <1354564667-9549-1-git-send-email-richard@nod.at>
2012-12-03 19:57         ` [PATCH 1/2] ubi: Remove PEB from free tree in get_peb_for_wl() Richard Weinberger
2012-12-04  8:34           ` Artem Bityutskiy
2012-12-04 10:24             ` Richard Weinberger
2012-12-03 19:57         ` [PATCH 2/2] ubi: Dont call ubi_self_check_all_ff() in __wl_get_peb() Richard Weinberger
2012-12-04  7:55           ` Artem Bityutskiy
2012-12-04  7:56           ` Artem Bityutskiy
2012-12-04  8:15             ` Artem Bityutskiy
2012-12-04  9:15               ` Richard Weinberger
2012-12-04 14:45         ` UBI Fastmap fixes for 3.7 Zach Sadecki
2012-12-04 14:48           ` Richard Weinberger
2012-12-04 14:56             ` Zach Sadecki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox