* ubi : kernel panic on erroneous block
@ 2010-08-10 9:56 Matthieu CASTET
  2010-08-10 11:42 ` Artem Bityutskiy
  2010-08-29 20:46 ` Artem Bityutskiy
  0 siblings, 2 replies; 5+ messages in thread
From: Matthieu CASTET @ 2010-08-10 9:56 UTC
To: linux-mtd@lists.infradead.org, Artem Bityutskiy

[-- Attachment #1: Type: text/plain, Size: 14390 bytes --]

Hi,

While running tests with UBIFS I hit the following crash.
One block is unstable after a power cut (some reads fail with an ECC error, correctable or not). This is due to an interrupted write or erase.

Our test first reads the whole UBI volume (cat /dev/ubi3_0 > /dev/null) to force a complete read of it.

In this case a correctable ECC error is detected and scrubbing is scheduled. But in ubi_eba_copy_leb the block becomes uncorrectable and is added to the erroneous list.
When UBIFS is mounted, the read path doesn't check that the block is erroneous and returns data. The block is scheduled for scrubbing again, but prot_queue_del crashes because we already removed it during the first scrubbing attempt.

Here is an attempt to fix the problem. It is ugly, and I haven't tried it yet: I erased my corrupted flash by accident.

Another solution could be to add the test in ubi_wl_scrub_peb, but I don't think it is OK to return data from an erroneous block.

Yet another solution could be to unmap the block (reads will then return 0xff), but this may break the upper layer?

Matthieu

[ 6.613769] UBI DBG (pid 266): ubi_scan: scanning is finished
[ 6.627283] UBI: attached mtd3 to ubi3
[ 6.630876] UBI: MTD device name: "P6system"
[ 6.636130] UBI: MTD device size: 32 MiB
[ 6.640948] UBI: number of good PEBs: 256
[ 6.645567] UBI: number of bad PEBs: 0
[ 6.649979] UBI: max.
allowed volumes: 128 [ 6.654591] UBI: wear-leveling threshold: 4096 [ 6.659274] UBI: number of internal volumes: 1 [ 6.663701] UBI: number of user volumes: 1 [ 6.668138] UBI: available PEBs: 0 [ 6.672559] UBI: total number of reserved PEBs: 256 [ 6.677433] UBI: number of PEBs reserved for bad PEB handling: 2 [ 6.683416] UBI: max/mean erase counter: 3717/3558 [ 6.688201] UBI: image sequence number: 1403635655 [ 6.693008] UBI: background thread "ubi_bgt3d" started, PID 269 UBI device number 3, total 256 LEBs (32505856 bytes, 31.0 MiB), available 0 LEBs (0 bytes), LEB size 126976 bytes (124.0 KiB) ----> cat /dev/ubi3_0 > /dev/null [ 6.908524] BA315_STATUS_DEC_ERR : 0 4 on 24525 [ 6.912907] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 6.931062] UBI DBG (pid 272): ubi_io_read: fixable bit-flip detected at PEB 189 [ 6.938448] UBI DBG (pid 272): ubi_wl_scrub_peb: schedule PEB 189 for scrubbing [ 6.947677] BA315_STATUS_DEC_ERR : 512 4 on 24525 [ 6.952226] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 6.976651] UBI error: ubi_io_read: error -74 while reading 126976 bytes from PEB 189:4096, read 126976 bytes [ 6.986429] [<c00279f0>] (dump_stack+0x0/0x14) from [<c0160fa8>] (ubi_io_read+0xf0/0x258) [ 6.994594] [<c0160eb8>] (ubi_io_read+0x0/0x258) from [<c01607e8>] (ubi_eba_copy_leb+0x204/0x58c) [ 7.003434] [<c01605e4>] (ubi_eba_copy_leb+0x0/0x58c) from [<c01638e8>] (wear_leveling_worker+0x2e4/0x630) [ 7.013074] [<c0163604>] (wear_leveling_worker+0x0/0x630) from [<c0162b0c>] (do_work+0x94/0xe8) [ 7.021758] [<c0162a78>] (do_work+0x0/0xe8) from [<c0163cc4>] (ubi_thread+0x90/0x118) [ 7.029576] r7:c789bc50 r6:00000000 r5:c7848000 r4:c789b800 [ 7.035222] [<c0163c34>] (ubi_thread+0x0/0x118) 
from [<c0047204>] (kthread+0x50/0x7c) [ 7.043036] [<c00471b4>] (kthread+0x0/0x7c) from [<c00360e8>] (do_exit+0x0/0x6ac) [ 7.050505] r5:00000000 r4:00000000 [ 7.054071] UBI warning: ubi_eba_copy_leb: error -74 while reading data from PEB 189 echo sleeping 38 real 0m 3.52s user 0m 0.01s sys 0m 0.25s ----> mounting ubifs on /dev/ubi3_0 info.type = 0x04 info.flags = 0x00000400 info.size = 0x02000000 info.erasesize = 0x00020000 info.writesize = 2048 info.oobsize = 64 ecc.eccbytes = 12 ecc.eccpos = 2,3,4,5,6,7,8,9,10,11,12,13, Please press Enter to activate this console. starting pid 277, tty '': '/bin/sh' BusyBox v1.16.0 (2010-06-30 18:04:36 CEST) b[ 10.423398] UBIFS: recovery needed uilt-in shell (ash) Enter 'help' for a list of built-in[ 10.431750] BA315_STATUS_DEC_ERR : 512 4 on 24525 commands. # echo sleepin[ 10.438532] ff g 38 sleeping 38 # ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 10.467210] UBI error: ubi_io_read: error -74 while reading 126976 bytes from PEB 189:4096, read 126976 bytes [ 10.477007] [<c00279f0>] (dump_stack+0x0/0x14) from [<c0160fa8>] (ubi_io_read+0xf0/0x258) [ 10.485144] [<c0160eb8>] (ubi_io_read+0x0/0x258) from [<c016035c>] (ubi_eba_read_leb+0x1a0/0x428) [ 10.494005] [<c01601bc>] (ubi_eba_read_leb+0x0/0x428) from [<c015e3c0>] (ubi_leb_read+0xe8/0x138) [ 10.502859] [<c015e2d8>] (ubi_leb_read+0x0/0x138) from [<c00d6918>] (ubifs_start_scan+0x7c/0xf4) [ 10.511633] r7:c79f3000 r6:00000000 r5:c798b8e0 r4:00000000 [ 10.517276] [<c00d689c>] (ubifs_start_scan+0x0/0xf4) from [<c00d6b4c>] (ubifs_scan+0x2c/0x298) [ 10.525878] r8:00000003 r7:c79f3000 r6:00000000 r5:c8d01000 r4:0001f000 [ 10.532560] [<c00d6b20>] (ubifs_scan+0x0/0x298) from [<c00d71dc>] (ubifs_replay_journal+0x14c/0x13a4) [ 10.541769] [<c00d7090>] (ubifs_replay_journal+0x0/0x13a4) from [<c00cdd68>] (ubifs_fill_super+0xb84/0x1054) [ 10.551580] 
[<c00cd1e4>] (ubifs_fill_super+0x0/0x1054) from [<c00ced04>] (ubifs_get_sb+0xc4/0x2ac) [ 10.560525] [<c00cec40>] (ubifs_get_sb+0x0/0x2ac) from [<c007f04c>] (vfs_kern_mount+0x58/0x94) [ 10.569124] [<c007eff4>] (vfs_kern_mount+0x0/0x94) from [<c007f0e8>] (do_kern_mount+0x40/0xe8) [ 10.577736] r8:c79fa000 r7:c02253ec r6:00000000 r5:c78fb000 r4:00000000 [ 10.584408] [<c007f0a8>] (do_kern_mount+0x0/0xe8) from [<c0095628>] (do_new_mount+0x68/0x8c) [ 10.592833] r8:00000000 r7:0000000a r6:c784bef0 r5:00000000 r4:c79fa000 [ 10.599519] [<c00955c0>] (do_new_mount+0x0/0x8c) from [<c00957a8>] (do_mount+0x15c/0x1b8) [ 10.607694] r7:c79fa000 r6:c78fb000 r5:c78d9000 r4:00000404 [ 10.613327] [<c009564c>] (do_mount+0x0/0x1b8) from [<c0095890>] (sys_mount+0x8c/0xd4) [ 10.621144] [<c0095804>] (sys_mount+0x0/0xd4) from [<c0023c00>] (ret_fast_syscall+0x0/0x2c) [ 10.629483] r7:00000015 r6:00008840 r5:00000000 r4:00000000 [ 10.638307] BA315_STATUS_DEC_ERR : 0 4 on 24525 [ 10.642677] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 10.667078] UBI DBG (pid 273): ubi_io_read: fixable bit-flip detected at PEB 189 [ 10.674321] UBI DBG (pid 273): ubi_wl_scrub_peb: schedule PEB 189 for scrubbing [ 10.681652] Unable to handle kernel NULL pointer dereference at virtual address 00000000 [ 10.689703] pgd = c79e4000 [ 10.692381] [00000000] *pgd=47841031, *pte=00000000, *ppte=00000000 [ 10.698639] Internal error: Oops: 817 [#1] [ 10.702713] Modules linked in: [ 10.705761] CPU: 0 Not tainted (2.6.27.47-parrot-dirty #212) [ 10.711769] PC is at prot_queue_del+0x2c/0x50 [ 10.716100] LR is at ubi_wl_scrub_peb+0xec/0x13c [ 10.720700] pc : [<c0162430>] lr : [<c01635b4>] psr: a0000013 [ 10.720716] sp : c784ba70 ip : c78ff290 fp : c784ba7c [ 10.732157] r10: 00000000 r9 : 00000003 r8 : 000000bd [ 10.737371] r7 : c789bbcc r6 : c789bbc0 r5 : c789b800 r4 : c78ff290 [ 
10.743883] r3 : 00100100 r2 : 00000001 r1 : 00000000 r0 : ffffffed [ 10.750398] Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user [ 10.757519] Control: 0005317f Table: 479e4000 DAC: 00000015 [ 10.763249] Process endurance (pid: 273, stack limit = 0xc784a268) [ 10.769416] Stack: (0xc784ba70 to 0xc784c000) [ 10.773753] ba60: c784baa4 c784ba80 c01635b4 c0162414 [ 10.782004] ba80: c8d01000 00000000 c8d01000 c789b800 c73fd000 000000bd c784baec c784baa8 [ 10.790254] baa0: c01603bc c01634d8 0001f000 00000001 c0236138 c8d01000 00000001 00000000 [ 10.798505] bac0: 00000000 c73fd000 c8d01000 00000000 0001f000 00000003 ffffff8b 00000000 [ 10.806755] bae0: c784bb1c c784baf0 c015e3c0 c01601cc 00000000 0001f000 00000000 c0079a30 [ 10.815005] bb00: 00000000 c798b8e0 00000000 c79f3000 c784bb54 c784bb20 c00d6918 c015e2e8 [ 10.823256] bb20: 0001f000 00000000 c0023874 c0023010 c79f3000 c8d01000 c79f3000 c798b8e0 [ 10.831506] bb40: c79f3000 00000003 c784bbac c784bb58 c00e3650 c00d68ac 00000004 c784bbac [ 10.839757] bb60: c784bb78 c8d01000 c00d6d08 c00d6d14 00000000 00000000 c8d01000 0001f000 [ 10.848007] bb80: c784bbac 00000000 c79f3850 c798b8e0 c79f3000 00000003 ffffff8b 00000000 [ 10.856257] bba0: c784bbf4 c784bbb0 c00e444c c00e3624 00000000 c784bbc0 c8d01000 c00d66b4 [ 10.864508] bbc0: 00005800 00000001 c79f3870 00000000 c79f3850 c79f3870 c79f3000 00000003 [ 10.872758] bbe0: ffffff8b c79f3000 c784bd34 c784bbf8 c00d7c20 c00e4394 00000001 00000000 [ 10.881009] bc00: 00000000 c02305c0 c784a000 000000ef c784bc74 c784bc20 c005ee14 c005de64 [ 10.889259] bc20: 00000001 00000044 000200d2 00000000 c79f3850 00000001 c0007228 c8be5000 [ 10.897509] bc40: 00000000 00000000 c8d01000 c79f376c c784a000 ffffff8b 00000000 00000000 [ 10.905760] bc60: c789bbb0 c7972760 c784bcac c784bc78 c7972760 c789b800 00000007 00000000 [ 10.914010] bc80: 00000000 c784bd00 c7972760 c7972760 c789b800 00000007 c784bcbc c784bca8 [ 10.922261] bca0: c015edcc c00792bc c789b800 c73fd000 c784bce4 c784bcc0 
c015eea0 c015ed74 [ 10.930511] bcc0: 0001f000 00000001 00000018 c79f3000 00000000 00000007 c784bcf4 c784bce8 [ 10.938761] bce0: c015e15c c015ee30 c784bd34 c784bcf8 c00e08d8 c015e100 0000000b 00000000 [ 10.947012] bd00: c891b00b 00000000 00000000 000000eb 00000000 c79f3870 c79f3000 000000eb [ 10.955262] bd20: 00000000 00000000 c784bdec c784bd38 c00cdd68 c00d70a0 00000000 c00fccdc [ 10.963513] bd40: c780c1a0 c780ff60 00000000 c784bdc0 00000001 00000003 c784bef0 00000000 [ 10.971763] bd60: 0001e5a0 00000000 00117000 00000000 c784bd94 00000003 01d2f000 00000000 [ 10.980014] bd80: 00000000 c79b8c00 c73ffd20 c79f3724 c79f3008 c79f37f8 c79f36a4 c79f371c [ 10.988264] bda0: c00cbf10 c0225404 c784bdec 00000000 00000001 00000000 0007c001 00000000 [ 10.996514] bdc0: c780c1a0 c79b8c00 c73ffd20 c79b8c00 00000000 c02253ec c784bef0 00000000 [ 11.004765] bde0: c784be6c c784bdf0 c00ced04 c00cd1f4 00000000 c023997c 00000003 00000000 [ 11.013015] be00: 000000fa c784be10 01e46000 00000000 c78fb000 00000003 00000000 00000000 [ 11.021266] be20: 00000001 0001f000 00000006 c73fd18c 0fd00001 c780cf20 c784be6c c784be48 [ 11.029516] be40: c0093e4c 00000000 c78fb000 c780cf20 00000000 c02253ec c784bef0 0000000a [ 11.037766] be60: c784be9c c784be70 c007f04c c00cec50 c780cf20 c784be80 c00928b0 00000000 [ 11.046017] be80: c78fb000 00000000 c02253ec c79fa000 c784bec4 c784bea0 c007f0e8 c007f004 [ 11.054267] bea0: c784bec4 c79fa000 00000000 c784bef0 0000000a 00000000 c784bee4 c784bec8 [ 11.062518] bec0: c0095628 c007f0b8 00000404 c78d9000 c78fb000 c79fa000 c784bf6c c784bee8 [ 11.070768] bee0: c00957a8 c00955d0 c78fb000 00000000 c780c3a0 c74844b8 c784bf6c c784bf08 [ 11.079018] bf00: c002382c 00000001 00000001 00000000 00000000 00000000 0000038c c784bf7c [ 11.087269] bf20: 00001000 c78fb000 c0023d84 c784a000 40068008 c784bf6c 000200d0 c784bf50 [ 11.095519] bf40: 00000000 00000000 c78d9000 0000a38c 00000404 c0023d84 c784a000 40068008 [ 11.103770] bf60: c784bfa4 c784bf70 c0095890 c009565c 00000000 
c78d9000 beb41f78 c78fb000 [ 11.112020] bf80: c79fa000 00000000 00000000 00000000 00008840 00000015 00000000 c784bfa8 [ 11.120270] bfa0: c0023c00 c0095814 00000000 00000000 0000a38c 0000a384 0000a398 00000404 [ 11.128521] bfc0: 00000000 00000000 00008840 00000015 000086cc 00000001 40068008 00008e60 [ 11.136771] bfe0: 4001b424 beb41d10 00008cd8 4001b438 20000010 0000a38c 7d195ec9 5c1aa6a4 [ 11.145022] Backtrace: [ 11.147455] [<c0162404>] (prot_queue_del+0x0/0x50) from [<c01635b4>] (ubi_wl_scrub_peb+0xec/0x13c) [ 11.156400] [<c01634c8>] (ubi_wl_scrub_peb+0x0/0x13c) from [<c01603bc>] (ubi_eba_read_leb+0x200/0x428) [ 11.165693] r8:000000bd r7:c73fd000 r6:c789b800 r5:c8d01000 r4:00000000 [ 11.172379] [<c01601bc>] (ubi_eba_read_leb+0x0/0x428) from [<c015e3c0>] (ubi_leb_read+0xe8/0x138) [ 11.181237] [<c015e2d8>] (ubi_leb_read+0x0/0x138) from [<c00d6918>] (ubifs_start_scan+0x7c/0xf4) [ 11.190010] r7:c79f3000 r6:00000000 r5:c798b8e0 r4:00000000 [ 11.195654] [<c00d689c>] (ubifs_start_scan+0x0/0xf4) from [<c00e3650>] (ubifs_recover_leb+0x3c/0x730) [ 11.204861] r8:00000003 r7:c79f3000 r6:c798b8e0 r5:c79f3000 r4:c8d01000 [ 11.211547] [<c00e3614>] (ubifs_recover_leb+0x0/0x730) from [<c00e444c>] (ubifs_recover_log_leb+0xc8/0x2dc) [ 11.221274] [<c00e4384>] (ubifs_recover_log_leb+0x0/0x2dc) from [<c00d7c20>] (ubifs_replay_journal+0xb90/0x13a4) [ 11.231435] [<c00d7090>] (ubifs_replay_journal+0x0/0x13a4) from [<c00cdd68>] (ubifs_fill_super+0xb84/0x1054) [ 11.241249] [<c00cd1e4>] (ubifs_fill_super+0x0/0x1054) from [<c00ced04>] (ubifs_get_sb+0xc4/0x2ac) [ 11.250194] [<c00cec40>] (ubifs_get_sb+0x0/0x2ac) from [<c007f04c>] (vfs_kern_mount+0x58/0x94) [ 11.258793] [<c007eff4>] (vfs_kern_mount+0x0/0x94) from [<c007f0e8>] (do_kern_mount+0x40/0xe8) [ 11.267390] r8:c79fa000 r7:c02253ec r6:00000000 r5:c78fb000 r4:00000000 [ 11.274077] [<c007f0a8>] (do_kern_mount+0x0/0xe8) from [<c0095628>] (do_new_mount+0x68/0x8c) [ 11.282501] r8:00000000 r7:0000000a r6:c784bef0 r5:00000000 r4:c79fa000 [ 
11.289188] [<c00955c0>] (do_new_mount+0x0/0x8c) from [<c00957a8>] (do_mount+0x15c/0x1b8) [ 11.297352] r7:c79fa000 r6:c78fb000 r5:c78d9000 r4:00000404 [ 11.302997] [<c009564c>] (do_mount+0x0/0x1b8) from [<c0095890>] (sys_mount+0x8c/0xd4) [ 11.310813] [<c0095804>] (sys_mount+0x0/0xd4) from [<c0023c00>] (ret_fast_syscall+0x0/0x2c) [ 11.319151] r7:00000015 r6:00008840 r5:00000000 r4:00000000 [ 11.324794] Code: 089da800 e59c1004 e59c2000 e59f3018 (e5812000) [ 11.330924] Kernel panic - not syncing: Fatal exception [-- Attachment #2: ubifs.diff --] [-- Type: text/x-diff, Size: 1530 bytes --] diff --git a/drivers/mtd/ubi/eba.c b/drivers/mtd/ubi/eba.c index 7fbe0d7..289c003 100644 --- a/drivers/mtd/ubi/eba.c +++ b/drivers/mtd/ubi/eba.c @@ -367,6 +367,7 @@ out_unlock: * returned for any volume type if an ECC error was detected by the MTD device * driver. Other negative error cored may be returned in case of other errors. */ +int in_wl_tree(struct ubi_wl_entry *e, struct rb_root *root); int ubi_eba_read_leb(struct ubi_device *ubi, struct ubi_volume *vol, int lnum, void *buf, int offset, int len, int check) { @@ -392,6 +393,19 @@ int ubi_eba_read_leb(struct ubi_device *ubi, struct ubi_volume *vol, int lnum, memset(buf, 0xFF, len); return 0; } + { + struct ubi_wl_entry *e; + int bad; + + spin_lock(&ubi->wl_lock); + e = ubi->lookuptbl[pnum]; + bad = in_wl_tree(e, &ubi->erroneous); + spin_unlock(&ubi->wl_lock); + /* we should not append to read bad block */ + if (bad) { + return -EBADMSG; + } + } dbg_eba("read %d bytes from offset %d of LEB %d:%d, PEB %d", len, offset, vol_id, lnum, pnum); diff --git a/drivers/mtd/ubi/wl.c b/drivers/mtd/ubi/wl.c index 10b6100..3c4c3ed 100644 --- a/drivers/mtd/ubi/wl.c +++ b/drivers/mtd/ubi/wl.c @@ -292,7 +292,7 @@ static int produce_free_peb(struct ubi_device *ubi) * This function returns non-zero if @e is in the @root RB-tree and zero if it * is not. 
  */
-static int in_wl_tree(struct ubi_wl_entry *e, struct rb_root *root)
+/*static */int in_wl_tree(struct ubi_wl_entry *e, struct rb_root *root)
 {
 	struct rb_node *p;
* Re: ubi : kernel panic on erroneous block
From: Artem Bityutskiy @ 2010-08-10 11:42 UTC
To: Matthieu CASTET, Adrian.Hunter
Cc: Artem Bityutskiy, linux-mtd@lists.infradead.org

On Tue, 2010-08-10 at 11:56 +0200, Matthieu CASTET wrote:
> Hi,
>
>
> when running test with ubifs I found the following crash.
> One block is instable (some read fails with ecc error correctable or
> not) after a power cut. This is due to interrupted write or erase.
>
> Our test do first a read of the ubi volume (cat /dev/ubi3_0 > /dev/null)
> to force complete read of it.
>
> In this case ecc correctable is detected, and scrubbing is scheduled
> But ubi_eba_copy_leb: the block become uncorrectable and added to
> erroneous list.
> When mounting ubifs read doesn't check that it is erroneous and return data.
> It is added again for scrubbing, but prot_queue_del crash because we
> already remove it in the first scrubbing try.
>
> Here an attempt to fix the problem. This is ugly. I didn't try it yet. I
> erased my corrupted flash by accident.
>
> One other solution could be to add the test in ubi_wl_scrub_peb, but I
> don't think it is ok to return data on erroneous block.
>
> An other solution could be to unmap the block (read will return 0xff),
> but this may break upper layer ?

Matthieu, unfortunately I'm on holiday, so I cannot really look at this.
I also already have a lot of UBI/UBIFS issues waiting for me. I think
I'll only start looking at things in mid-September/October. Sorry about
this. But maybe Adrian could take a look, if he has some time? :-)

Artem.
* Re: ubi : kernel panic on erroneous block
From: Matthieu CASTET @ 2010-08-23 13:30 UTC
To: dedekind1@gmail.com
Cc: Artem Bityutskiy, linux-mtd@lists.infradead.org, Adrian.Hunter@nokia.com

Hi Artem,

Artem Bityutskiy wrote:
> On Tue, 2010-08-10 at 11:56 +0200, Matthieu CASTET wrote:
>> Hi,
>>
>
> Matthieu, unfortunately I'm on holidays so cannot really look at this.
> And I already have a lot of UBI/UBIFS issues waiting for me to look at.
> I think I'll start looking at the things only in mid-September/October.
> Sorry for this. But may be Adrian could take a look at this, if he has
> some time? :-)

I don't know whether you are back from holiday, but since you are posting on the ML, I will post my further investigation.

I have done more tests on this flash and got other failures.

The problem seems to be in the handling of interrupted writes. On some of the NAND chips we use, the page becomes unstable and reads can return varying values.
The manufacturer told us we should not use pages where a write was interrupted; they must go through an erase cycle before they can be used again.

On mounting, for the pages where a write was interrupted by a power cut:
- I saw ECC errors. In this case UBIFS should reject the page in recovery handling and everything should be fine.
- I saw correctable errors. In this case UBI moves the block, unless the next read in copy_page returns an ECC error. If the ECC error shows up during the copy, we see it too late: UBIFS recovery is already done.
- In this case UBIFS recovery can reject it if the data is not OK (bad CRC, ...). Note that in this case we did the scrubbing move for nothing.
- I saw pages that return correct data (ECC and CRC OK), but later return (un)correctable errors. Again this is too late [1]: recovery is already done.

It seems UBI/UBIFS doesn't identify pages with interrupted writes at scanning/mount time ATM.
It relies only on ECC/CRC, but this is not enough for unstable pages: they can be good (or have a 1-bit error) on one read and bad on the next.

So the problem is to identify pages with interrupted writes at scanning/mount time.

For static volumes it should be easy with the interrupted flags.
There is the tricky case of a data move (for wear leveling or scrubbing): if the sqnum of the copy is the biggest, we should ignore it/copy it.

But for dynamic volumes/UBIFS that's another story. Maybe something is possible using the UBI sqnum plus the UBIFS journal.

Matthieu

PS: the same story happens for erase, but UBI should handle it correctly.

[1]
[ 12.720244] UBIFS: un-mount UBI device 3, volume 0
[ 12.760056] UBIFS: mounted UBI device 3, volume 0, name "system"
[ 12.765919] UBIFS: file system size: 30601216 bytes (29884 KiB, 29 MiB, 241 LEBs)
[ 12.773642] UBIFS: journal size: 1523712 bytes (1488 KiB, 1 MiB, 12 LEBs)
[ 12.780868] UBIFS: media format: w4/r0 (latest is w4/r0)
[ 12.786668] UBIFS: default compressor: none
[ 12.790852] UBIFS: reserved for root: 1445370 bytes (1411 KiB)
writing file '//mnt/dir06/file0046.bin' num=70, size=147120
writing file '//mnt/dir0c/file006c.bin' num=108, size=288146
[ 13.491407] UBI error: ubi_io_read: error -74 while reading 60 bytes from PEB 106:129480, read 60 bytes
[ 13.500785] [<c00279f0>] (dump_stack+0x0/0x14) from [<c0161040>] (ubi_io_read+0xf0/0x258)
[ 13.508952] [<c0160f50>] (ubi_io_read+0x0/0x258) from [<c01603a0>] (ubi_eba_read_leb+0x1b4/0x490)
[ 13.517791] [<c01601ec>] (ubi_eba_read_leb+0x0/0x490) from [<c015e3f0>] (ubi_leb_read+0xe8/0x138)
[ 13.526649] [<c015e308>] (ubi_leb_read+0x0/0x138) from [<c00d0c48>] (ubifs_read_node+0x40/0x190)
[ 13.535423] r7:00000002 r6:00000000 r5:c78489a0 r4:c78489a0
[ 13.541065] [<c00d0c08>] (ubifs_read_node+0x0/0x190) from [<c00d18b8>] (ubifs_read_node_wbuf+0x4c/0x204)
[ 13.550547] [<c00d186c>] (ubifs_read_node_wbuf+0x0/0x204) from [<c00e6b60>] (ubifs_tnc_read_node+0x5c/0xf8)
[ 13.560274] [<c00e6b04>]
(ubifs_tnc_read_node+0x0/0xf8) from [<c00d32a8>] (matches_name+0x94/0xdc) [ 13.569218] [<c00d3214>] (matches_name+0x0/0xdc) from [<c00d3334>] (resolve_collision+0x44/0x204) [ 13.578074] [<c00d32f0>] (resolve_collision+0x0/0x204) from [<c00d45e4>] (ubifs_tnc_remove_nm+0xf0/0x108) [ 13.587615] [<c00d44f4>] (ubifs_tnc_remove_nm+0x0/0x108) from [<c00c7f08>] (ubifs_jnl_rename+0x4f8/0x70c) [ 13.597169] [<c00c7a10>] (ubifs_jnl_rename+0x0/0x70c) from [<c00caaf8>] (ubifs_rename+0x2b0/0x5e4) [ 13.606117] [<c00ca848>] (ubifs_rename+0x0/0x5e4) from [<c008581c>] (vfs_rename+0x238/0x270) [ 13.614538] [<c00855e4>] (vfs_rename+0x0/0x270) from [<c0086e54>] (sys_renameat+0x1b8/0x1cc) [ 13.622965] [<c0086c9c>] (sys_renameat+0x0/0x1cc) from [<c0086e8c>] (sys_rename+0x24/0x28) [ 13.631213] [<c0086e68>] (sys_rename+0x0/0x28) from [<c0023c00>] (ret_fast_syscall+0x0/0x2c) [ 13.639670] UBIFS error (pid 273): ubifs_read_node: bad node type (0 but expected 2) [ 13.647371] UBIFS error (pid 273): ubifs_read_node: bad node at LEB 47:125384 [ 13.654514] UBIFS warning (pid 273): ubifs_ro_mode: switched to read-only mode, error -22 /endurance: endurance.c: 197: create_file: Assertion `status == 0' failed. [ 46.357586] UBIFS error (pid 101): make_reservation: cannot reserve 160 bytes in jhead 1, error -30 [ 46.366503] UBIFS error (pid 101): ubifs_write_inode: can't write inode 19507, error -30 ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: ubi : kernel panic on erroneous block
From: Artem Bityutskiy @ 2010-08-29 20:46 UTC
To: Matthieu CASTET; +Cc: Artem Bityutskiy, linux-mtd@lists.infradead.org

Hi Matthieu,

I've read both of your mails. It looks like there are several issues.
Let's try to identify them and then deal with them one by one. I think I
see one of them; you'll find the fix at the end of this e-mail. But let
me also comment on your e-mail to let you know what I think was
happening.

On Tue, 2010-08-10 at 11:56 +0200, Matthieu CASTET wrote:
> when running test with ubifs I found the following crash.
> One block is instable (some read fails with ecc error correctable or
> not) after a power cut. This is due to interrupted write or erase.

OK

> Our test do first a read of the ubi volume (cat /dev/ubi3_0 > /dev/null)
> to force complete read of it.

OK

> In this case ecc correctable is detected, and scrubbing is scheduled
> But ubi_eba_copy_leb: the block become uncorrectable and added to
> erroneous list.

OK, this is fine and expected behavior.

> When mounting ubifs read doesn't check that it is erroneous and return data.
> It is added again for scrubbing, but prot_queue_del crash because we
> already remove it in the first scrubbing try.
>
> Here an attempt to fix the problem. This is ugly. I didn't try it yet. I
> erased my corrupted flash by accident.
>
> One other solution could be to add the test in ubi_wl_scrub_peb, but I
> don't think it is ok to return data on erroneous block.
>
> An other solution could be to unmap the block (read will return 0xff),
> but this may break upper layer ?

Err, I think the bug is actually very simple - when I introduced the
erroneous list, I just forgot to add a check in 'ubi_wl_scrub_peb()'.
> [ 6.613769] UBI DBG (pid 266): ubi_scan: scanning is finished > [ 6.627283] UBI: attached mtd3 to ubi3 > [ 6.630876] UBI: MTD device name: "P6system" > [ 6.636130] UBI: MTD device size: 32 MiB > [ 6.640948] UBI: number of good PEBs: 256 > [ 6.645567] UBI: number of bad PEBs: 0 > [ 6.649979] UBI: max. allowed volumes: 128 > [ 6.654591] UBI: wear-leveling threshold: 4096 > [ 6.659274] UBI: number of internal volumes: 1 > [ 6.663701] UBI: number of user volumes: 1 > [ 6.668138] UBI: available PEBs: 0 > [ 6.672559] UBI: total number of reserved PEBs: 256 > [ 6.677433] UBI: number of PEBs reserved for bad PEB handling: 2 > [ 6.683416] UBI: max/mean erase counter: 3717/3558 > [ 6.688201] UBI: image sequence number: 1403635655 > [ 6.693008] UBI: background thread "ubi_bgt3d" started, PID 269 > UBI device number 3, total 256 LEBs (32505856 bytes, 31.0 MiB), > available 0 LEBs (0 bytes), LEB size 126976 bytes (124.0 KiB) > > ----> cat /dev/ubi3_0 > /dev/null OK, you are reading whole UBI. > [ 6.908524] BA315_STATUS_DEC_ERR : 0 4 on 24525 > [ 6.912907] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > [ 6.931062] UBI DBG (pid 272): ubi_io_read: fixable bit-flip detected > at PEB 189 > [ 6.938448] UBI DBG (pid 272): ubi_wl_scrub_peb: schedule PEB 189 for > scrubbing OK, a fixable I/O error happens, and this PEB is scheduled for scrubbing. So far so good. > [ 6.947677] BA315_STATUS_DEC_ERR : 512 4 on 24525 > [ 6.952226] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > [ 6.976651] UBI error: ubi_io_read: error -74 while reading 126976 > bytes from PEB 189:4096, read 126976 bytes OK. While scrubbing, we have a hard error, and UBI reports about it. 
As you described, we are dealing with an unstable NAND page which
sometimes may be read with no errors, sometimes with a correctable ECC
error, sometimes with an uncorrectable ECC error.

> [ 6.986429] [<c00279f0>] (dump_stack+0x0/0x14) from [<c0160fa8>]
> (ubi_io_read+0xf0/0x258)
> [ 6.994594] [<c0160eb8>] (ubi_io_read+0x0/0x258) from [<c01607e8>]
> (ubi_eba_copy_leb+0x204/0x58c)
> [ 7.003434] [<c01605e4>] (ubi_eba_copy_leb+0x0/0x58c) from
> [<c01638e8>] (wear_leveling_worker+0x2e4/0x630)
> [ 7.013074] [<c0163604>] (wear_leveling_worker+0x0/0x630) from
> [<c0162b0c>] (do_work+0x94/0xe8)
> [ 7.021758] [<c0162a78>] (do_work+0x0/0xe8) from [<c0163cc4>]
> (ubi_thread+0x90/0x118)
> [ 7.029576] r7:c789bc50 r6:00000000 r5:c7848000 r4:c789b800
> [ 7.035222] [<c0163c34>] (ubi_thread+0x0/0x118) from [<c0047204>]
> (kthread+0x50/0x7c)
> [ 7.043036] [<c00471b4>] (kthread+0x0/0x7c) from [<c00360e8>]
> (do_exit+0x0/0x6ac)
> [ 7.050505] r5:00000000 r4:00000000

You have debugging enabled, so you are enjoying extra UBI prints, but
they are useful :-)

> [ 7.054071] UBI warning: ubi_eba_copy_leb: error -74 while reading
> data from PEB 189

The WL code warns that we cannot read the LEB, this is OK. So far so
good.

> ----> mounting ubifs on /dev/ubi3_0

[snip]

> [ 10.467210] UBI error: ubi_io_read: error -74 while reading 126976
> bytes from PEB 189:4096, read 126976 bytes

OK, we are now mounting UBIFS, which reads the LEB mapped to the faulty
PEB 189. That PEB is currently sitting in UBI's 'erroneous' list, which
is kind of logical.

Also note that most PEBs whose writes were interrupted will belong to
the UBIFS journal. There is another case, as you already identified in
your previous mail: PEBs which were WL-copied in UBI. But let's
concentrate on your case.

The UBIFS journal scanning code expects -EBADMSG errors and is designed
to handle them.
> [ 10.477007] [<c00279f0>] (dump_stack+0x0/0x14) from [<c0160fa8>] > (ubi_io_read+0xf0/0x258) > [ 10.485144] [<c0160eb8>] (ubi_io_read+0x0/0x258) from [<c016035c>] > (ubi_eba_read_leb+0x1a0/0x428) > [ 10.494005] [<c01601bc>] (ubi_eba_read_leb+0x0/0x428) from > [<c015e3c0>] (ubi_leb_read+0xe8/0x138) > [ 10.502859] [<c015e2d8>] (ubi_leb_read+0x0/0x138) from [<c00d6918>] > (ubifs_start_scan+0x7c/0xf4) > [ 10.511633] r7:c79f3000 r6:00000000 r5:c798b8e0 r4:00000000 > [ 10.517276] [<c00d689c>] (ubifs_start_scan+0x0/0xf4) from > [<c00d6b4c>] (ubifs_scan+0x2c/0x298) > [ 10.525878] r8:00000003 r7:c79f3000 r6:00000000 r5:c8d01000 r4:0001f000 > [ 10.532560] [<c00d6b20>] (ubifs_scan+0x0/0x298) from [<c00d71dc>] > (ubifs_replay_journal+0x14c/0x13a4) > [ 10.541769] [<c00d7090>] (ubifs_replay_journal+0x0/0x13a4) from > [<c00cdd68>] (ubifs_fill_super+0xb84/0x1054) > [ 10.551580] [<c00cd1e4>] (ubifs_fill_super+0x0/0x1054) from > [<c00ced04>] (ubifs_get_sb+0xc4/0x2ac) > [ 10.560525] [<c00cec40>] (ubifs_get_sb+0x0/0x2ac) from [<c007f04c>] > (vfs_kern_mount+0x58/0x94) > [ 10.569124] [<c007eff4>] (vfs_kern_mount+0x0/0x94) from [<c007f0e8>] > (do_kern_mount+0x40/0xe8) > [ 10.577736] r8:c79fa000 r7:c02253ec r6:00000000 r5:c78fb000 r4:00000000 > [ 10.584408] [<c007f0a8>] (do_kern_mount+0x0/0xe8) from [<c0095628>] > (do_new_mount+0x68/0x8c) > [ 10.592833] r8:00000000 r7:0000000a r6:c784bef0 r5:00000000 r4:c79fa000 > [ 10.599519] [<c00955c0>] (do_new_mount+0x0/0x8c) from [<c00957a8>] > (do_mount+0x15c/0x1b8) > [ 10.607694] r7:c79fa000 r6:c78fb000 r5:c78d9000 r4:00000404 > [ 10.613327] [<c009564c>] (do_mount+0x0/0x1b8) from [<c0095890>] > (sys_mount+0x8c/0xd4) > [ 10.621144] [<c0095804>] (sys_mount+0x0/0xd4) from [<c0023c00>] > (ret_fast_syscall+0x0/0x2c) OK, this gives us additional info about where we are. 
> [ 10.629483] r7:00000015 r6:00008840 r5:00000000 r4:00000000
> [ 10.638307] BA315_STATUS_DEC_ERR : 0 4 on 24525
> [ 10.642677] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> [ 10.667078] UBI DBG (pid 273): ubi_io_read: fixable bit-flip detected
> at PEB 189

Because of inlining, your stack-dump lacks some function calls. I think
it was:

ubifs_replay_journal() -> replay_log_leb() -> ubifs_scan() -> ...

Then we had -EBADMSG, and returned back to 'replay_log_leb()'. Then we
call 'ubifs_recover_log_leb()', which scans the LEB again. Yes, it is
sub-optimal to scan twice, but this is how the thing is implemented. I
can try to fix it later - but this is not so important now, let's first
deal with your issues. But please, feel free to bug me later and remind
me about this.

So, we are scanning for the second time. And this time we get a bit-flip
instead of -EBADMSG.

> [ 10.674321] UBI DBG (pid 273): ubi_wl_scrub_peb: schedule PEB 189 for

And as the stack-dump below shows, we end up here.
> scrubbing > [ 10.681652] Unable to handle kernel NULL pointer dereference at > virtual address 00000000 > [ 10.689703] pgd = c79e4000 > [ 10.692381] [00000000] *pgd=47841031, *pte=00000000, *ppte=00000000 > [ 10.698639] Internal error: Oops: 817 [#1] > [ 10.702713] Modules linked in: > [ 10.705761] CPU: 0 Not tainted (2.6.27.47-parrot-dirty #212) > [ 10.711769] PC is at prot_queue_del+0x2c/0x50 > [ 10.716100] LR is at ubi_wl_scrub_peb+0xec/0x13c > [ 10.720700] pc : [<c0162430>] lr : [<c01635b4>] psr: a0000013 > [ 10.720716] sp : c784ba70 ip : c78ff290 fp : c784ba7c > [ 10.732157] r10: 00000000 r9 : 00000003 r8 : 000000bd > [ 10.737371] r7 : c789bbcc r6 : c789bbc0 r5 : c789b800 r4 : c78ff290 > [ 10.743883] r3 : 00100100 r2 : 00000001 r1 : 00000000 r0 : ffffffed > [ 10.750398] Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM > Segment user > [ 10.757519] Control: 0005317f Table: 479e4000 DAC: 00000015 > [ 10.763249] Process endurance (pid: 273, stack limit = 0xc784a268) > [ 10.769416] Stack: (0xc784ba70 to 0xc784c000) And oops, because 'ubi_wl_scrub_peb()' is buggy and does not handle the case when the PEB is erroneous. 
[snip long hex dump of the stack]

> [ 11.145022] Backtrace:
> [ 11.147455] [<c0162404>] (prot_queue_del+0x0/0x50) from [<c01635b4>]
> (ubi_wl_scrub_peb+0xec/0x13c)
> [ 11.156400] [<c01634c8>] (ubi_wl_scrub_peb+0x0/0x13c) from
> [<c01603bc>] (ubi_eba_read_leb+0x200/0x428)
> [ 11.165693] r8:000000bd r7:c73fd000 r6:c789b800 r5:c8d01000 r4:00000000
> [ 11.172379] [<c01601bc>] (ubi_eba_read_leb+0x0/0x428) from
> [<c015e3c0>] (ubi_leb_read+0xe8/0x138)
> [ 11.181237] [<c015e2d8>] (ubi_leb_read+0x0/0x138) from [<c00d6918>]
> (ubifs_start_scan+0x7c/0xf4)
> [ 11.190010] r7:c79f3000 r6:00000000 r5:c798b8e0 r4:00000000
> [ 11.195654] [<c00d689c>] (ubifs_start_scan+0x0/0xf4) from
> [<c00e3650>] (ubifs_recover_leb+0x3c/0x730)
> [ 11.204861] r8:00000003 r7:c79f3000 r6:c798b8e0 r5:c79f3000 r4:c8d01000
> [ 11.211547] [<c00e3614>] (ubifs_recover_leb+0x0/0x730) from
> [<c00e444c>] (ubifs_recover_log_leb+0xc8/0x2dc)
> [ 11.221274] [<c00e4384>] (ubifs_recover_log_leb+0x0/0x2dc) from

Yeah, I was right, it is a log LEB.
> [<c00d7c20>] (ubifs_replay_journal+0xb90/0x13a4)
> [ 11.231435] [<c00d7090>] (ubifs_replay_journal+0x0/0x13a4) from
> [<c00cdd68>] (ubifs_fill_super+0xb84/0x1054)
> [ 11.241249] [<c00cd1e4>] (ubifs_fill_super+0x0/0x1054) from
> [<c00ced04>] (ubifs_get_sb+0xc4/0x2ac)
> [ 11.250194] [<c00cec40>] (ubifs_get_sb+0x0/0x2ac) from [<c007f04c>]
> (vfs_kern_mount+0x58/0x94)
> [ 11.258793] [<c007eff4>] (vfs_kern_mount+0x0/0x94) from [<c007f0e8>]
> (do_kern_mount+0x40/0xe8)
> [ 11.267390] r8:c79fa000 r7:c02253ec r6:00000000 r5:c78fb000 r4:00000000
> [ 11.274077] [<c007f0a8>] (do_kern_mount+0x0/0xe8) from [<c0095628>]
> (do_new_mount+0x68/0x8c)
> [ 11.282501] r8:00000000 r7:0000000a r6:c784bef0 r5:00000000 r4:c79fa000
> [ 11.289188] [<c00955c0>] (do_new_mount+0x0/0x8c) from [<c00957a8>]
> (do_mount+0x15c/0x1b8)
> [ 11.297352] r7:c79fa000 r6:c78fb000 r5:c78d9000 r4:00000404
> [ 11.302997] [<c009564c>] (do_mount+0x0/0x1b8) from [<c0095890>]
> (sys_mount+0x8c/0xd4)
> [ 11.310813] [<c0095804>] (sys_mount+0x0/0xd4) from [<c0023c00>]
> (ret_fast_syscall+0x0/0x2c)
> [ 11.319151] r7:00000015 r6:00008840 r5:00000000 r4:00000000
> [ 11.324794] Code: 089da800 e59c1004 e59c2000 e59f3018 (e5812000)
> [ 11.330924] Kernel panic - not syncing: Fatal exception

The patch at the end should fix this oops.

With this patch you would not have an oops. Instead,
'ubi_wl_scrub_peb()' would just ignore the PEB and return. Then
'ubifs_recover_leb()' would finish the scanning, collect all good
nodes, and then refresh this LEB (!!!) using 'ubi_leb_change()' (see
ubifs_recover_leb() -> fix_unclean_leb() -> ubi_leb_change()).

So the result would be that this faulty PEB would be scheduled for
erasure, i.e., exactly what you want.

Here is the patch. I only compile-tested it, but it looks correct and
obvious to me, and I'd even send it to Linus for inclusion in 2.6.36 if
you test it or approve it.
From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Subject: [PATCH] UBIFS: do not oops when erroneous PEB is scheduled for scrubbing

When an erroneous PEB is scheduled for scrubbing, we end up with the
following oops:

[<c0162404>] (prot_queue_del+0x0/0x50) from [<c01635b4>] (ubi_wl_scrub_peb+0xec/0x13c)
[<c01634c8>] (ubi_wl_scrub_peb+0x0/0x13c) from [<c01603bc>] (ubi_eba_read_leb+0x200/0x428)
[<c01601bc>] (ubi_eba_read_leb+0x0/0x428) from [<c015e3c0>] (ubi_leb_read+0xe8/0x138)
[<c015e2d8>] (ubi_leb_read+0x0/0x138) from [<c00d6918>] (ubifs_start_scan+0x7c/0xf4)
[<c00d689c>] (ubifs_start_scan+0x0/0xf4) from [<c00e3650>] (ubifs_recover_leb+0x3c/0x730)
[<c00e3614>] (ubifs_recover_leb+0x0/0x730) from [<c00e444c>] (ubifs_recover_log_leb+0xc8/0x2dc)
[<c00e4384>] (ubifs_recover_log_leb+0x0/0x2dc) from [<c00d7c20>] (ubifs_replay_journal+0xb90/0x13a4)
[<c00d7090>] (ubifs_replay_journal+0x0/0x13a4) from [<c00cdd68>] (ubifs_fill_super+0xb84/0x1054)
[<c00cd1e4>] (ubifs_fill_super+0x0/0x1054) from [<c00ced04>] (ubifs_get_sb+0xc4/0x2ac)
[<c00cec40>] (ubifs_get_sb+0x0/0x2ac) from [<c007f04c>] (vfs_kern_mount+0x58/0x94)
[<c007eff4>] (vfs_kern_mount+0x0/0x94) from [<c007f0e8>] (do_kern_mount+0x40/0xe8)
[<c007f0a8>] (do_kern_mount+0x0/0xe8) from [<c0095628>] (do_new_mount+0x68/0x8c)
[<c00955c0>] (do_new_mount+0x0/0x8c) from [<c00957a8>] (do_mount+0x15c/0x1b8)
[<c009564c>] (do_mount+0x0/0x1b8) from [<c0095890>] (sys_mount+0x8c/0xd4)
[<c0095804>] (sys_mount+0x0/0xd4) from [<c0023c00>] (ret_fast_syscall+0x0/0x2c)
Kernel panic - not syncing: Fatal exception

The problem is that 'ubi_wl_scrub_peb()' does not expect that PEBs may be in
the erroneous tree, which is a bug. This patch fixes the bug and adds the
corresponding check to 'ubi_wl_scrub_peb()'. Now it will simply ignore
erroneous PEBs instead of causing an oops.

Reported-by: Matthieu CASTET <matthieu.castet@parrot.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
---
 drivers/mtd/ubi/wl.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/mtd/ubi/wl.c b/drivers/mtd/ubi/wl.c
index ee7b1d8..97a4356 100644
--- a/drivers/mtd/ubi/wl.c
+++ b/drivers/mtd/ubi/wl.c
@@ -1212,7 +1212,8 @@ int ubi_wl_scrub_peb(struct ubi_device *ubi, int pnum)
 retry:
 	spin_lock(&ubi->wl_lock);
 	e = ubi->lookuptbl[pnum];
-	if (e == ubi->move_from || in_wl_tree(e, &ubi->scrub)) {
+	if (e == ubi->move_from || in_wl_tree(e, &ubi->scrub) ||
+				   in_wl_tree(e, &ubi->erroneous)) {
 		spin_unlock(&ubi->wl_lock);
 		return 0;
 	}
-- 
1.7.2.2

-- 
Best Regards,
Artem Bityutskiy (Битюцкий Артём)
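
For readers who want to see the effect of the added check in isolation, here is a minimal user-space sketch in C. It is not UBI code: the 'toy_' names, the per-PEB state enum and the fixed PEB count are assumptions made up for illustration (real UBI tracks wear-leveling entries in RB-trees and a protection queue under 'wl_lock'). It only models the decision the patch changes: a scrub request for a PEB that is being moved, is already queued for scrubbing, or is in the erroneous set is silently ignored.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PEB_COUNT 256  /* matches the 256 PEBs in the log, otherwise arbitrary */

/* Hypothetical per-PEB state; real UBI keeps entries in RB-trees instead. */
enum toy_peb_state {
	TOY_PEB_USED,       /* in the used tree or the protection queue */
	TOY_PEB_SCRUB,      /* already scheduled for scrubbing */
	TOY_PEB_ERRONEOUS,  /* an earlier move of this PEB hit a read error */
	TOY_PEB_MOVING      /* currently being moved (the 'move_from' case) */
};

struct toy_ubi {
	enum toy_peb_state state[TOY_PEB_COUNT];
	int scrub_requests;  /* number of PEBs actually queued for scrubbing */
};

/*
 * Toy counterpart of 'ubi_wl_scrub_peb()': returns true if the PEB was newly
 * queued for scrubbing, false if the request was ignored.
 */
static bool toy_scrub_peb(struct toy_ubi *ubi, int pnum)
{
	assert(pnum >= 0 && pnum < TOY_PEB_COUNT);

	switch (ubi->state[pnum]) {
	case TOY_PEB_MOVING:
	case TOY_PEB_SCRUB:
	case TOY_PEB_ERRONEOUS:  /* the case the patch adds */
		return false;
	case TOY_PEB_USED:
		ubi->state[pnum] = TOY_PEB_SCRUB;
		ubi->scrub_requests++;
		return true;
	}
	return false;
}
```

With a zero-initialised 'struct toy_ubi', marking PEB 189 as TOY_PEB_ERRONEOUS and then calling toy_scrub_peb(&ubi, 189) returns false and leaves the state untouched, which mirrors the early 'return 0' in the patched function.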
* Re: ubi : kernel panic on erroneous block
  2010-08-29 20:46 ` Artem Bityutskiy
@ 2010-08-29 21:00   ` Artem Bityutskiy
  0 siblings, 0 replies; 5+ messages in thread
From: Artem Bityutskiy @ 2010-08-29 21:00 UTC
  To: Matthieu CASTET; +Cc: Artem Bityutskiy, linux-mtd@lists.infradead.org

On Sun, 2010-08-29 at 23:46 +0300, Artem Bityutskiy wrote:
> Because of inlining, your stack-dump lacks some function calls. I think
> it was: ubifs_replay_journal() -> replay_log_leb() -> ubifs_scan()
> -> ... Then we had -EBADMSG, and returned back to 'replay_log_leb()'.
> Then we call 'ubifs_recover_log_leb()', which scans the LEB again.
>
> Yes, this is sub-optimal to scan twice, but this is how the thing is
> implemented. I can try to fix it later - but this is not so important
> now, let's first deal with your issues. But please, feel free to bug me
> later and remind about this.

An additional idea: we can strengthen 'ubi_io_read()' and make it re-try
several times if there was -EBADMSG. It already retries for the
'read != len' case, but probably we should make it retry in case of any
error.

But this is not a fix, just an improvement. Let's do this at the end,
when we have addressed your issues, because otherwise it will be more
difficult to reproduce. So, let's postpone this, but please, bug me and
remind about this.

> From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
> Subject: [PATCH] UBIFS: do not oops when erroneous PEB is scheduled for scrubbing

And of course the prefix should be "UBI:". I've pushed this patch to the
UBI tree as well.

-- 
Best Regards,
Artem Bityutskiy (Битюцкий Артём)
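
The retry idea from this message can be sketched as a small wrapper in plain C. This is only an illustration under stated assumptions, not the way 'ubi_io_read()' is or was implemented: the 'toy_' names, the callback type and the retry budget of 3 are all hypothetical, and the sketch retries only on -EBADMSG, whereas the message also floats retrying on any error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_IO_RETRIES 3  /* assumed retry budget, not taken from UBI */

/* Hypothetical low-level read: returns 0 or a negative errno value. */
typedef int (*toy_read_fn)(void *ctx, void *buf, size_t len);

/*
 * Call 'read_once' up to TOY_IO_RETRIES times, retrying only while it
 * reports -EBADMSG, and return the result of the last attempt.
 */
static int toy_read_retry(toy_read_fn read_once, void *ctx,
			  void *buf, size_t len)
{
	int err = -EBADMSG;

	assert(read_once != NULL);
	for (int attempt = 0; attempt < TOY_IO_RETRIES; attempt++) {
		err = read_once(ctx, buf, len);
		if (err != -EBADMSG)
			break;
	}
	return err;
}

/* Fake flash for the sketch: fails 'failures_left' times, then succeeds. */
struct toy_flaky_flash {
	int failures_left;
	int calls;
};

static int toy_flaky_read(void *ctx, void *buf, size_t len)
{
	struct toy_flaky_flash *flash = ctx;

	(void)buf;
	(void)len;
	flash->calls++;
	if (flash->failures_left > 0) {
		flash->failures_left--;
		return -EBADMSG;
	}
	return 0;
}
```

A block that fails twice and then reads cleanly is recovered on the third attempt, while a block that keeps failing still surfaces -EBADMSG to the caller after the budget is spent. As noted in the message, such retries could make an unstable block harder to reproduce in testing.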

Thread overview: 5+ messages
2010-08-10  9:56 ubi : kernel panic on erroneous block Matthieu CASTET
2010-08-10 11:42 ` Artem Bityutskiy
2010-08-23 13:30   ` Matthieu CASTET
2010-08-29 20:46 ` Artem Bityutskiy
2010-08-29 21:00   ` Artem Bityutskiy