* UBIFS corruption after power cut - possibly unstable bits issue?
@ 2015-10-26 19:37 Tim Harvey
2015-10-26 20:01 ` Richard Weinberger
0 siblings, 1 reply; 18+ messages in thread
From: Tim Harvey @ 2015-10-26 19:37 UTC (permalink / raw)
To: Artem Bityutskiy, Adrian Hunter, linux-mtd, Richard Weinberger
Greetings,
I'm looking into a re-producible ubifs corruption issue on an IMX6
based board with a 16Gb MT29F2G08 SLC NAND flash. During boot
reliability tests we can recreate perhaps 1 in 10000 cases of a UBIFS
corruption that results in a permanent failure to mount the rootfs.
In this scenario the U-Boot bootloader first mounts the ubifs to load
the kernel/dtb from the /boot directory, then the kernel is mounting
the rootfs read-only, then later its mounted read/write via /etc/fstab
but is never written to directly by our software. Our software runs
various tests for several minutes then power-cuts the board.
After corruption, U-Boot fails reading the superblock but if I load a
kernel/dtb from the net and boot it I see some more useful info from
the kernel:
[ 0.000000] Linux version 4.2.0-00012-gd722238 (tharvey@tharvey)
(gcc version 4.8.3 (OpenWrt/Linaro GCC 4.8-2014.04 r43990) ) #189 SMP
Fri Oct 23 16:24:27 PDT 2015
...
[ 2.375663] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xd5
[ 2.382075] nand: Micron MT29F16G08ADACAH4
[ 2.386190] nand: 2048 MiB, SLC, erase size: 256 KiB, page size:
4096, OOB size: 224
[ 2.394821] gpmi-nand 112000.gpmi-nand: enable the asynchronous EDO mode 5
[ 2.401760] Scanning device for bad blocks
[ 4.000436] 3 ofpart partitions found on MTD device gpmi-nand
[ 4.006201] Creating 3 MTD partitions on "gpmi-nand":
[ 4.011298] 0x000000000000-0x000001000000 : "uboot"
[ 4.021174] 0x000001000000-0x000001100000 : "env"
[ 4.028287] 0x000001100000-0x000080000000 : "rootfs"
[ 4.037134] gpmi-nand 112000.gpmi-nand: driver registered.
...
[ 8.678875] ubi0: scanning is finished
[ 8.706815] ubi0: attached mtd2 (name "rootfs", size 2031 MiB)
[ 8.712710] ubi0: PEB size: 262144 bytes (256 KiB), LEB size: 253952 bytes
[ 8.719598] ubi0: min./max. I/O unit sizes: 4096/4096, sub-page size 4096
[ 8.726415] ubi0: VID header offset: 4096 (aligned 4096), data offset: 8192
[ 8.733398] ubi0: good PEBs: 8124, bad PEBs: 0, corrupted PEBs: 0
[ 8.739501] ubi0: user volume: 1, internal volumes: 1, max. volumes
count: 128
[ 8.746748] ubi0: max/mean erase counter: 5/2, WL threshold: 4096,
image sequence number: 36433972
[ 8.755747] ubi0: available PEBs: 0, total reserved PEBs: 8124,
PEBs reserved for bad PEB handling: 160
[ 8.765357] ubi0: background thread "ubi_bgt0d" started, PID 79
...
[ 8.888732] UBIFS (ubi0:0): recovery needed
[ 8.916292] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 36
864 bytes from PEB 5195:225280, read only 36864 bytes, retry
[ 8.931116] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 36
864 bytes from PEB 5195:225280, read only 36864 bytes, retry
[ 8.945869] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 36
864 bytes from PEB 5195:225280, read only 36864 bytes, retry
[ 8.960676] ubi0 error: ubi_io_read: error -74 (ECC error) while reading 3686
4 bytes from PEB 5195:225280, read 36864 bytes
[ 8.971859] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.2.0-00012-gd722238 #1
89
[ 8.979177] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[ 8.985747] Backtrace:
[ 8.988248] [<80013468>] (dump_backtrace) from [<80013654>] (show_stack+0x18/
0x1c)
[ 8.995848] r6:80abf960 r5:80abf960 r4:00000000 r3:00000000
[ 9.001649] [<8001363c>] (show_stack) from [<80798948>]
(dump_stack+0x8c/0xa4)
[ 9.008894] [<807988bc>] (dump_stack) from [<8045d410>]
(ubi_io_read+0x128/0x304)
[ 9.016428] r6:00000000 r5:00009000 r4:ffffffb6 r3:ee070000
[ 9.022201] [<8045d2e8>] (ubi_io_read) from [<8045a650>]
(ubi_eba_read_leb+0x18c/0x430)
[ 9.030210] r10:00000000 r9:f069e000 r8:0000144b r7:00000179
r6:00000000 r5:edab6000
[ 9.038189] r4:edb10000
[ 9.040783] [<8045a4c4>] (ubi_eba_read_leb) from [<80459a08>]
(ubi_leb_read+0x7c/0xd4)
[ 9.048724] r10:00000000 r9:00000179 r8:f069e000 r7:edb10000
r6:00035000 r5:00009000
[ 9.056677] r4:edab6000
[ 9.059254] [<8045998c>] (ubi_leb_read) from [<80253c2c>]
(ubifs_leb_read+0x30/0x9c)
[ 9.067044] r10:ee034000 r9:edb55e40 r8:00000179 r7:00035000
r6:00009000 r5:ee034000
[ 9.074996] r4:edb55e40
[ 9.077563] [<80253bfc>] (ubifs_leb_read) from [<8025bfe0>]
(ubifs_start_scan+0x74/0xf0)
[ 9.085697] r8:f0669000 r7:00000179 r6:ee034000 r5:00035000 r4:edb55e40
[ 9.092575] [<8025bf6c>] (ubifs_start_scan) from [<8026e624>]
(ubifs_recover_leb+0x64/0x810)
[ 9.101051] r8:00009000 r7:00000179 r6:ee034000 r5:edb55cc0 r4:00000179
[ 9.107850] [<8026e5c0>] (ubifs_recover_leb) from [<8025d890>]
(ubifs_replay_journal+0xed0/0x16a0)
[ 9.116846] r10:ee034000 r9:edb55e40 r8:edb55e48 r7:ee034000
r6:edb55cc0 r5:edb55cc0
[ 9.124782] r4:00000179
[ 9.127344] [<8025c9c0>] (ubifs_replay_journal) from [<802515ec>]
(ubifs_mount+0x1234/0x19fc)
[ 9.135906] r10:00000001 r9:ee034008 r8:edb4a300 r7:eda92800
r6:ee034b38 r5:ee034000
[ 9.143842] r4:00000000
[ 9.146411] [<802503b8>] (ubifs_mount) from [<80103c6c>] (mount_fs+0x1c/0xb0)
[ 9.153578] r10:00008001 r9:00000000 r8:80abc2e0 r7:801234a8
r6:80abc2e0 r5:edb4a280
[ 9.161514] r4:eda61400
[ 9.164084] [<80103c50>] (mount_fs) from [<8011fedc>]
(vfs_kern_mount+0x5c/0x144)
[ 9.171602] r6:00008001 r5:edb4a280 r4:eda61400
[ 9.176282] [<8011fe80>] (vfs_kern_mount) from [<801234a8>]
(do_mount+0x19c/0xd0c)
[ 9.183886] r9:80abc2e0 r8:80ab0bdc r7:edb4a240 r6:edb4a280
r5:00000060 r4:00000000
[ 9.191743] [<8012330c>] (do_mount) from [<801243ac>] (SyS_mount+0x7c/0xa8)
[ 9.198721] r10:000000d5 r9:80a8d9c8 r8:8094cfa0 r7:00008001
r6:00000000 r5:edb4a280
[ 9.206660] r4:edb4a240
[ 9.209227] [<80124330>] (SyS_mount) from [<80a3a248>]
(mount_block_root+0x114/0x240)
[ 9.217091] r8:ef5af1c0 r7:80a8d9b8 r6:edb8e000 r5:edb8e000 r4:00008001
[ 9.223907] [<80a3a134>] (mount_block_root) from [<80a3a634>]
(prepare_namespace+0x98/0x1d0)
[ 9.232380] r10:000000d5 r9:80a8d998 r8:80a9b5c0 r7:80a8d990
r6:80a8d9b8 r5:80a8d9b8
[ 9.240297] r4:80a8d9c8
[ 9.242877] [<80a3a59c>] (prepare_namespace) from [<80a39ee4>]
(kernel_init_freeable+0x1a4/0x1f0)
[ 9.251780] r6:80b09000 r5:80b09000 r4:00000007
[ 9.256467] [<80a39d40>] (kernel_init_freeable) from [<80792cdc>]
(kernel_init+0x10/0xf4)
[ 9.264667] r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:
80792ccc
[ 9.272614] r4:00000000
[ 9.275177] [<80792ccc>] (kernel_init) from [<8000f938>]
(ret_from_fork+0x14/0x3c)
[ 9.282782] r4:00000000 r3:00000000
[ 9.286909] UBIFS error (ubi0:0 pid 1): ubifs_recover_leb: corrupt
empty space LEB 377:217088, corruption starts at 24762
[ 9.297900] UBIFS error (ubi0:0 pid 1): ubifs_scanned_corruption:
corruption at LEB 377:241850
[ 9.306536] UBIFS error (ubi0:0 pid 1): ubifs_scanned_corruption:
first 8192 bytes from LEB 377:241850
[ 9.315870] 00000000: ffffffef ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff ................................
[ 9.327374] 00000020: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff ................................
[ 9.338883] 00000040: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff ................................
[ 9.350389] 00000060: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff ................................
...
[ 9.352755] 00001fc0: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff f
fffffff ffffffff ................................
[ 9.352765] 00001fe0: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff f
fffffff ffffffff ................................
[ 9.352770] UBIFS error (ubi0:0 pid 1): ubifs_recover_leb: LEB 377 scanning f
ailed
[ 12.271485] VFS: Cannot open root device "ubi0:rootfs" or unknown-block(0,0):
error -117
Thus far we have encountered this with a 16Gb MT29F16G08 and 'not'
with a 2Gb MT29F2G08. The two parts have different geometries and the
16Gb part has a much larger block erase time (2ms) compared to the 2Gb
(700us).
We have also seen several occurrences of errors on both NAND chips
that cause a failure to mount rootfs but are not permanent
(power-cycle again and all is well):
[17:39:02][ 2.666800] ubi0: attaching mtd2
[17:39:05][ 5.388606] ubi0: scanning is finished
[17:39:05][ 5.395747] ubi0 error: vtbl_check: bad CRC at record 72:
0xf116c36b, not 0x000000
[17:39:05][ 5.403330] Volume table record 72 dump:
[17:39:05][ 5.407279] reserved_pebs 0
[17:39:05][ 5.410342] alignment 0
[17:39:05][ 5.413402] data_pad 0
[17:39:05][ 5.416481] vol_type 0
[17:39:05][ 5.419543] upd_marker 0
[17:39:05][ 5.422603] name_len 0
[17:39:05][ 5.427696] name NULL
[17:39:05][ 5.433237] ubi0 error: vtbl_check: bad CRC at record 72:
0xf116c36b, not 0x000000
[17:39:05][ 5.440846] Volume table record 72 dump:
[17:39:05][ 5.446806] reserved_pebs 0
[17:39:05][ 5.451881] alignment 0
[17:39:05][ 5.456967] data_pad 0
[17:39:05][ 5.462042] vol_type 0
[17:39:05][ 5.467137] upd_marker 0
[17:39:05][ 5.472216] name_len 0
[17:39:05][ 5.475304] name NULL
[17:39:05][ 5.478629] ubi0 error: process_lvol: both volume tables
are corrupted
[17:39:05][ 5.492599] ubi0 error: ubi_attach_mtd_dev: failed to
attach mtd2, error -22
[17:39:05][ 5.499728] UBI error: cannot attach mtd2
Are any/all of the above the 'unstable bits' issue? Any
recommendations on what to look for next?
Regards,
Tim
Tim Harvey - Principal Software Engineer
Gateworks Corporation - http://www.gateworks.com/
3026 S. Higuera St. San Luis Obispo CA 93401
805-781-2000
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: UBIFS corruption after power cut - possibly unstable bits issue?
2015-10-26 19:37 UBIFS corruption after power cut - possibly unstable bits issue? Tim Harvey
@ 2015-10-26 20:01 ` Richard Weinberger
2015-10-26 20:31 ` Tim Harvey
2015-11-03 10:06 ` Michal Suchanek
0 siblings, 2 replies; 18+ messages in thread
From: Richard Weinberger @ 2015-10-26 20:01 UTC (permalink / raw)
To: Tim Harvey, Artem Bityutskiy, Adrian Hunter, linux-mtd
Tim,
Am 26.10.2015 um 20:37 schrieb Tim Harvey:
> Greetings,
>
> I'm looking into a re-producible ubifs corruption issue on an IMX6
> based board with a 16Gb MT29F2G08 SLC NAND flash. During boot
> reliability tests we can recreate perhaps 1 in 10000 cases of a UBIFS
> corruption that results in a permanent failure to mount the rootfs.
>
> In this scenario the U-Boot bootloader first mounts the ubifs to load
> the kernel/dtb from the /boot directory, then the kernel is mounting
> the rootfs read-only, then later its mounted read/write via /etc/fstab
> but is never written to directly by our software. Our software runs
> various tests for several minutes then power-cuts the board.
>
> After corruption, U-Boot fails reading the superblock but if I load a
> kernel/dtb from the net and boot it I see some more useful info from
> the kernel:
>
> [ 0.000000] Linux version 4.2.0-00012-gd722238 (tharvey@tharvey)
> (gcc version 4.8.3 (OpenWrt/Linaro GCC 4.8-2014.04 r43990) ) #189 SMP
> Fri Oct 23 16:24:27 PDT 2015
> ...
> [ 2.375663] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xd5
> [ 2.382075] nand: Micron MT29F16G08ADACAH4
> [ 2.386190] nand: 2048 MiB, SLC, erase size: 256 KiB, page size:
> 4096, OOB size: 224
> [ 2.394821] gpmi-nand 112000.gpmi-nand: enable the asynchronous EDO mode 5
> [ 2.401760] Scanning device for bad blocks
> [ 4.000436] 3 ofpart partitions found on MTD device gpmi-nand
> [ 4.006201] Creating 3 MTD partitions on "gpmi-nand":
> [ 4.011298] 0x000000000000-0x000001000000 : "uboot"
> [ 4.021174] 0x000001000000-0x000001100000 : "env"
> [ 4.028287] 0x000001100000-0x000080000000 : "rootfs"
> [ 4.037134] gpmi-nand 112000.gpmi-nand: driver registered.
> ...
> [ 8.678875] ubi0: scanning is finished
> [ 8.706815] ubi0: attached mtd2 (name "rootfs", size 2031 MiB)
> [ 8.712710] ubi0: PEB size: 262144 bytes (256 KiB), LEB size: 253952 bytes
> [ 8.719598] ubi0: min./max. I/O unit sizes: 4096/4096, sub-page size 4096
> [ 8.726415] ubi0: VID header offset: 4096 (aligned 4096), data offset: 8192
> [ 8.733398] ubi0: good PEBs: 8124, bad PEBs: 0, corrupted PEBs: 0
> [ 8.739501] ubi0: user volume: 1, internal volumes: 1, max. volumes
> count: 128
> [ 8.746748] ubi0: max/mean erase counter: 5/2, WL threshold: 4096,
> image sequence number: 36433972
> [ 8.755747] ubi0: available PEBs: 0, total reserved PEBs: 8124,
> PEBs reserved for bad PEB handling: 160
> [ 8.765357] ubi0: background thread "ubi_bgt0d" started, PID 79
> ...
> [ 8.888732] UBIFS (ubi0:0): recovery needed
> [ 8.916292] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 36
> 864 bytes from PEB 5195:225280, read only 36864 bytes, retry
> [ 8.931116] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 36
> 864 bytes from PEB 5195:225280, read only 36864 bytes, retry
> [ 8.945869] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 36
> 864 bytes from PEB 5195:225280, read only 36864 bytes, retry
> [ 8.960676] ubi0 error: ubi_io_read: error -74 (ECC error) while reading 3686
> 4 bytes from PEB 5195:225280, read 36864 bytes
> [ 8.971859] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.2.0-00012-gd722238 #1
> 89
> [ 8.979177] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> [ 8.985747] Backtrace:
> [ 8.988248] [<80013468>] (dump_backtrace) from [<80013654>] (show_stack+0x18/
> 0x1c)
> [ 8.995848] r6:80abf960 r5:80abf960 r4:00000000 r3:00000000
> [ 9.001649] [<8001363c>] (show_stack) from [<80798948>]
> (dump_stack+0x8c/0xa4)
> [ 9.008894] [<807988bc>] (dump_stack) from [<8045d410>]
> (ubi_io_read+0x128/0x304)
> [ 9.016428] r6:00000000 r5:00009000 r4:ffffffb6 r3:ee070000
> [ 9.022201] [<8045d2e8>] (ubi_io_read) from [<8045a650>]
> (ubi_eba_read_leb+0x18c/0x430)
> [ 9.030210] r10:00000000 r9:f069e000 r8:0000144b r7:00000179
> r6:00000000 r5:edab6000
> [ 9.038189] r4:edb10000
> [ 9.040783] [<8045a4c4>] (ubi_eba_read_leb) from [<80459a08>]
> (ubi_leb_read+0x7c/0xd4)
> [ 9.048724] r10:00000000 r9:00000179 r8:f069e000 r7:edb10000
> r6:00035000 r5:00009000
> [ 9.056677] r4:edab6000
> [ 9.059254] [<8045998c>] (ubi_leb_read) from [<80253c2c>]
> (ubifs_leb_read+0x30/0x9c)
> [ 9.067044] r10:ee034000 r9:edb55e40 r8:00000179 r7:00035000
> r6:00009000 r5:ee034000
> [ 9.074996] r4:edb55e40
> [ 9.077563] [<80253bfc>] (ubifs_leb_read) from [<8025bfe0>]
> (ubifs_start_scan+0x74/0xf0)
> [ 9.085697] r8:f0669000 r7:00000179 r6:ee034000 r5:00035000 r4:edb55e40
> [ 9.092575] [<8025bf6c>] (ubifs_start_scan) from [<8026e624>]
> (ubifs_recover_leb+0x64/0x810)
> [ 9.101051] r8:00009000 r7:00000179 r6:ee034000 r5:edb55cc0 r4:00000179
> [ 9.107850] [<8026e5c0>] (ubifs_recover_leb) from [<8025d890>]
> (ubifs_replay_journal+0xed0/0x16a0)
> [ 9.116846] r10:ee034000 r9:edb55e40 r8:edb55e48 r7:ee034000
> r6:edb55cc0 r5:edb55cc0
> [ 9.124782] r4:00000179
> [ 9.127344] [<8025c9c0>] (ubifs_replay_journal) from [<802515ec>]
> (ubifs_mount+0x1234/0x19fc)
> [ 9.135906] r10:00000001 r9:ee034008 r8:edb4a300 r7:eda92800
> r6:ee034b38 r5:ee034000
> [ 9.143842] r4:00000000
> [ 9.146411] [<802503b8>] (ubifs_mount) from [<80103c6c>] (mount_fs+0x1c/0xb0)
> [ 9.153578] r10:00008001 r9:00000000 r8:80abc2e0 r7:801234a8
> r6:80abc2e0 r5:edb4a280
> [ 9.161514] r4:eda61400
> [ 9.164084] [<80103c50>] (mount_fs) from [<8011fedc>]
> (vfs_kern_mount+0x5c/0x144)
> [ 9.171602] r6:00008001 r5:edb4a280 r4:eda61400
> [ 9.176282] [<8011fe80>] (vfs_kern_mount) from [<801234a8>]
> (do_mount+0x19c/0xd0c)
> [ 9.183886] r9:80abc2e0 r8:80ab0bdc r7:edb4a240 r6:edb4a280
> r5:00000060 r4:00000000
> [ 9.191743] [<8012330c>] (do_mount) from [<801243ac>] (SyS_mount+0x7c/0xa8)
> [ 9.198721] r10:000000d5 r9:80a8d9c8 r8:8094cfa0 r7:00008001
> r6:00000000 r5:edb4a280
> [ 9.206660] r4:edb4a240
> [ 9.209227] [<80124330>] (SyS_mount) from [<80a3a248>]
> (mount_block_root+0x114/0x240)
> [ 9.217091] r8:ef5af1c0 r7:80a8d9b8 r6:edb8e000 r5:edb8e000 r4:00008001
> [ 9.223907] [<80a3a134>] (mount_block_root) from [<80a3a634>]
> (prepare_namespace+0x98/0x1d0)
> [ 9.232380] r10:000000d5 r9:80a8d998 r8:80a9b5c0 r7:80a8d990
> r6:80a8d9b8 r5:80a8d9b8
> [ 9.240297] r4:80a8d9c8
> [ 9.242877] [<80a3a59c>] (prepare_namespace) from [<80a39ee4>]
> (kernel_init_freeable+0x1a4/0x1f0)
> [ 9.251780] r6:80b09000 r5:80b09000 r4:00000007
> [ 9.256467] [<80a39d40>] (kernel_init_freeable) from [<80792cdc>]
> (kernel_init+0x10/0xf4)
> [ 9.264667] r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:
> 80792ccc
> [ 9.272614] r4:00000000
> [ 9.275177] [<80792ccc>] (kernel_init) from [<8000f938>]
> (ret_from_fork+0x14/0x3c)
> [ 9.282782] r4:00000000 r3:00000000
> [ 9.286909] UBIFS error (ubi0:0 pid 1): ubifs_recover_leb: corrupt
> empty space LEB 377:217088, corruption starts at 24762
> [ 9.297900] UBIFS error (ubi0:0 pid 1): ubifs_scanned_corruption:
> corruption at LEB 377:241850
> [ 9.306536] UBIFS error (ubi0:0 pid 1): ubifs_scanned_corruption:
> first 8192 bytes from LEB 377:241850
> [ 9.315870] 00000000: ffffffef ffffffff ffffffff ffffffff ffffffff
> ffffffff ffffffff ffffffff ................................
> [ 9.327374] 00000020: ffffffff ffffffff ffffffff ffffffff ffffffff
> ffffffff ffffffff ffffffff ................................
> [ 9.338883] 00000040: ffffffff ffffffff ffffffff ffffffff ffffffff
> ffffffff ffffffff ffffffff ................................
> [ 9.350389] 00000060: ffffffff ffffffff ffffffff ffffffff ffffffff
> ffffffff ffffffff ffffffff ................................
> ...
> [ 9.352755] 00001fc0: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff f
> fffffff ffffffff ................................
> [ 9.352765] 00001fe0: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff f
> fffffff ffffffff ................................
> [ 9.352770] UBIFS error (ubi0:0 pid 1): ubifs_recover_leb: LEB 377 scanning f
> ailed
> [ 12.271485] VFS: Cannot open root device "ubi0:rootfs" or unknown-block(0,0):
> error -117
>
> Thus far we have encountered this with a 16Gb MT29F16G08 and 'not'
> with a 2Gb MT29F2G08. The two parts have different geometries and the
> 16Gb part has a much larger block erase time (2ms) compared to the 2Gb
> (700us).
gpmi-nand is not able to correct bit flips on erased pages.
This is why UBI is facing uncorrectable ECC errors and UBIFS gives up.
In March there was an attempt to fix that in software.
But no mainline ready solution was presented so far:
http://lists.infradead.org/pipermail/linux-mtd/2014-March/052521.html
It is not clear whether to implement this directly in gpmi-nand or MTD core.
Currently UBIFS assumes that empty spaces must contain only 0xff octets.
A naive approach would be removing that check from UBIFS, bit this can have
disastrous consequences as UBIFS's recovery algorithm relies on that.
> We have also seen several occurrences of errors on both NAND chips
> that cause a failure to mount rootfs but are not permanent
> (power-cycle again and all is well):
> [17:39:02][ 2.666800] ubi0: attaching mtd2
> [17:39:05][ 5.388606] ubi0: scanning is finished
> [17:39:05][ 5.395747] ubi0 error: vtbl_check: bad CRC at record 72:
> 0xf116c36b, not 0x000000
> [17:39:05][ 5.403330] Volume table record 72 dump:
> [17:39:05][ 5.407279] reserved_pebs 0
> [17:39:05][ 5.410342] alignment 0
> [17:39:05][ 5.413402] data_pad 0
> [17:39:05][ 5.416481] vol_type 0
> [17:39:05][ 5.419543] upd_marker 0
> [17:39:05][ 5.422603] name_len 0
> [17:39:05][ 5.427696] name NULL
> [17:39:05][ 5.433237] ubi0 error: vtbl_check: bad CRC at record 72:
> 0xf116c36b, not 0x000000
> [17:39:05][ 5.440846] Volume table record 72 dump:
> [17:39:05][ 5.446806] reserved_pebs 0
> [17:39:05][ 5.451881] alignment 0
> [17:39:05][ 5.456967] data_pad 0
> [17:39:05][ 5.462042] vol_type 0
> [17:39:05][ 5.467137] upd_marker 0
> [17:39:05][ 5.472216] name_len 0
> [17:39:05][ 5.475304] name NULL
> [17:39:05][ 5.478629] ubi0 error: process_lvol: both volume tables
> are corrupted
> [17:39:05][ 5.492599] ubi0 error: ubi_attach_mtd_dev: failed to
> attach mtd2, error -22
> [17:39:05][ 5.499728] UBI error: cannot attach mtd2
>
> Are any/all of the above the 'unstable bits' issue? Any
Not really. Why do you think so?
> recommendations on what to look for next?
How does the corruption look like in the second case?
Maybe we can identify a pattern. :-)
Thanks,
//richard
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: UBIFS corruption after power cut - possibly unstable bits issue?
2015-10-26 20:01 ` Richard Weinberger
@ 2015-10-26 20:31 ` Tim Harvey
2015-10-26 21:41 ` Richard Weinberger
2015-11-03 10:06 ` Michal Suchanek
1 sibling, 1 reply; 18+ messages in thread
From: Tim Harvey @ 2015-10-26 20:31 UTC (permalink / raw)
To: Richard Weinberger; +Cc: Artem Bityutskiy, Adrian Hunter, linux-mtd
On Mon, Oct 26, 2015 at 1:01 PM, Richard Weinberger <richard@nod.at> wrote:
> Tim,
>
> Am 26.10.2015 um 20:37 schrieb Tim Harvey:
>> Greetings,
>>
>> I'm looking into a re-producible ubifs corruption issue on an IMX6
>> based board with a 16Gb MT29F2G08 SLC NAND flash. During boot
>> reliability tests we can recreate perhaps 1 in 10000 cases of a UBIFS
>> corruption that results in a permanent failure to mount the rootfs.
>>
>> In this scenario the U-Boot bootloader first mounts the ubifs to load
>> the kernel/dtb from the /boot directory, then the kernel is mounting
>> the rootfs read-only, then later its mounted read/write via /etc/fstab
>> but is never written to directly by our software. Our software runs
>> various tests for several minutes then power-cuts the board.
>>
>> After corruption, U-Boot fails reading the superblock but if I load a
>> kernel/dtb from the net and boot it I see some more useful info from
>> the kernel:
>>
>> [ 0.000000] Linux version 4.2.0-00012-gd722238 (tharvey@tharvey)
>> (gcc version 4.8.3 (OpenWrt/Linaro GCC 4.8-2014.04 r43990) ) #189 SMP
>> Fri Oct 23 16:24:27 PDT 2015
>> ...
>> [ 2.375663] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xd5
>> [ 2.382075] nand: Micron MT29F16G08ADACAH4
>> [ 2.386190] nand: 2048 MiB, SLC, erase size: 256 KiB, page size:
>> 4096, OOB size: 224
>> [ 2.394821] gpmi-nand 112000.gpmi-nand: enable the asynchronous EDO mode 5
>> [ 2.401760] Scanning device for bad blocks
>> [ 4.000436] 3 ofpart partitions found on MTD device gpmi-nand
>> [ 4.006201] Creating 3 MTD partitions on "gpmi-nand":
>> [ 4.011298] 0x000000000000-0x000001000000 : "uboot"
>> [ 4.021174] 0x000001000000-0x000001100000 : "env"
>> [ 4.028287] 0x000001100000-0x000080000000 : "rootfs"
>> [ 4.037134] gpmi-nand 112000.gpmi-nand: driver registered.
>> ...
>> [ 8.678875] ubi0: scanning is finished
>> [ 8.706815] ubi0: attached mtd2 (name "rootfs", size 2031 MiB)
>> [ 8.712710] ubi0: PEB size: 262144 bytes (256 KiB), LEB size: 253952 bytes
>> [ 8.719598] ubi0: min./max. I/O unit sizes: 4096/4096, sub-page size 4096
>> [ 8.726415] ubi0: VID header offset: 4096 (aligned 4096), data offset: 8192
>> [ 8.733398] ubi0: good PEBs: 8124, bad PEBs: 0, corrupted PEBs: 0
>> [ 8.739501] ubi0: user volume: 1, internal volumes: 1, max. volumes
>> count: 128
>> [ 8.746748] ubi0: max/mean erase counter: 5/2, WL threshold: 4096,
>> image sequence number: 36433972
>> [ 8.755747] ubi0: available PEBs: 0, total reserved PEBs: 8124,
>> PEBs reserved for bad PEB handling: 160
>> [ 8.765357] ubi0: background thread "ubi_bgt0d" started, PID 79
>> ...
>> [ 8.888732] UBIFS (ubi0:0): recovery needed
>> [ 8.916292] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 36
>> 864 bytes from PEB 5195:225280, read only 36864 bytes, retry
>> [ 8.931116] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 36
>> 864 bytes from PEB 5195:225280, read only 36864 bytes, retry
>> [ 8.945869] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 36
>> 864 bytes from PEB 5195:225280, read only 36864 bytes, retry
>> [ 8.960676] ubi0 error: ubi_io_read: error -74 (ECC error) while reading 3686
>> 4 bytes from PEB 5195:225280, read 36864 bytes
>> [ 8.971859] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.2.0-00012-gd722238 #1
>> 89
>> [ 8.979177] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
>> [ 8.985747] Backtrace:
>> [ 8.988248] [<80013468>] (dump_backtrace) from [<80013654>] (show_stack+0x18/
>> 0x1c)
>> [ 8.995848] r6:80abf960 r5:80abf960 r4:00000000 r3:00000000
>> [ 9.001649] [<8001363c>] (show_stack) from [<80798948>]
>> (dump_stack+0x8c/0xa4)
>> [ 9.008894] [<807988bc>] (dump_stack) from [<8045d410>]
>> (ubi_io_read+0x128/0x304)
>> [ 9.016428] r6:00000000 r5:00009000 r4:ffffffb6 r3:ee070000
>> [ 9.022201] [<8045d2e8>] (ubi_io_read) from [<8045a650>]
>> (ubi_eba_read_leb+0x18c/0x430)
>> [ 9.030210] r10:00000000 r9:f069e000 r8:0000144b r7:00000179
>> r6:00000000 r5:edab6000
>> [ 9.038189] r4:edb10000
>> [ 9.040783] [<8045a4c4>] (ubi_eba_read_leb) from [<80459a08>]
>> (ubi_leb_read+0x7c/0xd4)
>> [ 9.048724] r10:00000000 r9:00000179 r8:f069e000 r7:edb10000
>> r6:00035000 r5:00009000
>> [ 9.056677] r4:edab6000
>> [ 9.059254] [<8045998c>] (ubi_leb_read) from [<80253c2c>]
>> (ubifs_leb_read+0x30/0x9c)
>> [ 9.067044] r10:ee034000 r9:edb55e40 r8:00000179 r7:00035000
>> r6:00009000 r5:ee034000
>> [ 9.074996] r4:edb55e40
>> [ 9.077563] [<80253bfc>] (ubifs_leb_read) from [<8025bfe0>]
>> (ubifs_start_scan+0x74/0xf0)
>> [ 9.085697] r8:f0669000 r7:00000179 r6:ee034000 r5:00035000 r4:edb55e40
>> [ 9.092575] [<8025bf6c>] (ubifs_start_scan) from [<8026e624>]
>> (ubifs_recover_leb+0x64/0x810)
>> [ 9.101051] r8:00009000 r7:00000179 r6:ee034000 r5:edb55cc0 r4:00000179
>> [ 9.107850] [<8026e5c0>] (ubifs_recover_leb) from [<8025d890>]
>> (ubifs_replay_journal+0xed0/0x16a0)
>> [ 9.116846] r10:ee034000 r9:edb55e40 r8:edb55e48 r7:ee034000
>> r6:edb55cc0 r5:edb55cc0
>> [ 9.124782] r4:00000179
>> [ 9.127344] [<8025c9c0>] (ubifs_replay_journal) from [<802515ec>]
>> (ubifs_mount+0x1234/0x19fc)
>> [ 9.135906] r10:00000001 r9:ee034008 r8:edb4a300 r7:eda92800
>> r6:ee034b38 r5:ee034000
>> [ 9.143842] r4:00000000
>> [ 9.146411] [<802503b8>] (ubifs_mount) from [<80103c6c>] (mount_fs+0x1c/0xb0)
>> [ 9.153578] r10:00008001 r9:00000000 r8:80abc2e0 r7:801234a8
>> r6:80abc2e0 r5:edb4a280
>> [ 9.161514] r4:eda61400
>> [ 9.164084] [<80103c50>] (mount_fs) from [<8011fedc>]
>> (vfs_kern_mount+0x5c/0x144)
>> [ 9.171602] r6:00008001 r5:edb4a280 r4:eda61400
>> [ 9.176282] [<8011fe80>] (vfs_kern_mount) from [<801234a8>]
>> (do_mount+0x19c/0xd0c)
>> [ 9.183886] r9:80abc2e0 r8:80ab0bdc r7:edb4a240 r6:edb4a280
>> r5:00000060 r4:00000000
>> [ 9.191743] [<8012330c>] (do_mount) from [<801243ac>] (SyS_mount+0x7c/0xa8)
>> [ 9.198721] r10:000000d5 r9:80a8d9c8 r8:8094cfa0 r7:00008001
>> r6:00000000 r5:edb4a280
>> [ 9.206660] r4:edb4a240
>> [ 9.209227] [<80124330>] (SyS_mount) from [<80a3a248>]
>> (mount_block_root+0x114/0x240)
>> [ 9.217091] r8:ef5af1c0 r7:80a8d9b8 r6:edb8e000 r5:edb8e000 r4:00008001
>> [ 9.223907] [<80a3a134>] (mount_block_root) from [<80a3a634>]
>> (prepare_namespace+0x98/0x1d0)
>> [ 9.232380] r10:000000d5 r9:80a8d998 r8:80a9b5c0 r7:80a8d990
>> r6:80a8d9b8 r5:80a8d9b8
>> [ 9.240297] r4:80a8d9c8
>> [ 9.242877] [<80a3a59c>] (prepare_namespace) from [<80a39ee4>]
>> (kernel_init_freeable+0x1a4/0x1f0)
>> [ 9.251780] r6:80b09000 r5:80b09000 r4:00000007
>> [ 9.256467] [<80a39d40>] (kernel_init_freeable) from [<80792cdc>]
>> (kernel_init+0x10/0xf4)
>> [ 9.264667] r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:
>> 80792ccc
>> [ 9.272614] r4:00000000
>> [ 9.275177] [<80792ccc>] (kernel_init) from [<8000f938>]
>> (ret_from_fork+0x14/0x3c)
>> [ 9.282782] r4:00000000 r3:00000000
>> [ 9.286909] UBIFS error (ubi0:0 pid 1): ubifs_recover_leb: corrupt
>> empty space LEB 377:217088, corruption starts at 24762
>> [ 9.297900] UBIFS error (ubi0:0 pid 1): ubifs_scanned_corruption:
>> corruption at LEB 377:241850
>> [ 9.306536] UBIFS error (ubi0:0 pid 1): ubifs_scanned_corruption:
>> first 8192 bytes from LEB 377:241850
>> [ 9.315870] 00000000: ffffffef ffffffff ffffffff ffffffff ffffffff
>> ffffffff ffffffff ffffffff ................................
>> [ 9.327374] 00000020: ffffffff ffffffff ffffffff ffffffff ffffffff
>> ffffffff ffffffff ffffffff ................................
>> [ 9.338883] 00000040: ffffffff ffffffff ffffffff ffffffff ffffffff
>> ffffffff ffffffff ffffffff ................................
>> [ 9.350389] 00000060: ffffffff ffffffff ffffffff ffffffff ffffffff
>> ffffffff ffffffff ffffffff ................................
>> ...
>> [ 9.352755] 00001fc0: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff f
>> fffffff ffffffff ................................
>> [ 9.352765] 00001fe0: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff f
>> fffffff ffffffff ................................
>> [ 9.352770] UBIFS error (ubi0:0 pid 1): ubifs_recover_leb: LEB 377 scanning f
>> ailed
>> [ 12.271485] VFS: Cannot open root device "ubi0:rootfs" or unknown-block(0,0):
>> error -117
>>
>> Thus far we have encountered this with a 16Gb MT29F16G08 and 'not'
>> with a 2Gb MT29F2G08. The two parts have different geometries and the
>> 16Gb part has a much larger block erase time (2ms) compared to the 2Gb
>> (700us).
>
> gpmi-nand is not able to correct bit flips on erased pages.
> This is why UBI is facing uncorrectable ECC errors and UBIFS gives up.
Hi Richard,
What would be causing the bit-flips on the erased pages? Could it have
something to do with the larger flash part having a much longer erase
time?
There shouldn't be anything writing to ubifs so I'm not clear what
caused this to occur. Note that even if I remove the /etc/fstab that
causes root to be re-mounted read/write I always see 'UBIFS (ubi0:0):
recovery needed' and I'm not understanding what causes that but it
makes me think that NAND is getting touched each and every boot by the
recovery process.
> In March there was an attempt to fix that in software.
> But no mainline ready solution was presented so far:
> http://lists.infradead.org/pipermail/linux-mtd/2014-March/052521.html
>
> It is not clear whether to implement this directly in gpmi-nand or MTD core.
> Currently UBIFS assumes that empty spaces must contain only 0xff octets.
> A naive approach would be removing that check from UBIFS, bit this can have
> disastrous consequences as UBIFS's recovery algorithm relies on that.
I think I ran across that approach right before the thread you pointed
me to: http://thread.gmane.org/gmane.linux.drivers.mtd/52208
>
>> We have also seen several occurrences of errors on both NAND chips
>> that cause a failure to mount rootfs but are not permanent
>> (power-cycle again and all is well):
>> [17:39:02][ 2.666800] ubi0: attaching mtd2
>> [17:39:05][ 5.388606] ubi0: scanning is finished
>> [17:39:05][ 5.395747] ubi0 error: vtbl_check: bad CRC at record 72:
>> 0xf116c36b, not 0x000000
>> [17:39:05][ 5.403330] Volume table record 72 dump:
>> [17:39:05][ 5.407279] reserved_pebs 0
>> [17:39:05][ 5.410342] alignment 0
>> [17:39:05][ 5.413402] data_pad 0
>> [17:39:05][ 5.416481] vol_type 0
>> [17:39:05][ 5.419543] upd_marker 0
>> [17:39:05][ 5.422603] name_len 0
>> [17:39:05][ 5.427696] name NULL
>> [17:39:05][ 5.433237] ubi0 error: vtbl_check: bad CRC at record 72:
>> 0xf116c36b, not 0x000000
>> [17:39:05][ 5.440846] Volume table record 72 dump:
>> [17:39:05][ 5.446806] reserved_pebs 0
>> [17:39:05][ 5.451881] alignment 0
>> [17:39:05][ 5.456967] data_pad 0
>> [17:39:05][ 5.462042] vol_type 0
>> [17:39:05][ 5.467137] upd_marker 0
>> [17:39:05][ 5.472216] name_len 0
>> [17:39:05][ 5.475304] name NULL
>> [17:39:05][ 5.478629] ubi0 error: process_lvol: both volume tables
>> are corrupted
>> [17:39:05][ 5.492599] ubi0 error: ubi_attach_mtd_dev: failed to
>> attach mtd2, error -22
>> [17:39:05][ 5.499728] UBI error: cannot attach mtd2
>>
>> Are any/all of the above the 'unstable bits' issue? Any
>
> Not really. Why do you think so?
No reason - it is just not clear to me how to know if you have run
into the 'unstable bits' issue.
>
>> recommendations on what to look for next?
>
> How does the corruption look like in the second case?
> Maybe we can identify a pattern. :-)
the second case is not permanent corruption - when the system is
power-cycled it comes up fine the next time.
Tim
>
> Thanks,
> //richard
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: UBIFS corruption after power cut - possibly unstable bits issue?
2015-10-26 20:31 ` Tim Harvey
@ 2015-10-26 21:41 ` Richard Weinberger
2015-10-27 19:01 ` Tim Harvey
0 siblings, 1 reply; 18+ messages in thread
From: Richard Weinberger @ 2015-10-26 21:41 UTC (permalink / raw)
To: Tim Harvey; +Cc: Artem Bityutskiy, Adrian Hunter, linux-mtd
Tim,
Am 26.10.2015 um 21:31 schrieb Tim Harvey:
> What would be causing the bit-flips on the erased pages? Could it have
> something to do with the larger flash part having a much longer erase
> time?
According to NAND manufactures can happen also on empty space.
At the time when UBIFS was designed this was not known nor was it observed.
These days it seems to happen on some large/cheap NANDs.
> There shouldn't be anything writing to ubifs so I'm not clear what
> caused this to occur. Note that even if I remove the /etc/fstab that
> causes root to be re-mounted read/write I always see 'UBIFS (ubi0:0):
> recovery needed' and I'm not understanding what causes that but it
> makes me think that NAND is getting touched each and every boot by the
> recovery process.
>
>> In March there was an attempt to fix that in software.
>> But no mainline ready solution was presented so far:
>> http://lists.infradead.org/pipermail/linux-mtd/2014-March/052521.html
>>
>> It is not clear whether to implement this directly in gpmi-nand or MTD core.
>> Currently UBIFS assumes that empty spaces must contain only 0xff octets.
>> A naive approach would be removing that check from UBIFS, bit this can have
>> disastrous consequences as UBIFS's recovery algorithm relies on that.
>
> I think I ran across that approach right before the thread you pointed
> me to: http://thread.gmane.org/gmane.linux.drivers.mtd/52208
:-)
> the second case is not permanent corruption - when the system is
> power-cycled it comes up fine the next time.
Yeah, but have you checked how the temporary corruption looks like?
Thanks,
//richard
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: UBIFS corruption after power cut - possibly unstable bits issue?
2015-10-26 21:41 ` Richard Weinberger
@ 2015-10-27 19:01 ` Tim Harvey
2015-10-27 19:52 ` Richard Weinberger
0 siblings, 1 reply; 18+ messages in thread
From: Tim Harvey @ 2015-10-27 19:01 UTC (permalink / raw)
To: Richard Weinberger; +Cc: Artem Bityutskiy, Adrian Hunter, linux-mtd
On Mon, Oct 26, 2015 at 2:41 PM, Richard Weinberger <richard@nod.at> wrote:
> Tim,
>
> Am 26.10.2015 um 21:31 schrieb Tim Harvey:
>> What would be causing the bit-flips on the erased pages? Could it have
>> something to do with the larger flash part having a much longer erase
>> time?
>
> According to NAND manufactures can happen also on empty space.
> At the time when UBIFS was designed this was not known nor was it observed.
> These days it seems to happen on some large/cheap NANDs.
>
Richard,
I'm not understanding what is making you say that the issue I
encountered is 'not' the unstable bits issue described at
http://www.linux-mtd.infradead.org/doc/ubifs.html#L_unstable_bits? My
understanding is that the 'unstable bit' issue refers to bits which
are truly unstable and can read either way each and every read due to
not getting properly erased/written.
If I understand what you are saying you are thinking that my issue is
instead the result of a never-used PEB that had bit-flips from the
manufacturer in which case the bits would read the same every time?
How can we know this PEB was never before used and isn't one that was
being erased/written during a power cut?
In my test scenario where the rootfs is mounted from the kernel
read-only, but later mounted read-write by userspace (yet not being
specifically written to by userspace) then power-cut should 'any' NAND
writes would be occurring at all? And if not as I suspect, then how
could a subsequent boot end up using a PEB that may have been never
previously used and have bit-flips from the manufacturer?
Should we be doing an erase block on every NAND block during our board
manufacturing process to avoid this?
It sounds like this 'unexpected bit-flips on erased pages from the
mfg' issue is a ticking time-bomb for people using ubi/ubifs NAND.
Shouldn't the http://www.linux-mtd.infradead.org/doc/ubifs.html page
be updated to refer to this known issue as well as the unstable bit
issue?
>> There shouldn't be anything writing to ubifs so I'm not clear what
>> caused this to occur. Note that even if I remove the /etc/fstab that
>> causes root to be re-mounted read/write I always see 'UBIFS (ubi0:0):
>> recovery needed' and I'm not understanding what causes that but it
>> makes me think that NAND is getting touched each and every boot by the
>> recovery process.
>>
>>> In March there was an attempt to fix that in software.
>>> But no mainline ready solution was presented so far:
>>> http://lists.infradead.org/pipermail/linux-mtd/2014-March/052521.html
>>>
>>> It is not clear whether to implement this directly in gpmi-nand or MTD core.
>>> Currently UBIFS assumes that empty spaces must contain only 0xff octets.
>>> A naive approach would be removing that check from UBIFS, bit this can have
>>> disastrous consequences as UBIFS's recovery algorithm relies on that.
>>
>> I think I ran across that approach right before the thread you pointed
>> me to: http://thread.gmane.org/gmane.linux.drivers.mtd/52208
>
> :-)
>
>> the second case is not permanent corruption - when the system is
>> power-cycled it comes up fine the next time.
>
> Yeah, but have you checked how the temporary corruption looks like?
I can add some debugging to find out - what specifically would be
helpful to add?
Thanks for the help!
Tim
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: UBIFS corruption after power cut - possibly unstable bits issue?
2015-10-27 19:01 ` Tim Harvey
@ 2015-10-27 19:52 ` Richard Weinberger
2015-11-02 20:27 ` Tim Harvey
0 siblings, 1 reply; 18+ messages in thread
From: Richard Weinberger @ 2015-10-27 19:52 UTC (permalink / raw)
To: Tim Harvey; +Cc: Artem Bityutskiy, Adrian Hunter, linux-mtd
Tim,
Am 27.10.2015 um 20:01 schrieb Tim Harvey:
> I'm not understanding what is making you say that the issue I
> encountered is 'not' the unstable bits issue described at
> http://www.linux-mtd.infradead.org/doc/ubifs.html#L_unstable_bits? My
> understanding is that the 'unstable bit' issue refers to bits which
> are truly unstable and can read either way each and every read due to
> not getting properly erased/written.
You are right. I was sorting out the unstable bits issue a bit too
early. I'm sorry.
Let's double check. Can you enable UBI verbose logging while testing?
Such that we can see which blocks were written/erased while the power cut
happened?
> If I understand what you are saying you are thinking that my issue is
> instead the result of a never-used PEB that had bit-flips from the
> manufacturer in which case the bits would read the same every time?
> How can we know this PEB was never before used and isn't one that was
> being erased/written during a power cut?
I've seen bit flips on cheap SLC NANDs which came out of a sudden.
According to the FAE I was talking to this is legit for NAND
as long the flipping bits are fixable by the ECC engine.
> In my test scenario where the rootfs is mounted from the kernel
> read-only, but later mounted read-write by userspace (yet not being
> specifically written to by userspace) then power-cut should 'any' NAND
> writes would be occurring at all? And if not as I suspect, then how
> could a subsequent boot end up using a PEB that may have been never
> previously used and have bit-flips from the manufacturer?
UBIFS's has a wandering journal. During the remount it moved maybe.
But for a more expressive analysis I'd need a nanddump to find out which
blocks are in which role.
Can you share the nanddump?
> Should we be doing an erase block on every NAND block during our board
> manufacturing process to avoid this?
Sorry, I don't understand this sentence.
Do you mean a full erasure of the whole NAND?
If so, it would not help as the bit flips can come later.
(Without writing/erasing the block)
The root cause is that your NFC cannot correct bit flips on empty pages.
> It sounds like this 'unexpected bit-flips on erased pages from the
> mfg' issue is a ticking time-bomb for people using ubi/ubifs NAND.
> Shouldn't the http://www.linux-mtd.infradead.org/doc/ubifs.html page
> be updated to refer to this known issue as well as the unstable bit
> issue?
As I said the root cause is that some NFCs cannot correct bit flips on empty
pages.
Instead of putting warnings to ubifs.html I'd love to see a solution on the
said drivers or MTD core.
> I can add some debugging to find out - what specifically would be
> helpful to add?
A hexdump of the buffer would be a good start.
> Thanks for the help!
Thanks for sharing your issues. This is the only way
to address them.
That said, as far on no board I had access to I was able to reproduce the unstable bits
issue. It was always something else.
Thanks,
//richard
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: UBIFS corruption after power cut - possibly unstable bits issue?
2015-10-27 19:52 ` Richard Weinberger
@ 2015-11-02 20:27 ` Tim Harvey
2015-11-02 20:31 ` Tim Harvey
2015-11-03 9:10 ` Artem Bityutskiy
0 siblings, 2 replies; 18+ messages in thread
From: Tim Harvey @ 2015-11-02 20:27 UTC (permalink / raw)
To: Richard Weinberger
Cc: Artem Bityutskiy, Adrian Hunter, linux-mtd, Huang Shijie,
Elie De Brauwer, Brian Norris
On Tue, Oct 27, 2015 at 12:52 PM, Richard Weinberger <richard@nod.at> wrote:
> Tim,
>
> Am 27.10.2015 um 20:01 schrieb Tim Harvey:
>> I'm not understanding what is making you say that the issue I
>> encountered is 'not' the unstable bits issue described at
>> http://www.linux-mtd.infradead.org/doc/ubifs.html#L_unstable_bits? My
>> understanding is that the 'unstable bit' issue refers to bits which
>> are truly unstable and can read either way each and every read due to
>> not getting properly erased/written.
>
> You are right. I was sorting out the unstable bits issue a bit too
> early. I'm sorry.
> Let's double check. Can you enable UBI verbose logging while testing?
> Such that we can see which blocks were written/erased while the power cut
> happened?
>
Richard,
I have 2 IMX6 based boards that I've recreated a UBI/UBIFS NAND
corruption using Linux 4.2 with dynamic debug enabled for
ubi_io_write. This particular test cycle consisted of 5 different
boards and had over 40,000 boot/power-cuts before corruption occurred.
In each case the corruption is in a slightly different area.
The boards remain in a corrupted state which will allow me to get
further debug info.
Board1 looks like this:
[ 0.000000] Linux version 4.2.0-00015-g64291f7 (tharvey@tharvey)
(gcc version 4.8.3 (OpenWrt/Linaro GCC 4.8-2014.04 r43990) ) #195 SMP
Thu Oct 29 14:27:36 PDT 2015
[ 0.000000] CPU: ARMv7 Processor [412fc09a] revision 10 (ARMv7), cr=10c5387d
[ 0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing
instruction cache
[ 0.000000] Machine model: Gateworks Ventana i.MX6 Dual/Quad GW52XX
...
[ 0.000000] Kernel command line: console=ttymxc1,115200
root=ubi0:rootfs ubi.mtd=2 rootfstype=ubifs dyndbg="func ubi_io_write
+p" debug
...
[ 2.045065] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xd5
[ 2.051477] nand: Micron MT29F16G08ADACAH4
[ 2.055590] nand: 2048 MiB, SLC, erase size: 256 KiB, page size:
4096, OOB size: 224
[ 2.087400] gpmi-nand 112000.gpmi-nand: enable the asynchronous EDO mode 5
[ 2.094321] Scanning device for bad blocks
...
[ 3.704946] 3 ofpart partitions found on MTD device gpmi-nand
[ 3.710735] Creating 3 MTD partitions on "gpmi-nand":
[ 3.715823] 0x000000000000-0x000001000000 : "uboot"
[ 3.725088] 0x000001000000-0x000001100000 : "env"
[ 3.733000] 0x000001100000-0x000080000000 : "rootfs"
[ 3.741880] gpmi-nand 112000.gpmi-nand: driver registered.
...
[ 4.365104] ubi0: attaching mtd2
[ 8.418862] ubi0: scanning is finished
[ 8.445084] ubi0: attached mtd2 (name "rootfs", size 2031 MiB)
[ 8.450977] ubi0: PEB size: 262144 bytes (256 KiB), LEB size: 253952 bytes
[ 8.457864] ubi0: min./max. I/O unit sizes: 4096/4096, sub-page size 4096
[ 8.464676] ubi0: VID header offset: 4096 (aligned 4096), data offset: 8192
[ 8.471666] ubi0: good PEBs: 8124, bad PEBs: 0, corrupted PEBs: 0
[ 8.477767] ubi0: user volume: 1, internal volumes: 1, max. volumes
count: 128
[ 8.485015] ubi0: max/mean erase counter: 7/3, WL threshold: 4096,
image sequence number: 42283325
[ 8.494000] ubi0: available PEBs: 0, total reserved PEBs: 8124,
PEBs reserved for bad PEB handling: 160
[ 8.503610] ubi0: background thread "ubi_bgt0d" started, PID 81
...
[ 8.615672] UBIFS (ubi0:0): recovery needed
[ 8.657083] UBIFS (ubi0:0): recovery deferred
[ 8.661965] UBIFS (ubi0:0): UBIFS: mounted UBI device 0, volume 0,
name "rootfs", R/O mode
[ 8.670290] UBIFS (ubi0:0): LEB size: 253952 bytes (248 KiB),
min./max. I/O unit sizes: 4096 bytes/4096 bytes
[ 8.680248] UBIFS (ubi0:0): FS size: 2018918400 bytes (1925 MiB,
7950 LEBs), journal size 9404416 bytes (8 MiB, 38 LEBs)
[ 8.691145] UBIFS (ubi0:0): reserved for root: 0 bytes (0 KiB)
[ 8.697001] UBIFS (ubi0:0): media format: w4/r0 (latest is w4/r0),
UUID BB5E5EB5-6A21-48AA-9C31-DE6B4A32ABAD, small LPT model
[ 8.709784] VFS: Mounted root (ubifs filesystem) readonly on device 0:14.
[ 8.719451] devtmpfs: mounted
[ 8.723002] Freeing unused kernel memory: 420K (80aa5000 - 80b0e000)
INIT: version 2.88 booting
Starting udev
[ 9.548551] udevd[114]: starting version 182
[ 10.018775] UBIFS (ubi0:0): completing deferred recovery
[ 10.043624] UBI DBG io (pid 145): write 4096 bytes to PEB 2253:4096
[ 10.054878] UBI DBG io (pid 145): write 94208 bytes to PEB 2253:8192
[ 10.096644] UBI DBG io (pid 81): write 4096 bytes to PEB 2250:0
[ 10.143314] UBI DBG io (pid 145): write 4096 bytes to PEB 2254:4096
[ 10.153917] UBI DBG io (pid 145): write 184320 bytes to PEB 2254:8192
^^^^^ PEB 2254 is the one that becomes un-readable on the next boot
[ 10.186288] UBI DBG io (pid 81): write 4096 bytes to PEB 2251:0
[ 10.204620] UBIFS (ubi0:0): background thread "ubifs_bgt0_0" started, PID 173
[ 10.213375] UBI DBG io (pid 145): write 4096 bytes to PEB 2255:4096
[ 10.223864] UBI DBG io (pid 145): write 4096 bytes to PEB 2255:8192
[ 10.239935] UBI DBG io (pid 145): write 4096 bytes to PEB 2237:32768
[ 10.261787] UBI DBG io (pid 145): write 4096 bytes to PEB 2184:102400
[ 10.272978] UBI DBG io (pid 145): write 4096 bytes to PEB 2173:114688
[ 10.285948] UBI DBG io (pid 145): write 4096 bytes to PEB 2174:114688
[ 10.296027] UBIFS (ubi0:0): deferred recovery completed
[ 10.299453] UBI DBG io (pid 81): write 4096 bytes to PEB 2252:0
bootlogd: cannot allocate pseudo tty: No such file or directory
ALSA: Restoring mixer settings...
Mon Oct 26 16:37:16 UTC 2015
INIT: Entering runlevel: 5
Configuring network interfaces... [ 11.471755] fec 2188000.ethernet
eth0: Freescale FEC PHY driver [Generic PHY]
(mii_bus:phy_addr=2188000.ethernet:00, irq=-1)
[ 11.483499] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
udhcpc (v1.23.1) started
Sending discover...
Sending discover...
[ 15.000462] UBI DBG io (pid 173): write 4096 bytes to PEB 2253:102400
[ 16.472720] fec 2188000.ethernet eth0: Link is Up - 1Gbps/Full -
flow control rx/tx
[ 16.480975] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Sending discover...
Sending select for 192.168.1.1...
Lease of 192.168.1.1 obtained, lease time 21600
/etc/udhcpc.d/50default: Adding DNS 192.168.1.100
done.
<power cut>
[ 0.000000] Booting Linux on physical CPU 0x0
[ 0.000000] Linux version 4.2.0-00015-g64291f7 (tharvey@tharvey)
(gcc version 4.8.3 (OpenWrt/Linaro GCC 4.8-2014.04 r43990) ) #195 SMP
Thu Oct 29 14:27:36 PDT 2015
[ 0.000000] CPU: ARMv7 Processor [412fc09a] revision 10 (ARMv7), cr=10c5387d
[ 0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing
instruction cache
[ 0.000000] Machine model: Gateworks Ventana i.MX6 Dual/Quad GW52XX
...
[ 2.044936] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xd5
[ 2.051348] nand: Micron MT29F16G08ADACAH4
[ 2.055461] nand: 2048 MiB, SLC, erase size: 256 KiB, page size:
4096, OOB size: 224
[ 2.087272] gpmi-nand 112000.gpmi-nand: enable the asynchronous EDO mode 5
[ 2.094196] Scanning device for bad blocks
...
[ 3.710706] Creating 3 MTD partitions on "gpmi-nand":
[ 3.715793] 0x000000000000-0x000001000000 : "uboot"
[ 3.725027] 0x000001000000-0x000001100000 : "env"
[ 3.732929] 0x000001100000-0x000080000000 : "rootfs"
[ 3.741781] gpmi-nand 112000.gpmi-nand: driver registered.
...
[ 4.366226] ubi0: attaching mtd2
[ 8.445733] ubi0: scanning is finished
[ 8.477551] ubi0: attached mtd2 (name "rootfs", size 2031 MiB)
[ 8.483443] ubi0: PEB size: 262144 bytes (256 KiB), LEB size: 253952 bytes
[ 8.490355] ubi0: min./max. I/O unit sizes: 4096/4096, sub-page size 4096
[ 8.497155] ubi0: VID header offset: 4096 (aligned 4096), data offset: 8192
[ 8.504148] ubi0: good PEBs: 8124, bad PEBs: 0, corrupted PEBs: 0
[ 8.510274] ubi0: user volume: 1, internal volumes: 1, max. volumes
count: 128
[ 8.517510] ubi0: max/mean erase counter: 7/3, WL threshold: 4096,
image sequence number: 42283325
[ 8.526498] ubi0: available PEBs: 0, total reserved PEBs: 8124,
PEBs reserved for bad PEB handling: 160
[ 8.536151] ubi0: background thread "ubi_bgt0d" started, PID 82
...
[ 8.635364] UBIFS (ubi0:0): recovery needed
[ 8.676203] ubi0 warning: ubi_io_read: error -74 (ECC error) while
reading 69632 bytes from PEB 2254:192512, read only 69632 bytes, retry
[ 8.692460] ubi0 warning: ubi_io_read: error -74 (ECC error) while
reading 69632 bytes from PEB 2254:192512, read only 69632 bytes, retry
[ 8.708741] ubi0 warning: ubi_io_read: error -74 (ECC error) while
reading 69632 bytes from PEB 2254:192512, read only 69632 bytes, retry
^^^^ non correctable ecc error on PEB 2254 - I verified that this was
not the first time this PEB has been used
[ 8.724987] ubi0 error: ubi_io_read: error -74 (ECC error) while
reading 69632 bytes from PEB 2254:192512, read 69632 bytes
[ 8.736163] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
4.2.0-00015-g64291f7 dirty #195
[ 8.744021] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[ 8.750582] Backtrace:
[ 8.753074] [<8001400c>] (dump_backtrace) from [<800141f8>]
(show_stack+0x18/0x1c)
[ 8.760667] r6:80b2d960 r5:80b2d960 r4:00000000 r3:00000000
[ 8.766424] [<800141e0>] (show_stack) from [<807ceeb8>]
(dump_stack+0x8c/0xa4)
[ 8.773687] [<807cee2c>] (dump_stack) from [<80474730>]
(ubi_io_read+0x130/0x378)
[ 8.781212] r6:00000000 r5:00011000 r4:ffffffb6 r3:be070000
[ 8.786951] [<80474600>] (ubi_io_read) from [<80471564>]
(ubi_eba_read_leb+0x19c/0x4c8)
[ 8.794975] r10:00000000 r9:c0c63000 r8:000008ce r7:00001f15
r6:00000000 r5:bda3a000
[ 8.802928] r4:bda12000
[ 8.805489] [<804713c8>] (ubi_eba_read_leb) from [<8047077c>]
(ubi_leb_read+0x94/0x130)
[ 8.813527] r10:00000000 r9:c0c63000 r8:bda3a000 r7:0002d000
r6:00001f15 r5:00011000
[ 8.821467] r4:bda12000
[ 8.824037] [<804706e8>] (ubi_leb_read) from [<80257870>]
(ubifs_leb_read+0x30/0x9c)
[ 8.831815] r10:be29a000 r9:80b77ff8 r8:00001f15 r7:0002d000
r6:00011000 r5:be29a000
[ 8.839731] r4:bdaea600
[ 8.842312] [<80257840>] (ubifs_leb_read) from [<80260fac>]
(ubifs_start_scan+0x84/0x134)
[ 8.850522] r8:c0c36000 r7:00001f15 r6:be29a000 r5:0002d000 r4:bdaea600
[ 8.857322] [<80260f28>] (ubifs_start_scan) from [<80276270>]
(ubifs_recover_leb+0x80/0xb80)
[ 8.865794] r8:00001f15 r7:c0c63000 r6:00011000 r5:00001f15 r4:00000001
[ 8.872609] [<802761f0>] (ubifs_recover_leb) from [<802627f0>]
(ubifs_replay_journal+0xd34/0x1a48)
[ 8.881601] r10:00017188 r9:0002d000 r8:be29a000 r7:00000000
r6:bdaea608 r5:00001f15
[ 8.889518] r4:00000001
[ 8.892106] [<80261abc>] (ubifs_replay_journal) from [<80254954>]
(ubifs_mount+0x136c/0x2170)
[ 8.900666] r10:be29a008 r9:00000000 r8:0003db70 r7:be29a000
r6:80b76870 r5:be3f6000
[ 8.908584] r4:00000000
[ 8.911171] [<802535e8>] (ubifs_mount) from [<801055f8>] (mount_fs+0x1c/0xb0)
[ 8.918323] r10:00008001 r9:00000000 r8:80b2a2e0 r7:80124e34
r6:80b2a2e0 r5:bd9cfac0
[ 8.926258] r4:be365900
[ 8.928829] [<801055dc>] (mount_fs) from [<80121868>]
(vfs_kern_mount+0x5c/0x144)
[ 8.936355] r6:00008001 r5:bd9cfac0 r4:be365900
[ 8.941056] [<8012180c>] (vfs_kern_mount) from [<80124e34>]
(do_mount+0x19c/0xd0c)
[ 8.948642] r9:80b2a2e0 r8:80b1ebdc r7:bd9cfa80 r6:bd9cfac0
r5:00000060 r4:00000000
[ 8.956498] [<80124c98>] (do_mount) from [<80125d38>] (SyS_mount+0x7c/0xa8)
[ 8.963493] r10:000000d7 r9:80afa7b0 r8:8099d910 r7:00008001
r6:00000000 r5:bd9cfac0
[ 8.971429] r4:bd9cfa80
[ 8.973993] [<80125cbc>] (SyS_mount) from [<80aa6260>]
(mount_block_root+0x114/0x240)
[ 8.981856] r8:befa3a40 r7:80afa7a0 r6:bd9d2000 r5:bd9d2000 r4:00008001
[ 8.988651] [<80aa614c>] (mount_block_root) from [<80aa664c>]
(prepare_namespace+0x98/0x1d0)
[ 8.997123] r10:000000d7 r9:80afa780 r8:80b087e4 r7:80afa778
r6:80afa7a0 r5:80afa7a0
[ 9.005058] r4:80afa7b0
[ 9.007617] [<80aa65b4>] (prepare_namespace) from [<80aa5efc>]
(kernel_init_freeable+0x1a4/0x1f0)
[ 9.016522] r6:80b8b000 r5:80b8b000 r4:00000007
[ 9.021217] [<80aa5d58>] (kernel_init_freeable) from [<807c91b4>]
(kernel_init+0x10/0xf4)
[ 9.029412] r10:00000000 r9:00000000 r8:00000000 r7:00000000
r6:00000000 r5:807c91a4
[ 9.037354] r4:00000000
[ 9.039918] [<807c91a4>] (kernel_init) from [<800104b8>]
(ret_from_fork+0x14/0x3c)
[ 9.047525] r4:00000000 r3:00000000
[ 9.051259] UBIFS error (ubi0:0 pid 1): ubifs_recover_leb: corrupt
empty space LEB 7957:184320, corruption starts at 4665
[ 9.062247] UBIFS error (ubi0:0 pid 1): ubifs_scanned_corruption:
corruption at LEB 7957:188985
[ 9.070972] UBIFS error (ubi0:0 pid 1): ubifs_scanned_corruption:
first 8192 bytes from LEB 7957:188985
[ 9.080396] 00000000: ffffffef ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff ................................
[ 9.091903] 00000020: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff ................................
...
[ 12.002997] 00001fc0: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff ................................
[ 12.014501] 00001fe0: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff ................................
[ 12.026000] UBIFS error (ubi0:0 pid 1): ubifs_recover_leb: LEB 7957
scanning failed
^^^^ recovery failed
[ 12.036886] VFS: Cannot open root device "ubi0:rootfs" or
unknown-block(0,0): error -117
After added Huang Shijie's 'mtd: gpmi: fix the bitflips for erased
page' patch from http://patchwork.ozlabs.org/patch/309763/ which shows
the following on board1:
[ 8.532624] UBIFS (ubi0:0): recovery needed
[ 8.569104] gpmi-nand 112000.gpmi-nand: The page(148656) is an
erased page(1,6,1,1).
[ 8.577748] gpmi-nand 112000.gpmi-nand: The page(148658) is an
erased page(1,6,1,1).
[ 8.586413] gpmi-nand 112000.gpmi-nand: The page(148660) is an
erased page(1,6,1,1).
[ 8.595050] gpmi-nand 112000.gpmi-nand: The page(148662) is an
erased page(1,6,1,1).
[ 8.603705] gpmi-nand 112000.gpmi-nand: The page(148664) is an
erased page(1,6,1,1).
[ 8.612310] gpmi-nand 112000.gpmi-nand: The page(148666) is an
erased page(1,6,1,1).
[ 8.620940] gpmi-nand 112000.gpmi-nand: The page(148668) is an
erased page(1,6,1,1).
[ 8.629580] gpmi-nand 112000.gpmi-nand: The page(148670) is an
erased page(1,6,1,1).
^^^^ 1 bit-flip on ecc read, 1 bit-flip on non-ecc read, ecc-str=6,
memset's buffer to 0xff to attempt to recover
[ 8.644345] UBIFS (ubi0:0): recovery deferred
[ 8.649209] UBIFS (ubi0:0): UBIFS: mounted UBI device 0, volume 0,
name "rootfs", R/O mode
[ 8.657537] UBIFS (ubi0:0): LEB size: 253952 bytes (248 KiB),
min./max. I/O unit sizes: 4096 bytes/4096 bytes
[ 8.667486] UBIFS (ubi0:0): FS size: 2018918400 bytes (1925 MiB,
7950 LEBs), journal size 9404416 bytes (8 MiB, 38 LEBs)
[ 8.678409] UBIFS (ubi0:0): reserved for root: 0 bytes (0 KiB)
[ 8.684283] UBIFS (ubi0:0): media format: w4/r0 (latest is w4/r0),
UUID BB5E5EB5-6A21-48AA-9C31-DE6B4A32ABAD, small LPT model
[ 8.697990] VFS: Mounted root (ubifs filesystem) readonly on device 0:14.
^^^^ able to recover and always ends up this way over further boots
^^^^ after applying this patch I boot with init=foo so as to not allow
rootfs to get mounted read/write by udev and perhaps permanently
recover the board
Board2 looks like this:
[ 0.000000] Booting Linux on physical CPU 0x0
[ 0.000000] Linux version 4.2.0-00015-g64291f7 (tharvey@tharvey)
(gcc version 4.8.3 (OpenWrt/Linaro GCC 4.8-2014.04 r43990) ) #195 SMP
Thu Oct 29 14:27:36 PDT 2015
[ 0.000000] CPU: ARMv7 Processor [412fc09a] revision 10 (ARMv7), cr=10c5387d
[ 0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing
instruction cache
[ 0.000000] Machine model: Gateworks Ventana i.MX6 Dual/Quad GW552X
...
[ 0.000000] Kernel command line: console=ttymxc1,115200
root=ubi0:rootfs ubi.mtd=2 rootfstype=ubifs dyndbg="func ubi_io_write
+p" debug
...
[ 2.341988] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xd5
[ 2.348382] nand: Micron MT29F16G08ADACAH4
[ 2.352495] nand: 2048 MiB, SLC, erase size: 256 KiB, page size:
4096, OOB size: 224
[ 2.384301] gpmi-nand 112000.gpmi-nand: enable the asynchronous EDO mode 5
[ 2.391224] Scanning device for bad blocks
...
[ 4.024472] 3 ofpart partitions found on MTD device gpmi-nand
[ 4.030265] Creating 3 MTD partitions on "gpmi-nand":
[ 4.035336] 0x000000000000-0x000001000000 : "uboot"
[ 4.044635] 0x000001000000-0x000001100000 : "env"
[ 4.052559] 0x000001100000-0x000080000000 : "rootfs"
[ 4.061452] gpmi-nand 112000.gpmi-nand: driver registered.
...
[ 4.426420] ubi0: attaching mtd2
[ 8.514755] ubi0: scanning is finished
[ 8.541225] ubi0: attached mtd2 (name "rootfs", size 2031 MiB)
[ 8.547108] ubi0: PEB size: 262144 bytes (256 KiB), LEB size: 253952 bytes
[ 8.553994] ubi0: min./max. I/O unit sizes: 4096/4096, sub-page size 4096
[ 8.560810] ubi0: VID header offset: 4096 (aligned 4096), data offset: 8192
[ 8.567799] ubi0: good PEBs: 8124, bad PEBs: 0, corrupted PEBs: 0
[ 8.573901] ubi0: user volume: 1, internal volumes: 1, max. volumes
count: 128
[ 8.581161] ubi0: max/mean erase counter: 4/2, WL threshold: 4096,
image sequence number: 42283325
[ 8.590148] ubi0: available PEBs: 0, total reserved PEBs: 8124,
PEBs reserved for bad PEB handling: 160
[ 8.599737] ubi0: background thread "ubi_bgt0d" started, PID 74
...
[ 8.712166] UBIFS (ubi0:0): recovery needed
[ 8.749882] UBIFS (ubi0:0): recovery deferred
[ 8.754639] UBIFS (ubi0:0): UBIFS: mounted UBI device 0, volume 0,
name "rootfs", R/O mode
[ 8.762941] UBIFS (ubi0:0): LEB size: 253952 bytes (248 KiB),
min./max. I/O unit sizes: 4096 bytes/4096 bytes
[ 8.772883] UBIFS (ubi0:0): FS size: 2018918400 bytes (1925 MiB,
7950 LEBs), journal size 9404416 bytes (8 MiB, 38 LEBs)
[ 8.783773] UBIFS (ubi0:0): reserved for root: 0 bytes (0 KiB)
[ 8.789631] UBIFS (ubi0:0): media format: w4/r0 (latest is w4/r0),
UUID BB5E5EB5-6A21-48AA-9C31-DE6B4A32ABAD, small LPT model
[ 8.803366] VFS: Mounted root (ubifs filesystem) readonly on device 0:14.
[ 8.813172] devtmpfs: mounted
[ 8.816746] Freeing unused kernel memory: 420K (80aa5000 - 80b0e000)
INIT: version 2.88 booting
Starting udev
[ 9.621167] udevd[106]: starting version 182
[ 10.525521] UBIFS (ubi0:0): completing deferred recovery
[ 10.540072] UBI DBG io (pid 162): write 4096 bytes to PEB 5891:4096
[ 10.547480] UBI DBG io (pid 162): write 110592 bytes to PEB 5891:8192
[ 10.578168] UBI DBG io (pid 162): write 4096 bytes to PEB 5892:4096
[ 10.585053] UBI DBG io (pid 74): write 4096 bytes to PEB 2944:0
[ 10.585639] UBI DBG io (pid 162): write 163840 bytes to PEB 5892:8192
[ 10.607478] UBI DBG io (pid 74): write 4096 bytes to PEB 2945:0
[ 10.615485] UBIFS (ubi0:0): background thread "ubifs_bgt0_0" started, PID 163
[ 10.617094] UBI DBG io (pid 162): write 4096 bytes to PEB 5893:4096
[ 10.618542] UBI DBG io (pid 162): write 4096 bytes to PEB 5893:8192
[ 10.619487] UBI DBG io (pid 162): write 4096 bytes to PEB 2859:212992
[ 10.620347] UBI DBG io (pid 162): write 4096 bytes to PEB 2850:225280
^^^^^ this is the PEB that gets uncorrectable ECC errors - I verified
that this was not the first time this PEB has been used
[ 10.621476] UBI DBG io (pid 162): write 4096 bytes to PEB 2851:225280
[ 10.623058] UBIFS (ubi0:0): deferred recovery completed
[ 10.628641] UBI DBG io (pid 74): write 4096 bytes to PEB 2946:0
bootlogd: cannot allocate pseudo tty: No such file or directory
Populating dev cache
ALSA: Restoring mixer settings...
/usr/sbin/alsactl: load_state:1729: No soundcards found...
Sat Oct 24 11:28:04 UTC 2015
INIT: Entering runlevel: 5
Configuring network interfaces... ifconfig: SIOCGIFFLAGS: No such device
Starting system message bus: dbus.
Starting Dropbear SSH server: dropbear.
Starting rpcbind daemon...done.
Starting watchdog: ^[[1
34m[^[[1
32m OK ^[[1
34m]^[[0
39m
Starting advanced power management daemon: No APM support in kernel
(failed.)
starting DNS forwarder and DHCP server: dnsmasq... done.
<power cut>
[ 0.000000] Booting Linux on physical CPU 0x0
[ 0.000000] Linux version 4.2.0-00015-g64291f7 (tharvey@tharvey)
(gcc version 4.8.3 (OpenWrt/Linaro GCC 4.8-2014.04 r43990) ) #195 SMP
Thu Oct 29 14:27:36 PDT 2015
...
[ 2.332131] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xd5
[ 2.338524] nand: Micron MT29F16G08ADACAH4
[ 2.342638] nand: 2048 MiB, SLC, erase size: 256 KiB, page size:
4096, OOB size: 224
[ 2.374442] gpmi-nand 112000.gpmi-nand: enable the asynchronous EDO mode 5
[ 2.381366] Scanning device for bad blocks
...
[ 4.013255] 3 ofpart partitions found on MTD device gpmi-nand
[ 4.019040] Creating 3 MTD partitions on "gpmi-nand":
[ 4.024111] 0x000000000000-0x000001000000 : "uboot"
[ 4.033352] 0x000001000000-0x000001100000 : "env"
[ 4.041210] 0x000001100000-0x000080000000 : "rootfs"
[ 4.050164] gpmi-nand 112000.gpmi-nand: driver registered.
...
[ 4.416614] ubi0: attaching mtd2
[ 8.500977] ubi0: scanning is finished
[ 8.527328] ubi0: attached mtd2 (name "rootfs", size 2031 MiB)
[ 8.533184] ubi0: PEB size: 262144 bytes (256 KiB), LEB size: 253952 bytes
[ 8.540089] ubi0: min./max. I/O unit sizes: 4096/4096, sub-page size 4096
[ 8.546906] ubi0: VID header offset: 4096 (aligned 4096), data offset: 8192
[ 8.553878] ubi0: good PEBs: 8124, bad PEBs: 0, corrupted PEBs: 0
[ 8.559998] ubi0: user volume: 1, internal volumes: 1, max. volumes
count: 128
[ 8.567248] ubi0: max/mean erase counter: 4/2, WL threshold: 4096,
image sequence number: 42283325
[ 8.576216] ubi0: available PEBs: 0, total reserved PEBs: 8124,
PEBs reserved for bad PEB handling: 160
[ 8.585813] ubi0: background thread "ubi_bgt0d" started, PID 73
..
[ 8.683337] ubi0 warning: ubi_io_read: error -74 (ECC error) while
reading 253952 bytes from PEB 2850:8192, read only 253952 bytes, retry
[ 8.710495] ubi0 warning: ubi_io_read: error -74 (ECC error) while
reading 253952 bytes from PEB 2850:8192, read only 253952 bytes, retry
[ 8.736821] ubi0 warning: ubi_io_read: error -74 (ECC error) while
reading 253952 bytes from PEB 2850:8192, read only 253952 bytes, retry
[ 8.763179] ubi0 error: ubi_io_read: error -74 (ECC error) while
reading 253952 bytes from PEB 2850:8192, read 253952 bytes
^^^^ uncorrectable ECC error on PEB 2850
[ 8.774340] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
4.2.0-00015-g64291f7 #195
[ 8.782194] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[ 8.788743] Backtrace:
[ 8.791237] [<8001400c>] (dump_backtrace) from [<800141f8>]
(show_stack+0x18/0x1c)
[ 8.798839] r6:80b2d960 r5:80b2d960 r4:00000000 r3:00000000
[ 8.804589] [<800141e0>] (show_stack) from [<807ceeb8>]
(dump_stack+0x8c/0xa4)
[ 8.811855] [<807cee2c>] (dump_stack) from [<80474730>]
(ubi_io_read+0x130/0x378)
[ 8.819359] r6:00000000 r5:0003e000 r4:ffffffb6 r3:9e468000
[ 8.825099] [<80474600>] (ubi_io_read) from [<80471564>]
(ubi_eba_read_leb+0x19c/0x4c8)
[ 8.833125] r10:00000000 r9:a0c0d000 r8:00000b22 r7:00000001
r6:00000000 r5:9dd8a000
[ 8.841065] r4:9dd9e000
[ 8.843628] [<804713c8>] (ubi_eba_read_leb) from [<8047077c>]
(ubi_leb_read+0x94/0x130)
[ 8.851661] r10:00000000 r9:a0c0d000 r8:9dd8a000 r7:00000000
r6:00000001 r5:0003e000
[ 8.859596] r4:9dd9e000
[ 8.862170] [<804706e8>] (ubi_leb_read) from [<80257870>]
(ubifs_leb_read+0x30/0x9c)
[ 8.869937] r10:9e673008 r9:00000001 r8:00000001 r7:00000000
r6:0003e000 r5:9e673000
[ 8.877873] r4:9de22240
[ 8.880437] [<80257840>] (ubifs_leb_read) from [<80260fac>]
(ubifs_start_scan+0x84/0x134)
[ 8.888637] r8:a0c0d000 r7:00000001 r6:9e673000 r5:00000000 r4:9de22240
[ 8.895433] [<80260f28>] (ubifs_start_scan) from [<802613ac>]
(ubifs_scan+0x30/0x3d4)
[ 8.903284] r8:00000001 r7:9e673000 r6:a0c0d000 r5:00000000 r4:0003e000
[ 8.910097] [<8026137c>] (ubifs_scan) from [<8025fcd8>]
(ubifs_read_master+0x48/0xd78)
[ 8.918034] r10:9e673008 r9:9dd7bf80 r8:9e673000 r7:00000001
r6:80b76870 r5:9dda1800
[ 8.925952] r4:00000001
[ 8.928533] [<8025fc90>] (ubifs_read_master) from [<80254714>]
(ubifs_mount+0x112c/0x2170)
[ 8.936819] r10:9e673008 r9:9dd7bf80 r8:9e673b38 r7:9e673000
r6:80b76870 r5:9dda1800
[ 8.944738] r4:00000001
[ 8.947326] [<802535e8>] (ubifs_mount) from [<801055f8>] (mount_fs+0x1c/0xb0)
[ 8.954466] r10:00008001 r9:00000000 r8:80b2a2e0 r7:80124e34
r6:80b2a2e0 r5:9dd73c80
[ 8.962404] r4:9dd76500
[ 8.964975] [<801055dc>] (mount_fs) from [<80121868>]
(vfs_kern_mount+0x5c/0x144)
[ 8.972479] r6:00008001 r5:9dd73c80 r4:9dd76500
[ 8.977177] [<8012180c>] (vfs_kern_mount) from [<80124e34>]
(do_mount+0x19c/0xd0c)
[ 8.984750] r9:80b2a2e0 r8:80b1ebdc r7:9e7b5d00 r6:9dd73c80
r5:00000060 r4:00000000
[ 8.992605] [<80124c98>] (do_mount) from [<80125d38>] (SyS_mount+0x7c/0xa8)
[ 8.999588] r10:000000d7 r9:80afa7b0 r8:8099d910 r7:00008001
r6:00000000 r5:9dd73c80
[ 9.007522] r4:9e7b5d00
[ 9.010087] [<80125cbc>] (SyS_mount) from [<80aa6260>]
(mount_block_root+0x114/0x240)
[ 9.017941] r8:9efacd20 r7:80afa7a0 r6:9de69000 r5:9de69000 r4:00008001
[ 9.024736] [<80aa614c>] (mount_block_root) from [<80aa664c>]
(prepare_namespace+0x98/0x1d0)
[ 9.033195] r10:000000d7 r9:80afa780 r8:80b087e4 r7:80afa778
r6:80afa7a0 r5:80afa7a0
[ 9.041129] r4:80afa7b0
[ 9.043690] [<80aa65b4>] (prepare_namespace) from [<80aa5efc>]
(kernel_init_freeable+0x1a4/0x1f0)
[ 9.052592] r6:80b8b000 r5:80b8b000 r4:00000007
[ 9.057290] [<80aa5d58>] (kernel_init_freeable) from [<807c91b4>]
(kernel_init+0x10/0xf4)
[ 9.065472] r10:00000000 r9:00000000 r8:00000000 r7:00000000
r6:00000000 r5:807c91a4
[ 9.073407] r4:00000000
[ 9.075970] [<807c91a4>] (kernel_init) from [<800104b8>]
(ret_from_fork+0x14/0x3c)
[ 9.083560] r4:00000000 r3:00000000
[ 9.102369] ubi0 warning: ubi_io_read: error -74 (ECC error) while
reading 253952 bytes from PEB 2850:8192, read only 253952 bytes, retry
[ 9.128774] ubi0 warning: ubi_io_read: error -74 (ECC error) while
reading 253952 bytes from PEB 2850:8192, read only 253952 bytes, retry
[ 9.155184] ubi0 warning: ubi_io_read: error -74 (ECC error) while
reading 253952 bytes from PEB 2850:8192, read only 253952 bytes, retry
[ 9.181522] ubi0 error: ubi_io_read: error -74 (ECC error) while
reading 253952 bytes from PEB 2850:8192, read 253952 bytes
^^^^ unrecoverable ECC error on PEB 2850
[ 9.192683] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
4.2.0-00015-g64291f7 #195
[ 9.200535] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[ 9.207083] Backtrace:
[ 9.209573] [<8001400c>] (dump_backtrace) from [<800141f8>]
(show_stack+0x18/0x1c)
[ 9.217173] r6:80b2d960 r5:80b2d960 r4:00000000 r3:00000000
[ 9.222916] [<800141e0>] (show_stack) from [<807ceeb8>]
(dump_stack+0x8c/0xa4)
[ 9.230177] [<807cee2c>] (dump_stack) from [<80474730>]
(ubi_io_read+0x130/0x378)
[ 9.237688] r6:00000000 r5:0003e000 r4:ffffffb6 r3:9e468000
[ 9.243429] [<80474600>] (ubi_io_read) from [<80471564>]
(ubi_eba_read_leb+0x19c/0x4c8)
[ 9.251458] r10:00000000 r9:a0c4c000 r8:00000b22 r7:00000001
r6:00000000 r5:9dd8a000
[ 9.259398] r4:9dd9e000
[ 9.261962] [<804713c8>] (ubi_eba_read_leb) from [<8047077c>]
(ubi_leb_read+0x94/0x130)
[ 9.269988] r10:00000000 r9:a0c4c000 r8:9dd8a000 r7:00000000
r6:00000001 r5:0003e000
[ 9.277925] r4:9dd9e000
[ 9.280495] [<804706e8>] (ubi_leb_read) from [<80257870>]
(ubifs_leb_read+0x30/0x9c)
[ 9.288264] r10:9e673000 r9:9dd7bf80 r8:00000001 r7:00000000
r6:0003e000 r5:9e673000
[ 9.296182] r4:9e673000
[ 9.298768] [<80257840>] (ubifs_leb_read) from [<8027569c>]
(get_master_node+0x54/0x324)
[ 9.306880] r8:9e673000 r7:80b77ff8 r6:00001000 r5:00001000 r4:9e673000
[ 9.313681] [<80275648>] (get_master_node) from [<80275d98>]
(ubifs_recover_master_node+0x60/0x3ec)
[ 9.322749] r10:9e673008 r9:9dd7bf80 r8:9e673000 r7:80b77ff8
r6:00001000 r5:9dda1800
[ 9.330685] r4:9e673000
[ 9.333248] [<80275d38>] (ubifs_recover_master_node) from
[<8026080c>] (ubifs_read_master+0xb7c/0xd78)
[ 9.342575] r7:00000001 r6:80b76870 r5:9dda1800 r4:ffffff8b
[ 9.348335] [<8025fc90>] (ubifs_read_master) from [<80254714>]
(ubifs_mount+0x112c/0x2170)
[ 9.356621] r10:9e673008 r9:9dd7bf80 r8:9e673b38 r7:9e673000
r6:80b76870 r5:9dda1800
[ 9.364540] r4:00000001
[ 9.367127] [<802535e8>] (ubifs_mount) from [<801055f8>] (mount_fs+0x1c/0xb0)
[ 9.374266] r10:00008001 r9:00000000 r8:80b2a2e0 r7:80124e34
r6:80b2a2e0 r5:9dd73c80
[ 9.382204] r4:9dd76500
[ 9.384772] [<801055dc>] (mount_fs) from [<80121868>]
(vfs_kern_mount+0x5c/0x144)
[ 9.392276] r6:00008001 r5:9dd73c80 r4:9dd76500
[ 9.396974] [<8012180c>] (vfs_kern_mount) from [<80124e34>]
(do_mount+0x19c/0xd0c)
[ 9.404547] r9:80b2a2e0 r8:80b1ebdc r7:9e7b5d00 r6:9dd73c80
r5:00000060 r4:00000000
[ 9.412404] [<80124c98>] (do_mount) from [<80125d38>] (SyS_mount+0x7c/0xa8)
[ 9.419386] r10:000000d7 r9:80afa7b0 r8:8099d910 r7:00008001
r6:00000000 r5:9dd73c80
[ 9.427320] r4:9e7b5d00
[ 9.429887] [<80125cbc>] (SyS_mount) from [<80aa6260>]
(mount_block_root+0x114/0x240)
[ 9.437743] r8:9efacd20 r7:80afa7a0 r6:9de69000 r5:9de69000 r4:00008001
[ 9.444539] [<80aa614c>] (mount_block_root) from [<80aa664c>]
(prepare_namespace+0x98/0x1d0)
[ 9.452999] r10:000000d7 r9:80afa780 r8:80b087e4 r7:80afa778
r6:80afa7a0 r5:80afa7a0
[ 9.460933] r4:80afa7b0
[ 9.463494] [<80aa65b4>] (prepare_namespace) from [<80aa5efc>]
(kernel_init_freeable+0x1a4/0x1f0)
[ 9.472386] r6:80b8b000 r5:80b8b000 r4:00000007
[ 9.477081] [<80aa5d58>] (kernel_init_freeable) from [<807c91b4>]
(kernel_init+0x10/0xf4)
[ 9.485262] r10:00000000 r9:00000000 r8:00000000 r7:00000000
r6:00000000 r5:807c91a4
[ 9.493196] r4:00000000
[ 9.495758] [<807c91a4>] (kernel_init) from [<800104b8>]
(ret_from_fork+0x14/0x3c)
[ 9.503349] r4:00000000 r3:00000000
[ 9.507115] UBIFS error (ubi0:0 pid 1): ubifs_recover_master_node:
failed to recover master node
^^^ master node recovery fails
...
[ 9.618335] Kernel panic - not syncing: VFS: Unable to mount root
fs on unknown-block(0,0)
After added Huang Shijie's 'mtd: gpmi: fix the bitflips for erased
page' patch from http://patchwork.ozlabs.org/patch/309763/ which shows
the following on board2:
[ 8.605463] gpmi-nand 112000.gpmi-nand: The page(186809) is an
erased page(7,6,1,0).
[ 8.628012] gpmi-nand 112000.gpmi-nand: The page(186809) is an
erased page(7,6,1,0).
^^^ 1 bit-flip on ecc read, 0 bit-flip on non-ecc read, ecc-str=6,
memset's buffer to 0xff to attempt to recover
[ 8.637305] UBIFS error (ubi0:0 pid 1): ubifs_recover_master_node:
failed to recover master node
^^^ unable to recover and always ends up this way over further boots
>> If I understand what you are saying you are thinking that my issue is
>> instead the result of a never-used PEB that had bit-flips from the
>> manufacturer in which case the bits would read the same every time?
>> How can we know this PEB was never before used and isn't one that was
>> being erased/written during a power cut?
>
> I've seen bit flips on cheap SLC NANDs which came out of a sudden.
> According to the FAE I was talking to this is legit for NAND
> as long the flipping bits are fixable by the ECC engine.
>
I'm a little confused by the fact that gpmi-nand.c encounters an
uncorrectable status yet when the bit-flips are counted by Huang's
'gmpi_erased_check' they come out to 1 which is less than the ecc str.
Is this saying that ecc already corrected what it could and there was
'still' 1 more flip?
>> In my test scenario where the rootfs is mounted from the kernel
>> read-only, but later mounted read-write by userspace (yet not being
>> specifically written to by userspace) then power-cut should 'any' NAND
>> writes would be occurring at all? And if not as I suspect, then how
>> could a subsequent boot end up using a PEB that may have been never
>> previously used and have bit-flips from the manufacturer?
>
> UBIFS's has a wandering journal. During the remount it moved maybe.
> But for a more expressive analysis I'd need a nanddump to find out which
> blocks are in which role.
> Can you share the nanddump?
>
This makes sense, and I apparently lied when I said the test software
wasn't writing to the rootfs. There is at least one process that does
write: the dhcp client when it writes to /etc/resolv.con and possibly
something else writes as well. So we are power-cutting an an un-clean
ubifs and never allowing a safe shutdown so I would expect the journal
to be wandering around.
>> Should we be doing an erase block on every NAND block during our board
>> manufacturing process to avoid this?
>
> Sorry, I don't understand this sentence.
> Do you mean a full erasure of the whole NAND?
> If so, it would not help as the bit flips can come later.
> (Without writing/erasing the block)
> The root cause is that your NFC cannot correct bit flips on empty pages.
>
>> It sounds like this 'unexpected bit-flips on erased pages from the
>> mfg' issue is a ticking time-bomb for people using ubi/ubifs NAND.
>> Shouldn't the http://www.linux-mtd.infradead.org/doc/ubifs.html page
>> be updated to refer to this known issue as well as the unstable bit
>> issue?
>
> As I said the root cause is that some NFCs cannot correct bit flips on empty
> pages.
> Instead of putting warnings to ubifs.html I'd love to see a solution on the
> said drivers or MTD core.
>
>> I can add some debugging to find out - what specifically would be
>> helpful to add?
>
> A hexdump of the buffer would be a good start.
>
>> Thanks for the help!
>
> Thanks for sharing your issues. This is the only way
> to address them.
> That said, as far on no board I had access to I was able to reproduce the unstable bits
> issue. It was always something else.
>
> Thanks,
> //richard
I can provide dumps of various blocks if needed or add more debugging
on the reads after the corruption occurred. It takes me a day or two
to re-create the issue if you want me to turn on more debugging during
the writes.
I've cc'd Huang, Elie, and Brian who were involved in the patch to
detect bit-flips in gpmi-nand.c reads - perhaps they have some more
ideas. I find it interesting that in one case that patch resolves the
issue and in the other it does not.
Regards,
Tim
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: UBIFS corruption after power cut - possibly unstable bits issue?
2015-11-02 20:27 ` Tim Harvey
@ 2015-11-02 20:31 ` Tim Harvey
2015-11-02 21:31 ` Richard Weinberger
2015-11-03 13:38 ` Boris Brezillon
2015-11-03 9:10 ` Artem Bityutskiy
1 sibling, 2 replies; 18+ messages in thread
From: Tim Harvey @ 2015-11-02 20:31 UTC (permalink / raw)
To: Richard Weinberger
Cc: Artem Bityutskiy, Adrian Hunter, linux-mtd, Elie De Brauwer,
Brian Norris, Huang Shijie
On Mon, Nov 2, 2015 at 12:27 PM, Tim Harvey <tharvey@gateworks.com> wrote:
> On Tue, Oct 27, 2015 at 12:52 PM, Richard Weinberger <richard@nod.at> wrote:
>> Tim,
>>
>> Am 27.10.2015 um 20:01 schrieb Tim Harvey:
>>> I'm not understanding what is making you say that the issue I
>>> encountered is 'not' the unstable bits issue described at
>>> http://www.linux-mtd.infradead.org/doc/ubifs.html#L_unstable_bits? My
>>> understanding is that the 'unstable bit' issue refers to bits which
>>> are truly unstable and can read either way each and every read due to
>>> not getting properly erased/written.
>>
>> You are right. I was sorting out the unstable bits issue a bit too
>> early. I'm sorry.
>> Let's double check. Can you enable UBI verbose logging while testing?
>> Such that we can see which blocks were written/erased while the power cut
>> happened?
>>
>
> Richard,
>
> I have 2 IMX6 based boards that I've recreated a UBI/UBIFS NAND
> corruption using Linux 4.2 with dynamic debug enabled for
> ubi_io_write. This particular test cycle consisted of 5 different
> boards and had over 40,000 boot/power-cuts before corruption occurred.
> In each case the corruption is in a slightly different area.
>
> The boards remain in a corrupted state which will allow me to get
> further debug info.
>
> Board1 looks like this:
>
> [ 0.000000] Linux version 4.2.0-00015-g64291f7 (tharvey@tharvey)
> (gcc version 4.8.3 (OpenWrt/Linaro GCC 4.8-2014.04 r43990) ) #195 SMP
> Thu Oct 29 14:27:36 PDT 2015
> [ 0.000000] CPU: ARMv7 Processor [412fc09a] revision 10 (ARMv7), cr=10c5387d
> [ 0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing
> instruction cache
> [ 0.000000] Machine model: Gateworks Ventana i.MX6 Dual/Quad GW52XX
> ...
> [ 0.000000] Kernel command line: console=ttymxc1,115200
> root=ubi0:rootfs ubi.mtd=2 rootfstype=ubifs dyndbg="func ubi_io_write
> +p" debug
> ...
> [ 2.045065] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xd5
> [ 2.051477] nand: Micron MT29F16G08ADACAH4
> [ 2.055590] nand: 2048 MiB, SLC, erase size: 256 KiB, page size:
> 4096, OOB size: 224
> [ 2.087400] gpmi-nand 112000.gpmi-nand: enable the asynchronous EDO mode 5
> [ 2.094321] Scanning device for bad blocks
> ...
> [ 3.704946] 3 ofpart partitions found on MTD device gpmi-nand
> [ 3.710735] Creating 3 MTD partitions on "gpmi-nand":
> [ 3.715823] 0x000000000000-0x000001000000 : "uboot"
> [ 3.725088] 0x000001000000-0x000001100000 : "env"
> [ 3.733000] 0x000001100000-0x000080000000 : "rootfs"
> [ 3.741880] gpmi-nand 112000.gpmi-nand: driver registered.
> ...
> [ 4.365104] ubi0: attaching mtd2
> [ 8.418862] ubi0: scanning is finished
> [ 8.445084] ubi0: attached mtd2 (name "rootfs", size 2031 MiB)
> [ 8.450977] ubi0: PEB size: 262144 bytes (256 KiB), LEB size: 253952 bytes
> [ 8.457864] ubi0: min./max. I/O unit sizes: 4096/4096, sub-page size 4096
> [ 8.464676] ubi0: VID header offset: 4096 (aligned 4096), data offset: 8192
> [ 8.471666] ubi0: good PEBs: 8124, bad PEBs: 0, corrupted PEBs: 0
> [ 8.477767] ubi0: user volume: 1, internal volumes: 1, max. volumes
> count: 128
> [ 8.485015] ubi0: max/mean erase counter: 7/3, WL threshold: 4096,
> image sequence number: 42283325
> [ 8.494000] ubi0: available PEBs: 0, total reserved PEBs: 8124,
> PEBs reserved for bad PEB handling: 160
> [ 8.503610] ubi0: background thread "ubi_bgt0d" started, PID 81
> ...
> [ 8.615672] UBIFS (ubi0:0): recovery needed
> [ 8.657083] UBIFS (ubi0:0): recovery deferred
> [ 8.661965] UBIFS (ubi0:0): UBIFS: mounted UBI device 0, volume 0,
> name "rootfs", R/O mode
> [ 8.670290] UBIFS (ubi0:0): LEB size: 253952 bytes (248 KiB),
> min./max. I/O unit sizes: 4096 bytes/4096 bytes
> [ 8.680248] UBIFS (ubi0:0): FS size: 2018918400 bytes (1925 MiB,
> 7950 LEBs), journal size 9404416 bytes (8 MiB, 38 LEBs)
> [ 8.691145] UBIFS (ubi0:0): reserved for root: 0 bytes (0 KiB)
> [ 8.697001] UBIFS (ubi0:0): media format: w4/r0 (latest is w4/r0),
> UUID BB5E5EB5-6A21-48AA-9C31-DE6B4A32ABAD, small LPT model
> [ 8.709784] VFS: Mounted root (ubifs filesystem) readonly on device 0:14.
> [ 8.719451] devtmpfs: mounted
> [ 8.723002] Freeing unused kernel memory: 420K (80aa5000 - 80b0e000)
> INIT: version 2.88 booting
> Starting udev
> [ 9.548551] udevd[114]: starting version 182
> [ 10.018775] UBIFS (ubi0:0): completing deferred recovery
> [ 10.043624] UBI DBG io (pid 145): write 4096 bytes to PEB 2253:4096
> [ 10.054878] UBI DBG io (pid 145): write 94208 bytes to PEB 2253:8192
> [ 10.096644] UBI DBG io (pid 81): write 4096 bytes to PEB 2250:0
> [ 10.143314] UBI DBG io (pid 145): write 4096 bytes to PEB 2254:4096
> [ 10.153917] UBI DBG io (pid 145): write 184320 bytes to PEB 2254:8192
> ^^^^^ PEB 2254 is the one that becomes un-readable on the next boot
>
> [ 10.186288] UBI DBG io (pid 81): write 4096 bytes to PEB 2251:0
> [ 10.204620] UBIFS (ubi0:0): background thread "ubifs_bgt0_0" started, PID 173
> [ 10.213375] UBI DBG io (pid 145): write 4096 bytes to PEB 2255:4096
> [ 10.223864] UBI DBG io (pid 145): write 4096 bytes to PEB 2255:8192
> [ 10.239935] UBI DBG io (pid 145): write 4096 bytes to PEB 2237:32768
> [ 10.261787] UBI DBG io (pid 145): write 4096 bytes to PEB 2184:102400
> [ 10.272978] UBI DBG io (pid 145): write 4096 bytes to PEB 2173:114688
> [ 10.285948] UBI DBG io (pid 145): write 4096 bytes to PEB 2174:114688
> [ 10.296027] UBIFS (ubi0:0): deferred recovery completed
> [ 10.299453] UBI DBG io (pid 81): write 4096 bytes to PEB 2252:0
> bootlogd: cannot allocate pseudo tty: No such file or directory
> ALSA: Restoring mixer settings...
> Mon Oct 26 16:37:16 UTC 2015
> INIT: Entering runlevel: 5
> Configuring network interfaces... [ 11.471755] fec 2188000.ethernet
> eth0: Freescale FEC PHY driver [Generic PHY]
> (mii_bus:phy_addr=2188000.ethernet:00, irq=-1)
> [ 11.483499] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
> udhcpc (v1.23.1) started
> Sending discover...
> Sending discover...
> [ 15.000462] UBI DBG io (pid 173): write 4096 bytes to PEB 2253:102400
> [ 16.472720] fec 2188000.ethernet eth0: Link is Up - 1Gbps/Full -
> flow control rx/tx
> [ 16.480975] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> Sending discover...
> Sending select for 192.168.1.1...
> Lease of 192.168.1.1 obtained, lease time 21600
> /etc/udhcpc.d/50default: Adding DNS 192.168.1.100
> done.
>
> <power cut>
>
> [ 0.000000] Booting Linux on physical CPU 0x0
> [ 0.000000] Linux version 4.2.0-00015-g64291f7 (tharvey@tharvey)
> (gcc version 4.8.3 (OpenWrt/Linaro GCC 4.8-2014.04 r43990) ) #195 SMP
> Thu Oct 29 14:27:36 PDT 2015
> [ 0.000000] CPU: ARMv7 Processor [412fc09a] revision 10 (ARMv7), cr=10c5387d
> [ 0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing
> instruction cache
> [ 0.000000] Machine model: Gateworks Ventana i.MX6 Dual/Quad GW52XX
> ...
> [ 2.044936] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xd5
> [ 2.051348] nand: Micron MT29F16G08ADACAH4
> [ 2.055461] nand: 2048 MiB, SLC, erase size: 256 KiB, page size:
> 4096, OOB size: 224
> [ 2.087272] gpmi-nand 112000.gpmi-nand: enable the asynchronous EDO mode 5
> [ 2.094196] Scanning device for bad blocks
> ...
> [ 3.710706] Creating 3 MTD partitions on "gpmi-nand":
> [ 3.715793] 0x000000000000-0x000001000000 : "uboot"
> [ 3.725027] 0x000001000000-0x000001100000 : "env"
> [ 3.732929] 0x000001100000-0x000080000000 : "rootfs"
> [ 3.741781] gpmi-nand 112000.gpmi-nand: driver registered.
> ...
> [ 4.366226] ubi0: attaching mtd2
> [ 8.445733] ubi0: scanning is finished
> [ 8.477551] ubi0: attached mtd2 (name "rootfs", size 2031 MiB)
> [ 8.483443] ubi0: PEB size: 262144 bytes (256 KiB), LEB size: 253952 bytes
> [ 8.490355] ubi0: min./max. I/O unit sizes: 4096/4096, sub-page size 4096
> [ 8.497155] ubi0: VID header offset: 4096 (aligned 4096), data offset: 8192
> [ 8.504148] ubi0: good PEBs: 8124, bad PEBs: 0, corrupted PEBs: 0
> [ 8.510274] ubi0: user volume: 1, internal volumes: 1, max. volumes
> count: 128
> [ 8.517510] ubi0: max/mean erase counter: 7/3, WL threshold: 4096,
> image sequence number: 42283325
> [ 8.526498] ubi0: available PEBs: 0, total reserved PEBs: 8124,
> PEBs reserved for bad PEB handling: 160
> [ 8.536151] ubi0: background thread "ubi_bgt0d" started, PID 82
> ...
> [ 8.635364] UBIFS (ubi0:0): recovery needed
> [ 8.676203] ubi0 warning: ubi_io_read: error -74 (ECC error) while
> reading 69632 bytes from PEB 2254:192512, read only 69632 bytes, retry
> [ 8.692460] ubi0 warning: ubi_io_read: error -74 (ECC error) while
> reading 69632 bytes from PEB 2254:192512, read only 69632 bytes, retry
> [ 8.708741] ubi0 warning: ubi_io_read: error -74 (ECC error) while
> reading 69632 bytes from PEB 2254:192512, read only 69632 bytes, retry
> ^^^^ non correctable ecc error on PEB 2254 - I verified that this was
> not the first time this PEB has been used
> [ 8.724987] ubi0 error: ubi_io_read: error -74 (ECC error) while
> reading 69632 bytes from PEB 2254:192512, read 69632 bytes
> [ 8.736163] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
> 4.2.0-00015-g64291f7 dirty #195
> [ 8.744021] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> [ 8.750582] Backtrace:
> [ 8.753074] [<8001400c>] (dump_backtrace) from [<800141f8>]
> (show_stack+0x18/0x1c)
> [ 8.760667] r6:80b2d960 r5:80b2d960 r4:00000000 r3:00000000
> [ 8.766424] [<800141e0>] (show_stack) from [<807ceeb8>]
> (dump_stack+0x8c/0xa4)
> [ 8.773687] [<807cee2c>] (dump_stack) from [<80474730>]
> (ubi_io_read+0x130/0x378)
> [ 8.781212] r6:00000000 r5:00011000 r4:ffffffb6 r3:be070000
> [ 8.786951] [<80474600>] (ubi_io_read) from [<80471564>]
> (ubi_eba_read_leb+0x19c/0x4c8)
> [ 8.794975] r10:00000000 r9:c0c63000 r8:000008ce r7:00001f15
> r6:00000000 r5:bda3a000
> [ 8.802928] r4:bda12000
> [ 8.805489] [<804713c8>] (ubi_eba_read_leb) from [<8047077c>]
> (ubi_leb_read+0x94/0x130)
> [ 8.813527] r10:00000000 r9:c0c63000 r8:bda3a000 r7:0002d000
> r6:00001f15 r5:00011000
> [ 8.821467] r4:bda12000
> [ 8.824037] [<804706e8>] (ubi_leb_read) from [<80257870>]
> (ubifs_leb_read+0x30/0x9c)
> [ 8.831815] r10:be29a000 r9:80b77ff8 r8:00001f15 r7:0002d000
> r6:00011000 r5:be29a000
> [ 8.839731] r4:bdaea600
> [ 8.842312] [<80257840>] (ubifs_leb_read) from [<80260fac>]
> (ubifs_start_scan+0x84/0x134)
> [ 8.850522] r8:c0c36000 r7:00001f15 r6:be29a000 r5:0002d000 r4:bdaea600
> [ 8.857322] [<80260f28>] (ubifs_start_scan) from [<80276270>]
> (ubifs_recover_leb+0x80/0xb80)
> [ 8.865794] r8:00001f15 r7:c0c63000 r6:00011000 r5:00001f15 r4:00000001
> [ 8.872609] [<802761f0>] (ubifs_recover_leb) from [<802627f0>]
> (ubifs_replay_journal+0xd34/0x1a48)
> [ 8.881601] r10:00017188 r9:0002d000 r8:be29a000 r7:00000000
> r6:bdaea608 r5:00001f15
> [ 8.889518] r4:00000001
> [ 8.892106] [<80261abc>] (ubifs_replay_journal) from [<80254954>]
> (ubifs_mount+0x136c/0x2170)
> [ 8.900666] r10:be29a008 r9:00000000 r8:0003db70 r7:be29a000
> r6:80b76870 r5:be3f6000
> [ 8.908584] r4:00000000
> [ 8.911171] [<802535e8>] (ubifs_mount) from [<801055f8>] (mount_fs+0x1c/0xb0)
> [ 8.918323] r10:00008001 r9:00000000 r8:80b2a2e0 r7:80124e34
> r6:80b2a2e0 r5:bd9cfac0
> [ 8.926258] r4:be365900
> [ 8.928829] [<801055dc>] (mount_fs) from [<80121868>]
> (vfs_kern_mount+0x5c/0x144)
> [ 8.936355] r6:00008001 r5:bd9cfac0 r4:be365900
> [ 8.941056] [<8012180c>] (vfs_kern_mount) from [<80124e34>]
> (do_mount+0x19c/0xd0c)
> [ 8.948642] r9:80b2a2e0 r8:80b1ebdc r7:bd9cfa80 r6:bd9cfac0
> r5:00000060 r4:00000000
> [ 8.956498] [<80124c98>] (do_mount) from [<80125d38>] (SyS_mount+0x7c/0xa8)
> [ 8.963493] r10:000000d7 r9:80afa7b0 r8:8099d910 r7:00008001
> r6:00000000 r5:bd9cfac0
> [ 8.971429] r4:bd9cfa80
> [ 8.973993] [<80125cbc>] (SyS_mount) from [<80aa6260>]
> (mount_block_root+0x114/0x240)
> [ 8.981856] r8:befa3a40 r7:80afa7a0 r6:bd9d2000 r5:bd9d2000 r4:00008001
> [ 8.988651] [<80aa614c>] (mount_block_root) from [<80aa664c>]
> (prepare_namespace+0x98/0x1d0)
> [ 8.997123] r10:000000d7 r9:80afa780 r8:80b087e4 r7:80afa778
> r6:80afa7a0 r5:80afa7a0
> [ 9.005058] r4:80afa7b0
> [ 9.007617] [<80aa65b4>] (prepare_namespace) from [<80aa5efc>]
> (kernel_init_freeable+0x1a4/0x1f0)
> [ 9.016522] r6:80b8b000 r5:80b8b000 r4:00000007
> [ 9.021217] [<80aa5d58>] (kernel_init_freeable) from [<807c91b4>]
> (kernel_init+0x10/0xf4)
> [ 9.029412] r10:00000000 r9:00000000 r8:00000000 r7:00000000
> r6:00000000 r5:807c91a4
> [ 9.037354] r4:00000000
> [ 9.039918] [<807c91a4>] (kernel_init) from [<800104b8>]
> (ret_from_fork+0x14/0x3c)
> [ 9.047525] r4:00000000 r3:00000000
> [ 9.051259] UBIFS error (ubi0:0 pid 1): ubifs_recover_leb: corrupt
> empty space LEB 7957:184320, corruption starts at 4665
> [ 9.062247] UBIFS error (ubi0:0 pid 1): ubifs_scanned_corruption:
> corruption at LEB 7957:188985
> [ 9.070972] UBIFS error (ubi0:0 pid 1): ubifs_scanned_corruption:
> first 8192 bytes from LEB 7957:188985
> [ 9.080396] 00000000: ffffffef ffffffff ffffffff ffffffff ffffffff
> ffffffff ffffffff ffffffff ................................
> [ 9.091903] 00000020: ffffffff ffffffff ffffffff ffffffff ffffffff
> ffffffff ffffffff ffffffff ................................
> ...
> [ 12.002997] 00001fc0: ffffffff ffffffff ffffffff ffffffff ffffffff
> ffffffff ffffffff ffffffff ................................
> [ 12.014501] 00001fe0: ffffffff ffffffff ffffffff ffffffff ffffffff
> ffffffff ffffffff ffffffff ................................
> [ 12.026000] UBIFS error (ubi0:0 pid 1): ubifs_recover_leb: LEB 7957
> scanning failed
> ^^^^ recovery failed
> [ 12.036886] VFS: Cannot open root device "ubi0:rootfs" or
> unknown-block(0,0): error -117
>
> After added Huang Shijie's 'mtd: gpmi: fix the bitflips for erased
> page' patch from http://patchwork.ozlabs.org/patch/309763/ which shows
> the following on board1:
> [ 8.532624] UBIFS (ubi0:0): recovery needed
> [ 8.569104] gpmi-nand 112000.gpmi-nand: The page(148656) is an
> erased page(1,6,1,1).
> [ 8.577748] gpmi-nand 112000.gpmi-nand: The page(148658) is an
> erased page(1,6,1,1).
> [ 8.586413] gpmi-nand 112000.gpmi-nand: The page(148660) is an
> erased page(1,6,1,1).
> [ 8.595050] gpmi-nand 112000.gpmi-nand: The page(148662) is an
> erased page(1,6,1,1).
> [ 8.603705] gpmi-nand 112000.gpmi-nand: The page(148664) is an
> erased page(1,6,1,1).
> [ 8.612310] gpmi-nand 112000.gpmi-nand: The page(148666) is an
> erased page(1,6,1,1).
> [ 8.620940] gpmi-nand 112000.gpmi-nand: The page(148668) is an
> erased page(1,6,1,1).
> [ 8.629580] gpmi-nand 112000.gpmi-nand: The page(148670) is an
> erased page(1,6,1,1).
> ^^^^ 1 bit-flip on ecc read, 1 bit-flip on non-ecc read, ecc-str=6,
> memset's buffer to 0xff to attempt to recover
> [ 8.644345] UBIFS (ubi0:0): recovery deferred
> [ 8.649209] UBIFS (ubi0:0): UBIFS: mounted UBI device 0, volume 0,
> name "rootfs", R/O mode
> [ 8.657537] UBIFS (ubi0:0): LEB size: 253952 bytes (248 KiB),
> min./max. I/O unit sizes: 4096 bytes/4096 bytes
> [ 8.667486] UBIFS (ubi0:0): FS size: 2018918400 bytes (1925 MiB,
> 7950 LEBs), journal size 9404416 bytes (8 MiB, 38 LEBs)
> [ 8.678409] UBIFS (ubi0:0): reserved for root: 0 bytes (0 KiB)
> [ 8.684283] UBIFS (ubi0:0): media format: w4/r0 (latest is w4/r0),
> UUID BB5E5EB5-6A21-48AA-9C31-DE6B4A32ABAD, small LPT model
> [ 8.697990] VFS: Mounted root (ubifs filesystem) readonly on device 0:14.
> ^^^^ able to recover and always ends up this way over further boots
> ^^^^ after applying this patch I boot with init=foo so as to not allow
> rootfs to get mounted read/write by udev and perhaps permanently
> recover the board
>
>
>
> Board2 looks like this:
> [ 0.000000] Booting Linux on physical CPU 0x0
> [ 0.000000] Linux version 4.2.0-00015-g64291f7 (tharvey@tharvey)
> (gcc version 4.8.3 (OpenWrt/Linaro GCC 4.8-2014.04 r43990) ) #195 SMP
> Thu Oct 29 14:27:36 PDT 2015
> [ 0.000000] CPU: ARMv7 Processor [412fc09a] revision 10 (ARMv7), cr=10c5387d
> [ 0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing
> instruction cache
> [ 0.000000] Machine model: Gateworks Ventana i.MX6 Dual/Quad GW552X
> ...
> [ 0.000000] Kernel command line: console=ttymxc1,115200
> root=ubi0:rootfs ubi.mtd=2 rootfstype=ubifs dyndbg="func ubi_io_write
> +p" debug
> ...
> [ 2.341988] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xd5
> [ 2.348382] nand: Micron MT29F16G08ADACAH4
> [ 2.352495] nand: 2048 MiB, SLC, erase size: 256 KiB, page size:
> 4096, OOB size: 224
> [ 2.384301] gpmi-nand 112000.gpmi-nand: enable the asynchronous EDO mode 5
> [ 2.391224] Scanning device for bad blocks
> ...
> [ 4.024472] 3 ofpart partitions found on MTD device gpmi-nand
> [ 4.030265] Creating 3 MTD partitions on "gpmi-nand":
> [ 4.035336] 0x000000000000-0x000001000000 : "uboot"
> [ 4.044635] 0x000001000000-0x000001100000 : "env"
> [ 4.052559] 0x000001100000-0x000080000000 : "rootfs"
> [ 4.061452] gpmi-nand 112000.gpmi-nand: driver registered.
> ...
> [ 4.426420] ubi0: attaching mtd2
> [ 8.514755] ubi0: scanning is finished
> [ 8.541225] ubi0: attached mtd2 (name "rootfs", size 2031 MiB)
> [ 8.547108] ubi0: PEB size: 262144 bytes (256 KiB), LEB size: 253952 bytes
> [ 8.553994] ubi0: min./max. I/O unit sizes: 4096/4096, sub-page size 4096
> [ 8.560810] ubi0: VID header offset: 4096 (aligned 4096), data offset: 8192
> [ 8.567799] ubi0: good PEBs: 8124, bad PEBs: 0, corrupted PEBs: 0
> [ 8.573901] ubi0: user volume: 1, internal volumes: 1, max. volumes
> count: 128
> [ 8.581161] ubi0: max/mean erase counter: 4/2, WL threshold: 4096,
> image sequence number: 42283325
> [ 8.590148] ubi0: available PEBs: 0, total reserved PEBs: 8124,
> PEBs reserved for bad PEB handling: 160
> [ 8.599737] ubi0: background thread "ubi_bgt0d" started, PID 74
> ...
> [ 8.712166] UBIFS (ubi0:0): recovery needed
> [ 8.749882] UBIFS (ubi0:0): recovery deferred
> [ 8.754639] UBIFS (ubi0:0): UBIFS: mounted UBI device 0, volume 0,
> name "rootfs", R/O mode
> [ 8.762941] UBIFS (ubi0:0): LEB size: 253952 bytes (248 KiB),
> min./max. I/O unit sizes: 4096 bytes/4096 bytes
> [ 8.772883] UBIFS (ubi0:0): FS size: 2018918400 bytes (1925 MiB,
> 7950 LEBs), journal size 9404416 bytes (8 MiB, 38 LEBs)
> [ 8.783773] UBIFS (ubi0:0): reserved for root: 0 bytes (0 KiB)
> [ 8.789631] UBIFS (ubi0:0): media format: w4/r0 (latest is w4/r0),
> UUID BB5E5EB5-6A21-48AA-9C31-DE6B4A32ABAD, small LPT model
> [ 8.803366] VFS: Mounted root (ubifs filesystem) readonly on device 0:14.
> [ 8.813172] devtmpfs: mounted
> [ 8.816746] Freeing unused kernel memory: 420K (80aa5000 - 80b0e000)
> INIT: version 2.88 booting
> Starting udev
> [ 9.621167] udevd[106]: starting version 182
> [ 10.525521] UBIFS (ubi0:0): completing deferred recovery
> [ 10.540072] UBI DBG io (pid 162): write 4096 bytes to PEB 5891:4096
> [ 10.547480] UBI DBG io (pid 162): write 110592 bytes to PEB 5891:8192
> [ 10.578168] UBI DBG io (pid 162): write 4096 bytes to PEB 5892:4096
> [ 10.585053] UBI DBG io (pid 74): write 4096 bytes to PEB 2944:0
> [ 10.585639] UBI DBG io (pid 162): write 163840 bytes to PEB 5892:8192
> [ 10.607478] UBI DBG io (pid 74): write 4096 bytes to PEB 2945:0
> [ 10.615485] UBIFS (ubi0:0): background thread "ubifs_bgt0_0" started, PID 163
> [ 10.617094] UBI DBG io (pid 162): write 4096 bytes to PEB 5893:4096
> [ 10.618542] UBI DBG io (pid 162): write 4096 bytes to PEB 5893:8192
> [ 10.619487] UBI DBG io (pid 162): write 4096 bytes to PEB 2859:212992
> [ 10.620347] UBI DBG io (pid 162): write 4096 bytes to PEB 2850:225280
> ^^^^^ this is the PEB that gets uncorrectable ECC errors - I verified
> that this was not the first time this PEB has been used
> [ 10.621476] UBI DBG io (pid 162): write 4096 bytes to PEB 2851:225280
> [ 10.623058] UBIFS (ubi0:0): deferred recovery completed
> [ 10.628641] UBI DBG io (pid 74): write 4096 bytes to PEB 2946:0
> bootlogd: cannot allocate pseudo tty: No such file or directory
> Populating dev cache
> ALSA: Restoring mixer settings...
> /usr/sbin/alsactl: load_state:1729: No soundcards found...
> Sat Oct 24 11:28:04 UTC 2015
> INIT: Entering runlevel: 5
> Configuring network interfaces... ifconfig: SIOCGIFFLAGS: No such device
> Starting system message bus: dbus.
> Starting Dropbear SSH server: dropbear.
> Starting rpcbind daemon...done.
> Starting watchdog: [1
> 34m[ [1
> 32m OK [1
> 34m] [0
> 39m
> Starting advanced power management daemon: No APM support in kernel
> (failed.)
> starting DNS forwarder and DHCP server: dnsmasq... done.
>
> <power cut>
>
> [ 0.000000] Booting Linux on physical CPU 0x0
> [ 0.000000] Linux version 4.2.0-00015-g64291f7 (tharvey@tharvey)
> (gcc version 4.8.3 (OpenWrt/Linaro GCC 4.8-2014.04 r43990) ) #195 SMP
> Thu Oct 29 14:27:36 PDT 2015
> ...
> [ 2.332131] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xd5
> [ 2.338524] nand: Micron MT29F16G08ADACAH4
> [ 2.342638] nand: 2048 MiB, SLC, erase size: 256 KiB, page size:
> 4096, OOB size: 224
> [ 2.374442] gpmi-nand 112000.gpmi-nand: enable the asynchronous EDO mode 5
> [ 2.381366] Scanning device for bad blocks
> ...
> [ 4.013255] 3 ofpart partitions found on MTD device gpmi-nand
> [ 4.019040] Creating 3 MTD partitions on "gpmi-nand":
> [ 4.024111] 0x000000000000-0x000001000000 : "uboot"
> [ 4.033352] 0x000001000000-0x000001100000 : "env"
> [ 4.041210] 0x000001100000-0x000080000000 : "rootfs"
> [ 4.050164] gpmi-nand 112000.gpmi-nand: driver registered.
> ...
> [ 4.416614] ubi0: attaching mtd2
> [ 8.500977] ubi0: scanning is finished
> [ 8.527328] ubi0: attached mtd2 (name "rootfs", size 2031 MiB)
> [ 8.533184] ubi0: PEB size: 262144 bytes (256 KiB), LEB size: 253952 bytes
> [ 8.540089] ubi0: min./max. I/O unit sizes: 4096/4096, sub-page size 4096
> [ 8.546906] ubi0: VID header offset: 4096 (aligned 4096), data offset: 8192
> [ 8.553878] ubi0: good PEBs: 8124, bad PEBs: 0, corrupted PEBs: 0
> [ 8.559998] ubi0: user volume: 1, internal volumes: 1, max. volumes
> count: 128
> [ 8.567248] ubi0: max/mean erase counter: 4/2, WL threshold: 4096,
> image sequence number: 42283325
> [ 8.576216] ubi0: available PEBs: 0, total reserved PEBs: 8124,
> PEBs reserved for bad PEB handling: 160
> [ 8.585813] ubi0: background thread "ubi_bgt0d" started, PID 73
> ..
> [ 8.683337] ubi0 warning: ubi_io_read: error -74 (ECC error) while
> reading 253952 bytes from PEB 2850:8192, read only 253952 bytes, retry
> [ 8.710495] ubi0 warning: ubi_io_read: error -74 (ECC error) while
> reading 253952 bytes from PEB 2850:8192, read only 253952 bytes, retry
> [ 8.736821] ubi0 warning: ubi_io_read: error -74 (ECC error) while
> reading 253952 bytes from PEB 2850:8192, read only 253952 bytes, retry
> [ 8.763179] ubi0 error: ubi_io_read: error -74 (ECC error) while
> reading 253952 bytes from PEB 2850:8192, read 253952 bytes
> ^^^^ uncorrectable ECC error on PEB 2850
> [ 8.774340] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
> 4.2.0-00015-g64291f7 #195
> [ 8.782194] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> [ 8.788743] Backtrace:
> [ 8.791237] [<8001400c>] (dump_backtrace) from [<800141f8>]
> (show_stack+0x18/0x1c)
> [ 8.798839] r6:80b2d960 r5:80b2d960 r4:00000000 r3:00000000
> [ 8.804589] [<800141e0>] (show_stack) from [<807ceeb8>]
> (dump_stack+0x8c/0xa4)
> [ 8.811855] [<807cee2c>] (dump_stack) from [<80474730>]
> (ubi_io_read+0x130/0x378)
> [ 8.819359] r6:00000000 r5:0003e000 r4:ffffffb6 r3:9e468000
> [ 8.825099] [<80474600>] (ubi_io_read) from [<80471564>]
> (ubi_eba_read_leb+0x19c/0x4c8)
> [ 8.833125] r10:00000000 r9:a0c0d000 r8:00000b22 r7:00000001
> r6:00000000 r5:9dd8a000
> [ 8.841065] r4:9dd9e000
> [ 8.843628] [<804713c8>] (ubi_eba_read_leb) from [<8047077c>]
> (ubi_leb_read+0x94/0x130)
> [ 8.851661] r10:00000000 r9:a0c0d000 r8:9dd8a000 r7:00000000
> r6:00000001 r5:0003e000
> [ 8.859596] r4:9dd9e000
> [ 8.862170] [<804706e8>] (ubi_leb_read) from [<80257870>]
> (ubifs_leb_read+0x30/0x9c)
> [ 8.869937] r10:9e673008 r9:00000001 r8:00000001 r7:00000000
> r6:0003e000 r5:9e673000
> [ 8.877873] r4:9de22240
> [ 8.880437] [<80257840>] (ubifs_leb_read) from [<80260fac>]
> (ubifs_start_scan+0x84/0x134)
> [ 8.888637] r8:a0c0d000 r7:00000001 r6:9e673000 r5:00000000 r4:9de22240
> [ 8.895433] [<80260f28>] (ubifs_start_scan) from [<802613ac>]
> (ubifs_scan+0x30/0x3d4)
> [ 8.903284] r8:00000001 r7:9e673000 r6:a0c0d000 r5:00000000 r4:0003e000
> [ 8.910097] [<8026137c>] (ubifs_scan) from [<8025fcd8>]
> (ubifs_read_master+0x48/0xd78)
> [ 8.918034] r10:9e673008 r9:9dd7bf80 r8:9e673000 r7:00000001
> r6:80b76870 r5:9dda1800
> [ 8.925952] r4:00000001
> [ 8.928533] [<8025fc90>] (ubifs_read_master) from [<80254714>]
> (ubifs_mount+0x112c/0x2170)
> [ 8.936819] r10:9e673008 r9:9dd7bf80 r8:9e673b38 r7:9e673000
> r6:80b76870 r5:9dda1800
> [ 8.944738] r4:00000001
> [ 8.947326] [<802535e8>] (ubifs_mount) from [<801055f8>] (mount_fs+0x1c/0xb0)
> [ 8.954466] r10:00008001 r9:00000000 r8:80b2a2e0 r7:80124e34
> r6:80b2a2e0 r5:9dd73c80
> [ 8.962404] r4:9dd76500
> [ 8.964975] [<801055dc>] (mount_fs) from [<80121868>]
> (vfs_kern_mount+0x5c/0x144)
> [ 8.972479] r6:00008001 r5:9dd73c80 r4:9dd76500
> [ 8.977177] [<8012180c>] (vfs_kern_mount) from [<80124e34>]
> (do_mount+0x19c/0xd0c)
> [ 8.984750] r9:80b2a2e0 r8:80b1ebdc r7:9e7b5d00 r6:9dd73c80
> r5:00000060 r4:00000000
> [ 8.992605] [<80124c98>] (do_mount) from [<80125d38>] (SyS_mount+0x7c/0xa8)
> [ 8.999588] r10:000000d7 r9:80afa7b0 r8:8099d910 r7:00008001
> r6:00000000 r5:9dd73c80
> [ 9.007522] r4:9e7b5d00
> [ 9.010087] [<80125cbc>] (SyS_mount) from [<80aa6260>]
> (mount_block_root+0x114/0x240)
> [ 9.017941] r8:9efacd20 r7:80afa7a0 r6:9de69000 r5:9de69000 r4:00008001
> [ 9.024736] [<80aa614c>] (mount_block_root) from [<80aa664c>]
> (prepare_namespace+0x98/0x1d0)
> [ 9.033195] r10:000000d7 r9:80afa780 r8:80b087e4 r7:80afa778
> r6:80afa7a0 r5:80afa7a0
> [ 9.041129] r4:80afa7b0
> [ 9.043690] [<80aa65b4>] (prepare_namespace) from [<80aa5efc>]
> (kernel_init_freeable+0x1a4/0x1f0)
> [ 9.052592] r6:80b8b000 r5:80b8b000 r4:00000007
> [ 9.057290] [<80aa5d58>] (kernel_init_freeable) from [<807c91b4>]
> (kernel_init+0x10/0xf4)
> [ 9.065472] r10:00000000 r9:00000000 r8:00000000 r7:00000000
> r6:00000000 r5:807c91a4
> [ 9.073407] r4:00000000
> [ 9.075970] [<807c91a4>] (kernel_init) from [<800104b8>]
> (ret_from_fork+0x14/0x3c)
> [ 9.083560] r4:00000000 r3:00000000
> [ 9.102369] ubi0 warning: ubi_io_read: error -74 (ECC error) while
> reading 253952 bytes from PEB 2850:8192, read only 253952 bytes, retry
> [ 9.128774] ubi0 warning: ubi_io_read: error -74 (ECC error) while
> reading 253952 bytes from PEB 2850:8192, read only 253952 bytes, retry
> [ 9.155184] ubi0 warning: ubi_io_read: error -74 (ECC error) while
> reading 253952 bytes from PEB 2850:8192, read only 253952 bytes, retry
> [ 9.181522] ubi0 error: ubi_io_read: error -74 (ECC error) while
> reading 253952 bytes from PEB 2850:8192, read 253952 bytes
> ^^^^ unrecoverable ECC error on PEB 2850
> [ 9.192683] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
> 4.2.0-00015-g64291f7 #195
> [ 9.200535] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> [ 9.207083] Backtrace:
> [ 9.209573] [<8001400c>] (dump_backtrace) from [<800141f8>]
> (show_stack+0x18/0x1c)
> [ 9.217173] r6:80b2d960 r5:80b2d960 r4:00000000 r3:00000000
> [ 9.222916] [<800141e0>] (show_stack) from [<807ceeb8>]
> (dump_stack+0x8c/0xa4)
> [ 9.230177] [<807cee2c>] (dump_stack) from [<80474730>]
> (ubi_io_read+0x130/0x378)
> [ 9.237688] r6:00000000 r5:0003e000 r4:ffffffb6 r3:9e468000
> [ 9.243429] [<80474600>] (ubi_io_read) from [<80471564>]
> (ubi_eba_read_leb+0x19c/0x4c8)
> [ 9.251458] r10:00000000 r9:a0c4c000 r8:00000b22 r7:00000001
> r6:00000000 r5:9dd8a000
> [ 9.259398] r4:9dd9e000
> [ 9.261962] [<804713c8>] (ubi_eba_read_leb) from [<8047077c>]
> (ubi_leb_read+0x94/0x130)
> [ 9.269988] r10:00000000 r9:a0c4c000 r8:9dd8a000 r7:00000000
> r6:00000001 r5:0003e000
> [ 9.277925] r4:9dd9e000
> [ 9.280495] [<804706e8>] (ubi_leb_read) from [<80257870>]
> (ubifs_leb_read+0x30/0x9c)
> [ 9.288264] r10:9e673000 r9:9dd7bf80 r8:00000001 r7:00000000
> r6:0003e000 r5:9e673000
> [ 9.296182] r4:9e673000
> [ 9.298768] [<80257840>] (ubifs_leb_read) from [<8027569c>]
> (get_master_node+0x54/0x324)
> [ 9.306880] r8:9e673000 r7:80b77ff8 r6:00001000 r5:00001000 r4:9e673000
> [ 9.313681] [<80275648>] (get_master_node) from [<80275d98>]
> (ubifs_recover_master_node+0x60/0x3ec)
> [ 9.322749] r10:9e673008 r9:9dd7bf80 r8:9e673000 r7:80b77ff8
> r6:00001000 r5:9dda1800
> [ 9.330685] r4:9e673000
> [ 9.333248] [<80275d38>] (ubifs_recover_master_node) from
> [<8026080c>] (ubifs_read_master+0xb7c/0xd78)
> [ 9.342575] r7:00000001 r6:80b76870 r5:9dda1800 r4:ffffff8b
> [ 9.348335] [<8025fc90>] (ubifs_read_master) from [<80254714>]
> (ubifs_mount+0x112c/0x2170)
> [ 9.356621] r10:9e673008 r9:9dd7bf80 r8:9e673b38 r7:9e673000
> r6:80b76870 r5:9dda1800
> [ 9.364540] r4:00000001
> [ 9.367127] [<802535e8>] (ubifs_mount) from [<801055f8>] (mount_fs+0x1c/0xb0)
> [ 9.374266] r10:00008001 r9:00000000 r8:80b2a2e0 r7:80124e34
> r6:80b2a2e0 r5:9dd73c80
> [ 9.382204] r4:9dd76500
> [ 9.384772] [<801055dc>] (mount_fs) from [<80121868>]
> (vfs_kern_mount+0x5c/0x144)
> [ 9.392276] r6:00008001 r5:9dd73c80 r4:9dd76500
> [ 9.396974] [<8012180c>] (vfs_kern_mount) from [<80124e34>]
> (do_mount+0x19c/0xd0c)
> [ 9.404547] r9:80b2a2e0 r8:80b1ebdc r7:9e7b5d00 r6:9dd73c80
> r5:00000060 r4:00000000
> [ 9.412404] [<80124c98>] (do_mount) from [<80125d38>] (SyS_mount+0x7c/0xa8)
> [ 9.419386] r10:000000d7 r9:80afa7b0 r8:8099d910 r7:00008001
> r6:00000000 r5:9dd73c80
> [ 9.427320] r4:9e7b5d00
> [ 9.429887] [<80125cbc>] (SyS_mount) from [<80aa6260>]
> (mount_block_root+0x114/0x240)
> [ 9.437743] r8:9efacd20 r7:80afa7a0 r6:9de69000 r5:9de69000 r4:00008001
> [ 9.444539] [<80aa614c>] (mount_block_root) from [<80aa664c>]
> (prepare_namespace+0x98/0x1d0)
> [ 9.452999] r10:000000d7 r9:80afa780 r8:80b087e4 r7:80afa778
> r6:80afa7a0 r5:80afa7a0
> [ 9.460933] r4:80afa7b0
> [ 9.463494] [<80aa65b4>] (prepare_namespace) from [<80aa5efc>]
> (kernel_init_freeable+0x1a4/0x1f0)
> [ 9.472386] r6:80b8b000 r5:80b8b000 r4:00000007
> [ 9.477081] [<80aa5d58>] (kernel_init_freeable) from [<807c91b4>]
> (kernel_init+0x10/0xf4)
> [ 9.485262] r10:00000000 r9:00000000 r8:00000000 r7:00000000
> r6:00000000 r5:807c91a4
> [ 9.493196] r4:00000000
> [ 9.495758] [<807c91a4>] (kernel_init) from [<800104b8>]
> (ret_from_fork+0x14/0x3c)
> [ 9.503349] r4:00000000 r3:00000000
> [ 9.507115] UBIFS error (ubi0:0 pid 1): ubifs_recover_master_node:
> failed to recover master node
> ^^^ master node recovery fails
> ...
> [ 9.618335] Kernel panic - not syncing: VFS: Unable to mount root
> fs on unknown-block(0,0)
>
> After added Huang Shijie's 'mtd: gpmi: fix the bitflips for erased
> page' patch from http://patchwork.ozlabs.org/patch/309763/ which shows
> the following on board2:
> [ 8.605463] gpmi-nand 112000.gpmi-nand: The page(186809) is an
> erased page(7,6,1,0).
> [ 8.628012] gpmi-nand 112000.gpmi-nand: The page(186809) is an
> erased page(7,6,1,0).
> ^^^ 1 bit-flip on ecc read, 0 bit-flip on non-ecc read, ecc-str=6,
> memset's buffer to 0xff to attempt to recover
> [ 8.637305] UBIFS error (ubi0:0 pid 1): ubifs_recover_master_node:
> failed to recover master node
> ^^^ unable to recover and always ends up this way over further boots
>
>>> If I understand what you are saying you are thinking that my issue is
>>> instead the result of a never-used PEB that had bit-flips from the
>>> manufacturer in which case the bits would read the same every time?
>>> How can we know this PEB was never before used and isn't one that was
>>> being erased/written during a power cut?
>>
>> I've seen bit flips on cheap SLC NANDs which came out of a sudden.
>> According to the FAE I was talking to this is legit for NAND
>> as long the flipping bits are fixable by the ECC engine.
>>
>
> I'm a little confused by the fact that gpmi-nand.c encounters an
> uncorrectable status yet when the bit-flips are counted by Huang's
> 'gmpi_erased_check' they come out to 1 which is less than the ecc str.
> Is this saying that ecc already corrected what it could and there was
> 'still' 1 more flip?
>
>>> In my test scenario where the rootfs is mounted from the kernel
>>> read-only, but later mounted read-write by userspace (yet not being
>>> specifically written to by userspace) then power-cut should 'any' NAND
>>> writes would be occurring at all? And if not as I suspect, then how
>>> could a subsequent boot end up using a PEB that may have been never
>>> previously used and have bit-flips from the manufacturer?
>>
>> UBIFS's has a wandering journal. During the remount it moved maybe.
>> But for a more expressive analysis I'd need a nanddump to find out which
>> blocks are in which role.
>> Can you share the nanddump?
>>
>
> This makes sense, and I apparently lied when I said the test software
> wasn't writing to the rootfs. There is at least one process that does
> write: the dhcp client when it writes to /etc/resolv.con and possibly
> something else writes as well. So we are power-cutting an an un-clean
> ubifs and never allowing a safe shutdown so I would expect the journal
> to be wandering around.
>
>>> Should we be doing an erase block on every NAND block during our board
>>> manufacturing process to avoid this?
>>
>> Sorry, I don't understand this sentence.
>> Do you mean a full erasure of the whole NAND?
>> If so, it would not help as the bit flips can come later.
>> (Without writing/erasing the block)
>> The root cause is that your NFC cannot correct bit flips on empty pages.
>>
>>> It sounds like this 'unexpected bit-flips on erased pages from the
>>> mfg' issue is a ticking time-bomb for people using ubi/ubifs NAND.
>>> Shouldn't the http://www.linux-mtd.infradead.org/doc/ubifs.html page
>>> be updated to refer to this known issue as well as the unstable bit
>>> issue?
>>
>> As I said the root cause is that some NFCs cannot correct bit flips on empty
>> pages.
>> Instead of putting warnings to ubifs.html I'd love to see a solution on the
>> said drivers or MTD core.
>>
>>> I can add some debugging to find out - what specifically would be
>>> helpful to add?
>>
>> A hexdump of the buffer would be a good start.
>>
>>> Thanks for the help!
>>
>> Thanks for sharing your issues. This is the only way
>> to address them.
>> That said, as far on no board I had access to I was able to reproduce the unstable bits
>> issue. It was always something else.
>>
>> Thanks,
>> //richard
>
> I can provide dumps of various blocks if needed or add more debugging
> on the reads after the corruption occurred. It takes me a day or two
> to re-create the issue if you want me to turn on more debugging during
> the writes.
>
> I've cc'd Huang, Elie, and Brian who were involved in the patch to
> detect bit-flips in gpmi-nand.c reads - perhaps they have some more
> ideas. I find it interesting that in one case that patch resolves the
> issue and in the other it does not.
>
> Regards,
>
> Tim
Cc: Huang Shijie <shijie.huang@arm.com>
Updated Huang's e-mail as the old one bounced
Tim
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: UBIFS corruption after power cut - possibly unstable bits issue?
2015-11-02 20:31 ` Tim Harvey
@ 2015-11-02 21:31 ` Richard Weinberger
2015-11-02 22:11 ` Brian Norris
2015-11-03 13:38 ` Boris Brezillon
1 sibling, 1 reply; 18+ messages in thread
From: Richard Weinberger @ 2015-11-02 21:31 UTC (permalink / raw)
To: Tim Harvey
Cc: Artem Bityutskiy, Adrian Hunter, linux-mtd, Elie De Brauwer,
Brian Norris, Huang Shijie
Am 02.11.2015 um 21:31 schrieb Tim Harvey:
>> [ 10.018775] UBIFS (ubi0:0): completing deferred recovery
>> [ 10.043624] UBI DBG io (pid 145): write 4096 bytes to PEB 2253:4096
>> [ 10.054878] UBI DBG io (pid 145): write 94208 bytes to PEB 2253:8192
>> [ 10.096644] UBI DBG io (pid 81): write 4096 bytes to PEB 2250:0
>> [ 10.143314] UBI DBG io (pid 145): write 4096 bytes to PEB 2254:4096
>> [ 10.153917] UBI DBG io (pid 145): write 184320 bytes to PEB 2254:8192
>> ^^^^^ PEB 2254 is the one that becomes un-readable on the next boot
So, the important fact is that it was not written while the power cut happened.
Maybe it was erased. Can you please enable that debug print too?
Just to make very sure.
>> [ 10.186288] UBI DBG io (pid 81): write 4096 bytes to PEB 2251:0
>> [ 10.204620] UBIFS (ubi0:0): background thread "ubifs_bgt0_0" started, PID 173
>> [ 10.213375] UBI DBG io (pid 145): write 4096 bytes to PEB 2255:4096
>> [ 10.223864] UBI DBG io (pid 145): write 4096 bytes to PEB 2255:8192
>> [ 10.239935] UBI DBG io (pid 145): write 4096 bytes to PEB 2237:32768
>> [ 10.261787] UBI DBG io (pid 145): write 4096 bytes to PEB 2184:102400
>> [ 10.272978] UBI DBG io (pid 145): write 4096 bytes to PEB 2173:114688
>> [ 10.285948] UBI DBG io (pid 145): write 4096 bytes to PEB 2174:114688
>> [ 10.296027] UBIFS (ubi0:0): deferred recovery completed
>> [ 10.299453] UBI DBG io (pid 81): write 4096 bytes to PEB 2252:0
[...]
>> [ 15.000462] UBI DBG io (pid 173): write 4096 bytes to PEB 2253:102400
>> [ 16.472720] fec 2188000.ethernet eth0: Link is Up - 1Gbps/Full -
>> flow control rx/tx
>> [ 16.480975] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>> Sending discover...
>> Sending select for 192.168.1.1...
>> Lease of 192.168.1.1 obtained, lease time 21600
>> /etc/udhcpc.d/50default: Adding DNS 192.168.1.100
>> done.
>>
>> <power cut>
[...]
>> [ 8.635364] UBIFS (ubi0:0): recovery needed
>> [ 8.676203] ubi0 warning: ubi_io_read: error -74 (ECC error) while
>> reading 69632 bytes from PEB 2254:192512, read only 69632 bytes, retry
>> [ 8.692460] ubi0 warning: ubi_io_read: error -74 (ECC error) while
>> reading 69632 bytes from PEB 2254:192512, read only 69632 bytes, retry
>> [ 8.708741] ubi0 warning: ubi_io_read: error -74 (ECC error) while
>> reading 69632 bytes from PEB 2254:192512, read only 69632 bytes, retry
>> ^^^^ non correctable ecc error on PEB 2254 - I verified that this was
>> not the first time this PEB has been used
>> [ 8.724987] ubi0 error: ubi_io_read: error -74 (ECC error) while
>> reading 69632 bytes from PEB 2254:192512, read 69632 bytes
>> [ 8.736163] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
>> 4.2.0-00015-g64291f7 dirty #195
>> [ 8.744021] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
>> [ 8.750582] Backtrace:
>> [ 8.753074] [<8001400c>] (dump_backtrace) from [<800141f8>]
>> (show_stack+0x18/0x1c)
>> [ 8.760667] r6:80b2d960 r5:80b2d960 r4:00000000 r3:00000000
>> [ 8.766424] [<800141e0>] (show_stack) from [<807ceeb8>]
>> (dump_stack+0x8c/0xa4)
>> [ 8.773687] [<807cee2c>] (dump_stack) from [<80474730>]
>> (ubi_io_read+0x130/0x378)
>> [ 8.781212] r6:00000000 r5:00011000 r4:ffffffb6 r3:be070000
>> [ 8.786951] [<80474600>] (ubi_io_read) from [<80471564>]
>> (ubi_eba_read_leb+0x19c/0x4c8)
>> [ 8.794975] r10:00000000 r9:c0c63000 r8:000008ce r7:00001f15
>> r6:00000000 r5:bda3a000
>> [ 8.802928] r4:bda12000
>> [ 8.805489] [<804713c8>] (ubi_eba_read_leb) from [<8047077c>]
>> (ubi_leb_read+0x94/0x130)
>> [ 8.813527] r10:00000000 r9:c0c63000 r8:bda3a000 r7:0002d000
>> r6:00001f15 r5:00011000
>> [ 8.821467] r4:bda12000
>> [ 8.824037] [<804706e8>] (ubi_leb_read) from [<80257870>]
>> (ubifs_leb_read+0x30/0x9c)
>> [ 8.831815] r10:be29a000 r9:80b77ff8 r8:00001f15 r7:0002d000
>> r6:00011000 r5:be29a000
>> [ 8.839731] r4:bdaea600
>> [ 8.842312] [<80257840>] (ubifs_leb_read) from [<80260fac>]
>> (ubifs_start_scan+0x84/0x134)
>> [ 8.850522] r8:c0c36000 r7:00001f15 r6:be29a000 r5:0002d000 r4:bdaea600
>> [ 8.857322] [<80260f28>] (ubifs_start_scan) from [<80276270>]
>> (ubifs_recover_leb+0x80/0xb80)
>> [ 8.865794] r8:00001f15 r7:c0c63000 r6:00011000 r5:00001f15 r4:00000001
>> [ 8.872609] [<802761f0>] (ubifs_recover_leb) from [<802627f0>]
>> (ubifs_replay_journal+0xd34/0x1a48)
>> [ 8.881601] r10:00017188 r9:0002d000 r8:be29a000 r7:00000000
>> r6:bdaea608 r5:00001f15
>> [ 8.889518] r4:00000001
>> [ 8.892106] [<80261abc>] (ubifs_replay_journal) from [<80254954>]
>> (ubifs_mount+0x136c/0x2170)
>> [ 8.900666] r10:be29a008 r9:00000000 r8:0003db70 r7:be29a000
>> r6:80b76870 r5:be3f6000
>> [ 8.908584] r4:00000000
>> [ 8.911171] [<802535e8>] (ubifs_mount) from [<801055f8>] (mount_fs+0x1c/0xb0)
>> [ 8.918323] r10:00008001 r9:00000000 r8:80b2a2e0 r7:80124e34
>> r6:80b2a2e0 r5:bd9cfac0
>> [ 8.926258] r4:be365900
>> [ 8.928829] [<801055dc>] (mount_fs) from [<80121868>]
>> (vfs_kern_mount+0x5c/0x144)
>> [ 8.936355] r6:00008001 r5:bd9cfac0 r4:be365900
>> [ 8.941056] [<8012180c>] (vfs_kern_mount) from [<80124e34>]
>> (do_mount+0x19c/0xd0c)
>> [ 8.948642] r9:80b2a2e0 r8:80b1ebdc r7:bd9cfa80 r6:bd9cfac0
>> r5:00000060 r4:00000000
>> [ 8.956498] [<80124c98>] (do_mount) from [<80125d38>] (SyS_mount+0x7c/0xa8)
>> [ 8.963493] r10:000000d7 r9:80afa7b0 r8:8099d910 r7:00008001
>> r6:00000000 r5:bd9cfac0
>> [ 8.971429] r4:bd9cfa80
>> [ 8.973993] [<80125cbc>] (SyS_mount) from [<80aa6260>]
>> (mount_block_root+0x114/0x240)
>> [ 8.981856] r8:befa3a40 r7:80afa7a0 r6:bd9d2000 r5:bd9d2000 r4:00008001
>> [ 8.988651] [<80aa614c>] (mount_block_root) from [<80aa664c>]
>> (prepare_namespace+0x98/0x1d0)
>> [ 8.997123] r10:000000d7 r9:80afa780 r8:80b087e4 r7:80afa778
>> r6:80afa7a0 r5:80afa7a0
>> [ 9.005058] r4:80afa7b0
>> [ 9.007617] [<80aa65b4>] (prepare_namespace) from [<80aa5efc>]
>> (kernel_init_freeable+0x1a4/0x1f0)
>> [ 9.016522] r6:80b8b000 r5:80b8b000 r4:00000007
>> [ 9.021217] [<80aa5d58>] (kernel_init_freeable) from [<807c91b4>]
>> (kernel_init+0x10/0xf4)
>> [ 9.029412] r10:00000000 r9:00000000 r8:00000000 r7:00000000
>> r6:00000000 r5:807c91a4
>> [ 9.037354] r4:00000000
>> [ 9.039918] [<807c91a4>] (kernel_init) from [<800104b8>]
>> (ret_from_fork+0x14/0x3c)
>> [ 9.047525] r4:00000000 r3:00000000
>> [ 9.051259] UBIFS error (ubi0:0 pid 1): ubifs_recover_leb: corrupt
>> empty space LEB 7957:184320, corruption starts at 4665
>> [ 9.062247] UBIFS error (ubi0:0 pid 1): ubifs_scanned_corruption:
>> corruption at LEB 7957:188985
>> [ 9.070972] UBIFS error (ubi0:0 pid 1): ubifs_scanned_corruption:
>> first 8192 bytes from LEB 7957:188985
>> [ 9.080396] 00000000: ffffffef ffffffff ffffffff ffffffff ffffffff
>> ffffffff ffffffff ffffffff ................................
>> [ 9.091903] 00000020: ffffffff ffffffff ffffffff ffffffff ffffffff
>> ffffffff ffffffff ffffffff ................................
>> ...
>> [ 12.002997] 00001fc0: ffffffff ffffffff ffffffff ffffffff ffffffff
>> ffffffff ffffffff ffffffff ................................
>> [ 12.014501] 00001fe0: ffffffff ffffffff ffffffff ffffffff ffffffff
>> ffffffff ffffffff ffffffff ................................
>> [ 12.026000] UBIFS error (ubi0:0 pid 1): ubifs_recover_leb: LEB 7957
>> scanning failed
>> ^^^^ recovery failed
>> [ 12.036886] VFS: Cannot open root device "ubi0:rootfs" or
>> unknown-block(0,0): error -117
>>
>> After added Huang Shijie's 'mtd: gpmi: fix the bitflips for erased
>> page' patch from http://patchwork.ozlabs.org/patch/309763/ which shows
>> the following on board1:
>> [ 8.532624] UBIFS (ubi0:0): recovery needed
>> [ 8.569104] gpmi-nand 112000.gpmi-nand: The page(148656) is an
>> erased page(1,6,1,1).
>> [ 8.577748] gpmi-nand 112000.gpmi-nand: The page(148658) is an
>> erased page(1,6,1,1).
>> [ 8.586413] gpmi-nand 112000.gpmi-nand: The page(148660) is an
>> erased page(1,6,1,1).
>> [ 8.595050] gpmi-nand 112000.gpmi-nand: The page(148662) is an
>> erased page(1,6,1,1).
>> [ 8.603705] gpmi-nand 112000.gpmi-nand: The page(148664) is an
>> erased page(1,6,1,1).
>> [ 8.612310] gpmi-nand 112000.gpmi-nand: The page(148666) is an
>> erased page(1,6,1,1).
>> [ 8.620940] gpmi-nand 112000.gpmi-nand: The page(148668) is an
>> erased page(1,6,1,1).
>> [ 8.629580] gpmi-nand 112000.gpmi-nand: The page(148670) is an
>> erased page(1,6,1,1).
>> ^^^^ 1 bit-flip on ecc read, 1 bit-flip on non-ecc read, ecc-str=6,
>> memset's buffer to 0xff to attempt to recover
>> [ 8.644345] UBIFS (ubi0:0): recovery deferred
>> [ 8.649209] UBIFS (ubi0:0): UBIFS: mounted UBI device 0, volume 0,
>> name "rootfs", R/O mode
>> [ 8.657537] UBIFS (ubi0:0): LEB size: 253952 bytes (248 KiB),
>> min./max. I/O unit sizes: 4096 bytes/4096 bytes
>> [ 8.667486] UBIFS (ubi0:0): FS size: 2018918400 bytes (1925 MiB,
>> 7950 LEBs), journal size 9404416 bytes (8 MiB, 38 LEBs)
>> [ 8.678409] UBIFS (ubi0:0): reserved for root: 0 bytes (0 KiB)
>> [ 8.684283] UBIFS (ubi0:0): media format: w4/r0 (latest is w4/r0),
>> UUID BB5E5EB5-6A21-48AA-9C31-DE6B4A32ABAD, small LPT model
>> [ 8.697990] VFS: Mounted root (ubifs filesystem) readonly on device 0:14.
>> ^^^^ able to recover and always ends up this way over further boots
>> ^^^^ after applying this patch I boot with init=foo so as to not allow
>> rootfs to get mounted read/write by udev and perhaps permanently
>> recover the board
>>
>>
>>
>> Board2 looks like this:
>> [ 9.621167] udevd[106]: starting version 182
>> [ 10.525521] UBIFS (ubi0:0): completing deferred recovery
>> [ 10.540072] UBI DBG io (pid 162): write 4096 bytes to PEB 5891:4096
>> [ 10.547480] UBI DBG io (pid 162): write 110592 bytes to PEB 5891:8192
>> [ 10.578168] UBI DBG io (pid 162): write 4096 bytes to PEB 5892:4096
>> [ 10.585053] UBI DBG io (pid 74): write 4096 bytes to PEB 2944:0
>> [ 10.585639] UBI DBG io (pid 162): write 163840 bytes to PEB 5892:8192
>> [ 10.607478] UBI DBG io (pid 74): write 4096 bytes to PEB 2945:0
>> [ 10.615485] UBIFS (ubi0:0): background thread "ubifs_bgt0_0" started, PID 163
>> [ 10.617094] UBI DBG io (pid 162): write 4096 bytes to PEB 5893:4096
>> [ 10.618542] UBI DBG io (pid 162): write 4096 bytes to PEB 5893:8192
>> [ 10.619487] UBI DBG io (pid 162): write 4096 bytes to PEB 2859:212992
>> [ 10.620347] UBI DBG io (pid 162): write 4096 bytes to PEB 2850:225280
>> ^^^^^ this is the PEB that gets uncorrectable ECC errors - I verified
Again not written while power cut happened.
Maybe erased. Please enable print. :)
>> that this was not the first time this PEB has been used
>> [ 10.621476] UBI DBG io (pid 162): write 4096 bytes to PEB 2851:225280
>> [ 10.623058] UBIFS (ubi0:0): deferred recovery completed
>> [ 10.628641] UBI DBG io (pid 74): write 4096 bytes to PEB 2946:0
>> bootlogd: cannot allocate pseudo tty: No such file or directory
>> Populating dev cache
>> ALSA: Restoring mixer settings...
>> /usr/sbin/alsactl: load_state:1729: No soundcards found...
>> Sat Oct 24 11:28:04 UTC 2015
>> INIT: Entering runlevel: 5
>> Configuring network interfaces... ifconfig: SIOCGIFFLAGS: No such device
>> Starting system message bus: dbus.
>> Starting Dropbear SSH server: dropbear.
>> Starting rpcbind daemon...done.
>> Starting watchdog: [1
>> 34m[ [1
>> 32m OK [1
>> 34m] [0
>> 39m
>> Starting advanced power management daemon: No APM support in kernel
>> (failed.)
>> starting DNS forwarder and DHCP server: dnsmasq... done.
>>
>> <power cut>
[...]
>> [ 8.683337] ubi0 warning: ubi_io_read: error -74 (ECC error) while
>> reading 253952 bytes from PEB 2850:8192, read only 253952 bytes, retry
>> [ 8.710495] ubi0 warning: ubi_io_read: error -74 (ECC error) while
>> reading 253952 bytes from PEB 2850:8192, read only 253952 bytes, retry
>> [ 8.736821] ubi0 warning: ubi_io_read: error -74 (ECC error) while
>> reading 253952 bytes from PEB 2850:8192, read only 253952 bytes, retry
>> [ 8.763179] ubi0 error: ubi_io_read: error -74 (ECC error) while
>> reading 253952 bytes from PEB 2850:8192, read 253952 bytes
>> ^^^^ uncorrectable ECC error on PEB 2850
>> [ 8.774340] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
>> 4.2.0-00015-g64291f7 #195
>> [ 8.782194] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
>> [ 8.788743] Backtrace:
>> [ 8.791237] [<8001400c>] (dump_backtrace) from [<800141f8>]
>> (show_stack+0x18/0x1c)
>> [ 8.798839] r6:80b2d960 r5:80b2d960 r4:00000000 r3:00000000
>> [ 8.804589] [<800141e0>] (show_stack) from [<807ceeb8>]
>> (dump_stack+0x8c/0xa4)
>> [ 8.811855] [<807cee2c>] (dump_stack) from [<80474730>]
>> (ubi_io_read+0x130/0x378)
>> [ 8.819359] r6:00000000 r5:0003e000 r4:ffffffb6 r3:9e468000
>> [ 8.825099] [<80474600>] (ubi_io_read) from [<80471564>]
>> (ubi_eba_read_leb+0x19c/0x4c8)
>> [ 8.833125] r10:00000000 r9:a0c0d000 r8:00000b22 r7:00000001
>> r6:00000000 r5:9dd8a000
>> [ 8.841065] r4:9dd9e000
>> [ 8.843628] [<804713c8>] (ubi_eba_read_leb) from [<8047077c>]
>> (ubi_leb_read+0x94/0x130)
>> [ 8.851661] r10:00000000 r9:a0c0d000 r8:9dd8a000 r7:00000000
>> r6:00000001 r5:0003e000
>> [ 8.859596] r4:9dd9e000
>> [ 8.862170] [<804706e8>] (ubi_leb_read) from [<80257870>]
>> (ubifs_leb_read+0x30/0x9c)
>> [ 8.869937] r10:9e673008 r9:00000001 r8:00000001 r7:00000000
>> r6:0003e000 r5:9e673000
>> [ 8.877873] r4:9de22240
>> [ 8.880437] [<80257840>] (ubifs_leb_read) from [<80260fac>]
>> (ubifs_start_scan+0x84/0x134)
>> [ 8.888637] r8:a0c0d000 r7:00000001 r6:9e673000 r5:00000000 r4:9de22240
>> [ 8.895433] [<80260f28>] (ubifs_start_scan) from [<802613ac>]
>> (ubifs_scan+0x30/0x3d4)
>> [ 8.903284] r8:00000001 r7:9e673000 r6:a0c0d000 r5:00000000 r4:0003e000
>> [ 8.910097] [<8026137c>] (ubifs_scan) from [<8025fcd8>]
>> (ubifs_read_master+0x48/0xd78)
>> [ 8.918034] r10:9e673008 r9:9dd7bf80 r8:9e673000 r7:00000001
>> r6:80b76870 r5:9dda1800
>> [ 8.925952] r4:00000001
>> [ 8.928533] [<8025fc90>] (ubifs_read_master) from [<80254714>]
>> (ubifs_mount+0x112c/0x2170)
>> [ 8.936819] r10:9e673008 r9:9dd7bf80 r8:9e673b38 r7:9e673000
>> r6:80b76870 r5:9dda1800
>> [ 8.944738] r4:00000001
>> [ 8.947326] [<802535e8>] (ubifs_mount) from [<801055f8>] (mount_fs+0x1c/0xb0)
>> [ 8.954466] r10:00008001 r9:00000000 r8:80b2a2e0 r7:80124e34
>> r6:80b2a2e0 r5:9dd73c80
>> [ 8.962404] r4:9dd76500
>> [ 8.964975] [<801055dc>] (mount_fs) from [<80121868>]
>> (vfs_kern_mount+0x5c/0x144)
>> [ 8.972479] r6:00008001 r5:9dd73c80 r4:9dd76500
>> [ 8.977177] [<8012180c>] (vfs_kern_mount) from [<80124e34>]
>> (do_mount+0x19c/0xd0c)
>> [ 8.984750] r9:80b2a2e0 r8:80b1ebdc r7:9e7b5d00 r6:9dd73c80
>> r5:00000060 r4:00000000
>> [ 8.992605] [<80124c98>] (do_mount) from [<80125d38>] (SyS_mount+0x7c/0xa8)
>> [ 8.999588] r10:000000d7 r9:80afa7b0 r8:8099d910 r7:00008001
>> r6:00000000 r5:9dd73c80
>> [ 9.007522] r4:9e7b5d00
>> [ 9.010087] [<80125cbc>] (SyS_mount) from [<80aa6260>]
>> (mount_block_root+0x114/0x240)
>> [ 9.017941] r8:9efacd20 r7:80afa7a0 r6:9de69000 r5:9de69000 r4:00008001
>> [ 9.024736] [<80aa614c>] (mount_block_root) from [<80aa664c>]
>> (prepare_namespace+0x98/0x1d0)
>> [ 9.033195] r10:000000d7 r9:80afa780 r8:80b087e4 r7:80afa778
>> r6:80afa7a0 r5:80afa7a0
>> [ 9.041129] r4:80afa7b0
>> [ 9.043690] [<80aa65b4>] (prepare_namespace) from [<80aa5efc>]
>> (kernel_init_freeable+0x1a4/0x1f0)
>> [ 9.052592] r6:80b8b000 r5:80b8b000 r4:00000007
>> [ 9.057290] [<80aa5d58>] (kernel_init_freeable) from [<807c91b4>]
>> (kernel_init+0x10/0xf4)
>> [ 9.065472] r10:00000000 r9:00000000 r8:00000000 r7:00000000
>> r6:00000000 r5:807c91a4
>> [ 9.073407] r4:00000000
>> [ 9.075970] [<807c91a4>] (kernel_init) from [<800104b8>]
>> (ret_from_fork+0x14/0x3c)
>> [ 9.083560] r4:00000000 r3:00000000
>> [ 9.102369] ubi0 warning: ubi_io_read: error -74 (ECC error) while
>> reading 253952 bytes from PEB 2850:8192, read only 253952 bytes, retry
>> [ 9.128774] ubi0 warning: ubi_io_read: error -74 (ECC error) while
>> reading 253952 bytes from PEB 2850:8192, read only 253952 bytes, retry
>> [ 9.155184] ubi0 warning: ubi_io_read: error -74 (ECC error) while
>> reading 253952 bytes from PEB 2850:8192, read only 253952 bytes, retry
>> [ 9.181522] ubi0 error: ubi_io_read: error -74 (ECC error) while
>> reading 253952 bytes from PEB 2850:8192, read 253952 bytes
>> ^^^^ unrecoverable ECC error on PEB 2850
>> [ 9.192683] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
>> 4.2.0-00015-g64291f7 #195
>> [ 9.200535] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
>> [ 9.207083] Backtrace:
>> [ 9.209573] [<8001400c>] (dump_backtrace) from [<800141f8>]
>> (show_stack+0x18/0x1c)
>> [ 9.217173] r6:80b2d960 r5:80b2d960 r4:00000000 r3:00000000
>> [ 9.222916] [<800141e0>] (show_stack) from [<807ceeb8>]
>> (dump_stack+0x8c/0xa4)
>> [ 9.230177] [<807cee2c>] (dump_stack) from [<80474730>]
>> (ubi_io_read+0x130/0x378)
>> [ 9.237688] r6:00000000 r5:0003e000 r4:ffffffb6 r3:9e468000
>> [ 9.243429] [<80474600>] (ubi_io_read) from [<80471564>]
>> (ubi_eba_read_leb+0x19c/0x4c8)
>> [ 9.251458] r10:00000000 r9:a0c4c000 r8:00000b22 r7:00000001
>> r6:00000000 r5:9dd8a000
>> [ 9.259398] r4:9dd9e000
>> [ 9.261962] [<804713c8>] (ubi_eba_read_leb) from [<8047077c>]
>> (ubi_leb_read+0x94/0x130)
>> [ 9.269988] r10:00000000 r9:a0c4c000 r8:9dd8a000 r7:00000000
>> r6:00000001 r5:0003e000
>> [ 9.277925] r4:9dd9e000
>> [ 9.280495] [<804706e8>] (ubi_leb_read) from [<80257870>]
>> (ubifs_leb_read+0x30/0x9c)
>> [ 9.288264] r10:9e673000 r9:9dd7bf80 r8:00000001 r7:00000000
>> r6:0003e000 r5:9e673000
>> [ 9.296182] r4:9e673000
>> [ 9.298768] [<80257840>] (ubifs_leb_read) from [<8027569c>]
>> (get_master_node+0x54/0x324)
>> [ 9.306880] r8:9e673000 r7:80b77ff8 r6:00001000 r5:00001000 r4:9e673000
>> [ 9.313681] [<80275648>] (get_master_node) from [<80275d98>]
>> (ubifs_recover_master_node+0x60/0x3ec)
>> [ 9.322749] r10:9e673008 r9:9dd7bf80 r8:9e673000 r7:80b77ff8
>> r6:00001000 r5:9dda1800
>> [ 9.330685] r4:9e673000
>> [ 9.333248] [<80275d38>] (ubifs_recover_master_node) from
>> [<8026080c>] (ubifs_read_master+0xb7c/0xd78)
>> [ 9.342575] r7:00000001 r6:80b76870 r5:9dda1800 r4:ffffff8b
>> [ 9.348335] [<8025fc90>] (ubifs_read_master) from [<80254714>]
>> (ubifs_mount+0x112c/0x2170)
>> [ 9.356621] r10:9e673008 r9:9dd7bf80 r8:9e673b38 r7:9e673000
>> r6:80b76870 r5:9dda1800
>> [ 9.364540] r4:00000001
>> [ 9.367127] [<802535e8>] (ubifs_mount) from [<801055f8>] (mount_fs+0x1c/0xb0)
>> [ 9.374266] r10:00008001 r9:00000000 r8:80b2a2e0 r7:80124e34
>> r6:80b2a2e0 r5:9dd73c80
>> [ 9.382204] r4:9dd76500
>> [ 9.384772] [<801055dc>] (mount_fs) from [<80121868>]
>> (vfs_kern_mount+0x5c/0x144)
>> [ 9.392276] r6:00008001 r5:9dd73c80 r4:9dd76500
>> [ 9.396974] [<8012180c>] (vfs_kern_mount) from [<80124e34>]
>> (do_mount+0x19c/0xd0c)
>> [ 9.404547] r9:80b2a2e0 r8:80b1ebdc r7:9e7b5d00 r6:9dd73c80
>> r5:00000060 r4:00000000
>> [ 9.412404] [<80124c98>] (do_mount) from [<80125d38>] (SyS_mount+0x7c/0xa8)
>> [ 9.419386] r10:000000d7 r9:80afa7b0 r8:8099d910 r7:00008001
>> r6:00000000 r5:9dd73c80
>> [ 9.427320] r4:9e7b5d00
>> [ 9.429887] [<80125cbc>] (SyS_mount) from [<80aa6260>]
>> (mount_block_root+0x114/0x240)
>> [ 9.437743] r8:9efacd20 r7:80afa7a0 r6:9de69000 r5:9de69000 r4:00008001
>> [ 9.444539] [<80aa614c>] (mount_block_root) from [<80aa664c>]
>> (prepare_namespace+0x98/0x1d0)
>> [ 9.452999] r10:000000d7 r9:80afa780 r8:80b087e4 r7:80afa778
>> r6:80afa7a0 r5:80afa7a0
>> [ 9.460933] r4:80afa7b0
>> [ 9.463494] [<80aa65b4>] (prepare_namespace) from [<80aa5efc>]
>> (kernel_init_freeable+0x1a4/0x1f0)
>> [ 9.472386] r6:80b8b000 r5:80b8b000 r4:00000007
>> [ 9.477081] [<80aa5d58>] (kernel_init_freeable) from [<807c91b4>]
>> (kernel_init+0x10/0xf4)
>> [ 9.485262] r10:00000000 r9:00000000 r8:00000000 r7:00000000
>> r6:00000000 r5:807c91a4
>> [ 9.493196] r4:00000000
>> [ 9.495758] [<807c91a4>] (kernel_init) from [<800104b8>]
>> (ret_from_fork+0x14/0x3c)
>> [ 9.503349] r4:00000000 r3:00000000
>> [ 9.507115] UBIFS error (ubi0:0 pid 1): ubifs_recover_master_node:
>> failed to recover master node
>> ^^^ master node recovery fails
[...]
>> I can provide dumps of various blocks if needed or add more debugging
>> on the reads after the corruption occurred. It takes me a day or two
>> to re-create the issue if you want me to turn on more debugging during
>> the writes.
Please retry first with erase debug prints enabled too.
>> I've cc'd Huang, Elie, and Brian who were involved in the patch to
>> detect bit-flips in gpmi-nand.c reads - perhaps they have some more
>> ideas. I find it interesting that in one case that patch resolves the
>> issue and in the other it does not.
Yeah, good idea!
Thanks,
//richard
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: UBIFS corruption after power cut - possibly unstable bits issue?
2015-11-02 21:31 ` Richard Weinberger
@ 2015-11-02 22:11 ` Brian Norris
0 siblings, 0 replies; 18+ messages in thread
From: Brian Norris @ 2015-11-02 22:11 UTC (permalink / raw)
To: Richard Weinberger
Cc: Tim Harvey, Artem Bityutskiy, Adrian Hunter, linux-mtd,
Elie De Brauwer, Huang Shijie, Boris Brezillon
+ Boris
On Mon, Nov 02, 2015 at 10:31:53PM +0100, Richard Weinberger wrote:
> Am 02.11.2015 um 21:31 schrieb Tim Harvey:
>
> >> I've cc'd Huang, Elie, and Brian who were involved in the patch to
> >> detect bit-flips in gpmi-nand.c reads - perhaps they have some more
> >> ideas. I find it interesting that in one case that patch resolves the
> >> issue and in the other it does not.
>
> Yeah, good idea!
Sorry, I don't have much time to care about gpmi-nand right now. I
recall that there was ugly support for detecting bitflips in erased
pages posted at some point (in case that's relevant; didn't read the
whole essay), but that never got merged, and Huang left Freescale.
I also recall Boris struggled with that driver at some point. Perhaps he
can be persuaded to care.
Brian
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: UBIFS corruption after power cut - possibly unstable bits issue?
2015-11-02 20:27 ` Tim Harvey
2015-11-02 20:31 ` Tim Harvey
@ 2015-11-03 9:10 ` Artem Bityutskiy
1 sibling, 0 replies; 18+ messages in thread
From: Artem Bityutskiy @ 2015-11-03 9:10 UTC (permalink / raw)
To: Tim Harvey, Richard Weinberger
Cc: Adrian Hunter, linux-mtd, Huang Shijie, Elie De Brauwer,
Brian Norris
On Mon, 2015-11-02 at 12:27 -0800, Tim Harvey wrote:
> [ 9.507115] UBIFS error (ubi0:0 pid 1): ubifs_recover_master_node:
> failed to recover master node
> ^^^ master node recovery fails
> ...
> [ 9.618335] Kernel panic - not syncing: VFS: Unable to mount root
> fs on unknown-block(0,0)
You need to enable debugging messages, and/or add some custom prints to
'get_master_node()', which for some reason returns an error and makes
'ubifs_recover_master_node()' fail. Figure out where it fails, and we
then can try figuring out why and may be fix it. Add custom printk's if
needed.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: UBIFS corruption after power cut - possibly unstable bits issue?
2015-10-26 20:01 ` Richard Weinberger
2015-10-26 20:31 ` Tim Harvey
@ 2015-11-03 10:06 ` Michal Suchanek
2015-11-03 10:18 ` Ricard Wanderlof
2015-11-03 10:43 ` Artem Bityutskiy
1 sibling, 2 replies; 18+ messages in thread
From: Michal Suchanek @ 2015-11-03 10:06 UTC (permalink / raw)
To: Richard Weinberger
Cc: Tim Harvey, Artem Bityutskiy, Adrian Hunter, MTD Maling List
On 26 October 2015 at 21:01, Richard Weinberger <richard@nod.at> wrote:
> Tim,
>
> Am 26.10.2015 um 20:37 schrieb Tim Harvey:
>>
>> [ 9.282782] r4:00000000 r3:00000000
>> [ 9.286909] UBIFS error (ubi0:0 pid 1): ubifs_recover_leb: corrupt
>> empty space LEB 377:217088, corruption starts at 24762
>> [ 9.297900] UBIFS error (ubi0:0 pid 1): ubifs_scanned_corruption:
>> corruption at LEB 377:241850
>> [ 9.306536] UBIFS error (ubi0:0 pid 1): ubifs_scanned_corruption:
>> first 8192 bytes from LEB 377:241850
>> [ 9.315870] 00000000: ffffffef ffffffff ffffffff ffffffff ffffffff
>> ffffffff ffffffff ffffffff ................................
>> [ 9.327374] 00000020: ffffffff ffffffff ffffffff ffffffff ffffffff
>> ffffffff ffffffff ffffffff ................................
>> [ 9.338883] 00000040: ffffffff ffffffff ffffffff ffffffff ffffffff
>> ffffffff ffffffff ffffffff ................................
>> [ 9.350389] 00000060: ffffffff ffffffff ffffffff ffffffff ffffffff
>> ffffffff ffffffff ffffffff ................................
>> ...
>> [ 9.352755] 00001fc0: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff f
>> fffffff ffffffff ................................
>> [ 9.352765] 00001fe0: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff f
>> fffffff ffffffff ................................
>> [ 9.352770] UBIFS error (ubi0:0 pid 1): ubifs_recover_leb: LEB 377 scanning f
>> ailed
>> [ 12.271485] VFS: Cannot open root device "ubi0:rootfs" or unknown-block(0,0):
>> error -117
>>
>> Thus far we have encountered this with a 16Gb MT29F16G08 and 'not'
>> with a 2Gb MT29F2G08. The two parts have different geometries and the
>> 16Gb part has a much larger block erase time (2ms) compared to the 2Gb
>> (700us).
>
> gpmi-nand is not able to correct bit flips on erased pages.
> This is why UBI is facing uncorrectable ECC errors and UBIFS gives up.
> In March there was an attempt to fix that in software.
> But no mainline ready solution was presented so far:
> http://lists.infradead.org/pipermail/linux-mtd/2014-March/052521.html
>
> It is not clear whether to implement this directly in gpmi-nand or MTD core.
> Currently UBIFS assumes that empty spaces must contain only 0xff octets.
> A naive approach would be removing that check from UBIFS, bit this can have
> disastrous consequences as UBIFS's recovery algorithm relies on that.
Hello,
it has been pointed out that this assumption on the part of UBI that
erased pages are composed of 0xff bytes is just wrong.
Here it is wrong in the sense that the pages are not bit-perfect but
there are other reasons why it may go wrong.
- if the nand controller uses randomization layer the pages are 0xff
bytes transformed by the randomization layer
- if MTD grows full-disk encryption then reading a page of 0xff bytes
will yield what this page decrypts to using current encryption scheme
The layering is wrong here. The MTD core should provide a function to
check if a physical page is empty and the driver should provide
driver-specific implementation if needed.
Thanks
Michal
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: UBIFS corruption after power cut - possibly unstable bits issue?
2015-11-03 10:06 ` Michal Suchanek
@ 2015-11-03 10:18 ` Ricard Wanderlof
2015-11-03 10:43 ` Artem Bityutskiy
1 sibling, 0 replies; 18+ messages in thread
From: Ricard Wanderlof @ 2015-11-03 10:18 UTC (permalink / raw)
To: Michal Suchanek
Cc: Richard Weinberger, Tim Harvey, MTD Maling List, Adrian Hunter,
Artem Bityutskiy
On Tue, 3 Nov 2015, Michal Suchanek wrote:
> it has been pointed out that this assumption on the part of UBI that
> erased pages are composed of 0xff bytes is just wrong.
>
> Here it is wrong in the sense that the pages are not bit-perfect but
> there are other reasons why it may go wrong.
>
> - if the nand controller uses randomization layer the pages are 0xff
> bytes transformed by the randomization layer
>
> - if MTD grows full-disk encryption then reading a page of 0xff bytes
> will yield what this page decrypts to using current encryption scheme
>
> The layering is wrong here. The MTD core should provide a function to
> check if a physical page is empty and the driver should provide
> driver-specific implementation if needed.
In a sense, doesn't the driver already do that in that an erased page with
bitflips should be corrected by ECC so that there are no bitflips
apparent? If, as many hardware controllers do, the ECC check for an erased
page fails, then the driver should still be able to determine that the
page in question is erased, with bitflips, and should be able to provide a
true 0xff page upward.
I agree though that an erased-page API function would be good too, as it
moves the guesswork to the right place (the actual driver).
/Ricard
--
Ricard Wolf Wanderlöf ricardw(at)axis.com
Axis Communications AB, Lund, Sweden www.axis.com
Phone +46 46 272 2016 Fax +46 46 13 61 30
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: UBIFS corruption after power cut - possibly unstable bits issue?
2015-11-03 10:06 ` Michal Suchanek
2015-11-03 10:18 ` Ricard Wanderlof
@ 2015-11-03 10:43 ` Artem Bityutskiy
1 sibling, 0 replies; 18+ messages in thread
From: Artem Bityutskiy @ 2015-11-03 10:43 UTC (permalink / raw)
To: Michal Suchanek, Richard Weinberger
Cc: Tim Harvey, Adrian Hunter, MTD Maling List
On Tue, 2015-11-03 at 11:06 +0100, Michal Suchanek wrote:
> it has been pointed out that this assumption on the part of UBI that
> erased pages are composed of 0xff bytes is just wrong.
>
> Here it is wrong in the sense that the pages are not bit-perfect but
> there are other reasons why it may go wrong.
>
> - if the nand controller uses randomization layer the pages are 0xff
> bytes transformed by the randomization layer
>
> - if MTD grows full-disk encryption then reading a page of 0xff bytes
> will yield what this page decrypts to using current encryption scheme
>
> The layering is wrong here. The MTD core should provide a function to
> check if a physical page is empty and the driver should provide
> driver-specific implementation if needed.
Yes, this was discussed, and the missing part is patches. The reason I
am replying to you is that I find the wording you are using to be
inaccurate.
The assumptions were not wrong when this stuff was designed,
implemented, tested and used in production. And for that hardware they
are not wrong.
Nowadays the situation is different, the HW is different. It is not the
same as the HW UBIFS was implemented and tested for. UBIFS needs to be
adjusted in order to support that hardware. "Wrong wrong wrong" is
incorrect wording, please do not use it. Please, use "needs more work",
"needs extending", "needs to support bit-flips in empty space", etc.
And layering is not wrong. We do not implement useless MTD methods for
something which does not exist in the Linux kernel. Please, introduce
the encryption support, introduce that driver with that method, and
then refactor the code. Please, do not use word "wrong", because it is
not the correct word for this situation, and it is unnecessarily
negative.
Thank you!
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: UBIFS corruption after power cut - possibly unstable bits issue?
2015-11-02 20:31 ` Tim Harvey
2015-11-02 21:31 ` Richard Weinberger
@ 2015-11-03 13:38 ` Boris Brezillon
2015-11-16 15:01 ` Tim Harvey
1 sibling, 1 reply; 18+ messages in thread
From: Boris Brezillon @ 2015-11-03 13:38 UTC (permalink / raw)
To: Tim Harvey
Cc: Richard Weinberger, Elie De Brauwer, Artem Bityutskiy,
Adrian Hunter, linux-mtd, Huang Shijie, Brian Norris
Hi Tim,
On Mon, 2 Nov 2015 12:31:11 -0800
Tim Harvey <tharvey@gateworks.com> wrote:
> On Mon, Nov 2, 2015 at 12:27 PM, Tim Harvey <tharvey@gateworks.com> wrote:
> > [ 8.635364] UBIFS (ubi0:0): recovery needed
> > [ 8.676203] ubi0 warning: ubi_io_read: error -74 (ECC error) while
> > reading 69632 bytes from PEB 2254:192512, read only 69632 bytes, retry
> > [ 8.692460] ubi0 warning: ubi_io_read: error -74 (ECC error) while
> > reading 69632 bytes from PEB 2254:192512, read only 69632 bytes, retry
> > [ 8.708741] ubi0 warning: ubi_io_read: error -74 (ECC error) while
> > reading 69632 bytes from PEB 2254:192512, read only 69632 bytes, retry
> > ^^^^ non correctable ecc error on PEB 2254 - I verified that this was
> > not the first time this PEB has been used
I suspect one of the bit in PEB 2254 to be stuck at 0 (even after
erasing the block the bit stays at 0). Have you tried to erase this
block (flash_erase /dev/mtd2 0x23380000 1) and dump it in raw mode
(nanddump -n -l 0x40000 -s 0x23380000 -f /tmp/dump /dev/mtd2)?
> >
> > I've cc'd Huang, Elie, and Brian who were involved in the patch to
> > detect bit-flips in gpmi-nand.c reads - perhaps they have some more
> > ideas. I find it interesting that in one case that patch resolves the
> > issue and in the other it does not.
I posted a slightly reworked version of Huang's patch [1] a while ago
addressing the "account for bitflips in OOB area" problem, but maybe we
could do better (avoid this extra "read in raw mode" step, or use the
generic nand_check_erased_ecc_chunk() function when ECC bytes are
aligned).
Best Regards,
Boris
[1]https://patchwork.ozlabs.org/patch/416543/
--
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: UBIFS corruption after power cut - possibly unstable bits issue?
2015-11-03 13:38 ` Boris Brezillon
@ 2015-11-16 15:01 ` Tim Harvey
2015-11-30 21:58 ` Tim Harvey
0 siblings, 1 reply; 18+ messages in thread
From: Tim Harvey @ 2015-11-16 15:01 UTC (permalink / raw)
To: Boris Brezillon
Cc: Richard Weinberger, Elie De Brauwer, Artem Bityutskiy,
Adrian Hunter, linux-mtd, Huang Shijie, Brian Norris
On Tue, Nov 3, 2015 at 5:38 AM, Boris Brezillon
<boris.brezillon@free-electrons.com> wrote:
> Hi Tim,
>
> On Mon, 2 Nov 2015 12:31:11 -0800
> Tim Harvey <tharvey@gateworks.com> wrote:
>
>> On Mon, Nov 2, 2015 at 12:27 PM, Tim Harvey <tharvey@gateworks.com> wrote:
>> > [ 8.635364] UBIFS (ubi0:0): recovery needed
>> > [ 8.676203] ubi0 warning: ubi_io_read: error -74 (ECC error) while
>> > reading 69632 bytes from PEB 2254:192512, read only 69632 bytes, retry
>> > [ 8.692460] ubi0 warning: ubi_io_read: error -74 (ECC error) while
>> > reading 69632 bytes from PEB 2254:192512, read only 69632 bytes, retry
>> > [ 8.708741] ubi0 warning: ubi_io_read: error -74 (ECC error) while
>> > reading 69632 bytes from PEB 2254:192512, read only 69632 bytes, retry
>> > ^^^^ non correctable ecc error on PEB 2254 - I verified that this was
>> > not the first time this PEB has been used
>
> I suspect one of the bit in PEB 2254 to be stuck at 0 (even after
> erasing the block the bit stays at 0). Have you tried to erase this
> block (flash_erase /dev/mtd2 0x23380000 1) and dump it in raw mode
> (nanddump -n -l 0x40000 -s 0x23380000 -f /tmp/dump /dev/mtd2)?
Boris,
I examined the bad PEB on several boards now that I have reproduced
this issue with and found no stuck bits (no 0's following erase, no
1's following erase and raw write all ff's).
So in this case it doesn't appear to be a bad block. Incidentally for
UBI/UBIFS, what is in charge of detecting bad blocks, how are they
detected, and when/how are they marked?
>
>> >
>> > I've cc'd Huang, Elie, and Brian who were involved in the patch to
>> > detect bit-flips in gpmi-nand.c reads - perhaps they have some more
>> > ideas. I find it interesting that in one case that patch resolves the
>> > issue and in the other it does not.
>
> I posted a slightly reworked version of Huang's patch [1] a while ago
> addressing the "account for bitflips in OOB area" problem, but maybe we
> could do better (avoid this extra "read in raw mode" step, or use the
> generic nand_check_erased_ecc_chunk() function when ECC bytes are
> aligned).
>
> Best Regards,
>
> Boris
>
> [1]https://patchwork.ozlabs.org/patch/416543/
At this point I likely need to reproduce this problem with additional
debugging enabled to show what last erased and/or wrote to the PEB's
that are corrupt. I will also try your patch as well and see if that
resolves anything.
Regards,
Tim
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: UBIFS corruption after power cut - possibly unstable bits issue?
2015-11-16 15:01 ` Tim Harvey
@ 2015-11-30 21:58 ` Tim Harvey
2015-12-01 9:12 ` Boris Brezillon
0 siblings, 1 reply; 18+ messages in thread
From: Tim Harvey @ 2015-11-30 21:58 UTC (permalink / raw)
To: Boris Brezillon
Cc: Richard Weinberger, Elie De Brauwer, Artem Bityutskiy,
Adrian Hunter, linux-mtd, Huang Shijie, Brian Norris
On Mon, Nov 16, 2015 at 7:01 AM, Tim Harvey <tharvey@gateworks.com> wrote:
> On Tue, Nov 3, 2015 at 5:38 AM, Boris Brezillon
> <boris.brezillon@free-electrons.com> wrote:
>> Hi Tim,
>>
>> On Mon, 2 Nov 2015 12:31:11 -0800
>> Tim Harvey <tharvey@gateworks.com> wrote:
>>
>>> On Mon, Nov 2, 2015 at 12:27 PM, Tim Harvey <tharvey@gateworks.com> wrote:
>>> > [ 8.635364] UBIFS (ubi0:0): recovery needed
>>> > [ 8.676203] ubi0 warning: ubi_io_read: error -74 (ECC error) while
>>> > reading 69632 bytes from PEB 2254:192512, read only 69632 bytes, retry
>>> > [ 8.692460] ubi0 warning: ubi_io_read: error -74 (ECC error) while
>>> > reading 69632 bytes from PEB 2254:192512, read only 69632 bytes, retry
>>> > [ 8.708741] ubi0 warning: ubi_io_read: error -74 (ECC error) while
>>> > reading 69632 bytes from PEB 2254:192512, read only 69632 bytes, retry
>>> > ^^^^ non correctable ecc error on PEB 2254 - I verified that this was
>>> > not the first time this PEB has been used
>>
>> I suspect one of the bit in PEB 2254 to be stuck at 0 (even after
>> erasing the block the bit stays at 0). Have you tried to erase this
>> block (flash_erase /dev/mtd2 0x23380000 1) and dump it in raw mode
>> (nanddump -n -l 0x40000 -s 0x23380000 -f /tmp/dump /dev/mtd2)?
>
> Boris,
>
> I examined the bad PEB on several boards now that I have reproduced
> this issue with and found no stuck bits (no 0's following erase, no
> 1's following erase and raw write all ff's).
>
> So in this case it doesn't appear to be a bad block. Incidentally for
> UBI/UBIFS, what is in charge of detecting bad blocks, how are they
> detected, and when/how are they marked?
>
>>
>>> >
>>> > I've cc'd Huang, Elie, and Brian who were involved in the patch to
>>> > detect bit-flips in gpmi-nand.c reads - perhaps they have some more
>>> > ideas. I find it interesting that in one case that patch resolves the
>>> > issue and in the other it does not.
>>
>> I posted a slightly reworked version of Huang's patch [1] a while ago
>> addressing the "account for bitflips in OOB area" problem, but maybe we
>> could do better (avoid this extra "read in raw mode" step, or use the
>> generic nand_check_erased_ecc_chunk() function when ECC bytes are
>> aligned).
>>
>> Best Regards,
>>
>> Boris
>>
>> [1]https://patchwork.ozlabs.org/patch/416543/
>
> At this point I likely need to reproduce this problem with additional
> debugging enabled to show what last erased and/or wrote to the PEB's
> that are corrupt. I will also try your patch as well and see if that
> resolves anything.
>
> Regards,
>
> Tim
Boris,
I tried your patch [1] on a week-long test over 10x IMX6 boards
booting over 60K times across temperature ranges and the patch
resolved many previous failures to mount rootfs errors (previously I
would encounter around 1% failure to mount rootfs). In addition I saw
no nand corruption where I would have expected to see it several times
with those numbers so I suspect this may have resolved that as well.
Can you re-submit your patch for inclusion and/or discussion?
Regards,
Tim
[1]https://patchwork.ozlabs.org/patch/416543/
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: UBIFS corruption after power cut - possibly unstable bits issue?
2015-11-30 21:58 ` Tim Harvey
@ 2015-12-01 9:12 ` Boris Brezillon
0 siblings, 0 replies; 18+ messages in thread
From: Boris Brezillon @ 2015-12-01 9:12 UTC (permalink / raw)
To: Tim Harvey
Cc: Richard Weinberger, Elie De Brauwer, Artem Bityutskiy,
Adrian Hunter, linux-mtd, Huang Shijie, Brian Norris
On Mon, 30 Nov 2015 13:58:34 -0800
Tim Harvey <tharvey@gateworks.com> wrote:
> On Mon, Nov 16, 2015 at 7:01 AM, Tim Harvey <tharvey@gateworks.com> wrote:
> > On Tue, Nov 3, 2015 at 5:38 AM, Boris Brezillon
> > <boris.brezillon@free-electrons.com> wrote:
> >> Hi Tim,
> >>
> >> On Mon, 2 Nov 2015 12:31:11 -0800
> >> Tim Harvey <tharvey@gateworks.com> wrote:
> >>
> >>> On Mon, Nov 2, 2015 at 12:27 PM, Tim Harvey <tharvey@gateworks.com> wrote:
> >>> > [ 8.635364] UBIFS (ubi0:0): recovery needed
> >>> > [ 8.676203] ubi0 warning: ubi_io_read: error -74 (ECC error) while
> >>> > reading 69632 bytes from PEB 2254:192512, read only 69632 bytes, retry
> >>> > [ 8.692460] ubi0 warning: ubi_io_read: error -74 (ECC error) while
> >>> > reading 69632 bytes from PEB 2254:192512, read only 69632 bytes, retry
> >>> > [ 8.708741] ubi0 warning: ubi_io_read: error -74 (ECC error) while
> >>> > reading 69632 bytes from PEB 2254:192512, read only 69632 bytes, retry
> >>> > ^^^^ non correctable ecc error on PEB 2254 - I verified that this was
> >>> > not the first time this PEB has been used
> >>
> >> I suspect one of the bit in PEB 2254 to be stuck at 0 (even after
> >> erasing the block the bit stays at 0). Have you tried to erase this
> >> block (flash_erase /dev/mtd2 0x23380000 1) and dump it in raw mode
> >> (nanddump -n -l 0x40000 -s 0x23380000 -f /tmp/dump /dev/mtd2)?
> >
> > Boris,
> >
> > I examined the bad PEB on several boards now that I have reproduced
> > this issue with and found no stuck bits (no 0's following erase, no
> > 1's following erase and raw write all ff's).
> >
> > So in this case it doesn't appear to be a bad block. Incidentally for
> > UBI/UBIFS, what is in charge of detecting bad blocks, how are they
> > detected, and when/how are they marked?
> >
> >>
> >>> >
> >>> > I've cc'd Huang, Elie, and Brian who were involved in the patch to
> >>> > detect bit-flips in gpmi-nand.c reads - perhaps they have some more
> >>> > ideas. I find it interesting that in one case that patch resolves the
> >>> > issue and in the other it does not.
> >>
> >> I posted a slightly reworked version of Huang's patch [1] a while ago
> >> addressing the "account for bitflips in OOB area" problem, but maybe we
> >> could do better (avoid this extra "read in raw mode" step, or use the
> >> generic nand_check_erased_ecc_chunk() function when ECC bytes are
> >> aligned).
> >>
> >> Best Regards,
> >>
> >> Boris
> >>
> >> [1]https://patchwork.ozlabs.org/patch/416543/
> >
> > At this point I likely need to reproduce this problem with additional
> > debugging enabled to show what last erased and/or wrote to the PEB's
> > that are corrupt. I will also try your patch as well and see if that
> > resolves anything.
> >
> > Regards,
> >
> > Tim
>
> Boris,
>
> I tried your patch [1] on a week-long test over 10x IMX6 boards
> booting over 60K times across temperature ranges and the patch
> resolved many previous failures to mount rootfs errors (previously I
> would encounter around 1% failure to mount rootfs). In addition I saw
> no nand corruption where I would have expected to see it several times
> with those numbers so I suspect this may have resolved that as well.
>
> Can you re-submit your patch for inclusion and/or discussion?
I'm quite busy on other topics lately, but feel free to adapt/resubmit
the patch.
Best Regards,
Boris
--
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2015-12-01 9:13 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-10-26 19:37 UBIFS corruption after power cut - possibly unstable bits issue? Tim Harvey
2015-10-26 20:01 ` Richard Weinberger
2015-10-26 20:31 ` Tim Harvey
2015-10-26 21:41 ` Richard Weinberger
2015-10-27 19:01 ` Tim Harvey
2015-10-27 19:52 ` Richard Weinberger
2015-11-02 20:27 ` Tim Harvey
2015-11-02 20:31 ` Tim Harvey
2015-11-02 21:31 ` Richard Weinberger
2015-11-02 22:11 ` Brian Norris
2015-11-03 13:38 ` Boris Brezillon
2015-11-16 15:01 ` Tim Harvey
2015-11-30 21:58 ` Tim Harvey
2015-12-01 9:12 ` Boris Brezillon
2015-11-03 9:10 ` Artem Bityutskiy
2015-11-03 10:06 ` Michal Suchanek
2015-11-03 10:18 ` Ricard Wanderlof
2015-11-03 10:43 ` Artem Bityutskiy
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).