* ubi_eba_init_scan: cannot reserve enough PEBs
From: Matthew L. Creech @ 2010-07-22 18:37 UTC
To: linux-mtd

Hi,

I have some UBI[FS]-based devices that I keep getting this warning on,
and I'm wondering if someone could help me to understand what it means
(and how to fix it). Here's an example syslog snippet:

Jan 1 00:00:32 kernel: UBI: attaching mtd1 to ubi0
Jan 1 00:00:32 kernel: UBI: physical eraseblock size: 131072 bytes (128 KiB)
Jan 1 00:00:32 kernel: UBI: logical eraseblock size: 129024 bytes
Jan 1 00:00:32 kernel: UBI: smallest flash I/O unit: 2048
Jan 1 00:00:32 kernel: UBI: sub-page size: 512
Jan 1 00:00:32 kernel: UBI: VID header offset: 512 (aligned 512)
Jan 1 00:00:32 kernel: UBI: data offset: 2048
Jan 1 00:00:32 kernel: UBI warning: ubi_eba_init_scan: cannot reserve enough PEBs for bad PEB handling, reserved 75, need 81
Jan 1 00:00:32 kernel: UBI: attached mtd1 to ubi0
Jan 1 00:00:32 kernel: UBI: MTD device name: "ubi"
Jan 1 00:00:32 kernel: UBI: MTD device size: 1019 MiB
Jan 1 00:00:32 kernel: UBI: number of good PEBs: 8117
Jan 1 00:00:32 kernel: UBI: number of bad PEBs: 35
Jan 1 00:00:32 kernel: UBI: max. allowed volumes: 128
Jan 1 00:00:32 kernel: UBI: wear-leveling threshold: 4096
Jan 1 00:00:32 kernel: UBI: number of internal volumes: 1
Jan 1 00:00:32 kernel: UBI: number of user volumes: 1
Jan 1 00:00:32 kernel: UBI: available PEBs: 0
Jan 1 00:00:32 kernel: UBI: total number of reserved PEBs: 8117
Jan 1 00:00:32 kernel: UBI: number of PEBs reserved for bad PEB handling: 75
Jan 1 00:00:32 kernel: UBI: max/mean erase counter: 2/1
Jan 1 00:00:32 kernel: UBI: image sequence number: 0
Jan 1 00:00:32 kernel: UBI: background thread "ubi_bgt0d" started, PID 197

Everything seems okay, except for that complaint about not being able
to reserve enough PEBs. I'm not sure why I get this, or what I can do
about it. A search turns up this thread:

http://patchwork.ozlabs.org/patch/42566/

which implies that one of the UBI volumes was too large & didn't leave
enough space for reserved PEBs. But I don't think that applies to me,
and I don't know what I could do about it. My bring-up process is as
follows:

=====
1. My bootloader partition is 5MB out of 1GB, and the rest of NAND is
a UBI partition housing a single UBIFS filesystem. That's 8152 blocks
@ 128kB each (ignoring OOB).

2. I create my UBIFS image with:

mkfs.ubifs -m 2048 -e 129024 -c 8152 -x lzo -U -v -r ./.tmp/ -o ubifs.img

The only thing I'm unsure about is the "-c 8152", but since that's the
*maximum* LEB count, I'd assumed it was okay to leave it as the full
size of my NAND partition. (I don't know how many blocks are bad, so
otherwise I'll have to pessimistically limit the size & waste space)

3. I create a UBI image (suitable for flashing) with:

ubinize -o ubi.img -m 2048 -p 128KiB -m 2048 -s 512 ubi.cfg

The file ubi.cfg is as follows:

[ubifs-container]
mode=ubi
image=ubifs.img
vol_id=0
vol_size=100MiB
vol_type=dynamic
vol_name=container
vol_flags=autoresize

Since autoresize is set, presumably vol_size doesn't have much effect,
correct? My understanding is that when the device boots for the first
time, it'll see the autoresize flag and expand the volume size to fill
whatever space is available after accounting for bad blocks (nearly
1019MB).

4. I add ECC info and use a hardware NAND programmer to flash the device.
=====

Does this all look correct? Or are my options wrong in one of these steps?

I'm mainly trying to understand what this particular warning means,
and ensure that my procedure is correct. I'm having other UBI-related
problems on this platform, and want to be sure that this isn't the
cause before I dig further into other possibilities.

Any help is appreciated, thanks!

-- 
Matthew L. Creech
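As a cross-check of the layout described in this message, the figures can be recomputed from the partition sizes alone. The sketch below uses only numbers quoted in the message and its log (1 GiB NAND, 5 MiB bootloader partition, 128 KiB eraseblocks, data offset 2048); the variable names are illustrative.

```python
# Recompute the UBI geometry from the figures quoted in the message.
KIB = 1024
MIB = 1024 * KIB

nand_size = 1024 * MIB       # 1 GiB NAND chip
bootloader_size = 5 * MIB    # bootloader partition
peb_size = 128 * KIB         # physical eraseblock size (131072 bytes)
data_offset = 2048           # "UBI: data offset: 2048" in the log

ubi_partition = nand_size - bootloader_size
total_pebs = ubi_partition // peb_size
leb_size = peb_size - data_offset  # bytes left for data in each PEB

print(ubi_partition // MIB)  # 1019, matches "MTD device size: 1019 MiB"
print(total_pebs)            # 8152, matches 8117 good + 35 bad PEBs
print(leb_size)              # 129024, matches the -e value for mkfs.ubifs
```

The results agree with the log, so the "-e 129024" and "-c 8152" values are at least internally consistent with the partition layout.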
* Re: ubi_eba_init_scan: cannot reserve enough PEBs
From: Artem Bityutskiy @ 2010-07-26 5:21 UTC
To: Matthew L. Creech; +Cc: linux-mtd

Hi,

On Thu, 2010-07-22 at 14:37 -0400, Matthew L. Creech wrote:
> Hi,
>
> I have some UBI[FS]-based devices that I keep getting this warning on,
> and I'm wondering if someone could help me to understand what it means
> (and how to fix it). Here's an example syslog snippet:
>
> Jan 1 00:00:32 kernel: UBI: attaching mtd1 to ubi0
> Jan 1 00:00:32 kernel: UBI: physical eraseblock size: 131072 bytes (128 KiB)
> Jan 1 00:00:32 kernel: UBI: logical eraseblock size: 129024 bytes
> Jan 1 00:00:32 kernel: UBI: smallest flash I/O unit: 2048
> Jan 1 00:00:32 kernel: UBI: sub-page size: 512
> Jan 1 00:00:32 kernel: UBI: VID header offset: 512 (aligned 512)
> Jan 1 00:00:32 kernel: UBI: data offset: 2048
> Jan 1 00:00:32 kernel: UBI warning: ubi_eba_init_scan: cannot reserve
> enough PEBs for bad PEB handling, reserved 75, need 81
> Jan 1 00:00:32 kernel: UBI: attached mtd1 to ubi0
> Jan 1 00:00:32 kernel: UBI: MTD device name: "ubi"
> Jan 1 00:00:32 kernel: UBI: MTD device size: 1019 MiB
> Jan 1 00:00:32 kernel: UBI: number of good PEBs: 8117
> Jan 1 00:00:32 kernel: UBI: number of bad PEBs: 35
> Jan 1 00:00:32 kernel: UBI: max. allowed volumes: 128
> Jan 1 00:00:32 kernel: UBI: wear-leveling threshold: 4096
> Jan 1 00:00:32 kernel: UBI: number of internal volumes: 1
> Jan 1 00:00:32 kernel: UBI: number of user volumes: 1
> Jan 1 00:00:32 kernel: UBI: available PEBs: 0
> Jan 1 00:00:32 kernel: UBI: total number of reserved PEBs: 8117
> Jan 1 00:00:32 kernel: UBI: number of PEBs reserved for bad PEB handling: 75
> Jan 1 00:00:32 kernel: UBI: max/mean erase counter: 2/1
> Jan 1 00:00:32 kernel: UBI: image sequence number: 0
> Jan 1 00:00:32 kernel: UBI: background thread "ubi_bgt0d" started, PID 197
>
> Everything seems okay, except for that complaint about not being able
> to reserve enough PEBs. I'm not sure why I get this, or what I can do
> about it. A search turns up this thread:
>
> http://patchwork.ozlabs.org/patch/42566/
>
> which implies that one of the UBI volumes was too large & didn't leave
> enough space for reserved PEBs. But I don't think that applies to me,
> and I don't know what I could do about it. My bring-up process is as
> follows:

UBI wants 1% of PEBs to be reserved for bad block handling.

> =====
> 1. My bootloader partition is 5MB out of 1GB, and the rest of NAND is
> a UBI partition housing a single UBIFS filesystem. That's 8152 blocks
> @ 128kB each (ignoring OOB).

OK.

> 2. I create my UBIFS image with:
>
> mkfs.ubifs -m 2048 -e 129024 -c 8152 -x lzo -U -v -r ./.tmp/ -o ubifs.img
>
> The only thing I'm unsure about is the "-c 8152", but since that's the
> *maximum* LEB count, I'd assumed it was okay to leave it as the full
> size of my NAND partition. (I don't know how many blocks are bad, so
> otherwise I'll have to pessimistically limit the size & waste space)

This looks ok, including -c

> 3. I create a UBI image (suitable for flashing) with:
>
> ubinize -o ubi.img -m 2048 -p 128KiB -m 2048 -s 512 ubi.cfg
>
> The file ubi.cfg is as follows:
>
> [ubifs-container]
> mode=ubi
> image=ubifs.img
> vol_id=0
> vol_size=100MiB
> vol_type=dynamic
> vol_name=container
> vol_flags=autoresize
>
> Since autoresize is set, presumably vol_size doesn't have much effect,
> correct? My understanding is that when the device boots for the first
> time, it'll see the autoresize flag and expand the volume size to fill
> whatever space is available after accounting for bad blocks (nearly
> 1019MB).

Yes. But in the log you sent I do not see any message about autoresize
happening - UBI prints them.

Just in case, about autoresize:
http://www.linux-mtd.infradead.org/doc/ubi.html#L_autoresize

Also, search for the word "reserved" on that page; you may get more
info about what this PEB reservation is - but in short, it is to handle
bad PEBs.

> 4. I add ECC info and use a hardware NAND programmer to flash the device.
> =====

Does it erase the whole flash before writing the image? I see that your
image sequence number is 0, which means you probably use old ubi tools.
Please, use the latest ubinize - it should pick a random sequence
number, or you may use the -Q option.

The reason why we introduced this was to detect the situation where you
have flash with a valid UBI image, then flash a new image on top, but do
not erase the eraseblocks you did not flash. This may also happen due
to an error while flashing or interrupted flashing. If the old and new
image sequence numbers are different, UBI will notice this. Otherwise it
may silently accept the "mixed" image and you may end up with strange
errors.

It does not look like this is the case for you, but still, make sure
your image sequence numbers are not always zero, to play safe.

>
> Does this all look correct? Or are my options wrong in one of these steps?

Looks correct. When you see this warning, can you mount UBIFS? Does it
look OK?

> I'm mainly trying to understand what this particular warning means,
> and ensure that my procedure is correct. I'm having other UBI-related
> problems on this platform, and want to be sure that this isn't the
> cause before I dig further into other possibilities.
>
> Any help is appreciated, thanks!

Can you please enable UBI debugging messages and also "Additional UBI
initialization and build messages" and attach a log? See this writing
for help: http://www.linux-mtd.infradead.org/faq/ubi.html#L_how_debug

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
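The purpose of the image sequence number described in this reply can be illustrated with a small model. This is only a sketch of the idea, not UBI code: the function name and the list-of-numbers representation are hypothetical, and treating 0 as "no sequence number set" is an assumption made for the illustration.

```python
def find_mixed_pebs(image_seqs):
    """Return indices of PEBs whose image sequence number disagrees
    with the first non-zero number seen during the scan.

    Assumption for this sketch: 0 means "no sequence number set",
    so such PEBs can never be flagged.
    """
    expected = 0
    mismatched = []
    for pnum, seq in enumerate(image_seqs):
        if seq == 0:
            continue              # nothing to compare against
        if expected == 0:
            expected = seq        # first PEB with a real number
        elif seq != expected:
            mismatched.append(pnum)
    return mismatched


# New image (seq 7) flashed over an old one (seq 3) without a full erase:
print(find_mixed_pebs([7, 7, 3, 7]))   # [2]
# Both images built by old tools that always wrote 0: nothing is detected.
print(find_mixed_pebs([0, 0, 0, 0]))   # []
```

The second call shows why an image sequence number that is always zero gives no protection against a "mixed" image.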
* Re: ubi_eba_init_scan: cannot reserve enough PEBs
From: Matthew L. Creech @ 2010-07-26 21:13 UTC
To: dedekind1; +Cc: linux-mtd

Hi Artem, thanks for the reply. Responses below:

On Mon, Jul 26, 2010 at 1:21 AM, Artem Bityutskiy <dedekind1@gmail.com> wrote:
>
> UBI wants 1% of PEBs to be reserved for bad block handling.
>

OK, so with my flash layout it wants to reserve 81 blocks for bad PEB
handling, and does so when first initialized. Then during the course
of normal device operation (lots of reads & writes), let's say 8 more
of them go bad that weren't factory bad blocks. Will it now print
this warning because it only has 73 reserve PEBs left? (In which case
it seems fairly innocuous, right?)

>
> Yes. But in the log you sent I do not see any message about autoresize
> happening - UBI prints them.
>

Correct, this device was running for a while (several months) before
it started having problems. I've never seen the warning printed on a
brand new device (which is the only time autoresize happens, right?),
but I've seen it on several that have been in operation for a while.

I just wanted to be sure that I'm using autoresize properly, and that
it's not somehow screwing up the # of reserved PEBs. But it doesn't
seem like that's the case. (Unless the sequence # or non-erased flash
are to blame - below).

>
> Does it erase the whole flash before writing the image? I see that your
> image sequence number is 0, which means you probably use old ubi tools.
> Please, use the latest ubinize - it should pick a random sequence
> number, or you may use the -Q option.
>

I am using up to date mtd-utils now, but this device has been in the
field for months. We were using a mtd-utils git snapshot from 4/29/09
to generate the UBI image.

I didn't know about the sequence number, I'll be sure we use an
updated mtd-utils for our next firmware version.

Could this account for the warning and/or the UBIFS error below? Or
would these kinds of problems manifest in a different way entirely?

>
> When you see this warning, can you mount UBIFS? Does it look OK?
>

So far I've only noticed it on 3 devices. All 3 had UBIFS errors
(below) later on in the boot process, which is what prompted me to
wonder what the warning meant. I'm not entirely sure that the UBI
warning and UBIFS error are related, but so far I haven't noticed the
warning on any other devices.

>
> Can you please enable UBI debugging messages and also "Additional UBI
> initialization and build messages" and attach a log? See this writing
> for help: http://www.linux-mtd.infradead.org/faq/ubi.html#L_how_debug
>

Certainly. I enabled all the relevant UBI and UBIFS debugging options
that I saw, along with internal self-checks, but there's still not a
whole lot of output. Full console dump is attached - this is a
different device than the first, but exhibits the same problem.

Unfortunately I'm not yet sure what causes the devices to get into
this state, so I can't easily reproduce it. However I own one of them
and have it at my desk, so I can perform any tests & gather any
additional info that would be helpful.

FYI, I did build & run all of the MTD test modules to prove out the
platform-level NAND code (MPC 8313), and encountered no problems.
However, that was on a different device (one that works fine), since
the nature of the tests means that I have to re-partition my flash so
that there's a spare MTD to work with.

Thanks!

-- 
Matthew L. Creech

[-- Attachment #2: ubifs-err.txt --]
[-- Type: text/plain, Size: 10918 bytes --]

Linux version 2.6.34 (mlcreech@px-build) (gcc version 4.4.4 (GCC) ) #2 Mon Jul 26 16:16:52 EDT 2010
bootconsole [udbg0] enabled
setup_arch: bootmem
mpc831x_rdb_setup_arch()
arch: exit
Zone PFN ranges:
  DMA      0x00000000 -> 0x00008000
  Normal   empty
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
    0: 0x00000000 -> 0x00008000
Built 1 zonelists in Zone order, mobility grouping on. Total pages: 32512
Kernel command line: ubi.mtd=1 root=ubi0:container rw rootfstype=ubifs initlo=/rootfs1.img init=/linuxrc
PID hash table entries: 512 (order: -1, 2048 bytes)
Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
Memory: 126412k/131072k available (3280k kernel code, 4660k reserved, 136k data, 73k bss, 144k init)
Kernel virtual memory layout:
  * 0xfffdf000..0xfffff000  : fixmap
  * 0xfdffe000..0xfe000000  : early ioremap
  * 0xc9000000..0xfdffe000  : vmalloc & ioremap
SLUB: Genslabs=13, HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
Hierarchical RCU implementation.
NR_IRQS:512 nr_irqs:512
IPIC (128 IRQ sources) at c9000700
clocksource: timebase mult[f00000a] shift[22] registered
Mount-cache hash table entries: 512
NET: Registered protocol family 16
bio: create slab <bio-0> at 0
Freescale Elo / Elo Plus DMA driver
Switching to clocksource timebase
NET: Registered protocol family 2
IP route cache hash table entries: 1024 (order: 0, 4096 bytes)
TCP established hash table entries: 4096 (order: 3, 32768 bytes)
TCP bind hash table entries: 4096 (order: 2, 16384 bytes)
TCP: Hash tables configured (established 4096 bind 4096)
TCP reno registered
UDP hash table entries: 256 (order: 0, 4096 bytes)
UDP-Lite hash table entries: 256 (order: 0, 4096 bytes)
NET: Registered protocol family 1
WDT driver for MPC8xxx initialized. mode:reset timeout=65535 (64 seconds)
fsl-elo-dma e00082a8.dma: #0 (fsl,elo-dma-channel), irq 71
fsl-elo-dma e00082a8.dma: #1 (fsl,elo-dma-channel), irq 71
fsl-elo-dma e00082a8.dma: #2 (fsl,elo-dma-channel), irq 71
fsl-elo-dma e00082a8.dma: #3 (fsl,elo-dma-channel), irq 71
squashfs: version 4.0 (2009/01/31) Phillip Lougher
Registering unionfs 2.5.4 (for 2.6.34-rc0)
msgmni has been set to 246
alg: No test for stdrng (krng)
io scheduler noop registered
io scheduler deadline registered (default)
Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
serial8250.0: ttyS0 at MMIO 0xe0004500 (irq = 16) is a 16550A
console [ttyS0] enabled, bootconsole disabled
console [ttyS0] enabled, bootconsole disabled
serial8250.0: ttyS1 at MMIO 0xe0004600 (irq = 17) is a 16550A
brd: module loaded
loop: module loaded
NAND device: Manufacturer ID: 0xec, Chip ID: 0xd3 (Samsung NAND 1GiB 3,3V 8-bit)
RedBoot partition parsing not available
Creating 2 MTD partitions on "e2800000.flash":
0x000000000000-0x000000500000 : "u-boot"
0x000000500000-0x000040000000 : "ubi"
eLBC NAND device at 0xe2800000, bank 0
UBI: attaching mtd1 to ubi0
UBI: physical eraseblock size: 131072 bytes (128 KiB)
UBI: logical eraseblock size: 129024 bytes
UBI: smallest flash I/O unit: 2048
UBI: sub-page size: 512
UBI: VID header offset: 512 (aligned 512)
UBI: data offset: 2048
UBI warning: ubi_eba_init_scan: cannot reserve enough PEBs for bad PEB handling, reserved 73, need 81
UBI: attached mtd1 to ubi0
UBI: MTD device name: "ubi"
UBI: MTD device size: 1019 MiB
UBI: number of good PEBs: 8136
UBI: number of bad PEBs: 16
UBI: max. allowed volumes: 128
UBI: wear-leveling threshold: 4096
UBI: number of internal volumes: 1
UBI: number of user volumes: 1
UBI: available PEBs: 0
UBI: total number of reserved PEBs: 8136
UBI: number of PEBs reserved for bad PEB handling: 73
UBI: max/mean erase counter: 316/49
UBI: image sequence number: 0
UBI: background thread "ubi_bgt0d" started, PID 196
Fixed MDIO Bus: probed
eth0: Gianfar Ethernet Controller Version 1.2, 00:e0:0c:00:95:01
eth0: Running with NAPI enabled
eth0: RX BD ring size for Q[0]: 256
eth0: TX BD ring size for Q[0]: 256
Freescale PowerQUICC MII Bus: probed
i2c /dev entries driver
nf_conntrack version 0.5.0 (1975 buckets, 7900 max)
ip_tables: (C) 2000-2006 Netfilter Core Team
TCP cubic registered
NET: Registered protocol family 10
ip6_tables: (C) 2000-2006 Netfilter Core Team
IPv6 over IPv4 tunneling driver
NET: Registered protocol family 17
UBIFS: recovery needed
UBIFS error (pid 1): ubifs_read_node: bad node type (255 but expected 9)
UBIFS error (pid 1): ubifs_read_node: bad node at LEB 5586:110392
Call Trace:
[c7827cf0] [c00083f0] show_stack+0x7c/0x19c (unreliable)
[c7827d30] [c00e9fa8] ubifs_read_node+0x2f4/0x330
[c7827d60] [c010d0b0] ubifs_load_znode+0xe4/0x590
[c7827da0] [c010f9e0] dbg_walk_index+0x208/0x304
[c7827de0] [c010fb34] dbg_check_idx_size+0x58/0xe8
[c7827e10] [c00e6fa4] ubifs_get_sb+0xc90/0x1670
[c7827ea0] [c0078954] vfs_kern_mount+0x74/0x108
[c7827ed0] [c0078a44] do_kern_mount+0x4c/0x114
[c7827f00] [c0091a2c] do_mount+0x68c/0x720
[c7827f50] [c0091b60] sys_mount+0xa0/0xf8
[c7827f80] [c0311154] mount_block_root+0x130/0x2dc
[c7827fd0] [c0311420] prepare_namespace+0xb0/0x204
[c7827fe0] [c0310268] kernel_init+0x148/0x168
[c7827ff0] [c0010318] kernel_thread+0x4c/0x68
UBIFS error (pid 1): dbg_check_idx_size: error -22 while walking the index
List of all partitions:
1f00            5120 mtdblock0 (driver?)
1f01         1043456 mtdblock1 (driver?)
No filesystem could mount root, tried: ubifs
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
Call Trace:
[c7827f00] [c00083f0] show_stack+0x7c/0x19c (unreliable)
[c7827f40] [c0281c2c] panic+0x94/0x108
[c7827f80] [c03112c4] mount_block_root+0x2a0/0x2dc
[c7827fd0] [c0311420] prepare_namespace+0xb0/0x204
[c7827fe0] [c0310268] kernel_init+0x148/0x168
[c7827ff0] [c0010318] kernel_thread+0x4c/0x68
Rebooting in 180 seconds..
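The "need 81" figure in both logs is consistent with the 1% rule mentioned earlier in the thread. The sketch below assumes the reserve level is 1% of the good PEB count with integer division; that is an assumption which happens to reproduce the logged values, not a statement about the exact kernel computation, and the function name is illustrative.

```python
def bad_peb_reserve_needed(good_pebs, reserve_percent=1):
    """Assumed rule: reserve_percent of the good PEBs, rounded down."""
    return good_pebs // 100 * reserve_percent


# (good PEBs, PEBs actually reserved) taken from the two logs in this thread
devices = [(8117, 75), (8136, 73)]

for good, reserved in devices:
    need = bad_peb_reserve_needed(good)
    print(good, need, need - reserved)
# prints:
# 8117 81 6
# 8136 81 8
```

Under that assumption the first device is 6 PEBs short of the target and the second is 8 short, which matches the "reserved 75, need 81" and "reserved 73, need 81" warnings.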
* Re: ubi_eba_init_scan: cannot reserve enough PEBs
From: Artem Bityutskiy @ 2010-07-27 15:12 UTC
To: Matthew L. Creech; +Cc: linux-mtd

On Mon, 2010-07-26 at 17:13 -0400, Matthew L. Creech wrote:
> Hi Artem, thanks for the reply. Responses below:
>
> On Mon, Jul 26, 2010 at 1:21 AM, Artem Bityutskiy <dedekind1@gmail.com> wrote:
> >
> > UBI wants 1% of PEBs to be reserved for bad block handling.
> >
>
> OK, so with my flash layout it wants to reserve 81 blocks for bad PEB
> handling, and does so when first initialized. Then during the course
> of normal device operation (lots of reads & writes), let's say 8 more
> of them go bad that weren't factory bad blocks. Will it now print
> this warning because it only has 73 reserve PEBs left? (In which case
> it seems fairly innocuous, right?)

OK, you are right, UBI should not bug you so early, there are still
plenty of reserved PEBs left. What do you think about the following
algorithm:

1. If this is a new image, preserve current behavior and warn.
2. If we see that this is a system which has already been used, we warn
   only when the reserve is really about to end, say, 5% of the reserve
   is left.

> > Yes. But in the log you sent I do not see any message about autoresize
> > happening - UBI prints them.
> >
>
> Correct, this device was running for a while (several months) before
> it started having problems. I've never seen the warning printed on a
> brand new device (which is the only time autoresize happens, right?),
> but I've seen it on several that have been in operation for a while.

OK, I see. So the problem is that UBI does not distinguish between a new
image and a used one, AFAICS.

> I just wanted to be sure that I'm using autoresize properly, and that
> it's not somehow screwing up the # of reserved PEBs. But it doesn't
> seem like that's the case. (Unless the sequence # or non-erased flash
> are to blame - below).

Yes, it looks like the warning should be fixed.

> > Does it erase the whole flash before writing the image? I see that your
> > image sequence number is 0, which means you probably use old ubi tools.
> > Please, use the latest ubinize - it should pick a random sequence
> > number, or you may use the -Q option.
> >
>
> I am using up to date mtd-utils now, but this device has been in the
> field for months. We were using a mtd-utils git snapshot from 4/29/09
> to generate the UBI image.

Ok, I see.

> I didn't know about the sequence number, I'll be sure we use an
> updated mtd-utils for our next firmware version.

Yeah, it is new. We introduced it after we faced problems where users
for some reason interrupt flashing, or an error occurs during flashing,
and the device still boots but then shows various interesting issues.

> Could this account for the warning and/or the UBIFS error below? Or
> would these kinds of problems manifest in a different way entirely?

Well, theoretically it can. But if users did not re-flash your devices,
then obviously not.

> >
> > When you see this warning, can you mount UBIFS? Does it look OK?
> >
>
> So far I've only noticed it on 3 devices. All 3 had UBIFS errors
> (below) later on in the boot process, which is what prompted me to
> wonder what the warning meant. I'm not entirely sure that the UBI
> warning and UBIFS error are related, but so far I haven't noticed the
> warning on any other devices.

I do not think the warning is related to those issues.

> > Can you please enable UBI debugging messages and also "Additional UBI
> > initialization and build messages" and attach a log? See this writing
> > for help: http://www.linux-mtd.infradead.org/faq/ubi.html#L_how_debug
> >
>
> Certainly. I enabled all the relevant UBI and UBIFS debugging options
> that I saw, along with internal self-checks, but there's still not a
> whole lot of output. Full console dump is attached - this is a
> different device than the first, but exhibits the same problem.

I'm sure your ring buffer contains more information. This is one of the
reasons I gave you the above link - it explains that not all messages go
to the console and how to get all messages. Try to use dmesg.

In the UBIFS code I see that 'ubifs_read_node()' calls 'dbg_dump_node()',
which should dump the node. But '255' is 0xFF, so probably UBIFS read
all 0xFF.

This may be a UBIFS bug, or some corruption, difficult to say. For some
reason the place where a valid znode should live is erased. Maybe if I
had a NAND dump of your broken device I could look at it, but I do not
promise anything, and I'm also on holiday :-)

> Unfortunately I'm not yet sure what causes the devices to get into
> this state, so I can't easily reproduce it. However I own one of them
> and have it at my desk, so I can perform any tests & gather any
> additional info that would be helpful.

What is your kernel? If it is old, make sure you have fixes from the
back-port trees.

>
> FYI, I did build & run all of the MTD test modules to prove out the
> platform-level NAND code (MPC 8313), and encountered no problems.
> However, that was on a different device (one that works fine), since
> the nature of the tests means that I have to re-partition my flash so
> that there's a spare MTD to work with.

This really does not look like a NAND/MTD driver issue. It looks more
like either a UBIFS bug or some kind of corruption which corrupted an EC
or VID header, then UBI decided to erase this PEB, and then UBIFS reads
all 0xFFs from there.

The second theory should BTW be fixed. Indeed, when UBI finds a PEB with
corrupted headers, it adds this PEB to the 'corr' list, and then just
erases it. But this is wrong! It should erase such PEBs only if there
are all 0xFFs in the rest of the block.

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
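The two-step warning policy proposed in this message could be sketched as follows. This is an illustration of the proposal only, not existing UBI behaviour: the function name, the `is_new_image` flag (how UBI would tell a new image from a used one is left open in the thread), and the reading of "5% of the reserve is left" as a fraction of the target level are all assumptions.

```python
def should_warn(reserved, needed, is_new_image, low_fraction=0.05):
    """Decide whether to print the 'cannot reserve enough PEBs' warning.

    reserved      -- PEBs currently reserved for bad PEB handling
    needed        -- target reserve level (1% of PEBs in this thread)
    is_new_image  -- hypothetical flag: True for a freshly flashed image
    low_fraction  -- assumed threshold for a system already in use
    """
    if reserved >= needed:
        return False                 # reserve is full, nothing to report
    if is_new_image:
        return True                  # step 1: keep the current behaviour
    # step 2: on a used system, warn only when the reserve is nearly gone
    return reserved <= needed * low_fraction


print(should_warn(73, 81, is_new_image=True))    # True
print(should_warn(73, 81, is_new_image=False))   # False
print(should_warn(4, 81, is_new_image=False))    # True
```

With this policy the device in the attached log (reserved 73, need 81) would stay quiet, while a device down to its last few reserve PEBs would still be reported.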
* Re: ubi_eba_init_scan: cannot reserve enough PEBs
From: Artem Bityutskiy @ 2010-07-27 15:21 UTC
To: Matthew L. Creech; +Cc: stefani, linux-mtd

On Tue, 2010-07-27 at 18:12 +0300, Artem Bityutskiy wrote:
> This really does not look like a NAND/MTD driver issue. It looks more
> like either a UBIFS bug or some kind of corruption which corrupted an EC
> or VID header, then UBI decided to erase this PEB, and then UBIFS reads
> all 0xFFs from there.
>
> The second theory should BTW be fixed. Indeed, when UBI finds a PEB with
> corrupted headers, it adds this PEB to the 'corr' list, and then just
> erases it. But this is wrong! It should erase such PEBs only if there
> are all 0xFFs in the rest of the block.

Yeah, indeed looks like a bad bug in UBI. So, when we have some flash
corruptions which corrupt the VID header, UBI just silently erases this
PEB! And then we have small chances to find out why one LEB suddenly
became unmapped (erased).

UBI logic is - if the VID header is corrupted, it is because of a sudden
power cut while writing the header. And we can erase the PEB because if
we were writing the header, we have not written the data yet.

But it does not bother checking what goes _after_ the header. If there
is some data, UBI should not erase the PEB but preserve it and switch
to R/O mode.

CCing Stefani, I think her group faced a similar issue recently - one
LEB suddenly disappeared. This may be the reason.

Then the other question - why did the VID header become corrupted?
Dunno, but if UBI won't erase the PEB we'll have better chances to find
this out. Does this sound reasonable?

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
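The check proposed in this message (erase a PEB with a corrupted header only if everything after the headers is still 0xFF) can be modelled on a plain byte buffer. This is a sketch of the idea, not UBI code: the function name and the returned strings are hypothetical, and the data offset of 2048 is taken from the logs earlier in the thread.

```python
def decide_corrupted_peb(peb: bytes, data_offset: int) -> str:
    """Decide what to do with a PEB whose headers failed validation.

    If the data area is still erased (all 0xFF), the corruption is
    consistent with a power cut while the header was being written,
    so erasing loses nothing. Otherwise keep the PEB for analysis.
    """
    data_area = peb[data_offset:]
    if all(byte == 0xFF for byte in data_area):
        return "erase"
    return "preserve and switch to read-only"


PEB_SIZE = 131072
DATA_OFFSET = 2048

# Header write interrupted: garbage in the header area, data area erased.
interrupted = b"\x00" * DATA_OFFSET + b"\xff" * (PEB_SIZE - DATA_OFFSET)
# Header corrupted later: the data area already holds something.
with_data = b"\x00" * DATA_OFFSET + b"node" + b"\xff" * (PEB_SIZE - DATA_OFFSET - 4)

print(decide_corrupted_peb(interrupted, DATA_OFFSET))  # erase
print(decide_corrupted_peb(with_data, DATA_OFFSET))    # preserve and switch to read-only
```

Preserving the second kind of PEB keeps the evidence needed to work out why the header became corrupted in the first place.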
* Re: ubi_eba_init_scan: cannot reserve enough PEBs
From: Stefani Seibold @ 2010-07-28 5:46 UTC
To: dedekind1; +Cc: Kreuzer, Michael (NSN - DE/Ulm), linux-mtd, Matthew L. Creech

On Tuesday, 27.07.2010 at 18:21 +0300, Artem Bityutskiy wrote:
> On Tue, 2010-07-27 at 18:12 +0300, Artem Bityutskiy wrote:
> > This really does not look like a NAND/MTD driver issue. It looks more
> > like either a UBIFS bug or some kind of corruption which corrupted an EC
> > or VID header, then UBI decided to erase this PEB, and then UBIFS reads
> > all 0xFFs from there.
> >
> > The second theory should BTW be fixed. Indeed, when UBI finds a PEB with
> > corrupted headers, it adds this PEB to the 'corr' list, and then just
> > erases it. But this is wrong! It should erase such PEBs only if there
> > are all 0xFFs in the rest of the block.
>
> Yeah, indeed looks like a bad bug in UBI. So, when we have some flash
> corruptions which corrupt the VID header, UBI just silently erases this
> PEB! And then we have small chances to find out why one LEB suddenly
> became unmapped (erased).
>
> UBI logic is - if the VID header is corrupted, it is because of a sudden
> power cut while writing the header. And we can erase the PEB because if
> we were writing the header, we have not written the data yet.
>
> But it does not bother checking what goes _after_ the header. If there
> is some data, UBI should not erase the PEB but preserve it and switch
> to R/O mode.
>
> CCing Stefani, I think her group faced a similar issue recently - one
> LEB suddenly disappeared. This may be the reason.
>
> Then the other question - why did the VID header become corrupted?
> Dunno, but if UBI won't erase the PEB we'll have better chances to find
> this out. Does this sound reasonable?
>

Not really. Why should the master LEB in particular crash?

First: As I understand it, UBIFS will append data to the master LEBs
until no space is free, and then it will erase the master LEB and create
a new one. Is this right?

Second: We need something like the patch I provided for the case where
one of the master blocks is damaged. Otherwise the whole file system is
garbage.

- Stefani
* Re: ubi_eba_init_scan: cannot reserve enough PEBs
From: Artem Bityutskiy @ 2010-08-22 15:04 UTC
To: Stefani Seibold; +Cc: Kreuzer, Michael (NSN - DE/Ulm), linux-mtd, Matthew L. Creech

On Wed, 2010-07-28 at 07:46 +0200, Stefani Seibold wrote:
> On Tuesday, 27.07.2010 at 18:21 +0300, Artem Bityutskiy wrote:
> > On Tue, 2010-07-27 at 18:12 +0300, Artem Bityutskiy wrote:
> > > This really does not look like a NAND/MTD driver issue. It looks more
> > > like either a UBIFS bug or some kind of corruption which corrupted an EC
> > > or VID header, then UBI decided to erase this PEB, and then UBIFS reads
> > > all 0xFFs from there.
> > >
> > > The second theory should BTW be fixed. Indeed, when UBI finds a PEB with
> > > corrupted headers, it adds this PEB to the 'corr' list, and then just
> > > erases it. But this is wrong! It should erase such PEBs only if there
> > > are all 0xFFs in the rest of the block.
> >
> > Yeah, indeed looks like a bad bug in UBI. So, when we have some flash
> > corruptions which corrupt the VID header, UBI just silently erases this
> > PEB! And then we have small chances to find out why one LEB suddenly
> > became unmapped (erased).
> >
> > UBI logic is - if the VID header is corrupted, it is because of a sudden
> > power cut while writing the header. And we can erase the PEB because if
> > we were writing the header, we have not written the data yet.
> >
> > But it does not bother checking what goes _after_ the header. If there
> > is some data, UBI should not erase the PEB but preserve it and switch
> > to R/O mode.
> >
> > CCing Stefani, I think her group faced a similar issue recently - one
> > LEB suddenly disappeared. This may be the reason.
> >
> > Then the other question - why did the VID header become corrupted?
> > Dunno, but if UBI won't erase the PEB we'll have better chances to find
> > this out. Does this sound reasonable?
> >
>
> Not really. Why should the master LEB in particular crash?

Not sure, maybe there is some reason. Dunno, that was just a thought.

> First: As I understand it, UBIFS will append data to the master LEBs
> until no space is free, and then it will erase the master LEB and create
> a new one. Is this right?

Yes.

> Second: We need something like the patch I provided for the case where
> one of the master blocks is damaged. Otherwise the whole file system is
> garbage.

Yes, but your patch fixes the symptom, unfortunately. It is ok for you
to use as a work-around, but I still hope to find the root cause.
* Re: ubi_eba_init_scan: cannot reserve enough PEBs 2010-08-22 15:04 ` Artem Bityutskiy @ 2010-08-31 12:09 ` Stefani Seibold 2010-09-01 15:47 ` Artem Bityutskiy 0 siblings, 1 reply; 25+ messages in thread From: Stefani Seibold @ 2010-08-31 12:09 UTC (permalink / raw) To: dedekind1 Cc: Kreuzer, Michael (NSN - DE/Ulm), linux-mtd, Pagliari, Vivenzio (NSN - DE/Ulm), Matthew L. Creech On Sunday, 22.08.2010, at 18:04 +0300, Artem Bityutskiy wrote: > > Yes, but your patch fixes the symptom, unfortunately. It is ok for you > to use as a work-around, but I still hope to find the root cause. > True, but even if we fix the cause, this could still happen. Imagine that one of the two master LEBs gets corrupted due to a flash error or a power failure during a write access. Then the system should be able to mount this damaged file system and restore the lost master LEB. We should try to make UBIFS as robust as possible and handle all possible errors. My patch is only a first idea for solving this kind of problem. I think it must be enhanced to restore the LEB when the filesystem is mounted read-write. I think it is important to be a bit more defensive and assume the worst case. - Stefani ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: ubi_eba_init_scan: cannot reserve enough PEBs 2010-08-31 12:09 ` Stefani Seibold @ 2010-09-01 15:47 ` Artem Bityutskiy 2010-09-02 6:47 ` Stefani Seibold 0 siblings, 1 reply; 25+ messages in thread From: Artem Bityutskiy @ 2010-09-01 15:47 UTC (permalink / raw) To: Stefani Seibold Cc: Kreuzer, Michael (NSN - DE/Ulm), linux-mtd, Pagliari, Vivenzio (NSN - DE/Ulm), Matthew L. Creech Hi, On Tue, 2010-08-31 at 14:09 +0200, Stefani Seibold wrote: > On Sunday, 22.08.2010, at 18:04 +0300, Artem Bityutskiy wrote: > > > > Yes, but your patch fixes the symptom, unfortunately. It is ok for you > > to use as a work-around, but I still hope to find the root cause. > True, but even if we fix the cause, this could still happen. Imagine that one > of the two master LEBs gets corrupted due to a flash error or a power > failure during a write access. Then the system should be able to mount this > damaged file system and restore the lost master LEB. First of all, UBIFS _does_ handle the situation when one master LEB is corrupted. It is designed for this and this part was tested. _But_ UBIFS expects that the master LEB is corrupted in a _certain way_. If it is corrupted in an unexpected way - we panic. To put it differently, we do not handle random corruptions, we handle only corruptions which _look_ like corruptions caused by power cuts. In your case you have a very strange corruption. We can apply your patch, problem solved, but will you be 100% comfortable with this? There is a chance that you have some issues which can later have different symptoms. I am still interested in finding out the real root cause. I will look at your issue as soon as I have time. I'm currently in Brazil at LinuxCon and do not have enough time to look at large things so far. > We should try to make UBIFS as robust as possible and handle all > possible errors. Yes. But again, your case is a failure which does not look like a corruption due to power cuts.
In UBIFS we have certain expectations about how flash behaves, and we designed UBI/UBIFS around these expectations. In your case the corruption does not fit our expectations. So we need to understand what happened. Then we can amend the UBIFS expectations. Thus, I think your patch should not be applied to upstream UBIFS _before_ the reasons for the issue are fully understood. Let's at least _try_; there is no guarantee we can find out what happened, but let's try anyway. > I think it is important to be a bit more defensive and assume the worst > case. We do try to be defensive - we refuse mounting if we see that the FS is screwed up in an unexpected way. Instead of swallowing a corrupted FS and corrupting it even more - we refuse it. That's very defensive! As I explained, we recover only if we see that the corruption looks like power-cut corruption. I am actually trying to help you find the real root cause. Sorry for my stubbornness, but I really am trying to help. -- Best Regards, Artem Bityutskiy (Битюцкий Артём) ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: ubi_eba_init_scan: cannot reserve enough PEBs 2010-09-01 15:47 ` Artem Bityutskiy @ 2010-09-02 6:47 ` Stefani Seibold 2010-09-02 9:45 ` Artem Bityutskiy 0 siblings, 1 reply; 25+ messages in thread From: Stefani Seibold @ 2010-09-02 6:47 UTC (permalink / raw) To: dedekind1 Cc: Kreuzer, Michael (NSN - DE/Ulm), linux-mtd, Pagliari, Vivenzio (NSN - DE/Ulm), Matthew L. Creech On Wednesday, 01.09.2010, at 18:47 +0300, Artem Bityutskiy wrote: > Hi, > > On Tue, 2010-08-31 at 14:09 +0200, Stefani Seibold wrote: > > On Sunday, 22.08.2010, at 18:04 +0300, Artem Bityutskiy wrote: > > > > > > Yes, but your patch fixes the symptom, unfortunately. It is ok for you > > > to use as a work-around, but I still hope to find the root cause. > > > True, but even if we fix the cause, this could still happen. Imagine that one > > of the two master LEBs gets corrupted due to a flash error or a power > > failure during a write access. Then the system should be able to mount this > > damaged file system and restore the lost master LEB. > > First of all, UBIFS _does_ handle the situation when one master LEB is > corrupted. It is designed for this and this part was tested. _But_ UBIFS > expects that the master LEB is corrupted in a _certain way_. If it is > corrupted in an unexpected way - we panic. > > To put it differently, we do not handle random corruptions, we handle > only corruptions which _look_ like corruptions caused by power cuts. > That is the point! Panicking or refusing to mount is not a good solution. If there is a way to mount the file system, then do it. Maybe read-only with a big fat warning, but this is better than nothing. This is like the ext3 fs, which also handles it this way. > In your case you have a very strange corruption. We can apply your patch, > problem solved, but will you be 100% comfortable with this? There is a > chance that you have some issues which can later have different > symptoms. I am still interested in finding out the real root cause.
> No, I will not be 100% comfortable with this, but it is better than nothing. I agree with you that we should fix the cause and not the symptoms, but my patch will help to get your data back in case of this bug. Greetings, Stefani ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: ubi_eba_init_scan: cannot reserve enough PEBs 2010-09-02 6:47 ` Stefani Seibold @ 2010-09-02 9:45 ` Artem Bityutskiy 0 siblings, 0 replies; 25+ messages in thread From: Artem Bityutskiy @ 2010-09-02 9:45 UTC (permalink / raw) To: Stefani Seibold Cc: Kreuzer, Michael (NSN - DE/Ulm), linux-mtd, Pagliari, Vivenzio (NSN - DE/Ulm), Matthew L. Creech On Thu, 2010-09-02 at 08:47 +0200, Stefani Seibold wrote: > > That is the point! Panicking or refusing to mount is not a good solution. If > there is a way to mount the file system, then do it. Maybe read-only > with a big fat warning, but this is better than nothing. This is like > the ext3 fs, which also handles it this way. > > > In your case you have a very strange corruption. We can apply your patch, > > problem solved, but will you be 100% comfortable with this? There is a > > chance that you have some issues which can later have different > > symptoms. I am still interested in finding out the real root cause. > > > > No, I will not be 100% comfortable with this, but it is better than > nothing. I agree with you that we should fix the cause and not the symptoms, but my > patch will help to get your data back in case of this bug. Maybe. I need some more time to think about this. -- Best Regards, Artem Bityutskiy (Битюцкий Артём) ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: ubi_eba_init_scan: cannot reserve enough PEBs 2010-07-27 15:21 ` Artem Bityutskiy 2010-07-28 5:46 ` Stefani Seibold @ 2010-08-22 15:02 ` Artem Bityutskiy 1 sibling, 0 replies; 25+ messages in thread From: Artem Bityutskiy @ 2010-08-22 15:02 UTC (permalink / raw) To: Matthew L. Creech; +Cc: stefani, linux-mtd On Tue, 2010-07-27 at 18:21 +0300, Artem Bityutskiy wrote: > On Tue, 2010-07-27 at 18:12 +0300, Artem Bityutskiy wrote: > > This really does not look like a NAND/MTD driver issue. More look like > > either an UBIFS bug of some kind of corruption which corrupted an EC or > > VID header, then UBI decided to erase this PEB, and then UBIFS reads all > > 0xFFs from there. > > > > The second theory should BTW be fixed. Indeed, when UBI finds a PEB with > > corrupted headers, it adds this PEB to the 'corr' list, and then just > > erases. But this is wrong! It should erase them only if there are all > > 0xFFs in the rest of the block. > > Yeah, indeed looks like a bad bug in UBI. So, when we have some flash > corruptions which corrupt the VID header, UBI just silently erases this > PEB! And then we have small chances to find out why on LEB suddenly > became unmapped (erased). > > UBI logic is - if VID header is corrupted, it is because a sudden power > cut while writing the header. And we can erase the PEB because if we > were writing the header, we have not written the data yet. > > But it does not bother checking what goes _after_ the header. If there > are some data, UBI should not erase the PEB but preserve it and switch > to R/O mode. > > CCing Stefani, I think here group faced a similar issue recently - one > of LEB suddenly disappeared. This may be the reason. > > Then the other question - why VID became corrupted? Dunno, but if UBI > won't erase the PEB we'll have better chances to find this out. Does > this sound reasonable? Are you able to reproduce this problem? Are you still interested in this? 
I'm going to teach UBI to be less harsh and avoid erasing PEBs which have corrupted headers. I'm still thinking about how to do this, though. So, consider the situation where UBI is scanning the flash and encounters a PEB which has corrupted EC and VID headers. Currently UBI just wipes blocks like this. First of all, I do not know how often things like this happen in the wild, in real systems. This should not happen, but I need to be careful. This means that solutions like refusing to attach this MTD device or switching to R/O mode immediately are not really good. So, what I am thinking of doing is to just preserve this PEB. Avoid erasing it, but also put it aside: do not use it for regular UBI I/O purposes, and remove it from the wear-leveling cycle. On NAND, this is doable in most cases, because we have a pool of PEBs reserved for bad eraseblock handling anyway. So UBI can use a PEB from this pool instead of the corrupted one. On NOR, we do not have such a pool. But many systems probably still use fewer PEBs than are available, so in many cases it is OK on NOR too. We can allow for several corrupted PEBs like that. But if we have, say, more than 8 PEBs like that, we can refuse to attach such a flash. But if UBI really runs out of PEBs, and really needs an empty PEB, we can take the preserved corrupted PEBs and use them. In this case, we'll have to erase them. But my hope is that if we really have a nasty corruption, then upper layers like UBIFS will notice this. Then users will have to look at the logs and notice the UBI complaints, and they will have the corrupted PEB for investigation. How does this sound? Ideas? Artem. ^ permalink raw reply [flat|nested] 25+ messages in thread
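The policy proposed in this message can be sketched roughly as follows. This is only an illustration of the idea as described in the mail, not UBI code: `struct corr_pool`, `attach_verdict()` and `take_empty_peb()` are hypothetical names, and the limit of 8 is the "say, more than 8" figure from the text, not a value taken from the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical limit from the mail: "say, more than 8 PEBs like that". */
#define MAX_PRESERVED_CORRUPTED_PEBS 8

enum attach_verdict {
	ATTACH_OK,     /* few enough corrupted PEBs: keep them aside and attach */
	ATTACH_REFUSE  /* too many corrupted PEBs: refuse to attach the flash */
};

/* Illustrative bookkeeping only; this is not the kernel's struct ubi_device. */
struct corr_pool {
	int preserved; /* corrupted PEBs kept aside, not erased */
	int free_pebs; /* empty PEBs available for regular I/O */
};

/* Decide at scan time whether the number of corrupted PEBs is tolerable. */
enum attach_verdict attach_verdict(int corrupted_found)
{
	if (corrupted_found > MAX_PRESERVED_CORRUPTED_PEBS)
		return ATTACH_REFUSE;
	return ATTACH_OK;
}

/*
 * Hand out one empty PEB. Regular free PEBs are used first; a preserved
 * corrupted PEB is erased and reused only as a last resort. Returns false
 * when nothing is left. *erased_preserved reports whether evidence was lost.
 */
bool take_empty_peb(struct corr_pool *p, bool *erased_preserved)
{
	*erased_preserved = false;
	if (p->free_pebs > 0) {
		p->free_pebs--;
		return true;
	}
	if (p->preserved > 0) {
		p->preserved--;
		*erased_preserved = true;
		return true;
	}
	return false;
}
```

The point of reporting `erased_preserved` separately is that erasing a preserved PEB destroys the material a developer would want to inspect, so a real implementation would presumably log that event.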
* Re: ubi_eba_init_scan: cannot reserve enough PEBs 2010-07-27 15:12 ` Artem Bityutskiy 2010-07-27 15:21 ` Artem Bityutskiy @ 2010-07-27 20:47 ` Matthew L. Creech 2010-07-30 16:12 ` Artem Bityutskiy 2010-08-22 18:30 ` Artem Bityutskiy 2 siblings, 1 reply; 25+ messages in thread From: Matthew L. Creech @ 2010-07-27 20:47 UTC (permalink / raw) To: dedekind1; +Cc: linux-mtd On Tue, Jul 27, 2010 at 11:12 AM, Artem Bityutskiy <dedekind1@gmail.com> wrote: > > OK, you are right, UBI should not bug you so early, there are still > plenty of reserved PEBs left. What do you think about the following > algorithm: > > 1. If this is a new image, preserve current behavior and warn. > 2. If we see that this is a system which has already been used, we warn > only when the reserve is really about to end, say, 5% of the reserve is > left. > Sounds fine to me. And the warning as-is isn't necessarily inaccurate; were it not for the errors later on, I probably would've assumed (correctly) that it's simply due to the fact that some NAND blocks which were initially good have since gone bad, causing my reserve pool of eraseblocks to drop. Then again, that should probably be expected on any long-running NAND device, so it might make sense to only show the warning on a new image. :) >> Could this account for the warning and/or the UBIFS error below? Or >> would these kinds of problems manifest in a different way entirely? > > Well, theoretically they can. But if users did not re-flash your > devices, then obviously not. > > > I'm sure your ring buffer contains more information. This is one of the > reasons I gave you the above link - it explains that not all messages go > to console and how to get all meassages. Try to use dmesg. In UBIFS code > I see that 'ubifs_read_node()' calls 'dbg_dump_node()' which should dump > the node. > Sorry, I missed the bit about "ignore_loglevel" on serial consoles. 
A more complete log is available here (it's around 5MB): http://mcreech.com/work/ubi-error.txt > > May be if I have a NAND dump of your broken device I can look at it, but > do not promise anything, and I'm also on holiday :-) > Sure, I'll try to set up a NFS root so that I can boot without flash. I realize it might not help much in diagnosing this problem after-the-fact, though. > > What is your kernel? If it is old, make sure you have fixes from the > back-port trees. > Vanilla 2.6.31, plus patches for UnionFS and YAFFS (unused) support and a few board-specific items. One of the devices was running development firmware, so it was using 2.6.34 at the time at which problems were first seen. So I'm assuming that kernel version probably doesn't make much difference, unless there are significant changes sitting in the UBI git tree that don't get pushed upstream as part of the kernel release cycle. > > This really does not look like a NAND/MTD driver issue. More look like > either an UBIFS bug of some kind of corruption which corrupted an EC or > VID header, then UBI decided to erase this PEB, and then UBIFS reads all > 0xFFs from there. > > The second theory should BTW be fixed. Indeed, when UBI finds a PEB with > corrupted headers, it adds this PEB to the 'corr' list, and then just > erases. But this is wrong! It should erase them only if there are all > 0xFFs in the rest of the block. > Makes sense. Unfortunately it's difficult to reproduce the problem (I've certainly tried), so this change probably wouldn't help me in the short-term. However, it would definitely help if/when I encounter the issue again on another device, and will certainly help anybody else who sees similar issues in the future. Thanks again for your help Artem (especially while on vacation). :) -- Matthew L. Creech ^ permalink raw reply [flat|nested] 25+ messages in thread
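The check discussed above ("erase only if there are all 0xFFs in the rest of the block") could look something like the sketch below. It is a user-space illustration under stated assumptions, not the UBI implementation: `all_ff()` and `safe_to_erase()` are made-up helper names, and the caller is assumed to have read the whole PEB into memory and to know the data offset (2048 in the attach log at the top of this thread).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if every byte in buf[0..len) is 0xFF, i.e. looks erased. */
bool all_ff(const unsigned char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (buf[i] != 0xFF)
			return false;
	return true;
}

/*
 * Decide whether a PEB with a corrupted VID header may be erased.
 * If the data area after the headers is still all 0xFF, the corruption is
 * consistent with a power cut during the header write, and erasing loses
 * nothing. If data is present, the PEB should be preserved instead.
 */
bool safe_to_erase(const unsigned char *peb, size_t peb_size, size_t data_offset)
{
	if (data_offset > peb_size)
		return false; /* nonsensical geometry: be conservative */
	return all_ff(peb + data_offset, peb_size - data_offset);
}
```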
* Re: ubi_eba_init_scan: cannot reserve enough PEBs 2010-07-27 20:47 ` Matthew L. Creech @ 2010-07-30 16:12 ` Artem Bityutskiy 2010-07-30 17:51 ` Matthew L. Creech 0 siblings, 1 reply; 25+ messages in thread From: Artem Bityutskiy @ 2010-07-30 16:12 UTC (permalink / raw) To: Matthew L. Creech; +Cc: linux-mtd On Tue, 2010-07-27 at 16:47 -0400, Matthew L. Creech wrote: > On Tue, Jul 27, 2010 at 11:12 AM, Artem Bityutskiy <dedekind1@gmail.com> wrote: > > > > OK, you are right, UBI should not bug you so early, there are still > > plenty of reserved PEBs left. What do you think about the following > > algorithm: > > > > 1. If this is a new image, preserve current behavior and warn. > > 2. If we see that this is a system which has already been used, we warn > > only when the reserve is really about to end, say, 5% of the reserve is > > left. > > > > Sounds fine to me. And the warning as-is isn't necessarily > inaccurate; were it not for the errors later on, I probably would've > assumed (correctly) that it's simply due to the fact that some NAND > blocks which were initially good have since gone bad, causing my > reserve pool of eraseblocks to drop. > > Then again, that should probably be expected on any long-running NAND > device, so it might make sense to only show the warning on a new > image. :) Something like this, I guess, would be good enough? >From fc3f6446e374f31c37ee0b5a4fc5de2e51d9e3de Mon Sep 17 00:00:00 2001 From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Date: Fri, 30 Jul 2010 14:59:50 +0300 Subject: [PATCH] UBI: do not warn unnecessarily Currently, when UBI attaches an MTD device and cannot reserve all 1% (by default) of PEBs for bad eraseblocks handling, it prints a warning. However, Matthew L. Creech <mlcreech@gmail.com> is not very happy to see this warning, because he did reserve enough of PEB at the beginning, but with time some PEBs became bad. The warning is not necessary in this case. 
This patch makes UBI print the warning o if this is a new image o of this is used image and the amount of reserved PEBs is only 10% (or less) of the size of the reserved PEB pool. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> --- drivers/mtd/ubi/build.c | 3 ++- drivers/mtd/ubi/eba.c | 42 +++++++++++++++++++++++++++++++++++++++--- 2 files changed, 41 insertions(+), 4 deletions(-) diff --git a/drivers/mtd/ubi/build.c b/drivers/mtd/ubi/build.c index 13b05cb..78ae894 100644 --- a/drivers/mtd/ubi/build.c +++ b/drivers/mtd/ubi/build.c @@ -593,6 +593,7 @@ static int attach_by_scanning(struct ubi_device *ubi) ubi->good_peb_count = ubi->peb_count - ubi->bad_peb_count; ubi->max_ec = si->max_ec; ubi->mean_ec = si->mean_ec; + ubi_msg("max. sequence number: %llu", si->max_sqnum); err = ubi_read_volume_table(ubi, si); if (err) @@ -981,7 +982,7 @@ int ubi_attach_mtd_dev(struct mtd_info *mtd, int ubi_num, int vid_hdr_offset) ubi_msg("number of PEBs reserved for bad PEB handling: %d", ubi->beb_rsvd_pebs); ubi_msg("max/mean erase counter: %d/%d", ubi->max_ec, ubi->mean_ec); - ubi_msg("image sequence number: %d", ubi->image_seq); + ubi_msg("image sequence number: %d", ubi->image_seq); /* * The below lock makes sure we do not race with 'ubi_thread()' which diff --git a/drivers/mtd/ubi/eba.c b/drivers/mtd/ubi/eba.c index b582671..fe74749 100644 --- a/drivers/mtd/ubi/eba.c +++ b/drivers/mtd/ubi/eba.c @@ -1166,6 +1166,44 @@ out_unlock_leb: } /** + * print_rsvd_warning - warn about not having enough reserved PEBs. + * @ubi: UBI device description object + * + * This is a helper function for 'ubi_eba_init_scan()' which is called when UBI + * cannot reserve enough PEBs for bad block handling. This function makes a + * decision whether we have to print a warning or not. 
The algorithm is as + * follows: + * o if this is a new UBI image, then just print the warning + * o if this is an UBI image which has already been used for some time, print + * a warning only if we can reserve less than 10% of the expected amount of + * the reserved PEB. + * + * The idea is that when UBI is used, PEBs become bad, and the reserved pool + * of PEBs becomes smaller, which is normal and we do not want to scare users + * with a warning every time they attach the MTD device. This was an issue + * reported by real users. + */ +static void print_rsvd_warning(struct ubi_device *ubi, + struct ubi_scan_info *si) +{ + /* + * The 1 << 18 (256KiB) number is picked randomly, just a reasonably + * large number to distinguish between newly flashed and used images. + */ + if (si->max_sqnum > (1 << 18)) { + int min = ubi->beb_rsvd_level / 10; + + if (!min) + min = 1; + if (ubi->beb_rsvd_pebs > min) + return; + } + + ubi_warn("cannot reserve enough PEBs for bad PEB handling, reserved %d," + " need %d", ubi->beb_rsvd_pebs, ubi->beb_rsvd_level); +} + +/** * ubi_eba_init_scan - initialize the EBA sub-system using scanning information. * @ubi: UBI device description object * @si: scanning information @@ -1237,9 +1275,7 @@ int ubi_eba_init_scan(struct ubi_device *ubi, struct ubi_scan_info *si) if (ubi->avail_pebs < ubi->beb_rsvd_level) { /* No enough free physical eraseblocks */ ubi->beb_rsvd_pebs = ubi->avail_pebs; - ubi_warn("cannot reserve enough PEBs for bad PEB " - "handling, reserved %d, need %d", - ubi->beb_rsvd_pebs, ubi->beb_rsvd_level); + print_rsvd_warning(ubi, si); } else ubi->beb_rsvd_pebs = ubi->beb_rsvd_level; -- 1.6.2.5 -- Best Regards, Artem Bityutskiy (Артём Битюцкий) ^ permalink raw reply related [flat|nested] 25+ messages in thread
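To make the effect of this patch concrete, here is a small stand-alone sketch that plugs in the numbers from the original report (8117 good PEBs, 75 reserved, 81 needed). It restates the patch's decision in user-space C; `rsvd_level()` and `should_warn()` are illustrative names, and the 1% figure is the default mentioned in the commit message. The assumption that the level is computed with integer division is mine, though it does reproduce the "need 81" from the log.

```c
#include <assert.h>
#include <stdbool.h>

/* Same threshold as in the patch: images past this sqnum count as "used". */
#define USED_IMAGE_SQNUM (1ULL << 18)

/* Reserve level as a percentage of good PEBs (assumed integer division). */
int rsvd_level(int good_pebs, int percent)
{
	return good_pebs / 100 * percent;
}

/*
 * Restatement of the warning decision. Unlike print_rsvd_warning(), this
 * also folds in the caller's check that the reserve is actually short.
 */
bool should_warn(unsigned long long max_sqnum, int rsvd_pebs, int level)
{
	if (rsvd_pebs >= level)
		return false; /* reserve is full, nothing to warn about */

	if (max_sqnum > USED_IMAGE_SQNUM) {
		int min = level / 10;

		if (!min)
			min = 1;
		if (rsvd_pebs > min)
			return false; /* used image, more than 10% of the reserve left */
	}
	return true;
}
```

With 8117 good PEBs the level is 81 and the 10% floor is 8, so on a used image a reserve of 75 stays quiet, while a freshly flashed image with the same numbers would still print the warning.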
* Re: ubi_eba_init_scan: cannot reserve enough PEBs 2010-07-30 16:12 ` Artem Bityutskiy @ 2010-07-30 17:51 ` Matthew L. Creech 2010-08-02 4:22 ` Artem Bityutskiy 0 siblings, 1 reply; 25+ messages in thread From: Matthew L. Creech @ 2010-07-30 17:51 UTC (permalink / raw) To: dedekind1; +Cc: linux-mtd On Fri, Jul 30, 2010 at 12:12 PM, Artem Bityutskiy <dedekind1@gmail.com> wrote: > > Something like this, I guess, would be good enough? > ... > + * o if this is a new UBI image, then just print the warning > + * o if this is an UBI image which has already been used for some time, print > + * a warning only if we can reserve less than 10% of the expected amount of > + * the reserved PEB. > + * > + * The idea is that when UBI is used, PEBs become bad, and the reserved pool > + * of PEBs becomes smaller, which is normal and we do not want to scare users > + * with a warning every time they attach the MTD device. This was an issue > + * reported by real users. This sounds like a good compromise to me. I was wondering whether we'd still want to warn in the case in which nearly all of the reserve PEBs had been exhausted, and using a 10% threshold seems like a good way to accomplish that without scaring others unnecessarily. :) Thanks -- Matthew L. Creech ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: ubi_eba_init_scan: cannot reserve enough PEBs 2010-07-30 17:51 ` Matthew L. Creech @ 2010-08-02 4:22 ` Artem Bityutskiy 0 siblings, 0 replies; 25+ messages in thread From: Artem Bityutskiy @ 2010-08-02 4:22 UTC (permalink / raw) To: Matthew L. Creech; +Cc: linux-mtd On Fri, 2010-07-30 at 13:51 -0400, Matthew L. Creech wrote: > On Fri, Jul 30, 2010 at 12:12 PM, Artem Bityutskiy <dedekind1@gmail.com> wrote: > > > > Something like this, I guess, would be good enough? > > > ... > > + * o if this is a new UBI image, then just print the warning > > + * o if this is an UBI image which has already been used for some time, print > > + * a warning only if we can reserve less than 10% of the expected amount of > > + * the reserved PEB. > > + * > > + * The idea is that when UBI is used, PEBs become bad, and the reserved pool > > + * of PEBs becomes smaller, which is normal and we do not want to scare users > > + * with a warning every time they attach the MTD device. This was an issue > > + * reported by real users. > > This sounds like a good compromise to me. I was wondering whether > we'd still want to warn in the case in which nearly all of the reserve > PEBs had been exhausted, and using a 10% threshold seems like a good > way to accomplish that without scaring others unnecessarily. :) Pushed this patch to ubi-2.6.git -- Best Regards, Artem Bityutskiy (Артём Битюцкий) ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: ubi_eba_init_scan: cannot reserve enough PEBs 2010-07-27 15:12 ` Artem Bityutskiy 2010-07-27 15:21 ` Artem Bityutskiy 2010-07-27 20:47 ` Matthew L. Creech @ 2010-08-22 18:30 ` Artem Bityutskiy 2010-08-24 22:38 ` Matthew L. Creech 2 siblings, 1 reply; 25+ messages in thread From: Artem Bityutskiy @ 2010-08-22 18:30 UTC (permalink / raw) To: Matthew L. Creech; +Cc: linux-mtd On Tue, 2010-07-27 at 18:12 +0300, Artem Bityutskiy wrote: > > Certainly. I enabled all the relevant UBI and UBIFS debugging options > > that I saw, along with internal self-checks, but there's still not a > > whole lot of output. Full console dump is attached - this is a > > different device than the first, but exhibits the same problem. > > I'm sure your ring buffer contains more information. This is one of the > reasons I gave you the above link - it explains that not all messages go > to console and how to get all meassages. Try to use dmesg. In UBIFS code > I see that 'ubifs_read_node()' calls 'dbg_dump_node()' which should dump > the node. > > But '255' is 0xFF, so probably UBIFS read all 0xFF. This may be an UBIFS > bug, or some corruption, difficult to say. For some reason the place > where a valid znode should live is erased. > > May be if I have a NAND dump of your broken device I can look at it, but > do not promise anything, and I'm also on holiday :-) Could you please dump LEB 5586 and check whether it is really erased or not? Please, apply the following patch: >From feb1616809b0bebeaf0cb596fdb5d715f6d75700 Mon Sep 17 00:00:00 2001 From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Date: Sun, 22 Aug 2010 21:27:30 +0300 Subject: [PATCH] UBIFS: improve error reporting when reading bad node When an error happens during validation of read node, the typical situation is that the LEB we read is unmapped (due to some bug). It is handy to include the mapping status into the error message. 
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> --- fs/ubifs/io.c | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/fs/ubifs/io.c b/fs/ubifs/io.c index bcf5a16..9432431 100644 --- a/fs/ubifs/io.c +++ b/fs/ubifs/io.c @@ -815,7 +815,8 @@ int ubifs_read_node(const struct ubifs_info *c, void *buf, int type, int len, return 0; out: - ubifs_err("bad node at LEB %d:%d", lnum, offs); + ubifs_err("bad node at LEB %d:%d, LEB mapping status %d", lnum, offs, + ubi_is_mapped(c->ubi, lnum)); dbg_dump_node(c, buf); dbg_dump_stack(); return -EINVAL; -- 1.7.2.1 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: ubi_eba_init_scan: cannot reserve enough PEBs 2010-08-22 18:30 ` Artem Bityutskiy @ 2010-08-24 22:38 ` Matthew L. Creech 2010-08-25 3:51 ` Artem Bityutskiy 2010-08-31 15:36 ` Matthew L. Creech 0 siblings, 2 replies; 25+ messages in thread From: Matthew L. Creech @ 2010-08-24 22:38 UTC (permalink / raw) To: dedekind1; +Cc: linux-mtd On Sun, Aug 22, 2010 at 2:30 PM, Artem Bityutskiy <dedekind1@gmail.com> wrote: > > Could you please dump LEB 5586 and check whether it is really erased or > not? Please, apply the following patch: > ... Hi Artem, I applied your patch to 2.6.35 and booted that on the same bad device for which I posted a boot log. But to my surprise, it didn't give any errors at all - the boot process completed normally, and it appears to be working fine so far. I had booted the kernel from memory, so I pulled the power and let it go through the normal boot process (using the 2.6.31 kernel that's built in to the firmware) - that also works fine. I checked my sanity by trying another bad device, and the same thing happened there - booting 2.6.35 somehow "fixed" my problem. Unfortunately, this means I can no longer tell what happened to the original block you were interested in. I'll try to dig up another bad device, and use your patch with an older kernel version to see what happens there. Is it at all possible that the error was caused by something at the UBI or MTD layer which was fixed between 2.6.34 and 2.6.35, and booting up with 2.6.35 "touched" something that made it work again even after reverting to an older kernel? Sounds pretty far-fetched, but I don't know how else to explain these suddenly-recovered devices... -- Matthew L. Creech ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: ubi_eba_init_scan: cannot reserve enough PEBs 2010-08-24 22:38 ` Matthew L. Creech @ 2010-08-25 3:51 ` Artem Bityutskiy 2010-08-31 15:36 ` Matthew L. Creech 1 sibling, 0 replies; 25+ messages in thread From: Artem Bityutskiy @ 2010-08-25 3:51 UTC (permalink / raw) To: Matthew L. Creech; +Cc: linux-mtd Hi, On Tue, 2010-08-24 at 18:38 -0400, Matthew L. Creech wrote: > I applied your patch to 2.6.35 and booted that on the same bad device > for which I posted a boot log. But to my surprise, it didn't give any > errors at all - the boot process completed normally, and it appears to > be working fine so far. I had booted the kernel from memory, so I > pulled the power and let it go through the normal boot process (using > the 2.6.31 kernel that's built in to the firmware) - that also works > fine. > > I checked my sanity by trying another bad device, and the same thing > happened there - booting 2.6.35 somehow "fixed" my problem. > Unfortunately, this means I can no longer tell what happened to the > original block you were interested in. I'll try to dig up another bad > device, and use your patch with an older kernel version to see what > happens there. This is interesting. BTW, if you use 2.6.31, you should in any case apply patches from the ubifs-v2.6.31 back-port tree, there were some good fixes. > Is it at all possible that the error was caused by something at the > UBI or MTD layer which was fixed between 2.6.34 and 2.6.35, and > booting up with 2.6.35 "touched" something that made it work again > even after reverting to an older kernel? Sounds pretty far-fetched, > but I don't know how else to explain these suddenly-recovered > devices... OK :-) Artem. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: ubi_eba_init_scan: cannot reserve enough PEBs
  2010-08-24 22:38 ` Matthew L. Creech
  2010-08-25  3:51   ` Artem Bityutskiy
@ 2010-08-31 15:36   ` Matthew L. Creech
  2010-09-01 18:57     ` Artem Bityutskiy
  1 sibling, 1 reply; 25+ messages in thread
From: Matthew L. Creech @ 2010-08-31 15:36 UTC
To: dedekind1; +Cc: JamesLNute, linux-mtd

On Tue, Aug 24, 2010 at 6:38 PM, Matthew L. Creech <mlcreech@gmail.com> wrote:
>
> I applied your patch to 2.6.35 and booted that on the same bad device
> for which I posted a boot log. But to my surprise, it didn't give any
> errors at all - the boot process completed normally, and it appears to
> be working fine so far. I had booted the kernel from memory, so I
> pulled the power and let it go through the normal boot process (using
> the 2.6.31 kernel that's built in to the firmware) - that also works
> fine.
>

Another device had similar UBIFS corruption, so this time we stayed
with 2.6.34 and added the mapping status to that error message. This
is the output:

UBIFS error (pid 455): ubifs_read_node: bad node type (255 but expected 1)
UBI DBG (pid 455): ubi_is_mapped: test LEB 0:7746
UBIFS error (pid 455): ubifs_read_node: bad node at LEB 7746:110360, LEB mapping status 0
Not a node, first 24 bytes:
00000000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ........................
Call Trace:
[c7b69bf0] [c0008410] show_stack+0x54/0x134 (unreliable)
[c7b69c30] [c00cfeb8] ubifs_read_node+0x2f0/0x308
[c7b69c60] [c00ef4a8] ubifs_tnc_read_node+0x60/0x188
[c7b69ca0] [c00d34a4] ubifs_tnc_locate+0xd0/0x1d8
[c7b69d00] [c00c5818] do_readpage+0x1d0/0x4b8
[c7b69d50] [c00c7410] ubifs_readpage+0x3ec/0x400
--- Exception: 901 at flush_dcache_icache_page+0x24/0x30
    LR = flush_dcache_icache_page+0x20/0x30
[c7b69db0] [c0048c44] generic_file_aio_read+0x454/0x630 (unreliable)
[c7b69e50] [c00698d4] do_sync_read+0xa4/0xe0
[c7b69ef0] [c0069f5c] vfs_read+0xc4/0x16c
[c7b69f10] [c006a29c] sys_read+0x4c/0x80
[c7b69f40] [c000f40c] ret_from_syscall+0x0/0x38
--- Exception: c01 at 0xfb45b4c
    LR = 0xfb45b34

Is this what was expected?

FYI, this particular device only has corruption on a file that's not
boot-critical, so it continues the boot process somewhat normally. A
partial log (cut off once it seemed that nothing else useful was
happening) is here, if that's of any use:

http://mcreech.com/work/ubifs-mapping-status.txt

It's also worth mentioning that when enabling UBI & UBIFS self checks
on this device, the boot process halts after some errors are detected.
The pertinent portion is:

UBIFS error (pid 1): scan_check_cb: bad accounting of LEB 31: free 0, dirty 560 flags 0x0, should be free 129024, dirty 0
(pid 1) start dumping LEB 31
UBIFS DBG (pid 1): ubifs_start_scan: scan LEB 31:0
UBI DBG (pid 1): ubi_leb_read: read 129024 bytes from LEB 0:31:0
UBI DBG (pid 1): ubi_eba_read_leb: read 129024 bytes from offset 0 of LEB 0:31 (unmapped)
UBIFS DBG (pid 1): ubifs_scan: look at LEB 31:0 (129024 bytes left)
UBIFS DBG (pid 1): ubifs_scan_a_node: hit empty space
UBIFS DBG (pid 1): ubifs_end_scan: stop scanning LEB 31 at offset 0
LEB 31 has 0 nodes ending at 0
(pid 1) finish dumping LEB 31
UBIFS error (pid 1): do_commit: commit failed, error -22
UBIFS warning (pid 1): ubifs_ro_mode: switched to read-only mode, error -22
Call Trace:
[c7827d00] [c0008410] show_stack+0x54/0x134 (unreliable)
[c7827d40] [c00ce774] ubifs_ro_mode+0x60/0x70
[c7827d50] [c00daf40] do_commit+0x5f0/0x5fc
[c7827dd0] [c00eab14] ubifs_rcvry_gc_commit+0x440/0x46c
[c7827e10] [c00cc900] ubifs_get_sb+0xe60/0x1734
[c7827ea0] [c006c794] vfs_kern_mount+0x68/0xf0
[c7827ed0] [c006c85c] do_kern_mount+0x40/0xf0
[c7827f00] [c0080dd8] do_mount+0x634/0x6a0
[c7827f50] [c0080f28] sys_mount+0x90/0xcc
[c7827f80] [c02cdec8] mount_block_root+0x108/0x284
[c7827fd0] [c02ce288] prepare_namespace+0xac/0x1e8
[c7827fe0] [c02cd820] kernel_init+0x144/0x154
[c7827ff0] [c000f230] kernel_thread+0x4c/0x68

I've no idea if this is related or not, but there's a full log of that
run available here:

http://mcreech.com/work/ubifs-self-checks.txt

Please let me know if there are any other tests you'd like run on this
device. Otherwise, we'll probably try booting 2.6.35 and see what
happens.

Thanks!

--
Matthew L. Creech
* Re: ubi_eba_init_scan: cannot reserve enough PEBs
  2010-08-31 15:36 ` Matthew L. Creech
@ 2010-09-01 18:57   ` Artem Bityutskiy
  2010-09-06  9:17     ` Artem Bityutskiy
  0 siblings, 1 reply; 25+ messages in thread
From: Artem Bityutskiy @ 2010-09-01 18:57 UTC
To: Matthew L. Creech, Adrian.Hunter; +Cc: JamesLNute, linux-mtd

Hi,

On Tue, 2010-08-31 at 11:36 -0400, Matthew L. Creech wrote:
> UBIFS error (pid 455): ubifs_read_node: bad node type (255 but expected 1)
> UBI DBG (pid 455): ubi_is_mapped: test LEB 0:7746
> UBIFS error (pid 455): ubifs_read_node: bad node at LEB 7746:110360,
> LEB mapping status 0

OK, so this LEB is unmapped.

> Not a node, first 24 bytes:
> 00000000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ff ff ff ff  ........................
> Call Trace:
> [c7b69bf0] [c0008410] show_stack+0x54/0x134 (unreliable)
> [c7b69c30] [c00cfeb8] ubifs_read_node+0x2f0/0x308
> [c7b69c60] [c00ef4a8] ubifs_tnc_read_node+0x60/0x188
> [c7b69ca0] [c00d34a4] ubifs_tnc_locate+0xd0/0x1d8
> [c7b69d00] [c00c5818] do_readpage+0x1d0/0x4b8
> [c7b69d50] [c00c7410] ubifs_readpage+0x3ec/0x400
> --- Exception: 901 at flush_dcache_icache_page+0x24/0x30
>     LR = flush_dcache_icache_page+0x20/0x30
> [c7b69db0] [c0048c44] generic_file_aio_read+0x454/0x630 (unreliable)
> [c7b69e50] [c00698d4] do_sync_read+0xa4/0xe0
> [c7b69ef0] [c0069f5c] vfs_read+0xc4/0x16c
> [c7b69f10] [c006a29c] sys_read+0x4c/0x80
> [c7b69f40] [c000f40c] ret_from_syscall+0x0/0x38
> --- Exception: c01 at 0xfb45b4c
>     LR = 0xfb45b34
>
> Is this what was expected?

Well, it is what I thought it would be.

> FYI, this particular device only has corruption on a file that's not
> boot-critical, so it continues the boot process somewhat normally. A
> partial log (cut off once it seemed that nothing else useful was
> happening) is here, if that's of any use:
>
> http://mcreech.com/work/ubifs-mapping-status.txt
>
> It's also worth mentioning that when enabling UBI & UBIFS self checks
> on this device, the boot process halts after some errors are detected.
> The pertinent portion is:
>
> UBIFS error (pid 1): scan_check_cb: bad accounting of LEB 31: free 0,
> dirty 560 flags 0x0, should be free 129024, dirty 0
> (pid 1) start dumping LEB 31
> UBIFS DBG (pid 1): ubifs_start_scan: scan LEB 31:0
> UBI DBG (pid 1): ubi_leb_read: read 129024 bytes from LEB 0:31:0
> UBI DBG (pid 1): ubi_eba_read_leb: read 129024 bytes from offset 0 of
> LEB 0:31 (unmapped)
> UBIFS DBG (pid 1): ubifs_scan: look at LEB 31:0 (129024 bytes left)
> UBIFS DBG (pid 1): ubifs_scan_a_node: hit empty space
> UBIFS DBG (pid 1): ubifs_end_scan: stop scanning LEB 31 at offset 0
> LEB 31 has 0 nodes ending at 0
> (pid 1) finish dumping LEB 31
> UBIFS error (pid 1): do_commit: commit failed, error -22
> UBIFS warning (pid 1): ubifs_ro_mode: switched to read-only mode, error -22
> Call Trace:
> [c7827d00] [c0008410] show_stack+0x54/0x134 (unreliable)
> [c7827d40] [c00ce774] ubifs_ro_mode+0x60/0x70
> [c7827d50] [c00daf40] do_commit+0x5f0/0x5fc
> [c7827dd0] [c00eab14] ubifs_rcvry_gc_commit+0x440/0x46c
> [c7827e10] [c00cc900] ubifs_get_sb+0xe60/0x1734
> [c7827ea0] [c006c794] vfs_kern_mount+0x68/0xf0
> [c7827ed0] [c006c85c] do_kern_mount+0x40/0xf0
> [c7827f00] [c0080dd8] do_mount+0x634/0x6a0
> [c7827f50] [c0080f28] sys_mount+0x90/0xcc
> [c7827f80] [c02cdec8] mount_block_root+0x108/0x284
> [c7827fd0] [c02ce288] prepare_namespace+0xac/0x1e8
> [c7827fe0] [c02cd820] kernel_init+0x144/0x154
> [c7827ff0] [c000f230] kernel_thread+0x4c/0x68
>
> I've no idea if this is related or not, but there's a full log of that
> run available here:
>
> http://mcreech.com/work/ubifs-self-checks.txt
>
> Please let me know if there are any other tests you'd like run on this
> device. Otherwise, we'll probably try booting 2.6.35 and see what
> happens.

I need to take some time and carefully look at this. And think. Please,
make a copy of the contents of your flash, if you can.

From your side what would be helpful is if you tried to figure out how
to reproduce this. Since I do not have your HW I cannot reproduce this,
so the only thing I can do is to ask you to reproduce the problem with
various debugging patches.

CCing Adrian - maybe he has ideas.

--
Best Regards,
Artem Bityutskiy (Битюцкий Артём)
* Re: ubi_eba_init_scan: cannot reserve enough PEBs
  2010-09-01 18:57 ` Artem Bityutskiy
@ 2010-09-06  9:17   ` Artem Bityutskiy
  2010-09-07 15:59     ` Matthew L. Creech
  0 siblings, 1 reply; 25+ messages in thread
From: Artem Bityutskiy @ 2010-09-06 9:17 UTC
To: Matthew L. Creech; +Cc: JamesLNute, linux-mtd, Adrian.Hunter

On Wed, 2010-09-01 at 21:57 +0300, Artem Bityutskiy wrote:
> > Please let me know if there are any other tests you'd like run on this
> > device. Otherwise, we'll probably try booting 2.6.35 and see what
> > happens.
>
> I need to take some time and carefully look at this. And think. Please,
> make a copy of the contents of your flash, if you can.
>
> From your side what would be helpful is if you tried to figure out how
> to reproduce this. Since I do not have your HW I cannot reproduce this,
> so the only thing I can do is to ask you to reproduce the problem with
> various debugging patches.

Matthew,

I've sent a series of UBI patches which may help us to find out the
root cause of your issues.

The other idea which would definitely help is to create a debugging
patch which will track all erasures of PEBs and store them somewhere. I
do not know which tracing/debugging tools you have; if you have some
fast tracing you can just send this info via your tracing interface. But
if you don't, you can use other flash or another partition on your flash
and store the info.

Here is my thinking: the UBIFS index points to an unmapped LEB. There
are 2 possibilities: either the index is incorrect, or someone (UBI or
UBIFS) mistakenly unmapped a needed LEB. I personally think it is most
probably the latter. So we need to gather information about all unmap
operations:

1. Which LEB and which PEB is unmapped. The best place to get this info
   is the 'do_sync_erase()' function. But it does not have the LEB
   number, so we need to add an 'int lnum' parameter there, and amend
   the callers as well. It is some work, but should not be too
   difficult.

2. Then we need to know who unmapped the LEB - we need the stackdump.
   Normally, we use the 'dump_stack()' function to print a stack dump -
   but it prints to the log buffer. So we need a function which gives us
   an array of addresses which we then can save and later transform to
   symbols. Or we need a func which gives us a string containing an
   array of addresses. We probably need to implement it. But I think
   kmemleak is doing something like that - we can look there.

   But also, make sure no one in UBI uses mtd->erase directly, just in
   case. I think all erases should go via 'do_sync_erase()'.

3. Most erasures are done in the background thread, so the stackdump
   will point to the background thread, which is not informative at
   all. This means we should also print PEB/LEB/stackdump in
   'schedule_erase()' to track all places where the erasure is
   initiated.

So, for each PEB erase / LEB unmap we store 1, 2 and 3 somewhere. When
we hit the UBIFS bug, we can see how the LEB was unmapped and how the
PEB was erased - this should give us an idea of what happened.

Do you think you can do something like this? I do not think I will have
time for this in the near future. What do you think?

But of course, if you learn how to reproduce this - this would be great.

--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
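[Editorial note] Items 1-3 above amount to one record per event: the operation ('schedule_erase()' or 'do_sync_erase()'), the LEB, the PEB, and the caller addresses. Once such records have been captured to a file, a small script can pull out the history of a single LEB, for example LEB 7746 from the error report. The following is only a sketch in Python. The "UBI TRACE:" line format is hypothetical (nothing in this thread defines it, and a real debugging patch would choose its own), and the addresses in SAMPLE are made up.

```python
import re
from collections import defaultdict

# Hypothetical trace line format, assumed for illustration only:
#   UBI TRACE: <op> vol=<n> lnum=<n> pnum=<n> stack=<hex addr> <hex addr> ...
TRACE_RE = re.compile(
    r"UBI TRACE: (?P<op>schedule_erase|do_sync_erase) "
    r"vol=(?P<vol>\d+) lnum=(?P<lnum>-?\d+) pnum=(?P<pnum>\d+) "
    r"stack=(?P<stack>[0-9a-f ]*)"
)


def index_erasures(lines):
    """Group trace records by (vol, lnum), preserving log order."""
    by_leb = defaultdict(list)
    for seq, line in enumerate(lines):
        m = TRACE_RE.search(line)
        if m is None:
            continue  # ordinary kernel message, not a trace record
        record = {
            "seq": seq,                   # position in the captured log
            "op": m["op"],                # where the event was logged
            "pnum": int(m["pnum"]),       # physical eraseblock number
            "stack": m["stack"].split(),  # raw caller addresses
        }
        by_leb[(int(m["vol"]), int(m["lnum"]))].append(record)
    return by_leb


def history(by_leb, vol, lnum):
    """Return every recorded erase event for one LEB, oldest first."""
    return by_leb.get((vol, lnum), [])


# Made-up sample input mixing trace records with a normal log line.
SAMPLE = [
    "UBI TRACE: schedule_erase vol=0 lnum=7746 pnum=512 stack=c0000001 c0000002",
    "UBIFS DBG (pid 1): ubifs_scan: look at LEB 31:0 (129024 bytes left)",
    "UBI TRACE: do_sync_erase vol=0 lnum=7746 pnum=512 stack=c0000003",
    "UBI TRACE: do_sync_erase vol=0 lnum=31 pnum=77 stack=c0000003",
]

if __name__ == "__main__":
    for rec in history(index_erasures(SAMPLE), 0, 7746):
        print(rec["seq"], rec["op"], rec["pnum"], " ".join(rec["stack"]))
```

The raw addresses would still have to be translated to symbols against the matching kernel image, as item 2 notes.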
* Re: ubi_eba_init_scan: cannot reserve enough PEBs
  2010-09-06  9:17 ` Artem Bityutskiy
@ 2010-09-07 15:59   ` Matthew L. Creech
  2010-09-07 17:17     ` Artem Bityutskiy
  0 siblings, 1 reply; 25+ messages in thread
From: Matthew L. Creech @ 2010-09-07 15:59 UTC
To: dedekind1; +Cc: JamesLNute, linux-mtd, Adrian.Hunter

On Mon, Sep 6, 2010 at 5:17 AM, Artem Bityutskiy <dedekind1@gmail.com> wrote:
>
> The other idea which would definitely help is to create a debugging
> patch which will track all erasures of PEBs and store them somewhere. I
> do not know which tracing debugging tools you have, if you have some
> fast tracing you can just send this info via your tracing interface. But
> if you don't, you can use other flash or another partition on your flash
> and store the info.
>
...

This makes sense, however I'm not sure of a good way to store this
info. I don't have any hardware debugging tools (BDI, etc.) though I
could probably get my hands on one if you think it would help.
Creating another flash partition and saving data there could work,
although I'm not familiar with how to do that safely/cleanly from
within the kernel (I could give it a try though).

The brute-force method would be to just dump all of this info out to
the serial console, and I'll leave a minicom session running on a PC
to capture everything that happens. I can't be certain how long it
would take to get a freshly-formatted device into this state, but the
quantity of output isn't a problem if I'm capturing to a PC.

I'll probably have time to set this test up in the next few days, but
it may be weeks until the test device goes bad (if it ever goes bad at
all). As far as what code should be tested, do you want me to just
pull a copy of the latest ubifs-2.6 tree? Or apply these patches to
something more stable?

Thanks for your help Artem!

--
Matthew L. Creech
* Re: ubi_eba_init_scan: cannot reserve enough PEBs
  2010-09-07 15:59 ` Matthew L. Creech
@ 2010-09-07 17:17   ` Artem Bityutskiy
  2010-09-07 17:48     ` Artem Bityutskiy
  0 siblings, 1 reply; 25+ messages in thread
From: Artem Bityutskiy @ 2010-09-07 17:17 UTC
To: Matthew L. Creech; +Cc: JamesLNute, linux-mtd, Adrian.Hunter

On Tue, 2010-09-07 at 11:59 -0400, Matthew L. Creech wrote:
> On Mon, Sep 6, 2010 at 5:17 AM, Artem Bityutskiy <dedekind1@gmail.com> wrote:
> >
> > The other idea which would definitely help is to create a debugging
> > patch which will track all erasures of PEBs and store them somewhere. I
> > do not know which tracing debugging tools you have, if you have some
> > fast tracing you can just send this info via your tracing interface. But
> > if you don't, you can use other flash or another partition on your flash
> > and store the info.
> >
> ...
>
> This makes sense, however I'm not sure of a good way to store this
> info. I don't have any hardware debugging tools (BDI, etc.) though I
> could probably get my hands on one if you think it would help.
> Creating another flash partition and saving data there could work,
> although I'm not familiar with how to do that safely/cleanly from
> within the kernel (I could give it a try though).

Well, the MTD API provides all the basic operations. But I guess this
is more complex, so I'd leave this as a last-resort method.

> The brute-force method would be to just dump all of this info out to
> the serial console,

Serial is too slow. If this bug is about race conditions, serial will
make it unreproducible. And this will slow down stuff too much, so you'd
run the same amount of operations tens of times slower.

Take a look at netconsole - using ethernet will be faster. It is
really easy to set up - just do not forget to switch off/configure
iptables. See Documentation/networking/netconsole.txt

> and I'll leave a minicom session running on a PC
> to capture everything that happens. I can't be certain how long it
> would take to get a freshly-formatted device into this state, but the
> quantity of output isn't a problem if I'm capturing to a PC.

Yeah, s/minicom/netconsole/, and capture the stuff in a file. This will
make stuff simpler; at least the 'dump_stack()' issue I told you about
will go away.

> I'll probably have time to set this test up in the next few days, but
> it may be weeks until the test device goes bad (if it ever goes bad at
> all). As far as what code should be tested, do you want me to just
> pull a copy of the latest ubifs-2.6 tree?

This is probably not so important. Use the ubifs you have, but I'd like
to also see it. And send the debugging patch to me for review and
comments (you'll need to prepare it).

> Or apply these patches to
> something more stable?

I think the ubifs base where you already saw the issue will be the best.

> Thanks for your help Artem!

NP, feel free to ask questions.

--
Best Regards,
Artem Bityutskiy (Битюцкий Артём)
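[Editorial note] netconsole sends kernel messages as plain UDP datagrams, so the capturing PC only needs a UDP listener that appends what it receives to a file. Below is a minimal sketch in Python, not a tested setup for this hardware. It assumes the device is configured to send to port 6666 (netconsole's documented default target port), and "netconsole.log" is a hypothetical file name. The device-side parameters are described in Documentation/networking/netconsole.txt.

```python
import socket


def receive(sock, max_packets=None):
    """Yield each datagram from `sock` as text; stop after max_packets if set."""
    count = 0
    while max_packets is None or count < max_packets:
        data = sock.recv(65535)  # one datagram per call
        yield data.decode("utf-8", errors="replace")
        count += 1


def listen(port=6666, path="netconsole.log"):
    """Bind a UDP socket and append everything received to `path`.

    Runs until interrupted. The port is assumed to match the target port
    configured on the device; `path` is a hypothetical file name.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("", port))
        with open(path, "a", encoding="utf-8") as out:
            for text in receive(sock):
                out.write(text)
                out.flush()  # keep the file current if the device hangs
```

Calling listen() on the capturing PC before the device boots would collect the whole run in one file. Because UDP is unreliable, a busy network may still drop some messages.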
* Re: ubi_eba_init_scan: cannot reserve enough PEBs
  2010-09-07 17:17 ` Artem Bityutskiy
@ 2010-09-07 17:48   ` Artem Bityutskiy
  0 siblings, 0 replies; 25+ messages in thread
From: Artem Bityutskiy @ 2010-09-07 17:48 UTC
To: Matthew L. Creech; +Cc: JamesLNute, linux-mtd, Adrian.Hunter

On Tue, 2010-09-07 at 20:17 +0300, Artem Bityutskiy wrote:
> > I'll probably have time to set this test up in the next few days, but
> > it may be weeks until the test device goes bad (if it ever goes bad at
> > all). As far as what code should be tested, do you want me to just
> > pull a copy of the latest ubifs-2.6 tree?
>
> This is probably not so important. Use the ubifs you have, but I'd like
> to also see it. And send the debugging patch to me for review and
> comments (you'll need to prepare it).

Err, I also think you should try my UBI patches, which may help. But
they may themselves have a bug. Maybe 2 setups - one with one of your
trees, and the other with the latest ubifs + those patches?

--
Best Regards,
Artem Bityutskiy (Битюцкий Артём)