public inbox for linux-mtd@lists.infradead.org
 help / color / mirror / Atom feed
* ubifs error at boot "bad node type"
@ 2014-07-08 10:10 Richard Genoud
  2014-07-09  0:51 ` hujianyang
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Richard Genoud @ 2014-07-08 10:10 UTC (permalink / raw)
  To: Artem Bityutskiy, Adrian Hunter; +Cc: linux-mtd

Hi Artem, Adrian,

I ran into an ubifs error on a board (discovered it because the board wouldn't boot anymore).
The /etc directory seems to be damaged.
To start checking what's wrong, I boot the same kernel from NFSROOT instead of nand.
The kernel is quite recent (3.14.5), and of course, I don't know how the /etc directory got damaged.

The UBI layer seems to be perfectly happy, no bad block, just some bitflips :

# nanddump /dev/mtd3 -f /dev/null
ECC failed: 0
ECC corrected: 0
Number of bad blocks: 0
Number of bbt blocks: 0
Block size 131072, page size 2048, OOB size 64
Dumping data starting at 0x00000000 and ending at 0x0f580000...
[  744.710937] atmel_nand 40000000.nand: Bit flip in data area, byte_pos: 1588, bit_pos: 5, 0x2d -> 0x0d
ECC: 1 corrected bitflip(s) at offset 0x09ca8000
[  746.625000] atmel_nand 40000000.nand: Bit flip in data area, byte_pos: 78, bit_pos: 0, 0xff -> 0xfe
ECC: 1 corrected bitflip(s) at offset 0x0abdb800

# insmod ./mtd_readtest.ko dev=3
[ 2702.210937] 
[ 2702.210937] =================================================
[ 2702.218750] mtd_readtest: MTD device: 3
[ 2702.218750] mtd_readtest: MTD device size 257425408, eraseblock size 131072, page size 2048, count of eraseblocks 1964, pages per eraseblock 64, OOB size 64
[ 2702.226562] mtd_test: scanning for bad eraseblocks
[ 2702.234375] mtd_test: scanned 1964 eraseblocks, 0 are bad
[ 2702.242187] mtd_readtest: testing page read
[ 2721.179687] atmel_nand 40000000.nand: Bit flip in data area, byte_pos: 1588, bit_pos: 5, 0x2d -> 0x0d
[ 2721.875000] atmel_nand 40000000.nand: Bit flip in data area, byte_pos: 647, bit_pos: 1, 0x06 -> 0x04
[ 2723.031250] atmel_nand 40000000.nand: Bit flip in data area, byte_pos: 78, bit_pos: 0, 0xff -> 0xfe
[ 2732.031250] mtd_readtest: finished
[ 2732.039062] =================================================




ubiattach is ok:
# ubiattach -m 3
[   50.164062] UBI: default fastmap pool size: 95
[   50.164062] UBI: default fastmap WL pool size: 25
[   50.171875] UBI: attaching mtd3 to ubi0
[   51.070312] UBI: scanning is finished
[   51.101562] UBI: attached mtd3 (name "ubi", size 245 MiB) to ubi0
[   51.101562] UBI: PEB size: 131072 bytes (128 KiB), LEB size: 126976 bytes
[   51.109375] UBI: min./max. I/O unit sizes: 2048/2048, sub-page size 2048
[   51.117187] UBI: VID header offset: 2048 (aligned 2048), data offset: 4096
[   51.125000] UBI: good PEBs: 1964, bad PEBs: 0, corrupted PEBs: 0
[   51.132812] UBI: user volume: 3, internal volumes: 1, max. volumes count: 128
[   51.132812] UBI: max/mean erase counter: 5/1, WL threshold: 4096, image sequence number: 906707977
[   51.140625] UBI: available PEBs: 0, total reserved PEBs: 1964, PEBs reserved for bad PEB handling: 40
[   51.148437] UBI: background thread "ubi_bgt0d" started, PID 649
UBI device number 0, total 1964 LEBs (249380864 bytes, 237.8 MiB), available 0 LEBs (0 bytes), LEB size 126976 bytes (124.0 KiB)


mounting the ubifs volume is also ok:
# mount -oro,chk_data_crc -tubifs /dev/ubi0_1 /mnt/rfs
[   76.187500] UBIFS: recovery needed
[   76.226562] UBIFS: recovery deferred
[   76.234375] UBIFS: mounted UBI device 0, volume 1, name "rootfs", R/O mode
[   76.242187] UBIFS: LEB size: 126976 bytes (124 KiB), min./max. I/O unit sizes: 2048 bytes/2048 bytes
[   76.250000] UBIFS: FS size: 66789376 bytes (63 MiB, 526 LEBs), journal size 9015296 bytes (8 MiB, 71 LEBs)
[   76.257812] UBIFS: reserved for root: 0 bytes (0 KiB)
[   76.265625] UBIFS: media format: w4/r0 (latest is w4/r0), UUID 9BEA6CAF-3082-4515-BE97-02395D576493, small LPT model


But trying to list / some files fails :

# ls -l /mnt/rfs/
[   87.054687] UBIFS error (pid 652): ubifs_read_node: bad node type (255 but expected 0)
[   87.062500] UBIFS error (pid 652): ubifs_read_node: bad node at LEB 405:39728, LEB mapping status 1
[   87.070312] Not a node, first 24 bytes:
[   87.078125] 00000000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff                          ........................
[   87.085937] CPU: 0 PID: 652 Comm: ls Not tainted 3.14.5-g5e62e31 #1
[   87.093750] [<c000d298>] (unwind_backtrace) from [<c000baa8>] (show_stack+0x10/0x14)
[   87.101562] [<c000baa8>] (show_stack) from [<c00ec5b0>] (ubifs_read_node+0x250/0x28c)
[   87.109375] [<c00ec5b0>] (ubifs_read_node) from [<c0106d0c>] (ubifs_tnc_read_node+0x48/0x124)
[   87.117187] [<c0106d0c>] (ubifs_tnc_read_node) from [<c00ef558>] (ubifs_tnc_locate+0xfc/0x18c)
[   87.125000] [<c00ef558>] (ubifs_tnc_locate) from [<c00e8168>] (ubifs_iget+0x64/0x7c0)
[   87.132812] [<c00e8168>] (ubifs_iget) from [<c00e4b74>] (ubifs_lookup+0x1b0/0x1e0)
[   87.140625] [<c00e4b74>] (ubifs_lookup) from [<c007cc38>] (lookup_real+0x30/0x4c)
[   87.148437] [<c007cc38>] (lookup_real) from [<c007d898>] (__lookup_hash+0x30/0x3c)
[   87.148437] [<c007d898>] (__lookup_hash) from [<c007d8e0>] (lookup_slow+0x3c/0x9c)
[   87.156250] [<c007d8e0>] (lookup_slow) from [<c007ee48>] (path_lookupat+0x110/0x6b8)
[   87.164062] [<c007ee48>] (path_lookupat) from [<c007f410>] (filename_lookup+0x20/0x5c)
[   87.171875] [<c007f410>] (filename_lookup) from [<c008189c>] (user_path_at_empty+0x54/0x7c)
[   87.179687] [<c008189c>] (user_path_at_empty) from [<c00818d4>] (user_path_at+0x10/0x14)
[   87.187500] [<c00818d4>] (user_path_at) from [<c0078fdc>] (vfs_fstatat+0x44/0x8c)
[   87.195312] [<c0078fdc>] (vfs_fstatat) from [<c007926c>] (SyS_lstat64+0x14/0x30)
[   87.203125] [<c007926c>] (SyS_lstat64) from [<c0009360>] (ret_fast_syscall+0x0/0x2c)
[   87.210937] UBIFS error (pid 652): ubifs_iget: failed to read inode 68, error -22
[   87.218750] UBIFS error (pid 652): ubifs_lookup: dead directory entry 'etc', error -22
[   87.226562] UBIFS warning (pid 652): ubifs_ro_mode: switched to read-only mode, error -22
[   87.234375] CPU: 0 PID: 652 Comm: ls Not tainted 3.14.5-g5e62e31 #1
[   87.234375] [<c000d298>] (unwind_backtrace) from [<c000baa8>] (show_stack+0x10/0x14)
[   87.242187] [<c000baa8>] (show_stack) from [<c00e4af4>] (ubifs_lookup+0x130/0x1e0)
[   87.250000] [<c00e4af4>] (ubifs_lookup) from [<c007cc38>] (lookup_real+0x30/0x4c)
[   87.257812] [<c007cc38>] (lookup_real) from [<c007d898>] (__lookup_hash+0x30/0x3c)
[   87.265625] [<c007d898>] (__lookup_hash) from [<c007d8e0>] (lookup_slow+0x3c/0x9c)
[   87.273437] [<c007d8e0>] (lookup_slow) from [<c007ee48>] (path_lookupat+0x110/0x6b8)
[   87.281250] [<c007ee48>] (path_lookupat) from [<c007f410>] (filename_lookup+0x20/0x5c)
[   87.289062] [<c007f410>] (filename_lookup) from [<c008189c>] (user_path_at_empty+0x54/0x7c)
[   87.296875] [<c008189c>] (user_path_at_empty) from [<c00818d4>] (user_path_at+0x10/0x14)
[   87.304687] [<c00818d4>] (user_path_at) from [<c0078fdc>] (vfs_fstatat+0x44/0x8c)
[   87.312500] [<c0078fdc>] (vfs_fstatat) from [<c007926c>] (SyS_lstat64+0x14/0x30)
[   87.320312] [<c007926c>] (SyS_lstat64) from [<c0009360>] (ret_fast_syscall+0x0/0x2c)
ls: /mnt/rfs/etc: Invalid argument
total 0
drwxr-xr-x    2 root     root         10512 Jun 16 16:05 bin/
drwxr-xr-x    2 root     root           232 Jun  6 06:30 data/
drwxr-xr-x    2 root     root           224 Jun 16 16:46 dev/
drwxr-xr-x    4 root     root           288 Jun 16 16:46 home/
drwxr-xr-x    5 root     root          3200 Jun 16 16:45 lib/
lrwxrwxrwx    1 root     root             3 Jun 16 15:49 lib32 -> lib/
lrwxrwxrwx    1 root     root            11 Jun 16 16:05 linuxrc -> bin/busybox*
drwxr-xr-x    2 root     root           160 Jun  6 06:30 mnt/
drwxr-xr-x    2 root     root           160 Jun  6 06:30 proc/
drwx------    3 root     root           672 Jun  6 06:30 root/
lrwxrwxrwx    1 root     root             3 Jun  6 06:30 run -> tmp/
drwxr-xr-x    2 root     root          3848 Jun 16 16:41 sbin/
drwxr-xr-x    2 root     root           160 Jun  6 06:30 sys/
drwxrwxrwt    3 root     root           224 Jun  6 06:30 tmp/
drwxr-xr-x    9 root     root           696 Jun 16 16:45 usr/
drwxr-xr-x    4 root     root           672 Jun 16 16:29 var/


Before trying to remove the volume, I would like to know what you suggest to investigate this ?


# ubinfo /dev/ubi0
ubi0
Volumes count:                           3
Logical eraseblock size:                 126976 bytes, 124.0 KiB
Total amount of logical eraseblocks:     1964 (249380864 bytes, 237.8 MiB)
Amount of available logical eraseblocks: 0 (0 bytes)
Maximum count of volumes                 128
Count of bad physical eraseblocks:       0
Count of reserved physical eraseblocks:  40
Current maximum erase counter value:     5
Minimum input/output unit size:          2048 bytes
Character device major/minor:            251:0
Present volumes:                         0, 1, 2




dmesg:
[    0.000000] Booting Linux on physical CPU 0x0
[    0.000000] Linux version 3.14.5-g5e62e31 (info@paratronic.fr) (gcc version 4.7.3 (Buildroot 2013.08-00663-g9bc025a) ) #1 2014-06-02 17:34:43 +0200
[    0.000000] CPU: ARM926EJ-S [41069265] revision 5 (ARMv5TEJ), cr=00053177
[    0.000000] CPU: VIVT data cache, VIVT instruction cache
[    0.000000] Machine model: Paratronic LNS
[    0.000000] Memory policy: Data cache writeback
[    0.000000] AT91: Detected soc type: at91sam9x5
[    0.000000] AT91: Detected soc subtype: at91sam9g35
[    0.000000] AT91: sram at 0x300000 of 0x8000 mapped at 0xfef70000
[    0.000000] On node 0 totalpages: 32768
[    0.000000] free_area_init_node: node 0, pgdat c03a3fa8, node_mem_map c7efa000
[    0.000000]   Normal zone: 256 pages used for memmap
[    0.000000]   Normal zone: 0 pages reserved
[    0.000000]   Normal zone: 32768 pages, LIFO batch:7
[    0.000000] Clocks: CPU 400 MHz, master 133 MHz, main 12.000 MHz
[    0.000000] pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
[    0.000000] pcpu-alloc: [0] 0 
[    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 32512
[    0.000000] Kernel command line: console=ttyS0,115200 loglevel=8 panic=2 mtdparts=atmel_nand:256M(all),128k@0(dtb),10112k(kernel),251392k(ubi),512k(bbt)ro ip=dhcp rw root=/dev/nfs nfsroot=192.168.7.1:/nfsroot,v3
[    0.000000] PID hash table entries: 512 (order: -1, 2048 bytes)
[    0.000000] Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
[    0.000000] Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
[    0.000000] Memory: 125920K/131072K available (2600K kernel code, 145K rwdata, 828K rodata, 118K init, 194K bss, 5152K reserved)
[    0.000000] Virtual kernel memory layout:
[    0.000000]     vector  : 0xffff0000 - 0xffff1000   (   4 kB)
[    0.000000]     fixmap  : 0xfff00000 - 0xfffe0000   ( 896 kB)
[    0.000000]     vmalloc : 0xc8800000 - 0xff000000   ( 872 MB)
[    0.000000]     lowmem  : 0xc0000000 - 0xc8000000   ( 128 MB)
[    0.000000]     modules : 0xbf000000 - 0xc0000000   (  16 MB)
[    0.000000]       .text : 0xc0008000 - 0xc036163c   (3430 kB)
[    0.000000]       .init : 0xc0362000 - 0xc037f98c   ( 119 kB)
[    0.000000]       .data : 0xc0380000 - 0xc03a46a0   ( 146 kB)
[    0.000000]        .bss : 0xc03a46a0 - 0xc03d502c   ( 195 kB)
[    0.000000] NR_IRQS:16 nr_irqs:16 16
[    0.000000] sched_clock: 32 bits at 128 Hz, resolution 7812500ns, wraps every 16777216000000000ns
[    0.000000] Console: colour dummy device 80x30
[    0.054687] Calibrating delay loop... 198.76 BogoMIPS (lpj=775168)
[    0.054687] pid_max: default: 32768 minimum: 301
[    0.062500] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes)
[    0.062500] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes)
[    0.070312] CPU: Testing write buffer coherency: ok
[    0.070312] Setting up static identity map for 0x20275978 - 0x202759b4
[    0.078125] devtmpfs: initialized
[    0.078125] pinctrl core: initialized pinctrl subsystem
[    0.078125] NET: Registered protocol family 16
[    0.078125] DMA: preallocated 256 KiB pool for atomic coherent allocations
[    0.109375] gpio-at91 fffff400.gpio: at address fefff400
[    0.109375] gpio-at91 fffff600.gpio: at address fefff600
[    0.117187] gpio-at91 fffff800.gpio: at address fefff800
[    0.117187] gpio-at91 fffffa00.gpio: at address fefffa00
[    0.117187] pinctrl-at91 pinctrl.2: initialized AT91 pinctrl driver
[    0.117187] tcb_clksrc: tc0 at 16.166 MHz
[    0.148437] bio: create slab <bio-0> at 0
[    0.148437] at_hdmac ffffec00.dma-controller: Atmel AHB DMA Controller ( cpy slave ), 8 channels
[    0.156250] at_hdmac ffffee00.dma-controller: Atmel AHB DMA Controller ( cpy slave ), 8 channels
[    0.171875] Switched to clocksource tcb_clksrc
[    0.226562] NET: Registered protocol family 2
[    0.226562] TCP established hash table entries: 1024 (order: 0, 4096 bytes)
[    0.226562] TCP bind hash table entries: 1024 (order: 0, 4096 bytes)
[    0.226562] TCP: Hash tables configured (established 1024 bind 1024)
[    0.226562] TCP: reno registered
[    0.226562] UDP hash table entries: 256 (order: 0, 4096 bytes)
[    0.226562] UDP-Lite hash table entries: 256 (order: 0, 4096 bytes)
[    0.226562] NET: Registered protocol family 1
[    0.226562] RPC: Registered named UNIX socket transport module.
[    0.226562] RPC: Registered udp transport module.
[    0.226562] RPC: Registered tcp transport module.
[    0.226562] RPC: Registered tcp NFSv4.1 backchannel transport module.
[    0.234375] futex hash table entries: 256 (order: -1, 3072 bytes)
[    0.234375] msgmni has been set to 245
[    0.234375] NET: Registered protocol family 38
[    0.242187] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
[    0.242187] io scheduler noop registered (default)
[    0.242187] atmel_lcdfb f8038000.lcd: 300KiB frame buffer at 27980000 (mapped at c8848000)
[    0.882812] atmel_lcdfb f8038000.lcd: fb0: Atmel LCDC at 0xf8038000 (mapped at c8896000), irq 33
[    0.882812] fffff200.serial: ttyS0 at MMIO 0xfffff200 (irq = 16, base_baud = 8333333) is a ATMEL_SERIAL
[    1.242187] console [ttyS0] enabled
[    1.250000] f801c000.serial: ttyS1 at MMIO 0xf801c000 (irq = 23, base_baud = 8333333) is a ATMEL_SERIAL
[    1.257812] f8020000.serial: ttyS2 at MMIO 0xf8020000 (irq = 24, base_baud = 8333333) is a ATMEL_SERIAL
[    1.265625] f8024000.serial: ttyS3 at MMIO 0xf8024000 (irq = 25, base_baud = 8333333) is a ATMEL_SERIAL
[    1.281250] brd: module loaded
[    1.304687] loop: module loaded
[    1.320312] atmel_nand 40000000.nand: Use On Flash BBT
[    1.328125] atmel_nand 40000000.nand: No DMA support for NAND access.
[    1.335937] Device Unique ID: 8a b2 e0 f2 e0 78 20 65 ff 00 ff 00 ff 00 ff 00
[    1.335937] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xda
[    1.343750] nand: Micron MT29F2G08ABAEAWP
[    1.351562] nand: 256MiB, SLC, page size: 2048, OOB size: 64
[    1.351562] atmel_nand 40000000.nand: minimum ECC: 4 bits in 512 bytes
[    1.359375] atmel_nand 40000000.nand: Initialize PMECC params, cap: 4, sector: 512
[    1.367187] Bad block table found at page 131008, version 0x01
[    1.375000] Bad block table found at page 130944, version 0x01
[    1.382812] 5 cmdlinepart partitions found on MTD device atmel_nand
[    1.382812] Creating 5 MTD partitions on "atmel_nand":
[    1.390625] 0x000000000000-0x000010000000 : "all"
[    1.398437] 0x000000000000-0x000000020000 : "dtb"
[    1.406250] 0x000000020000-0x000000a00000 : "kernel"
[    1.406250] 0x000000a00000-0x00000ff80000 : "ubi"
[    1.414062] 0x00000ff80000-0x000010000000 : "bbt"
[    1.421875] atmel_spi f0000000.spi: version: 0x212
[    1.429687] atmel_spi f0000000.spi: Using dma0chan0 (tx) and dma0chan1 (rx) for DMA transfers
[    1.437500] atmel_spi f0000000.spi: Atmel SPI Controller at 0xf0000000 (irq 28)
[    1.445312] atmel_spi f0004000.spi: version: 0x212
[    1.453125] atmel_spi f0004000.spi: Using dma1chan0 (tx) and dma1chan1 (rx) for DMA transfers
[    1.460937] atmel_spi f0004000.spi: Atmel SPI Controller at 0xf0004000 (irq 29)
[    1.476562] libphy: MACB_mii_bus: probed
[    1.562500] macb f802c000.ethernet eth0: Cadence MACB at 0xf802c000 irq 31 (00:04:a3:d2:a7:ef)
[    1.570312] macb f802c000.ethernet eth0: attached PHY driver [Davicom DM9161A] (mii_bus:phy_addr=f802c000.etherne:00, irq=-1)
[    1.578125] at91_wdt fffffe40.watchdog: watchdog is disabled
[    1.585937] at91_wdt: probe of fffffe40.watchdog failed with error -22
[    1.593750] TCP: cubic registered
[    1.593750] NET: Registered protocol family 17
[    1.601562] Key type dns_resolver registered
[    4.562500] macb f802c000.ethernet eth0: link up (100/Full)
[    4.593750] Sending DHCP requests ., OK
[    4.632812] IP-Config: Got DHCP answer from 192.168.7.1, my address is 192.168.7.207
[    4.640625] IP-Config: Complete:
[    4.640625]      device=eth0, hwaddr=00:04:a3:d2:a7:ef, ipaddr=192.168.7.207, mask=255.255.255.0, gw=192.168.7.1
[    4.648437]      host=192.168.7.207, domain=, nis-domain=(none)
[    4.656250]      bootserver=192.168.7.1, rootserver=192.168.7.1, rootpath=/nfsroot
[    4.664062]      nameserver0=192.168.1.2
[    4.757812] VFS: Mounted root (nfs filesystem) on device 0:12.
[    4.765625] devtmpfs: mounted
[    4.765625] Freeing unused kernel memory: 116K (c0362000 - c037f000)
[    5.070312] UBIFS error (pid 353): ubifs_mount: cannot open "ubi0:data", error -19
[    7.781250] udevd[468]: starting version 182
[    9.585937] random: dd urandom read with 110 bits of entropy available
[   10.742187] random: nonblocking pool is initialized
# 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: ubifs error at boot "bad node type"
  2014-07-08 10:10 ubifs error at boot "bad node type" Richard Genoud
@ 2014-07-09  0:51 ` hujianyang
  2014-07-09  7:23   ` Richard Genoud
  2014-07-16  8:29 ` Artem Bityutskiy
  2014-07-16 12:17 ` Artem Bityutskiy
  2 siblings, 1 reply; 12+ messages in thread
From: hujianyang @ 2014-07-09  0:51 UTC (permalink / raw)
  To: Richard Genoud; +Cc: linux-mtd

> 
> # ls -l /mnt/rfs/
> [   87.054687] UBIFS error (pid 652): ubifs_read_node: bad node type (255 but expected 0)
> [   87.062500] UBIFS error (pid 652): ubifs_read_node: bad node at LEB 405:39728, LEB mapping status 1
> [   87.070312] Not a node, first 24 bytes:
> [   87.078125] 00000000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff                          ........................

Seems LEB 405 is empty? This is not the first time I saw
an error like this.

Is this happen after an unclean umount like power cut?

Artem, do you have any ideas?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: ubifs error at boot "bad node type"
  2014-07-09  0:51 ` hujianyang
@ 2014-07-09  7:23   ` Richard Genoud
  0 siblings, 0 replies; 12+ messages in thread
From: Richard Genoud @ 2014-07-09  7:23 UTC (permalink / raw)
  To: hujianyang; +Cc: linux-mtd

2014-07-09 2:51 GMT+02:00 hujianyang <hujianyang@huawei.com>:
>>
>> # ls -l /mnt/rfs/
>> [   87.054687] UBIFS error (pid 652): ubifs_read_node: bad node type (255 but expected 0)
>> [   87.062500] UBIFS error (pid 652): ubifs_read_node: bad node at LEB 405:39728, LEB mapping status 1
>> [   87.070312] Not a node, first 24 bytes:
>> [   87.078125] 00000000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff                          ........................
>
> Seems LEB 405 is empty? This is not the first time I saw
> an error like this.
>
> Is this happen after an unclean umount like power cut?
I'm only guessing that it happened after a power cut, I wasn't there
when it happened...

For now, it's the 1st and only time.
I'll wait and see if it happens again.


> Artem, do you have any ideas?
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: ubifs error at boot "bad node type"
  2014-07-08 10:10 ubifs error at boot "bad node type" Richard Genoud
  2014-07-09  0:51 ` hujianyang
@ 2014-07-16  8:29 ` Artem Bityutskiy
  2014-07-16 12:02   ` Richard Weinberger
  2014-07-16 12:06   ` hujianyang
  2014-07-16 12:17 ` Artem Bityutskiy
  2 siblings, 2 replies; 12+ messages in thread
From: Artem Bityutskiy @ 2014-07-16  8:29 UTC (permalink / raw)
  To: Richard Genoud; +Cc: linux-mtd, Adrian Hunter

On Tue, 2014-07-08 at 12:10 +0200, Richard Genoud wrote:
> ubiattach is ok:
> # ubiattach -m 3
> [   50.164062] UBI: default fastmap pool size: 95
> [   50.164062] UBI: default fastmap WL pool size: 25

I do not know why this happened, I never saw reports like this before.
Most probably this is because of fastmap. I did not hear any report that
it was extensively verified WRT power cuts, and my theory is that there
is a bug in fastmap which causes this, and this may be related to power
cuts. I do not have proves, but suggest you to dig in this direction.
E.g., setup power-cut testing.

-- 
Best Regards,
Artem Bityutskiy

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: ubifs error at boot "bad node type"
  2014-07-16  8:29 ` Artem Bityutskiy
@ 2014-07-16 12:02   ` Richard Weinberger
  2014-07-16 12:26     ` Richard Genoud
  2014-07-16 12:06   ` hujianyang
  1 sibling, 1 reply; 12+ messages in thread
From: Richard Weinberger @ 2014-07-16 12:02 UTC (permalink / raw)
  To: Artem Bityutskiy
  Cc: Richard Genoud, linux-mtd@lists.infradead.org, Adrian Hunter

On Wed, Jul 16, 2014 at 10:29 AM, Artem Bityutskiy <dedekind1@gmail.com> wrote:
> On Tue, 2014-07-08 at 12:10 +0200, Richard Genoud wrote:
>> ubiattach is ok:
>> # ubiattach -m 3
>> [   50.164062] UBI: default fastmap pool size: 95
>> [   50.164062] UBI: default fastmap WL pool size: 25
>
> I do not know why this happened, I never saw reports like this before.
> Most probably this is because of fastmap. I did not hear any report that
> it was extensively verified WRT power cuts, and my theory is that there
> is a bug in fastmap which causes this, and this may be related to power
> cuts. I do not have proves, but suggest you to dig in this direction.
> E.g., setup power-cut testing.

The logs don't show the "attached by fastmap" message. So UBI got
attached by scanning.
So, fastmap support is here but not used.

Only the fastmap pools are used.

> --
> Best Regards,
> Artem Bityutskiy
>
>
> ______________________________________________________
> Linux MTD discussion mailing list
> http://lists.infradead.org/mailman/listinfo/linux-mtd/



-- 
Thanks,
//richard

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: ubifs error at boot "bad node type"
  2014-07-16  8:29 ` Artem Bityutskiy
  2014-07-16 12:02   ` Richard Weinberger
@ 2014-07-16 12:06   ` hujianyang
  1 sibling, 0 replies; 12+ messages in thread
From: hujianyang @ 2014-07-16 12:06 UTC (permalink / raw)
  To: dedekind1; +Cc: Richard Genoud, linux-mtd, Adrian Hunter

On 2014/7/16 16:29, Artem Bityutskiy wrote:
> On Tue, 2014-07-08 at 12:10 +0200, Richard Genoud wrote:
>> ubiattach is ok:
>> # ubiattach -m 3
>> [   50.164062] UBI: default fastmap pool size: 95
>> [   50.164062] UBI: default fastmap WL pool size: 25
> 
> I do not know why this happened, I never saw reports like this before.
> Most probably this is because of fastmap. I did not hear any report that
> it was extensively verified WRT power cuts, and my theory is that there
> is a bug in fastmap which causes this, and this may be related to power
> cuts. I do not have proves, but suggest you to dig in this direction.
> E.g., setup power-cut testing.
> 

My colleague who use UBIFS with kernel v2.6.34(WR4?) hit
an error like this:

UBIFS error (pid 0): ubifs_read_node: bad node type (255 but expected 0)
UBIFS error (pid 0): ubifs_read_node: bad node at LEB 112:67824, LEB mapping status 1
UBIFS error (pid 0): ubifs_iget: failed to read inode 1, error -22

He said he was running UBIFS for 3 years and would hit this
error about one time per year.

I googled these error messages and found some same reports
in 2010~2012. So I think this problem is fixed by early
fixes, I asked him to backport this patch set

[PATCH 00/13] UBI reliability improvements

but I'm sure of that. After this error report, I was
worrying about this bug still left.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: ubifs error at boot "bad node type"
  2014-07-08 10:10 ubifs error at boot "bad node type" Richard Genoud
  2014-07-09  0:51 ` hujianyang
  2014-07-16  8:29 ` Artem Bityutskiy
@ 2014-07-16 12:17 ` Artem Bityutskiy
  2 siblings, 0 replies; 12+ messages in thread
From: Artem Bityutskiy @ 2014-07-16 12:17 UTC (permalink / raw)
  To: Richard Genoud; +Cc: linux-mtd, Adrian Hunter

On Tue, 2014-07-08 at 12:10 +0200, Richard Genoud wrote:
> [   87.054687] UBIFS error (pid 652): ubifs_read_node: bad node type
> (255 but expected 0)
> [   87.062500] UBIFS error (pid 652): ubifs_read_node: bad node at LEB
> 405:39728, LEB mapping status 1
> [   87.070312] Not a node, first 24 bytes:

I think it is a good idea do dump the flash and save it, just in case.

So the LEB is mapped.

But the TNC points to an an area with all 0xFFs.

Either TNC is incorrect or, the UBI did not map the LEB to the correct
PEB. "Fastmap" would be an easy target to blame for me, but if it is not
used, I'd say the former is more probably.

Enabling full UBI/UBIFS debugging and reproducing the failure could
help.

And please, do save the flash image (use nanddump).

-- 
Best Regards,
Artem Bityutskiy

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: ubifs error at boot "bad node type"
  2014-07-16 12:02   ` Richard Weinberger
@ 2014-07-16 12:26     ` Richard Genoud
  2014-07-16 12:32       ` Richard Weinberger
  2014-07-16 12:33       ` Artem Bityutskiy
  0 siblings, 2 replies; 12+ messages in thread
From: Richard Genoud @ 2014-07-16 12:26 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: linux-mtd@lists.infradead.org, Adrian Hunter, Artem Bityutskiy

2014-07-16 14:02 GMT+02:00 Richard Weinberger <richard.weinberger@gmail.com>:
> On Wed, Jul 16, 2014 at 10:29 AM, Artem Bityutskiy <dedekind1@gmail.com> wrote:
>> On Tue, 2014-07-08 at 12:10 +0200, Richard Genoud wrote:
>>> ubiattach is ok:
>>> # ubiattach -m 3
>>> [   50.164062] UBI: default fastmap pool size: 95
>>> [   50.164062] UBI: default fastmap WL pool size: 25
>>
>> I do not know why this happened, I never saw reports like this before.
>> Most probably this is because of fastmap. I did not hear any report that
>> it was extensively verified WRT power cuts, and my theory is that there
>> is a bug in fastmap which causes this, and this may be related to power
>> cuts. I do not have proves, but suggest you to dig in this direction.
>> E.g., setup power-cut testing.
>
> The logs don't show the "attached by fastmap" message. So UBI got
> attached by scanning.
> So, fastmap support is here but not used.
>
> Only the fastmap pools are used.

Well, actually, fastmap is used on this volume when I boot from NAND.
(and I must say, It's fast !)
Just this time, as I boot from nfsroot, or from a unclean state,
fastmap didn't attached UBI.
I'll check if it's still operational.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: ubifs error at boot "bad node type"
  2014-07-16 12:26     ` Richard Genoud
@ 2014-07-16 12:32       ` Richard Weinberger
  2014-07-16 14:39         ` Richard Genoud
  2014-07-16 12:33       ` Artem Bityutskiy
  1 sibling, 1 reply; 12+ messages in thread
From: Richard Weinberger @ 2014-07-16 12:32 UTC (permalink / raw)
  To: Richard Genoud
  Cc: linux-mtd@lists.infradead.org, Adrian Hunter, Artem Bityutskiy

Am 16.07.2014 14:26, schrieb Richard Genoud:
> 2014-07-16 14:02 GMT+02:00 Richard Weinberger <richard.weinberger@gmail.com>:
>> On Wed, Jul 16, 2014 at 10:29 AM, Artem Bityutskiy <dedekind1@gmail.com> wrote:
>>> On Tue, 2014-07-08 at 12:10 +0200, Richard Genoud wrote:
>>>> ubiattach is ok:
>>>> # ubiattach -m 3
>>>> [   50.164062] UBI: default fastmap pool size: 95
>>>> [   50.164062] UBI: default fastmap WL pool size: 25
>>>
>>> I do not know why this happened, I never saw reports like this before.
>>> Most probably this is because of fastmap. I did not hear any report that
>>> it was extensively verified WRT power cuts, and my theory is that there
>>> is a bug in fastmap which causes this, and this may be related to power
>>> cuts. I do not have proves, but suggest you to dig in this direction.
>>> E.g., setup power-cut testing.
>>
>> The logs don't show the "attached by fastmap" message. So UBI got
>> attached by scanning.
>> So, fastmap support is here but not used.
>>
>> Only the fastmap pools are used.
> 
> Well, actually, fastmap is used on this volume when I boot from NAND.
> (and I must say, It's fast !)
> Just this time, as I boot from nfsroot, or from a unclean state,
> fastmap didn't attached UBI.
> I'll check if it's still operational.

Okay, this explains your logs.
I fear you don't have the original image anymore?
Just thinking of how to understand/analyze this issue... :-\

Thanks,
//richard

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: ubifs error at boot "bad node type"
  2014-07-16 12:26     ` Richard Genoud
  2014-07-16 12:32       ` Richard Weinberger
@ 2014-07-16 12:33       ` Artem Bityutskiy
  2014-07-16 12:49         ` Richard Weinberger
  1 sibling, 1 reply; 12+ messages in thread
From: Artem Bityutskiy @ 2014-07-16 12:33 UTC (permalink / raw)
  To: Richard Genoud
  Cc: Richard Weinberger, linux-mtd@lists.infradead.org, Adrian Hunter

On Wed, 2014-07-16 at 14:26 +0200, Richard Genoud wrote:
> 2014-07-16 14:02 GMT+02:00 Richard Weinberger <richard.weinberger@gmail.com>:
> > On Wed, Jul 16, 2014 at 10:29 AM, Artem Bityutskiy <dedekind1@gmail.com> wrote:
> >> On Tue, 2014-07-08 at 12:10 +0200, Richard Genoud wrote:
> >>> ubiattach is ok:
> >>> # ubiattach -m 3
> >>> [   50.164062] UBI: default fastmap pool size: 95
> >>> [   50.164062] UBI: default fastmap WL pool size: 25
> >>
> >> I do not know why this happened, I never saw reports like this before.
> >> Most probably this is because of fastmap. I did not hear any report that
> >> it was extensively verified WRT power cuts, and my theory is that there
> >> is a bug in fastmap which causes this, and this may be related to power
> >> cuts. I do not have proves, but suggest you to dig in this direction.
> >> E.g., setup power-cut testing.
> >
> > The logs don't show the "attached by fastmap" message. So UBI got
> > attached by scanning.
> > So, fastmap support is here but not used.
> >
> > Only the fastmap pools are used.
> 
> Well, actually, fastmap is used on this volume when I boot from NAND.
> (and I must say, It's fast !)
> Just this time, as I boot from nfsroot, or from a unclean state,
> fastmap didn't attached UBI.
> I'll check if it's still operational.

Just a head-up: check the Kconfig message for this option, it says that
even the on-flash format for it may change, although I am not quite sure
if Richard W. still means this.

-- 
Best Regards,
Artem Bityutskiy

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: ubifs error at boot "bad node type"
  2014-07-16 12:33       ` Artem Bityutskiy
@ 2014-07-16 12:49         ` Richard Weinberger
  0 siblings, 0 replies; 12+ messages in thread
From: Richard Weinberger @ 2014-07-16 12:49 UTC (permalink / raw)
  To: dedekind1, Richard Genoud; +Cc: linux-mtd@lists.infradead.org, Adrian Hunter

Am 16.07.2014 14:33, schrieb Artem Bityutskiy:
>> Well, actually, fastmap is used on this volume when I boot from NAND.
>> (and I must say, It's fast !)
>> Just this time, as I boot from nfsroot, or from a unclean state,
>> fastmap didn't attached UBI.
>> I'll check if it's still operational.
> 
> Just a head-up: check the Kconfig message for this option, it says that
> even the on-flash format for it may change, although I am not quite sure
> if Richard W. still means this.

Correct, fastmap *is* experimental.
But I'm eager to sort issues out.
And the very last thing I'd do is changing the on-flash layout.

Now the bad news, currently nobody pays me to work on UBI/Fastmap.
This means that paid projects go first.

Thanks,
//richard

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: ubifs error at boot "bad node type"
  2014-07-16 12:32       ` Richard Weinberger
@ 2014-07-16 14:39         ` Richard Genoud
  0 siblings, 0 replies; 12+ messages in thread
From: Richard Genoud @ 2014-07-16 14:39 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: linux-mtd@lists.infradead.org, Adrian Hunter, Artem Bityutskiy

2014-07-16 14:32 GMT+02:00 Richard Weinberger <richard@nod.at>:
> Am 16.07.2014 14:26, schrieb Richard Genoud:
>> 2014-07-16 14:02 GMT+02:00 Richard Weinberger <richard.weinberger@gmail.com>:
>>> On Wed, Jul 16, 2014 at 10:29 AM, Artem Bityutskiy <dedekind1@gmail.com> wrote:
>>>> On Tue, 2014-07-08 at 12:10 +0200, Richard Genoud wrote:
>>>>> ubiattach is ok:
>>>>> # ubiattach -m 3
>>>>> [   50.164062] UBI: default fastmap pool size: 95
>>>>> [   50.164062] UBI: default fastmap WL pool size: 25
>>>>
>>>> I do not know why this happened, I never saw reports like this before.
>>>> Most probably this is because of fastmap. I did not hear any report that
>>>> it was extensively verified WRT power cuts, and my theory is that there
>>>> is a bug in fastmap which causes this, and this may be related to power
>>>> cuts. I do not have proves, but suggest you to dig in this direction.
>>>> E.g., setup power-cut testing.
>>>
>>> The logs don't show the "attached by fastmap" message. So UBI got
>>> attached by scanning.
>>> So, fastmap support is here but not used.
>>>
>>> Only the fastmap pools are used.
>>
>> Well, actually, fastmap is used on this volume when I boot from NAND.
>> (and I must say, It's fast !)
>> Just this time, as I boot from nfsroot, or from a unclean state,
>> fastmap didn't attached UBI.
>> I'll check if it's still operational.
>
> Okay, this explains your logs.
> I fear you don't have the original image anymore?
> Just thinking of how to understand/analyze this issue... :-\
Actually (after digging a little bit), I found the rootfs.ubifs image used.
It was flashed with ubiupdatevol

And, as Artem suggested, I'm going to save the entire flash.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2014-07-16 14:40 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-07-08 10:10 ubifs error at boot "bad node type" Richard Genoud
2014-07-09  0:51 ` hujianyang
2014-07-09  7:23   ` Richard Genoud
2014-07-16  8:29 ` Artem Bityutskiy
2014-07-16 12:02   ` Richard Weinberger
2014-07-16 12:26     ` Richard Genoud
2014-07-16 12:32       ` Richard Weinberger
2014-07-16 14:39         ` Richard Genoud
2014-07-16 12:33       ` Artem Bityutskiy
2014-07-16 12:49         ` Richard Weinberger
2014-07-16 12:06   ` hujianyang
2014-07-16 12:17 ` Artem Bityutskiy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox