* [BUG] UBIFS corruption on powerpc 32-bit targets
@ 2026-01-29 10:30 Tomas Alvarez Vanoli
  2026-01-30  1:34 ` Zhihao Cheng
  0 siblings, 1 reply; 19+ messages in thread
From: Tomas Alvarez Vanoli @ 2026-01-29 10:30 UTC (permalink / raw)
  To: linux-mtd@lists.infradead.org

Hello,

I am writing because we are experiencing UBIFS corruption on our powerpc boards
after updating from 6.1 to 6.12.57.
I haven't pulled the latest UBIFS or UBI code; I looked through the changes some
weeks ago and did not find anything that seemed relevant.

We have a variety of architectures running practically the same application
(w.r.t. file/db writing), and have only experienced this on the powerpc boards,
namely one using the layerscape t1040 in 32-bit mode and another one using the
MPC8360.

Corruption appears only with the newer kernel, built with the same compiler and
libraries as the older one.

We've run integck and fsstress for long periods of time on various boards but we
have not been able to reproduce it. It happens with some automated tests in our
test lab, after some hours of running tests (not stress tests). The application
uses C++'s filesystem library.

We have a UBI device (NAND flash) holding mostly read-only volumes with kernels
and rootfses, one rw volume for configurations (ubifs) and one for storing
coredumps (jffs2).

We don't have any patches on the mtd layer except a revert of commit
aeeec092e57e6e4a9f7bcc18f71d9e17e9146bce, because we do have mtdblock on top of
ubi for the read-only squashfs and no chance to change that due to backward
compatibility.

The only way to recover is to remove the ubi volume and create it again.

This is the output of mtdinfo -a (mtd-utils version 2.1.6) for the volumes on the
NAND:

```
Count of MTD devices: 19
Present MTD devices: mtd0, mtd1, mtd2, mtd3, mtd4, mtd5, mtd6, mtd7, mtd8, mtd9, mtd10, mtd11, mtd12, mtd13, mtd14, mtd15, mtd16, mtd17, mtd18
Sysfs interface supported: yes

... stuff on ubi0 ...

mtd10 Name: ubi1 Type: nand Eraseblock size: 131072 bytes, 128.0 KiB Amount of eraseblocks: 4096 (536870912 bytes, 512.0 MiB) Minimum input/output unit size: 2048 bytes Sub-page size: 2048 bytes OOB size: 128 bytes Character device major/minor: 90:20 Bad blocks are allowed: true Device is writable: true mtd11 Name: bootfs0 Type: ubi Eraseblock size: 130944 bytes, 127.8 KiB Amount of eraseblocks: 40 (5237760 bytes, 4.9 MiB) Minimum input/output unit size: 1 byte Sub-page size: 1 byte Character device major/minor: 90:22 Bad blocks are allowed: false Device is writable: true mtd12 Name: rootfs0 Type: ubi Eraseblock size: 130944 bytes, 127.8 KiB Amount of eraseblocks: 47 (6154368 bytes, 5.8 MiB) Minimum input/output unit size: 1 byte Sub-page size: 1 byte Character device major/minor: 90:24 Bad blocks are allowed: false Device is writable: true mtd13 Name: cfg Type: ubi Eraseblock size: 126976 bytes, 124.0 KiB Amount of eraseblocks: 1058 (134340608 bytes, 128.1 MiB) Minimum input/output unit size: 2048 bytes Sub-page size: 2048 bytes Character device major/minor: 90:26 Bad blocks are allowed: false Device is writable: true mtd14 Name: coredump Type: ubi Eraseblock size: 126976 bytes, 124.0 KiB Amount of eraseblocks: 42 (5332992 bytes, 5.0 MiB) Minimum input/output unit size: 2048 bytes Sub-page size: 2048 bytes Character device major/minor: 90:28 Bad blocks are allowed: false Device is writable: true mtd15 Name: bootfs1 Type: ubi Eraseblock size: 126976 bytes, 124.0 KiB Amount of eraseblocks: 84 (10665984 bytes, 10.1 MiB) Minimum input/output unit size: 2048 bytes Sub-page size: 2048 bytes Character device major/minor: 90:30 Bad blocks are allowed: false Device is writable: true mtd16 Name: rootfs1 Type: ubi Eraseblock size: 126976 bytes, 124.0 KiB Amount of eraseblocks: 325 (41267200 bytes, 39.3 MiB) Minimum input/output unit size: 2048 bytes Sub-page size: 2048 bytes Character device major/minor: 90:32 Bad blocks are allowed: false Device is writable: true mtd17 
Name: bootfs2 Type: ubi Eraseblock size: 126976 bytes, 124.0 KiB Amount of eraseblocks: 84 (10665984 bytes, 10.1 MiB) Minimum input/output unit size: 2048 bytes Sub-page size: 2048 bytes Character device major/minor: 90:34 Bad blocks are allowed: false Device is writable: true mtd18 Name: rootfs2 Type: ubi Eraseblock size: 126976 bytes, 124.0 KiB Amount of eraseblocks: 325 (41267200 bytes, 39.3 MiB) Minimum input/output unit size: 2048 bytes Sub-page size: 2048 bytes Character device major/minor: 90:36 Bad blocks are allowed: false Device is writable: true ``` The error we are seeing is: ``` root@kmcent2:~# find /cfg/ -name *tar.gz UBIFS error (ubi1:0 pid 134): ubifs_iget: failed to read inode 3917759, error -2 UBIFS error (ubi1:0 pid 134): ubifs_lookup: dead directory entry 'ne_config.xml.gz', error -2 UBIFS warning (ubi1:0 pid 134): ubifs_ro_mode: switched to read-only mode, error -2 find: /cfg/board/cfg/mob/backup/ne/ne_config.xml.gz: No such fiCPU: 1 UID: 0 PID: 134 Comm: find Not tainted 6.12.57-00435-gf9e139970f1f #0 Hardware name: keymile,kmcent2 e5500 0x80241021 CoreNet Generic Call Trace: [c2cabba0] [c0bbf558] dump_stack_lvl+0xfc/0x120 (unreliable) [c2cabbc0] [c041c8e4] ubifs_lookup+0x378/0x394 [c2cabc20] [c02f100c] __lookup_slow+0xb0/0x1c0 [c2cabc60] [c02f60d4] walk_component+0x158/0x24c [c2cabc90] [c02f7078] path_lookupat+0xa4/0x238 [c2cabcc0] [c02f77b8] filename_lookup+0xcc/0x200 [c2cabd90] [c02e7b08] vfs_statx+0xa0/0x150 [c2cabdd0] [c02e858c] do_statx+0x84/0xe4 [c2cabec0] [c02e87cc] sys_statx+0x8c/0x120 [c2cabef0] [c00127bc] system_call_exception+0x9c/0x1e0 [c2cabf10] [c00170e8] ret_from_syscall+0x0/0x28 --- interrupt: c00 at 0xfe37b04 NIP: 0fe37b04 LR: 0fe37ad0 CTR: 00000000 REGS: c2cabf20 TRAP: 0c00 Not tainted (6.12.57-00435-gf9e139970f1f) MSR: 0002d002 <CE,EE,PR,ME> CR: 24008483 XER: 20000000 GPR00: 0000017f bfe51f30 b7e51800 ffffff9c 100dc360 00000900 000007ff bfe51f38 GPR08: 00000001 00000008 00008ffa c2cabf10 24004884 100c365a 00000000 10144bf0 
GPR16: 10140000 00000000 1015c110 00000000 10140000 100042d0 b7e7ebe8 100c0000
GPR24: 00000000 00000000 100c0000 100dc360 b7e2e010 100dc360 0ff2afa8 bfe520c8
NIP [0fe37b04] 0xfe37b04
LR [0fe37ad0] 0xfe37ad0
--- interrupt: c00
le or directory
```

I wanted to pick the list's brain to figure out what else to try in order to
debug this or reproduce it more reliably, or whether there are any relevant
changes that you recall that could be causing this issue.

I would also like to know how to go about dumping this and flashing it onto
another board, or investigating it on my own PC.

I tried dumping with mtd_debug and nanddump and then writing it to another board,
but I get ECC errors; surely I am doing something wrong.

Thanks and best regards,
Tomas Alvarez Vanoli
______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply [flat|nested] 19+ messages in thread
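A note on the eraseblock sizes in the mtdinfo output above: the 126976-byte logical eraseblocks of the volumes on ubi1 are what one would expect from a 131072-byte physical eraseblock once UBI reserves room for its two headers. The sketch below reproduces that arithmetic. It assumes the default UBI layout (64-byte erase-counter and volume-ID headers, the second one placed at the first sub-page boundary, data starting at the next min. I/O boundary); the function names are made up for illustration and are not part of any tool. That the 130944-byte volumes sit on a device with a 128 KiB eraseblock and 1-byte I/O units is also an assumption, since that part of the listing is elided.

```c
#include <assert.h>
#include <stdint.h>

/* On-flash sizes of the UBI erase-counter and volume-ID headers, in bytes. */
#define EC_HDR_SIZE  64u
#define VID_HDR_SIZE 64u

/* Round x up to the next multiple of a (a must be non-zero). */
static uint32_t align_up(uint32_t x, uint32_t a)
{
	assert(a != 0);
	return (x + a - 1) / a * a;
}

/*
 * Usable bytes per logical eraseblock (LEB), assuming the default layout:
 * the VID header starts at the first sub-page boundary after the EC header
 * and the data area starts at the next min. I/O unit boundary.
 */
static uint32_t ubi_leb_size(uint32_t peb_size, uint32_t min_io, uint32_t subpage)
{
	uint32_t vid_hdr_offset = align_up(EC_HDR_SIZE, subpage);
	uint32_t data_offset = align_up(vid_hdr_offset + VID_HDR_SIZE, min_io);

	assert(data_offset < peb_size);
	return peb_size - data_offset;
}
```

With the values reported for ubi1 (131072-byte eraseblocks, 2048-byte pages, no smaller sub-pages) this gives 126976 bytes, and 1058 such LEBs give the 134340608 bytes reported for the cfg volume.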
* Re: [BUG] UBIFS corruption on powerpc 32-bit targets 2026-01-29 10:30 [BUG] UBIFS corruption on powerpc 32-bit targets Tomas Alvarez Vanoli @ 2026-01-30 1:34 ` Zhihao Cheng 2026-02-03 9:12 ` Tomas Alvarez Vanoli 0 siblings, 1 reply; 19+ messages in thread From: Zhihao Cheng @ 2026-01-30 1:34 UTC (permalink / raw) To: Tomas Alvarez Vanoli, linux-mtd@lists.infradead.org 在 2026/1/29 18:30, Tomas Alvarez Vanoli 写道: > Hello, > > I am writing beacuse we are experiencing ubifs corruption on our powerpc boards > after updating from 6.1 to 6.12.57. > I haven't pulled the latest UBIFS or UBI code, I looked around the changes some > weeks ago and did not find something that seemed relevant. > > We have a variety of architectures running practically the same application > (w.r.t. file/db writing), and have only experienced this in the powerpc boards, > namely one using the layerscape t1040 in 32-bit mode and another one using > MPC8360. > > Corruption appears only with the newer kernel, same compiler and libraries as > older one. > > We've ran integck and fsstress for long periods of time in various boards but we > have not been able to reproduce it, it happens with some automated tests on our > test lab, after some hours of running tests (not stress tests). The application > uses c++'s filesystem library. > > We have an ubi device (nand flash) holding mostly read-only volumes with kernels > and rootfses, one rw volume for configurations (ubifs) and one for storing > coredumps (jffs2). > [...] > ``` > > The error we are seeing is: > > ``` > root@kmcent2:~# find /cfg/ -name *tar.gz > UBIFS error (ubi1:0 pid 134): ubifs_iget: failed to read inode 3917759, error -2 > UBIFS error (ubi1:0 pid 134): ubifs_lookup: dead directory entry 'ne_config.xml.gz', error -2 > UBIFS warning (ubi1:0 pid 134): ubifs_ro_mode: switched to read-only mode, error -2 Hi, What is the type of volume ubi1:0, ro or rw? 
The apparent reason is that the UBIFS image is inconsistent, the inode cannot be found but the dentry(ne_config.xml.gz) still exists on flash. Which means that user can see all directory entries under /cfg/board/cfg/mob/backup/ne/, but file 'ne_config.xml.gz' cannot be accessed. You could dump the mtd image for 'ubi1' by the command 'dd if=/dev/mtd10 of=mtd_image bs=1M', and send the file 'mtd_image' to us by email, maybe we could get more information from the image. BTW, are there any abnormal history logs about ubi1 before the error message? > find: /cfg/board/cfg/mob/backup/ne/ne_config.xml.gz: No such fiCPU: 1 UID: 0 PID: 134 Comm: find Not tainted 6.12.57-00435-gf9e139970f1f #0 > Hardware name: keymile,kmcent2 e5500 0x80241021 CoreNet Generic > Call Trace: > [c2cabba0] [c0bbf558] dump_stack_lvl+0xfc/0x120 (unreliable) > [c2cabbc0] [c041c8e4] ubifs_lookup+0x378/0x394 > [c2cabc20] [c02f100c] __lookup_slow+0xb0/0x1c0 > [c2cabc60] [c02f60d4] walk_component+0x158/0x24c > [c2cabc90] [c02f7078] path_lookupat+0xa4/0x238 > [c2cabcc0] [c02f77b8] filename_lookup+0xcc/0x200 > [c2cabd90] [c02e7b08] vfs_statx+0xa0/0x150 > [c2cabdd0] [c02e858c] do_statx+0x84/0xe4 > [c2cabec0] [c02e87cc] sys_statx+0x8c/0x120 > [c2cabef0] [c00127bc] system_call_exception+0x9c/0x1e0 > [c2cabf10] [c00170e8] ret_from_syscall+0x0/0x28 > --- interrupt: c00 at 0xfe37b04 > NIP: 0fe37b04 LR: 0fe37ad0 CTR: 00000000 > REGS: c2cabf20 TRAP: 0c00 Not tainted (6.12.57-00435-gf9e139970f1f) > MSR: 0002d002 <CE,EE,PR,ME> CR: 24008483 XER: 20000000 > > GPR00: 0000017f bfe51f30 b7e51800 ffffff9c 100dc360 00000900 000007ff bfe51f38 > GPR08: 00000001 00000008 00008ffa c2cabf10 24004884 100c365a 00000000 10144bf0 > GPR16: 10140000 00000000 1015c110 00000000 10140000 100042d0 b7e7ebe8 100c0000 > GPR24: 00000000 00000000 100c0000 100dc360 b7e2e010 100dc360 0ff2afa8 bfe520c8 > NIP [0fe37b04] 0xfe37b04 > LR [0fe37ad0] 0xfe37ad0 > --- interrupt: c00 > le or directory > ``` > > I wanted to pick the lists' brain to figure 
out what else to try to debug or try
> to reproduce this more reliably, or if there's any relevant changes that you
> recall that could be causing this issue.
>
> I would also like to know how to go about dumping and flashing this into another
> board or to investigating it in my own pc.

Try the following commands?

nanddump -n -o -f flash_image /dev/mtdX
nandwrite -o -n /dev/mtdY flash_image

>
> I tried dumping with mtd_debug and nanddump and then writing it to another board
> but I get ECC errors, surely I am doing something wrong.
>
> Thanks and best regards,
> Tomas Alvarez Vanoli
* RE: [BUG] UBIFS corruption on powerpc 32-bit targets 2026-01-30 1:34 ` Zhihao Cheng @ 2026-02-03 9:12 ` Tomas Alvarez Vanoli 2026-02-04 4:54 ` Zhihao Cheng 0 siblings, 1 reply; 19+ messages in thread From: Tomas Alvarez Vanoli @ 2026-02-03 9:12 UTC (permalink / raw) To: Zhihao Cheng, linux-mtd@lists.infradead.org >Hi, >What is the type of volume ubi1:0, ro or rw? root@kmcent2:~# ubinfo -d 1 -N cfg Volume ID: 0 (on ubi1) Type: dynamic Alignment: 1 Size: 1058 LEBs (134340608 bytes, 128.1 MiB) State: OK Name: cfg Character device major/minor: 246:1 >You could dump the mtd image for 'ubi1' by the command 'dd if=/dev/mtd10 >of=mtd_image bs=1M', and send the file 'mtd_image' to us by email, maybe >we could get more information from the image. I started some consultation with the cybersecurity department. Unfortunately, the nand image contains our full application software and third parties' proprietary software so it might not be possible. Are there any analysis tools you'd recommend looking into? >BTW, are there any abnormal history logs about ubi1 before the error message? Not really, in the application where the error happens, with kernel 6.12, everything looks good until the panic happens. We also have "known-good stable application" that boots when no other application exists (here I refer to kernel + rootfs as application), which runs kernel 4.14.20 and there I see this (before 6.12.x has ever ran on the board): UBIFS: parse sync UBIFS (ubi1:0): background thread "ubifs_bgt1_0" started, PID 106 UBIFS (ubi1:0): recovery needed UBIFS (ubi1:0): recovery completed UBIFS (ubi1:0): UBIFS: mounted UBI device 1, volume 0, name "cfg" UBIFS (ubi1:0): LEB size: 126976 bytes (124 KiB), min./max. 
I/O unit sizes: 2048 bytes/2048 bytes
UBIFS (ubi1:0): FS size: 132943872 bytes (126 MiB, 1047 LEBs), journal size 6602752 bytes (6 MiB, 52 LEBs)
UBIFS (ubi1:0): reserved for root: 4952683 bytes (4836 KiB)
UBIFS (ubi1:0): media format: w5/r0 (latest is w5/r0), UUID DC7EC969-1F4D-4669-9D31-16C57EAB96DA, small LPT model

The part where it says recovery needed/completed looks strange, but it does not
complain about any errors.

>Try following commands?
>nanddump -n -o -f flash_image /dev/mtdX
>nandwrite -o -n /dev/mtdY flash_image

This results in a flood of ECC errors like this one:
ubi1 error: ubi_io_read: error -74 (ECC error) while reading 64 bytes from PEB 29:0, read 64 bytes

I have to erase the NAND from U-Boot to recover it.

The NAND chips on both boards are the same. From the boot logs:
nand: device found, Manufacturer ID: 0x01, Chip ID: 0xac
nand: AMD/Spansion S34MS04G2
nand: 512 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 128
Bad block table found at page 262080, version 0x01
Bad block table found at page 262016, version 0x01

Best regards,
Tomas Alvarez Vanoli
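For readers following the logs: the numeric codes in the messages so far are negated Linux errno values. "error -2" from ubifs_iget/ubifs_lookup is ENOENT, and "error -74" from ubi_io_read is EBADMSG, which the MTD layer uses to report an uncorrectable ECC error. A minimal lookup sketch follows; the table and function names are hypothetical, and the numeric values are the Linux ones, which differ on other systems.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Linux errno values that appear (negated) in the logs of this thread. */
static const struct {
	int code;
	const char *name;
	const char *meaning;
} log_errors[] = {
	{  2, "ENOENT",  "no such file or directory" },
	{ 74, "EBADMSG", "bad message; MTD reports uncorrectable ECC errors this way" },
};

#define N_LOG_ERRORS (sizeof(log_errors) / sizeof(log_errors[0]))

/* Map a negative code from a kernel message to its errno name, or NULL if unknown. */
static const char *err_name(int err)
{
	assert(err <= 0);
	for (size_t i = 0; i < N_LOG_ERRORS; i++)
		if (log_errors[i].code == -err)
			return log_errors[i].name;
	return NULL;
}

/* Reverse lookup: errno name to the negative code seen in the logs, or 0 if unknown. */
static int err_code(const char *name)
{
	assert(name != NULL);
	for (size_t i = 0; i < N_LOG_ERRORS; i++)
		if (strcmp(log_errors[i].name, name) == 0)
			return -log_errors[i].code;
	return 0;
}
```

The table only covers the two codes quoted in this thread; anything else returns NULL or 0 rather than a guess.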
* Re: [BUG] UBIFS corruption on powerpc 32-bit targets
  2026-02-03  9:12 ` Tomas Alvarez Vanoli
@ 2026-02-04  4:54 ` Zhihao Cheng
  2026-02-04 14:04 ` Tomas Alvarez Vanoli
  0 siblings, 1 reply; 19+ messages in thread
From: Zhihao Cheng @ 2026-02-04 4:54 UTC (permalink / raw)
  To: Tomas Alvarez Vanoli, linux-mtd@lists.infradead.org

On 2026/2/3 17:12, Tomas Alvarez Vanoli wrote:
>> Hi,
>> What is the type of volume ubi1:0, ro or rw?
>
> root@kmcent2:~# ubinfo -d 1 -N cfg
> Volume ID: 0 (on ubi1)
> Type: dynamic
> Alignment: 1
> Size: 1058 LEBs (134340608 bytes, 128.1 MiB)
> State: OK
> Name: cfg
> Character device major/minor: 246:1
>
>> You could dump the mtd image for 'ubi1' by the command 'dd if=/dev/mtd10
>> of=mtd_image bs=1M', and send the file 'mtd_image' to us by email, maybe
>> we could get more information from the image.
>
> I started some consultation with the cybersecurity department. Unfortunately,
> the nand image contains our full application software and third parties'
> proprietary software so it might not be possible.
>
> Are there any analysis tools you'd recommend looking into?

Try 'fsck.ubifs -g3 -y /dev/ubi1_0'. Note that fsck will change the content of
the ubifs image, so you could dump the mtd image to another device (or a virtual
machine) and run fsck on it there.
https://www.linux-mtd.infradead.org/?p=mtd-utils.git;a=summary

>
>> BTW, are there any abnormal history logs about ubi1 before the error message?
>
> Not really, in the application where the error happens, with kernel 6.12,
> everything looks good until the panic happens.
>
> We also have "known-good stable application" that boots when no other
> application exists (here I refer to kernel + rootfs as application), which
> runs kernel 4.14.20 and there I see this (before 6.12.x has ever ran on the
> board):
>
> UBIFS: parse sync
> UBIFS (ubi1:0): background thread "ubifs_bgt1_0" started, PID 106
> UBIFS (ubi1:0): recovery needed
> UBIFS (ubi1:0): recovery completed
> UBIFS (ubi1:0): UBIFS: mounted UBI device 1, volume 0, name "cfg"
> UBIFS (ubi1:0): LEB size: 126976 bytes (124 KiB), min./max. I/O unit sizes:
> 2048 bytes/2048 bytes
> UBIFS (ubi1:0): FS size: 132943872 bytes (126 MiB, 1047 LEBs), journal size
> 6602752 bytes (6 MiB, 52 LEBs)
> UBIFS (ubi1:0): reserved for root: 4952683 bytes (4836 KiB)
> UBIFS (ubi1:0): media format: w5/r0 (latest is w5/r0), UUID
> DC7EC969-1F4D-4669-9D31-16C57EAB96DA, small LPT model
>
> The part where it says recovery needed/completed looks strange but it does not
> complain about any errors anyway.

The 'recovery' message means that ubifs has experienced an unclean reboot. It is
a normal message.

>
>> Try following commands?
>> nanddump -n -o -f flash_image /dev/mtdX
>> nandwrite -o -n /dev/mtdY flash_image
>

nanddump -o (-n) -f flash_image /dev/mtdX
flash_eraseall /dev/mtdY
nandwrite -o (-n) /dev/mtdY flash_image

or

dd of=flash_image if=/dev/mtdX bs=1M
flash_eraseall /dev/mtdY
dd if=flash_image of=/dev/mtdY bs=1M

> This results in a flooding of ECC errors like this one:
> ubi1 error: ubi_io_read: error -74 (ECC error) while reading 64 bytes from PEB
> 29:0, read 64 bytes
>
> I have to erase the nand from u-boot to recover it.
> NAND chips on both boards are the same, from boot logs:
> nand: device found, Manufacturer ID: 0x01, Chip ID: 0xac
> nand: AMD/Spansion S34MS04G2
> nand: 512 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 128
> Bad block table found at page 262080, version 0x01
> Bad block table found at page 262016, version 0x01
>
> Best regards,
> Tomas Alvarez Vanoli
* Re: [BUG] UBIFS corruption on powerpc 32-bit targets 2026-02-04 4:54 ` Zhihao Cheng @ 2026-02-04 14:04 ` Tomas Alvarez Vanoli 2026-02-05 2:14 ` Zhihao Cheng 0 siblings, 1 reply; 19+ messages in thread From: Tomas Alvarez Vanoli @ 2026-02-04 14:04 UTC (permalink / raw) To: Zhihao Cheng, linux-mtd@lists.infradead.org >Try 'fsck.ubifs -g3 -y /dev/ubi0_1'. I downloaded and cross-compiled mtd-utils for ppc, not sure if I had given some wrong param for configure but I was getting a segfault at ubifs-utils/fsck.ubifs/problem.c:236 (log_out in FILE_IS_INCONSISTENT). This is the backtrace of the crash in case you are interested (otherwise skip ahead to the run that worked): ``` (gdb) bt #0 __GI_strlen () at ../sysdeps/powerpc/powerpc32/strlen.S:98 #1 0x0f7ec038 in __printf_buffer (buf=buf@entry=0xbffff748, format=format@entry=0x1005bd38 "%s[%d] (%s%s): problem: %s, ino %lu type %s, nlink %u xcnt %u xsz %u xnms %u size %llu, should be nlink %u xcnt %u xsz %u xnms %u size %llu\n", ap=ap@entry=0xbffff838, mode_flags=mode_flags@entry=0) at vfprintf-process-arg.c:435 #2 0x0f7ec608 in __vfprintf_internal (s=<optimized out>, format=0x1005bd38 "%s[%d] (%s%s): problem: %s, ino %lu type %s, nlink %u xcnt %u xsz %u xnms %u size %llu, should be nlink %u xcnt %u xsz %u xnms %u size %llu\n" ap=ap@entry=0xbffff838, mode_flags=mode_flags@entry=0) at vfprintf-internal.c:1544 #3 0x0f7e211c in __printf (format=<optimized out>) at printf.c:33 #4 0x1003da80 in print_problem (priv=0xbffff970, problem_type=<optimized out>, problem=0x10060494 <problem_table+184>, c=0x1006d2e8 <info_>) at ubifs-utils/fsck.ubifs/problem.c:236 #5 fix_problem (c=c@entry=0x1006d2e8 <info_>, problem_type=problem_type@entry=23, priv=priv@entry=0xbffff9f0) at ubifs-utils/fsck.ubifs/problem.c:354 #6 0x1003f9dc in handle_invalid_file (priv=0x0, file=0x1011c890, problem_type=23, c=0x1006d2e8 <info_>) at ubifs-utils/fsck.ubifs/extract_files.c:844 #7 correct_file_info (file=0x1011c890, c=0x1006d2e8 <info_>) at 
ubifs-utils/fsck.ubifs/extract_files.c:1496 #8 correct_file_info (c=0x1006d2e8 <info_>, file=0x1011c890) at ubifs-utils/fsck.ubifs/extract_files.c:1473 #9 0x100439dc in check_and_correct_files (c=c@entry=0x1006d2e8 <info_>) at ubifs-utils/fsck.ubifs/extract_files.c:1552 #10 0x10005f94 in do_fsck () at ubifs-utils/fsck.ubifs/fsck.ubifs.c:490 #11 main (argc=<optimized out>, argv=<optimized out>) at ubifs-utils/fsck.ubifs/fsck.ubifs.c:622 (gdb) ``` There's a warning about the formatter used for ino_t, used formatter is %lu but ino_t is long long unsigned int. Anyway, I split the logs into one call per value and ran it, got the following (patched up the log into a single line again): ``` root@kmcent2:~# ./fsck.ubifs -y -g3 /dev/ubi1_0 fsck.ubifs[182] (/dev/ubi1_0,danger mode): Read superblock fsck.ubifs[182] (/dev/ubi1_0,danger mode): Read master & init lpt <INFO> fsck.ubifs[182] (/dev/ubi1_0): ubifs_load_filesystem: recovery needed fsck.ubifs[182] (/dev/ubi1_0,danger mode): Replay journal fsck.ubifs[182] (/dev/ubi1_0,danger mode): Handle orphan nodes fsck.ubifs[182] (/dev/ubi1_0,danger mode): Recover isize fsck.ubifs[182] (/dev/ubi1_0,danger mode): Traverse TNC and construct files fsck.ubifs[182] (/dev/ubi1_0,danger mode): Check and handle invalid files fsck.ubifs[182] (/dev/ubi1_0,danger mode): problem: File has no inode, ino 0 fsck.ubifs[182] (/dev/ubi1_0,danger mode): Delete it? y fsck.ubifs[182] (/dev/ubi1_0,danger mode): Check and handle unreachable files fsck.ubifs[182] (/dev/ubi1_0,danger mode): Check and correct files fsck.ubifs[182] (/dev/ubi1_0,danger mode): problem: File is inconsistent ino 184322 type dir nlink 2 xcnt 0 xsz 0 xnms 0 size 472, should be nlink 2 xcnt 0 xsz 0 xnms 0 size 392 fsck.ubifs[182] (/dev/ubi1_0,danger mode): Fix it? 
y
fsck.ubifs[182] (/dev/ubi1_0,danger mode): Check whether the TNC is empty
fsck.ubifs[182] (/dev/ubi1_0,danger mode): Check and correct the space statistics
fsck.ubifs[182] (/dev/ubi1_0,danger mode): Commit problem fixing modifications
fsck.ubifs[182] (/dev/ubi1_0,danger mode): Check and correct the index size
fsck.ubifs[182] (/dev/ubi1_0,danger mode): Check and create root dir
fsck.ubifs[182] (/dev/ubi1_0,danger mode): Final committing
fsck.ubifs[182] (-): ********** Filesystem was modified **********
fsck.ubifs[182] (-): FSCK success!
root@kmcent2:~#
```

Could the recorded size of the directory inode have been wrong (472 instead of
392)? After the run, the fs is, as expected, not broken anymore.

PS: my email client might be omitting the in-reply-to headers, sorry about that.

Best regards,
Tomas
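On the segfault reported above: the backtrace ends in strlen() called from printf while it processes a format string that prints the inode number with %lu and later uses %s, and the compiler warned that ino_t is "long long unsigned int" in this build. A likely explanation, not confirmed in the thread, is that the 64-bit argument consumed as a 32-bit one shifts every following variadic argument, so a later %s reads a bogus pointer. A common way to avoid this class of bug is to widen the value explicitly so that it always matches the conversion. The helper below is a hypothetical illustration of that pattern, not the fix that was submitted to mtd-utils.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Format an inode number portably. ino_t may be 32 or 64 bits wide depending
 * on the ABI and on _FILE_OFFSET_BITS, so it is cast to unsigned long long,
 * which always matches the %llu conversion.
 * Returns the number of characters snprintf() would have written.
 */
static int format_ino(char *buf, size_t len, ino_t ino)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "ino %llu", (unsigned long long)ino);
}
```

The same cast works for any other field whose width depends on the build configuration.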
* Re: [BUG] UBIFS corruption on powerpc 32-bit targets 2026-02-04 14:04 ` Tomas Alvarez Vanoli @ 2026-02-05 2:14 ` Zhihao Cheng 2026-02-05 15:47 ` Tomas Alvarez Vanoli 0 siblings, 1 reply; 19+ messages in thread From: Zhihao Cheng @ 2026-02-05 2:14 UTC (permalink / raw) To: Tomas Alvarez Vanoli, linux-mtd@lists.infradead.org 在 2026/2/4 22:04, Tomas Alvarez Vanoli 写道: >> Try 'fsck.ubifs -g3 -y /dev/ubi0_1'. > > I downloaded and cross-compiled mtd-utils for ppc, not sure if I had given > some wrong param for configure but I was getting a segfault at > ubifs-utils/fsck.ubifs/problem.c:236 (log_out in FILE_IS_INCONSISTENT). > > This is the backtrace of the crash in case you are interested (otherwise skip > ahead to the run that worked): > > ``` > (gdb) bt > #0 __GI_strlen () at ../sysdeps/powerpc/powerpc32/strlen.S:98 > #1 0x0f7ec038 in __printf_buffer (buf=buf@entry=0xbffff748, > format=format@entry=0x1005bd38 "%s[%d] (%s%s): problem: %s, ino %lu type %s, > nlink %u xcnt %u xsz %u xnms %u size %llu, should be nlink %u xcnt %u xsz %u > xnms %u size %llu\n", ap=ap@entry=0xbffff838, > mode_flags=mode_flags@entry=0) at vfprintf-process-arg.c:435 > #2 0x0f7ec608 in __vfprintf_internal (s=<optimized out>, > format=0x1005bd38 "%s[%d] (%s%s): problem: %s, ino %lu type %s, nlink %u xcnt %u > xsz %u xnms %u size %llu, should be nlink %u xcnt %u xsz %u xnms %u size %llu\n" > ap=ap@entry=0xbffff838, > mode_flags=mode_flags@entry=0) at vfprintf-internal.c:1544 > #3 0x0f7e211c in __printf (format=<optimized out>) at printf.c:33 > #4 0x1003da80 in print_problem (priv=0xbffff970, problem_type=<optimized out>, > problem=0x10060494 <problem_table+184>, c=0x1006d2e8 <info_>) at > ubifs-utils/fsck.ubifs/problem.c:236 > #5 fix_problem (c=c@entry=0x1006d2e8 <info_>, > problem_type=problem_type@entry=23, > priv=priv@entry=0xbffff9f0) at ubifs-utils/fsck.ubifs/problem.c:354 > #6 0x1003f9dc in handle_invalid_file (priv=0x0, file=0x1011c890, > problem_type=23, c=0x1006d2e8 <info_>) > at 
ubifs-utils/fsck.ubifs/extract_files.c:844
> #7 correct_file_info (file=0x1011c890, c=0x1006d2e8 <info_>)
> at ubifs-utils/fsck.ubifs/extract_files.c:1496
> #8 correct_file_info (c=0x1006d2e8 <info_>, file=0x1011c890)
> at ubifs-utils/fsck.ubifs/extract_files.c:1473
> #9 0x100439dc in check_and_correct_files (c=c@entry=0x1006d2e8 <info_>)
> at ubifs-utils/fsck.ubifs/extract_files.c:1552
> #10 0x10005f94 in do_fsck () at ubifs-utils/fsck.ubifs/fsck.ubifs.c:490
> #11 main (argc=<optimized out>, argv=<optimized out>) at
> ubifs-utils/fsck.ubifs/fsck.ubifs.c:622
> (gdb)
> ```
>

Thanks for pointing it out. Hayama has sent out patches to fix it, see
https://patchwork.ozlabs.org/project/linux-mtd/cover/9276ac09-7c7d-41df-9ee2-094f1c8edf93@lineo.co.jp/.
It would be appreciated if you could modify them according to the suggestions
and resend them.

> There's a warning about the formatter used for ino_t, used formatter is %lu but
> ino_t is long long unsigned int.
>
> Anyway, I split the logs into one call per value and ran it, got the following
> (patched up the log into a single line again):
>
> ```
> root@kmcent2:~# ./fsck.ubifs -y -g3 /dev/ubi1_0
> fsck.ubifs[182] (/dev/ubi1_0,danger mode): Read superblock
> fsck.ubifs[182] (/dev/ubi1_0,danger mode): Read master & init lpt
> <INFO> fsck.ubifs[182] (/dev/ubi1_0): ubifs_load_filesystem: recovery needed
> fsck.ubifs[182] (/dev/ubi1_0,danger mode): Replay journal
> fsck.ubifs[182] (/dev/ubi1_0,danger mode): Handle orphan nodes
> fsck.ubifs[182] (/dev/ubi1_0,danger mode): Recover isize
> fsck.ubifs[182] (/dev/ubi1_0,danger mode): Traverse TNC and construct files
> fsck.ubifs[182] (/dev/ubi1_0,danger mode): Check and handle invalid files
> fsck.ubifs[182] (/dev/ubi1_0,danger mode): problem: File has no inode, ino 0
> fsck.ubifs[182] (/dev/ubi1_0,danger mode): Delete it? y

It looks like there is only one problem: the inode cannot be found but the
dentry (ne_config.xml.gz) still exists on flash.
I guess the lost inode is 3917759 in the previous log.
Maybe the inode is

UBIFS error (ubi1:0 pid 134): ubifs_iget: failed to read inode 3917759, error -2
UBIFS error (ubi1:0 pid 134): ubifs_lookup: dead directory entry 'ne_config.xml.gz', error -2
UBIFS warning (ubi1:0 pid 134): ubifs_ro_mode: switched to read-only mode, error -2
find: /cfg/board/cfg/mob/backup/ne/ne_config.xml.gz: No such fiCPU: 1 UID: 0 PID: 134 Comm: find Not tainted 6.12.57-00435-gf9e139970f1f #0

Apply the following patch to the kernel and mount the bad ubifs image; let's try
to find more information from a scan of the full volume:

diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
index f453c37cee37..35d54e1a7ecd 100644
--- a/fs/ubifs/super.c
+++ b/fs/ubifs/super.c
@@ -1491,6 +1491,89 @@ static int mount_ubifs(struct ubifs_info *c)
 	} else
 		ubifs_assert(c, c->lst.taken_empty_lebs > 0);
 
+	int lnum, len, ret, offs;
+	char *leb_buf, *buf;
+
+	leb_buf = vmalloc(c->leb_size);
+	if (!leb_buf)
+		return -ENOMEM;
+
+	for (lnum = c->main_first; lnum < c->leb_cnt; ++lnum) {
+		pr_info("---------- Start scan LEB %d (main_first %d) ----------\n", lnum, c->main_first);
+		ubifs_leb_read(c, lnum, leb_buf, 0, c->leb_size, 0);
+		//print_hex_dump(KERN_INFO, "", DUMP_PREFIX_OFFSET, 32, 1, (void *)leb_buf, c->leb_size, 1);
+		len = c->leb_size;
+		buf = leb_buf;
+		offs = 0;
+
+		while (len >= 8) {
+			int found;
+			int node_len;
+			struct ubifs_ch *ch = (struct ubifs_ch *)buf;
+			struct ubifs_ino_node *ino;
+			struct ubifs_dent_node *dent;
+			struct ubifs_data_node *dn;
+			union ubifs_key key;
+
+			ret = ubifs_scan_a_node(c, buf, len, lnum, offs, 0);
+			if (ret > 0) {
+				pr_info("padding bytes %d:%d\n", offs, offs + ret);
+				offs += ret;
+				buf += ret;
+				len -= ret;
+				continue;
+			}
+			if (ret == SCANNED_EMPTY_SPACE) {
+				break;
+			} else if (ret != SCANNED_A_NODE) {
+				ubifs_err(c, "SCAN ret = %d", ret);
+				break;
+			}
+
+			node_len = ALIGN(le32_to_cpu(ch->len), 8);
+
+			switch (ch->node_type) {
+			case UBIFS_INO_NODE:
+				ino = (struct ubifs_ino_node *)buf;
+				key_read(c, &ino->key, &key);
+
+				found = ubifs_tnc_has_node(c, &key, 0, lnum, offs, 0);
+				pr_info("ino %u found %d %d-%d\n", key.u32[0], found, offs, offs + node_len);
+				break;
+			case UBIFS_DENT_NODE:
+				dent = (struct ubifs_dent_node *)buf;
+				key_read(c, &dent->key, &key);
+
+				found = ubifs_tnc_has_node(c, &key, 0, lnum, offs, 0);
+				pr_info("dent %llu(%s) found %d %d-%d\n", dent->inum, dent->name, found, offs, offs + node_len);
+				break;
+			case UBIFS_DATA_NODE:
+				dn = (struct ubifs_data_node *)buf;
+				key_read(c, &dn->key, &key);
+
+				found = ubifs_tnc_has_node(c, &key, 0, lnum, offs, 0);
+				pr_info("data %u found %d %d-%d\n", key.u32[0], found, offs, offs + node_len);
+				break;
+			default:
+				pr_info("unknown type %d\n", ch->node_type);
+				break;
+			}
+
+			offs += node_len;
+			buf += node_len;
+			len -= node_len;
+		}
+
+		for (; len > 4; offs += 4, buf += 4, len -= 4)
+			if (*(uint32_t *)buf != 0xffffffff)
+				break;
+		for (; len; offs++, buf++, len--)
+			if (*(uint8_t *)buf != 0xff)
+				ubifs_err(c, "corrupt empty space at LEB %d:%d", lnum, offs);
+	}
+
+	pr_info("============ FINISH ==========\n");
+
 	err = dbg_check_filesystem(c);
 	if (err)
 		goto out_infos;

>
> fsck.ubifs[182] (/dev/ubi1_0,danger mode): Check and handle unreachable files
> fsck.ubifs[182] (/dev/ubi1_0,danger mode): Check and correct files
> fsck.ubifs[182] (/dev/ubi1_0,danger mode): problem: File is inconsistent ino
> 184322 type dir nlink 2 xcnt 0 xsz 0 xnms 0 size 472, should be nlink 2 xcnt 0
> xsz 0 xnms 0 size 392
> fsck.ubifs[182] (/dev/ubi1_0,danger mode): Fix it?
y
>
> fsck.ubifs[182] (/dev/ubi1_0,danger mode): Check whether the TNC is empty
> fsck.ubifs[182] (/dev/ubi1_0,danger mode): Check and correct the space
> statistics
> fsck.ubifs[182] (/dev/ubi1_0,danger mode): Commit problem fixing modifications
> fsck.ubifs[182] (/dev/ubi1_0,danger mode): Check and correct the index size
> fsck.ubifs[182] (/dev/ubi1_0,danger mode): Check and create root dir
> fsck.ubifs[182] (/dev/ubi1_0,danger mode): Final committing
> fsck.ubifs[182] (-): ********** Filesystem was modified **********
> fsck.ubifs[182] (-): FSCK success!
> root@kmcent2:~#
> ```
>
> Seems potentially the size of the node was wrong?
> After the run, the fs is, as expected, not broken anymore.
>
>
> PS: my email client might be omitting the in-reply-to headers, sorry about that
>
> Best regards,
> Tomas
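The last part of the debug patch in this message checks that whatever follows the last node in each LEB is erased flash, which reads back as 0xFF bytes. The same check can be run in user space on a dumped LEB. The sketch below is a simplified stand-in, not the kernel code: the function name is made up, and it returns only the first offending offset, whereas the patch logs every non-0xFF byte it meets in its byte-wise loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return the offset of the first byte that is not 0xFF, or -1 if the whole
 * buffer looks like erased flash.
 */
static long first_non_ff(const uint8_t *buf, size_t len)
{
	assert(buf != NULL || len == 0);
	for (size_t i = 0; i < len; i++)
		if (buf[i] != 0xff)
			return (long)i;
	return -1;
}
```

Called on the tail of a LEB, starting at the offset where the last valid node or padding ended, a result other than -1 corresponds to the "corrupt empty space" message in the patch.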
* Re: [BUG] UBIFS corruption on powerpc 32-bit targets
  2026-02-05  2:14 ` Zhihao Cheng
@ 2026-02-05 15:47 ` Tomas Alvarez Vanoli
  2026-02-06  2:21 ` Zhihao Cheng
  0 siblings, 1 reply; 19+ messages in thread
From: Tomas Alvarez Vanoli @ 2026-02-05 15:47 UTC (permalink / raw)
  To: Zhihao Cheng, linux-mtd@lists.infradead.org

Hi,

>It would be appreciate that if you could modify according to the
>suggestions and resend them.

I'll take a look at the patch and submit the changes soon.

>Looks like that there is only one problem, the inode cannot be found but
>the dentry(ne_config.xml.gz) still exists on flash. I guess the lost
>inode is 3917759 in previous log.
>Maybe the inode is
>
>UBIFS error (ubi1:0 pid 134): ubifs_iget: failed to read inode 3917759,
>error -2
>UBIFS error (ubi1:0 pid 134): ubifs_lookup: dead directory entry
>'ne_config.xml.gz', error -2
>UBIFS warning (ubi1:0 pid 134): ubifs_ro_mode: switched to read-only
>mode, error -2
>find: /cfg/board/cfg/mob/backup/ne/ne_config.xml.gz: No such fiCPU: 1
>UID: 0 PID: 134 Comm: find Not tainted 6.12.57-00435-gf9e139970f1f #0
>
>Apply following patch to kernel and mount bad ubifs image, let's try to
>find more information after a full volume scanning:

I flashed the corrupted image again. The same error happens with "find /cfg",
but I also get a couple of ECC errors afterwards for some reason.

Here's some excerpts related to ino 3917759 from the full volume scan ``` ---------- Start scan LEB 911 (main_first 11) ---------- data 3917705 found 0 0-3792 padding bytes 3792:4096 data 3917705 found 0 4096-4408 padding bytes 4408:6144 data 3917707 found 0 6144-6608 padding bytes 6608:8192 data 3917708 found 0 8192-8360 padding bytes 8360:10240 data 3917709 found 0 10240-10824 padding bytes 10824:12288 data 3917710 found 0 12288-12752 padding bytes 12752:14336 data 3917711 found 0 14336-14816 padding bytes 14816:16384 data 3917712 found 0 16384-17192 padding bytes 17192:18432 data 3917714 found 0 18432-19496 padding bytes 19496:20480 data 3917715 found 0 20480-20648 padding bytes 20648:22528 data 3917716 found 0 22528-23000 padding bytes 23000:24576 data 3917717 found 0 24576-25056 padding bytes 25056:26624 data 3917718 found 0 26624-27112 padding bytes 27112:28672 data 3917719 found 0 28672-30768 padding bytes 30768:32768 data 3917721 found 0 32768-32936 padding bytes 32936:34816 data 3917722 found 0 34816-35288 padding bytes 35288:36864 data 3917723 found 0 36864-37344 padding bytes 37344:38912 data 3917724 found 0 38912-39400 padding bytes 39400:40960 data 3917725 found 0 40960-42024 padding bytes 42024:43008 data 3917726 found 0 43008-45144 padding bytes 45144:47104 data 3917728 found 0 47104-48200 padding bytes 48200:49152 data 3917729 found 0 49152-49632 padding bytes 49632:51200 data 3917731 found 0 51200-51368 padding bytes 51368:53248 data 3917732 found 0 53248-54336 padding bytes 54336:55296 data 3917733 found 0 55296-55784 padding bytes 55784:57344 data 3917734 found 0 57344-57840 padding bytes 57840:59392 data 3917735 found 0 59392-59552 padding bytes 59552:61440 data 3917736 found 0 61440-61920 padding bytes 61920:63488 data 3917738 found 0 63488-63968 padding bytes 63968:65536 data 3917739 found 0 65536-66680 padding bytes 66680:67584 data 3917740 found 0 67584-67752 padding bytes 67752:69632 data 3917741 found 0 69632-70112 padding bytes 
70112:71680 data 3917742 found 0 71680-72176 padding bytes 72176:73728 data 3917743 found 0 73728-74136 padding bytes 74136:75776 data 3917745 found 0 75776-75944 padding bytes 75944:77824 data 3917746 found 0 77824-78888 padding bytes 78888:79872 data 3917747 found 0 79872-80328 padding bytes 80328:81920 data 3917748 found 0 81920-82384 padding bytes 82384:83968 data 3917749 found 0 83968-84440 padding bytes 84440:86016 data 3917750 found 0 86016-86728 padding bytes 86728:88064 data 3917698 found 0 88064-88264 padding bytes 88264:90112 data 3917753 found 1 90112-90592 padding bytes 90592:92160 data 3917755 found 1 92160-92640 padding bytes 92640:94208 data 3917756 found 1 94208-94688 padding bytes 94688:96256 data 3917757 found 1 96256-96744 padding bytes 96744:98304 data 3917758 found 1 98304-98792 padding bytes 98792:100352 data 3917759 found 0 100352-101792 padding bytes 101792:102400 data 3917760 found 1 102400-103464 padding bytes 103464:104448 data 3917762 found 1 104448-104824 padding bytes 104824:106496 data 3917763 found 1 106496-106664 padding bytes 106664:108544 data 3917761 found 1 108544-112688 padding bytes 112688:114688 data 3917761 found 1 114688-117768 padding bytes 117768:118784 data 3917767 found 1 118784-120224 padding bytes 120224:120832 data 3917768 found 1 120832-121320 padding bytes 121320:122880 data 3917769 found 1 122880-123360 padding bytes 123360:124928 ino 69 found 0 124928-125088 padding bytes 125088:126976 ---------- Start scan LEB 912 (main_first 11) ---------- ``` [...] 
``` ---------- Start scan LEB 914 (main_first 11) ---------- ino 3917758 found 0 0-160 padding bytes 160:2048 dent 13819078852795695104(ne_config.xml.gz) found 1 2048-2128 ino 3917759 found 0 2128-2288 ino 3917754 found 0 2288-2448 padding bytes 2448:4096 ino 3917758 found 1 4096-4256 padding bytes 4256:6144 ino 3917759 found 0 6144-6304 padding bytes 6304:8192 ``` There's also a second dent found in the tree for ne_config.xml.gz (there's another file named the same elsewhere), probably irrelevant: ``` padding bytes 29056:30720 dent 14323482011061190656(ne) found 1 30720-30784 ino 3917766 found 0 30784-30944 ino 3917765 found 0 30944-31104 padding bytes 31104:32768 dent 14395539605099118592(ne_config.xml.gz) found 1 32768-32848 ino 3917767 found 0 32848-33008 ino 3917766 found 0 33008-33168 padding bytes 33168:34816 ino 3917767 found 0 34816-34976 padding bytes 34976:36864 ``` The full logs are roughly 120 thousand lines, let me know if you need me to search for something or would like them filtered differently. Best Regards, Tomas ______________________________________________________ Linux MTD discussion mailing list http://lists.infradead.org/mailman/listinfo/linux-mtd/ ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG] UBIFS corruption on powerpc 32-bit targets 2026-02-05 15:47 ` Tomas Alvarez Vanoli @ 2026-02-06 2:21 ` Zhihao Cheng 2026-02-06 16:14 ` Tomas Alvarez Vanoli 0 siblings, 1 reply; 19+ messages in thread
From: Zhihao Cheng @ 2026-02-06 2:21 UTC (permalink / raw)
To: Tomas Alvarez Vanoli, linux-mtd@lists.infradead.org

On 2026/2/5 23:47, Tomas Alvarez Vanoli wrote:
> Hi,
>
>> It would be appreciate that if you could modify according to the
>> suggestions and resend them.
>
> I'll take look at the patch and submit the changes soon.
>
>> Looks like that there is only one problem, the inode cannot be found but
>> the dentry(ne_config.xml.gz) still exists on flash. I guess the lost
>> inode is 3917759 in previous log.
>> Maybe the inode is
>>
>> UBIFS error (ubi1:0 pid 134): ubifs_iget: failed to read inode 3917759,
>> error -2
>> UBIFS error (ubi1:0 pid 134): ubifs_lookup: dead directory entry
>> 'ne_config.xml.gz', error -2
>> UBIFS warning (ubi1:0 pid 134): ubifs_ro_mode: switched to read-only
>> mode, error -2
>> find: /cfg/board/cfg/mob/backup/ne/ne_config.xml.gz: No such fiCPU: 1
>> UID: 0 PID: 134 Comm: find Not tainted 6.12.57-00435-gf9e139970f1f #0
>>
>> Apply following patch to kernel and mount bad ubifs image, let's try to
>> find more information after a full volume scanning:
>
> I flashed again the corrupted image, the same error happens with "find /cfg",
> but also I get a couple ecc errors afterwards for some reason.
>
> Here's some excerpts related to ino 3917759 from the full volume scan
>
> ```
> ---------- Start scan LEB 911 (main_first 11) ----------
> data 3917758 found 1 98304-98792
> padding bytes 98792:100352
> data 3917759 found 0 100352-101792
> padding bytes 101792:102400

The 'data 3917759' line means that ne_config.xml.gz was written at some point; 'found 0' means that it was truncated or deleted later.

> data 3917760 found 1 102400-103464
> padding bytes 103464:104448
...
> ---------- Start scan LEB 912 (main_first 11) ----------
> ```
>
> [...]
>
> ```
> ---------- Start scan LEB 914 (main_first 11) ----------
> ino 3917758 found 0 0-160
> padding bytes 160:2048
> dent 13819078852795695104(ne_config.xml.gz) found 1 2048-2128

This should be the dentry. 13819078852795695104 is 0xbfc73b0000000000, which is the little-endian u64 representation of 0x3bc7bf (3917759) read without byte-swapping.

> ino 3917759 found 0 2128-2288
> ino 3917754 found 0 2288-2448
> padding bytes 2448:4096

The three items above (dent + ino 3917759 + directory ino 3917754) should be a creation op.

> ino 3917758 found 1 4096-4256
> padding bytes 4256:6144
> ino 3917759 found 0 6144-6304
> padding bytes 6304:8192

Then inode 3917759 was changed by some op (e.g. write/delete).

So the file 'ne_config.xml.gz' was created and written, and then one of the following two ops may have been executed:

1. truncate. No truncation nodes are found; maybe they are not printed, or they were garbage-collected later. Apply the following patch to verify it. Even so, UBIFS makes sure that truncation is an atomic modification op on flash, so an inode node with found=1 should still be scanned from the flash in the end; it cannot happen that the inode is not found.
diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c index 35d54e1a7ecd..274bdab606b3 100644 --- a/fs/ubifs/super.c +++ b/fs/ubifs/super.c @@ -1509,10 +1509,12 @@ static int mount_ubifs(struct ubifs_info *c) while (len >= 8) { int found; int node_len; + unsigned long long seq; struct ubifs_ch *ch = (struct ubifs_ch *)buf; struct ubifs_ino_node *ino; struct ubifs_dent_node *dent; struct ubifs_data_node *dn; + struct ubifs_trun_node *trun; union ubifs_key key; ret = ubifs_scan_a_node(c, buf, len, lnum, offs, 0); @@ -1531,6 +1533,7 @@ static int mount_ubifs(struct ubifs_info *c) } node_len = ALIGN(le32_to_cpu(ch->len), 8); + seq = le64_to_cpu(ch->sqnum); switch (ch->node_type) { case UBIFS_INO_NODE: @@ -1538,21 +1541,25 @@ static int mount_ubifs(struct ubifs_info *c) key_read(c, &ino->key, &key); found = ubifs_tnc_has_node(c, &key, 0, lnum, offs, 0); - pr_info("ino %u found %d %d-%d\n", key.u32[0], found, offs, offs + node_len); + pr_info("ino %u seq %llu nlink %u found %d %d-%d\n", key.u32[0], seq, le32_to_cpu(ino->nlink), found, offs, offs + node_len); break; case UBIFS_DENT_NODE: dent = (struct ubifs_dent_node *)buf; key_read(c, &dent->key, &key); found = ubifs_tnc_has_node(c, &key, 0, lnum, offs, 0); - pr_info("dent %llu(%s) found %d %d-%d\n", dent->inum, dent->name, found, offs, offs + node_len); + pr_info("dent %llu(%s) seq %llu found %d %d-%d\n", le64_to_cpu(dent->inum), dent->name, seq, found, offs, offs + node_len); break; case UBIFS_DATA_NODE: dn = (struct ubifs_data_node *)buf; key_read(c, &dn->key, &key); found = ubifs_tnc_has_node(c, &key, 0, lnum, offs, 0); - pr_info("data %u found %d %d-%d\n", key.u32[0], found, offs, offs + node_len); + pr_info("data %u seq %llu found %d %d-%d\n", key.u32[0], seq, found, offs, offs + node_len); + break; + case UBIFS_TRUN_NODE: + trun = (struct ubifs_trun_node *)buf; + pr_info("trun %u seq %llu %d-%d\n", le32_to_cpu(trun->inum), seq, offs, offs + node_len); break; default: pr_info("unknown type %d\n", ch->node_type); 2. 
delete. UBIFS packs dentry/inode/dir_inode into one atomic modification op on flash, so it cannot happen that the dentry exists but the inode is deleted.

Are there any error/warning messages (from the ubi/ubifs/flash driver) during scanning?

> ```
>
> There's also a second dent found in the tree for ne_config.xml.gz (there's
> another file named the same elsewhere), probably irrelevant:
>
> ```
> padding bytes 29056:30720
> dent 14323482011061190656(ne) found 1 30720-30784
> ino 3917766 found 0 30784-30944
> ino 3917765 found 0 30944-31104
> padding bytes 31104:32768
> dent 14395539605099118592(ne_config.xml.gz) found 1 32768-32848
> ino 3917767 found 0 32848-33008
> ino 3917766 found 0 33008-33168
> padding bytes 33168:34816
> ino 3917767 found 0 34816-34976
> padding bytes 34976:36864
> ```
>
> The full logs are roughly 120 thousand lines, let me know if you need me to
> search for something or would like them filtered differently.
>
> Best Regards,
> Tomas
>
>
> .
>
______________________________________________________ Linux MTD discussion mailing list http://lists.infradead.org/mailman/listinfo/linux-mtd/ ^ permalink raw reply related [flat|nested] 19+ messages in thread
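The byte-order remark in the message above (dent number 13819078852795695104 being inode 3917759) can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `swap_u64` is made up here, and it assumes that the first debug patch printed the on-flash little-endian `inum` field as a native u64 on the big-endian powerpc CPU, i.e. without `le64_to_cpu()`.

```python
import struct

def swap_u64(value: int) -> int:
    """Reverse the byte order of an unsigned 64-bit value (hypothetical helper)."""
    return struct.unpack("<Q", struct.pack(">Q", value))[0]

# dent numbers copied from the scan excerpts quoted in this thread
for raw in (13819078852795695104, 14323482011061190656, 14395539605099118592):
    print(f"{raw:#018x} -> ino {swap_u64(raw)}")

# 0xbfc73b0000000000 -> ino 3917759
# 0xc6c73b0000000000 -> ino 3917766
# 0xc7c73b0000000000 -> ino 3917767
```

The decoded values match the `ino` nodes logged directly after each dent in the excerpts, which is consistent with the second debug patch printing `le64_to_cpu(dent->inum)` instead of the raw field.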
* Re: [BUG] UBIFS corruption on powerpc 32-bit targets 2026-02-06 2:21 ` Zhihao Cheng @ 2026-02-06 16:14 ` Tomas Alvarez Vanoli 2026-02-07 2:58 ` Zhihao Cheng 0 siblings, 1 reply; 19+ messages in thread From: Tomas Alvarez Vanoli @ 2026-02-06 16:14 UTC (permalink / raw) To: Zhihao Cheng, linux-mtd@lists.infradead.org > Are there any error/warning messages(from ubi/ubifs/flash driver) during > scanning? Now in the mounting logs we get some ECC errors, but I am sure this comes from the way the image was dumped and flashed again, because I did not see this before. No truncation appears for the affected inode. I have another board where the corruption happened, so I can analyze that one with the patches suggested too. On this other board, the error is the same but with a different file: UBIFS (ubi1:0): UBIFS: mounted UBI device 1, volume 0, name "cfg" UBIFS (ubi1:0): LEB size: 126976 bytes (124 KiB), min./max. I/O unit sizes: 2048 bytes/2048 bytes UBIFS (ubi1:0): FS size: 132943872 bytes (126 MiB, 1047 LEBs), max 1058 LEBs, journal size 6602752 bytes (6 MiB, 52 LEBs) UBIFS (ubi1:0): reserved for root: 4952683 bytes (4836 KiB) UBIFS (ubi1:0): media format: w5/r0 (latest is w5/r0), UUID DC7EC969-1F4D-4669-9D31-16C57EAB96DA, small LPT model UBIFS error (ubi1:0 pid 134): ubifs_iget: failed to read inode 123367, error -2 UBIFS error (ubi1:0 pid 134): ubifs_lookup: dead directory entry 'common_userdata_compat.xml.gz', error -2 UBIFS warning (ubi1:0 pid 134): ubifs_ro_mode: switched to read-only mode, error -2 There are no error messages in the mount log for this board, the node situation is the same, and there are no truncation operations for this node. 
[me@mypc debug]$ grep -B8 -A5 123367 logs_from_mounting_second_board data 123362 seq 1375451 found 1 2048-2520 padding bytes 2520:4096 data 123364 seq 1375458 found 1 4096-4576 padding bytes 4576:6144 data 123365 seq 1375459 found 1 6144-6624 padding bytes 6624:8192 data 123366 seq 1375489 found 1 8192-8680 padding bytes 8680:10240 data 123367 seq 1375493 found 0 10240-10728 padding bytes 10728:12288 data 123368 seq 1375507 found 1 12288-13592 padding bytes 13592:14336 data 123369 seq 1375508 found 1 14336-15384 padding bytes 15384:16384 -- ino 123363 seq 1375463 nlink 2 found 0 125168-125328 padding bytes 125328:126976 ---------- Start scan LEB 182 (main_first 11) ---------- ---------- Start scan LEB 183 (main_first 11) ---------- ino 123365 seq 1375465 nlink 1 found 1 0-160 padding bytes 160:2048 ino 123366 seq 1375466 nlink 1 found 0 2048-2208 padding bytes 2208:4096 dent 123367(common_userdata_compat.xml.gz) seq 1375486 found 1 4096-4184 ino 123367 seq 1375487 nlink 1 found 0 4184-4344 ino 123361 seq 1375488 nlink 2 found 0 4344-4504 padding bytes 4504:6144 ino 123367 seq 1375490 nlink 1 found 0 6144-6304 padding bytes 6304:8192 ino 123366 seq 1375492 nlink 1 found 1 8192-8352 padding bytes 8352:10240 dent 123368(ne_config.xml.gz) seq 1375496 found 1 10240-10320 ino 123368 seq 1375497 nlink 1 found 0 10320-10480 ino 123363 seq 1375498 nlink 2 found 0 10480-10640 padding bytes 10640:12288 ino 123367 seq 1375500 nlink 1 found 0 12288-12448 padding bytes 12448:14336 dent 123369(unit-11_config.xml.gz) seq 1375502 found 1 14336-14416 ino 123369 seq 1375503 nlink 1 found 0 14416-14576 ino 123361 seq 1375504 nlink 2 found 0 14576-14736 padding bytes 14736:16384 Just as a summary of the whole thing: - Happens only when we introduce 6.12.57 kernel into our testing system. - The boards go from running 6.12.57, to 4.14.20, to 6.1.75 for testing old release. When the 6.1.75 tries to access files in the cfg volume, there we see the error. 
- We don't see any warning or error messages from the kernel.

- It only happens on the two different powerpc boards we have, not on the armv8 one. All the boards use fsl,ifc-nand and the same NAND chip.

- The UBI tests from mtd-utils succeed on both the older and the newer kernel.

Best Regards,
Tomas
______________________________________________________ Linux MTD discussion mailing list http://lists.infradead.org/mailman/listinfo/linux-mtd/ ^ permalink raw reply [flat|nested] 19+ messages in thread
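With roughly 120 thousand log lines, the manual grep used in the message above could be mechanized. The following is a minimal sketch, not part of the thread: it assumes the line format produced by the second debug patch (`ino N seq S nlink K found F a-b` and `dent N(name) seq S found F a-b`, one node per line), and the function name `dead_dentries` is hypothetical. It reports dentries that are still referenced by the index (`found 1`) while no node of their inode is.

```python
import re

# Formats assumed from the second debug patch in this thread.
INO_RE = re.compile(r"^ino (\d+) seq (\d+) nlink (\d+) found ([01]) ")
DENT_RE = re.compile(r"^dent (\d+)\((.*)\) seq (\d+) found ([01]) ")

def dead_dentries(lines):
    """Return (inum, name) for live dentries whose inode has no live ino node."""
    live_inodes = set()
    live_dents = []
    for line in lines:
        line = line.strip()
        m = INO_RE.match(line)
        if m:
            if m.group(4) == "1":
                live_inodes.add(int(m.group(1)))
            continue
        m = DENT_RE.match(line)
        if m and m.group(4) == "1":
            live_dents.append((int(m.group(1)), m.group(2)))
    return [(inum, name) for inum, name in live_dents if inum not in live_inodes]

# A few lines copied from the second board's scan of LEB 183.
sample = """\
ino 123365 seq 1375465 nlink 1 found 1 0-160
ino 123366 seq 1375466 nlink 1 found 0 2048-2208
dent 123367(common_userdata_compat.xml.gz) seq 1375486 found 1 4096-4184
ino 123367 seq 1375487 nlink 1 found 0 4184-4344
ino 123361 seq 1375488 nlink 2 found 0 4344-4504
ino 123367 seq 1375490 nlink 1 found 0 6144-6304
ino 123366 seq 1375492 nlink 1 found 1 8192-8352
""".splitlines()

print(dead_dentries(sample))
# [(123367, 'common_userdata_compat.xml.gz')]
```

The check is only meaningful on the complete scan log: on a grep excerpt, an inode whose live node falls outside the excerpt would be reported as a false positive.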
* Re: [BUG] UBIFS corruption on powerpc 32-bit targets 2026-02-06 16:14 ` Tomas Alvarez Vanoli @ 2026-02-07 2:58 ` Zhihao Cheng 2026-02-09 15:45 ` Tomas Alvarez Vanoli 0 siblings, 1 reply; 19+ messages in thread From: Zhihao Cheng @ 2026-02-07 2:58 UTC (permalink / raw) To: Tomas Alvarez Vanoli, linux-mtd@lists.infradead.org 在 2026/2/7 0:14, Tomas Alvarez Vanoli 写道: >> Are there any error/warning messages(from ubi/ubifs/flash driver) during >> scanning? > > Now in the mounting logs we get some ECC errors, but I am sure this comes from > the way the image was dumped and flashed again, because I did not see this > before. No truncation appears for the affected inode. > > I have another board where the corruption happened, so I can analyze that one > with the patches suggested too. > On this other board, the error is the same but with a different file: > > UBIFS (ubi1:0): UBIFS: mounted UBI device 1, volume 0, name "cfg" > UBIFS (ubi1:0): LEB size: 126976 bytes (124 KiB), min./max. I/O unit sizes: 2048 > bytes/2048 bytes > UBIFS (ubi1:0): FS size: 132943872 bytes (126 MiB, 1047 LEBs), max 1058 LEBs, > journal size 6602752 bytes (6 MiB, 52 LEBs) > UBIFS (ubi1:0): reserved for root: 4952683 bytes (4836 KiB) > UBIFS (ubi1:0): media format: w5/r0 (latest is w5/r0), UUID > DC7EC969-1F4D-4669-9D31-16C57EAB96DA, small LPT model > UBIFS error (ubi1:0 pid 134): ubifs_iget: failed to read inode 123367, error -2 > UBIFS error (ubi1:0 pid 134): ubifs_lookup: dead directory entry > 'common_userdata_compat.xml.gz', error -2 > UBIFS warning (ubi1:0 pid 134): ubifs_ro_mode: switched to read-only mode, error > -2 > > There are no error messages in the mount log for this board, the node situation > is the same, and there are no truncation operations for this node. 
> > [me@mypc debug]$ grep -B8 -A5 123367 logs_from_mounting_second_board > data 123362 seq 1375451 found 1 2048-2520 > padding bytes 2520:4096 > data 123364 seq 1375458 found 1 4096-4576 > padding bytes 4576:6144 > data 123365 seq 1375459 found 1 6144-6624 > padding bytes 6624:8192 > data 123366 seq 1375489 found 1 8192-8680 > padding bytes 8680:10240 > data 123367 seq 1375493 found 0 10240-10728 > padding bytes 10728:12288 > data 123368 seq 1375507 found 1 12288-13592 > padding bytes 13592:14336 > data 123369 seq 1375508 found 1 14336-15384 > padding bytes 15384:16384 > -- > ino 123363 seq 1375463 nlink 2 found 0 125168-125328 > padding bytes 125328:126976 > ---------- Start scan LEB 182 (main_first 11) ---------- > ---------- Start scan LEB 183 (main_first 11) ---------- > ino 123365 seq 1375465 nlink 1 found 1 0-160 > padding bytes 160:2048 > ino 123366 seq 1375466 nlink 1 found 0 2048-2208 > padding bytes 2208:4096 > dent 123367(common_userdata_compat.xml.gz) seq 1375486 found 1 4096-4184 > ino 123367 seq 1375487 nlink 1 found 0 4184-4344 > ino 123361 seq 1375488 nlink 2 found 0 4344-4504 > padding bytes 4504:6144 > ino 123367 seq 1375490 nlink 1 found 0 6144-6304 > padding bytes 6304:8192 > ino 123366 seq 1375492 nlink 1 found 1 8192-8352 > padding bytes 8352:10240 > dent 123368(ne_config.xml.gz) seq 1375496 found 1 10240-10320 > ino 123368 seq 1375497 nlink 1 found 0 10320-10480 > ino 123363 seq 1375498 nlink 2 found 0 10480-10640 > padding bytes 10640:12288 > ino 123367 seq 1375500 nlink 1 found 0 12288-12448 > padding bytes 12448:14336 > dent 123369(unit-11_config.xml.gz) seq 1375502 found 1 14336-14416 > ino 123369 seq 1375503 nlink 1 found 0 14416-14576 > ino 123361 seq 1375504 nlink 2 found 0 14576-14736 > padding bytes 14736:16384 > Looks like that it is the same problem. > > Just as a summary of the whole thing: > > - Happens only when we introduce 6.12.57 kernel into our testing system. 
> > - The boards go from running 6.12.57, to 4.14.20, to 6.1.75 for testing old > release. When the 6.1.75 tries to access files in the cfg volume, there we see > the error. The same ubifs image is loaded by 6.12.57, 4.14.20 and 6.1.75? And the ubi volume won't be formatted after switching kernel? If I understand right, do you backport ubifs bugfix patches to 4.14? Following commits could corrupt the ubifs image, which may lead to the problem: 1. ee1438ce5dc4d67dd8dd1ff51583122a61f5bd9e ubifs: Check link count of inodes when killing orphans. 2. 4ab25ac8b2b5514151d5f91cf9514df08dd26938 ubifs: Fix ubifs_tnc_lookup() usage in do_kill_orphans() It would be better to apply fix patches through 4.14~HEAD. (git log v4.14..HEAD fs/ubifs/) > > - We don't see any warning or error messages from the kernel. > > - It only happens in the two different powerpc cpu boards we have, not in the > armv8 one. All the boards use fsl,ifc-nand and the same nand chip. > > - ubi tests from mtd-utils succeed for the older and the newer kernel too. > > > Best Regards, > Tomas. > ______________________________________________________ Linux MTD discussion mailing list http://lists.infradead.org/mailman/listinfo/linux-mtd/ ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG] UBIFS corruption on powerpc 32-bit targets 2026-02-07 2:58 ` Zhihao Cheng @ 2026-02-09 15:45 ` Tomas Alvarez Vanoli 2026-02-10 2:38 ` Zhihao Cheng 0 siblings, 1 reply; 19+ messages in thread From: Tomas Alvarez Vanoli @ 2026-02-09 15:45 UTC (permalink / raw) To: Zhihao Cheng, linux-mtd@lists.infradead.org >> >> Just as a summary of the whole thing: >> >> - Happens only when we introduce 6.12.57 kernel into our testing system. >> >> - The boards go from running 6.12.57, to 4.14.20, to 6.1.75 for >> testing old release. When the 6.1.75 tries to access files in the cfg >> volume, there we see the error. > The same ubifs image is loaded by 6.12.57, 4.14.20 and 6.1.75? And the ubi > volume won't be formatted after switching kernel? Yes, the same ubifs image is shared by the kernels/applications. > If I understand right, do you backport ubifs bugfix patches to 4.14? No we don't. The application that uses 4.14.20 is the read-only factory image, which is not updated normally. It mounts the ubi volume but it only reads a file from it, won't do much to it. This one in particular was compiled in 2018 for example, so it does not include those commits. > Following commits could corrupt the ubifs image, which may lead to the > problem: > 1. ee1438ce5dc4d67dd8dd1ff51583122a61f5bd9e ubifs: Check link count of inodes > when killing orphans. > 2. 4ab25ac8b2b5514151d5f91cf9514df08dd26938 ubifs: Fix > ubifs_tnc_lookup() usage in do_kill_orphans() > >It would be better to apply fix patches through 4.14~HEAD. > (git log v4.14..HEAD fs/ubifs/) By this you mean that these patches can lead to corruption when switching to a version of the kernel that does not have them? If that is what you mean, I don't think it is the case because we have been running 6.1 with these patches for more than a year and never seen the issue, however it appears within hours of introducing 6.12 into the mix. 
Some difference between 6.1 and 6.12 makes this error happen, but it could also have to do with jumping from 6.12 to 4.14, even though jumping from 6.1 to 4.14 has not caused the problem in more than a year.

Best regards,
Tomas
______________________________________________________ Linux MTD discussion mailing list http://lists.infradead.org/mailman/listinfo/linux-mtd/ ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG] UBIFS corruption on powerpc 32-bit targets 2026-02-09 15:45 ` Tomas Alvarez Vanoli @ 2026-02-10 2:38 ` Zhihao Cheng 2026-02-10 16:40 ` Tomas Alvarez Vanoli 0 siblings, 1 reply; 19+ messages in thread
From: Zhihao Cheng @ 2026-02-10 2:38 UTC (permalink / raw)
To: Tomas Alvarez Vanoli, linux-mtd@lists.infradead.org

On 2026/2/9 23:45, Tomas Alvarez Vanoli wrote:
>>>
>>> Just as a summary of the whole thing:
>>>
>>> - Happens only when we introduce 6.12.57 kernel into our testing system.
>>>
>>> - The boards go from running 6.12.57, to 4.14.20, to 6.1.75 for
>>> testing old release. When the 6.1.75 tries to access files in the cfg
>>> volume, there we see the error.
>> The same ubifs image is loaded by 6.12.57, 4.14.20 and 6.1.75? And the ubi
>> volume won't be formatted after switching kernel?
>
> Yes, the same ubifs image is shared by the kernels/applications.
>
>> If I understand right, do you backport ubifs bugfix patches to 4.14?
>
> No we don't. The application that uses 4.14.20 is the read-only factory image,
> which is not updated normally. It mounts the ubi volume but it only reads a file
> from it, won't do much to it. This one in particular was compiled in 2018 for
> example, so it does not include those commits.

Is the ubifs mounted in read-only mode in 4.14?
If not, the result of the do_kill_orphans() processing will be committed to flash.

>
>> Following commits could corrupt the ubifs image, which may lead to the
>> problem:
>> 1. ee1438ce5dc4d67dd8dd1ff51583122a61f5bd9e ubifs: Check link count of inodes
>> when killing orphans.
>> 2. 4ab25ac8b2b5514151d5f91cf9514df08dd26938 ubifs: Fix
>> ubifs_tnc_lookup() usage in do_kill_orphans()
>>
>> It would be better to apply fix patches through 4.14~HEAD.
>> (git log v4.14..HEAD fs/ubifs/)
>
> By this you mean that these patches can lead to corruption when switching to a
> version of the kernel that does not have them?
If that is what you mean, I don't
> think it is the case because we have been running 6.1 with these patches for
> more than a year and never seen the issue, however it appears within hours of
> introducing 6.12 into the mix.
>
> Somehow there is some difference in between 6.12 and 6.1 that makes this error
> happen, but it could potentially have to do also with jumping from 6.12 to 4.14,
> even though jumping from 6.1 to 4.14 has not caused the problem for the past 1+
> year.

The commit 3af2d3a8c56fe7dc24f60c4df0ab85b7ac941902 ("ubifs: Fix unattached inode when powercut happens in creating") [v6.11] was introduced between 6.1 and 6.12; it adds every inode to the orphan area at the beginning of creation.

So, you may try option 1 or 2 and check whether the problem still happens:
1. mount 4.14 read-only
2. backport the fix patches to 4.14

>
> Best regards,
> Tomas
>
>
> .
>
______________________________________________________ Linux MTD discussion mailing list http://lists.infradead.org/mailman/listinfo/linux-mtd/ ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG] UBIFS corruption on powerpc 32-bit targets 2026-02-10 2:38 ` Zhihao Cheng @ 2026-02-10 16:40 ` Tomas Alvarez Vanoli 2026-02-11 6:58 ` Zhihao Cheng 0 siblings, 1 reply; 19+ messages in thread
From: Tomas Alvarez Vanoli @ 2026-02-10 16:40 UTC (permalink / raw)
To: Zhihao Cheng, linux-mtd@lists.infradead.org

>Is the ubifs mounted in readonly mode in 4.14?
>If not, the do_kill_orphan process will be committed on flash.

No, it is not read-only.

>The commit 3af2d3a8c56fe7dc24f60c4df0ab85b7ac941902("ubifs: Fix
>unattached inode when powercut happens in creating") [v6.11] is
>introduced between 6.1 and 6.12, which will add every inode into orphan
>area at the beginning of creation.

By creation, do you mean the first occurrence of the inode, or whenever there's an update?

If I understand correctly, you are suggesting that the new inode is added to the orphan area, the board is reset and booted into the older kernel, and that kernel then cleans all the orphans even if they have a link count, leaving the fs broken?

So, in other words, my understanding of what you say is that for 3af2d3a8c56fe7dc24f60c4df0ab85b7ac941902 not to break anything, any kernel that mounts the same partition needs ee1438ce5dc4d67dd8dd1ff51583122a61f5bd9e and 4ab25ac8b2b5514151d5f91cf9514df08dd26938.

>So, you may try option 1 or 2 and check whether the problem could happen:
>1. mount 4.14 readonly
>2. backport fix patches on 4.14

Potentially also replacing the 4.14 application with an application based on 6.1?

I am also interested in consistently reproducing this at my desk. I have been tinkering with adding a message plus a 10-second delay at points in the ubifs code so that I can reset the board and boot the old kernel to trigger this behaviour. Would you know at which exact point, or under which conditions, it would make sense to add this? Or is it not possible?
I do not fully understand your theory. I have read the docs for the fs, but I have little experience with filesystem implementation in general, so I am not sure how feasible this "hack" is.

Thanks for all your support!

Best regards,
Tomas
______________________________________________________ Linux MTD discussion mailing list http://lists.infradead.org/mailman/listinfo/linux-mtd/ ^ permalink raw reply [flat|nested] 19+ messages in thread
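The hypothesis restated in the message above can be written down as a toy model. This is a deliberately simplified sketch and not UBIFS code: the `Inode` class, `kill_orphans` and all values are hypothetical. It assumes, as discussed in this thread, that a 6.12 kernel can leave a freshly created and already linked inode recorded in the orphan area, and that a kernel without ee1438ce5dc4 removes every recorded orphan at mount time without looking at its link count.

```python
from dataclasses import dataclass

@dataclass
class Inode:
    inum: int
    nlink: int

def kill_orphans(inodes, orphans, check_nlink):
    """Return the inodes that survive orphan processing at mount time (toy model)."""
    survivors = dict(inodes)
    for inum in orphans:
        inode = survivors.get(inum)
        if inode is None:
            continue
        if check_nlink and inode.nlink > 0:
            continue  # still linked from a directory: keep it
        del survivors[inum]
    return survivors

# Assumed state after an unclean switch of kernels: the inode is linked by a
# dentry but is still listed in the orphan area.
inodes = {65: Inode(65, nlink=1)}
dentries = {"file": 65}
orphans = [65]

for label, check in (("without nlink check", False), ("with nlink check", True)):
    left = kill_orphans(inodes, orphans, check)
    dead = [name for name, inum in dentries.items() if inum not in left]
    print(label, "-> dead directory entries:", dead)

# without nlink check -> dead directory entries: ['file']
# with nlink check -> dead directory entries: []
```

In this model the dentry outlives its inode only when the mounting kernel skips the link-count check, which is the "dead directory entry" symptom reported on both boards. Whether that is what actually happens on the hardware still needs the reproduction asked for above.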
* Re: [BUG] UBIFS corruption on powerpc 32-bit targets 2026-02-10 16:40 ` Tomas Alvarez Vanoli @ 2026-02-11 6:58 ` Zhihao Cheng 2026-02-11 7:01 ` Zhihao Cheng 0 siblings, 1 reply; 19+ messages in thread
From: Zhihao Cheng @ 2026-02-11 6:58 UTC (permalink / raw)
To: Tomas Alvarez Vanoli, linux-mtd@lists.infradead.org

On 2026/2/11 0:40, Tomas Alvarez Vanoli wrote:
>> Is the ubifs mounted in readonly mode in 4.14?
>> If not, the do_kill_orphan process will be committed on flash.
>
> No, it is not read only.
>
>> The commit 3af2d3a8c56fe7dc24f60c4df0ab85b7ac941902("ubifs: Fix
>> unattached inode when powercut happens in creating") [v6.11] is
>> introduced between 6.1 and 6.12, which will add every inode into orphan
>> area at the beginning of creation.
>
> By creation you mean the first occurrence of it or whenever there's an update?
>
> If I understand correctly, what you suggest is that it's possible that as
> the new inode is added to orphan area and the board is reset and booted into the
> older kernel, which then cleans all the orphans even if they have link count,
> leaving the fs broken?
>
> So, in other words, my understanding of what you say is that for
> 3af2d3a8c56fe7dc24f60c4df0ab85b7ac941902 to not break anything, any kernel that
> mounts the same partition needs ee1438ce5dc4d67dd8dd1ff51583122a61f5bd9e and
> 4ab25ac8b2b5514151d5f91cf9514df08dd26938 .

Correct.

>
>> So, you may try option 1 or 2 and check whether the problem could happen:
>> 1. mount 4.14 readonly
>> 2. backport fix patches on 4.14
>
> Potentially also replacing the 4.14 application with an application based on
> 6.1?

What does 'application' mean?

>
> I am also interested in consistently reproducing this at my desk. I have been
> tinkering with adding a message + a 10 second delay at points in the ubifs code
> so that I can reset the board and boot the old kernel to trigger this behaviour.
> Would you know at which exact point/conditions it would make sense to add this?
> Or is it not possible?
> I do not fully understand your theory, I have read the docs for the fs but I
> have little experience with filesystem implementation in general, so I am not
> sure how feasible this "hack" is.

Apply following patch in 6.12:

Apply following patch in 4.14:

Run:
start kernel 6.12
$ touch $MNT/file # cmd will stuck, $MNT is the ubifs mntpoint
$ dmesg # you will see kernel message 'wait sync and write orphans', open another terminal
$ sync # open another terminal, you will see kernel message 'write orphan XX', and the 'touch' cmd goes on
$ dd if=/dev/mtd0 of=flash bs=1M
$ sync and poweroff

start kernel 4.14
$ flash_eraseall /dev/mtd0
$ dd if=flash of=/dev/mtd0 bs=1M # make sure mtd0 is not attached by ubi
$ ubiattach -m0
$ mount /dev/ubi0_0 temp # dmesg will show 'remove ino XX'
$ ls temp
ls: cannot access 'temp/file': No such file or directory
file
$ dmesg
[ 69.200639] UBIFS error (ubi0:0 pid 1587): ubifs_iget [ubifs]: failed to read inode 65, error -2
[ 69.206214] UBIFS error (ubi0:0 pid 1587): ubifs_lookup [ubifs]: dead directory entry 'file', error -2
[ 69.207482] UBIFS warning (ubi0:0 pid 1587): ubifs_ro_mode [ubifs]: switched to read-only mode, error -2
[ 69.208783] CPU: 2 PID: 1587 Comm: ls Not tainted 4.19.90-00006-g332043630ae1-dirty #50
[ 69.209837] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
[ 69.211545] Call Trace:
[ 69.211921] dump_stack+0xb8/0xef
[ 69.212370] ubifs_ro_mode+0x54/0x60 [ubifs]
[ 69.212950] ubifs_lookup+0x3b6/0x550 [ubifs]
[ 69.213535] __lookup_slow+0x9e/0x210
[ 69.214045] lookup_slow+0x46/0x70
[ 69.214491] walk_component+0x265/0x4c0
[ 69.215013] path_lookupat.isra.0+0x9e/0x360
[ 69.215588] filename_lookup+0xb5/0x230
[ 69.216107] ? _raw_spin_unlock_irqrestore+0x3f/0x80
[ 69.216769] ? __this_cpu_preempt_check+0x17/0x20
[ 69.217386] ? __local_bh_enable_ip+0x62/0x110
[ 69.218038] ? _raw_spin_unlock_bh+0x2e/0x40
[ 69.218761] ? wb_wakeup_delayed+0x56/0x90
[ 69.219301] ? kmem_cache_alloc+0x110/0x3d0
[ 69.219852] user_path_at_empty+0x43/0x60
[ 69.220391] vfs_statx+0x90/0x150
[ 69.220847] ? __sb_end_write+0x5f/0xb0
[ 69.221354] __se_sys_newlstat+0x46/0xa0
[ 69.221892] ? __se_sys_getdents+0xe0/0x1c0
[ 69.222465] ? filldir64+0x340/0x340
[ 69.222958] __x64_sys_newlstat+0x1a/0x30
[ 69.223499] do_syscall_64+0x95/0x460
[ 69.224004] ? prepare_exit_to_usermode+0xaa/0x180
[ 69.224652] entry_SYSCALL_64_after_hwframe+0x5c/0xc1

>
> Thanks for all your support!
> Best regards,
> Tomas
> .
>
______________________________________________________ Linux MTD discussion mailing list http://lists.infradead.org/mailman/listinfo/linux-mtd/ ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG] UBIFS corruption on powerpc 32-bit targets
From: Zhihao Cheng @ 2026-02-11 7:01 UTC
To: Tomas Alvarez Vanoli, linux-mtd@lists.infradead.org

On 2026/2/11 14:58, Zhihao Cheng wrote:
> On 2026/2/11 0:40, Tomas Alvarez Vanoli wrote:
>>> Is the ubifs mounted in readonly mode in 4.14?
>>> If not, the do_kill_orphan process will be committed on flash.
>>
>> No, it is not read only.
>>
>>> The commit 3af2d3a8c56fe7dc24f60c4df0ab85b7ac941902 ("ubifs: Fix
>>> unattached inode when powercut happens in creating") [v6.11] is
>>> introduced between 6.1 and 6.12, which will add every inode into orphan
>>> area at the beginning of creation.
>>
>> By creation you mean the first occurrence of it or whenever there's an
>> update?
>>
>> If I understand correctly, what you suggest is that it's possible that as
>> the new inode is added to orphan area and the board is reset and booted
>> into the older kernel, which then cleans all the orphans even if they have
>> link count, leaving the fs broken?
>>
>> So, in other words, my understanding of what you say is that for
>> 3af2d3a8c56fe7dc24f60c4df0ab85b7ac941902 to not break anything, any kernel
>> that mounts the same partition needs
>> ee1438ce5dc4d67dd8dd1ff51583122a61f5bd9e and
>> 4ab25ac8b2b5514151d5f91cf9514df08dd26938.
>
> Correct.
>>
>>> So, you may try option 1 or 2 and check whether the problem could happen:
>>> 1. mount 4.14 readonly
>>> 2. backport fix patches on 4.14
>>
>> Potentially also replacing the 4.14 application with an application based
>> on 6.1?
>
> What does 'application' mean?
>>
>> I am also interested in consistently reproducing this at my desk. I have
>> been tinkering with adding a message + a 10 second delay at points in the
>> ubifs code so that I can reset the board and boot the old kernel to trigger
>> this behaviour.
>> Would you know at which exact point/conditions it would make sense to add
>> this? Or is it not possible?
>> I do not fully understand your theory, I have read the docs for the fs but
>> I have little experience with filesystem implementation in general, so I am
>> not sure how feasible this "hack" is.
>
> Apply the following patch to 6.12:
>
diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
index 3c3d3ad4fa6c..3d2f0f6b9a07 100644
--- a/fs/ubifs/dir.c
+++ b/fs/ubifs/dir.c
@@ -302,6 +302,8 @@ static int ubifs_prepare_create(struct inode *dir, struct dentry *dentry,
 	return fscrypt_setup_filename(dir, &dentry->d_name, 0, nm);
 }
 
+#include <linux/delay.h>
+extern int g_wait;
 static int ubifs_create(struct mnt_idmap *idmap, struct inode *dir,
 			struct dentry *dentry, umode_t mode, bool excl)
 {
@@ -341,6 +343,13 @@ static int ubifs_create(struct mnt_idmap *idmap, struct inode *dir,
 	if (err)
 		goto out_inode;
 
+	pr_err("wait sync and write orphans\n");
+	while (!g_wait) {
+		smp_rmb();
+		msleep(10);
+	}
+	pr_err("wait done\n");
+
 	set_nlink(inode, 1);
 	mutex_lock(&dir_ui->ui_mutex);
 	dir->i_size += sz_change;
diff --git a/fs/ubifs/orphan.c b/fs/ubifs/orphan.c
index 5555dd740889..943c93e2f49d 100644
--- a/fs/ubifs/orphan.c
+++ b/fs/ubifs/orphan.c
@@ -284,6 +284,7 @@ static int do_write_orph_node(struct ubifs_info *c, int len, int atomic)
  * orphan head. On success, %0 is returned, otherwise a negative error code
  * is returned.
  */
+int g_wait = 0;
 static int write_orph_node(struct ubifs_info *c, int atomic)
 {
 	struct ubifs_orphan *orphan, *cnext;
@@ -318,6 +319,8 @@ static int write_orph_node(struct ubifs_info *c, int atomic)
 		orphan = cnext;
 		ubifs_assert(c, orphan->cmt);
 		orph->inos[i] = cpu_to_le64(orphan->inum);
+		pr_err("write orphan %lu\n", orphan->inum);
+		g_wait = 1;
 		orphan->cmt = 0;
 		cnext = orphan->cnext;
 		orphan->cnext = NULL;
>
> Apply the following patch to 4.14:
>
diff --git a/fs/ubifs/orphan.c b/fs/ubifs/orphan.c
index 8f70494efb0c..f04a9609c753 100644
--- a/fs/ubifs/orphan.c
+++ b/fs/ubifs/orphan.c
@@ -614,6 +614,7 @@ static int do_kill_orphans(struct ubifs_info *c, struct ubifs_scan_leb *sleb,
 			inum = le64_to_cpu(orph->inos[i]);
 			dbg_rcvry("deleting orphaned inode %lu",
 				  (unsigned long)inum);
+			pr_err("remove ino %lu\n", inum);
 			err = ubifs_tnc_remove_ino(c, inum);
 			if (err)
 				return err;
>
> Run:
My ubifs image was created with mkfs under 4.14.
> start kernel 6.12
> $ touch $MNT/file # the command will hang; $MNT is the ubifs mount point
> $ dmesg # run in another terminal; you will see the kernel message 'wait sync and write orphans'
> $ sync # run in another terminal; you will see the kernel message 'write orphan XX', and the 'touch' command continues
> $ dd if=/dev/mtd0 of=flash bs=1M
> $ sync and poweroff
>
> start kernel 4.14
> $ flash_eraseall /dev/mtd0
> $ dd if=flash of=/dev/mtd0 bs=1M # make sure mtd0 is not attached by ubi
> $ ubiattach -m0
> $ mount /dev/ubi0_0 temp # dmesg will show 'remove ino XX'
> $ ls temp
> ls: cannot access 'temp/file': No such file or directory
> file
> $ dmesg
> [ 69.200639] UBIFS error (ubi0:0 pid 1587): ubifs_iget [ubifs]: failed to read inode 65, error -2
> [ 69.206214] UBIFS error (ubi0:0 pid 1587): ubifs_lookup [ubifs]: dead directory entry 'file', error -2
> [ 69.207482] UBIFS warning (ubi0:0 pid 1587): ubifs_ro_mode [ubifs]: switched to read-only mode, error -2
> [ 69.208783] CPU: 2 PID: 1587 Comm: ls Not tainted 4.19.90-00006-g332043630ae1-dirty #50
> [ 69.209837] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
> [ 69.211545] Call Trace:
> [ 69.211921] dump_stack+0xb8/0xef
> [ 69.212370] ubifs_ro_mode+0x54/0x60 [ubifs]
> [ 69.212950] ubifs_lookup+0x3b6/0x550 [ubifs]
> [ 69.213535] __lookup_slow+0x9e/0x210
> [ 69.214045] lookup_slow+0x46/0x70
> [ 69.214491] walk_component+0x265/0x4c0
> [ 69.215013] path_lookupat.isra.0+0x9e/0x360
> [ 69.215588] filename_lookup+0xb5/0x230
> [ 69.216107] ? _raw_spin_unlock_irqrestore+0x3f/0x80
> [ 69.216769] ? __this_cpu_preempt_check+0x17/0x20
> [ 69.217386] ? __local_bh_enable_ip+0x62/0x110
> [ 69.218038] ? _raw_spin_unlock_bh+0x2e/0x40
> [ 69.218761] ? wb_wakeup_delayed+0x56/0x90
> [ 69.219301] ? kmem_cache_alloc+0x110/0x3d0
> [ 69.219852] user_path_at_empty+0x43/0x60
> [ 69.220391] vfs_statx+0x90/0x150
> [ 69.220847] ? __sb_end_write+0x5f/0xb0
> [ 69.221354] __se_sys_newlstat+0x46/0xa0
> [ 69.221892] ? __se_sys_getdents+0xe0/0x1c0
> [ 69.222465] ? filldir64+0x340/0x340
> [ 69.222958] __x64_sys_newlstat+0x1a/0x30
> [ 69.223499] do_syscall_64+0x95/0x460
> [ 69.224004] ? prepare_exit_to_usermode+0xaa/0x180
> [ 69.224652] entry_SYSCALL_64_after_hwframe+0x5c/0xc1
>
>> Thanks for all your support!
>> Best regards,
>> Tomas
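The failure that the reproducer provokes can be pictured with a small userspace model. This is only a sketch under assumptions taken from the discussion above: the dumped image lists the new inode in the orphan area while the inode already has a link count of 1 and a directory entry, the old kernel deletes every listed inode at mount time, and the fix patches are assumed to skip inodes that still have links. The `toy_*` names are hypothetical and do not exist in UBIFS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_INODES 128

/* Toy stand-in for on-flash state; none of these names exist in UBIFS. */
struct toy_inode {
	bool present;        /* the inode can still be found in the index */
	unsigned int nlink;  /* link count stored in the inode */
};

struct toy_fs {
	struct toy_inode inodes[TOY_MAX_INODES];
	size_t dentry_target;            /* inode that the entry 'file' points at */
	size_t orphans[TOY_MAX_INODES];  /* inode numbers recorded in the orphan area */
	size_t n_orphans;
};

/*
 * Assumed state of the dumped image: the inode was written to the orphan
 * area by the commit, and afterwards got nlink = 1 and a directory entry.
 */
static struct toy_fs toy_image_after_create(size_t inum)
{
	struct toy_fs fs = {0};

	assert(inum < TOY_MAX_INODES);
	fs.inodes[inum].present = true;
	fs.inodes[inum].nlink = 1;
	fs.dentry_target = inum;
	fs.orphans[fs.n_orphans++] = inum;
	return fs;
}

/*
 * Mount-time orphan processing.
 * check_nlink == false: delete every listed inode (old-kernel behaviour
 *                       as described in the thread).
 * check_nlink == true:  skip inodes that still have links (assumed effect
 *                       of the fix patches).
 */
static void toy_kill_orphans(struct toy_fs *fs, bool check_nlink)
{
	for (size_t i = 0; i < fs->n_orphans; i++) {
		struct toy_inode *ino = &fs->inodes[fs->orphans[i]];

		if (check_nlink && ino->nlink > 0)
			continue;
		ino->present = false;
	}
	fs->n_orphans = 0;
}

/* A lookup of 'file' succeeds only if the inode it names still exists. */
static bool toy_lookup_ok(const struct toy_fs *fs)
{
	return fs->inodes[fs->dentry_target].present;
}
```

Calling `toy_kill_orphans(&fs, false)` on an image built with `toy_image_after_create(65)` leaves the entry pointing at a missing inode, which corresponds to the 'dead directory entry' error in the log; with `true` the lookup keeps working.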
* Re: [BUG] UBIFS corruption on powerpc 32-bit targets
From: Tomas Alvarez Vanoli @ 2026-02-11 16:51 UTC
To: Zhihao Cheng, linux-mtd@lists.infradead.org

> What does 'application' mean?

I meant completely replacing the kernel and rootfs that are kept as a backup
image with ones based on a more recent kernel.

> Apply following patch in 6.12:
> [...]

Unfortunately, on my system this just gets stuck, because the only way
write_orph_node gets called is through do_commit calling
ubifs_orphan_end_commit, and that path is never executed (except when
mounting).

I tried mounting with and without sync, with the same results.

Best Regards,
Tomas
* Re: [BUG] UBIFS corruption on powerpc 32-bit targets
From: Zhihao Cheng @ 2026-02-12 1:19 UTC
To: Tomas Alvarez Vanoli, linux-mtd@lists.infradead.org

On 2026/2/12 0:51, Tomas Alvarez Vanoli wrote:
>> What does 'application' mean?
> I meant just completely changing the kernel and rootfs that is there as a
> backup image and using a more recent kernel.

Yes, that will prevent the problem too.

>> Apply following patch in 6.12:
>> [...]
>
> Unfortunately on my system this just gets stuck, because the only way
> write_orph_node gets called is through do_commit calling
> ubifs_orphan_end_commit, but that path is never executed (except when
> mounting).
>
> I tried mounting with and without sync, same results.

The 'sync' syscall will call ubifs_orphan_end_commit if the orphan list is
not empty:
ubifs_sync_fs -> ubifs_run_commit -> do_commit -> ubifs_orphan_end_commit

In ubifs_create, the new inode is added to the orphan list:
ubifs_create -> ubifs_new_inode -> ubifs_add_orphan

You can add some debug messages to check which condition is not met.

> Best Regards,
> Tomas
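The ordering that the debug patch relies on can be sketched as a single-threaded toy model. It assumes only what is stated in this thread: file creation adds the new inode to the orphan list, a sync-triggered commit writes the orphans if the list is not empty, and the patched `write_orph_node` sets `g_wait`, which releases the loop in `ubifs_create`. The `toy_*` names and the struct are hypothetical; the real code paths have more conditions than this model shows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model; the struct and function names are illustrative, not UBIFS symbols. */
struct toy_ubifs {
	size_t pending_orphans;  /* orphans added since the last commit */
	bool g_wait;             /* mirrors the flag introduced by the debug patch */
};

/* Models ubifs_create -> ubifs_new_inode -> ubifs_add_orphan. */
static void toy_create_begin(struct toy_ubifs *c)
{
	assert(c != NULL);
	c->pending_orphans++;
}

/*
 * Models sync -> ... -> do_commit -> ubifs_orphan_end_commit.
 * Returns how many orphans were written; nothing is written, and the
 * flag stays clear, when the orphan list is empty.
 */
static size_t toy_sync(struct toy_ubifs *c)
{
	size_t written;

	assert(c != NULL);
	written = c->pending_orphans;
	if (written > 0)
		c->g_wait = true;  /* the patched write_orph_node sets g_wait */
	c->pending_orphans = 0;
	return written;
}

/* The patched ubifs_create keeps sleeping until this returns true. */
static bool toy_create_may_continue(const struct toy_ubifs *c)
{
	return c->g_wait;
}
```

In this model a sync issued before the create writes nothing and leaves the create blocked, so a reproducer that appears stuck may simply not have reached the point where the orphan list is non-empty when the commit runs.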
* Re: [BUG] UBIFS corruption on powerpc 32-bit targets
From: Tomas Alvarez Vanoli @ 2026-02-12 15:43 UTC
To: Zhihao Cheng, linux-mtd@lists.infradead.org

> The 'sync' syscall will call ubifs_orphan_end_commit if the orphan list
> is not empty.
> ubifs_sync_fs -> ubifs_run_commit -> do_commit -> ubifs_orphan_end_commit
>
> In ubifs_create, the new inode will be added into orphan list.
> ubifs_create -> ubifs_new_inode -> ubifs_add_orphan
>
> You can add some debug messages to check which condition is broken.

Alright, it was some user error :) I am able to reproduce it consistently now.

We will have to analyze what to do about this in the end, because we have no
control over which versions are run in what order in the field. Backwards
compatibility and being able to switch versions are important. Since this
volume does not use encryption and 3af2d3a8c56fe7dc24f60c4df0ab85b7ac941902
seems to fix a bug related only to encryption, we might just revert the
commit for the affected boards. Hopefully this will not have other
unforeseen consequences.

I'll submit the fixes for mtd-utils tomorrow.

Thanks for all the help
Best Regards,
Tomas
* Re: [BUG] UBIFS corruption on powerpc 32-bit targets
From: Zhihao Cheng @ 2026-02-13 1:16 UTC
To: Tomas Alvarez Vanoli, linux-mtd@lists.infradead.org

On 2026/2/12 23:43, Tomas Alvarez Vanoli wrote:
>> The 'sync' syscall will call ubifs_orphan_end_commit if the orphan list
>> is not empty.
>> ubifs_sync_fs -> ubifs_run_commit -> do_commit -> ubifs_orphan_end_commit
>>
>> In ubifs_create, the new inode will be added into orphan list.
>> ubifs_create -> ubifs_new_inode -> ubifs_add_orphan
>>
>> You can add some debug messages to check which condition is broken.
>
> Alright, it was some user error :) I am able to reproduce it consistently now.
>
> We will have to analyze what to do in the end about this, because we have no
> control of what versions are being run in what order in the field. Backwards
> compatibility and being able to switch versions is important. Since this
> volume does not use encryption and 3af2d3a8c56fe7dc24f60c4df0ab85b7ac941902
> seems to be fixing a bug related only to encryption, we might just revert the
> commit for the affected boards. Hopefully this will not generate other
> unforeseen consequences.

If xattrs (selinux, encryption, user_xattr) are not enabled,
3af2d3a8c56fe7dc24f60c4df0ab85b7ac941902 can be reverted. But the problem may
still happen if an application links a tempfile.

> I'll submit the fixes for mtd-utils tomorrow.

Thanks. I'll be on a week-long vacation starting on February 15, so I might
not be able to respond in time.

> Thanks for all the help
> Best Regards,
> Tomas.
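For readers unfamiliar with the pattern, "links a tempfile" is assumed here to mean creating an unnamed file with `O_TMPFILE` and giving it a name later with `linkat`, as described in the Linux open(2) manual page. The sketch below shows that userspace pattern only; the function name `create_via_tmpfile` is hypothetical, and whether a given application triggers the problem this way has not been verified in this thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Create an unnamed file in 'dir', write to it, then give it the name
 * 'final_path'. Returns 0 on success or a negative errno value.
 * Needs a filesystem that supports O_TMPFILE and a mounted /proc.
 */
static int create_via_tmpfile(const char *dir, const char *final_path)
{
	char proc_path[64];
	ssize_t n;
	int fd, ret = 0;

	assert(dir != NULL && final_path != NULL);

	/* The inode exists but has no name yet, so its link count is 0. */
	fd = open(dir, O_TMPFILE | O_WRONLY, 0600);
	if (fd < 0)
		return -errno;

	n = write(fd, "data\n", 5);
	if (n != 5) {
		ret = (n < 0) ? -errno : -EIO;
		goto out;
	}

	/* Linking through /proc/self/fd works without extra privileges. */
	snprintf(proc_path, sizeof(proc_path), "/proc/self/fd/%d", fd);
	if (linkat(AT_FDCWD, proc_path, AT_FDCWD, final_path,
		   AT_SYMLINK_FOLLOW) < 0)
		ret = -errno;
out:
	close(fd);
	return ret;
}
```

Between the `open` and the `linkat` the file has no links, which is the window in which a filesystem would reasonably track it as an orphan; after `linkat` it is a regular named file.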
end of thread, other threads:[~2026-02-13 1:16 UTC | newest]

Thread overview: 19+ messages
2026-01-29 10:30 [BUG] UBIFS corruption on powerpc 32-bit targets Tomas Alvarez Vanoli
2026-01-30 1:34 ` Zhihao Cheng
2026-02-03 9:12 ` Tomas Alvarez Vanoli
2026-02-04 4:54 ` Zhihao Cheng
2026-02-04 14:04 ` Tomas Alvarez Vanoli
2026-02-05 2:14 ` Zhihao Cheng
2026-02-05 15:47 ` Tomas Alvarez Vanoli
2026-02-06 2:21 ` Zhihao Cheng
2026-02-06 16:14 ` Tomas Alvarez Vanoli
2026-02-07 2:58 ` Zhihao Cheng
2026-02-09 15:45 ` Tomas Alvarez Vanoli
2026-02-10 2:38 ` Zhihao Cheng
2026-02-10 16:40 ` Tomas Alvarez Vanoli
2026-02-11 6:58 ` Zhihao Cheng
2026-02-11 7:01 ` Zhihao Cheng
2026-02-11 16:51 ` Tomas Alvarez Vanoli
2026-02-12 1:19 ` Zhihao Cheng
2026-02-12 15:43 ` Tomas Alvarez Vanoli
2026-02-13 1:16 ` Zhihao Cheng