* metadata_csum + unclean shutdown = failure to boot
From: George Spelvin @ 2012-10-07 5:04 UTC
To: linux-ext4; +Cc: linux

Feeling a bit adventurous, I enabled metadata_csum on many of
my daily-use file systems.

I have now noticed a problem in the event of an unclean shutdown.

On reboot, the kernel complains about a bad superblock checksum, suggests
e2fsck, and then fails to mount the root filesystem.

This makes running e2fsck a bit problematic.

I can fix it manually, but it makes automatic reboots *extremely*
problematic.

Is it possible to fix the kernel code to be a bit more forgiving?
* Re: metadata_csum + unclean shutdown = failure to boot
From: Tao Ma @ 2012-10-07 13:39 UTC
To: George Spelvin; +Cc: linux-ext4

Hi George,

On 10/07/2012 01:04 PM, George Spelvin wrote:
> Feeling a bit adventurous, I enabled metadata_csum on many of
> my daily-use file systems.
>
> I have now noticed a problem in the event of an unclean shutdown.
>
> On reboot, the kernel complains about a bad superblock checksum, suggests
> e2fsck, and then fails to mount the root filesystem.
>
> This makes running e2fsck a bit problematic.
>
> I can fix it manually, but it makes automatic reboots *extremely*
> problematic.
>
> Is it possible to fix the kernel code to be a bit more forgiving?

Interesting. In general, metadata checksum should be updated with the
same content it checksums. So could you please answer my questions first?

1. your kernel version please?
2. what do you mean by an *unclean* shutdown?
3. what do you find in your /var/log/messages except the bad superblock
   checksum error?

Thanks
Tao
* Re: metadata_csum + unclean shutdown = failure to boot 2012-10-07 13:39 ` Tao Ma @ 2012-10-07 15:09 ` George Spelvin 2012-10-07 18:10 ` Theodore Ts'o 0 siblings, 1 reply; 22+ messages in thread From: George Spelvin @ 2012-10-07 15:09 UTC (permalink / raw) To: linux, tm; +Cc: linux-ext4 > Interesting. In general, metadata checksum should be updated with the > same content it checksums. So could you please answer my questions first? Good point; it *is* only a single (512 byte) sector. And it's a 512-byte sector drive (pair of drives in RAID-1, actually). However, it appears that while the FS is mounted, an invalid checksum *is* written. That's the bug: # dumpe2fs /dev/md2 dumpe2fs 1.43-WIP (22-Sep-2012) dumpe2fs: Superblock checksum does not match superblock while trying to open /dev/md2 Couldn't find valid filesystem superblock. # /tmp/old/sbin/dumpe2fs -f -h /dev/md2 dumpe2fs 1.42.5 (29-Jul-2012) ./dumpe2fs: Superblock checksum does not match superblock while trying to open /dev/md2 Couldn't find valid filesystem superblock. (Unfortunately, dumpe2fs doesn't have a -n flag like debugfs.) Here's the first 2K of the partition (superblock at 1K), in case it helps: # xxd -g4 -l2048 -a /dev/md2 0000000: 00000000 00000000 00000000 00000000 ................ * 00003f0: 00000000 00000000 00000000 1d32be49 .............2.I 0000400: b01d4300 301a3605 82b44200 28242504 ..C.0.6...B.($%. 0000410: faae3900 00000000 02000000 02000000 ..9............. 0000420: 00800000 00800000 70060000 c8007150 ........p.....qP 0000430: c8007150 0200ffff 53ef0100 01000000 ..qP....S....... 0000440: eaf37050 00000000 00000000 01000000 ..pP............ 0000450: 00000000 0b000000 00010000 3c000000 ............<... 0000460: 46020000 6b040000 a61d8e82 4c814f84 F...k.......L.O. 0000470: 9011cf24 8d295eeb 726f6f74 00000000 ...$.)^.root.... 0000480: 00000000 00000000 2f006e74 00000000 ......../.nt.... 0000490: 00000000 00000000 00000000 00000000 ................ 
*
00004c0: 00000000 00000000 00000000 0000eb03 ................
00004d0: 00000000 00000000 00000000 00000000 ................
00004e0: 08000000 00000000 a1863300 dc2dbaa1 ..........3..-..
00004f0: 7ada4a32 96a5dbe8 c42859c2 01010000 z.J2.....(Y.....
0000500: 0c000000 00000000 b2fbc24f 0af30200 ...........O....
0000510: 04000000 00000000 00000000 ff7f0000 ................
0000520: 00809802 ff7f0000 01000000 ffff9802 ................
0000530: 00000000 00000000 00000000 00000000 ................
0000540: 00000000 00000000 00000000 00000008 ................
0000550: 00000000 00000000 00000000 1c001c00 ................
0000560: 01000000 00000000 00000000 00000000 ................
0000570: 00000000 04010000 e4142809 00000000 ..........(.....
0000580: 00000000 00000000 00000000 00000000 ................
*
00007f0: 00000000 00000000 00000000 38a11164 ............8..d

> 1. your kernel version please?

3.6.0.

> 2. what do you mean by an *unclean* shutdown?

AC power failure. This is actually the second time I've seen the problem,
although the first was pilot error while trying to rearrange fan power
cables with the power on.

> 3. what do you find in your /var/log/messages except the bad superblock
> checksum error?

I don't understand the question. There's nothing there *including* no
bad superblock checksum error! The kernel panicked with "unable to mount
root file system", so it didn't even load init, much less get /var/log
writeable or start a syslog process.

I didn't transcribe the on-screen messages because I assumed the code was
working "as expected" on reboot: it only checks the primary superblock,
and if there's an error there, it bails.

On reboot, there's nothing interesting, since by the time we got there,
e2fsck had run and cleaned up the file system. It just says

Oct 7 00:11:05 $HOST kernel: EXT4-fs (md2): mounted filesystem with ordered data mode. Opts: (null)
Oct 7 00:11:05 $HOST kernel: VFS: Mounted root (ext4 filesystem) readonly on device 9:2.
...
followed by module loading.

The code causing the mount failure on boot is easy to find.
fs/ext4/super.c line 3311:

	/* Check superblock checksum */
	if (!ext4_superblock_csum_verify(sb, es)) {
		ext4_msg(sb, KERN_ERR, "VFS: Found ext4 filesystem with "
			 "invalid superblock checksum. Run e2fsck?");
		silent = 1;
		goto cantfind_ext4;
	}
[...]
cantfind_ext4:
	if (!silent)
		ext4_msg(sb, KERN_ERR, "VFS: Can't find ext4 filesystem");
	goto failed_mount;

The challenge is to find what's writing the bad superblock checksum.

Here's one clue (/boot also has metadata_csum enabled):

# dumpe2fs -h /dev/md0 | tee /tmp/1
dumpe2fs 1.43-WIP (22-Sep-2012)
Filesystem volume name: boot
Last mounted on: /boot
Filesystem UUID: 72aa9b1c-4180-444a-8e15-836ddad4f235
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype extent flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
Filesystem flags: signed_directory_hash
Default mount options: user_xattr acl
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 49152
Block count: 245600
Reserved block count: 12280
Free blocks: 72345
Free inodes: 26976
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 59
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 6144
Inode blocks per group: 384
Flex block group size: 16
Filesystem created: Mon May 28 04:06:58 2012
Last mount time: Sun Oct 7 14:51:04 2012
Last write time: Sun Oct 7 14:51:33 2012
Mount count: 6
Maximum mount count: -1
Last checked: Tue Oct 2 22:53:14 2012
Check interval: 0 (<none>)
Lifetime writes: 33 GB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 28
Desired extra isize: 28
Journal inode: 8
Default directory hash: half_md4
Directory Hash Seed: f5fe1926-d2da-4864-b41f-a93276ae313f
Journal backup: inode blocks
Checksum type: crc32c
Checksum: 0x0063be5b Journal features: journal_incompat_revoke Journal size: 16M Journal length: 4096 Journal sequence: 0x0000f765 Journal start: 0 # mount /boot # dumpe2fs -h /dev/md0 | diff -u /tmp/1 - dumpe2fs 1.43-WIP (22-Sep-2012) --- /tmp/1 2012-10-07 14:51:39.337345910 +0000 +++ - 2012-10-07 14:51:52.454825889 +0000 @@ -3,7 +3,7 @@ Filesystem UUID: 72aa9b1c-4180-444a-8e15-836ddad4f235 Filesystem magic number: 0xEF53 Filesystem revision #: 1 (dynamic) -Filesystem features: has_journal ext_attr resize_inode dir_index filetype extent flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum +Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum Filesystem flags: signed_directory_hash Default mount options: user_xattr acl Filesystem state: clean @@ -24,9 +24,9 @@ Inode blocks per group: 384 Flex block group size: 16 Filesystem created: Mon May 28 04:06:58 2012 -Last mount time: Sun Oct 7 14:51:04 2012 -Last write time: Sun Oct 7 14:51:33 2012 -Mount count: 6 +Last mount time: Sun Oct 7 14:51:42 2012 +Last write time: Sun Oct 7 14:51:42 2012 +Mount count: 7 Maximum mount count: -1 Last checked: Tue Oct 2 22:53:14 2012 Check interval: 0 (<none>) @@ -42,7 +42,7 @@ Directory Hash Seed: f5fe1926-d2da-4864-b41f-a93276ae313f Journal backup: inode blocks Checksum type: crc32c -Checksum: 0x0063be5b +Checksum: 0x90bee798 Journal features: journal_incompat_revoke Journal size: 16M Journal length: 4096 # ln /boot/sid.bmp /boot/foo # dumpe2fs -h /dev/md0 | diff -u /tmp/1 - dumpe2fs 1.43-WIP (22-Sep-2012) --- /tmp/1 2012-10-07 14:51:39.337345910 +0000 +++ - 2012-10-07 14:53:43.619910763 +0000 @@ -3,7 +3,7 @@ Filesystem UUID: 72aa9b1c-4180-444a-8e15-836ddad4f235 Filesystem magic number: 0xEF53 Filesystem revision #: 1 (dynamic) -Filesystem features: has_journal ext_attr resize_inode dir_index filetype extent flex_bg sparse_super large_file 
huge_file dir_nlink extra_isize metadata_csum +Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum Filesystem flags: signed_directory_hash Default mount options: user_xattr acl Filesystem state: clean @@ -24,9 +24,9 @@ Inode blocks per group: 384 Flex block group size: 16 Filesystem created: Mon May 28 04:06:58 2012 -Last mount time: Sun Oct 7 14:51:04 2012 -Last write time: Sun Oct 7 14:51:33 2012 -Mount count: 6 +Last mount time: Sun Oct 7 14:51:42 2012 +Last write time: Sun Oct 7 14:51:42 2012 +Mount count: 7 Maximum mount count: -1 Last checked: Tue Oct 2 22:53:14 2012 Check interval: 0 (<none>) @@ -42,10 +42,10 @@ Directory Hash Seed: f5fe1926-d2da-4864-b41f-a93276ae313f Journal backup: inode blocks Checksum type: crc32c -Checksum: 0x0063be5b +Checksum: 0x90bee798 Journal features: journal_incompat_revoke Journal size: 16M Journal length: 4096 -Journal sequence: 0x0000f765 -Journal start: 0 +Journal sequence: 0x0000f766 +Journal start: 1 # touch /boot/bar (Lots more activity, and I can't make the checksum fail. But...) # umount /boot # mount /boot # touch /boot/baz Arrgh! I can't reproduce it! Earlier, I did "mount /boot", "dumpe2fs -h" (successfully), "touch /boot/foo", and dumpe2fs died with a checksum error. So I thought "aha! Is it the data write or the inode allocation? I'll make a hard link to avoid the inode allocation", but it appears it wasn't (just) either. But my root file system (/dev/md2) still has a messed up checksum... # dumpe2fs -h /dev/md2 dumpe2fs 1.43-WIP (22-Sep-2012) dumpe2fs: Superblock checksum does not match superblock while trying to open /dev/md2 Couldn't find valid filesystem superblock. ^ permalink raw reply [flat|nested] 22+ messages in thread
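For anyone wanting to check a raw superblock by hand: as far as I understand the kernel and e2fsprogs sources, the superblock checksum is a CRC-32C seeded with ~0 over the first 0x3FC bytes of the 1024-byte superblock, stored little-endian in the last four bytes and without the usual final inversion. The Python sketch below rests on that assumption. `crc32c_raw`, `superblock_checksums` and `read_superblock` are illustrative names, not e2fsprogs or kernel APIs.

```python
import struct

SB_OFFSET = 1024     # the primary superblock starts 1 KiB into the device
SB_SIZE = 1024
CSUM_OFFSET = 0x3FC  # assumed position of s_checksum (last 4 bytes)


def crc32c_raw(seed, data):
    """Bitwise CRC-32C (reflected polynomial 0x82F63B78), no final XOR."""
    crc = seed & 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc


def superblock_checksums(sb):
    """Return (stored, computed) checksums for a 1024-byte superblock."""
    if len(sb) != SB_SIZE:
        raise ValueError("expected exactly 1024 bytes")
    (stored,) = struct.unpack_from("<I", sb, CSUM_OFFSET)
    computed = crc32c_raw(0xFFFFFFFF, sb[:CSUM_OFFSET])
    return stored, computed


def read_superblock(path):
    """Read the primary superblock from a device or image file (read-only)."""
    with open(path, "rb") as f:
        f.seek(SB_OFFSET)
        return f.read(SB_SIZE)
```

For example, `stored, computed = superblock_checksums(read_superblock("/dev/md2"))` (run as root) would show whether the two values agree, without modifying the device.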
* Re: metadata_csum + unclean shutdown = failure to boot
From: Theodore Ts'o @ 2012-10-07 18:10 UTC
To: George Spelvin; +Cc: tm, linux-ext4

I just had a random thought. Which bootloader are you using? Is it
grub, or grub2 perchance?

I wonder if it's grub modifying the file system and touching the
superblock, and not knowing about the new metadata checksum
feature....

- Ted
* Re: metadata_csum + unclean shutdown = failure to boot
From: George Spelvin @ 2012-10-07 20:18 UTC
To: linux, tytso; +Cc: linux-ext4, tm

> I just had a random thought. Which bootloader are you using? Is it
> grub, or grub2 perchance?

Nope,

# lilo -V
LILO version 23.2 (released 09-Apr-2011) (Debian GNU/Linux)

It's a 32-bit Debian/unstable (sid) userland, on a 64-bit kernel, running
on a 2nd gen i7-2xxx and a Gigabyte Z68A-D3H-B3 motherboard. The drives
are using the motherboard AHCI controller.

Thanks for the idea, though.
* Re: metadata_csum + unclean shutdown = failure to boot
From: Theodore Ts'o @ 2012-10-07 22:54 UTC
To: George Spelvin; +Cc: linux-ext4, tm

If you can replicate this, could you try applying the following patch
to e2fsck, and install it and then capture the output from e2fsck when
it repairs the file system? That might give us some clues as to what
is going on.

I've been going through the sources and I don't see any place where we
mark the superblock as dirty and write it out without writing the
checksum first.

There is a chance we could get screwed by a race in no journal mode
where two processes modify the superblock at the same time, but we don't
actually modify the superblock that much. The primary case where the
superblock gets modified while the file system is mounted is when we
add and remove inodes from the orphan list, and that is serialized by a
mutex. The other times we modify the superblock are when we add a
feature in a few rare cases (the large file feature, or the xattr
compat feature, etc.) and of course during an online resizing. But
that's not likely to be happening in your case. So I really don't
understand what might be happening on your system, which is why this
patch will hopefully shed some light as to what is going on.
- Ted

diff --git a/e2fsck/unix.c b/e2fsck/unix.c
index d2b1bbd..b1fe32c 100644
--- a/e2fsck/unix.c
+++ b/e2fsck/unix.c
@@ -1064,6 +1064,13 @@ static errcode_t try_open_fs(e2fsck_t ctx, int flags, io_manager io_ptr,
 	retval = ext2fs_open2(ctx->filesystem_name, ctx->io_options,
 			      flags, 0, 0, io_ptr, ret_fs);
 
+	if (*ret_fs && (*ret_fs)->super && retval == EXT2_ET_SB_CSUM_INVALID) {
+		list_super((*ret_fs)->super);
+		ext2fs_superblock_csum_set(*ret_fs, (*ret_fs)->super);
+		printf("Expected checksum was %04x\n",
+		       (*ret_fs)->super->s_checksum);
+	}
+
 	if (ret_fs)
 		e2fsck_set_bitmap_type(*ret_fs, EXT2FS_BMAP64_RBTREE,
 				       "default", NULL);
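To illustrate the failure mode being discussed (a superblock field changing without the checksum being refreshed), here is a small Python simulation. It is only a model: it assumes the checksum is a CRC-32C seeded with ~0 over the first 0x3FC bytes, stored without a final inversion, and that `s_last_orphan` lives at offset 0xE8. The helper names are invented for this sketch and do not correspond to kernel functions.

```python
import struct

CSUM_OFFSET = 0x3FC        # assumed offset of s_checksum
LAST_ORPHAN_OFFSET = 0xE8  # assumed offset of s_last_orphan


def crc32c_raw(seed, data):
    """Bitwise CRC-32C (reflected polynomial 0x82F63B78), no final XOR."""
    crc = seed & 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc


def set_checksum(sb):
    """Recompute the checksum over the in-memory superblock and store it."""
    csum = crc32c_raw(0xFFFFFFFF, bytes(sb[:CSUM_OFFSET]))
    struct.pack_into("<I", sb, CSUM_OFFSET, csum)


def checksum_ok(sb):
    """True if the stored checksum matches the current contents."""
    (stored,) = struct.unpack_from("<I", sb, CSUM_OFFSET)
    return stored == crc32c_raw(0xFFFFFFFF, bytes(sb[:CSUM_OFFSET]))


def set_last_orphan(sb, ino, update_checksum):
    """Model an orphan-list head update, with or without a checksum refresh."""
    struct.pack_into("<I", sb, LAST_ORPHAN_OFFSET, ino)
    if update_checksum:
        set_checksum(sb)
```

Calling `set_last_orphan(sb, 173, update_checksum=False)` on a freshly checksummed `bytearray(1024)` makes `checksum_ok(sb)` return False until `set_checksum(sb)` is run again. Any write of the buffer to disk inside that window would leave a superblock that fails verification.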
* Re: metadata_csum + unclean shutdown = failure to boot
From: George Spelvin @ 2012-10-08 1:05 UTC
To: linux, tytso; +Cc: linux-ext4, tm

> If you can replicate this, could you try applying the following patch
> to e2fsck, and install it and then capture the output from e2fsck when
> it repairs the file system?

Well, as I mentioned, the superblock of the currently running root
filesystem has a bad checksum right now, so if you don't mind me NOT
repairing the FS, it's particularly easy. (That's why I included a
hex-dump of the superblock earlier.)

Let me try fsck -n on the running file system...

# ./e2fsck -n /dev/md2
e2fsck 1.43-WIP (22-Sep-2012)
Warning! /dev/md2 is mounted.
Filesystem volume name: root
Last mounted on: /
Filesystem UUID: a61d8e82-4c81-4f84-9011-cf248d295eeb
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
Filesystem flags: signed_directory_hash
Default mount options: user_xattr acl
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 4398512
Block count: 87431728
Reserved block count: 4371586
Free blocks: 69542952
Free inodes: 3780346
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 1003
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 1648
Inode blocks per group: 103
Flex block group size: 16
Filesystem created: Mon May 28 04:14:42 2012
Last mount time: Sun Oct 7 04:10:48 2012
Last write time: Sun Oct 7 04:10:48 2012
Mount count: 2
Maximum mount count: -1
Last checked: Sun Oct 7 03:15:54 2012
Check interval: 0 (<none>)
Lifetime writes: 147 GB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 28
Desired extra isize: 28
Journal inode: 8
First orphan inode: 3376801
Default directory hash: half_md4
Directory Hash Seed: dc2dbaa1-7ada-4a32-96a5-dbe8c42859c2
Journal backup: inode blocks
Checksum type: crc32c
Checksum: 0x6411a138
Expected checksum was 242b557a
ext2fs_open2: Superblock checksum does not match superblock
/tmp/e2fsck: Superblock invalid, trying backup blocks...
Superblock needs_recovery flag is clear, but journal has data.
Recovery flag not set in backup superblock, so running journal anyway.

Clear journal? no

root was not cleanly unmounted, check forced.
Pass 1: Checking inodes, blocks, and sizes
Inodes that were part of a corrupted orphan linked list found. Fix? no

Inode 2214932 was part of the orphaned inode list. IGNORED.
Deleted inode 2640258 has zero dtime. Fix? no

Inode 3376801 was part of the orphaned inode list. IGNORED.
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences: -(5799936--5800165) -8017765 -8017789 -8027658 -8027660 -8958208 -(19016096--19016124) -38855165 -38873550 -52463109 -(58774956--58774992) -67160656 -67160667 -67160687 -67160703 -67160718 -67160729 -67160905 -69785176
Fix? no

[etc.]

Would hard-crashing the machine and running e2fsck on a static file
system tell you more?

> There is a chance we could get screwed by a race in no journal mode
> where two processes modify the superblock at the same time, but we don't
> actually modify the superblock that much. The primary case where the
> superblock gets modified while the file system is mounted is when we
> add and remove inodes from the orphan list, and that is serialized by a
> mutex. The other times we modify the superblock are when we add a
> feature in a few rare cases (the large file feature, or the xattr
> compat feature, etc.) and of course during an online resizing.
But > that's not likely to be happening in your case. So I really don't > understand what might be happening on your system, which is why this > patch will hopefully shed some light as to what is going on. Thinking about it, it *is* confusing. Although with help from your clue about the orphan inode list, I just managed the following. It appears to be repeatable. Is this of any help? # mount /boot # dumpe2fs -h /dev/md0 dumpe2fs 1.43-WIP (22-Sep-2012) Filesystem volume name: boot Last mounted on: /boot Filesystem UUID: 72aa9b1c-4180-444a-8e15-836ddad4f235 Filesystem magic number: 0xEF53 Filesystem revision #: 1 (dynamic) Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum Filesystem flags: signed_directory_hash Default mount options: user_xattr acl Filesystem state: clean Errors behavior: Continue Filesystem OS type: Linux Inode count: 49152 Block count: 245600 Reserved block count: 12280 Free blocks: 72229 Free inodes: 26977 First block: 0 Block size: 4096 Fragment size: 4096 Reserved GDT blocks: 59 Blocks per group: 32768 Fragments per group: 32768 Inodes per group: 6144 Inode blocks per group: 384 Flex block group size: 16 Filesystem created: Mon May 28 04:06:58 2012 Last mount time: Mon Oct 8 00:57:42 2012 Last write time: Mon Oct 8 00:57:42 2012 Mount count: 13 Maximum mount count: -1 Last checked: Tue Oct 2 22:53:14 2012 Check interval: 0 (<none>) Lifetime writes: 34 GB Reserved blocks uid: 0 (user root) Reserved blocks gid: 0 (group root) First inode: 11 Inode size: 256 Required extra isize: 28 Desired extra isize: 28 Journal inode: 8 Default directory hash: half_md4 Directory Hash Seed: f5fe1926-d2da-4864-b41f-a93276ae313f Journal backup: inode blocks Checksum type: crc32c Checksum: 0xec7bcce8 Journal features: journal_incompat_revoke Journal size: 16M Journal length: 4096 Journal sequence: 0x0000f78c Journal start: 0 # sleep 5 > 
/boot/foo & rm /boot/foo [2] 6554 # dumpe2fs -h /dev/md0 dumpe2fs 1.43-WIP (22-Sep-2012) dumpe2fs: Superblock checksum does not match superblock while trying to open /dev/md0 Couldn't find valid filesystem superblock. # /tmp/e2fsck -n /dev/md0 e2fsck 1.43-WIP (22-Sep-2012) Warning! /dev/md0 is mounted. Warning: skipping journal recovery because doing a read-only filesystem check. boot: clean, 22175/49152 files, 173371/245600 blocks [2]- Done sleep 5 > /boot/foo # dumpe2fs -h /dev/md0 dumpe2fs 1.43-WIP (22-Sep-2012) Filesystem volume name: boot Last mounted on: /boot Filesystem UUID: 72aa9b1c-4180-444a-8e15-836ddad4f235 Filesystem magic number: 0xEF53 Filesystem revision #: 1 (dynamic) Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum Filesystem flags: signed_directory_hash Default mount options: user_xattr acl Filesystem state: clean Errors behavior: Continue Filesystem OS type: Linux Inode count: 49152 Block count: 245600 Reserved block count: 12280 Free blocks: 72229 Free inodes: 26977 First block: 0 Block size: 4096 Fragment size: 4096 Reserved GDT blocks: 59 Blocks per group: 32768 Fragments per group: 32768 Inodes per group: 6144 Inode blocks per group: 384 Flex block group size: 16 Filesystem created: Mon May 28 04:06:58 2012 Last mount time: Mon Oct 8 00:57:42 2012 Last write time: Mon Oct 8 00:57:42 2012 Mount count: 13 Maximum mount count: -1 Last checked: Tue Oct 2 22:53:14 2012 Check interval: 0 (<none>) Lifetime writes: 34 GB Reserved blocks uid: 0 (user root) Reserved blocks gid: 0 (group root) First inode: 11 Inode size: 256 Required extra isize: 28 Desired extra isize: 28 Journal inode: 8 Default directory hash: half_md4 Directory Hash Seed: f5fe1926-d2da-4864-b41f-a93276ae313f Journal backup: inode blocks Checksum type: crc32c Checksum: 0xec7bcce8 Journal features: journal_incompat_revoke Journal size: 16M Journal length: 4096 
Journal sequence: 0x0000f78d Journal start: 1 # sleep 5 > /boot/foo & rm /boot/foo ; dumpe2fs -h /dev/md0 ; dd if=/dev/md0 of=/tmp/md0 count=8 [2] 6137 dumpe2fs 1.43-WIP (22-Sep-2012) dumpe2fs: Superblock checksum does not match superblock while trying to open /dev/md0 Couldn't find valid filesystem superblock. 8+0 records in 8+0 records out 4096 bytes (4.1 kB) copied, 3.8679e-05 s, 106 MB/s [666]# dumpe2fs -h /dev/md0 dumpe2fs 1.43-WIP (22-Sep-2012) Filesystem volume name: boot Last mounted on: /boot Filesystem UUID: 72aa9b1c-4180-444a-8e15-836ddad4f235 Filesystem magic number: 0xEF53 Filesystem revision #: 1 (dynamic) Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum Filesystem flags: signed_directory_hash Default mount options: user_xattr acl Filesystem state: clean Errors behavior: Continue Filesystem OS type: Linux Inode count: 49152 Block count: 245600 Reserved block count: 12280 Free blocks: 72229 Free inodes: 26977 First block: 0 Block size: 4096 Fragment size: 4096 Reserved GDT blocks: 59 Blocks per group: 32768 Fragments per group: 32768 Inodes per group: 6144 Inode blocks per group: 384 Flex block group size: 16 Filesystem created: Mon May 28 04:06:58 2012 Last mount time: Mon Oct 8 00:57:42 2012 Last write time: Mon Oct 8 00:57:42 2012 Mount count: 13 Maximum mount count: -1 Last checked: Tue Oct 2 22:53:14 2012 Check interval: 0 (<none>) Lifetime writes: 34 GB Reserved blocks uid: 0 (user root) Reserved blocks gid: 0 (group root) First inode: 11 Inode size: 256 Required extra isize: 28 Desired extra isize: 28 Journal inode: 8 Default directory hash: half_md4 Directory Hash Seed: f5fe1926-d2da-4864-b41f-a93276ae313f Journal backup: inode blocks Checksum type: crc32c Checksum: 0xec7bcce8 Journal features: journal_incompat_revoke Journal size: 16M Journal length: 4096 Journal sequence: 0x0000f78d Journal start: 1 [2]- Done sleep 5 > 
/boot/foo # xxd -g4 -a /tmp/md0 0000000: faeb2101 b4014c49 4c4f1702 87f77050 ..!...LILO....pP 0000010: 00000000 02fcc24f 00000000 c2008070 .......O.......p 0000020: e6517a2e b8c0078e d0bc0008 fb525306 .Qz..........RS. 0000030: 56fc8ed8 31ed60b8 0012b336 cd1061b0 V...1.`....6..a. 0000040: 0de86601 b00ae861 01b04ce8 5c01601e ..f....a..L.\.`. 0000050: 0780fafe 750288f2 bb00028a 761e89d0 ....u.......v... 0000060: 80e48030 e0780a3c 107306f6 461c4075 ...0.x.<.s..F.@u 0000070: 2e88f266 8b761866 09f67423 52b408b2 ...f.v.f..t#R... 0000080: 8053cd13 5b72570f b6caba7f 00426631 .S..[rW......Bf1 0000090: c040e860 00663bb7 b8017403 e2ef5a53 .@.`.f;...t...ZS 00000a0: 8a761fbe 2000e8df 00b49966 817ffc4c .v.. ......f...L 00000b0: 494c4f75 295e6880 080731db e8c90075 ILOu)^h...1....u 00000c0: fbbe0600 89f7b90a 00b49af3 a6750fb0 .............u.. 00000d0: 02ae750a 0655b049 e8cf00cb b440b020 ..u..U.I.....@. 00000e0: e8c700e8 b400fe4e 007407bc e80761e9 .......N.t....a. 00000f0: 5cfff4eb fd605555 66500653 6a016a10 \....`UUfP.Sj.j. 0000100: 89e653f6 c6607470 f6c62074 14bbaa55 ..S..`tp.. t...U 0000110: b441cd13 720b81fb 55aa7505 f6c10175 .A..r...U.u....u 0000120: 415206b4 08cd1307 72b451c0 e90686e9 AR......r.Q..... 0000130: 89cf59c1 ea089240 4983e13f 41f7e193 ..Y....@I..?A... 0000140: 8b44088b 540a39da 7392f7f3 39f8778c .D..T.9.s...9.w. 0000150: c0e40686 e092f6f1 08e289d1 415a88c6 ............AZ.. 0000160: eb1cb442 5bbd0500 60cd1373 164d74b8 ...B[...`..s.Mt. 0000170: 31c0cd13 614debf0 66505958 88e6b801 1...aM..fPYX.... 0000180: 02ebe18d 641061c3 66ad6609 c0740a66 ....d.a.f.f..t.f 0000190: 034610e8 5fff80c7 02c3c1c0 04e80300 .F.._........... 00001a0: c1c00424 0f2704f0 144060bb 0700b40e ...$.'...@`..... 00001b0: cd1061c3 00000000 00000000 00000000 ..a............. 00001c0: 00000000 00000000 00000000 00000000 ................ * 00001f0: 00000000 00000000 00000000 000055aa ..............U. 0000200: 00000000 00000000 00000000 00000000 ................ 
* 00003f0: 00000000 00000000 00000000 03b7302c ..............0, 0000400: 00c00000 60bf0300 f82f0000 251a0100 ....`..../..%... 0000410: 61690000 00000000 02000000 02000000 ai.............. 0000420: 00800000 00800000 00180000 06257250 .............%rP 0000430: 06257250 0d00ffff 53ef0100 01000000 .%rP....S....... 0000440: 5a706b50 00000000 00000000 01000000 ZpkP............ 0000450: 00000000 0b000000 00010000 3c000000 ............<... 0000460: 46020000 6b040000 72aa9b1c 4180444a F...k...r...A.DJ 0000470: 8e15836d dad4f235 626f6f74 00000000 ...m...5boot.... 0000480: 00000000 00000000 2f626f6f 74000000 ......../boot... 0000490: 00000000 00000000 00000000 00000000 ................ * 00004c0: 00000000 00000000 00000000 00003b00 ..............;. 00004d0: 00000000 00000000 00000000 00000000 ................ 00004e0: 08000000 00000000 ad000000 f5fe1926 ...............& 00004f0: d2da4864 b41fa932 76ae313f 01010000 ..Hd...2v.1?.... 0000500: 0c000000 00000000 e2f9c24f 0af30100 ...........O.... 0000510: 04000000 00000000 00000000 00100000 ................ 0000520: 00000100 00000000 00000000 00000000 ................ 0000530: 00000000 00000000 00000000 00000000 ................ 0000540: 00000000 00000000 00000000 00000001 ................ 0000550: 00000000 00000000 00000000 1c001c00 ................ 0000560: 01000000 00000000 00000000 00000000 ................ 0000570: 00000000 04010000 bd501802 00000000 .........P...... 0000580: 00000000 00000000 00000000 00000000 ................ * 00007f0: 00000000 00000000 00000000 e8cc7bec ..............{. 0000800: 00000000 00000000 00000000 00000000 ................ * 0000ff0: 00000000 00000000 00000000 00000000 ................ ^ permalink raw reply [flat|nested] 22+ messages in thread
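The interesting fields in a dump like the one above can be picked out mechanically. The following Python sketch rebuilds bytes from `xxd -g4 -a` output and decodes a few superblock fields. The offsets come from my reading of the ext4 on-disk layout (`struct ext4_super_block`) and should be treated as assumptions; `parse_xxd`, `decode_superblock` and `SAMPLE` are names made up for this example. `SAMPLE` holds a handful of rows copied from the /dev/md0 dump above, with the ASCII column and the all-zero rows left out.

```python
import struct

# name -> (offset within the 1024-byte superblock, struct format); assumed layout
FIELDS = {
    "s_inodes_count": (0x00, "<I"),
    "s_blocks_count_lo": (0x04, "<I"),
    "s_free_blocks_count_lo": (0x0C, "<I"),
    "s_free_inodes_count": (0x10, "<I"),
    "s_mnt_count": (0x34, "<H"),
    "s_magic": (0x38, "<H"),
    "s_last_orphan": (0xE8, "<I"),
    "s_checksum": (0x3FC, "<I"),
}

SAMPLE = """\
0000400: 00c00000 60bf0300 f82f0000 251a0100
0000410: 61690000 00000000 02000000 02000000
0000420: 00800000 00800000 00180000 06257250
0000430: 06257250 0d00ffff 53ef0100 01000000
00004e0: 08000000 00000000 ad000000 f5fe1926
00007f0: 00000000 00000000 00000000 e8cc7bec
"""


def parse_xxd(text):
    """Rebuild bytes from `xxd -g4 -a` rows (assumes full 16-byte rows).

    A '*' line means the previous row repeats up to the next offset;
    any other gap (rows simply left out) is filled with zeros.
    """
    out = bytearray()
    last_row = None
    star = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line == "*":
            star = True
            continue
        offset_str, sep, rest = line.partition(":")
        if not sep:
            continue  # not an "offset: hex ..." row
        offset = int(offset_str, 16)
        row = bytes.fromhex("".join(rest.split()[:4]))
        gap = offset - len(out)
        if gap > 0:
            filler = last_row if (star and last_row) else b"\x00" * 16
            out += (filler * (gap // len(filler) + 1))[:gap]
        out += row
        last_row, star = row, False
    return bytes(out)


def decode_superblock(image, sb_offset=0x400):
    """Decode the FIELDS above from an image whose superblock is at sb_offset."""
    sb = image[sb_offset:sb_offset + 1024]
    return {name: struct.unpack_from(fmt, sb, off)[0]
            for name, (off, fmt) in FIELDS.items()}
```

On those rows, `decode_superblock(parse_xxd(SAMPLE))` gives `s_last_orphan` = 173 (0xad) and `s_checksum` = 0xec7bcce8, the same checksum dumpe2fs reported before the `rm`. That would be consistent with the orphan-list update reaching disk without a matching checksum update, though the dump alone does not prove it.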
* Re: metadata_csum + unclean shutdown = failure to boot 2012-10-07 22:54 ` Theodore Ts'o 2012-10-08 1:05 ` George Spelvin @ 2012-10-08 1:25 ` George Spelvin 2012-10-08 2:41 ` Theodore Ts'o 1 sibling, 1 reply; 22+ messages in thread From: George Spelvin @ 2012-10-08 1:25 UTC (permalink / raw) To: linux, tytso; +Cc: linux-ext4, tm More reproduction (and hopefully useful ideas at the end) # sleep 10 > /boot/foo & rm /boot/foo ; dumpe2fs -h /dev/md0 ; dd if=/dev/md0 of=/tmp/md0a count=4 ; /tmp/e2fsck -n /dev/md0 [2] 21690 dumpe2fs 1.43-WIP (22-Sep-2012) dumpe2fs: Superblock checksum does not match superblock while trying to open /dev/md0 Couldn't find valid filesystem superblock. 4+0 records in 4+0 records out 2048 bytes (2.0 kB) copied, 3.0265e-05 s, 67.7 MB/s e2fsck 1.43-WIP (22-Sep-2012) Warning! /dev/md0 is mounted. Filesystem volume name: boot Last mounted on: /boot Filesystem UUID: 72aa9b1c-4180-444a-8e15-836ddad4f235 Filesystem magic number: 0xEF53 Filesystem revision #: 1 (dynamic) Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum Filesystem flags: signed_directory_hash Default mount options: user_xattr acl Filesystem state: clean Errors behavior: Continue Filesystem OS type: Linux Inode count: 49152 Block count: 245600 Reserved block count: 12280 Free blocks: 72229 Free inodes: 26977 First block: 0 Block size: 4096 Fragment size: 4096 Reserved GDT blocks: 59 Blocks per group: 32768 Fragments per group: 32768 Inodes per group: 6144 Inode blocks per group: 384 Flex block group size: 16 Filesystem created: Mon May 28 04:06:58 2012 Last mount time: Mon Oct 8 00:57:42 2012 Last write time: Mon Oct 8 00:57:42 2012 Mount count: 13 Maximum mount count: -1 Last checked: Tue Oct 2 22:53:14 2012 Check interval: 0 (<none>) Lifetime writes: 34 GB Reserved blocks uid: 0 (user root) Reserved blocks gid: 0 (group root) First inode: 11 Inode size: 256 
Required extra isize: 28 Desired extra isize: 28 Journal inode: 8 First orphan inode: 173 Default directory hash: half_md4 Directory Hash Seed: f5fe1926-d2da-4864-b41f-a93276ae313f Journal backup: inode blocks Checksum type: crc32c Checksum: 0xec7bcce8 Expected checksum was dfd1473e ext2fs_open2: Superblock checksum does not match superblock /tmp/e2fsck: Superblock invalid, trying backup blocks... Superblock needs_recovery flag is clear, but journal has data. Recovery flag not set in backup superblock, so running journal anyway. Clear journal? no boot was not cleanly unmounted, check forced. Pass 1: Checking inodes, blocks, and sizes Deleted inode 173 has zero dtime. Fix? no Pass 2: Checking directory structure Pass 3: Checking directory connectivity Pass 4: Checking reference counts Pass 5: Checking group summary information Free blocks count wrong for group #0 (16896, counted=5353). Fix? no Free blocks count wrong for group #1 (7259, counted=2217). Fix? no Free blocks count wrong for group #2 (12585, counted=3984). Fix? no Free blocks count wrong for group #3 (19829, counted=17775). Fix? no Free blocks count wrong for group #4 (17162, counted=15224). Fix? no Free blocks count wrong for group #5 (18729, counted=10523). Fix? no Free blocks count wrong for group #6 (13443, counted=11644). Fix? no Free blocks count wrong for group #7 (8424, counted=5509). Fix? no Free blocks count wrong (114415, counted=72229). Fix? no Inode bitmap differences: -173 Fix? no Free inodes count wrong for group #0 (651, counted=439). Fix? no Free inodes count wrong for group #1 (128, counted=286). Fix? no Free inodes count wrong for group #2 (1137, counted=1158). Fix? no Free inodes count wrong for group #3 (792, counted=823). Fix? no Free inodes count wrong (26978, counted=26976). Fix? no Inode bitmap differences: Group 0 inode bitmap does not match checksum IGNORED. Group 1 inode bitmap does not match checksum IGNORED. Group 2 inode bitmap does not match checksum IGNORED. 
Group 3 inode bitmap does not match checksum IGNORED. Group 5 inode bitmap does not match checksum IGNORED. Group 6 inode bitmap does not match checksum IGNORED. Group 7 inode bitmap does not match checksum IGNORED. Block bitmap differences: Group 0 block bitmap does not match checksum IGNORED. Group 1 block bitmap does not match checksum IGNORED. Group 2 block bitmap does not match checksum IGNORED. Group 3 block bitmap does not match checksum IGNORED. Group 4 block bitmap does not match checksum IGNORED. Group 5 block bitmap does not match checksum IGNORED. Group 6 block bitmap does not match checksum IGNORED. Group 7 block bitmap does not match checksum IGNORED. boot: ********** WARNING: Filesystem still has errors ********** boot: 22174/49152 files (3.6% non-contiguous), 131185/245600 blocks # dumpe2fs -h /dev/md0 dumpe2fs 1.43-WIP (22-Sep-2012) Filesystem volume name: boot Last mounted on: /boot Filesystem UUID: 72aa9b1c-4180-444a-8e15-836ddad4f235 Filesystem magic number: 0xEF53 Filesystem revision #: 1 (dynamic) Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum Filesystem flags: signed_directory_hash Default mount options: user_xattr acl Filesystem state: clean Errors behavior: Continue Filesystem OS type: Linux Inode count: 49152 Block count: 245600 Reserved block count: 12280 Free blocks: 72229 Free inodes: 26977 First block: 0 Block size: 4096 Fragment size: 4096 Reserved GDT blocks: 59 Blocks per group: 32768 Fragments per group: 32768 Inodes per group: 6144 Inode blocks per group: 384 Flex block group size: 16 Filesystem created: Mon May 28 04:06:58 2012 Last mount time: Mon Oct 8 00:57:42 2012 Last write time: Mon Oct 8 00:57:42 2012 Mount count: 13 Maximum mount count: -1 Last checked: Tue Oct 2 22:53:14 2012 Check interval: 0 (<none>) Lifetime writes: 34 GB Reserved blocks uid: 0 (user root) Reserved blocks gid: 0 (group root) 
First inode: 11 Inode size: 256 Required extra isize: 28 Desired extra isize: 28 Journal inode: 8 Default directory hash: half_md4 Directory Hash Seed: f5fe1926-d2da-4864-b41f-a93276ae313f Journal backup: inode blocks Checksum type: crc32c Checksum: 0xec7bcce8 Journal features: journal_incompat_revoke Journal size: 16M Journal length: 4096 Journal sequence: 0x0000f78d Journal start: 1 [2]- Done sleep 10 > /boot/foo # xxd -g4 -a /dev/md0a [... first 512b snipped ...] 0000200: 00000000 00000000 00000000 00000000 ................ * 00003f0: 00000000 00000000 00000000 03b7302c ..............0, 0000400: 00c00000 60bf0300 f82f0000 251a0100 ....`..../..%... 0000410: 61690000 00000000 02000000 02000000 ai.............. 0000420: 00800000 00800000 00180000 06257250 .............%rP 0000430: 06257250 0d00ffff 53ef0100 01000000 .%rP....S....... 0000440: 5a706b50 00000000 00000000 01000000 ZpkP............ 0000450: 00000000 0b000000 00010000 3c000000 ............<... 0000460: 46020000 6b040000 72aa9b1c 4180444a F...k...r...A.DJ 0000470: 8e15836d dad4f235 626f6f74 00000000 ...m...5boot.... 0000480: 00000000 00000000 2f626f6f 74000000 ......../boot... 0000490: 00000000 00000000 00000000 00000000 ................ * 00004c0: 00000000 00000000 00000000 00003b00 ..............;. 00004d0: 00000000 00000000 00000000 00000000 ................ 00004e0: 08000000 00000000 ad000000 f5fe1926 ...............& 00004f0: d2da4864 b41fa932 76ae313f 01010000 ..Hd...2v.1?.... 0000500: 0c000000 00000000 e2f9c24f 0af30100 ...........O.... 0000510: 04000000 00000000 00000000 00100000 ................ 0000520: 00000100 00000000 00000000 00000000 ................ 0000530: 00000000 00000000 00000000 00000000 ................ 0000540: 00000000 00000000 00000000 00000001 ................ 0000550: 00000000 00000000 00000000 1c001c00 ................ 0000560: 01000000 00000000 00000000 00000000 ................ 0000570: 00000000 04010000 bd501802 00000000 .........P...... 
0000580: 00000000 00000000 00000000 00000000 ................ * 00007f0: 00000000 00000000 00000000 e8cc7bec ..............{. # That's a dumpe2fs, a dumpe2fs, and a (patched) e2fsck on the corruption. For reference, here's the superblock after the sleep expired (and dumpe2fs stopped complaining) # xxd -g4 -a -l2048 /dev/md0 0000200: 00000000 00000000 00000000 00000000 ................ * 00003f0: 00000000 00000000 00000000 03b7302c ..............0, 0000400: 00c00000 60bf0300 f82f0000 251a0100 ....`..../..%... 0000410: 61690000 00000000 02000000 02000000 ai.............. 0000420: 00800000 00800000 00180000 06257250 .............%rP 0000430: 06257250 0d00ffff 53ef0100 01000000 .%rP....S....... 0000440: 5a706b50 00000000 00000000 01000000 ZpkP............ 0000450: 00000000 0b000000 00010000 3c000000 ............<... 0000460: 46020000 6b040000 72aa9b1c 4180444a F...k...r...A.DJ 0000470: 8e15836d dad4f235 626f6f74 00000000 ...m...5boot.... 0000480: 00000000 00000000 2f626f6f 74000000 ......../boot... 0000490: 00000000 00000000 00000000 00000000 ................ * 00004c0: 00000000 00000000 00000000 00003b00 ..............;. 00004d0: 00000000 00000000 00000000 00000000 ................ 00004e0: 08000000 00000000 00000000 f5fe1926 ...............& 00004f0: d2da4864 b41fa932 76ae313f 01010000 ..Hd...2v.1?.... 0000500: 0c000000 00000000 e2f9c24f 0af30100 ...........O.... 0000510: 04000000 00000000 00000000 00100000 ................ 0000520: 00000100 00000000 00000000 00000000 ................ 0000530: 00000000 00000000 00000000 00000000 ................ 0000540: 00000000 00000000 00000000 00000001 ................ 0000550: 00000000 00000000 00000000 1c001c00 ................ 0000560: 01000000 00000000 00000000 00000000 ................ 0000570: 00000000 04010000 bd501802 00000000 .........P...... 0000580: 00000000 00000000 00000000 00000000 ................ * 00007f0: 00000000 00000000 00000000 e8cc7bec ..............{. 
Notice that the only difference is that the byte at 0x04e8 (offset 0xe8 in the superblock) is cleared, and the checksum is NOT changed, in the "working" superblock. Perhaps you're looking for the bug backward: the checksum *is* getting updated, but the data checksummed is *not*, leading to the mismatch. They're also in different halves, so perhaps not writing out both sectors? ^ permalink raw reply [flat|nested] 22+ messages in thread
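The comparison above can be reproduced in userspace. What follows is a minimal sketch, not code taken from the kernel or e2fsprogs. It assumes the metadata_csum superblock checksum is a CRC32C seeded with ~0 (no final inversion) over the first 0x3fc bytes of the 1024-byte superblock, stored little-endian in the last four bytes; that placement is consistent with the e8cc7bec word at 0x7fc and the "Checksum: 0xec7bcce8" line in the dumps. The helper names (crc32c_raw, sb_computed_csum, sb_stored_csum, sb_csum_set, sb_csum_ok) are hypothetical. The point it illustrates: changing the byte at offset 0xe8 (0xad = 173, the "First orphan inode" in the failing dump) without refreshing the checksum is enough to make verification fail.

```c
#include <stddef.h>
#include <stdint.h>

#define SB_SIZE          1024u  /* size of the on-disk superblock */
#define SB_CSUM_OFFSET   0x3FCu /* checksum field: last 4 bytes (assumed layout) */
#define SB_ORPHAN_OFFSET 0xE8u  /* orphan-list head, the byte that differs above */

/* Bitwise CRC32C (Castagnoli, reflected polynomial 0x82F63B78).
 * The caller supplies the seed; no final inversion is applied. */
uint32_t crc32c_raw(uint32_t crc, const uint8_t *buf, size_t len)
{
    while (len--) {
        crc ^= *buf++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Checksum over everything that precedes the checksum field (assumed seed ~0). */
uint32_t sb_computed_csum(const uint8_t *sb)
{
    return crc32c_raw(0xFFFFFFFFu, sb, SB_CSUM_OFFSET);
}

/* Read the stored little-endian checksum from the end of the superblock. */
uint32_t sb_stored_csum(const uint8_t *sb)
{
    const uint8_t *p = sb + SB_CSUM_OFFSET;
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Recompute the checksum and store it little-endian in the checksum field. */
void sb_csum_set(uint8_t *sb)
{
    uint32_t c = sb_computed_csum(sb);
    uint8_t *p = sb + SB_CSUM_OFFSET;
    p[0] = (uint8_t)(c & 0xff);
    p[1] = (uint8_t)((c >> 8) & 0xff);
    p[2] = (uint8_t)((c >> 16) & 0xff);
    p[3] = (uint8_t)((c >> 24) & 0xff);
}

/* Nonzero when the stored checksum matches the superblock contents. */
int sb_csum_ok(const uint8_t *sb)
{
    return sb_stored_csum(sb) == sb_computed_csum(sb);
}
```

To try it on a saved image such as the /tmp/md0a copy made above, pass the 1024 bytes starting at byte offset 1024 of the image to sb_csum_ok(). If the seed assumption is wrong the result will not match what dumpe2fs reports, so treat a mismatch from this sketch as a hint rather than a verdict.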
* Re: metadata_csum + unclean shutdown = failure to boot 2012-10-08 1:25 ` George Spelvin @ 2012-10-08 2:41 ` Theodore Ts'o 2012-10-08 3:17 ` George Spelvin 2012-11-01 1:05 ` ext4: fix metadata checksum calculation for the superblock George Spelvin 0 siblings, 2 replies; 22+ messages in thread From: Theodore Ts'o @ 2012-10-08 2:41 UTC (permalink / raw) To: George Spelvin; +Cc: linux-ext4, tm I found the problem. It turns out ext4_handle_dirty_super() was completely FUBAR'ed and was calculating the checksum on the wrong data (for all but 1k block file systems, sigh). We just didn't notice because the checksum would be correctly set when the file system was unmounted cleanly. (Sigh). The following patch should fix things. Thanks for testing out the metadata checksum on the root file system, and reporting this problem!!! - Ted >From bdd7ed290bf12c2e9132fbe97208a1af79c7a29d Mon Sep 17 00:00:00 2001 From: Theodore Ts'o <tytso@mit.edu> Date: Sun, 7 Oct 2012 22:18:56 -0400 Subject: [PATCH] ext4: fix metadata checksum calculation for the superblock The function ext4_handle_dirty_super() was calculating the superblock checksum on the wrong block data. As a result, when the superblock is modified while the file system is mounted (most commonly, when inodes are added or removed from the orphan list), the superblock checksum would be wrong. We didn't notice because the superblock checksum *was* being correctly calculated in ext4_commit_super(), and this would get called when the file system was unmounted. So the problem only became obvious if the system crashed while the file system was mounted. Fix this by removing the poorly designed function signature for ext4_superblock_csum_set(); if it only took a single argument, the pointer to a struct super_block, the ambiguity which caused this mistake would have been impossible.
Reported-by: George Spelvin <linux@horizon.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org --- fs/ext4/ext4.h | 3 +-- fs/ext4/ext4_jbd2.c | 8 ++------ fs/ext4/super.c | 7 ++++--- 3 files changed, 7 insertions(+), 11 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 3ab2539..78971cf 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -2063,8 +2063,7 @@ extern int ext4_resize_fs(struct super_block *sb, ext4_fsblk_t n_blocks_count); extern int ext4_calculate_overhead(struct super_block *sb); extern int ext4_superblock_csum_verify(struct super_block *sb, struct ext4_super_block *es); -extern void ext4_superblock_csum_set(struct super_block *sb, - struct ext4_super_block *es); +extern void ext4_superblock_csum_set(struct super_block *sb); extern void *ext4_kvmalloc(size_t size, gfp_t flags); extern void *ext4_kvzalloc(size_t size, gfp_t flags); extern void ext4_kvfree(void *ptr); diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c index bfa65b4..b4323ba 100644 --- a/fs/ext4/ext4_jbd2.c +++ b/fs/ext4/ext4_jbd2.c @@ -143,17 +143,13 @@ int __ext4_handle_dirty_super(const char *where, unsigned int line, struct buffer_head *bh = EXT4_SB(sb)->s_sbh; int err = 0; + ext4_superblock_csum_set(sb); if (ext4_handle_valid(handle)) { - ext4_superblock_csum_set(sb, - (struct ext4_super_block *)bh->b_data); err = jbd2_journal_dirty_metadata(handle, bh); if (err) ext4_journal_abort_handle(where, line, __func__, bh, handle, err); - } else { - ext4_superblock_csum_set(sb, - (struct ext4_super_block *)bh->b_data); + } else mark_buffer_dirty(bh); - } return err; } diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 982f6fc..5ededf1 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -143,9 +143,10 @@ int ext4_superblock_csum_verify(struct super_block *sb, return es->s_checksum == ext4_superblock_csum(sb, es); } -void ext4_superblock_csum_set(struct super_block *sb, - struct ext4_super_block *es) +void ext4_superblock_csum_set(struct 
super_block *sb) { + struct ext4_super_block *es = EXT4_SB(sb)->s_es; + if (!EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)) return; @@ -4387,7 +4388,7 @@ static int ext4_commit_super(struct super_block *sb, int sync) cpu_to_le32(percpu_counter_sum_positive( &EXT4_SB(sb)->s_freeinodes_counter)); BUFFER_TRACE(sbh, "marking dirty"); - ext4_superblock_csum_set(sb, es); + ext4_superblock_csum_set(sb); mark_buffer_dirty(sbh); if (sync) { error = sync_dirty_buffer(sbh); -- 1.7.12.rc0.22.gcdd159b ^ permalink raw reply related [flat|nested] 22+ messages in thread
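The "all but 1k block file systems" remark follows from where the primary superblock sits inside its buffer. The sketch below is a userspace illustration only, assuming the primary superblock always starts 1024 bytes into the device; struct fake_bh and the sb_* helpers are hypothetical stand-ins, not kernel interfaces. With 1k blocks the superblock fills block 1 and starts at the beginning of the buffer, so casting b_data happens to work. With 4k blocks (the "Block size: 4096" case in the dumps) it sits 1024 bytes into block 0, and a checksum computed from b_data covers the wrong bytes and lands 1024 bytes too early. The nonzero word at 0x3fc in both hex dumps would be consistent with that, though the thread does not confirm it.

```c
#include <stddef.h>
#include <stdint.h>

#define SB_BYTE_OFFSET 1024u /* primary superblock starts here on the device */

/* Hypothetical stand-in for the buffer that holds the superblock's block. */
struct fake_bh {
    uint8_t *b_data;    /* start of the whole filesystem block */
    unsigned blocksize; /* filesystem block size in bytes */
};

/* Filesystem block number that contains the primary superblock. */
unsigned sb_block_nr(unsigned blocksize)
{
    return SB_BYTE_OFFSET / blocksize;
}

/* Byte offset of the superblock within that block. */
unsigned sb_offset_in_block(unsigned blocksize)
{
    return SB_BYTE_OFFSET % blocksize;
}

/* What the old call sites passed: the start of the block. */
uint8_t *sb_from_block_start(const struct fake_bh *bh)
{
    return bh->b_data;
}

/* Where the superblock really is: block start plus the in-block offset. */
uint8_t *sb_from_offset(const struct fake_bh *bh)
{
    return bh->b_data + sb_offset_in_block(bh->blocksize);
}
```

Taking only the struct super_block pointer, as the patch does, removes the choice between the two pointers: the function looks up the one superblock pointer itself.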
* Re: metadata_csum + unclean shutdown = failure to boot 2012-10-08 2:41 ` Theodore Ts'o @ 2012-10-08 3:17 ` George Spelvin 2012-10-08 4:03 ` Tao Ma 2012-11-01 1:05 ` ext4: fix metadata checksum calculation for the superblock George Spelvin 1 sibling, 1 reply; 22+ messages in thread From: George Spelvin @ 2012-10-08 3:17 UTC (permalink / raw) To: linux, tytso; +Cc: linux-ext4, tm I'm testing that patch, but you may want to fix it a bit more before submitting to stable@... fs/ext4/resize.c: In function 'update_backups': fs/ext4/resize.c:973:39: error: too many arguments to function 'ext4_superblock_csum_set' In file included from fs/ext4/ext4_jbd2.h:20:0, from fs/ext4/resize.c:17: fs/ext4/ext4.h:2049:13: note: declared here The fix is of course obvious and I'm compiling it now. diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c index 41f6ef6..e781259 100644 --- a/fs/ext4/resize.c +++ b/fs/ext4/resize.c @@ -970,7 +970,7 @@ static void update_backups(struct super_block *sb, goto exit_err; } - ext4_superblock_csum_set(sb, (struct ext4_super_block *)data); + ext4_superblock_csum_set(sb); while ((group = ext4_list_backups(sb, &three, &five, &seven)) < last) { struct buffer_head *bh; ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: metadata_csum + unclean shutdown = failure to boot 2012-10-08 3:17 ` George Spelvin @ 2012-10-08 4:03 ` Tao Ma 2012-10-08 11:35 ` George Spelvin 0 siblings, 1 reply; 22+ messages in thread From: Tao Ma @ 2012-10-08 4:03 UTC (permalink / raw) To: George Spelvin; +Cc: tytso, linux-ext4 On 10/08/2012 11:17 AM, George Spelvin wrote: > I'm testing that patch, but you may want to fix it a bit more before submitting to > stable@... > fs/ext4/resize.c: In function 'update_backups': > fs/ext4/resize.c:973:39: error: too many arguments to function 'ext4_superblock_csum_set' > In file included from fs/ext4/ext4_jbd2.h:20:0, > from fs/ext4/resize.c:17: > fs/ext4/ext4.h:2049:13: note: declared here > > The fix is of course obvious and I'm compiling it now. > > diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c > index 41f6ef6..e781259 100644 > --- a/fs/ext4/resize.c > +++ b/fs/ext4/resize.c > @@ -970,7 +970,7 @@ static void update_backups(struct super_block *sb, > goto exit_err; > } > > - ext4_superblock_csum_set(sb, (struct ext4_super_block *)data); > + ext4_superblock_csum_set(sb); this line is already removed in my commit bef53b01 and will be in stable. So this patch should work as expected. Thanks Tao > > while ((group = ext4_list_backups(sb, &three, &five, &seven)) < last) { > struct buffer_head *bh; > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: metadata_csum + unclean shutdown = failure to boot 2012-10-08 4:03 ` Tao Ma @ 2012-10-08 11:35 ` George Spelvin 0 siblings, 0 replies; 22+ messages in thread From: George Spelvin @ 2012-10-08 11:35 UTC (permalink / raw) To: linux, tm; +Cc: linux-ext4, tytso Tao Ma <tm@tao.ma> wrote: > This line is already removed in my commit bef53b01 and will be in > stable. So this patch should work as expected. Ah, okay. Well, with that, it appears to be working; I can't reproduce the problem any more. Thanks to everyone for a significant bugfix starting with a vague report over a weekend! ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ext4: fix metadata checksum calculation for the superblock 2012-10-08 2:41 ` Theodore Ts'o 2012-10-08 3:17 ` George Spelvin @ 2012-11-01 1:05 ` George Spelvin 2012-11-01 1:13 ` Darrick J. Wong 1 sibling, 1 reply; 22+ messages in thread From: George Spelvin @ 2012-11-01 1:05 UTC (permalink / raw) To: darrick.wong, linux, tytso; +Cc: linux-ext4, tm I'm currently running with two ext4 patches: Author: Theodore Ts'o <tytso@mit.edu> Date: Sun Oct 7 22:18:56 2012 -0400 Subject: ext4: fix metadata checksum calculation for the superblock Author: Darrick J. Wong <darrick.wong@oracle.com> Date: Wed Oct 17 12:51:30 2012 -0700 Subject: ext4: Don't verify checksums of dx non-leaf nodes during fallback linear scan They appear to fix real problems. I notice, that neither of these have made it into 2.6.5. Should they be sent to -stable at some point? I'm not trying to overrule your judgements on the matter, just ensure that the omission is actually a conscious decision rather than an oversight. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ext4: fix metadata checksum calculation for the superblock 2012-11-01 1:05 ` ext4: fix metadata checksum calculation for the superblock George Spelvin @ 2012-11-01 1:13 ` Darrick J. Wong 2012-11-01 1:50 ` Theodore Ts'o 0 siblings, 1 reply; 22+ messages in thread From: Darrick J. Wong @ 2012-11-01 1:13 UTC (permalink / raw) To: George Spelvin; +Cc: tytso, linux-ext4, tm On Wed, Oct 31, 2012 at 09:05:21PM -0400, George Spelvin wrote: > I'm currently running with two ext4 patches: > > Author: Theodore Ts'o <tytso@mit.edu> > Date: Sun Oct 7 22:18:56 2012 -0400 > Subject: ext4: fix metadata checksum calculation for the superblock > > Author: Darrick J. Wong <darrick.wong@oracle.com> > Date: Wed Oct 17 12:51:30 2012 -0700 > Subject: ext4: Don't verify checksums of dx non-leaf nodes during fallback linear scan > > They appear to fix real problems. I notice, that neither of these have > made it into 2.6.5. Should they be sent to -stable at some point? > > I'm not trying to overrule your judgements on the matter, just ensure that > the omission is actually a conscious decision rather than an oversight. <shrug> I was wondering too, but I figured Ted was probably busy dealing with the corruption bug and such. (Which itself doesn't seem to be in 3.6.x yet) I suppose it's a good sign that it's been more than a week and you haven't hit anything else... --D > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ext4: fix metadata checksum calculation for the superblock 2012-11-01 1:13 ` Darrick J. Wong @ 2012-11-01 1:50 ` Theodore Ts'o 2012-11-01 3:22 ` Darrick J. Wong 2012-11-01 6:12 ` George Spelvin 0 siblings, 2 replies; 22+ messages in thread From: Theodore Ts'o @ 2012-11-01 1:50 UTC (permalink / raw) To: Darrick J. Wong; +Cc: George Spelvin, linux-ext4, tm On Wed, Oct 31, 2012 at 06:13:12PM -0700, Darrick J. Wong wrote: > > Author: Theodore Ts'o <tytso@mit.edu> > > Date: Sun Oct 7 22:18:56 2012 -0400 > > Subject: ext4: fix metadata checksum calculation for the superblock This one was cc'ed to stable@vger.kernel.org. But when you said "I notice, that neither of thse have made it into 2.6.5", I assume you meant 3.5? The last 3.5 kernel is 3.5.7, and Greg K-H isn't backporting fixes to 3.5.x any more. (See http://www.kernel.org to see which kernels are marked "EOL"; those are the ones which are no longer getting updates.) So that means it should eventually make it to the 3.4.x and 3.6.x kernels. > > Author: Darrick J. Wong <darrick.wong@oracle.com> > > Date: Wed Oct 17 12:51:30 2012 -0700 > > Subject: ext4: Don't verify checksums of dx non-leaf nodes during fallback linear scan I missed this one because the subject line didn't have [PATCH] in it. (Darrick, it really helps if you use git format-patch / git send-email; you can use a message-id of the message you're replying to in the mail thread to chain the message to the thread.) I would have eventually found it in patchwork, but even in patchwork the listing would have had a potentially misleading subject line, since it grabs the patch title from the subject line of the e-mail. > <shrug> I was wondering too, but I figured Ted was probably busy dealing with > the corruption bug and such. > > (Which itself doesn't seem to be in 3.6.x yet) It isn't in 3.7-rc3 because I didn't see it before I sent the pull request to Linus.... 
At this point I'll just include it in the patches to be sent to Linus at the next merge window, mainly because I don't have the time to run a separate regression test run just for this patch, and it's only a cosmetic issue, right? - Ted ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ext4: fix metadata checksum calculation for the superblock 2012-11-01 1:50 ` Theodore Ts'o @ 2012-11-01 3:22 ` Darrick J. Wong 2012-11-01 6:12 ` George Spelvin 1 sibling, 0 replies; 22+ messages in thread From: Darrick J. Wong @ 2012-11-01 3:22 UTC (permalink / raw) To: Theodore Ts'o; +Cc: George Spelvin, linux-ext4, tm On Wed, Oct 31, 2012 at 09:50:15PM -0400, Theodore Ts'o wrote: > On Wed, Oct 31, 2012 at 06:13:12PM -0700, Darrick J. Wong wrote: > > > Author: Theodore Ts'o <tytso@mit.edu> > > > Date: Sun Oct 7 22:18:56 2012 -0400 > > > Subject: ext4: fix metadata checksum calculation for the superblock > > This one was cc'ed to stable@vger.kernel.org. But when you said "I > notice, that neither of thse have made it into 2.6.5", I assume you > meant 3.5? The last 3.5 kernel is 3.5.7, and Greg K-H isn't > backporting fixes to 3.5.x any more. (See http://www.kernel.org to > see which kernels are marked "EOL"; those are the ones which are no > longer getting updates.) > > So that means it should eventually make it to the 3.4.x and 3.6.x > kernels. I thought he meant 3.6.5, but I haven't really been paying 3.6.x much attention. > > > Author: Darrick J. Wong <darrick.wong@oracle.com> > > > Date: Wed Oct 17 12:51:30 2012 -0700 > > > Subject: ext4: Don't verify checksums of dx non-leaf nodes during fallback linear scan > > I missed this one because the subject line didn't have [PATCH] in it. > (Darrick, it really helps if you use git format-patch / git > send-email; you can use a message-id of the message you're replying to > in the mail thread to chain the message to the thread.) > > I would have eventually found it in patchwork, but even in patchwork > the listing would have had a potentially misleading subject line, > since it grabs the patch title from the subject line of the e-mail. Oops, I guess I did forget the magic "[PATCH]". Sorry about that. > > <shrug> I was wondering too, but I figured Ted was probably busy dealing with > > the corruption bug and such. 
> > > > (Which itself doesn't seem to be in 3.6.x yet) > > It isn't in 3.7-rc3 because I didn't see it before I sent the pull > request to Linus.... > > At this point I'll just include it in the patches to be sent to Linus > at the next merge window, mainly because I don't have the time to run > a separate regression test run just for this patch, and it's only a > cosmetic issue, right? Yep. --D > > - Ted > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ext4: fix metadata checksum calculation for the superblock 2012-11-01 1:50 ` Theodore Ts'o 2012-11-01 3:22 ` Darrick J. Wong @ 2012-11-01 6:12 ` George Spelvin 2012-11-01 6:49 ` Darrick J. Wong 1 sibling, 1 reply; 22+ messages in thread From: George Spelvin @ 2012-11-01 6:12 UTC (permalink / raw) To: darrick.wong, tytso; +Cc: linux-ext4, linux, tm > This one was cc'ed to stable@vger.kernel.org. But when you said "I > notice, that neither of thse have made it into 2.6.5", I assume you > meant 3.5? Whoops, typo! I meant 3.6.5, the very latest just-out-today stable kernel. Quite a few 3.6.x kernels have come out since that patch was Cc'ed, and it keeps not being included. So I wondered. > So that means it should eventually make it to the 3.4.x and 3.6.x > kernels. That's what I thought, but I didn't want to pester Greg until I was sure of your intentions. > At this point I'll just include it in the patches to be sent to Linus > at the next merge window, mainly because I don't have the time to run > a separate regression test run just for this patch, and it's only a > cosmetic issue, right? Well, it causes the file system to be marked dirty and unnecessarily checked on reboot, which I contend is a bug, but it's not a data-loss bug. I do worry that it could cause file lookup to fail when it shouldn't, which *is* effectively a data-loss bug, even if the data reappears on reboot. But I'd have to understand the problem and fix better to know if that actually happens; I haven't observed it. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ext4: fix metadata checksum calculation for the superblock 2012-11-01 6:12 ` George Spelvin @ 2012-11-01 6:49 ` Darrick J. Wong 2012-11-01 7:07 ` George Spelvin 0 siblings, 1 reply; 22+ messages in thread From: Darrick J. Wong @ 2012-11-01 6:49 UTC (permalink / raw) To: George Spelvin; +Cc: tytso, linux-ext4, tm On Thu, Nov 01, 2012 at 02:12:12AM -0400, George Spelvin wrote: > > This one was cc'ed to stable@vger.kernel.org. But when you said "I > > notice, that neither of thse have made it into 2.6.5", I assume you > > meant 3.5? > > Whoops, typo! I meant 3.6.5, the very latest just-out-today stable > kernel. > > Quite a few 3.6.x kernels have come out since that patch was Cc'ed, > and it keeps not being included. So I wondered. > > > So that means it should eventually make it to the 3.4.x and 3.6.x > > kernels. > > That's what I thought, but I didn't want to pester Greg until I was sure > of your intentions. > > > At this point I'll just include it in the patches to be sent to Linus > > at the next merge window, mainly because I don't have the time to run > > a separate regression test run just for this patch, and it's only a > > cosmetic issue, right? > > Well, it causes the file system to be marked dirty and unnecessarily > checked on reboot, which I contend is a bug, but it's not a data-loss > bug. > > I do worry that it could cause file lookup to fail when it shouldn't, > which *is* effectively a data-loss bug, even if the data reappears > on reboot. But I'd have to understand the problem and fix better to > know if that actually happens; I haven't observed it. Yes, it would be useful to know what's going on with this directory file, since it seems to fallback to linear scan, yet e2fsck -D doesn't fix it. What I was /going/ for was that the kernel would notice a bad directory and flag it for fsck on reboot. Upon reboot, fsck would be run, notice the bad dir, and feed it to the directory rebuilder to get it fixed for good. 
However, there doesn't seem to be any real checksum mismatch, so the rebuild doesn't happen. Also ... refresh my memory -- some files have disappeared as a result of this happening? --D ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ext4: fix metadata checksum calculation for the superblock 2012-11-01 6:49 ` Darrick J. Wong @ 2012-11-01 7:07 ` George Spelvin 2012-11-01 7:18 ` Darrick J. Wong 0 siblings, 1 reply; 22+ messages in thread From: George Spelvin @ 2012-11-01 7:07 UTC (permalink / raw) To: darrick.wong, linux; +Cc: linux-ext4, tm, tytso > Yes, it would be useful to know what's going on with this directory file, > since it seems to fallback to linear scan, yet e2fsck -D doesn't fix it. > What I was /going/ for was that the kernel would notice a bad directory > and flag it for fsck on reboot. Upon reboot, fsck would be run, notice > the bad dir, and feed it to the directory rebuilder to get it fixed > for good. However, there doesn't seem to be any real checksum mismatch, > so the rebuild doesn't happen. That's what confuses me. I had already run e2fsck -D (which I assume rebuilds all directories, even if unnecessary) before observing the problem. The other odd clue is that it's always nfsd that chokes; other accesses to the directory (ls -U, ls -lU, grep -r) don't produce the message. > Also ... refresh my memory -- some files have disappeared as a result of this > happening? I haven't observed it, no. But the nature of the symptoms suggests it might be happening. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ext4: fix metadata checksum calculation for the superblock 2012-11-01 7:07 ` George Spelvin @ 2012-11-01 7:18 ` Darrick J. Wong 2012-11-01 7:28 ` George Spelvin 0 siblings, 1 reply; 22+ messages in thread From: Darrick J. Wong @ 2012-11-01 7:18 UTC (permalink / raw) To: George Spelvin; +Cc: linux-ext4, tm, tytso On Thu, Nov 01, 2012 at 03:07:31AM -0400, George Spelvin wrote: > > Yes, it would be useful to know what's going on with this directory file, > > since it seems to fallback to linear scan, yet e2fsck -D doesn't fix it. > > What I was /going/ for was that the kernel would notice a bad directory > > and flag it for fsck on reboot. Upon reboot, fsck would be run, notice > > the bad dir, and feed it to the directory rebuilder to get it fixed > > for good. However, there doesn't seem to be any real checksum mismatch, > > so the rebuild doesn't happen. > > That's what confuses me. I had already run e2fsck -D (which I assume > rebuilds all directories, even if unnecessary) before observing the > problem. The other odd clue is that it's always nfsd that chokes; > other accesses to the directory (ls -U, ls -lU, grep -r) don't produce > the message. Oh, so ... it's just nfsd that causes the linear fallback? Regular (i.e. non-nfs) users can see everything in the dir, no error messages? Now *that* is odd. :) You know, I was starting to wonder what on earth would even cause the fallback in the first place. It even looked like most of the "your dir is corrupt" exits from that function would spit out an error or be somehow obviously broken. > > Also ... refresh my memory -- some files have disappeared as a result of this > > happening? > > I haven't observed it, no. But the nature of the symptoms suggests it > might be happening. Hum. When linear scan happens on a hashed dir, it's scanning the same blocks that the hash scan sees. The htree block looks like a regular directory block with one huge "unused" dirent that wraps all the htree data. 
So, the linear scan should find the exact same files as a htree scan would. If it doesn't, something's wrong. But you say it isn't, so I imagine it's fine. <shrug> Another thing for me to ponder tomorrow. :) --D ^ permalink raw reply [flat|nested] 22+ messages in thread
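The point about the "one huge unused dirent" can be made concrete with a small sketch. It is an illustration under stated assumptions, not the kernel's scan: it assumes the usual 8-byte directory entry header (32-bit inode, 16-bit record length, 8-bit name length, 8-bit file type, all little-endian), block sizes below 64 KiB so the record length needs no special encoding, and that an interior htree node starts with an entry whose inode is 0 and whose record length spans the whole block. The function names are hypothetical. A linear walk skips entries with inode 0, so such a block contributes no names and does not derail the scan.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t rd_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint16_t rd_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Walk one directory block by record length and count the in-use entries.
 * Returns -1 if a record length is malformed. */
int count_live_dirents(const uint8_t *block, size_t blocksize)
{
    size_t off = 0;
    int live = 0;

    while (off + 8 <= blocksize) {
        uint32_t inode = rd_le32(block + off);
        uint16_t rec_len = rd_le16(block + off + 4);
        uint8_t name_len = block[off + 6];

        if (rec_len < 8 || rec_len % 4 != 0 ||
            off + rec_len > blocksize || 8u + name_len > rec_len)
            return -1;
        if (inode != 0) /* inode 0 marks an unused entry */
            live++;
        off += rec_len;
    }
    return live;
}

/* Test helper: write one entry header and name at the given offset. */
void put_dirent(uint8_t *block, size_t off, uint32_t inode,
                uint16_t rec_len, const char *name)
{
    size_t name_len = strlen(name);
    uint8_t *p = block + off;

    p[0] = (uint8_t)(inode & 0xff);
    p[1] = (uint8_t)((inode >> 8) & 0xff);
    p[2] = (uint8_t)((inode >> 16) & 0xff);
    p[3] = (uint8_t)((inode >> 24) & 0xff);
    p[4] = (uint8_t)(rec_len & 0xff);
    p[5] = (uint8_t)(rec_len >> 8);
    p[6] = (uint8_t)name_len;
    p[7] = 0; /* file type: unknown */
    memcpy(p + 8, name, name_len);
}
```

A block built with put_dirent(node, 0, 0, 4096, "") counts as zero live entries, while a leaf holding two named entries counts as two. Under these assumptions the linear scan sees the same names a hashed lookup would, only more slowly.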
* Re: ext4: fix metadata checksum calculation for the superblock 2012-11-01 7:18 ` Darrick J. Wong @ 2012-11-01 7:28 ` George Spelvin 2012-11-02 0:05 ` Darrick J. Wong 0 siblings, 1 reply; 22+ messages in thread From: George Spelvin @ 2012-11-01 7:28 UTC (permalink / raw) To: darrick.wong, linux; +Cc: linux-ext4, tm, tytso > Oh, so ... it's just nfsd that causes the linear fallback? Regular (i.e. > non-nfs) users can see everything in the dir, no error messages? Yup. After it survived one e2fsck -D, I poked at the directory a bit to see if I could cause the error. No success from local access. It's also probably an NFSv2 client. I wonder if it's doing something odd with directory seeks that's causing problems; perhaps htree and the 32-bit seek cookie limit are not friends? >> I haven't observed it, no. But the nature of the symptoms suggests it >> might be happening. > Hum. When linear scan happens on a hashed dir, it's scanning the same > blocks that the hash scan sees. The htree block looks like a regular > directory block with one huge "unused" dirent that wraps all the htree > data. So, the linear scan should find the exact same files as a htree > scan would. If it doesn't, something's wrong. But you say it isn't, > so I imagine it's fine. Maybe I was wrong. I was worried that it was aborting the directory scan due to the error and thus files would disappear. If that doesn't happen, no worries. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ext4: fix metadata checksum calculation for the superblock 2012-11-01 7:28 ` George Spelvin @ 2012-11-02 0:05 ` Darrick J. Wong 0 siblings, 0 replies; 22+ messages in thread From: Darrick J. Wong @ 2012-11-02 0:05 UTC (permalink / raw) To: George Spelvin; +Cc: linux-ext4, tm, tytso On Thu, Nov 01, 2012 at 03:28:47AM -0400, George Spelvin wrote: > > Oh, so ... it's just nfsd that causes the linear fallback? Regular (i.e. > > non-nfs) users can see everything in the dir, no error messages? > > Yup. After it survived one e2fsck -D, I poked at the directory a bit > to see if I could cause the error. No success from local access. > > It's also probably an NFSv2 client. I wonder if it's doing something > odd with directory seeks that's causing problems; perhaps htree and the > 32-bit seek cookie limit are not friends? <shrug> I'm not nfs-wise, sadly. I _am_ wondering if an ftrace of this might be useful... or a gigantic glut of data that I'll never finish processing. Just from a quick read of ext4_find_entry() it looks like the only thing that results in fallback mode without a kernel message is ext4_bread() failing in dx_probe()? > >> I haven't observed it, no. But the nature of the symptoms suggests it > >> might be happening. > > > Hum. When linear scan happens on a hashed dir, it's scanning the same > > blocks that the hash scan sees. The htree block looks like a regular > > directory block with one huge "unused" dirent that wraps all the htree > > data. So, the linear scan should find the exact same files as a htree > > scan would. If it doesn't, something's wrong. But you say it isn't, > > so I imagine it's fine. > > Maybe I was wrong. I was worried that it was aborting the directory > scan due to the error and thus files would disappear. If that doesn't > happen, no worries. Oh well, it'll run slowly but at least it won't be throwing up errors. --D ^ permalink raw reply [flat|nested] 22+ messages in thread