linux-ext4.vger.kernel.org archive mirror
* metadata_csum + unclean shutdown = failure to boot
@ 2012-10-07  5:04 George Spelvin
  2012-10-07 13:39 ` Tao Ma
  0 siblings, 1 reply; 22+ messages in thread
From: George Spelvin @ 2012-10-07  5:04 UTC
  To: linux-ext4; +Cc: linux

Feeling a bit adventurous, I enabled metadata_csum on many of
my daily-use file systems.

I have now noticed a problem in the event of an unclean shutdown.

On reboot, the kernel complains about a bad superblock checksum, suggests
e2fsck, and then fails to mount the root filesystem.

This makes running e2fsck a bit problematic.

I can fix it manually, but it makes automatic reboots *extremely*
problematic.

Is it possible to fix the kernel code to be a bit more forgiving?


* Re: metadata_csum + unclean shutdown = failure to boot
  2012-10-07  5:04 metadata_csum + unclean shutdown = failure to boot George Spelvin
@ 2012-10-07 13:39 ` Tao Ma
  2012-10-07 15:09   ` George Spelvin
  0 siblings, 1 reply; 22+ messages in thread
From: Tao Ma @ 2012-10-07 13:39 UTC
  To: George Spelvin; +Cc: linux-ext4

Hi George,
On 10/07/2012 01:04 PM, George Spelvin wrote:
> Feeling a bit adventurous, I enabled metadata_csum on many of
> my daily-use file systems.
> 
> I have now noticed a problem in the event of an unclean shutdown.
> 
> On reboot, the kernel complains about a bad superblock checksum, suggests
> e2fsck, and then fails to mount the root filesystem.
> 
> This makes running e2fsck a bit problematic.
> 
> I can fix it manually, but it makes automatic reboots *extremely*
> problematic.
> 
> Is it possible to fix the kernel code to be a bit more forgiving?
Interesting. In general, metadata checksum should be updated with the
same content it checksums. So could you please answer my questions first?
1. your kernel version please?
2. what do you mean by an *unclean* shutdown?
3. what do you find in your /var/log/messages except the bad superblock
checksum error?

Thanks
Tao


* Re: metadata_csum + unclean shutdown = failure to boot
  2012-10-07 13:39 ` Tao Ma
@ 2012-10-07 15:09   ` George Spelvin
  2012-10-07 18:10     ` Theodore Ts'o
  0 siblings, 1 reply; 22+ messages in thread
From: George Spelvin @ 2012-10-07 15:09 UTC
  To: linux, tm; +Cc: linux-ext4

> Interesting. In general, metadata checksum should be updated with the
> same content it checksums. So could you please answer my questions first?

Good point; it *is* only a single (512 byte) sector.  And it's a
512-byte sector drive (pair of drives in RAID-1, actually).

However, it appears that while the FS is mounted, an invalid checksum
*is* written.  That's the bug:

# dumpe2fs /dev/md2                     
dumpe2fs 1.43-WIP (22-Sep-2012)
dumpe2fs: Superblock checksum does not match superblock while trying to open /dev/md2
Couldn't find valid filesystem superblock.
# /tmp/old/sbin/dumpe2fs -f -h /dev/md2
dumpe2fs 1.42.5 (29-Jul-2012)
./dumpe2fs: Superblock checksum does not match superblock while trying to open /dev/md2
Couldn't find valid filesystem superblock.

(Unfortunately, dumpe2fs doesn't have a -n flag like debugfs.)


Here's the first 2K of the partition (superblock at 1K), in case it helps:
# xxd -g4 -l2048 -a /dev/md2
0000000: 00000000 00000000 00000000 00000000  ................
*
00003f0: 00000000 00000000 00000000 1d32be49  .............2.I
0000400: b01d4300 301a3605 82b44200 28242504  ..C.0.6...B.($%.
0000410: faae3900 00000000 02000000 02000000  ..9.............
0000420: 00800000 00800000 70060000 c8007150  ........p.....qP
0000430: c8007150 0200ffff 53ef0100 01000000  ..qP....S.......
0000440: eaf37050 00000000 00000000 01000000  ..pP............
0000450: 00000000 0b000000 00010000 3c000000  ............<...
0000460: 46020000 6b040000 a61d8e82 4c814f84  F...k.......L.O.
0000470: 9011cf24 8d295eeb 726f6f74 00000000  ...$.)^.root....
0000480: 00000000 00000000 2f006e74 00000000  ......../.nt....
0000490: 00000000 00000000 00000000 00000000  ................
*
00004c0: 00000000 00000000 00000000 0000eb03  ................
00004d0: 00000000 00000000 00000000 00000000  ................
00004e0: 08000000 00000000 a1863300 dc2dbaa1  ..........3..-..
00004f0: 7ada4a32 96a5dbe8 c42859c2 01010000  z.J2.....(Y.....
0000500: 0c000000 00000000 b2fbc24f 0af30200  ...........O....
0000510: 04000000 00000000 00000000 ff7f0000  ................
0000520: 00809802 ff7f0000 01000000 ffff9802  ................
0000530: 00000000 00000000 00000000 00000000  ................
0000540: 00000000 00000000 00000000 00000008  ................
0000550: 00000000 00000000 00000000 1c001c00  ................
0000560: 01000000 00000000 00000000 00000000  ................
0000570: 00000000 04010000 e4142809 00000000  ..........(.....
0000580: 00000000 00000000 00000000 00000000  ................
*
00007f0: 00000000 00000000 00000000 38a11164  ............8..d
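
The stored checksum and a few other fields can be read straight out of the hex dump above. What follows is a minimal Python sketch, assuming the usual ext4 superblock offsets (s_mnt_count at 0x34, s_magic at 0x38, s_feature_incompat at 0x60, s_feature_ro_compat at 0x64, s_checksum at 0x3FC) and little-endian encoding. The `ROWS` table and the `field` helper are illustrative names, not part of any tool.

```python
import struct

# Rows copied verbatim from the xxd dump above, keyed by their file offset.
ROWS = {
    0x430: "c8007150 0200ffff 53ef0100 01000000",
    0x460: "46020000 6b040000 a61d8e82 4c814f84",
    0x7F0: "00000000 00000000 00000000 38a11164",
}
SB_START = 0x400  # the primary superblock sits 1K into the partition


def field(sb_offset, fmt):
    """Unpack one little-endian field at the given superblock offset."""
    absolute = SB_START + sb_offset
    row_start = absolute & ~0xF          # xxd prints 16 bytes per row
    row = bytes.fromhex(ROWS[row_start].replace(" ", ""))
    return struct.unpack_from(fmt, row, absolute - row_start)[0]


mnt_count = field(0x34, "<H")
magic = field(0x38, "<H")
incompat = field(0x60, "<I")
ro_compat = field(0x64, "<I")
checksum = field(0x3FC, "<I")

print(f"magic={magic:#06x} mount_count={mnt_count} checksum={checksum:#010x}")
print("needs_recovery:", bool(incompat & 0x0004))   # INCOMPAT_RECOVER
print("metadata_csum:", bool(ro_compat & 0x0400))   # RO_COMPAT_METADATA_CSUM
```

The decoded values line up with the e2fsck listing later in the thread (mount count 2, checksum 0x6411a138), so the dump shows the on-disk checksum as stored; whether that value is correct for the other 1020 bytes is a separate question.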


> 1. your kernel version please?

3.6.0.

> 2. what do you mean by an *unclean* shutdown?

AC power failure.  This is actually the second time I've seen the problem,
although the first was pilot error while trying to rearrange fan power
cables with the power on.

> 3. what do you find in your /var/log/messages except the bad superblock
> checksum error?

I don't understand the question.  There's nothing there *including* no
bad superblock checksum error!

The kernel panicked with "unable to mount root file system", so it
didn't even load init, much less get /var/log writeable or start a
syslog process.

I didn't transcribe the on-screen messages because I assumed the code was
working "as expected" on reboot: it only checks the primary superblock,
and if there's an error there, it bails.

On reboot, there's nothing interesting, since by the time we got there,
e2fsck had run and cleaned up the file system.  It just says

Oct  7 00:11:05 $HOST kernel: EXT4-fs (md2): mounted filesystem with ordered data mode. Opts: (null)
Oct  7 00:11:05 $HOST kernel: VFS: Mounted root (ext4 filesystem) readonly on device 9:2.
... followed by module loading.


The code causing the mount failure on boot is easy to find.  fs/ext4/super.c line 3311:

        /* Check superblock checksum */
        if (!ext4_superblock_csum_verify(sb, es)) {
                ext4_msg(sb, KERN_ERR, "VFS: Found ext4 filesystem with "
                         "invalid superblock checksum.  Run e2fsck?");
                silent = 1;
                goto cantfind_ext4;
        }
[...]
cantfind_ext4:
        if (!silent)
                ext4_msg(sb, KERN_ERR, "VFS: Can't find ext4 filesystem");
        goto failed_mount;
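
What the check above amounts to can be sketched outside the kernel. The Python below is an illustration only: it assumes the superblock checksum is a CRC32C over the first 1020 bytes (everything before s_checksum at 0x3FC), seeded with ~0 and stored without a final inversion. That is an assumption about the ext4 helpers, not something stated in this thread. `crc32c_raw`, `superblock_csum_ok` and `with_fresh_csum` are hypothetical names.

```python
CSUM_OFFSET = 0x3FC  # s_checksum is the last 4 bytes of the 1024-byte superblock


def crc32c_raw(data, crc=0xFFFFFFFF):
    """Bitwise CRC32C (reflected polynomial 0x82F63B78), no final inversion."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc


def superblock_csum_ok(sb):
    """Compare the stored trailer with a checksum of the bytes before it."""
    stored = int.from_bytes(sb[CSUM_OFFSET:CSUM_OFFSET + 4], "little")
    return crc32c_raw(sb[:CSUM_OFFSET]) == stored


def with_fresh_csum(sb):
    """Return a copy of sb whose trailer matches its current contents."""
    body = bytes(sb[:CSUM_OFFSET])
    return body + crc32c_raw(body).to_bytes(4, "little")


# Synthetic superblock: zeros plus the ext4 magic, then a valid checksum.
blank = bytearray(1024)
blank[0x38:0x3A] = (0xEF53).to_bytes(2, "little")
good = with_fresh_csum(blank)

# Touch one covered field (the orphan-list head at 0xE8) and leave the
# old checksum in place, as a writer that forgot to update it would.
stale = bytearray(good)
stale[0xE8:0xEC] = (3376801).to_bytes(4, "little")

print(superblock_csum_ok(good), superblock_csum_ok(bytes(stale)))
```

Changing any covered field without recomputing the trailer makes the check fail, and the mount path shown above treats that failure the same way as not finding an ext4 filesystem at all.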


The challenge is to find what's writing the bad superblock checksum.


Here's one clue (/boot also has metadata_csum enabled)

# dumpe2fs -h /dev/md0 | tee /tmp/1
dumpe2fs 1.43-WIP (22-Sep-2012)
Filesystem volume name:   boot
Last mounted on:          /boot
Filesystem UUID:          72aa9b1c-4180-444a-8e15-836ddad4f235
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype extent flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
Filesystem flags:         signed_directory_hash 
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              49152
Block count:              245600
Reserved block count:     12280
Free blocks:              72345
Free inodes:              26976
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      59
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         6144
Inode blocks per group:   384
Flex block group size:    16
Filesystem created:       Mon May 28 04:06:58 2012
Last mount time:          Sun Oct  7 14:51:04 2012
Last write time:          Sun Oct  7 14:51:33 2012
Mount count:              6
Maximum mount count:      -1
Last checked:             Tue Oct  2 22:53:14 2012
Check interval:           0 (<none>)
Lifetime writes:          33 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      f5fe1926-d2da-4864-b41f-a93276ae313f
Journal backup:           inode blocks
Checksum type:            crc32c
Checksum:                 0x0063be5b
Journal features:         journal_incompat_revoke
Journal size:             16M
Journal length:           4096
Journal sequence:         0x0000f765
Journal start:            0

# mount /boot
# dumpe2fs -h /dev/md0 | diff -u /tmp/1 -
dumpe2fs 1.43-WIP (22-Sep-2012)
--- /tmp/1      2012-10-07 14:51:39.337345910 +0000
+++ -   2012-10-07 14:51:52.454825889 +0000
@@ -3,7 +3,7 @@
 Filesystem UUID:          72aa9b1c-4180-444a-8e15-836ddad4f235
 Filesystem magic number:  0xEF53
 Filesystem revision #:    1 (dynamic)
-Filesystem features:      has_journal ext_attr resize_inode dir_index filetype extent flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
+Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
 Filesystem flags:         signed_directory_hash 
 Default mount options:    user_xattr acl
 Filesystem state:         clean
@@ -24,9 +24,9 @@
 Inode blocks per group:   384
 Flex block group size:    16
 Filesystem created:       Mon May 28 04:06:58 2012
-Last mount time:          Sun Oct  7 14:51:04 2012
-Last write time:          Sun Oct  7 14:51:33 2012
-Mount count:              6
+Last mount time:          Sun Oct  7 14:51:42 2012
+Last write time:          Sun Oct  7 14:51:42 2012
+Mount count:              7
 Maximum mount count:      -1
 Last checked:             Tue Oct  2 22:53:14 2012
 Check interval:           0 (<none>)
@@ -42,7 +42,7 @@
 Directory Hash Seed:      f5fe1926-d2da-4864-b41f-a93276ae313f
 Journal backup:           inode blocks
 Checksum type:            crc32c
-Checksum:                 0x0063be5b
+Checksum:                 0x90bee798
 Journal features:         journal_incompat_revoke
 Journal size:             16M
 Journal length:           4096
# ln /boot/sid.bmp /boot/foo  
# dumpe2fs -h /dev/md0 | diff -u /tmp/1 -
dumpe2fs 1.43-WIP (22-Sep-2012)
--- /tmp/1      2012-10-07 14:51:39.337345910 +0000
+++ -   2012-10-07 14:53:43.619910763 +0000
@@ -3,7 +3,7 @@
 Filesystem UUID:          72aa9b1c-4180-444a-8e15-836ddad4f235
 Filesystem magic number:  0xEF53
 Filesystem revision #:    1 (dynamic)
-Filesystem features:      has_journal ext_attr resize_inode dir_index filetype extent flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
+Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
 Filesystem flags:         signed_directory_hash 
 Default mount options:    user_xattr acl
 Filesystem state:         clean
@@ -24,9 +24,9 @@
 Inode blocks per group:   384
 Flex block group size:    16
 Filesystem created:       Mon May 28 04:06:58 2012
-Last mount time:          Sun Oct  7 14:51:04 2012
-Last write time:          Sun Oct  7 14:51:33 2012
-Mount count:              6
+Last mount time:          Sun Oct  7 14:51:42 2012
+Last write time:          Sun Oct  7 14:51:42 2012
+Mount count:              7
 Maximum mount count:      -1
 Last checked:             Tue Oct  2 22:53:14 2012
 Check interval:           0 (<none>)
@@ -42,10 +42,10 @@
 Directory Hash Seed:      f5fe1926-d2da-4864-b41f-a93276ae313f
 Journal backup:           inode blocks
 Checksum type:            crc32c
-Checksum:                 0x0063be5b
+Checksum:                 0x90bee798
 Journal features:         journal_incompat_revoke
 Journal size:             16M
 Journal length:           4096
-Journal sequence:         0x0000f765
-Journal start:            0
+Journal sequence:         0x0000f766
+Journal start:            1
 
# touch /boot/bar
(Lots more activity, and I can't make the checksum fail.  But...)

# umount /boot
# mount /boot
# touch /boot/baz

Arrgh!  I can't reproduce it!  Earlier, I did "mount /boot", "dumpe2fs -h"
(successfully), "touch /boot/foo", and dumpe2fs died with a checksum error.

So I thought "aha!  Is it the data write or the inode allocation?  I'll
make a hard link to avoid the inode allocation", but it appears it
wasn't (just) either.

But my root file system (/dev/md2) still has a messed up checksum...

# dumpe2fs -h /dev/md2
dumpe2fs 1.43-WIP (22-Sep-2012)
dumpe2fs: Superblock checksum does not match superblock while trying to open /dev/md2
Couldn't find valid filesystem superblock.



* Re: metadata_csum + unclean shutdown = failure to boot
  2012-10-07 15:09   ` George Spelvin
@ 2012-10-07 18:10     ` Theodore Ts'o
  2012-10-07 20:18       ` George Spelvin
  0 siblings, 1 reply; 22+ messages in thread
From: Theodore Ts'o @ 2012-10-07 18:10 UTC
  To: George Spelvin; +Cc: tm, linux-ext4

I just had a random thought.  Which bootloader are you using?  Is it
grub, or grub2 perchance?

I wonder if it's grub modifying the file system and touching the
superblock, and not knowing about the new metadata checksum feature....

	    	    	    	      	  - Ted


* Re: metadata_csum + unclean shutdown = failure to boot
  2012-10-07 18:10     ` Theodore Ts'o
@ 2012-10-07 20:18       ` George Spelvin
  2012-10-07 22:54         ` Theodore Ts'o
  0 siblings, 1 reply; 22+ messages in thread
From: George Spelvin @ 2012-10-07 20:18 UTC
  To: linux, tytso; +Cc: linux-ext4, tm

> I just had a random thought.  Which bootloader are you using?  Is it
> grub, or grub2 perchance?

Nope,
# lilo -V
LILO version 23.2 (released 09-Apr-2011)
(Debian GNU/Linux)

It's a 32-bit Debian/unstable (sid) userland, on a 64-bit kernel,
running on a 2nd gen i7-2xxx and a Gigabyte Z68A-D3H-B3 motherboard.
The drives are using the motherboard AHCI controller.

Thanks for the idea, though.


* Re: metadata_csum + unclean shutdown = failure to boot
  2012-10-07 20:18       ` George Spelvin
@ 2012-10-07 22:54         ` Theodore Ts'o
  2012-10-08  1:05           ` George Spelvin
  2012-10-08  1:25           ` George Spelvin
  0 siblings, 2 replies; 22+ messages in thread
From: Theodore Ts'o @ 2012-10-07 22:54 UTC
  To: George Spelvin; +Cc: linux-ext4, tm

If you can replicate this, could you try applying the following patch
to e2fsck, and install it and then capture the output from e2fsck when
it repairs the file system?

That might give us some clues as to what is going on.  I've been going
through the sources and I don't see any place where we mark the
superblock as dirty and write it out without first writing the
checksum.

There is a chance we could get screwed by a race in no journal mode
where two processes modify the superblock at the same time, but we don't
actually modify the superblock that much.  The primary case where the
superblock gets modified while the file system is mounted is when we
add and remove inodes from the orphan list, and that is serialized by a
mutex.  The other times when we modify the superblock are when we add a
feature in a few rare cases (the large file feature, or the xattr
compat feature, etc.) and of course during an online resize.  But
that's not likely to be happening in your case.  So I really don't
understand what might be happening on your system, which is why this
patch will hopefully shed some light as to what is going on.

      	   	     	       	     	- Ted

diff --git a/e2fsck/unix.c b/e2fsck/unix.c
index d2b1bbd..b1fe32c 100644
--- a/e2fsck/unix.c
+++ b/e2fsck/unix.c
@@ -1064,6 +1064,13 @@ static errcode_t try_open_fs(e2fsck_t ctx, int flags, io_manager io_ptr,
 		retval = ext2fs_open2(ctx->filesystem_name, ctx->io_options,
 				      flags, 0, 0, io_ptr, ret_fs);
 
+	if (*ret_fs && (*ret_fs)->super && retval == EXT2_ET_SB_CSUM_INVALID) {
+		list_super((*ret_fs)->super);
+		ext2fs_superblock_csum_set(*ret_fs, (*ret_fs)->super);
+		printf("Expected checksum was %04x\n",
+		       (*ret_fs)->super->s_checksum);
+	}
+
 	if (ret_fs)
 		e2fsck_set_bitmap_type(*ret_fs, EXT2FS_BMAP64_RBTREE,
 				       "default", NULL);


* Re: metadata_csum + unclean shutdown = failure to boot
  2012-10-07 22:54         ` Theodore Ts'o
@ 2012-10-08  1:05           ` George Spelvin
  2012-10-08  1:25           ` George Spelvin
  1 sibling, 0 replies; 22+ messages in thread
From: George Spelvin @ 2012-10-08  1:05 UTC
  To: linux, tytso; +Cc: linux-ext4, tm

> If you can replicate this, could you try applying the following patch
> to e2fsck, and install it and then capture the output from e2fsck when
> it repairs the file system?

Well, as I mentioned, the superblock of the currently running root
filesystem has a bad checksum right now, so if you don't mind me NOT
repairing the FS, it's particularly easy.  (That's why I included a
hex-dump of the superblock earlier.)

Let me try fsck -n on the running file system...

# ./e2fsck -n /dev/md2
e2fsck 1.43-WIP (22-Sep-2012)
Warning!  /dev/md2 is mounted.
Filesystem volume name:   root
Last mounted on:          /
Filesystem UUID:          a61d8e82-4c81-4f84-9011-cf248d295eeb
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
Filesystem flags:         signed_directory_hash 
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              4398512
Block count:              87431728
Reserved block count:     4371586
Free blocks:              69542952
Free inodes:              3780346
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      1003
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         1648
Inode blocks per group:   103
Flex block group size:    16
Filesystem created:       Mon May 28 04:14:42 2012
Last mount time:          Sun Oct  7 04:10:48 2012
Last write time:          Sun Oct  7 04:10:48 2012
Mount count:              2
Maximum mount count:      -1
Last checked:             Sun Oct  7 03:15:54 2012
Check interval:           0 (<none>)
Lifetime writes:          147 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
First orphan inode:       3376801
Default directory hash:   half_md4
Directory Hash Seed:      dc2dbaa1-7ada-4a32-96a5-dbe8c42859c2
Journal backup:           inode blocks
Checksum type:            crc32c
Checksum:                 0x6411a138
Expected checksum was 242b557a
ext2fs_open2: Superblock checksum does not match superblock
/tmp/e2fsck: Superblock invalid, trying backup blocks...
Superblock needs_recovery flag is clear, but journal has data.
Recovery flag not set in backup superblock, so running journal anyway.
Clear journal? no

root was not cleanly unmounted, check forced.
Pass 1: Checking inodes, blocks, and sizes
Inodes that were part of a corrupted orphan linked list found.  Fix? no

Inode 2214932 was part of the orphaned inode list.  IGNORED.
Deleted inode 2640258 has zero dtime.  Fix? no

Inode 3376801 was part of the orphaned inode list.  IGNORED.
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences:  -(5799936--5800165) -8017765 -8017789 -8027658 -8027660 -8958208 -(19016096--19016124) -38855165 -38873550 -52463109 -(58774956--58774992) -67160656 -67160667 -67160687 -67160703 -67160718 -67160729 -67160905 -69785176
Fix? no
[etc.]

Would hard-crashing the machine and running e2fsck on a static file system tell you more?

> There is a chance we could get screwed by a race in no journal mode
> where two processes modify the superblock at the same time, but we don't
> actually modify the superblock that much.  The primary case where the
> superblock gets modified while the file system is mounted is when we
> add and remove inodes from the orphan list, and that is serialized by a
> mutex.  The other times when we modify the superblock are when we add a
> feature in a few rare cases (the large file feature, or the xattr
> compat feature, etc.) and of course during an online resize.  But
> that's not likely to be happening in your case.  So I really don't
> understand what might be happening on your system, which is why this
> patch will hopefully shed some light as to what is going on.

Thinking about it, it *is* confusing.

Although with help from your clue about the orphan inode list, I just
managed the following.  It appears to be repeatable.  Is this of any help?

# mount /boot
# dumpe2fs -h /dev/md0
dumpe2fs 1.43-WIP (22-Sep-2012)
Filesystem volume name:   boot
Last mounted on:          /boot
Filesystem UUID:          72aa9b1c-4180-444a-8e15-836ddad4f235
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
Filesystem flags:         signed_directory_hash 
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              49152
Block count:              245600
Reserved block count:     12280
Free blocks:              72229
Free inodes:              26977
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      59
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         6144
Inode blocks per group:   384
Flex block group size:    16
Filesystem created:       Mon May 28 04:06:58 2012
Last mount time:          Mon Oct  8 00:57:42 2012
Last write time:          Mon Oct  8 00:57:42 2012
Mount count:              13
Maximum mount count:      -1
Last checked:             Tue Oct  2 22:53:14 2012
Check interval:           0 (<none>)
Lifetime writes:          34 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      f5fe1926-d2da-4864-b41f-a93276ae313f
Journal backup:           inode blocks
Checksum type:            crc32c
Checksum:                 0xec7bcce8
Journal features:         journal_incompat_revoke
Journal size:             16M
Journal length:           4096
Journal sequence:         0x0000f78c
Journal start:            0

# sleep 5 > /boot/foo & rm /boot/foo
[2] 6554
# dumpe2fs -h /dev/md0
dumpe2fs 1.43-WIP (22-Sep-2012)
dumpe2fs: Superblock checksum does not match superblock while trying to open /dev/md0
Couldn't find valid filesystem superblock.
# /tmp/e2fsck -n /dev/md0
e2fsck 1.43-WIP (22-Sep-2012)
Warning!  /dev/md0 is mounted.
Warning: skipping journal recovery because doing a read-only filesystem check.
boot: clean, 22175/49152 files, 173371/245600 blocks
[2]-  Done                    sleep 5 > /boot/foo
# dumpe2fs -h /dev/md0
dumpe2fs 1.43-WIP (22-Sep-2012)
Filesystem volume name:   boot
Last mounted on:          /boot
Filesystem UUID:          72aa9b1c-4180-444a-8e15-836ddad4f235
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
Filesystem flags:         signed_directory_hash 
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              49152
Block count:              245600
Reserved block count:     12280
Free blocks:              72229
Free inodes:              26977
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      59
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         6144
Inode blocks per group:   384
Flex block group size:    16
Filesystem created:       Mon May 28 04:06:58 2012
Last mount time:          Mon Oct  8 00:57:42 2012
Last write time:          Mon Oct  8 00:57:42 2012
Mount count:              13
Maximum mount count:      -1
Last checked:             Tue Oct  2 22:53:14 2012
Check interval:           0 (<none>)
Lifetime writes:          34 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      f5fe1926-d2da-4864-b41f-a93276ae313f
Journal backup:           inode blocks
Checksum type:            crc32c
Checksum:                 0xec7bcce8
Journal features:         journal_incompat_revoke
Journal size:             16M
Journal length:           4096
Journal sequence:         0x0000f78d
Journal start:            1
# sleep 5 > /boot/foo & rm /boot/foo ; dumpe2fs -h /dev/md0 ; dd if=/dev/md0 of=/tmp/md0 count=8 
[2] 6137
dumpe2fs 1.43-WIP (22-Sep-2012)
dumpe2fs: Superblock checksum does not match superblock while trying to open /dev/md0
Couldn't find valid filesystem superblock.
8+0 records in
8+0 records out
4096 bytes (4.1 kB) copied, 3.8679e-05 s, 106 MB/s
[666]# dumpe2fs -h /dev/md0
dumpe2fs 1.43-WIP (22-Sep-2012)
Filesystem volume name:   boot
Last mounted on:          /boot
Filesystem UUID:          72aa9b1c-4180-444a-8e15-836ddad4f235
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
Filesystem flags:         signed_directory_hash 
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              49152
Block count:              245600
Reserved block count:     12280
Free blocks:              72229
Free inodes:              26977
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      59
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         6144
Inode blocks per group:   384
Flex block group size:    16
Filesystem created:       Mon May 28 04:06:58 2012
Last mount time:          Mon Oct  8 00:57:42 2012
Last write time:          Mon Oct  8 00:57:42 2012
Mount count:              13
Maximum mount count:      -1
Last checked:             Tue Oct  2 22:53:14 2012
Check interval:           0 (<none>)
Lifetime writes:          34 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      f5fe1926-d2da-4864-b41f-a93276ae313f
Journal backup:           inode blocks
Checksum type:            crc32c
Checksum:                 0xec7bcce8
Journal features:         journal_incompat_revoke
Journal size:             16M
Journal length:           4096
Journal sequence:         0x0000f78d
Journal start:            1

[2]-  Done                    sleep 5 > /boot/foo
# xxd -g4 -a /tmp/md0
0000000: faeb2101 b4014c49 4c4f1702 87f77050  ..!...LILO....pP
0000010: 00000000 02fcc24f 00000000 c2008070  .......O.......p
0000020: e6517a2e b8c0078e d0bc0008 fb525306  .Qz..........RS.
0000030: 56fc8ed8 31ed60b8 0012b336 cd1061b0  V...1.`....6..a.
0000040: 0de86601 b00ae861 01b04ce8 5c01601e  ..f....a..L.\.`.
0000050: 0780fafe 750288f2 bb00028a 761e89d0  ....u.......v...
0000060: 80e48030 e0780a3c 107306f6 461c4075  ...0.x.<.s..F.@u
0000070: 2e88f266 8b761866 09f67423 52b408b2  ...f.v.f..t#R...
0000080: 8053cd13 5b72570f b6caba7f 00426631  .S..[rW......Bf1
0000090: c040e860 00663bb7 b8017403 e2ef5a53  .@.`.f;...t...ZS
00000a0: 8a761fbe 2000e8df 00b49966 817ffc4c  .v.. ......f...L
00000b0: 494c4f75 295e6880 080731db e8c90075  ILOu)^h...1....u
00000c0: fbbe0600 89f7b90a 00b49af3 a6750fb0  .............u..
00000d0: 02ae750a 0655b049 e8cf00cb b440b020  ..u..U.I.....@. 
00000e0: e8c700e8 b400fe4e 007407bc e80761e9  .......N.t....a.
00000f0: 5cfff4eb fd605555 66500653 6a016a10  \....`UUfP.Sj.j.
0000100: 89e653f6 c6607470 f6c62074 14bbaa55  ..S..`tp.. t...U
0000110: b441cd13 720b81fb 55aa7505 f6c10175  .A..r...U.u....u
0000120: 415206b4 08cd1307 72b451c0 e90686e9  AR......r.Q.....
0000130: 89cf59c1 ea089240 4983e13f 41f7e193  ..Y....@I..?A...
0000140: 8b44088b 540a39da 7392f7f3 39f8778c  .D..T.9.s...9.w.
0000150: c0e40686 e092f6f1 08e289d1 415a88c6  ............AZ..
0000160: eb1cb442 5bbd0500 60cd1373 164d74b8  ...B[...`..s.Mt.
0000170: 31c0cd13 614debf0 66505958 88e6b801  1...aM..fPYX....
0000180: 02ebe18d 641061c3 66ad6609 c0740a66  ....d.a.f.f..t.f
0000190: 034610e8 5fff80c7 02c3c1c0 04e80300  .F.._...........
00001a0: c1c00424 0f2704f0 144060bb 0700b40e  ...$.'...@`.....
00001b0: cd1061c3 00000000 00000000 00000000  ..a.............
00001c0: 00000000 00000000 00000000 00000000  ................
*
00001f0: 00000000 00000000 00000000 000055aa  ..............U.
0000200: 00000000 00000000 00000000 00000000  ................
*
00003f0: 00000000 00000000 00000000 03b7302c  ..............0,
0000400: 00c00000 60bf0300 f82f0000 251a0100  ....`..../..%...
0000410: 61690000 00000000 02000000 02000000  ai..............
0000420: 00800000 00800000 00180000 06257250  .............%rP
0000430: 06257250 0d00ffff 53ef0100 01000000  .%rP....S.......
0000440: 5a706b50 00000000 00000000 01000000  ZpkP............
0000450: 00000000 0b000000 00010000 3c000000  ............<...
0000460: 46020000 6b040000 72aa9b1c 4180444a  F...k...r...A.DJ
0000470: 8e15836d dad4f235 626f6f74 00000000  ...m...5boot....
0000480: 00000000 00000000 2f626f6f 74000000  ......../boot...
0000490: 00000000 00000000 00000000 00000000  ................
*
00004c0: 00000000 00000000 00000000 00003b00  ..............;.
00004d0: 00000000 00000000 00000000 00000000  ................
00004e0: 08000000 00000000 ad000000 f5fe1926  ...............&
00004f0: d2da4864 b41fa932 76ae313f 01010000  ..Hd...2v.1?....
0000500: 0c000000 00000000 e2f9c24f 0af30100  ...........O....
0000510: 04000000 00000000 00000000 00100000  ................
0000520: 00000100 00000000 00000000 00000000  ................
0000530: 00000000 00000000 00000000 00000000  ................
0000540: 00000000 00000000 00000000 00000001  ................
0000550: 00000000 00000000 00000000 1c001c00  ................
0000560: 01000000 00000000 00000000 00000000  ................
0000570: 00000000 04010000 bd501802 00000000  .........P......
0000580: 00000000 00000000 00000000 00000000  ................
*
00007f0: 00000000 00000000 00000000 e8cc7bec  ..............{.
0000800: 00000000 00000000 00000000 00000000  ................
*
0000ff0: 00000000 00000000 00000000 00000000  ................


* Re: metadata_csum + unclean shutdown = failure to boot
  2012-10-07 22:54         ` Theodore Ts'o
  2012-10-08  1:05           ` George Spelvin
@ 2012-10-08  1:25           ` George Spelvin
  2012-10-08  2:41             ` Theodore Ts'o
  1 sibling, 1 reply; 22+ messages in thread
From: George Spelvin @ 2012-10-08  1:25 UTC (permalink / raw)
  To: linux, tytso; +Cc: linux-ext4, tm

More reproduction (and hopefully useful ideas at the end)
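
As background for the command that follows: "sleep 10 > /boot/foo & rm
/boot/foo" keeps an unlinked file open for ten seconds, and that appears to
be what puts an inode on the orphan list (the "First orphan inode: 173"
line in the output). A minimal Python sketch of the same trick, assuming a
throwaway scratch directory instead of /boot; the helper name
hold_unlinked_file is made up for illustration:

```python
import os
import tempfile


def hold_unlinked_file(directory):
    """Create a file, unlink it while it is still open, return (fd, path).

    While the descriptor stays open, the inode has no directory entry but
    cannot be freed yet. On ext4 that is the window in which the inode is
    expected to sit on the orphan list.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    os.unlink(path)
    return fd, path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        fd, path = hold_unlinked_file(scratch)
        try:
            # The name is gone, but the open descriptor is still usable.
            print("name still exists:", os.path.exists(path))
            print("bytes written:", os.write(fd, b"still writable"))
        finally:
            os.close(fd)  # closing the last reference lets the inode go
```

Pointing the helper at a directory on the affected file system and dumping
the superblock before closing the descriptor should mirror the shell
one-liner.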

# sleep 10 > /boot/foo & rm /boot/foo ; dumpe2fs -h /dev/md0 ; dd if=/dev/md0 of=/tmp/md0a count=4 ; /tmp/e2fsck -n /dev/md0
[2] 21690
dumpe2fs 1.43-WIP (22-Sep-2012)
dumpe2fs: Superblock checksum does not match superblock while trying to open /dev/md0
Couldn't find valid filesystem superblock.
4+0 records in
4+0 records out
2048 bytes (2.0 kB) copied, 3.0265e-05 s, 67.7 MB/s
e2fsck 1.43-WIP (22-Sep-2012)
Warning!  /dev/md0 is mounted.
Filesystem volume name:   boot
Last mounted on:          /boot
Filesystem UUID:          72aa9b1c-4180-444a-8e15-836ddad4f235
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
Filesystem flags:         signed_directory_hash 
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              49152
Block count:              245600
Reserved block count:     12280
Free blocks:              72229
Free inodes:              26977
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      59
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         6144
Inode blocks per group:   384
Flex block group size:    16
Filesystem created:       Mon May 28 04:06:58 2012
Last mount time:          Mon Oct  8 00:57:42 2012
Last write time:          Mon Oct  8 00:57:42 2012
Mount count:              13
Maximum mount count:      -1
Last checked:             Tue Oct  2 22:53:14 2012
Check interval:           0 (<none>)
Lifetime writes:          34 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
First orphan inode:       173
Default directory hash:   half_md4
Directory Hash Seed:      f5fe1926-d2da-4864-b41f-a93276ae313f
Journal backup:           inode blocks
Checksum type:            crc32c
Checksum:                 0xec7bcce8
Expected checksum was dfd1473e
ext2fs_open2: Superblock checksum does not match superblock
/tmp/e2fsck: Superblock invalid, trying backup blocks...
Superblock needs_recovery flag is clear, but journal has data.
Recovery flag not set in backup superblock, so running journal anyway.
Clear journal? no

boot was not cleanly unmounted, check forced.
Pass 1: Checking inodes, blocks, and sizes
Deleted inode 173 has zero dtime.  Fix? no

Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong for group #0 (16896, counted=5353).
Fix? no

Free blocks count wrong for group #1 (7259, counted=2217).
Fix? no

Free blocks count wrong for group #2 (12585, counted=3984).
Fix? no

Free blocks count wrong for group #3 (19829, counted=17775).
Fix? no

Free blocks count wrong for group #4 (17162, counted=15224).
Fix? no

Free blocks count wrong for group #5 (18729, counted=10523).
Fix? no

Free blocks count wrong for group #6 (13443, counted=11644).
Fix? no

Free blocks count wrong for group #7 (8424, counted=5509).
Fix? no

Free blocks count wrong (114415, counted=72229).
Fix? no

Inode bitmap differences:  -173
Fix? no

Free inodes count wrong for group #0 (651, counted=439).
Fix? no

Free inodes count wrong for group #1 (128, counted=286).
Fix? no

Free inodes count wrong for group #2 (1137, counted=1158).
Fix? no

Free inodes count wrong for group #3 (792, counted=823).
Fix? no

Free inodes count wrong (26978, counted=26976).
Fix? no

Inode bitmap differences: Group 0 inode bitmap does not match checksum
IGNORED.
Group 1 inode bitmap does not match checksum
IGNORED.
Group 2 inode bitmap does not match checksum
IGNORED.
Group 3 inode bitmap does not match checksum
IGNORED.
Group 5 inode bitmap does not match checksum
IGNORED.
Group 6 inode bitmap does not match checksum
IGNORED.
Group 7 inode bitmap does not match checksum
IGNORED.
Block bitmap differences: Group 0 block bitmap does not match checksum
IGNORED.
Group 1 block bitmap does not match checksum
IGNORED.
Group 2 block bitmap does not match checksum
IGNORED.
Group 3 block bitmap does not match checksum
IGNORED.
Group 4 block bitmap does not match checksum
IGNORED.
Group 5 block bitmap does not match checksum
IGNORED.
Group 6 block bitmap does not match checksum
IGNORED.
Group 7 block bitmap does not match checksum
IGNORED.

boot: ********** WARNING: Filesystem still has errors **********

boot: 22174/49152 files (3.6% non-contiguous), 131185/245600 blocks
# dumpe2fs -h /dev/md0
dumpe2fs 1.43-WIP (22-Sep-2012)
Filesystem volume name:   boot
Last mounted on:          /boot
Filesystem UUID:          72aa9b1c-4180-444a-8e15-836ddad4f235
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
Filesystem flags:         signed_directory_hash 
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              49152
Block count:              245600
Reserved block count:     12280
Free blocks:              72229
Free inodes:              26977
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      59
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         6144
Inode blocks per group:   384
Flex block group size:    16
Filesystem created:       Mon May 28 04:06:58 2012
Last mount time:          Mon Oct  8 00:57:42 2012
Last write time:          Mon Oct  8 00:57:42 2012
Mount count:              13
Maximum mount count:      -1
Last checked:             Tue Oct  2 22:53:14 2012
Check interval:           0 (<none>)
Lifetime writes:          34 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      f5fe1926-d2da-4864-b41f-a93276ae313f
Journal backup:           inode blocks
Checksum type:            crc32c
Checksum:                 0xec7bcce8
Journal features:         journal_incompat_revoke
Journal size:             16M
Journal length:           4096
Journal sequence:         0x0000f78d
Journal start:            1

[2]-  Done                    sleep 10 > /boot/foo
# xxd -g4 -a /dev/md0a
[... first 512b snipped ...]
0000200: 00000000 00000000 00000000 00000000  ................
*
00003f0: 00000000 00000000 00000000 03b7302c  ..............0,
0000400: 00c00000 60bf0300 f82f0000 251a0100  ....`..../..%...
0000410: 61690000 00000000 02000000 02000000  ai..............
0000420: 00800000 00800000 00180000 06257250  .............%rP
0000430: 06257250 0d00ffff 53ef0100 01000000  .%rP....S.......
0000440: 5a706b50 00000000 00000000 01000000  ZpkP............
0000450: 00000000 0b000000 00010000 3c000000  ............<...
0000460: 46020000 6b040000 72aa9b1c 4180444a  F...k...r...A.DJ
0000470: 8e15836d dad4f235 626f6f74 00000000  ...m...5boot....
0000480: 00000000 00000000 2f626f6f 74000000  ......../boot...
0000490: 00000000 00000000 00000000 00000000  ................
*
00004c0: 00000000 00000000 00000000 00003b00  ..............;.
00004d0: 00000000 00000000 00000000 00000000  ................
00004e0: 08000000 00000000 ad000000 f5fe1926  ...............&
00004f0: d2da4864 b41fa932 76ae313f 01010000  ..Hd...2v.1?....
0000500: 0c000000 00000000 e2f9c24f 0af30100  ...........O....
0000510: 04000000 00000000 00000000 00100000  ................
0000520: 00000100 00000000 00000000 00000000  ................
0000530: 00000000 00000000 00000000 00000000  ................
0000540: 00000000 00000000 00000000 00000001  ................
0000550: 00000000 00000000 00000000 1c001c00  ................
0000560: 01000000 00000000 00000000 00000000  ................
0000570: 00000000 04010000 bd501802 00000000  .........P......
0000580: 00000000 00000000 00000000 00000000  ................
*
00007f0: 00000000 00000000 00000000 e8cc7bec  ..............{.
#


That's a dumpe2fs, a dd, and a (patched) e2fsck on the corruption.
For reference, here's the superblock after the sleep expired (and
dumpe2fs stopped complaining)
# xxd -g4 -a -l2048 /dev/md0
0000200: 00000000 00000000 00000000 00000000  ................
*
00003f0: 00000000 00000000 00000000 03b7302c  ..............0,
0000400: 00c00000 60bf0300 f82f0000 251a0100  ....`..../..%...
0000410: 61690000 00000000 02000000 02000000  ai..............
0000420: 00800000 00800000 00180000 06257250  .............%rP
0000430: 06257250 0d00ffff 53ef0100 01000000  .%rP....S.......
0000440: 5a706b50 00000000 00000000 01000000  ZpkP............
0000450: 00000000 0b000000 00010000 3c000000  ............<...
0000460: 46020000 6b040000 72aa9b1c 4180444a  F...k...r...A.DJ
0000470: 8e15836d dad4f235 626f6f74 00000000  ...m...5boot....
0000480: 00000000 00000000 2f626f6f 74000000  ......../boot...
0000490: 00000000 00000000 00000000 00000000  ................
*
00004c0: 00000000 00000000 00000000 00003b00  ..............;.
00004d0: 00000000 00000000 00000000 00000000  ................
00004e0: 08000000 00000000 00000000 f5fe1926  ...............&
00004f0: d2da4864 b41fa932 76ae313f 01010000  ..Hd...2v.1?....
0000500: 0c000000 00000000 e2f9c24f 0af30100  ...........O....
0000510: 04000000 00000000 00000000 00100000  ................
0000520: 00000100 00000000 00000000 00000000  ................
0000530: 00000000 00000000 00000000 00000000  ................
0000540: 00000000 00000000 00000000 00000001  ................
0000550: 00000000 00000000 00000000 1c001c00  ................
0000560: 01000000 00000000 00000000 00000000  ................
0000570: 00000000 04010000 bd501802 00000000  .........P......
0000580: 00000000 00000000 00000000 00000000  ................
*
00007f0: 00000000 00000000 00000000 e8cc7bec  ..............{.

Notice that the only difference is that the byte at 0x04e8 (offset 0xe8
in the superblock) is cleared, and the checksum is NOT changed, in the
"working" superblock.  Perhaps you're looking for the bug backward:
the checksum *is* getting updated, but the data checksummed is *not*,
leading to the mismatch.

They're also in different 512-byte halves of the superblock, so perhaps
both sectors are not being written out?
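
The two fields being compared can be decoded straight from the dump. The
sketch below is illustrative only: it assumes the ext4 on-disk offsets
0x38 (magic), 0xe8 (last orphan) and 0x3fc (checksum) relative to the
start of the superblock, little-endian, and it rebuilds a mostly empty
sample superblock from just the three values visible in the "corrupted"
dump above. The function names are made up.

```python
import struct

SECTOR_SIZE = 512
S_MAGIC = 0x38         # 16-bit magic, expected to be 0xEF53
S_LAST_ORPHAN = 0xE8   # 32-bit head of the orphan inode list
S_CHECKSUM = 0x3FC     # 32-bit superblock checksum (last field)


def describe_superblock(sb):
    """Decode the fields discussed above from a 1024-byte superblock."""
    (magic,) = struct.unpack_from("<H", sb, S_MAGIC)
    if magic != 0xEF53:
        raise ValueError("not an ext2/3/4 superblock")
    (orphan,) = struct.unpack_from("<I", sb, S_LAST_ORPHAN)
    (checksum,) = struct.unpack_from("<I", sb, S_CHECKSUM)
    return {
        "last_orphan": orphan,
        "checksum": checksum,
        # Which 512-byte sector of the superblock each field lives in.
        "orphan_sector": S_LAST_ORPHAN // SECTOR_SIZE,
        "checksum_sector": S_CHECKSUM // SECTOR_SIZE,
    }


def sample_superblock():
    """Zero-filled superblock carrying only the bytes quoted from the dump."""
    sb = bytearray(1024)
    sb[S_MAGIC:S_MAGIC + 2] = bytes.fromhex("53ef")
    sb[S_LAST_ORPHAN:S_LAST_ORPHAN + 4] = bytes.fromhex("ad000000")
    sb[S_CHECKSUM:S_CHECKSUM + 4] = bytes.fromhex("e8cc7bec")
    return bytes(sb)


if __name__ == "__main__":
    for name, value in describe_superblock(sample_superblock()).items():
        print(name, value)
```

The decoded orphan value matches the "First orphan inode: 173" line and the
decoded checksum matches "Checksum: 0xec7bcce8" in the dumpe2fs output; the
sector indexes show that the two fields fall in different 512-byte sectors.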


* Re: metadata_csum + unclean shutdown = failure to boot
  2012-10-08  1:25           ` George Spelvin
@ 2012-10-08  2:41             ` Theodore Ts'o
  2012-10-08  3:17               ` George Spelvin
  2012-11-01  1:05               ` ext4: fix metadata checksum calculation for the superblock George Spelvin
  0 siblings, 2 replies; 22+ messages in thread
From: Theodore Ts'o @ 2012-10-08  2:41 UTC (permalink / raw)
  To: George Spelvin; +Cc: linux-ext4, tm

I found the problem.  It turns out ext4_handle_dirty_super() was
completely FUBAR'ed and was calculating the checksum on the wrong data
(for all but 1k block file systems, sigh).

We just didn't notice because the checksum would be correctly set when
the file system was unmounted cleanly.  (Sigh).

The following patch should fix things.  Thanks for testing out the
metadata checksum on the root file system, and reporting this
problem!!!

						- Ted

From bdd7ed290bf12c2e9132fbe97208a1af79c7a29d Mon Sep 17 00:00:00 2001
From: Theodore Ts'o <tytso@mit.edu>
Date: Sun, 7 Oct 2012 22:18:56 -0400
Subject: [PATCH] ext4: fix metadata checksum calculation for the superblock

The function ext4_handle_dirty_super() was calculating the superblock
checksum on the wrong block data.  As a result, when the superblock is
modified while it is mounted (most commonly, when inodes are added or
removed from the orphan list), the superblock checksum would be wrong.
We didn't notice because the superblock checksum *was* being correctly
calculated in ext4_commit_super(), and this would get called when the
file system was unmounted.  So the problem only became obvious if the
system crashed while the file system was mounted.

Fix this by removing the poorly designed function signature for
ext4_superblock_csum_set(); if it only took a single argument, the
pointer to a struct super_block, the ambiguity which caused this
mistake would have been impossible.

Reported-by: George Spelvin <linux@horizon.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
---
 fs/ext4/ext4.h      | 3 +--
 fs/ext4/ext4_jbd2.c | 8 ++------
 fs/ext4/super.c     | 7 ++++---
 3 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 3ab2539..78971cf 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2063,8 +2063,7 @@ extern int ext4_resize_fs(struct super_block *sb, ext4_fsblk_t n_blocks_count);
 extern int ext4_calculate_overhead(struct super_block *sb);
 extern int ext4_superblock_csum_verify(struct super_block *sb,
 				       struct ext4_super_block *es);
-extern void ext4_superblock_csum_set(struct super_block *sb,
-				     struct ext4_super_block *es);
+extern void ext4_superblock_csum_set(struct super_block *sb);
 extern void *ext4_kvmalloc(size_t size, gfp_t flags);
 extern void *ext4_kvzalloc(size_t size, gfp_t flags);
 extern void ext4_kvfree(void *ptr);
diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
index bfa65b4..b4323ba 100644
--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -143,17 +143,13 @@ int __ext4_handle_dirty_super(const char *where, unsigned int line,
 	struct buffer_head *bh = EXT4_SB(sb)->s_sbh;
 	int err = 0;
 
+	ext4_superblock_csum_set(sb);
 	if (ext4_handle_valid(handle)) {
-		ext4_superblock_csum_set(sb,
-				(struct ext4_super_block *)bh->b_data);
 		err = jbd2_journal_dirty_metadata(handle, bh);
 		if (err)
 			ext4_journal_abort_handle(where, line, __func__,
 						  bh, handle, err);
-	} else {
-		ext4_superblock_csum_set(sb,
-				(struct ext4_super_block *)bh->b_data);
+	} else
 		mark_buffer_dirty(bh);
-	}
 	return err;
 }
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 982f6fc..5ededf1 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -143,9 +143,10 @@ int ext4_superblock_csum_verify(struct super_block *sb,
 	return es->s_checksum == ext4_superblock_csum(sb, es);
 }
 
-void ext4_superblock_csum_set(struct super_block *sb,
-			      struct ext4_super_block *es)
+void ext4_superblock_csum_set(struct super_block *sb)
 {
+	struct ext4_super_block *es = EXT4_SB(sb)->s_es;
+
 	if (!EXT4_HAS_RO_COMPAT_FEATURE(sb,
 		EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
 		return;
@@ -4387,7 +4388,7 @@ static int ext4_commit_super(struct super_block *sb, int sync)
 		cpu_to_le32(percpu_counter_sum_positive(
 				&EXT4_SB(sb)->s_freeinodes_counter));
 	BUFFER_TRACE(sbh, "marking dirty");
-	ext4_superblock_csum_set(sb, es);
+	ext4_superblock_csum_set(sb);
 	mark_buffer_dirty(sbh);
 	if (sync) {
 		error = sync_dirty_buffer(sbh);
-- 
1.7.12.rc0.22.gcdd159b
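
The mistake fixed by the patch above can be sketched in user space. This is
a rough model under stated assumptions, not the kernel code: it assumes the
primary superblock sits 1024 bytes into the device, so that with 4k blocks
it lives at offset 1024 inside block 0 while with 1k blocks it starts its
own block; it uses zlib's CRC-32 as a stand-in for the kernel's crc32c; and
all function names are hypothetical. The "buggy" path checksums from the
start of the block buffer (like casting bh->b_data), the "fixed" path from
the real superblock location (like EXT4_SB(sb)->s_es).

```python
import struct
import zlib

SB_SIZE = 1024
CSUM_FIELD = 0x3FC      # checksum is the last 4 bytes of the superblock
S_LAST_ORPHAN = 0xE8


def csum(buf, start):
    """Stand-in checksum (zlib CRC-32, NOT crc32c) over one superblock."""
    return zlib.crc32(bytes(buf[start:start + CSUM_FIELD]))


def csum_set(buf, start):
    """Store the checksum of the 'superblock' that begins at `start`."""
    struct.pack_into("<I", buf, start + CSUM_FIELD, csum(buf, start))


def csum_verify(buf, sb_offset):
    (stored,) = struct.unpack_from("<I", buf, sb_offset + CSUM_FIELD)
    return stored == csum(buf, sb_offset)


def simulate(sb_offset, fixed):
    """Modify the orphan list head, re-checksum, report whether it verifies.

    sb_offset is where the superblock sits inside its block buffer:
    1024 for the assumed 4k-block layout, 0 for 1k blocks.
    """
    block = bytearray(sb_offset + SB_SIZE)
    csum_set(block, sb_offset)            # state after a clean commit
    struct.pack_into("<I", block, sb_offset + S_LAST_ORPHAN, 173)
    # Buggy path: checksum whatever is at the start of the buffer.
    csum_set(block, sb_offset if fixed else 0)
    return csum_verify(block, sb_offset)


if __name__ == "__main__":
    print("4k blocks, buggy:", simulate(1024, fixed=False))
    print("4k blocks, fixed:", simulate(1024, fixed=True))
    print("1k blocks, buggy:", simulate(0, fixed=False))
```

In this model the buggy path also stores a checksum at offset 0x3fc of the
block rather than of the superblock, which may be what the nonzero word at
0x3fc in the dumps earlier in the thread is, though that is only a guess.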



* Re: metadata_csum + unclean shutdown = failure to boot
  2012-10-08  2:41             ` Theodore Ts'o
@ 2012-10-08  3:17               ` George Spelvin
  2012-10-08  4:03                 ` Tao Ma
  2012-11-01  1:05               ` ext4: fix metadata checksum calculation for the superblock George Spelvin
  1 sibling, 1 reply; 22+ messages in thread
From: George Spelvin @ 2012-10-08  3:17 UTC (permalink / raw)
  To: linux, tytso; +Cc: linux-ext4, tm

I'm testing that patch, but you may want to fix it a bit more before submitting to
stable@...
fs/ext4/resize.c: In function 'update_backups':
fs/ext4/resize.c:973:39: error: too many arguments to function 'ext4_superblock_csum_set'
In file included from fs/ext4/ext4_jbd2.h:20:0,
                 from fs/ext4/resize.c:17:
fs/ext4/ext4.h:2049:13: note: declared here

The fix is of course obvious and I'm compiling it now.

diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index 41f6ef6..e781259 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -970,7 +970,7 @@ static void update_backups(struct super_block *sb,
 		goto exit_err;
 	}
 
-	ext4_superblock_csum_set(sb, (struct ext4_super_block *)data);
+	ext4_superblock_csum_set(sb);
 
 	while ((group = ext4_list_backups(sb, &three, &five, &seven)) < last) {
 		struct buffer_head *bh;


* Re: metadata_csum + unclean shutdown = failure to boot
  2012-10-08  3:17               ` George Spelvin
@ 2012-10-08  4:03                 ` Tao Ma
  2012-10-08 11:35                   ` George Spelvin
  0 siblings, 1 reply; 22+ messages in thread
From: Tao Ma @ 2012-10-08  4:03 UTC (permalink / raw)
  To: George Spelvin; +Cc: tytso, linux-ext4

On 10/08/2012 11:17 AM, George Spelvin wrote:
> I'm testing that patch, but you may want to fix it a bit more before submitting to
> stable@...
> fs/ext4/resize.c: In function 'update_backups':
> fs/ext4/resize.c:973:39: error: too many arguments to function 'ext4_superblock_csum_set'
> In file included from fs/ext4/ext4_jbd2.h:20:0,
>                  from fs/ext4/resize.c:17:
> fs/ext4/ext4.h:2049:13: note: declared here
> 
> The fix is of course obvious and I'm compiling it now.
> 
> diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
> index 41f6ef6..e781259 100644
> --- a/fs/ext4/resize.c
> +++ b/fs/ext4/resize.c
> @@ -970,7 +970,7 @@ static void update_backups(struct super_block *sb,
>  		goto exit_err;
>  	}
>  
> -	ext4_superblock_csum_set(sb, (struct ext4_super_block *)data);
> +	ext4_superblock_csum_set(sb);
This line is already removed in my commit bef53b01 and will be in
stable. So this patch should work as expected.

Thanks
Tao
>  
>  	while ((group = ext4_list_backups(sb, &three, &five, &seven)) < last) {
>  		struct buffer_head *bh;



* Re: metadata_csum + unclean shutdown = failure to boot
  2012-10-08  4:03                 ` Tao Ma
@ 2012-10-08 11:35                   ` George Spelvin
  0 siblings, 0 replies; 22+ messages in thread
From: George Spelvin @ 2012-10-08 11:35 UTC (permalink / raw)
  To: linux, tm; +Cc: linux-ext4, tytso

Tao Ma <tm@tao.ma> wrote:
> This line is already removed in my commit bef53b01 and will be in
> stable. So this patch should work as expected.

Ah, okay.  Well, with that, it appears to be working; I can't
reproduce the problem any more.

Thanks to everyone for a significant bugfix starting with a vague
report over a weekend!


* Re: ext4: fix metadata checksum calculation for the superblock
  2012-10-08  2:41             ` Theodore Ts'o
  2012-10-08  3:17               ` George Spelvin
@ 2012-11-01  1:05               ` George Spelvin
  2012-11-01  1:13                 ` Darrick J. Wong
  1 sibling, 1 reply; 22+ messages in thread
From: George Spelvin @ 2012-11-01  1:05 UTC (permalink / raw)
  To: darrick.wong, linux, tytso; +Cc: linux-ext4, tm

I'm currently running with two ext4 patches:

Author: Theodore Ts'o <tytso@mit.edu>
Date: Sun Oct 7 22:18:56 2012 -0400
Subject: ext4: fix metadata checksum calculation for the superblock

Author: Darrick J. Wong <darrick.wong@oracle.com>
Date: Wed Oct 17 12:51:30 2012 -0700
Subject: ext4: Don't verify checksums of dx non-leaf nodes during fallback linear scan

They appear to fix real problems.  I notice, that neither of these have
made it into 2.6.5.  Should they be sent to -stable at some point?

I'm not trying to overrule your judgements on the matter, just ensure that
the omission is actually a conscious decision rather than an oversight.


* Re: ext4: fix metadata checksum calculation for the superblock
  2012-11-01  1:05               ` ext4: fix metadata checksum calculation for the superblock George Spelvin
@ 2012-11-01  1:13                 ` Darrick J. Wong
  2012-11-01  1:50                   ` Theodore Ts'o
  0 siblings, 1 reply; 22+ messages in thread
From: Darrick J. Wong @ 2012-11-01  1:13 UTC (permalink / raw)
  To: George Spelvin; +Cc: tytso, linux-ext4, tm

On Wed, Oct 31, 2012 at 09:05:21PM -0400, George Spelvin wrote:
> I'm currently running with two ext4 patches:
> 
> Author: Theodore Ts'o <tytso@mit.edu>
> Date: Sun Oct 7 22:18:56 2012 -0400
> Subject: ext4: fix metadata checksum calculation for the superblock
> 
> Author: Darrick J. Wong <darrick.wong@oracle.com>
> Date: Wed Oct 17 12:51:30 2012 -0700
> Subject: ext4: Don't verify checksums of dx non-leaf nodes during fallback linear scan
> 
> They appear to fix real problems.  I notice, that neither of these have
> made it into 2.6.5.  Should they be sent to -stable at some point?
> 
> I'm not trying to overrule your judgements on the matter, just ensure that
> the omission is actually a conscious decision rather than an oversight.

<shrug> I was wondering too, but I figured Ted was probably busy dealing with
the corruption bug and such.

(Which itself doesn't seem to be in 3.6.x yet)

I suppose it's a good sign that it's been more than a week and you haven't hit
anything else...

--D



* Re: ext4: fix metadata checksum calculation for the superblock
  2012-11-01  1:13                 ` Darrick J. Wong
@ 2012-11-01  1:50                   ` Theodore Ts'o
  2012-11-01  3:22                     ` Darrick J. Wong
  2012-11-01  6:12                     ` George Spelvin
  0 siblings, 2 replies; 22+ messages in thread
From: Theodore Ts'o @ 2012-11-01  1:50 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: George Spelvin, linux-ext4, tm

On Wed, Oct 31, 2012 at 06:13:12PM -0700, Darrick J. Wong wrote:
> > Author: Theodore Ts'o <tytso@mit.edu>
> > Date: Sun Oct 7 22:18:56 2012 -0400
> > Subject: ext4: fix metadata checksum calculation for the superblock

This one was cc'ed to stable@vger.kernel.org.  But when you said "I
notice, that neither of these have made it into 2.6.5", I assume you
meant 3.5?  The last 3.5 kernel is 3.5.7, and Greg K-H isn't
backporting fixes to 3.5.x any more.  (See http://www.kernel.org to
see which kernels are marked "EOL"; those are the ones which are no
longer getting updates.)

So that means it should eventually make it to the 3.4.x and 3.6.x
kernels.

> > Author: Darrick J. Wong <darrick.wong@oracle.com>
> > Date: Wed Oct 17 12:51:30 2012 -0700
> > Subject: ext4: Don't verify checksums of dx non-leaf nodes during fallback linear scan

I missed this one because the subject line didn't have [PATCH] in it.
(Darrick, it really helps if you use git format-patch / git
send-email; you can use a message-id of the message you're replying to
in the mail thread to chain the message to the thread.)

I would have eventually found it in patchwork, but even in patchwork
the listing would have had a potentially misleading subject line,
since it grabs the patch title from the subject line of the e-mail.

> <shrug> I was wondering too, but I figured Ted was probably busy dealing with
> the corruption bug and such.
> 
> (Which itself doesn't seem to be in 3.6.x yet)

It isn't in 3.7-rc3 because I didn't see it before I sent the pull
request to Linus....

At this point I'll just include it in the patches to be sent to Linus
at the next merge window, mainly because I don't have the time to run
a separate regression test run just for this patch, and it's only a
cosmetic issue, right?

						- Ted


* Re: ext4: fix metadata checksum calculation for the superblock
  2012-11-01  1:50                   ` Theodore Ts'o
@ 2012-11-01  3:22                     ` Darrick J. Wong
  2012-11-01  6:12                     ` George Spelvin
  1 sibling, 0 replies; 22+ messages in thread
From: Darrick J. Wong @ 2012-11-01  3:22 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: George Spelvin, linux-ext4, tm

On Wed, Oct 31, 2012 at 09:50:15PM -0400, Theodore Ts'o wrote:
> On Wed, Oct 31, 2012 at 06:13:12PM -0700, Darrick J. Wong wrote:
> > > Author: Theodore Ts'o <tytso@mit.edu>
> > > Date: Sun Oct 7 22:18:56 2012 -0400
> > > Subject: ext4: fix metadata checksum calculation for the superblock
> 
> This one was cc'ed to stable@vger.kernel.org.  But when you said "I
> notice, that neither of these have made it into 2.6.5", I assume you
> meant 3.5?  The last 3.5 kernel is 3.5.7, and Greg K-H isn't
> backporting fixes to 3.5.x any more.  (See http://www.kernel.org to
> see which kernels are marked "EOL"; those are the ones which are no
> longer getting updates.)
> 
> So that means it should eventually make it to the 3.4.x and 3.6.x
> kernels.

I thought he meant 3.6.5, but I haven't really been paying 3.6.x much attention.

> > > Author: Darrick J. Wong <darrick.wong@oracle.com>
> > > Date: Wed Oct 17 12:51:30 2012 -0700
> > > Subject: ext4: Don't verify checksums of dx non-leaf nodes during fallback linear scan
> 
> I missed this one because the subject line didn't have [PATCH] in it.
> (Darrick, it really helps if you use git format-patch / git
> send-email; you can use a message-id of the message you're replying to
> in the mail thread to chain the message to the thread.)
> 
> I would have eventually found it in patchwork, but even in patchwork
> the listing would have had a potentially misleading subject line,
> since it grabs the patch title from the subject line of the e-mail.

Oops, I guess I did forget the magic "[PATCH]".  Sorry about that.

> > <shrug> I was wondering too, but I figured Ted was probably busy dealing with
> > the corruption bug and such.
> > 
> > (Which itself doesn't seem to be in 3.6.x yet)
> 
> It isn't in 3.7-rc3 because I didn't see it before I sent the pull
> request to Linus....
> 
> At this point I'll just include it in the patches to be sent to Linus
> at the next merge window, mainly because I don't have the time to run
> a separate regression test run just for this patch, and it's only a
> cosmetic issue, right?

Yep.

--D
> 
> 						- Ted


* Re: ext4: fix metadata checksum calculation for the superblock
  2012-11-01  1:50                   ` Theodore Ts'o
  2012-11-01  3:22                     ` Darrick J. Wong
@ 2012-11-01  6:12                     ` George Spelvin
  2012-11-01  6:49                       ` Darrick J. Wong
  1 sibling, 1 reply; 22+ messages in thread
From: George Spelvin @ 2012-11-01  6:12 UTC (permalink / raw)
  To: darrick.wong, tytso; +Cc: linux-ext4, linux, tm

> This one was cc'ed to stable@vger.kernel.org.  But when you said "I
> notice, that neither of these have made it into 2.6.5", I assume you
> meant 3.5?

Whoops, typo!  I meant 3.6.5, the very latest just-out-today stable
kernel.

Quite a few 3.6.x kernels have come out since that patch was Cc'ed,
and it keeps not being included.  So I wondered.

> So that means it should eventually make it to the 3.4.x and 3.6.x
> kernels.

That's what I thought, but I didn't want to pester Greg until I was sure
of your intentions.

> At this point I'll just include it in the patches to be sent to Linus
> at the next merge window, mainly because I don't have the time to run
> a separate regression test run just for this patch, and it's only a
> cosmetic issue, right?

Well, it causes the file system to be marked dirty and unnecessarily
checked on reboot, which I contend is a bug, but it's not a data-loss
bug.

I do worry that it could cause file lookup to fail when it shouldn't,
which *is* effectively a data-loss bug, even if the data reappears
on reboot.  But I'd have to understand the problem and fix better to
know if that actually happens; I haven't observed it.


* Re: ext4: fix metadata checksum calculation for the superblock
  2012-11-01  6:12                     ` George Spelvin
@ 2012-11-01  6:49                       ` Darrick J. Wong
  2012-11-01  7:07                         ` George Spelvin
  0 siblings, 1 reply; 22+ messages in thread
From: Darrick J. Wong @ 2012-11-01  6:49 UTC (permalink / raw)
  To: George Spelvin; +Cc: tytso, linux-ext4, tm

On Thu, Nov 01, 2012 at 02:12:12AM -0400, George Spelvin wrote:
> > This one was cc'ed to stable@vger.kernel.org.  But when you said "I
> > notice, that neither of these have made it into 2.6.5", I assume you
> > meant 3.5?
> 
> Whoops, typo!  I meant 3.6.5, the very latest just-out-today stable
> kernel.
> 
> Quite a few 3.6.x kernels have come out since that patch was Cc'ed,
> and it keeps not being included.  So I wondered.
> 
> > So that means it should eventually make it to the 3.4.x and 3.6.x
> > kernels.
> 
> That's what I thought, but I didn't want to pester Greg until I was sure
> of your intentions.
> 
> > At this point I'll just include it in the patches to be sent to Linus
> > at the next merge window, mainly because I don't have the time to run
> > a separate regression test run just for this patch, and it's only a
> > cosmetic issue, right?
> 
> Well, it causes the file system to be marked dirty and unnecessarily
> checked on reboot, which I contend is a bug, but it's not a data-loss
> bug.
> 
> I do worry that it could cause file lookup to fail when it shouldn't,
> which *is* effectively a data-loss bug, even if the data reappears
> on reboot.  But I'd have to understand the problem and fix better to
> know if that actually happens; I haven't observed it.

Yes, it would be useful to know what's going on with this directory file, since
it seems to fall back to linear scan, yet e2fsck -D doesn't fix it.  What I was
/going/ for was that the kernel would notice a bad directory and flag it for
fsck on reboot.  Upon reboot, fsck would be run, notice the bad dir, and feed
it to the directory rebuilder to get it fixed for good.  However, there doesn't
seem to be any real checksum mismatch, so the rebuild doesn't happen.

Also ... refresh my memory -- some files have disappeared as a result of this
happening?

--D


* Re: ext4: fix metadata checksum calculation for the superblock
  2012-11-01  6:49                       ` Darrick J. Wong
@ 2012-11-01  7:07                         ` George Spelvin
  2012-11-01  7:18                           ` Darrick J. Wong
  0 siblings, 1 reply; 22+ messages in thread
From: George Spelvin @ 2012-11-01  7:07 UTC (permalink / raw)
  To: darrick.wong, linux; +Cc: linux-ext4, tm, tytso

> Yes, it would be useful to know what's going on with this directory file,
> since it seems to fall back to linear scan, yet e2fsck -D doesn't fix it.
> What I was /going/ for was that the kernel would notice a bad directory
> and flag it for fsck on reboot.  Upon reboot, fsck would be run, notice
> the bad dir, and feed it to the directory rebuilder to get it fixed
> for good.  However, there doesn't seem to be any real checksum mismatch,
> so the rebuild doesn't happen.

That's what confuses me.  I had already run e2fsck -D (which I assume
rebuilds all directories, even if unnecessary) before observing the
problem.  The other odd clue is that it's always nfsd that chokes;
other accesses to the directory (ls -U, ls -lU, grep -r) don't produce
the message.

> Also ... refresh my memory -- some files have disappeared as a result of this
> happening?

I haven't observed it, no.  But the nature of the symptoms suggests it
might be happening.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ext4: fix metadata checksum calculation for the superblock
  2012-11-01  7:07                         ` George Spelvin
@ 2012-11-01  7:18                           ` Darrick J. Wong
  2012-11-01  7:28                             ` George Spelvin
  0 siblings, 1 reply; 22+ messages in thread
From: Darrick J. Wong @ 2012-11-01  7:18 UTC (permalink / raw)
  To: George Spelvin; +Cc: linux-ext4, tm, tytso

On Thu, Nov 01, 2012 at 03:07:31AM -0400, George Spelvin wrote:
> > Yes, it would be useful to know what's going on with this directory file,
> > since it seems to fall back to linear scan, yet e2fsck -D doesn't fix it.
> > What I was /going/ for was that the kernel would notice a bad directory
> > and flag it for fsck on reboot.  Upon reboot, fsck would be run, notice
> > the bad dir, and feed it to the directory rebuilder to get it fixed
> > for good.  However, there doesn't seem to be any real checksum mismatch,
> > so the rebuild doesn't happen.
> 
> That's what confuses me.  I had already run e2fsck -D (which I assume
> rebuilds all directories, even if unnecessary) before observing the
> problem.  The other odd clue is that it's always nfsd that chokes;
> other accesses to the directory (ls -U, ls -lU, grep -r) don't produce
> the message.

Oh, so ... it's just nfsd that causes the linear fallback?  Regular (i.e.
non-nfs) users can see everything in the dir, no error messages?

Now *that* is odd. :)

You know, I was starting to wonder what on earth would even cause the fallback
in the first place.  It even looked like most of the "your dir is corrupt"
exits from that function would spit out an error or be somehow obviously
broken.

> > Also ... refresh my memory -- some files have disappeared as a result of this
> > happening?
> 
> I haven't observed it, no.  But the nature of the symptoms suggests it
> might be happening.

Hum.  When linear scan happens on a hashed dir, it's scanning the same blocks
that the hash scan sees.   The htree block looks like a regular directory block
with one huge "unused" dirent that wraps all the htree data.  So, the linear
scan should find the exact same files as a htree scan would.  If it doesn't,
something's wrong.  But you say it isn't, so I imagine it's fine.
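As a rough illustration of the layout described above (a toy sketch, not kernel code): the block size, the helper names make_dirent and linear_scan, and the 0xAA filler standing in for htree index data are all assumptions made for the example. It shows a linear scan walking dirents by rec_len and skipping an index block because its single dirent has inode 0.

```python
import struct

BLOCK_SIZE = 64  # toy size for illustration; real ext4 blocks are 1 KiB or larger

def make_dirent(inode, rec_len, name=b"", file_type=1):
    """Pack one ext4-style dirent: inode, rec_len, name_len, file_type, name."""
    header = struct.pack("<IHBB", inode, rec_len, len(name), file_type)
    return (header + name).ljust(rec_len, b"\0")

def linear_scan(data):
    """Walk directory data by rec_len and return the names of in-use entries."""
    names = []
    offset = 0
    while offset < len(data):
        inode, rec_len, name_len, _ftype = struct.unpack_from("<IHBB", data, offset)
        if rec_len < 8:
            raise ValueError("corrupt dirent: rec_len too small")
        if inode != 0:  # inode 0 marks an unused entry, so it is skipped
            names.append(data[offset + 8 : offset + 8 + name_len].decode())
        offset += rec_len
    return names

# A leaf block holding two ordinary entries.
leaf_block = make_dirent(12, 16, b"foo") + make_dirent(13, BLOCK_SIZE - 16, b"bar")

# An index block: one "unused" dirent whose rec_len spans the whole block,
# with made-up filler bytes standing in for the htree data it wraps.
index_block = make_dirent(0, BLOCK_SIZE, file_type=0)[:8] + b"\xAA" * (BLOCK_SIZE - 8)

print(linear_scan(index_block + leaf_block))
```

Because the index block contributes no names, the linear scan sees only the leaf entries, which are the same entries an htree lookup would eventually reach.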

<shrug> Another thing for me to ponder tomorrow. :)

--D

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ext4: fix metadata checksum calculation for the superblock
  2012-11-01  7:18                           ` Darrick J. Wong
@ 2012-11-01  7:28                             ` George Spelvin
  2012-11-02  0:05                               ` Darrick J. Wong
  0 siblings, 1 reply; 22+ messages in thread
From: George Spelvin @ 2012-11-01  7:28 UTC (permalink / raw)
  To: darrick.wong, linux; +Cc: linux-ext4, tm, tytso

> Oh, so ... it's just nfsd that causes the linear fallback?  Regular (i.e.
> non-nfs) users can see everything in the dir, no error messages?

Yup.  After it survived one e2fsck -D, I poked at the directory a bit
to see if I could cause the error.  No success from local access.

It's also probably an NFSv2 client.  I wonder if it's doing something
odd with directory seeks that's causing problems; perhaps htree and the
32-bit seek cookie limit are not friends?
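To make the cookie-width concern concrete, here is a small sketch that assumes directory positions are derived from a (major, minor) hash pair, with the 64-bit form packing both halves and the 32-bit form keeping only the major hash shifted right by one. The function names and sample hash values are hypothetical, and nothing here is claimed to be the cause of the messages in this thread.

```python
def hash_to_pos_64(major, minor):
    """64-bit cookie: keeps the major hash (shifted) and the full minor hash."""
    return ((major >> 1) << 32) | minor

def hash_to_pos_32(major, minor):
    """32-bit cookie: only the shifted major hash fits, so the minor hash is dropped."""
    return major >> 1

# Two hypothetical entries that share a major hash but differ in minor hash.
entry_a = (0x12345678, 0x00000001)
entry_b = (0x12345678, 0x00000002)

same_32 = hash_to_pos_32(*entry_a) == hash_to_pos_32(*entry_b)
same_64 = hash_to_pos_64(*entry_a) == hash_to_pos_64(*entry_b)
print(same_32, same_64)
```

Under these assumptions the two entries are distinguishable with a 64-bit cookie but collapse to the same position with a 32-bit one, which is the kind of ambiguity a client limited to 32-bit cookies could run into.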

>> I haven't observed it, no.  But the nature of the symptoms suggests it
>> might be happening.

> Hum.  When linear scan happens on a hashed dir, it's scanning the same
> blocks that the hash scan sees.   The htree block looks like a regular
> directory block with one huge "unused" dirent that wraps all the htree
> data.  So, the linear scan should find the exact same files as a htree
> scan would.  If it doesn't, something's wrong.  But you say it isn't,
> so I imagine it's fine.

Maybe I was wrong.  I was worried that it was aborting the directory
scan due to the error and thus files would disappear.  If that doesn't
happen, no worries.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ext4: fix metadata checksum calculation for the superblock
  2012-11-01  7:28                             ` George Spelvin
@ 2012-11-02  0:05                               ` Darrick J. Wong
  0 siblings, 0 replies; 22+ messages in thread
From: Darrick J. Wong @ 2012-11-02  0:05 UTC (permalink / raw)
  To: George Spelvin; +Cc: linux-ext4, tm, tytso

On Thu, Nov 01, 2012 at 03:28:47AM -0400, George Spelvin wrote:
> > Oh, so ... it's just nfsd that causes the linear fallback?  Regular (i.e.
> > non-nfs) users can see everything in the dir, no error messages?
> 
> Yup.  After it survived one e2fsck -D, I poked at the directory a bit
> to see if I could cause the error.  No success from local access.
> 
> It's also probably an NFSv2 client.  I wonder if it's doing something
> odd with directory seeks that's causing problems; perhaps htree and the
> 32-bit seek cookie limit are not friends?

<shrug> I'm not nfs-wise, sadly.  I _am_ wondering if an ftrace of this might
be useful... or a gigantic glut of data that I'll never finish processing.

Just from a quick read of ext4_find_entry() it looks like the only thing that
results in fallback mode without a kernel message is ext4_bread() failing in
dx_probe()?

> >> I haven't observed it, no.  But the nature of the symptoms suggests it
> >> might be happening.
> 
> > Hum.  When linear scan happens on a hashed dir, it's scanning the same
> > blocks that the hash scan sees.   The htree block looks like a regular
> > directory block with one huge "unused" dirent that wraps all the htree
> > data.  So, the linear scan should find the exact same files as a htree
> > scan would.  If it doesn't, something's wrong.  But you say it isn't,
> > so I imagine it's fine.
> 
> Maybe I was wrong.  I was worried that it was aborting the directory
> scan due to the error and thus files would disappear.  If that doesn't
> happen, no worries.

Oh well, it'll run slowly but at least it won't be throwing up errors.

--D

^ permalink raw reply	[flat|nested] 22+ messages in thread

Thread overview: 22+ messages
2012-10-07  5:04 metadata_csum + unclean shutdown = failure to boot George Spelvin
2012-10-07 13:39 ` Tao Ma
2012-10-07 15:09   ` George Spelvin
2012-10-07 18:10     ` Theodore Ts'o
2012-10-07 20:18       ` George Spelvin
2012-10-07 22:54         ` Theodore Ts'o
2012-10-08  1:05           ` George Spelvin
2012-10-08  1:25           ` George Spelvin
2012-10-08  2:41             ` Theodore Ts'o
2012-10-08  3:17               ` George Spelvin
2012-10-08  4:03                 ` Tao Ma
2012-10-08 11:35                   ` George Spelvin
2012-11-01  1:05               ` ext4: fix metadata checksum calculation for the superblock George Spelvin
2012-11-01  1:13                 ` Darrick J. Wong
2012-11-01  1:50                   ` Theodore Ts'o
2012-11-01  3:22                     ` Darrick J. Wong
2012-11-01  6:12                     ` George Spelvin
2012-11-01  6:49                       ` Darrick J. Wong
2012-11-01  7:07                         ` George Spelvin
2012-11-01  7:18                           ` Darrick J. Wong
2012-11-01  7:28                             ` George Spelvin
2012-11-02  0:05                               ` Darrick J. Wong
