linux-ext4.vger.kernel.org archive mirror
* journal recovery problems with metadata_csum, *non-64bit*
@ 2014-08-08 19:31 TR Reardon
  2014-08-08 21:47 ` Theodore Ts'o
  2014-08-08 22:15 ` Darrick J. Wong
  0 siblings, 2 replies; 8+ messages in thread
From: TR Reardon @ 2014-08-08 19:31 UTC (permalink / raw)
  To: linux-ext4

[-- Attachment #1: Type: text/plain, Size: 2178 bytes --]

Kernel 3.16, e2fsprogs 1.43-WIP (1.42.11 compiled with metadata_csum
support), filesystems mounted with journal_async_commit.

This may be an fsck problem, hard to tell.  I consistently have
problems with journal recovery after a hard reboot when the filesystem
has metadata_csum.  Interestingly, no problems with 64-bit filesystems.

During journal replay, bogus block numbers are reported.  In this
case, a 128MB file was deleted on two filesystems whose only
difference is 64bit vs non-64bit.  This also happens with or without
bigalloc, though I only document the bigalloc case here.

Please see the attached superblocks, dumped just prior to the hard reboot.
disk8=sdm1, disk9=sdn1

dmesg:

[Fri Aug  8 15:16:28 2014] JBD2: Out of memory during recovery.
[Fri Aug  8 15:16:28 2014] JBD2: recovery failed
[Fri Aug  8 15:16:28 2014] EXT4-fs (sdm1): error loading journal
[Fri Aug  8 15:16:27 2014] EXT4-fs (sdn1): recovery complete
[Fri Aug  8 15:16:27 2014] EXT4-fs (sdn1): mounted filesystem with
ordered data mode. Opts: journal_async_commit


fsck:

#e2fsck -f -v /dev/sdm1
e2fsck 1.43-WIP (09-Jul-2014)
disk8: recovering journal
Error writing block 549755813991 (Invalid argument).  Ignore error<y>? yes
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (11332304, counted=11365072).
Fix<y>? yes
Free inodes count wrong (177287, counted=177288).
Fix<y>? yes

disk8: ***** FILE SYSTEM WAS MODIFIED *****

        1656 inodes used (0.93%, out of 178944)
         260 non-contiguous files (15.7%)
           0 non-contiguous directories (0.0%)
             # of inodes with ind/dind/tind blocks: 0/0/0
             Extent depth histogram: 1280/363/5
   721201312 blocks used (98.45%, out of 732566384)
           0 bad blocks
         358 large files

        1267 regular files
         380 directories
           0 character device files
           0 block device files
           0 fifos
           0 links
           0 symbolic links (0 fast symbolic links)
           0 sockets
------------
        1647 files

[-- Attachment #2: disk8-dumpe2fs-postrm --]
[-- Type: application/octet-stream, Size: 2068 bytes --]

Filesystem volume name:   disk8
Last mounted on:          /mnt/disk8
Filesystem UUID:          0df882a5-1df6-4d51-b037-c132e308e341
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file dir_nlink extra_isize bigalloc metadata_csum
Filesystem flags:         signed_directory_hash 
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              178944
Block count:              732566384
Reserved block count:     0
Free blocks:              11332304
Free inodes:              177287
First block:              0
Block size:               4096
Cluster size:             65536
Reserved GDT blocks:      53
Blocks per group:         524288
Clusters per group:       32768
Inodes per group:         128
Inode blocks per group:   8
Flex block group size:    16
Filesystem created:       Tue Jan 15 17:18:07 2013
Last mount time:          Fri Aug  8 14:46:01 2014
Last write time:          Fri Aug  8 14:46:01 2014
Mount count:              1
Maximum mount count:      -1
Last checked:             Fri Aug  8 14:43:58 2014
Check interval:           0 (<none>)
Lifetime writes:          6022 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      e0f46c10-ece6-43a4-9469-c0d89d644b0f
Journal backup:           inode blocks
Checksum type:            crc32c
Checksum:                 0xfa490415
Journal features:         journal_incompat_revoke journal_async_commit journal_checksum_v2
Journal size:             128M
Journal length:           32768
Journal sequence:         0x0000007f
Journal start:            1
Journal checksum type:    crc32c
Journal checksum:         0xcfb7b45e


[-- Attachment #3: disk9-dumpe2fs-postrm --]
[-- Type: application/octet-stream, Size: 2077 bytes --]

Filesystem volume name:   disk9
Last mounted on:          /mnt/disk9
Filesystem UUID:          11399183-4f42-4b14-8b5d-37cbda38b029
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr dir_index filetype needs_recovery extent 64bit flex_bg sparse_super large_file huge_file dir_nlink extra_isize bigalloc metadata_csum
Filesystem flags:         signed_directory_hash 
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              178944
Block count:              732566384
Reserved block count:     0
Free blocks:              167886960
Free inodes:              177612
First block:              0
Block size:               4096
Cluster size:             65536
Group descriptor size:    64
Blocks per group:         524288
Clusters per group:       32768
Inodes per group:         128
Inode blocks per group:   8
Flex block group size:    16
Filesystem created:       Thu Dec 26 19:29:21 2013
Last mount time:          Fri Aug  8 14:42:56 2014
Last write time:          Fri Aug  8 14:42:56 2014
Mount count:              10
Maximum mount count:      -1
Last checked:             Thu Aug  7 01:35:20 2014
Check interval:           0 (<none>)
Lifetime writes:          2413 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      77780d74-59c9-4910-88c1-ff50c3e955c1
Journal backup:           inode blocks
Checksum type:            crc32c
Checksum:                 0x20f09425
Journal features:         journal_incompat_revoke journal_64bit journal_async_commit journal_checksum_v2
Journal size:             128M
Journal length:           32768
Journal sequence:         0x0001e11b
Journal start:            1
Journal checksum type:    crc32c
Journal checksum:         0x98d70b93


* Re: journal recovery problems with metadata_csum, *non-64bit*
@ 2014-08-10 22:35 TR Reardon
  0 siblings, 0 replies; 8+ messages in thread
From: TR Reardon @ 2014-08-10 22:35 UTC (permalink / raw)
  To: Theodore Ts'o, Darrick J. Wong; +Cc: linux-ext4

Ok, I found the problem in jbd2, and have a solution, though it's
debatable what the ideal solution is.  For now, the simplest patch is
below, though a similar patch in lib/ext2fs/kernel-jbd.h is required
to get e2fsck back in sync.

The original c3900875 commit adding metadata_csum (i.e.
journal_checksum_v2) to jbd2 added 2 extra bytes for the block
checksums, in addition to reallocating 2 bytes from the 4 bytes of
flags.  However, a decision was made to retain only the lower 16 bits
of the crc32c, and thus those extra 2 bytes were unneeded.  But those
2 extra bytes were never "deallocated" from journal_tag_bytes().
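
To make the size arithmetic concrete, here is a minimal sketch, not the
kernel source.  The struct layout (32-bit block number, 16-bit checksum,
16-bit flags, optional 32-bit high word) is an assumption reconstructed
from the description above, and every name in it (tag_sketch,
tag_bytes_as_shipped, tag_bytes_intended) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of one journal block tag; names are illustrative only. */
struct tag_sketch {
    uint32_t blocknr;       /* low 32 bits of the target block */
    uint16_t checksum;      /* lower 16 bits of the crc32c */
    uint16_t flags;         /* remaining 2 bytes of the old 4-byte flags */
    uint32_t blocknr_high;  /* only meaningful on a 64bit journal */
};

static_assert(sizeof(struct tag_sketch) == 12, "tag should pack to 12 bytes");

#define TAG_SIZE32 offsetof(struct tag_sketch, blocknr_high)  /* 8 bytes */
#define TAG_SIZE64 sizeof(struct tag_sketch)                  /* 12 bytes */

/* Tag stride as described above: 2 extra bytes whenever checksum v2 is on. */
static size_t tag_bytes_as_shipped(int has_csum_v2, int has_64bit)
{
    size_t extra = has_csum_v2 ? sizeof(uint16_t) : 0;
    return extra + (has_64bit ? TAG_SIZE64 : TAG_SIZE32);
}

/* Tag stride if the 2 unneeded bytes were dropped (an assumption about
 * the intended sizes, not a copy of any patch). */
static size_t tag_bytes_intended(int has_64bit)
{
    return has_64bit ? TAG_SIZE64 : TAG_SIZE32;
}
```

Under these assumptions a 32bit journal with checksum v2 steps through
10-byte tags while the size constant says 8, and a 64bit journal steps
through 14-byte tags while the constant says 12.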

Unfortunately, some code relies on the JBD_TAG_SIZE32/64 constants
directly rather than the journal_tag_bytes() utility function, in
particular the recovery code which is common to e2fsck and jbd2.  This
led different tools to think they were looking at a 64bit journal when
it was actually 32bit.  Code that relied on journal_tag_bytes()
remained safe, so the block iterators were fine, but any direct use of
those constants [including the hideous greater-than comparison in
read_tag_block()] went awry, and journal replay fails.
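
A rough sketch of how a greater-than comparison against the 32-bit tag
size can misfire, under the assumption that a non-64bit journal with
checksum v2 reports 10-byte tags.  The function name and the value 8
for the constant are hypothetical stand-ins, not the jbd2 code.  The
sample values are simply the arithmetic split of the block number that
e2fsck printed: 549755813991 = (128 << 32) | 103.  Whether the bogus
number really arose this way is an inference, but it is consistent with
a filesystem whose 732566384 blocks all fit in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

#define TAG_SIZE32 8  /* assumed size of a tag without the high block word */

/* Decode a block number from a tag; low and high are already in host
 * byte order.  The high word is trusted whenever the tag is larger
 * than the 32-bit size, which is also true for a 10-byte tag. */
static uint64_t read_block_sketch(int tag_bytes, uint32_t low, uint32_t high)
{
    assert(tag_bytes >= TAG_SIZE32);
    uint64_t block = low;
    if (tag_bytes > TAG_SIZE32)
        block |= (uint64_t)high << 32;  /* stray bytes if the journal is 32bit */
    return block;
}
```

With tag_bytes = 10, low = 103 and stray bytes 128 in the high slot,
the sketch returns 549755813991; with tag_bytes = 8 it returns 103.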

As far as I can tell, metadata_csum + journal checksum has never
worked for 32bit filesystems. By a little bit of padding luck, 64bit
worked fine.

Now, as to the solution: it depends on whether one feels that existing
in-the-wild journals matter.  The original commit was May 2012; are we
past early adopters now?  If this patch is taken, you shrink the
journal block tags to the intended size but in-the-wild journals will
be broken.  But they already are, so...?  This opens up the
possibility of now using those extra 2 bytes and retaining the full
32-bit crc32c for the block tags.  If going that route, debugfs/logdump
needs a fix in addition to changes to jbd2.

FWIW, the "JBD2: Out of memory during recovery." error in
fs/jbd2/recovery.c was opaque at best and should be changed to always
include the block# that caused the problem.

+Reardon

---
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 67b8e30..dc27d09 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -2166,15 +2166,11 @@ int jbd2_journal_blocks_per_page(struct inode *inode)
 size_t journal_tag_bytes(journal_t *journal)
 {
        journal_block_tag_t tag;
-       size_t x = 0;

[parent not found: <BLU437-SMTP3407533C3883653626BD3DFDEC0@phx.gbl>]

end of thread, other threads:[~2014-08-11  7:10 UTC | newest]

Thread overview: 8+ messages
2014-08-08 19:31 journal recovery problems with metadata_csum, *non-64bit* TR Reardon
2014-08-08 21:47 ` Theodore Ts'o
2014-08-08 22:29   ` TR Reardon
2014-08-09  0:21     ` Theodore Ts'o
     [not found]       ` <BLU436-SMTP1726DA8B9CB511781B73155FDEF0@phx.gbl>
2014-08-09  4:07         ` Theodore Ts'o
2014-08-08 22:15 ` Darrick J. Wong
  -- strict thread matches above, loose matches on Subject: below --
2014-08-10 22:35 TR Reardon
     [not found] <BLU437-SMTP3407533C3883653626BD3DFDEC0@phx.gbl>
2014-08-11  7:10 ` Darrick J. Wong
