From: Theodore Ts'o <tytso@mit.edu>
To: Eric Sandeen <sandeen@redhat.com>, Nix <nix@esperi.org.uk>
Cc: linux-ext4@vger.kernel.org
Subject: Testing ext4's journal via simulating a reboot via KVM
Date: Sat, 27 Oct 2012 04:01:27 -0400
Message-ID: <20121027080127.GA12045@thunk.org>
In-Reply-To: <508AF3FA.4020506@redhat.com>
On Fri, Oct 26, 2012 at 03:35:06PM -0500, Eric Sandeen wrote:
>
> Out of curiosity, when I test log replay with the journal_checksum option, I
> almost always get something like:
>
> [ 999.917805] JBD2: journal transaction 84121 on dm-1-8 is corrupt.
> [ 999.923904] EXT4-fs (dm-1): error loading journal
I tried to reproduce your findings, using my kvm setup. I've pushed
the changes that I am using here (see the kvm-autorun and kvm-xfstests
directories):
git://git.kernel.org/pub/scm/fs/ext2/xfstests-bld.git
Using both the kernel tree as of my last pull request to Linus, as
well as v3.6.3, I was not able to reproduce a failure using this:
    ./kvm-xfstests -m nobarrier,journal_async_commit,journal_checksum fsstress
<wait until fsstress has been running for 10 seconds or so, then, in
another window>
    killall kvm
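For anyone who wants to repeat the "other window" half of this without
watching the clock, a minimal sketch follows. The function name
kill_vm_after and the KILL override are made up for illustration; the
10-second default and the process name "kvm" come from the steps above,
and the process may well have a different name on other setups.

```shell
# Hypothetical helper, meant to be run in a second terminal once
# fsstress has started inside the VM.
#   $1: seconds to wait before killing (default 10)
#   $2: process name to kill (default "kvm"; an assumption, adjust it
#       to whatever the VM process is called on your system)
# KILL can be set to another command (e.g. "echo") for a dry run.
kill_vm_after() {
    sleep "${1:-10}"
    "${KILL:-killall}" "${2:-kvm}"
}
```

Calling "kill_vm_after 10 kvm" then does the same thing as the manual
killall above, just with the delay built in.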
I then built a version of e2fsck using the configure option
--enable-jbd-debug, and then ran e2fsck with the E2FSCK_JBD_DEBUG
environment variable set to 3. (This allowed me to confirm that the
checksums really were getting set.)
Running e2fsck -f on the underlying volume, I could see that the
checksums were in fact properly set, and the journal replay completed
successfully. I tried this multiple times, and it worked every single
time. This was with me killing the kvm while fsstress was running, so
there were over 300 transactions that had to be replayed.
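The e2fsck steps can be sketched as below. The configure option, the
environment variable and the -f flag are the ones named above; the
wrapper name fsck_jbd_debug and the E2FSCK override are hypothetical,
and the build commands in the comments assume an e2fsprogs source tree
built in place.

```shell
# Build a debug e2fsck first (commented out; run these by hand in an
# e2fsprogs source tree, this part is an assumption about your layout):
#   ./configure --enable-jbd-debug
#   make
#
# Hypothetical wrapper: run e2fsck with journal debugging at level 3
# and force a full check of the given device or image.
# E2FSCK selects the binary; it defaults to "e2fsck" found on PATH,
# so point it at the debug build you just compiled.
fsck_jbd_debug() {
    dev=$1
    E2FSCK_JBD_DEBUG=3 "${E2FSCK:-e2fsck}" -f "$dev"
}
```

With E2FSCK set to the freshly built binary, "fsck_jbd_debug <device>"
should print the extra journal messages while the journal is replayed.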
Eric has said that he was able to see journal checksum failures which
caused the journal to abort using his setup. It's very interesting
that I could not (no matter how many times I tried, and with
variations on the mount options). It makes me wonder if there is some
difference between how dm-snapshot was working and simply killing
the kvm process --- could it be that dm-snapshot wasn't taking a
consistent snapshot? The fact that I am seeing valid checksums under
KVM would imply that the file system layer is (at least) sending valid
data to the disk. Why valid checksums are not seen with dm-snapshot is
definitely an interesting question.
Eric, if you can build a version of e2fsck with --enable-jbd-debug
enabled, that would be useful, since you'll be able to see how the
expected and actual checksums differ from each other. Maybe that will
tell us something...
- Ted
P.S. One other interesting thing I discovered is this. Using debugfs
-R "logdump -a", I found the following:
Found expected sequence 20587, type 2 (commit block) at block 5302
Found expected sequence 20588, type 1 (descriptor block) at block 5303
Dumping descriptor block, sequence 20588, at block 5303:
FS block 23 logged at journal block 5304 (flags 0x0)
FS block 1 logged at journal block 5305 (flags 0x2)
FS block 1157 logged at journal block 5306 (flags 0xa)
Found expected sequence 20588, type 2 (commit block) at block 5307
Found expected sequence 20589, type 2 (commit block) at block 5308
Found expected sequence 20590, type 2 (commit block) at block 5309
Note the sequence of what appear to be completely empty commit blocks.
I'm not sure what fsstress is doing which is causing jbd2 to issue
empty commit blocks, but as far as I can tell, they are completely
pointless.
I tried running "debugfs -R logdump" on my root file system, and I saw
a few cases of empty commits, but at a much reduced rate. Still, if
we can figure out how to stop the jbd2 layer from creating these empty
commits, it would certainly optimize things a bit.
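To put a number on how common the empty commits are, the logdump output
can be piped through a small awk filter. This is only a sketch: it
assumes the "Found expected sequence N, type T (...)" line format shown
above, it treats a commit as empty when no other block type was seen
for the same sequence number, and the function name count_empty_commits
is made up.

```shell
# Hypothetical helper: read "debugfs -R 'logdump -a'" output on stdin
# and report how many commit blocks had no descriptor (or other
# non-commit) block with the same sequence number before them.
count_empty_commits() {
    awk '
        /Found expected sequence/ {
            seq = $4
            sub(/,$/, "", seq)        # strip the trailing comma
            if ($6 == 2) {            # type 2 is a commit block
                total++
                if (!(seq in has_payload))
                    empty++
            } else {                  # descriptor or other block type
                has_payload[seq] = 1
            }
        }
        END { printf "%d commits, %d empty\n", total, empty }
    '
}
```

Usage would be something like: debugfs -R "logdump -a" <device> |
count_empty_commits. A commit whose descriptor block falls before the
start of the dump is counted as empty, so the first transaction in the
output may be misreported.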