linux-ext4.vger.kernel.org archive mirror
From: Theodore Tso <tytso@mit.edu>
To: Andreas Dilger <adilger@sun.com>
Cc: linux-ext4@vger.kernel.org, Girish Shilamkar <Girish.Shilamkar@sun.com>
Subject: Re: What to do when the journal checksum is incorrect
Date: Mon, 26 May 2008 10:54:44 -0400
Message-ID: <20080526145440.GA9893@mit.edu>
In-Reply-To: <20080525113842.GE5970@mit.edu>

On Sun, May 25, 2008 at 07:38:42AM -0400, Theodore Tso wrote:
> So the other alternative I seriously considered was not replaying the
> journal at all, and bailing out after seeing the bad checksum --- but
> that just defers the problem to e2fsck, and e2fsck can't really do
> anything much different, and the tools to allow a human to make a
> decision on a block by block basis in the journal don't exist, and
> even if they did would make more system administrators run screaming.
> 
> I suspect the *best* approach is to change the journal format one more
> time, and include a CRC on a per-block basis in the descriptor blocks,
> and a CRC for the entire descriptor block.  That way, we can decide
> what to replay or not on a per-block basis.
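The per-block proposal quoted above can be illustrated with a small model. This is a minimal sketch under stated assumptions: the toy_tag and toy_descriptor layouts, the 16-byte block size, and the use of CRC-32 are all hypothetical and are not the jbd2 on-disk format. It shows the two levels of checking: a bad descriptor CRC means none of its tags can be trusted, while a bad per-block CRC drops only that one block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BLOCK_SIZE 16 /* hypothetical; real journal blocks are fs-block sized */
#define TOY_MAX_TAGS 4

/* Hypothetical per-block tag: replay destination plus the block's own CRC. */
struct toy_tag {
    uint32_t fs_block;
    uint32_t block_crc;
};

/* Hypothetical descriptor block: tags plus one CRC covering all of them. */
struct toy_descriptor {
    unsigned ntags;
    struct toy_tag tags[TOY_MAX_TAGS];
    uint32_t desc_crc;
};

/* Standard reflected CRC-32 (polynomial 0xEDB88320), computed bitwise. */
static uint32_t crc32_update(uint32_t crc, const unsigned char *buf, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return ~crc;
}

/* CRC over the descriptor's fields, field by field so padding never matters. */
static uint32_t descriptor_crc(const struct toy_descriptor *d)
{
    uint32_t crc = crc32_update(0, (const unsigned char *)&d->ntags, sizeof d->ntags);
    for (unsigned i = 0; i < d->ntags; i++) {
        crc = crc32_update(crc, (const unsigned char *)&d->tags[i].fs_block,
                           sizeof d->tags[i].fs_block);
        crc = crc32_update(crc, (const unsigned char *)&d->tags[i].block_crc,
                           sizeof d->tags[i].block_crc);
    }
    return crc;
}

/* Writer side: fill in each block's CRC, then the descriptor CRC. */
static void toy_seal(struct toy_descriptor *d, const unsigned char *data)
{
    assert(d->ntags <= TOY_MAX_TAGS);
    for (unsigned i = 0; i < d->ntags; i++)
        d->tags[i].block_crc =
            crc32_update(0, data + (size_t)i * TOY_BLOCK_SIZE, TOY_BLOCK_SIZE);
    d->desc_crc = descriptor_crc(d);
}

/* Replay side: sets replay[i] to 1 for each block that may be written back
 * and returns how many were accepted.  A bad descriptor CRC rejects all. */
static unsigned toy_decide_replay(const struct toy_descriptor *d,
                                  const unsigned char *data, int replay[])
{
    assert(d->ntags <= TOY_MAX_TAGS);
    int desc_ok = descriptor_crc(d) == d->desc_crc;
    unsigned n = 0;
    for (unsigned i = 0; i < d->ntags; i++) {
        replay[i] = desc_ok &&
            crc32_update(0, data + (size_t)i * TOY_BLOCK_SIZE, TOY_BLOCK_SIZE)
                == d->tags[i].block_crc;
        n += (unsigned)replay[i];
    }
    return n;
}
```

With this layout, one corrupted data block costs only that block instead of the whole transaction.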

One other alternative I forgot to mention is that e2fsck (or the
kernel) could skip the bad (non-terminal) commit, and continue
replaying the rest of the good commits.  That's a much coarser-grained
alternative to the above, given that there could be a hundred or more
blocks in a particular commit that we would be skipping.
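The difference between stopping at the first bad commit and skipping it can be shown in a few lines. This is a hypothetical sketch: the enum, the function name, and the reduction of each transaction to a single pass/fail checksum flag are assumptions for illustration, not e2fsck or jbd2 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Two hypothetical policies for a journal with a corrupt commit in it. */
enum toy_policy {
    TOY_STOP_AT_FIRST_BAD, /* replay only the transactions before the bad one */
    TOY_SKIP_BAD_COMMIT    /* drop each bad transaction, replay the rest */
};

/* ok[i] is true if transaction i passed its checksum.  Writes the indices
 * of the transactions to replay into out[] (capacity n), returns the count. */
static size_t toy_plan_replay(const bool ok[], size_t n,
                              enum toy_policy policy, size_t out[])
{
    assert(n == 0 || (ok != NULL && out != NULL));
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (!ok[i]) {
            if (policy == TOY_STOP_AT_FIRST_BAD)
                break;
            continue; /* every block in this transaction is dropped */
        }
        out[count++] = i;
    }
    return count;
}
```

Skipping is all-or-nothing per transaction here, which is the coarseness the paragraph refers to.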

E2fsck could also save the journal to an external file, which would
allow an expert to pore over the contents and try to do something sane
with them.  Given that *very* few experts would know what to
do, I'm not entirely sure it's worth the effort --- although perhaps
some of the folks who are working on ext3grep might be interested in
writing some journal forensic tools.

> Nope, right now the mount will silently succeed, even though part of
> the journal (including the commit with the bad CRC, but no commits
> beyond that) has been replayed, and then the entire journal is
> truncated and thrown away.

BTW, here's the patch that will cause fs/jbd2/recovery.c to terminate
upon finding a non-terminal commit with a failed CRC.  It's not a
complete fix: in the journal-async-commit case, when the terminal
commit has a bad CRC, fs/jbd2/recovery.c currently notes the bad CRC
--- and then replays that commit anyway, which means that
journal-async-commit is currently NOT safe.  (This is not yet fixed by
my current patch; I had been so focused on the non-terminal bad CRC
case that I only recently noticed that the terminal async-commit case
was also busted.)

All of this was found by porting the fs/jbd2/recovery.c changes over
to e2fsck/recovery.c, compiling e2fsprogs with --enable-jbd-debug,
setting the JBD_DEBUG environment variable, and then watching
journal recoveries on selectively corrupted journals using lde.
That's also how I found the buffer head memory leak which I sent to
linux-ext4 a few days ago --- and it's why I try hard to keep
e2fsck/recovery.c in sync with fs/jbd[2]/recovery.c.
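The debugging workflow above depends on a debug hook whose verbosity is chosen at run time from the environment. A minimal userspace sketch of that idea follows; the function names are hypothetical, and although the JBD_DEBUG variable name comes from the message, e2fsprogs' actual debug plumbing may differ in its details.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a debug level; unset, empty, negative, or malformed values mean 0. */
static int toy_parse_debug_level(const char *s)
{
    if (s == NULL || *s == '\0')
        return 0;
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || *end != '\0' || v < 0 || v > INT_MAX)
        return 0;
    return (int)v;
}

/* Hypothetical debug hook: print to stderr only if `level` is within the
 * verbosity requested through the JBD_DEBUG environment variable. */
static void toy_jbd_debug(int level, const char *fmt, ...)
{
    assert(fmt != NULL);
    if (level > toy_parse_debug_level(getenv("JBD_DEBUG")))
        return;
    va_list ap;
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
}
```

Running the recovery code in a userspace program like this means each corrupted-journal experiment is just a rerun with a different environment, no reboot needed.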

The Lesson of the Week is that running tests in userspace is easier
and faster, which encourages more testing --- which means bugs get
found faster.  :-)

- Ted

commit 78c061f21271968e24c434bb8a5ec41602ad0b99
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Sat May 24 15:45:31 2008 -0400

    jbd2: Fix handling of a bad checksum of a non-terminal commit
    
    If there is a checksum problem in transaction n, and it is not the
    last transaction, the problem isn't detected until the commit block for
    transaction n+1 is found. So we need to decrement the commit_ID
    counter so the right transaction is marked as the troublemaker, and we
    don't replay the corrupted transaction.
    
    Cc: Andreas Dilger <adilger@clusterfs.com>
    Cc: Girish Shilamkar <girish@clusterfs.com>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c
index 7199db5..2d48448 100644
--- a/fs/jbd2/recovery.c
+++ b/fs/jbd2/recovery.c
@@ -611,6 +611,7 @@ static int do_one_pass(journal_t *journal,
 				chksum_err = chksum_seen = 0;
 
 				if (info->end_transaction) {
+					next_commit_ID--;
 					printk(KERN_ERR "JBD: Transaction %u "
 						"found to be corrupt.\n",
 						next_commit_ID - 1);
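The off-by-one that this patch addresses can be illustrated with a toy model of the scan pass. This is a simplified sketch, not the fs/jbd2/recovery.c code: the function and variable names are hypothetical, each transaction is reduced to one checksum result, and the terminal-commit branch models the behavior the message says is wanted for the async-commit case, which this patch does not yet implement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: crc_ok[i] describes the commit block of transaction
 * first_id + i.  Returns end, meaning [first_id, end) is safe to replay. */
static uint32_t toy_scan_end(uint32_t first_id, const bool crc_ok[], size_t count)
{
    assert(count == 0 || crc_ok != NULL);
    uint32_t next_commit_id = first_id;
    bool bad_seen = false;

    for (size_t i = 0; i < count; i++) {
        if (bad_seen) {
            /* The error is acted on only now, at the commit block of the
             * following transaction, so the counter is one ahead of the
             * bad transaction.  Step back so the bad one is not replayed. */
            next_commit_id--;
            return next_commit_id;
        }
        if (!crc_ok[i])
            bad_seen = true;
        next_commit_id++;
    }
    /* A bad checksum on the terminal commit must also stop replay before it. */
    return bad_seen ? next_commit_id - 1 : next_commit_id;
}
```

For example, with transactions 10, 11 and 12 where 11 is corrupt, the model returns 11, so only transaction 10 is replayed; without the decrement it would return 12 and replay the corrupt transaction 11.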


Thread overview: 13+ messages
2008-05-24 22:34 What to do when the journal checksum is incorrect Theodore Ts'o
2008-05-25  6:30 ` Andreas Dilger
2008-05-25 11:38   ` Theodore Tso
2008-05-26 14:54     ` Theodore Tso [this message]
2008-05-26 18:24     ` Andreas Dilger
2008-05-26 21:28       ` Ric Wheeler
2008-06-03 10:22 ` Girish Shilamkar
2008-06-03 21:27   ` Andreas Dilger
2008-06-04 23:40   ` Theodore Tso
2008-06-04 23:56     ` [PATCH] jbd2: Fix memory leak when verifying checksums in the journal Theodore Ts'o
2008-06-04 23:56       ` [PATCH] jbd2: If a journal checksum error is detected, propagate the error to ext4 Theodore Ts'o
2008-06-05  3:17         ` Andreas Dilger
2008-06-05 16:21           ` Theodore Tso
