linux-ext4.vger.kernel.org archive mirror
From: Theodore Ts'o <tytso@mit.edu>
To: Markus <M4rkusXXL@web.de>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>,
	linux-ext4 <linux-ext4@vger.kernel.org>
Subject: Re: Dirty ext4 blocks system startup
Date: Mon, 7 Apr 2014 08:48:20 -0400
Message-ID: <20140407124820.GB8468@thunk.org>
In-Reply-To: <7488414.mDGKOZ8cSK@web.de>

On Mon, Apr 07, 2014 at 12:58:40PM +0200, Markus wrote:
> 
> Finally e2image finished successfully. But the produced file is way too big for a mail.
> 
> Any other possibility?
> (e2image does dump everything except file data and free space. But the problem seems to be just in the bitmap and/or journal.)
> 
> Actually, when I look at the code around e2fsck/recovery.c:594,
> the error is detected and continue is called.
> But tagp/tag is never changed, so the checksum is always compared to the one from the same tag. Intended?
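The loop pattern Markus describes can be modeled in a few lines. This is a minimal sketch, not the actual e2fsck/recovery.c code: struct tag, count_bad_buggy() and count_bad_fixed() are hypothetical names, and the per-block checksums are assumed to be precomputed into an array. It shows why a continue placed before the pointer advance re-checks the same tag on every iteration, while moving the advance into the for-loop header avoids that.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified descriptor tag; not the real journal tag layout. */
struct tag {
    uint32_t blocknr;   /* target block (unused in this sketch) */
    uint32_t csum;      /* checksum recorded in the tag */
};

/* Buggy pattern: 'continue' jumps back to the loop test without
 * advancing tagp, so the same tag is compared again and again.
 * max_iters exists only so this demonstration terminates. */
static size_t count_bad_buggy(const struct tag *tags, size_t ntags,
                              const uint32_t *actual, size_t max_iters)
{
    const struct tag *tagp = tags;
    size_t bad = 0, iters = 0;

    while (tagp < tags + ntags && iters++ < max_iters) {
        if (tagp->csum != actual[tagp - tags]) {
            bad++;
            continue;           /* BUG: skips the tagp++ below */
        }
        /* ... replay the block here ... */
        tagp++;
    }
    return bad;
}

/* Fixed pattern: the advance lives in the for-loop header, so
 * 'continue' still moves on to the next tag. */
static size_t count_bad_fixed(const struct tag *tags, size_t ntags,
                              const uint32_t *actual)
{
    size_t bad = 0;

    for (const struct tag *tagp = tags; tagp < tags + ntags; tagp++) {
        if (tagp->csum != actual[tagp - tags]) {
            bad++;
            continue;           /* the tagp++ in the loop header still runs */
        }
        /* ... replay the block here ... */
    }
    return bad;
}
```

With three tags whose second checksum fails, the fixed loop reports one bad tag, while the buggy loop keeps re-counting the same tag until the iteration guard stops it.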

What mount options are you using?  It appears that you have journal
checksums enabled, which isn't on by default, and unfortunately,
there's a good reason for that.  The original code assumed that the
most common cause of journal corruption would be an incomplete
journal transaction getting written out when one was using
journal_async_commit.  This feature has not been enabled by default
because the question of what to do when the journal gets corrupted in
other cases is not an easy one.

If some part of a transaction which is not the very last transaction
in the journal gets corrupted, replaying it could do severe damage to
the file system.  Unfortunately, simply deleting the journal and then
recreating it could cause further damage.  Most of the time, a
bad checksum happens because the last transaction hasn't fully made it
out to disk (especially if you use the journal_async_commit option,
which is a bit of a misnomer and has its own caveats[1]).  But if the
checksum violation happens in a journal transaction that is not the
last transaction in the journal, right now the recovery code aborts,
because we don't have good automated logic to handle this case.
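The recovery policy described above can be sketched as a classification over per-transaction checksum results. This assumes a hypothetical array csum_ok[] with one entry per transaction, and further assumes that a bad checksum on the final transaction is treated as a commit that never completed; classify() and enum scan_result are illustrative names, not e2fsck interfaces.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of scanning the journal; not an e2fsck API. */
enum scan_result {
    SCAN_REPLAY_ALL,    /* every transaction checksummed correctly */
    SCAN_STOP_AT_TAIL,  /* last transaction torn: treat it as never committed */
    SCAN_ABORT          /* corruption in an earlier transaction */
};

/* csum_ok[i] is true if transaction i's checksum verified. */
static enum scan_result classify(const bool *csum_ok, size_t ntrans)
{
    for (size_t i = 0; i < ntrans; i++) {
        if (csum_ok[i])
            continue;
        if (i == ntrans - 1)
            return SCAN_STOP_AT_TAIL;  /* likely an incomplete final commit */
        return SCAN_ABORT;             /* no safe automated choice here */
    }
    return SCAN_REPLAY_ALL;
}
```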

I suspect if you need to get your file system back on its feet, the
best thing to do is to create a patched e2fsck that doesn't abort when
it finds a checksum error, but instead continues.  Then run it to
replay the journal, and then force a full file system check and hope
for the best.

What has been on my todo list to implement, but has been relatively
low priority because this is not a feature that we've documented or
encouraged people to use, is to have e2fsck skip the transaction that
has a bad checksum (i.e., not replay it at all), and then force a full
file system check.  This is a bit safer, but even just making e2fsck
ignore the checksum is no worse than if journal checksums weren't
enabled in the first place.
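The proposed behavior (skip, rather than replay, any transaction whose checksum fails, then force a full check) could look roughly like this. It is a sketch under stated assumptions: csum_ok[] is a hypothetical per-transaction result array, replay_skipping_bad() and struct replay_stats are made-up names, and the actual block writes are elided.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of a replay pass. */
struct replay_stats {
    size_t replayed;        /* transactions written back */
    size_t skipped;         /* transactions left out because of bad checksums */
    bool force_full_fsck;   /* set whenever anything was skipped */
};

static struct replay_stats replay_skipping_bad(const bool *csum_ok,
                                               size_t ntrans)
{
    struct replay_stats st = {0, 0, false};

    for (size_t i = 0; i < ntrans; i++) {
        if (!csum_ok[i]) {
            st.skipped++;               /* write none of its blocks */
            st.force_full_fsck = true;  /* the file system needs a full check */
            continue;
        }
        st.replayed++;                  /* ... write the transaction's blocks ... */
    }
    return st;
}
```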

The long-term thing that we need to add before we can really support
journal checksums is to checksum each individual data block, instead
of just each transaction.  Then when we have a bad checksum, we can
skip just the one bad data block, and then force a full fsck.
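With a checksum per journal block, the skip can be narrowed to the single bad block. The sketch below assumes a hypothetical struct jblock that carries its own stored checksum and a toy block size, and uses a plain bitwise CRC-32 only as a stand-in checksum; it is not the jbd2 on-disk format.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BLOCK_SIZE 8  /* toy size; real journal blocks are much larger */

/* Hypothetical journal block carrying its own checksum. */
struct jblock {
    unsigned char data[TOY_BLOCK_SIZE];
    uint32_t stored_csum;
};

/* Plain bitwise CRC-32 (reflected polynomial 0xEDB88320), used here
 * only as a stand-in for whatever checksum the journal records. */
static uint32_t crc32_bitwise(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Sets written[i] for every block whose own checksum verifies and
 * leaves the others unreplayed.  Returns how many blocks were skipped;
 * a nonzero count would be the trigger to force a full fsck. */
static size_t replay_per_block(const struct jblock *blks, size_t n,
                               bool *written)
{
    size_t skipped = 0;

    for (size_t i = 0; i < n; i++) {
        written[i] = crc32_bitwise(blks[i].data, TOY_BLOCK_SIZE)
                     == blks[i].stored_csum;
        if (!written[i])
            skipped++;  /* skip only this block, not the whole transaction */
    }
    return skipped;
}
```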

I'm sorry you ran into this.  What I should do is to disable these
mount options for now, since users who stumble across them, as
apparently you have, might be tempted to use them, and then get into
trouble.

     	      	      	   	      	   	 - Ted

[1] The issue with journal_async_commit is that it's possible (fairly
unlikely, but still possible) that the guarantees of data=ordered will
be violated.  If the data blocks that were written out while we were
resolving a delayed allocation writeback haven't made it all the way
down to the platter, it's possible for all of the journal writes and
the commit block to be reordered ahead of the data blocks.  In that
case, the checksum for the commit block would be valid, but some of
the data blocks might not have been written back to disk.
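The reordering hazard in this footnote can be illustrated with a toy model of one transaction: a single in-place data write, a single journal write, and a commit block whose checksum covers only the journal blocks. The enum, the function name, and the assumption that the device may persist these writes in any order are illustrative, not taken from the jbd2 write path.

```c
#include <stdbool.h>
#include <stddef.h>

/* Kinds of write in a toy model of one data=ordered transaction. */
enum wkind {
    W_DATA,     /* file data written in place, meant to land before the commit */
    W_JOURNAL,  /* metadata blocks written to the journal */
    W_COMMIT    /* commit block whose checksum covers the journal blocks */
};

/* 'order' is the sequence in which the device actually made the writes
 * durable; the system crashes after the first 'ncrash' of them.
 * Returns true if recovery would trust the transaction even though
 * its data block never reached the disk. */
static bool ordered_violated(const enum wkind *order, size_t n, size_t ncrash)
{
    bool data = false, journal = false, commit = false;

    for (size_t i = 0; i < ncrash && i < n; i++) {
        if (order[i] == W_DATA)
            data = true;
        else if (order[i] == W_JOURNAL)
            journal = true;
        else
            commit = true;
    }
    /* The commit checksum covers only journal blocks, so it still
     * verifies when the in-place data write is missing. */
    bool commit_verifies = commit && journal;
    return commit_verifies && !data;
}
```

With the writes persisted in issue order, no crash point yields a trusted commit without its data; with the commit persisted ahead of the data, a crash after the second write does.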

Thread overview: 9+ messages
2014-04-04 10:35 ` Dirty ext4 blocks system startup Markus
2014-04-04 18:20   ` Darrick J. Wong
2014-04-05 13:10     ` Markus
2014-04-07 10:58       ` Markus
2014-04-07 12:48         ` Theodore Ts'o [this message]
2014-04-07 14:06           ` Markus
2014-04-08 14:25             ` Markus
2014-04-08 15:28               ` Theodore Ts'o
2014-04-08 19:18             ` Darrick J. Wong
