Re: Dirty ext4 blocks system startup

From: Markus <M4rkusXXL@web.de>
To: Theodore Ts'o <tytso@mit.edu>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>,
	linux-ext4 <linux-ext4@vger.kernel.org>
Subject: Re: Dirty ext4 blocks system startup
Date: Tue, 08 Apr 2014 16:25:08 +0200
Message-ID: <1452787.GTL29L0o32@web.de>
In-Reply-To: <2164274.jmlex94sWc@web.de>

I patched e2fsck as mentioned below.
./e2fsck /dev/md5
e2fsck 1.43-WIP (4-Feb-2014)
/dev/md5: recovering journal
JBD: Invalid checksum recovering block 1152 in log
JBD: Invalid checksum recovering block 1156 in log
Setting free inodes count to 366227296 (was 366241761)
Setting free blocks count to 652527218 (was 730998757)
/dev/md5: clean, 41120/366268416 files, 2277606286/2930133504 blocks

So two blocks were bad, but the recovery worked and the last few files were all intact.

A full check did not find any errors:
./e2fsck -f -n /dev/md5
e2fsck 1.43-WIP (4-Feb-2014)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/md5: 41120/366268416 files (4.5% non-contiguous), 2277606286/2930133504 blocks

So I think that fs is now fine again.


But still, e2fsck should not be trapped in an endless loop.
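
To illustrate the kind of loop meant here, a minimal sketch follows. It is not the actual e2fsck/recovery.c code: demo_tag, demo_csum and demo_replay are made-up names, the checksum is a toy, and max_iter exists only so that the stuck case can be observed instead of hanging.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a journal block tag; NOT the real jbd2 layout. */
struct demo_tag {
	uint32_t blocknr;	/* target block number */
	uint32_t csum;		/* checksum stored in the tag */
	uint32_t data;		/* stand-in for the block contents */
};

/* Toy checksum, assumed for illustration only. */
static uint32_t demo_csum(uint32_t data)
{
	return data * 2654435761u;
}

/*
 * Walk the tags and count blocks whose checksum does not match.
 * Returns the number of bad blocks, or -1 if more than max_iter
 * iterations were needed (i.e. the walk got stuck).
 */
static int demo_replay(const struct demo_tag *tags, size_t ntags,
		       int advance_on_error, size_t max_iter)
{
	size_t i = 0, iter = 0;
	int bad = 0;

	assert(tags != NULL || ntags == 0);
	while (i < ntags) {
		if (++iter > max_iter)
			return -1;	/* guard tripped: no progress */
		if (demo_csum(tags[i].data) != tags[i].csum) {
			bad++;
			if (advance_on_error)
				i++;	/* step past the bad tag */
			/* Without the step above, the same tag is
			 * checked again on the next iteration. */
			continue;
		}
		i++;	/* good block: move on to the next tag */
	}
	return bad;
}
```

With advance_on_error set to 0 the walk never gets past the first bad tag and the guard returns -1; with it set to 1 the bad tag is counted and the walk finishes.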


Thanks,
Markus


Markus wrote on 07.04.2014:
> Theodore Ts'o wrote on 07.04.2014:
> > On Mon, Apr 07, 2014 at 12:58:40PM +0200, Markus wrote:
> > > 
> > > Finally e2image finished successfully. But the produced file is way too
> > > big for a mail.
> > > 
> > > Any other possibility?
> > > (e2image does dump everything except file data and free space. But the
> > > problem seems to be just in the bitmap and/or journal.)
> > > 
> > > Actually, when I look at the code around e2fsck/recovery.c:594
> > > The error is detected and continue is called.
> > > But tagp/tag is never changed, so the checksum is always compared to the
> > > one from tag. Intended?
> > 
> > What mount options are you using?  It appears that you have journal
> > checksums enabled, which isn't on by default, and unfortunately,
> > there's a good reason for that.  The original code assumed that the
> > most common case for journal corruption would be caused by an
> > incomplete journal transaction getting written out if one were using
> > journal_async_commit.  This feature has not been enabled by default
> > because the question of what to do when the journal gets corrupted in
> > other cases is not an easy one.
> 
> Normally just "noatime,journal_checksum", but with the corrupted journal I use
> "ro,noload".
> 
> "man mount" makes that "journal_checksum" option sound good ;)
> 
> 
> > If some part of a transaction which is not the very last transaction
> > in the journal gets corrupted, replaying it could do severe damage to
> > the file system.  Unfortunately, simply deleting the journal and then
> > recreating it could also do more damage as well.  Most of the time, a
> > bad checksum happens because the last transaction hasn't fully made it
> > out to disk (especially if you use the journal_async_commit option,
> > which is a bit of a misnomer and has its own caveats[1]).  But if the
> > checksum violation happens in a journal transaction that is not the
> > last transaction in the journal, right now the recovery code aborts,
> > because we don't have good automated logic to handle this case.
> 
> The recovery does not seem to abort. It calls continue and is caught in an
> endless loop.
> 
> 
> > I suspect if you need to get your file system back on its feet, the
> > best thing to do is to create a patched e2fsck that doesn't abort when
> > it finds a checksum error, but instead continues.  Then run it to
> > replay the journal, and then force a full file system check and hope
> > for the best.
> 
> The code calls "continue". ;)
> So I just remove the error handling from the if clause, keeping only the message:
>   /* Look for block corruption */
>   if (!jbd2_block_tag_csum_verify(
>   		journal, tag, obh->b_data,
>   		be32_to_cpu(tmp->h_sequence))) {
> - 	brelse(obh);
> - 	success = -EIO;
>   	printk(KERN_ERR "JBD: Invalid "
>   			"checksum recovering "
>   			"block %lld in log\n",
>   			blocknr);
> - 	continue;
>   }
> 
> It would then ignore the checksum and just issue a message. Right?
> 
> 
> > What has been on my todo list to implement, but has been relatively
> > low priority because this is not a feature that we've documented or
> > encouraged people to use, is to have e2fsck skip the transaction that has a
> > bad checksum (i.e., not replay it at all), and then force a full file
> > system check.  This is a bit safer, but if you make e2fsck ignore the
> > checksum, it's no worse than if journal checksums weren't enabled in
> > the first place.
> > 
> > The long term thing that we need to add before we can really support
> > journal checksums is to checksum each individual data block, instead
> > of just each transaction.  Then when we have a bad checksum, we can
> > skip just the one bad data block, and then force a full fsck.
> > 
> > I'm sorry you ran into this.  What I should do is to disable these
> > mount options for now, since users who stumble across them, as
> > apparently you have, might be tempted to use them, and then get into
> > trouble.
> > 
> >                                         - Ted
> > 
> > [1] The issue with journal_async_commit is that it's possible (fairly
> > unlikely, but still possible) that the guarantees of data=ordered will
> > be violated.  If the data blocks that were written out while we are
> > resolving a delayed allocation writeback haven't made it all the way
> > down to the platter, it's possible for all of the journal writes and
> > the commit block to be reordered ahead of the data blocks.  In that
> > case, the checksum for the commit block would be valid, but some of
> > the data blocks might not have been written back to disk.
> 
> Thanks so far,
> Markus
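
The long-term idea from Ted's mail quoted above (checksum each block, skip just the one bad block, then force a full fsck) might look roughly like the following. This is purely a sketch under stated assumptions: demo_jblock, demo_result, demo_block_csum and demo_replay_blocks are hypothetical names, the block size is tiny, and FNV-1a stands in for whatever checksum a real journal would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_BLOCK_SIZE 8	/* tiny block size, for illustration only */

/* Hypothetical logged block carrying its own checksum. */
struct demo_jblock {
	uint32_t target;			/* destination block index */
	uint32_t csum;				/* checksum of data[] */
	unsigned char data[DEMO_BLOCK_SIZE];
};

struct demo_result {
	int replayed;		/* blocks copied to the file system */
	int skipped;		/* blocks dropped due to a bad checksum */
	int need_full_fsck;	/* set as soon as one block is skipped */
};

/* FNV-1a, used here only as a stand-in checksum. */
static uint32_t demo_block_csum(const unsigned char *p, size_t n)
{
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < n; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/* Replay every logged block whose checksum matches; skip the rest. */
static struct demo_result
demo_replay_blocks(unsigned char fs[][DEMO_BLOCK_SIZE], size_t fs_blocks,
		   const struct demo_jblock *log, size_t nlog)
{
	struct demo_result r = { 0, 0, 0 };
	size_t i;

	for (i = 0; i < nlog; i++) {
		assert(log[i].target < fs_blocks);
		if (demo_block_csum(log[i].data, DEMO_BLOCK_SIZE) !=
		    log[i].csum) {
			r.skipped++;
			r.need_full_fsck = 1;
			continue;	/* the for loop still advances i */
		}
		memcpy(fs[log[i].target], log[i].data, DEMO_BLOCK_SIZE);
		r.replayed++;
	}
	return r;
}
```

A caller would run the replay and, if need_full_fsck is set afterwards, follow up with a forced full check, in the same spirit as the "e2fsck -f" run shown at the top of this mail.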
