linux-ext4.vger.kernel.org archive mirror
From: Ric Wheeler <rwheeler@redhat.com>
To: Theodore Tso <tytso@mit.edu>
Cc: Ext4 Developers List <linux-ext4@vger.kernel.org>
Subject: Re: [PATCH 2/2] ext4: Automatically enable journal_async_commit on ext4 file systems
Date: Fri, 11 Sep 2009 10:39:32 -0400
Message-ID: <4AAA6124.6090509@redhat.com>
In-Reply-To: <20090911131332.GD20710@mit.edu>

On 09/11/2009 09:13 AM, Theodore Tso wrote:
> On Fri, Sep 11, 2009 at 07:07:27AM -0400, Ric Wheeler wrote:
>    
>> I still think that we are changing from a situation in which the drive
>> state with regard to our transactions is almost always consistent to one
>> in which we will often not be consistent.
>>
>> More or less, moving from tight control of the persistent state on the
>> platter to a situation in which, after power failure, we will more often
>> see a bad transaction.  The checksum will catch those conditions, but
>> catching and repairing is not the same as avoiding the need to repair in
>> the first place :)
>>      
> We won't need to repair anything.  We still have a barrier before we
> allow the filesystem to proceed with writing back buffers or
> allocating blocks that aren't safe to be written back or allocated
> until after the commit.
>
> So if the checksum doesn't match, we simply discard the last commit,
> and the filesystem will be in a consistent state.  This case is
> analogous to what happens if we didn't have enough time to write the
> journal blocks plus the commit blocks before the crash.  By removing
> the barrier before the commit block, it's possible for the commit
> block to be written before the rest of the journal blocks, but we can
> treat this case the same way that we treat a missing commit block ---
> we simply throw away the last transaction.
>
>
> The problem that I've worried about in the past is what happens if we
> have a checksum failure on some commit block *other* than the last
> commit block in the journal.  In that case, we *will* need to do a
> full file system check and repair, and it is a toss up whether we are
> better off ignoring the checksum failure, replaying all of the
> journal transactions, and hoping that the checksum failure is caused by
> a corrupted data block that will be overwritten by a later
> transaction, or whether we abort the journal replay immediately and
> not replay the later transactions.  Currently we do the latter, but
> the problem is that since we have already started reusing blocks that
> might have been deleted in previous transactions, and some of the
> buffers pinned by previous transactions have already been written out,
> the file system will be in trouble.  This is where adding per-block
> checksums into the journal descriptor blocks might allow us to do a
> better job of recovering from failures in the journal.
>
> *However*, this problem is totally orthogonal to the async commit.
> In the case of the last transaction, where some journal blocks were
> written out before the commit block was written out, it is safe to
> throw away the last transaction and consider it simply a "not
> committed transaction".
>
>    
>> The key is really how can we measure the impact of this in a realistic
>> way. How many fsck's are needed after a power fail? Chris's directory
>> corruption test?
>>      
> So the test should be that there should be *zero* file system
> corruptions caused by a power failure.  (Unless the power fail induces
> a hardware error, of course; if the stress caused by the power drop
> causes a head crash, nothing we can do about that in software!)  The
> async commit patch should be that safe.  If we can confirm that, then
> the case for making it be the default mount option should be a
> no-brainer.
>
>         	      	     	     	       - Ted
>    

The above makes sense to me. Now we just need to figure out how to test 
properly and verify :-(
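
As a starting point for reasoning about what such a test has to check, here
is a toy model of the replay policy Ted describes above. It is only a sketch
under assumptions of mine: the Transaction class, crc_of() and replay() are
hypothetical names, the journal is just a Python list, and the checksum is a
plain CRC32 over the journal blocks. None of this is the jbd2 on-disk format
or its recovery code. It shows the two cases: a checksum mismatch on the last
transaction is treated like a missing commit block and discarded, while a
mismatch on an earlier transaction aborts replay and flags the file system
for a full check.

```python
import zlib
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Transaction:
    tid: int
    blocks: List[bytes]        # journal blocks as actually found on disk
    commit_crc: Optional[int]  # checksum stored in the commit block,
                               # or None if no commit block was written


def crc_of(blocks: List[bytes]) -> int:
    """CRC32 over all journal blocks of one transaction (toy checksum)."""
    crc = 0
    for block in blocks:
        crc = zlib.crc32(block, crc)
    return crc


def replay(journal: List[Transaction]) -> Tuple[List[int], str]:
    """Return (ids of replayed transactions, resulting state)."""
    replayed: List[int] = []
    for i, txn in enumerate(journal):
        is_last = i == len(journal) - 1
        if txn.commit_crc is None:
            # No commit block: the transaction never committed, stop here.
            return replayed, "clean"
        if crc_of(txn.blocks) != txn.commit_crc:
            if is_last:
                # Commit block hit the disk before all journal blocks did:
                # same as "not committed", throw the transaction away.
                return replayed, "clean"
            # Mismatch before the tail: later transactions may already
            # depend on this one, so abort replay and require a full check.
            return replayed, "needs_fsck"
        replayed.append(txn.tid)
    return replayed, "clean"


good = [b"a" * 8, b"b" * 8]
t1 = Transaction(1, good, crc_of(good))
# Commit block describes the intended blocks, but one block is torn on disk.
torn = Transaction(2, [b"c" * 8, b"d" * 7 + b"\0"],
                   crc_of([b"c" * 8, b"d" * 8]))
t3 = Transaction(3, good, crc_of(good))

print(replay([t1, torn]))      # ([1], 'clean')
print(replay([t1, torn, t3]))  # ([1], 'needs_fsck')
```

A real power-fail test would have to establish the first outcome on actual
hardware; the model only pins down which outcome we expect in each case.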

ric


Thread overview: 13+ messages
2009-09-05 22:32 [PATCH 1/2] ext4: Remove journal_checksum mount option and enable it by default Theodore Ts'o
2009-09-05 22:32 ` [PATCH 2/2] ext4: Automatically enable journal_async_commit on ext4 file systems Theodore Ts'o
2009-09-05 22:57   ` Andreas Dilger
2009-09-06  1:32     ` Theodore Tso
2009-09-06  2:57   ` Eric Sandeen
2009-09-07 23:48     ` Ric Wheeler
2009-09-07 23:42   ` Ric Wheeler
2009-09-08  4:45     ` Theodore Tso
2009-09-08 11:50       ` Ric Wheeler
2009-09-11  2:45         ` Theodore Tso
2009-09-11 11:07           ` Ric Wheeler
2009-09-11 13:13             ` Theodore Tso
2009-09-11 14:39               ` Ric Wheeler [this message]
