linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ric Wheeler <rwheeler@redhat.com>
To: Theodore Tso <tytso@mit.edu>
Cc: Andreas Dilger <adilger@sun.com>,
	Christian Fischer <Christian.Fischer@easterngraphics.com>,
	linux-ext4@vger.kernel.org
Subject: Re: Enable asynchronous commits by default patch revoked?
Date: Mon, 24 Aug 2009 16:28:16 -0400	[thread overview]
Message-ID: <4A92F7E0.9010001@redhat.com> (raw)
In-Reply-To: <20090824201027.GC17684@mit.edu>

Theodore Tso wrote:
> On Mon, Aug 24, 2009 at 12:31:19PM -0600, Andreas Dilger wrote:
>   
>>> No one has gotten around to looking at this closely; I think adding a
>>> strategically placed blkdev_issue_flush() will allow us to safely
>>> enable this feature, but it needs careful study.
>>>       
>> I don't think that was the issue, but rather that we wanted to have
>> per-block checksums in order to handle the case were some block in
>> transaction A is causing a transaction checksum failure, yet transaction
>> B has already committed and begun checkpointing.
>>     
>
> There are multiple problems that are going on here.
>
> Adding per-block checksums is useful in providing better resiliance in
> the case of write errors in the journal, and providing better error
> handling when we detect a checksum error in the commit block.
>   

My issue with the async commit is that it is basically a detection 
mechanism.

Drives will (almost always) write to platter sequential writes in order. 
Async commit lets us send down things out of order which means that we 
have a wider window of "bad state" for any given transaction...

ric

> However, even if we add even if we add per-block checksums, there is
> still the problem that there is logic in the jbd layer where we avoid
> reusing certain blocks until we are sure the transaction has
> committed.  With asynchronous commits, there is no way of knowing when
> it is safe to reuse those blocks.  
>
> To illustrate, consider a data block for an inode which has just been
> deleted.  We have code which prevents that block from being reused
> until the transaction has committed; but when we use asynchronous
> commits, we don't know when that will be.  Currently the async commit
> code assumes that once we send the commit block to be written, the
> commit has happened; this opens up a race where between when the
> commit record (and all of the associated journal blocks) is actually
> written to disk, and when a data block gets reused.
>
> Most of the time, this will cause silent corruption of a data file
> that was about to be deleted right before the power failure --- but if
> the block in question was part of a directory that was in the process
> of being deleted, it could result in a filesystem corruption
> detectable by e2fsck.
>
> Looking at the code, the best we can do in the short-term is to write
> the commit record where we do, but do so with a barrier requested, and
> then wait for the commit block where we do.  This will provide some
> performance improvement, since we won't wait for all of the journal
> blocks to be sent to disk before writing the commit record.  However,
> we still have to wait for the commit record (and all of the blocks
> before it) to be committed to stable store before we can mark the
> transaction as being truly committed.  So it's not a true "async
> commit", but it is a benefit of adding journal checksums.
>
> It might be possible in the long term to batch together multiple
> transaction in the "committing" state, instead of only allowing one
> transaction to be in the committing state, and to prevent blocks from
> being reused or allowing pinned buffer_heads from writing the contents
> of the blocks to their final location on disk until we are 100% sure
> the commit block and all of the associated journal blocks really have
> made it to disk.  However, it's probably not worth doing this, since
> the only time this would make a big difference is when we have several
> transactions closing within the space of a short time; and in that
> case the fsync() requires a barrier operation anyway.
>
> 							- Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>   


  reply	other threads:[~2009-08-24 20:29 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <200908241033.10527.Christian.Fischer@easterngraphics.com>
2009-08-24 13:34 ` Enable asynchronous commits by default patch revoked? Theodore Tso
2009-08-24 18:31   ` Andreas Dilger
2009-08-24 18:37     ` Ric Wheeler
2009-08-24 20:10     ` Theodore Tso
2009-08-24 20:28       ` Ric Wheeler [this message]
2009-08-24 22:07         ` Theodore Tso
2009-08-24 22:12           ` Ric Wheeler
2009-08-24 23:28             ` Theodore Tso
2009-08-24 23:43               ` Andreas Dilger
2009-08-25  0:15                 ` Theodore Tso
2009-08-25 17:52                   ` Andreas Dilger
2009-08-25 18:07                     ` Ric Wheeler
2009-08-25 21:11                       ` Theodore Tso
2009-08-26  9:50                         ` Andreas Dilger
2009-08-26 13:14                           ` Theodore Tso
2009-08-26 22:00                             ` Andreas Dilger
2009-08-26 22:55                               ` Theodore Tso
2009-08-25 18:21                     ` Ric Wheeler
2009-08-26 16:02                   ` Jan Kara
2009-08-24 22:46           ` Andreas Dilger
2009-08-24 23:52             ` Theodore Tso
2009-09-02 14:48           ` Tom Vier
2009-09-02 15:03             ` Theodore Tso
2009-08-24 21:28       ` Andreas Dilger
2009-08-25  6:16   ` Christian Fischer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4A92F7E0.9010001@redhat.com \
    --to=rwheeler@redhat.com \
    --cc=Christian.Fischer@easterngraphics.com \
    --cc=adilger@sun.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).