linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Chris Mason <chris.mason@oracle.com>
To: Jamie Lokier <jamie@shareable.org>
Cc: Andi Kleen <andi@firstfloor.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Eric Sandeen <sandeen@redhat.com>,
	linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 0/4] (RESEND) ext3[34] barrier changes
Date: Tue, 20 May 2008 13:08:03 -0400	[thread overview]
Message-ID: <200805201308.03956.chris.mason@oracle.com> (raw)
In-Reply-To: <20080520162710.GM16676@shareable.org>

On Tuesday 20 May 2008, Jamie Lokier wrote:
> Chris Mason wrote:
> > > You don't need the barrier after in some cases, or it can be deferred
> > > until a better time.  E.g. when the disk write cache is probably empty
> > > (some time after write-idle), barrier flushes may take the same time
> > > as NOPs.
> >
> > I hesitate to get too fancy here, if the disk is idle we probably
> > won't notice the performance gain.
>
> I think you're right, but it's hard to be sure.  One of the problems
> with barrier-implemented-as-flush-all is that it flushes data=ordered
> data, even when that's not wanted, and there can be a lot of data in
> the disk's write cache, spread over many seeks.

Jens and I talked about tossing the barriers completely and just doing FUA for 
all metadata writes.  For drives with NCQ, we'll get something close to 
optimal because the higher layer elevators are already doing most of the hard 
work.

Either way, you do want the flush to cover all the data=ordered writes, at 
least all the ordered writes from the transaction you're about to commit.  
Telling the difference between data=ordered from an old transaction or from 
the running transaction gets into pushing ordering down to the lower levels 
(see below)

>
> Then it's good to delay barrier-flushes to batch metadata commits, but
> good to issue the barrier-flushes prior to large batches of
> data=ordered data, so the latter can be survive in the disk write
> cache for seek optimisations with later requests which aren't yet
> known.
>
> All this sounds complicated at the JBD layer, and IMHO much simpler at
> the request elevator layer.
>
> > But, it complicates the decision about when you're allowed to dirty
> > a metadata block for writeback.  It used to be dirty-after-commit
> > and it would change to dirty-after-barrier.  I suspect that is some
> > significant surgery into jbd.
>
> Rather than tracking when it's "allowed" to dirty a metadata block, it
> will be simpler to keep a flag saying "barrier needed", and just issue
> the barrier prior to writing a metadata block, if the flag is set.
>

Adding explicit ordering into the IO path is really interesting.  We toss a 
bunch of IO down to the lower layers with information about dependencies and 
let the lower layers figure it out.  James had a bunch of ideas here, but I'm 
afraid the only people that understood it were James and the whiteboard he 
was scribbling on.

The trick is to code the ordering in such a way that an IO failure breaks the 
chain, and that the filesystem has some sensible chance to deal with all 
these requests that have failed because an earlier write failed.

Also, once we go down the ordering road, it is really tempting to forget that 
ordering does ensure consistency but doesn't tell us the write actually 
happened.  fsync and friends need to hook into the dependency chain to wait 
for the barrier instead of waiting for the commit.

But, back to the short term for a second, what we need are some benchmarks for 
barriers on and off and some guidance from the ext34 maintainers about 
turning them on by default.  We shouldn't be pushing this FS integrity 
decision off on the distros.

My test prog is definitely a worst case, but I'm pretty confident that most 
mail server workloads end up doing similar IO.

A 16MB or 32MB disk cache is common these days, and that is a very sizable 
percentage of the jbd log size.  I think the potential for corruptions on 
power failure is only growing over time.

-chris

  reply	other threads:[~2008-05-20 17:08 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-05-16 19:02 [PATCH 0/4] (RESEND) ext3[34] barrier changes Eric Sandeen
2008-05-16 19:05 ` [PATCH 1/4] ext3: enable barriers by default Eric Sandeen
2008-05-19  8:58   ` Pavel Machek
2008-05-16 19:07 ` [PATCH 2/4] ext3: call blkdev_issue_flush on fsync Eric Sandeen
2008-05-16 22:15   ` Jamie Lokier
2008-05-16 19:08 ` [PATCH 3/4] ext4: enable barriers by default Eric Sandeen
2008-05-16 19:09 ` [PATCH 4/4] ext4: call blkdev_issue_flush on fsync Eric Sandeen
2008-05-20  2:34   ` Theodore Tso
2008-05-20 15:43     ` Jamie Lokier
2008-05-20 15:52       ` Eric Sandeen
2008-05-20 19:54       ` Jens Axboe
2008-05-20 22:02         ` Jamie Lokier
2008-05-21  7:30           ` Jens Axboe
     [not found]       ` <4832F3C6.1050601@redhat.com>
2008-05-20 20:14         ` Jens Axboe
2008-05-16 20:05 ` [PATCH 0/4] (RESEND) ext3[34] barrier changes Andrew Morton
2008-05-16 20:53   ` Eric Sandeen
2008-05-16 20:58     ` Andrew Morton
2008-05-16 21:45       ` Jamie Lokier
2008-05-16 22:03         ` Eric Sandeen
2008-05-16 22:09           ` Jamie Lokier
2008-05-16 22:03     ` Jamie Lokier
2008-05-16 22:21       ` Eric Sandeen
2008-05-16 22:53         ` Jamie Lokier
2008-05-17  0:20           ` Theodore Tso
2008-05-17  0:35             ` Andrew Morton
2008-05-17 13:43               ` Theodore Tso
2008-05-17 17:59                 ` Andreas Dilger
2008-05-17 20:44                 ` Theodore Tso
2008-05-20 14:45                   ` Jamie Lokier
2008-05-18  0:48               ` Chris Mason
2008-05-18  1:36                 ` Theodore Tso
2008-05-18 14:49                   ` Ric Wheeler
     [not found]                   ` <4830420D.4080608@gmail.com>
2008-05-20 14:42                     ` Jamie Lokier
2008-05-20 23:48                     ` Jamie Lokier
2008-05-20 23:44                 ` Jamie Lokier
2008-05-18 20:03         ` Andi Kleen
2008-05-19  0:43           ` Theodore Tso
2008-05-19  2:29             ` Eric Sandeen
     [not found]             ` <4830E60A.2010809@redhat.com>
2008-05-19  4:11               ` Andrew Morton
2008-05-19 17:16                 ` Chris Mason
2008-05-19 18:39                   ` Chris Mason
2008-05-19 22:39                     ` Jan Kara
2008-05-20  0:29                       ` Chris Mason
2008-05-20  3:29                         ` Timothy Shimmin
2008-05-20 12:04                           ` Chris Mason
2008-05-20  8:25                     ` Jens Axboe
2008-05-20 12:17                       ` Chris Mason
2008-05-21 11:22                     ` Pavel Machek
2008-05-21 12:32                       ` Theodore Tso
2008-05-21 18:03                       ` Andrew Morton
2008-05-21 18:15                         ` Eric Sandeen
2008-05-21 19:43                           ` Jamie Lokier
2008-05-21 18:29                         ` Theodore Tso
2008-05-21 18:49                           ` Andrew Morton
2008-05-21 19:42                             ` Jamie Lokier
2008-05-21 19:36                           ` Jamie Lokier
     [not found]                           ` <20080521193633.GA26780@shareable.org>
2008-05-21 19:40                             ` Chris Mason
2008-05-21 19:54                         ` Jamie Lokier
2008-05-20 14:58                   ` Jamie Lokier
2008-05-21 22:30                   ` Daniel Phillips
2008-05-20 23:35               ` Jamie Lokier
2008-05-19  0:28       ` Theodore Tso
2008-05-20 15:13         ` Jamie Lokier
     [not found]         ` <20080520151306.GF16676@shareable.org>
2008-05-21 20:25           ` Greg Smith
2008-05-16 22:30   ` Jamie Lokier
2008-05-18 19:54   ` Andi Kleen
2008-05-19 13:26     ` Chris Mason
2008-05-19 14:46       ` Theodore Tso
2008-05-20  2:51         ` [PATCH, RFC] ext4: Fix use of write barrier in commit logic Theodore Tso
     [not found]         ` <20080520025112.GN15035@mit.edu>
2008-05-20 15:23           ` Jamie Lokier
2008-05-23 18:33         ` [PATCH 0/4] (RESEND) ext3[34] barrier changes Ric Wheeler
2008-05-20 15:36       ` Jamie Lokier
2008-05-20 16:02         ` Chris Mason
2008-05-20 16:27           ` Jamie Lokier
2008-05-20 17:08             ` Chris Mason [this message]
2008-05-20 22:26               ` Jamie Lokier
2008-05-19  9:04   ` Pavel Machek
2008-05-29 13:36   ` Eric Sandeen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200805201308.03956.chris.mason@oracle.com \
    --to=chris.mason@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=andi@firstfloor.org \
    --cc=jamie@shareable.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sandeen@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).