From: Jamie Lokier <jamie@shareable.org>
To: Chris Mason <chris.mason@oracle.com>
Cc: Andi Kleen <andi@firstfloor.org>,
Andrew Morton <akpm@linux-foundation.org>,
Eric Sandeen <sandeen@redhat.com>,
linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 0/4] (RESEND) ext3[34] barrier changes
Date: Tue, 20 May 2008 17:27:10 +0100
Message-ID: <20080520162710.GM16676@shareable.org>
In-Reply-To: <200805201202.54420.chris.mason@oracle.com>

Chris Mason wrote:
> > You don't need the barrier after in some cases, or it can be deferred
> > until a better time. E.g. when the disk write cache is probably empty
> > (some time after write-idle), barrier flushes may take the same time
> > as NOPs.
>
> I hesitate to get too fancy here, if the disk is idle we probably
> won't notice the performance gain.

I think you're right, but it's hard to be sure. One of the problems
with barrier-implemented-as-flush-all is that it flushes data=ordered
data, even when that's not wanted, and there can be a lot of data in
the disk's write cache, spread over many seeks.

In that case it's good to delay barrier-flushes to batch metadata
commits, but also good to issue the barrier-flushes before large
batches of data=ordered data, so that the latter can survive in the
disk write cache for seek optimisations with later requests which
aren't yet known.

All this sounds complicated at the JBD layer, and IMHO much simpler at
the request elevator layer.

> But, it complicates the decision about when you're allowed to dirty
> a metadata block for writeback. It used to be dirty-after-commit
> and it would change to dirty-after-barrier. I suspect that is some
> significant surgery into jbd.

Rather than tracking when it's "allowed" to dirty a metadata block, it
would be simpler to keep a flag saying "barrier needed", and just issue
the barrier before writing a metadata block, if the flag is set.

So metadata write scheduling doesn't need to change at all. That will
be quite simple.

You might still change the scheduling, but only as a performance
heuristic, in whatever ways turn out to be easy.

Really, that flag should live in the request elevator instead, where
it could do more good. I.e. WRITE_BARRIER wouldn't actually issue a
barrier op to disk after writing. It would just set a request
elevator flag, so that a barrier op is issued before the next WRITE.

That road opens up some nice optimisations on software RAID, which
aren't possible if it's done at the JBD layer.

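A minimal sketch of the deferred "barrier needed" flag, assuming a toy model of the elevator in which WRITE_BARRIER means "flush, write, flush" and the trailing flush is merely recorded as owed. `struct sim_elevator`, `sim_write()` and `sim_write_barrier()` are hypothetical names invented here; this is not the block layer's API, and the operation log only stands in for requests sent to a real disk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model, not the Linux block layer: every name here is
 * invented for illustration. */
struct sim_elevator {
    bool barrier_pending; /* a WRITE_BARRIER's trailing flush is still owed */
    bool cache_dirty;     /* writes sent to the disk since the last flush */
    char log[256];        /* ops sent to the disk, e.g. "W1 FLUSH W100 " */
};

static void log_op(struct sim_elevator *e, const char *op)
{
    size_t len = strlen(e->log);
    size_t n = strlen(op);

    assert(len + n + 1 < sizeof e->log); /* room for op, ' ' and NUL */
    memcpy(e->log + len, op, n);
    e->log[len + n] = ' ';
    e->log[len + n + 1] = '\0';
}

static void disk_flush(struct sim_elevator *e)
{
    log_op(e, "FLUSH");
    e->cache_dirty = false;
    e->barrier_pending = false; /* any flush settles an owed barrier */
}

static void disk_write(struct sim_elevator *e, int block)
{
    char op[16];

    snprintf(op, sizeof op, "W%d", block);
    log_op(e, op);
    e->cache_dirty = true;
}

/* WRITE: if a barrier is owed, glue the flush onto the front. */
void sim_write(struct sim_elevator *e, int block)
{
    if (e->barrier_pending)
        disk_flush(e);
    disk_write(e, block);
}

/* WRITE_BARRIER: earlier writes must be stable before this block, so
 * flush first if the cache may hold any; the flush that would normally
 * follow is only recorded as owed. */
void sim_write_barrier(struct sim_elevator *e, int block)
{
    if (e->cache_dirty)
        disk_flush(e);
    disk_write(e, block);
    e->barrier_pending = true;
}
```

In this model WRITE 1, WRITE_BARRIER 100, WRITE 2 produces `W1 FLUSH W100 FLUSH W2`, and two back-to-back barriers share a single flush between them instead of a flush after the first plus another before the second. The catch, which the model ignores, is that a commit is not durable until its owed flush is issued, so something like fsync() or an idle timer would have to force it.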
> Also, since a commit isn't really done until the barrier is done, you can't
> reuse blocks freed by the committing transaction until after the barrier,
> which means changes in the deletion handling code.

Good point.

In this case the time of re-allocation isn't the problem: actually
writing to the blocks is. Writes to a recycled block must be ordered
after the commit which freed it.

As above, just issue the barrier before the next write which needs to
be ordered; effectively it's glued onto the front of the write op.

This comes for free with no change to the deletion code (wow :-) if
the only operations are WRITE_BARRIER (= flush before and after, or
equivalent) and WRITE (ordered by WRITE_BARRIER).

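A minimal sketch of why recycled blocks come for free, under toy assumptions: a flush makes everything submitted so far stable, and the checker tests the conservative condition that the commit which freed a block is already stable when the block is rewritten. All names (`struct sim_disk`, `sim_write()`, `sim_write_barrier()`, `sim_submit()`) are hypothetical, and `freed_by[]` exists only for the check, not as a change to any real deletion code.

```c
#include <assert.h>
#include <stdbool.h>

#define NBLOCKS 16

/* Hypothetical model for illustration only. Each write gets a sequence
 * number; a flush makes every write submitted so far stable. */
struct sim_disk {
    int next_seq;          /* sequence number of the next submitted write */
    int stable_upto;       /* writes with seq < stable_upto are stable */
    bool barrier_pending;  /* trailing flush of the last WRITE_BARRIER owed */
    int freed_by[NBLOCKS]; /* seq of the commit that freed a block, or -1 */
    bool ordering_ok;      /* cleared if a recycled block is written too early */
};

static void sim_init(struct sim_disk *d)
{
    d->next_seq = 0;
    d->stable_upto = 0;
    d->barrier_pending = false;
    d->ordering_ok = true;
    for (int i = 0; i < NBLOCKS; i++)
        d->freed_by[i] = -1;
}

static void sim_flush(struct sim_disk *d)
{
    d->stable_upto = d->next_seq;
    d->barrier_pending = false;
}

/* Hand one block to the disk. Conservative check of the recycled-block
 * rule: the commit that freed the block must already be stable. */
static void sim_submit(struct sim_disk *d, int block)
{
    assert(block >= 0 && block < NBLOCKS);
    if (d->freed_by[block] >= d->stable_upto)
        d->ordering_ok = false;
    d->freed_by[block] = -1; /* block is live again */
    d->next_seq++;
}

/* WRITE: glue a flush onto the front if a barrier is owed. */
void sim_write(struct sim_disk *d, int block)
{
    if (d->barrier_pending)
        sim_flush(d);
    sim_submit(d, block);
}

/* WRITE_BARRIER for a commit block whose transaction freed `freed`
 * (-1 for none): flush before, write, defer the flush after. */
void sim_write_barrier(struct sim_disk *d, int commit_block, int freed)
{
    sim_flush(d);
    int seq = d->next_seq;
    sim_submit(d, commit_block);
    if (freed >= 0) {
        assert(freed < NBLOCKS);
        d->freed_by[freed] = seq;
    }
    d->barrier_pending = true;
}
```

Reusing block 7 through `sim_write()` right after the commit that freed it passes the check, because the owed flush is glued onto the front of the write; submitting the same write while bypassing the flag fails it. Nothing about when block 7 may be reallocated had to change.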
> > What's more, barriers can be deferred past data=ordered in-place data
> > writes, although that's not always an optimisation.
> >
>
> It might be really interesting to have a
> i'm-about-to-barrier-find-some-io-to-run call. Something along the lines of
> draining the dirty pages when the drive is woken up in laptop mode. There's
> lots of fun with page lock vs journal lock ordering, but Jan has a handle on
> that I think.

I suspect the opposite might be better:
I'm-about-to-barrier-please-move-the-barrier-in-front-of-unordered-writes.

The more writes you _don't_ flush synchronously, the more
opportunities you give the disk's cache to reduce seeking.

It's only a hunch though.

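A minimal sketch of the "move the barrier in front of unordered writes" idea, treated as a reordering of a hypothetical request queue. It assumes each queued write is already tagged as ordered (must be stable before the next barrier, as data=ordered data is) or unordered; `struct sim_req`, `move_down()` and `hoist_barriers()` are invented for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical queued request, not a real block-layer structure. */
struct sim_req {
    int block;
    bool is_barrier; /* commit block submitted as WRITE_BARRIER */
    bool ordered;    /* must be stable before the next barrier */
};

/* Move q[from] down to index `to` (to <= from), shifting q[to..from-1]
 * up by one slot. */
static void move_down(struct sim_req *q, size_t from, size_t to)
{
    struct sim_req tmp = q[from];

    assert(to <= from);
    if (from > to)
        memmove(&q[to + 1], &q[to], (from - to) * sizeof *q);
    q[to] = tmp;
}

/* Within each segment ending at a barrier, move the ordered writes and
 * then the barrier ahead of the unordered writes, keeping relative order.
 * Ordered writes still precede their barrier; unordered writes now follow
 * its flush and can stay in the disk cache. */
static void hoist_barriers(struct sim_req *q, size_t n)
{
    size_t dst = 0; /* next slot for an ordered write or barrier */

    for (size_t i = 0; i < n; i++) {
        bool barrier = q[i].is_barrier;

        if (barrier || q[i].ordered)
            move_down(q, i, dst++);
        if (barrier)
            dst = i + 1; /* next segment starts after this barrier */
    }
}
```

For a queue of W1 (unordered), W2 (ordered), barrier C100, W3 (unordered), the result is W2, C100, W1, W3, so W1 no longer has to be flushed along with W2 before the commit.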
-- Jamie