From: Linus Torvalds <torvalds@linux-foundation.org>
To: Theodore Tso <tytso@mit.edu>
Cc: Arjan van de Ven <arjan@infradead.org>,
Jens Axboe <jens.axboe@oracle.com>,
Linux Kernel Developers List <linux-kernel@vger.kernel.org>,
Ext4 Developers List <linux-ext4@vger.kernel.org>
Subject: Re: [GIT PULL] Ext3 latency fixes
Date: Sun, 5 Apr 2009 10:01:06 -0700 (PDT) [thread overview]
Message-ID: <alpine.LFD.2.00.0904050927580.4023@localhost.localdomain> (raw)
In-Reply-To: <20090405001005.GA7553@mit.edu>
On Sat, 4 Apr 2009, Theodore Tso wrote:
>
> It may be that it's easier from an less-lines-of-the-kernels-to-change
> point of view to add a WRITE_SYNC_PLUGGED which doesn't do the unplug,
> and keep WRITE_SNYC as having the implicit unplug. Or maybe it would
> be better to completely separate the "send a write which is marked as
> SYNC" from the "implicit unplug" in the API.
Well, the block layer internally already separates the two, it's just
WRITE_SYNC that ties them together. So a trivial patch like the appended
would make WRITE_SYNC just mean "this is synchronous IO" without the
unplugging, and then WRITE_UNPLUG would mark it both as synchronous _and_
unplug it.
(Or you could do the WRITE_SYNC_PLUGGED model too, but I think WRITE_SYNC
and WRITE_UNPLUG actually would be better names, since they talk about the
effects more directly).
The question is more of a "who really wants the unplugging". Presumably
the direct-io code-path really does (on the assumption that if you are
doing direct-io, the caller had better done all the coalescing it wants).
Non-rotational media would tend to want the unplugging, but the block
layer already does that (ie in __make_request() we already do:
if (unplug || blk_queue_nonrot(q))
__generic_unplug_device(q);
so non-rotational devices get unplugged whether the request was a
unplugging request or not).
Of course, different IO schedulers react differently to that whole "sync
vs unplug" thing. I think CFQ is the only one that actually cares about
the "sync" bit (using different queues for sync vs async). The other
schedulers only care about the plugging. So the patch below really doesn't
make much sense as-is, because as things are right now, the scheduler
behaviors are so different for the unplug-vs-sync thing that no sane user
can ever know whether they should use WRITE_SYNC (== higher priority
queueing for CFQ, no-op for others) or WRITE_UNPLUG (unplug on all, and
additionally higher priority for CFQ).
So I'm not serious about this patch, because as things are right now, I
don't think it's sensible, but I do think that this just generally shows
the confusion of what we should do. When is plugging right, when isn't it?
Also, one of the issues seems to literally be that the higher-level
request handling doesn't care AT ALL about the priority. Allocating the
request itself does keep reads and writes separated, but if the request is
a SYNCIO request, and non-sync writes have filled up th write requests,
we'll have to wait for the background writes to free up requests.
That is quite possibly the longest wait we have in the system.
See get_request():
const int rw = rw_flags & 0x01;
(That should _really_ be RW_MASK instead of 0x01, I suspect) and how the
request allocation itself is done.
I think this is broken.
Linus
---
include/linux/fs.h | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index a09e17c..a144377 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -95,7 +95,8 @@ struct inodes_stat_t {
#define SWRITE 3 /* for ll_rw_block() - wait for buffer lock */
#define READ_SYNC (READ | (1 << BIO_RW_SYNCIO) | (1 << BIO_RW_UNPLUG))
#define READ_META (READ | (1 << BIO_RW_META))
-#define WRITE_SYNC (WRITE | (1 << BIO_RW_SYNCIO) | (1 << BIO_RW_UNPLUG))
+#define WRITE_SYNC (WRITE | (1 << BIO_RW_SYNCIO))
+#define WRITE_UNPLUG (WRITE | (1 << BIO_RW_SYNCIO) | (1 << BIO_RW_UNPLUG))
#define SWRITE_SYNC (SWRITE | (1 << BIO_RW_SYNCIO) | (1 << BIO_RW_UNPLUG))
#define WRITE_BARRIER (WRITE | (1 << BIO_RW_BARRIER))
#define DISCARD_NOBARRIER (1 << BIO_RW_DISCARD)
next prev parent reply other threads:[~2009-04-05 17:05 UTC|newest]
Thread overview: 69+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-04-03 7:01 [GIT PULL] Ext3 latency fixes Theodore Ts'o
2009-04-03 7:01 ` [PATCH 1/4] block_write_full_page: Use synchronous writes for WBC_SYNC_ALL writebacks Theodore Ts'o
2009-04-03 7:01 ` [PATCH 2/4] ext3: Use WRITE_SYNC for commits which are caused by fsync() Theodore Ts'o
2009-04-03 7:01 ` [PATCH 3/4] ext3: Add replace-on-truncate hueristics for data=writeback mode Theodore Ts'o
2009-04-03 7:01 ` [PATCH 4/4] ext3: Add replace-on-rename " Theodore Ts'o
2009-04-03 15:01 ` EXT4 in embedded systems Nick Hennenfent (nhennefe)
2009-04-03 16:06 ` Eric Sandeen
2009-04-03 17:15 ` Nick Hennenfent (nhennefe)
2009-04-03 18:24 ` [GIT PULL] Ext3 latency fixes Linus Torvalds
2009-04-03 18:47 ` Jens Axboe
2009-04-03 19:13 ` Theodore Tso
2009-04-03 21:01 ` Chris Mason
2009-04-03 19:02 ` Linus Torvalds
2009-04-03 20:41 ` Linus Torvalds
2009-04-04 13:57 ` Theodore Tso
2009-04-04 15:16 ` Jens Axboe
2009-04-04 15:57 ` Linus Torvalds
2009-04-04 16:06 ` Linus Torvalds
2009-04-04 17:36 ` Jens Axboe
2009-04-04 17:34 ` Jens Axboe
2009-04-04 17:44 ` Linus Torvalds
2009-04-04 18:00 ` Trenton D. Adams
2009-04-04 18:01 ` Jens Axboe
2009-04-04 18:10 ` Linus Torvalds
2009-04-04 23:22 ` Theodore Tso
2009-04-04 23:33 ` Arjan van de Ven
2009-04-05 0:10 ` Theodore Tso
2009-04-05 15:05 ` Arjan van de Ven
2009-04-05 17:01 ` Linus Torvalds [this message]
2009-04-05 17:15 ` Mark Lord
2009-04-05 20:57 ` Jeff Garzik
2009-04-05 23:48 ` Arjan van de Ven
2009-04-06 2:32 ` Mark Lord
2009-04-06 5:47 ` Jeff Garzik
2009-04-07 18:18 ` Linus Torvalds
2009-04-07 18:22 ` Linus Torvalds
2009-04-06 8:13 ` Jens Axboe
2009-04-05 18:56 ` Arjan van de Ven
2009-04-05 19:34 ` Linus Torvalds
2009-04-05 20:06 ` Arjan van de Ven
2009-04-06 6:25 ` Jens Axboe
2009-04-06 6:05 ` Theodore Tso
2009-04-06 6:23 ` Jens Axboe
2009-04-06 8:16 ` Jens Axboe
2009-04-06 14:48 ` Linus Torvalds
2009-04-06 15:09 ` Jens Axboe
2009-04-06 6:15 ` Jens Axboe
2009-04-04 20:18 ` Ingo Molnar
2009-04-06 21:50 ` Lennart Sorensen
2009-04-07 13:31 ` Mark Lord
2009-04-07 14:48 ` Lennart Sorensen
2009-04-07 19:21 ` Mark Lord
2009-04-07 19:57 ` Lennart Sorensen
2009-04-04 20:56 ` Arjan van de Ven
2009-04-06 7:06 ` Jens Axboe
2009-04-07 15:39 ` Indan Zupancic
2009-04-04 19:18 ` Theodore Tso
2009-04-06 8:12 ` Jens Axboe
2009-04-04 22:13 ` Linus Torvalds
2009-04-04 22:19 ` Linus Torvalds
2009-04-05 0:20 ` Theodore Tso
2009-04-03 19:54 ` Theodore Tso
-- strict thread matches above, loose matches on Subject: below --
2009-04-08 23:40 Theodore Ts'o
2009-04-09 15:49 ` Linus Torvalds
2009-04-09 16:23 ` Chris Mason
2009-04-09 17:49 ` Jan Kara
2009-04-09 18:10 ` Chris Mason
2009-04-09 19:04 ` Jan Kara
2009-04-09 17:36 ` Jan Kara
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.LFD.2.00.0904050927580.4023@localhost.localdomain \
--to=torvalds@linux-foundation.org \
--cc=arjan@infradead.org \
--cc=jens.axboe@oracle.com \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).