From: Vivek Goyal <vgoyal@redhat.com>
To: Christoph Hellwig <hch@lst.de>
Cc: Ted Ts'o <tytso@mit.edu>, Tejun Heo <tj@kernel.org>,
Jan Kara <jack@suse.cz>,
jaxboe@fusionio.com, James.Bottomley@suse.de,
linux-fsdevel@vger.kernel.org, linux-scsi@vger.kernel.org,
chris.mason@oracle.com, swhiteho@redhat.com,
konishi.ryusuke@lab.ntt.co.jp
Subject: Re: [RFC] relaxed barrier semantics
Date: Thu, 29 Jul 2010 23:17:21 -0400
Message-ID: <20100730031721.GA31762@redhat.com>
In-Reply-To: <20100729200655.GB17767@lst.de>
On Thu, Jul 29, 2010 at 10:06:55PM +0200, Christoph Hellwig wrote:
> On Thu, Jul 29, 2010 at 04:02:17PM -0400, Vivek Goyal wrote:
> > Looks like you still want to go with option 2 where you will scan the file
> > system code for requirement of DRAIN semantics and everything is fine then for
> > devices no supporting volatile caches, you will mark request queue as NONE.
>
> The filesystem can't simply change the request queue settings. A request
> queue is often shared by multiple filesystems that can have very
> different requirements.
>
> > This solves the problem on devices with WCE=0 but what about devices with
> > WCE=1. If file systems anyway don't require DRAIN semantics, then we
> > should not require it on devices with WCE=1 also?
>
> Yes.
>
> > If yes, then why not go with another variant of barriers which don't
> > perform DRAIN and just do PREFLUSH + FUA (or post flush for devices not
> > supporting FUA).
>
> I've been trying to prototype it, but it's in fact rather hard to
> get this right. Tejun has done a really good job at the current
> barrier implementation and coming up with something just half as
> clever for the relaxed barriers has been driving me mad.
>
> > And then file systems can slowly move to using this non
> > draining barrier usage wherever appropriate.
>
> Actually supporting different kind of barriers at the same time
> is even harder. We'll need two different state machines for them,
> including the actual state in the request_queue. And then make
> sure when different filesystems on the same queue use different
> types work well together. If at all possible switching the semantics
> on a flag day would make life a lot simpler.

Hi Christoph,

I was looking at the barrier code and trying to work out how hard it
would be to support a new barrier type which does not implement DRAIN
but only does PREFLUSH + FUA for devices with WCE=1.

To me it looks as if everything is already there and it is just a matter
of skipping elevator draining and request queue draining.

Can you please have a look at the attached patch? This is not a complete
patch, just the part we would need if we were to implement another
barrier type, say FLUSHBARRIER. Do you think this will work, or am I
blissfully unaware of the complexity here and oversimplifying things?
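
To make the intended difference concrete, here is a rough sketch (not
part of the attached patch, and not kernel code) of which sequence steps
each barrier type would issue. The STEP_* bits, struct dev_caps and
plan_barrier() are made-up names for illustration only; the real logic
lives in start_ordered() and blk_do_ordered().

```c
#include <stdbool.h>

/* Hypothetical step bits, loosely modelled on the QUEUE_ORDSEQ_* idea. */
enum {
	STEP_DRAIN     = 1 << 0, /* wait for in-flight requests to finish */
	STEP_PREFLUSH  = 1 << 1, /* flush the write cache before the barrier */
	STEP_BAR       = 1 << 2, /* the barrier write itself */
	STEP_POSTFLUSH = 1 << 3, /* flush again when FUA is not available */
};

/* Hypothetical, simplified description of the device. */
struct dev_caps {
	bool wce; /* volatile write cache enabled (WCE=1) */
	bool fua; /* device supports forced unit access */
};

/*
 * Return the set of steps for one barrier request.
 * flush_only == false models the existing draining barrier,
 * flush_only == true models the proposed FLUSHBARRIER.
 */
static unsigned int plan_barrier(struct dev_caps d, bool flush_only)
{
	unsigned int steps = STEP_BAR;

	/* Only the existing barrier type drains the queue. */
	if (!flush_only)
		steps |= STEP_DRAIN;

	/* Cache flushes are only needed with a volatile write cache. */
	if (d.wce) {
		steps |= STEP_PREFLUSH;
		if (!d.fua)
			steps |= STEP_POSTFLUSH; /* post flush replaces FUA */
	}

	return steps;
}
```

In this model a flush-only barrier on a WCE=0 device reduces to just the
barrier write, i.e. it can be issued as a normal request, while on a
WCE=1 device it becomes PREFLUSH + barrier (+ post flush without FUA).
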
Thanks
Vivek
---
block/blk-barrier.c | 14 +++++++++++++-
block/elevator.c | 3 ++-
include/linux/blkdev.h | 5 ++++-
3 files changed, 19 insertions(+), 3 deletions(-)
Index: linux-2.6/include/linux/blkdev.h
===================================================================
--- linux-2.6.orig/include/linux/blkdev.h 2010-06-19 09:54:32.000000000 -0400
+++ linux-2.6/include/linux/blkdev.h 2010-07-29 22:36:52.000000000 -0400
@@ -97,6 +97,7 @@ enum rq_flag_bits {
__REQ_SORTED, /* elevator knows about this request */
__REQ_SOFTBARRIER, /* may not be passed by ioscheduler */
__REQ_HARDBARRIER, /* may not be passed by drive either */
+ __REQ_FLUSHBARRIER, /* only flush barrier. no drains required */
__REQ_FUA, /* forced unit access */
__REQ_NOMERGE, /* don't touch this for merging */
__REQ_STARTED, /* drive already may have started this one */
@@ -126,6 +127,7 @@ enum rq_flag_bits {
#define REQ_SORTED (1 << __REQ_SORTED)
#define REQ_SOFTBARRIER (1 << __REQ_SOFTBARRIER)
#define REQ_HARDBARRIER (1 << __REQ_HARDBARRIER)
+#define REQ_FLUSHBARRIER (1 << __REQ_FLUSHBARRIER)
#define REQ_FUA (1 << __REQ_FUA)
#define REQ_NOMERGE (1 << __REQ_NOMERGE)
#define REQ_STARTED (1 << __REQ_STARTED)
@@ -626,6 +628,7 @@ enum {
#define blk_rq_cpu_valid(rq) ((rq)->cpu != -1)
#define blk_sorted_rq(rq) ((rq)->cmd_flags & REQ_SORTED)
#define blk_barrier_rq(rq) ((rq)->cmd_flags & REQ_HARDBARRIER)
+#define blk_flush_barrier_rq(rq) ((rq)->cmd_flags & REQ_FLUSHBARRIER)
#define blk_fua_rq(rq) ((rq)->cmd_flags & REQ_FUA)
#define blk_discard_rq(rq) ((rq)->cmd_flags & REQ_DISCARD)
#define blk_bidi_rq(rq) ((rq)->next_rq != NULL)
@@ -681,7 +684,7 @@ static inline void blk_clear_queue_full(
* it already be started by driver.
*/
#define RQ_NOMERGE_FLAGS \
- (REQ_NOMERGE | REQ_STARTED | REQ_HARDBARRIER | REQ_SOFTBARRIER)
+ (REQ_NOMERGE | REQ_STARTED | REQ_HARDBARRIER | REQ_SOFTBARRIER | REQ_FLUSHBARRIER)
#define rq_mergeable(rq) \
(!((rq)->cmd_flags & RQ_NOMERGE_FLAGS) && \
(blk_discard_rq(rq) || blk_fs_request((rq))))
Index: linux-2.6/block/blk-barrier.c
===================================================================
--- linux-2.6.orig/block/blk-barrier.c 2010-06-19 09:54:29.000000000 -0400
+++ linux-2.6/block/blk-barrier.c 2010-07-29 23:02:05.000000000 -0400
@@ -219,7 +219,8 @@ static inline bool start_ordered(struct
} else
skip |= QUEUE_ORDSEQ_PREFLUSH;
- if ((q->ordered & QUEUE_ORDERED_BY_DRAIN) && queue_in_flight(q))
+ if ((q->ordered & QUEUE_ORDERED_BY_DRAIN) && queue_in_flight(q)
+ && !blk_flush_barrier_rq(rq))
rq = NULL;
else
skip |= QUEUE_ORDSEQ_DRAIN;
@@ -241,6 +242,17 @@ bool blk_do_ordered(struct request_queue
if (!q->ordseq) {
if (!is_barrier)
return true;
+ /*
+ * For flush-only barriers, nothing has to be done if there is
+ * no caching happening on the device. The barrier request
+ * still has to be written to disk, but it can be written as a
+ * normal rq.
+ */
+
+ if (blk_flush_barrier_rq(rq)
+ && (q->ordered & QUEUE_ORDERED_BY_DRAIN
+ || q->ordered & QUEUE_ORDERED_BY_TAG))
+ return true;
if (q->next_ordered != QUEUE_ORDERED_NONE)
return start_ordered(q, rqp);
Index: linux-2.6/block/elevator.c
===================================================================
--- linux-2.6.orig/block/elevator.c 2010-06-19 09:54:29.000000000 -0400
+++ linux-2.6/block/elevator.c 2010-07-29 23:06:21.000000000 -0400
@@ -628,7 +628,8 @@ void elv_insert(struct request_queue *q,
case ELEVATOR_INSERT_BACK:
rq->cmd_flags |= REQ_SOFTBARRIER;
- elv_drain_elevator(q);
+ if (!blk_flush_barrier_rq(rq))
+ elv_drain_elevator(q);
list_add_tail(&rq->queuelist, &q->queue_head);
/*
* We kick the queue here for the following reasons.