Re: blk-mq flush fix

From: Shaohua Li <shli@kernel.org>
To: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@infradead.org>,
	Alexander Gordeev <agordeev@redhat.com>,
	Tejun Heo <tj@kernel.org>,
	Nicholas Bellinger <nab@linux-iscsi.org>,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: blk-mq flush fix
Date: Tue, 29 Oct 2013 03:47:41 +0800
Message-ID: <20131028194741.GA24664@kernel.org>
In-Reply-To: <526EBD51.1010804@kernel.dk>

On Mon, Oct 28, 2013 at 01:38:57PM -0600, Jens Axboe wrote:
> On 10/28/2013 10:57 AM, Shaohua Li wrote:
> > 2013/10/28 Jens Axboe <axboe@kernel.dk>
> > 
> >     On 10/28/2013 02:48 AM, Christoph Hellwig wrote:
> >     > On Sun, Oct 27, 2013 at 10:29:25PM +0000, Jens Axboe wrote:
> >     >> On Sat, Oct 26 2013, Christoph Hellwig wrote:
> >     >>> I think this variant of the patch from Alexander should fix the issue
> >     >>> in a minimally invasive way.  Longer term I'd prefer to use q->flush_rq
> >     >>> like in the non-mq case by copying over the context and tag information.
> >     >>
> >     >> This one is pretty simple, we could definitely use it as a band aid. I
> >     >> too would greatly prefer using the static ->flush_rq instead. Just have
> >     >> it marked to bypass most of the free logic.
> >     >
> >     > We already bypass the free logic by setting an end_io callback for
> >     > a while, similar to what the old code does.  Maybe it's not all that
> >     > hard to prealloc the request, let me give it a spin.  Using the statically
> >     > allocated one will be hard due to the driver-specific extra data,
> >     > though.
> > 
> >     It's not that I think the existing patch is THAT bad, it fits in alright
> >     with the reserved tagging and works regardless of whether a driver uses
> >     reserved tags or not. And it does have the upside of not requiring
> >     special checks or logic for this special non-tagged request that using
> >     the preallocated one might need.
> > 
> >     >> I'll add this one.
> >     >
> >     > Gimme another day or so to figure this out.
> > 
> >     OK, holding off.
> > 
> > 
> > Another option: we could throttle flush-request allocation in
> > blk_mq_alloc_request(), for example, flush_req_nr >= max_tags - 1, make
> > the allocation wait.
> 
> That could work too. If we back off, then we could restart it once a
> request completes. That does, however, require checking for that and
> potentially kicking all the queues on completion when that happens.

That doesn't sound like a big problem, because the case where flush requests
use up all the tags is very rare. The upside is that we avoid reserving a tag,
which is precious.

I cooked up a patch to demonstrate the idea. It is compile-tested only.

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 3e4cc9c..192c2aa 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -284,7 +284,6 @@ static void mq_flush_work(struct work_struct *work)
 
 	q = container_of(work, struct request_queue, mq_flush_work);
 
-	/* We don't need set REQ_FLUSH_SEQ, it's for consistency */
 	rq = blk_mq_alloc_request(q, WRITE_FLUSH|REQ_FLUSH_SEQ,
 		__GFP_WAIT|GFP_ATOMIC);
 	rq->cmd_type = REQ_TYPE_FS;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index ac804c6..fbbe0cc 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -180,8 +180,21 @@ static void blk_mq_rq_ctx_init(struct blk_mq_ctx *ctx, struct request *rq,
 }
 
 static struct request *__blk_mq_alloc_request(struct blk_mq_hw_ctx *hctx,
-					      gfp_t gfp, bool reserved)
+					      gfp_t gfp, bool reserved,
+					      int rw)
 {
+
+	/*
+	 * A flush needs to allocate another request, so leave at least one
+	 * request for non-flush I/O to avoid deadlock.
+	 */
+	if ((rw & REQ_FLUSH) && !(rw & REQ_FLUSH_SEQ)) {
+		atomic_inc(&hctx->pending_flush);
+		/* fall back to a waiting allocation */
+		if (atomic_read(&hctx->pending_flush) >= hctx->queue_depth -
+		    hctx->reserved_tags - 1)
+			return NULL;
+	}
 	return blk_mq_alloc_rq(hctx, gfp, reserved);
 }
 
@@ -195,7 +208,7 @@ static struct request *blk_mq_alloc_request_pinned(struct request_queue *q,
 		struct blk_mq_ctx *ctx = blk_mq_get_ctx(q);
 		struct blk_mq_hw_ctx *hctx = q->mq_ops->map_queue(q, ctx->cpu);
 
-		rq = __blk_mq_alloc_request(hctx, gfp & ~__GFP_WAIT, reserved);
+		rq = __blk_mq_alloc_request(hctx, gfp & ~__GFP_WAIT, reserved, rw);
 		if (rq) {
 			blk_mq_rq_ctx_init(ctx, rq, rw);
 			break;
@@ -253,6 +266,10 @@ static void __blk_mq_free_request(struct blk_mq_hw_ctx *hctx,
 	const int tag = rq->tag;
 	struct request_queue *q = rq->q;
 
+	if ((rq->cmd_flags & REQ_FLUSH) && !(rq->cmd_flags & REQ_FLUSH_SEQ)) {
+		atomic_dec(&hctx->pending_flush);
+	}
+
 	blk_mq_rq_init(hctx, rq);
 	blk_mq_put_tag(hctx->tags, tag);
 
@@ -918,7 +935,7 @@ static void blk_mq_make_request(struct request_queue *q, struct bio *bio)
 	hctx = q->mq_ops->map_queue(q, ctx->cpu);
 
 	trace_block_getrq(q, bio, rw);
-	rq = __blk_mq_alloc_request(hctx, GFP_ATOMIC, false);
+	rq = __blk_mq_alloc_request(hctx, GFP_ATOMIC, false, rw);
 	if (likely(rq))
 		blk_mq_rq_ctx_init(ctx, rq, rw);
 	else {
@@ -1202,6 +1219,7 @@ static int blk_mq_init_hw_queues(struct request_queue *q,
 		hctx->queue_num = i;
 		hctx->flags = reg->flags;
 		hctx->queue_depth = reg->queue_depth;
+		hctx->reserved_tags = reg->reserved_tags;
 		hctx->cmd_size = reg->cmd_size;
 
 		blk_mq_init_cpu_notifier(&hctx->cpu_notifier,
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 3368b97..0f81528 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -36,12 +36,15 @@ struct blk_mq_hw_ctx {
 	struct list_head	page_list;
 	struct blk_mq_tags	*tags;
 
+	atomic_t		pending_flush;
+
 	unsigned long		queued;
 	unsigned long		run;
 #define BLK_MQ_MAX_DISPATCH_ORDER	10
 	unsigned long		dispatched[BLK_MQ_MAX_DISPATCH_ORDER];
 
 	unsigned int		queue_depth;
+	unsigned int		reserved_tags;
 	unsigned int		numa_node;
 	unsigned int		cmd_size;	/* per-request extra data */
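
For anyone who wants to poke at the accounting outside the kernel, here is a
minimal userspace sketch of the throttling idea, not the patch itself. It uses
C11 atomics in place of the kernel's atomic_t. The struct hw_ctx_model and the
flush_throttle_get()/flush_throttle_put() helpers are hypothetical names made
up for illustration. It keeps the same threshold as the patch
(queue_depth - reserved_tags - 1) and assumes, beyond what the patch does,
that the counter should be rolled back when the caller is told to back off,
so that a retry does not count the same flush twice.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the fields the patch adds to blk_mq_hw_ctx. */
struct hw_ctx_model {
	atomic_int pending_flush;	/* flushes currently accounted for */
	int queue_depth;		/* total number of tags */
	int reserved_tags;		/* tags set aside for reserved use */
};

/*
 * Account for one more flush request.  Returns true if the caller may go
 * on to allocate a tag, false if it should back off and wait.  The count
 * is rolled back on failure (an assumption of this sketch, see above).
 */
static bool flush_throttle_get(struct hw_ctx_model *hctx)
{
	int limit = hctx->queue_depth - hctx->reserved_tags - 1;

	/* atomic_fetch_add() returns the old value, so +1 is the new count. */
	if (atomic_fetch_add(&hctx->pending_flush, 1) + 1 >= limit) {
		atomic_fetch_sub(&hctx->pending_flush, 1);
		return false;
	}
	return true;
}

/* Drop the accounting when a flush request is freed. */
static void flush_throttle_put(struct hw_ctx_model *hctx)
{
	atomic_fetch_sub(&hctx->pending_flush, 1);
}
```

With queue_depth = 4 and reserved_tags = 0 the limit is 3, so two flushes are
admitted and the third is told to wait until one of the first two is freed.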
