Linux-NVME Archive on lore.kernel.org
From: ming.lei@redhat.com (Ming Lei)
Subject: [PATCH V6 6/9] blk-mq: always free hctx after request queue is freed
Date: Wed, 24 Apr 2019 09:45:36 +0800	[thread overview]
Message-ID: <20190424014535.GD634@ming.t460p> (raw)
In-Reply-To: <20190424011242.GB634@ming.t460p>

On Wed, Apr 24, 2019 at 09:12:42AM +0800, Ming Lei wrote:
> On Tue, Apr 23, 2019 at 04:07:49PM +0200, Hannes Reinecke wrote:
> > On 4/23/19 3:30 PM, Ming Lei wrote:
> > > Hi Hannes,
> > > 
> > > Thanks for your response.
> > > 
> > > > On Tue, Apr 23, 2019 at 01:19:33PM +0200, Hannes Reinecke wrote:
> > > > On 4/22/19 5:30 AM, Ming Lei wrote:
> > [ .. ]
> > > > > 
> > > > > Hi Hannes,
> > > > > 
> > > > > Could you please let us know if you have better idea for this issue?
> > > > > Otherwise, I think we need to move on since it is real issue, and users
> > > > > want to fix that.
> > > > > 
> > > > Okay. Having looked over the problem and possible alternatives, it looks
> > > > indeed like a viable solution.
> > > > I do agree that it's a sensible design to have an additional holding area
> > > > for hardware context elements, given that they might be reassigned during
> > > > blk_mq_realloc_hw_ctxs().
> > > > 
> > > > However, I would rename the 'dead' elements to 'unused' (ie unused_hctx_list
> > > > etc).
> > > 
> > > OK, the name 'unused' looks better.
> > > 
> > > > 
> > > > And I would deallocate q->queue_hw_ctx in blk_mq_exit_queue() to make things
> > > > more consistent.
> > > 
> > > No, that is wrong.
> > > 
> > > The request queue's refcount is often still held while blk_cleanup_queue() is
> > > running, and blk_mq_exit_queue() is called directly by blk_cleanup_queue(). One
> > > invariant is that most APIs have to keep working as long as the request queue
> > > is live from the kobject view. If q->queue_hw_ctx were freed in
> > > blk_mq_exit_queue(), it would be quite easy to cause a use-after-free.
> > > 
> > Ah. Thought as much.
> > But then in most cases the ->queue_hw_ctx pointer is immaterial as we're
> > accessing things via the hctx pointer, which remains valid.
> > 
> > > > Problem with the current patch is that in blk_mq_release() we iterate
> > > > the 'dead_hctx_list' and free up everything in there, but then blindly call
> > > > 'kfree(q->queue_hw_ctx)' without checking if there are any context pointers
> > > > left.
> > > 
> > > If the request queue is dead, it is safe to assume that there isn't any
> > > reference left to the request queue or q->queue_hw_ctx. Otherwise, it
> > > must be a bug somewhere.
> > > 
> > Precisely.
> > What I'm trying to achieve with this is to protect against such issues,
> > which are quite easy to introduce given the complexity of the code ...
> 
> But releasing q->queue_hw_ctx in blk_cleanup_queue() can easily cause a
> use-after-free even though the request queue's refcount is held, so we
> simply can't do that.
> 
> If someone is still trying to use q->queue_hw_ctx[] after the request
> queue is dead, the bug is in the caller of the block layer API, not in
> the block layer.
> 
> What this patchset tries to fix is the race in the block layer itself,
> not in users of the block layer, and not in drivers. So far, I haven't
> seen such a driver issue.
> 
> If you just think of q->queue_hw_ctx as one of the request queue's
> resources, you will see it is pretty reasonable to free q->queue_hw_ctx
> in the queue's release handler.
> 
> > 
> > > > When moving the call to blk_mq_exit_queue() we could do a simple
> > > > WARN_ON(q->queue_hw_ctx) in blk_mq_release() to ensure that everything got
> > > > deallocated properly.
> > > 
> > > At that time, hctx instances might still be active, but that is fine given
> > > each hctx is covered by its own kobject. What we need to do is to make sure
> > > there are no references left to q->queue_hw_ctx or the request queue.
> > > 
> > My point being here:
> > void blk_mq_release(struct request_queue *q)
> > {
> >         struct blk_mq_hw_ctx *hctx, *next;
> > 
> >         cancel_delayed_work_sync(&q->requeue_work);
> > 
> >         /* all hctx are in .dead_hctx_list now */
> >         list_for_each_entry_safe(hctx, next, &q->dead_hctx_list, hctx_list) {
> >                 list_del_init(&hctx->hctx_list);
> >                 kobject_put(&hctx->kobj);
> >         }
> > 
> >         kfree(q->queue_hw_ctx);
> > 
> >         /*
> >          * release .mq_kobj and sw queue's kobject now because
> >          * both share lifetime with request queue.
> >          */
> >         blk_mq_sysfs_deinit(q);
> > }
> > 
> > This assumes that _all_ hctx pointers are being removed from
> > q->queue_hw_ctx, and are moved to the 'dead' list.
> > If for some reason this is not the case we'll be leaking hctx pointers here.
> 
> IMO, there is no such reason. When blk_mq_release() is called, every hctx
> of this request queue has been exited via blk_mq_exit_hctx(), either from
> blk_mq_update_nr_hw_queues() or blk_cleanup_queue().
> 
> If there are hctxs that haven't been moved to the 'dead' (or 'unused') list
> here, it is simply a bug in blk-mq. However, I don't see such a case now.

Or we can add the following check in blk_mq_release() to capture that
(supposedly impossible) blk-mq bug:

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 2ca4395f794d..f0d08087b8f6 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2366,6 +2366,8 @@ blk_mq_alloc_hctx(struct request_queue *q,
 	hctx->queue = q;
 	hctx->flags = set->flags & ~BLK_MQ_F_TAG_SHARED;
 
+	INIT_LIST_HEAD(&hctx->hctx_list);
+
 	/*
 	 * Allocate space for all possible cpus to avoid allocation at
 	 * runtime
@@ -2680,9 +2682,14 @@ static int blk_mq_alloc_ctxs(struct request_queue *q)
 void blk_mq_release(struct request_queue *q)
 {
 	struct blk_mq_hw_ctx *hctx, *next;
+	int i;
 
 	cancel_delayed_work_sync(&q->requeue_work);
 
+	/* warn if live hctx is found, that shouldn't happen */
+	queue_for_each_hw_ctx(q, hctx, i)
+		WARN_ON_ONCE(hctx && list_empty(&hctx->hctx_list));
+
 	/* all hctx are in .dead_hctx_list now */
 	list_for_each_entry_safe(hctx, next, &q->dead_hctx_list, hctx_list) {
 		list_del_init(&hctx->hctx_list);

Thanks,
Ming


Thread overview: 25+ messages
2019-04-17  3:44 [PATCH V6 0/9] blk-mq: fix races related with freeing queue Ming Lei
2019-04-17  3:44 ` [PATCH V6 1/9] blk-mq: grab .q_usage_counter when queuing request from plug code path Ming Lei
2019-04-17  3:44 ` [PATCH V6 2/9] blk-mq: move cancel of requeue_work into blk_mq_release Ming Lei
2019-04-17 12:00   ` Hannes Reinecke
2019-04-17  3:44 ` [PATCH V6 3/9] blk-mq: free hw queue's resource in hctx's release handler Ming Lei
2019-04-17 12:02   ` Hannes Reinecke
2019-04-17  3:44 ` [PATCH V6 4/9] blk-mq: move all hctx allocation & initialization into __blk_mq_alloc_and_init_hctx Ming Lei
2019-04-17 12:03   ` Hannes Reinecke
2019-04-17  3:44 ` [PATCH V6 5/9] blk-mq: split blk_mq_alloc_and_init_hctx into two parts Ming Lei
2019-04-17  3:44 ` [PATCH V6 6/9] blk-mq: always free hctx after request queue is freed Ming Lei
2019-04-17 12:08   ` Hannes Reinecke
2019-04-17 12:59     ` Ming Lei
2019-04-22  3:30       ` Ming Lei
2019-04-23 11:19         ` Hannes Reinecke
2019-04-23 13:30           ` Ming Lei
2019-04-23 14:07             ` Hannes Reinecke
2019-04-24  1:12               ` Ming Lei
2019-04-24  1:45                 ` Ming Lei [this message]
2019-04-24  5:55                   ` Hannes Reinecke
2019-04-17  3:44 ` [PATCH V6 7/9] blk-mq: move cancel of hctx->run_work into blk_mq_hw_sysfs_release Ming Lei
2019-04-17  3:44 ` [PATCH V6 8/9] block: don't drain in-progress dispatch in blk_cleanup_queue() Ming Lei
2019-04-17  3:44 ` [PATCH V6 9/9] nvme: hold request queue's refcount in ns's whole lifetime Ming Lei
2019-04-17 12:10   ` Hannes Reinecke
2019-04-17 15:55   ` Keith Busch
2019-04-17 17:22 ` [PATCH V6 0/9] blk-mq: fix races related with freeing queue James Smart
