From: Ming Lei <ming.lei@redhat.com>
To: John Garry <john.garry@huawei.com>
Cc: axboe@kernel.dk, linux-block@vger.kernel.org,
linux-kernel@vger.kernel.org, hch@lst.de, hare@suse.de,
ppvk@codeaurora.org, bvanassche@acm.org,
kashyap.desai@broadcom.com
Subject: Re: [RFC PATCH] blk-mq: Clean up references when freeing rqs
Date: Thu, 3 Dec 2020 08:55:05 +0800 [thread overview]
Message-ID: <20201203005505.GB540033@T590> (raw)
In-Reply-To: <aaf77015-3039-6b04-3417-d376e3467444@huawei.com>
On Wed, Dec 02, 2020 at 11:18:31AM +0000, John Garry wrote:
> On 02/12/2020 03:31, Ming Lei wrote:
> > On Tue, Dec 01, 2020 at 09:02:18PM +0800, John Garry wrote:
> > > It has been reported many times that a use-after-free can be intermittently
> > > found when iterating busy requests:
> > >
> > > - https://lore.kernel.org/linux-block/8376443a-ec1b-0cef-8244-ed584b96fa96@huawei.com/
> > > - https://lore.kernel.org/linux-block/5c3ac5af-ed81-11e4-fee3-f92175f14daf@acm.org/T/#m6c1ac11540522716f645d004e2a5a13c9f218908
> > > - https://lore.kernel.org/linux-block/04e2f9e8-79fa-f1cb-ab23-4a15bf3f64cc@kernel.dk/
> > >
> > > The issue is that when we switch scheduler or change queue nr_requests,
> > > the driver tagset may keep references to the stale requests.
> > >
> > > As a solution, clean up any references to those requests in the driver
> > > tagset when freeing. This is done with a cmpxchg to make safe any race
> > > with setting the driver tagset request from another queue.
> > >
> > > Signed-off-by: John Garry <john.garry@huawei.com>
> > > --
> > > Set as RFC as I need to test more. And not sure on solution method, as
> > > Bart had another idea.
> > >
> > > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> > > index d1eafe2c045c..9b042c7036b3 100644
> > > --- a/block/blk-mq-sched.c
> > > +++ b/block/blk-mq-sched.c
> > > @@ -621,7 +621,7 @@ void blk_mq_sched_free_requests(struct request_queue *q)
> > > queue_for_each_hw_ctx(q, hctx, i) {
> > > if (hctx->sched_tags)
> > > - blk_mq_free_rqs(q->tag_set, hctx->sched_tags, i);
> > > + blk_mq_free_rqs_ext(q->tag_set, hctx->sched_tags, i, hctx->tags);
> > > }
> > > }
> > > diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> > > index 9c92053e704d..562db72e7d79 100644
> > > --- a/block/blk-mq-tag.c
> > > +++ b/block/blk-mq-tag.c
> > > @@ -576,7 +576,7 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
> > > return -ENOMEM;
> > > }
> > > - blk_mq_free_rqs(set, *tagsptr, hctx->queue_num);
> > > + blk_mq_free_rqs_ext(set, *tagsptr, hctx->queue_num, hctx->tags);
> > > blk_mq_free_rq_map(*tagsptr, flags);
> > > *tagsptr = new;
> > > } else {
> > > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > > index 55bcee5dc032..f3aad695cd25 100644
> > > --- a/block/blk-mq.c
> > > +++ b/block/blk-mq.c
> > > @@ -2271,8 +2271,8 @@ blk_qc_t blk_mq_submit_bio(struct bio *bio)
> > > return BLK_QC_T_NONE;
> > > }
> > > -void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
> > > - unsigned int hctx_idx)
> > > +void blk_mq_free_rqs_ext(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
> > > + unsigned int hctx_idx, struct blk_mq_tags *references)
> > > {
> > > struct page *page;
> > > @@ -2281,10 +2281,13 @@ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
> > > for (i = 0; i < tags->nr_tags; i++) {
> > > struct request *rq = tags->static_rqs[i];
> > > + int j;
> > > if (!rq)
> > > continue;
> > > set->ops->exit_request(set, rq, hctx_idx);
> > > + for (j = 0; references && j < references->nr_tags; j++)
> > > + cmpxchg(&references->rqs[j], rq, 0);
> >
> > Seems you didn't address the comment in the following link:
> >
> > https://lore.kernel.org/linux-block/10331543-9e45-ae63-8cdb-17e5a2a3b7ef@huawei.com/
> >
> > The request to be freed may still be refered in another path, such as blk_mq_queue_tag_busy_iter
> > or blk_mq_tagset_busy_iter(), and cmpxchg() doesn't drain/wait for other refers.
> >
>
> Hi Ming,
>
> Yeah, so I said that was another problem which you mentioned there, which
> I'm not addressing, but I don't think that I'm making thing worse here.
The thing is that this patch does not fix the issue completely.
>
> So AFAICS, the blk-mq/sched code doesn't wait for any "readers" to be
> finished, such as those running blk_mq_queue_tag_busy_iter or
> blk_mq_tagset_busy_iter() in another context.
>
> So how about the idea of introducing some synchronization primitive, such as
> semaphore, which those "readers" must grab and release at start and end (of
> iter), to ensure the requests are not freed during the iteration?
It looks good, however devil is in details, please make into patch for
review.
thanks,
Ming
next prev parent reply other threads:[~2020-12-03 0:57 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-12-01 13:02 [RFC PATCH] blk-mq: Clean up references when freeing rqs John Garry
2020-12-02 3:31 ` Ming Lei
2020-12-02 11:18 ` John Garry
2020-12-03 0:55 ` Ming Lei [this message]
2020-12-03 9:26 ` John Garry
2020-12-08 11:36 ` John Garry
2020-12-08 17:36 ` John Garry
2020-12-09 1:01 ` Ming Lei
2020-12-09 9:55 ` John Garry
2020-12-10 2:07 ` Ming Lei
2020-12-10 10:44 ` John Garry
2020-12-10 12:22 ` John Garry
2020-12-11 0:21 ` Ming Lei
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20201203005505.GB540033@T590 \
--to=ming.lei@redhat.com \
--cc=axboe@kernel.dk \
--cc=bvanassche@acm.org \
--cc=hare@suse.de \
--cc=hch@lst.de \
--cc=john.garry@huawei.com \
--cc=kashyap.desai@broadcom.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=ppvk@codeaurora.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).