From: Ming Lei <ming.lei@redhat.com>
To: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: luojiaxing <luojiaxing@huawei.com>, Jens Axboe <axboe@kernel.dk>,
linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
"Martin K. Petersen" <martin.petersen@oracle.com>,
John Garry <john.garry@huawei.com>
Subject: Re: [PATCH] blk-mq: avoid to iterate over stale request
Date: Wed, 15 Dec 2021 11:36:22 +0800 [thread overview]
Message-ID: <YblitnLqJtkK/xBt@T590> (raw)
In-Reply-To: <0d8666c9983158a4954f30f6b429e797@mail.gmail.com>
Hello Kashyap,
On Wed, Dec 15, 2021 at 12:11:13AM +0530, Kashyap Desai wrote:
> + John Garry
>
> > blk-mq can't run allocating driver tag and updating ->rqs[tag]
> atomically,
> > meantime blk-mq doesn't clear ->rqs[tag] after the driver tag is
> released.
> >
> > So there is chance to iterating over one stale request just after the
> tag is
> > allocated and before updating ->rqs[tag].
> >
> > scsi_host_busy_iter() calls scsi_host_check_in_flight() to count scsi
> in-flight
> > requests after scsi host is blocked, so no new scsi command can be
> marked as
> > SCMD_STATE_INFLIGHT. However, driver tag allocation still can be run by
> blk-
> > mq core. One request is marked as SCMD_STATE_INFLIGHT, but this request
> > may have been kept in another slot of ->rqs[], meantime the slot can be
> > allocated out but ->rqs[] isn't updated yet. Then this in-flight request
> is
> > counted twice as SCMD_STATE_INFLIGHT. This way causes trouble in
> handling
> > scsi error.
>
> Hi Ming,
>
> We found similar issue on RHEL8.5 (kernel does not have this patch in
> discussion.). Issue reproduced on 5.15 kernel as well.
> I understood this commit will fix specific race condition and avoid
> reading incorrect host_busy value.
> As per commit message - That incorrect host_busy will be just transient.
> If we read after some delay, correct host_busy count will be available.
> Right ?
Yeah, any counter(include atomic counter) works in this way.
But here it may be 'permanent' because one stale request pointer may stay in
one slot of ->rqs[] for long enough time if this slot isn't reused, meantime
the same request can be reallocated in case of real io scheduler. Maybe the
commit log should be improved a bit for making it explicit.
>
> In my case (I am using shared host tag enabled driver), it is not race
> condition issue but stale rqs[] entries create permanent incorrect count
> of host_busy.
> Example - There are two pending IOs. This IOs are timed out. Bitmap of
> pending IO is tag#5 (actually belongs to hctx0), tag#10 (actually belongs
> to hctx1). Note - This is a shared bit map.
> If hctx0 has same address of the request at 5th and 10th index, we will
It shouldn't be possible, since ->rqs[] is per-tags. If it is shared bit
map, both tag#5 and tag#10 are set, and both shared_tags->rqs[5] &
shared_tags->rqs[10] should point to the updated requests(timed out).
> count total 2 inflight commands instead of 1 from hctx0 context + From
> hctx1 context, we will count 1 inflight command = Total is 3.
> Even though we read after some delay, host_busy will be incorrect. We
> expect host_busy = 2 but it will return 3.
>
> This patch fix my issue explained above for shared host-tag case. I am
> confused reading the commit message. You may not have intentionally fix
> the issue as I explained but indirectly it fixes my issue. Am I correct ?
>
> What was an issue reported by Luojiaxiang ? I am interested to know if
> issue reported by Luojiaxiang had shared host tagset enabled ?
https://lore.kernel.org/linux-scsi/fe5cf6c4-ce5e-4a0f-f4ab-5c10539492cb@huawei.com/
It should be same with yours, and the original report described & explained
the issue in details.
thanks,
Ming
>
> Kashyap
>
> >
> > Fixes the issue by not iterating over stale request.
> >
> > Cc: linux-scsi@vger.kernel.org
> > Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
> > Reported-by: luojiaxing <luojiaxing@huawei.com>
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> > block/blk-mq-tag.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c index
> > 86f87346232a..ff5caeb82542 100644
> > --- a/block/blk-mq-tag.c
> > +++ b/block/blk-mq-tag.c
> > @@ -208,7 +208,7 @@ static struct request
> > *blk_mq_find_and_get_req(struct blk_mq_tags *tags,
> >
> > spin_lock_irqsave(&tags->lock, flags);
> > rq = tags->rqs[bitnr];
> > - if (!rq || !refcount_inc_not_zero(&rq->ref))
> > + if (!rq || rq->tag != bitnr || !refcount_inc_not_zero(&rq->ref))
> > rq = NULL;
> > spin_unlock_irqrestore(&tags->lock, flags);
> > return rq;
> > --
> > 2.31.1
--
Ming
next prev parent reply other threads:[~2021-12-15 3:36 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-09-06 6:50 [PATCH] blk-mq: avoid to iterate over stale request Ming Lei
2021-09-06 22:44 ` Bart Van Assche
2021-09-07 1:14 ` Ming Lei
2021-09-13 1:26 ` Ming Lei
2021-09-13 1:31 ` Jens Axboe
2021-12-14 18:41 ` Kashyap Desai
2021-12-15 3:36 ` Ming Lei [this message]
2021-12-15 7:30 ` Kashyap Desai
2021-12-15 8:02 ` Ming Lei
2021-12-15 8:45 ` Kashyap Desai
2021-12-15 12:56 ` Ming Lei
2021-12-16 11:55 ` Kashyap Desai
2021-12-16 12:50 ` Ming Lei
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YblitnLqJtkK/xBt@T590 \
--to=ming.lei@redhat.com \
--cc=axboe@kernel.dk \
--cc=john.garry@huawei.com \
--cc=kashyap.desai@broadcom.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-scsi@vger.kernel.org \
--cc=luojiaxing@huawei.com \
--cc=martin.petersen@oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox