From: Ming Lei <ming.lei@redhat.com>
To: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: luojiaxing <luojiaxing@huawei.com>, Jens Axboe <axboe@kernel.dk>,
linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
"Martin K. Petersen" <martin.petersen@oracle.com>,
John Garry <john.garry@huawei.com>
Subject: Re: [PATCH] blk-mq: avoid to iterate over stale request
Date: Wed, 15 Dec 2021 11:36:22 +0800 [thread overview]
Message-ID: <YblitnLqJtkK/xBt@T590> (raw)
In-Reply-To: <0d8666c9983158a4954f30f6b429e797@mail.gmail.com>
Hello Kashyap,
On Wed, Dec 15, 2021 at 12:11:13AM +0530, Kashyap Desai wrote:
> + John Garry
>
> > blk-mq can't run allocating driver tag and updating ->rqs[tag]
> atomically,
> > meantime blk-mq doesn't clear ->rqs[tag] after the driver tag is
> released.
> >
> > So there is chance to iterating over one stale request just after the
> tag is
> > allocated and before updating ->rqs[tag].
> >
> > scsi_host_busy_iter() calls scsi_host_check_in_flight() to count scsi
> in-flight
> > requests after scsi host is blocked, so no new scsi command can be
> marked as
> > SCMD_STATE_INFLIGHT. However, driver tag allocation still can be run by
> blk-
> > mq core. One request is marked as SCMD_STATE_INFLIGHT, but this request
> > may have been kept in another slot of ->rqs[], meantime the slot can be
> > allocated out but ->rqs[] isn't updated yet. Then this in-flight request
> is
> > counted twice as SCMD_STATE_INFLIGHT. This way causes trouble in
> handling
> > scsi error.
>
> Hi Ming,
>
> We found similar issue on RHEL8.5 (kernel does not have this patch in
> discussion.). Issue reproduced on 5.15 kernel as well.
> I understood this commit will fix specific race condition and avoid
> reading incorrect host_busy value.
> As per commit message - That incorrect host_busy will be just transient.
> If we read after some delay, correct host_busy count will be available.
> Right ?
Yeah, any counter(include atomic counter) works in this way.
But here it may be 'permanent' because one stale request pointer may stay in
one slot of ->rqs[] for long enough time if this slot isn't reused, meantime
the same request can be reallocated in case of real io scheduler. Maybe the
commit log should be improved a bit for making it explicit.
>
> In my case (I am using shared host tag enabled driver), it is not race
> condition issue but stale rqs[] entries create permanent incorrect count
> of host_busy.
> Example - There are two pending IOs. This IOs are timed out. Bitmap of
> pending IO is tag#5 (actually belongs to hctx0), tag#10 (actually belongs
> to hctx1). Note - This is a shared bit map.
> If hctx0 has same address of the request at 5th and 10th index, we will
It shouldn't be possible, since ->rqs[] is per-tags. If it is shared bit
map, both tag#5 and tag#10 are set, and both shared_tags->rqs[5] &
shared_tags->rqs[10] should point to the updated requests(timed out).
> count total 2 inflight commands instead of 1 from hctx0 context + From
> hctx1 context, we will count 1 inflight command = Total is 3.
> Even though we read after some delay, host_busy will be incorrect. We
> expect host_busy = 2 but it will return 3.
>
> This patch fix my issue explained above for shared host-tag case. I am
> confused reading the commit message. You may not have intentionally fix
> the issue as I explained but indirectly it fixes my issue. Am I correct ?
>
> What was an issue reported by Luojiaxiang ? I am interested to know if
> issue reported by Luojiaxiang had shared host tagset enabled ?
https://lore.kernel.org/linux-scsi/fe5cf6c4-ce5e-4a0f-f4ab-5c10539492cb@huawei.com/
It should be same with yours, and the original report described & explained
the issue in details.
thanks,
Ming
>
> Kashyap
>
> >
> > Fixes the issue by not iterating over stale request.
> >
> > Cc: linux-scsi@vger.kernel.org
> > Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
> > Reported-by: luojiaxing <luojiaxing@huawei.com>
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> > block/blk-mq-tag.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c index
> > 86f87346232a..ff5caeb82542 100644
> > --- a/block/blk-mq-tag.c
> > +++ b/block/blk-mq-tag.c
> > @@ -208,7 +208,7 @@ static struct request
> > *blk_mq_find_and_get_req(struct blk_mq_tags *tags,
> >
> > spin_lock_irqsave(&tags->lock, flags);
> > rq = tags->rqs[bitnr];
> > - if (!rq || !refcount_inc_not_zero(&rq->ref))
> > + if (!rq || rq->tag != bitnr || !refcount_inc_not_zero(&rq->ref))
> > rq = NULL;
> > spin_unlock_irqrestore(&tags->lock, flags);
> > return rq;
> > --
> > 2.31.1
--
Ming
next prev parent reply other threads:[~2021-12-15 3:36 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-09-06 6:50 [PATCH] blk-mq: avoid to iterate over stale request Ming Lei
2021-09-06 22:44 ` Bart Van Assche
2021-09-07 1:14 ` Ming Lei
2021-09-13 1:26 ` Ming Lei
2021-09-13 1:31 ` Jens Axboe
2021-12-14 18:41 ` Kashyap Desai
2021-12-15 3:36 ` Ming Lei [this message]
2021-12-15 7:30 ` Kashyap Desai
2021-12-15 8:02 ` Ming Lei
2021-12-15 8:45 ` Kashyap Desai
2021-12-15 12:56 ` Ming Lei
2021-12-16 11:55 ` Kashyap Desai
2021-12-16 12:50 ` Ming Lei
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YblitnLqJtkK/xBt@T590 \
--to=ming.lei@redhat.com \
--cc=axboe@kernel.dk \
--cc=john.garry@huawei.com \
--cc=kashyap.desai@broadcom.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-scsi@vger.kernel.org \
--cc=luojiaxing@huawei.com \
--cc=martin.petersen@oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.