From: Ming Lei <ming.lei@redhat.com>
To: Hannes Reinecke <hare@suse.de>
Cc: dgilbert@interlog.com, linux-scsi <linux-scsi@vger.kernel.org>,
James Bottomley <james.bottomley@hansenpartnership.com>,
"Martin K. Petersen" <martin.petersen@oracle.com>,
Kashyap Desai <kashyap.desai@broadcom.com>
Subject: Re: REQ_HIPRI and SCSI mid-level
Date: Tue, 1 Jun 2021 21:18:50 +0800 [thread overview]
Message-ID: <YLYzunA0PPShKxJd@T590> (raw)
In-Reply-To: <fb50a05f-11f5-adb5-308e-769923d8d3ff@suse.de>
On Tue, Jun 01, 2021 at 02:19:10PM +0200, Hannes Reinecke wrote:
> On 5/28/21 3:32 AM, Ming Lei wrote:
> > On Thu, May 27, 2021 at 05:43:07PM +0200, Hannes Reinecke wrote:
> >> On 5/25/21 6:03 PM, Douglas Gilbert wrote:
> >>> On 2021-05-21 5:56 p.m., Douglas Gilbert wrote:
> >>>> The REQ_HIPRI flag on requests is associated with blk_poll() (aka iopoll)
> >>>> and assumes the user space (or some higher level) will be calling
> >>>> blk_poll() on requests marked with REQ_HIPRI and that will lead to their
> >>>> completion.
> >>>>
> >>>> In lk 5.13-rc1 the megaraid and scsi_debug LLDs support blk_poll() [seen
> >>>> by searching for 'mq_poll'] with more to follow, I assume. I have tested
> >>>> blk_poll() on the scsi_debug driver using both fio and the new sg driver.
> >>>> It works well with one caveat: as long as there isn't an error.
> >>>> After fighting with that error processing from the ULD side (i.e. the
> >>>> new sg driver) and the LLD side I am concluding that the glue that
> >>>> holds them together, that is, the mid-level is not as REQ_HIPRI aware
> >>>> as it should be.
> >>>>
> >>>> Yes REQ_HIPRI is there in scsi_lib.c but it is missing from scsi_error.c
> >>>> How can scsi_error.c re-issue requests _without_ taking into account
> >>>> that the original was issued with REQ_HIPRI ? Well I don't know but I'm
> >>>> pretty sure that is close to the area that I see causing problems
> >>>> (mainly lockups).
> >>>>
> >>>> As an example the scsi_debug driver has an in-use bitmap that when a new
> >>>> request arrives the code looks for an empty slot. Due to (incorrect)
> >>>> parameter setup that may fail. If the driver returns:
> >>>> device_qfull_result = (DID_OK << 16) | SAM_STAT_TASK_SET_FULL;
> >>>> then I see lock-ups if the request in question has REQ_HIPRI set.
> >>>>
> >>>> If that is changed to:
> >>>> device_qfull_result = (DID_ABORT << 16) | SAM_STAT_TASK_SET_FULL;
> >>>> then my user space test program sees that error and aborts showing the
> >>>> TASK SET FULL SCSI status. That is much better than a lockup ...
> >>>>
> >> That's because with the first result the command is requeued (due to
> >> DID_OK), whereas in the latter result the command is aborted (due to
> >> DID_ABORT).
> >>
> >> So the question really is whether we should retry the commands which have
> >> REQ_HIPRI set, or whether we shouldn't rather complete them with appropriate
> >> error code.
> >> A bit like enhanced BLOCK_PC requests, if you will.
> >>
> >>>> Having played around with variants of the above for a few weeks, I'd
> >>>> like to throw this problem into the open :-)
> >>>>
> >>>>
> >>>> Suggestion: perhaps the eh could give up immediately on any request
> >>>> with REQ_HIPRI set (i.e. make it a higher level layer's problem).
> >>
> >> Like I said above: it's not only scsi EH which would need to be modified,
> >> but possibly also the result evaluation in scsi_decide_disposition(); it's
> >> questionable whether a HIPRI command should be requeued at all.
> >
> > Why can't HIPRI req be requeued?
> >
> Oh, it can.
> As I said: it's questionable; HIPRI / polled completions are just that,
> polling for completions. And a completion indicating a requeue is
> _still_ a completion.
> So one could argue that we should return here (as it's a completion, and
> we're polling for completion).
No, blk_poll() actually poll for bio's completion instead of request's
completion. If the submitted HIPRI bio isn't completed, the polling won't be
stopped.
>
> >>
> >> But this might even affect the NVMe folks; they do return commands with
> >> BLK_STS_RESOURCE, too.
> >
> > Block layer will be responsible for re-queueing BLK_STS_RESOURCE requests,
> > so still not understand why it is one issue for HIPRI req. Also
> > rq->mq_hctx won't be changed since its allocation, blk_poll()
> > will keep polling on the correct hw queue for reaping the IO.
> >
> As mentioned above, I was talking about completions indicating a requeue.
> Requeues due to resource shortage on the initiator side would of course
> be requeued.
blk_poll() doesn't care request requeue, what matters is that if the HIPRI
bio is completed or not.
So either timeout or EH codes needn't to handle HIPRI IO specially.
Thanks,
Ming
next prev parent reply other threads:[~2021-06-01 13:19 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-05-21 21:56 REQ_HIPRI and SCSI mid-level Douglas Gilbert
2021-05-25 16:03 ` Douglas Gilbert
2021-05-27 15:43 ` Hannes Reinecke
2021-05-28 1:32 ` Ming Lei
2021-06-01 12:19 ` Hannes Reinecke
2021-06-01 13:18 ` Ming Lei [this message]
2021-05-26 0:34 ` Ming Lei
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YLYzunA0PPShKxJd@T590 \
--to=ming.lei@redhat.com \
--cc=dgilbert@interlog.com \
--cc=hare@suse.de \
--cc=james.bottomley@hansenpartnership.com \
--cc=kashyap.desai@broadcom.com \
--cc=linux-scsi@vger.kernel.org \
--cc=martin.petersen@oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox