public inbox for linux-nvme@lists.infradead.org
* [PATCHv2] nvme-pci: fix timeout request state check
@ 2023-01-18  5:22 Keith Busch
  2023-01-18  5:33 ` Christoph Hellwig
  0 siblings, 1 reply; 6+ messages in thread
From: Keith Busch @ 2023-01-18  5:22 UTC (permalink / raw)
  To: linux-nvme, hch, sagi, Jens Axboe; +Cc: Keith Busch

From: Keith Busch <kbusch@kernel.org>

Polling the completion can progress the request state to IDLE, either
inline with the completion, or through softirq. Either way, the state
may not be COMPLETED, so don't check for that. We only care if the state
isn't STARTED.

This is fixing an issue where the driver aborts an IO that we just
completed. Seeing the "aborting" message instead of "polled" is very
misleading as to where the timeout problem resides.

Fixes: bf392a5dc02a9b ("nvme-pci: Remove tag from process cq")
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
v1->v2: Fixed spelling mistake in subject

 drivers/nvme/host/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a2553b7d9bb8e..c92840333230b 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1362,7 +1362,7 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req)
 	else
 		nvme_poll_irqdisable(nvmeq);
 
-	if (blk_mq_request_completed(req)) {
+	if (!blk_mq_request_started(req)) {
 		dev_warn(dev->ctrl.device,
 			 "I/O %d QID %d timeout, completion polled\n",
 			 req->tag, nvmeq->qid);
-- 
2.30.2




* Re: [PATCHv2] nvme-pci: fix timeout request state check
  2023-01-18  5:22 [PATCHv2] nvme-pci: fix timeout request state check Keith Busch
@ 2023-01-18  5:33 ` Christoph Hellwig
  2023-01-18  5:52   ` Keith Busch
  0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2023-01-18  5:33 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-nvme, hch, sagi, Jens Axboe, Keith Busch

On Tue, Jan 17, 2023 at 09:22:44PM -0800, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> Polling the completion can progress the request state to IDLE, either
> inline with the completion, or through softirq. Either way, the state
> may not be COMPLETED, so don't check for that. We only care if the state
> isn't STARTED.
> 
> This is fixing an issue where the driver aborts an IO that we just
> completed. Seeing the "aborting" message instead of "polled" is very
> misleading as to where the timeout problem resides.

Hmm.  Using a started helper for something that by definition is started
doesn't really make much sense.  I guess the problem here is that
blk_mq_end_request_batch sets the state to MQ_RQ_IDLE after calling
blk_complete_request?  Maybe we just need an explicit check for
MQ_RQ_IDLE here as started seems like the wrong implication here.



* Re: [PATCHv2] nvme-pci: fix timeout request state check
  2023-01-18  5:33 ` Christoph Hellwig
@ 2023-01-18  5:52   ` Keith Busch
  2023-01-18  7:33     ` Christoph Hellwig
  0 siblings, 1 reply; 6+ messages in thread
From: Keith Busch @ 2023-01-18  5:52 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Keith Busch, linux-nvme, sagi, Jens Axboe

On Wed, Jan 18, 2023 at 06:33:06AM +0100, Christoph Hellwig wrote:
> On Tue, Jan 17, 2023 at 09:22:44PM -0800, Keith Busch wrote:
> > From: Keith Busch <kbusch@kernel.org>
> > 
> > Polling the completion can progress the request state to IDLE, either
> > inline with the completion, or through softirq. Either way, the state
> > may not be COMPLETED, so don't check for that. We only care if the state
> > isn't STARTED.
> > 
> > This is fixing an issue where the driver aborts an IO that we just
> > completed. Seeing the "aborting" message instead of "polled" is very
> > misleading as to where the timeout problem resides.
> 
> Hmm.  Using a started helper for something that by definition is started
> doesn't really make much sense.  I guess the problem here is that
> blk_mq_end_request_batch sets the state to MQ_RQ_IDLE after calling
> blk_complete_request?  Maybe we just need an explicit check for
> MQ_RQ_IDLE here as started seems like the wrong implication here.

We're actually not batching here (no IOB in the timeout context), so we
are either:

  a. calling nvme_pci_complete_rq() inline with the cqe
  b. racing with smp ipi or softirq

If case (a), we will always see IDLE. If (b), we are racing and may see
either COMPLETED or IDLE, so we have to check that it's not either of
those. Since there's only one other state (STARTED) that was guaranteed
prior to entering the timeout handler, we can just make sure it's not
that one after the poll to know if abort escalation is needed.



* Re: [PATCHv2] nvme-pci: fix timeout request state check
  2023-01-18  5:52   ` Keith Busch
@ 2023-01-18  7:33     ` Christoph Hellwig
  2023-01-18 15:21       ` Keith Busch
  0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2023-01-18  7:33 UTC (permalink / raw)
  To: Keith Busch; +Cc: Christoph Hellwig, Keith Busch, linux-nvme, sagi, Jens Axboe

On Tue, Jan 17, 2023 at 10:52:39PM -0700, Keith Busch wrote:
> We're actually not batching here (no IOB in the timeout context), so we
> are either:
> 
>   a. calling nvme_pci_complete_rq() inline with the cqe
>   b. racing with smp ipi or softirq
> 
> If case (a), we will always see IDLE. If (b), we are racing and may see
> either COMPLETED or IDLE, so we have to check that it's not either of
> those. Since there's only one other state (STARTED) that was guaranteed
> prior to entering the timeout handler, we can just make sure it's not
> that one after the poll to know if abort escalation is needed.

The point is still that "started" is the wrong check here and relies
on an implementation detail.  I think we're better off with an explicit
IDLE check and a big fat comment.



* Re: [PATCHv2] nvme-pci: fix timeout request state check
  2023-01-18  7:33     ` Christoph Hellwig
@ 2023-01-18 15:21       ` Keith Busch
  2023-01-18 16:35         ` Christoph Hellwig
  0 siblings, 1 reply; 6+ messages in thread
From: Keith Busch @ 2023-01-18 15:21 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Keith Busch, linux-nvme, sagi, Jens Axboe

On Wed, Jan 18, 2023 at 08:33:30AM +0100, Christoph Hellwig wrote:
> On Tue, Jan 17, 2023 at 10:52:39PM -0700, Keith Busch wrote:
> > We're actually not batching here (no IOB in the timeout context), so we
> > are either:
> > 
> >   a. calling nvme_pci_complete_rq() inline with the cqe
> >   b. racing with smp ipi or softirq
> > 
> > If case (a), we will always see IDLE. If (b), we are racing and may see
> > either COMPLETED or IDLE, so we have to check that it's not either of
> > those. Since there's only one other state (STARTED) that was guaranteed
> > prior to entering the timeout handler, we can just make sure it's not
> > that one after the poll to know if abort escalation is needed.
> 
> The point is still that "started" is the wrong check here and relies
> on an implementation detail.  I think we're better off with an explicit
> IDLE check and a big fat comment.

So you want the check to look like this instead?

---
@@ -1362,7 +1362,8 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req)
        else
                nvme_poll_irqdisable(nvmeq);

-       if (blk_mq_request_completed(req)) {
+       if (blk_mq_request_completed(req) ||
+           blk_mq_rq_state(req) == MQ_RQ_IDLE) {
                dev_warn(dev->ctrl.device,
                         "I/O %d QID %d timeout, completion polled\n",
                         req->tag, nvmeq->qid);
--

That's essentially a more complicated equivalent to what I have, but
fine with me if you think it's more clear.

Alternatively, I also considered moving the IDLE state setting to when
the request is actually freed, which might make more sense and works
without changing the nvme driver:

---
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -713,6 +713,7 @@ static void __blk_mq_free_request(struct request *rq)
        struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
        const int sched_tag = rq->internal_tag;

+       WRITE_ONCE(rq->state, MQ_RQ_IDLE);
        blk_crypto_free_request(rq);
        blk_pm_mark_last_busy(rq);
        rq->mq_hctx = NULL;
@@ -741,7 +742,6 @@ void blk_mq_free_request(struct request *rq)

        rq_qos_done(q, rq);

-       WRITE_ONCE(rq->state, MQ_RQ_IDLE);
        if (req_ref_put_and_test(rq))
                __blk_mq_free_request(rq);
 }
--



* Re: [PATCHv2] nvme-pci: fix timeout request state check
  2023-01-18 15:21       ` Keith Busch
@ 2023-01-18 16:35         ` Christoph Hellwig
  0 siblings, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2023-01-18 16:35 UTC (permalink / raw)
  To: Keith Busch; +Cc: Christoph Hellwig, Keith Busch, linux-nvme, sagi, Jens Axboe

On Wed, Jan 18, 2023 at 08:21:58AM -0700, Keith Busch wrote:
> > The point is still that "started" is the wrong check here and relies
> > on an implementation detail.  I think we're better off with an explicit
> > IDLE check and a big fat comment.
> 
> So you want the check to look like this instead?
> 
> ---
> @@ -1362,7 +1362,8 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req)
>         else
>                 nvme_poll_irqdisable(nvmeq);
> 
> -       if (blk_mq_request_completed(req)) {
> +       if (blk_mq_request_completed(req) ||
> +           blk_mq_rq_state(req) == MQ_RQ_IDLE) {
>                 dev_warn(dev->ctrl.device,
>                          "I/O %d QID %d timeout, completion polled\n",
>                          req->tag, nvmeq->qid);
> --

Or maybe just:

	if (blk_mq_rq_state(req) != MQ_RQ_IN_FLIGHT) {
		..

> Alternatively, I also considered moving the IDLE state setting to when
> the request is actually freed, which might make more sense and works
> without changing the nvme driver:

I like that idea, but this touches gnarly code, so it'll need good
testing.


