* [PATCHv2] nvme-pci: fix timeout request state check
@ 2023-01-18 5:22 Keith Busch
2023-01-18 5:33 ` Christoph Hellwig
0 siblings, 1 reply; 6+ messages in thread
From: Keith Busch @ 2023-01-18 5:22 UTC (permalink / raw)
To: linux-nvme, hch, sagi, Jens Axboe; +Cc: Keith Busch
From: Keith Busch <kbusch@kernel.org>
Polling the completion can progress the request state to IDLE, either
inline with the completion, or through softirq. Either way, the state
may not be COMPLETED, so don't check for that. We only care if the state
isn't STARTED.
This fixes an issue where the driver aborts an IO that we just
completed. Seeing the "aborting" message instead of "polled" is very
misleading as to where the timeout problem resides.
Fixes: bf392a5dc02a9b ("nvme-pci: Remove tag from process cq")
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
v1->v2: Fixed spelling mistake in subject
drivers/nvme/host/pci.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a2553b7d9bb8e..c92840333230b 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1362,7 +1362,7 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req)
 	else
 		nvme_poll_irqdisable(nvmeq);
 
-	if (blk_mq_request_completed(req)) {
+	if (!blk_mq_request_started(req)) {
 		dev_warn(dev->ctrl.device,
 			 "I/O %d QID %d timeout, completion polled\n",
 			 req->tag, nvmeq->qid);
--
2.30.2
* Re: [PATCHv2] nvme-pci: fix timeout request state check
From: Christoph Hellwig @ 2023-01-18 5:33 UTC (permalink / raw)
To: Keith Busch; +Cc: linux-nvme, hch, sagi, Jens Axboe, Keith Busch

On Tue, Jan 17, 2023 at 09:22:44PM -0800, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
>
> Polling the completion can progress the request state to IDLE, either
> inline with the completion, or through softirq. Either way, the state
> may not be COMPLETED, so don't check for that. We only care if the state
> isn't STARTED.
>
> This is fixing an issue where the driver aborts an IO that we just
> completed. Seeing the "aborting" message instead of "polled" is very
> misleading as to where the timeout problem resides.

Hmm. Using a started helper for something that by definition is started
doesn't really make much sense. I guess the problem here is that
blk_mq_end_request_batch sets the state to MQ_RQ_IDLE after calling
blk_complete_request? Maybe we just need an explicit check for
MQ_RQ_IDLE here, as "started" seems like the wrong implication here.
* Re: [PATCHv2] nvme-pci: fix timeout request state check
From: Keith Busch @ 2023-01-18 5:52 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Keith Busch, linux-nvme, sagi, Jens Axboe

On Wed, Jan 18, 2023 at 06:33:06AM +0100, Christoph Hellwig wrote:
> Hmm. Using a started helper for something that by definition is started
> doesn't really make much sense. I guess the problem here is that
> blk_mq_end_request_batch sets the state to MQ_RQ_IDLE after calling
> blk_complete_request? Maybe we just need an explicit check for
> MQ_RQ_IDLE here, as "started" seems like the wrong implication here.

We're actually not batching here (no IOB in the timeout context), so we
are either:

  a. calling nvme_pci_complete_rq() inline with the cqe
  b. racing with an smp IPI or softirq

In case (a), we will always see IDLE. In case (b), we are racing and may
see either COMPLETED or IDLE, so we have to check that it's neither of
those. Since there's only one other state (STARTED) that was guaranteed
prior to entering the timeout handler, we can just make sure it's not
that one after the poll to know whether abort escalation is needed.
* Re: [PATCHv2] nvme-pci: fix timeout request state check
From: Christoph Hellwig @ 2023-01-18 7:33 UTC (permalink / raw)
To: Keith Busch; +Cc: Christoph Hellwig, Keith Busch, linux-nvme, sagi, Jens Axboe

On Tue, Jan 17, 2023 at 10:52:39PM -0700, Keith Busch wrote:
> We're actually not batching here (no IOB in the timeout context), so we
> are either:
>
>   a. calling nvme_pci_complete_rq() inline with the cqe
>   b. racing with an smp IPI or softirq
>
> In case (a), we will always see IDLE. In case (b), we are racing and may
> see either COMPLETED or IDLE, so we have to check that it's neither of
> those.

The point is still that "started" is the wrong check here and relies on
an implementation detail. I think we're better off with an explicit
IDLE check and a big fat comment.
* Re: [PATCHv2] nvme-pci: fix timeout request state check
From: Keith Busch @ 2023-01-18 15:21 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Keith Busch, linux-nvme, sagi, Jens Axboe

On Wed, Jan 18, 2023 at 08:33:30AM +0100, Christoph Hellwig wrote:
> The point is still that "started" is the wrong check here and relies on
> an implementation detail. I think we're better off with an explicit
> IDLE check and a big fat comment.

So you want the check to look like this instead?

---
@@ -1362,7 +1362,8 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req)
 	else
 		nvme_poll_irqdisable(nvmeq);
 
-	if (blk_mq_request_completed(req)) {
+	if (blk_mq_request_completed(req) ||
+	    blk_mq_rq_state(req) == MQ_RQ_IDLE) {
 		dev_warn(dev->ctrl.device,
 			 "I/O %d QID %d timeout, completion polled\n",
 			 req->tag, nvmeq->qid);
--

That's essentially a more complicated equivalent to what I have, but
fine with me if you think it's clearer.

Alternatively, I also considered moving the IDLE state setting to when
the request is actually freed, which might make more sense and works
without changing the nvme driver:

---
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -713,6 +713,7 @@ static void __blk_mq_free_request(struct request *rq)
 	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
 	const int sched_tag = rq->internal_tag;
 
+	WRITE_ONCE(rq->state, MQ_RQ_IDLE);
 	blk_crypto_free_request(rq);
 	blk_pm_mark_last_busy(rq);
 	rq->mq_hctx = NULL;
@@ -741,7 +742,6 @@ void blk_mq_free_request(struct request *rq)
 
 	rq_qos_done(q, rq);
 
-	WRITE_ONCE(rq->state, MQ_RQ_IDLE);
 	if (req_ref_put_and_test(rq))
 		__blk_mq_free_request(rq);
 }
--
* Re: [PATCHv2] nvme-pci: fix timeout request state check
From: Christoph Hellwig @ 2023-01-18 16:35 UTC (permalink / raw)
To: Keith Busch; +Cc: Christoph Hellwig, Keith Busch, linux-nvme, sagi, Jens Axboe

On Wed, Jan 18, 2023 at 08:21:58AM -0700, Keith Busch wrote:
> So you want the check to look like this instead?
>
> -	if (blk_mq_request_completed(req)) {
> +	if (blk_mq_request_completed(req) ||
> +	    blk_mq_rq_state(req) == MQ_RQ_IDLE) {

Or maybe just:

	if (blk_mq_rq_state(req) != MQ_RQ_IN_FLIGHT) {
		..

> Alternatively, I also considered moving the IDLE state setting to when
> the request is actually freed, which might make more sense and works
> without changing the nvme driver:

I like that idea, but this touches gnarly code, so it'll need good
testing.