[PATCH] NVMe: Use error handling on failed sync commands


* [PATCH] NVMe: Use error handling on failed sync commands
@ 2013-12-20 18:14 Keith Busch
  2013-12-20 18:47 ` Matthew Wilcox
  0 siblings, 1 reply; 3+ messages in thread
From: Keith Busch @ 2013-12-20 18:14 UTC


Sync commands schedule an internal timeout to cancel rather than using
the nvme timeout handler kthread. We should still try to recover, so
move the check for cancelled commands after the error handling.

Signed-off-by: Keith Busch <keith.busch at intel.com>
---
 drivers/block/nvme-core.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c
index b59a93a..79a130c 100644
--- a/drivers/block/nvme-core.c
+++ b/drivers/block/nvme-core.c
@@ -1122,12 +1122,12 @@ static void nvme_cancel_ios(struct nvme_queue *nvmeq, bool timeout)
 
 		if (timeout && !time_after(now, info[cmdid].timeout))
 			continue;
-		if (info[cmdid].ctx == CMD_CTX_CANCELLED)
-			continue;
 		if (timeout && nvmeq->dev->initialized) {
 			nvme_abort_cmd(cmdid, nvmeq);
 			continue;
 		}
+		if (info[cmdid].ctx == CMD_CTX_CANCELLED)
+			continue;
 		dev_warn(nvmeq->q_dmadev, "Cancelling I/O %d QID %d\n", cmdid,
 								nvmeq->qid);
 		ctx = cancel_cmdid(nvmeq, cmdid, &fn);
-- 
1.7.10.4


* [PATCH] NVMe: Use error handling on failed sync commands
  2013-12-20 18:14 [PATCH] NVMe: Use error handling on failed sync commands Keith Busch
@ 2013-12-20 18:47 ` Matthew Wilcox
  2013-12-20 19:41   ` Keith Busch
  0 siblings, 1 reply; 3+ messages in thread
From: Matthew Wilcox @ 2013-12-20 18:47 UTC


On Fri, Dec 20, 2013 at 11:14:09AM -0700, Keith Busch wrote:
> Sync commands schedule an internal timeout to cancel rather than using
> the nvme timeout handler kthread. We should still try to recover, so
> move the check for cancelled commands after the error handling.

>  		if (timeout && !time_after(now, info[cmdid].timeout))
>  			continue;
> -		if (info[cmdid].ctx == CMD_CTX_CANCELLED)
> -			continue;
>  		if (timeout && nvmeq->dev->initialized) {
>  			nvme_abort_cmd(cmdid, nvmeq);
>  			continue;
>  		}
> +		if (info[cmdid].ctx == CMD_CTX_CANCELLED)
> +			continue;
>  		dev_warn(nvmeq->q_dmadev, "Cancelling I/O %d QID %d\n", cmdid,
>  								nvmeq->qid);
>  		ctx = cancel_cmdid(nvmeq, cmdid, &fn);

I'm confused by this patch.  Won't it cause us to send abort commands
repeatedly for command IDs that have already been cancelled, but haven't
yet been completed as cancelled?


* [PATCH] NVMe: Use error handling on failed sync commands
  2013-12-20 18:47 ` Matthew Wilcox
@ 2013-12-20 19:41   ` Keith Busch
  0 siblings, 0 replies; 3+ messages in thread
From: Keith Busch @ 2013-12-20 19:41 UTC

On Fri, 20 Dec 2013, Matthew Wilcox wrote:
> On Fri, Dec 20, 2013 at 11:14:09AM -0700, Keith Busch wrote:
>> Sync commands schedule an internal timeout to cancel rather than using
>> the nvme timeout handler kthread. We should still try to recover, so
>> move the check for cancelled commands after the error handling.
>
>>  		if (timeout && !time_after(now, info[cmdid].timeout))
>>  			continue;
>> -		if (info[cmdid].ctx == CMD_CTX_CANCELLED)
>> -			continue;
>>  		if (timeout && nvmeq->dev->initialized) {
>>  			nvme_abort_cmd(cmdid, nvmeq);
>>  			continue;
>>  		}
>> +		if (info[cmdid].ctx == CMD_CTX_CANCELLED)
>> +			continue;
>>  		dev_warn(nvmeq->q_dmadev, "Cancelling I/O %d QID %d\n", cmdid,
>>  								nvmeq->qid);
>>  		ctx = cancel_cmdid(nvmeq, cmdid, &fn);
>
> I'm confused by this patch.  Won't it cause us to send abort commands
> repeatedly for command IDs that have already been cancelled, but haven't
> yet been completed as cancelled?

It appears so, but 'nvme_abort_cmd' aborts a command only once, then resets
the controller if the command still isn't returned. The drive is not polled
for timeouts while the reset handler is running, so the command won't time
out again, and it is forcibly cancelled during reset by nvme_cancel_ios()
with 'timeout' set to false.
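
As a rough illustration of the "abort once, then reset" behaviour described
above, here is a minimal standalone sketch. It is not the driver's code:
struct cmd_model, enum action and model_abort_cmd() are hypothetical names
made up for this example, and the real nvme_abort_cmd() does considerably
more work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not taken from drivers/block/nvme-core.c. */
enum action {
	ACTION_ABORT,	/* an abort is issued for the command */
	ACTION_RESET	/* the controller reset is scheduled */
};

struct cmd_model {
	bool aborted;	/* an abort was already issued for this command */
};

/*
 * First timeout on a command: issue an abort and remember it.
 * Any later timeout on the same command: escalate to a reset
 * instead of sending another abort.
 */
static enum action model_abort_cmd(struct cmd_model *cmd)
{
	assert(cmd != NULL);

	if (cmd->aborted)
		return ACTION_RESET;

	cmd->aborted = true;
	return ACTION_ABORT;
}
```

Under this model a command that keeps timing out receives exactly one
abort, and every later timeout asks for a reset, which is why repeated
abort commands are not expected.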

'sync_command' is the only place an IO can be cancelled while the
device is being polled for timeouts. I think we want to try recovering the
unresponsive controller even if the sync timeout already cancelled the
command.

Without this patch, a failed sync command just leaks a cmdid until the
controller is reset some other way.
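
To make the effect of the reordering concrete, here is a small standalone
sketch of the per-command decision in the loop, before and after the patch.
It only mirrors the order of the checks shown in the diff; struct slot,
enum outcome, scan_old() and scan_new() are hypothetical names for this
example, and 'timed_out' stands in for time_after(now, info[cmdid].timeout).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one cmdid slot; not the driver's data structure. */
struct slot {
	bool timed_out;		/* stands in for time_after(now, timeout) */
	bool ctx_cancelled;	/* stands in for ctx == CMD_CTX_CANCELLED */
};

enum outcome {
	SKIPPED,	/* loop hit 'continue' without acting */
	ABORT_SENT,	/* nvme_abort_cmd() path taken */
	CANCELLED_NOW	/* fell through to cancel_cmdid() */
};

/* Order of checks before the patch: cancelled commands are skipped first. */
static enum outcome scan_old(const struct slot *s, bool timeout, bool initialized)
{
	assert(s != NULL);
	if (timeout && !s->timed_out)
		return SKIPPED;
	if (s->ctx_cancelled)
		return SKIPPED;
	if (timeout && initialized)
		return ABORT_SENT;
	return CANCELLED_NOW;
}

/* Order of checks after the patch: error handling runs before the skip. */
static enum outcome scan_new(const struct slot *s, bool timeout, bool initialized)
{
	assert(s != NULL);
	if (timeout && !s->timed_out)
		return SKIPPED;
	if (timeout && initialized)
		return ABORT_SENT;
	if (s->ctx_cancelled)
		return SKIPPED;
	return CANCELLED_NOW;
}
```

For a timed-out command that the sync path already cancelled, on an
initialized device, scan_old() skips it while scan_new() takes the abort
path. Commands that were not cancelled behave the same in both versions.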

