* [PATCH] NVMe: Fix abort handling
@ 2014-12-08 19:21 Sam Bradshaw
2014-12-08 19:16 ` Keith Busch
2014-12-08 19:18 ` Jens Axboe
0 siblings, 2 replies; 3+ messages in thread
From: Sam Bradshaw @ 2014-12-08 19:21 UTC (permalink / raw)
This patch fixes some problems in the abort handling code; freeing the
correct request (abort_req) and setting abort_limit & rq_aborted if
submitted the abort request fails. In addition, a device reset is
scheduled pro-actively if the device fails to successful abort a command
rather than waiting for the timeout handler to disposition the condition.
Signed-off-by: Sam Bradshaw <sbradshaw at micron.com>
---
diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c
index c154165..34021ed 100644
--- a/drivers/block/nvme-core.c
+++ b/drivers/block/nvme-core.c
@@ -284,6 +284,18 @@ static void abort_completion(struct nvme_queue *nvmeq, void *ctx,
dev_warn(nvmeq->q_dmadev, "Abort status:%x result:%x", status, result);
++nvmeq->dev->abort_limit;
+
+ /* Reset controller if abort failed */
+ if (cqe->result & 0x1) {
+ if (work_busy(&nvmeq->dev->reset_work))
+ return;
+ spin_lock(&dev_list_lock);
+ list_del_init(&nvmeq->dev->node);
+ nvmeq->dev->reset_workfn = nvme_reset_failed_dev;
+ queue_work(nvme_workq, &nvmeq->dev->reset_work);
+ spin_unlock(&dev_list_lock);
+ return;
+ }
}
static void async_completion(struct nvme_queue *nvmeq, void *ctx,
@@ -1054,7 +1066,9 @@ static void nvme_abort_req(struct request *req)
dev_warn(nvmeq->q_dmadev,
"Could not abort I/O %d QID %d",
req->tag, nvmeq->qid);
- blk_mq_free_request(req);
+ blk_mq_free_request(abort_req);
+ ++dev->abort_limit;
+ cmd_rq->aborted = 0;
}
}
^ permalink raw reply related [flat|nested] 3+ messages in thread* [PATCH] NVMe: Fix abort handling
2014-12-08 19:21 [PATCH] NVMe: Fix abort handling Sam Bradshaw
@ 2014-12-08 19:16 ` Keith Busch
2014-12-08 19:18 ` Jens Axboe
1 sibling, 0 replies; 3+ messages in thread
From: Keith Busch @ 2014-12-08 19:16 UTC (permalink / raw)
On Mon, 8 Dec 2014, Sam Bradshaw wrote:
> This patch fixes some problems in the abort handling code; freeing the
> correct request (abort_req) and setting abort_limit & rq_aborted if
> submitted the abort request fails. In addition, a device reset is
> scheduled pro-actively if the device fails to successful abort a command
> rather than waiting for the timeout handler to disposition the condition.
>
> Signed-off-by: Sam Bradshaw <sbradshaw at micron.com>
> ---
> diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c
> index c154165..34021ed 100644
> --- a/drivers/block/nvme-core.c
> +++ b/drivers/block/nvme-core.c
> @@ -284,6 +284,18 @@ static void abort_completion(struct nvme_queue *nvmeq, void *ctx,
>
> dev_warn(nvmeq->q_dmadev, "Abort status:%x result:%x", status, result);
> ++nvmeq->dev->abort_limit;
> +
> + /* Reset controller if abort failed */
> + if (cqe->result & 0x1) {
I think you're asking for trouble by checking this bit to determine if
you need to reset the controller. If result is set to 1, the controller
is reporting it failed to abort the command, but you've no idea why. Maybe
the command's completion was already on a completion queue, so it couldn't
find it, and now you've reset the controller when you didn't have to.
^ permalink raw reply [flat|nested] 3+ messages in thread* [PATCH] NVMe: Fix abort handling
2014-12-08 19:21 [PATCH] NVMe: Fix abort handling Sam Bradshaw
2014-12-08 19:16 ` Keith Busch
@ 2014-12-08 19:18 ` Jens Axboe
1 sibling, 0 replies; 3+ messages in thread
From: Jens Axboe @ 2014-12-08 19:18 UTC (permalink / raw)
On 12/08/2014 12:21 PM, Sam Bradshaw wrote:
> This patch fixes some problems in the abort handling code; freeing the
> correct request (abort_req) and setting abort_limit & rq_aborted if
> submitted the abort request fails. In addition, a device reset is
> scheduled pro-actively if the device fails to successful abort a command
> rather than waiting for the timeout handler to disposition the condition.
Generally I'm not a huge fan of patches that end up having "in addition"
or "additionally" in the commit message, that's usually a clear sign
that it should have been two (or more) patches.
I'll cut this hunk:
> @@ -1054,7 +1066,9 @@ static void nvme_abort_req(struct request *req)
> dev_warn(nvmeq->q_dmadev,
> "Could not abort I/O %d QID %d",
> req->tag, nvmeq->qid);
> - blk_mq_free_request(req);
> + blk_mq_free_request(abort_req);
> + ++dev->abort_limit;
> + cmd_rq->aborted = 0;
> }
> }
since that's a clear bug fix, and you can hash out the details on the
reset handling with Keith, since he's a lot more qualified to render an
opinion on that.
OK?
--
Jens Axboe
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2014-12-08 19:21 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-12-08 19:21 [PATCH] NVMe: Fix abort handling Sam Bradshaw
2014-12-08 19:16 ` Keith Busch
2014-12-08 19:18 ` Jens Axboe
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox