* [PATCH] nvme: Implement Enhanced Command Retry
@ 2018-11-27 15:37 Keith Busch
2018-11-27 15:46 ` Christoph Hellwig
0 siblings, 1 reply; 4+ messages in thread
From: Keith Busch @ 2018-11-27 15:37 UTC (permalink / raw)
A controller may have an internal state that is not able to successfully
process commands for a short duration. In such states, an immediate
command requeue is expected to fail. The driver may exceed its max
retry count, which permanently ends the command in failure when the same
command would succeed after waiting for the controller to be ready.
NVMe ratified TP 4033 provides a delay hint in the completion status
code for failed commands. Implement the retry delay based on the command
completion status and the controller's requested delay.
Note that requeued commands are handled per request_queue, not per
individual request. If multiple commands fail, the controller should
consistently report the desired delay time for retryable commands in
all CQEs, otherwise the requeue list may be kicked too soon.
Signed-off-by: Keith Busch <keith.busch at intel.com>
---
drivers/nvme/host/core.c | 37 ++++++++++++++++++++++++++++++++++++-
drivers/nvme/host/nvme.h | 1 +
include/linux/nvme.h | 17 ++++++++++++++++-
3 files changed, 53 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index c418b7a347e0..7397cb48db70 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -261,8 +261,16 @@ void nvme_complete_rq(struct request *req)
}
if (!blk_queue_dying(req->q)) {
+ unsigned long delay = 0;
+ struct nvme_ns *ns = req->q->queuedata;
+ u16 crd = (nvme_req(req)->status & NVME_SC_CRD) >> 11;
+
+ if (crd && ns)
+ delay = ns->ctrl->crdt[crd - 1] * 100;
+
nvme_req(req)->retries++;
- blk_mq_requeue_request(req, true);
+ blk_mq_requeue_request(req, false);
+ blk_mq_delay_kick_requeue_list(req->q, delay);
return;
}
}
@@ -1883,6 +1891,25 @@ static int nvme_configure_timestamp(struct nvme_ctrl *ctrl)
return ret;
}
+static int nvme_configure_acre(struct nvme_ctrl *ctrl)
+{
+ struct nvme_feat_host_behavior *host;
+ int ret;
+
+ if (!ctrl->crdt[0])
+ return 0;
+
+ host = kzalloc(sizeof(*host), GFP_KERNEL);
+ if (!host)
+ return 0;
+
+ host->acre = NVME_ENABLE_ACRE;
+ ret = nvme_set_features(ctrl, NVME_FEAT_HOST_BEHAVIOR, 0,
+ host, sizeof(*host), NULL);
+ kfree(host);
+ return ret;
+}
+
static int nvme_configure_apst(struct nvme_ctrl *ctrl)
{
/*
@@ -2404,6 +2431,10 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
ctrl->quirks &= ~NVME_QUIRK_NO_DEEPEST_PS;
}
+ ctrl->crdt[0] = le16_to_cpu(id->crdt1);
+ ctrl->crdt[1] = le16_to_cpu(id->crdt2);
+ ctrl->crdt[2] = le16_to_cpu(id->crdt3);
+
ctrl->oacs = le16_to_cpu(id->oacs);
ctrl->oncs = le16_to_cpup(&id->oncs);
ctrl->oaes = le32_to_cpu(id->oaes);
@@ -2504,6 +2535,10 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
if (ret < 0)
return ret;
+ ret = nvme_configure_acre(ctrl);
+ if (ret < 0)
+ return ret;
+
ctrl->identified = true;
return 0;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 27663ce3044e..51063dbf08bf 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -181,6 +181,7 @@ struct nvme_ctrl {
u32 page_size;
u32 max_hw_sectors;
u32 max_segments;
+ u16 crdt[3];
u16 oncs;
u16 oacs;
u16 nssa;
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index c03973c215ad..88812cb15be0 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -223,7 +223,11 @@ struct nvme_id_ctrl {
__le32 rtd3e;
__le32 oaes;
__le32 ctratt;
- __u8 rsvd100[156];
+ __u8 rsvd100[28];
+ __le16 crdt1;
+ __le16 crdt2;
+ __le16 crdt3;
+ __u8 rsvd134[122];
__le16 oacs;
__u8 acl;
__u8 aerl;
@@ -756,6 +760,15 @@ enum {
NVME_HOST_MEM_RETURN = (1 << 1),
};
+struct nvme_feat_host_behavior {
+ __u8 acre;
+ __u8 resv1[511];
+};
+
+enum {
+ NVME_ENABLE_ACRE = 1,
+};
+
/* Admin commands */
enum nvme_admin_opcode {
@@ -810,6 +823,7 @@ enum {
NVME_FEAT_RRL = 0x12,
NVME_FEAT_PLM_CONFIG = 0x13,
NVME_FEAT_PLM_WINDOW = 0x14,
+ NVME_FEAT_HOST_BEHAVIOR = 0x16,
NVME_FEAT_SW_PROGRESS = 0x80,
NVME_FEAT_HOST_ID = 0x81,
NVME_FEAT_RESV_MASK = 0x82,
@@ -1265,6 +1279,7 @@ enum {
NVME_SC_ANA_TRANSITION = 0x303,
NVME_SC_HOST_PATH_ERROR = 0x370,
+ NVME_SC_CRD = 0x1800,
NVME_SC_DNR = 0x4000,
};
--
2.14.4
^ permalink raw reply related [flat|nested] 4+ messages in thread* [PATCH] nvme: Implement Enhanced Command Retry
2018-11-27 15:37 [PATCH] nvme: Implement Enhanced Command Retry Keith Busch
@ 2018-11-27 15:46 ` Christoph Hellwig
2018-11-27 15:50 ` Keith Busch
0 siblings, 1 reply; 4+ messages in thread
From: Christoph Hellwig @ 2018-11-27 15:46 UTC (permalink / raw)
On Tue, Nov 27, 2018@08:37:10AM -0700, Keith Busch wrote:
> if (!blk_queue_dying(req->q)) {
> + unsigned long delay = 0;
> + struct nvme_ns *ns = req->q->queuedata;
> + u16 crd = (nvme_req(req)->status & NVME_SC_CRD) >> 11;
> +
> + if (crd && ns)
> + delay = ns->ctrl->crdt[crd - 1] * 100;
Do we need bounce checking for crd here?
> +
> nvme_req(req)->retries++;
> - blk_mq_requeue_request(req, true);
> + blk_mq_requeue_request(req, false);
> + blk_mq_delay_kick_requeue_list(req->q, delay);
> return;
I also think it would be nice to split out this code block into
a little nvme_retry_req ala nvme_failover_req.
Otherwise this looks sensible to me.
^ permalink raw reply [flat|nested] 4+ messages in thread* [PATCH] nvme: Implement Enhanced Command Retry
2018-11-27 15:46 ` Christoph Hellwig
@ 2018-11-27 15:50 ` Keith Busch
2018-11-27 16:14 ` Christoph Hellwig
0 siblings, 1 reply; 4+ messages in thread
From: Keith Busch @ 2018-11-27 15:50 UTC (permalink / raw)
On Tue, Nov 27, 2018@04:46:26PM +0100, Christoph Hellwig wrote:
> On Tue, Nov 27, 2018@08:37:10AM -0700, Keith Busch wrote:
> > if (!blk_queue_dying(req->q)) {
> > + unsigned long delay = 0;
> > + struct nvme_ns *ns = req->q->queuedata;
> > + u16 crd = (nvme_req(req)->status & NVME_SC_CRD) >> 11;
> > +
> > + if (crd && ns)
> > + delay = ns->ctrl->crdt[crd - 1] * 100;
>
> Do we need bounce checking for crd here?
I assume you mean bounds, in which case it should not be necessary since
the result of the mask and shift can't be > 3, which is the size of the
crdt array. I can add the check if not having it is visually alarming.
> > +
> > nvme_req(req)->retries++;
> > - blk_mq_requeue_request(req, true);
> > + blk_mq_requeue_request(req, false);
> > + blk_mq_delay_kick_requeue_list(req->q, delay);
> > return;
>
> I also think it would be nice to split out this code block into
> a little nvme_retry_req ala nvme_failover_req.
Sounds good, will do.
> Otherwise this looks sensible to me.
^ permalink raw reply [flat|nested] 4+ messages in thread* [PATCH] nvme: Implement Enhanced Command Retry
2018-11-27 15:50 ` Keith Busch
@ 2018-11-27 16:14 ` Christoph Hellwig
0 siblings, 0 replies; 4+ messages in thread
From: Christoph Hellwig @ 2018-11-27 16:14 UTC (permalink / raw)
On Tue, Nov 27, 2018@08:50:00AM -0700, Keith Busch wrote:
> On Tue, Nov 27, 2018@04:46:26PM +0100, Christoph Hellwig wrote:
> > On Tue, Nov 27, 2018@08:37:10AM -0700, Keith Busch wrote:
> > > if (!blk_queue_dying(req->q)) {
> > > + unsigned long delay = 0;
> > > + struct nvme_ns *ns = req->q->queuedata;
> > > + u16 crd = (nvme_req(req)->status & NVME_SC_CRD) >> 11;
> > > +
> > > + if (crd && ns)
> > > + delay = ns->ctrl->crdt[crd - 1] * 100;
> >
> > Do we need bounce checking for crd here?
>
> I assume you mean bounds, in which case it should not be necessary since
> the result of the mask and shift can't be > 3, which is the size of the
> crdt array. I can add the check if not having it is visually alarming.
True, I don't think we'll need a check. Maybe at best a comment.
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2018-11-27 16:14 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-11-27 15:37 [PATCH] nvme: Implement Enhanced Command Retry Keith Busch
2018-11-27 15:46 ` Christoph Hellwig
2018-11-27 15:50 ` Keith Busch
2018-11-27 16:14 ` Christoph Hellwig
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.