* [PATCH v3 0/4] nvme: protect against possible request reference after completion
@ 2021-06-16 21:19 Sagi Grimberg
2021-06-16 21:19 ` [PATCH v3 1/4] params: lift param_set_uint_minmax to common code Sagi Grimberg
` (4 more replies)
0 siblings, 5 replies; 13+ messages in thread
From: Sagi Grimberg @ 2021-06-16 21:19 UTC (permalink / raw)
To: linux-nvme, Christoph Hellwig, Keith Busch
Cc: Hannes Reinecke, Chaitanya Kulkarni
Nothing in nvme protects against referencing a request after it was
completed. For example, if a buggy controller sends a completion twice
for the same request, the host can access and modify a request that was
already completed. At best this will cause a panic, but in the worst
case it can cause silent data corruption if the request was already
reused and executed by the time we reference it.

The nvme command_id is an opaque value in which, so far, we simply
placed the request tag. To protect against an access after completion,
we introduce a generation counter in the upper 4 bits of the command_id
that is incremented on every invocation and validated upon reception of
a completion. This limits the effective maximum queue depth to 4095,
but we hardly ever use such long queues (in fabrics the maximum is
already 1024).
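
For reference, a minimal sketch of the encoding described above. The
macro names here are illustrative only (assumptions for this sketch);
the real helpers, nvme_cid() and nvme_find_rq(), are added to
drivers/nvme/host/nvme.h in patch 4/4:

    /*
     * Sketch only, not the patch itself: the 16-bit command_id carries
     * a 4-bit generation counter in its upper bits and the 12-bit
     * blk-mq request tag in its lower bits.
     */
    #define CID_GEN_MASK(gen)     ((gen) & 0xf)
    #define CID_ENCODE(gen, tag)  ((CID_GEN_MASK(gen) << 12) | ((tag) & 0xfff))
    #define CID_TO_GEN(cid)       (((cid) & 0xf000) >> 12)
    #define CID_TO_TAG(cid)       ((cid) & 0xfff)

A completion whose command_id decodes to a generation that does not
match the request's current generation counter is rejected instead of
touching a possibly reused request.
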
Changes from v2:
- cc linux-nfs,linux-kernel for patch 1/4
- fix expected genctr print in patch 4/4
- match param_set_uint_minmax indentation
- collected review tags
Changes from v1:
- lift param_set_uint_minmax and reuse it
- simplify initialization in patch 3/4
Sagi Grimberg (4):
params: lift param_set_uint_minmax to common code
nvme-pci: limit maximum queue depth to 4095
nvme-tcp: don't check blk_mq_tag_to_rq when receiving pdu data
nvme: code command_id with a genctr for use-after-free validation
 drivers/nvme/host/core.c    |  3 ++-
 drivers/nvme/host/nvme.h    | 47 ++++++++++++++++++++++++++++++++++++-
 drivers/nvme/host/pci.c     | 17 ++++++--------
 drivers/nvme/host/rdma.c    |  4 ++--
 drivers/nvme/host/tcp.c     | 38 ++++++++++++------------------
 drivers/nvme/target/loop.c  |  4 ++--
 include/linux/moduleparam.h |  3 +++
 kernel/params.c             | 19 +++++++++++++++
 net/sunrpc/xprtsock.c       | 18 --------------
 9 files changed, 96 insertions(+), 57 deletions(-)
--
2.27.0

* [PATCH v3 1/4] params: lift param_set_uint_minmax to common code
2021-06-16 21:19 [PATCH v3 0/4] nvme: protect against possible request reference after completion Sagi Grimberg
@ 2021-06-16 21:19 ` Sagi Grimberg
2021-06-17 5:45 ` Hannes Reinecke
` (2 more replies)
2021-06-16 21:19 ` [PATCH v3 2/4] nvme-pci: limit maximum queue depth to 4095 Sagi Grimberg
` (3 subsequent siblings)
4 siblings, 3 replies; 13+ messages in thread
From: Sagi Grimberg @ 2021-06-16 21:19 UTC (permalink / raw)
To: linux-nvme, Christoph Hellwig, Keith Busch
Cc: Hannes Reinecke, Chaitanya Kulkarni

It is a useful helper hence move it to common code so others can enjoy
it.

Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Cc: linux-nfs@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 include/linux/moduleparam.h |  3 +++
 kernel/params.c             | 19 +++++++++++++++++++
 net/sunrpc/xprtsock.c       | 18 ------------------
 3 files changed, 22 insertions(+), 18 deletions(-)

diff --git a/include/linux/moduleparam.h b/include/linux/moduleparam.h
index eed280fae433..b36ece513616 100644
--- a/include/linux/moduleparam.h
+++ b/include/linux/moduleparam.h
@@ -432,6 +432,9 @@ extern const struct kernel_param_ops param_ops_uint;
 extern int param_set_uint(const char *val, const struct kernel_param *kp);
 extern int param_get_uint(char *buffer, const struct kernel_param *kp);
 #define param_check_uint(name, p) __param_check(name, p, unsigned int)
+int param_set_uint_minmax(const char *val,
+		const struct kernel_param *kp,
+		unsigned int min, unsigned int max);
 
 extern const struct kernel_param_ops param_ops_long;
 extern int param_set_long(const char *val, const struct kernel_param *kp);
diff --git a/kernel/params.c b/kernel/params.c
index 2daa2780a92c..7d3c61e64140 100644
--- a/kernel/params.c
+++ b/kernel/params.c
@@ -243,6 +243,25 @@ STANDARD_PARAM_DEF(ulong, unsigned long, "%lu", kstrtoul);
 STANDARD_PARAM_DEF(ullong, unsigned long long, "%llu", kstrtoull);
 STANDARD_PARAM_DEF(hexint, unsigned int, "%#08x", kstrtouint);
 
+int param_set_uint_minmax(const char *val,
+		const struct kernel_param *kp,
+		unsigned int min, unsigned int max)
+{
+	unsigned int num;
+	int ret;
+
+	if (!val)
+		return -EINVAL;
+	ret = kstrtouint(val, 0, &num);
+	if (ret)
+		return ret;
+	if (num < min || num > max)
+		return -EINVAL;
+	*((unsigned int *)kp->arg) = num;
+	return 0;
+}
+EXPORT_SYMBOL(param_set_uint_minmax);
+
 int param_set_charp(const char *val, const struct kernel_param *kp)
 {
 	if (strlen(val) > 1024) {
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 47aa47a2b07c..0cfbf618e8c2 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -3130,24 +3130,6 @@ void cleanup_socket_xprt(void)
 	xprt_unregister_transport(&xs_bc_tcp_transport);
 }
 
-static int param_set_uint_minmax(const char *val,
-		const struct kernel_param *kp,
-		unsigned int min, unsigned int max)
-{
-	unsigned int num;
-	int ret;
-
-	if (!val)
-		return -EINVAL;
-	ret = kstrtouint(val, 0, &num);
-	if (ret)
-		return ret;
-	if (num < min || num > max)
-		return -EINVAL;
-	*((unsigned int *)kp->arg) = num;
-	return 0;
-}
-
 static int param_set_portnr(const char *val, const struct kernel_param *kp)
 {
 	return param_set_uint_minmax(val, kp,
--
2.27.0

* Re: [PATCH v3 1/4] params: lift param_set_uint_minmax to common code
2021-06-16 21:19 ` [PATCH v3 1/4] params: lift param_set_uint_minmax to common code Sagi Grimberg
@ 2021-06-17 5:45 ` Hannes Reinecke
2021-06-17 8:00 ` Daniel Wagner
2021-06-17 13:48 ` Christoph Hellwig
2 siblings, 0 replies; 13+ messages in thread
From: Hannes Reinecke @ 2021-06-17 5:45 UTC (permalink / raw)
To: Sagi Grimberg, linux-nvme, Christoph Hellwig, Keith Busch
Cc: Chaitanya Kulkarni

On 6/16/21 11:19 PM, Sagi Grimberg wrote:
> It is a useful helper hence move it to common code so others can enjoy
> it.
>
> Suggested-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
> Cc: linux-nfs@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
> ---
>  include/linux/moduleparam.h |  3 +++
>  kernel/params.c             | 19 +++++++++++++++++++
>  net/sunrpc/xprtsock.c       | 18 ------------------
>  3 files changed, 22 insertions(+), 18 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
--
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

* Re: [PATCH v3 1/4] params: lift param_set_uint_minmax to common code
2021-06-16 21:19 ` [PATCH v3 1/4] params: lift param_set_uint_minmax to common code Sagi Grimberg
2021-06-17 5:45 ` Hannes Reinecke
@ 2021-06-17 8:00 ` Daniel Wagner
2021-06-17 13:48 ` Christoph Hellwig
2 siblings, 0 replies; 13+ messages in thread
From: Daniel Wagner @ 2021-06-17 8:00 UTC (permalink / raw)
To: Sagi Grimberg
Cc: linux-nvme, Christoph Hellwig, Keith Busch, Hannes Reinecke, Chaitanya Kulkarni

On Wed, Jun 16, 2021 at 02:19:33PM -0700, Sagi Grimberg wrote:
> It is a useful helper hence move it to common code so others can enjoy
> it.
>
> Suggested-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
> Cc: linux-nfs@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

[...]

> +int param_set_uint_minmax(const char *val,
> +		const struct kernel_param *kp,
> +		unsigned int min, unsigned int max)
> +{
> +	unsigned int num;
> +	int ret;
> +
> +	if (!val)
> +		return -EINVAL;
> +	ret = kstrtouint(val, 0, &num);
> +	if (ret)
> +		return ret;
> +	if (num < min || num > max)
> +		return -EINVAL;
> +	*((unsigned int *)kp->arg) = num;
> +	return 0;
> +}
> +EXPORT_SYMBOL(param_set_uint_minmax);

Couldn't this be EXPORT_SYMBOL_GPL?

* Re: [PATCH v3 1/4] params: lift param_set_uint_minmax to common code
2021-06-16 21:19 ` [PATCH v3 1/4] params: lift param_set_uint_minmax to common code Sagi Grimberg
2021-06-17 5:45 ` Hannes Reinecke
2021-06-17 8:00 ` Daniel Wagner
@ 2021-06-17 13:48 ` Christoph Hellwig
2 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2021-06-17 13:48 UTC (permalink / raw)
To: Sagi Grimberg
Cc: linux-nvme, Christoph Hellwig, Keith Busch, Hannes Reinecke, Chaitanya Kulkarni

This needs a Cc to linux-kernel..

> @@ -432,6 +432,9 @@ extern const struct kernel_param_ops param_ops_uint;
>  extern int param_set_uint(const char *val, const struct kernel_param *kp);
>  extern int param_get_uint(char *buffer, const struct kernel_param *kp);
>  #define param_check_uint(name, p) __param_check(name, p, unsigned int)
> +int param_set_uint_minmax(const char *val,
> +		const struct kernel_param *kp,
> +		unsigned int min, unsigned int max);

Super minor nitpick, but for consistency I'd move it above the
param_check_uint definition.

* [PATCH v3 2/4] nvme-pci: limit maximum queue depth to 4095
2021-06-16 21:19 [PATCH v3 0/4] nvme: protect against possible request reference after completion Sagi Grimberg
2021-06-16 21:19 ` [PATCH v3 1/4] params: lift param_set_uint_minmax to common code Sagi Grimberg
@ 2021-06-16 21:19 ` Sagi Grimberg
2021-06-17 5:46 ` Hannes Reinecke
2021-06-17 8:04 ` Daniel Wagner
2021-06-16 21:19 ` [PATCH v3 3/4] nvme-tcp: don't check blk_mq_tag_to_rq when receiving pdu data Sagi Grimberg
` (2 subsequent siblings)
4 siblings, 2 replies; 13+ messages in thread
From: Sagi Grimberg @ 2021-06-16 21:19 UTC (permalink / raw)
To: linux-nvme, Christoph Hellwig, Keith Busch
Cc: Hannes Reinecke, Chaitanya Kulkarni

We are going to use the upper 4-bits of the command_id for a generation
counter, so enforce the new queue depth upper limit. As we enforce
both min and max queue depth, use param_set_uint_minmax instead of
open coding it.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/pci.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 9cdf2099027a..d2ee9e46f5d3 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -60,6 +60,8 @@ MODULE_PARM_DESC(sgl_threshold,
 		"Use SGLs when average request segment size is larger or equal to "
 		"this size. Use 0 to disable SGLs.");
 
+#define NVME_PCI_MIN_QUEUE_SIZE 2
+#define NVME_PCI_MAX_QUEUE_SIZE 4095
 static int io_queue_depth_set(const char *val, const struct kernel_param *kp);
 static const struct kernel_param_ops io_queue_depth_ops = {
 	.set = io_queue_depth_set,
@@ -68,7 +70,7 @@ static const struct kernel_param_ops io_queue_depth_ops = {
 
 static unsigned int io_queue_depth = 1024;
 module_param_cb(io_queue_depth, &io_queue_depth_ops, &io_queue_depth, 0644);
-MODULE_PARM_DESC(io_queue_depth, "set io queue depth, should >= 2");
+MODULE_PARM_DESC(io_queue_depth, "set io queue depth, should >= 2 and < 4096");
 
 static int io_queue_count_set(const char *val, const struct kernel_param *kp)
 {
@@ -157,14 +159,9 @@ struct nvme_dev {
 
 static int io_queue_depth_set(const char *val, const struct kernel_param *kp)
 {
-	int ret;
-	u32 n;
-
-	ret = kstrtou32(val, 10, &n);
-	if (ret != 0 || n < 2)
-		return -EINVAL;
-
-	return param_set_uint(val, kp);
+	return param_set_uint_minmax(val, kp,
+			NVME_PCI_MIN_QUEUE_SIZE,
+			NVME_PCI_MAX_QUEUE_SIZE);
 }
 
 static inline unsigned int sq_idx(unsigned int qid, u32 stride)
--
2.27.0

* Re: [PATCH v3 2/4] nvme-pci: limit maximum queue depth to 4095
2021-06-16 21:19 ` [PATCH v3 2/4] nvme-pci: limit maximum queue depth to 4095 Sagi Grimberg
@ 2021-06-17 5:46 ` Hannes Reinecke
2021-06-17 8:04 ` Daniel Wagner
1 sibling, 0 replies; 13+ messages in thread
From: Hannes Reinecke @ 2021-06-17 5:46 UTC (permalink / raw)
To: Sagi Grimberg, linux-nvme, Christoph Hellwig, Keith Busch
Cc: Chaitanya Kulkarni

On 6/16/21 11:19 PM, Sagi Grimberg wrote:
> We are going to use the upper 4-bits of the command_id for a generation
> counter, so enforce the new queue depth upper limit. As we enforce
> both min and max queue depth, use param_set_uint_minmax istead of
> open coding it.
>
> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
> ---
>  drivers/nvme/host/pci.c | 15 ++++++---------
>  1 file changed, 6 insertions(+), 9 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
--
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

* Re: [PATCH v3 2/4] nvme-pci: limit maximum queue depth to 4095
2021-06-16 21:19 ` [PATCH v3 2/4] nvme-pci: limit maximum queue depth to 4095 Sagi Grimberg
2021-06-17 5:46 ` Hannes Reinecke
@ 2021-06-17 8:04 ` Daniel Wagner
1 sibling, 0 replies; 13+ messages in thread
From: Daniel Wagner @ 2021-06-17 8:04 UTC (permalink / raw)
To: Sagi Grimberg
Cc: linux-nvme, Christoph Hellwig, Keith Busch, Hannes Reinecke, Chaitanya Kulkarni

On Wed, Jun 16, 2021 at 02:19:34PM -0700, Sagi Grimberg wrote:
> We are going to use the upper 4-bits of the command_id for a generation
> counter, so enforce the new queue depth upper limit. As we enforce
> both min and max queue depth, use param_set_uint_minmax istead of
> open coding it.
>
> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
> ---

Reviewed-by: Daniel Wagner <dwagner@suse.de>

* [PATCH v3 3/4] nvme-tcp: don't check blk_mq_tag_to_rq when receiving pdu data
2021-06-16 21:19 [PATCH v3 0/4] nvme: protect against possible request reference after completion Sagi Grimberg
2021-06-16 21:19 ` [PATCH v3 1/4] params: lift param_set_uint_minmax to common code Sagi Grimberg
2021-06-16 21:19 ` [PATCH v3 2/4] nvme-pci: limit maximum queue depth to 4095 Sagi Grimberg
@ 2021-06-16 21:19 ` Sagi Grimberg
2021-06-17 8:11 ` Daniel Wagner
2021-06-16 21:19 ` [PATCH v3 4/4] nvme: code command_id with a genctr for use-after-free validation Sagi Grimberg
` (2 subsequent siblings)
4 siblings, 1 reply; 13+ messages in thread
From: Sagi Grimberg @ 2021-06-16 21:19 UTC (permalink / raw)
To: linux-nvme, Christoph Hellwig, Keith Busch
Cc: Hannes Reinecke, Chaitanya Kulkarni

We already validate it when receiving the c2hdata pdu header
and this is not changing so this is a redundant check.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/tcp.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index c7bd37103cf4..3ad65f42fc1e 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -703,17 +703,9 @@ static int nvme_tcp_recv_data(struct nvme_tcp_queue *queue, struct sk_buff *skb,
 		unsigned int *offset, size_t *len)
 {
 	struct nvme_tcp_data_pdu *pdu = (void *)queue->pdu;
-	struct nvme_tcp_request *req;
-	struct request *rq;
-
-	rq = blk_mq_tag_to_rq(nvme_tcp_tagset(queue), pdu->command_id);
-	if (!rq) {
-		dev_err(queue->ctrl->ctrl.device,
-			"queue %d tag %#x not found\n",
-			nvme_tcp_queue_id(queue), pdu->command_id);
-		return -ENOENT;
-	}
-	req = blk_mq_rq_to_pdu(rq);
+	struct request *rq =
+		blk_mq_tag_to_rq(nvme_tcp_tagset(queue), pdu->command_id);
+	struct nvme_tcp_request *req = blk_mq_rq_to_pdu(rq);
 
 	while (true) {
 		int recv_len, ret;
--
2.27.0

* Re: [PATCH v3 3/4] nvme-tcp: don't check blk_mq_tag_to_rq when receiving pdu data
2021-06-16 21:19 ` [PATCH v3 3/4] nvme-tcp: don't check blk_mq_tag_to_rq when receiving pdu data Sagi Grimberg
@ 2021-06-17 8:11 ` Daniel Wagner
0 siblings, 0 replies; 13+ messages in thread
From: Daniel Wagner @ 2021-06-17 8:11 UTC (permalink / raw)
To: Sagi Grimberg
Cc: linux-nvme, Christoph Hellwig, Keith Busch, Hannes Reinecke, Chaitanya Kulkarni

On Wed, Jun 16, 2021 at 02:19:35PM -0700, Sagi Grimberg wrote:
> We already validate it when receiving the c2hdata pdu header
> and this is not changing so this is a redundant check.
>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

Reviewed-by: Daniel Wagner <dwagner@suse.de>

* [PATCH v3 4/4] nvme: code command_id with a genctr for use-after-free validation
2021-06-16 21:19 [PATCH v3 0/4] nvme: protect against possible request reference after completion Sagi Grimberg
` (2 preceding siblings ...)
2021-06-16 21:19 ` [PATCH v3 3/4] nvme-tcp: don't check blk_mq_tag_to_rq when receiving pdu data Sagi Grimberg
@ 2021-06-16 21:19 ` Sagi Grimberg
2021-06-17 8:56 ` Daniel Wagner
2021-07-16 7:15 ` [PATCH v3 0/4] nvme: protect against possible request reference after completion Christoph Hellwig
4 siblings, 1 reply; 13+ messages in thread
From: Sagi Grimberg @ 2021-06-16 21:19 UTC (permalink / raw)
To: linux-nvme, Christoph Hellwig, Keith Busch
Cc: Hannes Reinecke, Chaitanya Kulkarni

We cannot detect a (perhaps buggy) controller that is sending us
a completion for a request that was already completed (for example
sending a completion twice); this phenomenon was seen in the wild
a few times.

So to protect against this, we use the upper 4 msbits of the nvme sqe
command_id as a 4-bit generation counter and verify that it matches
the existing request generation, which is incremented on every
execution.

The 16-bit command_id structure now is constructed by:
| xxxx | xxxxxxxxxxxx |
  gen    request tag

This means that we are giving up some possible queue depth as 12 bits
allow for a maximum queue depth of 4095 instead of 65536, however we
never create such long queues anyways so no real harm done.

Suggested-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/core.c   |  3 ++-
 drivers/nvme/host/nvme.h   | 47 +++++++++++++++++++++++++++++++++++++-
 drivers/nvme/host/pci.c    |  2 +-
 drivers/nvme/host/rdma.c   |  4 ++--
 drivers/nvme/host/tcp.c    | 26 ++++++++++-----------
 drivers/nvme/target/loop.c |  4 ++--
 6 files changed, 66 insertions(+), 20 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 25ad9027f928..a75876dfa38c 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1026,7 +1026,8 @@ blk_status_t nvme_setup_cmd(struct nvme_ns *ns, struct request *req)
 		return BLK_STS_IOERR;
 	}
 
-	cmd->common.command_id = req->tag;
+	nvme_req(req)->genctr++;
+	cmd->common.command_id = nvme_cid(req);
 	trace_nvme_setup_cmd(req, cmd);
 	return ret;
 }
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 1aab74128d40..ad518b1c0fac 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -158,6 +158,7 @@ enum nvme_quirks {
 struct nvme_request {
 	struct nvme_command	*cmd;
 	union nvme_result	result;
+	u8			genctr;
 	u8			retries;
 	u8			flags;
 	u16			status;
@@ -497,6 +498,49 @@ struct nvme_ctrl_ops {
 	int (*get_address)(struct nvme_ctrl *ctrl, char *buf, int size);
 };
 
+/*
+ * nvme command_id is constructed as such:
+ * | xxxx | xxxxxxxxxxxx |
+ *   gen    request tag
+ */
+#define nvme_genctr_mask(gen)		(gen & 0xf)
+#define nvme_cid_install_genctr(gen)	(nvme_genctr_mask(gen) << 12)
+#define nvme_genctr_from_cid(cid)	((cid & 0xf000) >> 12)
+#define nvme_tag_from_cid(cid)		(cid & 0xfff)
+
+static inline u16 nvme_cid(struct request *rq)
+{
+	return nvme_cid_install_genctr(nvme_req(rq)->genctr) | rq->tag;
+}
+
+static inline struct request *nvme_find_rq(struct blk_mq_tags *tags,
+		u16 command_id)
+{
+	u8 genctr = nvme_genctr_from_cid(command_id);
+	u16 tag = nvme_tag_from_cid(command_id);
+	struct request *rq;
+
+	rq = blk_mq_tag_to_rq(tags, tag);
+	if (unlikely(!rq)) {
+		pr_err("could not locate request for tag %#x\n",
+			tag);
+		return NULL;
+	}
+	if (unlikely(nvme_genctr_mask(nvme_req(rq)->genctr) != genctr)) {
+		dev_err(nvme_req(rq)->ctrl->device,
+			"request %#x genctr mismatch (got %#x expected %#x)\n",
+			tag, genctr, nvme_genctr_mask(nvme_req(rq)->genctr));
+		return NULL;
+	}
+	return rq;
+}
+
+static inline struct request *nvme_cid_to_rq(struct blk_mq_tags *tags,
+		u16 command_id)
+{
+	return blk_mq_tag_to_rq(tags, nvme_tag_from_cid(command_id));
+}
+
 #ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
 void nvme_fault_inject_init(struct nvme_fault_inject *fault_inj,
 			    const char *dev_name);
@@ -594,7 +638,8 @@ static inline void nvme_put_ctrl(struct nvme_ctrl *ctrl)
 
 static inline bool nvme_is_aen_req(u16 qid, __u16 command_id)
 {
-	return !qid && command_id >= NVME_AQ_BLK_MQ_DEPTH;
+	return !qid &&
+		nvme_tag_from_cid(command_id) >= NVME_AQ_BLK_MQ_DEPTH;
 }
 
 void nvme_complete_rq(struct request *req);
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index d2ee9e46f5d3..e111532c2082 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1012,7 +1012,7 @@ static inline void nvme_handle_cqe(struct nvme_queue *nvmeq, u16 idx)
 		return;
 	}
 
-	req = blk_mq_tag_to_rq(nvme_queue_tagset(nvmeq), command_id);
+	req = nvme_find_rq(nvme_queue_tagset(nvmeq), command_id);
 	if (unlikely(!req)) {
 		dev_warn(nvmeq->dev->ctrl.device,
 			"invalid id %d completed on queue %d\n",
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 74bf2c7f2b80..14d5023603d7 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1730,10 +1730,10 @@ static void nvme_rdma_process_nvme_rsp(struct nvme_rdma_queue *queue,
 	struct request *rq;
 	struct nvme_rdma_request *req;
 
-	rq = blk_mq_tag_to_rq(nvme_rdma_tagset(queue), cqe->command_id);
+	rq = nvme_find_rq(nvme_rdma_tagset(queue), cqe->command_id);
 	if (!rq) {
 		dev_err(queue->ctrl->ctrl.device,
-			"tag 0x%x on QP %#x not found\n",
+			"got bad command_id %#x on QP %#x\n",
 			cqe->command_id, queue->qp->qp_num);
 		nvme_rdma_error_recovery(queue->ctrl);
 		return;
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 3ad65f42fc1e..b906ee41449e 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -488,11 +488,11 @@ static int nvme_tcp_process_nvme_cqe(struct nvme_tcp_queue *queue,
 {
 	struct request *rq;
 
-	rq = blk_mq_tag_to_rq(nvme_tcp_tagset(queue), cqe->command_id);
+	rq = nvme_find_rq(nvme_tcp_tagset(queue), cqe->command_id);
 	if (!rq) {
 		dev_err(queue->ctrl->ctrl.device,
-			"queue %d tag 0x%x not found\n",
-			nvme_tcp_queue_id(queue), cqe->command_id);
+			"got bad cqe.command_id %#x on queue %d\n",
+			cqe->command_id, nvme_tcp_queue_id(queue));
 		nvme_tcp_error_recovery(&queue->ctrl->ctrl);
 		return -EINVAL;
 	}
@@ -509,11 +509,11 @@ static int nvme_tcp_handle_c2h_data(struct nvme_tcp_queue *queue,
 {
 	struct request *rq;
 
-	rq = blk_mq_tag_to_rq(nvme_tcp_tagset(queue), pdu->command_id);
+	rq = nvme_find_rq(nvme_tcp_tagset(queue), pdu->command_id);
 	if (!rq) {
 		dev_err(queue->ctrl->ctrl.device,
-			"queue %d tag %#x not found\n",
-			nvme_tcp_queue_id(queue), pdu->command_id);
+			"got bad c2hdata.command_id %#x on queue %d\n",
+			pdu->command_id, nvme_tcp_queue_id(queue));
 		return -ENOENT;
 	}
 
@@ -607,7 +607,7 @@ static int nvme_tcp_setup_h2c_data_pdu(struct nvme_tcp_request *req,
 	data->hdr.plen =
 		cpu_to_le32(data->hdr.hlen + hdgst + req->pdu_len + ddgst);
 	data->ttag = pdu->ttag;
-	data->command_id = rq->tag;
+	data->command_id = nvme_cid(rq);
 	data->data_offset = cpu_to_le32(req->data_sent);
 	data->data_length = cpu_to_le32(req->pdu_len);
 	return 0;
@@ -620,11 +620,11 @@ static int nvme_tcp_handle_r2t(struct nvme_tcp_queue *queue,
 	struct request *rq;
 	int ret;
 
-	rq = blk_mq_tag_to_rq(nvme_tcp_tagset(queue), pdu->command_id);
+	rq = nvme_find_rq(nvme_tcp_tagset(queue), pdu->command_id);
 	if (!rq) {
 		dev_err(queue->ctrl->ctrl.device,
-			"queue %d tag %#x not found\n",
-			nvme_tcp_queue_id(queue), pdu->command_id);
+			"got bad r2t.command_id %#x on queue %d\n",
+			pdu->command_id, nvme_tcp_queue_id(queue));
 		return -ENOENT;
 	}
 	req = blk_mq_rq_to_pdu(rq);
@@ -704,7 +704,7 @@ static int nvme_tcp_recv_data(struct nvme_tcp_queue *queue, struct sk_buff *skb,
 {
 	struct nvme_tcp_data_pdu *pdu = (void *)queue->pdu;
 	struct request *rq =
-		blk_mq_tag_to_rq(nvme_tcp_tagset(queue), pdu->command_id);
+		nvme_cid_to_rq(nvme_tcp_tagset(queue), pdu->command_id);
 	struct nvme_tcp_request *req = blk_mq_rq_to_pdu(rq);
 
 	while (true) {
@@ -797,8 +797,8 @@ static int nvme_tcp_recv_ddgst(struct nvme_tcp_queue *queue,
 	}
 
 	if (pdu->hdr.flags & NVME_TCP_F_DATA_SUCCESS) {
-		struct request *rq = blk_mq_tag_to_rq(nvme_tcp_tagset(queue),
-						pdu->command_id);
+		struct request *rq = nvme_cid_to_rq(nvme_tcp_tagset(queue),
+						pdu->command_id);
 
 		nvme_tcp_end_request(rq, NVME_SC_SUCCESS);
 		queue->nr_cqe++;
diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index cb30cb942e1d..88d34212b6dc 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -107,10 +107,10 @@ static void nvme_loop_queue_response(struct nvmet_req *req)
 	} else {
 		struct request *rq;
 
-		rq = blk_mq_tag_to_rq(nvme_loop_tagset(queue), cqe->command_id);
+		rq = nvme_find_rq(nvme_loop_tagset(queue), cqe->command_id);
 		if (!rq) {
 			dev_err(queue->ctrl->ctrl.device,
-				"tag 0x%x on queue %d not found\n",
+				"got bad command_id %#x on queue %d\n",
 				cqe->command_id, nvme_loop_queue_idx(queue));
 			return;
 		}
--
2.27.0

* Re: [PATCH v3 4/4] nvme: code command_id with a genctr for use-after-free validation
2021-06-16 21:19 ` [PATCH v3 4/4] nvme: code command_id with a genctr for use-after-free validation Sagi Grimberg
@ 2021-06-17 8:56 ` Daniel Wagner
0 siblings, 0 replies; 13+ messages in thread
From: Daniel Wagner @ 2021-06-17 8:56 UTC (permalink / raw)
To: Sagi Grimberg
Cc: linux-nvme, Christoph Hellwig, Keith Busch, Hannes Reinecke, Chaitanya Kulkarni

On Wed, Jun 16, 2021 at 02:19:36PM -0700, Sagi Grimberg wrote:
> We cannot detect a (perhaps buggy) controller that is sending us
> a completion for a request that was already completed (for example
> sending a completion twice), this phenomenon was seen in the wild
> a few times.
>
> So to protect against this, we use the upper 4 msbits of the nvme sqe
> command_id to use as a 4-bit generation counter and verify it matches
> the existing request generation that is incrementing on every execution.
>
> The 16-bit command_id structure now is constructed by:
> | xxxx | xxxxxxxxxxxx |
>   gen    request tag
>
> This means that we are giving up some possible queue depth as 12 bits
> allow for a maximum queue depth of 4095 instead of 65536, however we
> never create such long queues anyways so no real harm done.
>
> Suggested-by: Keith Busch <kbusch@kernel.org>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Acked-by: Keith Busch <kbusch@kernel.org>
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

I've tested (only functional) this on FC (NetApp target). All looks
good.

Tested-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Daniel Wagner <dwagner@suse.de>

* Re: [PATCH v3 0/4] nvme: protect against possible request reference after completion
2021-06-16 21:19 [PATCH v3 0/4] nvme: protect against possible request reference after completion Sagi Grimberg
` (3 preceding siblings ...)
2021-06-16 21:19 ` [PATCH v3 4/4] nvme: code command_id with a genctr for use-after-free validation Sagi Grimberg
@ 2021-07-16 7:15 ` Christoph Hellwig
4 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2021-07-16 7:15 UTC (permalink / raw)
To: Sagi Grimberg
Cc: linux-nvme, Christoph Hellwig, Keith Busch, Hannes Reinecke, Chaitanya Kulkarni

Thanks, applies to nvme-5.15.

end of thread, other threads: [~2021-07-16 7:15 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-16 21:19 [PATCH v3 0/4] nvme: protect against possible request reference after completion Sagi Grimberg
2021-06-16 21:19 ` [PATCH v3 1/4] params: lift param_set_uint_minmax to common code Sagi Grimberg
2021-06-17 5:45 ` Hannes Reinecke
2021-06-17 8:00 ` Daniel Wagner
2021-06-17 13:48 ` Christoph Hellwig
2021-06-16 21:19 ` [PATCH v3 2/4] nvme-pci: limit maximum queue depth to 4095 Sagi Grimberg
2021-06-17 5:46 ` Hannes Reinecke
2021-06-17 8:04 ` Daniel Wagner
2021-06-16 21:19 ` [PATCH v3 3/4] nvme-tcp: don't check blk_mq_tag_to_rq when receiving pdu data Sagi Grimberg
2021-06-17 8:11 ` Daniel Wagner
2021-06-16 21:19 ` [PATCH v3 4/4] nvme: code command_id with a genctr for use-after-free validation Sagi Grimberg
2021-06-17 8:56 ` Daniel Wagner
2021-07-16 7:15 ` [PATCH v3 0/4] nvme: protect against possible request reference after completion Christoph Hellwig