public inbox for linux-nvme@lists.infradead.org
* [PATCH 0/5] Support for Cancel commands for host (TCP and RDMA)
@ 2024-05-10 16:30 Maurizio Lombardi
  2024-05-10 16:30 ` [PATCH 1/5] nvme: add definitions for cancel command Maurizio Lombardi
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Maurizio Lombardi @ 2024-05-10 16:30 UTC (permalink / raw)
  To: kbusch; +Cc: hare, sagi, emilne, jmeneghi, linux-nvme

The nvme host driver could take advantage of Cancel (defined by TP 4097a)
to abort a command that timed out, instead of resetting the controller.

The Abort command is mandated by the spec, but the target isn't actually
required to perform any meaningful work (as happens, for example, with the
Linux nvmet implementation, where the abort command does nothing at all).
The Cancel command, on the other hand, is optional; therefore, if the target
claims to support it, it's reasonable to expect its implementation to
do something useful.

This patchset modifies the tcp and rdma host drivers' timeout
handlers to check whether the target supports the Cancel command; if it
does, they use it to try to abort the command that timed out.
If Cancel isn't supported, or if the abort operation fails,
the host falls back to a controller reset.

Tests were carried out both against a modified nvmet driver
(see PATCH 5) and against real hardware devices.

Example dmesg output:
nvme nvme4: I/O tag 73 (4049) type 4 opcode 0x1 (Write) QID 8 timeout
nvme nvme4: Cancel status: 0x0 imm abrts = 0 def abrts = 1


John Meneghini (1):
  nvme: add definitions for cancel command

Maurizio Lombardi (4):
  nvme-core: add a function to submit a cancel command
  nvme-tcp: use the cancel command to perform an abort if target
    supports it
  nvme-rdma: use the cancel command to perform an abort if target
    supports it
  nvmet: target support for cancel commands

 drivers/nvme/host/constants.c       |  1 +
 drivers/nvme/host/core.c            | 55 ++++++++++++++++++++
 drivers/nvme/host/nvme.h            |  2 +
 drivers/nvme/host/rdma.c            | 28 ++++++++--
 drivers/nvme/host/tcp.c             | 19 +++++++
 drivers/nvme/target/Makefile        |  2 +-
 drivers/nvme/target/admin-cmd.c     | 36 +++++++++++++
 drivers/nvme/target/core.c          | 45 ++++++++++++++++
 drivers/nvme/target/io-cmd-cancel.c | 81 +++++++++++++++++++++++++++++
 drivers/nvme/target/io-cmd-file.c   | 19 +++++++
 drivers/nvme/target/nvmet.h         | 10 ++++
 include/linux/nvme.h                | 19 +++++++
 12 files changed, 312 insertions(+), 5 deletions(-)
 create mode 100644 drivers/nvme/target/io-cmd-cancel.c

-- 
2.39.3



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/5] nvme: add definitions for cancel command
  2024-05-10 16:30 [PATCH 0/5] Support for Cancel commands for host (TCP and RDMA) Maurizio Lombardi
@ 2024-05-10 16:30 ` Maurizio Lombardi
  2024-05-10 16:30 ` [PATCH 2/5] nvme-core: add a function to submit a " Maurizio Lombardi
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Maurizio Lombardi @ 2024-05-10 16:30 UTC (permalink / raw)
  To: kbusch; +Cc: hare, sagi, emilne, jmeneghi, linux-nvme

From: John Meneghini <jmeneghi@redhat.com>

Add new definitions needed to support TP4097a.

Signed-off-by: John Meneghini <jmeneghi@redhat.com>
---
 drivers/nvme/host/constants.c |  1 +
 drivers/nvme/host/core.c      |  1 +
 include/linux/nvme.h          | 19 +++++++++++++++++++
 3 files changed, 21 insertions(+)

diff --git a/drivers/nvme/host/constants.c b/drivers/nvme/host/constants.c
index 6f2ebb5fcdb0..d1fb08a0cc42 100644
--- a/drivers/nvme/host/constants.c
+++ b/drivers/nvme/host/constants.c
@@ -19,6 +19,7 @@ static const char * const nvme_ops[] = {
 	[nvme_cmd_resv_report] = "Reservation Report",
 	[nvme_cmd_resv_acquire] = "Reservation Acquire",
 	[nvme_cmd_resv_release] = "Reservation Release",
+	[nvme_cmd_cancel] = "Cancel",
 	[nvme_cmd_zone_mgmt_send] = "Zone Management Send",
 	[nvme_cmd_zone_mgmt_recv] = "Zone Management Receive",
 	[nvme_cmd_zone_append] = "Zone Append",
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 095f59e7aa93..b48967c98114 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4835,6 +4835,7 @@ static inline void _nvme_check_size(void)
 	BUILD_BUG_ON(sizeof(struct nvme_dsm_cmd) != 64);
 	BUILD_BUG_ON(sizeof(struct nvme_write_zeroes_cmd) != 64);
 	BUILD_BUG_ON(sizeof(struct nvme_abort_cmd) != 64);
+	BUILD_BUG_ON(sizeof(struct nvme_cancel_cmd) != 64);
 	BUILD_BUG_ON(sizeof(struct nvme_get_log_page_command) != 64);
 	BUILD_BUG_ON(sizeof(struct nvme_command) != 64);
 	BUILD_BUG_ON(sizeof(struct nvme_id_ctrl) != NVME_IDENTIFY_DATA_SIZE);
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 425573202295..8efebc805e19 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -830,6 +830,7 @@ enum nvme_opcode {
 	nvme_cmd_resv_report	= 0x0e,
 	nvme_cmd_resv_acquire	= 0x11,
 	nvme_cmd_resv_release	= 0x15,
+	nvme_cmd_cancel		= 0x18,
 	nvme_cmd_zone_mgmt_send	= 0x79,
 	nvme_cmd_zone_mgmt_recv	= 0x7a,
 	nvme_cmd_zone_append	= 0x7d,
@@ -1343,6 +1344,22 @@ struct nvme_abort_cmd {
 	__u32			rsvd11[5];
 };
 
+struct nvme_cancel_cmd {
+	__u8			opcode;
+	__u8			flags;
+	__u16			command_id;
+	__le32			nsid;
+	__u32			rsvd1[8];
+	__le16			sqid;
+	__u16			cid;
+	__u8			action;
+	__u8			rsvd11[3];
+	__u32			rsvd12[4];
+};
+
+#define NVME_CANCEL_ACTION_SINGLE_CMD	0
+#define NVME_CANCEL_ACTION_MUL_CMD	1
+
 struct nvme_download_firmware {
 	__u8			opcode;
 	__u8			flags;
@@ -1798,6 +1815,7 @@ struct nvme_command {
 		struct nvme_zone_mgmt_send_cmd zms;
 		struct nvme_zone_mgmt_recv_cmd zmr;
 		struct nvme_abort_cmd abort;
+		struct nvme_cancel_cmd cancel;
 		struct nvme_get_log_page_command get_log_page;
 		struct nvmf_common_command fabrics;
 		struct nvmf_connect_command connect;
@@ -1938,6 +1956,7 @@ enum {
 	NVME_SC_INVALID_PI		= 0x181,
 	NVME_SC_READ_ONLY		= 0x182,
 	NVME_SC_ONCS_NOT_SUPPORTED	= 0x183,
+	NVME_SC_INVALID_CID		= 0x184,
 
 	/*
 	 * I/O Command Set Specific - Fabrics commands:
-- 
2.39.3




* [PATCH 2/5] nvme-core: add a function to submit a cancel command
  2024-05-10 16:30 [PATCH 0/5] Support for Cancel commands for host (TCP and RDMA) Maurizio Lombardi
  2024-05-10 16:30 ` [PATCH 1/5] nvme: add definitions for cancel command Maurizio Lombardi
@ 2024-05-10 16:30 ` Maurizio Lombardi
  2024-05-12 14:06   ` Sagi Grimberg
  2024-05-10 16:30 ` [PATCH 3/5] nvme-tcp: use the cancel command to perform an abort if target supports it Maurizio Lombardi
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Maurizio Lombardi @ 2024-05-10 16:30 UTC (permalink / raw)
  To: kbusch; +Cc: hare, sagi, emilne, jmeneghi, linux-nvme

Add a function that sends a cancel command to abort the specified
request, using the "single command" cancel action.

When the cancel command completes, the host driver prints the
number of deferred and immediate aborts performed by the target.

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
---
 drivers/nvme/host/core.c | 54 ++++++++++++++++++++++++++++++++++++++++
 drivers/nvme/host/nvme.h |  2 ++
 2 files changed, 56 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index b48967c98114..3d8c31d5f8ae 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2965,6 +2965,60 @@ int nvme_get_log(struct nvme_ctrl *ctrl, u32 nsid, u8 log_page, u8 lsp, u8 csi,
 	return nvme_submit_sync_cmd(ctrl->admin_q, &c, log, size);
 }
 
+static enum rq_end_io_ret nvme_cancel_endio(struct request *req, blk_status_t error)
+{
+	struct nvme_ctrl *ctrl = req->end_io_data;
+	u32 result;
+	u16 imm_abrts, def_abrts;
+	u16 status = nvme_req(req)->status;
+
+	result = le32_to_cpu(nvme_req(req)->result.u32);
+
+	def_abrts = upper_16_bits(result);
+	imm_abrts = lower_16_bits(result);
+
+	dev_warn(ctrl->device,
+		 "Cancel status: 0x%x imm abrts = %u def abrts = %u\n",
+		 status, imm_abrts, def_abrts);
+
+	blk_mq_free_request(req);
+	return RQ_END_IO_NONE;
+}
+
+int nvme_submit_cancel_req(struct nvme_ctrl *ctrl, struct request *rq,
+			   unsigned int sqid)
+{
+	struct nvme_command c = { };
+	struct request *cancel_req;
+
+	if (sqid == 0)
+		return -EINVAL;
+
+	c.cancel.opcode = nvme_cmd_cancel;
+	c.cancel.cid = nvme_cid(rq);
+	c.cancel.sqid = cpu_to_le16(sqid);
+	c.cancel.nsid = NVME_NSID_ALL;
+	c.cancel.action = NVME_CANCEL_ACTION_SINGLE_CMD;
+
+	cancel_req = blk_mq_alloc_request_hctx(rq->q, nvme_req_op(&c),
+					       BLK_MQ_REQ_NOWAIT |
+					       BLK_MQ_REQ_RESERVED,
+					       sqid - 1);
+	if (IS_ERR(cancel_req)) {
+		dev_warn(ctrl->device, "Failed to allocate Cancel request!\n");
+		return PTR_ERR(cancel_req);
+	}
+
+	nvme_init_request(cancel_req, &c);
+	cancel_req->end_io = nvme_cancel_endio;
+	cancel_req->end_io_data = ctrl;
+
+	blk_execute_rq_nowait(cancel_req, false);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(nvme_submit_cancel_req);
+
 static int nvme_get_effects_log(struct nvme_ctrl *ctrl, u8 csi,
 				struct nvme_effects_log **log)
 {
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index b564c5f1450c..c4cea8b948a8 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -895,6 +895,8 @@ int nvme_delete_ctrl(struct nvme_ctrl *ctrl);
 void nvme_queue_scan(struct nvme_ctrl *ctrl);
 int nvme_get_log(struct nvme_ctrl *ctrl, u32 nsid, u8 log_page, u8 lsp, u8 csi,
 		void *log, size_t size, u64 offset);
+int nvme_submit_cancel_req(struct nvme_ctrl *ctrl, struct request *rq,
+			       unsigned int sqid);
 bool nvme_tryget_ns_head(struct nvme_ns_head *head);
 void nvme_put_ns_head(struct nvme_ns_head *head);
 int nvme_cdev_add(struct cdev *cdev, struct device *cdev_device,
-- 
2.39.3




* [PATCH 3/5] nvme-tcp: use the cancel command to perform an abort if target supports it
  2024-05-10 16:30 [PATCH 0/5] Support for Cancel commands for host (TCP and RDMA) Maurizio Lombardi
  2024-05-10 16:30 ` [PATCH 1/5] nvme: add definitions for cancel command Maurizio Lombardi
  2024-05-10 16:30 ` [PATCH 2/5] nvme-core: add a function to submit a " Maurizio Lombardi
@ 2024-05-10 16:30 ` Maurizio Lombardi
  2024-05-10 16:30 ` [PATCH 4/5] nvme-rdma: " Maurizio Lombardi
  2024-05-10 16:30 ` [donotmerge PATCH 5/5] nvmet: target support for cancel commands Maurizio Lombardi
  4 siblings, 0 replies; 15+ messages in thread
From: Maurizio Lombardi @ 2024-05-10 16:30 UTC (permalink / raw)
  To: kbusch; +Cc: hare, sagi, emilne, jmeneghi, linux-nvme

If available, use the Cancel command to abort a command that timed
out, instead of resetting the controller.

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
---
 drivers/nvme/host/tcp.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 28bc2f373cfa..d7922fa9aabe 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -121,6 +121,7 @@ struct nvme_tcp_request {
 	size_t			offset;
 	size_t			data_sent;
 	enum nvme_tcp_send_state state;
+	bool			aborted;
 };
 
 enum nvme_tcp_queue_flags {
@@ -2442,6 +2443,8 @@ static enum blk_eh_timer_return nvme_tcp_timeout(struct request *rq)
 	struct nvme_tcp_cmd_pdu *pdu = nvme_tcp_req_cmd_pdu(req);
 	struct nvme_command *cmd = &pdu->cmd;
 	int qid = nvme_tcp_queue_id(req->queue);
+	int error;
+	u32 effects;
 
 	dev_warn(ctrl->device,
 		 "I/O tag %d (%04x) type %d opcode %#x (%s) QID %d timeout\n",
@@ -2466,10 +2469,25 @@ static enum blk_eh_timer_return nvme_tcp_timeout(struct request *rq)
 		return BLK_EH_DONE;
 	}
 
+	if (!req->aborted) {
+		effects = le32_to_cpu(ctrl->effects->iocs[nvme_cmd_cancel]);
+
+		if (!(effects & NVME_CMD_EFFECTS_CSUPP))
+			goto err_recovery;
+
+		error = nvme_submit_cancel_req(ctrl, rq, qid);
+		if (error)
+			goto err_recovery;
+
+		req->aborted = true;
+		return BLK_EH_RESET_TIMER;
+	}
+
 	/*
 	 * LIVE state should trigger the normal error recovery which will
 	 * handle completing this request.
 	 */
+err_recovery:
 	nvme_tcp_error_recovery(ctrl);
 	return BLK_EH_RESET_TIMER;
 }
@@ -2514,6 +2532,7 @@ static blk_status_t nvme_tcp_setup_cmd_pdu(struct nvme_ns *ns,
 	req->pdu_len = 0;
 	req->pdu_sent = 0;
 	req->h2cdata_left = 0;
+	req->aborted = false;
 	req->data_len = blk_rq_nr_phys_segments(rq) ?
 				blk_rq_payload_bytes(rq) : 0;
 	req->curr_bio = rq->bio;
-- 
2.39.3




* [PATCH 4/5] nvme-rdma: use the cancel command to perform an abort if target supports it
  2024-05-10 16:30 [PATCH 0/5] Support for Cancel commands for host (TCP and RDMA) Maurizio Lombardi
                   ` (2 preceding siblings ...)
  2024-05-10 16:30 ` [PATCH 3/5] nvme-tcp: use the cancel command to perform an abort if target supports it Maurizio Lombardi
@ 2024-05-10 16:30 ` Maurizio Lombardi
  2024-05-10 16:30 ` [donotmerge PATCH 5/5] nvmet: target support for cancel commands Maurizio Lombardi
  4 siblings, 0 replies; 15+ messages in thread
From: Maurizio Lombardi @ 2024-05-10 16:30 UTC (permalink / raw)
  To: kbusch; +Cc: hare, sagi, emilne, jmeneghi, linux-nvme

If available, use the Cancel command to abort a command that timed
out, instead of resetting the controller.

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
---
 drivers/nvme/host/rdma.c | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 366f0bb4ebfc..a45b45965b9e 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -74,6 +74,7 @@ struct nvme_rdma_request {
 	struct nvme_rdma_sgl	data_sgl;
 	struct nvme_rdma_sgl	*metadata_sgl;
 	bool			use_sig_mr;
+	bool			aborted;
 };
 
 enum nvme_rdma_queue_flags {
@@ -1956,16 +1957,19 @@ static enum blk_eh_timer_return nvme_rdma_timeout(struct request *rq)
 {
 	struct nvme_rdma_request *req = blk_mq_rq_to_pdu(rq);
 	struct nvme_rdma_queue *queue = req->queue;
-	struct nvme_rdma_ctrl *ctrl = queue->ctrl;
+	struct nvme_rdma_ctrl *rdma_ctrl = queue->ctrl;
+	struct nvme_ctrl *ctrl = &rdma_ctrl->ctrl;
 	struct nvme_command *cmd = req->req.cmd;
 	int qid = nvme_rdma_queue_idx(queue);
+	int error;
+	u32 effects;
 
-	dev_warn(ctrl->ctrl.device,
+	dev_warn(ctrl->device,
 		 "I/O tag %d (%04x) opcode %#x (%s) QID %d timeout\n",
 		 rq->tag, nvme_cid(rq), cmd->common.opcode,
 		 nvme_fabrics_opcode_str(qid, cmd), qid);
 
-	if (nvme_ctrl_state(&ctrl->ctrl) != NVME_CTRL_LIVE) {
+	if (nvme_ctrl_state(ctrl) != NVME_CTRL_LIVE) {
 		/*
 		 * If we are resetting, connecting or deleting we should
 		 * complete immediately because we may block controller
@@ -1983,11 +1987,26 @@ static enum blk_eh_timer_return nvme_rdma_timeout(struct request *rq)
 		return BLK_EH_DONE;
 	}
 
+	if (!req->aborted) {
+		effects = le32_to_cpu(ctrl->effects->iocs[nvme_cmd_cancel]);
+
+		if (!(effects & NVME_CMD_EFFECTS_CSUPP))
+			goto err_recovery;
+
+		error = nvme_submit_cancel_req(ctrl, rq, qid);
+		if (error)
+			goto err_recovery;
+
+		req->aborted = true;
+		return BLK_EH_RESET_TIMER;
+	}
+
 	/*
 	 * LIVE state should trigger the normal error recovery which will
 	 * handle completing this request.
 	 */
-	nvme_rdma_error_recovery(ctrl);
+err_recovery:
+	nvme_rdma_error_recovery(rdma_ctrl);
 	return BLK_EH_RESET_TIMER;
 }
 
@@ -2011,6 +2030,7 @@ static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
 		return nvme_fail_nonready_command(&queue->ctrl->ctrl, rq);
 
 	dev = queue->device->dev;
+	req->aborted = false;
 
 	req->sqe.dma = ib_dma_map_single(dev, req->sqe.data,
 					 sizeof(struct nvme_command),
-- 
2.39.3




* [donotmerge PATCH 5/5] nvmet: target support for cancel commands
  2024-05-10 16:30 [PATCH 0/5] Support for Cancel commands for host (TCP and RDMA) Maurizio Lombardi
                   ` (3 preceding siblings ...)
  2024-05-10 16:30 ` [PATCH 4/5] nvme-rdma: " Maurizio Lombardi
@ 2024-05-10 16:30 ` Maurizio Lombardi
  4 siblings, 0 replies; 15+ messages in thread
From: Maurizio Lombardi @ 2024-05-10 16:30 UTC (permalink / raw)
  To: kbusch; +Cc: hare, sagi, emilne, jmeneghi, linux-nvme

This is just a target patch that I used to do some basic tests of the host
Cancel patchset.

This is not intended to be merged.

It's loosely based on LIO's iSCSI driver code: it keeps a list of
outstanding commands and scans it when a cancel is requested. Matching
requests are marked as aborted, and the nvmet_req_complete callback
then sets the NVME_SC_ABORT_REQ code in the status.

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
---
 drivers/nvme/target/Makefile        |  2 +-
 drivers/nvme/target/admin-cmd.c     | 36 +++++++++++++
 drivers/nvme/target/core.c          | 45 ++++++++++++++++
 drivers/nvme/target/io-cmd-cancel.c | 81 +++++++++++++++++++++++++++++
 drivers/nvme/target/io-cmd-file.c   | 19 +++++++
 drivers/nvme/target/nvmet.h         | 10 ++++
 6 files changed, 192 insertions(+), 1 deletion(-)
 create mode 100644 drivers/nvme/target/io-cmd-cancel.c

diff --git a/drivers/nvme/target/Makefile b/drivers/nvme/target/Makefile
index c66820102493..6541808579ae 100644
--- a/drivers/nvme/target/Makefile
+++ b/drivers/nvme/target/Makefile
@@ -10,7 +10,7 @@ obj-$(CONFIG_NVME_TARGET_FCLOOP)	+= nvme-fcloop.o
 obj-$(CONFIG_NVME_TARGET_TCP)		+= nvmet-tcp.o
 
 nvmet-y		+= core.o configfs.o admin-cmd.o fabrics-cmd.o \
-			discovery.o io-cmd-file.o io-cmd-bdev.o
+			discovery.o io-cmd-file.o io-cmd-bdev.o io-cmd-cancel.o
 nvmet-$(CONFIG_NVME_TARGET_PASSTHRU)	+= passthru.o
 nvmet-$(CONFIG_BLK_DEV_ZONED)		+= zns.o
 nvmet-$(CONFIG_NVME_TARGET_AUTH)	+= fabrics-cmd-auth.o auth.o
diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index f5b7054a4a05..f5640745efe7 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -176,6 +176,7 @@ static void nvmet_get_cmd_effects_nvm(struct nvme_effects_log *log)
 	log->iocs[nvme_cmd_read] =
 	log->iocs[nvme_cmd_flush] =
 	log->iocs[nvme_cmd_dsm]	=
+	log->iocs[nvme_cmd_cancel] =
 		cpu_to_le32(NVME_CMD_EFFECTS_CSUPP);
 	log->iocs[nvme_cmd_write] =
 	log->iocs[nvme_cmd_write_zeroes] =
@@ -736,8 +737,43 @@ static void nvmet_execute_identify(struct nvmet_req *req)
  */
 static void nvmet_execute_abort(struct nvmet_req *req)
 {
+	u16 cid;
+	u16 sqid;
+	unsigned long flags;
+	struct nvmet_sq *sq;
+	struct nvmet_ctrl *ctrl = req->sq->ctrl;
+	struct nvmet_req *r, *next;
+
 	if (!nvmet_check_transfer_len(req, 0))
 		return;
+
+	cid  = req->cmd->abort.cid;
+	sqid = le16_to_cpu(req->cmd->abort.sqid);
+	if (sqid > ctrl->subsys->max_qid) {
+		/* Invalid queue id */
+		goto error;
+	}
+
+	sq = ctrl->sqs[sqid];
+	if (!sq)
+		goto error;
+
+	spin_lock_irqsave(&sq->state_lock, flags);
+	list_for_each_entry_safe(r, next, &sq->state_list, state_list) {
+		if (cid != r->cmd->common.command_id)
+			continue;
+
+		if (r == req) {
+			/* Abort command can't abort itself */
+			continue;
+		}
+
+		nvmet_req_abort(r);
+		break;
+	}
+	spin_unlock_irqrestore(&sq->state_lock, flags);
+
+error:
 	nvmet_set_result(req, 1);
 	nvmet_req_complete(req, 0);
 }
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index e06013c5dace..7b0d59d1e673 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -772,12 +772,33 @@ static void __nvmet_req_complete(struct nvmet_req *req, u16 status)
 void nvmet_req_complete(struct nvmet_req *req, u16 status)
 {
 	struct nvmet_sq *sq = req->sq;
+	unsigned long flags;
+
+	spin_lock_irqsave(&sq->state_lock, flags);
+
+	if (unlikely(req->aborted))
+		status = NVME_SC_ABORT_REQ;
+	else
+		list_del(&req->state_list);
+
+	spin_unlock_irqrestore(&sq->state_lock, flags);
 
 	__nvmet_req_complete(req, status);
 	percpu_ref_put(&sq->ref);
 }
 EXPORT_SYMBOL_GPL(nvmet_req_complete);
 
+void nvmet_req_abort(struct nvmet_req *req)
+{
+	lockdep_assert_held(&req->sq->state_lock);
+
+	if (req->abort)
+		req->abort(req);
+
+	req->aborted = true;
+	list_del_init(&req->state_list);
+}
+
 void nvmet_cq_setup(struct nvmet_ctrl *ctrl, struct nvmet_cq *cq,
 		u16 qid, u16 size)
 {
@@ -818,6 +839,8 @@ void nvmet_sq_destroy(struct nvmet_sq *sq)
 	percpu_ref_exit(&sq->ref);
 	nvmet_auth_sq_free(sq);
 
+	WARN_ON_ONCE(!list_empty(&sq->state_list));
+
 	if (ctrl) {
 		/*
 		 * The teardown flow may take some time, and the host may not
@@ -851,6 +874,8 @@ int nvmet_sq_init(struct nvmet_sq *sq)
 	}
 	init_completion(&sq->free_done);
 	init_completion(&sq->confirm_done);
+	INIT_LIST_HEAD(&sq->state_list);
+	spin_lock_init(&sq->state_lock);
 	nvmet_auth_sq_init(sq);
 
 	return 0;
@@ -904,6 +929,12 @@ static u16 nvmet_parse_io_cmd(struct nvmet_req *req)
 	if (nvmet_is_passthru_req(req))
 		return nvmet_parse_passthru_io_cmd(req);
 
+	if (req->cmd->common.opcode == nvme_cmd_cancel) {
+		req->execute = nvmet_execute_cancel;
+		if (req->cmd->cancel.nsid == NVME_NSID_ALL)
+			return 0;
+	}
+
 	ret = nvmet_req_find_ns(req);
 	if (unlikely(ret))
 		return ret;
@@ -937,6 +968,7 @@ bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq,
 		struct nvmet_sq *sq, const struct nvmet_fabrics_ops *ops)
 {
 	u8 flags = req->cmd->common.flags;
+	unsigned long sflags;
 	u16 status;
 
 	req->cq = cq;
@@ -953,6 +985,7 @@ bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq,
 	req->ns = NULL;
 	req->error_loc = NVMET_NO_ERROR_LOC;
 	req->error_slba = 0;
+	req->aborted = false;
 
 	/* no support for fused commands yet */
 	if (unlikely(flags & (NVME_CMD_FUSE_FIRST | NVME_CMD_FUSE_SECOND))) {
@@ -993,6 +1026,12 @@ bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq,
 	if (sq->ctrl)
 		sq->ctrl->reset_tbkas = true;
 
+	INIT_LIST_HEAD(&req->state_list);
+
+	spin_lock_irqsave(&sq->state_lock, sflags);
+	list_add_tail(&req->state_list, &sq->state_list);
+	spin_unlock_irqrestore(&sq->state_lock, sflags);
+
 	return true;
 
 fail:
@@ -1003,6 +1042,12 @@ EXPORT_SYMBOL_GPL(nvmet_req_init);
 
 void nvmet_req_uninit(struct nvmet_req *req)
 {
+	unsigned long flags;
+
+	spin_lock_irqsave(&req->sq->state_lock, flags);
+	list_del(&req->state_list);
+	spin_unlock_irqrestore(&req->sq->state_lock, flags);
+
 	percpu_ref_put(&req->sq->ref);
 	if (req->ns)
 		nvmet_put_namespace(req->ns);
diff --git a/drivers/nvme/target/io-cmd-cancel.c b/drivers/nvme/target/io-cmd-cancel.c
new file mode 100644
index 000000000000..35d0eaa7f707
--- /dev/null
+++ b/drivers/nvme/target/io-cmd-cancel.c
@@ -0,0 +1,81 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * NVMe I/O cancel command implementation.
+ * Copyright (c) 2023 Red Hat
+ */
+
+#include "nvmet.h"
+
+void nvmet_execute_cancel(struct nvmet_req *req)
+{
+	u16 cid;
+	u16 sqid;
+	u32 nsid;
+	struct nvmet_sq *sq;
+	struct nvmet_ctrl *ctrl = req->sq->ctrl;
+	struct nvmet_req *r, *next;
+	unsigned long flags;
+	u16 ret = 0;
+	u16 imm_abrts = 0;
+	u16 def_abrts = 0;
+	bool mult_cmds;
+
+	if (!nvmet_check_transfer_len(req, 0))
+		return;
+
+	cid  = req->cmd->cancel.cid;
+	sqid = le16_to_cpu(req->cmd->cancel.sqid);
+	nsid = le32_to_cpu(req->cmd->cancel.nsid);
+	mult_cmds = req->cmd->cancel.action & NVME_CANCEL_ACTION_MUL_CMD;
+
+	if (sqid > ctrl->subsys->max_qid) {
+		ret = NVME_SC_INVALID_FIELD | NVME_SC_DNR;
+		goto error;
+	}
+
+	sq = req->sq;
+
+	if (cid == req->cmd->cancel.command_id && !mult_cmds) {
+		ret = NVME_SC_INVALID_CID | NVME_SC_DNR;
+		goto error;
+	} else if ((cid != 0xFFFF && mult_cmds) || sqid != sq->qid) {
+		ret = NVME_SC_INVALID_FIELD | NVME_SC_DNR;
+		goto error;
+	}
+
+	spin_lock_irqsave(&sq->state_lock, flags);
+	list_for_each_entry_safe(r, next, &sq->state_list, state_list) {
+		if (r == req) {
+			/* Cancel command can't abort itself */
+			continue;
+		}
+
+		if (mult_cmds) {
+			if (r->cmd->common.nsid != NVME_NSID_ALL &&
+			    r->cmd->common.nsid != nsid) {
+				continue;
+			}
+
+			nvmet_req_abort(r);
+			def_abrts++;
+		} else {
+			if (cid != r->cmd->common.command_id)
+				continue;
+
+			if (nsid != NVME_NSID_ALL && nsid != r->cmd->common.nsid) {
+				ret = NVME_SC_INVALID_FIELD | NVME_SC_DNR;
+				break;
+			}
+
+			nvmet_req_abort(r);
+			def_abrts++;
+			break;
+		}
+	}
+	spin_unlock_irqrestore(&sq->state_lock, flags);
+
+error:
+	nvmet_set_result(req, (def_abrts << 16) | imm_abrts);
+	nvmet_req_complete(req, ret);
+}
+
diff --git a/drivers/nvme/target/io-cmd-file.c b/drivers/nvme/target/io-cmd-file.c
index 2d068439b129..f01033481718 100644
--- a/drivers/nvme/target/io-cmd-file.c
+++ b/drivers/nvme/target/io-cmd-file.c
@@ -330,6 +330,21 @@ static void nvmet_file_execute_dsm(struct nvmet_req *req)
 	queue_work(nvmet_wq, &req->f.work);
 }
 
+static void nvmet_file_abort_work(struct work_struct *w)
+{
+	struct nvmet_req *req = container_of(w, struct nvmet_req, f.work);
+
+	nvmet_req_complete(req, NVME_SC_ABORT_REQ);
+}
+
+static void nvmet_file_cancel_work(struct nvmet_req *req)
+{
+	if (cancel_work(&req->f.work)) {
+		INIT_WORK(&req->f.work, nvmet_file_abort_work);
+		queue_work(nvmet_wq, &req->f.work);
+	}
+}
+
 static void nvmet_file_write_zeroes_work(struct work_struct *w)
 {
 	struct nvmet_req *req = container_of(w, struct nvmet_req, f.work);
@@ -366,15 +381,19 @@ u16 nvmet_file_parse_io_cmd(struct nvmet_req *req)
 	case nvme_cmd_read:
 	case nvme_cmd_write:
 		req->execute = nvmet_file_execute_rw;
+		req->abort   = nvmet_file_cancel_work;
 		return 0;
 	case nvme_cmd_flush:
 		req->execute = nvmet_file_execute_flush;
+		req->abort   = nvmet_file_cancel_work;
 		return 0;
 	case nvme_cmd_dsm:
 		req->execute = nvmet_file_execute_dsm;
+		req->abort   = nvmet_file_cancel_work;
 		return 0;
 	case nvme_cmd_write_zeroes:
 		req->execute = nvmet_file_execute_write_zeroes;
+		req->abort   = nvmet_file_cancel_work;
 		return 0;
 	default:
 		return nvmet_report_invalid_opcode(req);
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index c1306de1f4dd..777c35d4c6f0 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -124,6 +124,9 @@ struct nvmet_sq {
 #endif
 	struct completion	free_done;
 	struct completion	confirm_done;
+
+	spinlock_t		state_lock;
+	struct list_head	state_list;
 };
 
 struct nvmet_ana_group {
@@ -400,12 +403,16 @@ struct nvmet_req {
 	struct nvmet_port	*port;
 
 	void (*execute)(struct nvmet_req *req);
+	void (*abort)(struct nvmet_req *req);
 	const struct nvmet_fabrics_ops *ops;
 
 	struct pci_dev		*p2p_dev;
 	struct device		*p2p_client;
 	u16			error_loc;
 	u64			error_slba;
+
+	struct list_head	state_list;
+	bool			aborted;
 };
 
 #define NVMET_MAX_MPOOL_BVEC		16
@@ -468,12 +475,15 @@ u16 nvmet_parse_discovery_cmd(struct nvmet_req *req);
 u16 nvmet_parse_fabrics_admin_cmd(struct nvmet_req *req);
 u16 nvmet_parse_fabrics_io_cmd(struct nvmet_req *req);
 
+void nvmet_execute_cancel(struct nvmet_req *req);
+
 bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq,
 		struct nvmet_sq *sq, const struct nvmet_fabrics_ops *ops);
 void nvmet_req_uninit(struct nvmet_req *req);
 bool nvmet_check_transfer_len(struct nvmet_req *req, size_t len);
 bool nvmet_check_data_len_lte(struct nvmet_req *req, size_t data_len);
 void nvmet_req_complete(struct nvmet_req *req, u16 status);
+void nvmet_req_abort(struct nvmet_req *req);
 int nvmet_req_alloc_sgls(struct nvmet_req *req);
 void nvmet_req_free_sgls(struct nvmet_req *req);
 
-- 
2.39.3




* Re: [PATCH 2/5] nvme-core: add a function to submit a cancel command
  2024-05-10 16:30 ` [PATCH 2/5] nvme-core: add a function to submit a " Maurizio Lombardi
@ 2024-05-12 14:06   ` Sagi Grimberg
  2024-06-20  8:31     ` Maurizio Lombardi
  0 siblings, 1 reply; 15+ messages in thread
From: Sagi Grimberg @ 2024-05-12 14:06 UTC (permalink / raw)
  To: Maurizio Lombardi, kbusch; +Cc: hare, emilne, jmeneghi, linux-nvme



On 10/05/2024 19:30, Maurizio Lombardi wrote:
> Add a function that sends a cancel command to abort the specified
> request, using the "single command" cancel action.
>
> When the cancel command completes, the host driver prints the
> number of deferred and immediate aborts performed by the target.
>
> Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
> ---
>   drivers/nvme/host/core.c | 54 ++++++++++++++++++++++++++++++++++++++++
>   drivers/nvme/host/nvme.h |  2 ++
>   2 files changed, 56 insertions(+)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index b48967c98114..3d8c31d5f8ae 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -2965,6 +2965,60 @@ int nvme_get_log(struct nvme_ctrl *ctrl, u32 nsid, u8 log_page, u8 lsp, u8 csi,
>   	return nvme_submit_sync_cmd(ctrl->admin_q, &c, log, size);
>   }
>   
> +static enum rq_end_io_ret nvme_cancel_endio(struct request *req, blk_status_t error)
> +{
> +	struct nvme_ctrl *ctrl = req->end_io_data;
> +	u32 result;
> +	u16 imm_abrts, def_abrts;
> +	u16 status = nvme_req(req)->status;
> +
> +	result = le32_to_cpu(nvme_req(req)->result.u32);
> +
> +	def_abrts = upper_16_bits(result);
> +	imm_abrts = lower_16_bits(result);
> +
> +	dev_warn(ctrl->device,
> +		 "Cancel status: 0x%x imm abrts = %u def abrts = %u\n",
> +		 status, imm_abrts, def_abrts);
> +
> +	blk_mq_free_request(req);
> +	return RQ_END_IO_NONE;
> +}
> +
> +int nvme_submit_cancel_req(struct nvme_ctrl *ctrl, struct request *rq,
> +			   unsigned int sqid)
> +{
> +	struct nvme_command c = { };
> +	struct request *cancel_req;
> +
> +	if (sqid == 0)
> +		return -EINVAL;
> +
> +	c.cancel.opcode = nvme_cmd_cancel;
> +	c.cancel.cid = nvme_cid(rq);
> +	c.cancel.sqid = cpu_to_le16(sqid);
> +	c.cancel.nsid = NVME_NSID_ALL;
> +	c.cancel.action = NVME_CANCEL_ACTION_SINGLE_CMD;
> +
> +	cancel_req = blk_mq_alloc_request_hctx(rq->q, nvme_req_op(&c),

I know that people discouraged expanding the use of 
blk_mq_alloc_request_hctx in the past, but I think it is warranted here. 
Note that it does not play well with cpu hotplug code (we have the same 
issue with nvmf connect).
> +					       BLK_MQ_REQ_NOWAIT |
> +					       BLK_MQ_REQ_RESERVED,
> +					       sqid - 1);

I think we need to account for a new reserved command that can now
be taken outside of connect. Should this be reserved at all?
Meaning, even if we account for it, there are potentially more and more
requests that we'd want to cancel...

We just saw why not reserving requests for connect is a problem, in:
de105068fead ("nvme: fix reconnection fail due to reserved tag allocation")



* Re: [PATCH 2/5] nvme-core: add a function to submit a cancel command
  2024-05-12 14:06   ` Sagi Grimberg
@ 2024-06-20  8:31     ` Maurizio Lombardi
  2024-06-20 13:01       ` Maurizio Lombardi
  0 siblings, 1 reply; 15+ messages in thread
From: Maurizio Lombardi @ 2024-06-20  8:31 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: kbusch, hare, emilne, jmeneghi, linux-nvme

Hello Sagi,

ne 12. 5. 2024 v 16:06 odesílatel Sagi Grimberg <sagi@grimberg.me> napsal:
> > +                                            BLK_MQ_REQ_NOWAIT |
> > +                                            BLK_MQ_REQ_RESERVED,
> > +                                            sqid - 1);
>
> I think we need to account for a new reserved commands that is now
> can be taken outside of connect. Should this be reserved at all?

That is true, I should add some reserved tags; and yes, they have to
be reserved, because blk_mq_alloc_request_hctx() enforces it to keep
the low-level allocator simple.

> meaning even if we account for it, there are potentially more and more
> requests that we'd want to cancel...

Yes, so the question is how many tags should be reserved for cancel commands.

I received a suggestion to make the Cancel command work the following way:

- The first timeout on a specific I/O queue triggers a cancel in
"single command" mode, aborting the command that timed out.
- If another command times out on the same I/O queue while the first
cancel is still executing, send a second cancel command in
"multiple command" mode, aborting all of the outstanding commands on
the I/O queue.

but this could potentially require 2 * queue_count reserved tags, so
tens of reserved tags for a command that will perhaps be seldom used;
the implementation would also be more complex.

Another possibility is to just use Cancel in "single command" mode,
similarly to how the normal Abort command works, and limit it
with a variable in the nvme_ctrl structure, as we do with
"abort_limit" (it could be called "cancel_limit").
This would prevent us from exhausting the pool of reserved tags,
avoiding problems like the one fixed by commit de105068fead while
keeping the implementation very simple; if we reach "cancel_limit" we
can always fall back to a controller reset. On the other hand, we would
use only a subset of the features Cancel has to offer.
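In user-space pseudocode, the "cancel_limit" accounting I have in mind would look roughly like this (every name here is hypothetical, modeled on abort_limit; it is a sketch of the policy, not actual driver code):

```c
/* Sketch of a per-controller cancel budget: each timeout consumes one
 * slot, the CQE of a completed Cancel returns it, and an exhausted
 * budget forces the caller to fall back to a controller reset.
 */
#include <assert.h>

struct ctrl_sketch {
	int cancel_limit;	/* slots left, analogous to ctrl->abort_limit */
};

enum eh_action { EH_SEND_CANCEL, EH_RESET_CTRL };

static enum eh_action timeout_action(struct ctrl_sketch *c)
{
	if (c->cancel_limit <= 0)
		return EH_RESET_CTRL;	/* budget exhausted: reset instead */
	c->cancel_limit--;
	return EH_SEND_CANCEL;
}

static void cancel_done(struct ctrl_sketch *c)
{
	c->cancel_limit++;		/* Cancel CQE arrived, slot freed */
}
```

A waiting variant (as nvme-pci does for aborts) would block here instead of returning EH_RESET_CTRL.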

Maurizio




* Re: [PATCH 2/5] nvme-core: add a function to submit a cancel command
  2024-06-20  8:31     ` Maurizio Lombardi
@ 2024-06-20 13:01       ` Maurizio Lombardi
  2024-06-24 10:06         ` Sagi Grimberg
  0 siblings, 1 reply; 15+ messages in thread
From: Maurizio Lombardi @ 2024-06-20 13:01 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: kbusch, hare, emilne, jmeneghi, linux-nvme

On Thu, Jun 20, 2024 at 10:31 AM, Maurizio Lombardi
<mlombard@redhat.com> wrote:
>
> if we reach "cancel_limit" we can always fallback to a controller
> reset.

... or wait for it to free up a slot, like nvme-pci does.

Maurizio




* Re: [PATCH 2/5] nvme-core: add a function to submit a cancel command
  2024-06-20 13:01       ` Maurizio Lombardi
@ 2024-06-24 10:06         ` Sagi Grimberg
  2024-06-26 18:38           ` [PATCH v2 7/7] nvme: add reserved ioq tags for cancel John Meneghini
  0 siblings, 1 reply; 15+ messages in thread
From: Sagi Grimberg @ 2024-06-24 10:06 UTC (permalink / raw)
  To: Maurizio Lombardi; +Cc: kbusch, hare, emilne, jmeneghi, linux-nvme



On 20/06/2024 16:01, Maurizio Lombardi wrote:
> On Thu, Jun 20, 2024 at 10:31 AM, Maurizio Lombardi
> <mlombard@redhat.com> wrote:
>> if we reach "cancel_limit" we can always fallback to a controller
>> reset.
> ... or wait for it to free up a slot, like nvme-pci does.

I think that reserving 2 is probably sufficient (one for a direct
cancel and one for a cancel-all).

If we find that people need more, we may consider adding a sysctl for it
or something...



* [PATCH v2 7/7] nvme: add reserved ioq tags for cancel
  2024-06-24 10:06         ` Sagi Grimberg
@ 2024-06-26 18:38           ` John Meneghini
  2024-06-26 19:03             ` Keith Busch
  2024-06-26 19:10             ` John Meneghini
  0 siblings, 2 replies; 15+ messages in thread
From: John Meneghini @ 2024-06-26 18:38 UTC (permalink / raw)
  To: sagi; +Cc: emilne, hare, jmeneghi, kbusch, linux-nvme, mlombard

If the nvme Cancel command is supported, we need to reserve 2 tags for
each IO queue. Note that one additional tag is reserved to account for
the case where this is a fabrics controller.

Signed-off-by: John Meneghini <jmeneghi@redhat.com>
---
 drivers/nvme/host/core.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 691dd6ee6dc3..76554fb373a3 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4570,6 +4570,7 @@ int nvme_alloc_io_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,
 		unsigned int cmd_size)
 {
 	int ret;
+	u32 effects = le32_to_cpu(ctrl->effects->iocs[nvme_cmd_cancel]);
 
 	memset(set, 0, sizeof(*set));
 	set->ops = ops;
@@ -4580,9 +4581,13 @@ int nvme_alloc_io_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,
 	 */
 	if (ctrl->quirks & NVME_QUIRK_SHARED_TAGS)
 		set->reserved_tags = NVME_AQ_DEPTH;
+	else if  (effects & NVME_CMD_EFFECTS_CSUPP)
+		/* Reserve 2 X io_queue count for NVMe Cancel */
+		set->reserved_tags = (2 * ctrl->queue_count);
 	else if (ctrl->ops->flags & NVME_F_FABRICS)
 		/* Reserved for fabric connect */
 		set->reserved_tags = 1;
+
 	set->numa_node = ctrl->numa_node;
 	set->flags = BLK_MQ_F_SHOULD_MERGE;
 	if (ctrl->ops->flags & NVME_F_BLOCKING)
-- 
2.45.2




* Re: [PATCH v2 7/7] nvme: add reserved ioq tags for cancel
  2024-06-26 18:38           ` [PATCH v2 7/7] nvme: add reserved ioq tags for cancel John Meneghini
@ 2024-06-26 19:03             ` Keith Busch
  2024-06-26 19:20               ` John Meneghini
  2024-06-26 19:10             ` John Meneghini
  1 sibling, 1 reply; 15+ messages in thread
From: Keith Busch @ 2024-06-26 19:03 UTC (permalink / raw)
  To: John Meneghini; +Cc: sagi, emilne, hare, kbusch, linux-nvme, mlombard

On Wed, Jun 26, 2024 at 02:38:19PM -0400, John Meneghini wrote:
> If the nvme Cancel command is supported, we need to reserve 2 tags for
> each IO queue. Note that one additional tag is reserved to account for
> the case where this is a fabrics controller.
 
> @@ -4580,9 +4581,13 @@ int nvme_alloc_io_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,
>  	 */
>  	if (ctrl->quirks & NVME_QUIRK_SHARED_TAGS)
>  		set->reserved_tags = NVME_AQ_DEPTH;
> +	else if  (effects & NVME_CMD_EFFECTS_CSUPP)
> +		/* Reserve 2 X io_queue count for NVMe Cancel */
> +		set->reserved_tags = (2 * ctrl->queue_count);

The reserved_tags are already per queue, no need to multiply.

>  	else if (ctrl->ops->flags & NVME_F_FABRICS)
>  		/* Reserved for fabric connect */
>  		set->reserved_tags = 1;

You mentioned an additional tag is supposed to be reserved for fabrics.
Shouldn't this just be an "if", not an "else if", and then increment
reserved_tags?



* Re: [PATCH v2 7/7] nvme: add reserved ioq tags for cancel
  2024-06-26 18:38           ` [PATCH v2 7/7] nvme: add reserved ioq tags for cancel John Meneghini
  2024-06-26 19:03             ` Keith Busch
@ 2024-06-26 19:10             ` John Meneghini
  1 sibling, 0 replies; 15+ messages in thread
From: John Meneghini @ 2024-06-26 19:10 UTC (permalink / raw)
  To: sagi; +Cc: emilne, hare, kbusch, linux-nvme, mlombard

This is the patch I wrote to solve this problem while at LSF/MM. Is this what you were thinking about, Sagi?

Note: more changes are needed in the error handlers to account for this.  The idea is that the eh will need to be modified to 
keep track of the outstanding nvme-cancel commands for each IO queue.  Following the first command timeout, the eh will send a 
single-command Cancel to abort the slow command in the controller.  If a second command timeout occurs before the CQE for 
the first Cancel is returned by the controller, the error handler sends a Multiple Command Cancel to the IO queue with NSID set 
to FFFFFFFFh.  This form of the Cancel command will abort all outstanding commands on the IO queue.
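The escalation logic above could be sketched as standalone C like this (names and the helper are hypothetical; in the real driver this state would live in the timeout handler, not in user space):

```c
/* Per-queue escalation: the first timeout sends a single-command
 * Cancel; a second timeout while that Cancel is still outstanding
 * escalates to a cancel-all with NSID FFFFFFFFh. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NVME_NSID_ALL 0xFFFFFFFFu	/* broadcast NSID */

struct queue_eh {
	bool cancel_outstanding;	/* Cancel sent, CQE not yet seen */
};

/* Returns the NSID to put in the Cancel command for this timeout:
 * the timed-out command's own namespace the first time, the
 * broadcast NSID for the escalation. */
static uint32_t cancel_nsid_for_timeout(struct queue_eh *q, uint32_t cmd_nsid)
{
	if (q->cancel_outstanding)
		return NVME_NSID_ALL;	/* escalate: cancel all on queue */
	q->cancel_outstanding = true;
	return cmd_nsid;
}

static void cancel_done(struct queue_eh *q)
{
	q->cancel_outstanding = false;	/* Cancel CQE received */
}
```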

The problem is that, in most cases, when a command times out due to a problem in the controller, not just one command times out but all 
outstanding commands time out in a thundering herd. There are cases where a single IO will hang, but those aren't usually 
reads or writes - like a reservation command that gets stuck, or a DSM command that's going slow. Usually when reads and writes 
start timing out it's because the storage is just swamped and all IOs start to slow down.

Therefore, with only 2 reserved tags on each IO queue, the host should be able to use the cancel command to abort any and all 
outstanding IOs that time out.

/John

On 6/26/24 14:38, John Meneghini wrote:
> If the nvme Cancel command is supported, we need to reserve 2 tags for
> each IO queue. Note that one additional tag is reserved to account for
> the case where this is a fabrics controller.
> 
> Signed-off-by: John Meneghini <jmeneghi@redhat.com>
> ---
>   drivers/nvme/host/core.c | 5 +++++
>   1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 691dd6ee6dc3..76554fb373a3 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -4570,6 +4570,7 @@ int nvme_alloc_io_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,
>   		unsigned int cmd_size)
>   {
>   	int ret;
> +	u32 effects = le32_to_cpu(ctrl->effects->iocs[nvme_cmd_cancel]);
>   
>   	memset(set, 0, sizeof(*set));
>   	set->ops = ops;
> @@ -4580,9 +4581,13 @@ int nvme_alloc_io_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,
>   	 */
>   	if (ctrl->quirks & NVME_QUIRK_SHARED_TAGS)
>   		set->reserved_tags = NVME_AQ_DEPTH;
> +	else if  (effects & NVME_CMD_EFFECTS_CSUPP)
> +		/* Reserve 2 X io_queue count for NVMe Cancel */
> +		set->reserved_tags = (2 * ctrl->queue_count);
>   	else if (ctrl->ops->flags & NVME_F_FABRICS)
>   		/* Reserved for fabric connect */
>   		set->reserved_tags = 1;
> +
>   	set->numa_node = ctrl->numa_node;
>   	set->flags = BLK_MQ_F_SHOULD_MERGE;
>   	if (ctrl->ops->flags & NVME_F_BLOCKING)




* Re: [PATCH v2 7/7] nvme: add reserved ioq tags for cancel
  2024-06-26 19:03             ` Keith Busch
@ 2024-06-26 19:20               ` John Meneghini
  2024-06-27  8:12                 ` Maurizio Lombardi
  0 siblings, 1 reply; 15+ messages in thread
From: John Meneghini @ 2024-06-26 19:20 UTC (permalink / raw)
  To: Keith Busch; +Cc: sagi, emilne, hare, kbusch, linux-nvme, mlombard

On 6/26/24 15:03, Keith Busch wrote:
> On Wed, Jun 26, 2024 at 02:38:19PM -0400, John Meneghini wrote:
>> If the nvme Cancel command is supported, we need to reserve 2 tags for
>> each IO queue. Note that one additional tag is reserved to account for
>> the case where this is a fabrics controller.
>   
>> @@ -4580,9 +4581,13 @@ int nvme_alloc_io_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,
>>   	 */
>>   	if (ctrl->quirks & NVME_QUIRK_SHARED_TAGS)
>>   		set->reserved_tags = NVME_AQ_DEPTH;
>> +	else if  (effects & NVME_CMD_EFFECTS_CSUPP)
>> +		/* Reserve 2 X io_queue count for NVMe Cancel */
>> +		set->reserved_tags = (2 * ctrl->queue_count);
> 
> The reserved_tags are already per queue, no need to multiply.

Ahh, ok. I wasn't sure about this. So the tag set belongs to the controller, and is shared by all IO queues, but reserved tags 
are reserved for each IO queue.  Let me fix that before I begin testing this.

>>   	else if (ctrl->ops->flags & NVME_F_FABRICS)
>>   		/* Reserved for fabric connect */
>>   		set->reserved_tags = 1;
> 
> You mentioned an additional tag is supposed to be reserved for fabrics.
> Shouldn't this just be an "if", not an "else if", and then increment
> reserved_tags?

Yes, good point. I was thinking we don't need more than 2 tags if it's a fabric.

So you want to reserve 3 tags when both cancel and fabrics are true?
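The arithmetic you're suggesting would compute something like this (a user-space sketch with hypothetical parameter names; the real driver consults ctrl->quirks, the Commands Supported and Effects log, and ctrl->ops->flags):

```c
/* Per-queue reserved-tag count under the "if, not else if" policy:
 * the shared-tags quirk overrides everything; otherwise Cancel needs
 * 2 slots and a fabrics controller needs 1 more for connect. */
#include <assert.h>
#include <stdbool.h>

#define NVME_AQ_DEPTH 32	/* matches the kernel's admin-queue depth */

static unsigned int reserved_tags(bool shared_tags_quirk,
				  bool cancel_supported, bool fabrics)
{
	unsigned int tags = 0;

	if (shared_tags_quirk)
		return NVME_AQ_DEPTH;
	if (cancel_supported)
		tags += 2;	/* single-command + cancel-all */
	if (fabrics)
		tags += 1;	/* fabric connect */
	return tags;
}
```

So yes: 3 tags when both cancel and fabrics are true, 2 for a cancel-capable PCIe controller, 1 for fabrics without cancel.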

/John




* Re: [PATCH v2 7/7] nvme: add reserved ioq tags for cancel
  2024-06-26 19:20               ` John Meneghini
@ 2024-06-27  8:12                 ` Maurizio Lombardi
  0 siblings, 0 replies; 15+ messages in thread
From: Maurizio Lombardi @ 2024-06-27  8:12 UTC (permalink / raw)
  To: John Meneghini
  Cc: Keith Busch, sagi, emilne, hare, kbusch, linux-nvme, mlombard

On Wed, Jun 26, 2024 at 03:20:08PM -0400, John Meneghini wrote:
> > >   	else if (ctrl->ops->flags & NVME_F_FABRICS)
> > >   		/* Reserved for fabric connect */
> > >   		set->reserved_tags = 1;
> > 
> > You mentioned an addtional tags is supposed to be reserved for fabrics.
> > Shouldn't this just be an "if", not an "else if", and then increment
> > reserved_tags?
> 
> Yes, good point. I was thinking we don't need more than 2 tags if it's a fabric.
> 
> So you want to reserve 3 tags when both cancel and fabrics are true?

You dropped the reserved tag for the connect command; this is from
the V2 patchset I am working on:

From: Maurizio Lombardi <mlombard@redhat.com>
Date: Tue, 25 Jun 2024 15:45:54 +0200
Subject: [PATCH 3/8] nvme-core: reserve tags for cancel commands

If the controller supports cancel commands, reserve two tags for them

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
---
 drivers/nvme/host/core.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index fea5400ddf92..1b20f1a10321 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4532,9 +4532,14 @@ int nvme_alloc_io_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,
         */
        if (ctrl->quirks & NVME_QUIRK_SHARED_TAGS)
                set->reserved_tags = NVME_AQ_DEPTH;
-       else if (ctrl->ops->flags & NVME_F_FABRICS)
+       else if (ctrl->ops->flags & NVME_F_FABRICS) {
                /* Reserved for fabric connect */
                set->reserved_tags = 1;
+               if (nvme_io_command_supported(ctrl, nvme_cmd_cancel)) {
+                       /* Reserved for cancel commands */
+                       set->reserved_tags += 2;
+               }
+       }
        set->numa_node = ctrl->numa_node;
        set->flags = BLK_MQ_F_SHOULD_MERGE;
        if (ctrl->ops->flags & NVME_F_BLOCKING)
-- 
2.43.0

Maurizio




end of thread (newest: 2024-06-27  8:12 UTC)

Thread overview: 15+ messages
2024-05-10 16:30 [PATCH 0/5] Support for Cancel commands for host (TCP and RDMA) Maurizio Lombardi
2024-05-10 16:30 ` [PATCH 1/5] nvme: add definitions for cancel command Maurizio Lombardi
2024-05-10 16:30 ` [PATCH 2/5] nvme-core: add a function to submit a " Maurizio Lombardi
2024-05-12 14:06   ` Sagi Grimberg
2024-06-20  8:31     ` Maurizio Lombardi
2024-06-20 13:01       ` Maurizio Lombardi
2024-06-24 10:06         ` Sagi Grimberg
2024-06-26 18:38           ` [PATCH v2 7/7] nvme: add reserved ioq tags for cancel John Meneghini
2024-06-26 19:03             ` Keith Busch
2024-06-26 19:20               ` John Meneghini
2024-06-27  8:12                 ` Maurizio Lombardi
2024-06-26 19:10             ` John Meneghini
2024-05-10 16:30 ` [PATCH 3/5] nvme-tcp: use the cancel command to perform an abort if target supports it Maurizio Lombardi
2024-05-10 16:30 ` [PATCH 4/5] nvme-rdma: " Maurizio Lombardi
2024-05-10 16:30 ` [donotmerge PATCH 5/5] nvmet: target support for cancel commands Maurizio Lombardi
