[PATCH 0/4] Rework NVMe abort handling

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/4] Rework NVMe abort handling
@ 2018-07-19 13:28 ` Johannes Thumshirn
  0 siblings, 0 replies; 30+ messages in thread
From: Johannes Thumshirn @ 2018-07-19 13:28 UTC (permalink / raw)


PCI currently is the only transport we have which is doing more than
queueing a transport reset but trying to abort the command first.

This series brings the other transports we currently have up to the
same level.

On FC we can prepend a third escalation level by first to abort the FC
operation before trying to abort the NVMe command.

Johannes Thumshirn (4):
  nvme: factor out pci abort handling into core
  nvme: rdma: abort commands before resetting controller
  nvmet: loop: abort commands before resetting controller
  nvme: fc: abort commands before resetting controller

 drivers/nvme/host/core.c   | 47 +++++++++++++++++++++++++++++++++++
 drivers/nvme/host/fc.c     |  9 +++++++
 drivers/nvme/host/nvme.h   |  1 +
 drivers/nvme/host/pci.c    | 61 +++++++++-------------------------------------
 drivers/nvme/host/rdma.c   | 22 ++++++++++++++---
 drivers/nvme/target/loop.c | 17 ++++++++++++-
 6 files changed, 104 insertions(+), 53 deletions(-)

-- 
2.16.4

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 0/4] Rework NVMe abort handling
@ 2018-07-19 13:28 ` Johannes Thumshirn
  0 siblings, 0 replies; 30+ messages in thread
From: Johannes Thumshirn @ 2018-07-19 13:28 UTC (permalink / raw)
  To: Christoph Hellwig, Sagi Grimberg, Keith Busch
  Cc: James Smart, Hannes Reinecke, Ewan Milne, Max Gurtovoy,
	Linux NVMe Mailinglist, Linux Kernel Mailinglist,
	Johannes Thumshirn

PCI currently is the only transport we have which is doing more than
queueing a transport reset but trying to abort the command first.

This series brings the other transports we currently have up to the
same level.

On FC we can prepend a third escalation level by first to abort the FC
operation before trying to abort the NVMe command.

Johannes Thumshirn (4):
  nvme: factor out pci abort handling into core
  nvme: rdma: abort commands before resetting controller
  nvmet: loop: abort commands before resetting controller
  nvme: fc: abort commands before resetting controller

 drivers/nvme/host/core.c   | 47 +++++++++++++++++++++++++++++++++++
 drivers/nvme/host/fc.c     |  9 +++++++
 drivers/nvme/host/nvme.h   |  1 +
 drivers/nvme/host/pci.c    | 61 +++++++++-------------------------------------
 drivers/nvme/host/rdma.c   | 22 ++++++++++++++---
 drivers/nvme/target/loop.c | 17 ++++++++++++-
 6 files changed, 104 insertions(+), 53 deletions(-)

-- 
2.16.4


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/4] nvme: factor out pci abort handling into core
  2018-07-19 13:28 ` Johannes Thumshirn
@ 2018-07-19 13:28   ` Johannes Thumshirn
  -1 siblings, 0 replies; 30+ messages in thread
From: Johannes Thumshirn @ 2018-07-19 13:28 UTC (permalink / raw)


Currently PCI is the only transport which does a more fine grained
error handling than just resetting the controller.

Factor out the command abort logic into nvme-core so other transports
can benefit of it as well.

Signed-off-by: Johannes Thumshirn <jthumshirn at suse.de>
---
 drivers/nvme/host/core.c | 47 +++++++++++++++++++++++++++++++++++++
 drivers/nvme/host/nvme.h |  1 +
 drivers/nvme/host/pci.c  | 61 ++++++++++--------------------------------------
 3 files changed, 60 insertions(+), 49 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index e77e6418a21c..82896be14191 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -702,6 +702,53 @@ int nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
 }
 EXPORT_SYMBOL_GPL(nvme_submit_sync_cmd);
 
+
+static void abort_endio(struct request *req, blk_status_t error)
+{
+	struct nvme_ctrl *ctrl = nvme_req(req)->ctrl;
+
+	dev_warn(ctrl->device,
+		 "Abort status: 0x%x", nvme_req(req)->status);
+	atomic_inc(&ctrl->abort_limit);
+	blk_mq_free_request(req);
+}
+
+int nvme_abort_cmd(struct nvme_ctrl *ctrl,
+					struct request *rq, __le16 sqid)
+{
+	struct request *abort_req;
+	struct nvme_command cmd;
+
+	if (nvme_req(rq)->flags & NVME_REQ_CANCELLED)
+		return -EAGAIN;
+
+	if (atomic_dec_return(&ctrl->abort_limit) < 0) {
+		atomic_inc(&ctrl->abort_limit);
+		return -EBUSY;
+	}
+
+	nvme_req(rq)->flags |= NVME_REQ_CANCELLED;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.abort.opcode = nvme_admin_abort_cmd;
+	cmd.abort.cid = rq->tag;
+	cmd.abort.sqid = sqid;
+
+	abort_req = nvme_alloc_request(ctrl->admin_q, &cmd,
+				       BLK_MQ_REQ_NOWAIT, NVME_QID_ANY);
+	if (IS_ERR(abort_req)) {
+		atomic_inc(&ctrl->abort_limit);
+		return PTR_ERR(abort_req);
+	}
+
+	abort_req->timeout = ADMIN_TIMEOUT;
+	abort_req->end_io_data = NULL;
+	blk_execute_rq_nowait(abort_req->q, NULL, abort_req, 0, abort_endio);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(nvme_abort_cmd);
+
 static void *nvme_add_user_metadata(struct bio *bio, void __user *ubuf,
 		unsigned len, u32 seed, bool write)
 {
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 4ad0c8ad2a27..39d6e4bc0402 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -445,6 +445,7 @@ int nvme_delete_ctrl_sync(struct nvme_ctrl *ctrl);
 
 int nvme_get_log_ext(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 		u8 log_page, void *log, size_t size, u64 offset);
+int nvme_abort_cmd(struct nvme_ctrl *ctrl, struct request  *rq, __le16 sqid);
 
 extern const struct attribute_group nvme_ns_id_attr_group;
 extern const struct block_device_operations nvme_ns_head_ops;
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 6678e9134348..321b8d55b693 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -188,7 +188,6 @@ struct nvme_iod {
 	struct nvme_request req;
 	struct nvme_queue *nvmeq;
 	bool use_sgl;
-	int aborted;
 	int npages;		/* In the PRP list. 0 means small pool in use */
 	int nents;		/* Used in scatterlist */
 	int length;		/* Of data, in bytes */
@@ -495,7 +494,6 @@ static blk_status_t nvme_init_iod(struct request *rq, struct nvme_dev *dev)
 		iod->sg = iod->inline_sg;
 	}
 
-	iod->aborted = 0;
 	iod->npages = -1;
 	iod->nents = 0;
 	iod->length = size;
@@ -1133,17 +1131,6 @@ static int adapter_delete_sq(struct nvme_dev *dev, u16 sqid)
 	return adapter_delete_queue(dev, nvme_admin_delete_sq, sqid);
 }
 
-static void abort_endio(struct request *req, blk_status_t error)
-{
-	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
-	struct nvme_queue *nvmeq = iod->nvmeq;
-
-	dev_warn(nvmeq->dev->ctrl.device,
-		 "Abort status: 0x%x", nvme_req(req)->status);
-	atomic_inc(&nvmeq->dev->ctrl.abort_limit);
-	blk_mq_free_request(req);
-}
-
 static bool nvme_should_reset(struct nvme_dev *dev, u32 csts)
 {
 
@@ -1193,9 +1180,8 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
 	struct nvme_queue *nvmeq = iod->nvmeq;
 	struct nvme_dev *dev = nvmeq->dev;
-	struct request *abort_req;
-	struct nvme_command cmd;
 	u32 csts = readl(dev->bar + NVME_REG_CSTS);
+	int ret;
 
 	/* If PCI error recovery process is happening, we cannot reset or
 	 * the recovery mechanism will surely fail.
@@ -1243,54 +1229,31 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 		break;
 	}
 
+	/*
+	 * The aborted req will be completed on receiving the abort req.
+	 * We enable the timer again. If hit twice, it'll cause a device reset,
+	 * as the device then is in a faulty state.
+	 */
+	ret = nvme_abort_cmd(&dev->ctrl, req, nvmeq->qid);
+	if (!ret)
+		return BLK_EH_RESET_TIMER;
+
 	/*
  	 * Shutdown the controller immediately and schedule a reset if the
  	 * command was already aborted once before and still hasn't been
  	 * returned to the driver, or if this is the admin queue.
 	 */
-	if (!nvmeq->qid || iod->aborted) {
+	if (ret || !nvmeq->qid || nvme_req(req)->flags & NVME_REQ_CANCELLED) {
 		dev_warn(dev->ctrl.device,
 			 "I/O %d QID %d timeout, reset controller\n",
 			 req->tag, nvmeq->qid);
 		nvme_dev_disable(dev, false);
 		nvme_reset_ctrl(&dev->ctrl);
 
-		nvme_req(req)->flags |= NVME_REQ_CANCELLED;
 		return BLK_EH_DONE;
 	}
 
-	if (atomic_dec_return(&dev->ctrl.abort_limit) < 0) {
-		atomic_inc(&dev->ctrl.abort_limit);
-		return BLK_EH_RESET_TIMER;
-	}
-	iod->aborted = 1;
-
-	memset(&cmd, 0, sizeof(cmd));
-	cmd.abort.opcode = nvme_admin_abort_cmd;
-	cmd.abort.cid = req->tag;
-	cmd.abort.sqid = cpu_to_le16(nvmeq->qid);
-
-	dev_warn(nvmeq->dev->ctrl.device,
-		"I/O %d QID %d timeout, aborting\n",
-		 req->tag, nvmeq->qid);
-
-	abort_req = nvme_alloc_request(dev->ctrl.admin_q, &cmd,
-			BLK_MQ_REQ_NOWAIT, NVME_QID_ANY);
-	if (IS_ERR(abort_req)) {
-		atomic_inc(&dev->ctrl.abort_limit);
-		return BLK_EH_RESET_TIMER;
-	}
-
-	abort_req->timeout = ADMIN_TIMEOUT;
-	abort_req->end_io_data = NULL;
-	blk_execute_rq_nowait(abort_req->q, NULL, abort_req, 0, abort_endio);
-
-	/*
-	 * The aborted req will be completed on receiving the abort req.
-	 * We enable the timer again. If hit twice, it'll cause a device reset,
-	 * as the device then is in a faulty state.
-	 */
-	return BLK_EH_RESET_TIMER;
+	return BLK_EH_DONE;
 }
 
 static void nvme_free_queue(struct nvme_queue *nvmeq)
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 1/4] nvme: factor out pci abort handling into core
@ 2018-07-19 13:28   ` Johannes Thumshirn
  0 siblings, 0 replies; 30+ messages in thread
From: Johannes Thumshirn @ 2018-07-19 13:28 UTC (permalink / raw)
  To: Christoph Hellwig, Sagi Grimberg, Keith Busch
  Cc: James Smart, Hannes Reinecke, Ewan Milne, Max Gurtovoy,
	Linux NVMe Mailinglist, Linux Kernel Mailinglist,
	Johannes Thumshirn

Currently PCI is the only transport which does a more fine grained
error handling than just resetting the controller.

Factor out the command abort logic into nvme-core so other transports
can benefit of it as well.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
---
 drivers/nvme/host/core.c | 47 +++++++++++++++++++++++++++++++++++++
 drivers/nvme/host/nvme.h |  1 +
 drivers/nvme/host/pci.c  | 61 ++++++++++--------------------------------------
 3 files changed, 60 insertions(+), 49 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index e77e6418a21c..82896be14191 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -702,6 +702,53 @@ int nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
 }
 EXPORT_SYMBOL_GPL(nvme_submit_sync_cmd);
 
+
+static void abort_endio(struct request *req, blk_status_t error)
+{
+	struct nvme_ctrl *ctrl = nvme_req(req)->ctrl;
+
+	dev_warn(ctrl->device,
+		 "Abort status: 0x%x", nvme_req(req)->status);
+	atomic_inc(&ctrl->abort_limit);
+	blk_mq_free_request(req);
+}
+
+int nvme_abort_cmd(struct nvme_ctrl *ctrl,
+					struct request *rq, __le16 sqid)
+{
+	struct request *abort_req;
+	struct nvme_command cmd;
+
+	if (nvme_req(rq)->flags & NVME_REQ_CANCELLED)
+		return -EAGAIN;
+
+	if (atomic_dec_return(&ctrl->abort_limit) < 0) {
+		atomic_inc(&ctrl->abort_limit);
+		return -EBUSY;
+	}
+
+	nvme_req(rq)->flags |= NVME_REQ_CANCELLED;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.abort.opcode = nvme_admin_abort_cmd;
+	cmd.abort.cid = rq->tag;
+	cmd.abort.sqid = sqid;
+
+	abort_req = nvme_alloc_request(ctrl->admin_q, &cmd,
+				       BLK_MQ_REQ_NOWAIT, NVME_QID_ANY);
+	if (IS_ERR(abort_req)) {
+		atomic_inc(&ctrl->abort_limit);
+		return PTR_ERR(abort_req);
+	}
+
+	abort_req->timeout = ADMIN_TIMEOUT;
+	abort_req->end_io_data = NULL;
+	blk_execute_rq_nowait(abort_req->q, NULL, abort_req, 0, abort_endio);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(nvme_abort_cmd);
+
 static void *nvme_add_user_metadata(struct bio *bio, void __user *ubuf,
 		unsigned len, u32 seed, bool write)
 {
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 4ad0c8ad2a27..39d6e4bc0402 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -445,6 +445,7 @@ int nvme_delete_ctrl_sync(struct nvme_ctrl *ctrl);
 
 int nvme_get_log_ext(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 		u8 log_page, void *log, size_t size, u64 offset);
+int nvme_abort_cmd(struct nvme_ctrl *ctrl, struct request  *rq, __le16 sqid);
 
 extern const struct attribute_group nvme_ns_id_attr_group;
 extern const struct block_device_operations nvme_ns_head_ops;
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 6678e9134348..321b8d55b693 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -188,7 +188,6 @@ struct nvme_iod {
 	struct nvme_request req;
 	struct nvme_queue *nvmeq;
 	bool use_sgl;
-	int aborted;
 	int npages;		/* In the PRP list. 0 means small pool in use */
 	int nents;		/* Used in scatterlist */
 	int length;		/* Of data, in bytes */
@@ -495,7 +494,6 @@ static blk_status_t nvme_init_iod(struct request *rq, struct nvme_dev *dev)
 		iod->sg = iod->inline_sg;
 	}
 
-	iod->aborted = 0;
 	iod->npages = -1;
 	iod->nents = 0;
 	iod->length = size;
@@ -1133,17 +1131,6 @@ static int adapter_delete_sq(struct nvme_dev *dev, u16 sqid)
 	return adapter_delete_queue(dev, nvme_admin_delete_sq, sqid);
 }
 
-static void abort_endio(struct request *req, blk_status_t error)
-{
-	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
-	struct nvme_queue *nvmeq = iod->nvmeq;
-
-	dev_warn(nvmeq->dev->ctrl.device,
-		 "Abort status: 0x%x", nvme_req(req)->status);
-	atomic_inc(&nvmeq->dev->ctrl.abort_limit);
-	blk_mq_free_request(req);
-}
-
 static bool nvme_should_reset(struct nvme_dev *dev, u32 csts)
 {
 
@@ -1193,9 +1180,8 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
 	struct nvme_queue *nvmeq = iod->nvmeq;
 	struct nvme_dev *dev = nvmeq->dev;
-	struct request *abort_req;
-	struct nvme_command cmd;
 	u32 csts = readl(dev->bar + NVME_REG_CSTS);
+	int ret;
 
 	/* If PCI error recovery process is happening, we cannot reset or
 	 * the recovery mechanism will surely fail.
@@ -1243,54 +1229,31 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 		break;
 	}
 
+	/*
+	 * The aborted req will be completed on receiving the abort req.
+	 * We enable the timer again. If hit twice, it'll cause a device reset,
+	 * as the device then is in a faulty state.
+	 */
+	ret = nvme_abort_cmd(&dev->ctrl, req, nvmeq->qid);
+	if (!ret)
+		return BLK_EH_RESET_TIMER;
+
 	/*
  	 * Shutdown the controller immediately and schedule a reset if the
  	 * command was already aborted once before and still hasn't been
  	 * returned to the driver, or if this is the admin queue.
 	 */
-	if (!nvmeq->qid || iod->aborted) {
+	if (ret || !nvmeq->qid || nvme_req(req)->flags & NVME_REQ_CANCELLED) {
 		dev_warn(dev->ctrl.device,
 			 "I/O %d QID %d timeout, reset controller\n",
 			 req->tag, nvmeq->qid);
 		nvme_dev_disable(dev, false);
 		nvme_reset_ctrl(&dev->ctrl);
 
-		nvme_req(req)->flags |= NVME_REQ_CANCELLED;
 		return BLK_EH_DONE;
 	}
 
-	if (atomic_dec_return(&dev->ctrl.abort_limit) < 0) {
-		atomic_inc(&dev->ctrl.abort_limit);
-		return BLK_EH_RESET_TIMER;
-	}
-	iod->aborted = 1;
-
-	memset(&cmd, 0, sizeof(cmd));
-	cmd.abort.opcode = nvme_admin_abort_cmd;
-	cmd.abort.cid = req->tag;
-	cmd.abort.sqid = cpu_to_le16(nvmeq->qid);
-
-	dev_warn(nvmeq->dev->ctrl.device,
-		"I/O %d QID %d timeout, aborting\n",
-		 req->tag, nvmeq->qid);
-
-	abort_req = nvme_alloc_request(dev->ctrl.admin_q, &cmd,
-			BLK_MQ_REQ_NOWAIT, NVME_QID_ANY);
-	if (IS_ERR(abort_req)) {
-		atomic_inc(&dev->ctrl.abort_limit);
-		return BLK_EH_RESET_TIMER;
-	}
-
-	abort_req->timeout = ADMIN_TIMEOUT;
-	abort_req->end_io_data = NULL;
-	blk_execute_rq_nowait(abort_req->q, NULL, abort_req, 0, abort_endio);
-
-	/*
-	 * The aborted req will be completed on receiving the abort req.
-	 * We enable the timer again. If hit twice, it'll cause a device reset,
-	 * as the device then is in a faulty state.
-	 */
-	return BLK_EH_RESET_TIMER;
+	return BLK_EH_DONE;
 }
 
 static void nvme_free_queue(struct nvme_queue *nvmeq)
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 1/4] nvme: factor out pci abort handling into core
  2018-07-19 13:28   ` Johannes Thumshirn
@ 2018-07-19 16:29     ` kbuild test robot
  -1 siblings, 0 replies; 30+ messages in thread
From: kbuild test robot @ 2018-07-19 16:29 UTC (permalink / raw)


Hi Johannes,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.18-rc5 next-20180719]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Johannes-Thumshirn/Rework-NVMe-abort-handling/20180719-230642
config: x86_64-randconfig-x011-201828 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

   drivers/nvme/host/core.c: In function 'abort_endio':
>> drivers/nvme/host/core.c:711:40: error: 'struct nvme_request' has no member named 'ctrl'
     struct nvme_ctrl *ctrl = nvme_req(req)->ctrl;
                                           ^~

vim +711 drivers/nvme/host/core.c

   707	
   708	
   709	static void abort_endio(struct request *req, blk_status_t error)
   710	{
 > 711		struct nvme_ctrl *ctrl = nvme_req(req)->ctrl;
   712	
   713		dev_warn(ctrl->device,
   714			 "Abort status: 0x%x", nvme_req(req)->status);
   715		atomic_inc(&ctrl->abort_limit);
   716		blk_mq_free_request(req);
   717	}
   718	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
-------------- next part --------------
A non-text attachment was scrubbed...
Name: .config.gz
Type: application/gzip
Size: 34695 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-nvme/attachments/20180720/d222223b/attachment-0001.gz>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/4] nvme: factor out pci abort handling into core
@ 2018-07-19 16:29     ` kbuild test robot
  0 siblings, 0 replies; 30+ messages in thread
From: kbuild test robot @ 2018-07-19 16:29 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: kbuild-all, Christoph Hellwig, Sagi Grimberg, Keith Busch,
	James Smart, Hannes Reinecke, Ewan Milne, Max Gurtovoy,
	Linux NVMe Mailinglist, Linux Kernel Mailinglist,
	Johannes Thumshirn

[-- Attachment #1: Type: text/plain, Size: 1374 bytes --]

Hi Johannes,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.18-rc5 next-20180719]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Johannes-Thumshirn/Rework-NVMe-abort-handling/20180719-230642
config: x86_64-randconfig-x011-201828 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

   drivers/nvme/host/core.c: In function 'abort_endio':
>> drivers/nvme/host/core.c:711:40: error: 'struct nvme_request' has no member named 'ctrl'
     struct nvme_ctrl *ctrl = nvme_req(req)->ctrl;
                                           ^~

vim +711 drivers/nvme/host/core.c

   707	
   708	
   709	static void abort_endio(struct request *req, blk_status_t error)
   710	{
 > 711		struct nvme_ctrl *ctrl = nvme_req(req)->ctrl;
   712	
   713		dev_warn(ctrl->device,
   714			 "Abort status: 0x%x", nvme_req(req)->status);
   715		atomic_inc(&ctrl->abort_limit);
   716		blk_mq_free_request(req);
   717	}
   718	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 34695 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 2/4] nvme: rdma: abort commands before resetting controller
  2018-07-19 13:28 ` Johannes Thumshirn
@ 2018-07-19 13:28   ` Johannes Thumshirn
  -1 siblings, 0 replies; 30+ messages in thread
From: Johannes Thumshirn @ 2018-07-19 13:28 UTC (permalink / raw)


Currently if a timeout on rdma happens we just go ahead and reset the
underlying transport.

Instead try to abort the timeout out command and if this fails as well
continue the error path escalation and tear down the transport.

Signed-off-by: Johannes Thumshirn <jthumshirn at suse.de>
---
 drivers/nvme/host/rdma.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 13a6064e4794..aed4752cfac6 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1671,10 +1671,26 @@ static enum blk_eh_timer_return
 nvme_rdma_timeout(struct request *rq, bool reserved)
 {
 	struct nvme_rdma_request *req = blk_mq_rq_to_pdu(rq);
+	struct nvme_ctrl *ctrl = &req->queue->ctrl->ctrl;
+	u16 qid = nvme_rdma_queue_idx(req->queue);
+	int ret;
+
+	dev_warn(ctrl->device,
+		 "I/O %d QID %d timeout, aborting\n",
+		 rq->tag, qid);
+
+	ret = nvme_abort_cmd(ctrl, rq, cpu_to_le16(qid));
+	if (!ret)
+		return BLK_EH_RESET_TIMER;
 
-	dev_warn(req->queue->ctrl->ctrl.device,
-		 "I/O %d QID %d timeout, reset controller\n",
-		 rq->tag, nvme_rdma_queue_idx(req->queue));
+	/*
+	 * If the request was already cancelled once there's no need
+	 * in doing it again, escalate to the next error recovery
+	 * level.
+	 */
+	dev_warn(ctrl->device,
+		 "I/O %d QID %d abort failed %d, reset controller\n",
+		 rq->tag, qid, ret);
 
 	/* queue error recovery */
 	nvme_rdma_error_recovery(req->queue->ctrl);
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 2/4] nvme: rdma: abort commands before resetting controller
@ 2018-07-19 13:28   ` Johannes Thumshirn
  0 siblings, 0 replies; 30+ messages in thread
From: Johannes Thumshirn @ 2018-07-19 13:28 UTC (permalink / raw)
  To: Christoph Hellwig, Sagi Grimberg, Keith Busch
  Cc: James Smart, Hannes Reinecke, Ewan Milne, Max Gurtovoy,
	Linux NVMe Mailinglist, Linux Kernel Mailinglist,
	Johannes Thumshirn

Currently if a timeout on rdma happens we just go ahead and reset the
underlying transport.

Instead try to abort the timeout out command and if this fails as well
continue the error path escalation and tear down the transport.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
---
 drivers/nvme/host/rdma.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 13a6064e4794..aed4752cfac6 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1671,10 +1671,26 @@ static enum blk_eh_timer_return
 nvme_rdma_timeout(struct request *rq, bool reserved)
 {
 	struct nvme_rdma_request *req = blk_mq_rq_to_pdu(rq);
+	struct nvme_ctrl *ctrl = &req->queue->ctrl->ctrl;
+	u16 qid = nvme_rdma_queue_idx(req->queue);
+	int ret;
+
+	dev_warn(ctrl->device,
+		 "I/O %d QID %d timeout, aborting\n",
+		 rq->tag, qid);
+
+	ret = nvme_abort_cmd(ctrl, rq, cpu_to_le16(qid));
+	if (!ret)
+		return BLK_EH_RESET_TIMER;
 
-	dev_warn(req->queue->ctrl->ctrl.device,
-		 "I/O %d QID %d timeout, reset controller\n",
-		 rq->tag, nvme_rdma_queue_idx(req->queue));
+	/*
+	 * If the request was already cancelled once there's no need
+	 * in doing it again, escalate to the next error recovery
+	 * level.
+	 */
+	dev_warn(ctrl->device,
+		 "I/O %d QID %d abort failed %d, reset controller\n",
+		 rq->tag, qid, ret);
 
 	/* queue error recovery */
 	nvme_rdma_error_recovery(req->queue->ctrl);
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 3/4] nvmet: loop: abort commands before resetting controller
  2018-07-19 13:28 ` Johannes Thumshirn
@ 2018-07-19 13:28   ` Johannes Thumshirn
  -1 siblings, 0 replies; 30+ messages in thread
From: Johannes Thumshirn @ 2018-07-19 13:28 UTC (permalink / raw)


Currently if a timeout on loop happens we just go ahead and reset the
underlying transport.

Instead try to abort the timeout out command and if this fails as well
continue the error path escalation and tear down the transport.

Signed-off-by: Johannes Thumshirn <jthumshirn at suse.de>
---
 drivers/nvme/target/loop.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index af7fbf4132b0..b6c355125cb0 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -141,9 +141,24 @@ static enum blk_eh_timer_return
 nvme_loop_timeout(struct request *rq, bool reserved)
 {
 	struct nvme_loop_iod *iod = blk_mq_rq_to_pdu(rq);
+	struct nvme_ctrl *ctrl = &iod->queue->ctrl->ctrl;
+	int qid = nvme_loop_queue_idx(iod->queue);
+	int ret;
+
+	dev_warn(ctrl->device,
+		 "I/O %d QID %d timeout, aborting\n",
+		 rq->tag, qid);
+
+	ret = nvme_abort_cmd(ctrl, rq, cpu_to_le16(qid));
+	if (!ret)
+		return BLK_EH_RESET_TIMER;
+
+	dev_warn(ctrl->device,
+		 "I/O %d QID %d abort failed %d, reset controller\n",
+		 rq->tag, qid, ret);
 
 	/* queue error recovery */
-	nvme_reset_ctrl(&iod->queue->ctrl->ctrl);
+	nvme_reset_ctrl(ctrl);
 
 	/* fail with DNR on admin cmd timeout */
 	nvme_req(rq)->status = NVME_SC_ABORT_REQ | NVME_SC_DNR;
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 3/4] nvmet: loop: abort commands before resetting controller
@ 2018-07-19 13:28   ` Johannes Thumshirn
  0 siblings, 0 replies; 30+ messages in thread
From: Johannes Thumshirn @ 2018-07-19 13:28 UTC (permalink / raw)
  To: Christoph Hellwig, Sagi Grimberg, Keith Busch
  Cc: James Smart, Hannes Reinecke, Ewan Milne, Max Gurtovoy,
	Linux NVMe Mailinglist, Linux Kernel Mailinglist,
	Johannes Thumshirn

Currently if a timeout on loop happens we just go ahead and reset the
underlying transport.

Instead try to abort the timeout out command and if this fails as well
continue the error path escalation and tear down the transport.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
---
 drivers/nvme/target/loop.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index af7fbf4132b0..b6c355125cb0 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -141,9 +141,24 @@ static enum blk_eh_timer_return
 nvme_loop_timeout(struct request *rq, bool reserved)
 {
 	struct nvme_loop_iod *iod = blk_mq_rq_to_pdu(rq);
+	struct nvme_ctrl *ctrl = &iod->queue->ctrl->ctrl;
+	int qid = nvme_loop_queue_idx(iod->queue);
+	int ret;
+
+	dev_warn(ctrl->device,
+		 "I/O %d QID %d timeout, aborting\n",
+		 rq->tag, qid);
+
+	ret = nvme_abort_cmd(ctrl, rq, cpu_to_le16(qid));
+	if (!ret)
+		return BLK_EH_RESET_TIMER;
+
+	dev_warn(ctrl->device,
+		 "I/O %d QID %d abort failed %d, reset controller\n",
+		 rq->tag, qid, ret);
 
 	/* queue error recovery */
-	nvme_reset_ctrl(&iod->queue->ctrl->ctrl);
+	nvme_reset_ctrl(ctrl);
 
 	/* fail with DNR on admin cmd timeout */
 	nvme_req(rq)->status = NVME_SC_ABORT_REQ | NVME_SC_DNR;
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 4/4] nvme: fc: abort commands before resetting controller
  2018-07-19 13:28 ` Johannes Thumshirn
@ 2018-07-19 13:28   ` Johannes Thumshirn
  -1 siblings, 0 replies; 30+ messages in thread
From: Johannes Thumshirn @ 2018-07-19 13:28 UTC (permalink / raw)


Currently if a timeout on fc happens we just go ahead and reset the
underlying transport.

Instead try for a more fine grained error recovery by first aborting
the FC OP, if this fails abort the NVMe command and if this as well
fails go the hard route and reset the controller.

Signed-off-by: Johannes Thumshirn <jthumshirn at suse.de>
---
 drivers/nvme/host/fc.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 9cc33752539a..0ba0075c3b75 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2054,6 +2054,15 @@ nvme_fc_timeout(struct request *rq, bool reserved)
 {
 	struct nvme_fc_fcp_op *op = blk_mq_rq_to_pdu(rq);
 	struct nvme_fc_ctrl *ctrl = op->ctrl;
+	u16 qid = op->queue->qnum;
+	int ret;
+
+	if (!__nvme_fc_abort_op(ctrl, op))
+		return BLK_EH_RESET_TIMER;
+
+	ret = nvme_abort_cmd(&ctrl->ctrl, rq, cpu_to_le16(qid));
+	if (!ret)
+		return BLK_EH_RESET_TIMER;
 
 	/*
 	 * we can't individually ABTS an io without affecting the queue,
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 4/4] nvme: fc: abort commands before resetting controller
@ 2018-07-19 13:28   ` Johannes Thumshirn
  0 siblings, 0 replies; 30+ messages in thread
From: Johannes Thumshirn @ 2018-07-19 13:28 UTC (permalink / raw)
  To: Christoph Hellwig, Sagi Grimberg, Keith Busch
  Cc: James Smart, Hannes Reinecke, Ewan Milne, Max Gurtovoy,
	Linux NVMe Mailinglist, Linux Kernel Mailinglist,
	Johannes Thumshirn

Currently if a timeout on fc happens we just go ahead and reset the
underlying transport.

Instead try for a more fine grained error recovery by first aborting
the FC OP, if this fails abort the NVMe command and if this as well
fails go the hard route and reset the controller.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
---
 drivers/nvme/host/fc.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 9cc33752539a..0ba0075c3b75 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2054,6 +2054,15 @@ nvme_fc_timeout(struct request *rq, bool reserved)
 {
 	struct nvme_fc_fcp_op *op = blk_mq_rq_to_pdu(rq);
 	struct nvme_fc_ctrl *ctrl = op->ctrl;
+	u16 qid = op->queue->qnum;
+	int ret;
+
+	if (!__nvme_fc_abort_op(ctrl, op))
+		return BLK_EH_RESET_TIMER;
+
+	ret = nvme_abort_cmd(&ctrl->ctrl, rq, cpu_to_le16(qid));
+	if (!ret)
+		return BLK_EH_RESET_TIMER;
 
 	/*
 	 * we can't individually ABTS an io without affecting the queue,
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 0/4] Rework NVMe abort handling
  2018-07-19 13:28 ` Johannes Thumshirn
@ 2018-07-19 13:42   ` Christoph Hellwig
  -1 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2018-07-19 13:42 UTC (permalink / raw)


On Thu, Jul 19, 2018@03:28:34PM +0200, Johannes Thumshirn wrote:
> PCI currently is the only transport we have which is doing more than
> queueing a transport reset but trying to abort the command first.
> 
> This series brings the other transports we currently have up to the
> same level.

Without even looking at the code yet:  why?  The nvme abort isn't
very useful, and due to the lack of ordering between different
queues almost harmful on fabrics.  What problem do you try to
solve?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/4] Rework NVMe abort handling
@ 2018-07-19 13:42   ` Christoph Hellwig
  0 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2018-07-19 13:42 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Christoph Hellwig, Sagi Grimberg, Keith Busch, James Smart,
	Hannes Reinecke, Ewan Milne, Max Gurtovoy, Linux NVMe Mailinglist,
	Linux Kernel Mailinglist

On Thu, Jul 19, 2018 at 03:28:34PM +0200, Johannes Thumshirn wrote:
> PCI currently is the only transport we have which is doing more than
> queueing a transport reset but trying to abort the command first.
> 
> This series brings the other transports we currently have up to the
> same level.

Without even looking at the code yet:  why?  The nvme abort isn't
very useful, and due to the lack of ordering between different
queues almost harmful on fabrics.  What problem do you try to
solve?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 0/4] Rework NVMe abort handling
  2018-07-19 13:42   ` Christoph Hellwig
@ 2018-07-19 14:10     ` Johannes Thumshirn
  -1 siblings, 0 replies; 30+ messages in thread
From: Johannes Thumshirn @ 2018-07-19 14:10 UTC (permalink / raw)

On Thu, Jul 19, 2018@03:42:03PM +0200, Christoph Hellwig wrote:
> Without even looking at the code yet:  why?  The nvme abort isn't
> very useful, and due to the lack of ordering between different
> queues almost harmful on fabrics.  What problem do you try to
> solve?

The problem I'm trying to solve here is really just single commands
timing out because of i.e. a bad switch in between which causes frame
loss somewhere.

I know RDMA and FC are defined to be lossless but reality sometimes
has a different view on this (can't talk too much for RDMA but I've
had some nice bugs in SCSI due to faulty switches dropping odd
frames).

Of cause we can still do the big hammer if one command times out due
to a misbehaving switch but we can also at least try to abort it. I
know aborts are defined as best effort, but as we're in the error path
anyways it doesn't hurt to at least try.

This would give us a chance to recover from such situations, of cause
given the target actually does something when receiving an abort.

In the FC case we can even send an ABTS and try to abort the command
on the FC side first, before doing it on NVMe. I'm not sure if we can
do it on RDMA or PCIe as well.

So the issue I'm trying to solve is easy, if one command times out for
whatever reason, there's no need to go the big transport reset route
before not even trying to recover from it. Possibly we should also try
doing a queue reset if aborting failed before doing the transport
reset.

Byte,
	Johannes
-- 
Johannes Thumshirn                                          Storage
jthumshirn at suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: Felix Imend?rffer, Jane Smithard, Graham Norton
HRB 21284 (AG N?rnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/4] Rework NVMe abort handling
@ 2018-07-19 14:10     ` Johannes Thumshirn
  0 siblings, 0 replies; 30+ messages in thread
From: Johannes Thumshirn @ 2018-07-19 14:10 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sagi Grimberg, Keith Busch, James Smart, Hannes Reinecke,
	Ewan Milne, Max Gurtovoy, Linux NVMe Mailinglist,
	Linux Kernel Mailinglist

On Thu, Jul 19, 2018 at 03:42:03PM +0200, Christoph Hellwig wrote:
> Without even looking at the code yet:  why?  The nvme abort isn't
> very useful, and due to the lack of ordering between different
> queues almost harmful on fabrics.  What problem do you try to
> solve?

The problem I'm trying to solve here is really just single commands
timing out because of i.e. a bad switch in between which causes frame
loss somewhere.

I know RDMA and FC are defined to be lossless but reality sometimes
has a different view on this (can't talk too much for RDMA but I've
had some nice bugs in SCSI due to faulty switches dropping odd
frames).

Of cause we can still do the big hammer if one command times out due
to a misbehaving switch but we can also at least try to abort it. I
know aborts are defined as best effort, but as we're in the error path
anyways it doesn't hurt to at least try.

This would give us a chance to recover from such situations, of cause
given the target actually does something when receiving an abort.

In the FC case we can even send an ABTS and try to abort the command
on the FC side first, before doing it on NVMe. I'm not sure if we can
do it on RDMA or PCIe as well.

So the issue I'm trying to solve is easy, if one command times out for
whatever reason, there's no need to go the big transport reset route
before not even trying to recover from it. Possibly we should also try
doing a queue reset if aborting failed before doing the transport
reset.

Byte,
	Johannes
-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 0/4] Rework NVMe abort handling
  2018-07-19 14:10     ` Johannes Thumshirn
@ 2018-07-19 14:23       ` Christoph Hellwig
  -1 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2018-07-19 14:23 UTC (permalink / raw)

On Thu, Jul 19, 2018@04:10:25PM +0200, Johannes Thumshirn wrote:
> The problem I'm trying to solve here is really just single commands
> timing out because of i.e. a bad switch in between which causes frame
> loss somewhere.

And that is exactly the case where NVMe abort does not actually work
in any sensible way.

Remember that while NVMe guarantes ordered delivery inside a given
queue it does not guarantee anything between multiple queues.

So now you have your buggy FC setup where an I/O command times out
because your switch delayed it for two hours due to a firmware bug.

After 30 seconds we send an abort over the admin queue, which happens
to pass through just fine.  The controller will tell you: no command
found as it has never seen it.

No with the the code following what we have in PCIe that just means
we'll eventually controller reset after the I/O command times out
the second time as we still won't have seen a completion for it.

If you incorrectly just continue and resend the command we'll actually
get the command sent twice and thus a potential bug once the original
command just gets sent along.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/4] Rework NVMe abort handling
@ 2018-07-19 14:23       ` Christoph Hellwig
  0 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2018-07-19 14:23 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Christoph Hellwig, Sagi Grimberg, Keith Busch, James Smart,
	Hannes Reinecke, Ewan Milne, Max Gurtovoy, Linux NVMe Mailinglist,
	Linux Kernel Mailinglist

On Thu, Jul 19, 2018 at 04:10:25PM +0200, Johannes Thumshirn wrote:
> The problem I'm trying to solve here is really just single commands
> timing out because of i.e. a bad switch in between which causes frame
> loss somewhere.

And that is exactly the case where NVMe abort does not actually work
in any sensible way.

Remember that while NVMe guarantes ordered delivery inside a given
queue it does not guarantee anything between multiple queues.

So now you have your buggy FC setup where an I/O command times out
because your switch delayed it for two hours due to a firmware bug.

After 30 seconds we send an abort over the admin queue, which happens
to pass through just fine.  The controller will tell you: no command
found as it has never seen it.

No with the the code following what we have in PCIe that just means
we'll eventually controller reset after the I/O command times out
the second time as we still won't have seen a completion for it.

If you incorrectly just continue and resend the command we'll actually
get the command sent twice and thus a potential bug once the original
command just gets sent along.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 0/4] Rework NVMe abort handling
  2018-07-19 14:23       ` Christoph Hellwig
@ 2018-07-19 14:35         ` Johannes Thumshirn
  -1 siblings, 0 replies; 30+ messages in thread
From: Johannes Thumshirn @ 2018-07-19 14:35 UTC (permalink / raw)


On Thu, Jul 19, 2018@04:23:55PM +0200, Christoph Hellwig wrote:
> On Thu, Jul 19, 2018@04:10:25PM +0200, Johannes Thumshirn wrote:
> > The problem I'm trying to solve here is really just single commands
> > timing out because of i.e. a bad switch in between which causes frame
> > loss somewhere.
> 
> And that is exactly the case where NVMe abort does not actually work
> in any sensible way.
> 
> Remember that while NVMe guarantes ordered delivery inside a given
> queue it does not guarantee anything between multiple queues.
> 
> So now you have your buggy FC setup where an I/O command times out
> because your switch delayed it for two hours due to a firmware bug.
> 
> After 30 seconds we send an abort over the admin queue, which happens
> to pass through just fine.  The controller will tell you: no command
> found as it has never seen it.
> 
> No with the the code following what we have in PCIe that just means
> we'll eventually controller reset after the I/O command times out
> the second time as we still won't have seen a completion for it.

Exactly that was my intention.

> If you incorrectly just continue and resend the command we'll actually
> get the command sent twice and thus a potential bug once the original
> command just gets sent along.

OK, let me see where I'm stuck here. We're issuing a command, it gets
lost due to $REASON and I'm aborting it. The upper layers then
eventually retry the command and it arrives at the target side. But so
does the old command as well and we have a duplicate. Correct?

So if we keep our old behavior and tear down the queues and
re-establish them, then the upper layers retry the command and it
arrives on the target. But shortly afterwards the switch happens to
find the old command in it's ingress buffers and decides to forward it
to the target as well, how does that differ? The CMDID and SQID are
probably different but all the payload will be the same, wouldn't it?

So we still have our duplicate on the other side, don't we?

I feel I'm missing out something here.

Byte,
	Johannes
-- 
Johannes Thumshirn                                          Storage
jthumshirn at suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: Felix Imend?rffer, Jane Smithard, Graham Norton
HRB 21284 (AG N?rnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/4] Rework NVMe abort handling
@ 2018-07-19 14:35         ` Johannes Thumshirn
  0 siblings, 0 replies; 30+ messages in thread
From: Johannes Thumshirn @ 2018-07-19 14:35 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sagi Grimberg, Keith Busch, James Smart, Hannes Reinecke,
	Ewan Milne, Max Gurtovoy, Linux NVMe Mailinglist,
	Linux Kernel Mailinglist

On Thu, Jul 19, 2018 at 04:23:55PM +0200, Christoph Hellwig wrote:
> On Thu, Jul 19, 2018 at 04:10:25PM +0200, Johannes Thumshirn wrote:
> > The problem I'm trying to solve here is really just single commands
> > timing out because of i.e. a bad switch in between which causes frame
> > loss somewhere.
> 
> And that is exactly the case where NVMe abort does not actually work
> in any sensible way.
> 
> Remember that while NVMe guarantes ordered delivery inside a given
> queue it does not guarantee anything between multiple queues.
> 
> So now you have your buggy FC setup where an I/O command times out
> because your switch delayed it for two hours due to a firmware bug.
> 
> After 30 seconds we send an abort over the admin queue, which happens
> to pass through just fine.  The controller will tell you: no command
> found as it has never seen it.
> 
> No with the the code following what we have in PCIe that just means
> we'll eventually controller reset after the I/O command times out
> the second time as we still won't have seen a completion for it.

Exactly that was my intention.

> If you incorrectly just continue and resend the command we'll actually
> get the command sent twice and thus a potential bug once the original
> command just gets sent along.

OK, let me see where I'm stuck here. We're issuing a command, it gets
lost due to $REASON and I'm aborting it. The upper layers then
eventually retry the command and it arrives at the target side. But so
does the old command as well and we have a duplicate. Correct?

So if we keep our old behavior and tear down the queues and
re-establish them, then the upper layers retry the command and it
arrives on the target. But shortly afterwards the switch happens to
find the old command in it's ingress buffers and decides to forward it
to the target as well, how does that differ? The CMDID and SQID are
probably different but all the payload will be the same, wouldn't it?

So we still have our duplicate on the other side, don't we?

I feel I'm missing out something here.

Byte,
	Johannes
-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 0/4] Rework NVMe abort handling
  2018-07-19 14:35         ` Johannes Thumshirn
@ 2018-07-19 14:50           ` Christoph Hellwig
  -1 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2018-07-19 14:50 UTC (permalink / raw)


On Thu, Jul 19, 2018@04:35:34PM +0200, Johannes Thumshirn wrote:
> > No with the the code following what we have in PCIe that just means
> > we'll eventually controller reset after the I/O command times out
> > the second time as we still won't have seen a completion for it.
> 
> Exactly that was my intention.

Which means the only thing you do for your use case is to delay
recovery even further.

> OK, let me see where I'm stuck here. We're issuing a command, it gets
> lost due to $REASON and I'm aborting it. The upper layers then
> eventually retry the command and it arrives at the target side. But so
> does the old command as well and we have a duplicate. Correct?

The upper layer is only going to retry after tearing down the transport
connection.  And a tear down of the connection MUST clear all pending
commands on the way.  If it doesn't we are in deep, deep trouble.

A NVMe abort has no chance of clearing things at the transport layer.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/4] Rework NVMe abort handling
@ 2018-07-19 14:50           ` Christoph Hellwig
  0 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2018-07-19 14:50 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Christoph Hellwig, Sagi Grimberg, Keith Busch, James Smart,
	Hannes Reinecke, Ewan Milne, Max Gurtovoy, Linux NVMe Mailinglist,
	Linux Kernel Mailinglist

On Thu, Jul 19, 2018 at 04:35:34PM +0200, Johannes Thumshirn wrote:
> > No with the the code following what we have in PCIe that just means
> > we'll eventually controller reset after the I/O command times out
> > the second time as we still won't have seen a completion for it.
> 
> Exactly that was my intention.

Which means the only thing you do for your use case is to delay
recovery even further.

> OK, let me see where I'm stuck here. We're issuing a command, it gets
> lost due to $REASON and I'm aborting it. The upper layers then
> eventually retry the command and it arrives at the target side. But so
> does the old command as well and we have a duplicate. Correct?

The upper layer is only going to retry after tearing down the transport
connection.  And a tear down of the connection MUST clear all pending
commands on the way.  If it doesn't we are in deep, deep trouble.

A NVMe abort has no chance of clearing things at the transport layer.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 0/4] Rework NVMe abort handling
  2018-07-19 14:50           ` Christoph Hellwig
@ 2018-07-19 14:54             ` Johannes Thumshirn
  -1 siblings, 0 replies; 30+ messages in thread
From: Johannes Thumshirn @ 2018-07-19 14:54 UTC (permalink / raw)


On Thu, Jul 19, 2018@04:50:05PM +0200, Christoph Hellwig wrote:
> The upper layer is only going to retry after tearing down the transport
> connection.  And a tear down of the connection MUST clear all pending
> commands on the way.  If it doesn't we are in deep, deep trouble.
> 
> A NVMe abort has no chance of clearing things at the transport layer.

OK so all I can do in my case (if I want soft error recovery) is send
out a transport abort in the timeout handler and then rearm the timer
and requeue the command.

Which leaves us to the FC patch only, modified of cause.

-- 
Johannes Thumshirn                                          Storage
jthumshirn at suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: Felix Imend?rffer, Jane Smithard, Graham Norton
HRB 21284 (AG N?rnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/4] Rework NVMe abort handling
@ 2018-07-19 14:54             ` Johannes Thumshirn
  0 siblings, 0 replies; 30+ messages in thread
From: Johannes Thumshirn @ 2018-07-19 14:54 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sagi Grimberg, Keith Busch, James Smart, Hannes Reinecke,
	Ewan Milne, Max Gurtovoy, Linux NVMe Mailinglist,
	Linux Kernel Mailinglist

On Thu, Jul 19, 2018 at 04:50:05PM +0200, Christoph Hellwig wrote:
> The upper layer is only going to retry after tearing down the transport
> connection.  And a tear down of the connection MUST clear all pending
> commands on the way.  If it doesn't we are in deep, deep trouble.
> 
> A NVMe abort has no chance of clearing things at the transport layer.

OK so all I can do in my case (if I want soft error recovery) is send
out a transport abort in the timeout handler and then rearm the timer
and requeue the command.

Which leaves us to the FC patch only, modified of cause.

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 0/4] Rework NVMe abort handling
  2018-07-19 14:54             ` Johannes Thumshirn
@ 2018-07-19 15:04               ` James Smart
  -1 siblings, 0 replies; 30+ messages in thread
From: James Smart @ 2018-07-19 15:04 UTC (permalink / raw)


On 7/19/2018 7:54 AM, Johannes Thumshirn wrote:
> On Thu, Jul 19, 2018@04:50:05PM +0200, Christoph Hellwig wrote:
>> The upper layer is only going to retry after tearing down the transport
>> connection.  And a tear down of the connection MUST clear all pending
>> commands on the way.  If it doesn't we are in deep, deep trouble.
>>
>> A NVMe abort has no chance of clearing things at the transport layer.
> OK so all I can do in my case (if I want soft error recovery) is send
> out a transport abort in the timeout handler and then rearm the timer
> and requeue the command.
>
> Which leaves us to the FC patch only, modified of cause.
>

Which I'm going to say no on. I originally did the abort before the 
reset and it brought out some confusion in the reset code. Thus the 
existing flow which just resets the controller which has to be done 
after the abort anyway.

-- james

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/4] Rework NVMe abort handling
@ 2018-07-19 15:04               ` James Smart
  0 siblings, 0 replies; 30+ messages in thread
From: James Smart @ 2018-07-19 15:04 UTC (permalink / raw)
  To: Johannes Thumshirn, Christoph Hellwig
  Cc: Sagi Grimberg, Keith Busch, Hannes Reinecke, Ewan Milne,
	Max Gurtovoy, Linux NVMe Mailinglist, Linux Kernel Mailinglist

On 7/19/2018 7:54 AM, Johannes Thumshirn wrote:
> On Thu, Jul 19, 2018 at 04:50:05PM +0200, Christoph Hellwig wrote:
>> The upper layer is only going to retry after tearing down the transport
>> connection.  And a tear down of the connection MUST clear all pending
>> commands on the way.  If it doesn't we are in deep, deep trouble.
>>
>> A NVMe abort has no chance of clearing things at the transport layer.
> OK so all I can do in my case (if I want soft error recovery) is send
> out a transport abort in the timeout handler and then rearm the timer
> and requeue the command.
>
> Which leaves us to the FC patch only, modified of cause.
>

Which I'm going to say no on. I originally did the abort before the 
reset and it brought out some confusion in the reset code. Thus the 
existing flow which just resets the controller which has to be done 
after the abort anyway.

-- james


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 0/4] Rework NVMe abort handling
  2018-07-19 15:04               ` James Smart
@ 2018-07-20  6:36                 ` Johannes Thumshirn
  -1 siblings, 0 replies; 30+ messages in thread
From: Johannes Thumshirn @ 2018-07-20  6:36 UTC (permalink / raw)


On Thu, Jul 19, 2018@08:04:09AM -0700, James Smart wrote:
> Which I'm going to say no on. I originally did the abort before the reset
> and it brought out some confusion in the reset code. Thus the existing flow
> which just resets the controller which has to be done after the abort
> anyway.

OK copied that.

-- 
Johannes Thumshirn                                          Storage
jthumshirn at suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: Felix Imend?rffer, Jane Smithard, Graham Norton
HRB 21284 (AG N?rnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/4] Rework NVMe abort handling
@ 2018-07-20  6:36                 ` Johannes Thumshirn
  0 siblings, 0 replies; 30+ messages in thread
From: Johannes Thumshirn @ 2018-07-20  6:36 UTC (permalink / raw)
  To: James Smart
  Cc: Christoph Hellwig, Sagi Grimberg, Linux Kernel Mailinglist,
	Linux NVMe Mailinglist, Keith Busch, Ewan Milne, Max Gurtovoy,
	Hannes Reinecke

On Thu, Jul 19, 2018 at 08:04:09AM -0700, James Smart wrote:
> Which I'm going to say no on. I originally did the abort before the reset
> and it brought out some confusion in the reset code. Thus the existing flow
> which just resets the controller which has to be done after the abort
> anyway.

OK copied that.

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 0/4] Rework NVMe abort handling
  2018-07-19 14:10     ` Johannes Thumshirn
@ 2018-07-19 15:00       ` James Smart
  -1 siblings, 0 replies; 30+ messages in thread
From: James Smart @ 2018-07-19 15:00 UTC (permalink / raw)

On 7/19/2018 7:10 AM, Johannes Thumshirn wrote:
> On Thu, Jul 19, 2018@03:42:03PM +0200, Christoph Hellwig wrote:
>> Without even looking at the code yet:  why?  The nvme abort isn't
>> very useful, and due to the lack of ordering between different
>> queues almost harmful on fabrics.  What problem do you try to
>> solve?
> The problem I'm trying to solve here is really just single commands
> timing out because of i.e. a bad switch in between which causes frame
> loss somewhere.
>
> I know RDMA and FC are defined to be lossless but reality sometimes
> has a different view on this (can't talk too much for RDMA but I've
> had some nice bugs in SCSI due to faulty switches dropping odd
> frames).
>
> Of cause we can still do the big hammer if one command times out due
> to a misbehaving switch but we can also at least try to abort it. I
> know aborts are defined as best effort, but as we're in the error path
> anyways it doesn't hurt to at least try.
>
> This would give us a chance to recover from such situations, of cause
> given the target actually does something when receiving an abort.
>
> In the FC case we can even send an ABTS and try to abort the command
> on the FC side first, before doing it on NVMe. I'm not sure if we can
> do it on RDMA or PCIe as well.
>
> So the issue I'm trying to solve is easy, if one command times out for
> whatever reason, there's no need to go the big transport reset route
> before not even trying to recover from it. Possibly we should also try
> doing a queue reset if aborting failed before doing the transport
> reset.
>
> Byte,
> 	Johannes

I'm with Christoph.

It doesn't work that way... command delivery is very much tied to any 
command ordering delivery requirements as well as sqhd increment on the 
target, and response delivery is tied similarly tied to sqhd delivery to 
the host as well as ordering requirements on responses. With aborts as 
you're implementing, you drop those things.? Granted, Linux's lack of 
paying attention to SQHD (a problem waiting to happen in my mind) as 
well as not using fused commands (and no other commands yet requiring 
order) make it believe it can get away without it.

You're going to confuse transports as there's no understanding in the 
transport protocol on what it means to abort/cancel a single io.?? The 
specs are rather clear, and for a good reason, that non-delivery (the 
abort or cancellation) mandates connection teardown which in turn 
mandates association teardown. You will be creating non-standard 
implementations that will fail interoperability and compliance.

If you really want single io abort - implement it in the NVMe standard 
way with Aborts to the admin queue, subject to the ACL limit.? Then push 
on the targets to support deep ACL counts and honestly responding to 
ABORT, and there will still be race conditions between the ABORT and its 
command that will make an interesting retry policy. Or, wait for Fred 
Knights, new proposal on ABORTS.

-- james

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/4] Rework NVMe abort handling
@ 2018-07-19 15:00       ` James Smart
  0 siblings, 0 replies; 30+ messages in thread
From: James Smart @ 2018-07-19 15:00 UTC (permalink / raw)
  To: Johannes Thumshirn, Christoph Hellwig
  Cc: Sagi Grimberg, Keith Busch, Hannes Reinecke, Ewan Milne,
	Max Gurtovoy, Linux NVMe Mailinglist, Linux Kernel Mailinglist

On 7/19/2018 7:10 AM, Johannes Thumshirn wrote:
> On Thu, Jul 19, 2018 at 03:42:03PM +0200, Christoph Hellwig wrote:
>> Without even looking at the code yet:  why?  The nvme abort isn't
>> very useful, and due to the lack of ordering between different
>> queues almost harmful on fabrics.  What problem do you try to
>> solve?
> The problem I'm trying to solve here is really just single commands
> timing out because of i.e. a bad switch in between which causes frame
> loss somewhere.
>
> I know RDMA and FC are defined to be lossless but reality sometimes
> has a different view on this (can't talk too much for RDMA but I've
> had some nice bugs in SCSI due to faulty switches dropping odd
> frames).
>
> Of cause we can still do the big hammer if one command times out due
> to a misbehaving switch but we can also at least try to abort it. I
> know aborts are defined as best effort, but as we're in the error path
> anyways it doesn't hurt to at least try.
>
> This would give us a chance to recover from such situations, of cause
> given the target actually does something when receiving an abort.
>
> In the FC case we can even send an ABTS and try to abort the command
> on the FC side first, before doing it on NVMe. I'm not sure if we can
> do it on RDMA or PCIe as well.
>
> So the issue I'm trying to solve is easy, if one command times out for
> whatever reason, there's no need to go the big transport reset route
> before not even trying to recover from it. Possibly we should also try
> doing a queue reset if aborting failed before doing the transport
> reset.
>
> Byte,
> 	Johannes

I'm with Christoph.

It doesn't work that way... command delivery is very much tied to any 
command ordering delivery requirements as well as sqhd increment on the 
target, and response delivery is tied similarly tied to sqhd delivery to 
the host as well as ordering requirements on responses. With aborts as 
you're implementing, you drop those things.  Granted, Linux's lack of 
paying attention to SQHD (a problem waiting to happen in my mind) as 
well as not using fused commands (and no other commands yet requiring 
order) make it believe it can get away without it.

You're going to confuse transports as there's no understanding in the 
transport protocol on what it means to abort/cancel a single io.   The 
specs are rather clear, and for a good reason, that non-delivery (the 
abort or cancellation) mandates connection teardown which in turn 
mandates association teardown. You will be creating non-standard 
implementations that will fail interoperability and compliance.

If you really want single io abort - implement it in the NVMe standard 
way with Aborts to the admin queue, subject to the ACL limit.  Then push 
on the targets to support deep ACL counts and honestly responding to 
ABORT, and there will still be race conditions between the ABORT and its 
command that will make an interesting retry policy. Or, wait for Fred 
Knights, new proposal on ABORTS.

-- james

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2018-07-20  6:36 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-07-19 13:28 [PATCH 0/4] Rework NVMe abort handling Johannes Thumshirn
2018-07-19 13:28 ` Johannes Thumshirn
2018-07-19 13:28 ` [PATCH 1/4] nvme: factor out pci abort handling into core Johannes Thumshirn
2018-07-19 13:28   ` Johannes Thumshirn
2018-07-19 16:29   ` kbuild test robot
2018-07-19 16:29     ` kbuild test robot
2018-07-19 13:28 ` [PATCH 2/4] nvme: rdma: abort commands before resetting controller Johannes Thumshirn
2018-07-19 13:28   ` Johannes Thumshirn
2018-07-19 13:28 ` [PATCH 3/4] nvmet: loop: " Johannes Thumshirn
2018-07-19 13:28   ` Johannes Thumshirn
2018-07-19 13:28 ` [PATCH 4/4] nvme: fc: " Johannes Thumshirn
2018-07-19 13:28   ` Johannes Thumshirn
2018-07-19 13:42 ` [PATCH 0/4] Rework NVMe abort handling Christoph Hellwig
2018-07-19 13:42   ` Christoph Hellwig
2018-07-19 14:10   ` Johannes Thumshirn
2018-07-19 14:10     ` Johannes Thumshirn
2018-07-19 14:23     ` Christoph Hellwig
2018-07-19 14:23       ` Christoph Hellwig
2018-07-19 14:35       ` Johannes Thumshirn
2018-07-19 14:35         ` Johannes Thumshirn
2018-07-19 14:50         ` Christoph Hellwig
2018-07-19 14:50           ` Christoph Hellwig
2018-07-19 14:54           ` Johannes Thumshirn
2018-07-19 14:54             ` Johannes Thumshirn
2018-07-19 15:04             ` James Smart
2018-07-19 15:04               ` James Smart
2018-07-20  6:36               ` Johannes Thumshirn
2018-07-20  6:36                 ` Johannes Thumshirn
2018-07-19 15:00     ` James Smart
2018-07-19 15:00       ` James Smart

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.