nvme driver split V2

linux-nvme.lists.infradead.org archive mirror
 help / color / mirror / Atom feed

* nvme driver split V2
@ 2015-10-16  5:58 Christoph Hellwig
  2015-10-16  5:58 ` [PATCH 01/18] nvme: add missing unmaps in nvme_queue_rq Christoph Hellwig
                   ` (17 more replies)
  0 siblings, 18 replies; 59+ messages in thread
From: Christoph Hellwig @ 2015-10-16  5:58 UTC (permalink / raw)


This series contains most of the work of splitting the nvme driver into a
core driver, a PCIe frontend for it and the possibility of future other
frontends like virtio or Fabrics.

This creates a new core.c with functions sitting on top of the block layer,
splits and split a new struct nvme_ctrl out of struct nvme_dev so that the
SCSI translation layer is decoupled from the PCI driver internals.  A new
struct nvme_ctrl_ops is introduced to call into the low level driver, and
this is subsequently used to move all the block device, char device and
sysfs interface code as well as the controller identification and namespace
scanning into the common code.  I've ported both my nvme-loop and the
fabrics driver over to this scheme to validate that it works fine, and Ming
ported over virtio_nvme,

Once this this series is done there are a few more items remaining:

 - abort rewrite to sit on top of struct request infrastructure (in progress)
 - AEN rewrite, including making a small part of it common
 - a proper state machine for probing, scanning and resets (will take a while)

To make testing easier I've pushed out a git tree with this and all
dependencies:

    git://git.infradead.org/users/hch/block.git nvme-split.5

or in gitweb:

    http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/nvme-split.5

Chances since V1:
  - fixed a section mismatch
  - fixed missing byte swap
  - fixed metadata passthrough (thanks Keith!)
  - merged the previously separate series into a single one
  - one whitespace fix
  - added two additional patches to move trivial I/O path helpers to nvme.h

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 01/18] nvme: add missing unmaps in nvme_queue_rq
  2015-10-16  5:58 nvme driver split V2 Christoph Hellwig
@ 2015-10-16  5:58 ` Christoph Hellwig
  2015-10-20 10:04   ` Sagi Grimberg
  2015-10-16  5:58 ` [PATCH 02/18] nvme: move struct nvme_iod to pci.c Christoph Hellwig
                   ` (16 subsequent siblings)
  17 siblings, 1 reply; 59+ messages in thread
From: Christoph Hellwig @ 2015-10-16  5:58 UTC (permalink / raw)


When we fail various metadata related operations in nvme_queue_rq we
need to unmap the data SGL.

Cc: stable at vger.kernel.org
Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/pci.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 22d8375..2f05292 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -906,19 +906,28 @@ static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 			goto retry_cmd;
 		}
 		if (blk_integrity_rq(req)) {
-			if (blk_rq_count_integrity_sg(req->q, req->bio) != 1)
+			if (blk_rq_count_integrity_sg(req->q, req->bio) != 1) {
+				dma_unmap_sg(dev->dev, iod->sg, iod->nents,
+						dma_dir);
 				goto error_cmd;
+			}
 
 			sg_init_table(iod->meta_sg, 1);
 			if (blk_rq_map_integrity_sg(
-					req->q, req->bio, iod->meta_sg) != 1)
+					req->q, req->bio, iod->meta_sg) != 1) {
+				dma_unmap_sg(dev->dev, iod->sg, iod->nents,
+						dma_dir);
 				goto error_cmd;
+			}
 
 			if (rq_data_dir(req))
 				nvme_dif_remap(req, nvme_dif_prep);
 
-			if (!dma_map_sg(nvmeq->q_dmadev, iod->meta_sg, 1, dma_dir))
+			if (!dma_map_sg(nvmeq->q_dmadev, iod->meta_sg, 1, dma_dir)) {
+				dma_unmap_sg(dev->dev, iod->sg, iod->nents,
+						dma_dir);
 				goto error_cmd;
+			}
 		}
 	}
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 01/18] nvme: add missing unmaps in nvme_queue_rq
  2015-10-16  5:58 ` [PATCH 01/18] nvme: add missing unmaps in nvme_queue_rq Christoph Hellwig
@ 2015-10-20 10:04   ` Sagi Grimberg
  2015-10-20 14:07     ` Busch, Keith
  0 siblings, 1 reply; 59+ messages in thread
From: Sagi Grimberg @ 2015-10-20 10:04 UTC (permalink / raw)


On 10/16/2015 8:58 AM, Christoph Hellwig wrote:
> When we fail various metadata related operations in nvme_queue_rq we
> need to unmap the data SGL.
>
> Cc: stable at vger.kernel.org
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---
>   drivers/nvme/host/pci.c | 15 ++++++++++++---
>   1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 22d8375..2f05292 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -906,19 +906,28 @@ static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
>   			goto retry_cmd;
>   		}
>   		if (blk_integrity_rq(req)) {
> -			if (blk_rq_count_integrity_sg(req->q, req->bio) != 1)
> +			if (blk_rq_count_integrity_sg(req->q, req->bio) != 1) {
> +				dma_unmap_sg(dev->dev, iod->sg, iod->nents,
> +						dma_dir);
>   				goto error_cmd;
> +			}
>
>   			sg_init_table(iod->meta_sg, 1);
>   			if (blk_rq_map_integrity_sg(
> -					req->q, req->bio, iod->meta_sg) != 1)
> +					req->q, req->bio, iod->meta_sg) != 1) {
> +				dma_unmap_sg(dev->dev, iod->sg, iod->nents,
> +						dma_dir);
>   				goto error_cmd;
> +			}

This is not related to the patch itself. But this condition seems bogus
to me. We passed meta_sg that consists of a single entry. If we
happened to map more than a single entry we're already in trouble as we
overrun meta_sg (modified the iod->sg pointer).

I think a WARN_ON_ONCE statement is more suitable here (which should
probably come as a separate patch).

Other than that, looks good to me:

Reviewed-by: Sagi Grimberg <sagig at mellanox.com>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 01/18] nvme: add missing unmaps in nvme_queue_rq
  2015-10-20 10:04   ` Sagi Grimberg
@ 2015-10-20 14:07     ` Busch, Keith
  0 siblings, 0 replies; 59+ messages in thread
From: Busch, Keith @ 2015-10-20 14:07 UTC (permalink / raw)


On Tue, Oct 20, 2015@01:04:24PM +0300, Sagi Grimberg wrote:
> On 10/16/2015 8:58 AM, Christoph Hellwig wrote:
> >  			sg_init_table(iod->meta_sg, 1);
> >  			if (blk_rq_map_integrity_sg(
> >-					req->q, req->bio, iod->meta_sg) != 1)
> >+					req->q, req->bio, iod->meta_sg) != 1) {
> >+				dma_unmap_sg(dev->dev, iod->sg, iod->nents,
> >+						dma_dir);
> >  				goto error_cmd;
> >+			}
> 
> This is not related to the patch itself. But this condition seems bogus
> to me. We passed meta_sg that consists of a single entry. If we
> happened to map more than a single entry we're already in trouble as we
> overrun meta_sg (modified the iod->sg pointer).
> 
> I think a WARN_ON_ONCE statement is more suitable here (which should
> probably come as a separate patch).

We are in trouble if it maps more than 1, but I think the condition
here is intended to guard against 0 rather than > 1. We should already
be ensured it won't be > 1 from a previous check.

Based on the implementation of blk_rq_map_integrity_sg and the functions
earlier setup, I don't think we can ever see 0 here either.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 02/18] nvme: move struct nvme_iod to pci.c
  2015-10-16  5:58 nvme driver split V2 Christoph Hellwig
  2015-10-16  5:58 ` [PATCH 01/18] nvme: add missing unmaps in nvme_queue_rq Christoph Hellwig
@ 2015-10-16  5:58 ` Christoph Hellwig
  2015-10-20 10:04   ` Sagi Grimberg
  2015-10-16  5:58 ` [PATCH 03/18] nvme: split command submission helpers out of pci.c Christoph Hellwig
                   ` (15 subsequent siblings)
  17 siblings, 1 reply; 59+ messages in thread
From: Christoph Hellwig @ 2015-10-16  5:58 UTC (permalink / raw)


This structure is specific to the PCIe driver internals and should be moved
to pci.c.

Signed-off-by: Christoph Hellwig <hch at lst.de>
Acked-by: Keith Busch <keith.busch at intel.com>
---
 drivers/nvme/host/nvme.h | 17 -----------------
 drivers/nvme/host/pci.c  | 17 +++++++++++++++++
 2 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index c1f41bf..835941b 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -88,23 +88,6 @@ struct nvme_ns {
 	u32 mode_select_block_len;
 };
 
-/*
- * The nvme_iod describes the data in an I/O, including the list of PRP
- * entries.  You can't see it in this data structure because C doesn't let
- * me express that.  Use nvme_alloc_iod to ensure there's enough space
- * allocated to store the PRP list.
- */
-struct nvme_iod {
-	unsigned long private;	/* For the use of the submitter of the I/O */
-	int npages;		/* In the PRP list. 0 means small pool in use */
-	int offset;		/* Of PRP list */
-	int nents;		/* Used in scatterlist */
-	int length;		/* Of data, in bytes */
-	dma_addr_t first_dma;
-	struct scatterlist meta_sg[1]; /* metadata requires single contiguous buffer */
-	struct scatterlist sg[0];
-};
-
 static inline u64 nvme_block_nr(struct nvme_ns *ns, sector_t sector)
 {
 	return (sector >> (ns->lba_shift - 9));
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 2f05292..7e1b438 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -128,6 +128,23 @@ struct nvme_queue {
 };
 
 /*
+ * The nvme_iod describes the data in an I/O, including the list of PRP
+ * entries.  You can't see it in this data structure because C doesn't let
+ * me express that.  Use nvme_alloc_iod to ensure there's enough space
+ * allocated to store the PRP list.
+ */
+struct nvme_iod {
+	unsigned long private;	/* For the use of the submitter of the I/O */
+	int npages;		/* In the PRP list. 0 means small pool in use */
+	int offset;		/* Of PRP list */
+	int nents;		/* Used in scatterlist */
+	int length;		/* Of data, in bytes */
+	dma_addr_t first_dma;
+	struct scatterlist meta_sg[1]; /* metadata requires single contiguous buffer */
+	struct scatterlist sg[0];
+};
+
+/*
  * Check we didin't inadvertently grow the command struct
  */
 static inline void _nvme_check_size(void)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 02/18] nvme: move struct nvme_iod to pci.c
  2015-10-16  5:58 ` [PATCH 02/18] nvme: move struct nvme_iod to pci.c Christoph Hellwig
@ 2015-10-20 10:04   ` Sagi Grimberg
  0 siblings, 0 replies; 59+ messages in thread
From: Sagi Grimberg @ 2015-10-20 10:04 UTC (permalink / raw)


On 10/16/2015 8:58 AM, Christoph Hellwig wrote:
> This structure is specific to the PCIe driver internals and should be moved
> to pci.c.
>
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> Acked-by: Keith Busch <keith.busch at intel.com>

looks good to me:

Reviewed-by: Sagi Grimberg <sagig at mellanox.com>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 03/18] nvme: split command submission helpers out of pci.c
  2015-10-16  5:58 nvme driver split V2 Christoph Hellwig
  2015-10-16  5:58 ` [PATCH 01/18] nvme: add missing unmaps in nvme_queue_rq Christoph Hellwig
  2015-10-16  5:58 ` [PATCH 02/18] nvme: move struct nvme_iod to pci.c Christoph Hellwig
@ 2015-10-16  5:58 ` Christoph Hellwig
  2015-10-20 10:07   ` Sagi Grimberg
  2015-10-21 18:48   ` J Freyensee
  2015-10-16  5:58 ` [PATCH 04/18] nvme: add a vendor field to struct nvme_dev Christoph Hellwig
                   ` (14 subsequent siblings)
  17 siblings, 2 replies; 59+ messages in thread
From: Christoph Hellwig @ 2015-10-16  5:58 UTC (permalink / raw)


Create a new core.c and start by adding the command submission helpers
to it, which are already abstracted away from the actual hardware queues
by the block layer.

Signed-off-by: Christoph Hellwig <hch at lst.de>
Acked-by: Keith Busch <keith.busch at intel.com>
---
 drivers/nvme/host/Makefile |   2 +-
 drivers/nvme/host/core.c   | 172 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/nvme/host/nvme.h   |   3 +
 drivers/nvme/host/pci.c    | 153 +---------------------------------------
 4 files changed, 177 insertions(+), 153 deletions(-)
 create mode 100644 drivers/nvme/host/core.c

diff --git a/drivers/nvme/host/Makefile b/drivers/nvme/host/Makefile
index cfb6679..336b4ea 100644
--- a/drivers/nvme/host/Makefile
+++ b/drivers/nvme/host/Makefile
@@ -1,4 +1,4 @@
 
 obj-$(CONFIG_BLK_DEV_NVME)     += nvme.o
 
-nvme-y		+= pci.o scsi.o
+nvme-y		+= core.o pci.o scsi.o
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
new file mode 100644
index 0000000..dfb528d
--- /dev/null
+++ b/drivers/nvme/host/core.c
@@ -0,0 +1,172 @@
+/*
+ * NVM Express device driver
+ * Copyright (c) 2011-2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#include <linux/blkdev.h>
+#include <linux/blk-mq.h>
+#include <linux/errno.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+
+#include "nvme.h"
+
+/*
+ * Returns 0 on success.  If the result is negative, it's a Linux error code;
+ * if the result is positive, it's an NVM Express status code
+ */
+int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
+		void *buffer, void __user *ubuffer, unsigned bufflen,
+		u32 *result, unsigned timeout)
+{
+	bool write = cmd->common.opcode & 1;
+	struct bio *bio = NULL;
+	struct request *req;
+	int ret;
+
+	req = blk_mq_alloc_request(q, write, GFP_KERNEL, false);
+	if (IS_ERR(req))
+		return PTR_ERR(req);
+
+	req->cmd_type = REQ_TYPE_DRV_PRIV;
+	req->cmd_flags |= REQ_FAILFAST_DRIVER;
+	req->__data_len = 0;
+	req->__sector = (sector_t) -1;
+	req->bio = req->biotail = NULL;
+
+	req->timeout = timeout ? timeout : ADMIN_TIMEOUT;
+
+	req->cmd = (unsigned char *)cmd;
+	req->cmd_len = sizeof(struct nvme_command);
+	req->special = (void *)0;
+
+	if (buffer && bufflen) {
+		ret = blk_rq_map_kern(q, req, buffer, bufflen, __GFP_WAIT);
+		if (ret)
+			goto out;
+	} else if (ubuffer && bufflen) {
+		ret = blk_rq_map_user(q, req, NULL, ubuffer, bufflen, __GFP_WAIT);
+		if (ret)
+			goto out;
+		bio = req->bio;
+	}
+
+	blk_execute_rq(req->q, NULL, req, 0);
+	if (bio)
+		blk_rq_unmap_user(bio);
+	if (result)
+		*result = (u32)(uintptr_t)req->special;
+	ret = req->errors;
+ out:
+	blk_mq_free_request(req);
+	return ret;
+}
+
+int nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
+		void *buffer, unsigned bufflen)
+{
+	return __nvme_submit_sync_cmd(q, cmd, buffer, NULL, bufflen, NULL, 0);
+}
+
+int nvme_identify_ctrl(struct nvme_dev *dev, struct nvme_id_ctrl **id)
+{
+	struct nvme_command c = { };
+	int error;
+
+	/* gcc-4.4.4 (at least) has issues with initializers and anon unions */
+	c.identify.opcode = nvme_admin_identify;
+	c.identify.cns = cpu_to_le32(1);
+
+	*id = kmalloc(sizeof(struct nvme_id_ctrl), GFP_KERNEL);
+	if (!*id)
+		return -ENOMEM;
+
+	error = nvme_submit_sync_cmd(dev->admin_q, &c, *id,
+			sizeof(struct nvme_id_ctrl));
+	if (error)
+		kfree(*id);
+	return error;
+}
+
+int nvme_identify_ns(struct nvme_dev *dev, unsigned nsid,
+		struct nvme_id_ns **id)
+{
+	struct nvme_command c = { };
+	int error;
+
+	/* gcc-4.4.4 (at least) has issues with initializers and anon unions */
+	c.identify.opcode = nvme_admin_identify,
+	c.identify.nsid = cpu_to_le32(nsid),
+
+	*id = kmalloc(sizeof(struct nvme_id_ns), GFP_KERNEL);
+	if (!*id)
+		return -ENOMEM;
+
+	error = nvme_submit_sync_cmd(dev->admin_q, &c, *id,
+			sizeof(struct nvme_id_ns));
+	if (error)
+		kfree(*id);
+	return error;
+}
+
+int nvme_get_features(struct nvme_dev *dev, unsigned fid, unsigned nsid,
+					dma_addr_t dma_addr, u32 *result)
+{
+	struct nvme_command c;
+
+	memset(&c, 0, sizeof(c));
+	c.features.opcode = nvme_admin_get_features;
+	c.features.nsid = cpu_to_le32(nsid);
+	c.features.prp1 = cpu_to_le64(dma_addr);
+	c.features.fid = cpu_to_le32(fid);
+
+	return __nvme_submit_sync_cmd(dev->admin_q, &c, NULL, NULL, 0,
+			result, 0);
+}
+
+int nvme_set_features(struct nvme_dev *dev, unsigned fid, unsigned dword11,
+					dma_addr_t dma_addr, u32 *result)
+{
+	struct nvme_command c;
+
+	memset(&c, 0, sizeof(c));
+	c.features.opcode = nvme_admin_set_features;
+	c.features.prp1 = cpu_to_le64(dma_addr);
+	c.features.fid = cpu_to_le32(fid);
+	c.features.dword11 = cpu_to_le32(dword11);
+
+	return __nvme_submit_sync_cmd(dev->admin_q, &c, NULL, NULL, 0,
+			result, 0);
+}
+
+int nvme_get_log_page(struct nvme_dev *dev, struct nvme_smart_log **log)
+{
+	struct nvme_command c = { };
+	int error;
+
+	c.common.opcode = nvme_admin_get_log_page,
+	c.common.nsid = cpu_to_le32(0xFFFFFFFF),
+	c.common.cdw10[0] = cpu_to_le32(
+			(((sizeof(struct nvme_smart_log) / 4) - 1) << 16) |
+			 NVME_LOG_SMART),
+
+	*log = kmalloc(sizeof(struct nvme_smart_log), GFP_KERNEL);
+	if (!*log)
+		return -ENOMEM;
+
+	error = nvme_submit_sync_cmd(dev->admin_q, &c, *log,
+			sizeof(struct nvme_smart_log));
+	if (error)
+		kfree(*log);
+	return error;
+}
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 835941b..0633a7b 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -22,6 +22,9 @@
 extern unsigned char nvme_io_timeout;
 #define NVME_IO_TIMEOUT	(nvme_io_timeout * HZ)
 
+extern unsigned char admin_timeout;
+#define ADMIN_TIMEOUT	(admin_timeout * HZ)
+
 /*
  * Represents an NVM Express device.  Each nvme_dev is a PCI function.
  */
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 7e1b438..628c572 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -50,10 +50,9 @@
 #define NVME_AQ_DEPTH		256
 #define SQ_SIZE(depth)		(depth * sizeof(struct nvme_command))
 #define CQ_SIZE(depth)		(depth * sizeof(struct nvme_completion))
-#define ADMIN_TIMEOUT		(admin_timeout * HZ)
 #define SHUTDOWN_TIMEOUT	(shutdown_timeout * HZ)
 
-static unsigned char admin_timeout = 60;
+unsigned char admin_timeout = 60;
 module_param(admin_timeout, byte, 0644);
 MODULE_PARM_DESC(admin_timeout, "timeout in seconds for admin commands");
 
@@ -1031,63 +1030,6 @@ static irqreturn_t nvme_irq_check(int irq, void *data)
 	return IRQ_WAKE_THREAD;
 }
 
-/*
- * Returns 0 on success.  If the result is negative, it's a Linux error code;
- * if the result is positive, it's an NVM Express status code
- */
-int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
-		void *buffer, void __user *ubuffer, unsigned bufflen,
-		u32 *result, unsigned timeout)
-{
-	bool write = cmd->common.opcode & 1;
-	struct bio *bio = NULL;
-	struct request *req;
-	int ret;
-
-	req = blk_mq_alloc_request(q, write, GFP_KERNEL, false);
-	if (IS_ERR(req))
-		return PTR_ERR(req);
-
-	req->cmd_type = REQ_TYPE_DRV_PRIV;
-	req->cmd_flags |= REQ_FAILFAST_DRIVER;
-	req->__data_len = 0;
-	req->__sector = (sector_t) -1;
-	req->bio = req->biotail = NULL;
-
-	req->timeout = timeout ? timeout : ADMIN_TIMEOUT;
-
-	req->cmd = (unsigned char *)cmd;
-	req->cmd_len = sizeof(struct nvme_command);
-	req->special = (void *)0;
-
-	if (buffer && bufflen) {
-		ret = blk_rq_map_kern(q, req, buffer, bufflen, __GFP_WAIT);
-		if (ret)
-			goto out;
-	} else if (ubuffer && bufflen) {
-		ret = blk_rq_map_user(q, req, NULL, ubuffer, bufflen, __GFP_WAIT);
-		if (ret)
-			goto out;
-		bio = req->bio;
-	}
-
-	blk_execute_rq(req->q, NULL, req, 0);
-	if (bio)
-		blk_rq_unmap_user(bio);
-	if (result)
-		*result = (u32)(uintptr_t)req->special;
-	ret = req->errors;
- out:
-	blk_mq_free_request(req);
-	return ret;
-}
-
-int nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
-		void *buffer, unsigned bufflen)
-{
-	return __nvme_submit_sync_cmd(q, cmd, buffer, NULL, bufflen, NULL, 0);
-}
-
 static int nvme_submit_async_admin_req(struct nvme_dev *dev)
 {
 	struct nvme_queue *nvmeq = dev->queues[0];
@@ -1199,99 +1141,6 @@ static int adapter_delete_sq(struct nvme_dev *dev, u16 sqid)
 	return adapter_delete_queue(dev, nvme_admin_delete_sq, sqid);
 }
 
-int nvme_identify_ctrl(struct nvme_dev *dev, struct nvme_id_ctrl **id)
-{
-	struct nvme_command c = { };
-	int error;
-
-	/* gcc-4.4.4 (at least) has issues with initializers and anon unions */
-	c.identify.opcode = nvme_admin_identify;
-	c.identify.cns = cpu_to_le32(1);
-
-	*id = kmalloc(sizeof(struct nvme_id_ctrl), GFP_KERNEL);
-	if (!*id)
-		return -ENOMEM;
-
-	error = nvme_submit_sync_cmd(dev->admin_q, &c, *id,
-			sizeof(struct nvme_id_ctrl));
-	if (error)
-		kfree(*id);
-	return error;
-}
-
-int nvme_identify_ns(struct nvme_dev *dev, unsigned nsid,
-		struct nvme_id_ns **id)
-{
-	struct nvme_command c = { };
-	int error;
-
-	/* gcc-4.4.4 (at least) has issues with initializers and anon unions */
-	c.identify.opcode = nvme_admin_identify,
-	c.identify.nsid = cpu_to_le32(nsid),
-
-	*id = kmalloc(sizeof(struct nvme_id_ns), GFP_KERNEL);
-	if (!*id)
-		return -ENOMEM;
-
-	error = nvme_submit_sync_cmd(dev->admin_q, &c, *id,
-			sizeof(struct nvme_id_ns));
-	if (error)
-		kfree(*id);
-	return error;
-}
-
-int nvme_get_features(struct nvme_dev *dev, unsigned fid, unsigned nsid,
-					dma_addr_t dma_addr, u32 *result)
-{
-	struct nvme_command c;
-
-	memset(&c, 0, sizeof(c));
-	c.features.opcode = nvme_admin_get_features;
-	c.features.nsid = cpu_to_le32(nsid);
-	c.features.prp1 = cpu_to_le64(dma_addr);
-	c.features.fid = cpu_to_le32(fid);
-
-	return __nvme_submit_sync_cmd(dev->admin_q, &c, NULL, NULL, 0,
-			result, 0);
-}
-
-int nvme_set_features(struct nvme_dev *dev, unsigned fid, unsigned dword11,
-					dma_addr_t dma_addr, u32 *result)
-{
-	struct nvme_command c;
-
-	memset(&c, 0, sizeof(c));
-	c.features.opcode = nvme_admin_set_features;
-	c.features.prp1 = cpu_to_le64(dma_addr);
-	c.features.fid = cpu_to_le32(fid);
-	c.features.dword11 = cpu_to_le32(dword11);
-
-	return __nvme_submit_sync_cmd(dev->admin_q, &c, NULL, NULL, 0,
-			result, 0);
-}
-
-int nvme_get_log_page(struct nvme_dev *dev, struct nvme_smart_log **log)
-{
-	struct nvme_command c = { };
-	int error;
-
-	c.common.opcode = nvme_admin_get_log_page,
-	c.common.nsid = cpu_to_le32(0xFFFFFFFF),
-	c.common.cdw10[0] = cpu_to_le32(
-			(((sizeof(struct nvme_smart_log) / 4) - 1) << 16) |
-			 NVME_LOG_SMART),
-
-	*log = kmalloc(sizeof(struct nvme_smart_log), GFP_KERNEL);
-	if (!*log)
-		return -ENOMEM;
-
-	error = nvme_submit_sync_cmd(dev->admin_q, &c, *log,
-			sizeof(struct nvme_smart_log));
-	if (error)
-		kfree(*log);
-	return error;
-}
-
 /**
  * nvme_abort_req - Attempt aborting a request
  *
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 03/18] nvme: split command submission helpers out of pci.c
  2015-10-16  5:58 ` [PATCH 03/18] nvme: split command submission helpers out of pci.c Christoph Hellwig
@ 2015-10-20 10:07   ` Sagi Grimberg
  2015-10-21 18:48   ` J Freyensee
  1 sibling, 0 replies; 59+ messages in thread
From: Sagi Grimberg @ 2015-10-20 10:07 UTC (permalink / raw)


On 10/16/2015 8:58 AM, Christoph Hellwig wrote:
> Create a new core.c and start by adding the command submission helpers
> to it, which are already abstracted away from the actual hardware queues
> by the block layer.
>
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> Acked-by: Keith Busch <keith.busch at intel.com>

looks good to me,

Reviewed-by: Sagi Grimberg <sagig at mellanox.com>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 03/18] nvme: split command submission helpers out of pci.c
  2015-10-16  5:58 ` [PATCH 03/18] nvme: split command submission helpers out of pci.c Christoph Hellwig
  2015-10-20 10:07   ` Sagi Grimberg
@ 2015-10-21 18:48   ` J Freyensee
  1 sibling, 0 replies; 59+ messages in thread
From: J Freyensee @ 2015-10-21 18:48 UTC (permalink / raw)


On Fri, 2015-10-16@07:58 +0200, Christoph Hellwig wrote:
> Create a new core.c and start by adding the command submission 
> helpers
> to it, which are already abstracted away from the actual hardware 
> queues
> by the block layer.
> 
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> Acked-by: Keith Busch <keith.busch at intel.com>
> ---
>  drivers/nvme/host/Makefile |   2 +-
>  drivers/nvme/host/core.c   | 172 
> +++++++++++++++++++++++++++++++++++++++++++++
>  drivers/nvme/host/nvme.h   |   3 +
>  drivers/nvme/host/pci.c    | 153 +----------------------------------
> -----
>  4 files changed, 177 insertions(+), 153 deletions(-)
>  create mode 100644 drivers/nvme/host/core.c
> 
> diff --git a/drivers/nvme/host/Makefile b/drivers/nvme/host/Makefile
> index cfb6679..336b4ea 100644
> --- a/drivers/nvme/host/Makefile
> +++ b/drivers/nvme/host/Makefile
> @@ -1,4 +1,4 @@
>  
>  obj-$(CONFIG_BLK_DEV_NVME)     += nvme.o
>  
> -nvme-y		+= pci.o scsi.o
> +nvme-y		+= core.o pci.o scsi.o
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> new file mode 100644
> index 0000000..dfb528d
> --- /dev/null
> +++ b/drivers/nvme/host/core.c
> @@ -0,0 +1,172 @@
> +/*
> + * NVM Express device driver
> + * Copyright (c) 2011-2014, Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or 
> modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but 
> WITHOUT
> + * ANY WARRANTY; without even the implied warranty of 
> MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public 
> License for
> + * more details.
> + */
> +
> +#include <linux/blkdev.h>
> +#include <linux/blk-mq.h>
> +#include <linux/errno.h>
> +#include <linux/kernel.h>
> +#include <linux/slab.h>
> +#include <linux/types.h>
> +
> +#include "nvme.h"
> +
> +/*
> + * Returns 0 on success.  If the result is negative, it's a Linux 
> error code;
> + * if the result is positive, it's an NVM Express status code
> + */
> +int __nvme_submit_sync_cmd(struct request_queue *q, struct 
> nvme_command *cmd,
> +		void *buffer, void __user *ubuffer, unsigned 
> bufflen,
> +		u32 *result, unsigned timeout)
> +{
> +	bool write = cmd->common.opcode & 1;
> +	struct bio *bio = NULL;
> +	struct request *req;
> +	int ret;
> +
> +	req = blk_mq_alloc_request(q, write, GFP_KERNEL, false);
> +	if (IS_ERR(req))
> +		return PTR_ERR(req);
> +
> +	req->cmd_type = REQ_TYPE_DRV_PRIV;
> +	req->cmd_flags |= REQ_FAILFAST_DRIVER;
> +	req->__data_len = 0;
> +	req->__sector = (sector_t) -1;
> +	req->bio = req->biotail = NULL;
> +
> +	req->timeout = timeout ? timeout : ADMIN_TIMEOUT;
> +
> +	req->cmd = (unsigned char *)cmd;
> +	req->cmd_len = sizeof(struct nvme_command);
> +	req->special = (void *)0;
> +
> +	if (buffer && bufflen) {
> +		ret = blk_rq_map_kern(q, req, buffer, bufflen, 
> __GFP_WAIT);
> +		if (ret)
> +			goto out;
> +	} else if (ubuffer && bufflen) {
> +		ret = blk_rq_map_user(q, req, NULL, ubuffer, 
> bufflen, __GFP_WAIT);
> +		if (ret)
> +			goto out;
> +		bio = req->bio;
> +	}
> +
> +	blk_execute_rq(req->q, NULL, req, 0);
> +	if (bio)
> +		blk_rq_unmap_user(bio);
> +	if (result)
> +		*result = (u32)(uintptr_t)req->special;
> +	ret = req->errors;
> + out:
> +	blk_mq_free_request(req);
> +	return ret;
> +}
> +
> +int nvme_submit_sync_cmd(struct request_queue *q, struct 
> nvme_command *cmd,
> +		void *buffer, unsigned bufflen)
> +{
> +	return __nvme_submit_sync_cmd(q, cmd, buffer, NULL, bufflen, 
> NULL, 0);
> +}
> +
> +int nvme_identify_ctrl(struct nvme_dev *dev, struct nvme_id_ctrl 
> **id)
> +{
> +	struct nvme_command c = { };
> +	int error;
> +
> +	/* gcc-4.4.4 (at least) has issues with initializers and 
> anon unions */
> +	c.identify.opcode = nvme_admin_identify;
> +	c.identify.cns = cpu_to_le32(1);
> +
> +	*id = kmalloc(sizeof(struct nvme_id_ctrl), GFP_KERNEL);
> +	if (!*id)
> +		return -ENOMEM;
> +
> +	error = nvme_submit_sync_cmd(dev->admin_q, &c, *id,
> +			sizeof(struct nvme_id_ctrl));
> +	if (error)
> +		kfree(*id);
> +	return error;
> +}
> +
> +int nvme_identify_ns(struct nvme_dev *dev, unsigned nsid,
> +		struct nvme_id_ns **id)
> +{
> +	struct nvme_command c = { };
> +	int error;
> +
> +	/* gcc-4.4.4 (at least) has issues with initializers and 
> anon unions */
> +	c.identify.opcode = nvme_admin_identify,
> +	c.identify.nsid = cpu_to_le32(nsid),
> +
> +	*id = kmalloc(sizeof(struct nvme_id_ns), GFP_KERNEL);
> +	if (!*id)
> +		return -ENOMEM;
> +
> +	error = nvme_submit_sync_cmd(dev->admin_q, &c, *id,
> +			sizeof(struct nvme_id_ns));
> +	if (error)
> +		kfree(*id);
> +	return error;
> +}
> +
> +int nvme_get_features(struct nvme_dev *dev, unsigned fid, unsigned 
> nsid,
> +					dma_addr_t dma_addr, u32 
> *result)
> +{
> +	struct nvme_command c;
> +
> +	memset(&c, 0, sizeof(c));
> +	c.features.opcode = nvme_admin_get_features;
> +	c.features.nsid = cpu_to_le32(nsid);
> +	c.features.prp1 = cpu_to_le64(dma_addr);
> +	c.features.fid = cpu_to_le32(fid);
> +
> +	return __nvme_submit_sync_cmd(dev->admin_q, &c, NULL, NULL, 
> 0,
> +			result, 0);
> +}
> +
> +int nvme_set_features(struct nvme_dev *dev, unsigned fid, unsigned 
> dword11,
> +					dma_addr_t dma_addr, u32 
> *result)
> +{
> +	struct nvme_command c;
> +
> +	memset(&c, 0, sizeof(c));
> +	c.features.opcode = nvme_admin_set_features;
> +	c.features.prp1 = cpu_to_le64(dma_addr);
> +	c.features.fid = cpu_to_le32(fid);
> +	c.features.dword11 = cpu_to_le32(dword11);
> +
> +	return __nvme_submit_sync_cmd(dev->admin_q, &c, NULL, NULL, 
> 0,
> +			result, 0);
> +}
> +
> +int nvme_get_log_page(struct nvme_dev *dev, struct nvme_smart_log 
> **log)
> +{
> +	struct nvme_command c = { };
> +	int error;
> +
> +	c.common.opcode = nvme_admin_get_log_page,
> +	c.common.nsid = cpu_to_le32(0xFFFFFFFF),
> +	c.common.cdw10[0] = cpu_to_le32(
> +			(((sizeof(struct nvme_smart_log) / 4) - 1) 
> << 16) |
> +			 NVME_LOG_SMART),
> +
> +	*log = kmalloc(sizeof(struct nvme_smart_log), GFP_KERNEL);
> +	if (!*log)
> +		return -ENOMEM;
> +
> +	error = nvme_submit_sync_cmd(dev->admin_q, &c, *log,
> +			sizeof(struct nvme_smart_log));
> +	if (error)
> +		kfree(*log);
> +	return error;
> +}
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index 835941b..0633a7b 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -22,6 +22,9 @@
>  extern unsigned char nvme_io_timeout;
>  #define NVME_IO_TIMEOUT	(nvme_io_timeout * HZ)
>  
> +extern unsigned char admin_timeout;
> +#define ADMIN_TIMEOUT	(admin_timeout * HZ)
> +
>  /*
>   * Represents an NVM Express device.  Each nvme_dev is a PCI 
> function.
>   */
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 7e1b438..628c572 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -50,10 +50,9 @@
>  #define NVME_AQ_DEPTH		256
>  #define SQ_SIZE(depth)		(depth * sizeof(struct 
> nvme_command))
>  #define CQ_SIZE(depth)		(depth * sizeof(struct 
> nvme_completion))
> -#define ADMIN_TIMEOUT		(admin_timeout * HZ)
>  #define SHUTDOWN_TIMEOUT	(shutdown_timeout * HZ)

I think some or all of these #defines should be moved out of the .c
file and stuck into a .h file as well?  Why would any of the queue
depths or the shutdown timeout be PCIe specific?

>  
> -static unsigned char admin_timeout = 60;
> +unsigned char admin_timeout = 60;
>  module_param(admin_timeout, byte, 0644);
>  MODULE_PARM_DESC(admin_timeout, "timeout in seconds for admin 
> commands");
>  


> @@ -1031,63 +1030,6 @@ static irqreturn_t nvme_irq_check(int irq, 
> void *data)
>  	return IRQ_WAKE_THREAD;
>  }
>  
> -/*
> - * Returns 0 on success.  If the result is negative, it's a Linux 
> error code;
> - * if the result is positive, it's an NVM Express status code
> - */
> -int __nvme_submit_sync_cmd(struct request_queue *q, struct 
> nvme_command *cmd,
> -		void *buffer, void __user *ubuffer, unsigned 
> bufflen,
> -		u32 *result, unsigned timeout)
> -{
> -	bool write = cmd->common.opcode & 1;
> -	struct bio *bio = NULL;
> -	struct request *req;
> -	int ret;
> -
> -	req = blk_mq_alloc_request(q, write, GFP_KERNEL, false);
> -	if (IS_ERR(req))
> -		return PTR_ERR(req);
> -
> -	req->cmd_type = REQ_TYPE_DRV_PRIV;
> -	req->cmd_flags |= REQ_FAILFAST_DRIVER;
> -	req->__data_len = 0;
> -	req->__sector = (sector_t) -1;
> -	req->bio = req->biotail = NULL;
> -
> -	req->timeout = timeout ? timeout : ADMIN_TIMEOUT;
> -
> -	req->cmd = (unsigned char *)cmd;
> -	req->cmd_len = sizeof(struct nvme_command);
> -	req->special = (void *)0;
> -
> -	if (buffer && bufflen) {
> -		ret = blk_rq_map_kern(q, req, buffer, bufflen, 
> __GFP_WAIT);
> -		if (ret)
> -			goto out;
> -	} else if (ubuffer && bufflen) {
> -		ret = blk_rq_map_user(q, req, NULL, ubuffer, 
> bufflen, __GFP_WAIT);
> -		if (ret)
> -			goto out;
> -		bio = req->bio;
> -	}
> -
> -	blk_execute_rq(req->q, NULL, req, 0);
> -	if (bio)
> -		blk_rq_unmap_user(bio);
> -	if (result)
> -		*result = (u32)(uintptr_t)req->special;
> -	ret = req->errors;
> - out:
> -	blk_mq_free_request(req);
> -	return ret;
> -}
> -
> -int nvme_submit_sync_cmd(struct request_queue *q, struct 
> nvme_command *cmd,
> -		void *buffer, unsigned bufflen)
> -{
> -	return __nvme_submit_sync_cmd(q, cmd, buffer, NULL, bufflen, 
> NULL, 0);
> -}
> -
>  static int nvme_submit_async_admin_req(struct nvme_dev *dev)
>  {
>  	struct nvme_queue *nvmeq = dev->queues[0];
> @@ -1199,99 +1141,6 @@ static int adapter_delete_sq(struct nvme_dev 
> *dev, u16 sqid)
>  	return adapter_delete_queue(dev, nvme_admin_delete_sq, 
> sqid);
>  }
>  
> -int nvme_identify_ctrl(struct nvme_dev *dev, struct nvme_id_ctrl 
> **id)
> -{
> -	struct nvme_command c = { };
> -	int error;
> -
> -	/* gcc-4.4.4 (at least) has issues with initializers and 
> anon unions */
> -	c.identify.opcode = nvme_admin_identify;
> -	c.identify.cns = cpu_to_le32(1);
> -
> -	*id = kmalloc(sizeof(struct nvme_id_ctrl), GFP_KERNEL);
> -	if (!*id)
> -		return -ENOMEM;
> -
> -	error = nvme_submit_sync_cmd(dev->admin_q, &c, *id,
> -			sizeof(struct nvme_id_ctrl));
> -	if (error)
> -		kfree(*id);
> -	return error;
> -}
> -
> -int nvme_identify_ns(struct nvme_dev *dev, unsigned nsid,
> -		struct nvme_id_ns **id)
> -{
> -	struct nvme_command c = { };
> -	int error;
> -
> -	/* gcc-4.4.4 (at least) has issues with initializers and 
> anon unions */
> -	c.identify.opcode = nvme_admin_identify,
> -	c.identify.nsid = cpu_to_le32(nsid),
> -
> -	*id = kmalloc(sizeof(struct nvme_id_ns), GFP_KERNEL);
> -	if (!*id)
> -		return -ENOMEM;
> -
> -	error = nvme_submit_sync_cmd(dev->admin_q, &c, *id,
> -			sizeof(struct nvme_id_ns));
> -	if (error)
> -		kfree(*id);
> -	return error;
> -}
> -
> -int nvme_get_features(struct nvme_dev *dev, unsigned fid, unsigned 
> nsid,
> -					dma_addr_t dma_addr, u32 
> *result)
> -{
> -	struct nvme_command c;
> -
> -	memset(&c, 0, sizeof(c));
> -	c.features.opcode = nvme_admin_get_features;
> -	c.features.nsid = cpu_to_le32(nsid);
> -	c.features.prp1 = cpu_to_le64(dma_addr);
> -	c.features.fid = cpu_to_le32(fid);
> -
> -	return __nvme_submit_sync_cmd(dev->admin_q, &c, NULL, NULL, 
> 0,
> -			result, 0);
> -}
> -
> -int nvme_set_features(struct nvme_dev *dev, unsigned fid, unsigned 
> dword11,
> -					dma_addr_t dma_addr, u32 
> *result)
> -{
> -	struct nvme_command c;
> -
> -	memset(&c, 0, sizeof(c));
> -	c.features.opcode = nvme_admin_set_features;
> -	c.features.prp1 = cpu_to_le64(dma_addr);
> -	c.features.fid = cpu_to_le32(fid);
> -	c.features.dword11 = cpu_to_le32(dword11);
> -
> -	return __nvme_submit_sync_cmd(dev->admin_q, &c, NULL, NULL, 
> 0,
> -			result, 0);
> -}
> -
> -int nvme_get_log_page(struct nvme_dev *dev, struct nvme_smart_log 
> **log)
> -{
> -	struct nvme_command c = { };
> -	int error;
> -
> -	c.common.opcode = nvme_admin_get_log_page,
> -	c.common.nsid = cpu_to_le32(0xFFFFFFFF),
> -	c.common.cdw10[0] = cpu_to_le32(
> -			(((sizeof(struct nvme_smart_log) / 4) - 1) 
> << 16) |
> -			 NVME_LOG_SMART),
> -
> -	*log = kmalloc(sizeof(struct nvme_smart_log), GFP_KERNEL);
> -	if (!*log)
> -		return -ENOMEM;
> -
> -	error = nvme_submit_sync_cmd(dev->admin_q, &c, *log,
> -			sizeof(struct nvme_smart_log));
> -	if (error)
> -		kfree(*log);
> -	return error;
> -}
> -
>  /**
>   * nvme_abort_req - Attempt aborting a request
>   *

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 04/18] nvme: add a vendor field to struct nvme_dev
  2015-10-16  5:58 nvme driver split V2 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2015-10-16  5:58 ` [PATCH 03/18] nvme: split command submission helpers out of pci.c Christoph Hellwig
@ 2015-10-16  5:58 ` Christoph Hellwig
  2015-10-20 10:08   ` Sagi Grimberg
  2015-10-21 18:58   ` J Freyensee
  2015-10-16  5:58 ` [PATCH 05/18] nvme: use offset instead of a struct for registers Christoph Hellwig
                   ` (13 subsequent siblings)
  17 siblings, 2 replies; 59+ messages in thread
From: Christoph Hellwig @ 2015-10-16  5:58 UTC (permalink / raw)


The SCSI translation layer currently has to poke into the PCI device
structure to find a vendor ID for the device identification fallback.
We won't nessecarily have a PCI device behind the device structure in
the future, so add a new vendor field that can be filled out by the
PCIe driver instead.

Signed-off-by: Christoph Hellwig <hch at lst.de>
Acked-by: Keith Busch <keith.busch at intel.com>
---
 drivers/nvme/host/nvme.h | 1 +
 drivers/nvme/host/pci.c  | 3 +++
 drivers/nvme/host/scsi.c | 2 +-
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 0633a7b..706f678 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -69,6 +69,7 @@ struct nvme_dev {
 	u16 abort_limit;
 	u8 event_limit;
 	u8 vwc;
+	u16 vendor;
 };
 
 /*
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 628c572..cd731f5 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -3025,6 +3025,9 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	INIT_WORK(&dev->reset_work, nvme_reset_work);
 	dev->dev = get_device(&pdev->dev);
 	pci_set_drvdata(pdev, dev);
+
+	dev->vendor = pdev->vendor;
+
 	result = nvme_set_instance(dev);
 	if (result)
 		goto put_pci;
diff --git a/drivers/nvme/host/scsi.c b/drivers/nvme/host/scsi.c
index c3d8d38..8f2d2c5 100644
--- a/drivers/nvme/host/scsi.c
+++ b/drivers/nvme/host/scsi.c
@@ -657,7 +657,7 @@ static int nvme_trans_device_id_page(struct nvme_ns *ns, struct sg_io_hdr *hdr,
 		inq_response[6] = 0x00;    /* Rsvd */
 		inq_response[7] = 0x44;    /* Designator Length */
 
-		sprintf(&inq_response[8], "%04x", to_pci_dev(dev->dev)->vendor);
+		sprintf(&inq_response[8], "%04x", dev->vendor);
 		memcpy(&inq_response[12], dev->model, sizeof(dev->model));
 		sprintf(&inq_response[52], "%04x", tmp_id);
 		memcpy(&inq_response[56], dev->serial, sizeof(dev->serial));
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 04/18] nvme: add a vendor field to struct nvme_dev
  2015-10-16  5:58 ` [PATCH 04/18] nvme: add a vendor field to struct nvme_dev Christoph Hellwig
@ 2015-10-20 10:08   ` Sagi Grimberg
  2015-10-21 18:58   ` J Freyensee
  1 sibling, 0 replies; 59+ messages in thread
From: Sagi Grimberg @ 2015-10-20 10:08 UTC (permalink / raw)


On 10/16/2015 8:58 AM, Christoph Hellwig wrote:
> The SCSI translation layer currently has to poke into the PCI device
> structure to find a vendor ID for the device identification fallback.
> We won't nessecarily have a PCI device behind the device structure in
> the future, so add a new vendor field that can be filled out by the
> PCIe driver instead.
>
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> Acked-by: Keith Busch <keith.busch at intel.com>

looks good to me,

Reviewed-by: Sagi Grimberg <sagig at mellanox.com>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 04/18] nvme: add a vendor field to struct nvme_dev
  2015-10-16  5:58 ` [PATCH 04/18] nvme: add a vendor field to struct nvme_dev Christoph Hellwig
  2015-10-20 10:08   ` Sagi Grimberg
@ 2015-10-21 18:58   ` J Freyensee
  2015-10-21 19:10     ` Busch, Keith
  1 sibling, 1 reply; 59+ messages in thread
From: J Freyensee @ 2015-10-21 18:58 UTC (permalink / raw)


On Fri, 2015-10-16@07:58 +0200, Christoph Hellwig wrote:
> The SCSI translation layer currently has to poke into the PCI device
> structure to find a vendor ID for the device identification fallback.
> We won't nessecarily have a PCI device behind the device structure in
> the future, so add a new vendor field that can be filled out by the
> PCIe driver instead.
> 
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> Acked-by: Keith Busch <keith.busch at intel.com>
> ---
>  drivers/nvme/host/nvme.h | 1 +
>  drivers/nvme/host/pci.c  | 3 +++
>  drivers/nvme/host/scsi.c | 2 +-
>  3 files changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index 0633a7b..706f678 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -69,6 +69,7 @@ struct nvme_dev {
>  	u16 abort_limit;
>  	u8 event_limit;
>  	u8 vwc;
> +	u16 vendor;
>  };
>  
>  /*
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 628c572..cd731f5 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -3025,6 +3025,9 @@ static int nvme_probe(struct pci_dev *pdev, 
> const struct pci_device_id *id)
>  	INIT_WORK(&dev->reset_work, nvme_reset_work);
>  	dev->dev = get_device(&pdev->dev);
>  	pci_set_drvdata(pdev, dev);
> +
> +	dev->vendor = pdev->vendor;
> +
>  	result = nvme_set_instance(dev);
>  	if (result)
>  		goto put_pci;
> diff --git a/drivers/nvme/host/scsi.c b/drivers/nvme/host/scsi.c
> index c3d8d38..8f2d2c5 100644
> --- a/drivers/nvme/host/scsi.c
> +++ b/drivers/nvme/host/scsi.c
> @@ -657,7 +657,7 @@ static int nvme_trans_device_id_page(struct 
> nvme_ns *ns, struct sg_io_hdr *hdr,
>  		inq_response[6] = 0x00;    /* Rsvd */
>  		inq_response[7] = 0x44;    /* Designator Length */
>  
> -		sprintf(&inq_response[8], "%04x", to_pci_dev(dev
> ->dev)->vendor);
> +		sprintf(&inq_response[8], "%04x", dev->vendor);

I'm ok with this patch, but I wanted to ask the question for my own
benefit, what is the Linux kernel open-source practice of using
sprintf() and string settings?  I typically try to use snprintf().


>  		memcpy(&inq_response[12], dev->model, sizeof(dev
> ->model));
>  		sprintf(&inq_response[52], "%04x", tmp_id);
>  		memcpy(&inq_response[56], dev->serial, sizeof(dev
> ->serial));

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 04/18] nvme: add a vendor field to struct nvme_dev
  2015-10-21 18:58   ` J Freyensee
@ 2015-10-21 19:10     ` Busch, Keith
  0 siblings, 0 replies; 59+ messages in thread
From: Busch, Keith @ 2015-10-21 19:10 UTC (permalink / raw)


On Wed, Oct 21, 2015@11:58:11AM -0700, J Freyensee wrote:
> On Fri, 2015-10-16@07:58 +0200, Christoph Hellwig wrote:
> > -		sprintf(&inq_response[8], "%04x", to_pci_dev(dev
> > ->dev)->vendor);
> > +		sprintf(&inq_response[8], "%04x", dev->vendor);
> 
> I'm ok with this patch, but I wanted to ask the question for my own
> benefit, what is the Linux kernel open-source practice of using
> sprintf() and string settings?  I typically try to use snprintf().

Generally yes, snprintf is preferred, though the usage in this specific
example is more similar to a memcpy. We know the buffer size and the
length being copied into it is fixed; sprintf is just convenient.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 05/18] nvme: use offset instead of a struct for registers
  2015-10-16  5:58 nvme driver split V2 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2015-10-16  5:58 ` [PATCH 04/18] nvme: add a vendor field to struct nvme_dev Christoph Hellwig
@ 2015-10-16  5:58 ` Christoph Hellwig
  2015-10-16 17:30   ` Busch, Keith
  2015-10-21 20:28   ` J Freyensee
  2015-10-16  5:58 ` [PATCH 06/18] nvme: split a new struct nvme_ctrl out of struct nvme_dev Christoph Hellwig
                   ` (12 subsequent siblings)
  17 siblings, 2 replies; 59+ messages in thread
From: Christoph Hellwig @ 2015-10-16  5:58 UTC (permalink / raw)


This makes life easier for future non-PCI drivers where access to the
registers might be more complicated.  Note that Linux drivers are
pretty evenly split between the two versions, and in fact the NVMe
driver already uses offsets for the doorbells.

Signed-off-by: Christoph Hellwig <hch at lst.de>
Acked-by: Keith Busch <keith.busch at intel.com>
---
 drivers/nvme/host/nvme.h |  2 +-
 drivers/nvme/host/pci.c  | 58 +++++++++++++++++++++++++-----------------------
 drivers/nvme/host/scsi.c |  6 ++---
 include/linux/nvme.h     | 27 +++++++++++-----------
 4 files changed, 47 insertions(+), 46 deletions(-)

diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 706f678..370aa5b 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -46,7 +46,7 @@ struct nvme_dev {
 	u32 db_stride;
 	u32 ctrl_config;
 	struct msix_entry *entry;
-	struct nvme_bar __iomem *bar;
+	void __iomem *bar;
 	struct list_head namespaces;
 	struct kref kref;
 	struct device *device;
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index cd731f5..6b0dcb6 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1307,7 +1307,7 @@ static void nvme_disable_queue(struct nvme_dev *dev, int qid)
 
 	/* Don't tell the adapter to delete the admin queue.
 	 * Don't tell a removed adapter to delete IO queues. */
-	if (qid && readl(&dev->bar->csts) != -1) {
+	if (qid && readl(dev->bar + NVME_REG_CSTS) != -1) {
 		adapter_delete_sq(dev, qid);
 		adapter_delete_cq(dev, qid);
 	}
@@ -1460,7 +1460,7 @@ static int nvme_wait_ready(struct nvme_dev *dev, u64 cap, bool enabled)
 
 	timeout = ((NVME_CAP_TIMEOUT(cap) + 1) * HZ / 2) + jiffies;
 
-	while ((readl(&dev->bar->csts) & NVME_CSTS_RDY) != bit) {
+	while ((readl(dev->bar + NVME_REG_CSTS) & NVME_CSTS_RDY) != bit) {
 		msleep(100);
 		if (fatal_signal_pending(current))
 			return -EINTR;
@@ -1485,7 +1485,7 @@ static int nvme_disable_ctrl(struct nvme_dev *dev, u64 cap)
 {
 	dev->ctrl_config &= ~NVME_CC_SHN_MASK;
 	dev->ctrl_config &= ~NVME_CC_ENABLE;
-	writel(dev->ctrl_config, &dev->bar->cc);
+	writel(dev->ctrl_config, dev->bar + NVME_REG_CC);
 
 	return nvme_wait_ready(dev, cap, false);
 }
@@ -1494,7 +1494,7 @@ static int nvme_enable_ctrl(struct nvme_dev *dev, u64 cap)
 {
 	dev->ctrl_config &= ~NVME_CC_SHN_MASK;
 	dev->ctrl_config |= NVME_CC_ENABLE;
-	writel(dev->ctrl_config, &dev->bar->cc);
+	writel(dev->ctrl_config, dev->bar + NVME_REG_CC);
 
 	return nvme_wait_ready(dev, cap, true);
 }
@@ -1506,10 +1506,10 @@ static int nvme_shutdown_ctrl(struct nvme_dev *dev)
 	dev->ctrl_config &= ~NVME_CC_SHN_MASK;
 	dev->ctrl_config |= NVME_CC_SHN_NORMAL;
 
-	writel(dev->ctrl_config, &dev->bar->cc);
+	writel(dev->ctrl_config, dev->bar + NVME_REG_CC);
 
 	timeout = SHUTDOWN_TIMEOUT + jiffies;
-	while ((readl(&dev->bar->csts) & NVME_CSTS_SHST_MASK) !=
+	while ((readl(dev->bar + NVME_REG_CSTS) & NVME_CSTS_SHST_MASK) !=
 							NVME_CSTS_SHST_CMPLT) {
 		msleep(100);
 		if (fatal_signal_pending(current))
@@ -1584,7 +1584,7 @@ static int nvme_configure_admin_queue(struct nvme_dev *dev)
 {
 	int result;
 	u32 aqa;
-	u64 cap = readq(&dev->bar->cap);
+	u64 cap = readq(dev->bar + NVME_REG_CAP);
 	struct nvme_queue *nvmeq;
 	unsigned page_shift = PAGE_SHIFT;
 	unsigned dev_page_min = NVME_CAP_MPSMIN(cap) + 12;
@@ -1605,11 +1605,12 @@ static int nvme_configure_admin_queue(struct nvme_dev *dev)
 		page_shift = dev_page_max;
 	}
 
-	dev->subsystem = readl(&dev->bar->vs) >= NVME_VS(1, 1) ?
+	dev->subsystem = readl(dev->bar + NVME_REG_VS) >= NVME_VS(1, 1) ?
 						NVME_CAP_NSSRC(cap) : 0;
 
-	if (dev->subsystem && (readl(&dev->bar->csts) & NVME_CSTS_NSSRO))
-		writel(NVME_CSTS_NSSRO, &dev->bar->csts);
+	if (dev->subsystem &&
+	    (readl(dev->bar + NVME_REG_CSTS) & NVME_CSTS_NSSRO))
+		writel(NVME_CSTS_NSSRO, dev->bar + NVME_REG_CSTS);
 
 	result = nvme_disable_ctrl(dev, cap);
 	if (result < 0)
@@ -1632,9 +1633,9 @@ static int nvme_configure_admin_queue(struct nvme_dev *dev)
 	dev->ctrl_config |= NVME_CC_ARB_RR | NVME_CC_SHN_NONE;
 	dev->ctrl_config |= NVME_CC_IOSQES | NVME_CC_IOCQES;
 
-	writel(aqa, &dev->bar->aqa);
-	writeq(nvmeq->sq_dma_addr, &dev->bar->asq);
-	writeq(nvmeq->cq_dma_addr, &dev->bar->acq);
+	writel(aqa, dev->bar + NVME_REG_AQA);
+	writeq(nvmeq->sq_dma_addr, dev->bar + NVME_REG_ASQ);
+	writeq(nvmeq->cq_dma_addr, dev->bar + NVME_REG_ACQ);
 
 	result = nvme_enable_ctrl(dev, cap);
 	if (result)
@@ -1776,7 +1777,7 @@ static int nvme_subsys_reset(struct nvme_dev *dev)
 	if (!dev->subsystem)
 		return -ENOTTY;
 
-	writel(0x4E564D65, &dev->bar->nssr); /* "NVMe" */
+	writel(0x4E564D65, dev->bar + NVME_REG_NSSR); /* "NVMe" */
 	return 0;
 }
 
@@ -1954,14 +1955,14 @@ static int nvme_kthread(void *data)
 		spin_lock(&dev_list_lock);
 		list_for_each_entry_safe(dev, next, &dev_list, node) {
 			int i;
-			u32 csts = readl(&dev->bar->csts);
+			u32 csts = readl(dev->bar + NVME_REG_CSTS);
 
 			if ((dev->subsystem && (csts & NVME_CSTS_NSSRO)) ||
 							csts & NVME_CSTS_CFS) {
 				if (!__nvme_reset(dev)) {
 					dev_warn(dev->dev,
 						"Failed status: %x, reset controller\n",
-						readl(&dev->bar->csts));
+						readl(dev->bar + NVME_REG_CSTS));
 				}
 				continue;
 			}
@@ -2119,11 +2120,11 @@ static void __iomem *nvme_map_cmb(struct nvme_dev *dev)
 	if (!use_cmb_sqes)
 		return NULL;
 
-	dev->cmbsz = readl(&dev->bar->cmbsz);
+	dev->cmbsz = readl(dev->bar + NVME_REG_CMBSZ);
 	if (!(NVME_CMB_SZ(dev->cmbsz)))
 		return NULL;
 
-	cmbloc = readl(&dev->bar->cmbloc);
+	cmbloc = readl(dev->bar + NVME_REG_CMBLOC);
 
 	szu = (u64)1 << (12 + 4 * NVME_CMB_SZU(dev->cmbsz));
 	size = szu * NVME_CMB_SZ(dev->cmbsz);
@@ -2197,7 +2198,7 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
 				return -ENOMEM;
 			size = db_bar_size(dev, nr_io_queues);
 		} while (1);
-		dev->dbs = ((void __iomem *)dev->bar) + 4096;
+		dev->dbs = dev->bar + 4096;
 		adminq->q_db = dev->dbs;
 	}
 
@@ -2273,8 +2274,9 @@ static struct nvme_ns *nvme_find_ns(struct nvme_dev *dev, unsigned nsid)
 
 static inline bool nvme_io_incapable(struct nvme_dev *dev)
 {
-	return (!dev->bar || readl(&dev->bar->csts) & NVME_CSTS_CFS ||
-							dev->online_queues < 2);
+	return (!dev->bar ||
+		readl(dev->bar + NVME_REG_CSTS) & NVME_CSTS_CFS ||
+		dev->online_queues < 2);
 }
 
 static void nvme_ns_remove(struct nvme_ns *ns)
@@ -2357,7 +2359,7 @@ static int nvme_dev_add(struct nvme_dev *dev)
 	struct pci_dev *pdev = to_pci_dev(dev->dev);
 	int res;
 	struct nvme_id_ctrl *ctrl;
-	int shift = NVME_CAP_MPSMIN(readq(&dev->bar->cap)) + 12;
+	int shift = NVME_CAP_MPSMIN(readq(dev->bar + NVME_REG_CAP)) + 12;
 
 	res = nvme_identify_ctrl(dev, &ctrl);
 	if (res) {
@@ -2431,7 +2433,7 @@ static int nvme_dev_map(struct nvme_dev *dev)
 	if (!dev->bar)
 		goto disable;
 
-	if (readl(&dev->bar->csts) == -1) {
+	if (readl(dev->bar + NVME_REG_CSTS) == -1) {
 		result = -ENODEV;
 		goto unmap;
 	}
@@ -2446,11 +2448,11 @@ static int nvme_dev_map(struct nvme_dev *dev)
 			goto unmap;
 	}
 
-	cap = readq(&dev->bar->cap);
+	cap = readq(dev->bar + NVME_REG_CAP);
 	dev->q_depth = min_t(int, NVME_CAP_MQES(cap) + 1, NVME_Q_DEPTH);
 	dev->db_stride = 1 << NVME_CAP_STRIDE(cap);
-	dev->dbs = ((void __iomem *)dev->bar) + 4096;
-	if (readl(&dev->bar->vs) >= NVME_VS(1, 2))
+	dev->dbs = dev->bar + 4096;
+	if (readl(dev->bar + NVME_REG_VS) >= NVME_VS(1, 2))
 		dev->cmb = nvme_map_cmb(dev);
 
 	return 0;
@@ -2509,7 +2511,7 @@ static void nvme_wait_dq(struct nvme_delq_ctx *dq, struct nvme_dev *dev)
 			 * queues than admin tags.
 			 */
 			set_current_state(TASK_RUNNING);
-			nvme_disable_ctrl(dev, readq(&dev->bar->cap));
+			nvme_disable_ctrl(dev, readq(dev->bar + NVME_REG_CAP));
 			nvme_clear_queue(dev->queues[0]);
 			flush_kthread_worker(dq->worker);
 			nvme_disable_queue(dev, 0);
@@ -2681,7 +2683,7 @@ static void nvme_dev_shutdown(struct nvme_dev *dev)
 
 	if (dev->bar) {
 		nvme_freeze_queues(dev);
-		csts = readl(&dev->bar->csts);
+		csts = readl(dev->bar + NVME_REG_CSTS);
 	}
 	if (csts & NVME_CSTS_CFS || !(csts & NVME_CSTS_RDY)) {
 		for (i = dev->queue_count - 1; i >= 0; i--) {
diff --git a/drivers/nvme/host/scsi.c b/drivers/nvme/host/scsi.c
index 8f2d2c5..a5f6af1 100644
--- a/drivers/nvme/host/scsi.c
+++ b/drivers/nvme/host/scsi.c
@@ -611,7 +611,7 @@ static int nvme_trans_device_id_page(struct nvme_ns *ns, struct sg_io_hdr *hdr,
 
 	memset(inq_response, 0, alloc_len);
 	inq_response[1] = INQ_DEVICE_IDENTIFICATION_PAGE;    /* Page Code */
-	if (readl(&dev->bar->vs) >= NVME_VS(1, 1)) {
+	if (readl(dev->bar + NVME_REG_VS) >= NVME_VS(1, 1)) {
 		struct nvme_id_ns *id_ns;
 		void *eui;
 		int len;
@@ -623,7 +623,7 @@ static int nvme_trans_device_id_page(struct nvme_ns *ns, struct sg_io_hdr *hdr,
 
 		eui = id_ns->eui64;
 		len = sizeof(id_ns->eui64);
-		if (readl(&dev->bar->vs) >= NVME_VS(1, 2)) {
+		if (readl(dev->bar + NVME_REG_VS) >= NVME_VS(1, 2)) {
 			if (bitmap_empty(eui, len * 8)) {
 				eui = id_ns->nguid;
 				len = sizeof(id_ns->nguid);
@@ -2297,7 +2297,7 @@ static int nvme_trans_test_unit_ready(struct nvme_ns *ns,
 {
 	struct nvme_dev *dev = ns->dev;
 
-	if (!(readl(&dev->bar->csts) & NVME_CSTS_RDY))
+	if (!(readl(dev->bar + NVME_REG_CSTS) & NVME_CSTS_RDY))
 		return nvme_trans_completion(hdr, SAM_STAT_CHECK_CONDITION,
 					    NOT_READY, SCSI_ASC_LUN_NOT_READY,
 					    SCSI_ASCQ_CAUSE_NOT_REPORTABLE);
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 3af5f45..4d5e513 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -17,20 +17,19 @@
 
 #include <linux/types.h>
 
-struct nvme_bar {
-	__u64			cap;	/* Controller Capabilities */
-	__u32			vs;	/* Version */
-	__u32			intms;	/* Interrupt Mask Set */
-	__u32			intmc;	/* Interrupt Mask Clear */
-	__u32			cc;	/* Controller Configuration */
-	__u32			rsvd1;	/* Reserved */
-	__u32			csts;	/* Controller Status */
-	__u32			nssr;	/* Subsystem Reset */
-	__u32			aqa;	/* Admin Queue Attributes */
-	__u64			asq;	/* Admin SQ Base Address */
-	__u64			acq;	/* Admin CQ Base Address */
-	__u32			cmbloc; /* Controller Memory Buffer Location */
-	__u32			cmbsz;  /* Controller Memory Buffer Size */
+enum {
+	NVME_REG_CAP	= 0x0000,	/* Controller Capabilities */
+	NVME_REG_VS	= 0x0008,	/* Version */
+	NVME_REG_INTMS	= 0x000c,	/* Interrupt Mask Set */
+	NVME_REG_INTMC	= 0x0010,	/* Interrupt Mask Set */
+	NVME_REG_CC	= 0x0014,	/* Controller Configuration */
+	NVME_REG_CSTS	= 0x001c,	/* Controller Status */
+	NVME_REG_NSSR	= 0x0020,	/* NVM Subsystem Reset */
+	NVME_REG_AQA	= 0x0024,	/* Admin Queue Attributes */
+	NVME_REG_ASQ	= 0x0028,	/* Admin SQ Base Address */
+	NVME_REG_ACQ	= 0x0030,	/* Admin SQ Base Address */
+	NVME_REG_CMBLOC = 0x0038,	/* Controller Memory Buffer Location */
+	NVME_REG_CMBSZ	= 0x0040,	/* Controller Memory Buffer Size */
 };
 
 #define NVME_CAP_MQES(cap)	((cap) & 0xffff)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 05/18] nvme: use offset instead of a struct for registers
  2015-10-16  5:58 ` [PATCH 05/18] nvme: use offset instead of a struct for registers Christoph Hellwig
@ 2015-10-16 17:30   ` Busch, Keith
  2015-10-21 20:28   ` J Freyensee
  1 sibling, 0 replies; 59+ messages in thread
From: Busch, Keith @ 2015-10-16 17:30 UTC (permalink / raw)


On Fri, Oct 16, 2015@07:58:35AM +0200, Christoph Hellwig wrote:
> This makes life easier for future non-PCI drivers where access to the
> registers might be more complicated.  Note that Linux drivers are
> pretty evenly split between the two versions, and in fact the NVMe
> driver already uses offsets for the doorbells.

> +enum {
> +	NVME_REG_CAP	= 0x0000,	/* Controller Capabilities */
> +	NVME_REG_VS	= 0x0008,	/* Version */
> +	NVME_REG_INTMS	= 0x000c,	/* Interrupt Mask Set */
> +	NVME_REG_INTMC	= 0x0010,	/* Interrupt Mask Set */
> +	NVME_REG_CC	= 0x0014,	/* Controller Configuration */
> +	NVME_REG_CSTS	= 0x001c,	/* Controller Status */
> +	NVME_REG_NSSR	= 0x0020,	/* NVM Subsystem Reset */
> +	NVME_REG_AQA	= 0x0024,	/* Admin Queue Attributes */
> +	NVME_REG_ASQ	= 0x0028,	/* Admin SQ Base Address */
> +	NVME_REG_ACQ	= 0x0030,	/* Admin SQ Base Address */
> +	NVME_REG_CMBLOC = 0x0038,	/* Controller Memory Buffer Location */
> +	NVME_REG_CMBSZ	= 0x0040,	/* Controller Memory Buffer Size */
>  };

Darn, CMBSZ is offset 0x3c. Missed that in the first review; controllers
with this capability haven't made it to my sanity test machines yet...

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 05/18] nvme: use offset instead of a struct for registers
  2015-10-16  5:58 ` [PATCH 05/18] nvme: use offset instead of a struct for registers Christoph Hellwig
  2015-10-16 17:30   ` Busch, Keith
@ 2015-10-21 20:28   ` J Freyensee
  1 sibling, 0 replies; 59+ messages in thread
From: J Freyensee @ 2015-10-21 20:28 UTC (permalink / raw)


On Fri, 2015-10-16@07:58 +0200, Christoph Hellwig wrote:
> This makes life easier for future non-PCI drivers where access to the
> registers might be more complicated.  Note that Linux drivers are
> pretty evenly split between the two versions, and in fact the NVMe
> driver already uses offsets for the doorbells.
> 
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> Acked-by: Keith Busch <keith.busch at intel.com>
> ---
>  drivers/nvme/host/nvme.h |  2 +-
>  drivers/nvme/host/pci.c  | 58 +++++++++++++++++++++++++-------------
> ----------
>  drivers/nvme/host/scsi.c |  6 ++---
>  include/linux/nvme.h     | 27 +++++++++++-----------
>  4 files changed, 47 insertions(+), 46 deletions(-)
> 
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index 706f678..370aa5b 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -46,7 +46,7 @@ struct nvme_dev {
>  	u32 db_stride;
>  	u32 ctrl_config;
>  	struct msix_entry *entry;
> -	struct nvme_bar __iomem *bar;
> +	void __iomem *bar;
>  	struct list_head namespaces;
>  	struct kref kref;
>  	struct device *device;
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index cd731f5..6b0dcb6 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -1307,7 +1307,7 @@ static void nvme_disable_queue(struct nvme_dev 
> *dev, int qid)
>  
>  	/* Don't tell the adapter to delete the admin queue.
>  	 * Don't tell a removed adapter to delete IO queues. */
> -	if (qid && readl(&dev->bar->csts) != -1) {
> +	if (qid && readl(dev->bar + NVME_REG_CSTS) != -1) {
>  		adapter_delete_sq(dev, qid);
>  		adapter_delete_cq(dev, qid);
>  	}
> @@ -1460,7 +1460,7 @@ static int nvme_wait_ready(struct nvme_dev 
> *dev, u64 cap, bool enabled)
>  
>  	timeout = ((NVME_CAP_TIMEOUT(cap) + 1) * HZ / 2) + jiffies;
>  
> -	while ((readl(&dev->bar->csts) & NVME_CSTS_RDY) != bit) {
> +	while ((readl(dev->bar + NVME_REG_CSTS) & NVME_CSTS_RDY) != 
> bit) {
>  		msleep(100);
>  		if (fatal_signal_pending(current))
>  			return -EINTR;
> @@ -1485,7 +1485,7 @@ static int nvme_disable_ctrl(struct nvme_dev 
> *dev, u64 cap)
>  {
>  	dev->ctrl_config &= ~NVME_CC_SHN_MASK;
>  	dev->ctrl_config &= ~NVME_CC_ENABLE;
> -	writel(dev->ctrl_config, &dev->bar->cc);
> +	writel(dev->ctrl_config, dev->bar + NVME_REG_CC);
>  
>  	return nvme_wait_ready(dev, cap, false);
>  }
> @@ -1494,7 +1494,7 @@ static int nvme_enable_ctrl(struct nvme_dev 
> *dev, u64 cap)
>  {
>  	dev->ctrl_config &= ~NVME_CC_SHN_MASK;
>  	dev->ctrl_config |= NVME_CC_ENABLE;
> -	writel(dev->ctrl_config, &dev->bar->cc);
> +	writel(dev->ctrl_config, dev->bar + NVME_REG_CC);
>  
>  	return nvme_wait_ready(dev, cap, true);
>  }
> @@ -1506,10 +1506,10 @@ static int nvme_shutdown_ctrl(struct nvme_dev 
> *dev)
>  	dev->ctrl_config &= ~NVME_CC_SHN_MASK;
>  	dev->ctrl_config |= NVME_CC_SHN_NORMAL;
>  
> -	writel(dev->ctrl_config, &dev->bar->cc);
> +	writel(dev->ctrl_config, dev->bar + NVME_REG_CC);
>  
>  	timeout = SHUTDOWN_TIMEOUT + jiffies;
> -	while ((readl(&dev->bar->csts) & NVME_CSTS_SHST_MASK) !=
> +	while ((readl(dev->bar + NVME_REG_CSTS) & 
> NVME_CSTS_SHST_MASK) !=
>  							NVME_CSTS_SH
> ST_CMPLT) {
>  		msleep(100);
>  		if (fatal_signal_pending(current))
> @@ -1584,7 +1584,7 @@ static int nvme_configure_admin_queue(struct 
> nvme_dev *dev)
>  {
>  	int result;
>  	u32 aqa;
> -	u64 cap = readq(&dev->bar->cap);
> +	u64 cap = readq(dev->bar + NVME_REG_CAP);
>  	struct nvme_queue *nvmeq;
>  	unsigned page_shift = PAGE_SHIFT;
>  	unsigned dev_page_min = NVME_CAP_MPSMIN(cap) + 12;
> @@ -1605,11 +1605,12 @@ static int nvme_configure_admin_queue(struct 
> nvme_dev *dev)
>  		page_shift = dev_page_max;
>  	}
>  
> -	dev->subsystem = readl(&dev->bar->vs) >= NVME_VS(1, 1) ?
> +	dev->subsystem = readl(dev->bar + NVME_REG_VS) >= NVME_VS(1, 
> 1) ?
>  						NVME_CAP_NSSRC(cap) 
> : 0;
>  
> -	if (dev->subsystem && (readl(&dev->bar->csts) & 
> NVME_CSTS_NSSRO))
> -		writel(NVME_CSTS_NSSRO, &dev->bar->csts);
> +	if (dev->subsystem &&
> +	    (readl(dev->bar + NVME_REG_CSTS) & NVME_CSTS_NSSRO))
> +		writel(NVME_CSTS_NSSRO, dev->bar + NVME_REG_CSTS);
>  
>  	result = nvme_disable_ctrl(dev, cap);
>  	if (result < 0)
> @@ -1632,9 +1633,9 @@ static int nvme_configure_admin_queue(struct 
> nvme_dev *dev)
>  	dev->ctrl_config |= NVME_CC_ARB_RR | NVME_CC_SHN_NONE;
>  	dev->ctrl_config |= NVME_CC_IOSQES | NVME_CC_IOCQES;
>  
> -	writel(aqa, &dev->bar->aqa);
> -	writeq(nvmeq->sq_dma_addr, &dev->bar->asq);
> -	writeq(nvmeq->cq_dma_addr, &dev->bar->acq);
> +	writel(aqa, dev->bar + NVME_REG_AQA);
> +	writeq(nvmeq->sq_dma_addr, dev->bar + NVME_REG_ASQ);
> +	writeq(nvmeq->cq_dma_addr, dev->bar + NVME_REG_ACQ);
>  
>  	result = nvme_enable_ctrl(dev, cap);
>  	if (result)
> @@ -1776,7 +1777,7 @@ static int nvme_subsys_reset(struct nvme_dev 
> *dev)
>  	if (!dev->subsystem)
>  		return -ENOTTY;
>  
> -	writel(0x4E564D65, &dev->bar->nssr); /* "NVMe" */
> +	writel(0x4E564D65, dev->bar + NVME_REG_NSSR); /* "NVMe" */

It would be nice if this value was a macro in a .h file as this is not
necessarily specific to PCIe.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 06/18] nvme: split a new struct nvme_ctrl out of struct nvme_dev
  2015-10-16  5:58 nvme driver split V2 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2015-10-16  5:58 ` [PATCH 05/18] nvme: use offset instead of a struct for registers Christoph Hellwig
@ 2015-10-16  5:58 ` Christoph Hellwig
  2015-10-20 10:19   ` Sagi Grimberg
  2015-10-21 21:23   ` J Freyensee
  2015-10-16  5:58 ` [PATCH 07/18] nvme: simplify nvme_setup_prps calling convention Christoph Hellwig
                   ` (11 subsequent siblings)
  17 siblings, 2 replies; 59+ messages in thread
From: Christoph Hellwig @ 2015-10-16  5:58 UTC (permalink / raw)


The new struct nvme_ctrl will be used by the common NVMe code that sits
on top of struct request_queue and the new nvme_ctrl_ops abstraction.
It only contains the bare minimum required, which consists of values
sampled during controller probe, the admin queue pointer and a second
struct device pointer at the moment, but more will follow later.  Only
values that are not used in the I/O fast path should be moved to
struct nvme_ctrl so that drivers can optimize their cache line usage
easily.  That's also the reason why we have two device pointers as
the struct device is used for DMA mapping purposes.

Signed-off-by: Christoph Hellwig <hch at lst.de>
Acked-by: Keith Busch <keith.busch at intel.com>
---
 drivers/nvme/host/core.c |  10 +--
 drivers/nvme/host/nvme.h |  61 ++++++---------
 drivers/nvme/host/pci.c  | 190 +++++++++++++++++++++++++++++++----------------
 drivers/nvme/host/scsi.c |  85 ++++++++++-----------
 4 files changed, 190 insertions(+), 156 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index dfb528d..e2e8818 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -78,7 +78,7 @@ int nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
 	return __nvme_submit_sync_cmd(q, cmd, buffer, NULL, bufflen, NULL, 0);
 }
 
-int nvme_identify_ctrl(struct nvme_dev *dev, struct nvme_id_ctrl **id)
+int nvme_identify_ctrl(struct nvme_ctrl *dev, struct nvme_id_ctrl **id)
 {
 	struct nvme_command c = { };
 	int error;
@@ -98,7 +98,7 @@ int nvme_identify_ctrl(struct nvme_dev *dev, struct nvme_id_ctrl **id)
 	return error;
 }
 
-int nvme_identify_ns(struct nvme_dev *dev, unsigned nsid,
+int nvme_identify_ns(struct nvme_ctrl *dev, unsigned nsid,
 		struct nvme_id_ns **id)
 {
 	struct nvme_command c = { };
@@ -119,7 +119,7 @@ int nvme_identify_ns(struct nvme_dev *dev, unsigned nsid,
 	return error;
 }
 
-int nvme_get_features(struct nvme_dev *dev, unsigned fid, unsigned nsid,
+int nvme_get_features(struct nvme_ctrl *dev, unsigned fid, unsigned nsid,
 					dma_addr_t dma_addr, u32 *result)
 {
 	struct nvme_command c;
@@ -134,7 +134,7 @@ int nvme_get_features(struct nvme_dev *dev, unsigned fid, unsigned nsid,
 			result, 0);
 }
 
-int nvme_set_features(struct nvme_dev *dev, unsigned fid, unsigned dword11,
+int nvme_set_features(struct nvme_ctrl *dev, unsigned fid, unsigned dword11,
 					dma_addr_t dma_addr, u32 *result)
 {
 	struct nvme_command c;
@@ -149,7 +149,7 @@ int nvme_set_features(struct nvme_dev *dev, unsigned fid, unsigned dword11,
 			result, 0);
 }
 
-int nvme_get_log_page(struct nvme_dev *dev, struct nvme_smart_log **log)
+int nvme_get_log_page(struct nvme_ctrl *dev, struct nvme_smart_log **log)
 {
 	struct nvme_command c = { };
 	int error;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 370aa5b..3e409fa 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -25,46 +25,16 @@ extern unsigned char nvme_io_timeout;
 extern unsigned char admin_timeout;
 #define ADMIN_TIMEOUT	(admin_timeout * HZ)
 
-/*
- * Represents an NVM Express device.  Each nvme_dev is a PCI function.
- */
-struct nvme_dev {
-	struct list_head node;
-	struct nvme_queue **queues;
+struct nvme_ctrl {
+	const struct nvme_ctrl_ops *ops;
 	struct request_queue *admin_q;
-	struct blk_mq_tag_set tagset;
-	struct blk_mq_tag_set admin_tagset;
-	u32 __iomem *dbs;
 	struct device *dev;
-	struct dma_pool *prp_page_pool;
-	struct dma_pool *prp_small_pool;
 	int instance;
-	unsigned queue_count;
-	unsigned online_queues;
-	unsigned max_qid;
-	int q_depth;
-	u32 db_stride;
-	u32 ctrl_config;
-	struct msix_entry *entry;
-	void __iomem *bar;
-	struct list_head namespaces;
-	struct kref kref;
-	struct device *device;
-	struct work_struct reset_work;
-	struct work_struct probe_work;
-	struct work_struct scan_work;
+
 	char name[12];
 	char serial[20];
 	char model[40];
 	char firmware_rev[8];
-	bool subsystem;
-	u32 max_hw_sectors;
-	u32 stripe_size;
-	u32 page_size;
-	void __iomem *cmb;
-	dma_addr_t cmb_dma_addr;
-	u64 cmb_size;
-	u32 cmbsz;
 	u16 oncs;
 	u16 abort_limit;
 	u8 event_limit;
@@ -78,7 +48,7 @@ struct nvme_dev {
 struct nvme_ns {
 	struct list_head list;
 
-	struct nvme_dev *dev;
+	struct nvme_ctrl *ctrl;
 	struct request_queue *queue;
 	struct gendisk *disk;
 	struct kref kref;
@@ -92,6 +62,19 @@ struct nvme_ns {
 	u32 mode_select_block_len;
 };
 
+struct nvme_ctrl_ops {
+	int (*reg_read32)(struct nvme_ctrl *ctrl, u32 off, u32 *val);
+};
+
+static inline bool nvme_ctrl_ready(struct nvme_ctrl *ctrl)
+{
+	u32 val = 0;
+
+	if (ctrl->ops->reg_read32(ctrl, NVME_REG_CSTS, &val))
+		return false;
+	return val & NVME_CSTS_RDY;
+}
+
 static inline u64 nvme_block_nr(struct nvme_ns *ns, sector_t sector)
 {
 	return (sector >> (ns->lba_shift - 9));
@@ -102,13 +85,13 @@ int nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
 int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
 		void *buffer, void __user *ubuffer, unsigned bufflen,
 		u32 *result, unsigned timeout);
-int nvme_identify_ctrl(struct nvme_dev *dev, struct nvme_id_ctrl **id);
-int nvme_identify_ns(struct nvme_dev *dev, unsigned nsid,
+int nvme_identify_ctrl(struct nvme_ctrl *dev, struct nvme_id_ctrl **id);
+int nvme_identify_ns(struct nvme_ctrl *dev, unsigned nsid,
 		struct nvme_id_ns **id);
-int nvme_get_log_page(struct nvme_dev *dev, struct nvme_smart_log **log);
-int nvme_get_features(struct nvme_dev *dev, unsigned fid, unsigned nsid,
+int nvme_get_log_page(struct nvme_ctrl *dev, struct nvme_smart_log **log);
+int nvme_get_features(struct nvme_ctrl *dev, unsigned fid, unsigned nsid,
 			dma_addr_t dma_addr, u32 *result);
-int nvme_set_features(struct nvme_dev *dev, unsigned fid, unsigned dword11,
+int nvme_set_features(struct nvme_ctrl *dev, unsigned fid, unsigned dword11,
 			dma_addr_t dma_addr, u32 *result);
 
 struct sg_io_hdr;
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 6b0dcb6..8b0ba11 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -85,6 +85,9 @@ static wait_queue_head_t nvme_kthread_wait;
 
 static struct class *nvme_class;
 
+struct nvme_dev;
+struct nvme_queue;
+
 static int __nvme_reset(struct nvme_dev *dev);
 static int nvme_reset(struct nvme_dev *dev);
 static int nvme_process_cq(struct nvme_queue *nvmeq);
@@ -100,6 +103,49 @@ struct async_cmd_info {
 };
 
 /*
+ * Represents an NVM Express device.  Each nvme_dev is a PCI function.
+ */
+struct nvme_dev {
+	struct list_head node;
+	struct nvme_queue **queues;
+	struct blk_mq_tag_set tagset;
+	struct blk_mq_tag_set admin_tagset;
+	u32 __iomem *dbs;
+	struct device *dev;
+	struct dma_pool *prp_page_pool;
+	struct dma_pool *prp_small_pool;
+	unsigned queue_count;
+	unsigned online_queues;
+	unsigned max_qid;
+	int q_depth;
+	u32 db_stride;
+	u32 ctrl_config;
+	struct msix_entry *entry;
+	void __iomem *bar;
+	struct list_head namespaces;
+	struct kref kref;
+	struct device *device;
+	struct work_struct reset_work;
+	struct work_struct probe_work;
+	struct work_struct scan_work;
+	bool subsystem;
+	u32 max_hw_sectors;
+	u32 stripe_size;
+	u32 page_size;
+	void __iomem *cmb;
+	dma_addr_t cmb_dma_addr;
+	u64 cmb_size;
+	u32 cmbsz;
+
+	struct nvme_ctrl ctrl;
+};
+
+static inline struct nvme_dev *to_nvme_dev(struct nvme_ctrl *ctrl)
+{
+	return container_of(ctrl, struct nvme_dev, ctrl);
+}
+
+/*
  * An NVM Express queue.  Each device has at least two (one for admin
  * commands and one for I/O commands).
  */
@@ -331,7 +377,7 @@ static void async_req_completion(struct nvme_queue *nvmeq, void *ctx,
 	u16 status = le16_to_cpup(&cqe->status) >> 1;
 
 	if (status == NVME_SC_SUCCESS || status == NVME_SC_ABORT_REQ)
-		++nvmeq->dev->event_limit;
+		++nvmeq->dev->ctrl.event_limit;
 	if (status != NVME_SC_SUCCESS)
 		return;
 
@@ -355,7 +401,7 @@ static void abort_completion(struct nvme_queue *nvmeq, void *ctx,
 	blk_mq_free_request(req);
 
 	dev_warn(nvmeq->q_dmadev, "Abort status:%x result:%x", status, result);
-	++nvmeq->dev->abort_limit;
+	++nvmeq->dev->ctrl.abort_limit;
 }
 
 static void async_completion(struct nvme_queue *nvmeq, void *ctx,
@@ -1037,7 +1083,7 @@ static int nvme_submit_async_admin_req(struct nvme_dev *dev)
 	struct nvme_cmd_info *cmd_info;
 	struct request *req;
 
-	req = blk_mq_alloc_request(dev->admin_q, WRITE, GFP_ATOMIC, true);
+	req = blk_mq_alloc_request(dev->ctrl.admin_q, WRITE, GFP_ATOMIC, true);
 	if (IS_ERR(req))
 		return PTR_ERR(req);
 
@@ -1062,7 +1108,7 @@ static int nvme_submit_admin_async_cmd(struct nvme_dev *dev,
 	struct request *req;
 	struct nvme_cmd_info *cmd_rq;
 
-	req = blk_mq_alloc_request(dev->admin_q, WRITE, GFP_KERNEL, false);
+	req = blk_mq_alloc_request(dev->ctrl.admin_q, WRITE, GFP_KERNEL, false);
 	if (IS_ERR(req))
 		return PTR_ERR(req);
 
@@ -1086,7 +1132,7 @@ static int adapter_delete_queue(struct nvme_dev *dev, u8 opcode, u16 id)
 	c.delete_queue.opcode = opcode;
 	c.delete_queue.qid = cpu_to_le16(id);
 
-	return nvme_submit_sync_cmd(dev->admin_q, &c, NULL, 0);
+	return nvme_submit_sync_cmd(dev->ctrl.admin_q, &c, NULL, 0);
 }
 
 static int adapter_alloc_cq(struct nvme_dev *dev, u16 qid,
@@ -1107,7 +1153,7 @@ static int adapter_alloc_cq(struct nvme_dev *dev, u16 qid,
 	c.create_cq.cq_flags = cpu_to_le16(flags);
 	c.create_cq.irq_vector = cpu_to_le16(nvmeq->cq_vector);
 
-	return nvme_submit_sync_cmd(dev->admin_q, &c, NULL, 0);
+	return nvme_submit_sync_cmd(dev->ctrl.admin_q, &c, NULL, 0);
 }
 
 static int adapter_alloc_sq(struct nvme_dev *dev, u16 qid,
@@ -1128,7 +1174,7 @@ static int adapter_alloc_sq(struct nvme_dev *dev, u16 qid,
 	c.create_sq.sq_flags = cpu_to_le16(flags);
 	c.create_sq.cqid = cpu_to_le16(qid);
 
-	return nvme_submit_sync_cmd(dev->admin_q, &c, NULL, 0);
+	return nvme_submit_sync_cmd(dev->ctrl.admin_q, &c, NULL, 0);
 }
 
 static int adapter_delete_cq(struct nvme_dev *dev, u16 cqid)
@@ -1167,10 +1213,10 @@ static void nvme_abort_req(struct request *req)
 		return;
 	}
 
-	if (!dev->abort_limit)
+	if (!dev->ctrl.abort_limit)
 		return;
 
-	abort_req = blk_mq_alloc_request(dev->admin_q, WRITE, GFP_ATOMIC,
+	abort_req = blk_mq_alloc_request(dev->ctrl.admin_q, WRITE, GFP_ATOMIC,
 									false);
 	if (IS_ERR(abort_req))
 		return;
@@ -1184,7 +1230,7 @@ static void nvme_abort_req(struct request *req)
 	cmd.abort.sqid = cpu_to_le16(nvmeq->qid);
 	cmd.abort.command_id = abort_req->tag;
 
-	--dev->abort_limit;
+	--dev->ctrl.abort_limit;
 	cmd_rq->aborted = 1;
 
 	dev_warn(nvmeq->q_dmadev, "Aborting I/O %d QID %d\n", req->tag,
@@ -1279,8 +1325,8 @@ static int nvme_suspend_queue(struct nvme_queue *nvmeq)
 	nvmeq->cq_vector = -1;
 	spin_unlock_irq(&nvmeq->q_lock);
 
-	if (!nvmeq->qid && nvmeq->dev->admin_q)
-		blk_mq_freeze_queue_start(nvmeq->dev->admin_q);
+	if (!nvmeq->qid && nvmeq->dev->ctrl.admin_q)
+		blk_mq_freeze_queue_start(nvmeq->dev->ctrl.admin_q);
 
 	irq_set_affinity_hint(vector, NULL);
 	free_irq(vector, nvmeq);
@@ -1376,7 +1422,7 @@ static struct nvme_queue *nvme_alloc_queue(struct nvme_dev *dev, int qid,
 	nvmeq->q_dmadev = dev->dev;
 	nvmeq->dev = dev;
 	snprintf(nvmeq->irqname, sizeof(nvmeq->irqname), "nvme%dq%d",
-			dev->instance, qid);
+			dev->ctrl.instance, qid);
 	spin_lock_init(&nvmeq->q_lock);
 	nvmeq->cq_head = 0;
 	nvmeq->cq_phase = 1;
@@ -1543,15 +1589,15 @@ static struct blk_mq_ops nvme_mq_ops = {
 
 static void nvme_dev_remove_admin(struct nvme_dev *dev)
 {
-	if (dev->admin_q && !blk_queue_dying(dev->admin_q)) {
-		blk_cleanup_queue(dev->admin_q);
+	if (dev->ctrl.admin_q && !blk_queue_dying(dev->ctrl.admin_q)) {
+		blk_cleanup_queue(dev->ctrl.admin_q);
 		blk_mq_free_tag_set(&dev->admin_tagset);
 	}
 }
 
 static int nvme_alloc_admin_tags(struct nvme_dev *dev)
 {
-	if (!dev->admin_q) {
+	if (!dev->ctrl.admin_q) {
 		dev->admin_tagset.ops = &nvme_mq_admin_ops;
 		dev->admin_tagset.nr_hw_queues = 1;
 		dev->admin_tagset.queue_depth = NVME_AQ_DEPTH - 1;
@@ -1564,18 +1610,18 @@ static int nvme_alloc_admin_tags(struct nvme_dev *dev)
 		if (blk_mq_alloc_tag_set(&dev->admin_tagset))
 			return -ENOMEM;
 
-		dev->admin_q = blk_mq_init_queue(&dev->admin_tagset);
-		if (IS_ERR(dev->admin_q)) {
+		dev->ctrl.admin_q = blk_mq_init_queue(&dev->admin_tagset);
+		if (IS_ERR(dev->ctrl.admin_q)) {
 			blk_mq_free_tag_set(&dev->admin_tagset);
 			return -ENOMEM;
 		}
-		if (!blk_get_queue(dev->admin_q)) {
+		if (!blk_get_queue(dev->ctrl.admin_q)) {
 			nvme_dev_remove_admin(dev);
-			dev->admin_q = NULL;
+			dev->ctrl.admin_q = NULL;
 			return -ENODEV;
 		}
 	} else
-		blk_mq_unfreeze_queue(dev->admin_q);
+		blk_mq_unfreeze_queue(dev->ctrl.admin_q);
 
 	return 0;
 }
@@ -1657,7 +1703,7 @@ static int nvme_configure_admin_queue(struct nvme_dev *dev)
 
 static int nvme_submit_io(struct nvme_ns *ns, struct nvme_user_io __user *uio)
 {
-	struct nvme_dev *dev = ns->dev;
+	struct nvme_dev *dev = to_nvme_dev(ns->ctrl);
 	struct nvme_user_io io;
 	struct nvme_command c;
 	unsigned length, meta_len;
@@ -1732,7 +1778,7 @@ static int nvme_submit_io(struct nvme_ns *ns, struct nvme_user_io __user *uio)
 	return status;
 }
 
-static int nvme_user_cmd(struct nvme_dev *dev, struct nvme_ns *ns,
+static int nvme_user_cmd(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 			struct nvme_passthru_cmd __user *ucmd)
 {
 	struct nvme_passthru_cmd cmd;
@@ -1761,7 +1807,7 @@ static int nvme_user_cmd(struct nvme_dev *dev, struct nvme_ns *ns,
 	if (cmd.timeout_ms)
 		timeout = msecs_to_jiffies(cmd.timeout_ms);
 
-	status = __nvme_submit_sync_cmd(ns ? ns->queue : dev->admin_q, &c,
+	status = __nvme_submit_sync_cmd(ns ? ns->queue : ctrl->admin_q, &c,
 			NULL, (void __user *)(uintptr_t)cmd.addr, cmd.data_len,
 			&cmd.result, timeout);
 	if (status >= 0) {
@@ -1791,9 +1837,9 @@ static int nvme_ioctl(struct block_device *bdev, fmode_t mode, unsigned int cmd,
 		force_successful_syscall_return();
 		return ns->ns_id;
 	case NVME_IOCTL_ADMIN_CMD:
-		return nvme_user_cmd(ns->dev, NULL, (void __user *)arg);
+		return nvme_user_cmd(ns->ctrl, NULL, (void __user *)arg);
 	case NVME_IOCTL_IO_CMD:
-		return nvme_user_cmd(ns->dev, ns, (void __user *)arg);
+		return nvme_user_cmd(ns->ctrl, ns, (void __user *)arg);
 	case NVME_IOCTL_SUBMIT_IO:
 		return nvme_submit_io(ns, (void __user *)arg);
 	case SG_GET_VERSION_NUM:
@@ -1823,12 +1869,13 @@ static void nvme_free_dev(struct kref *kref);
 static void nvme_free_ns(struct kref *kref)
 {
 	struct nvme_ns *ns = container_of(kref, struct nvme_ns, kref);
+	struct nvme_dev *dev = to_nvme_dev(ns->ctrl);
 
 	spin_lock(&dev_list_lock);
 	ns->disk->private_data = NULL;
 	spin_unlock(&dev_list_lock);
 
-	kref_put(&ns->dev->kref, nvme_free_dev);
+	kref_put(&dev->kref, nvme_free_dev);
 	put_disk(ns->disk);
 	kfree(ns);
 }
@@ -1877,15 +1924,15 @@ static void nvme_config_discard(struct nvme_ns *ns)
 static int nvme_revalidate_disk(struct gendisk *disk)
 {
 	struct nvme_ns *ns = disk->private_data;
-	struct nvme_dev *dev = ns->dev;
+	struct nvme_dev *dev = to_nvme_dev(ns->ctrl);
 	struct nvme_id_ns *id;
 	u8 lbaf, pi_type;
 	u16 old_ms;
 	unsigned short bs;
 
-	if (nvme_identify_ns(dev, ns->ns_id, &id)) {
+	if (nvme_identify_ns(&dev->ctrl, ns->ns_id, &id)) {
 		dev_warn(dev->dev, "%s: Identify failure nvme%dn%d\n", __func__,
-						dev->instance, ns->ns_id);
+						dev->ctrl.instance, ns->ns_id);
 		return -ENODEV;
 	}
 	if (id->ncap == 0) {
@@ -1929,7 +1976,7 @@ static int nvme_revalidate_disk(struct gendisk *disk)
 	else
 		set_capacity(disk, le64_to_cpup(&id->nsze) << (ns->lba_shift - 9));
 
-	if (dev->oncs & NVME_CTRL_ONCS_DSM)
+	if (dev->ctrl.oncs & NVME_CTRL_ONCS_DSM)
 		nvme_config_discard(ns);
 
 	kfree(id);
@@ -1973,10 +2020,10 @@ static int nvme_kthread(void *data)
 				spin_lock_irq(&nvmeq->q_lock);
 				nvme_process_cq(nvmeq);
 
-				while ((i == 0) && (dev->event_limit > 0)) {
+				while (i == 0 && dev->ctrl.event_limit > 0) {
 					if (nvme_submit_async_admin_req(dev))
 						break;
-					dev->event_limit--;
+					dev->ctrl.event_limit--;
 				}
 				spin_unlock_irq(&nvmeq->q_lock);
 			}
@@ -2002,7 +2049,7 @@ static void nvme_alloc_ns(struct nvme_dev *dev, unsigned nsid)
 		goto out_free_ns;
 	queue_flag_set_unlocked(QUEUE_FLAG_NOMERGES, ns->queue);
 	queue_flag_set_unlocked(QUEUE_FLAG_NONROT, ns->queue);
-	ns->dev = dev;
+	ns->ctrl = &dev->ctrl;
 	ns->queue->queuedata = ns;
 
 	disk = alloc_disk_node(0, node);
@@ -2023,7 +2070,7 @@ static void nvme_alloc_ns(struct nvme_dev *dev, unsigned nsid)
 	}
 	if (dev->stripe_size)
 		blk_queue_chunk_sectors(ns->queue, dev->stripe_size >> 9);
-	if (dev->vwc & NVME_CTRL_VWC_PRESENT)
+	if (dev->ctrl.vwc & NVME_CTRL_VWC_PRESENT)
 		blk_queue_flush(ns->queue, REQ_FLUSH | REQ_FUA);
 	blk_queue_virt_boundary(ns->queue, dev->page_size - 1);
 
@@ -2034,7 +2081,7 @@ static void nvme_alloc_ns(struct nvme_dev *dev, unsigned nsid)
 	disk->queue = ns->queue;
 	disk->driverfs_dev = dev->device;
 	disk->flags = GENHD_FL_EXT_DEVT;
-	sprintf(disk->disk_name, "nvme%dn%d", dev->instance, nsid);
+	sprintf(disk->disk_name, "nvme%dn%d", dev->ctrl.instance, nsid);
 
 	/*
 	 * Initialize capacity to 0 until we establish the namespace format and
@@ -2097,7 +2144,7 @@ static int set_queue_count(struct nvme_dev *dev, int count)
 	u32 result;
 	u32 q_count = (count - 1) | ((count - 1) << 16);
 
-	status = nvme_set_features(dev, NVME_FEAT_NUM_QUEUES, q_count, 0,
+	status = nvme_set_features(&dev->ctrl, NVME_FEAT_NUM_QUEUES, q_count, 0,
 								&result);
 	if (status < 0)
 		return status;
@@ -2281,7 +2328,8 @@ static inline bool nvme_io_incapable(struct nvme_dev *dev)
 
 static void nvme_ns_remove(struct nvme_ns *ns)
 {
-	bool kill = nvme_io_incapable(ns->dev) && !blk_queue_dying(ns->queue);
+	bool kill = nvme_io_incapable(to_nvme_dev(ns->ctrl)) &&
+			!blk_queue_dying(ns->queue);
 
 	if (kill)
 		blk_set_queue_dying(ns->queue);
@@ -2341,7 +2389,7 @@ static void nvme_dev_scan(struct work_struct *work)
 
 	if (!dev->tagset.tags)
 		return;
-	if (nvme_identify_ctrl(dev, &ctrl))
+	if (nvme_identify_ctrl(&dev->ctrl, &ctrl))
 		return;
 	nvme_scan_namespaces(dev, le32_to_cpup(&ctrl->nn));
 	kfree(ctrl);
@@ -2361,18 +2409,18 @@ static int nvme_dev_add(struct nvme_dev *dev)
 	struct nvme_id_ctrl *ctrl;
 	int shift = NVME_CAP_MPSMIN(readq(dev->bar + NVME_REG_CAP)) + 12;
 
-	res = nvme_identify_ctrl(dev, &ctrl);
+	res = nvme_identify_ctrl(&dev->ctrl, &ctrl);
 	if (res) {
 		dev_err(dev->dev, "Identify Controller failed (%d)\n", res);
 		return -EIO;
 	}
 
-	dev->oncs = le16_to_cpup(&ctrl->oncs);
-	dev->abort_limit = ctrl->acl + 1;
-	dev->vwc = ctrl->vwc;
-	memcpy(dev->serial, ctrl->sn, sizeof(ctrl->sn));
-	memcpy(dev->model, ctrl->mn, sizeof(ctrl->mn));
-	memcpy(dev->firmware_rev, ctrl->fr, sizeof(ctrl->fr));
+	dev->ctrl.oncs = le16_to_cpup(&ctrl->oncs);
+	dev->ctrl.abort_limit = ctrl->acl + 1;
+	dev->ctrl.vwc = ctrl->vwc;
+	memcpy(dev->ctrl.serial, ctrl->sn, sizeof(ctrl->sn));
+	memcpy(dev->ctrl.model, ctrl->mn, sizeof(ctrl->mn));
+	memcpy(dev->ctrl.firmware_rev, ctrl->fr, sizeof(ctrl->fr));
 	if (ctrl->mdts)
 		dev->max_hw_sectors = 1 << (ctrl->mdts + shift - 9);
 	if ((pdev->vendor == PCI_VENDOR_ID_INTEL) &&
@@ -2599,7 +2647,7 @@ static void nvme_disable_io_queues(struct nvme_dev *dev)
 	DEFINE_KTHREAD_WORKER_ONSTACK(worker);
 	struct nvme_delq_ctx dq;
 	struct task_struct *kworker_task = kthread_run(kthread_worker_fn,
-					&worker, "nvme%d", dev->instance);
+					&worker, "nvme%d", dev->ctrl.instance);
 
 	if (IS_ERR(kworker_task)) {
 		dev_err(dev->dev,
@@ -2750,14 +2798,14 @@ static int nvme_set_instance(struct nvme_dev *dev)
 	if (error)
 		return -ENODEV;
 
-	dev->instance = instance;
+	dev->ctrl.instance = instance;
 	return 0;
 }
 
 static void nvme_release_instance(struct nvme_dev *dev)
 {
 	spin_lock(&dev_list_lock);
-	ida_remove(&nvme_instance_ida, dev->instance);
+	ida_remove(&nvme_instance_ida, dev->ctrl.instance);
 	spin_unlock(&dev_list_lock);
 }
 
@@ -2770,8 +2818,8 @@ static void nvme_free_dev(struct kref *kref)
 	nvme_release_instance(dev);
 	if (dev->tagset.tags)
 		blk_mq_free_tag_set(&dev->tagset);
-	if (dev->admin_q)
-		blk_put_queue(dev->admin_q);
+	if (dev->ctrl.admin_q)
+		blk_put_queue(dev->ctrl.admin_q);
 	kfree(dev->queues);
 	kfree(dev->entry);
 	kfree(dev);
@@ -2785,8 +2833,8 @@ static int nvme_dev_open(struct inode *inode, struct file *f)
 
 	spin_lock(&dev_list_lock);
 	list_for_each_entry(dev, &dev_list, node) {
-		if (dev->instance == instance) {
-			if (!dev->admin_q) {
+		if (dev->ctrl.instance == instance) {
+			if (!dev->ctrl.admin_q) {
 				ret = -EWOULDBLOCK;
 				break;
 			}
@@ -2816,12 +2864,12 @@ static long nvme_dev_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
 
 	switch (cmd) {
 	case NVME_IOCTL_ADMIN_CMD:
-		return nvme_user_cmd(dev, NULL, (void __user *)arg);
+		return nvme_user_cmd(&dev->ctrl, NULL, (void __user *)arg);
 	case NVME_IOCTL_IO_CMD:
 		if (list_empty(&dev->namespaces))
 			return -ENOTTY;
 		ns = list_first_entry(&dev->namespaces, struct nvme_ns, list);
-		return nvme_user_cmd(dev, ns, (void __user *)arg);
+		return nvme_user_cmd(&dev->ctrl, ns, (void __user *)arg);
 	case NVME_IOCTL_RESET:
 		dev_warn(dev->dev, "resetting controller\n");
 		return nvme_reset(dev);
@@ -2882,7 +2930,7 @@ static void nvme_probe_work(struct work_struct *work)
 	if (result)
 		goto free_tags;
 
-	dev->event_limit = 1;
+	dev->ctrl.event_limit = 1;
 
 	/*
 	 * Keep the controller around but remove all namespaces if we don't have
@@ -2900,8 +2948,8 @@ static void nvme_probe_work(struct work_struct *work)
 
  free_tags:
 	nvme_dev_remove_admin(dev);
-	blk_put_queue(dev->admin_q);
-	dev->admin_q = NULL;
+	blk_put_queue(dev->ctrl.admin_q);
+	dev->ctrl.admin_q = NULL;
 	dev->queues[0]->tags = NULL;
  disable:
 	nvme_disable_queue(dev, 0);
@@ -2929,7 +2977,7 @@ static void nvme_dead_ctrl(struct nvme_dev *dev)
 	dev_warn(dev->dev, "Device failed to resume\n");
 	kref_get(&dev->kref);
 	if (IS_ERR(kthread_run(nvme_remove_dead_ctrl, dev, "nvme%d",
-						dev->instance))) {
+						dev->ctrl.instance))) {
 		dev_err(dev->dev,
 			"Failed to start controller remove task\n");
 		kref_put(&dev->kref, nvme_free_dev);
@@ -2971,7 +3019,7 @@ static int nvme_reset(struct nvme_dev *dev)
 {
 	int ret;
 
-	if (!dev->admin_q || blk_queue_dying(dev->admin_q))
+	if (!dev->ctrl.admin_q || blk_queue_dying(dev->ctrl.admin_q))
 		return -ENODEV;
 
 	spin_lock(&dev_list_lock);
@@ -3002,6 +3050,16 @@ static ssize_t nvme_sysfs_reset(struct device *dev,
 }
 static DEVICE_ATTR(reset_controller, S_IWUSR, NULL, nvme_sysfs_reset);
 
+static int nvme_pci_reg_read32(struct nvme_ctrl *ctrl, u32 off, u32 *val)
+{
+	*val = readl(to_nvme_dev(ctrl)->bar + off);
+	return 0;
+}
+
+static const struct nvme_ctrl_ops nvme_pci_ctrl_ops = {
+	.reg_read32		= nvme_pci_reg_read32,
+};
+
 static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 {
 	int node, result = -ENOMEM;
@@ -3028,7 +3086,9 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	dev->dev = get_device(&pdev->dev);
 	pci_set_drvdata(pdev, dev);
 
-	dev->vendor = pdev->vendor;
+	dev->ctrl.vendor = pdev->vendor;
+	dev->ctrl.ops = &nvme_pci_ctrl_ops;
+	dev->ctrl.dev = dev->dev;
 
 	result = nvme_set_instance(dev);
 	if (result)
@@ -3040,8 +3100,8 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 
 	kref_init(&dev->kref);
 	dev->device = device_create(nvme_class, &pdev->dev,
-				MKDEV(nvme_char_major, dev->instance),
-				dev, "nvme%d", dev->instance);
+				MKDEV(nvme_char_major, dev->ctrl.instance),
+				dev, "nvme%d", dev->ctrl.instance);
 	if (IS_ERR(dev->device)) {
 		result = PTR_ERR(dev->device);
 		goto release_pools;
@@ -3060,7 +3120,7 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	return 0;
 
  put_dev:
-	device_destroy(nvme_class, MKDEV(nvme_char_major, dev->instance));
+	device_destroy(nvme_class, MKDEV(nvme_char_major, dev->ctrl.instance));
 	put_device(dev->device);
  release_pools:
 	nvme_release_prp_pools(dev);
@@ -3107,7 +3167,7 @@ static void nvme_remove(struct pci_dev *pdev)
 	nvme_dev_remove(dev);
 	nvme_dev_shutdown(dev);
 	nvme_dev_remove_admin(dev);
-	device_destroy(nvme_class, MKDEV(nvme_char_major, dev->instance));
+	device_destroy(nvme_class, MKDEV(nvme_char_major, dev->ctrl.instance));
 	nvme_free_queues(dev, 0);
 	nvme_release_cmb(dev);
 	nvme_release_prp_pools(dev);
diff --git a/drivers/nvme/host/scsi.c b/drivers/nvme/host/scsi.c
index a5f6af1..00d0bdd 100644
--- a/drivers/nvme/host/scsi.c
+++ b/drivers/nvme/host/scsi.c
@@ -524,7 +524,7 @@ static int nvme_trans_standard_inquiry_page(struct nvme_ns *ns,
 					struct sg_io_hdr *hdr, u8 *inq_response,
 					int alloc_len)
 {
-	struct nvme_dev *dev = ns->dev;
+	struct nvme_ctrl *ctrl = ns->ctrl;
 	struct nvme_id_ns *id_ns;
 	int res;
 	int nvme_sc;
@@ -532,10 +532,10 @@ static int nvme_trans_standard_inquiry_page(struct nvme_ns *ns,
 	u8 resp_data_format = 0x02;
 	u8 protect;
 	u8 cmdque = 0x01 << 1;
-	u8 fw_offset = sizeof(dev->firmware_rev);
+	u8 fw_offset = sizeof(ctrl->firmware_rev);
 
 	/* nvme ns identify - use DPS value for PROTECT field */
-	nvme_sc = nvme_identify_ns(dev, ns->ns_id, &id_ns);
+	nvme_sc = nvme_identify_ns(ctrl, ns->ns_id, &id_ns);
 	res = nvme_trans_status_code(hdr, nvme_sc);
 	if (res)
 		return res;
@@ -553,12 +553,12 @@ static int nvme_trans_standard_inquiry_page(struct nvme_ns *ns,
 	inq_response[5] = protect;	/* sccs=0 | acc=0 | tpgs=0 | pc3=0 */
 	inq_response[7] = cmdque;	/* wbus16=0 | sync=0 | vs=0 */
 	strncpy(&inq_response[8], "NVMe    ", 8);
-	strncpy(&inq_response[16], dev->model, 16);
+	strncpy(&inq_response[16], ctrl->model, 16);
 
-	while (dev->firmware_rev[fw_offset - 1] == ' ' && fw_offset > 4)
+	while (ctrl->firmware_rev[fw_offset - 1] == ' ' && fw_offset > 4)
 		fw_offset--;
 	fw_offset -= 4;
-	strncpy(&inq_response[32], dev->firmware_rev + fw_offset, 4);
+	strncpy(&inq_response[32], ctrl->firmware_rev + fw_offset, 4);
 
 	xfer_len = min(alloc_len, STANDARD_INQUIRY_LENGTH);
 	return nvme_trans_copy_to_user(hdr, inq_response, xfer_len);
@@ -588,13 +588,12 @@ static int nvme_trans_unit_serial_page(struct nvme_ns *ns,
 					struct sg_io_hdr *hdr, u8 *inq_response,
 					int alloc_len)
 {
-	struct nvme_dev *dev = ns->dev;
 	int xfer_len;
 
 	memset(inq_response, 0, STANDARD_INQUIRY_LENGTH);
 	inq_response[1] = INQ_UNIT_SERIAL_NUMBER_PAGE; /* Page Code */
 	inq_response[3] = INQ_SERIAL_NUMBER_LENGTH;    /* Page Length */
-	strncpy(&inq_response[4], dev->serial, INQ_SERIAL_NUMBER_LENGTH);
+	strncpy(&inq_response[4], ns->ctrl->serial, INQ_SERIAL_NUMBER_LENGTH);
 
 	xfer_len = min(alloc_len, STANDARD_INQUIRY_LENGTH);
 	return nvme_trans_copy_to_user(hdr, inq_response, xfer_len);
@@ -603,27 +602,32 @@ static int nvme_trans_unit_serial_page(struct nvme_ns *ns,
 static int nvme_trans_device_id_page(struct nvme_ns *ns, struct sg_io_hdr *hdr,
 					u8 *inq_response, int alloc_len)
 {
-	struct nvme_dev *dev = ns->dev;
+	struct nvme_ctrl *ctrl = ns->ctrl;
 	int res;
 	int nvme_sc;
 	int xfer_len;
+	u32 vs;
 	__be32 tmp_id = cpu_to_be32(ns->ns_id);
 
+	res = ctrl->ops->reg_read32(ctrl, NVME_REG_VS, &vs);
+	if (res)
+		return res;
+
 	memset(inq_response, 0, alloc_len);
 	inq_response[1] = INQ_DEVICE_IDENTIFICATION_PAGE;    /* Page Code */
-	if (readl(dev->bar + NVME_REG_VS) >= NVME_VS(1, 1)) {
+	if (vs >= NVME_VS(1, 1)) {
 		struct nvme_id_ns *id_ns;
 		void *eui;
 		int len;
 
-		nvme_sc = nvme_identify_ns(dev, ns->ns_id, &id_ns);
+		nvme_sc = nvme_identify_ns(ctrl, ns->ns_id, &id_ns);
 		res = nvme_trans_status_code(hdr, nvme_sc);
 		if (res)
 			return res;
 
 		eui = id_ns->eui64;
 		len = sizeof(id_ns->eui64);
-		if (readl(dev->bar + NVME_REG_VS) >= NVME_VS(1, 2)) {
+		if (vs >= NVME_VS(1, 2)) {
 			if (bitmap_empty(eui, len * 8)) {
 				eui = id_ns->nguid;
 				len = sizeof(id_ns->nguid);
@@ -657,10 +661,10 @@ static int nvme_trans_device_id_page(struct nvme_ns *ns, struct sg_io_hdr *hdr,
 		inq_response[6] = 0x00;    /* Rsvd */
 		inq_response[7] = 0x44;    /* Designator Length */
 
-		sprintf(&inq_response[8], "%04x", dev->vendor);
-		memcpy(&inq_response[12], dev->model, sizeof(dev->model));
+		sprintf(&inq_response[8], "%04x", ctrl->vendor);
+		memcpy(&inq_response[12], ctrl->model, sizeof(ctrl->model));
 		sprintf(&inq_response[52], "%04x", tmp_id);
-		memcpy(&inq_response[56], dev->serial, sizeof(dev->serial));
+		memcpy(&inq_response[56], ctrl->serial, sizeof(ctrl->serial));
 	}
 	xfer_len = alloc_len;
 	return nvme_trans_copy_to_user(hdr, inq_response, xfer_len);
@@ -672,7 +676,7 @@ static int nvme_trans_ext_inq_page(struct nvme_ns *ns, struct sg_io_hdr *hdr,
 	u8 *inq_response;
 	int res;
 	int nvme_sc;
-	struct nvme_dev *dev = ns->dev;
+	struct nvme_ctrl *ctrl = ns->ctrl;
 	struct nvme_id_ctrl *id_ctrl;
 	struct nvme_id_ns *id_ns;
 	int xfer_len;
@@ -688,7 +692,7 @@ static int nvme_trans_ext_inq_page(struct nvme_ns *ns, struct sg_io_hdr *hdr,
 	if (inq_response == NULL)
 		return -ENOMEM;
 
-	nvme_sc = nvme_identify_ns(dev, ns->ns_id, &id_ns);
+	nvme_sc = nvme_identify_ns(ctrl, ns->ns_id, &id_ns);
 	res = nvme_trans_status_code(hdr, nvme_sc);
 	if (res)
 		goto out_free_inq;
@@ -704,7 +708,7 @@ static int nvme_trans_ext_inq_page(struct nvme_ns *ns, struct sg_io_hdr *hdr,
 	app_chk = protect << 1;
 	ref_chk = protect;
 
-	nvme_sc = nvme_identify_ctrl(dev, &id_ctrl);
+	nvme_sc = nvme_identify_ctrl(ctrl, &id_ctrl);
 	res = nvme_trans_status_code(hdr, nvme_sc);
 	if (res)
 		goto out_free_inq;
@@ -815,7 +819,6 @@ static int nvme_trans_log_info_exceptions(struct nvme_ns *ns,
 	int res;
 	int xfer_len;
 	u8 *log_response;
-	struct nvme_dev *dev = ns->dev;
 	struct nvme_smart_log *smart_log;
 	u8 temp_c;
 	u16 temp_k;
@@ -824,7 +827,7 @@ static int nvme_trans_log_info_exceptions(struct nvme_ns *ns,
 	if (log_response == NULL)
 		return -ENOMEM;
 
-	res = nvme_get_log_page(dev, &smart_log);
+	res = nvme_get_log_page(ns->ctrl, &smart_log);
 	if (res < 0)
 		goto out_free_response;
 
@@ -862,7 +865,6 @@ static int nvme_trans_log_temperature(struct nvme_ns *ns, struct sg_io_hdr *hdr,
 	int res;
 	int xfer_len;
 	u8 *log_response;
-	struct nvme_dev *dev = ns->dev;
 	struct nvme_smart_log *smart_log;
 	u32 feature_resp;
 	u8 temp_c_cur, temp_c_thresh;
@@ -872,7 +874,7 @@ static int nvme_trans_log_temperature(struct nvme_ns *ns, struct sg_io_hdr *hdr,
 	if (log_response == NULL)
 		return -ENOMEM;
 
-	res = nvme_get_log_page(dev, &smart_log);
+	res = nvme_get_log_page(ns->ctrl, &smart_log);
 	if (res < 0)
 		goto out_free_response;
 
@@ -886,7 +888,7 @@ static int nvme_trans_log_temperature(struct nvme_ns *ns, struct sg_io_hdr *hdr,
 	kfree(smart_log);
 
 	/* Get Features for Temp Threshold */
-	res = nvme_get_features(dev, NVME_FEAT_TEMP_THRESH, 0, 0,
+	res = nvme_get_features(ns->ctrl, NVME_FEAT_TEMP_THRESH, 0, 0,
 								&feature_resp);
 	if (res != NVME_SC_SUCCESS)
 		temp_c_thresh = LOG_TEMP_UNKNOWN;
@@ -948,7 +950,6 @@ static int nvme_trans_fill_blk_desc(struct nvme_ns *ns, struct sg_io_hdr *hdr,
 {
 	int res;
 	int nvme_sc;
-	struct nvme_dev *dev = ns->dev;
 	struct nvme_id_ns *id_ns;
 	u8 flbas;
 	u32 lba_length;
@@ -958,7 +959,7 @@ static int nvme_trans_fill_blk_desc(struct nvme_ns *ns, struct sg_io_hdr *hdr,
 	else if (llbaa > 0 && len < MODE_PAGE_LLBAA_BLK_DES_LEN)
 		return -EINVAL;
 
-	nvme_sc = nvme_identify_ns(dev, ns->ns_id, &id_ns);
+	nvme_sc = nvme_identify_ns(ns->ctrl, ns->ns_id, &id_ns);
 	res = nvme_trans_status_code(hdr, nvme_sc);
 	if (res)
 		return res;
@@ -1014,14 +1015,13 @@ static int nvme_trans_fill_caching_page(struct nvme_ns *ns,
 {
 	int res = 0;
 	int nvme_sc;
-	struct nvme_dev *dev = ns->dev;
 	u32 feature_resp;
 	u8 vwc;
 
 	if (len < MODE_PAGE_CACHING_LEN)
 		return -EINVAL;
 
-	nvme_sc = nvme_get_features(dev, NVME_FEAT_VOLATILE_WC, 0, 0,
+	nvme_sc = nvme_get_features(ns->ctrl, NVME_FEAT_VOLATILE_WC, 0, 0,
 								&feature_resp);
 	res = nvme_trans_status_code(hdr, nvme_sc);
 	if (res)
@@ -1207,12 +1207,11 @@ static int nvme_trans_power_state(struct nvme_ns *ns, struct sg_io_hdr *hdr,
 {
 	int res;
 	int nvme_sc;
-	struct nvme_dev *dev = ns->dev;
 	struct nvme_id_ctrl *id_ctrl;
 	int lowest_pow_st;	/* max npss = lowest power consumption */
 	unsigned ps_desired = 0;
 
-	nvme_sc = nvme_identify_ctrl(dev, &id_ctrl);
+	nvme_sc = nvme_identify_ctrl(ns->ctrl, &id_ctrl);
 	res = nvme_trans_status_code(hdr, nvme_sc);
 	if (res)
 		return res;
@@ -1256,7 +1255,7 @@ static int nvme_trans_power_state(struct nvme_ns *ns, struct sg_io_hdr *hdr,
 				SCSI_ASCQ_CAUSE_NOT_REPORTABLE);
 		break;
 	}
-	nvme_sc = nvme_set_features(dev, NVME_FEAT_POWER_MGMT, ps_desired, 0,
+	nvme_sc = nvme_set_features(ns->ctrl, NVME_FEAT_POWER_MGMT, ps_desired, 0,
 				    NULL);
 	return nvme_trans_status_code(hdr, nvme_sc);
 }
@@ -1280,7 +1279,6 @@ static int nvme_trans_send_download_fw_cmd(struct nvme_ns *ns, struct sg_io_hdr
 					u8 buffer_id)
 {
 	int nvme_sc;
-	struct nvme_dev *dev = ns->dev;
 	struct nvme_command c;
 
 	if (hdr->iovec_count > 0) {
@@ -1297,7 +1295,7 @@ static int nvme_trans_send_download_fw_cmd(struct nvme_ns *ns, struct sg_io_hdr
 	c.dlfw.numd = cpu_to_le32((tot_len/BYTES_TO_DWORDS) - 1);
 	c.dlfw.offset = cpu_to_le32(offset/BYTES_TO_DWORDS);
 
-	nvme_sc = __nvme_submit_sync_cmd(dev->admin_q, &c, NULL,
+	nvme_sc = __nvme_submit_sync_cmd(ns->ctrl->admin_q, &c, NULL,
 			hdr->dxferp, tot_len, NULL, 0);
 	return nvme_trans_status_code(hdr, nvme_sc);
 }
@@ -1364,14 +1362,13 @@ static int nvme_trans_modesel_get_mp(struct nvme_ns *ns, struct sg_io_hdr *hdr,
 {
 	int res = 0;
 	int nvme_sc;
-	struct nvme_dev *dev = ns->dev;
 	unsigned dword11;
 
 	switch (page_code) {
 	case MODE_PAGE_CACHING:
 		dword11 = ((mode_page[2] & CACHING_MODE_PAGE_WCE_MASK) ? 1 : 0);
-		nvme_sc = nvme_set_features(dev, NVME_FEAT_VOLATILE_WC, dword11,
-					    0, NULL);
+		nvme_sc = nvme_set_features(ns->ctrl, NVME_FEAT_VOLATILE_WC,
+					    dword11, 0, NULL);
 		res = nvme_trans_status_code(hdr, nvme_sc);
 		break;
 	case MODE_PAGE_CONTROL:
@@ -1473,7 +1470,6 @@ static int nvme_trans_fmt_set_blk_size_count(struct nvme_ns *ns,
 {
 	int res = 0;
 	int nvme_sc;
-	struct nvme_dev *dev = ns->dev;
 	u8 flbas;
 
 	/*
@@ -1486,7 +1482,7 @@ static int nvme_trans_fmt_set_blk_size_count(struct nvme_ns *ns,
 	if (ns->mode_select_num_blocks == 0 || ns->mode_select_block_len == 0) {
 		struct nvme_id_ns *id_ns;
 
-		nvme_sc = nvme_identify_ns(dev, ns->ns_id, &id_ns);
+		nvme_sc = nvme_identify_ns(ns->ctrl, ns->ns_id, &id_ns);
 		res = nvme_trans_status_code(hdr, nvme_sc);
 		if (res)
 			return res;
@@ -1570,7 +1566,6 @@ static int nvme_trans_fmt_send_cmd(struct nvme_ns *ns, struct sg_io_hdr *hdr,
 {
 	int res;
 	int nvme_sc;
-	struct nvme_dev *dev = ns->dev;
 	struct nvme_id_ns *id_ns;
 	u8 i;
 	u8 flbas, nlbaf;
@@ -1579,7 +1574,7 @@ static int nvme_trans_fmt_send_cmd(struct nvme_ns *ns, struct sg_io_hdr *hdr,
 	struct nvme_command c;
 
 	/* Loop thru LBAF's in id_ns to match reqd lbaf, put in cdw10 */
-	nvme_sc = nvme_identify_ns(dev, ns->ns_id, &id_ns);
+	nvme_sc = nvme_identify_ns(ns->ctrl, ns->ns_id, &id_ns);
 	res = nvme_trans_status_code(hdr, nvme_sc);
 	if (res)
 		return res;
@@ -1611,7 +1606,7 @@ static int nvme_trans_fmt_send_cmd(struct nvme_ns *ns, struct sg_io_hdr *hdr,
 	c.format.nsid = cpu_to_le32(ns->ns_id);
 	c.format.cdw10 = cpu_to_le32(cdw10);
 
-	nvme_sc = nvme_submit_sync_cmd(dev->admin_q, &c, NULL, 0);
+	nvme_sc = nvme_submit_sync_cmd(ns->ctrl->admin_q, &c, NULL, 0);
 	res = nvme_trans_status_code(hdr, nvme_sc);
 
 	kfree(id_ns);
@@ -2040,7 +2035,6 @@ static int nvme_trans_read_capacity(struct nvme_ns *ns, struct sg_io_hdr *hdr,
 	u32 alloc_len;
 	u32 resp_size;
 	u32 xfer_len;
-	struct nvme_dev *dev = ns->dev;
 	struct nvme_id_ns *id_ns;
 	u8 *response;
 
@@ -2052,7 +2046,7 @@ static int nvme_trans_read_capacity(struct nvme_ns *ns, struct sg_io_hdr *hdr,
 		resp_size = READ_CAP_10_RESP_SIZE;
 	}
 
-	nvme_sc = nvme_identify_ns(dev, ns->ns_id, &id_ns);
+	nvme_sc = nvme_identify_ns(ns->ctrl, ns->ns_id, &id_ns);
 	res = nvme_trans_status_code(hdr, nvme_sc);
 	if (res)
 		return res;	
@@ -2080,7 +2074,6 @@ static int nvme_trans_report_luns(struct nvme_ns *ns, struct sg_io_hdr *hdr,
 	int nvme_sc;
 	u32 alloc_len, xfer_len, resp_size;
 	u8 *response;
-	struct nvme_dev *dev = ns->dev;
 	struct nvme_id_ctrl *id_ctrl;
 	u32 ll_length, lun_id;
 	u8 lun_id_offset = REPORT_LUNS_FIRST_LUN_OFFSET;
@@ -2094,7 +2087,7 @@ static int nvme_trans_report_luns(struct nvme_ns *ns, struct sg_io_hdr *hdr,
 	case ALL_LUNS_RETURNED:
 	case ALL_WELL_KNOWN_LUNS_RETURNED:
 	case RESTRICTED_LUNS_RETURNED:
-		nvme_sc = nvme_identify_ctrl(dev, &id_ctrl);
+		nvme_sc = nvme_identify_ctrl(ns->ctrl, &id_ctrl);
 		res = nvme_trans_status_code(hdr, nvme_sc);
 		if (res)
 			return res;
@@ -2295,9 +2288,7 @@ static int nvme_trans_test_unit_ready(struct nvme_ns *ns,
 					struct sg_io_hdr *hdr,
 					u8 *cmd)
 {
-	struct nvme_dev *dev = ns->dev;
-
-	if (!(readl(dev->bar + NVME_REG_CSTS) & NVME_CSTS_RDY))
+	if (nvme_ctrl_ready(ns->ctrl))
 		return nvme_trans_completion(hdr, SAM_STAT_CHECK_CONDITION,
 					    NOT_READY, SCSI_ASC_LUN_NOT_READY,
 					    SCSI_ASCQ_CAUSE_NOT_REPORTABLE);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 06/18] nvme: split a new struct nvme_ctrl out of struct nvme_dev
  2015-10-16  5:58 ` [PATCH 06/18] nvme: split a new struct nvme_ctrl out of struct nvme_dev Christoph Hellwig
@ 2015-10-20 10:19   ` Sagi Grimberg
  2015-10-20 10:26     ` Christoph Hellwig
  2015-10-21 21:23   ` J Freyensee
  1 sibling, 1 reply; 59+ messages in thread
From: Sagi Grimberg @ 2015-10-20 10:19 UTC (permalink / raw)


On 10/16/2015 8:58 AM, Christoph Hellwig wrote:
> The new struct nvme_ctrl will be used by the common NVMe code that sits
> on top of struct request_queue and the new nvme_ctrl_ops abstraction.
> It only contains the bare minimum required, which consists of values
> sampled during controller probe, the admin queue pointer and a second
> struct device pointer at the moment,

Hi Christoph,

Can you explain why nvme_ctrl needs an additional struct device pointer?
(I understand it will go away from your statement "at the moment"
haven't read the rest of the patch set yet).

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 06/18] nvme: split a new struct nvme_ctrl out of struct nvme_dev
  2015-10-20 10:19   ` Sagi Grimberg
@ 2015-10-20 10:26     ` Christoph Hellwig
  2015-10-20 10:44       ` Sagi Grimberg
  0 siblings, 1 reply; 59+ messages in thread
From: Christoph Hellwig @ 2015-10-20 10:26 UTC (permalink / raw)

On Tue, Oct 20, 2015@01:19:45PM +0300, Sagi Grimberg wrote:
> Can you explain why nvme_ctrl needs an additional struct device pointer?
> (I understand it will go away from your statement "at the moment"
> haven't read the rest of the patch set yet).

No, it won't go away for the foreseeable future.  I don't want to move
any members used in the hot path to sturct nvme_ctrl so that drivers can
be optimized in this path without impacting the generic code.  Both for
PCIe and Fabrics we need to squeeze the last bit of performance out and
I don't want to have to cacheline optimize nvme_ctrl or even worse
interactions between it and the containing structures.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 06/18] nvme: split a new struct nvme_ctrl out of struct nvme_dev
  2015-10-20 10:26     ` Christoph Hellwig
@ 2015-10-20 10:44       ` Sagi Grimberg
  2015-10-20 11:30         ` Christoph Hellwig
  0 siblings, 1 reply; 59+ messages in thread
From: Sagi Grimberg @ 2015-10-20 10:44 UTC (permalink / raw)


On 10/20/2015 1:26 PM, Christoph Hellwig wrote:
> On Tue, Oct 20, 2015@01:19:45PM +0300, Sagi Grimberg wrote:
>> Can you explain why nvme_ctrl needs an additional struct device pointer?
>> (I understand it will go away from your statement "at the moment"
>> haven't read the rest of the patch set yet).
>
> No, it won't go away for the foreseeable future.  I don't want to move
> any members used in the hot path to sturct nvme_ctrl so that drivers can
> be optimized in this path without impacting the generic code.  Both for
> PCIe and Fabrics we need to squeeze the last bit of performance out and
> I don't want to have to cacheline optimize nvme_ctrl or even worse
> interactions between it and the containing structures.
>

I wasn't suggesting moving it to nvme_ctrl, I was suggesting to keep it
in nvme_dev and maybe add a helper have nvme_ctrl_to_dev(ctrl)->dev used
in slow paths. It would allow to remove the copy pointer in nvme_ctrl.

It's just a nit so I don't really mind keeping the copy.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 06/18] nvme: split a new struct nvme_ctrl out of struct nvme_dev
  2015-10-20 10:44       ` Sagi Grimberg
@ 2015-10-20 11:30         ` Christoph Hellwig
  2015-10-21 14:40           ` Sagi Grimberg
  0 siblings, 1 reply; 59+ messages in thread
From: Christoph Hellwig @ 2015-10-20 11:30 UTC (permalink / raw)


On Tue, Oct 20, 2015@01:44:01PM +0300, Sagi Grimberg wrote:
> I wasn't suggesting moving it to nvme_ctrl, I was suggesting to keep it
> in nvme_dev and maybe add a helper have nvme_ctrl_to_dev(ctrl)->dev used
> in slow paths. It would allow to remove the copy pointer in nvme_ctrl.

But it would require adding a function pointer to get the location from
the different transport drivers.  A pointer seems much simpler than that.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 06/18] nvme: split a new struct nvme_ctrl out of struct nvme_dev
  2015-10-20 11:30         ` Christoph Hellwig
@ 2015-10-21 14:40           ` Sagi Grimberg
  0 siblings, 0 replies; 59+ messages in thread
From: Sagi Grimberg @ 2015-10-21 14:40 UTC (permalink / raw)


On 10/20/2015 2:30 PM, Christoph Hellwig wrote:
> On Tue, Oct 20, 2015@01:44:01PM +0300, Sagi Grimberg wrote:
>> I wasn't suggesting moving it to nvme_ctrl, I was suggesting to keep it
>> in nvme_dev and maybe add a helper have nvme_ctrl_to_dev(ctrl)->dev used
>> in slow paths. It would allow to remove the copy pointer in nvme_ctrl.
>
> But it would require adding a function pointer to get the location from
> the different transport drivers.  A pointer seems much simpler than that.
>

Yea, I'm fine with that...

Reviewed-by: Sagi Grimberg <sagig at mellanox.com>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 06/18] nvme: split a new struct nvme_ctrl out of struct nvme_dev
  2015-10-16  5:58 ` [PATCH 06/18] nvme: split a new struct nvme_ctrl out of struct nvme_dev Christoph Hellwig
  2015-10-20 10:19   ` Sagi Grimberg
@ 2015-10-21 21:23   ` J Freyensee
  2015-10-21 22:51     ` Busch, Keith
  2015-10-22  7:37     ` Christoph Hellwig
  1 sibling, 2 replies; 59+ messages in thread
From: J Freyensee @ 2015-10-21 21:23 UTC (permalink / raw)


On Fri, 2015-10-16@07:58 +0200, Christoph Hellwig wrote:
> The new struct nvme_ctrl will be used by the common NVMe code that 
> sits
> on top of struct request_queue and the new nvme_ctrl_ops abstraction.
> It only contains the bare minimum required, which consists of values
> sampled during controller probe, the admin queue pointer and a second
> struct device pointer at the moment, but more will follow later. 
>  Only
> values that are not used in the I/O fast path should be moved to
> struct nvme_ctrl so that drivers can optimize their cache line usage
> easily.  That's also the reason why we have two device pointers as
> the struct device is used for DMA mapping purposes.
> 
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> Acked-by: Keith Busch <keith.busch at intel.com>
> ---
>  drivers/nvme/host/core.c |  10 +--
>  drivers/nvme/host/nvme.h |  61 ++++++---------
>  drivers/nvme/host/pci.c  | 190 +++++++++++++++++++++++++++++++------
> ----------
>  drivers/nvme/host/scsi.c |  85 ++++++++++-----------
>  4 files changed, 190 insertions(+), 156 deletions(-)
> 

<snipped>


> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index 370aa5b..3e409fa 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -25,46 +25,16 @@ extern unsigned char nvme_io_timeout;
>  extern unsigned char admin_timeout;
>  #define ADMIN_TIMEOUT	(admin_timeout * HZ)
>  
> -/*
> - * Represents an NVM Express device.  Each nvme_dev is a PCI 
> function.
> - */
> -struct nvme_dev {
> -	struct list_head node;
> -	struct nvme_queue **queues;
> +struct nvme_ctrl {

Whether it is a PCIe NVMe device with multiple controllers or something
beyond PCIe, I think an instance of struct nvme_ctrl is going to need
to know its cntlid.  How does this struct know its cntlid?  I'm not
initially seeing it.  I think it would make more sense to have struct
nvme_ctrl have a member that stores its cntlid value.  It would
basically be the "name" of the specific nvme_ctrl instance allocated.


> +	const struct nvme_ctrl_ops *ops;
>  	struct request_queue *admin_q;
> -	struct blk_mq_tag_set tagset;
> -	struct blk_mq_tag_set admin_tagset;
> -	u32 __iomem *dbs;
>  	struct device *dev;
> -	struct dma_pool *prp_page_pool;
> -	struct dma_pool *prp_small_pool;
>  	int instance;
> -	unsigned queue_count;
> -	unsigned online_queues;
> -	unsigned max_qid;
> -	int q_depth;
> -	u32 db_stride;
> -	u32 ctrl_config;
> -	struct msix_entry *entry;
> -	void __iomem *bar;
> -	struct list_head namespaces;
> -	struct kref kref;
> -	struct device *device;
> -	struct work_struct reset_work;
> -	struct work_struct probe_work;
> -	struct work_struct scan_work;
> +
>  	char name[12];
>  	char serial[20];
>  	char model[40];
>  	char firmware_rev[8];
> -	bool subsystem;

Also, this is something probably a bit more far visioned, but I think
struct nvme_ctrl would need a mechanism to know what NVMe subsystem it
sits in.  Even if 'subsystem' stayed in the struct, I'm not sure how a
bool would work for this.

> -	u32 max_hw_sectors;
> -	u32 stripe_size;
> -	u32 page_size;
> -	void __iomem *cmb;
> -	dma_addr_t cmb_dma_addr;
> -	u64 cmb_size;
> -	u32 cmbsz;
>  	u16 oncs;
>  	u16 abort_limit;
>  	u8 event_limit;
> @@ -78,7 +48,7 @@ struct nvme_dev {
>  struct nvme_ns {
>  	struct list_head list;
>  
> -	struct nvme_dev *dev;
> +	struct nvme_ctrl *ctrl;

This seems a bit backwards to me.  It's the controller (cntlid) that is
going to tell the host how many namespaces are associated with it via
the NVMe Identify commands.  Thus, I would have thought that a list of
struct nvme_ns instances would be in a struct nvme_ctrl definition, not
vice-versa.  Unless '*ctrl' is going to be used as a back pointer?  But
then in 'struct nvme_ctrl' I didn't see initially see anything that
associates itself to the namespaces attached to it.

>  	struct request_queue *queue;
>  	struct gendisk *disk;
>  	struct kref kref;
> @@ -92,6 +62,19 @@ struct nvme_ns {
>  	u32 mode_select_block_len;
>  };
>  

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 06/18] nvme: split a new struct nvme_ctrl out of struct nvme_dev
  2015-10-21 21:23   ` J Freyensee
@ 2015-10-21 22:51     ` Busch, Keith
  2015-10-22  0:15       ` J Freyensee
  2015-10-22  7:37     ` Christoph Hellwig
  1 sibling, 1 reply; 59+ messages in thread
From: Busch, Keith @ 2015-10-21 22:51 UTC (permalink / raw)


On Wed, Oct 21, 2015@02:23:46PM -0700, J Freyensee wrote:
> >  	char firmware_rev[8];
> > -	bool subsystem;
> 
> Also, this is something probably a bit more far visioned, but I think
> struct nvme_ctrl would need a mechanism to know what NVMe subsystem it
> sits in.  Even if 'subsystem' stayed in the struct, I'm not sure how a
> bool would work for this.

The bool is not to identify a specific subsystem. It's only to notify
the driver that this controller is subsystem capable. In other words,
please periodically check the subsystem reset notification since that
can happen at anytime externally from host connected to the controller,
and we want to know when such events occur.

> > @@ -78,7 +48,7 @@ struct nvme_dev {
> >  struct nvme_ns {
> >  	struct list_head list;
> >  
> > -	struct nvme_dev *dev;
> > +	struct nvme_ctrl *ctrl;
> 
> This seems a bit backwards to me.  It's the controller (cntlid) that is
> going to tell the host how many namespaces are associated with it via
> the NVMe Identify commands.  Thus, I would have thought that a list of
> struct nvme_ns instances would be in a struct nvme_ctrl definition, not
> vice-versa.  Unless '*ctrl' is going to be used as a back pointer?  But
> then in 'struct nvme_ctrl' I didn't see initially see anything that
> associates itself to the namespaces attached to it.

The 'ctrl' pointer is a back pointer to the owning controller.

The controller itself contains a list_head appropriately called
'namespaces' to hold a reference to all its namespaces. The nvme_ns
list_head 'list' is simply the entry item for that list.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 06/18] nvme: split a new struct nvme_ctrl out of struct nvme_dev
  2015-10-21 22:51     ` Busch, Keith
@ 2015-10-22  0:15       ` J Freyensee
  0 siblings, 0 replies; 59+ messages in thread
From: J Freyensee @ 2015-10-22  0:15 UTC (permalink / raw)


On Wed, 2015-10-21@22:51 +0000, Busch, Keith wrote:
> On Wed, Oct 21, 2015@02:23:46PM -0700, J Freyensee wrote:
> > >  	char firmware_rev[8];
> > > -	bool subsystem;
> > 
> > Also, this is something probably a bit more far visioned, but I 
> > think
> > struct nvme_ctrl would need a mechanism to know what NVMe subsystem 
> > it
> > sits in.  Even if 'subsystem' stayed in the struct, I'm not sure 
> > how a
> > bool would work for this.
> 
> The bool is not to identify a specific subsystem. It's only to notify
> the driver that this controller is subsystem capable. In other words,
> please periodically check the subsystem reset notification since that
> can happen at anytime externally from host connected to the 
> controller,
> and we want to know when such events occur.
> 
> > > @@ -78,7 +48,7 @@ struct nvme_dev {
> > >  struct nvme_ns {
> > >  	struct list_head list;
> > >  
> > > -	struct nvme_dev *dev;
> > > +	struct nvme_ctrl *ctrl;
> > 
> > This seems a bit backwards to me.  It's the controller (cntlid) 
> > that is
> > going to tell the host how many namespaces are associated with it 
> > via
> > the NVMe Identify commands.  Thus, I would have thought that a list 
> > of
> > struct nvme_ns instances would be in a struct nvme_ctrl definition, 
> > not
> > vice-versa.  Unless '*ctrl' is going to be used as a back pointer? 
> >  But
> > then in 'struct nvme_ctrl' I didn't see initially see anything that
> > associates itself to the namespaces attached to it.
> 
> The 'ctrl' pointer is a back pointer to the owning controller.
> 
> The controller itself contains a list_head appropriately called
> 'namespaces' to hold a reference to all its namespaces. The nvme_ns
> list_head 'list' is simply the entry item for that list.

Yah, I saw that in a later patch in the 18 patch series and it made a
lot more sense.  Thanks!

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 06/18] nvme: split a new struct nvme_ctrl out of struct nvme_dev
  2015-10-21 21:23   ` J Freyensee
  2015-10-21 22:51     ` Busch, Keith
@ 2015-10-22  7:37     ` Christoph Hellwig
  1 sibling, 0 replies; 59+ messages in thread
From: Christoph Hellwig @ 2015-10-22  7:37 UTC (permalink / raw)


On Wed, Oct 21, 2015@02:23:46PM -0700, J Freyensee wrote:
> > -struct nvme_dev {
> > -	struct list_head node;
> > -	struct nvme_queue **queues;
> > +struct nvme_ctrl {
> 
> Whether it is a PCIe NVMe device with multiple controllers or something
> beyond PCIe, I think an instance of struct nvme_ctrl is going to need
> to know its cntlid.  How does this struct know its cntlid?  I'm not
> initially seeing it.  I think it would make more sense to have struct
> nvme_ctrl have a member that stores its cntlid value.  It would
> basically be the "name" of the specific nvme_ctrl instance allocated.

Right now nothing in the Linux PCIe driver knows the cntlid.  At least
for PCIe it's also fairly uninteresting for the driver.  Remeber this
is just a simple split for now, additional functionality will be built
on top of it.

> > -	bool subsystem;
> 
> Also, this is something probably a bit more far visioned, but I think
> struct nvme_ctrl would need a mechanism to know what NVMe subsystem it
> sits in.  Even if 'subsystem' stayed in the struct, I'm not sure how a
> bool would work for this.

See the reply for Keith on what this field does.  But at least for PCIe
the containing subsystem isn't all that interesting for the driver.  Maybe
as an additional safety check for the Namespae GUIDs in a multipathing
setup, but that's about it.

> > @@ -78,7 +48,7 @@ struct nvme_dev {
> >  struct nvme_ns {
> >  	struct list_head list;
> >  
> > -	struct nvme_dev *dev;
> > +	struct nvme_ctrl *ctrl;
> 
> This seems a bit backwards to me.  It's the controller (cntlid) that is
> going to tell the host how many namespaces are associated with it via
> the NVMe Identify commands.  Thus, I would have thought that a list of
> struct nvme_ns instances would be in a struct nvme_ctrl definition, not
> vice-versa.  Unless '*ctrl' is going to be used as a back pointer?  But
> then in 'struct nvme_ctrl' I didn't see initially see anything that
> associates itself to the namespaces attached to it.

The namespace lists moves to nvme_ctrl in this list, and the dev field
has always been a backpointer that's now replaced with a backpointer to
the nvme_ctrl structure.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 07/18] nvme: simplify nvme_setup_prps calling convention
  2015-10-16  5:58 nvme driver split V2 Christoph Hellwig
                   ` (5 preceding siblings ...)
  2015-10-16  5:58 ` [PATCH 06/18] nvme: split a new struct nvme_ctrl out of struct nvme_dev Christoph Hellwig
@ 2015-10-16  5:58 ` Christoph Hellwig
  2015-10-20 10:30   ` Sagi Grimberg
  2015-10-16  5:58 ` [PATCH 08/18] nvme: refactor nvme_queue_rq Christoph Hellwig
                   ` (10 subsequent siblings)
  17 siblings, 1 reply; 59+ messages in thread
From: Christoph Hellwig @ 2015-10-16  5:58 UTC (permalink / raw)


Pass back a true/false value instead of the length which needs a compare
with the bytes in the request and drop the pointless gfp_t argument.

Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/pci.c | 22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 8b0ba11..9dbb0ec 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -719,9 +719,8 @@ static void req_completion(struct nvme_queue *nvmeq, void *ctx,
 	blk_mq_complete_request(req, error);
 }
 
-/* length is in bytes.  gfp flags indicates whether we may sleep. */
-static int nvme_setup_prps(struct nvme_dev *dev, struct nvme_iod *iod,
-		int total_len, gfp_t gfp)
+static bool nvme_setup_prps(struct nvme_dev *dev, struct nvme_iod *iod,
+		int total_len)
 {
 	struct dma_pool *pool;
 	int length = total_len;
@@ -737,7 +736,7 @@ static int nvme_setup_prps(struct nvme_dev *dev, struct nvme_iod *iod,
 
 	length -= (page_size - offset);
 	if (length <= 0)
-		return total_len;
+		return true;
 
 	dma_len -= (page_size - offset);
 	if (dma_len) {
@@ -750,7 +749,7 @@ static int nvme_setup_prps(struct nvme_dev *dev, struct nvme_iod *iod,
 
 	if (length <= page_size) {
 		iod->first_dma = dma_addr;
-		return total_len;
+		return true;
 	}
 
 	nprps = DIV_ROUND_UP(length, page_size);
@@ -762,11 +761,11 @@ static int nvme_setup_prps(struct nvme_dev *dev, struct nvme_iod *iod,
 		iod->npages = 1;
 	}
 
-	prp_list = dma_pool_alloc(pool, gfp, &prp_dma);
+	prp_list = dma_pool_alloc(pool, GFP_ATOMIC, &prp_dma);
 	if (!prp_list) {
 		iod->first_dma = dma_addr;
 		iod->npages = -1;
-		return (total_len - length) + page_size;
+		return false;
 	}
 	list[0] = prp_list;
 	iod->first_dma = prp_dma;
@@ -774,9 +773,9 @@ static int nvme_setup_prps(struct nvme_dev *dev, struct nvme_iod *iod,
 	for (;;) {
 		if (i == page_size >> 3) {
 			__le64 *old_prp_list = prp_list;
-			prp_list = dma_pool_alloc(pool, gfp, &prp_dma);
+			prp_list = dma_pool_alloc(pool, GFP_ATOMIC, &prp_dma);
 			if (!prp_list)
-				return total_len - length;
+				return false;
 			list[iod->npages++] = prp_list;
 			prp_list[0] = old_prp_list[i - 1];
 			old_prp_list[i - 1] = cpu_to_le64(prp_dma);
@@ -796,7 +795,7 @@ static int nvme_setup_prps(struct nvme_dev *dev, struct nvme_iod *iod,
 		dma_len = sg_dma_len(sg);
 	}
 
-	return total_len;
+	return true;
 }
 
 static void nvme_submit_priv(struct nvme_queue *nvmeq, struct request *req,
@@ -962,8 +961,7 @@ static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 		if (!dma_map_sg(nvmeq->q_dmadev, iod->sg, iod->nents, dma_dir))
 			goto retry_cmd;
 
-		if (blk_rq_bytes(req) !=
-                    nvme_setup_prps(dev, iod, blk_rq_bytes(req), GFP_ATOMIC)) {
+		if (!nvme_setup_prps(dev, iod, blk_rq_bytes(req))) {
 			dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
 			goto retry_cmd;
 		}
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 07/18] nvme: simplify nvme_setup_prps calling convention
  2015-10-16  5:58 ` [PATCH 07/18] nvme: simplify nvme_setup_prps calling convention Christoph Hellwig
@ 2015-10-20 10:30   ` Sagi Grimberg
  0 siblings, 0 replies; 59+ messages in thread
From: Sagi Grimberg @ 2015-10-20 10:30 UTC (permalink / raw)


On 10/16/2015 8:58 AM, Christoph Hellwig wrote:
> Pass back a true/false value instead of the length which needs a compare
> with the bytes in the request and drop the pointless gfp_t argument.
>
> Signed-off-by: Christoph Hellwig <hch at lst.de>

Looks good to me,

Reviewed-by: Sagi Grimberg <sagig at mellanox.com>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 08/18] nvme: refactor nvme_queue_rq
  2015-10-16  5:58 nvme driver split V2 Christoph Hellwig
                   ` (6 preceding siblings ...)
  2015-10-16  5:58 ` [PATCH 07/18] nvme: simplify nvme_setup_prps calling convention Christoph Hellwig
@ 2015-10-16  5:58 ` Christoph Hellwig
  2015-10-16  5:58 ` [PATCH 09/18] nvme: move nvme_error_status to common code Christoph Hellwig
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 59+ messages in thread
From: Christoph Hellwig @ 2015-10-16  5:58 UTC (permalink / raw)


This "backports" the structure I've used for the fabrics driver.  It
mostly started out as a cleanup so that I could actually understand
the code, but I think it also qualifies as a micro-optimization due
to the reduced time we hold q_lock and disable interrupts.

Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/pci.c | 219 +++++++++++++++++++++---------------------------
 1 file changed, 97 insertions(+), 122 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 9dbb0ec..dfe914e 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -798,19 +798,53 @@ static bool nvme_setup_prps(struct nvme_dev *dev, struct nvme_iod *iod,
 	return true;
 }
 
-static void nvme_submit_priv(struct nvme_queue *nvmeq, struct request *req,
-		struct nvme_iod *iod)
+static int nvme_map_data(struct nvme_dev *dev, struct nvme_iod *iod,
+		struct nvme_command *cmnd)
 {
-	struct nvme_command cmnd;
+	struct request *req = iod_get_private(iod);
+	struct request_queue *q = req->q;
+	enum dma_data_direction dma_dir = rq_data_dir(req) ?
+			DMA_TO_DEVICE : DMA_FROM_DEVICE;
+	int ret = BLK_MQ_RQ_QUEUE_ERROR;
+
+	sg_init_table(iod->sg, req->nr_phys_segments);
+	iod->nents = blk_rq_map_sg(q, req, iod->sg);
+	if (!iod->nents)
+		goto out;
+
+	ret = BLK_MQ_RQ_QUEUE_BUSY;
+	if (!dma_map_sg(dev->dev, iod->sg, iod->nents, dma_dir))
+		goto out;
+
+	if (!nvme_setup_prps(dev, iod, blk_rq_bytes(req)))
+		goto out_unmap;
+
+	ret = BLK_MQ_RQ_QUEUE_ERROR;
+	if (blk_integrity_rq(req)) {
+		if (blk_rq_count_integrity_sg(q, req->bio) != 1)
+			goto out_unmap;
+
+		sg_init_table(iod->meta_sg, 1);
+		if (blk_rq_map_integrity_sg(q, req->bio, iod->meta_sg) != 1)
+			goto out_unmap;
 
-	memcpy(&cmnd, req->cmd, sizeof(cmnd));
-	cmnd.rw.command_id = req->tag;
-	if (req->nr_phys_segments) {
-		cmnd.rw.prp1 = cpu_to_le64(sg_dma_address(iod->sg));
-		cmnd.rw.prp2 = cpu_to_le64(iod->first_dma);
+		if (rq_data_dir(req))
+			nvme_dif_remap(req, nvme_dif_prep);
+
+		if (!dma_map_sg(dev->dev, iod->meta_sg, 1, dma_dir))
+			goto out_unmap;
 	}
 
-	__nvme_submit_cmd(nvmeq, &cmnd);
+	cmnd->rw.prp1 = cpu_to_le64(sg_dma_address(iod->sg));
+	cmnd->rw.prp2 = cpu_to_le64(iod->first_dma);
+	if (blk_integrity_rq(req))
+		cmnd->rw.metadata = cpu_to_le64(sg_dma_address(iod->meta_sg));
+	return BLK_MQ_RQ_QUEUE_OK;
+
+out_unmap:
+	dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
+out:
+	return ret;
 }
 
 /*
@@ -818,46 +852,42 @@ static void nvme_submit_priv(struct nvme_queue *nvmeq, struct request *req,
  * worth having a special pool for these or additional cases to handle freeing
  * the iod.
  */
-static void nvme_submit_discard(struct nvme_queue *nvmeq, struct nvme_ns *ns,
-		struct request *req, struct nvme_iod *iod)
+static int nvme_setup_discard(struct nvme_queue *nvmeq, struct nvme_ns *ns,
+		struct nvme_iod *iod, struct nvme_command *cmnd)
 {
-	struct nvme_dsm_range *range =
-				(struct nvme_dsm_range *)iod_list(iod)[0];
-	struct nvme_command cmnd;
+	struct request *req = iod_get_private(iod);
+	struct nvme_dsm_range *range;
+
+	range = dma_pool_alloc(nvmeq->dev->prp_small_pool, GFP_ATOMIC,
+						&iod->first_dma);
+	if (!range)
+		return BLK_MQ_RQ_QUEUE_BUSY;
+	iod_list(iod)[0] = (__le64 *)range;
+	iod->npages = 0;
 
 	range->cattr = cpu_to_le32(0);
 	range->nlb = cpu_to_le32(blk_rq_bytes(req) >> ns->lba_shift);
 	range->slba = cpu_to_le64(nvme_block_nr(ns, blk_rq_pos(req)));
 
-	memset(&cmnd, 0, sizeof(cmnd));
-	cmnd.dsm.opcode = nvme_cmd_dsm;
-	cmnd.dsm.command_id = req->tag;
-	cmnd.dsm.nsid = cpu_to_le32(ns->ns_id);
-	cmnd.dsm.prp1 = cpu_to_le64(iod->first_dma);
-	cmnd.dsm.nr = 0;
-	cmnd.dsm.attributes = cpu_to_le32(NVME_DSMGMT_AD);
-
-	__nvme_submit_cmd(nvmeq, &cmnd);
+	memset(cmnd, 0, sizeof(*cmnd));
+	cmnd->dsm.opcode = nvme_cmd_dsm;
+	cmnd->dsm.nsid = cpu_to_le32(ns->ns_id);
+	cmnd->dsm.prp1 = cpu_to_le64(iod->first_dma);
+	cmnd->dsm.nr = 0;
+	cmnd->dsm.attributes = cpu_to_le32(NVME_DSMGMT_AD);
+	return BLK_MQ_RQ_QUEUE_OK;
 }
 
-static void nvme_submit_flush(struct nvme_queue *nvmeq, struct nvme_ns *ns,
-								int cmdid)
+static void nvme_setup_flush(struct nvme_ns *ns, struct nvme_command *cmnd)
 {
-	struct nvme_command cmnd;
-
-	memset(&cmnd, 0, sizeof(cmnd));
-	cmnd.common.opcode = nvme_cmd_flush;
-	cmnd.common.command_id = cmdid;
-	cmnd.common.nsid = cpu_to_le32(ns->ns_id);
-
-	__nvme_submit_cmd(nvmeq, &cmnd);
+	memset(cmnd, 0, sizeof(*cmnd));
+	cmnd->common.opcode = nvme_cmd_flush;
+	cmnd->common.nsid = cpu_to_le32(ns->ns_id);
 }
 
-static int nvme_submit_iod(struct nvme_queue *nvmeq, struct nvme_iod *iod,
-							struct nvme_ns *ns)
+static void nvme_setup_rw(struct nvme_ns *ns, struct request *req,
+		struct nvme_command *cmnd)
 {
-	struct request *req = iod_get_private(iod);
-	struct nvme_command cmnd;
 	u16 control = 0;
 	u32 dsmgmt = 0;
 
@@ -869,14 +899,12 @@ static int nvme_submit_iod(struct nvme_queue *nvmeq, struct nvme_iod *iod,
 	if (req->cmd_flags & REQ_RAHEAD)
 		dsmgmt |= NVME_RW_DSM_FREQ_PREFETCH;
 
-	memset(&cmnd, 0, sizeof(cmnd));
-	cmnd.rw.opcode = (rq_data_dir(req) ? nvme_cmd_write : nvme_cmd_read);
-	cmnd.rw.command_id = req->tag;
-	cmnd.rw.nsid = cpu_to_le32(ns->ns_id);
-	cmnd.rw.prp1 = cpu_to_le64(sg_dma_address(iod->sg));
-	cmnd.rw.prp2 = cpu_to_le64(iod->first_dma);
-	cmnd.rw.slba = cpu_to_le64(nvme_block_nr(ns, blk_rq_pos(req)));
-	cmnd.rw.length = cpu_to_le16((blk_rq_bytes(req) >> ns->lba_shift) - 1);
+	memset(cmnd, 0, sizeof(*cmnd));
+	cmnd->rw.opcode = (rq_data_dir(req) ? nvme_cmd_write : nvme_cmd_read);
+	cmnd->rw.command_id = req->tag;
+	cmnd->rw.nsid = cpu_to_le32(ns->ns_id);
+	cmnd->rw.slba = cpu_to_le64(nvme_block_nr(ns, blk_rq_pos(req)));
+	cmnd->rw.length = cpu_to_le16((blk_rq_bytes(req) >> ns->lba_shift) - 1);
 
 	if (ns->ms) {
 		switch (ns->pi_type) {
@@ -887,23 +915,16 @@ static int nvme_submit_iod(struct nvme_queue *nvmeq, struct nvme_iod *iod,
 		case NVME_NS_DPS_PI_TYPE2:
 			control |= NVME_RW_PRINFO_PRCHK_GUARD |
 					NVME_RW_PRINFO_PRCHK_REF;
-			cmnd.rw.reftag = cpu_to_le32(
+			cmnd->rw.reftag = cpu_to_le32(
 					nvme_block_nr(ns, blk_rq_pos(req)));
 			break;
 		}
-		if (blk_integrity_rq(req))
-			cmnd.rw.metadata =
-				cpu_to_le64(sg_dma_address(iod->meta_sg));
-		else
+		if (!blk_integrity_rq(req))
 			control |= NVME_RW_PRINFO_PRACT;
 	}
 
-	cmnd.rw.control = cpu_to_le16(control);
-	cmnd.rw.dsmgmt = cpu_to_le32(dsmgmt);
-
-	__nvme_submit_cmd(nvmeq, &cmnd);
-
-	return 0;
+	cmnd->rw.control = cpu_to_le16(control);
+	cmnd->rw.dsmgmt = cpu_to_le32(dsmgmt);
 }
 
 /*
@@ -918,7 +939,8 @@ static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 	struct request *req = bd->rq;
 	struct nvme_cmd_info *cmd = blk_mq_rq_to_pdu(req);
 	struct nvme_iod *iod;
-	enum dma_data_direction dma_dir;
+	struct nvme_command cmnd;
+	int ret = BLK_MQ_RQ_QUEUE_OK;
 
 	/*
 	 * If formated with metadata, require the block layer provide a buffer
@@ -938,80 +960,33 @@ static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 		return BLK_MQ_RQ_QUEUE_BUSY;
 
 	if (req->cmd_flags & REQ_DISCARD) {
-		void *range;
-		/*
-		 * We reuse the small pool to allocate the 16-byte range here
-		 * as it is not worth having a special pool for these or
-		 * additional cases to handle freeing the iod.
-		 */
-		range = dma_pool_alloc(dev->prp_small_pool, GFP_ATOMIC,
-						&iod->first_dma);
-		if (!range)
-			goto retry_cmd;
-		iod_list(iod)[0] = (__le64 *)range;
-		iod->npages = 0;
-	} else if (req->nr_phys_segments) {
-		dma_dir = rq_data_dir(req) ? DMA_TO_DEVICE : DMA_FROM_DEVICE;
-
-		sg_init_table(iod->sg, req->nr_phys_segments);
-		iod->nents = blk_rq_map_sg(req->q, req, iod->sg);
-		if (!iod->nents)
-			goto error_cmd;
-
-		if (!dma_map_sg(nvmeq->q_dmadev, iod->sg, iod->nents, dma_dir))
-			goto retry_cmd;
-
-		if (!nvme_setup_prps(dev, iod, blk_rq_bytes(req))) {
-			dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
-			goto retry_cmd;
-		}
-		if (blk_integrity_rq(req)) {
-			if (blk_rq_count_integrity_sg(req->q, req->bio) != 1) {
-				dma_unmap_sg(dev->dev, iod->sg, iod->nents,
-						dma_dir);
-				goto error_cmd;
-			}
-
-			sg_init_table(iod->meta_sg, 1);
-			if (blk_rq_map_integrity_sg(
-					req->q, req->bio, iod->meta_sg) != 1) {
-				dma_unmap_sg(dev->dev, iod->sg, iod->nents,
-						dma_dir);
-				goto error_cmd;
-			}
-
-			if (rq_data_dir(req))
-				nvme_dif_remap(req, nvme_dif_prep);
+		ret = nvme_setup_discard(nvmeq, ns, iod, &cmnd);
+	} else {
+		if (req->cmd_type == REQ_TYPE_DRV_PRIV)
+			memcpy(&cmnd, req->cmd, sizeof(cmnd));
+		else if (req->cmd_flags & REQ_FLUSH)
+			nvme_setup_flush(ns, &cmnd);
+		else
+			nvme_setup_rw(ns, req, &cmnd);
 
-			if (!dma_map_sg(nvmeq->q_dmadev, iod->meta_sg, 1, dma_dir)) {
-				dma_unmap_sg(dev->dev, iod->sg, iod->nents,
-						dma_dir);
-				goto error_cmd;
-			}
-		}
+		if (req->nr_phys_segments)
+			ret = nvme_map_data(dev, iod, &cmnd);
 	}
 
+	if (ret)
+		goto out;
+
+	cmnd.common.command_id = req->tag;
 	nvme_set_info(cmd, iod, req_completion);
-	spin_lock_irq(&nvmeq->q_lock);
-	if (req->cmd_type == REQ_TYPE_DRV_PRIV)
-		nvme_submit_priv(nvmeq, req, iod);
-	else if (req->cmd_flags & REQ_DISCARD)
-		nvme_submit_discard(nvmeq, ns, req, iod);
-	else if (req->cmd_flags & REQ_FLUSH)
-		nvme_submit_flush(nvmeq, ns, req->tag);
-	else
-		nvme_submit_iod(nvmeq, iod, ns);
 
+	spin_lock_irq(&nvmeq->q_lock);
+	__nvme_submit_cmd(nvmeq, &cmnd);
 	nvme_process_cq(nvmeq);
 	spin_unlock_irq(&nvmeq->q_lock);
 	return BLK_MQ_RQ_QUEUE_OK;
-
- error_cmd:
-	nvme_free_iod(dev, iod);
-	return BLK_MQ_RQ_QUEUE_ERROR;
- retry_cmd:
+out:
 	nvme_free_iod(dev, iod);
-	return BLK_MQ_RQ_QUEUE_BUSY;
+	return ret;
 }
 
 static int nvme_process_cq(struct nvme_queue *nvmeq)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 09/18] nvme: move nvme_error_status to common code
  2015-10-16  5:58 nvme driver split V2 Christoph Hellwig
                   ` (7 preceding siblings ...)
  2015-10-16  5:58 ` [PATCH 08/18] nvme: refactor nvme_queue_rq Christoph Hellwig
@ 2015-10-16  5:58 ` Christoph Hellwig
  2015-10-20 10:54   ` Sagi Grimberg
  2015-10-16  5:58 ` [PATCH 10/18] nvme: move nvme_setup_flush and nvme_setup_rw " Christoph Hellwig
                   ` (8 subsequent siblings)
  17 siblings, 1 reply; 59+ messages in thread
From: Christoph Hellwig @ 2015-10-16  5:58 UTC (permalink / raw)


And mark it inline so that we don't slow down the completion path by
having to turn it into a forced out of line call.

Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/nvme.h | 12 ++++++++++++
 drivers/nvme/host/pci.c  | 12 ------------
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 3e409fa..a4f2f2c 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -80,6 +80,18 @@ static inline u64 nvme_block_nr(struct nvme_ns *ns, sector_t sector)
 	return (sector >> (ns->lba_shift - 9));
 }
 
+static inline int nvme_error_status(u16 status)
+{
+	switch (status & 0x7ff) {
+	case NVME_SC_SUCCESS:
+		return 0;
+	case NVME_SC_CAP_EXCEEDED:
+		return -ENOSPC;
+	default:
+		return -EIO;
+	}
+}
+
 int nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
 		void *buf, unsigned bufflen);
 int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index dfe914e..7ed7dcb 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -543,18 +543,6 @@ static void nvme_free_iod(struct nvme_dev *dev, struct nvme_iod *iod)
 		kfree(iod);
 }
 
-static int nvme_error_status(u16 status)
-{
-	switch (status & 0x7ff) {
-	case NVME_SC_SUCCESS:
-		return 0;
-	case NVME_SC_CAP_EXCEEDED:
-		return -ENOSPC;
-	default:
-		return -EIO;
-	}
-}
-
 #ifdef CONFIG_BLK_DEV_INTEGRITY
 static void nvme_dif_prep(u32 p, u32 v, struct t10_pi_tuple *pi)
 {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 09/18] nvme: move nvme_error_status to common code
  2015-10-16  5:58 ` [PATCH 09/18] nvme: move nvme_error_status to common code Christoph Hellwig
@ 2015-10-20 10:54   ` Sagi Grimberg
  0 siblings, 0 replies; 59+ messages in thread
From: Sagi Grimberg @ 2015-10-20 10:54 UTC (permalink / raw)


On 10/16/2015 8:58 AM, Christoph Hellwig wrote:
> And mark it inline so that we don't slow down the completion path by
> having to turn it into a forced out of line call.
>
> Signed-off-by: Christoph Hellwig <hch at lst.de>

Looks good,

Reviewed-by: Sagi Grimberg <sagig at mellanox.com>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 10/18] nvme: move nvme_setup_flush and nvme_setup_rw to common code
  2015-10-16  5:58 nvme driver split V2 Christoph Hellwig
                   ` (8 preceding siblings ...)
  2015-10-16  5:58 ` [PATCH 09/18] nvme: move nvme_error_status to common code Christoph Hellwig
@ 2015-10-16  5:58 ` Christoph Hellwig
  2015-10-20 11:01   ` Sagi Grimberg
  2015-10-16  5:58 ` [PATCH 11/18] nvme: split __nvme_submit_sync_cmd Christoph Hellwig
                   ` (7 subsequent siblings)
  17 siblings, 1 reply; 59+ messages in thread
From: Christoph Hellwig @ 2015-10-16  5:58 UTC (permalink / raw)


And mark them inline so that we don't slow down the I/O submission path by
having to turn it into a forced out of line call.

Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/nvme.h | 51 ++++++++++++++++++++++++++++++++++++++++++++++++
 drivers/nvme/host/pci.c  | 49 ----------------------------------------------
 2 files changed, 51 insertions(+), 49 deletions(-)

diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index a4f2f2c..6c77db7 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -80,6 +80,57 @@ static inline u64 nvme_block_nr(struct nvme_ns *ns, sector_t sector)
 	return (sector >> (ns->lba_shift - 9));
 }
 
+static inline void nvme_setup_flush(struct nvme_ns *ns,
+		struct nvme_command *cmnd)
+{
+	memset(cmnd, 0, sizeof(*cmnd));
+	cmnd->common.opcode = nvme_cmd_flush;
+	cmnd->common.nsid = cpu_to_le32(ns->ns_id);
+}
+
+static inline void nvme_setup_rw(struct nvme_ns *ns, struct request *req,
+		struct nvme_command *cmnd)
+{
+	u16 control = 0;
+	u32 dsmgmt = 0;
+
+	if (req->cmd_flags & REQ_FUA)
+		control |= NVME_RW_FUA;
+	if (req->cmd_flags & (REQ_FAILFAST_DEV | REQ_RAHEAD))
+		control |= NVME_RW_LR;
+
+	if (req->cmd_flags & REQ_RAHEAD)
+		dsmgmt |= NVME_RW_DSM_FREQ_PREFETCH;
+
+	memset(cmnd, 0, sizeof(*cmnd));
+	cmnd->rw.opcode = (rq_data_dir(req) ? nvme_cmd_write : nvme_cmd_read);
+	cmnd->rw.command_id = req->tag;
+	cmnd->rw.nsid = cpu_to_le32(ns->ns_id);
+	cmnd->rw.slba = cpu_to_le64(nvme_block_nr(ns, blk_rq_pos(req)));
+	cmnd->rw.length = cpu_to_le16((blk_rq_bytes(req) >> ns->lba_shift) - 1);
+
+	if (ns->ms) {
+		switch (ns->pi_type) {
+		case NVME_NS_DPS_PI_TYPE3:
+			control |= NVME_RW_PRINFO_PRCHK_GUARD;
+			break;
+		case NVME_NS_DPS_PI_TYPE1:
+		case NVME_NS_DPS_PI_TYPE2:
+			control |= NVME_RW_PRINFO_PRCHK_GUARD |
+					NVME_RW_PRINFO_PRCHK_REF;
+			cmnd->rw.reftag = cpu_to_le32(
+					nvme_block_nr(ns, blk_rq_pos(req)));
+			break;
+		}
+		if (!blk_integrity_rq(req))
+			control |= NVME_RW_PRINFO_PRACT;
+	}
+
+	cmnd->rw.control = cpu_to_le16(control);
+	cmnd->rw.dsmgmt = cpu_to_le32(dsmgmt);
+}
+
+
 static inline int nvme_error_status(u16 status)
 {
 	switch (status & 0x7ff) {
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 7ed7dcb..d742efe 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -866,55 +866,6 @@ static int nvme_setup_discard(struct nvme_queue *nvmeq, struct nvme_ns *ns,
 	return BLK_MQ_RQ_QUEUE_OK;
 }
 
-static void nvme_setup_flush(struct nvme_ns *ns, struct nvme_command *cmnd)
-{
-	memset(cmnd, 0, sizeof(*cmnd));
-	cmnd->common.opcode = nvme_cmd_flush;
-	cmnd->common.nsid = cpu_to_le32(ns->ns_id);
-}
-
-static void nvme_setup_rw(struct nvme_ns *ns, struct request *req,
-		struct nvme_command *cmnd)
-{
-	u16 control = 0;
-	u32 dsmgmt = 0;
-
-	if (req->cmd_flags & REQ_FUA)
-		control |= NVME_RW_FUA;
-	if (req->cmd_flags & (REQ_FAILFAST_DEV | REQ_RAHEAD))
-		control |= NVME_RW_LR;
-
-	if (req->cmd_flags & REQ_RAHEAD)
-		dsmgmt |= NVME_RW_DSM_FREQ_PREFETCH;
-
-	memset(cmnd, 0, sizeof(*cmnd));
-	cmnd->rw.opcode = (rq_data_dir(req) ? nvme_cmd_write : nvme_cmd_read);
-	cmnd->rw.command_id = req->tag;
-	cmnd->rw.nsid = cpu_to_le32(ns->ns_id);
-	cmnd->rw.slba = cpu_to_le64(nvme_block_nr(ns, blk_rq_pos(req)));
-	cmnd->rw.length = cpu_to_le16((blk_rq_bytes(req) >> ns->lba_shift) - 1);
-
-	if (ns->ms) {
-		switch (ns->pi_type) {
-		case NVME_NS_DPS_PI_TYPE3:
-			control |= NVME_RW_PRINFO_PRCHK_GUARD;
-			break;
-		case NVME_NS_DPS_PI_TYPE1:
-		case NVME_NS_DPS_PI_TYPE2:
-			control |= NVME_RW_PRINFO_PRCHK_GUARD |
-					NVME_RW_PRINFO_PRCHK_REF;
-			cmnd->rw.reftag = cpu_to_le32(
-					nvme_block_nr(ns, blk_rq_pos(req)));
-			break;
-		}
-		if (!blk_integrity_rq(req))
-			control |= NVME_RW_PRINFO_PRACT;
-	}
-
-	cmnd->rw.control = cpu_to_le16(control);
-	cmnd->rw.dsmgmt = cpu_to_le32(dsmgmt);
-}
-
 /*
  * NOTE: ns is NULL when called on the admin queue.
  */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 10/18] nvme: move nvme_setup_flush and nvme_setup_rw to common code
  2015-10-16  5:58 ` [PATCH 10/18] nvme: move nvme_setup_flush and nvme_setup_rw " Christoph Hellwig
@ 2015-10-20 11:01   ` Sagi Grimberg
  2015-10-21  6:55     ` Christoph Hellwig
  0 siblings, 1 reply; 59+ messages in thread
From: Sagi Grimberg @ 2015-10-20 11:01 UTC (permalink / raw)


On 10/16/2015 8:58 AM, Christoph Hellwig wrote:
> And mark them inline so that we don't slow down the I/O submission path by
> having to turn it into a forced out of line call.
>
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---
>   drivers/nvme/host/nvme.h | 51 ++++++++++++++++++++++++++++++++++++++++++++++++
>   drivers/nvme/host/pci.c  | 49 ----------------------------------------------
>   2 files changed, 51 insertions(+), 49 deletions(-)
>
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index a4f2f2c..6c77db7 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -80,6 +80,57 @@ static inline u64 nvme_block_nr(struct nvme_ns *ns, sector_t sector)
>   	return (sector >> (ns->lba_shift - 9));
>   }
>
> +static inline void nvme_setup_flush(struct nvme_ns *ns,
> +		struct nvme_command *cmnd)
> +{
> +	memset(cmnd, 0, sizeof(*cmnd));
> +	cmnd->common.opcode = nvme_cmd_flush;
> +	cmnd->common.nsid = cpu_to_le32(ns->ns_id);
> +}
> +
> +static inline void nvme_setup_rw(struct nvme_ns *ns, struct request *req,
> +		struct nvme_command *cmnd)
> +{
> +	u16 control = 0;
> +	u32 dsmgmt = 0;
> +
> +	if (req->cmd_flags & REQ_FUA)
> +		control |= NVME_RW_FUA;
> +	if (req->cmd_flags & (REQ_FAILFAST_DEV | REQ_RAHEAD))
> +		control |= NVME_RW_LR;
> +
> +	if (req->cmd_flags & REQ_RAHEAD)
> +		dsmgmt |= NVME_RW_DSM_FREQ_PREFETCH;
> +
> +	memset(cmnd, 0, sizeof(*cmnd));
> +	cmnd->rw.opcode = (rq_data_dir(req) ? nvme_cmd_write : nvme_cmd_read);
> +	cmnd->rw.command_id = req->tag;
> +	cmnd->rw.nsid = cpu_to_le32(ns->ns_id);
> +	cmnd->rw.slba = cpu_to_le64(nvme_block_nr(ns, blk_rq_pos(req)));
> +	cmnd->rw.length = cpu_to_le16((blk_rq_bytes(req) >> ns->lba_shift) - 1);
> +
> +	if (ns->ms) {
> +		switch (ns->pi_type) {
> +		case NVME_NS_DPS_PI_TYPE3:
> +			control |= NVME_RW_PRINFO_PRCHK_GUARD;
> +			break;
> +		case NVME_NS_DPS_PI_TYPE1:
> +		case NVME_NS_DPS_PI_TYPE2:
> +			control |= NVME_RW_PRINFO_PRCHK_GUARD |
> +					NVME_RW_PRINFO_PRCHK_REF;
> +			cmnd->rw.reftag = cpu_to_le32(
> +					nvme_block_nr(ns, blk_rq_pos(req)));
> +			break;
> +		}
> +		if (!blk_integrity_rq(req))
> +			control |= NVME_RW_PRINFO_PRACT;
> +	}
> +
> +	cmnd->rw.control = cpu_to_le16(control);
> +	cmnd->rw.dsmgmt = cpu_to_le32(dsmgmt);
> +}
> +
> +

Hi Christoph,

I do agree that making these static inline can speed things up here,
but the coding style documentation asks to avoid inline'ing functions
longer than a few lines of code (See Documentation/CodingStyle Chapter
15: "The inline disease").

Do you think this case qualifies as an exception?

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 10/18] nvme: move nvme_setup_flush and nvme_setup_rw to common code
  2015-10-20 11:01   ` Sagi Grimberg
@ 2015-10-21  6:55     ` Christoph Hellwig
  2015-10-21 14:41       ` Sagi Grimberg
  0 siblings, 1 reply; 59+ messages in thread
From: Christoph Hellwig @ 2015-10-21  6:55 UTC (permalink / raw)

On Tue, Oct 20, 2015@02:01:06PM +0300, Sagi Grimberg wrote:
> I do agree that making these static inline can speed things up here,
> but the coding style documentation asks to avoid inline'ing functions
> longer than a few lines of code (See Documentation/CodingStyle Chapter
> 15: "The inline disease").
>
> Do you think this case qualifies as an exception?

Yes.  The inline is only used once per NVMe transport driver and it's
used in the absolute fast path in a place where being able to optimize
the assignments inside and outside the function call will become useful.
Also all the code is trivial assignments and simple conditionals so it's
actually pretty small in terms of generated code.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 10/18] nvme: move nvme_setup_flush and nvme_setup_rw to common code
  2015-10-21  6:55     ` Christoph Hellwig
@ 2015-10-21 14:41       ` Sagi Grimberg
  0 siblings, 0 replies; 59+ messages in thread
From: Sagi Grimberg @ 2015-10-21 14:41 UTC (permalink / raw)


On 10/21/2015 9:55 AM, Christoph Hellwig wrote:
> On Tue, Oct 20, 2015@02:01:06PM +0300, Sagi Grimberg wrote:
>> I do agree that making these static inline can speed things up here,
>> but the coding style documentation asks to avoid inline'ing functions
>> longer than a few lines of code (See Documentation/CodingStyle Chapter
>> 15: "The inline disease").
>>
>> Do you think this case qualifies as an exception?
>
> Yes.  The inline is only used once per NVMe transport driver and it's
> used in the absolute fast path in a place where being able to optimize
> the assignments inside and outside the function call will become useful.
> Also all the code is trivial assignments and simple conditionals so it's
> actually pretty small in terms of generated code.
>

I agree it is better to have them inline...

Reviewed-by: Sagi Grimberg <sagig at mellanox.com>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 11/18] nvme: split __nvme_submit_sync_cmd
  2015-10-16  5:58 nvme driver split V2 Christoph Hellwig
                   ` (9 preceding siblings ...)
  2015-10-16  5:58 ` [PATCH 10/18] nvme: move nvme_setup_flush and nvme_setup_rw " Christoph Hellwig
@ 2015-10-16  5:58 ` Christoph Hellwig
  2015-10-16  5:58 ` [PATCH 12/18] nvme: use the block layer for userspace passthrough metadata Christoph Hellwig
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 59+ messages in thread
From: Christoph Hellwig @ 2015-10-16  5:58 UTC (permalink / raw)


Add a separate nvme_submit_user_cmd for commands that directly DMA
to or from userspace.  We'll add metadata support to that soon and
the common version would become too messy.

Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/core.c | 82 ++++++++++++++++++++++++++++++++++--------------
 drivers/nvme/host/nvme.h |  6 ++--
 drivers/nvme/host/pci.c  |  6 ++--
 drivers/nvme/host/scsi.c |  4 +--
 4 files changed, 67 insertions(+), 31 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index e2e8818..9d05df0 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -21,22 +21,15 @@
 
 #include "nvme.h"
 
-/*
- * Returns 0 on success.  If the result is negative, it's a Linux error code;
- * if the result is positive, it's an NVM Express status code
- */
-int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
-		void *buffer, void __user *ubuffer, unsigned bufflen,
-		u32 *result, unsigned timeout)
+static struct request *nvme_alloc_request(struct request_queue *q,
+		struct nvme_command *cmd)
 {
 	bool write = cmd->common.opcode & 1;
-	struct bio *bio = NULL;
 	struct request *req;
-	int ret;
 
 	req = blk_mq_alloc_request(q, write, GFP_KERNEL, false);
 	if (IS_ERR(req))
-		return PTR_ERR(req);
+		return req;
 
 	req->cmd_type = REQ_TYPE_DRV_PRIV;
 	req->cmd_flags |= REQ_FAILFAST_DRIVER;
@@ -44,26 +37,36 @@ int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
 	req->__sector = (sector_t) -1;
 	req->bio = req->biotail = NULL;
 
-	req->timeout = timeout ? timeout : ADMIN_TIMEOUT;
-
 	req->cmd = (unsigned char *)cmd;
 	req->cmd_len = sizeof(struct nvme_command);
 	req->special = (void *)0;
 
+	return req;
+}
+
+/*
+ * Returns 0 on success.  If the result is negative, it's a Linux error code;
+ * if the result is positive, it's an NVM Express status code
+ */
+int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
+		void *buffer, unsigned bufflen, u32 *result, unsigned timeout)
+{
+	struct request *req;
+	int ret;
+
+	req = nvme_alloc_request(q, cmd);
+	if (IS_ERR(req))
+		return PTR_ERR(req);
+
+	req->timeout = timeout ? timeout : ADMIN_TIMEOUT;
+
 	if (buffer && bufflen) {
 		ret = blk_rq_map_kern(q, req, buffer, bufflen, __GFP_WAIT);
 		if (ret)
 			goto out;
-	} else if (ubuffer && bufflen) {
-		ret = blk_rq_map_user(q, req, NULL, ubuffer, bufflen, __GFP_WAIT);
-		if (ret)
-			goto out;
-		bio = req->bio;
 	}
 
 	blk_execute_rq(req->q, NULL, req, 0);
-	if (bio)
-		blk_rq_unmap_user(bio);
 	if (result)
 		*result = (u32)(uintptr_t)req->special;
 	ret = req->errors;
@@ -75,7 +78,40 @@ int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
 int nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
 		void *buffer, unsigned bufflen)
 {
-	return __nvme_submit_sync_cmd(q, cmd, buffer, NULL, bufflen, NULL, 0);
+	return __nvme_submit_sync_cmd(q, cmd, buffer, bufflen, NULL, 0);
+}
+
+int nvme_submit_user_cmd(struct request_queue *q, struct nvme_command *cmd,
+		void __user *ubuffer, unsigned bufflen, u32 *result,
+		unsigned timeout)
+{
+	struct bio *bio = NULL;
+	struct request *req;
+	int ret;
+
+	req = nvme_alloc_request(q, cmd);
+	if (IS_ERR(req))
+		return PTR_ERR(req);
+
+	req->timeout = timeout ? timeout : ADMIN_TIMEOUT;
+
+	if (ubuffer && bufflen) {
+		ret = blk_rq_map_user(q, req, NULL, ubuffer, bufflen,
+				__GFP_WAIT);
+		if (ret)
+			goto out;
+		bio = req->bio;
+	}
+
+	blk_execute_rq(req->q, NULL, req, 0);
+	if (bio)
+		blk_rq_unmap_user(bio);
+	if (result)
+		*result = (u32)(uintptr_t)req->special;
+	ret = req->errors;
+ out:
+	blk_mq_free_request(req);
+	return ret;
 }
 
 int nvme_identify_ctrl(struct nvme_ctrl *dev, struct nvme_id_ctrl **id)
@@ -130,8 +166,7 @@ int nvme_get_features(struct nvme_ctrl *dev, unsigned fid, unsigned nsid,
 	c.features.prp1 = cpu_to_le64(dma_addr);
 	c.features.fid = cpu_to_le32(fid);
 
-	return __nvme_submit_sync_cmd(dev->admin_q, &c, NULL, NULL, 0,
-			result, 0);
+	return __nvme_submit_sync_cmd(dev->admin_q, &c, NULL, 0, result, 0);
 }
 
 int nvme_set_features(struct nvme_ctrl *dev, unsigned fid, unsigned dword11,
@@ -145,8 +180,7 @@ int nvme_set_features(struct nvme_ctrl *dev, unsigned fid, unsigned dword11,
 	c.features.fid = cpu_to_le32(fid);
 	c.features.dword11 = cpu_to_le32(dword11);
 
-	return __nvme_submit_sync_cmd(dev->admin_q, &c, NULL, NULL, 0,
-			result, 0);
+	return __nvme_submit_sync_cmd(dev->admin_q, &c, NULL, 0, result, 0);
 }
 
 int nvme_get_log_page(struct nvme_ctrl *dev, struct nvme_smart_log **log)
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 6c77db7..faa10f0 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -146,8 +146,10 @@ static inline int nvme_error_status(u16 status)
 int nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
 		void *buf, unsigned bufflen);
 int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
-		void *buffer, void __user *ubuffer, unsigned bufflen,
-		u32 *result, unsigned timeout);
+		void *buffer, unsigned bufflen,  u32 *result, unsigned timeout);
+int nvme_submit_user_cmd(struct request_queue *q, struct nvme_command *cmd,
+		void __user *ubuffer, unsigned bufflen, u32 *result,
+		unsigned timeout);
 int nvme_identify_ctrl(struct nvme_ctrl *dev, struct nvme_id_ctrl **id);
 int nvme_identify_ns(struct nvme_ctrl *dev, unsigned nsid,
 		struct nvme_id_ns **id);
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index d742efe..be0ae02 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1677,7 +1677,7 @@ static int nvme_submit_io(struct nvme_ns *ns, struct nvme_user_io __user *uio)
 	c.rw.appmask = cpu_to_le16(io.appmask);
 	c.rw.metadata = cpu_to_le64(meta_dma);
 
-	status = __nvme_submit_sync_cmd(ns->queue, &c, NULL,
+	status = nvme_submit_user_cmd(ns->queue, &c,
 			(void __user *)(uintptr_t)io.addr, length, NULL, 0);
  unmap:
 	if (meta) {
@@ -1719,8 +1719,8 @@ static int nvme_user_cmd(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 	if (cmd.timeout_ms)
 		timeout = msecs_to_jiffies(cmd.timeout_ms);
 
-	status = __nvme_submit_sync_cmd(ns ? ns->queue : ctrl->admin_q, &c,
-			NULL, (void __user *)(uintptr_t)cmd.addr, cmd.data_len,
+	status = nvme_submit_user_cmd(ns ? ns->queue : ctrl->admin_q, &c,
+			(void __user *)(uintptr_t)cmd.addr, cmd.data_len,
 			&cmd.result, timeout);
 	if (status >= 0) {
 		if (put_user(cmd.result, &ucmd->result))
diff --git a/drivers/nvme/host/scsi.c b/drivers/nvme/host/scsi.c
index 00d0bdd..b673fe4 100644
--- a/drivers/nvme/host/scsi.c
+++ b/drivers/nvme/host/scsi.c
@@ -1295,7 +1295,7 @@ static int nvme_trans_send_download_fw_cmd(struct nvme_ns *ns, struct sg_io_hdr
 	c.dlfw.numd = cpu_to_le32((tot_len/BYTES_TO_DWORDS) - 1);
 	c.dlfw.offset = cpu_to_le32(offset/BYTES_TO_DWORDS);
 
-	nvme_sc = __nvme_submit_sync_cmd(ns->ctrl->admin_q, &c, NULL,
+	nvme_sc = nvme_submit_user_cmd(ns->ctrl->admin_q, &c,
 			hdr->dxferp, tot_len, NULL, 0);
 	return nvme_trans_status_code(hdr, nvme_sc);
 }
@@ -1699,7 +1699,7 @@ static int nvme_trans_do_nvme_io(struct nvme_ns *ns, struct sg_io_hdr *hdr,
 			nvme_sc = NVME_SC_LBA_RANGE;
 			break;
 		}
-		nvme_sc = __nvme_submit_sync_cmd(ns->queue, &c, NULL,
+		nvme_sc = nvme_submit_user_cmd(ns->queue, &c,
 				next_mapping_addr, unit_len, NULL, 0);
 		if (nvme_sc)
 			break;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 12/18] nvme: use the block layer for userspace passthrough metadata
  2015-10-16  5:58 nvme driver split V2 Christoph Hellwig
                   ` (10 preceding siblings ...)
  2015-10-16  5:58 ` [PATCH 11/18] nvme: split __nvme_submit_sync_cmd Christoph Hellwig
@ 2015-10-16  5:58 ` Christoph Hellwig
  2015-10-16 20:04   ` Busch, Keith
  2015-10-16  5:58 ` [PATCH 13/18] nvme: move block_device_operations and ns/ctrl freeing to common code Christoph Hellwig
                   ` (5 subsequent siblings)
  17 siblings, 1 reply; 59+ messages in thread
From: Christoph Hellwig @ 2015-10-16  5:58 UTC (permalink / raw)


Use the integrity API to pass through metadata from userspace.  For PI
enabled devices this means that we now validate the reftag, which seems
like an unintentional ommission in the old code.

Thanks to Keith Busch for testing and fixes.

Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/core.c | 78 +++++++++++++++++++++++++++++++++++++++++++-----
 drivers/nvme/host/nvme.h |  4 +++
 drivers/nvme/host/pci.c  | 39 ++++--------------------
 3 files changed, 79 insertions(+), 42 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 9d05df0..6d738b6 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -81,12 +81,16 @@ int nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
 	return __nvme_submit_sync_cmd(q, cmd, buffer, bufflen, NULL, 0);
 }
 
-int nvme_submit_user_cmd(struct request_queue *q, struct nvme_command *cmd,
-		void __user *ubuffer, unsigned bufflen, u32 *result,
-		unsigned timeout)
+int __nvme_submit_user_cmd(struct request_queue *q, struct nvme_command *cmd,
+		void __user *ubuffer, unsigned bufflen,
+		void __user *meta_buffer, unsigned meta_len, u32 meta_seed,
+		u32 *result, unsigned timeout)
 {
-	struct bio *bio = NULL;
+	bool write = cmd->common.opcode & 1;
+	struct nvme_ns *ns = q->queuedata;
 	struct request *req;
+	struct bio *bio = NULL;
+	void *meta = NULL;
 	int ret;
 
 	req = nvme_alloc_request(q, cmd);
@@ -101,19 +105,77 @@ int nvme_submit_user_cmd(struct request_queue *q, struct nvme_command *cmd,
 		if (ret)
 			goto out;
 		bio = req->bio;
+		bio->bi_bdev = bdget_disk(ns->disk, 0);
+		if (!bio->bi_bdev) {
+			ret = -ENODEV;
+			goto out_unmap;
+		}
+
+		if (meta_buffer) {
+			struct bio_integrity_payload *bip;
+
+			meta = kmalloc(meta_len, GFP_KERNEL);
+			if (!meta) {
+				ret = -ENOMEM;
+				goto out_unmap;
+			}
+
+			if (write) {
+				if (copy_from_user(meta, meta_buffer,
+						meta_len)) {
+					ret = -EFAULT;
+					goto out_free_meta;
+				}
+			}
+
+			bip = bio_integrity_alloc(bio, GFP_KERNEL, 1);
+			if (!bip) {
+				ret = -ENOMEM;
+				goto out_free_meta;
+			}
+
+			bip->bip_iter.bi_size = meta_len;
+			bip->bip_iter.bi_sector = meta_seed;
+
+			ret = bio_integrity_add_page(bio, virt_to_page(meta),
+					meta_len, offset_in_page(meta));
+			if (ret != meta_len) {
+				ret = -ENOMEM;
+				goto out_free_meta;
+			}
+		}
 	}
 
-	blk_execute_rq(req->q, NULL, req, 0);
-	if (bio)
-		blk_rq_unmap_user(bio);
+	blk_execute_rq(req->q, ns->disk, req, 0);
+	ret = req->errors;
 	if (result)
 		*result = (u32)(uintptr_t)req->special;
-	ret = req->errors;
+	if (meta && !ret && !write) {
+		if (copy_to_user(meta_buffer, meta, meta_len))
+			ret = -EFAULT;
+	}
+
+ out_free_meta:
+	kfree(meta);
+ out_unmap:
+	if (bio) {
+		if (bio->bi_bdev)
+			bdput(bio->bi_bdev);
+		blk_rq_unmap_user(bio);
+	}
  out:
 	blk_mq_free_request(req);
 	return ret;
 }
 
+int nvme_submit_user_cmd(struct request_queue *q, struct nvme_command *cmd,
+		void __user *ubuffer, unsigned bufflen, u32 *result,
+		unsigned timeout)
+{
+	return __nvme_submit_user_cmd(q, cmd, ubuffer, bufflen, NULL, 0, 0,
+			result, timeout);
+}
+
 int nvme_identify_ctrl(struct nvme_ctrl *dev, struct nvme_id_ctrl **id)
 {
 	struct nvme_command c = { };
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index faa10f0..fefcdff 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -150,6 +150,10 @@ int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
 int nvme_submit_user_cmd(struct request_queue *q, struct nvme_command *cmd,
 		void __user *ubuffer, unsigned bufflen, u32 *result,
 		unsigned timeout);
+int __nvme_submit_user_cmd(struct request_queue *q, struct nvme_command *cmd,
+		void __user *ubuffer, unsigned bufflen,
+		void __user *meta_buffer, unsigned meta_len, u32 meta_seed,
+		u32 *result, unsigned timeout);
 int nvme_identify_ctrl(struct nvme_ctrl *dev, struct nvme_id_ctrl **id);
 int nvme_identify_ns(struct nvme_ctrl *dev, unsigned nsid,
 		struct nvme_id_ns **id);
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index be0ae02..c972213 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1615,13 +1615,9 @@ static int nvme_configure_admin_queue(struct nvme_dev *dev)
 
 static int nvme_submit_io(struct nvme_ns *ns, struct nvme_user_io __user *uio)
 {
-	struct nvme_dev *dev = to_nvme_dev(ns->ctrl);
 	struct nvme_user_io io;
 	struct nvme_command c;
 	unsigned length, meta_len;
-	int status, write;
-	dma_addr_t meta_dma = 0;
-	void *meta = NULL;
 	void __user *metadata;
 
 	if (copy_from_user(&io, uio, sizeof(io)))
@@ -1639,29 +1635,13 @@ static int nvme_submit_io(struct nvme_ns *ns, struct nvme_user_io __user *uio)
 	length = (io.nblocks + 1) << ns->lba_shift;
 	meta_len = (io.nblocks + 1) * ns->ms;
 	metadata = (void __user *)(uintptr_t)io.metadata;
-	write = io.opcode & 1;
 
 	if (ns->ext) {
 		length += meta_len;
 		meta_len = 0;
-	}
-	if (meta_len) {
-		if (((io.metadata & 3) || !io.metadata) && !ns->ext)
+	} else if (meta_len) {
+		if ((io.metadata & 3) || !io.metadata)
 			return -EINVAL;
-
-		meta = dma_alloc_coherent(dev->dev, meta_len,
-						&meta_dma, GFP_KERNEL);
-
-		if (!meta) {
-			status = -ENOMEM;
-			goto unmap;
-		}
-		if (write) {
-			if (copy_from_user(meta, metadata, meta_len)) {
-				status = -EFAULT;
-				goto unmap;
-			}
-		}
 	}
 
 	memset(&c, 0, sizeof(c));
@@ -1675,19 +1655,10 @@ static int nvme_submit_io(struct nvme_ns *ns, struct nvme_user_io __user *uio)
 	c.rw.reftag = cpu_to_le32(io.reftag);
 	c.rw.apptag = cpu_to_le16(io.apptag);
 	c.rw.appmask = cpu_to_le16(io.appmask);
-	c.rw.metadata = cpu_to_le64(meta_dma);
 
-	status = nvme_submit_user_cmd(ns->queue, &c,
-			(void __user *)(uintptr_t)io.addr, length, NULL, 0);
- unmap:
-	if (meta) {
-		if (status == NVME_SC_SUCCESS && !write) {
-			if (copy_to_user(metadata, meta, meta_len))
-				status = -EFAULT;
-		}
-		dma_free_coherent(dev->dev, meta_len, meta, meta_dma);
-	}
-	return status;
+	return __nvme_submit_user_cmd(ns->queue, &c,
+			(void __user *)(uintptr_t)io.addr, length,
+			metadata, meta_len, io.slba, NULL, 0);
 }
 
 static int nvme_user_cmd(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 12/18] nvme: use the block layer for userspace passthrough metadata
  2015-10-16  5:58 ` [PATCH 12/18] nvme: use the block layer for userspace passthrough metadata Christoph Hellwig
@ 2015-10-16 20:04   ` Busch, Keith
  2015-10-18 18:22     ` Christoph Hellwig
  0 siblings, 1 reply; 59+ messages in thread
From: Busch, Keith @ 2015-10-16 20:04 UTC (permalink / raw)


On Fri, Oct 16, 2015@07:58:42AM +0200, Christoph Hellwig wrote:
> Use the integrity API to pass through metadata from userspace.  For PI
> enabled devices this means that we now validate the reftag, which seems
> like an unintentional ommission in the old code.

More trouble. The below works for IO but not admin. There is no
"namespace" associated with the queue's queuedata, so there's no way to
send passthrough commands to the admin queue with metadata. I don't think
anyone cares as there is no admin command that carries such a payload.

Since the admin queuedata is NULL, we can just check for NULL before
proceeding with metadata setup, but need to be aware of this if the
admin queue ever does define queuedata.

My fault for not regression testing before posting the previous fix. The
diff to correct this new issue is at the end of this email.

> @@ -101,19 +105,77 @@ int nvme_submit_user_cmd(struct request_queue *q, struct nvme_command *cmd,
>  		if (ret)
>  			goto out;
>  		bio = req->bio;
> +		bio->bi_bdev = bdget_disk(ns->disk, 0);
> +		if (!bio->bi_bdev) {
> +			ret = -ENODEV;
> +			goto out_unmap;
> +		}
> +
> +		if (meta_buffer) {
> +			struct bio_integrity_payload *bip;
> +
> +			meta = kmalloc(meta_len, GFP_KERNEL);
> +			if (!meta) {
> +				ret = -ENOMEM;
> +				goto out_unmap;
> +			}
> +
> +			if (write) {
> +				if (copy_from_user(meta, meta_buffer,
> +						meta_len)) {
> +					ret = -EFAULT;
> +					goto out_free_meta;
> +				}
> +			}
> +
> +			bip = bio_integrity_alloc(bio, GFP_KERNEL, 1);
> +			if (!bip) {
> +				ret = -ENOMEM;
> +				goto out_free_meta;
> +			}
> +
> +			bip->bip_iter.bi_size = meta_len;
> +			bip->bip_iter.bi_sector = meta_seed;
> +
> +			ret = bio_integrity_add_page(bio, virt_to_page(meta),
> +					meta_len, offset_in_page(meta));
> +			if (ret != meta_len) {
> +				ret = -ENOMEM;
> +				goto out_free_meta;
> +			}
> +		}
>  	}
>  
> -	blk_execute_rq(req->q, NULL, req, 0);
> -	if (bio)
> -		blk_rq_unmap_user(bio);
> +	blk_execute_rq(req->q, ns->disk, req, 0);
> +	ret = req->errors;
>  	if (result)
>  		*result = (u32)(uintptr_t)req->special;
> -	ret = req->errors;
> +	if (meta && !ret && !write) {
> +		if (copy_to_user(meta_buffer, meta, meta_len))
> +			ret = -EFAULT;
> +	}
> +
> + out_free_meta:
> +	kfree(meta);
> + out_unmap:
> +	if (bio) {
> +		if (bio->bi_bdev)
> +			bdput(bio->bi_bdev);
> +		blk_rq_unmap_user(bio);
> +	}
>   out:
>  	blk_mq_free_request(req);
>  	return ret;
>  }

---
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index b8c72d2..c8f9a71 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -156,6 +156,10 @@ int __nvme_submit_user_cmd(struct request_queue *q, struct nvme_command *cmd,
 		if (ret)
 			goto out;
 		bio = req->bio;
+
+		if (!ns)
+			goto submit;
+
 		bio->bi_bdev = bdget_disk(ns->disk, 0);
 		if (!bio->bi_bdev) {
 			ret = -ENODEV;
@@ -196,8 +200,8 @@ int __nvme_submit_user_cmd(struct request_queue *q, struct nvme_command *cmd,
 			}
 		}
 	}
-
-	blk_execute_rq(req->q, ns->disk, req, 0);
+ submit:
+	blk_execute_rq(req->q, ns ? ns->disk : NULL, req, 0);
 	ret = req->errors;
 	if (result)
 		*result = (u32)(uintptr_t)req->special;
--

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 12/18] nvme: use the block layer for userspace passthrough metadata
  2015-10-16 20:04   ` Busch, Keith
@ 2015-10-18 18:22     ` Christoph Hellwig
  0 siblings, 0 replies; 59+ messages in thread
From: Christoph Hellwig @ 2015-10-18 18:22 UTC (permalink / raw)


Thanks Keith, this looks good to me.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 13/18] nvme: move block_device_operations and ns/ctrl freeing to common code
  2015-10-16  5:58 nvme driver split V2 Christoph Hellwig
                   ` (11 preceding siblings ...)
  2015-10-16  5:58 ` [PATCH 12/18] nvme: use the block layer for userspace passthrough metadata Christoph Hellwig
@ 2015-10-16  5:58 ` Christoph Hellwig
  2015-10-16  5:58 ` [PATCH 14/18] nvme: add explicit quirk handling Christoph Hellwig
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 59+ messages in thread
From: Christoph Hellwig @ 2015-10-16  5:58 UTC (permalink / raw)


This moves the block_device_operations over to common code mostly
as-is.  The only change is that the ns and ctrl refcounting got some
small refcounting to have wrappers around the kref_put operations.

A new free_ctrl operation is added to allow the PCI driver to free
it's ressources on the final drop.

Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/core.c | 320 +++++++++++++++++++++++++++++++++++++++++++++++
 drivers/nvme/host/nvme.h |  14 +++
 drivers/nvme/host/pci.c  | 319 ++--------------------------------------------
 3 files changed, 346 insertions(+), 307 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 6d738b6..1b27fd9 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -15,12 +15,50 @@
 #include <linux/blkdev.h>
 #include <linux/blk-mq.h>
 #include <linux/errno.h>
+#include <linux/hdreg.h>
 #include <linux/kernel.h>
 #include <linux/slab.h>
 #include <linux/types.h>
+#include <linux/ptrace.h>
+#include <linux/nvme_ioctl.h>
+#include <linux/t10-pi.h>
+#include <scsi/sg.h>
 
 #include "nvme.h"
 
+DEFINE_SPINLOCK(dev_list_lock);
+
+static void nvme_free_ns(struct kref *kref)
+{
+	struct nvme_ns *ns = container_of(kref, struct nvme_ns, kref);
+
+	spin_lock(&dev_list_lock);
+	ns->disk->private_data = NULL;
+	spin_unlock(&dev_list_lock);
+
+	nvme_put_ctrl(ns->ctrl);
+	put_disk(ns->disk);
+	kfree(ns);
+}
+
+void nvme_put_ns(struct nvme_ns *ns)
+{
+	kref_put(&ns->kref, nvme_free_ns);
+}
+
+static struct nvme_ns *nvme_get_ns_from_disk(struct gendisk *disk)
+{
+	struct nvme_ns *ns;
+
+	spin_lock(&dev_list_lock);
+	ns = disk->private_data;
+	if (ns && !kref_get_unless_zero(&ns->kref))
+		ns = NULL;
+	spin_unlock(&dev_list_lock);
+
+	return ns;
+}
+
 static struct request *nvme_alloc_request(struct request_queue *q,
 		struct nvme_command *cmd)
 {
@@ -266,3 +304,285 @@ int nvme_get_log_page(struct nvme_ctrl *dev, struct nvme_smart_log **log)
 		kfree(*log);
 	return error;
 }
+
+static int nvme_submit_io(struct nvme_ns *ns, struct nvme_user_io __user *uio)
+{
+	struct nvme_user_io io;
+	struct nvme_command c;
+	unsigned length, meta_len;
+	void __user *metadata;
+
+	if (copy_from_user(&io, uio, sizeof(io)))
+		return -EFAULT;
+
+	switch (io.opcode) {
+	case nvme_cmd_write:
+	case nvme_cmd_read:
+	case nvme_cmd_compare:
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	length = (io.nblocks + 1) << ns->lba_shift;
+	meta_len = (io.nblocks + 1) * ns->ms;
+	metadata = (void __user *)(uintptr_t)io.metadata;
+
+	if (ns->ext) {
+		length += meta_len;
+		meta_len = 0;
+	} else if (meta_len) {
+		if ((io.metadata & 3) || !io.metadata)
+			return -EINVAL;
+	}
+
+	memset(&c, 0, sizeof(c));
+	c.rw.opcode = io.opcode;
+	c.rw.flags = io.flags;
+	c.rw.nsid = cpu_to_le32(ns->ns_id);
+	c.rw.slba = cpu_to_le64(io.slba);
+	c.rw.length = cpu_to_le16(io.nblocks);
+	c.rw.control = cpu_to_le16(io.control);
+	c.rw.dsmgmt = cpu_to_le32(io.dsmgmt);
+	c.rw.reftag = cpu_to_le32(io.reftag);
+	c.rw.apptag = cpu_to_le16(io.apptag);
+	c.rw.appmask = cpu_to_le16(io.appmask);
+
+	return __nvme_submit_user_cmd(ns->queue, &c,
+			(void __user *)(uintptr_t)io.addr, length,
+			metadata, meta_len, io.slba, NULL, 0);
+}
+
+int nvme_user_cmd(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
+			struct nvme_passthru_cmd __user *ucmd)
+{
+	struct nvme_passthru_cmd cmd;
+	struct nvme_command c;
+	unsigned timeout = 0;
+	int status;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EACCES;
+	if (copy_from_user(&cmd, ucmd, sizeof(cmd)))
+		return -EFAULT;
+
+	memset(&c, 0, sizeof(c));
+	c.common.opcode = cmd.opcode;
+	c.common.flags = cmd.flags;
+	c.common.nsid = cpu_to_le32(cmd.nsid);
+	c.common.cdw2[0] = cpu_to_le32(cmd.cdw2);
+	c.common.cdw2[1] = cpu_to_le32(cmd.cdw3);
+	c.common.cdw10[0] = cpu_to_le32(cmd.cdw10);
+	c.common.cdw10[1] = cpu_to_le32(cmd.cdw11);
+	c.common.cdw10[2] = cpu_to_le32(cmd.cdw12);
+	c.common.cdw10[3] = cpu_to_le32(cmd.cdw13);
+	c.common.cdw10[4] = cpu_to_le32(cmd.cdw14);
+	c.common.cdw10[5] = cpu_to_le32(cmd.cdw15);
+
+	if (cmd.timeout_ms)
+		timeout = msecs_to_jiffies(cmd.timeout_ms);
+
+	status = nvme_submit_user_cmd(ns ? ns->queue : ctrl->admin_q, &c,
+			(void __user *)cmd.addr, cmd.data_len,
+			&cmd.result, timeout);
+	if (status >= 0) {
+		if (put_user(cmd.result, &ucmd->result))
+			return -EFAULT;
+	}
+
+	return status;
+}
+
+static int nvme_ioctl(struct block_device *bdev, fmode_t mode,
+		unsigned int cmd, unsigned long arg)
+{
+	struct nvme_ns *ns = bdev->bd_disk->private_data;
+
+	switch (cmd) {
+	case NVME_IOCTL_ID:
+		force_successful_syscall_return();
+		return ns->ns_id;
+	case NVME_IOCTL_ADMIN_CMD:
+		return nvme_user_cmd(ns->ctrl, NULL, (void __user *)arg);
+	case NVME_IOCTL_IO_CMD:
+		return nvme_user_cmd(ns->ctrl, ns, (void __user *)arg);
+	case NVME_IOCTL_SUBMIT_IO:
+		return nvme_submit_io(ns, (void __user *)arg);
+	case SG_GET_VERSION_NUM:
+		return nvme_sg_get_version_num((void __user *)arg);
+	case SG_IO:
+		return nvme_sg_io(ns, (void __user *)arg);
+	default:
+		return -ENOTTY;
+	}
+}
+
+#ifdef CONFIG_COMPAT
+static int nvme_compat_ioctl(struct block_device *bdev, fmode_t mode,
+			unsigned int cmd, unsigned long arg)
+{
+	switch (cmd) {
+	case SG_IO:
+		return -ENOIOCTLCMD;
+	}
+	return nvme_ioctl(bdev, mode, cmd, arg);
+}
+#else
+#define nvme_compat_ioctl	NULL
+#endif
+
+static int nvme_open(struct block_device *bdev, fmode_t mode)
+{
+	return nvme_get_ns_from_disk(bdev->bd_disk) ? 0 : -ENXIO;
+}
+
+static void nvme_release(struct gendisk *disk, fmode_t mode)
+{
+	nvme_put_ns(disk->private_data);
+}
+
+static int nvme_getgeo(struct block_device *bdev, struct hd_geometry *geo)
+{
+	/* some standard values */
+	geo->heads = 1 << 6;
+	geo->sectors = 1 << 5;
+	geo->cylinders = get_capacity(bdev->bd_disk) >> 11;
+	return 0;
+}
+
+#ifdef CONFIG_BLK_DEV_INTEGRITY
+static int nvme_noop_verify(struct blk_integrity_iter *iter)
+{
+	return 0;
+}
+
+static int nvme_noop_generate(struct blk_integrity_iter *iter)
+{
+	return 0;
+}
+
+static struct blk_integrity nvme_meta_noop = {
+	.name			= "NVME_META_NOOP",
+	.generate_fn		= nvme_noop_generate,
+	.verify_fn		= nvme_noop_verify,
+};
+
+static void nvme_init_integrity(struct nvme_ns *ns)
+{
+	struct blk_integrity integrity;
+
+	switch (ns->pi_type) {
+	case NVME_NS_DPS_PI_TYPE3:
+		integrity = t10_pi_type3_crc;
+		break;
+	case NVME_NS_DPS_PI_TYPE1:
+	case NVME_NS_DPS_PI_TYPE2:
+		integrity = t10_pi_type1_crc;
+		break;
+	default:
+		integrity = nvme_meta_noop;
+		break;
+	}
+	integrity.tuple_size = ns->ms;
+	blk_integrity_register(ns->disk, &integrity);
+	blk_queue_max_integrity_segments(ns->queue, 1);
+}
+#else
+static void nvme_init_integrity(struct nvme_ns *ns)
+{
+}
+#endif /* CONFIG_BLK_DEV_INTEGRITY */
+
+static void nvme_config_discard(struct nvme_ns *ns)
+{
+	u32 logical_block_size = queue_logical_block_size(ns->queue);
+	ns->queue->limits.discard_zeroes_data = 0;
+	ns->queue->limits.discard_alignment = logical_block_size;
+	ns->queue->limits.discard_granularity = logical_block_size;
+	blk_queue_max_discard_sectors(ns->queue, 0xffffffff);
+	queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, ns->queue);
+}
+
+int nvme_revalidate_disk(struct gendisk *disk)
+{
+	struct nvme_ns *ns = disk->private_data;
+	struct nvme_id_ns *id;
+	u8 lbaf, pi_type;
+	u16 old_ms;
+	unsigned short bs;
+
+	if (nvme_identify_ns(ns->ctrl, ns->ns_id, &id)) {
+		dev_warn(ns->ctrl->dev, "%s: Identify failure nvme%dn%d\n",
+				__func__, ns->ctrl->instance, ns->ns_id);
+		return -ENODEV;
+	}
+	if (id->ncap == 0) {
+		kfree(id);
+		return -ENODEV;
+	}
+
+	old_ms = ns->ms;
+	lbaf = id->flbas & NVME_NS_FLBAS_LBA_MASK;
+	ns->lba_shift = id->lbaf[lbaf].ds;
+	ns->ms = le16_to_cpu(id->lbaf[lbaf].ms);
+	ns->ext = ns->ms && (id->flbas & NVME_NS_FLBAS_META_EXT);
+
+	/*
+	 * If identify namespace failed, use default 512 byte block size so
+	 * block layer can use before failing read/write for 0 capacity.
+	 */
+	if (ns->lba_shift == 0)
+		ns->lba_shift = 9;
+	bs = 1 << ns->lba_shift;
+
+	/* XXX: PI implementation requires metadata equal t10 pi tuple size */
+	pi_type = ns->ms == sizeof(struct t10_pi_tuple) ?
+					id->dps & NVME_NS_DPS_PI_MASK : 0;
+
+	if (blk_get_integrity(disk) && (ns->pi_type != pi_type ||
+				ns->ms != old_ms ||
+				bs != queue_logical_block_size(disk->queue) ||
+				(ns->ms && ns->ext)))
+		blk_integrity_unregister(disk);
+
+	ns->pi_type = pi_type;
+	blk_queue_logical_block_size(ns->queue, bs);
+
+	if (ns->ms && !blk_get_integrity(disk) && (disk->flags & GENHD_FL_UP) &&
+								!ns->ext)
+		nvme_init_integrity(ns);
+
+	if (ns->ms && !(ns->ms == 8 && ns->pi_type) && !blk_get_integrity(disk))
+		set_capacity(disk, 0);
+	else
+		set_capacity(disk, le64_to_cpup(&id->nsze) << (ns->lba_shift - 9));
+
+	if (ns->ctrl->oncs & NVME_CTRL_ONCS_DSM)
+		nvme_config_discard(ns);
+
+	kfree(id);
+	return 0;
+}
+
+const struct block_device_operations nvme_fops = {
+	.owner		= THIS_MODULE,
+	.ioctl		= nvme_ioctl,
+	.compat_ioctl	= nvme_compat_ioctl,
+	.open		= nvme_open,
+	.release	= nvme_release,
+	.getgeo		= nvme_getgeo,
+	.revalidate_disk= nvme_revalidate_disk,
+};
+
+static void nvme_free_ctrl(struct kref *kref)
+{
+	struct nvme_ctrl *ctrl = container_of(kref, struct nvme_ctrl, kref);
+
+	ctrl->ops->free_ctrl(ctrl);
+}
+
+void nvme_put_ctrl(struct nvme_ctrl *ctrl)
+{
+	kref_put(&ctrl->kref, nvme_free_ctrl);
+}
+
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index fefcdff..d3a6c97 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -19,6 +19,8 @@
 #include <linux/kref.h>
 #include <linux/blk-mq.h>
 
+struct nvme_passthru_cmd;
+
 extern unsigned char nvme_io_timeout;
 #define NVME_IO_TIMEOUT	(nvme_io_timeout * HZ)
 
@@ -29,6 +31,7 @@ struct nvme_ctrl {
 	const struct nvme_ctrl_ops *ops;
 	struct request_queue *admin_q;
 	struct device *dev;
+	struct kref kref;
 	int instance;
 
 	char name[12];
@@ -64,6 +67,7 @@ struct nvme_ns {
 
 struct nvme_ctrl_ops {
 	int (*reg_read32)(struct nvme_ctrl *ctrl, u32 off, u32 *val);
+	void (*free_ctrl)(struct nvme_ctrl *ctrl);
 };
 
 static inline bool nvme_ctrl_ready(struct nvme_ctrl *ctrl)
@@ -143,6 +147,9 @@ static inline int nvme_error_status(u16 status)
 	}
 }
 
+void nvme_put_ctrl(struct nvme_ctrl *ctrl);
+void nvme_put_ns(struct nvme_ns *ns);
+
 int nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
 		void *buf, unsigned bufflen);
 int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
@@ -163,6 +170,13 @@ int nvme_get_features(struct nvme_ctrl *dev, unsigned fid, unsigned nsid,
 int nvme_set_features(struct nvme_ctrl *dev, unsigned fid, unsigned dword11,
 			dma_addr_t dma_addr, u32 *result);
 
+extern const struct block_device_operations nvme_fops;
+extern spinlock_t dev_list_lock;
+
+int nvme_revalidate_disk(struct gendisk *disk);
+int nvme_user_cmd(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
+			struct nvme_passthru_cmd __user *ucmd);
+
 struct sg_io_hdr;
 
 int nvme_sg_io(struct nvme_ns *ns, struct sg_io_hdr __user *u_hdr);
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index c972213..e6b3992 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -77,7 +77,6 @@ static bool use_cmb_sqes = true;
 module_param(use_cmb_sqes, bool, 0644);
 MODULE_PARM_DESC(use_cmb_sqes, "use controller's memory buffer for I/O SQes");
 
-static DEFINE_SPINLOCK(dev_list_lock);
 static LIST_HEAD(dev_list);
 static struct task_struct *nvme_thread;
 static struct workqueue_struct *nvme_workq;
@@ -123,7 +122,6 @@ struct nvme_dev {
 	struct msix_entry *entry;
 	void __iomem *bar;
 	struct list_head namespaces;
-	struct kref kref;
 	struct device *device;
 	struct work_struct reset_work;
 	struct work_struct probe_work;
@@ -597,43 +595,6 @@ static void nvme_dif_remap(struct request *req,
 	}
 	kunmap_atomic(pmap);
 }
-
-static int nvme_noop_verify(struct blk_integrity_iter *iter)
-{
-	return 0;
-}
-
-static int nvme_noop_generate(struct blk_integrity_iter *iter)
-{
-	return 0;
-}
-
-struct blk_integrity nvme_meta_noop = {
-	.name			= "NVME_META_NOOP",
-	.generate_fn		= nvme_noop_generate,
-	.verify_fn		= nvme_noop_verify,
-};
-
-static void nvme_init_integrity(struct nvme_ns *ns)
-{
-	struct blk_integrity integrity;
-
-	switch (ns->pi_type) {
-	case NVME_NS_DPS_PI_TYPE3:
-		integrity = t10_pi_type3_crc;
-		break;
-	case NVME_NS_DPS_PI_TYPE1:
-	case NVME_NS_DPS_PI_TYPE2:
-		integrity = t10_pi_type1_crc;
-		break;
-	default:
-		integrity = nvme_meta_noop;
-		break;
-	}
-	integrity.tuple_size = ns->ms;
-	blk_integrity_register(ns->disk, &integrity);
-	blk_queue_max_integrity_segments(ns->queue, 1);
-}
 #else /* CONFIG_BLK_DEV_INTEGRITY */
 static void nvme_dif_remap(struct request *req,
 			void (*dif_swap)(u32 p, u32 v, struct t10_pi_tuple *pi))
@@ -645,9 +606,6 @@ static void nvme_dif_prep(u32 p, u32 v, struct t10_pi_tuple *pi)
 static void nvme_dif_complete(u32 p, u32 v, struct t10_pi_tuple *pi)
 {
 }
-static void nvme_init_integrity(struct nvme_ns *ns)
-{
-}
 #endif
 
 static void req_completion(struct nvme_queue *nvmeq, void *ctx,
@@ -1613,94 +1571,6 @@ static int nvme_configure_admin_queue(struct nvme_dev *dev)
 	return result;
 }
 
-static int nvme_submit_io(struct nvme_ns *ns, struct nvme_user_io __user *uio)
-{
-	struct nvme_user_io io;
-	struct nvme_command c;
-	unsigned length, meta_len;
-	void __user *metadata;
-
-	if (copy_from_user(&io, uio, sizeof(io)))
-		return -EFAULT;
-
-	switch (io.opcode) {
-	case nvme_cmd_write:
-	case nvme_cmd_read:
-	case nvme_cmd_compare:
-		break;
-	default:
-		return -EINVAL;
-	}
-
-	length = (io.nblocks + 1) << ns->lba_shift;
-	meta_len = (io.nblocks + 1) * ns->ms;
-	metadata = (void __user *)(uintptr_t)io.metadata;
-
-	if (ns->ext) {
-		length += meta_len;
-		meta_len = 0;
-	} else if (meta_len) {
-		if ((io.metadata & 3) || !io.metadata)
-			return -EINVAL;
-	}
-
-	memset(&c, 0, sizeof(c));
-	c.rw.opcode = io.opcode;
-	c.rw.flags = io.flags;
-	c.rw.nsid = cpu_to_le32(ns->ns_id);
-	c.rw.slba = cpu_to_le64(io.slba);
-	c.rw.length = cpu_to_le16(io.nblocks);
-	c.rw.control = cpu_to_le16(io.control);
-	c.rw.dsmgmt = cpu_to_le32(io.dsmgmt);
-	c.rw.reftag = cpu_to_le32(io.reftag);
-	c.rw.apptag = cpu_to_le16(io.apptag);
-	c.rw.appmask = cpu_to_le16(io.appmask);
-
-	return __nvme_submit_user_cmd(ns->queue, &c,
-			(void __user *)(uintptr_t)io.addr, length,
-			metadata, meta_len, io.slba, NULL, 0);
-}
-
-static int nvme_user_cmd(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
-			struct nvme_passthru_cmd __user *ucmd)
-{
-	struct nvme_passthru_cmd cmd;
-	struct nvme_command c;
-	unsigned timeout = 0;
-	int status;
-
-	if (!capable(CAP_SYS_ADMIN))
-		return -EACCES;
-	if (copy_from_user(&cmd, ucmd, sizeof(cmd)))
-		return -EFAULT;
-
-	memset(&c, 0, sizeof(c));
-	c.common.opcode = cmd.opcode;
-	c.common.flags = cmd.flags;
-	c.common.nsid = cpu_to_le32(cmd.nsid);
-	c.common.cdw2[0] = cpu_to_le32(cmd.cdw2);
-	c.common.cdw2[1] = cpu_to_le32(cmd.cdw3);
-	c.common.cdw10[0] = cpu_to_le32(cmd.cdw10);
-	c.common.cdw10[1] = cpu_to_le32(cmd.cdw11);
-	c.common.cdw10[2] = cpu_to_le32(cmd.cdw12);
-	c.common.cdw10[3] = cpu_to_le32(cmd.cdw13);
-	c.common.cdw10[4] = cpu_to_le32(cmd.cdw14);
-	c.common.cdw10[5] = cpu_to_le32(cmd.cdw15);
-
-	if (cmd.timeout_ms)
-		timeout = msecs_to_jiffies(cmd.timeout_ms);
-
-	status = nvme_submit_user_cmd(ns ? ns->queue : ctrl->admin_q, &c,
-			(void __user *)(uintptr_t)cmd.addr, cmd.data_len,
-			&cmd.result, timeout);
-	if (status >= 0) {
-		if (put_user(cmd.result, &ucmd->result))
-			return -EFAULT;
-	}
-
-	return status;
-}
-
 static int nvme_subsys_reset(struct nvme_dev *dev)
 {
 	if (!dev->subsystem)
@@ -1710,172 +1580,6 @@ static int nvme_subsys_reset(struct nvme_dev *dev)
 	return 0;
 }
 
-static int nvme_ioctl(struct block_device *bdev, fmode_t mode, unsigned int cmd,
-							unsigned long arg)
-{
-	struct nvme_ns *ns = bdev->bd_disk->private_data;
-
-	switch (cmd) {
-	case NVME_IOCTL_ID:
-		force_successful_syscall_return();
-		return ns->ns_id;
-	case NVME_IOCTL_ADMIN_CMD:
-		return nvme_user_cmd(ns->ctrl, NULL, (void __user *)arg);
-	case NVME_IOCTL_IO_CMD:
-		return nvme_user_cmd(ns->ctrl, ns, (void __user *)arg);
-	case NVME_IOCTL_SUBMIT_IO:
-		return nvme_submit_io(ns, (void __user *)arg);
-	case SG_GET_VERSION_NUM:
-		return nvme_sg_get_version_num((void __user *)arg);
-	case SG_IO:
-		return nvme_sg_io(ns, (void __user *)arg);
-	default:
-		return -ENOTTY;
-	}
-}
-
-#ifdef CONFIG_COMPAT
-static int nvme_compat_ioctl(struct block_device *bdev, fmode_t mode,
-					unsigned int cmd, unsigned long arg)
-{
-	switch (cmd) {
-	case SG_IO:
-		return -ENOIOCTLCMD;
-	}
-	return nvme_ioctl(bdev, mode, cmd, arg);
-}
-#else
-#define nvme_compat_ioctl	NULL
-#endif
-
-static void nvme_free_dev(struct kref *kref);
-static void nvme_free_ns(struct kref *kref)
-{
-	struct nvme_ns *ns = container_of(kref, struct nvme_ns, kref);
-	struct nvme_dev *dev = to_nvme_dev(ns->ctrl);
-
-	spin_lock(&dev_list_lock);
-	ns->disk->private_data = NULL;
-	spin_unlock(&dev_list_lock);
-
-	kref_put(&dev->kref, nvme_free_dev);
-	put_disk(ns->disk);
-	kfree(ns);
-}
-
-static int nvme_open(struct block_device *bdev, fmode_t mode)
-{
-	int ret = 0;
-	struct nvme_ns *ns;
-
-	spin_lock(&dev_list_lock);
-	ns = bdev->bd_disk->private_data;
-	if (!ns)
-		ret = -ENXIO;
-	else if (!kref_get_unless_zero(&ns->kref))
-		ret = -ENXIO;
-	spin_unlock(&dev_list_lock);
-
-	return ret;
-}
-
-static void nvme_release(struct gendisk *disk, fmode_t mode)
-{
-	struct nvme_ns *ns = disk->private_data;
-	kref_put(&ns->kref, nvme_free_ns);
-}
-
-static int nvme_getgeo(struct block_device *bd, struct hd_geometry *geo)
-{
-	/* some standard values */
-	geo->heads = 1 << 6;
-	geo->sectors = 1 << 5;
-	geo->cylinders = get_capacity(bd->bd_disk) >> 11;
-	return 0;
-}
-
-static void nvme_config_discard(struct nvme_ns *ns)
-{
-	u32 logical_block_size = queue_logical_block_size(ns->queue);
-	ns->queue->limits.discard_zeroes_data = 0;
-	ns->queue->limits.discard_alignment = logical_block_size;
-	ns->queue->limits.discard_granularity = logical_block_size;
-	blk_queue_max_discard_sectors(ns->queue, 0xffffffff);
-	queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, ns->queue);
-}
-
-static int nvme_revalidate_disk(struct gendisk *disk)
-{
-	struct nvme_ns *ns = disk->private_data;
-	struct nvme_dev *dev = to_nvme_dev(ns->ctrl);
-	struct nvme_id_ns *id;
-	u8 lbaf, pi_type;
-	u16 old_ms;
-	unsigned short bs;
-
-	if (nvme_identify_ns(&dev->ctrl, ns->ns_id, &id)) {
-		dev_warn(dev->dev, "%s: Identify failure nvme%dn%d\n", __func__,
-						dev->ctrl.instance, ns->ns_id);
-		return -ENODEV;
-	}
-	if (id->ncap == 0) {
-		kfree(id);
-		return -ENODEV;
-	}
-
-	old_ms = ns->ms;
-	lbaf = id->flbas & NVME_NS_FLBAS_LBA_MASK;
-	ns->lba_shift = id->lbaf[lbaf].ds;
-	ns->ms = le16_to_cpu(id->lbaf[lbaf].ms);
-	ns->ext = ns->ms && (id->flbas & NVME_NS_FLBAS_META_EXT);
-
-	/*
-	 * If identify namespace failed, use default 512 byte block size so
-	 * block layer can use before failing read/write for 0 capacity.
-	 */
-	if (ns->lba_shift == 0)
-		ns->lba_shift = 9;
-	bs = 1 << ns->lba_shift;
-
-	/* XXX: PI implementation requires metadata equal t10 pi tuple size */
-	pi_type = ns->ms == sizeof(struct t10_pi_tuple) ?
-					id->dps & NVME_NS_DPS_PI_MASK : 0;
-
-	if (blk_get_integrity(disk) && (ns->pi_type != pi_type ||
-				ns->ms != old_ms ||
-				bs != queue_logical_block_size(disk->queue) ||
-				(ns->ms && ns->ext)))
-		blk_integrity_unregister(disk);
-
-	ns->pi_type = pi_type;
-	blk_queue_logical_block_size(ns->queue, bs);
-
-	if (ns->ms && !blk_get_integrity(disk) && (disk->flags & GENHD_FL_UP) &&
-								!ns->ext)
-		nvme_init_integrity(ns);
-
-	if (ns->ms && !(ns->ms == 8 && ns->pi_type) && !blk_get_integrity(disk))
-		set_capacity(disk, 0);
-	else
-		set_capacity(disk, le64_to_cpup(&id->nsze) << (ns->lba_shift - 9));
-
-	if (dev->ctrl.oncs & NVME_CTRL_ONCS_DSM)
-		nvme_config_discard(ns);
-
-	kfree(id);
-	return 0;
-}
-
-static const struct block_device_operations nvme_fops = {
-	.owner		= THIS_MODULE,
-	.ioctl		= nvme_ioctl,
-	.compat_ioctl	= nvme_compat_ioctl,
-	.open		= nvme_open,
-	.release	= nvme_release,
-	.getgeo		= nvme_getgeo,
-	.revalidate_disk= nvme_revalidate_disk,
-};
-
 static int nvme_kthread(void *data)
 {
 	struct nvme_dev *dev, *next;
@@ -1976,7 +1680,7 @@ static void nvme_alloc_ns(struct nvme_dev *dev, unsigned nsid)
 	if (nvme_revalidate_disk(ns->disk))
 		goto out_free_disk;
 
-	kref_get(&dev->kref);
+	kref_get(&dev->ctrl.kref);
 	add_disk(ns->disk);
 	if (ns->ms) {
 		struct block_device *bd = bdget_disk(ns->disk, 0);
@@ -2226,7 +1930,7 @@ static void nvme_ns_remove(struct nvme_ns *ns)
 		blk_cleanup_queue(ns->queue);
 	}
 	list_del_init(&ns->list);
-	kref_put(&ns->kref, nvme_free_ns);
+	nvme_put_ns(ns);
 }
 
 static void nvme_scan_namespaces(struct nvme_dev *dev, unsigned nn)
@@ -2692,9 +2396,9 @@ static void nvme_release_instance(struct nvme_dev *dev)
 	spin_unlock(&dev_list_lock);
 }
 
-static void nvme_free_dev(struct kref *kref)
+static void nvme_pci_free_ctrl(struct nvme_ctrl *ctrl)
 {
-	struct nvme_dev *dev = container_of(kref, struct nvme_dev, kref);
+	struct nvme_dev *dev = to_nvme_dev(ctrl);
 
 	put_device(dev->dev);
 	put_device(dev->device);
@@ -2721,7 +2425,7 @@ static int nvme_dev_open(struct inode *inode, struct file *f)
 				ret = -EWOULDBLOCK;
 				break;
 			}
-			if (!kref_get_unless_zero(&dev->kref))
+			if (!kref_get_unless_zero(&dev->ctrl.kref))
 				break;
 			f->private_data = dev;
 			ret = 0;
@@ -2736,7 +2440,7 @@ static int nvme_dev_open(struct inode *inode, struct file *f)
 static int nvme_dev_release(struct inode *inode, struct file *f)
 {
 	struct nvme_dev *dev = f->private_data;
-	kref_put(&dev->kref, nvme_free_dev);
+	nvme_put_ctrl(&dev->ctrl);
 	return 0;
 }
 
@@ -2851,19 +2555,19 @@ static int nvme_remove_dead_ctrl(void *arg)
 
 	if (pci_get_drvdata(pdev))
 		pci_stop_and_remove_bus_device_locked(pdev);
-	kref_put(&dev->kref, nvme_free_dev);
+	nvme_put_ctrl(&dev->ctrl);
 	return 0;
 }
 
 static void nvme_dead_ctrl(struct nvme_dev *dev)
 {
 	dev_warn(dev->dev, "Device failed to resume\n");
-	kref_get(&dev->kref);
+	kref_get(&dev->ctrl.kref);
 	if (IS_ERR(kthread_run(nvme_remove_dead_ctrl, dev, "nvme%d",
 						dev->ctrl.instance))) {
 		dev_err(dev->dev,
 			"Failed to start controller remove task\n");
-		kref_put(&dev->kref, nvme_free_dev);
+		nvme_put_ctrl(&dev->ctrl);
 	}
 }
 
@@ -2941,6 +2645,7 @@ static int nvme_pci_reg_read32(struct nvme_ctrl *ctrl, u32 off, u32 *val)
 
 static const struct nvme_ctrl_ops nvme_pci_ctrl_ops = {
 	.reg_read32		= nvme_pci_reg_read32,
+	.free_ctrl		= nvme_pci_free_ctrl,
 };
 
 static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
@@ -2981,7 +2686,7 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	if (result)
 		goto release;
 
-	kref_init(&dev->kref);
+	kref_init(&dev->ctrl.kref);
 	dev->device = device_create(nvme_class, &pdev->dev,
 				MKDEV(nvme_char_major, dev->ctrl.instance),
 				dev, "nvme%d", dev->ctrl.instance);
@@ -3054,7 +2759,7 @@ static void nvme_remove(struct pci_dev *pdev)
 	nvme_free_queues(dev, 0);
 	nvme_release_cmb(dev);
 	nvme_release_prp_pools(dev);
-	kref_put(&dev->kref, nvme_free_dev);
+	nvme_put_ctrl(&dev->ctrl);
 }
 
 /* These functions are yet to be implemented */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 14/18] nvme: add explicit quirk handling
  2015-10-16  5:58 nvme driver split V2 Christoph Hellwig
                   ` (12 preceding siblings ...)
  2015-10-16  5:58 ` [PATCH 13/18] nvme: move block_device_operations and ns/ctrl freeing to common code Christoph Hellwig
@ 2015-10-16  5:58 ` Christoph Hellwig
  2015-10-16  5:58 ` [PATCH 15/18] nvme: add a common helper to read Identify Controller data Christoph Hellwig
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 59+ messages in thread
From: Christoph Hellwig @ 2015-10-16  5:58 UTC (permalink / raw)


Add an enum for all workarounds not in the spec and identify the affected
controllers at probe time.

Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/nvme.h | 13 +++++++++++++
 drivers/nvme/host/pci.c  |  8 +++++---
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index d3a6c97..1ebd0da 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -27,6 +27,18 @@ extern unsigned char nvme_io_timeout;
 extern unsigned char admin_timeout;
 #define ADMIN_TIMEOUT	(admin_timeout * HZ)
 
+/*
+ * List of workarounds for devices that required behavior not specified in
+ * the standard.
+ */
+enum nvme_quirks {
+	/*
+	 * Prefers I/O aligned to a stripe size specified in a vendor
+	 * specific Identify field.
+	 */
+	NVME_QUIRK_STRIPE_SIZE			= (1 << 0),
+};
+
 struct nvme_ctrl {
 	const struct nvme_ctrl_ops *ops;
 	struct request_queue *admin_q;
@@ -43,6 +55,7 @@ struct nvme_ctrl {
 	u8 event_limit;
 	u8 vwc;
 	u16 vendor;
+	unsigned long quirks;
 };
 
 /*
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index e6b3992..2c13d7a 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1991,7 +1991,6 @@ static void nvme_dev_scan(struct work_struct *work)
  */
 static int nvme_dev_add(struct nvme_dev *dev)
 {
-	struct pci_dev *pdev = to_pci_dev(dev->dev);
 	int res;
 	struct nvme_id_ctrl *ctrl;
 	int shift = NVME_CAP_MPSMIN(readq(dev->bar + NVME_REG_CAP)) + 12;
@@ -2010,8 +2009,8 @@ static int nvme_dev_add(struct nvme_dev *dev)
 	memcpy(dev->ctrl.firmware_rev, ctrl->fr, sizeof(ctrl->fr));
 	if (ctrl->mdts)
 		dev->max_hw_sectors = 1 << (ctrl->mdts + shift - 9);
-	if ((pdev->vendor == PCI_VENDOR_ID_INTEL) &&
-			(pdev->device == 0x0953) && ctrl->vs[3]) {
+
+	if ((dev->ctrl.quirks & NVME_QUIRK_STRIPE_SIZE) && ctrl->vs[3]) {
 		unsigned int max_hw_sectors;
 
 		dev->stripe_size = 1 << (ctrl->vs[3] + shift);
@@ -2677,6 +2676,7 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	dev->ctrl.vendor = pdev->vendor;
 	dev->ctrl.ops = &nvme_pci_ctrl_ops;
 	dev->ctrl.dev = dev->dev;
+	dev->ctrl.quirks = id->driver_data;
 
 	result = nvme_set_instance(dev);
 	if (result)
@@ -2804,6 +2804,8 @@ static const struct pci_error_handlers nvme_err_handler = {
 #define PCI_CLASS_STORAGE_EXPRESS	0x010802
 
 static const struct pci_device_id nvme_id_table[] = {
+	{ PCI_VDEVICE(INTEL, 0x0953),
+		.driver_data = NVME_QUIRK_STRIPE_SIZE, },
 	{ PCI_DEVICE_CLASS(PCI_CLASS_STORAGE_EXPRESS, 0xffffff) },
 	{ 0, }
 };
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 15/18] nvme: add a common helper to read Identify Controller data
  2015-10-16  5:58 nvme driver split V2 Christoph Hellwig
                   ` (13 preceding siblings ...)
  2015-10-16  5:58 ` [PATCH 14/18] nvme: add explicit quirk handling Christoph Hellwig
@ 2015-10-16  5:58 ` Christoph Hellwig
  2015-10-21 22:44   ` J Freyensee
  2015-10-16  5:58 ` [PATCH 16/18] nvme: move the call to nvme_init_identify earlier Christoph Hellwig
                   ` (2 subsequent siblings)
  17 siblings, 1 reply; 59+ messages in thread
From: Christoph Hellwig @ 2015-10-16  5:58 UTC (permalink / raw)


And add the 64-bit register read operation for it.

Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/core.c | 50 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/nvme/host/nvme.h |  4 ++++
 drivers/nvme/host/pci.c  | 53 +++++++++++++++---------------------------------
 3 files changed, 70 insertions(+), 37 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 1b27fd9..3e88901 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -574,6 +574,56 @@ const struct block_device_operations nvme_fops = {
 	.revalidate_disk= nvme_revalidate_disk,
 };
 
+/*
+ * Initialize the cached copies of the Identify data and various controller
+ * register in our nvme_ctrl structure.  This should be called as soon as
+ * the admin queue is fully up and running.
+ */
+int nvme_init_identify(struct nvme_ctrl *ctrl)
+{
+	struct nvme_id_ctrl *id;
+	u64 cap;
+	int ret, page_shift;
+
+	ret = ctrl->ops->reg_read64(ctrl, NVME_REG_CAP, &cap);
+	if (ret) {
+		dev_err(ctrl->dev, "Reading CAP failed (%d)\n", ret);
+		return ret;
+	}
+	page_shift = NVME_CAP_MPSMIN(cap) + 12;
+
+	ret = nvme_identify_ctrl(ctrl, &id);
+	if (ret) {
+		dev_err(ctrl->dev, "Identify Controller failed (%d)\n", ret);
+		return -EIO;
+	}
+
+	ctrl->oncs = le16_to_cpup(&id->oncs);
+	ctrl->abort_limit = id->acl + 1;
+	ctrl->vwc = id->vwc;
+	memcpy(ctrl->serial, id->sn, sizeof(id->sn));
+	memcpy(ctrl->model, id->mn, sizeof(id->mn));
+	memcpy(ctrl->firmware_rev, id->fr, sizeof(id->fr));
+	if (id->mdts)
+		ctrl->max_hw_sectors = 1 << (id->mdts + page_shift - 9);
+
+	if ((ctrl->quirks & NVME_QUIRK_STRIPE_SIZE) && id->vs[3]) {
+		unsigned int max_hw_sectors;
+
+		ctrl->stripe_size = 1 << (id->vs[3] + page_shift);
+		max_hw_sectors = ctrl->stripe_size >> (page_shift - 9);
+		if (ctrl->max_hw_sectors) {
+			ctrl->max_hw_sectors = min(max_hw_sectors,
+							ctrl->max_hw_sectors);
+		} else {
+			ctrl->max_hw_sectors = max_hw_sectors;
+		}
+	}
+
+	kfree(id);
+	return 0;
+}
+
 static void nvme_free_ctrl(struct kref *kref)
 {
 	struct nvme_ctrl *ctrl = container_of(kref, struct nvme_ctrl, kref);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 1ebd0da..48b221a 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -50,6 +50,8 @@ struct nvme_ctrl {
 	char serial[20];
 	char model[40];
 	char firmware_rev[8];
+	u32 max_hw_sectors;
+	u32 stripe_size;
 	u16 oncs;
 	u16 abort_limit;
 	u8 event_limit;
@@ -80,6 +82,7 @@ struct nvme_ns {
 
 struct nvme_ctrl_ops {
 	int (*reg_read32)(struct nvme_ctrl *ctrl, u32 off, u32 *val);
+	int (*reg_read64)(struct nvme_ctrl *ctrl, u32 off, u64 *val);
 	void (*free_ctrl)(struct nvme_ctrl *ctrl);
 };
 
@@ -161,6 +164,7 @@ static inline int nvme_error_status(u16 status)
 }
 
 void nvme_put_ctrl(struct nvme_ctrl *ctrl);
+int nvme_init_identify(struct nvme_ctrl *ctrl);
 void nvme_put_ns(struct nvme_ns *ns);
 
 int nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 2c13d7a..7a25d6f 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -127,8 +127,6 @@ struct nvme_dev {
 	struct work_struct probe_work;
 	struct work_struct scan_work;
 	bool subsystem;
-	u32 max_hw_sectors;
-	u32 stripe_size;
 	u32 page_size;
 	void __iomem *cmb;
 	dma_addr_t cmb_dma_addr;
@@ -1650,13 +1648,13 @@ static void nvme_alloc_ns(struct nvme_dev *dev, unsigned nsid)
 	list_add_tail(&ns->list, &dev->namespaces);
 
 	blk_queue_logical_block_size(ns->queue, 1 << ns->lba_shift);
-	if (dev->max_hw_sectors) {
-		blk_queue_max_hw_sectors(ns->queue, dev->max_hw_sectors);
+	if (dev->ctrl.max_hw_sectors) {
+		blk_queue_max_hw_sectors(ns->queue, dev->ctrl.max_hw_sectors);
 		blk_queue_max_segments(ns->queue,
-			((dev->max_hw_sectors << 9) / dev->page_size) + 1);
+			((dev->ctrl.max_hw_sectors << 9) / dev->page_size) + 1);
 	}
-	if (dev->stripe_size)
-		blk_queue_chunk_sectors(ns->queue, dev->stripe_size >> 9);
+	if (dev->ctrl.stripe_size)
+		blk_queue_chunk_sectors(ns->queue, dev->ctrl.stripe_size >> 9);
 	if (dev->ctrl.vwc & NVME_CTRL_VWC_PRESENT)
 		blk_queue_flush(ns->queue, REQ_FLUSH | REQ_FUA);
 	blk_queue_virt_boundary(ns->queue, dev->page_size - 1);
@@ -1992,36 +1990,10 @@ static void nvme_dev_scan(struct work_struct *work)
 static int nvme_dev_add(struct nvme_dev *dev)
 {
 	int res;
-	struct nvme_id_ctrl *ctrl;
-	int shift = NVME_CAP_MPSMIN(readq(dev->bar + NVME_REG_CAP)) + 12;
-
-	res = nvme_identify_ctrl(&dev->ctrl, &ctrl);
-	if (res) {
-		dev_err(dev->dev, "Identify Controller failed (%d)\n", res);
-		return -EIO;
-	}
-
-	dev->ctrl.oncs = le16_to_cpup(&ctrl->oncs);
-	dev->ctrl.abort_limit = ctrl->acl + 1;
-	dev->ctrl.vwc = ctrl->vwc;
-	memcpy(dev->ctrl.serial, ctrl->sn, sizeof(ctrl->sn));
-	memcpy(dev->ctrl.model, ctrl->mn, sizeof(ctrl->mn));
-	memcpy(dev->ctrl.firmware_rev, ctrl->fr, sizeof(ctrl->fr));
-	if (ctrl->mdts)
-		dev->max_hw_sectors = 1 << (ctrl->mdts + shift - 9);
-
-	if ((dev->ctrl.quirks & NVME_QUIRK_STRIPE_SIZE) && ctrl->vs[3]) {
-		unsigned int max_hw_sectors;
-
-		dev->stripe_size = 1 << (ctrl->vs[3] + shift);
-		max_hw_sectors = dev->stripe_size >> (shift - 9);
-		if (dev->max_hw_sectors) {
-			dev->max_hw_sectors = min(max_hw_sectors,
-							dev->max_hw_sectors);
-		} else
-			dev->max_hw_sectors = max_hw_sectors;
-	}
-	kfree(ctrl);
+
+	res = nvme_init_identify(&dev->ctrl);
+	if (res)
+		return res;
 
 	if (!dev->tagset.tags) {
 		dev->tagset.ops = &nvme_mq_ops;
@@ -2642,8 +2614,15 @@ static int nvme_pci_reg_read32(struct nvme_ctrl *ctrl, u32 off, u32 *val)
 	return 0;
 }
 
+static int nvme_pci_reg_read64(struct nvme_ctrl *ctrl, u32 off, u64 *val)
+{
+	*val = readq(to_nvme_dev(ctrl)->bar + off);
+	return 0;
+}
+
 static const struct nvme_ctrl_ops nvme_pci_ctrl_ops = {
 	.reg_read32		= nvme_pci_reg_read32,
+	.reg_read64		= nvme_pci_reg_read64,
 	.free_ctrl		= nvme_pci_free_ctrl,
 };
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 15/18] nvme: add a common helper to read Identify Controller data
  2015-10-16  5:58 ` [PATCH 15/18] nvme: add a common helper to read Identify Controller data Christoph Hellwig
@ 2015-10-21 22:44   ` J Freyensee
  2015-10-22  7:38     ` Christoph Hellwig
  0 siblings, 1 reply; 59+ messages in thread
From: J Freyensee @ 2015-10-21 22:44 UTC (permalink / raw)


On Fri, 2015-10-16@07:58 +0200, Christoph Hellwig wrote:
> And add the 64-bit register read operation for it.

I apologize, but I am getting tripped up by the subject line of this
patch and then the following description.

To me, this patch sounds and looks like it is doing two distinct,
separate things, thereby two separate patches:

1. Add a helper function to read and cache Identify Controller data
2. Add a 64-bit read register function pointer to nvme_ctrl_ops

I think it was PATCH 6 of this series that introduced "int
(*reg_read32)()".  If it is easier, it would also make sense to also
add "int (*reg_read64)()" there, instead of its own separate patch.
 > 
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---
>  drivers/nvme/host/core.c | 50 
> +++++++++++++++++++++++++++++++++++++++++++++
>  drivers/nvme/host/nvme.h |  4 ++++
>  drivers/nvme/host/pci.c  | 53 +++++++++++++++-----------------------
> ----------
>  3 files changed, 70 insertions(+), 37 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 1b27fd9..3e88901 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -574,6 +574,56 @@ const struct block_device_operations nvme_fops = 
> {
>  	.revalidate_disk= nvme_revalidate_disk,
>  };
>  
> +/*
> + * Initialize the cached copies of the Identify data and various 
> controller
> + * register in our nvme_ctrl structure.  This should be called as 
> soon as
> + * the admin queue is fully up and running.
> + */
> +int nvme_init_identify(struct nvme_ctrl *ctrl)
> +{
> +	struct nvme_id_ctrl *id;
> +	u64 cap;
> +	int ret, page_shift;
> +
> +	ret = ctrl->ops->reg_read64(ctrl, NVME_REG_CAP, &cap);
> +	if (ret) {
> +		dev_err(ctrl->dev, "Reading CAP failed (%d)\n", 
> ret);
> +		return ret;
> +	}
> +	page_shift = NVME_CAP_MPSMIN(cap) + 12;
> +
> +	ret = nvme_identify_ctrl(ctrl, &id);
> +	if (ret) {
> +		dev_err(ctrl->dev, "Identify Controller failed 
> (%d)\n", ret);
> +		return -EIO;
> +	}
> +
> +	ctrl->oncs = le16_to_cpup(&id->oncs);
> +	ctrl->abort_limit = id->acl + 1;
> +	ctrl->vwc = id->vwc;
> +	memcpy(ctrl->serial, id->sn, sizeof(id->sn));
> +	memcpy(ctrl->model, id->mn, sizeof(id->mn));
> +	memcpy(ctrl->firmware_rev, id->fr, sizeof(id->fr));
> +	if (id->mdts)
> +		ctrl->max_hw_sectors = 1 << (id->mdts + page_shift - 
> 9);
> +
> +	if ((ctrl->quirks & NVME_QUIRK_STRIPE_SIZE) && id->vs[3]) {
> +		unsigned int max_hw_sectors;
> +
> +		ctrl->stripe_size = 1 << (id->vs[3] + page_shift);
> +		max_hw_sectors = ctrl->stripe_size >> (page_shift - 
> 9);
> +		if (ctrl->max_hw_sectors) {
> +			ctrl->max_hw_sectors = min(max_hw_sectors,
> +							ctrl
> ->max_hw_sectors);
> +		} else {
> +			ctrl->max_hw_sectors = max_hw_sectors;
> +		}
> +	}
> +
> +	kfree(id);
> +	return 0;
> +}
> +
>  static void nvme_free_ctrl(struct kref *kref)
>  {
>  	struct nvme_ctrl *ctrl = container_of(kref, struct 
> nvme_ctrl, kref);
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index 1ebd0da..48b221a 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -50,6 +50,8 @@ struct nvme_ctrl {
>  	char serial[20];
>  	char model[40];
>  	char firmware_rev[8];
> +	u32 max_hw_sectors;
> +	u32 stripe_size;
>  	u16 oncs;
>  	u16 abort_limit;
>  	u8 event_limit;
> @@ -80,6 +82,7 @@ struct nvme_ns {
>  
>  struct nvme_ctrl_ops {
>  	int (*reg_read32)(struct nvme_ctrl *ctrl, u32 off, u32 
> *val);
> +	int (*reg_read64)(struct nvme_ctrl *ctrl, u32 off, u64 
> *val);
>  	void (*free_ctrl)(struct nvme_ctrl *ctrl);
>  };
>  
> @@ -161,6 +164,7 @@ static inline int nvme_error_status(u16 status)
>  }
>  
>  void nvme_put_ctrl(struct nvme_ctrl *ctrl);
> +int nvme_init_identify(struct nvme_ctrl *ctrl);
>  void nvme_put_ns(struct nvme_ns *ns);
>  
>  int nvme_submit_sync_cmd(struct request_queue *q, struct 
> nvme_command *cmd,
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 2c13d7a..7a25d6f 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -127,8 +127,6 @@ struct nvme_dev {
>  	struct work_struct probe_work;
>  	struct work_struct scan_work;
>  	bool subsystem;
> -	u32 max_hw_sectors;
> -	u32 stripe_size;
>  	u32 page_size;
>  	void __iomem *cmb;
>  	dma_addr_t cmb_dma_addr;
> @@ -1650,13 +1648,13 @@ static void nvme_alloc_ns(struct nvme_dev 
> *dev, unsigned nsid)
>  	list_add_tail(&ns->list, &dev->namespaces);
>  
>  	blk_queue_logical_block_size(ns->queue, 1 << ns->lba_shift);
> -	if (dev->max_hw_sectors) {
> -		blk_queue_max_hw_sectors(ns->queue, dev
> ->max_hw_sectors);
> +	if (dev->ctrl.max_hw_sectors) {
> +		blk_queue_max_hw_sectors(ns->queue, dev
> ->ctrl.max_hw_sectors);
>  		blk_queue_max_segments(ns->queue,
> -			((dev->max_hw_sectors << 9) / dev
> ->page_size) + 1);
> +			((dev->ctrl.max_hw_sectors << 9) / dev
> ->page_size) + 1);
>  	}
> -	if (dev->stripe_size)
> -		blk_queue_chunk_sectors(ns->queue, dev->stripe_size 
> >> 9);
> +	if (dev->ctrl.stripe_size)
> +		blk_queue_chunk_sectors(ns->queue, dev
> ->ctrl.stripe_size >> 9);
>  	if (dev->ctrl.vwc & NVME_CTRL_VWC_PRESENT)
>  		blk_queue_flush(ns->queue, REQ_FLUSH | REQ_FUA);
>  	blk_queue_virt_boundary(ns->queue, dev->page_size - 1);
> @@ -1992,36 +1990,10 @@ static void nvme_dev_scan(struct work_struct 
> *work)
>  static int nvme_dev_add(struct nvme_dev *dev)
>  {
>  	int res;
> -	struct nvme_id_ctrl *ctrl;
> -	int shift = NVME_CAP_MPSMIN(readq(dev->bar + NVME_REG_CAP)) 
> + 12;
> -
> -	res = nvme_identify_ctrl(&dev->ctrl, &ctrl);
> -	if (res) {
> -		dev_err(dev->dev, "Identify Controller failed 
> (%d)\n", res);
> -		return -EIO;
> -	}
> -
> -	dev->ctrl.oncs = le16_to_cpup(&ctrl->oncs);
> -	dev->ctrl.abort_limit = ctrl->acl + 1;
> -	dev->ctrl.vwc = ctrl->vwc;
> -	memcpy(dev->ctrl.serial, ctrl->sn, sizeof(ctrl->sn));
> -	memcpy(dev->ctrl.model, ctrl->mn, sizeof(ctrl->mn));
> -	memcpy(dev->ctrl.firmware_rev, ctrl->fr, sizeof(ctrl->fr));
> -	if (ctrl->mdts)
> -		dev->max_hw_sectors = 1 << (ctrl->mdts + shift - 9);
> -
> -	if ((dev->ctrl.quirks & NVME_QUIRK_STRIPE_SIZE) && ctrl
> ->vs[3]) {
> -		unsigned int max_hw_sectors;
> -
> -		dev->stripe_size = 1 << (ctrl->vs[3] + shift);
> -		max_hw_sectors = dev->stripe_size >> (shift - 9);
> -		if (dev->max_hw_sectors) {
> -			dev->max_hw_sectors = min(max_hw_sectors,
> -							dev
> ->max_hw_sectors);
> -		} else
> -			dev->max_hw_sectors = max_hw_sectors;
> -	}
> -	kfree(ctrl);
> +
> +	res = nvme_init_identify(&dev->ctrl);
> +	if (res)
> +		return res;
>  
>  	if (!dev->tagset.tags) {
>  		dev->tagset.ops = &nvme_mq_ops;
> @@ -2642,8 +2614,15 @@ static int nvme_pci_reg_read32(struct 
> nvme_ctrl *ctrl, u32 off, u32 *val)
>  	return 0;
>  }
>  
> +static int nvme_pci_reg_read64(struct nvme_ctrl *ctrl, u32 off, u64 
> *val)
> +{
> +	*val = readq(to_nvme_dev(ctrl)->bar + off);
> +	return 0;
> +}
> +
>  static const struct nvme_ctrl_ops nvme_pci_ctrl_ops = {
>  	.reg_read32		= nvme_pci_reg_read32,
> +	.reg_read64		= nvme_pci_reg_read64,
>  	.free_ctrl		= nvme_pci_free_ctrl,
>  };
>  

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 15/18] nvme: add a common helper to read Identify Controller data
  2015-10-21 22:44   ` J Freyensee
@ 2015-10-22  7:38     ` Christoph Hellwig
  0 siblings, 0 replies; 59+ messages in thread
From: Christoph Hellwig @ 2015-10-22  7:38 UTC (permalink / raw)


On Wed, Oct 21, 2015@03:44:23PM -0700, J Freyensee wrote:
> On Fri, 2015-10-16@07:58 +0200, Christoph Hellwig wrote:
> > And add the 64-bit register read operation for it.
> 
> I apologize, but I am getting tripped up by the subject line of this
> patch and then the following description.
> 
> To me, this patch sounds and looks like it is doing two distinct,
> separate things, thereby two separate patches:
> 
> 1. Add a helper function to read and cache Identify Controller data
> 2. Add a 64-bit read register function pointer to nvme_ctrl_ops
> 
> I think it was PATCH 6 of this series that introduced "int
> (*reg_read32)()".  If it is easier, it would also make sense to also
> add "int (*reg_read64)()" there, instead of its own separate patch.

Through the whole series I've introduced the abstractions on an as-needed
basis, and this patch follows that style.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 16/18] nvme: move the call to nvme_init_identify earlier
  2015-10-16  5:58 nvme driver split V2 Christoph Hellwig
                   ` (14 preceding siblings ...)
  2015-10-16  5:58 ` [PATCH 15/18] nvme: add a common helper to read Identify Controller data Christoph Hellwig
@ 2015-10-16  5:58 ` Christoph Hellwig
  2015-10-16  5:58 ` [PATCH 17/18] nvme: move namespace scanning to common code Christoph Hellwig
  2015-10-16  5:58 ` [PATCH 18/18] nvme: move chardev and sysfs interface " Christoph Hellwig
  17 siblings, 0 replies; 59+ messages in thread
From: Christoph Hellwig @ 2015-10-16  5:58 UTC (permalink / raw)


We want to record the identify and CAP values even if no I/O queue
is available.

Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/pci.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 7a25d6f..adff1cb 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1989,12 +1989,6 @@ static void nvme_dev_scan(struct work_struct *work)
  */
 static int nvme_dev_add(struct nvme_dev *dev)
 {
-	int res;
-
-	res = nvme_init_identify(&dev->ctrl);
-	if (res)
-		return res;
-
 	if (!dev->tagset.tags) {
 		dev->tagset.ops = &nvme_mq_ops;
 		dev->tagset.nr_hw_queues = dev->online_queues - 1;
@@ -2484,6 +2478,10 @@ static void nvme_probe_work(struct work_struct *work)
 	if (result)
 		goto disable;
 
+	result = nvme_init_identify(&dev->ctrl);
+	if (result)
+		goto free_tags;
+
 	result = nvme_setup_io_queues(dev);
 	if (result)
 		goto free_tags;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 17/18] nvme: move namespace scanning to common code
  2015-10-16  5:58 nvme driver split V2 Christoph Hellwig
                   ` (15 preceding siblings ...)
  2015-10-16  5:58 ` [PATCH 16/18] nvme: move the call to nvme_init_identify earlier Christoph Hellwig
@ 2015-10-16  5:58 ` Christoph Hellwig
  2015-10-16  6:14   ` Ming Lin
  2015-10-21 23:27   ` J Freyensee
  2015-10-16  5:58 ` [PATCH 18/18] nvme: move chardev and sysfs interface " Christoph Hellwig
  17 siblings, 2 replies; 59+ messages in thread
From: Christoph Hellwig @ 2015-10-16  5:58 UTC (permalink / raw)


The namespace scanning code has been mostly generic already, we just
need to store a pointer to the tagset in the nvme_ctrl structure, and
add a method to check if a controller is I/O incapable.  The latter
will hopefully be replaced by a proper controller state machine soon.

Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/core.c | 190 +++++++++++++++++++++++++++++++++++++++-
 drivers/nvme/host/nvme.h |  25 +++++-
 drivers/nvme/host/pci.c  | 221 +++++++----------------------------------------
 3 files changed, 239 insertions(+), 197 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 3e88901..a01ab5a 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -17,6 +17,8 @@
 #include <linux/errno.h>
 #include <linux/hdreg.h>
 #include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/list_sort.h>
 #include <linux/slab.h>
 #include <linux/types.h>
 #include <linux/ptrace.h>
@@ -26,6 +28,9 @@
 
 #include "nvme.h"
 
+static int nvme_major;
+module_param(nvme_major, int, 0);
+
 DEFINE_SPINLOCK(dev_list_lock);
 
 static void nvme_free_ns(struct kref *kref)
@@ -41,7 +46,7 @@ static void nvme_free_ns(struct kref *kref)
 	kfree(ns);
 }
 
-void nvme_put_ns(struct nvme_ns *ns)
+static void nvme_put_ns(struct nvme_ns *ns)
 {
 	kref_put(&ns->kref, nvme_free_ns);
 }
@@ -503,7 +508,7 @@ static void nvme_config_discard(struct nvme_ns *ns)
 	queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, ns->queue);
 }
 
-int nvme_revalidate_disk(struct gendisk *disk)
+static int nvme_revalidate_disk(struct gendisk *disk)
 {
 	struct nvme_ns *ns = disk->private_data;
 	struct nvme_id_ns *id;
@@ -564,7 +569,7 @@ int nvme_revalidate_disk(struct gendisk *disk)
 	return 0;
 }
 
-const struct block_device_operations nvme_fops = {
+static const struct block_device_operations nvme_fops = {
 	.owner		= THIS_MODULE,
 	.ioctl		= nvme_ioctl,
 	.compat_ioctl	= nvme_compat_ioctl,
@@ -591,6 +596,7 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
 		return ret;
 	}
 	page_shift = NVME_CAP_MPSMIN(cap) + 12;
+	ctrl->page_size = 1 << page_shift;
 
 	ret = nvme_identify_ctrl(ctrl, &id);
 	if (ret) {
@@ -636,3 +642,181 @@ void nvme_put_ctrl(struct nvme_ctrl *ctrl)
 	kref_put(&ctrl->kref, nvme_free_ctrl);
 }
 
+static int ns_cmp(void *priv, struct list_head *a, struct list_head *b)
+{
+	struct nvme_ns *nsa = container_of(a, struct nvme_ns, list);
+	struct nvme_ns *nsb = container_of(b, struct nvme_ns, list);
+
+	return nsa->ns_id - nsb->ns_id;
+}
+
+static struct nvme_ns *nvme_find_ns(struct nvme_ctrl *ctrl, unsigned nsid)
+{
+	struct nvme_ns *ns;
+
+	list_for_each_entry(ns, &ctrl->namespaces, list) {
+		if (ns->ns_id == nsid)
+			return ns;
+		if (ns->ns_id > nsid)
+			break;
+	}
+	return NULL;
+}
+
+static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
+{
+	struct nvme_ns *ns;
+	struct gendisk *disk;
+	int node = dev_to_node(ctrl->dev);
+
+	ns = kzalloc_node(sizeof(*ns), GFP_KERNEL, node);
+	if (!ns)
+		return;
+
+	ns->queue = blk_mq_init_queue(ctrl->tagset);
+	if (IS_ERR(ns->queue))
+		goto out_free_ns;
+	queue_flag_set_unlocked(QUEUE_FLAG_NOMERGES, ns->queue);
+	queue_flag_set_unlocked(QUEUE_FLAG_NONROT, ns->queue);
+	ns->queue->queuedata = ns;
+	ns->ctrl = ctrl;
+
+	disk = alloc_disk_node(0, node);
+	if (!disk)
+		goto out_free_queue;
+
+	kref_init(&ns->kref);
+	ns->ns_id = nsid;
+	ns->disk = disk;
+	ns->lba_shift = 9; /* set to a default value for 512 until disk is validated */
+	list_add_tail(&ns->list, &ctrl->namespaces);
+
+	blk_queue_logical_block_size(ns->queue, 1 << ns->lba_shift);
+	if (ctrl->max_hw_sectors) {
+		blk_queue_max_hw_sectors(ns->queue, ctrl->max_hw_sectors);
+		blk_queue_max_segments(ns->queue,
+			((ctrl->max_hw_sectors << 9) / ctrl->page_size) + 1);
+	}
+	if (ctrl->stripe_size)
+		blk_queue_chunk_sectors(ns->queue, ctrl->stripe_size >> 9);
+	if (ctrl->vwc & NVME_CTRL_VWC_PRESENT)
+		blk_queue_flush(ns->queue, REQ_FLUSH | REQ_FUA);
+	blk_queue_virt_boundary(ns->queue, ctrl->page_size - 1);
+
+	disk->major = nvme_major;
+	disk->first_minor = 0;
+	disk->fops = &nvme_fops;
+	disk->private_data = ns;
+	disk->queue = ns->queue;
+	disk->driverfs_dev = ctrl->device;
+	disk->flags = GENHD_FL_EXT_DEVT;
+	sprintf(disk->disk_name, "nvme%dn%d", ctrl->instance, nsid);
+
+	/*
+	 * Initialize capacity to 0 until we establish the namespace format and
+	 * setup integrity extentions if necessary. The revalidate_disk after
+	 * add_disk allows the driver to register with integrity if the format
+	 * requires it.
+	 */
+	set_capacity(disk, 0);
+	if (nvme_revalidate_disk(ns->disk))
+		goto out_free_disk;
+
+	kref_get(&ctrl->kref);
+	add_disk(ns->disk);
+	if (ns->ms) {
+		struct block_device *bd = bdget_disk(ns->disk, 0);
+		if (!bd)
+			return;
+		if (blkdev_get(bd, FMODE_READ, NULL)) {
+			bdput(bd);
+			return;
+		}
+		blkdev_reread_part(bd);
+		blkdev_put(bd, FMODE_READ);
+	}
+	return;
+ out_free_disk:
+	kfree(disk);
+	list_del(&ns->list);
+ out_free_queue:
+	blk_cleanup_queue(ns->queue);
+ out_free_ns:
+	kfree(ns);
+}
+
+static void nvme_ns_remove(struct nvme_ns *ns)
+{
+	bool kill = nvme_io_incapable(ns->ctrl) &&
+			!blk_queue_dying(ns->queue);
+
+	if (kill)
+		blk_set_queue_dying(ns->queue);
+	if (ns->disk->flags & GENHD_FL_UP) {
+		if (blk_get_integrity(ns->disk))
+			blk_integrity_unregister(ns->disk);
+		del_gendisk(ns->disk);
+	}
+	if (kill || !blk_queue_dying(ns->queue)) {
+		blk_mq_abort_requeue_list(ns->queue);
+		blk_cleanup_queue(ns->queue);
+	}
+	list_del_init(&ns->list);
+	nvme_put_ns(ns);
+}
+
+static void __nvme_scan_namespaces(struct nvme_ctrl *ctrl, unsigned nn)
+{
+	struct nvme_ns *ns, *next;
+	unsigned i;
+
+	for (i = 1; i <= nn; i++) {
+		ns = nvme_find_ns(ctrl, i);
+		if (ns) {
+			if (revalidate_disk(ns->disk))
+				nvme_ns_remove(ns);
+		} else
+			nvme_alloc_ns(ctrl, i);
+	}
+	list_for_each_entry_safe(ns, next, &ctrl->namespaces, list) {
+		if (ns->ns_id > nn)
+			nvme_ns_remove(ns);
+	}
+	list_sort(NULL, &ctrl->namespaces, ns_cmp);
+}
+
+void nvme_scan_namespaces(struct nvme_ctrl *ctrl)
+{
+	struct nvme_id_ctrl *id;
+
+	if (nvme_identify_ctrl(ctrl, &id))
+		return;
+	__nvme_scan_namespaces(ctrl, le32_to_cpup(&id->nn));
+	kfree(id);
+}
+
+void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
+{
+	struct nvme_ns *ns, *next;
+
+	list_for_each_entry_safe(ns, next, &ctrl->namespaces, list)
+		nvme_ns_remove(ns);
+}
+
+int __init nvme_core_init(void)
+{
+	int result;
+
+	result = register_blkdev(nvme_major, "nvme");
+	if (result < 0)
+		return result;
+	else if (result > 0)
+		nvme_major = result;
+
+	return 0;
+}
+
+void nvme_core_exit(void)
+{
+	unregister_blkdev(nvme_major, "nvme");
+}
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 48b221a..53e82feb 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -45,6 +45,9 @@ struct nvme_ctrl {
 	struct device *dev;
 	struct kref kref;
 	int instance;
+	struct blk_mq_tag_set *tagset;
+	struct list_head namespaces;
+	struct device *device;	/* char device */
 
 	char name[12];
 	char serial[20];
@@ -52,6 +55,7 @@ struct nvme_ctrl {
 	char firmware_rev[8];
 	u32 max_hw_sectors;
 	u32 stripe_size;
+	u32 page_size;
 	u16 oncs;
 	u16 abort_limit;
 	u8 event_limit;
@@ -83,6 +87,7 @@ struct nvme_ns {
 struct nvme_ctrl_ops {
 	int (*reg_read32)(struct nvme_ctrl *ctrl, u32 off, u32 *val);
 	int (*reg_read64)(struct nvme_ctrl *ctrl, u32 off, u64 *val);
+	bool (*io_incapable)(struct nvme_ctrl *ctrl);
 	void (*free_ctrl)(struct nvme_ctrl *ctrl);
 };
 
@@ -95,6 +100,17 @@ static inline bool nvme_ctrl_ready(struct nvme_ctrl *ctrl)
 	return val & NVME_CSTS_RDY;
 }
 
+static inline bool nvme_io_incapable(struct nvme_ctrl *ctrl)
+{
+	u32 val = 0;
+
+	if (ctrl->ops->io_incapable(ctrl))
+		return false;
+	if (ctrl->ops->reg_read32(ctrl, NVME_REG_CSTS, &val))
+		return false;
+	return val & NVME_CSTS_CFS;
+}
+
 static inline u64 nvme_block_nr(struct nvme_ns *ns, sector_t sector)
 {
 	return (sector >> (ns->lba_shift - 9));
@@ -165,7 +181,9 @@ static inline int nvme_error_status(u16 status)
 
 void nvme_put_ctrl(struct nvme_ctrl *ctrl);
 int nvme_init_identify(struct nvme_ctrl *ctrl);
-void nvme_put_ns(struct nvme_ns *ns);
+
+void nvme_scan_namespaces(struct nvme_ctrl *ctrl);
+void nvme_remove_namespaces(struct nvme_ctrl *ctrl);
 
 int nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
 		void *buf, unsigned bufflen);
@@ -187,10 +205,8 @@ int nvme_get_features(struct nvme_ctrl *dev, unsigned fid, unsigned nsid,
 int nvme_set_features(struct nvme_ctrl *dev, unsigned fid, unsigned dword11,
 			dma_addr_t dma_addr, u32 *result);
 
-extern const struct block_device_operations nvme_fops;
 extern spinlock_t dev_list_lock;
 
-int nvme_revalidate_disk(struct gendisk *disk);
 int nvme_user_cmd(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 			struct nvme_passthru_cmd __user *ucmd);
 
@@ -200,4 +216,7 @@ int nvme_sg_io(struct nvme_ns *ns, struct sg_io_hdr __user *u_hdr);
 int nvme_sg_io32(struct nvme_ns *ns, unsigned long arg);
 int nvme_sg_get_version_num(int __user *ip);
 
+int __init nvme_core_init(void);
+void nvme_core_exit(void);
+
 #endif /* _NVME_H */
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index adff1cb..3d51396 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -28,7 +28,6 @@
 #include <linux/kdev_t.h>
 #include <linux/kthread.h>
 #include <linux/kernel.h>
-#include <linux/list_sort.h>
 #include <linux/mm.h>
 #include <linux/module.h>
 #include <linux/moduleparam.h>
@@ -64,9 +63,6 @@ static unsigned char shutdown_timeout = 5;
 module_param(shutdown_timeout, byte, 0644);
 MODULE_PARM_DESC(shutdown_timeout, "timeout in seconds for controller shutdown");
 
-static int nvme_major;
-module_param(nvme_major, int, 0);
-
 static int nvme_char_major;
 module_param(nvme_char_major, int, 0);
 
@@ -121,8 +117,6 @@ struct nvme_dev {
 	u32 ctrl_config;
 	struct msix_entry *entry;
 	void __iomem *bar;
-	struct list_head namespaces;
-	struct device *device;
 	struct work_struct reset_work;
 	struct work_struct probe_work;
 	struct work_struct scan_work;
@@ -1619,88 +1613,6 @@ static int nvme_kthread(void *data)
 	return 0;
 }
 
-static void nvme_alloc_ns(struct nvme_dev *dev, unsigned nsid)
-{
-	struct nvme_ns *ns;
-	struct gendisk *disk;
-	int node = dev_to_node(dev->dev);
-
-	ns = kzalloc_node(sizeof(*ns), GFP_KERNEL, node);
-	if (!ns)
-		return;
-
-	ns->queue = blk_mq_init_queue(&dev->tagset);
-	if (IS_ERR(ns->queue))
-		goto out_free_ns;
-	queue_flag_set_unlocked(QUEUE_FLAG_NOMERGES, ns->queue);
-	queue_flag_set_unlocked(QUEUE_FLAG_NONROT, ns->queue);
-	ns->ctrl = &dev->ctrl;
-	ns->queue->queuedata = ns;
-
-	disk = alloc_disk_node(0, node);
-	if (!disk)
-		goto out_free_queue;
-
-	kref_init(&ns->kref);
-	ns->ns_id = nsid;
-	ns->disk = disk;
-	ns->lba_shift = 9; /* set to a default value for 512 until disk is validated */
-	list_add_tail(&ns->list, &dev->namespaces);
-
-	blk_queue_logical_block_size(ns->queue, 1 << ns->lba_shift);
-	if (dev->ctrl.max_hw_sectors) {
-		blk_queue_max_hw_sectors(ns->queue, dev->ctrl.max_hw_sectors);
-		blk_queue_max_segments(ns->queue,
-			((dev->ctrl.max_hw_sectors << 9) / dev->page_size) + 1);
-	}
-	if (dev->ctrl.stripe_size)
-		blk_queue_chunk_sectors(ns->queue, dev->ctrl.stripe_size >> 9);
-	if (dev->ctrl.vwc & NVME_CTRL_VWC_PRESENT)
-		blk_queue_flush(ns->queue, REQ_FLUSH | REQ_FUA);
-	blk_queue_virt_boundary(ns->queue, dev->page_size - 1);
-
-	disk->major = nvme_major;
-	disk->first_minor = 0;
-	disk->fops = &nvme_fops;
-	disk->private_data = ns;
-	disk->queue = ns->queue;
-	disk->driverfs_dev = dev->device;
-	disk->flags = GENHD_FL_EXT_DEVT;
-	sprintf(disk->disk_name, "nvme%dn%d", dev->ctrl.instance, nsid);
-
-	/*
-	 * Initialize capacity to 0 until we establish the namespace format and
-	 * setup integrity extentions if necessary. The revalidate_disk after
-	 * add_disk allows the driver to register with integrity if the format
-	 * requires it.
-	 */
-	set_capacity(disk, 0);
-	if (nvme_revalidate_disk(ns->disk))
-		goto out_free_disk;
-
-	kref_get(&dev->ctrl.kref);
-	add_disk(ns->disk);
-	if (ns->ms) {
-		struct block_device *bd = bdget_disk(ns->disk, 0);
-		if (!bd)
-			return;
-		if (blkdev_get(bd, FMODE_READ, NULL)) {
-			bdput(bd);
-			return;
-		}
-		blkdev_reread_part(bd);
-		blkdev_put(bd, FMODE_READ);
-	}
-	return;
- out_free_disk:
-	kfree(disk);
-	list_del(&ns->list);
- out_free_queue:
-	blk_cleanup_queue(ns->queue);
- out_free_ns:
-	kfree(ns);
-}
-
 /*
  * Create I/O queues.  Failing to create an I/O queue is not an issue,
  * we can continue with less than the desired amount of queues, and
@@ -1883,74 +1795,6 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
 	return result;
 }
 
-static int ns_cmp(void *priv, struct list_head *a, struct list_head *b)
-{
-	struct nvme_ns *nsa = container_of(a, struct nvme_ns, list);
-	struct nvme_ns *nsb = container_of(b, struct nvme_ns, list);
-
-	return nsa->ns_id - nsb->ns_id;
-}
-
-static struct nvme_ns *nvme_find_ns(struct nvme_dev *dev, unsigned nsid)
-{
-	struct nvme_ns *ns;
-
-	list_for_each_entry(ns, &dev->namespaces, list) {
-		if (ns->ns_id == nsid)
-			return ns;
-		if (ns->ns_id > nsid)
-			break;
-	}
-	return NULL;
-}
-
-static inline bool nvme_io_incapable(struct nvme_dev *dev)
-{
-	return (!dev->bar ||
-		readl(dev->bar + NVME_REG_CSTS) & NVME_CSTS_CFS ||
-		dev->online_queues < 2);
-}
-
-static void nvme_ns_remove(struct nvme_ns *ns)
-{
-	bool kill = nvme_io_incapable(to_nvme_dev(ns->ctrl)) &&
-			!blk_queue_dying(ns->queue);
-
-	if (kill)
-		blk_set_queue_dying(ns->queue);
-	if (ns->disk->flags & GENHD_FL_UP) {
-		if (blk_get_integrity(ns->disk))
-			blk_integrity_unregister(ns->disk);
-		del_gendisk(ns->disk);
-	}
-	if (kill || !blk_queue_dying(ns->queue)) {
-		blk_mq_abort_requeue_list(ns->queue);
-		blk_cleanup_queue(ns->queue);
-	}
-	list_del_init(&ns->list);
-	nvme_put_ns(ns);
-}
-
-static void nvme_scan_namespaces(struct nvme_dev *dev, unsigned nn)
-{
-	struct nvme_ns *ns, *next;
-	unsigned i;
-
-	for (i = 1; i <= nn; i++) {
-		ns = nvme_find_ns(dev, i);
-		if (ns) {
-			if (revalidate_disk(ns->disk))
-				nvme_ns_remove(ns);
-		} else
-			nvme_alloc_ns(dev, i);
-	}
-	list_for_each_entry_safe(ns, next, &dev->namespaces, list) {
-		if (ns->ns_id > nn)
-			nvme_ns_remove(ns);
-	}
-	list_sort(NULL, &dev->namespaces, ns_cmp);
-}
-
 static void nvme_set_irq_hints(struct nvme_dev *dev)
 {
 	struct nvme_queue *nvmeq;
@@ -1970,14 +1814,10 @@ static void nvme_set_irq_hints(struct nvme_dev *dev)
 static void nvme_dev_scan(struct work_struct *work)
 {
 	struct nvme_dev *dev = container_of(work, struct nvme_dev, scan_work);
-	struct nvme_id_ctrl *ctrl;
 
 	if (!dev->tagset.tags)
 		return;
-	if (nvme_identify_ctrl(&dev->ctrl, &ctrl))
-		return;
-	nvme_scan_namespaces(dev, le32_to_cpup(&ctrl->nn));
-	kfree(ctrl);
+	nvme_scan_namespaces(&dev->ctrl);
 	nvme_set_irq_hints(dev);
 }
 
@@ -1989,7 +1829,7 @@ static void nvme_dev_scan(struct work_struct *work)
  */
 static int nvme_dev_add(struct nvme_dev *dev)
 {
-	if (!dev->tagset.tags) {
+	if (!dev->ctrl.tagset) {
 		dev->tagset.ops = &nvme_mq_ops;
 		dev->tagset.nr_hw_queues = dev->online_queues - 1;
 		dev->tagset.timeout = NVME_IO_TIMEOUT;
@@ -2002,6 +1842,7 @@ static int nvme_dev_add(struct nvme_dev *dev)
 
 		if (blk_mq_alloc_tag_set(&dev->tagset))
 			return 0;
+		dev->ctrl.tagset = &dev->tagset;
 	}
 	schedule_work(&dev->scan_work);
 	return 0;
@@ -2250,7 +2091,7 @@ static void nvme_freeze_queues(struct nvme_dev *dev)
 {
 	struct nvme_ns *ns;
 
-	list_for_each_entry(ns, &dev->namespaces, list) {
+	list_for_each_entry(ns, &dev->ctrl.namespaces, list) {
 		blk_mq_freeze_queue_start(ns->queue);
 
 		spin_lock_irq(ns->queue->queue_lock);
@@ -2266,7 +2107,7 @@ static void nvme_unfreeze_queues(struct nvme_dev *dev)
 {
 	struct nvme_ns *ns;
 
-	list_for_each_entry(ns, &dev->namespaces, list) {
+	list_for_each_entry(ns, &dev->ctrl.namespaces, list) {
 		queue_flag_clear_unlocked(QUEUE_FLAG_STOPPED, ns->queue);
 		blk_mq_unfreeze_queue(ns->queue);
 		blk_mq_start_stopped_hw_queues(ns->queue, true);
@@ -2301,14 +2142,6 @@ static void nvme_dev_shutdown(struct nvme_dev *dev)
 		nvme_clear_queue(dev->queues[i]);
 }
 
-static void nvme_dev_remove(struct nvme_dev *dev)
-{
-	struct nvme_ns *ns, *next;
-
-	list_for_each_entry_safe(ns, next, &dev->namespaces, list)
-		nvme_ns_remove(ns);
-}
-
 static int nvme_setup_prp_pools(struct nvme_dev *dev)
 {
 	dev->prp_page_pool = dma_pool_create("prp list page", dev->dev,
@@ -2366,7 +2199,7 @@ static void nvme_pci_free_ctrl(struct nvme_ctrl *ctrl)
 	struct nvme_dev *dev = to_nvme_dev(ctrl);
 
 	put_device(dev->dev);
-	put_device(dev->device);
+	put_device(ctrl->device);
 	nvme_release_instance(dev);
 	if (dev->tagset.tags)
 		blk_mq_free_tag_set(&dev->tagset);
@@ -2418,9 +2251,9 @@ static long nvme_dev_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
 	case NVME_IOCTL_ADMIN_CMD:
 		return nvme_user_cmd(&dev->ctrl, NULL, (void __user *)arg);
 	case NVME_IOCTL_IO_CMD:
-		if (list_empty(&dev->namespaces))
+		if (list_empty(&dev->ctrl.namespaces))
 			return -ENOTTY;
-		ns = list_first_entry(&dev->namespaces, struct nvme_ns, list);
+		ns = list_first_entry(&dev->ctrl.namespaces, struct nvme_ns, list);
 		return nvme_user_cmd(&dev->ctrl, ns, (void __user *)arg);
 	case NVME_IOCTL_RESET:
 		dev_warn(dev->dev, "resetting controller\n");
@@ -2494,7 +2327,7 @@ static void nvme_probe_work(struct work_struct *work)
 	 */
 	if (dev->online_queues < 2) {
 		dev_warn(dev->dev, "IO queues not created\n");
-		nvme_dev_remove(dev);
+		nvme_remove_namespaces(&dev->ctrl);
 	} else {
 		nvme_unfreeze_queues(dev);
 		nvme_dev_add(dev);
@@ -2618,9 +2451,17 @@ static int nvme_pci_reg_read64(struct nvme_ctrl *ctrl, u32 off, u64 *val)
 	return 0;
 }
 
+static bool nvme_pci_io_incapable(struct nvme_ctrl *ctrl)
+{
+	struct nvme_dev *dev = to_nvme_dev(ctrl);
+
+	return !dev->bar || dev->online_queues < 2;
+}
+
 static const struct nvme_ctrl_ops nvme_pci_ctrl_ops = {
 	.reg_read32		= nvme_pci_reg_read32,
 	.reg_read64		= nvme_pci_reg_read64,
+	.io_incapable		= nvme_pci_io_incapable,
 	.free_ctrl		= nvme_pci_free_ctrl,
 };
 
@@ -2645,7 +2486,7 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	if (!dev->queues)
 		goto free;
 
-	INIT_LIST_HEAD(&dev->namespaces);
+	INIT_LIST_HEAD(&dev->ctrl.namespaces);
 	INIT_WORK(&dev->reset_work, nvme_reset_work);
 	dev->dev = get_device(&pdev->dev);
 	pci_set_drvdata(pdev, dev);
@@ -2664,17 +2505,17 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 		goto release;
 
 	kref_init(&dev->ctrl.kref);
-	dev->device = device_create(nvme_class, &pdev->dev,
+	dev->ctrl.device = device_create(nvme_class, &pdev->dev,
 				MKDEV(nvme_char_major, dev->ctrl.instance),
 				dev, "nvme%d", dev->ctrl.instance);
-	if (IS_ERR(dev->device)) {
-		result = PTR_ERR(dev->device);
+	if (IS_ERR(dev->ctrl.device)) {
+		result = PTR_ERR(dev->ctrl.device);
 		goto release_pools;
 	}
-	get_device(dev->device);
-	dev_set_drvdata(dev->device, dev);
+	get_device(dev->ctrl.device);
+	dev_set_drvdata(dev->ctrl.device, dev);
 
-	result = device_create_file(dev->device, &dev_attr_reset_controller);
+	result = device_create_file(dev->ctrl.device, &dev_attr_reset_controller);
 	if (result)
 		goto put_dev;
 
@@ -2686,7 +2527,7 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 
  put_dev:
 	device_destroy(nvme_class, MKDEV(nvme_char_major, dev->ctrl.instance));
-	put_device(dev->device);
+	put_device(dev->ctrl.device);
  release_pools:
 	nvme_release_prp_pools(dev);
  release:
@@ -2728,8 +2569,8 @@ static void nvme_remove(struct pci_dev *pdev)
 	flush_work(&dev->probe_work);
 	flush_work(&dev->reset_work);
 	flush_work(&dev->scan_work);
-	device_remove_file(dev->device, &dev_attr_reset_controller);
-	nvme_dev_remove(dev);
+	device_remove_file(dev->ctrl.device, &dev_attr_reset_controller);
+	nvme_remove_namespaces(&dev->ctrl);
 	nvme_dev_shutdown(dev);
 	nvme_dev_remove_admin(dev);
 	device_destroy(nvme_class, MKDEV(nvme_char_major, dev->ctrl.instance));
@@ -2810,11 +2651,9 @@ static int __init nvme_init(void)
 	if (!nvme_workq)
 		return -ENOMEM;
 
-	result = register_blkdev(nvme_major, "nvme");
+	result = nvme_core_init();
 	if (result < 0)
 		goto kill_workq;
-	else if (result > 0)
-		nvme_major = result;
 
 	result = __register_chrdev(nvme_char_major, 0, NVME_MINORS, "nvme",
 							&nvme_dev_fops);
@@ -2839,7 +2678,7 @@ static int __init nvme_init(void)
  unregister_chrdev:
 	__unregister_chrdev(nvme_char_major, 0, NVME_MINORS, "nvme");
  unregister_blkdev:
-	unregister_blkdev(nvme_major, "nvme");
+	nvme_core_exit();
  kill_workq:
 	destroy_workqueue(nvme_workq);
 	return result;
@@ -2848,7 +2687,7 @@ static int __init nvme_init(void)
 static void __exit nvme_exit(void)
 {
 	pci_unregister_driver(&nvme_driver);
-	unregister_blkdev(nvme_major, "nvme");
+	nvme_core_exit();
 	destroy_workqueue(nvme_workq);
 	class_destroy(nvme_class);
 	__unregister_chrdev(nvme_char_major, 0, NVME_MINORS, "nvme");
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 17/18] nvme: move namespace scanning to common code
  2015-10-16  5:58 ` [PATCH 17/18] nvme: move namespace scanning to common code Christoph Hellwig
@ 2015-10-16  6:14   ` Ming Lin
  2015-10-16  6:16     ` Christoph Hellwig
  2015-10-21 23:27   ` J Freyensee
  1 sibling, 1 reply; 59+ messages in thread
From: Ming Lin @ 2015-10-16  6:14 UTC (permalink / raw)


On Fri, 2015-10-16@07:58 +0200, Christoph Hellwig wrote:
> @@ -2810,11 +2651,9 @@ static int __init nvme_init(void)
>  	if (!nvme_workq)
>  		return -ENOMEM;
>  
> -	result = register_blkdev(nvme_major, "nvme");
> +	result = nvme_core_init();
>  	if (result < 0)
>  		goto kill_workq;
> -	else if (result > 0)
> -		nvme_major = result;
>  
>  	result = __register_chrdev(nvme_char_major, 0, NVME_MINORS, "nvme",
>  							&nvme_dev_fops);
> @@ -2839,7 +2678,7 @@ static int __init nvme_init(void)
>   unregister_chrdev:
>  	__unregister_chrdev(nvme_char_major, 0, NVME_MINORS, "nvme");
>   unregister_blkdev:
> -	unregister_blkdev(nvme_major, "nvme");
> +	nvme_core_exit();
>   kill_workq:
>  	destroy_workqueue(nvme_workq);
>  	return result;
> @@ -2848,7 +2687,7 @@ static int __init nvme_init(void)
>  static void __exit nvme_exit(void)
>  {
>  	pci_unregister_driver(&nvme_driver);
> -	unregister_blkdev(nvme_major, "nvme");
> +	nvme_core_exit();
>  	destroy_workqueue(nvme_workq);
>  	class_destroy(nvme_class);
>  	__unregister_chrdev(nvme_char_major, 0, NVME_MINORS, "nvme");

Should nvme_core_{init,exit} be called in core.c?

I did this:
https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/commit/?h=nvme-split/virtio&id=76fa970

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 17/18] nvme: move namespace scanning to common code
  2015-10-16  6:14   ` Ming Lin
@ 2015-10-16  6:16     ` Christoph Hellwig
  0 siblings, 0 replies; 59+ messages in thread
From: Christoph Hellwig @ 2015-10-16  6:16 UTC (permalink / raw)


On Thu, Oct 15, 2015@11:14:14PM -0700, Ming Lin wrote:
> Should nvme_core_{init,exit} be called in core.c?

We can't do that until core.c is built into a separate module.  Your
patch will only work when NVMe is built into the kernel.

I want to defer the split until it can actually be used in tree.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 17/18] nvme: move namespace scanning to common code
  2015-10-16  5:58 ` [PATCH 17/18] nvme: move namespace scanning to common code Christoph Hellwig
  2015-10-16  6:14   ` Ming Lin
@ 2015-10-21 23:27   ` J Freyensee
  2015-10-22  7:39     ` Christoph Hellwig
  1 sibling, 1 reply; 59+ messages in thread
From: J Freyensee @ 2015-10-21 23:27 UTC (permalink / raw)


On Fri, 2015-10-16@07:58 +0200, Christoph Hellwig wrote:
> The namespace scanning code has been mostly generic already, we just
> need to store a pointer to the tagset in the nvme_ctrl structure, and
> add a method to check if a controller is I/O incapable.  The latter
> will hopefully be replaced by a proper controller state machine soon.
> 
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---
>  drivers/nvme/host/core.c | 190 
> +++++++++++++++++++++++++++++++++++++++-
>  drivers/nvme/host/nvme.h |  25 +++++-
>  drivers/nvme/host/pci.c  | 221 +++++++------------------------------
> ----------
>  3 files changed, 239 insertions(+), 197 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 3e88901..a01ab5a 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -17,6 +17,8 @@
>  #include <linux/errno.h>
>  #include <linux/hdreg.h>
>  #include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/list_sort.h>
>  #include <linux/slab.h>
>  #include <linux/types.h>
>  #include <linux/ptrace.h>
> @@ -26,6 +28,9 @@
>  
>  #include "nvme.h"
>  
> +static int nvme_major;
> +module_param(nvme_major, int, 0);
> +
>  DEFINE_SPINLOCK(dev_list_lock);
>  
>  static void nvme_free_ns(struct kref *kref)
> @@ -41,7 +46,7 @@ static void nvme_free_ns(struct kref *kref)
>  	kfree(ns);
>  }
>  
> -void nvme_put_ns(struct nvme_ns *ns)
> +static void nvme_put_ns(struct nvme_ns *ns)
>  {
>  	kref_put(&ns->kref, nvme_free_ns);
>  }
> @@ -503,7 +508,7 @@ static void nvme_config_discard(struct nvme_ns 
> *ns)
>  	queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, ns->queue);
>  }
>  
> -int nvme_revalidate_disk(struct gendisk *disk)
> +static int nvme_revalidate_disk(struct gendisk *disk)
>  {
>  	struct nvme_ns *ns = disk->private_data;
>  	struct nvme_id_ns *id;
> @@ -564,7 +569,7 @@ int nvme_revalidate_disk(struct gendisk *disk)
>  	return 0;
>  }
>  
> -const struct block_device_operations nvme_fops = {
> +static const struct block_device_operations nvme_fops = {
>  	.owner		= THIS_MODULE,
>  	.ioctl		= nvme_ioctl,
>  	.compat_ioctl	= nvme_compat_ioctl,
> @@ -591,6 +596,7 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
>  		return ret;
>  	}
>  	page_shift = NVME_CAP_MPSMIN(cap) + 12;
> +	ctrl->page_size = 1 << page_shift;
>  
>  	ret = nvme_identify_ctrl(ctrl, &id);
>  	if (ret) {
> @@ -636,3 +642,181 @@ void nvme_put_ctrl(struct nvme_ctrl *ctrl)
>  	kref_put(&ctrl->kref, nvme_free_ctrl);
>  }
>  
> +static int ns_cmp(void *priv, struct list_head *a, struct list_head 
> *b)
> +{
> +	struct nvme_ns *nsa = container_of(a, struct nvme_ns, list);
> +	struct nvme_ns *nsb = container_of(b, struct nvme_ns, list);
> +
> +	return nsa->ns_id - nsb->ns_id;
> +}
> +
> +static struct nvme_ns *nvme_find_ns(struct nvme_ctrl *ctrl, unsigned 
> nsid)
> +{
> +	struct nvme_ns *ns;
> +
> +	list_for_each_entry(ns, &ctrl->namespaces, list) {

I know this is basically just a move into core.c, but with the concept
of namespace management in the 1.2 spec, would something like
list_for_each_entry_safe() and/or a spinlock be more appropriate?


> +		if (ns->ns_id == nsid)
> +			return ns;
> +		if (ns->ns_id > nsid)
> +			break;
> +	}
> +	return NULL;
> +}

<snip>

> +
>  static const struct nvme_ctrl_ops nvme_pci_ctrl_ops = {
>  	.reg_read32		= nvme_pci_reg_read32,
>  	.reg_read64		= nvme_pci_reg_read64,
> +	.io_incapable		= nvme_pci_io_incapable,
>  	.free_ctrl		= nvme_pci_free_ctrl,
>  };
>  
> @@ -2645,7 +2486,7 @@ static int nvme_probe(struct pci_dev *pdev, 
> const struct pci_device_id *id)
>  	if (!dev->queues)
>  		goto free;
>  
> -	INIT_LIST_HEAD(&dev->namespaces);
> +	INIT_LIST_HEAD(&dev->ctrl.namespaces);

OK, so I'm looking at your git repo:

http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/nvme
-split.5

and I see:

struct nvme_ctrl {
.
.
	struct list_head namespaces;
.
.
}

So that makes more sense, and I must have missed some code or future
patch.

>  	INIT_WORK(&dev->reset_work, nvme_reset_work);
>  	dev->dev = get_device(&pdev->dev);
>  	pci_set_drvdata(pdev, dev);
> @@ -2664,17 +2505,17 @@ static int nvme_probe(struct pci_dev *pdev, 
> const struct pci_device_id *id)
>  		goto release;
>  
>  	kref_init(&dev->ctrl.kref);
> -	dev->device = device_create(nvme_class, &pdev->dev,
> +	dev->ctrl.device = device_create(nvme_class, &pdev->dev,
>  				MKDEV(nvme_char_major, dev
> ->ctrl.instance),
>  				dev, "nvme%d", dev->ctrl.instance);
> -	if (IS_ERR(dev->device)) {
> -		result = PTR_ERR(dev->device);
> +	if (IS_ERR(dev->ctrl.device)) {
> +		result = PTR_ERR(dev->ctrl.device);
>  		goto release_pools;
>  	}
> -	get_device(dev->device);
> -	dev_set_drvdata(dev->device, dev);
> +	get_device(dev->ctrl.device);
> +	dev_set_drvdata(dev->ctrl.device, dev);
>  
> -	result = device_create_file(dev->device, 
> &dev_attr_reset_controller);
> +	result = device_create_file(dev->ctrl.device, 
> &dev_attr_reset_controller);
>  	if (result)
>  		goto put_dev;
>  
> @@ -2686,7 +2527,7 @@ static int nvme_probe(struct pci_dev *pdev, 
> const struct pci_device_id *id)
>  
>   put_dev:
>  	device_destroy(nvme_class, MKDEV(nvme_char_major, dev
> ->ctrl.instance));
> -	put_device(dev->device);
> +	put_device(dev->ctrl.device);
>   release_pools:
>  	nvme_release_prp_pools(dev);
>   release:
> @@ -2728,8 +2569,8 @@ static void nvme_remove(struct pci_dev *pdev)
>  	flush_work(&dev->probe_work);
>  	flush_work(&dev->reset_work);
>  	flush_work(&dev->scan_work);
> -	device_remove_file(dev->device, &dev_attr_reset_controller);
> -	nvme_dev_remove(dev);
> +	device_remove_file(dev->ctrl.device, 
> &dev_attr_reset_controller);
> +	nvme_remove_namespaces(&dev->ctrl);
>  	nvme_dev_shutdown(dev);
>  	nvme_dev_remove_admin(dev);
>  	device_destroy(nvme_class, MKDEV(nvme_char_major, dev
> ->ctrl.instance));
> @@ -2810,11 +2651,9 @@ static int __init nvme_init(void)
>  	if (!nvme_workq)
>  		return -ENOMEM;
>  
> -	result = register_blkdev(nvme_major, "nvme");
> +	result = nvme_core_init();
>  	if (result < 0)
>  		goto kill_workq;
> -	else if (result > 0)
> -		nvme_major = result;
>  
>  	result = __register_chrdev(nvme_char_major, 0, NVME_MINORS, 
> "nvme",
>  							&nvme_dev_fo
> ps);
> @@ -2839,7 +2678,7 @@ static int __init nvme_init(void)
>   unregister_chrdev:
>  	__unregister_chrdev(nvme_char_major, 0, NVME_MINORS, 
> "nvme");
>   unregister_blkdev:
> -	unregister_blkdev(nvme_major, "nvme");
> +	nvme_core_exit();
>   kill_workq:
>  	destroy_workqueue(nvme_workq);
>  	return result;
> @@ -2848,7 +2687,7 @@ static int __init nvme_init(void)
>  static void __exit nvme_exit(void)
>  {
>  	pci_unregister_driver(&nvme_driver);
> -	unregister_blkdev(nvme_major, "nvme");
> +	nvme_core_exit();
>  	destroy_workqueue(nvme_workq);
>  	class_destroy(nvme_class);
>  	__unregister_chrdev(nvme_char_major, 0, NVME_MINORS, 
> "nvme");

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 17/18] nvme: move namespace scanning to common code
  2015-10-21 23:27   ` J Freyensee
@ 2015-10-22  7:39     ` Christoph Hellwig
  2015-10-22 13:48       ` Busch, Keith
  0 siblings, 1 reply; 59+ messages in thread
From: Christoph Hellwig @ 2015-10-22  7:39 UTC (permalink / raw)


On Wed, Oct 21, 2015@04:27:39PM -0700, J Freyensee wrote:
> I know this is basically just a move into core.c, but with the concept
> of namespace management in the 1.2 spec, would something like
> list_for_each_entry_safe() and/or a spinlock be more appropriate?

Yes, someone should sit down and think about the concurrency model
for namespace scanning.  Also we really should look into using Identify
for the active name space list as that will be a lot more efficient for
controllers with lots of namespaces.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 17/18] nvme: move namespace scanning to common code
  2015-10-22  7:39     ` Christoph Hellwig
@ 2015-10-22 13:48       ` Busch, Keith
  2015-10-22 16:30         ` Christoph Hellwig
  0 siblings, 1 reply; 59+ messages in thread
From: Busch, Keith @ 2015-10-22 13:48 UTC (permalink / raw)


> -----Original Message-----
> From: Christoph Hellwig [mailto:hch at lst.de]
> Sent: Thursday, October 22, 2015 1:40 AM
> To: J Freyensee
> Cc: Christoph Hellwig; axboe at fb.com; Busch, Keith; Sternberg, Jay E; Cayton, Phil;
> mlin at kernel.org; linux-nvme at lists.infradead.org
> Subject: Re: [PATCH 17/18] nvme: move namespace scanning to common code
> 
> On Wed, Oct 21, 2015@04:27:39PM -0700, J Freyensee wrote:
> > I know this is basically just a move into core.c, but with the concept
> > of namespace management in the 1.2 spec, would something like
> > list_for_each_entry_safe() and/or a spinlock be more appropriate?
> 
> Yes, someone should sit down and think about the concurrency model
> for namespace scanning.  Also we really should look into using Identify
> for the active name space list as that will be a lot more efficient for
> controllers with lots of namespaces.

Here's a namespace list proposal:

http://lists.infradead.org/pipermail/linux-nvme/2015-September/002325.html

I'll rebase it on the latest and resend.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 17/18] nvme: move namespace scanning to common code
  2015-10-22 13:48       ` Busch, Keith
@ 2015-10-22 16:30         ` Christoph Hellwig
  2015-10-22 21:24           ` Busch, Keith
  0 siblings, 1 reply; 59+ messages in thread
From: Christoph Hellwig @ 2015-10-22 16:30 UTC (permalink / raw)

On Thu, Oct 22, 2015@01:48:30PM +0000, Busch, Keith wrote:
> > Yes, someone should sit down and think about the concurrency model
> > for namespace scanning.  Also we really should look into using Identify
> > for the active name space list as that will be a lot more efficient for
> > controllers with lots of namespaces.
> 
> Here's a namespace list proposal:
> 
> http://lists.infradead.org/pipermail/linux-nvme/2015-September/002325.html
> 
> I'll rebase it on the latest and resend.

Looks good.  I suspect we might want a fallback if the controller
doesn't support Identify 0x2 - NVMe 1.1 doesn't mention anything about
Identify subcommand being optional or not and 1.2 says "Controllers that
support specification revision 1.1 or later shall support this
capability." which isn't as a strong as a must.  And some lazy people
like me implemented NVMe targets systes that don't support it but claim
to be version 1.2, although I'll try to get that fixed ASAP.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 17/18] nvme: move namespace scanning to common code
  2015-10-22 16:30         ` Christoph Hellwig
@ 2015-10-22 21:24           ` Busch, Keith
  2015-10-23  5:41             ` Christoph Hellwig
  0 siblings, 1 reply; 59+ messages in thread
From: Busch, Keith @ 2015-10-22 21:24 UTC (permalink / raw)


On Thu, Oct 22, 2015@06:30:00PM +0200, Christoph Hellwig wrote:
> On Thu, Oct 22, 2015@01:48:30PM +0000, Busch, Keith wrote:
> > Here's a namespace list proposal:
> > 
> > http://lists.infradead.org/pipermail/linux-nvme/2015-September/002325.html
> > 
> > I'll rebase it on the latest and resend.
> 
> Looks good.  I suspect we might want a fallback if the controller
> doesn't support Identify 0x2 - NVMe 1.1 doesn't mention anything about
> Identify subcommand being optional or not and 1.2 says "Controllers that
> support specification revision 1.1 or later shall support this
> capability." which isn't as a strong as a must.  And some lazy people
> like me implemented NVMe targets systes that don't support it but claim
> to be version 1.2, although I'll try to get that fixed ASAP.

It does fall through to the older way if identify list fails. My concern
is when it doesn't fail when it should have. Some controllers claim 1.1 or
higher, but do not interpret the Identify Namespace List correctly. They
just check that CNS != 0, and if true, returns success with an Identify
Namespace structure, so the driver misinterprets the data.

I filed bugs with the vendors I know about, but there may be others I
haven't tested.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 17/18] nvme: move namespace scanning to common code
  2015-10-22 21:24           ` Busch, Keith
@ 2015-10-23  5:41             ` Christoph Hellwig
  0 siblings, 0 replies; 59+ messages in thread
From: Christoph Hellwig @ 2015-10-23  5:41 UTC (permalink / raw)


On Thu, Oct 22, 2015@09:24:25PM +0000, Busch, Keith wrote:
> It does fall through to the older way if identify list fails. My concern
> is when it doesn't fail when it should have. Some controllers claim 1.1 or
> higher, but do not interpret the Identify Namespace List correctly. They
> just check that CNS != 0, and if true, returns success with an Identify
> Namespace structure, so the driver misinterprets the data.
> 
> I filed bugs with the vendors I know about, but there may be others I
> haven't tested.

Can you blacklist the known bad controller using the quirks mechanism
I added during the driver split?

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 18/18] nvme: move chardev and sysfs interface to common code
  2015-10-16  5:58 nvme driver split V2 Christoph Hellwig
                   ` (16 preceding siblings ...)
  2015-10-16  5:58 ` [PATCH 17/18] nvme: move namespace scanning to common code Christoph Hellwig
@ 2015-10-16  5:58 ` Christoph Hellwig
  2015-10-22  0:11   ` J Freyensee
  17 siblings, 1 reply; 59+ messages in thread
From: Christoph Hellwig @ 2015-10-16  5:58 UTC (permalink / raw)


For this we need to add a proper controller init routine and a list of
all controllers that is in addition to the list of PCIe controllers,
which stays in pci.c.  Note that we remove the sysfs device when the
last reference to a controller is dropped now - the old code would have
kept it around longer, which doesn't make much sense.

This requires a new ->reset_ctrl operation to implement controleller
resets, and a new ->write_reg32 operation that is required to implement
subsystem resets.  We also now store caches copied of the NVMe compliance
version and the flag if a controller is attached to a subsystem or not in
the generic controller structure now.

Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/core.c | 217 +++++++++++++++++++++++++++++++++++++++++++++--
 drivers/nvme/host/nvme.h |  20 +++--
 drivers/nvme/host/pci.c  | 202 +++++--------------------------------------
 drivers/nvme/host/scsi.c |   9 +-
 4 files changed, 250 insertions(+), 198 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index a01ab5a..b8c72d2 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -28,11 +28,19 @@
 
 #include "nvme.h"
 
+#define NVME_MINORS		(1U << MINORBITS)
+
 static int nvme_major;
 module_param(nvme_major, int, 0);
 
+static int nvme_char_major;
+module_param(nvme_char_major, int, 0);
+
+static LIST_HEAD(nvme_ctrl_list);
 DEFINE_SPINLOCK(dev_list_lock);
 
+static struct class *nvme_class;
+
 static void nvme_free_ns(struct kref *kref)
 {
 	struct nvme_ns *ns = container_of(kref, struct nvme_ns, kref);
@@ -358,7 +366,7 @@ static int nvme_submit_io(struct nvme_ns *ns, struct nvme_user_io __user *uio)
 			metadata, meta_len, io.slba, NULL, 0);
 }
 
-int nvme_user_cmd(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
+static int nvme_user_cmd(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 			struct nvme_passthru_cmd __user *ucmd)
 {
 	struct nvme_passthru_cmd cmd;
@@ -590,6 +598,12 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
 	u64 cap;
 	int ret, page_shift;
 
+	ret = ctrl->ops->reg_read32(ctrl, NVME_REG_VS, &ctrl->vs);
+	if (ret) {
+		dev_err(ctrl->dev, "Reading VS failed (%d)\n", ret);
+		return ret;
+	}
+
 	ret = ctrl->ops->reg_read64(ctrl, NVME_REG_CAP, &cap);
 	if (ret) {
 		dev_err(ctrl->dev, "Reading CAP failed (%d)\n", ret);
@@ -598,6 +612,9 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
 	page_shift = NVME_CAP_MPSMIN(cap) + 12;
 	ctrl->page_size = 1 << page_shift;
 
+	if (ctrl->vs >= NVME_VS(1, 1))
+		ctrl->subsystem = NVME_CAP_NSSRC(cap);
+
 	ret = nvme_identify_ctrl(ctrl, &id);
 	if (ret) {
 		dev_err(ctrl->dev, "Identify Controller failed (%d)\n", ret);
@@ -630,18 +647,85 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
 	return 0;
 }
 
-static void nvme_free_ctrl(struct kref *kref)
+static int nvme_dev_open(struct inode *inode, struct file *file)
 {
-	struct nvme_ctrl *ctrl = container_of(kref, struct nvme_ctrl, kref);
+	struct nvme_ctrl *ctrl;
+	int instance = iminor(inode);
+	int ret = -ENODEV;
 
-	ctrl->ops->free_ctrl(ctrl);
+	spin_lock(&dev_list_lock);
+	list_for_each_entry(ctrl, &nvme_ctrl_list, node) {
+		if (ctrl->instance != instance)
+			continue;
+
+		if (!ctrl->admin_q) {
+			ret = -EWOULDBLOCK;
+			break;
+		}
+		if (!kref_get_unless_zero(&ctrl->kref))
+			break;
+		file->private_data = ctrl;
+		ret = 0;
+		break;
+	}
+	spin_unlock(&dev_list_lock);
+
+	return ret;
 }
 
-void nvme_put_ctrl(struct nvme_ctrl *ctrl)
+static int nvme_dev_release(struct inode *inode, struct file *file)
 {
-	kref_put(&ctrl->kref, nvme_free_ctrl);
+	nvme_put_ctrl(file->private_data);
+	return 0;
+}
+
+static long nvme_dev_ioctl(struct file *file, unsigned int cmd,
+		unsigned long arg)
+{
+	struct nvme_ctrl *ctrl = file->private_data;
+	void __user *argp = (void __user *)arg;
+	struct nvme_ns *ns;
+
+	switch (cmd) {
+	case NVME_IOCTL_ADMIN_CMD:
+		return nvme_user_cmd(ctrl, NULL, argp);
+	case NVME_IOCTL_IO_CMD:
+		if (list_empty(&ctrl->namespaces))
+			return -ENOTTY;
+		ns = list_first_entry(&ctrl->namespaces, struct nvme_ns, list);
+		return nvme_user_cmd(ctrl, ns, argp);
+	case NVME_IOCTL_RESET:
+		dev_warn(ctrl->dev, "resetting controller\n");
+		return ctrl->ops->reset_ctrl(ctrl);
+	case NVME_IOCTL_SUBSYS_RESET:
+		return nvme_reset_subsystem(ctrl);
+	default:
+		return -ENOTTY;
+	}
 }
 
+static const struct file_operations nvme_dev_fops = {
+	.owner		= THIS_MODULE,
+	.open		= nvme_dev_open,
+	.release	= nvme_dev_release,
+	.unlocked_ioctl	= nvme_dev_ioctl,
+	.compat_ioctl	= nvme_dev_ioctl,
+};
+
+static ssize_t nvme_sysfs_reset(struct device *dev,
+				struct device_attribute *attr, const char *buf,
+				size_t count)
+{
+	struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+	int ret;
+
+	ret = ctrl->ops->reset_ctrl(ctrl);
+	if (ret < 0)
+		return ret;
+	return count;
+}
+static DEVICE_ATTR(reset_controller, S_IWUSR, NULL, nvme_sysfs_reset);
+
 static int ns_cmp(void *priv, struct list_head *a, struct list_head *b)
 {
 	struct nvme_ns *nsa = container_of(a, struct nvme_ns, list);
@@ -803,6 +887,106 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
 		nvme_ns_remove(ns);
 }
 
+static DEFINE_IDA(nvme_instance_ida);
+
+static int nvme_set_instance(struct nvme_ctrl *ctrl)
+{
+	int instance, error;
+
+	do {
+		if (!ida_pre_get(&nvme_instance_ida, GFP_KERNEL))
+			return -ENODEV;
+
+		spin_lock(&dev_list_lock);
+		error = ida_get_new(&nvme_instance_ida, &instance);
+		spin_unlock(&dev_list_lock);
+	} while (error == -EAGAIN);
+
+	if (error)
+		return -ENODEV;
+
+	ctrl->instance = instance;
+	return 0;
+}
+
+static void nvme_release_instance(struct nvme_ctrl *ctrl)
+{
+	spin_lock(&dev_list_lock);
+	ida_remove(&nvme_instance_ida, ctrl->instance);
+	spin_unlock(&dev_list_lock);
+}
+
+static void nvme_free_ctrl(struct kref *kref)
+{
+	struct nvme_ctrl *ctrl = container_of(kref, struct nvme_ctrl, kref);
+
+	spin_lock(&dev_list_lock);
+	list_del(&ctrl->node);
+	spin_unlock(&dev_list_lock);
+
+	put_device(ctrl->device);
+	nvme_release_instance(ctrl);
+	device_destroy(nvme_class, MKDEV(nvme_char_major, ctrl->instance));
+
+	ctrl->ops->free_ctrl(ctrl);
+}
+
+void nvme_put_ctrl(struct nvme_ctrl *ctrl)
+{
+	kref_put(&ctrl->kref, nvme_free_ctrl);
+}
+
+/*
+ * Initialize a NVMe controller structures.  This needs to be called during
+ * earliest initialization so that we have the initialized structured around
+ * during probing.
+ */
+int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
+		const struct nvme_ctrl_ops *ops, u16 vendor,
+		unsigned long quirks)
+{
+	int ret;
+
+	INIT_LIST_HEAD(&ctrl->namespaces);
+	kref_init(&ctrl->kref);
+	ctrl->dev = dev;
+	ctrl->ops = ops;
+	ctrl->vendor = vendor;
+	ctrl->quirks = quirks;
+
+	ret = nvme_set_instance(ctrl);
+	if (ret)
+		goto out;
+
+	ctrl->device = device_create(nvme_class, ctrl->dev,
+				MKDEV(nvme_char_major, ctrl->instance),
+				dev, "nvme%d", ctrl->instance);
+	if (IS_ERR(ctrl->device)) {
+		ret = PTR_ERR(ctrl->device);
+		goto out_release_instance;
+	}
+	get_device(ctrl->device);
+	dev_set_drvdata(ctrl->device, ctrl);
+
+	ret = device_create_file(ctrl->device, &dev_attr_reset_controller);
+	if (ret)
+		goto out_put_device;
+
+	spin_lock(&dev_list_lock);
+	list_add_tail(&ctrl->node, &nvme_ctrl_list);
+	spin_unlock(&dev_list_lock);
+
+	return 0;
+
+out_put_device:
+	put_device(ctrl->device);
+	device_destroy(nvme_class, MKDEV(nvme_char_major, ctrl->instance));
+out_release_instance:
+	nvme_release_instance(ctrl);
+out:
+	return ret;
+}
+
 int __init nvme_core_init(void)
 {
 	int result;
@@ -813,10 +997,31 @@ int __init nvme_core_init(void)
 	else if (result > 0)
 		nvme_major = result;
 
+	result = __register_chrdev(nvme_char_major, 0, NVME_MINORS, "nvme",
+							&nvme_dev_fops);
+	if (result < 0)
+		goto unregister_blkdev;
+	else if (result > 0)
+		nvme_char_major = result;
+
+	nvme_class = class_create(THIS_MODULE, "nvme");
+	if (IS_ERR(nvme_class)) {
+		result = PTR_ERR(nvme_class);
+		goto unregister_chrdev;
+	}
+
 	return 0;
+
+ unregister_chrdev:
+	__unregister_chrdev(nvme_char_major, 0, NVME_MINORS, "nvme");
+ unregister_blkdev:
+	unregister_blkdev(nvme_major, "nvme");
+	return result;
 }
 
 void nvme_core_exit(void)
 {
 	unregister_blkdev(nvme_major, "nvme");
+	class_destroy(nvme_class);
+	__unregister_chrdev(nvme_char_major, 0, NVME_MINORS, "nvme");
 }
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 53e82feb..da63835 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -19,8 +19,6 @@
 #include <linux/kref.h>
 #include <linux/blk-mq.h>
 
-struct nvme_passthru_cmd;
-
 extern unsigned char nvme_io_timeout;
 #define NVME_IO_TIMEOUT	(nvme_io_timeout * HZ)
 
@@ -48,6 +46,7 @@ struct nvme_ctrl {
 	struct blk_mq_tag_set *tagset;
 	struct list_head namespaces;
 	struct device *device;	/* char device */
+	struct list_head node;
 
 	char name[12];
 	char serial[20];
@@ -60,6 +59,8 @@ struct nvme_ctrl {
 	u16 abort_limit;
 	u8 event_limit;
 	u8 vwc;
+	u32 vs;
+	bool subsystem;
 	u16 vendor;
 	unsigned long quirks;
 };
@@ -87,7 +88,9 @@ struct nvme_ns {
 struct nvme_ctrl_ops {
 	int (*reg_read32)(struct nvme_ctrl *ctrl, u32 off, u32 *val);
 	int (*reg_read64)(struct nvme_ctrl *ctrl, u32 off, u64 *val);
+	int (*reg_write32)(struct nvme_ctrl *ctrl, u32 off, u32 val);
 	bool (*io_incapable)(struct nvme_ctrl *ctrl);
+	int (*reset_ctrl)(struct nvme_ctrl *ctrl);
 	void (*free_ctrl)(struct nvme_ctrl *ctrl);
 };
 
@@ -111,6 +114,13 @@ static inline bool nvme_io_incapable(struct nvme_ctrl *ctrl)
 	return val & NVME_CSTS_CFS;
 }
 
+static inline int nvme_reset_subsystem(struct nvme_ctrl *ctrl)
+{
+	if (!ctrl->subsystem)
+		return -ENOTTY;
+	return ctrl->ops->reg_write32(ctrl, NVME_REG_NSSR, 0x4E564D65);
+}
+
 static inline u64 nvme_block_nr(struct nvme_ns *ns, sector_t sector)
 {
 	return (sector >> (ns->lba_shift - 9));
@@ -179,6 +189,9 @@ static inline int nvme_error_status(u16 status)
 	}
 }
 
+int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
+		const struct nvme_ctrl_ops *ops, u16 vendor,
+		unsigned long quirks);
 void nvme_put_ctrl(struct nvme_ctrl *ctrl);
 int nvme_init_identify(struct nvme_ctrl *ctrl);
 
@@ -207,9 +220,6 @@ int nvme_set_features(struct nvme_ctrl *dev, unsigned fid, unsigned dword11,
 
 extern spinlock_t dev_list_lock;
 
-int nvme_user_cmd(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
-			struct nvme_passthru_cmd __user *ucmd);
-
 struct sg_io_hdr;
 
 int nvme_sg_io(struct nvme_ns *ns, struct sg_io_hdr __user *u_hdr);
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 3d51396..cc0177c 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -38,13 +38,10 @@
 #include <linux/slab.h>
 #include <linux/t10-pi.h>
 #include <linux/types.h>
-#include <scsi/sg.h>
 #include <asm-generic/io-64-nonatomic-lo-hi.h>
 
-#include <uapi/linux/nvme_ioctl.h>
 #include "nvme.h"
 
-#define NVME_MINORS		(1U << MINORBITS)
 #define NVME_Q_DEPTH		1024
 #define NVME_AQ_DEPTH		256
 #define SQ_SIZE(depth)		(depth * sizeof(struct nvme_command))
@@ -63,9 +60,6 @@ static unsigned char shutdown_timeout = 5;
 module_param(shutdown_timeout, byte, 0644);
 MODULE_PARM_DESC(shutdown_timeout, "timeout in seconds for controller shutdown");
 
-static int nvme_char_major;
-module_param(nvme_char_major, int, 0);
-
 static int use_threaded_interrupts;
 module_param(use_threaded_interrupts, int, 0);
 
@@ -78,8 +72,6 @@ static struct task_struct *nvme_thread;
 static struct workqueue_struct *nvme_workq;
 static wait_queue_head_t nvme_kthread_wait;
 
-static struct class *nvme_class;
-
 struct nvme_dev;
 struct nvme_queue;
 
@@ -1563,15 +1555,6 @@ static int nvme_configure_admin_queue(struct nvme_dev *dev)
 	return result;
 }
 
-static int nvme_subsys_reset(struct nvme_dev *dev)
-{
-	if (!dev->subsystem)
-		return -ENOTTY;
-
-	writel(0x4E564D65, dev->bar + NVME_REG_NSSR); /* "NVMe" */
-	return 0;
-}
-
 static int nvme_kthread(void *data)
 {
 	struct nvme_dev *dev, *next;
@@ -2165,42 +2148,11 @@ static void nvme_release_prp_pools(struct nvme_dev *dev)
 	dma_pool_destroy(dev->prp_small_pool);
 }
 
-static DEFINE_IDA(nvme_instance_ida);
-
-static int nvme_set_instance(struct nvme_dev *dev)
-{
-	int instance, error;
-
-	do {
-		if (!ida_pre_get(&nvme_instance_ida, GFP_KERNEL))
-			return -ENODEV;
-
-		spin_lock(&dev_list_lock);
-		error = ida_get_new(&nvme_instance_ida, &instance);
-		spin_unlock(&dev_list_lock);
-	} while (error == -EAGAIN);
-
-	if (error)
-		return -ENODEV;
-
-	dev->ctrl.instance = instance;
-	return 0;
-}
-
-static void nvme_release_instance(struct nvme_dev *dev)
-{
-	spin_lock(&dev_list_lock);
-	ida_remove(&nvme_instance_ida, dev->ctrl.instance);
-	spin_unlock(&dev_list_lock);
-}
-
 static void nvme_pci_free_ctrl(struct nvme_ctrl *ctrl)
 {
 	struct nvme_dev *dev = to_nvme_dev(ctrl);
 
 	put_device(dev->dev);
-	put_device(ctrl->device);
-	nvme_release_instance(dev);
 	if (dev->tagset.tags)
 		blk_mq_free_tag_set(&dev->tagset);
 	if (dev->ctrl.admin_q)
@@ -2210,69 +2162,6 @@ static void nvme_pci_free_ctrl(struct nvme_ctrl *ctrl)
 	kfree(dev);
 }
 
-static int nvme_dev_open(struct inode *inode, struct file *f)
-{
-	struct nvme_dev *dev;
-	int instance = iminor(inode);
-	int ret = -ENODEV;
-
-	spin_lock(&dev_list_lock);
-	list_for_each_entry(dev, &dev_list, node) {
-		if (dev->ctrl.instance == instance) {
-			if (!dev->ctrl.admin_q) {
-				ret = -EWOULDBLOCK;
-				break;
-			}
-			if (!kref_get_unless_zero(&dev->ctrl.kref))
-				break;
-			f->private_data = dev;
-			ret = 0;
-			break;
-		}
-	}
-	spin_unlock(&dev_list_lock);
-
-	return ret;
-}
-
-static int nvme_dev_release(struct inode *inode, struct file *f)
-{
-	struct nvme_dev *dev = f->private_data;
-	nvme_put_ctrl(&dev->ctrl);
-	return 0;
-}
-
-static long nvme_dev_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
-{
-	struct nvme_dev *dev = f->private_data;
-	struct nvme_ns *ns;
-
-	switch (cmd) {
-	case NVME_IOCTL_ADMIN_CMD:
-		return nvme_user_cmd(&dev->ctrl, NULL, (void __user *)arg);
-	case NVME_IOCTL_IO_CMD:
-		if (list_empty(&dev->ctrl.namespaces))
-			return -ENOTTY;
-		ns = list_first_entry(&dev->ctrl.namespaces, struct nvme_ns, list);
-		return nvme_user_cmd(&dev->ctrl, ns, (void __user *)arg);
-	case NVME_IOCTL_RESET:
-		dev_warn(dev->dev, "resetting controller\n");
-		return nvme_reset(dev);
-	case NVME_IOCTL_SUBSYS_RESET:
-		return nvme_subsys_reset(dev);
-	default:
-		return -ENOTTY;
-	}
-}
-
-static const struct file_operations nvme_dev_fops = {
-	.owner		= THIS_MODULE,
-	.open		= nvme_dev_open,
-	.release	= nvme_dev_release,
-	.unlocked_ioctl	= nvme_dev_ioctl,
-	.compat_ioctl	= nvme_dev_ioctl,
-};
-
 static void nvme_probe_work(struct work_struct *work)
 {
 	struct nvme_dev *dev = container_of(work, struct nvme_dev, probe_work);
@@ -2424,21 +2313,6 @@ static int nvme_reset(struct nvme_dev *dev)
 	return ret;
 }
 
-static ssize_t nvme_sysfs_reset(struct device *dev,
-				struct device_attribute *attr, const char *buf,
-				size_t count)
-{
-	struct nvme_dev *ndev = dev_get_drvdata(dev);
-	int ret;
-
-	ret = nvme_reset(ndev);
-	if (ret < 0)
-		return ret;
-
-	return count;
-}
-static DEVICE_ATTR(reset_controller, S_IWUSR, NULL, nvme_sysfs_reset);
-
 static int nvme_pci_reg_read32(struct nvme_ctrl *ctrl, u32 off, u32 *val)
 {
 	*val = readl(to_nvme_dev(ctrl)->bar + off);
@@ -2451,6 +2325,12 @@ static int nvme_pci_reg_read64(struct nvme_ctrl *ctrl, u32 off, u64 *val)
 	return 0;
 }
 
+static int nvme_pci_reg_write32(struct nvme_ctrl *ctrl, u32 off, u32 val)
+{
+	writel(val, to_nvme_dev(ctrl)->bar + off);
+	return 0;
+}
+
 static bool nvme_pci_io_incapable(struct nvme_ctrl *ctrl)
 {
 	struct nvme_dev *dev = to_nvme_dev(ctrl);
@@ -2458,10 +2338,17 @@ static bool nvme_pci_io_incapable(struct nvme_ctrl *ctrl)
 	return !dev->bar || dev->online_queues < 2;
 }
 
+static int nvme_pci_reset_ctrl(struct nvme_ctrl *ctrl)
+{
+	return nvme_reset(to_nvme_dev(ctrl));
+}
+
 static const struct nvme_ctrl_ops nvme_pci_ctrl_ops = {
 	.reg_read32		= nvme_pci_reg_read32,
 	.reg_read64		= nvme_pci_reg_read64,
+	.reg_write32		= nvme_pci_reg_write32,
 	.io_incapable		= nvme_pci_io_incapable,
+	.reset_ctrl		= nvme_pci_reset_ctrl,
 	.free_ctrl		= nvme_pci_free_ctrl,
 };
 
@@ -2486,52 +2373,28 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	if (!dev->queues)
 		goto free;
 
-	INIT_LIST_HEAD(&dev->ctrl.namespaces);
-	INIT_WORK(&dev->reset_work, nvme_reset_work);
 	dev->dev = get_device(&pdev->dev);
 	pci_set_drvdata(pdev, dev);
 
-	dev->ctrl.vendor = pdev->vendor;
-	dev->ctrl.ops = &nvme_pci_ctrl_ops;
-	dev->ctrl.dev = dev->dev;
-	dev->ctrl.quirks = id->driver_data;
+	INIT_LIST_HEAD(&dev->node);
+	INIT_WORK(&dev->scan_work, nvme_dev_scan);
+	INIT_WORK(&dev->probe_work, nvme_probe_work);
+	INIT_WORK(&dev->reset_work, nvme_reset_work);
 
-	result = nvme_set_instance(dev);
+	result = nvme_setup_prp_pools(dev);
 	if (result)
 		goto put_pci;
 
-	result = nvme_setup_prp_pools(dev);
+	result = nvme_init_ctrl(&dev->ctrl, &pdev->dev, &nvme_pci_ctrl_ops,
+			pdev->vendor, id->driver_data);
 	if (result)
-		goto release;
-
-	kref_init(&dev->ctrl.kref);
-	dev->ctrl.device = device_create(nvme_class, &pdev->dev,
-				MKDEV(nvme_char_major, dev->ctrl.instance),
-				dev, "nvme%d", dev->ctrl.instance);
-	if (IS_ERR(dev->ctrl.device)) {
-		result = PTR_ERR(dev->ctrl.device);
 		goto release_pools;
-	}
-	get_device(dev->ctrl.device);
-	dev_set_drvdata(dev->ctrl.device, dev);
 
-	result = device_create_file(dev->ctrl.device, &dev_attr_reset_controller);
-	if (result)
-		goto put_dev;
-
-	INIT_LIST_HEAD(&dev->node);
-	INIT_WORK(&dev->scan_work, nvme_dev_scan);
-	INIT_WORK(&dev->probe_work, nvme_probe_work);
 	schedule_work(&dev->probe_work);
 	return 0;
 
- put_dev:
-	device_destroy(nvme_class, MKDEV(nvme_char_major, dev->ctrl.instance));
-	put_device(dev->ctrl.device);
  release_pools:
 	nvme_release_prp_pools(dev);
- release:
-	nvme_release_instance(dev);
  put_pci:
 	put_device(dev->dev);
  free:
@@ -2569,11 +2432,9 @@ static void nvme_remove(struct pci_dev *pdev)
 	flush_work(&dev->probe_work);
 	flush_work(&dev->reset_work);
 	flush_work(&dev->scan_work);
-	device_remove_file(dev->ctrl.device, &dev_attr_reset_controller);
 	nvme_remove_namespaces(&dev->ctrl);
 	nvme_dev_shutdown(dev);
 	nvme_dev_remove_admin(dev);
-	device_destroy(nvme_class, MKDEV(nvme_char_major, dev->ctrl.instance));
 	nvme_free_queues(dev, 0);
 	nvme_release_cmb(dev);
 	nvme_release_prp_pools(dev);
@@ -2655,29 +2516,12 @@ static int __init nvme_init(void)
 	if (result < 0)
 		goto kill_workq;
 
-	result = __register_chrdev(nvme_char_major, 0, NVME_MINORS, "nvme",
-							&nvme_dev_fops);
-	if (result < 0)
-		goto unregister_blkdev;
-	else if (result > 0)
-		nvme_char_major = result;
-
-	nvme_class = class_create(THIS_MODULE, "nvme");
-	if (IS_ERR(nvme_class)) {
-		result = PTR_ERR(nvme_class);
-		goto unregister_chrdev;
-	}
-
 	result = pci_register_driver(&nvme_driver);
 	if (result)
-		goto destroy_class;
+		goto core_exit;
 	return 0;
 
- destroy_class:
-	class_destroy(nvme_class);
- unregister_chrdev:
-	__unregister_chrdev(nvme_char_major, 0, NVME_MINORS, "nvme");
- unregister_blkdev:
+ core_exit:
 	nvme_core_exit();
  kill_workq:
 	destroy_workqueue(nvme_workq);
@@ -2689,8 +2533,6 @@ static void __exit nvme_exit(void)
 	pci_unregister_driver(&nvme_driver);
 	nvme_core_exit();
 	destroy_workqueue(nvme_workq);
-	class_destroy(nvme_class);
-	__unregister_chrdev(nvme_char_major, 0, NVME_MINORS, "nvme");
 	BUG_ON(nvme_thread && !IS_ERR(nvme_thread));
 	_nvme_check_size();
 }
diff --git a/drivers/nvme/host/scsi.c b/drivers/nvme/host/scsi.c
index b673fe4..34c65b9 100644
--- a/drivers/nvme/host/scsi.c
+++ b/drivers/nvme/host/scsi.c
@@ -606,16 +606,11 @@ static int nvme_trans_device_id_page(struct nvme_ns *ns, struct sg_io_hdr *hdr,
 	int res;
 	int nvme_sc;
 	int xfer_len;
-	u32 vs;
 	__be32 tmp_id = cpu_to_be32(ns->ns_id);
 
-	res = ctrl->ops->reg_read32(ctrl, NVME_REG_VS, &vs);
-	if (res)
-		return res;
-
 	memset(inq_response, 0, alloc_len);
 	inq_response[1] = INQ_DEVICE_IDENTIFICATION_PAGE;    /* Page Code */
-	if (vs >= NVME_VS(1, 1)) {
+	if (ctrl->vs >= NVME_VS(1, 1)) {
 		struct nvme_id_ns *id_ns;
 		void *eui;
 		int len;
@@ -627,7 +622,7 @@ static int nvme_trans_device_id_page(struct nvme_ns *ns, struct sg_io_hdr *hdr,
 
 		eui = id_ns->eui64;
 		len = sizeof(id_ns->eui64);
-		if (vs >= NVME_VS(1, 2)) {
+		if (ctrl->vs >= NVME_VS(1, 2)) {
 			if (bitmap_empty(eui, len * 8)) {
 				eui = id_ns->nguid;
 				len = sizeof(id_ns->nguid);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 18/18] nvme: move chardev and sysfs interface to common code
  2015-10-16  5:58 ` [PATCH 18/18] nvme: move chardev and sysfs interface " Christoph Hellwig
@ 2015-10-22  0:11   ` J Freyensee
  2015-10-22  7:45     ` Christoph Hellwig
  0 siblings, 1 reply; 59+ messages in thread
From: J Freyensee @ 2015-10-22  0:11 UTC (permalink / raw)


On Fri, 2015-10-16@07:58 +0200, Christoph Hellwig wrote:
> For this we need to add a proper controller init routine and a list 
> of
> all controllers that is in addition to the list of PCIe controllers,
> which stays in pci.c.  Note that we remove the sysfs device when the
> last reference to a controller is dropped now - the old code would 
> have
> kept it around longer, which doesn't make much sense.
> 
> This requires a new ->reset_ctrl operation to implement controleller

Controller got misspelled above.

> resets, and a new ->write_reg32 operation that is required to 
> implement
> subsystem resets.  We also now store caches copied of the NVMe 
> compliance
> version and the flag if a controller is attached to a subsystem or 
> not in
> the generic controller structure now.
> 
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---
>  drivers/nvme/host/core.c | 217 
> +++++++++++++++++++++++++++++++++++++++++++++--
>  drivers/nvme/host/nvme.h |  20 +++--
>  drivers/nvme/host/pci.c  | 202 +++++--------------------------------
> ------
>  drivers/nvme/host/scsi.c |   9 +-
>  4 files changed, 250 insertions(+), 198 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index a01ab5a..b8c72d2 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -28,11 +28,19 @@
>  
>  #include "nvme.h"
>  
> +#define NVME_MINORS		(1U << MINORBITS)
> +
>  static int nvme_major;
>  module_param(nvme_major, int, 0);
>  
> +static int nvme_char_major;
> +module_param(nvme_char_major, int, 0);
> +
> +static LIST_HEAD(nvme_ctrl_list);
>  DEFINE_SPINLOCK(dev_list_lock);
>  
> +static struct class *nvme_class;
> +
>  static void nvme_free_ns(struct kref *kref)
>  {
>  	struct nvme_ns *ns = container_of(kref, struct nvme_ns, 
> kref);
> @@ -358,7 +366,7 @@ static int nvme_submit_io(struct nvme_ns *ns, 
> struct nvme_user_io __user *uio)
>  			metadata, meta_len, io.slba, NULL, 0);
>  }
>  
> -int nvme_user_cmd(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
> +static int nvme_user_cmd(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
>  			struct nvme_passthru_cmd __user *ucmd)
>  {
>  	struct nvme_passthru_cmd cmd;
> @@ -590,6 +598,12 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
>  	u64 cap;
>  	int ret, page_shift;
>  
> +	ret = ctrl->ops->reg_read32(ctrl, NVME_REG_VS, &ctrl->vs);
> +	if (ret) {
> +		dev_err(ctrl->dev, "Reading VS failed (%d)\n", ret);
> +		return ret;
> +	}
> +
>  	ret = ctrl->ops->reg_read64(ctrl, NVME_REG_CAP, &cap);
>  	if (ret) {
>  		dev_err(ctrl->dev, "Reading CAP failed (%d)\n", 
> ret);
> @@ -598,6 +612,9 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
>  	page_shift = NVME_CAP_MPSMIN(cap) + 12;
>  	ctrl->page_size = 1 << page_shift;
>  
> +	if (ctrl->vs >= NVME_VS(1, 1))
> +		ctrl->subsystem = NVME_CAP_NSSRC(cap);
> +
>  	ret = nvme_identify_ctrl(ctrl, &id);
>  	if (ret) {
>  		dev_err(ctrl->dev, "Identify Controller failed 
> (%d)\n", ret);
> @@ -630,18 +647,85 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
>  	return 0;
>  }
>  
> -static void nvme_free_ctrl(struct kref *kref)
> +static int nvme_dev_open(struct inode *inode, struct file *file)
>  {
> -	struct nvme_ctrl *ctrl = container_of(kref, struct 
> nvme_ctrl, kref);
> +	struct nvme_ctrl *ctrl;
> +	int instance = iminor(inode);
> +	int ret = -ENODEV;
>  
> -	ctrl->ops->free_ctrl(ctrl);
> +	spin_lock(&dev_list_lock);
> +	list_for_each_entry(ctrl, &nvme_ctrl_list, node) {

list_for_each_entry_safe() and/or some type of lock access?

> +		if (ctrl->instance != instance)
> +			continue;
> +
> +		if (!ctrl->admin_q) {
> +			ret = -EWOULDBLOCK;
> +			break;
> +		}
> +		if (!kref_get_unless_zero(&ctrl->kref))
> +			break;
> +		file->private_data = ctrl;
> +		ret = 0;
> +		break;
> +	}
> +	spin_unlock(&dev_list_lock);
> +
> +	return ret;
>  }
>  
> -void nvme_put_ctrl(struct nvme_ctrl *ctrl)
> +static int nvme_dev_release(struct inode *inode, struct file *file)
>  {
> -	kref_put(&ctrl->kref, nvme_free_ctrl);
> +	nvme_put_ctrl(file->private_data);
> +	return 0;
> +}
> +
> +static long nvme_dev_ioctl(struct file *file, unsigned int cmd,
> +		unsigned long arg)
> +{
> +	struct nvme_ctrl *ctrl = file->private_data;
> +	void __user *argp = (void __user *)arg;
> +	struct nvme_ns *ns;
> +
> +	switch (cmd) {
> +	case NVME_IOCTL_ADMIN_CMD:
> +		return nvme_user_cmd(ctrl, NULL, argp);
> +	case NVME_IOCTL_IO_CMD:
> +		if (list_empty(&ctrl->namespaces))
> +			return -ENOTTY;
> +		ns = list_first_entry(&ctrl->namespaces, struct 
> nvme_ns, list);
> +		return nvme_user_cmd(ctrl, ns, argp);
> +	case NVME_IOCTL_RESET:
> +		dev_warn(ctrl->dev, "resetting controller\n");
> +		return ctrl->ops->reset_ctrl(ctrl);
> +	case NVME_IOCTL_SUBSYS_RESET:
> +		return nvme_reset_subsystem(ctrl);
> +	default:
> +		return -ENOTTY;
> +	}
>  }
>  
> +static const struct file_operations nvme_dev_fops = {
> +	.owner		= THIS_MODULE,
> +	.open		= nvme_dev_open,
> +	.release	= nvme_dev_release,
> +	.unlocked_ioctl	= nvme_dev_ioctl,
> +	.compat_ioctl	= nvme_dev_ioctl,
> +};
> +
> +static ssize_t nvme_sysfs_reset(struct device *dev,
> +				struct device_attribute *attr, const 
> char *buf,
> +				size_t count)
> +{
> +	struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
> +	int ret;
> +
> +	ret = ctrl->ops->reset_ctrl(ctrl);
> +	if (ret < 0)
> +		return ret;
> +	return count;
> +}
> +static DEVICE_ATTR(reset_controller, S_IWUSR, NULL, 
> nvme_sysfs_reset);
> +
>  static int ns_cmp(void *priv, struct list_head *a, struct list_head 
> *b)
>  {
>  	struct nvme_ns *nsa = container_of(a, struct nvme_ns, list);
> @@ -803,6 +887,106 @@ void nvme_remove_namespaces(struct nvme_ctrl 
> *ctrl)
>  		nvme_ns_remove(ns);
>  }
>  
> +static DEFINE_IDA(nvme_instance_ida);
> +
> +static int nvme_set_instance(struct nvme_ctrl *ctrl)
> +{
> +	int instance, error;
> +
> +	do {
> +		if (!ida_pre_get(&nvme_instance_ida, GFP_KERNEL))
> +			return -ENODEV;
> +
> +		spin_lock(&dev_list_lock);
> +		error = ida_get_new(&nvme_instance_ida, &instance);
> +		spin_unlock(&dev_list_lock);
> +	} while (error == -EAGAIN);
> +
> +	if (error)
> +		return -ENODEV;
> +
> +	ctrl->instance = instance;
> +	return 0;
> +}
> +
> +static void nvme_release_instance(struct nvme_ctrl *ctrl)
> +{
> +	spin_lock(&dev_list_lock);
> +	ida_remove(&nvme_instance_ida, ctrl->instance);
> +	spin_unlock(&dev_list_lock);
> +}
> +
> +static void nvme_free_ctrl(struct kref *kref)
> +{
> +	struct nvme_ctrl *ctrl = container_of(kref, struct 
> nvme_ctrl, kref);
> +
> +	spin_lock(&dev_list_lock);
> +	list_del(&ctrl->node);
> +	spin_unlock(&dev_list_lock);
> +
> +	put_device(ctrl->device);
> +	nvme_release_instance(ctrl);
> +	device_destroy(nvme_class, MKDEV(nvme_char_major, ctrl
> ->instance));
> +
> +	ctrl->ops->free_ctrl(ctrl);
> +}
> +
> +void nvme_put_ctrl(struct nvme_ctrl *ctrl)
> +{
> +	kref_put(&ctrl->kref, nvme_free_ctrl);
> +}
> +
> +/*
> + * Initialize a NVMe controller structures.  This needs to be called 
> during
> + * earliest initialization so that we have the initialized 
> structured around
> + * during probing.
> + */
> +int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
> +		const struct nvme_ctrl_ops *ops, u16 vendor,
> +		unsigned long quirks)
> +{
> +	int ret;
> +
> +	INIT_LIST_HEAD(&ctrl->namespaces);
> +	kref_init(&ctrl->kref);
> +	ctrl->dev = dev;
> +	ctrl->ops = ops;
> +	ctrl->vendor = vendor;
> +	ctrl->quirks = quirks;
> +
> +	ret = nvme_set_instance(ctrl);
> +	if (ret)
> +		goto out;
> +
> +	ctrl->device = device_create(nvme_class, ctrl->dev,
> +				MKDEV(nvme_char_major, ctrl
> ->instance),
> +				dev, "nvme%d", ctrl->instance);
> +	if (IS_ERR(ctrl->device)) {
> +		ret = PTR_ERR(ctrl->device);
> +		goto out_release_instance;
> +	}
> +	get_device(ctrl->device);
> +	dev_set_drvdata(ctrl->device, ctrl);
> +
> +	ret = device_create_file(ctrl->device, 
> &dev_attr_reset_controller);
> +	if (ret)
> +		goto out_put_device;
> +
> +	spin_lock(&dev_list_lock);
> +	list_add_tail(&ctrl->node, &nvme_ctrl_list);
> +	spin_unlock(&dev_list_lock);
> +
> +	return 0;
> +
> +out_put_device:
> +	put_device(ctrl->device);
> +	device_destroy(nvme_class, MKDEV(nvme_char_major, ctrl
> ->instance));
> +out_release_instance:
> +	nvme_release_instance(ctrl);
> +out:
> +	return ret;
> +}
> +
>  int __init nvme_core_init(void)
>  {
>  	int result;
> @@ -813,10 +997,31 @@ int __init nvme_core_init(void)
>  	else if (result > 0)
>  		nvme_major = result;
>  
> +	result = __register_chrdev(nvme_char_major, 0, NVME_MINORS, 
> "nvme",
> +							&nvme_dev_fo
> ps);
> +	if (result < 0)
> +		goto unregister_blkdev;
> +	else if (result > 0)
> +		nvme_char_major = result;
> +
> +	nvme_class = class_create(THIS_MODULE, "nvme");

It would be better to have "nvme" as a #define somewhere, probably in
the .h?

> +	if (IS_ERR(nvme_class)) {
> +		result = PTR_ERR(nvme_class);
> +		goto unregister_chrdev;
> +	}
> +
>  	return 0;
> +
> + unregister_chrdev:
> +	__unregister_chrdev(nvme_char_major, 0, NVME_MINORS, 
> "nvme");
> + unregister_blkdev:
> +	unregister_blkdev(nvme_major, "nvme");
> +	return result;
>  }
>  
>  void nvme_core_exit(void)
>  {
>  	unregister_blkdev(nvme_major, "nvme");
> +	class_destroy(nvme_class);
> +	__unregister_chrdev(nvme_char_major, 0, NVME_MINORS, 
> "nvme");
>  }
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index 53e82feb..da63835 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -19,8 +19,6 @@
>  #include <linux/kref.h>
>  #include <linux/blk-mq.h>
>  
> -struct nvme_passthru_cmd;
> -
>  extern unsigned char nvme_io_timeout;
>  #define NVME_IO_TIMEOUT	(nvme_io_timeout * HZ)
>  
> @@ -48,6 +46,7 @@ struct nvme_ctrl {
>  	struct blk_mq_tag_set *tagset;
>  	struct list_head namespaces;
>  	struct device *device;	/* char device */
> +	struct list_head node;
>  
>  	char name[12];
>  	char serial[20];
> @@ -60,6 +59,8 @@ struct nvme_ctrl {
>  	u16 abort_limit;
>  	u8 event_limit;
>  	u8 vwc;
> +	u32 vs;
> +	bool subsystem;

OK, so 'bool subsystem' got added back in.  Not sure still how a bool
helps define a controller into a given subsystem.  Isn't the definition
of an NVM subsystem 1 or more controllers?  So every new 'struct
nvme_ctrl' instance is going to set this to 'true'?

Or looking into the future, say if this is on a future fabric
connection, there could be lots of controllers under lots of distinct
subsystems.  Then I don't know how 'bool subsystem' makes sense and
distinguishes a controller in a given NVM subsystem.

>  	u16 vendor;
>  	unsigned long quirks;
>  };
> @@ -87,7 +88,9 @@ struct nvme_ns {
>  struct nvme_ctrl_ops {
>  	int (*reg_read32)(struct nvme_ctrl *ctrl, u32 off, u32 
> *val);
>  	int (*reg_read64)(struct nvme_ctrl *ctrl, u32 off, u64 
> *val);
> +	int (*reg_write32)(struct nvme_ctrl *ctrl, u32 off, u32 
> val);

I don't think they are being used currently, but the ACQ and ASQ 8-byte
registers do have RW fields.  Maybe add "int (*reg_write64)()" as well?

>  	bool (*io_incapable)(struct nvme_ctrl *ctrl);
> +	int (*reset_ctrl)(struct nvme_ctrl *ctrl);

Probably would want a "(*shutdown_ctrl)()" as well?

>  	void (*free_ctrl)(struct nvme_ctrl *ctrl);
>  };
>  
> @@ -111,6 +114,13 @@ static inline bool nvme_io_incapable(struct 
> nvme_ctrl *ctrl)
>  	return val & NVME_CSTS_CFS;
>  }
>  
> +static inline int nvme_reset_subsystem(struct nvme_ctrl *ctrl)
> +{
> +	if (!ctrl->subsystem)
> +		return -ENOTTY;
> +	return ctrl->ops->reg_write32(ctrl, NVME_REG_NSSR, 
> 0x4E564D65);

It would be really good to have the hex value in a #define, most likely
located in the nvme.h file.
 
> +}
> +
>  

Lots of good work Christoph, thanks,
Jay

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 18/18] nvme: move chardev and sysfs interface to common code
  2015-10-22  0:11   ` J Freyensee
@ 2015-10-22  7:45     ` Christoph Hellwig
  2015-10-22 18:36       ` J Freyensee
  0 siblings, 1 reply; 59+ messages in thread
From: Christoph Hellwig @ 2015-10-22  7:45 UTC (permalink / raw)


On Wed, Oct 21, 2015@05:11:11PM -0700, J Freyensee wrote:
> > +	spin_lock(&dev_list_lock);
> > +	list_for_each_entry(ctrl, &nvme_ctrl_list, node) {
> 
> list_for_each_entry_safe() and/or some type of lock access?

list_for_each_entry_safe does not synchronize, it just ensures you
can continue to iterate after deleting the current item.

And the spin_lock() call above provides the required synchronization.

> > +	if (result < 0)
> > +		goto unregister_blkdev;
> > +	else if (result > 0)
> > +		nvme_char_major = result;
> > +
> > +	nvme_class = class_create(THIS_MODULE, "nvme");
> 
> It would be better to have "nvme" as a #define somewhere, probably in
> the .h?

Why?

> >  	char name[12];
> >  	char serial[20];
> > @@ -60,6 +59,8 @@ struct nvme_ctrl {
> >  	u16 abort_limit;
> >  	u8 event_limit;
> >  	u8 vwc;
> > +	u32 vs;
> > +	bool subsystem;
> 
> OK, so 'bool subsystem' got added back in.  Not sure still how a bool
> helps define a controller into a given subsystem.  Isn't the definition
> of an NVM subsystem 1 or more controllers?  So every new 'struct
> nvme_ctrl' instance is going to set this to 'true'?

In NVMe 1.0 the concept of subsystems didn't exist.  Now strictly speaking
what we care about here is if a subsystem _reset_ is supported, but I've
kept the name from the existing code for now.

> Or looking into the future, say if this is on a future fabric
> connection, there could be lots of controllers under lots of distinct
> subsystems.  Then I don't know how 'bool subsystem' makes sense and
> distinguishes a controller in a given NVM subsystem.

That's not what the flag is used for.  It just indicates if we can do
a subsystem reset.

> > @@ -87,7 +88,9 @@ struct nvme_ns {
> >  struct nvme_ctrl_ops {
> >  	int (*reg_read32)(struct nvme_ctrl *ctrl, u32 off, u32 
> > *val);
> >  	int (*reg_read64)(struct nvme_ctrl *ctrl, u32 off, u64 
> > *val);
> > +	int (*reg_write32)(struct nvme_ctrl *ctrl, u32 off, u32 
> > val);
> 
> I don't think they are being used currently, but the ACQ and ASQ 8-byte
> registers do have RW fields.  Maybe add "int (*reg_write64)()" as well?

We can do that once we actually need it.  But at least ACQ and ASQ are
deeply tied to PCI specific initialization so right now there is no need
for that.

> >  	bool (*io_incapable)(struct nvme_ctrl *ctrl);
> > +	int (*reset_ctrl)(struct nvme_ctrl *ctrl);
> 
> Probably would want a "(*shutdown_ctrl)()" as well?

We currently don't shut the controller down from generic code, so until
we have a state machine that might do that there's no need for that.

> 
> > +	return ctrl->ops->reg_write32(ctrl, NVME_REG_NSSR, 
> > 0x4E564D65);
> 
> It would be really good to have the hex value in a #define, most likely
> located in the nvme.h file.

Feel free to send incremental patches for these sorts of cleanups!

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 18/18] nvme: move chardev and sysfs interface to common code
  2015-10-22  7:45     ` Christoph Hellwig
@ 2015-10-22 18:36       ` J Freyensee
  2015-10-22 18:59         ` Jon Derrick
  0 siblings, 1 reply; 59+ messages in thread
From: J Freyensee @ 2015-10-22 18:36 UTC (permalink / raw)


On Thu, 2015-10-22@09:45 +0200, Christoph Hellwig wrote:
> On Wed, Oct 21, 2015@05:11:11PM -0700, J Freyensee wrote:
> > > +	spin_lock(&dev_list_lock);
> > > +	list_for_each_entry(ctrl, &nvme_ctrl_list, node) {
> > 
> > list_for_each_entry_safe() and/or some type of lock access?
> 
> list_for_each_entry_safe does not synchronize, it just ensures you
> can continue to iterate after deleting the current item.
> 
> And the spin_lock() call above provides the required synchronization.
> 
> > > +	if (result < 0)
> > > +		goto unregister_blkdev;
> > > +	else if (result > 0)
> > > +		nvme_char_major = result;
> > > +
> > > +	nvme_class = class_create(THIS_MODULE, "nvme");
> > 
> > It would be better to have "nvme" as a #define somewhere, probably 
> > in
> > the .h?
> 
> Why?

Well, for starters, it's good practice to have strings like this in a
#define.

I can send a patch to redefine this and 0x4E564D65 in the nvme.h.

> 
> > >  	char name[12];
> > >  	char serial[20];
> > > @@ -60,6 +59,8 @@ struct nvme_ctrl {
> > >  	u16 abort_limit;
> > >  	u8 event_limit;
> > >  	u8 vwc;
> > > +	u32 vs;
> > > +	bool subsystem;
> > 
> > OK, so 'bool subsystem' got added back in.  Not sure still how a 
> > bool
> > helps define a controller into a given subsystem.  Isn't the 
> > definition
> > of an NVM subsystem 1 or more controllers?  So every new 'struct
> > nvme_ctrl' instance is going to set this to 'true'?
> 
> In NVMe 1.0 the concept of subsystems didn't exist.  Now strictly 
> speaking
> what we care about here is if a subsystem _reset_ is supported, but 
> I've
> kept the name from the existing code for now.

OK understood.

> 
> > Or looking into the future, say if this is on a future fabric
> > connection, there could be lots of controllers under lots of 
> > distinct
> > subsystems.  Then I don't know how 'bool subsystem' makes sense and
> > distinguishes a controller in a given NVM subsystem.
> 
> That's not what the flag is used for.  It just indicates if we can do
> a subsystem reset.
> 
> > > @@ -87,7 +88,9 @@ struct nvme_ns {
> > >  struct nvme_ctrl_ops {
> > >  	int (*reg_read32)(struct nvme_ctrl *ctrl, u32 off, u32 
> > > *val);
> > >  	int (*reg_read64)(struct nvme_ctrl *ctrl, u32 off, u64 
> > > *val);
> > > +	int (*reg_write32)(struct nvme_ctrl *ctrl, u32 off, u32 
> > > val);
> > 
> > I don't think they are being used currently, but the ACQ and ASQ 8
> > -byte
> > registers do have RW fields.  Maybe add "int (*reg_write64)()" as 
> > well?
> 
> We can do that once we actually need it.  But at least ACQ and ASQ 
> are
> deeply tied to PCI specific initialization so right now there is no 
> need
> for that.
> 
> > >  	bool (*io_incapable)(struct nvme_ctrl *ctrl);
> > > +	int (*reset_ctrl)(struct nvme_ctrl *ctrl);
> > 
> > Probably would want a "(*shutdown_ctrl)()" as well?
> 
> We currently don't shut the controller down from generic code, so 
> until
> we have a state machine that might do that there's no need for that.
> 
> > 
> > > +	return ctrl->ops->reg_write32(ctrl, NVME_REG_NSSR, 
> > > 0x4E564D65);
> > 
> > It would be really good to have the hex value in a #define, most 
> > likely
> > located in the nvme.h file.
> 
> Feel free to send incremental patches for these sorts of cleanups!

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 18/18] nvme: move chardev and sysfs interface to common code
  2015-10-22 18:36       ` J Freyensee
@ 2015-10-22 18:59         ` Jon Derrick
  0 siblings, 0 replies; 59+ messages in thread
From: Jon Derrick @ 2015-10-22 18:59 UTC (permalink / raw)


> > Why?
> 
> Well, for starters, it's good practice to have strings like this in a
> #define.
> 
> I can send a patch to redefine this and 0x4E564D65 in the nvme.h.
IMO, make the hex "NVMe" a #define or at least keep its comment.
Keep "nvme" in the class_create call because there is no point in obfuscating that :)

^ permalink raw reply	[flat|nested] 59+ messages in thread

end of thread, other threads:[~2015-10-23  5:41 UTC | newest]

Thread overview: 59+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-10-16  5:58 nvme driver split V2 Christoph Hellwig
2015-10-16  5:58 ` [PATCH 01/18] nvme: add missing unmaps in nvme_queue_rq Christoph Hellwig
2015-10-20 10:04   ` Sagi Grimberg
2015-10-20 14:07     ` Busch, Keith
2015-10-16  5:58 ` [PATCH 02/18] nvme: move struct nvme_iod to pci.c Christoph Hellwig
2015-10-20 10:04   ` Sagi Grimberg
2015-10-16  5:58 ` [PATCH 03/18] nvme: split command submission helpers out of pci.c Christoph Hellwig
2015-10-20 10:07   ` Sagi Grimberg
2015-10-21 18:48   ` J Freyensee
2015-10-16  5:58 ` [PATCH 04/18] nvme: add a vendor field to struct nvme_dev Christoph Hellwig
2015-10-20 10:08   ` Sagi Grimberg
2015-10-21 18:58   ` J Freyensee
2015-10-21 19:10     ` Busch, Keith
2015-10-16  5:58 ` [PATCH 05/18] nvme: use offset instead of a struct for registers Christoph Hellwig
2015-10-16 17:30   ` Busch, Keith
2015-10-21 20:28   ` J Freyensee
2015-10-16  5:58 ` [PATCH 06/18] nvme: split a new struct nvme_ctrl out of struct nvme_dev Christoph Hellwig
2015-10-20 10:19   ` Sagi Grimberg
2015-10-20 10:26     ` Christoph Hellwig
2015-10-20 10:44       ` Sagi Grimberg
2015-10-20 11:30         ` Christoph Hellwig
2015-10-21 14:40           ` Sagi Grimberg
2015-10-21 21:23   ` J Freyensee
2015-10-21 22:51     ` Busch, Keith
2015-10-22  0:15       ` J Freyensee
2015-10-22  7:37     ` Christoph Hellwig
2015-10-16  5:58 ` [PATCH 07/18] nvme: simplify nvme_setup_prps calling convention Christoph Hellwig
2015-10-20 10:30   ` Sagi Grimberg
2015-10-16  5:58 ` [PATCH 08/18] nvme: refactor nvme_queue_rq Christoph Hellwig
2015-10-16  5:58 ` [PATCH 09/18] nvme: move nvme_error_status to common code Christoph Hellwig
2015-10-20 10:54   ` Sagi Grimberg
2015-10-16  5:58 ` [PATCH 10/18] nvme: move nvme_setup_flush and nvme_setup_rw " Christoph Hellwig
2015-10-20 11:01   ` Sagi Grimberg
2015-10-21  6:55     ` Christoph Hellwig
2015-10-21 14:41       ` Sagi Grimberg
2015-10-16  5:58 ` [PATCH 11/18] nvme: split __nvme_submit_sync_cmd Christoph Hellwig
2015-10-16  5:58 ` [PATCH 12/18] nvme: use the block layer for userspace passthrough metadata Christoph Hellwig
2015-10-16 20:04   ` Busch, Keith
2015-10-18 18:22     ` Christoph Hellwig
2015-10-16  5:58 ` [PATCH 13/18] nvme: move block_device_operations and ns/ctrl freeing to common code Christoph Hellwig
2015-10-16  5:58 ` [PATCH 14/18] nvme: add explicit quirk handling Christoph Hellwig
2015-10-16  5:58 ` [PATCH 15/18] nvme: add a common helper to read Identify Controller data Christoph Hellwig
2015-10-21 22:44   ` J Freyensee
2015-10-22  7:38     ` Christoph Hellwig
2015-10-16  5:58 ` [PATCH 16/18] nvme: move the call to nvme_init_identify earlier Christoph Hellwig
2015-10-16  5:58 ` [PATCH 17/18] nvme: move namespace scanning to common code Christoph Hellwig
2015-10-16  6:14   ` Ming Lin
2015-10-16  6:16     ` Christoph Hellwig
2015-10-21 23:27   ` J Freyensee
2015-10-22  7:39     ` Christoph Hellwig
2015-10-22 13:48       ` Busch, Keith
2015-10-22 16:30         ` Christoph Hellwig
2015-10-22 21:24           ` Busch, Keith
2015-10-23  5:41             ` Christoph Hellwig
2015-10-16  5:58 ` [PATCH 18/18] nvme: move chardev and sysfs interface " Christoph Hellwig
2015-10-22  0:11   ` J Freyensee
2015-10-22  7:45     ` Christoph Hellwig
2015-10-22 18:36       ` J Freyensee
2015-10-22 18:59         ` Jon Derrick

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).