Linux-NVME Archive on lore.kernel.org

* [PATCH 1/3] blk-mq: Allow PCI vector offset for mapping queues
From: Keith Busch @ 2018-03-23 22:19 UTC

The PCI interrupt vectors intended to be associated with a queue may
not start at 0. This patch adds an offset parameter so blk-mq may find
the intended affinity mask. The default value is 0 so existing drivers
that don't care about this parameter don't need to change.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 block/blk-mq-pci.c         | 12 ++++++++++--
 include/linux/blk-mq-pci.h |  2 ++
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq-pci.c b/block/blk-mq-pci.c
index 76944e3271bf..1040a7705c13 100644
--- a/block/blk-mq-pci.c
+++ b/block/blk-mq-pci.c
@@ -21,6 +21,7 @@
  * blk_mq_pci_map_queues - provide a default queue mapping for PCI device
  * @set:	tagset to provide the mapping for
  * @pdev:	PCI device associated with @set.
+ * @offset:	PCI irq starting vector offset
  *
  * This function assumes the PCI device @pdev has at least as many available
  * interrupt vectors as @set has queues.  It will then query the vector
@@ -28,13 +29,14 @@
  * that maps a queue to the CPUs that have irq affinity for the corresponding
  * vector.
  */
-int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev)
+int __blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev,
+			    int offset)
 {
 	const struct cpumask *mask;
 	unsigned int queue, cpu;
 
 	for (queue = 0; queue < set->nr_hw_queues; queue++) {
-		mask = pci_irq_get_affinity(pdev, queue);
+		mask = pci_irq_get_affinity(pdev, queue + offset);
 		if (!mask)
 			goto fallback;
 
@@ -50,4 +52,10 @@ int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev)
 		set->mq_map[cpu] = 0;
 	return 0;
 }
+EXPORT_SYMBOL_GPL(__blk_mq_pci_map_queues);
+
+int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev)
+{
+	return __blk_mq_pci_map_queues(set, pdev, 0);
+}
 EXPORT_SYMBOL_GPL(blk_mq_pci_map_queues);
diff --git a/include/linux/blk-mq-pci.h b/include/linux/blk-mq-pci.h
index 6338551e0fb9..5a92ecdbd78e 100644
--- a/include/linux/blk-mq-pci.h
+++ b/include/linux/blk-mq-pci.h
@@ -5,6 +5,8 @@
 struct blk_mq_tag_set;
 struct pci_dev;
 
+int __blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev,
+			    int offset);
 int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev);
 
 #endif /* _LINUX_BLK_MQ_PCI_H */
-- 
2.14.3
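
As a usage sketch only (a hypothetical driver, not part of this patch):
a driver that reserves one vector ahead of its queue vectors, e.g. via
'.pre_vectors = 1', would pass that count as the offset so hardware
queue 0 maps to vector 1:

	/* Hypothetical driver; 'struct foo_dev' and its pdev are made up. */
	static int foo_map_queues(struct blk_mq_tag_set *set)
	{
		struct foo_dev *foo = set->driver_data;

		/* Vector 0 is reserved, so queue N uses vector N + 1. */
		return __blk_mq_pci_map_queues(set, foo->pdev, 1);
	}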

* [PATCH 2/3] nvme-pci: Remove unused queue parameter
From: Keith Busch @ 2018-03-23 22:19 UTC

All nvme queue memory is allocated up front. We no longer take the node
into consideration when creating queues, so remove the now-unused
parameter.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/nvme/host/pci.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index cef5ce851a92..632166f7d8f2 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1379,8 +1379,7 @@ static int nvme_alloc_sq_cmds(struct nvme_dev *dev, struct nvme_queue *nvmeq,
 	return 0;
 }
 
-static int nvme_alloc_queue(struct nvme_dev *dev, int qid,
-		int depth, int node)
+static int nvme_alloc_queue(struct nvme_dev *dev, int qid, int depth)
 {
 	struct nvme_queue *nvmeq = &dev->queues[qid];
 
@@ -1595,8 +1594,7 @@ static int nvme_pci_configure_admin_queue(struct nvme_dev *dev)
 	if (result < 0)
 		return result;
 
-	result = nvme_alloc_queue(dev, 0, NVME_AQ_DEPTH,
-			dev_to_node(dev->dev));
+	result = nvme_alloc_queue(dev, 0, NVME_AQ_DEPTH);
 	if (result)
 		return result;
 
@@ -1629,9 +1627,7 @@ static int nvme_create_io_queues(struct nvme_dev *dev)
 	int ret = 0;
 
 	for (i = dev->ctrl.queue_count; i <= dev->max_qid; i++) {
-		/* vector == qid - 1, match nvme_create_queue */
-		if (nvme_alloc_queue(dev, i, dev->q_depth,
-		     pci_irq_get_node(to_pci_dev(dev->dev), i - 1))) {
+		if (nvme_alloc_queue(dev, i, dev->q_depth)) {
 			ret = -ENOMEM;
 			break;
 		}
-- 
2.14.3

* [PATCH 3/3] nvme-pci: Separate IO and admin queue IRQ vectors
From: Keith Busch @ 2018-03-23 22:19 UTC

From: Jianchao Wang <jianchao.w.wang@oracle.com>

The admin and first IO queues shared the first irq vector, which has an
affinity mask including cpu0. If a system allows cpu0 to be offlined,
the admin queue may not be usable if no other CPUs in the affinity mask
are online. This is a problem because, unlike the IO queues, there is
only one admin queue, and it always needs to be usable.

To fix this, allocate one pre_vector for the admin queue that is
assigned all CPUs, so it will always be accessible. The IO queues are
assigned the remaining managed vectors.

In case a controller has only one interrupt vector available, the admin
and IO queues will share the pre_vector with all CPUs assigned.

Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
[changelog, code comments, merge, and blk-mq pci vector offset]
Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/nvme/host/pci.c | 27 +++++++++++++++++++++------
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 632166f7d8f2..7b31bc01df6c 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -84,6 +84,7 @@ struct nvme_dev {
 	struct dma_pool *prp_small_pool;
 	unsigned online_queues;
 	unsigned max_qid;
+	unsigned int num_vecs;
 	int q_depth;
 	u32 db_stride;
 	void __iomem *bar;
@@ -139,6 +140,16 @@ static inline struct nvme_dev *to_nvme_dev(struct nvme_ctrl *ctrl)
 	return container_of(ctrl, struct nvme_dev, ctrl);
 }
 
+static inline unsigned int nvme_ioq_vector(struct nvme_dev *dev,
+		unsigned int qid)
+{
+	/*
+	 * A queue's vector matches the queue identifier unless the controller
+	 * has only one vector available.
+	 */
+	return (dev->num_vecs == 1) ? 0 : qid;
+}
+
 /*
  * An NVM Express queue.  Each device has at least two (one for admin
  * commands and one for I/O commands).
@@ -414,7 +425,8 @@ static int nvme_pci_map_queues(struct blk_mq_tag_set *set)
 {
 	struct nvme_dev *dev = set->driver_data;
 
-	return blk_mq_pci_map_queues(set, to_pci_dev(dev->dev));
+	return __blk_mq_pci_map_queues(set, to_pci_dev(dev->dev),
+				       dev->num_vecs > 1);
 }
 
 /**
@@ -1455,7 +1467,7 @@ static int nvme_create_queue(struct nvme_queue *nvmeq, int qid)
 		nvmeq->sq_cmds_io = dev->cmb + offset;
 	}
 
-	nvmeq->cq_vector = qid - 1;
+	nvmeq->cq_vector = nvme_ioq_vector(dev, qid);
 	result = adapter_alloc_cq(dev, qid, nvmeq);
 	if (result < 0)
 		goto release_vector;
@@ -1908,6 +1920,8 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
 	struct pci_dev *pdev = to_pci_dev(dev->dev);
 	int result, nr_io_queues;
 	unsigned long size;
+	struct irq_affinity affd = {.pre_vectors = 1};
+	int ret;
 
 	nr_io_queues = num_present_cpus();
 	result = nvme_set_queue_count(&dev->ctrl, &nr_io_queues);
@@ -1944,11 +1958,12 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
 	 * setting up the full range we need.
 	 */
 	pci_free_irq_vectors(pdev);
-	nr_io_queues = pci_alloc_irq_vectors(pdev, 1, nr_io_queues,
-			PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY);
-	if (nr_io_queues <= 0)
+	ret = pci_alloc_irq_vectors_affinity(pdev, 1, (nr_io_queues + 1),
+			PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY, &affd);
+	if (ret <= 0)
 		return -EIO;
-	dev->max_qid = nr_io_queues;
+	dev->num_vecs = ret;
+	dev->max_qid = max(ret - 1, 1);
 
 	/*
 	 * Should investigate if there's a performance win from allocating
-- 
2.14.3
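
For a concrete picture of the vector layout this sets up (illustrative
comments only, matching the diff above):

	struct irq_affinity affd = {.pre_vectors = 1};

	/*
	 * Requests up to nr_io_queues + 1 vectors:
	 *   vector 0      - pre_vector for the admin queue; its affinity
	 *                   covers all CPUs and is not managed
	 *   vector 1..n   - IO queues, with managed affinity spread
	 *                   across the CPUs
	 * If the controller grants only one vector, admin and IO share
	 * vector 0, hence max_qid = max(ret - 1, 1).
	 */
	ret = pci_alloc_irq_vectors_affinity(pdev, 1, nr_io_queues + 1,
			PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY, &affd);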

* [PATCH 1/3] blk-mq: Allow PCI vector offset for mapping queues
From: jianchao.wang @ 2018-03-24 13:55 UTC

Hi Keith

Thanks for your time and patch for this.

On 03/24/2018 06:19 AM, Keith Busch wrote:
> The PCI interrupt vectors intended to be associated with a queue may
> not start at 0. This patch adds an offset parameter so blk-mq may find
> the intended affinity mask. The default value is 0 so existing drivers
> that don't care about this parameter don't need to change.
> 
> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---
>  block/blk-mq-pci.c         | 12 ++++++++++--
>  include/linux/blk-mq-pci.h |  2 ++
>  2 files changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/block/blk-mq-pci.c b/block/blk-mq-pci.c
> index 76944e3271bf..1040a7705c13 100644
> --- a/block/blk-mq-pci.c
> +++ b/block/blk-mq-pci.c
> @@ -21,6 +21,7 @@
>   * blk_mq_pci_map_queues - provide a default queue mapping for PCI device
>   * @set:	tagset to provide the mapping for
>   * @pdev:	PCI device associated with @set.
> + * @offset:	PCI irq starting vector offset
>   *
>   * This function assumes the PCI device @pdev has at least as many available
>   * interrupt vectors as @set has queues.  It will then query the vector
> @@ -28,13 +29,14 @@
>   * that maps a queue to the CPUs that have irq affinity for the corresponding
>   * vector.
>   */
> -int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev)
> +int __blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev,
> +			    int offset)
>  {
>  	const struct cpumask *mask;
>  	unsigned int queue, cpu;
>  
>  	for (queue = 0; queue < set->nr_hw_queues; queue++) {
> -		mask = pci_irq_get_affinity(pdev, queue);
> +		mask = pci_irq_get_affinity(pdev, queue + offset);
>  		if (!mask)
>  			goto fallback;
>  

Maybe we could provide a callback parameter for __blk_mq_pci_map_queues
which gives the mapping from hctx queue number to the device-relative
interrupt vector index.
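
Something like the following, perhaps (a purely hypothetical signature,
just to illustrate the idea):

	/* Hypothetical: the driver translates an hctx queue number into
	 * a device-relative interrupt vector index.
	 */
	typedef int (*blk_mq_pci_vec_idx_fn)(struct pci_dev *pdev,
					     unsigned int queue);

	int __blk_mq_pci_map_queues(struct blk_mq_tag_set *set,
				    struct pci_dev *pdev,
				    blk_mq_pci_vec_idx_fn vec_idx);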

Thanks
Jianchao

* [PATCH 2/3] nvme-pci: Remove unused queue parameter
From: Ming Lei @ 2018-03-26  1:47 UTC

On Fri, Mar 23, 2018 at 04:19:22PM -0600, Keith Busch wrote:
> All nvme queue memory is allocated up front. We no longer take the node
> into consideration when creating queues, so remove the now-unused
> parameter.
> 
> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---
>  drivers/nvme/host/pci.c | 10 +++-------
>  1 file changed, 3 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index cef5ce851a92..632166f7d8f2 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -1379,8 +1379,7 @@ static int nvme_alloc_sq_cmds(struct nvme_dev *dev, struct nvme_queue *nvmeq,
>  	return 0;
>  }
>  
> -static int nvme_alloc_queue(struct nvme_dev *dev, int qid,
> -		int depth, int node)
> +static int nvme_alloc_queue(struct nvme_dev *dev, int qid, int depth)
>  {
>  	struct nvme_queue *nvmeq = &dev->queues[qid];
>  
> @@ -1595,8 +1594,7 @@ static int nvme_pci_configure_admin_queue(struct nvme_dev *dev)
>  	if (result < 0)
>  		return result;
>  
> -	result = nvme_alloc_queue(dev, 0, NVME_AQ_DEPTH,
> -			dev_to_node(dev->dev));
> +	result = nvme_alloc_queue(dev, 0, NVME_AQ_DEPTH);
>  	if (result)
>  		return result;
>  
> @@ -1629,9 +1627,7 @@ static int nvme_create_io_queues(struct nvme_dev *dev)
>  	int ret = 0;
>  
>  	for (i = dev->ctrl.queue_count; i <= dev->max_qid; i++) {
> -		/* vector == qid - 1, match nvme_create_queue */
> -		if (nvme_alloc_queue(dev, i, dev->q_depth,
> -		     pci_irq_get_node(to_pci_dev(dev->dev), i - 1))) {
> +		if (nvme_alloc_queue(dev, i, dev->q_depth)) {
>  			ret = -ENOMEM;
>  			break;
>  		}

nvme_create_io_queues() is called after pci_alloc_irq_vectors() returns,
and the above pci_irq_get_node() should return the correct node info,
right?

Thanks,
Ming

* [PATCH 1/3] blk-mq: Allow PCI vector offset for mapping queues
From: Ming Lei @ 2018-03-26  1:50 UTC

On Fri, Mar 23, 2018 at 04:19:21PM -0600, Keith Busch wrote:
> The PCI interrupt vectors intended to be associated with a queue may
> not start at 0. This patch adds an offset parameter so blk-mq may find
> the intended affinity mask. The default value is 0 so existing drivers
> that don't care about this parameter don't need to change.
> 
> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---
>  block/blk-mq-pci.c         | 12 ++++++++++--
>  include/linux/blk-mq-pci.h |  2 ++
>  2 files changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/block/blk-mq-pci.c b/block/blk-mq-pci.c
> index 76944e3271bf..1040a7705c13 100644
> --- a/block/blk-mq-pci.c
> +++ b/block/blk-mq-pci.c
> @@ -21,6 +21,7 @@
>   * blk_mq_pci_map_queues - provide a default queue mapping for PCI device
>   * @set:	tagset to provide the mapping for
>   * @pdev:	PCI device associated with @set.
> + * @offset:	PCI irq starting vector offset
>   *
>   * This function assumes the PCI device @pdev has at least as many available
>   * interrupt vectors as @set has queues.  It will then query the vector
> @@ -28,13 +29,14 @@
>   * that maps a queue to the CPUs that have irq affinity for the corresponding
>   * vector.
>   */
> -int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev)
> +int __blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev,
> +			    int offset)
>  {
>  	const struct cpumask *mask;
>  	unsigned int queue, cpu;
>  
>  	for (queue = 0; queue < set->nr_hw_queues; queue++) {
> -		mask = pci_irq_get_affinity(pdev, queue);
> +		mask = pci_irq_get_affinity(pdev, queue + offset);
>  		if (!mask)
>  			goto fallback;
>  
> @@ -50,4 +52,10 @@ int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev)
>  		set->mq_map[cpu] = 0;
>  	return 0;
>  }
> +EXPORT_SYMBOL_GPL(__blk_mq_pci_map_queues);
> +
> +int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev)
> +{
> +	return __blk_mq_pci_map_queues(set, pdev, 0);
> +}
>  EXPORT_SYMBOL_GPL(blk_mq_pci_map_queues);
> diff --git a/include/linux/blk-mq-pci.h b/include/linux/blk-mq-pci.h
> index 6338551e0fb9..5a92ecdbd78e 100644
> --- a/include/linux/blk-mq-pci.h
> +++ b/include/linux/blk-mq-pci.h
> @@ -5,6 +5,8 @@
>  struct blk_mq_tag_set;
>  struct pci_dev;
>  
> +int __blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev,
> +			    int offset);
>  int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev);
>  
>  #endif /* _LINUX_BLK_MQ_PCI_H */
> -- 
> 2.14.3
> 

Given that there aren't many callers of blk_mq_pci_map_queues(), I
suggest adding the 'offset' parameter to this API directly; then people
will keep the '.pre_vectors' stuff in mind and avoid misusing it.
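
That is, something along these lines (a sketch of the suggested
prototype):

	int blk_mq_pci_map_queues(struct blk_mq_tag_set *set,
				  struct pci_dev *pdev, int offset);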

Thanks,
Ming

* [PATCH 2/3] nvme-pci: Remove unused queue parameter
From: Keith Busch @ 2018-03-26 14:48 UTC

On Mon, Mar 26, 2018 at 09:47:07AM +0800, Ming Lei wrote:
> On Fri, Mar 23, 2018 at 04:19:22PM -0600, Keith Busch wrote:
> > @@ -1629,9 +1627,7 @@ static int nvme_create_io_queues(struct nvme_dev *dev)
> >  	int ret = 0;
> >  
> >  	for (i = dev->ctrl.queue_count; i <= dev->max_qid; i++) {
> > -		/* vector == qid - 1, match nvme_create_queue */
> > -		if (nvme_alloc_queue(dev, i, dev->q_depth,
> > -		     pci_irq_get_node(to_pci_dev(dev->dev), i - 1))) {
> > +		if (nvme_alloc_queue(dev, i, dev->q_depth)) {
> >  			ret = -ENOMEM;
> >  			break;
> >  		}
> 
> nvme_create_io_queues() is called after pci_alloc_irq_vectors() returns,
> and the above pci_irq_get_node() should return the correct node info,
> right?

Right, the return is correct. It's just not being used anymore.

* [PATCH 1/3] blk-mq: Allow PCI vector offset for mapping queues
From: Keith Busch @ 2018-03-26 19:33 UTC

On Sat, Mar 24, 2018 at 09:55:49PM +0800, jianchao.wang wrote:
> Maybe we could provide a callback parameter for __blk_mq_pci_map_queues
> which gives the mapping from hctx queue number to the device-relative
> interrupt vector index.

If a driver's mapping is so complicated as to require a special per-hctx
callback, it'd be just as easy to implement the mapping in that driver's
blk_mq_ops' 'map_queues' directly.
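
For example, such a driver could open-code the mapping in its own
'map_queues' (hypothetical sketch; foo_queue_to_vector() is made up):

	static int foo_map_queues(struct blk_mq_tag_set *set)
	{
		struct foo_dev *foo = set->driver_data;
		unsigned int queue, cpu;

		for (queue = 0; queue < set->nr_hw_queues; queue++) {
			const struct cpumask *mask;

			mask = pci_irq_get_affinity(foo->pdev,
					foo_queue_to_vector(foo, queue));
			if (!mask)
				return blk_mq_map_queues(set);
			for_each_cpu(cpu, mask)
				set->mq_map[cpu] = queue;
		}
		return 0;
	}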

* [PATCH 1/3] blk-mq: Allow PCI vector offset for mapping queues
From: Keith Busch @ 2018-03-26 19:37 UTC

On Mon, Mar 26, 2018 at 09:50:38AM +0800, Ming Lei wrote:
> 
> Given that there aren't many callers of blk_mq_pci_map_queues(), I
> suggest adding the 'offset' parameter to this API directly; then people
> will keep the '.pre_vectors' stuff in mind and avoid misusing it.

Yeah, I think I have to agree. I was trying really hard to not touch
other drivers, but the concept doesn't seem odd enough to justify hiding
it behind a default parameter. I'll send v2 tomorrow if there's no other
feedback.

* [PATCH 1/3] blk-mq: Allow PCI vector offset for mapping queues
From: Christoph Hellwig @ 2018-03-27 14:17 UTC

On Fri, Mar 23, 2018 at 04:19:21PM -0600, Keith Busch wrote:
> The PCI interrupt vectors intended to be associated with a queue may
> not start at 0. This patch adds an offset parameter so blk-mq may find
> the intended affinity mask. The default value is 0 so existing drivers
> that don't care about this parameter don't need to change.
> 
> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---
>  block/blk-mq-pci.c         | 12 ++++++++++--
>  include/linux/blk-mq-pci.h |  2 ++
>  2 files changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/block/blk-mq-pci.c b/block/blk-mq-pci.c
> index 76944e3271bf..1040a7705c13 100644
> --- a/block/blk-mq-pci.c
> +++ b/block/blk-mq-pci.c
> @@ -21,6 +21,7 @@
>   * blk_mq_pci_map_queues - provide a default queue mapping for PCI device
>   * @set:	tagset to provide the mapping for
>   * @pdev:	PCI device associated with @set.
> + * @offset:	PCI irq starting vector offset
>   *
>   * This function assumes the PCI device @pdev has at least as many available
>   * interrupt vectors as @set has queues.  It will then query the vector
> @@ -28,13 +29,14 @@
>   * that maps a queue to the CPUs that have irq affinity for the corresponding
>   * vector.
>   */
> -int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev)
> +int __blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev,
> +			    int offset)

Can you just change the blk_mq_pci_map_queues prototype instead?

* [PATCH 2/3] nvme-pci: Remove unused queue parameter
From: Christoph Hellwig @ 2018-03-27 14:17 UTC

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

* [PATCH 3/3] nvme-pci: Separate IO and admin queue IRQ vectors
From: Christoph Hellwig @ 2018-03-27 14:20 UTC

> +static inline unsigned int nvme_ioq_vector(struct nvme_dev *dev,
> +		unsigned int qid)

No need for the inline here, I think.

> +{
> +	/*
> +	 * A queue's vector matches the queue identifier unless the controller
> +	 * has only one vector available.
> +	 */
> +	return (dev->num_vecs == 1) ? 0 : qid;

and no need for the braces here.

> +	struct irq_affinity affd = {.pre_vectors = 1};

	struct irq_affinity affd = {
		.pre_vectors	= 1
	};

to make it a little more readable.

* [PATCH 3/3] nvme-pci: Separate IO and admin queue IRQ vectors
From: Keith Busch @ 2018-03-27 15:39 UTC

The admin and first IO queues shared the first irq vector, which has an
affinity mask including cpu0. If a system allows cpu0 to be offlined,
the admin queue may not be usable if no other CPUs in the affinity mask
are online. This is a problem because, unlike the IO queues, there is
only one admin queue, and it always needs to be usable.

To fix this, allocate one pre_vector for the admin queue that is
assigned all CPUs, so it will always be accessible. The IO queues are
assigned the remaining managed vectors.

In case a controller has only one interrupt vector available, the admin
and IO queues will share the pre_vector with all CPUs assigned.

Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
---
v1 -> v2:

  Update to use new blk-mq API.

  Removed unnecessary braces, inline functions, and temp variables.

  Amended author (this has evolved significantly from the original).

 drivers/nvme/host/pci.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index cc47fbe32ea5..50c8eaf51d92 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -84,6 +84,7 @@ struct nvme_dev {
 	struct dma_pool *prp_small_pool;
 	unsigned online_queues;
 	unsigned max_qid;
+	unsigned int num_vecs;
 	int q_depth;
 	u32 db_stride;
 	void __iomem *bar;
@@ -414,7 +415,8 @@ static int nvme_pci_map_queues(struct blk_mq_tag_set *set)
 {
 	struct nvme_dev *dev = set->driver_data;
 
-	return blk_mq_pci_map_queues(set, to_pci_dev(dev->dev), 0);
+	return blk_mq_pci_map_queues(set, to_pci_dev(dev->dev),
+				     dev->num_vecs > 1);
 }
 
 /**
@@ -1455,7 +1457,11 @@ static int nvme_create_queue(struct nvme_queue *nvmeq, int qid)
 		nvmeq->sq_cmds_io = dev->cmb + offset;
 	}
 
-	nvmeq->cq_vector = qid - 1;
+	/*
+	 * A queue's vector matches the queue identifier unless the controller
+	 * has only one vector available.
+	 */
+	nvmeq->cq_vector = dev->num_vecs == 1 ? 0 : qid;
 	result = adapter_alloc_cq(dev, qid, nvmeq);
 	if (result < 0)
 		goto release_vector;
@@ -1909,6 +1915,10 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
 	int result, nr_io_queues;
 	unsigned long size;
 
+	struct irq_affinity affd = {
+		.pre_vectors = 1
+	};
+
 	nr_io_queues = num_present_cpus();
 	result = nvme_set_queue_count(&dev->ctrl, &nr_io_queues);
 	if (result < 0)
@@ -1944,11 +1954,12 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
 	 * setting up the full range we need.
 	 */
 	pci_free_irq_vectors(pdev);
-	nr_io_queues = pci_alloc_irq_vectors(pdev, 1, nr_io_queues,
-			PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY);
-	if (nr_io_queues <= 0)
+	result = pci_alloc_irq_vectors_affinity(pdev, 1, nr_io_queues + 1,
+			PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY, &affd);
+	if (result <= 0)
 		return -EIO;
-	dev->max_qid = nr_io_queues;
+	dev->num_vecs = result;
+	dev->max_qid = max(result - 1, 1);
 
 	/*
 	 * Should investigate if there's a performance win from allocating
-- 
2.14.3

* [PATCH 3/3] nvme-pci: Separate IO and admin queue IRQ vectors
From: Ming Lei @ 2018-03-28  2:08 UTC

On Tue, Mar 27, 2018 at 09:39:08AM -0600, Keith Busch wrote:
> The admin and first IO queues shared the first irq vector, which has an
> affinity mask including cpu0. If a system allows cpu0 to be offlined,
> the admin queue may not be usable if no other CPUs in the affinity mask
> are online. This is a problem because, unlike the IO queues, there is
> only one admin queue, and it always needs to be usable.
> 
> To fix this, allocate one pre_vector for the admin queue that is
> assigned all CPUs, so it will always be accessible. The IO queues are
> assigned the remaining managed vectors.
> 
> In case a controller has only one interrupt vector available, the admin
> and IO queues will share the pre_vector with all CPUs assigned.
> 
> Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---
> v1 -> v2:
> 
>   Update to use new blk-mq API.
> 
>   Removed unnecessary braces, inline functions, and temp variables.
> 
>   Amended author (this has evolved significantly from the original).
> 
>  drivers/nvme/host/pci.c | 23 +++++++++++++++++------
>  1 file changed, 17 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index cc47fbe32ea5..50c8eaf51d92 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -84,6 +84,7 @@ struct nvme_dev {
>  	struct dma_pool *prp_small_pool;
>  	unsigned online_queues;
>  	unsigned max_qid;
> +	unsigned int num_vecs;
>  	int q_depth;
>  	u32 db_stride;
>  	void __iomem *bar;
> @@ -414,7 +415,8 @@ static int nvme_pci_map_queues(struct blk_mq_tag_set *set)
>  {
>  	struct nvme_dev *dev = set->driver_data;
>  
> -	return blk_mq_pci_map_queues(set, to_pci_dev(dev->dev), 0);
> +	return blk_mq_pci_map_queues(set, to_pci_dev(dev->dev),
> +				     dev->num_vecs > 1);
>  }
>  
>  /**
> @@ -1455,7 +1457,11 @@ static int nvme_create_queue(struct nvme_queue *nvmeq, int qid)
>  		nvmeq->sq_cmds_io = dev->cmb + offset;
>  	}
>  
> -	nvmeq->cq_vector = qid - 1;
> +	/*
> +	 * A queue's vector matches the queue identifier unless the controller
> +	 * has only one vector available.
> +	 */
> +	nvmeq->cq_vector = dev->num_vecs == 1 ? 0 : qid;
>  	result = adapter_alloc_cq(dev, qid, nvmeq);
>  	if (result < 0)
>  		goto release_vector;
> @@ -1909,6 +1915,10 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
>  	int result, nr_io_queues;
>  	unsigned long size;
>  
> +	struct irq_affinity affd = {
> +		.pre_vectors = 1
> +	};
> +
>  	nr_io_queues = num_present_cpus();
>  	result = nvme_set_queue_count(&dev->ctrl, &nr_io_queues);
>  	if (result < 0)
> @@ -1944,11 +1954,12 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
>  	 * setting up the full range we need.
>  	 */
>  	pci_free_irq_vectors(pdev);
> -	nr_io_queues = pci_alloc_irq_vectors(pdev, 1, nr_io_queues,
> -			PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY);
> -	if (nr_io_queues <= 0)
> +	result = pci_alloc_irq_vectors_affinity(pdev, 1, nr_io_queues + 1,
> +			PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY, &affd);
> +	if (result <= 0)
>  		return -EIO;
> -	dev->max_qid = nr_io_queues;
> +	dev->num_vecs = result;
> +	dev->max_qid = max(result - 1, 1);
>  
>  	/*
>  	 * Should investigate if there's a performance win from allocating
> -- 
> 2.14.3
> 

Reviewed-by: Ming Lei <ming.lei@redhat.com>

-- 
Ming

* [PATCH 3/3] nvme-pci: Separate IO and admin queue IRQ vectors
From: Christoph Hellwig @ 2018-03-28  7:32 UTC

On Tue, Mar 27, 2018 at 09:39:08AM -0600, Keith Busch wrote:
> -	return blk_mq_pci_map_queues(set, to_pci_dev(dev->dev), 0);
> +	return blk_mq_pci_map_queues(set, to_pci_dev(dev->dev),
> +				     dev->num_vecs > 1);

Can you turn this into:

	return blk_mq_pci_map_queues(set, to_pci_dev(dev->dev),
				     dev->num_vecs > 1 ? 1 /* admin queue */ : 0);

no functional change, but much easier to understand.

Except for that the whole series looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* [PATCH 3/3] nvme-pci: Separate IO and admin queue IRQ vectors
From: Keith Busch @ 2018-03-28 20:38 UTC

On Wed, Mar 28, 2018 at 09:32:14AM +0200, Christoph Hellwig wrote:
> On Tue, Mar 27, 2018 at 09:39:08AM -0600, Keith Busch wrote:
> > -	return blk_mq_pci_map_queues(set, to_pci_dev(dev->dev), 0);
> > +	return blk_mq_pci_map_queues(set, to_pci_dev(dev->dev),
> > +				     dev->num_vecs > 1);
> 
> Can you turn this into:
> 
> 	return blk_mq_pci_map_queues(set, to_pci_dev(dev->dev),
> 				     dev->num_vecs > 1 ? 1 /* admin queue */ : 0);
> 
> no functional change, but much easier to understand.
> 
> Except for that the whole series looks good:

Sounds good, thanks for the reviews, Christoph and Ming.

Updated with your suggestion and applied (Jens picked up the blk-mq part;
nvme-4.17 is rebased to that).
