* [PATCH 1/3] blk-mq: Allow PCI vector offset for mapping queues
@ 2018-03-23 22:19 Keith Busch
From: Keith Busch @ 2018-03-23 22:19 UTC (permalink / raw)
To: Linux NVMe, Linux Block
Cc: Christoph Hellwig, Sagi Grimberg, Jianchao Wang, Ming Lei,
Jens Axboe, Keith Busch
The PCI interrupt vectors intended to be associated with a queue may
not start at 0. This patch adds an offset parameter so blk-mq may find
the intended affinity mask. The default value is 0 so existing drivers
that don't care about this parameter don't need to change.
Signed-off-by: Keith Busch <keith.busch@intel.com>
---
block/blk-mq-pci.c | 12 ++++++++++--
include/linux/blk-mq-pci.h | 2 ++
2 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/block/blk-mq-pci.c b/block/blk-mq-pci.c
index 76944e3271bf..1040a7705c13 100644
--- a/block/blk-mq-pci.c
+++ b/block/blk-mq-pci.c
@@ -21,6 +21,7 @@
* blk_mq_pci_map_queues - provide a default queue mapping for PCI device
* @set: tagset to provide the mapping for
* @pdev: PCI device associated with @set.
+ * @offset: PCI irq starting vector offset
*
* This function assumes the PCI device @pdev has at least as many available
* interrupt vectors as @set has queues. It will then query the vector
@@ -28,13 +29,14 @@
* that maps a queue to the CPUs that have irq affinity for the corresponding
* vector.
*/
-int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev)
+int __blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev,
+ int offset)
{
const struct cpumask *mask;
unsigned int queue, cpu;
for (queue = 0; queue < set->nr_hw_queues; queue++) {
- mask = pci_irq_get_affinity(pdev, queue);
+ mask = pci_irq_get_affinity(pdev, queue + offset);
if (!mask)
goto fallback;
@@ -50,4 +52,10 @@ int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev)
set->mq_map[cpu] = 0;
return 0;
}
+EXPORT_SYMBOL_GPL(__blk_mq_pci_map_queues);
+
+int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev)
+{
+ return __blk_mq_pci_map_queues(set, pdev, 0);
+}
EXPORT_SYMBOL_GPL(blk_mq_pci_map_queues);
diff --git a/include/linux/blk-mq-pci.h b/include/linux/blk-mq-pci.h
index 6338551e0fb9..5a92ecdbd78e 100644
--- a/include/linux/blk-mq-pci.h
+++ b/include/linux/blk-mq-pci.h
@@ -5,6 +5,8 @@
struct blk_mq_tag_set;
struct pci_dev;
+int __blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev,
+ int offset);
int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev);
#endif /* _LINUX_BLK_MQ_PCI_H */
--
2.14.3
* [PATCH 2/3] nvme-pci: Remove unused queue parameter
2018-03-23 22:19 [PATCH 1/3] blk-mq: Allow PCI vector offset for mapping queues Keith Busch
@ 2018-03-23 22:19 ` Keith Busch
From: Keith Busch @ 2018-03-23 22:19 UTC (permalink / raw)
To: Linux NVMe, Linux Block
Cc: Christoph Hellwig, Sagi Grimberg, Jianchao Wang, Ming Lei,
Jens Axboe, Keith Busch
All nvme queue memory is allocated up front. We don't take the node
into consideration when creating queues anymore, so remove the unused
parameter.
Signed-off-by: Keith Busch <keith.busch@intel.com>
---
drivers/nvme/host/pci.c | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index cef5ce851a92..632166f7d8f2 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1379,8 +1379,7 @@ static int nvme_alloc_sq_cmds(struct nvme_dev *dev, struct nvme_queue *nvmeq,
return 0;
}
-static int nvme_alloc_queue(struct nvme_dev *dev, int qid,
- int depth, int node)
+static int nvme_alloc_queue(struct nvme_dev *dev, int qid, int depth)
{
struct nvme_queue *nvmeq = &dev->queues[qid];
@@ -1595,8 +1594,7 @@ static int nvme_pci_configure_admin_queue(struct nvme_dev *dev)
if (result < 0)
return result;
- result = nvme_alloc_queue(dev, 0, NVME_AQ_DEPTH,
- dev_to_node(dev->dev));
+ result = nvme_alloc_queue(dev, 0, NVME_AQ_DEPTH);
if (result)
return result;
@@ -1629,9 +1627,7 @@ static int nvme_create_io_queues(struct nvme_dev *dev)
int ret = 0;
for (i = dev->ctrl.queue_count; i <= dev->max_qid; i++) {
- /* vector == qid - 1, match nvme_create_queue */
- if (nvme_alloc_queue(dev, i, dev->q_depth,
- pci_irq_get_node(to_pci_dev(dev->dev), i - 1))) {
+ if (nvme_alloc_queue(dev, i, dev->q_depth)) {
ret = -ENOMEM;
break;
}
--
2.14.3
* Re: [PATCH 2/3] nvme-pci: Remove unused queue parameter
2018-03-23 22:19 ` [PATCH 2/3] nvme-pci: Remove unused queue parameter Keith Busch
@ 2018-03-26 1:47 ` Ming Lei
From: Ming Lei @ 2018-03-26 1:47 UTC (permalink / raw)
To: Keith Busch
Cc: Linux NVMe, Linux Block, Christoph Hellwig, Sagi Grimberg,
Jianchao Wang, Jens Axboe
On Fri, Mar 23, 2018 at 04:19:22PM -0600, Keith Busch wrote:
> All nvme queue memory is allocated up front. We don't take the node
> into consideration when creating queues anymore, so remove the unused
> parameter.
>
> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---
> drivers/nvme/host/pci.c | 10 +++-------
> 1 file changed, 3 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index cef5ce851a92..632166f7d8f2 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -1379,8 +1379,7 @@ static int nvme_alloc_sq_cmds(struct nvme_dev *dev, struct nvme_queue *nvmeq,
> return 0;
> }
>
> -static int nvme_alloc_queue(struct nvme_dev *dev, int qid,
> - int depth, int node)
> +static int nvme_alloc_queue(struct nvme_dev *dev, int qid, int depth)
> {
> struct nvme_queue *nvmeq = &dev->queues[qid];
>
> @@ -1595,8 +1594,7 @@ static int nvme_pci_configure_admin_queue(struct nvme_dev *dev)
> if (result < 0)
> return result;
>
> - result = nvme_alloc_queue(dev, 0, NVME_AQ_DEPTH,
> - dev_to_node(dev->dev));
> + result = nvme_alloc_queue(dev, 0, NVME_AQ_DEPTH);
> if (result)
> return result;
>
> @@ -1629,9 +1627,7 @@ static int nvme_create_io_queues(struct nvme_dev *dev)
> int ret = 0;
>
> for (i = dev->ctrl.queue_count; i <= dev->max_qid; i++) {
> - /* vector == qid - 1, match nvme_create_queue */
> - if (nvme_alloc_queue(dev, i, dev->q_depth,
> - pci_irq_get_node(to_pci_dev(dev->dev), i - 1))) {
> + if (nvme_alloc_queue(dev, i, dev->q_depth)) {
> ret = -ENOMEM;
> break;
> }
nvme_create_io_queues() is called after pci_alloc_irq_vectors() returns,
and the above pci_irq_get_node() should return the correct node info,
right?
Thanks,
Ming
* Re: [PATCH 2/3] nvme-pci: Remove unused queue parameter
2018-03-26 1:47 ` Ming Lei
@ 2018-03-26 14:48 ` Keith Busch
From: Keith Busch @ 2018-03-26 14:48 UTC (permalink / raw)
To: Ming Lei
Cc: Linux NVMe, Linux Block, Christoph Hellwig, Sagi Grimberg,
Jianchao Wang, Jens Axboe
On Mon, Mar 26, 2018 at 09:47:07AM +0800, Ming Lei wrote:
> On Fri, Mar 23, 2018 at 04:19:22PM -0600, Keith Busch wrote:
> > @@ -1629,9 +1627,7 @@ static int nvme_create_io_queues(struct nvme_dev *dev)
> > int ret = 0;
> >
> > for (i = dev->ctrl.queue_count; i <= dev->max_qid; i++) {
> > - /* vector == qid - 1, match nvme_create_queue */
> > - if (nvme_alloc_queue(dev, i, dev->q_depth,
> > - pci_irq_get_node(to_pci_dev(dev->dev), i - 1))) {
> > + if (nvme_alloc_queue(dev, i, dev->q_depth)) {
> > ret = -ENOMEM;
> > break;
> > }
>
> nvme_create_io_queues() is called after pci_alloc_irq_vectors() returns,
> and the above pci_irq_get_node() should return the correct node info,
> right?
Right, the return is correct. It's just not being used anymore.
* Re: [PATCH 2/3] nvme-pci: Remove unused queue parameter
2018-03-23 22:19 ` [PATCH 2/3] nvme-pci: Remove unused queue parameter Keith Busch
@ 2018-03-27 14:17 ` Christoph Hellwig
From: Christoph Hellwig @ 2018-03-27 14:17 UTC (permalink / raw)
To: Keith Busch
Cc: Linux NVMe, Linux Block, Christoph Hellwig, Sagi Grimberg,
Jianchao Wang, Ming Lei, Jens Axboe
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
* [PATCH 3/3] nvme-pci: Separate IO and admin queue IRQ vectors
2018-03-23 22:19 [PATCH 1/3] blk-mq: Allow PCI vector offset for mapping queues Keith Busch
@ 2018-03-23 22:19 ` Keith Busch
From: Keith Busch @ 2018-03-23 22:19 UTC (permalink / raw)
To: Linux NVMe, Linux Block
Cc: Christoph Hellwig, Sagi Grimberg, Jianchao Wang, Ming Lei,
Jens Axboe, Keith Busch
From: Jianchao Wang <jianchao.w.wang@oracle.com>
The admin and first IO queues shared the first irq vector, which has an
affinity mask including cpu0. If a system allows cpu0 to be offlined,
the admin queue may not be usable if no other CPUs in the affinity mask
are online. This is a problem since unlike IO queues, there is only
one admin queue that always needs to be usable.
To fix, this patch allocates one pre_vector for the admin queue that
is assigned all CPUs, so it will always be accessible. The IO queues
are assigned the remaining managed vectors.
In case a controller has only one interrupt vector available, the admin
and IO queues will share the pre_vector with all CPUs assigned.
Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
[changelog, code comments, merge, and blk-mq pci vector offset]
Signed-off-by: Keith Busch <keith.busch@intel.com>
---
drivers/nvme/host/pci.c | 27 +++++++++++++++++++++------
1 file changed, 21 insertions(+), 6 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 632166f7d8f2..7b31bc01df6c 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -84,6 +84,7 @@ struct nvme_dev {
struct dma_pool *prp_small_pool;
unsigned online_queues;
unsigned max_qid;
+ unsigned int num_vecs;
int q_depth;
u32 db_stride;
void __iomem *bar;
@@ -139,6 +140,16 @@ static inline struct nvme_dev *to_nvme_dev(struct nvme_ctrl *ctrl)
return container_of(ctrl, struct nvme_dev, ctrl);
}
+static inline unsigned int nvme_ioq_vector(struct nvme_dev *dev,
+ unsigned int qid)
+{
+ /*
+ * A queue's vector matches the queue identifier unless the controller
+ * has only one vector available.
+ */
+ return (dev->num_vecs == 1) ? 0 : qid;
+}
+
/*
* An NVM Express queue. Each device has at least two (one for admin
* commands and one for I/O commands).
@@ -414,7 +425,8 @@ static int nvme_pci_map_queues(struct blk_mq_tag_set *set)
{
struct nvme_dev *dev = set->driver_data;
- return blk_mq_pci_map_queues(set, to_pci_dev(dev->dev));
+ return __blk_mq_pci_map_queues(set, to_pci_dev(dev->dev),
+ dev->num_vecs > 1);
}
/**
@@ -1455,7 +1467,7 @@ static int nvme_create_queue(struct nvme_queue *nvmeq, int qid)
nvmeq->sq_cmds_io = dev->cmb + offset;
}
- nvmeq->cq_vector = qid - 1;
+ nvmeq->cq_vector = nvme_ioq_vector(dev, qid);
result = adapter_alloc_cq(dev, qid, nvmeq);
if (result < 0)
goto release_vector;
@@ -1908,6 +1920,8 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
struct pci_dev *pdev = to_pci_dev(dev->dev);
int result, nr_io_queues;
unsigned long size;
+ struct irq_affinity affd = {.pre_vectors = 1};
+ int ret;
nr_io_queues = num_present_cpus();
result = nvme_set_queue_count(&dev->ctrl, &nr_io_queues);
@@ -1944,11 +1958,12 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
* setting up the full range we need.
*/
pci_free_irq_vectors(pdev);
- nr_io_queues = pci_alloc_irq_vectors(pdev, 1, nr_io_queues,
- PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY);
- if (nr_io_queues <= 0)
+ ret = pci_alloc_irq_vectors_affinity(pdev, 1, (nr_io_queues + 1),
+ PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY, &affd);
+ if (ret <= 0)
return -EIO;
- dev->max_qid = nr_io_queues;
+ dev->num_vecs = ret;
+ dev->max_qid = max(ret - 1, 1);
/*
* Should investigate if there's a performance win from allocating
--
2.14.3
* Re: [PATCH 3/3] nvme-pci: Separate IO and admin queue IRQ vectors
2018-03-23 22:19 ` [PATCH 3/3] nvme-pci: Separate IO and admin queue IRQ vectors Keith Busch
@ 2018-03-27 14:20 ` Christoph Hellwig
From: Christoph Hellwig @ 2018-03-27 14:20 UTC (permalink / raw)
To: Keith Busch
Cc: Linux NVMe, Linux Block, Christoph Hellwig, Sagi Grimberg,
Jianchao Wang, Ming Lei, Jens Axboe
> +static inline unsigned int nvme_ioq_vector(struct nvme_dev *dev,
> + unsigned int qid)
No need for the inline here I think.
> +{
> + /*
> + * A queue's vector matches the queue identifier unless the controller
> + * has only one vector available.
> + */
> + return (dev->num_vecs == 1) ? 0 : qid;
and no need for the braces here.
> + struct irq_affinity affd = {.pre_vectors = 1};
struct irq_affinity affd = {
.pre_vectors = 1
};
to make it a little more readable.
* Re: [PATCH 1/3] blk-mq: Allow PCI vector offset for mapping queues
2018-03-23 22:19 [PATCH 1/3] blk-mq: Allow PCI vector offset for mapping queues Keith Busch
@ 2018-03-24 13:55 ` jianchao.wang
From: jianchao.wang @ 2018-03-24 13:55 UTC (permalink / raw)
To: Keith Busch, Linux NVMe, Linux Block
Cc: Christoph Hellwig, Sagi Grimberg, Ming Lei, Jens Axboe
Hi Keith
Thanks for your time and for this patch.
On 03/24/2018 06:19 AM, Keith Busch wrote:
> The PCI interrupt vectors intended to be associated with a queue may
> not start at 0. This patch adds an offset parameter so blk-mq may find
> the intended affinity mask. The default value is 0 so existing drivers
> that don't care about this parameter don't need to change.
>
> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---
> block/blk-mq-pci.c | 12 ++++++++++--
> include/linux/blk-mq-pci.h | 2 ++
> 2 files changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/block/blk-mq-pci.c b/block/blk-mq-pci.c
> index 76944e3271bf..1040a7705c13 100644
> --- a/block/blk-mq-pci.c
> +++ b/block/blk-mq-pci.c
> @@ -21,6 +21,7 @@
> * blk_mq_pci_map_queues - provide a default queue mapping for PCI device
> * @set: tagset to provide the mapping for
> * @pdev: PCI device associated with @set.
> + * @offset: PCI irq starting vector offset
> *
> * This function assumes the PCI device @pdev has at least as many available
> * interrupt vectors as @set has queues. It will then query the vector
> @@ -28,13 +29,14 @@
> * that maps a queue to the CPUs that have irq affinity for the corresponding
> * vector.
> */
> -int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev)
> +int __blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev,
> + int offset)
> {
> const struct cpumask *mask;
> unsigned int queue, cpu;
>
> for (queue = 0; queue < set->nr_hw_queues; queue++) {
> - mask = pci_irq_get_affinity(pdev, queue);
> + mask = pci_irq_get_affinity(pdev, queue + offset);
> if (!mask)
> goto fallback;
>
Maybe we could provide a callback parameter for __blk_mq_pci_map_queues which
gives the mapping from hctx queue number to device-relative interrupt vector index.
Thanks
Jianchao
* Re: [PATCH 1/3] blk-mq: Allow PCI vector offset for mapping queues
2018-03-24 13:55 ` [PATCH 1/3] blk-mq: Allow PCI vector offset for mapping queues jianchao.wang
@ 2018-03-26 19:33 ` Keith Busch
From: Keith Busch @ 2018-03-26 19:33 UTC (permalink / raw)
To: jianchao.wang
Cc: Linux NVMe, Linux Block, Christoph Hellwig, Sagi Grimberg,
Ming Lei, Jens Axboe
On Sat, Mar 24, 2018 at 09:55:49PM +0800, jianchao.wang wrote:
> Maybe we could provide a callback parameter for __blk_mq_pci_map_queues which
> gives the mapping from hctx queue number to device-relative interrupt vector index.
If a driver's mapping is so complicated as to require a special per-hctx
callback, it'd be just as easy to implement the mapping in that driver's
blk_mq_ops' 'map_queues' directly.
* Re: [PATCH 1/3] blk-mq: Allow PCI vector offset for mapping queues
2018-03-23 22:19 [PATCH 1/3] blk-mq: Allow PCI vector offset for mapping queues Keith Busch
@ 2018-03-26 1:50 ` Ming Lei
From: Ming Lei @ 2018-03-26 1:50 UTC (permalink / raw)
To: Keith Busch
Cc: Linux NVMe, Linux Block, Christoph Hellwig, Sagi Grimberg,
Jianchao Wang, Jens Axboe
On Fri, Mar 23, 2018 at 04:19:21PM -0600, Keith Busch wrote:
> The PCI interrupt vectors intended to be associated with a queue may
> not start at 0. This patch adds an offset parameter so blk-mq may find
> the intended affinity mask. The default value is 0 so existing drivers
> that don't care about this parameter don't need to change.
>
> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---
> block/blk-mq-pci.c | 12 ++++++++++--
> include/linux/blk-mq-pci.h | 2 ++
> 2 files changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/block/blk-mq-pci.c b/block/blk-mq-pci.c
> index 76944e3271bf..1040a7705c13 100644
> --- a/block/blk-mq-pci.c
> +++ b/block/blk-mq-pci.c
> @@ -21,6 +21,7 @@
> * blk_mq_pci_map_queues - provide a default queue mapping for PCI device
> * @set: tagset to provide the mapping for
> * @pdev: PCI device associated with @set.
> + * @offset: PCI irq starting vector offset
> *
> * This function assumes the PCI device @pdev has at least as many available
> * interrupt vectors as @set has queues. It will then query the vector
> @@ -28,13 +29,14 @@
> * that maps a queue to the CPUs that have irq affinity for the corresponding
> * vector.
> */
> -int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev)
> +int __blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev,
> + int offset)
> {
> const struct cpumask *mask;
> unsigned int queue, cpu;
>
> for (queue = 0; queue < set->nr_hw_queues; queue++) {
> - mask = pci_irq_get_affinity(pdev, queue);
> + mask = pci_irq_get_affinity(pdev, queue + offset);
> if (!mask)
> goto fallback;
>
> @@ -50,4 +52,10 @@ int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev)
> set->mq_map[cpu] = 0;
> return 0;
> }
> +EXPORT_SYMBOL_GPL(__blk_mq_pci_map_queues);
> +
> +int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev)
> +{
> + return __blk_mq_pci_map_queues(set, pdev, 0);
> +}
> EXPORT_SYMBOL_GPL(blk_mq_pci_map_queues);
> diff --git a/include/linux/blk-mq-pci.h b/include/linux/blk-mq-pci.h
> index 6338551e0fb9..5a92ecdbd78e 100644
> --- a/include/linux/blk-mq-pci.h
> +++ b/include/linux/blk-mq-pci.h
> @@ -5,6 +5,8 @@
> struct blk_mq_tag_set;
> struct pci_dev;
>
> +int __blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev,
> + int offset);
> int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev);
>
> #endif /* _LINUX_BLK_MQ_PCI_H */
> --
> 2.14.3
>
Given that there are not many callers of blk_mq_pci_map_queues(), I suggest
adding the 'offset' parameter to this API directly; then people will keep
the '.pre_vectors' stuff in mind and avoid misusing it.
Thanks,
Ming
* Re: [PATCH 1/3] blk-mq: Allow PCI vector offset for mapping queues
2018-03-26 1:50 ` Ming Lei
@ 2018-03-26 19:37 ` Keith Busch
From: Keith Busch @ 2018-03-26 19:37 UTC (permalink / raw)
To: Ming Lei
Cc: Linux NVMe, Linux Block, Christoph Hellwig, Sagi Grimberg,
Jianchao Wang, Jens Axboe
On Mon, Mar 26, 2018 at 09:50:38AM +0800, Ming Lei wrote:
>
> Given that there are not many callers of blk_mq_pci_map_queues(), I suggest
> adding the 'offset' parameter to this API directly; then people will keep
> the '.pre_vectors' stuff in mind and avoid misusing it.
Yeah, I think I have to agree. I was trying really hard to not touch
other drivers, but the concept doesn't seem odd enough to justify hiding
it behind a default parameter. I'll send v2 tomorrow if there's no other
feedback.
* Re: [PATCH 1/3] blk-mq: Allow PCI vector offset for mapping queues
2018-03-23 22:19 [PATCH 1/3] blk-mq: Allow PCI vector offset for mapping queues Keith Busch
@ 2018-03-27 14:17 ` Christoph Hellwig
From: Christoph Hellwig @ 2018-03-27 14:17 UTC (permalink / raw)
To: Keith Busch
Cc: Linux NVMe, Linux Block, Christoph Hellwig, Sagi Grimberg,
Jianchao Wang, Ming Lei, Jens Axboe
On Fri, Mar 23, 2018 at 04:19:21PM -0600, Keith Busch wrote:
> The PCI interrupt vectors intended to be associated with a queue may
> not start at 0. This patch adds an offset parameter so blk-mq may find
> the intended affinity mask. The default value is 0 so existing drivers
> that don't care about this parameter don't need to change.
>
> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---
> block/blk-mq-pci.c | 12 ++++++++++--
> include/linux/blk-mq-pci.h | 2 ++
> 2 files changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/block/blk-mq-pci.c b/block/blk-mq-pci.c
> index 76944e3271bf..1040a7705c13 100644
> --- a/block/blk-mq-pci.c
> +++ b/block/blk-mq-pci.c
> @@ -21,6 +21,7 @@
> * blk_mq_pci_map_queues - provide a default queue mapping for PCI device
> * @set: tagset to provide the mapping for
> * @pdev: PCI device associated with @set.
> + * @offset: PCI irq starting vector offset
> *
> * This function assumes the PCI device @pdev has at least as many available
> * interrupt vectors as @set has queues. It will then query the vector
> @@ -28,13 +29,14 @@
> * that maps a queue to the CPUs that have irq affinity for the corresponding
> * vector.
> */
> -int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev)
> +int __blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev,
> + int offset)
Can you just change the blk_mq_pci_map_queues prototype instead?
* [PATCH 1/3] blk-mq: Allow PCI vector offset for mapping queues
@ 2018-03-27 15:39 Keith Busch
From: Keith Busch @ 2018-03-27 15:39 UTC (permalink / raw)
To: Linux NVMe, Linux Block
Cc: Christoph Hellwig, Sagi Grimberg, Jianchao Wang, Ming Lei,
Jens Axboe, Keith Busch, Don Brace, qla2xxx-upstream, linux-scsi
The PCI interrupt vectors intended to be associated with a queue may
not start at 0; a driver may allocate pre_vectors for special use. This
patch adds an offset parameter so blk-mq may find the intended affinity
mask and updates all drivers using this API accordingly.
Cc: Don Brace <don.brace@microsemi.com>
Cc: <qla2xxx-upstream@qlogic.com>
Cc: <linux-scsi@vger.kernel.org>
Signed-off-by: Keith Busch <keith.busch@intel.com>
---
v1 -> v2:
Update blk-mq API directly instead of chaining a default parameter to
a new API, and update all drivers accordingly.
block/blk-mq-pci.c | 6 ++++--
drivers/nvme/host/pci.c | 2 +-
drivers/scsi/qla2xxx/qla_os.c | 2 +-
drivers/scsi/smartpqi/smartpqi_init.c | 2 +-
include/linux/blk-mq-pci.h | 3 ++-
5 files changed, 9 insertions(+), 6 deletions(-)
diff --git a/block/blk-mq-pci.c b/block/blk-mq-pci.c
index 76944e3271bf..e233996bb76f 100644
--- a/block/blk-mq-pci.c
+++ b/block/blk-mq-pci.c
@@ -21,6 +21,7 @@
* blk_mq_pci_map_queues - provide a default queue mapping for PCI device
* @set: tagset to provide the mapping for
* @pdev: PCI device associated with @set.
+ * @offset: Offset to use for the pci irq vector
*
* This function assumes the PCI device @pdev has at least as many available
* interrupt vectors as @set has queues. It will then query the vector
@@ -28,13 +29,14 @@
* that maps a queue to the CPUs that have irq affinity for the corresponding
* vector.
*/
-int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev)
+int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev,
+ int offset)
{
const struct cpumask *mask;
unsigned int queue, cpu;
for (queue = 0; queue < set->nr_hw_queues; queue++) {
- mask = pci_irq_get_affinity(pdev, queue);
+ mask = pci_irq_get_affinity(pdev, queue + offset);
if (!mask)
goto fallback;
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index cef5ce851a92..e3b9efca0571 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -414,7 +414,7 @@ static int nvme_pci_map_queues(struct blk_mq_tag_set *set)
{
struct nvme_dev *dev = set->driver_data;
- return blk_mq_pci_map_queues(set, to_pci_dev(dev->dev));
+ return blk_mq_pci_map_queues(set, to_pci_dev(dev->dev), 0);
}
/**
diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index 12ee6e02d146..2c705f3dd265 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -6805,7 +6805,7 @@ static int qla2xxx_map_queues(struct Scsi_Host *shost)
if (USER_CTRL_IRQ(vha->hw))
rc = blk_mq_map_queues(&shost->tag_set);
else
- rc = blk_mq_pci_map_queues(&shost->tag_set, vha->hw->pdev);
+ rc = blk_mq_pci_map_queues(&shost->tag_set, vha->hw->pdev, 0);
return rc;
}
diff --git a/drivers/scsi/smartpqi/smartpqi_init.c b/drivers/scsi/smartpqi/smartpqi_init.c
index b2880c7709e6..10c94011c8a8 100644
--- a/drivers/scsi/smartpqi/smartpqi_init.c
+++ b/drivers/scsi/smartpqi/smartpqi_init.c
@@ -5348,7 +5348,7 @@ static int pqi_map_queues(struct Scsi_Host *shost)
{
struct pqi_ctrl_info *ctrl_info = shost_to_hba(shost);
- return blk_mq_pci_map_queues(&shost->tag_set, ctrl_info->pci_dev);
+ return blk_mq_pci_map_queues(&shost->tag_set, ctrl_info->pci_dev, 0);
}
static int pqi_getpciinfo_ioctl(struct pqi_ctrl_info *ctrl_info,
diff --git a/include/linux/blk-mq-pci.h b/include/linux/blk-mq-pci.h
index 6338551e0fb9..9f4c17f0d2d8 100644
--- a/include/linux/blk-mq-pci.h
+++ b/include/linux/blk-mq-pci.h
@@ -5,6 +5,7 @@
struct blk_mq_tag_set;
struct pci_dev;
-int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev);
+int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev,
+ int offset);
#endif /* _LINUX_BLK_MQ_PCI_H */
--
2.14.3
* [PATCH 3/3] nvme-pci: Separate IO and admin queue IRQ vectors
2018-03-27 15:39 Keith Busch
@ 2018-03-27 15:39 ` Keith Busch
From: Keith Busch @ 2018-03-27 15:39 UTC (permalink / raw)
To: Linux NVMe, Linux Block
Cc: Christoph Hellwig, Sagi Grimberg, Jianchao Wang, Ming Lei,
Jens Axboe, Keith Busch
The admin and first IO queues shared the first irq vector, which has an
affinity mask including cpu0. If a system allows cpu0 to be offlined,
the admin queue may not be usable if no other CPUs in the affinity mask
are online. This is a problem since unlike IO queues, there is only
one admin queue that always needs to be usable.
To fix, this patch allocates one pre_vector for the admin queue that
is assigned all CPUs, so it will always be accessible. The IO queues
are assigned the remaining managed vectors.
In case a controller has only one interrupt vector available, the admin
and IO queues will share the pre_vector with all CPUs assigned.
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
---
v1 -> v2:
Update to use new blk-mq API.
Removed unnecessary braces, inline functions, and temp variables.
Amended author (this has evolved significantly from the original).
drivers/nvme/host/pci.c | 23 +++++++++++++++++------
1 file changed, 17 insertions(+), 6 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index cc47fbe32ea5..50c8eaf51d92 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -84,6 +84,7 @@ struct nvme_dev {
struct dma_pool *prp_small_pool;
unsigned online_queues;
unsigned max_qid;
+ unsigned int num_vecs;
int q_depth;
u32 db_stride;
void __iomem *bar;
@@ -414,7 +415,8 @@ static int nvme_pci_map_queues(struct blk_mq_tag_set *set)
{
struct nvme_dev *dev = set->driver_data;
- return blk_mq_pci_map_queues(set, to_pci_dev(dev->dev), 0);
+ return blk_mq_pci_map_queues(set, to_pci_dev(dev->dev),
+ dev->num_vecs > 1);
}
/**
@@ -1455,7 +1457,11 @@ static int nvme_create_queue(struct nvme_queue *nvmeq, int qid)
nvmeq->sq_cmds_io = dev->cmb + offset;
}
- nvmeq->cq_vector = qid - 1;
+ /*
+ * A queue's vector matches the queue identifier unless the controller
+ * has only one vector available.
+ */
+ nvmeq->cq_vector = dev->num_vecs == 1 ? 0 : qid;
result = adapter_alloc_cq(dev, qid, nvmeq);
if (result < 0)
goto release_vector;
@@ -1909,6 +1915,10 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
int result, nr_io_queues;
unsigned long size;
+ struct irq_affinity affd = {
+ .pre_vectors = 1
+ };
+
nr_io_queues = num_present_cpus();
result = nvme_set_queue_count(&dev->ctrl, &nr_io_queues);
if (result < 0)
@@ -1944,11 +1954,12 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
* setting up the full range we need.
*/
pci_free_irq_vectors(pdev);
- nr_io_queues = pci_alloc_irq_vectors(pdev, 1, nr_io_queues,
- PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY);
- if (nr_io_queues <= 0)
+ result = pci_alloc_irq_vectors_affinity(pdev, 1, nr_io_queues + 1,
+ PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY, &affd);
+ if (result <= 0)
return -EIO;
- dev->max_qid = nr_io_queues;
+ dev->num_vecs = result;
+ dev->max_qid = max(result - 1, 1);
/*
* Should investigate if there's a performance win from allocating
--
2.14.3
* Re: [PATCH 3/3] nvme-pci: Separate IO and admin queue IRQ vectors
2018-03-27 15:39 ` [PATCH 3/3] nvme-pci: Separate IO and admin queue IRQ vectors Keith Busch
@ 2018-03-28 2:08 ` Ming Lei
From: Ming Lei @ 2018-03-28 2:08 UTC (permalink / raw)
To: Keith Busch
Cc: Linux NVMe, Linux Block, Christoph Hellwig, Sagi Grimberg,
Jianchao Wang, Jens Axboe
On Tue, Mar 27, 2018 at 09:39:08AM -0600, Keith Busch wrote:
> The admin and first IO queues shared the first irq vector, which has an
> affinity mask including cpu0. If a system allows cpu0 to be offlined,
> the admin queue may not be usable if no other CPUs in the affinity mask
> are online. This is a problem since unlike IO queues, there is only
> one admin queue that always needs to be usable.
>
> To fix, this patch allocates one pre_vector for the admin queue that
> is assigned all CPUs, so it will always be accessible. The IO queues
> are assigned the remaining managed vectors.
>
> In case a controller has only one interrupt vector available, the admin
> and IO queues will share the pre_vector with all CPUs assigned.
>
> Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---
> v1 -> v2:
>
> Update to use new blk-mq API.
>
> Removed unnecessary braces, inline functions, and temp variables.
>
> Amended author (this has evolved significantly from the original).
>
> drivers/nvme/host/pci.c | 23 +++++++++++++++++------
> 1 file changed, 17 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index cc47fbe32ea5..50c8eaf51d92 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -84,6 +84,7 @@ struct nvme_dev {
> struct dma_pool *prp_small_pool;
> unsigned online_queues;
> unsigned max_qid;
> + unsigned int num_vecs;
> int q_depth;
> u32 db_stride;
> void __iomem *bar;
> @@ -414,7 +415,8 @@ static int nvme_pci_map_queues(struct blk_mq_tag_set *set)
> {
> struct nvme_dev *dev = set->driver_data;
>
> - return blk_mq_pci_map_queues(set, to_pci_dev(dev->dev), 0);
> + return blk_mq_pci_map_queues(set, to_pci_dev(dev->dev),
> + dev->num_vecs > 1);
> }
>
> /**
> @@ -1455,7 +1457,11 @@ static int nvme_create_queue(struct nvme_queue *nvmeq, int qid)
> nvmeq->sq_cmds_io = dev->cmb + offset;
> }
>
> - nvmeq->cq_vector = qid - 1;
> + /*
> + * A queue's vector matches the queue identifier unless the controller
> + * has only one vector available.
> + */
> + nvmeq->cq_vector = dev->num_vecs == 1 ? 0 : qid;
> result = adapter_alloc_cq(dev, qid, nvmeq);
> if (result < 0)
> goto release_vector;
> @@ -1909,6 +1915,10 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
> int result, nr_io_queues;
> unsigned long size;
>
> + struct irq_affinity affd = {
> + .pre_vectors = 1
> + };
> +
> nr_io_queues = num_present_cpus();
> result = nvme_set_queue_count(&dev->ctrl, &nr_io_queues);
> if (result < 0)
> @@ -1944,11 +1954,12 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
> * setting up the full range we need.
> */
> pci_free_irq_vectors(pdev);
> - nr_io_queues = pci_alloc_irq_vectors(pdev, 1, nr_io_queues,
> - PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY);
> - if (nr_io_queues <= 0)
> + result = pci_alloc_irq_vectors_affinity(pdev, 1, nr_io_queues + 1,
> + PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY, &affd);
> + if (result <= 0)
> return -EIO;
> - dev->max_qid = nr_io_queues;
> + dev->num_vecs = result;
> + dev->max_qid = max(result - 1, 1);
>
> /*
> * Should investigate if there's a performance win from allocating
> --
> 2.14.3
>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
--
Ming
* Re: [PATCH 3/3] nvme-pci: Separate IO and admin queue IRQ vectors
2018-03-27 15:39 ` [PATCH 3/3] nvme-pci: Separate IO and admin queue IRQ vectors Keith Busch
2018-03-28 2:08 ` Ming Lei
@ 2018-03-28 7:32 ` Christoph Hellwig
2018-03-28 20:38 ` Keith Busch
1 sibling, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2018-03-28 7:32 UTC (permalink / raw)
To: Keith Busch
Cc: Linux NVMe, Linux Block, Christoph Hellwig, Sagi Grimberg,
Jianchao Wang, Ming Lei, Jens Axboe
On Tue, Mar 27, 2018 at 09:39:08AM -0600, Keith Busch wrote:
> - return blk_mq_pci_map_queues(set, to_pci_dev(dev->dev), 0);
> + return blk_mq_pci_map_queues(set, to_pci_dev(dev->dev),
> + dev->num_vecs > 1);
Can you turn this into:
- return blk_mq_pci_map_queues(set, to_pci_dev(dev->dev), 0);
dev->num_vecs > 1 ? 1 /* admin queue */ : 0);
no functional change, but much easier to understand.
Except for that the whole series looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [PATCH 3/3] nvme-pci: Separate IO and admin queue IRQ vectors
2018-03-28 7:32 ` Christoph Hellwig
@ 2018-03-28 20:38 ` Keith Busch
0 siblings, 0 replies; 16+ messages in thread
From: Keith Busch @ 2018-03-28 20:38 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Linux NVMe, Linux Block, Sagi Grimberg, Jianchao Wang, Ming Lei,
Jens Axboe
On Wed, Mar 28, 2018 at 09:32:14AM +0200, Christoph Hellwig wrote:
> On Tue, Mar 27, 2018 at 09:39:08AM -0600, Keith Busch wrote:
> > - return blk_mq_pci_map_queues(set, to_pci_dev(dev->dev), 0);
> > + return blk_mq_pci_map_queues(set, to_pci_dev(dev->dev),
> > + dev->num_vecs > 1);
>
> Can you turn this into:
>
> - return blk_mq_pci_map_queues(set, to_pci_dev(dev->dev), 0);
> dev->num_vecs > 1 ? 1 /* admin queue */ : 0);
>
> no functional change, but much easier to understand.
>
> Except for that the whole series looks good:
Sounds good, thanks for the reviews, Christoph and Ming.
Updated with your suggestion and applied (Jens picked up the blk-mq part;
nvme-4.17 is rebased to that).
Thread overview: 16+ messages
2018-03-23 22:19 [PATCH 1/3] blk-mq: Allow PCI vector offset for mapping queues Keith Busch
2018-03-23 22:19 ` [PATCH 2/3] nvme-pci: Remove unused queue parameter Keith Busch
2018-03-26 1:47 ` Ming Lei
2018-03-26 14:48 ` Keith Busch
2018-03-27 14:17 ` Christoph Hellwig
2018-03-23 22:19 ` [PATCH 3/3] nvme-pci: Separate IO and admin queue IRQ vectors Keith Busch
2018-03-27 14:20 ` Christoph Hellwig
2018-03-24 13:55 ` [PATCH 1/3] blk-mq: Allow PCI vector offset for mapping queues jianchao.wang
2018-03-26 19:33 ` Keith Busch
2018-03-26 1:50 ` Ming Lei
2018-03-26 19:37 ` Keith Busch
2018-03-27 14:17 ` Christoph Hellwig
2018-03-27 15:39 Keith Busch
2018-03-27 15:39 ` [PATCH 3/3] nvme-pci: Separate IO and admin queue IRQ vectors Keith Busch
2018-03-28 2:08 ` Ming Lei
2018-03-28 7:32 ` Christoph Hellwig
2018-03-28 20:38 ` Keith Busch