* [PATCH v6 0/9] blk: honor isolcpus configuration
@ 2025-04-24 18:19 Daniel Wagner
2025-04-24 18:19 ` [PATCH v6 1/9] lib/group_cpus: let group_cpu_evenly return number initialized masks Daniel Wagner
` (9 more replies)
0 siblings, 10 replies; 28+ messages in thread
From: Daniel Wagner @ 2025-04-24 18:19 UTC (permalink / raw)
To: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg,
Michael S. Tsirkin
Cc: Martin K. Petersen, Thomas Gleixner, Costa Shulyupin, Juri Lelli,
Valentin Schneider, Waiman Long, Ming Lei, Frederic Weisbecker,
Mel Gorman, Hannes Reinecke, Mathieu Desnoyers, linux-kernel,
linux-block, linux-nvme, megaraidlinux.pdl, linux-scsi,
storagedev, virtualization, GR-QLogic-Storage-Upstream,
Daniel Wagner
I've added back the isolcpus io_queue argument. This avoids any semantic
changes to managed_irq. I don't like it, but I haven't found a better
way to deal with it. Ming clearly stated that managed_irq should not
change.
Another change is to prevent offlining a housekeeping CPU which is still
serving an isolated CPU, instead of just warning. This seems a much
saner way to handle the situation. Thanks Mathieu!
Here are the details on how managed_irq and io_queue differ.
* nr CPUs <= nr hardware queues
(e.g. 8 CPUs, 8 hardware queues)
managed_irq works nicely for the situation where the hardware has at
least as many hardware queues as CPUs, e.g. enterprise nvme-pci devices.
managed_irq will assign each CPU its own hardware queue and ensures that
no unbound IO is scheduled to an isolated CPU. As long as the isolated
CPU is not issuing any IO, there will be no block layer 'noise' on the
isolated CPU.
- irqaffinity=0 isolcpus=managed_irq,2-3,6-7
queue mapping for /dev/nvme0n1
hctx0: default 0
hctx1: default 1
hctx2: default 2
hctx3: default 3
hctx4: default 4
hctx5: default 5
hctx6: default 6
hctx7: default 7
IRQ mapping for nvme0n1
irq 40 affinity 0 effective 0 nvme0q0
irq 41 affinity 0 effective 0 nvme0q1
irq 42 affinity 1 effective 1 nvme0q2
irq 43 affinity 2 effective 2 nvme0q3
irq 44 affinity 3 effective 3 nvme0q4
irq 45 affinity 4 effective 4 nvme0q5
irq 46 affinity 5 effective 5 nvme0q6
irq 47 affinity 6 effective 6 nvme0q7
irq 48 affinity 7 effective 7 nvme0q8
With the same configuration io_queue will create four hctxs for the four
housekeeping CPUs:
- irqaffinity=0 isolcpus=io_queue,2-3,6-7
queue mapping for /dev/nvme0n1
hctx0: default 0 2
hctx1: default 1 3
hctx2: default 4 6
hctx3: default 5 7
IRQ mapping for /dev/nvme0n1
irq 36 affinity 0 effective 0 nvme0q0
irq 37 affinity 0 effective 0 nvme0q1
irq 38 affinity 1 effective 1 nvme0q2
irq 39 affinity 4 effective 4 nvme0q3
irq 40 affinity 5 effective 5 nvme0q4
* nr CPUs > nr hardware queues
(e.g. 8 CPUs, 2 hardware queues)
managed_irq creates two hctxs and all CPUs can handle IRQs. In this
case an isolated CPU ends up being selected to handle all IRQs for a
given hctx:
- irqaffinity=0 isolcpus=managed_irq,2-3,6-7
queue mapping for /dev/nvme0n1
hctx0: default 0 1 2 3
hctx1: default 4 5 6 7
IRQ mapping for /dev/nvme0n1
irq 40 affinity 0 effective 0 nvme0q0
irq 41 affinity 0-3 effective 3 nvme0q1
irq 42 affinity 4-7 effective 7 nvme0q2
io_queue also creates two hctxs but only assigns housekeeping CPUs to
handle the IRQs:
- irqaffinity=0 isolcpus=io_queue,2-3,6-7
queue mapping for /dev/nvme0n1
hctx0: default 0 1 2 6
hctx1: default 3 4 5 7
IRQ mapping for /dev/nvme0n1
irq 36 affinity 0 effective 0 nvme0q0
irq 37 affinity 0-1 effective 1 nvme0q1
irq 38 affinity 4-5 effective 5 nvme0q2
The case where there are fewer hardware queues than CPUs is more common
with SCSI HBAs, so with the io_queue approach not just nvme-pci is
supported.
Something completely different: we got several bug reports for kdump and
SCSI HBAs. The issue is that the SCSI drivers are allocating too many
resources when running in a kdump kernel. This series will fix this as
well, because the number of queues will be limited by
blk_mq_num_possible_queues() instead of num_possible_cpus(). This avoids
sprinkling is_kdump_kernel() checks around.
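As a rough sketch of the driver-side pattern (illustrative only;
'hba->max_hw_queues' is a hypothetical field, not taken from this
series):

    /*
     * Cap the queue count by what the hardware supports and by the
     * CPU-derived limit. The helper honors isolcpus and, as described
     * above, also bounds the allocation in a kdump kernel without an
     * explicit is_kdump_kernel() check.
     */
    nr_queues = blk_mq_num_possible_queues(hba->max_hw_queues);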
Signed-off-by: Daniel Wagner <wagi@kernel.org>
---
Changes in v6:
- added io_queue isolcpus type back
- prevent offlining a hk cpu if an isol cpu is still present, instead of just warning
- Link to v5: https://lore.kernel.org/r/20250110-isolcpus-io-queues-v5-0-0e4f118680b0@kernel.org
Changes in v5:
- rebased on latest for-6.14/block
- updated documentation on managed_irq
- updated commit message "blk-mq: issue warning when offlining hctx with online isolcpus"
- split input/output parameter in "lib/group_cpus: let group_cpu_evenly return number of groups"
- dropped "sched/isolation: document HK_TYPE housekeeping option"
- Link to v4: https://lore.kernel.org/r/20241217-isolcpus-io-queues-v4-0-5d355fbb1e14@kernel.org
Changes in v4:
- added "blk-mq: issue warning when offlining hctx with online isolcpus"
- fixed check in group_cpus_evenly, the if condition needs to use
housekeeping_enabled() and not cpumask_weight(housekeeping_masks),
because the latter will always return a valid mask.
- dropped fixed tag from "lib/group_cpus.c: honor housekeeping config when
grouping CPUs"
- fixed overlong line "scsi: use block layer helpers to calculate num
of queues"
- dropped "sched/isolation: Add io_queue housekeeping option",
just document the housekeep enum hk_type
- added "lib/group_cpus: let group_cpu_evenly return number of groups"
- collected tags
- split the series into a preparation series:
https://lore.kernel.org/linux-nvme/20241202-refactor-blk-affinity-helpers-v6-0-27211e9c2cd5@kernel.org/
- Link to v3: https://lore.kernel.org/r/20240806-isolcpus-io-queues-v3-0-da0eecfeaf8b@suse.de
Changes in v3:
- lifted a couple of patches from
https://lore.kernel.org/all/20210709081005.421340-1-ming.lei@redhat.com/
"virito: add APIs for retrieving vq affinity"
"blk-mq: introduce blk_mq_dev_map_queues"
- replaces all users of blk_mq_[pci|virtio]_map_queues with
blk_mq_dev_map_queues
- updated/extended number of queue calc helpers
- add isolcpus=io_queue CPU-hctx mapping function
- documented enum hk_type and isolcpus=io_queue
- added "scsi: pm8001: do not overwrite PCI queue mapping"
- Link to v2: https://lore.kernel.org/r/20240627-isolcpus-io-queues-v2-0-26a32e3c4f75@suse.de
Changes in v2:
- updated documentation
- split the blk/nvme-pci patch
- dropped HK_TYPE_IO_QUEUE, use HK_TYPE_MANAGED_IRQ
- Link to v1: https://lore.kernel.org/r/20240621-isolcpus-io-queues-v1-0-8b169bf41083@suse.de
---
Daniel Wagner (9):
lib/group_cpus: let group_cpu_evenly return number initialized masks
blk-mq: add number of queue calc helper
nvme-pci: use block layer helpers to calculate num of queues
scsi: use block layer helpers to calculate num of queues
virtio: blk/scsi: use block layer helpers to calculate num of queues
isolation: introduce io_queue isolcpus type
lib/group_cpus: honor housekeeping config when grouping CPUs
blk-mq: use hk cpus only when isolcpus=io_queue is enabled
blk-mq: prevent offlining hk CPU with associated online isolated CPUs
block/blk-mq-cpumap.c | 116 +++++++++++++++++++++++++++++-
block/blk-mq.c | 46 +++++++++++-
drivers/block/virtio_blk.c | 5 +-
drivers/nvme/host/pci.c | 5 +-
drivers/scsi/megaraid/megaraid_sas_base.c | 15 ++--
drivers/scsi/qla2xxx/qla_isr.c | 10 +--
drivers/scsi/smartpqi/smartpqi_init.c | 5 +-
drivers/scsi/virtio_scsi.c | 1 +
drivers/virtio/virtio_vdpa.c | 9 +--
fs/fuse/virtio_fs.c | 6 +-
include/linux/blk-mq.h | 2 +
include/linux/group_cpus.h | 3 +-
include/linux/sched/isolation.h | 1 +
kernel/irq/affinity.c | 9 +--
kernel/sched/isolation.c | 7 ++
lib/group_cpus.c | 90 +++++++++++++++++++++--
16 files changed, 290 insertions(+), 40 deletions(-)
---
base-commit: 3b607b75a345b1d808031bf1bb1038e4dac8d521
change-id: 20240620-isolcpus-io-queues-1a88eb47ff8b
Best regards,
--
Daniel Wagner <wagi@kernel.org>
* [PATCH v6 1/9] lib/group_cpus: let group_cpu_evenly return number initialized masks
2025-04-24 18:19 [PATCH v6 0/9] blk: honor isolcpus configuration Daniel Wagner
@ 2025-04-24 18:19 ` Daniel Wagner
2025-04-28 12:37 ` Thomas Gleixner
2025-05-09 1:29 ` Ming Lei
2025-04-24 18:19 ` [PATCH v6 2/9] blk-mq: add number of queue calc helper Daniel Wagner
` (8 subsequent siblings)
9 siblings, 2 replies; 28+ messages in thread
From: Daniel Wagner @ 2025-04-24 18:19 UTC (permalink / raw)
To: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg,
Michael S. Tsirkin
Cc: Martin K. Petersen, Thomas Gleixner, Costa Shulyupin, Juri Lelli,
Valentin Schneider, Waiman Long, Ming Lei, Frederic Weisbecker,
Mel Gorman, Hannes Reinecke, Mathieu Desnoyers, linux-kernel,
linux-block, linux-nvme, megaraidlinux.pdl, linux-scsi,
storagedev, virtualization, GR-QLogic-Storage-Upstream,
Daniel Wagner
group_cpu_evenly might allocated less groups then the requested:
group_cpu_evenly
__group_cpus_evenly
alloc_nodes_groups
# allocated total groups may be less than numgrps when
# active total CPU number is less then numgrps
In this case, the caller will do an out-of-bounds access because the
caller assumes the returned masks array has numgrps entries.
Return the number of groups created so the caller can limit the access
range accordingly.
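For illustration, callers are expected to bound their accesses by the
returned count; the blk-mq hunk below follows exactly this pattern
(call-site sketch only):

    masks = group_cpus_evenly(qmap->nr_queues, &nr_masks);
    for (queue = 0; queue < qmap->nr_queues; queue++)
        for_each_cpu(cpu, &masks[queue % nr_masks])
            qmap->mq_map[cpu] = qmap->queue_offset + queue;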
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Daniel Wagner <wagi@kernel.org>
---
block/blk-mq-cpumap.c | 6 +++---
drivers/virtio/virtio_vdpa.c | 9 +++++----
fs/fuse/virtio_fs.c | 6 +++---
include/linux/group_cpus.h | 3 ++-
kernel/irq/affinity.c | 9 +++++----
lib/group_cpus.c | 12 +++++++++---
6 files changed, 27 insertions(+), 18 deletions(-)
diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 444798c5374f48088b661b519f2638bda8556cf2..269161252add756897fce1b65cae5b2e6aebd647 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -19,9 +19,9 @@
void blk_mq_map_queues(struct blk_mq_queue_map *qmap)
{
const struct cpumask *masks;
- unsigned int queue, cpu;
+ unsigned int queue, cpu, nr_masks;
- masks = group_cpus_evenly(qmap->nr_queues);
+ masks = group_cpus_evenly(qmap->nr_queues, &nr_masks);
if (!masks) {
for_each_possible_cpu(cpu)
qmap->mq_map[cpu] = qmap->queue_offset;
@@ -29,7 +29,7 @@ void blk_mq_map_queues(struct blk_mq_queue_map *qmap)
}
for (queue = 0; queue < qmap->nr_queues; queue++) {
- for_each_cpu(cpu, &masks[queue])
+ for_each_cpu(cpu, &masks[queue % nr_masks])
qmap->mq_map[cpu] = qmap->queue_offset + queue;
}
kfree(masks);
diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c
index 1f60c9d5cb1810a6f208c24bb2ac640d537391a0..a7b297dae4890c9d6002744b90fc133bbedb7b44 100644
--- a/drivers/virtio/virtio_vdpa.c
+++ b/drivers/virtio/virtio_vdpa.c
@@ -329,20 +329,21 @@ create_affinity_masks(unsigned int nvecs, struct irq_affinity *affd)
for (i = 0, usedvecs = 0; i < affd->nr_sets; i++) {
unsigned int this_vecs = affd->set_size[i];
+ unsigned int nr_masks;
int j;
- struct cpumask *result = group_cpus_evenly(this_vecs);
+ struct cpumask *result = group_cpus_evenly(this_vecs, &nr_masks);
if (!result) {
kfree(masks);
return NULL;
}
- for (j = 0; j < this_vecs; j++)
+ for (j = 0; j < nr_masks; j++)
cpumask_copy(&masks[curvec + j], &result[j]);
kfree(result);
- curvec += this_vecs;
- usedvecs += this_vecs;
+ curvec += nr_masks;
+ usedvecs += nr_masks;
}
/* Fill out vectors at the end that don't need affinity */
diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index 2c7b24cb67adb2cb329ed545f56f04700aca8b81..7ed43b9ea4f3f8b108f1e0d7050c27267b9941c9 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -862,7 +862,7 @@ static void virtio_fs_requests_done_work(struct work_struct *work)
static void virtio_fs_map_queues(struct virtio_device *vdev, struct virtio_fs *fs)
{
const struct cpumask *mask, *masks;
- unsigned int q, cpu;
+ unsigned int q, cpu, nr_masks;
/* First attempt to map using existing transport layer affinities
* e.g. PCIe MSI-X
@@ -882,7 +882,7 @@ static void virtio_fs_map_queues(struct virtio_device *vdev, struct virtio_fs *f
return;
fallback:
/* Attempt to map evenly in groups over the CPUs */
- masks = group_cpus_evenly(fs->num_request_queues);
+ masks = group_cpus_evenly(fs->num_request_queues, &nr_masks);
/* If even this fails we default to all CPUs use first request queue */
if (!masks) {
for_each_possible_cpu(cpu)
@@ -891,7 +891,7 @@ static void virtio_fs_map_queues(struct virtio_device *vdev, struct virtio_fs *f
}
for (q = 0; q < fs->num_request_queues; q++) {
- for_each_cpu(cpu, &masks[q])
+ for_each_cpu(cpu, &masks[q % nr_masks])
fs->mq_map[cpu] = q + VQ_REQUEST;
}
kfree(masks);
diff --git a/include/linux/group_cpus.h b/include/linux/group_cpus.h
index e42807ec61f6e8cf3787af7daa0d8686edfef0a3..bd5dada6e8606fa6cf8f7babf939e39fd7475c8d 100644
--- a/include/linux/group_cpus.h
+++ b/include/linux/group_cpus.h
@@ -9,6 +9,7 @@
#include <linux/kernel.h>
#include <linux/cpu.h>
-struct cpumask *group_cpus_evenly(unsigned int numgrps);
+struct cpumask *group_cpus_evenly(unsigned int numgrps,
+ unsigned int *nummasks);
#endif
diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index 44a4eba80315cc098ecfa366ca1d88483641b12a..d2aefab5eb2b929877ced43f48b6268098484bd7 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -70,20 +70,21 @@ irq_create_affinity_masks(unsigned int nvecs, struct irq_affinity *affd)
*/
for (i = 0, usedvecs = 0; i < affd->nr_sets; i++) {
unsigned int this_vecs = affd->set_size[i];
+ unsigned int nr_masks;
int j;
- struct cpumask *result = group_cpus_evenly(this_vecs);
+ struct cpumask *result = group_cpus_evenly(this_vecs, &nr_masks);
if (!result) {
kfree(masks);
return NULL;
}
- for (j = 0; j < this_vecs; j++)
+ for (j = 0; j < nr_masks; j++)
cpumask_copy(&masks[curvec + j].mask, &result[j]);
kfree(result);
- curvec += this_vecs;
- usedvecs += this_vecs;
+ curvec += nr_masks;
+ usedvecs += nr_masks;
}
/* Fill out vectors at the end that don't need affinity */
diff --git a/lib/group_cpus.c b/lib/group_cpus.c
index ee272c4cefcc13907ce9f211f479615d2e3c9154..016c6578a07616959470b47121459a16a1bc99e5 100644
--- a/lib/group_cpus.c
+++ b/lib/group_cpus.c
@@ -332,9 +332,11 @@ static int __group_cpus_evenly(unsigned int startgrp, unsigned int numgrps,
/**
* group_cpus_evenly - Group all CPUs evenly per NUMA/CPU locality
* @numgrps: number of groups
+ * @nummasks: number of initialized cpumasks
*
* Return: cpumask array if successful, NULL otherwise. And each element
- * includes CPUs assigned to this group
+ * includes CPUs assigned to this group. nummasks contains the number
+ * of initialized masks which can be less than numgrps.
*
* Try to put close CPUs from viewpoint of CPU and NUMA locality into
* same group, and run two-stage grouping:
@@ -344,7 +346,8 @@ static int __group_cpus_evenly(unsigned int startgrp, unsigned int numgrps,
* We guarantee in the resulted grouping that all CPUs are covered, and
* no same CPU is assigned to multiple groups
*/
-struct cpumask *group_cpus_evenly(unsigned int numgrps)
+struct cpumask *group_cpus_evenly(unsigned int numgrps,
+ unsigned int *nummasks)
{
unsigned int curgrp = 0, nr_present = 0, nr_others = 0;
cpumask_var_t *node_to_cpumask;
@@ -421,10 +424,12 @@ struct cpumask *group_cpus_evenly(unsigned int numgrps)
kfree(masks);
return NULL;
}
+ *nummasks = nr_present + nr_others;
return masks;
}
#else /* CONFIG_SMP */
-struct cpumask *group_cpus_evenly(unsigned int numgrps)
+struct cpumask *group_cpus_evenly(unsigned int numgrps,
+ unsigned int *nummasks)
{
struct cpumask *masks = kcalloc(numgrps, sizeof(*masks), GFP_KERNEL);
@@ -433,6 +438,7 @@ struct cpumask *group_cpus_evenly(unsigned int numgrps)
/* assign all CPUs(cpu 0) to the 1st group only */
cpumask_copy(&masks[0], cpu_possible_mask);
+ *nummasks = 1;
return masks;
}
#endif /* CONFIG_SMP */
--
2.49.0
* [PATCH v6 2/9] blk-mq: add number of queue calc helper
2025-04-24 18:19 [PATCH v6 0/9] blk: honor isolcpus configuration Daniel Wagner
2025-04-24 18:19 ` [PATCH v6 1/9] lib/group_cpus: let group_cpu_evenly return number initialized masks Daniel Wagner
@ 2025-04-24 18:19 ` Daniel Wagner
2025-05-09 1:43 ` Ming Lei
2025-04-24 18:19 ` [PATCH v6 3/9] nvme-pci: use block layer helpers to calculate num of queues Daniel Wagner
` (7 subsequent siblings)
9 siblings, 1 reply; 28+ messages in thread
From: Daniel Wagner @ 2025-04-24 18:19 UTC (permalink / raw)
To: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg,
Michael S. Tsirkin
Cc: Martin K. Petersen, Thomas Gleixner, Costa Shulyupin, Juri Lelli,
Valentin Schneider, Waiman Long, Ming Lei, Frederic Weisbecker,
Mel Gorman, Hannes Reinecke, Mathieu Desnoyers, linux-kernel,
linux-block, linux-nvme, megaraidlinux.pdl, linux-scsi,
storagedev, virtualization, GR-QLogic-Storage-Upstream,
Daniel Wagner
Multiqueue devices should only allocate queues for the housekeeping CPUs
when isolcpus=io_queue is set. This avoids disturbing the isolated CPUs
with OS workload.
Add two variants of helpers which calculate the correct number of queues
which should be used. Two variants are needed because some drivers
calculate their maximum number of queues based on the possible CPU mask,
while others use the online CPU mask.
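For illustration, a driver that currently sizes its queues or vectors
from the CPU masks would switch to the helpers roughly like this (sketch
only; 'dev->max_hw_queues' and 'dev->max_msix' are hypothetical driver
fields):

    /* was: min(dev->max_hw_queues, num_possible_cpus()) */
    nr_hw_queues = blk_mq_num_possible_queues(dev->max_hw_queues);

    /* was: min(dev->max_msix, num_online_cpus()) */
    nr_vectors = blk_mq_num_online_queues(dev->max_msix);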
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Daniel Wagner <wagi@kernel.org>
---
block/blk-mq-cpumap.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
include/linux/blk-mq.h | 2 ++
2 files changed, 47 insertions(+)
diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 269161252add756897fce1b65cae5b2e6aebd647..6e6b3e989a5676186b5a31296a1b94b7602f1542 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -12,10 +12,55 @@
#include <linux/cpu.h>
#include <linux/group_cpus.h>
#include <linux/device/bus.h>
+#include <linux/sched/isolation.h>
#include "blk.h"
#include "blk-mq.h"
+static unsigned int blk_mq_num_queues(const struct cpumask *mask,
+ unsigned int max_queues)
+{
+ unsigned int num;
+
+ if (housekeeping_enabled(HK_TYPE_MANAGED_IRQ))
+ mask = housekeeping_cpumask(HK_TYPE_MANAGED_IRQ);
+
+ num = cpumask_weight(mask);
+ return min_not_zero(num, max_queues);
+}
+
+/**
+ * blk_mq_num_possible_queues - Calc nr of queues for multiqueue devices
+ * @max_queues: The maximal number of queues the hardware/driver
+ * supports. If max_queues is 0, the argument is
+ * ignored.
+ *
+ * Calculate the number of queues which should be used for a multiqueue
+ * device based on the number of possible cpu. The helper is considering
+ * isolcpus settings.
+ */
+unsigned int blk_mq_num_possible_queues(unsigned int max_queues)
+{
+ return blk_mq_num_queues(cpu_possible_mask, max_queues);
+}
+EXPORT_SYMBOL_GPL(blk_mq_num_possible_queues);
+
+/**
+ * blk_mq_num_online_queues - Calc nr of queues for multiqueue devices
+ * @max_queues: The maximal number of queues the hardware/driver
+ * supports. If max_queues is 0, the argument is
+ * ignored.
+ *
+ * Calculate the number of queues which should be used for a multiqueue
+ * device based on the number of online cpus. The helper is considering
+ * isolcpus settings.
+ */
+unsigned int blk_mq_num_online_queues(unsigned int max_queues)
+{
+ return blk_mq_num_queues(cpu_online_mask, max_queues);
+}
+EXPORT_SYMBOL_GPL(blk_mq_num_online_queues);
+
void blk_mq_map_queues(struct blk_mq_queue_map *qmap)
{
const struct cpumask *masks;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 8eb9b3310167c36f8a67ee8756a97d1274f8e73b..feed1dcaeef51c8db49d3fe667c64ecc824ce655 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -941,6 +941,8 @@ int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
void blk_mq_unfreeze_queue_non_owner(struct request_queue *q);
void blk_freeze_queue_start_non_owner(struct request_queue *q);
+unsigned int blk_mq_num_possible_queues(unsigned int max_queues);
+unsigned int blk_mq_num_online_queues(unsigned int max_queues);
void blk_mq_map_queues(struct blk_mq_queue_map *qmap);
void blk_mq_map_hw_queues(struct blk_mq_queue_map *qmap,
struct device *dev, unsigned int offset);
--
2.49.0
* [PATCH v6 3/9] nvme-pci: use block layer helpers to calculate num of queues
2025-04-24 18:19 [PATCH v6 0/9] blk: honor isolcpus configuration Daniel Wagner
2025-04-24 18:19 ` [PATCH v6 1/9] lib/group_cpus: let group_cpu_evenly return number initialized masks Daniel Wagner
2025-04-24 18:19 ` [PATCH v6 2/9] blk-mq: add number of queue calc helper Daniel Wagner
@ 2025-04-24 18:19 ` Daniel Wagner
2025-05-09 1:47 ` Ming Lei
2025-04-24 18:19 ` [PATCH v6 4/9] scsi: " Daniel Wagner
` (6 subsequent siblings)
9 siblings, 1 reply; 28+ messages in thread
From: Daniel Wagner @ 2025-04-24 18:19 UTC (permalink / raw)
To: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg,
Michael S. Tsirkin
Cc: Martin K. Petersen, Thomas Gleixner, Costa Shulyupin, Juri Lelli,
Valentin Schneider, Waiman Long, Ming Lei, Frederic Weisbecker,
Mel Gorman, Hannes Reinecke, Mathieu Desnoyers, linux-kernel,
linux-block, linux-nvme, megaraidlinux.pdl, linux-scsi,
storagedev, virtualization, GR-QLogic-Storage-Upstream,
Daniel Wagner
Multiqueue devices should only allocate queues for the housekeeping CPUs
when isolcpus=io_queue is set. This avoids disturbing the isolated CPUs
with OS workload.
Use the helpers which calculate the correct number of queues to use when
isolcpus is enabled.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Daniel Wagner <wagi@kernel.org>
---
drivers/nvme/host/pci.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index b178d52eac1b7f7286e217226b9b3686d07b7b6c..2b1aa6833a12a5ecf7b293461a115026f97ea94c 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -81,7 +81,7 @@ static int io_queue_count_set(const char *val, const struct kernel_param *kp)
int ret;
ret = kstrtouint(val, 10, &n);
- if (ret != 0 || n > num_possible_cpus())
+ if (ret != 0 || n > blk_mq_num_possible_queues(0))
return -EINVAL;
return param_set_uint(val, kp);
}
@@ -2448,7 +2448,8 @@ static unsigned int nvme_max_io_queues(struct nvme_dev *dev)
*/
if (dev->ctrl.quirks & NVME_QUIRK_SHARED_TAGS)
return 1;
- return num_possible_cpus() + dev->nr_write_queues + dev->nr_poll_queues;
+ return blk_mq_num_possible_queues(0) + dev->nr_write_queues +
+ dev->nr_poll_queues;
}
static int nvme_setup_io_queues(struct nvme_dev *dev)
--
2.49.0
* [PATCH v6 4/9] scsi: use block layer helpers to calculate num of queues
2025-04-24 18:19 [PATCH v6 0/9] blk: honor isolcpus configuration Daniel Wagner
` (2 preceding siblings ...)
2025-04-24 18:19 ` [PATCH v6 3/9] nvme-pci: use block layer helpers to calculate num of queues Daniel Wagner
@ 2025-04-24 18:19 ` Daniel Wagner
2025-05-09 1:49 ` Ming Lei
2025-04-24 18:19 ` [PATCH v6 5/9] virtio: blk/scsi: " Daniel Wagner
` (5 subsequent siblings)
9 siblings, 1 reply; 28+ messages in thread
From: Daniel Wagner @ 2025-04-24 18:19 UTC (permalink / raw)
To: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg,
Michael S. Tsirkin
Cc: Martin K. Petersen, Thomas Gleixner, Costa Shulyupin, Juri Lelli,
Valentin Schneider, Waiman Long, Ming Lei, Frederic Weisbecker,
Mel Gorman, Hannes Reinecke, Mathieu Desnoyers, linux-kernel,
linux-block, linux-nvme, megaraidlinux.pdl, linux-scsi,
storagedev, virtualization, GR-QLogic-Storage-Upstream,
Daniel Wagner
Multiqueue devices should only allocate queues for the housekeeping CPUs
when isolcpus=managed_irq is set. This avoids disturbing the isolated
CPUs with OS workload.
Use the helpers which calculate the correct number of queues to use when
isolcpus is enabled.
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Daniel Wagner <wagi@kernel.org>
---
drivers/scsi/megaraid/megaraid_sas_base.c | 15 +++++++++------
drivers/scsi/qla2xxx/qla_isr.c | 10 +++++-----
drivers/scsi/smartpqi/smartpqi_init.c | 5 ++---
3 files changed, 16 insertions(+), 14 deletions(-)
diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index 28c75865967af36c6390c5ee5767577ec1bcf779..a5f1117f3ddb20da04e0b29fd9d52d47ed1af3d8 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -5962,7 +5962,8 @@ megasas_alloc_irq_vectors(struct megasas_instance *instance)
else
instance->iopoll_q_count = 0;
- num_msix_req = num_online_cpus() + instance->low_latency_index_start;
+ num_msix_req = blk_mq_num_online_queues(0) +
+ instance->low_latency_index_start;
instance->msix_vectors = min(num_msix_req,
instance->msix_vectors);
@@ -5978,7 +5979,8 @@ megasas_alloc_irq_vectors(struct megasas_instance *instance)
/* Disable Balanced IOPS mode and try realloc vectors */
instance->perf_mode = MR_LATENCY_PERF_MODE;
instance->low_latency_index_start = 1;
- num_msix_req = num_online_cpus() + instance->low_latency_index_start;
+ num_msix_req = blk_mq_num_online_queues(0) +
+ instance->low_latency_index_start;
instance->msix_vectors = min(num_msix_req,
instance->msix_vectors);
@@ -6234,7 +6236,7 @@ static int megasas_init_fw(struct megasas_instance *instance)
intr_coalescing = (scratch_pad_1 & MR_INTR_COALESCING_SUPPORT_OFFSET) ?
true : false;
if (intr_coalescing &&
- (num_online_cpus() >= MR_HIGH_IOPS_QUEUE_COUNT) &&
+ (blk_mq_num_online_queues(0) >= MR_HIGH_IOPS_QUEUE_COUNT) &&
(instance->msix_vectors == MEGASAS_MAX_MSIX_QUEUES))
instance->perf_mode = MR_BALANCED_PERF_MODE;
else
@@ -6278,7 +6280,8 @@ static int megasas_init_fw(struct megasas_instance *instance)
else
instance->low_latency_index_start = 1;
- num_msix_req = num_online_cpus() + instance->low_latency_index_start;
+ num_msix_req = blk_mq_num_online_queues(0) +
+ instance->low_latency_index_start;
instance->msix_vectors = min(num_msix_req,
instance->msix_vectors);
@@ -6310,8 +6313,8 @@ static int megasas_init_fw(struct megasas_instance *instance)
megasas_setup_reply_map(instance);
dev_info(&instance->pdev->dev,
- "current msix/online cpus\t: (%d/%d)\n",
- instance->msix_vectors, (unsigned int)num_online_cpus());
+ "current msix/max num queues\t: (%d/%u)\n",
+ instance->msix_vectors, blk_mq_num_online_queues(0));
dev_info(&instance->pdev->dev,
"RDPQ mode\t: (%s)\n", instance->is_rdpq ? "enabled" : "disabled");
diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
index fe98c76e9be32ff03a1960f366f0d700d1168383..c4c6b5c6658c0734f7ff68bcc31b33dde87296dd 100644
--- a/drivers/scsi/qla2xxx/qla_isr.c
+++ b/drivers/scsi/qla2xxx/qla_isr.c
@@ -4533,13 +4533,13 @@ qla24xx_enable_msix(struct qla_hw_data *ha, struct rsp_que *rsp)
if (USER_CTRL_IRQ(ha) || !ha->mqiobase) {
/* user wants to control IRQ setting for target mode */
ret = pci_alloc_irq_vectors(ha->pdev, min_vecs,
- min((u16)ha->msix_count, (u16)(num_online_cpus() + min_vecs)),
- PCI_IRQ_MSIX);
+ blk_mq_num_online_queues(ha->msix_count) + min_vecs,
+ PCI_IRQ_MSIX);
} else
ret = pci_alloc_irq_vectors_affinity(ha->pdev, min_vecs,
- min((u16)ha->msix_count, (u16)(num_online_cpus() + min_vecs)),
- PCI_IRQ_MSIX | PCI_IRQ_AFFINITY,
- &desc);
+ blk_mq_num_online_queues(ha->msix_count) + min_vecs,
+ PCI_IRQ_MSIX | PCI_IRQ_AFFINITY,
+ &desc);
if (ret < 0) {
ql_log(ql_log_fatal, vha, 0x00c7,
diff --git a/drivers/scsi/smartpqi/smartpqi_init.c b/drivers/scsi/smartpqi/smartpqi_init.c
index 0da7be40c925807519f5bff8d428a29e5ce454a5..7212cb96d0f9a337578fa2b982afa3ee6d17f4be 100644
--- a/drivers/scsi/smartpqi/smartpqi_init.c
+++ b/drivers/scsi/smartpqi/smartpqi_init.c
@@ -5278,15 +5278,14 @@ static void pqi_calculate_queue_resources(struct pqi_ctrl_info *ctrl_info)
if (reset_devices) {
num_queue_groups = 1;
} else {
- int num_cpus;
int max_queue_groups;
max_queue_groups = min(ctrl_info->max_inbound_queues / 2,
ctrl_info->max_outbound_queues - 1);
max_queue_groups = min(max_queue_groups, PQI_MAX_QUEUE_GROUPS);
- num_cpus = num_online_cpus();
- num_queue_groups = min(num_cpus, ctrl_info->max_msix_vectors);
+ num_queue_groups =
+ blk_mq_num_online_queues(ctrl_info->max_msix_vectors);
num_queue_groups = min(num_queue_groups, max_queue_groups);
}
--
2.49.0
* [PATCH v6 5/9] virtio: blk/scsi: use block layer helpers to calculate num of queues
2025-04-24 18:19 [PATCH v6 0/9] blk: honor isolcpus configuration Daniel Wagner
` (3 preceding siblings ...)
2025-04-24 18:19 ` [PATCH v6 4/9] scsi: " Daniel Wagner
@ 2025-04-24 18:19 ` Daniel Wagner
2025-05-09 1:52 ` Ming Lei
2025-04-24 18:19 ` [PATCH v6 6/9] isolation: introduce io_queue isolcpus type Daniel Wagner
` (4 subsequent siblings)
9 siblings, 1 reply; 28+ messages in thread
From: Daniel Wagner @ 2025-04-24 18:19 UTC (permalink / raw)
To: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg,
Michael S. Tsirkin
Cc: Martin K. Petersen, Thomas Gleixner, Costa Shulyupin, Juri Lelli,
Valentin Schneider, Waiman Long, Ming Lei, Frederic Weisbecker,
Mel Gorman, Hannes Reinecke, Mathieu Desnoyers, linux-kernel,
linux-block, linux-nvme, megaraidlinux.pdl, linux-scsi,
storagedev, virtualization, GR-QLogic-Storage-Upstream,
Daniel Wagner
Multiqueue devices should only allocate queues for the housekeeping CPUs
when isolcpus=io_queue is set. This avoids disturbing the isolated CPUs
with OS workload.
Use the helpers which calculate the correct number of queues to use when
isolcpus is enabled.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Daniel Wagner <wagi@kernel.org>
---
drivers/block/virtio_blk.c | 5 ++---
drivers/scsi/virtio_scsi.c | 1 +
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 7cffea01d868c6dcfe6734d3c89c1709fec07956..975036e8ddef5d622bab623843826ac26a0aa63d 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -976,9 +976,8 @@ static int init_vq(struct virtio_blk *vblk)
return -EINVAL;
}
- num_vqs = min_t(unsigned int,
- min_not_zero(num_request_queues, nr_cpu_ids),
- num_vqs);
+ num_vqs = blk_mq_num_possible_queues(
+ min_not_zero(num_request_queues, num_vqs));
num_poll_vqs = min_t(unsigned int, poll_queues, num_vqs - 1);
diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 21ce3e9401929cd273fde08b0944e8b47e1e66cc..96a69edddbe5555574fc8fed1ba7c82a99df4472 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -919,6 +919,7 @@ static int virtscsi_probe(struct virtio_device *vdev)
/* We need to know how many queues before we allocate. */
num_queues = virtscsi_config_get(vdev, num_queues) ? : 1;
num_queues = min_t(unsigned int, nr_cpu_ids, num_queues);
+ num_queues = blk_mq_num_possible_queues(num_queues);
num_targets = virtscsi_config_get(vdev, max_target) + 1;
--
2.49.0
* [PATCH v6 6/9] isolation: introduce io_queue isolcpus type
2025-04-24 18:19 [PATCH v6 0/9] blk: honor isolcpus configuration Daniel Wagner
` (4 preceding siblings ...)
2025-04-24 18:19 ` [PATCH v6 5/9] virtio: blk/scsi: " Daniel Wagner
@ 2025-04-24 18:19 ` Daniel Wagner
2025-04-25 6:26 ` Hannes Reinecke
2025-04-24 18:19 ` [PATCH v6 7/9] lib/group_cpus: honor housekeeping config when grouping CPUs Daniel Wagner
` (3 subsequent siblings)
9 siblings, 1 reply; 28+ messages in thread
From: Daniel Wagner @ 2025-04-24 18:19 UTC (permalink / raw)
To: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg,
Michael S. Tsirkin
Cc: Martin K. Petersen, Thomas Gleixner, Costa Shulyupin, Juri Lelli,
Valentin Schneider, Waiman Long, Ming Lei, Frederic Weisbecker,
Mel Gorman, Hannes Reinecke, Mathieu Desnoyers, linux-kernel,
linux-block, linux-nvme, megaraidlinux.pdl, linux-scsi,
storagedev, virtualization, GR-QLogic-Storage-Upstream,
Daniel Wagner
Multiqueue drivers spread IO queues across all CPUs for optimal
performance. The drivers are not aware of the CPU isolation requirement
and will spread all queues, ignoring the isolcpus configuration.
Introduce a new isolcpus mask which allows the user to define on which
CPUs IO queues should be placed. This is similar to managed_irq, but for
drivers which do not use the managed IRQ infrastructure.
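For example (the same configuration used in the cover letter), isolating
CPUs 2-3 and 6-7 from IO queue work is requested on the kernel command
line with:

    isolcpus=io_queue,2-3,6-7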
Signed-off-by: Daniel Wagner <wagi@kernel.org>
---
include/linux/sched/isolation.h | 1 +
kernel/sched/isolation.c | 7 +++++++
2 files changed, 8 insertions(+)
diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
index d8501f4709b583b8a1c91574446382f093bccdb1..6b6ae9c5b2f61a93c649a98ea27482b932627fca 100644
--- a/include/linux/sched/isolation.h
+++ b/include/linux/sched/isolation.h
@@ -9,6 +9,7 @@
enum hk_type {
HK_TYPE_DOMAIN,
HK_TYPE_MANAGED_IRQ,
+ HK_TYPE_IO_QUEUE,
HK_TYPE_KERNEL_NOISE,
HK_TYPE_MAX,
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 81bc8b329ef17cd3a3f5ae0a20ca02af3a1a69bc..687b11f900e31ab656e25cae263f15f6d8f46a9a 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -11,6 +11,7 @@
enum hk_flags {
HK_FLAG_DOMAIN = BIT(HK_TYPE_DOMAIN),
HK_FLAG_MANAGED_IRQ = BIT(HK_TYPE_MANAGED_IRQ),
+ HK_FLAG_IO_QUEUE = BIT(HK_TYPE_IO_QUEUE),
HK_FLAG_KERNEL_NOISE = BIT(HK_TYPE_KERNEL_NOISE),
};
@@ -224,6 +225,12 @@ static int __init housekeeping_isolcpus_setup(char *str)
continue;
}
+ if (!strncmp(str, "io_queue,", 9)) {
+ str += 9;
+ flags |= HK_FLAG_IO_QUEUE;
+ continue;
+ }
+
/*
* Skip unknown sub-parameter and validate that it is not
* containing an invalid character.
--
2.49.0
* [PATCH v6 7/9] lib/group_cpus: honor housekeeping config when grouping CPUs
2025-04-24 18:19 [PATCH v6 0/9] blk: honor isolcpus configuration Daniel Wagner
` (5 preceding siblings ...)
2025-04-24 18:19 ` [PATCH v6 6/9] isolation: introduce io_queue isolcpus type Daniel Wagner
@ 2025-04-24 18:19 ` Daniel Wagner
2025-05-09 2:22 ` Ming Lei
[not found] ` <cd1576ee-82a3-4899-b218-2e5c5334af6e@redhat.com>
2025-04-24 18:19 ` [PATCH v6 8/9] blk-mq: use hk cpus only when isolcpus=io_queue is enabled Daniel Wagner
` (2 subsequent siblings)
9 siblings, 2 replies; 28+ messages in thread
From: Daniel Wagner @ 2025-04-24 18:19 UTC (permalink / raw)
To: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg,
Michael S. Tsirkin
Cc: Martin K. Petersen, Thomas Gleixner, Costa Shulyupin, Juri Lelli,
Valentin Schneider, Waiman Long, Ming Lei, Frederic Weisbecker,
Mel Gorman, Hannes Reinecke, Mathieu Desnoyers, linux-kernel,
linux-block, linux-nvme, megaraidlinux.pdl, linux-scsi,
storagedev, virtualization, GR-QLogic-Storage-Upstream,
Daniel Wagner
group_cpus_evenly distributes all present CPUs into groups. This ignores
the isolcpus configuration and also assigns isolated CPUs to the groups.
Make group_cpus_evenly aware of the isolcpus configuration and use the
housekeeping CPU mask as the base for distributing the available CPUs
into groups.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Daniel Wagner <wagi@kernel.org>
---
lib/group_cpus.c | 82 +++++++++++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 79 insertions(+), 3 deletions(-)
diff --git a/lib/group_cpus.c b/lib/group_cpus.c
index 016c6578a07616959470b47121459a16a1bc99e5..707997bca55344b18f63ccfa539ba77a89d8acb6 100644
--- a/lib/group_cpus.c
+++ b/lib/group_cpus.c
@@ -8,6 +8,7 @@
#include <linux/cpu.h>
#include <linux/sort.h>
#include <linux/group_cpus.h>
+#include <linux/sched/isolation.h>
#ifdef CONFIG_SMP
@@ -330,7 +331,7 @@ static int __group_cpus_evenly(unsigned int startgrp, unsigned int numgrps,
}
/**
- * group_cpus_evenly - Group all CPUs evenly per NUMA/CPU locality
+ * group_possible_cpus_evenly - Group all CPUs evenly per NUMA/CPU locality
* @numgrps: number of groups
* @nummasks: number of initialized cpumasks
*
@@ -346,8 +347,8 @@ static int __group_cpus_evenly(unsigned int startgrp, unsigned int numgrps,
* We guarantee in the resulted grouping that all CPUs are covered, and
* no same CPU is assigned to multiple groups
*/
-struct cpumask *group_cpus_evenly(unsigned int numgrps,
- unsigned int *nummasks)
+static struct cpumask *group_possible_cpus_evenly(unsigned int numgrps,
+ unsigned int *nummasks)
{
unsigned int curgrp = 0, nr_present = 0, nr_others = 0;
cpumask_var_t *node_to_cpumask;
@@ -427,6 +428,81 @@ struct cpumask *group_cpus_evenly(unsigned int numgrps,
*nummasks = nr_present + nr_others;
return masks;
}
+
+/**
+ * group_mask_cpus_evenly - Group all CPUs evenly per NUMA/CPU locality
+ * @numgrps: number of groups
+ * @cpu_mask: CPU to consider for the grouping
+ * @nummasks: number of initialized cpusmasks
+ *
+ * Return: cpumask array if successful, NULL otherwise. And each element
+ * includes CPUs assigned to this group.
+ *
+ * Try to put close CPUs from viewpoint of CPU and NUMA locality into
+ * same group. Allocate present CPUs on these groups evenly.
+ */
+static struct cpumask *group_mask_cpus_evenly(unsigned int numgrps,
+ const struct cpumask *cpu_mask,
+ unsigned int *nummasks)
+{
+ cpumask_var_t *node_to_cpumask;
+ cpumask_var_t nmsk;
+ int ret = -ENOMEM;
+ struct cpumask *masks = NULL;
+
+ if (!zalloc_cpumask_var(&nmsk, GFP_KERNEL))
+ return NULL;
+
+ node_to_cpumask = alloc_node_to_cpumask();
+ if (!node_to_cpumask)
+ goto fail_nmsk;
+
+ masks = kcalloc(numgrps, sizeof(*masks), GFP_KERNEL);
+ if (!masks)
+ goto fail_node_to_cpumask;
+
+ build_node_to_cpumask(node_to_cpumask);
+
+ ret = __group_cpus_evenly(0, numgrps, node_to_cpumask, cpu_mask, nmsk,
+ masks);
+
+fail_node_to_cpumask:
+ free_node_to_cpumask(node_to_cpumask);
+
+fail_nmsk:
+ free_cpumask_var(nmsk);
+ if (ret < 0) {
+ kfree(masks);
+ return NULL;
+ }
+ *nummasks = ret;
+ return masks;
+}
+
+/**
+ * group_cpus_evenly - Group all CPUs evenly per NUMA/CPU locality
+ * @numgrps: number of groups
+ * @nummasks: number of initialized cpusmasks
+ *
+ * Return: cpumask array if successful, NULL otherwise.
+ *
+ * group_possible_cpus_evently() is used for distributing the cpus on all
+ * possible cpus in absence of isolcpus command line argument.
+ * group_mask_cpu_evenly() is used when the isolcpus command line
+ * argument is used with managed_irq option. In this case only the
+ * housekeeping CPUs are considered.
+ */
+struct cpumask *group_cpus_evenly(unsigned int numgrps,
+ unsigned int *nummasks)
+{
+ if (housekeeping_enabled(HK_TYPE_IO_QUEUE)) {
+ return group_mask_cpus_evenly(numgrps,
+ housekeeping_cpumask(HK_TYPE_IO_QUEUE),
+ nummasks);
+ }
+
+ return group_possible_cpus_evenly(numgrps, nummasks);
+}
#else /* CONFIG_SMP */
struct cpumask *group_cpus_evenly(unsigned int numgrps,
unsigned int *nummasks)
--
2.49.0
* [PATCH v6 8/9] blk-mq: use hk cpus only when isolcpus=io_queue is enabled
2025-04-24 18:19 [PATCH v6 0/9] blk: honor isolcpus configuration Daniel Wagner
` (6 preceding siblings ...)
2025-04-24 18:19 ` [PATCH v6 7/9] lib/group_cpus: honor housekeeping config when grouping CPUs Daniel Wagner
@ 2025-04-24 18:19 ` Daniel Wagner
2025-05-09 2:38 ` Ming Lei
2025-04-24 18:19 ` [PATCH v6 9/9] blk-mq: prevent offlining hk CPU with associated online isolated CPUs Daniel Wagner
2025-05-06 3:17 ` [PATCH v6 0/9] blk: honor isolcpus configuration Ming Lei
9 siblings, 1 reply; 28+ messages in thread
From: Daniel Wagner @ 2025-04-24 18:19 UTC (permalink / raw)
To: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg,
Michael S. Tsirkin
Cc: Martin K. Petersen, Thomas Gleixner, Costa Shulyupin, Juri Lelli,
Valentin Schneider, Waiman Long, Ming Lei, Frederic Weisbecker,
Mel Gorman, Hannes Reinecke, Mathieu Desnoyers, linux-kernel,
linux-block, linux-nvme, megaraidlinux.pdl, linux-scsi,
storagedev, virtualization, GR-QLogic-Storage-Upstream,
Daniel Wagner
When isolcpus=io_queue is enabled, all hardware queues should run on the
housekeeping CPUs only. Thus ignore the affinity mask provided by the
driver. Also we can't use blk_mq_map_queues because it will map all CPUs
to the first hctx unless the CPU is the same as the one the hctx has its
affinity set to, e.g. with 8 CPUs and an isolcpus=io_queue,2-3,6-7
config:
queue mapping for /dev/nvme0n1
hctx0: default 2 3 4 6 7
hctx1: default 5
hctx2: default 0
hctx3: default 1
PCI name is 00:05.0: nvme0n1
irq 57 affinity 0-1 effective 1 is_managed:0 nvme0q0
irq 58 affinity 4 effective 4 is_managed:1 nvme0q1
irq 59 affinity 5 effective 5 is_managed:1 nvme0q2
irq 60 affinity 0 effective 0 is_managed:1 nvme0q3
irq 61 affinity 1 effective 1 is_managed:1 nvme0q4
whereas with blk_mq_map_hk_queues we get
queue mapping for /dev/nvme0n1
hctx0: default 2 4
hctx1: default 3 5
hctx2: default 0 6
hctx3: default 1 7
PCI name is 00:05.0: nvme0n1
irq 56 affinity 0-1 effective 1 is_managed:0 nvme0q0
irq 61 affinity 4 effective 4 is_managed:1 nvme0q1
irq 62 affinity 5 effective 5 is_managed:1 nvme0q2
irq 63 affinity 0 effective 0 is_managed:1 nvme0q3
irq 64 affinity 1 effective 1 is_managed:1 nvme0q4
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Daniel Wagner <wagi@kernel.org>
---
block/blk-mq-cpumap.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 67 insertions(+), 2 deletions(-)
diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 6e6b3e989a5676186b5a31296a1b94b7602f1542..2d678d1db2b5196fc2b2ce5678fdb0cb6bad26e0 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -22,8 +22,8 @@ static unsigned int blk_mq_num_queues(const struct cpumask *mask,
{
unsigned int num;
- if (housekeeping_enabled(HK_TYPE_MANAGED_IRQ))
- mask = housekeeping_cpumask(HK_TYPE_MANAGED_IRQ);
+ if (housekeeping_enabled(HK_TYPE_IO_QUEUE))
+ mask = housekeeping_cpumask(HK_TYPE_IO_QUEUE);
num = cpumask_weight(mask);
return min_not_zero(num, max_queues);
@@ -61,11 +61,73 @@ unsigned int blk_mq_num_online_queues(unsigned int max_queues)
}
EXPORT_SYMBOL_GPL(blk_mq_num_online_queues);
+/*
+ * blk_mq_map_hk_queues - Create housekeeping CPU to hardware queue mapping
+ * @qmap: CPU to hardware queue map
+ *
+ * Create a housekeeping CPU to hardware queue mapping in @qmap. If the
+ * isolcpus feature is enabled and blk_mq_map_hk_queues returns true,
+ * @qmap contains a valid configuration honoring the io_queue
+ * configuration. If the isolcpus feature is disabled this function
+ * returns false.
+ */
+static bool blk_mq_map_hk_queues(struct blk_mq_queue_map *qmap)
+{
+ struct cpumask *hk_masks;
+ cpumask_var_t isol_mask;
+ unsigned int queue, cpu, nr_masks;
+
+ if (!housekeeping_enabled(HK_TYPE_IO_QUEUE))
+ return false;
+
+ /* map housekeeping cpus to matching hardware context */
+ hk_masks = group_cpus_evenly(qmap->nr_queues, &nr_masks);
+ if (!hk_masks)
+ goto fallback;
+
+ for (queue = 0; queue < qmap->nr_queues; queue++) {
+ for_each_cpu(cpu, &hk_masks[queue % nr_masks])
+ qmap->mq_map[cpu] = qmap->queue_offset + queue;
+ }
+
+ kfree(hk_masks);
+
+ /* map isolcpus to hardware context */
+ if (!alloc_cpumask_var(&isol_mask, GFP_KERNEL))
+ goto fallback;
+
+ queue = 0;
+ cpumask_andnot(isol_mask,
+ cpu_possible_mask,
+ housekeeping_cpumask(HK_TYPE_IO_QUEUE));
+
+ for_each_cpu(cpu, isol_mask) {
+ qmap->mq_map[cpu] = qmap->queue_offset + queue;
+ queue = (queue + 1) % qmap->nr_queues;
+ }
+
+ free_cpumask_var(isol_mask);
+
+ return true;
+
+fallback:
+ /* map all cpus to hardware context ignoring any affinity */
+ queue = 0;
+ for_each_possible_cpu(cpu) {
+ qmap->mq_map[cpu] = qmap->queue_offset + queue;
+ queue = (queue + 1) % qmap->nr_queues;
+ }
+ return true;
+}
+
void blk_mq_map_queues(struct blk_mq_queue_map *qmap)
{
const struct cpumask *masks;
unsigned int queue, cpu, nr_masks;
+ if (blk_mq_map_hk_queues(qmap))
+ return;
+
masks = group_cpus_evenly(qmap->nr_queues, &nr_masks);
if (!masks) {
for_each_possible_cpu(cpu)
@@ -120,6 +182,9 @@ void blk_mq_map_hw_queues(struct blk_mq_queue_map *qmap,
if (!dev->bus->irq_get_affinity)
goto fallback;
+ if (blk_mq_map_hk_queues(qmap))
+ return;
+
for (queue = 0; queue < qmap->nr_queues; queue++) {
mask = dev->bus->irq_get_affinity(dev, queue + offset);
if (!mask)
--
2.49.0
* [PATCH v6 9/9] blk-mq: prevent offlining hk CPU with associated online isolated CPUs
2025-04-24 18:19 [PATCH v6 0/9] blk: honor isolcpus configuration Daniel Wagner
` (7 preceding siblings ...)
2025-04-24 18:19 ` [PATCH v6 8/9] blk-mq: use hk cpus only when isolcpus=io_queue is enabled Daniel Wagner
@ 2025-04-24 18:19 ` Daniel Wagner
2025-04-25 6:28 ` Hannes Reinecke
2025-05-09 2:54 ` Ming Lei
2025-05-06 3:17 ` [PATCH v6 0/9] blk: honor isolcpus configuration Ming Lei
9 siblings, 2 replies; 28+ messages in thread
From: Daniel Wagner @ 2025-04-24 18:19 UTC (permalink / raw)
To: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg,
Michael S. Tsirkin
Cc: Martin K. Petersen, Thomas Gleixner, Costa Shulyupin, Juri Lelli,
Valentin Schneider, Waiman Long, Ming Lei, Frederic Weisbecker,
Mel Gorman, Hannes Reinecke, Mathieu Desnoyers, linux-kernel,
linux-block, linux-nvme, megaraidlinux.pdl, linux-scsi,
storagedev, virtualization, GR-QLogic-Storage-Upstream,
Daniel Wagner
When isolcpus=io_queue is enabled, and the last housekeeping CPU for a
given hctx would go offline, there would be no CPU left which handles
the IOs. To prevent IO stalls, prevent offlining housekeeping CPUs which
are still severing isolated CPUs..
Signed-off-by: Daniel Wagner <wagi@kernel.org>
---
block/blk-mq.c | 46 ++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 44 insertions(+), 2 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index c2697db591091200cdb9f6e082e472b829701e4c..aff17673b773583dfb2b01cb2f5f010c456bd834 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3627,6 +3627,48 @@ static bool blk_mq_hctx_has_requests(struct blk_mq_hw_ctx *hctx)
return data.has_rq;
}
+static bool blk_mq_hctx_check_isolcpus_online(struct blk_mq_hw_ctx *hctx, unsigned int cpu)
+{
+ const struct cpumask *hk_mask;
+ int i;
+
+ if (!housekeeping_enabled(HK_TYPE_IO_QUEUE))
+ return true;
+
+ hk_mask = housekeeping_cpumask(HK_TYPE_IO_QUEUE);
+
+ for (i = 0; i < hctx->nr_ctx; i++) {
+ struct blk_mq_ctx *ctx = hctx->ctxs[i];
+
+ if (ctx->cpu == cpu)
+ continue;
+
+ /*
+ * Check if this context has at least one online
+ * housekeeping CPU in this case the hardware context is
+ * usable.
+ */
+ if (cpumask_test_cpu(ctx->cpu, hk_mask) &&
+ cpu_online(ctx->cpu))
+ break;
+
+ /*
+ * The context doesn't have any online housekeeping CPUs
+ * but there might be an online isolated CPU mapped to
+ * it.
+ */
+ if (cpu_is_offline(ctx->cpu))
+ continue;
+
+ pr_warn("%s: trying to offline hctx%d but there is still an online isolcpu CPU %d mapped to it\n",
+ hctx->queue->disk->disk_name,
+ hctx->queue_num, ctx->cpu);
+ return true;
+ }
+
+ return false;
+}
+
static bool blk_mq_hctx_has_online_cpu(struct blk_mq_hw_ctx *hctx,
unsigned int this_cpu)
{
@@ -3647,7 +3689,7 @@ static bool blk_mq_hctx_has_online_cpu(struct blk_mq_hw_ctx *hctx,
/* this hctx has at least one online CPU */
if (this_cpu != cpu)
- return true;
+ return blk_mq_hctx_check_isolcpus_online(hctx, this_cpu);
}
return false;
@@ -3659,7 +3701,7 @@ static int blk_mq_hctx_notify_offline(unsigned int cpu, struct hlist_node *node)
struct blk_mq_hw_ctx, cpuhp_online);
if (blk_mq_hctx_has_online_cpu(hctx, cpu))
- return 0;
+ return -EINVAL;
/*
* Prevent new request from being allocated on the current hctx.
--
2.49.0
* Re: [PATCH v6 6/9] isolation: introduce io_queue isolcpus type
2025-04-24 18:19 ` [PATCH v6 6/9] isolation: introduce io_queue isolcpus type Daniel Wagner
@ 2025-04-25 6:26 ` Hannes Reinecke
2025-04-25 7:32 ` Daniel Wagner
0 siblings, 1 reply; 28+ messages in thread
From: Hannes Reinecke @ 2025-04-25 6:26 UTC (permalink / raw)
To: Daniel Wagner, Jens Axboe, Keith Busch, Christoph Hellwig,
Sagi Grimberg, Michael S. Tsirkin
Cc: Martin K. Petersen, Thomas Gleixner, Costa Shulyupin, Juri Lelli,
Valentin Schneider, Waiman Long, Ming Lei, Frederic Weisbecker,
Mel Gorman, Mathieu Desnoyers, linux-kernel, linux-block,
linux-nvme, megaraidlinux.pdl, linux-scsi, storagedev,
virtualization, GR-QLogic-Storage-Upstream
On 4/24/25 20:19, Daniel Wagner wrote:
> Multiqueue drivers spreading IO queues on all CPUs for optimal
> performance. The drivers are not aware of the CPU isolated requirement
> and will spread all queues ignoring the isolcpus configuration.
>
> Introduce a new isolcpus mask which allows the user to define on which
> CPUs IO queues should be placed. This is similar to the managed_irq but
> for drivers which do not use the managed IRQ infrastructure.
>
> Signed-off-by: Daniel Wagner <wagi@kernel.org>
> ---
> include/linux/sched/isolation.h | 1 +
> kernel/sched/isolation.c | 7 +++++++
> 2 files changed, 8 insertions(+)
>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
* Re: [PATCH v6 9/9] blk-mq: prevent offlining hk CPU with associated online isolated CPUs
2025-04-24 18:19 ` [PATCH v6 9/9] blk-mq: prevent offlining hk CPU with associated online isolated CPUs Daniel Wagner
@ 2025-04-25 6:28 ` Hannes Reinecke
2025-05-09 2:54 ` Ming Lei
1 sibling, 0 replies; 28+ messages in thread
From: Hannes Reinecke @ 2025-04-25 6:28 UTC (permalink / raw)
To: Daniel Wagner, Jens Axboe, Keith Busch, Christoph Hellwig,
Sagi Grimberg, Michael S. Tsirkin
Cc: Martin K. Petersen, Thomas Gleixner, Costa Shulyupin, Juri Lelli,
Valentin Schneider, Waiman Long, Ming Lei, Frederic Weisbecker,
Mel Gorman, Mathieu Desnoyers, linux-kernel, linux-block,
linux-nvme, megaraidlinux.pdl, linux-scsi, storagedev,
virtualization, GR-QLogic-Storage-Upstream
On 4/24/25 20:19, Daniel Wagner wrote:
> When isolcpus=io_queue is enabled, and the last housekeeping CPU for a
> given hctx would go offline, there would be no CPU left which handles
> the IOs. To prevent IO stalls, prevent offlining housekeeping CPUs which
> are still severing isolated CPUs..
serving
>
> Signed-off-by: Daniel Wagner <wagi@kernel.org>
> ---
> block/blk-mq.c | 46 ++++++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 44 insertions(+), 2 deletions(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index c2697db591091200cdb9f6e082e472b829701e4c..aff17673b773583dfb2b01cb2f5f010c456bd834 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -3627,6 +3627,48 @@ static bool blk_mq_hctx_has_requests(struct blk_mq_hw_ctx *hctx)
> return data.has_rq;
> }
>
> +static bool blk_mq_hctx_check_isolcpus_online(struct blk_mq_hw_ctx *hctx, unsigned int cpu)
> +{
> + const struct cpumask *hk_mask;
> + int i;
> +
> + if (!housekeeping_enabled(HK_TYPE_IO_QUEUE))
> + return true;
> +
> + hk_mask = housekeeping_cpumask(HK_TYPE_IO_QUEUE);
> +
> + for (i = 0; i < hctx->nr_ctx; i++) {
> + struct blk_mq_ctx *ctx = hctx->ctxs[i];
> +
> + if (ctx->cpu == cpu)
> + continue;
> +
> + /*
> + * Check if this context has at least one online
> + * housekeeping CPU in this case the hardware context is
> + * usable.
> + */
> + if (cpumask_test_cpu(ctx->cpu, hk_mask) &&
> + cpu_online(ctx->cpu))
> + break;
> +
> + /*
> + * The context doesn't have any online housekeeping CPUs
> + * but there might be an online isolated CPU mapped to
> + * it.
> + */
> + if (cpu_is_offline(ctx->cpu))
> + continue;
> +
> + pr_warn("%s: trying to offline hctx%d but there is still an online isolcpu CPU %d mapped to it\n",
> + hctx->queue->disk->disk_name,
> + hctx->queue_num, ctx->cpu);
> + return true;
> + }
> +
> + return false;
> +}
> +
> static bool blk_mq_hctx_has_online_cpu(struct blk_mq_hw_ctx *hctx,
> unsigned int this_cpu)
> {
> @@ -3647,7 +3689,7 @@ static bool blk_mq_hctx_has_online_cpu(struct blk_mq_hw_ctx *hctx,
>
> /* this hctx has at least one online CPU */
> if (this_cpu != cpu)
> - return true;
> + return blk_mq_hctx_check_isolcpus_online(hctx, this_cpu);
> }
>
> return false;
> @@ -3659,7 +3701,7 @@ static int blk_mq_hctx_notify_offline(unsigned int cpu, struct hlist_node *node)
> struct blk_mq_hw_ctx, cpuhp_online);
>
> if (blk_mq_hctx_has_online_cpu(hctx, cpu))
> - return 0;
> + return -EINVAL;
>
> /*
> * Prevent new request from being allocated on the current hctx.
>
Otherwise:
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
* Re: [PATCH v6 6/9] isolation: introduce io_queue isolcpus type
2025-04-25 6:26 ` Hannes Reinecke
@ 2025-04-25 7:32 ` Daniel Wagner
2025-05-09 2:04 ` Ming Lei
0 siblings, 1 reply; 28+ messages in thread
From: Daniel Wagner @ 2025-04-25 7:32 UTC (permalink / raw)
To: Hannes Reinecke
Cc: Daniel Wagner, Jens Axboe, Keith Busch, Christoph Hellwig,
Sagi Grimberg, Michael S. Tsirkin, Martin K. Petersen,
Thomas Gleixner, Costa Shulyupin, Juri Lelli, Valentin Schneider,
Waiman Long, Ming Lei, Frederic Weisbecker, Mel Gorman,
Mathieu Desnoyers, linux-kernel, linux-block, linux-nvme,
megaraidlinux.pdl, linux-scsi, storagedev, virtualization,
GR-QLogic-Storage-Upstream
On Fri, Apr 25, 2025 at 08:26:22AM +0200, Hannes Reinecke wrote:
> On 4/24/25 20:19, Daniel Wagner wrote:
> > Multiqueue drivers spreading IO queues on all CPUs for optimal
> > performance. The drivers are not aware of the CPU isolated requirement
> > and will spread all queues ignoring the isolcpus configuration.
> >
> > Introduce a new isolcpus mask which allows the user to define on which
> > CPUs IO queues should be placed. This is similar to the managed_irq but
> > for drivers which do not use the managed IRQ infrastructure.
> >
> > Signed-off-by: Daniel Wagner <wagi@kernel.org>
> > ---
> > include/linux/sched/isolation.h | 1 +
> > kernel/sched/isolation.c | 7 +++++++
> > 2 files changed, 8 insertions(+)
> >
> Reviewed-by: Hannes Reinecke <hare@suse.de>
Just realized I forgot to also add some documentation for this new argument:
io_queue
Isolate from IO queue work caused by multiqueue
device drivers. Restrict the placement of
queues to housekeeping CPUs only, ensuring that
all IO work is processed by a housekeeping CPU.
Note: When an isolated CPU issues an IO, it is
forwarded to a housekeeping CPU. This will
trigger a software interrupt on the completion
path.
Note: It is not possible to offline housekeeping
CPUs that serve isolated CPUs.
* Re: [PATCH v6 1/9] lib/group_cpus: let group_cpu_evenly return number initialized masks
2025-04-24 18:19 ` [PATCH v6 1/9] lib/group_cpus: let group_cpu_evenly return number initialized masks Daniel Wagner
@ 2025-04-28 12:37 ` Thomas Gleixner
2025-05-09 1:29 ` Ming Lei
1 sibling, 0 replies; 28+ messages in thread
From: Thomas Gleixner @ 2025-04-28 12:37 UTC (permalink / raw)
To: Daniel Wagner, Jens Axboe, Keith Busch, Christoph Hellwig,
Sagi Grimberg, Michael S. Tsirkin
Cc: Martin K. Petersen, Costa Shulyupin, Juri Lelli,
Valentin Schneider, Waiman Long, Ming Lei, Frederic Weisbecker,
Mel Gorman, Hannes Reinecke, Mathieu Desnoyers, linux-kernel,
linux-block, linux-nvme, megaraidlinux.pdl, linux-scsi,
storagedev, virtualization, GR-QLogic-Storage-Upstream,
Daniel Wagner
On Thu, Apr 24 2025 at 20:19, Daniel Wagner wrote:
"let group_cpu_evenly return number initialized masks' is not a
sentence.
Let group_cpu_evenly() return the number of initialized masks
is actually parseable.
> group_cpu_evenly might allocated less groups then the requested:
group_cpu_evenly() might have .... then requested.
> group_cpu_evenly
> __group_cpus_evenly
> alloc_nodes_groups
> # allocated total groups may be less than numgrps when
> # active total CPU number is less then numgrps
>
> In this case, the caller will do an out of bound access because the
> caller assumes the masks returned has numgrps.
>
> Return the number of groups created so the caller can limit the access
> range accordingly.
>
> --- a/include/linux/group_cpus.h
> +++ b/include/linux/group_cpus.h
> @@ -9,6 +9,7 @@
> #include <linux/kernel.h>
> #include <linux/cpu.h>
>
> -struct cpumask *group_cpus_evenly(unsigned int numgrps);
> +struct cpumask *group_cpus_evenly(unsigned int numgrps,
> + unsigned int *nummasks);
One line
> #endif
> diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
> index 44a4eba80315cc098ecfa366ca1d88483641b12a..d2aefab5eb2b929877ced43f48b6268098484bd7 100644
> --- a/kernel/irq/affinity.c
> +++ b/kernel/irq/affinity.c
> @@ -70,20 +70,21 @@ irq_create_affinity_masks(unsigned int nvecs, struct irq_affinity *affd)
> */
> for (i = 0, usedvecs = 0; i < affd->nr_sets; i++) {
> unsigned int this_vecs = affd->set_size[i];
> + unsigned int nr_masks;
unsigned int nr_masks, this_vecs = ....
> int j;
As you touch the loop anyway, move this into the for ()
> - struct cpumask *result = group_cpus_evenly(this_vecs);
> + struct cpumask *result = group_cpus_evenly(this_vecs, &nr_masks);
>
> if (!result) {
> kfree(masks);
> return NULL;
> }
>
> - for (j = 0; j < this_vecs; j++)
for (int j = 0; ....)
> + for (j = 0; j < nr_masks; j++)
> cpumask_copy(&masks[curvec + j].mask, &result[j]);
> kfree(result);
>
> - curvec += this_vecs;
> - usedvecs += this_vecs;
> + curvec += nr_masks;
> + usedvecs += nr_masks;
> }
>
> /* Fill out vectors at the end that don't need affinity */
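Folding those three nits together, the loop might then read (just a sketch,
untested):

	for (i = 0, usedvecs = 0; i < affd->nr_sets; i++) {
		unsigned int nr_masks, this_vecs = affd->set_size[i];
		struct cpumask *result = group_cpus_evenly(this_vecs, &nr_masks);

		if (!result) {
			kfree(masks);
			return NULL;
		}

		for (int j = 0; j < nr_masks; j++)
			cpumask_copy(&masks[curvec + j].mask, &result[j]);
		kfree(result);

		curvec += nr_masks;
		usedvecs += nr_masks;
	}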
> diff --git a/lib/group_cpus.c b/lib/group_cpus.c
> index ee272c4cefcc13907ce9f211f479615d2e3c9154..016c6578a07616959470b47121459a16a1bc99e5 100644
> --- a/lib/group_cpus.c
> +++ b/lib/group_cpus.c
> @@ -332,9 +332,11 @@ static int __group_cpus_evenly(unsigned int startgrp, unsigned int numgrps,
> /**
> * group_cpus_evenly - Group all CPUs evenly per NUMA/CPU locality
> * @numgrps: number of groups
> + * @nummasks: number of initialized cpumasks
> *
> * Return: cpumask array if successful, NULL otherwise. And each element
> - * includes CPUs assigned to this group
> + * includes CPUs assigned to this group. nummasks contains the number
> + * of initialized masks which can be less than numgrps.
> *
> * Try to put close CPUs from viewpoint of CPU and NUMA locality into
> * same group, and run two-stage grouping:
> @@ -344,7 +346,8 @@ static int __group_cpus_evenly(unsigned int startgrp, unsigned int numgrps,
> * We guarantee in the resulted grouping that all CPUs are covered, and
> * no same CPU is assigned to multiple groups
> */
> -struct cpumask *group_cpus_evenly(unsigned int numgrps)
> +struct cpumask *group_cpus_evenly(unsigned int numgrps,
> + unsigned int *nummasks)
No line break required.
> {
> unsigned int curgrp = 0, nr_present = 0, nr_others = 0;
> cpumask_var_t *node_to_cpumask;
> @@ -421,10 +424,12 @@ struct cpumask *group_cpus_evenly(unsigned int numgrps)
> kfree(masks);
> return NULL;
> }
> + *nummasks = nr_present + nr_others;
> return masks;
> }
> #else /* CONFIG_SMP */
> -struct cpumask *group_cpus_evenly(unsigned int numgrps)
> +struct cpumask *group_cpus_evenly(unsigned int numgrps,
> + unsigned int *nummasks)
Ditto.
Other than that:
Acked-by: Thomas Gleixner <tglx@linutronix.de>
* Re: [PATCH v6 0/9] blk: honor isolcpus configuration
2025-04-24 18:19 [PATCH v6 0/9] blk: honor isolcpus configuration Daniel Wagner
` (8 preceding siblings ...)
2025-04-24 18:19 ` [PATCH v6 9/9] blk-mq: prevent offlining hk CPU with associated online isolated CPUs Daniel Wagner
@ 2025-05-06 3:17 ` Ming Lei
9 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2025-05-06 3:17 UTC (permalink / raw)
To: Daniel Wagner
Cc: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg,
Michael S. Tsirkin, Martin K. Petersen, Thomas Gleixner,
Costa Shulyupin, Juri Lelli, Valentin Schneider, Waiman Long,
Frederic Weisbecker, Mel Gorman, Hannes Reinecke,
Mathieu Desnoyers, linux-kernel, linux-block, linux-nvme,
megaraidlinux.pdl, linux-scsi, storagedev, virtualization,
GR-QLogic-Storage-Upstream
On Thu, Apr 24, 2025 at 08:19:39PM +0200, Daniel Wagner wrote:
> I've added back the isolcpus io_queue agrument. This avoids any semantic
> changes of managed_irq.
IMO, this is the correct thing to do.
> I don't like it but I haven't found a
> better way to deal with it. Ming clearly stated managed_irq should not
> change.
Precisely, we can't cause IO hangs and break existing managed_irq applications,
especially since there isn't a kernel solution for it; the same holds for v5, v6
or whatever.
I will look at v6 this week.
Thanks,
Ming
* Re: [PATCH v6 1/9] lib/group_cpus: let group_cpu_evenly return number initialized masks
2025-04-24 18:19 ` [PATCH v6 1/9] lib/group_cpus: let group_cpu_evenly return number initialized masks Daniel Wagner
2025-04-28 12:37 ` Thomas Gleixner
@ 2025-05-09 1:29 ` Ming Lei
1 sibling, 0 replies; 28+ messages in thread
From: Ming Lei @ 2025-05-09 1:29 UTC (permalink / raw)
To: Daniel Wagner
Cc: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg,
Michael S. Tsirkin, Martin K. Petersen, Thomas Gleixner,
Costa Shulyupin, Juri Lelli, Valentin Schneider, Waiman Long,
Frederic Weisbecker, Mel Gorman, Hannes Reinecke,
Mathieu Desnoyers, linux-kernel, linux-block, linux-nvme,
megaraidlinux.pdl, linux-scsi, storagedev, virtualization,
GR-QLogic-Storage-Upstream
On Thu, Apr 24, 2025 at 08:19:40PM +0200, Daniel Wagner wrote:
> group_cpu_evenly might allocated less groups then the requested:
>
> group_cpu_evenly
> __group_cpus_evenly
> alloc_nodes_groups
> # allocated total groups may be less than numgrps when
> # active total CPU number is less then numgrps
>
> In this case, the caller will do an out of bound access because the
> caller assumes the masks returned has numgrps.
>
> Return the number of groups created so the caller can limit the access
> range accordingly.
>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Signed-off-by: Daniel Wagner <wagi@kernel.org>
> ---
> block/blk-mq-cpumap.c | 6 +++---
> drivers/virtio/virtio_vdpa.c | 9 +++++----
> fs/fuse/virtio_fs.c | 6 +++---
> include/linux/group_cpus.h | 3 ++-
> kernel/irq/affinity.c | 9 +++++----
> lib/group_cpus.c | 12 +++++++++---
> 6 files changed, 27 insertions(+), 18 deletions(-)
>
> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
> index 444798c5374f48088b661b519f2638bda8556cf2..269161252add756897fce1b65cae5b2e6aebd647 100644
> --- a/block/blk-mq-cpumap.c
> +++ b/block/blk-mq-cpumap.c
> @@ -19,9 +19,9 @@
> void blk_mq_map_queues(struct blk_mq_queue_map *qmap)
> {
> const struct cpumask *masks;
> - unsigned int queue, cpu;
> + unsigned int queue, cpu, nr_masks;
>
> - masks = group_cpus_evenly(qmap->nr_queues);
> + masks = group_cpus_evenly(qmap->nr_queues, &nr_masks);
> if (!masks) {
> for_each_possible_cpu(cpu)
> qmap->mq_map[cpu] = qmap->queue_offset;
> @@ -29,7 +29,7 @@ void blk_mq_map_queues(struct blk_mq_queue_map *qmap)
> }
>
> for (queue = 0; queue < qmap->nr_queues; queue++) {
> - for_each_cpu(cpu, &masks[queue])
> + for_each_cpu(cpu, &masks[queue % nr_masks])
> qmap->mq_map[cpu] = qmap->queue_offset + queue;
> }
> kfree(masks);
> diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c
> index 1f60c9d5cb1810a6f208c24bb2ac640d537391a0..a7b297dae4890c9d6002744b90fc133bbedb7b44 100644
> --- a/drivers/virtio/virtio_vdpa.c
> +++ b/drivers/virtio/virtio_vdpa.c
> @@ -329,20 +329,21 @@ create_affinity_masks(unsigned int nvecs, struct irq_affinity *affd)
>
> for (i = 0, usedvecs = 0; i < affd->nr_sets; i++) {
> unsigned int this_vecs = affd->set_size[i];
> + unsigned int nr_masks;
> int j;
> - struct cpumask *result = group_cpus_evenly(this_vecs);
> + struct cpumask *result = group_cpus_evenly(this_vecs, &nr_masks);
>
> if (!result) {
> kfree(masks);
> return NULL;
> }
>
> - for (j = 0; j < this_vecs; j++)
> + for (j = 0; j < nr_masks; j++)
> cpumask_copy(&masks[curvec + j], &result[j]);
> kfree(result);
>
> - curvec += this_vecs;
> - usedvecs += this_vecs;
> + curvec += nr_masks;
> + usedvecs += nr_masks;
> }
>
> /* Fill out vectors at the end that don't need affinity */
> diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> index 2c7b24cb67adb2cb329ed545f56f04700aca8b81..7ed43b9ea4f3f8b108f1e0d7050c27267b9941c9 100644
> --- a/fs/fuse/virtio_fs.c
> +++ b/fs/fuse/virtio_fs.c
> @@ -862,7 +862,7 @@ static void virtio_fs_requests_done_work(struct work_struct *work)
> static void virtio_fs_map_queues(struct virtio_device *vdev, struct virtio_fs *fs)
> {
> const struct cpumask *mask, *masks;
> - unsigned int q, cpu;
> + unsigned int q, cpu, nr_masks;
>
> /* First attempt to map using existing transport layer affinities
> * e.g. PCIe MSI-X
> @@ -882,7 +882,7 @@ static void virtio_fs_map_queues(struct virtio_device *vdev, struct virtio_fs *f
> return;
> fallback:
> /* Attempt to map evenly in groups over the CPUs */
> - masks = group_cpus_evenly(fs->num_request_queues);
> + masks = group_cpus_evenly(fs->num_request_queues, &nr_masks);
> /* If even this fails we default to all CPUs use first request queue */
> if (!masks) {
> for_each_possible_cpu(cpu)
> @@ -891,7 +891,7 @@ static void virtio_fs_map_queues(struct virtio_device *vdev, struct virtio_fs *f
> }
>
> for (q = 0; q < fs->num_request_queues; q++) {
> - for_each_cpu(cpu, &masks[q])
> + for_each_cpu(cpu, &masks[q % nr_masks])
> fs->mq_map[cpu] = q + VQ_REQUEST;
> }
> kfree(masks);
> diff --git a/include/linux/group_cpus.h b/include/linux/group_cpus.h
> index e42807ec61f6e8cf3787af7daa0d8686edfef0a3..bd5dada6e8606fa6cf8f7babf939e39fd7475c8d 100644
> --- a/include/linux/group_cpus.h
> +++ b/include/linux/group_cpus.h
> @@ -9,6 +9,7 @@
> #include <linux/kernel.h>
> #include <linux/cpu.h>
>
> -struct cpumask *group_cpus_evenly(unsigned int numgrps);
> +struct cpumask *group_cpus_evenly(unsigned int numgrps,
> + unsigned int *nummasks);
>
> #endif
> diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
> index 44a4eba80315cc098ecfa366ca1d88483641b12a..d2aefab5eb2b929877ced43f48b6268098484bd7 100644
> --- a/kernel/irq/affinity.c
> +++ b/kernel/irq/affinity.c
> @@ -70,20 +70,21 @@ irq_create_affinity_masks(unsigned int nvecs, struct irq_affinity *affd)
> */
> for (i = 0, usedvecs = 0; i < affd->nr_sets; i++) {
> unsigned int this_vecs = affd->set_size[i];
> + unsigned int nr_masks;
> int j;
> - struct cpumask *result = group_cpus_evenly(this_vecs);
> + struct cpumask *result = group_cpus_evenly(this_vecs, &nr_masks);
>
> if (!result) {
> kfree(masks);
> return NULL;
> }
>
> - for (j = 0; j < this_vecs; j++)
> + for (j = 0; j < nr_masks; j++)
> cpumask_copy(&masks[curvec + j].mask, &result[j]);
> kfree(result);
>
> - curvec += this_vecs;
> - usedvecs += this_vecs;
> + curvec += nr_masks;
> + usedvecs += nr_masks;
> }
>
> /* Fill out vectors at the end that don't need affinity */
> diff --git a/lib/group_cpus.c b/lib/group_cpus.c
> index ee272c4cefcc13907ce9f211f479615d2e3c9154..016c6578a07616959470b47121459a16a1bc99e5 100644
> --- a/lib/group_cpus.c
> +++ b/lib/group_cpus.c
> @@ -332,9 +332,11 @@ static int __group_cpus_evenly(unsigned int startgrp, unsigned int numgrps,
> /**
> * group_cpus_evenly - Group all CPUs evenly per NUMA/CPU locality
> * @numgrps: number of groups
> + * @nummasks: number of initialized cpumasks
> *
> * Return: cpumask array if successful, NULL otherwise. And each element
> - * includes CPUs assigned to this group
> + * includes CPUs assigned to this group. nummasks contains the number
> + * of initialized masks which can be less than numgrps.
> *
> * Try to put close CPUs from viewpoint of CPU and NUMA locality into
> * same group, and run two-stage grouping:
> @@ -344,7 +346,8 @@ static int __group_cpus_evenly(unsigned int startgrp, unsigned int numgrps,
> * We guarantee in the resulted grouping that all CPUs are covered, and
> * no same CPU is assigned to multiple groups
> */
> -struct cpumask *group_cpus_evenly(unsigned int numgrps)
> +struct cpumask *group_cpus_evenly(unsigned int numgrps,
> + unsigned int *nummasks)
> {
> unsigned int curgrp = 0, nr_present = 0, nr_others = 0;
> cpumask_var_t *node_to_cpumask;
> @@ -421,10 +424,12 @@ struct cpumask *group_cpus_evenly(unsigned int numgrps)
> kfree(masks);
> return NULL;
> }
> + *nummasks = nr_present + nr_others;
WARN_ON(nr_present + nr_others < numgrps) can be removed now.
Other than that and with Thomas's comment addressed:
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Thanks,
Ming
* Re: [PATCH v6 2/9] blk-mq: add number of queue calc helper
2025-04-24 18:19 ` [PATCH v6 2/9] blk-mq: add number of queue calc helper Daniel Wagner
@ 2025-05-09 1:43 ` Ming Lei
0 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2025-05-09 1:43 UTC (permalink / raw)
To: Daniel Wagner
Cc: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg,
Michael S. Tsirkin, Martin K. Petersen, Thomas Gleixner,
Costa Shulyupin, Juri Lelli, Valentin Schneider, Waiman Long,
Frederic Weisbecker, Mel Gorman, Hannes Reinecke,
Mathieu Desnoyers, linux-kernel, linux-block, linux-nvme,
megaraidlinux.pdl, linux-scsi, storagedev, virtualization,
GR-QLogic-Storage-Upstream
On Thu, Apr 24, 2025 at 08:19:41PM +0200, Daniel Wagner wrote:
> Multiqueue devices should only allocate queues for the housekeeping CPUs
> when isolcpus=io_queue is set. This avoids that the isolated CPUs get
> disturbed with OS workload.
io_queue isn't introduced yet, so the commit log needs to be updated.
Otherwise, looks fine:
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Thanks,
Ming
* Re: [PATCH v6 3/9] nvme-pci: use block layer helpers to calculate num of queues
2025-04-24 18:19 ` [PATCH v6 3/9] nvme-pci: use block layer helpers to calculate num of queues Daniel Wagner
@ 2025-05-09 1:47 ` Ming Lei
2025-05-14 16:12 ` Daniel Wagner
0 siblings, 1 reply; 28+ messages in thread
From: Ming Lei @ 2025-05-09 1:47 UTC (permalink / raw)
To: Daniel Wagner
Cc: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg,
Michael S. Tsirkin, Martin K. Petersen, Thomas Gleixner,
Costa Shulyupin, Juri Lelli, Valentin Schneider, Waiman Long,
Frederic Weisbecker, Mel Gorman, Hannes Reinecke,
Mathieu Desnoyers, linux-kernel, linux-block, linux-nvme,
megaraidlinux.pdl, linux-scsi, storagedev, virtualization,
GR-QLogic-Storage-Upstream
On Thu, Apr 24, 2025 at 08:19:42PM +0200, Daniel Wagner wrote:
> Multiqueue devices should only allocate queues for the housekeeping CPUs
> when isolcpus=io_queue is set. This avoids that the isolated CPUs get
> disturbed with OS workload.
The commit log needs to be updated:
- io_queue isn't introduced yet
- this patch can only reduce nr_hw_queues, and queue mapping isn't changed
yet, so nothing to do with
"This avoids that the isolated CPUs get disturbed with OS workload"
Thanks,
Ming
* Re: [PATCH v6 4/9] scsi: use block layer helpers to calculate num of queues
2025-04-24 18:19 ` [PATCH v6 4/9] scsi: " Daniel Wagner
@ 2025-05-09 1:49 ` Ming Lei
0 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2025-05-09 1:49 UTC (permalink / raw)
To: Daniel Wagner
Cc: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg,
Michael S. Tsirkin, Martin K. Petersen, Thomas Gleixner,
Costa Shulyupin, Juri Lelli, Valentin Schneider, Waiman Long,
Frederic Weisbecker, Mel Gorman, Hannes Reinecke,
Mathieu Desnoyers, linux-kernel, linux-block, linux-nvme,
megaraidlinux.pdl, linux-scsi, storagedev, virtualization,
GR-QLogic-Storage-Upstream
On Thu, Apr 24, 2025 at 08:19:43PM +0200, Daniel Wagner wrote:
> Multiqueue devices should only allocate queues for the housekeeping CPUs
> when isolcpus=managed_irq is set.
> This avoids that the isolated CPUs get disturbed with OS workload.
The above words should be removed, since that isn't what the patch is
doing.
Otherwise, looks fine to me:
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Thanks,
Ming
* Re: [PATCH v6 5/9] virtio: blk/scsi: use block layer helpers to calculate num of queues
2025-04-24 18:19 ` [PATCH v6 5/9] virtio: blk/scsi: " Daniel Wagner
@ 2025-05-09 1:52 ` Ming Lei
0 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2025-05-09 1:52 UTC (permalink / raw)
To: Daniel Wagner
Cc: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg,
Michael S. Tsirkin, Martin K. Petersen, Thomas Gleixner,
Costa Shulyupin, Juri Lelli, Valentin Schneider, Waiman Long,
Frederic Weisbecker, Mel Gorman, Hannes Reinecke,
Mathieu Desnoyers, linux-kernel, linux-block, linux-nvme,
megaraidlinux.pdl, linux-scsi, storagedev, virtualization,
GR-QLogic-Storage-Upstream
On Thu, Apr 24, 2025 at 08:19:44PM +0200, Daniel Wagner wrote:
> Multiqueue devices should only allocate queues for the housekeeping CPUs
> when isolcpus=io_queue is set. This avoids that the isolated CPUs get
> disturbed with OS workload.
With commit log fixed:
Reviewed-by: Ming Lei <ming.lei@redhat.com>
thanks,
Ming
* Re: [PATCH v6 6/9] isolation: introduce io_queue isolcpus type
2025-04-25 7:32 ` Daniel Wagner
@ 2025-05-09 2:04 ` Ming Lei
2025-05-14 16:08 ` Daniel Wagner
0 siblings, 1 reply; 28+ messages in thread
From: Ming Lei @ 2025-05-09 2:04 UTC (permalink / raw)
To: Daniel Wagner
Cc: Hannes Reinecke, Daniel Wagner, Jens Axboe, Keith Busch,
Christoph Hellwig, Sagi Grimberg, Michael S. Tsirkin,
Martin K. Petersen, Thomas Gleixner, Costa Shulyupin, Juri Lelli,
Valentin Schneider, Waiman Long, Frederic Weisbecker, Mel Gorman,
Mathieu Desnoyers, linux-kernel, linux-block, linux-nvme,
megaraidlinux.pdl, linux-scsi, storagedev, virtualization,
GR-QLogic-Storage-Upstream
On Fri, Apr 25, 2025 at 09:32:16AM +0200, Daniel Wagner wrote:
> On Fri, Apr 25, 2025 at 08:26:22AM +0200, Hannes Reinecke wrote:
> > On 4/24/25 20:19, Daniel Wagner wrote:
> > > Multiqueue drivers spreading IO queues on all CPUs for optimal
> > > performance. The drivers are not aware of the CPU isolated requirement
> > > and will spread all queues ignoring the isolcpus configuration.
> > >
> > > Introduce a new isolcpus mask which allows the user to define on which
> > > CPUs IO queues should be placed. This is similar to the managed_irq but
> > > for drivers which do not use the managed IRQ infrastructure.
> > >
> > > Signed-off-by: Daniel Wagner <wagi@kernel.org>
> > > ---
> > > include/linux/sched/isolation.h | 1 +
> > > kernel/sched/isolation.c | 7 +++++++
> > > 2 files changed, 8 insertions(+)
> > >
> > Reviewed-by: Hannes Reinecke <hare@suse.de>
>
> Just realized I forgot to also add some documentation for this new argument:
>
> io_queue
> Isolate from IO queue work caused by multiqueue
> device drivers. Restrict the placement of
> queues to housekeeping CPUs only, ensuring that
> all IO work is processed by a housekeeping CPU.
>
> Note: When an isolated CPU issues an IO, it is
> forwarded to a housekeeping CPU. This will
> trigger a software interrupt on the completion
> path.
>
> Note: It is not possible to offline housekeeping
> CPUs that serve isolated CPUs.
This patch only adds the kernel parameter but does not apply it anywhere, so
the above words just confuse everyone. I'd suggest not exposing the kernel
command line argument & documentation until the whole mechanism is supported.
Especially since 'irqaffinity=0 isolcpus=io_queue' requires the application
to offline CPUs in order, which has to be documented:
https://lore.kernel.org/all/cc5e44dd-e1dc-4f24-88d9-ce45a8b0794f@flourine.local/
Thanks,
Ming
* Re: [PATCH v6 7/9] lib/group_cpus: honor housekeeping config when grouping CPUs
2025-04-24 18:19 ` [PATCH v6 7/9] lib/group_cpus: honor housekeeping config when grouping CPUs Daniel Wagner
@ 2025-05-09 2:22 ` Ming Lei
[not found] ` <cd1576ee-82a3-4899-b218-2e5c5334af6e@redhat.com>
1 sibling, 0 replies; 28+ messages in thread
From: Ming Lei @ 2025-05-09 2:22 UTC (permalink / raw)
To: Daniel Wagner
Cc: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg,
Michael S. Tsirkin, Martin K. Petersen, Thomas Gleixner,
Costa Shulyupin, Juri Lelli, Valentin Schneider, Waiman Long,
Frederic Weisbecker, Mel Gorman, Hannes Reinecke,
Mathieu Desnoyers, linux-kernel, linux-block, linux-nvme,
megaraidlinux.pdl, linux-scsi, storagedev, virtualization,
GR-QLogic-Storage-Upstream
On Thu, Apr 24, 2025 at 08:19:46PM +0200, Daniel Wagner wrote:
> group_cpus_evenly distributes all present CPUs into groups. This ignores
> the isolcpus configuration and assigns isolated CPUs into the groups.
>
> Make group_cpus_evenly aware of isolcpus configuration and use the
> housekeeping CPU mask as base for distributing the available CPUs into
> groups.
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
> Signed-off-by: Daniel Wagner <wagi@kernel.org>
> ---
> lib/group_cpus.c | 82 +++++++++++++++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 79 insertions(+), 3 deletions(-)
>
> diff --git a/lib/group_cpus.c b/lib/group_cpus.c
> index 016c6578a07616959470b47121459a16a1bc99e5..707997bca55344b18f63ccfa539ba77a89d8acb6 100644
> --- a/lib/group_cpus.c
> +++ b/lib/group_cpus.c
> @@ -8,6 +8,7 @@
> #include <linux/cpu.h>
> #include <linux/sort.h>
> #include <linux/group_cpus.h>
> +#include <linux/sched/isolation.h>
>
> #ifdef CONFIG_SMP
>
> @@ -330,7 +331,7 @@ static int __group_cpus_evenly(unsigned int startgrp, unsigned int numgrps,
> }
>
> /**
> - * group_cpus_evenly - Group all CPUs evenly per NUMA/CPU locality
> + * group_possible_cpus_evenly - Group all CPUs evenly per NUMA/CPU locality
> * @numgrps: number of groups
> * @nummasks: number of initialized cpumasks
> *
> @@ -346,8 +347,8 @@ static int __group_cpus_evenly(unsigned int startgrp, unsigned int numgrps,
> * We guarantee in the resulted grouping that all CPUs are covered, and
> * no same CPU is assigned to multiple groups
> */
> -struct cpumask *group_cpus_evenly(unsigned int numgrps,
> - unsigned int *nummasks)
> +static struct cpumask *group_possible_cpus_evenly(unsigned int numgrps,
> + unsigned int *nummasks)
> {
> unsigned int curgrp = 0, nr_present = 0, nr_others = 0;
> cpumask_var_t *node_to_cpumask;
> @@ -427,6 +428,81 @@ struct cpumask *group_cpus_evenly(unsigned int numgrps,
> *nummasks = nr_present + nr_others;
> return masks;
> }
> +
> +/**
> + * group_mask_cpus_evenly - Group all CPUs evenly per NUMA/CPU locality
> + * @numgrps: number of groups
> + * @cpu_mask: CPU to consider for the grouping
> + * @nummasks: number of initialized cpusmasks
> + *
> + * Return: cpumask array if successful, NULL otherwise. And each element
> + * includes CPUs assigned to this group.
> + *
> + * Try to put close CPUs from viewpoint of CPU and NUMA locality into
> + * same group. Allocate present CPUs on these groups evenly.
> + */
> +static struct cpumask *group_mask_cpus_evenly(unsigned int numgrps,
> + const struct cpumask *cpu_mask,
> + unsigned int *nummasks)
> +{
> + cpumask_var_t *node_to_cpumask;
> + cpumask_var_t nmsk;
> + int ret = -ENOMEM;
> + struct cpumask *masks = NULL;
> +
> + if (!zalloc_cpumask_var(&nmsk, GFP_KERNEL))
> + return NULL;
> +
> + node_to_cpumask = alloc_node_to_cpumask();
> + if (!node_to_cpumask)
> + goto fail_nmsk;
> +
> + masks = kcalloc(numgrps, sizeof(*masks), GFP_KERNEL);
> + if (!masks)
> + goto fail_node_to_cpumask;
> +
> + build_node_to_cpumask(node_to_cpumask);
> +
> + ret = __group_cpus_evenly(0, numgrps, node_to_cpumask, cpu_mask, nmsk,
> + masks);
> +
> +fail_node_to_cpumask:
> + free_node_to_cpumask(node_to_cpumask);
> +
> +fail_nmsk:
> + free_cpumask_var(nmsk);
> + if (ret < 0) {
> + kfree(masks);
> + return NULL;
> + }
> + *nummasks = ret;
> + return masks;
> +}
> +
> +/**
> + * group_cpus_evenly - Group all CPUs evenly per NUMA/CPU locality
> + * @numgrps: number of groups
> + * @nummasks: number of initialized cpusmasks
> + *
> + * Return: cpumask array if successful, NULL otherwise.
> + *
> + * group_possible_cpus_evently() is used for distributing the cpus on all
s/evently/evenly/
> + * possible cpus in absence of isolcpus command line argument.
s/isolcpus/isolcpus=io_queue
> + * group_mask_cpu_evenly() is used when the isolcpus command line
> + * argument is used with managed_irq option. In this case only the
s/managed_irq/io_queue
> + * housekeeping CPUs are considered.
I'd suggest highlighting the difference, which is a fundamental one: originally
all CPUs are covered, now only the housekeeping CPUs are distributed.
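Something along these lines for the kernel-doc would make that explicit (only a
wording suggestion):

	/**
	 * group_cpus_evenly - Group all CPUs evenly per NUMA/CPU locality
	 * @numgrps:  number of groups
	 * @nummasks: number of initialized cpumasks
	 *
	 * Without isolcpus=io_queue all possible CPUs are distributed over
	 * the groups (group_possible_cpus_evenly()).  With isolcpus=io_queue
	 * only the housekeeping CPUs are distributed (group_mask_cpus_evenly());
	 * isolated CPUs do not show up in any group.
	 */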
Otherwise, looks fine to me:
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Thanks,
Ming
* Re: [PATCH v6 8/9] blk-mq: use hk cpus only when isolcpus=io_queue is enabled
2025-04-24 18:19 ` [PATCH v6 8/9] blk-mq: use hk cpus only when isolcpus=io_queue is enabled Daniel Wagner
@ 2025-05-09 2:38 ` Ming Lei
2025-05-15 8:36 ` Daniel Wagner
0 siblings, 1 reply; 28+ messages in thread
From: Ming Lei @ 2025-05-09 2:38 UTC (permalink / raw)
To: Daniel Wagner
Cc: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg,
Michael S. Tsirkin, Martin K. Petersen, Thomas Gleixner,
Costa Shulyupin, Juri Lelli, Valentin Schneider, Waiman Long,
Frederic Weisbecker, Mel Gorman, Hannes Reinecke,
Mathieu Desnoyers, linux-kernel, linux-block, linux-nvme,
megaraidlinux.pdl, linux-scsi, storagedev, virtualization,
GR-QLogic-Storage-Upstream
On Thu, Apr 24, 2025 at 08:19:47PM +0200, Daniel Wagner wrote:
> When isolcpus=io_queue is enabled all hardware queues should run on
> the housekeeping CPUs only. Thus ignore the affinity mask provided by
> the driver. Also we can't use blk_mq_map_queues because it will map all
> CPUs to first hctx unless, the CPU is the same as the hctx has the
> affinity set to, e.g. 8 CPUs with isolcpus=io_queue,2-3,6-7 config
>
> queue mapping for /dev/nvme0n1
> hctx0: default 2 3 4 6 7
> hctx1: default 5
> hctx2: default 0
> hctx3: default 1
>
> PCI name is 00:05.0: nvme0n1
> irq 57 affinity 0-1 effective 1 is_managed:0 nvme0q0
> irq 58 affinity 4 effective 4 is_managed:1 nvme0q1
> irq 59 affinity 5 effective 5 is_managed:1 nvme0q2
> irq 60 affinity 0 effective 0 is_managed:1 nvme0q3
> irq 61 affinity 1 effective 1 is_managed:1 nvme0q4
>
> where as with blk_mq_hk_map_queues we get
>
> queue mapping for /dev/nvme0n1
> hctx0: default 2 4
> hctx1: default 3 5
> hctx2: default 0 6
> hctx3: default 1 7
>
> PCI name is 00:05.0: nvme0n1
> irq 56 affinity 0-1 effective 1 is_managed:0 nvme0q0
> irq 61 affinity 4 effective 4 is_managed:1 nvme0q1
> irq 62 affinity 5 effective 5 is_managed:1 nvme0q2
> irq 63 affinity 0 effective 0 is_managed:1 nvme0q3
> irq 64 affinity 1 effective 1 is_managed:1 nvme0q4
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Signed-off-by: Daniel Wagner <wagi@kernel.org>
> ---
> block/blk-mq-cpumap.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 67 insertions(+), 2 deletions(-)
>
> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
> index 6e6b3e989a5676186b5a31296a1b94b7602f1542..2d678d1db2b5196fc2b2ce5678fdb0cb6bad26e0 100644
> --- a/block/blk-mq-cpumap.c
> +++ b/block/blk-mq-cpumap.c
> @@ -22,8 +22,8 @@ static unsigned int blk_mq_num_queues(const struct cpumask *mask,
> {
> unsigned int num;
>
> - if (housekeeping_enabled(HK_TYPE_MANAGED_IRQ))
> - mask = housekeeping_cpumask(HK_TYPE_MANAGED_IRQ);
> + if (housekeeping_enabled(HK_TYPE_IO_QUEUE))
> + mask = housekeeping_cpumask(HK_TYPE_IO_QUEUE);
Here both can be considered for figuring out nr_hw_queues:
if (housekeeping_enabled(HK_TYPE_IO_QUEUE))
mask = housekeeping_cpumask(HK_TYPE_IO_QUEUE);
else if (housekeeping_enabled(HK_TYPE_MANAGED_IRQ))
mask = housekeeping_cpumask(HK_TYPE_MANAGED_IRQ);
>
> num = cpumask_weight(mask);
> return min_not_zero(num, max_queues);
> @@ -61,11 +61,73 @@ unsigned int blk_mq_num_online_queues(unsigned int max_queues)
> }
> EXPORT_SYMBOL_GPL(blk_mq_num_online_queues);
>
> +/*
> + * blk_mq_map_hk_queues - Create housekeeping CPU to hardware queue mapping
> + * @qmap: CPU to hardware queue map
> + *
> + * Create a housekeeping CPU to hardware queue mapping in @qmap. If the
> + * isolcpus feature is enabled and blk_mq_map_hk_queues returns true,
> + * @qmap contains a valid configuration honoring the io_queue
> + * configuration. If the isolcpus feature is disabled this function
> + * returns false.
> + */
> +static bool blk_mq_map_hk_queues(struct blk_mq_queue_map *qmap)
> +{
> + struct cpumask *hk_masks;
> + cpumask_var_t isol_mask;
> + unsigned int queue, cpu, nr_masks;
> +
> + if (!housekeeping_enabled(HK_TYPE_IO_QUEUE))
> + return false;
It could be more readable to move the above check to the caller.
Thanks,
Ming
* Re: [PATCH v6 9/9] blk-mq: prevent offlining hk CPU with associated online isolated CPUs
2025-04-24 18:19 ` [PATCH v6 9/9] blk-mq: prevent offlining hk CPU with associated online isolated CPUs Daniel Wagner
2025-04-25 6:28 ` Hannes Reinecke
@ 2025-05-09 2:54 ` Ming Lei
1 sibling, 0 replies; 28+ messages in thread
From: Ming Lei @ 2025-05-09 2:54 UTC (permalink / raw)
To: Daniel Wagner
Cc: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg,
Michael S. Tsirkin, Martin K. Petersen, Thomas Gleixner,
Costa Shulyupin, Juri Lelli, Valentin Schneider, Waiman Long,
Frederic Weisbecker, Mel Gorman, Hannes Reinecke,
Mathieu Desnoyers, linux-kernel, linux-block, linux-nvme,
megaraidlinux.pdl, linux-scsi, storagedev, virtualization,
GR-QLogic-Storage-Upstream
On Thu, Apr 24, 2025 at 08:19:48PM +0200, Daniel Wagner wrote:
> When isolcpus=io_queue is enabled, and the last housekeeping CPU for a
> given hctx would go offline, there would be no CPU left which handles
> the IOs. To prevent IO stalls, prevent offlining housekeeping CPUs which
> are still serving isolated CPUs.
>
> Signed-off-by: Daniel Wagner <wagi@kernel.org>
> ---
> block/blk-mq.c | 46 ++++++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 44 insertions(+), 2 deletions(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index c2697db591091200cdb9f6e082e472b829701e4c..aff17673b773583dfb2b01cb2f5f010c456bd834 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -3627,6 +3627,48 @@ static bool blk_mq_hctx_has_requests(struct blk_mq_hw_ctx *hctx)
> return data.has_rq;
> }
>
> +static bool blk_mq_hctx_check_isolcpus_online(struct blk_mq_hw_ctx *hctx, unsigned int cpu)
> +{
> + const struct cpumask *hk_mask;
> + int i;
> +
> + if (!housekeeping_enabled(HK_TYPE_IO_QUEUE))
> + return true;
> +
> + hk_mask = housekeeping_cpumask(HK_TYPE_IO_QUEUE);
> +
> + for (i = 0; i < hctx->nr_ctx; i++) {
> + struct blk_mq_ctx *ctx = hctx->ctxs[i];
> +
> + if (ctx->cpu == cpu)
> + continue;
> +
> + /*
> + * Check if this context has at least one online
> + * housekeeping CPU in this case the hardware context is
> + * usable.
> + */
> + if (cpumask_test_cpu(ctx->cpu, hk_mask) &&
> + cpu_online(ctx->cpu))
> + break;
> +
> + /*
> + * The context doesn't have any online housekeeping CPUs
> + * but there might be an online isolated CPU mapped to
> + * it.
> + */
> + if (cpu_is_offline(ctx->cpu))
> + continue;
> +
> + pr_warn("%s: trying to offline hctx%d but there is still an online isolcpu CPU %d mapped to it\n",
> + hctx->queue->disk->disk_name,
> + hctx->queue_num, ctx->cpu);
> + return true;
> + }
> +
> + return false;
> +}
> +
> static bool blk_mq_hctx_has_online_cpu(struct blk_mq_hw_ctx *hctx,
> unsigned int this_cpu)
> {
> @@ -3647,7 +3689,7 @@ static bool blk_mq_hctx_has_online_cpu(struct blk_mq_hw_ctx *hctx,
>
> /* this hctx has at least one online CPU */
> if (this_cpu != cpu)
> - return true;
> + return blk_mq_hctx_check_isolcpus_online(hctx, this_cpu);
> }
>
> return false;
> @@ -3659,7 +3701,7 @@ static int blk_mq_hctx_notify_offline(unsigned int cpu, struct hlist_node *node)
> struct blk_mq_hw_ctx, cpuhp_online);
>
> if (blk_mq_hctx_has_online_cpu(hctx, cpu))
> - return 0;
> + return -EINVAL;
Here the logic looks wrong; it is fine to return 0 immediately if there are
more online CPUs for this hctx.
It looks like you are trying to figure out whether this is the last online
housekeeping CPU while there are still online isolated CPUs in this hctx;
it could be more readable as:
if (housekeeping_enabled(HK_TYPE_IO_QUEUE)) {
if (!can_offline_this_hk_cpu(cpu))
return -EINVAL;
} else {
if (blk_mq_hctx_has_online_cpu(hctx, cpu))
return 0;
}
Another thing is that this way breaks CPU offlining; you need to document
the behavior for 'isolcpus=io_queue' in
Documentation/admin-guide/kernel-parameters.rst. Otherwise, people may
complain that it is a bug.
Thanks,
Ming
* Re: [PATCH v6 6/9] isolation: introduce io_queue isolcpus type
2025-05-09 2:04 ` Ming Lei
@ 2025-05-14 16:08 ` Daniel Wagner
0 siblings, 0 replies; 28+ messages in thread
From: Daniel Wagner @ 2025-05-14 16:08 UTC (permalink / raw)
To: Ming Lei
Cc: Hannes Reinecke, Daniel Wagner, Jens Axboe, Keith Busch,
Christoph Hellwig, Sagi Grimberg, Michael S. Tsirkin,
Martin K. Petersen, Thomas Gleixner, Costa Shulyupin, Juri Lelli,
Valentin Schneider, Waiman Long, Frederic Weisbecker, Mel Gorman,
Mathieu Desnoyers, linux-kernel, linux-block, linux-nvme,
megaraidlinux.pdl, linux-scsi, storagedev, virtualization,
GR-QLogic-Storage-Upstream
On Fri, May 09, 2025 at 10:04:18AM +0800, Ming Lei wrote:
> > io_queue
> > Isolate from IO queue work caused by multiqueue
> > device drivers. Restrict the placement of
> > queues to housekeeping CPUs only, ensuring that
> > all IO work is processed by a housekeeping CPU.
> >
> > Note: When an isolated CPU issues an IO, it is
> > forwarded to a housekeeping CPU. This will
> > trigger a software interrupt on the completion
> > path.
> >
> > Note: It is not possible to offline housekeeping
> > CPUs that serve isolated CPUs.
>
> This patch adds kernel parameter only, but not apply it at all, the above
> words just confuses everyone, so I'd suggest to not expose the kernel
> command line & document until the whole mechanism is supported.
I'll add this doc update as the last patch.
> Especially 'irqaffinity=0 isolcpus=io_queue' requires the application
> to offline CPU in order, which has to be documented:
>
> https://lore.kernel.org/all/cc5e44dd-e1dc-4f24-88d9-ce45a8b0794f@flourine.local/
Okay, so you want me to extend the second note above in this case. I'll
give it a go.
* Re: [PATCH v6 3/9] nvme-pci: use block layer helpers to calculate num of queues
2025-05-09 1:47 ` Ming Lei
@ 2025-05-14 16:12 ` Daniel Wagner
0 siblings, 0 replies; 28+ messages in thread
From: Daniel Wagner @ 2025-05-14 16:12 UTC (permalink / raw)
To: Ming Lei
Cc: Daniel Wagner, Jens Axboe, Keith Busch, Christoph Hellwig,
Sagi Grimberg, Michael S. Tsirkin, Martin K. Petersen,
Thomas Gleixner, Costa Shulyupin, Juri Lelli, Valentin Schneider,
Waiman Long, Frederic Weisbecker, Mel Gorman, Hannes Reinecke,
Mathieu Desnoyers, linux-kernel, linux-block, linux-nvme,
megaraidlinux.pdl, linux-scsi, storagedev, virtualization,
GR-QLogic-Storage-Upstream
On Fri, May 09, 2025 at 09:47:31AM +0800, Ming Lei wrote:
> On Thu, Apr 24, 2025 at 08:19:42PM +0200, Daniel Wagner wrote:
> > Multiqueue devices should only allocate queues for the housekeeping CPUs
> > when isolcpus=io_queue is set. This avoids that the isolated CPUs get
> > disturbed with OS workload.
>
> The commit log needs to be updated:
>
> - io_queue isn't introduced yet
>
> - this patch can only reduce nr_hw_queues, and queue mapping isn't changed
> yet, so nothing to do with
>
> "This avoids that the isolated CPUs get disturbed with OS workload"
What about:
The calculation of the upper limit for queues does not depend solely on
the number of possible CPUs; for example, the isolcpus kernel
command-line option must also be considered.
To account for this, the block layer provides a helper function to
retrieve the maximum number of queues. Use it to set an appropriate
upper queue number limit.
I would use this version for the other patches as well,
s/possible/online/ where necessary.
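For reference, on the driver side that wording corresponds to something like
the following (a sketch only: blk_mq_num_possible_queues() is assumed to be the
possible-CPU variant of the helper added in patch 2, and hw_max_queues stands
in for whatever hardware limit the driver already knows):

	/*
	 * Cap the number of I/O queues at what the block layer allows.
	 * The helper already accounts for isolcpus, so the driver no
	 * longer needs to look at num_possible_cpus() itself; the smaller
	 * of the two limits wins (min_not_zero() inside the helper).
	 */
	unsigned int nr_io_queues = blk_mq_num_possible_queues(hw_max_queues);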
* Re: [PATCH v6 7/9] lib/group_cpus: honor housekeeping config when grouping CPUs
[not found] ` <cd1576ee-82a3-4899-b218-2e5c5334af6e@redhat.com>
@ 2025-05-14 17:49 ` Daniel Wagner
0 siblings, 0 replies; 28+ messages in thread
From: Daniel Wagner @ 2025-05-14 17:49 UTC (permalink / raw)
To: Waiman Long
Cc: Daniel Wagner, Jens Axboe, Keith Busch, Christoph Hellwig,
Sagi Grimberg, Michael S. Tsirkin, Martin K. Petersen,
Thomas Gleixner, Costa Shulyupin, Juri Lelli, Valentin Schneider,
Ming Lei, Frederic Weisbecker, Mel Gorman, Hannes Reinecke,
Mathieu Desnoyers, linux-kernel, linux-block, linux-nvme,
megaraidlinux.pdl, linux-scsi, storagedev, virtualization,
GR-QLogic-Storage-Upstream
On Tue, May 06, 2025 at 02:18:32PM -0400, Waiman Long wrote:
> > +static struct cpumask *group_mask_cpus_evenly(unsigned int numgrps,
> > + const struct cpumask *cpu_mask,
> > + unsigned int *nummasks)
> > +struct cpumask *group_cpus_evenly(unsigned int numgrps,
> > + unsigned int *nummasks)
> > +{
> > + if (housekeeping_enabled(HK_TYPE_IO_QUEUE)) {
> > + return group_mask_cpus_evenly(numgrps,
> > + housekeeping_cpumask(HK_TYPE_IO_QUEUE),
> > + nummasks);
> > + }
> > +
> > + return group_possible_cpus_evenly(numgrps, nummasks);
> > +}
>
> The group_cpus_evenly() isn't just used by block I/O. So you can't make it
> check only HK_TYPE_IO_QUEUE here. I will suggest to make it a bit more
> general and add helper function to specify the isolated cpumask the caller
> want to skip.
Okay, in this case I'd make group_mask_cpus_evenly() a public interface
and drop the housekeeping bits in group_cpus_evenly().
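Roughly along these lines for the header, i.e. declare the masked variant next
to the existing one (a sketch, not the final interface):

	/* include/linux/group_cpus.h -- sketch of the reworked interface */
	struct cpumask *group_cpus_evenly(unsigned int numgrps, unsigned int *nummasks);
	struct cpumask *group_mask_cpus_evenly(unsigned int numgrps,
					       const struct cpumask *cpu_mask,
					       unsigned int *nummasks);

A caller like blk-mq can then pass the housekeeping mask itself instead of
group_cpus_evenly() peeking at HK_TYPE_IO_QUEUE.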
* Re: [PATCH v6 8/9] blk-mq: use hk cpus only when isolcpus=io_queue is enabled
2025-05-09 2:38 ` Ming Lei
@ 2025-05-15 8:36 ` Daniel Wagner
0 siblings, 0 replies; 28+ messages in thread
From: Daniel Wagner @ 2025-05-15 8:36 UTC (permalink / raw)
To: Ming Lei
Cc: Daniel Wagner, Jens Axboe, Keith Busch, Christoph Hellwig,
Sagi Grimberg, Michael S. Tsirkin, Martin K. Petersen,
Thomas Gleixner, Costa Shulyupin, Juri Lelli, Valentin Schneider,
Waiman Long, Frederic Weisbecker, Mel Gorman, Hannes Reinecke,
Mathieu Desnoyers, linux-kernel, linux-block, linux-nvme,
megaraidlinux.pdl, linux-scsi, storagedev, virtualization,
GR-QLogic-Storage-Upstream
On Fri, May 09, 2025 at 10:38:32AM +0800, Ming Lei wrote:
> > +static bool blk_mq_map_hk_queues(struct blk_mq_queue_map *qmap)
> > +{
> > + struct cpumask *hk_masks;
> > + cpumask_var_t isol_mask;
> > + unsigned int queue, cpu, nr_masks;
> > +
> > + if (!housekeeping_enabled(HK_TYPE_IO_QUEUE))
> > + return false;
>
> It could be more readable to move the above check to the caller.
I wanted to avoid checking if housekeeping is enabled twice in a row.
I'll post the next version with your suggestion and see if this approach
is better.
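That is, something like this on the caller side (sketch; the trailing comment
stands in for the existing mapping code):

	void blk_mq_map_queues(struct blk_mq_queue_map *qmap)
	{
		/*
		 * Only consult the housekeeping mapping when io_queue
		 * isolation is active; blk_mq_map_hk_queues() itself then
		 * no longer needs to check housekeeping_enabled().
		 */
		if (housekeeping_enabled(HK_TYPE_IO_QUEUE) &&
		    blk_mq_map_hk_queues(qmap))
			return;

		/* existing group_cpus_evenly() based mapping goes here */
	}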
Thread overview: 28+ messages
2025-04-24 18:19 [PATCH v6 0/9] blk: honor isolcpus configuration Daniel Wagner
2025-04-24 18:19 ` [PATCH v6 1/9] lib/group_cpus: let group_cpu_evenly return number initialized masks Daniel Wagner
2025-04-28 12:37 ` Thomas Gleixner
2025-05-09 1:29 ` Ming Lei
2025-04-24 18:19 ` [PATCH v6 2/9] blk-mq: add number of queue calc helper Daniel Wagner
2025-05-09 1:43 ` Ming Lei
2025-04-24 18:19 ` [PATCH v6 3/9] nvme-pci: use block layer helpers to calculate num of queues Daniel Wagner
2025-05-09 1:47 ` Ming Lei
2025-05-14 16:12 ` Daniel Wagner
2025-04-24 18:19 ` [PATCH v6 4/9] scsi: " Daniel Wagner
2025-05-09 1:49 ` Ming Lei
2025-04-24 18:19 ` [PATCH v6 5/9] virtio: blk/scsi: " Daniel Wagner
2025-05-09 1:52 ` Ming Lei
2025-04-24 18:19 ` [PATCH v6 6/9] isolation: introduce io_queue isolcpus type Daniel Wagner
2025-04-25 6:26 ` Hannes Reinecke
2025-04-25 7:32 ` Daniel Wagner
2025-05-09 2:04 ` Ming Lei
2025-05-14 16:08 ` Daniel Wagner
2025-04-24 18:19 ` [PATCH v6 7/9] lib/group_cpus: honor housekeeping config when grouping CPUs Daniel Wagner
2025-05-09 2:22 ` Ming Lei
[not found] ` <cd1576ee-82a3-4899-b218-2e5c5334af6e@redhat.com>
2025-05-14 17:49 ` Daniel Wagner
2025-04-24 18:19 ` [PATCH v6 8/9] blk-mq: use hk cpus only when isolcpus=io_queue is enabled Daniel Wagner
2025-05-09 2:38 ` Ming Lei
2025-05-15 8:36 ` Daniel Wagner
2025-04-24 18:19 ` [PATCH v6 9/9] blk-mq: prevent offlining hk CPU with associated online isolated CPUs Daniel Wagner
2025-04-25 6:28 ` Hannes Reinecke
2025-05-09 2:54 ` Ming Lei
2025-05-06 3:17 ` [PATCH v6 0/9] blk: honor isolcpus configuration Ming Lei