All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v6 0/9] blk: honor isolcpus configuration
@ 2025-04-24 18:19 Daniel Wagner
  2025-04-24 18:19 ` [PATCH v6 1/9] lib/group_cpus: let group_cpu_evenly return number initialized masks Daniel Wagner
                   ` (9 more replies)
  0 siblings, 10 replies; 28+ messages in thread
From: Daniel Wagner @ 2025-04-24 18:19 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg,
	Michael S. Tsirkin
  Cc: Martin K. Petersen, Thomas Gleixner, Costa Shulyupin, Juri Lelli,
	Valentin Schneider, Waiman Long, Ming Lei, Frederic Weisbecker,
	Mel Gorman, Hannes Reinecke, Mathieu Desnoyers, linux-kernel,
	linux-block, linux-nvme, megaraidlinux.pdl, linux-scsi,
	storagedev, virtualization, GR-QLogic-Storage-Upstream,
	Daniel Wagner

I've added back the isolcpus io_queue agrument. This avoids any semantic
changes of managed_irq. I don't like it but I haven't found a
better way to deal with it. Ming clearly stated managed_irq should not
change.

Another change is to prevent offlining a housekeeping CPU which is still
serving an isolated CPU instead just warning. Seem way saner way to
handle this situation. Thanks Mathieu!

Here details what's the difference is between managed_irq and io_queue.

* nr cpus <= nr hardware queue

(e.g. 8 CPUs, 8 hardware queues)

managed_irq is working nicely for situation where there hardware has at
least as many hardware queues as CPUs, e.g. enterprise nvme-pci devices.

managed_irq will assign each CPU its own hardware queue and ensures that
non unbound IO is scheduled to a isolated CPU. As long the isolated CPU
is not issuing any IO there will be no block layer 'noise' on the
isolated CPU.

  - irqaffinity=0 isolcpus=managed_ird,2-3,6-7

	queue mapping for /dev/nvme0n1
	        hctx0: default 0
	        hctx1: default 1
	        hctx2: default 2
	        hctx3: default 3
	        hctx4: default 4
	        hctx5: default 5
	        hctx6: default 6
	        hctx7: default 7

	IRQ mapping for nvme0n1
	        irq 40 affinity 0 effective 0  nvme0q0
	        irq 41 affinity 0 effective 0  nvme0q1
	        irq 42 affinity 1 effective 1  nvme0q2
	        irq 43 affinity 2 effective 2  nvme0q3
	        irq 44 affinity 3 effective 3  nvme0q4
	        irq 45 affinity 4 effective 4  nvme0q5
	        irq 46 affinity 5 effective 5  nvme0q6
	        irq 47 affinity 6 effective 6  nvme0q7
	        irq 48 affinity 7 effective 7  nvme0q8

With this configuration io_queue will create four hctx for the four
housekeeping CPUs:

  - irqaffinity=0 isolcpus=io_queue,2-3,6-7

	queue mapping for /dev/nvme0n1
	        hctx0: default 0 2
	        hctx1: default 1 3
	        hctx2: default 4 6
	        hctx3: default 5 7

	IRQ mapping for /dev/nvme0n1
	        irq 36 affinity 0 effective 0  nvme0q0
	        irq 37 affinity 0 effective 0  nvme0q1
	        irq 38 affinity 1 effective 1  nvme0q2
	        irq 39 affinity 4 effective 4  nvme0q3
	        irq 40 affinity 5 effective 5  nvme0q4

* nr cpus > nr hardware queue

(e.g. 8 CPUs, 2 hardware queues)

managed_irq is creating two hctx and all CPUs could handle IRQs. In this
case an isolated CPU is selected to handle all IRQs for a given hctx:

  - irqaffinity=0 isolcpus=managed_ird,2-3,6-7

	queue mapping for /dev/nvme0n1
	        hctx0: default 0 1 2 3
	        hctx1: default 4 5 6 7

	IRQ mapping for /dev/nvme0n1
	        irq 40 affinity 0 effective 0  nvme0q0
	        irq 41 affinity 0-3 effective 3  nvme0q1
	        irq 42 affinity 4-7 effective 7  nvme0q2

io_queue also creates also two hctxs but only assigns housekeeping CPUs
to handle the IRQs:

  - irqaffinity=0 isolcpus=io_queue,2-3,6-7

	queue mapping for /dev/nvme0n1
	        hctx0: default 0 1 2 6
	        hctx1: default 3 4 5 7

	IRQ mapping for /dev/nvme0n1
	        irq 36 affinity 0 effective 0  nvme0q0
	        irq 37 affinity 0-1 effective 1  nvme0q1
	        irq 38 affinity 4-5 effective 5  nvme0q2

The case that there are less hardware queues than CPUs is more common
with the SCSI HBAs so with the io_queue approach not just nvme-pci are
supported.

Something completely different: we got several bug reports for kdump and
SCSI HBAs. The issue is that the SCSI drivers are allocating too many
resources when it's a kdump kernel. This series will fix this as well,
because the number of queues will be limitted by
blk_mq_num_possible_queues() instead of num_possible_cpus(). This will
avoid sprinkling is_kdump_kernel() around.

Signed-off-by: Daniel Wagner <wagi@kernel.org>
---
Changes in v6:
- added io_queue isolcpus type back
- prevent offlining hk cpu if a isol cpu is still present isntead just warning
- Link to v5: https://lore.kernel.org/r/20250110-isolcpus-io-queues-v5-0-0e4f118680b0@kernel.org

Changes in v5:
- rebased on latest for-6.14/block
- udpated documetation on managed_irq
- updated commit message "blk-mq: issue warning when offlining hctx with online isolcpus"
- split input/output parameter in "lib/group_cpus: let group_cpu_evenly return number of groups"
- dropped "sched/isolation: document HK_TYPE housekeeping option"
- Link to v4: https://lore.kernel.org/r/20241217-isolcpus-io-queues-v4-0-5d355fbb1e14@kernel.org

Changes in v4:
- added "blk-mq: issue warning when offlining hctx with online isolcpus"
- fixed check in cgroup_cpus_evenly, the if condition needs to use
  housekeeping_enabled() and not cpusmask_weight(housekeeping_masks),
  because the later will always return a valid mask.
- dropped fixed tag from "lib/group_cpus.c: honor housekeeping config when
  grouping CPUs"
- fixed overlong line "scsi: use block layer helpers to calculate num
  of queues"
- dropped "sched/isolation: Add io_queue housekeeping option",
  just document the housekeep enum hk_type
- added "lib/group_cpus: let group_cpu_evenly return number of groups"
- collected tags
- splitted series into a preperation series:
  https://lore.kernel.org/linux-nvme/20241202-refactor-blk-affinity-helpers-v6-0-27211e9c2cd5@kernel.org/
- Link to v3: https://lore.kernel.org/r/20240806-isolcpus-io-queues-v3-0-da0eecfeaf8b@suse.de

Changes in v3:
- lifted a couple of patches from
  https://lore.kernel.org/all/20210709081005.421340-1-ming.lei@redhat.com/
  "virito: add APIs for retrieving vq affinity"
  "blk-mq: introduce blk_mq_dev_map_queues"
- replaces all users of blk_mq_[pci|virtio]_map_queues with
  blk_mq_dev_map_queues
- updated/extended number of queue calc helpers
- add isolcpus=io_queue CPU-hctx mapping function
- documented enum hk_type and isolcpus=io_queue
- added "scsi: pm8001: do not overwrite PCI queue mapping"
- Link to v2: https://lore.kernel.org/r/20240627-isolcpus-io-queues-v2-0-26a32e3c4f75@suse.de

Changes in v2:
- updated documentation
- splitted blk/nvme-pci patch
- dropped HK_TYPE_IO_QUEUE, use HK_TYPE_MANAGED_IRQ
- Link to v1: https://lore.kernel.org/r/20240621-isolcpus-io-queues-v1-0-8b169bf41083@suse.de

---
Daniel Wagner (9):
      lib/group_cpus: let group_cpu_evenly return number initialized masks
      blk-mq: add number of queue calc helper
      nvme-pci: use block layer helpers to calculate num of queues
      scsi: use block layer helpers to calculate num of queues
      virtio: blk/scsi: use block layer helpers to calculate num of queues
      isolation: introduce io_queue isolcpus type
      lib/group_cpus: honor housekeeping config when grouping CPUs
      blk-mq: use hk cpus only when isolcpus=io_queue is enabled
      blk-mq: prevent offlining hk CPU with associated online isolated CPUs

 block/blk-mq-cpumap.c                     | 116 +++++++++++++++++++++++++++++-
 block/blk-mq.c                            |  46 +++++++++++-
 drivers/block/virtio_blk.c                |   5 +-
 drivers/nvme/host/pci.c                   |   5 +-
 drivers/scsi/megaraid/megaraid_sas_base.c |  15 ++--
 drivers/scsi/qla2xxx/qla_isr.c            |  10 +--
 drivers/scsi/smartpqi/smartpqi_init.c     |   5 +-
 drivers/scsi/virtio_scsi.c                |   1 +
 drivers/virtio/virtio_vdpa.c              |   9 +--
 fs/fuse/virtio_fs.c                       |   6 +-
 include/linux/blk-mq.h                    |   2 +
 include/linux/group_cpus.h                |   3 +-
 include/linux/sched/isolation.h           |   1 +
 kernel/irq/affinity.c                     |   9 +--
 kernel/sched/isolation.c                  |   7 ++
 lib/group_cpus.c                          |  90 +++++++++++++++++++++--
 16 files changed, 290 insertions(+), 40 deletions(-)
---
base-commit: 3b607b75a345b1d808031bf1bb1038e4dac8d521
change-id: 20240620-isolcpus-io-queues-1a88eb47ff8b

Best regards,
-- 
Daniel Wagner <wagi@kernel.org>


^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2025-05-15  8:36 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-24 18:19 [PATCH v6 0/9] blk: honor isolcpus configuration Daniel Wagner
2025-04-24 18:19 ` [PATCH v6 1/9] lib/group_cpus: let group_cpu_evenly return number initialized masks Daniel Wagner
2025-04-28 12:37   ` Thomas Gleixner
2025-05-09  1:29   ` Ming Lei
2025-04-24 18:19 ` [PATCH v6 2/9] blk-mq: add number of queue calc helper Daniel Wagner
2025-05-09  1:43   ` Ming Lei
2025-04-24 18:19 ` [PATCH v6 3/9] nvme-pci: use block layer helpers to calculate num of queues Daniel Wagner
2025-05-09  1:47   ` Ming Lei
2025-05-14 16:12     ` Daniel Wagner
2025-04-24 18:19 ` [PATCH v6 4/9] scsi: " Daniel Wagner
2025-05-09  1:49   ` Ming Lei
2025-04-24 18:19 ` [PATCH v6 5/9] virtio: blk/scsi: " Daniel Wagner
2025-05-09  1:52   ` Ming Lei
2025-04-24 18:19 ` [PATCH v6 6/9] isolation: introduce io_queue isolcpus type Daniel Wagner
2025-04-25  6:26   ` Hannes Reinecke
2025-04-25  7:32     ` Daniel Wagner
2025-05-09  2:04       ` Ming Lei
2025-05-14 16:08         ` Daniel Wagner
2025-04-24 18:19 ` [PATCH v6 7/9] lib/group_cpus: honor housekeeping config when grouping CPUs Daniel Wagner
2025-05-09  2:22   ` Ming Lei
     [not found]   ` <cd1576ee-82a3-4899-b218-2e5c5334af6e@redhat.com>
2025-05-14 17:49     ` Daniel Wagner
2025-04-24 18:19 ` [PATCH v6 8/9] blk-mq: use hk cpus only when isolcpus=io_queue is enabled Daniel Wagner
2025-05-09  2:38   ` Ming Lei
2025-05-15  8:36     ` Daniel Wagner
2025-04-24 18:19 ` [PATCH v6 9/9] blk-mq: prevent offlining hk CPU with associated online isolated CPUs Daniel Wagner
2025-04-25  6:28   ` Hannes Reinecke
2025-05-09  2:54   ` Ming Lei
2025-05-06  3:17 ` [PATCH v6 0/9] blk: honor isolcpus configuration Ming Lei

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.