From: Keith Busch <kbusch@kernel.org>
To: Aaron Tomlin <atomlin@atomlin.com>
Cc: axboe@kernel.dk, hch@lst.de, sagi@grimberg.me, mst@redhat.com,
	aacraid@microsemi.com, James.Bottomley@hansenpartnership.com,
	martin.petersen@oracle.com, liyihang9@h-partners.com,
	kashyap.desai@broadcom.com, sumit.saxena@broadcom.com,
	shivasharan.srikanteshwara@broadcom.com,
	chandrakanth.patil@broadcom.com, sathya.prakash@broadcom.com,
	sreekanth.reddy@broadcom.com,
	suganath-prabu.subramani@broadcom.com, ranjan.kumar@broadcom.com,
	jinpu.wang@cloud.ionos.com, tglx@kernel.org, mingo@redhat.com,
	peterz@infradead.org, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, akpm@linux-foundation.org,
	maz@kernel.org, ruanjinjie@huawei.com, bigeasy@linutronix.de,
	yphbchou0911@gmail.com, wagi@kernel.org, frederic@kernel.org,
	longman@redhat.com, chenridong@huawei.com, hare@suse.de,
	kch@nvidia.com, ming.lei@redhat.com, steve@abita.co,
	sean@ashe.io, chjohnst@gmail.com, neelx@suse.com,
	mproche@gmail.com, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org, virtualization@lists.linux.dev,
	linux-nvme@lists.infradead.org, linux-scsi@vger.kernel.org,
	megaraidlinux.pdl@broadcom.com, mpi3mr-linuxdrv.pdl@broadcom.com,
	MPT-FusionLinux.pdl@broadcom.com
Subject: Re: [PATCH v9 10/13] blk-mq: use hk cpus only when isolcpus=io_queue is enabled
Date: Tue, 31 Mar 2026 17:05:07 -0600
Message-ID: <acxTI2CJlq3npMVa@kbusch-mbp>
In-Reply-To: <20260330221047.630206-11-atomlin@atomlin.com>

On Mon, Mar 30, 2026 at 06:10:44PM -0400, Aaron Tomlin wrote:
> +static bool blk_mq_validate(struct blk_mq_queue_map *qmap,
> +			    const struct cpumask *active_hctx)
> +{
> +	/*
> +	 * Verify if the mapping is usable when housekeeping
> +	 * configuration is enabled
> +	 */
> +
> +	for (int queue = 0; queue < qmap->nr_queues; queue++) {
> +		int cpu;
> +
> +		if (cpumask_test_cpu(queue, active_hctx)) {
> +			/*
> +			 * This htcx has at least one online CPU thus it

Typo, should say "hctx".

> +			 * is able to serve any assigned isolated CPU.
> +			 */
> +			continue;
> +		}
> +
> +		/*
> +		 * There is no housekeeping online CPU for this hctx, all
> +		 * good as long as all non houskeeping CPUs are also

Typo, "housekeeping".

...

>  void blk_mq_map_queues(struct blk_mq_queue_map *qmap)
>  {
> -	const struct cpumask *masks;
> +	struct cpumask *masks __free(kfree) = NULL;
> +	const struct cpumask *constraint;
>  	unsigned int queue, cpu, nr_masks;
> +	cpumask_var_t active_hctx;
>  
> -	masks = group_cpus_evenly(qmap->nr_queues, &nr_masks);
> -	if (!masks) {
> -		for_each_possible_cpu(cpu)
> -			qmap->mq_map[cpu] = qmap->queue_offset;
> -		return;
> -	}
> +	if (!zalloc_cpumask_var(&active_hctx, GFP_KERNEL))
> +		goto fallback;
> +
> +	if (housekeeping_enabled(HK_TYPE_IO_QUEUE))
> +		constraint = housekeeping_cpumask(HK_TYPE_IO_QUEUE);
> +	else
> +		constraint = cpu_possible_mask;
> +
> +	/* Map CPUs to the hardware contexts (hctx) */
> +	masks = group_mask_cpus_evenly(qmap->nr_queues, constraint, &nr_masks);
> +	if (!masks)
> +		goto free_fallback;
>  
>  	for (queue = 0; queue < qmap->nr_queues; queue++) {
> -		for_each_cpu(cpu, &masks[queue % nr_masks])
> -			qmap->mq_map[cpu] = qmap->queue_offset + queue;
> +		unsigned int idx = (qmap->queue_offset + queue) % nr_masks;
> +
> +		for_each_cpu(cpu, &masks[idx]) {
> +			qmap->mq_map[cpu] = idx;

I think there's something off with this when we have multiple queue maps. The
wrapping loses the offset when we've isolated CPUs, so I think the index would
end up incorrect.
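
To make the concern concrete, here is a minimal userspace sketch of just the
index arithmetic, not kernel code. It assumes the poll map in the 8/0/2
configuration reported below has queue_offset = 8 and nr_queues = 2, and that
the grouping helper returns nr_masks = 2. The struct and function names
(qmap_sketch, map_value_before, map_value_patched, in_map_range) are
hypothetical and exist only for illustration.

```c
/* Hypothetical stand-in for struct blk_mq_queue_map; only the two
 * fields used by the arithmetic are modelled here. */
struct qmap_sketch {
	unsigned int nr_queues;
	unsigned int queue_offset;
};

/* Value stored in mq_map[cpu] by the removed lines of the hunk:
 * qmap->queue_offset + queue */
static unsigned int map_value_before(const struct qmap_sketch *qmap,
				     unsigned int queue)
{
	return qmap->queue_offset + queue;
}

/* Value stored in mq_map[cpu] by the added lines of the hunk:
 * idx = (qmap->queue_offset + queue) % nr_masks */
static unsigned int map_value_patched(const struct qmap_sketch *qmap,
				      unsigned int queue,
				      unsigned int nr_masks)
{
	return (qmap->queue_offset + queue) % nr_masks;
}

/* Returns 1 if v lies in [queue_offset, queue_offset + nr_queues),
 * i.e. inside the range of queues this map owns. */
static int in_map_range(const struct qmap_sketch *qmap, unsigned int v)
{
	return v >= qmap->queue_offset &&
	       v < qmap->queue_offset + qmap->nr_queues;
}
```

Under those assumptions, queue 0 and queue 1 of the poll map would be stored
as 0 and 1 instead of 8 and 9, so the stored values fall outside the poll
map's own range. Whether that is what leads to the fault below is not
something this sketch shows.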

Trying this series out with "nvme.poll_queues=2" and isolcpus set, I get a
kernel panic:

 nvme nvme0: 8/0/2 default/read/poll queues
 BUG: unable to handle page fault for address: ffff889101898da0
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 4e01067 P4D 4e01067 PUD 0
 Oops: Oops: 0000 [#1] SMP PTI
 CPU: 11 UID: 0 PID: 201 Comm: kworker/u64:19 Not tainted 7.0.0-rc4-00222-g065cad526374 #1586 PREEMPT
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
 Workqueue: async async_run_entry_fn
 RIP: 0010:nvme_init_hctx_common+0x6f/0x190 [nvme]
 Code: 85 78 01 00 00 0f 85 86 00 00 00 45 8b b5 88 01 00 00 4c 89 f0 4d 89 f1 48 c1 e0 04 49 89 c7 4c 8d 94 03 38 0b 00 00 49 01 df <49> 83 bf 40 0b 00 00 00 74 64 44 89 d0 49 81 fa 00 f0 ff ff 77 27
 RSP: 0018:ffffc9000083ba90 EFLAGS: 00010286
 RAX: 0000000ffffffff0 RBX: ffff888101898270 RCX: ffffffffa008bd40
 RDX: 0000000000000008 RSI: ffff888101898270 RDI: ffff888101900800
 RBP: ffffc9000083bac8 R08: 0000000000000060 R09: 00000000ffffffff
 R10: ffff889101898d98 R11: ffff888101ddf000 R12: ffff8881087f36c0
 R13: ffff888101900800 R14: 00000000ffffffff R15: ffff889101898260
 FS:  0000000000000000(0000) GS:ffff8890bb50a000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: ffff889101898da0 CR3: 0000000101fe8001 CR4: 0000000000770ef0
 PKRU: 55555554
 Call Trace:
  <TASK>
  blk_mq_alloc_and_init_hctx+0x11e/0x3a0
  __blk_mq_realloc_hw_ctxs+0x185/0x220
  blk_mq_init_allocated_queue+0xeb/0x3b0
  ? percpu_ref_init+0x6a/0x130
  blk_mq_alloc_queue+0x7a/0xd0
  __blk_mq_alloc_disk+0x14/0x60
  nvme_alloc_ns+0xac/0xb30 [nvme_core]
  ? blk_mq_run_hw_queue+0x117/0x270
  nvme_scan_ns+0x279/0x350 [nvme_core]
  async_run_entry_fn+0x2e/0x130
  process_one_work+0x16c/0x3a0
  worker_thread+0x173/0x2e0
  ? __pfx_worker_thread+0x10/0x10
  kthread+0xe0/0x120
  ? __pfx_kthread+0x10/0x10
  ret_from_fork+0x207/0x270
  ? __pfx_kthread+0x10/0x10
  ret_from_fork_asm+0x1a/0x30
  </TASK>

