From: Ming Lei <ming.lei@redhat.com>
To: Daniel Wagner <dwagner@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>, Keith Busch <kbusch@kernel.org>,
	Sagi Grimberg <sagi@grimberg.me>,
	Thomas Gleixner <tglx@linutronix.de>,
	Christoph Hellwig <hch@lst.de>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	John Garry <john.g.garry@oracle.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Jason Wang <jasowang@redhat.com>,
	Kashyap Desai <kashyap.desai@broadcom.com>,
	Sumit Saxena <sumit.saxena@broadcom.com>,
	Shivasharan S <shivasharan.srikanteshwara@broadcom.com>,
	Chandrakanth patil <chandrakanth.patil@broadcom.com>,
	Sathya Prakash Veerichetty <sathya.prakash@broadcom.com>,
	Suganath Prabu Subramani <suganath-prabu.subramani@broadcom.com>,
	Nilesh Javali <njavali@marvell.com>,
	GR-QLogic-Storage-Upstream@marvell.com,
	Jonathan Corbet <corbet@lwn.net>,
	Frederic Weisbecker <frederic@kernel.org>,
	Mel Gorman <mgorman@suse.de>, Hannes Reinecke <hare@suse.de>,
	Sridhar Balaraman <sbalaraman@parallelwireless.com>,
	"brookxu.cn" <brookxu.cn@gmail.com>,
	linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
	linux-nvme@lists.infradead.org, linux-scsi@vger.kernel.org,
	virtualization@lists.linux.dev, megaraidlinux.pdl@broadcom.com,
	mpi3mr-linuxdrv.pdl@broadcom.com,
	MPT-FusionLinux.pdl@broadcom.com, storagedev@microchip.com,
	linux-doc@vger.kernel.org, ming.lei@redhat.com
Subject: Re: [PATCH v3 15/15] blk-mq: use hk cpus only when isolcpus=io_queue is enabled
Date: Tue, 13 Aug 2024 20:56:02 +0800
Message-ID: <ZrtX4pzqwVUEgIPS@fedora>
In-Reply-To: <20240806-isolcpus-io-queues-v3-15-da0eecfeaf8b@suse.de>

On Tue, Aug 06, 2024 at 02:06:47PM +0200, Daniel Wagner wrote:
> When isolcpus=io_queue is enabled, all hardware queues should run on
> the housekeeping CPUs only, so ignore the affinity mask provided by
> the driver. Also, we can't use blk_mq_map_queues because it maps all
> CPUs to the first hctx unless a CPU is the one the hctx's affinity is
> set to, e.g. with 8 CPUs and an isolcpus=io_queue,2-3,6-7 config:
> 
>   queue mapping for /dev/nvme0n1
>         hctx0: default 2 3 4 6 7
>         hctx1: default 5
>         hctx2: default 0
>         hctx3: default 1
> 
>   PCI name is 00:05.0: nvme0n1
>         irq 57 affinity 0-1 effective 1 is_managed:0 nvme0q0
>         irq 58 affinity 4 effective 4 is_managed:1 nvme0q1
>         irq 59 affinity 5 effective 5 is_managed:1 nvme0q2
>         irq 60 affinity 0 effective 0 is_managed:1 nvme0q3
>         irq 61 affinity 1 effective 1 is_managed:1 nvme0q4
> 
> whereas with blk_mq_hk_map_queues we get:
> 
>   queue mapping for /dev/nvme0n1
>         hctx0: default 2 4
>         hctx1: default 3 5
>         hctx2: default 0 6
>         hctx3: default 1 7
> 
>   PCI name is 00:05.0: nvme0n1
>         irq 56 affinity 0-1 effective 1 is_managed:0 nvme0q0
>         irq 61 affinity 4 effective 4 is_managed:1 nvme0q1
>         irq 62 affinity 5 effective 5 is_managed:1 nvme0q2
>         irq 63 affinity 0 effective 0 is_managed:1 nvme0q3
>         irq 64 affinity 1 effective 1 is_managed:1 nvme0q4
> 
> Signed-off-by: Daniel Wagner <dwagner@suse.de>
> ---
>  block/blk-mq-cpumap.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 56 insertions(+)
> 
> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
> index c1277763aeeb..7e026c2ffa02 100644
> --- a/block/blk-mq-cpumap.c
> +++ b/block/blk-mq-cpumap.c
> @@ -60,11 +60,64 @@ unsigned int blk_mq_num_online_queues(unsigned int max_queues)
>  }
>  EXPORT_SYMBOL_GPL(blk_mq_num_online_queues);
>  
> +static bool blk_mq_hk_map_queues(struct blk_mq_queue_map *qmap)
> +{
> +	struct cpumask *hk_masks;
> +	cpumask_var_t isol_mask;
> +
> +	unsigned int queue, cpu;
> +
> +	if (!housekeeping_enabled(HK_TYPE_IO_QUEUE))
> +		return false;
> +
> +	/* map housekeeping cpus to matching hardware context */
> +	hk_masks = group_cpus_evenly(qmap->nr_queues);
> +	if (!hk_masks)
> +		goto fallback;
> +
> +	for (queue = 0; queue < qmap->nr_queues; queue++) {
> +		for_each_cpu(cpu, &hk_masks[queue])
> +			qmap->mq_map[cpu] = qmap->queue_offset + queue;
> +	}
> +
> +	kfree(hk_masks);
> +
> +	/* map isolcpus to hardware context */
> +	if (!alloc_cpumask_var(&isol_mask, GFP_KERNEL))
> +		goto fallback;
> +
> +	queue = 0;
> +	cpumask_andnot(isol_mask,
> +		       cpu_possible_mask,
> +		       housekeeping_cpumask(HK_TYPE_IO_QUEUE));
> +
> +	for_each_cpu(cpu, isol_mask) {
> +		qmap->mq_map[cpu] = qmap->queue_offset + queue;
> +		queue = (queue + 1) % qmap->nr_queues;
> +	}
> +

With patch 14 and the above change, the managed irq's affinity no longer
matches the blk-mq mapping.

If the last CPU in a managed irq's affinity mask goes offline, the blk-mq
mapping may still contain other (isolated) CPUs for that hctx. The IOs in
this hctx therefore won't be drained by blk_mq_hctx_notify_offline() on
CPU offline, yet genirq still shuts down the managed irq.

So an IO hang risk is introduced here; it is likely the reason for the
hang you observed.
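
To make the hazard concrete, here is a minimal sketch (hypothetical
helper names, not actual kernel code) of the two decisions this patch
lets diverge: genirq shuts a managed irq down based on the irq's
affinity mask, while blk-mq decides whether to drain an hctx based on
the hctx's cpumask:

#include <linux/cpumask.h>

/* true if @mask still has an online CPU other than @dying_cpu */
static bool mask_has_other_online_cpu(const struct cpumask *mask,
				      unsigned int dying_cpu)
{
	unsigned int cpu;

	for_each_cpu_and(cpu, mask, cpu_online_mask)
		if (cpu != dying_cpu)
			return true;
	return false;
}

/*
 * Hypothetical check: the hang window opens when @dying_cpu is the last
 * online CPU in the managed irq's affinity mask (genirq shuts the irq
 * down) while the hctx cpumask still holds other online, isolated CPUs
 * (blk_mq_hctx_notify_offline() skips the drain).
 */
static bool io_hang_window(const struct cpumask *irq_affinity,
			   const struct cpumask *hctx_cpumask,
			   unsigned int dying_cpu)
{
	bool irq_shut_down = cpumask_test_cpu(dying_cpu, irq_affinity) &&
		!mask_has_other_online_cpu(irq_affinity, dying_cpu);
	bool hctx_drained = cpumask_test_cpu(dying_cpu, hctx_cpumask) &&
		!mask_has_other_online_cpu(hctx_cpumask, dying_cpu);

	return irq_shut_down && !hctx_drained;
}

With the current mapping, both masks lose their last online CPU at the
same time, so this can never return true; with blk_mq_hk_map_queues the
isolated CPUs keep the hctx "alive" while the irq's housekeeping-only
affinity empties out. E.g. with isolcpus=io_queue,2-3,6-7 and the
mapping above, offlining CPU 4 shuts down nvme0q1's irq while isolated
CPU 2 still maps to hctx0.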


Thanks, 
Ming

