virtualization.lists.linux-foundation.org archive mirror
From: Jiri Slaby <jirislaby@kernel.org>
To: 'Guanjun' <guanjun@linux.alibaba.com>,
	corbet@lwn.net, axboe@kernel.dk, mst@redhat.com,
	jasowang@redhat.com, xuanzhuo@linux.alibaba.com,
	eperezma@redhat.com, vgoyal@redhat.com, stefanha@redhat.com,
	miklos@szeredi.hu, tglx@linutronix.de, peterz@infradead.org,
	akpm@linux-foundation.org, paulmck@kernel.org, thuth@redhat.com,
	rostedt@goodmis.org, bp@alien8.de, xiongwei.song@windriver.com,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-block@vger.kernel.org, virtualization@lists.linux.dev,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH RFC v1 1/2] genirq/affinity: add support for limiting managed interrupts
Date: Fri, 1 Nov 2024 08:06:02 +0100
Message-ID: <0f5c1192-090b-4c00-a951-9613289057df@kernel.org>
In-Reply-To: <20241031074618.3585491-2-guanjun@linux.alibaba.com>

Hi,

On 31. 10. 24, 8:46, 'Guanjun' wrote:
> From: Guanjun <guanjun@linux.alibaba.com>
> 
> Commit c410abbbacb9 (genirq/affinity: Add is_managed to struct irq_affinity_desc)
> introduced is_managed bit to struct irq_affinity_desc. Due to queue interrupts
> treated as managed interrupts, in scenarios where a large number of
> devices are present (using massive msix queue interrupts), an excessive number
> of IRQ matrix bits (about num_online_cpus() * nvecs) are reserved during
> interrupt allocation. This subsequently leads to the situation where interrupts
> for some devices cannot be properly allocated.
> 
> Support for limiting the number of managed interrupts on every node per allocation.
> 
> Signed-off-by: Guanjun <guanjun@linux.alibaba.com>
> ---
>   .../admin-guide/kernel-parameters.txt         |  9 +++
>   block/blk-mq-cpumap.c                         |  2 +-
>   drivers/virtio/virtio_vdpa.c                  |  2 +-
>   fs/fuse/virtio_fs.c                           |  2 +-
>   include/linux/group_cpus.h                    |  2 +-
>   kernel/irq/affinity.c                         | 11 ++--
>   lib/group_cpus.c                              | 55 ++++++++++++++++++-
>   7 files changed, 73 insertions(+), 10 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 9b61097a6448..ac80f35d04c9 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3238,6 +3238,15 @@
>   			different yeeloong laptops.
>   			Example: machtype=lemote-yeeloong-2f-7inch
>   
> +	managed_irqs_per_node=
> +			[KNL,SMP] Support for limiting the number of managed
> +			interrupts on every node to prevent the case that
> +			interrupts cannot be properly allocated where a large
> +			number of devices are present. The default number is 0,
> +			that means no limit to the number of managed irqs.
> +			Format: integer between 0 and num_possible_cpus() / num_possible_nodes()
> +			Default: 0

Kernel parameters suck. Especially here, where you have to guess a value 
just to boot properly. Could this be auto-tuned instead?

> --- a/lib/group_cpus.c
> +++ b/lib/group_cpus.c
> @@ -11,6 +11,30 @@
>   
>   #ifdef CONFIG_SMP
>   
> +static unsigned int __read_mostly managed_irqs_per_node;
> +static struct cpumask managed_irqs_cpumsk[MAX_NUMNODES] __cacheline_aligned_in_smp = {

This is quite excessive. On SUSE configs, this is 8192 CPU bits * 1024 
nodes = 1 MiB of static data. For everyone. You have to allocate this 
dynamically instead. See e.g. setup_node_to_cpumask_map().
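
To put numbers on that, here is a rough userspace sketch (not kernel code, and not part of the patch). It assumes the NR_CPUS=8192 and MAX_NUMNODES=1024 values mentioned above. The helper names cpumask_bytes(), static_footprint(), alloc_node_masks() and free_node_masks() are made up for illustration; in the kernel the per-node masks would presumably come from something like alloc_cpumask_var() sized by nr_node_ids, roughly along the lines of setup_node_to_cpumask_map().

```c
#include <stddef.h>
#include <stdlib.h>

/* Bytes needed for a bitmap of nr_cpus bits, rounded up to whole longs. */
static size_t cpumask_bytes(unsigned int nr_cpus)
{
	size_t bits_per_long = 8 * sizeof(unsigned long);
	size_t longs = (nr_cpus + bits_per_long - 1) / bits_per_long;

	return longs * sizeof(unsigned long);
}

/* Footprint of a static array with one bitmap per compile-time node slot. */
static size_t static_footprint(unsigned int nr_cpus, unsigned int max_nodes)
{
	return cpumask_bytes(nr_cpus) * max_nodes;
}

/*
 * Hypothetical helper: allocate one bitmap per node that can actually
 * exist on this machine. The masks are zeroed here for simplicity; the
 * patch under review initializes them to "all CPUs set" instead.
 */
static unsigned long **alloc_node_masks(unsigned int nr_nodes,
					unsigned int nr_cpus)
{
	unsigned long **masks = calloc(nr_nodes, sizeof(*masks));
	unsigned int node;

	if (!masks)
		return NULL;

	for (node = 0; node < nr_nodes; node++) {
		masks[node] = calloc(1, cpumask_bytes(nr_cpus));
		if (!masks[node]) {
			/* Unwind the masks allocated so far. */
			while (node--)
				free(masks[node]);
			free(masks);
			return NULL;
		}
	}
	return masks;
}

/* Release everything alloc_node_masks() handed out. */
static void free_node_masks(unsigned long **masks, unsigned int nr_nodes)
{
	unsigned int node;

	if (!masks)
		return;
	for (node = 0; node < nr_nodes; node++)
		free(masks[node]);
	free(masks);
}
```

With the assumed values, static_footprint(8192, 1024) comes to 1 MiB, while alloc_node_masks(2, 8192) on a two-node machine needs 2 * cpumask_bytes(8192) = 2 KiB plus the pointer array.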

> +	[0 ... MAX_NUMNODES-1] = {CPU_BITS_ALL}
> +};
> +
> +static int __init irq_managed_setup(char *str)
> +{
> +	int ret;
> +
> +	ret = kstrtouint(str, 10, &managed_irqs_per_node);
> +	if (ret < 0) {
> +		pr_warn("managed_irqs_per_node= cannot parse, ignored\n");

could not be parsed

> +		return 0;
> +	}
> +
> +	if (managed_irqs_per_node * num_possible_nodes() > num_possible_cpus()) {
> +		managed_irqs_per_node = num_possible_cpus() / num_possible_nodes();
> +		pr_warn("managed_irqs_per_node= cannot be larger than %u\n",
> +			managed_irqs_per_node);
> +	}
> +	return 1;
> +}
> +__setup("managed_irqs_per_node=", irq_managed_setup);
> +
>   static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
>   				unsigned int cpus_per_grp)
>   {
...
> @@ -332,6 +380,7 @@ static int __group_cpus_evenly(unsigned int startgrp, unsigned int numgrps,
>   /**
>    * group_cpus_evenly - Group all CPUs evenly per NUMA/CPU locality
>    * @numgrps: number of groups
> + * @is_managed: if these groups managed by kernel

are managed by the kernel

>    *
>    * Return: cpumask array if successful, NULL otherwise. And each element
>    * includes CPUs assigned to this group

thanks,
-- 
js
suse labs


