virtualization.lists.linux-foundation.org archive mirror
From: Thomas Gleixner <tglx@linutronix.de>
To: Dongli Zhang <dongli.zhang@oracle.com>, linux-kernel@vger.kernel.org
Cc: virtualization@lists.linux.dev, joe.jin@oracle.com
Subject: Re: [PATCH 1/1] genirq/cpuhotplug: retry with online CPUs on irq_do_set_affinity failure
Date: Mon, 22 Apr 2024 22:58:10 +0200	[thread overview]
Message-ID: <87ttjtunbx.ffs@tglx> (raw)
In-Reply-To: <20240419013322.58500-2-dongli.zhang@oracle.com>

On Thu, Apr 18 2024 at 18:33, Dongli Zhang wrote:

> When a CPU goes offline, its IRQs may migrate to other CPUs. Managed
> IRQs are migrated, or shut down if all CPUs in the managed IRQ's
> affinity are offline. For regular IRQs, there is only a migration.

Please write out interrupts. There is enough space for it and IRQ is
just not a regular word.

> migrate_one_irq() first uses the pending_mask or affinity_mask of the IRQ.
>
> 104         if (irq_fixup_move_pending(desc, true))
> 105                 affinity = irq_desc_get_pending_mask(desc);
> 106         else
> 107                 affinity = irq_data_get_affinity_mask(d);
>
> migrate_one_irq() falls back to all online CPUs if all CPUs in the
> pending_mask/affinity_mask are already offline.
>
> 113         if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids) {
> 114                 /*
> 115                  * If the interrupt is managed, then shut it down and leave
> 116                  * the affinity untouched.
> 117                  */
> 118                 if (irqd_affinity_is_managed(d)) {
> 119                         irqd_set_managed_shutdown(d);
> 120                         irq_shutdown_and_deactivate(desc);
> 121                         return false;
> 122                 }
> 123                 affinity = cpu_online_mask;
> 124                 brokeaff = true;
> 125         }

Please don't copy code into the change log. Describe the problem in
text.

> However, there is a corner case. Although some CPUs in
> pending_mask/affinity_mask are still online, they lack available
> vectors. If the kernel calls irq_do_set_affinity() with only those
> CPUs, it fails with -ENOSPC.
>
> This is not reasonable as other online CPUs still have many available
> vectors.

Reasonable is not the question here. It's either correct or not.

> name:   VECTOR
>  size:   0
>  mapped: 529
>  flags:  0x00000103
> Online bitmaps:        7
> Global available:    884
> Global reserved:       6
> Total allocated:     539
> System: 36: 0-19,21,50,128,236,243-244,246-255
>  | CPU | avl | man | mac | act | vectors
>      0   147     0     0   55  32-49,51-87
>      1   147     0     0   55  32-49,51-87
>      2     0     0     0  202  32-49,51-127,129-235

Just out of curiosity: how did this end up with CPU2 completely
occupied?

>      4   147     0     0   55  32-49,51-87
>      5   147     0     0   55  32-49,51-87
>      6   148     0     0   54  32-49,51-86
>      7   148     0     0   54  32-49,51-86
>
> This issue should not happen for managed IRQs because the vectors are already
> reserved before CPU hotplug.

Should not? It either does or it does not.

> For regular IRQs, retry with all online CPUs if the prior
> irq_do_set_affinity() fails with -ENOSPC.
>
> Cc: Joe Jin <joe.jin@oracle.com>
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> ---
>  kernel/irq/cpuhotplug.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
> index 1ed2b1739363..d1666a6b73f4 100644
> --- a/kernel/irq/cpuhotplug.c
> +++ b/kernel/irq/cpuhotplug.c
> @@ -130,6 +130,19 @@ static bool migrate_one_irq(struct irq_desc *desc)
>  	 * CPU.
>  	 */
>  	err = irq_do_set_affinity(d, affinity, false);
> +
> +	if (err == -ENOSPC &&
> +	    !irqd_affinity_is_managed(d) &&
> +	    affinity != cpu_online_mask) {

This really wants to be a single line conditional.

> +		affinity = cpu_online_mask;
> +		brokeaff = true;
> +
> +		pr_debug("IRQ%u: set affinity failed for %*pbl, re-try with all online CPUs\n",
> +			 d->irq, cpumask_pr_args(affinity));

How is it useful to print cpu_online_mask here?

Thanks,

        tglx


Thread overview: 5+ messages
2024-04-19  1:33 [PATCH 0/1] genirq/cpuhotplug: fix CPU hotplug set affinity failure issue Dongli Zhang
2024-04-19  1:33 ` [PATCH 1/1] genirq/cpuhotplug: retry with online CPUs on irq_do_set_affinity failure Dongli Zhang
2024-04-22 20:58   ` Thomas Gleixner [this message]
2024-04-22 23:09     ` Dongli Zhang
2024-04-23  1:02       ` Thomas Gleixner
