All of lore.kernel.org
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: Breno Leitao <leitao@debian.org>, Ingo Molnar <mingo@redhat.com>,
	Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>
Cc: leit@meta.com, "Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Wei Liu <wei.liu@kernel.org>, Marc Zyngier <maz@kernel.org>,
	Adrian Huang <ahuang12@lenovo.com>,
	"open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)"
	<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] x86/apic: Add retry mechanism to add_pin_to_irq_node()
Date: Mon, 29 Jul 2024 18:13:34 +0200	[thread overview]
Message-ID: <874j8889ch.ffs@tglx> (raw)
In-Reply-To: <20240729140604.2814597-1-leitao@debian.org>

On Mon, Jul 29 2024 at 07:06, Breno Leitao wrote:
> I've been running some experiments with failslab fault injector running
> to detect a different problem, and the machine always crash with the
> following stack:
>
> 	can not alloc irq_pin_list (-1,0,20)
> 	Kernel panic - not syncing: IO-APIC: failed to add irq-pin. Can not proceed
>
> 	Call Trace:
> 	 panic
> 	   _printk
> 	   panic_smp_self_stop
> 	   rcu_is_watching
> 	   intel_irq_remapping_free

This completely lacks context. When does this happen? What's the system
state? What has intel_irq_remapping_free() to do with the allocation path?

> This happens because add_pin_to_irq_node() function would panic if
> adding a pin to an IRQ failed due to -ENOMEM (which was injected by
> failslab fault injector).  I've been running with this patch in my test
> cases in order to be able to pick real bugs, and I thought it might be a
> good idea to have it upstream also, so, other people trying to find real
> bugs don't stumble upon this one. Also, this makes sense in a real
> world(?), when retrying a few times might be better than just
> panicking.

While it seems to make sense, the reality is that this is mostly early
boot code. If there is a real world memory allocation failure during
early boot then retries will not help at all.

> Introduce a retry mechanism that attempts to add the pin up to 3 times
> before giving up and panicking. This should improve the robustness of
> the IO-APIC code in the face of transient errors.

I'm absolutely not convinced by this loop heuristic. That's just a bad
hack.

Thanks,

        tglx

  reply	other threads:[~2024-07-29 16:13 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-29 14:06 [PATCH] x86/apic: Add retry mechanism to add_pin_to_irq_node() Breno Leitao
2024-07-29 16:13 ` Thomas Gleixner [this message]
2024-07-29 16:55   ` Breno Leitao
2024-07-29 17:44     ` Thomas Gleixner
2024-07-30 10:28       ` Breno Leitao
2024-08-07 16:25     ` [tip: x86/apic] x86/ioapic: Handle allocation failures gracefully tip-bot2 for Thomas Gleixner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=874j8889ch.ffs@tglx \
    --to=tglx@linutronix.de \
    --cc=ahuang12@lenovo.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=hpa@zytor.com \
    --cc=leit@meta.com \
    --cc=leitao@debian.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maz@kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=wei.liu@kernel.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.