From: Ingo Molnar <mingo@kernel.org>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, clm@fb.com,
tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, hpa@zytor.com
Subject: Re: [PATCH RFC x86/nmi] Fix out-of-order nesting checks
Date: Thu, 12 Oct 2023 08:37:25 +0200
Message-ID: <ZSeUJbZLbk2g7GC/@gmail.com>
In-Reply-To: <0cbff831-6e3d-431c-9830-ee65ee7787ff@paulmck-laptop>
* Paul E. McKenney <paulmck@kernel.org> wrote:
> The ->idt_seq and ->recv_jiffies variables added by commit 1a3ea611fc10
> ("x86/nmi: Accumulate NMI-progress evidence in exc_nmi()") place
> the exit-time check of the bottom bit of ->idt_seq after the
> this_cpu_dec_return() that re-enables NMI nesting. This can result in
> the following sequence of events on a given CPU in kernels built with
> CONFIG_NMI_CHECK_CPU=y:
>
> o An NMI arrives, and ->idt_seq is incremented to an odd number.
> In addition, nmi_state is set to NMI_EXECUTING==1.
>
> o The NMI is processed.
>
> o The this_cpu_dec_return(nmi_state) zeroes nmi_state and returns
> NMI_EXECUTING==1, thus opting out of the "goto nmi_restart".
>
> o Another NMI arrives and ->idt_seq is incremented to an even
> number, triggering the warning. But all is just fine, at least
> assuming we don't get so many closely spaced NMIs that the stack
> overflows or some such.
>
> Experience on the fleet indicates that the MTBF of this false positive
> is about 70 years. Or, for those who are not quite that patient, the
> MTBF appears to be about one per week per 4,000 systems.
>
> Fix this false-positive warning by moving the "nmi_restart" label before
> the initial ->idt_seq increment/check and moving the this_cpu_dec_return()
> to follow the final ->idt_seq increment/check. This way, all nested NMIs
> that get past the NMI_NOT_RUNNING check get a clean ->idt_seq slate.
> And if they don't get past that check, they will set nmi_state to
> NMI_LATCHED, which will cause the this_cpu_dec_return(nmi_state)
> to restart.
This looks like a sensible fix: the warning should obviously be atomic wrt.
the no-nesting region. I've applied your fix to tip:x86/irq, as it doesn't
seem urgent enough with an MTBF of 70 years to warrant tip:x86/urgent handling. ;-)
Thanks,
Ingo
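For readers following along, the ordering problem described in the quoted
changelog can be sketched as a small userspace model. This is not the patch
and not the code in exc_nmi(); it is a single-threaded approximation in which
a nested NMI is modeled as a recursive call at a chosen point. The names
cpu_model, seq_step(), model_nmi(), run_model() and the INJECT_* constants
are made up for this illustration; only nmi_state, idt_seq and the NMI_*
state names come from the discussion above.

```c
#include <assert.h>
#include <stdbool.h>

enum { NMI_NOT_RUNNING, NMI_EXECUTING, NMI_LATCHED };

/* Hypothetical points at which the model delivers one nested NMI. */
enum inject { INJECT_NONE, INJECT_WHILE_HANDLING, INJECT_EXIT_WINDOW };

struct cpu_model {
	int nmi_state;
	unsigned long idt_seq;
	int handled;	/* passes through the handler body */
	int warnings;	/* stands in for the WARN_ON_ONCE() checks */
};

/* Increment idt_seq and check the parity of its bottom bit. */
static void seq_step(struct cpu_model *c, bool expect_odd)
{
	c->idt_seq++;
	if ((c->idt_seq & 1) != expect_odd)
		c->warnings++;
}

/* fixed == false models the old ordering, fixed == true the new one. */
static void model_nmi(struct cpu_model *c, bool fixed, enum inject when)
{
	if (c->nmi_state != NMI_NOT_RUNNING) {
		c->nmi_state = NMI_LATCHED;	/* running handler will restart */
		return;
	}
	c->nmi_state = NMI_EXECUTING;
	if (!fixed)
		seq_step(c, true);	/* old: entry check before the label */
restart:
	if (fixed)
		seq_step(c, true);	/* new: entry check after the label */
	c->handled++;
	if (when == INJECT_WHILE_HANDLING) {
		when = INJECT_NONE;
		model_nmi(c, fixed, INJECT_NONE);
	}
	if (fixed) {
		seq_step(c, false);	/* new: exit check before the decrement */
		if (when == INJECT_EXIT_WINDOW) {
			when = INJECT_NONE;
			model_nmi(c, fixed, INJECT_NONE);
		}
	}
	if (--c->nmi_state)		/* models this_cpu_dec_return() */
		goto restart;
	if (!fixed) {
		/* Old ordering: nmi_state is already zero, idt_seq still odd. */
		if (when == INJECT_EXIT_WINDOW)
			model_nmi(c, fixed, INJECT_NONE);
		seq_step(c, false);	/* old: exit check after the decrement */
	}
}

/* Run one top-level NMI on a fresh model CPU and return the final state. */
static struct cpu_model run_model(bool fixed, enum inject when)
{
	struct cpu_model c = { .nmi_state = NMI_NOT_RUNNING };

	model_nmi(&c, fixed, when);
	assert(c.nmi_state == NMI_NOT_RUNNING);
	return c;
}
```

Calling run_model(false, INJECT_EXIT_WINDOW) reproduces the false positive in
this model: the nested NMI sees nmi_state == NMI_NOT_RUNNING while idt_seq is
still odd. With run_model(true, INJECT_EXIT_WINDOW) the nested NMI instead
finds nmi_state nonzero, latches, and the outer handler restarts with an even
idt_seq, so no warning is counted.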
Thread overview: 4+ messages
2023-10-11 18:40 [PATCH RFC x86/nmi] Fix out-of-order nesting checks Paul E. McKenney
2023-10-12 6:37 ` Ingo Molnar [this message]
2023-10-12 10:45 ` Paul E. McKenney
2023-10-12 6:41 ` [tip: x86/irq] x86/nmi: Fix out-of-order NMI nesting checks & false positive warning tip-bot2 for Paul E. McKenney