From: Thomas Gleixner <tglx@linutronix.de>
To: Gabriele Monaco <gmonaco@redhat.com>,
Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org,
Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>, Andy Lutomirski <luto@kernel.org>,
Steven Rostedt <rostedt@goodmis.org>,
Masami Hiramatsu <mhiramat@kernel.org>,
Ingo Molnar <mingo@kernel.org>,
Mark Rutland <mark.rutland@arm.com>,
linux-arm-kernel@lists.infradead.org,
linux-trace-kernel@vger.kernel.org
Subject: Re: [PATCH v2] tracing: Fix inconsistency in irq tracking on NMIs
Date: Wed, 02 Jul 2025 11:39:37 +0200 [thread overview]
Message-ID: <87cyajkk2u.ffs@tglx> (raw)
In-Reply-To: <5097b17cc506af60ca718aba5a0a10e0fda01440.camel@redhat.com>
On Wed, Jul 02 2025 at 09:18, Gabriele Monaco wrote:
> On Tue, 2025-07-01 at 14:54 +0200, Peter Zijlstra wrote:
> I could probably only fix this without even considering NMIs when
> interrupts are disabled, but I believe if that happens, the tracepoints
> would report something wrong, since using tracing_irq_cpu alone does:
>
> local_irq_disable -> trace_irq_off
> nmi_enter -> no trace
> nmi_exit -> trace_irq_on
> // here interrupts are still off, aren't they?
> local_irq_enable -> no trace
>
> The idea that I described poorly, was to use tracing_irq_cpu in a way
> that the first context disabling interrupts fires the tracepoint
> (current behaviour), but when it's time to enable interrupts, an NMI
> which didn't disable interrupts shouldn't trace but let the next
> context trace.
...
> [1] - https://lore.kernel.org/lkml/87sejup1fe.ffs@tglx
As I told you before:
"As you correctly described, the two states are asynchronous in the
context of NMIs, so the obvious thing is to treat them as seperate
entities so that the trace really contains the off/on events
independent of lockdep enablement and lockdep's internal state. No?"
So I would have expected that you look at the lockdep implementation and
figure out why that one works correctly. Let's have a look at it:
nmi_entry()
irq_state.lockdep = lockdep_hardirqs_enabled();
lockdep_hardirqs_off(CALLER_ADDR0);
...
return irq_state;
nmi_exit(irq_state)
if (irq_state.lockdep)
lockdep_hardirqs_on(CALLER_ADDR0);
On NMI entry the internal state of lockdep tracking is recorded and
lockdep_hardirqs_off() handles a redundant off gracefully. As I said it
might be arguable to invoke it only when irq_state.lockdep == true, but
that's a cosmetic or performance issue not a correctness problem.
On NMI exit lockdep_hardirqs_on() is only invoked when the lockdep internal
state at entry was 'enabled' so it can restore the state correctly.
If you model the tracer exactly the same way:
irq_state.lockdep = lockdep_hardirqs_enabled();
irq_state.tracer = tracer_hardirqs_enabled();
....
You get the idea...
That will be "correct" in terms of sequencing depending on your trace side
implementation:
trace_hardirqs_off()
if (trace_hardirqs_enabled()) {
trace_irqs_enabled = false; // A
emit_trace(OFF); // B
If the NMI hits before A, everything is fine because it will record a
ON->OFF transition on NMI entry and a OFF->ON transition on NMI
exit. Then the interrupted context records a ON->OFF transition again.
If it hits between A and B, then the NMI tracing wont do anything
because enabled state is already false, but the resulting trace
sequencing is messed up:
irqs on
...
nmi_entry
nmi_exit
irqs off
IOW, it misses the ON->OFF OFF->ON transitions in the NMI.
So you want to do it the other way round:
trace_hardirqs_off()
if (trace_hardirqs_enabled()) {
emit_trace(OFF); // A
trace_irqs_enabled = false; // B
If the NMI hits before A, everything is fine because it will record a
ON->OFF transition on NMI entry and a OFF->ON transition on NMI
exit. Then the interrupted context records a ON->OFF transition again.
If it hits between A and B then you get a duplicate
irqs on
...
irqs off
nmi_entry
irqs off
irqs on
nmi_exit
There is nothing you can do about it, but that's easy enough to filter
out, no?
I'm pretty sure you can figure out how that should be modeled on the
on() side of affairs.
Thanks,
tglx
prev parent reply other threads:[~2025-07-02 10:07 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-25 12:08 [PATCH v2] tracing: Fix inconsistency in irq tracking on NMIs Gabriele Monaco
2025-06-30 16:24 ` Thomas Gleixner
2025-07-01 12:54 ` Peter Zijlstra
2025-07-02 7:18 ` Gabriele Monaco
2025-07-02 9:39 ` Thomas Gleixner [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87cyajkk2u.ffs@tglx \
--to=tglx@linutronix.de \
--cc=catalin.marinas@arm.com \
--cc=gmonaco@redhat.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=luto@kernel.org \
--cc=mark.rutland@arm.com \
--cc=mhiramat@kernel.org \
--cc=mingo@kernel.org \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox