Re: [PATCH v2] tracing: Fix inconsistency in irq tracking on NMIs

public inbox for linux-arm-kernel@lists.infradead.org
 help / color / mirror / Atom feed

From: Thomas Gleixner <tglx@linutronix.de>
To: Gabriele Monaco <gmonaco@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>, Andy Lutomirski <luto@kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Ingo Molnar <mingo@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	linux-arm-kernel@lists.infradead.org,
	linux-trace-kernel@vger.kernel.org
Subject: Re: [PATCH v2] tracing: Fix inconsistency in irq tracking on NMIs
Date: Wed, 02 Jul 2025 11:39:37 +0200	[thread overview]
Message-ID: <87cyajkk2u.ffs@tglx> (raw)
In-Reply-To: <5097b17cc506af60ca718aba5a0a10e0fda01440.camel@redhat.com>

On Wed, Jul 02 2025 at 09:18, Gabriele Monaco wrote:
> On Tue, 2025-07-01 at 14:54 +0200, Peter Zijlstra wrote:
> I could probably only fix this without even considering NMIs when
> interrupts are disabled, but I believe if that happens, the tracepoints
> would report something wrong, since using tracing_irq_cpu alone does:
>
> local_irq_disable -> trace_irq_off
>   nmi_enter -> no trace
>   nmi_exit -> trace_irq_on
>   // here interrupts are still off, aren't they?
> local_irq_enable -> no trace
>
> The idea that I described poorly, was to use tracing_irq_cpu in a way
> that the first context disabling interrupts fires the tracepoint
> (current behaviour), but when it's time to enable interrupts, an NMI
> which didn't disable interrupts shouldn't trace but let the next
> context trace.
...
> [1] - https://lore.kernel.org/lkml/87sejup1fe.ffs@tglx

As I told you before:

 "As you correctly described, the two states are asynchronous in the
  context of NMIs, so the obvious thing is to treat them as seperate
  entities so that the trace really contains the off/on events
  independent of lockdep enablement and lockdep's internal state. No?"

So I would have expected that you look at the lockdep implementation and
figure out why that one works correctly. Let's have a look at it:

nmi_entry()
   irq_state.lockdep = lockdep_hardirqs_enabled();

   lockdep_hardirqs_off(CALLER_ADDR0);
   ...
   return irq_state;

nmi_exit(irq_state)

   if (irq_state.lockdep)
	lockdep_hardirqs_on(CALLER_ADDR0);

On NMI entry the internal state of lockdep tracking is recorded and
lockdep_hardirqs_off() handles a redundant off gracefully. As I said it
might be arguable to invoke it only when irq_state.lockdep == true, but
that's a cosmetic or performance issue not a correctness problem.

On NMI exit lockdep_hardirqs_on() is only invoked when the lockdep internal
state at entry was 'enabled' so it can restore the state correctly.

If you model the tracer exactly the same way:

   irq_state.lockdep = lockdep_hardirqs_enabled();
   irq_state.tracer = tracer_hardirqs_enabled();
   ....

   You get the idea...

That will be "correct" in terms of sequencing depending on your trace side
implementation:

  trace_hardirqs_off()
        if (trace_hardirqs_enabled()) {
        	trace_irqs_enabled = false;     // A
                emit_trace(OFF);	        // B

If the NMI hits before A, everything is fine because it will record a
ON->OFF transition on NMI entry and a OFF->ON transition on NMI
exit. Then the interrupted context records a ON->OFF transition again.

If it hits between A and B, then the NMI tracing wont do anything
because enabled state is already false, but the resulting trace
sequencing is messed up:

     irqs on
     ...
     nmi_entry
     nmi_exit
     irqs off

IOW, it misses the ON->OFF OFF->ON transitions in the NMI.

So you want to do it the other way round:

  trace_hardirqs_off()
        if (trace_hardirqs_enabled()) {
                emit_trace(OFF);	        // A
        	trace_irqs_enabled = false;     // B

If the NMI hits before A, everything is fine because it will record a
ON->OFF transition on NMI entry and a OFF->ON transition on NMI
exit. Then the interrupted context records a ON->OFF transition again.

If it hits between A and B then you get a duplicate

     irqs on
     ...
     irqs off
     nmi_entry
     irqs off
     irqs on
     nmi_exit

There is nothing you can do about it, but that's easy enough to filter
out, no?

I'm pretty sure you can figure out how that should be modeled on the
on() side of affairs.

Thanks,

        tglx

     prev parent reply	other threads:[~2025-07-02 10:07 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-25 12:08 [PATCH v2] tracing: Fix inconsistency in irq tracking on NMIs Gabriele Monaco
2025-06-30 16:24 ` Thomas Gleixner
2025-07-01 12:54 ` Peter Zijlstra
2025-07-02  7:18   ` Gabriele Monaco
2025-07-02  9:39     ` Thomas Gleixner [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87cyajkk2u.ffs@tglx \
    --to=tglx@linutronix.de \
    --cc=catalin.marinas@arm.com \
    --cc=gmonaco@redhat.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mhiramat@kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox