From: Peter Zijlstra <peterz@infradead.org>
To: linux-kernel@vger.kernel.org, mingo@kernel.org, will@kernel.org
Cc: npiggin@gmail.com, elver@google.com, jgross@suse.com,
paulmck@kernel.org, rostedt@goodmis.org, rjw@rjwysocki.net,
joel@joelfernandes.org, svens@linux.ibm.com, tglx@linutronix.de,
peterz@infradead.org
Subject: [PATCH 0/9] TRACE_IRQFLAGS wreckage
Date: Thu, 20 Aug 2020 09:30:31 +0200 [thread overview]
Message-ID: <20200820073031.886217423@infradead.org> (raw)
TRACE_IRQFLAGS
local_irq_*() keeps a software state that mirrors the hardware state,
used for lockdep, includes tracepoints.
raw_local_irq_*() does not update the software state, no tracing.
---
Problem 1:
raw_local_irq_save(); // software state on
local_irq_save(); // software state off
...
local_irq_restore(); // software state still off, because we don't enable IRQs
raw_local_irq_restore(); // software state still off, *whoopsie*
existing instances:
- lock_acquire()
raw_local_irq_save()
__lock_acquire()
arch_spin_lock(&graph_lock)
pv_wait() := kvm_wait() (same or worse for Xen/HyperV)
local_irq_save()
- trace_clock_global()
raw_local_irq_save()
arch_spin_lock()
pv_wait() := kvm_wait()
local_irq_save()
- apic_retrigger_irq()
raw_local_irq_save()
apic->send_IPI() := default_send_IPI_single_phys()
local_irq_save()
Possible solutions:
A) make it work by enabling the tracing inside raw_*()
B) make it work by keeping tracing disabled inside raw_*()
C) call it broken and clean it up now
Now, given that the only reason to use the raw_* variant is because you don't
want tracing. Therefore A) seems like a weird option (although it can be done).
C) is tempting, but OTOH it ends up converting a _lot_ of code to raw just
because there is one raw user, this strips the validation/tracing off for all
the other users.
So we pick B) and declare any code that ends up doing:
raw_local_irq_save()
local_irq_save()
lockdep_assert_irqs_disabled();
broken. AFAICT this problem has existed forever, the only reason it came
up is because I changed IRQ tracing vs lockdep recursion and the first
instance is fairly common, the other cases hardly ever happen.
---
Problem 2:
raw_local_irq_save(); // software state on
trace_*()
...
perf_tp_event()
...
perf_callchain()
<#PF>
trace_hardirqs_off(); // software state off
...
if (regs_irqs_disabled(regs)) // false
trace_hardirqs_on();
</#PF>
raw_local_irq_restore(); // software state stays off, *whoopsie*
existing instances:
- lock_acquire() / lock_release()
raw_local_irq_save()
trace_lock_acquire() / trace_lock_release()
- function tracing
Possible solutions:
A) fix every architecture's entry code
B) only fix kernel/entry/common.c
C) fix lockdep tracepoints and pray
This series does C, AFAICT this problem has existed forever.
---
Problem 3:
raw_local_irq_save(); // software state on
<#NMI>
trace_hardirqs_off(); // software state off
...
if (regs_irqs_disabled(regs)) // false
trace_hardirqs_on();
</#NMI>
raw_local_irq_restore(); // software state stays off, *whoopsie*
Possible solutions:
This *should* not be a problem if an architecture has it's entry ordering
right. In particular we rely on the architecture doing nmi_enter() before
trace_hardirqs_off().
In that case, in_nmi() will be true, and lockdep_hardirqs_*() should NO-OP,
except if CONFIG_TRACE_IRQFLAGS_NMI (x86).
There might be a problem with using lockdep_assert_irqs_disabled() from NMI
context, if so, those needs a little TLC.
---
The patches in this series do (in reverse order):
- 2C
- 1B
- fix fallout in idle due to the trace_lock_*() tracepoints suddenly
being visible to rcu-lockdep.
---
arch/arm/mach-omap2/pm34xx.c | 4 --
arch/arm64/kernel/process.c | 2 -
arch/nds32/include/asm/irqflags.h | 5 ++
arch/powerpc/include/asm/hw_irq.h | 11 ++---
arch/s390/kernel/idle.c | 3 -
arch/x86/entry/thunk_32.S | 5 --
arch/x86/include/asm/mmu.h | 1
arch/x86/kernel/process.c | 4 --
arch/x86/mm/tlb.c | 13 +-----
drivers/cpuidle/cpuidle.c | 19 +++++++--
drivers/idle/intel_idle.c | 16 --------
include/linux/cpuidle.h | 13 +++---
include/linux/irqflags.h | 73 ++++++++++++++++++++------------------
include/linux/lockdep.h | 18 ++++++---
include/linux/mmu_context.h | 5 ++
kernel/locking/lockdep.c | 18 +++++----
kernel/sched/idle.c | 21 ++++------
17 files changed, 112 insertions(+), 119 deletions(-)
next reply other threads:[~2020-08-20 7:50 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-08-20 7:30 Peter Zijlstra [this message]
2020-08-20 7:30 ` [PATCH 1/9] lockdep: Use raw_cpu_*() for per-cpu variables Peter Zijlstra
2020-08-20 13:54 ` Steven Rostedt
2020-08-20 7:30 ` [PATCH 2/9] sched,idle,rcu: Push rcu_idle deeper into the idle path Peter Zijlstra
2020-08-20 13:58 ` Steven Rostedt
2020-08-20 7:30 ` [PATCH 3/9] cpuidle: Make CPUIDLE_FLAG_TLB_FLUSHED generic Peter Zijlstra
2020-08-20 7:30 ` [PATCH 4/9] cpuidle: Move trace_cpu_idle() into generic code Peter Zijlstra
2020-08-20 7:30 ` [PATCH 5/9] x86/entry: Remove unused THUNKs Peter Zijlstra
2020-08-20 7:30 ` [PATCH 6/9] locking/lockdep: Cleanup Peter Zijlstra
2020-08-20 7:30 ` [PATCH 7/9] nds32: Implement arch_irqs_disabled() Peter Zijlstra
2020-08-20 7:30 ` [PATCH 8/9] lockdep: Only trace IRQ edges Peter Zijlstra
2020-08-20 7:30 ` [PATCH 9/9] lockdep,trace: Expose tracepoints Peter Zijlstra
2020-08-20 14:36 ` [PATCH 0/9] TRACE_IRQFLAGS wreckage Steven Rostedt
2020-08-20 14:39 ` Steven Rostedt
2020-08-20 14:49 ` Marco Elver
2020-08-20 14:58 ` peterz
2020-08-20 16:53 ` Steven Rostedt
2020-08-20 17:20 ` Marco Elver
2020-08-20 19:59 ` Steven Rostedt
2020-08-21 6:37 ` Marco Elver
2020-08-21 6:54 ` peterz
2020-08-21 7:05 ` Marco Elver
2020-08-27 7:54 ` [tip: sched/urgent] sched: Use __always_inline on is_idle_task() tip-bot2 for Marco Elver
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200820073031.886217423@infradead.org \
--to=peterz@infradead.org \
--cc=elver@google.com \
--cc=jgross@suse.com \
--cc=joel@joelfernandes.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=npiggin@gmail.com \
--cc=paulmck@kernel.org \
--cc=rjw@rjwysocki.net \
--cc=rostedt@goodmis.org \
--cc=svens@linux.ibm.com \
--cc=tglx@linutronix.de \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox