* [PATCH 0/5] ftrace: do not trace NMI handlers
@ 2008-07-30 1:29 Steven Rostedt
2008-07-30 1:29 ` [PATCH 1/5] ftrace: do not trace nmi callers in x86 Steven Rostedt
` (5 more replies)
0 siblings, 6 replies; 9+ messages in thread
From: Steven Rostedt @ 2008-07-30 1:29 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Andrew Morton,
linux-kernel, Linus Torvalds, Arjan van de Ven
The dynamic ftrace code modifies code text at run time. Arjan informed
me that there is no safe way to modify code text on an SMP system when
the other CPUs might execute that same code. The reason has to do with
pipeline caches and CPUs might do funny things if the code being pushed
in the pipeline also happens to be modified at that same time. (Arjan
correct me if I'm wrong here).
We use kstop_machine to put the system into a UP like mode. This prevents
other CPUs from executing code while we modify it. Under stress testing
Ingo discovered that NMIs can cause the system to crash. This was due
to NMIs calling code that is being modified. Some boxes are more prone to
failure than others.
This series of patches performs two tasks:
1) Add notrace to functions called by NMI, or simply remove the tracing
completely from files that are primarily used by NMI.
2) Add a warning when code that will be modified is called by an NMI.
This also disables ftraced when it is detected, to prevent the
race with the NMI and code modification from happneing.
The warning looks something like this:
--------------- cut here ---------------
WARNING: ftraced code called from NMI context lapic_wd_event+0xd/0x65
Please report this to the ftrace maintainer.
Disabling ftraced. Boot with ftrace_keep_on_nmi to not disable.
Pid: 0, comm: swapper Not tainted 2.6.26-tip #96
Call Trace:
<NMI> [<ffffffff8021c6d0>] ? lapic_wd_event+0xd/0x65
[<ffffffff8027b9c1>] ftrace_record_ip+0xa3/0x357
[<ffffffff8020c0f4>] mcount_call+0x5/0x31
[<ffffffff8021c6d5>] ? lapic_wd_event+0x12/0x65
[<ffffffff804b90d4>] nmi_watchdog_tick+0x21b/0x230
[<ffffffff804b8487>] default_do_nmi+0x73/0x1e0
[<ffffffff804b8a04>] do_nmi+0x64/0x91
[<ffffffff804b80bf>] nmi+0x7f/0x80
[<ffffffff80212c14>] ? default_idle+0x35/0x4f
<<EOE>> [<ffffffff8020ae42>] cpu_idle+0x8a/0xc9
[<ffffffff804b15a6>] start_secondary+0x172/0x177
--------------- end cut here ---------------
This appears once when it is caught. We are hoping that this will not
appear often, and are running code to catch it as it does.
-- Steve
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 1/5] ftrace: do not trace nmi callers in x86
2008-07-30 1:29 [PATCH 0/5] ftrace: do not trace NMI handlers Steven Rostedt
@ 2008-07-30 1:29 ` Steven Rostedt
2008-07-30 1:29 ` [PATCH 2/5] ftrace: do not trace nmi callers in drivers Steven Rostedt
` (4 subsequent siblings)
5 siblings, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2008-07-30 1:29 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Andrew Morton,
linux-kernel, Linus Torvalds, Arjan van de Ven
Cc: Steven Rostedt
[-- Attachment #1: ftrace-no-trace-nmi-x86.patch --]
[-- Type: text/plain, Size: 8696 bytes --]
The dynamic ftrace code performs run time modification of the code text
section. This is not safe to do unless all other CPUS are halted. Because
there is no good way to halt NMIs while doing the modification, we must
make sure that the NMIs will not execute code that will be modified.
This patch adds notrace annotation to functions called by NMIs in x86.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
CC: Thomas Gleixner <tglx@linutronix.de>
---
arch/x86/kernel/Makefile | 6 ++++++
arch/x86/kernel/cpu/perfctr-watchdog.c | 13 +++++++------
arch/x86/kernel/traps_32.c | 3 ++-
arch/x86/kernel/traps_64.c | 3 ++-
arch/x86/mm/kmemcheck/smp.c | 2 +-
arch/x86/mm/kmmio.c | 9 +++++----
arch/x86/oprofile/Makefile | 5 +++++
include/asm-x86/apic.h | 8 ++++----
8 files changed, 32 insertions(+), 17 deletions(-)
Index: linux-tip.git/arch/x86/kernel/Makefile
===================================================================
--- linux-tip.git.orig/arch/x86/kernel/Makefile 2008-07-29 19:56:58.000000000 -0400
+++ linux-tip.git/arch/x86/kernel/Makefile 2008-07-29 19:57:52.000000000 -0400
@@ -11,6 +11,12 @@ ifdef CONFIG_FTRACE
CFLAGS_REMOVE_tsc.o = -pg
CFLAGS_REMOVE_rtc.o = -pg
CFLAGS_REMOVE_paravirt-spinlocks.o = -pg
+CFLAGS_REMOVE_kgdb.o = -pg
+CFLAGS_REMOVE_kprobes.o = -pg
+ifdef CONFIG_DYNAMIC_FTRACE
+CFLAGS_REMOVE_nmi.o = -pg
+CFLAGS_REMOVE_crash.o = -pg
+endif
endif
#
Index: linux-tip.git/arch/x86/kernel/traps_32.c
===================================================================
--- linux-tip.git.orig/arch/x86/kernel/traps_32.c 2008-07-29 19:56:58.000000000 -0400
+++ linux-tip.git/arch/x86/kernel/traps_32.c 2008-07-29 19:57:53.000000000 -0400
@@ -389,7 +389,8 @@ unsigned __kprobes long oops_begin(void)
return flags;
}
-void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
+void notrace __kprobes
+oops_end(unsigned long flags, struct pt_regs *regs, int signr)
{
bust_spinlocks(0);
die_owner = -1;
Index: linux-tip.git/arch/x86/kernel/traps_64.c
===================================================================
--- linux-tip.git.orig/arch/x86/kernel/traps_64.c 2008-07-29 19:56:58.000000000 -0400
+++ linux-tip.git/arch/x86/kernel/traps_64.c 2008-07-29 19:57:53.000000000 -0400
@@ -506,7 +506,8 @@ unsigned __kprobes long oops_begin(void)
return flags;
}
-void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
+void __kprobes notrace
+oops_end(unsigned long flags, struct pt_regs *regs, int signr)
{
die_owner = -1;
bust_spinlocks(0);
Index: linux-tip.git/arch/x86/mm/kmemcheck/smp.c
===================================================================
--- linux-tip.git.orig/arch/x86/mm/kmemcheck/smp.c 2008-07-29 19:56:58.000000000 -0400
+++ linux-tip.git/arch/x86/mm/kmemcheck/smp.c 2008-07-29 19:57:53.000000000 -0400
@@ -13,7 +13,7 @@ static atomic_t nmi_wait;
static atomic_t nmi_resume;
static atomic_t paused;
-static int nmi_notifier(struct notifier_block *self,
+static int notrace nmi_notifier(struct notifier_block *self,
unsigned long val, void *data)
{
if (val != DIE_NMI_IPI || !atomic_read(&nmi_wait))
Index: linux-tip.git/arch/x86/mm/kmmio.c
===================================================================
--- linux-tip.git.orig/arch/x86/mm/kmmio.c 2008-07-29 19:56:58.000000000 -0400
+++ linux-tip.git/arch/x86/mm/kmmio.c 2008-07-29 19:57:53.000000000 -0400
@@ -147,7 +147,7 @@ static void set_page_present(unsigned lo
}
/** Mark the given page as not present. Access to it will trigger a fault. */
-static void arm_kmmio_fault_page(unsigned long page, unsigned int *pglevel)
+static notrace void arm_kmmio_fault_page(unsigned long page, unsigned int *pglevel)
{
set_page_present(page & PAGE_MASK, false, pglevel);
}
@@ -269,7 +269,8 @@ no_kmmio:
* and they remain disabled thorough out this function.
* This must always get called as the pair to kmmio_handler().
*/
-static int post_kmmio_handler(unsigned long condition, struct pt_regs *regs)
+static notrace int
+post_kmmio_handler(unsigned long condition, struct pt_regs *regs)
{
int ret = 0;
struct kmmio_context *ctx = &get_cpu_var(kmmio_ctx);
@@ -484,8 +485,8 @@ void unregister_kmmio_probe(struct kmmio
}
EXPORT_SYMBOL(unregister_kmmio_probe);
-static int kmmio_die_notifier(struct notifier_block *nb, unsigned long val,
- void *args)
+static int notrace
+kmmio_die_notifier(struct notifier_block *nb, unsigned long val, void *args)
{
struct die_args *arg = args;
Index: linux-tip.git/arch/x86/oprofile/Makefile
===================================================================
--- linux-tip.git.orig/arch/x86/oprofile/Makefile 2008-07-29 19:56:58.000000000 -0400
+++ linux-tip.git/arch/x86/oprofile/Makefile 2008-07-29 19:57:53.000000000 -0400
@@ -6,6 +6,11 @@ DRIVER_OBJS = $(addprefix ../../../drive
oprofilefs.o oprofile_stats.o \
timer_int.o )
+ifdef CONFIG_DYNAMIC_FTRACE
+CFLAGS_REMOVE_nmi_int.o = -pg
+CFLAGS_REMOVE_nmi_timer_int.o = -pg
+endif
+
oprofile-y := $(DRIVER_OBJS) init.o backtrace.o
oprofile-$(CONFIG_X86_LOCAL_APIC) += nmi_int.o op_model_amd.o \
op_model_ppro.o op_model_p4.o
Index: linux-tip.git/arch/x86/kernel/cpu/perfctr-watchdog.c
===================================================================
--- linux-tip.git.orig/arch/x86/kernel/cpu/perfctr-watchdog.c 2008-07-29 20:02:19.000000000 -0400
+++ linux-tip.git/arch/x86/kernel/cpu/perfctr-watchdog.c 2008-07-29 20:02:36.000000000 -0400
@@ -243,7 +243,7 @@ static unsigned int adjust_for_32bit_ctr
return retval;
}
-static void write_watchdog_counter(unsigned int perfctr_msr,
+static notrace void write_watchdog_counter(unsigned int perfctr_msr,
const char *descr, unsigned nmi_hz)
{
u64 count = (u64)cpu_khz * 1000;
@@ -254,7 +254,7 @@ static void write_watchdog_counter(unsig
wrmsrl(perfctr_msr, 0 - count);
}
-static void write_watchdog_counter32(unsigned int perfctr_msr,
+static notrace void write_watchdog_counter32(unsigned int perfctr_msr,
const char *descr, unsigned nmi_hz)
{
u64 count = (u64)cpu_khz * 1000;
@@ -330,7 +330,8 @@ static void single_msr_unreserve(void)
release_perfctr_nmi(wd_ops->perfctr);
}
-static void single_msr_rearm(struct nmi_watchdog_ctlblk *wd, unsigned nmi_hz)
+static notrace void
+single_msr_rearm(struct nmi_watchdog_ctlblk *wd, unsigned nmi_hz)
{
/* start the cycle over again */
write_watchdog_counter(wd->perfctr_msr, NULL, nmi_hz);
@@ -389,7 +390,7 @@ static int setup_p6_watchdog(unsigned nm
return 1;
}
-static void p6_rearm(struct nmi_watchdog_ctlblk *wd, unsigned nmi_hz)
+static notrace void p6_rearm(struct nmi_watchdog_ctlblk *wd, unsigned nmi_hz)
{
/*
* P6 based Pentium M need to re-unmask
@@ -541,7 +542,7 @@ static void p4_unreserve(void)
release_perfctr_nmi(MSR_P4_IQ_PERFCTR0);
}
-static void p4_rearm(struct nmi_watchdog_ctlblk *wd, unsigned nmi_hz)
+static notrace void p4_rearm(struct nmi_watchdog_ctlblk *wd, unsigned nmi_hz)
{
unsigned dummy;
/*
@@ -716,7 +717,7 @@ unsigned lapic_adjust_nmi_hz(unsigned hz
return hz;
}
-int lapic_wd_event(unsigned nmi_hz)
+notrace int lapic_wd_event(unsigned nmi_hz)
{
struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
u64 ctr;
Index: linux-tip.git/include/asm-x86/apic.h
===================================================================
--- linux-tip.git.orig/include/asm-x86/apic.h 2008-07-29 20:02:19.000000000 -0400
+++ linux-tip.git/include/asm-x86/apic.h 2008-07-29 20:02:37.000000000 -0400
@@ -60,7 +60,7 @@ extern u64 xapic_icr_read(void);
extern void xapic_icr_write(u32, u32);
extern int setup_profiling_timer(unsigned int);
-static inline void native_apic_mem_write(u32 reg, u32 v)
+static inline notrace void native_apic_mem_write(u32 reg, u32 v)
{
volatile u32 *addr = (volatile u32 *)(APIC_BASE + reg);
@@ -69,12 +69,12 @@ static inline void native_apic_mem_write
ASM_OUTPUT2("0" (v), "m" (*addr)));
}
-static inline u32 native_apic_mem_read(u32 reg)
+static inline notrace u32 native_apic_mem_read(u32 reg)
{
return *((volatile u32 *)(APIC_BASE + reg));
}
-static inline void native_apic_msr_write(u32 reg, u32 v)
+static inline notrace void native_apic_msr_write(u32 reg, u32 v)
{
if (reg == APIC_DFR || reg == APIC_ID || reg == APIC_LDR ||
reg == APIC_LVR)
@@ -83,7 +83,7 @@ static inline void native_apic_msr_write
wrmsr(APIC_BASE_MSR + (reg >> 4), v, 0);
}
-static inline u32 native_apic_msr_read(u32 reg)
+static inline notrace u32 native_apic_msr_read(u32 reg)
{
u32 low, high;
--
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 2/5] ftrace: do not trace nmi callers in drivers
2008-07-30 1:29 [PATCH 0/5] ftrace: do not trace NMI handlers Steven Rostedt
2008-07-30 1:29 ` [PATCH 1/5] ftrace: do not trace nmi callers in x86 Steven Rostedt
@ 2008-07-30 1:29 ` Steven Rostedt
2008-07-30 1:29 ` [PATCH 3/5] ftrace: do not trace nmi callers in RCU Steven Rostedt
` (3 subsequent siblings)
5 siblings, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2008-07-30 1:29 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Andrew Morton,
linux-kernel, Linus Torvalds, Arjan van de Ven
Cc: Steven Rostedt, Corey Minyard, Thomas Mingarelli
[-- Attachment #1: ftrace-notrace-nmi-drivers.patch --]
[-- Type: text/plain, Size: 2705 bytes --]
The dynamic ftrace code performs run time modification of the code text
section. This is not safe to do unless all other CPUS are halted. Because
there is no good way to halt NMIs while doing the modification, we must
make sure that the NMIs will not execute code that will be modified.
This patch adds notrace annotation to functions called by NMIs in the
drivers section.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
CC: Corey Minyard <minyard@acm.org>
CC: Thomas Mingarelli <thomas.mingarelli@hp.com>
---
drivers/char/ipmi/ipmi_watchdog.c | 2 +-
drivers/misc/sgi-xp/xpc_main.c | 4 ++--
drivers/watchdog/hpwdt.c | 5 +++--
3 files changed, 6 insertions(+), 5 deletions(-)
Index: linux-tip.git/drivers/char/ipmi/ipmi_watchdog.c
===================================================================
--- linux-tip.git.orig/drivers/char/ipmi/ipmi_watchdog.c 2008-07-29 20:03:09.000000000 -0400
+++ linux-tip.git/drivers/char/ipmi/ipmi_watchdog.c 2008-07-29 20:03:20.000000000 -0400
@@ -1051,7 +1051,7 @@ static void ipmi_unregister_watchdog(int
}
#ifdef HAVE_DIE_NMI
-static int
+static notrace int
ipmi_nmi(struct notifier_block *self, unsigned long val, void *data)
{
struct die_args *args = data;
Index: linux-tip.git/drivers/misc/sgi-xp/xpc_main.c
===================================================================
--- linux-tip.git.orig/drivers/misc/sgi-xp/xpc_main.c 2008-07-29 20:03:09.000000000 -0400
+++ linux-tip.git/drivers/misc/sgi-xp/xpc_main.c 2008-07-29 20:03:20.000000000 -0400
@@ -1004,7 +1004,7 @@ xpc_system_reboot(struct notifier_block
/*
* Notify other partitions to disengage from all references to our memory.
*/
-static void
+static notrace void
xpc_die_disengage(void)
{
struct xpc_partition *part;
@@ -1083,7 +1083,7 @@ xpc_die_disengage(void)
* for a time. In this case we need to notify other partitions to not worry
* about the lack of a heartbeat.
*/
-static int
+static notrace int
xpc_system_die(struct notifier_block *nb, unsigned long event, void *unused)
{
switch (event) {
Index: linux-tip.git/drivers/watchdog/hpwdt.c
===================================================================
--- linux-tip.git.orig/drivers/watchdog/hpwdt.c 2008-07-29 20:03:09.000000000 -0400
+++ linux-tip.git/drivers/watchdog/hpwdt.c 2008-07-29 20:03:20.000000000 -0400
@@ -417,8 +417,9 @@ static int __devinit detect_cru_service(
/*
* NMI Handler
*/
-static int hpwdt_pretimeout(struct notifier_block *nb, unsigned long ulReason,
- void *data)
+static notrace int
+hpwdt_pretimeout(struct notifier_block *nb,
+ unsigned long ulReason, void *data)
{
static unsigned long rom_pl;
static int die_nmi_called;
--
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 3/5] ftrace: do not trace nmi callers in RCU
2008-07-30 1:29 [PATCH 0/5] ftrace: do not trace NMI handlers Steven Rostedt
2008-07-30 1:29 ` [PATCH 1/5] ftrace: do not trace nmi callers in x86 Steven Rostedt
2008-07-30 1:29 ` [PATCH 2/5] ftrace: do not trace nmi callers in drivers Steven Rostedt
@ 2008-07-30 1:29 ` Steven Rostedt
2008-07-30 1:29 ` [PATCH 4/5] ftrace: do not trace nmi callers in kernel directory Steven Rostedt
` (2 subsequent siblings)
5 siblings, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2008-07-30 1:29 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Andrew Morton,
linux-kernel, Linus Torvalds, Arjan van de Ven
Cc: Steven Rostedt, Paul E. McKenney
[-- Attachment #1: ftrace-no-trace-rcu-nmi.patch --]
[-- Type: text/plain, Size: 1894 bytes --]
The dynamic ftrace code performs run time modification of the code text
section. This is not safe to do unless all other CPUS are halted. Because
there is no good way to halt NMIs while doing the modification, we must
make sure that the NMIs will not execute code that will be modified.
This patch adds notrace annotation to functions called by NMIs in RCU.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
kernel/rcupreempt.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
Index: linux-tip.git/kernel/rcupreempt.c
===================================================================
--- linux-tip.git.orig/kernel/rcupreempt.c 2008-07-29 20:12:05.000000000 -0400
+++ linux-tip.git/kernel/rcupreempt.c 2008-07-29 20:13:52.000000000 -0400
@@ -240,7 +240,7 @@ long rcu_batches_completed(void)
}
EXPORT_SYMBOL_GPL(rcu_batches_completed);
-void __rcu_read_lock(void)
+notrace void __rcu_read_lock(void)
{
int idx;
struct task_struct *t = current;
@@ -306,7 +306,7 @@ void __rcu_read_lock(void)
}
EXPORT_SYMBOL_GPL(__rcu_read_lock);
-void __rcu_read_unlock(void)
+notrace void __rcu_read_unlock(void)
{
int idx;
struct task_struct *t = current;
@@ -448,7 +448,7 @@ static DEFINE_PER_CPU(int, rcu_update_fl
* rcu_dyntick_sched.dynticks to let the RCU handling know that the
* CPU is active.
*/
-void rcu_irq_enter(void)
+notrace void rcu_irq_enter(void)
{
int cpu = smp_processor_id();
struct rcu_dyntick_sched *rdssp = &per_cpu(rcu_dyntick_sched, cpu);
@@ -518,7 +518,7 @@ void rcu_irq_enter(void)
* rcu_dyntick_sched.dynticks to put let the RCU handling be
* aware that the CPU is going back to idle with no ticks.
*/
-void rcu_irq_exit(void)
+notrace void rcu_irq_exit(void)
{
int cpu = smp_processor_id();
struct rcu_dyntick_sched *rdssp = &per_cpu(rcu_dyntick_sched, cpu);
--
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 4/5] ftrace: do not trace nmi callers in kernel directory
2008-07-30 1:29 [PATCH 0/5] ftrace: do not trace NMI handlers Steven Rostedt
` (2 preceding siblings ...)
2008-07-30 1:29 ` [PATCH 3/5] ftrace: do not trace nmi callers in RCU Steven Rostedt
@ 2008-07-30 1:29 ` Steven Rostedt
2008-07-30 1:29 ` [PATCH 5/5] ftrace: warn on NMI calling code that may be modified Steven Rostedt
2008-07-30 7:28 ` [PATCH 0/5] ftrace: do not trace NMI handlers Peter Zijlstra
5 siblings, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2008-07-30 1:29 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Andrew Morton,
linux-kernel, Linus Torvalds, Arjan van de Ven
Cc: Steven Rostedt
[-- Attachment #1: ftrace-no-trace-nmi-kernel.patch --]
[-- Type: text/plain, Size: 1382 bytes --]
The dynamic ftrace code performs run time modification of the code text
section. This is not safe to do unless all other CPUS are halted. Because
there is no good way to halt NMIs while doing the modification, we must
make sure that the NMIs will not execute code that will be modified.
This patch adds notrace annotation to functions called by NMIs in the
kernel directory.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
kernel/Makefile | 7 +++++++
1 file changed, 7 insertions(+)
Index: linux-tip.git/kernel/Makefile
===================================================================
--- linux-tip.git.orig/kernel/Makefile 2008-07-29 20:13:14.000000000 -0400
+++ linux-tip.git/kernel/Makefile 2008-07-29 20:33:51.000000000 -0400
@@ -15,6 +15,8 @@ CFLAGS_REMOVE_sched.o = -mno-spe
ifdef CONFIG_FTRACE
# Do not trace debug files and internal ftrace files
+CFLAGS_REMOVE_kprobes.o = -pg
+CFLAGS_REMOVE_test_kprobes.o = -pg
CFLAGS_REMOVE_lockdep.o = -pg
CFLAGS_REMOVE_lockdep_proc.o = -pg
CFLAGS_REMOVE_mutex-debug.o = -pg
@@ -22,6 +24,11 @@ CFLAGS_REMOVE_rtmutex-debug.o = -pg
CFLAGS_REMOVE_cgroup-debug.o = -pg
CFLAGS_REMOVE_sched_clock.o = -pg
CFLAGS_REMOVE_sched.o = -mno-spe -pg
+ifdef CONFIG_DYNAMIC_FTRACE
+# NMI called files
+CFLAGS_REMOVE_notifier.o = -pg
+CFLAGS_REMOVE_spinlock.o = -pg
+endif
endif
obj-$(CONFIG_PROFILING) += profile.o
--
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 5/5] ftrace: warn on NMI calling code that may be modified
2008-07-30 1:29 [PATCH 0/5] ftrace: do not trace NMI handlers Steven Rostedt
` (3 preceding siblings ...)
2008-07-30 1:29 ` [PATCH 4/5] ftrace: do not trace nmi callers in kernel directory Steven Rostedt
@ 2008-07-30 1:29 ` Steven Rostedt
2008-07-30 7:28 ` [PATCH 0/5] ftrace: do not trace NMI handlers Peter Zijlstra
5 siblings, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2008-07-30 1:29 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Andrew Morton,
linux-kernel, Linus Torvalds, Arjan van de Ven
Cc: Steven Rostedt
[-- Attachment #1: ftrace-nmi-warning.patch --]
[-- Type: text/plain, Size: 2844 bytes --]
The dynamic ftrace code performs run time modification of the code text
section. This is not safe to do unless all other CPUS are halted. Because
there is no good way to halt NMIs while doing the modification, we must
make sure that the NMIs will not execute code that will be modified.
This patch adds a warning at the location that records code text that
will be modified later by the dynamic ftracer. It also disables the
modification from happening, so that the race with the NMI will not exist.
The kernel command line parameter "ftrace_keep_on_nmi" is defined to let
the user override the disabling of ftrace, at the risk that a very slight
race condition might crash the machine. The command line still outputs
the warning, but doesn't disable ftrace.
The race that this protects is very hard to hit, but this patch adds detection
that the race exists and disables the code to prevent it.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
kernel/trace/ftrace.c | 35 +++++++++++++++++++++++++++++++++++
1 file changed, 35 insertions(+)
Index: linux-tip.git/kernel/trace/ftrace.c
===================================================================
--- linux-tip.git.orig/kernel/trace/ftrace.c 2008-07-28 20:15:15.000000000 -0400
+++ linux-tip.git/kernel/trace/ftrace.c 2008-07-29 17:06:55.000000000 -0400
@@ -36,6 +36,17 @@
int ftrace_enabled __read_mostly;
static int last_ftrace_enabled;
+static int ftrace_nmi_disabled __read_mostly;
+
+/* Let the user override disabling ftrace on NMI, at their own risk. */
+static int ftrace_keep_on_nmi __read_mostly;
+static int __init ftrace_keep_on_nmi_setup(char *str)
+{
+ ftrace_keep_on_nmi = 1;
+ return 1;
+}
+__setup("ftrace_keep_on_nmi", ftrace_keep_on_nmi_setup);
+
/*
* ftrace_disabled is set when an anomaly is discovered.
* ftrace_disabled is much stronger than ftrace_enabled.
@@ -341,6 +352,30 @@ ftrace_record_ip(unsigned long ip)
int atomic;
int cpu;
+ if (!ftrace_nmi_disabled && in_nmi()) {
+ ftrace_nmi_disabled = 1;
+ printk(KERN_WARNING
+ "\n--------------- cut here ---------------\n");
+ printk(KERN_WARNING
+ "WARNING: ftraced code called from NMI context ");
+ print_symbol("%s\n", ip);
+ printk(KERN_WARNING
+ " Please report this to the ftrace maintainer.\n");
+ if (!ftrace_keep_on_nmi) {
+ ftrace_disabled = 1;
+ printk(KERN_WARNING
+ " Disabling ftraced. Boot with ftrace_keep_on_nmi"
+ " to not disable.\n");
+ } else
+ printk(KERN_WARNING
+ " \"ftrace_keep_on_nmi\" set. Continuing to use"
+ " dynamic ftrace,\n"
+ " but the system may be unstable\n");
+ dump_stack();
+ printk(KERN_WARNING
+ "--------------- end cut here ---------------\n");
+ }
+
if (!ftrace_enabled || ftrace_disabled)
return;
--
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/5] ftrace: do not trace NMI handlers
2008-07-30 1:29 [PATCH 0/5] ftrace: do not trace NMI handlers Steven Rostedt
` (4 preceding siblings ...)
2008-07-30 1:29 ` [PATCH 5/5] ftrace: warn on NMI calling code that may be modified Steven Rostedt
@ 2008-07-30 7:28 ` Peter Zijlstra
2008-07-30 14:04 ` Mathieu Desnoyers
5 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2008-07-30 7:28 UTC (permalink / raw)
To: Steven Rostedt
Cc: Ingo Molnar, Thomas Gleixner, Andrew Morton, linux-kernel,
Linus Torvalds, Arjan van de Ven, Mathieu Desnoyers
On Tue, 2008-07-29 at 21:29 -0400, Steven Rostedt wrote:
> The dynamic ftrace code modifies code text at run time. Arjan informed
> me that there is no safe way to modify code text on an SMP system when
> the other CPUs might execute that same code. The reason has to do with
> pipeline caches and CPUs might do funny things if the code being pushed
> in the pipeline also happens to be modified at that same time. (Arjan
> correct me if I'm wrong here).
How does the immediate value stuff get around this issue?
> We use kstop_machine to put the system into a UP like mode. This prevents
> other CPUs from executing code while we modify it. Under stress testing
> Ingo discovered that NMIs can cause the system to crash. This was due
> to NMIs calling code that is being modified. Some boxes are more prone to
> failure than others.
>
> This series of patches performs two tasks:
>
> 1) Add notrace to functions called by NMI, or simply remove the tracing
> completely from files that are primarily used by NMI.
>
> 2) Add a warning when code that will be modified is called by an NMI.
> This also disables ftraced when it is detected, to prevent the
> race with the NMI and code modification from happneing.
>
> The warning looks something like this:
>
> --------------- cut here ---------------
> WARNING: ftraced code called from NMI context lapic_wd_event+0xd/0x65
> Please report this to the ftrace maintainer.
> Disabling ftraced. Boot with ftrace_keep_on_nmi to not disable.
> Pid: 0, comm: swapper Not tainted 2.6.26-tip #96
>
> Call Trace:
> <NMI> [<ffffffff8021c6d0>] ? lapic_wd_event+0xd/0x65
> [<ffffffff8027b9c1>] ftrace_record_ip+0xa3/0x357
> [<ffffffff8020c0f4>] mcount_call+0x5/0x31
> [<ffffffff8021c6d5>] ? lapic_wd_event+0x12/0x65
> [<ffffffff804b90d4>] nmi_watchdog_tick+0x21b/0x230
> [<ffffffff804b8487>] default_do_nmi+0x73/0x1e0
> [<ffffffff804b8a04>] do_nmi+0x64/0x91
> [<ffffffff804b80bf>] nmi+0x7f/0x80
> [<ffffffff80212c14>] ? default_idle+0x35/0x4f
> <<EOE>> [<ffffffff8020ae42>] cpu_idle+0x8a/0xc9
> [<ffffffff804b15a6>] start_secondary+0x172/0x177
>
> --------------- end cut here ---------------
>
>
> This appears once when it is caught. We are hoping that this will not
> appear often, and are running code to catch it as it does.
>
> -- Steve
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/5] ftrace: do not trace NMI handlers
2008-07-30 7:28 ` [PATCH 0/5] ftrace: do not trace NMI handlers Peter Zijlstra
@ 2008-07-30 14:04 ` Mathieu Desnoyers
2008-07-30 14:23 ` Mathieu Desnoyers
0 siblings, 1 reply; 9+ messages in thread
From: Mathieu Desnoyers @ 2008-07-30 14:04 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Steven Rostedt, Ingo Molnar, Thomas Gleixner, Andrew Morton,
linux-kernel, Linus Torvalds, Arjan van de Ven
* Peter Zijlstra (peterz@infradead.org) wrote:
> On Tue, 2008-07-29 at 21:29 -0400, Steven Rostedt wrote:
> > The dynamic ftrace code modifies code text at run time. Arjan informed
> > me that there is no safe way to modify code text on an SMP system when
> > the other CPUs might execute that same code. The reason has to do with
> > pipeline caches and CPUs might do funny things if the code being pushed
> > in the pipeline also happens to be modified at that same time. (Arjan
> > correct me if I'm wrong here).
>
> How does the immediate value stuff get around this issue?
>
By inserting a temporary breakpoint in the code about to be replaced and
by issuing an IPI to every CPU, issuing a cpuid instruction to
synchronize the cores, before reverting the breakpoint to the first byte
of the new instruction. Therefore we are sure that any CPU which could
have seen the old instruction will flush its pipeline cache (cpuid
instruction synchronizes the core) before seeing the new instruction.
Meanwhile, the breakpoint handler executes the original instruction,
dealing with non-relocatable instructions if necessary by emulating them
(e.g. by playing with the return IP). We have some constraints to
follow: every instruction being replaced must stay a valid instruction
to the CPU, because there could be preemption in the middle of the
modified area. Immediate values does not change the size of the
instructions, so we don't run in this problem, but changing
{ 0x90, 0x90, 0x90, 0x90, 0x90 } (nops)
into
{ 0xe9, 0xXX, 0xXX, 0xXX, 0xXX } (5-bytes jump)
would be problematic because a preempted IP could point right in the
middle of those nops. As a general rule, never try to combine smaller
instructions into a bigger one, except in the case of adding a
lock-prefix to an instruction : this case insures that the non-lock
prefixed instruction is still valid after the change has been done. We
could however run into a nasty non-synchronized atomic instruction use
in SMP mode if a thread happens to be scheduled out right after the lock
prefix. Hopefully the alternative code uses the refrigerator... (hrm, it
doesn't).
All that is documentented thoroughly in arch/x86/kernel/immediate.c
comments.
Mathieu
> > We use kstop_machine to put the system into a UP like mode. This prevents
> > other CPUs from executing code while we modify it. Under stress testing
> > Ingo discovered that NMIs can cause the system to crash. This was due
> > to NMIs calling code that is being modified. Some boxes are more prone to
> > failure than others.
> >
> > This series of patches performs two tasks:
> >
> > 1) Add notrace to functions called by NMI, or simply remove the tracing
> > completely from files that are primarily used by NMI.
> >
> > 2) Add a warning when code that will be modified is called by an NMI.
> > This also disables ftraced when it is detected, to prevent the
> > race with the NMI and code modification from happneing.
> >
> > The warning looks something like this:
> >
> > --------------- cut here ---------------
> > WARNING: ftraced code called from NMI context lapic_wd_event+0xd/0x65
> > Please report this to the ftrace maintainer.
> > Disabling ftraced. Boot with ftrace_keep_on_nmi to not disable.
> > Pid: 0, comm: swapper Not tainted 2.6.26-tip #96
> >
> > Call Trace:
> > <NMI> [<ffffffff8021c6d0>] ? lapic_wd_event+0xd/0x65
> > [<ffffffff8027b9c1>] ftrace_record_ip+0xa3/0x357
> > [<ffffffff8020c0f4>] mcount_call+0x5/0x31
> > [<ffffffff8021c6d5>] ? lapic_wd_event+0x12/0x65
> > [<ffffffff804b90d4>] nmi_watchdog_tick+0x21b/0x230
> > [<ffffffff804b8487>] default_do_nmi+0x73/0x1e0
> > [<ffffffff804b8a04>] do_nmi+0x64/0x91
> > [<ffffffff804b80bf>] nmi+0x7f/0x80
> > [<ffffffff80212c14>] ? default_idle+0x35/0x4f
> > <<EOE>> [<ffffffff8020ae42>] cpu_idle+0x8a/0xc9
> > [<ffffffff804b15a6>] start_secondary+0x172/0x177
> >
> > --------------- end cut here ---------------
> >
> >
> > This appears once when it is caught. We are hoping that this will not
> > appear often, and are running code to catch it as it does.
> >
> > -- Steve
> >
>
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/5] ftrace: do not trace NMI handlers
2008-07-30 14:04 ` Mathieu Desnoyers
@ 2008-07-30 14:23 ` Mathieu Desnoyers
0 siblings, 0 replies; 9+ messages in thread
From: Mathieu Desnoyers @ 2008-07-30 14:23 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Steven Rostedt, Ingo Molnar, Thomas Gleixner, Andrew Morton,
linux-kernel, Linus Torvalds, Arjan van de Ven
* Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca) wrote:
> * Peter Zijlstra (peterz@infradead.org) wrote:
> > On Tue, 2008-07-29 at 21:29 -0400, Steven Rostedt wrote:
> > > The dynamic ftrace code modifies code text at run time. Arjan informed
> > > me that there is no safe way to modify code text on an SMP system when
> > > the other CPUs might execute that same code. The reason has to do with
> > > pipeline caches and CPUs might do funny things if the code being pushed
> > > in the pipeline also happens to be modified at that same time. (Arjan
> > > correct me if I'm wrong here).
> >
> > How does the immediate value stuff get around this issue?
> >
>
> By inserting a temporary breakpoint in the code about to be replaced and
> by issuing an IPI to every CPU, issuing a cpuid instruction to
> synchronize the cores, before reverting the breakpoint to the first byte
> of the new instruction. Therefore we are sure that any CPU which could
> have seen the old instruction will flush its pipeline cache (cpuid
> instruction synchronizes the core) before seeing the new instruction.
> Meanwhile, the breakpoint handler executes the original instruction,
> dealing with non-relocatable instructions if necessary by emulating them
> (e.g. by playing with the return IP). We have some constraints to
> follow: every instruction being replaced must stay a valid instruction
> to the CPU, because there could be preemption in the middle of the
> modified area. Immediate values does not change the size of the
> instructions, so we don't run in this problem, but changing
> { 0x90, 0x90, 0x90, 0x90, 0x90 } (nops)
> into
> { 0xe9, 0xXX, 0xXX, 0xXX, 0xXX } (5-bytes jump)
> would be problematic because a preempted IP could point right in the
> middle of those nops. As a general rule, never try to combine smaller
> instructions into a bigger one, except in the case of adding a
> lock-prefix to an instruction : this case insures that the non-lock
> prefixed instruction is still valid after the change has been done. We
> could however run into a nasty non-synchronized atomic instruction use
> in SMP mode if a thread happens to be scheduled out right after the lock
> prefix. Hopefully the alternative code uses the refrigerator... (hrm, it
> doesn't).
>
Actually, alternative.c lock-prefix modification is O.K. for spinlocks
because they execute with preemption off, but not for other atomic
operations which may execute with preemption on.
Mathieu
> All that is documentented thoroughly in arch/x86/kernel/immediate.c
> comments.
>
> Mathieu
>
>
>
> > > We use kstop_machine to put the system into a UP like mode. This prevents
> > > other CPUs from executing code while we modify it. Under stress testing
> > > Ingo discovered that NMIs can cause the system to crash. This was due
> > > to NMIs calling code that is being modified. Some boxes are more prone to
> > > failure than others.
> > >
> > > This series of patches performs two tasks:
> > >
> > > 1) Add notrace to functions called by NMI, or simply remove the tracing
> > > completely from files that are primarily used by NMI.
> > >
> > > 2) Add a warning when code that will be modified is called by an NMI.
> > > This also disables ftraced when it is detected, to prevent the
> > > race with the NMI and code modification from happneing.
> > >
> > > The warning looks something like this:
> > >
> > > --------------- cut here ---------------
> > > WARNING: ftraced code called from NMI context lapic_wd_event+0xd/0x65
> > > Please report this to the ftrace maintainer.
> > > Disabling ftraced. Boot with ftrace_keep_on_nmi to not disable.
> > > Pid: 0, comm: swapper Not tainted 2.6.26-tip #96
> > >
> > > Call Trace:
> > > <NMI> [<ffffffff8021c6d0>] ? lapic_wd_event+0xd/0x65
> > > [<ffffffff8027b9c1>] ftrace_record_ip+0xa3/0x357
> > > [<ffffffff8020c0f4>] mcount_call+0x5/0x31
> > > [<ffffffff8021c6d5>] ? lapic_wd_event+0x12/0x65
> > > [<ffffffff804b90d4>] nmi_watchdog_tick+0x21b/0x230
> > > [<ffffffff804b8487>] default_do_nmi+0x73/0x1e0
> > > [<ffffffff804b8a04>] do_nmi+0x64/0x91
> > > [<ffffffff804b80bf>] nmi+0x7f/0x80
> > > [<ffffffff80212c14>] ? default_idle+0x35/0x4f
> > > <<EOE>> [<ffffffff8020ae42>] cpu_idle+0x8a/0xc9
> > > [<ffffffff804b15a6>] start_secondary+0x172/0x177
> > >
> > > --------------- end cut here ---------------
> > >
> > >
> > > This appears once when it is caught. We are hoping that this will not
> > > appear often, and are running code to catch it as it does.
> > >
> > > -- Steve
> > >
> >
>
> --
> Mathieu Desnoyers
> OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2008-07-30 14:23 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-07-30 1:29 [PATCH 0/5] ftrace: do not trace NMI handlers Steven Rostedt
2008-07-30 1:29 ` [PATCH 1/5] ftrace: do not trace nmi callers in x86 Steven Rostedt
2008-07-30 1:29 ` [PATCH 2/5] ftrace: do not trace nmi callers in drivers Steven Rostedt
2008-07-30 1:29 ` [PATCH 3/5] ftrace: do not trace nmi callers in RCU Steven Rostedt
2008-07-30 1:29 ` [PATCH 4/5] ftrace: do not trace nmi callers in kernel directory Steven Rostedt
2008-07-30 1:29 ` [PATCH 5/5] ftrace: warn on NMI calling code that may be modified Steven Rostedt
2008-07-30 7:28 ` [PATCH 0/5] ftrace: do not trace NMI handlers Peter Zijlstra
2008-07-30 14:04 ` Mathieu Desnoyers
2008-07-30 14:23 ` Mathieu Desnoyers
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox