* [PATCH 0/2] [GIT PULL] updates for event tester and lockdep tracer
@ 2009-04-16 16:15 Steven Rostedt
2009-04-16 16:15 ` [PATCH 1/2] tracing/events: perform function tracing in event selftests Steven Rostedt
` (2 more replies)
0 siblings, 3 replies; 30+ messages in thread
From: Steven Rostedt @ 2009-04-16 16:15 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Frederic Weisbecker
Ingo,

I did not add your delay.h fix, since I was not sure whether you had
already applied it (I saw a commit SHA1 in the email).

I actually wrote the function trace event tester last night, but held
off on sending it because it would sometimes lock up the system on
bootup.

I also found that the lockdep tracepoints, at their current location,
would lock up my system as well. Once I moved them inside the
current->lockdep_recursion protection, everything ran stably.

With the second patch applied, I ran with the lockdep tracepoints
enabled all night. Without it, the system would lock up within a few
minutes.
Please pull the latest tip/tracing/core-v2 tree, which can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
tip/tracing/core-v2
Steven Rostedt (2):
tracing/events: perform function tracing in event selftests
tracing/events/lockdep: move tracepoints within recursive protection
---
kernel/lockdep.c | 14 ++++----
kernel/trace/trace_events.c | 80 +++++++++++++++++++++++++++++++++++++++----
2 files changed, 80 insertions(+), 14 deletions(-)
--
* [PATCH 1/2] tracing/events: perform function tracing in event selftests
2009-04-16 16:15 [PATCH 0/2] [GIT PULL] updates for event tester and lockdep tracer Steven Rostedt
@ 2009-04-16 16:15 ` Steven Rostedt
2009-04-17 16:10 ` [tip:tracing/core] " tip-bot for Steven Rostedt
2009-04-16 16:15 ` [PATCH 2/2] tracing/events/lockdep: move tracepoints within recursive protection Steven Rostedt
2009-04-16 16:40 ` [PATCH 0/2] [GIT PULL] updates for event tester and lockdep tracer Ingo Molnar
2 siblings, 1 reply; 30+ messages in thread
From: Steven Rostedt @ 2009-04-16 16:15 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Frederic Weisbecker

[-- Attachment #1: 0001-tracing-events-perform-function-tracing-in-event-se.patch --]
[-- Type: text/plain, Size: 3719 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

We can find some bugs in the trace events if we stress the writes as
well. The function tracer is a good way to stress the events.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/trace_events.c |   80 +++++++++++++++++++++++++++++++++++++++----
 1 files changed, 73 insertions(+), 7 deletions(-)

diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 96934f0..dd3eb59 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -1015,7 +1015,7 @@ static __init void event_test_stuff(void)
  * For every trace event defined, we will test each trace point separately,
  * and then by groups, and finally all trace points.
  */
-static __init int event_trace_self_tests(void)
+static __init void event_trace_self_tests(void)
 {
 	struct ftrace_event_call *call;
 	struct event_subsystem *system;
@@ -1069,7 +1069,7 @@ static __init int event_trace_self_tests(void)
 		sysname = kstrdup(system->name, GFP_KERNEL);
 		if (WARN_ON(!sysname)) {
 			pr_warning("Can't allocate memory, giving up!\n");
-			return 0;
+			return;
 		}
 		ret = ftrace_set_clr_event(sysname, 1);
 		kfree(sysname);
@@ -1084,7 +1084,7 @@ static __init int event_trace_self_tests(void)
 		sysname = kstrdup(system->name, GFP_KERNEL);
 		if (WARN_ON(!sysname)) {
 			pr_warning("Can't allocate memory, giving up!\n");
-			return 0;
+			return;
 		}
 		ret = ftrace_set_clr_event(sysname, 0);
 		kfree(sysname);
@@ -1104,14 +1104,14 @@ static __init int event_trace_self_tests(void)
 	sysname = kmalloc(4, GFP_KERNEL);
 	if (WARN_ON(!sysname)) {
 		pr_warning("Can't allocate memory, giving up!\n");
-		return 0;
+		return;
 	}
 	memcpy(sysname, "*:*", 4);
 	ret = ftrace_set_clr_event(sysname, 1);
 	if (WARN_ON_ONCE(ret)) {
 		kfree(sysname);
 		pr_warning("error enabling all events\n");
-		return 0;
+		return;
 	}
 
 	event_test_stuff();
@@ -1123,14 +1123,80 @@ static __init int event_trace_self_tests(void)
 
 	if (WARN_ON_ONCE(ret)) {
 		pr_warning("error disabling all events\n");
-		return 0;
+		return;
 	}
 
 	pr_cont("OK\n");
+}
+
+#ifdef CONFIG_FUNCTION_TRACER
+
+static DEFINE_PER_CPU(atomic_t, test_event_disable);
+
+static void
+function_test_events_call(unsigned long ip, unsigned long parent_ip)
+{
+	struct ring_buffer_event *event;
+	struct ftrace_entry *entry;
+	unsigned long flags;
+	long disabled;
+	int resched;
+	int cpu;
+	int pc;
+
+	pc = preempt_count();
+	resched = ftrace_preempt_disable();
+	cpu = raw_smp_processor_id();
+	disabled = atomic_inc_return(&per_cpu(test_event_disable, cpu));
+
+	if (disabled != 1)
+		goto out;
+
+	local_save_flags(flags);
+
+	event = trace_current_buffer_lock_reserve(TRACE_FN, sizeof(*entry),
+						  flags, pc);
+	if (!event)
+		goto out;
+	entry = ring_buffer_event_data(event);
+	entry->ip = ip;
+	entry->parent_ip = parent_ip;
+
+	trace_current_buffer_unlock_commit(event, flags, pc);
+
+ out:
+	atomic_dec(&per_cpu(test_event_disable, cpu));
+	ftrace_preempt_enable(resched);
+}
+
+static struct ftrace_ops trace_ops __initdata =
+{
+	.func = function_test_events_call,
+};
+
+static __init void event_trace_self_test_with_function(void)
+{
+	register_ftrace_function(&trace_ops);
+	pr_info("Running tests again, along with the function tracer\n");
+	event_trace_self_tests();
+	unregister_ftrace_function(&trace_ops);
+}
+#else
+static __init void event_trace_self_test_with_function(void)
+{
+}
+#endif
+
+static __init int event_trace_self_tests_init(void)
+{
+
+	event_trace_self_tests();
+
+	event_trace_self_test_with_function();
 
 	return 0;
 }
 
-late_initcall(event_trace_self_tests);
+late_initcall(event_trace_self_tests_init);
 
 #endif
-- 
1.6.2.1
* [tip:tracing/core] tracing/events: perform function tracing in event selftests
2009-04-16 16:15 ` [PATCH 1/2] tracing/events: perform function tracing in event selftests Steven Rostedt
@ 2009-04-17 16:10 ` tip-bot for Steven Rostedt
0 siblings, 0 replies; 30+ messages in thread
From: tip-bot for Steven Rostedt @ 2009-04-17 16:10 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, hpa, mingo, peterz, fweisbec, rostedt, akpm,
srostedt, tglx, mingo

Commit-ID:  9ea21c1ecdb35ecdcac5fd9d95f62a1f6a7ffec0
Gitweb:     http://git.kernel.org/tip/9ea21c1ecdb35ecdcac5fd9d95f62a1f6a7ffec0
Author:     Steven Rostedt <srostedt@redhat.com>
AuthorDate: Thu, 16 Apr 2009 12:15:44 -0400
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 17 Apr 2009 17:10:35 +0200

tracing/events: perform function tracing in event selftests

We can find some bugs in the trace events if we stress the writes as
well. The function tracer is a good way to stress the events.

[ Impact: extend scope of event tracer self-tests ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20090416161746.604786131@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/trace/trace_events.c |   78 +++++++++++++++++++++++++++++++++++++++---
 1 files changed, 72 insertions(+), 6 deletions(-)

diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 7163a2b..1137f95 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -1017,7 +1017,7 @@ static __init void event_test_stuff(void)
  * For every trace event defined, we will test each trace point separately,
  * and then by groups, and finally all trace points.
  */
-static __init int event_trace_self_tests(void)
+static __init void event_trace_self_tests(void)
 {
 	struct ftrace_event_call *call;
 	struct event_subsystem *system;
@@ -1071,7 +1071,7 @@ static __init int event_trace_self_tests(void)
 		sysname = kstrdup(system->name, GFP_KERNEL);
 		if (WARN_ON(!sysname)) {
 			pr_warning("Can't allocate memory, giving up!\n");
-			return 0;
+			return;
 		}
 		ret = ftrace_set_clr_event(sysname, 1);
 		kfree(sysname);
@@ -1086,7 +1086,7 @@ static __init int event_trace_self_tests(void)
 		sysname = kstrdup(system->name, GFP_KERNEL);
 		if (WARN_ON(!sysname)) {
 			pr_warning("Can't allocate memory, giving up!\n");
-			return 0;
+			return;
 		}
 		ret = ftrace_set_clr_event(sysname, 0);
 		kfree(sysname);
@@ -1106,14 +1106,14 @@ static __init int event_trace_self_tests(void)
 	sysname = kmalloc(4, GFP_KERNEL);
 	if (WARN_ON(!sysname)) {
 		pr_warning("Can't allocate memory, giving up!\n");
-		return 0;
+		return;
 	}
 	memcpy(sysname, "*:*", 4);
 	ret = ftrace_set_clr_event(sysname, 1);
 	if (WARN_ON_ONCE(ret)) {
 		kfree(sysname);
 		pr_warning("error enabling all events\n");
-		return 0;
+		return;
 	}
 
 	event_test_stuff();
@@ -1125,10 +1125,76 @@ static __init int event_trace_self_tests(void)
 
 	if (WARN_ON_ONCE(ret)) {
 		pr_warning("error disabling all events\n");
-		return 0;
+		return;
 	}
 
 	pr_cont("OK\n");
+}
+
+#ifdef CONFIG_FUNCTION_TRACER
+
+static DEFINE_PER_CPU(atomic_t, test_event_disable);
+
+static void
+function_test_events_call(unsigned long ip, unsigned long parent_ip)
+{
+	struct ring_buffer_event *event;
+	struct ftrace_entry *entry;
+	unsigned long flags;
+	long disabled;
+	int resched;
+	int cpu;
+	int pc;
+
+	pc = preempt_count();
+	resched = ftrace_preempt_disable();
+	cpu = raw_smp_processor_id();
+	disabled = atomic_inc_return(&per_cpu(test_event_disable, cpu));
+
+	if (disabled != 1)
+		goto out;
+
+	local_save_flags(flags);
+
+	event = trace_current_buffer_lock_reserve(TRACE_FN, sizeof(*entry),
+						  flags, pc);
+	if (!event)
+		goto out;
+	entry = ring_buffer_event_data(event);
+	entry->ip = ip;
+	entry->parent_ip = parent_ip;
+
+	trace_current_buffer_unlock_commit(event, flags, pc);
+
+ out:
+	atomic_dec(&per_cpu(test_event_disable, cpu));
+	ftrace_preempt_enable(resched);
+}
+
+static struct ftrace_ops trace_ops __initdata =
+{
+	.func = function_test_events_call,
+};
+
+static __init void event_trace_self_test_with_function(void)
+{
+	register_ftrace_function(&trace_ops);
+	pr_info("Running tests again, along with the function tracer\n");
+	event_trace_self_tests();
+	unregister_ftrace_function(&trace_ops);
+}
+#else
+static __init void event_trace_self_test_with_function(void)
+{
+}
+#endif
+
+static __init int event_trace_self_tests_init(void)
+{
+
+	event_trace_self_tests();
+
+	event_trace_self_test_with_function();
 
 	return 0;
 }
* [PATCH 2/2] tracing/events/lockdep: move tracepoints within recursive protection
2009-04-16 16:15 [PATCH 0/2] [GIT PULL] updates for event tester and lockdep tracer Steven Rostedt
2009-04-16 16:15 ` [PATCH 1/2] tracing/events: perform function tracing in event selftests Steven Rostedt
@ 2009-04-16 16:15 ` Steven Rostedt
2009-04-16 16:47 ` Peter Zijlstra
2009-04-16 16:40 ` [PATCH 0/2] [GIT PULL] updates for event tester and lockdep tracer Ingo Molnar
2 siblings, 1 reply; 30+ messages in thread
From: Steven Rostedt @ 2009-04-16 16:15 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Frederic Weisbecker, Peter Zijlstra

[-- Attachment #1: 0002-tracing-events-lockdep-move-tracepoints-within-recu.patch --]
[-- Type: text/plain, Size: 1696 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

With the current location of the tracepoints in lockdep, the system
can hard lockup in minutes when the tracepoints are enabled.

Moving the tracepoints to be inside the lockdep recursion protection
solves the issue.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/lockdep.c |   14 +++++++-------
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 47b201e..53b887a 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -2941,11 +2941,11 @@ void lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 {
 	unsigned long flags;
 
-	trace_lock_acquire(lock, subclass, trylock, read, check, nest_lock, ip);
-
 	if (unlikely(current->lockdep_recursion))
 		return;
 
+	trace_lock_acquire(lock, subclass, trylock, read, check, nest_lock, ip);
+
 	raw_local_irq_save(flags);
 	check_flags(flags);
 
@@ -2962,11 +2962,11 @@ void lock_release(struct lockdep_map *lock, int nested,
 {
 	unsigned long flags;
 
-	trace_lock_release(lock, nested, ip);
-
 	if (unlikely(current->lockdep_recursion))
 		return;
 
+	trace_lock_release(lock, nested, ip);
+
 	raw_local_irq_save(flags);
 	check_flags(flags);
 	current->lockdep_recursion = 1;
@@ -3118,14 +3118,14 @@ void lock_contended(struct lockdep_map *lock, unsigned long ip)
 {
 	unsigned long flags;
 
+	if (unlikely(current->lockdep_recursion))
+		return;
+
 	trace_lock_contended(lock, ip);
 	if (unlikely(!lock_stat))
 		return;
 
-	if (unlikely(current->lockdep_recursion))
-		return;
-
 	raw_local_irq_save(flags);
 	check_flags(flags);
 	current->lockdep_recursion = 1;
-- 
1.6.2.1
* Re: [PATCH 2/2] tracing/events/lockdep: move tracepoints within recursive protection
2009-04-16 16:15 ` [PATCH 2/2] tracing/events/lockdep: move tracepoints within recursive protection Steven Rostedt
@ 2009-04-16 16:47 ` Peter Zijlstra
2009-04-16 17:03 ` Steven Rostedt
2009-04-16 17:45 ` Steven Rostedt
0 siblings, 2 replies; 30+ messages in thread
From: Peter Zijlstra @ 2009-04-16 16:47 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
Frederic Weisbecker

On Thu, 2009-04-16 at 12:15 -0400, Steven Rostedt wrote:
> plain text document attachment
> (0002-tracing-events-lockdep-move-tracepoints-within-recu.patch)
> From: Steven Rostedt <srostedt@redhat.com>
>
> With the current location of the tracepoints in lockdep, the system
> can hard lockup in minutes when the tracepoints are enabled.
>
> Moving the tracepoints to be inside the lockdep recursion protection
> solves the issue.

NAK

The idea is to eventually move lockdep on top of the tracepoints. The
tracer should grow to be more robust and handle recursion itself.

It's likely a case of the tracer using a spinlock or mutex in the
tracepoint code. When I did the tracepoints I converted one such to a
raw_spinlock_t in the trace_print code.
* Re: [PATCH 2/2] tracing/events/lockdep: move tracepoints within recursive protection
2009-04-16 16:47 ` Peter Zijlstra
@ 2009-04-16 17:03 ` Steven Rostedt
2009-04-16 17:20 ` Peter Zijlstra
2009-04-16 17:45 ` Steven Rostedt
1 sibling, 1 reply; 30+ messages in thread
From: Steven Rostedt @ 2009-04-16 17:03 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
Frederic Weisbecker

On Thu, 16 Apr 2009, Peter Zijlstra wrote:

> On Thu, 2009-04-16 at 12:15 -0400, Steven Rostedt wrote:
> > plain text document attachment
> > (0002-tracing-events-lockdep-move-tracepoints-within-recu.patch)
> > From: Steven Rostedt <srostedt@redhat.com>
> >
> > With the current location of the tracepoints in lockdep, the system
> > can hard lockup in minutes when the tracepoints are enabled.
> >
> > Moving the tracepoints to be inside the lockdep recursion protection
> > solves the issue.
>
> NAK
>
> The idea is to eventually move lockdep on top of the tracepoints. The
> tracer should grow to be more robust and handle recursion itself.
>
> It's likely a case of the tracer using a spinlock or mutex in the
> tracepoint code. When I did the tracepoints I converted one such to a
> raw_spinlock_t in the trace_print code.

Note that the ring buffer and events are made to be recursive. That is,
one event is allowed to trace from within another event. If the
tracepoint is triggered by something within the tracepoint handler
itself, then we are screwed; that needs to be fixed.

I have not seen what is triggering back into locking. The ring buffer,
and what I can see of the event code, does not grab any locks besides
raw ones.

-- Steve
* Re: [PATCH 2/2] tracing/events/lockdep: move tracepoints within recursive protection
2009-04-16 17:03 ` Steven Rostedt
@ 2009-04-16 17:20 ` Peter Zijlstra
2009-04-16 17:38 ` Steven Rostedt
0 siblings, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2009-04-16 17:20 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
Frederic Weisbecker

On Thu, 2009-04-16 at 13:03 -0400, Steven Rostedt wrote:
> On Thu, 16 Apr 2009, Peter Zijlstra wrote:
> > On Thu, 2009-04-16 at 12:15 -0400, Steven Rostedt wrote:
> > > plain text document attachment
> > > (0002-tracing-events-lockdep-move-tracepoints-within-recu.patch)
> > > From: Steven Rostedt <srostedt@redhat.com>
> > >
> > > With the current location of the tracepoints in lockdep, the system
> > > can hard lockup in minutes when the tracepoints are enabled.
> > >
> > > Moving the tracepoints to be inside the lockdep recursion protection
> > > solves the issue.
> >
> > NAK
> >
> > The idea is to eventually move lockdep on top of the tracepoints. The
> > tracer should grow to be more robust and handle recursion itself.
> >
> > It's likely a case of the tracer using a spinlock or mutex in the
> > tracepoint code. When I did the tracepoints I converted one such to a
> > raw_spinlock_t in the trace_print code.
>
> Note that the ring buffer and events are made to be recursive. That is,
> one event is allowed to trace from within another event.

But surely not in the same context. You could do a 4-level recursion
protection like I did in perf-counter, not allowing recursion in:

nmi, irq, softirq, process - context.

That allows you to trace an irq while you're tracing something in
process context, etc., but not allow recursion on the same level.

> If the tracepoint is triggered by something within the tracepoint
> handler itself, then we are screwed; that needs to be fixed.

Exactly the thing you want to detect and warn about, preferably with a
nice stack trace.

> I have not seen what is triggering back into locking. The ring buffer,
> and what I can see of the event code, does not grab any locks besides
> raw ones.

Well, it used to all work, so something snuck in.
* Re: [PATCH 2/2] tracing/events/lockdep: move tracepoints within recursive protection
2009-04-16 17:20 ` Peter Zijlstra
@ 2009-04-16 17:38 ` Steven Rostedt
2009-04-16 17:49 ` Peter Zijlstra
0 siblings, 1 reply; 30+ messages in thread
From: Steven Rostedt @ 2009-04-16 17:38 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
Frederic Weisbecker

On Thu, 16 Apr 2009, Peter Zijlstra wrote:

> On Thu, 2009-04-16 at 13:03 -0400, Steven Rostedt wrote:
> > Note that the ring buffer and events are made to be recursive. That is,
> > one event is allowed to trace from within another event.
>
> But surely not in the same context. You could do a 4-level recursion
> protection like I did in perf-counter, not allowing recursion in:
>
> nmi, irq, softirq, process - context.

Why not allow a nested interrupt to trace?

I don't want to add this logic to the lower levels, where only a few
users need the protection. The protection should be done at the user
level.

> That allows you to trace an irq while you're tracing something in
> process context, etc., but not allow recursion on the same level.
>
> > If the tracepoint is triggered by something within the tracepoint
> > handler itself, then we are screwed; that needs to be fixed.
>
> Exactly the thing you want to detect and warn about, preferably with a
> nice stack trace.

It's hard when you want to allow nesting.

> > I have not seen what is triggering back into locking. The ring buffer,
> > and what I can see of the event code, does not grab any locks besides
> > raw ones.
>
> Well, it used to all work, so something snuck in.

Note, it seems only lockdep has issues with nesting. Perhaps this will
all go away once I can publish the lockless ring buffer?

-- Steve
* Re: [PATCH 2/2] tracing/events/lockdep: move tracepoints within recursive protection
2009-04-16 17:38 ` Steven Rostedt
@ 2009-04-16 17:49 ` Peter Zijlstra
2009-04-16 17:58 ` Steven Rostedt
0 siblings, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2009-04-16 17:49 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
Frederic Weisbecker

On Thu, 2009-04-16 at 13:38 -0400, Steven Rostedt wrote:
> > > Note that the ring buffer and events are made to be recursive. That is,
> > > one event is allowed to trace from within another event.
> >
> > But surely not in the same context. You could do a 4-level recursion
> > protection like I did in perf-counter, not allowing recursion in:
> >
> > nmi, irq, softirq, process - context.
>
> Why not allow a nested interrupt to trace?
>
> I don't want to add this logic to the lower levels, where only a few
> users need the protection. The protection should be done at the user
> level.

Wouldn't you want to disable preemption/softirq/irqs in the tracer -- to
avoid such recursion to begin with (preemption isn't even strictly
needed if you put the recursion count in the task struct, as each task
has a new stack anyway)?

I think having a recursion detection in place is far more valuable than
being able to recursively trace interrupts and the like, which are
exceedingly rare (on x86; power and other archs with multiple interrupt
levels that each have their own stack can extend the recursion levels
too).

> > That allows you to trace an irq while you're tracing something in
> > process context, etc., but not allow recursion on the same level.
> >
> > > If the tracepoint is triggered by something within the tracepoint
> > > handler itself, then we are screwed; that needs to be fixed.
> >
> > Exactly the thing you want to detect and warn about, preferably with a
> > nice stack trace.
>
> It's hard when you want to allow nesting.

Hard never stopped us before, did it ;-)

> > > I have not seen what is triggering back into locking. The ring buffer,
> > > and what I can see of the event code, does not grab any locks besides
> > > raw ones.
> >
> > Well, it used to all work, so something snuck in.
>
> Note, it seems only lockdep has issues with nesting. Perhaps this will
> all go away once I can publish the lockless ring buffer?

I doubt it; it shouldn't happen as it stands -- so this patch only hides
the real issue.
* Re: [PATCH 2/2] tracing/events/lockdep: move tracepoints within recursive protection
2009-04-16 17:49 ` Peter Zijlstra
@ 2009-04-16 17:58 ` Steven Rostedt
2009-04-16 18:06 ` Peter Zijlstra
2009-04-16 18:22 ` Mathieu Desnoyers
0 siblings, 2 replies; 30+ messages in thread
From: Steven Rostedt @ 2009-04-16 17:58 UTC (permalink / raw)
To: Peter Zijlstra
Cc: LKML, Ingo Molnar, Andrew Morton, Thomas Gleixner,
Frederic Weisbecker, Mathieu Desnoyers

[ added Mathieu, since he likes things like this ]

On Thu, 16 Apr 2009, Peter Zijlstra wrote:

> On Thu, 2009-04-16 at 13:38 -0400, Steven Rostedt wrote:
> > Why not allow a nested interrupt to trace?
> >
> > I don't want to add this logic to the lower levels, where only a few
> > users need the protection. The protection should be done at the user
> > level.
>
> Wouldn't you want to disable preemption/softirq/irqs in the tracer -- to
> avoid such recursion to begin with (preemption isn't even strictly
> needed if you put the recursion count in the task struct, as each task
> has a new stack anyway)?

No, we only disable preemption, nothing more. Interrupts and softirqs
are free to happen. Also, we allow tracing of NMIs.

> I think having a recursion detection in place is far more valuable than
> being able to recursively trace interrupts and the like, which are
> exceedingly rare (on x86; power and other archs with multiple interrupt
> levels that each have their own stack can extend the recursion levels
> too).

Is there any arch-generic way to tell what level you are at?

That is, at thread context you are at level 0; if an interrupt comes
in, it sets you to level 1; if another interrupt comes in, it sets you
to level 2, and so on.

I guess we could add this into irq_enter/exit, softirq_enter/exit and
nmi_enter/exit.

Thus we can have each task carry a bitmask. When we start to trace, we
set the bit corresponding to the level the task is at.

I.e. in thread context we set bit 0; if we are interrupted by a
softirq/irq/nmi, we set the bit for the level we are at. Hmm, we might
be able to do this via the preempt count already :-/

Just add the softirq/irq/nmi bits together.

Then if the bit is already set we can dump out a warning.

I'll try that out.

> > It's hard when you want to allow nesting.
>
> Hard never stopped us before, did it ;-)

And it may not be that hard if we do the above.

-- Steve

> > Note, it seems only lockdep has issues with nesting. Perhaps this will
> > all go away once I can publish the lockless ring buffer?
>
> I doubt it; it shouldn't happen as it stands -- so this patch only hides
> the real issue.
* Re: [PATCH 2/2] tracing/events/lockdep: move tracepoints within recursive protection
2009-04-16 17:58 ` Steven Rostedt
@ 2009-04-16 18:06 ` Peter Zijlstra
2009-04-16 18:12 ` Steven Rostedt
0 siblings, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2009-04-16 18:06 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, Ingo Molnar, Andrew Morton, Thomas Gleixner,
Frederic Weisbecker, Mathieu Desnoyers

On Thu, 2009-04-16 at 13:58 -0400, Steven Rostedt wrote:
> [ added Mathieu, since he likes things like this ]
>
> On Thu, 16 Apr 2009, Peter Zijlstra wrote:
> > Wouldn't you want to disable preemption/softirq/irqs in the tracer -- to
> > avoid such recursion to begin with (preemption isn't even strictly
> > needed if you put the recursion count in the task struct, as each task
> > has a new stack anyway)?
>
> No, we only disable preemption, nothing more. Interrupts and softirqs
> are free to happen. Also, we allow tracing of NMIs.

Right.

> > I think having a recursion detection in place is far more valuable than
> > being able to recursively trace interrupts and the like, which are
> > exceedingly rare (on x86; power and other archs with multiple interrupt
> > levels that each have their own stack can extend the recursion levels
> > too).
>
> Is there any arch-generic way to tell what level you are at?

No. On x86 there are a few broken pieces of hardware/drivers that
require interrupts enabled in the interrupt handler, and can cause
interrupt recursion -- these should be rare and IMHO can be ignored,
esp. for a FTRACE_DEBUG option that detects recursion -- that is,
simply disable interrupts when entering the tracer in irq context.

IRQ level nesting like on power would need some arch support.

> That is, at thread context you are at level 0; if an interrupt comes
> in, it sets you to level 1; if another interrupt comes in, it sets you
> to level 2, and so on.
>
> I guess we could add this into irq_enter/exit, softirq_enter/exit and
> nmi_enter/exit.
>
> Thus we can have each task carry a bitmask. When we start to trace, we
> set the bit corresponding to the level the task is at.
>
> I.e. in thread context we set bit 0; if we are interrupted by a
> softirq/irq/nmi, we set the bit for the level we are at. Hmm, we might
> be able to do this via the preempt count already :-/
>
> Just add the softirq/irq/nmi bits together.
>
> Then if the bit is already set we can dump out a warning.
>
> I'll try that out.

static int *perf_swcounter_recursion_context(struct perf_cpu_context *cpuctx)
{
	if (in_nmi())
		return &cpuctx->recursion[3];

	if (in_irq())
		return &cpuctx->recursion[2];

	if (in_softirq())
		return &cpuctx->recursion[1];

	return &cpuctx->recursion[0];
}

Is what I use for perf-counters.
* Re: [PATCH 2/2] tracing/events/lockdep: move tracepoints within recursive protection
2009-04-16 18:06 ` Peter Zijlstra
@ 2009-04-16 18:12 ` Steven Rostedt
2009-04-16 18:29 ` Mathieu Desnoyers
0 siblings, 1 reply; 30+ messages in thread
From: Steven Rostedt @ 2009-04-16 18:12 UTC (permalink / raw)
To: Peter Zijlstra
Cc: LKML, Ingo Molnar, Andrew Morton, Thomas Gleixner,
Frederic Weisbecker, Mathieu Desnoyers

On Thu, 16 Apr 2009, Peter Zijlstra wrote:

> > That is, at thread context you are at level 0; if an interrupt comes
> > in, it sets you to level 1; if another interrupt comes in, it sets you
> > to level 2, and so on.
> >
> > I guess we could add this into irq_enter/exit, softirq_enter/exit and
> > nmi_enter/exit.
> >
> > Thus we can have each task carry a bitmask. When we start to trace, we
> > set the bit corresponding to the level the task is at.
> >
> > I.e. in thread context we set bit 0; if we are interrupted by a
> > softirq/irq/nmi, we set the bit for the level we are at. Hmm, we might
> > be able to do this via the preempt count already :-/
> >
> > Just add the softirq/irq/nmi bits together.
> >
> > Then if the bit is already set we can dump out a warning.
> >
> > I'll try that out.
>
> static int *perf_swcounter_recursion_context(struct perf_cpu_context *cpuctx)
> {
> 	if (in_nmi())
> 		return &cpuctx->recursion[3];
>
> 	if (in_irq())
> 		return &cpuctx->recursion[2];
>
> 	if (in_softirq())
> 		return &cpuctx->recursion[1];
>
> 	return &cpuctx->recursion[0];
> }
>
> Is what I use for perf-counters.

But does that allow multiple nested interrupts?

I'll try the preempt_count and let you know.

Thanks,

-- Steve
* Re: [PATCH 2/2] tracing/events/lockdep: move tracepoints within recursive protection 2009-04-16 18:12 ` Steven Rostedt @ 2009-04-16 18:29 ` Mathieu Desnoyers 0 siblings, 0 replies; 30+ messages in thread From: Mathieu Desnoyers @ 2009-04-16 18:29 UTC (permalink / raw) To: Steven Rostedt Cc: Peter Zijlstra, LKML, Ingo Molnar, Andrew Morton, Thomas Gleixner, Frederic Weisbecker * Steven Rostedt (rostedt@goodmis.org) wrote: > > On Thu, 16 Apr 2009, Peter Zijlstra wrote: > > > > > That is, at thread context, you are at level 0, if an interrupt comes > > > in, it sets you to level 1, if another interrupt comes in, it sets you to > > > level 2, and so on. > > > > > > I guess we could add this into the irq_enter/exit sofirq_enter/exit and > > > nmi_enter/exit. > > > > > > Thus we can have each task with a bitmask. When we start to trace, we set > > > the bit coresponding to the level the task is at. > > > > > > Ie. in thread context, we set bit 0, if we are interrupted by a > > > softirq/irq/nmi, we set the level bit we are at. Hmm, we might be able to > > > do this via the preempt count already :-/ > > > > > > Just add the softirq/irq/nmi bits together. > > > > > > The if the bit is already set we can dump out a warning. > > > > > > I'll try that out. > > > > static int *perf_swcounter_recursion_context(struct perf_cpu_context *cpuctx) > > { > > if (in_nmi()) > > return &cpuctx->recursion[3]; > > > > if (in_irq()) > > return &cpuctx->recursion[2]; > > > > if (in_softirq()) > > return &cpuctx->recursion[1]; > > > > return &cpuctx->recursion[0]; > > } > > > > Is what I use for perf-counters. > > But does that allow multi nested interrupts? > > I'll try the preempt_count and let you know. > > Thanks, > > -- Steve > In practice, I have used a "tracing nesting level" detection counter in LTTng which drops events if they are at a nesting level of about 5 or more if my memory serves me well. 
This should be enough to detect recursive tracer calls before filling the
stack, while handling some of the worst nesting we can think of
(thread + softirq + irq + nmi). And it's _really_ easy to implement,
arch-independent, and has no special corner cases.

Mathieu

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] tracing/events/lockdep: move tracepoints within recursive protection 2009-04-16 17:58 ` Steven Rostedt 2009-04-16 18:06 ` Peter Zijlstra @ 2009-04-16 18:22 ` Mathieu Desnoyers 1 sibling, 0 replies; 30+ messages in thread From: Mathieu Desnoyers @ 2009-04-16 18:22 UTC (permalink / raw) To: Steven Rostedt Cc: Peter Zijlstra, LKML, Ingo Molnar, Andrew Morton, Thomas Gleixner, Frederic Weisbecker * Steven Rostedt (rostedt@goodmis.org) wrote: > > > [ added Maitheu, since he likes things like this ] > > On Thu, 16 Apr 2009, Peter Zijlstra wrote: > > > On Thu, 2009-04-16 at 13:38 -0400, Steven Rostedt wrote: > > > > > > > Note, that the ring buffer and events are made to be recursive. That is, > > > > > it allows one event to trace within another event. > > > > > > > > But surely not in the same context. You could do a 4 level recursion > > > > protection like I did in perf-counter, not allowing recursion in: > > > > > > > > nmi, irq, softirq, process - context. > > > > > > Why not allow a nested interrupt to trace? > > > > > > I don't want to add this logic to the lower levels, where only a few > > > users need the protection. The protecting should be at the user level. > > > > wouldn't you want to disable preemption/softirq/irqs in the tracer -- to > > avoid such recursion to begin with (preemption isn't even strictly > > needed if you put the recursion count in the task struct, as each task > > has a new stack anyway). > > No, we only disable preemption, nothing more. Interrupts and softirqs are > free to happen. Also, we allow tracing of NMIs. > > > > > I think having a recursion detection in place is far more valuable than > > being able to recursively trace interrupts and the like, which are > > exceedingly rare (on x86, and power and other arch with multiple > > interrupt levels that each have their own stack can extend the recursion > > levels too). > > Is there any arch generic way to tell what level you are at? 
>
> That is, at thread context, you are at level 0; if an interrupt comes
> in, it sets you to level 1; if another interrupt comes in, it sets you to
> level 2, and so on.
>
> I guess we could add this into the irq_enter/exit, softirq_enter/exit and
> nmi_enter/exit.
>
> Thus we can have each task with a bitmask. When we start to trace, we set
> the bit corresponding to the level the task is at.
>
> I.e. in thread context, we set bit 0; if we are interrupted by a
> softirq/irq/nmi, we set the level bit we are at. Hmm, we might be able to
> do this via the preempt count already :-/
>
> Just add the softirq/irq/nmi bits together.
>
> Then if the bit is already set we can dump out a warning.
>
> I'll try that out.
>

I think having a "recursive tracer call" detection is very valuable. E.g.,
if the trace code has a bug and causes a page fault, which happens to be
instrumented and to call the tracer again, I would call this a recursive
tracing call. In that case we want to drop the nested event.

However, if the tracer code is interrupted, and the nested tracer code is
simply "nested" without any recursion with the context underneath (and
therefore will never cause infinite recursion), then everything should go
smoothly; no need to drop anything.

If the trap handlers do not change the preempt count, then I think your
idea should work. Also don't forget the "in nmi" preempt count bit.

Given that all you want to check is whether the preempt count for
softirqs, irqs and nmis has changed compared to the snapshot taken in the
context underneath, I think you just have to remember those bits, no need
to "add" them together. A simple copy, and a bitmask "and" to keep only
the nmi, irq, softirq bits in the test, should be fine.

Mathieu

>
> > > > That allows you to trace an irq while you're tracing something in
> > > > process context, etc.. But not allow recursion on the same level.
> > > > > > > > > If the tracepoint is > > > > > triggered by something within the trace point handler, then we are > > > > > screwed. That needs to be fixed. > > > > > > > > Exactly the thing you want to detect and warn about, preferably with a > > > > nice stack trace. > > > > > > Its hard when you want to allow nesting. > > > > Hard never stopped us before, did it ;-) > > And it may not be that hard if we do the above. > > -- Steve > > > > > > > > I have not seen what is triggering back into locking. The ring buffer and > > > > > what I can see by the event code, does not grab any locks besides raw > > > > > ones. > > > > > > > > Well, it used to all work, so something snuck in. > > > > > > Note, it seems only the lockdep has issues with nesting. Perhaps when I > > > can publish the lockless ring buffer this will all go away? > > > > I doubt it, it shouldn't happen as it stands -- so this patch only hides > > the real issue. > > > > -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] tracing/events/lockdep: move tracepoints within recursive protection
  2009-04-16 16:47   ` Peter Zijlstra
  2009-04-16 17:03     ` Steven Rostedt
@ 2009-04-16 17:45     ` Steven Rostedt
  2009-04-16 17:53       ` Peter Zijlstra
  1 sibling, 1 reply; 30+ messages in thread
From: Steven Rostedt @ 2009-04-16 17:45 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Frederic Weisbecker

On Thu, 16 Apr 2009, Peter Zijlstra wrote:

> On Thu, 2009-04-16 at 12:15 -0400, Steven Rostedt wrote:
> > plain text document attachment
> > (0002-tracing-events-lockdep-move-tracepoints-within-recu.patch)
> > From: Steven Rostedt <srostedt@redhat.com>
> >
> > With the current location of the tracepoints in lockdep, the system
> > can hard lockup in minutes when the tracepoints are enabled.
> >
> > Moving the tracepoints to within the lockdep recursion protection
> > solves the issue.
>
> NAK

Can we at least add this as a workaround? Basically, the lockdep
tracepoints are broken as is. I do not plan on changing the logic of the
events to prevent nesting. That's a feature I use. If we don't allow
nesting, we must drop events, which is bad.

The other answer is simply to remove the tracepoints in lockdep until
they work again.

-- Steve

>
> the idea is to eventually move lockdep on top of the tracepoints. The
> tracer should grow to be more robust and handle recursion itself.
>
> It's likely a case of the tracer using a spinlock or mutex in the
> tracepoint code. When I did the tracepoints I converted one such to a
> raw_spinlock_t in the trace_print code.

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] tracing/events/lockdep: move tracepoints within recursive protection 2009-04-16 17:45 ` Steven Rostedt @ 2009-04-16 17:53 ` Peter Zijlstra 2009-04-17 3:03 ` Steven Rostedt 0 siblings, 1 reply; 30+ messages in thread From: Peter Zijlstra @ 2009-04-16 17:53 UTC (permalink / raw) To: Steven Rostedt Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner, Frederic Weisbecker On Thu, 2009-04-16 at 13:45 -0400, Steven Rostedt wrote: > On Thu, 16 Apr 2009, Peter Zijlstra wrote: > > > On Thu, 2009-04-16 at 12:15 -0400, Steven Rostedt wrote: > > > plain text document attachment > > > (0002-tracing-events-lockdep-move-tracepoints-within-recu.patch) > > > From: Steven Rostedt <srostedt@redhat.com> > > > > > > With the current location of the tracepoints in lockdep, the system > > > can hard lockup in minutes when the tracepoints are enabled. > > > > > > Moving the tracepoints outside inside the lockdep protection solves > > > the issue. > > > > NAK > > Can we at least add this as a workaround. Basically, the lockdep > tracepoints are broken as is. I do not plan on changing the logic of the > events to prevent nesting. That's a feature I use. If we don't allow > nesting, we must drop events, which is bad. No, I would say any nesting is a bug, and should stop tracing and print a splat. No need to drop events. > The other answer is simply to remove the trace points in lockdep, until > they work again. Well, they worked when I added them, so someone broke it. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] tracing/events/lockdep: move tracepoints within recursive protection 2009-04-16 17:53 ` Peter Zijlstra @ 2009-04-17 3:03 ` Steven Rostedt 2009-04-17 3:24 ` Steven Rostedt 2009-04-17 4:16 ` [PATCH 2/2] tracing/events/lockdep: move tracepoints within recursive protection Steven Rostedt 0 siblings, 2 replies; 30+ messages in thread From: Steven Rostedt @ 2009-04-17 3:03 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner, Frederic Weisbecker On Thu, 16 Apr 2009, Peter Zijlstra wrote: > On Thu, 2009-04-16 at 13:45 -0400, Steven Rostedt wrote: > > On Thu, 16 Apr 2009, Peter Zijlstra wrote: > > > > > On Thu, 2009-04-16 at 12:15 -0400, Steven Rostedt wrote: > > > > plain text document attachment > > > > (0002-tracing-events-lockdep-move-tracepoints-within-recu.patch) > > > > From: Steven Rostedt <srostedt@redhat.com> > > > > > > > > With the current location of the tracepoints in lockdep, the system > > > > can hard lockup in minutes when the tracepoints are enabled. > > > > > > > > Moving the tracepoints outside inside the lockdep protection solves > > > > the issue. > > > > > > NAK > > > > Can we at least add this as a workaround. Basically, the lockdep > > tracepoints are broken as is. I do not plan on changing the logic of the > > events to prevent nesting. That's a feature I use. If we don't allow > > nesting, we must drop events, which is bad. > > No, I would say any nesting is a bug, and should stop tracing and print > a splat. No need to drop events. > > > The other answer is simply to remove the trace points in lockdep, until > > they work again. > > Well, they worked when I added them, so someone broke it. 
I reverted my patch and added recursion detection, but I hit this nasty bug: --------------------------------- CPU 0 Modules linked in: radeon drm autofs4 hidp rfcomm l2cap bluetooth sunrpc ip6t_REJECT ip6table_filter ip6_tables ipv6 sbs sbshc battery sg evbug ide_cd_mod cdrom snd_intel8x0 snd_ac97_codec ac97_bus serio_raw snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss e1000 snd_pcm button floppy snd_timer i2c_i801 snd soundcore i2c_core snd_page_alloc ata_generic iTCO_wdt iTCO_vendor_support pata_acpi e752x_edac edac_core pcspkr dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd Pid: 3301, comm: pcscd Tainted: G W 2.6.30-rc1 #1036 Precision WorkStation 470 RIP: 0010:[<ffffffff803a59a2>] [<ffffffff803a59a2>] strlen+0x2/0x20 RSP: 0018:ffff880033d19c40 EFLAGS: 00010046 RAX: 0000000000000000 RBX: ffffffff807617df RCX: ffffffff807617de RDX: ffff880033d19d48 RSI: ffff880033d19c58 RDI: 5b5b5b5b5b5b5b5b RBP: ffff880033d19ca8 R08: 5b5b5b5b5b5b5b00 R09: 0000000000000000 R10: 0000000000000003 R11: 0000000000000202 R12: 0000000000000000 R13: 5b5b5b5b5b5b5b5b R14: ffffffff81269b62 R15: ffff880033d19d08 FS: 0000000041b6f940(0063) GS:ffffffff80828000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00000000006aec88 CR3: 0000000033c02000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400 Process pcscd (pid: 3301, threadinfo ffff880033d18000, task ffff880033c48000) Stack: ffffffff803a73e5 ffffffff81269b60 ffffffff8126ab60 0000000000000004 0000000affffffff ffffffffffffffff 0000000000000000 ffff880033d19c98 0000000000000246 ffff880033d19d08 0000000000000000 0000000000000000 Call Trace: [<ffffffff803a73e5>] ? vbin_printf+0x1b5/0x2e0 [<ffffffff802b77dc>] trace_vbprintk+0xbc/0x1c0 [<ffffffff802887fe>] ? 
lock_acquire+0x10e/0x120 [<ffffffff802bad29>] __trace_bprintk+0x79/0x80 [<ffffffff80228ed6>] ? ftrace_call+0x5/0x2b [<ffffffff802b5dac>] ? tracing_record_cmdline+0x2c/0x100 [<ffffffff80284920>] ftrace_event_lock_acquire+0x60/0x70 [<ffffffff802887fe>] lock_acquire+0x10e/0x120 [<ffffffff8030a744>] ? dput+0x64/0x170 [<ffffffff80610336>] _spin_lock+0x36/0x70 [<ffffffff8030a744>] ? dput+0x64/0x170 [<ffffffff8039f98f>] ? _atomic_dec_and_lock+0x1f/0x50 [<ffffffff8030a744>] dput+0x64/0x170 [<ffffffff80315215>] dcache_dir_close+0x15/0x20 [<ffffffff802f8a51>] __fput+0xd1/0x230 [<ffffffff802f8f0b>] fput+0x1b/0x30 [<ffffffff802f566d>] filp_close+0x5d/0x90 [<ffffffff802f5746>] sys_close+0xa6/0x110 [<ffffffff802291af>] system_call_fastpath+0x16/0x1b Code: 01 48 83 f9 ff 0f 1f 40 00 74 0b 48 83 c7 01 0f b6 07 84 c0 75 e3 c9 31 c0 c3 c9 48 89 f8 c3 66 2e 0f 1f 84 00 00 00 00 00 31 c0 <80> 3f 00 55 48 89 e5 74 11 48 89 f8 66 90 48 83 c0 01 80 38 00 RIP [<ffffffff803a59a2>] strlen+0x2/0x20 RSP <ffff880033d19c40> Doing an objdump: ffffffff803a59a0 <strlen>: ffffffff803a59a0: 31 c0 xor %eax,%eax ffffffff803a59a2: 80 3f 00 cmpb $0x0,(%rdi) ffffffff803a59a5: 55 push %rbp ffffffff803a59a6: 48 89 e5 mov %rsp,%rbp ffffffff803a59a9: 74 11 je ffffffff803a59bc <strlen+0x1c> The %rdi is: 5b5b5b5b5b5b5b5b Either there's a bug in the vbin_printf, or we have some crazy lock->name? TRACE_FORMAT(lock_acquire, TP_PROTO(struct lockdep_map *lock, unsigned int subclass, int trylock, int read, int check, struct lockdep_map *next_lock, unsigned long ip), TP_ARGS(lock, subclass, trylock, read, check, next_lock, ip), TP_FMT("%s%s%s", trylock ? "try " : "", read ? "read " : "", lock->name) ); I'll continue to look into this, and perhaps reboot and see what other nasties I hit. -- Steve ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] tracing/events/lockdep: move tracepoints within recursive protection 2009-04-17 3:03 ` Steven Rostedt @ 2009-04-17 3:24 ` Steven Rostedt 2009-04-17 7:40 ` Peter Zijlstra 2009-04-17 4:16 ` [PATCH 2/2] tracing/events/lockdep: move tracepoints within recursive protection Steven Rostedt 1 sibling, 1 reply; 30+ messages in thread From: Steven Rostedt @ 2009-04-17 3:24 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner, Frederic Weisbecker On Thu, 16 Apr 2009, Steven Rostedt wrote: > I reverted my patch and added recursion detection, but I hit this nasty > bug: > > --------------------------------- > CPU 0 > Modules linked in: radeon drm autofs4 hidp rfcomm l2cap bluetooth sunrpc > ip6t_REJECT ip6table_filter ip6_tables ipv6 sbs sbshc battery sg evbug > ide_cd_mod cdrom snd_intel8x0 snd_ac97_codec ac97_bus serio_raw > snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device > snd_pcm_oss snd_mixer_oss e1000 snd_pcm button floppy snd_timer i2c_i801 > snd soundcore i2c_core snd_page_alloc ata_generic iTCO_wdt > iTCO_vendor_support pata_acpi e752x_edac edac_core pcspkr dm_snapshot > dm_zero dm_mirror dm_region_hash dm_log dm_mod ata_piix libata sd_mod > scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd > Pid: 3301, comm: pcscd Tainted: G W 2.6.30-rc1 #1036 Precision > WorkStation 470 > RIP: 0010:[<ffffffff803a59a2>] [<ffffffff803a59a2>] strlen+0x2/0x20 > RSP: 0018:ffff880033d19c40 EFLAGS: 00010046 > RAX: 0000000000000000 RBX: ffffffff807617df RCX: ffffffff807617de > RDX: ffff880033d19d48 RSI: ffff880033d19c58 RDI: 5b5b5b5b5b5b5b5b > RBP: ffff880033d19ca8 R08: 5b5b5b5b5b5b5b00 R09: 0000000000000000 > R10: 0000000000000003 R11: 0000000000000202 R12: 0000000000000000 > R13: 5b5b5b5b5b5b5b5b R14: ffffffff81269b62 R15: ffff880033d19d08 > FS: 0000000041b6f940(0063) GS:ffffffff80828000(0000) > knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > CR2: 00000000006aec88 CR3: 
0000000033c02000 CR4: 00000000000006e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400 > Process pcscd (pid: 3301, threadinfo ffff880033d18000, task > ffff880033c48000) > Stack: > ffffffff803a73e5 ffffffff81269b60 ffffffff8126ab60 0000000000000004 > 0000000affffffff ffffffffffffffff 0000000000000000 ffff880033d19c98 > 0000000000000246 ffff880033d19d08 0000000000000000 0000000000000000 > Call Trace: > [<ffffffff803a73e5>] ? vbin_printf+0x1b5/0x2e0 > [<ffffffff802b77dc>] trace_vbprintk+0xbc/0x1c0 > [<ffffffff802887fe>] ? lock_acquire+0x10e/0x120 > [<ffffffff802bad29>] __trace_bprintk+0x79/0x80 > [<ffffffff80228ed6>] ? ftrace_call+0x5/0x2b > [<ffffffff802b5dac>] ? tracing_record_cmdline+0x2c/0x100 > [<ffffffff80284920>] ftrace_event_lock_acquire+0x60/0x70 > [<ffffffff802887fe>] lock_acquire+0x10e/0x120 > [<ffffffff8030a744>] ? dput+0x64/0x170 > [<ffffffff80610336>] _spin_lock+0x36/0x70 > [<ffffffff8030a744>] ? dput+0x64/0x170 > [<ffffffff8039f98f>] ? 
_atomic_dec_and_lock+0x1f/0x50 > [<ffffffff8030a744>] dput+0x64/0x170 > [<ffffffff80315215>] dcache_dir_close+0x15/0x20 > [<ffffffff802f8a51>] __fput+0xd1/0x230 > [<ffffffff802f8f0b>] fput+0x1b/0x30 > [<ffffffff802f566d>] filp_close+0x5d/0x90 > [<ffffffff802f5746>] sys_close+0xa6/0x110 > [<ffffffff802291af>] system_call_fastpath+0x16/0x1b > Code: 01 48 83 f9 ff 0f 1f 40 00 74 0b 48 83 c7 01 0f b6 07 84 c0 75 e3 c9 > 31 c0 c3 c9 48 89 f8 c3 66 2e 0f 1f 84 00 00 00 00 00 31 c0 <80> 3f 00 55 > 48 89 e5 74 11 48 89 f8 66 90 48 83 c0 01 80 38 00 > RIP [<ffffffff803a59a2>] strlen+0x2/0x20 > RSP <ffff880033d19c40> > > > Doing an objdump: > > ffffffff803a59a0 <strlen>: > ffffffff803a59a0: 31 c0 xor %eax,%eax > ffffffff803a59a2: 80 3f 00 cmpb $0x0,(%rdi) > ffffffff803a59a5: 55 push %rbp > ffffffff803a59a6: 48 89 e5 mov %rsp,%rbp > ffffffff803a59a9: 74 11 je ffffffff803a59bc <strlen+0x1c> > > The %rdi is: 5b5b5b5b5b5b5b5b > > Either there's a bug in the vbin_printf, or we have some crazy lock->name? > > TRACE_FORMAT(lock_acquire, > TP_PROTO(struct lockdep_map *lock, unsigned int subclass, > int trylock, int read, int check, > struct lockdep_map *next_lock, unsigned long ip), > TP_ARGS(lock, subclass, trylock, read, check, next_lock, ip), > TP_FMT("%s%s%s", trylock ? "try " : "", > read ? "read " : "", lock->name) > ); > > I'll continue to look into this, and perhaps reboot and see what other > nasties I hit. OK, I think I figured this bug out. This is a lockdep issue with respect to tracepoints. The trace points in lockdep are called all the time. Outside the lockdep logic. But if lockdep were to trigger an error / warning (which this run did) we might be in trouble. For new locks, like the dentry->d_lock, that are created, they will not get a name: void lockdep_init_map(struct lockdep_map *lock, const char *name, struct lock_class_key *key, int subclass) { if (unlikely(!debug_locks)) return; When a problem is found by lockdep, debug_locks becomes false. 
Thus we stop allocating names for locks. This dentry->d_lock I had now
has no name. Worse yet, I have CONFIG_DEBUG_VM set, which scrambles
uninitialized memory. Thus, when the tracepoint was hit, it had junk for
lock->name, and the machine crashed.

-- Steve

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] tracing/events/lockdep: move tracepoints within recursive protection 2009-04-17 3:24 ` Steven Rostedt @ 2009-04-17 7:40 ` Peter Zijlstra 2009-04-17 16:03 ` [tip:core/urgent] lockdep: more robust lockdep_map init sequence tip-bot for Peter Zijlstra 0 siblings, 1 reply; 30+ messages in thread From: Peter Zijlstra @ 2009-04-17 7:40 UTC (permalink / raw) To: Steven Rostedt Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner, Frederic Weisbecker On Thu, 2009-04-16 at 23:24 -0400, Steven Rostedt wrote: > > Either there's a bug in the vbin_printf, or we have some crazy lock->name? > > > > TRACE_FORMAT(lock_acquire, > > TP_PROTO(struct lockdep_map *lock, unsigned int subclass, > > int trylock, int read, int check, > > struct lockdep_map *next_lock, unsigned long ip), > > TP_ARGS(lock, subclass, trylock, read, check, next_lock, ip), > > TP_FMT("%s%s%s", trylock ? "try " : "", > > read ? "read " : "", lock->name) > > ); > > > > I'll continue to look into this, and perhaps reboot and see what other > > nasties I hit. > > OK, I think I figured this bug out. This is a lockdep issue with respect > to tracepoints. > > The trace points in lockdep are called all the time. Outside the lockdep > logic. But if lockdep were to trigger an error / warning (which this run > did) we might be in trouble. For new locks, like the dentry->d_lock, that > are created, they will not get a name: > > void lockdep_init_map(struct lockdep_map *lock, const char *name, > struct lock_class_key *key, int subclass) > { > if (unlikely(!debug_locks)) > return; > > When a problem is found by lockdep, debug_locks becomes false. Thus we > stop allocating names for locks. This dentry->d_lock I had, now has no > name. Worse yet, I have CONFIG_DEBUG_VM set, that scrambles non > initialized memory. Thus, when the trace point was hit, it had junk for > the lock->name, and the machine crashed. Ah, nice catch. I think we should put at least the name in regardless. 
Something like the below ought to do I guess: --- Subject: lockdep: more robust lockdep_map init sequence Ensure we at least initialize the trivial entries of the depmap so that they can be relied upon, even when lockdep itself decided to pack up and go home. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> --- kernel/lockdep.c | 22 ++++++++++++++-------- 1 files changed, 14 insertions(+), 8 deletions(-) diff --git a/kernel/lockdep.c b/kernel/lockdep.c index c4582a6..4516e5b 100644 --- a/kernel/lockdep.c +++ b/kernel/lockdep.c @@ -2490,13 +2490,20 @@ static int mark_lock(struct task_struct *curr, struct held_lock *this, void lockdep_init_map(struct lockdep_map *lock, const char *name, struct lock_class_key *key, int subclass) { - if (unlikely(!debug_locks)) + lock->class_cache = NULL; +#ifdef CONFIG_LOCK_STAT + lock->cpu = raw_smp_processor_id(); +#endif + + if (DEBUG_LOCKS_WARN_ON(!name)) { + lock->name = "NULL"; return; + } + + lock->name = name; if (DEBUG_LOCKS_WARN_ON(!key)) return; - if (DEBUG_LOCKS_WARN_ON(!name)) - return; /* * Sanity check, the lock-class key must be persistent: */ @@ -2505,12 +2512,11 @@ void lockdep_init_map(struct lockdep_map *lock, const char *name, DEBUG_LOCKS_WARN_ON(1); return; } - lock->name = name; lock->key = key; - lock->class_cache = NULL; -#ifdef CONFIG_LOCK_STAT - lock->cpu = raw_smp_processor_id(); -#endif + + if (unlikely(!debug_locks)) + return; + if (subclass) register_lock_class(lock, subclass, 1); } ^ permalink raw reply related [flat|nested] 30+ messages in thread
* [tip:core/urgent] lockdep: more robust lockdep_map init sequence 2009-04-17 7:40 ` Peter Zijlstra @ 2009-04-17 16:03 ` tip-bot for Peter Zijlstra 0 siblings, 0 replies; 30+ messages in thread From: tip-bot for Peter Zijlstra @ 2009-04-17 16:03 UTC (permalink / raw) To: linux-tip-commits Cc: linux-kernel, hpa, mingo, a.p.zijlstra, peterz, fweisbec, rostedt, akpm, tglx, mingo Commit-ID: c8a250058656495be02c00de61e26b017c86ef00 Gitweb: http://git.kernel.org/tip/c8a250058656495be02c00de61e26b017c86ef00 Author: Peter Zijlstra <peterz@infradead.org> AuthorDate: Fri, 17 Apr 2009 09:40:49 +0200 Committer: Ingo Molnar <mingo@elte.hu> CommitDate: Fri, 17 Apr 2009 18:00:00 +0200 lockdep: more robust lockdep_map init sequence Steven Rostedt reported: > OK, I think I figured this bug out. This is a lockdep issue with respect > to tracepoints. > > The trace points in lockdep are called all the time. Outside the lockdep > logic. But if lockdep were to trigger an error / warning (which this run > did) we might be in trouble. For new locks, like the dentry->d_lock, that > are created, they will not get a name: > > void lockdep_init_map(struct lockdep_map *lock, const char *name, > struct lock_class_key *key, int subclass) > { > if (unlikely(!debug_locks)) > return; > > When a problem is found by lockdep, debug_locks becomes false. Thus we > stop allocating names for locks. This dentry->d_lock I had, now has no > name. Worse yet, I have CONFIG_DEBUG_VM set, that scrambles non > initialized memory. Thus, when the trace point was hit, it had junk for > the lock->name, and the machine crashed. Ah, nice catch. I think we should put at least the name in regardless. Ensure we at least initialize the trivial entries of the depmap so that they can be relied upon, even when lockdep itself decided to pack up and go home. [ Impact: fix lock tracing after lockdep warnings. 
] Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1239954049.23397.4156.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu> --- kernel/lockdep.c | 22 ++++++++++++++-------- 1 files changed, 14 insertions(+), 8 deletions(-) diff --git a/kernel/lockdep.c b/kernel/lockdep.c index b0f0118..accb40c 100644 --- a/kernel/lockdep.c +++ b/kernel/lockdep.c @@ -2490,13 +2490,20 @@ static int mark_lock(struct task_struct *curr, struct held_lock *this, void lockdep_init_map(struct lockdep_map *lock, const char *name, struct lock_class_key *key, int subclass) { - if (unlikely(!debug_locks)) + lock->class_cache = NULL; +#ifdef CONFIG_LOCK_STAT + lock->cpu = raw_smp_processor_id(); +#endif + + if (DEBUG_LOCKS_WARN_ON(!name)) { + lock->name = "NULL"; return; + } + + lock->name = name; if (DEBUG_LOCKS_WARN_ON(!key)) return; - if (DEBUG_LOCKS_WARN_ON(!name)) - return; /* * Sanity check, the lock-class key must be persistent: */ @@ -2505,12 +2512,11 @@ void lockdep_init_map(struct lockdep_map *lock, const char *name, DEBUG_LOCKS_WARN_ON(1); return; } - lock->name = name; lock->key = key; - lock->class_cache = NULL; -#ifdef CONFIG_LOCK_STAT - lock->cpu = raw_smp_processor_id(); -#endif + + if (unlikely(!debug_locks)) + return; + if (subclass) register_lock_class(lock, subclass, 1); } ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] tracing/events/lockdep: move tracepoints within recursive protection 2009-04-17 3:03 ` Steven Rostedt 2009-04-17 3:24 ` Steven Rostedt @ 2009-04-17 4:16 ` Steven Rostedt 2009-04-17 4:36 ` Steven Rostedt ` (2 more replies) 1 sibling, 3 replies; 30+ messages in thread From: Steven Rostedt @ 2009-04-17 4:16 UTC (permalink / raw) To: Peter Zijlstra Cc: LKML, Ingo Molnar, Andrew Morton, Thomas Gleixner, Frederic Weisbecker, Mathieu Desnoyers, Masami Hiramatsu Here's the dump that I get that triggers the lockdep warning: WARNING: at kernel/lockdep.c:2893 check_flags+0x1a7/0x1d0() Hardware name: Precision WorkStation 470 Modules linked in: radeon drm autofs4 hidp rfcomm l2cap bluetooth sunrpc [snip] Pid: 3768, comm: sshd Not tainted 2.6.30-rc1 #1036 Call Trace: [<ffffffff8025c321>] warn_slowpath+0xe1/0x110 [<ffffffff80287f7c>] ? __lock_acquire+0x3ac/0xb20 [<ffffffff80286dca>] ? validate_chain+0x4ca/0x12d0 [<ffffffff80286dca>] ? validate_chain+0x4ca/0x12d0 [<ffffffff80286dca>] ? validate_chain+0x4ca/0x12d0 [<ffffffff80286dca>] ? validate_chain+0x4ca/0x12d0 [<ffffffff80287f7c>] ? __lock_acquire+0x3ac/0xb20 [<ffffffff8051a5a9>] ? __alloc_skb+0x49/0x160 [<ffffffff80282407>] check_flags+0x1a7/0x1d0 [<ffffffff80284d63>] lockdep_trace_alloc+0x33/0xe0 [<ffffffff802f3682>] kmem_cache_alloc+0x32/0x160 [<ffffffff8051a5a9>] __alloc_skb+0x49/0x160 [<ffffffff8059192d>] tcp_send_ack+0x2d/0xe0 [<ffffffff8058e941>] __tcp_ack_snd_check+0x61/0xb0 [<ffffffff80590408>] tcp_rcv_established+0x398/0x600 [<ffffffff80596a58>] tcp_v4_do_rcv+0x228/0x380 [<ffffffff80228ed6>] ? ftrace_call+0x5/0x2b [<ffffffff8058454e>] ? tcp_prequeue_process+0x2e/0xa0 [<ffffffff8058458d>] tcp_prequeue_process+0x6d/0xa0 [<ffffffff8058734a>] tcp_recvmsg+0x49a/0x880 [<ffffffff80514267>] sock_common_recvmsg+0x37/0x50 [<ffffffff805116b9>] sock_aio_read+0x109/0x110 [<ffffffff802f75f1>] do_sync_read+0xf1/0x130 [<ffffffff8022ec33>] ? sched_clock+0x13/0x20 [<ffffffff8022ec5d>] ? 
native_sched_clock+0x1d/0x50
 [<ffffffff802737d0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff80228ed6>] ? ftrace_call+0x5/0x2b
 [<ffffffff80381469>] ? cap_file_permission+0x9/0x10
 [<ffffffff80380116>] ? security_file_permission+0x16/0x20
 [<ffffffff802f7fc9>] vfs_read+0x159/0x170
 [<ffffffff802f8285>] sys_read+0x55/0x90
 [<ffffffff802291af>] system_call_fastpath+0x16/0x1b
---[ end trace 03d889e04bc7a9a7 ]---
possible reason: unannotated irqs-on.
irq event stamp: 12569
hardirqs last enabled at (12567): [<ffffffff8026206a>] local_bh_enable+0xaa/0x110
hardirqs last disabled at (12569): [<ffffffff80610c76>] int3+0x16/0x40
softirqs last enabled at (12566): [<ffffffff80514d2b>] lock_sock_nested+0xfb/0x110
softirqs last disabled at (12568): [<ffffffff8058454e>] tcp_prequeue_process+0x2e/0xa0

Note, for some reason we hit int3?? Tracepoints do not use int3, do they?
I have kprobes defined, but not any kprobe self tests on.

Anyway, let me describe what the above is and what I found in my
investigation.

Lockdep took a check_flags error when it noticed that interrupts were
enabled, but current->hardirqs_enabled was 0. Lockdep thought interrupts
were disabled, but they were in fact enabled.

In the last 4 lines of the warning, the numbers in parentheses indicate
the order of events. Here they are in order:

softirqs last enabled at (12566): [<ffffffff80514d2b>] lock_sock_nested+0xfb/0x110
hardirqs last enabled at (12567): [<ffffffff8026206a>] local_bh_enable+0xaa/0x110
softirqs last disabled at (12568): [<ffffffff8058454e>] tcp_prequeue_process+0x2e/0xa0
hardirqs last disabled at (12569): [<ffffffff80610c76>] int3+0x16/0x40

The last change that lockdep saw was interrupts being disabled by int3. I
still don't understand why int3 was triggered. I have startup tests for
ftrace and the event tracer, but this blob happened when I first ssh'd
into the box.
In arch/x86/kernel/entry_64.S we have:

paranoidzeroentry_ist int3 do_int3 DEBUG_STACK

.macro paranoidzeroentry sym do_sym
ENTRY(\sym)
        INTR_FRAME
        PARAVIRT_ADJUST_EXCEPTION_FRAME
        pushq $-1               /* ORIG_RAX: no syscall to restart */
        CFI_ADJUST_CFA_OFFSET 8
        subq $15*8, %rsp
        call save_paranoid
        TRACE_IRQS_OFF
        movq %rsp,%rdi          /* pt_regs pointer */
        xorl %esi,%esi          /* no error code */
        call \do_sym
        jmp paranoid_exit       /* %ebx: no swapgs flag */
        CFI_ENDPROC
END(\sym)
.endm


ENTRY(paranoid_exit)
        INTR_FRAME
        DISABLE_INTERRUPTS(CLBR_NONE)
        TRACE_IRQS_OFF
        testl %ebx,%ebx         /* swapgs needed? */
        jnz paranoid_restore
        testl $3,CS(%rsp)
        jnz paranoid_userspace
paranoid_swapgs:
        TRACE_IRQS_IRETQ 0
        SWAPGS_UNSAFE_STACK
paranoid_restore:
        RESTORE_ALL 8
        jmp irq_return


irq_return:
        INTERRUPT_RETURN


INTERRUPT_RETURN is simply defined as iretq


I see that we call TRACE_IRQS_OFF when entering paranoid_exit, but if we
do not need to swapgs (we don't because int3 looks like it happened in
kernel space) we just call irq_return and jump back. We miss the fact that
the irq_return enables interrupts for us.

I'd try to come up with a fix, but this gets a bit complex, and I figured
I'd let the lockdep irq-tracing gurus play with this magic. I'm just
reporting the problem ;-)

-- Steve

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] tracing/events/lockdep: move tracepoints within recursive protection
2009-04-17 4:16 ` [PATCH 2/2] tracing/events/lockdep: move tracepoints within recursive protection Steven Rostedt
@ 2009-04-17 4:36 ` Steven Rostedt
2009-04-17 4:52 ` Mathieu Desnoyers
2009-04-17 7:44 ` [PATCH 2/2] tracing/events/lockdep: move tracepoints within recursive protection Peter Zijlstra
2 siblings, 0 replies; 30+ messages in thread
From: Steven Rostedt @ 2009-04-17 4:36 UTC (permalink / raw)
To: Peter Zijlstra
Cc: LKML, Ingo Molnar, Andrew Morton, Thomas Gleixner,
Frederic Weisbecker, Mathieu Desnoyers, Masami Hiramatsu

On Fri, 17 Apr 2009, Steven Rostedt wrote:
>
> Note, for some reason we hit int3 ??
>
> Tracepoints do not use int3 does it?
>
> I have kprobes defined but not any kprobe self tests on.
>

Interesting, I disabled kprobes and all lockdep issues go away. I no
longer get the lockdep error message, and the lockdep tracing seems to
be working fine (without the modification of the original post).

I'll let it run the rest of the night and see how it is doing tomorrow
morning.

Night,

-- Steve

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] tracing/events/lockdep: move tracepoints within recursive protection
2009-04-17 4:16 ` [PATCH 2/2] tracing/events/lockdep: move tracepoints within recursive protection Steven Rostedt
2009-04-17 4:36 ` Steven Rostedt
@ 2009-04-17 4:52 ` Mathieu Desnoyers
2009-04-17 11:09 ` Steven Rostedt
2009-04-17 7:44 ` [PATCH 2/2] tracing/events/lockdep: move tracepoints within recursive protection Peter Zijlstra
2 siblings, 1 reply; 30+ messages in thread
From: Mathieu Desnoyers @ 2009-04-17 4:52 UTC (permalink / raw)
To: Steven Rostedt
Cc: Peter Zijlstra, LKML, Ingo Molnar, Andrew Morton, Thomas Gleixner,
Frederic Weisbecker, Masami Hiramatsu

* Steven Rostedt (rostedt@goodmis.org) wrote:
>
> Here's the dump that I get that triggers the lockdep warning:
>
> WARNING: at kernel/lockdep.c:2893 check_flags+0x1a7/0x1d0()
> Hardware name: Precision WorkStation 470
> Modules linked in: radeon drm autofs4 hidp rfcomm l2cap bluetooth sunrpc
>
> [snip]
>
> Pid: 3768, comm: sshd Not tainted 2.6.30-rc1 #1036
> Call Trace:
> [<ffffffff8025c321>] warn_slowpath+0xe1/0x110
> [<ffffffff80287f7c>] ? __lock_acquire+0x3ac/0xb20
> [<ffffffff80286dca>] ? validate_chain+0x4ca/0x12d0
> [<ffffffff80286dca>] ? validate_chain+0x4ca/0x12d0
> [<ffffffff80286dca>] ? validate_chain+0x4ca/0x12d0
> [<ffffffff80286dca>] ? validate_chain+0x4ca/0x12d0
> [<ffffffff80287f7c>] ? __lock_acquire+0x3ac/0xb20
> [<ffffffff8051a5a9>] ? __alloc_skb+0x49/0x160
> [<ffffffff80282407>] check_flags+0x1a7/0x1d0
> [<ffffffff80284d63>] lockdep_trace_alloc+0x33/0xe0
> [<ffffffff802f3682>] kmem_cache_alloc+0x32/0x160
> [<ffffffff8051a5a9>] __alloc_skb+0x49/0x160
> [<ffffffff8059192d>] tcp_send_ack+0x2d/0xe0
> [<ffffffff8058e941>] __tcp_ack_snd_check+0x61/0xb0
> [<ffffffff80590408>] tcp_rcv_established+0x398/0x600
> [<ffffffff80596a58>] tcp_v4_do_rcv+0x228/0x380
> [<ffffffff80228ed6>] ? ftrace_call+0x5/0x2b
> [<ffffffff8058454e>] ? tcp_prequeue_process+0x2e/0xa0
> [<ffffffff8058458d>] tcp_prequeue_process+0x6d/0xa0
> [<ffffffff8058734a>] tcp_recvmsg+0x49a/0x880
> [<ffffffff80514267>] sock_common_recvmsg+0x37/0x50
> [<ffffffff805116b9>] sock_aio_read+0x109/0x110
> [<ffffffff802f75f1>] do_sync_read+0xf1/0x130
> [<ffffffff8022ec33>] ? sched_clock+0x13/0x20
> [<ffffffff8022ec5d>] ? native_sched_clock+0x1d/0x50
> [<ffffffff802737d0>] ? autoremove_wake_function+0x0/0x40
> [<ffffffff80228ed6>] ? ftrace_call+0x5/0x2b
> [<ffffffff80381469>] ? cap_file_permission+0x9/0x10
> [<ffffffff80380116>] ? security_file_permission+0x16/0x20
> [<ffffffff802f7fc9>] vfs_read+0x159/0x170
> [<ffffffff802f8285>] sys_read+0x55/0x90
> [<ffffffff802291af>] system_call_fastpath+0x16/0x1b
> ---[ end trace 03d889e04bc7a9a7 ]---
> possible reason: unannotated irqs-on.
> irq event stamp: 12569
> hardirqs last enabled at (12567): [<ffffffff8026206a>] local_bh_enable+0xaa/0x110
> hardirqs last disabled at (12569): [<ffffffff80610c76>] int3+0x16/0x40
> softirqs last enabled at (12566): [<ffffffff80514d2b>] lock_sock_nested+0xfb/0x110
> softirqs last disabled at (12568): [<ffffffff8058454e>] tcp_prequeue_process+0x2e/0xa0
>
>
> Note, for some reason we hit int3 ??
>
> Tracepoints do not use int3 does it?
>

Not in your ftrace tree. My LTTng tree includes the immediate values,
which briefly uses the int3 handler when enabling/disabling
tracepoints. But this seems unrelated to your problem.

> I have kprobes defined but not any kprobe self tests on.
>

Could this be a userspace breakpoint then ?

> Anyway, let me describe what the above is and what I found in my
> investigation.
>
> The lockdep took a check_flags error when it noticed that interrupts were
> enabled, but the current->hardirqs_enabled was 0. Lockdep thought
> interrupts were disabled but they were in fact enabled.
>
>
> The last 4 lines of the warning have the numbers in the parenthesis
> annotate the order of events: (Here they are in order)
>
> softirqs last enabled at (12566): [<ffffffff80514d2b>] lock_sock_nested+0xfb/0x110
> hardirqs last enabled at (12567): [<ffffffff8026206a>] local_bh_enable+0xaa/0x110
> softirqs last disabled at (12568): [<ffffffff8058454e>] tcp_prequeue_process+0x2e/0xa0
> hardirqs last disabled at (12569): [<ffffffff80610c76>] int3+0x16/0x40
>
> The last change that lockdep saw was interrupts being disabled by int3. I
> still don't understand why int3 was enabled. I have startup tests for
> ftrace and the event tracer, but this blob happened when I first ssh'd
> into the box.

Also note that maybe some entry.S annotation could be missing, making
kallsyms think it was running within int3 when in fact it was running in
a different function. I would double-check with objdump giving the
ffffffff80610c76 address to make sure.

Just giving my 2 cents before going to bed. I don't have any more brain
left for tonight.

Cheers,

Mathieu

>
> In arch/x86/kernel/entry_64.S we have:
>
> paranoidzeroentry_ist int3 do_int3 DEBUG_STACK
>
> .macro paranoidzeroentry sym do_sym
> ENTRY(\sym)
>         INTR_FRAME
>         PARAVIRT_ADJUST_EXCEPTION_FRAME
>         pushq $-1               /* ORIG_RAX: no syscall to restart */
>         CFI_ADJUST_CFA_OFFSET 8
>         subq $15*8, %rsp
>         call save_paranoid
>         TRACE_IRQS_OFF
>         movq %rsp,%rdi          /* pt_regs pointer */
>         xorl %esi,%esi          /* no error code */
>         call \do_sym
>         jmp paranoid_exit       /* %ebx: no swapgs flag */
>         CFI_ENDPROC
> END(\sym)
> .endm
>
>
> ENTRY(paranoid_exit)
>         INTR_FRAME
>         DISABLE_INTERRUPTS(CLBR_NONE)
>         TRACE_IRQS_OFF
>         testl %ebx,%ebx         /* swapgs needed? */
>         jnz paranoid_restore
>         testl $3,CS(%rsp)
>         jnz paranoid_userspace
> paranoid_swapgs:
>         TRACE_IRQS_IRETQ 0
>         SWAPGS_UNSAFE_STACK
> paranoid_restore:
>         RESTORE_ALL 8
>         jmp irq_return
>
>
> irq_return:
>         INTERRUPT_RETURN
>
>
> INTERRUPT_RETURN is simply defined as iretq
>
>
> I see that we call TRACE_IRQS_OFF when entering paranoid_exit, but if we
> do not need to swapgs (we don't because int3 looks like it happened in
> kernel space) we just call irq_return and jump back. We miss the fact that
> the irq_return enables interrupts for us.
>
> I'd try to come up with a fix, but this gets a bit complex, and I figured
> I'd let the lockdep irq-tracing gurus play with this magic. I'm just
> reporting the problem ;-)
>
> -- Steve
>

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] tracing/events/lockdep: move tracepoints within recursive protection
2009-04-17 4:52 ` Mathieu Desnoyers
@ 2009-04-17 11:09 ` Steven Rostedt
2009-04-17 22:19 ` [PATCH] x86 entry_64.S lockdep fix Mathieu Desnoyers
0 siblings, 1 reply; 30+ messages in thread
From: Steven Rostedt @ 2009-04-17 11:09 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Peter Zijlstra, LKML, Ingo Molnar, Andrew Morton, Thomas Gleixner,
Frederic Weisbecker, Masami Hiramatsu

On Fri, 17 Apr 2009, Mathieu Desnoyers wrote:
> > ---[ end trace 03d889e04bc7a9a7 ]---
> > possible reason: unannotated irqs-on.
> > irq event stamp: 12569
> > hardirqs last enabled at (12567): [<ffffffff8026206a>] local_bh_enable+0xaa/0x110
> > hardirqs last disabled at (12569): [<ffffffff80610c76>] int3+0x16/0x40
> > softirqs last enabled at (12566): [<ffffffff80514d2b>] lock_sock_nested+0xfb/0x110
> > softirqs last disabled at (12568): [<ffffffff8058454e>] tcp_prequeue_process+0x2e/0xa0
> >
> >
> > Note, for some reason we hit int3 ??
> >
> > Tracepoints do not use int3 does it?
> >
>
> Not in your ftrace tree. My LTTng tree includes the immediate values,
> which briefly uses the int3 handler when enabling/disabling
> tracepoints. But this seems unrelated to your problem.

Maybe something else is adding it :-/

> > I have kprobes defined but not any kprobe self tests on.
> >
>
> Could this be a userspace breakpoint then ?

Nope, it happens right in the kernel. And the path I describe below points
out that we have a bug in irq tracing when we take a int3 in kernel space.

> > Anyway, let me describe what the above is and what I found in my
> > investigation.
> >
> > The lockdep took a check_flags error when it noticed that interrupts were
> > enabled, but the current->hardirqs_enabled was 0. Lockdep thought
> > interrupts were disabled but they were in fact enabled.
> >
> >
> > The last 4 lines of the warning have the numbers in the parenthesis
> > annotate the order of events: (Here they are in order)
> >
> > softirqs last enabled at (12566): [<ffffffff80514d2b>] lock_sock_nested+0xfb/0x110
> > hardirqs last enabled at (12567): [<ffffffff8026206a>] local_bh_enable+0xaa/0x110
> > softirqs last disabled at (12568): [<ffffffff8058454e>] tcp_prequeue_process+0x2e/0xa0
> > hardirqs last disabled at (12569): [<ffffffff80610c76>] int3+0x16/0x40
> >
> > The last change that lockdep saw was interrupts being disabled by int3. I
> > still don't understand why int3 was enabled. I have startup tests for
> > ftrace and the event tracer, but this blob happened when I first ssh'd
> > into the box.
>
> Also note that maybe some entry.S annotation could be missing, making
> kallsyms think it was running within int3 when in fact it was running in
> a different function. I would double-check with objdump giving the
> ffffffff80610c76 address to make sure.

Actually, that was the first thing I did. Because I thought int3 was
weird.

>
> Just giving my 2 cents before going to bed. I don't have any more brain
> left for tonight.

Thanks,

-- Steve

^ permalink raw reply	[flat|nested] 30+ messages in thread
* [PATCH] x86 entry_64.S lockdep fix
2009-04-17 11:09 ` Steven Rostedt
@ 2009-04-17 22:19 ` Mathieu Desnoyers
2009-04-18 13:54 ` Steven Rostedt
0 siblings, 1 reply; 30+ messages in thread
From: Mathieu Desnoyers @ 2009-04-17 22:19 UTC (permalink / raw)
To: Steven Rostedt
Cc: Peter Zijlstra, LKML, Ingo Molnar, Andrew Morton, Thomas Gleixner,
Frederic Weisbecker, Masami Hiramatsu

* Steven Rostedt (rostedt@goodmis.org) wrote:
>
>
> On Fri, 17 Apr 2009, Mathieu Desnoyers wrote:
> > > ---[ end trace 03d889e04bc7a9a7 ]---
> > > possible reason: unannotated irqs-on.
> > > irq event stamp: 12569
> > > hardirqs last enabled at (12567): [<ffffffff8026206a>] local_bh_enable+0xaa/0x110
> > > hardirqs last disabled at (12569): [<ffffffff80610c76>] int3+0x16/0x40
> > > softirqs last enabled at (12566): [<ffffffff80514d2b>] lock_sock_nested+0xfb/0x110
> > > softirqs last disabled at (12568): [<ffffffff8058454e>] tcp_prequeue_process+0x2e/0xa0
> > >
> > >
> > > Note, for some reason we hit int3 ??
> > >
> > > Tracepoints do not use int3 does it?
> > >
> >
> > Not in your ftrace tree. My LTTng tree includes the immediate values,
> > which briefly uses the int3 handler when enabling/disabling
> > tracepoints. But this seems unrelated to your problem.
>
> Maybe something else is adding it :-/
>
> >
> > > I have kprobes defined but not any kprobe self tests on.
> > >
> >
> > Could this be a userspace breakpoint then ?
>
> Nope, it happens right in the kernel. And the path I describe below points
> out that we have a bug in irq tracing when we take a int3 in kernel space.
>
> >
> > > Anyway, let me describe what the above is and what I found in my
> > > investigation.
> > >
> > > The lockdep took a check_flags error when it noticed that interrupts were
> > > enabled, but the current->hardirqs_enabled was 0. Lockdep thought
> > > interrupts were disabled but they were in fact enabled.
> > >
> > >
> > > The last 4 lines of the warning have the numbers in the parenthesis
> > > annotate the order of events: (Here they are in order)
> > >
> > > softirqs last enabled at (12566): [<ffffffff80514d2b>] lock_sock_nested+0xfb/0x110
> > > hardirqs last enabled at (12567): [<ffffffff8026206a>] local_bh_enable+0xaa/0x110
> > > softirqs last disabled at (12568): [<ffffffff8058454e>] tcp_prequeue_process+0x2e/0xa0
> > > hardirqs last disabled at (12569): [<ffffffff80610c76>] int3+0x16/0x40
> > >
> > > The last change that lockdep saw was interrupts being disabled by int3. I
> > > still don't understand why int3 was enabled. I have startup tests for
> > > ftrace and the event tracer, but this blob happened when I first ssh'd
> > > into the box.
> >
> > Also note that maybe some entry.S annotation could be missing, making
> > kallsyms think it was running within int3 when in fact it was running in
> > a different function. I would double-check with objdump giving the
> > ffffffff80610c76 address to make sure.
>
> Actually, that was the first thing I did. Because I thought int3 was
> weird.
>
> >
> > Just giving my 2 cents before going to bed. I don't have any more brain
> > left for tonight.
>
> Thanks,
>
> -- Steve

I happened to have the following patch hanging around in my LTTng tree
for a while. Would it solve your problem by any chance ? I had to move
it a bit around in my patchset to put it before the nmi-safe int3
handler patch I have, but it should apply correctly.


x86 entry_64.S lockdep fix

Add missing lockdep irq on instrumentation to entry_64.S.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 arch/x86/kernel/entry_64.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6-lttng/arch/x86/kernel/entry_64.S
===================================================================
--- linux-2.6-lttng.orig/arch/x86/kernel/entry_64.S	2009-04-17 17:44:18.000000000 -0400
+++ linux-2.6-lttng/arch/x86/kernel/entry_64.S	2009-04-17 17:53:42.000000000 -0400
@@ -1420,9 +1420,9 @@ ENTRY(paranoid_exit)
 	testl $3,CS(%rsp)
 	jnz paranoid_userspace
 paranoid_swapgs:
-	TRACE_IRQS_IRETQ 0
 	SWAPGS_UNSAFE_STACK
 paranoid_restore:
+	TRACE_IRQS_IRETQ 0
 	RESTORE_ALL 8
 	jmp irq_return
 paranoid_userspace:

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: [PATCH] x86 entry_64.S lockdep fix
2009-04-17 22:19 ` [PATCH] x86 entry_64.S lockdep fix Mathieu Desnoyers
@ 2009-04-18 13:54 ` Steven Rostedt
2009-04-19 4:11 ` Mathieu Desnoyers
0 siblings, 1 reply; 30+ messages in thread
From: Steven Rostedt @ 2009-04-18 13:54 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Peter Zijlstra, LKML, Ingo Molnar, Andrew Morton, Thomas Gleixner,
Frederic Weisbecker, Masami Hiramatsu

On Fri, 17 Apr 2009, Mathieu Desnoyers wrote:
>
> I happened to have the following patch hanging around in my LTTng tree
> for a while. Would it solve your problem by any chance ? I had to move
> it a bit around in my patchset to put it before the nmi-safe int3
> handler patch I have, but it should apply correctly.
>
>
> x86 entry_64.S lockdep fix
>
> Add missing lockdep irq on instrumentation to entry_64.S.
>
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> ---
>  arch/x86/kernel/entry_64.S | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> Index: linux-2.6-lttng/arch/x86/kernel/entry_64.S
> ===================================================================
> --- linux-2.6-lttng.orig/arch/x86/kernel/entry_64.S	2009-04-17 17:44:18.000000000 -0400
> +++ linux-2.6-lttng/arch/x86/kernel/entry_64.S	2009-04-17 17:53:42.000000000 -0400
> @@ -1420,9 +1420,9 @@ ENTRY(paranoid_exit)
>  	testl $3,CS(%rsp)
>  	jnz paranoid_userspace
>  paranoid_swapgs:
> -	TRACE_IRQS_IRETQ 0
>  	SWAPGS_UNSAFE_STACK
>  paranoid_restore:
> +	TRACE_IRQS_IRETQ 0

This is buggy. If you go here via userspace, you just did a swapgs, and
the %gs register (process context) is now zero. If you call kernel code
that does anything with "current" you will crash the system.

-- Steve

>  	RESTORE_ALL 8
>  	jmp irq_return
>  paranoid_userspace:
>
> --
> Mathieu Desnoyers
> OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
>

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: [PATCH] x86 entry_64.S lockdep fix
2009-04-18 13:54 ` Steven Rostedt
@ 2009-04-19 4:11 ` Mathieu Desnoyers
2009-04-19 9:10 ` Ingo Molnar
0 siblings, 1 reply; 30+ messages in thread
From: Mathieu Desnoyers @ 2009-04-19 4:11 UTC (permalink / raw)
To: Steven Rostedt
Cc: Peter Zijlstra, LKML, Ingo Molnar, Andrew Morton, Thomas Gleixner,
Frederic Weisbecker, Masami Hiramatsu

* Steven Rostedt (rostedt@goodmis.org) wrote:
>
> On Fri, 17 Apr 2009, Mathieu Desnoyers wrote:
> >
> > I happened to have the following patch hanging around in my LTTng tree
> > for a while. Would it solve your problem by any chance ? I had to move
> > it a bit around in my patchset to put it before the nmi-safe int3
> > handler patch I have, but it should apply correctly.
> >
> >
> > x86 entry_64.S lockdep fix
> >
> > Add missing lockdep irq on instrumentation to entry_64.S.
> >
> > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> > ---
> >  arch/x86/kernel/entry_64.S | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > Index: linux-2.6-lttng/arch/x86/kernel/entry_64.S
> > ===================================================================
> > --- linux-2.6-lttng.orig/arch/x86/kernel/entry_64.S	2009-04-17 17:44:18.000000000 -0400
> > +++ linux-2.6-lttng/arch/x86/kernel/entry_64.S	2009-04-17 17:53:42.000000000 -0400
> > @@ -1420,9 +1420,9 @@ ENTRY(paranoid_exit)
> >  	testl $3,CS(%rsp)
> >  	jnz paranoid_userspace
> >  paranoid_swapgs:
> > -	TRACE_IRQS_IRETQ 0
> >  	SWAPGS_UNSAFE_STACK
> >  paranoid_restore:
> > +	TRACE_IRQS_IRETQ 0
>
> This is buggy. If you go here via userspace, you just did a swapgs, and
> the %gs register (process context) is now zero. If you call kernel code
> that does anything with "current" you will crash the system.
>

Argh, I should not have extracted my fix from my patchset and try to
reorder the patches that late in the day. Sorry, you are indeed right.
On the bright side, the patch I send earlier has never hit a repository.
Here is a more sensible version.


Add missing lockdep irq on instrumentation to entry_64.S.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 arch/x86/kernel/entry_64.S | 3 +++
 1 file changed, 3 insertions(+)

Index: linux-2.6-lttng/arch/x86/kernel/entry_64.S
===================================================================
--- linux-2.6-lttng.orig/arch/x86/kernel/entry_64.S	2009-04-17 18:34:51.000000000 -0400
+++ linux-2.6-lttng/arch/x86/kernel/entry_64.S	2009-04-18 23:41:28.000000000 -0400
@@ -1422,7 +1422,10 @@ ENTRY(paranoid_exit)
 paranoid_swapgs:
 	TRACE_IRQS_IRETQ 0
 	SWAPGS_UNSAFE_STACK
+	RESTORE_ALL 8
+	jmp irq_return
 paranoid_restore:
+	TRACE_IRQS_IRETQ 0
 	RESTORE_ALL 8
 	jmp irq_return
 paranoid_userspace:

> -- Steve
>
> >  	RESTORE_ALL 8
> >  	jmp irq_return
> >  paranoid_userspace:
> >
> > --
> > Mathieu Desnoyers
> > OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
> >

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: [PATCH] x86 entry_64.S lockdep fix
2009-04-19 4:11 ` Mathieu Desnoyers
@ 2009-04-19 9:10 ` Ingo Molnar
0 siblings, 0 replies; 30+ messages in thread
From: Ingo Molnar @ 2009-04-19 9:10 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Steven Rostedt, Peter Zijlstra, LKML, Andrew Morton, Thomas Gleixner,
Frederic Weisbecker, Masami Hiramatsu

* Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:

> * Steven Rostedt (rostedt@goodmis.org) wrote:
> >
> > On Fri, 17 Apr 2009, Mathieu Desnoyers wrote:
> > >
> > > I happened to have the following patch hanging around in my LTTng tree
> > > for a while. Would it solve your problem by any chance ? I had to move
> > > it a bit around in my patchset to put it before the nmi-safe int3
> > > handler patch I have, but it should apply correctly.
> > >
> > >
> > > x86 entry_64.S lockdep fix
> > >
> > > Add missing lockdep irq on instrumentation to entry_64.S.
> > >
> > > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> > > ---
> > >  arch/x86/kernel/entry_64.S | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > Index: linux-2.6-lttng/arch/x86/kernel/entry_64.S
> > > ===================================================================
> > > --- linux-2.6-lttng.orig/arch/x86/kernel/entry_64.S	2009-04-17 17:44:18.000000000 -0400
> > > +++ linux-2.6-lttng/arch/x86/kernel/entry_64.S	2009-04-17 17:53:42.000000000 -0400
> > > @@ -1420,9 +1420,9 @@ ENTRY(paranoid_exit)
> > >  	testl $3,CS(%rsp)
> > >  	jnz paranoid_userspace
> > >  paranoid_swapgs:
> > > -	TRACE_IRQS_IRETQ 0
> > >  	SWAPGS_UNSAFE_STACK
> > >  paranoid_restore:
> > > +	TRACE_IRQS_IRETQ 0
> >
> > This is buggy. If you go here via userspace, you just did a swapgs, and
> > the %gs register (process context) is now zero. If you call kernel code
> > that does anything with "current" you will crash the system.
> >
>
> Argh, I should not have extracted my fix from my patchset and try to
> reorder the patches that late in the day. Sorry, you are indeed right.
> On the bright side, the patch I send earlier has never hit a repository.
> Here is a more sensible version.
>
>
> Add missing lockdep irq on instrumentation to entry_64.S.
>
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> ---
>  arch/x86/kernel/entry_64.S | 3 +++
>  1 file changed, 3 insertions(+)
>
> Index: linux-2.6-lttng/arch/x86/kernel/entry_64.S
> ===================================================================
> --- linux-2.6-lttng.orig/arch/x86/kernel/entry_64.S	2009-04-17 18:34:51.000000000 -0400
> +++ linux-2.6-lttng/arch/x86/kernel/entry_64.S	2009-04-18 23:41:28.000000000 -0400
> @@ -1422,7 +1422,10 @@ ENTRY(paranoid_exit)
>  paranoid_swapgs:
>  	TRACE_IRQS_IRETQ 0
>  	SWAPGS_UNSAFE_STACK
> +	RESTORE_ALL 8
> +	jmp irq_return
>  paranoid_restore:
> +	TRACE_IRQS_IRETQ 0
>  	RESTORE_ALL 8
>  	jmp irq_return
>  paranoid_userspace:

Yeah - that is exactly the fix from Steve that i have queued up two
days ago - see it attached below.

	Ingo

------------>
From 0300e7f1a525ae4e4ac05344624adf0e5f13ea52 Mon Sep 17 00:00:00 2001
From: Steven Rostedt <srostedt@redhat.com>
Date: Fri, 17 Apr 2009 08:33:52 -0400
Subject: [PATCH] lockdep, x86: account for irqs enabled in paranoid_exit

I hit the check_flags error of lockdep:

WARNING: at kernel/lockdep.c:2893 check_flags+0x1a7/0x1d0()
[...]
hardirqs last enabled at (12567): [<ffffffff8026206a>] local_bh_enable+0xaa/0x110
hardirqs last disabled at (12569): [<ffffffff80610c76>] int3+0x16/0x40
softirqs last enabled at (12566): [<ffffffff80514d2b>] lock_sock_nested+0xfb/0x110
softirqs last disabled at (12568): [<ffffffff8058454e>] tcp_prequeue_process+0x2e/0xa0

The check_flags warning of lockdep tells me that lockdep thought
interrupts were disabled, but they were really enabled.

The numbers in the above parentheses show the order of events:

12566: softirqs last enabled: lock_sock_nested
12567: hardirqs last enabled: local_bh_enable
12568: softirqs last disabled: tcp_prequeue_process
12569: hardirqs last disabled: int3

int3 is a breakpoint! Examining this further, I have CONFIG_NET_TCPPROBE
enabled which adds break points into the kernel.

The paranoid_exit of the return of int3 does not account for enabling
interrupts on return to kernel. This code is a bit tricky since it
is also used by the nmi handler (when lockdep is off), and we must be
careful about the swapgs. We can not call kernel code after the swapgs
has been performed.

[ Impact: fix lockdep check_flags warning + self-turn-off ]

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/entry_64.S |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index a331ec3..38946c6 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1410,7 +1410,10 @@ ENTRY(paranoid_exit)
 paranoid_swapgs:
 	TRACE_IRQS_IRETQ 0
 	SWAPGS_UNSAFE_STACK
+	RESTORE_ALL 8
+	jmp irq_return
 paranoid_restore:
+	TRACE_IRQS_IRETQ 0
 	RESTORE_ALL 8
 	jmp irq_return
 paranoid_userspace:

^ permalink raw reply related	[flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] tracing/events/lockdep: move tracepoints within recursive protection
2009-04-17 4:16 ` [PATCH 2/2] tracing/events/lockdep: move tracepoints within recursive protection Steven Rostedt
2009-04-17 4:36 ` Steven Rostedt
2009-04-17 4:52 ` Mathieu Desnoyers
@ 2009-04-17 7:44 ` Peter Zijlstra
2 siblings, 0 replies; 30+ messages in thread
From: Peter Zijlstra @ 2009-04-17 7:44 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, Ingo Molnar, Andrew Morton, Thomas Gleixner,
Frederic Weisbecker, Mathieu Desnoyers, Masami Hiramatsu

On Fri, 2009-04-17 at 00:16 -0400, Steven Rostedt wrote:
> Here's the dump that I get that triggers the lockdep warning:
>
> WARNING: at kernel/lockdep.c:2893 check_flags+0x1a7/0x1d0()
> Hardware name: Precision WorkStation 470
> Modules linked in: radeon drm autofs4 hidp rfcomm l2cap bluetooth sunrpc
>
> [snip]
>
> Pid: 3768, comm: sshd Not tainted 2.6.30-rc1 #1036
> Call Trace:
> [<ffffffff8025c321>] warn_slowpath+0xe1/0x110
> [<ffffffff80287f7c>] ? __lock_acquire+0x3ac/0xb20
> [<ffffffff80286dca>] ? validate_chain+0x4ca/0x12d0
> [<ffffffff80286dca>] ? validate_chain+0x4ca/0x12d0
> [<ffffffff80286dca>] ? validate_chain+0x4ca/0x12d0
> [<ffffffff80286dca>] ? validate_chain+0x4ca/0x12d0
> [<ffffffff80287f7c>] ? __lock_acquire+0x3ac/0xb20
> [<ffffffff8051a5a9>] ? __alloc_skb+0x49/0x160
> [<ffffffff80282407>] check_flags+0x1a7/0x1d0
> [<ffffffff80284d63>] lockdep_trace_alloc+0x33/0xe0
> [<ffffffff802f3682>] kmem_cache_alloc+0x32/0x160
> [<ffffffff8051a5a9>] __alloc_skb+0x49/0x160
> [<ffffffff8059192d>] tcp_send_ack+0x2d/0xe0
> [<ffffffff8058e941>] __tcp_ack_snd_check+0x61/0xb0
> [<ffffffff80590408>] tcp_rcv_established+0x398/0x600
> [<ffffffff80596a58>] tcp_v4_do_rcv+0x228/0x380
> [<ffffffff80228ed6>] ? ftrace_call+0x5/0x2b
> [<ffffffff8058454e>] ? tcp_prequeue_process+0x2e/0xa0
> [<ffffffff8058458d>] tcp_prequeue_process+0x6d/0xa0
> [<ffffffff8058734a>] tcp_recvmsg+0x49a/0x880
> [<ffffffff80514267>] sock_common_recvmsg+0x37/0x50
> [<ffffffff805116b9>] sock_aio_read+0x109/0x110
> [<ffffffff802f75f1>] do_sync_read+0xf1/0x130
> [<ffffffff8022ec33>] ? sched_clock+0x13/0x20
> [<ffffffff8022ec5d>] ? native_sched_clock+0x1d/0x50
> [<ffffffff802737d0>] ? autoremove_wake_function+0x0/0x40
> [<ffffffff80228ed6>] ? ftrace_call+0x5/0x2b
> [<ffffffff80381469>] ? cap_file_permission+0x9/0x10
> [<ffffffff80380116>] ? security_file_permission+0x16/0x20
> [<ffffffff802f7fc9>] vfs_read+0x159/0x170
> [<ffffffff802f8285>] sys_read+0x55/0x90
> [<ffffffff802291af>] system_call_fastpath+0x16/0x1b
> ---[ end trace 03d889e04bc7a9a7 ]---
> possible reason: unannotated irqs-on.
> irq event stamp: 12569
> hardirqs last enabled at (12567): [<ffffffff8026206a>] local_bh_enable+0xaa/0x110
> hardirqs last disabled at (12569): [<ffffffff80610c76>] int3+0x16/0x40
> softirqs last enabled at (12566): [<ffffffff80514d2b>] lock_sock_nested+0xfb/0x110
> softirqs last disabled at (12568): [<ffffffff8058454e>] tcp_prequeue_process+0x2e/0xa0
>
>
> Note, for some reason we hit int3 ??
>
> Tracepoints do not use int3 does it?
>
> I have kprobes defined but not any kprobe self tests on.

CONFIG_NET_TCPPROBE perhaps?

> Anyway, let me describe what the above is and what I found in my
> investigation.
>
> The lockdep took a check_flags error when it noticed that interrupts were
> enabled, but the current->hardirqs_enabled was 0. Lockdep thought
> interrupts were disabled but they were in fact enabled.
>
>
> The last 4 lines of the warning have the numbers in the parenthesis
> annotate the order of events: (Here they are in order)
>
> softirqs last enabled at (12566): [<ffffffff80514d2b>] lock_sock_nested+0xfb/0x110
> hardirqs last enabled at (12567): [<ffffffff8026206a>] local_bh_enable+0xaa/0x110
> softirqs last disabled at (12568): [<ffffffff8058454e>] tcp_prequeue_process+0x2e/0xa0
> hardirqs last disabled at (12569): [<ffffffff80610c76>] int3+0x16/0x40
>
> The last change that lockdep saw was interrupts being disabled by int3. I
> still don't understand why int3 was enabled. I have startup tests for
> ftrace and the event tracer, but this blob happened when I first ssh'd
> into the box.
>
> In arch/x86/kernel/entry_64.S we have:
>
> paranoidzeroentry_ist int3 do_int3 DEBUG_STACK
>
> .macro paranoidzeroentry sym do_sym
> ENTRY(\sym)
>         INTR_FRAME
>         PARAVIRT_ADJUST_EXCEPTION_FRAME
>         pushq $-1               /* ORIG_RAX: no syscall to restart */
>         CFI_ADJUST_CFA_OFFSET 8
>         subq $15*8, %rsp
>         call save_paranoid
>         TRACE_IRQS_OFF
>         movq %rsp,%rdi          /* pt_regs pointer */
>         xorl %esi,%esi          /* no error code */
>         call \do_sym
>         jmp paranoid_exit       /* %ebx: no swapgs flag */
>         CFI_ENDPROC
> END(\sym)
> .endm
>
>
> ENTRY(paranoid_exit)
>         INTR_FRAME
>         DISABLE_INTERRUPTS(CLBR_NONE)
>         TRACE_IRQS_OFF
>         testl %ebx,%ebx         /* swapgs needed? */
>         jnz paranoid_restore
>         testl $3,CS(%rsp)
>         jnz paranoid_userspace
> paranoid_swapgs:
>         TRACE_IRQS_IRETQ 0
>         SWAPGS_UNSAFE_STACK
> paranoid_restore:
>         RESTORE_ALL 8
>         jmp irq_return
>
>
> irq_return:
>         INTERRUPT_RETURN
>
>
> INTERRUPT_RETURN is simply defined as iretq
>
>
> I see that we call TRACE_IRQS_OFF when entering paranoid_exit, but if we
> do not need to swapgs (we don't because int3 looks like it happened in
> kernel space) we just call irq_return and jump back. We miss the fact that
> the irq_return enables interrupts for us.
>
> I'd try to come up with a fix, but this gets a bit complex, and I figured
> I'd let the lockdep irq-tracing gurus play with this magic. I'm just
> reporting the problem ;-)

Hehe, I think I've started several times at this kprobes stuff, always
makes my head hurt.

I'll give it another go.

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: [PATCH 0/2] [GIT PULL] updates for event tester and lockdep tracer
  2009-04-16 16:15 [PATCH 0/2] [GIT PULL] updates for event tester and lockdep tracer Steven Rostedt
  2009-04-16 16:15 ` [PATCH 1/2] tracing/events: perform function tracing in event selftests Steven Rostedt
  2009-04-16 16:15 ` [PATCH 2/2] tracing/events/lockdep: move tracepoints within recursive protection Steven Rostedt
@ 2009-04-16 16:40 ` Ingo Molnar
  2 siblings, 0 replies; 30+ messages in thread
From: Ingo Molnar @ 2009-04-16 16:40 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker

* Steven Rostedt <rostedt@goodmis.org> wrote:

> Ingo,
>
> I did not add your delay.h fix, since I was not sure if you added
> it (I saw a commit sha1 in the email).

Forgot to push it out in the big impact-line flamewars^W discussions ;-)

> I actually wrote the function trace event tester last night. But
> because it would sometimes lock up the system on bootup I held
> off.
>
> But I also found that the lockdep trace points where they are
> would lock up my system too. If I placed them inside the
> current->lockdep_recursion, everything ran stable.

Do you think this explains the self-test lockups i reported? The
lockups happened in the skb test, so i'm unsure. I left the
workaround (which comments the self-test out) for the time being;
could you check whether the config i sent works fine for you?

> I ran the lockdep trace points enabled all night with the second
> patch. Without the second patch, it would lockup within a few
> minutes.
>
> Please pull the latest tip/tracing/core-v2 tree, which can be
> found at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
> tip/tracing/core-v2
>
>
> Steven Rostedt (2):
>       tracing/events: perform function tracing in event selftests
>       tracing/events/lockdep: move tracepoints within recursive protection
>
> ----
>  kernel/lockdep.c            |   14 ++++----
>  kernel/trace/trace_events.c |   80 +++++++++++++++++++++++++++++++++++++++----
>  2 files changed, 80 insertions(+), 14 deletions(-)

Pulled (from email - i fixed up the conflict with the workaround and i
added the new-style impact lines), thanks a lot Steve!

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread
end of thread, other threads:[~2009-04-19  9:11 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --)
2009-04-16 16:15 [PATCH 0/2] [GIT PULL] updates for event tester and lockdep tracer Steven Rostedt
2009-04-16 16:15 ` [PATCH 1/2] tracing/events: perform function tracing in event selftests Steven Rostedt
2009-04-17 16:10 ` [tip:tracing/core] " tip-bot for Steven Rostedt
2009-04-16 16:15 ` [PATCH 2/2] tracing/events/lockdep: move tracepoints within recursive protection Steven Rostedt
2009-04-16 16:47 ` Peter Zijlstra
2009-04-16 17:03 ` Steven Rostedt
2009-04-16 17:20 ` Peter Zijlstra
2009-04-16 17:38 ` Steven Rostedt
2009-04-16 17:49 ` Peter Zijlstra
2009-04-16 17:58 ` Steven Rostedt
2009-04-16 18:06 ` Peter Zijlstra
2009-04-16 18:12 ` Steven Rostedt
2009-04-16 18:29 ` Mathieu Desnoyers
2009-04-16 18:22 ` Mathieu Desnoyers
2009-04-16 17:45 ` Steven Rostedt
2009-04-16 17:53 ` Peter Zijlstra
2009-04-17  3:03 ` Steven Rostedt
2009-04-17  3:24 ` Steven Rostedt
2009-04-17  7:40 ` Peter Zijlstra
2009-04-17 16:03 ` [tip:core/urgent] lockdep: more robust lockdep_map init sequence tip-bot for Peter Zijlstra
2009-04-17  4:16 ` [PATCH 2/2] tracing/events/lockdep: move tracepoints within recursive protection Steven Rostedt
2009-04-17  4:36 ` Steven Rostedt
2009-04-17  4:52 ` Mathieu Desnoyers
2009-04-17 11:09 ` Steven Rostedt
2009-04-17 22:19 ` [PATCH] x86 entry_64.S lockdep fix Mathieu Desnoyers
2009-04-18 13:54 ` Steven Rostedt
2009-04-19  4:11 ` Mathieu Desnoyers
2009-04-19  9:10 ` Ingo Molnar
2009-04-17  7:44 ` [PATCH 2/2] tracing/events/lockdep: move tracepoints within recursive protection Peter Zijlstra
2009-04-16 16:40 ` [PATCH 0/2] [GIT PULL] updates for event tester and lockdep tracer Ingo Molnar