From: Ingo Molnar <mingo@elte.hu>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: "Frédéric Weisbecker" <fweisbec@gmail.com>,
LKML <linux-kernel@vger.kernel.org>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Glauber de Oliveira Costa" <gcosta@redhat.com>,
"Chris Wright" <chrisw@sous-sol.org>,
"Jeremy Fitzhardinge" <jeremy@goop.org>,
"Rusty Russell" <rusty@rustcorp.com.au>,
"Pekka Enberg" <penberg@cs.helsinki.fi>
Subject: Re: [PATCH 0/2] [GIT PULL] tracing: various bug fixes
Date: Thu, 23 Apr 2009 10:20:31 +0200 [thread overview]
Message-ID: <20090423082031.GA599@elte.hu> (raw)
In-Reply-To: <alpine.DEB.2.00.0904222200350.24310@gandalf.stny.rr.com>
* Steven Rostedt <rostedt@goodmis.org> wrote:
> On Wed, 22 Apr 2009, Steven Rostedt wrote:
>
> >
> >
> > On Tue, 21 Apr 2009, Ingo Molnar wrote:
> >
> > >
> > > * Ingo Molnar <mingo@elte.hu> wrote:
> > >
> > > > Pulled, thanks guys! I've reactivated the self-tests in tip:master
> > > > - lets see how it goes today.
> > >
> > > i'm still getting a ring-buffer warning - see below. Config
> > > attached.
> > >
> > > Ingo
> > >
> > > [ 7.671145] calling clear_boot_tracer+0x0/0x52 @ 1
> > > [ 7.672145] initcall clear_boot_tracer+0x0/0x52 returned 0 after 0 usecs
> > > [ 7.673145] calling event_trace_self_tests_init+0x0/0x5f @ 1
> > > [ 7.674206] Running tests on trace events:
> > > [ 7.675143] Testing event kfree_skb: OK
> > > [ 7.680145] Testing event kmalloc: OK
> > > [ 7.683819] Testing event kmem_cache_alloc: OK
> > > [ 7.686755] Testing event kmalloc_node: OK
> > > [ 7.689744] Testing event kmem_cache_alloc_node: OK
> > > [ 7.692711] Testing event kfree: OK
> > > [ 7.695712] Testing event kmem_cache_free: OK
> > > [ 7.698744] Testing event irq_handler_exit: OK
> > > [ 7.701724] Testing event irq_handler_entry: OK
> > > [ 7.704710] Testing event softirq_entry: OK
> > > [ 7.707743] Testing event softirq_exit: OK
> > > [ 7.710768] Testing event lock_acquired: OK
> > > [ 7.713738] Testing event lock_acquire: OK
> > > [ 7.716782] Testing event lock_release: OK
> > > [ 7.719771] Testing event lock_contended: OK
> > > [ 7.722740] Testing event sched_kthread_stop: OK
> > > [ 7.726741] Testing event sched_kthread_stop_ret: OK
> > > [ 7.729710] Testing event sched_wait_task: OK
> > > [ 7.732706] Testing event sched_wakeup: OK
> > > [ 7.735713] Testing event sched_wakeup_new: OK
> > > [ 7.738727] Testing event sched_switch: OK
> > > [ 7.741724] Testing event sched_migrate_task: OK
> > > [ 7.745711] Testing event sched_process_free: OK
> > > [ 7.749704] Testing event sched_process_exit: OK
> > > [ 7.753699] Testing event sched_process_wait: OK
> > > [ 7.757692] Testing event sched_process_fork: OK
> > > [ 7.761711] Testing event sched_signal_send: OK
> > > [ 7.764709] Running tests on trace event systems:
> > > [ 7.765144] Testing event system skb: OK
> > > [ 7.768776] Testing event system kmem: OK
> > > [ 7.771817] Testing event system irq: OK
> > > [ 7.774778] Testing event system lockdep: OK
> > > [ 7.777921] Testing event system sched: OK
> > > [ 7.780902] Running tests on all trace events:
> > > [ 7.781148] Testing all events: OK
> > > [ 7.787765] Running tests again, along with the function tracer
> > > [ 7.788156] Running tests on trace events:
> > > [ 7.789154] Testing event kfree_skb: OK
> > > [ 7.792858] Testing event kmalloc: OK
> > > [ 7.795853] Testing event kmem_cache_alloc: OK
> > > [ 7.799848] Testing event kmalloc_node: OK
> > > [ 7.802966] Testing event kmem_cache_alloc_node: OK
> > > [ 7.805970] Testing event kfree: OK
> > > [ 7.808942] Testing event kmem_cache_free: OK
> > > [ 7.812935] Testing event irq_handler_exit: OK
> > > [ 7.815850] Testing event irq_handler_entry: OK
> > > [ 7.818877] Testing event softirq_entry: OK
> > > [ 7.822846] Testing event softirq_exit: OK
> > > [ 7.825895] Testing event lock_acquired: OK
> > > [ 7.830045] Testing event lock_acquire: OK
> > > [ 7.834025] Testing event lock_release: OK
> > > [ 7.837920] Testing event lock_contended: OK
> > > [ 7.841854] Testing event sched_kthread_stop: OK
> > > [ 7.844883] Testing event sched_kthread_stop_ret: OK
> > > [ 7.847850] Testing event sched_wait_task: OK
> > > [ 7.851938] Testing event sched_wakeup: OK
> > > [ 7.855960] Testing event sched_wakeup_new: OK
> > > [ 7.858884] Testing event sched_switch: OK
> > > [ 7.861941] Testing event sched_migrate_task: OK
> > > [ 7.864972] Testing event sched_process_free: OK
> > > [ 7.867853] Testing event sched_process_exit: OK
> > > [ 7.870890] Testing event sched_process_wait: OK
> > > [ 7.873856] Testing event sched_process_fork: OK
> > > [ 7.876930] Testing event sched_signal_send: OK
> > > [ 7.879853] Running tests on trace event systems:
> > > [ 7.880154] Testing event system skb: OK
> > > [ 7.883940] Testing event system kmem: OK
> > > [ 7.888105] Testing event system irq: OK
> > > [ 7.892085] Testing event system lockdep: OK
> > > [ 7.896154] Testing event system sched: OK
> > > [ 7.901156] Running tests on all trace events:
> > > [ 7.903170] Testing all events: <4>------------[ cut here ]------------
> > > [ 7.909137] WARNING: at kernel/trace/ring_buffer.c:1245 __rb_reserve_next+0x130/0x40c()
> > > [ 7.909137] Modules linked in:
> > > [ 7.909137] Pid: 0, comm: swapper Not tainted 2.6.30-rc2-tip #38087
> > > [ 7.909137] Call Trace:
> > > [ 7.909137] <IRQ> [<ffffffff8025a273>] warn_slowpath+0xca/0x115
> > > [ 7.909137] [<ffffffff802bf4fe>] ? __rb_reserve_next+0x2a5/0x40c
> > > [ 7.909137] [<ffffffff8021ab74>] ? native_sched_clock+0x4f/0x6a
> > > [ 7.909137] [<ffffffff8021ab74>] ? native_sched_clock+0x4f/0x6a
> > > [ 7.909137] [<ffffffff8021ab74>] ? native_sched_clock+0x4f/0x6a
> > > [ 7.909137] [<ffffffff802bf2c6>] ? __rb_reserve_next+0x6d/0x40c
> > > [ 7.909137] [<ffffffff802cb54e>] ? time_hardirqs_off+0x31/0x47
> > > [ 7.909137] [<ffffffff802bf389>] __rb_reserve_next+0x130/0x40c
> > > [ 7.909137] [<ffffffff802c00ad>] rb_reserve_next_event+0x1d3/0x31a
> > > [ 7.909137] [<ffffffff802c04c9>] ring_buffer_lock_reserve+0x16b/0x1cb
> > > [ 7.909137] [<ffffffff802c4f4b>] trace_buffer_lock_reserve+0x34/0x7e
> > > [ 7.909137] [<ffffffff802c542c>] trace_vbprintk+0x136/0x275
> > > [ 7.909137] [<ffffffff8028534f>] ? trace_hardirqs_on_caller+0x132/0x18e
> > > [ 7.909137] [<ffffffff80286cfb>] ? lock_acquire+0x74/0x15b
> > > [ 7.909137] [<ffffffff802ca54b>] __trace_bprintk+0x80/0x9d
> > > [ 7.909137] [<ffffffff802661bc>] ? run_timer_softirq+0x16f/0x281
> > > [ 7.909137] [<ffffffff802ca54b>] ? __trace_bprintk+0x80/0x9d
> > > [ 7.909137] [<ffffffff80283871>] ? put_lock_stats+0x44/0x5a
> > > [ 7.909137] [<ffffffff802c5d68>] ? tracing_record_cmdline+0x3d/0x141
> > > [ 7.909137] [<ffffffff802661bc>] ? run_timer_softirq+0x16f/0x281
> > > [ 7.909137] [<ffffffff8028259c>] ftrace_event_lock_acquire+0x6a/0x89
> > > [ 7.909137] [<ffffffff80266273>] ? run_timer_softirq+0x226/0x281
> > > [ 7.909137] [<ffffffff80286cfb>] lock_acquire+0x74/0x15b
> > > [ 7.909137] [<ffffffff80266273>] ? run_timer_softirq+0x226/0x281
> > > [ 7.909137] [<ffffffff802668c1>] ? process_timeout+0x0/0x37
> > > [ 7.909137] [<ffffffff80a246af>] _spin_lock_irq+0x5c/0xa5
> > > [ 7.909137] [<ffffffff80266273>] ? run_timer_softirq+0x226/0x281
> > > [ 7.909137] [<ffffffff80266273>] run_timer_softirq+0x226/0x281
> > > [ 7.909137] [<ffffffff802661bc>] ? run_timer_softirq+0x16f/0x281
> > > [ 7.909137] [<ffffffff802133d6>] ? ftrace_call+0x5/0x2b
> > > [ 7.909137] [<ffffffff80260bd3>] __do_softirq+0xff/0x22c
> > > [ 7.909137] [<ffffffff802149bc>] call_softirq+0x1c/0x30
> > > [ 7.909137] [<ffffffff802163df>] do_softirq+0x5f/0xdb
> > > [ 7.909137] [<ffffffff802601ac>] irq_exit+0x5d/0x7d
> > > [ 7.909137] [<ffffffff80a2a076>] do_IRQ+0xbe/0xeb
> > > [ 7.909137] [<ffffffff80214153>] ret_from_intr+0x0/0x16
> > > [ 7.909137] <EOI> [<ffffffff802bf2a9>] ? __rb_reserve_next+0x50/0x40c
> > > [ 7.909137] [<ffffffff802c00ad>] ? rb_reserve_next_event+0x1d3/0x31a
> > > [ 7.909137] [<ffffffff802c00ad>] ? rb_reserve_next_event+0x1d3/0x31a
> > > [ 7.909137] [<ffffffff802c04c9>] ? ring_buffer_lock_reserve+0x16b/0x1cb
> > > [ 7.909137] [<ffffffff802c4f4b>] ? trace_buffer_lock_reserve+0x34/0x7e
> > > [ 7.909137] [<ffffffff8021b722>] ? test_ti_thread_flag+0x8/0x38
> > > [ 7.909137] [<ffffffff802c5d15>] ? trace_current_buffer_lock_reserve+0x36/0x4c
> > > [ 7.909137] [<ffffffff802d3227>] ? function_test_events_call+0x8f/0x12e
> > > [ 7.909137] [<ffffffff8021b841>] ? need_resched+0x36/0x54
> > > [ 7.909137] [<ffffffff802d3251>] ? function_test_events_call+0xb9/0x12e
> > > [ 7.909137] [<ffffffff802133d6>] ? ftrace_call+0x5/0x2b
> > > [ 7.909137] [<ffffffff8021b727>] ? test_ti_thread_flag+0xd/0x38
> > > [ 7.909137] [<ffffffff8021b841>] ? need_resched+0x36/0x54
> > > [ 7.909137] [<ffffffff8021b945>] ? poll_idle+0x41/0x64
> > > [ 7.909137] [<ffffffff802cb871>] ? stop_critical_timings+0x3d/0x54
> > > [ 7.909137] [<ffffffff802127b2>] ? cpu_idle+0xb1/0x112
> > > [ 7.909137] [<ffffffff809e6369>] ? rest_init+0x8d/0xa3
> > > [ 7.909137] [<ffffffff8108709a>] ? start_kernel+0x49c/0x4bd
> > > [ 7.909137] [<ffffffff81086140>] ? early_idt_handler+0x0/0x71
> > > [ 7.909137] [<ffffffff810862ce>] ? x86_64_start_reservations+0xb9/0xd4
> > > [ 7.909137] [<ffffffff81086000>] ? _sinittext+0x0/0x140
> > > [ 7.909137] [<ffffffff81086441>] ? x86_64_start_kernel+0x158/0x17b
> > > [ 7.909137] ---[ end trace 3b85485e39bcb7cf ]---
> > > [ 7.912099] OK
> > > [ 7.913421] initcall event_trace_self_tests_init+0x0/0x5f returned 0 after 233398 usecs
> > > [ 7.914172] calling random32_reseed+0x0/0xce @ 1
> > > [ 7.915159] initcall random32_reseed+0x0/0xce returned 0 after 0 usecs
> > > [ 7.916145] calling pci_resource_alignment_sysfs_init+0x0/0x40 @ 1
> > >
> >
> > I spent the entire day (and half the night) debugging this. I was fighting
> > a case where the hardirqs_enabled flag in the task struct (lockdep flag)
> > was mysteriously being set and cleared. I stepped through the entire
> > kernel thread fork process (that was an exercise) and could not find
> > anything wrong.
> >
> > Sometimes it would go away with printk's sometimes it would not. This was
> > driving me crazy, until I noticed that paravirt was enabled.
> >
> > Turning off paravirtualization here (so far) makes everything run
> > smoothly.
> >
> > Thus my theory is that there's something fishy with the modifying of the
> > irq enable/disable code when the system detects that it is running on bare
> > hardware.
> >
> > I'm too tired to look at this more. Ingo supplied a config to play with.
> > You can disable VSMP too and it will still trigger the crash.
> >
>
> OK, after spending two full days doing a config bisect to find out what is
> happening, I've found that CONFIG_PARAVIRT and CONFIG_DEBUG_PAGEALLOC do
> not play nicely. But I have yet to find out why.
>
> Here's the situation:
>
> We've added selftests for the event tracer. What this basically does is
> enables each event one at a time and runs tests. The tests include
> creating a kernel thread, executing workqueues and grabbing locks.
>
> We also have PROVE_LOCKING (LOCKDEP) enabled, that keeps track of
> interrupts being enabled. When they are, we set a flag in the task struct
> "hardirqs_enabled". When they are disabled, this flag is cleared.
>
> When we fork a process, a test is made to see if the flag is set for the
> new process and if it is not, a warning is printed (as is done in the
> above dump).
>
> I investigated this and found that the flag is mysteriously being set and
> cleared for no apparent reason. The flag is in the task struct and nothing
> should be touching it. In fact, it is a full int, not even a bit in a
> flags variable.
>
> I had a test that would print the flag and irqs_disabled() in copy_process
> before the warning and it showed that the flag was cleared but irqs was
> enabled. The funny part is, I if the test triggered, I printed the flag
> again, and the second print it was set again!!
>
> if (!p->hardirqs_enabled) {
> printk("irqs:%d flag:%d\n", irqs_disabled(),
> p->hardirqs_enabled);
> printk("try again: %d\n", p->hardirqs_enabled);
> }
>
> The first print showed that it was cleared, the second showed it was set
> again??
function tracer was active? So somewhere there we corrupted this
state? It's unlikely that printk itself did this.
Ingo
next prev parent reply other threads:[~2009-04-23 8:22 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-04-20 22:22 [PATCH 0/2] [GIT PULL] tracing: various bug fixes Steven Rostedt
2009-04-20 22:22 ` [PATCH 1/2] tracing: use recursive counter over irq level Steven Rostedt
2009-04-20 22:22 ` [PATCH 2/2] tracing: use nowakeup version of commit for function event trace tests Steven Rostedt
2009-04-20 22:41 ` Frederic Weisbecker
2009-04-20 22:57 ` Steven Rostedt
2009-04-21 8:23 ` [PATCH 0/2] [GIT PULL] tracing: various bug fixes Ingo Molnar
2009-04-21 9:46 ` Ingo Molnar
2009-04-21 13:08 ` Steven Rostedt
2009-04-21 13:55 ` [PATCH][GIT PULL] ring-buffer: only warn on wrap if buffer is bigger than two pages Steven Rostedt
2009-04-21 14:35 ` Ingo Molnar
2009-04-21 14:54 ` Steven Rostedt
2009-04-21 16:22 ` Steven Rostedt
2009-04-21 14:03 ` [tip:tracing/core] " tip-bot for Steven Rostedt
2009-04-22 6:48 ` [PATCH 0/2] [GIT PULL] tracing: various bug fixes Steven Rostedt
2009-04-22 11:47 ` Frederic Weisbecker
2009-04-22 13:49 ` Steven Rostedt
2009-04-22 17:10 ` Frederic Weisbecker
2009-04-22 17:17 ` Jeremy Fitzhardinge
2009-04-22 17:22 ` Steven Rostedt
2009-04-22 21:32 ` Steven Rostedt
2009-04-23 8:28 ` Ingo Molnar
2009-04-23 3:35 ` Steven Rostedt
2009-04-23 8:20 ` Ingo Molnar [this message]
2009-04-23 13:53 ` Steven Rostedt
2009-04-23 14:39 ` Steven Rostedt
2009-04-23 15:08 ` Steven Rostedt
2009-04-23 15:11 ` Ingo Molnar
2009-04-23 16:49 ` Jeremy Fitzhardinge
2009-04-23 17:21 ` Chris Wright
2009-04-23 18:02 ` Chris Wright
2009-04-23 18:33 ` Steven Rostedt
2009-04-23 18:43 ` Chris Wright
2009-04-24 8:36 ` Ingo Molnar
2009-04-24 15:12 ` Chris Wright
2009-04-29 6:16 ` [tip:tracing/core] x86: use native register access for native tlb flushing tip-bot for Chris Wright
2009-04-23 17:52 ` [PATCH 0/2] [GIT PULL] tracing: various bug fixes Steven Rostedt
2009-04-24 8:34 ` Ingo Molnar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090423082031.GA599@elte.hu \
--to=mingo@elte.hu \
--cc=akpm@linux-foundation.org \
--cc=chrisw@sous-sol.org \
--cc=fweisbec@gmail.com \
--cc=gcosta@redhat.com \
--cc=jeremy@goop.org \
--cc=linux-kernel@vger.kernel.org \
--cc=penberg@cs.helsinki.fi \
--cc=rostedt@goodmis.org \
--cc=rusty@rustcorp.com.au \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).