* [BUG] perf: hard lockup when using perf-sched
@ 2010-03-17 7:21 Li Zefan
2010-03-24 5:52 ` Li Zefan
0 siblings, 1 reply; 16+ messages in thread
From: Li Zefan @ 2010-03-17 7:21 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra
Cc: Frederic Weisbecker, Arnaldo Carvalho de Melo, Paul Mackerras,
LKML
[-- Attachment #1: Type: text/plain, Size: 374 bytes --]
An idea came to me that running perf-sched with a fork "bomb" might
reveal some bug.
So I wrote a program (attached), and did the following test:
# gcc fork.c
# ./a.out &
# perf sched record
After a few seconds, the machine froze, without printing any message.
I've tried the test both on v2.6.34-rc1 and v2.6.34-rc1-tip+.
I'll send you any other information you need.
[-- Attachment #2: fork.c --]
[-- Type: text/plain, Size: 404 bytes --]
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
int main(void)
{
int i;
int pid;
while (1) {
for (i = 0; i < 400; i++) {
pid = fork();
if (pid == 0) {
return 0;
} else if (pid < 0) {
fprintf(stderr, "faild to fork!\n");
continue;
}
}
for (i = 0; i < 400; i++)
if (wait(NULL) < 0)
break;
}
return 0;
}
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] perf: hard lockup when using perf-sched
2010-03-17 7:21 [BUG] perf: hard lockup when using perf-sched Li Zefan
@ 2010-03-24 5:52 ` Li Zefan
2010-03-24 7:32 ` Mike Galbraith
0 siblings, 1 reply; 16+ messages in thread
From: Li Zefan @ 2010-03-24 5:52 UTC (permalink / raw)
To: LKML
Cc: Ingo Molnar, Peter Zijlstra, Frederic Weisbecker,
Arnaldo Carvalho de Melo, Paul Mackerras
Anyone have a look at this bug ?
Li Zefan wrote:
> An idea came to me that running perf-sched with a fork "bomb" might
> reveal some bug.
>
> So I wrote a program (attached), and did the following test:
>
> # gcc fork.c
> # ./a.out &
> # perf sched record
>
> After a few seconds, the machine froze, without printing any message.
>
> I've tried the test both on v2.6.34-rc1 and v2.6.34-rc1-tip+.
>
> I'll send you any other information you need.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] perf: hard lockup when using perf-sched
2010-03-24 5:52 ` Li Zefan
@ 2010-03-24 7:32 ` Mike Galbraith
2010-03-24 8:17 ` Mike Galbraith
0 siblings, 1 reply; 16+ messages in thread
From: Mike Galbraith @ 2010-03-24 7:32 UTC (permalink / raw)
To: Li Zefan
Cc: LKML, Ingo Molnar, Peter Zijlstra, Frederic Weisbecker,
Arnaldo Carvalho de Melo, Paul Mackerras
On Wed, 2010-03-24 at 13:52 +0800, Li Zefan wrote:
> Anyone have a look at this bug ?
>
> Li Zefan wrote:
> > An idea came to me that running perf-sched with a fork "bomb" might
> > reveal some bug.
> >
> > So I wrote a program (attached), and did the following test:
> >
> > # gcc fork.c
> > # ./a.out &
> > # perf sched record
> >
> > After a few seconds, the machine froze, without printing any message.
> >
> > I've tried the test both on v2.6.34-rc1 and v2.6.34-rc1-tip+.
> >
> > I'll send you any other information you need.
I just saw this, hunted down your testcase and tried it here. Looks
like perf_output_lock() wedged box.
[ 212.439223] BUG: NMI Watchdog detected LOCKUP on CPU2, ip ffffffff81078ca2, registers:
[ 212.439223] CPU 2
[ 212.439223] Modules linked in: cpufreq_conservative cpufreq_ondemand cpufreq_userspace cpufreq_powersave acpi_cpufreq snd_pcm_oss snd_mixer_oss microcode snd_seq snd_seq_device fuse loop dm_mod snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer firewire_ohci firewire_core crc_itu_t usb_storage snd ohci1394 soundcore snd_page_alloc usb_libusual rtc_cmos ieee1394 thermal button processor rtc_core sr_mod rtc_lib e1000e i2c_i801 cdrom sg usbhid hid uhci_hcd ehci_hcd sd_mod usbcore edd fan ext3 ext2 mbcache jbd ahci libata scsi_mod
[ 212.439223]
[ 212.439223] Pid: 0, comm: swapper Not tainted 2.6.34-smpx #1475 MS-7502/MS-7502
[ 212.439223] RIP: 0010:[<ffffffff81078ca2>] [<ffffffff81078ca2>] perf_output_begin+0x81/0x191
[ 212.439223] RSP: 0018:ffff880001703af0 EFLAGS: 00000086
[ 212.439223] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000002
[ 212.439223] RDX: 0000000000000058 RSI: ffff88003715c000 RDI: ffff880001703b40
[ 212.439223] RBP: 0000000000000000 R08: 0000000000000001 R09: ffff8800bb81fc00
[ 212.439223] R10: 0000000000000200 R11: 0000000000000000 R12: 0000000000000001
[ 212.439223] R13: ffff88000170f9e0 R14: ffff880001703c90 R15: ffff880001703b70
[ 212.439223] FS: 0000000000000000(0000) GS:ffff880001700000(0000) knlGS:0000000000000000
[ 212.439223] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 212.439223] CR2: 00007f1b2e3aa4a8 CR3: 0000000001420000 CR4: 00000000000006e0
[ 212.439223] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 212.439223] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 212.439223] Process swapper (pid: 0, threadinfo ffff8800bf892000, task ffff8800bf843450)
[ 212.439223] Stack:
[ 212.439223] ffff880001703b70 ffffffff810797a3 ffff8800bf843450 ffff880000000000
[ 212.439223] <0> 00000000ffffffff 0000000000000002 ffff88000170f9e0 ffff8800bfb1c800
[ 212.439223] <0> 0000000000000000 ffffffff81079a14 ffff8800bb81fc00 ffff88003715c000
[ 212.439223] Call Trace:
[ 212.439223] <IRQ>
[ 212.439223] [<ffffffff810797a3>] ? perf_prepare_sample+0xa6/0x1ba
[ 212.439223] [<ffffffff81079a14>] ? __perf_event_overflow+0x15d/0x199
[ 212.439223] [<ffffffff81079a9e>] ? perf_swevent_overflow+0x4e/0x68
[ 212.439223] [<ffffffff81079be1>] ? perf_swevent_ctx_event+0xda/0xeb
[ 212.439223] [<ffffffff81079c3b>] ? do_perf_sw_event+0x49/0x85
[ 212.439223] [<ffffffff81079ca8>] ? perf_tp_event+0x31/0x36
[ 212.439223] [<ffffffff81023332>] ? perf_trace_templ_sched_wakeup_template+0xbe/0xd1
[ 212.439223] [<ffffffff8102b366>] ? try_to_wake_up+0x160/0x21f
[ 212.439223] [<ffffffff8102cde5>] ? load_balance+0x3b8/0x467
[ 212.439223] [<ffffffff8110cad1>] ? __blk_run_queue+0x77/0x133
[ 212.439223] [<ffffffffa00056e7>] ? scsi_run_queue+0x2d0/0x367 [scsi_mod]
[ 212.439223] [<ffffffff8102cf6a>] ? rebalance_domains+0xd6/0x144
[ 212.439223] [<ffffffff8102d013>] ? run_rebalance_domains+0x3b/0xc6
[ 212.439223] [<ffffffff81036270>] ? __do_softirq+0xd6/0x193
[ 212.439223] [<ffffffff810791cb>] ? perf_ctx_adjust_freq+0x10/0xec
[ 212.439223] [<ffffffff81002d0c>] ? call_softirq+0x1c/0x28
[ 212.439223] [<ffffffff81004942>] ? do_softirq+0x31/0x67
[ 212.439223] [<ffffffff810363bb>] ? irq_exit+0x36/0x7f
[ 212.439223] [<ffffffff81015e92>] ? smp_apic_timer_interrupt+0x87/0x97
[ 212.439223] [<ffffffff810027d3>] ? apic_timer_interrupt+0x13/0x20
[ 212.439223] <EOI>
[ 212.439223] [<ffffffff810090b1>] ? mwait_idle+0x9a/0x9f
[ 212.439223] [<ffffffff81000747>] ? cpu_idle+0x4f/0x6b
[ 212.439223] Code: c7 47 28 00 00 00 00 0f 45 d0 65 8b 0c 25 58 d1 00 00 c7 44 24 20 ff ff ff ff 89 4c 24 28 8b 5c 24 28 8b 44 24 20 f0 0f b1 5e 38 <89> 44 24 18 8b 44 24 18 83 f8 ff 75 09 c7 47 28 01 00 00 00 eb
[ 212.439223] ---[ end trace f8a4ceb8268dbe33 ]---
[ 212.439228] BUG: NMI Watchdog detected LOCKUP on CPU1, ip ffffffff81078ca2, registers:
[ 212.439228] CPU 1
[ 212.439228] Modules linked in: cpufreq_conservative cpufreq_ondemand cpufreq_userspace cpufreq_powersave acpi_cpufreq snd_pcm_oss snd_mixer_oss microcode snd_seq snd_seq_device fuse loop dm_mod snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer firewire_ohci firewire_core crc_itu_t usb_storage snd ohci1394 soundcore snd_page_alloc usb_libusual rtc_cmos ieee1394 thermal button processor rtc_core sr_mod rtc_lib e1000e i2c_i801 cdrom sg usbhid hid uhci_hcd ehci_hcd sd_mod usbcore edd fan ext3 ext2 mbcache jbd ahci libata scsi_mod
[ 212.439228]
[ 212.439228] Pid: 9149, comm: forkbomb Tainted: G D 2.6.34-smpx #1475 MS-7502/MS-7502
[ 212.439228] RIP: 0010:[<ffffffff81078ca2>] [<ffffffff81078ca2>] perf_output_begin+0x81/0x191
[ 212.439228] RSP: 0018:ffff880001683c68 EFLAGS: 00000086
[ 212.439228] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000001
[ 212.439228] RDX: 0000000000000060 RSI: ffff88003715c000 RDI: ffff880001683cb8
[ 212.439228] RBP: 0000000000000000 R08: 0000000000000001 R09: ffff8800bb81fc00
[ 212.439228] R10: dead000000100100 R11: 0000000000000000 R12: 0000000000000001
[ 212.439228] R13: ffff88000168f9e0 R14: ffff880001683e08 R15: ffff880001683ce8
[ 212.439228] FS: 00007f1b2e582700(0000) GS:ffff880001680000(0000) knlGS:0000000000000000
[ 212.439228] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 212.439228] CR2: 00007f1b2e3abe04 CR3: 00000000bc5a7000 CR4: 00000000000006e0
[ 212.439228] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 212.439228] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 212.439228] Process forkbomb (pid: 9149, threadinfo ffff8800b839a000, task ffff8800b9719170)
[ 212.439228] Stack:
[ 212.439228] ffff880001683ce8 ffffffff810797a3 ffffffff81079a31 ffff880000000000
[ 212.439228] <0> 00000000ffffffff 0000000000000001 ffff88000168f9e0 ffff8800ba313000
[ 212.439228] <0> 0000000000000000 ffffffff81079a14 ffff8800bb81fc00 ffff88003715c000
[ 212.439228] Call Trace:
[ 212.439228] <IRQ>
[ 212.439228] [<ffffffff810797a3>] ? perf_prepare_sample+0xa6/0x1ba
[ 212.439228] [<ffffffff81079a31>] ? __perf_event_overflow+0x17a/0x199
[ 212.439228] [<ffffffff81079a14>] ? __perf_event_overflow+0x15d/0x199
[ 212.439228] [<ffffffff81079a9e>] ? perf_swevent_overflow+0x4e/0x68
[ 212.439228] [<ffffffff81079a9e>] ? perf_swevent_overflow+0x4e/0x68
[ 212.439228] [<ffffffff81079be1>] ? perf_swevent_ctx_event+0xda/0xeb
[ 212.439228] [<ffffffff81079c3b>] ? do_perf_sw_event+0x49/0x85
[ 212.439228] [<ffffffff81079c3b>] ? do_perf_sw_event+0x49/0x85
[ 212.439228] [<ffffffff81079ca8>] ? perf_tp_event+0x31/0x36
[ 212.439228] [<ffffffff81079ca8>] ? perf_tp_event+0x31/0x36
[ 212.439228] [<ffffffff810239ab>] ? perf_trace_sched_stat_runtime+0xbb/0xd0
[ 212.439228] [<ffffffff8102539e>] ? update_curr+0x86/0xca
[ 212.439228] [<ffffffff81025522>] ? task_tick_fair+0x3d/0x107
[ 212.439228] [<ffffffff8102dba2>] ? scheduler_tick+0xd0/0x1d8
[ 212.439228] [<ffffffff8103d7dd>] ? update_process_times+0x40/0x4c
[ 212.439228] [<ffffffff81051e1d>] ? tick_handle_periodic+0x18/0x5f
[ 212.439228] [<ffffffff81015e8d>] ? smp_apic_timer_interrupt+0x82/0x97
[ 212.439228] [<ffffffff810027d3>] ? apic_timer_interrupt+0x13/0x20
[ 212.439228] <EOI>
[ 212.439228] [<ffffffff811212b7>] ? clear_page_c+0x7/0x10
[ 212.439228] [<ffffffff81080c11>] ? get_page_from_freelist+0x373/0x3fc
[ 212.439228] [<ffffffff81080da9>] ? __alloc_pages_nodemask+0x10f/0x601
[ 212.439228] [<ffffffff8104c9b0>] ? sched_clock_local+0x10/0x75
[ 212.439228] [<ffffffff81080c11>] ? get_page_from_freelist+0x373/0x3fc
[ 212.439228] [<ffffffff810812ad>] ? __get_free_pages+0x12/0x52
[ 212.439228] [<ffffffff810905a7>] ? __pmd_alloc+0x14/0x71
[ 212.439228] [<ffffffff81091cfa>] ? copy_page_range+0x22a/0x6c4
[ 212.439228] [<ffffffff81079c3b>] ? do_perf_sw_event+0x49/0x85
[ 212.439228] [<ffffffff8102f17a>] ? dup_mm+0x2c0/0x44c
[ 212.439228] [<ffffffff8102faae>] ? copy_process+0x762/0xe05
[ 212.439228] [<ffffffff81090c6a>] ? handle_mm_fault+0x666/0x6ad
[ 212.439228] [<ffffffff810302cd>] ? do_fork+0x155/0x2ac
[ 212.439228] [<ffffffff8101c1da>] ? do_page_fault+0x264/0x282
[ 212.439228] [<ffffffff81002153>] ? stub_clone+0x13/0x20
[ 212.439228] [<ffffffff81001eeb>] ? system_call_fastpath+0x16/0x1b
[ 212.439228] Code: c7 47 28 00 00 00 00 0f 45 d0 65 8b 0c 25 58 d1 00 00 c7 44 24 20 ff ff ff ff 89 4c 24 28 8b 5c 24 28 8b 44 24 20 f0 0f b1 5e 38 <89> 44 24 18 8b 44 24 18 83 f8 ff 75 09 c7 47 28 01 00 00 00 eb
[ 212.439228] ---[ end trace f8a4ceb8268dbe34 ]---
[ 212.439219] BUG: NMI Watchdog detected LOCKUP on CPU3, ip ffffffff81078ca2, registers:
[ 212.439219] CPU 3
[ 212.439219] Modules linked in: cpufreq_conservative cpufreq_ondemand cpufreq_userspace cpufreq_powersave acpi_cpufreq snd_pcm_oss snd_mixer_oss microcode snd_seq snd_seq_device fuse loop dm_mod snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer firewire_ohci firewire_core crc_itu_t usb_storage snd ohci1394 soundcore snd_page_alloc usb_libusual rtc_cmos ieee1394 thermal button processor rtc_core sr_mod rtc_lib e1000e i2c_i801 cdrom sg usbhid hid uhci_hcd ehci_hcd sd_mod usbcore edd fan ext3 ext2 mbcache jbd ahci libata scsi_mod
[ 212.439219]
[ 212.439219] Pid: 7184, comm: forkbomb Tainted: G D 2.6.34-smpx #1475 MS-7502/MS-7502
[ 212.439219] RIP: 0010:[<ffffffff81078ca2>] [<ffffffff81078ca2>] perf_output_begin+0x81/0x191
[ 212.439219] RSP: 0000:ffff880001783b70 EFLAGS: 00000086
[ 212.439219] RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000003
[ 212.439219] RDX: 0000000000000060 RSI: ffff88003715c000 RDI: ffff880001783bc0
[ 212.439219] RBP: 0000000000000000 R08: 0000000000000001 R09: ffff8800bb81fc00
[ 212.439219] R10: 0000000000000034 R11: 0000000000000000 R12: 0000000000000001
[ 212.439219] R13: ffff88000178f9e0 R14: ffff880001783d10 R15: ffff880001783bf0
[ 212.439219] FS: 00007f1b2e582700(0000) GS:ffff880001780000(0000) knlGS:0000000000000000
[ 212.439219] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 212.439219] CR2: 00007f1b2e5cf980 CR3: 00000000bb914000 CR4: 00000000000006e0
[ 212.439219] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 212.439219] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 212.439219] Process forkbomb (pid: 7184, threadinfo ffff8800bbdb4000, task ffff8800b0097440)
[ 212.439219] Stack:
[ 212.439219] ffff880001783bf0 ffffffff810797a3 ffffffff810796f2 ffff880000000000
[ 212.439219] <0> 00000000ffffffff 0000000000000003 ffff88000178f9e0 ffff8800b0e00000
[ 212.439219] <0> 0000000000000000 ffffffff81079a14 ffff8800bb81fc00 ffff88003715c000
[ 212.439219] Call Trace:
[ 212.439219] <IRQ>
[ 212.439219] [<ffffffff810797a3>] ? perf_prepare_sample+0xa6/0x1ba
[ 212.439219] [<ffffffff810796f2>] ? perf_output_sample+0x16f/0x17a
[ 212.439219] [<ffffffff81079a14>] ? __perf_event_overflow+0x15d/0x199
[ 212.439219] [<ffffffff81079a9e>] ? perf_swevent_overflow+0x4e/0x68
[ 212.439219] [<ffffffff81079be1>] ? perf_swevent_ctx_event+0xda/0xeb
[ 212.439219] [<ffffffff81079be1>] ? perf_swevent_ctx_event+0xda/0xeb
[ 212.439219] [<ffffffff81079c3b>] ? do_perf_sw_event+0x49/0x85
[ 212.439219] [<ffffffff81079ca8>] ? perf_tp_event+0x31/0x36
[ 212.439219] [<ffffffff810239ab>] ? perf_trace_sched_stat_runtime+0xbb/0xd0
[ 212.439219] [<ffffffff8102539e>] ? update_curr+0x86/0xca
[ 212.439219] [<ffffffff8102695b>] ? enqueue_task_fair+0x82/0x180
[ 212.439219] [<ffffffff81028b04>] ? enqueue_task+0x42/0x52
[ 212.439219] [<ffffffff81028b36>] ? activate_task+0x22/0x28
[ 212.439219] [<ffffffff8102b344>] ? try_to_wake_up+0x13e/0x21f
[ 212.439219] [<ffffffff8103632b>] ? __do_softirq+0x191/0x193
[ 212.439219] [<ffffffff81052413>] ? tick_handle_periodic_broadcast+0xe/0x44
[ 212.439219] [<ffffffff810362e4>] ? __do_softirq+0x14a/0x193
[ 212.439219] [<ffffffff8105f82b>] ? handle_IRQ_event+0x53/0x102
[ 212.439219] [<ffffffff81002d0c>] ? call_softirq+0x1c/0x28
[ 212.439219] [<ffffffff81004942>] ? do_softirq+0x31/0x67
[ 212.439219] [<ffffffff810363bb>] ? irq_exit+0x36/0x7f
[ 212.439219] [<ffffffff8100469e>] ? do_IRQ+0xa3/0xbc
[ 212.439219] [<ffffffff812a3dd3>] ? ret_from_intr+0x0/0xa
[ 212.439219] <EOI>
[ 212.439219] [<ffffffff8101bff4>] ? do_page_fault+0x7e/0x282
[ 212.439219] [<ffffffff8101c1da>] ? do_page_fault+0x264/0x282
[ 212.439219] [<ffffffff81121b6d>] ? __put_user_4+0x1d/0x30
[ 212.439219] [<ffffffff812a3fdf>] ? page_fault+0x1f/0x30
[ 212.439219] Code: c7 47 28 00 00 00 00 0f 45 d0 65 8b 0c 25 58 d1 00 00 c7 44 24 20 ff ff ff ff 89 4c 24 28 8b 5c 24 28 8b 44 24 20 f0 0f b1 5e 38 <89> 44 24 18 8b 44 24 18 83 f8 ff 75 09 c7 47 28 01 00 00 00 eb
[ 212.439219] ---[ end trace f8a4ceb8268dbe35 ]---
[ 212.439228] BUG: NMI Watchdog detected LOCKUP on CPU0, ip ffffffff812a3a29, registers:
[ 212.439228] CPU 0
[ 212.439228] Modules linked in: cpufreq_conservative cpufreq_ondemand cpufreq_userspace cpufreq_powersave acpi_cpufreq snd_pcm_oss snd_mixer_oss microcode snd_seq snd_seq_device fuse loop dm_mod snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer firewire_ohci firewire_core crc_itu_t usb_storage snd ohci1394 soundcore snd_page_alloc usb_libusual rtc_cmos ieee1394 thermal button processor rtc_core sr_mod rtc_lib e1000e i2c_i801 cdrom sg usbhid hid uhci_hcd ehci_hcd sd_mod usbcore edd fan ext3 ext2 mbcache jbd ahci libata scsi_mod
[ 212.439228]
[ 212.439228] Pid: 7183, comm: forkbomb Tainted: G D 2.6.34-smpx #1475 MS-7502/MS-7502
[ 212.439228] RIP: 0010:[<ffffffff812a3a29>] [<ffffffff812a3a29>] _raw_spin_lock+0x10/0x15
[ 212.439228] RSP: 0018:ffff880001603e68 EFLAGS: 00000097
[ 212.439228] RAX: 0000000000000504 RBX: ffff8800bf841170 RCX: ffffffffffffd809
[ 212.439228] RDX: 0000000000000000 RSI: ffff880001603ea8 RDI: ffff880001612e40
[ 212.439228] RBP: ffff880001612e40 R08: 00000000242d7b68 R09: 0000000000000034
[ 212.439228] R10: 0000000000000034 R11: ffff88003715c000 R12: 0000000000012e40
[ 212.439228] R13: ffff880001603ea8 R14: 0000000000000000 R15: 000000000000e200
[ 212.439228] FS: 0000000000000000(0000) GS:ffff880001600000(0000) knlGS:0000000000000000
[ 212.439228] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 212.439228] CR2: 00007f1b2e3aa4a8 CR3: 0000000001420000 CR4: 00000000000006f0
[ 212.439228] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 212.439228] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 212.439228] Process forkbomb (pid: 7183, threadinfo ffff8800b774a000, task ffff8800b0097a10)
[ 212.439228] Stack:
[ 212.439228] ffffffff810252fe ffffffff81432580 ffff8800bf841170 0000000000000000
[ 212.439228] <0> 000000000000000f 0000000000000100 ffffffff8102b227 ffff8800b774bfd8
[ 212.439228] <0> 0000000000000083 ffffffff8140c0c8 ffff8800b774bfd8 0000000000000009
[ 212.439228] Call Trace:
[ 212.439228] <IRQ>
[ 212.439228] [<ffffffff810252fe>] ? task_rq_lock+0x7a/0x94
[ 212.439228] [<ffffffff8102b227>] ? try_to_wake_up+0x21/0x21f
[ 212.439228] [<ffffffff8103632b>] ? __do_softirq+0x191/0x193
[ 212.439228] [<ffffffff810791cb>] ? perf_ctx_adjust_freq+0x10/0xec
[ 212.439228] [<ffffffff81002d0c>] ? call_softirq+0x1c/0x28
[ 212.439228] [<ffffffff81004942>] ? do_softirq+0x31/0x67
[ 212.439228] [<ffffffff810363bb>] ? irq_exit+0x36/0x7f
[ 212.439228] [<ffffffff81015e92>] ? smp_apic_timer_interrupt+0x87/0x97
[ 212.439228] [<ffffffff810027d3>] ? apic_timer_interrupt+0x13/0x20
[ 212.439228] <EOI>
[ 212.439228] [<ffffffff810788dc>] ? perf_output_unlock+0x3d/0x12e
[ 212.439228] [<ffffffff81078ea2>] ? perf_event_task_ctx+0xb1/0x101
[ 212.439228] [<ffffffff81078f81>] ? perf_event_task+0x8f/0xbd
[ 212.439228] [<ffffffff81033a03>] ? do_exit+0x261/0x664
[ 212.439228] [<ffffffff8101c1da>] ? do_page_fault+0x264/0x282
[ 212.439228] [<ffffffff81121b6d>] ? __put_user_4+0x1d/0x30
[ 212.439228] [<ffffffff81034083>] ? do_group_exit+0x6c/0x93
[ 212.439228] [<ffffffff810340bc>] ? sys_exit_group+0x12/0x19
[ 212.439228] [<ffffffff81001eeb>] ? system_call_fastpath+0x16/0x1b
[ 212.439228] Code: 90 00 01 00 00 75 05 f0 66 0f b1 17 0f 94 c2 0f b6 c2 85 c0 0f 95 c0 0f b6 c0 c3 b8 00 01 00 00 f0 66 0f c1 07 38 e0 74 06 f3 90 <8a> 07 eb f6 c3 9c 58 fa ba 00 01 00 00 f0 66 0f c1 17 38 f2 74
[ 212.439228] ---[ end trace f8a4ceb8268dbe36 ]---
[ 212.439223] Kernel panic - not syncing: Aiee, killing interrupt handler!
[ 212.439223] Pid: 0, comm: swapper Tainted: G D 2.6.34-smpx #1475
[ 212.439223] Call Trace:
[ 212.439223] <NMI> [<ffffffff812a1956>] ? panic+0x77/0xe5
[ 212.439223] [<ffffffff8104bfe1>] ? notifier_call_chain+0x2e/0x5b
[ 212.439223] [<ffffffff8104c416>] ? notify_die+0x2d/0x33
[ 212.439223] [<ffffffff81016b73>] ? nmi_watchdog_tick+0x38/0x191
[ 212.439223] [<ffffffff810037d3>] ? do_nmi+0x215/0x23c
[ 212.439223] [<ffffffff811a9ba4>] ? serial8250_console_putchar+0x0/0x28
[ 212.439223] [<ffffffff812a423a>] ? nmi+0x1a/0x20
[ 212.439223] [<ffffffff811a9ba4>] ? serial8250_console_putchar+0x0/0x28
[ 212.439223] [<ffffffff81121881>] ? delay_tsc+0x32/0x4d
[ 212.439223] <<EOE>>
[ 212.439228] BUG: NMI Watchdog detected LOCKUP on CPU0, ip ffffffff812a3a29, registers:
[ 212.439228] CPU 0
[ 212.439228] Modules linked in: cpufreq_conservative cpufreq_ondemand cpufreq_userspace cpufreq_powersave acpi_cpufreq snd_pcm_oss snd_mixer_oss microcode snd_seq snd_seq_device fuse loop dm_mod snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer firewire_ohci firewire_core crc_itu_t usb_storage snd ohci1394 soundcore snd_page_alloc usb_libusual rtc_cmos ieee1394 thermal button processor rtc_core sr_mod rtc_lib e1000e i2c_i801 cdrom sg usbhid hid uhci_hcd ehci_hcd sd_mod usbcore edd fan ext3 ext2 mbcache jbd ahci libata scsi_mod
[ 212.439228]
[ 212.439228] Pid: 7183, comm: forkbomb Tainted: G D 2.6.34-smpx #1475 MS-7502/MS-7502
[ 212.439228] RIP: 0010:[<ffffffff812a3a29>] [<ffffffff812a3a29>] _raw_spin_lock+0x10/0x15
[ 212.439228] RSP: 0018:ffff8800016077b8 EFLAGS: 00000093
[ 212.439228] RAX: 0000000000000604 RBX: ffff8800bf8a0000 RCX: 0000000000000000
[ 212.439228] RDX: 0000000000000000 RSI: ffff8800016077f8 RDI: ffff880001612e40
[ 212.439228] RBP: ffff880001612e40 R08: 0000000000000000 R09: 000000000000007f
[ 212.439228] R10: 00000000003604e2 R11: ffffffff815658d8 R12: 0000000000012e40
[ 212.439228] R13: ffff8800016077f8 R14: 0000000000000000 R15: ffff880001615ae0
[ 212.439228] FS: 0000000000000000(0000) GS:ffff880001600000(0000) knlGS:0000000000000000
[ 212.439228] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 212.439228] CR2: 00007f1b2e3aa4a8 CR3: 0000000001420000 CR4: 00000000000006f0
[ 212.439228] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 212.439228] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 212.439228] Process forkbomb (pid: 7183, threadinfo ffff8800b774a000, task ffff8800b0097a10)
[ 212.439228] Stack:
[ 212.439228] ffffffff810252fe 0000000000000000 ffff8800bf8a0000 0000000000000000
[ 212.439228] <0> 0000000000000003 0000000000000000 ffffffff8102b227 0000000000000000
[ 212.439228] <0> 0000000000000083 ffff8800bf89fe80 0000000000000003 ffff880001615ac8
[ 212.439228] Call Trace:
[ 212.439228] <NMI>
[ 212.439228] [<ffffffff810252fe>] ? task_rq_lock+0x7a/0x94
[ 212.439228] [<ffffffff8102b227>] ? try_to_wake_up+0x21/0x21f
[ 212.439228] [<ffffffff810482ee>] ? autoremove_wake_function+0x9/0x2a
[ 212.439228] [<ffffffff810220bf>] ? __wake_up_common+0x41/0x78
[ 212.439228] [<ffffffff8114f42e>] ? bit_putcs+0x4de/0x55c
[ 212.439228] [<ffffffff81120aef>] ? vsnprintf+0x413/0x44c
[ 212.439228] [<ffffffff8114ef32>] ? bit_cursor+0x4c7/0x4e5
[ 212.439228] [<ffffffff8114f704>] ? soft_cursor+0x19c/0x1b0
[ 212.439228] [<ffffffff8114ef32>] ? bit_cursor+0x4c7/0x4e5
[ 212.439228] [<ffffffff8114ef50>] ? bit_putcs+0x0/0x55c
[ 212.439228] [<ffffffff8114c5de>] ? fbcon_scroll+0x5bd/0xad8
[ 212.439228] [<ffffffff811a9b59>] ? wait_for_xmitr+0x3d/0x88
[ 212.439228] [<ffffffff811a9bb9>] ? serial8250_console_putchar+0x15/0x28
[ 212.439228] [<ffffffff811a9b59>] ? wait_for_xmitr+0x3d/0x88
[ 212.439228] [<ffffffff81030ed4>] ? __call_console_drivers+0x64/0x78
[ 212.439228] [<ffffffff8104be0a>] ? up+0xe/0x39
[ 212.439228] [<ffffffff81031886>] ? release_console_sem+0x177/0x195
[ 212.439228] [<ffffffff81031886>] ? release_console_sem+0x177/0x195
[ 212.439228] [<ffffffff812a1a04>] ? printk+0x40/0x4c
[ 212.439228] [<ffffffff81058810>] ? __module_text_address+0x9/0x55
[ 212.439228] [<ffffffff81058810>] ? __module_text_address+0x9/0x55
[ 212.439228] [<ffffffff81046388>] ? __kernel_text_address+0x1f/0x45
[ 212.439228] [<ffffffff810059ed>] ? print_context_stack+0xa1/0xbd
[ 212.439228] [<ffffffff810059ed>] ? print_context_stack+0xa1/0xbd
[ 212.439228] [<ffffffff81004af5>] ? dump_trace+0x17d/0x23e
[ 212.439228] [<ffffffff81004cc2>] ? show_stack_log_lvl+0x10c/0x11b
[ 212.439228] [<ffffffff81004d84>] ? show_registers+0xb3/0x213
[ 212.439228] [<ffffffff81005da9>] ? die_nmi+0x7a/0xdd
[ 212.439228] [<ffffffff81016c60>] ? nmi_watchdog_tick+0x125/0x191
[ 212.439228] [<ffffffff81003661>] ? do_nmi+0xa3/0x23c
[ 212.439228] [<ffffffff812a423a>] ? nmi+0x1a/0x20
[ 212.439228] [<ffffffff812a3a29>] ? _raw_spin_lock+0x10/0x15
[ 212.439228] <<EOE>>
[ 212.439228] Code: 90 00 01 00 00 75 05 f0 66 0f b1 17 0f 94 c2 0f b6 c2 85 c0 0f 95 c0 0f b6 c0 c3 b8 00 01 00 00 f0 66 0f c1 07 38 e0 74 06 f3 90 <8a> 07 eb f6 c3 9c 58 fa ba 00 01 00 00 f0 66 0f c1 17 38 f2 74
[ 212.439228] ---[ end trace f8a4ceb8268dbe37 ]---
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] perf: hard lockup when using perf-sched
2010-03-24 7:32 ` Mike Galbraith
@ 2010-03-24 8:17 ` Mike Galbraith
2010-03-25 8:04 ` Li Zefan
0 siblings, 1 reply; 16+ messages in thread
From: Mike Galbraith @ 2010-03-24 8:17 UTC (permalink / raw)
To: Li Zefan
Cc: LKML, Ingo Molnar, Peter Zijlstra, Frederic Weisbecker,
Arnaldo Carvalho de Melo, Paul Mackerras
On Wed, 2010-03-24 at 08:32 +0100, Mike Galbraith wrote:
> I just saw this, hunted down your testcase and tried it here. Looks
> like perf_output_lock() wedged box.
(turns on frame pointers, and adds noinline)
[ 90.276644] BUG: NMI Watchdog detected LOCKUP on CPU2, ip ffffffff812b54a8, registers:
[ 90.276644] CPU 2
[ 90.276644] Modules linked in: cpufreq_conservative cpufreq_ondemand cpufreq_userspace cpufreq_powersave acpi_cpufreq snd_pcm_oss snd_mixer_oss microcode snd_seq snd_seq_device fuse loop dm_mod snd_hda_codec_realtek snd_hda_intel firewire_ohci firewire_core snd_hda_codec usbmouse crc_itu_t snd_hwdep snd_pcm usb_storage snd_timer sr_mod usbhid ohci1394 snd rtc_cmos soundcore rtc_core ieee1394 usb_libusual thermal i2c_i801 processor button e1000e snd_page_alloc cdrom rtc_lib hid sg uhci_hcd ehci_hcd sd_mod usbcore edd fan ext3 ext2 mbcache jbd ahci libata scsi_mod
[ 90.276644]
[ 90.276644] Pid: 20237, comm: forkbomb Not tainted 2.6.34-smpx #1477 MS-7502/MS-7502
[ 90.276644] RIP: 0010:[<ffffffff812b54a8>] [<ffffffff812b54a8>] _raw_spin_lock+0x12/0x1a
[ 90.276644] RSP: 0018:ffff880001703ee8 EFLAGS: 00000097
[ 90.276644] RAX: 000000000000c3c2 RBX: ffff880001712e40 RCX: 0000001504ab3a62
[ 90.276644] RDX: 0000001504ab3b83 RSI: 0000000000000026 RDI: ffff880001712e40
[ 90.276644] RBP: ffff880001703ee8 R08: 000000001ba05e30 R09: ffff880001703f88
[ 90.276644] R10: 0000000000000038 R11: 0000000000000038 R12: 0000000000000002
[ 90.276644] R13: ffff8800b48de2d0 R14: 0000000000000038 R15: 00000000ffffffe4
[ 90.276644] FS: 0000000000000000(0000) GS:ffff880001700000(0000) knlGS:0000000000000000
[ 90.276644] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 90.276644] CR2: 00007ff05702a4a8 CR3: 0000000001430000 CR4: 00000000000006e0
[ 90.276644] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 90.276644] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 90.276644] Process forkbomb (pid: 20237, threadinfo ffff8800bd3f8000, task ffff8800b48de2d0)
[ 90.276644] Stack:
[ 90.276644] ffff880001703f18 ffffffff8102ecc4 ffff8800b48de2d0 ffff8800b48de2d0
[ 90.276644] <0> 0000000000000000 0000000000000002 ffff880001703f48 ffffffff8103f082
[ 90.276644] <0> ffffea00029f9ee0 ffff88000170d170 ffff88000170d170 0000000000000002
[ 90.276644] Call Trace:
[ 90.276644] <IRQ>
[ 90.276644] [<ffffffff8102ecc4>] scheduler_tick+0x3c/0x1e6
[ 90.276644] [<ffffffff8103f082>] update_process_times+0x4b/0x5b
[ 90.276644] [<ffffffff81054229>] tick_periodic+0x5d/0x5f
[ 90.276644] [<ffffffff8105428e>] tick_handle_periodic+0x21/0x6e
[ 90.276644] [<ffffffff810167a7>] smp_apic_timer_interrupt+0x83/0x96
[ 90.276644] [<ffffffff81002813>] apic_timer_interrupt+0x13/0x20
[ 90.276644] <EOI>
[ 90.276644] [<ffffffff8107c1df>] ? perf_output_unlock+0x27/0x122
[ 90.276644] [<ffffffff8107d45f>] ? perf_swevent_overflow+0x58/0x76
[ 90.276644] [<ffffffff8107c6c6>] perf_output_begin+0x16e/0x180
[ 90.276644] [<ffffffff8107d5a2>] ? perf_swevent_ctx_event+0xdd/0xee
[ 90.276644] [<ffffffff8107c80d>] perf_event_task_ctx+0xf8/0x110
[ 90.276644] [<ffffffff8104ed00>] ? sched_clock_cpu+0xc0/0xce
[ 90.276644] [<ffffffff8107c8b7>] perf_event_task+0x92/0xc3
[ 90.276644] [<ffffffff8107e25e>] perf_event_exit_task+0x27/0x112
[ 90.276644] [<ffffffff810c6dca>] ? free_fs_struct+0x2d/0x31
[ 90.276644] [<ffffffff81034e4e>] do_exit+0x264/0x67a
[ 90.276644] [<ffffffff8101cd6f>] ? do_page_fault+0x222/0x253
[ 90.276644] [<ffffffff810354e7>] do_group_exit+0x6f/0x98
[ 90.276644] [<ffffffff81035522>] sys_exit_group+0x12/0x16
[ 90.276644] [<ffffffff81001f2b>] system_call_fastpath+0x16/0x1b
[ 90.276644] Code: 00 00 75 05 f0 66 0f b1 17 0f 94 c2 0f b6 c2 85 c0 c9 0f 95 c0 0f b6 c0 c3 55 b8 00 01 00 00 48 89 e5 f0 66 0f c1 07 38 e0 74 06 <f3> 90 8a 07 eb f6 c9 c3 55 48 89 e5 9c 58 fa ba 00 01 00 00 f0
[ 90.276639] BUG: NMI Watchdog detected LOCKUP
[ 90.276644] ---[ end trace ef30915e055ad1b9 ]---
[ 90.276639] on CPU1, ip ffffffff81079014, registers:
[ 90.276639] CPU 1
[ 90.276639] Modules linked in: cpufreq_conservative cpufreq_ondemand cpufreq_userspace cpufreq_powersave acpi_cpufreq snd_pcm_oss snd_mixer_oss microcode snd_seq snd_seq_device fuse loop dm_mod snd_hda_codec_realtek snd_hda_intel firewire_ohci firewire_core snd_hda_codec usbmouse crc_itu_t snd_hwdep snd_pcm usb_storage snd_timer sr_mod usbhid ohci1394 snd rtc_cmos soundcore rtc_core ieee1394 usb_libusual thermal i2c_i801 processor button e1000e snd_page_alloc cdrom rtc_lib hid sg uhci_hcd ehci_hcd sd_mod usbcore edd fan ext3 ext2 mbcache jbd ahci libata scsi_mod
[ 90.276639]
[ 90.276639] Pid: 0, comm: swapper Tainted: G D 2.6.34-smpx #1477 MS-7502/MS-7502
[ 90.276639] RIP: 0010:[<ffffffff81079014>] [<ffffffff81079014>] perf_output_lock+0x30/0x4e
[ 90.276639] RSP: 0018:ffff880001683a00 EFLAGS: 00000082
[ 90.276639] RAX: 0000000000000002 RBX: ffff880001683ab0 RCX: ffff8800bc454000
[ 90.276639] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff880001683ab0
[ 90.276639] RBP: ffff880001683a10 R08: 0000000000000001 R09: ffff880001683ae0
[ 90.276639] R10: 0000000000000200 R11: 0000000000000000 R12: ffff8800bc454000
[ 90.276639] R13: ffff880037788000 R14: 0000000000000070 R15: 00000000ffffffe4
[ 90.276639] FS: 0000000000000000(0000) GS:ffff880001680000(0000) knlGS:0000000000000000
[ 90.276639] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 90.276639] CR2: 00007ff05702a4a8 CR3: 0000000001430000 CR4: 00000000000006e0
[ 90.276639] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 90.276639] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 90.276639] Process swapper (pid: 0, threadinfo ffff8800bf888000, task ffff8800bf8422e0)
[ 90.276639] Stack:
[ 90.276639] 00000001374fb210 00000002ffffffff ffff880001683a90 ffffffff8107c5cf
[ 90.276639] <0> ffff880001683a40 000000000002b3c6 ffff880001683a40 ffff880001683ae0
[ 90.276639] <0> ffff880001683a90 ffffffff8107d13c ffffffff8104eb6b ffff8800bf8422e0
[ 90.276639] Call Trace:
[ 90.276639] <IRQ>
[ 90.276639] [<ffffffff8107c5cf>] perf_output_begin+0x77/0x180
[ 90.276639] [<ffffffff8107d13c>] ? perf_prepare_sample+0xb5/0x1d3
[ 90.276639] [<ffffffff8104eb6b>] ? sched_clock_local+0x12/0x75
[ 90.276639] [<ffffffff8107d3c5>] __perf_event_overflow+0x16b/0x1ad
[ 90.276639] [<ffffffff8107d13c>] ? perf_prepare_sample+0xb5/0x1d3
[ 90.276639] [<ffffffff8107d45f>] perf_swevent_overflow+0x58/0x76
[ 90.276639] [<ffffffff8107d4c3>] perf_swevent_add+0x46/0x48
[ 90.276639] [<ffffffff8107d5a2>] perf_swevent_ctx_event+0xdd/0xee
[ 90.276639] [<ffffffff8107d601>] do_perf_sw_event+0x4e/0x8c
[ 90.276639] [<ffffffff8107d670>] perf_tp_event+0x31/0x33
[ 90.276639] [<ffffffff8102c425>] ? try_to_wake_up+0x165/0x226
[ 90.276639] [<ffffffff810752e6>] ? perf_trace_buf_prepare+0x8e/0xb5
[ 90.276639] [<ffffffff81024142>] perf_trace_templ_sched_wakeup_template+0xca/0xe0
[ 90.276639] [<ffffffff81024167>] perf_trace_sched_wakeup+0xf/0x11
[ 90.276639] [<ffffffff8102c425>] try_to_wake_up+0x165/0x226
[ 90.276639] [<ffffffff8102c505>] wake_up_process+0x10/0x12
[ 90.276639] [<ffffffff8102df50>] load_balance+0x3d3/0x47f
[ 90.276639] [<ffffffff8107cc69>] ? perf_event_task_tick+0x6e/0x124
[ 90.276639] [<ffffffff8102e0d6>] rebalance_domains+0xda/0x14a
[ 90.276639] [<ffffffff8102e18a>] run_rebalance_domains+0x44/0xd4
[ 90.276639] [<ffffffff810377e7>] __do_softirq+0xd7/0x193
[ 90.276639] [<ffffffff810050a5>] ? timer_interrupt+0x19/0x20
[ 90.276639] [<ffffffff81002d4c>] call_softirq+0x1c/0x28
[ 90.276639] [<ffffffff810049fa>] do_softirq+0x33/0x69
[ 90.276639] [<ffffffff8103793d>] irq_exit+0x36/0x78
[ 90.276639] [<ffffffff81004733>] do_IRQ+0xa7/0xbe
[ 90.276639] [<ffffffff812b5893>] ret_from_intr+0x0/0xa
[ 90.276639] <EOI>
[ 90.276639] [<ffffffff810093ca>] ? mwait_idle+0xa2/0xa9
[ 90.276639] [<ffffffff8104e1b2>] ? atomic_notifier_call_chain+0x13/0x15
[ 90.276639] [<ffffffff81000736>] cpu_idle+0x52/0x6f
[ 90.276639] [<ffffffff812b0d53>] start_secondary+0x1ae/0x1b2
[ 90.276639] Code: 83 ec 10 c7 47 28 00 00 00 00 48 8b 4f 08 65 8b 14 25 58 d1 00 00 c7 45 f8 ff ff ff ff 89 55 f4 8b 75 f4 8b 45 f8 f0 0f b1 71 38 <89> 45 fc 8b 45 fc 83 f8 ff 75 09 c7 47 28 01 00 00 00 eb 08 39
[ 90.276639] ---[ end trace ef30915e055ad1ba ]---
[ 90.276644] BUG: NMI Watchdog detected LOCKUP on CPU3, ip ffffffff81079014, registers:
[ 90.276644] CPU 3
[ 90.276644] Modules linked in: cpufreq_conservative cpufreq_ondemand cpufreq_userspace cpufreq_powersave acpi_cpufreq snd_pcm_oss snd_mixer_oss microcode snd_seq snd_seq_device fuse loop dm_mod snd_hda_codec_realtek snd_hda_intel firewire_ohci firewire_core snd_hda_codec usbmouse crc_itu_t snd_hwdep snd_pcm usb_storage snd_timer sr_mod usbhid ohci1394 snd rtc_cmos soundcore rtc_core ieee1394 usb_libusual thermal i2c_i801 processor button e1000e snd_page_alloc cdrom rtc_lib hid sg uhci_hcd ehci_hcd sd_mod usbcore edd fan ext3 ext2 mbcache jbd ahci libata scsi_mod
[ 90.276644]
[ 90.276644] Pid: 5146, comm: forkbomb Tainted: G D 2.6.34-smpx #1477 MS-7502/MS-7502
[ 90.276644] RIP: 0010:[<ffffffff81079014>] [<ffffffff81079014>] perf_output_lock+0x30/0x4e
[ 90.276644] RSP: 0018:ffff880001783b88 EFLAGS: 00000082
[ 90.276644] RAX: 0000000000000002 RBX: ffff880001783c38 RCX: ffff8800bc454000
[ 90.276644] RDX: 0000000000000003 RSI: 0000000000000003 RDI: ffff880001783c38
[ 90.276644] RBP: ffff880001783b98 R08: 0000000000000001 R09: ffff880001783c68
[ 90.276644] R10: 0000000000000003 R11: 0000000000000246 R12: ffff8800bc454000
[ 90.276644] R13: ffff880037788000 R14: 0000000000000078 R15: 00000000ffffffe4
[ 90.276644] FS: 00007ff057202700(0000) GS:ffff880001780000(0000) knlGS:0000000000000000
[ 90.276644] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 90.276644] CR2: 00007ff05702be04 CR3: 00000000bf8c4000 CR4: 00000000000006e0
[ 90.276644] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 90.276644] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 90.276644] Process forkbomb (pid: 5146, threadinfo ffff8800bbebc000, task ffff8800bc128000)
[ 90.276644] Stack:
[ 90.276644] 000000030178f9e0 00000002ffffffff ffff880001783c18 ffffffff8107c5cf
[ 90.276644] <0> ffff880001783bc8 000000000002b3c6 ffff880001783bc8 ffff880001783c68
[ 90.276644] <0> ffff880001783c18 ffffffff8107d13c ffff880001783be8 ffff8800bc128000
[ 90.276644] Call Trace:
[ 90.276644] <IRQ>
[ 90.276644] [<ffffffff8107c5cf>] perf_output_begin+0x77/0x180
[ 90.276644] [<ffffffff8107d13c>] ? perf_prepare_sample+0xb5/0x1d3
[ 90.276644] [<ffffffff8107d3c5>] __perf_event_overflow+0x16b/0x1ad
[ 90.276644] [<ffffffff8107d45f>] perf_swevent_overflow+0x58/0x76
[ 90.276644] [<ffffffff8107d4c3>] perf_swevent_add+0x46/0x48
[ 90.276644] [<ffffffff8107d5a2>] perf_swevent_ctx_event+0xdd/0xee
[ 90.276644] [<ffffffff8107d601>] do_perf_sw_event+0x4e/0x8c
[ 90.276644] [<ffffffff8107d670>] perf_tp_event+0x31/0x33
[ 90.276644] [<ffffffff810263b8>] ? task_tick_fair+0x47/0x103
[ 90.276644] [<ffffffff810752e6>] ? perf_trace_buf_prepare+0x8e/0xb5
[ 90.276644] [<ffffffff810247ff>] perf_trace_sched_stat_runtime+0xc4/0xdc
[ 90.276644] [<ffffffff81026227>] update_curr+0x87/0xc8
[ 90.276644] [<ffffffff810263b8>] task_tick_fair+0x47/0x103
[ 90.276644] [<ffffffff8102ed61>] scheduler_tick+0xd9/0x1e6
[ 90.276644] [<ffffffff8103f082>] update_process_times+0x4b/0x5b
[ 90.276644] [<ffffffff81054229>] tick_periodic+0x5d/0x5f
[ 90.276644] [<ffffffff8105428e>] tick_handle_periodic+0x21/0x6e
[ 90.276644] [<ffffffff810167a7>] smp_apic_timer_interrupt+0x83/0x96
[ 90.276644] [<ffffffff81002813>] apic_timer_interrupt+0x13/0x20
[ 90.276644] <EOI>
[ 90.276644] [<ffffffff810a493a>] ? kmem_cache_alloc+0x6a/0xdc
[ 90.276644] [<ffffffff8109ba31>] anon_vma_fork+0x40/0xca
[ 90.276644] [<ffffffff81030377>] dup_mm+0x206/0x43f
[ 90.276644] [<ffffffff81030dac>] copy_process+0x7b2/0xea9
[ 90.276644] [<ffffffff81031625>] do_fork+0x15e/0x2b4
[ 90.276644] [<ffffffff8101cd6f>] ? do_page_fault+0x222/0x253
[ 90.276644] [<ffffffff81009a6f>] sys_clone+0x23/0x25
[ 90.276644] [<ffffffff81002193>] stub_clone+0x13/0x20
[ 90.276644] [<ffffffff81001f2b>] ? system_call_fastpath+0x16/0x1b
[ 90.276644] Code: 83 ec 10 c7 47 28 00 00 00 00 48 8b 4f 08 65 8b 14 25 58 d1 00 00 c7 45 f8 ff ff ff ff 89 55 f4 8b 75 f4 8b 45 f8 f0 0f b1 71 38 <89> 45 fc 8b 45 fc 83 f8 ff 75 09 c7 47 28 01 00 00 00 eb 08 39
[ 90.276644] ---[ end trace ef30915e055ad1bb ]---
[ 90.276643] BUG: NMI Watchdog detected LOCKUP on CPU0, ip ffffffff81079014, registers:
[ 90.276643] CPU 0
[ 90.276643] Modules linked in: cpufreq_conservative cpufreq_ondemand cpufreq_userspace cpufreq_powersave
[ 90.276644] Kernel panic - not syncing: Aiee, killing interrupt handler!
[ 90.276643] acpi_cpufreqPid: 5146, comm: forkbomb Tainted: G D 2.6.34-smpx #1477
[ 90.276643] snd_pcm_ossCall Trace:
[ 90.276643] snd_mixer_oss <NMI> microcode snd_seq [<ffffffff812b3364>] panic+0x73/0xe1
[ 90.276643] snd_seq_device fuse [<ffffffff8103794f>] ? irq_exit+0x48/0x78
[ 90.276643] loop dm_mod [<ffffffff81034c57>] do_exit+0x6d/0x67a
[ 90.276643] snd_hda_codec_realtek snd_hda_intel [<ffffffff8104e164>] ? notifier_call_chain+0x32/0x5e
[ 90.276643] firewire_ohci firewire_core [<ffffffff81005f8c>] die_nmi+0xd2/0xde
[ 90.276643] snd_hda_codec usbmouse [<ffffffff8101751d>] nmi_watchdog_tick+0x41/0x19d
[ 90.276643] crc_itu_t snd_hwdep [<ffffffff81003812>] do_nmi+0x222/0x24c
[ 90.276643] snd_pcm usb_storage [<ffffffff812b5cfa>] nmi+0x1a/0x20
[ 90.276643] snd_timer sr_mod [<ffffffff81005f87>] ? die_nmi+0xcd/0xde
[ 90.276643] usbhid <<EOE>> ohci1394 snd rtc_cmos soundcore rtc_core ieee1394 usb_libusual thermal i2c_i801 processor button e1000e snd_page_alloc cdrom rtc_lib hid sg uhci_hcd ehci_hcd sd_mod usbcore edd fan ext3 ext2 mbcache jbd ahci libata scsi_mod
[ 90.276643]
[ 90.276643] Pid: 20238, comm: forkbomb Tainted: G D 2.6.34-smpx #1477 MS-7502/MS-7502
[ 90.276643] RIP: 0010:[<ffffffff81079014>] [<ffffffff81079014>] perf_output_lock+0x30/0x4e
[ 90.276643] RSP: 0000:ffff880001603b88 EFLAGS: 00000082
[ 90.276643] RAX: 0000000000000002 RBX: ffff880001603c38 RCX: ffff8800bc454000
[ 90.276643] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880001603c38
[ 90.276643] RBP: ffff880001603b98 R08: 0000000000000001 R09: ffff880001603c68
[ 90.276643] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800bc454000
[ 90.276643] R13: ffff880037788000 R14: 0000000000000078 R15: 00000000ffffffe4
[ 90.276643] FS: 00007ff057202700(0000) GS:ffff880001600000(0000) knlGS:0000000000000000
[ 90.276643] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 90.276643] CR2: 00007ff056d05510 CR3: 00000000bd9f7000 CR4: 00000000000006f0
[ 90.276643] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 90.276643] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 90.276643] Process forkbomb (pid: 20238, threadinfo ffff8800bd3fa000, task ffff8800b48de8a0)
[ 90.276643] Stack:
[ 90.276643] 00000000071cf224 00000002ffffffff ffff880001603c18 ffffffff8107c5cf
[ 90.276643] <0> ffff880001603bc8 000000000002b3c6 ffff880001603bc8 ffff880001603c68
[ 90.276643] <0> ffff880001603c18 ffffffff8107d13c ffff8800bc690888 ffff8800b48de8a0
[ 90.276643] Call Trace:
[ 90.276643] <IRQ>
[ 90.276643] [<ffffffff8107c5cf>] perf_output_begin+0x77/0x180
[ 90.276643] [<ffffffff8107d13c>] ? perf_prepare_sample+0xb5/0x1d3
[ 90.276643] [<ffffffff8107d3c5>] __perf_event_overflow+0x16b/0x1ad
[ 90.276643] [<ffffffff8107d45f>] perf_swevent_overflow+0x58/0x76
[ 90.276643] [<ffffffff8107d4c3>] perf_swevent_add+0x46/0x48
[ 90.276643] [<ffffffff8107d5a2>] perf_swevent_ctx_event+0xdd/0xee
[ 90.276643] [<ffffffff8107d601>] do_perf_sw_event+0x4e/0x8c
[ 90.276643] [<ffffffff811b8f9e>] ? put_device+0x12/0x14
[ 90.276643] [<ffffffff8107d670>] perf_tp_event+0x31/0x33
[ 90.276643] [<ffffffff810263b8>] ? task_tick_fair+0x47/0x103
[ 90.276643] [<ffffffff810752e6>] ? perf_trace_buf_prepare+0x8e/0xb5
[ 90.276643] [<ffffffff810247ff>] perf_trace_sched_stat_runtime+0xc4/0xdc
[ 90.276643] [<ffffffffa0000773>] ? scsi_finish_command+0xec/0xf5 [scsi_mod]
[ 90.276643] [<ffffffff81026227>] update_curr+0x87/0xc8
[ 90.276643] [<ffffffff810263b8>] task_tick_fair+0x47/0x103
[ 90.276643] [<ffffffff8102ed61>] scheduler_tick+0xd9/0x1e6
[ 90.276643] [<ffffffff8103f082>] update_process_times+0x4b/0x5b
[ 90.276643] [<ffffffff81054229>] tick_periodic+0x5d/0x5f
[ 90.276643] [<ffffffff8105428e>] tick_handle_periodic+0x21/0x6e
[ 90.276643] [<ffffffff810167a7>] smp_apic_timer_interrupt+0x83/0x96
[ 90.276643] [<ffffffff81002813>] apic_timer_interrupt+0x13/0x20
[ 90.276643] <EOI>
[ 90.276643] [<ffffffff8101cbd1>] ? do_page_fault+0x84/0x253
[ 90.276643] [<ffffffff8101cd6f>] ? do_page_fault+0x222/0x253
[ 90.276643] [<ffffffff812b5a9f>] page_fault+0x1f/0x30
[ 90.276643] Code: 83 ec 10 c7 47 28 00 00 00 00 48 8b 4f 08 65 8b 14 25 58 d1 00 00 c7 45 f8 ff ff ff ff 89 55 f4 8b 75 f4 8b 45 f8 f0 0f b1 71 38 <89> 45 fc 8b 45 fc 83 f8 ff 75 09 c7 47 28 01 00 00 00 eb 08 39
[ 90.276643] ---[ end trace ef30915e055ad1bc ]---
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] perf: hard lockup when using perf-sched
2010-03-24 8:17 ` Mike Galbraith
@ 2010-03-25 8:04 ` Li Zefan
2010-03-25 9:27 ` Mike Galbraith
0 siblings, 1 reply; 16+ messages in thread
From: Li Zefan @ 2010-03-25 8:04 UTC (permalink / raw)
To: Mike Galbraith
Cc: LKML, Ingo Molnar, Peter Zijlstra, Frederic Weisbecker,
Arnaldo Carvalho de Melo, Paul Mackerras
Mike Galbraith wrote:
> On Wed, 2010-03-24 at 08:32 +0100, Mike Galbraith wrote:
>
>> I just saw this, hunted down your testcase and tried it here. Looks
>> like perf_output_lock() wedged box.
>
> (turns on frame pointers, and adds noinline)
>
Thanks! Then who's going to fix this...
--Li
> [ 90.276644] BUG: NMI Watchdog detected LOCKUP on CPU2, ip ffffffff812b54a8, registers:
> [ 90.276644] CPU 2
> [ 90.276644] Modules linked in: cpufreq_conservative cpufreq_ondemand cpufreq_userspace cpufreq_powersave acpi_cpufreq snd_pcm_oss snd_mixer_oss microcode snd_seq snd_seq_device fuse loop dm_mod snd_hda_codec_realtek snd_hda_intel firewire_ohci firewire_core snd_hda_codec usbmouse crc_itu_t snd_hwdep snd_pcm usb_storage snd_timer sr_mod usbhid ohci1394 snd rtc_cmos soundcore rtc_core ieee1394 usb_libusual thermal i2c_i801 processor button e1000e snd_page_alloc cdrom rtc_lib hid sg uhci_hcd ehci_hcd sd_mod usbcore edd fan ext3 ext2 mbcache jbd ahci libata scsi_mod
> [ 90.276644]
> [ 90.276644] Pid: 20237, comm: forkbomb Not tainted 2.6.34-smpx #1477 MS-7502/MS-7502
> [ 90.276644] RIP: 0010:[<ffffffff812b54a8>] [<ffffffff812b54a8>] _raw_spin_lock+0x12/0x1a
> [ 90.276644] RSP: 0018:ffff880001703ee8 EFLAGS: 00000097
> [ 90.276644] RAX: 000000000000c3c2 RBX: ffff880001712e40 RCX: 0000001504ab3a62
> [ 90.276644] RDX: 0000001504ab3b83 RSI: 0000000000000026 RDI: ffff880001712e40
> [ 90.276644] RBP: ffff880001703ee8 R08: 000000001ba05e30 R09: ffff880001703f88
> [ 90.276644] R10: 0000000000000038 R11: 0000000000000038 R12: 0000000000000002
> [ 90.276644] R13: ffff8800b48de2d0 R14: 0000000000000038 R15: 00000000ffffffe4
> [ 90.276644] FS: 0000000000000000(0000) GS:ffff880001700000(0000) knlGS:0000000000000000
> [ 90.276644] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 90.276644] CR2: 00007ff05702a4a8 CR3: 0000000001430000 CR4: 00000000000006e0
> [ 90.276644] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 90.276644] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 90.276644] Process forkbomb (pid: 20237, threadinfo ffff8800bd3f8000, task ffff8800b48de2d0)
> [ 90.276644] Stack:
> [ 90.276644] ffff880001703f18 ffffffff8102ecc4 ffff8800b48de2d0 ffff8800b48de2d0
> [ 90.276644] <0> 0000000000000000 0000000000000002 ffff880001703f48 ffffffff8103f082
> [ 90.276644] <0> ffffea00029f9ee0 ffff88000170d170 ffff88000170d170 0000000000000002
> [ 90.276644] Call Trace:
> [ 90.276644] <IRQ>
> [ 90.276644] [<ffffffff8102ecc4>] scheduler_tick+0x3c/0x1e6
> [ 90.276644] [<ffffffff8103f082>] update_process_times+0x4b/0x5b
> [ 90.276644] [<ffffffff81054229>] tick_periodic+0x5d/0x5f
> [ 90.276644] [<ffffffff8105428e>] tick_handle_periodic+0x21/0x6e
> [ 90.276644] [<ffffffff810167a7>] smp_apic_timer_interrupt+0x83/0x96
> [ 90.276644] [<ffffffff81002813>] apic_timer_interrupt+0x13/0x20
> [ 90.276644] <EOI>
> [ 90.276644] [<ffffffff8107c1df>] ? perf_output_unlock+0x27/0x122
> [ 90.276644] [<ffffffff8107d45f>] ? perf_swevent_overflow+0x58/0x76
> [ 90.276644] [<ffffffff8107c6c6>] perf_output_begin+0x16e/0x180
> [ 90.276644] [<ffffffff8107d5a2>] ? perf_swevent_ctx_event+0xdd/0xee
> [ 90.276644] [<ffffffff8107c80d>] perf_event_task_ctx+0xf8/0x110
> [ 90.276644] [<ffffffff8104ed00>] ? sched_clock_cpu+0xc0/0xce
> [ 90.276644] [<ffffffff8107c8b7>] perf_event_task+0x92/0xc3
> [ 90.276644] [<ffffffff8107e25e>] perf_event_exit_task+0x27/0x112
> [ 90.276644] [<ffffffff810c6dca>] ? free_fs_struct+0x2d/0x31
> [ 90.276644] [<ffffffff81034e4e>] do_exit+0x264/0x67a
> [ 90.276644] [<ffffffff8101cd6f>] ? do_page_fault+0x222/0x253
> [ 90.276644] [<ffffffff810354e7>] do_group_exit+0x6f/0x98
> [ 90.276644] [<ffffffff81035522>] sys_exit_group+0x12/0x16
> [ 90.276644] [<ffffffff81001f2b>] system_call_fastpath+0x16/0x1b
> [ 90.276644] Code: 00 00 75 05 f0 66 0f b1 17 0f 94 c2 0f b6 c2 85 c0 c9 0f 95 c0 0f b6 c0 c3 55 b8 00 01 00 00 48 89 e5 f0 66 0f c1 07 38 e0 74 06 <f3> 90 8a 07 eb f6 c9 c3 55 48 89 e5 9c 58 fa ba 00 01 00 00 f0
> [ 90.276639] BUG: NMI Watchdog detected LOCKUP
> [ 90.276644] ---[ end trace ef30915e055ad1b9 ]---
> [ 90.276639] on CPU1, ip ffffffff81079014, registers:
> [ 90.276639] CPU 1
> [ 90.276639] Modules linked in: cpufreq_conservative cpufreq_ondemand cpufreq_userspace cpufreq_powersave acpi_cpufreq snd_pcm_oss snd_mixer_oss microcode snd_seq snd_seq_device fuse loop dm_mod snd_hda_codec_realtek snd_hda_intel firewire_ohci firewire_core snd_hda_codec usbmouse crc_itu_t snd_hwdep snd_pcm usb_storage snd_timer sr_mod usbhid ohci1394 snd rtc_cmos soundcore rtc_core ieee1394 usb_libusual thermal i2c_i801 processor button e1000e snd_page_alloc cdrom rtc_lib hid sg uhci_hcd ehci_hcd sd_mod usbcore edd fan ext3 ext2 mbcache jbd ahci libata scsi_mod
> [ 90.276639]
> [ 90.276639] Pid: 0, comm: swapper Tainted: G D 2.6.34-smpx #1477 MS-7502/MS-7502
> [ 90.276639] RIP: 0010:[<ffffffff81079014>] [<ffffffff81079014>] perf_output_lock+0x30/0x4e
> [ 90.276639] RSP: 0018:ffff880001683a00 EFLAGS: 00000082
> [ 90.276639] RAX: 0000000000000002 RBX: ffff880001683ab0 RCX: ffff8800bc454000
> [ 90.276639] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff880001683ab0
> [ 90.276639] RBP: ffff880001683a10 R08: 0000000000000001 R09: ffff880001683ae0
> [ 90.276639] R10: 0000000000000200 R11: 0000000000000000 R12: ffff8800bc454000
> [ 90.276639] R13: ffff880037788000 R14: 0000000000000070 R15: 00000000ffffffe4
> [ 90.276639] FS: 0000000000000000(0000) GS:ffff880001680000(0000) knlGS:0000000000000000
> [ 90.276639] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 90.276639] CR2: 00007ff05702a4a8 CR3: 0000000001430000 CR4: 00000000000006e0
> [ 90.276639] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 90.276639] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 90.276639] Process swapper (pid: 0, threadinfo ffff8800bf888000, task ffff8800bf8422e0)
> [ 90.276639] Stack:
> [ 90.276639] 00000001374fb210 00000002ffffffff ffff880001683a90 ffffffff8107c5cf
> [ 90.276639] <0> ffff880001683a40 000000000002b3c6 ffff880001683a40 ffff880001683ae0
> [ 90.276639] <0> ffff880001683a90 ffffffff8107d13c ffffffff8104eb6b ffff8800bf8422e0
> [ 90.276639] Call Trace:
> [ 90.276639] <IRQ>
> [ 90.276639] [<ffffffff8107c5cf>] perf_output_begin+0x77/0x180
> [ 90.276639] [<ffffffff8107d13c>] ? perf_prepare_sample+0xb5/0x1d3
> [ 90.276639] [<ffffffff8104eb6b>] ? sched_clock_local+0x12/0x75
> [ 90.276639] [<ffffffff8107d3c5>] __perf_event_overflow+0x16b/0x1ad
> [ 90.276639] [<ffffffff8107d13c>] ? perf_prepare_sample+0xb5/0x1d3
> [ 90.276639] [<ffffffff8107d45f>] perf_swevent_overflow+0x58/0x76
> [ 90.276639] [<ffffffff8107d4c3>] perf_swevent_add+0x46/0x48
> [ 90.276639] [<ffffffff8107d5a2>] perf_swevent_ctx_event+0xdd/0xee
> [ 90.276639] [<ffffffff8107d601>] do_perf_sw_event+0x4e/0x8c
> [ 90.276639] [<ffffffff8107d670>] perf_tp_event+0x31/0x33
> [ 90.276639] [<ffffffff8102c425>] ? try_to_wake_up+0x165/0x226
> [ 90.276639] [<ffffffff810752e6>] ? perf_trace_buf_prepare+0x8e/0xb5
> [ 90.276639] [<ffffffff81024142>] perf_trace_templ_sched_wakeup_template+0xca/0xe0
> [ 90.276639] [<ffffffff81024167>] perf_trace_sched_wakeup+0xf/0x11
> [ 90.276639] [<ffffffff8102c425>] try_to_wake_up+0x165/0x226
> [ 90.276639] [<ffffffff8102c505>] wake_up_process+0x10/0x12
> [ 90.276639] [<ffffffff8102df50>] load_balance+0x3d3/0x47f
> [ 90.276639] [<ffffffff8107cc69>] ? perf_event_task_tick+0x6e/0x124
> [ 90.276639] [<ffffffff8102e0d6>] rebalance_domains+0xda/0x14a
> [ 90.276639] [<ffffffff8102e18a>] run_rebalance_domains+0x44/0xd4
> [ 90.276639] [<ffffffff810377e7>] __do_softirq+0xd7/0x193
> [ 90.276639] [<ffffffff810050a5>] ? timer_interrupt+0x19/0x20
> [ 90.276639] [<ffffffff81002d4c>] call_softirq+0x1c/0x28
> [ 90.276639] [<ffffffff810049fa>] do_softirq+0x33/0x69
> [ 90.276639] [<ffffffff8103793d>] irq_exit+0x36/0x78
> [ 90.276639] [<ffffffff81004733>] do_IRQ+0xa7/0xbe
> [ 90.276639] [<ffffffff812b5893>] ret_from_intr+0x0/0xa
> [ 90.276639] <EOI>
> [ 90.276639] [<ffffffff810093ca>] ? mwait_idle+0xa2/0xa9
> [ 90.276639] [<ffffffff8104e1b2>] ? atomic_notifier_call_chain+0x13/0x15
> [ 90.276639] [<ffffffff81000736>] cpu_idle+0x52/0x6f
> [ 90.276639] [<ffffffff812b0d53>] start_secondary+0x1ae/0x1b2
> [ 90.276639] Code: 83 ec 10 c7 47 28 00 00 00 00 48 8b 4f 08 65 8b 14 25 58 d1 00 00 c7 45 f8 ff ff ff ff 89 55 f4 8b 75 f4 8b 45 f8 f0 0f b1 71 38 <89> 45 fc 8b 45 fc 83 f8 ff 75 09 c7 47 28 01 00 00 00 eb 08 39
> [ 90.276639] ---[ end trace ef30915e055ad1ba ]---
> [ 90.276644] BUG: NMI Watchdog detected LOCKUP on CPU3, ip ffffffff81079014, registers:
> [ 90.276644] CPU 3
> [ 90.276644] Modules linked in: cpufreq_conservative cpufreq_ondemand cpufreq_userspace cpufreq_powersave acpi_cpufreq snd_pcm_oss snd_mixer_oss microcode snd_seq snd_seq_device fuse loop dm_mod snd_hda_codec_realtek snd_hda_intel firewire_ohci firewire_core snd_hda_codec usbmouse crc_itu_t snd_hwdep snd_pcm usb_storage snd_timer sr_mod usbhid ohci1394 snd rtc_cmos soundcore rtc_core ieee1394 usb_libusual thermal i2c_i801 processor button e1000e snd_page_alloc cdrom rtc_lib hid sg uhci_hcd ehci_hcd sd_mod usbcore edd fan ext3 ext2 mbcache jbd ahci libata scsi_mod
> [ 90.276644]
> [ 90.276644] Pid: 5146, comm: forkbomb Tainted: G D 2.6.34-smpx #1477 MS-7502/MS-7502
> [ 90.276644] RIP: 0010:[<ffffffff81079014>] [<ffffffff81079014>] perf_output_lock+0x30/0x4e
> [ 90.276644] RSP: 0018:ffff880001783b88 EFLAGS: 00000082
> [ 90.276644] RAX: 0000000000000002 RBX: ffff880001783c38 RCX: ffff8800bc454000
> [ 90.276644] RDX: 0000000000000003 RSI: 0000000000000003 RDI: ffff880001783c38
> [ 90.276644] RBP: ffff880001783b98 R08: 0000000000000001 R09: ffff880001783c68
> [ 90.276644] R10: 0000000000000003 R11: 0000000000000246 R12: ffff8800bc454000
> [ 90.276644] R13: ffff880037788000 R14: 0000000000000078 R15: 00000000ffffffe4
> [ 90.276644] FS: 00007ff057202700(0000) GS:ffff880001780000(0000) knlGS:0000000000000000
> [ 90.276644] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 90.276644] CR2: 00007ff05702be04 CR3: 00000000bf8c4000 CR4: 00000000000006e0
> [ 90.276644] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 90.276644] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 90.276644] Process forkbomb (pid: 5146, threadinfo ffff8800bbebc000, task ffff8800bc128000)
> [ 90.276644] Stack:
> [ 90.276644] 000000030178f9e0 00000002ffffffff ffff880001783c18 ffffffff8107c5cf
> [ 90.276644] <0> ffff880001783bc8 000000000002b3c6 ffff880001783bc8 ffff880001783c68
> [ 90.276644] <0> ffff880001783c18 ffffffff8107d13c ffff880001783be8 ffff8800bc128000
> [ 90.276644] Call Trace:
> [ 90.276644] <IRQ>
> [ 90.276644] [<ffffffff8107c5cf>] perf_output_begin+0x77/0x180
> [ 90.276644] [<ffffffff8107d13c>] ? perf_prepare_sample+0xb5/0x1d3
> [ 90.276644] [<ffffffff8107d3c5>] __perf_event_overflow+0x16b/0x1ad
> [ 90.276644] [<ffffffff8107d45f>] perf_swevent_overflow+0x58/0x76
> [ 90.276644] [<ffffffff8107d4c3>] perf_swevent_add+0x46/0x48
> [ 90.276644] [<ffffffff8107d5a2>] perf_swevent_ctx_event+0xdd/0xee
> [ 90.276644] [<ffffffff8107d601>] do_perf_sw_event+0x4e/0x8c
> [ 90.276644] [<ffffffff8107d670>] perf_tp_event+0x31/0x33
> [ 90.276644] [<ffffffff810263b8>] ? task_tick_fair+0x47/0x103
> [ 90.276644] [<ffffffff810752e6>] ? perf_trace_buf_prepare+0x8e/0xb5
> [ 90.276644] [<ffffffff810247ff>] perf_trace_sched_stat_runtime+0xc4/0xdc
> [ 90.276644] [<ffffffff81026227>] update_curr+0x87/0xc8
> [ 90.276644] [<ffffffff810263b8>] task_tick_fair+0x47/0x103
> [ 90.276644] [<ffffffff8102ed61>] scheduler_tick+0xd9/0x1e6
> [ 90.276644] [<ffffffff8103f082>] update_process_times+0x4b/0x5b
> [ 90.276644] [<ffffffff81054229>] tick_periodic+0x5d/0x5f
> [ 90.276644] [<ffffffff8105428e>] tick_handle_periodic+0x21/0x6e
> [ 90.276644] [<ffffffff810167a7>] smp_apic_timer_interrupt+0x83/0x96
> [ 90.276644] [<ffffffff81002813>] apic_timer_interrupt+0x13/0x20
> [ 90.276644] <EOI>
> [ 90.276644] [<ffffffff810a493a>] ? kmem_cache_alloc+0x6a/0xdc
> [ 90.276644] [<ffffffff8109ba31>] anon_vma_fork+0x40/0xca
> [ 90.276644] [<ffffffff81030377>] dup_mm+0x206/0x43f
> [ 90.276644] [<ffffffff81030dac>] copy_process+0x7b2/0xea9
> [ 90.276644] [<ffffffff81031625>] do_fork+0x15e/0x2b4
> [ 90.276644] [<ffffffff8101cd6f>] ? do_page_fault+0x222/0x253
> [ 90.276644] [<ffffffff81009a6f>] sys_clone+0x23/0x25
> [ 90.276644] [<ffffffff81002193>] stub_clone+0x13/0x20
> [ 90.276644] [<ffffffff81001f2b>] ? system_call_fastpath+0x16/0x1b
> [ 90.276644] Code: 83 ec 10 c7 47 28 00 00 00 00 48 8b 4f 08 65 8b 14 25 58 d1 00 00 c7 45 f8 ff ff ff ff 89 55 f4 8b 75 f4 8b 45 f8 f0 0f b1 71 38 <89> 45 fc 8b 45 fc 83 f8 ff 75 09 c7 47 28 01 00 00 00 eb 08 39
> [ 90.276644] ---[ end trace ef30915e055ad1bb ]---
> [ 90.276643] BUG: NMI Watchdog detected LOCKUP on CPU0, ip ffffffff81079014, registers:
> [ 90.276643] CPU 0
> [ 90.276643] Modules linked in: cpufreq_conservative cpufreq_ondemand cpufreq_userspace cpufreq_powersave
> [ 90.276644] Kernel panic - not syncing: Aiee, killing interrupt handler!
> [ 90.276643] acpi_cpufreqPid: 5146, comm: forkbomb Tainted: G D 2.6.34-smpx #1477
> [ 90.276643] snd_pcm_ossCall Trace:
> [ 90.276643] snd_mixer_oss <NMI> microcode snd_seq [<ffffffff812b3364>] panic+0x73/0xe1
> [ 90.276643] snd_seq_device fuse [<ffffffff8103794f>] ? irq_exit+0x48/0x78
> [ 90.276643] loop dm_mod [<ffffffff81034c57>] do_exit+0x6d/0x67a
> [ 90.276643] snd_hda_codec_realtek snd_hda_intel [<ffffffff8104e164>] ? notifier_call_chain+0x32/0x5e
> [ 90.276643] firewire_ohci firewire_core [<ffffffff81005f8c>] die_nmi+0xd2/0xde
> [ 90.276643] snd_hda_codec usbmouse [<ffffffff8101751d>] nmi_watchdog_tick+0x41/0x19d
> [ 90.276643] crc_itu_t snd_hwdep [<ffffffff81003812>] do_nmi+0x222/0x24c
> [ 90.276643] snd_pcm usb_storage [<ffffffff812b5cfa>] nmi+0x1a/0x20
> [ 90.276643] snd_timer sr_mod [<ffffffff81005f87>] ? die_nmi+0xcd/0xde
> [ 90.276643] usbhid <<EOE>> ohci1394 snd rtc_cmos soundcore rtc_core ieee1394 usb_libusual thermal i2c_i801 processor button e1000e snd_page_alloc cdrom rtc_lib hid sg uhci_hcd ehci_hcd sd_mod usbcore edd fan ext3 ext2 mbcache jbd ahci libata scsi_mod
> [ 90.276643]
> [ 90.276643] Pid: 20238, comm: forkbomb Tainted: G D 2.6.34-smpx #1477 MS-7502/MS-7502
> [ 90.276643] RIP: 0010:[<ffffffff81079014>] [<ffffffff81079014>] perf_output_lock+0x30/0x4e
> [ 90.276643] RSP: 0000:ffff880001603b88 EFLAGS: 00000082
> [ 90.276643] RAX: 0000000000000002 RBX: ffff880001603c38 RCX: ffff8800bc454000
> [ 90.276643] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880001603c38
> [ 90.276643] RBP: ffff880001603b98 R08: 0000000000000001 R09: ffff880001603c68
> [ 90.276643] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800bc454000
> [ 90.276643] R13: ffff880037788000 R14: 0000000000000078 R15: 00000000ffffffe4
> [ 90.276643] FS: 00007ff057202700(0000) GS:ffff880001600000(0000) knlGS:0000000000000000
> [ 90.276643] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 90.276643] CR2: 00007ff056d05510 CR3: 00000000bd9f7000 CR4: 00000000000006f0
> [ 90.276643] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 90.276643] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 90.276643] Process forkbomb (pid: 20238, threadinfo ffff8800bd3fa000, task ffff8800b48de8a0)
> [ 90.276643] Stack:
> [ 90.276643] 00000000071cf224 00000002ffffffff ffff880001603c18 ffffffff8107c5cf
> [ 90.276643] <0> ffff880001603bc8 000000000002b3c6 ffff880001603bc8 ffff880001603c68
> [ 90.276643] <0> ffff880001603c18 ffffffff8107d13c ffff8800bc690888 ffff8800b48de8a0
> [ 90.276643] Call Trace:
> [ 90.276643] <IRQ>
> [ 90.276643] [<ffffffff8107c5cf>] perf_output_begin+0x77/0x180
> [ 90.276643] [<ffffffff8107d13c>] ? perf_prepare_sample+0xb5/0x1d3
> [ 90.276643] [<ffffffff8107d3c5>] __perf_event_overflow+0x16b/0x1ad
> [ 90.276643] [<ffffffff8107d45f>] perf_swevent_overflow+0x58/0x76
> [ 90.276643] [<ffffffff8107d4c3>] perf_swevent_add+0x46/0x48
> [ 90.276643] [<ffffffff8107d5a2>] perf_swevent_ctx_event+0xdd/0xee
> [ 90.276643] [<ffffffff8107d601>] do_perf_sw_event+0x4e/0x8c
> [ 90.276643] [<ffffffff811b8f9e>] ? put_device+0x12/0x14
> [ 90.276643] [<ffffffff8107d670>] perf_tp_event+0x31/0x33
> [ 90.276643] [<ffffffff810263b8>] ? task_tick_fair+0x47/0x103
> [ 90.276643] [<ffffffff810752e6>] ? perf_trace_buf_prepare+0x8e/0xb5
> [ 90.276643] [<ffffffff810247ff>] perf_trace_sched_stat_runtime+0xc4/0xdc
> [ 90.276643] [<ffffffffa0000773>] ? scsi_finish_command+0xec/0xf5 [scsi_mod]
> [ 90.276643] [<ffffffff81026227>] update_curr+0x87/0xc8
> [ 90.276643] [<ffffffff810263b8>] task_tick_fair+0x47/0x103
> [ 90.276643] [<ffffffff8102ed61>] scheduler_tick+0xd9/0x1e6
> [ 90.276643] [<ffffffff8103f082>] update_process_times+0x4b/0x5b
> [ 90.276643] [<ffffffff81054229>] tick_periodic+0x5d/0x5f
> [ 90.276643] [<ffffffff8105428e>] tick_handle_periodic+0x21/0x6e
> [ 90.276643] [<ffffffff810167a7>] smp_apic_timer_interrupt+0x83/0x96
> [ 90.276643] [<ffffffff81002813>] apic_timer_interrupt+0x13/0x20
> [ 90.276643] <EOI>
> [ 90.276643] [<ffffffff8101cbd1>] ? do_page_fault+0x84/0x253
> [ 90.276643] [<ffffffff8101cd6f>] ? do_page_fault+0x222/0x253
> [ 90.276643] [<ffffffff812b5a9f>] page_fault+0x1f/0x30
> [ 90.276643] Code: 83 ec 10 c7 47 28 00 00 00 00 48 8b 4f 08 65 8b 14 25 58 d1 00 00 c7 45 f8 ff ff ff ff 89 55 f4 8b 75 f4 8b 45 f8 f0 0f b1 71 38 <89> 45 fc 8b 45 fc 83 f8 ff 75 09 c7 47 28 01 00 00 00 eb 08 39
> [ 90.276643] ---[ end trace ef30915e055ad1bc ]---
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] perf: hard lockup when using perf-sched
2010-03-25 8:04 ` Li Zefan
@ 2010-03-25 9:27 ` Mike Galbraith
2010-03-25 9:54 ` Peter Zijlstra
2010-03-26 10:11 ` Mike Galbraith
0 siblings, 2 replies; 16+ messages in thread
From: Mike Galbraith @ 2010-03-25 9:27 UTC (permalink / raw)
To: Li Zefan
Cc: LKML, Ingo Molnar, Peter Zijlstra, Frederic Weisbecker,
Arnaldo Carvalho de Melo, Paul Mackerras
On Thu, 2010-03-25 at 16:04 +0800, Li Zefan wrote:
> Mike Galbraith wrote:
> > On Wed, 2010-03-24 at 08:32 +0100, Mike Galbraith wrote:
> >
> >> I just saw this, hunted down your testcase and tried it here. Looks
> >> like perf_output_lock() wedged box.
> >
> > (turns on frame pointers, and adds noinline)
> >
>
> Thanks! Then who's going to fix this...
Well, that kinda depends on whether I figure out how the heck it's all
supposed to work before somebody else whacks it or not.
ATM, I've instrumented, know _what's_ happening, but find myself saying
"wtf?" a lot, especially wrt handle->locked. The act of attempting to
lock a handle declares it unlocked, turning perf_output_unlock() into a
noop, which looks a bit strange. We're spinning on those "unlocked"
locks, all left genuinely locked by one CPU. I just whacked the thing,
and am very likely about to see in yet another trace.
Locking is hard, "curious construct" locking is even harder :)
-Mike
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] perf: hard lockup when using perf-sched
2010-03-25 9:27 ` Mike Galbraith
@ 2010-03-25 9:54 ` Peter Zijlstra
2010-03-26 10:11 ` Mike Galbraith
1 sibling, 0 replies; 16+ messages in thread
From: Peter Zijlstra @ 2010-03-25 9:54 UTC (permalink / raw)
To: Mike Galbraith
Cc: Li Zefan, LKML, Ingo Molnar, Frederic Weisbecker,
Arnaldo Carvalho de Melo, Paul Mackerras
On Thu, 2010-03-25 at 10:27 +0100, Mike Galbraith wrote:
> On Thu, 2010-03-25 at 16:04 +0800, Li Zefan wrote:
> > Mike Galbraith wrote:
> > > On Wed, 2010-03-24 at 08:32 +0100, Mike Galbraith wrote:
> > >
> > >> I just saw this, hunted down your testcase and tried it here. Looks
> > >> like perf_output_lock() wedged box.
> > >
> > > (turns on frame pointers, and adds noinline)
> > >
> >
> > Thanks! Then who's going to fix this...
>
> Well, that kinda depends on whether I figure out how the heck it's all
> supposed to work before somebody else whacks it or not.
>
> ATM, I've instrumented, know _what's_ happening, but find myself saying
> "wtf?" a lot, especially wrt handle->locked. The act of attempting to
> lock a handle declares it unlocked, turning perf_output_unlock() into a
> noop, which looks a bit strange. We're spinning on those "unlocked"
> locks, all left genuinely locked by one CPU. I just whacked the thing,
> and am very likely about to see in yet another trace.
>
> Locking is hard, "curious construct" locking is even harder :)
Yeah, that perf_output_{un,}lock() stuff is a pain..
handle->locked should indicate if we're the outer most context
the problem this stuff is solving is that of publishing the data head
pointer, suppose two concurrent writers:
head = 0;
A: reserve N bytes
write content
commit
B: reserve M bytes
write content
commit
Now, if B happens during A, and B completes before A is done, B's commit
would expose a head pointer of N+M bytes, even though the first N aren't
completely written yet.
So the trick is to only update the head pointer once everybody is done
writing. You can model this as a recursion problem and only update once
the recursion count hits 0 again.
However, with multiple cpus and NMIs and a busy enough environment this
never needs to happen, which would be bad, since we then cannot make
progress.
Hence the curious construct which basically serializes per cpu. Since
each cpu can only have a limited nesting context, and any one cpu is
bound to make progress (unless of course there is another bug that
prevents that) we'll post regular updates to the head pointer.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] perf: hard lockup when using perf-sched
2010-03-25 9:27 ` Mike Galbraith
2010-03-25 9:54 ` Peter Zijlstra
@ 2010-03-26 10:11 ` Mike Galbraith
2010-03-26 17:23 ` Frederic Weisbecker
` (2 more replies)
1 sibling, 3 replies; 16+ messages in thread
From: Mike Galbraith @ 2010-03-26 10:11 UTC (permalink / raw)
To: Li Zefan
Cc: LKML, Ingo Molnar, Peter Zijlstra, Frederic Weisbecker,
Arnaldo Carvalho de Melo, Paul Mackerras
On Thu, 2010-03-25 at 10:27 +0100, Mike Galbraith wrote:
> On Thu, 2010-03-25 at 16:04 +0800, Li Zefan wrote:
> > Mike Galbraith wrote:
> > > On Wed, 2010-03-24 at 08:32 +0100, Mike Galbraith wrote:
> > >
> > >> I just saw this, hunted down your testcase and tried it here. Looks
> > >> like perf_output_lock() wedged box.
> > >
> > > (turns on frame pointers, and adds noinline)
> > >
> >
> > Thanks! Then who's going to fix this...
>
> Well, that kinda depends on whether I figure out how the heck it's all
> supposed to work before somebody else whacks it or not.
This seems to work, in contrast to everything I tried yesterday. Not
exactly a thing of beauty, but at least it's an option, so...
perf: fix perf sched record forkbomb deadlock
perf sched record can deadlock a box should the holder of handle->data->lock
take an interrupt, and then attempt to acquire an rq lock held by a CPU trying
to acquire the same lock. Disable interrupts.
Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Reported-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <new-submission>
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index 574ee58..2ba2e9f 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -3378,15 +3378,23 @@ static void perf_event_task_output(struct perf_event *event,
struct perf_task_event *task_event)
{
struct perf_output_handle handle;
- int size;
struct task_struct *task = task_event->task;
- int ret;
+ unsigned long flags;
+ int size, ret;
+
+ /*
+ * If this CPU attempts to acquire an rq lock held by a CPU spinning
+ * in perf_output_lock() from interrupt context, it's game over.
+ */
+ local_irq_save(flags);
size = task_event->event_id.header.size;
ret = perf_output_begin(&handle, event, size, 0, 0);
- if (ret)
+ if (ret) {
+ local_irq_restore(flags);
return;
+ }
task_event->event_id.pid = perf_event_pid(event, task);
task_event->event_id.ppid = perf_event_pid(event, current);
@@ -3397,6 +3405,7 @@ static void perf_event_task_output(struct perf_event *event,
perf_output_put(&handle, task_event->event_id);
perf_output_end(&handle);
+ local_irq_restore(flags);
}
static int perf_event_task_match(struct perf_event *event)
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [BUG] perf: hard lockup when using perf-sched
2010-03-26 10:11 ` Mike Galbraith
@ 2010-03-26 17:23 ` Frederic Weisbecker
2010-03-26 19:10 ` Mike Galbraith
2010-03-29 2:01 ` Li Zefan
2010-04-02 19:07 ` [tip:perf/core] perf: Fix 'perf sched record' deadlock tip-bot for Mike Galbraith
2 siblings, 1 reply; 16+ messages in thread
From: Frederic Weisbecker @ 2010-03-26 17:23 UTC (permalink / raw)
To: Mike Galbraith
Cc: Li Zefan, LKML, Ingo Molnar, Peter Zijlstra,
Arnaldo Carvalho de Melo, Paul Mackerras
On Fri, Mar 26, 2010 at 11:11:33AM +0100, Mike Galbraith wrote:
> On Thu, 2010-03-25 at 10:27 +0100, Mike Galbraith wrote:
> > On Thu, 2010-03-25 at 16:04 +0800, Li Zefan wrote:
> > > Mike Galbraith wrote:
> > > > On Wed, 2010-03-24 at 08:32 +0100, Mike Galbraith wrote:
> > > >
> > > >> I just saw this, hunted down your testcase and tried it here. Looks
> > > >> like perf_output_lock() wedged box.
> > > >
> > > > (turns on frame pointers, and adds noinline)
> > > >
> > >
> > > Thanks! Then who's going to fix this...
> >
> > Well, that kinda depends on whether I figure out how the heck it's all
> > supposed to work before somebody else whacks it or not.
>
> This seems to work, in contrast to everything I tried yesterday. Not
> exactly a thing of beauty, but at least it's an option, so...
>
> perf: fix perf sched record forkbomb deadlock
>
> perf sched record can deadlock a box should the holder of handle->data->lock
> take an interrupt, and then attempt to acquire an rq lock held by a CPU trying
> to acquire the same lock. Disable interrupts.
Aah.
So the scenario is the following inversion?
CPU0 CPU1
sched event with rq->lock held
grab handle->data->lock
spin on handle->data->lock
interrupt
try to grab rq->lock
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] perf: hard lockup when using perf-sched
2010-03-26 17:23 ` Frederic Weisbecker
@ 2010-03-26 19:10 ` Mike Galbraith
2010-03-26 19:27 ` Frederic Weisbecker
0 siblings, 1 reply; 16+ messages in thread
From: Mike Galbraith @ 2010-03-26 19:10 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Li Zefan, LKML, Ingo Molnar, Peter Zijlstra,
Arnaldo Carvalho de Melo, Paul Mackerras
On Fri, 2010-03-26 at 18:23 +0100, Frederic Weisbecker wrote:
> On Fri, Mar 26, 2010 at 11:11:33AM +0100, Mike Galbraith wrote:
> > perf: fix perf sched record forkbomb deadlock
> >
> > perf sched record can deadlock a box should the holder of handle->data->lock
> > take an interrupt, and then attempt to acquire an rq lock held by a CPU trying
> > to acquire the same lock. Disable interrupts.
>
>
>
> Aah.
>
> So the scenario is the following inversion?
>
> CPU0 CPU1
> sched event with rq->lock held
> grab handle->data->lock
> spin on handle->data->lock
> interrupt
> try to grab rq->lock
Yeah, handle->data->lock holder dare not try to grab any rq lock because
of sched event with rq->lock held.
-Mike
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] perf: hard lockup when using perf-sched
2010-03-26 19:10 ` Mike Galbraith
@ 2010-03-26 19:27 ` Frederic Weisbecker
2010-03-26 20:22 ` Mike Galbraith
0 siblings, 1 reply; 16+ messages in thread
From: Frederic Weisbecker @ 2010-03-26 19:27 UTC (permalink / raw)
To: Mike Galbraith
Cc: Li Zefan, LKML, Ingo Molnar, Peter Zijlstra,
Arnaldo Carvalho de Melo, Paul Mackerras
On Fri, Mar 26, 2010 at 08:10:40PM +0100, Mike Galbraith wrote:
> On Fri, 2010-03-26 at 18:23 +0100, Frederic Weisbecker wrote:
> > On Fri, Mar 26, 2010 at 11:11:33AM +0100, Mike Galbraith wrote:
>
> > > perf: fix perf sched record forkbomb deadlock
> > >
> > > perf sched record can deadlock a box should the holder of handle->data->lock
> > > take an interrupt, and then attempt to acquire an rq lock held by a CPU trying
> > > to acquire the same lock. Disable interrupts.
> >
> >
> >
> > Aah.
> >
> > So the scenario is the following inversion?
> >
> > CPU0 CPU1
> > sched event with rq->lock held
> > grab handle->data->lock
> > spin on handle->data->lock
> > interrupt
> > try to grab rq->lock
>
> Yeah, handle->data->lock holder dare not try to grab any rq lock because
> of sched event with rq->lock held.
>
But if that happens with perf sched, there is something weird.
perf sched only use sched events, which have interrupt disabled
from the trace event handler, so this is not supposed to happen.
But if there is another kind of event involved, something that has
interrupts enabled, may be some software events, then it may
happen indeed.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] perf: hard lockup when using perf-sched
2010-03-26 19:27 ` Frederic Weisbecker
@ 2010-03-26 20:22 ` Mike Galbraith
2010-03-26 21:41 ` Frederic Weisbecker
2010-03-26 21:45 ` Frederic Weisbecker
0 siblings, 2 replies; 16+ messages in thread
From: Mike Galbraith @ 2010-03-26 20:22 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Li Zefan, LKML, Ingo Molnar, Peter Zijlstra,
Arnaldo Carvalho de Melo, Paul Mackerras
On Fri, 2010-03-26 at 20:27 +0100, Frederic Weisbecker wrote:
> On Fri, Mar 26, 2010 at 08:10:40PM +0100, Mike Galbraith wrote:
> > On Fri, 2010-03-26 at 18:23 +0100, Frederic Weisbecker wrote:
> > > On Fri, Mar 26, 2010 at 11:11:33AM +0100, Mike Galbraith wrote:
> >
> > > > perf: fix perf sched record forkbomb deadlock
> > > >
> > > > perf sched record can deadlock a box should the holder of handle->data->lock
> > > > take an interrupt, and then attempt to acquire an rq lock held by a CPU trying
> > > > to acquire the same lock. Disable interrupts.
> > >
> > >
> > >
> > > Aah.
> > >
> > > So the scenario is the following inversion?
> > >
> > > CPU0 CPU1
> > > sched event with rq->lock held
> > > grab handle->data->lock
> > > spin on handle->data->lock
> > > interrupt
> > > try to grab rq->lock
> >
> > Yeah, handle->data->lock holder dare not try to grab any rq lock because
> > of sched event with rq->lock held.
> >
>
>
> But if that happens with perf sched, there is something weird.
> perf sched only use sched events, which have interrupt disabled
> from the trace event handler, so this is not supposed to happen.
>
> But if there is another kind of event involved, something that has
> interrupts enabled, may be some software events, then it may
> happen indeed.
Hm. Last trace I took is below.
./forkbomb&
[1] 5990
marge:/root/tmp # perf sched record
[ 427.931717] BUG: NMI Watchdog detected LOCKUP on CPU1, ip ffffffff810853f7, registers:
[ 427.931717] CPU 1
[ 427.931717] Modules linked in: nfsd lockd nfs_acl auth_rpcgss sunrpc snd_pcm_oss snd_mixer_oss exportfs snd_seq snd_seq_device cpufreq_conservative microcode cpufreq_ondemand cpufreq_userspace cpufreq_powersave acpi_cpufreq fuse loop dm_mod snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm firewire_ohci usbmouse firewire_core snd_timer crc_itu_t snd usbhid usb_storage rtc_cmos hid soundcore ohci1394 usb_libusual rtc_core sr_mod thermal snd_page_alloc ieee1394 i2c_i801 button processor rtc_lib cdrom e1000e sg uhci_hcd ehci_hcd sd_mod usbcore edd fan ext3 ext2 mbcache jbd ahci libata scsi_mod
[ 427.931717]
[ 427.931717] Pid: 0, comm: swapper Tainted: G W 2.6.34-smpx #1500 MS-7502/MS-7502
[ 427.931717] RIP: 0010:[<ffffffff810853f7>] [<ffffffff810853f7>] perf_output_lock+0x30/0x5a
[ 427.931717] RSP: 0018:ffff880002083910 EFLAGS: 00000082
[ 427.931717] RAX: 0000000000000002 RBX: ffff8800020839d0 RCX: ffff8800bc91c000
[ 427.931717] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff8800020839d0
[ 427.931717] RBP: ffff880002083920 R08: 0000000000000001 R09: 0000000000000001
[ 427.931717] R10: 0000000000000000 R11: ffff8800bf845940 R12: ffff8800bc91c000
[ 427.931717] R13: ffff8800bc36a400 R14: 0000000000000060 R15: 0000000000000001
[ 427.931717] FS: 0000000000000000(0000) GS:ffff880002080000(0000) knlGS:0000000000000000
[ 427.931717] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 427.931717] CR2: 00007ff8e6f464a8 CR3: 000000000146e000 CR4: 00000000000006e0
[ 427.931717] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 427.931717] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 427.931717] Process swapper (pid: 0, threadinfo ffff8800bf888000, task ffff8800bf845940)
[ 427.931717] Stack:
[ 427.931717] 0000000100000060 00000002ffffffff ffff8800020839c0 ffffffff81089409
[ 427.931717] <0> ffffffff810892f9 ffffffff8105a90d ffff880002083970 ffffffff81052cff
[ 427.931717] <0> ffff880002083bb0 0000000000000001 ffff880002083a00 ffff880002083bb0
[ 427.931717] Call Trace:
[ 427.931717] <IRQ>
[ 427.931717] [<ffffffff81089409>] perf_output_begin+0x110/0x24d
[ 427.931717] [<ffffffff810892f9>] ? perf_output_begin+0x0/0x24d
[ 427.931717] [<ffffffff8105a90d>] ? trace_hardirqs_off+0xd/0xf
[ 427.931717] [<ffffffff81052cff>] ? cpu_clock+0x2d/0x40
[ 427.931717] [<ffffffff8108a37c>] ? perf_prepare_sample+0xb5/0x1d3
[ 427.931717] [<ffffffff8108a4e2>] perf_event_output+0x48/0x74
[ 427.931717] [<ffffffff8108a655>] __perf_event_overflow+0x147/0x168
[ 427.931717] [<ffffffff8108a6ce>] perf_swevent_overflow+0x58/0x75
[ 427.931717] [<ffffffff8108a731>] perf_swevent_add+0x46/0x48
[ 427.931717] [<ffffffff8108a7a0>] perf_swevent_ctx_event+0x6d/0x95
[ 427.931717] [<ffffffff8108a844>] do_perf_sw_event+0x7c/0x103
[ 427.931717] [<ffffffff8108a8fc>] perf_tp_event+0x31/0x33
[ 427.931717] [<ffffffff81028475>] ? dequeue_task_fair+0x4c/0x10a
[ 427.931717] [<ffffffff810811af>] ? perf_trace_buf_prepare+0x93/0xcf
[ 427.931717] [<ffffffff81025356>] perf_trace_sched_stat_runtime+0xc4/0xf4
[ 427.931717] [<ffffffff81026ed9>] update_curr+0xbb/0x101
[ 427.931717] [<ffffffff81028475>] dequeue_task_fair+0x4c/0x10a
[ 427.931717] [<ffffffff8102a841>] dequeue_task+0x3d/0x4d
[ 427.931717] [<ffffffff8102a879>] deactivate_task+0x28/0x31
[ 427.931717] [<ffffffff81028849>] ? double_rq_lock+0x48/0x4d
[ 427.931717] [<ffffffff8102d433>] pull_task+0x1d/0x69
[ 427.931717] [<ffffffff8102f07b>] load_balance+0x253/0x496
[ 427.931717] [<ffffffff8102f398>] rebalance_domains+0xda/0x150
[ 427.931717] [<ffffffff8102f452>] run_rebalance_domains+0x44/0xd4
[ 427.931717] [<ffffffff81039670>] __do_softirq+0x11d/0x220
[ 427.931717] [<ffffffff81002e0c>] call_softirq+0x1c/0x28
[ 427.931717] [<ffffffff81004b73>] do_softirq+0x38/0x81
[ 427.931717] [<ffffffff8103983a>] irq_exit+0x45/0x87
[ 427.931717] [<ffffffff81016f86>] smp_apic_timer_interrupt+0x88/0x96
[ 427.931717] [<ffffffff810028d3>] apic_timer_interrupt+0x13/0x20
[ 427.931717] <EOI>
[ 427.931717] [<ffffffff810097d7>] ? mwait_idle+0xd4/0xe9
[ 427.931717] [<ffffffff810097e0>] ? mwait_idle+0xdd/0xe9
[ 427.931717] [<ffffffff810097d7>] ? mwait_idle+0xd4/0xe9
[ 427.931717] [<ffffffff8100073b>] cpu_idle+0x57/0x74
[ 427.931717] [<ffffffff812d695a>] start_secondary+0x1bd/0x1c1
[ 427.931717] Code: 83 ec 10 c7 47 28 00 00 00 00 48 8b 4f 08 65 8b 14 25 88 d1 00 00 c7 45 f8 ff ff ff ff 89 55 f4 8b 75 f4 8b 45 f8 f0 0f b1 71 38 <89> 45 fc 8b 45 fc 83 f8 ff 75 10 c7 47 28 01 00 00 00 c7 47 2c
[ 427.931717] ---[ end trace 4ede7d64a7e7ec98 ]---
[ 427.931717] BUG: NMI Watchdog detected LOCKUP on CPU2, ip ffffffff811404b9, registers:
[ 427.931717] CPU 2
[ 427.931717] Modules linked in: nfsd lockd nfs_acl auth_rpcgss sunrpc snd_pcm_oss snd_mixer_oss exportfs snd_seq snd_seq_device cpufreq_conservative microcode cpufreq_ondemand cpufreq_userspace cpufreq_powersave acpi_cpufreq fuse loop dm_mod snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm firewire_ohci usbmouse firewire_core snd_timer crc_itu_t snd usbhid usb_storage rtc_cmos hid soundcore ohci1394 usb_libusual rtc_core sr_mod thermal snd_page_alloc ieee1394 i2c_i801 button processor rtc_lib cdrom e1000e sg uhci_hcd ehci_hcd sd_mod usbcore edd fan ext3 ext2 mbcache jbd ahci libata scsi_mod
[ 427.931717]
[ 427.931717] Pid: 5990, comm: forkbomb Tainted: G D W 2.6.34-smpx #1500 MS-7502/MS-7502
[ 427.931717] RIP: 0010:[<ffffffff811404b9>] [<ffffffff811404b9>] delay_tsc+0x14/0x52
[ 427.931717] RSP: 0018:ffff880002103da0 EFLAGS: 00000006
[ 427.931717] RAX: 000000007cca47f2 RBX: ffff8800021130c0 RCX: 0000000000002d00
[ 427.931717] RDX: 0000000000000102 RSI: 0000000000000002 RDI: 0000000000000001
[ 427.931717] RBP: ffff880002103da0 R08: 0000000000000002 R09: 0000000000000001
[ 427.931717] R10: 0000000000000000 R11: ffff880002103ee8 R12: ffff8800bbdc8000
[ 427.931717] R13: 000000008e9adbdc R14: ffff8800bbdc8360 R15: 000000000432710f
[ 427.931717] FS: 00007ff8e711e700(0000) GS:ffff880002100000(0000) knlGS:0000000000000000
[ 427.931717] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 427.931717] CR2: 00007ff8e6f47e04 CR3: 000000003736f000 CR4: 00000000000006e0
[ 427.931717] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 427.931717] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 427.931717] Process forkbomb (pid: 5990, threadinfo ffff8800bdb70000, task ffff8800bbdc8000)
[ 427.931717] Stack:
[ 427.931717] ffff880002103db0 ffffffff811403f4 ffff880002103e00 ffffffff81143e42
[ 427.931717] <0> 0000000000000000 ffff880000000001 ffff8800abfa2c00 ffff8800021130c0
[ 427.931717] <0> ffff8800021130c0 00000000000130c0 ffff880002103e78 000000000000000f
[ 427.931717] Call Trace:
[ 427.931717] <IRQ>
[ 427.931717] [<ffffffff811403f4>] __delay+0xa/0xc
[ 427.931717] [<ffffffff81143e42>] do_raw_spin_lock+0xd2/0x13c
[ 427.931717] [<ffffffff812dbc18>] _raw_spin_lock+0x34/0x3b
[ 427.931717] [<ffffffff81026e03>] ? task_rq_lock+0x7c/0x97
[ 427.931717] [<ffffffff8105f2b1>] ? trace_hardirqs_on+0xd/0xf
[ 427.931717] [<ffffffff81026e03>] task_rq_lock+0x7c/0x97
[ 427.931717] [<ffffffff8102d4a5>] try_to_wake_up+0x26/0x266
[ 427.931717] [<ffffffff8102d704>] wake_up_process+0x10/0x12
[ 427.931717] [<ffffffff810392f4>] wakeup_softirqd+0x2a/0x2c
[ 427.931717] [<ffffffff81039771>] __do_softirq+0x21e/0x220
[ 427.931717] [<ffffffff8104b6a0>] ? __task_pid_nr_ns+0x0/0xad
[ 427.931717] [<ffffffff81002e0c>] call_softirq+0x1c/0x28
[ 427.931717] [<ffffffff81004b73>] do_softirq+0x38/0x81
[ 427.931717] [<ffffffff8103983a>] irq_exit+0x45/0x87
[ 427.931717] [<ffffffff81016f86>] smp_apic_timer_interrupt+0x88/0x96
[ 427.931717] [<ffffffff810028d3>] apic_timer_interrupt+0x13/0x20
[ 427.931717] <EOI>
[ 427.931717] [<ffffffff8105eae6>] ? lock_acquire+0x108/0x117
[ 427.931717] [<ffffffff8104b6a0>] ? __task_pid_nr_ns+0x0/0xad
[ 427.931717] [<ffffffff810892f9>] ? perf_output_begin+0x0/0x24d
[ 427.931717] [<ffffffff8104b6dc>] __task_pid_nr_ns+0x3c/0xad
[ 427.931717] [<ffffffff8104b6a0>] ? __task_pid_nr_ns+0x0/0xad
[ 427.931717] [<ffffffff8108668a>] perf_event_tid+0x26/0x28
[ 427.931717] [<ffffffff8108960f>] perf_event_task_output+0x74/0x9f
[ 427.931717] [<ffffffff81089675>] perf_event_task_ctx+0x3b/0x5b
[ 427.931717] [<ffffffff810896e8>] perf_event_task_event+0x53/0xca
[ 427.931717] [<ffffffff81089695>] ? perf_event_task_event+0x0/0xca
[ 427.931717] [<ffffffff810897da>] perf_event_task+0x7b/0x86
[ 427.931717] [<ffffffff8108a90e>] perf_event_fork+0x10/0x12
[ 427.931717] [<ffffffff810328e7>] copy_process+0xf0c/0x1068
[ 427.931717] [<ffffffff81032bdd>] do_fork+0x176/0x31e
[ 427.931717] [<ffffffff8105192a>] ? up_read+0x1e/0x38
[ 427.931717] [<ffffffff812dbaaa>] ? lockdep_sys_exit_thunk+0x35/0x67
[ 427.931717] [<ffffffff81009ed1>] sys_clone+0x23/0x25
[ 427.931717] [<ffffffff81002253>] stub_clone+0x13/0x20
[ 427.931717] [<ffffffff81001f6b>] ? system_call_fastpath+0x16/0x1b
[ 427.931717] Code: 81 48 6b 94 0a 98 00 00 00 3e f7 e2 48 8d 7a 01 e8 47 ff ff ff c9 c3 55 48 89 e5 65 8b 34 25 88 d1 00 00 0f 1f 00 0f ae e8 0f 31 <89> c1 0f 1f 00 0f ae e8 0f 31 89 c0 48 89 c2 48 29 ca 48 39 fa
[ 427.931717] ---[ end trace 4ede7d64a7e7ec99 ]---
[ 427.931715] BUG: NMI Watchdog detected LOCKUP on CPU0, ip ffffffff810853f7, registers:
[ 427.931715] CPU 0
[ 427.931715] Modules linked in: nfsd lockd nfs_acl auth_rpcgss sunrpc snd_pcm_oss snd_mixer_oss exportfs snd_seq snd_seq_device cpufreq_conservative microcode cpufreq_ondemand cpufreq_userspace cpufreq_powersave acpi_cpufreq fuse loop dm_mod snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm firewire_ohci usbmouse firewire_core snd_timer crc_itu_t snd usbhid usb_storage rtc_cmos hid soundcore ohci1394 usb_libusual rtc_core sr_mod thermal snd_page_alloc ieee1394 i2c_i801 button processor rtc_lib cdrom e1000e sg uhci_hcd ehci_hcd sd_mod usbcore edd fan ext3 ext2 mbcache jbd ahci libata scsi_mod
[ 427.931715]
[ 427.931715] Pid: 19553, comm: forkbomb Tainted: G D W 2.6.34-smpx #1500 MS-7502/MS-7502
[ 427.931715] RIP: 0010:[<ffffffff810853f7>] [<ffffffff810853f7>] perf_output_lock+0x30/0x5a
[ 427.931715] RSP: 0018:ffff880002003b18 EFLAGS: 00000082
[ 427.931715] RAX: 0000000000000002 RBX: ffff880002003bd8 RCX: ffff8800bc91c000
[ 427.931715] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880002003bd8
[ 427.931715] RBP: ffff880002003b28 R08: 0000000000000001 R09: 0000000000000001
[ 427.931715] R10: 0000000000000001 R11: ffff8800bf35ca60 R12: ffff8800bc91c000
[ 427.931715] R13: ffff8800bc36a400 R14: 0000000000000060 R15: 0000000000000001
[ 427.931715] FS: 0000000000000000(0000) GS:ffff880002000000(0000) knlGS:0000000000000000
[ 427.931715] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 427.931715] CR2: 00007ff8e6f464a8 CR3: 00000000bc892000 CR4: 00000000000006f0
[ 427.931715] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 427.931715] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 427.931715] Process forkbomb (pid: 19553, threadinfo ffff8800bcc58000, task ffff8800bf35ca60)
[ 427.931715] Stack:
[ 427.931715] 0000000000000060 00000002ffffffff ffff880002003bc8 ffffffff81089409
[ 427.931715] <0> ffffffff810892f9 ffffffff8105a90d ffff880002003b78 ffffffff81052cff
[ 427.931715] <0> ffff880002003db8 0000000000000000 ffff880002003c08 ffff880002003db8
[ 427.931715] Call Trace:
[ 427.931715] <IRQ>
[ 427.931715] [<ffffffff81089409>] perf_output_begin+0x110/0x24d
[ 427.931715] [<ffffffff810892f9>] ? perf_output_begin+0x0/0x24d
[ 427.931715] [<ffffffff8105a90d>] ? trace_hardirqs_off+0xd/0xf
[ 427.931715] [<ffffffff81052cff>] ? cpu_clock+0x2d/0x40
[ 427.931715] [<ffffffff8108a37c>] ? perf_prepare_sample+0xb5/0x1d3
[ 427.931715] [<ffffffff8108a4e2>] perf_event_output+0x48/0x74
[ 427.931715] [<ffffffffa014ac59>] ? uhci_urb_enqueue+0x87f/0x8a5 [uhci_hcd]
[ 427.931715] [<ffffffff8108a655>] __perf_event_overflow+0x147/0x168
[ 427.931715] [<ffffffff8108a6ce>] perf_swevent_overflow+0x58/0x75
[ 427.931715] [<ffffffff8108a731>] perf_swevent_add+0x46/0x48
[ 427.931715] [<ffffffff8108a7a0>] perf_swevent_ctx_event+0x6d/0x95
[ 427.931715] [<ffffffff8108a844>] do_perf_sw_event+0x7c/0x103
[ 427.931715] [<ffffffff812dbc18>] ? _raw_spin_lock+0x34/0x3b
[ 427.931715] [<ffffffff8108a8fc>] perf_tp_event+0x31/0x33
[ 427.931715] [<ffffffff81027075>] ? task_tick_fair+0x4a/0x110
[ 427.931715] [<ffffffff810811af>] ? perf_trace_buf_prepare+0x93/0xcf
[ 427.931715] [<ffffffff81025356>] perf_trace_sched_stat_runtime+0xc4/0xf4
[ 427.931715] [<ffffffff81026ed9>] update_curr+0xbb/0x101
[ 427.931715] [<ffffffff81027075>] task_tick_fair+0x4a/0x110
[ 427.931715] [<ffffffff8102fe9f>] scheduler_tick+0xdc/0x1f0
[ 427.931715] [<ffffffff810417da>] update_process_times+0x4b/0x5b
[ 427.931715] [<ffffffff810584a3>] tick_periodic+0x63/0x65
[ 427.931715] [<ffffffff81058508>] tick_handle_periodic+0x21/0x6e
[ 427.931715] [<ffffffff81016f81>] smp_apic_timer_interrupt+0x83/0x96
[ 427.931715] [<ffffffff810028d3>] apic_timer_interrupt+0x13/0x20
[ 427.931715] <EOI>
[ 427.931715] [<ffffffff810a1b01>] ? unmap_vmas+0x2bd/0x766
[ 427.931715] [<ffffffff810a1fa5>] ? unmap_vmas+0x761/0x766
[ 427.931715] [<ffffffff81092113>] ? __alloc_pages_nodemask+0x120/0x63e
[ 427.931715] [<ffffffff810a6b12>] exit_mmap+0xb3/0x139
[ 427.931715] [<ffffffff8103122e>] mmput+0x30/0xd7
[ 427.931715] [<ffffffff81035272>] exit_mm+0x101/0x10e
[ 427.931715] [<ffffffff8106b226>] ? acct_collect+0x174/0x181
[ 427.931715] [<ffffffff810368b2>] do_exit+0x1bc/0x69f
[ 427.931715] [<ffffffff812dbaaa>] ? lockdep_sys_exit_thunk+0x35/0x67
[ 427.931715] [<ffffffff810a16b2>] ? do_wp_page+0x4b3/0x645
[ 427.931715] [<ffffffff8103701b>] do_group_exit+0x72/0x9b
[ 427.931715] [<ffffffff81037056>] sys_exit_group+0x12/0x16
[ 427.931715] [<ffffffff81001f6b>] system_call_fastpath+0x16/0x1b
[ 427.931715] Code: 83 ec 10 c7 47 28 00 00 00 00 48 8b 4f 08 65 8b 14 25 88 d1 00 00 c7 45 f8 ff ff ff ff 89 55 f4 8b 75 f4 8b 45 f8 f0 0f b1 71 38 <89> 45 fc 8b 45 fc 83 f8 ff 75 10 c7 47 28 01 00 00 00 c7 47 2c
[ 427.931715] ---[ end trace 4ede7d64a7e7ec9a ]---
[ 433.367734] BUG: NMI Watchdog detected LOCKUP on CPU3, ip ffffffff810853f7, registers:
[ 433.367734] CPU 3
[ 433.367734] Modules linked in: nfsd lockd nfs_acl auth_rpcgss sunrpc snd_pcm_oss snd_mixer_oss exportfs snd_seq snd_seq_device cpufreq_conservative microcode cpufreq_ondemand cpufreq_userspace cpufreq_powersave acpi_cpufreq fuse loop dm_mod snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm firewire_ohci usbmouse firewire_core snd_timer crc_itu_t snd usbhid usb_storage rtc_cmos hid soundcore ohci1394 usb_libusual rtc_core sr_mod thermal snd_page_alloc ieee1394 i2c_i801 button processor rtc_lib cdrom e1000e sg uhci_hcd ehci_hcd sd_mod usbcore edd fan ext3 ext2 mbcache jbd ahci libata scsi_mod
[ 433.367734]
[ 433.367734] Pid: 0, comm: swapper Tainted: G D W 2.6.34-smpx #1500 MS-7502/MS-7502
[ 433.367734] RIP: 0010:[<ffffffff810853f7>] [<ffffffff810853f7>] perf_output_lock+0x30/0x5a
[ 433.367734] RSP: 0018:ffff880002183958 EFLAGS: 00000082
[ 433.367734] RAX: 0000000000000002 RBX: ffff880002183a18 RCX: ffff8800bc91c000
[ 433.367734] RDX: 0000000000000003 RSI: 0000000000000003 RDI: ffff880002183a18
[ 433.367734] RBP: ffff880002183968 R08: 0000000000000001 R09: 0000000000000001
[ 433.367734] R10: ffff8800bf8a5e48 R11: ffff8800bc91c000 R12: ffff8800bc91c000
[ 433.367734] R13: ffff8800bc36a400 R14: 0000000000000058 R15: 0000000000000001
[ 433.367734] FS: 0000000000000000(0000) GS:ffff880002180000(0000) knlGS:0000000000000000
[ 433.367734] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 433.367734] CR2: 00007ff8e6f464a8 CR3: 000000000146e000 CR4: 00000000000006e0
[ 433.367734] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 433.367734] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 433.367734] Process swapper (pid: 0, threadinfo ffff8800bf8a4000, task ffff8800bf893b80)
[ 433.367734] Stack:
[ 433.367734] 0000000300000058 00000002ffffffff ffff880002183a08 ffffffff81089409
[ 433.367734] <0> ffffffff810892f9 ffffffff8105a90d ffff8800021839b8 ffffffff81052cff
[ 433.367734] <0> ffff880002183bf8 0000000000000003 ffff880002183a48 ffff880002183bf8
[ 433.367734] Call Trace:
[ 433.367734] <IRQ>
[ 433.367734] [<ffffffff81089409>] perf_output_begin+0x110/0x24d
[ 433.367734] [<ffffffff810892f9>] ? perf_output_begin+0x0/0x24d
[ 433.367734] [<ffffffff8105a90d>] ? trace_hardirqs_off+0xd/0xf
[ 433.367734] [<ffffffff81052cff>] ? cpu_clock+0x2d/0x40
[ 433.367734] [<ffffffff8108a37c>] ? perf_prepare_sample+0xb5/0x1d3
[ 433.367734] [<ffffffff8108a4e2>] perf_event_output+0x48/0x74
[ 433.367734] [<ffffffff8108a655>] __perf_event_overflow+0x147/0x168
[ 433.367734] [<ffffffff810892f9>] ? perf_output_begin+0x0/0x24d
[ 433.367734] [<ffffffff8108a6ce>] perf_swevent_overflow+0x58/0x75
[ 433.367734] [<ffffffff8108a731>] perf_swevent_add+0x46/0x48
[ 433.367734] [<ffffffff8108a7a0>] perf_swevent_ctx_event+0x6d/0x95
[ 433.367734] [<ffffffff8108a844>] do_perf_sw_event+0x7c/0x103
[ 433.367734] [<ffffffff8108a8fc>] perf_tp_event+0x31/0x33
[ 433.367734] [<ffffffff8102d61d>] ? try_to_wake_up+0x19e/0x266
[ 433.367734] [<ffffffff810811af>] ? perf_trace_buf_prepare+0x93/0xcf
[ 433.367734] [<ffffffff81024bf1>] perf_trace_templ_sched_wakeup_template+0xca/0xf8
[ 433.367734] [<ffffffff81024c2e>] perf_trace_sched_wakeup+0xf/0x11
[ 433.367734] [<ffffffff8102d61d>] try_to_wake_up+0x19e/0x266
[ 433.367734] [<ffffffff81050847>] ? hrtimer_wakeup+0x0/0x21
[ 433.367734] [<ffffffff8102d704>] wake_up_process+0x10/0x12
[ 433.367734] [<ffffffff81050864>] hrtimer_wakeup+0x1d/0x21
[ 433.367734] [<ffffffff8105097d>] __run_hrtimer+0x115/0x1a5
[ 433.367734] [<ffffffff81051593>] hrtimer_run_queues+0x126/0x15c
[ 433.367734] [<ffffffff81041783>] run_local_timers+0x9/0x15
[ 433.367734] [<ffffffff810417c0>] update_process_times+0x31/0x5b
[ 433.367734] [<ffffffff810584a3>] tick_periodic+0x63/0x65
[ 433.367734] [<ffffffff81058508>] tick_handle_periodic+0x21/0x6e
[ 433.367734] [<ffffffff8105897f>] tick_do_broadcast+0x3b/0x6d
[ 433.367734] [<ffffffff810589e5>] tick_do_periodic_broadcast+0x34/0x42
[ 433.367734] [<ffffffff81058b07>] tick_handle_periodic_broadcast+0xf/0x46
[ 433.367734] [<ffffffff81005231>] timer_interrupt+0x19/0x20
[ 433.367734] [<ffffffff8106d61f>] handle_IRQ_event+0x83/0x176
[ 433.367734] [<ffffffff8106f4dd>] handle_edge_irq+0xee/0x136
[ 433.367734] [<ffffffff81004b33>] handle_irq+0x1f/0x27
[ 433.367734] [<ffffffff81004837>] do_IRQ+0x57/0xbe
[ 433.367734] [<ffffffff812dc553>] ret_from_intr+0x0/0xf
[ 433.367734] <EOI>
[ 433.367734] [<ffffffff810097d7>] ? mwait_idle+0xd4/0xe9
[ 433.367734] [<ffffffff810097e0>] ? mwait_idle+0xdd/0xe9
[ 433.367734] [<ffffffff810097d7>] ? mwait_idle+0xd4/0xe9
[ 433.367734] [<ffffffff8100073b>] cpu_idle+0x57/0x74
[ 433.367734] [<ffffffff812d695a>] start_secondary+0x1bd/0x1c1
[ 433.367734] Code: 83 ec 10 c7 47 28 00 00 00 00 48 8b 4f 08 65 8b 14 25 88 d1 00 00 c7 45 f8 ff ff ff ff 89 55 f4 8b 75 f4 8b 45 f8 f0 0f b1 71 38 <89> 45 fc 8b 45 fc 83 f8 ff 75 10 c7 47 28 01 00 00 00 c7 47 2c
[ 433.367734] ---[ end trace 4ede7d64a7e7ec9b ]---
[ 433.367734] Kernel panic - not syncing: Aiee, killing interrupt handler!
[ 433.367734] Pid: 0, comm: swapper Tainted: G D W 2.6.34-smpx #1500
[ 433.367734] Call Trace:
[ 433.367734] <NMI> [<ffffffff812d90dd>] panic+0x73/0xe6
[ 433.367734] [<ffffffff8105200c>] ? notifier_call_chain+0x32/0x5e
[ 433.367734] [<ffffffff81052129>] ? __atomic_notifier_call_chain+0x74/0x86
[ 433.367734] [<ffffffff810520b5>] ? __atomic_notifier_call_chain+0x0/0x86
[ 433.367734] [<ffffffff8105214a>] ? atomic_notifier_call_chain+0xf/0x11
[ 433.367734] [<ffffffff81052537>] ? notify_die+0x2e/0x33
[ 433.367734] [<ffffffff81017d89>] ? nmi_watchdog_tick+0x41/0x1a3
[ 433.367734] [<ffffffff81003931>] ? do_nmi+0x24a/0x274
[ 433.367734] [<ffffffff811cf234>] ? serial8250_console_putchar+0x0/0x27
[ 433.367734] [<ffffffff812dcaea>] ? nmi+0x1a/0x2c
[ 433.367734] [<ffffffff811cf234>] ? serial8250_console_putchar+0x0/0x27
[ 433.367734] [<ffffffff811404c1>] ? delay_tsc+0x1c/0x52
[ 433.367734] <<EOE>>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] perf: hard lockup when using perf-sched
2010-03-26 20:22 ` Mike Galbraith
@ 2010-03-26 21:41 ` Frederic Weisbecker
2010-03-26 21:45 ` Frederic Weisbecker
1 sibling, 0 replies; 16+ messages in thread
From: Frederic Weisbecker @ 2010-03-26 21:41 UTC (permalink / raw)
To: Mike Galbraith
Cc: Li Zefan, LKML, Ingo Molnar, Peter Zijlstra,
Arnaldo Carvalho de Melo, Paul Mackerras
On Fri, Mar 26, 2010 at 09:22:04PM +0100, Mike Galbraith wrote:
> Hm. Last trace I took is below.
>
> ./forkbomb&
Hmm, Yeah that's indeed what the traces show.
And your patch really fix this on your box?
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] perf: hard lockup when using perf-sched
2010-03-26 20:22 ` Mike Galbraith
2010-03-26 21:41 ` Frederic Weisbecker
@ 2010-03-26 21:45 ` Frederic Weisbecker
1 sibling, 0 replies; 16+ messages in thread
From: Frederic Weisbecker @ 2010-03-26 21:45 UTC (permalink / raw)
To: Mike Galbraith
Cc: Li Zefan, LKML, Ingo Molnar, Peter Zijlstra,
Arnaldo Carvalho de Melo, Paul Mackerras
On Fri, Mar 26, 2010 at 09:22:04PM +0100, Mike Galbraith wrote:
> [ 427.931717] <IRQ>
> [ 427.931717] [<ffffffff811403f4>] __delay+0xa/0xc
> [ 427.931717] [<ffffffff81143e42>] do_raw_spin_lock+0xd2/0x13c
> [ 427.931717] [<ffffffff812dbc18>] _raw_spin_lock+0x34/0x3b
> [ 427.931717] [<ffffffff81026e03>] ? task_rq_lock+0x7c/0x97
> [ 427.931717] [<ffffffff8105f2b1>] ? trace_hardirqs_on+0xd/0xf
> [ 427.931717] [<ffffffff81026e03>] task_rq_lock+0x7c/0x97
> [ 427.931717] [<ffffffff8102d4a5>] try_to_wake_up+0x26/0x266
> [ 427.931717] [<ffffffff8102d704>] wake_up_process+0x10/0x12
> [ 427.931717] [<ffffffff810392f4>] wakeup_softirqd+0x2a/0x2c
> [ 427.931717] [<ffffffff81039771>] __do_softirq+0x21e/0x220
> [ 427.931717] [<ffffffff8104b6a0>] ? __task_pid_nr_ns+0x0/0xad
> [ 427.931717] [<ffffffff81002e0c>] call_softirq+0x1c/0x28
> [ 427.931717] [<ffffffff81004b73>] do_softirq+0x38/0x81
> [ 427.931717] [<ffffffff8103983a>] irq_exit+0x45/0x87
> [ 427.931717] [<ffffffff81016f86>] smp_apic_timer_interrupt+0x88/0x96
> [ 427.931717] [<ffffffff810028d3>] apic_timer_interrupt+0x13/0x20
> [ 427.931717] <EOI>
> [ 427.931717] [<ffffffff8105eae6>] ? lock_acquire+0x108/0x117
> [ 427.931717] [<ffffffff8104b6a0>] ? __task_pid_nr_ns+0x0/0xad
> [ 427.931717] [<ffffffff810892f9>] ? perf_output_begin+0x0/0x24d
> [ 427.931717] [<ffffffff8104b6dc>] __task_pid_nr_ns+0x3c/0xad
> [ 427.931717] [<ffffffff8104b6a0>] ? __task_pid_nr_ns+0x0/0xad
> [ 427.931717] [<ffffffff8108668a>] perf_event_tid+0x26/0x28
> [ 427.931717] [<ffffffff8108960f>] perf_event_task_output+0x74/0x9f
> [ 427.931717] [<ffffffff81089675>] perf_event_task_ctx+0x3b/0x5b
> [ 427.931717] [<ffffffff810896e8>] perf_event_task_event+0x53/0xca
> [ 427.931717] [<ffffffff81089695>] ? perf_event_task_event+0x0/0xca
> [ 427.931717] [<ffffffff810897da>] perf_event_task+0x7b/0x86
> [ 427.931717] [<ffffffff8108a90e>] perf_event_fork+0x10/0x12
Aaah, I see now. That happens during a fork event!
So that looks indeed the right fix.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] perf: hard lockup when using perf-sched
2010-03-26 10:11 ` Mike Galbraith
2010-03-26 17:23 ` Frederic Weisbecker
@ 2010-03-29 2:01 ` Li Zefan
2010-04-02 19:07 ` [tip:perf/core] perf: Fix 'perf sched record' deadlock tip-bot for Mike Galbraith
2 siblings, 0 replies; 16+ messages in thread
From: Li Zefan @ 2010-03-29 2:01 UTC (permalink / raw)
To: Mike Galbraith
Cc: LKML, Ingo Molnar, Peter Zijlstra, Frederic Weisbecker,
Arnaldo Carvalho de Melo, Paul Mackerras
> This seems to work, in contrast to everything I tried yesterday. Not
> exactly a thing of beauty, but at least it's an option, so...
>
> perf: fix perf sched record forkbomb deadlock
>
> perf sched record can deadlock a box should the holder of handle->data->lock
> take an interrupt, and then attempt to acquire an rq lock held by a CPU trying
> to acquire the same lock. Disable interrupts.
>
This patch fixes the lockup on my box. Thanks!
> Signed-off-by: Mike Galbraith <efault@gmx.de>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Reported-by: Li Zefan <lizf@cn.fujitsu.com>
> LKML-Reference: <new-submission>
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* [tip:perf/core] perf: Fix 'perf sched record' deadlock
2010-03-26 10:11 ` Mike Galbraith
2010-03-26 17:23 ` Frederic Weisbecker
2010-03-29 2:01 ` Li Zefan
@ 2010-04-02 19:07 ` tip-bot for Mike Galbraith
2 siblings, 0 replies; 16+ messages in thread
From: tip-bot for Mike Galbraith @ 2010-04-02 19:07 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, acme, hpa, mingo, a.p.zijlstra, lizf, efault,
fweisbec, tglx, mingo
Commit-ID: 8bb39f9aa068262732fe44b965d7a6eb5a5a7d67
Gitweb: http://git.kernel.org/tip/8bb39f9aa068262732fe44b965d7a6eb5a5a7d67
Author: Mike Galbraith <efault@gmx.de>
AuthorDate: Fri, 26 Mar 2010 11:11:33 +0100
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 2 Apr 2010 19:30:05 +0200
perf: Fix 'perf sched record' deadlock
perf sched record can deadlock a box should the holder of
handle->data->lock take an interrupt, and then attempt to
acquire an rq lock held by a CPU trying to acquire the
same lock. Disable interrupts.
CPU0 CPU1
sched event with rq->lock held
grab handle->data->lock
spin on handle->data->lock
interrupt
try to grab rq->lock
Reported-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Tested-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1269598293.6174.8.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
kernel/perf_event.c | 15 ++++++++++++---
1 files changed, 12 insertions(+), 3 deletions(-)
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index b0feb47..96aae13 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -3376,15 +3376,23 @@ static void perf_event_task_output(struct perf_event *event,
struct perf_task_event *task_event)
{
struct perf_output_handle handle;
- int size;
struct task_struct *task = task_event->task;
- int ret;
+ unsigned long flags;
+ int size, ret;
+
+ /*
+ * If this CPU attempts to acquire an rq lock held by a CPU spinning
+ * in perf_output_lock() from interrupt context, it's game over.
+ */
+ local_irq_save(flags);
size = task_event->event_id.header.size;
ret = perf_output_begin(&handle, event, size, 0, 0);
- if (ret)
+ if (ret) {
+ local_irq_restore(flags);
return;
+ }
task_event->event_id.pid = perf_event_pid(event, task);
task_event->event_id.ppid = perf_event_pid(event, current);
@@ -3395,6 +3403,7 @@ static void perf_event_task_output(struct perf_event *event,
perf_output_put(&handle, task_event->event_id);
perf_output_end(&handle);
+ local_irq_restore(flags);
}
static int perf_event_task_match(struct perf_event *event)
^ permalink raw reply related [flat|nested] 16+ messages in thread
end of thread, other threads:[~2010-04-02 19:08 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-03-17 7:21 [BUG] perf: hard lockup when using perf-sched Li Zefan
2010-03-24 5:52 ` Li Zefan
2010-03-24 7:32 ` Mike Galbraith
2010-03-24 8:17 ` Mike Galbraith
2010-03-25 8:04 ` Li Zefan
2010-03-25 9:27 ` Mike Galbraith
2010-03-25 9:54 ` Peter Zijlstra
2010-03-26 10:11 ` Mike Galbraith
2010-03-26 17:23 ` Frederic Weisbecker
2010-03-26 19:10 ` Mike Galbraith
2010-03-26 19:27 ` Frederic Weisbecker
2010-03-26 20:22 ` Mike Galbraith
2010-03-26 21:41 ` Frederic Weisbecker
2010-03-26 21:45 ` Frederic Weisbecker
2010-03-29 2:01 ` Li Zefan
2010-04-02 19:07 ` [tip:perf/core] perf: Fix 'perf sched record' deadlock tip-bot for Mike Galbraith
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox