perf: fuzzer triggered CPU lockup

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* perf: fuzzer triggered CPU lockup
@ 2015-07-02 21:02 Vince Weaver
  2015-07-03  3:31 ` Vince Weaver
  0 siblings, 1 reply; 3+ messages in thread
From: Vince Weaver @ 2015-07-02 21:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Stephane Eranian


The fuzzer just turned this up.  This is the first part; a huge amount of 
CPU lockup messages, rcu stalls, and spinlock lockup suspected messages 
followed beyond what I've posted here but eventually the machine recovered (?)

[20601.976374] WARNING: CPU: 1 PID: 28966 at kernel/watchdog.c:311 watchdog_overflow_callback+0x84/0xa0()
[20601.986660] Watchdog detected hard LOCKUP on cpu 1
[20601.991597] Modules linked in:
[20601.995219]  fuse x86_pkg_temp_thermal intel_powerclamp intel_rapl iosf_mbi coretemp kvm_intel kvm snd_hda_codec_hdmi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel hmac drbg i915 ansi_cprng snd_hda_codec_realtek snd_hda_codec_generic iTCO_wdt ppdev snd_hda_intel snd_hda_codec aesni_intel aes_x86_64 snd_hda_core lrw evdev snd_hwdep drm_kms_helper drm psmouse iTCO_vendor_support snd_pcm gf128mul glue_helper ablk_helper serio_raw pcspkr cryptd snd_timer i2c_i801 processor battery video snd mei_me parport_pc i2c_algo_bit button soundcore lpc_ich mfd_core tpm_tis mei parport tpm wmi sg sr_mod sd_mod cdrom ehci_pci ehci_hcd ahci libahci libata xhci_pci xhci_hcd e1000e ptp usbcore crc32c_intel fan scsi_mod pps_core usb_common thermal thermal_sys
[20602.068491] CPU: 1 PID: 28966 Comm: perf_fuzzer Tainted: G        W       4.1.0+ #160
[20602.077305] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[20602.085645]  ffffffff81a27b82 ffff88011ea45af0 ffffffff8169e763 0000000000000000
[20602.094061]  ffff88011ea45b40 ffff88011ea45b30 ffffffff8106ecba ffff88011ea45c40
[20602.102496]  ffff8801197c7800 0000000000000000 ffff88011ea45c40 ffff88011ea45ef8
[20602.110909] Call Trace:
[20602.113982]  <NMI>  [<ffffffff8169e763>] dump_stack+0x45/0x57
[20602.120616]  [<ffffffff8106ecba>] warn_slowpath_common+0x8a/0xc0
[20602.127520]  [<ffffffff8106ed36>] warn_slowpath_fmt+0x46/0x50
[20602.134103]  [<ffffffff8102f676>] ? intel_pmu_drain_pebs_nhm+0x176/0x2e0
[20602.141753]  [<ffffffff8111b694>] watchdog_overflow_callback+0x84/0xa0
[20602.149189]  [<ffffffff8115ae7c>] __perf_event_overflow+0x8c/0x1b0
[20602.156300]  [<ffffffff8115b9d4>] perf_event_overflow+0x14/0x20
[20602.163092]  [<ffffffff81032174>] intel_pmu_handle_irq+0x1d4/0x440
[20602.170206]  [<ffffffff81195ca5>] ? copy_page_range+0x515/0x960
[20602.177014]  [<ffffffff8105182b>] ? native_apic_msr_write+0x2b/0x30
[20602.184197]  [<ffffffff8105169d>] ? x2apic_send_IPI_self+0x1d/0x20
[20602.191284]  [<ffffffff81019155>] ? arch_irq_work_raise+0x35/0x40
[20602.198282]  [<ffffffff8114b0b6>] ? irq_work_queue+0x66/0x80
[20602.204830]  [<ffffffff81028e76>] perf_event_nmi_handler+0x26/0x40
[20602.211928]  [<ffffffff810181ad>] nmi_handle+0x9d/0x140
[20602.217999]  [<ffffffff81018115>] ? nmi_handle+0x5/0x140
[20602.224206]  [<ffffffff8101843a>] default_do_nmi+0x4a/0x120
[20602.230651]  [<ffffffff8101859d>] do_nmi+0x8d/0xc0
[20602.236280]  [<ffffffff816a86df>] end_repeat_nmi+0x1e/0x2e
[20602.242650]  [<ffffffff8101cd16>] ? native_read_tsc+0x6/0x20
[20602.249194]  [<ffffffff8101cd16>] ? native_read_tsc+0x6/0x20
[20602.255733]  [<ffffffff8101cd16>] ? native_read_tsc+0x6/0x20
[20602.262282]  <<EOE>>  [<ffffffff813f7e9d>] delay_tsc+0x2d/0x70
[20602.269008]  [<ffffffff813f7daf>] __delay+0xf/0x20
[20602.274617]  [<ffffffff810bc2ac>] do_raw_spin_lock+0x8c/0x120
[20602.281273]  [<ffffffff816a62bc>] _raw_spin_lock_irqsave+0x4c/0x60
[20602.288346]  [<ffffffff81153aca>] ? perf_event_read+0x6a/0x130
[20602.295064]  [<ffffffff81153aca>] perf_event_read+0x6a/0x130
[20602.301605]  [<ffffffff81153c18>] perf_event_read_value+0x88/0xd0
[20602.308563]  [<ffffffff81154125>] ? perf_event_ctx_lock_nested+0x15/0xf0
[20602.316160]  [<ffffffff8115453f>] perf_read+0x1cf/0x300
[20602.322203]  [<ffffffff811973cf>] ? handle_mm_fault+0xdbf/0x1750
[20602.329079]  [<ffffffff811dd2a8>] __vfs_read+0x28/0xd0
[20602.335008]  [<ffffffff8136b073>] ? security_file_permission+0xa3/0xc0
[20602.342452]  [<ffffffff811dd7f3>] ? rw_verify_area+0x53/0xf0
[20602.348934]  [<ffffffff811dd91a>] vfs_read+0x8a/0x130
[20602.354769]  [<ffffffff811de739>] SyS_read+0x49/0xa0
[20602.360484]  [<ffffffff816a64b2>] entry_SYSCALL_64_fastpath+0x16/0x7a
[20602.367734] ---[ end trace e3d880444bf5705f ]---
[20609.247053] ------------[ cut here ]------------
[20609.252343] WARNING: CPU: 2 PID: 29626 at kernel/watchdog.c:311 watchdog_overflow_callback+0x84/0xa0()
[20609.262667] Watchdog detected hard LOCKUP on cpu 2
[20609.387068] Call Trace:
[20609.390155]  <NMI>  [<ffffffff8169e763>] dump_stack+0x45/0x57
[20609.396774]  [<ffffffff8106ecba>] warn_slowpath_common+0x8a/0xc0
[20609.403676]  [<ffffffff8106ed36>] warn_slowpath_fmt+0x46/0x50
[20609.410245]  [<ffffffff8102f676>] ? intel_pmu_drain_pebs_nhm+0x176/0x2e0
[20609.417849]  [<ffffffff8111b694>] watchdog_overflow_callback+0x84/0xa0
[20609.425247]  [<ffffffff8115ae7c>] __perf_event_overflow+0x8c/0x1b0
[20609.432324]  [<ffffffff8115b9d4>] perf_event_overflow+0x14/0x20
[20609.439100]  [<ffffffff81032174>] intel_pmu_handle_irq+0x1d4/0x440
[20609.446153]  [<ffffffff8108f4f7>] ? __atomic_notifier_call_chain+0x77/0x90
[20609.453984]  [<ffffffff8108f485>] ? __atomic_notifier_call_chain+0x5/0x90
[20609.461717]  [<ffffffff810b8d4d>] ? lock_acquire+0xbd/0x120
[20609.468152]  [<ffffffff8105182b>] ? native_apic_msr_write+0x2b/0x30
[20609.475296]  [<ffffffff8105169d>] ? x2apic_send_IPI_self+0x1d/0x20
[20609.482383]  [<ffffffff81019155>] ? arch_irq_work_raise+0x35/0x40
[20609.489362]  [<ffffffff8114b0b6>] ? irq_work_queue+0x66/0x80
[20609.495880]  [<ffffffff81028e76>] perf_event_nmi_handler+0x26/0x40
[20609.502970]  [<ffffffff810181ad>] nmi_handle+0x9d/0x140
[20609.509066]  [<ffffffff81018115>] ? nmi_handle+0x5/0x140
[20609.515184]  [<ffffffff810184b9>] default_do_nmi+0xc9/0x120
[20609.521617]  [<ffffffff8101859d>] do_nmi+0x8d/0xc0
[20609.527217]  [<ffffffff816a86df>] end_repeat_nmi+0x1e/0x2e
[20609.533566]  [<ffffffff81058bdd>] ? native_write_msr_safe+0xd/0x10
[20609.540658]  [<ffffffff81058bdd>] ? native_write_msr_safe+0xd/0x10
[20609.547728]  [<ffffffff81058bdd>] ? native_write_msr_safe+0xd/0x10
[20609.554770]  <<EOE>>  [<ffffffff8103097a>] ? __intel_pmu_enable_all+0x5a/0xc0
[20609.562888]  [<ffffffff810309f0>] intel_pmu_enable_all+0x10/0x20
[20609.569768]  [<ffffffff8102a95c>] x86_pmu_enable+0x25c/0x2e0
[20609.576271]  [<ffffffff81156112>] perf_pmu_enable+0x22/0x30
[20609.582693]  [<ffffffff811578d4>] perf_event_context_sched_in+0x94/0x150
[20609.590305]  [<ffffffff81157e07>] __perf_event_task_sched_in+0x97/0xa0
[20609.597710]  [<ffffffff81096517>] finish_task_switch+0x197/0x1f0
[20609.604580]  [<ffffffff816a0860>] ? __schedule+0x280/0xa50
[20609.610866]  [<ffffffff816a08c8>] __schedule+0x2e8/0xa50
[20609.616966]  [<ffffffff811590f2>] ? put_event+0x92/0x150
[20609.623071]  [<ffffffff816a1067>] schedule+0x37/0x80
[20609.628788]  [<ffffffff816a135e>] schedule_preempt_disabled+0xe/0x10
[20609.636002]  [<ffffffff816a2ce0>] mutex_lock_nested+0x120/0x320
[20609.642717]  [<ffffffff811590f2>] ? put_event+0x92/0x150
[20609.648789]  [<ffffffff811590f2>] put_event+0x92/0x150
[20609.654651]  [<ffffffff8115908b>] ? put_event+0x2b/0x150
[20609.660680]  [<ffffffff811591d0>] perf_release+0x10/0x20
[20609.666685]  [<ffffffff811df34c>] __fput+0xdc/0x1e0
[20609.672243]  [<ffffffff811df49e>] ____fput+0xe/0x10
[20609.677768]  [<ffffffff8108c295>] task_work_run+0x85/0xb0
[20609.683882]  [<ffffffff81071645>] do_exit+0x3a5/0xa90
[20609.689605]  [<ffffffff8107d1b2>] ? get_signal+0xe2/0x720
[20609.695712]  [<ffffffff81071dd4>] do_group_exit+0x54/0xe0
[20609.701802]  [<ffffffff8107d350>] get_signal+0x280/0x720
[20609.707790]  [<ffffffff81013478>] do_signal+0x28/0xb50
[20609.713579]  [<ffffffff810b7806>] ? __lock_acquire.isra.31+0x3a6/0xf90
[20609.720874]  [<ffffffff816a0653>] ? __schedule+0x73/0xa50
[20609.726927]  [<ffffffff816a0bcc>] ? __schedule+0x5ec/0xa50
[20609.733093]  [<ffffffff816a0bcc>] ? __schedule+0x5ec/0xa50
[20609.739262]  [<ffffffff810815d5>] ? sys_getppid+0x5/0x80
[20609.745244]  [<ffffffff81014002>] do_notify_resume+0x62/0x70
[20609.751546]  [<ffffffff816a719e>] retint_signal+0x44/0x86
[20609.757629] ---[ end trace e3d880444bf57060 ]---


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: perf: fuzzer triggered CPU lockup
  2015-07-02 21:02 perf: fuzzer triggered CPU lockup Vince Weaver
@ 2015-07-03  3:31 ` Vince Weaver
  2015-07-03 10:08   ` Peter Zijlstra
  0 siblings, 1 reply; 3+ messages in thread
From: Vince Weaver @ 2015-07-03  3:31 UTC (permalink / raw)
  To: Vince Weaver
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian

On Thu, 2 Jul 2015, Vince Weaver wrote:

> The fuzzer just turned this up.  This is the first part; a huge amount of 
> CPU lockup messages, rcu stalls, and spinlock lockup suspected messages 
> followed beyond what I've posted here but eventually the machine recovered (?)

> [20602.134103]  [<ffffffff8102f676>] ? intel_pmu_drain_pebs_nhm+0x176/0x2e0

after a reboot I am still getting these, in addition to the other
intel_pmu_drain_pebs_nhm related warning.  Do I need to try to bisect this 
down?

[18356.573650] ------------[ cut here ]------------
[18356.578896] WARNING: CPU: 1 PID: 27699 at kernel/watchdog.c:311 watchdog_overflow_callback+0x84/0xa0()
[18356.590213] Watchdog detected hard LOCKUP on cpu 1
[18356.706920] CPU: 1 PID: 27699 Comm: perf_fuzzer Tainted: G        W       4.1.0+ #162
[18356.716672] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[18356.725951]  ffffffff81a27b92 ffff88011ea45af0 ffffffff8169e763 0000000000000000
[18356.738175]  ffff88011ea45b40 ffff88011ea45b30 ffffffff8106ecba ffff88011ea45c40
[18356.750305]  ffff8801197c7800 0000000000000001 ffff88011ea45c40 ffff88011ea45ef8
[18356.761131] Call Trace:
[18356.765163]  <NMI>  [<ffffffff8169e763>] dump_stack+0x45/0x57
[18356.772761]  [<ffffffff8106ecba>] warn_slowpath_common+0x8a/0xc0
[18356.780580]  [<ffffffff8106ed36>] warn_slowpath_fmt+0x46/0x50
[18356.788080]  [<ffffffff8102f676>] ? intel_pmu_drain_pebs_nhm+0x176/0x2e0
[18356.796685]  [<ffffffff8111b694>] watchdog_overflow_callback+0x84/0xa0
[18356.805063]  [<ffffffff8115ae7c>] __perf_event_overflow+0x8c/0x1b0
[18356.813081]  [<ffffffff8115b9d4>] perf_event_overflow+0x14/0x20
[18356.820813]  [<ffffffff81032194>] intel_pmu_handle_irq+0x1d4/0x440
[18356.828795]  [<ffffffff81028e76>] perf_event_nmi_handler+0x26/0x40
[18356.836761]  [<ffffffff810181ad>] nmi_handle+0x9d/0x140
[18356.843764]  [<ffffffff81018115>] ? nmi_handle+0x5/0x140
[18356.850868]  [<ffffffff810184b9>] default_do_nmi+0xc9/0x120
[18356.858228]  [<ffffffff8101859d>] do_nmi+0x8d/0xc0
[18356.864793]  [<ffffffff816a86df>] end_repeat_nmi+0x1e/0x2e
[18356.872062]  <<EOE>>  <UNK> 
[18356.875986] ---[ end trace 1b52d9dc94e8716a ]---

Followed by a lot of these:

[18359.393390] INFO: rcu_sched detected stalls on CPUs/tasks: { 1} (detected by 6, t=5253 jiffies, g=2341684, c=2341683, q=71)
[18359.405806] Task dump for CPU 1:
[18359.409688] perf_fuzzer     R  running task        0 27699   2386 0x00000008
[18359.417712]  ffff8800cfb7c680 00007f889f330fff 0000000000000096 0000000000000002
[18359.426147]  ffff8800cfb7c680 ffffffff82a107e0 0000000000000001 ffff8800c4413c28
[18359.434553]  0000000000000096 ffffffff814090e0 00007f8800000000 000000019f400000
[18359.443023] Call Trace:
[18359.446107]  [<ffffffff814090e0>] ? debug_check_no_obj_freed+0x90/0x200
[18359.453610]  [<ffffffff81409162>] ? debug_check_no_obj_freed+0x112/0x200
[18359.461226]  [<ffffffff81409162>] ? debug_check_no_obj_freed+0x112/0x200
[18359.468840]  [<ffffffff811bc096>] ? kfree_debugcheck+0x16/0x40
[18359.475521]  [<ffffffff8119e14e>] ? mmap_region+0x49e/0x5d0
[18359.481915]  [<ffffffff8118468b>] ? vm_mmap_pgoff+0x9b/0xc0
[18359.488362]  [<ffffffff810b5023>] ? up_write+0x23/0x50
[18359.494362]  [<ffffffff8118468b>] ? vm_mmap_pgoff+0x9b/0xc0
[18359.500807]  [<ffffffff811fbec5>] ? __fget+0x5/0xd0
[18359.506498]  [<ffffffff8119c978>] ? SyS_mmap_pgoff+0xe8/0x270
[18359.513100]  [<ffffffff81003012>] ? lockdep_sys_exit_thunk+0x12/0x14
[18359.520351]  [<ffffffff816a64bc>] ? entry_SYSCALL_64_fastpath+0x20/0x7a


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: perf: fuzzer triggered CPU lockup
  2015-07-03  3:31 ` Vince Weaver
@ 2015-07-03 10:08   ` Peter Zijlstra
  0 siblings, 0 replies; 3+ messages in thread
From: Peter Zijlstra @ 2015-07-03 10:08 UTC (permalink / raw)
  To: Vince Weaver
  Cc: linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo,
	Stephane Eranian, kan.liang

On Thu, Jul 02, 2015 at 11:31:15PM -0400, Vince Weaver wrote:
> On Thu, 2 Jul 2015, Vince Weaver wrote:
> 
> > The fuzzer just turned this up.  This is the first part; a huge amount of 
> > CPU lockup messages, rcu stalls, and spinlock lockup suspected messages 
> > followed beyond what I've posted here but eventually the machine recovered (?)
> 
> > [20602.134103]  [<ffffffff8102f676>] ? intel_pmu_drain_pebs_nhm+0x176/0x2e0
> 
> after a reboot I am still getting these, in addition to the other
> intel_pmu_drain_pebs_nhm related warning.  Do I need to try to bisect this 
> down?

I think I have a fairly good idea which patches to blame for that;
there's been a rework of the pebs code there. Let me go have a look.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2015-07-03 10:08 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-07-02 21:02 perf: fuzzer triggered CPU lockup Vince Weaver
2015-07-03  3:31 ` Vince Weaver
2015-07-03 10:08   ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox