perf: fuzzer causes lockup in x86_pmu_event_init()

public inbox for linux-kernel@vger.kernel.org

* perf: fuzzer causes lockup in x86_pmu_event_init()
@ 2015-02-17 16:17 Vince Weaver
  2015-02-24  3:56 ` Vince Weaver
  0 siblings, 1 reply; 11+ messages in thread
From: Vince Weaver @ 2015-02-17 16:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Jiri Olsa


This is on a Haswell machine, current git as of this past Friday.

I let the perf_fuzzer run and it took 4 days to find this.
Sadly, it doesn't seem to be reproducible, so I am not sure
exactly how it got into this state.

It got stuck and kept printing the following message until I rebooted 
the system.

[328363.704011] INFO: rcu_sched self-detected stall on CPU { 2}  (t=5251 jiffies g=28607531 c=28607530 q=232)
[328363.715076] Task dump for CPU 2:
[328363.719278] perf_fuzzer     R  running task        0 20935   2309 0x00000008
[328363.727653]  0000000000000905 ffff88011ea83d68 ffffffff8109a126 ffffffff8109a095
[328363.736466]  0000000000000007 0000000000000002 ffffffff81c50780 ffff88011ea83d88
[328363.745256]  ffffffff8109d9fd ffff88011ea95b00 0000000000000003 ffff88011ea83db8
[328363.754037] Call Trace:
[328363.757382]  <IRQ>  [<ffffffff8109a126>] sched_show_task+0xf6/0x160
[328363.764853]  [<ffffffff8109a095>] ? sched_show_task+0x65/0x160
[328363.771769]  [<ffffffff8109d9fd>] dump_cpu_task+0x3d/0x50
[328363.778205]  [<ffffffff810d2301>] rcu_dump_cpu_stacks+0x91/0xd0
[328363.785181]  [<ffffffff810d5dab>] rcu_check_callbacks+0x48b/0x760
[328363.792345]  [<ffffffff810db429>] update_process_times+0x39/0x60
[328363.799431]  [<ffffffff810eb195>] tick_sched_handle.isra.18+0x25/0x60
[328363.806923]  [<ffffffff810eb8e4>] tick_sched_timer+0x44/0x80
[328363.813594]  [<ffffffff810dc0d4>] __run_hrtimer+0x94/0x1f0
[328363.820090]  [<ffffffff810dc623>] ? hrtimer_interrupt+0x83/0x220
[328363.827166]  [<ffffffff810eb8a0>] ? tick_sched_do_timer+0x40/0x40
[328363.834322]  [<ffffffff810dc69b>] hrtimer_interrupt+0xfb/0x220
[328363.841188]  [<ffffffff810495ec>] local_apic_timer_interrupt+0x3c/0x70
[328363.848802]  [<ffffffff816caf11>] smp_apic_timer_interrupt+0x41/0x60
[328363.856197]  [<ffffffff816c8fbd>] apic_timer_interrupt+0x6d/0x80
[328363.863238]  <EOI>  [<ffffffff810bad96>] ? __raw_spin_lock_init+0x36/0x60
[328363.871175]  [<ffffffff81029cb5>] ? collect_events+0x55/0xb0
[328363.877844]  [<ffffffff8102a6be>] x86_pmu_event_init+0xfe/0x3c0
[328363.884817]  [<ffffffff811559d3>] perf_try_init_event+0x33/0x70
[328363.891797]  [<ffffffff8115eb4f>] perf_init_event+0x13f/0x170
[328363.898586]  [<ffffffff8115ea15>] ? perf_init_event+0x5/0x170
[328363.905369]  [<ffffffff8115ef18>] perf_event_alloc+0x398/0x450
[328363.912237]  [<ffffffff8115f343>] SYSC_perf_event_open+0x373/0xc80
[328363.919502]  [<ffffffff8116009e>] SyS_perf_event_open+0xe/0x10
[328363.926387]  [<ffffffff816c7ead>] system_call_fastpath+0x16/0x1b
[328363.933452] INFO: rcu_sched detected stalls on CPUs/tasks: { 2} (detected by 0, t=5308 jiffies, g=28607531, c=28607530, q=232)
[328363.946389] Task dump for CPU 2:
[328363.950495] perf_fuzzer     R  running task        0 20935   2309 0x00000008
[328363.958750]  0000000000000246 ffff88011edf8f18 0000000000000141 ffff8800947d0990
[328363.967451]  ffff880036e36000 ffff88011a000640 000001ff00000020 0000000000000000
[328363.976158]  ffff88011a000640 ffff880036e36000 0000000080002800 ffffffff81032ead
[328363.984776] Call Trace:
[328363.988059]  [<ffffffff81032ead>] ? allocate_shared_regs+0x3d/0x90
[328363.995306]  [<ffffffff811c190b>] ? cache_alloc_debugcheck_after.isra.51+0x1fb/0x280
[328364.004275]  [<ffffffff810b53bd>] ? lockdep_init_map+0x5d/0x640
[328364.011286]  [<ffffffff8102a562>] ? allocate_fake_cpuc+0x32/0x90
[328364.018369]  [<ffffffff811c190b>] ? cache_alloc_debugcheck_after.isra.51+0x1fb/0x280
[328364.027342]  [<ffffffff810bad96>] ? __raw_spin_lock_init+0x36/0x60
[328364.034626]  [<ffffffff81029cb5>] ? collect_events+0x55/0xb0
[328364.041342]  [<ffffffff8102a6be>] ? x86_pmu_event_init+0xfe/0x3c0
[328364.048514]  [<ffffffff811559d3>] ? perf_try_init_event+0x33/0x70
[328364.055727]  [<ffffffff8115eb4f>] ? perf_init_event+0x13f/0x170
[328364.062731]  [<ffffffff8115ea15>] ? perf_init_event+0x5/0x170
[328364.069552]  [<ffffffff8115ef18>] ? perf_event_alloc+0x398/0x450
[328364.076619]  [<ffffffff8115f343>] ? SYSC_perf_event_open+0x373/0xc80
[328364.084124]  [<ffffffff8116009e>] ? SyS_perf_event_open+0xe/0x10
[328364.091184]  [<ffffffff816c7ead>] ? system_call_fastpath+0x16/0x1b
[328390.679373] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [perf_fuzzer:20935]
[328390.688418] Modules linked in: fuse x86_pkg_temp_thermal intel_powerclamp intel_rapl iosf_mbi coretemp kvm snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic crct10dif_pclmul crc32_pclmul snd_hda_intel snd_hda_controller ghash_clmulni_intel snd_hda_codec aesni_intel aes_x86_64 snd_hwdep lrw ppdev i915 snd_pcm gf128mul iTCO_wdt evdev iTCO_vendor_support drm_kms_helper glue_helper psmouse drm ablk_helper cryptd snd_timer serio_raw mei_me parport_pc tpm_tis snd lpc_ich tpm soundcore i2c_i801 xhci_pci battery mei i2c_algo_bit parport pcspkr xhci_hcd mfd_core wmi button processor video sg sr_mod cdrom sd_mod ahci e1000e ehci_pci libahci ptp ehci_hcd libata usbcore crc32c_intel scsi_mod usb_common pps_core thermal fan thermal_sys
[328390.762307] CPU: 2 PID: 20935 Comm: perf_fuzzer Tainted: G        W       3.19.0+ #127
[328390.771527] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[328390.780237] task: ffff8800947d0990 ti: ffff8800946c4000 task.ti: ffff8800946c4000
[328390.789004] RIP: 0010:[<ffffffff81029cad>]  [<ffffffff81029cad>] collect_events+0x4d/0xb0
[328390.798577] RSP: 0018:ffff8800946c7d38  EFLAGS: 00000202
[328390.805033] RAX: ffff8800ce837000 RBX: ffffffff810bad96 RCX: 0000000000000001
[328390.813454] RDX: ffff8800ce837010 RSI: ffff8801193cb020 RDI: ffff8800958b3000
[328390.821845] RBP: ffff8800946c7d38 R08: 0000000000000007 R09: ffffffff828fb310
[328390.830256] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff81c220c0
[328390.838661] R13: ffff8801193cb000 R14: ffffffff81e7f181 R15: ffffffff81a24c53
[328390.847094] FS:  00007fb692472700(0000) GS:ffff88011ea80000(0000) knlGS:0000000000000000
[328390.856548] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[328390.863438] CR2: 0000000001e1d038 CR3: 0000000094724000 CR4: 00000000001407e0
[328390.871839] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[328390.880251] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[328390.888646] Stack:
[328390.891588]  ffff8800946c7d88 ffffffff8102a6be ffff8800946c7db8 0000000000000292
[328390.900379]  ffff8800947d0990 ffffffff81c220c0 ffff8800a0274000 0000000000000000
[328390.909169]  0000000000000000 0000000000000000 ffff8800946c7db8 ffffffff811559d3
[328390.917922] Call Trace:
[328390.921321]  [<ffffffff8102a6be>] x86_pmu_event_init+0xfe/0x3c0
[328390.928405]  [<ffffffff811559d3>] perf_try_init_event+0x33/0x70
[328390.935479]  [<ffffffff8115eb4f>] perf_init_event+0x13f/0x170
[328390.942426]  [<ffffffff8115ea15>] ? perf_init_event+0x5/0x170
[328390.949360]  [<ffffffff8115ef18>] perf_event_alloc+0x398/0x450
[328390.956398]  [<ffffffff8115f343>] SYSC_perf_event_open+0x373/0xc80
[328390.963774]  [<ffffffff8116009e>] SyS_perf_event_open+0xe/0x10
[328390.970773]  [<ffffffff816c7ead>] system_call_fastpath+0x16/0x1b
[328390.977968] Code: 84 d2 89 c8 74 5a 48 8b 56 20 48 83 c6 20 48 39 d6 48 8d 42 f0 75 14 eb 45 0f 1f 44 00 00 48 8b 50 10 48 39 d6 48 8d 42 f0 74 33 <48> 81 78 70 c0 20 c2 81 75 e9 8b 50 78 85 d2 78 e2 41 39 c8 7e 
[328418.691171] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [perf_fuzzer:20935]
...

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: perf: fuzzer causes lockup in x86_pmu_event_init()
  2015-02-17 16:17 perf: fuzzer causes lockup in x86_pmu_event_init() Vince Weaver
@ 2015-02-24  3:56 ` Vince Weaver
  2015-02-25 15:16   ` Peter Zijlstra
  0 siblings, 1 reply; 11+ messages in thread
From: Vince Weaver @ 2015-02-24  3:56 UTC (permalink / raw)
  To: Vince Weaver
  Cc: linux-kernel, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Jiri Olsa

On Tue, 17 Feb 2015, Vince Weaver wrote:

> This is on a Haswell machine, current git as of this past Friday.
> 
> I let the perf_fuzzer run and it took 4 days to find this.
> Sadly, it doesn't seem to be reproducible, so I am not sure
> exactly how it got into this state.

I have hit this on another machine, my core2 machine (after 10 days of 
fuzzing).  So this seems to be a real issue although hard to hit.

The problem seems to map to 
	arch/x86/kernel/cpu/perf_event.c:824

It is stuck forever in this loop in collect_events()

        list_for_each_entry(event, &leader->sibling_list, group_entry) {
                if (!is_x86_event(event) ||
                    event->state <= PERF_EVENT_STATE_OFF)
                        continue;

                if (n >= max_count)
                        return -EINVAL;

                cpuc->event_list[n] = event;
                n++;
        }
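To see why a damaged sibling_list would hang this loop instead of making it fail, here is a minimal userspace sketch in C. It is not kernel code: list_head and container_of are simplified re-implementations, toy_event (with a single is_x86 flag standing in for is_x86_event()) is hypothetical, and the step cap in walk_siblings is a hypothetical debugging aid. It shows that the walk only ends when it arrives back at the head, and that entries skipped by the `continue` never increment n, so the max_count check cannot break a cycle made of skipped events.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct list_head (not the real header). */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical toy event: only the fields the loop looks at. */
struct toy_event {
	int is_x86;			/* stands in for is_x86_event() */
	struct list_head group_entry;
};

enum { WALK_EINVAL = -1, WALK_STUCK = -2 };

/*
 * Same shape as the collect_events() sibling loop, plus a step cap so a
 * walk that never returns to the head is reported instead of spinning.
 */
static int walk_siblings(struct list_head *head, int max_count, long max_steps)
{
	int n = 0;
	long steps = 0;

	for (struct list_head *p = head->next; p != head; p = p->next) {
		if (++steps > max_steps)
			return WALK_STUCK;
		struct toy_event *e = container_of(p, struct toy_event, group_entry);
		if (!e->is_x86)
			continue;	/* skipped entries never bump n ... */
		if (n >= max_count)
			return WALK_EINVAL; /* ... so this cannot stop a cycle of them */
		n++;
	}
	return n;
}

/* Healthy group: x86, non-x86, x86 siblings. */
static int demo_healthy(int max_count)
{
	struct list_head head;
	struct toy_event a = { .is_x86 = 1 }, b = { .is_x86 = 0 }, c = { .is_x86 = 1 };

	init_list_head(&head);
	list_add_tail(&a.group_entry, &head);
	list_add_tail(&b.group_entry, &head);
	list_add_tail(&c.group_entry, &head);
	return walk_siblings(&head, max_count, 1000);
}

/* Corrupted group: head points into a cycle that never returns to head. */
static int demo_corrupted(void)
{
	struct list_head head;
	struct toy_event d = { .is_x86 = 0 }, e = { .is_x86 = 0 };

	d.group_entry.next = &e.group_entry;
	d.group_entry.prev = &e.group_entry;
	e.group_entry.next = &d.group_entry;
	e.group_entry.prev = &d.group_entry;

	head.next = &d.group_entry;	/* stale link into the other cycle */
	head.prev = &e.group_entry;
	return walk_siblings(&head, 4, 1000);
}
```

demo_healthy(4) returns 2 and demo_healthy(1) returns WALK_EINVAL, while demo_corrupted() only returns (with WALK_STUCK) because of the artificial cap; without the cap it would spin the way the traces above show.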


[884044.228001] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [perf_fuzzer:17282]
[884044.228001] Modules linked in: cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_conservative f71882fg mcs7830 usbnet evdev video pcspkr acpi_cpufreq coretemp psmouse serio_raw processor thermal_sys ohci_pci ohci_hcd i2c_nforce2 wmi button sg ehci_pci ehci_hcd sd_mod usbcore usb_common
[884044.228001] CPU: 1 PID: 17282 Comm: perf_fuzzer Tainted: G        W       3.19.0+ #201
[884044.228001] Hardware name: AOpen   DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012, BIOS 080015  10/19/2012
[884044.228001] task: ffff88003dca4980 ti: ffff8801049dc000 task.ti: ffff8801049dc000
[884044.228001] RIP: 0010:[<ffffffff810138a8>]  [<ffffffff810138a8>] x86_pmu_event_init+0x138/0x31d
[884044.228001] RSP: 0018:ffff8801049dfd98  EFLAGS: 00000286
[884044.228001] RAX: ffff880042cd2000 RBX: ffff88003d11c000 RCX: 0000000000000005
[884044.228001] RDX: 0000000000000001 RSI: ffff880042cd2010 RDI: ffffffff810135c1
[884044.228001] RBP: ffff8801049dfdc8 R08: 00000000000080d0 R09: 0000000000000000
[884044.228001] R10: 0000000000000003 R11: 0000000000000246 R12: 0000000000000286
[884044.228001] R13: 0000000000008000 R14: ffff88011f000700 R15: 0000000000000000
[884044.228001] FS:  00007faf3205f700(0000) GS:ffff88011fc80000(0000) knlGS:0000000000000000
[884044.228001] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[884044.228001] CR2: 0000000005463038 CR3: 0000000106371000 CR4: 00000000000407e0
[884044.228001] DR0: 00000000020a4000 DR1: 0000000001e96000 DR2: 0000000001e96000
[884044.228001] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
[884044.228001] Stack:
[884044.228001]  0000000000000002 ffffffff81a1b270 ffff88003e96e000 0000000000000000
[884044.228001]  ffffffff81a3b7a0 ffff88003e96e000 ffff8801049dfde8 ffffffff810cec1b
[884044.228001]  ffffffff81a1b270 ffff88003e96e000 ffff8801049dfe28 ffffffff810d488d
[884044.228001] Call Trace:
[884044.228001]  [<ffffffff810cec1b>] perf_try_init_event+0x25/0x47
[884044.228001]  [<ffffffff810d488d>] perf_init_event+0x93/0xca
[884044.228001]  [<ffffffff810d4b5f>] perf_event_alloc+0x29b/0x32d
[884044.228001]  [<ffffffff810d5008>] SYSC_perf_event_open+0x417/0x89c
[884044.228001]  [<ffffffff810d57fe>] SyS_perf_event_open+0x9/0xb
[884044.228001]  [<ffffffff8153eb96>] system_call_fastpath+0x16/0x1b
[884044.228001] Code: a1 81 8b 90 14 02 00 00 75 15 39 ca 0f 8d e7 01 00 00 48 63 c2 ff c2 4d 89 bc c5 20 05 00 00 49 8b 47 20 49 83 c7 20 48 83 e8 10 <48> 8d 70 10 4c 39 fe 74 2f 48 81 78 70 70 b2 a1 81 75 1b 83 78 
[884044.228001] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [perf_fuzzer:17282]
[884072.228001] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [perf_fuzzer:17282]


* Re: perf: fuzzer causes lockup in x86_pmu_event_init()
  2015-02-24  3:56 ` Vince Weaver
@ 2015-02-25 15:16   ` Peter Zijlstra
  2015-02-28 12:14     ` Jiri Olsa
  2015-03-27 11:46     ` [tip:perf/core] perf: Fix racy group access tip-bot for Peter Zijlstra
  0 siblings, 2 replies; 11+ messages in thread
From: Peter Zijlstra @ 2015-02-25 15:16 UTC (permalink / raw)
  To: Vince Weaver
  Cc: linux-kernel, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Jiri Olsa

On Mon, Feb 23, 2015 at 10:56:10PM -0500, Vince Weaver wrote:
> On Tue, 17 Feb 2015, Vince Weaver wrote:
> 
> > This is on a Haswell machine, current git as of this past Friday.
> > 
> > I let the perf_fuzzer run and it took 4 days to find this.
> > Sadly, it doesn't seem to be reproducible, so I am not sure
> > exactly how it got into this state.
> 
> I have hit this on another machine, my core2 machine (after 10 days of 
> fuzzing).  So this seems to be a real issue although hard to hit.
> 
> The problem seems to map to 
> 	arch/x86/kernel/cpu/perf_event.c:824
> 
> It is stuck forever in this loop in collect_events()
> 
>         list_for_each_entry(event, &leader->sibling_list, group_entry) {
>                 if (!is_x86_event(event) ||
>                     event->state <= PERF_EVENT_STATE_OFF)
>                         continue;
> 
>                 if (n >= max_count)
>                         return -EINVAL;
> 
>                 cpuc->event_list[n] = event;
>                 n++;
>         }
> 

> [884044.228001] RIP: 0010:[<ffffffff810138a8>]  [<ffffffff810138a8>] x86_pmu_event_init+0x138/0x31d

> [884044.228001] Call Trace:
> [884044.228001]  [<ffffffff810cec1b>] perf_try_init_event+0x25/0x47
> [884044.228001]  [<ffffffff810d488d>] perf_init_event+0x93/0xca
> [884044.228001]  [<ffffffff810d4b5f>] perf_event_alloc+0x29b/0x32d
> [884044.228001]  [<ffffffff810d5008>] SYSC_perf_event_open+0x417/0x89c
> [884044.228001]  [<ffffffff810d57fe>] SyS_perf_event_open+0x9/0xb

That smells like a corrupted sibling_list; I see no other way for that
loop to not end.

It occurs to me that that list iteration is entirely unserialized; we
should be holding a ctx lock or mutex, but we do not.

Now IIRC the perf fuzzer is single-threaded, so it would not actually
trigger the most horrible cases here, but this does smell bad.

Does something like the below make sense and/or help? Jolsa?

---
 kernel/events/core.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index af924bc38121..763e7c02e796 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7049,12 +7049,23 @@ EXPORT_SYMBOL_GPL(perf_pmu_unregister);
 
 static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
 {
+	struct perf_event_context *ctx = NULL;
 	int ret;
 
 	if (!try_module_get(pmu->module))
 		return -ENODEV;
+
+	if (event->group_leader != event) {
+		ctx = perf_event_ctx_lock(event->group_leader);
+		BUG_ON(!ctx);
+	}
+
 	event->pmu = pmu;
 	ret = pmu->event_init(event);
+
+	if (ctx)
+		perf_event_ctx_unlock(event->group_leader, ctx);
+
 	if (ret)
 		module_put(pmu->module);
 

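The pattern in this patch (take the group leader's context lock around pmu->event_init() when the new event joins an existing group, and drop it afterwards) can be sketched in userspace as below. This is a hedged illustration, not kernel code: toy_ctx, toy_event, toy_ctx_lock/toy_ctx_unlock, fake_event_init and toy_try_init_event are hypothetical names, and a plain flag stands in for ctx->mutex so the sketch builds without threads. The real patch obtains the context via perf_event_ctx_lock() and BUG_ON()s if it is NULL; the sketch skips that and the module refcounting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for perf_event_context; 'locked' models ctx->mutex. */
struct toy_ctx {
	int locked;
};

/* Hypothetical stand-in for perf_event: just the fields this sketch needs. */
struct toy_event {
	struct toy_event *group_leader;
	struct toy_ctx *ctx;
	int init_saw_lock;	/* filled in by fake_event_init() */
};

static void toy_ctx_lock(struct toy_ctx *ctx)
{
	assert(!ctx->locked);	/* a real mutex would block here instead */
	ctx->locked = 1;
}

static void toy_ctx_unlock(struct toy_ctx *ctx)
{
	assert(ctx->locked);
	ctx->locked = 0;
}

/* Fake pmu->event_init(): this is where group validation walks siblings. */
static int fake_event_init(struct toy_event *event)
{
	event->init_saw_lock = event->group_leader->ctx->locked;
	return 0;
}

/* Same shape as the patched perf_try_init_event(). */
static int toy_try_init_event(struct toy_event *event)
{
	struct toy_ctx *ctx = NULL;
	int ret;

	/* Only an event joining an existing group needs the leader's context. */
	if (event->group_leader != event) {
		ctx = event->group_leader->ctx;
		toy_ctx_lock(ctx);
	}

	ret = fake_event_init(event);

	if (ctx)
		toy_ctx_unlock(ctx);

	return ret;
}

static void demo_lock_scope(int *leader_saw, int *sibling_saw, int *locked_after)
{
	struct toy_ctx ctx = { 0 };
	struct toy_event leader = { NULL, &ctx, -1 };
	struct toy_event sibling = { NULL, &ctx, -1 };

	leader.group_leader = &leader;	/* a leader leads itself */
	sibling.group_leader = &leader;

	toy_try_init_event(&leader);
	toy_try_init_event(&sibling);

	*leader_saw = leader.init_saw_lock;
	*sibling_saw = sibling.init_saw_lock;
	*locked_after = ctx.locked;
}
```

In demo_lock_scope() the leader's own init runs unlocked (it has no existing group to protect), the sibling's init runs with the leader's context held, and the lock is released again afterwards.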

* Re: perf: fuzzer causes lockup in x86_pmu_event_init()
  2015-02-25 15:16   ` Peter Zijlstra
@ 2015-02-28 12:14     ` Jiri Olsa
  2015-03-02 19:13       ` Vince Weaver
  2015-03-17 14:10       ` Peter Zijlstra
  2015-03-27 11:46     ` [tip:perf/core] perf: Fix racy group access tip-bot for Peter Zijlstra
  1 sibling, 2 replies; 11+ messages in thread
From: Jiri Olsa @ 2015-02-28 12:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vince Weaver, linux-kernel, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo

On Wed, Feb 25, 2015 at 04:16:39PM +0100, Peter Zijlstra wrote:
> On Mon, Feb 23, 2015 at 10:56:10PM -0500, Vince Weaver wrote:
> > On Tue, 17 Feb 2015, Vince Weaver wrote:
> > [884044.228001] RIP: 0010:[<ffffffff810138a8>]  [<ffffffff810138a8>] x86_pmu_event_init+0x138/0x31d
> 
> > [884044.228001] Call Trace:
> > [884044.228001]  [<ffffffff810cec1b>] perf_try_init_event+0x25/0x47
> > [884044.228001]  [<ffffffff810d488d>] perf_init_event+0x93/0xca
> > [884044.228001]  [<ffffffff810d4b5f>] perf_event_alloc+0x29b/0x32d
> > [884044.228001]  [<ffffffff810d5008>] SYSC_perf_event_open+0x417/0x89c
> > [884044.228001]  [<ffffffff810d57fe>] SyS_perf_event_open+0x9/0xb
> 
> That smells like a corrupted sibling_list; I see no other way for that
> loop to not end.
>
> It occurs to me that that list iteration is entirely unserialized; we
> should be holding a ctx lock or mutex, but we do not.
>
> Now IIRC the perf fuzzer is single-threaded, so it would not actually
> trigger the most horrible cases here, but this does smell bad.
> 
> Does something like the below make sense and/or help? Jolsa?

SNIP

hum, I don't see that the locking is a problem.. but it looks like once
the sibling becomes a singleton event, we don't init its sibling_list

how about the patch below.. compile tested ;-)

jirka


---
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 3e114019b14a..d91107392ce0 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1470,7 +1470,9 @@ static void perf_group_detach(struct perf_event *event)
 	list_for_each_entry_safe(sibling, tmp, &event->sibling_list, group_entry) {
 		if (list)
 			list_move_tail(&sibling->group_entry, list);
+
 		sibling->group_leader = sibling;
+		INIT_LIST_HEAD(&sibling->sibling_list);
 
 		/* Inherit group flags from the previous leader */
 		sibling->group_flags = event->group_flags;
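The one-liner restores an invariant: when a sibling is promoted to a singleton leader, its own sibling_list should be an empty, self-pointing list. The sketch below models only that invariant in userspace C. list_head, list_move_tail and container_of are simplified re-implementations rather than the kernel headers, and toy_event, toy_group_detach and demo_detach are hypothetical names; it does not try to reproduce how a stale sibling_list could end up looping.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h) { h->next = h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_move_tail(struct list_head *n, struct list_head *h)
{
	n->prev->next = n->next;	/* unlink from the current list */
	n->next->prev = n->prev;
	list_add_tail(n, h);
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical toy event with the two list links the patch touches. */
struct toy_event {
	struct toy_event *group_leader;
	struct list_head sibling_list;	/* members, if this event leads a group */
	struct list_head group_entry;	/* link in leader's list or ctx list */
};

static void toy_event_init(struct toy_event *e)
{
	e->group_leader = e;
	init_list_head(&e->sibling_list);
	init_list_head(&e->group_entry);
}

/* Promote every sibling of @leader to a singleton leader on @ctx_list. */
static void toy_group_detach(struct toy_event *leader, struct list_head *ctx_list)
{
	struct list_head *p, *tmp;

	/* "safe" walk: fetch next before the node moves to another list */
	for (p = leader->sibling_list.next, tmp = p->next;
	     p != &leader->sibling_list;
	     p = tmp, tmp = p->next) {
		struct toy_event *sib = container_of(p, struct toy_event, group_entry);

		list_move_tail(&sib->group_entry, ctx_list);
		sib->group_leader = sib;
		init_list_head(&sib->sibling_list);	/* the added line */
	}
}

/* Returns 1 if every invariant holds after detaching a 2-sibling group. */
static int demo_detach(void)
{
	struct list_head ctx_list;
	struct toy_event leader, s1, s2;
	int ok = 1;

	init_list_head(&ctx_list);
	toy_event_init(&leader);
	toy_event_init(&s1);
	toy_event_init(&s2);

	s1.group_leader = &leader;
	s2.group_leader = &leader;
	list_add_tail(&s1.group_entry, &leader.sibling_list);
	list_add_tail(&s2.group_entry, &leader.sibling_list);

	toy_group_detach(&leader, &ctx_list);

	ok &= list_empty(&leader.sibling_list);
	ok &= s1.group_leader == &s1 && list_empty(&s1.sibling_list);
	ok &= s2.group_leader == &s2 && list_empty(&s2.sibling_list);
	ok &= ctx_list.next == &s1.group_entry &&
	      s1.group_entry.next == &s2.group_entry &&
	      s2.group_entry.next == &ctx_list;
	return ok;
}
```

After the detach, each former sibling leads only itself, and a sibling walk like the one in collect_events() over any of them visits zero entries and terminates immediately.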


* Re: perf: fuzzer causes lockup in x86_pmu_event_init()
  2015-02-28 12:14     ` Jiri Olsa
@ 2015-03-02 19:13       ` Vince Weaver
  2015-03-04 10:32         ` Jiri Olsa
  2015-03-17 13:55         ` Jiri Olsa
  2015-03-17 14:10       ` Peter Zijlstra
  1 sibling, 2 replies; 11+ messages in thread
From: Vince Weaver @ 2015-03-02 19:13 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, Vince Weaver, linux-kernel, Paul Mackerras,
	Ingo Molnar, Arnaldo Carvalho de Melo

On Sat, 28 Feb 2015, Jiri Olsa wrote:

> hum, I don't see that the locking is a problem.. but it looks like once
> the sibling becomes a singleton event, we don't init its sibling_list
>
> how about the patch below.. compile tested ;-)

so this is instead of PeterZ's patch?

I'll start my test-machine fuzzing with this plus the infinite-interrupt 
fix and see what happens.  The machines do eventually trigger the issue
somewhat consistently, but it can take up to a week before they hit it
so it's a tough one to test for.

Vince


* Re: perf: fuzzer causes lockup in x86_pmu_event_init()
  2015-03-02 19:13       ` Vince Weaver
@ 2015-03-04 10:32         ` Jiri Olsa
  2015-03-17 13:55         ` Jiri Olsa
  1 sibling, 0 replies; 11+ messages in thread
From: Jiri Olsa @ 2015-03-04 10:32 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Peter Zijlstra, linux-kernel, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo

On Mon, Mar 02, 2015 at 02:13:25PM -0500, Vince Weaver wrote:
> On Sat, 28 Feb 2015, Jiri Olsa wrote:
> 
> > hum, I don't see that the locking is a problem.. but it looks like once
> > the sibling becomes a singleton event, we don't init its sibling_list
> >
> > how about the patch below.. compile tested ;-)
> 
> so this is instead of PeterZ's patch?

I think it's in addition.. Peter?

> 
> I'll start my test-machine fuzzing with this plus the infinite-interrupt 
> fix and see what happens.  The machines do eventually trigger the issue
> somewhat consistently, but it can take up to a week before they hit it
> so it's a tough one to test for.

thanks,
jirka


* Re: perf: fuzzer causes lockup in x86_pmu_event_init()
  2015-03-02 19:13       ` Vince Weaver
  2015-03-04 10:32         ` Jiri Olsa
@ 2015-03-17 13:55         ` Jiri Olsa
  2015-03-17 14:11           ` Peter Zijlstra
  2015-03-17 15:00           ` Vince Weaver
  1 sibling, 2 replies; 11+ messages in thread
From: Jiri Olsa @ 2015-03-17 13:55 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Peter Zijlstra, linux-kernel, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo

On Mon, Mar 02, 2015 at 02:13:25PM -0500, Vince Weaver wrote:
> On Sat, 28 Feb 2015, Jiri Olsa wrote:
> 
> > hum, I don't see that the locking is a problem.. but it looks like once
> > the sibling becomes a singleton event, we don't init its sibling_list
> >
> > how about the patch below.. compile tested ;-)
> 
> so this is instead of PeterZ's patch?
> 
> I'll start my test-machine fuzzing with this plus the infinite-interrupt 
> fix and see what happens.  The machines do eventually trigger the issue
> somewhat consistently, but it can take up to a week before they hit it
> so it's a tough one to test for.

I guess no news is good news in here..  ;-)

Peter, should I repost a formal patch, or will you take it somehow?

thanks,
jirka


* Re: perf: fuzzer causes lockup in x86_pmu_event_init()
  2015-03-17 13:55         ` Jiri Olsa
@ 2015-03-17 14:11           ` Peter Zijlstra
  2015-03-17 15:00           ` Vince Weaver
  1 sibling, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2015-03-17 14:11 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Vince Weaver, linux-kernel, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo

On Tue, Mar 17, 2015 at 02:55:36PM +0100, Jiri Olsa wrote:
> On Mon, Mar 02, 2015 at 02:13:25PM -0500, Vince Weaver wrote:
> > On Sat, 28 Feb 2015, Jiri Olsa wrote:
> > 
> > > hum, I don't see that the locking is a problem.. but it looks like once
> > > the sibling becomes a singleton event, we don't init its sibling_list
> > >
> > > how about the patch below.. compile tested ;-)
> > 
> > so this is instead of PeterZ's patch?
> > 
> > I'll start my test-machine fuzzing with this plus the infinite-interrupt 
> > fix and see what happens.  The machines do eventually trigger the issue
> > somewhat consistently, but it can take up to a week before they hit it
> > so it's a tough one to test for.
> 
> I guess no news is good news in here..  ;-)
> 
> Peter, should I repost a formal patch, or will you take it somehow?

Yeah, please send it with a proper changelog; I'll stuff it in someplace.


* Re: perf: fuzzer causes lockup in x86_pmu_event_init()
  2015-03-17 13:55         ` Jiri Olsa
  2015-03-17 14:11           ` Peter Zijlstra
@ 2015-03-17 15:00           ` Vince Weaver
  1 sibling, 0 replies; 11+ messages in thread
From: Vince Weaver @ 2015-03-17 15:00 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Vince Weaver, Peter Zijlstra, linux-kernel, Paul Mackerras,
	Ingo Molnar, Arnaldo Carvalho de Melo

On Tue, 17 Mar 2015, Jiri Olsa wrote:

> > I'll start my test-machine fuzzing with this plus the infinite-interrupt 
> > fix and see what happens.  The machines do eventually trigger the issue
> > somewhat consistently, but it can take up to a week before they hit it
> > so it's a tough one to test for.
> 
> I guess no news is good news in here..  ;-)

Yes, with both the irq-storm patch and your patch I've been fuzzing for
over a week with no crashes.  (I'm only using your patch for the lockup
problem, not Peter's.)

There are other minor warnings that pop up as per usual, and if you fuzz 
with paranoid set to 1 you quickly crash with the uncore issue I reported 
a while back, but otherwise things seem really solid with the two patches 
I mentioned.

Of course, that's what I always say just before things start crashing
again.  Right now, though, I've had to stop the fuzzing temporarily:
somehow the fuzzer is leaking children for reasons I don't understand,
and I'm trying to debug that.

Vince


* Re: perf: fuzzer causes lockup in x86_pmu_event_init()
  2015-02-28 12:14     ` Jiri Olsa
  2015-03-02 19:13       ` Vince Weaver
@ 2015-03-17 14:10       ` Peter Zijlstra
  1 sibling, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2015-03-17 14:10 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Vince Weaver, linux-kernel, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo

On Sat, Feb 28, 2015 at 01:14:42PM +0100, Jiri Olsa wrote:
> hum, I don't see that the locking is a problem..

Well, that validate stuff iterates the sibling list, and we're not
holding any of the requisite locks to do so. So I think we also need my
patch.

Agreed that it's unlikely that this is causing Vince's problem,
though.



* [tip:perf/core] perf: Fix racy group access
  2015-02-25 15:16   ` Peter Zijlstra
  2015-02-28 12:14     ` Jiri Olsa
@ 2015-03-27 11:46     ` tip-bot for Peter Zijlstra
  1 sibling, 0 replies; 11+ messages in thread
From: tip-bot for Peter Zijlstra @ 2015-03-27 11:46 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, sasha.levin, linux-kernel, peterz, hpa, jolsa, mingo,
	vincent.weaver

Commit-ID:  ccd41c86ad4d464d0ed4e48d80759ff85c2115b0
Gitweb:     http://git.kernel.org/tip/ccd41c86ad4d464d0ed4e48d80759ff85c2115b0
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Wed, 25 Feb 2015 15:56:04 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 27 Mar 2015 09:49:45 +0100

perf: Fix racy group access

While looking at some fuzzer output I noticed that we do not hold any
locks on leader->ctx and therefore the sibling_list iteration is
unsafe.

Acquire the relevant ctx->mutex before calling into the pmu specific
code.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Link: http://lkml.kernel.org/r/20150225151639.GL5029@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/events/core.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index b01dfb6..bb1a7c3 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7036,12 +7036,23 @@ EXPORT_SYMBOL_GPL(perf_pmu_unregister);
 
 static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
 {
+	struct perf_event_context *ctx = NULL;
 	int ret;
 
 	if (!try_module_get(pmu->module))
 		return -ENODEV;
+
+	if (event->group_leader != event) {
+		ctx = perf_event_ctx_lock(event->group_leader);
+		BUG_ON(!ctx);
+	}
+
 	event->pmu = pmu;
 	ret = pmu->event_init(event);
+
+	if (ctx)
+		perf_event_ctx_unlock(event->group_leader, ctx);
+
 	if (ret)
 		module_put(pmu->module);
 
