* perf: perf_fuzzer triggers GPF in perf_prepare_sample
From: Vince Weaver @ 2018-12-04 15:54 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Alexander Shishkin, Jiri Olsa, Namhyung Kim
Hello,
I was able to trigger another oops with the perf_fuzzer with current git.
This is 4.20-rc5 after the fix for the very similar oops I previously
reported got committed.
It seems to be pointing to the same location in the source as
before, I guess maybe triggered a different way?
Unfortunately this crash is not easily reproducible like the last one was.
kernel/events/core.c:6393

        if (sample_type & PERF_SAMPLE_CALLCHAIN) {
                int size = 1;

                if (!(sample_type & __PERF_SAMPLE_CALLCHAIN_EARLY))
                        data->callchain = perf_callchain(event, regs);

>>>>>>>>>       size += data->callchain->nr;

                header->size += size * sizeof(u64);
        }

Vince
[45050.698745] general protection fault: 0000 [#1] SMP PTI
[45050.698745] CPU: 5 PID: 13475 Comm: perf_fuzzer Tainted: G W 4.20.0-rc5 #124
[45050.698746] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[45050.698746] RIP: 0010:perf_prepare_sample+0x82/0x4a0
[45050.698746] Code: 06 4c 89 ea 4c 89 e6 e8 3c 54 ff ff 40 f6 c5 01 0f 85 28 01 00 00 40 f6 c5 20 74 1c 48 85 ed 0f 89 04 01 00 00 49 8b 44 24 70 <48> 8b 00 8d 04 c5 08 00 00 00 66 01 43 06 f7 c5 00 04 00 00 74 41
[45050.698747] RSP: 0000:ffffc900206bfb00 EFLAGS: 00010082
[45050.698747] RAX: dead000000000200 RBX: ffffc900206bfb58 RCX: 000000000000001f
[45050.698747] RDX: 0000000000000000 RSI: 0000000025bbf56f RDI: 0000000000000000
[45050.698748] RBP: 8000000000000275 R08: 0000000000000002 R09: 00000000000215c0
[45050.698748] R10: 00008b25b2e2f5c8 R11: 0000000000000000 R12: ffffc900206bfc40
[45050.698748] R13: ffff8880cf6d7800 R14: ffffc900206bfb98 R15: ffff88811ab4f420
[45050.698748] FS: 00007fab66133500(0000) GS:ffff88811ab40000(0000) knlGS:0000000000000000
[45050.698749] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[45050.698749] CR2: 00007fab66133480 CR3: 00000000811aa004 CR4: 00000000001607e0
[45050.698749] DR0: 0000000000000000 DR1: 000000008e8e8000 DR2: 0000000000000000
[45050.698749] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[45050.698750] Call Trace:
[45050.698750] intel_pmu_drain_bts_buffer+0x151/0x220
[45050.698750] ? mem_cgroup_commit_charge+0x7a/0x510
[45050.698750] ? wp_page_copy+0x39e/0x650
[45050.698750] ? reuse_swap_page+0x129/0x340
[45050.698751] ? _raw_spin_unlock+0xa/0x10
[45050.698751] ? do_wp_page+0x30f/0x4d0
[45050.698751] ? finish_mkwrite_fault+0x140/0x140
[45050.698751] ? __handle_mm_fault+0xb22/0x12c0
[45050.698751] intel_pmu_handle_irq+0x6d/0x160
[45050.698752] perf_event_nmi_handler+0x2d/0x50
[45050.698752] nmi_handle+0x63/0x110
[45050.698752] default_do_nmi+0x4e/0x100
[45050.698752] do_nmi+0x112/0x170
[45050.698752] nmi+0x8b/0xd4
[45050.698753] RIP: 0033:0x558a6a6366c3
[45050.698753] Code: 01 d0 48 c1 e0 06 48 89 c2 48 8d 05 cf 93 23 00 48 8b 04 02 48 85 c0 74 11 8b 45 f8 3b 45 f4 75 05 8b 45 fc eb 16 83 45 f8 01 <83> 45 fc 01 81 7d fc 9f 86 01 00 7e 96 b8 ff ff ff ff c9 c3 55 48
[45050.698753] RSP: 002b:00007ffc9f521660 EFLAGS: 00000246
[45050.698754] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000030
[45050.698754] RDX: 000000000000e740 RSI: 00007ffc9f521634 RDI: 00007fab6612c740
[45050.698754] RBP: 00007ffc9f521670 R08: 00007fab6612c1f0 R09: 00007fab6612c240
[45050.698754] R10: 00007fab661337d0 R11: 0000000000000246 R12: 0000558a6a6364c0
[45050.698755] R13: 00007ffc9f523ad0 R14: 0000000000000000 R15: 0000000000000000
[45050.698755] Modules linked in: intel_rapl x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel coretemp tpm_tis snd_hda_codec snd_hda_core kvm_intel tpm_tis_core i915 snd_hwdep kvm tpm snd_pcm rng_core wmi_bmof mei_me sg iosf_mbi irqbypass drm_kms_helper evdev crct10dif_pclmul drm mei iTCO_wdt i2c_algo_bit iTCO_vendor_support snd_timer pcc_cpufreq crc32_pclmul ghash_clmulni_intel aesni_intel snd video aes_x86_64 crypto_simd cryptd glue_helper soundcore pcspkr wmi button binfmt_misc ip_tables x_tables autofs4 sr_mod sd_mod cdrom ahci libahci ehci_pci xhci_pci libata xhci_hcd ehci_hcd lpc_ich mfd_core crc32c_intel scsi_mod e1000e i2c_i801 usbcore usb_common fan thermal
[45051.027024] ---[ end trace 9565944010fbdf23 ]---
[45051.027024] RIP: 0010:perf_prepare_sample+0x82/0x4a0
[45051.027025] Code: 06 4c 89 ea 4c 89 e6 e8 3c 54 ff ff 40 f6 c5 01 0f 85 28 01 00 00 40 f6 c5 20 74 1c 48 85 ed 0f 89 04 01 00 00 49 8b 44 24 70 <48> 8b 00 8d 04 c5 08 00 00 00 66 01 43 06 f7 c5 00 04 00 00 74 41
[45051.027025] RSP: 0000:ffffc900206bfb00 EFLAGS: 00010082
[45051.027025] RAX: dead000000000200 RBX: ffffc900206bfb58 RCX: 000000000000001f
[45051.027025] RDX: 0000000000000000 RSI: 0000000025bbf56f RDI: 0000000000000000
[45051.027026] RBP: 8000000000000275 R08: 0000000000000002 R09: 00000000000215c0
[45051.027026] R10: 00008b25b2e2f5c8 R11: 0000000000000000 R12: ffffc900206bfc40
[45051.027026] R13: ffff8880cf6d7800 R14: ffffc900206bfb98 R15: ffff88811ab4f420
[45051.027027] FS: 00007fab66133500(0000) GS:ffff88811ab40000(0000) knlGS:0000000000000000
[45051.027027] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[45051.027027] CR2: 00007fab66133480 CR3: 00000000811aa004 CR4: 00000000001607e0
[45051.027027] DR0: 0000000000000000 DR1: 000000008e8e8000 DR2: 0000000000000000
[45051.027027] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[45051.027028] Kernel panic - not syncing: Fatal exception in interrupt
[45051.027051] Kernel Offset: disabled
[45051.149441] ---[ end Kernel panic - not syncing: Fatal exception in interrupt]---
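
A note on why this first report is a general protection fault rather than a page fault: the faulting instruction (<48> 8b 00, a load through RAX) runs with RAX = dead000000000200. On x86-64 with 48-bit virtual addresses, an address whose bits 63..47 are not all equal is non-canonical, and dereferencing it raises #GP instead of a page fault. The value also has the shape of the kernel's list poison constants (0x200 plus the 0xdead000000000000 poison offset on x86-64), which would be consistent with stale or reused memory, although nothing in the report confirms that. A minimal sketch of the canonical check, assuming 4-level paging; the helper name is made up for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption: 4-level paging, i.e. 48-bit virtual addresses. This matches
 * the ffff8880... / ffffc900... addresses in the dump, but it is an
 * inference, not something the report states.
 *
 * An address is canonical when bits 63..47 are all zeros or all ones.
 * is_canonical_48() is a hypothetical helper, not a kernel function.
 */
static inline bool is_canonical_48(uint64_t addr)
{
    uint64_t upper = addr >> 47;    /* the 17 bits 63..47 */

    return upper == 0 || upper == 0x1ffff;
}
```

With this check, the RAX value from the dump is non-canonical, while the R12 value (the pointer the load of RAX appears to go through) is an ordinary kernel address.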

* Re: perf: perf_fuzzer triggers GPF in perf_prepare_sample
From: Jiri Olsa @ 2018-12-05 12:45 UTC (permalink / raw)
To: Vince Weaver
Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Alexander Shishkin, Namhyung Kim

On Tue, Dec 04, 2018 at 10:54:55AM -0500, Vince Weaver wrote:
> Hello,
>
> I was able to trigger another oops with the perf_fuzzer with current git.
>
> This is 4.20-rc5 after the fix for the very similar oops I previously
> reported got committed.
>
> It seems to be pointing to the same location in the source as
> before, I guess maybe triggered a different way?

nice.. yep, looks the same

>
> Unfortunately this crash is not easily reproducible like the last one was.

will check

jirka

* Re: perf: perf_fuzzer triggers GPF in perf_prepare_sample
From: Jiri Olsa @ 2018-12-05 16:38 UTC (permalink / raw)
To: Vince Weaver
Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Alexander Shishkin, Namhyung Kim

On Wed, Dec 05, 2018 at 01:45:38PM +0100, Jiri Olsa wrote:
> On Tue, Dec 04, 2018 at 10:54:55AM -0500, Vince Weaver wrote:
> > Hello,
> >
> > I was able to trigger another oops with the perf_fuzzer with current git.
> >
> > This is 4.20-rc5 after the fix for the very similar oops I previously
> > reported got committed.
> >
> > It seems to be pointing to the same location in the source as
> > before, I guess maybe triggered a different way?
>
> nice.. yep, looks the same
>
> >
> > Unfortunately this crash is not easily reproducible like the last one was.
>
> will check

what model are hitting this on?

jirka

* Re: perf: perf_fuzzer triggers GPF in perf_prepare_sample
From: Vince Weaver @ 2018-12-05 17:11 UTC (permalink / raw)
To: Jiri Olsa
Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Alexander Shishkin, Namhyung Kim

On Wed, 5 Dec 2018, Jiri Olsa wrote:

> On Wed, Dec 05, 2018 at 01:45:38PM +0100, Jiri Olsa wrote:
> > On Tue, Dec 04, 2018 at 10:54:55AM -0500, Vince Weaver wrote:
> > > Hello,
> > >
> > > I was able to trigger another oops with the perf_fuzzer with current git.
> > >
> > > This is 4.20-rc5 after the fix for the very similar oops I previously
> > > reported got committed.
> > >
> > > It seems to be pointing to the same location in the source as
> > > before, I guess maybe triggered a different way?
> >
> > nice.. yep, looks the same
> >
> > >
> > > Unfortunately this crash is not easily reproducible like the last one was.
> >
> > will check
>
> what model are hitting this on?

Haswell. 6/60/3.

While I can't deterministically trigger this, the fuzzer usually hits it
within an hour or two. Is there any debug or printk messages I can
add that would help figure out what's going on?

Vince

* Re: perf: perf_fuzzer triggers GPF in perf_prepare_sample
From: Jiri Olsa @ 2018-12-05 18:33 UTC (permalink / raw)
To: Vince Weaver
Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Alexander Shishkin, Namhyung Kim, Andi Kleen

On Wed, Dec 05, 2018 at 12:11:19PM -0500, Vince Weaver wrote:
> On Wed, 5 Dec 2018, Jiri Olsa wrote:
>
> > On Wed, Dec 05, 2018 at 01:45:38PM +0100, Jiri Olsa wrote:
> > > On Tue, Dec 04, 2018 at 10:54:55AM -0500, Vince Weaver wrote:
> > > > Hello,
> > > >
> > > > I was able to trigger another oops with the perf_fuzzer with current git.
> > > >
> > > > This is 4.20-rc5 after the fix for the very similar oops I previously
> > > > reported got committed.
> > > >
> > > > It seems to be pointing to the same location in the source as
> > > > before, I guess maybe triggered a different way?
> > >
> > > nice.. yep, looks the same
> > >
> > > >
> > > > Unfortunately this crash is not easily reproducible like the last one was.
> > >
> > > will check
> >
> > what model are hitting this on?
>
> Haswell. 6/60/3.
>
> While I can't deterministically trigger this, the fuzzer usually hits it
> within an hour or two. Is there any debug or printk messages I can
> add that would help figure out what's going on?

I can't see how we could end up with that config other than
some corruption.. the only way I see could be that we touch
cpu->events array without checking its active_mask bit

but that does not explain why the crash happened in the same
place as before

jirka


---
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index ecc3e34ca955..9a2fd5a68d87 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2404,7 +2404,7 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
 	struct cpu_hw_events *cpuc;
 	int loops;
 	u64 status;
-	int handled;
+	int handled = 0;
 	int pmu_enabled;
 
 	cpuc = this_cpu_ptr(&cpu_hw_events);
@@ -2423,8 +2423,10 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
 	intel_bts_disable_local();
 	cpuc->enabled = 0;
 	__intel_pmu_disable_all();
-	handled = intel_pmu_drain_bts_buffer();
-	handled += intel_bts_interrupt();
+	if (test_bit(INTEL_PMC_IDX_FIXED_BTS, cpuc->active_mask)) {
+		handled += intel_pmu_drain_bts_buffer();
+		handled += intel_bts_interrupt();
+	}
 	status = intel_pmu_get_status();
 	if (!status)
 		goto done;
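
The idea behind the patch above, draining the BTS buffer only when the BTS slot is marked active, can be modelled in plain userspace C. This is only a sketch under stated assumptions: the struct, the slot index and the helper names are hypothetical stand-ins, not the kernel's cpu_hw_events, INTEL_PMC_IDX_FIXED_BTS or test_bit().

```c
#include <limits.h>
#include <stdbool.h>

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical slot number for the BTS event; the real index lives in kernel headers. */
#define MODEL_IDX_BTS 48

/* Hypothetical, much reduced stand-in for the per-CPU PMU state. */
struct model_cpuc {
    unsigned long active_mask[2];   /* at least 64 bits on common ABIs */
    int drain_calls;                /* how often the drain step ran */
};

/* Userspace stand-in for a bitmap test: is bit nr set in map? */
static inline bool model_test_bit(unsigned int nr, const unsigned long *map)
{
    return (map[nr / BITS_PER_ULONG] >> (nr % BITS_PER_ULONG)) & 1UL;
}

static inline void model_set_bit(unsigned int nr, unsigned long *map)
{
    map[nr / BITS_PER_ULONG] |= 1UL << (nr % BITS_PER_ULONG);
}

/* Mirrors the shape of the patched handler: start at 0, drain only if active. */
static inline int model_handle_irq(struct model_cpuc *cpuc)
{
    int handled = 0;

    if (model_test_bit(MODEL_IDX_BTS, cpuc->active_mask)) {
        cpuc->drain_calls++;        /* stands in for the drain call */
        handled += 1;
    }
    return handled;
}
```

Note that a guard of this kind only covers the interrupt handler. The later trace in this thread reaches intel_pmu_drain_bts_buffer through x86_pmu_stop, a path the patch does not touch.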

* Re: perf: perf_fuzzer triggers GPF in perf_prepare_sample
From: Vince Weaver @ 2018-12-06 15:35 UTC (permalink / raw)
To: Jiri Olsa
Cc: Vince Weaver, linux-kernel, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Alexander Shishkin, Namhyung Kim, Andi Kleen

On Wed, 5 Dec 2018, Jiri Olsa wrote:

> On Wed, Dec 05, 2018 at 12:11:19PM -0500, Vince Weaver wrote:
> > On Wed, 5 Dec 2018, Jiri Olsa wrote:
> >
> > > On Wed, Dec 05, 2018 at 01:45:38PM +0100, Jiri Olsa wrote:
> > > > On Tue, Dec 04, 2018 at 10:54:55AM -0500, Vince Weaver wrote:
> > > > > Hello,
> > > > >
> > > > > I was able to trigger another oops with the perf_fuzzer with current git.
> > > > >
> > > > > This is 4.20-rc5 after the fix for the very similar oops I previously
> > > > > reported got committed.
> > > > >
> > > > > It seems to be pointing to the same location in the source as
> > > > > before, I guess maybe triggered a different way?
> > > >
> > > > nice.. yep, looks the same
> > > >
> > > > >
> > > > > Unfortunately this crash is not easily reproducible like the last one was.
> > > >
> > > > will check
> > >
> > > what model are hitting this on?
> >
> > Haswell. 6/60/3.
> >
> > While I can't deterministically trigger this, the fuzzer usually hits it
> > within an hour or two. Is there any debug or printk messages I can
> > add that would help figure out what's going on?
>
> I can't see how we could end up with that config other than
> some corruption.. the only way I see could be that we touch
> cpu->events array without checking its active_mask bit
>
> but that does not explain why the crash happened in the same
> place as before

Maybe it is a corruption issue. I had applied my own debug patch that
would dump some info if data->callchain was NULL.

But my debug code didn't trigger this time because it looks like
data->callchain was "1" rather than "0".

[27764.840179] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
[27764.840179] PGD 0 P4D 0
[27764.840180] Oops: 0000 [#1] SMP PTI
[27764.840180] CPU: 1 PID: 18687 Comm: perf_fuzzer Tainted: G W 4.20.0-rc5+ #125
[27764.840180] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014

Vince
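
The message above describes a debug check that fired only for an exact NULL and so missed a pointer value of 1. A broader plausibility test would also catch small non-NULL values and poison-style values such as the dead000000000200 seen in the first report. A minimal userspace sketch, assuming 4 KiB pages and 48-bit virtual addresses; the function and macro names are hypothetical, and this is not the debug patch that was actually used:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, so anything below this is "NULL plus a small offset". */
#define MODEL_PAGE_SIZE 4096ULL

/*
 * Hypothetical helper: does p look like a pointer into the kernel half of
 * the address space? Assumes 48-bit virtual addresses, where kernel
 * addresses have bits 63..47 all set.
 */
static inline bool looks_like_kernel_pointer(uint64_t p)
{
    if (p < MODEL_PAGE_SIZE)        /* catches 0x0 and 0x1 alike */
        return false;

    return (p >> 47) == 0x1ffff;    /* rejects non-canonical and user addresses */
}
```

A debug hook keyed on !looks_like_kernel_pointer(value) instead of value == 0 would have reported all three bad values observed in this thread (0x0, 0x1 and dead000000000200).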

* Re: perf: perf_fuzzer triggers GPF in perf_prepare_sample
From: Jiri Olsa @ 2018-12-06 15:44 UTC (permalink / raw)
To: Vince Weaver
Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Alexander Shishkin, Namhyung Kim, Andi Kleen

On Thu, Dec 06, 2018 at 10:35:28AM -0500, Vince Weaver wrote:
> On Wed, 5 Dec 2018, Jiri Olsa wrote:
>
> > On Wed, Dec 05, 2018 at 12:11:19PM -0500, Vince Weaver wrote:
> > > On Wed, 5 Dec 2018, Jiri Olsa wrote:
> > >
> > > > On Wed, Dec 05, 2018 at 01:45:38PM +0100, Jiri Olsa wrote:
> > > > > On Tue, Dec 04, 2018 at 10:54:55AM -0500, Vince Weaver wrote:
> > > > > > Hello,
> > > > > >
> > > > > > I was able to trigger another oops with the perf_fuzzer with current git.
> > > > > >
> > > > > > This is 4.20-rc5 after the fix for the very similar oops I previously
> > > > > > reported got committed.
> > > > > >
> > > > > > It seems to be pointing to the same location in the source as
> > > > > > before, I guess maybe triggered a different way?
> > > > >
> > > > > nice.. yep, looks the same
> > > > >
> > > > > >
> > > > > > Unfortunately this crash is not easily reproducible like the last one was.
> > > > >
> > > > > will check
> > > >
> > > > what model are hitting this on?
> > >
> > > Haswell. 6/60/3.
> > >
> > > While I can't deterministically trigger this, the fuzzer usually hits it
> > > within an hour or two. Is there any debug or printk messages I can
> > > add that would help figure out what's going on?
> >
> > I can't see how we could end up with that config other than
> > some corruption.. the only way I see could be that we touch
> > cpu->events array without checking its active_mask bit
> >
> > but that does not explain why the crash happened in the same
> > place as before
>
> Maybe it is a corruption issue. I had applied my own debug patch that
> would dump some info if data->callchain was NULL.
>
> But my debug code didn't trigger this time because it looks like
> data->callchain was "1" rather than "0".
>
> [27764.840179] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
> [27764.840179] PGD 0 P4D 0
> [27764.840180] Oops: 0000 [#1] SMP PTI
> [27764.840180] CPU: 1 PID: 18687 Comm: perf_fuzzer Tainted: G W 4.20.0-rc5+ #125
> [27764.840180] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014

actually, you could try that patch from my previous email?

thanks,
jirka

* Re: perf: perf_fuzzer triggers GPF in perf_prepare_sample
From: Vince Weaver @ 2018-12-09 2:08 UTC (permalink / raw)
To: Jiri Olsa
Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Alexander Shishkin, Namhyung Kim, Andi Kleen

On Thu, 6 Dec 2018, Jiri Olsa wrote:

> On Thu, Dec 06, 2018 at 10:35:28AM -0500, Vince Weaver wrote:
> > On Wed, 5 Dec 2018, Jiri Olsa wrote:
> > Maybe it is a corruption issue. I had applied my own debug patch that
> > would dump some info if data->callchain was NULL.
> >
> > But my debug code didn't trigger this time because it looks like
> > data->callchain was "1" rather than "0".
> >
> > [27764.840179] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
> > [27764.840179] PGD 0 P4D 0
> > [27764.840180] Oops: 0000 [#1] SMP PTI
> > [27764.840180] CPU: 1 PID: 18687 Comm: perf_fuzzer Tainted: G W 4.20.0-rc5+ #125
> > [27764.840180] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
>
> actually, you could try that patch from my previous email?
>

still crashes with your patch (see below)

I've also been able to replicate this crash on a skylake machine in
addition to the haswell machine.

Vince

[28269.147232] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[28269.155628] PGD 0 P4D 0
[28269.158360] Oops: 0000 [#1] SMP PTI
[28269.162087] CPU: 0 PID: 1189 Comm: perf_fuzzer Tainted: G W 4.20.0-rc5+ #128
[28269.171011] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[28269.178935] RIP: 0010:perf_prepare_sample+0x82/0x4a0
[28269.184239] Code: 06 4c 89 ea 4c 89 e6 e8 3c 54 ff ff 40 f6 c5 01 0f 85 28 01 00 00 40 f6 c5 20 74 1c 48 85 ed 0f 89 04 01 00 00 49 8b 44 24 70 <48> 8b 00 8d 04 c5 08 00 00 00 66 01 43 06 f7 c5 00 04 00 00 74 41
[28269.204249] RSP: 0000:ffffc9000aca7a40 EFLAGS: 00010082
[28269.209832] RAX: 0000000000000000 RBX: ffffc9000aca7a98 RCX: ffffc9000aca7ad8
[28269.217484] RDX: 0000000000000000 RSI: ffffc9000aca7b80 RDI: ffffc9000aca7a9e
[28269.225129] RBP: 80000000000bb068 R08: 0000000000000002 R09: 00000000000215c0
[28269.232760] R10: ffff8880ce552000 R11: 0000000000000000 R12: ffffc9000aca7b80
[28269.240380] R13: ffff88803696c800 R14: ffffc9000aca7ad8 R15: ffffe8ffffc06300
[28269.248014] FS: 00007f5927fe7500(0000) GS:ffff88811aa00000(0000) knlGS:0000000000000000
[28269.256606] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[28269.262739] CR2: 0000000000000000 CR3: 0000000116d98001 CR4: 00000000001607f0
[28269.270349] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[28269.277968] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[28269.285639] Call Trace:
[28269.288266] intel_pmu_drain_bts_buffer+0x151/0x220
[28269.293476] ? radix_tree_delete_item+0x69/0xc0
[28269.298378] x86_pmu_stop+0x3b/0x90
[28269.302113] x86_pmu_del+0x57/0x160
[28269.305840] event_sched_out.isra.106+0x81/0x170
[28269.310780] group_sched_out.part.108+0x51/0xc0
[28269.315634] ctx_sched_out+0xf8/0x220
[28269.319551] __perf_event_task_sched_out+0x18d/0x3f0
[28269.324866] ? pick_next_task_fair+0x60a/0x660
[28269.329639] __schedule+0x4b9/0x820
[28269.333367] ? kill_pid_info+0x34/0x50
[28269.337360] schedule+0x28/0x80
[28269.340725] exit_to_usermode_loop+0x4e/0xc0
[28269.345272] prepare_exit_to_usermode+0x53/0x80
[28269.350109] retint_user+0x8/0x8
[28269.353541] RIP: 0033:0x56154980b6c3
[28269.357346] Code: 01 d0 48 c1 e0 06 48 89 c2 48 8d 05 cf 93 23 00 48 8b 04 02 48 85 c0 74 11 8b 45 f8 3b 45 f4 75 05 8b 45 fc eb 16 83 45 f8 01 <83> 45 fc 01 81 7d fc 9f 86 01 00 7e 96 b8 ff ff ff ff c9 c3 55 48
[28269.377462] RSP: 002b:00007ffc6a1540a0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[28269.385562] RAX: 0000000000000000 RBX: 000000000000000c RCX: 000000000000003c
[28269.393182] RDX: 0000000000b895c0 RSI: 00007ffc6a154074 RDI: 00007f5927fe0740
[28269.400835] RBP: 00007ffc6a1540b0 R08: 00007f5927fe01f0 R09: 00007f5927fe0240
[28269.408452] R10: 0000000000000000 R11: 0000000000000246 R12: 000056154980b4c0
[28269.416080] R13: 00007ffc6a156510 R14: 0000000000000000 R15: 0000000000000000
[28269.423723] Modules linked in: snd_hda_codec_hdmi intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm i915 irqbypass crct10dif_pclmul crc32_pclmul iosf_mbi ghash_clmulni_intel drm_kms_helper aesni_intel snd_hda_codec_realtek aes_x86_64 crypto_simd drm cryptd snd_hda_codec_generic i2c_algo_bit snd_hda_intel evdev glue_helper snd_hda_codec snd_hda_core iTCO_wdt mei_me mei wmi_bmof tpm_tis snd_hwdep tpm_tis_core pcc_cpufreq pcspkr iTCO_vendor_support snd_pcm tpm sg rng_core button snd_timer video snd soundcore wmi binfmt_misc ip_tables x_tables autofs4 sr_mod sd_mod cdrom ahci xhci_pci ehci_pci libahci xhci_hcd ehci_hcd libata usbcore lpc_ich mfd_core e1000e scsi_mod i2c_i801 crc32c_intel usb_common fan thermal
[28269.492702] CR2: 0000000000000000
[28269.496246] ---[ end trace 6775846bfda0f18b ]---
[28269.501186] RIP: 0010:perf_prepare_sample+0x82/0x4a0
[28269.506482] Code: 06 4c 89 ea 4c 89 e6 e8 3c 54 ff ff 40 f6 c5 01 0f 85 28 01 00 00 40 f6 c5 20 74 1c 48 85 ed 0f 89 04 01 00 00 49 8b 44 24 70 <48> 8b 00 8d 04 c5 08 00 00 00 66 01 43 06 f7 c5 00 04 00 00 74 41
[28269.526587] RSP: 0000:ffffc9000aca7a40 EFLAGS: 00010082
[28269.532176] RAX: 0000000000000000 RBX: ffffc9000aca7a98 RCX: ffffc9000aca7ad8
[28269.539805] RDX: 0000000000000000 RSI: ffffc9000aca7b80 RDI: ffffc9000aca7a9e
[28269.547450] RBP: 80000000000bb068 R08: 0000000000000002 R09: 00000000000215c0
[28269.555075] R10: ffff8880ce552000 R11: 0000000000000000 R12: ffffc9000aca7b80
[28269.562694] R13: ffff88803696c800 R14: ffffc9000aca7ad8 R15: ffffe8ffffc06300
[28269.570329] FS: 00007f5927fe7500(0000) GS:ffff88811aa00000(0000) knlGS:0000000000000000
[28269.578960] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[28269.585123] CR2: 0000000000000000 CR3: 0000000116d98001 CR4: 00000000001607f0
[28269.592740] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[28269.600358] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
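
One detail the two full register dumps share: RBP is 8000000000000275 in the first report and 80000000000bb068 in this one. The code bytes just before the faulting load (40 f6 c5 20, then 48 85 ed 0f 89) look like a test of bit 5 of RBP followed by a sign test of RBP, which suggests, as an inference from the disassembly rather than anything stated in the thread, that RBP holds sample_type. Under that assumption, both crashes took the branch of the quoted snippet that skips perf_callchain() and dereferences whatever data->callchain already contains. A small sketch of that decoding; the bit positions match the kernel's perf headers, the second macro is renamed to avoid a reserved identifier, and the function names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as in the kernel's perf_event_sample_format enum. */
#define PERF_SAMPLE_CALLCHAIN        (1ULL << 5)
/* Stands in for __PERF_SAMPLE_CALLCHAIN_EARLY (bit 63, kernel-internal). */
#define SAMPLE_CALLCHAIN_EARLY_FLAG  (1ULL << 63)

/* True when the quoted snippet would call perf_callchain() itself. */
static inline bool callchain_filled_in_prepare(uint64_t sample_type)
{
    return (sample_type & PERF_SAMPLE_CALLCHAIN) &&
           !(sample_type & SAMPLE_CALLCHAIN_EARLY_FLAG);
}

/* True when the snippet relies on data->callchain having been set earlier. */
static inline bool callchain_expected_preset(uint64_t sample_type)
{
    return (sample_type & PERF_SAMPLE_CALLCHAIN) &&
           (sample_type & SAMPLE_CALLCHAIN_EARLY_FLAG);
}
```

If the RBP assumption holds, callchain_expected_preset() is true for both dumped values, which would point at a caller that builds the sample data without ever setting data->callchain.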

* Re: perf: perf_fuzzer triggers GPF in perf_prepare_sample
From: Jiri Olsa @ 2018-12-09 11:55 UTC (permalink / raw)
To: Vince Weaver
Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Alexander Shishkin, Namhyung Kim, Andi Kleen

On Sat, Dec 08, 2018 at 09:08:28PM -0500, Vince Weaver wrote:
> On Thu, 6 Dec 2018, Jiri Olsa wrote:
>
> > On Thu, Dec 06, 2018 at 10:35:28AM -0500, Vince Weaver wrote:
> > > On Wed, 5 Dec 2018, Jiri Olsa wrote:
> > > Maybe it is a corruption issue. I had applied my own debug patch that
> > > would dump some info if data->callchain was NULL.
> > >
> > > But my debug code didn't trigger this time because it looks like
> > > data->callchain was "1" rather than "0".
> > >
> > > [27764.840179] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
> > > [27764.840179] PGD 0 P4D 0
> > > [27764.840180] Oops: 0000 [#1] SMP PTI
> > > [27764.840180] CPU: 1 PID: 18687 Comm: perf_fuzzer Tainted: G W 4.20.0-rc5+ #125
> > > [27764.840180] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
> >
> > actually, you could try that patch from my previous email?
> >
> still crashes with your patch (see below)
>
> I've also been able to replicate this crash on a skylake machine in
> addition to the haswell machine.
>
> Vince
>
> [28269.147232] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
> [28269.155628] PGD 0 P4D 0
> [28269.158360] Oops: 0000 [#1] SMP PTI
> [28269.162087] CPU: 0 PID: 1189 Comm: perf_fuzzer Tainted: G W 4.20.0-rc5+ #128
> [28269.171011] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
> [28269.178935] RIP: 0010:perf_prepare_sample+0x82/0x4a0
> [28269.184239] Code: 06 4c 89 ea 4c 89 e6 e8 3c 54 ff ff 40 f6 c5 01 0f 85 28 01 00 00 40 f6 c5 20 74 1c 48 85 ed 0f 89 04 01 00 00 49 8b 44 24 70 <48> 8b 00 8d 04 c5 08 00 00 00 66 01 43 06 f7 c5 00 04 00 00 74 41
> [28269.204249] RSP: 0000:ffffc9000aca7a40 EFLAGS: 00010082
> [28269.209832] RAX: 0000000000000000 RBX: ffffc9000aca7a98 RCX: ffffc9000aca7ad8
> [28269.217484] RDX: 0000000000000000 RSI: ffffc9000aca7b80 RDI: ffffc9000aca7a9e
> [28269.225129] RBP: 80000000000bb068 R08: 0000000000000002 R09: 00000000000215c0
> [28269.232760] R10: ffff8880ce552000 R11: 0000000000000000 R12: ffffc9000aca7b80
> [28269.240380] R13: ffff88803696c800 R14: ffffc9000aca7ad8 R15: ffffe8ffffc06300
> [28269.248014] FS: 00007f5927fe7500(0000) GS:ffff88811aa00000(0000) knlGS:0000000000000000
> [28269.256606] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [28269.262739] CR2: 0000000000000000 CR3: 0000000116d98001 CR4: 00000000001607f0
> [28269.270349] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [28269.277968] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
> [28269.285639] Call Trace:
> [28269.288266] intel_pmu_drain_bts_buffer+0x151/0x220
> [28269.293476] ? radix_tree_delete_item+0x69/0xc0
> [28269.298378] x86_pmu_stop+0x3b/0x90
> [28269.302113] x86_pmu_del+0x57/0x160

nice, at least it's in different callstack context, that might help

thanks,
jirka