All of lore.kernel.org
 help / color / mirror / Atom feed
* crash in perf_event_read
@ 2017-01-31  7:04 Alexei Starovoitov
  2017-01-31 10:27 ` Peter Zijlstra
  0 siblings, 1 reply; 5+ messages in thread
From: Alexei Starovoitov @ 2017-01-31  7:04 UTC (permalink / raw)
  To: Peter Zijlstra, linux-kernel

Hi Peter,

rarely I'm seeing the following crash:
[40196.164255] BUG: unable to handle kernel paging request at 000000000000a11a
[40196.179636] IP: perf_event_read+0xd3/0x1a0
[40196.188669] PGD 82e93a067
[40196.188670] PUD 7e1ddf067
[40196.194629] PMD 0
[40196.200589]
[40196.208284] Oops: 0000 [#1] SMP
[40196.208285] Modules linked in:
[40196.208299] CPU: 24 PID: 4423 Comm: dynoKernelMon Not tainted 4.10.0-rc5-01189-gc6e0ad0ee5b0 #599
[40196.208300] Hardware name: Quanta Mono Lake-M.2 SATA 20F20BU0270/Mono Lake-M.2 SATA, BIOS F20_3A12 10/24/2016
[40196.208301] task: ffff8807e3b65580 task.stack: ffffc90009748000
[40196.208302] RIP: 0010:perf_event_read+0xd3/0x1a0
[40196.208303] RSP: 0018:ffffc9000974bd48 EFLAGS: 00010202
[40196.208304] RAX: 000000000000a040 RBX: ffff8807b79fd000 RCX: 0000000000000000
[40196.208304] RDX: 0000000000000018 RSI: 0000000000000000 RDI: 00000000ffffffff
[40196.208305] RBP: ffffc9000974bd80 R08: 0000000000000000 R09: 0000000000000000
[40196.208305] R10: ffff8807cf8c7038 R11: 0000000000000000 R12: ffffc9000974bde8
[40196.208306] R13: 0000000000000000 R14: ffff8807b79fd000 R15: 00000000000004e0
[40196.208307] FS:  00007ff1b45ff700(0000) GS:ffff88085f400000(0000) knlGS:0000000000000000
[40196.208307] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[40196.208308] CR2: 000000000000a11a CR3: 0000000850298000 CR4: 00000000003406e0
[40196.208308] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[40196.208309] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[40196.208309] Call Trace:
[40196.208313]  ? __might_sleep+0x4a/0x80
[40196.208316]  perf_event_read_value+0x45/0x130
[40196.208318]  perf_read+0x84/0x2d0
[40196.208322]  __vfs_read+0x28/0x110
[40196.208325]  ? security_file_permission+0x9b/0xc0
[40196.208327]  vfs_read+0xa5/0x170
[40196.208327]  SyS_read+0x46/0xa0
[40196.208329]  do_syscall_64+0x4d/0xb0
[40196.208332]  entry_SYSCALL64_slow_path+0x25/0x25
[40196.208333] RIP: 0033:0x7ff1b6f5716d
[40196.208334] RSP: 002b:00007ff1b45fd080 EFLAGS: 00000293 ORIG_RAX: 0000000000000000
[40196.208335] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ff1b6f5716d
[40196.208335] RDX: 0000000000000020 RSI: 00007ff1b45fd090 RDI: 0000000000000054
[40196.208336] RBP: 00007ff1b45fd0e0 R08: 00007ff1b65e45c0 R09: 00007ff1b45fc9b1
[40196.208336] R10: 0000000000000020 R11: 0000000000000293 R12: 0000000000000000
[40196.208337] R13: 00007ff1b5687000 R14: 00000004a817c7fe R15: 00000000000004e0
[40196.208337] Code: 60 02 00 00 74 30 48 63 cf 48 c7 c0 40 a0 00 00 48 8b 34 cd c0 93 d0 81 48 63 ca 48 8b 0c cd c0 93 d0 81 0f b7 8c 08 da 00 00 00 <66> 39 8c 30 da 00 00 00 0f 44 fa 48 8d 55 d0 b9 01 00 00 00 48
[40196.208353] RIP: perf_event_read+0xd3/0x1a0 RSP: ffffc9000974bd48
[40196.208353] CR2: 000000000000a11a

The RIP points to this asm:
   0xffffffff8115fc9b <+155>:   mov    %gs:0x7eeaa486(%rip),%edx        # 0xa128 <cpu_number>
   0xffffffff8115fca2 <+162>:   testb  $0x2,0x68(%rbx)
   0xffffffff8115fca6 <+166>:   mov    0x260(%rdi),%edi
   0xffffffff8115fcac <+172>:   je     0xffffffff8115fcde <perf_event_read+222>
   0xffffffff8115fcae <+174>:   movslq %edi,%rcx
   0xffffffff8115fcb1 <+177>:   mov    $0xa040,%rax
   0xffffffff8115fcb8 <+184>:   mov    -0x7e2f6c40(,%rcx,8),%rsi
   0xffffffff8115fcc0 <+192>:   movslq %edx,%rcx
   0xffffffff8115fcc3 <+195>:   mov    -0x7e2f6c40(,%rcx,8),%rcx
   0xffffffff8115fccb <+203>:   movzwl 0xda(%rax,%rcx,1),%ecx
   0xffffffff8115fcd3 <+211>:   cmp    %cx,0xda(%rax,%rsi,1)

which is this C code:
perf_event_read():
  local_cpu = get_cpu();
  cpu_to_read = find_cpu_to_read(event, local_cpu);
  put_cpu();

find_cpu_to_read():
  event_pkg =  topology_physical_package_id(event_cpu);
  local_pkg =  topology_physical_package_id(local_cpu);
  if (event_pkg == local_pkg)

If I read the asm correctly at the time of the crash
event_cpu == RDI == 00000000ffffffff
or in other words event->oncpu == -1
which I think is technically possible here.

Any suggestions how to fix this?
Happy to test any patches, though I don't know how to reproduce reliably.

Thanks

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2017-02-10  8:35 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-01-31  7:04 crash in perf_event_read Alexei Starovoitov
2017-01-31 10:27 ` Peter Zijlstra
2017-01-31 18:50   ` Alexei Starovoitov
2017-02-02  7:43   ` David Carrillo-Cisneros
2017-02-10  8:32   ` [tip:perf/urgent] perf/core: Fix crash in perf_event_read() tip-bot for Peter Zijlstra

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.