Date: Tue, 1 Mar 2016 19:04:40 +0100
From: Jiri Olsa
To: Peter Zijlstra
Cc: Andi Kleen, "Liang, Kan", Arnaldo Carvalho de Melo, Ingo Molnar,
 Stephane Eranian, Wang Nan, "zheng.z.yan@intel.com", LKML
Subject: Re: [BUG] Core2 cpu triggers hard lockup with perf test
Message-ID: <20160301180440.GA6769@krava.redhat.com>
References: <20160227123636.GB30858@krava.redhat.com>
 <37D7C6CF3E00A74B8858931C1DB2F0770589EC94@SHSMSX103.ccr.corp.intel.com>
 <20160301091703.GN6356@twins.programming.kicks-ass.net>
 <20160301110651.GA15260@krava.redhat.com>
 <20160301145105.GQ5083@two.firstfloor.org>
 <20160301145909.GS6356@twins.programming.kicks-ass.net>
 <20160301171722.GA2666@krava.redhat.com>
 <20160301174903.GX6356@twins.programming.kicks-ass.net>
In-Reply-To: <20160301174903.GX6356@twins.programming.kicks-ass.net>

On Tue, Mar 01, 2016 at 06:49:03PM +0100, Peter Zijlstra wrote:
> On Tue, Mar 01, 2016 at 06:17:22PM +0100, Jiri Olsa wrote:
>
> > [  125.982977]  [] ? __intel_pmu_enable_all.isra.11+0x4b/0xd0
> > [  125.982977]  [] ? __intel_pmu_enable_all.isra.11+0x4b/0xd0
> > [  125.982977]  [] ? __intel_pmu_enable_all.isra.11+0x4b/0xd0
> > [  125.982977]  <>  [] intel_pmu_enable_all+0x10/0x20
> > [  125.982977]  [] x86_pmu_enable+0x263/0x2f0
> > [  125.982977]  [] perf_pmu_enable+0x22/0x30
> > [  125.982977]  [] ctx_resched+0x51/0x60
> > [  125.982977]  [] perf_event_exec+0x109/0x150
> > [  125.982977]  [] setup_new_exec+0x6d/0x1a0
> > [  125.982977]  [] load_elf_binary+0x37a/0x10e0
> > [  125.982977]  [] ? get_user_pages+0x52/0x60
> > [  125.982977]  [] search_binary_handler+0x9e/0x1e0
> > [  125.982977]  [] do_execveat_common.isra.37+0x54d/0x6e0
> > [  125.982977]  [] SyS_execve+0x3a/0x50
> > [  125.982977]  [] stub_execve+0x5/0x5
> > [  125.982977]  [] ? entry_SYSCALL_64_fastpath+0x12/0x6a
>
> > the exception addr is on wrmsr:
> >
> > ffffffff8100ae30 <__intel_pmu_enable_all.isra.11>:
> > ffffffff8100ae30:       e8 bb 02 67 00          callq  ffffffff8167b0f0 <__fentry__>
> > ffffffff8100ae35:       55                      push   %rbp
> > ffffffff8100ae36:       48 89 e5                mov    %rsp,%rbp
> > ffffffff8100ae39:       41 54                   push   %r12
> > ffffffff8100ae3b:       41 89 fc                mov    %edi,%r12d
> > ffffffff8100ae3e:       53                      push   %rbx
> > ffffffff8100ae3f:       48 c7 c3 80 a3 00 00    mov    $0xa380,%rbx
> > ffffffff8100ae46:       65 48 03 1d d2 f2 ff    add    %gs:0x7efff2d2(%rip),%rbx        # a120
> > ffffffff8100ae4d:       7e
> > ffffffff8100ae4e:       e8 6d 49 00 00          callq  ffffffff8100f7c0
> > ffffffff8100ae53:       41 0f b6 fc             movzbl %r12b,%edi
> > ffffffff8100ae57:       e8 94 58 00 00          callq  ffffffff810106f0
> > ffffffff8100ae5c:       48 8b 83 68 0c 00 00    mov    0xc68(%rbx),%rax
> > ffffffff8100ae63:       b9 8f 03 00 00          mov    $0x38f,%ecx
> > ffffffff8100ae68:       48 f7 d0                not    %rax
> > ffffffff8100ae6b:       48 23 05 26 80 ad 00    and    0xad8026(%rip),%rax        # ffffffff81ae2e98
> > ffffffff8100ae72:       48 89 c2                mov    %rax,%rdx
> > ffffffff8100ae75:       48 c1 ea 20             shr    $0x20,%rdx
> > ffffffff8100ae79:       0f 30                   wrmsr
> >
>
> That's the PERF_GLOBAL_CTRL, right?
> But it must have succeeded,

yep, should be this one:

static void __intel_pmu_enable_all(int added, bool pmi)
{
	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);

	intel_pmu_pebs_enable_all();
	intel_pmu_lbr_enable_all(pmi);
>>>	wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL,
			x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_guest_mask);

> otherwise the NMI watchdog would never have fired.

so the NMI wouldn't trigger while the CPU is inside the wrmsr?

jirka

>
> Something is hosed alright.
>
> I think I've seen my IVB-EP do something similar. But mostly that
> machine gets stuck in intel_bts_enable_local().
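As a rough sketch of what the compiled wrmsrl() call in __intel_pmu_enable_all() computes: wrmsr writes the 64-bit value held in EDX:EAX (EDX = high 32 bits, EAX = low 32 bits) to the MSR selected by ECX, and 0x38f is IA32_PERF_GLOBAL_CTRL (MSR_CORE_PERF_GLOBAL_CTRL in the kernel). The struct wrmsr_regs and the function global_ctrl_regs() are hypothetical names, and the mapping of 0xc68(%rbx) to cpuc->intel_ctrl_guest_mask and 0xad8026(%rip) to x86_pmu.intel_ctrl is inferred from the C source quoted above. The code only models the register image in user space; it does not execute the privileged wrmsr.

```c
#include <assert.h>
#include <stdint.h>

/* MSR index loaded by "mov $0x38f,%ecx": IA32_PERF_GLOBAL_CTRL,
 * called MSR_CORE_PERF_GLOBAL_CTRL in the kernel. */
#define MSR_CORE_PERF_GLOBAL_CTRL 0x38fU

/* Registers wrmsr consumes: ECX selects the MSR, EDX:EAX is the value. */
struct wrmsr_regs {
	uint32_t ecx;
	uint32_t edx;
	uint32_t eax;
};

/* Hypothetical user-space model of the compiled sequence; it computes
 * the register image only and never executes wrmsr. */
static struct wrmsr_regs global_ctrl_regs(uint64_t intel_ctrl,
					  uint64_t guest_mask)
{
	/* mov 0xc68(%rbx),%rax ; not %rax ; and 0xad8026(%rip),%rax
	 * (presumably guest mask and x86_pmu.intel_ctrl respectively) */
	uint64_t val = intel_ctrl & ~guest_mask;
	struct wrmsr_regs r = {
		.ecx = MSR_CORE_PERF_GLOBAL_CTRL,
		.edx = (uint32_t)(val >> 32), /* mov %rax,%rdx ; shr $0x20,%rdx */
		.eax = (uint32_t)val,         /* low 32 bits stay in %eax */
	};

	/* The EDX:EAX split must round-trip to the original 64-bit value. */
	assert((((uint64_t)r.edx << 32) | r.eax) == val);
	return r;
}
```

For example, assuming two general-purpose counters enabled (bits 0-1) and three fixed counters (bits 32-34) with an empty guest mask, global_ctrl_regs(0x0000000700000003ULL, 0) gives ECX = 0x38f, EDX = 0x7 and EAX = 0x3; any bit set in the guest mask is cleared before the split.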