From: Liang, Kan <kan.liang@linux.intel.com>
To: lkp@lists.01.org
Subject: Re: [perf/x86/intel] 41e062cd2e: WARNING:at_arch/x86/events/intel/ds.c:#intel_pmu_save_and_restart_reload
Date: Tue, 20 Feb 2018 13:59:08 -0500
Message-ID: <6f44ee84-56f8-79f1-559b-08e371eaeb78@linux.intel.com>
In-Reply-To: <20180219124446.GR25201@hirez.programming.kicks-ass.net>

On 2/19/2018 7:44 AM, Peter Zijlstra wrote:
> On Sat, Feb 17, 2018 at 02:21:19PM +0800, kernel test robot wrote:
>> [  242.731381] WARNING: CPU: 3 PID: 1107 at arch/x86/events/intel/ds.c:1326 intel_pmu_save_and_restart_reload+0x87/0x90
> 
> That's the one asserting the PMU is in fact disabled.
> 
>> [  242.731417] CPU: 3 PID: 1107 Comm: netserver Not tainted 4.15.0-00001-g41e062c #1
>> [  242.731418] Hardware name: LENOVO IdeaPad U410    /Lenovo          , BIOS 65CN15WW 06/05/2012
>> [  242.731422] RIP: 0010:intel_pmu_save_and_restart_reload+0x87/0x90
>> [  242.731423] RSP: 0018:fffffe000008c8d0 EFLAGS: 00010002
>> [  242.731425] RAX: 0000000000000001 RBX: ffff88007d069800 RCX: 0000000000000000
>> [  242.731426] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff88007d069800
>> [  242.731427] RBP: 0000000000000010 R08: 0000000000000001 R09: 0000000000000001
>> [  242.731428] R10: 00000000000000b0 R11: 0000000000003000 R12: 00000000000f4243
>> [  242.731429] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001
>> [  242.731431] FS:  00007f1501639700(0000) GS:ffff880112ac0000(0000) knlGS:0000000000000000
>> [  242.731432] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  242.731433] CR2: 00007f65a1394d68 CR3: 000000007f62a006 CR4: 00000000001606e0
>> [  242.731434] Call Trace:
>> [  242.731438]  <NMI>
>> [  242.731443]  __intel_pmu_pebs_event+0xc8/0x260
>> [  242.731452]  ? intel_pmu_drain_pebs_nhm+0x211/0x2f0
>> [  242.731454]  intel_pmu_drain_pebs_nhm+0x211/0x2f0
>> [  242.731457]  intel_pmu_handle_irq+0x12d/0x4b0
>> [  242.731464]  ? perf_event_nmi_handler+0x2d/0x50
>> [  242.731466]  perf_event_nmi_handler+0x2d/0x50
>> [  242.731470]  nmi_handle+0x6a/0x130
>> [  242.731473]  default_do_nmi+0x4e/0x110
>> [  242.731475]  do_nmi+0xe5/0x140
>> [  242.731479]  end_repeat_nmi+0x1a/0x54
> 
> And this should have shown with any testing I think.
> 
> The problem appears to be that intel_pmu_handle_irq() uses
> __intel_pmu_disable_all() which 'forgets' to clear cpuc->enabled as per
> x86_pmu_disable().
> 
> 

Yes, the NMI handler disables the PMU but does not update cpuc->enabled.
The patch below should fix it.
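The mismatch described above can be modelled in a few lines of plain C. This is a hypothetical userspace sketch, not kernel code: struct model_cpuc, warn_count and the model_* functions are invented for illustration, "enabled" stands in for cpuc->enabled and "hw_enabled" for whether the counters are really running. The check in model_reload_check() assumes, following Peter's description, that intel_pmu_save_and_restart_reload() warns when cpuc->enabled is still set.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model, not kernel code.
 * enabled    ~ cpuc->enabled (software view of the PMU state)
 * hw_enabled ~ whether the PMU counters are really running
 */
struct model_cpuc {
	bool enabled;
	bool hw_enabled;
};

/* Counts simulated WARN hits. */
static int warn_count;

/* Models x86_pmu_disable(): clears both the flag and the hardware. */
void model_x86_pmu_disable(struct model_cpuc *c)
{
	c->enabled = false;
	c->hw_enabled = false;
}

/* Models __intel_pmu_disable_all(): stops the hardware only. */
void model_disable_all(struct model_cpuc *c)
{
	c->hw_enabled = false;
}

/*
 * Models the check in intel_pmu_save_and_restart_reload(), assumed
 * here to warn if cpuc->enabled is still set.
 */
void model_reload_check(const struct model_cpuc *c)
{
	if (c->enabled)
		warn_count++;
}

/* Pre-patch NMI flow: hardware stopped, but the flag is left set. */
void model_nmi_old(struct model_cpuc *c)
{
	model_disable_all(c);
	model_reload_check(c);  /* PEBS drain: warns, the flag is stale */
	if (c->enabled)         /* the "done:" label */
		c->hw_enabled = true;
}
```

Running model_nmi_old() on a model_cpuc that starts with both fields true increments warn_count once, while x86_pmu_disable() followed by the same check does not: that is the asymmetry between the two disable paths.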

Thanks,
Kan
------

From 4d07d81e3406a6a9958cfbb34c1deb87b77721a9 Mon Sep 17 00:00:00 2001
From: Kan Liang <kan.liang@linux.intel.com>
Date: Tue, 20 Feb 2018 02:11:50 -0800
Subject: [PATCH] perf/x86/intel: Update the PMU state in NMI handler

The Intel PMU is disabled in the NMI handler, but cpuc->enabled is not
cleared to match. This does not cause a problem in the current code,
because nothing checks it. It will, however, become a problem once
code needs to check the PMU state. For example, drain_pebs() will be
modified to fix the auto-reload issue, and the new code checks the PMU
state.

The old PMU state must be saved on entry to the NMI handler: the handler
now clears cpuc->enabled, so the saved value is what decides whether the
PMU is re-enabled on exit.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
  arch/x86/events/intel/core.c | 10 +++++++++-
  1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 6461a4a..80dfaae 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2209,16 +2209,23 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
  	int bit, loops;
  	u64 status;
  	int handled;
+	int pmu_enabled;

  	cpuc = this_cpu_ptr(&cpu_hw_events);

  	/*
+	 * Save the PMU state.
+	 * It needs to be restored when leaving the handler.
+	 */
+	pmu_enabled = cpuc->enabled;
+	/*
  	 * No known reason to not always do late ACK,
  	 * but just in case do it opt-in.
  	 */
  	if (!x86_pmu.late_ack)
  		apic_write(APIC_LVTPC, APIC_DM_NMI);
  	intel_bts_disable_local();
+	cpuc->enabled = 0;
  	__intel_pmu_disable_all();
  	handled = intel_pmu_drain_bts_buffer();
  	handled += intel_bts_interrupt();
@@ -2328,7 +2335,8 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)

  done:
  	/* Only restore PMU state when it's active. See x86_pmu_disable(). */
-	if (cpuc->enabled)
+	cpuc->enabled = pmu_enabled;
+	if (pmu_enabled)
  		__intel_pmu_enable_all(0, true);
  	intel_bts_enable_local();

-- 
2.7.4
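The save/clear/restore sequence in the patch can be checked with a small userspace model. This is a hypothetical sketch rather than kernel code: struct model_cpuc, warn_count and the model_* functions are invented names, "enabled" stands in for cpuc->enabled, "hw_enabled" for the hardware state, and the reload check assumes a warning whenever cpuc->enabled is set. The comments map each step to the corresponding line of the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model, not kernel code. */
struct model_cpuc {
	bool enabled;     /* ~ cpuc->enabled */
	bool hw_enabled;  /* ~ whether the counters are really running */
};

static int warn_count;  /* simulated WARN hits */

/* Assumed check: warn if the software flag says the PMU is on. */
void model_reload_check(const struct model_cpuc *c)
{
	if (c->enabled)
		warn_count++;
}

/* Patched flow, mirroring the diff: save, clear, drain, restore. */
void model_nmi_patched(struct model_cpuc *c)
{
	bool pmu_enabled = c->enabled;  /* pmu_enabled = cpuc->enabled; */

	c->enabled = false;             /* cpuc->enabled = 0; */
	c->hw_enabled = false;          /* __intel_pmu_disable_all(); */

	model_reload_check(c);          /* PEBS drain sees a disabled PMU */

	c->enabled = pmu_enabled;       /* cpuc->enabled = pmu_enabled; */
	if (pmu_enabled)
		c->hw_enabled = true;   /* __intel_pmu_enable_all(0, true); */
}

/* Counter-example: clearing the flag without saving it first. */
void model_nmi_unsaved(struct model_cpuc *c)
{
	c->enabled = false;
	c->hw_enabled = false;
	model_reload_check(c);
	if (c->enabled)                 /* always false at this point, */
		c->hw_enabled = true;   /* so the PMU is never re-enabled */
}
```

With the patched flow, a PMU that was enabled on entry is re-enabled on exit without tripping the check, and a PMU that was already disabled when the NMI arrived (the case the "See x86_pmu_disable()" comment refers to) stays disabled. model_nmi_unsaved() shows why pmu_enabled must be captured before the flag is cleared: reading cpuc->enabled at done: would always see 0.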


Thread overview: 27+ messages
2018-02-12 22:20 [PATCH V4 0/5] bugs fix for auto-reload mmap read and rdpmc read kan.liang
2018-02-12 22:20 ` [PATCH V4 1/5] perf/x86/intel: Fix event update for auto-reload kan.liang
2018-02-17  6:21   ` [perf/x86/intel] 41e062cd2e: WARNING:at_arch/x86/events/intel/ds.c:#intel_pmu_save_and_restart_reload kernel test robot
2018-02-17  6:21     ` kernel test robot
2018-02-19 12:44     ` Peter Zijlstra
2018-02-19 12:44       ` Peter Zijlstra
2018-02-20 18:59       ` Liang, Kan [this message]
2018-02-20 18:59         ` Liang, Kan
2018-03-09  9:08         ` [tip:perf/core] perf/x86/intel: Properly save/restore the PMU state in the NMI handler tip-bot for Kan Liang
2018-02-21 10:32   ` [PATCH V4 1/5] perf/x86/intel: Fix event update for auto-reload Peter Zijlstra
2018-02-21 13:43     ` Liang, Kan
2018-02-21 13:45       ` Peter Zijlstra
2018-03-09  9:08   ` [tip:perf/core] " tip-bot for Kan Liang
2018-02-12 22:20 ` [PATCH V4 2/5] perf/x86: Introduce read function for x86_pmu kan.liang
2018-03-09  9:09   ` [tip:perf/core] perf/x86: Introduce a ->read() callback in 'struct x86_pmu' tip-bot for Kan Liang
2018-02-12 22:20 ` [PATCH V4 3/5] perf/x86/intel/ds: Introduce read function for auto-reload event kan.liang
2018-03-09  9:09   ` [tip:perf/core] perf/x86/intel/ds: Introduce ->read() function for auto-reload events and flush the PEBS buffer there tip-bot for Kan Liang
2018-02-12 22:20 ` [PATCH V4 4/5] perf/x86/intel: Fix pmu read for auto-reload kan.liang
2018-03-09  9:10   ` [tip:perf/core] perf/x86/intel: Fix PMU " tip-bot for Kan Liang
2018-02-12 22:20 ` [PATCH V4 5/5] perf/x86: Fix: disable userspace RDPMC usage for large PEBS kan.liang
2018-03-09  9:10   ` [tip:perf/core] perf/x86/intel: Disable " tip-bot for Kan Liang
2018-03-09 14:31     ` Vince Weaver
2018-03-09 17:42       ` Peter Zijlstra
2018-03-09 18:53         ` Liang, Kan
2018-03-09 19:10         ` Vince Weaver
2018-03-12 14:08           ` Liang, Kan
2018-03-20 11:15   ` [tip:perf/urgent] " tip-bot for Kan Liang
