From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1750945AbdAXQmO (ORCPT <rfc822;w@1wt.eu>);
        Tue, 24 Jan 2017 11:42:14 -0500
Received: from merlin.infradead.org ([205.233.59.134]:50676 "EHLO
        merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1750789AbdAXQmN (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Tue, 24 Jan 2017 11:42:13 -0500
Date: Tue, 24 Jan 2017 17:41:22 +0100
From: Peter Zijlstra <peterz@infradead.org>
To: Jiri Olsa <jolsa@kernel.org>
Cc: lkml <linux-kernel@vger.kernel.org>, Ingo Molnar <mingo@kernel.org>,
        Andi Kleen <andi@firstfloor.org>,
        Alexander Shishkin <alexander.shishkin@linux.intel.com>,
        Arnaldo Carvalho de Melo <acme@kernel.org>,
        Vince Weaver <vince@deater.net>
Subject: Re: [PATCH 4/4] perf/x86/intel: Throttle PEBS events only from pmi
Message-ID: <20170124164122.GL25813@worktop.programming.kicks-ass.net>
References: <1482931866-6018-1-git-send-email-jolsa@kernel.org>
 <1482931866-6018-5-git-send-email-jolsa@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1482931866-6018-5-git-send-email-jolsa@kernel.org>
User-Agent: Mutt/1.5.22.1 (2013-10-16)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Dec 28, 2016 at 02:31:06PM +0100, Jiri Olsa wrote:
> This patch fixes the following WARNING:
> 
>   WARNING: CPU: 15 PID: 15768 at arch/x86/events/core.c:1256 x86_pmu_start+0x1b3/0x1c0
>   ...
>   Call Trace:
>    <IRQ>
>    dump_stack+0x86/0xc3
>    __warn+0xcb/0xf0
>    warn_slowpath_null+0x1d/0x20
>    x86_pmu_start+0x1b3/0x1c0
>    perf_event_task_tick+0x342/0x3f0
>    scheduler_tick+0x75/0xd0
>    update_process_times+0x47/0x60
>    tick_sched_handle.isra.19+0x25/0x60
>    tick_sched_timer+0x3d/0x70
>    __hrtimer_run_queues+0xfb/0x510
>    hrtimer_interrupt+0x9d/0x1a0
>    local_apic_timer_interrupt+0x38/0x60
>    smp_trace_apic_timer_interrupt+0x56/0x25a
>    trace_apic_timer_interrupt+0x9d/0xb0
>    ...
> 
> which, AFAICS, happens under the following conditions
> (with PEBS events configured):
> 
>   - x86_pmu_enable reconfigures counters and calls:
>        - x86_pmu_stop on PEBS event
>        - x86_pmu_stop drains the PEBS buffer, crosses
>          the throttle limit and sets:
>            event->hw.interrupts = MAX_INTERRUPTS
>        - following x86_pmu_start call starts the event
>   - perf_event_task_tick is triggered
>     - perf_adjust_freq_unthr_context sees event with
>       MAX_INTERRUPTS set and calls x86_pmu_start on already
>       started event, which triggers the warning
> 
> My first attempt to fix this was to unthrottle the event
> before starting it in x86_pmu_enable. But I think that
> omitting the throttling completely when we are not in the
> PMI is better.

So I don't particularly like these patches... they make a wee bit of a
mess.

Assuming that draining a single event costs on the same order as a
regular PMI, accounting a drain of multiple events as an equal number
of interrupts makes sense.

We should not disregard this work. Right now it looks like both drain
methods (BTS and PEBS) count only a single interrupt; that is something
we should probably fix as well.

So the methods that drain differ from the regular case in that ::stop()
does extra work the regular core does not 'expect', and we must handle
that specially. But 'hiding' the work is not correct.

Arguably the x86_pmu_start() call in x86_pmu_enable() is wrong: if the
stop caused a throttle, we should respect that. The problem is that we
'lose' the x86_pmu_stop() call done by the drain. We check
PERF_HES_STOPPED before calling x86_pmu_stop(), but we cannot check it
afterwards, because HES_STOPPED will always be set by then.


Hmm, so we have (__tac() being shorthand for __test_and_clear_bit()):

  x86_pmu_enable()
    if (HES_STOPPED)
      hwc->state |= HES_ARCH;

    x86_pmu_stop()
      if __tac(active_mask) (true)
        x86_pmu.disable() := intel_pmu_disable_event()
          intel_pmu_pebs_disable()
            intel_pmu_drain_pebs_buffer()
              x86_pmu_stop()
                __tac(active_mask) (false)
      hwc->state |= HES_STOPPED;

    if (!HES_ARCH)
      x86_pmu_start();


So if the recursive stop also sets HES_ARCH, things might just work.

---
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 1635c0c8df23..a95707a4140f 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1343,6 +1343,8 @@ void x86_pmu_stop(struct perf_event *event, int flags)
 		cpuc->events[hwc->idx] = NULL;
 		WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED);
 		hwc->state |= PERF_HES_STOPPED;
+	} else {
+		hwc->state |= PERF_HES_ARCH;
 	}
 
 	if ((flags & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) {