From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1750945AbdAXQmO (ORCPT ); Tue, 24 Jan 2017 11:42:14 -0500
Received: from merlin.infradead.org ([205.233.59.134]:50676 "EHLO
	merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750789AbdAXQmN (ORCPT );
	Tue, 24 Jan 2017 11:42:13 -0500
Date: Tue, 24 Jan 2017 17:41:22 +0100
From: Peter Zijlstra
To: Jiri Olsa
Cc: lkml, Ingo Molnar, Andi Kleen, Alexander Shishkin,
	Arnaldo Carvalho de Melo, Vince Weaver
Subject: Re: [PATCH 4/4] perf/x86/intel: Throttle PEBS events only from pmi
Message-ID: <20170124164122.GL25813@worktop.programming.kicks-ass.net>
References: <1482931866-6018-1-git-send-email-jolsa@kernel.org>
 <1482931866-6018-5-git-send-email-jolsa@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1482931866-6018-5-git-send-email-jolsa@kernel.org>
User-Agent: Mutt/1.5.22.1 (2013-10-16)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Dec 28, 2016 at 02:31:06PM +0100, Jiri Olsa wrote:
> This patch fixes following WARNING:
>
>   WARNING: CPU: 15 PID: 15768 at arch/x86/events/core.c:1256 x86_pmu_start+0x1b3/0x1c0
>   ...
>   Call Trace:
>
>    dump_stack+0x86/0xc3
>    __warn+0xcb/0xf0
>    warn_slowpath_null+0x1d/0x20
>    x86_pmu_start+0x1b3/0x1c0
>    perf_event_task_tick+0x342/0x3f0
>    scheduler_tick+0x75/0xd0
>    update_process_times+0x47/0x60
>    tick_sched_handle.isra.19+0x25/0x60
>    tick_sched_timer+0x3d/0x70
>    __hrtimer_run_queues+0xfb/0x510
>    hrtimer_interrupt+0x9d/0x1a0
>    local_apic_timer_interrupt+0x38/0x60
>    smp_trace_apic_timer_interrupt+0x56/0x25a
>    trace_apic_timer_interrupt+0x9d/0xb0
>    ...
>
> which happens AFAICS under following conditions:
> (we have PEBS events configured)
>
>   - x86_pmu_enable reconfigures counters and calls:
>        - x86_pmu_stop on PEBS event
>        - x86_pmu_stop drains the PEBS buffer, crosses
>          the throttle limit and sets:
>            event->hw.interrupts = MAX_INTERRUPTS
>        - following x86_pmu_start call starts the event
>   - perf_event_task_tick is triggered
>   - perf_adjust_freq_unthr_context sees event with
>     MAX_INTERRUPTS set and calls x86_pmu_start on already
>     started event, which triggers the warning
>
> My first attempt to fix this was to unthrottle the event
> before starting it in x86_pmu_enable. But I think that
> omitting the throttling completely when we are not in the
> PMI is better.

So I don't particularly like these patches... they make a wee bit of a
mess.

Under the assumption that draining a single event is on the same order
of cost as a regular PMI, then accounting a drain of multiple events as
an equal amount of interrupts makes sense. We should not disregard this
work.

Now it looks like both (BTS & PEBS) drain methods only count a single
interrupt, that's something we maybe ought to fix too.

So these things that drain are different from the regular case in that
::stop() will do this extra work not 'expected' by the regular core, so
we must do something special. But 'hiding' the work is not correct.

Arguably the x86_pmu_start() call in x86_pmu_enable() is wrong, if the
stop caused a throttle, we should respect that. The problem is that we
'lose' the x86_pmu_stop() call done by drain. We check
PERF_HES_STOPPED() before doing x86_pmu_stop(), but we cannot do so
thereafter because HES_STOPPED will always be set.

Hmm, so we have:

  x86_pmu_enable()
    if (HES_STOPPED)
      hwc->state |= HES_ARCH;
    x86_pmu_stop()
      if __tac(active_mask) (true)
        x86_pmu.disable() := intel_pmu_disable_event()
          intel_pmu_pebs_disable()
            intel_pmu_drain_pebs_buffer()
              x86_pmu_stop()
                __tac(active_mask) (false)
        hwc->state |= HES_STOPPED;

    if (!HES_ARCH)
      x86_pmu_start();

So if we have that recursive stop also set ARCH, things might just work.

---
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 1635c0c8df23..a95707a4140f 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1343,6 +1343,8 @@ void x86_pmu_stop(struct perf_event *event, int flags)
 		cpuc->events[hwc->idx] = NULL;
 		WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED);
 		hwc->state |= PERF_HES_STOPPED;
+	} else {
+		hwc->state |= PERF_HES_ARCH;
 	}

 	if ((flags & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) {
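
To make the call flow above easier to follow, here is a small userspace
toy model of it. This is only a sketch and not kernel code: every name
(toy_event, toy_stop, toy_start, toy_enable, toy_scenario, the HES_*
values, the drain_throttles knob) is hypothetical and merely mirrors the
state bits and steps named in the mail. It assumes that draining crosses
the throttle limit and therefore triggers the nested stop.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the PERF_HES_* bits; the values are illustrative only. */
enum { HES_STOPPED = 1, HES_ARCH = 4 };

struct toy_event {
	unsigned int state;	/* models hwc->state */
	bool active;		/* models the event's bit in active_mask */
	bool throttled;		/* models hw.interrupts == MAX_INTERRUPTS */
	bool drain_throttles;	/* assumption: the drain crosses the limit */
	int warnings;		/* times start was called on a running event */
};

/* Models x86_pmu_stop(); with_fix enables the proposed else branch. */
static void toy_stop(struct toy_event *ev, bool with_fix)
{
	if (ev->active) {			/* __tac(active_mask) true */
		ev->active = false;
		/* disable -> drain -> throttle -> nested stop */
		if (ev->drain_throttles && !ev->throttled) {
			ev->throttled = true;
			toy_stop(ev, with_fix);	/* __tac(active_mask) false */
		}
		ev->state |= HES_STOPPED;
	} else if (with_fix) {
		ev->state |= HES_ARCH;		/* remember the nested stop */
	}
}

/* Models x86_pmu_start(): warns and bails if the event is not stopped. */
static void toy_start(struct toy_event *ev)
{
	if (!(ev->state & HES_STOPPED)) {
		ev->warnings++;
		return;
	}
	ev->state = 0;
	ev->active = true;
}

/* Models the stop/start pair in x86_pmu_enable(). */
static void toy_enable(struct toy_event *ev, bool with_fix)
{
	if (ev->state & HES_STOPPED)
		ev->state |= HES_ARCH;
	toy_stop(ev, with_fix);
	if (!(ev->state & HES_ARCH))
		toy_start(ev);
}

/* Runs enable followed by the unthrottling tick; returns warning count. */
static int toy_scenario(bool with_fix)
{
	struct toy_event ev = { .active = true, .drain_throttles = true };

	toy_enable(&ev, with_fix);
	assert(ev.throttled);		/* the drain throttled in both cases */

	if (ev.throttled) {		/* tick: unthrottle and start */
		ev.throttled = false;
		toy_start(&ev);
	}
	return ev.warnings;
}
```

In this model toy_scenario(false) restarts the throttled event inside
toy_enable() and the later tick then hits the warning path, while
toy_scenario(true) leaves the event stopped until the tick unthrottles
it. Whether the real code behaves the same way depends on details the
model leaves out.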