Date: Tue, 20 Oct 2015 13:36:08 +0200
From: Peter Zijlstra
To: Andi Kleen
Cc: linux-kernel@vger.kernel.org, Andi Kleen, Ingo Molnar, Thomas Gleixner
Subject: Re: [PATCH] x86, perf: Use INST_RETIRED.PREC_DIST for cycles:pp on Skylake
Message-ID: <20151020113608.GC17308@twins.programming.kicks-ass.net>
In-Reply-To: <1445295496-8550-1-git-send-email-andi@firstfloor.org>

On Mon, Oct 19, 2015 at 03:58:16PM -0700, Andi Kleen wrote:
> Switch the cycles:pp alias from UOPS_RETIRED to INST_RETIRED.PREC_DIST.
> The basic mechanism of abusing the inverse cmask to get all cycles
> works the same as before.
>
> PREC_DIST has special support for avoiding shadow effects, which
> can give better results compared to UOPS_RETIRED. The drawback is
> that PREC_DIST can only schedule on counter 1, but that is ok for
> cycle sampling, as there is normally no need to do multiple cycle
> sampling runs in parallel. It is still possible to run perf top
> in parallel, as that doesn't use precise mode. Also of course
> the multiplexing can still allow parallel operation.

So the worry I have with this is that there might indeed be people
wanting to use this in parallel. Typically on workstations you do not,
because there's only a single user, but on servers it might be more
common.

The thing I expect to be most common is having both a CPU wide and a per
task cycle counter enabled. This means a fairly visible change in
behaviour depending on uarch.

And you having killed the flag bits for PEBS events precludes people
from using this manually, right? I think we want to exempt .inv=1
.cmask=16 from that general rule on general utility value.

We could maybe abuse .precise_ip = 3 for this?

> On earlier parts there were various hardware bugs in it
> (but no show stopper on IvyBridge and up I believe),
> so it could be enabled there after sufficient testing.

Just enable it for IVB+ then.

> On Sandy Bridge PREC_DIST can only be scheduled as a single
> event on the PMU, which is too limiting. Before Sandy
> Bridge it was not supported.

Right, that was a bit cumbersome :-)
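
For reference, a rough sketch of what the .inv=1 .cmask=16 trick discussed above looks like as a raw event config. With cmask=16 the counter would only tick in cycles where at least 16 instructions retire, which never happens; inv flips the comparison, so it ticks on every cycle. The sketch assumes the usual Intel event-select layout (event in bits 0-7, umask in bits 8-15, inv at bit 23, cmask in bits 24-31) and assumes event=0xc0, umask=0x01 for INST_RETIRED.PREC_DIST. The helper name is made up for illustration and is not kernel code.

```python
# Hypothetical helper (not a kernel or perf API): packs the fields
# using the assumed Intel event-select layout:
#   event bits 0-7, umask bits 8-15, inv bit 23, cmask bits 24-31.
def x86_raw_config(event, umask, inv=False, cmask=0):
    for name, value in (("event", event), ("umask", umask), ("cmask", cmask)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits")
    return event | (umask << 8) | (int(bool(inv)) << 23) | (cmask << 24)


# Assumed encoding of INST_RETIRED.PREC_DIST: event=0xc0, umask=0x01.
# inv=1 with cmask=16 turns it into an "every cycle" counter.
PREC_DIST_CYCLES = x86_raw_config(0xC0, 0x01, inv=True, cmask=16)

if __name__ == "__main__":
    print(hex(PREC_DIST_CYCLES))
```

On the command line the same event would be spelled something like cpu/event=0xc0,umask=0x1,inv,cmask=16/pp; whether the kernel still accepts that combination with PEBS is exactly the question raised above.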