Subject: Re: [RFC][PATCH] perf_events, x86: PEBS support
From: Peter Zijlstra
To: Stephane Eranian
Cc: Ingo Molnar, Paul Mackerras, "Metzger, Markus T", lkml, Robert Richter, "David S. Miller", Jamie Iles, Paul Mundt, Arjan van de Ven, "H. Peter Anvin", perfmon2-devel@lists.sf.net
Date: Wed, 03 Feb 2010 16:12:31 +0100
Message-ID: <1265209951.24455.640.camel@laptop>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 2010-02-03 at 15:54 +0100, Stephane Eranian wrote:
>
> PEBS is still very useful because it guarantees the state you capture
> is at retirement of an instruction which caused the event.
>
> PEBS also gets way more interesting on Nehalem because of the
> ability to capture where cache misses occur. That's the load latency
> feature. You need to support that.

Simple things first. But yeah, we'll get to load-latency eventually.

> I believe you would need to abstract this in a generic fashion so it
> could be used on other architectures, such as AMD with IBS.

Right, Robert said he was working on IBS. I've still not made up my mind
on how to represent IBS properly; it's a bit of a weird thing.

> On Nehalem, it requires the following:
>
> - only works if you sample on MEM_INST_RETIRED:LATENCY_ABOVE_THRESHOLD.

Yeah, and then you get to decode the data source thingy, not really a
nice interface. Also, it mostly contains L3 information, not L2/L1.

> - the threshold must be programmed into a dedicated MSR. The extra
>   difficulty is that this MSR is shared between CPU when HT is on.

Lovely :/

One way is to program it to the lowest of the two and simply discard
events afterwards.
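
To make the "lowest of the two, discard afterwards" idea concrete, here is a
minimal user-space sketch in C. It is not kernel code and does not touch any
MSR. The struct, the function names, the use of 0 to mean "no request", and
treating the threshold as inclusive are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one latency-threshold register shared by two HT siblings. */
#define NR_SIBLINGS 2

struct shared_lat_threshold {
    uint32_t requested[NR_SIBLINGS]; /* per-sibling threshold in cycles, 0 = no request */
    uint32_t programmed;             /* value that would be written to the shared MSR */
};

/*
 * Record one sibling's request and recompute the shared value.
 * The lowest non-zero request wins, so neither sibling misses samples.
 */
static uint32_t lat_threshold_update(struct shared_lat_threshold *s,
                                     int sibling, uint32_t threshold)
{
    assert(sibling >= 0 && sibling < NR_SIBLINGS);
    s->requested[sibling] = threshold;

    uint32_t lowest = 0;
    for (int i = 0; i < NR_SIBLINGS; i++) {
        uint32_t r = s->requested[i];
        if (r != 0 && (lowest == 0 || r < lowest))
            lowest = r;
    }

    s->programmed = lowest; /* a real driver would write the MSR at this point */
    return lowest;
}

/*
 * Software filter applied to each sample: keep it only if it meets the
 * threshold this sibling asked for (assumed inclusive here), since the
 * hardware may be running with the other sibling's lower value.
 */
static bool lat_sample_wanted(const struct shared_lat_threshold *s,
                              int sibling, uint32_t latency)
{
    assert(sibling >= 0 && sibling < NR_SIBLINGS);
    uint32_t want = s->requested[sibling];
    return want != 0 && latency >= want;
}
```

With sibling 0 asking for 32 cycles and sibling 1 asking for 8, the shared
value becomes 8, and a 16-cycle sample is kept for sibling 1 but dropped for
sibling 0. The cost is that the sibling with the higher threshold takes extra
samples it then throws away.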