Subject: Re: [RFC] perf_events: support for uncore a.k.a. nest units
From: Peter Zijlstra
To: Corey Ashford
Cc: Lin Ming, Ingo Molnar, LKML, Andi Kleen, Paul Mackerras, Stephane Eranian, Frederic Weisbecker, Xiao Guangrong, Dan Terpstra, Philip Mucci, Maynard Johnson, Carl Love, Steven Rostedt, Arnaldo Carvalho de Melo, Masami Hiramatsu
Date: Wed, 31 Mar 2010 16:01:56 +0200
Message-ID: <1270044116.1616.26.camel@laptop>
In-Reply-To: <4BB27764.2060802@linux.vnet.ibm.com>
References: <4B560ACD.4040206@linux.vnet.ibm.com> <1269934931.8575.6.camel@minggr.sh.intel.com> <4BB22BB0.8030208@linux.vnet.ibm.com> <1269969305.5258.479.camel@laptop> <4BB27764.2060802@linux.vnet.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 2010-03-30 at 15:12 -0700, Corey Ashford wrote:
> > Initially I'd not allow per-pmu-per-task contexts
> > because then things like perf_event_task_sched_out() would get rather
> > complex.
> 
> Definitely. I don't think it makes sense to have per-task context on
> nest/uncore PMUs. At least we haven't found any justification for it.

For uncore, no; but there is also the hw-breakpoint code, which is being
presented as a PMU, and for that it would make sense to have a separate
per-task context.

But doing multiple per-task contexts is indeed something for a later step.

> > For RR we can move away from perf_event_task_tick and let the pmu
> > install a (hr)timer for this on their own.
> 
> This is necessary I think, because of the access time for some of the PMUs. I
> wonder though if it should, perhaps optionally, be off-loaded to a high-priority
> task to do the switching so that access latency to the PMU can be controlled.
> 
> As I mentioned when we met, some of the Wire-Speed processor nest PMU control
> registers are accessed via SCOM, which is an internal, 200 MHz serial bus. We
> are being quoted ~525 SCOM bus ticks to do a PMU control register access, which
> comes out to about 2.5 microseconds. If you figure 5 accesses to rotate the
> events on a PMU, that's a minimum of 12.5 microseconds.

Yeah, you mentioned that. For those things we need some changes anyway,
since currently we install per-cpu counters using IPIs and expect the
pmu::enable() method to be synchronous (it has a return value). It would be
totally unacceptable to do 2.5 microsecond register pokes with IRQs disabled.

The RR part would be the easiest to solve: just let the timer wake up a
thread instead of doing the work itself. That is entirely up to how the PMU
driver chooses to implement rotation.

The synchronous pmu::enable() issue mentioned above, however, would be much
more challenging to fix nicely.
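The "let the timer wake up a thread" idea can be illustrated with a userspace analogy. This is a minimal sketch only, assuming POSIX threads stand in for the kernel's hrtimer and a kernel thread; it uses no perf_events or kernel API, and `scom_write_stub()`, `struct rotator`, `rotation_timer_fire()`, `run_demo()` and the 1 ms tick period are hypothetical names and values chosen for illustration. The latency constants come from the figures quoted in the thread: at 200 MHz one SCOM tick is 5 ns, so 525 ticks is 2625 ns (about 2.6 microseconds), and 5 accesses per rotation cost roughly 13 microseconds. The timer side only sets a flag and signals; the slow register pokes happen in the worker thread with no lock held.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Figures quoted in the thread: 200 MHz SCOM bus, ~525 ticks per PMU
 * control-register access, ~5 accesses to rotate the events on one PMU. */
#define SCOM_HZ          200000000LL
#define TICKS_PER_ACCESS 525LL
#define ACCESSES_PER_ROT 5

/* 525 ticks * (1e9 / 200e6) ns per tick = 2625 ns per access. */
static long long access_ns(void)
{
    return TICKS_PER_ACCESS * 1000000000LL / SCOM_HZ;
}

/* Hypothetical stand-in for one slow SCOM register access: it only sleeps. */
static void scom_write_stub(void)
{
    struct timespec ts = { 0, (long)access_ns() };
    nanosleep(&ts, NULL);
}

/* Hypothetical shared state between the "timer" and the rotation worker. */
struct rotator {
    pthread_mutex_t lock;
    pthread_cond_t  kick;
    bool pending;   /* set by the timer, cleared by the worker */
    bool stop;      /* asks the worker to exit */
    int  rotations; /* rotations actually performed */
};

/* Worker thread: performs the slow pokes in schedulable context. */
static void *rotate_worker(void *arg)
{
    struct rotator *r = arg;

    pthread_mutex_lock(&r->lock);
    for (;;) {
        while (!r->pending && !r->stop)
            pthread_cond_wait(&r->kick, &r->lock);
        if (!r->pending)            /* only stop was requested */
            break;
        r->pending = false;         /* ticks that piled up collapse into one */
        pthread_mutex_unlock(&r->lock);

        for (int i = 0; i < ACCESSES_PER_ROT; i++)
            scom_write_stub();      /* slow work runs with no lock held */

        pthread_mutex_lock(&r->lock);
        r->rotations++;
    }
    pthread_mutex_unlock(&r->lock);
    return NULL;
}

/* "Timer" callback: never touches the bus, only flags work and wakes the worker. */
static void rotation_timer_fire(struct rotator *r)
{
    pthread_mutex_lock(&r->lock);
    r->pending = true;
    pthread_cond_signal(&r->kick);
    pthread_mutex_unlock(&r->lock);
}

/* Fire `ticks` timer events 1 ms apart; return the number of rotations done. */
int run_demo(int ticks)
{
    struct rotator r = { .pending = false, .stop = false, .rotations = 0 };
    struct timespec period = { 0, 1000000 };
    pthread_t tid;

    pthread_mutex_init(&r.lock, NULL);
    pthread_cond_init(&r.kick, NULL);
    int rc = pthread_create(&tid, NULL, rotate_worker, &r);
    assert(rc == 0);
    (void)rc;

    for (int i = 0; i < ticks; i++) {
        rotation_timer_fire(&r);
        nanosleep(&period, NULL);
    }

    pthread_mutex_lock(&r.lock);
    r.stop = true;                  /* worker drains any pending rotation first */
    pthread_cond_signal(&r.kick);
    pthread_mutex_unlock(&r.lock);
    pthread_join(tid, NULL);

    pthread_cond_destroy(&r.kick);
    pthread_mutex_destroy(&r.lock);
    return r.rotations;             /* safe to read: worker has been joined */
}
```

Calling `run_demo(10)` from a `main()` (built with `-pthread`) fires ten ticks and returns how many rotations the worker completed. Ticks that arrive while a rotation is still in progress are coalesced into one, so the count can be lower than the number of ticks but is never higher. The design point mirrors the suggestion above: the timer callback stays cheap, and the slow bus traffic moves into a context that is allowed to block.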