Date: Fri, 5 Mar 2010 18:03:33 +0100
From: Frederic Weisbecker
To: Peter Zijlstra
Cc: LKML, Ingo Molnar, Paul Mackerras, Steven Rostedt, Masami Hiramatsu,
    Jason Baron, Arnaldo Carvalho de Melo
Subject: Re: [PATCH 2/2] perf: Walk through the relevant events only
Message-ID: <20100305170331.GB5244@nowhere>
References: <1267772426-5944-1-git-send-regression-fweisbec@gmail.com>
    <1267772426-5944-2-git-send-regression-fweisbec@gmail.com>
    <1267781969.16716.55.camel@laptop>
In-Reply-To: <1267781969.16716.55.camel@laptop>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Mar 05, 2010 at 10:39:29AM +0100, Peter Zijlstra wrote:
> On Fri, 2010-03-05 at 08:00 +0100, Frederic Weisbecker wrote:
> > Each time a trace event triggers, we walk through the entire
> > list of events from the active contexts to find the perf events
> > that match the current one.
> >
> > This is wasteful. To solve this, we maintain a per cpu list of
> > the active perf events for each running trace events and we
> > directly commit to these.
>
> Right, so this seems a little trace specific. I once thought about using
> a hash table to do this for all software events. It also keeps it all
> nicely inside perf_event.[ch].

Right. We could have a per cpu type:event_id based hlist that would cover
trace events and other software events. That would do the trick more
generically wrt perf.

Now isn't the problem more in the fact that most of the swevents should
be tracepoints? This is the case for most of them. Only
PERF_COUNT_SW_CPU_CLOCK and PERF_COUNT_SW_TASK_CLOCK seem to be the
exception, and they manage their own path by calling
perf_event_overflow() directly.

And as you can guess, turning them into tracepoints would benefit everyone.
We'll have interesting trace events in fault paths, and we won't have
zillions of hooks in the same place (in the context switch, we have the
usual tracepoint plus the perf call).

And eventually the off-case is better optimized, and further optimizations
there (jmp/nop patching/whatever) will propagate to all tracepoint users.

Finally, we would have only one path to maintain for the swevents.
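
To make the type:event_id hlist idea above a bit more concrete, here is a
rough user-space sketch, not kernel code. Every name in it (sw_event,
swevent_table, swevent_hash, swevent_add, swevent_fire, SWEVENT_HASH_BITS)
is hypothetical and made up for illustration. It assumes a single table
where the real thing would have one per cpu, it uses plain singly linked
chains instead of the kernel hlist helpers, and it ignores locking/RCU
entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SWEVENT_HASH_BITS 8
#define SWEVENT_HASH_SIZE (1U << SWEVENT_HASH_BITS)

/* Hypothetical stand-in for an active event; not struct perf_event. */
struct sw_event {
	uint32_t type;          /* event class, e.g. software vs tracepoint */
	uint64_t event_id;      /* id within that class */
	uint64_t count;         /* number of times this event was hit */
	struct sw_event *next;  /* next event in the same bucket */
};

/* One such table per cpu in the proposal; a single one in this sketch. */
struct swevent_table {
	struct sw_event *heads[SWEVENT_HASH_SIZE];
};

/* Fold type and event_id into one key, then multiplicative hashing. */
static unsigned int swevent_hash(uint32_t type, uint64_t event_id)
{
	uint64_t key = ((uint64_t)type << 32) ^ event_id;

	return (unsigned int)((key * 0x9E3779B97F4A7C15ULL) >>
			      (64 - SWEVENT_HASH_BITS));
}

/* Link an active event at the head of its bucket. */
static void swevent_add(struct swevent_table *t, struct sw_event *ev)
{
	unsigned int h;

	assert(t != NULL && ev != NULL);
	h = swevent_hash(ev->type, ev->event_id);
	ev->next = t->heads[h];
	t->heads[h] = ev;
}

/*
 * Called when an event triggers: walk only the matching bucket rather
 * than every active event. Returns how many events were committed to.
 */
static int swevent_fire(struct swevent_table *t, uint32_t type,
			uint64_t event_id)
{
	struct sw_event *ev;
	int hits = 0;

	for (ev = t->heads[swevent_hash(type, event_id)]; ev; ev = ev->next) {
		/* Different keys can share a bucket, so recheck the key. */
		if (ev->type == type && ev->event_id == event_id) {
			ev->count++;
			hits++;
		}
	}
	return hits;
}
```

The point is only that the trigger path touches one bucket keyed by
type:event_id, so trace events and the other software events could share
the same lookup.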