linux-kernel.vger.kernel.org archive mirror
From: Frederic Weisbecker <fweisbec@gmail.com>
To: Li Zefan <lizf@cn.fujitsu.com>, Ingo Molnar <mingo@elte.hu>,
	Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>,
	Tom Zanussi <tzanussi@gmail.com>, Jason Baron <jbaron@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/6] tracing/profile: Add filter support
Date: Tue, 8 Sep 2009 04:24:56 +0200	[thread overview]
Message-ID: <20090908022454.GE6312@nowhere> (raw)
In-Reply-To: <20090908020117.GC6312@nowhere>

On Tue, Sep 08, 2009 at 04:01:19AM +0200, Frederic Weisbecker wrote:
> On Mon, Sep 07, 2009 at 04:12:53PM +0800, Li Zefan wrote:
> > - add ftrace_profile_set_filter(), to set filter for a profile event
> > 
> > - filter is enabled when profile probe is registered
> > 
> > - filter is disabled when profile probe is unregistered
> > 
> > - in ftrace_profile_##call(), record events only when
> >   filter_match_preds() returns 1
> > 
> > Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
> 
> 
> 
> Well, I feel a bit uncomfortable with this approach.
> 
> The events approach taken by perf is different from ftrace.
> 
> ftrace event activation/deactivation, ring buffer captures and
> filters are all global. That is convenient for kernel tracing
> driven from debugfs files.
> 
> But perf has a per counter instance approach. This means
> that when a tracepoint counter registers a filter, this should
> be private to this tracepoint counter and not propagated to the
> others.
> 
> So this should rely on a kind of per-tracepoint-counter
> attribute, something that should probably be stored in
> struct hw_perf_counter, like:
> 
> --- a/include/linux/perf_counter.h
> +++ b/include/linux/perf_counter.h
> @@ -467,6 +467,7 @@ struct hw_perf_counter {
>                 union { /* software */
>                         atomic64_t      count;
>                         struct hrtimer  hrtimer;
> +                       struct event_filter filter;
>                 };
>         };
>         atomic64_t                      prev_count;
> 
> 
> You may need to get the current perf context that can
> be found in current->perf_counter_ctxp and then iterate
> through the counter_list of this ctx to find the current counter
> attached to this tracepoint (using the event id).
> 
> What is not nice is that we need to iterate in O(n), n being the
> number of tracepoint counters attached to the current counter
> context.
> 
> So to avoid the following costly sequence in the tracing fastpath:
> 
> - deref ctx->current->perf_counter_ctxp
> - walk ctx->counter_list
> - find the counter that matches
> - deref counter->filter and test...
> 
> You could keep the profile_filter field (and profile_filter_active)
> in struct ftrace_event_call but allocate them per cpu and
> write these fields for a given event each time we enter/exit a
> counter context that has a counter that uses this given event.
> 
> That's something we could do by using a struct pmu specific to
> tracepoints, more precisely with enable/disable callbacks that do
> the tracepoint-specific work and then rely on the perf_ops_generic
> pmu callbacks.
> 
> The struct pmu::enable()/disable() callbacks are called each time
> we schedule in/out a task group that has a counter using the given
> pmu, i.e. each time we schedule in/out a counter.
> 
> So you have a struct ftrace_event_call. This event can be used in
> several different counter instances at the same time. But on a given
> cpu, only one of these counters can be currently in use.



I fear I've been confusing here.

Just to be clear:

Say we schedule in a task in cpu 0.
This task has an irq_entry tp counter. It makes no
sense to have two irq_entry counters for the same task, and
if that's possible I guess it should be fixed (unless I'm
missing relevant use cases).

Then when we enter the task, if we have an irq_entry counter
that becomes enabled, we can safely write a shortcut from its
ftrace_event_call structure to the counter filter.

But that must be done per cpu, because another task of the same
task group may be using this counter instance on another cpu.
That's why we need this field to be written per cpu.
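To make the per-cpu shortcut idea concrete, here is a minimal userspace C mock-up. The struct names and fields mirror the proposal above but are stand-ins, not the real kernel API; the per-cpu allocation is simulated with a fixed-size array indexed by cpu:

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4

/* Stand-in for the kernel's struct event_filter. */
struct event_filter { int dummy; };

/* Stand-in for struct ftrace_event_call with the proposed per-cpu fields. */
struct ftrace_event_call {
	struct event_filter *filter_profile[NR_CPUS];	/* per-cpu shortcut */
	int filter_profile_enabled[NR_CPUS];
};

/* Counter enable on a given cpu: publish the shortcut for that cpu only. */
static void tp_counter_enable(struct ftrace_event_call *call,
			      struct event_filter *filter, int cpu)
{
	if (filter) {
		call->filter_profile[cpu] = filter;
		call->filter_profile_enabled[cpu] = 1;
	}
}

/* Counter disable: erase the shortcut on that cpu. */
static void tp_counter_disable(struct ftrace_event_call *call, int cpu)
{
	call->filter_profile_enabled[cpu] = 0;
	call->filter_profile[cpu] = NULL;
}
```

Two counter instances of the same event, scheduled in on different cpus, each publish their own filter without disturbing the other cpu's slot.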



 
> Now if we build a simple struct pmu tp_pmu that mostly relies
> on the perf_ops_generic pmu but also has a specific enable/disable
> pair that calls perf_swcounter_enable/disable and also does:
> 
> /*
>  * We are scheduling this counter on the smp_processor_id() cpu
>  * (we are in atomic context, ctx->lock acquired) then we
>  * can safely write a local (per cpu) shortcut from the filter_profile
>  * field in the event to the counter filter.
>  */
> static int perf_tpcounter_enable(struct perf_counter *counter)
> {
> 	struct event_filter *filter, **shortcut;
> 	int *enable;
> 	struct ftrace_event_call *call;
> 	int cpu = smp_processor_id();
> 
> 	call = find_event_by_id(counter->attr.config);
> 	filter = &counter->hw.filter;
> 
> 	shortcut = per_cpu_ptr(call->filter_profile, cpu);
> 	enable = per_cpu_ptr(call->filter_profile_enabled, cpu);
> 
> 	if (filter is present) {
> 		*enable = 1;
> 		*shortcut = filter;
> 	}
> 
> 	return perf_swcounter_enable(counter);
> }
> 
> /* We schedule out this counter from this cpu, erase the shortcut */
> static void perf_tpcounter_disable(struct perf_counter *counter)
> {
> 	/* ... */
> 	*enable = 0;
> 	*shortcut = NULL;
> 
> 	perf_swcounter_disable(counter);
> }
> 
> static const struct pmu perf_ops_tracepoint = {
> 	.enable		= perf_tpcounter_enable,
> 	.disable	= perf_tpcounter_disable,
> 	.read		= perf_swcounter_read,
> 	.unthrottle	= perf_swcounter_unthrottle,
> };
> 	
> 
> See? Then you can have an O(1) retrieval of the current per-counter
> filter to apply from ftrace_profile_##call().
> 
> I hope I haven't been too confusing...
> 


Thread overview: 26+ messages
2009-09-07  8:11 [PATCH 0/6] perf trace: Add filter support Li Zefan
2009-09-07  8:12 ` [PATCH 1/6] tracing/filters: refactor subsystem filter code Li Zefan
2009-09-07  8:12 ` [PATCH 2/6] tracing/profile: Add filter support Li Zefan
2009-09-08  2:01   ` Frederic Weisbecker
2009-09-08  2:24     ` Frederic Weisbecker [this message]
2009-09-08  8:35     ` Peter Zijlstra
2009-09-08 12:33       ` Frederic Weisbecker
2009-09-07  8:13 ` [PATCH 3/6] tracing/syscalls: Add profile " Li Zefan
2009-09-07  8:13 ` [PATCH 4/6] perf_counter: Add PERF_COUNTER_IOC_SET_FILTER ioctl Li Zefan
2009-09-07 16:44   ` Peter Zijlstra
2009-09-07 16:48     ` Ingo Molnar
2009-09-07 16:55       ` Peter Zijlstra
2009-09-08  0:49         ` Li Zefan
2009-09-08  6:52           ` Ingo Molnar
2009-09-08  8:37           ` Peter Zijlstra
2009-09-08  7:01         ` Tom Zanussi
2009-09-09  2:18           ` Li Zefan
2009-09-10  4:45             ` Tom Zanussi
2009-09-10 23:01           ` Randy Dunlap
2009-09-11  4:08             ` Tom Zanussi
2009-09-07  8:13 ` [PATCH 5/6] perf trace: increase MAX_EVENT_LENGTH Li Zefan
2009-09-07  8:14 ` [PATCH 6/6] perf trace: Add filter support Li Zefan
2009-09-08  0:02 ` [PATCH 0/6] " Frederic Weisbecker
2009-09-08  1:06   ` Li Zefan
2009-09-08  2:12     ` Frederic Weisbecker
2009-09-08  6:53       ` Ingo Molnar
