Re: [PATCH 2/6] tracing/profile: Add filter support

The Linux Kernel Mailing List
 help / color / mirror / Atom feed

From: Frederic Weisbecker <fweisbec@gmail.com>
To: Li Zefan <lizf@cn.fujitsu.com>, Ingo Molnar <mingo@elte.hu>,
	Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>,
	Tom Zanussi <tzanussi@gmail.com>, Jason Baron <jbaron@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/6] tracing/profile: Add filter support
Date: Tue, 8 Sep 2009 04:01:19 +0200	[thread overview]
Message-ID: <20090908020117.GC6312@nowhere> (raw)
In-Reply-To: <4AA4C085.5050006@cn.fujitsu.com>

On Mon, Sep 07, 2009 at 04:12:53PM +0800, Li Zefan wrote:
> - add ftrace_profile_set_filter(), to set filter for a profile event
> 
> - filter is enabled when profile probe is registered
> 
> - filter is disabled when profile probe is unregistered
> 
> - in ftrace_profile_##call(), record events only when
>   filter_match_preds() returns 1
> 
> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>

Well, I feel a bit uncomfortable with this approach.

The events approach taken by perf is different from ftrace.

ftrace events activation/deactivation, ring buffer captures,
filters are all globals. And this is nice to perform kernel
tracing from debugfs files.

But perf has a per counter instance approach. This means
that when a tracepoint counter registers a filter, this should
be private to this tracepoint counter and not propagated to the
others.

So this should rely on a kind of per tracepoint counter
attribute, something that we should probably be stored in
the struct hw_perf_counter like:

--- a/include/linux/perf_counter.h
+++ b/include/linux/perf_counter.h
@@ -467,6 +467,7 @@ struct hw_perf_counter {
                union { /* software */
                        atomic64_t      count;
                        struct hrtimer  hrtimer;
+                       struct event_filter filter;
                };
        };
        atomic64_t                      prev_count;

You may need to get the current perf context that can
be found in current->perf_counter_ctxp and then iterate
through the counter_list of this ctx to find the current counter
attached to this tracepoint (using the event id).

What is not nice is that we need to iterate in O(n), n beeing the
number of tracepoint counters attached to the current counter
context.

So to avoid the following costly sequence in the tracing fastpath:

- deref ctx->current->perf_counter_ctxp
- list every ctx->counter_list
- find the counter that matches
- deref counter->filter and test...

You could keep the profile_filter field (and profile_filter_active)
in struct ftrace_event_call but allocate them per cpu and
write these fields for a given event each time we enter/exit a
counter context that has a counter that uses this given event.

That's something we could do by using a struct pmu specific for
tracepoints. More precisely with enable/disable callbacks that would do
specific things and then relay on the perf_ops_generic pmu
callbacks.

the struct pmu::enable()/disable() callbacks are functions that are called
each time we schedule in/out a task group that has a counter that
uses the given pmu.
Ie: they are called each time we schedule in/out a counter.

So you have a struct ftrace_event_call. This event can be used in
several different counters instance at the same time. But in a given cpu,
only one of these counters can be currently in use.

Now if we build a simple struct pmu tp_pmu that mostly relays
on the perf_ops_generic pmu but also have a specific enable/disable
pair that calls perf_swcounter_enable/disable and also does:

/*
 * We are scheduling this counter on the smp_processor_id() cpu
 * (we are in atomic context, ctx->lock acquired) then we
 * can safely write a local (per cpu) shortcut from the filter_profile
 * field in the event to the counter filter.
 */
static int perf_tpcounter_enable(struct perf_counter *counter)
{
	struct event_filter *filter, **shortcut;
	int *enable;
	struct ftrace_event_call *call;
	int cpu = smp_processor_if();

	call = find_event_by_id(counter->attr.config);
	filter = &counter->hw.filter;

	shortcut = &per_cpu_ptr(call->filter_profile, cpu)
	enable = &per_cpu_var(call->filter_profile_enabled, cpu)

	if (filter is present) {
		*enable = 1;
		*shortcut = filter;
	}

	return perf_swcounter_enable(counter);
}

/* We schedule out this counter from this cpu, erase the shortcut */
static void perf_tpcounter_disable(struct perf_counter *counter)
{
		/* ... */
		enable = 0;
		shortcut = NULL;

		perf_swcounter_disable(counter);
}

static const struct pmu perf_ops_tracepoint = {
	.enable		= perf_tpcounter_enable,
	.disable	= perf_tpcounter_disable,
	.read		= perf_swcounter_read,
	.unthrottle	= perf_swcounter_unthrottle,
};

See? Then you can have a O(1) retrieval of the current per
counter filter to apply from ftrace_profile_##call()

I hope I haven't been too much confusing...

next prev parent reply	other threads:[~2009-09-08  2:01 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-09-07  8:11 [PATCH 0/6] perf trace: Add filter support Li Zefan
2009-09-07  8:12 ` [PATCH 1/6] tracing/filters: refactor subsystem filter code Li Zefan
2009-09-07  8:12 ` [PATCH 2/6] tracing/profile: Add filter support Li Zefan
2009-09-08  2:01   ` Frederic Weisbecker [this message]
2009-09-08  2:24     ` Frederic Weisbecker
2009-09-08  8:35     ` Peter Zijlstra
2009-09-08 12:33       ` Frederic Weisbecker
2009-09-07  8:13 ` [PATCH 3/6] tracing/syscalls: Add profile " Li Zefan
2009-09-07  8:13 ` [PATCH 4/6] perf_counter: Add PERF_COUNTER_IOC_SET_FILTER ioctl Li Zefan
2009-09-07 16:44   ` Peter Zijlstra
2009-09-07 16:48     ` Ingo Molnar
2009-09-07 16:55       ` Peter Zijlstra
2009-09-08  0:49         ` Li Zefan
2009-09-08  6:52           ` Ingo Molnar
2009-09-08  8:37           ` Peter Zijlstra
2009-09-08  7:01         ` Tom Zanussi
2009-09-09  2:18           ` Li Zefan
2009-09-10  4:45             ` Tom Zanussi
2009-09-10 23:01           ` Randy Dunlap
2009-09-11  4:08             ` Tom Zanussi
2009-09-07  8:13 ` [PATCH 5/6] perf trace: increase MAX_EVENT_LENGTH Li Zefan
2009-09-07  8:14 ` [PATCH 6/6] perf trace: Add filter support Li Zefan
2009-09-08  0:02 ` [PATCH 0/6] " Frederic Weisbecker
2009-09-08  1:06   ` Li Zefan
2009-09-08  2:12     ` Frederic Weisbecker
2009-09-08  6:53       ` Ingo Molnar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090908020117.GC6312@nowhere \
    --to=fweisbec@gmail.com \
    --cc=jbaron@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lizf@cn.fujitsu.com \
    --cc=mingo@elte.hu \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tzanussi@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox