linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Hemant Kumar <hemant@linux.vnet.ibm.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	linux-kernel@vger.kernel.org,
	Adrian Hunter <adrian.hunter@intel.com>,
	Ingo Molnar <mingo@redhat.com>, Jiri Olsa <jolsa@kernel.org>,
	Namhyung Kim <namhyung@kernel.org>, Borislav Petkov <bp@suse.de>
Subject: Re: Re: [RFC PATCH perf/core v2 00/16] perf-probe --cache and SDT support
Date: Thu, 23 Jul 2015 11:01:27 -0300	[thread overview]
Message-ID: <20150723140127.GD3152@kernel.org> (raw)
In-Reply-To: <55B0E872.8030206@hitachi.com>

Em Thu, Jul 23, 2015 at 10:13:22PM +0900, Masami Hiramatsu escreveu:
> On 2015/07/22 23:12, Hemant Kumar wrote:
> > On 07/17/2015 08:51 AM, Masami Hiramatsu wrote:
> >> On 2015/07/16 12:13, Hemant Kumar wrote:
> > The idea behind '%' was to identify the SDT events and take a different path
> > to lookup through the cache, put a probe, record and then delete the probe.
> > Or, do you want "perf record" to record any event this way (not just an sdt
> > event).
 
> I see, but I think that is not good by following reasons,
 
> - when we record event with "-e %provider:event", it will be shown as
>   "provider:event"
> - if perf-list shows the SDT(cached) events as "%provider:event", that
>   will not match the recorded result.
> - it is somewhat fragile that we temporary add the SDT event and remove it
>   after record, because the event will not hide from ftrace users (this
>   means that we'll fail removing the event by -EBUSY if someone use it
>   via ftrace)

We should avoid that, if we say record event "foo:bar", then we should
consistently show "foo:bar" everywhere this is referenced.

> - if we set SDT events perf-probe, it will be shown as "provider:event" name
>   because "%" will be rejected by ftrace. In that case, what the perf-list show
>   those events, both of %provider:event and provider:event ?
 
> thus I pushed the "%" as a "special remembering mark" only for looking
> up the event from cache by perf-probe.
 
> So I'd like to suggest that the following behavior
 
> 1) perf-list shows the cached-with-name and SDT events as Tracepoint events
>   even if it is not yet probed.

I can agree with 'perf list' showing what can be used as events, and
SDTs, AFAIK, match that definition, i.e. we have somewhere (in the DSOs,
right?) information about where to ask for an event to be enabled.

If there are details on how that needs to be obtained, then passed to
the kernel somehow to then become really, really accessible, then these
details are completely internal.

> # perf list
> 
> List of pre-defined events (to be used in -e):
> ...
>   libc:memory_heap_new                             [Tracepoint event]

Is it like this or is it like [ku]probes where we already have a
namespace qualifier, i.e.:

[root@zoo ~]# perf probe icmp_rcv
Added new event:
  probe:icmp_rcv       (on icmp_rcv)

You can now use it in all perf tools, such as:

	perf record -e probe:icmp_rcv -aR sleep 1

[root@zoo ~]#

[root@zoo ~]# perf probe /lib64/libc-2.20.so malloc
Added new events:
  probe_libc:malloc    (on malloc in /lib64/libc-2.20.so)
  probe_libc:malloc_1  (on malloc in /lib64/libc-2.20.so)
  probe_libc:malloc_2  (on malloc in /lib64/libc-2.20.so)
  probe_libc:malloc_3  (on malloc in /lib64/libc-2.20.so)
  probe_libc:malloc_4  (on malloc in /lib64/libc-2.20.so)
  probe_libc:malloc_5  (on malloc in /lib64/libc-2.20.so)
  probe_libc:malloc_6  (on malloc in /lib64/libc-2.20.so)
  probe_libc:malloc_7  (on malloc in /lib64/libc-2.20.so)
  probe_libc:malloc_8  (on malloc in /lib64/libc-2.20.so)
  probe_libc:malloc_9  (on malloc in /lib64/libc-2.20.so)
  probe_libc:malloc_10 (on malloc in /lib64/libc-2.20.so)
  probe_libc:malloc_11 (on malloc in /lib64/libc-2.20.so)
  probe_libc:malloc_12 (on malloc in /lib64/libc-2.20.so)
  probe_libc:malloc_13 (on malloc in /lib64/libc-2.20.so)
  probe_libc:malloc_14 (on malloc in /lib64/libc-2.20.so)

You can now use it in all perf tools, such as:

	perf record -e probe_libc:malloc_14 -aR sleep 1

[root@zoo ~]#

"probe" for kernel events, "probe_%s" % DSO basename for userspace
events.

Why not continue with that and have SDTs use the probe_%s: namespace?
Sorry if this was already discussed here...

If there is some ambiguity, that can be resolved by explicitely setting
a new name, 'perf probe' has provision for that, right? I.e.:

[root@zoo ~]# perf probe /lib64/libc-2.20.so another_name=malloc
Added new events:
<SNIP>
  probe_libc:another_name_14 (on malloc in /lib64/libc-2.20.so)

You can now use it in all perf tools, such as:

	perf record -e probe_libc:another_name_14 -aR sleep 1

[root@zoo ~]#

> ...
>   probes:myevent                                   [Tracepoint event]
> ...
 
> 2) perf-record -e with no-probed event should try to set up the given probe
>  by using perf-probe. It is possible to remove that the probe after recording,
>  but also ignore if it fails by -EBUSY. (anyway, there is no difference for
>  users)

Right, being able to add new probes _without_ calling 'perf probe', but
instead functions used by 'perf probe' is something the eBPF does (I'm
almost getting there... :) ) and that I want to do in other tools, like
'trace' as well...

There are permission problems about how to add new probes and how to
_enable_ them, i.e. I think it is ok to allow users to ask for
probe:foo, if they are monitoring just their workloads...

[root@zoo ~]# perf probe 'vfs_getname=getname_flags:72 pathname=filename:string'
Added new event:
  probe:vfs_getname    (on getname_flags:72 with pathname=filename:string)

You can now use it in all perf tools, such as:

	perf record -e probe:vfs_getname -aR sleep 1

[root@zoo ~]#

Then, as !root:

  [acme@zoo linux]$ trace cat /etc/passwd
  Error:	No permissions to read /sys/kernel/debug/tracing/events/raw_syscalls/sys_(enter|exit)
  Hint:	Try 'sudo mount -o remount,mode=755 /sys/kernel/debug'

By default, !root users can't see debugfs, so, no dice in trying to use
the raw_syscalls tracepoints, bummer, then:

  [acme@zoo linux]$ sudo mount -o remount,mode=755 /sys/kernel/debug
  [sudo] password for acme: 
  [acme@zoo linux]$ trace cat /etc/passwd
  Error:	No permissions to read /sys/kernel/debug/tracing/events/raw_syscalls/sys_(enter|exit)
  Hint:	Try 'sudo mount -o remount,mode=755 /sys/kernel/debug/tracing'

Ouch, now this tracefs thing inside debugfs, at least sudo doesn't asks me for the
password this time!

But then, since 'perf trace' knows there is a probe:vfs_getname syscall in place, it
will try to use it, to monitor a workload it is about to start, but:

  [acme@zoo linux]$ sudo mount -o remount,mode=755 /sys/kernel/debug/tracing
  [acme@zoo linux]$ trace cat /etc/passwd
  Error:	Operation not permitted.
  Hint:	Check /proc/sys/kernel/perf_event_paranoid setting.
  Hint:	For system wide tracing it needs to be set to -1.
  Hint:	Try: 'sudo sh -c "echo -1 > /proc/sys/kernel/perf_event_paranoid"'
  Hint:	The current value is 1.
  [acme@zoo linux]$

This needs refining, i.e. I should just warn that albeit "probe:vfs_getname" is available,
it can't be used, unless we open the doors wide.

Sorry for the digression, but these permission issues will hit us with SDT as well, no?
 
> This rule will solve the contradiction between the event name on recorded
> data and listed events. However, as we discussed there are other clashes.

> A) clash among binaries: Since the binary builders can freely use the
> provider name, it is possible to clash to other binaries' SDTs.

Yes, one way to disambiguate is to use the buildid, i.e. content based, when/if
the need arises. We should _always_ store the build-id, together with any probe
used, so that we can bail out when trying to run those with a non matching DSO
(kernel, module, library, whatever).

When specifying some SDT that is ambiguous, we should bail out and ask for further
namespacing, like:

[acme@zoo linux]$ git show --oneline 25b6
error: short SHA1 25b6 is ambiguous.
error: short SHA1 25b6 is ambiguous.
fatal: ambiguous argument '25b6': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
[acme@zoo linux]$ git show --oneline 25b67
error: short SHA1 25b67 is ambiguous.
error: short SHA1 25b67 is ambiguous.
fatal: ambiguous argument '25b67': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'

It can be via build-id or by asking for the full pathname to the DSO, etc.
 
> B) clash among different versions: Of course the different versions of binaries
> can be co-exist on the system. Those usually have the same SDTs and same
> basename, just different build-ids.

Right, in that case the build-id or the full pathname needs to be specified somehow

> These issues are not solved by using "%" because it happens among SDTs.
> So we need to find another way to distinguish the SDTs.

>From what I've read so far (not much): agreed.

- Arnaldo

  reply	other threads:[~2015-07-23 14:01 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-07-15  9:13 [RFC PATCH perf/core v2 00/16] perf-probe --cache and SDT support Masami Hiramatsu
2015-07-15  9:14 ` [RFC PATCH perf/core v2 01/16] perf probe: Simplify __add_probe_trace_events code Masami Hiramatsu
2015-07-21  9:34   ` [tip:perf/core] " tip-bot for Masami Hiramatsu
2015-07-15  9:14 ` [RFC PATCH perf/core v2 02/16] perf probe: Move ftrace probe-event operations to probe-file.c Masami Hiramatsu
2015-07-21  9:35   ` [tip:perf/core] " tip-bot for Masami Hiramatsu
2015-07-15  9:14 ` [RFC PATCH perf/core v2 03/16] perf probe: Use strbuf for making strings in probe-event.c Masami Hiramatsu
2015-07-17  7:42   ` Namhyung Kim
2015-07-17 10:16     ` Masami Hiramatsu
2015-07-15  9:14 ` [RFC PATCH perf/core v2 04/16] perf-buildid-cache: Use path/to/bin/buildid/elf instead of path/to/bin/buildid Masami Hiramatsu
2015-07-15  9:14 ` [RFC PATCH perf/core v2 05/16] perf buildid: Use SBUILD_ID_SIZE macro Masami Hiramatsu
2015-07-20 18:47   ` Arnaldo Carvalho de Melo
2015-07-21  9:35   ` [tip:perf/core] " tip-bot for Masami Hiramatsu
2015-07-15  9:14 ` [RFC PATCH perf/core v2 06/16] perf buildid: Introduce sysfs/filename__sprintf_build_id Masami Hiramatsu
2015-07-15  9:14 ` [RFC PATCH perf/core v2 07/16] perf: Add lsdir to read a directory Masami Hiramatsu
2015-07-15  9:14 ` [RFC PATCH perf/core v2 08/16] perf-buildid-cache: Use lsdir for looking up buildid caches Masami Hiramatsu
2015-07-15  9:14 ` [RFC PATCH perf/core v2 09/16] perf probe: Add --cache option to cache the probe definitions Masami Hiramatsu
2015-07-15  9:15 ` [RFC PATCH perf/core v2 10/16] perf probe: Use cache entry if possible Masami Hiramatsu
2015-07-15  9:15 ` [RFC PATCH perf/core v2 11/16] perf probe: Show all cached probes Masami Hiramatsu
2015-07-15  9:15 ` [RFC PATCH perf/core v2 12/16] perf probe: Remove caches when --cache is given Masami Hiramatsu
2015-07-15  9:15 ` [RFC PATCH perf/core v2 13/16] perf/sdt: ELF support for SDT Masami Hiramatsu
2015-07-15  9:15 ` [RFC PATCH perf/core v2 14/16] perf probe: Add group name support Masami Hiramatsu
2015-07-19 10:16   ` Namhyung Kim
2015-07-20  4:48     ` Masami Hiramatsu
2015-07-20 15:31       ` Namhyung Kim
2015-07-15  9:15 ` [RFC PATCH perf/core v2 15/16] perf buildid-cache: Scan and import user SDT events to probe cache Masami Hiramatsu
2015-07-19 10:46   ` Namhyung Kim
2015-07-20  3:19     ` Masami Hiramatsu
2015-07-20 15:52       ` Namhyung Kim
2015-07-21 10:42         ` Masami Hiramatsu
2015-07-15  9:15 ` [RFC PATCH perf/core v2 16/16] perf probe: Accept %sdt and %cached event name Masami Hiramatsu
2015-07-19 10:53   ` Namhyung Kim
2015-07-20  3:03     ` Masami Hiramatsu
2015-07-16  3:13 ` [RFC PATCH perf/core v2 00/16] perf-probe --cache and SDT support Hemant Kumar
2015-07-17  3:21   ` Masami Hiramatsu
2015-07-19  4:24     ` Namhyung Kim
2015-07-20  5:47       ` Brendan Gregg
2015-07-20 16:20         ` Namhyung Kim
2015-07-21 10:34           ` Masami Hiramatsu
2015-07-22 14:12     ` Hemant Kumar
2015-07-23 13:13       ` Masami Hiramatsu
2015-07-23 14:01         ` Arnaldo Carvalho de Melo [this message]
2015-07-23 16:24           ` Masami Hiramatsu
2015-07-23 16:42             ` Arnaldo Carvalho de Melo
2015-07-24  7:55             ` Namhyung Kim
2015-07-24 15:52               ` Arnaldo Carvalho de Melo
2015-07-25  0:51                 ` Masami Hiramatsu
2015-07-27 14:03                 ` Re: " Namhyung Kim
2015-07-27 15:16                   ` Arnaldo Carvalho de Melo
2015-07-28  0:42                     ` Masami Hiramatsu
2015-07-28 13:45                       ` Arnaldo Carvalho de Melo
2015-07-20 18:36 ` Arnaldo Carvalho de Melo
2015-07-20 18:42   ` Arnaldo Carvalho de Melo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150723140127.GD3152@kernel.org \
    --to=acme@kernel.org \
    --cc=a.p.zijlstra@chello.nl \
    --cc=adrian.hunter@intel.com \
    --cc=bp@suse.de \
    --cc=hemant@linux.vnet.ibm.com \
    --cc=jolsa@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=masami.hiramatsu.pt@hitachi.com \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).