From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org,
	Arnaldo Carvalho de Melo <acme@redhat.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	David Ahern <dsahern@gmail.com>,
	Namhyung Kim <namhyung@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Wang Nan <wangnan0@huawei.com>
Subject: [PATCH 1/3] perf evsel: Fix probing of precise_ip level for default cycles event
Date: Fri, 16 Jun 2017 16:26:33 -0300
Message-ID: <20170616192635.14535-2-acme@kernel.org>
In-Reply-To: <20170616192635.14535-1-acme@kernel.org>

From: Arnaldo Carvalho de Melo <acme@redhat.com>

Commit 18e7a45af91a ("perf/x86: Reject non sampling events with
precise_ip") made sys_perf_event_open() return -EINVAL for an attribute
with (attr.precise_ip > 0 && attr.sample_period == 0), which is exactly
what the routine used to probe the max precise level passes when no
events were given to 'perf record' or 'perf top', i.e.:

	perf_evsel__new_cycles()
		perf_event_attr__set_max_precise_ip()

The x86 code in x86_pmu_hw_config(), which is reached from
sys_perf_event_open(), does the following since the aforementioned commit:

                /* There's no sense in having PEBS for non sampling events: */
                if (!is_sampling_event(event))
                        return -EINVAL;

This makes the probe fail for cycles:ppp, cycles:pp and cycles:p, so
perf always ends up using just the non-precise cycles variant.

To make sure that this is the case, I tested it, before this patch,
with:

  # perf probe -L x86_pmu_hw_config
  <x86_pmu_hw_config@/home/acme/git/linux/arch/x86/events/core.c:0>
        0  int x86_pmu_hw_config(struct perf_event *event)
        1  {
        2         if (event->attr.precise_ip) {
<SNIP>
       17                 if (event->attr.precise_ip > precise)
       18                         return -EOPNOTSUPP;

                          /* There's no sense in having PEBS for non sampling events: */
       21                 if (!is_sampling_event(event))
       22                         return -EINVAL;
                  }
<SNIP>
  # perf probe x86_pmu_hw_config:22
  Added new events:
    probe:x86_pmu_hw_config (on x86_pmu_hw_config:22)
    probe:x86_pmu_hw_config_1 (on x86_pmu_hw_config:22)

  You can now use it in all perf tools, such as:

        perf record -e probe:x86_pmu_hw_config_1 -aR sleep 1

  # perf trace -e perf_event_open,probe:x86_pmu_hw_config*/max-stack=16/ perf record usleep 1
     0.000 ( 0.015 ms): perf/4150 perf_event_open(attr_uptr: 0x7ffebc8ba110, cpu: -1, group_fd: -1      ) ...
     0.015 (         ): probe:x86_pmu_hw_config:(ffffffff9c0065e1))
                                       x86_pmu_hw_config ([kernel.kallsyms])
                                       hsw_hw_config ([kernel.kallsyms])
                                       x86_pmu_event_init ([kernel.kallsyms])
                                       perf_try_init_event ([kernel.kallsyms])
                                       perf_event_alloc ([kernel.kallsyms])
                                       SYSC_perf_event_open ([kernel.kallsyms])
                                       sys_perf_event_open ([kernel.kallsyms])
                                       do_syscall_64 ([kernel.kallsyms])
                                       return_from_SYSCALL_64 ([kernel.kallsyms])
                                       syscall (/usr/lib64/libc-2.24.so)
                                       perf_event_attr__set_max_precise_ip (/home/acme/bin/perf)
                                       perf_evsel__new_cycles (/home/acme/bin/perf)
                                       perf_evlist__add_default (/home/acme/bin/perf)
                                       cmd_record (/home/acme/bin/perf)
                                       run_builtin (/home/acme/bin/perf)
                                       handle_internal_command (/home/acme/bin/perf)
     0.000 ( 0.021 ms): perf/4150  ... [continued]: perf_event_open()) = -1 EINVAL Invalid argument
     0.023 ( 0.002 ms): perf/4150 perf_event_open(attr_uptr: 0x7ffebc8ba110, cpu: -1, group_fd: -1      ) ...
     0.025 (         ): probe:x86_pmu_hw_config:(ffffffff9c0065e1))
                                       x86_pmu_hw_config ([kernel.kallsyms])
                                       hsw_hw_config ([kernel.kallsyms])
                                       x86_pmu_event_init ([kernel.kallsyms])
                                       perf_try_init_event ([kernel.kallsyms])
                                       perf_event_alloc ([kernel.kallsyms])
                                       SYSC_perf_event_open ([kernel.kallsyms])
                                       sys_perf_event_open ([kernel.kallsyms])
                                       do_syscall_64 ([kernel.kallsyms])
                                       return_from_SYSCALL_64 ([kernel.kallsyms])
                                       syscall (/usr/lib64/libc-2.24.so)
                                       perf_event_attr__set_max_precise_ip (/home/acme/bin/perf)
                                       perf_evsel__new_cycles (/home/acme/bin/perf)
                                       perf_evlist__add_default (/home/acme/bin/perf)
                                       cmd_record (/home/acme/bin/perf)
                                       run_builtin (/home/acme/bin/perf)
                                       handle_internal_command (/home/acme/bin/perf)
     0.023 ( 0.004 ms): perf/4150  ... [continued]: perf_event_open()) = -1 EINVAL Invalid argument
     0.028 ( 0.002 ms): perf/4150 perf_event_open(attr_uptr: 0x7ffebc8ba110, cpu: -1, group_fd: -1      ) ...
     0.030 (         ): probe:x86_pmu_hw_config:(ffffffff9c0065e1))
                                       x86_pmu_hw_config ([kernel.kallsyms])
                                       hsw_hw_config ([kernel.kallsyms])
                                       x86_pmu_event_init ([kernel.kallsyms])
                                       perf_try_init_event ([kernel.kallsyms])
                                       perf_event_alloc ([kernel.kallsyms])
                                       SYSC_perf_event_open ([kernel.kallsyms])
                                       sys_perf_event_open ([kernel.kallsyms])
                                       do_syscall_64 ([kernel.kallsyms])
                                       return_from_SYSCALL_64 ([kernel.kallsyms])
                                       syscall (/usr/lib64/libc-2.24.so)
                                       perf_event_attr__set_max_precise_ip (/home/acme/bin/perf)
                                       perf_evsel__new_cycles (/home/acme/bin/perf)
                                       perf_evlist__add_default (/home/acme/bin/perf)
                                       cmd_record (/home/acme/bin/perf)
                                       run_builtin (/home/acme/bin/perf)
                                       handle_internal_command (/home/acme/bin/perf)
     0.028 ( 0.004 ms): perf/4150  ... [continued]: perf_event_open()) = -1 EINVAL Invalid argument
    41.018 ( 0.012 ms): perf/4150 perf_event_open(attr_uptr: 0x7ffebc8b5dd0, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
    41.065 ( 0.011 ms): perf/4150 perf_event_open(attr_uptr: 0x3c7db78, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
    41.080 ( 0.006 ms): perf/4150 perf_event_open(attr_uptr: 0x3c7db78, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
    41.103 ( 0.010 ms): perf/4150 perf_event_open(attr_uptr: 0x3c4e748, pid: 4151 (perf), group_fd: -1, flags: FD_CLOEXEC) = 4
    41.115 ( 0.006 ms): perf/4150 perf_event_open(attr_uptr: 0x3c4e748, pid: 4151 (perf), cpu: 1, group_fd: -1, flags: FD_CLOEXEC) = 5
    41.122 ( 0.004 ms): perf/4150 perf_event_open(attr_uptr: 0x3c4e748, pid: 4151 (perf), cpu: 2, group_fd: -1, flags: FD_CLOEXEC) = 6
    41.128 ( 0.008 ms): perf/4150 perf_event_open(attr_uptr: 0x3c4e748, pid: 4151 (perf), cpu: 3, group_fd: -1, flags: FD_CLOEXEC) = 8
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.017 MB perf.data (2 samples) ]
  #

I.e. that 'return -EINVAL' in x86_pmu_hw_config() is hit three times,
once for each precise_ip level probed.

So fix it by setting attr.sample_period to 1 just for the probing.

Now, after this patch:

  # perf trace --max-stack=2 -e perf_event_open,probe:x86_pmu_hw_config* perf record usleep 1
  [ perf record: Woken up 1 times to write data ]
     0.000 ( 0.017 ms): perf/8469 perf_event_open(attr_uptr: 0x7ffe36c27d10, pid: -1, cpu: 3, group_fd: -1, flags: FD_CLOEXEC) = 4
                                       syscall (/usr/lib64/libc-2.24.so)
                                       perf_event_open_cloexec_flag (/home/acme/bin/perf)
     0.050 ( 0.031 ms): perf/8469 perf_event_open(attr_uptr: 0x24ebb78, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
                                       syscall (/usr/lib64/libc-2.24.so)
                                       perf_evlist__config (/home/acme/bin/perf)
     0.092 ( 0.040 ms): perf/8469 perf_event_open(attr_uptr: 0x24ebb78, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
                                       syscall (/usr/lib64/libc-2.24.so)
                                       perf_evlist__config (/home/acme/bin/perf)
     0.143 ( 0.007 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, cpu: -1, group_fd: -1           ) = 4
                                       syscall (/usr/lib64/libc-2.24.so)
                                       perf_event_attr__set_max_precise_ip (/home/acme/bin/perf)
     0.161 ( 0.007 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, pid: 8470 (perf), group_fd: -1, flags: FD_CLOEXEC) = 4
                                       syscall (/usr/lib64/libc-2.24.so)
                                       perf_evsel__open (/home/acme/bin/perf)
     0.171 ( 0.005 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, pid: 8470 (perf), cpu: 1, group_fd: -1, flags: FD_CLOEXEC) = 5
                                       syscall (/usr/lib64/libc-2.24.so)
                                       perf_evsel__open (/home/acme/bin/perf)
     0.180 ( 0.007 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, pid: 8470 (perf), cpu: 2, group_fd: -1, flags: FD_CLOEXEC) = 6
                                       syscall (/usr/lib64/libc-2.24.so)
                                       perf_evsel__open (/home/acme/bin/perf)
     0.190 ( 0.005 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, pid: 8470 (perf), cpu: 3, group_fd: -1, flags: FD_CLOEXEC) = 8
                                       syscall (/usr/lib64/libc-2.24.so)
                                       perf_evsel__open (/home/acme/bin/perf)
  [ perf record: Captured and wrote 0.017 MB perf.data (7 samples) ]
  #

The probe called from perf_event_attr__set_max_precise_ip() now works
the first time, with attr.precise_ip = 3, with the following calls being
the per-CPU ones for the cycles:ppp event.

And here is the text from a report and alternative proposed patch by
Thomas-Mich Richter:

 ---

On s390 the counter and sampling facilities do not support a precise IP
skid level and sometimes return EOPNOTSUPP when structure member
precise_ip in struct perf_event_attr is not set to zero.

On s390 the command 'perf record -- true' fails with error EOPNOTSUPP.
This happens only when no events are specified on the command line.

The functions called are
...
  --> perf_evlist__add_default
      --> perf_evsel__new_cycles
          --> perf_event_attr__set_max_precise_ip

The last function determines the value of structure member precise_ip by
invoking the perf_event_open() system call and checking the return code.
The value used in the first successful open becomes precise_ip.

However the value is determined without setting member sample_period and
indicates no sampling.

On s390 the counter facility and sampling facility are different.  The
above procedure determines a precise_ip value of 3 using the counter
facility. Later it uses the sampling facility with a value of 3 and
fails with EOPNOTSUPP.

 ---

v2: Older compilers (e.g. gcc 4.4.7) don't support referencing members
    of unnamed union members in the container struct initialization, so
    move the assignment from:

	struct perf_event_attr attr = {
		...
		.sample_period = 1,
	};

to right after the initializer, as:

	struct perf_event_attr attr = {
		...
	};

	attr.sample_period = 1;
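
To illustrate the v2 point with a self-contained sketch ('struct
mini_attr' and make_probe_attr() are made up for illustration; the real
struct perf_event_attr has many more fields): sample_period and
sample_freq share an unnamed union, so the portable form initializes
only the named members and assigns the union member in a separate
statement.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, cut-down layout mirroring the unnamed union. */
struct mini_attr {
	uint32_t type;
	uint64_t config;
	union {
		uint64_t sample_period;
		uint64_t sample_freq;
	};
};

static struct mini_attr make_probe_attr(void)
{
	/* Only members outside the unnamed union go in the initializer. */
	struct mini_attr attr = {
		.type	= 0,
		.config	= 0,
	};

	/* Assigned separately so that older compilers accept it. */
	attr.sample_period = 1;
	assert(attr.sample_freq == attr.sample_period); /* same storage */
	return attr;
}
```

Because both members occupy the same storage, a value left in
sample_period would also be read back as sample_freq by later code.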

v3: We need to reset .sample_period to 0 to let the users of
perf_evsel__new_cycles() properly set up attr.sample_period or
attr.sample_freq. Reported by Ingo Molnar.

Reported-and-Acked-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
Acked-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: 18e7a45af91a ("perf/x86: Reject non sampling events with precise_ip")
Link: http://lkml.kernel.org/n/tip-yv6nnkl7tzqocrm0hl3x7vf1@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/tests/task-exit.c |  2 +-
 tools/perf/util/evsel.c      | 12 ++++++++++++
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/tools/perf/tests/task-exit.c b/tools/perf/tests/task-exit.c
index 32873ec91a4e..cf00ebad2ef5 100644
--- a/tools/perf/tests/task-exit.c
+++ b/tools/perf/tests/task-exit.c
@@ -83,7 +83,7 @@ int test__task_exit(int subtest __maybe_unused)
 
 	evsel = perf_evlist__first(evlist);
 	evsel->attr.task = 1;
-	evsel->attr.sample_freq = 0;
+	evsel->attr.sample_freq = 1;
 	evsel->attr.inherit = 0;
 	evsel->attr.watermark = 0;
 	evsel->attr.wakeup_events = 1;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index e4f7902d5afa..cda44b0e821c 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -273,8 +273,20 @@ struct perf_evsel *perf_evsel__new_cycles(void)
 	struct perf_evsel *evsel;
 
 	event_attr_init(&attr);
+	/*
+	 * Unnamed union member, not supported as struct member named
+	 * initializer in older compilers such as gcc 4.4.7
+	 *
+	 * Just for probing the precise_ip:
+	 */
+	attr.sample_period = 1;
 
 	perf_event_attr__set_max_precise_ip(&attr);
+	/*
+	 * Now let the usual logic to set up the perf_event_attr defaults
+	 * to kick in when we return and before perf_evsel__open() is called.
+	 */
+	attr.sample_period = 0;
 
 	evsel = perf_evsel__new(&attr);
 	if (evsel == NULL)
-- 
2.9.4
