From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	Arnaldo Carvalho de Melo <acme@redhat.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	David Ahern <dsahern@gmail.com>,
	Hendrik Brueckner <brueckner@linux.vnet.ibm.com>,
	Jiri Olsa <jolsa@kernel.org>, Namhyung Kim <namhyung@kernel.org>,
	Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>,
	Stephane Eranian <eranian@google.com>,
	Thomas Richter <tmricht@linux.vnet.ibm.com>,
	Wang Nan <wangnan0@huawei.com>
Subject: [PATCH 14/54] perf trace: Support setting cgroups as targets
Date: Thu,  8 Mar 2018 16:49:49 -0300
Message-ID: <20180308195029.14991-15-acme@kernel.org>
In-Reply-To: <20180308195029.14991-1-acme@kernel.org>

From: Arnaldo Carvalho de Melo <acme@redhat.com>

One can either set a default cgroup to be used by all events, or set
cgroups with the same behaviour as 'perf stat' and 'perf record', i.e.
'-G A' sets the cgroup for the events defined so far in the command line.

On my main machine, with a kvm instance running a rhel6 guinea pig,
I have:

  # ls -la /sys/fs/cgroup/perf_event/ | grep drw
  drwxr-xr-x. 14 root root 360 Mar  6 12:04 ..
  drwxr-xr-x.  3 root root   0 Mar  6 15:05 machine.slice
  #

So I can go ahead and use that cgroup hierarchy. Say, let's see which
syscalls emitted by threads in that 'machine.slice' hierarchy are
taking more than 100ms:

  # perf trace --duration 100 -G machine.slice
     0.188 (249.850 ms): CPU 0/KVM/23744 ioctl(fd: 16<anon_inode:kvm-vcpu:0>, cmd: KVM_RUN) = 0
   250.274 (249.743 ms): CPU 0/KVM/23744 ioctl(fd: 16<anon_inode:kvm-vcpu:0>, cmd: KVM_RUN) = 0
   500.224 (249.755 ms): CPU 0/KVM/23744 ioctl(fd: 16<anon_inode:kvm-vcpu:0>, cmd: KVM_RUN) = 0
   750.097 (249.934 ms): CPU 0/KVM/23744 ioctl(fd: 16<anon_inode:kvm-vcpu:0>, cmd: KVM_RUN) = 0
  1000.244 (249.780 ms): CPU 0/KVM/23744 ioctl(fd: 16<anon_inode:kvm-vcpu:0>, cmd: KVM_RUN) = 0
  1250.197 (249.796 ms): CPU 0/KVM/23744 ioctl(fd: 16<anon_inode:kvm-vcpu:0>, cmd: KVM_RUN) = 0
  1500.124 (249.859 ms): CPU 0/KVM/23744 ioctl(fd: 16<anon_inode:kvm-vcpu:0>, cmd: KVM_RUN) = 0
  1750.076 (172.900 ms): CPU 0/KVM/23744 ioctl(fd: 16<anon_inode:kvm-vcpu:0>, cmd: KVM_RUN) = 0
   902.570 (1021.116 ms): qemu-system-x8/23667 ppoll(ufds: 0x558151e03180, nfds: 74, tsp: 0x7ffc00cd0900, sigsetsize: 8) = 1
  1923.825 (305.133 ms): qemu-system-x8/23667 ppoll(ufds: 0x558151e03180, nfds: 74, tsp: 0x7ffc00cd0900, sigsetsize: 8) = 1
  2000.172 (229.002 ms): CPU 0/KVM/23744 ioctl(fd: 16<anon_inode:kvm-vcpu:0>, cmd: KVM_RUN) = 0
^C  #

If we look inside that cgroup hierarchy we get:

  # ls -la /sys/fs/cgroup/perf_event/machine.slice/ | grep drw
  drwxr-xr-x. 3 root root 0 Mar  6 15:05 .
  drwxr-xr-x. 2 root root 0 Mar  6 16:16 machine-qemu\x2d2\x2drhel6.sandy.scope
  #

There is just one, but let's say there were more and we wanted to see
5 seconds worth of syscall summary for the threads in that cgroup:

  # perf trace --summary -G machine.slice/machine-qemu\\x2d2\\x2drhel6.sandy.scope/ -a sleep 5

   Summary of events:

     qemu-system-x86 (23667), 143858 events, 24.2%

     syscall            calls    total       min       avg       max      stddev
                                 (msec)    (msec)    (msec)    (msec)        (%)
     --------------- -------- --------- --------- --------- ---------     ------
     ppoll              28492  4348.631     0.000     0.153    11.616      1.05%
     futex              19661   140.801     0.001     0.007     2.993      3.20%
     read               18440    68.084     0.001     0.004     1.653      4.33%
     ioctl               5387    24.768     0.002     0.005     0.134      1.62%

     CPU 0/KVM (23744), 449455 events, 75.8%

     syscall            calls    total       min       avg       max      stddev
                                 (msec)    (msec)    (msec)    (msec)        (%)
     --------------- -------- --------- --------- --------- ---------     ------
     ioctl             148364  3401.812     0.000     0.023    11.801      1.15%
     futex              36131   404.127     0.001     0.011     7.377      2.63%
     writev             29452   339.688     0.003     0.012     1.740      1.36%
     write              11315    45.992     0.001     0.004     0.105      1.10%

  #
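
The argument given to -G in the command above is the cgroup directory
path relative to /sys/fs/cgroup/perf_event/, with each backslash doubled
so that the shell passes it through literally. A minimal sketch of that
transformation follows; the helper names cgroup_name_from_path() and
shell_escape_backslashes() are hypothetical and not part of perf, and the
mount point is the one seen on this machine, other systems may differ:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mount point used in the examples above; an assumption, not a perf constant. */
#define PERF_EVENT_ROOT "/sys/fs/cgroup/perf_event/"

/*
 * Hypothetical helper: return the part of 'path' that follows the
 * perf_event cgroup root, i.e. what one would pass to -G, or NULL if
 * 'path' is not under that root.
 */
static const char *cgroup_name_from_path(const char *path)
{
	size_t len = strlen(PERF_EVENT_ROOT);

	assert(path != NULL);
	if (strncmp(path, PERF_EVENT_ROOT, len) != 0)
		return NULL;
	return path + len;
}

/*
 * Hypothetical helper: double every backslash so that an unquoted shell
 * word expands back to the literal directory name.
 * Returns 0 on success, -1 if 'out' is too small.
 */
static int shell_escape_backslashes(const char *in, char *out, size_t outsz)
{
	size_t o = 0;

	assert(in != NULL && out != NULL);
	if (outsz == 0)
		return -1;

	for (; *in; in++) {
		size_t need = (*in == '\\') ? 2 : 1;

		/* Always keep one byte free for the terminating NUL. */
		if (o + need >= outsz)
			return -1;
		if (*in == '\\')
			out[o++] = '\\';
		out[o++] = *in;
	}
	out[o] = '\0';
	return 0;
}
```

Feeding it the scope directory listed above yields the
'machine.slice/machine-qemu\\x2d2\\x2drhel6.sandy.scope' string used on
the command line. Single-quoting the name in the shell would work as
well and avoids the doubling.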

See the documentation about how to set more than one cgroup for
different events in the same command line.
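
To illustrate how the position of -G relative to -e matters, here is a
simplified, self-contained model of the ordering rules described in this
patch. It is only a sketch: the model_* names are hypothetical, cgroups
are plain strings instead of refcounted 'struct cgroup' instances, and it
does not reproduce parse_cgroups(), which also accepts comma separated
lists of cgroup names:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_EVENTS 8

/* Hypothetical stand-in for an evsel: an event name and its cgroup name. */
struct model_event {
	const char *name;
	const char *cgroup;	/* NULL: no explicit cgroup set */
};

struct model_trace {
	struct model_event events[MAX_EVENTS];
	size_t nr_events;
	const char *default_cgroup;	/* plays the role of trace->cgroup */
};

/* Models '-e NAME': append an event that has no cgroup yet. */
static void model_add_event(struct model_trace *t, const char *name)
{
	assert(t->nr_events < MAX_EVENTS);
	t->events[t->nr_events].name = name;
	t->events[t->nr_events].cgroup = NULL;
	t->nr_events++;
}

/*
 * Models '-G NAME': with no events defined so far it becomes the default
 * cgroup, otherwise it is set on the events defined so far that still
 * have no cgroup.
 */
static void model_set_cgroup(struct model_trace *t, const char *name)
{
	size_t i;

	if (t->nr_events == 0) {
		t->default_cgroup = name;
		return;
	}
	for (i = 0; i < t->nr_events; i++)
		if (t->events[i].cgroup == NULL)
			t->events[i].cgroup = name;
}

/*
 * Models the relevant part of trace__run(): the syscall event is added
 * by the tool itself, then the default cgroup, if any, is applied only
 * to events that are still without a cgroup.
 */
static void model_run(struct model_trace *t)
{
	size_t i;

	model_add_event(t, "raw_syscalls:sys_enter");
	if (t->default_cgroup == NULL)
		return;
	for (i = 0; i < t->nr_events; i++)
		if (t->events[i].cgroup == NULL)
			t->events[i].cgroup = t->default_cgroup;
}

/* Look up the cgroup of an event by name; NULL if unset or not found. */
static const char *model_cgroup_of(const struct model_trace *t, const char *name)
{
	size_t i;

	for (i = 0; i < t->nr_events; i++)
		if (strcmp(t->events[i].name, name) == 0)
			return t->events[i].cgroup;
	return NULL;
}
```

Calling model_set_cgroup("A"), model_add_event("sched:sched_switch"),
model_set_cgroup("B") and then model_run() corresponds to
'perf trace -G A -e sched:*switch -G B': in the model the syscall event
ends up in 'A' and sched:sched_switch in 'B'.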

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-t126jh4occqvu0xdqlcjygex@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-trace.txt | 25 +++++++++++++++++
 tools/perf/builtin-trace.c              | 50 +++++++++++++++++++++++++++++++++
 2 files changed, 75 insertions(+)

diff --git a/tools/perf/Documentation/perf-trace.txt b/tools/perf/Documentation/perf-trace.txt
index 33a88e984e66..5a7035c5c523 100644
--- a/tools/perf/Documentation/perf-trace.txt
+++ b/tools/perf/Documentation/perf-trace.txt
@@ -63,6 +63,31 @@ filter out the startup phase of the program, which is often very different.
 --uid=::
         Record events in threads owned by uid. Name or number.
 
+-G::
+--cgroup::
+	Record events in threads in a cgroup.
+
+	Look for cgroups to set at the /sys/fs/cgroup/perf_event directory, then
+	remove the /sys/fs/cgroup/perf_event/ part and try:
+
+		perf trace -G A -e sched:*switch
+
+	Will set all raw_syscalls:sys_{enter,exit}, pgfault, vfs_getname, etc
+	_and_ sched:sched_switch to the 'A' cgroup, while:
+
+		perf trace -e sched:*switch -G A
+
+	will only set the sched:sched_switch event to the 'A' cgroup, all the
+	other events (raw_syscalls:sys_{enter,exit}, etc) are left "without"
+	a cgroup (on the root cgroup, sys wide, etc).
+
+	Multiple cgroups:
+
+		perf trace -G A -e sched:*switch -G B
+
+	the syscall ones go to the 'A' cgroup, the sched:sched_switch goes
+	to the 'B' cgroup.
+
 --filter-pids=::
 	Filter out events for these pids and for 'trace' itself (comma separated list).
 
diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 1a93debc1e8d..5b81060a8117 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -19,6 +19,7 @@
 #include <traceevent/event-parse.h>
 #include <api/fs/tracing_path.h>
 #include "builtin.h"
+#include "util/cgroup.h"
 #include "util/color.h"
 #include "util/debug.h"
 #include "util/env.h"
@@ -83,6 +84,7 @@ struct trace {
 	struct perf_evlist	*evlist;
 	struct machine		*host;
 	struct thread		*current;
+	struct cgroup		*cgroup;
 	u64			base_time;
 	FILE			*output;
 	unsigned long		nr_events;
@@ -2370,6 +2372,34 @@ static int trace__run(struct trace *trace, int argc, const char **argv)
 				   trace__sched_stat_runtime))
 		goto out_error_sched_stat_runtime;
 
+	/*
+	 * If a global cgroup was set, apply it to all the events without an
+	 * explicit cgroup. I.e.:
+	 *
+	 * 	trace -G A -e sched:*switch
+	 *
+	 * Will set all raw_syscalls:sys_{enter,exit}, pgfault, vfs_getname, etc
+	 * _and_ sched:sched_switch to the 'A' cgroup, while:
+	 *
+	 * trace -e sched:*switch -G A
+	 *
+	 * will only set the sched:sched_switch event to the 'A' cgroup, all the
+	 * other events (raw_syscalls:sys_{enter,exit}, etc) are left "without"
+	 * a cgroup (on the root cgroup, sys wide, etc).
+	 *
+	 * Multiple cgroups:
+	 *
+	 * trace -G A -e sched:*switch -G B
+	 *
+	 * the syscall ones go to the 'A' cgroup, the sched:sched_switch goes
+	 * to the 'B' cgroup.
+	 *
+	 * evlist__set_default_cgroup() grabs a reference of the passed cgroup
+	 * only for the evsels still without a cgroup, i.e. evsel->cgroup == NULL.
+	 */
+	if (trace->cgroup)
+		evlist__set_default_cgroup(trace->evlist, trace->cgroup);
+
 	err = perf_evlist__create_maps(evlist, &trace->opts.target);
 	if (err < 0) {
 		fprintf(trace->output, "Problems parsing the target to trace, check your options!\n");
@@ -2540,6 +2570,7 @@ static int trace__run(struct trace *trace, int argc, const char **argv)
 	trace__symbols__exit(trace);
 
 	perf_evlist__delete(evlist);
+	cgroup__put(trace->cgroup);
 	trace->evlist = NULL;
 	trace->live = false;
 	return err;
@@ -2979,6 +3010,18 @@ static int trace__parse_events_option(const struct option *opt, const char *str,
 	return err;
 }
 
+static int trace__parse_cgroups(const struct option *opt, const char *str, int unset)
+{
+	struct trace *trace = opt->value;
+
+	if (!list_empty(&trace->evlist->entries))
+		return parse_cgroups(opt, str, unset);
+
+	trace->cgroup = evlist__findnew_cgroup(trace->evlist, str);
+
+	return 0;
+}
+
 int cmd_trace(int argc, const char **argv)
 {
 	const char *trace_usage[] = {
@@ -3069,6 +3112,8 @@ int cmd_trace(int argc, const char **argv)
 			"print the PERF_RECORD_SAMPLE PERF_SAMPLE_ info, for debugging"),
 	OPT_UINTEGER(0, "proc-map-timeout", &trace.opts.proc_map_timeout,
 			"per thread proc mmap processing timeout in ms"),
+	OPT_CALLBACK('G', "cgroup", &trace, "name", "monitor event in cgroup name only",
+		     trace__parse_cgroups),
 	OPT_UINTEGER('D', "delay", &trace.opts.initial_delay,
 		     "ms to wait before starting measurement after program "
 		     "start"),
@@ -3095,6 +3140,11 @@ int cmd_trace(int argc, const char **argv)
 	argc = parse_options_subcommand(argc, argv, trace_options, trace_subcommands,
 				 trace_usage, PARSE_OPT_STOP_AT_NON_OPTION);
 
+	if ((nr_cgroups || trace.cgroup) && !trace.opts.target.system_wide) {
+		usage_with_options_msg(trace_usage, trace_options,
+				       "cgroup monitoring only available in system-wide mode");
+	}
+
 	err = bpf__setup_stdout(trace.evlist);
 	if (err) {
 		bpf__strerror_setup_stdout(trace.evlist, err, bf, sizeof(bf));
-- 
2.14.3
