- * [PATCH v7 1/6] perf evlist: Add perf_evlist__go_system_wide() helper
  2023-08-26  3:26 [PATCH v7 0/6] perf record: Track sideband events for all CPUs when tracing selected CPUs Yang Jihong
@ 2023-08-26  3:26 ` Yang Jihong
  2023-08-26  3:26 ` [PATCH v7 2/6] perf evlist: Add evlist__findnew_tracking_event() helper Yang Jihong
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Yang Jihong @ 2023-08-26  3:26 UTC (permalink / raw)
  To: peterz, mingo, acme, mark.rutland, alexander.shishkin, jolsa,
	namhyung, irogers, adrian.hunter, kan.liang, james.clark, tmricht,
	ak, anshuman.khandual, linux-kernel, linux-perf-users
  Cc: yangjihong1
For dummy events that keep tracking, we may need to modify its cpu_maps.
For example, change the cpu_maps to record sideband events for all CPUS.
Add perf_evlist__go_system_wide() helper to support this scenario.
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/lib/perf/evlist.c                  | 9 +++++++++
 tools/lib/perf/include/internal/evlist.h | 2 ++
 2 files changed, 11 insertions(+)
diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
index b8b066d0dc5e..3acbbccc1901 100644
--- a/tools/lib/perf/evlist.c
+++ b/tools/lib/perf/evlist.c
@@ -738,3 +738,12 @@ int perf_evlist__nr_groups(struct perf_evlist *evlist)
 	}
 	return nr_groups;
 }
+
+void perf_evlist__go_system_wide(struct perf_evlist *evlist, struct perf_evsel *evsel)
+{
+	if (!evsel->system_wide) {
+		evsel->system_wide = true;
+		if (evlist->needs_map_propagation)
+			__perf_evlist__propagate_maps(evlist, evsel);
+	}
+}
diff --git a/tools/lib/perf/include/internal/evlist.h b/tools/lib/perf/include/internal/evlist.h
index 3339bc2f1765..d86ffe8ed483 100644
--- a/tools/lib/perf/include/internal/evlist.h
+++ b/tools/lib/perf/include/internal/evlist.h
@@ -135,4 +135,6 @@ int perf_evlist__id_add_fd(struct perf_evlist *evlist,
 void perf_evlist__reset_id_hash(struct perf_evlist *evlist);
 
 void __perf_evlist__set_leader(struct list_head *list, struct perf_evsel *leader);
+
+void perf_evlist__go_system_wide(struct perf_evlist *evlist, struct perf_evsel *evsel);
 #endif /* __LIBPERF_INTERNAL_EVLIST_H */
-- 
2.30.GIT
^ permalink raw reply related	[flat|nested] 15+ messages in thread
- * [PATCH v7 2/6] perf evlist: Add evlist__findnew_tracking_event() helper
  2023-08-26  3:26 [PATCH v7 0/6] perf record: Track sideband events for all CPUs when tracing selected CPUs Yang Jihong
  2023-08-26  3:26 ` [PATCH v7 1/6] perf evlist: Add perf_evlist__go_system_wide() helper Yang Jihong
@ 2023-08-26  3:26 ` Yang Jihong
  2023-08-26  3:26 ` [PATCH v7 3/6] perf record: Move setting tracking events before record__init_thread_masks() Yang Jihong
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Yang Jihong @ 2023-08-26  3:26 UTC (permalink / raw)
  To: peterz, mingo, acme, mark.rutland, alexander.shishkin, jolsa,
	namhyung, irogers, adrian.hunter, kan.liang, james.clark, tmricht,
	ak, anshuman.khandual, linux-kernel, linux-perf-users
  Cc: yangjihong1
Currently, intel-bts, intel-pt, and arm-spe may add tracking event to the
evlist. We may need to search for the tracking event for some settings.
Therefore, add evlist__findnew_tracking_event() helper.
If system_wide is true, evlist__findnew_tracking_event() set the cpu map
of the evsel to all online CPUs.
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/builtin-record.c | 11 +++--------
 tools/perf/util/evlist.c    | 18 ++++++++++++++++++
 tools/perf/util/evlist.h    |  1 +
 3 files changed, 22 insertions(+), 8 deletions(-)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 34bb31f08bb5..12edad8392cc 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1293,14 +1293,9 @@ static int record__open(struct record *rec)
 	 */
 	if (opts->target.initial_delay || target__has_cpu(&opts->target) ||
 	    perf_pmus__num_core_pmus() > 1) {
-		pos = evlist__get_tracking_event(evlist);
-		if (!evsel__is_dummy_event(pos)) {
-			/* Set up dummy event. */
-			if (evlist__add_dummy(evlist))
-				return -ENOMEM;
-			pos = evlist__last(evlist);
-			evlist__set_tracking_event(evlist, pos);
-		}
+		pos = evlist__findnew_tracking_event(evlist, false);
+		if (!pos)
+			return -ENOMEM;
 
 		/*
 		 * Enable the dummy event when the process is forked for
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 7ef43f72098e..25c3ebe2c2f5 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1694,6 +1694,24 @@ void evlist__set_tracking_event(struct evlist *evlist, struct evsel *tracking_ev
 	tracking_evsel->tracking = true;
 }
 
+struct evsel *evlist__findnew_tracking_event(struct evlist *evlist, bool system_wide)
+{
+	struct evsel *evsel;
+
+	evsel = evlist__get_tracking_event(evlist);
+	if (!evsel__is_dummy_event(evsel)) {
+		evsel = evlist__add_aux_dummy(evlist, system_wide);
+		if (!evsel)
+			return NULL;
+
+		evlist__set_tracking_event(evlist, evsel);
+	} else if (system_wide) {
+		perf_evlist__go_system_wide(&evlist->core, &evsel->core);
+	}
+
+	return evsel;
+}
+
 struct evsel *evlist__find_evsel_by_str(struct evlist *evlist, const char *str)
 {
 	struct evsel *evsel;
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 664c6bf7b3e0..98e7ddb2bd30 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -387,6 +387,7 @@ bool evlist_cpu_iterator__end(const struct evlist_cpu_iterator *evlist_cpu_itr);
 
 struct evsel *evlist__get_tracking_event(struct evlist *evlist);
 void evlist__set_tracking_event(struct evlist *evlist, struct evsel *tracking_evsel);
+struct evsel *evlist__findnew_tracking_event(struct evlist *evlist, bool system_wide);
 
 struct evsel *evlist__find_evsel_by_str(struct evlist *evlist, const char *str);
 
-- 
2.30.GIT
^ permalink raw reply related	[flat|nested] 15+ messages in thread
- * [PATCH v7 3/6] perf record: Move setting tracking events before record__init_thread_masks()
  2023-08-26  3:26 [PATCH v7 0/6] perf record: Track sideband events for all CPUs when tracing selected CPUs Yang Jihong
  2023-08-26  3:26 ` [PATCH v7 1/6] perf evlist: Add perf_evlist__go_system_wide() helper Yang Jihong
  2023-08-26  3:26 ` [PATCH v7 2/6] perf evlist: Add evlist__findnew_tracking_event() helper Yang Jihong
@ 2023-08-26  3:26 ` Yang Jihong
  2023-08-26  3:26 ` [PATCH v7 4/6] perf record: Track sideband events for all CPUs when tracing selected CPUs Yang Jihong
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Yang Jihong @ 2023-08-26  3:26 UTC (permalink / raw)
  To: peterz, mingo, acme, mark.rutland, alexander.shishkin, jolsa,
	namhyung, irogers, adrian.hunter, kan.liang, james.clark, tmricht,
	ak, anshuman.khandual, linux-kernel, linux-perf-users
  Cc: yangjihong1
User space tasks can migrate between CPUs, so when tracing selected CPUs,
sideband for all CPUs is needed. In this case set the cpu map of the evsel
to all online CPUs. This may modify the original cpu map of the evlist.
Therefore, need to check whether the preceding scenario exists before
record__init_thread_masks().
Dummy tracking has been set in record__open(), move it before
record__init_thread_masks() and add a helper for unified processing.
The sys_perf_event_open invoked is as follows:
  # perf --debug verbose=3 record -e cpu-clock -D 100 true
  <SNIP>
  Opening: cpu-clock
  ------------------------------------------------------------
  perf_event_attr:
    type                             1 (PERF_TYPE_SOFTWARE)
    size                             136
    config                           0 (PERF_COUNT_SW_CPU_CLOCK)
    { sample_period, sample_freq }   4000
    sample_type                      IP|TID|TIME|PERIOD|IDENTIFIER
    read_format                      ID|LOST
    disabled                         1
    inherit                          1
    freq                             1
    sample_id_all                    1
    exclude_guest                    1
  ------------------------------------------------------------
  sys_perf_event_open: pid 10318  cpu 0  group_fd -1  flags 0x8 = 5
  sys_perf_event_open: pid 10318  cpu 1  group_fd -1  flags 0x8 = 6
  sys_perf_event_open: pid 10318  cpu 2  group_fd -1  flags 0x8 = 7
  sys_perf_event_open: pid 10318  cpu 3  group_fd -1  flags 0x8 = 9
  sys_perf_event_open: pid 10318  cpu 4  group_fd -1  flags 0x8 = 10
  sys_perf_event_open: pid 10318  cpu 5  group_fd -1  flags 0x8 = 11
  sys_perf_event_open: pid 10318  cpu 6  group_fd -1  flags 0x8 = 12
  sys_perf_event_open: pid 10318  cpu 7  group_fd -1  flags 0x8 = 13
  Opening: dummy:u
  ------------------------------------------------------------
  perf_event_attr:
    type                             1 (PERF_TYPE_SOFTWARE)
    size                             136
    config                           0x9 (PERF_COUNT_SW_DUMMY)
    { sample_period, sample_freq }   1
    sample_type                      IP|TID|TIME|IDENTIFIER
    read_format                      ID|LOST
    disabled                         1
    inherit                          1
    exclude_kernel                   1
    exclude_hv                       1
    mmap                             1
    comm                             1
    enable_on_exec                   1
    task                             1
    sample_id_all                    1
    exclude_guest                    1
    mmap2                            1
    comm_exec                        1
    ksymbol                          1
    bpf_event                        1
  ------------------------------------------------------------
  sys_perf_event_open: pid 10318  cpu 0  group_fd -1  flags 0x8 = 14
  sys_perf_event_open: pid 10318  cpu 1  group_fd -1  flags 0x8 = 15
  sys_perf_event_open: pid 10318  cpu 2  group_fd -1  flags 0x8 = 16
  sys_perf_event_open: pid 10318  cpu 3  group_fd -1  flags 0x8 = 17
  sys_perf_event_open: pid 10318  cpu 4  group_fd -1  flags 0x8 = 18
  sys_perf_event_open: pid 10318  cpu 5  group_fd -1  flags 0x8 = 19
  sys_perf_event_open: pid 10318  cpu 6  group_fd -1  flags 0x8 = 20
  sys_perf_event_open: pid 10318  cpu 7  group_fd -1  flags 0x8 = 21
  <SNIP>
perf test need to update base-record & system-wide-dummy attr expected values
for test-record-C0:
1. Because a dummy sideband event is added to the sampling of specified
   CPUs. When evlist contains evsel of different sample_type,
   evlist__config() will change the default PERF_SAMPLE_ID bit to
   PERF_SAMPLE_IDENTIFICATION bit.
   The attr sample_type expected value of base-record and system-wide-dummy
   in test-record-C0 needs to be updated.
2. The perf record uses evlist__add_aux_dummy() instead of
   evlist__add_dummy() to add a dummy event.
   The expected value of system-wide-dummy attr needs to be updated.
The perf test result is as follows:
  # ./perf test list  2>&1 | grep 'Setup struct perf_event_attr'
   17: Setup struct perf_event_attr
  # ./perf test 17
   17: Setup struct perf_event_attr                                    : Ok
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/builtin-record.c             | 59 ++++++++++++++++---------
 tools/perf/tests/attr/system-wide-dummy | 14 +++---
 tools/perf/tests/attr/test-record-C0    |  4 +-
 3 files changed, 47 insertions(+), 30 deletions(-)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 12edad8392cc..83bd1f117191 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -906,6 +906,37 @@ static int record__config_off_cpu(struct record *rec)
 	return off_cpu_prepare(rec->evlist, &rec->opts.target, &rec->opts);
 }
 
+static int record__config_tracking_events(struct record *rec)
+{
+	struct record_opts *opts = &rec->opts;
+	struct evlist *evlist = rec->evlist;
+	struct evsel *evsel;
+
+	/*
+	 * For initial_delay, system wide or a hybrid system, we need to add
+	 * tracking event so that we can track PERF_RECORD_MMAP to cover the
+	 * delay of waiting or event synthesis.
+	 */
+	if (opts->target.initial_delay || target__has_cpu(&opts->target) ||
+	    perf_pmus__num_core_pmus() > 1) {
+		evsel = evlist__findnew_tracking_event(evlist, false);
+		if (!evsel)
+			return -ENOMEM;
+
+		/*
+		 * Enable the tracking event when the process is forked for
+		 * initial_delay, immediately for system wide.
+		 */
+		if (opts->target.initial_delay && !evsel->immediate &&
+		    !target__has_cpu(&opts->target))
+			evsel->core.attr.enable_on_exec = 1;
+		else
+			evsel->immediate = 1;
+	}
+
+	return 0;
+}
+
 static bool record__kcore_readable(struct machine *machine)
 {
 	char kcore[PATH_MAX];
@@ -1286,28 +1317,6 @@ static int record__open(struct record *rec)
 	struct record_opts *opts = &rec->opts;
 	int rc = 0;
 
-	/*
-	 * For initial_delay, system wide or a hybrid system, we need to add a
-	 * dummy event so that we can track PERF_RECORD_MMAP to cover the delay
-	 * of waiting or event synthesis.
-	 */
-	if (opts->target.initial_delay || target__has_cpu(&opts->target) ||
-	    perf_pmus__num_core_pmus() > 1) {
-		pos = evlist__findnew_tracking_event(evlist, false);
-		if (!pos)
-			return -ENOMEM;
-
-		/*
-		 * Enable the dummy event when the process is forked for
-		 * initial_delay, immediately for system wide.
-		 */
-		if (opts->target.initial_delay && !pos->immediate &&
-		    !target__has_cpu(&opts->target))
-			pos->core.attr.enable_on_exec = 1;
-		else
-			pos->immediate = 1;
-	}
-
 	evlist__config(evlist, opts, &callchain_param);
 
 	evlist__for_each_entry(evlist, pos) {
@@ -4190,6 +4199,12 @@ int cmd_record(int argc, const char **argv)
 		goto out;
 	}
 
+	err = record__config_tracking_events(rec);
+	if (err) {
+		pr_err("record__config_tracking_events failed, error %d\n", err);
+		goto out;
+	}
+
 	err = record__init_thread_masks(rec);
 	if (err) {
 		pr_err("Failed to initialize parallel data streaming masks\n");
diff --git a/tools/perf/tests/attr/system-wide-dummy b/tools/perf/tests/attr/system-wide-dummy
index 2f3e3eb728eb..a1e1d6a263bf 100644
--- a/tools/perf/tests/attr/system-wide-dummy
+++ b/tools/perf/tests/attr/system-wide-dummy
@@ -9,8 +9,10 @@ flags=8
 type=1
 size=136
 config=9
-sample_period=4000
-sample_type=455
+sample_period=1
+# PERF_SAMPLE_IP | PERF_SAMPLE_TID | PERF_SAMPLE_TIME |
+# PERF_SAMPLE_CPU | PERF_SAMPLE_IDENTIFIER
+sample_type=65671
 read_format=4|20
 # Event will be enabled right away.
 disabled=0
@@ -18,12 +20,12 @@ inherit=1
 pinned=0
 exclusive=0
 exclude_user=0
-exclude_kernel=0
-exclude_hv=0
+exclude_kernel=1
+exclude_hv=1
 exclude_idle=0
 mmap=1
 comm=1
-freq=1
+freq=0
 inherit_stat=0
 enable_on_exec=0
 task=1
@@ -32,7 +34,7 @@ precise_ip=0
 mmap_data=0
 sample_id_all=1
 exclude_host=0
-exclude_guest=0
+exclude_guest=1
 exclude_callchain_kernel=0
 exclude_callchain_user=0
 mmap2=1
diff --git a/tools/perf/tests/attr/test-record-C0 b/tools/perf/tests/attr/test-record-C0
index 317730b906dd..198e8429a1bf 100644
--- a/tools/perf/tests/attr/test-record-C0
+++ b/tools/perf/tests/attr/test-record-C0
@@ -10,9 +10,9 @@ cpu=0
 enable_on_exec=0
 
 # PERF_SAMPLE_IP | PERF_SAMPLE_TID | PERF_SAMPLE_TIME |
-# PERF_SAMPLE_ID | PERF_SAMPLE_PERIOD
+# PERF_SAMPLE_PERIOD | PERF_SAMPLE_IDENTIFIER
 # + PERF_SAMPLE_CPU added by -C 0
-sample_type=455
+sample_type=65927
 
 # Dummy event handles mmaps, comm and task.
 mmap=0
-- 
2.30.GIT
^ permalink raw reply related	[flat|nested] 15+ messages in thread
- * [PATCH v7 4/6] perf record: Track sideband events for all CPUs when tracing selected CPUs
  2023-08-26  3:26 [PATCH v7 0/6] perf record: Track sideband events for all CPUs when tracing selected CPUs Yang Jihong
                   ` (2 preceding siblings ...)
  2023-08-26  3:26 ` [PATCH v7 3/6] perf record: Move setting tracking events before record__init_thread_masks() Yang Jihong
@ 2023-08-26  3:26 ` Yang Jihong
  2023-08-26 14:59   ` Namhyung Kim
  2023-08-26  3:26 ` [PATCH v7 5/6] perf test: Add test case for record sideband events Yang Jihong
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Yang Jihong @ 2023-08-26  3:26 UTC (permalink / raw)
  To: peterz, mingo, acme, mark.rutland, alexander.shishkin, jolsa,
	namhyung, irogers, adrian.hunter, kan.liang, james.clark, tmricht,
	ak, anshuman.khandual, linux-kernel, linux-perf-users
  Cc: yangjihong1
User space tasks can migrate between CPUs, we need to track side-band
events for all CPUs.
The specific scenarios are as follows:
         CPU0                                 CPU1
  perf record -C 0 start
                              taskA starts to be created and executed
                                -> PERF_RECORD_COMM and PERF_RECORD_MMAP
                                   events only deliver to CPU1
                              ......
                                |
                          migrate to CPU0
                                |
  Running on CPU0    <----------/
  ...
  perf record -C 0 stop
Now perf samples the PC of taskA. However, perf does not record the
PERF_RECORD_COMM and PERF_RECORD_MMAP events of taskA.
Therefore, the comm and symbols of taskA cannot be parsed.
The solution is to record sideband events for all CPUs when tracing
selected CPUs. Because this modifies the default behavior, add related
comments to the perf record man page.
The sys_perf_event_open invoked is as follows:
  # perf --debug verbose=3 record -e cpu-clock -C 1 true
  <SNIP>
  Opening: cpu-clock
  ------------------------------------------------------------
  perf_event_attr:
    type                             1 (PERF_TYPE_SOFTWARE)
    size                             136
    config                           0 (PERF_COUNT_SW_CPU_CLOCK)
    { sample_period, sample_freq }   4000
    sample_type                      IP|TID|TIME|CPU|PERIOD|IDENTIFIER
    read_format                      ID|LOST
    disabled                         1
    inherit                          1
    freq                             1
    sample_id_all                    1
    exclude_guest                    1
  ------------------------------------------------------------
  sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 5
  Opening: dummy:u
  ------------------------------------------------------------
  perf_event_attr:
    type                             1 (PERF_TYPE_SOFTWARE)
    size                             136
    config                           0x9 (PERF_COUNT_SW_DUMMY)
    { sample_period, sample_freq }   1
    sample_type                      IP|TID|TIME|CPU|IDENTIFIER
    read_format                      ID|LOST
    inherit                          1
    exclude_kernel                   1
    exclude_hv                       1
    mmap                             1
    comm                             1
    task                             1
    sample_id_all                    1
    exclude_guest                    1
    mmap2                            1
    comm_exec                        1
    ksymbol                          1
    bpf_event                        1
  ------------------------------------------------------------
  sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 6
  sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 7
  sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 9
  sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 10
  sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 11
  sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 12
  sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 13
  sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 14
  <SNIP>
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Documentation/perf-record.txt |  3 ++
 tools/perf/builtin-record.c              | 44 +++++++++++++++++++++++-
 2 files changed, 46 insertions(+), 1 deletion(-)
diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index d5217be012d7..1889f66addf2 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -374,6 +374,9 @@ comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-
 In per-thread mode with inheritance mode on (default), samples are captured only when
 the thread executes on the designated CPUs. Default is to monitor all CPUs.
 
+User space tasks can migrate between CPUs, so when tracing selected CPUs,
+a dummy event is created to track sideband for all CPUs.
+
 -B::
 --no-buildid::
 Do not save the build ids of binaries in the perf.data files. This skips
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 83bd1f117191..21c571018148 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -906,10 +906,44 @@ static int record__config_off_cpu(struct record *rec)
 	return off_cpu_prepare(rec->evlist, &rec->opts.target, &rec->opts);
 }
 
+static bool record__tracking_system_wide(struct record *rec)
+{
+	struct record_opts *opts = &rec->opts;
+	struct evlist *evlist = rec->evlist;
+	struct evsel *evsel;
+
+	/*
+	 * If all (non-dummy) evsel have exclude_user,
+	 * system_wide is not needed.
+	 *
+	 * all_kernel and all_user will overwrite exclude_kernel and
+	 * exclude_user of attr in evsel__config(), here need to check
+	 * all the three items.
+	 *
+	 * Sideband system wide if one of the following conditions is met:
+	 *
+	 *   - all_user is set, and there is a non-dummy event
+	 *   - all_user and all_kernel are not set, and there is
+	 *     a non-dummy event without exclude_user
+	 */
+	if (opts->all_kernel)
+		return false;
+
+	evlist__for_each_entry(evlist, evsel) {
+		if (!evsel__is_dummy_event(evsel)) {
+			if (opts->all_user || !evsel->core.attr.exclude_user)
+				return true;
+		}
+	}
+
+	return false;
+}
+
 static int record__config_tracking_events(struct record *rec)
 {
 	struct record_opts *opts = &rec->opts;
 	struct evlist *evlist = rec->evlist;
+	bool system_wide = false;
 	struct evsel *evsel;
 
 	/*
@@ -919,7 +953,15 @@ static int record__config_tracking_events(struct record *rec)
 	 */
 	if (opts->target.initial_delay || target__has_cpu(&opts->target) ||
 	    perf_pmus__num_core_pmus() > 1) {
-		evsel = evlist__findnew_tracking_event(evlist, false);
+
+		/*
+		 * User space tasks can migrate between CPUs, so when tracing
+		 * selected CPUs, sideband for all CPUs is still needed.
+		 */
+		if (!!opts->target.cpu_list && record__tracking_system_wide(rec))
+			system_wide = true;
+
+		evsel = evlist__findnew_tracking_event(evlist, system_wide);
 		if (!evsel)
 			return -ENOMEM;
 
-- 
2.30.GIT
^ permalink raw reply related	[flat|nested] 15+ messages in thread
- * Re: [PATCH v7 4/6] perf record: Track sideband events for all CPUs when tracing selected CPUs
  2023-08-26  3:26 ` [PATCH v7 4/6] perf record: Track sideband events for all CPUs when tracing selected CPUs Yang Jihong
@ 2023-08-26 14:59   ` Namhyung Kim
  2023-08-28  3:03     ` Yang Jihong
  0 siblings, 1 reply; 15+ messages in thread
From: Namhyung Kim @ 2023-08-26 14:59 UTC (permalink / raw)
  To: Yang Jihong
  Cc: peterz, mingo, acme, mark.rutland, alexander.shishkin, jolsa,
	irogers, adrian.hunter, kan.liang, james.clark, tmricht, ak,
	anshuman.khandual, linux-kernel, linux-perf-users
Hello,
On Fri, Aug 25, 2023 at 8:29 PM Yang Jihong <yangjihong1@huawei.com> wrote:
>
> User space tasks can migrate between CPUs, we need to track side-band
> events for all CPUs.
>
> The specific scenarios are as follows:
>
>          CPU0                                 CPU1
>   perf record -C 0 start
>                               taskA starts to be created and executed
>                                 -> PERF_RECORD_COMM and PERF_RECORD_MMAP
>                                    events only deliver to CPU1
>                               ......
>                                 |
>                           migrate to CPU0
>                                 |
>   Running on CPU0    <----------/
>   ...
>
>   perf record -C 0 stop
>
> Now perf samples the PC of taskA. However, perf does not record the
> PERF_RECORD_COMM and PERF_RECORD_MMAP events of taskA.
> Therefore, the comm and symbols of taskA cannot be parsed.
>
> The solution is to record sideband events for all CPUs when tracing
> selected CPUs. Because this modifies the default behavior, add related
> comments to the perf record man page.
>
> The sys_perf_event_open invoked is as follows:
>
>   # perf --debug verbose=3 record -e cpu-clock -C 1 true
>   <SNIP>
>   Opening: cpu-clock
>   ------------------------------------------------------------
>   perf_event_attr:
>     type                             1 (PERF_TYPE_SOFTWARE)
>     size                             136
>     config                           0 (PERF_COUNT_SW_CPU_CLOCK)
>     { sample_period, sample_freq }   4000
>     sample_type                      IP|TID|TIME|CPU|PERIOD|IDENTIFIER
>     read_format                      ID|LOST
>     disabled                         1
>     inherit                          1
>     freq                             1
>     sample_id_all                    1
>     exclude_guest                    1
>   ------------------------------------------------------------
>   sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 5
>   Opening: dummy:u
>   ------------------------------------------------------------
>   perf_event_attr:
>     type                             1 (PERF_TYPE_SOFTWARE)
>     size                             136
>     config                           0x9 (PERF_COUNT_SW_DUMMY)
>     { sample_period, sample_freq }   1
>     sample_type                      IP|TID|TIME|CPU|IDENTIFIER
>     read_format                      ID|LOST
>     inherit                          1
>     exclude_kernel                   1
>     exclude_hv                       1
>     mmap                             1
>     comm                             1
>     task                             1
>     sample_id_all                    1
>     exclude_guest                    1
>     mmap2                            1
>     comm_exec                        1
>     ksymbol                          1
>     bpf_event                        1
>   ------------------------------------------------------------
>   sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 6
>   sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 7
>   sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 9
>   sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 10
>   sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 11
>   sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 12
>   sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 13
>   sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 14
>   <SNIP>
>
> Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
> Acked-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  tools/perf/Documentation/perf-record.txt |  3 ++
>  tools/perf/builtin-record.c              | 44 +++++++++++++++++++++++-
>  2 files changed, 46 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
> index d5217be012d7..1889f66addf2 100644
> --- a/tools/perf/Documentation/perf-record.txt
> +++ b/tools/perf/Documentation/perf-record.txt
> @@ -374,6 +374,9 @@ comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-
>  In per-thread mode with inheritance mode on (default), samples are captured only when
>  the thread executes on the designated CPUs. Default is to monitor all CPUs.
>
> +User space tasks can migrate between CPUs, so when tracing selected CPUs,
> +a dummy event is created to track sideband for all CPUs.
> +
>  -B::
>  --no-buildid::
>  Do not save the build ids of binaries in the perf.data files. This skips
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 83bd1f117191..21c571018148 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -906,10 +906,44 @@ static int record__config_off_cpu(struct record *rec)
>         return off_cpu_prepare(rec->evlist, &rec->opts.target, &rec->opts);
>  }
>
> +static bool record__tracking_system_wide(struct record *rec)
> +{
> +       struct record_opts *opts = &rec->opts;
> +       struct evlist *evlist = rec->evlist;
> +       struct evsel *evsel;
> +
> +       /*
> +        * If all (non-dummy) evsel have exclude_user,
> +        * system_wide is not needed.
Maybe I missed some earlier discussion but why is it not
needed when exclude_user is set?  I think it still needs
FORK or COMM at least..
Thanks,
Namhyung
> +        *
> +        * all_kernel and all_user will overwrite exclude_kernel and
> +        * exclude_user of attr in evsel__config(), here need to check
> +        * all the three items.
> +        *
> +        * Sideband system wide if one of the following conditions is met:
> +        *
> +        *   - all_user is set, and there is a non-dummy event
> +        *   - all_user and all_kernel are not set, and there is
> +        *     a non-dummy event without exclude_user
> +        */
> +       if (opts->all_kernel)
> +               return false;
> +
> +       evlist__for_each_entry(evlist, evsel) {
> +               if (!evsel__is_dummy_event(evsel)) {
> +                       if (opts->all_user || !evsel->core.attr.exclude_user)
> +                               return true;
> +               }
> +       }
> +
> +       return false;
> +}
> +
>  static int record__config_tracking_events(struct record *rec)
>  {
>         struct record_opts *opts = &rec->opts;
>         struct evlist *evlist = rec->evlist;
> +       bool system_wide = false;
>         struct evsel *evsel;
>
>         /*
> @@ -919,7 +953,15 @@ static int record__config_tracking_events(struct record *rec)
>          */
>         if (opts->target.initial_delay || target__has_cpu(&opts->target) ||
>             perf_pmus__num_core_pmus() > 1) {
> -               evsel = evlist__findnew_tracking_event(evlist, false);
> +
> +               /*
> +                * User space tasks can migrate between CPUs, so when tracing
> +                * selected CPUs, sideband for all CPUs is still needed.
> +                */
> +               if (!!opts->target.cpu_list && record__tracking_system_wide(rec))
> +                       system_wide = true;
> +
> +               evsel = evlist__findnew_tracking_event(evlist, system_wide);
>                 if (!evsel)
>                         return -ENOMEM;
>
> --
> 2.30.GIT
>
^ permalink raw reply	[flat|nested] 15+ messages in thread
- * Re: [PATCH v7 4/6] perf record: Track sideband events for all CPUs when tracing selected CPUs
  2023-08-26 14:59   ` Namhyung Kim
@ 2023-08-28  3:03     ` Yang Jihong
  2023-08-28 11:25       ` Adrian Hunter
  0 siblings, 1 reply; 15+ messages in thread
From: Yang Jihong @ 2023-08-28  3:03 UTC (permalink / raw)
  To: Namhyung Kim, Adrian Hunter
  Cc: peterz, mingo, acme, mark.rutland, alexander.shishkin, jolsa,
	irogers, kan.liang, james.clark, tmricht, ak, anshuman.khandual,
	linux-kernel, linux-perf-users
Hello,
On 2023/8/26 22:59, Namhyung Kim wrote:
> Hello,
> 
> On Fri, Aug 25, 2023 at 8:29 PM Yang Jihong <yangjihong1@huawei.com> wrote:
>>
>> User space tasks can migrate between CPUs, we need to track side-band
>> events for all CPUs.
>>
>> The specific scenarios are as follows:
>>
>>           CPU0                                 CPU1
>>    perf record -C 0 start
>>                                taskA starts to be created and executed
>>                                  -> PERF_RECORD_COMM and PERF_RECORD_MMAP
>>                                     events only deliver to CPU1
>>                                ......
>>                                  |
>>                            migrate to CPU0
>>                                  |
>>    Running on CPU0    <----------/
>>    ...
>>
>>    perf record -C 0 stop
>>
>> Now perf samples the PC of taskA. However, perf does not record the
>> PERF_RECORD_COMM and PERF_RECORD_MMAP events of taskA.
>> Therefore, the comm and symbols of taskA cannot be parsed.
>>
>> The solution is to record sideband events for all CPUs when tracing
>> selected CPUs. Because this modifies the default behavior, add related
>> comments to the perf record man page.
>>
>> The sys_perf_event_open invoked is as follows:
>>
>>    # perf --debug verbose=3 record -e cpu-clock -C 1 true
>>    <SNIP>
>>    Opening: cpu-clock
>>    ------------------------------------------------------------
>>    perf_event_attr:
>>      type                             1 (PERF_TYPE_SOFTWARE)
>>      size                             136
>>      config                           0 (PERF_COUNT_SW_CPU_CLOCK)
>>      { sample_period, sample_freq }   4000
>>      sample_type                      IP|TID|TIME|CPU|PERIOD|IDENTIFIER
>>      read_format                      ID|LOST
>>      disabled                         1
>>      inherit                          1
>>      freq                             1
>>      sample_id_all                    1
>>      exclude_guest                    1
>>    ------------------------------------------------------------
>>    sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 5
>>    Opening: dummy:u
>>    ------------------------------------------------------------
>>    perf_event_attr:
>>      type                             1 (PERF_TYPE_SOFTWARE)
>>      size                             136
>>      config                           0x9 (PERF_COUNT_SW_DUMMY)
>>      { sample_period, sample_freq }   1
>>      sample_type                      IP|TID|TIME|CPU|IDENTIFIER
>>      read_format                      ID|LOST
>>      inherit                          1
>>      exclude_kernel                   1
>>      exclude_hv                       1
>>      mmap                             1
>>      comm                             1
>>      task                             1
>>      sample_id_all                    1
>>      exclude_guest                    1
>>      mmap2                            1
>>      comm_exec                        1
>>      ksymbol                          1
>>      bpf_event                        1
>>    ------------------------------------------------------------
>>    sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 6
>>    sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 7
>>    sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 9
>>    sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 10
>>    sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 11
>>    sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 12
>>    sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 13
>>    sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 14
>>    <SNIP>
>>
>> Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
>> Acked-by: Adrian Hunter <adrian.hunter@intel.com>
>> ---
>>   tools/perf/Documentation/perf-record.txt |  3 ++
>>   tools/perf/builtin-record.c              | 44 +++++++++++++++++++++++-
>>   2 files changed, 46 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
>> index d5217be012d7..1889f66addf2 100644
>> --- a/tools/perf/Documentation/perf-record.txt
>> +++ b/tools/perf/Documentation/perf-record.txt
>> @@ -374,6 +374,9 @@ comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-
>>   In per-thread mode with inheritance mode on (default), samples are captured only when
>>   the thread executes on the designated CPUs. Default is to monitor all CPUs.
>>
>> +User space tasks can migrate between CPUs, so when tracing selected CPUs,
>> +a dummy event is created to track sideband for all CPUs.
>> +
>>   -B::
>>   --no-buildid::
>>   Do not save the build ids of binaries in the perf.data files. This skips
>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
>> index 83bd1f117191..21c571018148 100644
>> --- a/tools/perf/builtin-record.c
>> +++ b/tools/perf/builtin-record.c
>> @@ -906,10 +906,44 @@ static int record__config_off_cpu(struct record *rec)
>>          return off_cpu_prepare(rec->evlist, &rec->opts.target, &rec->opts);
>>   }
>>
>> +static bool record__tracking_system_wide(struct record *rec)
>> +{
>> +       struct record_opts *opts = &rec->opts;
>> +       struct evlist *evlist = rec->evlist;
>> +       struct evsel *evsel;
>> +
>> +       /*
>> +        * If all (non-dummy) evsel have exclude_user,
>> +        * system_wide is not needed.
> 
> Maybe I missed some earlier discussion but why is it not
> needed when exclude_user is set?  I think it still needs
> FORK or COMM at least..
> 
This is Adrian's suggestion earlier, I think it's probably because if 
exclude_user is set, MMAP information is not needed, that's my guess.
However, as you said, even if exclude_user is set, at least FORK and 
COMM are required.
Therefore, the conditions here need to be changed to:
"system_wide is need as long as there is a non-dummy event."
@Adrian, is this change okay?
Thanks,
Yang
^ permalink raw reply	[flat|nested] 15+ messages in thread
- * Re: [PATCH v7 4/6] perf record: Track sideband events for all CPUs when tracing selected CPUs
  2023-08-28  3:03     ` Yang Jihong
@ 2023-08-28 11:25       ` Adrian Hunter
  2023-08-29  1:40         ` Yang Jihong
  0 siblings, 1 reply; 15+ messages in thread
From: Adrian Hunter @ 2023-08-28 11:25 UTC (permalink / raw)
  To: Yang Jihong, Namhyung Kim
  Cc: peterz, mingo, acme, mark.rutland, alexander.shishkin, jolsa,
	irogers, kan.liang, james.clark, tmricht, ak, anshuman.khandual,
	linux-kernel, linux-perf-users
On 28/08/23 06:03, Yang Jihong wrote:
> Hello,
> 
> On 2023/8/26 22:59, Namhyung Kim wrote:
>> Hello,
>>
>> On Fri, Aug 25, 2023 at 8:29 PM Yang Jihong <yangjihong1@huawei.com> wrote:
>>>
>>> User space tasks can migrate between CPUs, we need to track side-band
>>> events for all CPUs.
>>>
>>> The specific scenarios are as follows:
>>>
>>>           CPU0                                 CPU1
>>>    perf record -C 0 start
>>>                                taskA starts to be created and executed
>>>                                  -> PERF_RECORD_COMM and PERF_RECORD_MMAP
>>>                                     events only deliver to CPU1
>>>                                ......
>>>                                  |
>>>                            migrate to CPU0
>>>                                  |
>>>    Running on CPU0    <----------/
>>>    ...
>>>
>>>    perf record -C 0 stop
>>>
>>> Now perf samples the PC of taskA. However, perf does not record the
>>> PERF_RECORD_COMM and PERF_RECORD_MMAP events of taskA.
>>> Therefore, the comm and symbols of taskA cannot be parsed.
>>>
>>> The solution is to record sideband events for all CPUs when tracing
>>> selected CPUs. Because this modifies the default behavior, add related
>>> comments to the perf record man page.
>>>
>>> The sys_perf_event_open invoked is as follows:
>>>
>>>    # perf --debug verbose=3 record -e cpu-clock -C 1 true
>>>    <SNIP>
>>>    Opening: cpu-clock
>>>    ------------------------------------------------------------
>>>    perf_event_attr:
>>>      type                             1 (PERF_TYPE_SOFTWARE)
>>>      size                             136
>>>      config                           0 (PERF_COUNT_SW_CPU_CLOCK)
>>>      { sample_period, sample_freq }   4000
>>>      sample_type                      IP|TID|TIME|CPU|PERIOD|IDENTIFIER
>>>      read_format                      ID|LOST
>>>      disabled                         1
>>>      inherit                          1
>>>      freq                             1
>>>      sample_id_all                    1
>>>      exclude_guest                    1
>>>    ------------------------------------------------------------
>>>    sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 5
>>>    Opening: dummy:u
>>>    ------------------------------------------------------------
>>>    perf_event_attr:
>>>      type                             1 (PERF_TYPE_SOFTWARE)
>>>      size                             136
>>>      config                           0x9 (PERF_COUNT_SW_DUMMY)
>>>      { sample_period, sample_freq }   1
>>>      sample_type                      IP|TID|TIME|CPU|IDENTIFIER
>>>      read_format                      ID|LOST
>>>      inherit                          1
>>>      exclude_kernel                   1
>>>      exclude_hv                       1
>>>      mmap                             1
>>>      comm                             1
>>>      task                             1
>>>      sample_id_all                    1
>>>      exclude_guest                    1
>>>      mmap2                            1
>>>      comm_exec                        1
>>>      ksymbol                          1
>>>      bpf_event                        1
>>>    ------------------------------------------------------------
>>>    sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 6
>>>    sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 7
>>>    sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 9
>>>    sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 10
>>>    sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 11
>>>    sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 12
>>>    sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 13
>>>    sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 14
>>>    <SNIP>
>>>
>>> Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
>>> Acked-by: Adrian Hunter <adrian.hunter@intel.com>
>>> ---
>>>   tools/perf/Documentation/perf-record.txt |  3 ++
>>>   tools/perf/builtin-record.c              | 44 +++++++++++++++++++++++-
>>>   2 files changed, 46 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
>>> index d5217be012d7..1889f66addf2 100644
>>> --- a/tools/perf/Documentation/perf-record.txt
>>> +++ b/tools/perf/Documentation/perf-record.txt
>>> @@ -374,6 +374,9 @@ comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-
>>>   In per-thread mode with inheritance mode on (default), samples are captured only when
>>>   the thread executes on the designated CPUs. Default is to monitor all CPUs.
>>>
>>> +User space tasks can migrate between CPUs, so when tracing selected CPUs,
>>> +a dummy event is created to track sideband for all CPUs.
>>> +
>>>   -B::
>>>   --no-buildid::
>>>   Do not save the build ids of binaries in the perf.data files. This skips
>>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
>>> index 83bd1f117191..21c571018148 100644
>>> --- a/tools/perf/builtin-record.c
>>> +++ b/tools/perf/builtin-record.c
>>> @@ -906,10 +906,44 @@ static int record__config_off_cpu(struct record *rec)
>>>          return off_cpu_prepare(rec->evlist, &rec->opts.target, &rec->opts);
>>>   }
>>>
>>> +static bool record__tracking_system_wide(struct record *rec)
>>> +{
>>> +       struct record_opts *opts = &rec->opts;
>>> +       struct evlist *evlist = rec->evlist;
>>> +       struct evsel *evsel;
>>> +
>>> +       /*
>>> +        * If all (non-dummy) evsel have exclude_user,
>>> +        * system_wide is not needed.
>>
>> Maybe I missed some earlier discussion but why is it not
>> needed when exclude_user is set?  I think it still needs
>> FORK or COMM at least..
>>
> 
> This is Adrian's suggestion earlier, I think it's probably because if exclude_user is set, MMAP information is not needed, that's my guess.
> 
> However, as you said, even if exclude_user is set, at least FORK and COMM are required.
> 
> Therefore, the conditions here need to be changed to:
> "system_wide is need as long as there is a non-dummy event."
> 
> @Adrian, is this change okay?
If you wish.  I think we use FORK to get mappings, so not sure it
would be needed.  There is PID, TID so COMM is not essential.
There are FORK, COMM etc from the CPUs being traced of course.
^ permalink raw reply	[flat|nested] 15+ messages in thread
- * Re: [PATCH v7 4/6] perf record: Track sideband events for all CPUs when tracing selected CPUs
  2023-08-28 11:25       ` Adrian Hunter
@ 2023-08-29  1:40         ` Yang Jihong
  0 siblings, 0 replies; 15+ messages in thread
From: Yang Jihong @ 2023-08-29  1:40 UTC (permalink / raw)
  To: Adrian Hunter, Namhyung Kim
  Cc: peterz, mingo, acme, mark.rutland, alexander.shishkin, jolsa,
	irogers, kan.liang, james.clark, tmricht, ak, anshuman.khandual,
	linux-kernel, linux-perf-users
Hello,
On 2023/8/28 19:25, Adrian Hunter wrote:
> On 28/08/23 06:03, Yang Jihong wrote:
>> Hello,
>>
>> On 2023/8/26 22:59, Namhyung Kim wrote:
>>> Hello,
>>>
>>> On Fri, Aug 25, 2023 at 8:29 PM Yang Jihong <yangjihong1@huawei.com> wrote:
>>>>
>>>> User space tasks can migrate between CPUs, we need to track side-band
>>>> events for all CPUs.
>>>>
>>>> The specific scenarios are as follows:
>>>>
>>>>            CPU0                                 CPU1
>>>>     perf record -C 0 start
>>>>                                 taskA starts to be created and executed
>>>>                                   -> PERF_RECORD_COMM and PERF_RECORD_MMAP
>>>>                                      events only deliver to CPU1
>>>>                                 ......
>>>>                                   |
>>>>                             migrate to CPU0
>>>>                                   |
>>>>     Running on CPU0    <----------/
>>>>     ...
>>>>
>>>>     perf record -C 0 stop
>>>>
>>>> Now perf samples the PC of taskA. However, perf does not record the
>>>> PERF_RECORD_COMM and PERF_RECORD_MMAP events of taskA.
>>>> Therefore, the comm and symbols of taskA cannot be parsed.
>>>>
>>>> The solution is to record sideband events for all CPUs when tracing
>>>> selected CPUs. Because this modifies the default behavior, add related
>>>> comments to the perf record man page.
>>>>
>>>> The sys_perf_event_open invoked is as follows:
>>>>
>>>>     # perf --debug verbose=3 record -e cpu-clock -C 1 true
>>>>     <SNIP>
>>>>     Opening: cpu-clock
>>>>     ------------------------------------------------------------
>>>>     perf_event_attr:
>>>>       type                             1 (PERF_TYPE_SOFTWARE)
>>>>       size                             136
>>>>       config                           0 (PERF_COUNT_SW_CPU_CLOCK)
>>>>       { sample_period, sample_freq }   4000
>>>>       sample_type                      IP|TID|TIME|CPU|PERIOD|IDENTIFIER
>>>>       read_format                      ID|LOST
>>>>       disabled                         1
>>>>       inherit                          1
>>>>       freq                             1
>>>>       sample_id_all                    1
>>>>       exclude_guest                    1
>>>>     ------------------------------------------------------------
>>>>     sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 5
>>>>     Opening: dummy:u
>>>>     ------------------------------------------------------------
>>>>     perf_event_attr:
>>>>       type                             1 (PERF_TYPE_SOFTWARE)
>>>>       size                             136
>>>>       config                           0x9 (PERF_COUNT_SW_DUMMY)
>>>>       { sample_period, sample_freq }   1
>>>>       sample_type                      IP|TID|TIME|CPU|IDENTIFIER
>>>>       read_format                      ID|LOST
>>>>       inherit                          1
>>>>       exclude_kernel                   1
>>>>       exclude_hv                       1
>>>>       mmap                             1
>>>>       comm                             1
>>>>       task                             1
>>>>       sample_id_all                    1
>>>>       exclude_guest                    1
>>>>       mmap2                            1
>>>>       comm_exec                        1
>>>>       ksymbol                          1
>>>>       bpf_event                        1
>>>>     ------------------------------------------------------------
>>>>     sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 6
>>>>     sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 7
>>>>     sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 9
>>>>     sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 10
>>>>     sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 11
>>>>     sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 12
>>>>     sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 13
>>>>     sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 14
>>>>     <SNIP>
>>>>
>>>> Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
>>>> Acked-by: Adrian Hunter <adrian.hunter@intel.com>
>>>> ---
>>>>    tools/perf/Documentation/perf-record.txt |  3 ++
>>>>    tools/perf/builtin-record.c              | 44 +++++++++++++++++++++++-
>>>>    2 files changed, 46 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
>>>> index d5217be012d7..1889f66addf2 100644
>>>> --- a/tools/perf/Documentation/perf-record.txt
>>>> +++ b/tools/perf/Documentation/perf-record.txt
>>>> @@ -374,6 +374,9 @@ comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-
>>>>    In per-thread mode with inheritance mode on (default), samples are captured only when
>>>>    the thread executes on the designated CPUs. Default is to monitor all CPUs.
>>>>
>>>> +User space tasks can migrate between CPUs, so when tracing selected CPUs,
>>>> +a dummy event is created to track sideband for all CPUs.
>>>> +
>>>>    -B::
>>>>    --no-buildid::
>>>>    Do not save the build ids of binaries in the perf.data files. This skips
>>>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
>>>> index 83bd1f117191..21c571018148 100644
>>>> --- a/tools/perf/builtin-record.c
>>>> +++ b/tools/perf/builtin-record.c
>>>> @@ -906,10 +906,44 @@ static int record__config_off_cpu(struct record *rec)
>>>>           return off_cpu_prepare(rec->evlist, &rec->opts.target, &rec->opts);
>>>>    }
>>>>
>>>> +static bool record__tracking_system_wide(struct record *rec)
>>>> +{
>>>> +       struct record_opts *opts = &rec->opts;
>>>> +       struct evlist *evlist = rec->evlist;
>>>> +       struct evsel *evsel;
>>>> +
>>>> +       /*
>>>> +        * If all (non-dummy) evsel have exclude_user,
>>>> +        * system_wide is not needed.
>>>
>>> Maybe I missed some earlier discussion but why is it not
>>> needed when exclude_user is set?  I think it still needs
>>> FORK or COMM at least..
>>>
>>
>> This is Adrian's suggestion earlier, I think it's probably because if exclude_user is set, MMAP information is not needed, that's my guess.
>>
>> However, as you said, even if exclude_user is set, at least FORK and COMM are required.
>>
>> Therefore, the conditions here need to be changed to:
>> "system_wide is need as long as there is a non-dummy event."
>>
>> @Adrian, is this change okay?
> 
> If you wish.  I think we use FORK to get mappings, so not sure it
> would be needed.  There is PID, TID so COMM is not essential.
> There are FORK, COMM etc from the CPUs being traced of course.
> 
I think COMM is also necessary. If there is no COMM, the perf report 
cannot display the comm of the process migrated from other core.
The test result is as follows:
   # perf report --stdio
   # To display the perf.data header info, please use 
--header/--header-only options.
   #
   #
   # Total Lost Samples: 0
   #
   # Samples: 33  of event 'cpu-clock:khppp'
   # Event count (approx.): 8250000
   #
   # Overhead  Command          Shared Object      Symbol
   # ........  ...............  ................. 
.......................................
   #
       15.15%  test3            [kernel.kallsyms]  [k] finish_task_switch
        9.09%  kworker/1:0-eve  [kernel.kallsyms]  [k] _raw_spin_unlock_irq
        3.03%  :934             [kernel.kallsyms]  [k] blk_done_softirq
        3.03%  :934             [kernel.kallsyms]  [k] finish_task_switch
   <SNIP>
Thanks,
Yang
^ permalink raw reply	[flat|nested] 15+ messages in thread
 
 
 
 
- * [PATCH v7 5/6] perf test: Add test case for record sideband events
  2023-08-26  3:26 [PATCH v7 0/6] perf record: Track sideband events for all CPUs when tracing selected CPUs Yang Jihong
                   ` (3 preceding siblings ...)
  2023-08-26  3:26 ` [PATCH v7 4/6] perf record: Track sideband events for all CPUs when tracing selected CPUs Yang Jihong
@ 2023-08-26  3:26 ` Yang Jihong
  2023-08-26  3:26 ` [PATCH v7 6/6] perf test: Add perf_event_attr test for record selected CPUs exclude_user Yang Jihong
  2023-08-29  5:37 ` [PATCH v7 0/6] perf record: Track sideband events for all CPUs when tracing selected CPUs Ravi Bangoria
  6 siblings, 0 replies; 15+ messages in thread
From: Yang Jihong @ 2023-08-26  3:26 UTC (permalink / raw)
  To: peterz, mingo, acme, mark.rutland, alexander.shishkin, jolsa,
	namhyung, irogers, adrian.hunter, kan.liang, james.clark, tmricht,
	ak, anshuman.khandual, linux-kernel, linux-perf-users
  Cc: yangjihong1
Add a new test case to record sideband events for all CPUs when tracing
selected CPUs
Test result:
  # ./perf test list 2>&1 | grep 'perf record sideband tests'
   95: perf record sideband tests
  # ./perf test 95
   95: perf record sideband tests                                      : Ok
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
---
 tools/perf/tests/shell/record_sideband.sh | 58 +++++++++++++++++++++++
 1 file changed, 58 insertions(+)
 create mode 100755 tools/perf/tests/shell/record_sideband.sh
diff --git a/tools/perf/tests/shell/record_sideband.sh b/tools/perf/tests/shell/record_sideband.sh
new file mode 100755
index 000000000000..5024a7ce0c51
--- /dev/null
+++ b/tools/perf/tests/shell/record_sideband.sh
@@ -0,0 +1,58 @@
+#!/bin/sh
+# perf record sideband tests
+# SPDX-License-Identifier: GPL-2.0
+
+set -e
+
+err=0
+perfdata=$(mktemp /tmp/__perf_test.perf.data.XXXXX)
+
+cleanup()
+{
+    rm -rf ${perfdata}
+    trap - EXIT TERM INT
+}
+
+trap_cleanup()
+{
+    cleanup
+    exit 1
+}
+trap trap_cleanup EXIT TERM INT
+
+can_cpu_wide()
+{
+    if ! perf record -o ${perfdata} -BN --no-bpf-event -C $1 true 2>&1 >/dev/null
+    then
+        echo "record sideband test [Skipped cannot record cpu$1]"
+        err=2
+    fi
+
+    rm -f ${perfdata}
+    return $err
+}
+
+test_system_wide_tracking()
+{
+    # Need CPU 0 and CPU 1
+    can_cpu_wide 0 || return 0
+    can_cpu_wide 1 || return 0
+
+    # Record on CPU 0 a task running on CPU 1
+    perf record -BN --no-bpf-event -o ${perfdata} -C 0 -- taskset --cpu-list 1 true
+
+    # Should get MMAP events from CPU 1
+    mmap_cnt=`perf script -i ${perfdata} --show-mmap-events -C 1 2>/dev/null | grep MMAP | wc -l`
+
+    if [ ${mmap_cnt} -gt 0 ] ; then
+        return 0
+    fi
+
+    echo "Failed to record MMAP events on CPU 1 when tracing CPU 0"
+    return 1
+}
+
+test_system_wide_tracking
+
+cleanup
+exit $err
-- 
2.30.GIT
^ permalink raw reply related	[flat|nested] 15+ messages in thread
- * [PATCH v7 6/6] perf test: Add perf_event_attr test for record selected CPUs exclude_user
  2023-08-26  3:26 [PATCH v7 0/6] perf record: Track sideband events for all CPUs when tracing selected CPUs Yang Jihong
                   ` (4 preceding siblings ...)
  2023-08-26  3:26 ` [PATCH v7 5/6] perf test: Add test case for record sideband events Yang Jihong
@ 2023-08-26  3:26 ` Yang Jihong
  2023-08-29  5:37 ` [PATCH v7 0/6] perf record: Track sideband events for all CPUs when tracing selected CPUs Ravi Bangoria
  6 siblings, 0 replies; 15+ messages in thread
From: Yang Jihong @ 2023-08-26  3:26 UTC (permalink / raw)
  To: peterz, mingo, acme, mark.rutland, alexander.shishkin, jolsa,
	namhyung, irogers, adrian.hunter, kan.liang, james.clark, tmricht,
	ak, anshuman.khandual, linux-kernel, linux-perf-users
  Cc: yangjihong1
If all (non-dummy) evsel have exclude_user, system_wide sideband is not
needed. Add this test scenario.
Test result:
  # ./perf test list 2>&1 | grep 'Setup struct perf_event_attr'
   17: Setup struct perf_event_attr
  # ./perf test 17 -v
   17: Setup struct perf_event_attr                                    :
  --- start ---
  test child forked, pid 720198
  <SNIP>
  running './tests/attr/test-record-C0-all-kernel'
  <SNIP>
  test child finished with 0
  ---- end ----
  Setup struct perf_event_attr: Ok
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Tested-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
---
 .../perf/tests/attr/test-record-C0-all-kernel | 32 +++++++++++++++++++
 1 file changed, 32 insertions(+)
 create mode 100644 tools/perf/tests/attr/test-record-C0-all-kernel
diff --git a/tools/perf/tests/attr/test-record-C0-all-kernel b/tools/perf/tests/attr/test-record-C0-all-kernel
new file mode 100644
index 000000000000..2d7549277c1e
--- /dev/null
+++ b/tools/perf/tests/attr/test-record-C0-all-kernel
@@ -0,0 +1,32 @@
+[config]
+command = record
+args    = --no-bpf-event --all-kernel -C 0 kill >/dev/null 2>&1
+ret     = 1
+
+[event:base-record]
+cpu=0
+
+# no enable on exec for CPU attached
+enable_on_exec=0
+
+# PERF_SAMPLE_IP | PERF_SAMPLE_TID | PERF_SAMPLE_TIME |
+# PERF_SAMPLE_PERIOD | PERF_SAMPLE_IDENTIFIER
+# + PERF_SAMPLE_CPU added by -C 0
+sample_type=65927
+
+# Dummy event handles mmaps, comm and task.
+mmap=0
+comm=0
+task=0
+
+# exclude_user for all-kernel option
+exclude_user=1
+
+[event:system-wide-dummy]
+
+# system_wide is not need for all (non-dummy) events have exclude_user
+cpu=0
+
+# exclude_user for all-kernel option
+exclude_user=1
+exclude_kernel=0
-- 
2.30.GIT
^ permalink raw reply related	[flat|nested] 15+ messages in thread
- * Re: [PATCH v7 0/6] perf record: Track sideband events for all CPUs when tracing selected CPUs
  2023-08-26  3:26 [PATCH v7 0/6] perf record: Track sideband events for all CPUs when tracing selected CPUs Yang Jihong
                   ` (5 preceding siblings ...)
  2023-08-26  3:26 ` [PATCH v7 6/6] perf test: Add perf_event_attr test for record selected CPUs exclude_user Yang Jihong
@ 2023-08-29  5:37 ` Ravi Bangoria
  2023-08-29  9:34   ` Yang Jihong
  6 siblings, 1 reply; 15+ messages in thread
From: Ravi Bangoria @ 2023-08-29  5:37 UTC (permalink / raw)
  To: Yang Jihong, acme, irogers, jolsa, namhyung, adrian.hunter
  Cc: peterz, mingo, mark.rutland, alexander.shishkin, kan.liang,
	james.clark, tmricht, ak, anshuman.khandual, linux-kernel,
	linux-perf-users, Ravi Bangoria
On 26-Aug-23 8:56 AM, Yang Jihong wrote:
> User space tasks can migrate between CPUs, track sideband events for all
> CPUs.
I've tested this series with simple test program and it mostly works fine.
Tested-by: Ravi Bangoria <ravi.bangoria@amd.com>
Since we are recording sideband data on all cpus, perf will endup recording
lots of unnecessary data, esp. on large systems. E.g. on a 128 cpu system:
Without patch:
  $ sudo ./perf record -C 10 -o perf.data.without -- sleep 10
  $ du -d1 -ha ./perf.data.without
  3.0M    ./perf.data.without
  $ sudo ./perf report --stats -i perf.data.without
  Aggregated stats:
             TOTAL events:      47011
              MMAP events:         51  ( 0.1%)
              COMM events:       1549  ( 3.3%)
              EXIT events:        105  ( 0.2%)
              FORK events:       1544  ( 3.3%)
            SAMPLE events:      38226  (81.3%)
             MMAP2 events:       5485  (11.7%)
             ...
  cycles:P stats:
            SAMPLE events:      38226
With patch:
  $ sudo ./perf record -C 10 -o perf.data.with -- sleep 10
  $ du -d1 -ha ./perf.data.with
  15M     ./perf.data.with
  $ sudo ./perf report --stats -i perf.data.with
  Aggregated stats:
             TOTAL events:     160164
              MMAP events:         51  ( 0.0%)
              COMM events:      12879  ( 8.0%)
              EXIT events:      11192  ( 7.0%)
              FORK events:      12669  ( 7.9%)
            SAMPLE events:      38464  (24.0%)
             MMAP2 events:      84844  (53.0%)
             ...
  cycles:P stats:
            SAMPLE events:      38464
Number of actual samples are same ~38K. However, the perf.data file is 5x
bigger because of additional sideband data.
I'm pretty sure we don't need most of those additional data. So, thinking
loud, should we post-process perf.data file and filter out unnecessary data?
Thanks,
Ravi
^ permalink raw reply	[flat|nested] 15+ messages in thread
- * Re: [PATCH v7 0/6] perf record: Track sideband events for all CPUs when tracing selected CPUs
  2023-08-29  5:37 ` [PATCH v7 0/6] perf record: Track sideband events for all CPUs when tracing selected CPUs Ravi Bangoria
@ 2023-08-29  9:34   ` Yang Jihong
  2023-08-29 12:24     ` Ravi Bangoria
  0 siblings, 1 reply; 15+ messages in thread
From: Yang Jihong @ 2023-08-29  9:34 UTC (permalink / raw)
  To: Ravi Bangoria, acme, irogers, jolsa, namhyung, adrian.hunter
  Cc: peterz, mingo, mark.rutland, alexander.shishkin, kan.liang,
	james.clark, tmricht, ak, anshuman.khandual, linux-kernel,
	linux-perf-users
Hello,
On 2023/8/29 13:37, Ravi Bangoria wrote:
> On 26-Aug-23 8:56 AM, Yang Jihong wrote:
>> User space tasks can migrate between CPUs, track sideband events for all
>> CPUs.
> 
> I've tested this series with simple test program and it mostly works fine.
> 
> Tested-by: Ravi Bangoria <ravi.bangoria@amd.com>
> 
Thanks for the test.
> Since we are recording sideband data on all cpus, perf will endup recording
> lots of unnecessary data, esp. on large systems. E.g. on a 128 cpu system:
> 
> Without patch:
>    $ sudo ./perf record -C 10 -o perf.data.without -- sleep 10
>    $ du -d1 -ha ./perf.data.without
>    3.0M    ./perf.data.without
> 
>    $ sudo ./perf report --stats -i perf.data.without
>    Aggregated stats:
>               TOTAL events:      47011
>                MMAP events:         51  ( 0.1%)
>                COMM events:       1549  ( 3.3%)
>                EXIT events:        105  ( 0.2%)
>                FORK events:       1544  ( 3.3%)
>              SAMPLE events:      38226  (81.3%)
>               MMAP2 events:       5485  (11.7%)
>               ...
>    cycles:P stats:
>              SAMPLE events:      38226
> 
> With patch:
>    $ sudo ./perf record -C 10 -o perf.data.with -- sleep 10
>    $ du -d1 -ha ./perf.data.with
>    15M     ./perf.data.with
> 
>    $ sudo ./perf report --stats -i perf.data.with
>    Aggregated stats:
>               TOTAL events:     160164
>                MMAP events:         51  ( 0.0%)
>                COMM events:      12879  ( 8.0%)
>                EXIT events:      11192  ( 7.0%)
>                FORK events:      12669  ( 7.9%)
>              SAMPLE events:      38464  (24.0%)
>               MMAP2 events:      84844  (53.0%)
>               ...
>    cycles:P stats:
>              SAMPLE events:      38464
> 
> Number of actual samples are same ~38K. However, the perf.data file is 5x
> bigger because of additional sideband data.
Yes, if record system wide sideband data, the amount of sideband events 
will increase proportionally, which is expected.
> 
> I'm pretty sure we don't need most of those additional data. So, thinking
> loud, should we post-process perf.data file and filter out unnecessary data?
> 
I wonder if we can add a new function in perf inject.
By reading perf.data and comparing tid of SAMPLE events and sideband 
events, we can filter out the sideband data of unmatched tasks.
That's just my idea, or there's another better solution.
Thanks,
Yang
^ permalink raw reply	[flat|nested] 15+ messages in thread 
- * Re: [PATCH v7 0/6] perf record: Track sideband events for all CPUs when tracing selected CPUs
  2023-08-29  9:34   ` Yang Jihong
@ 2023-08-29 12:24     ` Ravi Bangoria
  2023-08-31 10:13       ` Adrian Hunter
  0 siblings, 1 reply; 15+ messages in thread
From: Ravi Bangoria @ 2023-08-29 12:24 UTC (permalink / raw)
  To: Yang Jihong, acme, irogers, jolsa, namhyung, adrian.hunter
  Cc: peterz, mingo, mark.rutland, alexander.shishkin, kan.liang,
	james.clark, tmricht, ak, anshuman.khandual, linux-kernel,
	linux-perf-users, Ravi Bangoria
>> Number of actual samples are same ~38K. However, the perf.data file is 5x
>> bigger because of additional sideband data.
> 
> Yes, if record system wide sideband data, the amount of sideband events will increase proportionally, which is expected.
> 
>>
>> I'm pretty sure we don't need most of those additional data. So, thinking
>> loud, should we post-process perf.data file and filter out unnecessary data?
>>
> 
> I wonder if we can add a new function in perf inject.
Ok. perf inject is one option. But shall we do it bydefault in perf-record?
It's needed only when profiling target is set of cpus, not for systemwide or
per-process mode.
> By reading perf.data and comparing tid of SAMPLE events and sideband events, we can filter out the sideband data of unmatched tasks.
Yup. But AFAIK, perf-record keeps writing to perf.data and never does post-
processing on it. So adding support for this will take a bit of effort. Not
sure if we should do it as part of this series.
Thanks,
Ravi
^ permalink raw reply	[flat|nested] 15+ messages in thread 
- * Re: [PATCH v7 0/6] perf record: Track sideband events for all CPUs when tracing selected CPUs
  2023-08-29 12:24     ` Ravi Bangoria
@ 2023-08-31 10:13       ` Adrian Hunter
  0 siblings, 0 replies; 15+ messages in thread
From: Adrian Hunter @ 2023-08-31 10:13 UTC (permalink / raw)
  To: Ravi Bangoria, Yang Jihong, acme, irogers, jolsa, namhyung
  Cc: peterz, mingo, mark.rutland, alexander.shishkin, kan.liang,
	james.clark, tmricht, ak, anshuman.khandual, linux-kernel,
	linux-perf-users
On 29/08/23 15:24, Ravi Bangoria wrote:
>>> Number of actual samples are same ~38K. However, the perf.data file is 5x
>>> bigger because of additional sideband data.
>>
>> Yes, if record system wide sideband data, the amount of sideband events will increase proportionally, which is expected.
>>
>>>
>>> I'm pretty sure we don't need most of those additional data. So, thinking
>>> loud, should we post-process perf.data file and filter out unnecessary data?
>>>
>>
>> I wonder if we can add a new function in perf inject.
> 
> Ok. perf inject is one option. But shall we do it bydefault in perf-record?
> It's needed only when profiling target is set of cpus, not for systemwide or
> per-process mode.
> 
>> By reading perf.data and comparing tid of SAMPLE events and sideband events, we can filter out the sideband data of unmatched tasks.
> 
> Yup. But AFAIK, perf-record keeps writing to perf.data and never does post-
> processing on it. So adding support for this will take a bit of effort. Not
> sure if we should do it as part of this series.
I agree there could be more work, but probably better for a separate patch set.
^ permalink raw reply	[flat|nested] 15+ messages in thread