linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] perf lock contention: Support task/addr aggregation mode (v1)
@ 2022-12-09 19:07 Namhyung Kim
  2022-12-09 19:07 ` [PATCH 1/4] perf lock contention: Add lock_data.h for common data Namhyung Kim
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Namhyung Kim @ 2022-12-09 19:07 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Song Liu, Blake Jones, bpf

Hello,

This patchset adds two more aggregation modes for perf lock contention.

The first one is the per-task mode with -t/--threads option.  The option
was there already but it remained broken with BPF for a while.  Now it
supports the output with and without BPF.

  # perf lock contention -a -b -t -- sleep 1
   contended   total wait     max wait     avg wait          pid   comm

          11     11.85 us      2.23 us      1.08 us            0   swapper
           2     11.13 us     10.22 us      5.56 us       749053   ThreadPoolForeg
           1      8.15 us      8.15 us      8.15 us       321353   Chrome_ChildIOT
           2      2.73 us      1.77 us      1.37 us       320761   Chrome_ChildIOT
           1      1.40 us      1.40 us      1.40 us       320502   chrome
           1       379 ns       379 ns       379 ns       321227   chrome

The other one is the per-lock-instance mode with -l/--lock-addr option.
If the lock has a symbol, it will be displayed as well.

  # perf lock contention -a -b -l -- sleep 1
   contended   total wait     max wait     avg wait            address   symbol

           3      4.79 us      2.33 us      1.60 us   ffffffffbaed50c0   rcu_state
           4      4.19 us      1.62 us      1.05 us   ffffffffbae07a40   jiffies_lock
           1      1.94 us      1.94 us      1.94 us   ffff9262277861e0
           1       387 ns       387 ns       387 ns   ffff9260bfda4f60

It's based on the current acme/tmp.perf/core branch.
You can find the code in the 'perf/lock-con-aggr-v1' branch in

  git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git

Thanks,
Namhyung


Namhyung Kim (4):
  perf lock contention: Add lock_data.h for common data
  perf lock contention: Implement -t/--threads option for BPF
  perf lock contention: Add -l/--lock-addr option
  perf test: Update perf lock contention test

 tools/perf/Documentation/perf-lock.txt        |  4 +
 tools/perf/builtin-lock.c                     | 95 ++++++++++++++-----
 tools/perf/tests/shell/lock_contention.sh     | 48 ++++++++++
 tools/perf/util/bpf_lock_contention.c         | 72 ++++++++++----
 .../perf/util/bpf_skel/lock_contention.bpf.c  | 67 +++++++++----
 tools/perf/util/bpf_skel/lock_data.h          | 30 ++++++
 tools/perf/util/lock-contention.h             |  1 +
 7 files changed, 255 insertions(+), 62 deletions(-)
 create mode 100644 tools/perf/util/bpf_skel/lock_data.h


base-commit: b22802e295a80ec16e355d7208d2fbbd7bbc1b7a
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/4] perf lock contention: Add lock_data.h for common data
  2022-12-09 19:07 [PATCH 0/4] perf lock contention: Support task/addr aggregation mode (v1) Namhyung Kim
@ 2022-12-09 19:07 ` Namhyung Kim
  2022-12-12 19:42   ` Arnaldo Carvalho de Melo
  2022-12-09 19:07 ` [PATCH 2/4] perf lock contention: Implement -t/--threads option for BPF Namhyung Kim
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Namhyung Kim @ 2022-12-09 19:07 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Song Liu, Blake Jones, bpf

Accessing BPF maps should use the same data types.  Add bpf_skel/lock_data.h
to define the common data structures.  No functional changes.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/bpf_lock_contention.c         | 19 ++++--------
 .../perf/util/bpf_skel/lock_contention.bpf.c  | 17 ++---------
 tools/perf/util/bpf_skel/lock_data.h          | 30 +++++++++++++++++++
 3 files changed, 38 insertions(+), 28 deletions(-)
 create mode 100644 tools/perf/util/bpf_skel/lock_data.h

diff --git a/tools/perf/util/bpf_lock_contention.c b/tools/perf/util/bpf_lock_contention.c
index f4ebb9a2e380..b6a8eb7164b3 100644
--- a/tools/perf/util/bpf_lock_contention.c
+++ b/tools/perf/util/bpf_lock_contention.c
@@ -12,17 +12,10 @@
 #include <bpf/bpf.h>
 
 #include "bpf_skel/lock_contention.skel.h"
+#include "bpf_skel/lock_data.h"
 
 static struct lock_contention_bpf *skel;
 
-struct lock_contention_data {
-	u64 total_time;
-	u64 min_time;
-	u64 max_time;
-	u32 count;
-	u32 flags;
-};
-
 int lock_contention_prepare(struct lock_contention *con)
 {
 	int i, fd;
@@ -110,8 +103,8 @@ int lock_contention_stop(void)
 int lock_contention_read(struct lock_contention *con)
 {
 	int fd, stack, err = 0;
-	s32 prev_key, key;
-	struct lock_contention_data data = {};
+	struct contention_key *prev_key, key;
+	struct contention_data data = {};
 	struct lock_stat *st = NULL;
 	struct machine *machine = con->machine;
 	u64 *stack_trace;
@@ -126,8 +119,8 @@ int lock_contention_read(struct lock_contention *con)
 	if (stack_trace == NULL)
 		return -1;
 
-	prev_key = 0;
-	while (!bpf_map_get_next_key(fd, &prev_key, &key)) {
+	prev_key = NULL;
+	while (!bpf_map_get_next_key(fd, prev_key, &key)) {
 		struct map *kmap;
 		struct symbol *sym;
 		int idx = 0;
@@ -184,7 +177,7 @@ int lock_contention_read(struct lock_contention *con)
 		}
 
 		hlist_add_head(&st->hash_entry, con->result);
-		prev_key = key;
+		prev_key = &key;
 
 		/* we're fine now, reset the values */
 		st = NULL;
diff --git a/tools/perf/util/bpf_skel/lock_contention.bpf.c b/tools/perf/util/bpf_skel/lock_contention.bpf.c
index 9681cb59b0df..0f63cc28ccba 100644
--- a/tools/perf/util/bpf_skel/lock_contention.bpf.c
+++ b/tools/perf/util/bpf_skel/lock_contention.bpf.c
@@ -5,24 +5,11 @@
 #include <bpf/bpf_tracing.h>
 #include <bpf/bpf_core_read.h>
 
-/* maximum stack trace depth */
-#define MAX_STACKS   8
+#include "lock_data.h"
 
 /* default buffer size */
 #define MAX_ENTRIES  10240
 
-struct contention_key {
-	__s32 stack_id;
-};
-
-struct contention_data {
-	__u64 total_time;
-	__u64 min_time;
-	__u64 max_time;
-	__u32 count;
-	__u32 flags;
-};
-
 struct tstamp_data {
 	__u64 timestamp;
 	__u64 lock;
@@ -34,7 +21,7 @@ struct tstamp_data {
 struct {
 	__uint(type, BPF_MAP_TYPE_STACK_TRACE);
 	__uint(key_size, sizeof(__u32));
-	__uint(value_size, MAX_STACKS * sizeof(__u64));
+	__uint(value_size, sizeof(__u64));
 	__uint(max_entries, MAX_ENTRIES);
 } stacks SEC(".maps");
 
diff --git a/tools/perf/util/bpf_skel/lock_data.h b/tools/perf/util/bpf_skel/lock_data.h
new file mode 100644
index 000000000000..dbdf4caedc4a
--- /dev/null
+++ b/tools/perf/util/bpf_skel/lock_data.h
@@ -0,0 +1,30 @@
+// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+/* Data structures shared between BPF and tools. */
+#ifndef UTIL_BPF_SKEL_LOCK_DATA_H
+#define UTIL_BPF_SKEL_LOCK_DATA_H
+
+struct contention_key {
+	s32 stack_or_task_id;
+};
+
+#define TASK_COMM_LEN  16
+
+struct contention_task_data {
+	char comm[TASK_COMM_LEN];
+};
+
+struct contention_data {
+	u64 total_time;
+	u64 min_time;
+	u64 max_time;
+	u32 count;
+	u32 flags;
+};
+
+enum lock_aggr_mode {
+	LOCK_AGGR_ADDR = 0,
+	LOCK_AGGR_TASK,
+	LOCK_AGGR_CALLER,
+};
+
+#endif /* UTIL_BPF_SKEL_LOCK_DATA_H */
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/4] perf lock contention: Implement -t/--threads option for BPF
  2022-12-09 19:07 [PATCH 0/4] perf lock contention: Support task/addr aggregation mode (v1) Namhyung Kim
  2022-12-09 19:07 ` [PATCH 1/4] perf lock contention: Add lock_data.h for common data Namhyung Kim
@ 2022-12-09 19:07 ` Namhyung Kim
  2022-12-09 19:07 ` [PATCH 3/4] perf lock contention: Add -l/--lock-addr option Namhyung Kim
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Namhyung Kim @ 2022-12-09 19:07 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Song Liu, Blake Jones, bpf

The BPF didn't show the per-thread stat properly.  Use task's thread id (PID)
as a key instead of stack_id and add a task_data map to save task comm names.

  $ sudo ./perf lock con -abt -E 5 sleep 1
   contended   total wait     max wait     avg wait          pid   comm

           1    740.66 ms    740.66 ms    740.66 ms         1950   nv_queue
           3    305.50 ms    298.19 ms    101.83 ms         1884   nvidia-modeset/
           1     25.14 us     25.14 us     25.14 us      2725038   EventManager_De
          12     23.09 us      9.30 us      1.92 us            0   swapper
           1     20.18 us     20.18 us     20.18 us      2725033   EventManager_De

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-lock.c                     | 13 ++----
 tools/perf/util/bpf_lock_contention.c         | 40 ++++++++++++++++--
 .../perf/util/bpf_skel/lock_contention.bpf.c  | 41 +++++++++++++++++--
 tools/perf/util/lock-contention.h             |  1 +
 4 files changed, 78 insertions(+), 17 deletions(-)

diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
index 15ce6358f127..6fa3cdfec5cb 100644
--- a/tools/perf/builtin-lock.c
+++ b/tools/perf/builtin-lock.c
@@ -12,6 +12,7 @@
 #include "util/target.h"
 #include "util/callchain.h"
 #include "util/lock-contention.h"
+#include "util/bpf_skel/lock_data.h"
 
 #include <subcmd/pager.h>
 #include <subcmd/parse-options.h>
@@ -61,11 +62,7 @@ static int max_stack_depth = CONTENTION_STACK_DEPTH;
 static int stack_skip = CONTENTION_STACK_SKIP;
 static int print_nr_entries = INT_MAX / 2;
 
-static enum {
-	LOCK_AGGR_ADDR,
-	LOCK_AGGR_TASK,
-	LOCK_AGGR_CALLER,
-} aggr_mode = LOCK_AGGR_ADDR;
+static enum lock_aggr_mode aggr_mode = LOCK_AGGR_ADDR;
 
 static struct thread_stat *thread_stat_find(u32 tid)
 {
@@ -1619,6 +1616,7 @@ static int __cmd_contention(int argc, const char **argv)
 		.map_nr_entries = bpf_map_entries,
 		.max_stack = max_stack_depth,
 		.stack_skip = stack_skip,
+		.aggr_mode = show_thread_stats ? LOCK_AGGR_TASK : LOCK_AGGR_CALLER,
 	};
 
 	session = perf_session__new(use_bpf ? NULL : &data, &eops);
@@ -1691,11 +1689,6 @@ static int __cmd_contention(int argc, const char **argv)
 	if (select_key(true))
 		goto out_delete;
 
-	if (show_thread_stats)
-		aggr_mode = LOCK_AGGR_TASK;
-	else
-		aggr_mode = LOCK_AGGR_CALLER;
-
 	if (use_bpf) {
 		lock_contention_start();
 		if (argc)
diff --git a/tools/perf/util/bpf_lock_contention.c b/tools/perf/util/bpf_lock_contention.c
index b6a8eb7164b3..1590a9f05145 100644
--- a/tools/perf/util/bpf_lock_contention.c
+++ b/tools/perf/util/bpf_lock_contention.c
@@ -5,6 +5,7 @@
 #include "util/map.h"
 #include "util/symbol.h"
 #include "util/target.h"
+#include "util/thread.h"
 #include "util/thread_map.h"
 #include "util/lock-contention.h"
 #include <linux/zalloc.h>
@@ -30,10 +31,17 @@ int lock_contention_prepare(struct lock_contention *con)
 	}
 
 	bpf_map__set_value_size(skel->maps.stacks, con->max_stack * sizeof(u64));
-	bpf_map__set_max_entries(skel->maps.stacks, con->map_nr_entries);
 	bpf_map__set_max_entries(skel->maps.lock_stat, con->map_nr_entries);
 	bpf_map__set_max_entries(skel->maps.tstamp, con->map_nr_entries);
 
+	if (con->aggr_mode == LOCK_AGGR_TASK) {
+		bpf_map__set_max_entries(skel->maps.task_data, con->map_nr_entries);
+		bpf_map__set_max_entries(skel->maps.stacks, 1);
+	} else {
+		bpf_map__set_max_entries(skel->maps.task_data, 1);
+		bpf_map__set_max_entries(skel->maps.stacks, con->map_nr_entries);
+	}
+
 	if (target__has_cpu(target))
 		ncpus = perf_cpu_map__nr(evlist->core.user_requested_cpus);
 	if (target__has_task(target))
@@ -82,7 +90,9 @@ int lock_contention_prepare(struct lock_contention *con)
 		bpf_map_update_elem(fd, &pid, &val, BPF_ANY);
 	}
 
+	/* these don't work well if in the rodata section */
 	skel->bss->stack_skip = con->stack_skip;
+	skel->bss->aggr_mode = con->aggr_mode;
 
 	lock_contention_bpf__attach(skel);
 	return 0;
@@ -102,7 +112,7 @@ int lock_contention_stop(void)
 
 int lock_contention_read(struct lock_contention *con)
 {
-	int fd, stack, err = 0;
+	int fd, stack, task_fd, err = 0;
 	struct contention_key *prev_key, key;
 	struct contention_data data = {};
 	struct lock_stat *st = NULL;
@@ -112,6 +122,7 @@ int lock_contention_read(struct lock_contention *con)
 
 	fd = bpf_map__fd(skel->maps.lock_stat);
 	stack = bpf_map__fd(skel->maps.stacks);
+	task_fd = bpf_map__fd(skel->maps.task_data);
 
 	con->lost = skel->bss->lost;
 
@@ -119,6 +130,13 @@ int lock_contention_read(struct lock_contention *con)
 	if (stack_trace == NULL)
 		return -1;
 
+	if (con->aggr_mode == LOCK_AGGR_TASK) {
+		struct thread *idle = __machine__findnew_thread(machine,
+								/*pid=*/0,
+								/*tid=*/0);
+		thread__set_comm(idle, "swapper", /*timestamp=*/0);
+	}
+
 	prev_key = NULL;
 	while (!bpf_map_get_next_key(fd, prev_key, &key)) {
 		struct map *kmap;
@@ -143,6 +161,22 @@ int lock_contention_read(struct lock_contention *con)
 
 		st->flags = data.flags;
 
+		if (con->aggr_mode == LOCK_AGGR_TASK) {
+			struct contention_task_data task;
+			struct thread *t;
+
+			st->addr = key.stack_or_task_id;
+
+			/* do not update idle comm which contains CPU number */
+			if (st->addr) {
+				bpf_map_lookup_elem(task_fd, &key, &task);
+				t = __machine__findnew_thread(machine, /*pid=*/-1,
+							      key.stack_or_task_id);
+				thread__set_comm(t, task.comm, /*timestamp=*/0);
+			}
+			goto next;
+		}
+
 		bpf_map_lookup_elem(stack, &key, stack_trace);
 
 		/* skip lock internal functions */
@@ -175,7 +209,7 @@ int lock_contention_read(struct lock_contention *con)
 			if (st->callstack == NULL)
 				break;
 		}
-
+next:
 		hlist_add_head(&st->hash_entry, con->result);
 		prev_key = &key;
 
diff --git a/tools/perf/util/bpf_skel/lock_contention.bpf.c b/tools/perf/util/bpf_skel/lock_contention.bpf.c
index 0f63cc28ccba..cd405adcd252 100644
--- a/tools/perf/util/bpf_skel/lock_contention.bpf.c
+++ b/tools/perf/util/bpf_skel/lock_contention.bpf.c
@@ -41,6 +41,13 @@ struct {
 	__uint(max_entries, MAX_ENTRIES);
 } lock_stat SEC(".maps");
 
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(struct contention_task_data));
+	__uint(max_entries, MAX_ENTRIES);
+} task_data SEC(".maps");
+
 struct {
 	__uint(type, BPF_MAP_TYPE_HASH);
 	__uint(key_size, sizeof(__u32));
@@ -61,6 +68,9 @@ int has_cpu;
 int has_task;
 int stack_skip;
 
+/* determine the key of lock stat */
+int aggr_mode;
+
 /* error stat */
 int lost;
 
@@ -87,6 +97,19 @@ static inline int can_record(void)
 	return 1;
 }
 
+static inline void update_task_data(__u32 pid)
+{
+	struct contention_task_data *p;
+
+	p = bpf_map_lookup_elem(&task_data, &pid);
+	if (p == NULL) {
+		struct contention_task_data data;
+
+		bpf_get_current_comm(data.comm, sizeof(data.comm));
+		bpf_map_update_elem(&task_data, &pid, &data, BPF_NOEXIST);
+	}
+}
+
 SEC("tp_btf/contention_begin")
 int contention_begin(u64 *ctx)
 {
@@ -115,10 +138,14 @@ int contention_begin(u64 *ctx)
 	pelem->timestamp = bpf_ktime_get_ns();
 	pelem->lock = (__u64)ctx[0];
 	pelem->flags = (__u32)ctx[1];
-	pelem->stack_id = bpf_get_stackid(ctx, &stacks, BPF_F_FAST_STACK_CMP | stack_skip);
 
-	if (pelem->stack_id < 0)
-		lost++;
+	if (aggr_mode == LOCK_AGGR_CALLER) {
+		pelem->stack_id = bpf_get_stackid(ctx, &stacks,
+						  BPF_F_FAST_STACK_CMP | stack_skip);
+		if (pelem->stack_id < 0)
+			lost++;
+	}
+
 	return 0;
 }
 
@@ -141,7 +168,13 @@ int contention_end(u64 *ctx)
 
 	duration = bpf_ktime_get_ns() - pelem->timestamp;
 
-	key.stack_id = pelem->stack_id;
+	if (aggr_mode == LOCK_AGGR_CALLER) {
+		key.stack_or_task_id = pelem->stack_id;
+	} else {
+		key.stack_or_task_id = pid;
+		update_task_data(pid);
+	}
+
 	data = bpf_map_lookup_elem(&lock_stat, &key);
 	if (!data) {
 		struct contention_data first = {
diff --git a/tools/perf/util/lock-contention.h b/tools/perf/util/lock-contention.h
index a2346875098d..47fd47fb56c1 100644
--- a/tools/perf/util/lock-contention.h
+++ b/tools/perf/util/lock-contention.h
@@ -117,6 +117,7 @@ struct lock_contention {
 	int lost;
 	int max_stack;
 	int stack_skip;
+	int aggr_mode;
 };
 
 #ifdef HAVE_BPF_SKEL
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/4] perf lock contention: Add -l/--lock-addr option
  2022-12-09 19:07 [PATCH 0/4] perf lock contention: Support task/addr aggregation mode (v1) Namhyung Kim
  2022-12-09 19:07 ` [PATCH 1/4] perf lock contention: Add lock_data.h for common data Namhyung Kim
  2022-12-09 19:07 ` [PATCH 2/4] perf lock contention: Implement -t/--threads option for BPF Namhyung Kim
@ 2022-12-09 19:07 ` Namhyung Kim
  2022-12-09 19:07 ` [PATCH 4/4] perf test: Update perf lock contention test Namhyung Kim
  2022-12-12 19:58 ` [PATCH 0/4] perf lock contention: Support task/addr aggregation mode (v1) Arnaldo Carvalho de Melo
  4 siblings, 0 replies; 10+ messages in thread
From: Namhyung Kim @ 2022-12-09 19:07 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Song Liu, Blake Jones, bpf

The -l/--lock-addr option is to implement per-lock-instance contention
stat using LOCK_AGGR_ADDR.  It displays lock address and optionally
symbol name if exists.

  $ sudo ./perf lock con -abl sleep 1
   contended   total wait     max wait     avg wait            address   symbol

           1     36.28 us     36.28 us     36.28 us   ffff92615d6448b8
           9     10.91 us      1.84 us      1.21 us   ffffffffbaed50c0   rcu_state
           1     10.49 us     10.49 us     10.49 us   ffff9262ac4f0c80
           8      4.68 us      1.67 us       585 ns   ffffffffbae07a40   jiffies_lock
           3      3.03 us      1.45 us      1.01 us   ffff9262277861e0
           1       924 ns       924 ns       924 ns   ffff926095ba9d20
           1       436 ns       436 ns       436 ns   ffff9260bfda4f60

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/Documentation/perf-lock.txt        |  4 +
 tools/perf/builtin-lock.c                     | 84 +++++++++++++++----
 tools/perf/util/bpf_lock_contention.c         | 23 +++--
 .../perf/util/bpf_skel/lock_contention.bpf.c  | 17 +++-
 tools/perf/util/bpf_skel/lock_data.h          |  2 +-
 5 files changed, 102 insertions(+), 28 deletions(-)

diff --git a/tools/perf/Documentation/perf-lock.txt b/tools/perf/Documentation/perf-lock.txt
index 4958a1ffa1cc..38e79d45e426 100644
--- a/tools/perf/Documentation/perf-lock.txt
+++ b/tools/perf/Documentation/perf-lock.txt
@@ -168,6 +168,10 @@ CONTENTION OPTIONS
 --entries=<value>::
 	Display this many entries.
 
+-l::
+--lock-addr::
+	Show lock contention stat by address
+
 
 SEE ALSO
 --------
diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
index 6fa3cdfec5cb..25c0a5e5051f 100644
--- a/tools/perf/builtin-lock.c
+++ b/tools/perf/builtin-lock.c
@@ -56,6 +56,7 @@ static struct rb_root		thread_stats;
 
 static bool combine_locks;
 static bool show_thread_stats;
+static bool show_lock_addrs;
 static bool use_bpf;
 static unsigned long bpf_map_entries = 10240;
 static int max_stack_depth = CONTENTION_STACK_DEPTH;
@@ -999,13 +1000,32 @@ static int report_lock_contention_begin_event(struct evsel *evsel,
 	ls = lock_stat_find(key);
 	if (!ls) {
 		char buf[128];
-		const char *caller = buf;
+		const char *name = "";
 		unsigned int flags = evsel__intval(evsel, sample, "flags");
+		struct machine *machine = &session->machines.host;
+		struct map *kmap;
+		struct symbol *sym;
+
+		switch (aggr_mode) {
+		case LOCK_AGGR_ADDR:
+			/* make sure it loads the kernel map to find lock symbols */
+			map__load(machine__kernel_map(machine));
+
+			sym = machine__find_kernel_symbol(machine, key, &kmap);
+			if (sym)
+				name = sym->name;
+			break;
+		case LOCK_AGGR_CALLER:
+			name = buf;
+			if (lock_contention_caller(evsel, sample, buf, sizeof(buf)) < 0)
+				name = "Unknown";
+			break;
+		case LOCK_AGGR_TASK:
+		default:
+			break;
+		}
 
-		if (lock_contention_caller(evsel, sample, buf, sizeof(buf)) < 0)
-			caller = "Unknown";
-
-		ls = lock_stat_findnew(key, caller, flags);
+		ls = lock_stat_findnew(key, name, flags);
 		if (!ls)
 			return -ENOMEM;
 
@@ -1460,10 +1480,19 @@ static void print_contention_result(struct lock_contention *con)
 		list_for_each_entry(key, &lock_keys, list)
 			pr_info("%*s ", key->len, key->header);
 
-		if (show_thread_stats)
+		switch (aggr_mode) {
+		case LOCK_AGGR_TASK:
 			pr_info("  %10s   %s\n\n", "pid", "comm");
-		else
+			break;
+		case LOCK_AGGR_CALLER:
 			pr_info("  %10s   %s\n\n", "type", "caller");
+			break;
+		case LOCK_AGGR_ADDR:
+			pr_info("  %16s   %s\n\n", "address", "symbol");
+			break;
+		default:
+			break;
+		}
 	}
 
 	bad = total = printed = 0;
@@ -1471,6 +1500,9 @@ static void print_contention_result(struct lock_contention *con)
 		bad = bad_hist[BROKEN_CONTENDED];
 
 	while ((st = pop_from_result())) {
+		struct thread *t;
+		int pid;
+
 		total += use_bpf ? st->nr_contended : 1;
 		if (st->broken)
 			bad++;
@@ -1480,18 +1512,24 @@ static void print_contention_result(struct lock_contention *con)
 			pr_info(" ");
 		}
 
-		if (show_thread_stats) {
-			struct thread *t;
-			int pid = st->addr;
-
-			/* st->addr contains tid of thread */
+		switch (aggr_mode) {
+		case LOCK_AGGR_CALLER:
+			pr_info("  %10s   %s\n", get_type_str(st), st->name);
+			break;
+		case LOCK_AGGR_TASK:
+			pid = st->addr;
 			t = perf_session__findnew(session, pid);
 			pr_info("  %10d   %s\n", pid, thread__comm_str(t));
-			goto next;
+			break;
+		case LOCK_AGGR_ADDR:
+			pr_info("  %016llx   %s\n", (unsigned long long)st->addr,
+				st->name ? : "");
+			break;
+		default:
+			break;
 		}
 
-		pr_info("  %10s   %s\n", get_type_str(st), st->name);
-		if (verbose) {
+		if (aggr_mode == LOCK_AGGR_CALLER && verbose) {
 			struct map *kmap;
 			struct symbol *sym;
 			char buf[128];
@@ -1508,7 +1546,6 @@ static void print_contention_result(struct lock_contention *con)
 			}
 		}
 
-next:
 		if (++printed >= print_nr_entries)
 			break;
 	}
@@ -1616,7 +1653,6 @@ static int __cmd_contention(int argc, const char **argv)
 		.map_nr_entries = bpf_map_entries,
 		.max_stack = max_stack_depth,
 		.stack_skip = stack_skip,
-		.aggr_mode = show_thread_stats ? LOCK_AGGR_TASK : LOCK_AGGR_CALLER,
 	};
 
 	session = perf_session__new(use_bpf ? NULL : &data, &eops);
@@ -1627,6 +1663,9 @@ static int __cmd_contention(int argc, const char **argv)
 
 	con.machine = &session->machines.host;
 
+	con.aggr_mode = aggr_mode = show_thread_stats ? LOCK_AGGR_TASK :
+		show_lock_addrs ? LOCK_AGGR_ADDR : LOCK_AGGR_CALLER;
+
 	/* for lock function check */
 	symbol_conf.sort_by_name = true;
 	symbol__init(&session->header.env);
@@ -1907,6 +1946,7 @@ int cmd_lock(int argc, const char **argv)
 		    "Set the number of stack depth to skip when finding a lock caller, "
 		    "Default: " __stringify(CONTENTION_STACK_SKIP)),
 	OPT_INTEGER('E', "entries", &print_nr_entries, "display this many functions"),
+	OPT_BOOLEAN('l', "lock-addr", &show_lock_addrs, "show lock stats by address"),
 	OPT_PARENT(lock_options)
 	};
 
@@ -1976,6 +2016,16 @@ int cmd_lock(int argc, const char **argv)
 			argc = parse_options(argc, argv, contention_options,
 					     contention_usage, 0);
 		}
+
+		if (show_thread_stats && show_lock_addrs) {
+			pr_err("Cannot use thread and addr mode together\n");
+			parse_options_usage(contention_usage, contention_options,
+					    "threads", 0);
+			parse_options_usage(NULL, contention_options,
+					    "lock-addr", 0);
+			return -1;
+		}
+
 		rc = __cmd_contention(argc, argv);
 	} else {
 		usage_with_options(lock_usage, lock_options);
diff --git a/tools/perf/util/bpf_lock_contention.c b/tools/perf/util/bpf_lock_contention.c
index 1590a9f05145..8e1b791dc58f 100644
--- a/tools/perf/util/bpf_lock_contention.c
+++ b/tools/perf/util/bpf_lock_contention.c
@@ -137,11 +137,15 @@ int lock_contention_read(struct lock_contention *con)
 		thread__set_comm(idle, "swapper", /*timestamp=*/0);
 	}
 
+	/* make sure it loads the kernel map */
+	map__load(maps__first(machine->kmaps));
+
 	prev_key = NULL;
 	while (!bpf_map_get_next_key(fd, prev_key, &key)) {
 		struct map *kmap;
 		struct symbol *sym;
 		int idx = 0;
+		s32 stack_id;
 
 		/* to handle errors in the loop body */
 		err = -1;
@@ -160,24 +164,31 @@ int lock_contention_read(struct lock_contention *con)
 			st->avg_wait_time = data.total_time / data.count;
 
 		st->flags = data.flags;
+		st->addr = key.aggr_key;
 
 		if (con->aggr_mode == LOCK_AGGR_TASK) {
 			struct contention_task_data task;
 			struct thread *t;
-
-			st->addr = key.stack_or_task_id;
+			int pid = key.aggr_key;
 
 			/* do not update idle comm which contains CPU number */
 			if (st->addr) {
-				bpf_map_lookup_elem(task_fd, &key, &task);
-				t = __machine__findnew_thread(machine, /*pid=*/-1,
-							      key.stack_or_task_id);
+				bpf_map_lookup_elem(task_fd, &pid, &task);
+				t = __machine__findnew_thread(machine, /*pid=*/-1, pid);
 				thread__set_comm(t, task.comm, /*timestamp=*/0);
 			}
 			goto next;
 		}
 
-		bpf_map_lookup_elem(stack, &key, stack_trace);
+		if (con->aggr_mode == LOCK_AGGR_ADDR) {
+			sym = machine__find_kernel_symbol(machine, st->addr, &kmap);
+			if (sym)
+				st->name = strdup(sym->name);
+			goto next;
+		}
+
+		stack_id = key.aggr_key;
+		bpf_map_lookup_elem(stack, &stack_id, stack_trace);
 
 		/* skip lock internal functions */
 		while (machine__is_lock_function(machine, stack_trace[idx]) &&
diff --git a/tools/perf/util/bpf_skel/lock_contention.bpf.c b/tools/perf/util/bpf_skel/lock_contention.bpf.c
index cd405adcd252..11b0fc7ee53b 100644
--- a/tools/perf/util/bpf_skel/lock_contention.bpf.c
+++ b/tools/perf/util/bpf_skel/lock_contention.bpf.c
@@ -168,11 +168,20 @@ int contention_end(u64 *ctx)
 
 	duration = bpf_ktime_get_ns() - pelem->timestamp;
 
-	if (aggr_mode == LOCK_AGGR_CALLER) {
-		key.stack_or_task_id = pelem->stack_id;
-	} else {
-		key.stack_or_task_id = pid;
+	switch (aggr_mode) {
+	case LOCK_AGGR_CALLER:
+		key.aggr_key = pelem->stack_id;
+		break;
+	case LOCK_AGGR_TASK:
+		key.aggr_key = pid;
 		update_task_data(pid);
+		break;
+	case LOCK_AGGR_ADDR:
+		key.aggr_key = pelem->lock;
+		break;
+	default:
+		/* should not happen */
+		return 0;
 	}
 
 	data = bpf_map_lookup_elem(&lock_stat, &key);
diff --git a/tools/perf/util/bpf_skel/lock_data.h b/tools/perf/util/bpf_skel/lock_data.h
index dbdf4caedc4a..ce71cf1a7e1e 100644
--- a/tools/perf/util/bpf_skel/lock_data.h
+++ b/tools/perf/util/bpf_skel/lock_data.h
@@ -4,7 +4,7 @@
 #define UTIL_BPF_SKEL_LOCK_DATA_H
 
 struct contention_key {
-	s32 stack_or_task_id;
+	u64 aggr_key;  /* can be stack_id, pid or lock addr */
 };
 
 #define TASK_COMM_LEN  16
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 4/4] perf test: Update perf lock contention test
  2022-12-09 19:07 [PATCH 0/4] perf lock contention: Support task/addr aggregation mode (v1) Namhyung Kim
                   ` (2 preceding siblings ...)
  2022-12-09 19:07 ` [PATCH 3/4] perf lock contention: Add -l/--lock-addr option Namhyung Kim
@ 2022-12-09 19:07 ` Namhyung Kim
  2022-12-12 19:58 ` [PATCH 0/4] perf lock contention: Support task/addr aggregation mode (v1) Arnaldo Carvalho de Melo
  4 siblings, 0 replies; 10+ messages in thread
From: Namhyung Kim @ 2022-12-09 19:07 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Song Liu, Blake Jones, bpf

Add test cases for the task and addr aggregation modes.

  $ sudo ./perf test -v contention
   86: kernel lock contention analysis test                            :
  --- start ---
  test child forked, pid 680006
  Testing perf lock record and perf lock contention
  Testing perf lock contention --use-bpf
  Testing perf lock record and perf lock contention at the same time
  Testing perf lock contention --threads
  Testing perf lock contention --lock-addr
  test child finished with 0
  ---- end ----
  kernel lock contention analysis test: Ok

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/tests/shell/lock_contention.sh | 48 +++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/tools/perf/tests/shell/lock_contention.sh b/tools/perf/tests/shell/lock_contention.sh
index f7bd0d8eb5c3..cc9ceb9e19ca 100755
--- a/tools/perf/tests/shell/lock_contention.sh
+++ b/tools/perf/tests/shell/lock_contention.sh
@@ -77,10 +77,58 @@ test_record_concurrent()
 	fi
 }
 
+test_aggr_task()
+{
+	echo "Testing perf lock contention --threads"
+	perf lock contention -i ${perfdata} -t -E 1 -q 2> ${result}
+	if [ $(cat "${result}" | wc -l) != "1" ]; then
+		echo "[Fail] Recorded result count is not 1:" $(cat "${result}" | wc -l)
+		err=1
+		exit
+	fi
+
+	if ! perf lock con -b true > /dev/null 2>&1 ; then
+		return
+	fi
+
+	# the perf lock contention output goes to the stderr
+	perf lock con -a -b -t -E 1 -q -- perf bench sched messaging > /dev/null 2> ${result}
+	if [ $(cat "${result}" | wc -l) != "1" ]; then
+		echo "[Fail] BPF result count is not 1:" $(cat "${result}" | wc -l)
+		err=1
+		exit
+	fi
+}
+
+test_aggr_addr()
+{
+	echo "Testing perf lock contention --lock-addr"
+	perf lock contention -i ${perfdata} -l -E 1 -q 2> ${result}
+	if [ $(cat "${result}" | wc -l) != "1" ]; then
+		echo "[Fail] Recorded result count is not 1:" $(cat "${result}" | wc -l)
+		err=1
+		exit
+	fi
+
+	if ! perf lock con -b true > /dev/null 2>&1 ; then
+		return
+	fi
+
+	# the perf lock contention output goes to the stderr
+	perf lock con -a -b -t -E 1 -q -- perf bench sched messaging > /dev/null 2> ${result}
+	if [ $(cat "${result}" | wc -l) != "1" ]; then
+		echo "[Fail] BPF result count is not 1:" $(cat "${result}" | wc -l)
+		err=1
+		exit
+	fi
+}
+
 check
 
 test_record
 test_bpf
 test_record_concurrent
+test_aggr_task
+test_aggr_addr
 
 exit ${err}
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/4] perf lock contention: Add lock_data.h for common data
  2022-12-09 19:07 ` [PATCH 1/4] perf lock contention: Add lock_data.h for common data Namhyung Kim
@ 2022-12-12 19:42   ` Arnaldo Carvalho de Melo
  2022-12-12 19:43     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 10+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-12-12 19:42 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers,
	Adrian Hunter, linux-perf-users, Song Liu, Blake Jones, bpf

Em Fri, Dec 09, 2022 at 11:07:24AM -0800, Namhyung Kim escreveu:
> Accessing BPF maps should use the same data types.  Add bpf_skel/lock_data.h
> to define the common data structures.  No functional changes.

You forgot to update one of the stack_id users, that field got renamed:

util/bpf_skel/lock_contention.bpf.c:144:6: error: no member named 'stack_id' in 'struct contention_key'
        key.stack_id = pelem->stack_id;
        ~~~ ^
1 error generated.
make[2]: *** [Makefile.perf:1075: /tmp/build/perf/util/bpf_skel/.tmp/lock_contention.bpf.o] Error 1
make[1]: *** [Makefile.perf:236: sub-make] Error 2
make: *** [Makefile:113: install-bin] Error 2
make: Leaving directory '/var/home/acme/git/perf/tools/perf'

 Performance counter stats for 'make -k NO_LIBTRACEEVENT=1 BUILD_BPF_SKEL=1 CORESIGHT=1 O=/tmp/build/perf -C tools/perf install-bin':

     7,005,216,342      cycles:u
    11,851,225,594      instructions:u                   #    1.69  insn per cycle

       3.168945139 seconds time elapsed

       1.730964000 seconds user
       1.578932000 seconds sys


⬢[acme@toolbox perf]$ git log --oneline -4
f6e7a5f1db49dc8e (HEAD) perf lock contention: Add lock_data.h for common data
5d9b55713c5c037f perf python: Account for multiple words in CC
d9078bf3f3320457 perf off_cpu: Fix a typo in BTF tracepoint name, it should be 'btf_trace_sched_switch'
3b7ea76f0f7844f5 perf test: Update event group check for support of uncore event
⬢[acme@toolbox perf]$

After some point it builds.

I'm fixing this to keep it bisectable.

- Arnaldo
 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/util/bpf_lock_contention.c         | 19 ++++--------
>  .../perf/util/bpf_skel/lock_contention.bpf.c  | 17 ++---------
>  tools/perf/util/bpf_skel/lock_data.h          | 30 +++++++++++++++++++
>  3 files changed, 38 insertions(+), 28 deletions(-)
>  create mode 100644 tools/perf/util/bpf_skel/lock_data.h
> 
> diff --git a/tools/perf/util/bpf_lock_contention.c b/tools/perf/util/bpf_lock_contention.c
> index f4ebb9a2e380..b6a8eb7164b3 100644
> --- a/tools/perf/util/bpf_lock_contention.c
> +++ b/tools/perf/util/bpf_lock_contention.c
> @@ -12,17 +12,10 @@
>  #include <bpf/bpf.h>
>  
>  #include "bpf_skel/lock_contention.skel.h"
> +#include "bpf_skel/lock_data.h"
>  
>  static struct lock_contention_bpf *skel;
>  
> -struct lock_contention_data {
> -	u64 total_time;
> -	u64 min_time;
> -	u64 max_time;
> -	u32 count;
> -	u32 flags;
> -};
> -
>  int lock_contention_prepare(struct lock_contention *con)
>  {
>  	int i, fd;
> @@ -110,8 +103,8 @@ int lock_contention_stop(void)
>  int lock_contention_read(struct lock_contention *con)
>  {
>  	int fd, stack, err = 0;
> -	s32 prev_key, key;
> -	struct lock_contention_data data = {};
> +	struct contention_key *prev_key, key;
> +	struct contention_data data = {};
>  	struct lock_stat *st = NULL;
>  	struct machine *machine = con->machine;
>  	u64 *stack_trace;
> @@ -126,8 +119,8 @@ int lock_contention_read(struct lock_contention *con)
>  	if (stack_trace == NULL)
>  		return -1;
>  
> -	prev_key = 0;
> -	while (!bpf_map_get_next_key(fd, &prev_key, &key)) {
> +	prev_key = NULL;
> +	while (!bpf_map_get_next_key(fd, prev_key, &key)) {
>  		struct map *kmap;
>  		struct symbol *sym;
>  		int idx = 0;
> @@ -184,7 +177,7 @@ int lock_contention_read(struct lock_contention *con)
>  		}
>  
>  		hlist_add_head(&st->hash_entry, con->result);
> -		prev_key = key;
> +		prev_key = &key;
>  
>  		/* we're fine now, reset the values */
>  		st = NULL;
> diff --git a/tools/perf/util/bpf_skel/lock_contention.bpf.c b/tools/perf/util/bpf_skel/lock_contention.bpf.c
> index 9681cb59b0df..0f63cc28ccba 100644
> --- a/tools/perf/util/bpf_skel/lock_contention.bpf.c
> +++ b/tools/perf/util/bpf_skel/lock_contention.bpf.c
> @@ -5,24 +5,11 @@
>  #include <bpf/bpf_tracing.h>
>  #include <bpf/bpf_core_read.h>
>  
> -/* maximum stack trace depth */
> -#define MAX_STACKS   8
> +#include "lock_data.h"
>  
>  /* default buffer size */
>  #define MAX_ENTRIES  10240
>  
> -struct contention_key {
> -	__s32 stack_id;
> -};
> -
> -struct contention_data {
> -	__u64 total_time;
> -	__u64 min_time;
> -	__u64 max_time;
> -	__u32 count;
> -	__u32 flags;
> -};
> -
>  struct tstamp_data {
>  	__u64 timestamp;
>  	__u64 lock;
> @@ -34,7 +21,7 @@ struct tstamp_data {
>  struct {
>  	__uint(type, BPF_MAP_TYPE_STACK_TRACE);
>  	__uint(key_size, sizeof(__u32));
> -	__uint(value_size, MAX_STACKS * sizeof(__u64));
> +	__uint(value_size, sizeof(__u64));
>  	__uint(max_entries, MAX_ENTRIES);
>  } stacks SEC(".maps");
>  
> diff --git a/tools/perf/util/bpf_skel/lock_data.h b/tools/perf/util/bpf_skel/lock_data.h
> new file mode 100644
> index 000000000000..dbdf4caedc4a
> --- /dev/null
> +++ b/tools/perf/util/bpf_skel/lock_data.h
> @@ -0,0 +1,30 @@
> +// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +/* Data structures shared between BPF and tools. */
> +#ifndef UTIL_BPF_SKEL_LOCK_DATA_H
> +#define UTIL_BPF_SKEL_LOCK_DATA_H
> +
> +struct contention_key {
> +	s32 stack_or_task_id;
> +};
> +
> +#define TASK_COMM_LEN  16
> +
> +struct contention_task_data {
> +	char comm[TASK_COMM_LEN];
> +};
> +
> +struct contention_data {
> +	u64 total_time;
> +	u64 min_time;
> +	u64 max_time;
> +	u32 count;
> +	u32 flags;
> +};
> +
> +enum lock_aggr_mode {
> +	LOCK_AGGR_ADDR = 0,
> +	LOCK_AGGR_TASK,
> +	LOCK_AGGR_CALLER,
> +};
> +
> +#endif /* UTIL_BPF_SKEL_LOCK_DATA_H */
> -- 
> 2.39.0.rc1.256.g54fd8350bd-goog

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/4] perf lock contention: Add lock_data.h for common data
  2022-12-12 19:42   ` Arnaldo Carvalho de Melo
@ 2022-12-12 19:43     ` Arnaldo Carvalho de Melo
  2022-12-12 19:45       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 10+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-12-12 19:43 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers,
	Adrian Hunter, linux-perf-users, Song Liu, Blake Jones, bpf

Em Mon, Dec 12, 2022 at 04:42:30PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Fri, Dec 09, 2022 at 11:07:24AM -0800, Namhyung Kim escreveu:
> > Accessing BPF maps should use the same data types.  Add bpf_skel/lock_data.h
> > to define the common data structures.  No functional changes.
> 
> You forgot to update one of the stack_id users, that field got renamed:
> 
> util/bpf_skel/lock_contention.bpf.c:144:6: error: no member named 'stack_id' in 'struct contention_key'
>         key.stack_id = pelem->stack_id;
>         ~~~ ^
> 1 error generated.
> make[2]: *** [Makefile.perf:1075: /tmp/build/perf/util/bpf_skel/.tmp/lock_contention.bpf.o] Error 1
> make[1]: *** [Makefile.perf:236: sub-make] Error 2
> make: *** [Makefile:113: install-bin] Error 2
> make: Leaving directory '/var/home/acme/git/perf/tools/perf'
> 
>  Performance counter stats for 'make -k NO_LIBTRACEEVENT=1 BUILD_BPF_SKEL=1 CORESIGHT=1 O=/tmp/build/perf -C tools/perf install-bin':
> 
>      7,005,216,342      cycles:u
>     11,851,225,594      instructions:u                   #    1.69  insn per cycle
> 
>        3.168945139 seconds time elapsed
> 
>        1.730964000 seconds user
>        1.578932000 seconds sys
> 
> 
> ⬢[acme@toolbox perf]$ git log --oneline -4
> f6e7a5f1db49dc8e (HEAD) perf lock contention: Add lock_data.h for common data
> 5d9b55713c5c037f perf python: Account for multiple words in CC
> d9078bf3f3320457 perf off_cpu: Fix a typo in BTF tracepoint name, it should be 'btf_trace_sched_switch'
> 3b7ea76f0f7844f5 perf test: Update event group check for support of uncore event
> ⬢[acme@toolbox perf]$
> 
> After some point it builds.
> 
> I'm fixing this to keep it bisectable.

I folded this:


diff --git a/tools/perf/util/bpf_skel/lock_contention.bpf.c b/tools/perf/util/bpf_skel/lock_contention.bpf.c
index 0f63cc28ccbabd21..64fd1e040ac86e58 100644
--- a/tools/perf/util/bpf_skel/lock_contention.bpf.c
+++ b/tools/perf/util/bpf_skel/lock_contention.bpf.c
@@ -141,7 +141,7 @@ int contention_end(u64 *ctx)
 
 	duration = bpf_ktime_get_ns() - pelem->timestamp;
 
-	key.stack_id = pelem->stack_id;
+	key.stack_or_task_id = pelem->stack_id;
 	data = bpf_map_lookup_elem(&lock_stat, &key);
 	if (!data) {
 		struct contention_data first = {

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/4] perf lock contention: Add lock_data.h for common data
  2022-12-12 19:43     ` Arnaldo Carvalho de Melo
@ 2022-12-12 19:45       ` Arnaldo Carvalho de Melo
  2022-12-12 20:04         ` Namhyung Kim
  0 siblings, 1 reply; 10+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-12-12 19:45 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers,
	Adrian Hunter, linux-perf-users, Song Liu, Blake Jones, bpf

Em Mon, Dec 12, 2022 at 04:43:43PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Mon, Dec 12, 2022 at 04:42:30PM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Fri, Dec 09, 2022 at 11:07:24AM -0800, Namhyung Kim escreveu:
> > > Accessing BPF maps should use the same data types.  Add bpf_skel/lock_data.h
> > > to define the common data structures.  No functional changes.
> > 
> > You forgot to update one of the stack_id users, that field got renamed:
> > 
> > util/bpf_skel/lock_contention.bpf.c:144:6: error: no member named 'stack_id' in 'struct contention_key'
> >         key.stack_id = pelem->stack_id;
> >         ~~~ ^
> > 1 error generated.
> > make[2]: *** [Makefile.perf:1075: /tmp/build/perf/util/bpf_skel/.tmp/lock_contention.bpf.o] Error 1
> > make[1]: *** [Makefile.perf:236: sub-make] Error 2
> > make: *** [Makefile:113: install-bin] Error 2
> > make: Leaving directory '/var/home/acme/git/perf/tools/perf'
> > 
> >  Performance counter stats for 'make -k NO_LIBTRACEEVENT=1 BUILD_BPF_SKEL=1 CORESIGHT=1 O=/tmp/build/perf -C tools/perf install-bin':
> > 
> >      7,005,216,342      cycles:u
> >     11,851,225,594      instructions:u                   #    1.69  insn per cycle
> > 
> >        3.168945139 seconds time elapsed
> > 
> >        1.730964000 seconds user
> >        1.578932000 seconds sys
> > 
> > 
> > ⬢[acme@toolbox perf]$ git log --oneline -4
> > f6e7a5f1db49dc8e (HEAD) perf lock contention: Add lock_data.h for common data
> > 5d9b55713c5c037f perf python: Account for multiple words in CC
> > d9078bf3f3320457 perf off_cpu: Fix a typo in BTF tracepoint name, it should be 'btf_trace_sched_switch'
> > 3b7ea76f0f7844f5 perf test: Update event group check for support of uncore event
> > ⬢[acme@toolbox perf]$
> > 
> > After some point it builds.
> > 
> > I'm fixing this to keep it bisectable.
> 
> I folded this:
> 
> diff --git a/tools/perf/util/bpf_skel/lock_contention.bpf.c b/tools/perf/util/bpf_skel/lock_contention.bpf.c
> index 0f63cc28ccbabd21..64fd1e040ac86e58 100644
> --- a/tools/perf/util/bpf_skel/lock_contention.bpf.c
> +++ b/tools/perf/util/bpf_skel/lock_contention.bpf.c
> @@ -141,7 +141,7 @@ int contention_end(u64 *ctx)
>  
>  	duration = bpf_ktime_get_ns() - pelem->timestamp;
>  
> -	key.stack_id = pelem->stack_id;
> +	key.stack_or_task_id = pelem->stack_id;
>  	data = bpf_map_lookup_elem(&lock_stat, &key);
>  	if (!data) {
>  		struct contention_data first = {


And then fixed up this:

Could not apply 3d4947c7bd10beba... perf lock contention: Implement -t/--threads option for BPF
⬢[acme@toolbox perf]$
⬢[acme@toolbox perf]$
⬢[acme@toolbox perf]$ git diff
diff --cc tools/perf/util/bpf_skel/lock_contention.bpf.c
index 64fd1e040ac86e58,cd405adcd252b82d..0000000000000000
--- a/tools/perf/util/bpf_skel/lock_contention.bpf.c
+++ b/tools/perf/util/bpf_skel/lock_contention.bpf.c
@@@ -141,7 -168,13 +168,17 @@@ int contention_end(u64 *ctx

        duration = bpf_ktime_get_ns() - pelem->timestamp;

++<<<<<<< HEAD
 +      key.stack_or_task_id = pelem->stack_id;
++=======
+       if (aggr_mode == LOCK_AGGR_CALLER) {
+               key.stack_or_task_id = pelem->stack_id;
+       } else {
+               key.stack_or_task_id = pid;
+               update_task_data(pid);
+       }
+
++>>>>>>> 3d4947c7bd10beba (perf lock contention: Implement -t/--threads option for BPF)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/4] perf lock contention: Support task/addr aggregation mode (v1)
  2022-12-09 19:07 [PATCH 0/4] perf lock contention: Support task/addr aggregation mode (v1) Namhyung Kim
                   ` (3 preceding siblings ...)
  2022-12-09 19:07 ` [PATCH 4/4] perf test: Update perf lock contention test Namhyung Kim
@ 2022-12-12 19:58 ` Arnaldo Carvalho de Melo
  4 siblings, 0 replies; 10+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-12-12 19:58 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers,
	Adrian Hunter, linux-perf-users, Song Liu, Blake Jones, bpf

Em Fri, Dec 09, 2022 at 11:07:23AM -0800, Namhyung Kim escreveu:
> Hello,
> 
> This patchset adds two more aggregation modes for perf lock contention.


Thanks, applied.

- Arnaldo

 
> The first one is the per-task mode with -t/--threads option.  The option
> was there already but it remained broken with BPF for a while.  Now it
> supports the output with and without BPF.
> 
>   # perf lock contention -a -b -t -- sleep 1
>    contended   total wait     max wait     avg wait          pid   comm
> 
>           11     11.85 us      2.23 us      1.08 us            0   swapper
>            2     11.13 us     10.22 us      5.56 us       749053   ThreadPoolForeg
>            1      8.15 us      8.15 us      8.15 us       321353   Chrome_ChildIOT
>            2      2.73 us      1.77 us      1.37 us       320761   Chrome_ChildIOT
>            1      1.40 us      1.40 us      1.40 us       320502   chrome
>            1       379 ns       379 ns       379 ns       321227   chrome
> 
> The other one is the per-lock-instance mode with -l/--lock-addr option.
> If the lock has a symbol, it will be displayed as well.
> 
>   # perf lock contention -a -b -l -- sleep 1
>    contended   total wait     max wait     avg wait            address   symbol
> 
>            3      4.79 us      2.33 us      1.60 us   ffffffffbaed50c0   rcu_state
>            4      4.19 us      1.62 us      1.05 us   ffffffffbae07a40   jiffies_lock
>            1      1.94 us      1.94 us      1.94 us   ffff9262277861e0
>            1       387 ns       387 ns       387 ns   ffff9260bfda4f60
> 
> It's based on the current acme/tmp.perf/core branch.
> You can find the code in the 'perf/lock-con-aggr-v1' branch in
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
> 
> Thanks,
> Namhyung
> 
> 
> Namhyung Kim (4):
>   perf lock contention: Add lock_data.h for common data
>   perf lock contention: Implement -t/--threads option for BPF
>   perf lock contention: Add -l/--lock-addr option
>   perf test: Update perf lock contention test
> 
>  tools/perf/Documentation/perf-lock.txt        |  4 +
>  tools/perf/builtin-lock.c                     | 95 ++++++++++++++-----
>  tools/perf/tests/shell/lock_contention.sh     | 48 ++++++++++
>  tools/perf/util/bpf_lock_contention.c         | 72 ++++++++++----
>  .../perf/util/bpf_skel/lock_contention.bpf.c  | 67 +++++++++----
>  tools/perf/util/bpf_skel/lock_data.h          | 30 ++++++
>  tools/perf/util/lock-contention.h             |  1 +
>  7 files changed, 255 insertions(+), 62 deletions(-)
>  create mode 100644 tools/perf/util/bpf_skel/lock_data.h
> 
> 
> base-commit: b22802e295a80ec16e355d7208d2fbbd7bbc1b7a
> -- 
> 2.39.0.rc1.256.g54fd8350bd-goog

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/4] perf lock contention: Add lock_data.h for common data
  2022-12-12 19:45       ` Arnaldo Carvalho de Melo
@ 2022-12-12 20:04         ` Namhyung Kim
  0 siblings, 0 replies; 10+ messages in thread
From: Namhyung Kim @ 2022-12-12 20:04 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers,
	Adrian Hunter, linux-perf-users, Song Liu, Blake Jones, bpf

Hi Arnaldo,

On Mon, Dec 12, 2022 at 11:45 AM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> Em Mon, Dec 12, 2022 at 04:43:43PM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Mon, Dec 12, 2022 at 04:42:30PM -0300, Arnaldo Carvalho de Melo escreveu:
> > > Em Fri, Dec 09, 2022 at 11:07:24AM -0800, Namhyung Kim escreveu:
> > > > Accessing BPF maps should use the same data types.  Add bpf_skel/lock_data.h
> > > > to define the common data structures.  No functional changes.
> > >
> > > You forgot to update one of the stack_id users, that field got renamed:
> > >
> > > util/bpf_skel/lock_contention.bpf.c:144:6: error: no member named 'stack_id' in 'struct contention_key'
> > >         key.stack_id = pelem->stack_id;
> > >         ~~~ ^
> > > 1 error generated.
> > > make[2]: *** [Makefile.perf:1075: /tmp/build/perf/util/bpf_skel/.tmp/lock_contention.bpf.o] Error 1
> > > make[1]: *** [Makefile.perf:236: sub-make] Error 2
> > > make: *** [Makefile:113: install-bin] Error 2
> > > make: Leaving directory '/var/home/acme/git/perf/tools/perf'

Oops, right.

> > >
> > >  Performance counter stats for 'make -k NO_LIBTRACEEVENT=1 BUILD_BPF_SKEL=1 CORESIGHT=1 O=/tmp/build/perf -C tools/perf install-bin':
> > >
> > >      7,005,216,342      cycles:u
> > >     11,851,225,594      instructions:u                   #    1.69  insn per cycle
> > >
> > >        3.168945139 seconds time elapsed
> > >
> > >        1.730964000 seconds user
> > >        1.578932000 seconds sys
> > >
> > >
> > > ⬢[acme@toolbox perf]$ git log --oneline -4
> > > f6e7a5f1db49dc8e (HEAD) perf lock contention: Add lock_data.h for common data
> > > 5d9b55713c5c037f perf python: Account for multiple words in CC
> > > d9078bf3f3320457 perf off_cpu: Fix a typo in BTF tracepoint name, it should be 'btf_trace_sched_switch'
> > > 3b7ea76f0f7844f5 perf test: Update event group check for support of uncore event
> > > ⬢[acme@toolbox perf]$
> > >
> > > After some point it builds.
> > >
> > > I'm fixing this to keep it bisectable.
> >
> > I folded this:
> >
> > diff --git a/tools/perf/util/bpf_skel/lock_contention.bpf.c b/tools/perf/util/bpf_skel/lock_contention.bpf.c
> > index 0f63cc28ccbabd21..64fd1e040ac86e58 100644
> > --- a/tools/perf/util/bpf_skel/lock_contention.bpf.c
> > +++ b/tools/perf/util/bpf_skel/lock_contention.bpf.c
> > @@ -141,7 +141,7 @@ int contention_end(u64 *ctx)
> >
> >       duration = bpf_ktime_get_ns() - pelem->timestamp;
> >
> > -     key.stack_id = pelem->stack_id;
> > +     key.stack_or_task_id = pelem->stack_id;
> >       data = bpf_map_lookup_elem(&lock_stat, &key);
> >       if (!data) {
> >               struct contention_data first = {

Thanks for fixing this.

>
>
> And then fixed up this:
>
> Could not apply 3d4947c7bd10beba... perf lock contention: Implement -t/--threads option for BPF
> ⬢[acme@toolbox perf]$
> ⬢[acme@toolbox perf]$
> ⬢[acme@toolbox perf]$ git diff
> diff --cc tools/perf/util/bpf_skel/lock_contention.bpf.c
> index 64fd1e040ac86e58,cd405adcd252b82d..0000000000000000
> --- a/tools/perf/util/bpf_skel/lock_contention.bpf.c
> +++ b/tools/perf/util/bpf_skel/lock_contention.bpf.c
> @@@ -141,7 -168,13 +168,17 @@@ int contention_end(u64 *ctx
>
>         duration = bpf_ktime_get_ns() - pelem->timestamp;
>
> ++<<<<<<< HEAD
>  +      key.stack_or_task_id = pelem->stack_id;
> ++=======
> +       if (aggr_mode == LOCK_AGGR_CALLER) {
> +               key.stack_or_task_id = pelem->stack_id;
> +       } else {
> +               key.stack_or_task_id = pid;
> +               update_task_data(pid);
> +       }
> +
> ++>>>>>>> 3d4947c7bd10beba (perf lock contention: Implement -t/--threads option for BPF)

Sure, it looks good to me.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-12-12 20:05 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-12-09 19:07 [PATCH 0/4] perf lock contention: Support task/addr aggregation mode (v1) Namhyung Kim
2022-12-09 19:07 ` [PATCH 1/4] perf lock contention: Add lock_data.h for common data Namhyung Kim
2022-12-12 19:42   ` Arnaldo Carvalho de Melo
2022-12-12 19:43     ` Arnaldo Carvalho de Melo
2022-12-12 19:45       ` Arnaldo Carvalho de Melo
2022-12-12 20:04         ` Namhyung Kim
2022-12-09 19:07 ` [PATCH 2/4] perf lock contention: Implement -t/--threads option for BPF Namhyung Kim
2022-12-09 19:07 ` [PATCH 3/4] perf lock contention: Add -l/--lock-addr option Namhyung Kim
2022-12-09 19:07 ` [PATCH 4/4] perf test: Update perf lock contention test Namhyung Kim
2022-12-12 19:58 ` [PATCH 0/4] perf lock contention: Support task/addr aggregation mode (v1) Arnaldo Carvalho de Melo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).