All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] perf lock contention: Support task/addr aggregation mode (v1)
@ 2022-12-09 19:07 Namhyung Kim
  2022-12-09 19:07 ` [PATCH 1/4] perf lock contention: Add lock_data.h for common data Namhyung Kim
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Namhyung Kim @ 2022-12-09 19:07 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Song Liu, Blake Jones, bpf

Hello,

This patchset adds two more aggregation modes for perf lock contention.

The first one is the per-task mode with -t/--threads option.  The option
was there already but it remained broken with BPF for a while.  Now it
supports the output with and without BPF.

  # perf lock contention -a -b -t -- sleep 1
   contended   total wait     max wait     avg wait          pid   comm

          11     11.85 us      2.23 us      1.08 us            0   swapper
           2     11.13 us     10.22 us      5.56 us       749053   ThreadPoolForeg
           1      8.15 us      8.15 us      8.15 us       321353   Chrome_ChildIOT
           2      2.73 us      1.77 us      1.37 us       320761   Chrome_ChildIOT
           1      1.40 us      1.40 us      1.40 us       320502   chrome
           1       379 ns       379 ns       379 ns       321227   chrome

The other one is the per-lock-instance mode with -l/--lock-addr option.
If the lock has a symbol, it will be displayed as well.

  # perf lock contention -a -b -l -- sleep 1
   contended   total wait     max wait     avg wait            address   symbol

           3      4.79 us      2.33 us      1.60 us   ffffffffbaed50c0   rcu_state
           4      4.19 us      1.62 us      1.05 us   ffffffffbae07a40   jiffies_lock
           1      1.94 us      1.94 us      1.94 us   ffff9262277861e0
           1       387 ns       387 ns       387 ns   ffff9260bfda4f60

It's based on the current acme/tmp.perf/core branch.
You can find the code in the 'perf/lock-con-aggr-v1' branch in

  git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git

Thanks,
Namhyung


Namhyung Kim (4):
  perf lock contention: Add lock_data.h for common data
  perf lock contention: Implement -t/--threads option for BPF
  perf lock contention: Add -l/--lock-addr option
  perf test: Update perf lock contention test

 tools/perf/Documentation/perf-lock.txt        |  4 +
 tools/perf/builtin-lock.c                     | 95 ++++++++++++++-----
 tools/perf/tests/shell/lock_contention.sh     | 48 ++++++++++
 tools/perf/util/bpf_lock_contention.c         | 72 ++++++++++----
 .../perf/util/bpf_skel/lock_contention.bpf.c  | 67 +++++++++----
 tools/perf/util/bpf_skel/lock_data.h          | 30 ++++++
 tools/perf/util/lock-contention.h             |  1 +
 7 files changed, 255 insertions(+), 62 deletions(-)
 create mode 100644 tools/perf/util/bpf_skel/lock_data.h


base-commit: b22802e295a80ec16e355d7208d2fbbd7bbc1b7a
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/4] perf lock contention: Add lock_data.h for common data
  2022-12-09 19:07 [PATCH 0/4] perf lock contention: Support task/addr aggregation mode (v1) Namhyung Kim
@ 2022-12-09 19:07 ` Namhyung Kim
  2022-12-12 19:42   ` Arnaldo Carvalho de Melo
  2022-12-09 19:07 ` [PATCH 2/4] perf lock contention: Implement -t/--threads option for BPF Namhyung Kim
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Namhyung Kim @ 2022-12-09 19:07 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Song Liu, Blake Jones, bpf

Accessing BPF maps should use the same data types.  Add bpf_skel/lock_data.h
to define the common data structures.  No functional changes.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/bpf_lock_contention.c         | 19 ++++--------
 .../perf/util/bpf_skel/lock_contention.bpf.c  | 17 ++---------
 tools/perf/util/bpf_skel/lock_data.h          | 30 +++++++++++++++++++
 3 files changed, 38 insertions(+), 28 deletions(-)
 create mode 100644 tools/perf/util/bpf_skel/lock_data.h

diff --git a/tools/perf/util/bpf_lock_contention.c b/tools/perf/util/bpf_lock_contention.c
index f4ebb9a2e380..b6a8eb7164b3 100644
--- a/tools/perf/util/bpf_lock_contention.c
+++ b/tools/perf/util/bpf_lock_contention.c
@@ -12,17 +12,10 @@
 #include <bpf/bpf.h>
 
 #include "bpf_skel/lock_contention.skel.h"
+#include "bpf_skel/lock_data.h"
 
 static struct lock_contention_bpf *skel;
 
-struct lock_contention_data {
-	u64 total_time;
-	u64 min_time;
-	u64 max_time;
-	u32 count;
-	u32 flags;
-};
-
 int lock_contention_prepare(struct lock_contention *con)
 {
 	int i, fd;
@@ -110,8 +103,8 @@ int lock_contention_stop(void)
 int lock_contention_read(struct lock_contention *con)
 {
 	int fd, stack, err = 0;
-	s32 prev_key, key;
-	struct lock_contention_data data = {};
+	struct contention_key *prev_key, key;
+	struct contention_data data = {};
 	struct lock_stat *st = NULL;
 	struct machine *machine = con->machine;
 	u64 *stack_trace;
@@ -126,8 +119,8 @@ int lock_contention_read(struct lock_contention *con)
 	if (stack_trace == NULL)
 		return -1;
 
-	prev_key = 0;
-	while (!bpf_map_get_next_key(fd, &prev_key, &key)) {
+	prev_key = NULL;
+	while (!bpf_map_get_next_key(fd, prev_key, &key)) {
 		struct map *kmap;
 		struct symbol *sym;
 		int idx = 0;
@@ -184,7 +177,7 @@ int lock_contention_read(struct lock_contention *con)
 		}
 
 		hlist_add_head(&st->hash_entry, con->result);
-		prev_key = key;
+		prev_key = &key;
 
 		/* we're fine now, reset the values */
 		st = NULL;
diff --git a/tools/perf/util/bpf_skel/lock_contention.bpf.c b/tools/perf/util/bpf_skel/lock_contention.bpf.c
index 9681cb59b0df..0f63cc28ccba 100644
--- a/tools/perf/util/bpf_skel/lock_contention.bpf.c
+++ b/tools/perf/util/bpf_skel/lock_contention.bpf.c
@@ -5,24 +5,11 @@
 #include <bpf/bpf_tracing.h>
 #include <bpf/bpf_core_read.h>
 
-/* maximum stack trace depth */
-#define MAX_STACKS   8
+#include "lock_data.h"
 
 /* default buffer size */
 #define MAX_ENTRIES  10240
 
-struct contention_key {
-	__s32 stack_id;
-};
-
-struct contention_data {
-	__u64 total_time;
-	__u64 min_time;
-	__u64 max_time;
-	__u32 count;
-	__u32 flags;
-};
-
 struct tstamp_data {
 	__u64 timestamp;
 	__u64 lock;
@@ -34,7 +21,7 @@ struct tstamp_data {
 struct {
 	__uint(type, BPF_MAP_TYPE_STACK_TRACE);
 	__uint(key_size, sizeof(__u32));
-	__uint(value_size, MAX_STACKS * sizeof(__u64));
+	__uint(value_size, sizeof(__u64));
 	__uint(max_entries, MAX_ENTRIES);
 } stacks SEC(".maps");
 
diff --git a/tools/perf/util/bpf_skel/lock_data.h b/tools/perf/util/bpf_skel/lock_data.h
new file mode 100644
index 000000000000..dbdf4caedc4a
--- /dev/null
+++ b/tools/perf/util/bpf_skel/lock_data.h
@@ -0,0 +1,30 @@
+// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+/* Data structures shared between BPF and tools. */
+#ifndef UTIL_BPF_SKEL_LOCK_DATA_H
+#define UTIL_BPF_SKEL_LOCK_DATA_H
+
+struct contention_key {
+	s32 stack_or_task_id;
+};
+
+#define TASK_COMM_LEN  16
+
+struct contention_task_data {
+	char comm[TASK_COMM_LEN];
+};
+
+struct contention_data {
+	u64 total_time;
+	u64 min_time;
+	u64 max_time;
+	u32 count;
+	u32 flags;
+};
+
+enum lock_aggr_mode {
+	LOCK_AGGR_ADDR = 0,
+	LOCK_AGGR_TASK,
+	LOCK_AGGR_CALLER,
+};
+
+#endif /* UTIL_BPF_SKEL_LOCK_DATA_H */
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/4] perf lock contention: Implement -t/--threads option for BPF
  2022-12-09 19:07 [PATCH 0/4] perf lock contention: Support task/addr aggregation mode (v1) Namhyung Kim
  2022-12-09 19:07 ` [PATCH 1/4] perf lock contention: Add lock_data.h for common data Namhyung Kim
@ 2022-12-09 19:07 ` Namhyung Kim
  2022-12-09 19:07 ` [PATCH 3/4] perf lock contention: Add -l/--lock-addr option Namhyung Kim
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Namhyung Kim @ 2022-12-09 19:07 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Song Liu, Blake Jones, bpf

The BPF didn't show the per-thread stat properly.  Use task's thread id (PID)
as a key instead of stack_id and add a task_data map to save task comm names.

  $ sudo ./perf lock con -abt -E 5 sleep 1
   contended   total wait     max wait     avg wait          pid   comm

           1    740.66 ms    740.66 ms    740.66 ms         1950   nv_queue
           3    305.50 ms    298.19 ms    101.83 ms         1884   nvidia-modeset/
           1     25.14 us     25.14 us     25.14 us      2725038   EventManager_De
          12     23.09 us      9.30 us      1.92 us            0   swapper
           1     20.18 us     20.18 us     20.18 us      2725033   EventManager_De

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-lock.c                     | 13 ++----
 tools/perf/util/bpf_lock_contention.c         | 40 ++++++++++++++++--
 .../perf/util/bpf_skel/lock_contention.bpf.c  | 41 +++++++++++++++++--
 tools/perf/util/lock-contention.h             |  1 +
 4 files changed, 78 insertions(+), 17 deletions(-)

diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
index 15ce6358f127..6fa3cdfec5cb 100644
--- a/tools/perf/builtin-lock.c
+++ b/tools/perf/builtin-lock.c
@@ -12,6 +12,7 @@
 #include "util/target.h"
 #include "util/callchain.h"
 #include "util/lock-contention.h"
+#include "util/bpf_skel/lock_data.h"
 
 #include <subcmd/pager.h>
 #include <subcmd/parse-options.h>
@@ -61,11 +62,7 @@ static int max_stack_depth = CONTENTION_STACK_DEPTH;
 static int stack_skip = CONTENTION_STACK_SKIP;
 static int print_nr_entries = INT_MAX / 2;
 
-static enum {
-	LOCK_AGGR_ADDR,
-	LOCK_AGGR_TASK,
-	LOCK_AGGR_CALLER,
-} aggr_mode = LOCK_AGGR_ADDR;
+static enum lock_aggr_mode aggr_mode = LOCK_AGGR_ADDR;
 
 static struct thread_stat *thread_stat_find(u32 tid)
 {
@@ -1619,6 +1616,7 @@ static int __cmd_contention(int argc, const char **argv)
 		.map_nr_entries = bpf_map_entries,
 		.max_stack = max_stack_depth,
 		.stack_skip = stack_skip,
+		.aggr_mode = show_thread_stats ? LOCK_AGGR_TASK : LOCK_AGGR_CALLER,
 	};
 
 	session = perf_session__new(use_bpf ? NULL : &data, &eops);
@@ -1691,11 +1689,6 @@ static int __cmd_contention(int argc, const char **argv)
 	if (select_key(true))
 		goto out_delete;
 
-	if (show_thread_stats)
-		aggr_mode = LOCK_AGGR_TASK;
-	else
-		aggr_mode = LOCK_AGGR_CALLER;
-
 	if (use_bpf) {
 		lock_contention_start();
 		if (argc)
diff --git a/tools/perf/util/bpf_lock_contention.c b/tools/perf/util/bpf_lock_contention.c
index b6a8eb7164b3..1590a9f05145 100644
--- a/tools/perf/util/bpf_lock_contention.c
+++ b/tools/perf/util/bpf_lock_contention.c
@@ -5,6 +5,7 @@
 #include "util/map.h"
 #include "util/symbol.h"
 #include "util/target.h"
+#include "util/thread.h"
 #include "util/thread_map.h"
 #include "util/lock-contention.h"
 #include <linux/zalloc.h>
@@ -30,10 +31,17 @@ int lock_contention_prepare(struct lock_contention *con)
 	}
 
 	bpf_map__set_value_size(skel->maps.stacks, con->max_stack * sizeof(u64));
-	bpf_map__set_max_entries(skel->maps.stacks, con->map_nr_entries);
 	bpf_map__set_max_entries(skel->maps.lock_stat, con->map_nr_entries);
 	bpf_map__set_max_entries(skel->maps.tstamp, con->map_nr_entries);
 
+	if (con->aggr_mode == LOCK_AGGR_TASK) {
+		bpf_map__set_max_entries(skel->maps.task_data, con->map_nr_entries);
+		bpf_map__set_max_entries(skel->maps.stacks, 1);
+	} else {
+		bpf_map__set_max_entries(skel->maps.task_data, 1);
+		bpf_map__set_max_entries(skel->maps.stacks, con->map_nr_entries);
+	}
+
 	if (target__has_cpu(target))
 		ncpus = perf_cpu_map__nr(evlist->core.user_requested_cpus);
 	if (target__has_task(target))
@@ -82,7 +90,9 @@ int lock_contention_prepare(struct lock_contention *con)
 		bpf_map_update_elem(fd, &pid, &val, BPF_ANY);
 	}
 
+	/* these don't work well if in the rodata section */
 	skel->bss->stack_skip = con->stack_skip;
+	skel->bss->aggr_mode = con->aggr_mode;
 
 	lock_contention_bpf__attach(skel);
 	return 0;
@@ -102,7 +112,7 @@ int lock_contention_stop(void)
 
 int lock_contention_read(struct lock_contention *con)
 {
-	int fd, stack, err = 0;
+	int fd, stack, task_fd, err = 0;
 	struct contention_key *prev_key, key;
 	struct contention_data data = {};
 	struct lock_stat *st = NULL;
@@ -112,6 +122,7 @@ int lock_contention_read(struct lock_contention *con)
 
 	fd = bpf_map__fd(skel->maps.lock_stat);
 	stack = bpf_map__fd(skel->maps.stacks);
+	task_fd = bpf_map__fd(skel->maps.task_data);
 
 	con->lost = skel->bss->lost;
 
@@ -119,6 +130,13 @@ int lock_contention_read(struct lock_contention *con)
 	if (stack_trace == NULL)
 		return -1;
 
+	if (con->aggr_mode == LOCK_AGGR_TASK) {
+		struct thread *idle = __machine__findnew_thread(machine,
+								/*pid=*/0,
+								/*tid=*/0);
+		thread__set_comm(idle, "swapper", /*timestamp=*/0);
+	}
+
 	prev_key = NULL;
 	while (!bpf_map_get_next_key(fd, prev_key, &key)) {
 		struct map *kmap;
@@ -143,6 +161,22 @@ int lock_contention_read(struct lock_contention *con)
 
 		st->flags = data.flags;
 
+		if (con->aggr_mode == LOCK_AGGR_TASK) {
+			struct contention_task_data task;
+			struct thread *t;
+
+			st->addr = key.stack_or_task_id;
+
+			/* do not update idle comm which contains CPU number */
+			if (st->addr) {
+				bpf_map_lookup_elem(task_fd, &key, &task);
+				t = __machine__findnew_thread(machine, /*pid=*/-1,
+							      key.stack_or_task_id);
+				thread__set_comm(t, task.comm, /*timestamp=*/0);
+			}
+			goto next;
+		}
+
 		bpf_map_lookup_elem(stack, &key, stack_trace);
 
 		/* skip lock internal functions */
@@ -175,7 +209,7 @@ int lock_contention_read(struct lock_contention *con)
 			if (st->callstack == NULL)
 				break;
 		}
-
+next:
 		hlist_add_head(&st->hash_entry, con->result);
 		prev_key = &key;
 
diff --git a/tools/perf/util/bpf_skel/lock_contention.bpf.c b/tools/perf/util/bpf_skel/lock_contention.bpf.c
index 0f63cc28ccba..cd405adcd252 100644
--- a/tools/perf/util/bpf_skel/lock_contention.bpf.c
+++ b/tools/perf/util/bpf_skel/lock_contention.bpf.c
@@ -41,6 +41,13 @@ struct {
 	__uint(max_entries, MAX_ENTRIES);
 } lock_stat SEC(".maps");
 
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(struct contention_task_data));
+	__uint(max_entries, MAX_ENTRIES);
+} task_data SEC(".maps");
+
 struct {
 	__uint(type, BPF_MAP_TYPE_HASH);
 	__uint(key_size, sizeof(__u32));
@@ -61,6 +68,9 @@ int has_cpu;
 int has_task;
 int stack_skip;
 
+/* determine the key of lock stat */
+int aggr_mode;
+
 /* error stat */
 int lost;
 
@@ -87,6 +97,19 @@ static inline int can_record(void)
 	return 1;
 }
 
+static inline void update_task_data(__u32 pid)
+{
+	struct contention_task_data *p;
+
+	p = bpf_map_lookup_elem(&task_data, &pid);
+	if (p == NULL) {
+		struct contention_task_data data;
+
+		bpf_get_current_comm(data.comm, sizeof(data.comm));
+		bpf_map_update_elem(&task_data, &pid, &data, BPF_NOEXIST);
+	}
+}
+
 SEC("tp_btf/contention_begin")
 int contention_begin(u64 *ctx)
 {
@@ -115,10 +138,14 @@ int contention_begin(u64 *ctx)
 	pelem->timestamp = bpf_ktime_get_ns();
 	pelem->lock = (__u64)ctx[0];
 	pelem->flags = (__u32)ctx[1];
-	pelem->stack_id = bpf_get_stackid(ctx, &stacks, BPF_F_FAST_STACK_CMP | stack_skip);
 
-	if (pelem->stack_id < 0)
-		lost++;
+	if (aggr_mode == LOCK_AGGR_CALLER) {
+		pelem->stack_id = bpf_get_stackid(ctx, &stacks,
+						  BPF_F_FAST_STACK_CMP | stack_skip);
+		if (pelem->stack_id < 0)
+			lost++;
+	}
+
 	return 0;
 }
 
@@ -141,7 +168,13 @@ int contention_end(u64 *ctx)
 
 	duration = bpf_ktime_get_ns() - pelem->timestamp;
 
-	key.stack_id = pelem->stack_id;
+	if (aggr_mode == LOCK_AGGR_CALLER) {
+		key.stack_or_task_id = pelem->stack_id;
+	} else {
+		key.stack_or_task_id = pid;
+		update_task_data(pid);
+	}
+
 	data = bpf_map_lookup_elem(&lock_stat, &key);
 	if (!data) {
 		struct contention_data first = {
diff --git a/tools/perf/util/lock-contention.h b/tools/perf/util/lock-contention.h
index a2346875098d..47fd47fb56c1 100644
--- a/tools/perf/util/lock-contention.h
+++ b/tools/perf/util/lock-contention.h
@@ -117,6 +117,7 @@ struct lock_contention {
 	int lost;
 	int max_stack;
 	int stack_skip;
+	int aggr_mode;
 };
 
 #ifdef HAVE_BPF_SKEL
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/4] perf lock contention: Add -l/--lock-addr option
  2022-12-09 19:07 [PATCH 0/4] perf lock contention: Support task/addr aggregation mode (v1) Namhyung Kim
  2022-12-09 19:07 ` [PATCH 1/4] perf lock contention: Add lock_data.h for common data Namhyung Kim
  2022-12-09 19:07 ` [PATCH 2/4] perf lock contention: Implement -t/--threads option for BPF Namhyung Kim
@ 2022-12-09 19:07 ` Namhyung Kim
  2022-12-09 19:07 ` [PATCH 4/4] perf test: Update perf lock contention test Namhyung Kim
  2022-12-12 19:58 ` [PATCH 0/4] perf lock contention: Support task/addr aggregation mode (v1) Arnaldo Carvalho de Melo
  4 siblings, 0 replies; 10+ messages in thread
From: Namhyung Kim @ 2022-12-09 19:07 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Song Liu, Blake Jones, bpf

The -l/--lock-addr option is to implement per-lock-instance contention
stat using LOCK_AGGR_ADDR.  It displays lock address and optionally
symbol name if exists.

  $ sudo ./perf lock con -abl sleep 1
   contended   total wait     max wait     avg wait            address   symbol

           1     36.28 us     36.28 us     36.28 us   ffff92615d6448b8
           9     10.91 us      1.84 us      1.21 us   ffffffffbaed50c0   rcu_state
           1     10.49 us     10.49 us     10.49 us   ffff9262ac4f0c80
           8      4.68 us      1.67 us       585 ns   ffffffffbae07a40   jiffies_lock
           3      3.03 us      1.45 us      1.01 us   ffff9262277861e0
           1       924 ns       924 ns       924 ns   ffff926095ba9d20
           1       436 ns       436 ns       436 ns   ffff9260bfda4f60

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/Documentation/perf-lock.txt        |  4 +
 tools/perf/builtin-lock.c                     | 84 +++++++++++++++----
 tools/perf/util/bpf_lock_contention.c         | 23 +++--
 .../perf/util/bpf_skel/lock_contention.bpf.c  | 17 +++-
 tools/perf/util/bpf_skel/lock_data.h          |  2 +-
 5 files changed, 102 insertions(+), 28 deletions(-)

diff --git a/tools/perf/Documentation/perf-lock.txt b/tools/perf/Documentation/perf-lock.txt
index 4958a1ffa1cc..38e79d45e426 100644
--- a/tools/perf/Documentation/perf-lock.txt
+++ b/tools/perf/Documentation/perf-lock.txt
@@ -168,6 +168,10 @@ CONTENTION OPTIONS
 --entries=<value>::
 	Display this many entries.
 
+-l::
+--lock-addr::
+	Show lock contention stat by address
+
 
 SEE ALSO
 --------
diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
index 6fa3cdfec5cb..25c0a5e5051f 100644
--- a/tools/perf/builtin-lock.c
+++ b/tools/perf/builtin-lock.c
@@ -56,6 +56,7 @@ static struct rb_root		thread_stats;
 
 static bool combine_locks;
 static bool show_thread_stats;
+static bool show_lock_addrs;
 static bool use_bpf;
 static unsigned long bpf_map_entries = 10240;
 static int max_stack_depth = CONTENTION_STACK_DEPTH;
@@ -999,13 +1000,32 @@ static int report_lock_contention_begin_event(struct evsel *evsel,
 	ls = lock_stat_find(key);
 	if (!ls) {
 		char buf[128];
-		const char *caller = buf;
+		const char *name = "";
 		unsigned int flags = evsel__intval(evsel, sample, "flags");
+		struct machine *machine = &session->machines.host;
+		struct map *kmap;
+		struct symbol *sym;
+
+		switch (aggr_mode) {
+		case LOCK_AGGR_ADDR:
+			/* make sure it loads the kernel map to find lock symbols */
+			map__load(machine__kernel_map(machine));
+
+			sym = machine__find_kernel_symbol(machine, key, &kmap);
+			if (sym)
+				name = sym->name;
+			break;
+		case LOCK_AGGR_CALLER:
+			name = buf;
+			if (lock_contention_caller(evsel, sample, buf, sizeof(buf)) < 0)
+				name = "Unknown";
+			break;
+		case LOCK_AGGR_TASK:
+		default:
+			break;
+		}
 
-		if (lock_contention_caller(evsel, sample, buf, sizeof(buf)) < 0)
-			caller = "Unknown";
-
-		ls = lock_stat_findnew(key, caller, flags);
+		ls = lock_stat_findnew(key, name, flags);
 		if (!ls)
 			return -ENOMEM;
 
@@ -1460,10 +1480,19 @@ static void print_contention_result(struct lock_contention *con)
 		list_for_each_entry(key, &lock_keys, list)
 			pr_info("%*s ", key->len, key->header);
 
-		if (show_thread_stats)
+		switch (aggr_mode) {
+		case LOCK_AGGR_TASK:
 			pr_info("  %10s   %s\n\n", "pid", "comm");
-		else
+			break;
+		case LOCK_AGGR_CALLER:
 			pr_info("  %10s   %s\n\n", "type", "caller");
+			break;
+		case LOCK_AGGR_ADDR:
+			pr_info("  %16s   %s\n\n", "address", "symbol");
+			break;
+		default:
+			break;
+		}
 	}
 
 	bad = total = printed = 0;
@@ -1471,6 +1500,9 @@ static void print_contention_result(struct lock_contention *con)
 		bad = bad_hist[BROKEN_CONTENDED];
 
 	while ((st = pop_from_result())) {
+		struct thread *t;
+		int pid;
+
 		total += use_bpf ? st->nr_contended : 1;
 		if (st->broken)
 			bad++;
@@ -1480,18 +1512,24 @@ static void print_contention_result(struct lock_contention *con)
 			pr_info(" ");
 		}
 
-		if (show_thread_stats) {
-			struct thread *t;
-			int pid = st->addr;
-
-			/* st->addr contains tid of thread */
+		switch (aggr_mode) {
+		case LOCK_AGGR_CALLER:
+			pr_info("  %10s   %s\n", get_type_str(st), st->name);
+			break;
+		case LOCK_AGGR_TASK:
+			pid = st->addr;
 			t = perf_session__findnew(session, pid);
 			pr_info("  %10d   %s\n", pid, thread__comm_str(t));
-			goto next;
+			break;
+		case LOCK_AGGR_ADDR:
+			pr_info("  %016llx   %s\n", (unsigned long long)st->addr,
+				st->name ? : "");
+			break;
+		default:
+			break;
 		}
 
-		pr_info("  %10s   %s\n", get_type_str(st), st->name);
-		if (verbose) {
+		if (aggr_mode == LOCK_AGGR_CALLER && verbose) {
 			struct map *kmap;
 			struct symbol *sym;
 			char buf[128];
@@ -1508,7 +1546,6 @@ static void print_contention_result(struct lock_contention *con)
 			}
 		}
 
-next:
 		if (++printed >= print_nr_entries)
 			break;
 	}
@@ -1616,7 +1653,6 @@ static int __cmd_contention(int argc, const char **argv)
 		.map_nr_entries = bpf_map_entries,
 		.max_stack = max_stack_depth,
 		.stack_skip = stack_skip,
-		.aggr_mode = show_thread_stats ? LOCK_AGGR_TASK : LOCK_AGGR_CALLER,
 	};
 
 	session = perf_session__new(use_bpf ? NULL : &data, &eops);
@@ -1627,6 +1663,9 @@ static int __cmd_contention(int argc, const char **argv)
 
 	con.machine = &session->machines.host;
 
+	con.aggr_mode = aggr_mode = show_thread_stats ? LOCK_AGGR_TASK :
+		show_lock_addrs ? LOCK_AGGR_ADDR : LOCK_AGGR_CALLER;
+
 	/* for lock function check */
 	symbol_conf.sort_by_name = true;
 	symbol__init(&session->header.env);
@@ -1907,6 +1946,7 @@ int cmd_lock(int argc, const char **argv)
 		    "Set the number of stack depth to skip when finding a lock caller, "
 		    "Default: " __stringify(CONTENTION_STACK_SKIP)),
 	OPT_INTEGER('E', "entries", &print_nr_entries, "display this many functions"),
+	OPT_BOOLEAN('l', "lock-addr", &show_lock_addrs, "show lock stats by address"),
 	OPT_PARENT(lock_options)
 	};
 
@@ -1976,6 +2016,16 @@ int cmd_lock(int argc, const char **argv)
 			argc = parse_options(argc, argv, contention_options,
 					     contention_usage, 0);
 		}
+
+		if (show_thread_stats && show_lock_addrs) {
+			pr_err("Cannot use thread and addr mode together\n");
+			parse_options_usage(contention_usage, contention_options,
+					    "threads", 0);
+			parse_options_usage(NULL, contention_options,
+					    "lock-addr", 0);
+			return -1;
+		}
+
 		rc = __cmd_contention(argc, argv);
 	} else {
 		usage_with_options(lock_usage, lock_options);
diff --git a/tools/perf/util/bpf_lock_contention.c b/tools/perf/util/bpf_lock_contention.c
index 1590a9f05145..8e1b791dc58f 100644
--- a/tools/perf/util/bpf_lock_contention.c
+++ b/tools/perf/util/bpf_lock_contention.c
@@ -137,11 +137,15 @@ int lock_contention_read(struct lock_contention *con)
 		thread__set_comm(idle, "swapper", /*timestamp=*/0);
 	}
 
+	/* make sure it loads the kernel map */
+	map__load(maps__first(machine->kmaps));
+
 	prev_key = NULL;
 	while (!bpf_map_get_next_key(fd, prev_key, &key)) {
 		struct map *kmap;
 		struct symbol *sym;
 		int idx = 0;
+		s32 stack_id;
 
 		/* to handle errors in the loop body */
 		err = -1;
@@ -160,24 +164,31 @@ int lock_contention_read(struct lock_contention *con)
 			st->avg_wait_time = data.total_time / data.count;
 
 		st->flags = data.flags;
+		st->addr = key.aggr_key;
 
 		if (con->aggr_mode == LOCK_AGGR_TASK) {
 			struct contention_task_data task;
 			struct thread *t;
-
-			st->addr = key.stack_or_task_id;
+			int pid = key.aggr_key;
 
 			/* do not update idle comm which contains CPU number */
 			if (st->addr) {
-				bpf_map_lookup_elem(task_fd, &key, &task);
-				t = __machine__findnew_thread(machine, /*pid=*/-1,
-							      key.stack_or_task_id);
+				bpf_map_lookup_elem(task_fd, &pid, &task);
+				t = __machine__findnew_thread(machine, /*pid=*/-1, pid);
 				thread__set_comm(t, task.comm, /*timestamp=*/0);
 			}
 			goto next;
 		}
 
-		bpf_map_lookup_elem(stack, &key, stack_trace);
+		if (con->aggr_mode == LOCK_AGGR_ADDR) {
+			sym = machine__find_kernel_symbol(machine, st->addr, &kmap);
+			if (sym)
+				st->name = strdup(sym->name);
+			goto next;
+		}
+
+		stack_id = key.aggr_key;
+		bpf_map_lookup_elem(stack, &stack_id, stack_trace);
 
 		/* skip lock internal functions */
 		while (machine__is_lock_function(machine, stack_trace[idx]) &&
diff --git a/tools/perf/util/bpf_skel/lock_contention.bpf.c b/tools/perf/util/bpf_skel/lock_contention.bpf.c
index cd405adcd252..11b0fc7ee53b 100644
--- a/tools/perf/util/bpf_skel/lock_contention.bpf.c
+++ b/tools/perf/util/bpf_skel/lock_contention.bpf.c
@@ -168,11 +168,20 @@ int contention_end(u64 *ctx)
 
 	duration = bpf_ktime_get_ns() - pelem->timestamp;
 
-	if (aggr_mode == LOCK_AGGR_CALLER) {
-		key.stack_or_task_id = pelem->stack_id;
-	} else {
-		key.stack_or_task_id = pid;
+	switch (aggr_mode) {
+	case LOCK_AGGR_CALLER:
+		key.aggr_key = pelem->stack_id;
+		break;
+	case LOCK_AGGR_TASK:
+		key.aggr_key = pid;
 		update_task_data(pid);
+		break;
+	case LOCK_AGGR_ADDR:
+		key.aggr_key = pelem->lock;
+		break;
+	default:
+		/* should not happen */
+		return 0;
 	}
 
 	data = bpf_map_lookup_elem(&lock_stat, &key);
diff --git a/tools/perf/util/bpf_skel/lock_data.h b/tools/perf/util/bpf_skel/lock_data.h
index dbdf4caedc4a..ce71cf1a7e1e 100644
--- a/tools/perf/util/bpf_skel/lock_data.h
+++ b/tools/perf/util/bpf_skel/lock_data.h
@@ -4,7 +4,7 @@
 #define UTIL_BPF_SKEL_LOCK_DATA_H
 
 struct contention_key {
-	s32 stack_or_task_id;
+	u64 aggr_key;  /* can be stack_id, pid or lock addr */
 };
 
 #define TASK_COMM_LEN  16
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 4/4] perf test: Update perf lock contention test
  2022-12-09 19:07 [PATCH 0/4] perf lock contention: Support task/addr aggregation mode (v1) Namhyung Kim
                   ` (2 preceding siblings ...)
  2022-12-09 19:07 ` [PATCH 3/4] perf lock contention: Add -l/--lock-addr option Namhyung Kim
@ 2022-12-09 19:07 ` Namhyung Kim
  2022-12-12 19:58 ` [PATCH 0/4] perf lock contention: Support task/addr aggregation mode (v1) Arnaldo Carvalho de Melo
  4 siblings, 0 replies; 10+ messages in thread
From: Namhyung Kim @ 2022-12-09 19:07 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Song Liu, Blake Jones, bpf

Add test cases for the task and addr aggregation modes.

  $ sudo ./perf test -v contention
   86: kernel lock contention analysis test                            :
  --- start ---
  test child forked, pid 680006
  Testing perf lock record and perf lock contention
  Testing perf lock contention --use-bpf
  Testing perf lock record and perf lock contention at the same time
  Testing perf lock contention --threads
  Testing perf lock contention --lock-addr
  test child finished with 0
  ---- end ----
  kernel lock contention analysis test: Ok

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/tests/shell/lock_contention.sh | 48 +++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/tools/perf/tests/shell/lock_contention.sh b/tools/perf/tests/shell/lock_contention.sh
index f7bd0d8eb5c3..cc9ceb9e19ca 100755
--- a/tools/perf/tests/shell/lock_contention.sh
+++ b/tools/perf/tests/shell/lock_contention.sh
@@ -77,10 +77,58 @@ test_record_concurrent()
 	fi
 }
 
+test_aggr_task()
+{
+	echo "Testing perf lock contention --threads"
+	perf lock contention -i ${perfdata} -t -E 1 -q 2> ${result}
+	if [ $(cat "${result}" | wc -l) != "1" ]; then
+		echo "[Fail] Recorded result count is not 1:" $(cat "${result}" | wc -l)
+		err=1
+		exit
+	fi
+
+	if ! perf lock con -b true > /dev/null 2>&1 ; then
+		return
+	fi
+
+	# the perf lock contention output goes to the stderr
+	perf lock con -a -b -t -E 1 -q -- perf bench sched messaging > /dev/null 2> ${result}
+	if [ $(cat "${result}" | wc -l) != "1" ]; then
+		echo "[Fail] BPF result count is not 1:" $(cat "${result}" | wc -l)
+		err=1
+		exit
+	fi
+}
+
+test_aggr_addr()
+{
+	echo "Testing perf lock contention --lock-addr"
+	perf lock contention -i ${perfdata} -l -E 1 -q 2> ${result}
+	if [ $(cat "${result}" | wc -l) != "1" ]; then
+		echo "[Fail] Recorded result count is not 1:" $(cat "${result}" | wc -l)
+		err=1
+		exit
+	fi
+
+	if ! perf lock con -b true > /dev/null 2>&1 ; then
+		return
+	fi
+
+	# the perf lock contention output goes to the stderr
+	perf lock con -a -b -t -E 1 -q -- perf bench sched messaging > /dev/null 2> ${result}
+	if [ $(cat "${result}" | wc -l) != "1" ]; then
+		echo "[Fail] BPF result count is not 1:" $(cat "${result}" | wc -l)
+		err=1
+		exit
+	fi
+}
+
 check
 
 test_record
 test_bpf
 test_record_concurrent
+test_aggr_task
+test_aggr_addr
 
 exit ${err}
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/4] perf lock contention: Add lock_data.h for common data
  2022-12-09 19:07 ` [PATCH 1/4] perf lock contention: Add lock_data.h for common data Namhyung Kim
@ 2022-12-12 19:42   ` Arnaldo Carvalho de Melo
  2022-12-12 19:43     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 10+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-12-12 19:42 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers,
	Adrian Hunter, linux-perf-users, Song Liu, Blake Jones, bpf

Em Fri, Dec 09, 2022 at 11:07:24AM -0800, Namhyung Kim escreveu:
> Accessing BPF maps should use the same data types.  Add bpf_skel/lock_data.h
> to define the common data structures.  No functional changes.

You forgot to update one of the stack_id users, that field got renamed:

util/bpf_skel/lock_contention.bpf.c:144:6: error: no member named 'stack_id' in 'struct contention_key'
        key.stack_id = pelem->stack_id;
        ~~~ ^
1 error generated.
make[2]: *** [Makefile.perf:1075: /tmp/build/perf/util/bpf_skel/.tmp/lock_contention.bpf.o] Error 1
make[1]: *** [Makefile.perf:236: sub-make] Error 2
make: *** [Makefile:113: install-bin] Error 2
make: Leaving directory '/var/home/acme/git/perf/tools/perf'

 Performance counter stats for 'make -k NO_LIBTRACEEVENT=1 BUILD_BPF_SKEL=1 CORESIGHT=1 O=/tmp/build/perf -C tools/perf install-bin':

     7,005,216,342      cycles:u
    11,851,225,594      instructions:u                   #    1.69  insn per cycle

       3.168945139 seconds time elapsed

       1.730964000 seconds user
       1.578932000 seconds sys


⬢[acme@toolbox perf]$ git log --oneline -4
f6e7a5f1db49dc8e (HEAD) perf lock contention: Add lock_data.h for common data
5d9b55713c5c037f perf python: Account for multiple words in CC
d9078bf3f3320457 perf off_cpu: Fix a typo in BTF tracepoint name, it should be 'btf_trace_sched_switch'
3b7ea76f0f7844f5 perf test: Update event group check for support of uncore event
⬢[acme@toolbox perf]$

After some point it builds.

I'm fixing this to keep it bisectable.

- Arnaldo
 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/util/bpf_lock_contention.c         | 19 ++++--------
>  .../perf/util/bpf_skel/lock_contention.bpf.c  | 17 ++---------
>  tools/perf/util/bpf_skel/lock_data.h          | 30 +++++++++++++++++++
>  3 files changed, 38 insertions(+), 28 deletions(-)
>  create mode 100644 tools/perf/util/bpf_skel/lock_data.h
> 
> diff --git a/tools/perf/util/bpf_lock_contention.c b/tools/perf/util/bpf_lock_contention.c
> index f4ebb9a2e380..b6a8eb7164b3 100644
> --- a/tools/perf/util/bpf_lock_contention.c
> +++ b/tools/perf/util/bpf_lock_contention.c
> @@ -12,17 +12,10 @@
>  #include <bpf/bpf.h>
>  
>  #include "bpf_skel/lock_contention.skel.h"
> +#include "bpf_skel/lock_data.h"
>  
>  static struct lock_contention_bpf *skel;
>  
> -struct lock_contention_data {
> -	u64 total_time;
> -	u64 min_time;
> -	u64 max_time;
> -	u32 count;
> -	u32 flags;
> -};
> -
>  int lock_contention_prepare(struct lock_contention *con)
>  {
>  	int i, fd;
> @@ -110,8 +103,8 @@ int lock_contention_stop(void)
>  int lock_contention_read(struct lock_contention *con)
>  {
>  	int fd, stack, err = 0;
> -	s32 prev_key, key;
> -	struct lock_contention_data data = {};
> +	struct contention_key *prev_key, key;
> +	struct contention_data data = {};
>  	struct lock_stat *st = NULL;
>  	struct machine *machine = con->machine;
>  	u64 *stack_trace;
> @@ -126,8 +119,8 @@ int lock_contention_read(struct lock_contention *con)
>  	if (stack_trace == NULL)
>  		return -1;
>  
> -	prev_key = 0;
> -	while (!bpf_map_get_next_key(fd, &prev_key, &key)) {
> +	prev_key = NULL;
> +	while (!bpf_map_get_next_key(fd, prev_key, &key)) {
>  		struct map *kmap;
>  		struct symbol *sym;
>  		int idx = 0;
> @@ -184,7 +177,7 @@ int lock_contention_read(struct lock_contention *con)
>  		}
>  
>  		hlist_add_head(&st->hash_entry, con->result);
> -		prev_key = key;
> +		prev_key = &key;
>  
>  		/* we're fine now, reset the values */
>  		st = NULL;
> diff --git a/tools/perf/util/bpf_skel/lock_contention.bpf.c b/tools/perf/util/bpf_skel/lock_contention.bpf.c
> index 9681cb59b0df..0f63cc28ccba 100644
> --- a/tools/perf/util/bpf_skel/lock_contention.bpf.c
> +++ b/tools/perf/util/bpf_skel/lock_contention.bpf.c
> @@ -5,24 +5,11 @@
>  #include <bpf/bpf_tracing.h>
>  #include <bpf/bpf_core_read.h>
>  
> -/* maximum stack trace depth */
> -#define MAX_STACKS   8
> +#include "lock_data.h"
>  
>  /* default buffer size */
>  #define MAX_ENTRIES  10240
>  
> -struct contention_key {
> -	__s32 stack_id;
> -};
> -
> -struct contention_data {
> -	__u64 total_time;
> -	__u64 min_time;
> -	__u64 max_time;
> -	__u32 count;
> -	__u32 flags;
> -};
> -
>  struct tstamp_data {
>  	__u64 timestamp;
>  	__u64 lock;
> @@ -34,7 +21,7 @@ struct tstamp_data {
>  struct {
>  	__uint(type, BPF_MAP_TYPE_STACK_TRACE);
>  	__uint(key_size, sizeof(__u32));
> -	__uint(value_size, MAX_STACKS * sizeof(__u64));
> +	__uint(value_size, sizeof(__u64));
>  	__uint(max_entries, MAX_ENTRIES);
>  } stacks SEC(".maps");
>  
> diff --git a/tools/perf/util/bpf_skel/lock_data.h b/tools/perf/util/bpf_skel/lock_data.h
> new file mode 100644
> index 000000000000..dbdf4caedc4a
> --- /dev/null
> +++ b/tools/perf/util/bpf_skel/lock_data.h
> @@ -0,0 +1,30 @@
> +// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +/* Data structures shared between BPF and tools. */
> +#ifndef UTIL_BPF_SKEL_LOCK_DATA_H
> +#define UTIL_BPF_SKEL_LOCK_DATA_H
> +
> +struct contention_key {
> +	s32 stack_or_task_id;
> +};
> +
> +#define TASK_COMM_LEN  16
> +
> +struct contention_task_data {
> +	char comm[TASK_COMM_LEN];
> +};
> +
> +struct contention_data {
> +	u64 total_time;
> +	u64 min_time;
> +	u64 max_time;
> +	u32 count;
> +	u32 flags;
> +};
> +
> +enum lock_aggr_mode {
> +	LOCK_AGGR_ADDR = 0,
> +	LOCK_AGGR_TASK,
> +	LOCK_AGGR_CALLER,
> +};
> +
> +#endif /* UTIL_BPF_SKEL_LOCK_DATA_H */
> -- 
> 2.39.0.rc1.256.g54fd8350bd-goog

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/4] perf lock contention: Add lock_data.h for common data
  2022-12-12 19:42   ` Arnaldo Carvalho de Melo
@ 2022-12-12 19:43     ` Arnaldo Carvalho de Melo
  2022-12-12 19:45       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 10+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-12-12 19:43 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers,
	Adrian Hunter, linux-perf-users, Song Liu, Blake Jones, bpf

Em Mon, Dec 12, 2022 at 04:42:30PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Fri, Dec 09, 2022 at 11:07:24AM -0800, Namhyung Kim escreveu:
> > Accessing BPF maps should use the same data types.  Add bpf_skel/lock_data.h
> > to define the common data structures.  No functional changes.
> 
> You forgot to update one of the stack_id users, that field got renamed:
> 
> util/bpf_skel/lock_contention.bpf.c:144:6: error: no member named 'stack_id' in 'struct contention_key'
>         key.stack_id = pelem->stack_id;
>         ~~~ ^
> 1 error generated.
> make[2]: *** [Makefile.perf:1075: /tmp/build/perf/util/bpf_skel/.tmp/lock_contention.bpf.o] Error 1
> make[1]: *** [Makefile.perf:236: sub-make] Error 2
> make: *** [Makefile:113: install-bin] Error 2
> make: Leaving directory '/var/home/acme/git/perf/tools/perf'
> 
>  Performance counter stats for 'make -k NO_LIBTRACEEVENT=1 BUILD_BPF_SKEL=1 CORESIGHT=1 O=/tmp/build/perf -C tools/perf install-bin':
> 
>      7,005,216,342      cycles:u
>     11,851,225,594      instructions:u                   #    1.69  insn per cycle
> 
>        3.168945139 seconds time elapsed
> 
>        1.730964000 seconds user
>        1.578932000 seconds sys
> 
> 
> ⬢[acme@toolbox perf]$ git log --oneline -4
> f6e7a5f1db49dc8e (HEAD) perf lock contention: Add lock_data.h for common data
> 5d9b55713c5c037f perf python: Account for multiple words in CC
> d9078bf3f3320457 perf off_cpu: Fix a typo in BTF tracepoint name, it should be 'btf_trace_sched_switch'
> 3b7ea76f0f7844f5 perf test: Update event group check for support of uncore event
> ⬢[acme@toolbox perf]$
> 
> After some point it builds.
> 
> I'm fixing this to keep it bisectable.

I folded this:


diff --git a/tools/perf/util/bpf_skel/lock_contention.bpf.c b/tools/perf/util/bpf_skel/lock_contention.bpf.c
index 0f63cc28ccbabd21..64fd1e040ac86e58 100644
--- a/tools/perf/util/bpf_skel/lock_contention.bpf.c
+++ b/tools/perf/util/bpf_skel/lock_contention.bpf.c
@@ -141,7 +141,7 @@ int contention_end(u64 *ctx)
 
 	duration = bpf_ktime_get_ns() - pelem->timestamp;
 
-	key.stack_id = pelem->stack_id;
+	key.stack_or_task_id = pelem->stack_id;
 	data = bpf_map_lookup_elem(&lock_stat, &key);
 	if (!data) {
 		struct contention_data first = {

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/4] perf lock contention: Add lock_data.h for common data
  2022-12-12 19:43     ` Arnaldo Carvalho de Melo
@ 2022-12-12 19:45       ` Arnaldo Carvalho de Melo
  2022-12-12 20:04         ` Namhyung Kim
  0 siblings, 1 reply; 10+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-12-12 19:45 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers,
	Adrian Hunter, linux-perf-users, Song Liu, Blake Jones, bpf

Em Mon, Dec 12, 2022 at 04:43:43PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Mon, Dec 12, 2022 at 04:42:30PM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Fri, Dec 09, 2022 at 11:07:24AM -0800, Namhyung Kim escreveu:
> > > Accessing BPF maps should use the same data types.  Add bpf_skel/lock_data.h
> > > to define the common data structures.  No functional changes.
> > 
> > You forgot to update one of the stack_id users, that field got renamed:
> > 
> > util/bpf_skel/lock_contention.bpf.c:144:6: error: no member named 'stack_id' in 'struct contention_key'
> >         key.stack_id = pelem->stack_id;
> >         ~~~ ^
> > 1 error generated.
> > make[2]: *** [Makefile.perf:1075: /tmp/build/perf/util/bpf_skel/.tmp/lock_contention.bpf.o] Error 1
> > make[1]: *** [Makefile.perf:236: sub-make] Error 2
> > make: *** [Makefile:113: install-bin] Error 2
> > make: Leaving directory '/var/home/acme/git/perf/tools/perf'
> > 
> >  Performance counter stats for 'make -k NO_LIBTRACEEVENT=1 BUILD_BPF_SKEL=1 CORESIGHT=1 O=/tmp/build/perf -C tools/perf install-bin':
> > 
> >      7,005,216,342      cycles:u
> >     11,851,225,594      instructions:u                   #    1.69  insn per cycle
> > 
> >        3.168945139 seconds time elapsed
> > 
> >        1.730964000 seconds user
> >        1.578932000 seconds sys
> > 
> > 
> > ⬢[acme@toolbox perf]$ git log --oneline -4
> > f6e7a5f1db49dc8e (HEAD) perf lock contention: Add lock_data.h for common data
> > 5d9b55713c5c037f perf python: Account for multiple words in CC
> > d9078bf3f3320457 perf off_cpu: Fix a typo in BTF tracepoint name, it should be 'btf_trace_sched_switch'
> > 3b7ea76f0f7844f5 perf test: Update event group check for support of uncore event
> > ⬢[acme@toolbox perf]$
> > 
> > After some point it builds.
> > 
> > I'm fixing this to keep it bisectable.
> 
> I folded this:
> 
> diff --git a/tools/perf/util/bpf_skel/lock_contention.bpf.c b/tools/perf/util/bpf_skel/lock_contention.bpf.c
> index 0f63cc28ccbabd21..64fd1e040ac86e58 100644
> --- a/tools/perf/util/bpf_skel/lock_contention.bpf.c
> +++ b/tools/perf/util/bpf_skel/lock_contention.bpf.c
> @@ -141,7 +141,7 @@ int contention_end(u64 *ctx)
>  
>  	duration = bpf_ktime_get_ns() - pelem->timestamp;
>  
> -	key.stack_id = pelem->stack_id;
> +	key.stack_or_task_id = pelem->stack_id;
>  	data = bpf_map_lookup_elem(&lock_stat, &key);
>  	if (!data) {
>  		struct contention_data first = {


And then fixed up this:

Could not apply 3d4947c7bd10beba... perf lock contention: Implement -t/--threads option for BPF
⬢[acme@toolbox perf]$
⬢[acme@toolbox perf]$
⬢[acme@toolbox perf]$ git diff
diff --cc tools/perf/util/bpf_skel/lock_contention.bpf.c
index 64fd1e040ac86e58,cd405adcd252b82d..0000000000000000
--- a/tools/perf/util/bpf_skel/lock_contention.bpf.c
+++ b/tools/perf/util/bpf_skel/lock_contention.bpf.c
@@@ -141,7 -168,13 +168,17 @@@ int contention_end(u64 *ctx

        duration = bpf_ktime_get_ns() - pelem->timestamp;

++<<<<<<< HEAD
 +      key.stack_or_task_id = pelem->stack_id;
++=======
+       if (aggr_mode == LOCK_AGGR_CALLER) {
+               key.stack_or_task_id = pelem->stack_id;
+       } else {
+               key.stack_or_task_id = pid;
+               update_task_data(pid);
+       }
+
++>>>>>>> 3d4947c7bd10beba (perf lock contention: Implement -t/--threads option for BPF)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/4] perf lock contention: Support task/addr aggregation mode (v1)
  2022-12-09 19:07 [PATCH 0/4] perf lock contention: Support task/addr aggregation mode (v1) Namhyung Kim
                   ` (3 preceding siblings ...)
  2022-12-09 19:07 ` [PATCH 4/4] perf test: Update perf lock contention test Namhyung Kim
@ 2022-12-12 19:58 ` Arnaldo Carvalho de Melo
  4 siblings, 0 replies; 10+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-12-12 19:58 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers,
	Adrian Hunter, linux-perf-users, Song Liu, Blake Jones, bpf

Em Fri, Dec 09, 2022 at 11:07:23AM -0800, Namhyung Kim escreveu:
> Hello,
> 
> This patchset adds two more aggregation modes for perf lock contention.


Thanks, applied.

- Arnaldo

 
> The first one is the per-task mode with -t/--threads option.  The option
> was there already but it remained broken with BPF for a while.  Now it
> supports the output with and without BPF.
> 
>   # perf lock contention -a -b -t -- sleep 1
>    contended   total wait     max wait     avg wait          pid   comm
> 
>           11     11.85 us      2.23 us      1.08 us            0   swapper
>            2     11.13 us     10.22 us      5.56 us       749053   ThreadPoolForeg
>            1      8.15 us      8.15 us      8.15 us       321353   Chrome_ChildIOT
>            2      2.73 us      1.77 us      1.37 us       320761   Chrome_ChildIOT
>            1      1.40 us      1.40 us      1.40 us       320502   chrome
>            1       379 ns       379 ns       379 ns       321227   chrome
> 
> The other one is the per-lock-instance mode with -l/--lock-addr option.
> If the lock has a symbol, it will be displayed as well.
> 
>   # perf lock contention -a -b -l -- sleep 1
>    contended   total wait     max wait     avg wait            address   symbol
> 
>            3      4.79 us      2.33 us      1.60 us   ffffffffbaed50c0   rcu_state
>            4      4.19 us      1.62 us      1.05 us   ffffffffbae07a40   jiffies_lock
>            1      1.94 us      1.94 us      1.94 us   ffff9262277861e0
>            1       387 ns       387 ns       387 ns   ffff9260bfda4f60
> 
> It's based on the current acme/tmp.perf/core branch.
> You can find the code in the 'perf/lock-con-aggr-v1' branch in
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
> 
> Thanks,
> Namhyung
> 
> 
> Namhyung Kim (4):
>   perf lock contention: Add lock_data.h for common data
>   perf lock contention: Implement -t/--threads option for BPF
>   perf lock contention: Add -l/--lock-addr option
>   perf test: Update perf lock contention test
> 
>  tools/perf/Documentation/perf-lock.txt        |  4 +
>  tools/perf/builtin-lock.c                     | 95 ++++++++++++++-----
>  tools/perf/tests/shell/lock_contention.sh     | 48 ++++++++++
>  tools/perf/util/bpf_lock_contention.c         | 72 ++++++++++----
>  .../perf/util/bpf_skel/lock_contention.bpf.c  | 67 +++++++++----
>  tools/perf/util/bpf_skel/lock_data.h          | 30 ++++++
>  tools/perf/util/lock-contention.h             |  1 +
>  7 files changed, 255 insertions(+), 62 deletions(-)
>  create mode 100644 tools/perf/util/bpf_skel/lock_data.h
> 
> 
> base-commit: b22802e295a80ec16e355d7208d2fbbd7bbc1b7a
> -- 
> 2.39.0.rc1.256.g54fd8350bd-goog

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/4] perf lock contention: Add lock_data.h for common data
  2022-12-12 19:45       ` Arnaldo Carvalho de Melo
@ 2022-12-12 20:04         ` Namhyung Kim
  0 siblings, 0 replies; 10+ messages in thread
From: Namhyung Kim @ 2022-12-12 20:04 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers,
	Adrian Hunter, linux-perf-users, Song Liu, Blake Jones, bpf

Hi Arnaldo,

On Mon, Dec 12, 2022 at 11:45 AM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> Em Mon, Dec 12, 2022 at 04:43:43PM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Mon, Dec 12, 2022 at 04:42:30PM -0300, Arnaldo Carvalho de Melo escreveu:
> > > Em Fri, Dec 09, 2022 at 11:07:24AM -0800, Namhyung Kim escreveu:
> > > > Accessing BPF maps should use the same data types.  Add bpf_skel/lock_data.h
> > > > to define the common data structures.  No functional changes.
> > >
> > > You forgot to update one of the stack_id users, that field got renamed:
> > >
> > > util/bpf_skel/lock_contention.bpf.c:144:6: error: no member named 'stack_id' in 'struct contention_key'
> > >         key.stack_id = pelem->stack_id;
> > >         ~~~ ^
> > > 1 error generated.
> > > make[2]: *** [Makefile.perf:1075: /tmp/build/perf/util/bpf_skel/.tmp/lock_contention.bpf.o] Error 1
> > > make[1]: *** [Makefile.perf:236: sub-make] Error 2
> > > make: *** [Makefile:113: install-bin] Error 2
> > > make: Leaving directory '/var/home/acme/git/perf/tools/perf'

Oops, right.

> > >
> > >  Performance counter stats for 'make -k NO_LIBTRACEEVENT=1 BUILD_BPF_SKEL=1 CORESIGHT=1 O=/tmp/build/perf -C tools/perf install-bin':
> > >
> > >      7,005,216,342      cycles:u
> > >     11,851,225,594      instructions:u                   #    1.69  insn per cycle
> > >
> > >        3.168945139 seconds time elapsed
> > >
> > >        1.730964000 seconds user
> > >        1.578932000 seconds sys
> > >
> > >
> > > ⬢[acme@toolbox perf]$ git log --oneline -4
> > > f6e7a5f1db49dc8e (HEAD) perf lock contention: Add lock_data.h for common data
> > > 5d9b55713c5c037f perf python: Account for multiple words in CC
> > > d9078bf3f3320457 perf off_cpu: Fix a typo in BTF tracepoint name, it should be 'btf_trace_sched_switch'
> > > 3b7ea76f0f7844f5 perf test: Update event group check for support of uncore event
> > > ⬢[acme@toolbox perf]$
> > >
> > > After some point it builds.
> > >
> > > I'm fixing this to keep it bisectable.
> >
> > I folded this:
> >
> > diff --git a/tools/perf/util/bpf_skel/lock_contention.bpf.c b/tools/perf/util/bpf_skel/lock_contention.bpf.c
> > index 0f63cc28ccbabd21..64fd1e040ac86e58 100644
> > --- a/tools/perf/util/bpf_skel/lock_contention.bpf.c
> > +++ b/tools/perf/util/bpf_skel/lock_contention.bpf.c
> > @@ -141,7 +141,7 @@ int contention_end(u64 *ctx)
> >
> >       duration = bpf_ktime_get_ns() - pelem->timestamp;
> >
> > -     key.stack_id = pelem->stack_id;
> > +     key.stack_or_task_id = pelem->stack_id;
> >       data = bpf_map_lookup_elem(&lock_stat, &key);
> >       if (!data) {
> >               struct contention_data first = {

Thanks for fixing this.

>
>
> And then fixed up this:
>
> Could not apply 3d4947c7bd10beba... perf lock contention: Implement -t/--threads option for BPF
> ⬢[acme@toolbox perf]$
> ⬢[acme@toolbox perf]$
> ⬢[acme@toolbox perf]$ git diff
> diff --cc tools/perf/util/bpf_skel/lock_contention.bpf.c
> index 64fd1e040ac86e58,cd405adcd252b82d..0000000000000000
> --- a/tools/perf/util/bpf_skel/lock_contention.bpf.c
> +++ b/tools/perf/util/bpf_skel/lock_contention.bpf.c
> @@@ -141,7 -168,13 +168,17 @@@ int contention_end(u64 *ctx
>
>         duration = bpf_ktime_get_ns() - pelem->timestamp;
>
> ++<<<<<<< HEAD
>  +      key.stack_or_task_id = pelem->stack_id;
> ++=======
> +       if (aggr_mode == LOCK_AGGR_CALLER) {
> +               key.stack_or_task_id = pelem->stack_id;
> +       } else {
> +               key.stack_or_task_id = pid;
> +               update_task_data(pid);
> +       }
> +
> ++>>>>>>> 3d4947c7bd10beba (perf lock contention: Implement -t/--threads option for BPF)

Sure, it looks good to me.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-12-12 20:05 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-12-09 19:07 [PATCH 0/4] perf lock contention: Support task/addr aggregation mode (v1) Namhyung Kim
2022-12-09 19:07 ` [PATCH 1/4] perf lock contention: Add lock_data.h for common data Namhyung Kim
2022-12-12 19:42   ` Arnaldo Carvalho de Melo
2022-12-12 19:43     ` Arnaldo Carvalho de Melo
2022-12-12 19:45       ` Arnaldo Carvalho de Melo
2022-12-12 20:04         ` Namhyung Kim
2022-12-09 19:07 ` [PATCH 2/4] perf lock contention: Implement -t/--threads option for BPF Namhyung Kim
2022-12-09 19:07 ` [PATCH 3/4] perf lock contention: Add -l/--lock-addr option Namhyung Kim
2022-12-09 19:07 ` [PATCH 4/4] perf test: Update perf lock contention test Namhyung Kim
2022-12-12 19:58 ` [PATCH 0/4] perf lock contention: Support task/addr aggregation mode (v1) Arnaldo Carvalho de Melo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.