From: Namhyung Kim <namhyung@kernel.org>
To: Chun-Tse Shao <ctshao@google.com>
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org,
mingo@redhat.com, acme@kernel.org, mark.rutland@arm.com,
alexander.shishkin@linux.intel.com, jolsa@kernel.org,
irogers@google.com, adrian.hunter@intel.com,
kan.liang@linux.intel.com, linux-perf-users@vger.kernel.org,
bpf@vger.kernel.org
Subject: Re: [PATCH v2 1/4] perf lock: Add bpf maps for owner stack tracing
Date: Thu, 23 Jan 2025 16:34:10 -0800 [thread overview]
Message-ID: <Z5LgAq7tOuU-ULBw@google.com> (raw)
In-Reply-To: <CAJpZYjXMd4+UbaO8y-HGkPor4M1L7W1iMWYe+ak1EYSOxwqXiw@mail.gmail.com>
On Tue, Jan 21, 2025 at 02:02:57PM -0800, Chun-Tse Shao wrote:
> Hi Namhyung, thanks for your reply!
>
> On Mon, Jan 13, 2025 at 7:05 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > Hello,
> >
> > On Sun, Jan 12, 2025 at 09:20:14PM -0800, Chun-Tse Shao wrote:
> > > Add few bpf maps in order to tracing owner stack.
> >
> > If you want to split this code as a separate commit, I think you'd
> > better explain what these maps do and why you need them.
> >
> > >
> > > Signed-off-by: Chun-Tse Shao <ctshao@google.com>
> > > ---
> > > tools/perf/util/bpf_lock_contention.c | 17 ++++++--
> > > .../perf/util/bpf_skel/lock_contention.bpf.c | 40 +++++++++++++++++--
> > > tools/perf/util/bpf_skel/lock_data.h | 6 +++
> > > 3 files changed, 56 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/tools/perf/util/bpf_lock_contention.c b/tools/perf/util/bpf_lock_contention.c
> > > index 41a1ad087895..c9c58f243ceb 100644
> > > --- a/tools/perf/util/bpf_lock_contention.c
> > > +++ b/tools/perf/util/bpf_lock_contention.c
> > > @@ -41,9 +41,20 @@ int lock_contention_prepare(struct lock_contention *con)
> > > else
> > > bpf_map__set_max_entries(skel->maps.task_data, 1);
> > >
> > > - if (con->save_callstack)
> > > - bpf_map__set_max_entries(skel->maps.stacks, con->map_nr_entries);
> > > - else
> > > + if (con->save_callstack) {
> > > + bpf_map__set_max_entries(skel->maps.stacks,
> > > + con->map_nr_entries);
> > > + if (con->owner) {
> > > + bpf_map__set_value_size(skel->maps.owner_stacks_entries,
> > > + con->max_stack * sizeof(u64));
> > > + bpf_map__set_value_size(
> > > + skel->maps.contention_owner_stacks,
> > > + con->max_stack * sizeof(u64));
> > > + bpf_map__set_key_size(skel->maps.owner_lock_stat,
> > > + con->max_stack * sizeof(u64));
> > > + skel->rodata->max_stack = con->max_stack;
> > > + }
> > > + } else
> > > bpf_map__set_max_entries(skel->maps.stacks, 1);
> > >
> > > if (target__has_cpu(target)) {
> > > diff --git a/tools/perf/util/bpf_skel/lock_contention.bpf.c b/tools/perf/util/bpf_skel/lock_contention.bpf.c
> > > index 1069bda5d733..05da19fdab23 100644
> > > --- a/tools/perf/util/bpf_skel/lock_contention.bpf.c
> > > +++ b/tools/perf/util/bpf_skel/lock_contention.bpf.c
> > > @@ -19,13 +19,37 @@
> > > #define LCB_F_PERCPU (1U << 4)
> > > #define LCB_F_MUTEX (1U << 5)
> > >
> >
> > Can we rename these shorter and save some typings?
>
> I tend to use longer variable names with full descriptions with some
> easy to understand abbreviations. Would a shorter name be preferable
> in Linux kernel?
Well, I think it's a matter of preference. I don't know how others
think but I prefer shorter names.
>
> >
> > > -/* callstack storage */
> > > + /* tmp buffer for owner callstack */
> > > struct {
> > > - __uint(type, BPF_MAP_TYPE_STACK_TRACE);
> > > + __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
> > > __uint(key_size, sizeof(__u32));
> > > __uint(value_size, sizeof(__u64));
> > > + __uint(max_entries, 1);
> > > +} owner_stacks_entries SEC(".maps");
> >
> > I think this can be 'stack_buf'.
> >
> > > +
> > > +/* a map for tracing lock address to owner data */
> > > +struct {
> > > + __uint(type, BPF_MAP_TYPE_HASH);
> > > + __uint(key_size, sizeof(__u64)); // lock address
> > > + __uint(value_size, sizeof(cotd));
> > > __uint(max_entries, MAX_ENTRIES);
> > > -} stacks SEC(".maps");
> > > +} contention_owner_tracing SEC(".maps");
> >
> > owner_data.
> >
> > > +
> > > +/* a map for tracing lock address to owner stacktrace */
> > > +struct {
> > > + __uint(type, BPF_MAP_TYPE_HASH);
> > > + __uint(key_size, sizeof(__u64)); // lock address
> > > + __uint(value_size, sizeof(__u64)); // straktrace
> >
> > Typo.
> >
> > > + __uint(max_entries, MAX_ENTRIES);
> > > +} contention_owner_stacks SEC(".maps");
> >
> > owner_stack.
> >
> > > +
> > > +/* owner callstack to contention data storage */
> > > +struct {
> > > + __uint(type, BPF_MAP_TYPE_HASH);
> > > + __uint(key_size, sizeof(__u64));
> > > + __uint(value_size, sizeof(struct contention_data));
> > > + __uint(max_entries, MAX_ENTRIES);
> > > +} owner_lock_stat SEC(".maps");
> >
> > owner_stat. What do you think?
> >
> > By the way, I got an idea to implement stackid map in BPF using hash
> > map. For owner stack, you can use the stacktrace as a key and make a
> > value an unique integer. Then the return value can be used as a stack
> > id (like from bpf_get_stackid) for the owner_data and owner_stat.
> >
> > Something like:
> >
> > s32 get_stack_id(struct owner_stack *owner_stack, u64 stacktrace[])
> > {
> > s32 *id, new_id;
> > static s32 id_gen = 1;
> >
> > id = bpf_map_lookup_elem(owner_stack, stacktrace);
> > if (id)
> > return *id;
> >
> > new_id = __sync_fetch_and_add(&id_gen, 1);
> > bpf_map_update_elem(owner_stack, stacktrace, &new_id, BPF_NOEXIST);
> >
> > id = bpf_map_lookup_elem(owner_stack, stacktrace);
> > if (id)
> > return *id;
> >
> > return -1;
> > }
> >
> > Later, in user space, you can traverse the owner_stack map to build
> > reverse mapping from id to stacktrace.
>
> I wonder if stack_id is necessary here. So far I have three bpf maps.
> 2 bpf maps for tracing owner stack on given lock address in bpf program:
> key: lock_address, value: a struct for tracing owner pid, count of
> waiters and contention begin timestamp.
> key: lock_address, value: owner stack, which is variable length so I
> have to put it in a separate bpf map.
>
> 1 bpf map for reporting owner stack in user mode:
> key: owner stack, value: struct lock_stat.
>
> With stackid I think there will still be 3 bpf maps, one for
> lock_address to owner's info with stackid, one for stackid to stack,
> and one for contention_key (has stackid inside) to lock_stat. I think
> it is just another way to implement and does not simplify the
> implementation. WDYT?
With stackid, I think we can have these 3 maps:
* key: stack-trace, value: stack-id
* key: lock-addr, value: owner-pid, counter, timestamp, stack-id
* key: stack-id, value: struct lock_stat
Then at least you can save some space for the key in the 3rd map (and
simplify the comparison of the key).
>
> >
> > >
> > > /* maintain timestamp at the beginning of contention */
> > > struct {
> > > @@ -43,6 +67,14 @@ struct {
> > > __uint(max_entries, 1);
> > > } tstamp_cpu SEC(".maps");
> > >
> > > +/* callstack storage */
> > > +struct {
> > > + __uint(type, BPF_MAP_TYPE_STACK_TRACE);
> > > + __uint(key_size, sizeof(__u32));
> > > + __uint(value_size, sizeof(__u64));
> > > + __uint(max_entries, MAX_ENTRIES);
> > > +} stacks SEC(".maps");
> > > +
> > > /* actual lock contention statistics */
> > > struct {
> > > __uint(type, BPF_MAP_TYPE_HASH);
> > > @@ -126,6 +158,7 @@ const volatile int needs_callstack;
> > > const volatile int stack_skip;
> > > const volatile int lock_owner;
> > > const volatile int use_cgroup_v2;
> > > +const volatile int max_stack;
> > >
> > > /* determine the key of lock stat */
> > > const volatile int aggr_mode;
> > > @@ -436,7 +469,6 @@ int contention_end(u64 *ctx)
> > > return 0;
> > > need_delete = true;
> > > }
> > > -
> > > duration = bpf_ktime_get_ns() - pelem->timestamp;
> > > if ((__s64)duration < 0) {
> > > __sync_fetch_and_add(&time_fail, 1);
> > > diff --git a/tools/perf/util/bpf_skel/lock_data.h b/tools/perf/util/bpf_skel/lock_data.h
> > > index de12892f992f..1ef0bca9860e 100644
> > > --- a/tools/perf/util/bpf_skel/lock_data.h
> > > +++ b/tools/perf/util/bpf_skel/lock_data.h
> > > @@ -3,6 +3,12 @@
> > > #ifndef UTIL_BPF_SKEL_LOCK_DATA_H
> > > #define UTIL_BPF_SKEL_LOCK_DATA_H
> > >
> > > +typedef struct contention_owner_tracing_data {
> > > + u32 pid; // Who has the lock.
> > > + u64 timestamp; // The time while the owner acquires lock and contention is going on.
> > > + u32 count; // How many waiters for this lock.
> >
> > Switching the order of timestamp and count would remove padding.
>
> Thanks for the nit!
No problem. :)
Thanks,
Namhyung
>
> >
> > > +} cotd;
> >
> > Usually we don't use typedef to remove the struct tag.
> >
> > Thanks,
> > Namhyung
> >
> > > +
> > > struct tstamp_data {
> > > u64 timestamp;
> > > u64 lock;
> > > --
> > > 2.47.1.688.g23fc6f90ad-goog
> > >
next prev parent reply other threads:[~2025-01-24 0:34 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-13 5:20 [PATCH v2 0/4] Tracing contention lock owner call stack Chun-Tse Shao
2025-01-13 5:20 ` [PATCH v2 1/4] perf lock: Add bpf maps for owner stack tracing Chun-Tse Shao
2025-01-14 3:05 ` Namhyung Kim
2025-01-21 22:02 ` Chun-Tse Shao
2025-01-24 0:34 ` Namhyung Kim [this message]
2025-01-13 5:20 ` [PATCH v2 2/4] perf lock: Retrieve owner callstack in bpf program Chun-Tse Shao
2025-01-14 3:35 ` Namhyung Kim
2025-01-21 22:35 ` Chun-Tse Shao
2025-01-24 0:39 ` Namhyung Kim
2025-01-13 5:20 ` [PATCH v2 3/4] perf lock: Make rb_tree helper functions generic Chun-Tse Shao
2025-01-13 5:20 ` [PATCH v2 4/4] perf lock: Report owner stack in usermode Chun-Tse Shao
2025-01-14 3:56 ` Namhyung Kim
2025-01-29 0:26 ` [PATCH v2 0/4] Tracing contention lock owner call stack Chun-Tse Shao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z5LgAq7tOuU-ULBw@google.com \
--to=namhyung@kernel.org \
--cc=acme@kernel.org \
--cc=adrian.hunter@intel.com \
--cc=alexander.shishkin@linux.intel.com \
--cc=bpf@vger.kernel.org \
--cc=ctshao@google.com \
--cc=irogers@google.com \
--cc=jolsa@kernel.org \
--cc=kan.liang@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-perf-users@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox