[PATCH v2 RESEND] perf/lock: enable end-timestamp accounting for cgroup aggregation

Linux Perf Users

* [PATCH v2 RESEND] perf/lock: enable end-timestamp accounting for cgroup aggregation
@ 2026-05-30 19:59 Suchit Karunakaran
  2026-05-30 20:41 ` sashiko-bot
  2026-06-04 13:37 ` Arnaldo Carvalho de Melo
  0 siblings, 2 replies; 5+ messages in thread
From: Suchit Karunakaran @ 2026-05-30 19:59 UTC
  To: peterz, mingo, acme, namhyung
  Cc: mark.rutland, alexander.shishkin, jolsa, irogers, adrian.hunter,
	james.clark, tycho, suchitkarunakaran, linux-perf-users,
	linux-kernel, bpf

update_lock_stat() handles lock contentions that start but never reach a
contention_end event (e.g., locks still held when profiling stops), but
previously treated LOCK_AGGR_CGROUP as a no-op due to missing cgroup
context in userspace.

Fix this by adding a cgroup_id field to struct tstamp_data, recording it
at contention_begin using get_current_cgroup_id() when aggr_mode is
LOCK_AGGR_CGROUP. Capturing it at contention_begin is semantically
correct: the contention cost is incurred by the task that had to wait,
not by whatever task happens to be running at contention_end. It is also
preferable from a performance standpoint, as contention_end runs just
before the task enters the critical section.

Update contention_end to use pelem->cgroup_id instead of calling
get_current_cgroup_id() dynamically, ensuring both complete and
incomplete contention events attribute the wait time to the cgroup at
wait-start time consistently.

Signed-off-by: Suchit Karunakaran <suchitkarunakaran@gmail.com>
---
 tools/perf/util/bpf_lock_contention.c          | 4 ++--
 tools/perf/util/bpf_skel/lock_contention.bpf.c | 4 +++-
 tools/perf/util/bpf_skel/lock_data.h           | 1 +
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/bpf_lock_contention.c b/tools/perf/util/bpf_lock_contention.c
index cbd7435579fe..1a5bd2ff8ee4 100644
--- a/tools/perf/util/bpf_lock_contention.c
+++ b/tools/perf/util/bpf_lock_contention.c
@@ -463,8 +463,8 @@ static void update_lock_stat(int map_fd, int pid, u64 end_ts,
 		stat_key.lock_addr_or_cgroup = ts_data->lock;
 		break;
 	case LOCK_AGGR_CGROUP:
-		/* TODO */
-		return;
+		stat_key.lock_addr_or_cgroup = ts_data->cgroup_id;
+		break;
 	default:
 		return;
 	}
diff --git a/tools/perf/util/bpf_skel/lock_contention.bpf.c b/tools/perf/util/bpf_skel/lock_contention.bpf.c
index 96e7d853b9ed..139199811020 100644
--- a/tools/perf/util/bpf_skel/lock_contention.bpf.c
+++ b/tools/perf/util/bpf_skel/lock_contention.bpf.c
@@ -536,6 +536,8 @@ int contention_begin(u64 *ctx)
 	pelem->timestamp = bpf_ktime_get_ns();
 	pelem->lock = (__u64)ctx[0];
 	pelem->flags = (__u32)ctx[1];
+	if (aggr_mode == LOCK_AGGR_CGROUP)
+		pelem->cgroup_id = get_current_cgroup_id();
 
 	if (needs_callstack) {
 		u32 i = 0;
@@ -771,7 +773,7 @@ int contention_end(u64 *ctx)
 			key.stack_id = pelem->stack_id;
 		break;
 	case LOCK_AGGR_CGROUP:
-		key.lock_addr_or_cgroup = get_current_cgroup_id();
+		key.lock_addr_or_cgroup = pelem->cgroup_id;
 		break;
 	default:
 		/* should not happen */
diff --git a/tools/perf/util/bpf_skel/lock_data.h b/tools/perf/util/bpf_skel/lock_data.h
index 28c5e5aced7f..652e114e6b87 100644
--- a/tools/perf/util/bpf_skel/lock_data.h
+++ b/tools/perf/util/bpf_skel/lock_data.h
@@ -13,6 +13,7 @@ struct owner_tracing_data {
 struct tstamp_data {
 	u64 timestamp;
 	u64 lock;
+	u64 cgroup_id;
 	u32 flags;
 	s32 stack_id;
 };
-- 
2.54.0
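
Editorial sketch (not part of the patch): the userspace side of the change can be modelled in plain C. The struct below mirrors struct tstamp_data as it looks after the patch, with stdint types standing in for the kernel's u64/u32/s32. Everything else (struct cgroup_stat, struct cgroup_table, MAX_CGROUPS, charge_incomplete() and its guard against a start timestamp later than end_ts) is a hypothetical stand-in for the real BPF map handling in update_lock_stat(), which this thread does not show in full.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirror of struct tstamp_data after the patch (stdint types for u64/u32/s32). */
struct tstamp_data {
	uint64_t timestamp;
	uint64_t lock;
	uint64_t cgroup_id;	/* recorded at contention_begin in cgroup mode */
	uint32_t flags;
	int32_t stack_id;
};

/* Three 64-bit fields plus two 32-bit fields: 32 bytes, no padding. */
static_assert(sizeof(struct tstamp_data) == 32, "unexpected padding");
static_assert(offsetof(struct tstamp_data, cgroup_id) == 16, "unexpected offset");

/* Hypothetical stand-in for one entry of the lock stat map. */
struct cgroup_stat {
	uint64_t cgroup_id;
	uint64_t total_time;
	uint32_t count;
};

#define MAX_CGROUPS 8	/* arbitrary size for this sketch */

struct cgroup_table {
	struct cgroup_stat slots[MAX_CGROUPS];
	size_t used;
};

/*
 * Charge a contention that never saw contention_end to the cgroup that
 * was saved when the wait started. Returns 0 on success, -1 if the
 * record starts after end_ts (an assumption of this sketch) or the
 * table is full.
 */
static int charge_incomplete(struct cgroup_table *tbl,
			     const struct tstamp_data *ts, uint64_t end_ts)
{
	size_t i;

	if (end_ts < ts->timestamp)
		return -1;

	/* Find the slot for this cgroup, or claim a new one. */
	for (i = 0; i < tbl->used; i++) {
		if (tbl->slots[i].cgroup_id == ts->cgroup_id)
			break;
	}
	if (i == tbl->used) {
		if (tbl->used == MAX_CGROUPS)
			return -1;
		tbl->slots[i] = (struct cgroup_stat){ .cgroup_id = ts->cgroup_id };
		tbl->used++;
	}

	tbl->slots[i].total_time += end_ts - ts->timestamp;
	tbl->slots[i].count++;
	return 0;
}
```

Calling charge_incomplete() for two pending records with the same cgroup_id accumulates both wait times under one key. Before the patch the userspace record carried no cgroup information, so there was nothing to key on.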



* Re: [PATCH v2 RESEND] perf/lock: enable end-timestamp accounting for cgroup aggregation
  2026-05-30 19:59 [PATCH v2 RESEND] perf/lock: enable end-timestamp accounting for cgroup aggregation Suchit Karunakaran
@ 2026-05-30 20:41 ` sashiko-bot
  2026-06-04 13:37 ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 5+ messages in thread
From: sashiko-bot @ 2026-05-30 20:41 UTC
  To: Suchit Karunakaran; +Cc: linux-perf-users, bpf

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] Incomplete lock contentions for per-CPU locks (spinlocks/rwlocks) are silently dropped when using cgroup aggregation due to an unmodified early return in `account_end_timestamp()`.
--

commit 9be2409410efd2d4d44f23abbecf590bbbc25d34
Author: Suchit Karunakaran <suchitkarunakaran@gmail.com>

perf/lock: enable end-timestamp accounting for cgroup aggregation

This commit enables end-timestamp accounting for cgroup aggregation by
tracking the cgroup_id in the BPF program during contention_begin. It
updates update_lock_stat() in userspace and contention_end() in BPF to
use this recorded cgroup context instead of fetching it dynamically.

> diff --git a/tools/perf/util/bpf_lock_contention.c b/tools/perf/util/bpf_lock_contention.c
> index cbd7435579fea..1a5bd2ff8ee4c 100644
> --- a/tools/perf/util/bpf_lock_contention.c
> +++ b/tools/perf/util/bpf_lock_contention.c
> @@ -463,8 +463,8 @@ static void update_lock_stat(int map_fd, int pid, u64 end_ts,
>  		stat_key.lock_addr_or_cgroup = ts_data->lock;
>  		break;
>  	case LOCK_AGGR_CGROUP:
> -		/* TODO */
> -		return;
> +		stat_key.lock_addr_or_cgroup = ts_data->cgroup_id;
> +		break;
>  	default:
>  		return;
>  	}

[Severity: Medium]
Since this patch enables end-timestamp accounting for cgroup aggregation,
should the early return for LOCK_AGGR_CGROUP in account_end_timestamp()
also be removed?

tools/perf/util/bpf_lock_contention.c:account_end_timestamp() {
        ...
        /* Now it'll check per-cpu tstamp map which doesn't have TID. */
        if (aggr_mode == LOCK_AGGR_TASK || aggr_mode == LOCK_AGGR_CGROUP)
                return;
        ...
}

Without removing this check, incomplete lock contentions for per-CPU
locks (like spinlocks and rwlocks) appear to be silently skipped when
using cgroup aggregation.
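
Editorial sketch (not from the review or the patch): the question above comes down to which aggregation modes skip the per-CPU tstamp map. The two predicates below model the guard as quoted and the variant the review suggests. The enum is a local stand-in whose numeric values are illustrative, and both function names are hypothetical. The thread does not settle whether the suggested change is wanted, since the patch was applied as posted.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-in for perf's aggregation modes; values are illustrative. */
enum lock_aggr_mode {
	LOCK_AGGR_ADDR,
	LOCK_AGGR_TASK,
	LOCK_AGGR_CALLER,
	LOCK_AGGR_CGROUP,
};

/* Guard as quoted in the review: task and cgroup modes both return early. */
static bool scans_percpu_quoted(enum lock_aggr_mode mode)
{
	return !(mode == LOCK_AGGR_TASK || mode == LOCK_AGGR_CGROUP);
}

/*
 * Variant the review asks about: only task mode returns early, because
 * the per-CPU map has no TID, while a cgroup key could come from the
 * cgroup_id now stored in each record.
 */
static bool scans_percpu_suggested(enum lock_aggr_mode mode)
{
	return mode != LOCK_AGGR_TASK;
}
```

The two predicates differ only for LOCK_AGGR_CGROUP, which is the case the review flags for spinlocks and rwlocks.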

[ ... ]

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260530195940.78700-1-suchitkarunakaran@gmail.com?part=1


* Re: [PATCH v2 RESEND] perf/lock: enable end-timestamp accounting for cgroup aggregation
  2026-05-30 19:59 [PATCH v2 RESEND] perf/lock: enable end-timestamp accounting for cgroup aggregation Suchit Karunakaran
  2026-05-30 20:41 ` sashiko-bot
@ 2026-06-04 13:37 ` Arnaldo Carvalho de Melo
  2026-06-04 16:31   ` Namhyung Kim
  1 sibling, 1 reply; 5+ messages in thread
From: Arnaldo Carvalho de Melo @ 2026-06-04 13:37 UTC
  To: Namhyung Kim, Suchit Karunakaran
  Cc: Peter Zijlstra, Ingo Molnar, mark.rutland, alexander.shishkin,
	Jiri Olsa, Ian Rogers, Adrian Hunter, james.clark, tycho,
	linux-perf-users, linux-kernel, bpf

On Sun, May 31, 2026 at 01:29:40AM +0530, Suchit Karunakaran wrote:
> update_lock_stat() handles lock contentions that start but never reach a
> contention_end event (e.g., locks still held when profiling stops), but
> previously treated LOCK_AGGR_CGROUP as a no-op due to missing cgroup
> context in userspace.
> 
> Fix this by adding a cgroup_id field to struct tstamp_data, recording it
> at contention_begin using get_current_cgroup_id() when aggr_mode is
> LOCK_AGGR_CGROUP. Capturing it at contention_begin is semantically
> correct: the contention cost is incurred by the task that had to wait,
> not by whatever task happens to be running at contention_end. It is also
> preferable from a performance standpoint, as contention_end runs just
> before the task enters the critical section.
> 
> Update contention_end to use pelem->cgroup_id instead of calling
> get_current_cgroup_id() dynamically, ensuring both complete and
> incomplete contention events attribute the wait time to the cgroup at
> wait-start time consistently.

Namhyung, can you provide an Acked-by or Reviewed-by?

Thanks,

- Arnaldo
 


* Re: [PATCH v2 RESEND] perf/lock: enable end-timestamp accounting for cgroup aggregation
  2026-06-04 13:37 ` Arnaldo Carvalho de Melo
@ 2026-06-04 16:31   ` Namhyung Kim
  2026-06-04 20:29     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 5+ messages in thread
From: Namhyung Kim @ 2026-06-04 16:31 UTC
  To: Arnaldo Carvalho de Melo
  Cc: Suchit Karunakaran, Peter Zijlstra, Ingo Molnar, mark.rutland,
	alexander.shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	james.clark, tycho, linux-perf-users, linux-kernel, bpf

On Thu, Jun 04, 2026 at 10:37:05AM -0300, Arnaldo Carvalho de Melo wrote:
> On Sun, May 31, 2026 at 01:29:40AM +0530, Suchit Karunakaran wrote:
> > update_lock_stat() handles lock contentions that start but never reach a
> > contention_end event (e.g., locks still held when profiling stops), but
> > previously treated LOCK_AGGR_CGROUP as a no-op due to missing cgroup
> > context in userspace.
> > 
> > Fix this by adding a cgroup_id field to struct tstamp_data, recording it
> > at contention_begin using get_current_cgroup_id() when aggr_mode is
> > LOCK_AGGR_CGROUP. Capturing it at contention_begin is semantically
> > correct: the contention cost is incurred by the task that had to wait,
> > not by whatever task happens to be running at contention_end. It is also
> > preferable from a performance standpoint, as contention_end runs just
> > before the task enters the critical section.
> > 
> > Update contention_end to use pelem->cgroup_id instead of calling
> > get_current_cgroup_id() dynamically, ensuring both complete and
> > incomplete contention events attribute the wait time to the cgroup at
> > wait-start time consistently.
> 
> Namhyung, can you provide an Acked-by or Reviewed-by?

Reviewed-by: Namhyung Kim <namhyung@kernel.org>

Thanks,
Namhyung



* Re: [PATCH v2 RESEND] perf/lock: enable end-timestamp accounting for cgroup aggregation
  2026-06-04 16:31   ` Namhyung Kim
@ 2026-06-04 20:29     ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 5+ messages in thread
From: Arnaldo Carvalho de Melo @ 2026-06-04 20:29 UTC
  To: Namhyung Kim
  Cc: Suchit Karunakaran, Peter Zijlstra, Ingo Molnar, mark.rutland,
	alexander.shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	james.clark, tycho, linux-perf-users, linux-kernel, bpf

On Thu, Jun 04, 2026 at 09:31:15AM -0700, Namhyung Kim wrote:
> On Thu, Jun 04, 2026 at 10:37:05AM -0300, Arnaldo Carvalho de Melo wrote:
> > On Sun, May 31, 2026 at 01:29:40AM +0530, Suchit Karunakaran wrote:
> > > update_lock_stat() handles lock contentions that start but never reach a
> > > contention_end event (e.g., locks still held when profiling stops), but
> > > previously treated LOCK_AGGR_CGROUP as a no-op due to missing cgroup
> > > context in userspace.
> > > 
> > > Fix this by adding a cgroup_id field to struct tstamp_data, recording it
> > > at contention_begin using get_current_cgroup_id() when aggr_mode is
> > > LOCK_AGGR_CGROUP. Capturing it at contention_begin is semantically
> > > correct: the contention cost is incurred by the task that had to wait,
> > > not by whatever task happens to be running at contention_end. It is also
> > > preferable from a performance standpoint, as contention_end runs just
> > > before the task enters the critical section.
> > > 
> > > Update contention_end to use pelem->cgroup_id instead of calling
> > > get_current_cgroup_id() dynamically, ensuring both complete and
> > > incomplete contention events attribute the wait time to the cgroup at
> > > wait-start time consistently.
> > 
> > Namhyung, can you provide an Acked-by or Reviewed-by?
> 
> Reviewed-by: Namhyung Kim <namhyung@kernel.org>

Thanks, applied to perf-tools-next, for v7.2.

- Arnaldo

