[PATCH v2 RESEND] perf/lock: enable end-timestamp accounting for cgroup aggregation

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v2 RESEND] perf/lock: enable end-timestamp accounting for cgroup aggregation
@ 2026-05-30 19:59 Suchit Karunakaran
  2026-05-30 20:41 ` sashiko-bot
  2026-06-04 13:37 ` Arnaldo Carvalho de Melo
  0 siblings, 2 replies; 5+ messages in thread
From: Suchit Karunakaran @ 2026-05-30 19:59 UTC (permalink / raw)
  To: peterz, mingo, acme, namhyung
  Cc: mark.rutland, alexander.shishkin, jolsa, irogers, adrian.hunter,
	james.clark, tycho, suchitkarunakaran, linux-perf-users,
	linux-kernel, bpf

update_lock_stat() handles lock contentions that start but never reach a
contention_end event (e.g., locks still held when profiling stops), but
previously treated LOCK_AGGR_CGROUP as a no-op due to missing cgroup
context in userspace.

Fix this by adding a cgroup_id field to struct tstamp_data, recording it
at contention_begin using get_current_cgroup_id() when aggr_mode is
LOCK_AGGR_CGROUP. Capturing it at contention_begin is semantically
correct, the contention cost is incurred by the task that had to wait,
not by whatever task happens to be running at contention_end. It is also
preferable from a performance standpoint, as contention_end runs just
before the task enters the critical section.

Update contention_end to use pelem->cgroup_id instead of calling
get_current_cgroup_id() dynamically, ensuring both complete and
incomplete contention events attribute the wait time to the cgroup at
wait-start time consistently.

Signed-off-by: Suchit Karunakaran <suchitkarunakaran@gmail.com>
---
 tools/perf/util/bpf_lock_contention.c          | 4 ++--
 tools/perf/util/bpf_skel/lock_contention.bpf.c | 4 +++-
 tools/perf/util/bpf_skel/lock_data.h           | 1 +
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/bpf_lock_contention.c b/tools/perf/util/bpf_lock_contention.c
index cbd7435579fe..1a5bd2ff8ee4 100644
--- a/tools/perf/util/bpf_lock_contention.c
+++ b/tools/perf/util/bpf_lock_contention.c
@@ -463,8 +463,8 @@ static void update_lock_stat(int map_fd, int pid, u64 end_ts,
 		stat_key.lock_addr_or_cgroup = ts_data->lock;
 		break;
 	case LOCK_AGGR_CGROUP:
-		/* TODO */
-		return;
+		stat_key.lock_addr_or_cgroup = ts_data->cgroup_id;
+		break;
 	default:
 		return;
 	}
diff --git a/tools/perf/util/bpf_skel/lock_contention.bpf.c b/tools/perf/util/bpf_skel/lock_contention.bpf.c
index 96e7d853b9ed..139199811020 100644
--- a/tools/perf/util/bpf_skel/lock_contention.bpf.c
+++ b/tools/perf/util/bpf_skel/lock_contention.bpf.c
@@ -536,6 +536,8 @@ int contention_begin(u64 *ctx)
 	pelem->timestamp = bpf_ktime_get_ns();
 	pelem->lock = (__u64)ctx[0];
 	pelem->flags = (__u32)ctx[1];
+	if (aggr_mode == LOCK_AGGR_CGROUP)
+		pelem->cgroup_id = get_current_cgroup_id();
 
 	if (needs_callstack) {
 		u32 i = 0;
@@ -771,7 +773,7 @@ int contention_end(u64 *ctx)
 			key.stack_id = pelem->stack_id;
 		break;
 	case LOCK_AGGR_CGROUP:
-		key.lock_addr_or_cgroup = get_current_cgroup_id();
+		key.lock_addr_or_cgroup = pelem->cgroup_id;
 		break;
 	default:
 		/* should not happen */
diff --git a/tools/perf/util/bpf_skel/lock_data.h b/tools/perf/util/bpf_skel/lock_data.h
index 28c5e5aced7f..652e114e6b87 100644
--- a/tools/perf/util/bpf_skel/lock_data.h
+++ b/tools/perf/util/bpf_skel/lock_data.h
@@ -13,6 +13,7 @@ struct owner_tracing_data {
 struct tstamp_data {
 	u64 timestamp;
 	u64 lock;
+	u64 cgroup_id;
 	u32 flags;
 	s32 stack_id;
 };
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v2 RESEND] perf/lock: enable end-timestamp accounting for cgroup aggregation
  2026-05-30 19:59 [PATCH v2 RESEND] perf/lock: enable end-timestamp accounting for cgroup aggregation Suchit Karunakaran
@ 2026-05-30 20:41 ` sashiko-bot
  2026-06-04 13:37 ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 5+ messages in thread
From: sashiko-bot @ 2026-05-30 20:41 UTC (permalink / raw)
  To: Suchit Karunakaran; +Cc: linux-perf-users, bpf

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] Incomplete lock contentions for per-CPU locks (spinlocks/rwlocks) are silently dropped when using cgroup aggregation due to an unmodified early return in `account_end_timestamp()`.
--

commit 9be2409410efd2d4d44f23abbecf590bbbc25d34
Author: Suchit Karunakaran <suchitkarunakaran@gmail.com>

perf/lock: enable end-timestamp accounting for cgroup aggregation

This commit enables end-timestamp accounting for cgroup aggregation by
tracking the cgroup_id in the BPF program during contention_begin. It
updates update_lock_stat() in userspace and contention_end() in BPF to
use this recorded cgroup context instead of fetching it dynamically.

> diff --git a/tools/perf/util/bpf_lock_contention.c b/tools/perf/util/bpf_lock_contention.c
> index cbd7435579fea..1a5bd2ff8ee4c 100644
> --- a/tools/perf/util/bpf_lock_contention.c
> +++ b/tools/perf/util/bpf_lock_contention.c
> @@ -463,8 +463,8 @@ static void update_lock_stat(int map_fd, int pid, u64 end_ts,
>  		stat_key.lock_addr_or_cgroup = ts_data->lock;
>  		break;
>  	case LOCK_AGGR_CGROUP:
> -		/* TODO */
> -		return;
> +		stat_key.lock_addr_or_cgroup = ts_data->cgroup_id;
> +		break;
>  	default:
>  		return;
>  	}

[Severity: Medium]
Since this patch enables end-timestamp accounting for cgroup aggregation,
should the early return for LOCK_AGGR_CGROUP in account_end_timestamp()
also be removed?

tools/perf/util/bpf_lock_contention.c:account_end_timestamp() {
        ...
        /* Now it'll check per-cpu tstamp map which doesn't have TID. */
        if (aggr_mode == LOCK_AGGR_TASK || aggr_mode == LOCK_AGGR_CGROUP)
                return;
        ...
}

Without removing this check, incomplete lock contentions for per-CPU
locks (like spinlocks and rwlocks) appear to be silently skipped when
using cgroup aggregation.

[ ... ]

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260530195940.78700-1-suchitkarunakaran@gmail.com?part=1

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2 RESEND] perf/lock: enable end-timestamp accounting for cgroup aggregation
  2026-05-30 19:59 [PATCH v2 RESEND] perf/lock: enable end-timestamp accounting for cgroup aggregation Suchit Karunakaran
  2026-05-30 20:41 ` sashiko-bot
@ 2026-06-04 13:37 ` Arnaldo Carvalho de Melo
  2026-06-04 16:31   ` Namhyung Kim
  1 sibling, 1 reply; 5+ messages in thread
From: Arnaldo Carvalho de Melo @ 2026-06-04 13:37 UTC (permalink / raw)
  To: Namhyung Kim, Suchit Karunakaran
  Cc: Peter Zijlstra, Ingo Molnar, mark.rutland, alexander.shishkin,
	Jiri Olsa, Ian Rogers, Adrian Hunter, james.clark, tycho,
	linux-perf-users, linux-kernel, bpf

On Sun, May 31, 2026 at 01:29:40AM +0530, Suchit Karunakaran wrote:
> update_lock_stat() handles lock contentions that start but never reach a
> contention_end event (e.g., locks still held when profiling stops), but
> previously treated LOCK_AGGR_CGROUP as a no-op due to missing cgroup
> context in userspace.
> 
> Fix this by adding a cgroup_id field to struct tstamp_data, recording it
> at contention_begin using get_current_cgroup_id() when aggr_mode is
> LOCK_AGGR_CGROUP. Capturing it at contention_begin is semantically
> correct, the contention cost is incurred by the task that had to wait,
> not by whatever task happens to be running at contention_end. It is also
> preferable from a performance standpoint, as contention_end runs just
> before the task enters the critical section.
> 
> Update contention_end to use pelem->cgroup_id instead of calling
> get_current_cgroup_id() dynamically, ensuring both complete and
> incomplete contention events attribute the wait time to the cgroup at
> wait-start time consistently.

Namhyung, can you provide an Acked-by or Reviewed-by?

Thanks,

- Arnaldo
 
> Signed-off-by: Suchit Karunakaran <suchitkarunakaran@gmail.com>
> ---
>  tools/perf/util/bpf_lock_contention.c          | 4 ++--
>  tools/perf/util/bpf_skel/lock_contention.bpf.c | 4 +++-
>  tools/perf/util/bpf_skel/lock_data.h           | 1 +
>  3 files changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/perf/util/bpf_lock_contention.c b/tools/perf/util/bpf_lock_contention.c
> index cbd7435579fe..1a5bd2ff8ee4 100644
> --- a/tools/perf/util/bpf_lock_contention.c
> +++ b/tools/perf/util/bpf_lock_contention.c
> @@ -463,8 +463,8 @@ static void update_lock_stat(int map_fd, int pid, u64 end_ts,
>  		stat_key.lock_addr_or_cgroup = ts_data->lock;
>  		break;
>  	case LOCK_AGGR_CGROUP:
> -		/* TODO */
> -		return;
> +		stat_key.lock_addr_or_cgroup = ts_data->cgroup_id;
> +		break;
>  	default:
>  		return;
>  	}
> diff --git a/tools/perf/util/bpf_skel/lock_contention.bpf.c b/tools/perf/util/bpf_skel/lock_contention.bpf.c
> index 96e7d853b9ed..139199811020 100644
> --- a/tools/perf/util/bpf_skel/lock_contention.bpf.c
> +++ b/tools/perf/util/bpf_skel/lock_contention.bpf.c
> @@ -536,6 +536,8 @@ int contention_begin(u64 *ctx)
>  	pelem->timestamp = bpf_ktime_get_ns();
>  	pelem->lock = (__u64)ctx[0];
>  	pelem->flags = (__u32)ctx[1];
> +	if (aggr_mode == LOCK_AGGR_CGROUP)
> +		pelem->cgroup_id = get_current_cgroup_id();
>  
>  	if (needs_callstack) {
>  		u32 i = 0;
> @@ -771,7 +773,7 @@ int contention_end(u64 *ctx)
>  			key.stack_id = pelem->stack_id;
>  		break;
>  	case LOCK_AGGR_CGROUP:
> -		key.lock_addr_or_cgroup = get_current_cgroup_id();
> +		key.lock_addr_or_cgroup = pelem->cgroup_id;
>  		break;
>  	default:
>  		/* should not happen */
> diff --git a/tools/perf/util/bpf_skel/lock_data.h b/tools/perf/util/bpf_skel/lock_data.h
> index 28c5e5aced7f..652e114e6b87 100644
> --- a/tools/perf/util/bpf_skel/lock_data.h
> +++ b/tools/perf/util/bpf_skel/lock_data.h
> @@ -13,6 +13,7 @@ struct owner_tracing_data {
>  struct tstamp_data {
>  	u64 timestamp;
>  	u64 lock;
> +	u64 cgroup_id;
>  	u32 flags;
>  	s32 stack_id;
>  };
> -- 
> 2.54.0

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2 RESEND] perf/lock: enable end-timestamp accounting for cgroup aggregation
  2026-06-04 13:37 ` Arnaldo Carvalho de Melo
@ 2026-06-04 16:31   ` Namhyung Kim
  2026-06-04 20:29     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 5+ messages in thread
From: Namhyung Kim @ 2026-06-04 16:31 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Suchit Karunakaran, Peter Zijlstra, Ingo Molnar, mark.rutland,
	alexander.shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	james.clark, tycho, linux-perf-users, linux-kernel, bpf

On Thu, Jun 04, 2026 at 10:37:05AM -0300, Arnaldo Carvalho de Melo wrote:
> On Sun, May 31, 2026 at 01:29:40AM +0530, Suchit Karunakaran wrote:
> > update_lock_stat() handles lock contentions that start but never reach a
> > contention_end event (e.g., locks still held when profiling stops), but
> > previously treated LOCK_AGGR_CGROUP as a no-op due to missing cgroup
> > context in userspace.
> > 
> > Fix this by adding a cgroup_id field to struct tstamp_data, recording it
> > at contention_begin using get_current_cgroup_id() when aggr_mode is
> > LOCK_AGGR_CGROUP. Capturing it at contention_begin is semantically
> > correct, the contention cost is incurred by the task that had to wait,
> > not by whatever task happens to be running at contention_end. It is also
> > preferable from a performance standpoint, as contention_end runs just
> > before the task enters the critical section.
> > 
> > Update contention_end to use pelem->cgroup_id instead of calling
> > get_current_cgroup_id() dynamically, ensuring both complete and
> > incomplete contention events attribute the wait time to the cgroup at
> > wait-start time consistently.
> 
> Namhyung, can you provide an Acked-by or Reviewed-by?

Reviewed-by: Namhyung Kim <namhyung@kernel.org>

Thanks,
Namhyung


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2 RESEND] perf/lock: enable end-timestamp accounting for cgroup aggregation
  2026-06-04 16:31   ` Namhyung Kim
@ 2026-06-04 20:29     ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 5+ messages in thread
From: Arnaldo Carvalho de Melo @ 2026-06-04 20:29 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Suchit Karunakaran, Peter Zijlstra, Ingo Molnar, mark.rutland,
	alexander.shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	james.clark, tycho, linux-perf-users, linux-kernel, bpf

On Thu, Jun 04, 2026 at 09:31:15AM -0700, Namhyung Kim wrote:
> On Thu, Jun 04, 2026 at 10:37:05AM -0300, Arnaldo Carvalho de Melo wrote:
> > On Sun, May 31, 2026 at 01:29:40AM +0530, Suchit Karunakaran wrote:
> > > update_lock_stat() handles lock contentions that start but never reach a
> > > contention_end event (e.g., locks still held when profiling stops), but
> > > previously treated LOCK_AGGR_CGROUP as a no-op due to missing cgroup
> > > context in userspace.
> > > 
> > > Fix this by adding a cgroup_id field to struct tstamp_data, recording it
> > > at contention_begin using get_current_cgroup_id() when aggr_mode is
> > > LOCK_AGGR_CGROUP. Capturing it at contention_begin is semantically
> > > correct, the contention cost is incurred by the task that had to wait,
> > > not by whatever task happens to be running at contention_end. It is also
> > > preferable from a performance standpoint, as contention_end runs just
> > > before the task enters the critical section.
> > > 
> > > Update contention_end to use pelem->cgroup_id instead of calling
> > > get_current_cgroup_id() dynamically, ensuring both complete and
> > > incomplete contention events attribute the wait time to the cgroup at
> > > wait-start time consistently.
> > 
> > Namhyung, can you provide an Acked-by or Reviewed-by?
> 
> Reviewed-by: Namhyung Kim <namhyung@kernel.org>

Thanks, applied to perf-tools-next, for v7.2.

- Arnaldo

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-06-04 20:29 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-30 19:59 [PATCH v2 RESEND] perf/lock: enable end-timestamp accounting for cgroup aggregation Suchit Karunakaran
2026-05-30 20:41 ` sashiko-bot
2026-06-04 13:37 ` Arnaldo Carvalho de Melo
2026-06-04 16:31   ` Namhyung Kim
2026-06-04 20:29     ` Arnaldo Carvalho de Melo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.