Date: Wed, 22 Apr 2026 18:11:59 -0700
From: Namhyung Kim
To: Suchit Karunakaran
Cc: sashiko@lists.linux.dev, bpf@vger.kernel.org, linux-perf-users@vger.kernel.org
Subject: Re: [PATCH] perf/lock: enable end-timestamp accounting for cgroup aggregation
References: <20260420184656.331306-1-suchitkarunakaran@gmail.com>
 <20260421004512.CDE13C19425@smtp.kernel.org>

On Tue, Apr 21, 2026 at 01:59:00PM +0530, Suchit Karunakaran wrote:
> On Tue, 21 Apr 2026 at 06:15, wrote:
> >
> > Sashiko AI review found 2 potential issue(s):
> >
> > - [High] Incomplete spinlock and rwlock contentions are silently
> >   dropped for LOCK_AGGR_CGROUP because account_end_timestamp() skips
> >   the per-CPU tstamp map.
> > - [Medium] Inconsistent cgroup attribution between complete and
> >   incomplete lock contentions when a task migrates cgroups.
> > --
> >
> > commit 7ba25dc647534f97eb1ae9f34c6c8d38e7fef074
> > Author: Suchit Karunakaran
> >
> > perf/lock: enable end-timestamp accounting for cgroup aggregation
> >
> > This commit adds a cgroup_id field to struct tstamp_data to properly
> > account for incomplete lock contention events when aggregating by
> > cgroup. It records the cgroup ID at contention_begin and uses it to
> > build the aggregation key in userspace.
> >
> > > diff --git a/tools/perf/util/bpf_lock_contention.c b/tools/perf/util/bpf_lock_contention.c
> > > index cbd7435579fea..1a5bd2ff8ee4c 100644
> > > --- a/tools/perf/util/bpf_lock_contention.c
> > > +++ b/tools/perf/util/bpf_lock_contention.c
> > > @@ -463,8 +463,8 @@ static void update_lock_stat(int map_fd, int pid, u64 end_ts,
> > >  		stat_key.lock_addr_or_cgroup = ts_data->lock;
> > >  		break;
> > >  	case LOCK_AGGR_CGROUP:
> > > -		/* TODO */
> > > -		return;
> > > +		stat_key.lock_addr_or_cgroup = ts_data->cgroup_id;
> > > +		break;
> >
> > Does this fully enable end-timestamp accounting for cgroup aggregation
> > on per-CPU locks like spinlocks and rwlocks?
> >
> > Looking at account_end_timestamp() in tools/perf/util/bpf_lock_contention.c,
> > there is an early return that skips the per-CPU tstamp map:
> >
> > static void account_end_timestamp(struct lock_contention *con)
> > {
> > 	...
> > 	/* Now it'll check per-cpu tstamp map which doesn't have TID. */
> > 	if (aggr_mode == LOCK_AGGR_TASK || aggr_mode == LOCK_AGGR_CGROUP)
> > 		return;
> >
> > 	total_cpus = cpu__max_cpu().cpu;
> > 	ts_fd = bpf_map__fd(skel->maps.tstamp_cpu);
> > 	...
> > }
> >
> > Since account_end_timestamp() skips iterating over skel->maps.tstamp_cpu
> > when aggr_mode is LOCK_AGGR_CGROUP, are all incomplete contentions for
> > spinlocks and rwlocks still silently dropped?
>
> As per my understanding, the tstamp_cpu map stores per-cpu entries
> without a TID, so there is no current task context available to derive
> a cgroup ID. Unlike the per-task tstamp map, where cgroup_id is
> captured at contention_begin time, per-cpu entries represent
> spinlock/rwlock contentions that are inherently CPU-bound and not
> attributed to a single task at contention_end time either. The
> original LOCK_AGGR_TASK mode has the same early return for the same
> reason: cgroup attribution without a task context is not meaningful.
> This patch addresses only the per-task straggler case.
Right, it'll miss spinlocks and rwlocks, but I think it's ok.

>
> > [ ... ]
> >
> > > diff --git a/tools/perf/util/bpf_skel/lock_contention.bpf.c b/tools/perf/util/bpf_skel/lock_contention.bpf.c
> > > index 96e7d853b9edf..d0e2cad02fa36 100644
> > > --- a/tools/perf/util/bpf_skel/lock_contention.bpf.c
> > > +++ b/tools/perf/util/bpf_skel/lock_contention.bpf.c
> > > @@ -536,6 +536,8 @@ int contention_begin(u64 *ctx)
> > >  	pelem->timestamp = bpf_ktime_get_ns();
> > >  	pelem->lock = (__u64)ctx[0];
> > >  	pelem->flags = (__u32)ctx[1];
> > > +	if (aggr_mode == LOCK_AGGR_CGROUP)
> > > +		pelem->cgroup_id = get_current_cgroup_id();
> >
> > If a task migrates to a different cgroup while blocked on a lock, could this
> > lead to inconsistent attribution between complete and incomplete contentions?
> >
> > The cgroup ID is statically recorded here at contention_begin() and used by
> > userspace for incomplete events.
> >
> > However, contention_end() continues to dynamically resolve the cgroup ID:
> >
> > SEC("tp_btf/contention_end")
> > int contention_end(u64 *ctx)
> > {
> > 	...
> > 	case LOCK_AGGR_CGROUP:
> > 		key.lock_addr_or_cgroup = get_current_cgroup_id();
> > 		break;
> > 	...
> > }
> >
> > Would it be more consistent for contention_end() to also use the statically
> > recorded pelem->cgroup_id, ensuring that both completed and incomplete events
> > attribute the contention time to the exact same cgroup?
>
> I think the fix should go in the opposite direction. Instead of
> calling get_current_cgroup_id() at contention_end, I think we should
> be using pelem->cgroup_id. Semantically, the cost of lock contention
> is incurred by the task that had to wait for the lock. So it makes
> more sense to attribute that cost based on the cgroup at the time the
> wait started (contention_begin), rather than whatever happens to be
> current at the end.
> I'm not entirely certain about this, so I'd really appreciate any
> input from maintainers and reviewers on whether this attribution makes
> sense.

Agreed, as I replied earlier we should use pelem->cgroup_id. The
tstamp_data is saved per-thread, so it doesn't matter which thread is
calling this.

IIRC I put it at _end() because it may reduce the overhead when we
have some filters. But it seems it's already after checking the
filters, so it should be ok to do it in _begin(). Actually _begin()
may be the better place, since the thread is about to go to wait for
the lock and is doing nothing critical, while at _end() it'll start
the critical section soon.

Thanks,
Namhyung