public inbox for linux-mm@kvack.org
* [PATCH v2 0/2] locking: contended_release tracepoint instrumentation
@ 2026-03-10 17:49 Dmitry Ilvokhin
  2026-03-10 17:49 ` [PATCH v2 1/2] locking/percpu-rwsem: Extract __percpu_up_read() Dmitry Ilvokhin
  2026-03-10 17:49 ` [PATCH v2 2/2] locking: Add contended_release tracepoint Dmitry Ilvokhin
  0 siblings, 2 replies; 6+ messages in thread
From: Dmitry Ilvokhin @ 2026-03-10 17:49 UTC (permalink / raw)
  To: Dennis Zhou, Tejun Heo, Christoph Lameter, Steven Rostedt,
	Masami Hiramatsu, Mathieu Desnoyers, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Boqun Feng, Waiman Long
  Cc: linux-mm, linux-kernel, linux-trace-kernel, kernel-team,
	Dmitry Ilvokhin

The existing contention_begin/contention_end tracepoints fire on the
waiter side. The lock holder's identity and stack can be captured at
contention_begin time (e.g. perf lock contention --lock-owner), but
this reflects the holder's state when a waiter arrives, not when the
lock is actually released.

This series adds a contended_release tracepoint that fires on the
holder side when a lock with waiters is released. This provides:

- Hold time estimation: when the holder's own acquisition was
  contended, its contention_end (acquisition) and contended_release
  can be correlated to measure how long the lock was held under
  contention.

- The holder's stack at release time, which may differ from what perf lock
  contention --lock-owner captures if the holder does significant work between
  the waiter's arrival and the unlock.

The tracepoint is placed exclusively in slowpath unlock paths, so
there is no performance impact on the uncontended fast path, and the
impact on binary size is expected to be minimal.
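
For example, hold time under contention can be estimated by pairing a
task's contended acquisition (contention_end with ret == 0) with its
later contended_release on the same lock address. A rough bpftrace
sketch, assuming the event is exposed as lock:contended_release (map
names are illustrative):

  bpftrace -e '
  tracepoint:lock:contention_end /args->ret == 0/ {
	@acquired[args->lock_addr, tid] = nsecs;
  }
  tracepoint:lock:contended_release /@acquired[args->lock_addr, tid]/ {
	@hold_ns = hist(nsecs - @acquired[args->lock_addr, tid]);
	delete(@acquired[args->lock_addr, tid]);
  }'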

Based on the feedback on the RFC, here is v2 with the suggested fixes.
Since there were no objections to the approach, I am dropping the RFC
tag.

RFC -> v2:

- Add trace_contended_release_enabled() guard before waiter checks that
  exist only for the tracepoint (Steven Rostedt).
- Rename __percpu_up_read_slowpath() to __percpu_up_read() (Peter
  Zijlstra).
- Add extern for __percpu_up_read() (Peter Zijlstra).
- Squash tracepoint introduction and usage commits (Masami Hiramatsu).

RFC: https://lore.kernel.org/all/cover.1772642407.git.d@ilvokhin.com/

Dmitry Ilvokhin (2):
  locking/percpu-rwsem: Extract __percpu_up_read()
  locking: Add contended_release tracepoint

 include/linux/percpu-rwsem.h  | 15 +++------------
 include/trace/events/lock.h   | 17 +++++++++++++++++
 kernel/locking/mutex.c        |  1 +
 kernel/locking/percpu-rwsem.c | 21 +++++++++++++++++++++
 kernel/locking/rtmutex.c      |  1 +
 kernel/locking/rwbase_rt.c    |  8 +++++++-
 kernel/locking/rwsem.c        |  9 +++++++--
 kernel/locking/semaphore.c    |  4 +++-
 8 files changed, 60 insertions(+), 16 deletions(-)

-- 
2.52.0


* [PATCH v2 1/2] locking/percpu-rwsem: Extract __percpu_up_read()
  2026-03-10 17:49 [PATCH v2 0/2] locking: contended_release tracepoint instrumentation Dmitry Ilvokhin
@ 2026-03-10 17:49 ` Dmitry Ilvokhin
  2026-03-12 11:39   ` Usama Arif
  2026-03-10 17:49 ` [PATCH v2 2/2] locking: Add contended_release tracepoint Dmitry Ilvokhin
  1 sibling, 1 reply; 6+ messages in thread
From: Dmitry Ilvokhin @ 2026-03-10 17:49 UTC (permalink / raw)
  To: Dennis Zhou, Tejun Heo, Christoph Lameter, Steven Rostedt,
	Masami Hiramatsu, Mathieu Desnoyers, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Boqun Feng, Waiman Long
  Cc: linux-mm, linux-kernel, linux-trace-kernel, kernel-team,
	Dmitry Ilvokhin

Move the percpu_up_read() slowpath out of the inline function into a
new __percpu_up_read() to avoid a binary size increase from adding a
tracepoint to an inlined function.

Signed-off-by: Dmitry Ilvokhin <d@ilvokhin.com>
---
 include/linux/percpu-rwsem.h  | 15 +++------------
 kernel/locking/percpu-rwsem.c | 18 ++++++++++++++++++
 2 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/include/linux/percpu-rwsem.h b/include/linux/percpu-rwsem.h
index c8cb010d655e..39d5bf8e6562 100644
--- a/include/linux/percpu-rwsem.h
+++ b/include/linux/percpu-rwsem.h
@@ -107,6 +107,8 @@ static inline bool percpu_down_read_trylock(struct percpu_rw_semaphore *sem)
 	return ret;
 }
 
+extern void __percpu_up_read(struct percpu_rw_semaphore *sem);
+
 static inline void percpu_up_read(struct percpu_rw_semaphore *sem)
 {
 	rwsem_release(&sem->dep_map, _RET_IP_);
@@ -118,18 +120,7 @@ static inline void percpu_up_read(struct percpu_rw_semaphore *sem)
 	if (likely(rcu_sync_is_idle(&sem->rss))) {
 		this_cpu_dec(*sem->read_count);
 	} else {
-		/*
-		 * slowpath; reader will only ever wake a single blocked
-		 * writer.
-		 */
-		smp_mb(); /* B matches C */
-		/*
-		 * In other words, if they see our decrement (presumably to
-		 * aggregate zero, as that is the only time it matters) they
-		 * will also see our critical section.
-		 */
-		this_cpu_dec(*sem->read_count);
-		rcuwait_wake_up(&sem->writer);
+		__percpu_up_read(sem);
 	}
 	preempt_enable();
 }
diff --git a/kernel/locking/percpu-rwsem.c b/kernel/locking/percpu-rwsem.c
index ef234469baac..f3ee7a0d6047 100644
--- a/kernel/locking/percpu-rwsem.c
+++ b/kernel/locking/percpu-rwsem.c
@@ -288,3 +288,21 @@ void percpu_up_write(struct percpu_rw_semaphore *sem)
 	rcu_sync_exit(&sem->rss);
 }
 EXPORT_SYMBOL_GPL(percpu_up_write);
+
+void __percpu_up_read(struct percpu_rw_semaphore *sem)
+{
+	lockdep_assert_preemption_disabled();
+	/*
+	 * slowpath; reader will only ever wake a single blocked
+	 * writer.
+	 */
+	smp_mb(); /* B matches C */
+	/*
+	 * In other words, if they see our decrement (presumably to
+	 * aggregate zero, as that is the only time it matters) they
+	 * will also see our critical section.
+	 */
+	this_cpu_dec(*sem->read_count);
+	rcuwait_wake_up(&sem->writer);
+}
+EXPORT_SYMBOL_GPL(__percpu_up_read);
-- 
2.52.0


* [PATCH v2 2/2] locking: Add contended_release tracepoint
  2026-03-10 17:49 [PATCH v2 0/2] locking: contended_release tracepoint instrumentation Dmitry Ilvokhin
  2026-03-10 17:49 ` [PATCH v2 1/2] locking/percpu-rwsem: Extract __percpu_up_read() Dmitry Ilvokhin
@ 2026-03-10 17:49 ` Dmitry Ilvokhin
  2026-03-12 11:38   ` Usama Arif
  1 sibling, 1 reply; 6+ messages in thread
From: Dmitry Ilvokhin @ 2026-03-10 17:49 UTC (permalink / raw)
  To: Dennis Zhou, Tejun Heo, Christoph Lameter, Steven Rostedt,
	Masami Hiramatsu, Mathieu Desnoyers, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Boqun Feng, Waiman Long
  Cc: linux-mm, linux-kernel, linux-trace-kernel, kernel-team,
	Dmitry Ilvokhin

Add the contended_release trace event. This tracepoint fires on the
holder side when a contended lock is released, complementing the
existing contention_begin/contention_end tracepoints which fire on the
waiter side.

This enables correlating lock hold time under contention with waiter
events by lock address.

Add trace_contended_release() calls to the slowpath unlock paths of
sleepable locks: mutex, rtmutex, semaphore, rwsem, percpu-rwsem, and
RT-specific rwbase locks. Each call site fires only when there are
blocked waiters being woken, except percpu_up_write() which always wakes
via __wake_up().

Signed-off-by: Dmitry Ilvokhin <d@ilvokhin.com>
---
 include/trace/events/lock.h   | 17 +++++++++++++++++
 kernel/locking/mutex.c        |  1 +
 kernel/locking/percpu-rwsem.c |  3 +++
 kernel/locking/rtmutex.c      |  1 +
 kernel/locking/rwbase_rt.c    |  8 +++++++-
 kernel/locking/rwsem.c        |  9 +++++++--
 kernel/locking/semaphore.c    |  4 +++-
 7 files changed, 39 insertions(+), 4 deletions(-)

diff --git a/include/trace/events/lock.h b/include/trace/events/lock.h
index 8e89baa3775f..4f28e41977ec 100644
--- a/include/trace/events/lock.h
+++ b/include/trace/events/lock.h
@@ -138,6 +138,23 @@ TRACE_EVENT(contention_end,
 	TP_printk("%p (ret=%d)", __entry->lock_addr, __entry->ret)
 );
 
+TRACE_EVENT(contended_release,
+
+	TP_PROTO(void *lock),
+
+	TP_ARGS(lock),
+
+	TP_STRUCT__entry(
+		__field(void *, lock_addr)
+	),
+
+	TP_fast_assign(
+		__entry->lock_addr = lock;
+	),
+
+	TP_printk("%p", __entry->lock_addr)
+);
+
 #endif /* _TRACE_LOCK_H */
 
 /* This part must be outside protection */
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 427187ff02db..ff9d07f3e900 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -992,6 +992,7 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
 	if (waiter) {
 		next = waiter->task;
 
+		trace_contended_release(lock);
 		debug_mutex_wake_waiter(lock, waiter);
 		__clear_task_blocked_on(next, lock);
 		wake_q_add(&wake_q, next);
diff --git a/kernel/locking/percpu-rwsem.c b/kernel/locking/percpu-rwsem.c
index f3ee7a0d6047..1eee51766aaf 100644
--- a/kernel/locking/percpu-rwsem.c
+++ b/kernel/locking/percpu-rwsem.c
@@ -263,6 +263,8 @@ void percpu_up_write(struct percpu_rw_semaphore *sem)
 {
 	rwsem_release(&sem->dep_map, _RET_IP_);
 
+	trace_contended_release(sem);
+
 	/*
 	 * Signal the writer is done, no fast path yet.
 	 *
@@ -297,6 +299,7 @@ void __percpu_up_read(struct percpu_rw_semaphore *sem)
 	 * writer.
 	 */
 	smp_mb(); /* B matches C */
+	trace_contended_release(sem);
 	/*
 	 * In other words, if they see our decrement (presumably to
 	 * aggregate zero, as that is the only time it matters) they
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index ccaba6148b61..3db8a840b4e8 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1466,6 +1466,7 @@ static void __sched rt_mutex_slowunlock(struct rt_mutex_base *lock)
 		raw_spin_lock_irqsave(&lock->wait_lock, flags);
 	}
 
+	trace_contended_release(lock);
 	/*
 	 * The wakeup next waiter path does not suffer from the above
 	 * race. See the comments there.
diff --git a/kernel/locking/rwbase_rt.c b/kernel/locking/rwbase_rt.c
index 82e078c0665a..081778934b13 100644
--- a/kernel/locking/rwbase_rt.c
+++ b/kernel/locking/rwbase_rt.c
@@ -162,8 +162,10 @@ static void __sched __rwbase_read_unlock(struct rwbase_rt *rwb,
 	 * worst case which can happen is a spurious wakeup.
 	 */
 	owner = rt_mutex_owner(rtm);
-	if (owner)
+	if (owner) {
+		trace_contended_release(rwb);
 		rt_mutex_wake_q_add_task(&wqh, owner, state);
+	}
 
 	/* Pairs with the preempt_enable in rt_mutex_wake_up_q() */
 	preempt_disable();
@@ -205,6 +207,8 @@ static inline void rwbase_write_unlock(struct rwbase_rt *rwb)
 	unsigned long flags;
 
 	raw_spin_lock_irqsave(&rtm->wait_lock, flags);
+	if (trace_contended_release_enabled() && rt_mutex_has_waiters(rtm))
+		trace_contended_release(rwb);
 	__rwbase_write_unlock(rwb, WRITER_BIAS, flags);
 }
 
@@ -214,6 +218,8 @@ static inline void rwbase_write_downgrade(struct rwbase_rt *rwb)
 	unsigned long flags;
 
 	raw_spin_lock_irqsave(&rtm->wait_lock, flags);
+	if (trace_contended_release_enabled() && rt_mutex_has_waiters(rtm))
+		trace_contended_release(rwb);
 	/* Release it and account current as reader */
 	__rwbase_write_unlock(rwb, WRITER_BIAS - 1, flags);
 }
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index ba4cb74de064..cf7d8e75ad7b 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -1390,6 +1390,7 @@ static inline void __up_read(struct rw_semaphore *sem)
 	if (unlikely((tmp & (RWSEM_LOCK_MASK|RWSEM_FLAG_WAITERS)) ==
 		      RWSEM_FLAG_WAITERS)) {
 		clear_nonspinnable(sem);
+		trace_contended_release(sem);
 		rwsem_wake(sem);
 	}
 	preempt_enable();
@@ -1413,8 +1414,10 @@ static inline void __up_write(struct rw_semaphore *sem)
 	preempt_disable();
 	rwsem_clear_owner(sem);
 	tmp = atomic_long_fetch_add_release(-RWSEM_WRITER_LOCKED, &sem->count);
-	if (unlikely(tmp & RWSEM_FLAG_WAITERS))
+	if (unlikely(tmp & RWSEM_FLAG_WAITERS)) {
+		trace_contended_release(sem);
 		rwsem_wake(sem);
+	}
 	preempt_enable();
 }
 
@@ -1437,8 +1440,10 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
 	tmp = atomic_long_fetch_add_release(
 		-RWSEM_WRITER_LOCKED+RWSEM_READER_BIAS, &sem->count);
 	rwsem_set_reader_owned(sem);
-	if (tmp & RWSEM_FLAG_WAITERS)
+	if (tmp & RWSEM_FLAG_WAITERS) {
+		trace_contended_release(sem);
 		rwsem_downgrade_wake(sem);
+	}
 	preempt_enable();
 }
 
diff --git a/kernel/locking/semaphore.c b/kernel/locking/semaphore.c
index 74d41433ba13..d46415095dd6 100644
--- a/kernel/locking/semaphore.c
+++ b/kernel/locking/semaphore.c
@@ -231,8 +231,10 @@ void __sched up(struct semaphore *sem)
 	else
 		__up(sem, &wake_q);
 	raw_spin_unlock_irqrestore(&sem->lock, flags);
-	if (!wake_q_empty(&wake_q))
+	if (!wake_q_empty(&wake_q)) {
+		trace_contended_release(sem);
 		wake_up_q(&wake_q);
+	}
 }
 EXPORT_SYMBOL(up);
 
-- 
2.52.0


* Re: [PATCH v2 2/2] locking: Add contended_release tracepoint
  2026-03-10 17:49 ` [PATCH v2 2/2] locking: Add contended_release tracepoint Dmitry Ilvokhin
@ 2026-03-12 11:38   ` Usama Arif
  2026-03-16 15:32     ` Dmitry Ilvokhin
  0 siblings, 1 reply; 6+ messages in thread
From: Usama Arif @ 2026-03-12 11:38 UTC (permalink / raw)
  To: Dmitry Ilvokhin
  Cc: Usama Arif, Dennis Zhou, Tejun Heo, Christoph Lameter,
	Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng, Waiman Long,
	linux-mm, linux-kernel, linux-trace-kernel, kernel-team

On Tue, 10 Mar 2026 17:49:39 +0000 Dmitry Ilvokhin <d@ilvokhin.com> wrote:

> Add the contended_release trace event. This tracepoint fires on the
> holder side when a contended lock is released, complementing the
> existing contention_begin/contention_end tracepoints which fire on the
> waiter side.
> 
> This enables correlating lock hold time under contention with waiter
> events by lock address.
> 
> Add trace_contended_release() calls to the slowpath unlock paths of
> sleepable locks: mutex, rtmutex, semaphore, rwsem, percpu-rwsem, and
> RT-specific rwbase locks. Each call site fires only when there are
> blocked waiters being woken, except percpu_up_write() which always wakes
> via __wake_up().
> 
> Signed-off-by: Dmitry Ilvokhin <d@ilvokhin.com>
> ---
>  include/trace/events/lock.h   | 17 +++++++++++++++++
>  kernel/locking/mutex.c        |  1 +
>  kernel/locking/percpu-rwsem.c |  3 +++
>  kernel/locking/rtmutex.c      |  1 +
>  kernel/locking/rwbase_rt.c    |  8 +++++++-
>  kernel/locking/rwsem.c        |  9 +++++++--
>  kernel/locking/semaphore.c    |  4 +++-
>  7 files changed, 39 insertions(+), 4 deletions(-)
> 
> diff --git a/include/trace/events/lock.h b/include/trace/events/lock.h
> index 8e89baa3775f..4f28e41977ec 100644
> --- a/include/trace/events/lock.h
> +++ b/include/trace/events/lock.h
> @@ -138,6 +138,23 @@ TRACE_EVENT(contention_end,
>  	TP_printk("%p (ret=%d)", __entry->lock_addr, __entry->ret)
>  );
>  
> +TRACE_EVENT(contended_release,
> +
> +	TP_PROTO(void *lock),
> +
> +	TP_ARGS(lock),
> +
> +	TP_STRUCT__entry(
> +		__field(void *, lock_addr)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->lock_addr = lock;
> +	),
> +
> +	TP_printk("%p", __entry->lock_addr)
> +);
> +
>  #endif /* _TRACE_LOCK_H */
>  
>  /* This part must be outside protection */
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index 427187ff02db..ff9d07f3e900 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -992,6 +992,7 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
>  	if (waiter) {
>  		next = waiter->task;
>  
> +		trace_contended_release(lock);
>  		debug_mutex_wake_waiter(lock, waiter);
>  		__clear_task_blocked_on(next, lock);
>  		wake_q_add(&wake_q, next);
> diff --git a/kernel/locking/percpu-rwsem.c b/kernel/locking/percpu-rwsem.c
> index f3ee7a0d6047..1eee51766aaf 100644
> --- a/kernel/locking/percpu-rwsem.c
> +++ b/kernel/locking/percpu-rwsem.c
> @@ -263,6 +263,8 @@ void percpu_up_write(struct percpu_rw_semaphore *sem)
>  {
>  	rwsem_release(&sem->dep_map, _RET_IP_);
>  
> +	trace_contended_release(sem);
> +

Hello!

I saw that you mentioned in the commit message that the tracepoint
fires only when there are blocked waiters, except for percpu_up_write().
Could we use waitqueue_active(&sem->waiters) to check for this here as
well, so that it is consistent with every other call site?


>  	/*
>  	 * Signal the writer is done, no fast path yet.
>  	 *
> @@ -297,6 +299,7 @@ void __percpu_up_read(struct percpu_rw_semaphore *sem)
>  	 * writer.
>  	 */
>  	smp_mb(); /* B matches C */
> +	trace_contended_release(sem);

Should we do this after this_cpu_dec(*sem->read_count)?

>  	/*
>  	 * In other words, if they see our decrement (presumably to
>  	 * aggregate zero, as that is the only time it matters) they
> diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
> index ccaba6148b61..3db8a840b4e8 100644
> --- a/kernel/locking/rtmutex.c
> +++ b/kernel/locking/rtmutex.c
> @@ -1466,6 +1466,7 @@ static void __sched rt_mutex_slowunlock(struct rt_mutex_base *lock)
>  		raw_spin_lock_irqsave(&lock->wait_lock, flags);
>  	}
>  
> +	trace_contended_release(lock);
>  	/*
>  	 * The wakeup next waiter path does not suffer from the above
>  	 * race. See the comments there.
> diff --git a/kernel/locking/rwbase_rt.c b/kernel/locking/rwbase_rt.c
> index 82e078c0665a..081778934b13 100644
> --- a/kernel/locking/rwbase_rt.c
> +++ b/kernel/locking/rwbase_rt.c
> @@ -162,8 +162,10 @@ static void __sched __rwbase_read_unlock(struct rwbase_rt *rwb,
>  	 * worst case which can happen is a spurious wakeup.
>  	 */
>  	owner = rt_mutex_owner(rtm);
> -	if (owner)
> +	if (owner) {
> +		trace_contended_release(rwb);
>  		rt_mutex_wake_q_add_task(&wqh, owner, state);
> +	}
>  
>  	/* Pairs with the preempt_enable in rt_mutex_wake_up_q() */
>  	preempt_disable();
> @@ -205,6 +207,8 @@ static inline void rwbase_write_unlock(struct rwbase_rt *rwb)
>  	unsigned long flags;
>  
>  	raw_spin_lock_irqsave(&rtm->wait_lock, flags);
> +	if (trace_contended_release_enabled() && rt_mutex_has_waiters(rtm))
> +		trace_contended_release(rwb);
>  	__rwbase_write_unlock(rwb, WRITER_BIAS, flags);
>  }
>  
> @@ -214,6 +218,8 @@ static inline void rwbase_write_downgrade(struct rwbase_rt *rwb)
>  	unsigned long flags;
>  
>  	raw_spin_lock_irqsave(&rtm->wait_lock, flags);
> +	if (trace_contended_release_enabled() && rt_mutex_has_waiters(rtm))
> +		trace_contended_release(rwb);
>  	/* Release it and account current as reader */
>  	__rwbase_write_unlock(rwb, WRITER_BIAS - 1, flags);
>  }
> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
> index ba4cb74de064..cf7d8e75ad7b 100644
> --- a/kernel/locking/rwsem.c
> +++ b/kernel/locking/rwsem.c
> @@ -1390,6 +1390,7 @@ static inline void __up_read(struct rw_semaphore *sem)
>  	if (unlikely((tmp & (RWSEM_LOCK_MASK|RWSEM_FLAG_WAITERS)) ==
>  		      RWSEM_FLAG_WAITERS)) {
>  		clear_nonspinnable(sem);
> +		trace_contended_release(sem);
>  		rwsem_wake(sem);
>  	}
>  	preempt_enable();
> @@ -1413,8 +1414,10 @@ static inline void __up_write(struct rw_semaphore *sem)
>  	preempt_disable();
>  	rwsem_clear_owner(sem);
>  	tmp = atomic_long_fetch_add_release(-RWSEM_WRITER_LOCKED, &sem->count);
> -	if (unlikely(tmp & RWSEM_FLAG_WAITERS))
> +	if (unlikely(tmp & RWSEM_FLAG_WAITERS)) {
> +		trace_contended_release(sem);
>  		rwsem_wake(sem);
> +	}
>  	preempt_enable();
>  }
>  
> @@ -1437,8 +1440,10 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
>  	tmp = atomic_long_fetch_add_release(
>  		-RWSEM_WRITER_LOCKED+RWSEM_READER_BIAS, &sem->count);
>  	rwsem_set_reader_owned(sem);
> -	if (tmp & RWSEM_FLAG_WAITERS)
> +	if (tmp & RWSEM_FLAG_WAITERS) {
> +		trace_contended_release(sem);
>  		rwsem_downgrade_wake(sem);
> +	}
>  	preempt_enable();
>  }
>  
> diff --git a/kernel/locking/semaphore.c b/kernel/locking/semaphore.c
> index 74d41433ba13..d46415095dd6 100644
> --- a/kernel/locking/semaphore.c
> +++ b/kernel/locking/semaphore.c
> @@ -231,8 +231,10 @@ void __sched up(struct semaphore *sem)
>  	else
>  		__up(sem, &wake_q);
>  	raw_spin_unlock_irqrestore(&sem->lock, flags);
> -	if (!wake_q_empty(&wake_q))
> +	if (!wake_q_empty(&wake_q)) {
> +		trace_contended_release(sem);
>  		wake_up_q(&wake_q);
> +	}
>  }
>  EXPORT_SYMBOL(up);
>  
> -- 
> 2.52.0
> 
> 


* Re: [PATCH v2 1/2] locking/percpu-rwsem: Extract __percpu_up_read()
  2026-03-10 17:49 ` [PATCH v2 1/2] locking/percpu-rwsem: Extract __percpu_up_read() Dmitry Ilvokhin
@ 2026-03-12 11:39   ` Usama Arif
  0 siblings, 0 replies; 6+ messages in thread
From: Usama Arif @ 2026-03-12 11:39 UTC (permalink / raw)
  To: Dmitry Ilvokhin
  Cc: Usama Arif, Dennis Zhou, Tejun Heo, Christoph Lameter,
	Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng, Waiman Long,
	linux-mm, linux-kernel, linux-trace-kernel, kernel-team

On Tue, 10 Mar 2026 17:49:38 +0000 Dmitry Ilvokhin <d@ilvokhin.com> wrote:

> Move the percpu_up_read() slowpath out of the inline function into a
> new __percpu_up_read() to avoid a binary size increase from adding a
> tracepoint to an inlined function.
> 
> Signed-off-by: Dmitry Ilvokhin <d@ilvokhin.com>
> ---
>  include/linux/percpu-rwsem.h  | 15 +++------------
>  kernel/locking/percpu-rwsem.c | 18 ++++++++++++++++++
>  2 files changed, 21 insertions(+), 12 deletions(-)
> 
> diff --git a/include/linux/percpu-rwsem.h b/include/linux/percpu-rwsem.h
> index c8cb010d655e..39d5bf8e6562 100644
> --- a/include/linux/percpu-rwsem.h
> +++ b/include/linux/percpu-rwsem.h
> @@ -107,6 +107,8 @@ static inline bool percpu_down_read_trylock(struct percpu_rw_semaphore *sem)
>  	return ret;
>  }
>  

Acked-by: Usama Arif <usama.arif@linux.dev> 


* Re: [PATCH v2 2/2] locking: Add contended_release tracepoint
  2026-03-12 11:38   ` Usama Arif
@ 2026-03-16 15:32     ` Dmitry Ilvokhin
  0 siblings, 0 replies; 6+ messages in thread
From: Dmitry Ilvokhin @ 2026-03-16 15:32 UTC (permalink / raw)
  To: Usama Arif
  Cc: Dennis Zhou, Tejun Heo, Christoph Lameter, Steven Rostedt,
	Masami Hiramatsu, Mathieu Desnoyers, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Boqun Feng, Waiman Long, linux-mm, linux-kernel,
	linux-trace-kernel, kernel-team

On Thu, Mar 12, 2026 at 04:38:14AM -0700, Usama Arif wrote:
> On Tue, 10 Mar 2026 17:49:39 +0000 Dmitry Ilvokhin <d@ilvokhin.com> wrote:
> 
> > Add the contended_release trace event. This tracepoint fires on the
> > holder side when a contended lock is released, complementing the
> > existing contention_begin/contention_end tracepoints which fire on the
> > waiter side.
> > 
> > This enables correlating lock hold time under contention with waiter
> > events by lock address.
> > 
> > Add trace_contended_release() calls to the slowpath unlock paths of
> > sleepable locks: mutex, rtmutex, semaphore, rwsem, percpu-rwsem, and
> > RT-specific rwbase locks. Each call site fires only when there are
> > blocked waiters being woken, except percpu_up_write() which always wakes
> > via __wake_up().
> > 
> > Signed-off-by: Dmitry Ilvokhin <d@ilvokhin.com>
> > ---
> >  include/trace/events/lock.h   | 17 +++++++++++++++++
> >  kernel/locking/mutex.c        |  1 +
> >  kernel/locking/percpu-rwsem.c |  3 +++
> >  kernel/locking/rtmutex.c      |  1 +
> >  kernel/locking/rwbase_rt.c    |  8 +++++++-
> >  kernel/locking/rwsem.c        |  9 +++++++--
> >  kernel/locking/semaphore.c    |  4 +++-
> >  7 files changed, 39 insertions(+), 4 deletions(-)
> > 

[...]

> > diff --git a/kernel/locking/percpu-rwsem.c b/kernel/locking/percpu-rwsem.c
> > index f3ee7a0d6047..1eee51766aaf 100644
> > --- a/kernel/locking/percpu-rwsem.c
> > +++ b/kernel/locking/percpu-rwsem.c
> > @@ -263,6 +263,8 @@ void percpu_up_write(struct percpu_rw_semaphore *sem)
> >  {
> >  	rwsem_release(&sem->dep_map, _RET_IP_);
> >  
> > +	trace_contended_release(sem);
> > +
> 
> Hello!
> 
> I saw that you mentioned in the commmit message that you do this for only
> blocked waiters except for percpu_up_write(). We can use
> waitqueue_active(&sem->waiters) to check for this over here so that
> its consistent with every other call?

Thanks for the feedback, Usama.

I thought about it and even mentioned it in the commit message, but I
forgot the reason. Now I think you are correct. I have added
wq_has_sleeper() locally instead of waitqueue_active(), since we are
not holding the lock here and waitqueue_active() requires a barrier
according to its comment. It may not be very important here, but I'd
rather make it correct even for a tracepoint.
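
For reference, the check I have locally looks roughly like this,
following the trace_contended_release_enabled() guard pattern used
elsewhere in the series (the exact placement may still change in v3):

	if (trace_contended_release_enabled() && wq_has_sleeper(&sem->waiters))
		trace_contended_release(sem);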

Note that __percpu_up_read() doesn't need this guard. Maybe I was
thinking about the __percpu_up_read() part earlier and just made
percpu_up_write() symmetric with it.

Anyway, thanks for the suggestion.

> 
> 
> >  	/*
> >  	 * Signal the writer is done, no fast path yet.
> >  	 *
> > @@ -297,6 +299,7 @@ void __percpu_up_read(struct percpu_rw_semaphore *sem)
> >  	 * writer.
> >  	 */
> >  	smp_mb(); /* B matches C */
> > +	trace_contended_release(sem);
> 
> Should we do this after this_cpu_dec(*sem->read_count)?

Good point. I have moved it after this_cpu_dec(), so the tracepoint
fires after the lock is released but before rcuwait_wake_up(). I also
went through all the other call sites and made the placement consistent
where possible: after release, before wake. This will be fixed in v3.

