* [PATCH] locking/mutex: Add debug code to help catching violation of mutex lifetime rule
From: Waiman Long @ 2025-07-09 19:39 UTC
To: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
Cc: linux-kernel, Linus Torvalds, Jann Horn, Waiman Long
Callers of mutex_lock/unlock() must ensure the validity of the memory
objects holding mutexes until after the return of all outstanding
and upcoming mutex_unlock() calls. Add a cond_resched() call in
__mutex_unlock_slowpath() to help KASAN catch a violation of this
rule. This will enforce a context switch if another runnable task is
present and the CPU is not in an atomic context.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Waiman Long <longman@redhat.com>
---
kernel/locking/mutex.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index a39ecccbd106..dd107fb9dad7 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -931,6 +931,17 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
 		}
 	}
 
+#if defined(CONFIG_KASAN) && defined(CONFIG_DEBUG_MUTEXES)
+	/*
+	 * Mutex users must ensure that the memory object holding the mutex
+	 * remains valid until after the return of all the outstanding and
+	 * upcoming mutex_unlock() calls. Enforce a context switch here if
+	 * another runnable task is present and this CPU is not in an atomic
+	 * context to increase the chance that KASAN can catch a violation of
+	 * this rule.
+	 */
+	cond_resched();
+#endif
 	raw_spin_lock_irqsave(&lock->wait_lock, flags);
 	debug_mutex_unlock(lock);
 	if (!list_empty(&lock->wait_list)) {
--
2.50.0
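The class of bug the patch is aimed at looks roughly like the sketch below
(a hedged illustration with made-up names, not code from the thread or the
kernel tree): an object's last reference can be dropped while its embedded
mutex is still being unlocked.

#include <linux/mutex.h>
#include <linux/refcount.h>
#include <linux/slab.h>

struct obj {
	refcount_t	ref;
	struct mutex	lock;	/* embedded in the refcounted object */
};

static void obj_put(struct obj *o)
{
	if (refcount_dec_and_test(&o->ref))
		kfree(o);		/* frees o->lock's memory too */
}

static void obj_op_buggy(struct obj *o)
{
	mutex_lock(&o->lock);
	/* ... operate on o ... */
	obj_put(o);		/* BUG: drops our reference too early */
	mutex_unlock(&o->lock);	/* a waiter can now acquire the mutex, drop
				 * the final reference and kfree(o) while we
				 * are still inside __mutex_unlock_slowpath() */
}

With CONFIG_KASAN and CONFIG_DEBUG_MUTEXES enabled, the cond_resched() added
above widens the window between handing over lock ownership and the unlock
path's remaining accesses, so the racing kfree() becomes a use-after-free
that KASAN can report.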
* Re: [PATCH] locking/mutex: Add debug code to help catching violation of mutex lifetime rule
From: Boqun Feng @ 2025-07-11 22:20 UTC
To: Waiman Long
Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel,
Linus Torvalds, Jann Horn
On Wed, Jul 09, 2025 at 03:39:10PM -0400, Waiman Long wrote:
> Callers of mutex_lock/unlock() must ensure the validity of the memory
> objects holding mutexes until after the return of all outstanding
> and upcoming mutex_unlock() calls. Add a cond_resched() call in
> __mutex_unlock_slowpath() to help KASAN catch a violation of this
> rule. This will enforce a context switch if another runnable task is
> present and the CPU is not in an atomic context.
>
> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Waiman Long <longman@redhat.com>
Meta question: are we able to construct a case that shows this can help
detect the issue?
Regards,
Boqun
> ---
> kernel/locking/mutex.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index a39ecccbd106..dd107fb9dad7 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -931,6 +931,17 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
>  		}
>  	}
>  
> +#if defined(CONFIG_KASAN) && defined(CONFIG_DEBUG_MUTEXES)
> +	/*
> +	 * Mutex users must ensure that the memory object holding the mutex
> +	 * remains valid until after the return of all the outstanding and
> +	 * upcoming mutex_unlock() calls. Enforce a context switch here if
> +	 * another runnable task is present and this CPU is not in an atomic
> +	 * context to increase the chance that KASAN can catch a violation of
> +	 * this rule.
> +	 */
> +	cond_resched();
> +#endif
>  	raw_spin_lock_irqsave(&lock->wait_lock, flags);
>  	debug_mutex_unlock(lock);
>  	if (!list_empty(&lock->wait_list)) {
> --
> 2.50.0
>
* Re: [PATCH] locking/mutex: Add debug code to help catching violation of mutex lifetime rule
From: Linus Torvalds @ 2025-07-11 22:30 UTC
To: Boqun Feng
Cc: Waiman Long, Peter Zijlstra, Ingo Molnar, Will Deacon,
linux-kernel, Jann Horn
On Fri, 11 Jul 2025 at 15:20, Boqun Feng <boqun.feng@gmail.com> wrote:
>
> Meta question: are we able to construct a case that shows this can help
> detect the issue?
Well, the thing that triggered this was hopefully fixed by
8c2e52ebbe88 ("eventpoll: don't decrement ep refcount while still
holding the ep mutex"), but I think Jann figured that one out by code
inspection.
I doubt it can be triggered in real life without something like
Waiman's patch, but *with* Waiman's patch, and commit 8c2e52ebbe88
reverted (and obviously with CONFIG_KASAN and CONFIG_DEBUG_MUTEXES
enabled), doing lots of concurrent epoll closes would hopefully then
trigger the warning.
Of course, to then find *other* potential bugs would be the whole
point, and some of these kinds of bugs are definitely of the kind
where the race condition doesn't actually trigger in any real load,
because it's unlikely that real loads end up doing that kind of
"release all these objects concurrently".
But it might be interesting to try that "can you even recreate the bug
fixed by 8c2e52ebbe88" with this. Because if that one *known* bug
can't be found by this, then it's obviously unlikely to help find
others.
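A userspace stress of that kind might look like the sketch below (illustrative
only, not a proven reproducer; the thread pairing and iteration counts are
arbitrary). Nesting one epoll instance inside another gives the ep more than
one release path, and two threads then race the closes:

#include <pthread.h>
#include <sys/epoll.h>
#include <unistd.h>

static int fds[2];
static pthread_barrier_t barrier;

static void *closer(void *arg)
{
	long i = (long)arg;

	pthread_barrier_wait(&barrier);	/* line up the two close() calls */
	close(fds[i]);
	return NULL;
}

int main(void)
{
	pthread_barrier_init(&barrier, NULL, 2);

	for (int n = 0; n < 1000000; n++) {
		struct epoll_event ev = { .events = EPOLLIN };
		pthread_t t[2];

		fds[0] = epoll_create1(0);
		fds[1] = epoll_create1(0);
		epoll_ctl(fds[0], EPOLL_CTL_ADD, fds[1], &ev);	/* nest them */

		for (long i = 0; i < 2; i++)
			pthread_create(&t[i], NULL, closer, (void *)i);
		for (int i = 0; i < 2; i++)
			pthread_join(t[i], NULL);
	}
	return 0;
}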
That said, it does seem like an obvious trivial thing to stress, which
is why that patch by Waiman has my suggested-by...
Linus
* Re: [PATCH] locking/mutex: Add debug code to help catching violation of mutex lifetime rule
From: Boqun Feng @ 2025-07-11 23:28 UTC
To: Linus Torvalds
Cc: Waiman Long, Peter Zijlstra, Ingo Molnar, Will Deacon,
linux-kernel, Jann Horn
On Fri, Jul 11, 2025 at 03:30:05PM -0700, Linus Torvalds wrote:
> On Fri, 11 Jul 2025 at 15:20, Boqun Feng <boqun.feng@gmail.com> wrote:
> >
> > Meta question: are we able to construct a case that shows this can help
> > detect the issue?
>
> Well, the thing that triggered this was hopefully fixed by
> 8c2e52ebbe88 ("eventpoll: don't decrement ep refcount while still
> holding the ep mutex"), but I think Jann figured that one out by code
> inspection.
>
> I doubt it can be triggered in real life without something like
> Waiman's patch, but *with* Waiman's patch, and commit 8c2e52ebbe88
> reverted (and obviously with CONFIG_KASAN and CONFIG_DEBUG_MUTEXES
> enabled), doing lots of concurrent epoll closes would hopefully then
> trigger the warning.
>
> Of course, to then find *other* potential bugs would be the whole
> point, and some of these kinds of bugs are definitely of the kind
> where the race condition doesn't actually trigger in any real load,
> because it's unlikely that real loads end up doing that kind of
> "release all these objects concurrently".
>
> But it might be interesting to try that "can you even recreate the bug
> fixed by 8c2e52ebbe88" with this. Because if that one *known* bug
> can't be found by this, then it's obviously unlikely to help find
> others.
>
Yeah, I guess I asked the question because there is no clear link from
the bug scenario to an extra context switch, that is, even if the
context switch didn't happen, the bug would trigger if
__mutex_unlock_slowpath() took too long after giving the ownership to
someone else. So my instinct was: would cond_resched() be slow enough
;-)
But I agree it's a trivial thing to do, and I think another thing we can
do is adding a kasan_check_byte(lock) at the end of
__mutex_unlock_slowpath(), because conceptually the mutex should be
valid throughout the whole __mutex_unlock_slowpath() function, i.e.
void __mutex_unlock_slowpath(...)
{
	...
	raw_spin_unlock_irqrestore_wake(&lock->wait_lock, flags, &wake_q);
	// <- conceptually "lock" should still be valid here,
	// so if anyone frees the memory of the mutex, it's going
	// to be a problem.
	kasan_check_byte(lock);
}
I think this may also give us a good chance of finding more bugs; one of
the reasons is that raw_spin_unlock_irqrestore_wake() has a
preempt_enable() at the end, which may trigger a context switch.
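For reference, kasan_check_byte() (declared in <linux/kasan.h>) also returns
whether the byte was accessible, so a caller could pair the KASAN report with
a lock-specific message. A minimal sketch, assuming that interface (the
mutex_check_alive() name is made up):

#include <linux/kasan.h>
#include <linux/mutex.h>
#include <linux/printk.h>

/* kasan_check_byte() returns false (after emitting a KASAN report) if
 * the byte at *lock is poisoned, e.g. the mutex was freed under us. */
static inline void mutex_check_alive(struct mutex *lock)
{
	if (!kasan_check_byte(lock))
		pr_warn("mutex %px was freed during unlock\n", lock);
}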
Regards,
Boqun
> That said, it does seem like an obvious trivial thing to stress, which
> is why that patch by Waiman has my suggested-by...
>
> Linus
* Re: [PATCH] locking/mutex: Add debug code to help catching violation of mutex lifetime rule
From: Linus Torvalds @ 2025-07-12 0:14 UTC
To: Boqun Feng
Cc: Waiman Long, Peter Zijlstra, Ingo Molnar, Will Deacon,
linux-kernel, Jann Horn
On Fri, 11 Jul 2025 at 16:28, Boqun Feng <boqun.feng@gmail.com> wrote:
>
> [..] I think another thing we can
> do is adding a kasan_check_byte(lock) at the end of
> __mutex_unlock_slowpath(), because conceptually the mutex should be
> valid throughout the whole __mutex_unlock_slowpath() function, i.e.
Agreed, that makes a ton of sense too.
Linus
* Re: [PATCH] locking/mutex: Add debug code to help catching violation of mutex lifetime rule
From: Waiman Long @ 2025-07-12 0:42 UTC
To: Boqun Feng, Linus Torvalds
Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel, Jann Horn
On 7/11/25 7:28 PM, Boqun Feng wrote:
> On Fri, Jul 11, 2025 at 03:30:05PM -0700, Linus Torvalds wrote:
>> On Fri, 11 Jul 2025 at 15:20, Boqun Feng <boqun.feng@gmail.com> wrote:
>>> Meta question: are we able to construct a case that shows this can help
>>> detect the issue?
>> Well, the thing that triggered this was hopefully fixed by
>> 8c2e52ebbe88 ("eventpoll: don't decrement ep refcount while still
>> holding the ep mutex"), but I think Jann figured that one out by code
>> inspection.
>>
>> I doubt it can be triggered in real life without something like
>> Waiman's patch, but *with* Waiman's patch, and commit 8c2e52ebbe88
>> reverted (and obviously with CONFIG_KASAN and CONFIG_DEBUG_MUTEXES
>> enabled), doing lots of concurrent epoll closes would hopefully then
>> trigger the warning.
>>
>> Of course, to then find *other* potential bugs would be the whole
>> point, and some of these kinds of bugs are definitely of the kind
>> where the race condition doesn't actually trigger in any real load,
>> because it's unlikely that real loads end up doing that kind of
>> "release all these objects concurrently".
>>
>> But it might be interesting to try that "can you even recreate the bug
>> fixed by 8c2e52ebbe88" with this. Because if that one *known* bug
>> can't be found by this, then it's obviously unlikely to help find
>> others.
>>
> Yeah, I guess I asked the question because there is no clear link from
> the bug scenario to an extra context switch, that is, even if the
> context switch didn't happen, the bug would trigger if
> __mutex_unlock_slowpath() took too long after giving the ownership to
> someone else. So my instinct was: would cond_resched() be slow enough
> ;-)
>
> But I agree it's a trivial thing to do, and I think another thing we can
> do is adding a kasan_check_byte(lock) at the end of
> __mutex_unlock_slowpath(), because conceptually the mutex should be
> valid throughout the whole __mutex_unlock_slowpath() function, i.e.
>
> void __mutex_unlock_slowpath(...)
> {
> ...
> raw_spin_unlock_irqrestore_wake(&lock->wait_lock, flags, &wake_q);
> // <- conceptually "lock" should still be valid here.
> // so if anyone frees the memory of the mutex, it's going
> // to be a problem.
> kasan_check_byte(lock);
> }
>
> I think this may also give us a good chance of finding more bugs; one of
> the reasons is that raw_spin_unlock_irqrestore_wake() has a
> preempt_enable() at the end, which may trigger a context switch.
>
> Regards,
> Boqun
I think this is a good idea. We should extend that to add the check in
rwsem as well. Will post a patch to do that.
Cheers,
Longman
>> That said, it does seem like an obvious trivial thing to stress, which
>> is why that patch by Waiman has my suggested-by...
>>
>> Linus
* Re: [PATCH] locking/mutex: Add debug code to help catching violation of mutex lifetime rule
From: Waiman Long @ 2025-07-12 1:48 UTC
To: Boqun Feng, Linus Torvalds
Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel, Jann Horn
On 7/11/25 8:42 PM, Waiman Long wrote:
>
> On 7/11/25 7:28 PM, Boqun Feng wrote:
>> On Fri, Jul 11, 2025 at 03:30:05PM -0700, Linus Torvalds wrote:
>>> On Fri, 11 Jul 2025 at 15:20, Boqun Feng <boqun.feng@gmail.com> wrote:
>>>> Meta question: are we able to construct a case that shows this can
>>>> help
>>>> detect the issue?
>>> Well, the thing that triggered this was hopefully fixed by
>>> 8c2e52ebbe88 ("eventpoll: don't decrement ep refcount while still
>>> holding the ep mutex"), but I think Jann figured that one out by code
>>> inspection.
>>>
>>> I doubt it can be triggered in real life without something like
>>> Waiman's patch, but *with* Waiman's patch, and commit 8c2e52ebbe88
>>> reverted (and obviously with CONFIG_KASAN and CONFIG_DEBUG_MUTEXES
>>> enabled), doing lots of concurrent epoll closes would hopefully then
>>> trigger the warning.
>>>
>>> Of course, to then find *other* potential bugs would be the whole
>>> point, and some of these kinds of bugs are definitely of the kind
>>> where the race condition doesn't actually trigger in any real load,
>>> because it's unlikely that real loads end up doing that kind of
>>> "release all these objects concurrently".
>>>
>>> But it might be interesting to try that "can you even recreate the bug
>>> fixed by 8c2e52ebbe88" with this. Because if that one *known* bug
>>> can't be found by this, then it's obviously unlikely to help find
>>> others.
>>>
>> Yeah, I guess I asked the question because there is no clear link from
>> the bug scenario to an extra context switch, that is, even if the
>> context switch didn't happen, the bug would trigger if
>> __mutex_unlock_slowpath() took too long after giving the ownership to
>> someone else. So my instinct was: would cond_resched() be slow enough
>> ;-)
>>
>> But I agree it's a trivial thing to do, and I think another thing we can
>> do is adding a kasan_check_byte(lock) at the end of
>> __mutex_unlock_slowpath(), because conceptually the mutex should be
>> valid throughout the whole __mutex_unlock_slowpath() function, i.e.
>>
>> void __mutex_unlock_slowpath(...)
>> {
>> ...
>> raw_spin_unlock_irqrestore_wake(&lock->wait_lock, flags,
>> &wake_q);
>> // <- conceptually "lock" should still be valid here.
>> // so if anyone frees the memory of the mutex, it's going
>> // to be a problem.
>> kasan_check_byte(lock);
>> }
>>
>> I think this may also give us a good chance of finding more bugs; one of
>> the reasons is that raw_spin_unlock_irqrestore_wake() has a
>> preempt_enable() at the end, which may trigger a context switch.
>>
>> Regards,
>> Boqun
>
> I think this is a good idea. We should extend that to add the check in
> rwsem as well. Will post a patch to do that.
Digging into it some more, I think adding kasan_check_byte() may not be
necessary. If KASAN is enabled, it will instrument the locking code
including __mutex_unlock_slowpath(). I checked the generated assembly
code; it has 2 __kasan_check_read() and 4 __kasan_check_write() calls.
Adding an extra kasan_check_byte() can be redundant.
Cheers,
Longman
* Re: [PATCH] locking/mutex: Add debug code to help catching violation of mutex lifetime rule
From: Boqun Feng @ 2025-07-12 2:24 UTC
To: Waiman Long
Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, Will Deacon,
linux-kernel, Jann Horn
On Fri, Jul 11, 2025 at 09:48:13PM -0400, Waiman Long wrote:
> On 7/11/25 8:42 PM, Waiman Long wrote:
> >
> > On 7/11/25 7:28 PM, Boqun Feng wrote:
> > > On Fri, Jul 11, 2025 at 03:30:05PM -0700, Linus Torvalds wrote:
> > > > On Fri, 11 Jul 2025 at 15:20, Boqun Feng <boqun.feng@gmail.com> wrote:
> > > > > Meta question: are we able to construct a case that shows
> > > > > this can help
> > > > > detect the issue?
> > > > Well, the thing that triggered this was hopefully fixed by
> > > > 8c2e52ebbe88 ("eventpoll: don't decrement ep refcount while still
> > > > holding the ep mutex"), but I think Jann figured that one out by code
> > > > inspection.
> > > >
> > > > I doubt it can be triggered in real life without something like
> > > > Waiman's patch, but *with* Waiman's patch, and commit 8c2e52ebbe88
> > > > reverted (and obviously with CONFIG_KASAN and CONFIG_DEBUG_MUTEXES
> > > > enabled), doing lots of concurrent epoll closes would hopefully then
> > > > trigger the warning.
> > > >
> > > > Of course, to then find *other* potential bugs would be the whole
> > > > point, and some of these kinds of bugs are definitely of the kind
> > > > where the race condition doesn't actually trigger in any real load,
> > > > because it's unlikely that real loads end up doing that kind of
> > > > "release all these objects concurrently".
> > > >
> > > > But it might be interesting to try that "can you even recreate the bug
> > > > fixed by 8c2e52ebbe88" with this. Because if that one *known* bug
> > > > can't be found by this, then it's obviously unlikely to help find
> > > > others.
> > > >
> > > Yeah, I guess I asked the question because there is no clear link from
> > > the bug scenario to an extra context switch, that is, even if the
> > > context switch didn't happen, the bug would trigger if
> > > __mutex_unlock_slowpath() took too long after giving the ownership to
> > > someone else. So my instinct was: would cond_resched() be slow enough
> > > ;-)
> > >
> > > But I agree it's a trivial thing to do, and I think another thing we can
> > > do is adding a kasan_check_byte(lock) at the end of
> > > __mutex_unlock_slowpath(), because conceptually the mutex should be
> > > valid throughout the whole __mutex_unlock_slowpath() function, i.e.
> > >
> > > void __mutex_unlock_slowpath(...)
> > > {
> > > ...
> > > raw_spin_unlock_irqrestore_wake(&lock->wait_lock, flags,
> > > &wake_q);
> > > // <- conceptually "lock" should still be valid here.
> > > // so if anyone frees the memory of the mutex, it's going
> > > // to be a problem.
> > > kasan_check_byte(lock);
> > > }
> > >
> > > I think this may also give us a good chance of finding more bugs; one of
> > > the reasons is that raw_spin_unlock_irqrestore_wake() has a
> > > preempt_enable() at the end, which may trigger a context switch.
> > >
> > > Regards,
> > > Boqun
> >
> > I think this is a good idea. We should extend that to add the check in
> > rwsem as well. Will post a patch to do that.
>
> Digging into it some more, I think adding kasan_check_byte() may not be
> necessary. If KASAN is enabled, it will instrument the locking code
> including __mutex_unlock_slowpath(). I checked the generated assembly code;
> it has 2 __kasan_check_read() and 4 __kasan_check_write() calls. Adding an
The point is we want to check the memory at the end of
__mutex_unlock_slowpath(), so it's an extra check.
Also, since kasan will instrument all memory accesses, what you saw may
not be the instrumentation of "lock" but of something else, for example,
wake_q_init() in raw_spin_unlock_irqrestore_wake().
Actually, I have 3 extensions to the idea:
First it occurs to me that we could just put the kasan_check_byte() at
the outermost thing, for example, mutex_unlock().
Second I wonder whether kasan has a way to tag a pointer parameter of a
function, for example for mutex_unlock():
void mutex_unlock(struct mutex * __ref lock)
{
	...
}
a kasan_check_byte(lock) would be auto-generated whenever the function
returns.
I actually tried to use __cleanup to implement __ref, like
#define __ref __cleanup(kasan_check_byte)
but it seems the "cleanup" attribute doesn't work on function parameters ;(
Third, I went to implement an always_alive():
#define always_alive(ptr) \
typeof(ptr) __UNIQUE_ID(always_alive_guard) __cleanup(kasan_check_byte) = ptr;
and you can use in mutex_unlock():
void mutex_unlock(struct mutex *lock)
{
	always_alive(lock);
	...
}
This also guarantees we emit a kasan_check_byte() at the very end.
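One wrinkle with the __cleanup() route: the cleanup handler receives a
pointer *to the annotated variable*, not its value, so __cleanup(kasan_check_byte)
would end up checking the on-stack guard slot rather than the mutex itself.
A sketch that accounts for that indirection (the helper name is made up;
kasan_check_byte() is from <linux/kasan.h>, __UNIQUE_ID and __cleanup from
<linux/compiler.h> and friends):

/* The handler gets &guard, so dereference once to reach the object. */
static inline void __always_alive_cleanup(void **pp)
{
	kasan_check_byte(*pp);
}

#define always_alive(ptr) \
	void *__UNIQUE_ID(always_alive_guard) \
		__cleanup(__always_alive_cleanup) = (void *)(ptr)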
Regards,
Boqun
> extra kasan_check_byte() can be redundant.
>
> Cheers,
> Longman
>
* Re: [PATCH] locking/mutex: Add debug code to help catching violation of mutex lifetime rule
From: Waiman Long @ 2025-07-12 3:16 UTC
To: Boqun Feng, Waiman Long
Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, Will Deacon,
linux-kernel, Jann Horn
On 7/11/25 10:24 PM, Boqun Feng wrote:
> On Fri, Jul 11, 2025 at 09:48:13PM -0400, Waiman Long wrote:
>> On 7/11/25 8:42 PM, Waiman Long wrote:
>>> On 7/11/25 7:28 PM, Boqun Feng wrote:
>>>> On Fri, Jul 11, 2025 at 03:30:05PM -0700, Linus Torvalds wrote:
>>>>> On Fri, 11 Jul 2025 at 15:20, Boqun Feng <boqun.feng@gmail.com> wrote:
>>>>>> Meta question: are we able to construct a case that shows
>>>>>> this can help
>>>>>> detect the issue?
>>>>> Well, the thing that triggered this was hopefully fixed by
>>>>> 8c2e52ebbe88 ("eventpoll: don't decrement ep refcount while still
>>>>> holding the ep mutex"), but I think Jann figured that one out by code
>>>>> inspection.
>>>>>
>>>>> I doubt it can be triggered in real life without something like
>>>>> Waiman's patch, but *with* Waiman's patch, and commit 8c2e52ebbe88
>>>>> reverted (and obviously with CONFIG_KASAN and CONFIG_DEBUG_MUTEXES
>>>>> enabled), doing lots of concurrent epoll closes would hopefully then
>>>>> trigger the warning.
>>>>>
>>>>> Of course, to then find *other* potential bugs would be the whole
>>>>> point, and some of these kinds of bugs are definitely of the kind
>>>>> where the race condition doesn't actually trigger in any real load,
>>>>> because it's unlikely that real loads end up doing that kind of
>>>>> "release all these objects concurrently".
>>>>>
>>>>> But it might be interesting to try that "can you even recreate the bug
>>>>> fixed by 8c2e52ebbe88" with this. Because if that one *known* bug
>>>>> can't be found by this, then it's obviously unlikely to help find
>>>>> others.
>>>>>
>>>> Yeah, I guess I asked the question because there is no clear link from
>>>> the bug scenario to an extra context switch, that is, even if the
>>>> context switch didn't happen, the bug would trigger if
>>>> __mutex_unlock_slowpath() took too long after giving the ownership to
>>>> someone else. So my instinct was: would cond_resched() be slow enough
>>>> ;-)
>>>>
>>>> But I agree it's a trivial thing to do, and I think another thing we can
>>>> do is adding a kasan_check_byte(lock) at the end of
>>>> __mutex_unlock_slowpath(), because conceptually the mutex should be
>>>> valid throughout the whole __mutex_unlock_slowpath() function, i.e.
>>>>
>>>> void __mutex_unlock_slowpath(...)
>>>> {
>>>> ...
>>>> raw_spin_unlock_irqrestore_wake(&lock->wait_lock, flags,
>>>> &wake_q);
>>>> // <- conceptually "lock" should still be valid here.
>>>> // so if anyone frees the memory of the mutex, it's going
>>>> // to be a problem.
>>>> kasan_check_byte(lock);
>>>> }
>>>>
>>>> I think this may also give us a good chance of finding more bugs; one of
>>>> the reasons is that raw_spin_unlock_irqrestore_wake() has a
>>>> preempt_enable() at the end, which may trigger a context switch.
>>>>
>>>> Regards,
>>>> Boqun
>>> I think this is a good idea. We should extend that to add the check in
>>> rwsem as well. Will post a patch to do that.
>> Digging into it some more, I think adding kasan_check_byte() may not be
>> necessary. If KASAN is enabled, it will instrument the locking code
>> including __mutex_unlock_slowpath(). I checked the generated assembly code;
>> it has 2 __kasan_check_read() and 4 __kasan_check_write() calls. Adding an
> The point is we want to check the memory at the end of
> __mutex_unlock_slowpath(), so it's an extra check.
It is likely that the instrumented kasan_check* calls are invoked
near the beginning, when the lock is first accessed, as I don't see any
kasan_check*() around the inlined raw_spin_unlock_irqrestore_wake() call.
So if we want a check at the end, we may have to manually add one.
>
> Also, since kasan will instrument all memory accesses, what you saw may
> not be the instrumentation of "lock" but of something else, for example,
> wake_q_init() in raw_spin_unlock_irqrestore_wake().
The wake_q memory is on the stack, for which I don't believe the compiler
will generate kasan_check calls. I also don't see any kasan_check*() call
when the wake_q is being manipulated.
> Actually, I have 3 extensions to the idea:
>
> First it occurs to me that we could just put the kasan_check_byte() at
> the outermost thing, for example, mutex_unlock().
>
> Second I wonder whether kasan has a way to tag a pointer parameter of a
> function, for example for mutex_unlock():
>
> void mutex_unlock(struct mutex * __ref lock)
> {
> ...
> }
>
> a kasan_check_byte(lock) would be auto-generated whenever the function
> returns.
>
> I actually tried to use __cleanup to implement __ref, like
>
> #define __ref __cleanup(kasan_check_byte)
>
> but it seems the "cleanup" attribute doesn't work on function parameters ;(
>
> Third, I went to implement an always_alive():
>
> #define always_alive(ptr) \
> typeof(ptr) __UNIQUE_ID(always_alive_guard) __cleanup(kasan_check_byte) = ptr;
>
> and you can use in mutex_unlock():
>
> void mutex_unlock(struct mutex *lock)
> {
> always_alive(lock);
> ...
> }
>
> This also guarantees we emit a kasan_check_byte() at the very end.
Adding a kasan_check_byte() test at the end of unlock is a
locking-specific need, and we don't have that many instances where such
a check is needed. So it may not be worth the effort to devise a special
mechanism just for that. Adding a simple macro to abstract it may be
enough. Anyway, it is your call.
Cheers,
Longman
* Re: [PATCH] locking/mutex: Add debug code to help catching violation of mutex lifetime rule
From: Boqun Feng @ 2025-07-12 4:20 UTC
To: Waiman Long
Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, Will Deacon,
linux-kernel, Jann Horn
On Fri, Jul 11, 2025 at 11:16:43PM -0400, Waiman Long wrote:
> > > > > I think this may also give us a good chance of finding more bugs; one of
> > > > > the reasons is that raw_spin_unlock_irqrestore_wake() has a
> > > > > preempt_enable() at the end, which may trigger a context switch.
> > > > >
> > > > > Regards,
> > > > > Boqun
> > > > I think this is a good idea. We should extend that to add the check in
> > > > rwsem as well. Will post a patch to do that.
> > > Digging into it some more, I think adding kasan_check_byte() may not be
> > > necessary. If KASAN is enabled, it will instrument the locking code
> > > including __mutex_unlock_slowpath(). I checked the generated assembly code;
> > > it has 2 __kasan_check_read() and 4 __kasan_check_write() calls. Adding an
> > The point is we want to check the memory at the end of
> > __mutex_unlock_slowpath(), so it's an extra check.
>
> It is likely that the instrumented kasan_check* calls are invoked near
> the beginning, when the lock is first accessed, as I don't see any
> kasan_check*() around the inlined raw_spin_unlock_irqrestore_wake() call.
>
One thing to notice is that __kasan_check_{read,write}() are the
instrumentation for atomic operations (because the compiler cannot
generate the instrumentation for asm): try_cmpxchg() would instrument
two kasan_check_write()s and atomic_long_read() would instrument one
kasan_check_read(), and we have 2 try_cmpxchg() and 2 atomic_long_read()
in __mutex_unlock_slowpath(); that's why you saw 4 kasan_check_write()s
and 2 kasan_check_read()s.
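That count matches the shape of the generated wrappers, roughly the
following (paraphrased from include/linux/atomic/atomic-instrumented.h and
include/linux/instrumented.h; details vary by kernel version):

/* try_cmpxchg() writes both the atomic and *old, hence two write checks */
static __always_inline bool
atomic_long_try_cmpxchg(atomic_long_t *v, long *old, long new)
{
	instrument_atomic_read_write(v, sizeof(*v));	/* -> kasan_check_write(v) */
	instrument_read_write(old, sizeof(*old));	/* -> kasan_check_write(old) */
	return raw_atomic_long_try_cmpxchg(v, old, new);
}

static __always_inline long
atomic_long_read(const atomic_long_t *v)
{
	instrument_atomic_read(v, sizeof(*v));		/* -> kasan_check_read(v) */
	return raw_atomic_long_read(v);
}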
Compiler instrumentation uses functions like:
__asan_report_store8_noabort()
__asan_report_load8_noabort()
And I checked the assembly code of __mutex_unlock_slowpath(), and it
turns out it does instrument the memory operations on the stack:
3b1c: 94000000 bl 0x3b1c <__mutex_unlock_slowpath+0x188>
0000000000003b1c: R_AARCH64_CALL26 wake_up_q
3b20: 52800028 mov w8, #0x1 // =1
3b24: f9000be8 str x8, [sp, #0x10]
3b28: 38776b08 ldrb w8, [x24, x23]
3b2c: 34000068 cbz w8, 0x3b38 <__mutex_unlock_slowpath+0x1a4>
3b30: aa1303e0 mov x0, x19
3b34: 94000000 bl 0x3b34 <__mutex_unlock_slowpath+0x1a0>
0000000000003b34: R_AARCH64_CALL26 __asan_report_store8_noabort
3b38: f9000ff4 str x20, [sp, #0x18]
^ this is the "head->lastp = &head->first;" in wake_q_init()
and x19 is sp + 0x18, if you believe me ;-)
> So if we want a check at the end, we may have to manually add one.
But yes, we want to add a conceptual check at the very end, as if "the
mutex must be valid throughout the whole mutex_unlock() function"
>
> >
> > Also, since kasan will instrument all memory accesses, what you saw may
> > not be the instrumentation of "lock" but of something else, for example,
> > wake_q_init() in raw_spin_unlock_irqrestore_wake().
>
> The wake_q memory is on the stack, for which I don't believe the compiler
> will generate kasan_check calls. I also don't see any kasan_check*() call when
> the wake_q is being manipulated.
>
See above.
> > Actually, I have 3 extensions to the idea:
> >
> > First it occurs to me that we could just put the kasan_check_byte() at
> > the outermost thing, for example, mutex_unlock().
> >
> > Second I wonder whether kasan has a way to tag a pointer parameter of a
> > function, for example for mutex_unlock():
> >
> > void mutex_unlock(struct mutex * __ref lock)
> > {
> > ...
> > }
> >
> > a kasan_check_byte(lock) would be auto-generated whenever the function
> > returns.
> >
I'm more curious whether kasan can support this.
> > I actually tried to use __cleanup to implement __ref, like
> >
> > #define __ref __cleanup(kasan_check_byte)
> >
> > but it seems the "cleanup" attribute doesn't work on function parameters ;(
> >
> > Third, I went to implement an always_alive():
> >
> > #define always_alive(ptr) \
> > typeof(ptr) __UNIQUE_ID(always_alive_guard) __cleanup(kasan_check_byte) = ptr;
> >
> > and you can use in mutex_unlock():
> >
> > void mutex_unlock(struct mutex *lock)
> > {
> > always_alive(lock);
> > ...
> > }
> >
> > This also guarantees we emit a kasan_check_byte() at the very end.
>
> Adding a kasan_check_byte() test at the end of unlock is a locking-specific
> need, and we don't have that many instances where such a check is needed. So
> it may not be worth the effort to devise a special mechanism just for that.
> Adding a simple macro to abstract it may be enough. Anyway, it is your call.
>
Well, we do have a lot of cases where a foo() takes a "struct bar *" and
the expectation is that the pointer is always valid in foo(), so maybe
there are other usages.
But sure, we can start by using always_alive() only in lock primitives
at first.
Regards,
Boqun
> Cheers,
> Longman
>