* [PATCH] locking/semaphore: Use raw_spin_trylock_irqsave() in down_trylock()
@ 2025-01-20 19:36 Waiman Long
2025-01-21 8:08 ` Peter Zijlstra
0 siblings, 1 reply; 3+ messages in thread
From: Waiman Long @ 2025-01-20 19:36 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
Cc: linux-kernel, Waiman Long
A circular lock dependency splat has been seen with down_trylock().
[ 4011.795602] ======================================================
[ 4011.795603] WARNING: possible circular locking dependency detected
[ 4011.795607] 6.12.0-41.el10.s390x+debug
[ 4011.795612] ------------------------------------------------------
[ 4011.795613] dd/32479 is trying to acquire lock:
[ 4011.795617] 0015a20accd0d4f8 ((console_sem).lock){-.-.}-{2:2}, at: down_trylock+0x26/0x90
[ 4011.795636]
[ 4011.795636] but task is already holding lock:
[ 4011.795637] 000000017e461698 (&zone->lock){-.-.}-{2:2}, at: rmqueue_bulk+0xac/0x8f0
[ 4011.795644]
[ 4011.795644] which lock already depends on the new lock.
:
[ 4011.796025] (console_sem).lock --> hrtimer_bases.lock --> &zone->lock
[ 4011.796025]
[ 4011.796029] Possible unsafe locking scenario:
[ 4011.796029]
[ 4011.796030] CPU0
[ 4011.796031] ----
[ 4011.796032] lock(&zone->lock);
[ 4011.796034] lock(hrtimer_bases.lock);
[ 4011.796036] lock(&zone->lock);
[ 4011.796038] lock((console_sem).lock);
[ 4011.796039]
[ 4011.796039] *** DEADLOCK ***
The calling sequence was
rmqueue_pcplist()
=> __rmqueue_pcplist()
=> rmqueue_bulk()
=> expand()
=> __add_to_free_list()
=> __warn_printk()
=> ...
=> console_trylock()
=> __down_trylock_console_sem()
=> down_trylock()
=> _raw_spin_lock_irqsave()
Normally, a trylock call should avoid this kind of circular lock
dependency splat, but down_trylock() is an exception. Fix this problem
by using raw_spin_trylock_irqsave() in down_trylock() to make it behave
like the other trylock calls.
Signed-off-by: Waiman Long <longman@redhat.com>
---
kernel/locking/semaphore.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/kernel/locking/semaphore.c b/kernel/locking/semaphore.c
index 34bfae72f295..cb27cbf5162f 100644
--- a/kernel/locking/semaphore.c
+++ b/kernel/locking/semaphore.c
@@ -127,6 +127,7 @@ EXPORT_SYMBOL(down_killable);
*
* NOTE: This return value is inverted from both spin_trylock and
* mutex_trylock! Be careful about this when converting code.
+ * I.e. 0 on success, 1 on failure.
*
* Unlike mutex_trylock, this function can be used from interrupt context,
* and the semaphore can be released by any task or interrupt.
@@ -136,7 +137,8 @@ int __sched down_trylock(struct semaphore *sem)
unsigned long flags;
int count;
- raw_spin_lock_irqsave(&sem->lock, flags);
+ if (!raw_spin_trylock_irqsave(&sem->lock, flags))
+ return 1;
count = sem->count - 1;
if (likely(count >= 0))
sem->count = count;
--
2.47.1
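[Editorial note: the patched fast path and the inverted return convention can be sketched as a minimal userspace analogue using pthreads. This is not the kernel code; `struct sem`, `sem_init_count()`, `my_down_trylock()` and `my_up()` are hypothetical stand-ins, with `pthread_mutex_trylock()` playing the role of `raw_spin_trylock_irqsave()`.]

```c
#include <assert.h>
#include <pthread.h>

/* Toy userspace analogue of the kernel semaphore (hypothetical names;
 * the real code lives in kernel/locking/semaphore.c). */
struct sem {
	pthread_mutex_t lock;	/* stands in for sem->lock (raw_spinlock_t) */
	int count;
};

static void sem_init_count(struct sem *s, int count)
{
	pthread_mutex_init(&s->lock, NULL);
	s->count = count;
}

/*
 * Mirrors the patched down_trylock(): inverted convention, 0 on
 * success and 1 on failure, unlike spin_trylock()/mutex_trylock().
 *
 * The key change from the patch: if the internal lock is contended,
 * bail out and report failure instead of spinning, so a trylock
 * caller never blocks on (or adds a lockdep dependency through)
 * the semaphore's internal lock.
 */
static int my_down_trylock(struct sem *s)
{
	int count;

	/* pthread_mutex_trylock() returns 0 on success, EBUSY if held */
	if (pthread_mutex_trylock(&s->lock) != 0)
		return 1;		/* lock contended: report failure */
	count = s->count - 1;
	if (count >= 0)
		s->count = count;	/* got the semaphore */
	pthread_mutex_unlock(&s->lock);
	return count < 0;		/* 0 on success, 1 on failure */
}

static void my_up(struct sem *s)
{
	pthread_mutex_lock(&s->lock);
	s->count++;			/* no waiters in this toy version */
	pthread_mutex_unlock(&s->lock);
}
```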
* Re: [PATCH] locking/semaphore: Use raw_spin_trylock_irqsave() in down_trylock()
2025-01-20 19:36 [PATCH] locking/semaphore: Use raw_spin_trylock_irqsave() in down_trylock() Waiman Long
@ 2025-01-21 8:08 ` Peter Zijlstra
2025-01-21 23:10 ` Waiman Long
0 siblings, 1 reply; 3+ messages in thread
From: Peter Zijlstra @ 2025-01-21 8:08 UTC (permalink / raw)
To: Waiman Long; +Cc: Ingo Molnar, Will Deacon, Boqun Feng, linux-kernel
On Mon, Jan 20, 2025 at 02:36:08PM -0500, Waiman Long wrote:
> A circular lock dependency splat has been seen with down_trylock().
>
> [ 4011.795602] ======================================================
> [ 4011.795603] WARNING: possible circular locking dependency detected
> [ 4011.795607] 6.12.0-41.el10.s390x+debug
> [ 4011.795612] ------------------------------------------------------
> [ 4011.795613] dd/32479 is trying to acquire lock:
> [ 4011.795617] 0015a20accd0d4f8 ((console_sem).lock){-.-.}-{2:2}, at: down_trylock+0x26/0x90
> [ 4011.795636]
> [ 4011.795636] but task is already holding lock:
> [ 4011.795637] 000000017e461698 (&zone->lock){-.-.}-{2:2}, at: rmqueue_bulk+0xac/0x8f0
> [ 4011.795644]
> [ 4011.795644] which lock already depends on the new lock.
> :
> [ 4011.796025] (console_sem).lock --> hrtimer_bases.lock --> &zone->lock
> [ 4011.796025]
> [ 4011.796029] Possible unsafe locking scenario:
> [ 4011.796029]
> [ 4011.796030] CPU0
> [ 4011.796031] ----
> [ 4011.796032] lock(&zone->lock);
> [ 4011.796034] lock(hrtimer_bases.lock);
> [ 4011.796036] lock(&zone->lock);
> [ 4011.796038] lock((console_sem).lock);
> [ 4011.796039]
> [ 4011.796039] *** DEADLOCK ***
Urgh, I hate this ^ bit of the lockdep output, it pretends to be
something useful, while it is the least useful part. Doubly so for
anything with more than 2 locks involved.
> The calling sequence was
> rmqueue_pcplist()
> => __rmqueue_pcplist()
> => rmqueue_bulk()
> => expand()
> => __add_to_free_list()
> => __warn_printk()
> => ...
> => console_trylock()
> => __down_trylock_console_sem()
> => down_trylock()
> => _raw_spin_lock_irqsave()
>
> Normally, a trylock call should avoid this kind of circular lock
> dependency splat, but down_trylock() is an exception. Fix this problem
> by using raw_spin_trylock_irqsave() in down_trylock() to make it behave
> like the other trylock calls.
>
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
> kernel/locking/semaphore.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/locking/semaphore.c b/kernel/locking/semaphore.c
> index 34bfae72f295..cb27cbf5162f 100644
> --- a/kernel/locking/semaphore.c
> +++ b/kernel/locking/semaphore.c
> @@ -127,6 +127,7 @@ EXPORT_SYMBOL(down_killable);
> *
> * NOTE: This return value is inverted from both spin_trylock and
> * mutex_trylock! Be careful about this when converting code.
> + * I.e. 0 on success, 1 on failure.
> *
> * Unlike mutex_trylock, this function can be used from interrupt context,
> * and the semaphore can be released by any task or interrupt.
> @@ -136,7 +137,8 @@ int __sched down_trylock(struct semaphore *sem)
> unsigned long flags;
> int count;
>
> - raw_spin_lock_irqsave(&sem->lock, flags);
> + if (!raw_spin_trylock_irqsave(&sem->lock, flags))
> + return 1;
> count = sem->count - 1;
> if (likely(count >= 0))
> sem->count = count;
Urgh, this is terrible *again*. Didn't you try and do something
similarly daft with rt_mutex_trylock? And you didn't learn from
that?
So please, start by actually explaining things.
* Re: [PATCH] locking/semaphore: Use raw_spin_trylock_irqsave() in down_trylock()
2025-01-21 8:08 ` Peter Zijlstra
@ 2025-01-21 23:10 ` Waiman Long
0 siblings, 0 replies; 3+ messages in thread
From: Waiman Long @ 2025-01-21 23:10 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Ingo Molnar, Will Deacon, Boqun Feng, linux-kernel
On 1/21/25 3:08 AM, Peter Zijlstra wrote:
> On Mon, Jan 20, 2025 at 02:36:08PM -0500, Waiman Long wrote:
>> A circular lock dependency splat has been seen with down_trylock().
>>
>> [ 4011.795602] ======================================================
>> [ 4011.795603] WARNING: possible circular locking dependency detected
>> [ 4011.795607] 6.12.0-41.el10.s390x+debug
>> [ 4011.795612] ------------------------------------------------------
>> [ 4011.795613] dd/32479 is trying to acquire lock:
>> [ 4011.795617] 0015a20accd0d4f8 ((console_sem).lock){-.-.}-{2:2}, at: down_trylock+0x26/0x90
>> [ 4011.795636]
>> [ 4011.795636] but task is already holding lock:
>> [ 4011.795637] 000000017e461698 (&zone->lock){-.-.}-{2:2}, at: rmqueue_bulk+0xac/0x8f0
>> [ 4011.795644]
>> [ 4011.795644] which lock already depends on the new lock.
>> :
>> [ 4011.796025] (console_sem).lock --> hrtimer_bases.lock --> &zone->lock
>> [ 4011.796025]
>> [ 4011.796029] Possible unsafe locking scenario:
>> [ 4011.796029]
>> [ 4011.796030] CPU0
>> [ 4011.796031] ----
>> [ 4011.796032] lock(&zone->lock);
>> [ 4011.796034] lock(hrtimer_bases.lock);
>> [ 4011.796036] lock(&zone->lock);
>> [ 4011.796038] lock((console_sem).lock);
>> [ 4011.796039]
>> [ 4011.796039] *** DEADLOCK ***
> Urgh, I hate this ^ bit of the lockdep output, it pretends to be
> something useful, while it is the least useful part. Doubly so for
> anything with more than 2 locks involved.
Sorry for not including other relevant information.
the existing dependency chain (in reverse order) is:
-> #4 (&zone->lock){-.-.}-{2:2}:
-> #3 (hrtimer_bases.lock){-.-.}-{2:2}:
-> #2 (&rq->__lock){-.-.}-{2:2}:
-> #1 (&p->pi_lock){-.-.}-{2:2}:
-> #0 ((console_sem).lock){-.-.}-{2:2}:
The last one is actually due to calling try_to_wake_up() while holding
the console_sem raw_spinlock. Another way to break the circular locking
dependency is to use a wake_q in the semaphore code so that the actual
wakeup happens after the lock has been released. I had proposed that in
the past, but you said that the problem might go away with the console
rewrite. Now that the console rewrite has been done as of the v6.12
kernel, the problem is still there.
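The wake_q idea can be sketched roughly like this (hypothetical userspace analogue; the names and the `struct task` type are illustrative, not the kernel's, and setting a flag stands in for try_to_wake_up()):

```c
#include <assert.h>
#include <stddef.h>

/* Deferred-wakeup sketch: instead of waking a task while still holding
 * the semaphore's internal lock, record it on a local list and wake it
 * only after the lock has been dropped, so the wakeup path cannot
 * create a lock dependency through the semaphore's lock. */

struct task { struct task *next; int woken; };

struct wake_q { struct task *first; struct task **lastp; };

static void wake_q_init(struct wake_q *q)
{
	q->first = NULL;
	q->lastp = &q->first;
}

/* Called with the semaphore lock held: just record the task. */
static void wake_q_add(struct wake_q *q, struct task *t)
{
	t->next = NULL;
	*q->lastp = t;
	q->lastp = &t->next;
}

/* Called after the lock is released: no lock is held here. */
static void wake_up_q(struct wake_q *q)
{
	struct task *t;

	for (t = q->first; t; t = t->next)
		t->woken = 1;		/* stands in for try_to_wake_up() */
	wake_q_init(q);
}
```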
>> The calling sequence was
>> rmqueue_pcplist()
>> => __rmqueue_pcplist()
>> => rmqueue_bulk()
>> => expand()
>> => __add_to_free_list()
>> => __warn_printk()
>> => ...
>> => console_trylock()
>> => __down_trylock_console_sem()
>> => down_trylock()
>> => _raw_spin_lock_irqsave()
>>
>> Normally, a trylock call should avoid this kind of circular lock
>> dependency splat, but down_trylock() is an exception. Fix this problem
>> by using raw_spin_trylock_irqsave() in down_trylock() to make it behave
>> like the other trylock calls.
>>
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>> kernel/locking/semaphore.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/locking/semaphore.c b/kernel/locking/semaphore.c
>> index 34bfae72f295..cb27cbf5162f 100644
>> --- a/kernel/locking/semaphore.c
>> +++ b/kernel/locking/semaphore.c
>> @@ -127,6 +127,7 @@ EXPORT_SYMBOL(down_killable);
>> *
>> * NOTE: This return value is inverted from both spin_trylock and
>> * mutex_trylock! Be careful about this when converting code.
>> + * I.e. 0 on success, 1 on failure.
>> *
>> * Unlike mutex_trylock, this function can be used from interrupt context,
>> * and the semaphore can be released by any task or interrupt.
>> @@ -136,7 +137,8 @@ int __sched down_trylock(struct semaphore *sem)
>> unsigned long flags;
>> int count;
>>
>> - raw_spin_lock_irqsave(&sem->lock, flags);
>> + if (!raw_spin_trylock_irqsave(&sem->lock, flags))
>> + return 1;
>> count = sem->count - 1;
>> if (likely(count >= 0))
>> sem->count = count;
> Urgh, this is terrible *again*. Didn't you try and do something
> similarly daft with rt_mutex_trylock? And you didn't learn from
> that?
I also have a bit of concern, as it is a change in behavior that may
have unintended consequences. I think using a wake_q in the semaphore
code would be the less risky choice. What do you think?
Cheers,
Longman