* [PATCH RT] rtmutex: fix deadlock in spin_trylock
@ 2013-10-30 15:53 Cowell, Matt (NSN - US/Arlington Heights)
2013-11-06 17:39 ` Thomas Gleixner
0 siblings, 1 reply; 4+ messages in thread
From: Cowell, Matt (NSN - US/Arlington Heights) @ 2013-10-30 15:53 UTC (permalink / raw)
To: linux-rt-users@vger.kernel.org
Repost to linux-rt-users list, as there has been no response on the linux-kernel list and it missed several rt patch releases. Originally seen on 3.10.10-rt7.
There is a deadlock condition with RT spin_locks that may use trylock in
hardirq context. The main user of this is get_next_timer_interrupt, which
will deadlock if an IRQ preempts a timer wheel modification.
Since this is a trylock and if the wait_lock is already held it means that
some other user is either locking, about to wait for, or unlocking the
rtmutex, we should not wait and return failure anyway. This likely also
reduces latency.
Detected by spinlock lockup debug:
BUG: spinlock lockup suspected on CPU#1, ncp_nca_0_grp_5/133
lock: 0xee02a000, .magic: dead4ead, .owner: <none>/-1, .owner_cpu: -1
CPU: 1 PID: 133 Comm: ncp_nca_0_grp_5 Tainted: G O 3.10.10-rt7 #53
[<c041539c>] (unwind_backtrace+0x0/0xf8) from [<c0411914>] (show_stack+0x10/0x14)
[<c0411914>] (show_stack+0x10/0x14) from [<c06059b0>] (do_raw_spin_lock+0x100/0x164)
[<c06059b0>] (do_raw_spin_lock+0x100/0x164) from [<c07afbc0>] (rt_spin_trylock+0x14/0xc4)
[<c07afbc0>] (rt_spin_trylock+0x14/0xc4) from [<c04310fc>] (get_next_timer_interrupt+0x54/0x2b8)
[<c04310fc>] (get_next_timer_interrupt+0x54/0x2b8) from [<c0467c44>] (tick_nohz_stop_sched_tick+0x204/0x2e4)
[<c0467c44>] (tick_nohz_stop_sched_tick+0x204/0x2e4) from [<c0468534>] (tick_nohz_irq_exit+0x98/0xb8)
[<c0468534>] (tick_nohz_irq_exit+0x98/0xb8) from [<c042b2b8>] (irq_exit+0xc0/0x164)
[<c042b2b8>] (irq_exit+0xc0/0x164) from [<c04137ec>] (handle_IPI+0x9c/0x124)
[<c04137ec>] (handle_IPI+0x9c/0x124) from [<c04085b4>] (axxia_gic_handle_irq+0xac/0xec)
[<c04085b4>] (axxia_gic_handle_irq+0xac/0xec) from [<c07b0ac4>] (__irq_svc+0x44/0x8c)
Exception stack(0xed859df8 to 0xed859e40)
9de0: ee02a000 00000000
9e00: 00000000 00000001 ee02a000 ee02a000 ee02a000 ed858000 ee02a000 00000000
9e20: ed858000 00000000 ed858000 ed859e40 c07af9a8 c0605914 60000013 ffffffff
[<c07b0ac4>] (__irq_svc+0x44/0x8c) from [<c0605914>] (do_raw_spin_lock+0x64/0x164)
[<c0605914>] (do_raw_spin_lock+0x64/0x164) from [<c07af9a8>] (rt_spin_lock_slowunlock+0xc/0x68)
[<c07af9a8>] (rt_spin_lock_slowunlock+0xc/0x68) from [<c07adbb0>] (schedule_timeout+0x138/0x1fc)
[<c07adbb0>] (schedule_timeout+0x138/0x1fc) from [<bf067f0c>] (ncp_nca_thread_notifier+0xd0/0x12c [ncp])
[<bf067f0c>] (ncp_nca_thread_notifier+0xd0/0x12c [ncp]) from [<bf083cac>] (ncp_nca_thread+0x24/0x78 [ncp])
[<bf083cac>] (ncp_nca_thread+0x24/0x78 [ncp]) from [<c04440ec>] (kthread+0x9c/0xa4)
[<c04440ec>] (kthread+0x9c/0xa4) from [<c040ddb8>] (ret_from_fork+0x14/0x3c)
---
kernel/rtmutex.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/kernel/rtmutex.c b/kernel/rtmutex.c
index 5d76634..25ad02d 100644
--- a/kernel/rtmutex.c
+++ b/kernel/rtmutex.c
@@ -1057,7 +1057,9 @@ rt_mutex_slowtrylock(struct rt_mutex *lock)
{
int ret = 0;
- raw_spin_lock(&lock->wait_lock);
+ if (!raw_spin_trylock(&lock->wait_lock)) {
+ return 0;
+ }
init_lists(lock);
if (likely(rt_mutex_owner(lock) != current)) {
--
1.7.10.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH RT] rtmutex: fix deadlock in spin_trylock
2013-10-30 15:53 [PATCH RT] rtmutex: fix deadlock in spin_trylock Cowell, Matt (NSN - US/Arlington Heights)
@ 2013-11-06 17:39 ` Thomas Gleixner
2013-11-06 20:22 ` Matt Cowell
0 siblings, 1 reply; 4+ messages in thread
From: Thomas Gleixner @ 2013-11-06 17:39 UTC (permalink / raw)
To: Cowell, Matt (NSN - US/Arlington Heights); +Cc: linux-rt-users@vger.kernel.org
On Wed, 30 Oct 2013, Cowell, Matt (NSN - US/Arlington Heights) wrote:
> Repost to linux-rt-users list, as there has been no response on the linux-kernel list and it missed several rt patch releases. Originally seen on 3.10.10-rt7.
>
>
> There is a deadlock condition with RT spin_locks that may use trylock in
> hardirq context. The main user of this is get_next_timer_interrupt, which
> will deadlock if an IRQ preempts a timer wheel modification.
Errm. No.
> Detected by spinlock lockup debug:
> BUG: spinlock lockup suspected on CPU#1, ncp_nca_0_grp_5/133
> lock: 0xee02a000, .magic: dead4ead, .owner: <none>/-1, .owner_cpu: -1
> CPU: 1 PID: 133 Comm: ncp_nca_0_grp_5 Tainted: G O 3.10.10-rt7 #53
The cpu is in thread context and therefor not idle.
void tick_nohz_irq_exit(void)
{
if (!ts->inidle)
return;
....
}
So you are papering over the real issue:
Why is ts->inidle true, if the CPU is not in idle?
Thanks,
tglx
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH RT] rtmutex: fix deadlock in spin_trylock
2013-11-06 17:39 ` Thomas Gleixner
@ 2013-11-06 20:22 ` Matt Cowell
2013-11-16 13:12 ` Sebastian Andrzej Siewior
0 siblings, 1 reply; 4+ messages in thread
From: Matt Cowell @ 2013-11-06 20:22 UTC (permalink / raw)
To: ext Thomas Gleixner; +Cc: linux-rt-users@vger.kernel.org
>> There is a deadlock condition with RT spin_locks that may use trylock in
>> hardirq context. The main user of this is get_next_timer_interrupt, which
>> will deadlock if an IRQ preempts a timer wheel modification.
>
> Errm. No.
Are you saying that spin_trylock will/should not be called from hardirq
context, or that it is not possible for spin_trylock to deadlock in the
situation I described? With PREEMPT_RT_FULL and NO_HZ_FULL on v3.10,
spin_trylock will definitely will be called from hardirq context. This
seems like a valid usage to me as well, as an optimization to prevent
disabling IRQs and preemption in code that holds the timer spinlock.
Even if this code did not cause a deadlock, is there a reason that
rt_mutex_slowtrylock is using raw_spin_lock instead of raw_spin_trylock?
I would think that we would want to ensure that any *trylock call
would return as fast as possible without spinning or blocking.
> The cpu is in thread context and therefor not idle.
>
> void tick_nohz_irq_exit(void)
> {
> if (!ts->inidle)
> return;
> ....
> }
>
> So you are papering over the real issue:
>
> Why is ts->inidle true, if the CPU is not in idle?
The code you cited is from pre-v3.10 (without NO_HZ_FULL changes). This
was changed in commit 5811d99 (nohz: Prepare to stop the tick on irq
exit). When ts->idle is false (as it was in my trace), this code now
calls tick_nohz_full_stop_tick(). This leads to the following trace:
do_raw_spin_lock
rt_spin_trylock
spin_do_trylock
get_next_timer_interrupt
tick_nohz_stop_sched_tick
tick_nohz_full_stop_tick
tick_nohz_irq_exit
irq_exit
I see two ways to fix this:
1) The way I did in the patch to ensure that spin_trylock never can spin
on any locks internally.
2) Converting the timer base lock to a raw_spinlock_t so that it
actually disables hardirqs in lock_timer_base(). I don't think this
would be a good idea though.
Thanks,
-Matt
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH RT] rtmutex: fix deadlock in spin_trylock
2013-11-06 20:22 ` Matt Cowell
@ 2013-11-16 13:12 ` Sebastian Andrzej Siewior
0 siblings, 0 replies; 4+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-11-16 13:12 UTC (permalink / raw)
To: Matt Cowell; +Cc: ext Thomas Gleixner, linux-rt-users@vger.kernel.org
* Matt Cowell | 2013-11-06 14:22:18 [-0600]:
>I see two ways to fix this:
>1) The way I did in the patch to ensure that spin_trylock never can
>spin on any locks internally.
>2) Converting the timer base lock to a raw_spinlock_t so that it
>actually disables hardirqs in lock_timer_base(). I don't think this
>would be a good idea though.
A third one has been applied where the waiter lock is taken with irqs
off. The thread "CONFIG_NO_HZ_FULL + CONFIG_PREEMPT_RT_FULL = nogo" has
a patch for that.
>
>Thanks,
>-Matt
Sebastian
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2013-11-16 13:12 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-10-30 15:53 [PATCH RT] rtmutex: fix deadlock in spin_trylock Cowell, Matt (NSN - US/Arlington Heights)
2013-11-06 17:39 ` Thomas Gleixner
2013-11-06 20:22 ` Matt Cowell
2013-11-16 13:12 ` Sebastian Andrzej Siewior
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).