* [PATCH] lock/semaphore: Avoid a deadlock within __up() @ 2016-02-02 7:14 Byungchul Park 2016-02-02 8:13 ` Ingo Molnar 0 siblings, 1 reply; 3+ messages in thread From: Byungchul Park @ 2016-02-02 7:14 UTC (permalink / raw) To: willy, akpm, mingo Cc: linux-kernel, akinobu.mita, jack, sergey.senozhatsky.work, peter, torvalds Since I faced a infinite recursive printk() bug, I've tried to propose patches the title of which is "lib/spinlock_debug.c: prevent a recursive cycle in the debug code". But I noticed the root problem cannot be fixed by that, through some discussion thanks to Sergey and Peter. So I focused on preventing the DEADLOCK. -----8<----- >From 94a66990677735459a7790b637179d8600479639 Mon Sep 17 00:00:00 2001 From: Byungchul Park <byungchul.park@lge.com> Date: Tue, 2 Feb 2016 15:35:48 +0900 Subject: [PATCH] lock/semaphore: Avoid a deadlock within __up() When the semaphore __up() is called from within printk() with console_sem.lock, a DEADLOCK can happen, since the wake_up_process() can call printk() again, esp. if defined CONFIG_DEBUG_SPINLOCK. And the wake_up_process() don't need to be within a critical section. The scenario the bad thing can happen is, printk console_trylock console_unlock up_console_sem up raw_spin_lock_irqsave(&sem->lock, flags) __up wake_up_process try_to_wake_up raw_spin_lock_irqsave(&p->pi_lock) __spin_lock_debug spin_dump printk console_trylock raw_spin_lock_irqsave(&sem->lock, flags) *** DEADLOCK *** Signed-off-by: Byungchul Park <byungchul.park@lge.com> --- kernel/locking/semaphore.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/kernel/locking/semaphore.c b/kernel/locking/semaphore.c index b8120ab..d3a28dc 100644 --- a/kernel/locking/semaphore.c +++ b/kernel/locking/semaphore.c @@ -259,5 +259,14 @@ static noinline void __sched __up(struct semaphore *sem) struct semaphore_waiter, list); list_del(&waiter->list); waiter->up = true; + + /* + * Trying to acquire this sem->lock in wake_up_process() leads a + * DEADLOCK unless we unlock it here. For example, it's possile + * in the case that called from within printk() since + * wake_up_process() might call printk(). + */ + raw_spin_unlock_irq(&sem->lock); wake_up_process(waiter->task); + raw_spin_lock_irq(&sem->lock); } -- 1.9.1 ^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] lock/semaphore: Avoid a deadlock within __up() 2016-02-02 7:14 [PATCH] lock/semaphore: Avoid a deadlock within __up() Byungchul Park @ 2016-02-02 8:13 ` Ingo Molnar 2016-02-02 9:00 ` Byungchul Park 0 siblings, 1 reply; 3+ messages in thread From: Ingo Molnar @ 2016-02-02 8:13 UTC (permalink / raw) To: Byungchul Park Cc: willy, akpm, linux-kernel, akinobu.mita, jack, sergey.senozhatsky.work, peter, torvalds, Peter Zijlstra, Thomas Gleixner * Byungchul Park <byungchul.park@lge.com> wrote: > Since I faced a infinite recursive printk() bug, I've tried to propose > patches the title of which is "lib/spinlock_debug.c: prevent a recursive > cycle in the debug code". But I noticed the root problem cannot be fixed > by that, through some discussion thanks to Sergey and Peter. So I focused > on preventing the DEADLOCK. > > -----8<----- > From 94a66990677735459a7790b637179d8600479639 Mon Sep 17 00:00:00 2001 > From: Byungchul Park <byungchul.park@lge.com> > Date: Tue, 2 Feb 2016 15:35:48 +0900 > Subject: [PATCH] lock/semaphore: Avoid a deadlock within __up() > > When the semaphore __up() is called from within printk() with > console_sem.lock, a DEADLOCK can happen, since the wake_up_process() can > call printk() again, esp. if defined CONFIG_DEBUG_SPINLOCK. And the > wake_up_process() don't need to be within a critical section. > > The scenario the bad thing can happen is, > > printk > console_trylock > console_unlock > up_console_sem > up > raw_spin_lock_irqsave(&sem->lock, flags) > __up > wake_up_process > try_to_wake_up > raw_spin_lock_irqsave(&p->pi_lock) > __spin_lock_debug > spin_dump > printk > console_trylock > raw_spin_lock_irqsave(&sem->lock, flags) > > *** DEADLOCK *** > > Signed-off-by: Byungchul Park <byungchul.park@lge.com> > --- > kernel/locking/semaphore.c | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git a/kernel/locking/semaphore.c b/kernel/locking/semaphore.c > index b8120ab..d3a28dc 100644 > --- a/kernel/locking/semaphore.c > +++ b/kernel/locking/semaphore.c > @@ -259,5 +259,14 @@ static noinline void __sched __up(struct semaphore *sem) > struct semaphore_waiter, list); > list_del(&waiter->list); > waiter->up = true; > + > + /* > + * Trying to acquire this sem->lock in wake_up_process() leads a > + * DEADLOCK unless we unlock it here. For example, it's possile > + * in the case that called from within printk() since > + * wake_up_process() might call printk(). > + */ > + raw_spin_unlock_irq(&sem->lock); > wake_up_process(waiter->task); > + raw_spin_lock_irq(&sem->lock); So I'm pretty sad about this solution, as it penalizes every semaphore user - while the deadlock is a really obscure one occuring within the scheduler or a console driver, which are very narrow code paths! (Also, please don't shout in comments, unless there's some really good reason to do it.) Why doesn't spin_dump() break the console lock instead, if it detects that it's spinning on it, before doing the printk()? It's a likely fail state anyway - and this way we push any intrusive debug oriented action towards the unlikely fail state. Alternatively: why not improve down_trylock() to be lockless? The main reason for the lockup is that a trylock op takes the semaphore spinlock unconditionally. Which is fine for legacy code, but could perhaps be improved upon - I think we could in fact do it without turning sem->count into atomics. Alternatively #2: move printk() away from semaphores - it's pretty special code anyway and semaphore semanthics are far from obvious. Thanks, Ingo ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] lock/semaphore: Avoid a deadlock within __up() 2016-02-02 8:13 ` Ingo Molnar @ 2016-02-02 9:00 ` Byungchul Park 0 siblings, 0 replies; 3+ messages in thread From: Byungchul Park @ 2016-02-02 9:00 UTC (permalink / raw) To: Ingo Molnar Cc: willy, akpm, linux-kernel, akinobu.mita, jack, sergey.senozhatsky.work, peter, torvalds, Peter Zijlstra, Thomas Gleixner On Tue, Feb 02, 2016 at 09:13:55AM +0100, Ingo Molnar wrote: > > * Byungchul Park <byungchul.park@lge.com> wrote: > > > Since I faced a infinite recursive printk() bug, I've tried to propose > > patches the title of which is "lib/spinlock_debug.c: prevent a recursive > > cycle in the debug code". But I noticed the root problem cannot be fixed > > by that, through some discussion thanks to Sergey and Peter. So I focused > > on preventing the DEADLOCK. > > > > -----8<----- > > From 94a66990677735459a7790b637179d8600479639 Mon Sep 17 00:00:00 2001 > > From: Byungchul Park <byungchul.park@lge.com> > > Date: Tue, 2 Feb 2016 15:35:48 +0900 > > Subject: [PATCH] lock/semaphore: Avoid a deadlock within __up() > > > > When the semaphore __up() is called from within printk() with > > console_sem.lock, a DEADLOCK can happen, since the wake_up_process() can > > call printk() again, esp. if defined CONFIG_DEBUG_SPINLOCK. And the > > wake_up_process() don't need to be within a critical section. > > > > The scenario the bad thing can happen is, > > > > printk > > console_trylock > > console_unlock > > up_console_sem > > up > > raw_spin_lock_irqsave(&sem->lock, flags) > > __up > > wake_up_process > > try_to_wake_up > > raw_spin_lock_irqsave(&p->pi_lock) > > __spin_lock_debug > > spin_dump > > printk > > console_trylock > > raw_spin_lock_irqsave(&sem->lock, flags) > > > > *** DEADLOCK *** > > > > Signed-off-by: Byungchul Park <byungchul.park@lge.com> > > --- > > kernel/locking/semaphore.c | 9 +++++++++ > > 1 file changed, 9 insertions(+) > > > > diff --git a/kernel/locking/semaphore.c b/kernel/locking/semaphore.c > > index b8120ab..d3a28dc 100644 > > --- a/kernel/locking/semaphore.c > > +++ b/kernel/locking/semaphore.c > > @@ -259,5 +259,14 @@ static noinline void __sched __up(struct semaphore *sem) > > struct semaphore_waiter, list); > > list_del(&waiter->list); > > waiter->up = true; > > + > > + /* > > + * Trying to acquire this sem->lock in wake_up_process() leads a > > + * DEADLOCK unless we unlock it here. For example, it's possile > > + * in the case that called from within printk() since > > + * wake_up_process() might call printk(). > > + */ > > + raw_spin_unlock_irq(&sem->lock); > > wake_up_process(waiter->task); > > + raw_spin_lock_irq(&sem->lock); > > So I'm pretty sad about this solution, as it penalizes every semaphore user - Yeh... That was on my mind. Then... What about this alternative? before ====== up spin_lock add count __up wake_up_process spin_unlock thispatch ========= up spin_lock add count __up spin_unlock wake_up_process spin_lock spin_unlock alternative =========== up spin_lock add count spin_unlock wake_up_process This alternative does not have additional overhead and seems to be reasonable, doesn't it? The reason why I proposed patches like this including this alternative is that I thought it define the critical section wider than it needs. > while the deadlock is a really obscure one occuring within the scheduler or a > console driver, which are very narrow code paths! > > (Also, please don't shout in comments, unless there's some really good reason to > do it.) Do you mean the upper case e.i. DEADLOCK? Okay I will keep in mind. > > Why doesn't spin_dump() break the console lock instead, if it detects that it's > spinning on it, before doing the printk()? It's a likely fail state anyway - and > this way we push any intrusive debug oriented action towards the unlikely fail > state. > > Alternatively: why not improve down_trylock() to be lockless? The main reason for > the lockup is that a trylock op takes the semaphore spinlock unconditionally. > Which is fine for legacy code, but could perhaps be improved upon - I think we > could in fact do it without turning sem->count into atomics. > > Alternatively #2: move printk() away from semaphores - it's pretty special code > anyway and semaphore semanthics are far from obvious. > Thank you for your advice, and these approaches also look good. Could you answer my question? If you don't think so, I will try it as you advised. Thanks, Byungchul > Thanks, > > Ingo ^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2016-02-02 9:01 UTC | newest] Thread overview: 3+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-02-02 7:14 [PATCH] lock/semaphore: Avoid a deadlock within __up() Byungchul Park 2016-02-02 8:13 ` Ingo Molnar 2016-02-02 9:00 ` Byungchul Park
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox