Date: Tue, 2 Feb 2016 18:00:52 +0900
From: Byungchul Park
To: Ingo Molnar
Cc: willy@linux.intel.com, akpm@linux-foundation.org,
	linux-kernel@vger.kernel.org, akinobu.mita@gmail.com, jack@suse.cz,
	sergey.senozhatsky.work@gmail.com, peter@hurleysoftware.com,
	torvalds@linux-foundation.org, Peter Zijlstra, Thomas Gleixner
Subject: Re: [PATCH] lock/semaphore: Avoid a deadlock within __up()
Message-ID: <20160202090052.GH29804@X58A-UD3R>
References: <1454397268-6022-1-git-send-email-byungchul.park@lge.com>
	<20160202081355.GA30393@gmail.com>
In-Reply-To: <20160202081355.GA30393@gmail.com>

On Tue, Feb 02, 2016 at 09:13:55AM +0100, Ingo Molnar wrote:
>
> * Byungchul Park wrote:
>
> > Since I faced an infinite recursive printk() bug, I've tried to propose
> > patches the title of which is "lib/spinlock_debug.c: prevent a recursive
> > cycle in the debug code". But I noticed the root problem cannot be fixed
> > by that, through some discussion thanks to Sergey and Peter. So I focused
> > on preventing the DEADLOCK.
> >
> > -----8<-----
> > From 94a66990677735459a7790b637179d8600479639 Mon Sep 17 00:00:00 2001
> > From: Byungchul Park
> > Date: Tue, 2 Feb 2016 15:35:48 +0900
> > Subject: [PATCH] lock/semaphore: Avoid a deadlock within __up()
> >
> > When the semaphore __up() is called from within printk() with
> > console_sem.lock, a DEADLOCK can happen, since the wake_up_process() can
> > call printk() again, esp. if defined CONFIG_DEBUG_SPINLOCK. And the
> > wake_up_process() don't need to be within a critical section.
> >
> > The scenario the bad thing can happen is,
> >
> > printk
> >   console_trylock
> >   console_unlock
> >     up_console_sem
> >       up
> >         raw_spin_lock_irqsave(&sem->lock, flags)
> >         __up
> >           wake_up_process
> >             try_to_wake_up
> >               raw_spin_lock_irqsave(&p->pi_lock)
> >                 __spin_lock_debug
> >                   spin_dump
> >                     printk
> >                       console_trylock
> >                         raw_spin_lock_irqsave(&sem->lock, flags)
> >
> > *** DEADLOCK ***
> >
> > Signed-off-by: Byungchul Park
> > ---
> >  kernel/locking/semaphore.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/kernel/locking/semaphore.c b/kernel/locking/semaphore.c
> > index b8120ab..d3a28dc 100644
> > --- a/kernel/locking/semaphore.c
> > +++ b/kernel/locking/semaphore.c
> > @@ -259,5 +259,14 @@ static noinline void __sched __up(struct semaphore *sem)
> >  						struct semaphore_waiter, list);
> >  	list_del(&waiter->list);
> >  	waiter->up = true;
> > +
> > +	/*
> > +	 * Trying to acquire this sem->lock in wake_up_process() leads a
> > +	 * DEADLOCK unless we unlock it here. For example, it's possile
> > +	 * in the case that called from within printk() since
> > +	 * wake_up_process() might call printk().
> > +	 */
> > +	raw_spin_unlock_irq(&sem->lock);
> >  	wake_up_process(waiter->task);
> > +	raw_spin_lock_irq(&sem->lock);
>
> So I'm pretty sad about this solution, as it penalizes every semaphore user -

Yeah... That was on my mind. Then... What about this alternative?

before
======
up
  spin_lock
  add count
  __up
    wake_up_process
  spin_unlock

this patch
==========
up
  spin_lock
  add count
  __up
    spin_unlock
    wake_up_process
    spin_lock
  spin_unlock

alternative
===========
up
  spin_lock
  add count
  spin_unlock
  wake_up_process

This alternative does not add any overhead and seems reasonable, doesn't
it? The reason I proposed these patches, including this alternative, is that
I think the current code defines the critical section wider than it needs
to be.

> while the deadlock is a really obscure one occurring within the scheduler or a
> console driver, which are very narrow code paths!
>
> (Also, please don't shout in comments, unless there's some really good reason to
> do it.)

Do you mean the upper case, i.e. DEADLOCK? Okay, I will keep that in mind.

>
> Why doesn't spin_dump() break the console lock instead, if it detects that it's
> spinning on it, before doing the printk()? It's a likely fail state anyway - and
> this way we push any intrusive debug oriented action towards the unlikely fail
> state.
>
> Alternatively: why not improve down_trylock() to be lockless? The main reason for
> the lockup is that a trylock op takes the semaphore spinlock unconditionally.
> Which is fine for legacy code, but could perhaps be improved upon - I think we
> could in fact do it without turning sem->count into atomics.
>
> Alternatively #2: move printk() away from semaphores - it's pretty special code
> anyway and semaphore semantics are far from obvious.
>

Thank you for your advice; these approaches also look good. Could you answer
my question about the alternative above? If you don't think it is
reasonable, I will try what you advised.

Thanks,
Byungchul

> Thanks,
>
> Ingo
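
For illustration only, the "alternative" ordering (update the state under the
lock, drop the lock, then issue the wakeup) can be modelled in userspace C
roughly as below. This is a sketch under assumptions, not kernel code: the
model_* names are hypothetical, a pthread mutex stands in for sem->lock, and
a condition variable owned by the semaphore stands in for wake_up_process().
One thing to double-check in the real code: as far as I can tell the waiter
structure lives on the sleeper's stack, so nothing in it should be touched
after the lock is dropped. The model avoids that by signalling through the
semaphore itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; not the kernel's struct semaphore. */
struct model_waiter {
	struct model_waiter *next;
	bool up;			/* set by model_up() under the lock */
};

struct model_sem {
	pthread_mutex_t lock;		/* stands in for sem->lock */
	pthread_cond_t wake;		/* stands in for wake_up_process() */
	unsigned int count;
	struct model_waiter *head, *tail;	/* FIFO wait list */
};

static void model_sem_init(struct model_sem *sem, unsigned int count)
{
	pthread_mutex_init(&sem->lock, NULL);
	pthread_cond_init(&sem->wake, NULL);
	sem->count = count;
	sem->head = sem->tail = NULL;
}

static void model_down(struct model_sem *sem)
{
	struct model_waiter w = { .next = NULL, .up = false };

	pthread_mutex_lock(&sem->lock);
	if (sem->count > 0) {
		sem->count--;
		pthread_mutex_unlock(&sem->lock);
		return;
	}
	/* Queue ourselves and sleep until model_up() marks us as up. */
	if (sem->tail)
		sem->tail->next = &w;
	else
		sem->head = &w;
	sem->tail = &w;
	while (!w.up)
		pthread_cond_wait(&sem->wake, &sem->lock);
	pthread_mutex_unlock(&sem->lock);
}

static void model_up(struct model_sem *sem)
{
	bool need_wake = false;

	pthread_mutex_lock(&sem->lock);
	if (!sem->head) {
		sem->count++;
	} else {
		/* Dequeue the first waiter while still holding the lock. */
		struct model_waiter *w = sem->head;

		sem->head = w->next;
		if (!sem->head)
			sem->tail = NULL;
		w->up = true;
		need_wake = true;
	}
	pthread_mutex_unlock(&sem->lock);

	/* The wakeup is issued outside the critical section. */
	if (need_wake)
		pthread_cond_broadcast(&sem->wake);
}

/* Thread entry point so a waiter can be started with pthread_create(). */
static void *model_down_thread(void *arg)
{
	model_down(arg);
	return NULL;
}
```

The broadcast wakes every sleeper, but only the one whose up flag was set
leaves its wait loop; the others go back to sleep. That is cruder than waking
one specific task, and is only there to keep the model small.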