From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 29 Jan 2016 21:15:46 +0900
From: Byungchul Park <byungchul.park@lge.com>
To: Sergey Senozhatsky
Cc: Peter Hurley, Sergey Senozhatsky, akpm@linux-foundation.org,
	mingo@kernel.org, linux-kernel@vger.kernel.org, akinobu.mita@gmail.com,
	jack@suse.cz, torvalds@linux-foundation.org
Subject: Re: [PATCH v4] lib/spinlock_debug.c: prevent a recursive cycle in the debug code
Message-ID: <20160129121545.GH31266@X58A-UD3R>
References: <20160128060530.GC1834@swordfish>
 <20160128081313.GB31266@X58A-UD3R>
 <20160128104137.GA610@swordfish>
 <20160128105342.GB610@swordfish>
 <20160128154257.GA564@swordfish>
 <56AA9F63.9070600@hurleysoftware.com>
 <20160128235448.GC31266@X58A-UD3R>
 <20160129005406.GB4820@swordfish>
 <20160129030036.GD31266@X58A-UD3R>
 <20160129040500.GC4820@swordfish>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160129040500.GC4820@swordfish>
User-Agent: Mutt/1.5.21 (2010-09-15)
List-ID: X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jan 29, 2016 at 01:05:00PM +0900, Sergey Senozhatsky wrote:
> then this will explode:
>
> printk
>  spin_lock
>   >> coding error <<
>  spin_unlock
>  printk
>   spin_lock
>    printk
>     spin_lock
>      printk
>       spin_lock
>       ... boom
>
> vprintk_emit() recursion detection code will not work for logbuf_lock here.
> because the only criterion by which vprintk_emit() can detect a recursion
> is the static `logbuf_cpu', which is set to UINT_MAX right before it does
> raw_spin_unlock(&logbuf_lock). so from vprintk_emit()'s POV the logbuf_lock
> is already unlocked. which is not true.
>
> in case of memory corruption I don't think we must care, the 'coding error
> case' is _probably/maybe_ something that can be improved, but I'm not really
> 100% sure... and this still doesn't explain your console_sem.lock case.

Hello,

I found a case where this bad thing can happen. So it struck me that we
need a patch similar to my v3 patch, even though the handling of
logbuf_lock in the v3 patch may not be necessary any more.

cpu0
====
printk
  console_trylock
  console_unlock
    up_console_sem
      up
        raw_spin_lock_irqsave(&sem->lock, flags)
        __up
          wake_up_process
            try_to_wake_up
              raw_spin_lock_irqsave(&p->pi_lock)
                __spin_lock_debug
                  spin_dump // once it happened
                    printk
                      console_trylock
                        raw_spin_lock_irqsave(&sem->lock, flags) <=== DEADLOCK

cpu1
====
printk
  console_trylock
    raw_spin_lock_irqsave(&sem->lock, flags)
      __spin_lock_debug
        spin_dump
          printk
            ... <=== repeats the recursive cycle infinitely

This was my v3 patch.

-----8<-----
>From 92c84ea45ac18010804aa09eeb9e03f797a4b2b0 Mon Sep 17 00:00:00 2001
From: Byungchul Park <byungchul.park@lge.com>
Date: Wed, 27 Jan 2016 13:33:24 +0900
Subject: [PATCH v3] lib/spinlock_debug.c: prevent an infinite recursive cycle
 in spin_dump()

With CONFIG_DEBUG_SPINLOCK enabled, spin_dump() can cause an infinite
recursive cycle. The backtrace prints printk() -> console_trylock() ->
do_raw_spin_lock() -> spin_dump() -> printk()... infinitely.

When spin_dump() is called from printk(), we should prevent the debug
spinlock code from calling printk() again in that context. It is
reasonable to skip printing "lockup suspected", which is just a warning
message, when printing it would definitely cause a real lockup.
However, this patch does not deal with spin_bug(), since suppressing the
message there would not help at all: reaching spin_bug() nearly always
means a real lockup has already happened.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 kernel/locking/spinlock_debug.c | 16 +++++++++++++---
 kernel/printk/printk.c          |  6 ++++++
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/spinlock_debug.c b/kernel/locking/spinlock_debug.c
index 0374a59..fefc76c 100644
--- a/kernel/locking/spinlock_debug.c
+++ b/kernel/locking/spinlock_debug.c
@@ -103,6 +103,8 @@ static inline void debug_spin_unlock(raw_spinlock_t *lock)
 	lock->owner_cpu = -1;
 }
 
+extern int is_printk_lock(raw_spinlock_t *lock);
+
 static void __spin_lock_debug(raw_spinlock_t *lock)
 {
 	u64 i;
@@ -113,11 +115,19 @@ static void __spin_lock_debug(raw_spinlock_t *lock)
 			return;
 		__delay(1);
 	}
-	/* lockup suspected: */
-	spin_dump(lock, "lockup suspected");
+
+	/*
+	 * If this function is called from printk(), then we should
+	 * not call printk() more. Or it will cause an infinite
+	 * recursive cycle!
+	 */
+	if (likely(!is_printk_lock(lock))) {
+		/* lockup suspected: */
+		spin_dump(lock, "lockup suspected");
 #ifdef CONFIG_SMP
-	trigger_all_cpu_backtrace();
+		trigger_all_cpu_backtrace();
 #endif
+	}
 
 	/*
 	 * The trylock above was causing a livelock. Give the lower level arch
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 2ce8826..657f8dd 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1981,6 +1981,12 @@ asmlinkage __visible void early_printk(const char *fmt, ...)
 }
 #endif
 
+int is_printk_lock(raw_spinlock_t *lock)
+{
+	return (lock == &console_sem.lock) ||
+	       (lock == &logbuf_lock);
+}
+
 static int __add_preferred_console(char *name, int idx, char *options,
 				   char *brl_options)
 {
-- 
1.9.1