From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751616AbcA2AZx (ORCPT ); Thu, 28 Jan 2016 19:25:53 -0500 Received: from mail-pf0-f193.google.com ([209.85.192.193]:34576 "EHLO mail-pf0-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751126AbcA2AZu (ORCPT ); Thu, 28 Jan 2016 19:25:50 -0500 Date: Fri, 29 Jan 2016 09:27:03 +0900 From: Sergey Senozhatsky To: Peter Hurley Cc: Sergey Senozhatsky , Byungchul Park , akpm@linux-foundation.org, mingo@kernel.org, linux-kernel@vger.kernel.org, akinobu.mita@gmail.com, jack@suse.cz, torvalds@linux-foundation.org, Sergey Senozhatsky Subject: Re: [PATCH v4] lib/spinlock_debug.c: prevent a recursive cycle in the debug code Message-ID: <20160129002703.GA4820@swordfish> References: <1453896061-14102-1-git-send-email-byungchul.park@lge.com> <20160128014253.GC1538@X58A-UD3R> <20160128023750.GB1834@swordfish> <000301d15985$7f416690$7dc433b0$@lge.com> <20160128060530.GC1834@swordfish> <20160128081313.GB31266@X58A-UD3R> <20160128104137.GA610@swordfish> <20160128105342.GB610@swordfish> <20160128154257.GA564@swordfish> <56AA9F63.9070600@hurleysoftware.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <56AA9F63.9070600@hurleysoftware.com> User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On (01/28/16 15:08), Peter Hurley wrote: [..] > > even if at some level of recursion (nested printk calls) > > spin_dump()->__spin_lock_debug()->arch_spin_trylock() acquires the > > lock, it returns back with the spin lock unlocked anyway. > > > > vprintk_emit() > > console_trylock() > > spin_lock() > > spin_dump() > > vprintk_emit() > > console_trylock() > > spin_lock() > > spin_dump() > > vprintk_emit() > > console_trylock() > > spin_lock() << OK, got the lock finally > > The problem is you have postulated a very shallow recursion. > This looks much worse if this happens 1000 times, and > probably won't recover to output anything. well, the stack is surely limited, but on every spin_dump()->spin_lock() recursive call it does another round of u64 loops = loops_per_jiffy * HZ; for (i = 0; i < loops; i++) { if (arch_spin_trylock(&lock->raw_lock)) return; __delay(1); } so if you have 1000 spin_dump()->spin_lock() then, well, something has been holding the lock for '1000 * loops_per_jiffy * HZ'. and in particularly this case that somethign was holding the spin lock doing trivial operations like count = sem->count - 1; if (likely(count >= 0)) sem->count = count; (or a bit more if it was in down()). but still. and it's kinda hard to imagine console_sem lock being soooooooo congested and unfair. on each given point of time in the worst case there are `num_online_cpus() - 1' cpus spinning on that spin_lock and 1 cpu holding that spinlock. which in Byungchul's case is, what, 3 spinning cpus, or 7 spinnign cpus?... > Additionally, what if the console_sem is simply corrupted? > A livelock with no output ever is not very helpful. if it's corrupted then this is not a spinlock debug problem. at all. > As I wrote earlier, I don't think this is the way to fix > recursion problems with printk() [by eliding output]. > > Rather, a way to effectively determine a recursion is in progress, > and _at a minimum_ guaranteeing that the recursive output will > eventually be output should be the goal. > > Including dumb recursion like a console driver printing > an error :/ this is not a case of printk recursion and it should be handled just fine. console drivers are called under console_sem only. logbuf lock is unlocked. vprintk_emit() adds message to the logbuf, calls console_trylock() (which of course does not lock anything) and returns back to console_driver code. the only case when we really have a printk recursion is when someone calls printk() from within the vprintk_emit() logbuf_lock area. print() spin_lock logbuf printk() spin_lock logbuf <<< recursion spin_unlock logbuf -ss > Then, lockdep could remain enabled while calling console drivers. > > Regards, > Peter Hurley > > > sem->count-- > > spin_unlock() << unlock, return > > arch_spin_lock() << got the lock, return > > sem->count-- > > spin_unlock() << unlock, return > > arch_spin_lock() << got the lock, return > > sem->count-- > > spin_unlock() << unlock, return > > > > > > ...um > > > > > >> But I found there's a possiblity in the debug code *itself* to cause a > >> lockup. > > > > please explain. > > > > -ss > > >