From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 29 Jan 2016 13:05:00 +0900
From: Sergey Senozhatsky
To: Byungchul Park
Cc: Sergey Senozhatsky, Peter Hurley, Sergey Senozhatsky,
	akpm@linux-foundation.org, mingo@kernel.org,
	linux-kernel@vger.kernel.org, akinobu.mita@gmail.com, jack@suse.cz,
	torvalds@linux-foundation.org
Subject: Re: [PATCH v4] lib/spinlock_debug.c: prevent a recursive cycle in the debug code
Message-ID: <20160129040500.GC4820@swordfish>
In-Reply-To: <20160129030036.GD31266@X58A-UD3R>
References: <000301d15985$7f416690$7dc433b0$@lge.com>
	<20160128060530.GC1834@swordfish>
	<20160128081313.GB31266@X58A-UD3R>
	<20160128104137.GA610@swordfish>
	<20160128105342.GB610@swordfish>
	<20160128154257.GA564@swordfish>
	<56AA9F63.9070600@hurleysoftware.com>
	<20160128235448.GC31266@X58A-UD3R>
	<20160129005406.GB4820@swordfish>
	<20160129030036.GD31266@X58A-UD3R>
X-Mailing-List: linux-kernel@vger.kernel.org

On (01/29/16 12:00), Byungchul Park wrote:
[..]
> > it took a while to even find out that you are reporting these issues
> > not against real H/W, but a qemu. I suppose qemu-arm running on an
> > x86_64 box.
>
> It does not matter what kind of box I used, because I only kept talking
> about the possibility. It does not depend on the box at all.

well, qemu completely invalidates all of the H/W theories - powered off,
etc. so in a way it's important.
> > on every spin_dump recursive call it waits for the spin_lock and when
> > it eventually grabs it, it does the job that it wanted to do under
> > that spin lock, unlocks it and returns back. and the only case when it
> > never "returns back" is when it never "eventually grabs it".
>
> Right. I missed it.

hm... we also can hit problems in the spin_unlock() path. AND there are
chances that spin_unlock() can explode WITHOUT any memory corruption in
sight, but due to a coding error... a theoretical one:

we do unlock logbuf_lock, and debug_spin_unlock() is performed on a
locked logbuf_lock spin_lock

static inline void debug_spin_unlock(raw_spinlock_t *lock)
{
	SPIN_BUG_ON(lock->magic != SPINLOCK_MAGIC, lock, "bad magic");
	SPIN_BUG_ON(!raw_spin_is_locked(lock), lock, "already unlocked");
	SPIN_BUG_ON(lock->owner != current, lock, "wrong owner");
	SPIN_BUG_ON(lock->owner_cpu != raw_smp_processor_id(),
						lock, "wrong CPU");
	lock->owner = SPINLOCK_OWNER_INIT;
	lock->owner_cpu = -1;
}

void do_raw_spin_unlock(raw_spinlock_t *lock)
{
	debug_spin_unlock(lock);
	arch_spin_unlock(&lock->raw_lock);
}

so if there was a coding error (schedule while atomic, or unlock from
another CPU) which resulted in a faulty

	lock->owner_cpu != raw_smp_processor_id()
OR
	lock->owner != current

then this will explode:

	printk
	 spin_lock
	  >> coding error <<
	   spin_unlock
	    printk
	     spin_lock
	      printk
	       spin_lock
	        printk
	         spin_lock
	          ... boom

vprintk_emit()'s recursion detection code will not work for logbuf_lock
here, because the only criterion by which vprintk_emit() can detect a
recursion is the static `logbuf_cpu', which is set to UINT_MAX right
before it calls raw_spin_unlock(&logbuf_lock). so from vprintk_emit()'s
POV the logbuf_lock is already unlocked. which is not true.

in case of memory corruption I don't think we must care; the 'coding
error case' is _probably/maybe_ something that can be improved, but I'm
not really 100% sure... and this still doesn't explain your
console_sem.lock case.

	-ss