From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-x22a.google.com (mail-wi0-x22a.google.com [IPv6:2a00:1450:400c:c05::22a]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id D6CCD1A025A for ; Tue, 24 Feb 2015 18:27:26 +1100 (AEDT) Received: by mail-wi0-f170.google.com with SMTP id ex7so2473947wid.1 for ; Mon, 23 Feb 2015 23:27:23 -0800 (PST) Sender: Ingo Molnar Date: Tue, 24 Feb 2015 08:27:19 +0100 From: Ingo Molnar To: Anton Blanchard Subject: Re: [PATCH 1/7] Add die_spin_lock_{irqsave,irqrestore} Message-ID: <20150224072719.GB15894@gmail.com> References: <1424748634-9153-1-git-send-email-anton@samba.org> <1424748634-9153-2-git-send-email-anton@samba.org> <20150224064107.GB15387@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii In-Reply-To: <20150224064107.GB15387@gmail.com> Cc: Don Zickus , x86@kernel.org, Russell King , Peter Zijlstra , peterz@infradead.org, hpa@zytor.com, linux-kernel@vger.kernel.org, Steven Rostedt , Linus Torvalds , Ingo Molnar , Paul Mackerras , linux-arm-kernel@lists.infradead.org, Andrew Morton , linuxppc-dev@lists.ozlabs.org, Thomas Gleixner , sam.bobroff@au1.ibm.com, Arjan van de Ven List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , * Ingo Molnar wrote: > > * Anton Blanchard wrote: > > > +static arch_spinlock_t die_lock = __ARCH_SPIN_LOCK_UNLOCKED; > > +static int die_owner = -1; > > +static unsigned int die_nest_count; > > + > > +unsigned long __die_spin_lock_irqsave(void) > > +{ > > + unsigned long flags; > > + int cpu; > > + > > + /* racy, but better than risking deadlock. */ > > + raw_local_irq_save(flags); > > + > > + cpu = smp_processor_id(); > > + if (!arch_spin_trylock(&die_lock)) { > > + if (cpu != die_owner) > > + arch_spin_lock(&die_lock); > > So why not trylock and time out here after a few seconds, > instead of indefinitely supressing some potentially vital > output due to some other CPU crashing/locking with the lock > held? [...] > If we fix the deadlock potential, and get a true global > ordering of various oopses/warnings as they triggered (or > at least timestamping them), [...] If we had a global 'trouble counter' we could use that to refine the spin-looping timeout: instead of using a pure timeout of a few seconds, we could say 'a timeout of a few seconds while the counter does not increase'. I.e. only override the locking/ordering if the owner CPU does not seem to be able to make progress with printing the oops/warning. Thanks, Ingo From mboxrd@z Thu Jan 1 00:00:00 1970 From: mingo@kernel.org (Ingo Molnar) Date: Tue, 24 Feb 2015 08:27:19 +0100 Subject: [PATCH 1/7] Add die_spin_lock_{irqsave,irqrestore} In-Reply-To: <20150224064107.GB15387@gmail.com> References: <1424748634-9153-1-git-send-email-anton@samba.org> <1424748634-9153-2-git-send-email-anton@samba.org> <20150224064107.GB15387@gmail.com> Message-ID: <20150224072719.GB15894@gmail.com> To: linux-arm-kernel@lists.infradead.org List-Id: linux-arm-kernel.lists.infradead.org * Ingo Molnar wrote: > > * Anton Blanchard wrote: > > > +static arch_spinlock_t die_lock = __ARCH_SPIN_LOCK_UNLOCKED; > > +static int die_owner = -1; > > +static unsigned int die_nest_count; > > + > > +unsigned long __die_spin_lock_irqsave(void) > > +{ > > + unsigned long flags; > > + int cpu; > > + > > + /* racy, but better than risking deadlock. */ > > + raw_local_irq_save(flags); > > + > > + cpu = smp_processor_id(); > > + if (!arch_spin_trylock(&die_lock)) { > > + if (cpu != die_owner) > > + arch_spin_lock(&die_lock); > > So why not trylock and time out here after a few seconds, > instead of indefinitely supressing some potentially vital > output due to some other CPU crashing/locking with the lock > held? [...] > If we fix the deadlock potential, and get a true global > ordering of various oopses/warnings as they triggered (or > at least timestamping them), [...] If we had a global 'trouble counter' we could use that to refine the spin-looping timeout: instead of using a pure timeout of a few seconds, we could say 'a timeout of a few seconds while the counter does not increase'. I.e. only override the locking/ordering if the owner CPU does not seem to be able to make progress with printing the oops/warning. Thanks, Ingo From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751866AbbBXH1Z (ORCPT ); Tue, 24 Feb 2015 02:27:25 -0500 Received: from mail-we0-f171.google.com ([74.125.82.171]:46249 "EHLO mail-we0-f171.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750776AbbBXH1Y (ORCPT ); Tue, 24 Feb 2015 02:27:24 -0500 Date: Tue, 24 Feb 2015 08:27:19 +0100 From: Ingo Molnar To: Anton Blanchard Cc: Andrew Morton , Steven Rostedt , Michael Ellerman , Paul Mackerras , Benjamin Herrenschmidt , sam.bobroff@au1.ibm.com, Thomas Gleixner , Ingo Molnar , hpa@zytor.com, Russell King , peterz@infradead.org, Don Zickus , linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, x86@kernel.org, Linus Torvalds , Arjan van de Ven , Peter Zijlstra Subject: Re: [PATCH 1/7] Add die_spin_lock_{irqsave,irqrestore} Message-ID: <20150224072719.GB15894@gmail.com> References: <1424748634-9153-1-git-send-email-anton@samba.org> <1424748634-9153-2-git-send-email-anton@samba.org> <20150224064107.GB15387@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150224064107.GB15387@gmail.com> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org * Ingo Molnar wrote: > > * Anton Blanchard wrote: > > > +static arch_spinlock_t die_lock = __ARCH_SPIN_LOCK_UNLOCKED; > > +static int die_owner = -1; > > +static unsigned int die_nest_count; > > + > > +unsigned long __die_spin_lock_irqsave(void) > > +{ > > + unsigned long flags; > > + int cpu; > > + > > + /* racy, but better than risking deadlock. */ > > + raw_local_irq_save(flags); > > + > > + cpu = smp_processor_id(); > > + if (!arch_spin_trylock(&die_lock)) { > > + if (cpu != die_owner) > > + arch_spin_lock(&die_lock); > > So why not trylock and time out here after a few seconds, > instead of indefinitely supressing some potentially vital > output due to some other CPU crashing/locking with the lock > held? [...] > If we fix the deadlock potential, and get a true global > ordering of various oopses/warnings as they triggered (or > at least timestamping them), [...] If we had a global 'trouble counter' we could use that to refine the spin-looping timeout: instead of using a pure timeout of a few seconds, we could say 'a timeout of a few seconds while the counter does not increase'. I.e. only override the locking/ordering if the owner CPU does not seem to be able to make progress with printing the oops/warning. Thanks, Ingo