From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mail-wi0-x234.google.com (mail-wi0-x234.google.com [IPv6:2a00:1450:400c:c05::234])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by lists.ozlabs.org (Postfix) with ESMTPS id CA4241A0D1E;
	Tue, 24 Feb 2015 17:35:21 +1100 (AEDT)
Received: by mail-wi0-f180.google.com with SMTP id h11so22688884wiw.1;
	Mon, 23 Feb 2015 22:35:18 -0800 (PST)
Sender: Ingo Molnar
Date: Tue, 24 Feb 2015 07:35:14 +0100
From: Ingo Molnar
To: Anton Blanchard
Subject: Re: [PATCH 0/7] Serialise oopses, BUGs, WARNs, dump_stack, soft lockups and hard lockups
Message-ID: <20150224063513.GA15387@gmail.com>
References: <1424748634-9153-1-git-send-email-anton@samba.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <1424748634-9153-1-git-send-email-anton@samba.org>
Cc: Don Zickus, x86@kernel.org, Russell King, peterz@infradead.org,
	hpa@zytor.com, linux-kernel@vger.kernel.org, Steven Rostedt,
	Linus Torvalds, Ingo Molnar, Paul Mackerras,
	linux-arm-kernel@lists.infradead.org, Andrew Morton,
	linuxppc-dev@lists.ozlabs.org, Thomas Gleixner,
	sam.bobroff@au1.ibm.com, Arjan van de Ven
List-Id: Linux on PowerPC Developers Mail List

* Anton Blanchard wrote:

> Every now and then I end up with an undebuggable issue
> because multiple CPUs hit something at the same time and
> everything is interleaved:
>
> CR: 48000082 XER: 00000000
> ,RI
> c0000003dc72fd10
> ,LE
> d0000000065b84e8
> Instruction dump:
> MSR: 8000000100029033
>
> Very annoying.
>
> Some architectures already have their own recursive
> locking for oopses and we have another version for
> serialising dump_stack.
>
> Create a common version and use it everywhere (oopses,
> BUGs, WARNs, dump_stack, soft lockups and hard lockups).

Dunno. I've had cases where the simultaneity of the oopses (i.e.
their garbled nature) gave me the clue about the type of race to expect.

To still get that information: instead of taking a serializing spinlock (or in addition to it), it would be nice to at least preserve the true time order of the incidents, at minimum by generating a global count for oopses/warnings (a bit like the oops count # currently), and to gather it first - before taking any spinlocks.

Thanks,

	Ingo
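
[Editorial illustration, not part of the original message.] The suggestion above (take a global incident number first, and only then serialise the output) could be sketched roughly as follows. This is a userspace analogue built on C11 atomics and a pthread mutex, not kernel code. Every name in it (incident_seq, report_lock, incident_begin, incident_report) is hypothetical and does not come from the patch series or from the kernel.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical global incident counter, shared by all reporting paths. */
static atomic_ulong incident_seq;

/* Hypothetical lock standing in for the serialising spinlock. */
static pthread_mutex_t report_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Step 1: lock-free. Called as early as possible, before any lock is
 * taken, so the returned number reflects the order in which incidents
 * reached this point rather than the order in which they won the lock.
 */
static unsigned long incident_begin(void)
{
    return atomic_fetch_add(&incident_seq, 1) + 1;
}

/*
 * Step 2: serialised. Formats one report line tagged with the sequence
 * number gathered in step 1. Returns the snprintf() result.
 */
static int incident_report(char *buf, size_t len, unsigned long seq,
                           int cpu, const char *what)
{
    int n;

    assert(buf != NULL && what != NULL);

    pthread_mutex_lock(&report_lock);
    n = snprintf(buf, len, "[incident #%lu] CPU %d: %s", seq, cpu, what);
    pthread_mutex_unlock(&report_lock);

    return n;
}
```

A caller would invoke incident_begin() on entry to the oops/warning path and pass the result to incident_report() later. If two CPUs race, the reports may be printed in either order, but the numbers still show which one reached the counter first, which is the ordering information that serialising the output alone would hide.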