From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751428AbaIJN3K (ORCPT );
	Wed, 10 Sep 2014 09:29:10 -0400
Received: from bombadil.infradead.org ([198.137.202.9]:48982 "EHLO
	bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750935AbaIJN3I (ORCPT );
	Wed, 10 Sep 2014 09:29:08 -0400
Date: Wed, 10 Sep 2014 15:28:57 +0200
From: Peter Zijlstra
To: Vince Weaver
Cc: linux-kernel@vger.kernel.org, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Steven Rostedt
Subject: Re: perf: perf_fuzzer triggers instant reboot
Message-ID: <20140910132857.GA4783@worktop.ger.corp.intel.com>
References: <20140908185115.GI6758@twins.programming.kicks-ass.net>
	<20140910083136.GP6758@twins.programming.kicks-ass.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.22.1 (2013-10-16)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Sep 10, 2014 at 09:18:35AM -0400, Vince Weaver wrote:
> Somehow something is stomping over memory with a forking workload (likely
> an improper free with RCU like we've seen before) but the fact that it
> causes a reboot immediately makes it *really* hard to debug this.

Yes, the insta reboot thing is a total pain.

Too bad Steve is out for a spell; the only thing I can think of is trying
to 'preserve' the trace buffer over the reboot; it's a warm reboot and
memory contents should be 'stable'. So if we can get the new boot to agree
with the old kernel's idea of trace buffers we might retain enough.

Another approach would be using the firewire debug facility to read the
trace buffer post-mortem. Of course, that requires that you have FW in at
least two boxes and an appropriate cable (not something I've actually ever
done due to lack of FW hardware).
The EHCI debug port (USB) might provide similar capabilities -- again,
significant lack of experience due to not actually having hardware for
that.

I think I once managed to hit the triple fault reboot in qemu/kvm, which
makes inspecting the dead state tons easier; if you can manage to
reproduce this in a virt environment you've got a chance (of course, the
problem at that time was not perf and so a lot less sensitive to hardware).