From: Stefan Seyfried
To: Andy Lutomirski
CC: Linus Torvalds, Takashi Iwai, Denys Vlasenko, X86 ML, LKML, Tejun Heo
Subject: Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
Date: Wed, 18 Mar 2015 22:41:57 +0100
Message-ID: <5509F125.7020006@message-id.googlemail.com>

On 18.03.2015 at 22:21, Andy Lutomirski wrote:
> On Wed, Mar 18, 2015 at 2:12 PM, Stefan Seyfried
> wrote:
>> On 18.03.2015 at 21:51, Andy Lutomirski wrote:
>>> On Wed, Mar 18, 2015 at 1:05 PM, Stefan Seyfried
>>> wrote:
>>
>>>>> The relevant thread's stack is here (see ti in the trace):
>>>>>
>>>>> ffff8801013d4000
>>>>>
>>>>> It could be interesting to see what's there.
>>>>>
>>>>> I don't suppose you want to try to walk the paging structures to see
>>>>> if ffff88023bc80000 (i.e. gsbase) and, more specifically,
>>>>> ffff88023bc80000 + old_rsp and ffff88023bc80000 + kernel_stack are
>>>>> present? You'd only have to walk one level -- presumably, if the PGD
>>>>> entry is there, the rest of the entries are okay, too.
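"Walking one level" means checking the top-level (PGD) slot that covers an address. The sketch below assumes standard x86-64 4-level paging with 4 KiB pages. Under that layout, each table index is a 9-bit field of the virtual address. The helper name `pt_indices` is hypothetical.

```python
# Sketch: split an x86-64 virtual address into 4-level paging indices.
# Assumption: 4 KiB pages and 4-level paging (9 index bits per level).

PAGE_SHIFT = 12
LEVEL_BITS = 9
LEVEL_MASK = (1 << LEVEL_BITS) - 1  # 0x1ff: 512 entries per table


def pt_indices(vaddr: int) -> dict:
    """Return the PGD/PUD/PMD/PTE indices and the offset within the page."""
    return {
        "pgd": (vaddr >> 39) & LEVEL_MASK,
        "pud": (vaddr >> 30) & LEVEL_MASK,
        "pmd": (vaddr >> 21) & LEVEL_MASK,
        "pte": (vaddr >> PAGE_SHIFT) & LEVEL_MASK,
        "offset": vaddr & ((1 << PAGE_SHIFT) - 1),
    }


GSBASE = 0xffff88023bc80000  # gsbase value quoted in the thread

if __name__ == "__main__":
    print(pt_indices(GSBASE))
```

For gsbase this yields PGD slot 272 (0x110). That is the slot covering the direct mapping at 0xffff880000000000, which x86-64 kernels of this era use.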
>>>>
>>>> That's all Greek to me :-)
>>>>
>>>> I see that there is something at ffff88023bc80000:
>>>>
>>>> crash> x /64xg 0xffff88023bc80000
>>>> 0xffff88023bc80000: 0x0000000000000000 0x0000000000000000
>>>> 0xffff88023bc80010: 0x0000000000000000 0x0000000000000000
>>>> 0xffff88023bc80020: 0x0000000000000000 0x000000006686ada9
>>>> 0xffff88023bc80030: 0x0000000000000000 0x0000000000000000
>>>> 0xffff88023bc80040: 0x0000000000000000 0x0000000000000000
>>>> [all zeroes]
>>>> 0xffff88023bc801f0: 0x0000000000000000 0x0000000000000000
>>>>
>>>> old_rsp and kernel_stack seem bogus:
>>>> crash> print old_rsp
>>>> Cannot access memory at address 0xa200
>>>> gdb: gdb request failed: print old_rsp
>>>> crash> print kernel_stack
>>>> Cannot access memory at address 0xaa48
>>>> gdb: gdb request failed: print kernel_stack
>>>>
>>>> kernel_stack is not a pointer? So 0xffff88023bc80000 + 0xaa48 it is:
>>>
>>> Yup. old_rsp and kernel_stack are offsets relative to gsbase.
>>>
>>>>
>>>> crash> x /64xg 0xffff88023bc8aa00
>>>> 0xffff88023bc8aa00: 0x0000000000000000 0x0000000000000000
>>>
>>> [...]
>>>
>>> I don't know enough about crashkernel to know whether the fact that
>>> this worked means anything.
>>
>> AFAIK this just means that the memory at this location is included in
>> the dump :-)
>>
>>> Can you dump the page of physical memory at 0x4779a067? That's the PGD.
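Because old_rsp and kernel_stack are offsets relative to gsbase, each CPU's copy lives at gsbase + offset. The sketch below uses only the values crash printed above (helper names are hypothetical). It computes both addresses and checks that they fall in the same PGD slot as gsbase itself, which is why a one-level check would cover all three.

```python
# Sketch: per-cpu variables such as old_rsp and kernel_stack are offsets
# into the per-cpu area; this CPU's copy lives at gsbase + offset.
# The values are the ones printed by crash in the thread.

GSBASE = 0xffff88023bc80000
PERCPU_OFFSETS = {"old_rsp": 0xa200, "kernel_stack": 0xaa48}


def percpu_addr(base: int, offset: int) -> int:
    """Address of a per-cpu variable for the CPU whose gsbase is `base`."""
    return base + offset


def pgd_index(vaddr: int) -> int:
    """Top-level (PGD) slot of an address under 4-level x86-64 paging."""
    return (vaddr >> 39) & 0x1ff


if __name__ == "__main__":
    for name, off in PERCPU_OFFSETS.items():
        addr = percpu_addr(GSBASE, off)
        # Report the PGD slot so it can be compared with gsbase's slot.
        print(f"{name}: {addr:#x} (PGD slot {pgd_index(addr)})")
```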
>>
>> Unfortunately not, this is a partial dump (I think the default config in
>> openSUSE, but I might have changed it some time ago) and the dump_level
>> is 31, which means that the following are excluded:
>>
>>        |      |cache  |cache  |      |
>>   dump | zero |without|with   | user | free
>>  level | page |private|private| data | page
>> -------+------+-------+-------+------+------
>>     31 |  X   |   X   |   X   |  X   |  X
>>
>> so this:
>> crash> x /64xg 0x4779a067
>> 0x4779a067: Cannot access memory at address 0x4779a067
>> gdb: gdb request failed: x /64xg
>>
>> probably just means that the PGD falls into one of the above excluded
>> categories.
>
> I suspect that it actually means that gdb sees virtual addresses, not
> physical addresses. But I screwed up completely -- "PGD" in the dump
> is the PGD *entry*, not the PGD pointer.

In crash, physical addresses usually work (it's a sophisticated wrapper
around gdb, AFAICT).

>
> We could plausibly fish it out from current->mm, but that's a mess.

I'll come to that later.

> I don't suppose that "info registers" or "p/x $cr3" will show the cr3
> value?

No, that does not work from crash.
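The dump_level in the table above is a bitmask: each bit excludes one page category. The sketch below assumes the bit values from the makedumpfile `-d` documentation (zero = 1, cache without private = 2, cache with private = 4, user data = 8, free = 16). It shows why 31 drops every category. The helper name is hypothetical.

```python
# Sketch: decode a makedumpfile dump_level bitmask into excluded page types.
# Assumption: bit values as documented for makedumpfile -d.

EXCLUDE_BITS = [
    (1, "zero page"),
    (2, "cache without private"),
    (4, "cache with private"),
    (8, "user data"),
    (16, "free page"),
]


def excluded_page_types(dump_level: int) -> list:
    """Return the page categories that a dump_level leaves out of the dump."""
    if not 0 <= dump_level <= 31:
        raise ValueError("dump_level must be in 0..31")
    return [name for bit, name in EXCLUDE_BITS if dump_level & bit]


if __name__ == "__main__":
    print(excluded_page_types(31))  # all five categories
    print(excluded_page_types(1))   # only zero pages
```

A lower level such as 1 excludes only zero pages. That keeps user data and page-cache pages in the dump, at the cost of a much larger file.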
But current->mm is easy:

crash> task|grep mm
  start_comm = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
  mm = 0xffff8800b8a9c040,
  active_mm = 0xffff8800b8a9c040,
  comm = "qemu-system-x86",

and (guessing the type :-)

crash> print *(struct mm_struct *)0xffff8800b8a9c040|grep pgd
  pgd = 0xffff880002d7e000,

But if that's correct, pgd contains all zeroes:

crash> print *(pgd_t *)0xffff880002d7e000
$15 = {
  pgd = 0
}
crash> x /16xg 0xffff880002d7e000
0xffff880002d7e000: 0x0000000000000000 0x0000000000000000
0xffff880002d7e010: 0x0000000000000000 0x0000000000000000
0xffff880002d7e020: 0x0000000000000000 0x0000000000000000
0xffff880002d7e030: 0x0000000000000000 0x0000000000000000
0xffff880002d7e040: 0x0000000000000000 0x0000000000000000
0xffff880002d7e050: 0x0000000000000000 0x0000000000000000
0xffff880002d7e060: 0x0000000000000000 0x0000000000000000
0xffff880002d7e070: 0x0000000000000000 0x0000000000000000

> In any case, Denys is right -- my theory doesn't really hold water on
> non-SMAP systems.

Mine is definitely not new enough for this feature :)

Maybe it would be more helpful if Takashi, who can reproduce this more
reliably than I can, took a crash dump, preferably with a lower dump_level,
for us to investigate. I have seen the bug two or three times in a week or
two, which makes waiting for it to happen a boring experience.

Best regards,

Stefan
-- 
Stefan Seyfried
Linux Consultant & Developer -- GPG Key: 0x731B665B

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
Managing Director: Ralph Dehner / Registered office: Vohburg / Register court: Ingolstadt, HRB 3537
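The `x /16xg` above reads only the first 16 of the 512 eight-byte PGD slots, which cover the low 8 TiB of user address space. The slot covering gsbase sits further into the page. The sketch below makes two assumptions not stated in the thread: 4-level paging, and the fixed direct-map base PAGE_OFFSET = 0xffff880000000000 used by x86-64 kernels of this era. It shows how to compute that slot's address in mm->pgd. It also shows how to split an entry value such as the quoted "PGD" 0x4779a067 into the next table's physical address and its flag bits. All helper names are hypothetical.

```python
# Sketch: locate the PGD slot covering an address and decode an entry value.
# Assumptions: 4-level paging, 4 KiB pages, and a fixed direct-map base
# PAGE_OFFSET = 0xffff880000000000 (no randomization of the direct map).

PAGE_OFFSET = 0xffff880000000000
PTE_PFN_MASK = 0x000ffffffffff000  # physical-address bits of an entry
FLAG_NAMES = {0: "PRESENT", 1: "RW", 2: "USER", 3: "PWT",
              4: "PCD", 5: "ACCESSED", 6: "DIRTY"}


def pgd_index(vaddr: int) -> int:
    return (vaddr >> 39) & 0x1ff


def pgd_entry_addr(pgd: int, vaddr: int) -> int:
    """Virtual address of the 8-byte PGD slot that covers vaddr."""
    return pgd + 8 * pgd_index(vaddr)


def direct_map_to_phys(vaddr: int) -> int:
    """Physical address behind a direct-mapping (PAGE_OFFSET) address."""
    return vaddr - PAGE_OFFSET


def decode_entry(entry: int) -> tuple:
    """Split an entry into (next-level table phys addr, set flag names)."""
    flags = [name for bit, name in FLAG_NAMES.items() if entry & (1 << bit)]
    return entry & PTE_PFN_MASK, flags


MM_PGD = 0xffff880002d7e000  # mm->pgd as printed by crash
GSBASE = 0xffff88023bc80000

if __name__ == "__main__":
    slot = pgd_entry_addr(MM_PGD, GSBASE)
    print(f"slot {pgd_index(GSBASE)} at {slot:#x} "
          f"(phys {direct_map_to_phys(slot):#x})")
    table, flags = decode_entry(0x4779a067)  # the "PGD" value quoted above
    print(f"next table at phys {table:#x}, flags {flags}")
```

With these assumptions, the gsbase slot would be at 0xffff880002d7e880 (offset 0x880 into the page). The entry 0x4779a067 points to a table at physical 0x4779a000, with its low bits marking it present, writable, user, accessed and dirty.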