From: Stefan Seyfried <stefan.seyfried@googlemail.com>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Takashi Iwai <tiwai@suse.de>,
Denys Vlasenko <dvlasenk@redhat.com>, X86 ML <x86@kernel.org>,
LKML <linux-kernel@vger.kernel.org>, Tejun Heo <tj@kernel.org>
Subject: Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
Date: Wed, 18 Mar 2015 22:41:57 +0100
Message-ID: <5509F125.7020006@message-id.googlemail.com>
In-Reply-To: <CALCETrXscbJoMpth_mW6DWbh3oEwDs4E5r0PTd5V0f3yQgvpNw@mail.gmail.com>
On 18.03.2015 at 22:21, Andy Lutomirski wrote:
> On Wed, Mar 18, 2015 at 2:12 PM, Stefan Seyfried
> <stefan.seyfried@googlemail.com> wrote:
>> On 18.03.2015 at 21:51, Andy Lutomirski wrote:
>>> On Wed, Mar 18, 2015 at 1:05 PM, Stefan Seyfried
>>> <stefan.seyfried@googlemail.com> wrote:
>>
>>>>> The relevant thread's stack is here (see ti in the trace):
>>>>>
>>>>> ffff8801013d4000
>>>>>
>>>>> It could be interesting to see what's there.
>>>>>
>>>>> I don't suppose you want to try to walk the paging structures to see
>>>>> if ffff88023bc80000 (i.e. gsbase) and, more specifically,
>>>>> ffff88023bc80000 + old_rsp and ffff88023bc80000 + kernel_stack are
>>>>> present? You'd only have to walk one level -- presumably, if the PGD
>>>>> entry is there, the rest of the entries are okay, too.
>>>>
>>>> That's all Greek to me :-)
>>>>
>>>> I see that there is something at ffff88023bc80000:
>>>>
>>>> crash> x /64xg 0xffff88023bc80000
>>>> 0xffff88023bc80000: 0x0000000000000000 0x0000000000000000
>>>> 0xffff88023bc80010: 0x0000000000000000 0x0000000000000000
>>>> 0xffff88023bc80020: 0x0000000000000000 0x000000006686ada9
>>>> 0xffff88023bc80030: 0x0000000000000000 0x0000000000000000
>>>> 0xffff88023bc80040: 0x0000000000000000 0x0000000000000000
>>>> [all zeroes]
>>>> 0xffff88023bc801f0: 0x0000000000000000 0x0000000000000000
>>>>
>>>> old_rsp and kernel_stack seem bogus:
>>>> crash> print old_rsp
>>>> Cannot access memory at address 0xa200
>>>> gdb: gdb request failed: print old_rsp
>>>> crash> print kernel_stack
>>>> Cannot access memory at address 0xaa48
>>>> gdb: gdb request failed: print kernel_stack
>>>>
>>>> kernel_stack is not a pointer? So 0xffff88023bc80000 + 0xaa48 it is:
>>>
>>> Yup. old_rsp and kernel_stack are offsets relative to gsbase.
>>>
>>>>
>>>> crash> x /64xg 0xffff88023bc8aa00
>>>> 0xffff88023bc8aa00: 0x0000000000000000 0x0000000000000000
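A minimal Python sketch of the arithmetic described above: since old_rsp and kernel_stack are per-CPU offsets, their addresses on the crashing CPU are gsbase plus the value that crash printed. The numbers are copied from the session above; the helper name percpu_address is hypothetical.

```python
# Values copied from the crash session above: the crashing CPU's gsbase
# and the per-CPU offsets that crash reported as "addresses".
GSBASE = 0xffff88023bc80000
PERCPU_OFFSETS = {"old_rsp": 0xa200, "kernel_stack": 0xaa48}


def percpu_address(gsbase: int, offset: int) -> int:
    """Hypothetical helper: a per-CPU variable lives at gsbase + its offset."""
    return gsbase + offset


for name, offset in PERCPU_OFFSETS.items():
    print(f"{name}: {percpu_address(GSBASE, offset):#018x}")
```

For kernel_stack this yields 0xffff88023bc8aa48, which lies inside the range dumped from 0xffff88023bc8aa00.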
>>>
>>> [...]
>>>
>>> I don't know enough about crashkernel to know whether the fact that
>>> this worked means anything.
>>
>> AFAIK this just means that the memory at this location is included in
>> the dump :-)
>>
>>> Can you dump the page of physical memory at 0x4779a067? That's the PGD.
>>
>> Unfortunately not, this is a partial dump (I think the default config in
>> openSUSE, but I might have changed it some time ago) and the dump_level
>> is 31 which means that the following are excluded:
>>
>>        |      |cache  |cache  |      |
>>  dump  | zero |without|with   | user | free
>>  level | page |private|private| data | page
>> -------+------+-------+-------+------+------
>>    31  |  X   |   X   |   X   |  X   |  X
>>
>> so this:
>> crash> x /64xg 0x4779a067
>> 0x4779a067: Cannot access memory at address 0x4779a067
>> gdb: gdb request failed: x /64xg
>>
>> probably just means that the PGD falls into one of the above excluded
>> categories.
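dump_level is a bitmask, and 31 = 1 + 2 + 4 + 8 + 16 sets every bit. A small Python sketch that decodes it, assuming the bit meanings from the makedumpfile(8) table (where bit 4 excludes cache pages both with and without private data); the names DUMP_LEVEL_BITS and excluded_pages are hypothetical.

```python
# Page categories excluded by each dump_level bit, following the
# makedumpfile(8) table (bit 4 also drops cache pages without private data).
DUMP_LEVEL_BITS = {
    1: {"zero page"},
    2: {"cache without private"},
    4: {"cache without private", "cache with private"},
    8: {"user data"},
    16: {"free page"},
}


def excluded_pages(dump_level: int) -> set[str]:
    """Return the set of page categories that a given dump_level excludes."""
    if not 0 <= dump_level <= 31:
        raise ValueError("dump_level must be in the range 0..31")
    excluded: set[str] = set()
    for bit, categories in DUMP_LEVEL_BITS.items():
        if dump_level & bit:
            excluded |= categories
    return excluded


print(sorted(excluded_pages(31)))
```

A lower level such as 30 keeps zero pages but still drops user data, so a page-table page should survive in either case; the sketch only shows which categories each bit removes.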
>
> I suspect that it actually means that gdb sees virtual addresses, not
> physical addresses. But I screwed up completely -- "PGD" in the dump
> is the PGD *entry*, not the PGD pointer.
In crash, physical addresses usually work (it's a sophisticated wrapper
around gdb, AFAICT).
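Because 0x4779a067 is a PGD *entry*, its low 12 bits are flags and the next-level table starts at the page-aligned physical address. A Python sketch that splits an entry, assuming the x86-64 4-level paging layout (address in bits 12..51) and Linux's names for flag bits 0..6; decode_entry is a hypothetical helper. The resulting physical address could then be read with crash's rd -p.

```python
# Linux names for the low flag bits of an x86-64 page-table entry.
# (Bit 6, DIRTY, is ignored by the hardware in non-leaf entries.)
ENTRY_FLAGS = {
    0: "PRESENT", 1: "RW", 2: "USER", 3: "PWT",
    4: "PCD", 5: "ACCESSED", 6: "DIRTY",
}
ADDR_MASK = 0x000F_FFFF_FFFF_F000  # bits 12..51: next-level table address


def decode_entry(entry: int) -> tuple[int, list[str]]:
    """Split a page-table entry into (physical address, set flag names)."""
    phys = entry & ADDR_MASK
    flags = [name for bit, name in ENTRY_FLAGS.items() if entry & (1 << bit)]
    return phys, flags


phys, flags = decode_entry(0x4779a067)
print(f"{phys:#x}", flags)
```

Under these assumptions the entry is present, writable and user-accessible, and points at the table page at physical 0x4779a000, not at 0x4779a067.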
>
> We could plausibly fish it out from current->mm, but that's a mess.
I'll come to that later.
> I don't suppose that "info registers" or "p/x $cr3" will show the cr3
> value?
No, that does not work from crash.
But current->mm is easy:
crash> task|grep mm
start_comm = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
mm = 0xffff8800b8a9c040,
active_mm = 0xffff8800b8a9c040,
comm = "qemu-system-x86",
and (guessing the type :-)
crash> print *(struct mm_struct *)0xffff8800b8a9c040|grep pgd
pgd = 0xffff880002d7e000,
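cr3 itself is not available from crash, but while this mm is active cr3 holds the physical address of the page that mm->pgd points to. A Python sketch of that translation, assuming the fixed x86-64 direct-map base PAGE_OFFSET = 0xffff880000000000 used by kernels of this era and ignoring any low cr3 flag bits; direct_map_pa is a hypothetical helper.

```python
# Assumptions: x86-64 with physical memory direct-mapped at this fixed base
# (no randomization of the direct map) and a 64 TB direct-mapping window.
PAGE_OFFSET = 0xffff880000000000
DIRECT_MAP_SIZE = 1 << 46


def direct_map_pa(vaddr: int) -> int:
    """Translate a direct-map kernel virtual address to a physical address."""
    if not PAGE_OFFSET <= vaddr < PAGE_OFFSET + DIRECT_MAP_SIZE:
        raise ValueError(f"{vaddr:#x} is not in the direct mapping")
    return vaddr - PAGE_OFFSET


mm_pgd = 0xffff880002d7e000  # mm->pgd from the session above
print(f"{direct_map_pa(mm_pgd):#x}")
```

Under these assumptions the PGD page sits at physical 0x2d7e000, which is roughly the value "p/x $cr3" would have shown.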
But if that's correct, pgd contains all zeroes:
crash> print *(pgd_t *)0xffff880002d7e000
$15 = {
pgd = 0
}
crash> x /16xg 0xffff880002d7e000
0xffff880002d7e000: 0x0000000000000000 0x0000000000000000
0xffff880002d7e010: 0x0000000000000000 0x0000000000000000
0xffff880002d7e020: 0x0000000000000000 0x0000000000000000
0xffff880002d7e030: 0x0000000000000000 0x0000000000000000
0xffff880002d7e040: 0x0000000000000000 0x0000000000000000
0xffff880002d7e050: 0x0000000000000000 0x0000000000000000
0xffff880002d7e060: 0x0000000000000000 0x0000000000000000
0xffff880002d7e070: 0x0000000000000000 0x0000000000000000
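The x /16xg above covers only PGD entries 0 to 15 (0x80 bytes), all in the user half of the table. A Python sketch that locates the entry Andy asked about, assuming 4-level paging with 512 eight-byte entries selected by virtual-address bits 39..47; pgd_index and pgd_entry_address are hypothetical helpers.

```python
# Assumes x86-64 4-level paging: bits 39..47 of a virtual address pick one
# of 512 eight-byte entries in the PGD page.
PGD_SHIFT = 39
PTRS_PER_PGD = 512
ENTRY_SIZE = 8


def pgd_index(vaddr: int) -> int:
    """Index of the PGD entry that maps vaddr."""
    return (vaddr >> PGD_SHIFT) & (PTRS_PER_PGD - 1)


def pgd_entry_address(pgd_base: int, vaddr: int) -> int:
    """Address of the PGD entry for vaddr, given the PGD page's address."""
    return pgd_base + pgd_index(vaddr) * ENTRY_SIZE


pgd_base = 0xffff880002d7e000  # mm->pgd from the session above
gsbase = 0xffff88023bc80000
print(pgd_index(gsbase), f"{pgd_entry_address(pgd_base, gsbase):#x}")
```

Under these assumptions gsbase (and gsbase + old_rsp / kernel_stack) is mapped by entry 272, at offset 0x880, so "x /1xg 0xffff880002d7e880" would show whether that entry is present.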
> In any case, Denys is right -- my theory doesn't really hold water on
> non-SMAP systems.
Mine is definitely not new enough for this feature :)
Maybe it would be more helpful if Takashi, who can reproduce this more
reliably than I can, took a crash dump (preferably with a lower dump
level) for us to investigate.
I have seen the bug two or three times in a week or two, which makes
waiting for it to happen a boring experience.
Best regards,
Stefan
--
Stefan Seyfried
Linux Consultant & Developer -- GPG Key: 0x731B665B
B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537