From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Andy Lutomirski <luto@amacapital.net>,
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: xen <xen@lists.fedoraproject.org>,
Xen Devel <xen-devel@lists.xensource.com>,
kvm list <kvm@vger.kernel.org>,
Cole Robinson <crobinso@redhat.com>,
Borislav Petkov <bp@alien8.de>,
M A Young <m.a.young@durham.ac.uk>,
Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: rdmsr_safe in Linux PV (under Xen) gets an #GP:Re: [Fedora-xen] Running fedora xen on top of KVM?
Date: Thu, 17 Sep 2015 22:29:34 +0100
Message-ID: <55FB30BE.3080603@citrix.com>
In-Reply-To: <CALCETrWT0WmZqv3jbd9jpmBD6M9fdtdJEz7yD92=C+PJH=PipQ@mail.gmail.com>
On 17/09/2015 21:23, Andy Lutomirski wrote:
> On Thu, Sep 17, 2015 at 1:10 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
>> On Wed, Sep 16, 2015 at 06:39:03PM -0400, Cole Robinson wrote:
>>> On 09/16/2015 05:08 PM, Konrad Rzeszutek Wilk wrote:
>>>> On Wed, Sep 16, 2015 at 05:04:31PM -0400, Cole Robinson wrote:
>>>>> On 09/16/2015 04:07 PM, M A Young wrote:
>>>>>> On Wed, 16 Sep 2015, Cole Robinson wrote:
>>>>>>
>>>>>>> Unfortunately I couldn't get anything else extra out of xen using any of these
>>>>>>> options or the ones Major recommended... in fact I couldn't get anything to
>>>>>>> the serial console at all. console=con1 would seem to redirect messages since
>>>>>>> they wouldn't show up on the graphical display, but nothing went to the serial
>>>>>>> log. Maybe I'm missing something...
>>>>>> That should be console=com1 so you have a typo either in this message or
>>>>>> in your tests.
>>>>>>
>>>>> Yeah that was it :/ So here's the crash output using -cpu host:
>>>>>
>>>>> - Cole
>>>>>
>>> <snip>
>>>
>>>>> about to get started...
>>>>> (XEN) traps.c:459:d0v0 Unhandled general protection fault fault/trap [#13] on
>>>>> VCPU 0 [ec=0000]
>>>>> (XEN) domain_crash_sync called from entry.S: fault at ffff82d08023a5d3
>>>>> create_bounce_frame+0x12b/0x13a
>>>>> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
>>>>> (XEN) ----[ Xen-4.5.1 x86_64 debug=n Not tainted ]----
>>>>> (XEN) CPU: 0
>>>>> (XEN) RIP: e033:[<ffffffff810032b0>]
>>>> That is the Linux kernel EIP. Can you figure out what is at ffffffff810032b0 ?
>>>>
>>>> gdb vmlinux and then
>>>> x/20i 0xffffffff810032b0
>>>>
>>>> can help with that.
>>>>
>>> Updated to the latest kernel 4.1.6-201.fc22.x86_64. Trace is now:
>>>
>>> about to get started...
>>> (XEN) traps.c:459:d0v0 Unhandled general protection fault fault/trap [#13] on
>>> VCPU 0 [ec=0000]
> What exactly does this mean?
This means that there was a #GP fault originating from dom0 context, but
dom0 has not yet registered a #GP handler with Xen.  (I already have a
patch pending to correct the wording of that error message.)
It would be a double fault on native.
>
>>> (XEN) domain_crash_sync called from entry.S: fault at ffff82d08023a5d3
>>> create_bounce_frame+0x12b/0x13a
>>> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
>>> (XEN) ----[ Xen-4.5.1 x86_64 debug=n Not tainted ]----
>>> (XEN) CPU: 0
>>> (XEN) RIP: e033:[<ffffffff810031f0>]
>>> (XEN) RFLAGS: 0000000000000282 EM: 1 CONTEXT: pv guest
>>> (XEN) rax: 0000000000000015 rbx: ffffffff81c03e1c rcx: 00000000c0010112
>>> (XEN) rdx: 0000000000000001 rsi: ffffffff81c03e1c rdi: 00000000c0010112
>>> (XEN) rbp: ffffffff81c03df8 rsp: ffffffff81c03da0 r8: ffffffff81c03e28
>>> (XEN) r9: ffffffff81c03e2c r10: 0000000000000000 r11: 00000000ffffffff
>>> (XEN) r12: ffffffff81d25a60 r13: 0000000004000000 r14: 0000000000000000
>>> (XEN) r15: 0000000000000000 cr0: 0000000080050033 cr4: 00000000000406f0
>>> (XEN) cr3: 0000000075c0b000 cr2: 0000000000000000
>>> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e02b cs: e033
>>> (XEN) Guest stack trace from rsp=ffffffff81c03da0:
>>> (XEN) 00000000c0010112 00000000ffffffff 0000000000000000 ffffffff810031f0
>>> (XEN) 000000010000e030 0000000000010082 ffffffff81c03de0 000000000000e02b
>>> (XEN) 0000000000000000 000000000000000c ffffffff81c03e1c ffffffff81c03e48
>>> (XEN) ffffffff8102a7a4 ffffffff81c03e48 ffffffff8102aa3b ffffffff81c03e48
>>> (XEN) cf1fa5f5e026f464 0000000001000000 ffffffff81c03ef8 0000000004000000
>>> (XEN) 0000000000000000 ffffffff81c03e58 ffffffff81d5d142 ffffffff81c03ee8
>>> (XEN) ffffffff81d58b56 0000000000000000 0000000000000000 ffffffff81c03e88
>>> (XEN) ffffffff810f8a39 ffffffff81c03ee8 ffffffff81798b13 ffffffff00000010
>>> (XEN) ffffffff81c03ef8 ffffffff81c03eb8 cf1fa5f5e026f464 ffffffff81f1de9c
>>> (XEN) ffffffffffffffff 0000000000000000 ffffffff81df7920 0000000000000000
>>> (XEN) 0000000000000000 ffffffff81c03f28 ffffffff81d51c74 cf1fa5f5e026f464
>>> (XEN) 0000000000000000 ffffffff81c03f60 ffffffff81c03f5c 0000000000000000
>>> (XEN) 0000000000000000 ffffffff81c03f38 ffffffff81d51339 ffffffff81c03ff8
>>> (XEN) ffffffff81d548b1 0000000000000000 00600f1200000000 0000000100000800
>>> (XEN) 0300000100000032 0000000000000005 0000000000000000 0000000000000000
>>> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>> (XEN) 0f00000060c0c748 ccccccccccccc305 cccccccccccccccc cccccccccccccccc
>>> (XEN) Domain 0 crashed: rebooting machine in 5 seconds.
>>>
>>>
>>> gdb output:
>>>
>>> (gdb) x/20i 0xffffffff810031f0
>>> 0xffffffff810031f0 <xen_read_msr_safe+16>: rdmsr
>> Fantastic! So we have some rdmsr that makes KVM inject a #GP.
> What's the scenario? Is this Xen on KVM?
I believe from the thread that this is a Xen/dom0 combo running as a KVM
guest.
>
> Why didn't the guest print anything?
Lack of earlyprintk=xen on the dom0 command line.  (IMO this really
should be the default when a PVOPs kernel detects that it is running
under Xen.)
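For anyone trying to reproduce this, the grub entry would look roughly
like the following -- illustrative paths and settings only, not Cole's
exact configuration (console=com1 belongs on the Xen line, while
earlyprintk=xen and console=hvc0 belong on the dom0 line):

    multiboot /boot/xen.gz console=com1 com1=115200,8n1 loglvl=all guest_loglvl=all
    module    /boot/vmlinuz root=/dev/mapper/fedora-root ro earlyprintk=xen console=hvc0
    module    /boot/initramfs.img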
>
> Is the issue here that the guest died due to failure to handle an
> RDMSR failure or did the *hypervisor* die?
The guest suffered a GP fault which it couldn't handle. Therefore Xen
crashed the domain.
When dom0 crashes, Xen goes down too.
>
> It looks like null_trap_bounce is returning true, which suggests that
> the failure is happening before the guest sets up exception handling.
I concur.
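For reference, this is exactly the machinery the "safe" MSR accessors
rely on: the rdmsr is covered by an exception-table entry, so the #GP
(raised by hardware, or injected by KVM/Xen underneath) is supposed to
reach the guest's #GP handler and be redirected to a fixup path that
returns an error instead of crashing.  A rough sketch of the idea, not
the kernel's exact implementation (kernel context assumed, i.e.
asm/asm.h for _ASM_EXTABLE and linux/errno.h for EIO):

    /*
     * Sketch only: probe an MSR and report failure instead of oopsing.
     * If the #GP never reaches the guest's handler -- e.g. because no
     * trap table has been registered with Xen yet -- the fixup cannot
     * run and the domain is crashed instead, as in the log above.
     */
    static int rdmsr_safe_sketch(unsigned int msr, unsigned long long *val)
    {
            unsigned int low, high;
            int err;

            asm volatile("1: rdmsr\n\t"
                         "xorl %[err], %[err]\n"
                         "2:\n"
                         ".section .fixup,\"ax\"\n"
                         "3: movl %[fault], %[err]\n\t"
                         "jmp 2b\n"
                         ".previous\n"
                         _ASM_EXTABLE(1b, 3b)
                         : [err] "=r" (err), "=a" (low), "=d" (high)
                         : "c" (msr), [fault] "i" (-EIO));

            *val = ((unsigned long long)high << 32) | low;
            return err;
    }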
>
>> Looking at the stack you have some other values:
>> ffffffff81c03de0, ffffffff81c03e1c .. they should correspond
>> to other functions calling this one. If you do 'nm --defined vmlinux | grep ffffffff81c03e1'
>> that should give an idea where they are. Or use 'gdb'.
>>
>> That will give us a stack - and we can find what type of MSR
>> this is. Oh wait, it is on the registers: 00000000c0010112
>>
>> Ok, so where in the code is that MSR ah, that looks to be:
>> #define MSR_K8_TSEG_ADDR 0xc0010112
>>
>> which is called at bsp_init_amd.
>>
>> I think the problem here is that we are calling the
>> 'safe' variant of MSR but we still get an injected #GP and
>> don't expect that.
>>
>> I am not really sure what the expected outcome should be here.
>>
>> CC-ing xen-devel, KVM folks, and Andy, who has been looking into
>> mucking around in the _safe* pvops.
> It's too early of a failure, I think.
>
> Cc: Borislav. Is TSEG guaranteed to exist? Can we defer that until
> we have exception handling working? Do we need to rig up exception
> handling so that it works earlier (e.g. in early_trap_init, which is
> presumably early enough)? Or is this just a KVM and/or Xen bug.
It would certainly help to move the exception setup as early as possible.
From a Xen PV guest's point of view, the kernel is already executing on
working pagetables and a flat GDT when it starts.  A set_trap_table
hypercall (equivalent of `lidt`) ought to be the second action,
following the stack switch.
This appears not to be the case, and the load_idt() is deferred until
native cpu_init().
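A rough sketch of what that early registration could look like for a
64-bit PV dom0, purely illustrative (assumes the usual Xen public
headers; early_gp_entry is a placeholder for a real PV exception entry
point, not an existing symbol):

    #include <linux/init.h>
    #include <linux/bug.h>
    #include <xen/interface/xen.h>      /* struct trap_info */
    #include <asm/xen/hypercall.h>      /* HYPERVISOR_set_trap_table() */
    #include <asm/traps.h>              /* X86_TRAP_GP */
    #include <asm/segment.h>            /* __KERNEL_CS */

    extern void early_gp_entry(void);   /* placeholder handler entry point */

    static struct trap_info early_trap_table[] __initdata = {
            { X86_TRAP_GP, 0 /* DPL 0 */, __KERNEL_CS,
              (unsigned long)early_gp_entry },
            { 0, 0, 0, 0 },             /* zero address terminates the table */
    };

    /* PV equivalent of lidt: hand the exception vectors to Xen. */
    static void __init xen_register_early_traps(void)
    {
            if (HYPERVISOR_set_trap_table(early_trap_table))
                    BUG();
    }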
~Andrew