guest crash in wrmsr_hypervisor_regs if hypercall page is paged out

* guest crash in wrmsr_hypervisor_regs if hypercall page is paged out
@ 2013-04-30 18:19 Olaf Hering
  2013-05-02 11:20 ` Tim Deegan
  0 siblings, 1 reply; 8+ messages in thread
From: Olaf Hering @ 2013-04-30 18:19 UTC (permalink / raw)
  To: xen-devel


With current xen-unstable I see this guest crash if the gfn 169ff is
paged out. The xenpaging -v output shows that 169ff is populated, but
apparently wrmsr_hypervisor_regs does not like the resulting mfn?!

...
(XEN) HVM10: HVM Loader
(XEN) HVM10: Detected Xen v4.3.26939-20130430
(XEN) HVM10: Xenbus rings @0xfeffc000, event channel 6
(XEN) HVM10: System requested SeaBIOS
(XEN) HVM10: CPU speed is 2926 MHz
(XEN) irq.c:270: Dom10 PCI link 0 changed 0 -> 5
(XEN) HVM10: PCI-ISA link 0 routed to IRQ5
(XEN) irq.c:270: Dom10 PCI link 1 changed 0 -> 10
(XEN) HVM10: PCI-ISA link 1 routed to IRQ10
(XEN) irq.c:270: Dom10 PCI link 2 changed 0 -> 11
(XEN) HVM10: PCI-ISA link 2 routed to IRQ11
(XEN) irq.c:270: Dom10 PCI link 3 changed 0 -> 5
(XEN) HVM10: PCI-ISA link 3 routed to IRQ5
(XEN) HVM10: pci dev 01:2 INTD->IRQ5
(XEN) HVM10: pci dev 01:3 INTA->IRQ10
(XEN) HVM10: pci dev 03:0 INTA->IRQ5
(XEN) HVM10: pci dev 02:0 bar 10 size lx: 02000000
(XEN) HVM10: pci dev 03:0 bar 14 size lx: 01000000
(XEN) HVM10: pci dev 02:0 bar 30 size lx: 00010000
(XEN) HVM10: pci dev 02:0 bar 14 size lx: 00001000
(XEN) HVM10: pci dev 03:0 bar 10 size lx: 00000100
(XEN) HVM10: pci dev 01:2 bar 20 size lx: 00000020
(XEN) HVM10: pci dev 01:1 bar 20 size lx: 00000010
(XEN) HVM10: Multiprocessor initialisation:
(XEN) HVM10:  - CPU0 ... 40-bit phys ... fixed MTRRs ... var MTRRs [2/8] ... done.
(XEN) HVM10:  - CPU1 ... 40-bit phys ... fixed MTRRs ... var MTRRs [2/8] ... done.
(XEN) HVM10:  - CPU2 ... 40-bit phys ... fixed MTRRs ... var MTRRs [2/8] ... done.
(XEN) HVM10:  - CPU3 ... 40-bit phys ... fixed MTRRs ... var MTRRs [2/8] ... done.
(XEN) HVM10: Testing HVM environment:
(XEN) HVM10:  - REP INSB across page boundaries ... passed
(XEN) HVM10:  - GS base MSRs and SWAPGS ... passed
(XEN) HVM10: Passed 2 of 2 tests
(XEN) HVM10: Writing SMBIOS tables ...
(XEN) HVM10: Loading SeaBIOS ...
(XEN) HVM10: Creating MP tables ...
(XEN) HVM10: Loading ACPI ...
(XEN) HVM10: vm86 TSS at fc00a100
(XEN) HVM10: BIOS map:
(XEN) HVM10:  10000-100d3: Scratch space
(XEN) HVM10:  e0000-fffff: Main BIOS
(XEN) HVM10: E820 table:
(XEN) HVM10:  [00]: 00000000:00000000 - 00000000:000a0000: RAM
(XEN) HVM10:  HOLE: 00000000:000a0000 - 00000000:000e0000
(XEN) HVM10:  [01]: 00000000:000e0000 - 00000000:00100000: RESERVED
(XEN) HVM10:  [02]: 00000000:00100000 - 00000000:16a00000: RAM
(XEN) HVM10:  HOLE: 00000000:16a00000 - 00000000:fc000000
(XEN) HVM10:  [03]: 00000000:fc000000 - 00000001:00000000: RESERVED
(XEN) HVM10: Invoking SeaBIOS ...
(XEN) HVM10: SeaBIOS (version ?-20130430_174224-bax)
(XEN) HVM10:
(XEN) HVM10: Found Xen hypervisor signature at 40000000
(XEN) HVM10: xen: copy e820...
(XEN) HVM10: Ram Size=0x16a00000 (0x0000000000000000 high)
(XEN) HVM10: Relocating low data from 0x000e2490 to 0x000ef790 (size 2156)
(XEN) HVM10: Relocating init from 0x000e2cfc to 0x169e20f0 (size 56804)
(XEN) HVM10: CPU Mhz=2928
(XEN) HVM10: Found 7 PCI devices (max PCI bus is 00)
(XEN) HVM10: Allocated Xen hypercall page at 169ff000
(XEN) traps.c:654:d10 Bad GMFN 169ff (MFN 3e900000000) to MSR 40000000
(XEN) HVM10: Detected Xen v4.3
(XEN) io.c:201:d10 MMIO emulation failed @ 0008:c2c2c2c2: 18 7c 55 6d 03 83 ff ff 10 7c
(XEN) hvm.c:1253:d10 Triple fault on VCPU0 - invoking HVM shutdown action 1.
(XEN) HVM11: HVM Loader
(XEN) HVM11: Detected Xen v4.3.26939-20130430
...


The .cfg file looks like this:

name="12.3_full_default2"
uuid="3c8c0937-cb46-4fe9-a871-8e4c60ab8dfe"
memory=370
vcpus=4
serial="pty"
builder="hvm"
boot="dc"
disk=[ 
        'file:/some.vdisk,hda,w',
        'file:/some.iso,hdc:cdrom,r',
]
vif=[
        'bridge=br0,type=netfront'
]
vfb = [
        'type=vnc,vncunused=1,keymap=de'
]
usb=1
usbdevice='tablet'

I'm using "xl -v -v create -d -p -f domU.cfg" to start it, then run xenpaging
manually, then unpause the guest.


Olaf

* Re: guest crash in wrmsr_hypervisor_regs if hypercall page is paged out
  2013-04-30 18:19 guest crash in wrmsr_hypervisor_regs if hypercall page is paged out Olaf Hering
@ 2013-05-02 11:20 ` Tim Deegan
  2013-05-02 14:43   ` Olaf Hering
  0 siblings, 1 reply; 8+ messages in thread
From: Tim Deegan @ 2013-05-02 11:20 UTC (permalink / raw)
  To: Olaf Hering; +Cc: xen-devel

At 20:19 +0200 on 30 Apr (1367353157), Olaf Hering wrote:
> 
> With current xen-unstable I see this guest crash if the gfn 169ff is
> paged out. The xenpaging -v output shows that 169ff is populated, but
> apparently wrmsr_hypervisor_regs does not like the resulting mfn?!

Looks that way:

> (XEN) HVM10: Allocated Xen hypercall page at 169ff000
> (XEN) traps.c:654:d10 Bad GMFN 169ff (MFN 3e900000000) to MSR 40000000

That MFN looks like garbage, so I'm guessing that 'page' was null, i.e.
get_page_from_gfn() returned NULL.  I guess you'll need to instrument it
up to figure out why.  At least the GFN is a predictable constant which
should make it easier to add debugging printout for just this case.

Tim.

* Re: guest crash in wrmsr_hypervisor_regs if hypercall page is paged out
  2013-05-02 11:20 ` Tim Deegan
@ 2013-05-02 14:43   ` Olaf Hering
  2013-05-02 14:52     ` Tim Deegan
  2013-05-02 14:58     ` Jan Beulich
  0 siblings, 2 replies; 8+ messages in thread
From: Olaf Hering @ 2013-05-02 14:43 UTC (permalink / raw)
  To: Tim Deegan; +Cc: xen-devel

On Thu, May 02, Tim Deegan wrote:

> At 20:19 +0200 on 30 Apr (1367353157), Olaf Hering wrote:
> > 
> > With current xen-unstable I see this guest crash if the gfn 169ff is
> > paged out. The xenpaging -v output shows that 169ff is populated, but
> > apparently wrmsr_hypervisor_regs does not like the resulting mfn?!
> 
> Looks that way:
> 
> > (XEN) HVM10: Allocated Xen hypercall page at 169ff000
> > (XEN) traps.c:654:d10 Bad GMFN 169ff (MFN 3e900000000) to MSR 40000000
> 
> That MFN looks like garbage, so I'm guessing that 'page' was null, i.e.
> get_page_from_gfn() returned NULL.  I guess you'll need to instrument it
> up to figure out why.  At least the GFN is a predictable constant which
> should make it easier to add debugging printout for just this case.

The GMFN has p2m_t p2m_ram_paged, so the mfn is -1.

It's not clear to me: how should wrmsr_hypervisor_regs handle a paged
gfn? I was under the impression that get_page_from_gfn would wait until
the gfn is paged-in again.

Olaf

* Re: guest crash in wrmsr_hypervisor_regs if hypercall page is paged out
  2013-05-02 14:43   ` Olaf Hering
@ 2013-05-02 14:52     ` Tim Deegan
  2013-05-02 14:58     ` Jan Beulich
  1 sibling, 0 replies; 8+ messages in thread
From: Tim Deegan @ 2013-05-02 14:52 UTC (permalink / raw)
  To: Olaf Hering; +Cc: xen-devel

At 16:43 +0200 on 02 May (1367512981), Olaf Hering wrote:
> On Thu, May 02, Tim Deegan wrote:
> 
> > At 20:19 +0200 on 30 Apr (1367353157), Olaf Hering wrote:
> > > 
> > > With current xen-unstable I see this guest crash if the gfn 169ff is
> > > paged out. The xenpaging -v output shows that 169ff is populated, but
> > > apparently wrmsr_hypervisor_regs does not like the resulting mfn?!
> > 
> > Looks that way:
> > 
> > > (XEN) HVM10: Allocated Xen hypercall page at 169ff000
> > > (XEN) traps.c:654:d10 Bad GMFN 169ff (MFN 3e900000000) to MSR 40000000
> > 
> > That MFN looks like garbage, so I'm guessing that 'page' was null, i.e.
> > get_page_from_gfn() returned NULL.  I guess you'll need to instrument it
> > up to figure out why.  At least the GFN is a predictable constant which
> > should make it easier to add debugging printout for just this case.
> 
> The GMFN has p2m_t p2m_ram_paged, so the mfn is -1.
> 
> It's not clear to me: how should wrmsr_hypervisor_regs handle a paged
> gfn? I was under the impression that get_page_from_gfn would wait until
> the gfn is paged-in again.

Ah, it doesn't seem to be that way.  Other callers of the p2m functions
handle this in the caller. :(

So you'll need to add something like:
    if ( paged )
        p2m_mem_paging_populate(d, gmfn);

here (and anywhere else).

It would be much better if this could happen inside the p2m lookup
function, but ISTR it currently can't because you can't sleep with any
locks held.

Tim.

* Re: guest crash in wrmsr_hypervisor_regs if hypercall page is paged out
  2013-05-02 14:43   ` Olaf Hering
  2013-05-02 14:52     ` Tim Deegan
@ 2013-05-02 14:58     ` Jan Beulich
  2013-05-02 15:20       ` Olaf Hering
  1 sibling, 1 reply; 8+ messages in thread
From: Jan Beulich @ 2013-05-02 14:58 UTC (permalink / raw)
  To: Olaf Hering; +Cc: Tim Deegan, xen-devel

>>> On 02.05.13 at 16:43, Olaf Hering <olaf@aepfle.de> wrote:
> On Thu, May 02, Tim Deegan wrote:
> 
>> At 20:19 +0200 on 30 Apr (1367353157), Olaf Hering wrote:
>> > 
>> > With current xen-unstable I see this guest crash if the gfn 169ff is
>> > paged out. The xenpaging -v output shows that 169ff is populated, but
>> > apparently wrmsr_hypervisor_regs does not like the resulting mfn?!
>> 
>> Looks that way:
>> 
>> > (XEN) HVM10: Allocated Xen hypercall page at 169ff000
>> > (XEN) traps.c:654:d10 Bad GMFN 169ff (MFN 3e900000000) to MSR 40000000
>> 
>> That MFN looks like garbage, so I'm guessing that 'page' was null, i.e.
>> get_page_from_gfn() returned NULL.  I guess you'll need to instrument it
>> up to figure out why.  At least the GFN is a predictable constant which
>> should make it easier to add debugging printout for just this case.
> 
> The GMFN has p2m_t p2m_ram_paged, so the mfn is -1.
> 
> It's not clear to me: how should wrmsr_hypervisor_regs handle a paged
> gfn? I was under the impression that get_page_from_gfn would wait until
> the gfn is paged-in again.

We can't put a vCPU to sleep at arbitrary points yet, which means
that right now the caller of the function is responsible for the
wait-and-retry - normally that would be in hypercall handlers, but
obviously you need this here too.

Jan

* Re: guest crash in wrmsr_hypervisor_regs if hypercall page is paged out
  2013-05-02 14:58     ` Jan Beulich
@ 2013-05-02 15:20       ` Olaf Hering
  2013-05-02 15:29         ` Jan Beulich
  0 siblings, 1 reply; 8+ messages in thread
From: Olaf Hering @ 2013-05-02 15:20 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Tim Deegan, xen-devel

On Thu, May 02, Jan Beulich wrote:

> We can't put a vCPU to sleep at arbitrary points yet, which means
> that right now the caller of the function is responsible for the
> wait-and-retry - normally that would be in hypercall handlers, but
> obviously you need this here too.

Yes, that's the issue.

vmx_msr_write_intercept and svm_msr_write_intercept could just return
X86EMUL_RETRY to their callers.

How should emulate_privileged_op handle the wrmsr_hypervisor_regs
failure due to a paged page?


Olaf

* Re: guest crash in wrmsr_hypervisor_regs if hypercall page is paged out
  2013-05-02 15:20       ` Olaf Hering
@ 2013-05-02 15:29         ` Jan Beulich
  2013-05-02 17:46           ` Olaf Hering
  0 siblings, 1 reply; 8+ messages in thread
From: Jan Beulich @ 2013-05-02 15:29 UTC (permalink / raw)
  To: Olaf Hering; +Cc: Tim Deegan, xen-devel

>>> On 02.05.13 at 17:20, Olaf Hering <olaf@aepfle.de> wrote:
> On Thu, May 02, Jan Beulich wrote:
> 
>> We can't put a vCPU to sleep at arbitrary points yet, which means
>> that right now the caller of the function is responsible for the
>> wait-and-retry - normally that would be in hypercall handlers, but
>> obviously you need this here too.
> 
> Yes, that's the issue.
> 
> vmx_msr_write_intercept and svm_msr_write_intercept could just return
> X86EMUL_RETRY to their callers.
> 
> How should emulate_privileged_op handle the wrmsr_hypervisor_regs
> failure due to a paged page?

That's a PV-only path, hence no need to consider paging. Just
assert that the return value is X86EMUL_OKAY.

Jan

* Re: guest crash in wrmsr_hypervisor_regs if hypercall page is paged out
  2013-05-02 15:29         ` Jan Beulich
@ 2013-05-02 17:46           ` Olaf Hering
  0 siblings, 0 replies; 8+ messages in thread
From: Olaf Hering @ 2013-05-02 17:46 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Tim Deegan, xen-devel

On Thu, May 02, Jan Beulich wrote:

> >>> On 02.05.13 at 17:20, Olaf Hering <olaf@aepfle.de> wrote:
> > On Thu, May 02, Jan Beulich wrote:
> > 
> >> We can't put a vCPU to sleep at arbitrary points yet, which means
> >> that right now the caller of the function is responsible for the
> >> wait-and-retry - normally that would be in hypercall handlers, but
> >> obviously you need this here too.
> > 
> > Yes, that's the issue.
> > 
> > vmx_msr_write_intercept and svm_msr_write_intercept could just return
> > X86EMUL_RETRY to their callers.
> > 
> > How should emulate_privileged_op handle the wrmsr_hypervisor_regs
> > failure due to a paged page?
> 
> That's a PV-only path, hence no need to consider paging. Just
> assert that the return value is X86EMUL_OKAY.

I sent a patch which fixes this issue for me. The 4.2 branch
apparently has the same issue.

Olaf
