* PV on HVM guest hang...
From: Mukesh Rathor @ 2008-08-27 3:57 UTC
To: xen-devel
I'm debugging a hang of a 64-bit HVM guest with PV drivers. The problem happens
during migration. So far I've discovered that the guest is stuck in a loop
receiving interrupt 0xa9/169. In the hypervisor I see that upon VMX exit, it
sends 0xa9 right away...
(XEN) [<ffff828c80152680>] vlapic_test_and_set_irr+0x0/0x40 :0xa9
(XEN) [<ffff828c80151d35>] ioapic_inj_irq+0x95/0x150
(XEN) [<ffff828c801521d0>] vioapic_deliver+0x3e0/0x440
(XEN) [<ffff828c801522df>] vioapic_update_EOI+0xaf/0xc0
(XEN) [<ffff828c8015394b>] vlapic_write+0x2eb/0x7e0
(XEN) [<ffff828c8014a630>] hvm_mmio_intercept+0xa0/0x360
(XEN) [<ffff828c8014d03f>] send_mmio_req+0x14f/0x1b0
(XEN) [<ffff828c8014e568>] mmio_operands+0xa8/0x160
(XEN) [<ffff828c8014eb96>] handle_mmio+0x576/0x880
(XEN) [<ffff828c801632b2>] vmx_vmexit_handler+0x1832/0x1900
I'm now trying to figure out the IP that causes the VM exit so I can figure out
where in the guest/guest driver it's writing to the APIC.
On the guest side, I see that evtchn_pending_sel is not set in
evtchn_interrupt().
Any ideas/suggestions would be great, as it is a critical bug.
Thanks
Mukesh
* Re: PV on HVM guest hang...
From: Mukesh Rathor @ 2008-09-25 19:10 UTC
To: Sheng Liang; +Cc: xen-devel
I was finally able to track this down. Basically, on the source machine, if
there's an event for the guest at the right moment during live migration, the
line is asserted via the pci_intx.i bit in:
__hvm_pci_intx_assert():
    if ( __test_and_set_bit(device*4 + intx, &hvm_irq->pci_intx.i) ) <-----
        return;
When the guest is moved to the target, this bit gets carried over, and the GSI
is asserted again:
irq_load_pci():
    if ( test_bit(dev*4 + intx, &hvm_irq->pci_intx.i) )
    {
        /* Direct GSI assert */
        gsi = hvm_pci_intx_gsi(dev, intx);
        hvm_irq->gsi_assert_count[gsi]++; <---
        /* PCI-ISA bridge assert */
        link = hvm_pci_intx_link(dev, intx);
        hvm_irq->pci_link_assert_count[link]++;
    }
As soon as the guest gets a xen_platform_pci event, the nonzero assert count
causes the interrupt to be delivered in a loop, hence the guest hang.
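To make the failure mode concrete, here is a minimal sketch in C of a toy
model, not the Xen code. It assumes a level-triggered line that is redelivered
on every EOI while its assert count is nonzero. All toy_* names are
hypothetical, and toy_intx_gsi() is a made-up mapping standing in for
hvm_pci_intx_gsi().

```c
#include <assert.h>
#include <string.h>

#define TOY_NR_GSIS 48
#define TOY_NR_DEVS 8   /* keeps dev*4 + intx within 32 bits */

/* Toy stand-in for the per-domain IRQ state discussed above. */
struct toy_irq {
    unsigned long pci_intx;                     /* one bit per (dev, intx) */
    unsigned int gsi_assert_count[TOY_NR_GSIS];
};

/* Hypothetical mapping; the real hvm_pci_intx_gsi() differs. */
static unsigned int toy_intx_gsi(unsigned int dev, unsigned int intx)
{
    return ((dev + intx) & 3) + 16;
}

/* Assert a line once: set its bit and bump the GSI assert count. */
static void toy_assert(struct toy_irq *s, unsigned int dev, unsigned int intx)
{
    unsigned long bit;

    assert(dev < TOY_NR_DEVS && intx < 4);
    bit = 1UL << (dev * 4 + intx);
    if (s->pci_intx & bit)
        return;                                 /* already asserted */
    s->pci_intx |= bit;
    s->gsi_assert_count[toy_intx_gsi(dev, intx)]++;
}

/* Restore on the target: rebuild the counts from the saved bitmap. */
static void toy_load(struct toy_irq *dst, unsigned long saved_pci_intx)
{
    unsigned int dev, intx;

    memset(dst, 0, sizeof(*dst));
    dst->pci_intx = saved_pci_intx;
    for (dev = 0; dev < TOY_NR_DEVS; dev++)
        for (intx = 0; intx < 4; intx++)
            if (saved_pci_intx & (1UL << (dev * 4 + intx)))
                dst->gsi_assert_count[toy_intx_gsi(dev, intx)]++;
}

/* Count deliveries over a number of EOIs; nothing here lowers the line. */
static unsigned int toy_deliveries(const struct toy_irq *s, unsigned int gsi,
                                   unsigned int eois)
{
    unsigned int i, n = 0;

    for (i = 0; i < eois; i++)
        if (s->gsi_assert_count[gsi] > 0)
            n++;                                /* redelivered after EOI */
    return n;
}
```

If a line is asserted on the source with toy_assert() and the bitmap is then
fed to toy_load(), the target starts with a count of 1 for that GSI. Since
nothing on the target ever deasserts it, toy_deliveries() reports one delivery
per EOI, which is the loop seen in the stack trace.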
My simple fix is to just check the mask:
vioapic_masked():
    .....
+    gsi = hvm_pci_intx_gsi(device, intx);
+    if (vioapic_masked(d, gsi))
+        return;
+
vioapic.c:
+int vioapic_masked(struct domain *d, unsigned int irq)
+{
+    struct hvm_hw_vioapic *vioapic = domain_vioapic(d);
+    union vioapic_redir_entry *ent;
+
+    ent = &vioapic->redirtbl[irq];
+    if ( ent->fields.mask )
+        return 1;
+
+    return 0;
+}
+
This seems to work, but I'm not sure if it's the best fix. I'm currently
waiting for feedback from Intel and others here.
Thanks
mukesh
Sheng Liang wrote:
> Mukesh,
>
> Did you ever get a response to this? Were you able to track it down?
>
> Sheng
* Re: Re: PV on HVM guest hang...
From: Keir Fraser @ 2008-09-26 7:06 UTC
To: mukesh.rathor, Sheng Liang; +Cc: xen-devel
On 25/9/08 20:10, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:
> This seems to work, but not sure if it's the best fix, and currently waiting
> for feedback from intel, and others here now.
The fix is bogus since assertion of a GSI (basically an IO-APIC input pin)
should not be dependent on whether the pin is masked in the IO-APIC -- the
input pin 'voltage level' is obviously not affected by the
masked/not-masked.
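A small sketch in C of why gating assertion on the mask loses interrupts. This
is a toy model under stated assumptions, not the vIO-APIC code: the toy_pin
structure and all toy_* functions are hypothetical, and the pin is treated as
level-triggered.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy IO-APIC input pin: the line level is tracked separately from the mask. */
struct toy_pin {
    unsigned int assert_count;  /* how many sources hold the line high */
    bool masked;                /* redirection-entry mask bit */
};

/* Assertion changes the line level regardless of the mask. */
static void toy_pin_assert(struct toy_pin *p)
{
    p->assert_count++;
}

static void toy_pin_deassert(struct toy_pin *p)
{
    assert(p->assert_count > 0);
    p->assert_count--;
}

/* The variant from the proposed patch: drop the assertion while masked. */
static void toy_pin_assert_if_unmasked(struct toy_pin *p)
{
    if (p->masked)
        return;
    p->assert_count++;
}

/* The mask only gates delivery; it does not change the line level. */
static bool toy_pin_pending(const struct toy_pin *p)
{
    return !p->masked && p->assert_count > 0;
}
```

With toy_pin_assert(), a line raised while the pin is masked becomes pending as
soon as the guest unmasks it. With toy_pin_assert_if_unmasked(), that same
assertion is silently dropped and the guest never sees the interrupt.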
This is *supposed* to just work (assuming it is the PV-on-HVM IRQ that is
getting stuck asserted). See the explicit logic to deassert and then
reassert the PV-on-HVM INTx line in irq_save_pci().
My guess would be that you are using 3.1 branch, where that fix was never
applied (not sure why; possibly I missed it by accident). You want changeset
15691 from xen-unstable.hg.
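Roughly, the deassert/save/reassert idea looks like the following C sketch.
This is a simplified illustration, not the code from the changeset: the
toy_src structure, its fields, and toy_save() are hypothetical names, and the
saved image is reduced to the INTx bitmap alone.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy source-side state: INTx bitmap plus the PV-on-HVM callback line. */
struct toy_src {
    unsigned long pci_intx;      /* one bit per (dev, intx) */
    unsigned int callback_bit;   /* dev*4 + intx of the callback line */
    bool callback_asserted;      /* callback line currently held high */
};

/*
 * Produce the bitmap that goes into the save image. The callback line is
 * dropped before the snapshot and raised again afterwards, so the target
 * never inherits an assertion that nothing there will clear.
 */
static unsigned long toy_save(struct toy_src *s)
{
    unsigned long bit, image;

    assert(s->callback_bit < 32);
    bit = 1UL << s->callback_bit;

    if (s->callback_asserted)
        s->pci_intx &= ~bit;     /* deassert before saving */

    image = s->pci_intx;         /* snapshot without the callback line */

    if (s->callback_asserted)
        s->pci_intx |= bit;      /* reassert so the source is unchanged */

    return image;
}
```

Lines asserted by other devices still appear in the image; only the callback
line is filtered out, and the source's own state is the same after the call as
before it.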
-- Keir
* Re: Re: PV on HVM guest hang...
From: Mukesh Rathor @ 2008-10-03 2:18 UTC
To: Keir Fraser; +Cc: xen-devel, Sheng Liang
Keir Fraser wrote:
> On 25/9/08 20:10, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:
>
>> This seems to work, but not sure if it's the best fix, and currently waiting
>> for feedback from intel, and others here now.
>
> The fix is bogus since assertion of a GSI (basically an IO-APIC input pin)
> should not be dependent on whether the pin is masked in the IO-APIC -- the
> input pin 'voltage level' is obviously not affected by the
> masked/not-masked.
Yeah, it was a shot in the dark... I've forgotten a lot of that since college :)
> This is *supposed* to just work (assuming it is the PV-on-HVM IRQ that is
> getting stuck asserted). See the explicit logic to deassert and then
> reassert the PV-on-HVM INTx line in irq_save_pci().
>
> My guess would be that you are using 3.1 branch, where that fix was never
> applied (not sure why; possibly I missed it by accident). You want changeset
> 15691 from xen-unstable.hg.
Correct, 3.1.4. Got the changeset, and looks like it's fixed now.
Thanks as always... Mukesh
> -- Keir
>