xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* x86/passthrough: Fix corruption caused by race conditions between device allocation and deallocation to a domain.
@ 2012-09-12 18:00 Andrew Cooper
  0 siblings, 0 replies; only message in thread
From: Andrew Cooper @ 2012-09-12 18:00 UTC (permalink / raw)
  To: xen-devel@lists.xen.org, Keir Fraser, Jan Beulich

[-- Attachment #1: Type: text/plain, Size: 465 bytes --]

This issue certainly affects unstable, 4.2 and 4.1, and probably affects
previous releases as well.

It has been tested by using 40 VMs with 8 devices passed through to
each, in a reboot loop.  Before the fix, the host would reliably crash
before the end of the first reboot cycle, whereas with the fix, the VMs
are on their 20th reboot cycle with no issue.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com


[-- Attachment #2: passthrough-fix-corruption.patch --]
[-- Type: text/x-patch, Size: 3935 bytes --]

# HG changeset patch
# Parent 7d770de90b7f92bb197bd54d8ac0941e2e5ae96a
x86/passthrough: Fix corruption caused by race conditions between device allocation and deallocation to a domain.

A toolstack, when dealing with a domain using PCIPassthrough, could
reasonably be expected to issue DOMCTL_deassign_device hypercalls to
remove all passed through devices before issuing a DOMCTL_destroydomain
hypercall to kill the domain.  In the case where a toolstack is perhaps
less sensible in this regard, the hypervisor should not fall over.

In domain_kill(), pci_release_devices() searches the alldevs_list list
looking for PCI devices still assigned to the domain.  If the toolstack
has correctly deassigned all devices before killing the domain, this
loop does nothing.

However, if there are still devices attached to the domain, the loop
will call pci_cleanup_msi() without unbinding the pirq from the domain.
This eventually calls destroy_irq() which xfree()'s the action.

However, as the irq_desc->action pointer is abused in an unsafe matter,
without unbinding first (which at least correctly cleans up), the action
is actually an irq_guest_action_t* rather than an irqaction*, meaning
that the cpu_eoi_map is leaked, and eoi_timer is free()'d while still
being on a pcpu's inactive_timer list.  As a result, when this free()'d
memory gets reused, the inactive_timer list becomes corrupt, and
list_*** operations will corrupt hypervisor memory.

If the above were not bad enough, the loop in pci_release_devices()
still leaves references to the irq it destroyed in domain->arch.pirq_irq
and irq_pirq, meaning that a later loop, free_domain_pirqs(), which
happens as a result of complete_domain_destroy() will unbind and destroy
all irqs which were still bound to the domain, resulting in a double
destroy of any irq which was still bound to the domain at the point at
which the DOMCTL_destroydomain hypercall happened.

Because of the allocation of irqs from find_unassigned_irq(), the lowest
free irq number is going to be handed back from create_irq().

There is a further race condition between the original (incorrect) call
to destroy_irq() from pci_release_devices(), and the later call to
free_domain_pirqs() (which happens in a softirq context at some point
after the domain has officially died) during which the same irq number
(which is still referenced in a stale way in domain->arch.pirq_irq and
irq_pirq) has been allocated to a new domain via a PHYSDEVOP_map_pirq
hypercall (Say perhaps in the case of rebooting a domain).

In this case, the cleanup for the dead domain will free the recently
bound irq under the feet of the new domain.  Furthermore, after the irq
has been incorrectly destroyed, the same domain with another
PHYSDEVOP_map_pirq hypercall can be allocated the same irq number as
before, leading to an error along the lines of:

../physdev.c:188: dom54: -1:-1 already mapped to 74

In this case, the pirq_irq and irq_pirq mappings get updated to the new
PCI device from the latter PHYSDEVOP_map_pirq hypercall, and the IOMMU
interrupt remapping registers get updated, leading to IOMMU Primary
Pending Fault due to source-id verification failure for incoming
interrupts from the passed through device.


The easy fix is to simply deassign the device in pci_release_devices()
and leave all the real cleanup to the free_domain_pirqs() which
correctly unbinds and destroys the irq without leaving stale references
around.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

diff -r 7d770de90b7f xen/drivers/passthrough/pci.c
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -551,7 +551,6 @@ void pci_release_devices(struct domain *
     pci_clean_dpci_irqs(d);
     while ( (pdev = pci_get_pdev_by_domain(d, -1, -1, -1)) )
     {
-        pci_cleanup_msi(pdev);
         bus = pdev->bus;
         devfn = pdev->devfn;
         if ( deassign_device(d, pdev->seg, bus, devfn) )

[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2012-09-12 18:00 UTC | newest]

Thread overview: (only message) (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-09-12 18:00 x86/passthrough: Fix corruption caused by race conditions between device allocation and deallocation to a domain Andrew Cooper

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).