* IOMMU faults
@ 2011-06-16 9:25 Tim Deegan
2011-06-16 9:47 ` Jean Guyader
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Tim Deegan @ 2011-06-16 9:25 UTC (permalink / raw)
To: Allen Kay, Wei Wang; +Cc: xen-devel, Jean Guyader
Hi, IOMMU maintainers,
What should Xen do when an IOMMU fault happens? As far as I can
see both the AMD and Intel code clears the error in the IOMMU and
carries on, but I suspect some more vigorous action is appropriate.
I've seen traces from an Intel machine that seemed to be livelocked on
IOMMU faults from a passed-through VGA card, until it was killed by the
watchdog. I think I can see two things that contribute to that:
- The Intel IOMMU fault handler prints quite a lot of info in interrupt
context, making it easier to livelock. Still I think the general
problem applies on AMD too.
- Domain destruction re-assigns passed though cards to dom0, but the
cards don't seem to get reset. So there's nothing to stop a card
battering away at DMA in the meantime. That seems like a problem
independent of livelock, actually.
In any case, it seems like it would be a good idea to stop a
broken/malicious/deassigned card from flooding Xen with IOMMU faults.
I was considering just writing 0 to the faulting card's PCI command
register, but I'm told that's not always enough to properly deactivate
a card, and it might be a little over-zealous to do it on the first
offence.
Ideas?
Tim.
--
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Xen Platform Team
Citrix Systems UK Ltd. (Company #02937203, SL9 0BG)
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: IOMMU faults 2011-06-16 9:25 IOMMU faults Tim Deegan @ 2011-06-16 9:47 ` Jean Guyader 2011-06-16 10:07 ` Tim Deegan 2011-06-16 14:30 ` Wei Wang2 2011-06-16 19:21 ` Kay, Allen M 2 siblings, 1 reply; 12+ messages in thread From: Jean Guyader @ 2011-06-16 9:47 UTC (permalink / raw) To: Tim Deegan Cc: Wei Wang, xen-devel@lists.xensource.com, Allen Kay, Jean Guyader [-- Attachment #1: Type: text/plain, Size: 2254 bytes --] On 16/06 10:25, Tim Deegan wrote: > Hi, IOMMU maintainers, > > What should Xen do when an IOMMU fault happens? As far as I can > see both the AMD and Intel code clears the error in the IOMMU and > carries on, but I suspect some more vigorous action is appropriate. > I've seen traces from an Intel machine that seemed to be livelocked on > IOMMU faults from a passed-through VGA card, until it was killed by the > watchdog. I think I can see two things that contribute to that: > > - The Intel IOMMU fault handler prints quite a lot of info in interrupt > context, making it easier to livelock. Still I think the general > problem applies on AMD too. > - Domain destruction re-assigns passed though cards to dom0, but the > cards don't seem to get reset. So there's nothing to stop a card > battering away at DMA in the meantime. That seems like a problem > independent of livelock, actually. > > In any case, it seems like it would be a good idea to stop a > broken/malicious/deassigned card from flooding Xen with IOMMU faults. > > I was considering just writing 0 to the faulting card's PCI command > register, but I'm told that's not always enough to properly deactivate > a card, and it might be a little over-zealous to do it on the first > offence. > > Ideas? > Hi Tim, We have seed such behavior when we were testing GPU assignement especially the Intel GPU. The problem is that domain destruction in Xen is assynchronous and right now the pci device reset is done in dom0 with some help of the toolstack. In the Intel GPU case we need to make sure that the guest memory and the IOMMU are still in place while we perform to reset otherwise the device drift into an unstable state. There is probably other ways to do that in a cleaner way but we decided to move the pci reset code into Xen, so we are sure we perform the reset while the device is in a known state (functionning state). Attached is the patch we have in XenClient that move the pci reset into Xen. The modifications we have made to the VT-d code should go in the IOMMU generic section. I appologise but this patch is based on Xen 3.4, if we think this is the right way to do it, I can submit a proper patch against unstable and 4.1. Regards, Jean [-- Attachment #2: iommu-vtd-pci-reset-on-reassign --] [-- Type: text/plain, Size: 6846 bytes --] diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index feda3ba..2f5a69a 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -510,8 +510,6 @@ void arch_domain_destroy(struct domain *d) pci_release_devices(d); free_domain_pirqs(d); - if ( !is_idle_domain(d) ) - iommu_domain_destroy(d); paging_final_teardown(d); diff --git a/xen/common/domain.c b/xen/common/domain.c index 32f1d48..a7b0207 100644 --- a/xen/common/domain.c +++ b/xen/common/domain.c @@ -385,6 +385,7 @@ int domain_kill(struct domain *d) v4v_destroy(d); evtchn_destroy(d); gnttab_release_mappings(d); + iommu_domain_destroy(d); /* fallthrough */ case DOMDYING_dying: rc = domain_relinquish_resources(d); diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c index 723cff0..e90a335 100644 --- a/xen/drivers/passthrough/pci.c +++ b/xen/drivers/passthrough/pci.c @@ -75,6 +75,81 @@ struct pci_dev *pci_get_pdev(int bus, int devfn) return NULL; } +struct pci_dev *pci_get_bridge(u8 secondary) +{ + struct pci_dev *pdev = NULL, *m = NULL; + + spin_lock(&pcidevs_lock); + + list_for_each_entry ( pdev, &alldevs_list, alldevs_list ) + { + u8 type = pci_conf_read8(pdev->bus, + PCI_SLOT(pdev->devfn), + PCI_FUNC(pdev->devfn), + PCI_HEADER_TYPE); + if ( (type & 3) == PCI_HEADER_TYPE_BRIDGE ) + { + u8 sec = pci_conf_read8(pdev->bus, + PCI_SLOT(pdev->devfn), + PCI_FUNC(pdev->devfn), + PCI_SECONDARY_BUS); + if ( sec == secondary ) + { + m = pdev; + goto out; + } + } + } +out: + spin_unlock(&pcidevs_lock); + return m; +} + +struct pci_dev *pci_get_gfx(struct domain *d) +{ + struct pci_dev *pdev = NULL, *m = NULL; + + spin_lock(&pcidevs_lock); + + list_for_each_entry ( pdev, &alldevs_list, alldevs_list ) + { + if ( d == NULL || pdev->domain == d ) + { + u16 class = pci_conf_read16(pdev->bus, + PCI_SLOT(pdev->devfn), + PCI_FUNC(pdev->devfn), + PCI_CLASS_DEVICE); + + if ( class == 0x0300 ) + { + m = pdev; + goto out; + } + } + } +out: + spin_unlock(&pcidevs_lock); + return m; + +} + +struct pci_dev *pci_match_pdev(int (*match)(struct pci_dev *)) +{ + struct pci_dev *pdev = NULL, *m = NULL; + + spin_lock(&pcidevs_lock); + + list_for_each_entry ( pdev, &alldevs_list, alldevs_list ) + if ( match(pdev) ) + { + m = pdev; + goto out; + } +out: + spin_unlock(&pcidevs_lock); + return m; +} + struct pci_dev *pci_get_pdev_by_domain(struct domain *d, int bus, int devfn) { struct pci_dev *pdev = NULL; diff --git a/xen/drivers/passthrough/vtd/Makefile b/xen/drivers/passthrough/vtd/Makefile index 0e6f163..00ba797 100644 --- a/xen/drivers/passthrough/vtd/Makefile +++ b/xen/drivers/passthrough/vtd/Makefile @@ -6,3 +6,4 @@ obj-y += dmar.o obj-y += utils.o obj-y += qinval.o obj-y += intremap.o +obj-y += pci_reset.o diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c index 687f3fd..df88dc7 100644 --- a/xen/drivers/passthrough/vtd/iommu.c +++ b/xen/drivers/passthrough/vtd/iommu.c @@ -21,7 +21,6 @@ #include <xen/irq.h> #include <xen/sched.h> -#include <xen/xmalloc.h> #include <xen/mm.h> #include <xen/paging.h> #include <xen/domain_page.h> @@ -32,11 +31,13 @@ #include <xen/pci.h> #include <xen/pci_regs.h> #include <xen/keyhandler.h> +#include <xen/xmalloc.h> #include <asm/msi.h> #include "iommu.h" #include "dmar.h" #include "extern.h" #include "vtd.h" +#include "pci_reset.h" #define domain_iommu_domid(d) ((d)->arch.hvm_domain.hvm_iommu.iommu_domid) @@ -1503,6 +1504,15 @@ static int reassign_device_ownership( if ( (drhd = acpi_find_matched_drhd_unit(pdev)) == NULL ) return -ENODEV; + + /* Only do the reset here if target isn't dom0. + ** For the reassign to dom0 it's done in the iommu_domain_teardown + ** function because reassign_device_ownership happen too late in + ** the process. + */ + if (target->domain_id != 0) + pci_reset_device(bus, devfn); + pdev_iommu = drhd->iommu; domain_context_unmap(source, bus, devfn); @@ -1534,10 +1544,14 @@ static int reassign_device_ownership( void iommu_domain_teardown(struct domain *d) { struct hvm_iommu *hd = domain_hvm_iommu(d); + struct pci_dev *pdev = NULL; if ( list_empty(&acpi_drhd_units) ) return; + if ((pdev = pci_get_gfx(d))) + pci_reset_device(pdev->bus, pdev->devfn); + spin_lock(&hd->mapping_lock); iommu_free_pagetable(hd->pgd_maddr, agaw_to_level(hd->agaw)); hd->pgd_maddr = 0; diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h index 6827e0d..9ed6f60 100644 --- a/xen/include/xen/pci.h +++ b/xen/include/xen/pci.h @@ -77,6 +77,9 @@ int pci_remove_device(u8 bus, u8 devfn); int pci_add_device_ext(u8 bus, u8 devfn, struct pci_dev_info *info); struct pci_dev *pci_get_pdev(int bus, int devfn); struct pci_dev *pci_get_pdev_by_domain(struct domain *d, int bus, int devfn); +struct pci_dev *pci_match_pdev(int (*match)(struct pci_dev *)); +struct pci_dev *pci_get_bridge(u8 secondary); +struct pci_dev *pci_get_gfx(struct domain *d); uint8_t pci_conf_read8( unsigned int bus, unsigned int dev, unsigned int func, unsigned int reg); diff --git a/xen/include/xen/pci_regs.h b/xen/include/xen/pci_regs.h index 361554c..20b00e8 100644 --- a/xen/include/xen/pci_regs.h +++ b/xen/include/xen/pci_regs.h @@ -210,6 +210,7 @@ #define PCI_CAP_ID_AGP3 0x0E /* AGP Target PCI-PCI bridge */ #define PCI_CAP_ID_EXP 0x10 /* PCI Express */ #define PCI_CAP_ID_MSIX 0x11 /* MSI-X */ +#define PCI_CAP_ID_AF 0x13 /* PCI Advanced Features */ #define PCI_CAP_LIST_NEXT 1 /* Next capability in the list */ #define PCI_CAP_FLAGS 2 /* Capability defined flags (16 bits) */ #define PCI_CAP_SIZEOF 4 @@ -315,6 +316,17 @@ #define PCI_CHSWP_EXT 0x40 /* ENUM# status - extraction */ #define PCI_CHSWP_INS 0x80 /* ENUM# status - insertion */ +/* PCI Advanced Feature registers */ + +#define PCI_AF_LENGTH 2 +#define PCI_AF_CAP 3 +#define PCI_AF_CAP_TP 0x01 +#define PCI_AF_CAP_FLR 0x02 +#define PCI_AF_CTRL 4 +#define PCI_AF_CTRL_FLR 0x01 +#define PCI_AF_STATUS 5 +#define PCI_AF_STATUS_TP 0x01 + /* PCI-X registers */ #define PCI_X_CMD 2 /* Modes & Features */ [-- Attachment #3: Type: text/plain, Size: 138 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: Re: IOMMU faults 2011-06-16 9:47 ` Jean Guyader @ 2011-06-16 10:07 ` Tim Deegan 2011-06-16 10:28 ` Jean Guyader 0 siblings, 1 reply; 12+ messages in thread From: Tim Deegan @ 2011-06-16 10:07 UTC (permalink / raw) To: Jean Guyader Cc: Wei Wang, Jean, xen-devel@lists.xensource.com, Allen Kay, Guyader Hi Jean, At 10:47 +0100 on 16 Jun (1308221277), Jean Guyader wrote: > We have seed such behavior when we were testing GPU assignement especially > the Intel GPU. The problem is that domain destruction in Xen is assynchronous > and right now the pci device reset is done in dom0 with some help of the toolstack. > > In the Intel GPU case we need to make sure that the guest memory and the IOMMU > are still in place while we perform to reset otherwise the device drift into > an unstable state. > > There is probably other ways to do that in a cleaner way but we decided to move > the pci reset code into Xen, so we are sure we perform the reset while the device > is in a known state (functionning state). > > Attached is the patch we have in XenClient that move the pci reset into Xen. Thanks, Jean. This sounds like a good idea to me, though I'd like to hear Wei and Allen's opinions. The patch is incomplete (missing the new pci_reset.[ch] files) but I get the general idea. A few questions: - Why the special handling for one graphics device on each domain? (And if one, why not all?) - Why not reset when the target is dom0? It seems like it can do no harm and should protect dom0 from assigning itself an active PCI card. Of course, even with this patch, my original question still stands: should Xen do something more assertive in the IOMMU fault handler? Cheers, Tim. -- Tim Deegan <Tim.Deegan@citrix.com> Principal Software Engineer, Xen Platform Team Citrix Systems UK Ltd. (Company #02937203, SL9 0BG) ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Re: IOMMU faults 2011-06-16 10:07 ` Tim Deegan @ 2011-06-16 10:28 ` Jean Guyader 2011-06-24 13:32 ` Tim Deegan 0 siblings, 1 reply; 12+ messages in thread From: Jean Guyader @ 2011-06-16 10:28 UTC (permalink / raw) To: Tim Deegan Cc: Wei Wang, xen-devel@lists.xensource.com, Allen Kay, Jean Guyader [-- Attachment #1: Type: text/plain, Size: 2050 bytes --] On 16/06 11:07, Tim Deegan wrote: > Hi Jean, > Reply below, > At 10:47 +0100 on 16 Jun (1308221277), Jean Guyader wrote: > > We have seed such behavior when we were testing GPU assignement especially > > the Intel GPU. The problem is that domain destruction in Xen is assynchronous > > and right now the pci device reset is done in dom0 with some help of the toolstack. > > > > In the Intel GPU case we need to make sure that the guest memory and the IOMMU > > are still in place while we perform to reset otherwise the device drift into > > an unstable state. > > > > There is probably other ways to do that in a cleaner way but we decided to move > > the pci reset code into Xen, so we are sure we perform the reset while the device > > is in a known state (functionning state). > > > > Attached is the patch we have in XenClient that move the pci reset into Xen. > > Thanks, Jean. This sounds like a good idea to me, though I'd like to > hear Wei and Allen's opinions. > > The patch is incomplete (missing the new pci_reset.[ch] files) but I get > the general idea. A few questions: Reattach the full patch. > > - Why the special handling for one graphics device on each domain? > (And if one, why not all?) No good reason really just a limitation of the patch, we can trivially get ride of the limitation. > - Why not reset when the target is dom0? It seems like it can do no > harm and should protect dom0 from assigning itself an active PCI > card. Reset could be quiet expensive (couple of seconds sometimes). We did that to avoid a double reset on domain reboot. I agree that we should remove that, or extend the IOMMU API so we can reassign from domU to domU without going through dom0. > > Of course, even with this patch, my original question still stands: > should Xen do something more assertive in the IOMMU fault handler? > What we really want to achive here is to stop DMA on this device. One way of doing it is to perform a proper PCI reset (FLR, secondary bus reset, ...) when that happens. Jean [-- Attachment #2: iommu-vtd-pci-reset-on-reassign --] [-- Type: text/plain, Size: 19643 bytes --] diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index feda3ba..2f5a69a 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -510,8 +510,6 @@ void arch_domain_destroy(struct domain *d) pci_release_devices(d); free_domain_pirqs(d); - if ( !is_idle_domain(d) ) - iommu_domain_destroy(d); paging_final_teardown(d); diff --git a/xen/common/domain.c b/xen/common/domain.c index 32f1d48..a7b0207 100644 --- a/xen/common/domain.c +++ b/xen/common/domain.c @@ -385,6 +385,7 @@ int domain_kill(struct domain *d) v4v_destroy(d); evtchn_destroy(d); gnttab_release_mappings(d); + iommu_domain_destroy(d); /* fallthrough */ case DOMDYING_dying: rc = domain_relinquish_resources(d); diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c index 723cff0..e90a335 100644 --- a/xen/drivers/passthrough/pci.c +++ b/xen/drivers/passthrough/pci.c @@ -75,6 +75,81 @@ struct pci_dev *pci_get_pdev(int bus, int devfn) return NULL; } +struct pci_dev *pci_get_bridge(u8 secondary) +{ + struct pci_dev *pdev = NULL, *m = NULL; + + spin_lock(&pcidevs_lock); + + list_for_each_entry ( pdev, &alldevs_list, alldevs_list ) + { + u8 type = pci_conf_read8(pdev->bus, + PCI_SLOT(pdev->devfn), + PCI_FUNC(pdev->devfn), + PCI_HEADER_TYPE); + if ( (type & 3) == PCI_HEADER_TYPE_BRIDGE ) + { + u8 sec = pci_conf_read8(pdev->bus, + PCI_SLOT(pdev->devfn), + PCI_FUNC(pdev->devfn), + PCI_SECONDARY_BUS); + if ( sec == secondary ) + { + m = pdev; + goto out; + } + } + } +out: + spin_unlock(&pcidevs_lock); + return m; +} + +struct pci_dev *pci_get_gfx(struct domain *d) +{ + struct pci_dev *pdev = NULL, *m = NULL; + + spin_lock(&pcidevs_lock); + + list_for_each_entry ( pdev, &alldevs_list, alldevs_list ) + { + if ( d == NULL || pdev->domain == d ) + { + u16 class = pci_conf_read16(pdev->bus, + PCI_SLOT(pdev->devfn), + PCI_FUNC(pdev->devfn), + PCI_CLASS_DEVICE); + + if ( class == 0x0300 ) + { + m = pdev; + goto out; + } + } + } +out: + spin_unlock(&pcidevs_lock); + return m; + +} + +struct pci_dev *pci_match_pdev(int (*match)(struct pci_dev *)) +{ + struct pci_dev *pdev = NULL, *m = NULL; + + spin_lock(&pcidevs_lock); + + list_for_each_entry ( pdev, &alldevs_list, alldevs_list ) + if ( match(pdev) ) + { + m = pdev; + goto out; + } +out: + spin_unlock(&pcidevs_lock); + return m; +} + struct pci_dev *pci_get_pdev_by_domain(struct domain *d, int bus, int devfn) { struct pci_dev *pdev = NULL; diff --git a/xen/drivers/passthrough/vtd/Makefile b/xen/drivers/passthrough/vtd/Makefile index 0e6f163..00ba797 100644 --- a/xen/drivers/passthrough/vtd/Makefile +++ b/xen/drivers/passthrough/vtd/Makefile @@ -6,3 +6,4 @@ obj-y += dmar.o obj-y += utils.o obj-y += qinval.o obj-y += intremap.o +obj-y += pci_reset.o diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c index 687f3fd..df88dc7 100644 --- a/xen/drivers/passthrough/vtd/iommu.c +++ b/xen/drivers/passthrough/vtd/iommu.c @@ -21,7 +21,6 @@ #include <xen/irq.h> #include <xen/sched.h> -#include <xen/xmalloc.h> #include <xen/mm.h> #include <xen/paging.h> #include <xen/domain_page.h> @@ -32,11 +31,13 @@ #include <xen/pci.h> #include <xen/pci_regs.h> #include <xen/keyhandler.h> +#include <xen/xmalloc.h> #include <asm/msi.h> #include "iommu.h" #include "dmar.h" #include "extern.h" #include "vtd.h" +#include "pci_reset.h" #define domain_iommu_domid(d) ((d)->arch.hvm_domain.hvm_iommu.iommu_domid) @@ -1503,6 +1504,15 @@ static int reassign_device_ownership( if ( (drhd = acpi_find_matched_drhd_unit(pdev)) == NULL ) return -ENODEV; + + /* Only do the reset here if target isn't dom0. + ** For the reassign to dom0 it's done in the iommu_domain_teardown + ** function because reassign_device_ownership happen too late in + ** the process. + */ + if (target->domain_id != 0) + pci_reset_device(bus, devfn); + pdev_iommu = drhd->iommu; domain_context_unmap(source, bus, devfn); @@ -1534,10 +1544,14 @@ static int reassign_device_ownership( void iommu_domain_teardown(struct domain *d) { struct hvm_iommu *hd = domain_hvm_iommu(d); + struct pci_dev *pdev = NULL; if ( list_empty(&acpi_drhd_units) ) return; + if ((pdev = pci_get_gfx(d))) + pci_reset_device(pdev->bus, pdev->devfn); + spin_lock(&hd->mapping_lock); iommu_free_pagetable(hd->pgd_maddr, agaw_to_level(hd->agaw)); hd->pgd_maddr = 0; diff --git a/xen/drivers/passthrough/vtd/pci_reset.c b/xen/drivers/passthrough/vtd/pci_reset.c new file mode 100644 index 0000000..f0614f9 --- /dev/null +++ b/xen/drivers/passthrough/vtd/pci_reset.c @@ -0,0 +1,389 @@ +/* + * Copyright (c) 2011, Citrix Systems. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple + * Place - Suite 330, Boston, MA 02111-1307 USA. + * + * Copyright (C) Julian Pidancet <julian.pidancet@citrix.com> + * Copyright (C) Jean Guyader <jean.guyader@citrix.com> + */ + +#include <xen/sched.h> +#include <xen/pci.h> +#include <xen/pci_regs.h> +#include <asm/delay.h> +#include <xen/delay.h> + +#define INTEL_VID 0x8086 +#define INTEL_DID_GMA 0x2a00 +#define INTEL_DID_CALPELLA 0x0046 +#define INTEL_DID_ARRANDALE 0x0042 +#define INTEL_DID_SNB1 0x0126 +#define INTEL_DID_SNB2 0x0102 + +#define NVIDIA_VID 0x10de +#define ATI_VID 0x1002 + +#define IS_MISCGFX(vid, class) ( ((vid) == NVIDIA_VID) && \ + (class) == 0x300 \ + ) + +#define IS_GMA(vid, did) ( (vid) == INTEL_VID && \ + ((did) & 0xff00) == INTEL_DID_GMA) + +#define IS_IRONLAKE(vid, did) ( (vid) == INTEL_VID && \ + ((did) == INTEL_DID_CALPELLA || \ + (did) == INTEL_DID_SNB1 || \ + (did) == INTEL_DID_SNB2 || \ + (did) == INTEL_DID_ARRANDALE) \ + ) + +#define PCI_BDF_FMT "%02x:%02x.%x" + +#define PCI_COMMAND_VALID (PCI_COMMAND_IO | PCI_COMMAND_MEMORY | \ + PCI_COMMAND_MASTER) + +#define PCI_CONFIG_SIZE 0x100 + +#define ypcmem(a,b,c) memcpy(b,a,c) + +static int reg_len[PCI_CONFIG_SIZE] = { +/*00*/ 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, +/*10*/ 4, 0, 0, 0, 4, 0, 0, 0, 4, 0, 0, 0, 4, 0, 0, 0, +/*20*/ 4, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, +/*30*/ 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, + +/*40*/ 4, 0, 0, 0, 4, 0, 0, 0, 4, 0, 0, 0, 4, 0, 0, 0, +/*50*/ 4, 0, 0, 0, 4, 0, 0, 0, 4, 0, 0, 0, 4, 0, 0, 0, +/*60*/ 4, 0, 0, 0, 4, 0, 0, 0, 4, 0, 0, 0, 4, 0, 0, 0, +/*70*/ 4, 0, 0, 0, 4, 0, 0, 0, 4, 0, 0, 0, 4, 0, 0, 0, + +/*80*/ 4, 0, 0, 0, 4, 0, 0, 0, 4, 0, 0, 0, 4, 0, 0, 0, +/*90*/ 4, 0, 0, 0, 4, 0, 0, 0, 4, 0, 0, 0, 4, 0, 0, 0, +/*a0*/ 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, +/*b0*/ 4, 0, 0, 0, 4, 0, 0, 0, 4, 0, 0, 0, 4, 0, 0, 0, + +/*c0*/ 0, 1, 2, 0, 4, 0, 0, 0, 4, 0, 0, 0, 4, 0, 0, 0, +/*d0*/ 4, 0, 0, 0, 4, 0, 0, 0, 4, 0, 0, 0, 4, 0, 0, 0, +/*e0*/ 4, 0, 0, 0, 4, 0, 0, 0, 4, 0, 0, 0, 4, 0, 0, 0, +/*f0*/ 4, 0, 0, 0, 4, 0, 0, 0, 4, 0, 0, 0, 4, 0, 0, 0, +}; + +static int reg_compare_len[PCI_CONFIG_SIZE] = { +/*00*/ 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, +/*10*/ 4, 0, 0, 0, 4, 0, 0, 0, 4, 0, 0, 0, 4, 0, 0, 0, +/*20*/ 4, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, +/*30*/ 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, + +/*40*/ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, +/*50*/ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, +/*60*/ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, +/*70*/ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + +/*80*/ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, +/*90*/ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, +/*a0*/ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, +/*b0*/ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + +/*c0*/ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, +/*d0*/ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, +/*e0*/ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, +/*f0*/ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, +}; + +static void +read_config_space (u8 b, u16 devfn, u8 * cfg) +{ + u8 d = PCI_SLOT(devfn), f = PCI_FUNC(devfn); + u32 cfg32; + u16 cfg16; + int i; + + for (i = 0; i < PCI_CONFIG_SIZE; ++i) + { + switch (reg_len[i]) + { + case 1: + cfg[i] = pci_conf_read8(b, d, f, i); + break; + case 2: + cfg16 = pci_conf_read16(b, d, f, i); + memcpy (&cfg[i], &cfg16, sizeof (cfg16)); + break; + case 4: + cfg32 = pci_conf_read32(b, d, f, i); + memcpy (&cfg[i], &cfg32, sizeof (cfg32)); + break; + } + } +} + +static void +write_config_space (u8 b, u16 devfn, u8 * cfg) +{ + u8 d = PCI_SLOT(devfn), f = PCI_FUNC(devfn); + u32 cfg32; + u16 cfg16; + int i; + + + for (i = 0; i < PCI_CONFIG_SIZE; ++i) + { + switch (reg_len[i]) + { + case 1: + pci_conf_write8(b, d, f, i, cfg[i]); + break; + case 2: + ypcmem (&cfg[i], &cfg16, sizeof (cfg16)); + pci_conf_write16(b, d, f, i, cfg16); + break; + case 4: + ypcmem (&cfg[i], &cfg32, sizeof (cfg32)); + pci_conf_write32(b, d, f, i, cfg32); + break; + } + } +} + +static int +compare_config_space (u8 b, u16 devfn, u8 * cfg) +{ + u8 d = PCI_SLOT(devfn), f = PCI_FUNC(devfn); + u32 cfg32; + u16 cfg16; + u8 cfg8; + int ret = 0; + int i; + + for (i = 0; i < PCI_CONFIG_SIZE; ++i) + { + switch (reg_compare_len[i]) + { + case 1: + cfg8 = pci_conf_read8(b, d, f, i); + if (memcmp (&cfg[i], &cfg8, sizeof (cfg8))) + { + printk("compare_config_space failed, wait a bit a restore again " PCI_BDF_FMT " %x\n", + b, d, f, i); + ret++; + } + break; + case 2: + cfg16 = pci_conf_read16(b, d, f, i); + if (memcmp (&cfg[i], &cfg16, sizeof (cfg16))) + { + printk("compare_config_space failed, wait a bit a restore again " PCI_BDF_FMT " %x\n", + b, d, f, i); + ret++; + } + break; + case 4: + cfg32 = pci_conf_read32(b, d, f, i); + if (memcmp (&cfg[i], &cfg32, sizeof (cfg32))) + { + printk("compare_config_space failed, wait a bit a restore again " PCI_BDF_FMT " %x\n", + b, d, f, i); + ret++; + } + break; + } + } + return ret; +} + +static int +restore_config_space (u8 b, u16 devfn, u8 * cfg) +{ + int tries = 3; + + while (tries--) + { + write_config_space(b, devfn, cfg); + if (!compare_config_space(b, devfn, cfg)) + return 0; + cpu_relax(); + mdelay(100); /* Wait 100 ms between restores */ + } + return tries == 0; +} + +static void pci_reset_function(u8 bus, u8 devfn, int force) +{ + s_time_t start_time; + u8 d = PCI_SLOT(devfn), f = PCI_FUNC(devfn); + u8 val; + int af_pos; + u8 af; + + printk("Resetting PCI function %d:%d.%d\n", bus, d, f); + + af_pos = pci_find_cap_offset(bus, d, f, PCI_CAP_ID_AF); + if (af_pos == 0) + { + printk("%s: PCI Advanced Features not found\n", __func__); + if (force) { + printk("%s: trying A4h\n", __func__); + af_pos = 0xa4; + } + else + BUG_ON(1); + } + + if (!force) + { + af = pci_conf_read8(bus, d, f, af_pos + PCI_AF_CAP); + if (!(af & PCI_AF_CAP_FLR)) + { + printk("%s: PCI device cannot FLR\n", __func__); + BUG_ON(1); + } + } + + /* Leave CMD MEMORY set otherwise the platform can crashe during FLR */ + pci_conf_write16(bus, d, f, PCI_COMMAND, 2); + + /* Wait for bus accesses to complete */ + start_time = NOW(); + for ( ; ; ) + { + val = pci_conf_read8(bus, d, f, af_pos + PCI_AF_STATUS); + if ( !(val & PCI_AF_STATUS_TP) ) + break; + + if ( NOW() > start_time + MILLISECS(1000) ) + { + printk("%s: Failed to quiesce bus accesses\n", __func__); + BUG_ON(1); + } + + cpu_relax(); + } + + /* Trigger the FLR */ + pci_conf_write8(bus, d, f, af_pos + PCI_AF_CTRL, PCI_AF_CTRL_FLR); + mdelay(100); +} + +static void force_cmd_register(u8 bus, u8 devfn) +{ + u8 d = PCI_SLOT(devfn), f = PCI_FUNC(devfn); + u8 val; + + val = pci_conf_read16(bus, d, f, PCI_COMMAND); + + if ((val & 0x7) == 0) + { + val |= 0x7; + pci_conf_write16(bus, d, f, PCI_COMMAND, val); + } +} + +static void pci_reset_bus(u8 bus, u8 devfn) +{ + u8 d = PCI_SLOT(devfn), f = PCI_FUNC(devfn); + u8 val; + + val = pci_conf_read8(bus, d, f, PCI_BRIDGE_CONTROL); + pci_conf_write8(bus, d, f, PCI_BRIDGE_CONTROL, val | PCI_BRIDGE_CTL_BUS_RESET); + udelay(20000); + pci_conf_write8(bus, d, f, PCI_BRIDGE_CONTROL, val); + udelay(20000); +} + +static void pci_reset_misc_gfx(u8 bus, u8 devfn) +{ + u8 d = PCI_SLOT(devfn), f = PCI_FUNC(devfn); + u16 vendorid, deviceid, class; + + vendorid = pci_conf_read16(bus, d, f, PCI_VENDOR_ID); + deviceid = pci_conf_read16(bus, d, f, PCI_DEVICE_ID); + class = pci_conf_read16(bus, d, f, PCI_CLASS_DEVICE); + + if (IS_MISCGFX(vendorid, class)) { + struct pci_dev *bridge = pci_get_bridge(bus); + u8 *config; + + if (bridge == NULL) { + printk("%s: Can't find bridge.\n", __func__); + return; + } + + config = xmalloc_array(u8, PCI_CONFIG_SIZE); + read_config_space(bus, devfn, config); + + pci_reset_bus(bridge->bus, bridge->devfn); + + udelay(20000); + + restore_config_space(bus, devfn, config); + xfree(config); + } + +} + +static void pci_reset_intel_gfx(u8 bus, u8 devfn) +{ + u8 d = PCI_SLOT(devfn), f = PCI_FUNC(devfn); + u16 vendorid, deviceid; + u8 val; + u8 *config; + int i; + + vendorid = pci_conf_read16(bus, d, f, PCI_VENDOR_ID); + deviceid = pci_conf_read16(bus, d, f, PCI_DEVICE_ID); + + /* Intel GMA */ + if (IS_GMA(vendorid, deviceid)) { + config = xmalloc_array(u8, PCI_CONFIG_SIZE); + read_config_space(bus, devfn, config); + + /* Perform GPU Media reset */ + pci_conf_write8(bus, d, f, 0xC0, 0x0D); + + for (i = 0; i < 10; i++) + { + mdelay(10); + + val = pci_conf_read8(bus, d, f, 0xC0); + if ( !(val & 0x1) ) + break; + } + + pci_reset_function(bus, devfn, 1); + restore_config_space(bus, devfn, config); + xfree(config); + + force_cmd_register(bus, devfn); + } + + /* Ironlake */ + if (IS_IRONLAKE(vendorid, deviceid)) { + + config = xmalloc_array(u8, PCI_CONFIG_SIZE); + read_config_space(bus, devfn, config); + + pci_reset_function(bus, devfn, 0); + + restore_config_space(bus, devfn, config); + xfree(config); + + force_cmd_register(bus, devfn); + } +} + +void pci_reset_device(u8 bus, u8 devfn) +{ + pci_reset_intel_gfx(bus, devfn); + pci_reset_misc_gfx(bus, devfn); +} diff --git a/xen/drivers/passthrough/vtd/pci_reset.h b/xen/drivers/passthrough/vtd/pci_reset.h new file mode 100644 index 0000000..dfd0872 --- /dev/null +++ b/xen/drivers/passthrough/vtd/pci_reset.h @@ -0,0 +1,27 @@ +/* + * Copyright (c) 2011, Citrix Systems. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple + * Place - Suite 330, Boston, MA 02111-1307 USA. + * + * Copyright (C) Julian Pidancet <julian.pidancet@citrix.com> + * Copyright (C) Jean Guyader <jean.guyader@citrix.com> + */ + + +#ifndef PCI_RESET_H_ +# define PCI_RESET_H_ + +void pci_reset_device(u8 bus, u8 devfn); + +#endif /* PCI_RESET_H_ */ diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h index 6827e0d..9ed6f60 100644 --- a/xen/include/xen/pci.h +++ b/xen/include/xen/pci.h @@ -77,6 +77,9 @@ int pci_remove_device(u8 bus, u8 devfn); int pci_add_device_ext(u8 bus, u8 devfn, struct pci_dev_info *info); struct pci_dev *pci_get_pdev(int bus, int devfn); struct pci_dev *pci_get_pdev_by_domain(struct domain *d, int bus, int devfn); +struct pci_dev *pci_match_pdev(int (*match)(struct pci_dev *)); +struct pci_dev *pci_get_bridge(u8 secondary); +struct pci_dev *pci_get_gfx(struct domain *d); uint8_t pci_conf_read8( unsigned int bus, unsigned int dev, unsigned int func, unsigned int reg); diff --git a/xen/include/xen/pci_regs.h b/xen/include/xen/pci_regs.h index 361554c..20b00e8 100644 --- a/xen/include/xen/pci_regs.h +++ b/xen/include/xen/pci_regs.h @@ -210,6 +210,7 @@ #define PCI_CAP_ID_AGP3 0x0E /* AGP Target PCI-PCI bridge */ #define PCI_CAP_ID_EXP 0x10 /* PCI Express */ #define PCI_CAP_ID_MSIX 0x11 /* MSI-X */ +#define PCI_CAP_ID_AF 0x13 /* PCI Advanced Features */ #define PCI_CAP_LIST_NEXT 1 /* Next capability in the list */ #define PCI_CAP_FLAGS 2 /* Capability defined flags (16 bits) */ #define PCI_CAP_SIZEOF 4 @@ -315,6 +316,17 @@ #define PCI_CHSWP_EXT 0x40 /* ENUM# status - extraction */ #define PCI_CHSWP_INS 0x80 /* ENUM# status - insertion */ +/* PCI Advanced Feature registers */ + +#define PCI_AF_LENGTH 2 +#define PCI_AF_CAP 3 +#define PCI_AF_CAP_TP 0x01 +#define PCI_AF_CAP_FLR 0x02 +#define PCI_AF_CTRL 4 +#define PCI_AF_CTRL_FLR 0x01 +#define PCI_AF_STATUS 5 +#define PCI_AF_STATUS_TP 0x01 + /* PCI-X registers */ #define PCI_X_CMD 2 /* Modes & Features */ [-- Attachment #3: Type: text/plain, Size: 138 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: Re: IOMMU faults 2011-06-16 10:28 ` Jean Guyader @ 2011-06-24 13:32 ` Tim Deegan 2011-06-30 10:08 ` Tim Deegan 0 siblings, 1 reply; 12+ messages in thread From: Tim Deegan @ 2011-06-24 13:32 UTC (permalink / raw) To: Jean Guyader; +Cc: Wei Wang, xen-devel, Allen Kay Hi, At 11:28 +0100 on 16 Jun (1308223697), Jean Guyader wrote: > > Of course, even with this patch, my original question still stands: > > should Xen do something more assertive in the IOMMU fault handler? > > What we really want to achive here is to stop DMA on this device. > One way of doing it is to perform a proper PCI reset (FLR, secondary > bus reset, ...) when that happens. I think that's more or less a consensus then, that we should try to stop the device from the IOMMU fault handler. Looking at your patch in a bit more detail, I see two things that worry me. The first is that the new pci_reset_device() function does nothing at all if the device isn't one of the particular graphics cards it know about! The second is this comment: > + /* Leave CMD MEMORY set otherwise the platform can crashe during FLR */ > + pci_conf_write16(bus, d, f, PCI_COMMAND, 2); which implies that my current approach of just disabling the card might have pretty bad conequences. Can you expand on that? Would it be better just to mask out PCI_COMMAND_MASTER? And if I do that do I need to try and issue a reset as well (i.e. are there cards that are known to ignore this bit?) Cheers, Tim. -- Tim Deegan <Tim.Deegan@citrix.com> Principal Software Engineer, Xen Platform Team Citrix Systems UK Ltd. (Company #02937203, SL9 0BG) ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Re: IOMMU faults 2011-06-24 13:32 ` Tim Deegan @ 2011-06-30 10:08 ` Tim Deegan 2011-06-30 10:31 ` Jean Guyader 0 siblings, 1 reply; 12+ messages in thread From: Tim Deegan @ 2011-06-30 10:08 UTC (permalink / raw) To: Jean Guyader; +Cc: Wei Wang, xen-devel, Allen Kay At 14:32 +0100 on 24 Jun (1308925934), Tim Deegan wrote: > At 11:28 +0100 on 16 Jun (1308223697), Jean Guyader wrote: > > > Of course, even with this patch, my original question still stands: > > > should Xen do something more assertive in the IOMMU fault handler? > > > > What we really want to achive here is to stop DMA on this device. > > One way of doing it is to perform a proper PCI reset (FLR, secondary > > bus reset, ...) when that happens. > > I think that's more or less a consensus then, that we should try to > stop the device from the IOMMU fault handler. > > Looking at your patch in a bit more detail, I see two things that worry > me. The first is that the new pci_reset_device() function does nothing > at all if the device isn't one of the particular graphics cards it know > about! > > The second is this comment: > > > + /* Leave CMD MEMORY set otherwise the platform can crashe during FLR */ > > + pci_conf_write16(bus, d, f, PCI_COMMAND, 2); > > which implies that my current approach of just disabling the card might > have pretty bad conequences. Can you expand on that? Would it be > better just to mask out PCI_COMMAND_MASTER? And if I do that do I need > to try and issue a reset as well (i.e. are there cards that are known to > ignore this bit?) Ping? Tim. -- Tim Deegan <Tim.Deegan@citrix.com> Principal Software Engineer, Xen Platform Team Citrix Systems UK Ltd. (Company #02937203, SL9 0BG) ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Re: IOMMU faults 2011-06-30 10:08 ` Tim Deegan @ 2011-06-30 10:31 ` Jean Guyader 0 siblings, 0 replies; 12+ messages in thread From: Jean Guyader @ 2011-06-30 10:31 UTC (permalink / raw) To: Tim Deegan Cc: Wei Wang, xen-devel@lists.xensource.com, Allen Kay, Jean Guyader On 30/06 11:08, Tim Deegan wrote: > At 14:32 +0100 on 24 Jun (1308925934), Tim Deegan wrote: > > At 11:28 +0100 on 16 Jun (1308223697), Jean Guyader wrote: > > > > Of course, even with this patch, my original question still stands: > > > > should Xen do something more assertive in the IOMMU fault handler? > > > > > > What we really want to achive here is to stop DMA on this device. > > > One way of doing it is to perform a proper PCI reset (FLR, secondary > > > bus reset, ...) when that happens. > > > > I think that's more or less a consensus then, that we should try to > > stop the device from the IOMMU fault handler. > > > > Looking at your patch in a bit more detail, I see two things that worry > > me. The first is that the new pci_reset_device() function does nothing > > at all if the device isn't one of the particular graphics cards it know > > about! > > In our case the reset of other devices is done using the classic Xen toolstack way in dom0. But the code could be easily changed to do a proper reset on all the type of devices. > > The second is this comment: > > > > > + /* Leave CMD MEMORY set otherwise the platform can crashe during FLR */ > > > + pci_conf_write16(bus, d, f, PCI_COMMAND, 2); > > > > which implies that my current approach of just disabling the card might > > have pretty bad conequences. Can you expand on that? Would it be > > better just to mask out PCI_COMMAND_MASTER? And if I do that do I need > > to try and issue a reset as well (i.e. are there cards that are known to > > ignore this bit?) Agreed, masking out PCI_COMMAND_MASTER should be enough, Linux only do that. Jean ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: IOMMU faults 2011-06-16 9:25 IOMMU faults Tim Deegan 2011-06-16 9:47 ` Jean Guyader @ 2011-06-16 14:30 ` Wei Wang2 2011-06-16 14:47 ` Konrad Rzeszutek Wilk 2011-06-16 19:21 ` Kay, Allen M 2 siblings, 1 reply; 12+ messages in thread From: Wei Wang2 @ 2011-06-16 14:30 UTC (permalink / raw) To: Tim Deegan; +Cc: xen-devel@lists.xensource.com, Allen Kay, Jean Guyader Alberto BozzoOn Thursday 16 June 2011 11:25:09 Tim Deegan wrote: > Hi, IOMMU maintainers, > > What should Xen do when an IOMMU fault happens? As far as I can > see both the AMD and Intel code clears the error in the IOMMU and > carries on, but I suspect some more vigorous action is appropriate. > I've seen traces from an Intel machine that seemed to be livelocked on > IOMMU faults from a passed-through VGA card, until it was killed by the > watchdog. I think I can see two things that contribute to that: > > - The Intel IOMMU fault handler prints quite a lot of info in interrupt > context, making it easier to livelock. Still I think the general > problem applies on AMD too. This info could still be useful for debugging, but we should only enable this for debug build. > - Domain destruction re-assigns passed though cards to dom0, but the > cards don't seem to get reset. So there's nothing to stop a card > battering away at DMA in the meantime. That seems like a problem > independent of livelock, actually. There should be some FLR codes in tools (both xm and xl). But this might not work well with some devices... > In any case, it seems like it would be a good idea to stop a > broken/malicious/deassigned card from flooding Xen with IOMMU faults. Yes, agree that. Actually I saw a lot could be improved in the fault handler. When iommu faults come from dma error, we should either stop the device from doing dma or inject errors to guest if the guest driver is able to handle io page fault. > I was considering just writing 0 to the faulting card's PCI command > register, but I'm told that's not always enough to properly deactivate > a card, and it might be a little over-zealous to do it on the first > offence. > Ideas? It seems difficult to find a generic approach to stop a device without knowing more device specific details... Thanks, Wei > Tim. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Re: IOMMU faults 2011-06-16 14:30 ` Wei Wang2 @ 2011-06-16 14:47 ` Konrad Rzeszutek Wilk 2011-06-17 8:08 ` Tim Deegan 0 siblings, 1 reply; 12+ messages in thread From: Konrad Rzeszutek Wilk @ 2011-06-16 14:47 UTC (permalink / raw) To: Wei Wang2 Cc: xen-devel@lists.xensource.com, Allen Kay, Tim Deegan, Jean Guyader > > I was considering just writing 0 to the faulting card's PCI command > > register, but I'm told that's not always enough to properly deactivate > > a card, and it might be a little over-zealous to do it on the first > > offence. > > Ideas? > It seems difficult to find a generic approach to stop a device without knowing > more device specific details... Perhaps make something similar to the MCE fault interrupts? As in when the error happens, the Dom0 is notified of the offending BDF and persuses whatever action it thinks are neccessary. The action would be to tell the device driver to turn itself off. But how it would interact with the driver.. Well how does Linux deal with this today? Is there an extension to the device driver API (similar to the power) to notify the driver that it has done bad things and to shut itself off? Perhaps similar to the PCIe AER handling? ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Re: IOMMU faults 2011-06-16 14:47 ` Konrad Rzeszutek Wilk @ 2011-06-17 8:08 ` Tim Deegan 0 siblings, 0 replies; 12+ messages in thread From: Tim Deegan @ 2011-06-17 8:08 UTC (permalink / raw) To: Konrad Rzeszutek Wilk Cc: Wei Wang2, Jean, xen-devel@lists.xensource.com, Allen Kay, Guyader At 10:47 -0400 on 16 Jun (1308221250), Konrad Rzeszutek Wilk wrote: > Perhaps make something similar to the MCE fault interrupts? As in when > the error happens, the Dom0 is notified of the offending BDF and > persuses whatever action it thinks are neccessary. The action would be > to tell the device driver to turn itself off. But how it would > interact with the driver.. Well how does Linux deal with this today? > Is there an extension to the device driver API (similar to the power) > to notify the driver that it has done bad things and to shut itself > off? That sort of interface might be nice too, but I was worried more about badly-behaved guests or devices. In the livelock case the guest might never get to run so can't do anything, and a malicious guest would just ignore the message anyway. Tim. -- Tim Deegan <Tim.Deegan@citrix.com> Principal Software Engineer, Xen Platform Team Citrix Systems UK Ltd. (Company #02937203, SL9 0BG) ^ permalink raw reply [flat|nested] 12+ messages in thread
* RE: IOMMU faults 2011-06-16 9:25 IOMMU faults Tim Deegan 2011-06-16 9:47 ` Jean Guyader 2011-06-16 14:30 ` Wei Wang2 @ 2011-06-16 19:21 ` Kay, Allen M 2011-06-17 8:06 ` Tim Deegan 2 siblings, 1 reply; 12+ messages in thread From: Kay, Allen M @ 2011-06-16 19:21 UTC (permalink / raw) To: Tim Deegan, Wei Wang; +Cc: Jean, xen-devel@lists.xensource.com, Guyader > - The Intel IOMMU fault handler prints quite a lot of info in interrupt > context, making it easier to livelock. Still I think the general > problem applies on AMD too. Someone at Intel looked into implementing measured rate printing in vt-d fault handler. He encountered some complications. I remember it had to do with measured rate printing not enabled by default (?). For now, I think having it print out only for debug case sounds simple enough. I will submit a patch for it. > - Domain destruction re-assigns passed though cards to dom0, but the > cards don't seem to get reset. So there's nothing to stop a card > battering away at DMA in the meantime. That seems like a problem > independent of livelock, actually. >From reading the code in libxl, it seems libxl__device_pci_reset() is called by both libxl__device_pci_add() and do_pci_remove(). Isn't do_pci_remove() called when the pass through device is reassigned to dom0 during a domain teardown? Allen -----Original Message----- From: Tim Deegan [mailto:Tim.Deegan@citrix.com] Sent: Thursday, June 16, 2011 2:25 AM To: Kay, Allen M; Wei Wang Cc: xen-devel@lists.xensource.com; Jean Guyader Subject: IOMMU faults Hi, IOMMU maintainers, What should Xen do when an IOMMU fault happens? As far as I can see both the AMD and Intel code clears the error in the IOMMU and carries on, but I suspect some more vigorous action is appropriate. I've seen traces from an Intel machine that seemed to be livelocked on IOMMU faults from a passed-through VGA card, until it was killed by the watchdog. I think I can see two things that contribute to that: - The Intel IOMMU fault handler prints quite a lot of info in interrupt context, making it easier to livelock. Still I think the general problem applies on AMD too. - Domain destruction re-assigns passed though cards to dom0, but the cards don't seem to get reset. So there's nothing to stop a card battering away at DMA in the meantime. That seems like a problem independent of livelock, actually. In any case, it seems like it would be a good idea to stop a broken/malicious/deassigned card from flooding Xen with IOMMU faults. I was considering just writing 0 to the faulting card's PCI command register, but I'm told that's not always enough to properly deactivate a card, and it might be a little over-zealous to do it on the first offence. Ideas? Tim. -- Tim Deegan <Tim.Deegan@citrix.com> Principal Software Engineer, Xen Platform Team Citrix Systems UK Ltd. (Company #02937203, SL9 0BG) ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: RE: IOMMU faults 2011-06-16 19:21 ` Kay, Allen M @ 2011-06-17 8:06 ` Tim Deegan 0 siblings, 0 replies; 12+ messages in thread From: Tim Deegan @ 2011-06-17 8:06 UTC (permalink / raw) To: Kay, Allen M; +Cc: Wei Wang, Jean, xen-devel, Guyader At 12:21 -0700 on 16 Jun (1308226873), Kay, Allen M wrote: > > - The Intel IOMMU fault handler prints quite a lot of info in interrupt > > context, making it easier to livelock. Still I think the general > > problem applies on AMD too. > > Someone at Intel looked into implementing measured rate printing in vt-d fault handler. He encountered some complications. I remember it had to do with measured rate printing not enabled by default (?). For now, I think having it print out only for debug case sounds simple enough. I will submit a patch for it. > That's great, thanks. > > - Domain destruction re-assigns passed though cards to dom0, but the > > cards don't seem to get reset. So there's nothing to stop a card > > battering away at DMA in the meantime. That seems like a problem > > independent of livelock, actually. > > >From reading the code in libxl, it seems libxl__device_pci_reset() is called by both libxl__device_pci_add() and do_pci_remove(). Isn't do_pci_remove() called when the pass through device is reassigned to dom0 during a domain teardown? Libxl could be too late, though. When a domain is destroyed, its iommu tables get torn down in Xen. So if it has active devices: - they can start raising IOMMU faults immediately, and in some circumstances libxl might never get to run. - since deassign is implemented as "assign to dom0" they might start DMAing over dom0 memory. If we can rely on the dom0 tools always completely resetting a domains's devices before calling domctl_destroydomain, that should never happen. That seems a bit fragile, though I guess dom0 can shoot itself in the foot in enough other ways. I prefer Jean's reset-in-xen approach; it's only a few hundred lines of code and we could reuse some of it for resetting badly-behaved cards from the IOMMU fault handler. Tim. -- Tim Deegan <Tim.Deegan@citrix.com> Principal Software Engineer, Xen Platform Team Citrix Systems UK Ltd. (Company #02937203, SL9 0BG) ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2011-06-30 10:31 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-06-16 9:25 IOMMU faults Tim Deegan 2011-06-16 9:47 ` Jean Guyader 2011-06-16 10:07 ` Tim Deegan 2011-06-16 10:28 ` Jean Guyader 2011-06-24 13:32 ` Tim Deegan 2011-06-30 10:08 ` Tim Deegan 2011-06-30 10:31 ` Jean Guyader 2011-06-16 14:30 ` Wei Wang2 2011-06-16 14:47 ` Konrad Rzeszutek Wilk 2011-06-17 8:08 ` Tim Deegan 2011-06-16 19:21 ` Kay, Allen M 2011-06-17 8:06 ` Tim Deegan
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.