* [PATCH] intel-iommu: Manage iommu_coherency globally
@ 2011-11-16  4:11 Alex Williamson
  2011-11-16 10:28 ` Chris Wright
  2011-11-18 12:03 ` David Woodhouse
  0 siblings, 2 replies; 11+ messages in thread
From: Alex Williamson @ 2011-11-16  4:11 UTC (permalink / raw)
To: dwmw2; +Cc: iommu, linux-pci, linux-kernel, chrisw, ddutile, alex.williamson

We currently manage iommu_coherency on a per domain basis,
choosing the safest setting across the iommus attached to a
particular domain.  This unfortunately has a bug that when
no iommus are attached, the domain defaults to coherent.
If we fall into this mode, then later add a device behind a
non-coherent iommu to that domain, the context entry is
updated using the wrong coherency setting, and we get dmar
faults.

Since we expect chipsets to be consistent in their coherency
setting, we can instead determine the coherency once and use
it globally.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
As suggested, here's an updated patch which sets the coherency once at
initialization time instead of changing the default coherency for
orphaned domains.
Thanks,

Alex

 drivers/iommu/intel-iommu.c |   40 +++++++++++++++++-----------------------
 1 files changed, 17 insertions(+), 23 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index c0c7820..79e103a 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -353,7 +353,6 @@ struct dmar_domain {
 
 	int	flags;		/* flags to find out type of domain */
 
-	int	iommu_coherency;/* indicate coherency of iommu access */
 	int	iommu_snooping; /* indicate snooping control feature*/
 	int	iommu_count;	/* reference count of iommu */
 	int	iommu_superpage;/* Level of superpages supported:
@@ -419,6 +418,8 @@ static LIST_HEAD(device_domain_list);
 
 static struct iommu_ops intel_iommu_ops;
 
+static int iommu_coherency __read_mostly = 1;
+
 static int __init intel_iommu_setup(char *str)
 {
 	if (!str)
@@ -556,20 +557,6 @@ static struct intel_iommu *domain_get_iommu(struct dmar_domain *domain)
 	return g_iommus[iommu_id];
 }
 
-static void domain_update_iommu_coherency(struct dmar_domain *domain)
-{
-	int i;
-
-	domain->iommu_coherency = 1;
-
-	for_each_set_bit(i, &domain->iommu_bmp, g_num_of_iommus) {
-		if (!ecap_coherent(g_iommus[i]->ecap)) {
-			domain->iommu_coherency = 0;
-			break;
-		}
-	}
-}
-
 static void domain_update_iommu_snooping(struct dmar_domain *domain)
 {
 	int i;
@@ -608,7 +595,6 @@ static void domain_update_iommu_superpage(struct dmar_domain *domain)
 /* Some capabilities may be different across iommus */
 static void domain_update_iommu_cap(struct dmar_domain *domain)
 {
-	domain_update_iommu_coherency(domain);
 	domain_update_iommu_snooping(domain);
 	domain_update_iommu_superpage(domain);
 }
@@ -646,7 +632,7 @@ static struct intel_iommu *device_to_iommu(int segment, u8 bus, u8 devfn)
 static void domain_flush_cache(struct dmar_domain *domain,
 			       void *addr, int size)
 {
-	if (!domain->iommu_coherency)
+	if (!iommu_coherency)
 		clflush_cache_range(addr, size);
 }
 
@@ -1459,11 +1445,6 @@ static int domain_init(struct dmar_domain *domain, int guest_width)
 	domain->agaw = agaw;
 	INIT_LIST_HEAD(&domain->devices);
 
-	if (ecap_coherent(iommu->ecap))
-		domain->iommu_coherency = 1;
-	else
-		domain->iommu_coherency = 0;
-
 	if (ecap_sc_support(iommu->ecap))
 		domain->iommu_snooping = 1;
 	else
@@ -3583,6 +3564,8 @@ static struct notifier_block device_nb = {
 
 int __init intel_iommu_init(void)
 {
+	struct dmar_drhd_unit *drhd;
+	struct intel_iommu *iommu;
 	int ret = 0;
 
 	/* VT-d is required for a TXT/tboot launch, so enforce that */
@@ -3643,6 +3626,18 @@ int __init intel_iommu_init(void)
 
 	init_iommu_pm_ops();
 
+	/*
+	 * Check for coherency support across all iommus.  If unsupported
+	 * by any, force cache flushes.  Expected to be consistent across
+	 * all iommus in the system.
+	 */
+	for_each_active_iommu(iommu, drhd) {
+		if (!ecap_coherent(iommu->ecap)) {
+			iommu_coherency = 0;
+			break;
+		}
+	}
+
 	bus_set_iommu(&pci_bus_type, &intel_iommu_ops);
 
 	bus_register_notifier(&pci_bus_type, &device_nb);
@@ -3820,7 +3815,6 @@ static int md_domain_init(struct dmar_domain *domain, int guest_width)
 	INIT_LIST_HEAD(&domain->devices);
 
 	domain->iommu_count = 0;
-	domain->iommu_coherency = 0;
 	domain->iommu_snooping = 0;
 	domain->iommu_superpage = 0;
 	domain->max_addr = 0;

^ permalink raw reply related	[flat|nested] 11+ messages in thread
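The decision the patch makes at init time is to take the ECAP coherency ("C") bit of every active IOMMU and AND them together, so that a single non-coherent unit forces cache flushes for everyone. A standalone sketch of that fold (illustrative only, not kernel code; in the real driver, `ecap_coherent()` tests bit 0 of the extended capability register):

```c
#include <assert.h>

/* Bit 0 of the VT-d extended capability register is the coherency bit */
#define ECAP_C_BIT (1ULL << 0)

/* Fold per-IOMMU coherency into one global setting: start optimistic,
 * and let any single non-coherent unit force flushing for everyone. */
static int compute_global_coherency(const unsigned long long *ecaps, int n)
{
	int coherency = 1;

	for (int i = 0; i < n; i++) {
		if (!(ecaps[i] & ECAP_C_BIT)) {
			coherency = 0;
			break;
		}
	}
	return coherency;
}
```

This also shows exactly why the bug described in the changelog bites: with no IOMMUs attached (n == 0), the optimistic default of 1 survives untouched.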
* Re: [PATCH] intel-iommu: Manage iommu_coherency globally
  2011-11-16  4:11 [PATCH] intel-iommu: Manage iommu_coherency globally Alex Williamson
@ 2011-11-16 10:28 ` Chris Wright
  2011-11-18 12:03 ` David Woodhouse
  1 sibling, 0 replies; 11+ messages in thread
From: Chris Wright @ 2011-11-16 10:28 UTC (permalink / raw)
To: Alex Williamson; +Cc: dwmw2, iommu, linux-pci, linux-kernel, chrisw, ddutile

* Alex Williamson (alex.williamson@redhat.com) wrote:
> We currently manage iommu_coherency on a per domain basis,
> choosing the safest setting across the iommus attached to a
> particular domain.  This unfortunately has a bug that when
> no iommus are attached, the domain defaults to coherent.
> If we fall into this mode, then later add a device behind a
> non-coherent iommu to that domain, the context entry is
> updated using the wrong coherency setting, and we get dmar
> faults.
>
> Since we expect chipsets to be consistent in their coherency
> setting, we can instead determine the coherency once and use
> it globally.
>
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

Looks good to me.

Acked-by: Chris Wright <chrisw@sous-sol.org>
* Re: [PATCH] intel-iommu: Manage iommu_coherency globally
  2011-11-16  4:11 [PATCH] intel-iommu: Manage iommu_coherency globally Alex Williamson
  2011-11-16 10:28 ` Chris Wright
@ 2011-11-18 12:03 ` David Woodhouse
  2011-11-18 16:03   ` Alex Williamson
                      ` (2 more replies)
  1 sibling, 3 replies; 11+ messages in thread
From: David Woodhouse @ 2011-11-18 12:03 UTC (permalink / raw)
To: Alex Williamson, rajesh.sankaran
Cc: iommu, linux-pci, linux-kernel, chrisw, ddutile

On Tue, 2011-11-15 at 21:11 -0700, Alex Williamson wrote:
> We currently manage iommu_coherency on a per domain basis,
> choosing the safest setting across the iommus attached to a
> particular domain.  This unfortunately has a bug that when
> no iommus are attached, the domain defaults to coherent.
> If we fall into this mode, then later add a device behind a
> non-coherent iommu to that domain, the context entry is
> updated using the wrong coherency setting, and we get dmar
> faults.
>
> Since we expect chipsets to be consistent in their coherency
> setting, we can instead determine the coherency once and use
> it globally.

(Adding Rajesh).

Hm, it seems I lied to you about this. The non-coherent mode isn't just
a historical mistake; it's configurable by the BIOS, and we actually
encourage people to use the non-coherent mode because it makes the
hardware page-walk faster — so reduces the latency for IOTLB misses.

In addition to that, the IOMMU associated with the integrated graphics
is so "special" that it doesn't support coherent mode either. So it *is*
quite feasible that we'll see a machine where some IOMMUs support
coherent mode, and some don't.

And thus we do need to address the concern that just assuming
non-coherent mode will cause unnecessary performance issues, for the
case where a domain *doesn't* happen to include any of the non-coherent
IOMMUs.

However... for VM domains I don't think we care. Setting up the page
tables *isn't* a fast path there (at least not until/unless we support
exposing an emulated IOMMU to the guest).

The case we care about is *native* DMA, where this cache flush will be
happening for example in the fast path of network TX/RX. But in *that*
case, there is only *one* IOMMU to worry about so it's simple enough to
do the right thing, surely?

So, referring to your patch... perhaps we should retain the
domain->iommu_coherency variable, but *still* have the global
'iommu_coherency' calculated at init time. For *VM* domains, we set
domain->iommu_coherency = iommu_coherency and use the
globally-calculated (and pessimistic) result. For *DMA* domains (i.e. in
domain_init()), we keep the code that was there before, which looks at
ecap_coherent(iommu->ecap). Does that seem sane? Revised patch...

From c6b04e1c7456892f43a6f81f0005b3ade8ca0383 Mon Sep 17 00:00:00 2001
From: Alex Williamson <alex.williamson@redhat.com>
Date: Tue, 15 Nov 2011 21:11:09 -0700
Subject: [PATCH] intel-iommu: Manage iommu_coherency globally

We currently manage iommu_coherency on a per domain basis,
choosing the safest setting across the iommus attached to a
particular domain.  This unfortunately has a bug that when
no iommus are attached, the domain defaults to coherent.
If we fall into this mode, then later add a device behind a
non-coherent iommu to that domain, the context entry is
updated using the wrong coherency setting, and we get dmar
faults.

Since we expect chipsets to be consistent in their coherency
setting, we can instead determine the coherency once and use
it globally.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
[dwmw2: Do the above for VM domains only; use capabilities for normal domains]
---
 drivers/iommu/intel-iommu.c |   33 +++++++++++++++++----------------
 1 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index c0c7820..ce8f933 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -419,6 +419,8 @@ static LIST_HEAD(device_domain_list);
 
 static struct iommu_ops intel_iommu_ops;
 
+static int vm_iommu_coherency __read_mostly = 1;
+
 static int __init intel_iommu_setup(char *str)
 {
 	if (!str)
@@ -556,20 +558,6 @@ static struct intel_iommu *domain_get_iommu(struct dmar_domain *domain)
 	return g_iommus[iommu_id];
 }
 
-static void domain_update_iommu_coherency(struct dmar_domain *domain)
-{
-	int i;
-
-	domain->iommu_coherency = 1;
-
-	for_each_set_bit(i, &domain->iommu_bmp, g_num_of_iommus) {
-		if (!ecap_coherent(g_iommus[i]->ecap)) {
-			domain->iommu_coherency = 0;
-			break;
-		}
-	}
-}
-
 static void domain_update_iommu_snooping(struct dmar_domain *domain)
 {
 	int i;
@@ -608,7 +596,6 @@ static void domain_update_iommu_superpage(struct dmar_domain *domain)
 /* Some capabilities may be different across iommus */
 static void domain_update_iommu_cap(struct dmar_domain *domain)
 {
-	domain_update_iommu_coherency(domain);
 	domain_update_iommu_snooping(domain);
 	domain_update_iommu_superpage(domain);
 }
@@ -3583,6 +3570,8 @@ static struct notifier_block device_nb = {
 
 int __init intel_iommu_init(void)
 {
+	struct dmar_drhd_unit *drhd;
+	struct intel_iommu *iommu;
 	int ret = 0;
 
 	/* VT-d is required for a TXT/tboot launch, so enforce that */
@@ -3643,6 +3632,18 @@ int __init intel_iommu_init(void)
 
 	init_iommu_pm_ops();
 
+	/*
+	 * Check for coherency support across all iommus.  If unsupported
+	 * by any, force cache flushes.  Expected to be consistent across
+	 * all iommus in the system.
+	 */
+	for_each_active_iommu(iommu, drhd) {
+		if (!ecap_coherent(iommu->ecap)) {
+			vm_iommu_coherency = 0;
+			break;
+		}
+	}
+
 	bus_set_iommu(&pci_bus_type, &intel_iommu_ops);
 
 	bus_register_notifier(&pci_bus_type, &device_nb);
@@ -3820,7 +3821,7 @@ static int md_domain_init(struct dmar_domain *domain, int guest_width)
 	INIT_LIST_HEAD(&domain->devices);
 
 	domain->iommu_count = 0;
-	domain->iommu_coherency = 0;
+	domain->iommu_coherency = vm_iommu_coherency;
 	domain->iommu_snooping = 0;
 	domain->iommu_superpage = 0;
 	domain->max_addr = 0;
-- 
1.7.7.1

-- 
dwmw2
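The VM-versus-DMA split David proposes reduces to a small policy function: VM domains inherit the pessimistic global value, while a DMA domain trusts the capability bit of the single IOMMU it sits behind. A hedged sketch (the enum and function names are invented for illustration; the real driver keys off domain flags such as DOMAIN_FLAG_VIRTUAL_MACHINE):

```c
#include <assert.h>

enum domain_kind { DOMAIN_DMA, DOMAIN_VM };

/* Coherency for a domain under the revised policy: VM domains take the
 * globally computed (pessimistic) value; a DMA domain asks the one
 * IOMMU it is attached to. */
static int domain_coherency(enum domain_kind kind, int own_ecap_c,
                            int vm_iommu_coherency)
{
	return kind == DOMAIN_VM ? vm_iommu_coherency : own_ecap_c;
}

/* domain_flush_cache() only needs the clflush when non-coherent */
static int needs_clflush(int coherency)
{
	return !coherency;
}
```

So on a mixed system a DMA domain behind a coherent IOMMU still skips the flush in its fast path, which is exactly the performance case the discussion is about.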
* Re: [PATCH] intel-iommu: Manage iommu_coherency globally
  2011-11-18 12:03 ` David Woodhouse
@ 2011-11-18 16:03   ` Alex Williamson
  2011-11-18 17:00     ` David Woodhouse
  2011-11-19 19:17   ` Chris Wright
  2011-11-23  7:30   ` cody
  2 siblings, 1 reply; 11+ messages in thread
From: Alex Williamson @ 2011-11-18 16:03 UTC (permalink / raw)
To: David Woodhouse
Cc: rajesh.sankaran, iommu, linux-pci, linux-kernel, chrisw, ddutile

On Fri, 2011-11-18 at 12:03 +0000, David Woodhouse wrote:
> On Tue, 2011-11-15 at 21:11 -0700, Alex Williamson wrote:
> > We currently manage iommu_coherency on a per domain basis,
> > choosing the safest setting across the iommus attached to a
> > particular domain.  This unfortunately has a bug that when
> > no iommus are attached, the domain defaults to coherent.
> > If we fall into this mode, then later add a device behind a
> > non-coherent iommu to that domain, the context entry is
> > updated using the wrong coherency setting, and we get dmar
> > faults.
> >
> > Since we expect chipsets to be consistent in their coherency
> > setting, we can instead determine the coherency once and use
> > it globally.
>
> (Adding Rajesh).
>
> Hm, it seems I lied to you about this. The non-coherent mode isn't just
> a historical mistake; it's configurable by the BIOS, and we actually
> encourage people to use the non-coherent mode because it makes the
> hardware page-walk faster — so reduces the latency for IOTLB misses.
>
> In addition to that, the IOMMU associated with the integrated graphics
> is so "special" that it doesn't support coherent mode either. So it *is*
> quite feasible that we'll see a machine where some IOMMUs support
> coherent mode, and some don't.
>
> And thus we do need to address the concern that just assuming
> non-coherent mode will cause unnecessary performance issues, for the
> case where a domain *doesn't* happen to include any of the non-coherent
> IOMMUs.
>
> However... for VM domains I don't think we care. Setting up the page
> tables *isn't* a fast path there (at least not until/unless we support
> exposing an emulated IOMMU to the guest).
>
> The case we care about is *native* DMA, where this cache flush will be
> happening for example in the fast path of network TX/RX. But in *that*
> case, there is only *one* IOMMU to worry about so it's simple enough to
> do the right thing, surely?
>
> So, referring to your patch... perhaps we should retain the
> domain->iommu_coherency variable, but *still* have the global
> 'iommu_coherency' calculated at init time. For *VM* domains, we set
> domain->iommu_coherency = iommu_coherency and use the
> globally-calculated (and pessimistic) result. For *DMA* domains (i.e. in
> domain_init()), we keep the code that was there before, which looks at
> ecap_coherent(iommu->ecap). Does that seem sane? Revised patch...

I can't help but thinking we're just taking the easy, lazy path for VM
domains when it sounds like we really would prefer to keep the coherency
set to the least common denominator of the domain rather than the
platform.  Couldn't we instead force a flush of the domain when we
transition from coherent to non-coherent?  Not sure I'm qualified to
write that, but seems like it would keep the efficiency of VM domains
with no effect to native DMA domains since they'd never trigger such a
transition.  The below appears as if it would work and probably be OK
since the unnecessary cache flushes are rare, but they're still
unnecessary... and the comments/commit log are now wrong.  Thanks,

Alex

> From c6b04e1c7456892f43a6f81f0005b3ade8ca0383 Mon Sep 17 00:00:00 2001
> From: Alex Williamson <alex.williamson@redhat.com>
> Date: Tue, 15 Nov 2011 21:11:09 -0700
> Subject: [PATCH] intel-iommu: Manage iommu_coherency globally
>
> We currently manage iommu_coherency on a per domain basis,
> choosing the safest setting across the iommus attached to a
> particular domain.  This unfortunately has a bug that when
> no iommus are attached, the domain defaults to coherent.
> If we fall into this mode, then later add a device behind a
> non-coherent iommu to that domain, the context entry is
> updated using the wrong coherency setting, and we get dmar
> faults.
>
> Since we expect chipsets to be consistent in their coherency
> setting, we can instead determine the coherency once and use
> it globally.
>
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> [dwmw2: Do the above for VM domains only; use capabilities for normal domains]
> ---
>  drivers/iommu/intel-iommu.c |   33 +++++++++++++++++----------------
>  1 files changed, 17 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index c0c7820..ce8f933 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -419,6 +419,8 @@ static LIST_HEAD(device_domain_list);
>  
>  static struct iommu_ops intel_iommu_ops;
>  
> +static int vm_iommu_coherency __read_mostly = 1;
> +
>  static int __init intel_iommu_setup(char *str)
>  {
>  	if (!str)
> @@ -556,20 +558,6 @@ static struct intel_iommu *domain_get_iommu(struct dmar_domain *domain)
>  	return g_iommus[iommu_id];
>  }
>  
> -static void domain_update_iommu_coherency(struct dmar_domain *domain)
> -{
> -	int i;
> -
> -	domain->iommu_coherency = 1;
> -
> -	for_each_set_bit(i, &domain->iommu_bmp, g_num_of_iommus) {
> -		if (!ecap_coherent(g_iommus[i]->ecap)) {
> -			domain->iommu_coherency = 0;
> -			break;
> -		}
> -	}
> -}
> -
>  static void domain_update_iommu_snooping(struct dmar_domain *domain)
>  {
>  	int i;
> @@ -608,7 +596,6 @@ static void domain_update_iommu_superpage(struct dmar_domain *domain)
>  /* Some capabilities may be different across iommus */
>  static void domain_update_iommu_cap(struct dmar_domain *domain)
>  {
> -	domain_update_iommu_coherency(domain);
>  	domain_update_iommu_snooping(domain);
>  	domain_update_iommu_superpage(domain);
>  }
> @@ -3583,6 +3570,8 @@ static struct notifier_block device_nb = {
>  
>  int __init intel_iommu_init(void)
>  {
> +	struct dmar_drhd_unit *drhd;
> +	struct intel_iommu *iommu;
>  	int ret = 0;
>  
>  	/* VT-d is required for a TXT/tboot launch, so enforce that */
> @@ -3643,6 +3632,18 @@ int __init intel_iommu_init(void)
>  
>  	init_iommu_pm_ops();
>  
> +	/*
> +	 * Check for coherency support across all iommus.  If unsupported
> +	 * by any, force cache flushes.  Expected to be consistent across
> +	 * all iommus in the system.
> +	 */
> +	for_each_active_iommu(iommu, drhd) {
> +		if (!ecap_coherent(iommu->ecap)) {
> +			vm_iommu_coherency = 0;
> +			break;
> +		}
> +	}
> +
>  	bus_set_iommu(&pci_bus_type, &intel_iommu_ops);
>  
>  	bus_register_notifier(&pci_bus_type, &device_nb);
> @@ -3820,7 +3821,7 @@ static int md_domain_init(struct dmar_domain *domain, int guest_width)
>  	INIT_LIST_HEAD(&domain->devices);
>  
>  	domain->iommu_count = 0;
> -	domain->iommu_coherency = 0;
> +	domain->iommu_coherency = vm_iommu_coherency;
>  	domain->iommu_snooping = 0;
>  	domain->iommu_superpage = 0;
>  	domain->max_addr = 0;
> -- 
> 1.7.7.1
* Re: [PATCH] intel-iommu: Manage iommu_coherency globally
  2011-11-18 16:03 ` Alex Williamson
@ 2011-11-18 17:00   ` David Woodhouse
  2011-11-18 17:15     ` Alex Williamson
  0 siblings, 1 reply; 11+ messages in thread
From: David Woodhouse @ 2011-11-18 17:00 UTC (permalink / raw)
To: Alex Williamson
Cc: rajesh.sankaran, iommu, linux-pci, linux-kernel, chrisw, ddutile

On Fri, 2011-11-18 at 09:03 -0700, Alex Williamson wrote:
> I can't help but thinking we're just taking the easy, lazy path for VM
> domains when it sounds like we really would prefer to keep the coherency
> set to the least common denominator of the domain rather than the
> platform.  Couldn't we instead force a flush of the domain when we
> transition from coherent to non-coherent?  Not sure I'm qualified to
> write that, but seems like it would keep the efficiency of VM domains
> with no effect to native DMA domains since they'd never trigger such a
> transition.  The below appears as if it would work and probably be OK
> since the unnecessary cache flushes are rare, but they're still
> unnecessary... and the comments/commit log are now wrong.  Thanks,

Yeah, that would make some sense. I was about to knock up some code
which would walk the page tables and use clflush to flush every one...
but wouldn't it be saner just to use wbinvd?

-- 
dwmw2
* Re: [PATCH] intel-iommu: Manage iommu_coherency globally
  2011-11-18 17:00 ` David Woodhouse
@ 2011-11-18 17:15   ` Alex Williamson
  2011-11-18 18:15     ` David Woodhouse
  0 siblings, 1 reply; 11+ messages in thread
From: Alex Williamson @ 2011-11-18 17:15 UTC (permalink / raw)
To: David Woodhouse
Cc: rajesh.sankaran, iommu, linux-pci, linux-kernel, chrisw, ddutile

On Fri, 2011-11-18 at 17:00 +0000, David Woodhouse wrote:
> On Fri, 2011-11-18 at 09:03 -0700, Alex Williamson wrote:
> > I can't help but thinking we're just taking the easy, lazy path for VM
> > domains when it sounds like we really would prefer to keep the coherency
> > set to the least common denominator of the domain rather than the
> > platform.  Couldn't we instead force a flush of the domain when we
> > transition from coherent to non-coherent?  Not sure I'm qualified to
> > write that, but seems like it would keep the efficiency of VM domains
> > with no effect to native DMA domains since they'd never trigger such a
> > transition.  The below appears as if it would work and probably be OK
> > since the unnecessary cache flushes are rare, but they're still
> > unnecessary... and the comments/commit log are now wrong.  Thanks,
>
> Yeah, that would make some sense. I was about to knock up some code
> which would walk the page tables and use clflush to flush every one...
> but wouldn't it be saner just to use wbinvd?

A bit heavy handed, but obviously easier.  It feels like we could safely
be more strategic, but maybe we'd end up trashing the cache anyway in a
drawn out attempt to flush the context and all page tables.  However, do
we actually need a wbinvd_on_all_cpus()?  Probably better to trash one
cache than all of them.  Thanks,

Alex
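The trade-off being weighed here is roughly quantifiable: clflush_cache_range() issues one clflush per cache line, so flushing a domain's page tables costs (bytes / line size) instructions, while wbinvd is a single instruction that writes back and empties the entire cache (and, per the discussion, would have to run on every CPU). A back-of-envelope helper; the 64-byte line size used in the examples is an assumption, typical of x86:

```c
/* Number of clflush operations needed to cover a byte range, rounding
 * up so a partial trailing cache line is still flushed. */
static unsigned long clflush_count(unsigned long bytes, unsigned long line_size)
{
	return (bytes + line_size - 1) / line_size;
}
```

For example, a domain with 512 resident 4KiB table pages would need 512 * 64 = 32768 clflushes, which is the point at which a single wbinvd starts to look tempting, at the cost of trashing every other user's cached data.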
* Re: [PATCH] intel-iommu: Manage iommu_coherency globally
  2011-11-18 17:15 ` Alex Williamson
@ 2011-11-18 18:15   ` David Woodhouse
  0 siblings, 0 replies; 11+ messages in thread
From: David Woodhouse @ 2011-11-18 18:15 UTC (permalink / raw)
To: Alex Williamson
Cc: rajesh.sankaran, iommu, linux-pci, linux-kernel, chrisw, ddutile

On Fri, 2011-11-18 at 10:15 -0700, Alex Williamson wrote:
> A bit heavy handed, but obviously easier.  It feels like we could safely
> be more strategic, but maybe we'd end up trashing the cache anyway in a
> drawn out attempt to flush the context and all page tables.  However, do
> we actually need a wbinvd_on_all_cpus()?  Probably better to trash one
> cache than all of them.  Thanks,

Yes, it would be wbinvd_on_all_cpus(). The page-table-walk code would
look something like this...

static void flush_domain_ptes(struct dmar_domain *domain)
{
	int level = agaw_to_level(domain->agaw);
	struct dma_pte *ptelvl[level + 2], *pte;

	pte = domain->pgd;
	clflush_cache_range(pte, VTD_PAGE_SIZE);

	/* Keep the end condition nice and simple... */
	ptelvl[level + 1] = NULL;

	while (pte) {
		/* BUG_ON(level < 2); */

		if (dma_pte_present(pte)) {
			if (level == 2) {
				/* Flush the bottom-level page, but no need
				   to process its *contents* any further. */
				clflush_cache_range(phys_to_virt(dma_pte_addr(pte)),
						    VTD_PAGE_SIZE);
			} else if (!(pte->val & DMA_PTE_LARGE_PAGE)) {
				/* Remember the next PTE at this level; we'll
				   come back to it. */
				ptelvl[level] = pte + 1;

				/* Go down a level and process PTEs there. */
				pte = phys_to_virt(dma_pte_addr(pte));
				clflush_cache_range(pte, VTD_PAGE_SIZE);
				level--;
				continue;
			}
		}

		pte++;

		/* When we reach the end of a PTE page, go back up to the
		   next level where we came from. */
		while (pte && !((unsigned long)pte & ~VTD_PAGE_MASK)) {
			level++;
			pte = ptelvl[level];
		}
	}
}

-- 
dwmw2
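The traversal order in the sketch above (descend on present non-leaf entries, remember the sibling entry to resume at, ascend when a table page is exhausted) can be exercised with a self-contained toy model. Everything here is invented for illustration: 4-entry tables instead of 512, a made-up toy_pte type, counting "flushed" table pages instead of calling clflush, and an explicit index stack in place of the page-alignment trick:

```c
#include <stddef.h>

#define TOY_ENTRIES 4			/* real VT-d tables hold 512 entries */

struct toy_pte {
	struct toy_pte *child;		/* NULL = not present */
};

/* Count table pages a flush walk would touch: the root plus every
 * present lower-level table, visited in resume-at-sibling order. */
static int count_flushed_tables(struct toy_pte *root, int levels)
{
	struct toy_pte *tbl[levels];
	int idx[levels];
	int top = 0, flushed = 1;	/* the root page itself */

	tbl[0] = root;
	idx[0] = 0;
	while (top >= 0) {
		if (idx[top] == TOY_ENTRIES) {	/* table exhausted: ascend */
			top--;
			continue;
		}
		struct toy_pte *pte = &tbl[top][idx[top]++];
		if (pte->child && top < levels - 1) {
			tbl[top + 1] = pte->child;	/* descend; "flush" child */
			idx[top + 1] = 0;
			flushed++;
			top++;
		}
	}
	return flushed;
}

/* Fixed example tree: root -> {mid_a, mid_b}, mid_a -> leaf */
static int toy_example_count(void)
{
	static struct toy_pte leaf[TOY_ENTRIES];
	static struct toy_pte mid_a[TOY_ENTRIES], mid_b[TOY_ENTRIES];
	static struct toy_pte root[TOY_ENTRIES];

	root[0].child = mid_a;
	root[2].child = mid_b;
	mid_a[1].child = leaf;
	return count_flushed_tables(root, 3);	/* 4 table pages in total */
}
```

Only the walk order mirrors the kernel sketch; the real code also has to skip large-page entries and flush by physical address.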
* Re: [PATCH] intel-iommu: Manage iommu_coherency globally
  2011-11-18 12:03 ` David Woodhouse
  2011-11-18 16:03   ` Alex Williamson
@ 2011-11-19 19:17   ` Chris Wright
  2011-11-19 20:11     ` David Woodhouse
  2011-11-23  7:30   ` cody
  2 siblings, 1 reply; 11+ messages in thread
From: Chris Wright @ 2011-11-19 19:17 UTC (permalink / raw)
To: David Woodhouse
Cc: Alex Williamson, rajesh.sankaran, iommu, linux-pci, linux-kernel,
    chrisw, ddutile

* David Woodhouse (dwmw2@infradead.org) wrote:
> On Tue, 2011-11-15 at 21:11 -0700, Alex Williamson wrote:
> > We currently manage iommu_coherency on a per domain basis,
> > choosing the safest setting across the iommus attached to a
> > particular domain.  This unfortunately has a bug that when
> > no iommus are attached, the domain defaults to coherent.
> > If we fall into this mode, then later add a device behind a
> > non-coherent iommu to that domain, the context entry is
> > updated using the wrong coherency setting, and we get dmar
> > faults.
> >
> > Since we expect chipsets to be consistent in their coherency
> > setting, we can instead determine the coherency once and use
> > it globally.
>
> (Adding Rajesh).
>
> Hm, it seems I lied to you about this. The non-coherent mode isn't just
> a historical mistake; it's configurable by the BIOS, and we actually
> encourage people to use the non-coherent mode because it makes the
> hardware page-walk faster — so reduces the latency for IOTLB misses.

Interesting, because for the workloads I've tested it's the exact opposite.
Tested w/ BIOS enabling and disabling coherency, and w/ non-coherent
access and streaming DMA (i.e. bare metal NIC bw testing)...the IOMMU
added something like 10% overhead when non-coherent vs. coherent.

> In addition to that, the IOMMU associated with the integrated graphics
> is so "special" that it doesn't support coherent mode either. So it *is*
> quite feasible that we'll see a machine where some IOMMUs support
> coherent mode, and some don't.
>
> And thus we do need to address the concern that just assuming
> non-coherent mode will cause unnecessary performance issues, for the
> case where a domain *doesn't* happen to include any of the non-coherent
> IOMMUs.
>
> However... for VM domains I don't think we care. Setting up the page
> tables *isn't* a fast path there (at least not until/unless we support
> exposing an emulated IOMMU to the guest).
>
> The case we care about is *native* DMA, where this cache flush will be
> happening for example in the fast path of network TX/RX. But in *that*
> case, there is only *one* IOMMU to worry about so it's simple enough to
> do the right thing, surely?

Definitely agreed on the above points: page table setup/teardown is
limited to VMs, and the bare metal case is sensitive to the IOMMU's
flushing overhead.

thanks,
-chris
* Re: [PATCH] intel-iommu: Manage iommu_coherency globally
  2011-11-19 19:17 ` Chris Wright
@ 2011-11-19 20:11   ` David Woodhouse
  0 siblings, 0 replies; 11+ messages in thread
From: David Woodhouse @ 2011-11-19 20:11 UTC (permalink / raw)
To: Chris Wright
Cc: Alex Williamson, rajesh.sankaran, iommu, linux-pci, linux-kernel,
    ddutile

On Sat, 2011-11-19 at 11:17 -0800, Chris Wright wrote:
> > Hm, it seems I lied to you about this. The non-coherent mode isn't just
> > a historical mistake; it's configurable by the BIOS, and we actually
> > encourage people to use the non-coherent mode because it makes the
> > hardware page-walk faster — so reduces the latency for IOTLB misses.
>
> Interesting, because for the workloads I've tested it's the exact opposite.
> Tested w/ BIOS enabling and disabling coherency, and w/ non-coherent
> access and streaming DMA (i.e. bare metal NIC bw testing)...the IOMMU
> added something like 10% overhead when non-coherent vs. coherent.

Right. I specifically said "latency for IOTLB misses". That is the main
reason we apparently advise BIOS authors to use non-coherent mode.

(Of course, the fact that we're letting the BIOS control this at all
means that we obviously haven't learned *any* lessons from the last few
years of pain with VT-d enabling and BIOS bugs. It should have been
under OS control. But that's a separate, stunningly depressing, issue.)

Of course, the *overall* performance sucks rocks if we are in
non-coherent mode. The bare-metal NIC bandwidth testing is the obvious
test case for it, and thanks for providing ballpark numbers for that.
They perfectly back up the qualitative remarks I made to Rajesh
yesterday on exactly this topic.

-- 
dwmw2
* Re: [PATCH] intel-iommu: Manage iommu_coherency globally
  2011-11-18 12:03 ` David Woodhouse
  2011-11-18 16:03   ` Alex Williamson
  2011-11-19 19:17   ` Chris Wright
@ 2011-11-23  7:30   ` cody
  2011-11-22 15:41     ` David Woodhouse
  2 siblings, 1 reply; 11+ messages in thread
From: cody @ 2011-11-23 7:30 UTC (permalink / raw)
To: David Woodhouse
Cc: Alex Williamson, rajesh.sankaran, iommu, linux-pci, linux-kernel,
    chrisw, ddutile

On 11/18/2011 04:03 AM, David Woodhouse wrote:
>
> Hm, it seems I lied to you about this. The non-coherent mode isn't just
> a historical mistake; it's configurable by the BIOS, and we actually
> encourage people to use the non-coherent mode because it makes the
> hardware page-walk faster — so reduces the latency for IOTLB misses.
>

I have a question: the Intel VT-d manual says the coherency bit is "RO",
so how can it be configurable by the BIOS? Or is there another way to
configure the IOMMU to be coherent or non-coherent? (I didn't find such
info in the manual. Did I miss it?)

-cody

> In addition to that, the IOMMU associated with the integrated graphics
> is so "special" that it doesn't support coherent mode either. So it *is*
> quite feasible that we'll see a machine where some IOMMUs support
> coherent mode, and some don't.
>
> And thus we do need to address the concern that just assuming
> non-coherent mode will cause unnecessary performance issues, for the
> case where a domain *doesn't* happen to include any of the non-coherent
> IOMMUs.
>
> However... for VM domains I don't think we care. Setting up the page
> tables *isn't* a fast path there (at least not until/unless we support
> exposing an emulated IOMMU to the guest).
>
> The case we care about is *native* DMA, where this cache flush will be
> happening for example in the fast path of network TX/RX. But in *that*
> case, there is only *one* IOMMU to worry about so it's simple enough to
> do the right thing, surely?
* Re: [PATCH] intel-iommu: Manage iommu_coherency globally
  2011-11-23  7:30 ` cody
@ 2011-11-22 15:41   ` David Woodhouse
  0 siblings, 0 replies; 11+ messages in thread
From: David Woodhouse @ 2011-11-22 15:41 UTC (permalink / raw)
To: cody
Cc: Alex Williamson, rajesh.sankaran, iommu, linux-pci, linux-kernel,
    chrisw, ddutile

On Tue, 2011-11-22 at 23:30 -0800, cody wrote:
> I have a question: the Intel VT-d manual says the coherency bit is "RO",
> so how can it be configurable by the BIOS? Or is there another way to
> configure the IOMMU to be coherent or non-coherent? (I didn't find such
> info in the manual. Did I miss it?)

It's not in the manual. We hide it so that only the BIOS can set it.
It's only read-only to *you*, the operating system that's supposed to
have control of the hardware.

You might think that we should expose it to the OS so that it can be set
according to the *usage* — so perhaps you'd set it non-coherent for VM
use (where you rarely modify the page tables) and coherent for
host-based DMA protection (where the constant clflushes in the fast path
cause a 10% performance hit on network bandwidth testing).

However, that would not be consistent with our goal of ensuring that the
BIOS can *always* screw you over as much as possible.

-- 
dwmw2
end of thread, other threads:[~2011-11-22 15:41 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-11-16  4:11 [PATCH] intel-iommu: Manage iommu_coherency globally Alex Williamson
2011-11-16 10:28 ` Chris Wright
2011-11-18 12:03 ` David Woodhouse
2011-11-18 16:03   ` Alex Williamson
2011-11-18 17:00     ` David Woodhouse
2011-11-18 17:15       ` Alex Williamson
2011-11-18 18:15         ` David Woodhouse
2011-11-19 19:17   ` Chris Wright
2011-11-19 20:11     ` David Woodhouse
2011-11-23  7:30   ` cody
2011-11-22 15:41     ` David Woodhouse