linux-pci.vger.kernel.org archive mirror
* [PATCH] intel-iommu: Manage iommu_coherency globally
@ 2011-11-16  4:11 Alex Williamson
  2011-11-16 10:28 ` Chris Wright
  2011-11-18 12:03 ` David Woodhouse
  0 siblings, 2 replies; 11+ messages in thread
From: Alex Williamson @ 2011-11-16  4:11 UTC (permalink / raw)
  To: dwmw2; +Cc: iommu, linux-pci, linux-kernel, chrisw, ddutile, alex.williamson

We currently manage iommu_coherency on a per domain basis,
choosing the safest setting across the iommus attached to a
particular domain.  This unfortunately has a bug that when
no iommus are attached, the domain defaults to coherent.
If we fall into this mode, then later add a device behind a
non-coherent iommu to that domain, the context entry is
updated using the wrong coherency setting, and we get dmar
faults.

Since we expect chipsets to be consistent in their coherency
setting, we can instead determine the coherency once and use
it globally.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

As suggested, here's an updated patch which sets the
coherency once at initialization time instead of changing
the default coherency for orphaned domains.  Thanks,

Alex

 drivers/iommu/intel-iommu.c |   40 +++++++++++++++++-----------------------
 1 files changed, 17 insertions(+), 23 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index c0c7820..79e103a 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -353,7 +353,6 @@ struct dmar_domain {
 
 	int		flags;		/* flags to find out type of domain */
 
-	int		iommu_coherency;/* indicate coherency of iommu access */
 	int		iommu_snooping; /* indicate snooping control feature*/
 	int		iommu_count;	/* reference count of iommu */
 	int		iommu_superpage;/* Level of superpages supported:
@@ -419,6 +418,8 @@ static LIST_HEAD(device_domain_list);
 
 static struct iommu_ops intel_iommu_ops;
 
+static int iommu_coherency __read_mostly = 1;
+
 static int __init intel_iommu_setup(char *str)
 {
 	if (!str)
@@ -556,20 +557,6 @@ static struct intel_iommu *domain_get_iommu(struct dmar_domain *domain)
 	return g_iommus[iommu_id];
 }
 
-static void domain_update_iommu_coherency(struct dmar_domain *domain)
-{
-	int i;
-
-	domain->iommu_coherency = 1;
-
-	for_each_set_bit(i, &domain->iommu_bmp, g_num_of_iommus) {
-		if (!ecap_coherent(g_iommus[i]->ecap)) {
-			domain->iommu_coherency = 0;
-			break;
-		}
-	}
-}
-
 static void domain_update_iommu_snooping(struct dmar_domain *domain)
 {
 	int i;
@@ -608,7 +595,6 @@ static void domain_update_iommu_superpage(struct dmar_domain *domain)
 /* Some capabilities may be different across iommus */
 static void domain_update_iommu_cap(struct dmar_domain *domain)
 {
-	domain_update_iommu_coherency(domain);
 	domain_update_iommu_snooping(domain);
 	domain_update_iommu_superpage(domain);
 }
@@ -646,7 +632,7 @@ static struct intel_iommu *device_to_iommu(int segment, u8 bus, u8 devfn)
 static void domain_flush_cache(struct dmar_domain *domain,
 			       void *addr, int size)
 {
-	if (!domain->iommu_coherency)
+	if (!iommu_coherency)
 		clflush_cache_range(addr, size);
 }
 
@@ -1459,11 +1445,6 @@ static int domain_init(struct dmar_domain *domain, int guest_width)
 	domain->agaw = agaw;
 	INIT_LIST_HEAD(&domain->devices);
 
-	if (ecap_coherent(iommu->ecap))
-		domain->iommu_coherency = 1;
-	else
-		domain->iommu_coherency = 0;
-
 	if (ecap_sc_support(iommu->ecap))
 		domain->iommu_snooping = 1;
 	else
@@ -3583,6 +3564,8 @@ static struct notifier_block device_nb = {
 
 int __init intel_iommu_init(void)
 {
+	struct dmar_drhd_unit *drhd;
+	struct intel_iommu *iommu;
 	int ret = 0;
 
 	/* VT-d is required for a TXT/tboot launch, so enforce that */
@@ -3643,6 +3626,18 @@ int __init intel_iommu_init(void)
 
 	init_iommu_pm_ops();
 
+	/*
+	 * Check for coherency support across all iommus.  If unsupported
+	 * by any, force cache flushes.  Expected to be consistent across
+	 * all iommus in the system.
+	 */
+	for_each_active_iommu(iommu, drhd) {
+		if (!ecap_coherent(iommu->ecap)) {
+			iommu_coherency = 0;
+			break;
+		}
+	}
+
 	bus_set_iommu(&pci_bus_type, &intel_iommu_ops);
 
 	bus_register_notifier(&pci_bus_type, &device_nb);
@@ -3820,7 +3815,6 @@ static int md_domain_init(struct dmar_domain *domain, int guest_width)
 	INIT_LIST_HEAD(&domain->devices);
 
 	domain->iommu_count = 0;
-	domain->iommu_coherency = 0;
 	domain->iommu_snooping = 0;
 	domain->iommu_superpage = 0;
 	domain->max_addr = 0;



* Re: [PATCH] intel-iommu: Manage iommu_coherency globally
  2011-11-16  4:11 [PATCH] intel-iommu: Manage iommu_coherency globally Alex Williamson
@ 2011-11-16 10:28 ` Chris Wright
  2011-11-18 12:03 ` David Woodhouse
  1 sibling, 0 replies; 11+ messages in thread
From: Chris Wright @ 2011-11-16 10:28 UTC (permalink / raw)
  To: Alex Williamson; +Cc: dwmw2, iommu, linux-pci, linux-kernel, chrisw, ddutile

* Alex Williamson (alex.williamson@redhat.com) wrote:
> We currently manage iommu_coherency on a per domain basis,
> choosing the safest setting across the iommus attached to a
> particular domain.  This unfortunately has a bug that when
> no iommus are attached, the domain defaults to coherent.
> If we fall into this mode, then later add a device behind a
> non-coherent iommu to that domain, the context entry is
> updated using the wrong coherency setting, and we get dmar
> faults.
> 
> Since we expect chipsets to be consistent in their coherency
> setting, we can instead determine the coherency once and use
> it globally.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

Looks good to me.

Acked-by: Chris Wright <chrisw@sous-sol.org>


* Re: [PATCH] intel-iommu: Manage iommu_coherency globally
  2011-11-16  4:11 [PATCH] intel-iommu: Manage iommu_coherency globally Alex Williamson
  2011-11-16 10:28 ` Chris Wright
@ 2011-11-18 12:03 ` David Woodhouse
  2011-11-18 16:03   ` Alex Williamson
                     ` (2 more replies)
  1 sibling, 3 replies; 11+ messages in thread
From: David Woodhouse @ 2011-11-18 12:03 UTC (permalink / raw)
  To: Alex Williamson, rajesh.sankaran
  Cc: iommu, linux-pci, linux-kernel, chrisw, ddutile


On Tue, 2011-11-15 at 21:11 -0700, Alex Williamson wrote:
> We currently manage iommu_coherency on a per domain basis,
> choosing the safest setting across the iommus attached to a
> particular domain.  This unfortunately has a bug that when
> no iommus are attached, the domain defaults to coherent.
> If we fall into this mode, then later add a device behind a
> non-coherent iommu to that domain, the context entry is
> updated using the wrong coherency setting, and we get dmar
> faults.
> 
> Since we expect chipsets to be consistent in their coherency
> setting, we can instead determine the coherency once and use
> it globally. 

(Adding Rajesh).

Hm, it seems I lied to you about this. The non-coherent mode isn't just
a historical mistake; it's configurable by the BIOS, and we actually
encourage people to use the non-coherent mode because it makes the
hardware page-walk faster — so reduces the latency for IOTLB misses.

In addition to that, the IOMMU associated with the integrated graphics
is so "special" that it doesn't support coherent mode either. So it *is*
quite feasible that we'll see a machine where some IOMMUs support
coherent mode, and some don't.

And thus we do need to address the concern that just assuming
non-coherent mode will cause unnecessary performance issues, for the
case where a domain *doesn't* happen to include any of the non-coherent
IOMMUs.

However... for VM domains I don't think we care. Setting up the page
tables *isn't* a fast path there (at least not until/unless we support
exposing an emulated IOMMU to the guest).

The case we care about is *native* DMA, where this cache flush will be
happening for example in the fast path of network TX/RX. But in *that*
case, there is only *one* IOMMU to worry about so it's simple enough to
do the right thing, surely?

So, referring to your patch... perhaps we should retain the
domain->iommu_coherency variable, but *still* have the global
'iommu_coherency' calculated at init time. For *VM* domains, we set
domain->iommu_coherency = iommu_coherency and use the
globally-calculated (and pessimistic) result. For *DMA* domains (i.e. in
domain_init()), we keep the code that was there before, which looks at
ecap_coherent(iommu->ecap). Does that seem sane? Revised patch...

From c6b04e1c7456892f43a6f81f0005b3ade8ca0383 Mon Sep 17 00:00:00 2001
From: Alex Williamson <alex.williamson@redhat.com>
Date: Tue, 15 Nov 2011 21:11:09 -0700
Subject: [PATCH] intel-iommu: Manage iommu_coherency globally

We currently manage iommu_coherency on a per domain basis,
choosing the safest setting across the iommus attached to a
particular domain.  This unfortunately has a bug that when
no iommus are attached, the domain defaults to coherent.
If we fall into this mode, then later add a device behind a
non-coherent iommu to that domain, the context entry is
updated using the wrong coherency setting, and we get dmar
faults.

Since we expect chipsets to be consistent in their coherency
setting, we can instead determine the coherency once and use
it globally.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
[dwmw2: Do the above for VM domains only; use capabilities for normal domains]
---
 drivers/iommu/intel-iommu.c |   33 +++++++++++++++++----------------
 1 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index c0c7820..ce8f933 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -419,6 +419,8 @@ static LIST_HEAD(device_domain_list);
 
 static struct iommu_ops intel_iommu_ops;
 
+static int vm_iommu_coherency __read_mostly = 1;
+
 static int __init intel_iommu_setup(char *str)
 {
 	if (!str)
@@ -556,20 +558,6 @@ static struct intel_iommu *domain_get_iommu(struct dmar_domain *domain)
 	return g_iommus[iommu_id];
 }
 
-static void domain_update_iommu_coherency(struct dmar_domain *domain)
-{
-	int i;
-
-	domain->iommu_coherency = 1;
-
-	for_each_set_bit(i, &domain->iommu_bmp, g_num_of_iommus) {
-		if (!ecap_coherent(g_iommus[i]->ecap)) {
-			domain->iommu_coherency = 0;
-			break;
-		}
-	}
-}
-
 static void domain_update_iommu_snooping(struct dmar_domain *domain)
 {
 	int i;
@@ -608,7 +596,6 @@ static void domain_update_iommu_superpage(struct dmar_domain *domain)
 /* Some capabilities may be different across iommus */
 static void domain_update_iommu_cap(struct dmar_domain *domain)
 {
-	domain_update_iommu_coherency(domain);
 	domain_update_iommu_snooping(domain);
 	domain_update_iommu_superpage(domain);
 }
@@ -3583,6 +3570,8 @@ static struct notifier_block device_nb = {
 
 int __init intel_iommu_init(void)
 {
+	struct dmar_drhd_unit *drhd;
+	struct intel_iommu *iommu;
 	int ret = 0;
 
 	/* VT-d is required for a TXT/tboot launch, so enforce that */
@@ -3643,6 +3632,18 @@ int __init intel_iommu_init(void)
 
 	init_iommu_pm_ops();
 
+	/*
+	 * Check for coherency support across all iommus.  If unsupported
+	 * by any, force cache flushes.  Expected to be consistent across
+	 * all iommus in the system.
+	 */
+	for_each_active_iommu(iommu, drhd) {
+		if (!ecap_coherent(iommu->ecap)) {
+			vm_iommu_coherency = 0;
+			break;
+		}
+	}
+
 	bus_set_iommu(&pci_bus_type, &intel_iommu_ops);
 
 	bus_register_notifier(&pci_bus_type, &device_nb);
@@ -3820,7 +3821,7 @@ static int md_domain_init(struct dmar_domain *domain, int guest_width)
 	INIT_LIST_HEAD(&domain->devices);
 
 	domain->iommu_count = 0;
-	domain->iommu_coherency = 0;
+	domain->iommu_coherency = vm_iommu_coherency;
 	domain->iommu_snooping = 0;
 	domain->iommu_superpage = 0;
 	domain->max_addr = 0;
-- 
1.7.7.1



-- 
dwmw2


* Re: [PATCH] intel-iommu: Manage iommu_coherency globally
  2011-11-18 12:03 ` David Woodhouse
@ 2011-11-18 16:03   ` Alex Williamson
  2011-11-18 17:00     ` David Woodhouse
  2011-11-19 19:17   ` Chris Wright
  2011-11-23  7:30   ` cody
  2 siblings, 1 reply; 11+ messages in thread
From: Alex Williamson @ 2011-11-18 16:03 UTC (permalink / raw)
  To: David Woodhouse
  Cc: rajesh.sankaran, iommu, linux-pci, linux-kernel, chrisw, ddutile

On Fri, 2011-11-18 at 12:03 +0000, David Woodhouse wrote:
> On Tue, 2011-11-15 at 21:11 -0700, Alex Williamson wrote:
> > We currently manage iommu_coherency on a per domain basis,
> > choosing the safest setting across the iommus attached to a
> > particular domain.  This unfortunately has a bug that when
> > no iommus are attached, the domain defaults to coherent.
> > If we fall into this mode, then later add a device behind a
> > non-coherent iommu to that domain, the context entry is
> > updated using the wrong coherency setting, and we get dmar
> > faults.
> > 
> > Since we expect chipsets to be consistent in their coherency
> > setting, we can instead determine the coherency once and use
> > it globally. 
> 
> (Adding Rajesh).
> 
> Hm, it seems I lied to you about this. The non-coherent mode isn't just
> a historical mistake; it's configurable by the BIOS, and we actually
> encourage people to use the non-coherent mode because it makes the
> hardware page-walk faster — so reduces the latency for IOTLB misses.
> 
> In addition to that, the IOMMU associated with the integrated graphics
> is so "special" that it doesn't support coherent mode either. So it *is*
> quite feasible that we'll see a machine where some IOMMUs support
> coherent mode, and some don't.
> 
> And thus we do need to address the concern that just assuming
> non-coherent mode will cause unnecessary performance issues, for the
> case where a domain *doesn't* happen to include any of the non-coherent
> IOMMUs.
> 
> However... for VM domains I don't think we care. Setting up the page
> tables *isn't* a fast path there (at least not until/unless we support
> exposing an emulated IOMMU to the guest).
> 
> The case we care about is *native* DMA, where this cache flush will be
> happening for example in the fast path of network TX/RX. But in *that*
> case, there is only *one* IOMMU to worry about so it's simple enough to
> do the right thing, surely?
> 
> So, referring to your patch... perhaps we should retain the
> domain->iommu_coherency variable, but *still* have the global
> 'iommu_coherency' calculated at init time. For *VM* domains, we set
> domain->iommu_coherency = iommu_coherency and use the
> globally-calculated (and pessimistic) result. For *DMA* domains (i.e. in
> domain_init()), we keep the code that was there before, which looks at
> ecap_coherent(iommu->ecap). Does that seem sane? Revised patch...

I can't help but thinking we're just taking the easy, lazy path for VM
domains when it sounds like we really would prefer to keep the coherency
set to the least common denominator of the domain rather than the
platform.  Couldn't we instead force a flush of the domain when we
transition from coherent to non-coherent?  Not sure I'm qualified to
write that, but seems like it would keep the efficiency of VM domains
with no effect to native DMA domains since they'd never trigger such a
transition.  The below appears as if it would work and probably be OK
since the unnecessary cache flushes are rare, but they're still
unnecessary... and the comments/commit log are now wrong.  Thanks,

Alex
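
(Untested sketch of what I mean by the transition check; flush_domain_ptes()
is a placeholder for whatever actually walks and flushes the domain's page
tables, and I'm assuming the hook point is wherever a new iommu joins
domain->iommu_bmp:)

	static void domain_update_iommu_coherency(struct dmar_domain *domain)
	{
		int i, coherency = 1;

		for_each_set_bit(i, &domain->iommu_bmp, g_num_of_iommus) {
			if (!ecap_coherent(g_iommus[i]->ecap)) {
				coherency = 0;
				break;
			}
		}

		/* Coherent -> non-coherent transition: nothing written to the
		 * page tables so far has ever been flushed, so flush it all
		 * now before we start relying on per-write clflushes. */
		if (domain->iommu_coherency && !coherency)
			flush_domain_ptes(domain);	/* placeholder helper */

		domain->iommu_coherency = coherency;
	}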

> From c6b04e1c7456892f43a6f81f0005b3ade8ca0383 Mon Sep 17 00:00:00 2001
> From: Alex Williamson <alex.williamson@redhat.com>
> Date: Tue, 15 Nov 2011 21:11:09 -0700
> Subject: [PATCH] intel-iommu: Manage iommu_coherency globally
> 
> We currently manage iommu_coherency on a per domain basis,
> choosing the safest setting across the iommus attached to a
> particular domain.  This unfortunately has a bug that when
> no iommus are attached, the domain defaults to coherent.
> If we fall into this mode, then later add a device behind a
> non-coherent iommu to that domain, the context entry is
> updated using the wrong coherency setting, and we get dmar
> faults.
> 
> Since we expect chipsets to be consistent in their coherency
> setting, we can instead determine the coherency once and use
> it globally.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> [dwmw2: Do the above for VM domains only; use capabilities for normal domains]
> ---
>  drivers/iommu/intel-iommu.c |   33 +++++++++++++++++----------------
>  1 files changed, 17 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index c0c7820..ce8f933 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -419,6 +419,8 @@ static LIST_HEAD(device_domain_list);
>  
>  static struct iommu_ops intel_iommu_ops;
>  
> +static int vm_iommu_coherency __read_mostly = 1;
> +
>  static int __init intel_iommu_setup(char *str)
>  {
>  	if (!str)
> @@ -556,20 +558,6 @@ static struct intel_iommu *domain_get_iommu(struct dmar_domain *domain)
>  	return g_iommus[iommu_id];
>  }
>  
> -static void domain_update_iommu_coherency(struct dmar_domain *domain)
> -{
> -	int i;
> -
> -	domain->iommu_coherency = 1;
> -
> -	for_each_set_bit(i, &domain->iommu_bmp, g_num_of_iommus) {
> -		if (!ecap_coherent(g_iommus[i]->ecap)) {
> -			domain->iommu_coherency = 0;
> -			break;
> -		}
> -	}
> -}
> -
>  static void domain_update_iommu_snooping(struct dmar_domain *domain)
>  {
>  	int i;
> @@ -608,7 +596,6 @@ static void domain_update_iommu_superpage(struct dmar_domain *domain)
>  /* Some capabilities may be different across iommus */
>  static void domain_update_iommu_cap(struct dmar_domain *domain)
>  {
> -	domain_update_iommu_coherency(domain);
>  	domain_update_iommu_snooping(domain);
>  	domain_update_iommu_superpage(domain);
>  }
> @@ -3583,6 +3570,8 @@ static struct notifier_block device_nb = {
>  
>  int __init intel_iommu_init(void)
>  {
> +	struct dmar_drhd_unit *drhd;
> +	struct intel_iommu *iommu;
>  	int ret = 0;
>  
>  	/* VT-d is required for a TXT/tboot launch, so enforce that */
> @@ -3643,6 +3632,18 @@ int __init intel_iommu_init(void)
>  
>  	init_iommu_pm_ops();
>  
> +	/*
> +	 * Check for coherency support across all iommus.  If unsupported
> +	 * by any, force cache flushes.  Expected to be consistent across
> +	 * all iommus in the system.
> +	 */
> +	for_each_active_iommu(iommu, drhd) {
> +		if (!ecap_coherent(iommu->ecap)) {
> +			vm_iommu_coherency = 0;
> +			break;
> +		}
> +	}
> +
>  	bus_set_iommu(&pci_bus_type, &intel_iommu_ops);
>  
>  	bus_register_notifier(&pci_bus_type, &device_nb);
> @@ -3820,7 +3821,7 @@ static int md_domain_init(struct dmar_domain *domain, int guest_width)
>  	INIT_LIST_HEAD(&domain->devices);
>  
>  	domain->iommu_count = 0;
> -	domain->iommu_coherency = 0;
> +	domain->iommu_coherency = vm_iommu_coherency;
>  	domain->iommu_snooping = 0;
>  	domain->iommu_superpage = 0;
>  	domain->max_addr = 0;
> -- 
> 1.7.7.1
> 
> 
> 





* Re: [PATCH] intel-iommu: Manage iommu_coherency globally
  2011-11-18 16:03   ` Alex Williamson
@ 2011-11-18 17:00     ` David Woodhouse
  2011-11-18 17:15       ` Alex Williamson
  0 siblings, 1 reply; 11+ messages in thread
From: David Woodhouse @ 2011-11-18 17:00 UTC (permalink / raw)
  To: Alex Williamson
  Cc: rajesh.sankaran, iommu, linux-pci, linux-kernel, chrisw, ddutile


On Fri, 2011-11-18 at 09:03 -0700, Alex Williamson wrote:
> I can't help but thinking we're just taking the easy, lazy path for VM
> domains when it sounds like we really would prefer to keep the coherency
> set to the least common denominator of the domain rather than the
> platform.  Couldn't we instead force a flush of the domain when we
> transition from coherent to non-coherent?  Not sure I'm qualified to
> write that, but seems like it would keep the efficiency of VM domains
> with no effect to native DMA domains since they'd never trigger such a
> transition.  The below appears as if it would work and probably be OK
> since the unnecessary cache flushes are rare, but they're still
> unnecessary... and the comments/commit log are now wrong.  Thanks,

Yeah, that would make some sense. I was about to knock up some code
which would walk the page tables and use clflush to flush every one...
but wouldn't it be saner just to use wbinvd?

-- 
dwmw2


* Re: [PATCH] intel-iommu: Manage iommu_coherency globally
  2011-11-18 17:00     ` David Woodhouse
@ 2011-11-18 17:15       ` Alex Williamson
  2011-11-18 18:15         ` David Woodhouse
  0 siblings, 1 reply; 11+ messages in thread
From: Alex Williamson @ 2011-11-18 17:15 UTC (permalink / raw)
  To: David Woodhouse
  Cc: rajesh.sankaran, iommu, linux-pci, linux-kernel, chrisw, ddutile

On Fri, 2011-11-18 at 17:00 +0000, David Woodhouse wrote:
> On Fri, 2011-11-18 at 09:03 -0700, Alex Williamson wrote:
> > I can't help but thinking we're just taking the easy, lazy path for VM
> > domains when it sounds like we really would prefer to keep the coherency
> > set to the least common denominator of the domain rather than the
> > platform.  Couldn't we instead force a flush of the domain when we
> > transition from coherent to non-coherent?  Not sure I'm qualified to
> > write that, but seems like it would keep the efficiency of VM domains
> > with no effect to native DMA domains since they'd never trigger such a
> > transition.  The below appears as if it would work and probably be OK
> > since the unnecessary cache flushes are rare, but they're still
> > unnecessary... and the comments/commit log are now wrong.  Thanks,
> 
> Yeah, that would make some sense. I was about to knock up some code
> which would walk the page tables and use clflush to flush every one...
> but wouldn't it be saner just to use wbinvd?

A bit heavy handed, but obviously easier.  It feels like we could safely
be more strategic, but maybe we'd end up trashing the cache anyway in a
drawn out attempt to flush the context and all page tables.  However, do
we actually need a wbinvd_on_all_cpus()?  Probably better to trash one
cache than all of them.  Thanks,

Alex



* Re: [PATCH] intel-iommu: Manage iommu_coherency globally
  2011-11-18 17:15       ` Alex Williamson
@ 2011-11-18 18:15         ` David Woodhouse
  0 siblings, 0 replies; 11+ messages in thread
From: David Woodhouse @ 2011-11-18 18:15 UTC (permalink / raw)
  To: Alex Williamson
  Cc: rajesh.sankaran, iommu, linux-pci, linux-kernel, chrisw, ddutile


On Fri, 2011-11-18 at 10:15 -0700, Alex Williamson wrote:
> A bit heavy handed, but obviously easier.  It feels like we could safely
> be more strategic, but maybe we'd end up trashing the cache anyway in a
> drawn out attempt to flush the context and all page tables.  However, do
> we actually need a wbinvd_on_all_cpus()?  Probably better to trash one
> cache than all of them.  Thanks, 

Yes, it would be wbinvd_on_all_cpus(). The page-table-walk code would
look something like this...

static void flush_domain_ptes(struct dmar_domain *domain)
{
	int level = agaw_to_level(domain->agaw);
	struct dma_pte *ptelvl[level + 2], *pte;

	pte = domain->pgd;
	clflush_cache_range(pte, VTD_PAGE_SIZE);

	/* Keep the end condition nice and simple... */
	ptelvl[level + 1] = NULL;

	while (pte) {
		/* BUG_ON(level < 2); */
		if (dma_pte_present(pte)) {
			if (level == 2) {
				/* Flush the bottom-level page, but no need to
				   process its *contents* any further. */
				clflush_cache_range(phys_to_virt(dma_pte_addr(pte)),
						    VTD_PAGE_SIZE);
			} else if (!(pte->val & DMA_PTE_LARGE_PAGE)) {
				/* Remember the next PTE at this level; we'll
				   come back to it. */
				ptelvl[level] = pte + 1;

				/* Go down a level and process PTEs there. */
				pte = phys_to_virt(dma_pte_addr(pte));
				clflush_cache_range(pte, VTD_PAGE_SIZE);
				level--;
				continue;
			}
		}
		
		pte++;
		/* When we reach the end of a PTE page, go back up to the next
		   level where we came from. */
		while (pte && !((unsigned long)pte & ~VTD_PAGE_MASK)) {
			level++;
			pte = ptelvl[level];
		}
	}
}

-- 
dwmw2


* Re: [PATCH] intel-iommu: Manage iommu_coherency globally
  2011-11-18 12:03 ` David Woodhouse
  2011-11-18 16:03   ` Alex Williamson
@ 2011-11-19 19:17   ` Chris Wright
  2011-11-19 20:11     ` David Woodhouse
  2011-11-23  7:30   ` cody
  2 siblings, 1 reply; 11+ messages in thread
From: Chris Wright @ 2011-11-19 19:17 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Alex Williamson, rajesh.sankaran, iommu, linux-pci, linux-kernel,
	chrisw, ddutile

* David Woodhouse (dwmw2@infradead.org) wrote:
> On Tue, 2011-11-15 at 21:11 -0700, Alex Williamson wrote:
> > We currently manage iommu_coherency on a per domain basis,
> > choosing the safest setting across the iommus attached to a
> > particular domain.  This unfortunately has a bug that when
> > no iommus are attached, the domain defaults to coherent.
> > If we fall into this mode, then later add a device behind a
> > non-coherent iommu to that domain, the context entry is
> > updated using the wrong coherency setting, and we get dmar
> > faults.
> > 
> > Since we expect chipsets to be consistent in their coherency
> > setting, we can instead determine the coherency once and use
> > it globally. 
> 
> (Adding Rajesh).
> 
> Hm, it seems I lied to you about this. The non-coherent mode isn't just
> a historical mistake; it's configurable by the BIOS, and we actually
> encourage people to use the non-coherent mode because it makes the
> hardware page-walk faster — so reduces the latency for IOTLB misses.

Interesting, because for the workloads I've tested it's the exact opposite.
Tested w/ BIOS enabling and disabling coherency, and w/ non-coherent
access and streaming DMA (i.e. bare-metal NIC bandwidth testing)... the IOMMU
added something like 10% overhead when non-coherent vs. coherent.

> In addition to that, the IOMMU associated with the integrated graphics
> is so "special" that it doesn't support coherent mode either. So it *is*
> quite feasible that we'll see a machine where some IOMMUs support
> coherent mode, and some don't.
> 
> And thus we do need to address the concern that just assuming
> non-coherent mode will cause unnecessary performance issues, for the
> case where a domain *doesn't* happen to include any of the non-coherent
> IOMMUs.
> 
> However... for VM domains I don't think we care. Setting up the page
> tables *isn't* a fast path there (at least not until/unless we support
> exposing an emulated IOMMU to the guest).
> 
> The case we care about is *native* DMA, where this cache flush will be
> happening for example in the fast path of network TX/RX. But in *that*
> case, there is only *one* IOMMU to worry about so it's simple enough to
> do the right thing, surely?

Definitely agreed on the above points: page table setup/teardown is limited
for VM domains, and the bare-metal case is the one sensitive to the IOMMU
overhead of flushing.

thanks,
-chris


* Re: [PATCH] intel-iommu: Manage iommu_coherency globally
  2011-11-19 19:17   ` Chris Wright
@ 2011-11-19 20:11     ` David Woodhouse
  0 siblings, 0 replies; 11+ messages in thread
From: David Woodhouse @ 2011-11-19 20:11 UTC (permalink / raw)
  To: Chris Wright
  Cc: Alex Williamson, rajesh.sankaran, iommu, linux-pci, linux-kernel,
	ddutile


On Sat, 2011-11-19 at 11:17 -0800, Chris Wright wrote:
> > Hm, it seems I lied to you about this. The non-coherent mode isn't just
> > a historical mistake; it's configurable by the BIOS, and we actually
> > encourage people to use the non-coherent mode because it makes the
> > hardware page-walk faster — so reduces the latency for IOTLB misses.
> 
> Interesting, because for the workloads I've tested it's the exact opposite.
> Tested w/ BIOS enabling and disabling coherency, and w/ non-coherent
> access and streaming DMA (i.e. bare-metal NIC bandwidth testing)... the IOMMU
> added something like 10% overhead when non-coherent vs. coherent.

Right. I specifically said "latency for IOTLB misses". That is the main
reason we apparently advise BIOS authors to use non-coherent mode.

(Of course, the fact that we're letting the BIOS control this at all
means that we obviously haven't learned *any* lessons from the last few
years of pain with VT-d enabling and BIOS bugs. It should have been
under OS control. But that's a separate, stunningly depressing, issue.)

Of course, the *overall* performance sucks rocks if we are in
non-coherent mode. The bare-metal NIC bandwidth testing is the obvious
test case for it, and thanks for providing ballpark numbers for that.
They perfectly back up the qualitative remarks I made to Rajesh
yesterday on exactly this topic.

-- 
dwmw2


* Re: [PATCH] intel-iommu: Manage iommu_coherency globally
  2011-11-23  7:30   ` cody
@ 2011-11-22 15:41     ` David Woodhouse
  0 siblings, 0 replies; 11+ messages in thread
From: David Woodhouse @ 2011-11-22 15:41 UTC (permalink / raw)
  To: cody
  Cc: Alex Williamson, rajesh.sankaran, iommu, linux-pci, linux-kernel,
	chrisw, ddutile

On Tue, 2011-11-22 at 23:30 -0800, cody wrote:
> I have a question: the Intel VT-d manual says the coherency bit is "RO", so how
> can it be configurable by the BIOS? Or is there another way to configure
> the IOMMU to be coherent or non-coherent? (I didn't find such info in
> the manual. Did I miss it?)

It's not in the manual. We hide it so that only the BIOS can set it.
It's only read-only to *you*, the operating system that's supposed to
have control of the hardware.
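
(For reference, the only thing the OS ever gets to see is that read-only
capability bit; roughly the existing check, paraphrasing the macro in
include/linux/intel-iommu.h:)

	/* ECAP bit 0 is the page-walk coherency capability.  We only ever
	 * read it; what value it reports is decided below the OS. */
	#define ecap_coherent(e)	((e) & 0x1)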

You might think that we should expose it to the OS so that it can be set
according to the *usage* — so perhaps you'd set it non-coherent for VM
use (where you rarely modify the page tables) and coherent for
host-based DMA protection (where the constant clflushes in the fast path
cause a 10% performance hit on network bandwidth testing).

However, that would not be consistent with our goal of ensuring that the
BIOS can *always* screw you over as much as possible.

-- 
dwmw2



* Re: [PATCH] intel-iommu: Manage iommu_coherency globally
  2011-11-18 12:03 ` David Woodhouse
  2011-11-18 16:03   ` Alex Williamson
  2011-11-19 19:17   ` Chris Wright
@ 2011-11-23  7:30   ` cody
  2011-11-22 15:41     ` David Woodhouse
  2 siblings, 1 reply; 11+ messages in thread
From: cody @ 2011-11-23  7:30 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Alex Williamson, rajesh.sankaran, iommu, linux-pci, linux-kernel,
	chrisw, ddutile

On 11/18/2011 04:03 AM, David Woodhouse wrote:
>
> Hm, it seems I lied to you about this. The non-coherent mode isn't just
> a historical mistake; it's configurable by the BIOS, and we actually
> encourage people to use the non-coherent mode because it makes the
> hardware page-walk faster — so reduces the latency for IOTLB misses.
>
>    
I have a question: the Intel VT-d manual says the coherency bit is "RO", so how
can it be configurable by the BIOS? Or is there another way to configure
the IOMMU to be coherent or non-coherent? (I didn't find such info in
the manual. Did I miss it?)

-cody
> In addition to that, the IOMMU associated with the integrated graphics
> is so "special" that it doesn't support coherent mode either. So it *is*
> quite feasible that we'll see a machine where some IOMMUs support
> coherent mode, and some don't.
>
> And thus we do need to address the concern that just assuming
> non-coherent mode will cause unnecessary performance issues, for the
> case where a domain *doesn't* happen to include any of the non-coherent
> IOMMUs.
>
> However... for VM domains I don't think we care. Setting up the page
> tables *isn't* a fast path there (at least not until/unless we support
> exposing an emulated IOMMU to the guest).
>
> The case we care about is *native* DMA, where this cache flush will be
> happening for example in the fast path of network TX/RX. But in *that*
> case, there is only *one* IOMMU to worry about so it's simple enough to
> do the right thing, surely?
>
>
>    

