* [PATCH] intel-iommu: Default to non-coherent for domains unattached to iommus
@ 2012-09-12 19:55 Donald Dutile
[not found] ` <1347479705-33972-1-git-send-email-ddutile-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 22+ messages in thread
From: Donald Dutile @ 2012-09-12 19:55 UTC (permalink / raw)
To: joro-zLv9SwRftAIdnm+yROfE0A
Cc: dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
This patch was posted back in Nov 2011:
http://lists.linuxfoundation.org/pipermail/iommu/2011-November/003086.html
and due to discussion about the patch, it was never pulled in.
Although the thread discussed an alternate patch to
default to non-coherent if any IOMMU didn't support coherency,
this alternate method was never implemented, and this bug persists.
This patch has been in RHEL6 for quite some time,
and it wasn't noticed that it didn't get into linux upstream,
until a RH partner reported this error when running upstream kernels,
and noticed how it doesn't occur on RHEL6 kernels.
Applying this patch to an upstream kernel resolved this issue.
domain_update_iommu_coherency() currently defaults to setting
domains as coherent when the domain is not attached to any iommus.
This allows for a window in domain_context_mapping_one() where such a
domain can update context entries non-coherently, and only after
update the domain capability to clear iommu_coherency.
This can be seen using KVM device assignment on VT-d systems that
do not support coherency in the ecap register. When a device is
added to a guest, a domain is created (iommu_coherency = 0), the
device is attached, and ranges are mapped. If we then hot unplug
the device, the coherency is updated and set to the default (1)
since no iommus are attached to the domain. A subsequent attach
of a device makes use of the same dmar domain (now marked coherent)
updates context entries with coherency enabled, and only disables
coherency as the last step in the process.
To fix this, switch domain_update_iommu_coherency() to use the
safer, non-coherent default for domains not attached to iommus.
Signed-off-by: Donald Dutile <ddutile-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
cc: Alex Williamson <alex.williamson at redhat.com>
---
drivers/iommu/intel-iommu.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 2297ec1..52ff63f 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -589,7 +589,9 @@ static void domain_update_iommu_coherency(struct dmar_domain *domain)
{
int i;
- domain->iommu_coherency = 1;
+ i = find_first_bit(domain->iommu_bmp, g_num_of_iommus);
+
+ domain->iommu_coherency = i < g_num_of_iommus ? 1: 0;
for_each_set_bit(i, domain->iommu_bmp, g_num_of_iommus) {
if (!ecap_coherent(g_iommus[i]->ecap)) {
--
1.7.10.2.552.gaa3bb87
^ permalink raw reply related [flat|nested] 22+ messages in thread* [PATCH] intel-iommu: Default to non-coherent for domains unattached to iommus
@ 2011-11-11 22:49 Alex Williamson
2011-11-12 0:17 ` Chris Wright
` (2 more replies)
0 siblings, 3 replies; 22+ messages in thread
From: Alex Williamson @ 2011-11-11 22:49 UTC (permalink / raw)
To: dwmw2; +Cc: iommu, linux-pci, linux-kernel, chrisw, ddutile, alex.williamson
domain_update_iommu_coherency() currently default to setting domains
as coherent when the domain is not attached to any iommus. This
allows for a window in domain_context_mapping_one() where such a
domain can update context entries non-coherenty, and only after
update the domain capability to clear iommu_coherency.
This can be seen using KVM device assignment on VT-d systems that
do not support coherency in the ecap register. When a device is
added to a guest, a domain is created (iommu_coherency = 0), the
device is attached, and ranges are mapped. If we then hot unplug
the device, the coherency is updated and set to the default (1)
since no iommus are attached to the domain. A subsequenct attach
of a device makes use of the same dmar domain (now marked coherent)
updates context entries with cohrency enabled, and only disables
coherency as the last step in the process.
To fix this, switch domain_update_iommu_coherency() to use the
safer, non-coherent default for domains not attached to iommus.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Donald Dutile <ddutile@redhat.com>
Acked-by: Donald Dutile <ddutile@redhat.com>
---
drivers/iommu/intel-iommu.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index c0c7820..6b9d8c1 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -560,7 +560,9 @@ static void domain_update_iommu_coherency(struct dmar_domain *domain)
{
int i;
- domain->iommu_coherency = 1;
+ i = find_first_bit(&domain->iommu_bmp, g_num_of_iommus);
+
+ domain->iommu_coherency = i < g_num_of_iommus ? 1 : 0;
for_each_set_bit(i, &domain->iommu_bmp, g_num_of_iommus) {
if (!ecap_coherent(g_iommus[i]->ecap)) {
^ permalink raw reply related [flat|nested] 22+ messages in thread* Re: [PATCH] intel-iommu: Default to non-coherent for domains unattached to iommus
2011-11-11 22:49 Alex Williamson
@ 2011-11-12 0:17 ` Chris Wright
2011-11-12 0:37 ` David Woodhouse
2012-08-27 20:22 ` Craig Hada
2 siblings, 0 replies; 22+ messages in thread
From: Chris Wright @ 2011-11-12 0:17 UTC (permalink / raw)
To: Alex Williamson; +Cc: dwmw2, iommu, linux-pci, linux-kernel, chrisw, ddutile
* Alex Williamson (alex.williamson@redhat.com) wrote:
> domain_update_iommu_coherency() currently default to setting domains
> as coherent when the domain is not attached to any iommus. This
> allows for a window in domain_context_mapping_one() where such a
> domain can update context entries non-coherenty, and only after
> update the domain capability to clear iommu_coherency.
>
> This can be seen using KVM device assignment on VT-d systems that
> do not support coherency in the ecap register. When a device is
> added to a guest, a domain is created (iommu_coherency = 0), the
> device is attached, and ranges are mapped. If we then hot unplug
> the device, the coherency is updated and set to the default (1)
> since no iommus are attached to the domain. A subsequenct attach
> of a device makes use of the same dmar domain (now marked coherent)
> updates context entries with cohrency enabled, and only disables
> coherency as the last step in the process.
Specifically, in domain_context_mapping_one():
context_set_translation_type(context, translation);
context_set_fault_enable(context);
context_set_present(context);
domain_flush_cache(domain, context, sizeof(*context)); <-- noop
This does not update context entry, leaving IOMMU w/ a non-present
context entry, and therefore subsequent DMA generates DMAR faults:
DMAR:[DMA Read] Request device [08:10.0] fault addr 37298000
DMAR:[fault reason 02] Present bit in context entry is clear
> To fix this, switch domain_update_iommu_coherency() to use the
> safer, non-coherent default for domains not attached to iommus.
>
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> Tested-by: Donald Dutile <ddutile@redhat.com>
> Acked-by: Donald Dutile <ddutile@redhat.com>
Acked-by: Chris Wright <chrisw@sous-sol.org>
thanks,
-chris
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] intel-iommu: Default to non-coherent for domains unattached to iommus
2011-11-11 22:49 Alex Williamson
2011-11-12 0:17 ` Chris Wright
@ 2011-11-12 0:37 ` David Woodhouse
2011-11-12 0:45 ` Chris Wright
2011-11-12 0:46 ` Roland Dreier
2012-08-27 20:22 ` Craig Hada
2 siblings, 2 replies; 22+ messages in thread
From: David Woodhouse @ 2011-11-12 0:37 UTC (permalink / raw)
To: Alex Williamson; +Cc: iommu, linux-pci, linux-kernel, chrisw, ddutile
[-- Attachment #1: Type: text/plain, Size: 1366 bytes --]
On Fri, 2011-11-11 at 15:49 -0700, Alex Williamson wrote:
> To fix this, switch domain_update_iommu_coherency() to use the
> safer, non-coherent default for domains not attached to iommus.
That isn't a fix for the problem you described.
The problem is that changing a domain from coherent to non-coherent is
*broken*. It probably needs to flush the cache for the *entire* set of
page tables — not just the new context entry it adds.
You might have removed the *common* case where we trigger that bug, but
it certainly isn't a fix.
However, I'd be receptive to an argument that the situation you describe
is in fact the *only* time we'd have to switch from coherent to
non-coherent at run time, because the coherency is an all-or-nothing
characteristic of the chipset. Either all the IOMMUs are coherent, or
none of them, right? This brain-damage only affects the first chipsets
from before we worked out that cache incoherency was a *really* f*cking
stupid idea, doesn't it?
So if you were to ditch the whole idea of a per-domain runtime update,
and instead calculate a global value for 'iommu_coherency' at boot time,
by iterating over for_each_active_iommu()¹, I think that would be a
better way to deal with the issue. And you *could* really call that a
'fix'.
Make sense?
--
dwmw2
¹ just in *case* they ever differ
[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5818 bytes --]
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] intel-iommu: Default to non-coherent for domains unattached to iommus
2011-11-12 0:37 ` David Woodhouse
@ 2011-11-12 0:45 ` Chris Wright
2011-11-12 0:47 ` Chris Wright
2011-11-12 0:51 ` David Woodhouse
2011-11-12 0:46 ` Roland Dreier
1 sibling, 2 replies; 22+ messages in thread
From: Chris Wright @ 2011-11-12 0:45 UTC (permalink / raw)
To: David Woodhouse; +Cc: Alex Williamson, chrisw, linux-pci, iommu, linux-kernel
* David Woodhouse (dwmw2@infradead.org) wrote:
> On Fri, 2011-11-11 at 15:49 -0700, Alex Williamson wrote:
> > To fix this, switch domain_update_iommu_coherency() to use the
> > safer, non-coherent default for domains not attached to iommus.
>
> That isn't a fix for the problem you described.
>
> The problem is that changing a domain from coherent to non-coherent is
> *broken*. It probably needs to flush the cache for the *entire* set of
> page tables — not just the new context entry it adds.
For a guest domain, the page tables aren't actually changing.
And for the snoop mode change, we remap the pages.
> You might have removed the *common* case where we trigger that bug, but
> it certainly isn't a fix.
>
> However, I'd be receptive to an argument that the situation you describe
> is in fact the *only* time we'd have to switch from coherent to
> non-coherent at run time, because the coherency is an all-or-nothing
> characteristic of the chipset. Either all the IOMMUs are coherent, or
> none of them, right? This brain-damage only affects the first chipsets
> from before we worked out that cache incoherency was a *really* f*cking
> stupid idea, doesn't it?
Dunno if it exists going forward (I've stopped being surprised by the
brain damage in this area ;), but those machines are still out there.
> So if you were to ditch the whole idea of a per-domain runtime update,
> and instead calculate a global value for 'iommu_coherency' at boot time,
> by iterating over for_each_active_iommu()¹, I think that would be a
> better way to deal with the issue. And you *could* really call that a
> 'fix'.
>
> Make sense?
Ideally, yes. Not sure we can practically do it though. Would have to
be sure we force incoherent access mode for the busted hw.
thanks,
-chris
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] intel-iommu: Default to non-coherent for domains unattached to iommus
2011-11-12 0:45 ` Chris Wright
@ 2011-11-12 0:47 ` Chris Wright
2011-11-12 0:51 ` David Woodhouse
1 sibling, 0 replies; 22+ messages in thread
From: Chris Wright @ 2011-11-12 0:47 UTC (permalink / raw)
To: Chris Wright; +Cc: David Woodhouse, linux-pci, iommu, linux-kernel
* Chris Wright (chrisw@sous-sol.org) wrote:
> * David Woodhouse (dwmw2@infradead.org) wrote:
> > So if you were to ditch the whole idea of a per-domain runtime update,
> > and instead calculate a global value for 'iommu_coherency' at boot time,
> > by iterating over for_each_active_iommu()¹, I think that would be a
> > better way to deal with the issue. And you *could* really call that a
> > 'fix'.
> >
> > Make sense?
>
> Ideally, yes. Not sure we can practically do it though. Would have to
> be sure we force incoherent access mode for the busted hw.
Forgot to mention, the simple patch may be the better choice for -stable
(assuming the page table mapping issue...isn't).
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] intel-iommu: Default to non-coherent for domains unattached to iommus
2011-11-12 0:45 ` Chris Wright
2011-11-12 0:47 ` Chris Wright
@ 2011-11-12 0:51 ` David Woodhouse
2011-11-12 0:58 ` Chris Wright
1 sibling, 1 reply; 22+ messages in thread
From: David Woodhouse @ 2011-11-12 0:51 UTC (permalink / raw)
To: Chris Wright; +Cc: Alex Williamson, linux-pci, iommu, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 858 bytes --]
On Fri, 2011-11-11 at 16:45 -0800, Chris Wright wrote:
> > So if you were to ditch the whole idea of a per-domain runtime update,
> > and instead calculate a global value for 'iommu_coherency' at boot time,
> > by iterating over for_each_active_iommu()¹, I think that would be a
> > better way to deal with the issue. And you *could* really call that a
> > 'fix'.
> >
> > Make sense?
>
> Ideally, yes. Not sure we can practically do it though. Would have to
> be sure we force incoherent access mode for the busted hw.
Why's it not practical? You do the *same* loop we currently have in
domain_update_iommu_coherency(), except that you do it just *once*, over
all active IOMMUs, at boot time. And then you just use that result
forever more.
So if *any* IOMMU in the system is non-coherent, you run them all that
way.
--
dwmw2
[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5818 bytes --]
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] intel-iommu: Default to non-coherent for domains unattached to iommus
2011-11-12 0:51 ` David Woodhouse
@ 2011-11-12 0:58 ` Chris Wright
2011-11-12 1:03 ` David Woodhouse
0 siblings, 1 reply; 22+ messages in thread
From: Chris Wright @ 2011-11-12 0:58 UTC (permalink / raw)
To: David Woodhouse
Cc: Chris Wright, Alex Williamson, linux-pci, iommu, linux-kernel
* David Woodhouse (dwmw2@infradead.org) wrote:
> On Fri, 2011-11-11 at 16:45 -0800, Chris Wright wrote:
> > > So if you were to ditch the whole idea of a per-domain runtime update,
> > > and instead calculate a global value for 'iommu_coherency' at boot time,
> > > by iterating over for_each_active_iommu()¹, I think that would be a
> > > better way to deal with the issue. And you *could* really call that a
> > > 'fix'.
> > >
> > > Make sense?
> >
> > Ideally, yes. Not sure we can practically do it though. Would have to
> > be sure we force incoherent access mode for the busted hw.
>
> Why's it not practical? You do the *same* loop we currently have in
> domain_update_iommu_coherency(), except that you do it just *once*, over
> all active IOMMUs, at boot time. And then you just use that result
> forever more.
Yeah, you're right, that should be simple enough. What about snoop
control? Same thing...should we expect it to be system wide? Because
that one's exported out to KVM and used.
> So if *any* IOMMU in the system is non-coherent, you run them all that
> way.
Minus the measureable slowdown for some devices that were behind coherent
IOMMU (IIRC, the chipsets that had this issue were mobile anyway, so
not as performance sensitive), *nod* .
thanks,
-chris
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] intel-iommu: Default to non-coherent for domains unattached to iommus
2011-11-12 0:58 ` Chris Wright
@ 2011-11-12 1:03 ` David Woodhouse
0 siblings, 0 replies; 22+ messages in thread
From: David Woodhouse @ 2011-11-12 1:03 UTC (permalink / raw)
To: Chris Wright; +Cc: Alex Williamson, linux-pci, iommu, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1766 bytes --]
On Fri, 2011-11-11 at 16:58 -0800, Chris Wright wrote:
> * David Woodhouse (dwmw2@infradead.org) wrote:
> > On Fri, 2011-11-11 at 16:45 -0800, Chris Wright wrote:
> > > > So if you were to ditch the whole idea of a per-domain runtime update,
> > > > and instead calculate a global value for 'iommu_coherency' at boot time,
> > > > by iterating over for_each_active_iommu()¹, I think that would be a
> > > > better way to deal with the issue. And you *could* really call that a
> > > > 'fix'.
> > > >
> > > > Make sense?
> > >
> > > Ideally, yes. Not sure we can practically do it though. Would have to
> > > be sure we force incoherent access mode for the busted hw.
> >
> > Why's it not practical? You do the *same* loop we currently have in
> > domain_update_iommu_coherency(), except that you do it just *once*, over
> > all active IOMMUs, at boot time. And then you just use that result
> > forever more.
>
> Yeah, you're right, that should be simple enough. What about snoop
> control? Same thing...should we expect it to be system wide? Because
> that one's exported out to KVM and used.
Not sure. That might differ for the graphics IOMMU (large page support
certainly does). But I don't *think* the coherency does, does it?
> > So if *any* IOMMU in the system is non-coherent, you run them all that
> > way.
>
> Minus the measureable slowdown for some devices that were behind coherent
> IOMMU (IIRC, the chipsets that had this issue were mobile anyway, so
> not as performance sensitive), *nod* .
Are there really machines where only *some* of the IOMMUs are coherent?
I didn't think there were, which would mean there's no real performance
penalty on the hardware that actually exists today.
--
dwmw2
[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5818 bytes --]
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] intel-iommu: Default to non-coherent for domains unattached to iommus
2011-11-12 0:37 ` David Woodhouse
2011-11-12 0:45 ` Chris Wright
@ 2011-11-12 0:46 ` Roland Dreier
2011-11-12 0:51 ` Chris Wright
1 sibling, 1 reply; 22+ messages in thread
From: Roland Dreier @ 2011-11-12 0:46 UTC (permalink / raw)
To: David Woodhouse
Cc: Alex Williamson, iommu, linux-pci, linux-kernel, chrisw, ddutile
On Fri, Nov 11, 2011 at 4:37 PM, David Woodhouse <dwmw2@infradead.org> wrote:
> This brain-damage only affects the first chipsets
> from before we worked out that cache incoherency was a *really* f*cking
> stupid idea, doesn't it?
As we talked about at KS, I have some Westmere EP (ie latest
and greatest server platform) systems where the BIOS exposes
an option that allows choosing VT-d coherency on or off, and
defaults it to "off".
What is the "official" Intel line on coherency with Westmere and
Tylersburg -- because as I also mentioned, I was seeing some
problems with VT-d and the default "coherency off" setting that
looked like the IOMMU HW is getting stale PTEs (ie a missing
or not working cache flush).
- R.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] intel-iommu: Default to non-coherent for domains unattached to iommus
2011-11-12 0:46 ` Roland Dreier
@ 2011-11-12 0:51 ` Chris Wright
2011-11-12 0:55 ` David Woodhouse
0 siblings, 1 reply; 22+ messages in thread
From: Chris Wright @ 2011-11-12 0:51 UTC (permalink / raw)
To: Roland Dreier
Cc: David Woodhouse, Alex Williamson, iommu, linux-pci, linux-kernel,
chrisw, ddutile
* Roland Dreier (roland@purestorage.com) wrote:
> On Fri, Nov 11, 2011 at 4:37 PM, David Woodhouse <dwmw2@infradead.org> wrote:
> > This brain-damage only affects the first chipsets
> > from before we worked out that cache incoherency was a *really* f*cking
> > stupid idea, doesn't it?
>
> As we talked about at KS, I have some Westmere EP (ie latest
> and greatest server platform) systems where the BIOS exposes
> an option that allows choosing VT-d coherency on or off, and
> defaults it to "off".
That's just more brain damage AFAICT. Esp. if you do performance
testing (and choose not to use passthrough mode)... have and it's
quite measurable. I switched default to on long time ago, w/out
issue.
> What is the "official" Intel line on coherency with Westmere and
> Tylersburg -- because as I also mentioned, I was seeing some
> problems with VT-d and the default "coherency off" setting that
> looked like the IOMMU HW is getting stale PTEs (ie a missing
> or not working cache flush).
That sounds like sw bugs more than official recommendation issue.
thanks,
-chris
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] intel-iommu: Default to non-coherent for domains unattached to iommus
2011-11-12 0:51 ` Chris Wright
@ 2011-11-12 0:55 ` David Woodhouse
2011-11-12 1:08 ` Chris Wright
0 siblings, 1 reply; 22+ messages in thread
From: David Woodhouse @ 2011-11-12 0:55 UTC (permalink / raw)
To: Chris Wright
Cc: Roland Dreier, Alex Williamson, iommu, linux-pci, linux-kernel,
ddutile
[-- Attachment #1: Type: text/plain, Size: 1657 bytes --]
On Fri, 2011-11-11 at 16:51 -0800, Chris Wright wrote:
> * Roland Dreier (roland@purestorage.com) wrote:
> > On Fri, Nov 11, 2011 at 4:37 PM, David Woodhouse <dwmw2@infradead.org> wrote:
> > > This brain-damage only affects the first chipsets
> > > from before we worked out that cache incoherency was a *really* f*cking
> > > stupid idea, doesn't it?
> >
> > As we talked about at KS, I have some Westmere EP (ie latest
> > and greatest server platform) systems where the BIOS exposes
> > an option that allows choosing VT-d coherency on or off, and
> > defaults it to "off".
>
> That's just more brain damage AFAICT. Esp. if you do performance
> testing (and choose not to use passthrough mode)... have and it's
> quite measurable. I switched default to on long time ago, w/out
> issue.
>
> > What is the "official" Intel line on coherency with Westmere and
> > Tylersburg -- because as I also mentioned, I was seeing some
> > problems with VT-d and the default "coherency off" setting that
> > looked like the IOMMU HW is getting stale PTEs (ie a missing
> > or not working cache flush).
>
> That sounds like sw bugs more than official recommendation issue.
Although the cache-flushing has been tested on the original chipsets
fairly well, and it's one of the parts I've mostly rewritten when doing
performance work since I inherited the code, so that might not be my
first suspicion. I would be more inclined to suspect that there's some
chipset buffering that we aren't correctly flushing (which might in
itself be a hardware issue, since the way to flush the cache is supposed
to be well-defined).
--
dwmw2
[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5818 bytes --]
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] intel-iommu: Default to non-coherent for domains unattached to iommus
2011-11-12 0:55 ` David Woodhouse
@ 2011-11-12 1:08 ` Chris Wright
2011-11-12 1:20 ` David Woodhouse
2011-11-12 1:30 ` Roland Dreier
0 siblings, 2 replies; 22+ messages in thread
From: Chris Wright @ 2011-11-12 1:08 UTC (permalink / raw)
To: David Woodhouse
Cc: Chris Wright, Roland Dreier, Alex Williamson, iommu, linux-pci,
linux-kernel, ddutile
* David Woodhouse (dwmw2@infradead.org) wrote:
> On Fri, 2011-11-11 at 16:51 -0800, Chris Wright wrote:
> > * Roland Dreier (roland@purestorage.com) wrote:
> > > On Fri, Nov 11, 2011 at 4:37 PM, David Woodhouse <dwmw2@infradead.org> wrote:
> > > > This brain-damage only affects the first chipsets
> > > > from before we worked out that cache incoherency was a *really* f*cking
> > > > stupid idea, doesn't it?
> > >
> > > As we talked about at KS, I have some Westmere EP (ie latest
> > > and greatest server platform) systems where the BIOS exposes
> > > an option that allows choosing VT-d coherency on or off, and
> > > defaults it to "off".
> >
> > That's just more brain damage AFAICT. Esp. if you do performance
> > testing (and choose not to use passthrough mode)... have and it's
> > quite measurable. I switched default to on long time ago, w/out
> > issue.
> >
> > > What is the "official" Intel line on coherency with Westmere and
> > > Tylersburg -- because as I also mentioned, I was seeing some
> > > problems with VT-d and the default "coherency off" setting that
> > > looked like the IOMMU HW is getting stale PTEs (ie a missing
> > > or not working cache flush).
> >
> > That sounds like sw bugs more than official recommendation issue.
>
> Although the cache-flushing has been tested on the original chipsets
> fairly well, and it's one of the parts I've mostly rewritten when doing
> performance work since I inherited the code, so that might not be my
> first suspicion.
All the stale PTE issues I've encountered in the past have turned into
fixed sw bugs (perhaps it's since been fixed?). Also, I thought with
Coherency On/Off it's only effecting the use of clflush, not IOTLB or
Context Entry cache flushing (invalidations).
On a slightly separate, but performance related note...have you ever
tried using the hw queue? Currently we only have a sw queue, but the
submission path for invalidations doesn't really queue (unless I missed
it). It seems to pull from the software queue and submit/wait,
submit/wait...Seems simple enough to submit the whole queue and then
issue the wait.
This would be a huge win if we ever have an emulated IOMMU. We could
make the sw queue bigger, and allocate more than a single page for the
hw queue. We'd only exit on running the queue rather than every
invalidation.
> I would be more inclined to suspect that there's some
> chipset buffering that we aren't correctly flushing (which might in
> itself be a hardware issue, since the way to flush the cache is supposed
> to be well-defined).
Roland, have you tried switching BIOS to Coherency On and can do you
ever see stale PTEs?
thanks,
-chris
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] intel-iommu: Default to non-coherent for domains unattached to iommus
2011-11-12 1:08 ` Chris Wright
@ 2011-11-12 1:20 ` David Woodhouse
2011-11-15 7:12 ` cody
2011-11-12 1:30 ` Roland Dreier
1 sibling, 1 reply; 22+ messages in thread
From: David Woodhouse @ 2011-11-12 1:20 UTC (permalink / raw)
To: Chris Wright
Cc: Roland Dreier, Alex Williamson, iommu, linux-pci, linux-kernel,
ddutile
[-- Attachment #1: Type: text/plain, Size: 1080 bytes --]
On Fri, 2011-11-11 at 17:08 -0800, Chris Wright wrote:
> All the stale PTE issues I've encountered in the past have turned into
> fixed sw bugs (perhaps it's since been fixed?). Also, I thought with
> Coherency On/Off it's only effecting the use of clflush, not IOTLB or
> Context Entry cache flushing (invalidations).
Yeah, it's supposed to be *just* clflush. Nevertheless, I can imagine it
being screwed up and there actually being a buffer in the chipset too.
We certainly made that mistake with the graphics engine in some cases...
> On a slightly separate, but performance related note...have you ever
> tried using the hw queue? Currently we only have a sw queue, but the
> submission path for invalidations doesn't really queue (unless I missed
> it). It seems to pull from the software queue and submit/wait,
> submit/wait...Seems simple enough to submit the whole queue and then
> issue the wait.
I have a feeling we trigger errata if we do that — although if we're
only doing it for an emulated IOMMU that shouldn't be an issue.
--
dwmw2
[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5818 bytes --]
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] intel-iommu: Default to non-coherent for domains unattached to iommus
2011-11-12 1:20 ` David Woodhouse
@ 2011-11-15 7:12 ` cody
2011-11-15 4:54 ` Chris Wright
0 siblings, 1 reply; 22+ messages in thread
From: cody @ 2011-11-15 7:12 UTC (permalink / raw)
To: David Woodhouse
Cc: Chris Wright, Roland Dreier, Alex Williamson, iommu, linux-pci,
linux-kernel, ddutile
On 11/11/2011 05:20 PM, David Woodhouse wrote:
> On Fri, 2011-11-11 at 17:08 -0800, Chris Wright wrote:
>
>> All the stale PTE issues I've encountered in the past have turned into
>> fixed sw bugs (perhaps it's since been fixed?). Also, I thought with
>> Coherency On/Off it's only effecting the use of clflush, not IOTLB or
>> Context Entry cache flushing (invalidations).
>>
> Yeah, it's supposed to be *just* clflush. Nevertheless, I can imagine it
> being screwed up and there actually being a buffer in the chipset too.
> We certainly made that mistake with the graphics engine in some cases...
>
>
>> On a slightly separate, but performance related note...have you ever
>> tried using the hw queue? Currently we only have a sw queue, but the
>> submission path for invalidations doesn't really queue (unless I missed
>> it). It seems to pull from the software queue and submit/wait,
>> submit/wait...Seems simple enough to submit the whole queue and then
>> issue the wait.
>>
> I have a feeling we trigger errata if we do that — although if we're
> only doing it for an emulated IOMMU that shouldn't be an issue.
>
>
What does the emulated IOMMU here? Does it mean the emulated IOMMU
exposed to guest VM?
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] intel-iommu: Default to non-coherent for domains unattached to iommus
2011-11-15 7:12 ` cody
@ 2011-11-15 4:54 ` Chris Wright
2011-11-15 5:55 ` Kai Huang
0 siblings, 1 reply; 22+ messages in thread
From: Chris Wright @ 2011-11-15 4:54 UTC (permalink / raw)
To: cody
Cc: David Woodhouse, Roland Dreier, linux-pci, iommu, linux-kernel,
Chris Wright
* cody (mail.kai.huang@gmail.com) wrote:
> On 11/11/2011 05:20 PM, David Woodhouse wrote:
> >On Fri, 2011-11-11 at 17:08 -0800, Chris Wright wrote:
> >>On a slightly separate, but performance related note...have you ever
> >>tried using the hw queue? Currently we only have a sw queue, but the
> >>submission path for invalidations doesn't really queue (unless I missed
> >>it). It seems to pull from the software queue and submit/wait,
> >>submit/wait...Seems simple enough to submit the whole queue and then
> >>issue the wait.
> >I have a feeling we trigger errata if we do that — although if we're
> >only doing it for an emulated IOMMU that shouldn't be an issue.
> >
> What does the emulated IOMMU here? Does it mean the emulated IOMMU
> exposed to guest VM?
Yes. So for KVM, the IOMMU emulation would be in QEMU.
thanks,
-chris
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] intel-iommu: Default to non-coherent for domains unattached to iommus
2011-11-15 4:54 ` Chris Wright
@ 2011-11-15 5:55 ` Kai Huang
0 siblings, 0 replies; 22+ messages in thread
From: Kai Huang @ 2011-11-15 5:55 UTC (permalink / raw)
To: Chris Wright
Cc: David Woodhouse, Roland Dreier, linux-pci, iommu, linux-kernel
On Tue, Nov 15, 2011 at 12:54 PM, Chris Wright <chrisw@sous-sol.org> wrote:
> * cody (mail.kai.huang@gmail.com) wrote:
>> On 11/11/2011 05:20 PM, David Woodhouse wrote:
>> >On Fri, 2011-11-11 at 17:08 -0800, Chris Wright wrote:
>> >>On a slightly separate, but performance related note...have you ever
>> >>tried using the hw queue? Currently we only have a sw queue, but the
>> >>submission path for invalidations doesn't really queue (unless I missed
>> >>it). It seems to pull from the software queue and submit/wait,
>> >>submit/wait...Seems simple enough to submit the whole queue and then
>> >>issue the wait.
>> >I have a feeling we trigger errata if we do that — although if we're
>> >only doing it for an emulated IOMMU that shouldn't be an issue.
>> >
>> What does the emulated IOMMU here? Does it mean the emulated IOMMU
>> exposed to guest VM?
>
> Yes. So for KVM, the IOMMU emulation would be in QEMU.
>
What will emulated IOMMU be used for to the guest VM? To provide
another layer of address translation (in guest VM) that allows guest
VM to allocate large contiguous DVMA space for device? Does this mean
we need to keep something like *shadow page table* for the real IOMMU?
And this will be only used for direct IO, correct?
> thanks,
> -chris
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] intel-iommu: Default to non-coherent for domains unattached to iommus
2011-11-12 1:08 ` Chris Wright
2011-11-12 1:20 ` David Woodhouse
@ 2011-11-12 1:30 ` Roland Dreier
1 sibling, 0 replies; 22+ messages in thread
From: Roland Dreier @ 2011-11-12 1:30 UTC (permalink / raw)
To: Chris Wright
Cc: David Woodhouse, Alex Williamson, iommu, linux-pci, linux-kernel,
ddutile
On Fri, Nov 11, 2011 at 5:08 PM, Chris Wright <chrisw@sous-sol.org> wrote:
>> I would be more inclined to suspect that there's some
>> chipset buffering that we aren't correctly flushing (which might in
>> itself be a hardware issue, since the way to flush the cache is supposed
>> to be well-defined).
>
> Roland, have you tried switching BIOS to Coherency On and can do you
> ever see stale PTEs?
Yeah, the problems go away if I turn on coherency.
More details here:
https://lkml.org/lkml/2011/10/11/412
(and after that email, I let the box run long enough that I'm pretty
sure turning VT-d coherency on does fix things)
I'd love to fix this but not sure how much I can contribute beyond
tests (I don't see anything wrong in the VT-d cache flushing code
in the kernel, and I did look for a while).
- R.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] intel-iommu: Default to non-coherent for domains unattached to iommus
2011-11-11 22:49 Alex Williamson
2011-11-12 0:17 ` Chris Wright
2011-11-12 0:37 ` David Woodhouse
@ 2012-08-27 20:22 ` Craig Hada
2 siblings, 0 replies; 22+ messages in thread
From: Craig Hada @ 2012-08-27 20:22 UTC (permalink / raw)
To: linux-pci
Alex Williamson <alex.williamson <at> redhat.com> writes:
>
> domain_update_iommu_coherency() currently default to setting domains
> as coherent when the domain is not attached to any iommus. This
> allows for a window in domain_context_mapping_one() where such a
> domain can update context entries non-coherenty, and only after
> update the domain capability to clear iommu_coherency.
>
> This can be seen using KVM device assignment on VT-d systems that
> do not support coherency in the ecap register. When a device is
> added to a guest, a domain is created (iommu_coherency = 0), the
> device is attached, and ranges are mapped. If we then hot unplug
> the device, the coherency is updated and set to the default (1)
> since no iommus are attached to the domain. A subsequenct attach
> of a device makes use of the same dmar domain (now marked coherent)
> updates context entries with cohrency enabled, and only disables
> coherency as the last step in the process.
>
> To fix this, switch domain_update_iommu_coherency() to use the
> safer, non-coherent default for domains not attached to iommus.
>
> Signed-off-by: Alex Williamson <alex.williamson <at> redhat.com>
> Tested-by: Donald Dutile <ddutile <at> redhat.com>
> Acked-by: Donald Dutile <ddutile <at> redhat.com>
> ---
>
> drivers/iommu/intel-iommu.c | 4 +++-
> 1 files changed, 3 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index c0c7820..6b9d8c1 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -560,7 +560,9 @@ static void domain_update_iommu_coherency(struct
dmar_domain *domain)
> {
> int i;
>
> - domain->iommu_coherency = 1;
> + i = find_first_bit(&domain->iommu_bmp, g_num_of_iommus);
> +
> + domain->iommu_coherency = i < g_num_of_iommus ? 1 : 0;
>
> for_each_set_bit(i, &domain->iommu_bmp, g_num_of_iommus) {
> if (!ecap_coherent(g_iommus[i]->ecap)) {
>
>
Is there any resolution for this issue? I have run into it on two different
systems where the IOMMU is in non-coherent mode. The relevant output from the
dmesg follows.
DRHD: handling fault status reg 2
NMI: IOCK error (debug interrupt?) for reason 71 on CPU 0.
CPU 0
Modules linked in: be2net(+) ext4 mbcache jbd2 sd_mod crc_t10dif aesni_intel cry
ptd aes_x86_64 aes_generic hpsa dm_mirror dm_region_hash dm_log dm_mod [last unl
oaded: scsi_wait_scan]
Pid: 698, comm: kdump Not tainted 3.5.0+ #12 HP ProLiant BL460c G7
RIP: 0010:[<ffffffff8143781c>] [<ffffffff8143781c>] dmar_fault+0x13c/0x210
RSP: 0000:ffff88100b203e28 EFLAGS: 00000086
RAX: 00000000c0000002 RBX: ffff880fbadaf840 RCX: 0000000000000096
RDX: 00000000ba25b000 RSI: 000000000000010c RDI: 0000000000000046
RBP: ffff88100b203e88 R08: ffffffff81ccd160 R09: 000000000000047a
R10: 0000000000000200 R11: 0000000000080000 R12: 0000000000000000
R13: ffff880fbadaf85c R14: 0000000000000100 R15: ffffc90006050100
FS: 00007f2702847700(0000) GS:ffff88100b200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000ea3028 CR3: 0000000fb5ed1000 CR4: 00000000000007f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kdump (pid: 698, threadinfo ffff880fb7084000, task ffff880fb7154b30)
Stack:
0000000000000000 ffff880fb816f388 ffff880fb86b6100 ffff880fb832c010
0000000000000002 0000000000000096 0000000000000286 ffff880fbadac8c0
0000000000000040 0000000000000040 0000000000000000 0000000000000005
Call Trace:
<IRQ>
[<ffffffff810ddb1d>] handle_irq_event_percpu+0x6d/0x220
[<ffffffff810ddd12>] handle_irq_event+0x42/0x70
[<ffffffff810e13c9>] handle_edge_irq+0x69/0x120
[<ffffffff810153cc>] handle_irq+0x5c/0x150
[<ffffffff8105897b>] ? irq_enter+0x1b/0x80
[<ffffffff8152ffed>] do_IRQ+0x5d/0xe0
[<ffffffff815261ea>] common_interrupt+0x6a/0x6a
<EOI>
[<ffffffff8152e3a9>] ? system_call_fastpath+0x16/0x1b
Code: 00 48 89 c1 4d 01 f7 49 8d 77 0c 48 89 f0 48 03 03 8b 00 85 c0 0f 89 06 ff
ff ff 49 8d 57 08 48 03 13 44 8b 12 4c 03 3b 41 8b 17 <41> 8b 7f 04 48 c1 e7 20
41 89 d7 48 03 33 4e 8d 3c 3f ba 00 00
DMAR:[DMA Read] Request device [02:00.0] fault addr fba25b000
DMAR:[fault reason 02] Present bit in context entry is clear
be2net 0000:02:00.0: opcode 0-0 failed:status 71-0
be2net 0000:02:00.0: Emulex OneConnect 10Gbps NIC(be3) initialization failed
64bit 0000:02:00.1 uses identity mapping
be2net 0000:02:00.1: irq 75 for MSI/MSI-X
be2net 0000:02:00.1: Created only 1 receive queues
be2net 0000:02:00.1: eth0: Emulex OneConnect 10Gbps NIC(be3) port 1
64bit 0000:02:00.2 uses identity mapping
be2net 0000:02:00.2: irq 76 for MSI/MSI-X
be2net 0000:02:00.2: Created only 1 receive queues
be2net 0000:02:00.2: eth1: Emulex OneConnect 10Gbps NIC(be3) port 0
64bit 0000:02:00.3 uses identity mapping
be2net 0000:02:00.3: irq 77 for MSI/MSI-X
be2net 0000:02:00.3: Created only 1 receive queues
be2net 0000:02:00.3: eth2: Emulex OneConnect 10Gbps NIC(be3) port 1
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2012-09-18 14:34 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-09-12 19:55 [PATCH] intel-iommu: Default to non-coherent for domains unattached to iommus Donald Dutile
[not found] ` <1347479705-33972-1-git-send-email-ddutile-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-09-18 11:59 ` Joerg Roedel
[not found] ` <20120918115937.GB9939-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
2012-09-18 13:57 ` Don Dutile
[not found] ` <50587DE6.3000207-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-09-18 14:34 ` Joerg Roedel
-- strict thread matches above, loose matches on Subject: below --
2011-11-11 22:49 Alex Williamson
2011-11-12 0:17 ` Chris Wright
2011-11-12 0:37 ` David Woodhouse
2011-11-12 0:45 ` Chris Wright
2011-11-12 0:47 ` Chris Wright
2011-11-12 0:51 ` David Woodhouse
2011-11-12 0:58 ` Chris Wright
2011-11-12 1:03 ` David Woodhouse
2011-11-12 0:46 ` Roland Dreier
2011-11-12 0:51 ` Chris Wright
2011-11-12 0:55 ` David Woodhouse
2011-11-12 1:08 ` Chris Wright
2011-11-12 1:20 ` David Woodhouse
2011-11-15 7:12 ` cody
2011-11-15 4:54 ` Chris Wright
2011-11-15 5:55 ` Kai Huang
2011-11-12 1:30 ` Roland Dreier
2012-08-27 20:22 ` Craig Hada
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.