* [PATCH] AMD IOMMU: Fix an interrupt remapping issue
@ 2011-04-08 10:52 Wei Wang2
2011-04-08 11:26 ` Jan Beulich
0 siblings, 1 reply; 7+ messages in thread
From: Wei Wang2 @ 2011-04-08 10:52 UTC (permalink / raw)
To: Jan Beulich; +Cc: Huang2, Wei, Boris Ostrovsky, xen-devel@lists.xensource.com
Some devices can generate bogus interrupts if an IO-APIC RTE and the matching
IOMMU interrupt remapping entry are inconsistent during two adjacent 64-bit
IO-APIC RTE updates. For example, if the second operation updates the
destination bits in the RTE for a SATA device and unmasks it, the SATA device
may in some cases assert its IO-APIC pin immediately and generate an interrupt
using the new destination, while the IOMMU still translates it to the old
destination, which confuses dom0. To fix that, we sync the interrupt remapping
entry with the IO-APIC IRE on every 32-bit operation and forward IO-APIC RTE
updates only after the interrupt remapping table has been changed.
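For readers following along, the two 32-bit halves of an RTE can be pictured with a minimal sketch. It assumes the conventional IO-APIC layout (redirection table registers start at index 0x10, two 32-bit registers per pin, the mask in bit 16 of the lower half, the logical destination in the top byte of the upper half). All helper names are hypothetical and are not taken from the Xen source.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed conventional IO-APIC layout; names are illustrative only. */
#define RTE_REG_BASE 0x10u          /* first redirection table register */
#define RTE_MASK_BIT (1u << 16)     /* lower half: 1 = interrupt masked */

/* Register index of the lower/upper 32-bit half of the RTE for a pin. */
static inline unsigned int rte_reg_lo(unsigned int pin)
{
    return RTE_REG_BASE + 2 * pin;
}

static inline unsigned int rte_reg_hi(unsigned int pin)
{
    return rte_reg_lo(pin) + 1;
}

/* Lower-half index for any RTE register; same result as the patch's
 * "(reg & 1) ? reg - 1 : reg", because the base index is even. */
static inline unsigned int rte_reg_lo_of(unsigned int reg)
{
    assert(reg >= RTE_REG_BASE);    /* must be a redirection table register */
    return reg & ~1u;
}

/* Field accessors for the two halves. */
static inline uint8_t rte_vector(uint32_t lo) { return (uint8_t)(lo & 0xff); }
static inline int     rte_masked(uint32_t lo) { return (lo & RTE_MASK_BIT) != 0; }
static inline uint8_t rte_dest(uint32_t hi)   { return (uint8_t)(hi >> 24); }
```

The point of the sketch is that the mask bit and the destination live in different 32-bit registers, so a 64-bit RTE update is always two separate writes.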
Jan, this patch fixes the SATA device issue we observed (Bug #680824). Please
review it. Thanks!
Wei
--
Advanced Micro Devices GmbH
Registered office: Dornach, municipality of Aschheim,
district of Munich. Court of registration: Munich,
HRB No. 43632
WEEE Reg. No.: DE 12919551
Managing directors:
Alberto Bozzo, Andrew Bowd
* Re: [PATCH] AMD IOMMU: Fix an interrupt remapping issue
2011-04-08 10:52 [PATCH] AMD IOMMU: Fix an interrupt remapping issue Wei Wang2
@ 2011-04-08 11:26 ` Jan Beulich
0 siblings, 0 replies; 7+ messages in thread
From: Jan Beulich @ 2011-04-08 11:26 UTC (permalink / raw)
To: Wei Wang2; +Cc: Boris Ostrovsky, Wei Huang2, xen-devel@lists.xensource.com
>>> On 08.04.11 at 12:52, Wei Wang2 <wei.wang2@amd.com> wrote:
> Some device could generate bogus interrupts if an IO-APIC RTE and an iommu
> interrupt remapping entry are not consistent during 2 adjacent 64bits IO-APIC
> RTE updates. For example, if the 2nd operation updates destination bits in
> RTE for SATA device and unmask it, in some case, SATA device will assert
> ioapic pin to generate interrupt immediately using new destination but iommu
> could still translate it into the old destination, then dom0 would be
> confused. To fix that, we sync up interrupt remapping entry with IO-APIC IRE
> on every 32 bits operation and foward IOAPIC RTE updates after interrupt
> remapping table has been changed.
>
> Jan, This patch fixes SATA device issue we observed (Bug #680824), please
> review it. Thanks!
Sure - once you attach the actual patch ;-)
Jan
* [PATCH] AMD IOMMU: Fix an interrupt remapping issue
@ 2011-04-08 11:35 Wei Wang2
2011-04-08 13:43 ` Jan Beulich
0 siblings, 1 reply; 7+ messages in thread
From: Wei Wang2 @ 2011-04-08 11:35 UTC (permalink / raw)
To: Jan Beulich; +Cc: Huang2, Wei, Boris Ostrovsky, xen-devel@lists.xensource.com
[-- Attachment #1: Type: text/plain, Size: 984 bytes --]
(Sorry, I forgot to attach the patch.)
Some devices can generate bogus interrupts if an IO-APIC RTE and the matching
IOMMU interrupt remapping entry are inconsistent during two adjacent 64-bit
IO-APIC RTE updates. For example, if the second operation updates the
destination bits in the RTE for a SATA device and unmasks it, the SATA device
may in some cases assert its IO-APIC pin immediately and generate an interrupt
using the new destination, while the IOMMU still translates it to the old
destination, which confuses dom0. To fix that, we sync the interrupt remapping
entry with the IO-APIC IRE on every 32-bit operation and forward IO-APIC RTE
updates only after the interrupt remapping table has been changed.
Jan, this patch fixes the SATA device issue we observed (Bug #680824). Please
review it. Thanks!
Wei
[-- Attachment #2: fix_intremap.patch --]
[-- Type: text/x-diff, Size: 5502 bytes --]
# HG changeset patch
# User Wei Wang <wei.wang2@amd.com>
# Node ID ab2944070ca99790546b34fa04a80103d3e7464f
# Parent e5a750d1bf9bb021713c6721000e655a4054ebea
Some devices can generate bogus interrupts if an IO-APIC RTE and the matching
IOMMU interrupt remapping entry are inconsistent during two adjacent 64-bit
IO-APIC RTE updates. For example, if the second operation updates the
destination bits in the RTE for a SATA device and unmasks it, the SATA device
may in some cases assert its IO-APIC pin immediately and generate an interrupt
using the new destination, while the IOMMU still translates it to the old
destination, which confuses dom0. To fix that, we sync the interrupt remapping
entry with the IO-APIC IRE on every 32-bit operation and forward IO-APIC RTE
updates only after the interrupt remapping table has been changed.
Signed-off-by: Wei Wang <wei.wang2@amd.com>
diff -r e5a750d1bf9b -r ab2944070ca9 xen/drivers/passthrough/amd/iommu_intr.c
--- a/xen/drivers/passthrough/amd/iommu_intr.c Thu Apr 07 11:12:55 2011 +0100
+++ b/xen/drivers/passthrough/amd/iommu_intr.c Fri Apr 08 12:35:48 2011 +0200
@@ -118,7 +118,7 @@ static void update_intremap_entry_from_i
int bdf,
struct amd_iommu *iommu,
struct IO_APIC_route_entry *ioapic_rte,
- unsigned int rte_upper, unsigned int value)
+ unsigned int value)
{
unsigned long flags;
u32* entry;
@@ -130,28 +130,26 @@ static void update_intremap_entry_from_i
req_id = get_intremap_requestor_id(bdf);
lock = get_intremap_lock(req_id);
- /* only remap interrupt vector when lower 32 bits in ioapic ire changed */
- if ( likely(!rte_upper) )
- {
- delivery_mode = rte->delivery_mode;
- vector = rte->vector;
- dest_mode = rte->dest_mode;
- dest = rte->dest.logical.logical_dest;
-
- spin_lock_irqsave(lock, flags);
- offset = get_intremap_offset(vector, delivery_mode);
- entry = (u32*)get_intremap_entry(req_id, offset);
-
- update_intremap_entry(entry, vector, delivery_mode, dest_mode, dest);
- spin_unlock_irqrestore(lock, flags);
-
- if ( iommu->enabled )
- {
- spin_lock_irqsave(&iommu->lock, flags);
- invalidate_interrupt_table(iommu, req_id);
- flush_command_buffer(iommu);
- spin_unlock_irqrestore(&iommu->lock, flags);
- }
+
+ delivery_mode = rte->delivery_mode;
+ vector = rte->vector;
+ dest_mode = rte->dest_mode;
+ dest = rte->dest.logical.logical_dest;
+
+ spin_lock_irqsave(lock, flags);
+
+ offset = get_intremap_offset(vector, delivery_mode);
+ entry = (u32*)get_intremap_entry(req_id, offset);
+ update_intremap_entry(entry, vector, delivery_mode, dest_mode, dest);
+
+ spin_unlock_irqrestore(lock, flags);
+
+ if ( iommu->enabled )
+ {
+ spin_lock_irqsave(&iommu->lock, flags);
+ invalidate_interrupt_table(iommu, req_id);
+ flush_command_buffer(iommu);
+ spin_unlock_irqrestore(&iommu->lock, flags);
}
}
@@ -199,7 +197,8 @@ int __init amd_iommu_setup_ioapic_remapp
spin_lock_irqsave(lock, flags);
offset = get_intremap_offset(vector, delivery_mode);
entry = (u32*)get_intremap_entry(req_id, offset);
- update_intremap_entry(entry, vector, delivery_mode, dest_mode, dest);
+ update_intremap_entry(entry, vector,
+ delivery_mode, dest_mode, dest);
spin_unlock_irqrestore(lock, flags);
if ( iommu->enabled )
@@ -218,15 +217,12 @@ void amd_iommu_ioapic_update_ire(
unsigned int apic, unsigned int reg, unsigned int value)
{
struct IO_APIC_route_entry ioapic_rte = { 0 };
- unsigned int rte_upper = (reg & 1) ? 1 : 0;
+ unsigned int rte_lo;
int saved_mask, bdf;
struct amd_iommu *iommu;
- *IO_APIC_BASE(apic) = reg;
- *(IO_APIC_BASE(apic)+4) = value;
-
if ( !iommu_intremap )
- return;
+ goto done;
/* get device id of ioapic devices */
bdf = ioapic_bdf[IO_APIC_ID(apic)];
@@ -237,28 +233,34 @@ void amd_iommu_ioapic_update_ire(
bdf);
return;
}
- if ( rte_upper )
- return;
+
+ /* get lower 32 bits IO-APIC ire index */
+ rte_lo = (reg & 1) ? reg - 1 : reg;
/* read both lower and upper 32-bits of rte entry */
- *IO_APIC_BASE(apic) = reg;
+ *IO_APIC_BASE(apic) = rte_lo;
*(((u32 *)&ioapic_rte) + 0) = *(IO_APIC_BASE(apic)+4);
- *IO_APIC_BASE(apic) = reg + 1;
+ *IO_APIC_BASE(apic) = rte_lo + 1;
*(((u32 *)&ioapic_rte) + 1) = *(IO_APIC_BASE(apic)+4);
/* mask the interrupt while we change the intremap table */
saved_mask = ioapic_rte.mask;
ioapic_rte.mask = 1;
- *IO_APIC_BASE(apic) = reg;
+ *IO_APIC_BASE(apic) = rte_lo;
*(IO_APIC_BASE(apic)+4) = *(((int *)&ioapic_rte)+0);
ioapic_rte.mask = saved_mask;
- update_intremap_entry_from_ioapic(
- bdf, iommu, &ioapic_rte, rte_upper, value);
+ /* Update interrupt remapping entry */
+ update_intremap_entry_from_ioapic(bdf, iommu, &ioapic_rte, value);
/* unmask the interrupt after we have updated the intremap table */
+ *IO_APIC_BASE(apic) = rte_lo;
+ *(IO_APIC_BASE(apic)+4) = *(((u32 *)&ioapic_rte)+0);
+
+done:
+ /* Forward write access to IO-APIC */
*IO_APIC_BASE(apic) = reg;
- *(IO_APIC_BASE(apic)+4) = *(((u32 *)&ioapic_rte)+0);
+ *(IO_APIC_BASE(apic)+4) = value;
}
static void update_intremap_entry_from_msi_msg(
* Re: [PATCH] AMD IOMMU: Fix an interrupt remapping issue
2011-04-08 11:35 Wei Wang2
@ 2011-04-08 13:43 ` Jan Beulich
2011-04-08 14:26 ` Wei Wang2
0 siblings, 1 reply; 7+ messages in thread
From: Jan Beulich @ 2011-04-08 13:43 UTC (permalink / raw)
To: Wei Wang2; +Cc: Boris Ostrovsky, Wei Huang2, xen-devel@lists.xensource.com
>>> On 08.04.11 at 13:35, Wei Wang2 <wei.wang2@amd.com> wrote:
> Some device could generate bogus interrupts if an IO-APIC RTE and an iommu
> interrupt remapping entry are not consistent during 2 adjacent 64bits IO-APIC
> RTE updates. For example, if the 2nd operation updates destination bits in
> RTE for SATA device and unmask it, in some case, SATA device will assert
> ioapic pin to generate interrupt immediately using new destination but iommu
> could still translate it into the old destination, then dom0 would be
> confused. To fix that, we sync up interrupt remapping entry with IO-APIC IRE
> on every 32 bits operation and foward IOAPIC RTE updates after interrupt
> remapping table has been changed.
I don't think this is correct: Without the patch, the filling of ioapic_rte
takes into account the value already written. Now that you only write
the value at the end of the function, you should overwrite the
affected half with "value" immediately before calling
update_intremap_entry_from_ioapic(). (Without knowing which half
gets written, passing "value" to update_intremap_entry_from_ioapic()
is pointless, and indeed the function doesn't use that parameter.)
Eliminating the double write if reg == rte_lo would also seem desirable
(and in no case should you write back the old value after having called
update_intremap_entry_from_ioapic()).
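The suggestion above, overwriting the affected half with "value" before deriving the remapping entry, can be sketched as follows. This is a minimal illustration under stated assumptions, not the reworked patch: struct rte_image and rte_apply_write() are hypothetical names, and the odd/even register test mirrors the patch's own (reg & 1) check.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-memory image of one 64-bit RTE as two 32-bit halves. */
struct rte_image {
    uint32_t lo;   /* vector, delivery mode, dest mode, mask, ... */
    uint32_t hi;   /* destination */
};

/*
 * Return the image the RTE will hold once the pending write has reached
 * the IO-APIC: start from what was read back from the hardware and
 * overwrite only the half that "reg" selects with "value".
 */
static struct rte_image rte_apply_write(struct rte_image cur,
                                        unsigned int reg, uint32_t value)
{
    assert(reg >= 0x10);   /* assumed base of the redirection table */
    if (reg & 1)
        cur.hi = value;    /* odd index: upper half is being written */
    else
        cur.lo = value;    /* even index: lower half is being written */
    return cur;
}
```

The remapping entry would then be derived from the returned image rather than from the stale value read from the IO-APIC.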
Jan
* Re: [PATCH] AMD IOMMU: Fix an interrupt remapping issue
2011-04-08 13:43 ` Jan Beulich
@ 2011-04-08 14:26 ` Wei Wang2
2011-04-08 14:39 ` Jan Beulich
0 siblings, 1 reply; 7+ messages in thread
From: Wei Wang2 @ 2011-04-08 14:26 UTC (permalink / raw)
To: Jan Beulich; +Cc: Ostrovsky, Boris, Huang2, Wei, xen-devel@lists.xensource.com
On Friday 08 April 2011 15:43:57 Jan Beulich wrote:
> >>> On 08.04.11 at 13:35, Wei Wang2 <wei.wang2@amd.com> wrote:
> >
> > Some device could generate bogus interrupts if an IO-APIC RTE and an
> > iommu interrupt remapping entry are not consistent during 2 adjacent
> > 64bits IO-APIC RTE updates. For example, if the 2nd operation updates
> > destination bits in RTE for SATA device and unmask it, in some case, SATA
> > device will assert ioapic pin to generate interrupt immediately using new
> > destination but iommu could still translate it into the old destination,
> > then dom0 would be confused. To fix that, we sync up interrupt remapping
> > entry with IO-APIC IRE on every 32 bits operation and foward IOAPIC RTE
> > updates after interrupt remapping table has been changed.
>
> I don't think this is correct: Without the patch, the filling of ioapic_rte
> takes into account the value already written. Now that you only write
> the value at the end of the function, you should overwrite the
> affected half with "value" immediately before calling
> update_intremap_entry_from_ioapic().
Sorry, I don't quite understand your point. My thinking is that, no matter
whether dom0 tries to update the lower half or the upper half of the RTE, we
always update the interrupt table from the lower half. This keeps the IOMMU
table strictly identical to the RTE. The old code assumes that the lower and
upper halves of the RTE are always updated together, but this might not
always be true. If dom0 happens to update only the upper half and we don't
sync the IOMMU with it, then the destinations in the RTE and the IOMMU table
will differ.
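A toy model may make the stale-destination case concrete. It is only a sketch: struct toy_state and both write functions are hypothetical, the destination is assumed to sit in the top byte of the upper half, and the model deliberately ignores the ordering of the IO-APIC write relative to the table update.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: one RTE plus the destination held by the IOMMU. */
struct toy_state {
    uint32_t rte_lo, rte_hi;   /* the two 32-bit halves of one RTE */
    uint8_t  remap_dest;       /* destination in the remapping entry */
};

static void toy_store(struct toy_state *s, unsigned int reg, uint32_t value)
{
    assert(reg >= 0x10);       /* assumed base of the redirection table */
    if (reg & 1)
        s->rte_hi = value;
    else
        s->rte_lo = value;
}

/* Pre-patch behaviour as described in the thread: the remapping entry
 * is refreshed only when the lower half is written. */
static void write_sync_on_lower_only(struct toy_state *s,
                                     unsigned int reg, uint32_t value)
{
    toy_store(s, reg, value);
    if (!(reg & 1))
        s->remap_dest = (uint8_t)(s->rte_hi >> 24);
}

/* Behaviour the patch aims for: refresh on every 32-bit write. */
static void write_sync_always(struct toy_state *s,
                              unsigned int reg, uint32_t value)
{
    toy_store(s, reg, value);
    s->remap_dest = (uint8_t)(s->rte_hi >> 24);
}
```

With the first variant, a write that touches only the upper half leaves remap_dest at its old value until the lower half is written again. The second variant keeps the two in step.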
> (Without knowing which half
> gets written, passing "value" to update_intremap_entry_from_ioapic()
> is pointless, and indeed the function doesn't use that parameter.)
True, I will remove this parameter from update_intremap_entry_from_ioapic();
ioapic_rte should have all the information.
> Eliminating the double write if reg == rte_lo would also seem desirable
> (and in no case should you write back the old value after having called
> update_intremap_entry_from_ioapic()).
It is not a write-back; it just finishes the IO-APIC RTE write. After updating
the interrupt remapping table we still have to update the RTE. It is just a
copy of __io_apic_write (maybe I should just call it). The old code updates the
IO-APIC earlier than the interrupt remapping table, and the SATA device might
generate an interrupt right after that, which is not expected.
Thanks,
Wei
> Jan
* Re: [PATCH] AMD IOMMU: Fix an interrupt remapping issue
2011-04-08 14:26 ` Wei Wang2
@ 2011-04-08 14:39 ` Jan Beulich
2011-04-08 15:06 ` Wei Wang2
0 siblings, 1 reply; 7+ messages in thread
From: Jan Beulich @ 2011-04-08 14:39 UTC (permalink / raw)
To: Wei Wang2; +Cc: Boris Ostrovsky, Wei Huang2, xen-devel@lists.xensource.com
>>> On 08.04.11 at 16:26, Wei Wang2 <wei.wang2@amd.com> wrote:
> On Friday 08 April 2011 15:43:57 Jan Beulich wrote:
>> >>> On 08.04.11 at 13:35, Wei Wang2 <wei.wang2@amd.com> wrote:
>> >
>> > Some device could generate bogus interrupts if an IO-APIC RTE and an
>> > iommu interrupt remapping entry are not consistent during 2 adjacent
>> > 64bits IO-APIC RTE updates. For example, if the 2nd operation updates
>> > destination bits in RTE for SATA device and unmask it, in some case, SATA
>> > device will assert ioapic pin to generate interrupt immediately using new
>> > destination but iommu could still translate it into the old destination,
>> > then dom0 would be confused. To fix that, we sync up interrupt remapping
>> > entry with IO-APIC IRE on every 32 bits operation and foward IOAPIC RTE
>> > updates after interrupt remapping table has been changed.
>>
>> I don't think this is correct: Without the patch, the filling of ioapic_rte
>> takes into account the value already written. Now that you only write
>> the value at the end of the function, you should overwrite the
>> affected half with "value" immediately before calling
>> update_intremap_entry_from_ioapic().
> Sorry, not quite understand your point. My thought is, no matter dom0 tried to
> updates lower half or upper half of RTE, we always updates interrupt table
> from the lower half. This will keep iommu table strictly identically to RTE.
> The old code has an assumption that both lower half and upper of RTE should
> be updated together. But this might not be always true. If by incident, dom0
> only updates the upper half and we don't sync iommu with it, then the
> destination in RTE and iommu table will be different.
No, that's not my point. The problem I'm seeing is that you pass the
old value (as read from the IO-APIC) to
update_intremap_entry_from_ioapic(), but the function certainly
should use the to-be-written one. Previously this was implicit because
the IO-APIC register write happened first.
>> Eliminating the double write if reg == rte_lo would also seem desirable
>> (and in no case should you write back the old value after having called
>> update_intremap_entry_from_ioapic()).
>
> It not a write back, It just finishes IO-APIC RTE writes. After updating
> interrupt remapping table we still have to update RTE. It is just a copy of
> __io_apic_write (maybe I should just call it). Old code updates ioapic
> earlier than interrupt remapping table and sata device might generate
> interrupt right after this, which is not expected.
No. If reg == rte_lo, you write that IO-APIC register twice, which is
pointless. With the other problem unaddressed, you actually first write
back the old value (with the mask bit restored), which gets IO-APIC
and remapping tables out of sync for a brief period of time (which is
a problem by itself), then write the new value. With the other problem
addressed, you would simply write the new value twice, which is
wasteful given that these writes are uncached.
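One possible ordering that avoids both problems can be pictured with a toy model. This is a sketch under stated assumptions, not the reworked patch (which does not appear in this thread): struct toy_ioapic, struct toy_remap, toy_write() and update_rte() are hypothetical, the register file stands in for the uncached IO-APIC window, and the mask is assumed to be bit 16 of the lower half.

```c
#include <assert.h>
#include <stdint.h>

#define RTE_MASK_BIT (1u << 16)

/* Toy stand-ins for the hardware; all names are hypothetical. */
struct toy_ioapic {
    uint32_t     regs[0x40];   /* register file indexed by register number */
    unsigned int writes;       /* number of register writes issued */
};

struct toy_remap {
    uint8_t vector, dest;      /* fields mirrored in the remapping entry */
};

static void toy_write(struct toy_ioapic *a, unsigned int reg, uint32_t v)
{
    a->regs[reg] = v;
    a->writes++;
}

static void update_rte(struct toy_ioapic *a, struct toy_remap *r,
                       unsigned int reg, uint32_t value)
{
    assert(reg >= 0x10 && reg < 0x40);

    unsigned int lo_reg = reg & ~1u;
    uint32_t old_lo = a->regs[lo_reg];
    /* Image the RTE will hold once the pending write has landed. */
    uint32_t new_lo = (reg & 1) ? old_lo : value;
    uint32_t new_hi = (reg & 1) ? value : a->regs[lo_reg + 1];

    /* 1. Mask the pin, unless it is already masked. */
    if (!(old_lo & RTE_MASK_BIT))
        toy_write(a, lo_reg, old_lo | RTE_MASK_BIT);

    /* 2. Program the remapping entry from the to-be-written image. */
    r->vector = (uint8_t)(new_lo & 0xff);
    r->dest   = (uint8_t)(new_hi >> 24);

    /* 3. Forward the caller's write exactly once. */
    toy_write(a, reg, value);

    /* 4. Upper-half write: the lower half was masked only temporarily,
     *    so restore it. It is unchanged by this update. */
    if ((reg & 1) && !(old_lo & RTE_MASK_BIT))
        toy_write(a, lo_reg, old_lo);
}
```

In this model a lower-half write never puts the old value back: the new value is written once, after the remapping entry has been updated, and that single write also carries the caller's mask bit.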
Jan
* Re: [PATCH] AMD IOMMU: Fix an interrupt remapping issue
2011-04-08 14:39 ` Jan Beulich
@ 2011-04-08 15:06 ` Wei Wang2
0 siblings, 0 replies; 7+ messages in thread
From: Wei Wang2 @ 2011-04-08 15:06 UTC (permalink / raw)
To: Jan Beulich; +Cc: Ostrovsky, Boris, Huang2, Wei, xen-devel@lists.xensource.com
On Friday 08 April 2011 16:39:13 Jan Beulich wrote:
> >>> On 08.04.11 at 16:26, Wei Wang2 <wei.wang2@amd.com> wrote:
> >
> > On Friday 08 April 2011 15:43:57 Jan Beulich wrote:
> >> >>> On 08.04.11 at 13:35, Wei Wang2 <wei.wang2@amd.com> wrote:
> >> >
> >> > Some device could generate bogus interrupts if an IO-APIC RTE and an
> >> > iommu interrupt remapping entry are not consistent during 2 adjacent
> >> > 64bits IO-APIC RTE updates. For example, if the 2nd operation updates
> >> > destination bits in RTE for SATA device and unmask it, in some case,
> >> > SATA device will assert ioapic pin to generate interrupt immediately
> >> > using new destination but iommu could still translate it into the old
> >> > destination, then dom0 would be confused. To fix that, we sync up
> >> > interrupt remapping entry with IO-APIC IRE on every 32 bits operation
> >> > and foward IOAPIC RTE updates after interrupt remapping table has been
> >> > changed.
> >>
> >> I don't think this is correct: Without the patch, the filling of
> >> ioapic_rte takes into account the value already written. Now that you
> > >> only write the value at the end of the function, you should overwrite the
> > >> affected half with "value" immediately before calling
> >> update_intremap_entry_from_ioapic().
> >
> > > Sorry, not quite understand your point. My thought is, no matter dom0
> > > tried to updates lower half or upper half of RTE, we always updates interrupt
> > table from the lower half. This will keep iommu table strictly
> > identically to RTE. The old code has an assumption that both lower half
> > and upper of RTE should be updated together. But this might not be always
> > true. If by incident, dom0 only updates the upper half and we don't sync
> > iommu with it, then the destination in RTE and iommu table will be
> > different.
>
> No, that's not my point. The problem I'm seeing is that you pass the
> old value (as read from the IO-APIC) to
> update_intremap_entry_from_ioapic(), but the function certainly
> should use the to-be-written one. Previously this was implicit because
> the IO-APIC register write happened first.
OK, got it. That is definitely problematic. I will fix it.
> >> Eliminating the double write if reg == rte_lo would also seem desirable
> >> (and in no case should you write back the old value after having called
> >> update_intremap_entry_from_ioapic()).
> >
> > It not a write back, It just finishes IO-APIC RTE writes. After updating
> > interrupt remapping table we still have to update RTE. It is just a copy
> > of __io_apic_write (maybe I should just call it). Old code updates ioapic
> > earlier than interrupt remapping table and sata device might generate
> > interrupt right after this, which is not expected.
>
> No. If reg == rte_lo, you write that IO-APIC register twice, which is
> pointless. With the other problem unaddressed, you actually first write
> back the old value (with the mask bit restored), which gets IO-APIC
> and remapping tables out of sync for a brief period of time (which is
> a problem by itself), then write the new value. With the other problem
> addressed, you would simply write the new value twice, which is
> wasteful given that these writes are uncached.
True. I will rework the patch to try to eliminate this.
Thanks
Wei
> Jan