From mboxrd@z Thu Jan 1 00:00:00 1970
From: Wei Wang2
Subject: [PATCH] AMD IOMMU: Fix an interrupt remapping issue (v2)
Date: Fri, 8 Apr 2011 18:52:20 +0200
Message-ID: <201104081852.20738.wei.wang2@amd.com>
References: <201104081335.36718.wei.wang2@amd.com> <4D9F3A31020000780003A9FA@vpn.id2.novell.com> <201104081706.16445.wei.wang2@amd.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="Boundary-00=_E1znNj3k8GyaZP9"
Return-path:
In-Reply-To: <201104081706.16445.wei.wang2@amd.com>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: xen-devel@lists.xensource.com
Cc: "Ostrovsky, Boris" , "Huang2, Wei" , Jan Beulich
List-Id: xen-devel@lists.xenproject.org

--Boundary-00=_E1znNj3k8GyaZP9
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

Jan,

How does this one look to you?

Thanks,
Wei

Signed-off-by: Wei Wang

-- 
Advanced Micro Devices GmbH
Sitz: Dornach, Gemeinde Aschheim,
Landkreis München
Registergericht München,
HRB Nr. 43632
WEEE-Reg-Nr: DE 12919551
Geschäftsführer: Alberto Bozzo, Andrew Bowd


On Friday 08 April 2011 17:06:16 Wei Wang2 wrote:
> On Friday 08 April 2011 16:39:13 Jan Beulich wrote:
> > >>> On 08.04.11 at 16:26, Wei Wang2 wrote:
> > >
> > > On Friday 08 April 2011 15:43:57 Jan Beulich wrote:
> > >> >>> On 08.04.11 at 13:35, Wei Wang2 wrote:
> > >> >
> > >> > Some devices could generate bogus interrupts if an IO-APIC RTE and
> > >> > an IOMMU interrupt remapping entry are not consistent during two
> > >> > adjacent 64-bit IO-APIC RTE updates.
> > >> > For example, if the second operation updates the destination bits
> > >> > in the RTE for a SATA device and unmasks it, in some cases the SATA
> > >> > device will assert the IO-APIC pin and generate an interrupt
> > >> > immediately using the new destination, while the IOMMU could still
> > >> > translate it into the old destination, and dom0 would be confused.
> > >> > To fix that, we sync the interrupt remapping entry with the IO-APIC
> > >> > IRE on every 32-bit operation and forward IO-APIC RTE updates after
> > >> > the interrupt remapping table has been changed.
> > >>
> > >> I don't think this is correct: without the patch, the filling of
> > >> ioapic_rte takes into account the value already written. Now that you
> > >> only write the value at the end of the function, you should overwrite
> > >> the affected half with "value" immediately before calling
> > >> update_intremap_entry_from_ioapic().
> > >
> > > Sorry, I don't quite understand your point. My thought is that no
> > > matter whether dom0 tries to update the lower half or the upper half
> > > of the RTE, we always update the interrupt remapping table from the
> > > lower half. This keeps the IOMMU table strictly identical to the RTE.
> > > The old code assumes that both halves of the RTE are updated
> > > together, but this might not always be true. If dom0 happens to
> > > update only the upper half and we don't sync the IOMMU with it, the
> > > destination in the RTE and in the IOMMU table will differ.
> >
> > No, that's not my point. The problem I'm seeing is that you pass the
> > old value (as read from the IO-APIC) to
> > update_intremap_entry_from_ioapic(), but the function certainly
> > should use the to-be-written one. Previously this was implicit because
> > the IO-APIC register write happened first.
>
> OK, got it. That is definitely problematic. Will fix it.
> >
> > >> Eliminating the double write if reg == rte_lo would also seem
> > >> desirable (and in no case should you write back the old value after
> > >> having called update_intremap_entry_from_ioapic()).
> > >
> > > It is not a write-back; it just finishes the IO-APIC RTE write. After
> > > updating the interrupt remapping table we still have to update the
> > > RTE. It is just a copy of __io_apic_write (maybe I should just call
> > > it). The old code updates the IO-APIC earlier than the interrupt
> > > remapping table, and the SATA device might generate an interrupt right
> > > after this, which is not expected.
> >
> > No. If reg == rte_lo, you write that IO-APIC register twice, which is
> > pointless. With the other problem unaddressed, you actually first write
> > back the old value (with the mask bit restored), which gets the IO-APIC
> > and remapping tables out of sync for a brief period of time (which is
> > a problem by itself), then write the new value. With the other problem
> > addressed, you would simply write the new value twice, which is
> > wasteful given that these writes are uncached.
>
> True. I will rework the patch to eliminate this.
> Thanks
> Wei
>
> > Jan
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

--Boundary-00=_E1znNj3k8GyaZP9
Content-Type: text/x-diff; charset="iso-8859-1"; name="fix_intremap_v2.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="fix_intremap_v2.patch"
Content-Description: fix_intremap_v2.patch

diff -r e5a750d1bf9b xen/drivers/passthrough/amd/iommu_intr.c
--- a/xen/drivers/passthrough/amd/iommu_intr.c	Thu Apr 07 11:12:55 2011 +0100
+++ b/xen/drivers/passthrough/amd/iommu_intr.c	Fri Apr 08 18:49:18 2011 +0200
@@ -117,8 +117,7 @@ static void update_intremap_entry_from_i
 static void update_intremap_entry_from_ioapic(
     int bdf,
     struct amd_iommu *iommu,
-    struct IO_APIC_route_entry *ioapic_rte,
-    unsigned int rte_upper, unsigned int value)
+    struct IO_APIC_route_entry *ioapic_rte)
 {
     unsigned long flags;
     u32* entry;
@@ -130,28 +129,26 @@ static void update_intremap_entry_from_i
     req_id = get_intremap_requestor_id(bdf);
     lock = get_intremap_lock(req_id);
 
-    /* only remap interrupt vector when lower 32 bits in ioapic ire changed */
-    if ( likely(!rte_upper) )
-    {
-        delivery_mode = rte->delivery_mode;
-        vector = rte->vector;
-        dest_mode = rte->dest_mode;
-        dest = rte->dest.logical.logical_dest;
-
-        spin_lock_irqsave(lock, flags);
-        offset = get_intremap_offset(vector, delivery_mode);
-        entry = (u32*)get_intremap_entry(req_id, offset);
-
-        update_intremap_entry(entry, vector, delivery_mode, dest_mode, dest);
-        spin_unlock_irqrestore(lock, flags);
-
-        if ( iommu->enabled )
-        {
-            spin_lock_irqsave(&iommu->lock, flags);
-            invalidate_interrupt_table(iommu, req_id);
-            flush_command_buffer(iommu);
-            spin_unlock_irqrestore(&iommu->lock, flags);
-        }
+
+    delivery_mode = rte->delivery_mode;
+    vector = rte->vector;
+    dest_mode = rte->dest_mode;
+    dest = rte->dest.logical.logical_dest;
+
+    spin_lock_irqsave(lock, flags);
+
+    offset = get_intremap_offset(vector, delivery_mode);
+    entry = (u32*)get_intremap_entry(req_id, offset);
+    update_intremap_entry(entry, vector, delivery_mode, dest_mode, dest);
+
+    spin_unlock_irqrestore(lock, flags);
+
+    if ( iommu->enabled )
+    {
+        spin_lock_irqsave(&iommu->lock, flags);
+        invalidate_interrupt_table(iommu, req_id);
+        flush_command_buffer(iommu);
+        spin_unlock_irqrestore(&iommu->lock, flags);
     }
 }
 
@@ -199,7 +196,8 @@ int __init amd_iommu_setup_ioapic_remapp
             spin_lock_irqsave(lock, flags);
             offset = get_intremap_offset(vector, delivery_mode);
             entry = (u32*)get_intremap_entry(req_id, offset);
-            update_intremap_entry(entry, vector, delivery_mode, dest_mode, dest);
+            update_intremap_entry(entry, vector,
+                                  delivery_mode, dest_mode, dest);
             spin_unlock_irqrestore(lock, flags);
 
             if ( iommu->enabled )
@@ -217,16 +215,14 @@ void amd_iommu_ioapic_update_ire(
 void amd_iommu_ioapic_update_ire(
     unsigned int apic, unsigned int reg, unsigned int value)
 {
-    struct IO_APIC_route_entry ioapic_rte = { 0 };
-    unsigned int rte_upper = (reg & 1) ? 1 : 0;
+    struct IO_APIC_route_entry old_rte = { 0 };
+    struct IO_APIC_route_entry new_rte = { 0 };
+    unsigned int rte_lo = (reg & 1) ? reg - 1 : reg;
     int saved_mask, bdf;
     struct amd_iommu *iommu;
 
-    *IO_APIC_BASE(apic) = reg;
-    *(IO_APIC_BASE(apic)+4) = value;
-
     if ( !iommu_intremap )
-        return;
+        goto done;
 
     /* get device id of ioapic devices */
     bdf = ioapic_bdf[IO_APIC_ID(apic)];
@@ -235,30 +231,47 @@ void amd_iommu_ioapic_update_ire(
     {
         AMD_IOMMU_DEBUG("Fail to find iommu for ioapic device id = 0x%x\n",
                         bdf);
-        return;
-    }
-    if ( rte_upper )
-        return;
-
-    /* read both lower and upper 32-bits of rte entry */
-    *IO_APIC_BASE(apic) = reg;
-    *(((u32 *)&ioapic_rte) + 0) = *(IO_APIC_BASE(apic)+4);
-    *IO_APIC_BASE(apic) = reg + 1;
-    *(((u32 *)&ioapic_rte) + 1) = *(IO_APIC_BASE(apic)+4);
+        goto done;
+    }
+
+    /* Save io-apic rte lower 32 bits */
+    *IO_APIC_BASE(apic) = rte_lo;
+    *((u32 *)&old_rte) = *(IO_APIC_BASE(apic) + 4);
+    saved_mask = old_rte.mask;
+
+    if ( reg == rte_lo )
+    {
+        *((u32 *)&new_rte) = value;
+        /* read upper 32 bits from io-apic rte */
+        *IO_APIC_BASE(apic) = reg + 1;
+        *(((u32 *)&new_rte) + 1) = *(IO_APIC_BASE(apic) + 4);
+    }
+    else
+    {
+        *((u32 *)&new_rte) = *((u32 *)&old_rte);
+        *(((u32 *)&new_rte) + 1) = value;
+    }
 
     /* mask the interrupt while we change the intremap table */
-    saved_mask = ioapic_rte.mask;
-    ioapic_rte.mask = 1;
-    *IO_APIC_BASE(apic) = reg;
-    *(IO_APIC_BASE(apic)+4) = *(((int *)&ioapic_rte)+0);
-    ioapic_rte.mask = saved_mask;
-
-    update_intremap_entry_from_ioapic(
-        bdf, iommu, &ioapic_rte, rte_upper, value);
+    old_rte.mask = 1;
+    *IO_APIC_BASE(apic) = rte_lo;
+    *(IO_APIC_BASE(apic) + 4) = *((u32 *)&old_rte);
+
+    /* Update interrupt remapping entry */
+    update_intremap_entry_from_ioapic(bdf, iommu, &new_rte);
+
+    /* Update IO-APIC directly to avoid double writes */
+    if ( reg == rte_lo )
+        goto done;
 
     /* unmask the interrupt after we have updated the intremap table */
-    *IO_APIC_BASE(apic) = reg;
-    *(IO_APIC_BASE(apic)+4) = *(((u32 *)&ioapic_rte)+0);
+    old_rte.mask = saved_mask;
+    *IO_APIC_BASE(apic) = rte_lo;
+    *(IO_APIC_BASE(apic) + 4) = *((u32 *)&old_rte);
+
+done:
+    /* Forward write access to IO-APIC */
+    __io_apic_write(apic, reg, value);
 }
 
 static void update_intremap_entry_from_msi_msg(

--Boundary-00=_E1znNj3k8GyaZP9--
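
For illustration, here is a minimal user-space C sketch of the write ordering the v2 patch implements in amd_iommu_ioapic_update_ire(). It is not Xen code: the simulated register file (ioapic_regs[]), the ioapic_read()/ioapic_write() helpers, the write counter, the remap_while_masked flag and the two-word "remapping entry" (irte_lo/irte_hi) are hypothetical stand-ins for the IO-APIC MMIO window and the IOMMU interrupt remapping table. Only the RTE layout follows the IO-APIC register map: the low half of pin N is register 0x10 + 2*N with the mask in bit 16, and the high half is register 0x11 + 2*N, carrying the destination in bits 24-31.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space model of one IO-APIC, not Xen code.
 * Pin N: low half at register 0x10 + 2*N (mask = bit 16),
 *        high half at register 0x11 + 2*N (destination = bits 24-31). */
#define NR_REGS      0x40u
#define RTE_MASK_BIT (1u << 16)

static uint32_t ioapic_regs[NR_REGS]; /* simulated IO-APIC registers */
static unsigned int ioapic_writes;    /* number of register writes issued */

/* Hypothetical stand-in for the IOMMU interrupt remapping entry. */
static uint32_t irte_lo, irte_hi;
static int remap_while_masked;        /* was the pin masked during the update? */

static uint32_t ioapic_read(unsigned int reg)
{
    assert(reg < NR_REGS);
    return ioapic_regs[reg];
}

static void ioapic_write(unsigned int reg, uint32_t val)
{
    assert(reg < NR_REGS);
    ioapic_regs[reg] = val;
    ioapic_writes++;
}

static void update_remap_entry(unsigned int rte_lo, uint32_t lo, uint32_t hi)
{
    remap_while_masked = (ioapic_read(rte_lo) & RTE_MASK_BIT) != 0;
    irte_lo = lo;
    irte_hi = hi;
}

/* Same ordering as the v2 amd_iommu_ioapic_update_ire(). */
static void update_ire(unsigned int reg, uint32_t value)
{
    unsigned int rte_lo = (reg & 1) ? reg - 1 : reg;
    uint32_t old_lo = ioapic_read(rte_lo);
    uint32_t new_lo, new_hi;

    /* Build the RTE as it will look after this write: the half being
     * written comes from 'value', the other half from the IO-APIC. */
    if ( reg == rte_lo )
    {
        new_lo = value;
        new_hi = ioapic_read(rte_lo + 1);
    }
    else
    {
        new_lo = old_lo;
        new_hi = value;
    }

    /* Mask the pin while the remapping entry changes. */
    ioapic_write(rte_lo, old_lo | RTE_MASK_BIT);
    update_remap_entry(rte_lo, new_lo, new_hi);

    /* For an upper-half write, restore the low half (and its mask bit).
     * For a lower-half write, the forwarded write below does that, so
     * skipping it avoids writing the same register twice. */
    if ( reg != rte_lo )
        ioapic_write(rte_lo, old_lo);

    /* Forward the guest's write only after the remapping entry is updated. */
    ioapic_write(reg, value);
}
```

With pin 0 holding vector 0x30 and destination 1, calling update_ire(0x11, 0x02000000) updates the simulated remapping entry to the new destination while the pin is masked and costs three register writes (mask, restore low half, forwarded write). A write to register 0x10 costs two, because the forwarded write itself sets the final mask state; that is the double write Jan asked to eliminate.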