From: Wei Wang2 <wei.wang2@amd.com>
To: xen-devel@lists.xensource.com
Cc: "Ostrovsky, Boris" <Boris.Ostrovsky@amd.com>,
	"Huang2, Wei" <Wei.Huang2@amd.com>,
	Jan Beulich <JBeulich@novell.com>
Subject: [PATCH] AMD IOMMU: Fix an interrupt remapping issue (v2)
Date: Fri, 8 Apr 2011 18:52:20 +0200
Message-ID: <201104081852.20738.wei.wang2@amd.com>
In-Reply-To: <201104081706.16445.wei.wang2@amd.com>

[-- Attachment #1: Type: text/plain, Size: 4009 bytes --]

Jan, how does this one look to you?
Thanks,
Wei

Signed-off-by: Wei Wang <wei.wang2@amd.com>
--
Advanced Micro Devices GmbH
Registered office: Dornach, municipality of Aschheim,
district of München; register court: München,
HRB No. 43632
WEEE-Reg-Nr: DE 12919551
Managing directors:
Alberto Bozzo, Andrew Bowd
On Friday 08 April 2011 17:06:16 Wei Wang2 wrote:
> On Friday 08 April 2011 16:39:13 Jan Beulich wrote:
> > >>> On 08.04.11 at 16:26, Wei Wang2 <wei.wang2@amd.com> wrote:
> > >
> > > On Friday 08 April 2011 15:43:57 Jan Beulich wrote:
> > >> >>> On 08.04.11 at 13:35, Wei Wang2 <wei.wang2@amd.com> wrote:
> > >> >
> > >> > Some devices could generate bogus interrupts if an IO-APIC RTE and an
> > >> > iommu interrupt remapping entry are not consistent during 2 adjacent
> > >> > 64-bit IO-APIC RTE updates. For example, if the 2nd operation
> > >> > updates the destination bits in the RTE for a SATA device and unmasks
> > >> > it, in some cases the SATA device will assert its ioapic pin to
> > >> > generate an interrupt immediately using the new destination, but the
> > >> > iommu could still translate it into the old destination, and dom0
> > >> > would then be confused. To fix that, we sync up the interrupt
> > >> > remapping entry with the IO-APIC RTE on every 32-bit operation and
> > >> > forward IO-APIC RTE updates after the interrupt remapping table has
> > >> > been changed.
> > >>
> > >> I don't think this is correct: Without the patch, the filling of
> > >> ioapic_rte takes into account the value already written. Now that you
> > >> only write the value at the end of the function, you should overwrite
> > >> the affected half with "value" immediately before calling
> > >> update_intremap_entry_from_ioapic().
> > >
> > > Sorry, I don't quite understand your point. My thought is: no matter
> > > whether dom0 tries to update the lower half or the upper half of the
> > > RTE, we always update the interrupt table from the lower half. This
> > > will keep the iommu table strictly identical to the RTE. The old code
> > > assumes that the lower and upper halves of the RTE are always updated
> > > together, but this might not always be true. If, by chance, dom0
> > > only updates the upper half and we don't sync the iommu with it, then
> > > the destination in the RTE and in the iommu table will be different.
> >
> > No, that's not my point. The problem I'm seeing is that you pass the
> > old value (as read from the IO-APIC) to
> > update_intremap_entry_from_ioapic(), but the function certainly
> > should use the to-be-written one. Previously this was implicit because
> > the IO-APIC register write happened first.
>
> OK, got it. That is definitely problematic. Will fix it.
>
> > >> Eliminating the double write if reg == rte_lo would also seem
> > >> desirable (and in no case should you write back the old value after
> > >> having called update_intremap_entry_from_ioapic()).
> > >
> > > It is not a write-back; it just finishes the IO-APIC RTE write. After
> > > updating the interrupt remapping table we still have to update the
> > > RTE. It is just a copy of __io_apic_write (maybe I should just call
> > > it). The old code updates the ioapic earlier than the interrupt
> > > remapping table, and the SATA device might generate an interrupt right
> > > after this, which is not expected.
> >
> > No. If reg == rte_lo, you write that IO-APIC register twice, which is
> > pointless. With the other problem unaddressed, you actually first write
> > back the old value (with the mask bit restored), which gets IO-APIC
> > and remapping tables out of sync for a brief period of time (which is
> > a problem by itself), then write the new value. With the other problem
> > addressed, you would simply write the new value twice, which is
> > wasteful given that these writes are uncached.
>
> True. I will rework the patch to try to eliminate this.
> Thanks
> Wei
>
> > Jan
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
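
The two review points above (derive the remapping entry from the
to-be-written RTE rather than the old one, and keep the pin masked while
the two tables differ) can be sketched on a simulated IO-APIC. This is
only an illustration under stated assumptions, not the code in the
attached patch: struct sim, sim_write() and sim_update_rte() are
hypothetical names that do not exist in Xen, the remapping entry is
modelled as a plain 64-bit copy of the RTE, and the upper half is written
before the mask bit is restored, which is one possible ordering chosen
here and differs from the sequence in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define RTE_MASK_BIT (1u << 16)   /* mask bit in the low half of an RTE */

/* Hypothetical simulated state; none of these names exist in Xen. */
struct sim {
    uint32_t ioapic[2];      /* [0] = low half, [1] = high half of one RTE */
    uint64_t remap_entry;    /* stand-in for the IOMMU remapping entry */
    unsigned int lo_writes;  /* number of writes issued to the low half */
};

/* Simulated uncached register write; counts writes to the low half. */
static void sim_write(struct sim *s, unsigned int half, uint32_t v)
{
    s->ioapic[half] = v;
    if ( half == 0 )
        s->lo_writes++;
}

/* Handle a 32-bit write of `value` to one half (0 = low, 1 = high). */
static void sim_update_rte(struct sim *s, unsigned int half, uint32_t value)
{
    assert(half <= 1);

    uint32_t old_lo = s->ioapic[0];
    /* Compose the RTE as it will look once `value` has been written. */
    uint32_t new_lo = (half == 0) ? value : old_lo;
    uint32_t new_hi = (half == 1) ? value : s->ioapic[1];

    /* 1. Mask the pin so nothing fires while the two tables differ. */
    sim_write(s, 0, old_lo | RTE_MASK_BIT);

    /* 2. Program the remapping entry from the to-be-written RTE. */
    s->remap_entry = ((uint64_t)new_hi << 32) | new_lo;

    /* 3. Forward the write; the caller's mask bit takes effect last. */
    if ( half == 1 )
    {
        sim_write(s, 1, value);
        sim_write(s, 0, old_lo);   /* restore the original low half */
    }
    else
        sim_write(s, 0, value);    /* new low half carries its own mask bit */
}
```

In this sketch each update costs two writes to the low half (one to mask,
one to set the final value), and the new low value is never written twice.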



[-- Attachment #2: fix_intremap_v2.patch --]
[-- Type: text/x-diff, Size: 5541 bytes --]

diff -r e5a750d1bf9b xen/drivers/passthrough/amd/iommu_intr.c
--- a/xen/drivers/passthrough/amd/iommu_intr.c	Thu Apr 07 11:12:55 2011 +0100
+++ b/xen/drivers/passthrough/amd/iommu_intr.c	Fri Apr 08 18:49:18 2011 +0200
@@ -117,8 +117,7 @@ static void update_intremap_entry_from_i
 static void update_intremap_entry_from_ioapic(
     int bdf,
     struct amd_iommu *iommu,
-    struct IO_APIC_route_entry *ioapic_rte,
-    unsigned int rte_upper, unsigned int value)
+    struct IO_APIC_route_entry *ioapic_rte)
 {
     unsigned long flags;
     u32* entry;
@@ -130,28 +129,26 @@ static void update_intremap_entry_from_i
 
     req_id = get_intremap_requestor_id(bdf);
     lock = get_intremap_lock(req_id);
-    /* only remap interrupt vector when lower 32 bits in ioapic ire changed */
-    if ( likely(!rte_upper) )
-    {
-        delivery_mode = rte->delivery_mode;
-        vector = rte->vector;
-        dest_mode = rte->dest_mode;
-        dest = rte->dest.logical.logical_dest;
-
-        spin_lock_irqsave(lock, flags);
-        offset = get_intremap_offset(vector, delivery_mode);
-        entry = (u32*)get_intremap_entry(req_id, offset);
-
-        update_intremap_entry(entry, vector, delivery_mode, dest_mode, dest);
-        spin_unlock_irqrestore(lock, flags);
-
-        if ( iommu->enabled )
-        {
-            spin_lock_irqsave(&iommu->lock, flags);
-            invalidate_interrupt_table(iommu, req_id);
-            flush_command_buffer(iommu);
-            spin_unlock_irqrestore(&iommu->lock, flags);
-        }
+
+    delivery_mode = rte->delivery_mode;
+    vector = rte->vector;
+    dest_mode = rte->dest_mode;
+    dest = rte->dest.logical.logical_dest;
+
+    spin_lock_irqsave(lock, flags);
+
+    offset = get_intremap_offset(vector, delivery_mode);
+    entry = (u32*)get_intremap_entry(req_id, offset);
+    update_intremap_entry(entry, vector, delivery_mode, dest_mode, dest);
+
+    spin_unlock_irqrestore(lock, flags);
+
+    if ( iommu->enabled )
+    {
+        spin_lock_irqsave(&iommu->lock, flags);
+        invalidate_interrupt_table(iommu, req_id);
+        flush_command_buffer(iommu);
+        spin_unlock_irqrestore(&iommu->lock, flags);
     }
 }
 
@@ -199,7 +196,8 @@ int __init amd_iommu_setup_ioapic_remapp
             spin_lock_irqsave(lock, flags);
             offset = get_intremap_offset(vector, delivery_mode);
             entry = (u32*)get_intremap_entry(req_id, offset);
-            update_intremap_entry(entry, vector, delivery_mode, dest_mode, dest);
+            update_intremap_entry(entry, vector,
+                                  delivery_mode, dest_mode, dest);
             spin_unlock_irqrestore(lock, flags);
 
             if ( iommu->enabled )
@@ -217,16 +215,14 @@ void amd_iommu_ioapic_update_ire(
 void amd_iommu_ioapic_update_ire(
     unsigned int apic, unsigned int reg, unsigned int value)
 {
-    struct IO_APIC_route_entry ioapic_rte = { 0 };
-    unsigned int rte_upper = (reg & 1) ? 1 : 0;
+    struct IO_APIC_route_entry old_rte = { 0 };
+    struct IO_APIC_route_entry new_rte = { 0 };
+    unsigned int rte_lo = (reg & 1) ? reg - 1 : reg;
     int saved_mask, bdf;
     struct amd_iommu *iommu;
 
-    *IO_APIC_BASE(apic) = reg;
-    *(IO_APIC_BASE(apic)+4) = value;
-
     if ( !iommu_intremap )
-        return;
+        goto done;
 
     /* get device id of ioapic devices */
     bdf = ioapic_bdf[IO_APIC_ID(apic)];
@@ -235,30 +231,47 @@ void amd_iommu_ioapic_update_ire(
     {
         AMD_IOMMU_DEBUG("Fail to find iommu for ioapic device id = 0x%x\n",
                         bdf);
-        return;
-    }
-    if ( rte_upper )
-        return;
-
-    /* read both lower and upper 32-bits of rte entry */
-    *IO_APIC_BASE(apic) = reg;
-    *(((u32 *)&ioapic_rte) + 0) = *(IO_APIC_BASE(apic)+4);
-    *IO_APIC_BASE(apic) = reg + 1;
-    *(((u32 *)&ioapic_rte) + 1) = *(IO_APIC_BASE(apic)+4);
+        goto done;
+    }
+
+    /* Save io-apic rte lower 32 bits */
+    *IO_APIC_BASE(apic) = rte_lo;
+    *((u32 *)&old_rte) = *(IO_APIC_BASE(apic) + 4);
+    saved_mask = old_rte.mask;
+
+    if ( reg == rte_lo )
+    {
+        *((u32 *)&new_rte) = value;
+        /* read upper 32 bits from io-apic rte */
+        *IO_APIC_BASE(apic) = reg + 1;
+        *(((u32 *)&new_rte) + 1) = *(IO_APIC_BASE(apic) + 4);
+    }
+    else
+    {
+        *((u32 *)&new_rte) = *((u32 *)&old_rte);
+        *(((u32 *)&new_rte) + 1) = value;
+    }
 
     /* mask the interrupt while we change the intremap table */
-    saved_mask = ioapic_rte.mask;
-    ioapic_rte.mask = 1;
-    *IO_APIC_BASE(apic) = reg;
-    *(IO_APIC_BASE(apic)+4) = *(((int *)&ioapic_rte)+0);
-    ioapic_rte.mask = saved_mask;
-
-    update_intremap_entry_from_ioapic(
-        bdf, iommu, &ioapic_rte, rte_upper, value);
+    old_rte.mask = 1;
+    *IO_APIC_BASE(apic) = rte_lo;
+    *(IO_APIC_BASE(apic) + 4) = *((u32 *)&old_rte);
+
+    /* Update interrupt remapping entry */
+    update_intremap_entry_from_ioapic(bdf, iommu, &new_rte);
+
+    /* Update IO-APIC directly to avoid double writes */
+    if ( reg == rte_lo )
+        goto done;
 
     /* unmask the interrupt after we have updated the intremap table */
-    *IO_APIC_BASE(apic) = reg;
-    *(IO_APIC_BASE(apic)+4) = *(((u32 *)&ioapic_rte)+0);
+    old_rte.mask = saved_mask;
+    *IO_APIC_BASE(apic) = rte_lo;
+    *(IO_APIC_BASE(apic) + 4) = *((u32 *)&old_rte);
+
+done:
+    /* Forward write access to IO-APIC */
+    __io_apic_write(apic, reg, value);
 }
 
 static void update_intremap_entry_from_msi_msg(



Thread overview: 12+ messages
2011-04-08 11:35 [PATCH] AMD IOMMU: Fix an interrupt remapping issue Wei Wang2
2011-04-08 13:43 ` Jan Beulich
2011-04-08 14:26   ` Wei Wang2
2011-04-08 14:39     ` Jan Beulich
2011-04-08 15:06       ` Wei Wang2
2011-04-08 16:52         ` Wei Wang2 [this message]
2011-04-11  7:23           ` [PATCH] AMD IOMMU: Fix an interrupt remapping issue (v2) Jan Beulich
2011-04-11  7:39             ` Wei Wang2
2011-04-11 10:31             ` [PATCH V3] AMD IOMMU: Fix an interrupt remapping issue Wei Wang2
2011-04-11 11:35               ` Jan Beulich
2011-07-19  9:37                 ` George Dunlap
2011-07-19  9:59                   ` George Dunlap
