All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Jason Wang <jasowang@redhat.com>
Cc: qemu-devel@nongnu.org, Lan Tianyu <tianyu.lan@intel.com>,
	yi.l.liu@intel.com, "Michael S . Tsirkin" <mst@redhat.com>,
	Marcel Apfelbaum <marcel@redhat.com>,
	David Gibson <david@gibson.dropbear.id.au>
Subject: Re: [Qemu-devel] [PATCH v3 10/12] intel_iommu: support passthrough (PT)
Date: Thu, 11 May 2017 16:48:05 +0800	[thread overview]
Message-ID: <20170511084805.GH28293@pxdev.xzpeter.org> (raw)
In-Reply-To: <7d770f21-636e-dc96-99b9-923f5d5a14aa@redhat.com>

On Thu, May 11, 2017 at 04:31:40PM +0800, Jason Wang wrote:
> 
> 
> On 2017年05月10日 16:01, Peter Xu wrote:
> >Hardware support for VT-d device passthrough. Although current Linux can
> >live with iommu=pt even without this, but this is faster than when using
> >software passthrough.
> >
> >Signed-off-by: Peter Xu <peterx@redhat.com>
> >---
> >  hw/i386/intel_iommu.c          | 210 ++++++++++++++++++++++++++++++++---------
> >  hw/i386/intel_iommu_internal.h |   1 +
> >  hw/i386/trace-events           |   2 +
> >  hw/i386/x86-iommu.c            |   1 +
> >  include/hw/i386/x86-iommu.h    |   1 +
> >  5 files changed, 171 insertions(+), 44 deletions(-)
> >
> >diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> >index 1a7eba2..1d034f9 100644
> >--- a/hw/i386/intel_iommu.c
> >+++ b/hw/i386/intel_iommu.c
> >@@ -640,6 +640,29 @@ static bool vtd_slpte_nonzero_rsvd(uint64_t slpte, uint32_t level)
> >      }
> >  }
> >+/* Find the VTD address space associated with a given bus number */
> >+static VTDBus *vtd_find_as_from_bus_num(IntelIOMMUState *s, uint8_t bus_num)
> >+{
> >+    VTDBus *vtd_bus = s->vtd_as_by_bus_num[bus_num];
> >+    if (!vtd_bus) {
> >+        /*
> >+         * Iterate over the registered buses to find the one which
> >+         * currently hold this bus number, and update the bus_num
> >+         * lookup table:
> >+         */
> >+        GHashTableIter iter;
> >+
> >+        g_hash_table_iter_init(&iter, s->vtd_as_by_busptr);
> >+        while (g_hash_table_iter_next (&iter, NULL, (void**)&vtd_bus)) {
> >+            if (pci_bus_num(vtd_bus->bus) == bus_num) {
> >+                s->vtd_as_by_bus_num[bus_num] = vtd_bus;
> >+                return vtd_bus;
> >+            }
> >+        }
> >+    }
> >+    return vtd_bus;
> >+}
> >+
> >  /* Given the @iova, get relevant @slptep. @slpte_level will be the last level
> >   * of the translation, can be used for deciding the size of large page.
> >   */
> >@@ -881,6 +904,11 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
> >                  type_fail = true;
> >              }
> >              break;
> >+        case VTD_CONTEXT_TT_PASS_THROUGH:
> >+            if (!x86_iommu->pt_supported) {
> >+                type_fail = true;
> >+            }
> >+            break;
> >          default:
> >              /* Unknwon type */
> >              type_fail = true;
> >@@ -894,6 +922,84 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
> >      return 0;
> >  }
> >+/*
> >+ * Fetch translation type for specific device. Returns <0 if error
> >+ * happens, otherwise return the shifted type to check against
> >+ * VTD_CONTEXT_TT_*.
> >+ */
> >+static int vtd_dev_get_trans_type(VTDAddressSpace *as)
> >+{
> >+    IntelIOMMUState *s;
> >+    VTDContextEntry ce;
> >+    int ret;
> >+
> >+    s = as->iommu_state;
> >+
> >+    ret = vtd_dev_to_context_entry(s, pci_bus_num(as->bus),
> >+                                   as->devfn, &ce);
> >+    if (ret) {
> >+        return ret;
> >+    }
> >+
> >+    return vtd_ce_get_type(&ce);
> >+}
> >+
> >+static bool vtd_dev_pt_enabled(VTDAddressSpace *as)
> >+{
> >+    int ret;
> >+
> >+    assert(as);
> >+
> >+    ret = vtd_dev_get_trans_type(as);
> >+    if (ret < 0) {
> >+        /*
> >+         * Possibly failed to parse the context entry for some reason
> >+         * (e.g., during init, or any guest configuration errors on
> >+         * context entries). We should assume PT not enabled for
> >+         * safety.
> >+         */
> >+        return false;
> >+    }
> >+
> >+    return ret == VTD_CONTEXT_TT_PASS_THROUGH;
> >+}
> >+
> >+/*
> >+ * When we are during init phase (device realizations, global
> >+ * enable/disable of translations), we should not detect PT
> >+ * (passthrough) when switching address spaces. In that cases, we
> >+ * should set `detect_pt' to false.
> >+ *
> >+ * Return whether the device is using IOMMU translation.
> >+ */
> >+static bool vtd_switch_address_space(VTDAddressSpace *as, bool detect_pt)
> >+{
> 
> The detect_pt looks suspicious. E.g if the context entry does not exist,
> vtd_dev_pt_enabled() will return false.

I forgot why I added that even after reading the comments I wrote. I
blame too much context switches recently in my brain. :(

(this is an excuse of mine :)

I did some test and I see nothing wrong to not hack on this bit. I
will remove that in next version, until one day I remembered
something.

And I will try to add more detailed comments in the future.

> 
> >+    bool use_iommu;
> >+
> >+    assert(as);
> >+
> >+    use_iommu = as->iommu_state->dmar_enabled;
> >+    if (detect_pt) {
> >+        use_iommu &= !vtd_dev_pt_enabled(as);
> >+    }
> >+
> >+    trace_vtd_switch_address_space(pci_bus_num(as->bus),
> >+                                   VTD_PCI_SLOT(as->devfn),
> >+                                   VTD_PCI_FUNC(as->devfn),
> >+                                   use_iommu);
> >+
> >+    /* Turn off first then on the other */
> >+    if (use_iommu) {
> >+        memory_region_set_enabled(&as->sys_alias, false);
> >+        memory_region_set_enabled(&as->iommu, true);
> >+    } else {
> >+        memory_region_set_enabled(&as->iommu, false);
> >+        memory_region_set_enabled(&as->sys_alias, true);
> >+    }
> >+
> >+    return use_iommu;
> >+}
> >+
> >  static inline uint16_t vtd_make_source_id(uint8_t bus_num, uint8_t devfn)
> >  {
> >      return ((bus_num & 0xffUL) << 8) | (devfn & 0xffUL);
> >@@ -931,6 +1037,31 @@ static inline bool vtd_is_interrupt_addr(hwaddr addr)
> >      return VTD_INTERRUPT_ADDR_FIRST <= addr && addr <= VTD_INTERRUPT_ADDR_LAST;
> >  }
> >+static void vtd_pt_enable_fast_path(IntelIOMMUState *s, uint16_t source_id)
> >+{
> >+    VTDBus *vtd_bus;
> >+    VTDAddressSpace *vtd_as;
> >+    const char *msg = "FAIL";
> >+
> >+    vtd_bus = vtd_find_as_from_bus_num(s, VTD_SID_TO_BUS(source_id));
> >+    if (!vtd_bus) {
> >+        goto out;
> >+    }
> >+
> >+    vtd_as = vtd_bus->dev_as[VTD_SID_TO_DEVFN(source_id)];
> >+    if (!vtd_as) {
> >+        goto out;
> >+    }
> >+
> >+    if (vtd_switch_address_space(vtd_as, true) == false) {
> >+        /* We switched off IOMMU region successfully. */
> >+        msg = "SUCCESS";
> >+    }
> >+
> >+out:
> >+    trace_vtd_pt_enable_fast_path(source_id, msg);
> 
> Looks like using a boolean is better here.

Sure.

> 
> >+}
> >+
> >  /* Map dev to context-entry then do a paging-structures walk to do a iommu
> >   * translation.
> >   *
> >@@ -1002,6 +1133,30 @@ static void vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
> >          cc_entry->context_cache_gen = s->context_cache_gen;
> >      }
> >+    /*
> >+     * We don't need to translate for pass-through context entries.
> >+     * Also, let's ignore IOTLB caching as well for PT devices.
> >+     */
> >+    if (vtd_ce_get_type(&ce) == VTD_CONTEXT_TT_PASS_THROUGH) {
> >+        entry->translated_addr = entry->iova;
> >+        entry->addr_mask = VTD_PAGE_SIZE - 1;
> >+        entry->perm = IOMMU_RW;
> >+        trace_vtd_translate_pt(source_id, entry->iova);
> >+
> >+        /*
> >+         * When this happens, it means firstly caching-mode is not
> >+         * enabled, and this is the first passthrough translation for
> >+         * the device. Let's enable the fast path for passthrough.
> >+         *
> >+         * When passthrough is disabled again for the device, we can
> >+         * capture it via the context entry invalidation, then the
> >+         * IOMMU region can be swapped back.
> >+         */
> >+        vtd_pt_enable_fast_path(s, source_id);
> >+
> >+        return;
> >+    }
> >+
> >      ret_fr = vtd_iova_to_slpte(&ce, addr, is_write, &slpte, &level,
> >                                 &reads, &writes);
> >      if (ret_fr) {
> >@@ -1081,29 +1236,6 @@ static void vtd_context_global_invalidate(IntelIOMMUState *s)
> >      vtd_iommu_replay_all(s);
> >  }
> >-
> >-/* Find the VTD address space currently associated with a given bus number,
> >- */
> >-static VTDBus *vtd_find_as_from_bus_num(IntelIOMMUState *s, uint8_t bus_num)
> >-{
> >-    VTDBus *vtd_bus = s->vtd_as_by_bus_num[bus_num];
> >-    if (!vtd_bus) {
> >-        /* Iterate over the registered buses to find the one
> >-         * which currently hold this bus number, and update the bus_num lookup table:
> >-         */
> >-        GHashTableIter iter;
> >-
> >-        g_hash_table_iter_init(&iter, s->vtd_as_by_busptr);
> >-        while (g_hash_table_iter_next (&iter, NULL, (void**)&vtd_bus)) {
> >-            if (pci_bus_num(vtd_bus->bus) == bus_num) {
> >-                s->vtd_as_by_bus_num[bus_num] = vtd_bus;
> >-                return vtd_bus;
> >-            }
> >-        }
> >-    }
> >-    return vtd_bus;
> >-}
> >-
> >  /* Do a context-cache device-selective invalidation.
> >   * @func_mask: FM field after shifting
> >   */
> >@@ -1146,6 +1278,11 @@ static void vtd_context_device_invalidate(IntelIOMMUState *s,
> >                                               VTD_PCI_FUNC(devfn_it));
> >                  vtd_as->context_cache_entry.context_cache_gen = 0;
> >                  /*
> >+                 * Do switch address space when needed, in case if the
> >+                 * device passthrough bit is switched.
> >+                 */
> >+                vtd_switch_address_space(vtd_as, true);
> 
> Do we need to do this also in DSI and GLOBAL invalidation?

Yes. Though this should be optional at least for Linux, but I will add
that later.

Thanks!

-- 
Peter Xu

  reply	other threads:[~2017-05-11  8:48 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-10  8:01 [Qemu-devel] [PATCH v3 00/12] VT-d: PT (passthrough) mode support and misc fixes Peter Xu
2017-05-10  8:01 ` [Qemu-devel] [PATCH v3 01/12] pc: add 2.10 machine type Peter Xu
2017-05-10  8:01 ` [Qemu-devel] [PATCH v3 02/12] memory: tune last param of iommu_ops.translate() Peter Xu
2017-05-10  8:01 ` [Qemu-devel] [PATCH v3 03/12] memory: remove the last param in memory_region_iommu_replay() Peter Xu
2017-05-10  8:01 ` [Qemu-devel] [PATCH v3 04/12] memory: fix address_space_get_iotlb_entry() Peter Xu
2017-05-11  1:56   ` David Gibson
2017-05-11  9:36     ` Peter Xu
2017-05-12  5:25       ` David Gibson
2017-05-15  9:00         ` Peter Xu
2017-05-10  8:01 ` [Qemu-devel] [PATCH v3 05/12] x86-iommu: use DeviceClass properties Peter Xu
2017-05-10  8:01 ` [Qemu-devel] [PATCH v3 06/12] intel_iommu: renaming context entry helpers Peter Xu
2017-05-10  8:01 ` [Qemu-devel] [PATCH v3 07/12] intel_iommu: provide vtd_ce_get_type() Peter Xu
2017-05-10  8:01 ` [Qemu-devel] [PATCH v3 08/12] intel_iommu: use IOMMU_ACCESS_FLAG() Peter Xu
2017-05-10  8:01 ` [Qemu-devel] [PATCH v3 09/12] intel_iommu: allow dev-iotlb context entry conditionally Peter Xu
2017-05-11  2:56   ` Jason Wang
2017-05-11  3:51     ` Peter Xu
2017-05-10  8:01 ` [Qemu-devel] [PATCH v3 10/12] intel_iommu: support passthrough (PT) Peter Xu
2017-05-11  8:31   ` Jason Wang
2017-05-11  8:48     ` Peter Xu [this message]
2017-05-10  8:01 ` [Qemu-devel] [PATCH v3 11/12] intel_iommu: turn off pt before 2.9 Peter Xu
2017-05-10  8:01 ` [Qemu-devel] [PATCH v3 12/12] vhost: iommu: cache static mapping if there is Peter Xu
2017-05-11  8:35   ` Jason Wang
2017-05-11  8:59     ` Peter Xu
2017-05-12  1:54       ` Jason Wang
2017-05-10  8:55 ` [Qemu-devel] [PATCH v3 00/12] VT-d: PT (passthrough) mode support and misc fixes no-reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170511084805.GH28293@pxdev.xzpeter.org \
    --to=peterx@redhat.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=jasowang@redhat.com \
    --cc=marcel@redhat.com \
    --cc=mst@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=tianyu.lan@intel.com \
    --cc=yi.l.liu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.