From: Peter Xu <peterx@redhat.com>
To: qemu-devel@nongnu.org
Cc: tianyu.lan@intel.com, kevin.tian@intel.com, mst@redhat.com,
	jan.kiszka@siemens.com, jasowang@redhat.com, peterx@redhat.com,
	David Gibson <david@gibson.dropbear.id.au>,
	alex.williamson@redhat.com, bd.aviv@gmail.com
Subject: [Qemu-devel] [PATCH v7 00/17] VT-d: vfio enablement and misc enhances
Date: Tue,  7 Feb 2017 16:28:02 +0800
Message-ID: <1486456099-7345-1-git-send-email-peterx@redhat.com>

This is v7 of the VT-d vfio enablement series.

v7:
- for the two trace patches: change subjects; remove the vtd_err()
  and vtd_err_nonzero_rsvd() tracers and use a standalone trace at
  each site instead; don't remove any DPRINTF() that has no
  replacement. [Jason]
- add r-b and a-b for Alex/David/Jason.
- in patch "intel_iommu: renaming gpa to iova where proper", convert
  one more place that I missed [Jason]
- fix the place where I should use "~0ULL" not "~0" [Jason]
- squash patch 16 into 18 [Jason]

v6:
- always unmap when replaying [Jason]
- do global replay even if context entry is invalidated [Jason]
- on IOMMU reset, send unmap to all registered notifiers [Jason]
- use rcu read lock to protect the whole vfio_iommu_map_notify()
  [Alex, Paolo]

v5:
- fix the overlong subject of patch 4, and a spelling error [Eric]
- add acked-by for Alex in patch 1 [Alex]
- squashing patch 19/20 into patch 18 [Jason]
- fix comments in vtd_page_walk() [Jason]
- remove all error_report() [Jason]
- add a comment to patch 18, mentioning that it enables vhost without
  ATS as well [Jason]
- remove skipped debug thing during page walk [Jason]
- remove duplicated page walk trace [Jason]
- some tunings in vtd_address_space_unmap(), to provide correct iova
  and addr_mask. For this, I tuned this patch as well a bit:
  "memory: add section range info for IOMMU notifier"
  to loosen the range check

v4:
- convert all error_report()s into traces (in the two patches that did
  that)
- rebased to Jason's DMAR series (master + one more patch:
  "[PATCH V4 net-next] vhost_net: device IOTLB support")
- let vhost use the new api iommu_notifier_init() so it won't break
  vhost dmar [Jason]
- touch commit message of the patch:
  "intel_iommu: provide its own replay() callback"
  the old replay is not an infinite loop, it just consumes lots of
  time [Jason]
- add comment for patch:
  "intel_iommu: do replay when context invalidate"
  telling why replay won't be a problem even without CM=1 [Jason]
- remove a useless comment line [Jason]
- remove dmar_enabled parameter for vtd_switch_address_space() and
  vtd_switch_address_space_all() [Mst, Jason]
- merged the vfio patches in, to support unmap of big ranges at the
  beginning ("[PATCH RFC 0/3] vfio: allow to notify unmap for very big
  region")
- using caching_mode instead of cache_mode_enabled, and "caching-mode"
  instead of "cache-mode" [Kevin]
- on receiving a context entry invalidation, unmap the entire region
  first, then replay [Alex]
- fix commit message for patch:
  "intel_iommu: simplify irq region translation" [Kevin]
- handle domain/global invalidation, and notify where proper [Jason,
  Kevin]

v3:
- fix style error reported by patchew
- fix comment in domain switch patch: use "IOMMU address space" rather
  than "IOMMU region" [Kevin]
- add acked-by for Paolo in patch:
  "memory: add section range info for IOMMU notifier"
  (this was collected separately, outside this thread)
- remove 3 patches which are merged already (from Jason)
- rebase to master b6c0897

v2:
- change comment for "end" parameter in vtd_page_walk() [Tianyu]
- change comment for "a iova" to "an iova" [Yi]
- fix fault printed val for GPA address in vtd_page_walk_level (debug
  only)
- rebased to master (rather than Aviv's v6 series) and merged Aviv's
  series v6: picked patch 1 (as patch 1 in this series), dropped patch
  2, re-wrote patch 3 (as patch 17 of this series).
- picked up two more bugfix patches from Jason's DMAR series
- picked up the following patch as well:
  "[PATCH v3] intel_iommu: allow dynamic switch of IOMMU region"

This RFC series is a rework of Aviv B.D.'s vfio enablement series
for VT-d:

  https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg01452.html

Aviv has done a great job there, and what is still missing is mostly
the following:

(1) VFIO gets duplicate IOTLB notifications because the VT-d IOMMU
    memory region is split.

(2) VT-d still doesn't provide a correct replay() mechanism (e.g.,
    when the IOMMU domain switches, things will break).

This series should solve both of the above issues.

Online repo:

  https://github.com/xzpeter/qemu/tree/vtd-vfio-enablement-v7

I would be glad to hear any review comments on the above patches.

=========
Test Done
=========

Build test passed for x86_64/arm/ppc64.

Lightly tested on x86_64 by assigning two PCI devices to a single VM,
booting the VM with:

bin=x86_64-softmmu/qemu-system-x86_64
$bin -M q35,accel=kvm,kernel-irqchip=split -m 1G \
     -device intel-iommu,intremap=on,eim=off,caching-mode=on \
     -netdev user,id=net0,hostfwd=tcp::5555-:22 \
     -device virtio-net-pci,netdev=net0 \
     -device vfio-pci,host=03:00.0 \
     -device vfio-pci,host=02:00.0 \
     -trace events=".trace.vfio" \
     /var/lib/libvirt/images/vm1.qcow2

pxdev:bin [vtd-vfio-enablement]# cat .trace.vfio
vtd_page_walk*
vtd_replay*
vtd_inv_desc*

Then, in the guest, run the following tool:

  https://github.com/xzpeter/clibs/blob/master/gpl/userspace/vfio-bind-group/vfio-bind-group.c

With parameter:

  ./vfio-bind-group 00:03.0 00:04.0

Checking the host-side trace log, I can see pages being replayed and
mapped in the 00:04.0 device address space, like:

...
vtd_replay_ce_valid replay valid context device 00:04.00 hi 0x401 lo 0x38fe1001
vtd_page_walk Page walk for ce (0x401, 0x38fe1001) iova range 0x0 - 0x8000000000
vtd_page_walk_level Page walk (base=0x38fe1000, level=3) iova range 0x0 - 0x8000000000
vtd_page_walk_level Page walk (base=0x35d31000, level=2) iova range 0x0 - 0x40000000
vtd_page_walk_level Page walk (base=0x34979000, level=1) iova range 0x0 - 0x200000
vtd_page_walk_one Page walk detected map level 0x1 iova 0x0 -> gpa 0x22dc3000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x1000 -> gpa 0x22e25000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x2000 -> gpa 0x22e12000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x3000 -> gpa 0x22e2d000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x4000 -> gpa 0x12a49000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x5000 -> gpa 0x129bb000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x6000 -> gpa 0x128db000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x7000 -> gpa 0x12a80000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x8000 -> gpa 0x12a7e000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x9000 -> gpa 0x12b22000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0xa000 -> gpa 0x12b41000 mask 0xfff perm 3
...
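
As a reading aid for the iova ranges in the trace above, here is a
minimal sketch of the level arithmetic, assuming 4 KiB base pages and
9 address bits per second-level table (512 entries). The constant
names are borrowed from the VT-d code for illustration; the helper
functions are hypothetical and not part of this series.

```python
# Assumptions: 4 KiB base page, 9 bits of IOVA decoded per table level.
VTD_PAGE_SHIFT_4K = 12
VTD_SL_LEVEL_BITS = 9


def entry_span(level):
    """Bytes of IOVA space mapped by a single entry at this level."""
    return 1 << (VTD_PAGE_SHIFT_4K + (level - 1) * VTD_SL_LEVEL_BITS)


def table_span(level):
    """Bytes of IOVA space covered by one whole table at this level."""
    return entry_span(level) << VTD_SL_LEVEL_BITS


for level in (3, 2, 1):
    # One line per level: the range a table covers and the per-entry mask.
    print("level=%d iova range 0x0 - 0x%x, entry mask 0x%x"
          % (level, table_span(level), entry_span(level) - 1))
```

With these assumptions, the table spans for levels 3, 2 and 1 come
out as 0x8000000000, 0x40000000 and 0x200000, and the level-1 entry
mask is 0xfff, which agrees with the vtd_page_walk_level and
vtd_page_walk_one lines in the log.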

=========
Todo List
=========

- error reporting for the assigned devices (as Tianyu has mentioned)

- per-domain address space: a better solution in the future may be to
  maintain one address space per IOMMU domain in the guest (so
  multiple devices can share the same address space if they share
  the same IOMMU domain in the guest), rather than one address space
  per device (which is the current VT-d implementation). However
  that's a step further than this series, and let's see whether we
  can first provide a workable version of device assignment with
  VT-d protection.

- no need to notify IOTLB (psi/gsi/global) invalidations to devices
  that have ATS enabled

- investigate the case where the guest maps pages while the mask
  covers already-mapped pages (e.g. map 12k-16k first, then map 0-12k)

- coalesce unmap during page walk (currently, we send it once per
  page)

- when doing PSI for unmap, can we send one notification directly
  instead of walking the page table?

- more to come...
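
To illustrate the "map 12k-16k first, then map 0-12k" item above,
here is a small sketch. It assumes the invalidation is described by
an address mask, i.e. it always covers an aligned block of 2^am
pages of 4 KiB each; that is my reading of why the mask can cover
already-mapped pages. psi_cover() and overlaps() are hypothetical
helpers, not functions from this series.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages


def psi_cover(iova, size):
    """Smallest aligned 2^am-page block containing [iova, iova + size).

    Returns (am, base, length), with base and length in bytes.
    """
    first = iova >> PAGE_SHIFT
    last = (iova + size - 1) >> PAGE_SHIFT
    am = 0
    # Grow the block until the first and last page fall in the same one.
    while (first >> am) != (last >> am):
        am += 1
    base = (first >> am) << (am + PAGE_SHIFT)
    return am, base, 1 << (am + PAGE_SHIFT)


def overlaps(base_a, len_a, base_b, len_b):
    """True if the two byte ranges share at least one byte."""
    return base_a < base_b + len_b and base_b < base_a + len_a


# Step 1: the guest maps 12k-16k (one page).
_, old_base, old_len = psi_cover(0x3000, 0x1000)
# Step 2: the guest maps 0-12k (three pages, rounded up to four).
am, new_base, new_len = psi_cover(0x0, 0x3000)
print("am=%d covers 0x%x - 0x%x, overlaps existing mapping: %s"
      % (am, new_base, new_base + new_len,
         overlaps(new_base, new_len, old_base, old_len)))
```

Under that assumption the second request is rounded up to 0-16k, so
it overlaps the page at 12k-16k that was mapped in the first step.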

Thanks,

Aviv Ben-David (1):
  intel_iommu: add "caching-mode" option

Peter Xu (16):
  vfio: trace map/unmap for notify as well
  vfio: introduce vfio_get_vaddr()
  vfio: allow to notify unmap for very large region
  intel_iommu: simplify irq region translation
  intel_iommu: renaming gpa to iova where proper
  intel_iommu: convert dbg macros to traces for inv
  intel_iommu: convert dbg macros to trace for trans
  intel_iommu: vtd_slpt_level_shift check level
  memory: add section range info for IOMMU notifier
  memory: provide IOMMU_NOTIFIER_FOREACH macro
  memory: provide iommu_replay_all()
  memory: introduce memory_region_notify_one()
  memory: add MemoryRegionIOMMUOps.replay() callback
  intel_iommu: provide its own replay() callback
  intel_iommu: allow dynamic switch of IOMMU region
  intel_iommu: enable vfio devices

 hw/i386/intel_iommu.c          | 669 +++++++++++++++++++++++++++++++----------
 hw/i386/intel_iommu_internal.h |   2 +
 hw/i386/trace-events           |  36 +++
 hw/vfio/common.c               |  77 +++--
 hw/vfio/trace-events           |   2 +-
 hw/virtio/vhost.c              |   4 +-
 include/exec/memory.h          |  49 ++-
 include/hw/i386/intel_iommu.h  |  12 +
 memory.c                       |  52 +++-
 9 files changed, 710 insertions(+), 193 deletions(-)

-- 
2.7.4
