[PATCH v4 0/7] Intel IOMMU scalability improvements
From: Adam Morrison @ 2016-04-20  8:31 UTC
  To: dwmw2, joro, iommu
  Cc: serebrin, dan, omer, shli, gvdl, Kernel-team

This patchset improves the scalability of the Intel IOMMU code by
resolving two spinlock bottlenecks and eliminating the linearity of
the IOVA allocator, yielding up to ~5x performance improvement and
approaching iommu=off performance.

For example, here's the throughput obtained by 16 memcached instances
running on a 16-core Sandy Bridge system, accessed using memslap on
another machine that has iommu=off, using the default memslap config
(64-byte keys, 1024-byte values, and 10%/90% SET/GET ops):

    stock iommu=off:
       990,803 memcached transactions/sec (=100%, median of 10 runs).
    stock iommu=on:
       221,416 memcached transactions/sec (=22%).
       [61.70%    0.63%  memcached       [kernel.kallsyms]      [k] _raw_spin_lock_irqsave]
    patched iommu=on:
       963,159 memcached transactions/sec (=97%).
       [1.29%     1.10%  memcached       [kernel.kallsyms]      [k] _raw_spin_lock_irqsave]

The two resolved spinlock bottlenecks:

 - Deferred IOTLB invalidations are batched in a global data structure
   and serialized under a spinlock (add_unmap() & flush_unmaps()); this
   patchset batches IOTLB invalidations in a per-CPU data structure
   instead (see the sketch after this list).

 - IOVA management (alloc_iova() & __free_iova()) is serialized under
   the rbtree spinlock; this patchset adds per-CPU caches of allocated
   IOVAs so that the rbtree doesn't get accessed frequently. (Adding a
   cache above the existing IOVA allocator is less intrusive than dynamic
   identity mapping and helps keep IOMMU page table usage low; see
   Patch 7.)
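
To make the per-CPU batching concrete, here is a minimal userspace C
model of the idea (all names and sizes are illustrative, and the
locking/preemption handling of real per-CPU data is omitted; this is a
sketch, not the actual intel-iommu code).  It also shows the v2
behavior of flushing every CPU's queue when one CPU hits its cap, so
invalidations are never deferred longer than before:

    #include <stddef.h>

    #define NR_CPUS        16
    #define FLUSH_QUEUE_SZ 128  /* per-CPU cap on deferred invalidations */

    struct flush_entry {
        unsigned long iova_pfn; /* first page frame of the unmapped range */
        unsigned long pages;    /* number of pages to invalidate */
    };

    struct flush_queue {
        size_t count;
        struct flush_entry entries[FLUSH_QUEUE_SZ];
    };

    /* Stand-in for per-CPU data; each CPU touches only its own slot,
     * so the fast path takes no global lock. */
    static struct flush_queue flush_queues[NR_CPUS];

    static void flush_iotlb(const struct flush_entry *e)
    {
        (void)e;    /* hardware IOTLB invalidation elided */
    }

    /* Flush every CPU's queue, so hitting one CPU's cap never defers
     * invalidations longer than the old global scheme did. */
    static void flush_all_queues(void)
    {
        for (int cpu = 0; cpu < NR_CPUS; cpu++) {
            struct flush_queue *q = &flush_queues[cpu];

            for (size_t i = 0; i < q->count; i++)
                flush_iotlb(&q->entries[i]);
            q->count = 0;
        }
    }

    /* Queue a deferred invalidation on this CPU's private queue. */
    static void add_deferred_flush(int cpu, unsigned long iova_pfn,
                                   unsigned long pages)
    {
        struct flush_queue *q = &flush_queues[cpu];

        if (q->count == FLUSH_QUEUE_SZ)
            flush_all_queues();

        q->entries[q->count].iova_pfn = iova_pfn;
        q->entries[q->count].pages = pages;
        q->count++;
    }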

The paper "Utilizing the IOMMU Scalably" (presented at the 2015 USENIX
Annual Technical Conference) contains many more details and
experiments highlighting the resolved lock contention:

  https://www.usenix.org/conference/atc15/technical-session/presentation/peleg

The resolved linearity of IOVA allocation:

 - The rbtree IOVA allocator (called by alloc_iova()) periodically traverses
   all previously allocated IOVAs in search of the highest unallocated
   IOVA; with the new IOVA cache, this code is usually bypassed, as
   allocations are satisfied from the cache in constant time (see the
   sketch below).
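
As an illustration of that constant-time fast path, here is a
simplified userspace C model of a per-CPU cache sitting in front of a
slower global allocator (names and sizes are made up for the sketch;
the real cache is per allocation order, whereas this model assumes
same-size allocations):

    #include <stddef.h>

    #define IOVA_CACHE_SZ 32    /* small cap, to limit IOVA-space use */

    struct iova_cpu_cache {
        size_t count;
        unsigned long pfns[IOVA_CACHE_SZ];  /* recently freed IOVA pfns */
    };

    /* Stubs for the rbtree slow path (real allocator elided). */
    static unsigned long rbtree_alloc_iova(unsigned long limit_pfn)
    {
        (void)limit_pfn;
        return 0;
    }

    static void rbtree_free_iova(unsigned long pfn)
    {
        (void)pfn;
    }

    /* Fast path: constant-time pop from this CPU's cache; fall back to
     * the rbtree only on a miss.  The limit_pfn check mirrors the v3
     * change that respects the caller-passed allocation limit. */
    static unsigned long cached_alloc_iova(struct iova_cpu_cache *cache,
                                           unsigned long limit_pfn)
    {
        if (cache->count > 0 &&
            cache->pfns[cache->count - 1] <= limit_pfn)
            return cache->pfns[--cache->count];
        return rbtree_alloc_iova(limit_pfn);
    }

    /* Freed IOVAs go into the per-CPU cache for reuse; overflow spills
     * back to the rbtree so the cache stays bounded. */
    static void cached_free_iova(struct iova_cpu_cache *cache,
                                 unsigned long pfn)
    {
        if (cache->count < IOVA_CACHE_SZ)
            cache->pfns[cache->count++] = pfn;
        else
            rbtree_free_iova(pfn);
    }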

The paper "Efficient intra-operating system protection against harmful
DMAs" (presented at the 2015 USENIX Conference on File and Storage
Technologies) contains details about the linearity of IOVA allocation:

  https://www.usenix.org/conference/fast15/technical-sessions/presentation/malka


v4:
 * Patch 7/7: Improve commit message and comment about iova_rcache_get().
 * Patch 5/7: Change placement of "has_iotlb_device" struct field.

v3:
 * Patch 7/7: Respect the caller-passed limit IOVA when satisfying an IOVA
   allocation from the cache.
 * Patch 7/7: Flush the IOVA cache if an rbtree IOVA allocation fails, and
   then retry the allocation (see the sketch after this list).  This
   addresses the possibility that all desired IOVA ranges were in other
   CPUs' caches.
 * Patch 4/7: Clean up intel_unmap_sg() to use sg accessors.
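
For illustration, a minimal C sketch of that flush-and-retry fallback
(the function names are hypothetical stand-ins, not the actual iova.c
symbols):

    /* Stub: returns 0 on allocation failure (real rbtree search elided). */
    static unsigned long rbtree_alloc_iova(unsigned long limit_pfn)
    {
        (void)limit_pfn;
        return 0;
    }

    /* Stub: return every cached IOVA on every CPU to the rbtree. */
    static void flush_all_cpu_iova_caches(void)
    {
    }

    static unsigned long alloc_iova_with_retry(unsigned long limit_pfn)
    {
        unsigned long pfn = rbtree_alloc_iova(limit_pfn);

        if (pfn == 0) {
            /* The tree looked full, but suitable ranges may be sitting
             * in other CPUs' caches: flush them back and try once more. */
            flush_all_cpu_iova_caches();
            pfn = rbtree_alloc_iova(limit_pfn);
        }
        return pfn;     /* 0 only if the IOVA space is truly exhausted */
    }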

v2:

 * Extend the IOVA API instead of modifying it, so as not to break the
   API's other, non-Intel callers.
 * Flush all per-CPU invalidation queues if one CPU hits its per-CPU limit,
   so that invalidations are not deferred longer than before.
 * Smaller cap on per-CPU cache size, to consume less of the IOVA space.
 * Free resources and perform IOTLB invalidations when a CPU is hot-unplugged.


Omer Peleg (7):
  iommu/vt-d: refactoring of deferred flush entries
  iommu/vt-d: per-cpu deferred invalidation queues
  iommu/vt-d: correct flush_unmaps pfn usage
  iommu/vt-d: only unmap mapped entries
  iommu/vt-d: avoid dev iotlb logic in intel-iommu for domains with no dev
    iotlbs
  iommu/vt-d: change intel-iommu to use IOVA frame numbers
  iommu/vt-d: introduce per-cpu caching to iova allocation

 drivers/iommu/intel-iommu.c | 318 +++++++++++++++++++++++----------
 drivers/iommu/iova.c        | 417 +++++++++++++++++++++++++++++++++++++++++---
 include/linux/iova.h        |  23 ++-
 3 files changed, 638 insertions(+), 120 deletions(-)

-- 
1.9.1
