All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/7] Intel IOMMU scalability improvements
@ 2016-04-19 16:48 Adam Morrison
       [not found] ` <cover.1460733778.git.mad-FrESSTt7Abv7r6psnUbsSmZHpeb/A1Y/@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: Adam Morrison @ 2016-04-19 16:48 UTC (permalink / raw)
  To: dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, joro-zLv9SwRftAIdnm+yROfE0A,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: serebrin-hpIqsD4AKlfQT0dZR+AlfA,
	dan-FrESSTt7Abv7r6psnUbsSmZHpeb/A1Y/,
	omer-FrESSTt7Abv7r6psnUbsSmZHpeb/A1Y/, shli-b10kYP2dOMg,
	gvdl-hpIqsD4AKlfQT0dZR+AlfA, Kernel-team-b10kYP2dOMg

This patchset improves the scalability of the Intel IOMMU code by
resolving two spinlock bottlenecks, yielding up to ~5x performance
improvement and approaching iommu=off performance.

For example, here's the throughput obtained by 16 memcached instances
running on a 16-core Sandy Bridge system, accessed using memslap on
another machine that has iommu=off, using the default memslap config
(64-byte keys, 1024-byte values, and 10%/90% SET/GET ops):

    stock iommu=off:
       990,803 memcached transactions/sec (=100%, median of 10 runs).
    stock iommu=on:
       221,416 memcached transactions/sec (=22%).
       [61.70%    0.63%  memcached       [kernel.kallsyms]      [k] _raw_spin_lock_irqsave]
    patched iommu=on:
       963,159 memcached transactions/sec (=97%).
       [1.29%     1.10%  memcached       [kernel.kallsyms]      [k] _raw_spin_lock_irqsave]

The two resolved spinlocks:

 - Deferred IOTLB invalidations are batched in a global data structure
   and serialized under a spinlock (add_unmap() & flush_unmaps()); this
   patchset batches IOTLB invalidations in a per-CPU data structure.

 - IOVA management (alloc_iova() & __free_iova()) is serialized under
   the rbtree spinlock; this patchset adds per-CPU caches of allocated
   IOVAs so that the rbtree doesn't get accessed frequently. (Adding a
   cache above the existing IOVA allocator is less intrusive than dynamic
   identity mapping and helps keep IOMMU page table usage low; see
   Patch 7.)

The paper "Utilizing the IOMMU Scalably" (presented at the 2015 USENIX
Annual Technical Conference) contains many more details and experiments:

  https://www.usenix.org/system/files/conference/atc15/atc15-paper-peleg.pdf

v3:
 * Patch 7/7: Respect the caller-passed limit IOVA when satisfying an IOVA
   allocation from the cache.
 * Patch 7/7: Flush the IOVA cache if an rbtree IOVA allocation fails, and
   then retry the allocation.  This addresses the possibility that all
   desired IOVA ranges were in other CPUs' caches.
 * Patch 4/7: Clean up intel_unmap_sg() to use sg accessors. 

v2:

 * Extend IOVA API instead of modifying it, to not break the API's other 
   non-Intel callers.
 * Invalidate all per-cpu invalidations if one CPU hits its per-cpu limit,
   so that we don't defer invalidations more than before.
 * Smaller cap on per-CPU cache size, to consume less of the IOVA space.
 * Free resources and perform IOTLB invalidations when a CPU is hot-unplugged.


Omer Peleg (7):
  iommu: refactoring of deferred flush entries
  iommu: per-cpu deferred invalidation queues
  iommu: correct flush_unmaps pfn usage
  iommu: only unmap mapped entries
  iommu: avoid dev iotlb logic in intel-iommu for domains with no dev
    iotlbs
  iommu: change intel-iommu to use IOVA frame numbers
  iommu: introduce per-cpu caching to iova allocation

 drivers/iommu/intel-iommu.c | 318 +++++++++++++++++++++++----------
 drivers/iommu/iova.c        | 416 +++++++++++++++++++++++++++++++++++++++++---
 include/linux/iova.h        |  23 ++-
 3 files changed, 637 insertions(+), 120 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2016-04-20  8:21 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-04-19 16:48 [PATCH v3 0/7] Intel IOMMU scalability improvements Adam Morrison
     [not found] ` <cover.1460733778.git.mad-FrESSTt7Abv7r6psnUbsSmZHpeb/A1Y/@public.gmane.org>
2016-04-19 16:48   ` [PATCH v3 1/7] iommu: refactoring of deferred flush entries Adam Morrison
2016-04-19 16:48   ` [PATCH v3 2/7] iommu: per-cpu deferred invalidation queues Adam Morrison
2016-04-19 16:48   ` [PATCH v3 3/7] iommu: correct flush_unmaps pfn usage Adam Morrison
2016-04-19 16:49   ` [PATCH v3 4/7] iommu: only unmap mapped entries Adam Morrison
2016-04-19 16:49   ` [PATCH v3 5/7] iommu: avoid dev iotlb logic in intel-iommu for domains with no dev iotlbs Adam Morrison
     [not found]     ` <20d0571ab8dda05bd3980b18df0a810fb26480e4.1460733778.git.mad-FrESSTt7Abv7r6psnUbsSmZHpeb/A1Y/@public.gmane.org>
2016-04-19 19:09       ` David Woodhouse
     [not found]         ` <1461092967.25115.7.camel-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2016-04-20  8:21           ` Adam Morrison
2016-04-19 16:49   ` [PATCH v3 6/7] iommu: change intel-iommu to use IOVA frame numbers Adam Morrison
2016-04-19 16:49   ` [PATCH v3 7/7] iommu: introduce per-cpu caching to iova allocation Adam Morrison
     [not found]     ` <0923752fb142601d6c2b58bcf7cdceea6ec9305e.1460733778.git.mad-FrESSTt7Abv7r6psnUbsSmZHpeb/A1Y/@public.gmane.org>
2016-04-19 20:50       ` Benjamin Serebrin via iommu
2016-04-19 20:47   ` [PATCH v3 0/7] Intel IOMMU scalability improvements Shaohua Li
     [not found]     ` <20160419204705.GA1134028-tb7CFzD8y5b7E6g3fPdp/g2O0Ztt9esIQQ4Iyu8u01E@public.gmane.org>
2016-04-19 20:52       ` Benjamin Serebrin via iommu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.