xen-devel.lists.xenproject.org archive mirror
* [PATCHv5 0/3] Implement per-cpu reader-writer locks
@ 2015-12-18 16:08 Malcolm Crossley
  2015-12-18 16:08 ` [PATCHv5 1/3] rwlock: Add per-cpu reader-writer lock infrastructure Malcolm Crossley
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Malcolm Crossley @ 2015-12-18 16:08 UTC
  To: malcolm.crossley, JBeulich, ian.campbell, andrew.cooper3,
	Marcos.Matsunaga, keir, konrad.wilk, george.dunlap
  Cc: xen-devel, dario.faggioli, stefano.stabellini

This patch series adds per-cpu reader-writer locks as a generic lock
implementation and then converts the grant table and p2m rwlocks to
use the percpu rwlocks, in order to improve multi-socket host performance.

CPU profiling has revealed the rwlocks themselves suffer from severe cache
line bouncing due to the cmpxchg operation used even when taking a read lock.
Multiqueue paravirtualised I/O results in heavy contention on the grant table
and p2m read locks of a specific domain, so I/O throughput is bottlenecked
by the overhead of the cache line bouncing itself.
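
To illustrate why even readers bounce the cache line, here is a minimal
userspace sketch of a conventional reader-writer lock using C11 atomics. It is
not the Xen rwlock code; the type `simple_rwlock` and the function names are
hypothetical. The point is only that every read acquisition performs a
compare-and-exchange on the one shared lock word, so each reading CPU must
take that cache line exclusively.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical conventional rwlock: a single shared word.
 * > 0: number of readers, 0: free, -1: held by a writer. */
typedef struct {
    atomic_int word;
} simple_rwlock;

/* Take the read lock. Returns the number of cmpxchg attempts made, purely
 * to make the write to the shared word visible in this sketch. */
static int simple_read_lock(simple_rwlock *l)
{
    int attempts = 0;
    int seen = atomic_load(&l->word);

    for (;;) {
        if (seen < 0) {
            /* A writer holds the lock: re-read and keep spinning. */
            seen = atomic_load(&l->word);
            continue;
        }
        attempts++;
        /* Every reader writes the shared word here, even when there is
         * no writer at all. On failure, 'seen' is refreshed. */
        if (atomic_compare_exchange_strong(&l->word, &seen, seen + 1))
            return attempts;
    }
}

/* Drop the read lock: another write to the same shared word. */
static void simple_read_unlock(simple_rwlock *l)
{
    assert(atomic_load(&l->word) > 0);
    atomic_fetch_sub(&l->word, 1);
}
```

With many CPUs repeatedly taking and dropping the same read lock, the line
holding `word` migrates between their caches on every call, which is the
bouncing described above.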

Per-cpu read locks avoid lock cache line bouncing by using a per-cpu data
area to record that a CPU has taken the read lock. Correctness is enforced for
the write lock by using a per-lock barrier which forces the per-cpu read lock
to revert to using a standard read lock. The write lock then polls all
the per-cpu data areas until active readers for the lock have exited.
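
The scheme described above can be sketched in userspace C11 as follows. This
is a simplified illustration, not the code from the patches: `NR_CPUS`, the
explicit `cpu` argument, the `std_*` helpers and the layout of
`struct percpu_rwlock` are all assumptions made for the example, and it
allows each CPU to hold only one such lock at a time.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4   /* hypothetical fixed CPU count for this sketch */

/* Hypothetical lock: a standard shared-word lock plus a per-lock barrier. */
struct percpu_rwlock {
    atomic_int  rw;                 /* > 0 readers, 0 free, -1 writer */
    atomic_bool writer_activating;  /* per-lock barrier set by a writer */
};

/* Per-cpu data area: the lock each CPU holds for reading, or NULL.
 * A real implementation would keep each slot on its own cache line. */
static _Atomic(struct percpu_rwlock *) percpu_reader[NR_CPUS];

static void std_read_lock(struct percpu_rwlock *l)
{
    int seen = atomic_load(&l->rw);
    for (;;) {
        if (seen < 0)
            seen = atomic_load(&l->rw);  /* writer inside: keep spinning */
        else if (atomic_compare_exchange_weak(&l->rw, &seen, seen + 1))
            return;
    }
}

static void std_read_unlock(struct percpu_rwlock *l)
{
    atomic_fetch_sub(&l->rw, 1);
}

static void percpu_read_lock(struct percpu_rwlock *l, unsigned int cpu)
{
    assert(atomic_load(&percpu_reader[cpu]) == NULL);
    /* Fast path: publish the lock in this CPU's own slot only. */
    atomic_store(&percpu_reader[cpu], l);
    if (atomic_load(&l->writer_activating)) {
        /* Barrier is up: back out and revert to the standard read lock,
         * which blocks until the writer has released it. */
        atomic_store(&percpu_reader[cpu], NULL);
        std_read_lock(l);
        atomic_store(&percpu_reader[cpu], l);
        std_read_unlock(l);
    }
}

static void percpu_read_unlock(unsigned int cpu)
{
    atomic_store(&percpu_reader[cpu], NULL);
}

/* True if any CPU's slot still records 'l' as read-locked. */
static bool percpu_readers_active(struct percpu_rwlock *l)
{
    for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
        if (atomic_load(&percpu_reader[cpu]) == l)
            return true;
    return false;
}

static void percpu_write_lock(struct percpu_rwlock *l)
{
    int expected = 0;
    /* Take the standard write lock first to exclude other writers. */
    while (!atomic_compare_exchange_weak(&l->rw, &expected, -1))
        expected = 0;
    /* Raise the barrier so new readers leave the fast path. */
    atomic_store(&l->writer_activating, true);
    /* Poll the per-cpu slots until existing readers have exited. */
    while (percpu_readers_active(l))
        ;
}

static void percpu_write_unlock(struct percpu_rwlock *l)
{
    atomic_store(&l->writer_activating, false);
    atomic_store(&l->rw, 0);
}
```

In the uncontended case a reader touches only its own slot and reads the
barrier flag, so the shared `rw` word is never written. The cost moves to the
writer, which has to scan every CPU's slot, a trade-off that suits locks which
are read far more often than they are written.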

Removing the cache line bouncing on a multi-socket Haswell-EP system
dramatically improves performance, with 16 vCPU network IO performance going
from 15 Gb/s to 64 Gb/s! The host under test was fully utilising all 40
logical CPUs at 64 Gb/s, so a host with more logical CPUs may see an even
better IO improvement.

Note: Benchmarking of these performance improvements should be done with
the non-debug version of the hypervisor, otherwise the map_domain_page spinlock
is the main bottleneck.

Changes in V5:
- Move percpu_owner ASSERTs into an inline function
- Rename grant table rwlock wrappers

Changes in V4:
- Fix the ASSERTs for the percpu_owner check
 
Changes in V3:
- Add percpu rwlock owner for debug Xen builds
- Validate percpu rwlock owner at runtime for debug Xen builds
- Fix hard tab issues
- Use percpu rwlock wrappers for grant table rwlock users
- Add comments explaining why the rw_is_locked ASSERTs have been removed in
  the grant table code

Changes in V2:
- Add cover letter
- Convert p2m rwlock to percpu rwlock
- Improve percpu rwlock to safely handle simultaneously holding 2 or more 
  locks 
- Move percpu rwlock barrier from global to per lock
- Move write lock cpumask variable to a percpu variable
- Add macros to help initialise and use percpu rwlocks
- Updated IO benchmark results to cover revised locking implementation

Malcolm Crossley (3):
  rwlock: Add per-cpu reader-writer lock infrastructure
  grant_table: convert grant table rwlock to percpu rwlock
  p2m: convert p2m rwlock to percpu rwlock

 xen/arch/arm/mm.c             |   4 +-
 xen/arch/x86/mm.c             |   4 +-
 xen/arch/x86/mm/mm-locks.h    |  12 ++--
 xen/arch/x86/mm/p2m.c         |   1 +
 xen/common/grant_table.c      | 126 +++++++++++++++++++++++-------------------
 xen/common/spinlock.c         |  46 +++++++++++++++
 xen/include/asm-arm/percpu.h  |   5 ++
 xen/include/asm-x86/mm.h      |   2 +-
 xen/include/asm-x86/percpu.h  |   6 ++
 xen/include/xen/grant_table.h |  24 +++++++-
 xen/include/xen/percpu.h      |   4 ++
 xen/include/xen/spinlock.h    | 115 ++++++++++++++++++++++++++++++++++++++
 12 files changed, 282 insertions(+), 67 deletions(-)

-- 
1.7.12.4


Thread overview: 14+ messages
2015-12-18 16:08 [PATCHv5 0/3] Implement per-cpu reader-writer locks Malcolm Crossley
2015-12-18 16:08 ` [PATCHv5 1/3] rwlock: Add per-cpu reader-writer lock infrastructure Malcolm Crossley
2015-12-18 16:39   ` Jan Beulich
2015-12-22 11:56   ` George Dunlap
2016-01-11 15:06     ` Malcolm Crossley
2016-01-19 10:29       ` Malcolm Crossley
2016-01-19 12:25         ` George Dunlap
2016-01-20 15:30       ` George Dunlap
2016-01-21 15:17   ` Ian Campbell
2015-12-18 16:08 ` [PATCHv5 2/3] grant_table: convert grant table rwlock to percpu rwlock Malcolm Crossley
2015-12-18 16:40   ` Jan Beulich
2016-01-21 15:31   ` Ian Campbell
2015-12-18 16:08 ` [PATCHv5 3/3] p2m: convert p2m " Malcolm Crossley
2015-12-22 12:07   ` George Dunlap
