* [PATCH v5 0/6] locking/qspinlock: Enhance pvqspinlock performance
@ 2015-08-08  3:17 Waiman Long
From: Waiman Long @ 2015-08-08  3:17 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin
  Cc: x86, linux-kernel, Scott J Norton, Douglas Hatch, Davidlohr Bueso,
	Waiman Long

v4->v5:
 - Rebased the patch to the latest tip tree.
 - Corrected the comments and commit log for patch 1.
 - Removed the v4 patch 5 as PV kick deferment is no longer needed with
   the new tip tree.
 - Simplified the adaptive spinning patch (patch 6) & improved its
   performance a bit further.
 - Re-ran the benchmark test with the new patch.

v3->v4:
 - Patch 1: add comment about possible race condition in PV unlock.
 - Patch 2: simplified the pv_pending_lock() function as suggested by
   Davidlohr.
 - Move PV unlock optimization patch forward to patch 4 & rerun
   performance test.

v2->v3:
 - Moved deferred kicking enablement patch forward & moved back
   the kick-ahead patch to make the effect of kick-ahead more visible.
 - Reworked patch 6 to make it more readable.
 - Reverted back to use state as a tri-state variable instead of
   adding an additional bistate variable.
 - Added performance data for different values of PV_KICK_AHEAD_MAX.
 - Add a new patch to optimize PV unlock code path performance.

v1->v2:
 - Take out the queued unfair lock patches
 - Add a patch to simplify the PV unlock code
 - Move pending bit and statistics collection patches to the front
 - Keep vCPU kicking in pv_kick_node(), but defer it to unlock time
   when appropriate.
 - Change the wait-early patch to use adaptive spinning to better
   balance the different effects on normal and over-committed guests.
 - Add patch-to-patch performance changes in the patch commit logs.

This patchset tries to improve the performance of both normal and
over-committed VM guests. The kick-ahead and adaptive spinning
patches are inspired by the "Do Virtual Machines Really Scale?" blog
post by Sanidhya Kashyap.

Patch 1 simplifies the unlock code by removing the unnecessary
state check.

Patch 2 adds pending bit support to pvqspinlock, improving performance
at light load.

Patch 3 allows the collection of various count data that are useful
to see what is happening in the system. They do add a bit of overhead
when enabled, slowing performance a tiny bit.

Patch 4 optimizes the PV unlock code path performance for the x86-64
architecture.

Patch 5 enables multiple vCPU kick-aheads at unlock time, outside of
the critical section, which can improve performance in over-committed
guests and sometimes even in normal guests.
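
As a rough illustration of the kick-ahead idea only (this is not the
code from patch 5), the user-space C sketch below separates the work
into two passes: pick up to PV_KICK_AHEAD_MAX halted waiters while the
lock is still held, then wake them after the lock has been released.
Apart from PV_KICK_AHEAD_MAX, every name here (struct waiter,
collect_kick_targets(), kick_targets(), count_kick()) is hypothetical,
and the value 4 is chosen purely for the example.

```c
#include <stdbool.h>
#include <stddef.h>

#define PV_KICK_AHEAD_MAX 4	/* illustrative value, not taken from the patch */

/* Hypothetical queue node: one per vCPU waiting for the lock. */
struct waiter {
	int cpu;		/* vCPU number of the waiter */
	bool halted;		/* true if that vCPU has gone to sleep */
	struct waiter *next;	/* next waiter in the queue, or NULL */
};

/*
 * Pass 1 (assumed to run while the lock is still held): walk the queue
 * and record up to 'max' halted waiters. Each recorded waiter is marked
 * as no longer halted so that it is not picked twice.
 * Returns the number of vCPUs recorded in 'cpus'.
 */
size_t collect_kick_targets(struct waiter *head, int *cpus, size_t max)
{
	size_t n = 0;

	for (struct waiter *w = head; w && n < max; w = w->next) {
		if (w->halted) {
			w->halted = false;
			cpus[n++] = w->cpu;
		}
	}
	return n;
}

/*
 * Pass 2 (assumed to run after the lock has been released): wake each
 * recorded vCPU. 'kick' stands in for whatever wake-up call the
 * hypervisor provides.
 */
void kick_targets(const int *cpus, size_t n, void (*kick)(int cpu))
{
	for (size_t i = 0; i < n; i++)
		kick(cpus[i]);
}

/* Stand-in kick callback for experiments: just counts the kicks. */
int kicks_sent;

void count_kick(int cpu)
{
	(void)cpu;
	kicks_sent++;
}
```

The point of the split is that the potentially slow wake-up calls in
kick_targets() happen after the unlock, so the next lock holder does
not have to wait for them.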

Patch 6 enables adaptive spinning in the queue nodes. This patch can
lead to a pretty big performance increase in over-committed guests at
the expense of a slight performance hit in normal guests.

Patches 2 & 4 improve the performance of the common uncontended and
lightly contended cases. Patches 5-6 are for improving performance in
over-committed VM guests.

Performance measurements were done on 32-CPU Westmere-EX and
Haswell-EX systems. The Westmere-EX system got the most performance
gain from patch 5, whereas the Haswell-EX system got the most gain
from patch 6 for over-committed guests.

The table below shows the Linux kernel build times for various
values of PV_KICK_AHEAD_MAX on an over-committed 48-vCPU guest on
the Westmere-EX system:

  PV_KICK_AHEAD_MAX	Patches 1-5	Patches 1-6
  -----------------	-----------	-----------
	  1		  9m46.9s	 11m10.1s
	  2		  9m40.2s	 10m08.3s
	  3		  9m36.8s	  9m49.8s
	  4		  9m35.9s	  9m38.7s
	  5		  9m35.1s	  9m33.0s
	  6		  9m35.7s	  9m28.5s

With patches 1-5, the performance wasn't very sensitive to different
PV_KICK_AHEAD_MAX values. Adding patch 6 into the mix, however, changes
the picture quite dramatically. There is a performance regression if
PV_KICK_AHEAD_MAX is too small. Starting with a value of 4, increasing
PV_KICK_AHEAD_MAX only gets us a minor benefit.
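
To make the comparison between the two columns easier to read, the
small C helper below converts the build times from the table into
seconds and reports how much patches 1-6 differ from patches 1-5 for
each PV_KICK_AHEAD_MAX value. The numbers are copied from the table
above; the helper names (to_seconds(), pct_change(), row_pct(),
print_deltas()) are made up for this example.

```c
#include <stddef.h>
#include <stdio.h>

/* One table row: build times as minutes plus seconds, as printed above. */
struct row {
	int kick_ahead_max;
	int min_1_5;  double sec_1_5;	/* patches 1-5 */
	int min_1_6;  double sec_1_6;	/* patches 1-6 */
};

const struct row rows[] = {
	{ 1, 9, 46.9, 11, 10.1 },
	{ 2, 9, 40.2, 10,  8.3 },
	{ 3, 9, 36.8,  9, 49.8 },
	{ 4, 9, 35.9,  9, 38.7 },
	{ 5, 9, 35.1,  9, 33.0 },
	{ 6, 9, 35.7,  9, 28.5 },
};

/* Convert an "XmY.Ys" build time into seconds. */
double to_seconds(int minutes, double seconds)
{
	return minutes * 60.0 + seconds;
}

/* Change of 'with' relative to 'base' in percent (positive = slower). */
double pct_change(double base, double with)
{
	return (with - base) / base * 100.0;
}

/* Percentage change of patches 1-6 relative to patches 1-5 for one row. */
double row_pct(const struct row *r)
{
	return pct_change(to_seconds(r->min_1_5, r->sec_1_5),
			  to_seconds(r->min_1_6, r->sec_1_6));
}

/* Print the absolute and relative difference for every row. */
void print_deltas(void)
{
	for (size_t i = 0; i < sizeof(rows) / sizeof(rows[0]); i++) {
		const struct row *r = &rows[i];
		double base = to_seconds(r->min_1_5, r->sec_1_5);
		double with = to_seconds(r->min_1_6, r->sec_1_6);

		printf("PV_KICK_AHEAD_MAX=%d: %+.1fs (%+.1f%%)\n",
		       r->kick_ahead_max, with - base, row_pct(r));
	}
}
```

Calling print_deltas() from a main() function prints one line per row.
With a value of 1, for instance, patches 1-6 take 670.1s against 586.9s
for patches 1-5, roughly 14% slower, while from a value of 5 onwards
the difference turns negative, i.e. patches 1-6 become faster.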

Waiman Long (6):
  locking/pvqspinlock: Unconditional PV kick with _Q_SLOW_VAL
  locking/pvqspinlock: Add pending bit support
  locking/pvqspinlock: Collect slowpath lock statistics
  locking/pvqspinlock, x86: Optimize PV unlock code path
  locking/pvqspinlock: Allow vCPUs kick-ahead
  locking/pvqspinlock: Queue node adaptive spinning

 arch/x86/Kconfig                          |    7 +
 arch/x86/include/asm/qspinlock_paravirt.h |   59 ++++
 kernel/locking/qspinlock.c                |   32 ++-
 kernel/locking/qspinlock_paravirt.h       |  475 +++++++++++++++++++++++++++--
 4 files changed, 542 insertions(+), 31 deletions(-)

