From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: Waiman.Long@hp.com, tglx@linutronix.de, mingo@kernel.org
Cc: linux-arch@vger.kernel.org, riel@redhat.com, gleb@redhat.com,
	kvm@vger.kernel.org, konrad.wilk@oracle.com,
	Peter Zijlstra <peterz@infradead.org>,
	scott.norton@hp.com, raghavendra.kt@linux.vnet.ibm.com,
	paolo.bonzini@gmail.com, linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org, chegu_vinod@hp.com,
	david.vrabel@citrix.com, oleg@redhat.com,
	xen-devel@lists.xenproject.org, boris.ostrovsky@oracle.com,
	paulmck@linux.vnet.ibm.com, torvalds@linux-foundation.org
Subject: [PATCH 07/11] qspinlock: Use a simple write to grab the lock, if applicable
Date: Sun, 15 Jun 2014 14:47:04 +0200
Message-ID: <20140615130153.786898559@chello.nl>
In-Reply-To: 20140615124657.264658593@chello.nl

From: Waiman Long <Waiman.Long@hp.com>

Currently, atomic_cmpxchg() is used to get the lock. However, this is
not really necessary if there is more than one task in the queue and
the queue head doesn't need to reset the queue code word. In that case,
a simple write to set the lock bit is enough, as the queue head will
be the only one eligible to get the lock as long as it checks that
both the lock and pending bits are clear. The current pending bit
waiting code ensures that the pending bit will not be set once the
queue code word (tail) in the lock is set.
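
Condensed, the head-of-queue acquisition step then looks like the
sketch below. This is an illustration only, written as self-contained
user-space C11 rather than the patched kernel code (which uses
atomic_t, ACCESS_ONCE() and cpu_relax(); see the diff further down),
and it assumes a little-endian byte order for the byte store:

#include <stdatomic.h>
#include <stdint.h>

#define _Q_LOCKED_VAL	1U

/*
 * Sketch: the queue head claims the lock after having observed both
 * the locked and pending bits clear.  'val' is that observed lock
 * word and 'tail' is this CPU's own tail code.
 */
static void head_acquire(_Atomic uint32_t *lock, uint32_t val,
			 uint32_t tail)
{
	for (;;) {
		if (val != tail) {
			/*
			 * Other CPUs are queued behind us, so the tail
			 * code must be preserved: storing just the lock
			 * byte is enough and needs no atomic
			 * read-modify-write (set_locked() in the patch;
			 * low byte on little-endian).
			 */
			*(volatile uint8_t *)lock = _Q_LOCKED_VAL;
			return;
		}
		/*
		 * We are the only queued CPU: the tail must be cleared
		 * and the lock bit set in a single atomic step, since
		 * a newly arriving CPU may be replacing the tail
		 * concurrently.
		 */
		if (atomic_compare_exchange_strong(lock, &val,
						   _Q_LOCKED_VAL))
			return;	/* no contention */
		/* lost the race; 'val' was updated, try again */
	}
}

In the actual slowpath, the byte-store path falls through to hand the
MCS lock to the next queued CPU, while a successful cmpxchg jumps
straight to releasing the MCS node, as the diff shows.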

With that change, there is a slight improvement in the performance
of the queue spinlock in the 5M loop micro-benchmark run on a 4-socket
Westmere-EX machine, as shown in the tables below.

		[Standalone/Embedded - same node]
  # of tasks	Before patch	After patch	%Change
  ----------	-----------	----------	-------
       3	 2324/2321	2248/2265	 -3%/-2%
       4	 2890/2896	2819/2831	 -2%/-2%
       5	 3611/3595	3522/3512	 -2%/-2%
       6	 4281/4276	4173/4160	 -3%/-3%
       7	 5018/5001	4875/4861	 -3%/-3%
       8	 5759/5750	5563/5568	 -3%/-3%

		[Standalone/Embedded - different nodes]
  # of tasks	Before patch	After patch	%Change
  ----------	-----------	----------	-------
       3	12242/12237	12087/12093	 -1%/-1%
       4	10688/10696	10507/10521	 -2%/-2%

It was also found that this change produced a much bigger performance
improvement on the newer IvyBridge-EX chip, essentially closing the
performance gap between the ticket spinlock and the queue spinlock.

The disk workload of the AIM7 benchmark was run on a 4-socket
Westmere-EX machine with both ext4 and xfs RAM disks at 3000 users
on a 3.14-based kernel. The results of the test runs were:

                AIM7 XFS Disk Test
  kernel                 JPM    Real Time   Sys Time    Usr Time
  -----                  ---    ---------   --------    --------
  ticketlock            5678233    3.17       96.61       5.81
  qspinlock             5750799    3.13       94.83       5.97

                AIM7 EXT4 Disk Test
  kernel                 JPM    Real Time   Sys Time    Usr Time
  -----                  ---    ---------   --------    --------
  ticketlock            1114551   16.15      509.72       7.11
  qspinlock             2184466    8.24      232.99       6.01

The ext4 filesystem run had much higher spinlock contention than the
xfs filesystem run.

The "ebizzy -m" test was also run with the following results:

  kernel               records/s  Real Time   Sys Time    Usr Time
  -----                ---------  ---------   --------    --------
  ticketlock             2075       10.00      216.35       3.49
  qspinlock              3023       10.00      198.20       4.80

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 kernel/locking/qspinlock.c |   59 ++++++++++++++++++++++++++++++++-------------
 1 file changed, 43 insertions(+), 16 deletions(-)

--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -93,24 +93,33 @@ static inline struct mcs_spinlock *decod
  * By using the whole 2nd least significant byte for the pending bit, we
  * can allow better optimization of the lock acquisition for the pending
  * bit holder.
+ *
+ * This internal structure is also used by the set_locked function which
+ * is not restricted to _Q_PENDING_BITS == 8.
  */
-#if _Q_PENDING_BITS == 8
-
 struct __qspinlock {
 	union {
 		atomic_t val;
-		struct {
 #ifdef __LITTLE_ENDIAN
+		u8	 locked;
+		struct {
 			u16	locked_pending;
 			u16	tail;
+		};
 #else
+		struct {
 			u16	tail;
 			u16	locked_pending;
-#endif
 		};
+		struct {
+			u8	reserved[3];
+			u8	locked;
+		};
+#endif
 	};
 };
 
+#if _Q_PENDING_BITS == 8
 /**
  * clear_pending_set_locked - take ownership and clear the pending bit.
  * @lock: Pointer to queue spinlock structure
@@ -197,6 +206,19 @@ static __always_inline u32 xchg_tail(str
 #endif /* _Q_PENDING_BITS == 8 */
 
 /**
+ * set_locked - Set the lock bit and own the lock
+ * @lock: Pointer to queue spinlock structure
+ *
+ * *,*,0 -> *,0,1
+ */
+static __always_inline void set_locked(struct qspinlock *lock)
+{
+	struct __qspinlock *l = (void *)lock;
+
+	ACCESS_ONCE(l->locked) = _Q_LOCKED_VAL;
+}
+
+/**
  * queue_spin_lock_slowpath - acquire the queue spinlock
  * @lock: Pointer to queue spinlock structure
  * @val: Current value of the queue spinlock 32-bit word
@@ -328,10 +350,13 @@ void queue_spin_lock_slowpath(struct qsp
 	/*
 	 * we're at the head of the waitqueue, wait for the owner & pending to
 	 * go away.
+	 * A load-acquire is used here because the set_locked()
+	 * function below may not be a full memory barrier.
 	 *
 	 * *,x,y -> *,0,0
 	 */
-	while ((val = atomic_read(&lock->val)) & _Q_LOCKED_PENDING_MASK)
+	while ((val = smp_load_acquire(&lock->val.counter)) &
+			_Q_LOCKED_PENDING_MASK)
 		cpu_relax();
 
 	/*
@@ -339,15 +364,19 @@ void queue_spin_lock_slowpath(struct qsp
 	 *
 	 * n,0,0 -> 0,0,1 : lock, uncontended
 	 * *,0,0 -> *,0,1 : lock, contended
+	 *
+	 * If the queue head is the only one in the queue (lock value == tail),
+	 * clear the tail code and grab the lock. Otherwise, we only need
+	 * to grab the lock.
 	 */
 	for (;;) {
-		new = _Q_LOCKED_VAL;
-		if (val != tail)
-			new |= val;
-
-		old = atomic_cmpxchg(&lock->val, val, new);
-		if (old == val)
+		if (val != tail) {
+			set_locked(lock);
 			break;
+		}
+		old = atomic_cmpxchg(&lock->val, val, _Q_LOCKED_VAL);
+		if (old == val)
+			goto release;	/* No contention */
 
 		val = old;
 	}
@@ -355,12 +384,10 @@ void queue_spin_lock_slowpath(struct qsp
 	/*
 	 * contended path; wait for next, release.
 	 */
-	if (new != _Q_LOCKED_VAL) {
-		while (!(next = ACCESS_ONCE(node->next)))
-			cpu_relax();
+	while (!(next = ACCESS_ONCE(node->next)))
+		cpu_relax();
 
-		arch_mcs_spin_unlock_contended(&next->locked);
-	}
+	arch_mcs_spin_unlock_contended(&next->locked);
 
 release:
 	/*
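
As a footnote to the diff: the single-byte store in set_locked() is
only safe because the union overlay places the locked byte at the
least significant end of the 32-bit lock word (byte 0 on
little-endian, byte 3 on big-endian). Below is a minimal user-space
sketch of the little-endian case, using hypothetical names and plain
C types rather than the kernel's:

#include <stdint.h>
#include <stdio.h>

/* Hypothetical user-space stand-in for struct __qspinlock,
 * little-endian layout with _Q_PENDING_BITS == 8. */
union qword {
	uint32_t val;
	struct {
		uint8_t  locked;	/* bits  0-7  */
		uint8_t  pending;	/* bits  8-15 */
		uint16_t tail;		/* bits 16-31 */
	};
};

int main(void)
{
	union qword w = { .val = 0x00030000 };	/* tail set, unlocked */

	w.locked = 1;	/* the set_locked() byte store */
	printf("val = %#x\n", (unsigned)w.val);	/* 0x30001: tail intact */
	return 0;
}

On big-endian, the overlay instead inserts three reserved bytes ahead
of the locked byte so that it still aliases the least significant byte
of the value, as the big-endian arm of the union above shows.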
