linux-arch.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Waiman Long <waiman.long@hp.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arch@vger.kernel.org, Rik van Riel <riel@redhat.com>,
	Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>,
	Oleg Nesterov <oleg@redhat.com>,
	kvm@vger.kernel.org,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Daniel J Blueman <daniel@numascale.com>,
	x86@kernel.org, Paolo Bonzini <paolo.bonzini@gmail.com>,
	linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org,
	Scott J Norton <scott.norton@hp.com>,
	Ingo Molnar <mingo@redhat.com>,
	David Vrabel <david.vrabel@citrix.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	xen-devel@lists.xenproject.org,
	Thomas Gleixner <tglx@linutronix.de>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	Douglas Hatch <doug.hatch@hp.com>
Subject: Re: [PATCH v15 09/15] pvqspinlock: Implement simple paravirt support for the qspinlock
Date: Mon, 13 Apr 2015 11:51:36 -0400	[thread overview]
Message-ID: <552BE608.7000901@hp.com> (raw)
In-Reply-To: <20150413150832.GH5029@twins.programming.kicks-ass.net>

On 04/13/2015 11:08 AM, Peter Zijlstra wrote:
> On Thu, Apr 09, 2015 at 05:41:44PM -0400, Waiman Long wrote:
>
>>>> +static void pv_wait_head(struct qspinlock *lock, struct mcs_spinlock *node)
>>>> +{
>>>> +	struct __qspinlock *l = (void *)lock;
>>>> +	struct qspinlock **lp = NULL;
>>>> +	struct pv_node *pn = (struct pv_node *)node;
>>>> +	int slow_set = false;
>>>> +	int loop;
>>>> +
>>>> +	for (;;) {
>>>> +		for (loop = SPIN_THRESHOLD; loop; loop--) {
>>>> +			if (!READ_ONCE(l->locked))
>>>> +				return;
>>>> +
>>>> +			cpu_relax();
>>>> +		}
>>>> +
>>>> +		WRITE_ONCE(pn->state, vcpu_halted);
>>>> +		if (!lp)
>>>> +			lp = pv_hash(lock, pn);
>>>> +		/*
>>>> +		 * lp must be set before setting _Q_SLOW_VAL
>>>> +		 *
>>>> +		 * [S] lp = lock                [RmW] l = l->locked = 0
>>>> +		 *     MB                             MB
>>>> +		 * [S] l->locked = _Q_SLOW_VAL  [L]   lp
>>>> +		 *
>>>> +		 * Matches the cmpxchg() in pv_queue_spin_unlock().
>>>> +		 */
>>>> +		if (!slow_set&&
>>>> +		    !cmpxchg(&l->locked, _Q_LOCKED_VAL, _Q_SLOW_VAL)) {
>>>> +			/*
>>>> +			 * The lock is free and _Q_SLOW_VAL has never been
>>>> +			 * set. Need to clear the hash bucket before getting
>>>> +			 * the lock.
>>>> +			 */
>>>> +			WRITE_ONCE(*lp, NULL);
>>>> +			return;
>>>> +		} else if (slow_set&&   !READ_ONCE(l->locked))
>>>> +			return;
>>>> +		slow_set = true;
>>> I'm somewhat puzzled by the slow_set thing; what is wrong with the thing
>>> I had, namely:
>>>
>>> 		if (!lp) {
>>> 			lp = pv_hash(lock, pn);
>>>
>>> 			/*
>>> 			 * comment
>>> 			 */
>>> 			lv = cmpxchg(&l->locked, _Q_LOCKED_VAL, _Q_SLOW_VAL);
>>> 			if (lv != _Q_LOCKED_VAL) {
>>> 				/* we're woken, unhash and return */
>>> 				WRITE_ONCE(*lp, NULL);
>>> 				return;
>>> 			}
>>> 		}
>>>> +
>>>> +		pv_wait(&l->locked, _Q_SLOW_VAL);
>>> If we get a spurious wakeup (due to device interrupts or random kick)
>>> we'll loop around but ->locked will remain _Q_SLOW_VAL.
>> The purpose of the slow_set flag is not about the lock value. It is to make
>> sure that pv_hash_find() will always find a match. Consider the following
>> scenario:
>>
>> cpu1            cpu2                    cpu3
>> ----            ----                    ----
>> pv_wait
>> spurious wakeup
>> loop l->locked
>>
>>                  read _Q_SLOW_VAL
>>                  pv_hash_find()
>>                  unlock
>>
>>                                          pv_hash()<- same entry
>>
>> cmpxchg(&l->locked)
>> clear hash (?)
>>
>> Here, the entry for cpu3 will be removed leading to panic when
>> pv_hash_find() can find the entry. So the hash entry can only be
>> removed if the other cpu has no chance to see _Q_SLOW_VAL.
> Still confused. Afaict that cannot happen with my code. We only do the
> cmpxchg(, _Q_SLOW_VAL) _once_.
>
> Only on the first time around that loop will we hash the lock and set
> the slow flag. And cpu3 cannot come in on the same entry because we've
> not yet released the lock when we find and unhash.
>
>

Maybe I am not clear in my illustration. More than one locks can be 
hashed to the same value. So cpu3 is accessing a different lock which 
has the same hashed value as the lock used by cpu1 and cpu2.

Anyway, I remove the slow_set flag by unrolling the retry loop so that 
after pv_wait(), it goes into the 2nd loop instead of going back to the 
top. As a result, cmpxchg and pv_hash can only be called once.

Cheers,
Longman

WARNING: multiple messages have this Message-ID (diff)
From: Waiman Long <waiman.long@hp.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	linux-arch@vger.kernel.org, x86@kernel.org,
	linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org,
	xen-devel@lists.xenproject.org, kvm@vger.kernel.org,
	Paolo Bonzini <paolo.bonzini@gmail.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Rik van Riel <riel@redhat.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>,
	David Vrabel <david.vrabel@citrix.com>,
	Oleg Nesterov <oleg@redhat.com>,
	Daniel J Blueman <daniel@numascale.com>,
	Scott J Norton <scott.norton@hp.com>,
	Douglas Hatch <doug.hatch@hp.com>
Subject: Re: [PATCH v15 09/15] pvqspinlock: Implement simple paravirt support for the qspinlock
Date: Mon, 13 Apr 2015 11:51:36 -0400	[thread overview]
Message-ID: <552BE608.7000901@hp.com> (raw)
Message-ID: <20150413155136.AggeSRc2JAS8TofkHhISFogd25ZVT1yz8v3ldinqHgs@z> (raw)
In-Reply-To: <20150413150832.GH5029@twins.programming.kicks-ass.net>

On 04/13/2015 11:08 AM, Peter Zijlstra wrote:
> On Thu, Apr 09, 2015 at 05:41:44PM -0400, Waiman Long wrote:
>
>>>> +static void pv_wait_head(struct qspinlock *lock, struct mcs_spinlock *node)
>>>> +{
>>>> +	struct __qspinlock *l = (void *)lock;
>>>> +	struct qspinlock **lp = NULL;
>>>> +	struct pv_node *pn = (struct pv_node *)node;
>>>> +	int slow_set = false;
>>>> +	int loop;
>>>> +
>>>> +	for (;;) {
>>>> +		for (loop = SPIN_THRESHOLD; loop; loop--) {
>>>> +			if (!READ_ONCE(l->locked))
>>>> +				return;
>>>> +
>>>> +			cpu_relax();
>>>> +		}
>>>> +
>>>> +		WRITE_ONCE(pn->state, vcpu_halted);
>>>> +		if (!lp)
>>>> +			lp = pv_hash(lock, pn);
>>>> +		/*
>>>> +		 * lp must be set before setting _Q_SLOW_VAL
>>>> +		 *
>>>> +		 * [S] lp = lock                [RmW] l = l->locked = 0
>>>> +		 *     MB                             MB
>>>> +		 * [S] l->locked = _Q_SLOW_VAL  [L]   lp
>>>> +		 *
>>>> +		 * Matches the cmpxchg() in pv_queue_spin_unlock().
>>>> +		 */
>>>> +		if (!slow_set&&
>>>> +		    !cmpxchg(&l->locked, _Q_LOCKED_VAL, _Q_SLOW_VAL)) {
>>>> +			/*
>>>> +			 * The lock is free and _Q_SLOW_VAL has never been
>>>> +			 * set. Need to clear the hash bucket before getting
>>>> +			 * the lock.
>>>> +			 */
>>>> +			WRITE_ONCE(*lp, NULL);
>>>> +			return;
>>>> +		} else if (slow_set&&   !READ_ONCE(l->locked))
>>>> +			return;
>>>> +		slow_set = true;
>>> I'm somewhat puzzled by the slow_set thing; what is wrong with the thing
>>> I had, namely:
>>>
>>> 		if (!lp) {
>>> 			lp = pv_hash(lock, pn);
>>>
>>> 			/*
>>> 			 * comment
>>> 			 */
>>> 			lv = cmpxchg(&l->locked, _Q_LOCKED_VAL, _Q_SLOW_VAL);
>>> 			if (lv != _Q_LOCKED_VAL) {
>>> 				/* we're woken, unhash and return */
>>> 				WRITE_ONCE(*lp, NULL);
>>> 				return;
>>> 			}
>>> 		}
>>>> +
>>>> +		pv_wait(&l->locked, _Q_SLOW_VAL);
>>> If we get a spurious wakeup (due to device interrupts or random kick)
>>> we'll loop around but ->locked will remain _Q_SLOW_VAL.
>> The purpose of the slow_set flag is not about the lock value. It is to make
>> sure that pv_hash_find() will always find a match. Consider the following
>> scenario:
>>
>> cpu1            cpu2                    cpu3
>> ----            ----                    ----
>> pv_wait
>> spurious wakeup
>> loop l->locked
>>
>>                  read _Q_SLOW_VAL
>>                  pv_hash_find()
>>                  unlock
>>
>>                                          pv_hash()<- same entry
>>
>> cmpxchg(&l->locked)
>> clear hash (?)
>>
>> Here, the entry for cpu3 will be removed leading to panic when
>> pv_hash_find() can find the entry. So the hash entry can only be
>> removed if the other cpu has no chance to see _Q_SLOW_VAL.
> Still confused. Afaict that cannot happen with my code. We only do the
> cmpxchg(, _Q_SLOW_VAL) _once_.
>
> Only on the first time around that loop will we hash the lock and set
> the slow flag. And cpu3 cannot come in on the same entry because we've
> not yet released the lock when we find and unhash.
>
>

Maybe I am not clear in my illustration. More than one locks can be 
hashed to the same value. So cpu3 is accessing a different lock which 
has the same hashed value as the lock used by cpu1 and cpu2.

Anyway, I remove the slow_set flag by unrolling the retry loop so that 
after pv_wait(), it goes into the 2nd loop instead of going back to the 
top. As a result, cmpxchg and pv_hash can only be called once.

Cheers,
Longman

  reply	other threads:[~2015-04-13 15:51 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-07  2:55 [PATCH v15 00/15] qspinlock: a 4-byte queue spinlock with PV support Waiman Long
2015-04-07  2:55 ` [PATCH v15 01/15] qspinlock: A simple generic 4-byte queue spinlock Waiman Long
2015-04-07  2:55   ` Waiman Long
2015-04-07  2:55 ` [PATCH v15 02/15] qspinlock, x86: Enable x86-64 to use " Waiman Long
2015-04-07  2:55   ` Waiman Long
2015-04-07  2:55 ` [PATCH v15 03/15] qspinlock: Add pending bit Waiman Long
2015-04-07  2:55   ` Waiman Long
2015-04-07  2:55 ` [PATCH v15 04/15] qspinlock: Extract out code snippets for the next patch Waiman Long
2015-04-07  2:55   ` Waiman Long
2015-04-07  2:55 ` [PATCH v15 05/15] qspinlock: Optimize for smaller NR_CPUS Waiman Long
2015-04-07  2:55   ` Waiman Long
2015-04-07  2:55 ` [PATCH v15 06/15] qspinlock: Use a simple write to grab the lock Waiman Long
2015-04-07  2:55   ` Waiman Long
2015-04-07  2:55 ` [PATCH v15 07/15] qspinlock: Revert to test-and-set on hypervisors Waiman Long
2015-04-07  2:55   ` Waiman Long
2015-04-07  2:55 ` [PATCH v15 08/15] lfsr: a simple binary Galois linear feedback shift register Waiman Long
2015-04-07  2:55   ` Waiman Long
2015-04-07  2:55 ` [PATCH v15 09/15] pvqspinlock: Implement simple paravirt support for the qspinlock Waiman Long
2015-04-07  2:55   ` Waiman Long
2015-04-09 18:13   ` Peter Zijlstra
2015-04-09 18:13     ` Peter Zijlstra
2015-04-09 18:23     ` Peter Zijlstra
2015-04-09 18:23       ` Peter Zijlstra
2015-04-09 20:36       ` Waiman Long
2015-04-09 20:36         ` Waiman Long
2015-04-09 21:41     ` Waiman Long
2015-04-09 21:41       ` Waiman Long
2015-04-13 14:47       ` Peter Zijlstra
2015-04-13 15:45         ` Waiman Long
2015-04-13 15:45           ` Waiman Long
2015-04-13 15:08       ` Peter Zijlstra
2015-04-13 15:51         ` Waiman Long [this message]
2015-04-13 15:51           ` Waiman Long
2015-04-13 15:09       ` Peter Zijlstra
2015-04-13 16:19         ` Waiman Long
2015-04-07  2:55 ` [PATCH v15 10/15] pvqspinlock: Implement the paravirt qspinlock for x86 Waiman Long
2015-04-07  2:55   ` Waiman Long
2015-04-07  2:55 ` [PATCH v15 11/15] pvqspinlock, x86: Enable PV qspinlock for KVM Waiman Long
2015-04-07  2:55   ` Waiman Long
2015-04-07  2:55 ` [PATCH v15 12/15] pvqspinlock, x86: Enable PV qspinlock for Xen Waiman Long
2015-04-07  2:55   ` Waiman Long
2015-04-08 12:01   ` [Xen-devel] " David Vrabel
2015-04-08 12:01     ` David Vrabel
2015-04-08 17:42     ` Waiman Long
2015-04-07  2:55 ` [PATCH v15 13/15] pvqspinlock: Only kick CPU at unlock time Waiman Long
2015-04-07  2:55   ` Waiman Long
2015-04-09 19:57   ` Peter Zijlstra
2015-04-09 20:07     ` Peter Zijlstra
2015-04-09 20:07       ` Peter Zijlstra
2015-04-09 22:06     ` Waiman Long
2015-04-07  2:55 ` [PATCH v15 14/15] pvqspinlock: Improve slowpath performance by avoiding cmpxchg Waiman Long
2015-04-07  2:55   ` Waiman Long
2015-04-07  2:55 ` [PATCH v15 15/15] pvqspinlock: Add debug code to check for PV lock hash sanity Waiman Long
2015-04-07  2:55   ` Waiman Long

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=552BE608.7000901@hp.com \
    --to=waiman.long@hp.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=daniel@numascale.com \
    --cc=david.vrabel@citrix.com \
    --cc=doug.hatch@hp.com \
    --cc=hpa@zytor.com \
    --cc=konrad.wilk@oracle.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=oleg@redhat.com \
    --cc=paolo.bonzini@gmail.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=raghavendra.kt@linux.vnet.ibm.com \
    --cc=riel@redhat.com \
    --cc=scott.norton@hp.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=virtualization@lists.linux-foundation.org \
    --cc=x86@kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).