From: Waiman Long <waiman.long@hp.com>
To: Davidlohr Bueso <dave@stgolabs.net>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	x86@kernel.org, linux-kernel@vger.kernel.org,
	Scott J Norton <scott.norton@hp.com>,
	Douglas Hatch <doug.hatch@hp.com>
Subject: Re: [PATCH v3 2/7] locking/pvqspinlock: Add pending bit support
Date: Mon, 27 Jul 2015 13:30:51 -0400
Message-ID: <55B66ACB.6010702@hp.com>
In-Reply-To: <1437958584.25997.27.camel@stgolabs.net>

On 07/26/2015 08:56 PM, Davidlohr Bueso wrote:
> On Wed, 2015-07-22 at 16:12 -0400, Waiman Long wrote:
>> Like the native qspinlock, using the pending bit to acquire the lock
>> under light load is faster than going through the PV queuing
>> process, which is even slower than the native queuing process. It also
>> avoids loading two additional cachelines (the MCS and PV nodes).
>>
>> This patch adds pending bit support to the PV qspinlock. The pending
>> bit code has a smaller spin threshold (1<<10). It will fall back
>> to the queuing method if it cannot acquire the lock within a certain
>> time limit.
> Can we infer that this new spin threshold is the metric to detect these
> "light loads"? If so, I cannot help but wonder if there is some more
> straightforward/ad-hoc way of detecting this, ie some pv_<>  function.
> That would also save a lot of time as it would not be time based.
> Although it might be a more costly call altogether, I dunno.

I used the term "light load" to refer to the condition where at most
two competing threads are trying to acquire the lock. In that case, the
pending code will be used. Once there are three or more competing
threads, it switches back to the regular queuing code. It is the same
mechanism used in the native code; the only difference is the addition
of a loop counter to make sure that a thread won't spend too much time
spinning.
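
For reference, the qspinlock word layout (from
include/asm-generic/qspinlock_types.h, NR_CPUS < 16K case) is:

	/*
	 *  0- 7: locked byte
	 *     8: pending
	 *  9-15: not used
	 * 16-17: tail index
	 * 18-31: tail cpu (+1)
	 */

A third contender always sees a non-zero value above the locked byte
(the pending bit or a queue tail), so it goes straight to the queue
path; at most the lock holder plus one pending waiter ever spin on the
lock word itself.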

> Some comments about this 'loop' threshold.
>
>> +static int pv_pending_lock(struct qspinlock *lock, u32 val)
>> +{
>> +	int loop = PENDING_SPIN_THRESHOLD;
>> +	u32 new, old;
>> +
>> +	/*
>> +	 * wait for in-progress pending->locked hand-overs
>> +	 */
>> +	if (val == _Q_PENDING_VAL) {
>> +		while (((val = atomic_read(&lock->val)) == _Q_PENDING_VAL) &&
>> +			loop--)
>> +			cpu_relax();
>> +	}
>> +
>> +	/*
>> +	 * trylock || pending
>> +	 */
>> +	for (;;) {
>> +		if (val & ~_Q_LOCKED_MASK)
>> +			goto queue;
>> +		new = _Q_LOCKED_VAL;
>> +		if (val == new)
>> +			new |= _Q_PENDING_VAL;
>> +		old = atomic_cmpxchg(&lock->val, val, new);
>> +		if (old == val)
>> +			break;
>> +		if (loop-- <= 0)
>> +			goto queue;
>> +	}
> So I'm not clear about the semantics of what (should) occur when the
> threshold is exhausted. In the trylock/pending loop above, you
> immediately return 0, indicating we want to queue. Ok, but below:

This is in the lock slowpath, so it can't return a lock failure.
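
To illustrate the calling convention (a hypothetical caller sketch, not
the exact patch code): a 0 return from pv_pending_lock() only means
"fall back to the MCS queue", never a failed lock:

	if (pv_pending_lock(lock, val))
		return;	/* lock acquired via trylock or pending bit */
	/* otherwise, continue with the regular queuing slowpath */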

>> +
>> +	if (new == _Q_LOCKED_VAL)
>> +		goto gotlock;
>> +	/*
>> +	 * We are pending, wait for the owner to go away.
>> +	 */
>> +	while (((val = smp_load_acquire(&lock->val.counter)) & _Q_LOCKED_MASK)
>> +		&& (loop-- > 0))
>> +		cpu_relax();
>> +
>> +	if (!(val & _Q_LOCKED_MASK)) {
>> +		clear_pending_set_locked(lock);
>> +		goto gotlock;
>> +	}
>> +	/*
>> +	 * Clear the pending bit and fall back to queuing
>> +	 */
>> +	clear_pending(lock);
> ... you call clear_pending before returning. Is this intentional? Smells
> fishy.

The pending bit acts as a 1-slot waiting queue. So if the vCPU needs to
fall back to regular queuing, it has to clear the bit first so that the
slot is free for another lock waiter.
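
For illustration, one plausible clear_pending() (a sketch assuming the
little-endian NR_CPUS < 16K layout, where the pending bit occupies its
own byte; the patch's actual helper may differ):

	static __always_inline void clear_pending(struct qspinlock *lock)
	{
		struct __qspinlock *l = (void *)lock;

		/* give up the 1-slot pending queue before queuing */
		WRITE_ONCE(l->pending, 0);
	}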

>
> And basically afaict all this chunk of code does is spin until loop is
> exhausted, and break out when we get the lock. Ie, something like this
> is a lot cleaner:
>
>                  while (loop--) {
>                  	/*
>                           * We are pending, wait for the owner to go away.
>                           */
>                  	val = smp_load_acquire(&lock->val.counter);
>                  	if (!(val & _Q_LOCKED_MASK)) {
>                  		clear_pending_set_locked(lock);
>                  		goto gotlock;
>                  	}
>
>                  	cpu_relax();
>                  }
>
>                  /*
>                   * Clear the pending bit and fall back to queuing
>                   */
>                  clear_pending(lock);
>

Yes, we could change the loop to that. I was just following the same 
logic as in the native code.

Cheers,
Longman

