* [PATCH v3] locking/pvqspinlock: restore/set vcpu_hashed state after failing adaptive locking spinning
@ 2016-07-14 11:39 Wanpeng Li
  2016-07-14 14:52 ` Waiman Long
From: Wanpeng Li @ 2016-07-14 11:39 UTC (permalink / raw)
  To: linux-kernel
  Cc: Wanpeng Li, Peter Zijlstra (Intel), Ingo Molnar, Waiman Long,
	Davidlohr Bueso

From: Wanpeng Li <wanpeng.li@hotmail.com>

When the lock holder vCPU is racing with the queue head:

   CPU 0 (lock holder)    CPU 1 (queue head)
   ===================    =================
   spin_lock();           spin_lock();
    pv_kick_node():        pv_wait_head_or_lock():
                            if (!lp) {
                             lp = pv_hash(lock, pn);
                             xchg(&l->locked, _Q_SLOW_VAL);
                            }
                            WRITE_ONCE(pn->state, vcpu_halted);
     cmpxchg(&pn->state, 
      vcpu_halted, vcpu_hashed);
     WRITE_ONCE(l->locked, _Q_SLOW_VAL);
     (void)pv_hash(lock, pn);

In this case, the lock holder inserts the pv_node of the queue head into the
hash table and sets _Q_SLOW_VAL, which can result in a hash entry leak.
This patch avoids the leak by restoring/setting the vcpu_hashed state after
adaptive lock spinning fails.

Reviewed-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Waiman Long <Waiman.Long@hpe.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
v2 -> v3:
 * fix typo in patch description
v1 -> v2:
 * adjust patch description

 kernel/locking/qspinlock_paravirt.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
index 21ede57..ac7d20b 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -450,7 +450,7 @@ pv_wait_head_or_lock(struct qspinlock *lock, struct mcs_spinlock *node)
 				goto gotlock;
 			}
 		}
-		WRITE_ONCE(pn->state, vcpu_halted);
+		WRITE_ONCE(pn->state, vcpu_hashed);
 		qstat_inc(qstat_pv_wait_head, true);
 		qstat_inc(qstat_pv_wait_again, waitcnt);
 		pv_wait(&l->locked, _Q_SLOW_VAL);
-- 
1.9.1
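
To make the leak concrete, here is a small, hypothetical user-space model of
the interleaving above. It is illustrative only, not kernel code: the vcpu_*
states, pv_hash() and the _Q_* values mirror the names used in the diagram,
and the hash table is reduced to an entry counter so the double hash is
visible.

#include <stdio.h>
#include <stdatomic.h>

enum vcpu_state { vcpu_running, vcpu_halted, vcpu_hashed };

#define _Q_LOCKED_VAL  1        /* illustrative values */
#define _Q_SLOW_VAL    3

static int hash_entries;        /* stand-in for the pv hash table */
static _Atomic int node_state;  /* pn->state */
static _Atomic int locked;      /* l->locked */

static void pv_hash(void) { hash_entries++; }

/* Queue head: tail of pv_wait_head_or_lock() just before pv_wait(). */
static void queue_head_prepares_to_wait(int state_to_write)
{
        pv_hash();                                 /* lp = pv_hash(lock, pn);        */
        atomic_exchange(&locked, _Q_SLOW_VAL);     /* xchg(&l->locked, _Q_SLOW_VAL); */
        atomic_store(&node_state, state_to_write); /* WRITE_ONCE(pn->state, ...);    */
}

/* Lock holder: hashing path of pv_kick_node(). */
static void lock_holder_kicks_node(void)
{
        int expected = vcpu_halted;

        /* cmpxchg(&pn->state, vcpu_halted, vcpu_hashed) */
        if (!atomic_compare_exchange_strong(&node_state, &expected, vcpu_hashed))
                return;                            /* queue head already hashed */

        atomic_store(&locked, _Q_SLOW_VAL);        /* WRITE_ONCE(l->locked, _Q_SLOW_VAL); */
        pv_hash();                                 /* (void)pv_hash(lock, pn);            */
}

/* Replay the racy interleaving from the diagram, sequentially. */
static int run(int state_to_write)
{
        hash_entries = 0;
        atomic_store(&node_state, vcpu_running);
        atomic_store(&locked, _Q_LOCKED_VAL);

        queue_head_prepares_to_wait(state_to_write);
        lock_holder_kicks_node();
        return hash_entries;
}

int main(void)
{
        printf("old code (vcpu_halted): %d hash entries\n", run(vcpu_halted)); /* 2 -> leak */
        printf("fixed    (vcpu_hashed): %d hash entries\n", run(vcpu_hashed)); /* 1         */
        return 0;
}

With vcpu_halted, the lock holder's cmpxchg() succeeds and pv_hash() runs a
second time for the same lock, which is the leaked hash entry; with
vcpu_hashed, the cmpxchg() fails and the second hash never happens.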


* Re: [PATCH v3] locking/pvqspinlock: restore/set vcpu_hashed state after failing adaptive locking spinning
  2016-07-14 11:39 [PATCH v3] locking/pvqspinlock: restore/set vcpu_hashed state after failing adaptive locking spinning Wanpeng Li
@ 2016-07-14 14:52 ` Waiman Long
  2016-07-14 21:26   ` Wanpeng Li
From: Waiman Long @ 2016-07-14 14:52 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: linux-kernel, Wanpeng Li, Peter Zijlstra (Intel), Ingo Molnar,
	Davidlohr Bueso

On 07/14/2016 07:39 AM, Wanpeng Li wrote:
> From: Wanpeng Li<wanpeng.li@hotmail.com>
>
> When the lock holder vCPU is racing with the queue head:
>
>     CPU 0 (lock holder)    CPU 1 (queue head)
>     ===================    =================
>     spin_lock();           spin_lock();
>      pv_kick_node():        pv_wait_head_or_lock():
>                              if (!lp) {
>                               lp = pv_hash(lock, pn);
>                               xchg(&l->locked, _Q_SLOW_VAL);
>                              }
>                              WRITE_ONCE(pn->state, vcpu_halted);
>       cmpxchg(&pn->state,
>        vcpu_halted, vcpu_hashed);
>       WRITE_ONCE(l->locked, _Q_SLOW_VAL);
>       (void)pv_hash(lock, pn);
>
> In this case, the lock holder inserts the pv_node of the queue head into the
> hash table and sets _Q_SLOW_VAL, which can result in a hash entry leak.
> This patch avoids the leak by restoring/setting the vcpu_hashed state after
> adaptive lock spinning fails.
>
> Reviewed-by: Pan Xinhui<xinhui.pan@linux.vnet.ibm.com>
> Cc: Peter Zijlstra (Intel)<peterz@infradead.org>
> Cc: Ingo Molnar<mingo@kernel.org>
> Cc: Waiman Long<Waiman.Long@hpe.com>
> Cc: Davidlohr Bueso<dave@stgolabs.net>
> Signed-off-by: Wanpeng Li<wanpeng.li@hotmail.com>
> ---
> v2 ->  v3:
>   * fix typo in patch description
> v1 ->  v2:
>   * adjust patch description
>
>   kernel/locking/qspinlock_paravirt.h | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
> index 21ede57..ac7d20b 100644
> --- a/kernel/locking/qspinlock_paravirt.h
> +++ b/kernel/locking/qspinlock_paravirt.h
> @@ -450,7 +450,7 @@ pv_wait_head_or_lock(struct qspinlock *lock, struct mcs_spinlock *node)
>   				goto gotlock;
>   			}
>   		}
> -		WRITE_ONCE(pn->state, vcpu_halted);
> +		WRITE_ONCE(pn->state, vcpu_hashed);
>   		qstat_inc(qstat_pv_wait_head, true);
>   		qstat_inc(qstat_pv_wait_again, waitcnt);
>   		pv_wait(&l->locked, _Q_SLOW_VAL);

As pv_kick_node() is called immediately after designating the next node 
as the queue head, the chance of this racing is possible, but is not 
likely unless the lock holder vCPU gets preempted for a long time at 
that right moment. This change does not do any harm though, so I am OK 
with that. However, I do want you to add a comment about the possible 
race in the code as it isn't that obvious or likely.

Cheers,
Longman
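
For reference, the pv_kick_node() path discussed above does roughly the
following. This is a paraphrase assembled from the operations shown in the
commit message's race diagram, not the verbatim source; the actual function
in kernel/locking/qspinlock_paravirt.h carries additional comments and
barrier documentation.

static void pv_kick_node(struct qspinlock *lock, struct mcs_spinlock *node)
{
        struct pv_node *pn = (struct pv_node *)node;
        struct __qspinlock *l = (void *)lock;

        /*
         * Only when the queue head is still vcpu_halted does the lock
         * holder hash the lock on its behalf.  If the cmpxchg() fails
         * (the state is already vcpu_hashed), nothing is hashed here.
         */
        if (cmpxchg(&pn->state, vcpu_halted, vcpu_hashed) != vcpu_halted)
                return;

        /* The queue head was halted: publish _Q_SLOW_VAL and hash the lock. */
        WRITE_ONCE(l->locked, _Q_SLOW_VAL);
        (void)pv_hash(lock, pn);
}

The race in the commit message is exactly the window where the queue head has
already hashed the lock but its state still reads vcpu_halted, so this
cmpxchg() succeeds and pv_hash() runs a second time.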


* Re: [PATCH v3] locking/pvqspinlock: restore/set vcpu_hashed state after failing adaptive locking spinning
  2016-07-14 14:52 ` Waiman Long
@ 2016-07-14 21:26   ` Wanpeng Li
  2016-07-15  7:09     ` Peter Zijlstra
From: Wanpeng Li @ 2016-07-14 21:26 UTC (permalink / raw)
  To: Waiman Long
  Cc: linux-kernel@vger.kernel.org, Wanpeng Li, Peter Zijlstra (Intel),
	Ingo Molnar, Davidlohr Bueso

2016-07-14 22:52 GMT+08:00 Waiman Long <waiman.long@hpe.com>:
[...]
> As pv_kick_node() is called immediately after designating the next node as
> the queue head, the chance of this racing is possible, but is not likely
> unless the lock holder vCPU gets preempted for a long time at that right
> moment. This change does not do any harm though, so I am OK with that.
> However, I do want you to add a comment about the possible race in the code
> as it isn't that obvious or likely.

How about something like:

/*
 * If the lock holder vCPU gets preempted for a long time, pv_kick_node will
 * advance its state and hash the lock, restore/set the vcpu_hashed state to
 * avoid the race.
 */

Btw, do you think the patch title should be improved? What would you prefer?

Regards,
Wanpeng Li


* Re: [PATCH v3] locking/pvqspinlock: restore/set vcpu_hashed state after failing adaptive locking spinning
  2016-07-14 21:26   ` Wanpeng Li
@ 2016-07-15  7:09     ` Peter Zijlstra
  2016-07-15  7:45       ` Wanpeng Li
From: Peter Zijlstra @ 2016-07-15  7:09 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Waiman Long, linux-kernel@vger.kernel.org, Wanpeng Li,
	Ingo Molnar, Davidlohr Bueso

On Fri, Jul 15, 2016 at 05:26:40AM +0800, Wanpeng Li wrote:
> 2016-07-14 22:52 GMT+08:00 Waiman Long <waiman.long@hpe.com>:
> [...]
> > As pv_kick_node() is called immediately after designating the next node as
> > the queue head, the chance of this racing is possible, but is not likely
> > unless the lock holder vCPU gets preempted for a long time at that right
> > moment. This change does not do any harm though, so I am OK with that.
> > However, I do want you to add a comment about the possible race in the code
> > as it isn't that obvious or likely.
> 
> How about something like:
> 
> /*
>  * If the lock holder vCPU gets preempted for a long time, pv_kick_node will
>  * advance its state and hash the lock, restore/set the vcpu_hashed state to
>  * avoid the race.
>  */

So I'm not sure. Yes it was a bug, but it's fairly 'obvious' it should be
vcpu_hashed, we did after all hash the thing.

> Btw, do you think the patch title should be improved? What would you prefer?

I changed it to: "locking/pvqspinlock: Fix double hash race"


* Re: [PATCH v3] locking/pvqspinlock: restore/set vcpu_hashed state after failing adaptive locking spinning
  2016-07-15  7:09     ` Peter Zijlstra
@ 2016-07-15  7:45       ` Wanpeng Li
  2016-07-15 16:44         ` Waiman Long
From: Wanpeng Li @ 2016-07-15  7:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Waiman Long, linux-kernel@vger.kernel.org, Wanpeng Li,
	Ingo Molnar, Davidlohr Bueso

2016-07-15 15:09 GMT+08:00 Peter Zijlstra <peterz@infradead.org>:
> On Fri, Jul 15, 2016 at 05:26:40AM +0800, Wanpeng Li wrote:
>> 2016-07-14 22:52 GMT+08:00 Waiman Long <waiman.long@hpe.com>:
>> [...]
>> > As pv_kick_node() is called immediately after designating the next node as
>> > the queue head, the chance of this racing is possible, but is not likely
>> > unless the lock holder vCPU gets preempted for a long time at that right
>> > moment. This change does not do any harm though, so I am OK with that.
>> > However, I do want you to add a comment about the possible race in the code
>> > as it isn't that obvious or likely.
>>
>> How about something like:
>>
>> /*
>>  * If the lock holder vCPU gets preempted for a long time, pv_kick_node will
>>  * advance its state and hash the lock, restore/set the vcpu_hashed state to
>>  * avoid the race.
>>  */
>
> So I'm not sure. Yes it was a bug, but it's fairly 'obvious' it should be

I believe Waiman can give a better comment. :)

> vcpu_hashed, we did after all hash the thing.
>
>> Btw, do you think the patch title should be improved? What would you prefer?
>
> I changed it to: "locking/pvqspinlock: Fix double hash race"

Thanks. :)

Regards,
Wanpeng Li


* Re: [PATCH v3] locking/pvqspinlock: restore/set vcpu_hashed state after failing adaptive locking spinning
  2016-07-15  7:45       ` Wanpeng Li
@ 2016-07-15 16:44         ` Waiman Long
  2016-07-16  1:12           ` Wanpeng Li
From: Waiman Long @ 2016-07-15 16:44 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Peter Zijlstra, linux-kernel@vger.kernel.org, Wanpeng Li,
	Ingo Molnar, Davidlohr Bueso

On 07/15/2016 03:45 AM, Wanpeng Li wrote:
> 2016-07-15 15:09 GMT+08:00 Peter Zijlstra<peterz@infradead.org>:
>> On Fri, Jul 15, 2016 at 05:26:40AM +0800, Wanpeng Li wrote:
>>> 2016-07-14 22:52 GMT+08:00 Waiman Long<waiman.long@hpe.com>:
>>> [...]
>>>> As pv_kick_node() is called immediately after designating the next node as
>>>> the queue head, the chance of this racing is possible, but is not likely
>>>> unless the lock holder vCPU gets preempted for a long time at that right
>>>> moment. This change does not do any harm though, so I am OK with that.
>>>> However, I do want you to add a comment about the possible race in the code
>>>> as it isn't that obvious or likely.
>>> How about something like:
>>>
>>> /*
>>>   * If the lock holder vCPU gets preempted for a long time, pv_kick_node will
>>>   * advance its state and hash the lock, restore/set the vcpu_hashed state to
>>>   * avoid the race.
>>>   */
>> So I'm not sure. Yes it was a bug, but it's fairly 'obvious' it should be
> I believe Waiman can give a better comment. :)

Yes, setting the state to vcpu_hashed is the more obvious choice. What I
said was not obvious is that there can be a race between the new lock
holder in pv_kick_node() and the new queue head trying to call
pv_wait(), and that is what I want to document. Maybe something more
graphical can help:

/*
  * lock holder vCPU             queue head vCPU
  * ----------------             ---------------
  * node->locked = 1;
  * <preemption>                 READ_ONCE(node->locked)
  *    ...                       pv_wait_head_or_lock():
  *                                SPIN_THRESHOLD loop;
  *                                pv_hash();
  *                                lock->locked = _Q_SLOW_VAL;
  *                                node->state  = vcpu_hashed;
  * pv_kick_node():
  *   cmpxchg(node->state,
  *      vcpu_halted, vcpu_hashed);
  *   lock->locked = _Q_SLOW_VAL;
  *   pv_hash();
  *
  * With preemption at the right moment, it is possible that both the
  * lock holder and queue head vCPUs can be racing to set node->state.
  * Making sure the state is never set to vcpu_halted will prevent this
  * racing from happening.
  */

Cheers,
Longman


* Re: [PATCH v3] locking/pvqspinlock: restore/set vcpu_hashed state after failing adaptive locking spinning
  2016-07-15 16:44         ` Waiman Long
@ 2016-07-16  1:12           ` Wanpeng Li
From: Wanpeng Li @ 2016-07-16  1:12 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, linux-kernel@vger.kernel.org, Wanpeng Li,
	Ingo Molnar, Davidlohr Bueso

2016-07-16 0:44 GMT+08:00 Waiman Long <waiman.long@hpe.com>:
> On 07/15/2016 03:45 AM, Wanpeng Li wrote:
>>
>> 2016-07-15 15:09 GMT+08:00 Peter Zijlstra<peterz@infradead.org>:
>>>
>>> On Fri, Jul 15, 2016 at 05:26:40AM +0800, Wanpeng Li wrote:
>>>>
>>>> 2016-07-14 22:52 GMT+08:00 Waiman Long<waiman.long@hpe.com>:
>>>> [...]
>>>>>
>>>>> As pv_kick_node() is called immediately after designating the next node
>>>>> as
>>>>> the queue head, the chance of this racing is possible, but is not
>>>>> likely
>>>>> unless the lock holder vCPU gets preempted for a long time at that
>>>>> right
>>>>> moment. This change does not do any harm though, so I am OK with that.
>>>>> However, I do want you to add a comment about the possible race in the
>>>>> code
>>>>> as it isn't that obvious or likely.
>>>>
>>>> How about something like:
>>>>
>>>> /*
>>>>   * If the lock holder vCPU gets preempted for a long time, pv_kick_node
>>>> will
>>>>   * advance its state and hash the lock, restore/set the vcpu_hashed
>>>> state to
>>>>   * avoid the race.
>>>>   */
>>>
>>> So I'm not sure. Yes it was a bug, but it's fairly 'obvious' it should be
>>
>> I believe Waiman can give a better comment. :)
>
>
> Yes, setting the state to vcpu_hashed is the more obvious choice. What I
> said was not obvious is that there can be a race between the new lock holder
> in pv_kick_node() and the new queue head trying to call pv_wait(), and that
> is what I want to document. Maybe something more graphical can help:
>
> /*
>  * lock holder vCPU             queue head vCPU
>  * ----------------             ---------------
>  * node->locked = 1;
>  * <preemption>                 READ_ONCE(node->locked)
>  *    ...                       pv_wait_head_or_lock():
>  *                                SPIN_THRESHOLD loop;
>  *                                pv_hash();
>  *                                lock->locked = _Q_SLOW_VAL;
>  *                                node->state  = vcpu_hashed;
>  * pv_kick_node():
>  *   cmpxchg(node->state,
>  *      vcpu_halted, vcpu_hashed);
>  *   lock->locked = _Q_SLOW_VAL;
>  *   pv_hash();
>  *
>  * With preemption at the right moment, it is possible that both the
>  * lock holder and queue head vCPUs can be racing to set node->state.
>  * Making sure the state is never set to vcpu_halted will prevent this
>  * racing from happening.
>  */

Thanks, I will fold this into my patch. :)

Regards,
Wanpeng Li
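
For completeness, the folded-in result would presumably read roughly as
follows. The surrounding lines are the context lines from the diff hunk
earlier in the thread, and the exact placement of the comment is a guess at
what the respin would look like.

        /*
         * lock holder vCPU             queue head vCPU
         * ----------------             ---------------
         * node->locked = 1;
         * <preemption>                 READ_ONCE(node->locked)
         *    ...                       pv_wait_head_or_lock():
         *                                SPIN_THRESHOLD loop;
         *                                pv_hash();
         *                                lock->locked = _Q_SLOW_VAL;
         *                                node->state  = vcpu_hashed;
         * pv_kick_node():
         *   cmpxchg(node->state,
         *      vcpu_halted, vcpu_hashed);
         *   lock->locked = _Q_SLOW_VAL;
         *   pv_hash();
         *
         * With preemption at the right moment, it is possible that both
         * the lock holder and queue head vCPUs can be racing to set
         * node->state.  Making sure the state is never set to
         * vcpu_halted will prevent this racing from happening.
         */
        WRITE_ONCE(pn->state, vcpu_hashed);
        qstat_inc(qstat_pv_wait_head, true);
        qstat_inc(qstat_pv_wait_again, waitcnt);
        pv_wait(&l->locked, _Q_SLOW_VAL);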


