* [PATCH v3] locking/pvqspinlock: restore/set vcpu_hashed state after failing adaptive locking spinning
From: Wanpeng Li @ 2016-07-14 11:39 UTC
To: linux-kernel
Cc: Wanpeng Li, Peter Zijlstra (Intel), Ingo Molnar, Waiman Long,
Davidlohr Bueso
From: Wanpeng Li <wanpeng.li@hotmail.com>
When the lock holder vCPU is racing with the queue head:
  CPU 0 (lock holder)                    CPU 1 (queue head)
  ===================                    =================
  spin_lock();                           spin_lock();
   pv_kick_node():                        pv_wait_head_or_lock():
                                           if (!lp) {
                                            lp = pv_hash(lock, pn);
                                            xchg(&l->locked, _Q_SLOW_VAL);
                                           }
                                           WRITE_ONCE(pn->state, vcpu_halted);
    cmpxchg(&pn->state,
     vcpu_halted, vcpu_hashed);
    WRITE_ONCE(l->locked, _Q_SLOW_VAL);
    (void)pv_hash(lock, pn);
In this case, the lock holder inserts the pv_node of the queue head into the
hash table and sets _Q_SLOW_VAL even though the queue head has already done
both, which can result in a hash entry leak. This patch avoids the race by
restoring/setting the vcpu_hashed state after adaptive lock spinning fails.
Reviewed-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Waiman Long <Waiman.Long@hpe.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
v2 -> v3:
* fix typo in patch description
v1 -> v2:
* adjust patch description
 kernel/locking/qspinlock_paravirt.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
index 21ede57..ac7d20b 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -450,7 +450,7 @@ pv_wait_head_or_lock(struct qspinlock *lock, struct mcs_spinlock *node)
 				goto gotlock;
 			}
 		}
-		WRITE_ONCE(pn->state, vcpu_halted);
+		WRITE_ONCE(pn->state, vcpu_hashed);
 		qstat_inc(qstat_pv_wait_head, true);
 		qstat_inc(qstat_pv_wait_again, waitcnt);
 		pv_wait(&l->locked, _Q_SLOW_VAL);
--
1.9.1
* Re: [PATCH v3] locking/pvqspinlock: restore/set vcpu_hashed state after failing adaptive locking spinning
From: Waiman Long @ 2016-07-14 14:52 UTC
To: Wanpeng Li
Cc: linux-kernel, Wanpeng Li, Peter Zijlstra (Intel), Ingo Molnar,
Davidlohr Bueso
On 07/14/2016 07:39 AM, Wanpeng Li wrote:
> From: Wanpeng Li<wanpeng.li@hotmail.com>
>
> When the lock holder vCPU is racing with the queue head:
>
>   CPU 0 (lock holder)                    CPU 1 (queue head)
>   ===================                    =================
>   spin_lock();                           spin_lock();
>    pv_kick_node():                        pv_wait_head_or_lock():
>                                            if (!lp) {
>                                             lp = pv_hash(lock, pn);
>                                             xchg(&l->locked, _Q_SLOW_VAL);
>                                            }
>                                            WRITE_ONCE(pn->state, vcpu_halted);
>     cmpxchg(&pn->state,
>      vcpu_halted, vcpu_hashed);
>     WRITE_ONCE(l->locked, _Q_SLOW_VAL);
>     (void)pv_hash(lock, pn);
>
> In this case, lock holder inserts the pv_node of queue head into the
> hash table and set _Q_SLOW_VAL which can result in hash entry leak.
> This patch avoids it by restoring/setting vcpu_hashed state after
> failing adaptive locking spinning.
>
> Reviewed-by: Pan Xinhui<xinhui.pan@linux.vnet.ibm.com>
> Cc: Peter Zijlstra (Intel)<peterz@infradead.org>
> Cc: Ingo Molnar<mingo@kernel.org>
> Cc: Waiman Long<Waiman.Long@hpe.com>
> Cc: Davidlohr Bueso<dave@stgolabs.net>
> Signed-off-by: Wanpeng Li<wanpeng.li@hotmail.com>
> ---
> v2 -> v3:
> * fix typo in patch description
> v1 -> v2:
> * adjust patch description
>
> kernel/locking/qspinlock_paravirt.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
> index 21ede57..ac7d20b 100644
> --- a/kernel/locking/qspinlock_paravirt.h
> +++ b/kernel/locking/qspinlock_paravirt.h
> @@ -450,7 +450,7 @@ pv_wait_head_or_lock(struct qspinlock *lock, struct mcs_spinlock *node)
>  				goto gotlock;
>  			}
>  		}
> -		WRITE_ONCE(pn->state, vcpu_halted);
> +		WRITE_ONCE(pn->state, vcpu_hashed);
>  		qstat_inc(qstat_pv_wait_head, true);
>  		qstat_inc(qstat_pv_wait_again, waitcnt);
>  		pv_wait(&l->locked, _Q_SLOW_VAL);
As pv_kick_node() is called immediately after designating the next node
as the queue head, this race is possible, but it is unlikely unless the
lock holder vCPU gets preempted for a long time at exactly that moment.
This change does no harm though, so I am OK with it. However, I do want
you to add a comment about the possible race in the code, as the race is
neither obvious nor likely.
Cheers,
Longman
* Re: [PATCH v3] locking/pvqspinlock: restore/set vcpu_hashed state after failing adaptive locking spinning
From: Wanpeng Li @ 2016-07-14 21:26 UTC
To: Waiman Long
Cc: linux-kernel@vger.kernel.org, Wanpeng Li, Peter Zijlstra (Intel),
Ingo Molnar, Davidlohr Bueso
2016-07-14 22:52 GMT+08:00 Waiman Long <waiman.long@hpe.com>:
[...]
> As pv_kick_node() is called immediately after designating the next node as
> the queue head, the chance of this racing is possible, but is not likely
> unless the lock holder vCPU gets preempted for a long time at that right
> moment. This change does not do any harm though, so I am OK with that.
> However, I do want you to add a comment about the possible race in the code
> as it isn't that obvious or likely.
How about something like:
/*
 * If the lock holder vCPU gets preempted for a long time, pv_kick_node()
 * will advance the node state and hash the lock. Restore/set the
 * vcpu_hashed state to avoid the race.
 */
By the way, do you think the patch title should be improved? What would you prefer?
Regards,
Wanpeng Li
* Re: [PATCH v3] locking/pvqspinlock: restore/set vcpu_hashed state after failing adaptive locking spinning
From: Peter Zijlstra @ 2016-07-15 7:09 UTC
To: Wanpeng Li
Cc: Waiman Long, linux-kernel@vger.kernel.org, Wanpeng Li,
Ingo Molnar, Davidlohr Bueso
On Fri, Jul 15, 2016 at 05:26:40AM +0800, Wanpeng Li wrote:
> 2016-07-14 22:52 GMT+08:00 Waiman Long <waiman.long@hpe.com>:
> [...]
> > As pv_kick_node() is called immediately after designating the next node as
> > the queue head, the chance of this racing is possible, but is not likely
> > unless the lock holder vCPU gets preempted for a long time at that right
> > moment. This change does not do any harm though, so I am OK with that.
> > However, I do want you to add a comment about the possible race in the code
> > as it isn't that obvious or likely.
>
> How about something like:
>
> /*
> * If the lock holder vCPU gets preempted for a long time, pv_kick_node will
> * advance its state and hash the lock, restore/set the vcpu_hashed state to
> * avoid the race.
> */
So I'm not sure. Yes, it was a bug, but it's fairly 'obvious' it should be
vcpu_hashed; we did, after all, hash the thing.
> Btw, do you think patch title should be improved, what do you like?
I changed it to: "locking/pvqspinlock: Fix double hash race"
* Re: [PATCH v3] locking/pvqspinlock: restore/set vcpu_hashed state after failing adaptive locking spinning
From: Wanpeng Li @ 2016-07-15 7:45 UTC
To: Peter Zijlstra
Cc: Waiman Long, linux-kernel@vger.kernel.org, Wanpeng Li,
Ingo Molnar, Davidlohr Bueso
2016-07-15 15:09 GMT+08:00 Peter Zijlstra <peterz@infradead.org>:
> On Fri, Jul 15, 2016 at 05:26:40AM +0800, Wanpeng Li wrote:
>> 2016-07-14 22:52 GMT+08:00 Waiman Long <waiman.long@hpe.com>:
>> [...]
>> > As pv_kick_node() is called immediately after designating the next node as
>> > the queue head, the chance of this racing is possible, but is not likely
>> > unless the lock holder vCPU gets preempted for a long time at that right
>> > moment. This change does not do any harm though, so I am OK with that.
>> > However, I do want you to add a comment about the possible race in the code
>> > as it isn't that obvious or likely.
>>
>> How about something like:
>>
>> /*
>> * If the lock holder vCPU gets preempted for a long time, pv_kick_node will
>> * advance its state and hash the lock, restore/set the vcpu_hashed state to
>> * avoid the race.
>> */
>
> So I'm not sure. Yes it was a bug, but its fairly 'obvious' it should be
I believe Waiman can write a better comment. :)
> vcpu_hashed, we did after all hash the thing.
>
>> Btw, do you think patch title should be improved, what do you like?
>
> I changed it to: "locking/pvqspinlock: Fix double hash race"
Thanks. :)
Regards,
Wanpeng Li
* Re: [PATCH v3] locking/pvqspinlock: restore/set vcpu_hashed state after failing adaptive locking spinning
From: Waiman Long @ 2016-07-15 16:44 UTC
To: Wanpeng Li
Cc: Peter Zijlstra, linux-kernel@vger.kernel.org, Wanpeng Li,
Ingo Molnar, Davidlohr Bueso
On 07/15/2016 03:45 AM, Wanpeng Li wrote:
> 2016-07-15 15:09 GMT+08:00 Peter Zijlstra<peterz@infradead.org>:
>> On Fri, Jul 15, 2016 at 05:26:40AM +0800, Wanpeng Li wrote:
>>> 2016-07-14 22:52 GMT+08:00 Waiman Long<waiman.long@hpe.com>:
>>> [...]
>>>> As pv_kick_node() is called immediately after designating the next node as
>>>> the queue head, the chance of this racing is possible, but is not likely
>>>> unless the lock holder vCPU gets preempted for a long time at that right
>>>> moment. This change does not do any harm though, so I am OK with that.
>>>> However, I do want you to add a comment about the possible race in the code
>>>> as it isn't that obvious or likely.
>>> How about something like:
>>>
>>> /*
>>> * If the lock holder vCPU gets preempted for a long time, pv_kick_node will
>>> * advance its state and hash the lock, restore/set the vcpu_hashed state to
>>> * avoid the race.
>>> */
>> So I'm not sure. Yes it was a bug, but its fairly 'obvious' it should be
> I believe Waiman can give a better comments. :)
Yes, setting the state to vcpu_hashed is the more obvious choice. What I
said is not obvious is that there can be a race between the new lock
holder in pv_kick_node() and the new queue head trying to call
pv_wait(). That is what I want to document. Maybe something more
graphical can help:
/*
 * lock holder vCPU              queue head vCPU
 * ----------------              ---------------
 * node->locked = 1;
 * <preemption>                  READ_ONCE(node->locked)
 *    ...                        pv_wait_head_or_lock():
 *                                 SPIN_THRESHOLD loop;
 *                                 pv_hash();
 *                                 lock->locked = _Q_SLOW_VAL;
 *                                 node->state = vcpu_hashed;
 * pv_kick_node():
 *   cmpxchg(node->state,
 *           vcpu_halted, vcpu_hashed);
 *   lock->locked = _Q_SLOW_VAL;
 *   pv_hash();
 *
 * With preemption at the right moment, it is possible that both the
 * lock holder and queue head vCPUs can be racing to set node->state.
 * Making sure the state is never set to vcpu_halted will prevent this
 * racing from happening.
 */
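
The interleaving in the comment above can be replayed in a small user-space
model. This is only a sketch, not kernel code: struct model,
queue_head_prepare_wait(), lock_holder_kick_node() and simulate_race() are
hypothetical names, the hash table is reduced to a counter, the lock word
values are placeholders, and the two vCPUs are run one after the other in the
order the race requires rather than as real threads.

```c
/* Illustrative user-space model of the race; NOT kernel code. */
#include <assert.h>
#include <stdatomic.h>

enum vcpu_state { vcpu_running = 0, vcpu_halted, vcpu_hashed };

/* Placeholder lock word values for this model. */
#define Q_LOCKED_VAL 1
#define Q_SLOW_VAL   3

struct model {
	_Atomic int state;   /* stands in for pn->state */
	_Atomic int locked;  /* stands in for l->locked */
	int hash_entries;    /* counts modelled pv_hash() insertions */
};

/* Queue head: tail of one pv_wait_head_or_lock() iteration, just before pv_wait(). */
static void queue_head_prepare_wait(struct model *m, enum vcpu_state wait_state)
{
	m->hash_entries++;                        /* lp = pv_hash(lock, pn); */
	atomic_exchange(&m->locked, Q_SLOW_VAL);  /* xchg(&l->locked, _Q_SLOW_VAL); */
	atomic_store(&m->state, wait_state);      /* WRITE_ONCE(pn->state, ...); */
}

/* Lock holder: pv_kick_node() resuming after a long preemption. */
static void lock_holder_kick_node(struct model *m)
{
	int expected = vcpu_halted;

	/* cmpxchg(&pn->state, vcpu_halted, vcpu_hashed) */
	if (!atomic_compare_exchange_strong(&m->state, &expected, vcpu_hashed))
		return;  /* state was not vcpu_halted, so do not hash again */

	atomic_store(&m->locked, Q_SLOW_VAL);     /* WRITE_ONCE(l->locked, _Q_SLOW_VAL); */
	m->hash_entries++;                        /* (void)pv_hash(lock, pn); */
}

/*
 * Replay the racy ordering: queue head first, then the delayed lock holder.
 * wait_state is the value the queue head writes before pv_wait().
 * Returns how many hash insertions happened for the one lock.
 */
int simulate_race(enum vcpu_state wait_state)
{
	struct model m;

	atomic_init(&m.state, vcpu_running);
	atomic_init(&m.locked, Q_LOCKED_VAL);
	m.hash_entries = 0;

	queue_head_prepare_wait(&m, wait_state);
	lock_holder_kick_node(&m);

	assert(atomic_load(&m.locked) == Q_SLOW_VAL);
	return m.hash_entries;
}
```

In this model, simulate_race(vcpu_halted) returns 2 (the lock is hashed twice,
as before the patch) and simulate_race(vcpu_hashed) returns 1 (the cmpxchg in
the kick path fails, so the second hash never happens).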
Cheers,
Longman
* Re: [PATCH v3] locking/pvqspinlock: restore/set vcpu_hashed state after failing adaptive locking spinning
From: Wanpeng Li @ 2016-07-16 1:12 UTC
To: Waiman Long
Cc: Peter Zijlstra, linux-kernel@vger.kernel.org, Wanpeng Li,
Ingo Molnar, Davidlohr Bueso
2016-07-16 0:44 GMT+08:00 Waiman Long <waiman.long@hpe.com>:
> On 07/15/2016 03:45 AM, Wanpeng Li wrote:
>>
>> 2016-07-15 15:09 GMT+08:00 Peter Zijlstra<peterz@infradead.org>:
>>>
>>> On Fri, Jul 15, 2016 at 05:26:40AM +0800, Wanpeng Li wrote:
>>>>
>>>> 2016-07-14 22:52 GMT+08:00 Waiman Long<waiman.long@hpe.com>:
>>>> [...]
>>>>>
>>>>> As pv_kick_node() is called immediately after designating the next node as
>>>>> the queue head, the chance of this racing is possible, but is not likely
>>>>> unless the lock holder vCPU gets preempted for a long time at that right
>>>>> moment. This change does not do any harm though, so I am OK with that.
>>>>> However, I do want you to add a comment about the possible race in the code
>>>>> as it isn't that obvious or likely.
>>>>
>>>> How about something like:
>>>>
>>>> /*
>>>> * If the lock holder vCPU gets preempted for a long time, pv_kick_node will
>>>> * advance its state and hash the lock, restore/set the vcpu_hashed state to
>>>> * avoid the race.
>>>> */
>>>
>>> So I'm not sure. Yes it was a bug, but its fairly 'obvious' it should be
>>
>> I believe Waiman can give a better comments. :)
>
>
> Yes, setting the state to vcpu_hashed is the more obvious choice. What I
> said is not obvious is that there can be a race between the new lock holder
> in pv_kick_node() and the new queue head trying to call pv_wait(). And it is
> what I want to document it. Maybe something more graphical can help:
>
> /*
>  * lock holder vCPU              queue head vCPU
>  * ----------------              ---------------
>  * node->locked = 1;
>  * <preemption>                  READ_ONCE(node->locked)
>  *    ...                        pv_wait_head_or_lock():
>  *                                 SPIN_THRESHOLD loop;
>  *                                 pv_hash();
>  *                                 lock->locked = _Q_SLOW_VAL;
>  *                                 node->state = vcpu_hashed;
>  * pv_kick_node():
>  *   cmpxchg(node->state,
>  *           vcpu_halted, vcpu_hashed);
>  *   lock->locked = _Q_SLOW_VAL;
>  *   pv_hash();
>  *
>  * With preemption at the right moment, it is possible that both the
>  * lock holder and queue head vCPUs can be racing to set node->state.
>  * Making sure the state is never set to vcpu_halted will prevent this
>  * racing from happening.
>  */
Thanks, I will fold this into my patch. :)
Regards,
Wanpeng Li