From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH-tip] locking/pvqspinlock: Relax cmpxchg's to improve performance on some archs
To: Waiman Long, Peter Zijlstra, Ingo Molnar
Cc: linux-kernel@vger.kernel.org, Boqun Feng
From: Pan Xinhui
Date: Thu, 8 Dec 2016 08:38:01 +0800
References: <1481051682-29438-1-git-send-email-longman@redhat.com>
In-Reply-To: <1481051682-29438-1-git-send-email-longman@redhat.com>
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On 2016/12/7 03:14, Waiman Long wrote:
> A number of cmpxchg calls in qspinlock_paravirt.h were replaced by more
> relaxed versions to improve performance on architectures that use LL/SC.
>
> Signed-off-by: Waiman Long
> ---

Thanks! I applied it to my tree, and the tests are okay.

>  kernel/locking/qspinlock_paravirt.h | 36 +++++++++++++++++++-----------------
>  1 file changed, 19 insertions(+), 17 deletions(-)
>
> diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
> index e3b5520..9d2205f 100644
> --- a/kernel/locking/qspinlock_paravirt.h
> +++ b/kernel/locking/qspinlock_paravirt.h
> @@ -72,7 +72,7 @@ static inline bool pv_queued_spin_steal_lock(struct qspinlock *lock)
>  	struct __qspinlock *l = (void *)lock;
>
>  	if (!(atomic_read(&lock->val) & _Q_LOCKED_PENDING_MASK) &&
> -	    (cmpxchg(&l->locked, 0, _Q_LOCKED_VAL) == 0)) {
> +	    (cmpxchg_acquire(&l->locked, 0, _Q_LOCKED_VAL) == 0)) {
>  		qstat_inc(qstat_pv_lock_stealing, true);
>  		return true;
>  	}
> @@ -101,16 +101,16 @@ static __always_inline void clear_pending(struct qspinlock *lock)
>
>  /*
>   * The pending bit check in pv_queued_spin_steal_lock() isn't a memory
> - * barrier. Therefore, an atomic cmpxchg() is used to acquire the lock
> - * just to be sure that it will get it.
> + * barrier. Therefore, an atomic cmpxchg_acquire() is used to acquire the
> + * lock to provide the proper memory barrier.
>   */
>  static __always_inline int trylock_clear_pending(struct qspinlock *lock)
>  {
>  	struct __qspinlock *l = (void *)lock;
>
>  	return !READ_ONCE(l->locked) &&
> -	       (cmpxchg(&l->locked_pending, _Q_PENDING_VAL, _Q_LOCKED_VAL)
> -			== _Q_PENDING_VAL);
> +	       (cmpxchg_acquire(&l->locked_pending, _Q_PENDING_VAL,
> +				_Q_LOCKED_VAL) == _Q_PENDING_VAL);
>  }
>  #else /* _Q_PENDING_BITS == 8 */
>  static __always_inline void set_pending(struct qspinlock *lock)
> @@ -138,7 +138,7 @@ static __always_inline int trylock_clear_pending(struct qspinlock *lock)
>  		 */
>  		old = val;
>  		new = (val & ~_Q_PENDING_MASK) | _Q_LOCKED_VAL;
> -		val = atomic_cmpxchg(&lock->val, old, new);
> +		val = atomic_cmpxchg_acquire(&lock->val, old, new);
>
>  		if (val == old)
>  			return 1;
> @@ -211,7 +211,7 @@ static struct qspinlock **pv_hash(struct qspinlock *lock, struct pv_node *node)
>
>  	for_each_hash_entry(he, offset, hash) {
>  		hopcnt++;
> -		if (!cmpxchg(&he->lock, NULL, lock)) {
> +		if (!cmpxchg_relaxed(&he->lock, NULL, lock)) {
>  			WRITE_ONCE(he->node, node);
>  			qstat_hop(hopcnt);
>  			return &he->lock;
> @@ -309,7 +309,7 @@ static void pv_wait_node(struct mcs_spinlock *node, struct mcs_spinlock *prev)
>  		 *	MB			      MB
>  		 * [L] pn->locked		[RmW] pn->state = vcpu_hashed
>  		 *
> -		 * Matches the cmpxchg() from pv_kick_node().
> +		 * Matches the cmpxchg_release() from pv_kick_node().
>  		 */
>  		smp_store_mb(pn->state, vcpu_halted);
>
> @@ -324,7 +324,7 @@ static void pv_wait_node(struct mcs_spinlock *node, struct mcs_spinlock *prev)
>  		 * value so that pv_wait_head_or_lock() knows to not also try
>  		 * to hash this lock.
>  		 */
> -		cmpxchg(&pn->state, vcpu_halted, vcpu_running);
> +		cmpxchg_relaxed(&pn->state, vcpu_halted, vcpu_running);
>
>  		/*
>  		 * If the locked flag is still not set after wakeup, it is a
> @@ -360,9 +360,10 @@ static void pv_kick_node(struct qspinlock *lock, struct mcs_spinlock *node)
>  	 * pv_wait_node(). If OTOH this fails, the vCPU was running and will
>  	 * observe its next->locked value and advance itself.
>  	 *
> -	 * Matches with smp_store_mb() and cmpxchg() in pv_wait_node()
> +	 * Matches with smp_store_mb() and cmpxchg_relaxed() in pv_wait_node().
>  	 */
> -	if (cmpxchg(&pn->state, vcpu_halted, vcpu_hashed) != vcpu_halted)
> +	if (cmpxchg_release(&pn->state, vcpu_halted, vcpu_hashed)
> +	    != vcpu_halted)
>  		return;
>
>  	/*
> @@ -461,8 +462,8 @@ static void pv_kick_node(struct qspinlock *lock, struct mcs_spinlock *node)
>  	}
>
>  	/*
> -	 * The cmpxchg() or xchg() call before coming here provides the
> -	 * acquire semantics for locking. The dummy ORing of _Q_LOCKED_VAL
> +	 * The cmpxchg_acquire() or xchg() call before coming here provides
> +	 * the acquire semantics for locking. The dummy ORing of _Q_LOCKED_VAL
>  	 * here is to indicate to the compiler that the value will always
>  	 * be nozero to enable better code optimization.
>  	 */
> @@ -488,11 +489,12 @@ static void pv_kick_node(struct qspinlock *lock, struct mcs_spinlock *node)
>  	}
>
>  	/*
> -	 * A failed cmpxchg doesn't provide any memory-ordering guarantees,
> -	 * so we need a barrier to order the read of the node data in
> -	 * pv_unhash *after* we've read the lock being _Q_SLOW_VAL.
> +	 * A failed cmpxchg_release doesn't provide any memory-ordering
> +	 * guarantees, so we need a barrier to order the read of the node
> +	 * data in pv_unhash *after* we've read the lock being _Q_SLOW_VAL.
>  	 *
> -	 * Matches the cmpxchg() in pv_wait_head_or_lock() setting _Q_SLOW_VAL.
> +	 * Matches the cmpxchg_acquire() in pv_wait_head_or_lock() setting
> +	 * _Q_SLOW_VAL.
>  	 */
>  	smp_rmb();
>
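
For readers less familiar with the ordering variants used in the patch, here
is a rough user-space sketch of the same idea using C11 <stdatomic.h>. It is
only an analogy: the C11 memory orders are not identical to the kernel's
cmpxchg_acquire()/cmpxchg_release()/cmpxchg_relaxed(), and every name below
(toy_lock, toy_trylock, toy_unlock, toy_claim_slot, toy_demo) is made up for
illustration and is not part of qspinlock_paravirt.h.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical one-byte lock; NOT the kernel's struct qspinlock. */
struct toy_lock {
	atomic_uchar locked;
};

/*
 * Rough analogue of "cmpxchg_acquire(&l->locked, 0, _Q_LOCKED_VAL) == 0".
 * Acquire ordering applies only on success; a failed compare-exchange
 * uses relaxed ordering and so orders nothing.
 */
static bool toy_trylock(struct toy_lock *l)
{
	unsigned char expected = 0;

	return atomic_compare_exchange_strong_explicit(&l->locked, &expected, 1,
						       memory_order_acquire,
						       memory_order_relaxed);
}

/* Release store: pairs with the acquire in toy_trylock(). */
static void toy_unlock(struct toy_lock *l)
{
	atomic_store_explicit(&l->locked, 0, memory_order_release);
}

/*
 * Rough analogue of "!cmpxchg_relaxed(&he->lock, NULL, lock)": the update
 * is still atomic, but it imposes no ordering on surrounding accesses.
 */
static bool toy_claim_slot(_Atomic(void *) *slot, void *owner)
{
	void *expected = NULL;

	return atomic_compare_exchange_strong_explicit(slot, &expected, owner,
						       memory_order_relaxed,
						       memory_order_relaxed);
}

/* Single-threaded walk through the calls; returns 0 if all checks hold. */
static int toy_demo(void)
{
	struct toy_lock l;
	_Atomic(void *) slot;
	int owner_a, owner_b;

	atomic_init(&l.locked, 0);
	atomic_init(&slot, NULL);

	assert(toy_trylock(&l));	/* 0 -> 1 succeeds */
	assert(!toy_trylock(&l));	/* already held, so the cmpxchg fails */
	toy_unlock(&l);
	assert(toy_trylock(&l));	/* free again after the release store */

	assert(toy_claim_slot(&slot, &owner_a));	/* empty slot is claimed */
	assert(!toy_claim_slot(&slot, &owner_b));	/* occupied slot is refused */
	return 0;
}
```

On LL/SC architectures a fully ordered cmpxchg() typically needs barriers on
both sides of the LL/SC loop, so picking the weakest ordering that is still
correct at each call site is where the saving described in the changelog
would come from.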