From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752781AbdBUNEM (ORCPT ); Tue, 21 Feb 2017 08:04:12 -0500 Received: from foss.arm.com ([217.140.101.70]:60266 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752763AbdBUNEA (ORCPT ); Tue, 21 Feb 2017 08:04:00 -0500 Date: Tue, 21 Feb 2017 13:04:00 +0000 From: Will Deacon To: Boqun Feng Cc: Andrea Parri , Waiman Long , Peter Zijlstra , Ingo Molnar , Pan Xinhui , linux-kernel@vger.kernel.org Subject: Re: [PATCH v3] locking/pvqspinlock: Relax cmpxchg's to improve performance on some archs Message-ID: <20170221130400.GG300@arm.com> References: <1487364220-11641-1-git-send-email-longman@redhat.com> <20170220042052.GA4529@andrea> <20170220045358.GI9178@tardis.cn.ibm.com> <20170220045839.GJ9178@tardis.cn.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170220045839.GJ9178@tardis.cn.ibm.com> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Feb 20, 2017 at 12:58:39PM +0800, Boqun Feng wrote: > > So Waiman, the fact is that in this case, we want the following code > > sequence: > > > > CPU 0 CPU 1 > > ================= ==================== > > {pn->state = vcpu_running, node->locked = 0} > > > > smp_store_smb(&pn->state, vcpu_halted): > > WRITE_ONCE(pn->state, vcpu_halted); > > smp_mb(); > > r1 = READ_ONCE(node->locked); > > arch_mcs_spin_unlock_contented(); > > WRITE_ONCE(node->locked, 1) > > > > cmpxchg(&pn->state, vcpu_halted, vcpu_hashed); > > > > never ends up in: > > > > r1 == 0 && cmpxchg fail(i.e. the read part of cmpxchg reads the > > value vcpu_running). > > > > We can have such a guarantee if cmpxchg has a smp_mb() before its load > > part, which is true for PPC. But semantically, cmpxchg() doesn't provide > > any order guarantee if it fails, which is true on ARM64, IIUC. (Add Will > > in Cc for his insight ;-)). I think you're right. The write to node->locked on CPU1 is not required to be ordered before the load part of the failing cmpxchg. > > So a possible "fix"(in case ARM64 will use qspinlock some day), would be > > replace cmpxchg() with smp_mb() + cmpxchg_relaxed(). Peversely, we could actually get away with cmpxchg_acquire on arm64 because arch_mcs_spin_unlock_contended is smp_store_release and we order release -> acquire in the architecture. But that just brings up the age old unlock/lock discussion again... Will