From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Date: Tue, 19 Jun 2018 09:52:29 +0100
From: Will Deacon
To: Peter Zijlstra
Cc: Paul Burton, Huacai Chen, Ralf Baechle, James Hogan,
	linux-mips@linux-mips.org, Fuxin Zhang, Zhangjin Wu, Huacai Chen,
	stable@vger.kernel.org, Alan Stern, Andrea Parri, Boqun Feng,
	Nicholas Piggin, David Howells, Jade Alglave, Luc Maranget,
	"Paul E. McKenney", Akira Yokosawa, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] MIPS: implement smp_cond_load_acquire() for Loongson-3
Message-ID: <20180619085229.GA13984@arm.com>
References: <1529042858-9483-1-git-send-email-chenhc@lemote.com>
	<20180618185141.yvkrsbdi2gbxjxj7@pburton-laptop>
	<20180619071710.GB2494@hirez.programming.kicks-ass.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20180619071710.GB2494@hirez.programming.kicks-ass.net>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:

Hi all,

On Tue, Jun 19, 2018 at 09:17:10AM +0200, Peter Zijlstra wrote:
> On Mon, Jun 18, 2018 at 11:51:41AM -0700, Paul Burton wrote:
> > On Fri, Jun 15, 2018 at 02:07:38PM +0800, Huacai Chen wrote:
> > > After commit 7f56b58a92aaf2c ("locking/mcs: Use smp_cond_load_acquire()
> > > in MCS spin loop") Loongson-3 fails to boot. This is because Loongson-3
> > > has SFB (Store Fill Buffer) and READ_ONCE() may get an old value in a
> > > tight loop. So in smp_cond_load_acquire() we need a __smp_mb() after
> > > every READ_ONCE().
> >
> > Thanks - modifying smp_cond_load_acquire() is a step better than
> > modifying arch_mcs_spin_lock_contended() to avoid it, but I'm still not
> > sure we've reached the root of the problem.
>
> Agreed, this looks entirely dodgy.
>
> > If tight loops using
> > READ_ONCE() are at fault then what's special about
> > smp_cond_load_acquire()? Could other such loops not hit the same
> > problem?
>
> Right again, Linux has a number of places where it relies on loops like
> this.
>
> 	for (;;) {
> 		if (READ_ONCE(*ptr))
> 			break;
>
> 		cpu_relax();
> 	}
>
> That is assumed to terminate -- provided the store to make *ptr != 0
> happens of course.
>
> And this has nothing to do with store buffers per se, sure store-buffers
> might delay the store from being visible for a (little) while, but we
> very much assume store buffers will not indefinitely hold on to data.

We had an issue 8 years ago with the 11MPCore CPU where reads were
prioritised over writes, so code doing something like:

	WRITE_ONCE(*foo, 1);
	while (!READ_ONCE(*bar));

might never make the store to foo visible to other CPUs. This caused a
livelock in KGDB, where two CPUs were doing this on opposite variables
(i.e. the "SB" litmus test, but with the reads looping until they read 1).

See 534be1d5a2da ("ARM: 6194/1: change definition of cpu_relax() for
ARM11MPCore") for the ugly fix, assuming that the "Store Fill Buffer"
suffers from the same disease.

Will
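
The two-CPU pattern described above (the "SB" litmus test with looping
reads) can be sketched in userspace roughly as follows. This is only an
illustration under stated assumptions: C11 relaxed atomics and pthreads
stand in for the kernel's WRITE_ONCE()/READ_ONCE() and for the two CPUs,
and the names sb_args, sb_spin and run_sb_spin are hypothetical, not
taken from the kernel or from the fix commit. On hardware that drains
its store buffers in finite time both spinners terminate; the failure
discussed in this thread is precisely that assumption not holding.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Shared flags; each thread sets one and spins on the other. */
static atomic_int foo, bar;

/* Hypothetical helper type: which flag to write, which to poll. */
struct sb_args {
	atomic_int *mine;
	atomic_int *theirs;
};

static void *sb_spin(void *p)
{
	struct sb_args *a = p;

	/* Userspace stand-in for WRITE_ONCE(*mine, 1). */
	atomic_store_explicit(a->mine, 1, memory_order_relaxed);

	/* Userspace stand-in for while (!READ_ONCE(*theirs)); */
	while (!atomic_load_explicit(a->theirs, memory_order_relaxed))
		;

	return NULL;
}

/* Runs both spinners; returns 0 once each has seen the other's store. */
int run_sb_spin(void)
{
	struct sb_args a0 = { &foo, &bar };
	struct sb_args a1 = { &bar, &foo };
	pthread_t t0, t1;

	atomic_store(&foo, 0);
	atomic_store(&bar, 0);

	if (pthread_create(&t0, NULL, sb_spin, &a0))
		return -1;

	if (pthread_create(&t1, NULL, sb_spin, &a1)) {
		/* Release the first spinner so it can be joined. */
		atomic_store(&bar, 1);
		pthread_join(t0, NULL);
		return -1;
	}

	pthread_join(t0, NULL);
	pthread_join(t1, NULL);

	/* Both stores must be visible by the time both loops have exited. */
	assert(atomic_load(&foo) == 1 && atomic_load(&bar) == 1);
	return 0;
}
```

Calling run_sb_spin() from a small main() and building with -pthread is
enough to try it. A hang here would correspond to the livelock described
above, where neither store ever becomes visible to the other side.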