From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Date: Thu, 31 Jan 2019 17:52:28 +0100
From: Heiko Carstens
Subject: Re: WARN_ON_ONCE(!new_owner) within wake_futex_pi() triggerede
References: <20190130125955.GD5299@osiris>
 <20190130132420.spwrq2d4oxeydk5s@linutronix.de>
 <20190130210733.mg6aascw2gzl3oqz@linutronix.de>
 <20190130233557.GA4240@linux.ibm.com>
MIME-Version: 1.0
In-Reply-To:
Message-Id: <20190131165228.GA32680@osiris>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8bit
Content-Disposition: inline
Sender: linux-kernel-owner@vger.kernel.org
List-Archive:
List-Post:
To: Thomas Gleixner
Cc: "Paul E. McKenney", Sebastian Sewior, Peter Zijlstra, Ingo Molnar,
 Martin Schwidefsky, LKML, linux-s390@vger.kernel.org, Stefan Liebler
List-ID:

On Thu, Jan 31, 2019 at 01:27:25AM +0100, Thomas Gleixner wrote:
> On Thu, 31 Jan 2019, Thomas Gleixner wrote:
>
> > On Wed, 30 Jan 2019, Paul E. McKenney wrote:
> > > On Thu, Jan 31, 2019 at 12:13:51AM +0100, Thomas Gleixner wrote:
> > > > I might be wrong as usual, but this would definitely explain the fail very
> > > > well.
> > >
> > > On recent versions of GCC, the fix would be to put this between the two
> > > stores that need ordering:
> > >
> > > 	__atomic_thread_fence(__ATOMIC_RELEASE);
> > >
> > > I must defer to Heiko on whether s390 GCC might tear the stores. My
> > > guess is "probably not". ;-)
> >
> > So I just checked the latest glibc code. It has:
> >
> > 	/* We must not enqueue the mutex before we have acquired it.
> > 	   Also see comments at ENQUEUE_MUTEX. */
> > 	__asm ("" ::: "memory");
> > 	ENQUEUE_MUTEX_PI (mutex);
> > 	/* We need to clear op_pending after we enqueue the mutex. */
> > 	__asm ("" ::: "memory");
> > 	THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending, NULL);
> >
> > 8f9450a0b7a9 ("Add compiler barriers around modifications of the robust mutex list.")
> >
> > in the glibc repository, There since Dec 24 2016 ...
>
> And of course, I'm using the latest greatest glibc for testing that, so I'm
> not at all surprised that it just does not reproduce on my tests.

As discussed on IRC: I used plain vanilla glibc version 2.28 for my tests.
This version already contains the commit you mentioned above.

> I just hacked the ordering and restarted the test. If the theory holds,
> then this should die sooner than later.

...nevertheless Stefan and I looked through the lovely disassembly of
_pthread_mutex_lock_full() to verify if the compiler barriers are actually
doing what they are supposed to do. The generated code however does look
correct.
So, it must be something different.
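
For reference, the store ordering discussed in the quoted text can be sketched roughly as below. This is only an illustration and not the glibc code: struct node, struct robust_head, compiler_barrier() and publish_acquired() are made-up names, and the list handling is simplified to a singly linked list. It assumes GCC or Clang. The first barrier is the compiler-only form from the quoted glibc snippet; the second uses the fence Paul suggested, placed between the two stores that need ordering.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the per-thread robust list; not glibc's types. */
struct node {
	struct node *next;
};

struct robust_head {
	struct node *list;            /* mutexes currently owned by the thread */
	struct node *list_op_pending; /* mutex being acquired or released */
};

/* Compiler-only barrier, same form as in the quoted glibc code. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/*
 * Hypothetical helper: call only after the lock itself has been acquired
 * and list_op_pending already points at the mutex being acquired.
 */
static void publish_acquired(struct robust_head *head, struct node *m)
{
	assert(head->list_op_pending == m);

	/* Keep the compiler from moving the enqueue before the acquisition. */
	compiler_barrier();
	m->next = head->list;
	head->list = m;

	/* Fence between the enqueue and the store that clears op_pending. */
	__atomic_thread_fence(__ATOMIC_RELEASE);
	head->list_op_pending = NULL;
}
```

Whether the compiler really kept the stores in this order can only be checked in the generated code, which is what the disassembly check above was about.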