Subject: Re: [PATCH] bug in futex unqueue_me
From: Steven Rostedt
To: Ingo Molnar
Cc: Christian Borntraeger, linux-kernel@vger.kernel.org, Rusty Russell,
    Ingo Molnar, Thomas Gleixner, Martin Schwidefsky, Andrew Morton
Date: Sun, 30 Jul 2006 19:53:21 -0400
Message-Id: <1154303601.10074.64.camel@localhost.localdomain>
In-Reply-To: <20060730063821.GA8748@elte.hu>
References: <200607271841.56342.borntrae@de.ibm.com> <20060730063821.GA8748@elte.hu>

On Sun, 2006-07-30 at 08:38 +0200, Ingo Molnar wrote:
> * Christian Borntraeger wrote:
>
> > From: Christian Borntraeger
> >
> > This patch adds a barrier() in futex unqueue_me to avoid aliasing of
> > two pointers.
> >
> > On my s390x system I saw the following oops:
>
> > So the code becomes more or less:
> > if (q->lock_ptr != 0) spin_lock(q->lock_ptr)
> > instead of
> > if (lock_ptr != 0) spin_lock(lock_ptr)
> >
> > Which caused the oops from above.
>
> interesting, how is this possible? We do a spin_lock(lock_ptr), and
> taking a spinlock is an implicit barrier(). So gcc must not delay
> evaluating lock_ptr to inside the critical section. And as far as i can
> see the s390 spinlock implementation goes through an 'asm volatile'
> piece of code, which is a barrier already. So how could this have
> happened?
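A deterministic userspace model of the transformation Christian describes may help here. It is only a sketch under assumptions: `struct model_q`, `waker_clears()`, `reloading_version()` and `snapshot_version()` are hypothetical names, and the `between` callback stands in for another CPU clearing `q->lock_ptr` (assumed here to model a waker dequeuing the entry) in the window between the NULL test and the spin_lock() call. Each function reports whether it would call spin_lock() and which pointer it would pass.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a futex queue entry: only the lock pointer. */
struct model_q {
    int *lock_ptr;
};

/* Stand-in for whatever another CPU does inside the race window. */
typedef void (*interleave_fn)(struct model_q *q);

/* Hypothetical waker: clears q->lock_ptr after dequeuing the entry. */
static void waker_clears(struct model_q *q)
{
    q->lock_ptr = NULL;
}

/* What gcc was allowed to emit: q->lock_ptr is read once for the NULL
 * test and read again to build the argument of spin_lock().
 * Returns 1 if spin_lock() would be called; *arg receives its argument. */
static int reloading_version(struct model_q *q, interleave_fn between,
                             int **arg)
{
    if (q->lock_ptr != NULL) {
        between(q);            /* the other CPU runs here */
        *arg = q->lock_ptr;    /* second load: may now be NULL */
        return 1;
    }
    return 0;
}

/* What the source asked for: test and lock the same local snapshot. */
static int snapshot_version(struct model_q *q, interleave_fn between,
                            int **arg)
{
    int *lock_ptr = q->lock_ptr;   /* single load */

    if (lock_ptr != NULL) {
        between(q);
        *arg = lock_ptr;           /* still the pointer that was tested */
        return 1;
    }
    return 0;
}
```

With the reloading version the NULL test passes but NULL is handed to spin_lock(), which matches the kind of oops reported. The snapshot version always locks the pointer it tested, although that pointer may be stale by then, so a re-check after acquiring the lock is still needed.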
>
> I have nothing against adding a barrier(), but we should first
> investigate why the spin_lock() didn't act as a barrier - there might be
> other, similar bugs hiding. (we rely on spin_lock()s barrier-ness in a
> fair number of places)

Ingo, this spinlock is probably still a barrier, but is it a barrier
with respect to itself? The problem here is that the compiler is
optimizing away the lock_ptr temporary, which is the very value passed
to spin_lock(). The barrier inside spin_lock() only keeps memory
accesses from moving across the point where the lock is taken; the
lock's address is computed before that point, so nothing stops gcc from
re-reading q->lock_ptr to get it. So does a spin_lock protect itself,
or just the stuff inside it?

Here we need a barrier to keep gcc from optimizing the load of the lock
pointer itself, not the data that the lock protects.

I don't know of other places in the kernel that have a dynamically
located spin lock like this one that would need the same protection.

-- Steve
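A minimal userspace sketch of the fix being argued for, assuming the barrier() sits right after the lock pointer is copied into the local (the quoted description only says the patch adds a barrier() in unqueue_me). `toy_spinlock`, `struct toy_q` and `toy_unqueue()` are hypothetical stand-ins, and `barrier()` is defined here as the usual GCC compiler barrier, an empty asm with a "memory" clobber. The plain read of `q->lock_ptr` mirrors kernel style; under the C11 memory model it would be a data race if another thread wrote it concurrently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Compiler-only barrier: an empty asm with a "memory" clobber.
 * It emits no instruction; it only forbids gcc from caching or
 * re-deriving memory values across this point. */
#define barrier() __asm__ __volatile__("" : : : "memory")

/* Hypothetical minimal spinlock, standing in for spinlock_t. */
typedef struct {
    atomic_flag locked;
} toy_spinlock;

static void toy_spin_lock(toy_spinlock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        ;   /* spin until the flag was previously clear */
}

static void toy_spin_unlock(toy_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Hypothetical queue entry; another CPU may set lock_ptr to NULL. */
struct toy_q {
    toy_spinlock *lock_ptr;
};

/* Takes and releases q's lock if the entry is still queued.
 * Returns 1 if the lock was taken, 0 if lock_ptr was already NULL. */
static int toy_unqueue(struct toy_q *q)
{
    toy_spinlock *lock_ptr;

retry:
    lock_ptr = q->lock_ptr;
    /* The clobber tells gcc that q->lock_ptr may have changed, so it
     * cannot replace the local with a second load of q->lock_ptr for
     * the NULL test or for the lock call below. */
    barrier();
    if (lock_ptr == NULL)
        return 0;
    toy_spin_lock(lock_ptr);
    /* The pointer may have changed while we waited for the lock. */
    if (lock_ptr != q->lock_ptr) {
        toy_spin_unlock(lock_ptr);
        goto retry;
    }
    /* ... the real code would dequeue the entry here ... */
    toy_spin_unlock(lock_ptr);
    return 1;
}
```

The barrier guards the snapshot of the lock pointer, not the critical section, which is the distinction drawn above. The re-check under the lock handles a separate case, a pointer that changes while we wait, and is included on the assumption that the real unqueue path needs it as well.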