From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with ESMTP id n068cJN0032070 for ; Tue, 6 Jan 2009 02:38:21 -0600 Received: from email.aon.at (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id C4B9117AF80D for ; Tue, 6 Jan 2009 00:38:17 -0800 (PST) Received: from email.aon.at (WARSL404PIP1.highway.telekom.at [195.3.96.112]) by cuda.sgi.com with ESMTP id vdJ0gGxQhjoIamei for ; Tue, 06 Jan 2009 00:38:17 -0800 (PST) Message-ID: <49631877.3090803@aon.at> Date: Tue, 06 Jan 2009 09:38:15 +0100 From: Peter Klotz MIME-Version: 1.0 Subject: Re: [patch] mm: fix lockless pagecache reordering bug (was Re: BUG: soft lockup - is this XFS problem?) References: <20090105041959.GC367@wotan.suse.de> <20090105064838.GA5209@wotan.suse.de> <49623384.2070801@aon.at> <20090105164135.GC32675@wotan.suse.de> <20090105180008.GE32675@wotan.suse.de> <20090105201258.GN6959@linux.vnet.ibm.com> <20090105215727.GQ6959@linux.vnet.ibm.com> <20090106020550.GA819@wotan.suse.de> In-Reply-To: <20090106020550.GA819@wotan.suse.de> List-Id: XFS Filesystem from SGI List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: xfs-bounces@oss.sgi.com Errors-To: xfs-bounces@oss.sgi.com To: Nick Piggin Cc: linux-kernel@vger.kernel.org, Roman Kononov , xfs@oss.sgi.com, Christoph Hellwig , Linux Memory Management List , Andrew Morton , "Paul E. McKenney" , Linus Torvalds , stable@kernel.org Nick Piggin wrote: > -- > Subject: mm lockless pagecache barrier fix > > An XFS workload showed up a bug in the lockless pagecache patch. Basically it > would go into an "infinite" loop, although it would sometimes be able to break > out of the loop! The reason is a missing compiler barrier in the "increment > reference count unless it was zero" case of the lockless pagecache protocol in > the gang lookup functions. > > This would cause the compiler to use a cached value of struct page pointer to > retry the operation with, rather than reload it. So the page might have been > removed from pagecache and freed (refcount==0) but the lookup would not correctly > notice the page is no longer in pagecache, and keep attempting to increment the > refcount and failing, until the page gets reallocated for something else. This > isn't a data corruption because the condition will be detected if the page has > been reallocated. However it can result in a lockup. > > Linus points out that ACCESS_ONCE is also required in that pointer load, even > if it's absence is not causing a bug on our particular build. The most general > way to solve this is just to put an rcu_dereference in radix_tree_deref_slot. > > Assembly of find_get_pages, > before: > .L220: > movq (%rbx), %rax #* ivtmp.1162, tmp82 > movq (%rax), %rdi #, prephitmp.1149 > .L218: > testb $1, %dil #, prephitmp.1149 > jne .L217 #, > testq %rdi, %rdi # prephitmp.1149 > je .L203 #, > cmpq $-1, %rdi #, prephitmp.1149 > je .L217 #, > movl 8(%rdi), %esi # ._count.counter, c > testl %esi, %esi # c > je .L218 #, > > after: > .L212: > movq (%rbx), %rax #* ivtmp.1109, tmp81 > movq (%rax), %rdi #, ret > testb $1, %dil #, ret > jne .L211 #, > testq %rdi, %rdi # ret > je .L197 #, > cmpq $-1, %rdi #, ret > je .L211 #, > movl 8(%rdi), %esi # ._count.counter, c > testl %esi, %esi # c > je .L212 #, > > (notice the obvious infinite loop in the first example, if page->count remains 0) > > Signed-off-by: Nick Piggin > --- > include/linux/radix-tree.h | 2 +- > mm/filemap.c | 23 ++++++++++++++++++++--- > 2 files changed, 21 insertions(+), 4 deletions(-) > > Index: linux-2.6/include/linux/radix-tree.h > =================================================================== > --- linux-2.6.orig/include/linux/radix-tree.h > +++ linux-2.6/include/linux/radix-tree.h > @@ -136,7 +136,7 @@ do { \ > */ > static inline void *radix_tree_deref_slot(void **pslot) > { > - void *ret = *pslot; > + void *ret = rcu_dereference(*pslot); > if (unlikely(radix_tree_is_indirect_ptr(ret))) > ret = RADIX_TREE_RETRY; > return ret; > > The patch above fixes my problem. I did two complete test runs that normally fail rather quickly. Regards, Peter. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs