Date: Tue, 28 Oct 2008 17:25:24 +1100
From: Dave Chinner
Subject: Re: deadlock with latest xfs
Message-ID: <20081028062524.GQ4985@disturbed>
References: <4900412A.2050802@sgi.com> <20081026005351.GK18495@disturbed> <20081026025013.GL18495@disturbed> <200810281702.17135.nickpiggin@yahoo.com.au>
In-Reply-To: <200810281702.17135.nickpiggin@yahoo.com.au>
To: Nick Piggin
Cc: Lachlan McIlroy, Christoph Hellwig, xfs-oss, linux-mm@kvack.org

On Tue, Oct 28, 2008 at 05:02:16PM +1100, Nick Piggin wrote:
> On Sunday 26 October 2008 13:50, Dave Chinner wrote:
> > [1] I don't see how any of the XFS changes we made make this easier to hit.
> > What I suspect is a VM regression w.r.t. memory reclaim because this is
> > the second problem since 2.6.26 that appears to be a result of memory
> > allocation failures in places that we've never, ever seen failures before.
> >
> > The other new failure is this one:
> >
> > http://bugzilla.kernel.org/show_bug.cgi?id=11805
> >
> > which is an alloc_pages(GFP_KERNEL) failure....
> >
> > mm-folk - care to weigh in?
>
> order-0 alloc page GFP_KERNEL can fail sometimes.
> If it is called from reclaim or PF_MEMALLOC thread; if it is OOM-killed;
> fault injection.
>
> This is even the case for __GFP_NOFAIL allocations (which basically
> are buggy anyway).
>
> Not sure why it might have started happening, but I didn't see
> exactly which alloc_pages you are talking about? If it is via slab,
> then maybe some parameters have changed (eg. in SLUB) which is
> using higher order allocations.

In fs/xfs/linux-2.6/xfs_buf.c::xfs_buf_get_noaddr(). It's doing a
single page allocation at a time.

It may be that this failure is caused by an increase in the base memory
consumption of the kernel, as this failure was reported in an lguest
guest and reproduced with a simple 'modprobe xfs ; mount /dev/xxx /mnt/xfs'
command. Maybe the lguest had very little memory available to begin
with, and trying to allocate 2MB of pages for 8x256k log buffers may
have been too much for it...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com