From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jeff Layton Subject: Re: OOM kills when running fsstress on CIFS Date: Tue, 25 May 2010 12:32:06 -0400 Message-ID: <20100525123206.6b737c88@corrin.poochiereds.net> References: <20100525065705.5a5d95e1@corrin.poochiereds.net> <20100525111639.GM5087@laptop> <20100525074929.1b5f813f@corrin.poochiereds.net> <20100525115431.GO5087@laptop> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Cc: linux-fsdevel@vger.kernel.org, linux-cifs-client@lists.samba.org To: Nick Piggin Return-path: In-Reply-To: <20100525115431.GO5087@laptop> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: linux-cifs-client-bounces@lists.samba.org Errors-To: linux-cifs-client-bounces@lists.samba.org List-Id: linux-fsdevel.vger.kernel.org On Tue, 25 May 2010 21:54:31 +1000 Nick Piggin wrote: > On Tue, May 25, 2010 at 07:49:29AM -0400, Jeff Layton wrote: > > On Tue, 25 May 2010 21:16:39 +1000 > > Nick Piggin wrote: > > > > > On Tue, May 25, 2010 at 06:57:05AM -0400, Jeff Layton wrote: > > > > Since 2.6.34, I've been able to consistently reproduce OOM kills when running fsstress (from the LTP suite) on CIFS. I spent some time yesterday and bisected it down to this patch: > > > > > > > > ---------------------[snip]--------------------- > > > > commit 315e995c63a15cb4d4efdbfd70fe2db191917f7a > > > > Author: Nick Piggin > > > > Date: Wed Apr 21 03:18:28 2010 +0000 > > > > > > > > [CIFS] use add_to_page_cache_lru > > > > > > > > add_to_page_cache_lru is exported, so it should be used. Benefits over > > > > using a private pagevec: neater code, 128 bytes fewer stack used, percpu > > > > lru ordering is preserved, and finally don't need to flush pagevec > > > > before returning so batching may be shared with other LRU insertions. > > > > > > > > Signed-off-by: Nick Piggin > > > > Reviewed-by: Dave Kleikamp > > > > Signed-off-by: Steve French > > > > ---------------------[snip]--------------------- > > > > > > > > Here's how I've been reproducing it: > > > > > > > > Mount up a samba share with -o sec=krb5i,nounix,noserverino > > > > > > > > Run: fsstress -d /path/to/dir/on/cifs/ -n 1000 -l0 -p8 > > > > > > > > ...within an hour or two, I start getting OOM kills. After backing out > > > > the patch above, I was able to run the test overnight. I'm not sure yet > > > > what the actual problem is, but there seems to be something wrong with > > > > that patch. > > > > > > > > Thoughts? > > > > > > Yep, it's my fault. The problem is the refcounting. Previously the > > > code hands off the references to the LRU, wheras now the lru takes > > > a new reference. (the other filesystems converted to use this > > > function seemed to more conventionally open-code lru_cache_add). > > > > > > Can we get rid of a refcount increment anywhere? Otherwise we'll > > > need to just drop the references after adding the pages. > > > > > > > The only caller of this function is cifs_readpages, and I don't see > > where it takes any references. I'm guessing that the pages come from > > the VFS with the refcount already incremented? > > Yep. I think we should just page_cache_release after doing the > add_to_page_cache_lru, like the generic code does. > > It's a little suboptimal tinkering with refcounts like that, but > a cleanup pass to fix it up all over the tree and allow > add_to_page_cache_lru to take over a reference would be better > place to fix that. > Ok, I tested out the patch that I just sent and it seems to have fixed the problem. Look ok? -- Jeff Layton