From: Peter Zijlstra
To: Neil Brown
Cc: Willy Tarreau, Xin Zhao, linux-kernel, linux-fsdevel@vger.kernel.org
Subject: Re: What's the NFS OOM problem?
Date: Fri, 11 Aug 2006 10:48:30 +0200
Message-ID: <1155286110.5696.64.camel@twins>

On Fri, 2006-08-11 at 10:33 +1000, Neil Brown wrote:
> On Thursday August 10, w@1wt.eu wrote:
> > 
> > > Can someone help me and give me a brief description on OOM issue?
> > 
> > I don't know about any OOM issue related to NFS. At most it might happen
> > on the client (eg: starting firefox from an NFS root) which might not have
> > enough memory for new network buffers, but I don't even know if it's
> > possible at all.
> 
> We've had reports of OOM problems with NFS at SuSE.
> The common factors seem to be lots of memory (6G+) and very large
> files.
> Tuning down /proc/sys/vm/dirty_*ratio seems to avoid the problem,
> but I'm not very close to understanding what the real problem is.

Could it be related to mmap'ed files, where the client does not properly track the dirty pages? That would make the reclaim code fall over: suddenly not a single page is easily freeable anymore, because every page it looks at turns out to be dirty and needs writeback, and writeback itself takes more memory (i.e. allocating network packets and waiting for the server's reply).

Andrew is currently carrying some patches that will avoid this problem by tracking when mmap'ed pages get dirtied.
With these patches nr_dirty is properly incremented, so the pdflush logic should kick in and do its thing. It would also explain why lowering dirty_*ratio sometimes helps: that kicks off the pdflush thread earlier, which then gets a chance to detect the previously unknown dirty pages.