From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from fieldses.org ([174.143.236.118]:38047 "EHLO fieldses.org"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751581Ab0INRdA (ORCPT ); Tue, 14 Sep 2010 13:33:00 -0400
Date: Tue, 14 Sep 2010 13:31:54 -0400
To: Peter Skensved
Cc: linux-nfs@vger.kernel.org
Subject: Re: nfsd4_stateowners problem
Message-ID: <20100914173154.GC2409@fieldses.org>
References: <20100827174823.GA26792@jay.phy.QueensU.CA>
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <20100827174823.GA26792@jay.phy.QueensU.CA>
From: "J. Bruce Fields"
Sender: linux-nfs-owner@vger.kernel.org
List-ID:
MIME-Version: 1.0

On Fri, Aug 27, 2010 at 01:48:23PM -0400, Peter Skensved wrote:
>
>  I'm looking for pointers and information on how to debug an annoying NFS
> problem that has been bugging us for a long time. The problem is that the number
> of nfsd4_stateowners keeps increasing until all low memory is exhausted and
> the oom-killer is invoked. The severity of the problem has changed over time
> with different kernels. At present it takes about 5 weeks for the size to
> grow to 500 Mb ( kernel 2.6.18-194.8.1.el5PAE, CentOS5.5 ). Restarting
> nfs clears up the problem but it is definitely not the preferred solution.
>
>  The increase in the number of nfsd4_stateowners appears to happen in bursts.
> Nothing happens for long times and I suddenly see a burst. I've tried ( briefly )
> to turn on all logging in rpcdebug and have run tcpdump while watching slabtop
> but there is too much output to be able to see if there is anything strange
> happening. So - my question is : how do I limit the diagnostic output to what
> is relevant ? What are the modules and flags that I should be looking at ?
> Any other info I should be monitoring ? /proc/fs/nfsfs ?
>

From the point of view of upstream, 2.6.18 is a bit old.

I can't think of any existing logging or statistics that would answer
the question; we'd probably need to add some more.

--b.
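
In the absence of dedicated logging, one low-noise option might be to sample
the slab counter directly and timestamp only the bursts, so that a tcpdump
capture running in parallel can be narrowed to those moments. The following is
a minimal sketch, not an existing tool: it assumes /proc/slabinfo is readable
(usually root only), that it uses the common "name active_objs num_objs
objsize ..." column layout, and that the cache is listed as nfsd4_stateowners.
The function names, the 10-second interval and the 100-object threshold are
arbitrary choices made for illustration.

```python
import time

CACHE = "nfsd4_stateowners"  # cache name as reported in the thread


def parse_slab(text, cache=CACHE):
    """Return (active_objs, objsize_bytes) for one cache, or None if absent.

    Assumes the usual slabinfo column order:
    name active_objs num_objs objsize ...
    """
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] == cache:
            return int(fields[1]), int(fields[3])
    return None


def find_bursts(samples, threshold):
    """Given [(timestamp, active_objs), ...], return a (timestamp, delta)
    pair for every consecutive step that grew by at least `threshold`."""
    bursts = []
    for (_, prev), (ts, cur) in zip(samples, samples[1:]):
        if cur - prev >= threshold:
            bursts.append((ts, cur - prev))
    return bursts


def watch(interval=10, threshold=100, path="/proc/slabinfo"):
    """Poll the slab counter forever and print one line per burst.

    interval and threshold are illustrative defaults, not tuned values.
    """
    prev = None
    while True:
        with open(path) as f:
            parsed = parse_slab(f.read())
        if parsed is not None:
            active, objsize = parsed
            if prev is not None and active - prev >= threshold:
                stamp = time.strftime("%Y-%m-%d %H:%M:%S")
                print("%s %s grew by %d objects (now %d, about %d bytes)"
                      % (stamp, CACHE, active - prev, active, active * objsize))
            prev = active
        time.sleep(interval)


if __name__ == "__main__":
    watch()
```

The timestamps it prints could then be used to cut the packet capture down to
a short window around each burst, rather than reading the whole trace.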