* nfsd4_stateowners problem
From: Peter Skensved @ 2010-08-27 17:48 UTC
To: linux-nfs
I'm looking for pointers and information on how to debug an annoying NFS
problem that has been bugging us for a long time. The problem is that the number
of nfsd4_stateowners keeps increasing until all low memory is exhausted and
the oom-killer is invoked. The severity of the problem has changed over time
with different kernels. At present it takes about 5 weeks for the size to
grow to 500 MB (kernel 2.6.18-194.8.1.el5PAE, CentOS 5.5). Restarting
nfs clears up the problem, but it is definitely not the preferred solution.
The increase in the number of nfsd4_stateowners appears to happen in bursts.
Nothing happens for long periods and then I suddenly see a burst. I've tried (briefly)
to turn on all logging in rpcdebug and have run tcpdump while watching slabtop,
but there is too much output to be able to see if there is anything strange
happening. So my question is: how do I limit the diagnostic output to what
is relevant? What are the modules and flags that I should be looking at?
Any other info I should be monitoring? /proc/fs/nfsfs?
peter
----
Peter Skensved Email : peter@SNO.Phy.QueensU.CA
Dept. of Physics,
Queen's University,
Kingston, Ontario,
Canada
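Since the growth is described as coming in bursts, one way to narrow the diagnostic output is to record only the moments when the nfsd4_stateowners slab jumps, and then inspect the tcpdump or rpcdebug output around those timestamps. What follows is a minimal sketch and not something proposed in this thread. It assumes /proc/slabinfo is readable (this may need root) and that its data lines use the usual slabinfo 2.x column order (name, active_objs, num_objs, objsize). The function names, the 60-second interval and the 1000-object threshold are arbitrary choices for illustration.

```python
import time

SLAB_NAME = "nfsd4_stateowners"  # slab cache named in the report above


def read_slab(slabinfo_text, name=SLAB_NAME):
    """Return (active_objs, num_objs, objsize_bytes) for one slab cache, or None."""
    for line in slabinfo_text.splitlines():
        fields = line.split()
        # Assumed column order: name active_objs num_objs objsize ...
        if len(fields) >= 4 and fields[0] == name:
            return int(fields[1]), int(fields[2]), int(fields[3])
    return None


def find_bursts(samples, threshold):
    """Given [(timestamp, active_objs), ...], return (timestamp, increase)
    for every sample whose count rose by more than `threshold`."""
    bursts = []
    for (t0, n0), (t1, n1) in zip(samples, samples[1:]):
        if n1 - n0 > threshold:
            bursts.append((t1, n1 - n0))
    return bursts


def watch(interval=60, threshold=1000, path="/proc/slabinfo"):
    """Poll the slab count forever; print a timestamped line on each jump."""
    previous = None
    while True:
        with open(path) as f:
            slab = read_slab(f.read())
        if slab is not None:
            active = slab[0]
            if previous is not None and active - previous > threshold:
                stamp = time.strftime("%Y-%m-%d %H:%M:%S")
                print(stamp, "burst:", previous, "->", active)
            previous = active
        time.sleep(interval)
```

Calling `watch()` from a long-running process prints one line per burst; those timestamps can then be matched against a packet capture taken over the same period, instead of reading the whole capture.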
* Re: nfsd4_stateowners problem
From: J. Bruce Fields @ 2010-09-14 17:31 UTC
To: Peter Skensved; +Cc: linux-nfs
On Fri, Aug 27, 2010 at 01:48:23PM -0400, Peter Skensved wrote:
>
> I'm looking for pointers and information on how to debug an annoying NFS
> problem that has been bugging us for a long time. The problem is that the number
> of nfsd4_stateowners keeps increasing until all low memory is exhausted and
> the oom-killer is invoked. The severity of the problem has changed over time
> with different kernels. At present it takes about 5 weeks for the size to
> grow to 500 MB (kernel 2.6.18-194.8.1.el5PAE, CentOS 5.5). Restarting
> nfs clears up the problem, but it is definitely not the preferred solution.
>
> The increase in the number of nfsd4_stateowners appears to happen in bursts.
> Nothing happens for long periods and then I suddenly see a burst. I've tried (briefly)
> to turn on all logging in rpcdebug and have run tcpdump while watching slabtop,
> but there is too much output to be able to see if there is anything strange
> happening. So my question is: how do I limit the diagnostic output to what
> is relevant? What are the modules and flags that I should be looking at?
> Any other info I should be monitoring? /proc/fs/nfsfs?
From the point of view of upstream, 2.6.18 is a bit old.
I can't think of any existing logging or statistics that would answer
the question; we'd probably need to add some more.
--b.
* Re: nfsd4_stateowners problem
From: Peter Skensved @ 2010-09-14 18:40 UTC
To: J. Bruce Fields; +Cc: linux-nfs
Thanks for the reply. The current Red Hat EL5 kernels are all based on 2.6.18 with
a lot of backported fixes, so I'm not sure what version of the NFS code I'm effectively
running.
Do you know what the state_owners are used for? What puzzles me is that in our case
we have a large number of workstations which NFS-mount some fairly large, mostly static
common directories and automount HOME directories. So I would expect the amount of state
info that needs to be kept to be fairly constant. When the automounter unmounts, the
info ought to go away. Yet the number of stateowners for the most part just keeps on
growing.
The only workaround at the moment is to reboot before it has eaten up around 500 MB
of slabs.
peter
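For a sense of scale, the figures quoted in this thread (about 500 MB after about 5 weeks) can be turned into an average rate. This is only a planning number: the growth is reported to arrive in bursts, so the real curve is not linear. The helper names below are hypothetical and the sketch assumes nothing beyond those two figures.

```python
def average_growth(total_mb, weeks):
    """Return (MB per day, MB per hour) for `total_mb` of growth over `weeks`."""
    days = weeks * 7.0
    per_day = total_mb / days
    return per_day, per_day / 24.0


def days_until(limit_mb, current_mb, mb_per_day):
    """Days left before `current_mb` reaches `limit_mb` at the average rate."""
    return max(0.0, (limit_mb - current_mb) / mb_per_day)


per_day, per_hour = average_growth(500, 5)  # figures quoted in the thread
print("%.1f MB/day, %.2f MB/hour" % (per_day, per_hour))
# prints: 14.3 MB/day, 0.60 MB/hour
```

Feeding the current slab size into `days_until` gives a rough date for the next scheduled restart while the underlying leak is still being investigated.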
* Re: nfsd4_stateowners problem
From: J. Bruce Fields @ 2010-09-14 20:00 UTC
To: Peter Skensved; +Cc: linux-nfs
On Tue, Sep 14, 2010 at 02:40:44PM -0400, Peter Skensved wrote:
> Thanks for the reply. The current Red Hat EL5 kernels are all based on 2.6.18 with
> a lot of backported fixes, so I'm not sure what version of the NFS code I'm effectively
> running.
>
> Do you know what the state_owners are used for? What puzzles me is that in our case
They represent some notion of "who" is performing an open, or performing
a lock.
> we have a large number of workstations which NFS-mount some fairly large, mostly static
> common directories and automount HOME directories. So I would expect the amount of state
> info that needs to be kept to be fairly constant. When the automounter unmounts, the
> info ought to go away. Yet the number of stateowners for the most part just keeps on
> growing.
>
> The only workaround at the moment is to reboot before it has eaten up around 500 MB
> of slabs.
Is someone doing a lot of file locking?
I can't remember the logic the server uses to decide when to throw away
a lockowner, but it may just be inadequate.
The client has also had some fixes recently to be better about telling
the server when to throw them away.
--b.
* Re: nfsd4_stateowners problem
From: Peter Skensved @ 2010-09-14 20:24 UTC
To: J. Bruce Fields; +Cc: linux-nfs
On Tue, Sep 14, 2010 at 04:00:54PM -0400, J. Bruce Fields wrote:
> Is someone doing a lot of file locking?
Not that I'm aware of.