Linux NFS development
* nfs_statfs: statfs error = 116
@ 2003-09-18  9:56 Marc Schmitt
  2003-09-18 13:49 ` Steve Dickson
  0 siblings, 1 reply; 10+ messages in thread
From: Marc Schmitt @ 2003-09-18  9:56 UTC (permalink / raw)
  To: nfs

Dear all,

Server: RedHat 7.3, kernel 2.4.20-19.7smp, nfs-utils 1.0.5
Client: RedHat 7.3, kernel 2.4.18-27.7.x, nfs-utils 0.3.3-6.73

There are over 200 entries in /etc/exports. After issuing 'exportfs -r',
some clients are left with certain home directories stale.

On the client, the dmesg output says:

nfs: server x not responding, still trying
nfs: server x OK
nfs_statfs: statfs error = 116
nfs_statfs: statfs error = 116
nfs_statfs: statfs error = 116
nfs_statfs: statfs error = 116

bash# ls -ld /home/user
ls: /home/user: Stale NFS file handle
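The number in "statfs error = 116" is a Linux errno value. Python's standard errno table decodes it. On Linux, 116 is ESTALE, which matches the "Stale NFS file handle" message from ls. The helper name describe_errno is made up for this illustration. Other platforms use different numbers, so the decoding only holds when it runs on Linux.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME: message' for a numeric errno, e.g. from 'statfs error = 116'."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # On Linux this reports ESTALE; other systems may map 116 to something else.
    print(describe_errno(116))
```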

On the server, I'm looking at the traffic from the client
('tcpdump -s300 -i eth2 -Nt host client'):
...
client.4180178520 > fs.nfs: 112 read fh Unknown/1 4096 bytes @ 0x000029000 (DF)
x.nfs > client.4180178520: reply ok 32 (DF)
client.4196955736 > fs.nfs: 112 read fh Unknown/1 4096 bytes @ 0x000029000 (DF)
x.nfs > client.4196955736: reply ok 32 (DF)
client.4213732952 > fs.nfs: 112 read fh Unknown/1 4096 bytes @ 0x000029000 (DF)
x.nfs > client.4213732952: reply ok 32 (DF)
client.4230510168 > fs.nfs: 112 read fh Unknown/1 4096 bytes @ 0x000029000 (DF)
x.nfs > client.4230510168: reply ok 32 (DF)
client.4247287384 > fs.nfs: 112 read fh Unknown/1 4096 bytes @ 0x000029000 (DF)
x.nfs > client.4247287384: reply ok 32 (DF)
client.4264064600 > fs.nfs: 112 read fh Unknown/1 4096 bytes @ 0x000029000 (DF)
x.nfs > client.4264064600: reply ok 32 (DF)
client.4280841816 > fs.nfs: 112 read fh Unknown/1 4096 bytes @ 0x000029000 (DF)
x.nfs > client.4280841816: reply ok 32 (DF)
...
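One possible reading of the "reply ok 32" lines starts from their size. A READ that returned 4096 bytes of data could not fit in a 32-byte reply, so each reply most likely carries only an error status. The sketch below uses Python's struct module to pack an ONC RPC accepted-reply header (RFC 1831), followed by an NFSv3 READ failure body (RFC 1813). That body is a status word plus a post_op_attr that carries no attributes. The sketch then checks the total length. It rests on several assumptions: NFSv3 over UDP, an empty AUTH_NULL reply verifier, and a tcpdump length that counts the whole RPC message. Under NFSv2 the failure body would be 4 bytes shorter. NFS3ERR_STALE (70) is used as an example status. The trace itself does not show which status the server actually sent.

```python
import struct

# ONC RPC constants (RFC 1831)
REPLY = 1
MSG_ACCEPTED = 0
SUCCESS = 0
AUTH_NULL = 0

# NFSv3 status code (RFC 1813); used here only as an example value
NFS3ERR_STALE = 70


def rpc_accepted_reply_header(xid: int) -> bytes:
    """xid, msg_type, reply_stat, verifier (flavor + zero-length body), accept_stat."""
    return struct.pack(">6I", xid, REPLY, MSG_ACCEPTED, AUTH_NULL, 0, SUCCESS)


def nfs3_read_failure(status: int) -> bytes:
    """READ3resfail: nfsstat3 status, then post_op_attr with attributes_follow = FALSE."""
    return struct.pack(">2I", status, 0)


if __name__ == "__main__":
    reply = rpc_accepted_reply_header(4180178520) + nfs3_read_failure(NFS3ERR_STALE)
    print(len(reply))
```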

I don't know how to interpret this, but the facts are:

- the server x is running NFS, and most clients still work w/o problems
- the client cannot recover from the stale handle, and it looks like it is
spamming the server
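For NFS traffic, tcpdump prints each request's RPC transaction ID (XID) where a port number would normally appear. The short check below uses the XIDs copied from the trace. Each READ carries a new XID, so the client appears to be issuing fresh requests rather than retransmitting one request. Consecutive XIDs also differ by exactly 0x01000000 and become consecutive integers once byte-swapped. Why they appear byte-swapped is a guess that this trace alone cannot confirm; one candidate is a counter stored in host byte order on a little-endian client. The helper names are illustrative.

```python
import struct

# Client XIDs from the READ lines in the trace above, as printed by tcpdump.
XIDS = [4180178520, 4196955736, 4213732952, 4230510168,
        4247287384, 4264064600, 4280841816]


def byteswap32(value: int) -> int:
    """Reverse the byte order of an unsigned 32-bit integer."""
    return struct.unpack("<I", struct.pack(">I", value))[0]


def deltas(values):
    """Differences between consecutive values."""
    return [b - a for a, b in zip(values, values[1:])]


if __name__ == "__main__":
    print("all XIDs distinct:", len(set(XIDS)) == len(XIDS))
    print("raw deltas:", [hex(d) for d in deltas(XIDS)])
    print("byte-swapped deltas:", deltas([byteswap32(x) for x in XIDS]))
```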

My questions are:

- Is this a race?
- Is there a way to get the client working again w/o having to reboot 
(or kill all the users' processes and umount the home if that's 
possible)? I tried restarting rpc.statd on the client but that did not help.
- How can I provide more debugging information if needed?
- Could this be related to the thread "[NFS] nfs errors clutter up logs 
after 2.4.20 -> 2.4.22-pre10"? We see a lot of messages like that 
on all clients.

Thanks for your help.

Regards,
     Marc







-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: nfs_statfs: statfs error = 116
  2003-09-18  9:56 nfs_statfs: statfs error = 116 Marc Schmitt
@ 2003-09-18 13:49 ` Steve Dickson
  2003-09-18 19:24   ` Bernd Schubert
                     ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Steve Dickson @ 2003-09-18 13:49 UTC (permalink / raw)
  To: Marc Schmitt; +Cc: nfs



Marc Schmitt wrote:

>
> My questions are:
>
> - Is this a race? 

It sounds to me like it could be a server issue under
a very heavy load...  How many nfsd are you running? Try
increasing the number to see if that helps....

>
> - Is there a way to get the client working again w/o having to reboot 
> (or kill all the users' processes and umount the home if that's 
> possible)? I tried restarting rpc.statd on the client but that did not 
> help.

not really... :(

> - How can I provide more debugging information if needed? 

Ethereal traces have more information and are
generally more useful... IMO...

SteveD.





* Re: nfs_statfs: statfs error = 116
  2003-09-18 13:49 ` Steve Dickson
@ 2003-09-18 19:24   ` Bernd Schubert
  2003-09-18 19:31     ` Marc Schmitt
  2003-09-18 19:26   ` Marc Schmitt
  2003-09-21 13:38   ` Marc Schmitt
  2 siblings, 1 reply; 10+ messages in thread
From: Bernd Schubert @ 2003-09-18 19:24 UTC (permalink / raw)
  To: Steve Dickson, Marc Schmitt; +Cc: nfs

> > - Is there a way to get the client working again w/o having to reboot
> > (or kill all the users' processes and umount the home if that's
> > possible)? I tried restarting rpc.statd on the client but that did not
> > help.
>
> not really... :(
>

Mounting the directory again should help. IMHO not a nice solution, but it 
always worked on our systems.

Cheers,
	Bernd




* Re: nfs_statfs: statfs error = 116
  2003-09-18 13:49 ` Steve Dickson
  2003-09-18 19:24   ` Bernd Schubert
@ 2003-09-18 19:26   ` Marc Schmitt
  2003-09-19  0:22     ` Neil Brown
  2003-09-21 13:38   ` Marc Schmitt
  2 siblings, 1 reply; 10+ messages in thread
From: Marc Schmitt @ 2003-09-18 19:26 UTC (permalink / raw)
  To: Steve Dickson; +Cc: nfs

On Thu, 2003-09-18 at 15:49, Steve Dickson wrote:
> Marc Schmitt wrote:
> 
> >
> > My questions are:
> >
> > - Is this a race? 
> 
> It sounds to me like it could be a server issue under
> a very heavy load...  How many nfsd are you running? Try
> increasing the number to see if that helps....

Thanks, I'm trying that: I changed the number of nfsd threads from 32 to 64
on the production system.

> > - How can I provide more debugging information if needed? 
> 
> ethereal traces have more information and are
> generally more useful... imo...

I'll try to get a test setup running with the same software versions,
create a couple of hundred exports and hammer it with bonnie++ runs from
one of our clusters. That way I'll hopefully be able to reproduce
this re-exporting issue.

A user has found a bug that appears when checking out a big Subversion
repository onto the same server over NFS: with this huge number of file
operations the checkout always times out and fails. He then
reproduced the issue with a small script that basically loops over these
four calls:

rename ("old/bla", "new/bla")
stat("new/bla",..)
chmod("new/bla")
rename ("new/bla", "old/bla")

Before reaching 1000 iterations, the script fails with: Error setting
new/bla to read-only! We'll try to narrow this down on the test cluster,
too. One peculiarity has been found already: the bug only appears if
the rename crosses directory boundaries.
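A minimal Python sketch of the loop described above follows. It assumes the script runs against a directory on the NFS mount and creates the old/ and new/ subdirectories there if they are missing. The actual script and its chmod mode were not posted. Clearing the write bits (making the file read-only, as the error message suggests) is an assumption, and rename_loop is a hypothetical name.

```python
import os
import stat

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def rename_loop(base: str, iterations: int = 1000) -> int:
    """Move base/old/bla to base/new/bla and back, making it read-only in between.

    Returns the number of completed iterations; raises RuntimeError on the first failure.
    """
    old_dir = os.path.join(base, "old")
    new_dir = os.path.join(base, "new")
    os.makedirs(old_dir, exist_ok=True)
    os.makedirs(new_dir, exist_ok=True)
    old_path = os.path.join(old_dir, "bla")
    new_path = os.path.join(new_dir, "bla")
    if not os.path.exists(old_path):
        open(old_path, "w").close()

    for i in range(iterations):
        try:
            os.rename(old_path, new_path)  # cross-directory rename
            mode = os.stat(new_path).st_mode
            os.chmod(new_path, stat.S_IMODE(mode) & ~WRITE_BITS)  # assumed: read-only
            os.rename(new_path, old_path)  # and back again
        except OSError as exc:
            raise RuntimeError(f"iteration {i}: {exc}") from exc
    return iterations


if __name__ == "__main__":
    import sys
    if len(sys.argv) == 2:
        print(rename_loop(sys.argv[1]))
    else:
        print("usage: rename_loop.py DIR_ON_NFS_MOUNT")
```

On a local filesystem the loop should simply complete. Per the report, the failure only showed up over NFS.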

Regards,
    Marc





* Re: nfs_statfs: statfs error = 116
  2003-09-18 19:24   ` Bernd Schubert
@ 2003-09-18 19:31     ` Marc Schmitt
  2003-09-18 19:49       ` Bernd Schubert
  0 siblings, 1 reply; 10+ messages in thread
From: Marc Schmitt @ 2003-09-18 19:31 UTC (permalink / raw)
  To: Bernd Schubert; +Cc: nfs

On Thu, 2003-09-18 at 21:24, Bernd Schubert wrote:
> > > - Is there a way to get the client working again w/o having to reboot
> > > (or kill all the users' processes and umount the home if that's
> > > possible)? I tried restarting rpc.statd on the client but that did not
> > > help.
> >
> > not really... :(
> >
> 
> Mounting the directory again should help. IMHO not a nice solution, but it 
> always worked on our systems.

Do you mean 'mount -o remount /home/user'?
I've tried that; it didn't work.

Greetz
   Marc




* Re: nfs_statfs: statfs error = 116
  2003-09-18 19:31     ` Marc Schmitt
@ 2003-09-18 19:49       ` Bernd Schubert
  2003-11-14 14:57         ` Marc Schmitt
  0 siblings, 1 reply; 10+ messages in thread
From: Bernd Schubert @ 2003-09-18 19:49 UTC (permalink / raw)
  To: Marc Schmitt; +Cc: nfs

On Thursday 18 September 2003 21:31, Marc Schmitt wrote:
> On Thu, 2003-09-18 at 21:24, Bernd Schubert wrote:
> > > > - Is there a way to get the client working again w/o having to reboot
> > > > (or kill all the users' processes and umount the home if that's
> > > > possible)? I tried restarting rpc.statd on the client but that did
> > > > not help.
> > >
> > > not really... :(
> >
> > Mounting the directory again should help. IMHO not a nice solution, but it
> > always worked on our systems.
>
> Do you mean 'mount -o remount /home/user'?
> I've tried that; it didn't work.
>

No, no, I simply mean:

mount -t nfs server:export_dir   target_dir 

so just 'overmounting' the already mounted directory.

Hope it helps,
Bernd




* Re: nfs_statfs: statfs error = 116
  2003-09-18 19:26   ` Marc Schmitt
@ 2003-09-19  0:22     ` Neil Brown
  2003-09-19  9:27       ` Marc Schmitt
  0 siblings, 1 reply; 10+ messages in thread
From: Neil Brown @ 2003-09-19  0:22 UTC (permalink / raw)
  To: Marc Schmitt; +Cc: Steve Dickson, nfs

On  September 18, mschmitt@inf.ethz.ch wrote:
> 
> A user has found a bug that appears when checking out a big Subversion
> repository onto the same server over NFS: with this huge number of file
> operations the checkout always times out and fails. He then
> reproduced the issue with a small script that basically loops over these
> four calls:
> 
> rename ("old/bla", "new/bla")
> stat("new/bla",..)
> chmod("new/bla")
> rename ("new/bla", "old/bla")
> 
> Before reaching 1000 iterations, the script fails with: Error setting
> new/bla to read-only! We'll try to narrow this down on the test cluster,
> too. One peculiarity has been found already: the bug only appears if
> the rename crosses directory boundaries.

Sounds like you are using the "subtree_check" export flag on that
export (possibly implicitly).  You don't want to, so add
"no_subtree_check" after reading "man exports".

NeilBrown
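The original report mentions more than 200 entries in /etc/exports. A small parser can list the exports that do not explicitly set no_subtree_check. This is a simplified sketch that handles only lines of the form `path client(opts) client(opts)` plus `#` comments. It does not handle syntax such as quoted paths or backslash line continuations, so its output is a starting point only. An entry that sets neither flag gets whatever the installed nfs-utils uses as its default, which is the "possibly implicitly" case above. The function names and sample hosts are hypothetical.

```python
def parse_exports(text: str):
    """Yield (path, client, options) for each client spec in exports-style text.

    Simplified: strips '#' comments and blank lines; does not handle quoted
    paths or backslash line continuations.
    """
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        path, *clients = line.split()
        for spec in clients:
            host, _, opts = spec.partition("(")
            options = [o.strip() for o in opts.rstrip(")").split(",") if o.strip()]
            yield path, host, options


def missing_no_subtree_check(text: str):
    """Return (path, client) pairs whose options do not include no_subtree_check."""
    return [(path, host) for path, host, opts in parse_exports(text)
            if "no_subtree_check" not in opts]


if __name__ == "__main__":
    # Hypothetical sample; in practice read the contents of /etc/exports instead.
    sample = """\
/home/alice  ws1(rw,sync)
/home/bob    ws1(rw,sync,no_subtree_check) ws2(ro)
# /home/old  ws1(rw)
"""
    for path, host in missing_no_subtree_check(sample):
        print(path, host)
```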



* Re: nfs_statfs: statfs error = 116
  2003-09-19  0:22     ` Neil Brown
@ 2003-09-19  9:27       ` Marc Schmitt
  0 siblings, 0 replies; 10+ messages in thread
From: Marc Schmitt @ 2003-09-19  9:27 UTC (permalink / raw)
  To: Neil Brown; +Cc: nfs

Neil Brown wrote:

>On  September 18, mschmitt@inf.ethz.ch wrote:
>  
>
>>A user has found a bug that appears when checking out a big Subversion
>>repository onto the same server over NFS: with this huge number of file
>>operations the checkout always times out and fails. He then
>>reproduced the issue with a small script that basically loops over these
>>four calls:
>>
>>rename ("old/bla", "new/bla")
>>stat("new/bla",..)
>>chmod("new/bla")
>>rename ("new/bla", "old/bla")
>>
>>Before reaching 1000 iterations, the script fails with: Error setting
>>new/bla to read-only! We'll try to narrow this down on the test cluster,
>>too. One peculiarity has been found already: the bug only appears if
>>the rename crosses directory boundaries.
>>    
>>
>
>Sounds like you are using the "subtree_check" export flag on that
>export (possibly implicitly).  You don't want to, so add
>"no_subtree_check" after reading "man exports".
>  
>
Excellent, that worked. Thank you very much.

    Marc




* Re: nfs_statfs: statfs error = 116
  2003-09-18 13:49 ` Steve Dickson
  2003-09-18 19:24   ` Bernd Schubert
  2003-09-18 19:26   ` Marc Schmitt
@ 2003-09-21 13:38   ` Marc Schmitt
  2 siblings, 0 replies; 10+ messages in thread
From: Marc Schmitt @ 2003-09-21 13:38 UTC (permalink / raw)
  To: Steve Dickson; +Cc: nfs

On Thu, 2003-09-18 at 15:49, Steve Dickson wrote:
> Marc Schmitt wrote:
> 
> >
> > My questions are:
> >
> > - Is this a race? 
> 
> It sounds to me like it could be a server issue under
> a very heavy load...  How many nfsd are you running? Try
> increasing the number to see if that helps....

I remembered that the NFS-HowTo addresses this by giving a rule for
determining whether one needs more nfsd threads. The HowTo says
(http://nfs.sourceforge.net/nfs-howto/performance.html section 5.6):
"If you are using a 2.4 or higher kernel and you want to see how heavily
each nfsd thread is being used, you can look at the file
/proc/net/rpc/nfsd. The last ten numbers on the th line in that file
indicate the number of seconds that the thread usage was at that
percentage of the maximum allowable. If you have a large number in the
top three deciles, you may wish to increase the number of nfsd
instances."

The th line looks like this (after changing to 64 nfsd threads, obviously):
th 64 6121728 134012.900 61327.500 34092.130 21573.980 22513.750 8121.200 5826.550 4062.540 3129.340 26975.820

The last ten numbers are then:
134012.900 61327.500 34092.130 21573.980 22513.750 8121.200 5826.550 4062.540 3129.340 26975.820

But what is meant by "top three deciles"? I have an English
comprehension problem here, sorry. I looked up the word decile and it
means what I guessed: one tenth, or one unit out of ten. Does that mean
that the top three deciles are:
134012.900 61327.500 34092.130 ?

That does not make sense to me, because it says "If you have a large
number...": does it refer to a single number, or should it read "If you
have a large number amongst the top..."?

And then what is "the thread usage at that percentage of the maximum
allowable"? Which value refers to the maximum? 6121728?

Can someone please try to explain this to me? I'm pretty much lost...
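A small parser can make the question concrete. It assumes the format that the HOWTO passage quoted above suggests: `th <threads> <second field> <ten usage values>`. It further assumes that the ten values run from the lowest-utilisation decile (0-10% of the maximum) to the highest (90-100%). Under that assumption, the "top three deciles" are the last three values, not the first three. The sketch does not interpret the second field (6121728). parse_th_line and top_deciles_share are hypothetical helper names.

```python
from typing import List, Tuple


def parse_th_line(line: str) -> Tuple[int, int, List[float]]:
    """Split a 'th' line from /proc/net/rpc/nfsd into (threads, second_field, ten_values)."""
    fields = line.split()
    if len(fields) != 13 or fields[0] != "th":
        raise ValueError(f"unexpected th line: {line!r}")
    return int(fields[1]), int(fields[2]), [float(v) for v in fields[3:]]


def top_deciles_share(values: List[float], n: int = 3) -> float:
    """Fraction of the total time spent in the highest n deciles (the last n values)."""
    total = sum(values)
    return sum(values[-n:]) / total if total else 0.0


if __name__ == "__main__":
    sample = ("th 64 6121728 134012.900 61327.500 34092.130 21573.980 22513.750 "
              "8121.200 5826.550 4062.540 3129.340 26975.820")
    threads, _, values = parse_th_line(sample)
    print(threads, values[-3:], round(top_deciles_share(values), 3))
```

On the server, the same line can be read from /proc/net/rpc/nfsd and passed to parse_th_line.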

TIA

	Marc











* Re: nfs_statfs: statfs error = 116
  2003-09-18 19:49       ` Bernd Schubert
@ 2003-11-14 14:57         ` Marc Schmitt
  0 siblings, 0 replies; 10+ messages in thread
From: Marc Schmitt @ 2003-11-14 14:57 UTC (permalink / raw)
  To: Bernd Schubert; +Cc: nfs

Hi Bernd,

Thanks for that hint; I've used it a couple of times since then, and it 
works fine.

    Marc

Bernd Schubert wrote:

> No, no, I simply mean:
>
>mount -t nfs server:export_dir   target_dir 
>
>so just 'overmounting' the already mounted directory.
>
>Hope it helps,
>Bernd
>  
>




Thread overview: 10+ messages
2003-09-18  9:56 nfs_statfs: statfs error = 116 Marc Schmitt
2003-09-18 13:49 ` Steve Dickson
2003-09-18 19:24   ` Bernd Schubert
2003-09-18 19:31     ` Marc Schmitt
2003-09-18 19:49       ` Bernd Schubert
2003-11-14 14:57         ` Marc Schmitt
2003-09-18 19:26   ` Marc Schmitt
2003-09-19  0:22     ` Neil Brown
2003-09-19  9:27       ` Marc Schmitt
2003-09-21 13:38   ` Marc Schmitt
