nfs_statfs: statfs error = 116


* nfs_statfs: statfs error = 116
@ 2003-09-18  9:56 Marc Schmitt
  2003-09-18 13:49 ` Steve Dickson
  0 siblings, 1 reply; 21+ messages in thread
From: Marc Schmitt @ 2003-09-18  9:56 UTC (permalink / raw)
  To: nfs

Dear all,

Server: RedHat 7.3, kernel 2.4.20-19.7smp, nfs-utils 1.0.5
Client: RedHat 7.3, kernel 2.4.18-27.7.x, nfs-utils 0.3.3-6.73

Over 200 entries in /etc/exports, after issuing 'exportfs -r', some 
clients keep certain home directories stale.

On the client, the dmesg output says:

nfs: server x not responding, still trying
nfs: server x OK
nfs_statfs: statfs error = 116
nfs_statfs: statfs error = 116
nfs_statfs: statfs error = 116
nfs_statfs: statfs error = 116

bash# ls -ld /home/user
ls: /home/user: Stale NFS file handle

On the server, I'm looking at the traffic from client ('tcpdump -s300 -i 
eth2 -Nt host client'):
...
client.4180178520 > fs.nfs: 112 read fh Unknown/1 4096 bytes @ 
0x000029000 (DF)
x.nfs > client.4180178520: reply ok 32 (DF)
client.4196955736 > fs.nfs: 112 read fh Unknown/1 4096 bytes @ 
0x000029000 (DF)
x.nfs > client.4196955736: reply ok 32 (DF)
client.4213732952 > fs.nfs: 112 read fh Unknown/1 4096 bytes @ 
0x000029000 (DF)
x.nfs > client.4213732952: reply ok 32 (DF)
client.4230510168 > fs.nfs: 112 read fh Unknown/1 4096 bytes @ 
0x000029000 (DF)
x.nfs > client.4230510168: reply ok 32 (DF)
client.4247287384 > fs.nfs: 112 read fh Unknown/1 4096 bytes @ 
0x000029000 (DF)
x.nfs > client.4247287384: reply ok 32 (DF)
client.4264064600 > fs.nfs: 112 read fh Unknown/1 4096 bytes @ 
0x000029000 (DF)
x.nfs > client.4264064600: reply ok 32 (DF)
client.4280841816 > fs.nfs: 112 read fh Unknown/1 4096 bytes @ 
0x000029000 (DF)
x.nfs > client.4280841816: reply ok 32 (DF)
...

I don't know how to interpret this, but the facts are:

- the server x is running NFS, most clients still work w/o problems
- client can not recover from the stale handle and it looks like it is 
spamming the server

My questions are:

- Is this a race?
- Is there a way to get the client working again w/o having to reboot
(or kill all the user's processes and umount the home if that's
possible)? I tried restarting rpc.statd on the client but that did not help.
- How can I provide more debugging info if needed?
- Could this be related to the thread "[NFS] nfs errors clutter up logs
after 2.4.20 -> 2.4.22-pre10"? We see a lot of messages like that
on all clients.

Thanks for your help.

Regards,
     Marc







-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs


* Re: nfs_statfs: statfs error = 116
  2003-09-18  9:56 Marc Schmitt
@ 2003-09-18 13:49 ` Steve Dickson
  2003-09-18 19:24   ` Bernd Schubert
                     ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Steve Dickson @ 2003-09-18 13:49 UTC (permalink / raw)
  To: Marc Schmitt; +Cc: nfs



Marc Schmitt wrote:

>
> My questions are:
>
> - Is this a race? 

It sounds to me like it could be a server issue under
a very heavy load...  How many nfsd are you running? Try
increasing the number to see if that helps....

>
> - Is there a way to get the client working again w/o having to reboot 
> (or kill all the users processes and umount the home if that's 
> possible)? I tried restarting rpc.statd on the client but that did not 
> help.

not really... :(

> - How can I provide more debugging infos if needed? 

ethereal traces have more information and are
generally more useful... imo...

SteveD.





* Re: nfs_statfs: statfs error = 116
  2003-09-18 13:49 ` Steve Dickson
@ 2003-09-18 19:24   ` Bernd Schubert
  2003-09-18 19:31     ` Marc Schmitt
  2003-09-18 19:26   ` Marc Schmitt
  2003-09-21 13:38   ` Marc Schmitt
  2 siblings, 1 reply; 21+ messages in thread
From: Bernd Schubert @ 2003-09-18 19:24 UTC (permalink / raw)
  To: Steve Dickson, Marc Schmitt; +Cc: nfs

> > - Is there a way to get the client working again w/o having to reboot
> > (or kill all the users processes and umount the home if that's
> > possible)? I tried restarting rpc.statd on the client but that did not
> > help.
>
> not really... :(
>

Mounting the directory again should help. IMHO not a nice solution but it
always worked on our systems.

Cheers,
	Bernd




* Re: nfs_statfs: statfs error = 116
  2003-09-18 13:49 ` Steve Dickson
  2003-09-18 19:24   ` Bernd Schubert
@ 2003-09-18 19:26   ` Marc Schmitt
  2003-09-19  0:22     ` Neil Brown
  2003-09-21 13:38   ` Marc Schmitt
  2 siblings, 1 reply; 21+ messages in thread
From: Marc Schmitt @ 2003-09-18 19:26 UTC (permalink / raw)
  To: Steve Dickson; +Cc: nfs

On Thu, 2003-09-18 at 15:49, Steve Dickson wrote:
> Marc Schmitt wrote:
> 
> >
> > My questions are:
> >
> > - Is this a race? 
> 
> It sounds to me like it could be a server issue under
> a very heavy load...  How many nfsd are you running? Try
> increasing the number to see if that helps....

Thanks, I'm trying that and changed the number of nfsd from 32 to 64 on
the production system.

> > - How can I provide more debugging infos if needed? 
> 
> ethereal traces have more information and are
> generally more useful... imo...

I'll try to get a test setup running with the same software versions,
create a couple hundred exports and bomb it from one of our
clusters with bonnie++s. That way I'll hopefully be able to reproduce
this re-exporting issue.

A user has found a bug that appears when checking out a big Subversion
repository on the same server over NFS: the checkout always times out
under this huge amount of file manipulation and fails. He then
reproduced the issue with a small script that basically loops over these
four calls:

rename("old/bla", "new/bla")
stat("new/bla", ..)
chmod("new/bla")
rename("new/bla", "old/bla")

Before 1000 iterations the script returns: Error setting new/bla to
read-only! We'll try to narrow this down on the test cluster, too. One
particularity has been found already: the bug only appears if
the renaming takes place across directory boundaries.
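
For readers who want to try something similar, here is a minimal Python sketch of such a loop. It is not the user's original script, which was not posted; the function name `rename_chmod_loop`, the mode 0o444 for "read-only" and the directory layout are assumptions made for illustration. To test an NFS export, point `base_dir` at a directory on the mount in question.

```python
import os
import tempfile


def rename_chmod_loop(base_dir, iterations=1000):
    """Move a file between two directories, stat it, and make it read-only.

    Returns (completed_iterations, error), where error is None on success
    or the OSError that stopped the loop.
    """
    old_dir = os.path.join(base_dir, "old")
    new_dir = os.path.join(base_dir, "new")
    os.makedirs(old_dir, exist_ok=True)
    os.makedirs(new_dir, exist_ok=True)

    old_path = os.path.join(old_dir, "bla")
    new_path = os.path.join(new_dir, "bla")
    with open(old_path, "w"):
        pass  # create an empty test file

    for i in range(iterations):
        try:
            os.rename(old_path, new_path)  # rename across a directory boundary
            os.stat(new_path)
            os.chmod(new_path, 0o444)      # assumed meaning of "read-only"
            os.rename(new_path, old_path)  # move it back for the next round
        except OSError as exc:
            return i, exc
    return iterations, None


if __name__ == "__main__":
    # A local temporary directory is used here only so the sketch runs
    # anywhere; replace it with a directory on the NFS mount under test.
    with tempfile.TemporaryDirectory() as base_dir:
        done, error = rename_chmod_loop(base_dir, iterations=100)
        print(f"completed {done} iterations, error: {error}")
```

On a local filesystem the loop should simply complete; the interesting case is an early return with an error when run against the affected export.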

Regards,
    Marc


* Re: nfs_statfs: statfs error = 116
  2003-09-18 19:24   ` Bernd Schubert
@ 2003-09-18 19:31     ` Marc Schmitt
  2003-09-18 19:49       ` Bernd Schubert
  0 siblings, 1 reply; 21+ messages in thread
From: Marc Schmitt @ 2003-09-18 19:31 UTC (permalink / raw)
  To: Bernd Schubert; +Cc: nfs

On Thu, 2003-09-18 at 21:24, Bernd Schubert wrote:
> > > - Is there a way to get the client working again w/o having to reboot
> > > (or kill all the users processes and umount the home if that's
> > > possible)? I tried restarting rpc.statd on the client but that did not
> > > help.
> >
> > not really... :(
> >
> 
> Mounting the directory again should help. IMHO not a nice solution but it
> always worked on our systems.

Do you mean 'mount -o remount /home/user'?
I've tried that, it didn't work.

Greetz
   Marc




* Re: nfs_statfs: statfs error = 116
  2003-09-18 19:31     ` Marc Schmitt
@ 2003-09-18 19:49       ` Bernd Schubert
  2003-11-14 14:57         ` Marc Schmitt
  0 siblings, 1 reply; 21+ messages in thread
From: Bernd Schubert @ 2003-09-18 19:49 UTC (permalink / raw)
  To: Marc Schmitt; +Cc: nfs

On Thursday 18 September 2003 21:31, Marc Schmitt wrote:
> On Thu, 2003-09-18 at 21:24, Bernd Schubert wrote:
> > > > - Is there a way to get the client working again w/o having to reboot
> > > > (or kill all the users processes and umount the home if that's
> > > > possible)? I tried restarting rpc.statd on the client but that did
> > > > not help.
> > >
> > > not really... :(
> >
> > Mounting the directory again should help. IMHO not a nice solution but it
> > always worked on our systems.
>
> Do you mean 'mount -o remount /home/user'?
> I've tried that, it didn't work.
>

No, no, I simply mean:

mount -t nfs server:export_dir   target_dir 

so just 'overmounting' the already mounted directory.

Hope it helps,
Bernd




* Re: nfs_statfs: statfs error = 116
  2003-09-18 19:26   ` Marc Schmitt
@ 2003-09-19  0:22     ` Neil Brown
  2003-09-19  9:27       ` Marc Schmitt
  0 siblings, 1 reply; 21+ messages in thread
From: Neil Brown @ 2003-09-19  0:22 UTC (permalink / raw)
  To: Marc Schmitt; +Cc: Steve Dickson, nfs

On  September 18, mschmitt@inf.ethz.ch wrote:
> 
> A user has found a bug that appears when checking out a big subversion
> repository on the same server over NFS, it will always timeout upon this
> huge amount of file manipulations and the checkout fails. He then
> reproduced the issue with a small script that basically loops over those
> four commands:
> 
> rename ("old/bla", "new/bla")
> stat("new/bla",..)
> chmod("new/bla")
> rename ("new/bla", "old/bla")
> 
> Before 1000 iterations the script returns: Error setting new/bla to
> read-only! We'll try to narrow this down on the test cluster, too. One
> particularity has been found already: the bug only appears if
> the renaming takes place over directory boundaries.

Sounds like you are using the "subtree_check" export flag on that
export (possibly implicitly).  You don't want to.  i.e. add
"no_subtree_check" after reading "man exports"
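
With a couple hundred exports, checking each entry by hand is tedious. The following is a rough Python sketch for listing entries that do not carry "no_subtree_check" explicitly. The helper name `exports_without_no_subtree_check` and the sample paths and hosts are made up for illustration, and the parser is deliberately simplified: it assumes one export per line in the usual `path client(options)` form and does not handle line continuations or quoted paths, so treat its output as a hint only.

```python
import re


def exports_without_no_subtree_check(text):
    """Return (path, client) pairs whose option list lacks no_subtree_check."""
    flagged = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        path, clients = fields[0], fields[1:]
        for client in clients or [""]:
            match = re.match(r"^([^()]*)\((.*)\)$", client)
            if match:
                host, options = match.group(1), match.group(2).split(",")
            else:
                host, options = client, []  # no option list at all
            if "no_subtree_check" not in options:
                flagged.append((path, host or "*"))
    return flagged


# Hypothetical sample input, not taken from the server discussed here.
SAMPLE = """\
/home/a client1(rw,no_subtree_check) client2(rw)
# a comment line
/home/b *.example.org(ro,subtree_check)
"""

if __name__ == "__main__":
    for path, host in exports_without_no_subtree_check(SAMPLE):
        print(f"{path} -> {host}: no explicit no_subtree_check")
```

Entries with neither flag are reported too, since subtree checking may then be in effect implicitly, depending on the nfs-utils default.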

NeilBrown



* Re: nfs_statfs: statfs error = 116
  2003-09-19  0:22     ` Neil Brown
@ 2003-09-19  9:27       ` Marc Schmitt
  0 siblings, 0 replies; 21+ messages in thread
From: Marc Schmitt @ 2003-09-19  9:27 UTC (permalink / raw)
  To: Neil Brown; +Cc: nfs

Neil Brown wrote:

>On  September 18, mschmitt@inf.ethz.ch wrote:
>  
>
>>A user has found a bug that appears when checking out a big subversion
>>repository on the same server over NFS, it will always timeout upon this
>>huge amount of file manipulations and the checkout fails. He then
>>reproduced the issue with a small script that basically loops over those
>>four commands:
>>
>>rename ("old/bla", "new/bla")
>>stat("new/bla",..)
>>chmod("new/bla")
>>rename ("new/bla", "old/bla")
>>
>>Before 1000 iterations the script returns: Error setting new/bla to
>>read-only! We'll try to narrow this down on the test cluster, too. One
>>particularity has been found already: the bug only appears if
>>the renaming takes place over directory boundaries.
>>    
>>
>
>Sounds like you are using the "subtree_check" export flag on that
>export (possibly implicitly).  You don't want to.  i.e. add
>"no_subtree_check" after reading "man exports"
>  
>
Excellent, that worked. Thank you very much.

    Marc




* Re: nfs_statfs: statfs error = 116
  2003-09-18 13:49 ` Steve Dickson
  2003-09-18 19:24   ` Bernd Schubert
  2003-09-18 19:26   ` Marc Schmitt
@ 2003-09-21 13:38   ` Marc Schmitt
  2 siblings, 0 replies; 21+ messages in thread
From: Marc Schmitt @ 2003-09-21 13:38 UTC (permalink / raw)
  To: Steve Dickson; +Cc: nfs

On Thu, 2003-09-18 at 15:49, Steve Dickson wrote:
> Marc Schmitt wrote:
> 
> >
> > My questions are:
> >
> > - Is this a race? 
> 
> It sounds to me like it could be a server issue under
> a very heavy load...  How many nfsd are you running? Try
> increasing the number to see if that helps....

I remembered that the NFS-HowTo covers this by giving a rule for
determining whether one needs more nfsd running. The HowTo says
(http://nfs.sourceforge.net/nfs-howto/performance.html section 5.6):
"If you are using a 2.4 or higher kernel and you want to see how heavily
each nfsd thread is being used, you can look at the file
/proc/net/rpc/nfsd. The last ten numbers on the th line in that file
indicate the number of seconds that the thread usage was at that
percentage of the maximum allowable. If you have a large number in the
top three deciles, you may wish to increase the number of nfsd
instances."

The th line looks like this (after changing to 64 nfsd, obviously):
th 64 6121728 134012.900 61327.500 34092.130 21573.980 22513.750
8121.200 5826.550 4062.540 3129.340 26975.820

The last ten numbers are then:
134012.900 61327.500 34092.130 21573.980 22513.750 8121.200 5826.550
4062.540 3129.340 26975.820

But what does "top three deciles" refer to? I have an English
comprehension problem here, sorry. I looked up the word decile and it
means what I guessed: one tenth or one unit out of ten. Does that mean
that the top three deciles are:
134012.900 61327.500 34092.130 ?

That does not make sense to me, because it says "If you have a large
number...", which refers to "a" number. Or should it read "If you have a
large number amongst the top..."?

And then what is "the thread usage at that percentage of the maximum
allowable"? Which value refers to the maximum? 6121728?

Can someone please try to explain this to me, I'm pretty much lost...

TIA

	Marc
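
For readers with the same question, here is a small Python sketch of one way to read the th line. It rests on an interpretation that goes beyond what the HowTo quote above spells out, so treat it as an assumption: the first number is taken as the thread count, the second as the number of times all threads were busy at once, and the ten values as a histogram of seconds spent at 0-10%, 10-20%, ... 90-100% thread usage. Under that reading the "top three deciles" would be the last three values (70-100% busy), not the first three. The function names are made up for illustration.

```python
def parse_th_line(line):
    """Split a 'th' line into thread count, all-busy count, and histogram."""
    fields = line.split()
    if len(fields) != 13 or fields[0] != "th":
        raise ValueError("expected 'th', two counters and ten histogram values")
    threads = int(fields[1])
    all_busy = int(fields[2])
    histogram = [float(value) for value in fields[3:]]
    return threads, all_busy, histogram


def top_deciles_share(histogram, deciles=3):
    """Fraction of the recorded seconds that fall in the busiest buckets."""
    total = sum(histogram)
    return sum(histogram[-deciles:]) / total if total else 0.0


# The th line from the message above, joined back into a single line.
TH_LINE = ("th 64 6121728 134012.900 61327.500 34092.130 21573.980 22513.750 "
           "8121.200 5826.550 4062.540 3129.340 26975.820")

if __name__ == "__main__":
    threads, all_busy, histogram = parse_th_line(TH_LINE)
    print(f"{threads} threads, all busy {all_busy} times")
    share = top_deciles_share(histogram)
    print(f"share of time in the top three deciles: {share:.1%}")
```

Comparing the share before and after changing the number of nfsd threads is probably more telling than any single absolute value.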


* nfs_statfs: statfs error = 116
@ 2003-11-13 14:15 martin.knoblauch 
  2003-11-13 14:39 ` Richard B. Johnson
  0 siblings, 1 reply; 21+ messages in thread
From: martin.knoblauch  @ 2003-11-13 14:15 UTC (permalink / raw)
  To: linux-kernel

Hi,

  sorry if OT, but what is above message trying to tell me? Where can I 
find a translation of the numbers? We are seeing 116 very frequently, 
512 and 5 on occasion.

  We have a bunch of Linux clients (Dual P4, RH7.3, 2.4.20-18.7smp 
errata kernel) hanging off two Sun NFS Servers (Solaris 8) in a 
Veritas/VCS HA configuration. All of the clients show the 116 messages, 
while some of them show the 512 in addition. Those with the 512s seem to 
"hang" for some periods of time.

  The mounts are "vers=3,proto=tcp,hard,intr,bg". Some of them mounted 
at boottime, quite a few via "amd".

  Any ideas are welcome.

Thanks
Martin


* Re: nfs_statfs: statfs error = 116
  2003-11-13 14:15 nfs_statfs: statfs error = 116 martin.knoblauch 
@ 2003-11-13 14:39 ` Richard B. Johnson
  2003-11-13 14:52   ` Martin.Knoblauch
  2003-11-13 15:27   ` Trond Myklebust
  0 siblings, 2 replies; 21+ messages in thread
From: Richard B. Johnson @ 2003-11-13 14:39 UTC (permalink / raw)
  To: martin.knoblauch ; +Cc: Linux kernel

On Thu, 13 Nov 2003, martin.knoblauch  wrote:

> Hi,
>
>   sorry if OT, but what is above message trying to tell me? Where can I
> find a translation of the numbers? We are seeing 116 very frequently,
> 512 and 5 on occasion.
>

ESTALE is "errno" 116
EIO  is "errno" 5
ERESTARTSYS is "errno" 512

You can find these in /usr/include/asm/errno.h (not good to
directly include in a program).

The program reporting these errors should have included:

<errno.h>
<string.h>

Then used...
	strerror(errno);
or
	perror("");
etc.
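
The same lookup can be done from a script. This is a minimal Python sketch using the standard `errno` and `os` modules; the helper name `describe_errno` is made up for illustration. Numeric errno values are platform-specific, so the numbers 5, 116 and 512 are only meaningful on Linux, and a code that has no user-space name simply comes back as unknown.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for a numeric errno value."""
    # errno.errorcode only knows the codes exported to user space.
    name = errno.errorcode.get(code, "unknown (possibly kernel-internal)")
    return name, os.strerror(code)


if __name__ == "__main__":
    for code in (5, 116, 512):
        name, message = describe_errno(code)
        print(f"{code}: {name}: {message}")
```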

Errno 512 should never be seen by user-mode program, so the
header file, /usr/include/linux/errno.h, states...

ESTALE happens when a mounted file-system is on a server that
went down or re-booted. The file-handles are then "stale".

EIO is a general catch-all for an I/O error.

ERESTARTSYS is the error returned by a server that has
re-booted that is supposed to tell the client-side software
to get a new file-handle because of an attempt to access with
a stale file-handle. When getting this error, the client
should have reopened the file(s) to obtain a new handle.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.22 on an i686 machine (797.90 BogoMips).
            Note 96.31% of all statistics are fiction.


* Re: nfs_statfs: statfs error = 116
  2003-11-13 14:39 ` Richard B. Johnson
@ 2003-11-13 14:52   ` Martin.Knoblauch
  2003-11-13 20:26     ` Jesse Pollard
  2003-11-13 15:27   ` Trond Myklebust
  1 sibling, 1 reply; 21+ messages in thread
From: Martin.Knoblauch @ 2003-11-13 14:52 UTC (permalink / raw)
  To: root; +Cc: Linux kernel





"Richard B. Johnson" <root@chaos.analogic.com> wrote on 11/13/2003 03:39:53
PM:

> On Thu, 13 Nov 2003, martin.knoblauch  wrote:
>
> > Hi,
> >
> >   sorry if OT, but what is above message trying to tell me? Where can I
> > find a translation of the numbers? We are seeing 116 very frequently,
> > 512 and 5 on occasion.
> >
>
> ESTALE is "errno" 116
> EIO  is "errno" 5
> ERESTARTSYS is "errno" 512
>
> You can find these in /usr/include/asm/errno.h (not good to
> directly include in a program).
>
> The program reporting these errors should have included:
>
> <errno.h>
> <string.h>
>

 The messages actually come out of the kernel-nfs code (inode.c). Should
have mentioned "dmesg" :-)

> Then used...
>    strerror(errno);
> or
>    perror("");
> etc.
>
>
> Errno 512 should never be seen by user-mode program, so the
> header file, /usr/include/linux/errno.h, states...
>

 This worries me a bit :-)

> ESTALE happens when a mounted file-system is on a server that
> went down or re-booted. The file-handles are then "stale".
>

 I am "almost" sure that there were no reboot or failover events at the
time of most of the stale messages. But I'm not going to lay my hand on the
book for that.

> EIO is a general catch-all for an I/O error.
>
> ERESTARTSYS is the error returned by a server that has
> re-booted that is supposed to tell the client-side software
> to get a new file-handle because of an attempt to access with
> a stale file-handle. When getting this error, the client
> should have reopened the file(s) to obtain a new handle.
>

 Definitely no server reboot or HA Failover at the time of the messages.

Thanks
Martin



* Re: nfs_statfs: statfs error = 116
  2003-11-13 14:39 ` Richard B. Johnson
  2003-11-13 14:52   ` Martin.Knoblauch
@ 2003-11-13 15:27   ` Trond Myklebust
  2003-11-13 16:00     ` Richard B. Johnson
  1 sibling, 1 reply; 21+ messages in thread
From: Trond Myklebust @ 2003-11-13 15:27 UTC (permalink / raw)
  To: root; +Cc: martin.knoblauch , Linux kernel

>>>>> " " == Richard B Johnson <root@chaos.analogic.com> writes:

     > ESTALE happens when a mounted file-system is on a server that
     > went down or re-booted. The file-handles are then "stale".

Sort of. It means that the server is unable to find the file that
corresponds to the filehandle that the client sent it. If the server
strictly follows the NFS specs, then this is only supposed to happen
if somebody else has deleted the file (and this is why designing a
scheme for generating filehandles is such a difficult job).

Some broken servers do, however, "lose" the file in other interesting
and unpredictable ways.

     > ERESTARTSYS is the error returned by a server that has
     > re-booted that is supposed to tell the client-side software to
     > get a new file-handle because of an attempt to access with a
     > stale file-handle. When getting this error, the client should
     > have reopened the file(s) to obtain a new handle.

ERESTARTSYS actually just means that a signal was received while
inside a system call. If this results in an interruption of that
syscall, the kernel is supposed to translate ERESTARTSYS into the user
error EINTR.

Userland should therefore never have to handle ERESTARTSYS errors.

Cheers,
  Trond


* Re: nfs_statfs: statfs error = 116
  2003-11-13 15:27   ` Trond Myklebust
@ 2003-11-13 16:00     ` Richard B. Johnson
  2003-11-13 17:03       ` Trond Myklebust
  0 siblings, 1 reply; 21+ messages in thread
From: Richard B. Johnson @ 2003-11-13 16:00 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: martin.knoblauch , Linux kernel

On Thu, 13 Nov 2003, Trond Myklebust wrote:

> >>>>> " " == Richard B Johnson <root@chaos.analogic.com> writes:
>
>      > ESTALE happens when a mounted file-system is on a server that
>      > went down or re-booted. The file-handles are then "stale".
>
> Sort of. It means that the server is unable to find the file that
> corresponds to the filehandle that the client sent it. If the server
> strictly follows the NFS specs, then this is only supposed to happen
> if somebody else has deleted the file (and this is why designing a
> scheme for generating filehandles is such a difficult job).
>
> Some broken servers do, however, "lose" the file in other interesting
> and unpredictable ways.
>
>      > ERESTARTSYS is the error returned by a server that has
>      > re-booted that is supposed to tell the client-side software to
>      > get a new file-handle because of an attempt to access with a
>      > stale file-handle. When getting this error, the client should
>      > have reopened the file(s) to obtain a new handle.
>
> ERESTARTSYS actually just means that a signal was received while
> inside a system call. If this results in an interruption of that
> syscall, the kernel is supposed to translate ERESTARTSYS into the user
> error EINTR.
>
> Userland should therefore never have to handle ERESTARTSYS errors.
>

Hmmm, Maybe I'm getting confused by all the winning-lottery messages,
but it's in the syscall specifications for connect() and
even fcntl(). http:/www.infran.ru/Techinfo/syscalls/syscalls_43.html

Also, maybe Linux now claims exclusive ownership and keeps it internal,
but some networking software, nfsd and pcnfsd, might not know about that.
I've seen ERESTARTSYS returned from a DOS (actually FAT) file-handle use
after a server has crashed and come back on-line.

Moot point, though, the reported errors were internal via syslog, which
was not previously known when I responded.


Cheers,
Dick Johnson
Penguin : Linux version 2.4.22 on an i686 machine (797.90 BogoMips).
            Note 96.31% of all statistics are fiction.




* Re: nfs_statfs: statfs error = 116
  2003-11-13 16:00     ` Richard B. Johnson
@ 2003-11-13 17:03       ` Trond Myklebust
  0 siblings, 0 replies; 21+ messages in thread
From: Trond Myklebust @ 2003-11-13 17:03 UTC (permalink / raw)
  To: root; +Cc: martin.knoblauch , Linux kernel

>>>>> " " == Richard B Johnson <root@chaos.analogic.com> writes:

    >> ERESTARTSYS actually just means that a signal was received
    >> while inside a system call. If this results in an interruption
    >> of that syscall, the kernel is supposed to translate
    >> ERESTARTSYS into the user error EINTR.

     > Hmmm, Maybe I'm getting confused by all the winning-lottery
     > messages, but it's in the syscall specifications for connect()
     > and even
     > fcntl(). http:/www.infran.ru/Techinfo/syscalls/syscalls_43.html

AFAICS that documentation was written in 1994, and refers to Linux
v1.0. We've come a long way since then...

Today's Linux userland is supposed to try to comply with the Single
Unix Specification (see http://www.unix-systems.org/version3/)
whenever possible. ERESTARTSYS is missing altogether from the SUSv3
definitions in <errno.h> (and hence does not appear as a valid return
value for any SUSv3-compliant functions).

Note: the Linux manpages do list ERESTARTSYS as still being returned
by the accept() and syslog() system calls. In both those cases,
however, they point out that your libc is supposed to intercept it
before it gets to the user.

     > Also, maybe Linux now claims exclusive ownership and keeps it
     > internal, but some networking software, nfsd and pcnfsd, might
     > not know about that.  I've seen ERESTARTSYS returned from a DOS
     > (actually FAT) file-handle use after a server has crashed and
     > come back on-line.

Linux used to be buggy/non-compliant w.r.t. NFS exporting of FAT
filesystems. I'm not sure if that has been fixed yet.

Cheers,
  Trond


* Re: nfs_statfs: statfs error = 116
  2003-11-13 14:52   ` Martin.Knoblauch
@ 2003-11-13 20:26     ` Jesse Pollard
  2003-11-13 20:34       ` Trond Myklebust
  0 siblings, 1 reply; 21+ messages in thread
From: Jesse Pollard @ 2003-11-13 20:26 UTC (permalink / raw)
  To: Martin.Knoblauch, root; +Cc: Linux kernel

On Thursday 13 November 2003 08:52, Martin.Knoblauch@mscsoftware.com wrote:
> "Richard B. Johnson" <root@chaos.analogic.com> wrote on 11/13/2003 03:39:53
>
> PM:
> > On Thu, 13 Nov 2003, martin.knoblauch  wrote:
[snip]
> > ESTALE happens when a mounted file-system is on a server that
> > went down or re-booted. The file-handles are then "stale".
>
>  I am "almost" sure that there were no reboot or failover events at the
> time of most of the stale messages. But I'm not going to lay my hand on the
> book for that.

ESTALE should occur whenever the client loses connection to the server,
or thinks it has lost connection. It isn't directly related to the server
other than the fact that a server reboot will also cause it to happen.

This should be a transient failure that recovers once communication is
verified by some of the timeouts/retries associated with NFS.

At worst, it can require a remount of the NFS volume.


* Re: nfs_statfs: statfs error = 116
  2003-11-13 20:26     ` Jesse Pollard
@ 2003-11-13 20:34       ` Trond Myklebust
  2003-11-14  8:43         ` Martin.Knoblauch
  0 siblings, 1 reply; 21+ messages in thread
From: Trond Myklebust @ 2003-11-13 20:34 UTC (permalink / raw)
  To: Jesse Pollard; +Cc: Martin.Knoblauch, root, Linux kernel

>>>>> " " == Jesse Pollard <jesse@cats-chateau.net> writes:

     > ESTALE should occur whenever the client loses connection to
     > the server, or thinks it has lost connection.

No it should not.

Cheers,
  Trond


* Re: nfs_statfs: statfs error = 116
  2003-11-13 20:34       ` Trond Myklebust
@ 2003-11-14  8:43         ` Martin.Knoblauch
  2003-11-14 13:49           ` Trond Myklebust
  0 siblings, 1 reply; 21+ messages in thread
From: Martin.Knoblauch @ 2003-11-14  8:43 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Jesse Pollard, Linux kernel, root

Trond Myklebust <trond.myklebust@fys.uio.no> wrote on 11/13/2003 09:34:55
PM:

> >>>>> " " == Jesse Pollard <jesse@cats-chateau.net> writes:
>
>      > ESTALE should occur whenever the client loses connection to
>      > the server, or thinks it has lost connection.
>
> No it should not.
>
> Cheers,
>   Trond
Hi Trond,

 just by accident I found one case where a user-space application can get
ESTALE in our setup (Linux client RH-2.4.20-18.7smp, Solaris 2.8
Server). I accidentally ran iozone on two clients with the output file
being the same and residing on the NFS server. Pure luser error, but it
produced ESTALE pretty much reproducibly.

B^HCheers
Martin
--
Martin Knoblauch
Senior System Architect
MSC.software GmbH
Am Moosfeld 13
D-81829 Muenchen, Germany

e-mail: martin.knoblauch@mscsoftware.com
http://www.mscsoftware.com
Phone/Fax: +49-89-431987-189 / -7189
Mobile: +49-174-3069245


* Re: nfs_statfs: statfs error = 116
  2003-11-14  8:43         ` Martin.Knoblauch
@ 2003-11-14 13:49           ` Trond Myklebust
  2003-11-14 14:22             ` Martin.Knoblauch
  0 siblings, 1 reply; 21+ messages in thread
From: Trond Myklebust @ 2003-11-14 13:49 UTC (permalink / raw)
  To: Martin.Knoblauch; +Cc: Linux kernel

>>>>> " " == Martin Knoblauch <Martin.Knoblauch@mscsoftware.com> writes:

     > I accidentally ran iozone on two clients with the output file
     > being the same and residing on the NFS Server. Pure luser
     > error, but it produced ESTALE pretty much reproducibly.

Sure. This is a prime example of where ESTALE *is* appropriate. One
NFS client is deleting a file on the server while the other is still
using it.

In the NFSv2/v3 protocols, the assumption is that filehandles are
valid for the entire lifetime of the file on the server. IOW only
"unlink()" can cause a valid filehandle to become stale. This is
mainly because there is no notion of open()/close(), so the server
would never be capable of determining when your client has stopped
using the filehandle.

If your 2 processes were running on the same machine, you would have
seen the kernel temporarily rename your file to .nfsXXXXXX in order to
work around the above problem. Delete that file, and you will generate
ESTALE reproducibly too....
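
For applications that have to live with this, one common userland reaction is to reopen the file by path when ESTALE shows up. The Python sketch below illustrates that idea only; it is not something prescribed in this thread, and the function name `read_with_estale_retry` and its `opener` and `retries` parameters are made up for illustration. If the file really was deleted, the reopen fails with a "no such file" error, which is passed on to the caller.

```python
import errno


def read_with_estale_retry(path, opener=open, retries=1):
    """Read a file, reopening it by path if the handle turns out to be stale."""
    attempt = 0
    while True:
        try:
            with opener(path, "rb") as handle:
                return handle.read()
        except OSError as exc:
            # Anything other than ESTALE, or too many retries, is passed on.
            if exc.errno != errno.ESTALE or attempt >= retries:
                raise
            attempt += 1  # a fresh open looks the path up on the server again
```

The `opener` argument exists only so the retry logic can be exercised without an NFS mount; normal use would leave it at the default.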

Cheers,
  Trond


* Re: nfs_statfs: statfs error = 116
  2003-11-14 13:49           ` Trond Myklebust
@ 2003-11-14 14:22             ` Martin.Knoblauch
  0 siblings, 0 replies; 21+ messages in thread
From: Martin.Knoblauch @ 2003-11-14 14:22 UTC (permalink / raw)
  To: trond.myklebust; +Cc: Linux kernel






Trond Myklebust <trond.myklebust@fys.uio.no> wrote on 11/14/2003 02:49:31
PM:

> >>>>> " " == Martin Knoblauch <Martin.Knoblauch@mscsoftware.com> writes:
>
>      > I accidentally ran iozone on two clients with the output file
>      > being the same and residing on the NFS Server. Pure luser
>      > error, but it produced ESTALE pretty much reproducibly.
>
> Sure. This is a prime example of where ESTALE *is* appropriate. One
> NFS client is deleting a file on the server while the other is still
> using it.
>
> In the NFSv2/v3 protocols, the assumption is that filehandles are
> valid for the entire lifetime of the file on the server. IOW only
> "unlink()" can cause a valid filehandle to become stale. This is
> mainly because there is no notion of open()/close(), so the server
> would never be capable of determining when your client has stopped
> using the filehandle.
>
> If your 2 processes were running on the same machine, you would have
> seen the kernel temporarily rename your file to .nfsXXXXXX in order to
> work around the above problem. Delete that file, and you will generate
> ESTALE reproducibly too....
>
> Cheers,
>   Trond
Trond,

 cool. Great explanation. Always good if you can get those that know into
talking :-)

Cheers
Martin



* Re: nfs_statfs: statfs error = 116
  2003-09-18 19:49       ` Bernd Schubert
@ 2003-11-14 14:57         ` Marc Schmitt
  0 siblings, 0 replies; 21+ messages in thread
From: Marc Schmitt @ 2003-11-14 14:57 UTC (permalink / raw)
  To: Bernd Schubert; +Cc: nfs

Hi Bernd,

Thanks for that hint, I've used it a couple of times meanwhile and it 
works fine.

    Marc

Bernd Schubert wrote:

> No, no, I simply mean:
>
>mount -t nfs server:export_dir   target_dir 
>
>so just 'overmounting' the already mounted directory.
>
>Hope it helps,
>Bernd
>  
>



-------------------------------------------------------
This SF.Net email sponsored by: ApacheCon 2003,
16-19 November in Las Vegas. Learn firsthand the latest
developments in Apache, PHP, Perl, XML, Java, MySQL,
WebDAV, and more! http://www.apachecon.com/
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
