why do i get "Stale NFS file handle" for hours?

* why do i get "Stale NFS file handle" for hours?
@ 2004-09-05  1:06 Sven Köhler
  2004-09-05  1:39   ` Trond Myklebust
  2004-09-06  6:47 ` Frank Steiner
  0 siblings, 2 replies; 24+ messages in thread
From: Sven Köhler @ 2004-09-05  1:06 UTC (permalink / raw)
  To: linux-kernel; +Cc: nfs

Hi,

I think I know what's going on, and why I get the "stale NFS handle" 
error message when the NFS server is restarted (a real reboot, or a simple 
/etc/init.d/nfs restart), but what I don't understand is why the NFS 
client doesn't "remount" the filesystem automatically. In the case of NFS 
over TCP, the NFS client could easily detect a restart of the NFS server 
(the TCP connection was aborted), or are there other factors that keep 
the NFS client from recognizing this?

The scenario that made me write this is that I'm setting up an NFS 
server at my college right now. It will export a directory to many 
clients where I don't have root access. The NFS directories are mounted 
by the clients via automounter, and if I restart my server for any 
reason, I will get the "stale NFS handle" error for hours. The kernel 
neither remounts nor unmounts the directory, and the automounter 
doesn't unmount it either. It stays mounted, and that will cause me 
trouble for hours.

Has there been any thought on this subject here or on any other mailing list?
Is there any chance that this might be improved in the future?

Thx
   Sven

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: why do i get "Stale NFS file handle" for hours?
@ 2004-09-05  1:39   ` Trond Myklebust
  0 siblings, 0 replies; 24+ messages in thread
From: Trond Myklebust @ 2004-09-05  1:39 UTC (permalink / raw)
  To: Sven Köhler; +Cc: linux-kernel, nfs

On Sat, 04/09/2004 at 21:06, Sven Köhler wrote:
> Hi,
> 
> I think I know what's going on, and why I get the "stale NFS handle" 
> error message when the NFS server is restarted (a real reboot, or a simple 
> /etc/init.d/nfs restart), but what I don't understand is why the NFS 
> client doesn't "remount" the filesystem automatically. In the case of NFS 
> over TCP, the NFS client could easily detect a restart of the NFS server 
> (the TCP connection was aborted), or are there other factors that keep 
> the NFS client from recognizing this?

Sigh. This question keeps coming up again and again and again. Why can't
you people search the archives?

Of course we could "fix" things for the user so that we just look up all
those filehandles again transparently.

  The real question is: how do we know that is the right thing to do?

The NFS client wouldn't know the difference between your /etc/passwd
file and a javascript pop-up ad. If it gets an ESTALE error, then that
tells it that the original filehandle is invalid, but it does not know
WHY that is the case. The file may have been deleted and replaced by a
new one. It may be that your server is broken, and is actually losing
filehandles on reboot (as appears to be the case in your setup),...

Reopening the file, and then continuing to write from the same position
may be the right thing to do, but then again it may cause you to
overwrite a bunch of freshly written password entries.

So we bounce the error up to userland where these issues can actually be
resolved.

Cheers,
  Trond
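
One way userland can resolve the ambiguity described above is to check, before reopening by path and resuming, whether the path still names the file that was originally opened. The following is a minimal sketch, assuming a local POSIX filesystem stands in for the NFS mount and that comparing `(st_dev, st_ino)` is an acceptable identity test for the application. `file_identity`, `safe_to_resume`, and `demo` are hypothetical helper names, not part of any NFS API.

```python
import os
import tempfile


def file_identity(path):
    """Return (device, inode) for the file currently named by path."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def safe_to_resume(path, remembered_identity):
    """True only if path still names the file we originally opened."""
    try:
        return file_identity(path) == remembered_identity
    except OSError:
        # ENOENT, ESTALE, and similar: the caller has to decide what to do.
        return False


def demo():
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "passwd")
        with open(target, "w") as f:
            f.write("old entries\n")
        remembered = file_identity(target)
        unchanged = safe_to_resume(target, remembered)

        # Someone replaces the file under the same name.
        replacement = os.path.join(d, "passwd.new")
        with open(replacement, "w") as f:
            f.write("freshly written entries\n")
        os.replace(replacement, target)

        replaced = safe_to_resume(target, remembered)
        return unchanged, replaced
```

Inode numbers can be reused after a file is deleted, so a match here is a heuristic rather than proof. An application that cares about the contents would need a stronger check of its own.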

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: why do i get "Stale NFS file handle" for hours?
  2004-09-05  1:39   ` Trond Myklebust
  (?)
@ 2004-09-05  1:51   ` Sven Köhler
  2004-09-05  2:02       ` Trond Myklebust
  -1 siblings, 1 reply; 24+ messages in thread
From: Sven Köhler @ 2004-09-05  1:51 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-kernel, nfs

> Of course we could "fix" things for the user so that we just look up all
> those filehandles again transparently.
> 
>   The real question is: how do we know that is the right thing to do?
> 
> The NFS client wouldn't know the difference between your /etc/passwd
> file and a javascript pop-up ad. If it gets an ESTALE error, then that
> tells it that the original filehandle is invalid, but it does not know
> WHY that is the case. The file may have been deleted and replaced by a
> new one. It may be that your server is broken, and is actually losing
> filehandles on reboot (as appears to be the case in your setup),...

I agree, but you simply admit that the NFS client doesn't seem to know 
when the server was restarted. The simplest thing I can imagine is that 
the NFS server generates a random integer value at start, and transmits 
it along with ESTALE. If the integer value is different from the 
integer value the server sent while mounting the FS, then the kernel has 
to remount it transparently. This is a simple way for a client to 
safely determine whether the server has been restarted, and it only 
adds 4 bytes to some NFS packets.

> Reopening the file, and then continuing to write from the same position
> may be the right thing to do, but then again it may cause you to
> overwrite a bunch of freshly written password entries.

In my case, if the NFS directory is mounted on /mnt/nfs, I can't even do 
a simple "cd /mnt/nfs" without getting the "stale NFS handle" error, even if 
I use a different shell. I always thought that the "cd /mnt/nfs" should 
work, since the shell will acquire a new handle, but it doesn't work :-(

So I'm not really talking about restoring all filehandles. The 
filehandles that were still open while the server restarted may stay 
broken, but I'd like to be able to open new ones at least.

> So we bounce the error up to userland where these issues can actually be
> resolved.

This is a good thing to do in general, but i think this needs improvement.
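
The restart-detection idea proposed above can be sketched as a toy model. This illustrates the proposal only, not existing NFS behaviour: `Server`, `Client`, `boot_id`, and `on_estale` are hypothetical names invented for this example, and no wire format is implied.

```python
import secrets


class Server:
    """Toy server that picks a fresh random 32-bit value at each start."""

    def __init__(self):
        self.boot_id = secrets.randbits(32)

    def restart(self):
        old = self.boot_id
        # Re-draw until the value differs, so a restart is always visible.
        while self.boot_id == old:
            self.boot_id = secrets.randbits(32)


class Client:
    """Toy client that remembers the value it saw at mount time."""

    def __init__(self, server):
        self.server = server
        self.mount_boot_id = server.boot_id

    def on_estale(self):
        """Decide what to do when ESTALE arrives along with the boot_id."""
        if self.server.boot_id != self.mount_boot_id:
            self.mount_boot_id = self.server.boot_id
            return "remount"        # server restarted since we mounted
        return "report-estale"      # same server instance: pass the error up
```

The model only answers "did the server restart?". It says nothing about whether the files behind the old handles are still the same files.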

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: why do i get "Stale NFS file handle" for hours?
@ 2004-09-05  2:02       ` Trond Myklebust
  0 siblings, 0 replies; 24+ messages in thread
From: Trond Myklebust @ 2004-09-05  2:02 UTC (permalink / raw)
  To: Sven Köhler; +Cc: linux-kernel, nfs

On Sat, 04/09/2004 at 21:51, Sven Köhler wrote:

> I agree, but you simply admit that the NFS client doesn't seem to know 
> when the server was restarted. The simplest thing I can imagine is that 
> the NFS server generates a random integer value at start, and transmits 
> it along with ESTALE. If the integer value is different from the 
> integer value the server sent while mounting the FS, then the kernel has 
> to remount it transparently. This is a simple way for a client to 
> safely determine whether the server has been restarted, and it only 
> adds 4 bytes to some NFS packets.

No.... The simplest thing is for the server to actually abide by the
RFCs and not generate filehandles that change on reboot.

NFSv4 is the ONLY version of the protocol that actually supports the
concept of filehandles that have a finite lifetime.

> In my case, if the NFS directory is mounted on /mnt/nfs, I can't even do 
> a simple "cd /mnt/nfs" without getting the "stale NFS handle" error, even if 
> I use a different shell. I always thought that the "cd /mnt/nfs" should 
> work, since the shell will acquire a new handle, but it doesn't work :-(

It won't if the root filehandle is broken too. That is the standard way
of telling the NFS client that the administrator has revoked our access
to the filesystem.

The solution is simple here: fix the broken server...

Cheers,
   Trond


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: why do i get "Stale NFS file handle" for hours?
  2004-09-05  2:02       ` Trond Myklebust
  (?)
@ 2004-09-05  2:23       ` Sven Köhler
  2004-09-05  3:01           ` Trond Myklebust
  -1 siblings, 1 reply; 24+ messages in thread
From: Sven Köhler @ 2004-09-05  2:23 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-kernel, nfs

>>I agree, but you simply admit that the NFS client doesn't seem to know 
>>when the server was restarted. The simplest thing I can imagine is that 
>>the NFS server generates a random integer value at start, and transmits 
>>it along with ESTALE. If the integer value is different from the 
>>integer value the server sent while mounting the FS, then the kernel has 
>>to remount it transparently. This is a simple way for a client to 
>>safely determine whether the server has been restarted, and it only 
>>adds 4 bytes to some NFS packets.
> 
> No.... The simplest thing is for the server to actually abide by the
> RFCs and not generate filehandles that change on reboot.

OK, that sounds complicated, but if it worked, then it would be very 
nice indeed.

> NFSv4 is the ONLY version of the protocol that actually supports the
> concept of filehandles that have a finite lifetime.

But NFSv4 is still experimental :-( and I think the clients don't have 
NFSv4 support either.

>>In my case, if the NFS directory is mounted on /mnt/nfs, I can't even do 
>>a simple "cd /mnt/nfs" without getting the "stale NFS handle" error, even if 
>>I use a different shell. I always thought that the "cd /mnt/nfs" should 
>>work, since the shell will acquire a new handle, but it doesn't work :-(
> 
> It won't if the root filehandle is broken too. That is the standard way
> of telling the NFS client that the administrator has revoked our access
> to the filesystem.
> 
> The solution is simple here: fix the broken server...

Sorry? Why is my server broken? I'm using kernel 2.6.8.1 with nfs-utils 
1.0.6 on my server, and I don't see what should be broken.

Thx
   Sven

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: why do i get "Stale NFS file handle" for hours?
@ 2004-09-05  3:01           ` Trond Myklebust
  0 siblings, 0 replies; 24+ messages in thread
From: Trond Myklebust @ 2004-09-05  3:01 UTC (permalink / raw)
  To: Sven Köhler; +Cc: linux-kernel, nfs

On Sat, 04/09/2004 at 22:23, Sven Köhler wrote:

> Sorry? Why is my server broken? I'm using kernel 2.6.8.1 with nfs-utils 
> 1.0.6 on my server, and I don't see what should be broken.

When your server fails to work as per spec, then it is said to be
"broken" no matter what kernel/nfs-utils combination you are using.
The spec is that reboots are not supposed to clobber filehandles.

So, there are 3 possibilities:

 1) You are exporting a non-supported filesystem, (e.g. FAT). See the
FAQ on http://nfs.sourceforge.org.
 2) A bug in your initscripts is causing the table of exports to be
clobbered. Running "exportfs" in legacy 2.4 mode (without having the
nfsd filesystem mounted on /proc/fs/nfsd) appears to be broken for me at
least...
 3) There is some other bug in knfsd that nobody else appears to be
seeing.

Cheers,
  Trond

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: why do i get "Stale NFS file handle" for hours?
@ 2004-09-05  8:17             ` Tim Connors
  0 siblings, 0 replies; 24+ messages in thread
From: Tim Connors @ 2004-09-05  8:17 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Sven Köhler, linux-kernel, nfs

Trond Myklebust <trond.myklebust@fys.uio.no> said on Sat, 04 Sep 2004 23:01:07 -0400:
> On Sat, 04/09/2004 at 22:23, Sven Köhler wrote:
> 
> > Sorry? Why is my server broken? I'm using kernel 2.6.8.1 with nfs-utils 
> > 1.0.6 on my server, and I don't see what should be broken.
> 
> When your server fails to work as per spec, then it is said to be
> "broken" no matter what kernel/nfs-utils combination you are using.
> The spec is that reboots are not supposed to clobber filehandles.
> 
> So, there are 3 possibilities:
> 
>  1) You are exporting a non-supported filesystem, (e.g. FAT). See the
> FAQ on http://nfs.sourceforge.org.
>  2) A bug in your initscripts is causing the table of exports to be
> clobbered. Running "exportfs" in legacy 2.4 mode (without having the
> nfsd filesystem mounted on /proc/fs/nfsd) appears to be broken for me at
> least...
>  3) There is some other bug in knfsd that nobody else appears to be
> seeing.

Have I got 2 cases of 3) for you perhaps?

I can't give you more info, because I am not the admin of the boxes
concerned, but we lose filehandles of specific files and spontaneously
sometimes (no server reboots, nfsd restarts, etc).

Background:

We have a compute cluster of machines all running SuSE's 2.4.20, or
thereabouts. The nfs servers run Linus's 2.4.26, talking to ext3, on
bigass apple Xserves.

I will update one directory with rsync from one host, and then try, a
little later on, to operate on that directory from another host. Every
now and then, from a single host only, a few files in that tree will
get stale filehandles - an ls of that directory will mostly be fine
apart from those files. They will also be fine from any other machine.

I have found that if I clobber cache with my alloclargemem program,
then those files will come back immediately.

The other problem we see regularly, and which I have explicitly coded
my scripts to work around because it is such a common occurrence, is
this: when I start 120 jobs in a short time on 120 nodes, which deal with a
bunch of common files read-only, and then write their own private
files, a few of them will die with the read-only files being stale. It
looks as if the server just can't cope with a hundred requests (and
possibly mounts, since they are automounted) in the space of half a
minute (big files, mind you), and starts returning bogus data.
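
The message does not say how the scripts work around the failure. One plausible shape, for read-only input where reopening by path is known to be safe, is a bounded retry on ESTALE. This is a sketch under that assumption. `read_with_retry` and its `opener`, `attempts`, and `delay` parameters are hypothetical names chosen for illustration.

```python
import errno
import time


def read_with_retry(path, opener=open, attempts=3, delay=0.0):
    """Read a whole read-only file, reopening by path if ESTALE is raised."""
    last_error = None
    for _ in range(attempts):
        try:
            with opener(path, "rb") as f:
                return f.read()
        except OSError as e:
            if e.errno != errno.ESTALE:
                raise               # any other error is not ours to hide
            last_error = e
            time.sleep(delay)       # give the server a moment before retrying
    raise last_error
```

Retrying is only reasonable here because the files are read-only. For files being written, silently reopening could land on a different file.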

The entire mount (which is automounted, looks like version 3) will
then remain stale for eternity, with df returning its minus 3
bazillion GB free, until automount is restarted.

Known problems? I googled for '"stale nfs file handle" spontaneous'
with no luck. Or is it likely perhaps that SuSE fscked with the nfs
(and autofs) client side code? The sysadmins look at these failures as
being a fact of life, but perhaps no-one else is seeing this, so it's
worth reporting.

-- 
TimC -- http://astronomy.swin.edu.au/staff/tconnors/
PUBLIC NOTICE AS REQUIRED BY LAW: Any Use of This Product, in Any Manner 
Whatsoever, Will Increase the Amount of Disorder in the Universe. Although No 
Liability Is Implied Herein, the Consumer Is Warned That This Process Will 
Ultimately Lead to the Heat Death of the Universe.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: why do i get "Stale NFS file handle" for hours?
  2004-09-05  8:17             ` Tim Connors
  (?)
@ 2004-09-05  8:59             ` Florian Weimer
  2004-09-05  9:02               ` Tim Connors
  -1 siblings, 1 reply; 24+ messages in thread
From: Florian Weimer @ 2004-09-05  8:59 UTC (permalink / raw)
  To: Tim Connors; +Cc: linux-kernel, nfs

* Tim Connors:

> Background:
>
> We have a compute cluster of machines all running SuSE's 2.4.20, or
> thereabouts. The nfs servers run Linus's 2.4.26, talking to ext3, on
> bigass apple Xserves.

Which NFS server software are you using?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: why do i get "Stale NFS file handle" for hours?
  2004-09-05  8:59             ` Florian Weimer
@ 2004-09-05  9:02               ` Tim Connors
  0 siblings, 0 replies; 24+ messages in thread
From: Tim Connors @ 2004-09-05  9:02 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Linux Kernel Mailing List, nfs

On Sun, 5 Sep 2004, Florian Weimer wrote:

> * Tim Connors:
>
> > Background:
> >
> > We have a compute cluster of machines all running SuSE's 2.4.20, or
> > thereabouts. The nfs servers run Linus's 2.4.26, talking to ext3, on
> > bigass apple Xserves.
>
> Which NFS server software are you using?

kernel nfsd
Source RPM: nfs-utils-1.0.1-109.src.rpm


-- 
TimC -- http://astronomy.swin.edu.au/staff/tconnors/
Save the whales. Feed the hungry. Free the mallocs. --unk

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: why do i get "Stale NFS file handle" for hours?
  2004-09-05  8:17             ` Tim Connors
  (?)
  (?)
@ 2004-09-05 16:20             ` Mike Jagdis
  2004-09-06  1:32               ` Tim Connors
  -1 siblings, 1 reply; 24+ messages in thread
From: Mike Jagdis @ 2004-09-05 16:20 UTC (permalink / raw)
  To: Tim Connors; +Cc: Trond Myklebust, Sven Köhler, linux-kernel, nfs



Tim Connors wrote:
> I will update one directory with rsync from one host,

You mean rsync to the server and change files directly on the fs rather 
than through an NFS client?

> and then try, a
> little later on, to operate on that directory from another host. Every
> now and then, from a single host only, a few files in that tree will
> get stale filehandles - an ls of that directory will mostly be fine
> apart from those files. They will also be fine from any other machine.

Yeah, that's what happens... Clients that had the file open are liable 
to get ESTALE. Stale file handles stick around until unmount. As long as 
they're around automount will consider the mount busy and not expire it 
(but you can unmount manually or killall -USR1 automountd).

Mike
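
Before unmounting by hand, it can help to find out which mount points are actually stale. A minimal sketch, assuming that a `stat()` failing with ESTALE is a good enough signal. `is_stale` and `stale_mounts` are hypothetical helpers, and the injectable `stat` parameter exists only so the logic can be exercised without a real NFS mount.

```python
import errno
import os


def is_stale(path, stat=os.stat):
    """Return True if stat() on path fails with ESTALE."""
    try:
        stat(path)
    except OSError as e:
        return e.errno == errno.ESTALE
    return False


def stale_mounts(mount_points, stat=os.stat):
    """Filter a list of mount points down to those that look stale."""
    return [m for m in mount_points if is_stale(m, stat)]
```

On a hard mount whose server is unreachable, `stat()` may block rather than fail, so a check like this is best run with a timeout around it.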

-- 
Mike Jagdis                        Web: http://www.eris-associates.co.uk
Eris Associates Limited            Tel: +44 7780 608 368
Reading, England                   Fax: +44 118 926 6974

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: why do i get "Stale NFS file handle" for hours?
  2004-09-05 16:20             ` Mike Jagdis
@ 2004-09-06  1:32               ` Tim Connors
  0 siblings, 0 replies; 24+ messages in thread
From: Tim Connors @ 2004-09-06  1:32 UTC (permalink / raw)
  To: Mike Jagdis
  Cc: Trond Myklebust, Sven Köhler, Linux Kernel Mailing List, nfs

On Sun, 5 Sep 2004, Mike Jagdis wrote:

> Tim Connors wrote:
> > I will update one directory with rsync from one host,
>
> You mean rsync to the server and change files directly on the fs rather
> than through an NFS client?

No - the server is behind a firewall. Just an ordinary nfs client.

> > and then try, a
> > little later on, to operate on that directory from another host. Every
> > now and then, from a single host only, a few files in that tree will
> > get stale filehandles - an ls of that directory will mostly be fine
> > apart from those files. They will also be fine from any other machine.
>
> Yeah, that's what happens... Clients that had the file open are liable
> to get ESTALE. Stale file handles stick around until unmount. As long as
> they're around automount will consider the mount busy and not expire it
> (but you can unmount manually or killall -USR1 automountd).

Yep - that has been the case normally (when the entire mount went stale),
we'd just restart the automounter.

You almost hit the nail on the head with regards to the problem - this
last happened a week ago, and I seem to remember 6 files getting ESTALE.
But only 2 of those would have likely been open on the host where they
went stale, at any time near when they went stale (if they were open at
all), if I am remembering things right. Unless an `ls -lA --color` counts
as "opening" (they weren't symlinks, just normal files, so I doubt it).

What is strange, is I was able to make them "unstale" simply by clearing
cache - allocating a large block of ram, and ensuring buffers and cached
went to something very small. I didn't need to restart the automounter at
all. Then, I could `ls` the directory fine, and could `cat` the files
fine.

I'm afraid that the intermittent nature of this problem is going to make
it hard for me to reproduce though!

I take it the files go stale (normally) because sillyrename only happens
when 1 host tries to delete while the same host has the file open, so the
server doesn't know that a client still has it open, and if the inode just
happens to be allocated by something new, then the server has no choice
but to say "bugger off"? I thought I had seen in the past that you could
delete a file from one host, have another host still be using the file,
and it would do the sillyrename, and the client would continue to use the
file just fine - probably was on a Sun, come to think of it -- does its
equivalent of sillyrename keep track of who has what open?

-- 
TimC -- http://astronomy.swin.edu.au/staff/tconnors/
"Meddle not in the affairs of cats, for they are subtle, and will
piss on your computer."                             - Jeff Wilder

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: why do i get "Stale NFS file handle" for hours?
  2004-09-05  3:01           ` Trond Myklebust
  (?)
  (?)
@ 2004-09-05 13:18           ` Sven Köhler
  2004-09-05 20:10               ` Trond Myklebust
  -1 siblings, 1 reply; 24+ messages in thread
From: Sven Köhler @ 2004-09-05 13:18 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-kernel, nfs

> So, there are 3 possibilities:
> 
>  1) You are exporting a non-supported filesystem, (e.g. FAT). See the
> FAQ on http://nfs.sourceforge.org.

I'm exporting a reiserfs.

>  2) A bug in your initscripts is causing the table of exports to be
> clobbered. Running "exportfs" in legacy 2.4 mode (without having the
> nfsd filesystem mounted on /proc/fs/nfsd) appears to be broken for me at
> least...

So there should be a filesystem mounted on /proc/fs/nfsd? This isn't the 
case on my machine. Should the init script do a simple "mount -t nfsd 
none /proc/fs/nfsd"? Then this would be a bug in my distribution (Gentoo).

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: why do i get "Stale NFS file handle" for hours?
@ 2004-09-05 20:10               ` Trond Myklebust
  0 siblings, 0 replies; 24+ messages in thread
From: Trond Myklebust @ 2004-09-05 20:10 UTC (permalink / raw)
  To: Sven Köhler; +Cc: linux-kernel, nfs

On Sun, 05/09/2004 at 09:18, Sven Köhler wrote:

> So there should be a filesystem mounted on /proc/fs/nfsd? This isn't the 
> case on my machine. Should the init script do a simple "mount -t nfsd 
> none /proc/fs/nfsd"? Then this would be a bug in my distribution (Gentoo).

Yes... See the manpage for "exportfs".

Cheers,
  Trond
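
Whether the nfsd filesystem is mounted can be checked by scanning `/proc/mounts`. A minimal sketch: `nfsd_fs_mounted` and `check_this_host` are hypothetical helper names, and the mount point defaults to the `/proc/fs/nfsd` path mentioned in this thread.

```python
def nfsd_fs_mounted(mounts_text, mount_point="/proc/fs/nfsd"):
    """Return True if an nfsd filesystem is mounted at mount_point.

    mounts_text is the content of /proc/mounts, whose whitespace-separated
    fields are: device, mount point, filesystem type, options, dump, pass.
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == mount_point and fields[2] == "nfsd":
            return True
    return False


def check_this_host():
    """Run the check against the live mount table of the current machine."""
    with open("/proc/mounts") as f:
        return nfsd_fs_mounted(f.read())
```

If the check returns False on the server, the mount command quoted above is the missing step in the init script.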


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: why do i get "Stale NFS file handle" for hours?
  2004-09-05 20:10               ` Trond Myklebust
  (?)
@ 2004-09-06  7:47               ` Kalin KOZHUHAROV
  -1 siblings, 0 replies; 24+ messages in thread
From: Kalin KOZHUHAROV @ 2004-09-06  7:47 UTC (permalink / raw)
  To: linux-kernel; +Cc: nfs

Trond Myklebust wrote:
>>So there should be a filesystem mounted to /proc/fs/nfsd? This isn't the 
>>case on my machine. Should the init-script do a simple "mount -t nfsd 
>>none /proc/fs/nfsd"? Than this would be a Bug of my distribution (Gentoo).
Well, I am on Gentoo as well, and it seems that it is mounted on /proc/fs/nfs.

However, `cat /proc/fs/nfs/exports` showed only one of the 5 exported directories on my server.
It has been a few weeks since the last restart (and NFS restart).
Running `/etc/init.d/nfs restart` or `exportfs -a` fixed it.

> Yes... See the manpage for "exportfs".

I had a (first) look at it, but I still cannot understand what the difference
is between the "-r" and "-a" options...
The output on my system from both `exportfs -rv` and `exportfs -av` is the same.

Kalin.
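
As far as the exportfs man page describes it, "-a" exports everything listed in /etc/exports, while "-r" re-exports everything and also drops entries that are no longer listed, so the two would be expected to print the same thing when nothing has been removed from /etc/exports. The sketch below is one way to spot the situation described above, where the kernel table holds fewer exports than the configuration. It assumes both files carry the path as the first whitespace-separated field and use "#" for comment lines; the function name missing_exports is made up for illustration.

```shell
#!/bin/sh
# Print export paths that are configured but absent from the active table.
# $1: configured exports (default /etc/exports)
# $2: active export table (default /proc/fs/nfs/exports, as used above)
# (missing_exports is a hypothetical helper, not an nfs-utils command.)
missing_exports() {
    configured=${1:-/etc/exports}
    active=${2:-/proc/fs/nfs/exports}
    awk '
        $1 == "" || $1 ~ /^#/ { next }              # skip blanks and comments
        FILENAME == ARGV[1] { seen[$1] = 1; next }  # first file: active table
        !($1 in seen) && !printed[$1]++ { print $1 } # configured, not active
    ' "$active" "$configured"
}
```

Calling `missing_exports` with no arguments on the server prints one path per line for every configured export the kernel does not currently list; empty output means the two tables agree on paths.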

-- 
 || ~~~~~~~~~~~~~~~~~~~~~~ ||
(  ) http://ThinRope.net/ (  )
 || ______________________ ||

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: why do i get "Stale NFS file handle" for hours?
  2004-09-05  3:01           ` Trond Myklebust
                             ` (2 preceding siblings ...)
  (?)
@ 2004-09-06  9:57           ` David Woodhouse
  2004-09-06 15:59               ` Trond Myklebust
  -1 siblings, 1 reply; 24+ messages in thread
From: David Woodhouse @ 2004-09-06  9:57 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Sven Köhler, linux-kernel, nfs

On Sat, 2004-09-04 at 23:01 -0400, Trond Myklebust wrote:
>  2) A bug in your initscripts is causing the table of exports to be
> clobbered. Running "exportfs" in legacy 2.4 mode (without having the
> nfsd filesystem mounted on /proc/fs/nfsd) appears to be broken for me at
> least...
>  3) There is some other bug in knfsd that nobody else appears to be
> seeing.

The fact that we require a persistent table of exports at all, and can't
call back to mountd to authenticate 'new' clients instead of just
telling them to sod off if the kernel doesn't already know about them,
is considered by some to be a bug in knfsd. 

-- 
dwmw2

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: why do i get "Stale NFS file handle" for hours?
  2004-09-06  9:57           ` David Woodhouse
@ 2004-09-06 15:59               ` Trond Myklebust
  0 siblings, 0 replies; 24+ messages in thread
From: Trond Myklebust @ 2004-09-06 15:59 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Sven Köhler, linux-kernel, nfs

On Mon, 06/09/2004 at 05:57, David Woodhouse wrote:

> The fact that we require a persistent table of exports at all, and can't
> call back to mountd to authenticate 'new' clients instead of just
> telling them to sod off if the kernel doesn't already know about them,
> is considered by some to be a bug in knfsd.

That should have been fixed in 2.6.x. If you do mount /proc/fs/nfsd, and
use a recent enough version of mountd, then knfsd can and will work
without any extra help from exportfs.

The one problem I have found with this implementation is that it relies
very heavily on reverse-DNS lookups, so it may give unexpected results
if you have more than one name for your client. I can't see why that
shouldn't be fixable, though...

Cheers,
  Trond



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [NFS] Re: why do i get "Stale NFS file handle" for hours?
  2004-09-05  3:01           ` Trond Myklebust
                             ` (3 preceding siblings ...)
  (?)
@ 2004-09-07  0:55           ` Greg Banks
  -1 siblings, 0 replies; 24+ messages in thread
From: Greg Banks @ 2004-09-07  0:55 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Sven Köhler, linux-kernel, Linux NFS Mailing List

On Sun, 2004-09-05 at 13:01, Trond Myklebust wrote:
> When your server fails to work as per spec, then it is said to be
> "broken" no matter what kernel/nfs-utils combination you are using.
> The spec is that reboots are not supposed to clobber filehandles.
> 
> So, there are 3 possibilities:
> 
>  1) You are exporting a non-supported filesystem, (e.g. FAT). See the
> FAQ on http://nfs.sourceforge.org.
>  2) A bug in your initscripts is causing the table of exports to be
> clobbered. Running "exportfs" in legacy 2.4 mode (without having the
> nfsd filesystem mounted on /proc/fs/nfsd) appears to be broken for me at
> least...
>  3) There is some other bug in knfsd that nobody else appears to be
> seeing.
> 

4) You're exporting a filesystem mounted on a block device whose
   device minor number is dynamic and has changed at the last reboot,
   e.g. loopback mounts or SCSI.
5) The mapping of minor numbers is stable but you physically re-arranged
   the disks or SCSI cards and changed /etc/fstab correspondingly.

Before you say any more, yes this is broken and fixing it properly is
Hard.  This is why we have the fsid export option.

Greg.
-- 
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.
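
A minimal sketch of how one might find exports that do not yet use the fsid option mentioned above. It assumes one export entry per line in /etc/exports (no backslash continuations), and the function name exports_without_fsid as well as the path, client and number in the comment are made up for illustration.

```shell
#!/bin/sh
# Print export paths whose option lists carry no fsid= setting.
# Example of a pinned entry (path, client and number are hypothetical):
#   /srv/data  client.example.com(rw,sync,fsid=1)
# (exports_without_fsid is a hypothetical helper, not an nfs-utils command.)
exports_without_fsid() {
    exports_file=${1:-/etc/exports}
    awk '
        $1 == "" || $1 ~ /^#/ { next }   # skip blank lines and comments
        $0 !~ /[(,]fsid=/ { print $1 }   # no fsid= inside any option list
    ' "$exports_file"
}
```

Paths printed by `exports_without_fsid` are the candidates for an explicit fsid when the underlying device numbers are not stable across reboots, as in cases 4 and 5 above.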

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: why do i get "Stale NFS file handle" for hours?
  2004-09-05  1:06 why do i get "Stale NFS file handle" for hours? Sven Köhler
  2004-09-05  1:39   ` Trond Myklebust
@ 2004-09-06  6:47 ` Frank Steiner
  1 sibling, 0 replies; 24+ messages in thread
From: Frank Steiner @ 2004-09-06  6:47 UTC (permalink / raw)
  To: Sven Köhler; +Cc: nfs

I had two issues which caused the same behaviour for me. Maybe check those.

1) There was a race condition in the NFS server, not yet fixed in 2.6.8.1,
    that caused stale handles if the server restarted too fast, i.e.,
    without at least a "sleep 1" between stop and start.
    Neil sent a patch for this to the list on August 18th.

2) On SuSE 9.0 (maybe 9.1, maybe earlier too), the /etc/init.d/nfsserver
    script was broken in that it called "/usr/sbin/exportfs -au" before
    "killproc rpc.mountd", which caused a lot of stale handles. Maybe
    you have something similar?

cu,
Frank
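
The workaround from point 1 can be sketched as a small wrapper, assuming the init script accepts "stop" and "start" arguments. The function name restart_nfs_with_pause is made up for illustration, and the default script path is the /etc/init.d/nfs named earlier in this thread (on SuSE it would be /etc/init.d/nfsserver).

```shell
#!/bin/sh
# Stop and start the NFS server with a pause in between.
# $1: init script to call (default /etc/init.d/nfs)
# $2: pause in seconds (default 1, the "sleep 1" mentioned above)
# (restart_nfs_with_pause is a hypothetical helper name.)
restart_nfs_with_pause() {
    initscript=${1:-/etc/init.d/nfs}
    pause=${2:-1}
    "$initscript" stop || return 1   # give up if the stop step fails
    sleep "$pause"
    "$initscript" start
}
```

This only papers over the race by spacing the two steps out; it does not replace the kernel patch mentioned above.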

-- 
Dipl.-Inform. Frank Steiner   Web:  http://www.bio.ifi.lmu.de/~steiner/
Lehrstuhl f. Bioinformatik    Mail: http://www.bio.ifi.lmu.de/~steiner/m/
LMU, Amalienstr. 17           Phone: +49 89 2180-4049
80333 Muenchen, Germany       Fax:   +49 89 2180-99-4049
* Rekursion kann man erst verstehen, wenn man Rekursion verstanden hat. *


^ permalink raw reply	[flat|nested] 24+ messages in thread

Thread overview: 24+ messages
2004-09-05  1:06 why do i get "Stale NFS file handle" for hours? Sven Köhler
2004-09-05  1:39 ` Trond Myklebust
2004-09-05  1:39   ` Trond Myklebust
2004-09-05  1:51   ` Sven Köhler
2004-09-05  2:02     ` Trond Myklebust
2004-09-05  2:02       ` Trond Myklebust
2004-09-05  2:23       ` Sven Köhler
2004-09-05  3:01         ` Trond Myklebust
2004-09-05  3:01           ` Trond Myklebust
2004-09-05  8:17           ` Tim Connors
2004-09-05  8:17             ` Tim Connors
2004-09-05  8:59             ` Florian Weimer
2004-09-05  9:02               ` Tim Connors
2004-09-05 16:20             ` Mike Jagdis
2004-09-06  1:32               ` Tim Connors
2004-09-05 13:18           ` Sven Köhler
2004-09-05 20:10             ` Trond Myklebust
2004-09-05 20:10               ` Trond Myklebust
2004-09-06  7:47               ` Kalin KOZHUHAROV
2004-09-06  9:57           ` David Woodhouse
2004-09-06 15:59             ` Trond Myklebust
2004-09-06 15:59               ` Trond Myklebust
2004-09-07  0:55           ` [NFS] " Greg Banks
2004-09-06  6:47 ` Frank Steiner
