* Stale NFS file handle
@ 2012-02-13 23:32 Székelyi Szabolcs
  2012-02-13 23:34 ` Sage Weil
  2012-02-14  1:04 ` Tommi Virtanen
  0 siblings, 2 replies; 22+ messages in thread
From: Székelyi Szabolcs @ 2012-02-13 23:32 UTC (permalink / raw)
  To: ceph-devel

Hi,

I'm using Ceph 0.41 with the FUSE client. After a while I get "Stale NFS file
handle" errors when trying to read a file or list a directory. Neither the logs
nor scrubbing show any errors or suspicious entries. After remounting the
filesystem, either by restarting the cluster (which forces the clients to
reconnect) or by umount+mount, files and directories either show up again or
seem lost forever.

Can you give me any hint on what to check?

Thanks,
-- 
cc



* Stale NFS file handle
@ 2016-12-24  9:48 Xen
  2017-01-03 19:41 ` J. Bruce Fields
  0 siblings, 1 reply; 22+ messages in thread
From: Xen @ 2016-12-24  9:48 UTC (permalink / raw)
  To: linux-nfs

Hi,

On a Debian server I have mounted several snapshots daily that I export 
with NFS.

At the end of the day the nfs-kernel-server service is shut down, the 
snapshots are renewed, remounted, and the server is brought online 
again.

In the beginning (I haven't been doing this for long) it all worked fine
and I could mount the shares on the client, which is an older NAS unit
running an old kernel, 2.6.32.

Yet one of the shares now refuses to get mounted and I don't know why. 
The only thing I haven't tried is actually renaming the mount points.

mount: mounting island.vpn:/srv/root on /mnt/remote/root failed: Stale 
NFS file handle

This "island.vpn" simply translates to 10.8.20.25, in this case.

This is one of 5 mounts and one of 5 snapshots. The other snapshots 
simply succeed.

I have rebooted both servers.

I have removed the mount points in both places: the mount points for the
snapshots, and the mount points for the shares on the client.

I have run exportfs -r and exportfs -f.

Oh, apologies, I see the issue, or at least part of it.

Dec 24 02:45:35 island rpc.mountd[3217]: / and /srv/root have same 
filehandle for diskstation.vpn, using first

I really wanted to find out if it uses nfs3 or nfs4, but I think it uses 
nfs 4.

The above message does not always repeat itself:

Dec 24 02:56:35 island rpc.mountd[3217]: authenticated mount request 
from 10.8.20.1:944 for /srv/root (/srv/root)
Dec 24 02:58:09 island rpc.mountd[3217]: authenticated mount request 
from 10.8.20.1:638 for /srv/boot (/srv/boot)

The site uses LVM snapshots, root (and boot) are regular, non-thin 
snapshots.

These are my exports:

/srv/home       diskstation(ro,no_subtree_check,no_root_squash)
/srv/data       diskstation(ro,no_subtree_check,no_root_squash)
/srv/sites      diskstation(ro,no_subtree_check,no_root_squash)
/srv/boot       diskstation(ro,no_subtree_check,no_root_squash)
/srv/root       diskstation(ro,no_subtree_check,no_root_squash)

All other mounts succeed without issue. Root did fine at first as well.

Edit: adding fsid=22 to the root line fixed it:

/srv/home       diskstation(ro,no_subtree_check,no_root_squash)
/srv/data       diskstation(ro,no_subtree_check,no_root_squash)
/srv/sites      diskstation(ro,no_subtree_check,no_root_squash)
/srv/boot       diskstation(ro,no_subtree_check,no_root_squash)
/srv/root       diskstation(ro,fsid=22,no_subtree_check,no_root_squash)

All snapshots are independently mounted and hence do not contain other 
mounts on them.

Well, I'm glad that's sorted. I don't know why the NFS server would pick
a filesystem to export that wasn't even mentioned. Of course, the
snapshot and the original root filesystem will have the same UUID
(the filesystem UUID, not the partition's).
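
The fix above works by giving /srv/root an explicit fsid, so the server no
longer has to derive one from a filesystem UUID that the snapshot shares
with its origin. As a rough aid for spotting other exports exposed to the
same clash, here is a minimal Python sketch that lists export paths with no
explicit fsid= option. The function name exports_missing_fsid is
hypothetical, and the parser is an assumption-laden simplification: it
ignores quoted paths, backslash line continuations, and default option
lines.

```python
import re


def exports_missing_fsid(exports_text):
    """Return export paths that have a client entry without an explicit fsid=.

    Simplified parsing (assumption): one export per line, whitespace-separated
    fields, options in parentheses directly after each client name.
    """
    missing = []
    for raw in exports_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        path, clients = fields[0], fields[1:]
        for client in clients:
            match = re.search(r"\(([^)]*)\)", client)
            options = match.group(1).split(",") if match else []
            if not any(opt.startswith("fsid=") for opt in options):
                missing.append(path)
                break  # report each path once
    return missing
```

Run against the corrected exports file above, this would report the four
snapshot exports that still rely on an automatically chosen fsid and skip
/srv/root.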

So I apologize for this message ;-).

Regards.

* Stale NFS File Handle
@ 2006-02-03 18:05 Brian D. McGrew
  2006-02-03 19:09 ` Trond Myklebust
  2006-02-03 19:24 ` Roger Heflin
  0 siblings, 2 replies; 22+ messages in thread
From: Brian D. McGrew @ 2006-02-03 18:05 UTC (permalink / raw)
  To: linux-kernel

Good morning all (kind of a long-winded mail, please have patience!)

I've got an FC3 server running a 2.6.9 kernel and sharing about 500GB of
disk space on a RAID5 array via NFS.  This box has been running fine for
over a year now but in the last three weeks or so I'm seeing a ton of
Stale NFS File Handle errors; especially in my overnight builds.

Most of my clients are FC3 and a couple of Solaris boxes running a stock
configuration.  All we're doing is serving up NFS and compiling with
GCC.  We're seeing this error more and more and the harder I try to
track it down, the more we're seeing it (ok, maybe that's my
imagination).

I'm guessing that the problem has to be somewhere in the FC3 server
because I've still got some Solaris NFS servers that have been running
for years with no problems.

What should I be looking for in tracking this error down?  Should I
upgrade my kernel?  Should I throw away FC3 and go to Enterprise Linux?
I'm at the end of my rope here because this is now causing a major set
back to our development team!

Please help!

-brian

Brian D. McGrew { brian@visionpro.com || brian@doubledimension.com }
--
> Those of you who think you know it all,
  really annoy those of us who do! 


* RE: Stale NFS file handle
@ 2005-03-23 18:59 Lever, Charles
  0 siblings, 0 replies; 22+ messages in thread
From: Lever, Charles @ 2005-03-23 18:59 UTC (permalink / raw)
  To: Filipe Brandenburger; +Cc: nfs

filipe-

in general the kernel patches i referred to earlier will prevent most
issues when using rsync and serving web pages.  an occasional ESTALE is
unavoidable because no NFS client can recover from an ESTALE during a
read operation.  however, the patches do allow a subsequent open(2)
operation on that pathname to find the new file.
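
For readers who can change the application, the behaviour described above
(a failed read cannot be recovered, but a fresh open of the same pathname
can find the new file) suggests a reopen-and-retry loop. The following is a
minimal Python sketch of that idea, not something proposed in this thread:
the function name read_with_estale_retry, the attempts parameter, and the
open_fn hook (present only so the logic can be exercised without NFS) are
all hypothetical. Whether the reopen actually succeeds depends on the
client kernel, as discussed.

```python
import errno


def read_with_estale_retry(path, attempts=3, open_fn=open):
    """Read a whole file, reopening it by pathname if the handle goes stale.

    Each attempt opens the path again so that a file renamed into place on
    the server can be picked up. Errors other than ESTALE are re-raised
    immediately; ESTALE is re-raised once the attempts are used up.
    """
    for attempt in range(attempts):
        try:
            with open_fn(path, "rb") as f:
                return f.read()
        except OSError as exc:
            if exc.errno != errno.ESTALE or attempt == attempts - 1:
                raise
```

The retry restarts the read from the beginning, so it only suits callers
that consume the file as a whole and have not yet acted on partial data.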


> -----Original Message-----
> From: Filipe Brandenburger [mailto:branden@terra.com.br]
> Sent: Wednesday, March 23, 2005 12:15 PM
> To: Trond Myklebust
> Cc: Steve Dickson; nfs@lists.sourceforge.net
> Subject: Re: [NFS] Stale NFS file handle
>
>
> * Wed, 23 Mar 2005 08:57:15 -0500, Trond Myklebust
> <trond.myklebust@fys.uio.no>:
> > He was running
> >
> > while :; do cat test.txt; done >/dev/null
> >
> > on a client, then deleting the file on the server. Even if the call to
> > open() is successful, you both can and will get ESTALEs on the
> > subsequent call to read().
>
> Ok,
>
> But then, how do you suggest I should change applications to
> do it? The applications that publish content to the NFS run
> on one host and are based on rsync, the applications that
> deliver content are web servers (Apache) reading from this
> same NFS on another pool of hosts (these are the ones that
> get the ESTALE error).
>
> Where is the problem? On the applications that publish?
> Should they open the file and update it in-place instead of
> creating a new one and renaming? I don't think so! This would
> lead to content that is a mix of the old and the new, that is corrupt.
>
> Or is the webserver? Should the application protect itself
> from ESTALE errors and retry? Somehow that seems wrong to me
> also. Then I would have to change all applications that read
> this content to do it. Why doesn't the NFS client recover
> from this kind of errors?
>
> If it's really not possible to change it on the NFS client
> (the kernel), what workaround would you suggest me to use?
>
> Thanks,
> Filipe


_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

* RE: Stale NFS file handle
@ 2005-03-23 14:42 Lever, Charles
  0 siblings, 0 replies; 22+ messages in thread
From: Lever, Charles @ 2005-03-23 14:42 UTC (permalink / raw)
  To: Filipe Brandenburger; +Cc: nfs

i saw trond's post.  he is correct, your use case is broken.  the
patches will fix ESTALE for open(2), but not for read(2).  i don't know
of any NFS implementation that will recover from an ESTALE on a read
operation.

you need to understand that NFS is not a cluster file system.  it does
not provide single-system semantics.  for a better understanding of the
limitations of NFS's caching model, take a look at Callaghan's "NFS
Illustrated."

> -----Original Message-----
> From: Filipe Brandenburger [mailto:branden@terra.com.br]
> Sent: Wednesday, March 23, 2005 9:35 AM
> To: Lever, Charles
> Subject: Re: [NFS] Stale NFS file handle
>
>
> Hi, there.
>
> Thanks for your answer. Do you know of such a patch that
> would solve this issue at the Linux Kernel level? I'm using
> kernel 2.4, do you know if 2.6 is any better on that? Do you
> know if other client implementations actually recover from
> these errors? I googled around and found out that this may be
> an issue on Solaris as well...
>
> Thanks,
> Filipe
>
>
>
> * Wed, 23 Mar 2005 05:53:39 -0800, "Lever, Charles"
> <Charles.Lever@netapp.com>:
> > when you replaced the file, client 2 still had the old file handle
> > cached.  when it used that old file handle again, the server reported
> > the file no longer existed with an ESTALE error.
> >
> > the problem is that Linux NFS clients don't recover from ESTALE errors.
> > it's a deficiency in the client implementation that, at this point, is
> > fixed only by patches.  at some point soon the patches will be
> > integrated into the mainline and distributions.



* Stale NFS file handle
@ 2005-03-23  0:19 Filipe Brandenburger
  2005-03-23 13:12 ` Steve Dickson
  0 siblings, 1 reply; 22+ messages in thread
From: Filipe Brandenburger @ 2005-03-23  0:19 UTC (permalink / raw)
  To: nfs


Hello,

I have a problem where I'm getting "Stale NFS file handle" errors when a
file is updated. I can easily reproduce the problem by running a sequence
of commands on two different hosts.

My environment is:

1) Server: Netapp FAS940
2) Client 1: Linux RedHat 9 with kernel 2.4.21-4.ELsmp (kernel of RHAS3)
3) Client 2: exactly the same as client 1.

The file system is mounted on both clients with the options
rsize=8192,wsize=8192,timeo=28,intr, additionally it's mounted read-only
on client 2 (it also gives me stale file handle if it's mounted
read-write, so it doesn't really matter).

My test setup is:

On client 2, I set up a loop to read a file:

# while :; do cat test.txt; done >/dev/null

Then, on client 1, I create a new file and rename it over the original
file:

# date >new.txt; mv -f new.txt test.txt

Whenever I execute this on client 1, I get the following error message
on client 2:

cat: test.txt: Stale NFS file handle
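
The publish step on client 1 is a write-then-rename: the new content goes
into a separate file, which then replaces the old name in one step, so a
reader never sees a half-written file. For reference, a minimal Python
sketch of the same pattern follows. The function name publish_atomically
and the ".tmp-" prefix are hypothetical choices for illustration, and this
is exactly the pattern that leaves readers on other clients holding a
handle to the replaced file.

```python
import os
import tempfile


def publish_atomically(target_path, data):
    """Write data to a temporary file, then rename it over target_path.

    The temporary file is created in the target's directory so the rename
    stays within one filesystem. Note that mkstemp creates the file with
    mode 0600; adjust permissions if readers need wider access.
    """
    directory = os.path.dirname(os.path.abspath(target_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data out before the rename
        os.replace(tmp_path, target_path)  # same effect as "mv -f"
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```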



Why is this happening? Is there a way to fix this problem? I tried the
mount options "noac" and "nocto" on client 2, applying them with
"mount -o remount". Afterwards the output of "mount" listed these options,
but the problem remained.

Is there a way to solve this issue without changing the applications that
access this file? Although my test environment consists only of "cat" and
"mv", my real production environment runs proprietary applications that
are harder to fix. "cat" and "mv" were just the way I reproduced the
problem in a controlled environment.

Thanks a lot,
Filipe





end of thread, other threads:[~2017-01-03 19:41 UTC | newest]

Thread overview: 22+ messages
2012-02-13 23:32 Stale NFS file handle Székelyi Szabolcs
2012-02-13 23:34 ` Sage Weil
2012-02-13 23:51   ` Székelyi Szabolcs
2012-02-13 23:54     ` Sage Weil
2012-02-14  0:51       ` Székelyi Szabolcs
2012-02-23 18:43         ` Tommi Virtanen
2012-02-24 12:25           ` Székelyi Szabolcs
2012-02-14  1:04 ` Tommi Virtanen
2012-02-14 13:20   ` Székelyi Szabolcs
  -- strict thread matches above, loose matches on Subject: below --
2016-12-24  9:48 Xen
2017-01-03 19:41 ` J. Bruce Fields
2006-02-03 18:05 Stale NFS File Handle Brian D. McGrew
2006-02-03 19:09 ` Trond Myklebust
2006-02-03 19:28   ` Roger Heflin
2006-02-03 19:24 ` Roger Heflin
2005-03-23 18:59 Stale NFS file handle Lever, Charles
2005-03-23 14:42 Lever, Charles
2005-03-23  0:19 Filipe Brandenburger
2005-03-23 13:12 ` Steve Dickson
2005-03-23 13:57   ` Trond Myklebust
2005-03-23 17:15     ` Filipe Brandenburger
2005-03-23 17:26       ` Trond Myklebust
