public inbox for linux-nfs@vger.kernel.org
* Why is remount necessary after rebooting server?
@ 2010-04-16  6:12 Michael Tokarev
  2010-04-16 15:49 ` Chuck Lever
  0 siblings, 1 reply; 2+ messages in thread
From: Michael Tokarev @ 2010-04-16  6:12 UTC (permalink / raw)
  To: linux-nfs

Hello.

It has been a while since I last saw issues with nfs.  Now
I've hit the limit on the number of groups in nfs3, and had
to switch to nfs4.  And I immediately hit another problem,
which makes the whole thing almost unusable for us.

The problem is that each time the nfs server is rebooted, I
have to, in effect, forcibly REBOOT each client which has
mounts from that server.  In theory a remount should be
sufficient, but I can't perform a remount because the
filesystem(s) in question are busy.

Here's a typical situation (after reboot):

# ls /net/gnome/home
ls: cannot access /net/gnome/home: Stale NFS file handle

# mount | tail -n2
gnome:/ on /net/gnome type nfs4 (rw,nosuid,nodev,relatime,vers=4,rsize=262144,wsize=262144,namlen=255,soft,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.88.2,addr=192.168.88.4)
gnome:/home on /net/gnome/home type nfs4 (rw,nosuid,nodev,relatime,vers=4,rsize=262144,wsize=262144,namlen=255,soft,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.88.2,addr=192.168.88.4)

# umount /net/gnome/home
umount.nfs4: /net/gnome/home: device is busy
umount.nfs4: /net/gnome/home: device is busy

# umount -f /net/gnome/home
umount2: Device or resource busy
umount.nfs4: /net/gnome/home: device is busy
umount2: Device or resource busy
umount.nfs4: /net/gnome/home: device is busy

# umount -f /net/gnome
umount2: Device or resource busy
umount.nfs4: /net/gnome: device is busy
umount2: Device or resource busy
umount.nfs4: /net/gnome: device is busy


At this point, there are two ways:

  1.  try to find and kill all processes which are using
    the mountpoint.  But in almost all cases this is not
    possible, since there is at least one process which is
    in D state and unkillable, so we proceed to variant 2:

  2. echo b > /proc/sysrq-trigger
    or something of this sort, since it will not be able
    to umount / anyway.

Note that even if 1. succeeds, the system is unusable anyway,
since it is here to serve users.  So it is simpler and
faster to proceed to 2. straight away.
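
For completeness, the kill-the-holders variant looks roughly
like this (the mountpoint is the one from the transcript above;
fuser and a lazy "umount -l" are suggestions, not guaranteed to
help when the holder is stuck in D state):

```shell
# Sketch: try to release a stale NFS mountpoint without rebooting.
# /net/gnome/home is the mountpoint from the transcript above.
MNT=/net/gnome/home

# Show which processes hold the mountpoint (may print nothing when
# the only holder is stuck in D state inside the kernel).
fuser -vm "$MNT" 2>/dev/null || true

# Kill the holders that can still be killed; D-state tasks survive.
# fuser -km "$MNT"

# Lazy unmount: detach the mountpoint now, clean up when the last
# user exits.  Often enough to let a fresh mount of the export work.
umount -l "$MNT" 2>/dev/null || true
```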

What can be done to stop the "Stale NFS file handle" situation
from happening -- other than never rebooting the server?
With nfs3 this was almost a solved problem (almost, because
from time to time it still happened even with nfs3, leading
to the same issue).

Thanks!

/mjt

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: Why is remount necessary after rebooting server?
  2010-04-16  6:12 Why is remount necessary after rebooting server? Michael Tokarev
@ 2010-04-16 15:49 ` Chuck Lever
  0 siblings, 0 replies; 2+ messages in thread
From: Chuck Lever @ 2010-04-16 15:49 UTC (permalink / raw)
  To: linux-nfs, Michael Tokarev

On 04/16/2010 02:12 AM, Michael Tokarev wrote:
> Hello.
>
> It has been a while since I last saw issues with nfs. Now
> I've hit the limit on the number of groups in nfs3, and had
> to switch to nfs4.  And I immediately hit another problem,
> which makes the whole thing almost unusable for us.
>
> The problem is that each time the nfs server is rebooted, I
> have to, in effect, forcibly REBOOT each client which has
> mounts from that server. In theory a remount should be
> sufficient, but I can't perform a remount because the
> filesystem(s) in question are busy.
>
> Here's a typical situation (after reboot):
>
> # ls /net/gnome/home
> ls: cannot access /net/gnome/home: Stale NFS file handle
>
> # mount | tail -n2
> gnome:/ on /net/gnome type nfs4 (rw,nosuid,nodev,relatime,vers=4,rsize=262144,wsize=262144,namlen=255,soft,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.88.2,addr=192.168.88.4)
> gnome:/home on /net/gnome/home type nfs4 (rw,nosuid,nodev,relatime,vers=4,rsize=262144,wsize=262144,namlen=255,soft,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.88.2,addr=192.168.88.4)
>
> # umount /net/gnome/home
> umount.nfs4: /net/gnome/home: device is busy
> umount.nfs4: /net/gnome/home: device is busy
>
> # umount -f /net/gnome/home
> umount2: Device or resource busy
> umount.nfs4: /net/gnome/home: device is busy
> umount2: Device or resource busy
> umount.nfs4: /net/gnome/home: device is busy
>
> # umount -f /net/gnome
> umount2: Device or resource busy
> umount.nfs4: /net/gnome: device is busy
> umount2: Device or resource busy
> umount.nfs4: /net/gnome: device is busy

> At this point, there are two ways:
>
> 1. try to find and kill all processes which are using
> the mountpoint. But in almost all cases this is not
> possible, since there is at least one process which is
> in D state and unkillable, so we proceed to variant 2:
>
> 2. echo b > /proc/sysrq-trigger
> or something of this sort, since it will not be able
> to umount / anyway.
>
> Note that even if 1. succeeds, the system is unusable anyway,
> since it is here to serve users. So it is simpler and
> faster to proceed to 2. straight away.
>
> What can be done to stop the "Stale NFS file handle" situation
> from happening -- other than never rebooting the server?
> With nfs3 this was almost a solved problem (almost, because
> from time to time it still happened even with nfs3, leading
> to the same issue).

First, what is the kernel version on your server and clients?

ESTALE after a server reboot usually means that the file handle of the 
exported root has changed.  Can you tell us what physical file system 
type is being exported?
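
If it turns out the root file handle does change across reboots,
pinning the export's fsid on the server often keeps it stable.
For example (paths and network are purely illustrative, adjust
for your setup):

```
# /etc/exports on the server -- illustrative only
# fsid=0 marks the NFSv4 pseudo-root; an explicit fsid keeps the
# file handle stable even if device numbers change across reboots.
/export       192.168.88.0/24(rw,fsid=0,no_subtree_check)
/export/home  192.168.88.0/24(rw,fsid=1,no_subtree_check)
```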

Since you are using the automounter and /net, it could also mean that 
your clients are mounting an export, and after the server reboot, that 
export no longer exists.  So one way you could fix this is by using 
static mounts.
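
A static mount for the export above might look like this in 
/etc/fstab on the client (the "hard" option, rather than the "soft" 
shown in your mount output, is also worth considering, since soft 
mounts can turn server outages into I/O errors):

```
# /etc/fstab on the client -- illustrative, matching the mounts above
gnome:/home  /net/gnome/home  nfs4  rw,hard,proto=tcp,timeo=600  0 0
```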

Capturing a network trace on one of the clients across a server reboot 
could tell you how the server exports are changing across the reboot to 
cause the client heartburn.
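
Such a capture could be started on the client like this (the 
capture file path is arbitrary, and 2049 is the standard NFS port; 
run as root and analyze the result with wireshark or tshark):

```shell
# Sketch: capture NFS traffic on the client across a server reboot.
# -s 0 captures full frames so the NFS payloads can be decoded.
tcpdump -s 0 -w /tmp/nfs-reboot.pcap host gnome and port 2049 2>/dev/null &
CAPTURE_PID=$!
# ...reboot the server, reproduce the ESTALE, then stop the capture:
# kill "$CAPTURE_PID"
```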

-- 
chuck[dot]lever[at]oracle[dot]com

