* nfs-utils, umount -l, and unmount requests
@ 2008-11-19 23:54 David Mathog
[not found] ` <E1L2wsc-0004Rs-2I-Mlb+9xX7RBA8OKIjkCRLjtKIQNXEaThN@public.gmane.org>
0 siblings, 1 reply; 5+ messages in thread
From: David Mathog @ 2008-11-19 23:54 UTC (permalink / raw)
To: linux-nfs
[I hope this is the right place to send this. I tried
agud@redhat.com, who is listed on the relevant file's copyright,
but that mailbox was disabled.]
This concerns nfsumount.c in nfs-utils-1.1.4.
I spent most of today trying to figure out why after upgrading an NFS
client machine from Mandriva 2007.1 to 2008.1 suddenly the NFS server
stopped logging "authenticated unmount request" messages when the
client rebooted. It turned out this was a consequence of the umount
"-l" flag, which is used in the shutdown scripts, combined with the many
recent changes in the organization of the umount commands. Long story short,
under Mandriva 2007.1 umount looked for umount.nfs, which was not
present in that release, and then went on to send the unmount request
itself. On Mandriva 2008.1, where umount.nfs is present, the ball
was passed to this program, which promptly dropped it if "-l" had been
specified. The NFS directory was unmounted correctly, but the "unmount
request" was never sent.
I traced this issue down to these lines of code in nfsumount.c,
starting at line 351 (some may wrap):
    if (mc) {
        if (!lazy && strcmp(mc->m.mnt_type, "nfs4") != 0)
            /* We ignore the error from do_nfs_umount23.
             * If the actual umount succeeds (in del_mtab),
             * we don't want to signal an error, as that
             * could cause /sbin/mount to retry!
             */
            do_nfs_umount23(mc->m.mnt_fsname, mc->m.mnt_opts);
        ret = del_mtab(mc->m.mnt_fsname, mc->m.mnt_dir);
    } else if (*spec != '/') {
        if (!lazy)
            ret = do_nfs_umount23(spec, "tcp,v3");
    } else
        ret = del_mtab(NULL, spec);
    return ret;
Removing the first "!lazy &&" resolved my immediate problem. What I
don't understand is why it, and the latter "!lazy", were there in
the first place. The do_nfs_umount23() routine seems to be relatively
harmless, it either sends the message or it doesn't, but either way it
doesn't seem to do anything to the mount information on the local node.
So why is it not run if "-l" (lazy) is set?
Thanks,
David Mathog
mathog-7GExONQZ6ZKVc3sceRu5cw@public.gmane.org
Manager, Sequence Analysis Facility, Biology Division, Caltech
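
For readers following the thread, the branch logic quoted in the message above can be isolated as a pure decision function. This is only a sketch for illustration: `plan_umount`, the `ACT_*` flags, and the `have_mtab_entry` parameter are hypothetical names that do not exist in nfs-utils, and the real code calls do_nfs_umount23() and del_mtab() directly rather than returning a plan.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical flags; not part of nfs-utils. */
enum umount_action {
	ACT_NONE      = 0,
	ACT_SEND_UMNT = 1 << 0,	/* the quoted code would call do_nfs_umount23() */
	ACT_DEL_MTAB  = 1 << 1	/* the quoted code would call del_mtab() */
};

/*
 * have_mtab_entry: an mtab entry was found (mc != NULL in the quoted code)
 * mnt_type:        file system type from that entry; only read when
 *                  have_mtab_entry is true
 * spec:            the argument given to umount
 * lazy:            "-l" was specified
 */
int plan_umount(bool have_mtab_entry, const char *mnt_type,
		const char *spec, bool lazy)
{
	if (have_mtab_entry) {
		int actions = ACT_DEL_MTAB;

		/* The unmount request is skipped for lazy and for nfs4. */
		if (!lazy && strcmp(mnt_type, "nfs4") != 0)
			actions |= ACT_SEND_UMNT;
		return actions;
	}
	if (*spec != '/')
		return lazy ? ACT_NONE : ACT_SEND_UMNT;
	return ACT_DEL_MTAB;
}
```

Written this way, it is easy to see that with "-l" the unmount request is never planned on any path, which matches the behavior reported above.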
* Re: nfs-utils, umount -l, and unmount requests
[not found] ` <E1L2wsc-0004Rs-2I-Mlb+9xX7RBA8OKIjkCRLjtKIQNXEaThN@public.gmane.org>
@ 2008-11-21 11:08 ` Steve Dickson
[not found] ` <492696AD.8040905-AfCzQyP5zfLQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 5+ messages in thread
From: Steve Dickson @ 2008-11-21 11:08 UTC (permalink / raw)
To: David Mathog; +Cc: linux-nfs

David Mathog wrote:
>
>     if (mc) {
>         if (!lazy && strcmp(mc->m.mnt_type, "nfs4") != 0)
>             /* We ignore the error from do_nfs_umount23.
>              * If the actual umount succeeds (in del_mtab),
>              * we don't want to signal an error, as that
>              * could cause /sbin/mount to retry!
>              */
>             do_nfs_umount23(mc->m.mnt_fsname, mc->m.mnt_opts);
>         ret = del_mtab(mc->m.mnt_fsname, mc->m.mnt_dir);
>     } else if (*spec != '/') {
>         if (!lazy)
>             ret = do_nfs_umount23(spec, "tcp,v3");
>     } else
>         ret = del_mtab(NULL, spec);
>     return ret;
>
> Removing the first "!lazy &&" resolved my immediate problem. What I
> don't understand is why it, and the latter "!lazy", were there in
> the first place.

To quote from the umount(8) man page:

    -l  Lazy unmount. Detach the filesystem from the filesystem
        hierarchy now, and cleanup all references to the filesystem
        as soon as it is not busy anymore. (Requires kernel 2.4.11
        or later.)

Which means, in an NFS context, that no RPC calls (i.e. calls to the
server) can be made that could possibly hang.

> The do_nfs_umount23() routine seems to be relatively
> harmless, it either sends the message or it doesn't, but either way it
> doesn't seem to do anything to the mount information on the local node.
> So why is it not run if "-l" (lazy) is set?

Take a closer look... nfs_call_umount() makes a clnt_call(), which
would hang if the server is down, which is the reason it's not called.

I hope this helps....

steved.
* Re: nfs-utils, umount -l, and unmount requests
[not found] ` <492696AD.8040905-AfCzQyP5zfLQT0dZR+AlfA@public.gmane.org>
@ 2008-11-21 14:59 ` Chuck Lever
0 siblings, 0 replies; 5+ messages in thread
From: Chuck Lever @ 2008-11-21 14:59 UTC (permalink / raw)
To: Steve Dickson, David Mathog; +Cc: Linux NFS Mailing List

On Nov 21, 2008, at 6:08 AM, Steve Dickson wrote:
> David Mathog wrote:
>>
>>     if (mc) {
>>         if (!lazy && strcmp(mc->m.mnt_type, "nfs4") != 0)
>>             /* We ignore the error from do_nfs_umount23.
>>              * If the actual umount succeeds (in del_mtab),
>>              * we don't want to signal an error, as that
>>              * could cause /sbin/mount to retry!
>>              */
>>             do_nfs_umount23(mc->m.mnt_fsname, mc->m.mnt_opts);
>>         ret = del_mtab(mc->m.mnt_fsname, mc->m.mnt_dir);
>>     } else if (*spec != '/') {
>>         if (!lazy)
>>             ret = do_nfs_umount23(spec, "tcp,v3");
>>     } else
>>         ret = del_mtab(NULL, spec);
>>     return ret;
>>
>> Removing the first "!lazy &&" resolved my immediate problem.

Can you explain further what your "immediate problem" was?

>> What I
>> don't understand is why it, and the latter "!lazy", were there in
>> the first place.
> To quote from the umount(8) man page:
>
>     -l  Lazy unmount. Detach the filesystem from the filesystem
>         hierarchy now, and cleanup all references to the filesystem
>         as soon as it is not busy anymore. (Requires kernel 2.4.11
>         or later.)
>
> Which means, in an NFS context, that no RPC calls (i.e. calls to the
> server) can be made that could possibly hang.
>
>> The do_nfs_umount23() routine seems to be relatively
>> harmless, it either sends the message or it doesn't, but either way
>> it doesn't seem to do anything to the mount information on the local
>> node.
>> So why is it not run if "-l" (lazy) is set?
> Take a closer look... nfs_call_umount() makes a clnt_call(), which
> would hang if the server is down, which is the reason it's not called.
>
> I hope this helps....

To clarify that, "-l" is used in cases where there are still
outstanding NFS RPCs in the kernel for that mount point, but they
can't complete because the server isn't available. Thus the mount
point appears busy, and won't umount. "umount -l" is used here to
prevent the system shutdown from hanging.

When there may be outstanding NFS requests, the client really
shouldn't send the MNT_UMNT. In general, though, MNT_UMNT is entirely
advisory, and the client shouldn't wait for it to complete.

The server maintains a list of clients that have mounted each export,
but that list isn't used for anything. So, it shouldn't matter
whether the MNT_UMNT is sent.

One of the changes I made in my IPv6 patches is to shorten the timeout
on that clnt_call(). We could even go so far as to make MNT_UMNT
entirely asynchronous and not wait for the reply.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
* Re: nfs-utils, umount -l, and unmount requests
@ 2008-11-21 16:58 David Mathog
[not found] ` <E1L3ZKo-0005DO-74-Mlb+9xX7RBA8OKIjkCRLjtKIQNXEaThN@public.gmane.org>
0 siblings, 1 reply; 5+ messages in thread
From: David Mathog @ 2008-11-21 16:58 UTC (permalink / raw)
To: Chuck Lever, Steve Dickson, David Mathog, Linux NFS Mailing List
> >> Removing the first "!lazy &&" resolved my immediate problem.
>
> Can you explain further what your "immediate problem" was?
It was the typical hall of mirrors:
1. upgrade NFS clients Mandriva 2007.1 -> Mandriva 2008.1
2. change in log files on server when client reboots - no "unmount
request" messages logged. Otherwise client shutdown seemed normal.
3. tracked that to the /etc/rc.d/init.d/netfs script, which came down to
a "umount -f -l", where the behavior of -l had changed between releases.
4. tracked the -l behavior through a change in umount (actually two)
such that in the current version the umount.nfs function is entirely
dependent on the external program, whereas in the former version it
would do it itself if that external program was not found.
5. tracked the -l behavior to the !lazy line in nfsumount.c which was
mentioned in the first post in this thread.
6. removing the "!lazy &&" restored the previous logging behavior by
the client on shutdown.
> To clarify that, "-l" is used in cases where there are still
> outstanding NFS RPCs in the kernel for that mount point, but they
> can't complete because the server isn't available. Thus the mount
> point appears busy, and won't umount. "umount -l" is used here to
> prevent the system shutdown from hanging.
>
> When there may be outstanding NFS requests, the client really
> shouldn't send the MNT_UMNT. In general, though, MNT_UMNT is entirely
> advisory, and the client shouldn't wait for it to complete.
That's what I'm getting at. But rather than trying to send it, and
doing so in such a way that it won't lock up if the server is down, it
sends nothing. This changes the behavior of a normal shutdown as seen
from the server side.
> One of the changes I made in my IPv6 patches is to shorten the timeout
> on that clnt_call(). We could even go so far as to make MNT_UMNT
> entirely asynchronous and not wait for the reply.
That sounds good to me. If the client makes a "reasonable effort" to
send the "unmount request" to the server, so that under normal working
conditions the server will be notified, that should be sufficient.
One thing that bothers me though are these two situations:
dd if=/dev/zero of=/mountpoint/datafile count=1000000 bs=512
umount -f -l /mountpoint
and
dd if=/dev/zero of=/mountpoint/datafile count=1000000 bs=512 &
umount -f -l /mountpoint
What happens to the data being written to datafile, does it make it to
the server, assuming the server is up and working normally? The client
mounted /mountpoint with bg,hard,intr,rw.
Thanks,
David Mathog
mathog-7GExONQZ6ZKVc3sceRu5cw@public.gmane.org
Manager, Sequence Analysis Facility, Biology Division, Caltech
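
On the "reasonable effort" point in the message above: one generic way to bound a call that might block is to run it in a child process with a deadline. The sketch below is not how nfs-utils does it, and it is not the shortened clnt_call() timeout or the asynchronous MNT_UMNT that Chuck mentions. `try_with_deadline`, `notify_ok`, and `notify_hangs` are hypothetical names, and the two `notify_*` functions are stand-ins for a real notification call.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-ins for a real notification call (hypothetical). */
int notify_ok(void)    { return 0; }		/* completes at once */
int notify_hangs(void) { for (;;) pause(); }	/* never completes */

/*
 * Run fn() in a child process and give it at most `seconds` (> 0) to
 * finish. Returns 0 if fn() returned 0 in time, and -1 on failure,
 * timeout, or fork error.
 */
int try_with_deadline(int (*fn)(void), unsigned int seconds)
{
	int status;
	pid_t r, pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* Child: SIGALRM's default action terminates the process. */
		alarm(seconds);
		_exit(fn() == 0 ? 0 : 1);
	}

	/* Parent: wait for the child, retrying if a signal interrupts us. */
	do {
		r = waitpid(pid, &status, 0);
	} while (r < 0 && errno == EINTR);
	if (r < 0)
		return -1;

	return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

A caller would ignore a -1 result and carry on with shutdown, which fits the advisory nature of the unmount request. Note that this only helps for calls that block in user space or in an interruptible wait; a process stuck in uninterruptible I/O on a hard mount would not be stopped by the alarm.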
* Re: nfs-utils, umount -l, and unmount requests
[not found] ` <E1L3ZKo-0005DO-74-Mlb+9xX7RBA8OKIjkCRLjtKIQNXEaThN@public.gmane.org>
@ 2008-11-21 17:43 ` Chuck Lever
0 siblings, 0 replies; 5+ messages in thread
From: Chuck Lever @ 2008-11-21 17:43 UTC (permalink / raw)
To: David Mathog; +Cc: Steve Dickson, Linux NFS Mailing List

On Nov 21, 2008, at 11:58 AM, David Mathog wrote:
>>>> Removing the first "!lazy &&" resolved my immediate problem.
>>
>> Can you explain further what your "immediate problem" was?
>
> It was the typical hall of mirrors:
>
> 1. upgrade NFS clients Mandriva 2007.1 -> Mandriva 2008.1
> 2. change in log files on server when client reboots - no "unmount
> request" messages logged. Otherwise client shutdown seemed normal.

I'm trying to understand why that is a problem. The only issue here
is that "showmount" against the server will display clients that may
be no longer mounting its exports. Otherwise, operationally there is
really no difference. MNT_UMNT is advisory. It is not required, and
proper server operation does not depend on it.

> 3. tracked that to the /etc/rc.d/init.d/netfs script, which came
> down to a "umount -f -l", where the behavior of -l had changed
> between releases.
> 4. tracked the -l behavior through a change in umount (actually two)
> such that in the current version the umount.nfs function is entirely
> dependent on the external program, whereas in the former version it
> would do it itself if that external program was not found.
> 5. tracked the -l behavior to the !lazy line in nfsumount.c which was
> mentioned in the first post in this thread.
> 6. removing the "!lazy &&" restored the previous logging behavior by
> the client on shutdown.
>
>> To clarify that, "-l" is used in cases where there are still
>> outstanding NFS RPCs in the kernel for that mount point, but they
>> can't complete because the server isn't available. Thus the mount
>> point appears busy, and won't umount. "umount -l" is used here to
>> prevent the system shutdown from hanging.
>>
>> When there may be outstanding NFS requests, the client really
>> shouldn't send the MNT_UMNT. In general, though, MNT_UMNT is
>> entirely advisory, and the client shouldn't wait for it to complete.
>
> That's what I'm getting at. But rather than trying to send it, and
> doing so in such a way that it won't lock up if the server is down, it
> sends nothing. This changes the behavior of a normal shutdown as seen
> from the server side.

The client shouldn't send MNT_UMNT if it thinks the local file system
wasn't actually unmounted. With "umount -l" the kernel doesn't finish
unmounting until all outstanding requests are complete. Since that's
all in the background with "-l", the umount command exits long before
that, so it can't know whether to send MNT_UMNT or not.

In the normal case (i.e. "-l" was not specified) the kernel unmounts
the NFS file system, then the system call returns and reports success
or failure. The umount command can then assess whether to send the
MNT_UMNT call.

>> One of the changes I made in my IPv6 patches is to shorten the
>> timeout on that clnt_call(). We could even go so far as to make
>> MNT_UMNT entirely asynchronous and not wait for the reply.
>
> That sounds good to me. If the client makes a "reasonable effort" to
> send the "unmount request" to the server, so that under normal working
> conditions the server will be notified, that should be sufficient.

Except when the umount.nfs command can't tell if the kernel has
actually unmounted the file system or not.

> One thing that bothers me though are these two situations:
>
> dd if=/dev/zero of=/mountpoint/datafile count=1000000 bs=512
> umount -f -l /mountpoint
>
> and
>
> dd if=/dev/zero of=/mountpoint/datafile count=1000000 bs=512 &
> umount -f -l /mountpoint
>
> What happens to the data being written to datafile, does it make it to
> the server, assuming the server is up and working normally? The
> client mounted /mountpoint with bg,hard,intr,rw.

I'm not exactly sure how this would work, especially with the presence
of "-f". The exact behavior of these flags has changed over time.

But the client should wait for all outstanding I/O to complete and for
applications to close all files before the file system is actually
unmounted. "Hard" will cause the client to retry these requests
forever.

It may even be the case that the kernel removes that file system from
the system's file namespace to prevent other applications from opening
files there. This is why "umount -l" is probably reasonable during
shutdown, but perhaps not for normal operation. During shutdown, any
applications left running will be killed by the shutdown script anyway.

My opinion though is that the shutdown script should try a normal
umount first, then if it retries, use "umount -l" as a last resort so
the rest of shutdown processing can continue in an orderly fashion.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
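
The ordering suggested at the end of the message above (normal unmount first, lazy unmount only as a last resort) can be sketched at the system-call level with Linux's umount2(2). This is a minimal illustration with assumptions of its own: `unmount_with_fallback` is a hypothetical name, and falling back only on EBUSY is a choice made here, not something stated in the thread. umount2() talks to the kernel only; it does not run umount.nfs, send MNT_UMNT, or update /etc/mtab.

```c
#include <errno.h>
#include <sys/mount.h>

/*
 * Try a normal unmount. If the kernel reports the mount point as busy,
 * fall back to a lazy detach (MNT_DETACH, the flag behind "umount -l").
 * Returns 0 on success, or -1 with errno set by umount2().
 */
int unmount_with_fallback(const char *target)
{
	if (umount2(target, 0) == 0)
		return 0;

	/* Assumption: only a busy mount justifies the lazy fallback. */
	if (errno != EBUSY)
		return -1;

	return umount2(target, MNT_DETACH);
}
```

A shutdown script written in shell would express the same idea as a plain umount followed, on failure, by umount -l. Because the normal path is tried first, the caller learns whether the file system was really unmounted in the common case, which is the information the thread says is missing when "-l" is used unconditionally.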