* RE: df hangs on down nfs server mounted with hard,intr,can't kill
@ 2004-03-15 22:57 Lever, Charles
  2004-03-16  2:26 ` Steve Dickson
  0 siblings, 1 reply; 14+ messages in thread
From: Lever, Charles @ 2004-03-15 22:57 UTC (permalink / raw)
  To: Steve Dickson; +Cc: nfs

hi steve-

downloaded your patches from 30 dec '03.

zeroing the RPC counters is a little outside the scope of my
patch, as i'm providing a whole new interface that is outside
the /proc/net/rpc/* hierarchy.  this is work that might/should
go in separately from mine?  (my patch will allow zeroing the
stats that i'm providing, which are different than the ones
already provided in /proc/net/rpc/nfs).

but one question:  why do you use a hex value for this interface?
i thought /proc was supposed to be human-readable.  why not use
"z" "zc" "zs" just like your nfsstat arguments so that one can
just as easily do:

  echo "zs" > /proc/net/rpc/nfs

as well as

  nfsstat -zs
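
[Illustrative sketch, not part of the original message.] To make the string-based suggestion above concrete, here is a minimal Python sketch of how a write handler might map "z", "zc" and "zs" onto a client/server bitmask. The function name and the bit values are hypothetical; they are not taken from Steve's patch or from nfsstat.

```python
# Hypothetical bit values; the real patch may use different ones.
ZERO_CLIENT = 0x1
ZERO_SERVER = 0x2


def parse_zero_command(text):
    """Map "z", "zc", "zs" (or "zcs") to a bitmask of counters to zero."""
    cmd = text.strip()  # echo appends a trailing newline
    if not cmd.startswith("z"):
        raise ValueError("unknown command: %r" % cmd)
    selectors = cmd[1:]
    if not selectors:
        # A bare "z" is assumed to mean "zero both client and server".
        return ZERO_CLIENT | ZERO_SERVER
    mask = 0
    for ch in selectors:
        if ch == "c":
            mask |= ZERO_CLIENT
        elif ch == "s":
            mask |= ZERO_SERVER
        else:
            raise ValueError("unknown selector: %r" % ch)
    return mask
```

With this shape, the readable strings stay at the user-facing edge and a bitmask, if one is wanted, remains an internal detail.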

> -----Original Message-----
> From: Steve Dickson [mailto:SteveD@redhat.com]
> Sent: Monday, March 15, 2004 1:16 PM
> To: Trond Myklebust
> Cc: Yusuf Goolamabbas; nfs@lists.sourceforge.net
> Subject: Re: [NFS] df hangs on down nfs server mounted with
> hard,intr, can't kill
>
>
> Trond Myklebust wrote:
>
> >On Mon, 15/03/2004 at 00:21, Yusuf Goolamabbas wrote:
> >
> >>Also, what's the status with the nfszerostats patch from Steve Dickson
> >>of Redhat
> >>
> >>http://people.redhat.com/steved/NFS/nfszerostats/
> >>
> >
> >Chuck is currently working on some enhanced NFS statistics patches. I
> >imagine he will take steps to include Steve's patches in his work, and
> >then push them to me.
> >
> FYI... The nfs-utils code to support nfszerostats is already
> in the FC2 rawhides... I'm hopeful Neil will include this
> patch in the next nfs-utils release....
>
> SteveD.
>


-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

* RE: df hangs on down nfs server mounted with hard,intr, can't kill
@ 2004-03-09 19:51 Lever, Charles
  2004-03-09 20:08 ` Wade Hampton
  2004-03-10  8:09 ` Olaf Kirch
  0 siblings, 2 replies; 14+ messages in thread
From: Lever, Charles @ 2004-03-09 19:51 UTC (permalink / raw)
  To: Wade Hampton, nfs

hi wade-

the fact that "intr" doesn't work as expected is a bug, and
folks are attempting to address this at least partially in 2.6.

if you want a way to do a "df" without hanging your client,
try using a soft mount with a short-ish timeout for your
df requests.

caveat:  read the Linux NFS FAQ for more on using "soft" safely.
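
[Illustrative sketch, not part of the original message.] The soft-mount suggestion above changes the mount itself. A different, client-side workaround for monitoring scripts is to run the statvfs call in a throwaway helper process and stop waiting for it after a timeout. The sketch below assumes Python 3 on the client; the name `statvfs_with_timeout` and the 5-second default are hypothetical. It does not un-hang the helper (a process stuck in an uninterruptible NFS call may linger), it only keeps the caller responsive.

```python
import json
import subprocess
import sys

# Code run by the helper process: call statvfs and print sizes as JSON.
_CHILD = (
    "import json, os, sys\n"
    "s = os.statvfs(sys.argv[1])\n"
    "print(json.dumps({'total': s.f_blocks * s.f_frsize,"
    " 'free': s.f_bavail * s.f_frsize}))\n"
)


def statvfs_with_timeout(path, timeout=5.0):
    """Return {'total': bytes, 'free': bytes}, or None on timeout or error."""
    proc = subprocess.Popen(
        [sys.executable, "-c", _CHILD, path],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
    )
    try:
        out, _ = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Send SIGKILL but deliberately do NOT wait for the helper:
        # if it is blocked inside the kernel it may never exit, and
        # waiting would hang the caller too.
        proc.kill()
        return None
    if proc.returncode != 0:
        return None  # e.g. the path does not exist
    return json.loads(out)
```

A monitoring script could call `statvfs_with_timeout("/mnt/remote")` (a made-up mount point) once per poll and treat `None` as "server unreachable or slow". Local partitions are checked in separate helpers, so one dead server does not block their results.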

> -----Original Message-----
> From: Wade Hampton [mailto:wade.hampton@nsc1.net]
> Sent: Tuesday, March 09, 2004 2:33 PM
> To: nfs@lists.sourceforge.net
> Subject: [NFS] df hangs on down nfs server mounted with
> hard,intr, can't kill
>
>
> [I posted this to the Fedora list yesterday.]
>
> I have a Fedora server with kernel 2.4.22-1-2163 SMP mounting
> a remote solaris server (hence choice of options):
>
>   rsize=32768,ro,hard,intr,tcp,nfsvers=3
>
> When the remote is down or disconnected, a "df" hangs (as
> expected), but I can't kill it, even as root or with kill -9.
> The docs for mount indicate that the INTR option should allow
> for killing apps mounted with HARD. Is this a bug (glibc, 2.4
> kernel, NFS, or Fedora's kernel)?
>
> I also coded a test program that calls statvfs(2) and it
> hangs on the statvfs(2) call when run against a down NFS
> server.  It too can't be interrupted or killed.
>
> My questions are:
>
> 1)  Is there a safe and reliable means to check for a down NFS server
>     (e.g., is showmount -e <server> safe enough?)
>
> 2)  Is the non-interruptable operation (even with INTR option)
>     a bug or feature?
>
> 3)  Is there a simple kernel call, /proc entry, or similar that can
>     be used to reliably check for free/used disk space and for a down
>     host, without hanging my application?
>
>     A showmount -e followed by a statvfs() might work, but
>     there is the possibility of losing the host between the two
>     calls, resulting in an application hang.
>
> 4)  Is there a perl module to accomplish this?
>
> This would be very useful for network monitoring, e.g., when
> the server goes down and stays down for >1 minute, generate
> an SNMP trap and write to a log file.  It would be good if
> you can't put an SNMP agent on the server, but only on the
> client.  It is also useful for writing a highly reliable
> client application.
>
> As I have no control over the remote system, when it went
> down, I had to do a hard reboot of my Linux box to stop the
> hung apps.  This is a Windows solution, not a Linux solution....
>
> Note, I found this when writing some scripts for MRTG to
> check the disk utilization of partitions.  My df's hung so I
> didn't even get the proper values for my local partitions.
> After a few days, I had LOTS of hung MRTG apps and had to
> reboot (this test server is down for a week or two).
>
> Thanks
> --
> Wade Hampton

* df hangs on down nfs server mounted with hard,intr, can't kill
@ 2004-03-09 19:33 Wade Hampton
  2004-03-10  2:40 ` Ian Kent
  0 siblings, 1 reply; 14+ messages in thread
From: Wade Hampton @ 2004-03-09 19:33 UTC (permalink / raw)
  To: nfs

[I posted this to the Fedora list yesterday.]

I have a Fedora server with kernel 2.4.22-1-2163 SMP mounting a
remote solaris server (hence choice of options):

  rsize=32768,ro,hard,intr,tcp,nfsvers=3

When the remote is down or disconnected, a "df" hangs (as expected),
but I can't kill it, even as root or with kill -9.  The docs for mount
indicate that the INTR option should allow for killing apps mounted
with HARD.  Is this a bug (glibc, 2.4 kernel, NFS, or Fedora's kernel)?

I also coded a test program that calls statvfs(2) and it hangs on
the statvfs(2) call when run against a down NFS server.  It too
can't be interrupted or killed.

My questions are:

1)  Is there a safe and reliable means to check for a down NFS server
    (e.g., is showmount -e <server> safe enough?)

2)  Is the non-interruptable operation (even with INTR option)
    a bug or feature?

3)  Is there a simple kernel call, /proc entry, or similar that can
    be used to reliably check for free/used disk space and for a down
    host, without hanging my application?

    A showmount -e followed by a statvfs() might work, but
    there is the possibility of losing the host between the two
    calls, resulting in an application hang.

4)  Is there a perl module to accomplish this?

This would be very useful for network monitoring, e.g., when the
server goes down and stays down for >1 minute, generate an SNMP
trap and write to a log file.  It would be good if you can't put an SNMP
agent on the server, but only on the client.  It is also useful for writing
a highly reliable client application.

As I have no control over the remote system, when it went down,
I had to do a hard reboot of my Linux box to stop the hung apps.  This
is a Windows solution, not a Linux solution....
 
Note, I found this when writing some scripts for MRTG to check
the disk utilization of partitions.  My df's hung so I didn't even get
the proper values for my local partitions.  After a few days, I had
LOTS of hung MRTG apps and had to reboot (this test server is
down for a week or two).

Thanks
-- 
Wade Hampton




end of thread, other threads:[~2004-03-16  2:26 UTC | newest]

Thread overview: 14+ messages
2004-03-15 22:57 df hangs on down nfs server mounted with hard,intr,can't kill Lever, Charles
2004-03-16  2:26 ` Steve Dickson
  -- strict thread matches above, loose matches on Subject: below --
2004-03-09 19:51 df hangs on down nfs server mounted with hard,intr, can't kill Lever, Charles
2004-03-09 20:08 ` Wade Hampton
2004-03-10  8:09 ` Olaf Kirch
2004-03-10 19:18   ` Trond Myklebust
2004-03-11  9:31     ` Olaf Kirch
2004-03-11 19:44       ` Trond Myklebust
2004-03-12  8:54         ` Olaf Kirch
2004-03-15  5:21     ` Yusuf Goolamabbas
2004-03-15 17:03       ` Trond Myklebust
2004-03-15 18:15         ` Steve Dickson
2004-03-09 19:33 Wade Hampton
2004-03-10  2:40 ` Ian Kent
