* bug in linux mount? (says NetApp)
@ 2006-07-11 19:00 Gregory Baker
  2006-07-11 20:21 ` Chuck Lever
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Gregory Baker @ 2006-07-11 19:00 UTC (permalink / raw)
  To: nfs; +Cc: autofs


We have thousands of linux clients hitting netapp file servers (many 
3500 series, clustered) on a local gigabit LAN.  From time to time, 
applications return "file not found" when attempting to automount a 
directory and access a file.  An example of this is a long running 
process, which reads in data, processes it for hours (in which time the 
filesystem is unmounted) then tries to read more data from that mount 
point (which causes a "file not found" error in the application).  This 
occurs about 1/100th of the time.

Researching at Netapp turns up this bit by Chuck Lever (Linux NFS 
contributor)

"Using the Linux NFS Client with Network Appliance Filers"
http://www.netapp.com/library/tr/3183.pdf  (February 2006)

page 10 says...

"Due to a bug in the mount command, the default retransmission timeout 
value on Linux for NFS over TCP is quite small...To obtain standard 
behavior, we strongly recommend using "timeo=600, retrans=2" explicitly 
when mounting via TCP."

Our defaults (assuming man pages are correct, RedHat Enterprise Linux 3) 
would be timeo=7, retrans=3, which translates to 7+14+28+56 = 105 tenths 
of a second (10.5 seconds).  It appears netapp is suggesting waiting 
600+600 = 1200 tenths (120 seconds) before giving up on the mount command...
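
The two sums above can be reproduced with a short sketch. It only mirrors the arithmetic as written in this message: the number of attempts and the doubling factor are taken from the sums themselves, not from documented kernel behavior, and the function name `wait_series` is made up for illustration.

```python
def wait_series(timeo, attempts, factor):
    """Return the per-attempt waits, in tenths of a second.

    timeo    -- first wait, in tenths of a second (the 'timeo' option)
    attempts -- how many waits are summed (an assumption read off the message)
    factor   -- multiplier applied after each attempt (2 = doubling, 1 = constant)
    """
    waits = []
    wait = timeo
    for _ in range(attempts):
        waits.append(wait)
        wait *= factor
    return waits


# Defaults as read from the man page: 7 + 14 + 28 + 56
default_waits = wait_series(7, attempts=4, factor=2)
# Suggested values, summed the way the message sums them: 600 + 600
suggested_waits = wait_series(600, attempts=2, factor=1)

for label, waits in (("default", default_waits), ("suggested", suggested_waits)):
    tenths = sum(waits)
    print(f"{label}: {waits} = {tenths} tenths = {tenths / 10} seconds")
```

The unit is tenths of a second throughout, so the totals are divided by 10 to get seconds.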

* What "bug" in the mount command do you believe NetApp is talking about?

* What do you think proper options for NFS auto/mounts would be for 
extremely busy centralized NFS filers?

* What is the reference standard behavior?

Thanks,

--Greg

-- 
----------------------------------------------------------------------
Greg Baker                                         512-602-3287 (work)
gregory.baker@amd.com                              512-602-6970 (fax)
5900 E. Ben White Blvd MS 626                      512-555-1212 (info)
Austin, TX 78741





-------------------------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

* Re: bug in linux mount? (says NetApp)
@ 2006-07-12  2:13 ` Gomez, Daniel
  0 siblings, 0 replies; 16+ messages in thread
From: Gomez, Daniel @ 2006-07-12  2:13 UTC (permalink / raw)
  To: autofs, nfs

Hello,

>> * What do you think proper options for NFS auto/mounts would be for
>> extremely busy centralized NFS filers?
>
>Something like
>
>mount -t nfs -ohard,timeo=600,retrans=2,rsize=32768,wsize=32768,tcp foo:/ /bar
>
>should be a fairly safe bet. You might want to add the 'intr' flag too,
>depending on how you feel about the behaviour w.r.t. pressing ^C.


We use the following setup with good results and without loss of
functionality (assuming Linux defaults [e.g., 'hard', 'rw', etc.]):

For read-only filesystems:
	ro,rsize=32768,wsize=32768,timeo=600,actimeo=600,intr,tcp,nfsvers=3

For read-write:
	rsize=32768,wsize=32768,timeo=600,actimeo=120,intr,tcp,nfsvers=3

-Daniel



* Re: bug in linux mount? (says NetApp)
@ 2006-07-12 20:23 ` Murata, Dennis W (SAIC)
  0 siblings, 0 replies; 16+ messages in thread
From: Murata, Dennis W (SAIC) @ 2006-07-12 20:23 UTC (permalink / raw)
  To: Trond Myklebust, gregory.baker; +Cc: autofs, nfs

I am seeing something very similar to the problem Greg has stated.  We
are using udp rather than tcp as the transport protocol.  Should we be
using tcp rather than udp?  That seems to be the recommendation.  I am
testing a configuration with tcp, using the following arguments:

	DAEMONOPTIONS="--timeout=60 rsize=32768,wsize=32768,tcp,timeo=600,retrans=2,bg"

We are using automount for all the nfs directories, nothing is listed in
the /etc/fstab.  The nis maps are legacy from Solaris, and we still use
Solaris NIS servers.  I am a little reluctant to modify the maps
themselves if I don't have to.  Will this work using the DAEMONOPTIONS
in /etc/sysconfig/autofs?  From the mount command I see:

nfsserver:/vol/vol1/home/foo on /home/foo type nfs (rw,nosuid,rsize=32768,wsize=32768,tcp,timeo=600,retrans=2,bg,intr,retry=1000,vers=3,addr=XXX.XXX.XXX.XXX)

The entry from /proc/mounts does not list the values for timeo or
retrans:

nfsserver:/vol/vol1/home/foo /home/foo nfs rw,nosuid,v3,rsize=32768,wsize=32768,hard,intr,tcp,lock,addr=nfsserver 0 0

Is this normal?
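
One way to check which options a /proc/mounts entry actually reports is to split its option field and look for the names of interest. The following is a minimal sketch that uses the entry quoted above as sample input; the helper name `parse_mount_options` is hypothetical, and the sketch only reports what the line contains, it does not say whether the omission is expected.

```python
def parse_mount_options(proc_mounts_line):
    """Split one /proc/mounts line and return its options as a dict.

    Fields are: device, mount point, fs type, options, dump, pass.
    Flags without a value (e.g. 'hard') map to None.
    """
    fields = proc_mounts_line.split()
    options = {}
    for item in fields[3].split(","):
        key, _, value = item.partition("=")
        options[key] = value or None
    return options


# Sample line copied from the message above.
sample = (
    "nfsserver:/vol/vol1/home/foo /home/foo nfs "
    "rw,nosuid,v3,rsize=32768,wsize=32768,hard,intr,tcp,lock,addr=nfsserver 0 0"
)

opts = parse_mount_options(sample)
missing = [name for name in ("timeo", "retrans") if name not in opts]
print("reported options:", sorted(opts))
print("not reported:", missing)
```

To inspect a live system, the same function could be applied to each line read from /proc/mounts instead of the sample string.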

Wayne Murata

-----Original Message-----
From: nfs-bounces@lists.sourceforge.net
[mailto:nfs-bounces@lists.sourceforge.net]On Behalf Of Trond Myklebust
Sent: Tuesday, July 11, 2006 6:28 PM
To: gregory.baker@amd.com
Cc: autofs@linux.kernel.org; nfs@lists.sourceforge.net
Subject: Re: [NFS] bug in linux mount? (says NetApp)


On Tue, 2006-07-11 at 14:00 -0500, Gregory Baker wrote:
> We have thousands of linux clients hitting netapp file servers (many
> 3500 series, clustered) on a local gigabit LAN.  From time to time,
> applications return "file not found" when attempting to automount a
> directory and access a file.  An example of this is a long running
> process, which reads in data, processes it for hours (in which time the
> filesystem is unmounted) then tries to read more data from that mount
> point (which causes a "file not found" error in the application).  This
> occurs about 1/100th of the time.
>
> Researching at Netapp turns up this bit by Chuck Lever (Linux NFS
> contributor)
>
> "Using the Linux NFS Client with Network Appliance Filers"
> http://www.netapp.com/library/tr/3183.pdf  (February 2006)
>
> page 10 says...
>
> "Due to a bug in the mount command, the default retransmission timeout
> value on Linux for NFS over TCP is quite small...To obtain standard
> behavior, we strongly recommend using "timeo=600, retrans=2" explicitly
> when mounting via TCP."
>
> Our defaults (assuming man pages are correct, RedHat Enterprise Linux 3)
> would be timeo=7, retrans=3, which translates to 7+14+28+56 = 105 tenths
> of a second (10.5 seconds).  It appears netapp is suggesting waiting
> 600+600 = 1200 tenths (120 seconds) before giving up on the mount
> command...

No they are not. See below.

> * What "bug" in the mount command do you believe NetApp is talking about?

It has nothing to do with the mount timeout: Chuck is talking about the
retransmission timeout for TCP connections 'timeo' which should indeed
be set to a high value since TCP guarantees message delivery (unlike UDP
which requires a small timeo value). Setting it too low means that you
end up spamming your server with a load of unnecessary retransmissions.

This was indeed the case for some older versions of 'mount' and also for
older versions of the am-utils/amd automounters.

> * What do you think proper options for NFS auto/mounts would be for 
> extremely busy centralized NFS filers?

Something like

mount -t nfs -ohard,timeo=600,retrans=2,rsize=32768,wsize=32768,tcp foo:/ /bar

should be a fairly safe bet. You might want to add the 'intr' flag too,
depending on how you feel about the behaviour w.r.t. pressing ^C.
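
For anyone generating such mounts from a script, the option list above can be assembled programmatically. This is only a sketch under stated assumptions: `build_mount_argv` is a hypothetical helper, `foo:/` and `/bar` are the placeholder names from the command above, and the code prints the command rather than running it.

```python
def build_mount_argv(export, mount_point, options):
    """Build an argv list for mount(8) from a dict of NFS options.

    Options whose value is None are emitted as bare flags (e.g. 'hard');
    all others are emitted as key=value.
    """
    opt_string = ",".join(
        key if value is None else f"{key}={value}"
        for key, value in options.items()
    )
    return ["mount", "-t", "nfs", "-o", opt_string, export, mount_point]


# Options as suggested above; dict order is preserved in Python 3.7+.
options = {
    "hard": None,
    "timeo": 600,
    "retrans": 2,
    "rsize": 32768,
    "wsize": 32768,
    "tcp": None,
}
# Uncomment to add the optional flag mentioned above:
# options["intr"] = None

argv = build_mount_argv("foo:/", "/bar", options)
print(" ".join(argv))
```

Keeping the options in a dict makes it easy to vary one value, such as adding 'intr', without retyping the whole string.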

> * What is the reference standard behavior?

To which reference are you referring?

Cheers,
  Trond





end of thread, other threads:[~2006-07-14 20:36 UTC | newest]

Thread overview: 16+ messages
2006-07-11 19:00 bug in linux mount? (says NetApp) Gregory Baker
2006-07-11 20:21 ` Chuck Lever
2006-07-14 20:36   ` Gregory Baker
2006-07-11 23:27 ` [NFS] " Trond Myklebust
2006-07-11 23:34   ` Gregory Baker
2006-07-12  3:03   ` [autofs] " Ian Kent
2006-07-12 12:19     ` Trond Myklebust
2006-07-12  9:32   ` James Pearson
2006-07-12  0:40 ` Blake Golliher
2006-07-12  1:07   ` Gregory Baker
  -- strict thread matches above, loose matches on Subject: below --
2006-07-12  2:13 Gomez, Daniel
2006-07-12  2:13 ` Gomez, Daniel
2006-07-12 20:23 Murata, Dennis W (SAIC)
2006-07-12 20:23 ` Murata, Dennis W (SAIC)
2006-07-13 12:55 ` Ian Kent
2006-07-13 13:37 ` Trond Myklebust
