From: "Gregory Baker" <gregory.baker@amd.com>
To: Chuck Lever <chucklever@gmail.com>
Cc: autofs@linux.kernel.org, nfs@lists.sourceforge.net
Subject: Re: bug in linux mount? (says NetApp)
Date: Fri, 14 Jul 2006 15:36:59 -0500
Message-ID: <44B8006B.4090904@amd.com>
In-Reply-To: <76bd70e30607111321m2d35fe5etc3475ac65efa9f0@mail.gmail.com>
Ahh... I should have expanded "linux clients" to "linux clients running
RHEL 3 U5".
[greg@apathy greg]$ rpm -qa util-linux
util-linux-2.11y-31.6
Red Hat support has this to say...
[...snip...]
"I have been looking into this issue and I have found other people are
experiencing similar behavior. I also found a fix that was added to the
util-linux package that I think addresses this issue...... I believe
this is what Chuck refers to with his comment "and I believe later
releases of RHEL 3 were fixed to do this"
From the upstream package change log:
"RHEL3 util-linux >=2.11y-31.8 should make the default 70s (instead of
7s) for TCP mounts:
* Wed Jun 8 2005 Steve Dickson <SteveD@RedHat.com> 2.11y-31.8
- Changed nfsmount to retry calls to mountd in foreground as
well as in background (bz# 138775)
- Increased TCP timeouts to 70 secs (bz# 151097)"
I am pretty sure this will fix the problem that you are seeing. The
util-linux package in the Red Hat Enterprise Linux AS (v. 3 for x86)
Beta channel on RHN is version util-linux-2.11y-31.16.i386.rpm, which
should have this fix in it."
[...snip...]
The bug/errata
http://rhn.redhat.com/errata/RHBA-2005-626.html
became available in RHEL3 U6. Sigh.
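
For anyone wanting to script the same check across many clients, here is a minimal sketch in Python. It only compares the release field ("31.6" against "31.8") and assumes the version field ("2.11y") is the same on both sides; it is not rpm's real version comparison algorithm, and the function names are made up for illustration.

```python
import re
from typing import Tuple

# Release in which the changelog quoted above lists the TCP timeout fix.
FIXED_RELEASE = (31, 8)


def release_tuple(package: str) -> Tuple[int, ...]:
    """Extract the numeric release, e.g. 'util-linux-2.11y-31.6' -> (31, 6)."""
    # Drop an optional '.<arch>.rpm' suffix such as '.i386.rpm'.
    stem = re.sub(r"\.[A-Za-z0-9_]+\.rpm$", "", package)
    # The release is whatever follows the last '-' in name-version-release.
    release = stem.rsplit("-", 1)[1]
    return tuple(int(part) for part in release.split("."))


def has_timeout_fix(package: str) -> bool:
    """True if the release is at or past 31.8 (simplified comparison)."""
    return release_tuple(package) >= FIXED_RELEASE


if __name__ == "__main__":
    for pkg in ("util-linux-2.11y-31.6", "util-linux-2.11y-31.16.i386.rpm"):
        print(pkg, "->", "fixed" if has_timeout_fix(pkg) else "not fixed")
```

Comparing the fields as integers matters here: a plain string comparison would sort "31.16" before "31.8".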
We skipped U3 and U4 (autofs woes) and U6 (we had just finished upgrading
from U2 to U5 and were dealing with the fallout), and recently began using
U7 (to support Sun x4100 SAS drives).
Thanks,
--Greg
Chuck Lever wrote:
> On 7/11/06, Gregory Baker <gregory.baker@amd.com> wrote:
>> We have thousands of linux clients hitting netapp file servers (many
>> 3500 series, clustered) on a local gigabit LAN. From time to time,
>> applications return "file not found" when attempting to automount a
>> directory and access a file. An example of this is a long running
>> process, which reads in data, processes it for hours (in which time the
>> filesystem is unmounted) then tries to read more data from that mount
>> point (which causes a "file not found" error in the application). This
>> occurs about 1/100th of the time.
>>
>> Researching at NetApp turns up this bit by Chuck Lever (Linux NFS
>> contributor)
>>
>> "Using the Linux NFS Client with Network Appliance Filers"
>> http://www.netapp.com/library/tr/3183.pdf (February 2006)
>>
>> page 10 says...
>>
>> "Due to a bug in the mount command, the default retransmission timeout
>> value on Linux for NFS over TCP is quite small...To obtain standard
>> behavior, we strongly recommend using "timeo=600, retrans=2" explicitly
>> when mounting via TCP."
>>
>> Our defaults (assuming man pages are correct, RedHat Enterprise Linux 3)
>> would be timeo=7, retrans=3, which translates to 7+14+28+56 = 105 tenths
>> of a second (about 10.5 seconds). It appears NetApp is suggesting waiting
>> 600+600 = 1200 tenths (120 seconds) before giving up on the mount
>> command...
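
The first sum in the paragraph above (timeo=7, retrans=3) can be reproduced with a short Python sketch. It assumes, as that sum implies, that the wait doubles after each retransmission and that timeo is given in tenths of a second; whether a particular client really backs off this way is not something the sketch establishes, and the function name is hypothetical.

```python
def total_wait_tenths(timeo: int, retrans: int) -> int:
    """Sum the initial timeout plus `retrans` retransmissions.

    Assumption: the wait doubles after every retransmission, matching the
    7+14+28+56 sum in the quoted message. Units are tenths of a second.
    """
    return sum(timeo * 2 ** attempt for attempt in range(retrans + 1))


if __name__ == "__main__":
    tenths = total_wait_tenths(timeo=7, retrans=3)
    # 7 + 14 + 28 + 56 tenths of a second
    print(f"{tenths} tenths of a second = {tenths / 10:.1f} s")
```

As the reply below explains, this figure describes RPC retransmission after a successful mount, not the timeout of the mount operation itself.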
>
> It's important to distinguish two different types of timeouts.
>
> 1. The mount operation has timed out.
>
> 2. After the mount operation succeeds, an NFS RPC operation has timed out.
>
> TR-3183 discusses the proper settings for 2, but you are experiencing 1.
>
> The automounter attempts to mount one of the filer's exports, but the
> mount request times out causing the mounted-on directory to be
> exposed. Your filer is heavily loaded, and the filer's mountd is
> single-threaded. The filer may also be experiencing delays when
> requesting information from external servers (like DNS or NIS), in
> which case the mount request is held up at the filer.
>
> Both sides are at fault: the Linux mount command should retry (and I
> believe later releases of RHEL 3 were fixed to do this) and the filer
> configuration should be reviewed to make sure there are no avoidable
> delays while processing mount requests.
>
>> * What "bug" in the mount command do you believe NetApp is talking about?
>
> The bug is that the mount command overrides the proper default RPC
> timeout value with a timeout value of 0.7 seconds. This is *not* the
> timeout for mount operations, it is the timeout for the in-kernel NFS
> client to retransmit RPC requests.
>
>> * What do you think proper options for NFS auto/mounts would be for
>> extremely busy centralized NFS filers?
>
> If you are using NFS over TCP, the proper timeout value is 60 seconds.
>
>> * What is the reference standard behavior?
>
> Solaris, which is the NFSv3 reference implementation, uses effectively
> a 60 second timeout on TCP mounts.
>
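
Since the answers above put the proper TCP timeout at 60 seconds, which is timeo=600 in the tenths-of-a-second units used by the mount options, a small checker for option strings might look like the sketch below. The 600 threshold comes from this thread; the function names are hypothetical, and a missing timeo is left unflagged because the sketch cannot tell which default the mount command applied.

```python
from typing import Dict, Optional

RECOMMENDED_TCP_TIMEO = 600  # tenths of a second, i.e. 60 s (from this thread)


def parse_options(options: str) -> Dict[str, Optional[str]]:
    """Split 'rw,proto=tcp,timeo=7' into {'rw': None, 'proto': 'tcp', 'timeo': '7'}."""
    parsed: Dict[str, Optional[str]] = {}
    for item in options.split(","):
        key, _, value = item.strip().partition("=")
        if key:
            parsed[key] = value or None
    return parsed


def explicit_timeo_too_small(options: str) -> bool:
    """True if a TCP mount sets timeo explicitly below the recommended value."""
    opts = parse_options(options)
    uses_tcp = opts.get("proto") == "tcp" or "tcp" in opts
    timeo = opts.get("timeo")
    if not uses_tcp or timeo is None:
        return False  # not TCP, or no explicit timeo to judge
    return int(timeo) < RECOMMENDED_TCP_TIMEO


if __name__ == "__main__":
    for line in ("rw,proto=tcp,timeo=7,retrans=3", "rw,tcp,timeo=600,retrans=2"):
        verdict = "too small" if explicit_timeo_too_small(line) else "ok"
        print(f"{line}: {verdict}")
```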
--
----------------------------------------------------------------------
Greg Baker 512-602-3287 (work)
gregory.baker@amd.com 512-602-6970 (fax)
5900 E. Ben White Blvd MS 626 512-555-1212 (info)
Austin, TX 78741