From: "Gregory Baker" <gregory.baker@amd.com>
To: "Chuck Lever" <chucklever@gmail.com>
Cc: autofs@linux.kernel.org, nfs@lists.sourceforge.net
Subject: Re: bug in linux mount? (says NetApp)
Date: Fri, 14 Jul 2006 15:36:59 -0500
Message-ID: <44B8006B.4090904@amd.com>
In-Reply-To: <76bd70e30607111321m2d35fe5etc3475ac65efa9f0@mail.gmail.com>
Ahh... I should have expanded "linux clients" to "linux clients running
RHEL 3 U5".
[greg@apathy greg]$ rpm -qa util-linux
util-linux-2.11y-31.6
Red Hat support has this to say...
[...snip...]
"I have been looking into this issue and I have found other people are
experiencing similar behavior. I also found a fix that was added to the
util-linux package that I think addresses this issue...... I believe
this is what Chuck refers to with his comment "and I believe later
releases of RHEL 3 were fixed to do this"
From the upstream package change log:
"RHEL3 util-linux >=2.11y-31.8 should make the default 70s (instead of
7s) for TCP mounts:
* Wed Jun 8 2005 Steve Dickson <SteveD@RedHat.com> 2.11y-31.8
- Changed nfsmount to retry calls to mountd in foreground as
well as in background (bz# 138775)
- Increased TCP timeouts to 70 secs (bz# 151097)"
I am pretty sure this will fix the problem that you are seeing. The
util-linux package in the Red Hat Enterprise Linux AS (v. 3 for x86)
Beta channel on RHN is version util-linux-2.11y-31.16.i386.rpm, which
should have this fix in it."
[...snip...]
The bug/errata
http://rhn.redhat.com/errata/RHBA-2005-626.html
became available in RHEL3 U6. Sigh.
We skipped U3, U4 (autofs woes), and U6 (we had just finished upgrading
from U2 to U5 and were dealing with the fallout), and recently began
using U7 (to support Sun x4100 SAS drives).
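
For anyone auditing a large number of clients, the version check above
can be sketched roughly as follows. This is only an illustration: the
helper names are made up, it assumes the same upstream version (2.11y)
and a purely numeric dotted release, and it is not rpm's real
version-comparison algorithm. Comparing the release as integers matters
because "31.16" sorts before "31.8" as a plain string.

```python
def release_tuple(release):
    """Split a dotted RPM release such as '31.6' into a tuple of ints."""
    return tuple(int(part) for part in release.split("."))


def has_timeout_fix(nvr, fixed_release="31.8"):
    """Return True if a 'util-linux-2.11y-<release>' string is at or past
    the release named in the change log (assumed here to be 31.8)."""
    release = nvr.rsplit("-", 1)[1]
    return release_tuple(release) >= release_tuple(fixed_release)


print(has_timeout_fix("util-linux-2.11y-31.6"))   # the version reported above
print(has_timeout_fix("util-linux-2.11y-31.16"))  # the Beta channel version
```

The first call prints False and the second prints True, which matches
what Red Hat support describes.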
Thanks,
--Greg
Chuck Lever wrote:
> On 7/11/06, Gregory Baker <gregory.baker@amd.com> wrote:
>> We have thousands of linux clients hitting netapp file servers (many
>> 3500 series, clustered) on a local gigabit LAN. From time to time,
>> applications return "file not found" when attempting to automount a
>> directory and access a file. An example of this is a long running
>> process, which reads in data, processes it for hours (in which time the
>> filesystem is unmounted) then tries to read more data from that mount
>> point (which causes a "file not found" error in the application). This
>> occurs about 1/100th of the time.
>>
>> Researching at Netapp turns up this bit by Chuck Lever (Linux NFS
>> contributor)
>>
>> "Using the Linux NFS Client with Network Appliance Filers"
>> http://www.netapp.com/library/tr/3183.pdf (February 2006)
>>
>> page 10 says...
>>
>> "Due to a bug in the mount command, the default retransmission timeout
>> value on Linux for NFS over TCP is quite small...To obtain standard
>> behavior, we strongly recommend using "timeo=600, retrans=2" explicitly
>> when mounting via TCP."
>>
>> Our defaults (assuming man pages are correct, RedHat Enterprise Linux 3)
>> would be timeo=7, retrans=3, which translates to 7+14+28+56 = 105 tenths
>> of a second (about 10.5 seconds). It appears netapp is suggesting waiting
>> 600+600 = 1200 tenths (120 seconds) before giving up on the mount
>> command...
>
> It's important to distinguish two different types of timeouts.
>
> 1. The mount operation has timed out.
>
> 2. After the mount operation succeeds, an NFS RPC operation has timed out.
>
> TR-3183 discusses the proper settings for 2, but you are experiencing 1.
>
> The automounter attempts to mount one of the filer's exports, but the
> mount request times out causing the mounted-on directory to be
> exposed. Your filer is heavily loaded, and the filer's mountd is
> single-threaded. The filer may also be experiencing delays when
> requesting information from external servers (like DNS or NIS), in
> which case the mount request is held up at the filer.
>
> Both sides are at fault: the Linux mount command should retry (and I
> believe later releases of RHEL 3 were fixed to do this) and the filer
> configuration should be reviewed to make sure there are no avoidable
> delays while processing mount requests.
>
>> * What "bug" in the mount command do you believe NetApp is talking about?
>
> The bug is that the mount command overrides the proper default RPC
> timeout value with a timeout value of 0.7 seconds. This is *not* the
> timeout for mount operations, it is the timeout for the in-kernel NFS
> client to retransmit RPC requests.
>
>> * What do you think proper options for NFS auto/mounts would be for
>> extremely busy centralized NFS filers?
>
> If you are using NFS over TCP, the proper timeout value is 60 seconds.
>
>> * What is the reference standard behavior?
>
> Solaris, which is the NFSv3 reference implementation, uses effectively
> a 60 second timeout on TCP mounts.
>
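
The timeout arithmetic in the quoted message (timeo=7, retrans=3) can be
reproduced with a small sketch. It assumes the model used in that
arithmetic, where the wait doubles after each retransmission; the
function name and the backoff parameter are illustrative only, and
actual client behavior, particularly over TCP, may differ.

```python
def total_wait_tenths(timeo, retrans, backoff=2):
    """Sum the waits for the initial request plus `retrans` retransmissions.

    timeo is in tenths of a second, as in the NFS mount option.
    backoff=2 doubles the wait each time (the assumption made here);
    backoff=1 keeps every wait the same length.
    """
    return sum(timeo * backoff ** attempt for attempt in range(retrans + 1))


doubling = total_wait_tenths(7, 3)  # 7 + 14 + 28 + 56
print(doubling, "tenths =", doubling / 10, "seconds")
```

This prints "105 tenths = 10.5 seconds" for the RHEL 3 defaults
described above.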
--
----------------------------------------------------------------------
Greg Baker 512-602-3287 (work)
gregory.baker@amd.com 512-602-6970 (fax)
5900 E. Ben White Blvd MS 626 512-555-1212 (info)
Austin, TX 78741