linux-nfs.vger.kernel.org archive mirror
From: gb <bakerg3@yahoo.com>
To: nfs@lists.sourceforge.net
Cc: charles.lever@netapp.com
Subject: (no subject)
Date: Wed, 23 Apr 2003 11:38:59 -0700 (PDT)
Message-ID: <20030423183859.28233.qmail@web41304.mail.yahoo.com>


An analysis that I recently undertook is attached
below.  Any comments this group has would be
extremely beneficial.  Please include
(XXXbakerg3@yahoo.com.XXX) in the reply in addition to
(XXXnfs@lists.sourceforge.net.XXX)

SUMMARY

During periods of heavy tcp-nfs traffic to a remote
nfs-mounted directory on a Network Appliance filer,
linux systems will "freeze", causing processes
accessing that directory to enter an uninterruptible,
deadlocked state.  With udp-nfs mounts these problems
do not manifest themselves.

ANALYSIS and CONCLUSION

Linux tcp-nfs is not ready for production in our large
scale distributed environment with the current set of
NetApp filers.

While the root of the problem may be with the tcp-nfs
implementation on Linux, it is interesting to note
that until a certain load level is generated via
tcp-nfs access to a directory on a filer, no problems
manifest themselves.

The latest kernel available (2.4.21pre7 + patches via
Chuck Lever of NetApp) does not appear to fix the
problem.

Until this critical problem is resolved, it is a moot
point to argue the advantages of tcp-nfs vs. udp-nfs
regarding network traffic or CPU usage.

RECOMMENDATION

Force automount to use udp via the localoptions line
in /etc/init.d/autofs.

Contact netapp with the details of our testing and ask
why a certain load level of tcp-nfs traffic causes
other tcp-nfs clients to go into the weeds.


Any suggestions welcome, please include me (the
poster) in your replies.

Thanks,

--Greg

(Charles, if you've read this far, please contact me
so that I can reference our NetApp case id #).

GORY DETAILS (go get something to drink first):

* Tools:

traffic generator: iozone (http://www.iozone.org)
analysis equipment: lump of meat and bone located
above the shoulders.

* Testing Procedure:

Three 'control hosts' managed a pool of linux clients
via iozone to generate traffic to target directories
stored on the netapp filers below.

1 NetApp Release 6.2.2D21: Fri Feb 28 18:39:39 PST 2003 (sphere)
    exported directory ( /u/admin-test1 quota)
1 NetApp Release 6.2.2: Wed Oct 16 01:12:25 PDT 2002 (vger)
    exported directory ( /u/admin-test2 qtree)
1 NetApp Release 6.2.2: Wed Oct 16 01:12:25 PDT 2002 (wopr)
    exported directory ( /u/admin-test3 qtree)

Each control host ran a single instance of iozone as
shown below:

ch1: iozone -t 25 -r 64 -s 10000 -+m iozone.test1.hosts
ch2: iozone -t 25 -r 64 -s 10000 -+m iozone.test2.hosts
ch3: iozone -t 25 -r 64 -s 10000 -+m iozone.test3.hosts

# -t 25     throughput mode with 25 concurrent clients
# -r 64     record size of 64 KB per I/O
# -s 10000  file size of 10000 KB per client
# -+m FILE  cluster mode: read the list of client hosts from FILE

The extended command control file contains a
repetitive series of lines, one per test population
host.  Each extended command file referenced a
different nfs-mounted directory from a netapp filer.

valk004 /u/admin-test1 /tool/pandora/sbin/iozone
valk074 /u/admin-test1 /tool/pandora/sbin/iozone
go064 /u/admin-test1 /tool/pandora/sbin/iozone
	.
	.
	.
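
For anyone wanting to reproduce the setup, here is a rough sketch of how
such a control file could be generated.  The three-field layout (host,
working directory, path to iozone) is taken from the lines above; the
function name is made up for illustration, and the host list holds only
the three hosts shown because the rest of the list is elided.

```python
def client_file_lines(hosts, workdir, iozone_path):
    """Return one 'host workdir iozone_path' line per client host."""
    return [f"{host} {workdir} {iozone_path}" for host in hosts]


# Only the three hosts quoted in the post; the remaining entries are elided there.
hosts = ["valk004", "valk074", "go064"]

lines = client_file_lines(hosts, "/u/admin-test1", "/tool/pandora/sbin/iozone")
print("\n".join(lines))
```

The output can be redirected to a file and passed to iozone with -+m.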

All filers are connected via fiber gig; all linux
hosts 100baseTX-FD switched.  Network backbone is
catalyst 6509 (netapp filers) and catalyst 4000/6506
(linux clients).

* Test Population A:

10 redhat 7.3 running kernel 2.4.18 using tcp-nfs
7  redhat 7.3 running kernel 2.4.21pre7 using tcp-nfs
6  redhat 7.3 running kernel 2.4.18 using udp-nfs
2  redhat 7.1 running kernel 2.4.16 using udp-nfs

* Test Results A

All clients using tcp-nfs (17/17) fail after a short
amount of time with the following errors:

"nfs server XXX not responding"
"nfs task XXX can't get a request slot"

At that point the remote directories mounted from the
NetApp filers were unavailable.  An examination of the
/proc file system shows that the iozone process
attempting to access the remote file system is
reported as sleeping.

Some of the clients using udp-nfs saw the "nfs server
XXX not responding" message, but it was typically
followed by "nfs server XXX ok".  At no point did the
remote directories mounted from the NetApp filers
become unavailable.

Stopping the traffic simulation did not allow the
clients using tcp-nfs to regain access to the remote
directories.
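
The difference between the two transports comes down to whether a "not
responding" message is ever followed by an "ok" for the same server.  A
minimal sketch of checking that from saved log lines follows.  The
sample lines are made up from the message text quoted above, and the
pattern is an assumption meant to tolerate small variations in the real
kernel message format.

```python
import re

# Loose pattern: optional colon after "nfs", then the server name and its state.
PATTERN = re.compile(r"nfs:?\s+server\s+(\S+)\s+(not responding|ok)", re.IGNORECASE)


def unrecovered_servers(log_lines):
    """Return the servers whose most recent reported state is 'not responding'."""
    last_state = {}
    for line in log_lines:
        match = PATTERN.search(line)
        if match:
            last_state[match.group(1)] = match.group(2).lower()
    return {srv for srv, state in last_state.items() if state == "not responding"}


# Hypothetical log excerpt: vger recovers, wopr never does.
sample = [
    "nfs server vger not responding",
    "nfs server vger ok",
    "nfs server wopr not responding",
]
print(sorted(unrecovered_servers(sample)))
```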

* Test Population B:

5  redhat 7.3 running kernel 2.4.18 using tcp-nfs
7  redhat 7.3 running kernel 2.4.21pre7 using udp-nfs
11 redhat 7.3 running kernel 2.4.18 using udp-nfs
2  redhat 7.1 running kernel 2.4.16 using udp-nfs

* Test Results B:

After a test period of 12 hours, no problems were seen
with access to remote directories for either tcp-nfs
or udp-nfs clients.

* Test Population C:

7  redhat 7.3 running kernel 2.4.21pre7 using udp-nfs
16 redhat 7.3 running kernel 2.4.18 using udp-nfs
2  redhat 7.1 running kernel 2.4.16 using udp-nfs

* Test Results C:

After a test period of 3 hours, no problems were seen
with access to remote directories for udp-nfs clients.

* Test Population D:

10 redhat 7.3 running kernel 2.4.18 using tcp-nfs
7  redhat 7.3 running kernel 2.4.21pre7 using udp-nfs
6  redhat 7.3 running kernel 2.4.18 using udp-nfs
2  redhat 7.1 running kernel 2.4.16 using udp-nfs

* Test Results D:

All clients using tcp-nfs (10/10) fail after
approximately one hour of time with the following
errors:

"nfs server XXX not responding"
"nfs task XXX can't get a request slot"

At that point the remote directories mounted from the
NetApp filers were unavailable.  An examination of the
/proc file system shows that the iozone process
attempting to access the remote file system is
reported as sleeping.

# cat status 
Name:   df
State:  D (disk sleep)
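
To find every process stuck this way rather than checking one status
file at a time, something like the following could be used.  It is only
a sketch: it assumes the "Key: value" layout shown in the status output
above, and the function names are made up for illustration.

```python
import os


def parse_status(text):
    """Parse the 'Key: value' lines of a /proc/<pid>/status file into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def is_uninterruptible(fields):
    """True when the State field starts with 'D' (uninterruptible sleep)."""
    return fields.get("State", "").startswith("D")


def blocked_processes(proc_root="/proc"):
    """Yield (pid, name) for every process currently in state D."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        path = os.path.join(proc_root, entry, "status")
        try:
            with open(path, errors="replace") as handle:
                fields = parse_status(handle.read())
        except OSError:
            continue  # the process exited, or we may not read it
        if is_uninterruptible(fields):
            yield int(entry), fields.get("Name", "?")


# The status excerpt quoted in the post.
SAMPLE = "Name:   df\nState:  D (disk sleep)\n"

if __name__ == "__main__":
    print(is_uninterruptible(parse_status(SAMPLE)))
    if os.path.isdir("/proc"):
        for pid, name in blocked_processes():
            print(pid, name)
```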

Some of the clients using udp-nfs saw the "nfs server
XXX not responding" message, but it was typically
followed by "nfs server XXX ok".  At no point did the
remote directories mounted from the NetApp filers
become unavailable.

Stopping the traffic simulation did not allow the
clients using tcp-nfs to regain access to the remote
directories.
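
Putting the four runs side by side makes the load dependence easier to
see.  The sketch below only restates the client counts and outcomes
reported above (each population totals 25 hosts) and brackets the number
of tcp-nfs clients at which failures began.  The runs differed in length
and kernel mix, so the bracket is suggestive rather than a measured
threshold.

```python
# Client counts per transport and tcp-nfs failures, as reported for each population.
RUNS = {
    "A": {"tcp": 17, "udp": 8, "tcp_failed": 17},
    "B": {"tcp": 5, "udp": 20, "tcp_failed": 0},
    "C": {"tcp": 0, "udp": 25, "tcp_failed": 0},
    "D": {"tcp": 10, "udp": 15, "tcp_failed": 10},
}


def failure_bracket(runs):
    """Return (largest tcp count with no failures, smallest tcp count with failures)."""
    ok = [r["tcp"] for r in runs.values() if r["tcp"] > 0 and r["tcp_failed"] == 0]
    bad = [r["tcp"] for r in runs.values() if r["tcp_failed"] > 0]
    return max(ok), min(bad)


for name, run in RUNS.items():
    print(name, run["tcp"], "tcp /", run["udp"], "udp,", run["tcp_failed"], "tcp failed")

print("failures began between", failure_bracket(RUNS), "tcp-nfs clients")
```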

ANALYSIS / CONCLUSION

Linux tcp-nfs is not ready for production in our large
scale distributed environment with the current set of
NetApp filers.

While the root of the problem may be with the tcp-nfs
implementation on Linux, it is interesting to note
that until a certain load level is generated via
tcp-nfs access to a directory on a filer, no problems
manifest themselves.

The latest kernel available (2.4.21pre7 + patches via
Chuck Lever of NetApp) does not appear to fix the
problem.

Until this critical problem is resolved, it is a moot
point to argue the advantages of tcp-nfs vs. udp-nfs
regarding network traffic or CPU usage.




