From: Chuck Lever <chuck.lever@oracle.com>
To: Bill Johnstone <beejstone3@yahoo.com>
Cc: nfs@lists.sourceforge.net
Subject: Re: Performance tuning on modern 2.6 series kernels
Date: Thu, 14 Jun 2007 14:00:15 -0400
Message-ID: <4671822F.6010407@oracle.com>
In-Reply-To: <906723.78703.qm@web63902.mail.re1.yahoo.com>
Bill Johnstone wrote:
> 1. TCP vs. UDP. Assuming a high-bandwidth switched network, with
> enough disk and network bandwidth on the storage server, and
> rsize/wsize (for both) less than the MTU, UDP should have less overhead
> and be faster than TCP, correct?
The overhead of TCP ACKs and extra TCP header fields does cause a
slowdown, but it really depends on network latency, host interrupt
latency, and host processor speed on the server and client. The slowdown
was measured at about 5% several years ago on gigahertz-class Pentium III
processors, but is probably much less these days.
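To put just the header-byte part of that overhead in perspective, here is a rough count for one NFS payload on a 1500-byte Ethernet MTU. This is a sketch under assumptions that are not from the message above: IPv4 without options, TCP timestamps enabled (a 32-byte TCP header), one delayed ACK per two segments, and RPC/NFS headers plus the TCP record marker ignored. It counts bytes only and does not model the interrupt latency or processor cost that the paragraph above identifies as the main factors. The helper names are hypothetical.

```python
from typing import Tuple

# Rough header-byte count for NFS data over a 1500-byte Ethernet MTU.
# Assumptions (not from the original message): IPv4 without options,
# TCP timestamps enabled, one delayed ACK per two segments, and RPC/NFS
# headers plus the TCP record marker ignored.
MTU = 1500
IPV4_HDR = 20
UDP_HDR = 8
TCP_HDR = 32  # 20-byte base header + 12-byte timestamp option


def udp_wire_bytes(payload: int) -> int:
    """IP-layer bytes for one UDP datagram fragmented to fit the MTU."""
    datagram = UDP_HDR + payload            # UDP header rides in the first fragment only
    per_frag = (MTU - IPV4_HDR) // 8 * 8    # fragment data is a multiple of 8 bytes
    frags = -(-datagram // per_frag)        # ceiling division
    return datagram + frags * IPV4_HDR      # every fragment carries its own IP header


def tcp_wire_bytes(payload: int) -> Tuple[int, int]:
    """(forward bytes, reverse-path ACK bytes) for the same payload over TCP."""
    mss = MTU - IPV4_HDR - TCP_HDR
    segs = -(-payload // mss)
    forward = payload + segs * (IPV4_HDR + TCP_HDR)
    acks = -(-segs // 2)                    # delayed ACK: one per two segments
    return forward, acks * (IPV4_HDR + TCP_HDR)


if __name__ == "__main__":
    for size in (8192, 32768):
        tcp_fwd, tcp_ack = tcp_wire_bytes(size)
        print(f"{size:6d} B payload: UDP {udp_wire_bytes(size)} B, "
              f"TCP {tcp_fwd} B + {tcp_ack} B of ACKs")
```

For a 32768-byte payload this gives 33236 bytes for UDP and 33964 bytes for TCP, plus 624 bytes of ACKs on the reverse path: a few percent of header bytes either way.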
In almost any real world network, TCP will win over UDP because it is
properly designed to recover from network losses due to congestion and
link speed mismatches. Even the "perfect" network you describe can
suffer packet losses from bus and buffer overruns and hardware errors.
In addition, the NFSv4 spec requires the use of a reliable transport
protocol, which UDP is not. So for NFSv4, UDP isn't an option.
In other words, TCP is the only choice for any modern NFS deployment.
> 2. Large rsize/wsize, and TCP vs. UDP again. I understand that if the
> rsize/wsize is larger than the MTU, a UDP packet containing NFS
> read/write data will be fragmented into multiple packets. However, all
> the documentation in these cases seems to imply that even though the
> same MTU restriction applies to TCP, TCP will be faster in this case
> than the fragmented UDP. Why is this? Doesn't TCP need to "fragment"
> the NFS payload as well?
TCP does break data into MTU-sized chunks, but it manages those chunks
properly using 32-bit TCP sequence numbers. A fragmented UDP datagram
relies only on the 16-bit Identification field in the IP header, which is
known to wrap in many common real-world situations. IP ID wrapping can
cause fragment misassembly, which results either in a bad checksum on
reassembly or, if the checksum happens to match, in corrupt file data.
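A quick calculation shows why the 16-bit IP ID wraps so easily under NFS load. The sketch assumes each datagram between the same two hosts takes the next ID, ignores header bytes, and uses 30 seconds as the reassembly timeout (the Linux default for `net.ipv4.ipfrag_time`); the function name is hypothetical.

```python
# How fast does the 16-bit IPv4 Identification field wrap under
# NFS-over-UDP load, compared with how long a partially reassembled
# datagram may wait for its missing fragments?
ID_SPACE = 2 ** 16          # IPv4 Identification is a 16-bit field
REASSEMBLY_TIMEOUT_S = 30   # assumed: Linux default net.ipv4.ipfrag_time


def id_wrap_seconds(link_bits_per_s: float, datagram_bytes: int) -> float:
    """Seconds until the ID repeats on a link saturated with datagrams
    of the given size (one ID per datagram, header bytes ignored)."""
    datagrams_per_s = link_bits_per_s / 8 / datagram_bytes
    return ID_SPACE / datagrams_per_s


if __name__ == "__main__":
    for size in (8192, 32768):
        wrap = id_wrap_seconds(1e9, size)
        print(f"{size:6d} B datagrams at 1 Gbit/s: ID wraps in {wrap:.1f} s "
              f"(reassembly timeout {REASSEMBLY_TIMEOUT_S} s)")
```

On a saturated 1 Gbit/s link this gives about 4.3 seconds for 8 KB datagrams and about 17.2 seconds for 32 KB datagrams, so an ID can repeat while a fragment from an earlier datagram with the same ID is still waiting to be reassembled.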
TCP was designed as a *reliable* transport protocol, and thus provides
many guarantees that the data that leaves one host is the same as the
data that arrives at the receiving end. UDP makes no guarantees about
data reliability.
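The UDP checksum is the only end-to-end check on a reassembled datagram, and it is weak. Below is a minimal sketch of the RFC 1071 Internet checksum (the UDP pseudo-header is omitted for brevity). Because ones'-complement addition is commutative, any substituted data whose 16-bit words sum to the same value goes undetected; reordered words are the simplest case. The word-swap "corruption" is a contrived illustration, and `swap_first_two_words` is a hypothetical helper.

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 ones'-complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data = data + b"\x00"                    # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


def swap_first_two_words(data: bytes) -> bytes:
    """Hypothetical corruption: exchange the first two 16-bit words."""
    return data[2:4] + data[0:2] + data[4:]


if __name__ == "__main__":
    # Byte sequence from the numerical example in RFC 1071.
    rfc_example = bytes([0x00, 0x01, 0xF2, 0x03, 0xF4, 0xF5, 0xF6, 0xF7])
    print(hex(internet_checksum(rfc_example)))   # 0x220d

    original = b"NFS WRITE block 0001"
    corrupted = swap_first_two_words(original)
    # Data differs, yet the checksum still matches.
    print(corrupted != original,
          internet_checksum(corrupted) == internet_checksum(original))
```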