From: Rick Jones <rick.jones2@hp.com>
To: netdev@oss.sgi.com
Cc: m.iatrou@freemail.gr
Subject: Re: e1000 (?) jumbo frames performance issue
Date: Thu, 05 May 2005 16:24:25 -0700
Message-ID: <427AAB29.8040607@hp.com>
In-Reply-To: <20050505151720.075e4a91.davem@davemloft.net>
David S. Miller wrote:
> On Thu, 05 May 2005 14:54:43 -0700 Rick Jones <rick.jones2@hp.com> wrote:
>
>
>> assuming of course that the intent of the algorithm was to try to get the
>> average header/header+data ratio to something around 0.9 (although IIRC,
>> none of a 537 byte send would be delayed by Nagle since it was the size of
>> the user's send being >= the MSS, so make that ~0.45 ?)
>
>
> It tries to hold smaller packets back in hopes to get some more sendmsg()
> calls which will bunch up some more data before all outstanding data is
> ACK'd.
I think we may be saying _nearly_ the same thing, although I would call them
smaller user sends. Nothing I've read (and remembered) suggested that a user
send of MSS+1 bytes should have that last byte delayed. That's where I got
that handwaving math of 0.45 instead of 0.9.
My bringing up the ratio of header to header+data comes from stuff like this in
rfc896:
<begin>
The small-packet problem
There is a special problem associated with small packets. When
TCP is used for the transmission of single-character messages
originating at a keyboard, the typical result is that 41 byte
packets (one byte of data, 40 bytes of header) are transmitted
for each byte of useful data. This 4000% overhead is annoying
but tolerable on lightly loaded networks. On heavily loaded
networks, however, the congestion resulting from this overhead can
result in lost datagrams and retransmissions, as well as excessive
propagation time caused by congestion in switching nodes and
gateways. In practice, throughput may drop so low that TCP
connections are aborted.
<end>
The reason I make the "user send" versus packet distinction comes from stuff
like this:
<begin>
The solution is to inhibit the sending of new TCP segments when
new outgoing data arrives from the user if any previously
transmitted data on the connection remains unacknowledged.
<end>
I do acknowledge though that there have been stacks that interpreted Nagle on a
segment by segment basis rather than a user send by user send basis. I just
don't think that they were correct :)
>
> It's meant for terminal protocols and other chatty sequences.
>
Nagle included an FTP example with 512-byte sends, which leads me to believe
the algorithm was meant for more than just terminal protocols:
<begin>
We use our scheme for all TCP connections, not just Telnet
connections. Let us see what happens for a file transfer data
connection using our technique. The two extreme cases will again be
considered.
As before, we first consider the Ethernet case. The user is now
writing data to TCP in 512 byte blocks as fast as TCP will accept
them. The user's first write to TCP will start things going; our
first datagram will be 512+40 bytes or 552 bytes long. The
user's second write to TCP will not cause a send but will cause
the block to be buffered.
<end>
What I'd forgotten is that the original RFC had no explicit discussion of checks
against the MSS. It _seems_ that the first reference to that is in rfc898,
which was a writeup of meeting notes:
<begin>
Congestion Control -- FACC - Nagle
Postel: This was a discussion of the situation leading to the
ideas presented in RFC 896, and how the policies described there
improved overall performance.
RFC 898 April 1984
Gateway SIG Meeting Notes
Muuss:
First principle of congestion control:
DON'T DROP PACKETS (unless absolutely necessary)
Second principle:
Hosts must behave themselves (or else)
Enemies list -
1. TOPS-20 TCP from DEC
2. VAX/UNIX 4.2 from Berkeley
Third principle:
Memory won't help (beyond a certain point).
The small packet problem: Big packets are good, small are bad
(big = 576).
Suggested fix: Rule: When the user writes to TCP, initiate a send
only if there are NO outstanding packets on the connection. [good
for TELNET, at least] (or if you fill a segment). No change when
Acks come back. Assumption is that there is a pipe-like buffer
between the user and the TCP.
<end>
with that parenthetical "(or if you fill a segment)" comment. It is interesting
how they define "big = 576" :)
It seems the full-sized segment bit gets formalized in RFC 1122:
<begin>
A TCP SHOULD implement the Nagle Algorithm [TCP:9] to
coalesce short segments. However, there MUST be a way for
an application to disable the Nagle algorithm on an
individual connection. In all cases, sending data is also
subject to the limitation imposed by the Slow Start
algorithm (Section 4.2.2.15).
DISCUSSION:
The Nagle algorithm is generally as follows:
If there is unacknowledged data (i.e., SND.NXT >
SND.UNA), then the sending TCP buffers all user
data (regardless of the PSH bit), until the
outstanding data has been acknowledged or until
the TCP can send a full-sized segment (Eff.snd.MSS
bytes; see Section 4.2.2.6).
<end>
> It was not designed with 16K MSS frame sizes in mind.
I certainly agree that those frame sizes were probably far from their minds at
the time and that basing the decision on the ratio of header overhead is well
within the spirit.
rick jones
Thread overview: 9+ messages
2005-05-05 16:28 e1000 (?) jumbo frames performance issue Michael Iatrou
2005-05-05 20:17 ` Rick Jones
2005-05-05 21:33 ` David S. Miller
2005-05-05 21:54 ` Rick Jones
2005-05-05 22:17 ` David S. Miller
2005-05-05 23:24 ` Rick Jones [this message]
2005-05-05 21:55 ` Michael Iatrou
2005-05-05 22:26 ` Michael Iatrou
2005-05-06 16:18 ` Rick Jones