From: Bill Fink <billfink@mindspring.com>
To: Willy Tarreau <w@1wt.eu>
Cc: David Miller <davem@davemloft.net>,
herbert@gondor.apana.org.au, zbr@ioremap.net, jarkao2@gmail.com,
dada1@cosmosbay.com, ben@zeus.com, mingo@elte.hu,
linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
jens.axboe@oracle.com
Subject: Re: [PATCH v2] tcp: splice as many packets as possible at once
Date: Thu, 5 Feb 2009 03:32:41 -0500 [thread overview]
Message-ID: <20090205033241.a99121fe.billfink@mindspring.com> (raw)
In-Reply-To: <20090204091217.GA21385@1wt.eu>
On Wed, 4 Feb 2009, Willy Tarreau wrote:
> On Wed, Feb 04, 2009 at 01:01:46AM -0800, David Miller wrote:
> > From: Herbert Xu <herbert@gondor.apana.org.au>
> > Date: Wed, 4 Feb 2009 19:59:07 +1100
> >
> > > On Wed, Feb 04, 2009 at 09:54:32AM +0100, Willy Tarreau wrote:
> > > >
> > > > My server is running 2.4 :-), but I observed the same issues with older
> > > > 2.6 as well. I can certainly imagine that things have changed a lot since,
> > > > but the initial point remains : jumbo frames are expensive to deal with,
> > > > and with recent NICs and drivers, we might get close performance for
> > > > little additional cost. After all, initial justification for jumbo frames
> > > > was the devastating interrupt rate and all NICs coalesce interrupts now.
> > >
> > > This is total crap! Jumbo frames are way better than any of the
> > > hacks (such as GSO) that people have come up with to get around it.
> > > The only reason we are not using it as much is because of this
> > > nasty thing called the Internet.
> >
> > Completely agreed.
> >
> > If Jumbo frames are slower, it is NOT some fundamental issue. It is
> > rather due to some misdesign of the hardware or it's driver.
>
> Agreed we can't use them *because* of the internet, but this
> limitation has forced hardware designers to find valid alternatives.
> For instance, having the ability to reach 10 Gbps with 1500 bytes
> frames on myri10ge with a low CPU usage is a real achievement. This
> is "only" 800 kpps after all.
>
> And the arbitrary choice of 9k for jumbo frames was total crap too.
> It's clear that no hardware designer was involved in the process.
> They have to stuff 16kB of RAM on a NIC to use only 9. And we need
> to allocate 3 pages for slightly more than 2. 7.5 kB would have been
> better in this regard.
>
> I still find it nice to lower CPU usage with frames larger than 1500,
> but given the fact that this is rarely used (even in datacenters), I
> think our efforts should concentrate on where the real users are, ie
> <1500.
Those in the HPC realm use 9000 byte jumbo frames because it makes
a major performance difference, especially across large RTT paths,
and the Internet2 backbone fully supports 9000 byte jumbo frames
(with some wishing we could support much larger frame sizes).
Local environment:
9000 byte jumbo frames:
[root@lang2 ~]# nuttcp -w10m 192.168.88.16
11818.1875 MB / 10.01 sec = 9905.9707 Mbps 100 %TX 76 %RX 0 retrans 0.15 msRTT
4080 byte MTU:
[root@lang2 ~]# nuttcp -w10m 192.168.88.16
9171.6875 MB / 10.02 sec = 7680.7663 Mbps 100 %TX 99 %RX 0 retrans 0.19 msRTT
The performance impact is even more pronounced on a large RTT path
such as the following netem emulated 80 ms RTT path:
9000 byte jumbo frames:
[root@lang2 ~]# nuttcp -T30 -w80m 192.168.89.15
25904.2500 MB / 30.16 sec = 7205.8755 Mbps 96 %TX 55 %RX 0 retrans 82.73 msRTT
4080 byte MTU:
[root@lang2 ~]# nuttcp -T30 -w80m 192.168.89.15
8650.0129 MB / 30.25 sec = 2398.8862 Mbps 33 %TX 19 %RX 2371 retrans 81.98 msRTT
And if there's any loss in the path, the performance difference is also
dramatic, such as here across a real MAN environment with about a 1 ms RTT:
9000 byte jumbo frames:
[root@chance9 ~]# nuttcp -w20m 192.168.88.8
7711.8750 MB / 10.05 sec = 6436.2406 Mbps 82 %TX 96 %RX 261 retrans 0.92 msRTT
4080 byte MTU:
[root@chance9 ~]# nuttcp -w20m 192.168.88.8
4551.0625 MB / 10.08 sec = 3786.2108 Mbps 50 %TX 95 %RX 42 retrans 0.95 msRTT
All testing was with myri10ge on the transmitter side (2.6.20.7 kernel).
So my experience has definitely been that 9000 byte jumbo frames are a
major performance win for high throughput applications.
-Bill
next prev parent reply other threads:[~2009-02-05 8:33 UTC|newest]
Thread overview: 190+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-01-08 17:30 [PATCH] tcp: splice as many packets as possible at once Willy Tarreau
2009-01-08 19:44 ` Jens Axboe
2009-01-08 22:03 ` Willy Tarreau
2009-01-08 21:50 ` Ben Mansell
2009-01-08 21:55 ` David Miller
2009-01-08 22:20 ` Willy Tarreau
2009-01-13 23:08 ` David Miller
2009-01-09 6:47 ` Eric Dumazet
2009-01-09 7:04 ` Willy Tarreau
2009-01-09 7:28 ` Eric Dumazet
2009-01-09 7:42 ` Willy Tarreau
2009-01-13 23:27 ` David Miller
2009-01-13 23:35 ` Eric Dumazet
2009-01-09 15:42 ` Eric Dumazet
2009-01-09 17:57 ` Eric Dumazet
2009-01-09 18:54 ` Willy Tarreau
2009-01-09 20:51 ` Eric Dumazet
2009-01-09 21:24 ` Willy Tarreau
2009-01-09 22:02 ` Eric Dumazet
2009-01-09 22:09 ` Willy Tarreau
2009-01-09 22:07 ` Willy Tarreau
2009-01-09 22:12 ` Eric Dumazet
2009-01-09 22:17 ` Willy Tarreau
2009-01-09 22:42 ` Evgeniy Polyakov
2009-01-09 22:50 ` Willy Tarreau
2009-01-09 23:01 ` Evgeniy Polyakov
2009-01-09 23:06 ` Willy Tarreau
2009-01-10 7:40 ` Eric Dumazet
2009-01-11 12:58 ` Evgeniy Polyakov
2009-01-11 13:14 ` Eric Dumazet
2009-01-11 13:35 ` Evgeniy Polyakov
2009-01-11 16:00 ` Eric Dumazet
2009-01-11 16:05 ` Evgeniy Polyakov
2009-01-14 0:07 ` David Miller
2009-01-14 0:13 ` Evgeniy Polyakov
2009-01-14 0:16 ` David Miller
2009-01-14 0:22 ` Evgeniy Polyakov
2009-01-14 0:37 ` David Miller
2009-01-14 3:51 ` Herbert Xu
2009-01-14 4:25 ` David Miller
2009-01-14 7:27 ` David Miller
2009-01-14 8:26 ` Herbert Xu
2009-01-14 8:53 ` Jarek Poplawski
2009-01-14 9:29 ` David Miller
2009-01-14 9:42 ` Jarek Poplawski
2009-01-14 10:06 ` David Miller
2009-01-14 10:47 ` Jarek Poplawski
2009-01-14 11:29 ` Herbert Xu
2009-01-14 11:40 ` Jarek Poplawski
2009-01-14 11:45 ` Jarek Poplawski
2009-01-14 9:54 ` Jarek Poplawski
2009-01-14 10:01 ` Willy Tarreau
2009-01-14 12:06 ` Jarek Poplawski
2009-01-14 12:15 ` Jarek Poplawski
2009-01-14 11:28 ` Herbert Xu
2009-01-15 23:03 ` Willy Tarreau
2009-01-15 23:19 ` David Miller
2009-01-15 23:19 ` Herbert Xu
2009-01-15 23:26 ` David Miller
2009-01-15 23:32 ` Herbert Xu
2009-01-15 23:34 ` David Miller
2009-01-15 23:42 ` Willy Tarreau
2009-01-15 23:44 ` Willy Tarreau
2009-01-15 23:54 ` David Miller
2009-01-19 0:42 ` Willy Tarreau
2009-01-19 3:08 ` Herbert Xu
2009-01-19 3:27 ` David Miller
2009-01-19 6:14 ` Willy Tarreau
2009-01-19 6:19 ` David Miller
2009-01-19 6:45 ` Willy Tarreau
2009-01-19 10:19 ` Herbert Xu
2009-01-19 20:59 ` David Miller
2009-01-19 21:24 ` Herbert Xu
2009-01-25 21:03 ` Willy Tarreau
2009-01-26 7:59 ` Jarek Poplawski
2009-01-26 8:12 ` Willy Tarreau
2009-01-19 8:40 ` Jarek Poplawski
2009-01-19 3:28 ` David Miller
2009-01-19 6:11 ` Willy Tarreau
2009-01-24 21:23 ` Willy Tarreau
2009-01-20 12:01 ` Ben Mansell
2009-01-20 12:11 ` Evgeniy Polyakov
2009-01-20 13:43 ` Ben Mansell
2009-01-20 14:06 ` Jarek Poplawski
2009-01-16 6:51 ` Jarek Poplawski
2009-01-19 6:08 ` David Miller
2009-01-19 6:16 ` David Miller
2009-01-19 10:20 ` Herbert Xu
2009-01-20 8:37 ` Jarek Poplawski
2009-01-20 9:33 ` [PATCH v2] " Jarek Poplawski
2009-01-20 10:00 ` Evgeniy Polyakov
2009-01-20 10:20 ` Jarek Poplawski
2009-01-20 10:31 ` Evgeniy Polyakov
2009-01-20 11:01 ` Jarek Poplawski
2009-01-20 17:16 ` David Miller
2009-01-21 9:54 ` Jarek Poplawski
2009-01-22 9:04 ` [PATCH v3] " Jarek Poplawski
2009-01-26 5:22 ` David Miller
2009-01-27 7:11 ` Herbert Xu
2009-01-27 7:54 ` Jarek Poplawski
2009-01-27 10:09 ` Herbert Xu
2009-01-27 10:35 ` Jarek Poplawski
2009-01-27 10:57 ` Jarek Poplawski
2009-01-27 11:48 ` Herbert Xu
2009-01-27 12:16 ` Jarek Poplawski
2009-01-27 12:31 ` Jarek Poplawski
2009-01-27 17:06 ` David Miller
2009-01-28 8:10 ` Jarek Poplawski
2009-02-01 8:41 ` David Miller
2009-01-26 8:20 ` [PATCH v2] " Jarek Poplawski
2009-01-26 21:21 ` Evgeniy Polyakov
2009-01-27 6:10 ` David Miller
2009-01-27 7:40 ` Jarek Poplawski
2009-01-30 21:42 ` David Miller
2009-01-30 21:59 ` Willy Tarreau
2009-01-30 22:03 ` David Miller
2009-01-30 22:13 ` Willy Tarreau
2009-01-30 22:15 ` David Miller
2009-01-30 22:16 ` Herbert Xu
2009-02-02 8:08 ` Jarek Poplawski
2009-02-02 8:18 ` David Miller
2009-02-02 8:43 ` Jarek Poplawski
2009-02-03 7:50 ` David Miller
2009-02-03 9:41 ` Jarek Poplawski
2009-02-03 11:10 ` Evgeniy Polyakov
2009-02-03 11:24 ` Herbert Xu
2009-02-03 11:49 ` Evgeniy Polyakov
2009-02-03 11:53 ` Herbert Xu
2009-02-03 12:07 ` Evgeniy Polyakov
2009-02-03 12:12 ` Herbert Xu
2009-02-03 12:18 ` Evgeniy Polyakov
2009-02-03 12:25 ` Willy Tarreau
2009-02-03 12:28 ` Herbert Xu
2009-02-04 0:47 ` David Miller
2009-02-04 6:19 ` Willy Tarreau
2009-02-04 8:12 ` Evgeniy Polyakov
2009-02-04 8:54 ` Willy Tarreau
2009-02-04 8:59 ` Herbert Xu
2009-02-04 9:01 ` David Miller
2009-02-04 9:12 ` Willy Tarreau
2009-02-04 9:15 ` David Miller
2009-02-04 19:19 ` Roland Dreier
2009-02-04 19:28 ` Willy Tarreau
2009-02-04 19:48 ` Jarek Poplawski
2009-02-05 8:32 ` Bill Fink [this message]
2009-02-04 9:12 ` David Miller
2009-02-03 12:27 ` Herbert Xu
2009-02-03 13:05 ` david
2009-02-03 12:12 ` Evgeniy Polyakov
2009-02-03 12:18 ` Herbert Xu
2009-02-03 12:30 ` Evgeniy Polyakov
2009-02-03 12:33 ` Herbert Xu
2009-02-03 12:33 ` Nick Piggin
2009-02-04 0:46 ` David Miller
2009-02-04 9:41 ` Benny Amorsen
2009-02-04 12:01 ` Herbert Xu
2009-02-03 12:36 ` Jarek Poplawski
2009-02-03 13:06 ` Evgeniy Polyakov
2009-02-03 13:25 ` Jarek Poplawski
2009-02-03 14:20 ` Evgeniy Polyakov
2009-02-04 0:46 ` David Miller
2009-02-04 8:08 ` Evgeniy Polyakov
2009-02-04 9:23 ` Nick Piggin
2009-02-04 7:56 ` Jarek Poplawski
2009-02-06 7:52 ` David Miller
2009-02-06 8:09 ` Herbert Xu
2009-02-06 9:10 ` Jarek Poplawski
2009-02-06 9:17 ` David Miller
2009-02-06 9:42 ` Jarek Poplawski
2009-02-06 9:49 ` David Miller
2009-02-06 9:23 ` Herbert Xu
2009-02-06 9:51 ` Jarek Poplawski
2009-02-06 10:28 ` Herbert Xu
2009-02-06 10:58 ` Jarek Poplawski
2009-02-06 11:10 ` Willy Tarreau
2009-02-06 11:47 ` Jarek Poplawski
2009-02-06 18:59 ` Jarek Poplawski
2009-02-03 11:38 ` Nick Piggin
2009-01-27 18:42 ` David Miller
2009-01-15 23:32 ` [PATCH] " Willy Tarreau
2009-01-15 23:35 ` David Miller
2009-01-14 0:51 ` Herbert Xu
2009-01-14 1:24 ` David Miller
2009-01-09 22:45 ` Eric Dumazet
2009-01-09 22:53 ` Willy Tarreau
2009-01-09 23:34 ` Eric Dumazet
2009-01-13 5:45 ` David Miller
2009-01-14 0:05 ` David Miller
2009-01-13 23:31 ` David Miller
2009-01-13 23:26 ` David Miller
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090205033241.a99121fe.billfink@mindspring.com \
--to=billfink@mindspring.com \
--cc=ben@zeus.com \
--cc=dada1@cosmosbay.com \
--cc=davem@davemloft.net \
--cc=herbert@gondor.apana.org.au \
--cc=jarkao2@gmail.com \
--cc=jens.axboe@oracle.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=netdev@vger.kernel.org \
--cc=w@1wt.eu \
--cc=zbr@ioremap.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.