From: "Kok, Auke" <auke-jan.h.kok@intel.com>
To: Bruce Allen <ballen@gravity.phys.uwm.edu>
Cc: "Brandeburg, Jesse" <jesse.brandeburg@intel.com>,
netdev@vger.kernel.org,
Carsten Aulbert <carsten.aulbert@aei.mpg.de>,
Henning Fehrmann <henning.fehrmann@aei.mpg.de>,
Bruce Allen <bruce.allen@aei.mpg.de>
Subject: Re: e1000 full-duplex TCP performance well below wire speed
Date: Thu, 31 Jan 2008 11:32:17 -0800
Message-ID: <47A22241.70600@intel.com>
In-Reply-To: <Pine.LNX.4.63.0801311251040.14403@trinity.phys.uwm.edu>
Bruce Allen wrote:
> Hi Auke,
>
>>>>> Important note: we ARE able to get full duplex wire speed (over 900
>>>>> Mb/s simultaneously in both directions) using UDP. The problems occur
>>>>> only with TCP connections.
>>>>
>>>> That eliminates bus bandwidth issues, probably, but small packets take
>>>> up a lot of extra descriptors, bus bandwidth, CPU, and cache resources.
>>>
>>> I see. Your concern is the extra ACK packets associated with TCP. Even
>>> though these represent a small volume of data (around 5% with MTU=1500,
>>> and less at larger MTU) they double the number of packets that must be
>>> handled by the system compared to UDP transmission at the same data
>>> rate. Is that correct?
>>
>> A lot of people tend to forget that the pci-express bus has enough
>> bandwidth at first glance - 2.5 Gbit/s for 1 Gbit/s of traffic, but
>> apart from data going over it there is significant overhead going on:
>> each packet requires transmit, cleanup and buffer transactions, and
>> there are many irq register clears per second (slow ioread/writes).
>> The transactions double for TCP ack processing, and this all
>> accumulates and starts to introduce latency, higher cpu utilization
>> etc...
>
> Based on the discussion in this thread, I am inclined to believe that
> lack of PCI-e bus bandwidth is NOT the issue. The theory is that the
> extra packet handling associated with TCP acknowledgements is pushing
> the PCI-e x1 bus past its limits. However, the evidence seems to show
> otherwise:
>
> (1) Bill Fink has reported the same problem on a NIC with a 133 MHz
> 64-bit PCI connection. That connection can transfer data at 8Gb/s.
That was even a PCI-X connection, which is known to have extremely good latency
numbers, IIRC better than PCI-e (?), which could account for a lot of the
latency-induced lower performance...

Also, 82573's are _not_ a server part and were not designed for this usage. 82546's
are, and that really does make a difference. 82573's are full of power-saving
features, and all of that makes a difference even with some of them turned off.
It's not for nothing that these 82573's are used in a ton of laptops from
Toshiba, Lenovo etc. A lot of this has to do with the card's internal clock
timings, as usual.

So, you'd really have to compare the 82546 to an 82571 card to be fair. You get
what you pay for, so to speak.
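
For reference, the theoretical peak figures behind the bus comparison can be
worked out in a few lines. This is only a rough sketch in Python: the function
names are made up for illustration, and the numbers are spec-sheet peaks (a
133 MHz, 64-bit PCI-X bus; a PCIe 1.x lane at 2.5 GT/s with 8b/10b encoding),
not measurements from this thread. It says nothing about latency or
per-transaction overhead, which is the point being argued above.

```python
# Back-of-the-envelope bus bandwidth figures (theoretical peaks only).


def pcix_peak_gbps(clock_mhz: float, width_bits: int) -> float:
    """Peak rate of a parallel PCI/PCI-X bus in Gb/s (shared by both directions)."""
    return clock_mhz * 1e6 * width_bits / 1e9


def pcie_gen1_gbps(lanes: int) -> float:
    """Usable rate per direction of a PCIe 1.x link in Gb/s.

    Each lane signals at 2.5 GT/s; 8b/10b encoding leaves 8/10 of that for data.
    """
    return lanes * 2.5 * 8 / 10


if __name__ == "__main__":
    print(f"PCI-X 133 MHz x 64 bit: {pcix_peak_gbps(133, 64):.2f} Gb/s")
    print(f"PCIe 1.x x1:            {pcie_gen1_gbps(1):.2f} Gb/s per direction")
```

The raw 2.5 Gbit/s lane rate quoted earlier therefore leaves about 2 Gbit/s
per direction for data and protocol overhead combined.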
> (2) If the theory is right, then doubling the MTU from 1500 to 3000
> should significantly reduce the problem, since it cuts the number
> of ACKs by a factor of two. Similarly, going from MTU 1500 to MTU 9000
> should reduce the number of ACKs by a factor of six, practically
> eliminating the problem. But changing the MTU size does not help.
>
> (3) The interrupt counts are quite reasonable. Broadcom NICs without
> interrupt aggregation generate an order of magnitude more irq/s and this
> doesn't prevent wire speed performance there.
>
> (4) The CPUs on the system are largely idle. There are plenty of
> computing resources available.
>
> (5) I don't think that the overhead will increase the bandwidth needed
> by more than a factor of two. Of course you and the other e1000
> developers are the experts, but the dominant bus cost should be copying
> data buffers across the bus. Everything else is minimal in comparison.
>
> Intel insiders: isn't there some simple instrumentation available (which
> reads registers or statistics counters on the PCI-e interface chip) to
> tell us statistics such as how many bits have moved over the link in
> each direction? This plus some accurate timing would make it easy to see
> if the TCP case is saturating the PCI-e bus. Then the theory could be
> addressed with data rather than with opinions.
The only tools we have are expensive bus analyzers. As said in the thread with
Rick Jones, I think there might be some tools available from Intel for this, but I
have never seen them.
Auke
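
The packet-count argument in point (2) above can be checked with a quick
estimate. The sketch below is illustrative only and rests on assumptions that
are not from the thread: 40 bytes of IPv4 + TCP headers per segment with no
TCP options, and a hypothetical `segments_per_ack` parameter (1 means every
segment is ACKed; delayed ACKs would typically raise it to about 2).

```python
# Rough packet-rate estimate for one direction of a bulk TCP stream.
# Assumptions (not from the thread): 40 bytes of IPv4 + TCP headers,
# no TCP options, one ACK per `segments_per_ack` full-size segments.


def segment_rate(payload_mbps: float, mtu: int, header_bytes: int = 40) -> float:
    """Full-size data segments per second needed to carry payload_mbps."""
    mss = mtu - header_bytes  # payload bytes per segment
    return payload_mbps * 1e6 / (8 * mss)


def ack_rate(payload_mbps: float, mtu: int, segments_per_ack: int = 1) -> float:
    """ACKs per second flowing in the reverse direction."""
    return segment_rate(payload_mbps, mtu) / segments_per_ack


if __name__ == "__main__":
    for mtu in (1500, 3000, 9000):
        print(f"MTU {mtu}: {segment_rate(900, mtu):9.0f} segments/s, "
              f"{ack_rate(900, mtu):9.0f} ACKs/s")
```

Whatever ACK policy is assumed, the ACK rate scales with the segment rate, so
going from MTU 1500 to 3000 or 9000 cuts it by roughly a factor of two or six
respectively, as stated in point (2).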