netdev.vger.kernel.org archive mirror
* get beyond 1Gbps with pktgen on 10Gb nic?
@ 2010-05-11 13:13 Jon Zhou
  2010-05-11 13:35 ` Ben Hutchings
  0 siblings, 1 reply; 9+ messages in thread
From: Jon Zhou @ 2010-05-11 13:13 UTC (permalink / raw)
  To: netdev@vger.kernel.org

Hi there,

Can anyone get beyond 1Gbps with pktgen or another software traffic generator on a 10Gb NIC (Intel 82599 or BCM 57711)?
I found that someone had hit a similar situation with a Broadcom 10G NIC, but no solution yet.

thanks
jon

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: get beyond 1Gbps with pktgen on 10Gb nic?
  2010-05-11 13:13 get beyond 1Gbps with pktgen on 10Gb nic? Jon Zhou
@ 2010-05-11 13:35 ` Ben Hutchings
  2010-05-11 15:12   ` Rick Jones
  2010-05-11 15:55   ` Ben Greear
  0 siblings, 2 replies; 9+ messages in thread
From: Ben Hutchings @ 2010-05-11 13:35 UTC (permalink / raw)
  To: Jon Zhou; +Cc: netdev@vger.kernel.org

On Tue, 2010-05-11 at 06:13 -0700, Jon Zhou wrote:
> Hi there,
> 
> Can anyone get beyond 1Gbps with pktgen or another software traffic generator on a 10Gb NIC (Intel 82599 or BCM 57711)?
> I found that someone had hit a similar situation with a Broadcom 10G NIC, but no solution yet.

I don't know about those specific controllers, but you should be able to
achieve close to 10G line rate with netperf's TCP_STREAM on any recent
PC server.  UDP throughput tends to be poorer as there is less support
for offloading segmentation and reassembly.  Performance may also be
constrained by PCI Express bandwidth (you need a real 8-lane slot) and
memory bandwidth (a single memory bank may not be enough).
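
Something along these lines should be enough to try it (the host address and
run length here are just placeholders, a minimal sketch rather than a tested
recipe):

  # on the receiving machine
  netserver
  # on the sending machine: one 60-second TCP_STREAM test
  netperf -t TCP_STREAM -H 192.168.0.53 -l 60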

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: get beyond 1Gbps with pktgen on 10Gb nic?
  2010-05-11 13:35 ` Ben Hutchings
@ 2010-05-11 15:12   ` Rick Jones
  2010-05-11 15:55   ` Ben Greear
  1 sibling, 0 replies; 9+ messages in thread
From: Rick Jones @ 2010-05-11 15:12 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Jon Zhou, netdev@vger.kernel.org

Ben Hutchings wrote:
> On Tue, 2010-05-11 at 06:13 -0700, Jon Zhou wrote:
> 
>> Hi there,
>>
>> Can anyone get beyond 1Gbps with pktgen or another software traffic
>> generator on a 10Gb NIC (Intel 82599 or BCM 57711)? I found that someone
>> had hit a similar situation with a Broadcom 10G NIC, but no solution yet.
> 
> 
> I don't know about those specific controllers, but you should be able to
> achieve close to 10G line rate with netperf's TCP_STREAM on any recent
> PC server.  UDP throughput tends to be poorer as there is less support
> for offloading segmentation and reassembly.  Performance may also be
> constrained by PCI Express bandwidth (you need a real 8-lane slot) and
> memory bandwidth (a single memory bank may not be enough).

Further, at least in the context of netperf benchmarking, depending on the 
quantity of offloads and the speed of your cores, you may want to use the 
global -T option to bind netperf, and particularly netserver, to a core other 
than the one on which the interrupts are processed.

It is also good to check whether any of the cores in the system are at or near 
saturation (e.g. ~100% utilization).
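
Something like the following, as a rough sketch (the CPU ids and host are
placeholders - pick cores based on where your IRQs actually land):

  # bind the local netperf to CPU 0 and the remote netserver to CPU 2
  netperf -T 0,2 -t TCP_STREAM -H 192.168.0.53 -l 60

and watching mpstat or top during the run will show per-core saturation.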

happy benchmarking,

rick jones

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: get beyond 1Gbps with pktgen on 10Gb nic?
  2010-05-11 13:35 ` Ben Hutchings
  2010-05-11 15:12   ` Rick Jones
@ 2010-05-11 15:55   ` Ben Greear
  2010-05-12  4:00     ` Jon Zhou
  1 sibling, 1 reply; 9+ messages in thread
From: Ben Greear @ 2010-05-11 15:55 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Jon Zhou, netdev@vger.kernel.org

On 05/11/2010 06:35 AM, Ben Hutchings wrote:
> On Tue, 2010-05-11 at 06:13 -0700, Jon Zhou wrote:
>> Hi there,
>>
>> Can anyone get beyond 1Gbps with pktgen or another software traffic generator on a 10Gb NIC (Intel 82599 or BCM 57711)?
>> I found that someone had hit a similar situation with a Broadcom 10G NIC, but no solution yet.
>
> I don't know about those specific controllers, but you should be able to
> achieve close to 10G line rate with netperf's TCP_STREAM on any recent
> PC server.  UDP throughput tends to be poorer as there is less support
> for offloading segmentation and reassembly.  Performance may also be
> constrained by PCI Express bandwidth (you need a real 8-lane slot) and
> memory bandwidth (a single memory bank may not be enough).

We can easily push right at 10Gbps full-duplex on two ports (sending to self)
with a 2-port 82599 NIC, a 3.3GHz quad-core Intel Core i7 6GT/s processor, etc.

In fact, recent testing with a 2-port 10G NIC and a bunch of Intel 1G ports showed about
50Gbps aggregate bandwidth across the network on such a system.  (We were using a 9000 MTU
for the 50Gbps test, but can reach 10G send-to-self with a 1500 MTU on the 10G ports by themselves.)

This is all using a slightly modified pktgen, but normal pktgen should do just fine.
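
For anyone wanting to try plain pktgen, a minimal sketch along the lines of
Documentation/networking/pktgen.txt (the device name, addresses, and sizes
below are placeholders):

  modprobe pktgen
  # attach the device to the pktgen kernel thread for CPU 0
  echo "add_device eth0" > /proc/net/pktgen/kpktgend_0
  # 300-byte packets, run until stopped (count 0)
  echo "pkt_size 300" > /proc/net/pktgen/eth0
  echo "count 0" > /proc/net/pktgen/eth0
  echo "dst 192.168.0.53" > /proc/net/pktgen/eth0
  echo "dst_mac 00:11:22:33:44:55" > /proc/net/pktgen/eth0
  # start all threads; results show up in /proc/net/pktgen/eth0 afterwards
  echo "start" > /proc/net/pktgen/pgctrl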

Thanks,
Ben

>
> Ben.
>


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: get beyond 1Gbps with pktgen on 10Gb nic?
  2010-05-11 15:55   ` Ben Greear
@ 2010-05-12  4:00     ` Jon Zhou
  2010-05-12  4:51       ` Jesse Brandeburg
  0 siblings, 1 reply; 9+ messages in thread
From: Jon Zhou @ 2010-05-12  4:00 UTC (permalink / raw)
  To: Ben Greear, Ben Hutchings; +Cc: netdev@vger.kernel.org

I just used multiple netperf instances to reach 900K pps / 8Gb+ of traffic on the Broadcom 10G NIC:

command:

for i in 1 2 3 4 5 6 7 8 9 10
do
netperf -l 60 -H 192.168.0.53 -- -m 60 -s 100M -S 100M &
done

The message size was set to 64 bytes, but when I checked the file captured by tcpdump,
I found that netperf sent many frames which are larger than 64 bytes (i.e. 4000-10K+ bytes), and these frames
were truncated by tcpdump.

So the actual average packet size is around 1500 bytes, but what I want is an average packet size of 300-400 bytes while reaching 5Gb+.

Does that make sense?

thanks
jon

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: get beyond 1Gbps with pktgen on 10Gb nic?
  2010-05-12  4:00     ` Jon Zhou
@ 2010-05-12  4:51       ` Jesse Brandeburg
  2010-05-12 18:32         ` Rick Jones
  0 siblings, 1 reply; 9+ messages in thread
From: Jesse Brandeburg @ 2010-05-12  4:51 UTC (permalink / raw)
  To: Jon Zhou; +Cc: Ben Greear, Ben Hutchings, netdev@vger.kernel.org

On Tue, May 11, 2010 at 9:00 PM, Jon Zhou <Jon.Zhou@jdsu.com> wrote:
> I just used multiple netperf instances to reach 900K pps / 8Gb+ of traffic on the Broadcom 10G NIC:
>
> command:
>
> for i in 1 2 3 4 5 6 7 8 9 10
> do
> netperf -l 60 -H 192.168.0.53 -- -m 60 -s 100M -S 100M &
> done
>
> The message size was set to 64 bytes, but when I checked the file captured by tcpdump,
> I found that netperf sent many frames which are larger than 64 bytes (i.e. 4000-10K+ bytes), and these frames
> were truncated by tcpdump.
>
> So the actual average packet size is around 1500 bytes, but what I want is an average packet size of 300-400 bytes while reaching 5Gb+.
>
> Does that make sense?

If you set the TCP_NODELAY option (to disable Nagle) on netperf (check
netperf -t TCP_STREAM -- -h), then you should be able to control the packet
size.
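
A rough sketch of that (the test-specific -D option sets TCP_NODELAY and -m
sets the send size; the host and sizes are placeholders):

  netperf -t TCP_STREAM -H 192.168.0.53 -l 60 -- -m 300 -D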

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: get beyond 1Gbps with pktgen on 10Gb nic?
  2010-05-12  4:51       ` Jesse Brandeburg
@ 2010-05-12 18:32         ` Rick Jones
  2010-05-18 11:14           ` Jon Zhou
  0 siblings, 1 reply; 9+ messages in thread
From: Rick Jones @ 2010-05-12 18:32 UTC (permalink / raw)
  To: Jesse Brandeburg
  Cc: Jon Zhou, Ben Greear, Ben Hutchings, netdev@vger.kernel.org

Jesse Brandeburg wrote:
> On Tue, May 11, 2010 at 9:00 PM, Jon Zhou <Jon.Zhou@jdsu.com> wrote:
> 
>> I just used multiple netperf instances to reach 900K pps / 8Gb+ of traffic
>> on the Broadcom 10G NIC:

>>
>>command:
>>
>>for i in 1 2 3 4 5 6 7 8 9 10
>>do
>>netperf -l 60 -H 192.168.0.53 -- -m 60 -s 100M -S 100M &
>>done

100 megabytes seems a trifle excessive as a socket buffer size.  I would suggest 
lopping off a few zeros and using 1M instead.  Or, one can let Linux auto-tune 
the socket buffers/windows - just don't accept the socket buffer size reported 
by the classic netperf command - it is from the initial creation of the socket. 
To get what it became by the end of the test one should use the "omni" tests. 
Contact me offlist or via netperf-talk in the netperf.org domain for more on that.
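
For instance, the loop above with saner buffers would be something like:

  for i in 1 2 3 4 5 6 7 8 9 10
  do
  netperf -l 60 -H 192.168.0.53 -- -m 60 -s 1M -S 1M &
  done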

>> The message size was set to 64 bytes, but when I checked the file captured
>> by tcpdump, I found that netperf sent many frames which are larger than 64
>> bytes (i.e. 4000-10K+ bytes), and these frames were truncated by tcpdump.
>>
>> So the actual average packet size is around 1500 bytes, but what I want is
>> an average packet size of 300-400 bytes while reaching 5Gb+.
>>
>> Does that make sense?
> 
> 
> If you set the TCP_NODELAY option (to disable Nagle) on netperf

If he was seeing 4K to 10K byte frames in his tcpdump, that was likely TSO above 
and beyond Nagle.  I was going to say it also suggests he was running tcpdump on 
the sending side rather than the receiver, but then there is LRO/GRO, isn't there...

> (check netperf -t TCP_STREAM -- -h), then you should be able to control the
> packet size.

Or at least influence it meaningfully :)  If there are packet losses and 
retransmissions, the retransmissions, which may or may not include new data, 
may be larger.

The "netperf -t TCP_STREAM -- -h" invocation shown by Jesse, which prints the 
test-specific help, will show the option you need to add to set TCP_NODELAY. 
For additional descriptions of netperf command options:

http://www.netperf.org/svn/netperf2/tags/netperf-2.4.5/doc/netperf.html

For "quick and dirty" testing, the loop as it appears above is OK, but I would 
suggest abusing the confidence intervals code to minimize the skew error:

http://www.netperf.org/svn/netperf2/tags/netperf-2.4.5/doc/netperf.html#Using-Netperf-to-Measure-Aggregate-Performance
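
A hedged sketch of that approach (the iteration counts and confidence
parameters here are placeholders): the global -i option sets the maximum and
minimum number of test iterations, and -I the confidence level and width:

  netperf -H 192.168.0.53 -l 60 -i 10,3 -I 99,5 -- -m 60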

happy benchmarking,

rick jones



^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: get beyond 1Gbps with pktgen on 10Gb nic?
  2010-05-12 18:32         ` Rick Jones
@ 2010-05-18 11:14           ` Jon Zhou
  2010-05-18 16:50             ` Rick Jones
  0 siblings, 1 reply; 9+ messages in thread
From: Jon Zhou @ 2010-05-18 11:14 UTC (permalink / raw)
  To: Rick Jones; +Cc: netdev@vger.kernel.org

Hi Rick,

Do you mean "TCP_NODELAY" will make netperf send packets with the size I expect,

and that without this option netperf might send larger packets? (But eventually they will be split into MTU-sized segments?)

thanks
jon

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: get beyond 1Gbps with pktgen on 10Gb nic?
  2010-05-18 11:14           ` Jon Zhou
@ 2010-05-18 16:50             ` Rick Jones
  0 siblings, 0 replies; 9+ messages in thread
From: Rick Jones @ 2010-05-18 16:50 UTC (permalink / raw)
  To: Jon Zhou; +Cc: netdev@vger.kernel.org

Jon Zhou wrote:
> Hi Rick,
> 
> Do you mean "TCP_NODELAY" will make netperf send packets with the size I
> expect, and that without this option netperf might send larger packets? (But
> eventually they will be split into MTU-sized segments?)

First things first - netperf only ever calls send() with the size you give it 
via the command line.

It is what happens after that which matters - specifically, when and how TCP 
decides to send the data across the network.  Setting TCP_NODELAY will disable 
the Nagle Algorithm, which, 99 times out of 10, will cause each send() call by 
the application to be a separate TCP segment.  The 100th time out of 10, 
something like a retransmission or a zero window from the remote, etc., may still 
cause multiple small send() calls to be aggregated into larger segments.  How 
much larger will depend on the Maximum Segment Size (MSS) for the connection; 
the MTU is one of the inputs to the decision of what to use for the MSS.
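
(As an aside, one way to see the MSS a given connection is actually using -
a sketch assuming a reasonably current iproute2; the address is a placeholder:

  ss -ti dst 192.168.0.53

will show the mss, cwnd, and friends for matching connections.)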

At the end of this message is a bit of boilerplate I have on the aforementioned 
Nagle algorithm. It is a bit generic, not stack-specific.  It discusses issues 
beyond benchmarking considerations, so keep that in mind while you are reading it.

happy benchmarking,

rick jones


$ cat usenet_replies/nagle_algorithm

 > I'm not familiar with this issue, and I'm mostly ignorant about what
 > tcp does below the sockets interface. Can anybody briefly explain what
 > "nagle" is, and how and when to turn it off? Or point me to the
 > appropriate manual.

In broad terms, whenever an application does a send() call, the logic
of the Nagle algorithm is supposed to go something like this:

1) Is the quantity of data in this send, plus any queued, unsent data,
greater than the MSS (Maximum Segment Size) for this connection? If
yes, send the data in the user's send now (modulo any other
constraints such as receiver's advertised window and the TCP
congestion window). If no, go to 2.

2) Is the connection to the remote otherwise idle? That is, is there
no unACKed data outstanding on the network? If yes, send the data in
the user's send now. If no, queue the data and wait. Either the
application will continue to call send() with enough data to get to a
full MSS-worth of data, or the remote will ACK all the currently sent,
unACKed data, or our retransmission timer will expire.

Now, where applications run into trouble is when they have what might
be described as "write, write, read" behaviour, where they present
logically associated data to the transport in separate 'send' calls
and those sends are typically less than the MSS for the connection.
It isn't so much that they run afoul of Nagle as they run into issues
with the interaction of Nagle and the other heuristics operating on
the remote. In particular, the delayed ACK heuristics.

When a receiving TCP is deciding whether or not to send an ACK back to
the sender, in broad handwaving terms it goes through logic similar to
this:

a) is there data being sent back to the sender? if yes, piggy-back the
ACK on the data segment.

b) is there a window update being sent back to the sender? if yes,
piggy-back the ACK on the window update.

c) has the standalone ACK timer expired?

Window updates are generally triggered by the following heuristics:

i) would the window update be for a non-trivial fraction of the window
- typically somewhere at or above 1/4 the window, that is, has the
application "consumed" at least that much data? if yes, send a
window update. if no, check ii.

ii) would the window update be for at least 2*MSS worth of data the
application has "consumed"? if yes, send a window update; if no, wait.

Now, going back to that write, write, read application, on the sending
side, the first write will be transmitted by TCP via logic rule 2 -
the connection is otherwise idle. However, the second small send will
be delayed as there is at that point unACKnowledged data outstanding
on the connection.

At the receiver, that small TCP segment will arrive and will be passed
to the application. The application does not have the entire app-level
message, so it will not send a reply (data to TCP) back. The typical
TCP window is much much larger than the MSS, so no window update would
be triggered by heuristic i. The data just arrived is < 2*MSS, so no
window update from heuristic ii. Since there is no window update, no
ACK is sent by heuristic b.

So, that leaves heuristic c - the standalone ACK timer. That ranges
anywhere between 50 and 200 milliseconds depending on the TCP stack in
use.

If you've read this far :) now we can take a look at the effect of
various things touted as "fixes" to applications experiencing this
interaction.  We take as our example a client-server application where
both the client and the server are implemented with a write of a small
application header, followed by application data.  First, the
"default" case which is with Nagle enabled (TCP_NODELAY _NOT_ set) and
with standard ACK behaviour:

               Client                     Server
              Req Header        ->
                                <-        Standalone ACK after Nms
              Req Data          ->
                                <-        Possible standalone ACK
                                <-        Rsp Header
              Standalone ACK    ->
                                <-        Rsp Data
     Possible standalone ACK    ->


For two "messages" we end up with at least six segments on the wire.
The possible standalone ACKs will depend on whether the server's
response time, or client's think time is longer than the standalone
ACK interval on their respective sides. Now, if TCP_NODELAY is set we
see:


               Client                     Server
              Req Header        ->
              Req Data          ->
                                <-        Possible Standalone ACK after Nms
                                <-        Rsp Header
                                <-        Rsp Data
      Possible Standalone ACK   ->

In theory, we are down to four segments on the wire, which seems good,
but frankly we can do better.  First though, consider what happens
when someone disables delayed ACKs:

               Client                     Server
              Req Header        ->
                                <-        Immediate Standalone ACK
              Req Data          ->
                                <-        Immediate Standalone ACK
                                <-        Rsp Header
    Immediate Standalone ACK    ->
                                <-        Rsp Data
    Immediate Standalone ACK    ->

Now we definitely see 8 segments on the wire.  It will also be that way
if both TCP_NODELAY is set and delayed ACKs are disabled.

How about if the application did the "right" thing in the first place?
That is, sent the logically associated data at the same time:


               Client                     Server
              Request        ->
                             <-           Possible Standalone ACK
                                <-        Response
    Possible Standalone ACK    ->

We are down to two segments on the wire.

For "small" packets, the CPU cost is about the same regardless of data
or ACK.  This means that the application which is making the proper
gathering send call will spend far fewer CPU cycles in the networking
stack.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread

Thread overview: 9+ messages
2010-05-11 13:13 get beyond 1Gbps with pktgen on 10Gb nic? Jon Zhou
2010-05-11 13:35 ` Ben Hutchings
2010-05-11 15:12   ` Rick Jones
2010-05-11 15:55   ` Ben Greear
2010-05-12  4:00     ` Jon Zhou
2010-05-12  4:51       ` Jesse Brandeburg
2010-05-12 18:32         ` Rick Jones
2010-05-18 11:14           ` Jon Zhou
2010-05-18 16:50             ` Rick Jones
