Re: GRO aggregation - Shlomo Pongartz

From: Shlomo Pongartz <shlomop@mellanox.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "netdev@vger.kernel.org" <netdev@vger.kernel.org>
Subject: Re: GRO aggregation
Date: Wed, 12 Sep 2012 17:41:52 +0300
Message-ID: <50509F30.30402@mellanox.com>
In-Reply-To: <1347442394.13103.703.camel@edumazet-glaptop>

On 9/12/2012 12:33 PM, Eric Dumazet wrote:
> On Wed, 2012-09-12 at 12:23 +0300, Shlomo Pongartz wrote:
>> On 9/11/2012 10:35 PM, Eric Dumazet wrote:
>>> On Tue, 2012-09-11 at 19:24 +0000, Shlomo Pongratz wrote:
>>>
>>>> I see that in ixgbe the weight for the NAPI is 64 (netif_napi_add). So
>>>> if packets arrive at a high rate and the CPU is fast enough to
>>>> collect them as they arrive, and packets continue to arrive while
>>>> the NAPI runs, then it should aggregate more, and we will have
>>>> fewer passes through the stack.
>>>>
>>> As I said, _if_ your cpu was loaded by other stuff, then you would see
>>> bigger GRO packets.
>>>
>>> GRO is not : "We want to kill latency and have big packets just because
>>> it's better"
>>>
>>> It's more like : If load is big enough, try to aggregate TCP frames into
>>> fewer skbs.
>>>
>>>
>>>
>>>
>> First I want to apologize for breaking the mailing thread. I wasn't at
>> work and used webmail.
>>
>> I agree with you, but I think that something is still strange.
>> On the transmitter side all the offloads are enabled, e.g. TSO and GSO.
>> The tcpdump on the sender side shows a size of 64240, which is 44 packets
>> of 1460 bytes each.
>> Since the offloads are enabled, the HW should transmit 44 frames
>> back to back, that is, in a burst of 44 * 1500 bytes, which according
>> to my calculation should take 52.8 microseconds on 10G Ethernet.
>> Using ethtool I've set rx-usecs to 1022 microseconds, which I think is
>> the maximal value for ixgbe.
>> Note that there is no way to set rx-frames on ixgbe.
>> Since the ixgbe weight is 64, I expected that the NAPI would be able
>> to poll more than 21 packets, since 44 packets came in one burst.
>> However, the results remain the same.
> TSO uses PAGE frags, so 64KB needs about 16 pages.
>
> tcp_sendmsg() could even use order-3 pages, so that only 2 pages would
> be needed to fill 64KB of data.
>
> GRO uses whatever fragment size provided by NIC, depending on MTU.
>
> One skb has a limit on number of frags.
>
> Handling a huge array of frags would be actually slower in some helper
> functions.
>
> Since you don't exactly describe why you ask all these questions, it's
> hard to guess what problem you are trying to solve.
>
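
The burst-time estimate in the quoted text above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function name is hypothetical, and it uses the same 1500-byte frame size as the message, so it ignores the Ethernet header, preamble and inter-frame gap.

```python
def burst_time_us(frames: int, frame_bytes: int, link_bps: float) -> float:
    """Microseconds needed to serialize `frames` back-to-back frames."""
    bits = frames * frame_bytes * 8
    return bits / link_bps * 1e6


tso_bytes = 64240   # size reported by tcpdump on the sender
mss = 1460          # TCP payload per frame, as stated in the message
frames = tso_bytes // mss                      # frames in one TSO burst
burst_us = burst_time_us(frames, 1500, 10e9)   # 10G Ethernet, 1500-byte frames

print(frames, round(burst_us, 1))
```

With these inputs the burst is far shorter than the 1022 microsecond rx-usecs setting, which is the point the quoted paragraph makes.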
Hi Eric

The TSO is just a means to create a burst of frames on the wire so that
the NAPI will be able to poll as many as possible.
I'm looking at the aggregation done by GRO on behalf of IPoIB. With
IPoIB I added a counter that counts how many packets were aggregated
before napi_complete is called (either directly or by net_rx_action)
and found that although the NAPI consumes 64 packets on average before
napi_complete is called, tcpdump shows that no more than 16-17
packets were aggregated. BTW, when I increased the MTU to 4K I did
reach 64K aggregation, which again is 16-17 packets.
So in order to see if 17 packets is the aggregation limit, I wanted to
see how ixgbe is doing, and found that it aggregates 21 packets.
So I wanted to know if there is another factor that governs the
aggregation, one that I can tune.
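
One way to reason about the 16-17 packet ceiling is to combine the per-skb frag limit mentioned in the quoted reply with a 64K size limit. The sketch below is a toy model, not the kernel's logic. It assumes each received packet occupies exactly one frag, a frag limit of 65536 / 4096 + 1 = 17 (a value chosen to resemble MAX_SKB_FRAGS with 4 KiB pages), and a 65535-byte cap on the aggregate. The payload sizes are illustrative, and the model does not account for the 21 packets seen on ixgbe.

```python
PAGE_SIZE = 4096                               # assumed page size
MAX_FRAGS_ASSUMED = 65536 // PAGE_SIZE + 1     # assumed per-skb frag limit (17)
MAX_AGGREGATE_BYTES = 65535                    # assumed cap on one aggregate


def max_aggregated(payload_bytes: int,
                   max_frags: int = MAX_FRAGS_ASSUMED,
                   max_total: int = MAX_AGGREGATE_BYTES) -> int:
    """Upper bound on packets merged into one skb in this toy model."""
    by_bytes = max_total // payload_bytes   # limit imposed by total size
    return min(max_frags, by_bytes)         # the tighter limit wins


# Illustrative payload sizes: roughly a 2K MTU and a 4K MTU.
for payload in (2000, 4000):
    print(payload, max_aggregated(payload))
```

In this model the frag count is the binding limit for small packets and the byte cap takes over for large ones, which would keep the packet count in the same range in both cases.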

Shlomo.
