public inbox for netdev@vger.kernel.org
From: Eric Dumazet <eric.dumazet@gmail.com>
To: Rick Jones <rick.jones2@hp.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>,
	David Miller <davem@davemloft.net>,
	netdev <netdev@vger.kernel.org>, Jesse Gross <jesse@nicira.com>
Subject: Re: [RFC] GRO scalability
Date: Fri, 05 Oct 2012 22:06:18 +0200	[thread overview]
Message-ID: <1349467578.21172.178.camel@edumazet-glaptop> (raw)
In-Reply-To: <506F368F.3070403@hp.com>

On Fri, 2012-10-05 at 12:35 -0700, Rick Jones wrote:

> Just how much code path is there between NAPI and the socket?? (And I 
> guess just how much combining are you hoping for?)
> 

When GRO works correctly, you can save about 30% of CPU cycles, though it
depends on the workload...

Doubling MAX_SKB_FRAGS (allowing 32+1 MSS per GRO skb instead of 16+1)
gives an improvement as well...

> > Lets say we allow no more than 1ms of delay in GRO,
> 
> OK.  That means we can ignore HPC and FSI because they wouldn't tolerate 
> that kind of added delay anyway.  I'm not sure if that also then 
> eliminates the networked storage types.
> 

I used this 1ms delay as an example, but I never said it was a fixed
value ;)

Also remember one thing: this is the _max_ delay, in case your napi
handler is flooded. This almost never happens (tm)


> > this means we could have about 400 packets in the GRO queue (assuming
> > 1500 bytes packets)
> 
> How many flows are you going to have entering via that queue?  And just 
> how well "shuffled" will the segments of those flows be?  That is what 
> it all comes down to right?  How many (active) flows and how well 
> shuffled they are.  If the flows aren't well shuffled, you can get away 
> with a smallish coalescing context.  If they are perfectly shuffled and 
> greater in number than your delay allowance you get right back to square 
> one, with all the overhead of GRO attempts and none of the benefit.

Not sure what you mean by shuffled. We use a hash table to locate a flow,
but we also have an LRU list that keeps the packets ordered by their time
of entry into the 'GRO unit'.

If napi completes, the whole LRU list content is flushed to the IP stack
(napi_gro_flush()).

If napi doesn't complete, we would only flush the 'too old' packets found
in the LRU.

Note: this selective flush can be called once per napi run from
net_rx_action(). The extra cost of getting a somewhat precise timestamp
would be acceptable (one call to ktime_get() or get_cycles() every 64
packets).

This timestamp could be stored in napi->timestamp and refreshed once per
n->poll(n, weight) call.

> 
> If the flow count is < 400 to allow a decent shot at a non-zero 
> combining rate on well shuffled flows with the 400 packet limit, then 
> that means each flow is >= 12.5 Mbit/s on average at 5 Gbit/s 
> aggregated.  And I think you then get two segments per flow aggregated 
> at a time.  Is that consistent with what you expect to be the 
> characteristics of the flows entering via that queue?

If a packet can't stay more than 1ms, then a flow sending fewer than 1000
packets per second won't benefit from GRO.

So yes, 12.5 Mbit/s would be the threshold.

By the way, when TCP timestamps are used and the hosts are Linux machines
with HZ=1000, current GRO cannot coalesce successive packets anyway,
because their TCP timestamp options differ.

(So it would not be useful to try a sojourn time bigger than 1ms.)


Thread overview: 41+ messages
2012-09-27 12:48 [PATCH net-next 3/3] ipv4: gre: add GRO capability Eric Dumazet
2012-09-27 17:52 ` Jesse Gross
2012-09-27 18:08   ` Eric Dumazet
2012-09-27 18:19     ` Eric Dumazet
2012-09-27 22:03       ` Jesse Gross
2012-09-28 14:04         ` Eric Dumazet
2012-10-01 20:56           ` Jesse Gross
2012-10-05 14:52             ` [RFC] GRO scalability Eric Dumazet
2012-10-05 18:16               ` Rick Jones
2012-10-05 19:00                 ` Eric Dumazet
2012-10-05 19:35                   ` Rick Jones
2012-10-05 20:06                     ` Eric Dumazet [this message]
2012-10-08 16:40                       ` Rick Jones
2012-10-08 16:59                         ` Eric Dumazet
2012-10-08 17:49                           ` Rick Jones
2012-10-08 17:55                             ` Eric Dumazet
2012-10-08 17:56                               ` Eric Dumazet
2012-10-08 18:58                                 ` [RFC] napi: limit GRO latency Stephen Hemminger
2012-10-08 19:10                                   ` David Miller
2012-10-08 19:12                                     ` Stephen Hemminger
2012-10-08 19:30                                       ` Eric Dumazet
2012-10-08 19:40                                         ` Stephen Hemminger
2012-10-08 19:46                                           ` Eric Dumazet
2012-10-08 19:21                                   ` Eric Dumazet
2012-10-08 18:21                               ` [RFC] GRO scalability Rick Jones
2012-10-08 18:28                                 ` Eric Dumazet
2012-10-06  4:11               ` Herbert Xu
2012-10-06  5:08                 ` Eric Dumazet
2012-10-06  5:14                   ` Herbert Xu
2012-10-06  6:22                     ` Eric Dumazet
2012-10-06  7:00                       ` Eric Dumazet
2012-10-06 10:56                         ` Herbert Xu
2012-10-06 18:08                           ` [PATCH] net: gro: selective flush of packets Eric Dumazet
2012-10-07  0:32                             ` Herbert Xu
2012-10-07  5:29                               ` Eric Dumazet
2012-10-08  7:39                                 ` Eric Dumazet
2012-10-08 16:42                                   ` Rick Jones
2012-10-08 17:10                                     ` Eric Dumazet
2012-10-08 18:52                             ` David Miller
2012-09-27 22:03     ` [PATCH net-next 3/3] ipv4: gre: add GRO capability Jesse Gross
2012-10-01 21:04 ` David Miller
