From: Rick Jones <rick.jones2@hp.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>,
David Miller <davem@davemloft.net>,
netdev <netdev@vger.kernel.org>, Jesse Gross <jesse@nicira.com>
Subject: Re: [RFC] GRO scalability
Date: Fri, 05 Oct 2012 11:16:22 -0700
Message-ID: <506F23F6.1060704@hp.com>
In-Reply-To: <1349448747.21172.113.camel@edumazet-glaptop>
On 10/05/2012 07:52 AM, Eric Dumazet wrote:
> What we could do :
>
> 1) Use a hash to avoid expensive gro_list management and allow
> much more concurrent flows.
>
> Use skb_get_rxhash(skb) to compute rxhash
>
> If l4_rxhash not set -> not a GRO candidate.
>
> If l4_rxhash set, use a hash lookup to immediately finds a 'same flow'
> candidates.
>
> (tcp stack could eventually use rxhash instead of its custom hash
> computation ...)
>
> 2) Use a LRU list to eventually be able to 'flush' too old packets,
> even if the napi never completes. Each time we process a new packet,
> being a GRO candidate or not, we increment a napi->sequence, and we
> flush the oldest packet in gro_lru_list if its own sequence is too
> old.
>
> That would give a latency guarantee.
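
As a rough illustration of the two points quoted above (a toy model, not kernel code and not Eric's implementation): held packets are keyed by a flow hash, every received packet bumps a sequence counter, and the oldest held entry is flushed once it is more than `max_age` packets old. The class name `GroSketch`, the `max_age` knob, and the choice to measure age from the first held segment are all assumptions made for the sketch.

```python
from collections import OrderedDict


class GroSketch:
    """Toy model of hash-based flow lookup plus age-based flush."""

    def __init__(self, max_age):
        self.max_age = max_age      # hypothetical tunable: max age in packets
        self.sequence = 0           # stands in for the proposed napi->sequence
        # flow_hash -> [sequence when first held, segments merged so far];
        # insertion order keeps the oldest entry first.
        self.held = OrderedDict()
        self.delivered = []         # segment count of each packet passed up

    def receive(self, flow_hash):
        """Process one packet; flow_hash=None models 'l4_rxhash not set'."""
        self.sequence += 1
        if flow_hash is None:
            # Not a GRO candidate: deliver immediately.
            self.delivered.append(1)
        elif flow_hash in self.held:
            # Same-flow candidate found by hash lookup: merge.
            self.held[flow_hash][1] += 1
        else:
            self.held[flow_hash] = [self.sequence, 1]
        # Flush the oldest held entry if it is too old.
        if self.held:
            oldest = next(iter(self.held))
            first_seq, segments = self.held[oldest]
            if self.sequence - first_seq > self.max_age:
                del self.held[oldest]
                self.delivered.append(segments)


# Three flows arriving round-robin, with a window wide enough to merge.
gro = GroSketch(max_age=4)
for i in range(12):
    gro.receive(i % 3)
print(gro.delivered)
```

With `max_age=1` and the same traffic, every entry is flushed before its flow's next segment arrives, so nothing is ever merged.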

Flushing things once N packets have come through sounds like goodness,
and it reminds me a bit of what happens with IP fragment reassembly -
another area where the stack is trying to guess just how long to
hang on to a packet before doing something else with it. But the value
of N needed to get a "decent" per-flow GRO aggregation rate will depend
on the number of concurrent flows, right? If I want to have a good shot
at getting 2 segments combined for 1000 active, concurrent flows
entering my system via that interface, won't N have to approach 2000?
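
One way to put numbers on that question, under an assumption that is not in the thread: if each arriving packet belongs to one of F flows chosen uniformly at random, the chance that a held segment sees a second segment of its flow within the next N packets is 1 - (1 - 1/F)^N. The function names below are hypothetical, and real traffic is burstier than this model.

```python
import math


def merge_probability(flows, window):
    """Chance that the same flow sends again within `window` packets,
    assuming every packet picks one of `flows` flows uniformly at random."""
    return 1.0 - (1.0 - 1.0 / flows) ** window


def window_for(flows, target):
    """Smallest window giving at least `target` merge probability
    under the same assumption (requires flows >= 2, 0 < target < 1)."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - 1.0 / flows))


for n in (1000, 2000):
    print(n, round(merge_probability(1000, n), 2))
```

Under that assumption, 1000 flows with N = 1000 gives roughly a 0.63 chance of combining two segments, and N = 2000 gives roughly 0.86. Perfectly interleaved (round-robin) arrivals would instead need N of about the flow count.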

GRO (and HW LRO) have a fundamental limitation/disadvantage here. GRO
does provide a very nice "boost" in various situations (especially
numbers of concurrent netperfs that don't blow out the tracking limits)
but since it won't really know anything about the flow(s) involved (*)
or even their number (?), it will always be guessing. That is why it is
really only "poor man's JumboFrames" (or larger MTU - sadly, the IEEE
keeps us all beggars here).

A goodly portion of the benefit of GRO comes from the "incidental" ACK
avoidance it causes, yes? That being the case, might that be a
worthwhile avenue to explore? It would then naturally scale as TCP et
al. do today.

When we go to 40 GbE will we have 4x as many flows, or the same number
of 4x faster flows?

rick jones

* for example - does this TCP segment contain the last byte(s) of a
pipelined http request/response and the first byte(s) of the next one
and so should "flush" now?