netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 4/4] rps: Inspect GRE encapsulated packets to get flow hash
@ 2011-05-19 15:39 Tom Herbert
  2011-05-19 16:16 ` Eric Dumazet
  0 siblings, 1 reply; 6+ messages in thread
From: Tom Herbert @ 2011-05-19 15:39 UTC (permalink / raw)
  To: davem, netdev

Crack open GRE packets in __skb_get_rxhash to compute 4-tuple hash on
in encapsulated packet.  Note that this is used only when the
__skb_get_rxhash is taken, in particular only when the device does
not compute provide the rxhash (ie. feature is disabled).

This was tested by creating a single GRE tunnel between two 16 core
AMD machines.  200 netperf TCP_RR streams were ran with 1 byte
request and response size.

Without patch: 157497 tps, 50/90/99% latencies 1250/1292/1364 usecs
With patch: 325896 tps, 50/90/99% latencies 603/848/1169

Signed-off-by: Tom Herbert <therbert@google.com>
---
 net/core/dev.c |   22 ++++++++++++++++++++++
 1 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 0c83494..7799bbd 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2552,6 +2552,28 @@ again:
 	}
 
 	switch (ip_proto) {
+	case IPPROTO_GRE:
+		if (pskb_may_pull(skb, nhoff + 16)) {
+			u8 *h = skb->data + nhoff;
+			__be16 flags = *(__be16 *)h;
+
+			/*
+			 * Only look inside GRE if version zero and no
+			 * routing
+			 */
+			if (!(flags & (GRE_VERSION|GRE_ROUTING))) {
+				proto = *(__be16 *)(h + 2);
+				nhoff += 4;
+				if (flags & GRE_CSUM)
+					nhoff += 4;
+				if (flags & GRE_KEY)
+					nhoff += 4;
+				if (flags & GRE_SEQ)
+					nhoff += 4;
+				goto again;
+			}
+		}
+		break;
 	default:
 		break;
 	}
-- 
1.7.3.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 4/4] rps: Inspect GRE encapsulated packets to get flow hash
  2011-05-19 15:39 [PATCH 4/4] rps: Inspect GRE encapsulated packets to get flow hash Tom Herbert
@ 2011-05-19 16:16 ` Eric Dumazet
  2011-05-19 19:39   ` David Miller
  2011-05-19 19:56   ` Tom Herbert
  0 siblings, 2 replies; 6+ messages in thread
From: Eric Dumazet @ 2011-05-19 16:16 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, netdev

Le jeudi 19 mai 2011 à 08:39 -0700, Tom Herbert a écrit :
> Crack open GRE packets in __skb_get_rxhash to compute 4-tuple hash on
> in encapsulated packet.  Note that this is used only when the
> __skb_get_rxhash is taken, in particular only when the device does
> not compute provide the rxhash (ie. feature is disabled).
> 
> This was tested by creating a single GRE tunnel between two 16 core
> AMD machines.  200 netperf TCP_RR streams were ran with 1 byte
> request and response size.
> 
> Without patch: 157497 tps, 50/90/99% latencies 1250/1292/1364 usecs
> With patch: 325896 tps, 50/90/99% latencies 603/848/1169
> 
> Signed-off-by: Tom Herbert <therbert@google.com>
> ---
>  net/core/dev.c |   22 ++++++++++++++++++++++
>  1 files changed, 22 insertions(+), 0 deletions(-)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 0c83494..7799bbd 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2552,6 +2552,28 @@ again:
>  	}
>  
>  	switch (ip_proto) {
> +	case IPPROTO_GRE:
> +		if (pskb_may_pull(skb, nhoff + 16)) {
> +			u8 *h = skb->data + nhoff;
> +			__be16 flags = *(__be16 *)h;
> +
> +			/*
> +			 * Only look inside GRE if version zero and no
> +			 * routing
> +			 */
> +			if (!(flags & (GRE_VERSION|GRE_ROUTING))) {
> +				proto = *(__be16 *)(h + 2);
> +				nhoff += 4;
> +				if (flags & GRE_CSUM)
> +					nhoff += 4;
> +				if (flags & GRE_KEY)
> +					nhoff += 4;
> +				if (flags & GRE_SEQ)
> +					nhoff += 4;
> +				goto again;
> +			}
> +		}
> +		break;
>  	default:
>  		break;
>  	}


Hi Tom

For sure it helps if this machine is the final host for these packets.

If I am a firewall or router [and not looking into GRE packets], maybe I
dont want to spread all packets received on a tunnel to several cpus and
reorder them when forwarded.

Maybe we need to add a table, so that upper layer (GRE or IPIP tunnels)
can instruct __skb_get_rxhash() that we want to deep inspect packets.

1) Say we keep rxhash first evaluation be the one we have today.

2) Do a hash lookup in a new table to tell if upper layer handled a
previous packet for this first level flow and want more inspection.

3) table could contains 'pointers' to decoding function, that would
recompute a new rxhash function.

4) Find a way to "clean the table", garbage collect or expirations times
can do.

This way we can add stuff in GRE and IPIP modules [and other kind of
tunnels], without layer violations ?




^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 4/4] rps: Inspect GRE encapsulated packets to get flow hash
  2011-05-19 16:16 ` Eric Dumazet
@ 2011-05-19 19:39   ` David Miller
  2011-05-19 20:06     ` Tom Herbert
  2011-05-19 19:56   ` Tom Herbert
  1 sibling, 1 reply; 6+ messages in thread
From: David Miller @ 2011-05-19 19:39 UTC (permalink / raw)
  To: eric.dumazet; +Cc: therbert, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 19 May 2011 18:16:35 +0200

> For sure it helps if this machine is the final host for these packets.
> 
> If I am a firewall or router [and not looking into GRE packets], maybe I
> dont want to spread all packets received on a tunnel to several cpus and
> reorder them when forwarded.

Maybe is the operative word here.

Unless you look inside of the tunnel, you may not have any entropy
at all for packet steering and that seems to be what Tom is trying
to attack here.

Also, if we are properly keying the inner-flow, reordering isn't an
issue.  Actual flows will not be reordered.

> Maybe we need to add a table, so that upper layer (GRE or IPIP tunnels)
> can instruct __skb_get_rxhash() that we want to deep inspect packets.

Keep in mind that we have essentially already established that the
goal of this code is to obtain as much hash steering entropy as
possible without causing the reordering of traffic within a
TCP/UDP/SCTP/etc. connection.

And Tom's changes are consistent with those goals.

If we want to start having knobs and ways to change this policy that
is a totally seperate discussion from whether Tom's changes are
correct and ready to be applied, which I think they essentially are.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 4/4] rps: Inspect GRE encapsulated packets to get flow hash
  2011-05-19 16:16 ` Eric Dumazet
  2011-05-19 19:39   ` David Miller
@ 2011-05-19 19:56   ` Tom Herbert
  2011-05-19 20:17     ` Eric Dumazet
  1 sibling, 1 reply; 6+ messages in thread
From: Tom Herbert @ 2011-05-19 19:56 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, netdev

Eric,

Thanks for comments.

> For sure it helps if this machine is the final host for these packets.
>
I would hope it helps routers too, assuming RPS already helps non
encapsulated traffic in a router.

> If I am a firewall or router [and not looking into GRE packets], maybe I
> dont want to spread all packets received on a tunnel to several cpus and
> reorder them when forwarded.
>
Do know a specific use case where this patch would be a bad thing?  I
don't believe there's any special ooo constraints on the packets in a
tunnel, flow ordering should still be maintained.

> 3) table could contains 'pointers' to decoding function, that would
> recompute a new rxhash function.
>
This could be done as a function in the protocol switch table, so that
all the protocol specific code could be moved out of dev.c.  I thought
about that several times but haven't convinced myself it's worth it;
even with the code to handle tunneling the get_hash function is still
pretty elegant (I believe ipip support is just two more lines by the
way).

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 4/4] rps: Inspect GRE encapsulated packets to get flow hash
  2011-05-19 19:39   ` David Miller
@ 2011-05-19 20:06     ` Tom Herbert
  0 siblings, 0 replies; 6+ messages in thread
From: Tom Herbert @ 2011-05-19 20:06 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, netdev

> Unless you look inside of the tunnel, you may not have any entropy
> at all for packet steering and that seems to be what Tom is trying
> to attack here.
>
Entropy for RPS, but flow identifier for RFS.  Also, this is not just
a SW problem, RSS doesn't seem to do anything creative with GRE
packets... I'm hoping we'll be able get support in NICs to do this
same functionality.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 4/4] rps: Inspect GRE encapsulated packets to get flow hash
  2011-05-19 19:56   ` Tom Herbert
@ 2011-05-19 20:17     ` Eric Dumazet
  0 siblings, 0 replies; 6+ messages in thread
From: Eric Dumazet @ 2011-05-19 20:17 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, netdev

Le jeudi 19 mai 2011 à 12:56 -0700, Tom Herbert a écrit :
> Eric,
> 
> Thanks for comments.
> 
> > For sure it helps if this machine is the final host for these packets.
> >
> I would hope it helps routers too, assuming RPS already helps non
> encapsulated traffic in a router.

If conntracking is active, RPS probably helps. (Not tested yet here)

> 
> > If I am a firewall or router [and not looking into GRE packets], maybe I
> > dont want to spread all packets received on a tunnel to several cpus and
> > reorder them when forwarded.
> >
> Do know a specific use case where this patch would be a bad thing?  I
> don't believe there's any special ooo constraints on the packets in a
> tunnel, flow ordering should still be maintained.
> 

I was referring of reordering packets coming and outgoing, this could
defeat an AQM (the source carefully ordered packets with SFQ or other
qdisc, and we send them in different order).

You and David thought I was speaking of possible OOO for packets of a
given flow. I'm pretty confident you did this right :)

> > 3) table could contains 'pointers' to decoding function, that would
> > recompute a new rxhash function.
> >
> This could be done as a function in the protocol switch table, so that
> all the protocol specific code could be moved out of dev.c.  I thought
> about that several times but haven't convinced myself it's worth it;
> even with the code to handle tunneling the get_hash function is still
> pretty elegant (I believe ipip support is just two more lines by the
> way).

Yes, anyway this can be done later.

I find a bit annoying that such changes are pushed right before merge
window when we have litle time, thats all.

Thanks



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2011-05-19 20:17 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-05-19 15:39 [PATCH 4/4] rps: Inspect GRE encapsulated packets to get flow hash Tom Herbert
2011-05-19 16:16 ` Eric Dumazet
2011-05-19 19:39   ` David Miller
2011-05-19 20:06     ` Tom Herbert
2011-05-19 19:56   ` Tom Herbert
2011-05-19 20:17     ` Eric Dumazet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).