Get rxhash fixes and RFS support in tun

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Get rxhash fixes and RFS support in tun
@ 2013-11-20 20:25 Tom Herbert
  2013-11-21  0:02 ` David Miller
  2013-11-21  5:03 ` Jason Wang
  0 siblings, 2 replies; 11+ messages in thread
From: Tom Herbert @ 2013-11-20 20:25 UTC (permalink / raw)
  To: davem; +Cc: netdev, edumazet, hkchu

This patch series fixes some subtle bugs in tun use of skb->rxhash, all
rxhash hash not be cleared appropraitely, and adds support for tun flows
to work with RFS.

Testing, in particular with tun, hasn't been completed yet.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Get rxhash fixes and RFS support in tun
  2013-11-20 20:25 Get rxhash fixes and RFS support in tun Tom Herbert
@ 2013-11-21  0:02 ` David Miller
  2013-11-21  1:09   ` Tom Herbert
  2013-11-21  5:03 ` Jason Wang
  1 sibling, 1 reply; 11+ messages in thread
From: David Miller @ 2013-11-21  0:02 UTC (permalink / raw)
  To: therbert; +Cc: netdev, edumazet, hkchu

From: Tom Herbert <therbert@google.com>
Date: Wed, 20 Nov 2013 12:25:48 -0800 (PST)

> This patch series fixes some subtle bugs in tun use of skb->rxhash, all
> rxhash hash not be cleared appropraitely, and adds support for tun flows
> to work with RFS.
> 
> Testing, in particular with tun, hasn't been completed yet.

I think this needs to be reworked slightly.

We really only have two boolean states:

1) Is the rxhash value in this SKB valid?

2) Is it a full L4 tuple hash?

You are adding a "this is a SW computed hash" boolean state but I do
not think you should distinguish sw vs. hw especially.  If the
hardware computed the rxhash on a tunneled packet in the
pre-decapsulated state, we very much want to recompute it, in
software, upon tunnel decapsulation in ip_tunnel_core.c

This is actually implemented in this patch set, by testing two states.
Both the "l4_rxhash" and "sw_rxhash".

Why don't we just do everything in a straightforward manner, where
nothing directly sets rxhash values.  Only helper routines do.

1) skb_set_rxhash(struct sk_buff *skb, __u32 rxhash, bool l4_rxhash)

   Update all drivers to call this.

2) Add "rxhash_valid" boolean to sk_buff, set it in skb_set_rxhash,
   test it in skb_get_rxhash(), propagate it in SKB copies.

3) Your skb_clear_rxhash() now just clears rxhash_valid.

Now, if the issue is that HW computed hashes sometimes do the tunnel
demux and look into the inner L4 headers to compute the hash, you'll
need a boolean to indicate _that_ rather than unconditionally treating
hardware that way.  Because not all of them will do this, and for
those that do not you do want to compute the hash in SW after tunnel
decapsulation.

Thoughts?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Get rxhash fixes and RFS support in tun
  2013-11-21  0:02 ` David Miller
@ 2013-11-21  1:09   ` Tom Herbert
  2013-11-21  1:12     ` Stephen Hemminger
  2013-11-21  2:23     ` David Miller
  0 siblings, 2 replies; 11+ messages in thread
From: Tom Herbert @ 2013-11-21  1:09 UTC (permalink / raw)
  To: David Miller; +Cc: Linux Netdev List, Eric Dumazet, Jerry Chu

On Wed, Nov 20, 2013 at 4:02 PM, David Miller <davem@davemloft.net> wrote:
> From: Tom Herbert <therbert@google.com>
> Date: Wed, 20 Nov 2013 12:25:48 -0800 (PST)
>
>> This patch series fixes some subtle bugs in tun use of skb->rxhash, all
>> rxhash hash not be cleared appropraitely, and adds support for tun flows
>> to work with RFS.
>>
>> Testing, in particular with tun, hasn't been completed yet.
>
> I think this needs to be reworked slightly.
>
> We really only have two boolean states:
>
> 1) Is the rxhash value in this SKB valid?
>
> 2) Is it a full L4 tuple hash?
>
> You are adding a "this is a SW computed hash" boolean state but I do
> not think you should distinguish sw vs. hw especially.  If the
> hardware computed the rxhash on a tunneled packet in the
> pre-decapsulated state, we very much want to recompute it, in
> software, upon tunnel decapsulation in ip_tunnel_core.c
>
In either case it would be recomputed in SW if L4 hash was not set
(i.e. no flow_dissector done finding L4).  If L4 hash is set, that
should refer to the hash of the inner 4-tuple, so I don't think you'd
need to recompute it.  I suppose there could be a case like encap in
UDP where there are potentially two 4-tuples to deal with, but then
the rxhash could just be cleared in decap to force recomputing hash
over the inner packet-- even better I would hope that the trick of
using outer source port to hold the hash of the inner packet (like in
nvgre) is used so that the hash on the outer header is good for L4.

The primary reason for sw_rhash is to know whether it is a comparable
value to match against that of a flow whose hash was computed in SW
(tun case).

> This is actually implemented in this patch set, by testing two states.
> Both the "l4_rxhash" and "sw_rxhash".
>
> Why don't we just do everything in a straightforward manner, where
> nothing directly sets rxhash values.  Only helper routines do.
>
> 1) skb_set_rxhash(struct sk_buff *skb, __u32 rxhash, bool l4_rxhash)
>
Yes, we should be using helper functions for this anyway.  I'll make
that change.

>    Update all drivers to call this.
>
> 2) Add "rxhash_valid" boolean to sk_buff, set it in skb_set_rxhash,
>    test it in skb_get_rxhash(), propagate it in SKB copies.
>
I think the sw_rxhash holds more useful information.  In
skb_get_rxhash() either we have an L4 hash or we try to get one by
doing the SW computation, this means we don't ever return a HW hash
which is not L4 so basically that case is treated as an invalid hash.
So after a call to skb_get_rxhash, at least one of l4_rxhash or
sw_rxhash is and we won't redo flow dissection on subsequent calls
unless the rxhash is cleared.

Thanks,
Tom

> 3) Your skb_clear_rxhash() now just clears rxhash_valid.
>
> Now, if the issue is that HW computed hashes sometimes do the tunnel
> demux and look into the inner L4 headers to compute the hash, you'll
> need a boolean to indicate _that_ rather than unconditionally treating
> hardware that way.  Because not all of them will do this, and for
> those that do not you do want to compute the hash in SW after tunnel
> decapsulation.
>
> Thoughts?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Get rxhash fixes and RFS support in tun
  2013-11-21  1:09   ` Tom Herbert
@ 2013-11-21  1:12     ` Stephen Hemminger
  2013-11-21  2:23     ` David Miller
  1 sibling, 0 replies; 11+ messages in thread
From: Stephen Hemminger @ 2013-11-21  1:12 UTC (permalink / raw)
  To: Tom Herbert; +Cc: David Miller, Linux Netdev List, Eric Dumazet, Jerry Chu

On Wed, 20 Nov 2013 17:09:01 -0800
Tom Herbert <therbert@google.com> wrote:

> On Wed, Nov 20, 2013 at 4:02 PM, David Miller <davem@davemloft.net> wrote:
> > From: Tom Herbert <therbert@google.com>
> > Date: Wed, 20 Nov 2013 12:25:48 -0800 (PST)
> >  
> >> This patch series fixes some subtle bugs in tun use of skb->rxhash, all
> >> rxhash hash not be cleared appropraitely, and adds support for tun flows
> >> to work with RFS.
> >>
> >> Testing, in particular with tun, hasn't been completed yet.  
> >
> > I think this needs to be reworked slightly.
> >
> > We really only have two boolean states:
> >
> > 1) Is the rxhash value in this SKB valid?
> >
> > 2) Is it a full L4 tuple hash?
> >
> > You are adding a "this is a SW computed hash" boolean state but I do
> > not think you should distinguish sw vs. hw especially.  If the
> > hardware computed the rxhash on a tunneled packet in the
> > pre-decapsulated state, we very much want to recompute it, in
> > software, upon tunnel decapsulation in ip_tunnel_core.c
> >  
> In either case it would be recomputed in SW if L4 hash was not set
> (i.e. no flow_dissector done finding L4).  If L4 hash is set, that
> should refer to the hash of the inner 4-tuple, so I don't think you'd
> need to recompute it.  I suppose there could be a case like encap in
> UDP where there are potentially two 4-tuples to deal with, but then
> the rxhash could just be cleared in decap to force recomputing hash
> over the inner packet-- even better I would hope that the trick of
> using outer source port to hold the hash of the inner packet (like in
> nvgre) is used so that the hash on the outer header is good for L4.
> 
> The primary reason for sw_rhash is to know whether it is a comparable
> value to match against that of a flow whose hash was computed in SW
> (tun case).

You can't guarantee that two hardware hashes from different devices
are the same.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Get rxhash fixes and RFS support in tun
  2013-11-21  1:09   ` Tom Herbert
  2013-11-21  1:12     ` Stephen Hemminger
@ 2013-11-21  2:23     ` David Miller
  2013-11-21  2:42       ` Tom Herbert
  1 sibling, 1 reply; 11+ messages in thread
From: David Miller @ 2013-11-21  2:23 UTC (permalink / raw)
  To: therbert; +Cc: netdev, edumazet, hkchu

From: Tom Herbert <therbert@google.com>
Date: Wed, 20 Nov 2013 17:09:01 -0800

> I think the sw_rxhash holds more useful information.  In
> skb_get_rxhash() either we have an L4 hash or we try to get one by
> doing the SW computation, this means we don't ever return a HW hash
> which is not L4 so basically that case is treated as an invalid hash.
> So after a call to skb_get_rxhash, at least one of l4_rxhash or
> sw_rxhash is and we won't redo flow dissection on subsequent calls
> unless the rxhash is cleared.

Ok.... one interesting issue remains, which is that the SW flow
dissector considers decapsulating protocols like AH and ESP as
"L4".

I think we'll need to do something about that at some point.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Get rxhash fixes and RFS support in tun
  2013-11-21  2:23     ` David Miller
@ 2013-11-21  2:42       ` Tom Herbert
  2013-11-21  2:50         ` David Miller
  0 siblings, 1 reply; 11+ messages in thread
From: Tom Herbert @ 2013-11-21  2:42 UTC (permalink / raw)
  To: David Miller; +Cc: Linux Netdev List, Eric Dumazet, Jerry Chu

On Wed, Nov 20, 2013 at 6:23 PM, David Miller <davem@davemloft.net> wrote:
> From: Tom Herbert <therbert@google.com>
> Date: Wed, 20 Nov 2013 17:09:01 -0800
>
>> I think the sw_rxhash holds more useful information.  In
>> skb_get_rxhash() either we have an L4 hash or we try to get one by
>> doing the SW computation, this means we don't ever return a HW hash
>> which is not L4 so basically that case is treated as an invalid hash.
>> So after a call to skb_get_rxhash, at least one of l4_rxhash or
>> sw_rxhash is and we won't redo flow dissection on subsequent calls
>> unless the rxhash is cleared.
>
> Ok.... one interesting issue remains, which is that the SW flow
> dissector considers decapsulating protocols like AH and ESP as
> "L4".
>
Thinking about a little more, maybe the valid bit is better.  Using
rxhash for both populating the flow hash table for consumption at a
low level and using it for flow lookup at a higher level might be at
odds.  In the original patches Eric mentioned that tun should always
compute its own hash anyway.

> I think we'll need to do something about that at some point.

We need the rxhash to be the value seen at the point of RPS (to do RFS
correctly), which I think probably means we don't ever want to change
it after the first calculation! (clearing at tunnel decap wouldn't be
correct either)  For ESP or AH, I believe it's appropriate to use SPI
as a substitute for ports.

Thanks for the comments,
Tom

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Get rxhash fixes and RFS support in tun
  2013-11-21  2:42       ` Tom Herbert
@ 2013-11-21  2:50         ` David Miller
  2013-11-21 16:28           ` Tom Herbert
  0 siblings, 1 reply; 11+ messages in thread
From: David Miller @ 2013-11-21  2:50 UTC (permalink / raw)
  To: therbert; +Cc: netdev, edumazet, hkchu

From: Tom Herbert <therbert@google.com>
Date: Wed, 20 Nov 2013 18:42:48 -0800

> We need the rxhash to be the value seen at the point of RPS (to do RFS
> correctly), which I think probably means we don't ever want to change
> it after the first calculation! (clearing at tunnel decap wouldn't be
> correct either)  For ESP or AH, I believe it's appropriate to use SPI
> as a substitute for ports.

But that means that all connections going over the same IPSEC path
hash to the same value.

I really think that xfrm_input() should zap the rxhash near the
existing nf_reset() call.

The same argument goes for tunnels, that is why ip_tunnel_core.c does
what it does with the rxhash clearing right now.

In both the IPSEC tunnel (not transport) and normal IP/GRE tunnel
cases, it's a completely new SKB receive, done via netif_rx(), after
decapsulation.

That leaves only IPSEC transport mode as the only case where RFS isn't
(re-)performed but we can build infrastructure to make that happen.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Get rxhash fixes and RFS support in tun
  2013-11-20 20:25 Get rxhash fixes and RFS support in tun Tom Herbert
  2013-11-21  0:02 ` David Miller
@ 2013-11-21  5:03 ` Jason Wang
  2013-11-22 20:09   ` Tom Herbert
  1 sibling, 1 reply; 11+ messages in thread
From: Jason Wang @ 2013-11-21  5:03 UTC (permalink / raw)
  To: Tom Herbert, davem; +Cc: netdev, edumazet, hkchu

On 11/21/2013 04:25 AM, Tom Herbert wrote:
> This patch series fixes some subtle bugs in tun use of skb->rxhash, all
> rxhash hash not be cleared appropraitely, and adds support for tun flows
> to work with RFS.
>
>
> Testing, in particular with tun, hasn't been completed yet.

Interesting work, did you plan to test this with a kvm guest or even a
multi-queue guest?

Thanks

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Get rxhash fixes and RFS support in tun
  2013-11-21  2:50         ` David Miller
@ 2013-11-21 16:28           ` Tom Herbert
  2013-11-21 18:03             ` David Miller
  0 siblings, 1 reply; 11+ messages in thread
From: Tom Herbert @ 2013-11-21 16:28 UTC (permalink / raw)
  To: David Miller; +Cc: Linux Netdev List, Eric Dumazet, Jerry Chu

On Wed, Nov 20, 2013 at 6:50 PM, David Miller <davem@davemloft.net> wrote:
> From: Tom Herbert <therbert@google.com>
> Date: Wed, 20 Nov 2013 18:42:48 -0800
>
>> We need the rxhash to be the value seen at the point of RPS (to do RFS
>> correctly), which I think probably means we don't ever want to change
>> it after the first calculation! (clearing at tunnel decap wouldn't be
>> correct either)  For ESP or AH, I believe it's appropriate to use SPI
>> as a substitute for ports.
>
> But that means that all connections going over the same IPSEC path
> hash to the same value.
>
> I really think that xfrm_input() should zap the rxhash near the
> existing nf_reset() call.
>
> The same argument goes for tunnels, that is why ip_tunnel_core.c does
> what it does with the rxhash clearing right now.
>
I suspect this is not the right thing to do any more.  Since we're
doing deep inspection now in flow_dissector, we should already have
the discovered the hash on the inner header if it's a standard encap.
In order to do RFS for tunneled packets at the NIC interface we don't
want recompute to a different L4 hash (which would be common case if
NIC did deep inspection on tunneled packets and we clear hash at
decap).

> In both the IPSEC tunnel (not transport) and normal IP/GRE tunnel
> cases, it's a completely new SKB receive, done via netif_rx(), after
> decapsulation.
>
> That leaves only IPSEC transport mode as the only case where RFS isn't
> (re-)performed but we can build infrastructure to make that happen.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Get rxhash fixes and RFS support in tun
  2013-11-21 16:28           ` Tom Herbert
@ 2013-11-21 18:03             ` David Miller
  0 siblings, 0 replies; 11+ messages in thread
From: David Miller @ 2013-11-21 18:03 UTC (permalink / raw)
  To: therbert; +Cc: netdev, edumazet, hkchu

From: Tom Herbert <therbert@google.com>
Date: Thu, 21 Nov 2013 08:28:18 -0800

> Since we're doing deep inspection now in flow_dissector, we should
> already have the discovered the hash on the inner header if it's a
> standard encap.

For AH and normal IP tunnels, this might be fine.  But this is not
possible for ESP since the inner headers are encrypted when the
flow_dissector takes a look.

And again, there is the issue of which hardware devices are doing this
properly.

And for those that do perform inner-tunnel deep inspection for rxhash
generation, how many layers of tunnels are they able to look beneath?

My impression is that they support one level, at best, which means the
rxhash is basically reliable only under a specific set of conditions.

I need you to elaborate with your knowledge of what hardware actually
supports and does in this area a bit more before I can have a real
opinion on what we should be doing.

Thanks.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Get rxhash fixes and RFS support in tun
  2013-11-21  5:03 ` Jason Wang
@ 2013-11-22 20:09   ` Tom Herbert
  0 siblings, 0 replies; 11+ messages in thread
From: Tom Herbert @ 2013-11-22 20:09 UTC (permalink / raw)
  To: Jason Wang; +Cc: David Miller, Linux Netdev List, Eric Dumazet, Jerry Chu

On Wed, Nov 20, 2013 at 9:03 PM, Jason Wang <jasowang@redhat.com> wrote:
> On 11/21/2013 04:25 AM, Tom Herbert wrote:
>> This patch series fixes some subtle bugs in tun use of skb->rxhash, all
>> rxhash hash not be cleared appropraitely, and adds support for tun flows
>> to work with RFS.
>>
>>
>> Testing, in particular with tun, hasn't been completed yet.
>
> Interesting work, did you plan to test this with a kvm guest or even a
> multi-queue guest?
>
Yes, I think that is the direction.  Once we clean up the hashing in
the stack, next thing will be to extend RPS/RFS and computed hashes
functionality all the way into the guest.  I'm not sure yet if the
flow director like mechanism (like in tun) is sufficient, or if we
want to expose aRFS to the guest driver.  As I mentioned, first we
need to test RFS integration in tun to see the effect.

> Thanks

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2013-11-22 20:09 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-11-20 20:25 Get rxhash fixes and RFS support in tun Tom Herbert
2013-11-21  0:02 ` David Miller
2013-11-21  1:09   ` Tom Herbert
2013-11-21  1:12     ` Stephen Hemminger
2013-11-21  2:23     ` David Miller
2013-11-21  2:42       ` Tom Herbert
2013-11-21  2:50         ` David Miller
2013-11-21 16:28           ` Tom Herbert
2013-11-21 18:03             ` David Miller
2013-11-21  5:03 ` Jason Wang
2013-11-22 20:09   ` Tom Herbert

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).