netdev.vger.kernel.org archive mirror
* [PATCH net-next 1/2] udp: Record RPS flow in socket operations
@ 2014-10-27 18:01 Tom Herbert
  2014-10-27 18:01 ` [PATCH net-next 2/2] udp: Reset flow table for flows over unconnected sockets Tom Herbert
  2014-10-27 18:50 ` [PATCH net-next 1/2] udp: Record RPS flow in socket operations Eric Dumazet
  0 siblings, 2 replies; 13+ messages in thread
From: Tom Herbert @ 2014-10-27 18:01 UTC (permalink / raw)
  To: davem, netdev

Add calls to sock_rps_record_flow for udp_sendmsg, udp_sendpage
and udp_recvmsg. This enables RFS for connected UDP sockets.

Tested:
  Ran netperf UDP_RR with 200 flows, with and without UDP RSS enabled

Before fix:
  No RSS
    Client (connected UDP)
      36.87% CPU utilization
    Server (unconnected UDP)
      33.64% CPU utilization
    256/440/687 90/95/99% latencies
    727273 tps

  UDP RSS
    Client
      79.59% CPU utilization
    Server
      78.83% CPU utilization
    116/159/226 90/95/99% latencies
    1.60974e+06 tps

After fix:
  No RSS
    Client
      44.38% CPU utilization
    Server
      50.46% CPU utilization
    192/245/343 90/95/99% latencies
    1.01413e+06 tps

  UDP RSS
    Client
      79.98% CPU utilization
    Server
      80.35% CPU utilization
    113/158/230 90/95/99% latencies
    1.60622e+06 tps

Signed-off-by: Tom Herbert <therbert@google.com>
---
 net/ipv4/udp.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index cd0db54..9a0d346 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -881,6 +881,8 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	struct sk_buff *skb;
 	struct ip_options_data opt_copy;
 
+	sock_rps_record_flow(sk);
+
 	if (len > 0xFFFF)
 		return -EMSGSIZE;
 
@@ -1113,6 +1115,8 @@ int udp_sendpage(struct sock *sk, struct page *page, int offset,
 	struct udp_sock *up = udp_sk(sk);
 	int ret;
 
+	sock_rps_record_flow(sk);
+
 	if (flags & MSG_SENDPAGE_NOTLAST)
 		flags |= MSG_MORE;
 
@@ -1253,6 +1257,8 @@ int udp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	int is_udplite = IS_UDPLITE(sk);
 	bool slow;
 
+	sock_rps_record_flow(sk);
+
 	if (flags & MSG_ERRQUEUE)
 		return ip_recv_error(sk, msg, len, addr_len);
 
-- 
2.1.0.rc2.206.gedb03e5


* [PATCH net-next 2/2] udp: Reset flow table for flows over unconnected sockets
  2014-10-27 18:01 [PATCH net-next 1/2] udp: Record RPS flow in socket operations Tom Herbert
@ 2014-10-27 18:01 ` Tom Herbert
  2014-10-27 18:43   ` Eric Dumazet
  2014-10-27 18:50 ` [PATCH net-next 1/2] udp: Record RPS flow in socket operations Eric Dumazet
  1 sibling, 1 reply; 13+ messages in thread
From: Tom Herbert @ 2014-10-27 18:01 UTC (permalink / raw)
  To: davem, netdev

When receiving a packet on an unconnected UDP socket clear the
flow table for the corresponding hash. This is needed so flows over
unconnected UDP sockets will use RPS instead of using what is
present in the flow table. In particular, this avoids having flows
over unconnected sockets be perpetually steered by unrelated
entries in the flow table (idle TCP connections for instance).
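
For reference, the helper this patch invokes is the hash-keyed reset that
pairs with sock_rps_record_flow_hash(). Roughly paraphrased from
include/net/sock.h and include/linux/netdevice.h of this era (a sketch for
context, not part of the diff):

static inline void rps_reset_sock_flow(struct rps_sock_flow_table *table,
				       u32 hash)
{
	/* Clearing an entry means writing RPS_NO_CPU into the slot the
	 * hash indexes, so later packets with that hash fall back to RPS.
	 */
	if (table && hash)
		table->ents[hash & table->mask] = RPS_NO_CPU;
}

static inline void sock_rps_reset_flow_hash(__u32 hash)
{
#ifdef CONFIG_RPS
	struct rps_sock_flow_table *sock_flow_table;

	rcu_read_lock();
	sock_flow_table = rcu_dereference(rps_sock_flow_table);
	rps_reset_sock_flow(sock_flow_table, hash);
	rcu_read_unlock();
#endif
}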

Tested:

First filled up the RPS flow tables by creating a bunch of TCP
connections and letting them turn idle. Next, run netperf UDP_RR
with 200 flows.

Before fix:
  Client (connected UDP)
    81.15% CPU utilization
  Server (unconnected UDP)
    83.63% CPU utilization
  118/167/249 90/95/99% latencies
  1.59215e+06 tps

After fix:
  Client (connected UDP)
    81.13% CPU utilization
  Server (unconnected UDP)
    80.68% CPU utilization
  116/167/248 90/95/99% latencies
  1.61048e+06 tps

Signed-off-by: Tom Herbert <therbert@google.com>
---
 net/ipv4/udp.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 9a0d346..e58d841 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1451,6 +1451,11 @@ static int __udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 	if (inet_sk(sk)->inet_daddr) {
 		sock_rps_save_rxhash(sk, skb);
 		sk_mark_napi_id(sk, skb);
+	} else {
+		/* For an unconnected socket reset flow hash so that related
+		 * flow will use RPS.
+		 */
+		sock_rps_reset_flow_hash(skb->hash);
 	}
 
 	rc = sock_queue_rcv_skb(sk, skb);
-- 
2.1.0.rc2.206.gedb03e5


* Re: [PATCH net-next 2/2] udp: Reset flow table for flows over unconnected sockets
  2014-10-27 18:01 ` [PATCH net-next 2/2] udp: Reset flow table for flows over unconnected sockets Tom Herbert
@ 2014-10-27 18:43   ` Eric Dumazet
  2014-10-27 19:36     ` Tom Herbert
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2014-10-27 18:43 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, netdev

On Mon, 2014-10-27 at 11:01 -0700, Tom Herbert wrote:
> When receiving a packet on an unconnected UDP socket clear the
> flow table for the corresponding hash. This is needed so flows over
> unconnected UDP sockets will use RPS instead of using what is
> present in the flow table. In particular, this avoids having flows
> over unconnected sockets be perpetually steered by unrelated
> entries in the flow table (idle TCP connections for instance).
> 
> Tested:
> 
> First filled up the RPS flow tables by creating a bunch of TCP
> connections and letting them turn idle. Next, run netperf UDP_RR
> with 200 flows.
> 
> Before fix:
>   Client (connected UDP)
>     81.15% CPU utilization
>   Server (unconnected UDP)
>     83.63% CPU utilization
>   118/167/249 90/95/99% latencies
>   1.59215e+06 tps
> 
> After fix:
>   Client (connected UDP)
>     81.13% CPU utilization
>   Server (unconnected UDP)
>     80.68% CPU utilization
>   116/167/248 90/95/99% latencies
>   1.61048e+06 tps
> 
> Signed-off-by: Tom Herbert <therbert@google.com>
> ---
>  net/ipv4/udp.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 9a0d346..e58d841 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -1451,6 +1451,11 @@ static int __udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
>  	if (inet_sk(sk)->inet_daddr) {
>  		sock_rps_save_rxhash(sk, skb);
>  		sk_mark_napi_id(sk, skb);
> +	} else {
> +		/* For an unconnected socket reset flow hash so that related
> +		 * flow will use RPS.
> +		 */
> +		sock_rps_reset_flow_hash(skb->hash);
>  	}

I believe I already said this patch was wrong, Tom.

We need something else for UDP packets.

It's not because RFS is wrong for UDP packets that we want to make it
worse for TCP traffic.

We do not want UDP packets to gradually empty the flow table.


* Re: [PATCH net-next 1/2] udp: Record RPS flow in socket operations
  2014-10-27 18:01 [PATCH net-next 1/2] udp: Record RPS flow in socket operations Tom Herbert
  2014-10-27 18:01 ` [PATCH net-next 2/2] udp: Reset flow table for flows over unconnected sockets Tom Herbert
@ 2014-10-27 18:50 ` Eric Dumazet
  1 sibling, 0 replies; 13+ messages in thread
From: Eric Dumazet @ 2014-10-27 18:50 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, netdev

On Mon, 2014-10-27 at 11:01 -0700, Tom Herbert wrote:
> Add calls to sock_rps_record_flow for udp_sendmsg, udp_sendpage
> and udp_recvmsg. This enables RFS for connected UDP sockets.
> 
> Tested:

...

> 
> Signed-off-by: Tom Herbert <therbert@google.com>
> ---
>  net/ipv4/udp.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index cd0db54..9a0d346 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -881,6 +881,8 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
>  	struct sk_buff *skb;
>  	struct ip_options_data opt_copy;
>  
> +	sock_rps_record_flow(sk);
> +
>  	if (len > 0xFFFF)
>  		return -EMSGSIZE;
>  
> @@ -1113,6 +1115,8 @@ int udp_sendpage(struct sock *sk, struct page *page, int offset,
>  	struct udp_sock *up = udp_sk(sk);
>  	int ret;
>  
> +	sock_rps_record_flow(sk);
> +
>  	if (flags & MSG_SENDPAGE_NOTLAST)
>  		flags |= MSG_MORE;
>  
> @@ -1253,6 +1257,8 @@ int udp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
>  	int is_udplite = IS_UDPLITE(sk);
>  	bool slow;
>  
> +	sock_rps_record_flow(sk);
> +
>  	if (flags & MSG_ERRQUEUE)
>  		return ip_recv_error(sk, msg, len, addr_len);
>  

This patch is not needed.

All these paths go through af_inet.c, and the calls to sock_rps_record_flow()
are already made in inet_sendmsg(), inet_sendpage() and inet_recvmsg().
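
For reference, the generic wrapper already looks roughly like this
(paraphrased from net/ipv4/af_inet.c of this era, shown only to make the
point concrete):

int inet_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
		 size_t size)
{
	struct sock *sk = sock->sk;

	/* The RFS flow is recorded here, before the protocol sendmsg
	 * (udp_sendmsg for SOCK_DGRAM) is ever called.
	 */
	sock_rps_record_flow(sk);

	/* We may need to bind the socket. */
	if (!inet_sk(sk)->inet_num && !sk->sk_prot->no_autobind &&
	    inet_autobind(sk))
		return -EAGAIN;

	return sk->sk_prot->sendmsg(iocb, sk, msg, size);
}

inet_sendpage() and inet_recvmsg() make the same sock_rps_record_flow(sk)
call at the top.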

I wonder what you actually tested.


* Re: [PATCH net-next 2/2] udp: Reset flow table for flows over unconnected sockets
  2014-10-27 18:43   ` Eric Dumazet
@ 2014-10-27 19:36     ` Tom Herbert
  2014-10-27 23:19       ` Eric Dumazet
  0 siblings, 1 reply; 13+ messages in thread
From: Tom Herbert @ 2014-10-27 19:36 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Linux Netdev List

On Mon, Oct 27, 2014 at 11:43 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Mon, 2014-10-27 at 11:01 -0700, Tom Herbert wrote:
>> When receiving a packet on an unconnected UDP socket clear the
>> flow table for the corresponding hash. This is needed so flows over
>> unconnected UDP sockets will use RPS instead of using what is
>> present in the flow table. In particular, this avoids having flows
>> over unconnected sockets be perpetually steered by unrelated
>> entries in the flow table (idle TCP connections for instance).
>>
>> Tested:
>>
>> First filled up the RPS flow tables by creating a bunch of TCP
>> connections and letting them turn idle. Next, run netperf UDP_RR
>> with 200 flows.
>>
>> Before fix:
>>   Client (connected UDP)
>>     81.15% CPU utilization
>>   Server (unconnected UDP)
>>     83.63% CPU utilization
>>   118/167/249 90/95/99% latencies
>>   1.59215e+06 tps
>>
>> After fix:
>>   Client (connected UDP)
>>     81.13% CPU utilization
>>   Server (unconnected UDP)
>>     80.68% CPU utilization
>>   116/167/248 90/95/99% latencies
>>   1.61048e+06 tps
>>
>> Signed-off-by: Tom Herbert <therbert@google.com>
>> ---
>>  net/ipv4/udp.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
>> index 9a0d346..e58d841 100644
>> --- a/net/ipv4/udp.c
>> +++ b/net/ipv4/udp.c
>> @@ -1451,6 +1451,11 @@ static int __udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
>>       if (inet_sk(sk)->inet_daddr) {
>>               sock_rps_save_rxhash(sk, skb);
>>               sk_mark_napi_id(sk, skb);
>> +     } else {
>> +             /* For an unconnected socket reset flow hash so that related
>> +              * flow will use RPS.
>> +              */
>> +             sock_rps_reset_flow_hash(skb->hash);
>>       }
>
> I believe I already said this patch was wrong, Tom.
>
> We need something else for UDP packets.
>
> It's not because RFS is wrong for UDP packets that we want to make it
> worse for TCP traffic.
>
Please try this patch and provide real data to support your points.

> We do not want UDP packets to gradually empty the flow table.

If a TCP connection is hot it will continually refresh the table for
that connection; if the connection becomes idle it only takes one received
packet to restore the CPU. The only time there could be a persistent
problem is if the collision rate is high (which probably means the table
is too small).
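
For reference, that refresh is just a single array write keyed by the flow
hash; roughly paraphrased from include/linux/netdevice.h of this era (a
sketch for context):

static inline void rps_record_sock_flow(struct rps_sock_flow_table *table,
					u32 hash)
{
	if (table && hash) {
		unsigned int cpu, index = hash & table->mask;

		/* We only give a hint, preemption can change cpu under us */
		cpu = raw_smp_processor_id();

		/* Refreshing an entry is a no-op unless the flow moved,
		 * so a hot connection keeps its slot current cheaply.
		 */
		if (table->ents[index] != cpu)
			table->ents[index] = cpu;
	}
}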

>
>
>
>


* Re: [PATCH net-next 2/2] udp: Reset flow table for flows over unconnected sockets
  2014-10-27 19:36     ` Tom Herbert
@ 2014-10-27 23:19       ` Eric Dumazet
  2014-10-28  1:09         ` Tom Herbert
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2014-10-27 23:19 UTC (permalink / raw)
  To: Tom Herbert; +Cc: David Miller, Linux Netdev List

On Mon, 2014-10-27 at 12:36 -0700, Tom Herbert wrote:

> Please try this patch and provide real data to support your points.
> 

Yep. This is not good, I confirm my fear.

Google servers are shifting to serve both TCP & UDP traffic (QUIC
protocol), with an increasing UDP load.

Millions of packets per second per host, from millions of different
sources...

And your patch voids the RFS table and adds another cache miss in the
UDP rx fast path, which is already too expensive.


> If a TCP connection is hot it will continually refresh the table for
> that connection, if connection becomes idle it only takes one received
> packet to restore the CPU. The only time there could be a persistent
> problem is if collision rate is high (which probably means table is
> too small).


RFS already has a low hit/miss ratio; this patch helps neither
UDP nor TCP.

Ideally, RFS should be enabled on a per-protocol basis, not keyed on a
protocol-agnostic u32 flow hash.

Whatever strategy you implement, as long as different protocols share a
common hash table, it won't be perfect for mixed workloads.

The fundamental problem is that when a UDP packet arrives, it's not
possible to know whether it belongs to a 'flow' or not unless we perform
an expensive lookup, and then the RPS/RFS cost becomes prohibitive.

While for TCP, the current RFS cache miss is acceptable, because almost
all packets are for connected flows. We may occasionally get bad steering
for not-yet-established flows, where the stack performs poorly anyway.


* Re: [PATCH net-next 2/2] udp: Reset flow table for flows over unconnected sockets
  2014-10-27 23:19       ` Eric Dumazet
@ 2014-10-28  1:09         ` Tom Herbert
  2014-10-28  4:51           ` David Miller
  0 siblings, 1 reply; 13+ messages in thread
From: Tom Herbert @ 2014-10-28  1:09 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Linux Netdev List

On Mon, Oct 27, 2014 at 4:19 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Mon, 2014-10-27 at 12:36 -0700, Tom Herbert wrote:
>
>> Please try this patch and provide real data to support your points.
>>
>
> Yep. This is not good, I confirm my fear.
>
> Google servers are shifting to serve both TCP & UDP traffic (QUIC
> protocol), with an increasing UDP load.
>
> Millions of packets per second per host, from millions of different
> sources...
>
This indicates nothing about the merits of this patch. Nevertheless,
in order to avoid further rat-holing, and since this patch does change
a long-standing behavior, I'll respin it so the new behavior is enabled
only by sysctl.

Tom

> And your patch voids the RFS table and adds another cache miss in the
> UDP rx fast path, which is already too expensive.
>
>
>> If a TCP connection is hot it will continually refresh the table for
>> that connection; if the connection becomes idle it only takes one received
>> packet to restore the CPU. The only time there could be a persistent
>> problem is if the collision rate is high (which probably means the table
>> is too small).
>
>
> RFS already has a low hit/miss ratio; this patch helps neither
> UDP nor TCP.
>
> Ideally, RFS should be enabled on a per-protocol basis, not keyed on a
> protocol-agnostic u32 flow hash.
>
> Whatever strategy you implement, as long as different protocols share a
> common hash table, it won't be perfect for mixed workloads.
>
> The fundamental problem is that when a UDP packet arrives, it's not
> possible to know whether it belongs to a 'flow' or not unless we perform
> an expensive lookup, and then the RPS/RFS cost becomes prohibitive.
>
> While for TCP, the current RFS cache miss is acceptable, because almost
> all packets are for connected flows. We may occasionally get bad steering
> for not-yet-established flows, where the stack performs poorly anyway.
>
>
>


* Re: [PATCH net-next 2/2] udp: Reset flow table for flows over unconnected sockets
  2014-10-28  1:09         ` Tom Herbert
@ 2014-10-28  4:51           ` David Miller
  2014-10-28 15:18             ` Tom Herbert
  0 siblings, 1 reply; 13+ messages in thread
From: David Miller @ 2014-10-28  4:51 UTC (permalink / raw)
  To: therbert; +Cc: eric.dumazet, netdev

From: Tom Herbert <therbert@google.com>
Date: Mon, 27 Oct 2014 18:09:25 -0700

> This indicates nothing about the merits of this patch. Nevertheless,
> in order to avoid further rat-holing, and since this patch does change
> a long-standing behavior, I'll respin it so the new behavior is enabled
> only by sysctl.

Kind of disappointed on my end that you haven't addressed Eric's
main points, which are that:

1) A hash table shared between protocols will perform poorly for
   mixed workloads, which are becoming increasingly common.

2) UDP is fundamentally different from TCP with respect to the
   distinction between 'flow' and 'non-flow' packets.

I personally do not see you avoiding this conversation by simply
hiding the new behavior behind a sysctl; I still want you to address
it before I apply anything.


* Re: [PATCH net-next 2/2] udp: Reset flow table for flows over unconnected sockets
  2014-10-28  4:51           ` David Miller
@ 2014-10-28 15:18             ` Tom Herbert
  2014-10-28 17:38               ` Eric Dumazet
  0 siblings, 1 reply; 13+ messages in thread
From: Tom Herbert @ 2014-10-28 15:18 UTC (permalink / raw)
  To: David Miller; +Cc: Eric Dumazet, Linux Netdev List

On Mon, Oct 27, 2014 at 9:51 PM, David Miller <davem@davemloft.net> wrote:
> From: Tom Herbert <therbert@google.com>
> Date: Mon, 27 Oct 2014 18:09:25 -0700
>
>> This indicates nothing about the merits of this patch. Nevertheless,
>> in order to avoid further rat-holing, and since this patch does change
>> a long-standing behavior, I'll respin it so the new behavior is enabled
>> only by sysctl.
>
> Kind of disappointed on my end that you haven't addressed Eric's
> main points, which are that:
>
> 1) A hash table shared between protocols will perform poorly for
>    mixed workloads, which are becoming increasingly common.
>
The major design point of RFS is that it steers L4 flows based on a
hash of each flow. Preferably, this hash is based on the 5-tuple
of the (innermost) UDP, TCP, SCTP, etc. packet. It is a probabilistic
algorithm whose effectiveness depends on the hit rate in the table, hence
the table should be sized to the working set. In RFS, the working set
is defined by the number of simultaneously active flows, not by the
number of established flows, which could be much greater. We've known
from the beginning that for some servers with large amounts of
non-flow-based traffic (particularly DNS servers) RFS may not be
useful. If it's not feasible to size the table to the working set,
then RFS shouldn't be used.
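
For reference, the table being sized here is just a power-of-two array of
per-bucket CPU hints, indexed by the low bits of the flow hash; roughly
paraphrased from include/linux/netdevice.h of this era (a sketch for
context):

struct rps_sock_flow_table {
	unsigned int mask;	/* table size minus one; size is a power of two */
	u16 ents[0];		/* desired CPU, one entry per flow-hash bucket */
};

Two active flows whose hashes collide in the low bits share one entry,
which is why effectiveness degrades once simultaneously active flows
outnumber entries.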

> 2) UDP is fundamentally different from TCP with respect to the
>    distinction between 'flow' and 'non-flow' packets.
>
We are seeing many instances where UDP packets carry flows, and
conversely there are important cases where TCP packets do not
correspond to flows.

UDP tunnels are becoming increasingly common. VXLAN, FOU, GUE, geneve,
l2tp, esp/UDP, GRE/UDP, nvgre, etc. all rely on steering based on the
outer header without deep inspection. When the source port is set from
the inner flow hash, RFS works as is and steering is effectively done
based on the inner TCP connections. If aRFS supports UDP, then this
should also just work for UDP tunnels (another instance where we don't
need protocol-specific support in devices for tunneling).
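
As a rough illustration of the source-port trick (a hypothetical helper,
sketching the idea rather than any particular tunnel's code):

/* Hypothetical sketch: derive the outer UDP source port from the inner
 * flow hash before encapsulating, so steering done on the outer 5-tuple
 * (RSS, RPS, RFS) effectively follows the inner connection.
 */
static __be16 tunnel_pick_src_port(struct sk_buff *skb, u16 port_min,
				   u16 port_range)
{
	u32 hash = skb_get_hash(skb);	/* inner 5-tuple hash */

	return htons(port_min + (hash % port_range));
}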

QUIC itself is flow based. It is a transport protocol about as
sophisticated as TCP that is encapsulated in UDP to facilitate
transport. The fact that QUIC might have millions of simultaneously
active connections is a problem of scale, not of the algorithm. If we
have a server with millions of active TCP connections we'd have the
exact same scaling problem.

Under some DoS attacks, TCP packets are not flow based. For instance,
in a SYN attack, once we get into SYN cookies, SYNs are steered based
on whatever is in the table and the table is not updated for these
packets. This case exhibits the same characteristics as non-flow UDP.
In fact, this makes me think we should also be clearing the flow table
in this case.

> I personally do not see you avoiding this conversation by simply
> hiding the new behavior behind a sysctl, I still want you to address
> it before I apply anything.


* Re: [PATCH net-next 2/2] udp: Reset flow table for flows over unconnected sockets
  2014-10-28 15:18             ` Tom Herbert
@ 2014-10-28 17:38               ` Eric Dumazet
  2014-10-28 19:07                 ` Tom Herbert
  2014-10-29  1:35                 ` Tom Herbert
  0 siblings, 2 replies; 13+ messages in thread
From: Eric Dumazet @ 2014-10-28 17:38 UTC (permalink / raw)
  To: Tom Herbert; +Cc: David Miller, Linux Netdev List

On Tue, 2014-10-28 at 08:18 -0700, Tom Herbert wrote:

> UDP tunnels are becoming increasingly common. VXLAN, FOU, GUE, geneve,
> l2tp, esp/UDP, GRE/UDP, nvgre, etc. all rely on steering based on the
> outer header without deep inspection. When the source port is set from
> the inner flow hash, RFS works as is and steering is effectively done
> based on the inner TCP connections. If aRFS supports UDP, then this
> should also just work for UDP tunnels (another instance where we don't
> need protocol-specific support in devices for tunneling).


If you really wanted to solve this, you would need to change RFS to be
aware of the tunnel and find the L4 information, instead of the current
implementation stopping at the first UDP layer.

But as get_rps_cpu() / __skb_flow_dissect() have no way to find this,
you instead chose to invalidate RFS and maybe rely on RPS, because it
might help your workload.

Just to be clear: I tested the patch and saw a regression in my tests,
sending as little as one million UDP packets per second on the target.

Not only was UDP rx processing slower, but TCP flows were impacted as well.

With a table of 65536 slots, each slot was written 16 times per second
on average.

Google kernels have RFS_Hit/RFS_Miss SNMP counters to catch this kind of
problem. Maybe I should upstream this part.


* Re: [PATCH net-next 2/2] udp: Reset flow table for flows over unconnected sockets
  2014-10-28 17:38               ` Eric Dumazet
@ 2014-10-28 19:07                 ` Tom Herbert
  2014-10-28 19:59                   ` Eric Dumazet
  2014-10-29  1:35                 ` Tom Herbert
  1 sibling, 1 reply; 13+ messages in thread
From: Tom Herbert @ 2014-10-28 19:07 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Linux Netdev List

On Tue, Oct 28, 2014 at 10:38 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2014-10-28 at 08:18 -0700, Tom Herbert wrote:
>
>> UDP tunnels are becoming increasingly common. VXLAN, FOU, GUE, geneve,
>> l2tp, esp/UDP, GRE/UDP, nvgre, etc. all rely on steering based on the
>> outer header without deep inspection. When the source port is set from
>> the inner flow hash, RFS works as is and steering is effectively done
>> based on the inner TCP connections. If aRFS supports UDP, then this
>> should also just work for UDP tunnels (another instance where we don't
>> need protocol-specific support in devices for tunneling).
>
>
> If you really wanted to solve this, you would need to change RFS to be
> aware of the tunnel and find the L4 information, instead of the current
> implementation stopping at the first UDP layer.
>
I don't see what problem there is to solve. RFS already works for UDP
tunnels by design. If we do deep packet inspection for UDP tunnels it
is just additional complexity in flow_dissector, risks taking an
additional cache miss on every packet to load headers, and it's very
possible we'd do a lot of work just to hit an impenetrable wall (e.g.
the encapsulated headers are encrypted).

> But as get_rps_cpu() / __skb_flow_dissect() have no way to find this,
> you instead chose to invalidate RFS and maybe rely on RPS, because it
> might help your workload.
>
> Just to be clear: I tested the patch and saw a regression in my tests,
> sending as little as one million UDP packets per second on the target.
>
> Not only was UDP rx processing slower, but TCP flows were impacted as well.
>
> With a table of 65536 slots, each slot was written 16 times per second
> on average.
>
As I said, for some applications (like DNS, which I suspect you're
basically emulating) it is infeasible to size the table. Try disabling
RFS for your test.

> Google kernels have RFS_Hit/RFS_Miss SNMP counters to catch this kind of
> problem. Maybe I should upstream this part.
>
>
>


* Re: [PATCH net-next 2/2] udp: Reset flow table for flows over unconnected sockets
  2014-10-28 19:07                 ` Tom Herbert
@ 2014-10-28 19:59                   ` Eric Dumazet
  0 siblings, 0 replies; 13+ messages in thread
From: Eric Dumazet @ 2014-10-28 19:59 UTC (permalink / raw)
  To: Tom Herbert; +Cc: David Miller, Linux Netdev List

On Tue, 2014-10-28 at 12:07 -0700, Tom Herbert wrote:

> As I said, for some applications (like DNS, which I suspect you're
> basically emulating) it is infeasible to size the table. Try disabling
> RFS for your test.

Well, we already did experiments.

- DNS servers are using kernel bypass.
  Damn faster than SO_REUSEPORT on UDP anyway (Ying Cai is working on
  this problem, since QUIC does not yet use kernel bypass and wants
  FQ/pacing).

- Disable RFS for non-TCP flows.

- Or have separate hash tables for TCP/UDP (much the same effect, as the
  UDP table is mostly empty in our case).


Disabling RFS is the on/off behavior you seem to push, nice for
benchmarks without hassle.

I will no longer comment on this thread; it appears we disagree and won't
find an agreement.


* Re: [PATCH net-next 2/2] udp: Reset flow table for flows over unconnected sockets
  2014-10-28 17:38               ` Eric Dumazet
  2014-10-28 19:07                 ` Tom Herbert
@ 2014-10-29  1:35                 ` Tom Herbert
  1 sibling, 0 replies; 13+ messages in thread
From: Tom Herbert @ 2014-10-29  1:35 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Linux Netdev List

On Tue, Oct 28, 2014 at 10:38 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2014-10-28 at 08:18 -0700, Tom Herbert wrote:
>
>> UDP tunnels are becoming increasingly common. VXLAN, FOU, GUE, geneve,
>> l2tp, esp/UDP, GRE/UDP, nvgre, etc. all rely on steering based on the
>> outer header without deep inspection. When the source port is set from
>> the inner flow hash, RFS works as is and steering is effectively done
>> based on the inner TCP connections. If aRFS supports UDP, then this
>> should also just work for UDP tunnels (another instance where we don't
>> need protocol-specific support in devices for tunneling).
>
>
> If you really wanted to solve this, you would need to change RFS to be
> aware of the tunnel and find the L4 information, instead of the current
> implementation stopping at the first UDP layer.
>
> But as get_rps_cpu() / __skb_flow_dissect() have no way to find this,
> you instead chose to invalidate RFS and maybe rely on RPS, because it
> might help your workload.
>
> Just to be clear: I tested the patch and saw a regression in my tests,
> sending as little as one million UDP packets per second on the target.
>
Can you describe this test so that I can try to reproduce and maybe
debug the issue you're seeing with the patch?

Thanks,
Tom

> Not only was UDP rx processing slower, but TCP flows were impacted as well.
>
> With a table of 65536 slots, each slot was written 16 times per second
> on average.
>
> Google kernels have RFS_Hit/RFS_Miss SNMP counters to catch this kind of
> problem. Maybe I should upstream this part.
>
>
>

