netdev.vger.kernel.org archive mirror
* [net-next 0/3] Improve UDP multicast receive latency
@ 2013-10-01 19:33 Shawn Bohrer
  2013-10-01 19:33 ` [net-next 1/3] udp: Only allow busy read/poll on connected sockets Shawn Bohrer
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Shawn Bohrer @ 2013-10-01 19:33 UTC (permalink / raw)
  To: David Miller; +Cc: Eric Dumazet, tomk, netdev, Shawn Bohrer

The removal of the routing cache in 3.6 had impacted the latency of our
UDP multicast workload.  This patch series brings down the latency to
what we were seeing with 3.4.

Patch 1 "udp: Only allow busy read/poll on connected sockets" is mostly
done for correctness and because it allows unifying the unicast and
multicast paths when a socket is found in early demux.  It can also
improve latency for a connected multicast socket if busy read/poll is
used.
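
As a rough illustration of that case (not part of the series; the
addresses, ports, and busy poll budget below are made up), a receiver
can join the group and then connect() to the known sender so the
kernel sees a connected socket:

/*
 * Illustrative only: open a multicast receiver, connect() it to the
 * sender, and opt in to busy polling.  Error handling is omitted and
 * SO_BUSY_POLL may be missing from older userspace headers.
 */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

#ifndef SO_BUSY_POLL
#define SO_BUSY_POLL 46         /* value on asm-generic archs, new in 3.11 */
#endif

static int open_connected_mcast_rx(void)
{
        struct sockaddr_in local = { .sin_family = AF_INET,
                                     .sin_port = htons(5000) };
        struct sockaddr_in sender = { .sin_family = AF_INET,
                                      .sin_port = htons(5001) };
        struct ip_mreq mreq = { .imr_interface.s_addr = htonl(INADDR_ANY) };
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        int busy_usecs = 50;    /* spin up to 50us waiting for packets */

        inet_pton(AF_INET, "239.1.1.1", &local.sin_addr);
        inet_pton(AF_INET, "239.1.1.1", &mreq.imr_multiaddr);
        inet_pton(AF_INET, "192.0.2.10", &sender.sin_addr);

        bind(fd, (struct sockaddr *)&local, sizeof(local));
        setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq));
        /* connect() sets the remote address patch 1 checks for */
        connect(fd, (struct sockaddr *)&sender, sizeof(sender));
        setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL, &busy_usecs, sizeof(busy_usecs));
        return fd;
}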

Patches 2&3 remove the fib lookups and restore latency for our workload
to the pre 3.6 levels.

Benchmark results from a netperf UDP_RR test:
3.11 kernel   90596.44 transactions/s
3.11 + series 91792.70 transactions/s

Benchmark results from a fio 1 byte UDP multicast pingpong test
(Multicast one way unicast response):
3.11 kernel   12.647us RTT
3.11 + series 12.233us RTT

Shawn Bohrer (3):
  udp: Only allow busy read/poll on connected sockets
  udp: Add udp early demux
  net: ipv4 only populate IP_PKTINFO when needed

 include/net/ip.h       |    2 +-
 include/net/sock.h     |    2 +-
 include/net/udp.h      |    1 +
 net/ipv4/af_inet.c     |    1 +
 net/ipv4/ip_sockglue.c |    5 +-
 net/ipv4/raw.c         |    2 +-
 net/ipv4/udp.c         |  160 +++++++++++++++++++++++++++++++++++++++++------
 net/ipv6/udp.c         |    5 +-
 8 files changed, 150 insertions(+), 28 deletions(-)

-- 
1.7.7.6



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [net-next 1/3] udp: Only allow busy read/poll on connected sockets
  2013-10-01 19:33 [net-next 0/3] Improve UDP multicast receive latency Shawn Bohrer
@ 2013-10-01 19:33 ` Shawn Bohrer
  2013-10-01 20:44   ` Eric Dumazet
  2013-10-01 19:33 ` [net-next 2/3] udp: Add udp early demux Shawn Bohrer
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: Shawn Bohrer @ 2013-10-01 19:33 UTC (permalink / raw)
  To: David Miller; +Cc: Eric Dumazet, tomk, netdev, Shawn Bohrer

UDP sockets can receive packets from multiple endpoints, and thus those
packets may arrive on multiple receive queues.  Since packets can arrive
on multiple receive queues, we should not mark the napi_id for all
packets.  This makes busy read/poll only work for connected UDP sockets.

This additionally enables busy read/poll for UDP multicast packets as
long as the socket is connected by moving the check into
__udp_queue_rcv_skb().

Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
---
 net/ipv4/udp.c |    5 +++--
 net/ipv6/udp.c |    5 +++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 728ce95..1982a03 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1405,8 +1405,10 @@ static int __udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 {
 	int rc;
 
-	if (inet_sk(sk)->inet_daddr)
+	if (inet_sk(sk)->inet_daddr) {
 		sock_rps_save_rxhash(sk, skb);
+		sk_mark_napi_id(sk, skb);
+	}
 
 	rc = sock_queue_rcv_skb(sk, skb);
 	if (rc < 0) {
@@ -1716,7 +1718,6 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 	if (sk != NULL) {
 		int ret;
 
-		sk_mark_napi_id(sk, skb);
 		ret = udp_queue_rcv_skb(sk, skb);
 		sock_put(sk);
 
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index f405815..84e18ab 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -549,8 +549,10 @@ static int __udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 {
 	int rc;
 
-	if (!ipv6_addr_any(&inet6_sk(sk)->daddr))
+	if (!ipv6_addr_any(&inet6_sk(sk)->daddr)) {
 		sock_rps_save_rxhash(sk, skb);
+		sk_mark_napi_id(sk, skb);
+	}
 
 	rc = sock_queue_rcv_skb(sk, skb);
 	if (rc < 0) {
@@ -844,7 +846,6 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 	if (sk != NULL) {
 		int ret;
 
-		sk_mark_napi_id(sk, skb);
 		ret = udpv6_queue_rcv_skb(sk, skb);
 		sock_put(sk);
 
-- 
1.7.7.6



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [net-next 2/3] udp: Add udp early demux
  2013-10-01 19:33 [net-next 0/3] Improve UDP multicast receive latency Shawn Bohrer
  2013-10-01 19:33 ` [net-next 1/3] udp: Only allow busy read/poll on connected sockets Shawn Bohrer
@ 2013-10-01 19:33 ` Shawn Bohrer
  2013-10-01 20:12   ` Rick Jones
  2013-10-01 20:52   ` Eric Dumazet
  2013-10-01 19:33 ` [net-next 3/3] net: ipv4 only populate IP_PKTINFO when needed Shawn Bohrer
  2013-10-01 20:21 ` [net-next 0/3] Improve UDP multicast receive latency Veaceslav Falico
  3 siblings, 2 replies; 19+ messages in thread
From: Shawn Bohrer @ 2013-10-01 19:33 UTC (permalink / raw)
  To: David Miller; +Cc: Eric Dumazet, tomk, netdev, Shawn Bohrer

The removal of the routing cache introduced a performance regression for
some UDP workloads since a dst lookup must be done for each packet.
This change caches the dst per socket in a similar manner to what we do
for TCP by implementing early_demux.

For UDP multicast we can only cache the dst if there is only one
receiving socket on the host.  Since caching only works when there is
one receiving socket we do the multicast socket lookup using RCU.

Benchmark results from a netperf UDP_RR test:
Before 90596.44 transactions/s
After  91296.97 transactions/s

Benchmark results from a fio 1 byte UDP multicast pingpong test
(Multicast one way unicast response):
Before 12.647us RTT
After  12.497us RTT

Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
---
 include/net/sock.h |    2 +-
 include/net/udp.h  |    1 +
 net/ipv4/af_inet.c |    1 +
 net/ipv4/udp.c     |  153 +++++++++++++++++++++++++++++++++++++++++++++------
 4 files changed, 138 insertions(+), 19 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index cf91c8e..46661dd 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -218,7 +218,7 @@ struct cg_proto;
   *	@sk_lock:	synchronizer
   *	@sk_rcvbuf: size of receive buffer in bytes
   *	@sk_wq: sock wait queue and async head
-  *	@sk_rx_dst: receive input route used by early tcp demux
+  *	@sk_rx_dst: receive input route used by early demux
   *	@sk_dst_cache: destination cache
   *	@sk_dst_lock: destination cache lock
   *	@sk_policy: flow policy
diff --git a/include/net/udp.h b/include/net/udp.h
index 510b8cb..fe4ba9f 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -175,6 +175,7 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
 		     unsigned int hash2_nulladdr);
 
 /* net/ipv4/udp.c */
+void udp_v4_early_demux(struct sk_buff *skb);
 int udp_get_port(struct sock *sk, unsigned short snum,
 		 int (*saddr_cmp)(const struct sock *,
 				  const struct sock *));
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 7a1874b..3539ddf 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1548,6 +1548,7 @@ static const struct net_protocol tcp_protocol = {
 };
 
 static const struct net_protocol udp_protocol = {
+	.early_demux =	udp_v4_early_demux,
 	.handler =	udp_rcv,
 	.err_handler =	udp_err,
 	.no_policy =	1,
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 1982a03..ca54886 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -565,6 +565,26 @@ struct sock *udp4_lib_lookup(struct net *net, __be32 saddr, __be16 sport,
 }
 EXPORT_SYMBOL_GPL(udp4_lib_lookup);
 
+static inline bool __udp_is_mcast_sock(struct net *net, struct sock *sk,
+				       __be16 loc_port, __be32 loc_addr,
+				       __be16 rmt_port, __be32 rmt_addr,
+				       int dif, unsigned short hnum)
+{
+	struct inet_sock *inet = inet_sk(sk);
+
+	if (!net_eq(sock_net(sk), net) ||
+	    udp_sk(sk)->udp_port_hash != hnum ||
+	    (inet->inet_daddr && inet->inet_daddr != rmt_addr) ||
+	    (inet->inet_dport != rmt_port && inet->inet_dport) ||
+	    (inet->inet_rcv_saddr && inet->inet_rcv_saddr != loc_addr) ||
+	    ipv6_only_sock(sk) ||
+	    (sk->sk_bound_dev_if && sk->sk_bound_dev_if != dif))
+		return false;
+	if (!ip_mc_sf_allow(sk, loc_addr, rmt_addr, dif))
+		return false;
+	return true;
+}
+
 static inline struct sock *udp_v4_mcast_next(struct net *net, struct sock *sk,
 					     __be16 loc_port, __be32 loc_addr,
 					     __be16 rmt_port, __be32 rmt_addr,
@@ -575,20 +595,11 @@ static inline struct sock *udp_v4_mcast_next(struct net *net, struct sock *sk,
 	unsigned short hnum = ntohs(loc_port);
 
 	sk_nulls_for_each_from(s, node) {
-		struct inet_sock *inet = inet_sk(s);
-
-		if (!net_eq(sock_net(s), net) ||
-		    udp_sk(s)->udp_port_hash != hnum ||
-		    (inet->inet_daddr && inet->inet_daddr != rmt_addr) ||
-		    (inet->inet_dport != rmt_port && inet->inet_dport) ||
-		    (inet->inet_rcv_saddr &&
-		     inet->inet_rcv_saddr != loc_addr) ||
-		    ipv6_only_sock(s) ||
-		    (s->sk_bound_dev_if && s->sk_bound_dev_if != dif))
-			continue;
-		if (!ip_mc_sf_allow(s, loc_addr, rmt_addr, dif))
-			continue;
-		goto found;
+		if (__udp_is_mcast_sock(net, s,
+					loc_port, loc_addr,
+					rmt_port, rmt_addr,
+					dif, hnum))
+			goto found;
 	}
 	s = NULL;
 found:
@@ -1581,6 +1592,14 @@ static void flush_stack(struct sock **stack, unsigned int count,
 		kfree_skb(skb1);
 }
 
+static void udp_sk_rx_dst_set(struct sock *sk, const struct sk_buff *skb)
+{
+	struct dst_entry *dst = skb_dst(skb);
+
+	dst_hold(dst);
+	sk->sk_rx_dst = dst;
+}
+
 /*
  *	Multicasts and broadcasts go to each listener.
  *
@@ -1709,11 +1728,28 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 	if (udp4_csum_init(skb, uh, proto))
 		goto csum_error;
 
-	if (rt->rt_flags & (RTCF_BROADCAST|RTCF_MULTICAST))
-		return __udp4_lib_mcast_deliver(net, skb, uh,
-				saddr, daddr, udptable);
+	if (skb->sk) {
+		int ret;
+		sk = skb->sk;
+
+		if (unlikely(sk->sk_rx_dst == NULL))
+			udp_sk_rx_dst_set(sk, skb);
+
+		ret = udp_queue_rcv_skb(sk, skb);
+
+		/* a return value > 0 means to resubmit the input, but
+		 * it wants the return to be -protocol, or 0
+		 */
+		if (ret > 0)
+			return -ret;
+		return 0;
+	} else {
+		if (rt->rt_flags & (RTCF_BROADCAST|RTCF_MULTICAST))
+			return __udp4_lib_mcast_deliver(net, skb, uh,
+					saddr, daddr, udptable);
 
-	sk = __udp4_lib_lookup_skb(skb, uh->source, uh->dest, udptable);
+		sk = __udp4_lib_lookup_skb(skb, uh->source, uh->dest, udptable);
+	}
 
 	if (sk != NULL) {
 		int ret;
@@ -1771,6 +1807,87 @@ drop:
 	return 0;
 }
 
+/* We can only early demux multicast if there is a single matching socket.
+ * If more than one socket found returns NULL
+ */
+static struct sock *__udp4_lib_mcast_demux_lookup(struct net *net,
+						  __be16 loc_port, __be32 loc_addr,
+						  __be16 rmt_port, __be32 rmt_addr,
+						  int dif)
+{
+	struct sock *sk, *result;
+	struct hlist_nulls_node *node;
+	unsigned short hnum = ntohs(loc_port);
+	unsigned int count, slot = udp_hashfn(net, hnum, udp_table.mask);
+	struct udp_hslot *hslot = &udp_table.hash[slot];
+
+	rcu_read_lock();
+begin:
+	count = 0;
+	result = NULL;
+	sk_nulls_for_each_rcu(sk, node, &hslot->head) {
+		if (__udp_is_mcast_sock(net, sk,
+					loc_port, loc_addr,
+					rmt_port, rmt_addr,
+					dif, hnum)) {
+			result = sk;
+			++count;
+		}
+	}
+	/*
+	 * if the nulls value we got at the end of this lookup is
+	 * not the expected one, we must restart lookup.
+	 * We probably met an item that was moved to another chain.
+	 */
+	if (get_nulls_value(node) != slot)
+		goto begin;
+
+	if (result) {
+		if (count != 1 ||
+		    unlikely(!atomic_inc_not_zero_hint(&result->sk_refcnt, 2)))
+			result = NULL;
+	}
+	rcu_read_unlock();
+	return result;
+
+}
+
+void udp_v4_early_demux(struct sk_buff *skb)
+{
+	const struct iphdr *iph = ip_hdr(skb);
+	const struct udphdr *uh = udp_hdr(skb);
+	struct sock *sk;
+	struct dst_entry *dst;
+	struct net *net = dev_net(skb->dev);
+	int dif = skb->dev->ifindex;
+
+	/* validate the packet */
+	if (!pskb_may_pull(skb, skb_transport_offset(skb) + sizeof(struct udphdr)))
+		return;
+
+	if (skb->pkt_type == PACKET_BROADCAST ||
+	    skb->pkt_type == PACKET_MULTICAST)
+		sk = __udp4_lib_mcast_demux_lookup(net, uh->dest, iph->daddr,
+						   uh->source, iph->saddr, dif);
+	else if (skb->pkt_type == PACKET_HOST)
+		sk = __udp4_lib_lookup(net, iph->saddr, uh->source,
+				       iph->daddr, uh->dest, dif, &udp_table);
+	else
+		return;
+
+	if (!sk)
+		return;
+
+	skb->sk = sk;
+	skb->destructor = sock_edemux;
+	dst = sk->sk_rx_dst;
+
+	if (dst)
+		dst = dst_check(dst, 0);
+	if (dst)
+		skb_dst_set_noref(skb, dst);
+}
+
 int udp_rcv(struct sk_buff *skb)
 {
 	return __udp4_lib_rcv(skb, &udp_table, IPPROTO_UDP);
-- 
1.7.7.6



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [net-next 3/3] net: ipv4 only populate IP_PKTINFO when needed
  2013-10-01 19:33 [net-next 0/3] Improve UDP multicast receive latency Shawn Bohrer
  2013-10-01 19:33 ` [net-next 1/3] udp: Only allow busy read/poll on connected sockets Shawn Bohrer
  2013-10-01 19:33 ` [net-next 2/3] udp: Add udp early demux Shawn Bohrer
@ 2013-10-01 19:33 ` Shawn Bohrer
  2013-10-01 20:42   ` Eric Dumazet
  2013-10-01 20:21 ` [net-next 0/3] Improve UDP multicast receive latency Veaceslav Falico
  3 siblings, 1 reply; 19+ messages in thread
From: Shawn Bohrer @ 2013-10-01 19:33 UTC (permalink / raw)
  To: David Miller; +Cc: Eric Dumazet, tomk, netdev, Shawn Bohrer

Since the removal of the routing cache, computing
fib_compute_spec_dst() requires a fib_table lookup for each UDP
multicast packet received.  This has introduced a performance
regression for some UDP workloads.

This change skips populating the packet info for sockets that do not have
IP_PKTINFO set.
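
(For reference, and not part of this patch: a receiver that does want
the ancillary data opts in with something like the following, where
"fd" is just a placeholder UDP socket.)

	int on = 1;

	/* ask the kernel to attach struct in_pktinfo to received datagrams */
	setsockopt(fd, IPPROTO_IP, IP_PKTINFO, &on, sizeof(on));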

Benchmark results from a netperf UDP_RR test:
Before 91296.97 transactions/s
After  91792.70 transactions/s

Benchmark results from a fio 1 byte UDP multicast pingpong test
(Multicast one way unicast response):
Before 12.647us RTT
After  12.233us RTT

Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
---
 include/net/ip.h       |    2 +-
 net/ipv4/ip_sockglue.c |    5 +++--
 net/ipv4/raw.c         |    2 +-
 net/ipv4/udp.c         |    2 +-
 4 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index 16078f4..bc98241 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -459,7 +459,7 @@ int ip_options_rcv_srr(struct sk_buff *skb);
  *	Functions provided by ip_sockglue.c
  */
 
-void ipv4_pktinfo_prepare(struct sk_buff *skb);
+void	ipv4_pktinfo_prepare(struct sock *sk, struct sk_buff *skb);
 void ip_cmsg_recv(struct msghdr *msg, struct sk_buff *skb);
 int ip_cmsg_send(struct net *net, struct msghdr *msg, struct ipcm_cookie *ipc);
 int ip_setsockopt(struct sock *sk, int level, int optname, char __user *optval,
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 56e3445..dda9866 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -1052,11 +1052,12 @@ e_inval:
  * destination in skb->cb[] before dst drop.
  * This way, receiver doesnt make cache line misses to read rtable.
  */
-void ipv4_pktinfo_prepare(struct sk_buff *skb)
+void ipv4_pktinfo_prepare(struct sock *sk, struct sk_buff *skb)
 {
 	struct in_pktinfo *pktinfo = PKTINFO_SKB_CB(skb);
 
-	if (skb_rtable(skb)) {
+	if ((inet_sk(sk)->cmsg_flags & IP_CMSG_PKTINFO) &&
+	    skb_rtable(skb)) {
 		pktinfo->ipi_ifindex = inet_iif(skb);
 		pktinfo->ipi_spec_dst.s_addr = fib_compute_spec_dst(skb);
 	} else {
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index a3fe534..28694f8 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -297,7 +297,7 @@ static int raw_rcv_skb(struct sock *sk, struct sk_buff *skb)
 {
 	/* Charge it to the socket. */
 
-	ipv4_pktinfo_prepare(skb);
+	ipv4_pktinfo_prepare(sk, skb);
 	if (sock_queue_rcv_skb(sk, skb) < 0) {
 		kfree_skb(skb);
 		return NET_RX_DROP;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index ca54886..02185a5 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1543,7 +1543,7 @@ int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 
 	rc = 0;
 
-	ipv4_pktinfo_prepare(skb);
+	ipv4_pktinfo_prepare(sk, skb);
 	bh_lock_sock(sk);
 	if (!sock_owned_by_user(sk))
 		rc = __udp_queue_rcv_skb(sk, skb);
-- 
1.7.7.6



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [net-next 2/3] udp: Add udp early demux
  2013-10-01 19:33 ` [net-next 2/3] udp: Add udp early demux Shawn Bohrer
@ 2013-10-01 20:12   ` Rick Jones
  2013-10-01 22:26     ` Shawn Bohrer
  2013-10-01 20:52   ` Eric Dumazet
  1 sibling, 1 reply; 19+ messages in thread
From: Rick Jones @ 2013-10-01 20:12 UTC (permalink / raw)
  To: Shawn Bohrer; +Cc: David Miller, Eric Dumazet, tomk, netdev

On 10/01/2013 12:33 PM, Shawn Bohrer wrote:
> The removal of the routing cache introduced a performance regression for
> some UDP workloads since a dst lookup must be done for each packet.
> This change caches the dst per socket in a similar manner to what we do
> for TCP by implementing early_demux.
>
> For UDP multicast we can only cache the dst if there is only one
> receiving socket on the host.  Since caching only works when there is
> one receiving socket we do the multicast socket lookup using RCU.
>
> Benchmark results from a netperf UDP_RR test:
> Before 90596.44 transactions/s
> After  91296.97 transactions/s

Were those measured with confidence intervals enabled?  It would be a 
Good Idea (tm) to either use that - I would suggest -I 99,1 -i 30,3 
added to the global portion of the netperf command line - or take 
several runs.  (If you've not already done so, since those look more 
like "raw" netperf numbers rather than the average of several runs.)

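For instance, something like the following, where the target host and
the 1-byte request/response sizes are only placeholders:

  netperf -H <host> -t UDP_RR -I 99,1 -i 30,3 -- -r 1,1
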
happy benchmarking,

rick jones

> Benchmark results from a fio 1 byte UDP multicast pingpong test
> (Multicast one way unicast response):
> Before 12.647us RTT
> After  12.497us RTT

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [net-next 0/3] Improve UDP multicast receive latency
  2013-10-01 19:33 [net-next 0/3] Improve UDP multicast receive latency Shawn Bohrer
                   ` (2 preceding siblings ...)
  2013-10-01 19:33 ` [net-next 3/3] net: ipv4 only populate IP_PKTINFO when needed Shawn Bohrer
@ 2013-10-01 20:21 ` Veaceslav Falico
  3 siblings, 0 replies; 19+ messages in thread
From: Veaceslav Falico @ 2013-10-01 20:21 UTC (permalink / raw)
  To: Shawn Bohrer; +Cc: David Miller, Eric Dumazet, tomk, netdev

On Tue, Oct 01, 2013 at 02:33:42PM -0500, Shawn Bohrer wrote:
>The removal of the routing cache in 3.6 had impacted the latency of our
>UDP multicast workload.  This patch series brings down the latency to
>what we were seeing with 3.4.
>
>Patch 1 "udp: Only allow busy read/poll on connected sockets" is mostly
>done for correctness and because it allows unifying the unicast and
>multicast paths when a socket is found in early demux.  It can also
>improve latency for a connected multicast socket if busy read/poll is
>used.
>
>Patches 2&3 remove the fib lookups and restore latency for our workload
>to the pre 3.6 levels.
>
>Benchmark results from a netperf UDP_RR test:
>3.11 kernel   90596.44 transactions/s
>3.11 + series 91792.70 transactions/s
>
>Benchmark results from a fio 1 byte UDP multicast pingpong test
>(Multicast one way unicast response):
>3.11 kernel   12.647us RTT
>3.11 + series 12.233us RTT
>
>Shawn Bohrer (3):
>  udp: Only allow busy read/poll on connected sockets
>  udp: Add udp early demux
>  net: ipv4 only populate IP_PKTINFO when needed
>
> include/net/ip.h       |    2 +-
> include/net/sock.h     |    2 +-
> include/net/udp.h      |    1 +
> net/ipv4/af_inet.c     |    1 +
> net/ipv4/ip_sockglue.c |    5 +-
> net/ipv4/raw.c         |    2 +-
> net/ipv4/udp.c         |  160 +++++++++++++++++++++++++++++++++++++++++------
> net/ipv6/udp.c         |    5 +-
> 8 files changed, 150 insertions(+), 28 deletions(-)
>
>-- 
>1.7.7.6
>
>
>-- 
>
>---------------------------------------------------------------
>This email, along with any attachments, is confidential. If you
>believe you received this message in error, please contact the
>sender immediately and delete all copies of the message.
>Thank you.

It's not a good idea to send patches with that kind of footer, afaik.

>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [net-next 3/3] net: ipv4 only populate IP_PKTINFO when needed
  2013-10-01 19:33 ` [net-next 3/3] net: ipv4 only populate IP_PKTINFO when needed Shawn Bohrer
@ 2013-10-01 20:42   ` Eric Dumazet
  2013-10-01 22:29     ` Shawn Bohrer
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2013-10-01 20:42 UTC (permalink / raw)
  To: Shawn Bohrer; +Cc: David Miller, tomk, netdev

On Tue, 2013-10-01 at 14:33 -0500, Shawn Bohrer wrote:

> -void ipv4_pktinfo_prepare(struct sk_buff *skb)
> +void ipv4_pktinfo_prepare(struct sock *sk, struct sk_buff *skb)


Seems good to me, could you use :

void ipv4_pktinfo_prepare(const struct sock *sk, struct sk_buff *skb)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [net-next 1/3] udp: Only allow busy read/poll on connected sockets
  2013-10-01 19:33 ` [net-next 1/3] udp: Only allow busy read/poll on connected sockets Shawn Bohrer
@ 2013-10-01 20:44   ` Eric Dumazet
  0 siblings, 0 replies; 19+ messages in thread
From: Eric Dumazet @ 2013-10-01 20:44 UTC (permalink / raw)
  To: Shawn Bohrer; +Cc: David Miller, tomk, netdev

On Tue, 2013-10-01 at 14:33 -0500, Shawn Bohrer wrote:
> UDP sockets can receive packets from multiple endpoints, and thus those
> packets may arrive on multiple receive queues.  Since packets can arrive
> on multiple receive queues, we should not mark the napi_id for all
> packets.  This makes busy read/poll only work for connected UDP sockets.
> 
> This additionally enables busy read/poll for UDP multicast packets as
> long as the socket is connected by moving the check into
> __udp_queue_rcv_skb().
> 
> Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
> ---

Suggested-by: Eric Dumazet <edumazet@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>

Thanks for following up, it seems I forgot to submit this patch ;)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [net-next 2/3] udp: Add udp early demux
  2013-10-01 19:33 ` [net-next 2/3] udp: Add udp early demux Shawn Bohrer
  2013-10-01 20:12   ` Rick Jones
@ 2013-10-01 20:52   ` Eric Dumazet
  2013-10-02 17:34     ` Shawn Bohrer
  1 sibling, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2013-10-01 20:52 UTC (permalink / raw)
  To: Shawn Bohrer; +Cc: David Miller, tomk, netdev

On Tue, 2013-10-01 at 14:33 -0500, Shawn Bohrer wrote:
> The removal of the routing cache introduced a performance regression for
> some UDP workloads since a dst lookup must be done for each packet.
> This change caches the dst per socket in a similar manner to what we do
> for TCP by implementing early_demux.
> 
> For UDP multicast we can only cache the dst if there is only one
> receiving socket on the host.  Since caching only works when there is
> one receiving socket we do the multicast socket lookup using RCU.

For unicast, we should find a matching socket for early demux only if
this is a connected socket.

Otherwise, forwarding setups will break.

You probably need to add a minimum score to __udp4_lib_lookup()

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [net-next 2/3] udp: Add udp early demux
  2013-10-01 20:12   ` Rick Jones
@ 2013-10-01 22:26     ` Shawn Bohrer
  0 siblings, 0 replies; 19+ messages in thread
From: Shawn Bohrer @ 2013-10-01 22:26 UTC (permalink / raw)
  To: Rick Jones; +Cc: David Miller, Eric Dumazet, tomk, netdev

On Tue, Oct 01, 2013 at 01:12:18PM -0700, Rick Jones wrote:
> On 10/01/2013 12:33 PM, Shawn Bohrer wrote:
> >The removal of the routing cache introduced a performance regression for
> >some UDP workloads since a dst lookup must be done for each packet.
> >This change caches the dst per socket in a similar manner to what we do
> >for TCP by implementing early_demux.
> >
> >For UDP multicast we can only cache the dst if there is only one
> >receiving socket on the host.  Since caching only works when there is
> >one receiving socket we do the multicast socket lookup using RCU.
> >
> >Benchmark results from a netperf UDP_RR test:
> >Before 90596.44 transactions/s
> >After  91296.97 transactions/s
> 
> Were those measured with confidence intervals enabled?  It would be
> a Good Idea (tm) to either use that - I would suggest -I 99,1 -i
> 30,3 added to the global portion of the netperf command line - or
> take several runs.  (If you've not already done so since those look
> more like "raw" netperf numbers rather than the average of several
> runs)

Those are the averages from six one-minute UDP_RR tests.  If I
re-run the netperf numbers I'll use the confidence intervals; I did
not know about that feature.  In general I just added these quick
numbers as a point of reference and to double check that some open
source benchmarks could even see a difference.  For me the real
benchmark is my workload which does show a much more significant
difference with these changes.

--
Shawn


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [net-next 3/3] net: ipv4 only populate IP_PKTINFO when needed
  2013-10-01 20:42   ` Eric Dumazet
@ 2013-10-01 22:29     ` Shawn Bohrer
  0 siblings, 0 replies; 19+ messages in thread
From: Shawn Bohrer @ 2013-10-01 22:29 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, tomk, netdev

On Tue, Oct 01, 2013 at 01:42:30PM -0700, Eric Dumazet wrote:
> On Tue, 2013-10-01 at 14:33 -0500, Shawn Bohrer wrote:
> 
> > -void ipv4_pktinfo_prepare(struct sk_buff *skb)
> > +void ipv4_pktinfo_prepare(struct sock *sk, struct sk_buff *skb)
> 
> 
> Seems good to me, could you use :
> 
> void ipv4_pktinfo_prepare(const struct sock *sk, struct sk_buff *skb)

Yep, I'll make that const and resend.

--
Shawn


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [net-next 2/3] udp: Add udp early demux
  2013-10-01 20:52   ` Eric Dumazet
@ 2013-10-02 17:34     ` Shawn Bohrer
  2013-10-02 18:09       ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Shawn Bohrer @ 2013-10-02 17:34 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, tomk, netdev

On Tue, Oct 01, 2013 at 01:52:49PM -0700, Eric Dumazet wrote:
> On Tue, 2013-10-01 at 14:33 -0500, Shawn Bohrer wrote:
> > The removal of the routing cache introduced a performance regression for
> > some UDP workloads since a dst lookup must be done for each packet.
> > This change caches the dst per socket in a similar manner to what we do
> > for TCP by implementing early_demux.
> > 
> > For UDP multicast we can only cache the dst if there is only one
> > receiving socket on the host.  Since caching only works when there is
> > one receiving socket we do the multicast socket lookup using RCU.
> 
> For unicast, we should find a matching socket for early demux only if
> this is a connected socket.
> 
> Otherwise, forwarding setups will break.
> 
> You probably need to add a minimum score to __udp4_lib_lookup()

Perhaps I'm missing something but I don't think a minimum score would
work because compute_score() and compute_score2() have several ways of
returning a score of, let's say, 4 and I don't think they all mean the
socket is connected.  Why not just test the socket returned by
__udp4_lib_lookup() to see if it is connected in
udp_v4_early_demux()?  Something like:

        sk = __udp4_lib_lookup(net, iph->saddr, uh->source,
                               iph->daddr, uh->dest, dif,
                               &udp_table);
        /* Only demux connected sockets or forwarding setups will break */
        if (sk && !inet_sk(sk)->inet_daddr)
                return;

--
Shawn


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [net-next 2/3] udp: Add udp early demux
  2013-10-02 17:34     ` Shawn Bohrer
@ 2013-10-02 18:09       ` Eric Dumazet
  2013-10-02 20:35         ` Shawn Bohrer
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2013-10-02 18:09 UTC (permalink / raw)
  To: Shawn Bohrer; +Cc: David Miller, tomk, netdev

On Wed, 2013-10-02 at 12:34 -0500, Shawn Bohrer wrote:
> On Tue, Oct 01, 2013 at 01:52:49PM -0700, Eric Dumazet wrote:
> > On Tue, 2013-10-01 at 14:33 -0500, Shawn Bohrer wrote:
> > > The removal of the routing cache introduced a performance regression for
> > > some UDP workloads since a dst lookup must be done for each packet.
> > > This change caches the dst per socket in a similar manner to what we do
> > > for TCP by implementing early_demux.
> > > 
> > > For UDP multicast we can only cache the dst if there is only one
> > > receiving socket on the host.  Since caching only works when there is
> > > one receiving socket we do the multicast socket lookup using RCU.
> > 
> > For unicast, we should find a matching socket for early demux only if
> > this is a connected socket.
> > 
> > Otherwise, forwarding setups will break.
> > 
> > You probably need to add a minimum score to __udp4_lib_lookup()
> 
> Perhaps I'm missing something but I don't think a minimum score would
> work because compute_score() and compute_score2() have several ways of
> returning a score of, let's say, 4 and I don't think they all mean the
> socket is connected.

Just change how score is computed. The existing +4 values are not hard
coded anywhere.

You want to compute a score so that a single compare against a threshold
is enough to tell you what's going on, before even taking a refcount on
the socket.


>   Why not just test the socket returned by
> __udp4_lib_lookup() to see if it is connected in
> udp_v4_early_demux()?  Something like:
> 
>         sk = __udp4_lib_lookup(net, iph->saddr, uh->source,
>                                iph->daddr, uh->dest, dif,
>                                &udp_table);
>         /* Only demux connected sockets or forwarding setups will break */
>         if (sk && !inet_sk(sk)->inet_daddr)
>                 return;

nice socket refcount leak ;)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [net-next 2/3] udp: Add udp early demux
  2013-10-02 18:09       ` Eric Dumazet
@ 2013-10-02 20:35         ` Shawn Bohrer
  2013-10-02 21:08           ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Shawn Bohrer @ 2013-10-02 20:35 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, tomk, netdev

On Wed, Oct 02, 2013 at 11:09:25AM -0700, Eric Dumazet wrote:
> On Wed, 2013-10-02 at 12:34 -0500, Shawn Bohrer wrote:
> > On Tue, Oct 01, 2013 at 01:52:49PM -0700, Eric Dumazet wrote:
> > > On Tue, 2013-10-01 at 14:33 -0500, Shawn Bohrer wrote:
> > > > The removal of the routing cache introduced a performance regression for
> > > > some UDP workloads since a dst lookup must be done for each packet.
> > > > This change caches the dst per socket in a similar manner to what we do
> > > > for TCP by implementing early_demux.
> > > > 
> > > > For UDP multicast we can only cache the dst if there is only one
> > > > receiving socket on the host.  Since caching only works when there is
> > > > one receiving socket we do the multicast socket lookup using RCU.
> > > 
> > > For unicast, we should find a matching socket for early demux only if
> > > this is a connected socket.
> > > 
> > > Otherwise, forwarding setups will break.
> > > 
> > > You probably need to add a minimum score to __udp4_lib_lookup()
> > 
> > Perhaps I'm missing something but I don't think a minimum score would
> > work because compute_score() and compute_score2() have several ways of
> > returning a score of, let's say, 4 and I don't think they all mean the
> > socket is connected.
> 
> Just change how score is computed. The existing +4 values are not hard
> coded anywhere.
> 
> You want to compute a score so that a single compare against a threshold
> is enough to tell you what's going on, before even taking a refcount on
> the socket.

Sorry, I must be a little slow today.  I understand what you are
suggesting but I don't see how to implement it with a score.  Or at
least not without potentially changing existing behavior.  For example
I could make the inet->inet_daddr case add +100 to the score and I
would know that a score >= 100 was connected.  However, this would
unfairly favor that one case making a socket that only had a matching
inet_daddr be better than one that only had a matching inet_dport,
sk_bound_dev_if, and inet_rcv_saddr.

The other possibility I can think of would be to use a bit mask so I
could see which tests passed and I could compute a score by counting
the set bits.  This would probably work since most of the tests
currently add an equal weight of 4 except for the one
(sk->sk_family == PF_INET) test which I'm not sure how to handle using
this strategy.

Did you have something specific in mind with your suggestion?

Thanks,
Shawn


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [net-next 2/3] udp: Add udp early demux
  2013-10-02 20:35         ` Shawn Bohrer
@ 2013-10-02 21:08           ` Eric Dumazet
  2013-10-02 21:24             ` Shawn Bohrer
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2013-10-02 21:08 UTC (permalink / raw)
  To: Shawn Bohrer; +Cc: David Miller, tomk, netdev

On Wed, 2013-10-02 at 15:35 -0500, Shawn Bohrer wrote:

> Sorry, I must be a little slow today.  I understand what you are
> suggesting but I don't see how to implement it with a score.  Or at
> least not without potentially changing existing behavior.  For example
> I could make the inet->inet_daddr case add +100 to the score and I
> would know that a score >= 100 was connected.  However, this would
> unfairly favor that one case making a socket that only had a matching
> inet_daddr be better than one that only had a matching inet_dport,
> sk_bound_dev_if, and inet_rcv_saddr.
> 

If early demux has to increment a socket refcount and then decrement it
because it found a non-connected socket, this will be too expensive.

Also, keep in mind UDP chains can be long, so you should limit the early
lookup to say a single socket.

TCP ehash is mostly empty (0 or 1 socket per bucket), so early demux
really makes sense, but for UDP, there is no such property.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [net-next 2/3] udp: Add udp early demux
  2013-10-02 21:08           ` Eric Dumazet
@ 2013-10-02 21:24             ` Shawn Bohrer
  2013-10-02 21:38               ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Shawn Bohrer @ 2013-10-02 21:24 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, tomk, netdev

On Wed, Oct 02, 2013 at 02:08:38PM -0700, Eric Dumazet wrote:
> On Wed, 2013-10-02 at 15:35 -0500, Shawn Bohrer wrote:
> 
> > Sorry, I must be a little slow today.  I understand what you are
> > suggesting but I don't see how to implement it with a score.  Or at
> > least not without potentially changing existing behavior.  For example
> > I could make the inet->inet_daddr case add +100 to the score and I
> > would know that a score >= 100 was connected.  However, this would
> > unfairly favor that one case making a socket that only had a matching
> > inet_daddr be better than one that only had a matching inet_dport,
> > sk_bound_dev_if, and inet_rcv_saddr.
> > 
> 
> If early demux has to increment a socket refcount and then decrement it
> because it found a non-connected socket, this will be too expensive.
> 
> Also, keep in mind UDP chains can be long, so you should limit the early
> lookup to say a single socket.
> 
> TCP ehash is mostly empty (0 or 1 socket per bucket), so early demux
> really makes sense, but for UDP, there is no such property.

So... Are you suggesting that I just skip the early demux for unicast
UDP entirely?  That is fine by me since I only care about the
multicast case.

--
Shawn


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [net-next 2/3] udp: Add udp early demux
  2013-10-02 21:24             ` Shawn Bohrer
@ 2013-10-02 21:38               ` Eric Dumazet
  2013-10-03 17:39                 ` Shawn Bohrer
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2013-10-02 21:38 UTC (permalink / raw)
  To: Shawn Bohrer; +Cc: David Miller, tomk, netdev

On Wed, 2013-10-02 at 16:24 -0500, Shawn Bohrer wrote:

> So... Are you suggesting that I just skip the early demux for unicast
> UDP entirely?  That is fine by me since I only care about the
> multicast case.

Nope this is not what I suggested.

I suggested that for unicast, you do a limited lookup to the first
socket found in the bucket.

If it's an exact match, you take the socket.

If not, you give up, and do not scan the whole chain.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [net-next 2/3] udp: Add udp early demux
  2013-10-02 21:38               ` Eric Dumazet
@ 2013-10-03 17:39                 ` Shawn Bohrer
  2013-10-03 18:06                   ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Shawn Bohrer @ 2013-10-03 17:39 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, tomk, netdev

On Wed, Oct 02, 2013 at 02:38:52PM -0700, Eric Dumazet wrote:
> I suggested that for unicast, you do a limited lookup to the first
> socket found in the bucket.
> 
> If it's an exact match, you take the socket.
> 
> If not, you give up, and do not scan the whole chain.

So something like the following?


diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 02185a5..d202e5b 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1849,7 +1849,42 @@ begin:
 	}
 	rcu_read_unlock();
 	return result;
+}
 
+/* For unicast we should only early demux connected sockets or we can
+ * break forwarding setups.  The chains here can be long so only check
+ * if the first socket is an exact match and if not move on.
+ */
+static struct sock *__udp4_lib_demux_lookup(struct net *net,
+					    __be16 loc_port, __be32 loc_addr,
+					    __be16 rmt_port, __be32 rmt_addr,
+					    int dif)
+{
+	struct sock *sk, *result;
+	struct hlist_nulls_node *node;
+	unsigned short hnum = ntohs(loc_port);
+	unsigned int slot = udp_hashfn(net, hnum, udp_table.mask);
+	struct udp_hslot *hslot = &udp_table.hash[slot];
+	const int exact_match = 18;
+	int score;
+
+	rcu_read_lock();
+	result = NULL;
+	sk_nulls_for_each_rcu(sk, node, &hslot->head) {
+		score = compute_score(sk, net, rmt_addr, hnum, rmt_port,
+				      loc_addr, loc_port, dif);
+		if (score == exact_match)
+			result = sk;
+		/* Only check first socket in chain */
+		break;
+	}
+
+	if (result) {
+		if (unlikely(!atomic_inc_not_zero_hint(&result->sk_refcnt, 2)))
+			result = NULL;
+	}
+	rcu_read_unlock();
+	return result;
 }
 
 void udp_v4_early_demux(struct sk_buff *skb)
@@ -1870,8 +1905,8 @@ void udp_v4_early_demux(struct sk_buff *skb)
 		sk = __udp4_lib_mcast_demux_lookup(net, uh->dest, iph->daddr,
 						   uh->source, iph->saddr, dif);
 	else if (skb->pkt_type == PACKET_HOST)
-		sk = __udp4_lib_lookup(net, iph->saddr, uh->source,
-				       iph->daddr, uh->dest, dif, &udp_table);
+		sk = __udp4_lib_demux_lookup(net, uh->dest, iph->daddr,
+					     uh->source, iph->saddr, dif);
 	else
 		return;
 


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [net-next 2/3] udp: Add udp early demux
  2013-10-03 17:39                 ` Shawn Bohrer
@ 2013-10-03 18:06                   ` Eric Dumazet
  0 siblings, 0 replies; 19+ messages in thread
From: Eric Dumazet @ 2013-10-03 18:06 UTC (permalink / raw)
  To: Shawn Bohrer; +Cc: David Miller, tomk, netdev

On Thu, 2013-10-03 at 12:39 -0500, Shawn Bohrer wrote:
> On Wed, Oct 02, 2013 at 02:38:52PM -0700, Eric Dumazet wrote:
> > I suggested that for unicast, you do a limited lookup to the first
> > socket found in the bucket.
> > 
> > If it's an exact match, you take the socket.
> > 
> > If not, you give up, and do not scan the whole chain.
> 
> So something like the following?
> 
> 
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 02185a5..d202e5b 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -1849,7 +1849,42 @@ begin:
>  	}
>  	rcu_read_unlock();
>  	return result;
> +}
>  
> +/* For unicast we should only early demux connected sockets or we can
> + * break forwarding setups.  The chains here can be long so only check
> + * if the first socket is an exact match and if not move on.
> + */
> +static struct sock *__udp4_lib_demux_lookup(struct net *net,
> +					    __be16 loc_port, __be32 loc_addr,
> +					    __be16 rmt_port, __be32 rmt_addr,
> +					    int dif)
> +{
> +	struct sock *sk, *result;
> +	struct hlist_nulls_node *node;
> +	unsigned short hnum = ntohs(loc_port);
> +	unsigned int slot = udp_hashfn(net, hnum, udp_table.mask);
> +	struct udp_hslot *hslot = &udp_table.hash[slot];
> +	const int exact_match = 18;
> +	int score;
> +
> +	rcu_read_lock();
> +	result = NULL;
> +	sk_nulls_for_each_rcu(sk, node, &hslot->head) {
> +		score = compute_score(sk, net, rmt_addr, hnum, rmt_port,
> +				      loc_addr, loc_port, dif);
> +		if (score == exact_match)
> +			result = sk;
> +		/* Only check first socket in chain */
> +		break;
> +	}
> +
> +	if (result) {
> +		if (unlikely(!atomic_inc_not_zero_hint(&result->sk_refcnt, 2)))
> +			result = NULL;
> +	}
> +	rcu_read_unlock();
> +	return result;
>  }
>  

Just do the tuple comparison instead of compute_score(),
since you know we want a full L4 match.

The standard way is to use the INET_MATCH() macro.
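
Roughly, and untested (this assumes the INET_ADDR_COOKIE() /
INET_COMBINED_PORTS() helpers from include/net/inet_hashtables.h and
keeps the rest of your function as is), the lookup loop could become:

	INET_ADDR_COOKIE(acookie, rmt_addr, loc_addr)
	const __portpair ports = INET_COMBINED_PORTS(rmt_port, hnum);

	rcu_read_lock();
	result = NULL;
	sk_nulls_for_each_rcu(sk, node, &hslot->head) {
		if (INET_MATCH(sk, net, acookie,
			       rmt_addr, loc_addr, ports, dif))
			result = sk;
		/* Only check first socket in chain */
		break;
	}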

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, newest: ~2013-10-03 18:06 UTC

Thread overview: 19+ messages
2013-10-01 19:33 [net-next 0/3] Improve UDP multicast receive latency Shawn Bohrer
2013-10-01 19:33 ` [net-next 1/3] udp: Only allow busy read/poll on connected sockets Shawn Bohrer
2013-10-01 20:44   ` Eric Dumazet
2013-10-01 19:33 ` [net-next 2/3] udp: Add udp early demux Shawn Bohrer
2013-10-01 20:12   ` Rick Jones
2013-10-01 22:26     ` Shawn Bohrer
2013-10-01 20:52   ` Eric Dumazet
2013-10-02 17:34     ` Shawn Bohrer
2013-10-02 18:09       ` Eric Dumazet
2013-10-02 20:35         ` Shawn Bohrer
2013-10-02 21:08           ` Eric Dumazet
2013-10-02 21:24             ` Shawn Bohrer
2013-10-02 21:38               ` Eric Dumazet
2013-10-03 17:39                 ` Shawn Bohrer
2013-10-03 18:06                   ` Eric Dumazet
2013-10-01 19:33 ` [net-next 3/3] net: ipv4 only populate IP_PKTINFO when needed Shawn Bohrer
2013-10-01 20:42   ` Eric Dumazet
2013-10-01 22:29     ` Shawn Bohrer
2013-10-01 20:21 ` [net-next 0/3] Improve UDP multicast receive latency Veaceslav Falico
