netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next] udp: Add socket early demux support
@ 2012-06-26 19:43 Vijay Subramanian
  2012-06-26 20:49 ` Eric Dumazet
  2012-06-26 21:34 ` David Miller
  0 siblings, 2 replies; 6+ messages in thread
From: Vijay Subramanian @ 2012-06-26 19:43 UTC (permalink / raw)
  To: netdev
  Cc: davem, shemminger, eric.dumazet, alexander.h.duyck,
	Vijay Subramanian

Based on the recent TCP socket early demux code, this patch provides similar
support for UDP.

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
---
This has been tested on x86 with UDP iperf flows and seemed to work. If this is
accepted, I plan to submit one more patch moving common code from TCP and UDP
early demux code into common helper functions.
Thanks in advance for feedback.

 include/net/udp.h  |    1 +
 net/ipv4/af_inet.c |    1 +
 net/ipv4/udp.c     |   67 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 67 insertions(+), 2 deletions(-)

diff --git a/include/net/udp.h b/include/net/udp.h
index 065f379..e0ed11d 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -175,6 +175,7 @@ extern int udp_lib_get_port(struct sock *sk, unsigned short snum,
 			    unsigned int hash2_nulladdr);
 
 /* net/ipv4/udp.c */
+extern int udp_v4_early_demux(struct sk_buff *skb);
 extern int udp_get_port(struct sock *sk, unsigned short snum,
 			int (*saddr_cmp)(const struct sock *,
 					 const struct sock *));
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 07a02f6..c7d40db 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1531,6 +1531,7 @@ static const struct net_protocol tcp_protocol = {
 };
 
 static const struct net_protocol udp_protocol = {
+	.early_demux =	udp_v4_early_demux,
 	.handler =	udp_rcv,
 	.err_handler =	udp_err,
 	.gso_send_check = udp4_ufo_send_check,
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index db017ef..bd85cd8 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -520,10 +520,10 @@ static inline struct sock *__udp4_lib_lookup_skb(struct sk_buff *skb,
 						 __be16 sport, __be16 dport,
 						 struct udp_table *udptable)
 {
-	struct sock *sk;
+	struct sock *sk = skb_steal_sock(skb);
 	const struct iphdr *iph = ip_hdr(skb);
 
-	if (unlikely(sk = skb_steal_sock(skb)))
+	if (sk)
 		return sk;
 	else
 		return __udp4_lib_lookup(dev_net(skb_dst(skb)->dev), iph->saddr, sport,
@@ -1403,6 +1403,19 @@ int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 	int rc;
 	int is_udplite = IS_UDPLITE(sk);
 
+	if (sk->sk_rx_dst) {
+		struct dst_entry *dst = sk->sk_rx_dst;
+		if (unlikely(dst->obsolete)) {
+			if (dst->ops->check(dst, 0) == NULL) {
+				dst_release(dst);
+				sk->sk_rx_dst = NULL;
+			}
+		}
+	}
+
+	if (unlikely(sk->sk_rx_dst == NULL))
+		sk->sk_rx_dst = dst_clone(skb_dst(skb));
+
 	/*
 	 *	Charge it to the socket, dropping if the queue is full.
 	 */
@@ -1622,6 +1635,56 @@ static inline int udp4_csum_init(struct sk_buff *skb, struct udphdr *uh,
 	return 0;
 }
 
+int udp_v4_early_demux(struct sk_buff *skb)
+{
+	struct net *net = dev_net(skb->dev);
+	const struct iphdr *iph;
+	const struct udphdr *uh;
+	struct net_device *dev;
+	struct sock *sk;
+	unsigned short ulen;
+	int err;
+
+	err = -ENOENT;
+	if (skb->pkt_type != PACKET_HOST)
+		goto out_err;
+
+	if (!pskb_may_pull(skb, ip_hdrlen(skb) + sizeof(struct udphdr)))
+		goto out_err;
+
+	iph = ip_hdr(skb);
+	uh = (struct udphdr *)((char *)iph + ip_hdrlen(skb));
+	ulen = ntohs(uh->len);
+
+	if (ulen > skb->len)
+		goto out_err;
+
+	dev = skb->dev;
+
+	sk = udp4_lib_lookup(net, iph->saddr, uh->source, iph->daddr,
+			     uh->dest, dev->ifindex);
+
+	if (sk) {
+		struct dst_entry *dst = sk->sk_rx_dst;
+		skb->sk = sk;
+		skb->destructor = sock_edemux;
+
+		if (dst)
+			dst = dst_check(dst, 0);
+		if (dst) {
+			struct rtable *rt = (struct rtable *)dst;
+
+			if (rt->rt_iif == dev->ifindex) {
+				skb_dst_set_noref(skb, dst);
+				err = 0;
+			}
+		}
+	}
+
+out_err:
+	return err;
+}
+
 /*
  *	All we need to do is get the socket, and then do a checksum.
  */
-- 
1.7.0.4

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH net-next] udp: Add socket early demux support
  2012-06-26 19:43 [PATCH net-next] udp: Add socket early demux support Vijay Subramanian
@ 2012-06-26 20:49 ` Eric Dumazet
  2012-06-26 21:34 ` David Miller
  1 sibling, 0 replies; 6+ messages in thread
From: Eric Dumazet @ 2012-06-26 20:49 UTC (permalink / raw)
  To: Vijay Subramanian; +Cc: netdev, davem, shemminger, alexander.h.duyck

On Tue, 2012-06-26 at 12:43 -0700, Vijay Subramanian wrote:
> Based on the recent TCP socket early demux code, this patch provides similar
> support for UDP.
> 
> Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
> ---
> This has been tested on x86 with UDP iperf flows and seemed to work. If this is
> accepted, I plan to submit one more patch moving common code from TCP and UDP
> early demux code into common helper functions.
> Thanks in advance for feedback.

Hmm... I cant see how it can work.

Have you tested a router can still route DNS packets if you also have a
DNS server on it, listening on 0.0.0.0:53 ?

Most UDP applications are using unconnected sockets.
(In fact few programmers are aware UDP sockets can be connected)

Try to bench how this is going to work with random IP sources, instead
of UDP iperf flows using a single IP ?

And what about multicast ?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net-next] udp: Add socket early demux support
  2012-06-26 19:43 [PATCH net-next] udp: Add socket early demux support Vijay Subramanian
  2012-06-26 20:49 ` Eric Dumazet
@ 2012-06-26 21:34 ` David Miller
  2012-06-27  0:06   ` Vijay Subramanian
  2012-06-27  2:29   ` Changli Gao
  1 sibling, 2 replies; 6+ messages in thread
From: David Miller @ 2012-06-26 21:34 UTC (permalink / raw)
  To: subramanian.vijay; +Cc: netdev, shemminger, eric.dumazet, alexander.h.duyck


You can't do this.

If the UDP socket has wildcards, that means the source address of the
route will not be validated.  This means we will start accepting
spoofed packets.  It also means the route you are caching is going
to be the wrong route since the keys are variable.

You can only do an early demux where all the keys are fully specified
and there are no wildcards.  That why for TCP we only early demux for
established sockets.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net-next] udp: Add socket early demux support
  2012-06-26 21:34 ` David Miller
@ 2012-06-27  0:06   ` Vijay Subramanian
  2012-06-27  2:29   ` Changli Gao
  1 sibling, 0 replies; 6+ messages in thread
From: Vijay Subramanian @ 2012-06-27  0:06 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, shemminger, eric.dumazet, alexander.h.duyck

Thanks for the reviews and feedback, Dave and Eric. They were very helpful.

On 26 June 2012 14:34, David Miller <davem@davemloft.net> wrote:
>
> You can't do this.
>
> If the UDP socket has wildcards, that means the source address of the
> route will not be validated.  This means we will start accepting
> spoofed packets.  It also means the route you are caching is going
> to be the wrong route since the keys are variable.

Thanks for this explanation.
I see why Eric wanted me to test with DNS server bound to wildcard
address. I guess the same issue exists with multicast too which I had
not even considered.

>
> You can only do an early demux where all the keys are fully specified
> and there are no wildcards.  That why for TCP we only early demux for
> established sockets.

I guess for UDP, early demux will work only if server binds to a
specific address and port instead of wildcard. But I believe
a lot of UDP servers use wildcard addresses, so use of early demux for
UDP may be limited.

Thanks,
Vijay

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net-next] udp: Add socket early demux support
  2012-06-26 21:34 ` David Miller
  2012-06-27  0:06   ` Vijay Subramanian
@ 2012-06-27  2:29   ` Changli Gao
  2012-06-27  3:59     ` David Miller
  1 sibling, 1 reply; 6+ messages in thread
From: Changli Gao @ 2012-06-27  2:29 UTC (permalink / raw)
  To: David Miller
  Cc: subramanian.vijay, netdev, shemminger, eric.dumazet,
	alexander.h.duyck

On Wed, Jun 27, 2012 at 5:34 AM, David Miller <davem@davemloft.net> wrote:
>
> You can't do this.
>
> If the UDP socket has wildcards, that means the source address of the
> route will not be validated.  This means we will start accepting
> spoofed packets.  It also means the route you are caching is going
> to be the wrong route since the keys are variable.
>
> You can only do an early demux where all the keys are fully specified
> and there are no wildcards.  That why for TCP we only early demux for
> established sockets.

How about implementing the whole early demux infrastructure on
iptables/netfilter like rpfilter and TPROXY, then users have more
controls.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net-next] udp: Add socket early demux support
  2012-06-27  2:29   ` Changli Gao
@ 2012-06-27  3:59     ` David Miller
  0 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2012-06-27  3:59 UTC (permalink / raw)
  To: xiaosuo
  Cc: subramanian.vijay, netdev, shemminger, eric.dumazet,
	alexander.h.duyck

From: Changli Gao <xiaosuo@gmail.com>
Date: Wed, 27 Jun 2012 10:29:19 +0800

> How about implementing the whole early demux infrastructure on
> iptables/netfilter like rpfilter and TPROXY, then users have more
> controls.

Sorry, no.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2012-06-27  3:59 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-06-26 19:43 [PATCH net-next] udp: Add socket early demux support Vijay Subramanian
2012-06-26 20:49 ` Eric Dumazet
2012-06-26 21:34 ` David Miller
2012-06-27  0:06   ` Vijay Subramanian
2012-06-27  2:29   ` Changli Gao
2012-06-27  3:59     ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).