[PATCH 1/2] udp: cleanup __udp4_lib_mcast

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 1/2] udp: cleanup __udp4_lib_mcast_deliver
@ 2009-11-05 18:33 Lucian Adrian Grijincu
  2009-11-06  8:42 ` David Miller
  0 siblings, 1 reply; 15+ messages in thread
From: Lucian Adrian Grijincu @ 2009-11-05 18:33 UTC (permalink / raw)
  To: netdev; +Cc: opurdila

[-- Attachment #1: Type: text/plain, Size: 371 bytes --]


__udp4_lib_mcast_deliver always returned 0.
It's caller can return 0 explicitly to make things clearer.

Also, we don't need the spin_lock() on the hslot to free the skb.

Signed-off-by: Lucian Adrian Grijincu <lgrijincu@ixiacom.com>
---
 net/ipv4/udp.c |   57 +++++++++++++++++++++++++++++--------------------------
 1 files changed, 30 insertions(+), 27 deletions(-)


[-- Attachment #2: 0001-udp-cleanup-__udp4_lib_mcast_deliver.patch --]
[-- Type: text/plain, Size: 2479 bytes --]

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 4274c1c..425b2d4 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1198,12 +1198,12 @@ drop:
  *	Note: called only from the BH handler context,
  *	so we don't need to lock the hashes.
  */
-static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
+static void __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 				    struct udphdr  *uh,
 				    __be32 saddr, __be32 daddr,
 				    struct udp_table *udptable)
 {
-	struct sock *sk;
+	struct sock *sk, *sknext;
 	struct udp_hslot *hslot = udp_hashslot(udptable, net, ntohs(uh->dest));
 	int dif;
 
@@ -1211,31 +1211,32 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 	sk = sk_nulls_head(&hslot->head);
 	dif = skb->dev->ifindex;
 	sk = udp_v4_mcast_next(net, sk, uh->dest, daddr, uh->source, saddr, dif);
-	if (sk) {
-		struct sock *sknext = NULL;
-
-		do {
-			struct sk_buff *skb1 = skb;
-
-			sknext = udp_v4_mcast_next(net, sk_nulls_next(sk), uh->dest,
-						   daddr, uh->source, saddr,
-						   dif);
-			if (sknext)
-				skb1 = skb_clone(skb, GFP_ATOMIC);
-
-			if (skb1) {
-				int ret = udp_queue_rcv_skb(sk, skb1);
-				if (ret > 0)
-					/* we should probably re-process instead
-					 * of dropping packets here. */
-					kfree_skb(skb1);
-			}
-			sk = sknext;
-		} while (sknext);
-	} else
+	if (!sk) {
+		spin_unlock(&hslot->lock);
 		consume_skb(skb);
+		return;
+	}
+
+	do {
+		struct sk_buff *skb1 = skb;
+
+		sknext = udp_v4_mcast_next(net, sk_nulls_next(sk), uh->dest,
+					   daddr, uh->source, saddr,
+					   dif);
+		if (sknext)
+			skb1 = skb_clone(skb, GFP_ATOMIC);
+
+		if (skb1) {
+			int ret = udp_queue_rcv_skb(sk, skb1);
+			if (ret > 0)
+				/* we should probably re-process instead
+				 * of dropping packets here. */
+				kfree_skb(skb1);
+		}
+		sk = sknext;
+	} while (sknext);
 	spin_unlock(&hslot->lock);
-	return 0;
+	return;
 }
 
 /* Initialize UDP checksum. If exited with zero value (success),
@@ -1314,9 +1315,11 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 	saddr = ip_hdr(skb)->saddr;
 	daddr = ip_hdr(skb)->daddr;
 
-	if (rt->rt_flags & (RTCF_BROADCAST|RTCF_MULTICAST))
-		return __udp4_lib_mcast_deliver(net, skb, uh,
+	if (rt->rt_flags & (RTCF_BROADCAST|RTCF_MULTICAST)) {
+		__udp4_lib_mcast_deliver(net, skb, uh,
 				saddr, daddr, udptable);
+		return 0;
+	}
 
 	sk = __udp4_lib_lookup_skb(skb, uh->source, uh->dest, udptable);
 

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] udp: cleanup __udp4_lib_mcast_deliver
  2009-11-05 18:33 [PATCH 1/2] udp: cleanup __udp4_lib_mcast_deliver Lucian Adrian Grijincu
@ 2009-11-06  8:42 ` David Miller
  2009-11-06 14:04   ` Lucian Adrian Grijincu
  0 siblings, 1 reply; 15+ messages in thread
From: David Miller @ 2009-11-06  8:42 UTC (permalink / raw)
  To: lgrijincu; +Cc: netdev, opurdila

From: Lucian Adrian Grijincu <lgrijincu@ixiacom.com>
Date: Thu, 5 Nov 2009 20:33:21 +0200

> 
> __udp4_lib_mcast_deliver always returned 0.
> It's caller can return 0 explicitly to make things clearer.
> 
> Also, we don't need the spin_lock() on the hslot to free the skb.
> 
> Signed-off-by: Lucian Adrian Grijincu <lgrijincu@ixiacom.com>

I think the current code is much easier to understand than
your version.

Getting rid of the useless return value is fine, but the
new do{}while() loop et al. is less readable to me.

I'm not applying these two patches, sorry.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] udp: cleanup __udp4_lib_mcast_deliver
  2009-11-06  8:42 ` David Miller
@ 2009-11-06 14:04   ` Lucian Adrian Grijincu
  2009-11-06 15:10     ` Eric Dumazet
  0 siblings, 1 reply; 15+ messages in thread
From: Lucian Adrian Grijincu @ 2009-11-06 14:04 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, opurdila

În data de Vin 06 Noi 2009 10:42:15 David Miller a scris:
> Getting rid of the useless return value is fine, but the
> new do{}while() loop et al. is less readable to me.

It's not new.
I just moved it:
previously:
	if(cond) {
		do {}
		 while()
	} else {return}

now:
	if (!cond)
		return;
	do {}
	while {}

Also this moves a consume_skb() from a path protected by spin locks.

As far as I understand it, the spin locks protect the hslot, and freeing the 
skb does not walk/change or interact with the hslot in any way.

-- 
Lucian

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] udp: cleanup __udp4_lib_mcast_deliver
  2009-11-06 14:04   ` Lucian Adrian Grijincu
@ 2009-11-06 15:10     ` Eric Dumazet
  2009-11-06 15:59       ` [PATCH net-next-2.6] udp: Optimise multicast reception Eric Dumazet
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2009-11-06 15:10 UTC (permalink / raw)
  To: Lucian Adrian Grijincu; +Cc: David Miller, netdev, opurdila

Lucian Adrian Grijincu a écrit :
> 
> As far as I understand it, the spin locks protect the hslot, and freeing the 
> skb does not walk/change or interact with the hslot in any way.
> 

Yes, but this single skb freeing is in multicast very slow path
(it happens if we receive a multicast packet with no listener, which should
 not happen with multicast aware network...)


If you really want to optimize this part, we could use an array of
32 (or 64) socket pointers, to be able to perform the really expensive
work (skb_clone(), udp_queue_rcv_skb()) outside of the lock.

Something like this untested patch :


 net/ipv4/udp.c |   68 ++++++++++++++++++++++++++++++-----------------
 1 files changed, 44 insertions(+), 24 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index d5e75e9..5d71aee 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1190,6 +1190,24 @@ drop:
 	return -1;
 }
 
+
+static void flush_stack(struct sock **stack, unsigned int count,
+			struct sk_buff *skb, unsigned int final)
+{
+	unsigned int i;
+	struct sk_buff *skb1 = NULL;
+
+	for (i = 0; i < count; i++) {
+		if (likely(skb1 == NULL))
+			skb1 = (i == final) ? skb : skb_clone(skb, GFP_ATOMIC);
+
+		if (likely(skb1 && udp_queue_rcv_skb(stack[i], skb1) <= 0))
+			skb1 = NULL;
+	}
+	if (unlikely(skb1))
+		consume_skb(skb1);
+}
+
 /*
  *	Multicasts and broadcasts go to each listener.
  *
@@ -1201,38 +1219,40 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 				    __be32 saddr, __be32 daddr,
 				    struct udp_table *udptable)
 {
-	struct sock *sk;
+	struct sock *sk, *stack[256 / sizeof(void *)];
 	struct udp_hslot *hslot = udp_hashslot(udptable, net, ntohs(uh->dest));
 	int dif;
+	unsigned int i, count = 0;
 
 	spin_lock(&hslot->lock);
 	sk = sk_nulls_head(&hslot->head);
 	dif = skb->dev->ifindex;
 	sk = udp_v4_mcast_next(net, sk, uh->dest, daddr, uh->source, saddr, dif);
-	if (sk) {
-		struct sock *sknext = NULL;
-
-		do {
-			struct sk_buff *skb1 = skb;
-
-			sknext = udp_v4_mcast_next(net, sk_nulls_next(sk), uh->dest,
-						   daddr, uh->source, saddr,
-						   dif);
-			if (sknext)
-				skb1 = skb_clone(skb, GFP_ATOMIC);
-
-			if (skb1) {
-				int ret = udp_queue_rcv_skb(sk, skb1);
-				if (ret > 0)
-					/* we should probably re-process instead
-					 * of dropping packets here. */
-					kfree_skb(skb1);
-			}
-			sk = sknext;
-		} while (sknext);
-	} else
-		consume_skb(skb);
+	while (sk) {
+		stack[count++] = sk;
+		if (unlikely(count == ARRAY_SIZE(stack))) {
+			flush_stack(stack, count, skb, ~0);
+			count = 0;
+		}
+
+		sk = udp_v4_mcast_next(net, sk_nulls_next(sk), uh->dest,
+				       daddr, uh->source, saddr, dif);
+	}
+	/*
+	 * before releasing the lock, we must take reference on sockets
+	 */
+	for (i = 0; i < count; i++)
+		sock_hold(stack[i]);
+
 	spin_unlock(&hslot->lock);
+
+	/*
+	 * do the slow work with no lock held
+	 */
+	flush_stack(stack, count, skb, count - 1);
+
+	for (i = 0; i < count; i++)
+		sock_put(stack[i]);
 	return 0;
 }
 

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next-2.6] udp: Optimise multicast reception
  2009-11-06 15:10     ` Eric Dumazet
@ 2009-11-06 15:59       ` Eric Dumazet
  2009-11-06 16:30         ` [PATCH net-next-2.6] ipv6: " Eric Dumazet
  2009-11-06 16:35         ` [PATCH net-next-2.6] " Lucian Adrian Grijincu
  0 siblings, 2 replies; 15+ messages in thread
From: Eric Dumazet @ 2009-11-06 15:59 UTC (permalink / raw)
  To: Lucian Adrian Grijincu, David Miller; +Cc: netdev, opurdila

Eric Dumazet a écrit :

> Yes, but this single skb freeing is in multicast very slow path
> (it happens if we receive a multicast packet with no listener, which should
>  not happen with multicast aware network...)
> 
> 
> If you really want to optimize this part, we could use an array of
> 32 (or 64) socket pointers, to be able to perform the really expensive
> work (skb_clone(), udp_queue_rcv_skb()) outside of the lock.
> 
> Something like this untested patch :

I did some tests and made one fix.

With this kind of stacking, we eventually could try a rcu lookup as well.



[PATCH net-next-2.6] udp: Optimise multicast reception

UDP multicast rx path is a bit complex and can hold a spinlock
for a long time.

Using a small (32 or 64 entries) stack of socket pointers can help
to perform expensive operations (skb_clone(), udp_queue_rcv_skb())
outside of the lock, in most cases.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/ipv4/udp.c |   70 ++++++++++++++++++++++++++++++-----------------
 1 files changed, 46 insertions(+), 24 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index d5e75e9..89637d0 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1190,6 +1190,24 @@ drop:
 	return -1;
 }
 
+
+static void flush_stack(struct sock **stack, unsigned int count,
+			struct sk_buff *skb, unsigned int final)
+{
+	unsigned int i;
+	struct sk_buff *skb1 = NULL;
+
+	for (i = 0; i < count; i++) {
+		if (likely(skb1 == NULL))
+			skb1 = (i == final) ? skb : skb_clone(skb, GFP_ATOMIC);
+
+		if (skb1 && udp_queue_rcv_skb(stack[i], skb1) <= 0)
+			skb1 = NULL;
+	}
+	if (skb1)
+		consume_skb(skb1);
+}
+
 /*
  *	Multicasts and broadcasts go to each listener.
  *
@@ -1201,38 +1219,42 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 				    __be32 saddr, __be32 daddr,
 				    struct udp_table *udptable)
 {
-	struct sock *sk;
+	struct sock *sk, *stack[256 / sizeof(struct sock *)];
 	struct udp_hslot *hslot = udp_hashslot(udptable, net, ntohs(uh->dest));
 	int dif;
+	unsigned int i, count = 0;
 
 	spin_lock(&hslot->lock);
 	sk = sk_nulls_head(&hslot->head);
 	dif = skb->dev->ifindex;
 	sk = udp_v4_mcast_next(net, sk, uh->dest, daddr, uh->source, saddr, dif);
-	if (sk) {
-		struct sock *sknext = NULL;
-
-		do {
-			struct sk_buff *skb1 = skb;
-
-			sknext = udp_v4_mcast_next(net, sk_nulls_next(sk), uh->dest,
-						   daddr, uh->source, saddr,
-						   dif);
-			if (sknext)
-				skb1 = skb_clone(skb, GFP_ATOMIC);
-
-			if (skb1) {
-				int ret = udp_queue_rcv_skb(sk, skb1);
-				if (ret > 0)
-					/* we should probably re-process instead
-					 * of dropping packets here. */
-					kfree_skb(skb1);
-			}
-			sk = sknext;
-		} while (sknext);
-	} else
-		consume_skb(skb);
+	while (sk) {
+		stack[count++] = sk;
+		sk = udp_v4_mcast_next(net, sk_nulls_next(sk), uh->dest,
+				       daddr, uh->source, saddr, dif);
+		if (unlikely(count == ARRAY_SIZE(stack))) {
+			if (!sk)
+				break;
+			flush_stack(stack, count, skb, ~0);
+			count = 0;
+		}
+
+	}
+	/*
+	 * before releasing the lock, we must take reference on sockets
+	 */
+	for (i = 0; i < count; i++)
+		sock_hold(stack[i]);
+
 	spin_unlock(&hslot->lock);
+
+	/*
+	 * do the slow work with no lock held
+	 */
+	flush_stack(stack, count, skb, count - 1);
+
+	for (i = 0; i < count; i++)
+		sock_put(stack[i]);
 	return 0;
 }
 

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next-2.6] ipv6: udp: Optimise multicast reception
  2009-11-06 15:59       ` [PATCH net-next-2.6] udp: Optimise multicast reception Eric Dumazet
@ 2009-11-06 16:30         ` Eric Dumazet
  2009-11-06 16:35           ` Eric Dumazet
  2009-11-06 16:35         ` [PATCH net-next-2.6] " Lucian Adrian Grijincu
  1 sibling, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2009-11-06 16:30 UTC (permalink / raw)
  To: David Miller; +Cc: Lucian Adrian Grijincu, netdev, opurdila

And here is the ipv6 part as well

[PATCH net-next-2.6] ipv6: udp: Optimise multicast reception

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH net-next-2.6] ipv6: udp: Optimise multicast reception
  2009-11-06 16:30         ` [PATCH net-next-2.6] ipv6: " Eric Dumazet
@ 2009-11-06 16:35           ` Eric Dumazet
  2009-11-06 17:06             ` [PATCH net-next-2.6 take2] " Eric Dumazet
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2009-11-06 16:35 UTC (permalink / raw)
  To: David Miller; +Cc: Lucian Adrian Grijincu, netdev, opurdila

Sorry for previous empty mail :(

[PATCH net-next-2.6] ipv6: udp: Optimise multicast reception

IPV6 UDP multicast rx path is a bit complex and can hold a spinlock
for a long time.

Using a small (32 or 64 entries) stack of socket pointers can help
to perform expensive operations (skb_clone(), udp_queue_rcv_skb())
outside of the lock, in most cases.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/ipv6/udp.c |   67 ++++++++++++++++++++++++++++++-----------------
 1 files changed, 43 insertions(+), 24 deletions(-)

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 5bc7cdb..670662a 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -440,6 +440,27 @@ static struct sock *udp_v6_mcast_next(struct net *net, struct sock *sk,
 	return NULL;
 }
 
+static void flush_stack(struct sock **stack, unsigned int count,
+			struct sk_buff *skb, unsigned int final)
+{
+	unsigned int i;
+	struct sock *sk;
+	struct sk_buff *skb1;
+
+	for (i = 0; i < count; i++) {
+		skb1 = (i == final) ? skb : skb_clone(skb, GFP_ATOMIC);
+
+		if (skb1) {
+			sk = stack[i];
+			bh_lock_sock(sk);
+			if (!sock_owned_by_user(sk))
+				udpv6_queue_rcv_skb(sk, skb1);
+			else
+				sk_add_backlog(sk, skb1);
+			bh_unlock_sock(sk);
+		}
+	}
+}
 /*
  * Note: called only from the BH handler context,
  * so we don't need to lock the hashes.
@@ -448,41 +469,39 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 		struct in6_addr *saddr, struct in6_addr *daddr,
 		struct udp_table *udptable)
 {
-	struct sock *sk, *sk2;
+	struct sock *sk, *stack[256 / sizeof(struct sock *)];
 	const struct udphdr *uh = udp_hdr(skb);
 	struct udp_hslot *hslot = udp_hashslot(udptable, net, ntohs(uh->dest));
 	int dif;
+	unsigned int i, count = 0;
 
 	spin_lock(&hslot->lock);
 	sk = sk_nulls_head(&hslot->head);
 	dif = inet6_iif(skb);
 	sk = udp_v6_mcast_next(net, sk, uh->dest, daddr, uh->source, saddr, dif);
-	if (!sk) {
-		kfree_skb(skb);
-		goto out;
-	}
-
-	sk2 = sk;
-	while ((sk2 = udp_v6_mcast_next(net, sk_nulls_next(sk2), uh->dest, daddr,
-					uh->source, saddr, dif))) {
-		struct sk_buff *buff = skb_clone(skb, GFP_ATOMIC);
-		if (buff) {
-			bh_lock_sock(sk2);
-			if (!sock_owned_by_user(sk2))
-				udpv6_queue_rcv_skb(sk2, buff);
-			else
-				sk_add_backlog(sk2, buff);
-			bh_unlock_sock(sk2);
+	while (sk) {
+		stack[count++] = sk;
+		sk = udp_v6_mcast_next(net, sk_nulls_next(sk), uh->dest, daddr,
+				       uh->source, saddr, dif);
+		if (unlikely(count == ARRAY_SIZE(stack))) {
+			if (!sk)
+				break;
+			flush_stack(stack, count, skb, ~0);
+			count = 0;
 		}
 	}
-	bh_lock_sock(sk);
-	if (!sock_owned_by_user(sk))
-		udpv6_queue_rcv_skb(sk, skb);
-	else
-		sk_add_backlog(sk, skb);
-	bh_unlock_sock(sk);
-out:
+	/*
+	 * before releasing the lock, we must take reference on sockets
+	 */
+	for (i = 0; i < count; i++)
+		sock_hold(stack[i]);
+
 	spin_unlock(&hslot->lock);
+
+	flush_stack(stack, count, skb, count - 1);
+
+	for (i = 0; i < count; i++)
+		sock_put(stack[i]);
 	return 0;
 }
 

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next-2.6 take2] ipv6: udp: Optimise multicast reception
  2009-11-06 16:35           ` Eric Dumazet
@ 2009-11-06 17:06             ` Eric Dumazet
  2009-11-06 17:19               ` Lucian Adrian Grijincu
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2009-11-06 17:06 UTC (permalink / raw)
  To: David Miller; +Cc: Lucian Adrian Grijincu, netdev, opurdila

Take 2 of patch, adding a missing kfree_skb() in case no socket could receive a copy

[PATCH net-next-2.6 take2] ipv6: udp: Optimise multicast reception

IPV6 UDP multicast rx path is a bit complex and can hold a spinlock
for a long time.

Using a small (32 or 64 entries) stack of socket pointers can help
to perform expensive operations (skb_clone(), udp_queue_rcv_skb())
outside of the lock, in most cases.



Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/ipv6/udp.c |   71 +++++++++++++++++++++++++++++++----------------
 1 files changed, 47 insertions(+), 24 deletions(-)

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 5bc7cdb..ecb6a6f 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -440,6 +440,27 @@ static struct sock *udp_v6_mcast_next(struct net *net, struct sock *sk,
 	return NULL;
 }
 
+static void flush_stack(struct sock **stack, unsigned int count,
+			struct sk_buff *skb, unsigned int final)
+{
+	unsigned int i;
+	struct sock *sk;
+	struct sk_buff *skb1;
+
+	for (i = 0; i < count; i++) {
+		skb1 = (i == final) ? skb : skb_clone(skb, GFP_ATOMIC);
+
+		if (skb1) {
+			sk = stack[i];
+			bh_lock_sock(sk);
+			if (!sock_owned_by_user(sk))
+				udpv6_queue_rcv_skb(sk, skb1);
+			else
+				sk_add_backlog(sk, skb1);
+			bh_unlock_sock(sk);
+		}
+	}
+}
 /*
  * Note: called only from the BH handler context,
  * so we don't need to lock the hashes.
@@ -448,41 +469,43 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 		struct in6_addr *saddr, struct in6_addr *daddr,
 		struct udp_table *udptable)
 {
-	struct sock *sk, *sk2;
+	struct sock *sk, *stack[256 / sizeof(struct sock *)];
 	const struct udphdr *uh = udp_hdr(skb);
 	struct udp_hslot *hslot = udp_hashslot(udptable, net, ntohs(uh->dest));
 	int dif;
+	unsigned int i, count = 0;
 
 	spin_lock(&hslot->lock);
 	sk = sk_nulls_head(&hslot->head);
 	dif = inet6_iif(skb);
 	sk = udp_v6_mcast_next(net, sk, uh->dest, daddr, uh->source, saddr, dif);
-	if (!sk) {
-		kfree_skb(skb);
-		goto out;
-	}
-
-	sk2 = sk;
-	while ((sk2 = udp_v6_mcast_next(net, sk_nulls_next(sk2), uh->dest, daddr,
-					uh->source, saddr, dif))) {
-		struct sk_buff *buff = skb_clone(skb, GFP_ATOMIC);
-		if (buff) {
-			bh_lock_sock(sk2);
-			if (!sock_owned_by_user(sk2))
-				udpv6_queue_rcv_skb(sk2, buff);
-			else
-				sk_add_backlog(sk2, buff);
-			bh_unlock_sock(sk2);
+	while (sk) {
+		stack[count++] = sk;
+		sk = udp_v6_mcast_next(net, sk_nulls_next(sk), uh->dest, daddr,
+				       uh->source, saddr, dif);
+		if (unlikely(count == ARRAY_SIZE(stack))) {
+			if (!sk)
+				break;
+			flush_stack(stack, count, skb, ~0);
+			count = 0;
 		}
 	}
-	bh_lock_sock(sk);
-	if (!sock_owned_by_user(sk))
-		udpv6_queue_rcv_skb(sk, skb);
-	else
-		sk_add_backlog(sk, skb);
-	bh_unlock_sock(sk);
-out:
+	/*
+	 * before releasing the lock, we must take reference on sockets
+	 */
+	for (i = 0; i < count; i++)
+		sock_hold(stack[i]);
+
 	spin_unlock(&hslot->lock);
+
+	if (count) {
+		flush_stack(stack, count, skb, count - 1);
+
+		for (i = 0; i < count; i++)
+			sock_put(stack[i]);
+	} else {
+		kfree_skb(skb);
+	}
 	return 0;
 }
 

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next-2.6 take2] ipv6: udp: Optimise multicast reception
  2009-11-06 17:06             ` [PATCH net-next-2.6 take2] " Eric Dumazet
@ 2009-11-06 17:19               ` Lucian Adrian Grijincu
  2009-11-06 17:24                 ` Eric Dumazet
  0 siblings, 1 reply; 15+ messages in thread
From: Lucian Adrian Grijincu @ 2009-11-06 17:19 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, opurdila

În data de Vin 06 Noi 2009 19:06:30 Eric Dumazet a scris:
> +       for (i = 0; i < count; i++) {
> +               skb1 = (i == final) ? skb : skb_clone(skb, GFP_ATOMIC);
> +
> +               if (skb1) {
> +                       sk = stack[i];
> +                       bh_lock_sock(sk);
> +                       if (!sock_owned_by_user(sk))
> +                               udpv6_queue_rcv_skb(sk, skb1);
> +                       else
> +                               sk_add_backlog(sk, skb1);
> +                       bh_unlock_sock(sk);
> +               }
> +       }

Is there any reason to continue this loop if we ever get a NULL skb1?

If once we can't get a GFP_ATOMIC clone, are there merrits to calling 
skb_clone() like crazy (in a very tight loop) (whilst holding a spin lock in 
some cases)?

-- 
Lucian

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next-2.6 take2] ipv6: udp: Optimise multicast reception
  2009-11-06 17:19               ` Lucian Adrian Grijincu
@ 2009-11-06 17:24                 ` Eric Dumazet
  2009-11-06 17:54                   ` Eric Dumazet
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2009-11-06 17:24 UTC (permalink / raw)
  To: Lucian Adrian Grijincu; +Cc: David Miller, netdev, opurdila

Lucian Adrian Grijincu a écrit :
> În data de Vin 06 Noi 2009 19:06:30 Eric Dumazet a scris:
>> +       for (i = 0; i < count; i++) {
>> +               skb1 = (i == final) ? skb : skb_clone(skb, GFP_ATOMIC);
>> +
>> +               if (skb1) {
>> +                       sk = stack[i];
>> +                       bh_lock_sock(sk);
>> +                       if (!sock_owned_by_user(sk))
>> +                               udpv6_queue_rcv_skb(sk, skb1);
>> +                       else
>> +                               sk_add_backlog(sk, skb1);
>> +                       bh_unlock_sock(sk);
>> +               }
>> +       }
> 
> Is there any reason to continue this loop if we ever get a NULL skb1?
> 
> If once we can't get a GFP_ATOMIC clone, are there merrits to calling 
> skb_clone() like crazy (in a very tight loop) (whilst holding a spin lock in 
> some cases)?
> 

Well, because of future RCU patch, I'll need to continue the loop anyway,
because sock_put() will be part of the loop.

Also, the 'final' case will at least provide the skb we already have.



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next-2.6 take2] ipv6: udp: Optimise multicast reception
  2009-11-06 17:24                 ` Eric Dumazet
@ 2009-11-06 17:54                   ` Eric Dumazet
  2009-11-06 17:59                     ` Lucian Adrian Grijincu
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2009-11-06 17:54 UTC (permalink / raw)
  To: Lucian Adrian Grijincu; +Cc: David Miller, netdev, opurdila

Eric Dumazet a écrit :
>
> 
> Well, because of future RCU patch, I'll need to continue the loop anyway,
> because sock_put() will be part of the loop.
> 
> Also, the 'final' case will at least provide the skb we already have.
> 

Last point :

We should atomic_inc(&sk->sk_drops) in the case skb_clone()
could not provide a copy, and eventually increment UDP_MIB_RCVBUFERRORS & 
UDP_MIB_INERRORS SNMP counters



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next-2.6 take2] ipv6: udp: Optimise multicast reception
  2009-11-06 17:54                   ` Eric Dumazet
@ 2009-11-06 17:59                     ` Lucian Adrian Grijincu
  2009-11-06 18:03                       ` Eric Dumazet
  0 siblings, 1 reply; 15+ messages in thread
From: Lucian Adrian Grijincu @ 2009-11-06 17:59 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, opurdila

În data de Vin 06 Noi 2009 19:54:55 Eric Dumazet a scris:
> We should atomic_inc(&sk->sk_drops) in the case skb_clone()
> could not provide a copy, and eventually increment UDP_MIB_RCVBUFERRORS &
> UDP_MIB_INERRORS SNMP counters

Shouldn't this be also done if udp_queue_rcv_skb() returned a failure code (a 
positive number) instead of silently droping?

-- 
Lucian

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next-2.6 take2] ipv6: udp: Optimise multicast reception
  2009-11-06 17:59                     ` Lucian Adrian Grijincu
@ 2009-11-06 18:03                       ` Eric Dumazet
  0 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2009-11-06 18:03 UTC (permalink / raw)
  To: Lucian Adrian Grijincu; +Cc: David Miller, netdev, opurdila

Lucian Adrian Grijincu a écrit :
> În data de Vin 06 Noi 2009 19:54:55 Eric Dumazet a scris:
>> We should atomic_inc(&sk->sk_drops) in the case skb_clone()
>> could not provide a copy, and eventually increment UDP_MIB_RCVBUFERRORS &
>> UDP_MIB_INERRORS SNMP counters
> 
> Shouldn't this be also done if udp_queue_rcv_skb() returned a failure code (a 
> positive number) instead of silently droping?
> 

Nope, this is correctly done in __udp_queue_rcv_skb() & sock_queue_rcv_skb()

Only multicast is lazy in this area.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next-2.6] udp: Optimise multicast reception
  2009-11-06 15:59       ` [PATCH net-next-2.6] udp: Optimise multicast reception Eric Dumazet
  2009-11-06 16:30         ` [PATCH net-next-2.6] ipv6: " Eric Dumazet
@ 2009-11-06 16:35         ` Lucian Adrian Grijincu
  2009-11-06 16:54           ` Eric Dumazet
  1 sibling, 1 reply; 15+ messages in thread
From: Lucian Adrian Grijincu @ 2009-11-06 16:35 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, opurdila

[-- Attachment #1: Type: Text/Plain, Size: 1298 bytes --]

În data de Vin 06 Noi 2009 17:59:51 ați scris:
> +static void flush_stack(struct sock **stack, unsigned int count,
> +                       struct sk_buff *skb, unsigned int final)
> +{
> +       unsigned int i;
> +       struct sk_buff *skb1 = NULL;
> +
> +       for (i = 0; i < count; i++) {
> +             if (likely(skb1 == NULL))
> +                    skb1 = (i == final) ? skb : skb_clone(skb, GFP_ATOMIC);
> +
> +               if (skb1 && udp_queue_rcv_skb(stack[i], skb1) <= 0)
> +                       skb1 = NULL;
> +       }
> +       if (skb1)
> +               consume_skb(skb1);
> +}

consume_skb() assumes the skb was successfuly transmitted.

free_skb() does the same thing, but assumes that the frame is being dropped 
after a failure and notes that.

In your code, if (count == 0) you:
* fail to remove the original skb (memory leak), 
* simply consume the last dropped skb, without noting the droping failure.

I fixed these in the attached (untested) patch.

One last issue: you silently ignore dropped failures (skb1 is reused in case 
of a failure).

If this tracing must record all failures, I'd add an
	trace_kfree_skb(skb1, __builtin_return_address(0));
if udp_queue_rcv_skb() fails.

Other than this, nicely done!
-- 
Lucian

[-- Attachment #2: udp-optimise-multicast-reception.patch --]
[-- Type: text/plain, Size: 2359 bytes --]

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index d5e75e9..a2c1a27 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1190,6 +1190,29 @@ drop:
 	return -1;
 }
 
+
+static void flush_stack(struct sock **stack, unsigned int count,
+			struct sk_buff *skb, unsigned int final)
+{
+	unsigned int i;
+	struct sk_buff *skb1 = NULL;
+
+	if ((count == 0) && skb) {
+		consume_skb(skb);
+		return;
+	}
+
+	for (i = 0; i < count; i++) {
+		if (likely(skb1 == NULL))
+			skb1 = (i == final) ? skb : skb_clone(skb, GFP_ATOMIC);
+
+		if (skb1 && udp_queue_rcv_skb(stack[i], skb1) <= 0)
+			skb1 = NULL;
+	}
+	if (skb1)
+		free_skb(skb1);
+}
+
 /*
  *	Multicasts and broadcasts go to each listener.
  *
@@ -1201,38 +1224,42 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 				    __be32 saddr, __be32 daddr,
 				    struct udp_table *udptable)
 {
-	struct sock *sk;
+	struct sock *sk, *stack[256 / sizeof(struct sock *)];
 	struct udp_hslot *hslot = udp_hashslot(udptable, net, ntohs(uh->dest));
 	int dif;
+	unsigned int i, count = 0;
 
 	spin_lock(&hslot->lock);
 	sk = sk_nulls_head(&hslot->head);
 	dif = skb->dev->ifindex;
 	sk = udp_v4_mcast_next(net, sk, uh->dest, daddr, uh->source, saddr, dif);
-	if (sk) {
-		struct sock *sknext = NULL;
-
-		do {
-			struct sk_buff *skb1 = skb;
-
-			sknext = udp_v4_mcast_next(net, sk_nulls_next(sk), uh->dest,
-						   daddr, uh->source, saddr,
-						   dif);
-			if (sknext)
-				skb1 = skb_clone(skb, GFP_ATOMIC);
-
-			if (skb1) {
-				int ret = udp_queue_rcv_skb(sk, skb1);
-				if (ret > 0)
-					/* we should probably re-process instead
-					 * of dropping packets here. */
-					kfree_skb(skb1);
-			}
-			sk = sknext;
-		} while (sknext);
-	} else
-		consume_skb(skb);
+	while (sk) {
+		stack[count++] = sk;
+		sk = udp_v4_mcast_next(net, sk_nulls_next(sk), uh->dest,
+				       daddr, uh->source, saddr, dif);
+		if (unlikely(count == ARRAY_SIZE(stack))) {
+			if (!sk)
+				break;
+			flush_stack(stack, count, skb, ~0);
+			count = 0;
+		}
+
+	}
+	/*
+	 * before releasing the lock, we must take reference on sockets
+	 */
+	for (i = 0; i < count; i++)
+		sock_hold(stack[i]);
+
 	spin_unlock(&hslot->lock);
+
+	/*
+	 * do the slow work with no lock held
+	 */
+	flush_stack(stack, count, skb, count - 1);
+
+	for (i = 0; i < count; i++)
+		sock_put(stack[i]);
 	return 0;
 }
 

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next-2.6] udp: Optimise multicast reception
  2009-11-06 16:35         ` [PATCH net-next-2.6] " Lucian Adrian Grijincu
@ 2009-11-06 16:54           ` Eric Dumazet
  0 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2009-11-06 16:54 UTC (permalink / raw)
  To: Lucian Adrian Grijincu; +Cc: David Miller, netdev, opurdila

Lucian Adrian Grijincu a écrit :
> În data de Vin 06 Noi 2009 17:59:51 ați scris:
>> +static void flush_stack(struct sock **stack, unsigned int count,
>> +                       struct sk_buff *skb, unsigned int final)
>> +{
>> +       unsigned int i;
>> +       struct sk_buff *skb1 = NULL;
>> +
>> +       for (i = 0; i < count; i++) {
>> +             if (likely(skb1 == NULL))
>> +                    skb1 = (i == final) ? skb : skb_clone(skb, GFP_ATOMIC);
>> +
>> +               if (skb1 && udp_queue_rcv_skb(stack[i], skb1) <= 0)
>> +                       skb1 = NULL;
>> +       }
>> +       if (skb1)
>> +               consume_skb(skb1);
>> +}
> 
> consume_skb() assumes the skb was successfuly transmitted.
> 
> free_skb() does the same thing, but assumes that the frame is being dropped 
> after a failure and notes that.
> 
> In your code, if (count == 0) you:
> * fail to remove the original skb (memory leak), 
> * simply consume the last dropped skb, without noting the droping failure.
> 
> I fixed these in the attached (untested) patch.
> 
> One last issue: you silently ignore dropped failures (skb1 is reused in case 
> of a failure).
> 
> If this tracing must record all failures, I'd add an
> 	trace_kfree_skb(skb1, __builtin_return_address(0));
> if udp_queue_rcv_skb() fails.
> 
> Other than this, nicely done!
> 

Thanks !

Note, free_skb() doesnt exist ;)
And we should not call consume_skb() in this path.

I made the if (count == 0) done at the end of __udp4_lib_mcast_deliver()



[PATCH net-next-2.6] udp: Optimise multicast reception

UDP multicast rx path is a bit complex and can hold a spinlock
for a long time.

Using a small (32 or 64 entries) stack of socket pointers can help
to perform expensive operations (skb_clone(), udp_queue_rcv_skb())
outside of the lock, in most cases.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Lucian Adrian Grijincu <lgrijincu@ixiacom.com>
---
 net/ipv4/udp.c |   76 ++++++++++++++++++++++++++++++-----------------
 1 files changed, 50 insertions(+), 26 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index d5e75e9..45c73b1 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1190,49 +1190,73 @@ drop:
 	return -1;
 }
 
+
+static void flush_stack(struct sock **stack, unsigned int count,
+			struct sk_buff *skb, unsigned int final)
+{
+	unsigned int i;
+	struct sk_buff *skb1 = NULL;
+
+	for (i = 0; i < count; i++) {
+		if (likely(skb1 == NULL))
+			skb1 = (i == final) ? skb : skb_clone(skb, GFP_ATOMIC);
+
+		if (skb1 && udp_queue_rcv_skb(stack[i], skb1) <= 0)
+			skb1 = NULL;
+	}
+	if (unlikely(skb1))
+		kfree_skb(skb1);
+}
+
 /*
  *	Multicasts and broadcasts go to each listener.
  *
- *	Note: called only from the BH handler context,
- *	so we don't need to lock the hashes.
+ *	Note: called only from the BH handler context.
  */
 static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 				    struct udphdr  *uh,
 				    __be32 saddr, __be32 daddr,
 				    struct udp_table *udptable)
 {
-	struct sock *sk;
+	struct sock *sk, *stack[256 / sizeof(struct sock *)];
 	struct udp_hslot *hslot = udp_hashslot(udptable, net, ntohs(uh->dest));
 	int dif;
+	unsigned int i, count = 0;
 
 	spin_lock(&hslot->lock);
 	sk = sk_nulls_head(&hslot->head);
 	dif = skb->dev->ifindex;
 	sk = udp_v4_mcast_next(net, sk, uh->dest, daddr, uh->source, saddr, dif);
-	if (sk) {
-		struct sock *sknext = NULL;
-
-		do {
-			struct sk_buff *skb1 = skb;
-
-			sknext = udp_v4_mcast_next(net, sk_nulls_next(sk), uh->dest,
-						   daddr, uh->source, saddr,
-						   dif);
-			if (sknext)
-				skb1 = skb_clone(skb, GFP_ATOMIC);
-
-			if (skb1) {
-				int ret = udp_queue_rcv_skb(sk, skb1);
-				if (ret > 0)
-					/* we should probably re-process instead
-					 * of dropping packets here. */
-					kfree_skb(skb1);
-			}
-			sk = sknext;
-		} while (sknext);
-	} else
-		consume_skb(skb);
+	while (sk) {
+		stack[count++] = sk;
+		sk = udp_v4_mcast_next(net, sk_nulls_next(sk), uh->dest,
+				       daddr, uh->source, saddr, dif);
+		if (unlikely(count == ARRAY_SIZE(stack))) {
+			if (!sk)
+				break;
+			flush_stack(stack, count, skb, ~0);
+			count = 0;
+		}
+	}
+	/*
+	 * before releasing chain lock, we must take a reference on sockets
+	 */
+	for (i = 0; i < count; i++)
+		sock_hold(stack[i]);
+
 	spin_unlock(&hslot->lock);
+
+	/*
+	 * do the slow work with no lock held
+	 */
+	if (count) {
+		flush_stack(stack, count, skb, count - 1);
+
+		for (i = 0; i < count; i++)
+			sock_put(stack[i]);
+	} else {
+		kfree_skb(skb);
+	}
 	return 0;
 }
 

^ permalink raw reply related	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2009-11-06 18:03 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-11-05 18:33 [PATCH 1/2] udp: cleanup __udp4_lib_mcast_deliver Lucian Adrian Grijincu
2009-11-06  8:42 ` David Miller
2009-11-06 14:04   ` Lucian Adrian Grijincu
2009-11-06 15:10     ` Eric Dumazet
2009-11-06 15:59       ` [PATCH net-next-2.6] udp: Optimise multicast reception Eric Dumazet
2009-11-06 16:30         ` [PATCH net-next-2.6] ipv6: " Eric Dumazet
2009-11-06 16:35           ` Eric Dumazet
2009-11-06 17:06             ` [PATCH net-next-2.6 take2] " Eric Dumazet
2009-11-06 17:19               ` Lucian Adrian Grijincu
2009-11-06 17:24                 ` Eric Dumazet
2009-11-06 17:54                   ` Eric Dumazet
2009-11-06 17:59                     ` Lucian Adrian Grijincu
2009-11-06 18:03                       ` Eric Dumazet
2009-11-06 16:35         ` [PATCH net-next-2.6] " Lucian Adrian Grijincu
2009-11-06 16:54           ` Eric Dumazet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).