* speed regression in udp_lib_lport_inuse()
From: Vitaly Mayatskikh @ 2009-01-22 18:49 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, netdev
Hello!
I found that your latest patches for UDP port randomization really solve
the "finding the shortest chain kills randomness" problem, but they
significantly slow the system down when almost every port is
in use: the kernel spends too much time trying to find a free port number.
Try compiling and running this reproducer (after raising the open
files limit).
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <pthread.h>
#include <assert.h>

#define PORTS 65536
#define NP 64
#define THREADS

void* foo(void* arg)
{
	int s, err, i, port;
	struct sockaddr_in sa;
	socklen_t len;
	unsigned int p[PORTS] = { 0 };

	for (i = 0; i < PORTS * 100; ++i) {
		s = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP);
		assert(s >= 0);
		memset(&sa, 0, sizeof(sa));
		sa.sin_addr.s_addr = htonl(INADDR_ANY);
		sa.sin_family = AF_INET;
		sa.sin_port = 0;	/* ask the kernel for a random port */
		err = bind(s, (const struct sockaddr*)&sa, sizeof(sa));
		len = sizeof(sa);
		getsockname(s, (struct sockaddr*)&sa, &len);
		port = ntohs(sa.sin_port);
		p[port] = s;
		/* free some neighbouring ports to keep the kernel searching */
		if (port + 1 < PORTS && p[port + 1]) {
			close(p[port + 1]);
			p[port + 1] = 0;
		}
		if (port > 0 && p[port - 1]) {
			close(p[port - 1]);
			p[port - 1] = 0;
		}
	}
	return NULL;
}

int main()
{
	int i, err;
#ifdef THREADS
	pthread_t t[NP];

	for (i = 0; i < NP; ++i) {
		err = pthread_create(&t[i], NULL, foo, NULL);
		assert(err == 0);
	}
	for (i = 0; i < NP; ++i) {
		err = pthread_join(t[i], NULL);
		assert(err == 0);
	}
#else
	for (i = 0; i < NP; ++i) {
		err = fork();
		if (err == 0)
			foo(NULL);
	}
#endif
	return 0;
}
I ran glxgears and had these numbers:
$ glxgears
3297 frames in 5.0 seconds = 659.283 FPS
3680 frames in 5.0 seconds = 735.847 FPS
3840 frames in 5.0 seconds = 767.891 FPS
3574 frames in 5.0 seconds = 714.704 FPS
-> here I ran reproducer
2507 frames in 5.1 seconds = 493.173 FPS
56 frames in 7.7 seconds = 7.316 FPS
14 frames in 5.1 seconds = 2.752 FPS
1 frames in 6.8 seconds = 0.146 FPS
9 frames in 7.6 seconds = 1.188 FPS
1 frames in 9.3 seconds = 0.108 FPS
12 frames in 5.5 seconds = 2.187 FPS
30 frames in 9.0 seconds = 3.338 FPS
25 frames in 5.1 seconds = 4.888 FPS
<- here I killed reproducer
1034 frames in 5.0 seconds = 206.764 FPS
3728 frames in 5.0 seconds = 745.541 FPS
3668 frames in 5.0 seconds = 733.496 FPS
The last stable kernel survives this more or less smoothly.
Thanks!
--
wbr, Vitaly
* Re: speed regression in udp_lib_lport_inuse()
From: Eric Dumazet @ 2009-01-22 22:06 UTC (permalink / raw)
To: Vitaly Mayatskikh; +Cc: David Miller, netdev
Vitaly Mayatskikh wrote:
> Hello!
>
> I found that your latest patches for UDP port randomization really solve
> the "finding the shortest chain kills randomness" problem, but they
> significantly slow the system down when almost every port is
> in use: the kernel spends too much time trying to find a free port number.
>
> Try compiling and running this reproducer (after raising the open
> files limit).
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <errno.h>
> #include <string.h>
> #include <unistd.h>
> #include <sys/types.h>
> #include <sys/socket.h>
> #include <netinet/in.h>
> #include <pthread.h>
> #include <assert.h>
>
> #define PORTS 65536
> #define NP 64
> #define THREADS
>
> void* foo(void* arg)
> {
> 	int s, err, i, port;
> 	struct sockaddr_in sa;
> 	socklen_t len;
> 	unsigned int p[PORTS] = { 0 };
>
> 	for (i = 0; i < PORTS * 100; ++i) {
> 		s = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP);
> 		assert(s >= 0);
> 		memset(&sa, 0, sizeof(sa));
> 		sa.sin_addr.s_addr = htonl(INADDR_ANY);
> 		sa.sin_family = AF_INET;
> 		sa.sin_port = 0;	/* ask the kernel for a random port */
> 		err = bind(s, (const struct sockaddr*)&sa, sizeof(sa));
Bug here, if bind() returns -1 (all ports are in use)
>
> 		len = sizeof(sa);
> 		getsockname(s, (struct sockaddr*)&sa, &len);
> 		port = ntohs(sa.sin_port);
> 		p[port] = s;
> 		/* free some neighbouring ports to keep the kernel searching */
> 		if (port + 1 < PORTS && p[port + 1]) {
> 			close(p[port + 1]);
> 			p[port + 1] = 0;
> 		}
> 		if (port > 0 && p[port - 1]) {
> 			close(p[port - 1]);
> 			p[port - 1] = 0;
> 		}
> 	}
> 	return NULL;
> }
>
> int main()
> {
> 	int i, err;
> #ifdef THREADS
> 	pthread_t t[NP];
>
> 	for (i = 0; i < NP; ++i) {
> 		err = pthread_create(&t[i], NULL, foo, NULL);
> 		assert(err == 0);
> 	}
> 	for (i = 0; i < NP; ++i) {
> 		err = pthread_join(t[i], NULL);
> 		assert(err == 0);
> 	}
> #else
> 	for (i = 0; i < NP; ++i) {
> 		err = fork();
> 		if (err == 0)
> 			foo(NULL);
> 	}
> #endif
> 	return 0;
> }
>
> I ran glxgears and had these numbers:
>
> $ glxgears
> 3297 frames in 5.0 seconds = 659.283 FPS
> 3680 frames in 5.0 seconds = 735.847 FPS
> 3840 frames in 5.0 seconds = 767.891 FPS
> 3574 frames in 5.0 seconds = 714.704 FPS
> -> here I ran reproducer
> 2507 frames in 5.1 seconds = 493.173 FPS
> 56 frames in 7.7 seconds = 7.316 FPS
> 14 frames in 5.1 seconds = 2.752 FPS
> 1 frames in 6.8 seconds = 0.146 FPS
> 9 frames in 7.6 seconds = 1.188 FPS
> 1 frames in 9.3 seconds = 0.108 FPS
> 12 frames in 5.5 seconds = 2.187 FPS
> 30 frames in 9.0 seconds = 3.338 FPS
> 25 frames in 5.1 seconds = 4.888 FPS
> <- here I killed reproducer
> 1034 frames in 5.0 seconds = 206.764 FPS
> 3728 frames in 5.0 seconds = 745.541 FPS
> 3668 frames in 5.0 seconds = 733.496 FPS
>
> The last stable kernel survives this more or less smoothly.
>
> Thanks!
Hello Vitaly, thanks for this excellent report.

Yes, the current code is really not good when all ports are in use:
we now have to scan, 28232 times [1], long chains of about 220 sockets.
That's very slow (but at least the thread is preemptible).

In the past (before the patches), only one thread was allowed to run in the kernel while
scanning the UDP port table (we had a single global lock, udp_hash_lock, protecting the whole table).
That thread was faster because it was not slowed down by other threads.
(But the rwlock we used was responsible for starvation of writers when many UDP frames
were received.)

One way to solve the problem could be the following:

1) Raise UDP_HTABLE_SIZE from 128 to 1024 to reduce average chain lengths.

2) In the bind(0) algorithm, use RCU locking to find a possibly usable port. All CPUs can run in parallel,
without dirtying locks. Then lock the found chain and recheck that the port is still available before using it.

[1] replace 28232 with your actual /proc/sys/net/ipv4/ip_local_port_range values:
61000 - 32768 = 28232
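For scale, a rough worked estimate of the cost of one bind(0) call, using the
figures above and assuming every scan walks a full chain:

	28232 scans * ~220 sockets per chain ~= 6.2 million list entries examined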
I will try to code a patch before this weekend.

Thanks

Note: I tried using a mutex to force only one thread into the bind(0) code, but got no real speedup.
But it should still help on an SMP machine, since only one CPU will be busy in bind(0)
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index cf5ab05..a572407 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -155,6 +155,8 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
struct udp_hslot *hslot;
struct udp_table *udptable = sk->sk_prot->h.udp_table;
int error = 1;
+ static DEFINE_MUTEX(bind0_mutex);
+ int mutex_acquired = 0;
struct net *net = sock_net(sk);
if (!snum) {
@@ -162,6 +164,8 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
unsigned rand;
unsigned short first;
+ mutex_lock(&bind0_mutex);
+ mutex_acquired = 1;
inet_get_local_port_range(&low, &high);
remaining = (high - low) + 1;
@@ -196,6 +200,8 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
fail_unlock:
spin_unlock_bh(&hslot->lock);
fail:
+ if (mutex_acquired)
+ mutex_unlock(&bind0_mutex);
return error;
}
* Re: speed regression in udp_lib_lport_inuse()
From: Evgeniy Polyakov @ 2009-01-22 22:14 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Vitaly Mayatskikh, David Miller, netdev
Hi Eric.
On Thu, Jan 22, 2009 at 11:06:59PM +0100, Eric Dumazet (dada1@cosmosbay.com) wrote:
> Hello Vitaly, thanks for this excellent report.
>
> Yes, the current code is really not good when all ports are in use:
> we now have to scan, 28232 times [1], long chains of about 220 sockets.
> That's very slow (but at least the thread is preemptible).
>
> In the past (before the patches), only one thread was allowed to run in the kernel while
> scanning the UDP port table (we had a single global lock, udp_hash_lock, protecting the whole table).
> That thread was faster because it was not slowed down by other threads.
> (But the rwlock we used was responsible for starvation of writers when many UDP frames
> were received.)
I believe the problem is in the port searching algorithm: after the random
selection of the first port, the number of ports to check grows
exponentially. This keeps the chains small, but the setup time becomes
very long. I am not sure the bind chains actually need to be that small.
In the 64k patch, which allows more than 64k bound sockets per system,
I store a rough count of bound sockets, and when it becomes larger than
the sysctl limit I just randomly select a bundle. This works for bind(0)
for sockets with the reuse option, though. I posted a picture of the
bind(0) time for the .28 kernel, IIRC.
Or is this a different issue?
--
Evgeniy Polyakov
* Re: speed regression in udp_lib_lport_inuse()
From: Vitaly Mayatskikh @ 2009-01-22 22:40 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Vitaly Mayatskikh, David Miller, netdev
At Thu, 22 Jan 2009 23:06:59 +0100, Eric Dumazet wrote:
> > err = bind(s, (const struct sockaddr*)&sa, sizeof(sa));
>
> Bug here, if bind() returns -1 (all ports are in use)
Yeah, there was an assert() there, but the program runs into that problem
very quickly; I was too lazy to handle the situation correctly and just removed it ;)
> > Thanks!
>
> Hello Vitaly, thanks for this excellent report.
>
> Yes, the current code is really not good when all ports are in use:
> we now have to scan, 28232 times [1], long chains of about 220 sockets.
> That's very slow (but at least the thread is preemptible).
>
> In the past (before the patches), only one thread was allowed to run in the kernel while
> scanning the UDP port table (we had a single global lock, udp_hash_lock, protecting the whole table).

Very true: my (older) kernel with udp_hash_lock just becomes totally
unresponsive after running this test. .29-rc2 only becomes jerky, but
still works.

> That thread was faster because it was not slowed down by other threads.
> (But the rwlock we used was responsible for starvation of writers when many UDP frames
> were received.)
>
>
>
> One way to solve the problem could be the following:
>
> 1) Raise UDP_HTABLE_SIZE from 128 to 1024 to reduce average chain lengths.
>
> 2) In the bind(0) algorithm, use RCU locking to find a possibly usable port. All CPUs can run in parallel,
> without dirtying locks. Then lock the found chain and recheck that the port is still available before using it.

I think 2) is definitely better than 1), because 1) does not actually
fix anything, it only postpones the problem slightly.

> [1] replace 28232 with your actual /proc/sys/net/ipv4/ip_local_port_range values:
> 61000 - 32768 = 28232
>
> I will try to code a patch before this weekend.

Cool!

> Thanks
>
> Note: I tried using a mutex to force only one thread into the bind(0) code, but got no real speedup.
> But it should still help on an SMP machine, since only one CPU will be busy in bind(0)
>

You saved me some time, I was also thinking about trying mutexes. Thanks :)
--
wbr, Vitaly
* Re: speed regression in udp_lib_lport_inuse()
From: Eric Dumazet @ 2009-01-23 0:14 UTC (permalink / raw)
To: Vitaly Mayatskikh; +Cc: David Miller, netdev
Vitaly Mayatskikh wrote:
> At Thu, 22 Jan 2009 23:06:59 +0100, Eric Dumazet wrote:
>
>>> 		err = bind(s, (const struct sockaddr*)&sa, sizeof(sa));
>> Bug here, if bind() returns -1 (all ports are in use)
>
> Yeah, there was an assert() there, but the program runs into that problem
> very quickly; I was too lazy to handle the situation correctly and just removed it ;)
>
>>> Thanks!
>> Hello Vitaly, thanks for this excellent report.
>>
>> Yes, the current code is really not good when all ports are in use:
>> we now have to scan, 28232 times [1], long chains of about 220 sockets.
>> That's very slow (but at least the thread is preemptible).
>>
>> In the past (before the patches), only one thread was allowed to run in the kernel while
>> scanning the UDP port table (we had a single global lock, udp_hash_lock, protecting the whole table).
>
> Very true: my (older) kernel with udp_hash_lock just becomes totally
> unresponsive after running this test. .29-rc2 only becomes jerky, but
> still works.
>
>> That thread was faster because it was not slowed down by other threads.
>> (But the rwlock we used was responsible for starvation of writers when many UDP frames
>> were received.)
>>
>>
>>
>> One way to solve the problem could be the following:
>>
>> 1) Raise UDP_HTABLE_SIZE from 128 to 1024 to reduce average chain lengths.
>>
>> 2) In the bind(0) algorithm, use RCU locking to find a possibly usable port. All CPUs can run in parallel,
>> without dirtying locks. Then lock the found chain and recheck that the port is still available before using it.
>
> I think 2) is definitely better than 1), because 1) does not actually
> fix anything, it only postpones the problem slightly.
>
>> [1] replace 28232 with your actual /proc/sys/net/ipv4/ip_local_port_range values:
>> 61000 - 32768 = 28232
>>
>> I will try to code a patch before this weekend.
>
> Cool!
>
>> Thanks
>>
>> Note: I tried using a mutex to force only one thread into the bind(0) code, but got no real speedup.
>> But it should still help on an SMP machine, since only one CPU will be busy in bind(0)
>>
>
> You saved me some time, I was also thinking about trying mutexes. Thanks :)
>
Could you try the following patch?
Thank you
[PATCH] udp: optimize bind(0) if many ports are in use

commit 9088c5609584684149f3fb5b065aa7f18dcb03ff
(udp: Improve port randomization) introduced a regression for the UDP bind() syscall
with a null port (getting a random port) when a lot of ports are already in use.

This is because we do about 28000 scans of very long chains (220 sockets per chain),
with many spin_lock_bh()/spin_unlock_bh() calls.

Fix this by using a bitmap (64 bytes for the current value of UDP_HTABLE_SIZE)
so that we scan each chain at most once.

Instead of 250 ms per bind() call, we get 2.9 ms after the patch.

Based on a report from Vitaly Mayatskikh

Reported-by: Vitaly Mayatskikh <v.mayatskih@gmail.com>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
---
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index cf5ab05..adbdbd8 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -120,8 +120,11 @@ EXPORT_SYMBOL(sysctl_udp_wmem_min);
atomic_t udp_memory_allocated;
EXPORT_SYMBOL(udp_memory_allocated);
+#define PORTS_PER_CHAIN (65536 / UDP_HTABLE_SIZE)
+
static int udp_lib_lport_inuse(struct net *net, __u16 num,
const struct udp_hslot *hslot,
+ unsigned long *bitmap,
struct sock *sk,
int (*saddr_comp)(const struct sock *sk1,
const struct sock *sk2))
@@ -132,12 +135,16 @@ static int udp_lib_lport_inuse(struct net *net, __u16 num,
sk_nulls_for_each(sk2, node, &hslot->head)
if (net_eq(sock_net(sk2), net) &&
sk2 != sk &&
- sk2->sk_hash == num &&
+ (bitmap || sk2->sk_hash == num) &&
(!sk2->sk_reuse || !sk->sk_reuse) &&
(!sk2->sk_bound_dev_if || !sk->sk_bound_dev_if
|| sk2->sk_bound_dev_if == sk->sk_bound_dev_if) &&
- (*saddr_comp)(sk, sk2))
- return 1;
+ (*saddr_comp)(sk, sk2)) {
+ if (bitmap)
+ __set_bit(sk2->sk_hash / UDP_HTABLE_SIZE, bitmap);
+ else
+ return 1;
+ }
return 0;
}
@@ -158,34 +165,44 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
struct net *net = sock_net(sk);
if (!snum) {
- int low, high, remaining;
- unsigned rand;
+ int low, high;
+ unsigned rand, slotn, bias;
unsigned short first;
+ DECLARE_BITMAP(bitmap, PORTS_PER_CHAIN);
inet_get_local_port_range(&low, &high);
- remaining = (high - low) + 1;
rand = net_random();
- snum = first = rand % remaining + low;
- rand |= 1;
- for (;;) {
- hslot = &udptable->hash[udp_hashfn(net, snum)];
+ bias = rand;
+ rand = ((rand >> 16) | 1) * UDP_HTABLE_SIZE;
+ for (slotn = 0; slotn < UDP_HTABLE_SIZE; slotn++) {
+ first = slotn + bias;
+ hslot = &udptable->hash[udp_hashfn(net, first)];
+ bitmap_zero(bitmap, PORTS_PER_CHAIN);
spin_lock_bh(&hslot->lock);
- if (!udp_lib_lport_inuse(net, snum, hslot, sk, saddr_comp))
- break;
- spin_unlock_bh(&hslot->lock);
+ udp_lib_lport_inuse(net, snum, hslot, bitmap, sk, saddr_comp);
+
+ snum = first;
+ /*
+ * PORTS_PER_CHAIN loops, because snum is unsigned short
+ * and we add an odd multiple of UDP_HTABLE_SIZE
+ */
do {
- snum = snum + rand;
- } while (snum < low || snum > high);
- if (snum == first)
- goto fail;
+ if (low <= snum && snum <= high &&
+ !test_bit(snum / UDP_HTABLE_SIZE, bitmap))
+ goto found;
+ snum += rand;
+ } while (snum != first);
+ spin_unlock_bh(&hslot->lock);
}
+ goto fail;
} else {
hslot = &udptable->hash[udp_hashfn(net, snum)];
spin_lock_bh(&hslot->lock);
- if (udp_lib_lport_inuse(net, snum, hslot, sk, saddr_comp))
+ if (udp_lib_lport_inuse(net, snum, hslot, NULL, sk, saddr_comp))
goto fail_unlock;
}
+found:
inet_sk(sk)->num = snum;
sk->sk_hash = snum;
if (sk_unhashed(sk)) {
* Re: speed regression in udp_lib_lport_inuse()
From: Eric Dumazet @ 2009-01-23 0:20 UTC (permalink / raw)
To: Evgeniy Polyakov; +Cc: Vitaly Mayatskikh, David Miller, netdev
Evgeniy Polyakov wrote:
> Hi Eric.
>
> On Thu, Jan 22, 2009 at 11:06:59PM +0100, Eric Dumazet (dada1@cosmosbay.com) wrote:
>> Hello Vitaly, thanks for this excellent report.
>>
>> Yes, the current code is really not good when all ports are in use:
>> we now have to scan, 28232 times [1], long chains of about 220 sockets.
>> That's very slow (but at least the thread is preemptible).
>>
>> In the past (before the patches), only one thread was allowed to run in the kernel while
>> scanning the UDP port table (we had a single global lock, udp_hash_lock, protecting the whole table).
>> That thread was faster because it was not slowed down by other threads.
>> (But the rwlock we used was responsible for starvation of writers when many UDP frames
>> were received.)
>
> I believe the problem is in the port searching algorithm: after the random
> selection of the first port, the number of ports to check grows
> exponentially. This keeps the chains small, but the setup time becomes
> very long. I am not sure the bind chains actually need to be that small.
> In the 64k patch, which allows more than 64k bound sockets per system,
> I store a rough count of bound sockets, and when it becomes larger than
> the sysctl limit I just randomly select a bundle. This works for bind(0)
> for sockets with the reuse option, though. I posted a picture of the
> bind(0) time for the .28 kernel, IIRC.
>
> Or is this a different issue?
>
>
Well, this is not exactly the same issue; the UDP bind() code is slightly different
from the TCP one. (Probably not many machines use a lot of UDP sockets.)

Since the UDP hash table is really small (128 slots), we can try to allocate UDP ports
chain by chain, instead of port by port, to reduce the number of chain lookups.

In TCP, most machines have 64k slots for the bind table, so this won't help there.
* Re: speed regression in udp_lib_lport_inuse()
From: Vitaly Mayatskikh @ 2009-01-23 9:42 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Vitaly Mayatskikh, David Miller, netdev
At Fri, 23 Jan 2009 01:14:58 +0100, Eric Dumazet wrote:
> Could you try the following patch?
>
> Thank you
>
> [PATCH] udp: optimize bind(0) if many ports are in use
>
> commit 9088c5609584684149f3fb5b065aa7f18dcb03ff
> (udp: Improve port randomization) introduced a regression for the UDP bind() syscall
> with a null port (getting a random port) when a lot of ports are already in use.
>
> This is because we do about 28000 scans of very long chains (220 sockets per chain),
> with many spin_lock_bh()/spin_unlock_bh() calls.
>
> Fix this by using a bitmap (64 bytes for the current value of UDP_HTABLE_SIZE)
> so that we scan each chain at most once.
>
> Instead of 250 ms per bind() call, we get 2.9 ms after the patch.
It's much better, thanks. FPS in glxgears now drops only 2x more
than with 2.6.28. However, this again kills randomness :)
The port number distribution is now k*x^2, with the x-axis zero at
(high - low) / 2. Try this program; it produces an input file for Octave + Gnuplot.
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define PORTS 65536

int main()
{
	int s, err, i, port;
	struct sockaddr_in sa;
	socklen_t len;
	int optval = 1;
	unsigned int p[PORTS] = { 0 };

	for (i = 0; i < PORTS * 100; ++i) {
		s = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP);
		memset(&sa, 0, sizeof(sa));
		sa.sin_addr.s_addr = htonl(INADDR_ANY);
		sa.sin_family = AF_INET;
		sa.sin_port = 0;	/* let the kernel pick a random port */
		setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof(optval));
		err = bind(s, (const struct sockaddr*)&sa, sizeof(sa));
		len = sizeof(sa);
		getsockname(s, (struct sockaddr*)&sa, &len);
		port = ntohs(sa.sin_port);
		p[port]++;	/* histogram of returned ports */
		close(s);
	}
	/* emit an Octave script plotting the histogram */
	printf("x = 32766:1:65535;\ny = [-100; ");
	for (i = 32767; i < PORTS; i++)
		printf("%d%s", p[i], (i + 1 < PORTS ? "; " : ""));
	printf("];\nplot(x,y,'.');pause;");
	return 0;
}
I was also thinking about a bitmap, but with a slightly different
approach, which also uses a bias (delta) value instead of an exact port
number. When we get the next random port value (from the rng or in the
next iteration), we can calculate its byte offset in the bitmap:

A        B        C        D
76543210 76543210 76543210 76543210
11110111 11011110 10011110 11110111
            ^

Here we land in byte B, at the marked bit position, but it is
already busy. If there are any free bits in byte B, we can stop
further iteration and use any of those free bits. I don't think this can
hurt randomness too much, because the average bias will be small. It may
only need some more complicated logic for finding a free bit in the
byte, because it's not good to always scan from the beginning.
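A minimal userspace sketch of this idea (all names are mine, and the model is
deliberately simplified: one bit per port, no hashing, no locking):

#include <stdio.h>
#include <stdlib.h>

#define PORTS 65536

static unsigned char busy[PORTS / 8];	/* one bit per port */

static int port_is_busy(int port)
{
	return busy[port / 8] & (1 << (port % 8));
}

/*
 * Land on a random port; if it is busy, accept any free bit of the same
 * bitmap byte (a small bias) before retrying at a new random position.
 */
static int pick_port(void)
{
	int tries, bit;

	for (tries = 0; tries < PORTS; tries++) {
		int cand = rand() % PORTS;

		if (!port_is_busy(cand))
			return cand;
		for (bit = 0; bit < 8; bit++) {
			int neighbour = (cand & ~7) + bit;

			if (!port_is_busy(neighbour))
				return neighbour;
		}
	}
	return -1;	/* everything looked busy */
}

int main(void)
{
	int i, port;

	for (i = 0; i < PORTS; i += 3)	/* mark one third of the ports busy */
		busy[i / 8] |= 1 << (i % 8);
	for (i = 0; i < 4 && (port = pick_port()) >= 0; i++) {
		printf("picked port %d\n", port);
		busy[port / 8] |= 1 << (port % 8);
	}
	return 0;
}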
--
wbr, Vitaly
* Re: speed regression in udp_lib_lport_inuse()
From: Eric Dumazet @ 2009-01-23 11:45 UTC (permalink / raw)
To: Vitaly Mayatskikh; +Cc: David Miller, netdev
Vitaly Mayatskikh wrote:
> At Fri, 23 Jan 2009 01:14:58 +0100, Eric Dumazet wrote:
>
>> Could you try the following patch?
>>
>> Thank you
>>
>> [PATCH] udp: optimize bind(0) if many ports are in use
>>
>> commit 9088c5609584684149f3fb5b065aa7f18dcb03ff
>> (udp: Improve port randomization) introduced a regression for the UDP bind() syscall
>> with a null port (getting a random port) when a lot of ports are already in use.
>>
>> This is because we do about 28000 scans of very long chains (220 sockets per chain),
>> with many spin_lock_bh()/spin_unlock_bh() calls.
>>
>> Fix this by using a bitmap (64 bytes for the current value of UDP_HTABLE_SIZE)
>> so that we scan each chain at most once.
>>
>> Instead of 250 ms per bind() call, we get 2.9 ms after the patch.
>
> It's much better, thanks. FPS in glxgears now drops only 2x more
> than with 2.6.28. However, this again kills randomness :)
> The port number distribution is now k*x^2, with the x-axis zero at
> (high - low) / 2. Try this program; it produces an input file for Octave + Gnuplot.
>
> #include <stdio.h>
> #include <errno.h>
> #include <string.h>
> #include <unistd.h>
> #include <sys/types.h>
> #include <sys/socket.h>
> #include <netinet/in.h>
>
> #define PORTS 65536
>
> int main()
> {
> 	int s, err, i, port;
> 	struct sockaddr_in sa;
> 	socklen_t len;
> 	int optval = 1;
> 	unsigned int p[PORTS] = { 0 };
>
> 	for (i = 0; i < PORTS * 100; ++i) {
> 		s = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP);
> 		memset(&sa, 0, sizeof(sa));
> 		sa.sin_addr.s_addr = htonl(INADDR_ANY);
> 		sa.sin_family = AF_INET;
> 		sa.sin_port = 0;	/* let the kernel pick a random port */
> 		setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof(optval));
> 		err = bind(s, (const struct sockaddr*)&sa, sizeof(sa));
> 		len = sizeof(sa);
> 		getsockname(s, (struct sockaddr*)&sa, &len);
> 		port = ntohs(sa.sin_port);
> 		p[port]++;	/* histogram of returned ports */
> 		close(s);
> 	}
> 	/* emit an Octave script plotting the histogram */
> 	printf("x = 32766:1:65535;\ny = [-100; ");
> 	for (i = 32767; i < PORTS; i++)
> 		printf("%d%s", p[i], (i + 1 < PORTS ? "; " : ""));
> 	printf("];\nplot(x,y,'.');pause;");
> 	return 0;
> }
>
> I was also thinking about a bitmap, but with a slightly different
> approach, which also uses a bias (delta) value instead of an exact port
> number. When we get the next random port value (from the rng or in the
> next iteration), we can calculate its byte offset in the bitmap:
>
> A        B        C        D
> 76543210 76543210 76543210 76543210
> 11110111 11011110 10011110 11110111
>             ^
>
> Here we land in byte B, at the marked bit position, but it is
> already busy. If there are any free bits in byte B, we can stop
> further iteration and use any of those free bits. I don't think this can
> hurt randomness too much, because the average bias will be small. It may
> only need some more complicated logic for finding a free bit in the
> byte, because it's not good to always scan from the beginning.
>
Interesting... Please note that I don't search the bitmap from its beginning,
but from a random point.

Maybe we should study lib/random32.c and discover that it has said distribution :)

My algorithm uses net_random() (random32()) to get a 32-bit number, which we
split into two 16-bit numbers (bias and rand):

One (bias) selects the starting chain and the starting slot in the chain (so it really should be random):
first = bias + 0 (slotn = 0)

The other (rand, forced to be odd) selects the next slot in the chain when the current slot is already in use.
I suspect this is the problem: when we hit a slot outside of ip_local_port_range, it seems we
escape from the range with the distribution you got. Maybe rand should depend on ip_local_port_range.
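A quick userspace check of that stepping scheme (a sketch with arbitrary
constants; it only relies on a port's chain being determined by
port % UDP_HTABLE_SIZE): adding an odd multiple of UDP_HTABLE_SIZE to an
unsigned short visits all 65536/128 = 512 ports of one chain exactly once
before coming back to the starting point.

#include <stdio.h>

#define UDP_HTABLE_SIZE 128
#define PORTS_PER_CHAIN (65536 / UDP_HTABLE_SIZE)	/* 512 */

int main(void)
{
	unsigned short first = 45000;			/* arbitrary starting port */
	unsigned step = (12345 | 1) * UDP_HTABLE_SIZE;	/* odd multiple of 128 */
	unsigned short snum = first;
	int visited = 0;

	do {
		visited++;	/* snum % UDP_HTABLE_SIZE never changes: same chain */
		snum += step;	/* unsigned short wraps: arithmetic is mod 65536 */
	} while (snum != first);

	/* prints 512: the walk covers every port of the chain exactly once */
	printf("visited %d of %d ports per chain\n", visited, PORTS_PER_CHAIN);
	return 0;
}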
* Re: speed regression in udp_lib_lport_inuse()
From: Eric Dumazet @ 2009-01-23 13:44 UTC (permalink / raw)
To: Vitaly Mayatskikh; +Cc: David Miller, netdev
Eric Dumazet wrote:
> Vitaly Mayatskikh wrote:
>> At Fri, 23 Jan 2009 01:14:58 +0100, Eric Dumazet wrote:
>>
>>> Could you try the following patch?
>>>
>>> Thank you
>>>
>>> [PATCH] udp: optimize bind(0) if many ports are in use
>>>
>>> commit 9088c5609584684149f3fb5b065aa7f18dcb03ff
>>> (udp: Improve port randomization) introduced a regression for the UDP bind() syscall
>>> with a null port (getting a random port) when a lot of ports are already in use.
>>>
>>> This is because we do about 28000 scans of very long chains (220 sockets per chain),
>>> with many spin_lock_bh()/spin_unlock_bh() calls.
>>>
>>> Fix this by using a bitmap (64 bytes for the current value of UDP_HTABLE_SIZE)
>>> so that we scan each chain at most once.
>>>
>>> Instead of 250 ms per bind() call, we get 2.9 ms after the patch.
>> It's much better, thanks. FPS in glxgears now drops only 2x more
>> than with 2.6.28. However, this again kills randomness :)
>> The port number distribution is now k*x^2, with the x-axis zero at
>
> Interesting... Please note that I don't search the bitmap from its beginning,
> but from a random point.
>
> Maybe we should study lib/random32.c and discover that it has said distribution :)
>
> My algorithm uses net_random() (random32()) to get a 32-bit number, which we
> split into two 16-bit numbers (bias and rand):
>
> One (bias) selects the starting chain and the starting slot in the chain (so it really should be random):
> first = bias + 0 (slotn = 0)
>
> The other (rand, forced to be odd) selects the next slot in the chain when the current slot is already in use.
> I suspect this is the problem: when we hit a slot outside of ip_local_port_range, it seems we
> escape from the range with the distribution you got. Maybe rand should depend on ip_local_port_range.
>
>
Here is an updated patch that still has a uniform random distribution,
keeping the starting point in the range [low, high].

My attempt to avoid a divide failed ;)
Thank you
[PATCH] udp: optimize bind(0) if many ports are in use

commit 9088c5609584684149f3fb5b065aa7f18dcb03ff
(udp: Improve port randomization) introduced a regression for the UDP bind() syscall
with a null port (getting a random port) when a lot of ports are already in use.

This is because we do about 28000 scans of very long chains (220 sockets per chain),
with many spin_lock_bh()/spin_unlock_bh() calls.

Fix this by using a bitmap (64 bytes for the current value of UDP_HTABLE_SIZE)
so that we scan each chain at most once.

Instead of 250 ms per bind() call, we get 2.9 ms after the patch.

Based on a report from Vitaly Mayatskikh

Reported-by: Vitaly Mayatskikh <v.mayatskih@gmail.com>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
---
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index cf5ab05..6e27868 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -120,8 +120,11 @@ EXPORT_SYMBOL(sysctl_udp_wmem_min);
atomic_t udp_memory_allocated;
EXPORT_SYMBOL(udp_memory_allocated);
+#define PORTS_PER_CHAIN (65536 / UDP_HTABLE_SIZE)
+
static int udp_lib_lport_inuse(struct net *net, __u16 num,
const struct udp_hslot *hslot,
+ unsigned long *bitmap,
struct sock *sk,
int (*saddr_comp)(const struct sock *sk1,
const struct sock *sk2))
@@ -132,12 +135,16 @@ static int udp_lib_lport_inuse(struct net *net, __u16 num,
sk_nulls_for_each(sk2, node, &hslot->head)
if (net_eq(sock_net(sk2), net) &&
sk2 != sk &&
- sk2->sk_hash == num &&
+ (bitmap || sk2->sk_hash == num) &&
(!sk2->sk_reuse || !sk->sk_reuse) &&
(!sk2->sk_bound_dev_if || !sk->sk_bound_dev_if
|| sk2->sk_bound_dev_if == sk->sk_bound_dev_if) &&
- (*saddr_comp)(sk, sk2))
- return 1;
+ (*saddr_comp)(sk, sk2)) {
+ if (bitmap)
+ __set_bit(sk2->sk_hash / UDP_HTABLE_SIZE, bitmap);
+ else
+ return 1;
+ }
return 0;
}
@@ -160,32 +167,42 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
if (!snum) {
int low, high, remaining;
unsigned rand;
- unsigned short first;
+ unsigned short first, last;
+ DECLARE_BITMAP(bitmap, PORTS_PER_CHAIN);
inet_get_local_port_range(&low, &high);
remaining = (high - low) + 1;
rand = net_random();
- snum = first = rand % remaining + low;
- rand |= 1;
- for (;;) {
- hslot = &udptable->hash[udp_hashfn(net, snum)];
+ first = rand % remaining + low;
+ rand = (rand | 1) * UDP_HTABLE_SIZE;
+ for (last = first + UDP_HTABLE_SIZE; first != last; first++) {
+ hslot = &udptable->hash[udp_hashfn(net, first)];
+ bitmap_zero(bitmap, PORTS_PER_CHAIN);
spin_lock_bh(&hslot->lock);
- if (!udp_lib_lport_inuse(net, snum, hslot, sk, saddr_comp))
- break;
- spin_unlock_bh(&hslot->lock);
+ udp_lib_lport_inuse(net, snum, hslot, bitmap, sk, saddr_comp);
+
+ snum = first;
+ /*
+ * PORTS_PER_CHAIN loops, because snum is unsigned short
+ * and we add an odd multiple of UDP_HTABLE_SIZE
+ */
do {
- snum = snum + rand;
- } while (snum < low || snum > high);
- if (snum == first)
- goto fail;
+ if (low <= snum && snum <= high &&
+ !test_bit(snum / UDP_HTABLE_SIZE, bitmap))
+ goto found;
+ snum += rand;
+ } while (snum != first);
+ spin_unlock_bh(&hslot->lock);
}
+ goto fail;
} else {
hslot = &udptable->hash[udp_hashfn(net, snum)];
spin_lock_bh(&hslot->lock);
- if (udp_lib_lport_inuse(net, snum, hslot, sk, saddr_comp))
+ if (udp_lib_lport_inuse(net, snum, hslot, NULL, sk, saddr_comp))
goto fail_unlock;
}
+found:
inet_sk(sk)->num = snum;
sk->sk_hash = snum;
if (sk_unhashed(sk)) {
* Re: speed regression in udp_lib_lport_inuse()
From: Vitaly Mayatskikh @ 2009-01-23 14:56 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Vitaly Mayatskikh, David Miller, netdev
At Fri, 23 Jan 2009 14:44:22 +0100, Eric Dumazet wrote:
> Here is an updated patch that still has a uniform random distribution,
> keeping the starting point in the range [low, high].
>
> My attempt to avoid a divide failed ;)
>
> Thank you
>
> [PATCH] udp: optimize bind(0) if many ports are in use
>
> commit 9088c5609584684149f3fb5b065aa7f18dcb03ff
> (udp: Improve port randomization) introduced a regression for the UDP bind() syscall
> with a null port (getting a random port) when a lot of ports are already in use.
>
> This is because we do about 28000 scans of very long chains (220 sockets per chain),
> with many spin_lock_bh()/spin_unlock_bh() calls.
>
> Fix this by using a bitmap (64 bytes for the current value of UDP_HTABLE_SIZE)
> so that we scan each chain at most once.
>
> Instead of 250 ms per bind() call, we get 2.9 ms after the patch.
>
> Based on a report from Vitaly Mayatskikh
>
> Reported-by: Vitaly Mayatskikh <v.mayatskih@gmail.com>
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
> ---
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index cf5ab05..6e27868 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -120,8 +120,11 @@ EXPORT_SYMBOL(sysctl_udp_wmem_min);
> atomic_t udp_memory_allocated;
> EXPORT_SYMBOL(udp_memory_allocated);
>
> +#define PORTS_PER_CHAIN (65536 / UDP_HTABLE_SIZE)
> +
> static int udp_lib_lport_inuse(struct net *net, __u16 num,
> const struct udp_hslot *hslot,
> + unsigned long *bitmap,
> struct sock *sk,
> int (*saddr_comp)(const struct sock *sk1,
> const struct sock *sk2))
> @@ -132,12 +135,16 @@ static int udp_lib_lport_inuse(struct net *net, __u16 num,
> sk_nulls_for_each(sk2, node, &hslot->head)
> if (net_eq(sock_net(sk2), net) &&
> sk2 != sk &&
> - sk2->sk_hash == num &&
> + (bitmap || sk2->sk_hash == num) &&
> (!sk2->sk_reuse || !sk->sk_reuse) &&
> (!sk2->sk_bound_dev_if || !sk->sk_bound_dev_if
> || sk2->sk_bound_dev_if == sk->sk_bound_dev_if) &&
> - (*saddr_comp)(sk, sk2))
> - return 1;
> + (*saddr_comp)(sk, sk2)) {
> + if (bitmap)
> + __set_bit(sk2->sk_hash / UDP_HTABLE_SIZE, bitmap);
> + else
> + return 1;
> + }
> return 0;
> }
>
> @@ -160,32 +167,42 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
> if (!snum) {
> int low, high, remaining;
> unsigned rand;
> - unsigned short first;
> + unsigned short first, last;
> + DECLARE_BITMAP(bitmap, PORTS_PER_CHAIN);
>
> inet_get_local_port_range(&low, &high);
> remaining = (high - low) + 1;
>
> rand = net_random();
> - snum = first = rand % remaining + low;
> - rand |= 1;
> - for (;;) {
> - hslot = &udptable->hash[udp_hashfn(net, snum)];
> + first = rand % remaining + low;
> + rand = (rand | 1) * UDP_HTABLE_SIZE;
> + for (last = first + UDP_HTABLE_SIZE; first != last; first++) {
> + hslot = &udptable->hash[udp_hashfn(net, first)];
> + bitmap_zero(bitmap, PORTS_PER_CHAIN);
> spin_lock_bh(&hslot->lock);
> - if (!udp_lib_lport_inuse(net, snum, hslot, sk, saddr_comp))
> - break;
> - spin_unlock_bh(&hslot->lock);
> + udp_lib_lport_inuse(net, snum, hslot, bitmap, sk, saddr_comp);
> +
> + snum = first;
> + /*
> + * PORTS_PER_CHAIN loops, because snum is unsigned short
> + * and we add an odd multiple of UDP_HTABLE_SIZE
> + */
> do {
> - snum = snum + rand;
> - } while (snum < low || snum > high);
> - if (snum == first)
> - goto fail;
> + if (low <= snum && snum <= high &&
> + !test_bit(snum / UDP_HTABLE_SIZE, bitmap))
> + goto found;
> + snum += rand;
> + } while (snum != first);
> + spin_unlock_bh(&hslot->lock);
> }
> + goto fail;
> } else {
> hslot = &udptable->hash[udp_hashfn(net, snum)];
> spin_lock_bh(&hslot->lock);
> - if (udp_lib_lport_inuse(net, snum, hslot, sk, saddr_comp))
> + if (udp_lib_lport_inuse(net, snum, hslot, NULL, sk, saddr_comp))
> goto fail_unlock;
> }
> +found:
> inet_sk(sk)->num = snum;
> sk->sk_hash = snum;
> if (sk_unhashed(sk)) {
>
Both randomness and speed are good. There are several never-bound ports
in the test, but only a few. Thank you for the patch.
--
wbr, Vitaly
* Re: speed regression in udp_lib_lport_inuse()
From: Eric Dumazet @ 2009-01-23 16:05 UTC (permalink / raw)
To: Vitaly Mayatskikh; +Cc: David Miller, netdev
Vitaly Mayatskikh wrote:
> Both randomness and speed are good. There are several never-bound ports
> in the test, but only a few. Thank you for the patch.

I guess the "never-bound ports" were already in use by other applications on your machine :)
* Re: speed regression in udp_lib_lport_inuse()
From: Vitaly Mayatskikh @ 2009-01-23 16:14 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Vitaly Mayatskikh, David Miller, netdev
At Fri, 23 Jan 2009 17:05:15 +0100, Eric Dumazet wrote:
> > Both randomness and speed are good. There are several never-bound ports
> > in the test, but only a few. Thank you for the patch.
>
> I guess the "never-bound ports" were already in use by other applications on your machine :)

Possibly, but not very likely :)
--
wbr, Vitaly
* [PATCH] udp: optimize bind(0) if many ports are in use
From: Eric Dumazet @ 2009-01-26 8:20 UTC (permalink / raw)
To: David Miller; +Cc: Vitaly Mayatskikh, netdev
Hello David
Here is the last patch for the performance regression of UDP bind() to a random port.

The change against the previous one is the divide, which can be replaced by a multiply,
as is already done in other parts of the kernel.

I.e. instead of doing "first = rand % remaining + low;" we can use:

	first = (((u64)rand * remaining) >> 32) + low;
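A userspace sanity check of that trick (a sketch; the sample values are
arbitrary): the high 32 bits of the 64-bit product rand * remaining equal
floor(rand * remaining / 2^32), which is always in [0, remaining), so first
always lands in [low, high] without any division.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	int low = 32768, high = 61000;
	unsigned remaining = (high - low) + 1;
	uint32_t samples[] = { 0, 1, 0x7fffffffu, 0xdeadbeefu, 0xffffffffu };
	unsigned i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
		uint32_t rand = samples[i];
		unsigned first = (unsigned)(((uint64_t)rand * remaining) >> 32) + low;

		/* product < remaining * 2^32, hence first <= high */
		printf("rand = %10u -> first = %u\n", (unsigned)rand, first);
	}
	return 0;
}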
Also some polishing of overlong lines and comments before official submission.
Thank you
[PATCH] udp: optimize bind(0) if many ports are in use

commit 9088c5609584684149f3fb5b065aa7f18dcb03ff
(udp: Improve port randomization) introduced a regression for the UDP bind() syscall
with a null port (getting a random port) when a lot of ports are already in use.

This is because we do about 28000 scans of very long chains (220 sockets per chain),
with many spin_lock_bh()/spin_unlock_bh() calls.

Fix this by using a bitmap (64 bytes for the current value of UDP_HTABLE_SIZE)
so that we scan each chain at most once.

Instead of 250 ms per bind() call, we get 2.9 ms after the patch.

Based on a report from Vitaly Mayatskikh

Reported-by: Vitaly Mayatskikh <v.mayatskih@gmail.com>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Tested-by: Vitaly Mayatskikh <v.mayatskih@gmail.com>
---
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index cf5ab05..b7faffe 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -120,8 +120,11 @@ EXPORT_SYMBOL(sysctl_udp_wmem_min);
atomic_t udp_memory_allocated;
EXPORT_SYMBOL(udp_memory_allocated);
+#define PORTS_PER_CHAIN (65536 / UDP_HTABLE_SIZE)
+
static int udp_lib_lport_inuse(struct net *net, __u16 num,
const struct udp_hslot *hslot,
+ unsigned long *bitmap,
struct sock *sk,
int (*saddr_comp)(const struct sock *sk1,
const struct sock *sk2))
@@ -132,12 +135,17 @@ static int udp_lib_lport_inuse(struct net *net, __u16 num,
sk_nulls_for_each(sk2, node, &hslot->head)
if (net_eq(sock_net(sk2), net) &&
sk2 != sk &&
- sk2->sk_hash == num &&
+ (bitmap || sk2->sk_hash == num) &&
(!sk2->sk_reuse || !sk->sk_reuse) &&
(!sk2->sk_bound_dev_if || !sk->sk_bound_dev_if
|| sk2->sk_bound_dev_if == sk->sk_bound_dev_if) &&
- (*saddr_comp)(sk, sk2))
- return 1;
+ (*saddr_comp)(sk, sk2)) {
+ if (bitmap)
+ __set_bit(sk2->sk_hash / UDP_HTABLE_SIZE,
+ bitmap);
+ else
+ return 1;
+ }
return 0;
}
@@ -160,32 +168,47 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
if (!snum) {
int low, high, remaining;
unsigned rand;
- unsigned short first;
+ unsigned short first, last;
+ DECLARE_BITMAP(bitmap, PORTS_PER_CHAIN);
inet_get_local_port_range(&low, &high);
remaining = (high - low) + 1;
rand = net_random();
- snum = first = rand % remaining + low;
- rand |= 1;
- for (;;) {
- hslot = &udptable->hash[udp_hashfn(net, snum)];
+ first = (((u64)rand * remaining) >> 32) + low;
+ /*
+ * force rand to be an odd multiple of UDP_HTABLE_SIZE
+ */
+ rand = (rand | 1) * UDP_HTABLE_SIZE;
+ for (last = first + UDP_HTABLE_SIZE; first != last; first++) {
+ hslot = &udptable->hash[udp_hashfn(net, first)];
+ bitmap_zero(bitmap, PORTS_PER_CHAIN);
spin_lock_bh(&hslot->lock);
- if (!udp_lib_lport_inuse(net, snum, hslot, sk, saddr_comp))
- break;
- spin_unlock_bh(&hslot->lock);
+ udp_lib_lport_inuse(net, snum, hslot, bitmap, sk,
+ saddr_comp);
+
+ snum = first;
+ /*
+ * Iterate on all possible values of snum for this hash.
+ * Using steps of an odd multiple of UDP_HTABLE_SIZE
+ * give us randomization and full range coverage.
+ */
do {
- snum = snum + rand;
- } while (snum < low || snum > high);
- if (snum == first)
- goto fail;
+ if (low <= snum && snum <= high &&
+ !test_bit(snum / UDP_HTABLE_SIZE, bitmap))
+ goto found;
+ snum += rand;
+ } while (snum != first);
+ spin_unlock_bh(&hslot->lock);
}
+ goto fail;
} else {
hslot = &udptable->hash[udp_hashfn(net, snum)];
spin_lock_bh(&hslot->lock);
- if (udp_lib_lport_inuse(net, snum, hslot, sk, saddr_comp))
+ if (udp_lib_lport_inuse(net, snum, hslot, NULL, sk, saddr_comp))
goto fail_unlock;
}
+found:
inet_sk(sk)->num = snum;
sk->sk_hash = snum;
if (sk_unhashed(sk)) {
* Re: [PATCH] udp: optimize bind(0) if many ports are in use
From: David Miller @ 2009-01-27 5:35 UTC (permalink / raw)
To: dada1; +Cc: v.mayatskih, netdev
From: Eric Dumazet <dada1@cosmosbay.com>
Date: Mon, 26 Jan 2009 09:20:31 +0100
> [PATCH] udp: optimize bind(0) if many ports are in use
Applied, thanks a lot everyone.