* inet established hash question
@ 2008-03-04 19:45 Cosmin Ratiu
From: Cosmin Ratiu @ 2008-03-04 19:45 UTC (permalink / raw)
  To: netdev; +Cc: Octavian Purdila

Hello,

I work at Ixia (most of you have probably heard of it); we do network 
testing using a custom Linux distribution and some specialized hardware. 
A while ago we ran into a scalability issue with large numbers of TCP 
connections. Although we solved it by changing the established hash 
function, we'd like your opinion on the change if you'd be so kind.

Basically, the situation is as follows:
There is a client machine and a server machine. Both create 15000 
virtual interfaces, open up a socket for each pair of interfaces and do 
SIP traffic. By profiling I noticed that there is a lot of time spent 
walking the established hash chains with this particular setup. We are 
using an old version of the kernel (2.6.7), which uses the following 
hash function:

static __inline__ int tcp_hashfn(__u32 laddr, __u16 lport,
                                 __u32 faddr, __u16 fport)
{
    int h = (laddr ^ lport) ^ (faddr ^ fport);
    h ^= h >> 16;
    h ^= h >> 8;
    return h & (tcp_ehash_size - 1);
}

The addresses were distributed like this: client interfaces were 
198.18.0.1/16 with increments of 1 and server interfaces were 
198.18.128.1/16 with increments of 1. As I said, there were 15000 
interfaces. Source and destination ports were 5060 for each connection. 
So in this case, ports don't matter for hashing purposes, and the 
differing bits of each address pair cancel each other out, so every 
pair produces the same hash value and all connections end up in the 
same hash chain.
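
To make the cancellation concrete, here is a rough stand-alone user-space 
harness (the table size and the exact address sequence are just 
placeholders, not the values our boxes use) that replays the setup 
against the 2.6.7 hash above:

/* Rough replay of the collision; tcp_ehash_size is an assumed
 * power-of-two table size. */
#include <stdio.h>

typedef unsigned int   __u32;
typedef unsigned short __u16;

enum { tcp_ehash_size = 65536 };

static __inline__ int tcp_hashfn(__u32 laddr, __u16 lport,
                                 __u32 faddr, __u16 fport)
{
    int h = (laddr ^ lport) ^ (faddr ^ fport);
    h ^= h >> 16;
    h ^= h >> 8;
    return h & (tcp_ehash_size - 1);
}

/* Build a.b.c.d as a host-order word; the cancellation below does not
 * depend on byte order, so the conclusion holds for __be32 as well. */
static __u32 ip(__u32 a, __u32 b, __u32 c, __u32 d)
{
    return (a << 24) | (b << 16) | (c << 8) | d;
}

int main(void)
{
    static int used[tcp_ehash_size];
    int buckets = 0;

    for (int k = 1; k <= 15000; k++) {
        __u32 laddr = ip(198, 18, k >> 8, k & 0xff);           /* client */
        __u32 faddr = ip(198, 18, (k >> 8) | 0x80, k & 0xff);  /* server */
        int h = tcp_hashfn(laddr, 5060, faddr, 5060);

        if (!used[h]++)
            buckets++;
    }
    /* laddr ^ faddr is 0.0.128.0 for every pair and the ports cancel,
     * so this prints "15000 connections, 1 bucket(s) used". */
    printf("15000 connections, %d bucket(s) used\n", buckets);
    return 0;
}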

After investigating, I noticed that the hash function has been changed 
in recent kernels to:
static inline unsigned int inet_ehashfn(const __be32 laddr, const __u16 lport,
                                        const __be32 faddr, const __be16 fport)
{
    return jhash_2words((__force __u32) laddr ^ (__force __u32) faddr,
                        ((__u32) lport) << 16 | (__force __u32) fport,
                        inet_ehash_secret);
}
We tested with the new function and the results are the same in this 
case: the addresses are still XOR'd together before hashing, so the 
bits cancel each other out and all connections again end up in the 
same chain.
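
A quick check (same placeholder addressing as the harness above) of why 
the secret cannot help here: both words handed to jhash_2words() are 
identical for every pair, so its output is necessarily identical as well:

/* Sketch: with the jhash_2words() version, both input words are the
 * same for every one of the 15000 connections, so the output is too. */
#include <stdio.h>

typedef unsigned int __u32;

static __u32 ip(__u32 a, __u32 b, __u32 c, __u32 d)
{
    return (a << 24) | (b << 16) | (c << 8) | d;
}

int main(void)
{
    for (int k = 1; k <= 15000; k++) {
        __u32 laddr = ip(198, 18, k >> 8, k & 0xff);           /* 198.18.x.y     */
        __u32 faddr = ip(198, 18, (k >> 8) | 0x80, k & 0xff);  /* 198.18.x+128.y */

        /* Only the top bit of the third octet differs, so the XOR'd
         * address word is 0.0.128.0 for every single pair ...      */
        if ((laddr ^ faddr) != ip(0, 0, 128, 0))
            printf("unexpected key at k=%d\n", k);   /* never printed */
    }
    /* ... and 5060 << 16 | 5060 is constant too, so jhash_2words() is
     * handed the exact same two words (plus the secret) 15000 times
     * and necessarily returns the same bucket every time.          */
    printf("all 15000 pairs XOR to the same 32-bit key\n");
    return 0;
}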

So I changed the function again to stop XOR-ing the addresses together 
before feeding them to the Jenkins hash. I got something like this (same 
signature as the 2.6.7 function above):

static __inline__ int tcp_hashfn(__u32 laddr, __u16 lport,
                                 __u32 faddr, __u16 fport)
{
    int h = jhash_3words(laddr, faddr, ((__u32) lport) << 16 | fport,
                         tcp_ehash_secret);

    return h & (tcp_ehash_size - 1);
}

This way, connections get distributed properly in this case and in the 
other cases we have tested so far.
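
For reference, a tiny sketch (same placeholder addressing) of what the 
per-connection input to jhash_3words() looks like now: the two address 
words differ from pair to pair, so the entropy is no longer cancelled 
before it reaches the mixer:

/* Sketch: with jhash_3words() the addresses stay separate input words,
 * so the keys differ per connection and the Jenkins mix can spread them. */
#include <stdio.h>

typedef unsigned int __u32;

static __u32 ip(__u32 a, __u32 b, __u32 c, __u32 d)
{
    return (a << 24) | (b << 16) | (c << 8) | d;
}

int main(void)
{
    for (int k = 1; k <= 3; k++) {
        __u32 laddr = ip(198, 18, k >> 8, k & 0xff);
        __u32 faddr = ip(198, 18, (k >> 8) | 0x80, k & 0xff);
        __u32 ports = (5060u << 16) | 5060u;

        /* k=1: (0xc6120001, 0xc6128001, 0x13c413c4), k=2: ...0002, etc. */
        printf("k=%d: jhash_3words(%#010x, %#010x, %#010x, secret)\n",
               k, laddr, faddr, ports);
    }
    return 0;
}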

So, thanks for reading through all this. My question is whether this is 
a good thing to do or not, as I am not so good with hash functions, so I 
can't say for sure if we won't run into a collision with a different setup.


Thank you,
Cosmin.



* Re: inet established hash question
  2008-03-04 19:45 inet established hash question Cosmin Ratiu
@ 2008-03-04 22:24 ` David Miller
From: David Miller @ 2008-03-04 22:24 UTC (permalink / raw)
  To: cratiu; +Cc: netdev, opurdila

From: Cosmin Ratiu <cratiu@ixiacom.com>
Date: Tue, 04 Mar 2008 21:45:40 +0200

> My question is whether this is a good thing to do or not, as I am
> not so good with hash functions, so I can't say for sure if we won't
> run into a collision with a different setup.

It makes perfect sense, I'll add the following patch to
the tree, thanks!

[TCP]: Improve ipv4 established hash function.

If all of the entropy is in the local and foreign addresses,
but xor'ing them together cancels out that entropy, the
current hash performs poorly.

Suggested by Cosmin Ratiu:

	Basically, the situation is as follows: There is a client
	machine and a server machine. Both create 15000 virtual
	interfaces, open up a socket for each pair of interfaces and
	do SIP traffic. By profiling I noticed that there is a lot of
	time spent walking the established hash chains with this
	particular setup.

	The addresses were distributed like this: client interfaces
	were 198.18.0.1/16 with increments of 1 and server interfaces
	were 198.18.128.1/16 with increments of 1. As I said, there
	were 15000 interfaces. Source and destination ports were 5060
	for each connection.  So in this case, ports don't matter for
	hashing purposes, and the differing bits of each address pair
	cancel each other out, so every pair produces the same hash
	value and all connections end up in the same hash chain.

Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 70013c5..89cd011 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -175,7 +175,8 @@ extern void build_ehash_secret(void);
 static inline unsigned int inet_ehashfn(const __be32 laddr, const __u16 lport,
 					const __be32 faddr, const __be16 fport)
 {
-	return jhash_2words((__force __u32) laddr ^ (__force __u32) faddr,
+	return jhash_3words((__force __u32) laddr,
+			    (__force __u32) faddr,
 			    ((__u32) lport) << 16 | (__force __u32)fport,
 			    inet_ehash_secret);
 }

