* [PATCH] improvement on net/sched/cls_fw.c's hash function
@ 2005-04-05 5:35 Wang Jian
2005-04-05 5:37 ` David S. Miller
0 siblings, 1 reply; 33+ messages in thread
From: Wang Jian @ 2005-04-05 5:35 UTC (permalink / raw)
To: netdev
[-- Attachment #1: Type: text/plain, Size: 390 bytes --]
Hi,
This is a simple patch against net/sched/cls_fw.c. The idea of this
patch is discussed in this thread
https://lists.netfilter.org/pipermail/netfilter-devel/2005-March/018762.html
I chose 509 for FW_FILTER_HSIZE. If you feel that is a waste of memory, then
251 is good too.
BTW: I don't know much about the performance and distribution properties of
jhash. This is a quick fix.
--
lark
[-- Attachment #2: hash-cls_fw.diff --]
[-- Type: application/octet-stream, Size: 1029 bytes --]
Index: cls_fw.c
===================================================================
--- cls_fw.c (revision 1)
+++ cls_fw.c (working copy)
@@ -45,10 +45,13 @@
#include <net/sock.h>
#include <net/act_api.h>
#include <net/pkt_cls.h>
+#include <linux/jhash.h>
+#define FW_FILTER_HSIZE 509
+
struct fw_head
{
- struct fw_filter *ht[256];
+ struct fw_filter *ht[FW_FILTER_HSIZE];
};
struct fw_filter
@@ -69,7 +72,7 @@
static __inline__ int fw_hash(u32 handle)
{
- return handle&0xFF;
+ return (jhash_1word(handle, 0xF30A7129) % FW_FILTER_HSIZE);
}
static int fw_classify(struct sk_buff *skb, struct tcf_proto *tp,
@@ -152,7 +155,7 @@
if (head == NULL)
return;
- for (h=0; h<256; h++) {
+ for (h=0; h<FW_FILTER_HSIZE; h++) {
while ((f=head->ht[h]) != NULL) {
head->ht[h] = f->next;
fw_delete_filter(tp, f);
@@ -291,7 +294,7 @@
if (arg->stop)
return;
- for (h = 0; h < 256; h++) {
+ for (h = 0; h < FW_FILTER_HSIZE; h++) {
struct fw_filter *f;
for (f = head->ht[h]; f; f = f->next) {
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
2005-04-05 5:35 [PATCH] improvement on net/sched/cls_fw.c's hash function Wang Jian
@ 2005-04-05 5:37 ` David S. Miller
2005-04-05 6:05 ` Wang Jian
0 siblings, 1 reply; 33+ messages in thread
From: David S. Miller @ 2005-04-05 5:37 UTC (permalink / raw)
To: Wang Jian; +Cc: netdev
On Tue, 05 Apr 2005 13:35:02 +0800
Wang Jian <lark@linux.net.cn> wrote:
> https://lists.netfilter.org/pipermail/netfilter-devel/2005-March/018762.html
>
> I chose 509 for FW_FILTER_HSIZE. If you feel that is a waste of memory, then
> 251 is good too.
Please use a power of two; the "%" is expensive on some CPUs.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
2005-04-05 5:37 ` David S. Miller
@ 2005-04-05 6:05 ` Wang Jian
2005-04-05 10:25 ` jamal
2005-04-05 10:38 ` Thomas Graf
0 siblings, 2 replies; 33+ messages in thread
From: Wang Jian @ 2005-04-05 6:05 UTC (permalink / raw)
To: netdev
[-- Attachment #1: Type: text/plain, Size: 507 bytes --]
Hi David S. Miller,
New patch attached. Hash size is 256, the same as the old one.
On Mon, 4 Apr 2005 22:37:44 -0700, "David S. Miller" <davem@davemloft.net> wrote:
> On Tue, 05 Apr 2005 13:35:02 +0800
> Wang Jian <lark@linux.net.cn> wrote:
>
> > https://lists.netfilter.org/pipermail/netfilter-devel/2005-March/018762.html
> >
> > I chose 509 for FW_FILTER_HSIZE. If you feel that is a waste of memory, then
> > 251 is good too.
>
> Please use a power of two; the "%" is expensive on some CPUs.
--
lark
[-- Attachment #2: hash-cls_fw-2.diff --]
[-- Type: application/octet-stream, Size: 1104 bytes --]
Index: linux-2.6.11-w/net/sched/cls_fw.c
===================================================================
--- linux-2.6.11-w/net/sched/cls_fw.c (revision 1)
+++ linux-2.6.11-w/net/sched/cls_fw.c (working copy)
@@ -45,10 +45,13 @@
#include <net/sock.h>
#include <net/act_api.h>
#include <net/pkt_cls.h>
+#include <linux/jhash.h>
+#define FW_FILTER_HSIZE 256
+
struct fw_head
{
- struct fw_filter *ht[256];
+ struct fw_filter *ht[FW_FILTER_HSIZE];
};
struct fw_filter
@@ -69,7 +72,7 @@
static __inline__ int fw_hash(u32 handle)
{
- return handle&0xFF;
+ return (jhash_1word(handle, 0xF30A7129) % FW_FILTER_HSIZE);
}
static int fw_classify(struct sk_buff *skb, struct tcf_proto *tp,
@@ -152,7 +155,7 @@
if (head == NULL)
return;
- for (h=0; h<256; h++) {
+ for (h=0; h<FW_FILTER_HSIZE; h++) {
while ((f=head->ht[h]) != NULL) {
head->ht[h] = f->next;
fw_delete_filter(tp, f);
@@ -291,7 +294,7 @@
if (arg->stop)
return;
- for (h = 0; h < 256; h++) {
+ for (h = 0; h < FW_FILTER_HSIZE; h++) {
struct fw_filter *f;
for (f = head->ht[h]; f; f = f->next) {
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
2005-04-05 6:05 ` Wang Jian
@ 2005-04-05 10:25 ` jamal
2005-04-05 10:38 ` Thomas Graf
1 sibling, 0 replies; 33+ messages in thread
From: jamal @ 2005-04-05 10:25 UTC (permalink / raw)
To: Wang Jian; +Cc: netdev
Wang,
I read that thread and I am a little confused. What is this change
supposed to improve?
cheers,
jamal
On Tue, 2005-04-05 at 02:05, Wang Jian wrote:
> Hi David S. Miller,
>
> New patch attached. Hash size is 256, the same as the old one.
>
>
> On Mon, 4 Apr 2005 22:37:44 -0700, "David S. Miller" <davem@davemloft.net> wrote:
>
> > On Tue, 05 Apr 2005 13:35:02 +0800
> > Wang Jian <lark@linux.net.cn> wrote:
> >
> > > https://lists.netfilter.org/pipermail/netfilter-devel/2005-March/018762.html
> > >
> > > I chose 509 for FW_FILTER_HSIZE. If you feel that is a waste of memory, then
> > > 251 is good too.
> >
> > Please use a power of two; the "%" is expensive on some CPUs.
>
>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
2005-04-05 6:05 ` Wang Jian
2005-04-05 10:25 ` jamal
@ 2005-04-05 10:38 ` Thomas Graf
2005-04-05 11:25 ` Wang Jian
1 sibling, 1 reply; 33+ messages in thread
From: Thomas Graf @ 2005-04-05 10:38 UTC (permalink / raw)
To: Wang Jian; +Cc: netdev
* Wang Jian <20050405140342.024A.LARK@linux.net.cn> 2005-04-05 14:05
> New patch attached. Hash size is 256, the same as the old one.
Do you have any numbers that could prove that this change
actually improves the hash distribution and thus the overall
lookup performance?
The most often used and thus most important range of mark
values is definitely 0..255. I did not look into jhash
but the risk of collisions definitely increases with this
change which affects about 90% of the users of fw which
could benefit from a collision free hashtable so far.
I would appreciate if you could provide some numbers proving
both the need and actual improvement of this change since
fwmark is one of the most often used classifiers.
Cheers
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
2005-04-05 10:38 ` Thomas Graf
@ 2005-04-05 11:25 ` Wang Jian
2005-04-05 12:16 ` Thomas Graf
0 siblings, 1 reply; 33+ messages in thread
From: Wang Jian @ 2005-04-05 11:25 UTC (permalink / raw)
To: Thomas Graf; +Cc: netdev, jamal
Hi Thomas Graf,
If you read the thread I pointed to, then you know there is a chance that
nfmark is used as two 16-bit numbers (along with CONNMARK), and the 16-bit
number can be mapped to a classid. This is one of many possibilities.
In that case, nfmark can be used like this
0x00010000
0x00020000
0x00030000
...
0x00000001
0x00000002
0x00000003
...
The old hash function doesn't expect such a pattern.
I must admit that I am not very familiar with hash functions. I found the
problem and used a quick hack. My patch just points out the existing risk.
Anyone can improve this with a faster and more evenly distributed hash
function.
And actually, for 256 as the hash size, the second patch I sent can still
be improved:
return (jhash_1word(handle, 0xF30A7129) & 0xFF);
instead of
return (jhash_1word(handle, 0xF30A7129) % 256);
On Tue, 5 Apr 2005 12:38:27 +0200, Thomas Graf <tgraf@suug.ch> wrote:
> * Wang Jian <20050405140342.024A.LARK@linux.net.cn> 2005-04-05 14:05
> > New patch attached. Hash size is 256, the same as the old one.
>
> Do you have any numbers that could prove that this change
> actually improves the hash distribution and thus the overall
> lookup performance?
>
> The most often used and thus most important range of mark
> values is definitely 0..255. I did not look into jhash
> but the risk of collisions definitely increases with this
> change which affects about 90% of the users of fw which
> could benefit from a collision free hashtable so far.
>
> I would appreciate if you could provide some numbers proving
> both the need and actual improvement of this change since
> fwmark is one of the most often used classifiers.
>
> Cheers
--
lark
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
2005-04-05 11:25 ` Wang Jian
@ 2005-04-05 12:16 ` Thomas Graf
2005-04-05 12:39 ` Wang Jian
0 siblings, 1 reply; 33+ messages in thread
From: Thomas Graf @ 2005-04-05 12:16 UTC (permalink / raw)
To: Wang Jian; +Cc: netdev, jamal
* Wang Jian <20050405190024.024D.LARK@linux.net.cn> 2005-04-05 19:25
> If you read the thread I pointed to, then you know there is chance that
> nfmark is used as two 16 bit numbers (along with CONNMARK), and the 16
> bit number can be mapped to a classid. This is one of many chances.
>
> In that case, nfmark can be used like this
>
> 0x00010000
> 0x00020000
> 0x00030000
> ...
>
> 0x00000001
> 0x00000002
> 0x00000003
> ...
>
> The old hash function doesn't expect such pattern.
I'm aware of the problem you're facing: if the lower 8 bits are set to 0
for a large number of flows, all those flows get chained in the first
hash bucket.
> I must admit that I am not very familiar with hash functions. I found the
> problem and used a quick hack. My patch just points out the existing risk.
> Anyone can improve this with a faster and more evenly distributed hash
> function.
I can't really give you feedback on this since I don't have the
background for it. Theoretically, a prime hash size would do better, but
that is stupid regarding slab efficiency.
What I'm worried about is that we lose the zero collisions behaviour
for the most popular use case.
New idea: we make this configurable and allow 3 types of hash functions:
1) default as-is, perfect for marks 0..255
2) all bits taken into account (your patch)
3) bitmask + shift provided by the user just like
dsmark.
Thoughts?
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
2005-04-05 12:16 ` Thomas Graf
@ 2005-04-05 12:39 ` Wang Jian
2005-04-05 12:52 ` Thomas Graf
2005-04-05 12:54 ` jamal
0 siblings, 2 replies; 33+ messages in thread
From: Wang Jian @ 2005-04-05 12:39 UTC (permalink / raw)
To: Thomas Graf; +Cc: netdev, jamal
Hi Thomas Graf,
On Tue, 5 Apr 2005 14:16:05 +0200, Thomas Graf <tgraf@suug.ch> wrote:
>
> What I'm worried about is that we lose the zero collisions behaviour
> for the most popular use case.
If a web interface is used to generate netfilter/tc rules that use
nfmark, then the above assumption is false: nfmark values will be assigned
incrementally and wrap back to 0 at some point, like process IDs. So zero
collisions are not likely.
When Linux's QoS control capability is widely used, such a web interface
will sooner or later come into being.
> New idea: we make this configurable and allow 3 types of hash functions:
> 1) default as-is, perfect for marks 0..255
> 2) all bits taken into account (your patch)
> 3) bitmask + shift provided by the user just like
> dsmark.
>
> Thoughts?
Your suggestion is well worth considering, but it needs some more work.
And isn't it somewhat bloated?
--
lark
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
2005-04-05 12:39 ` Wang Jian
@ 2005-04-05 12:52 ` Thomas Graf
2005-04-05 13:29 ` Wang Jian
2005-04-05 12:54 ` jamal
1 sibling, 1 reply; 33+ messages in thread
From: Thomas Graf @ 2005-04-05 12:52 UTC (permalink / raw)
To: Wang Jian; +Cc: netdev, jamal
* Wang Jian <20050405202039.0250.LARK@linux.net.cn> 2005-04-05 20:39
> On Tue, 5 Apr 2005 14:16:05 +0200, Thomas Graf <tgraf@suug.ch> wrote:
> > What I'm worried about is that we lose the zero collisions behaviour
> > for the most popular use case.
>
> If a web interface is used to generate netfilter/tc rules that use
> nfmark, then the above assumption is false: nfmark values will be assigned
> incrementally and wrap back to 0 at some point, like process IDs. So zero
> collisions are not likely.
I did not claim that the above assumption is true for all cases, but the
most common use of cls_fw is static marks set by netfilter to values
from 0..255.
> When Linux's QoS control capability is widely used, such a web interface
> will sooner or later come into being.
That might be true but I will never ack on something that makes zero
collision use of cls_fw impossible. I'm all for improving this but
not at the cost of reduced performance for the most obvious use case
of cls_fw.
> Your suggestion is well worth considering, but it needs some more work.
> And isn't it somewhat bloated?
The shift + bitmask might be bloated and can be deferred a bit until
someone comes up with this need. I can cook up a patch for this
if you want, it's not much work.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
2005-04-05 12:39 ` Wang Jian
2005-04-05 12:52 ` Thomas Graf
@ 2005-04-05 12:54 ` jamal
2005-04-05 14:18 ` Wang Jian
1 sibling, 1 reply; 33+ messages in thread
From: jamal @ 2005-04-05 12:54 UTC (permalink / raw)
To: Wang Jian; +Cc: Thomas Graf, netdev
On Tue, 2005-04-05 at 08:39, Wang Jian wrote:
> Hi Thomas Graf,
>
>
> On Tue, 5 Apr 2005 14:16:05 +0200, Thomas Graf <tgraf@suug.ch> wrote:
>
> >
> > What I'm worried about is that we lose the zero collisions behaviour
> > for the most popular use case.
>
> If a web interface is used to generate netfilter/tc rules that use
> nfmark, then the above assumption is false: nfmark values will be assigned
> incrementally and wrap back to 0 at some point, like process IDs. So zero
> collisions are not likely.
>
Yes, but the distribution is still very good even in that case.
If you have 257 entries then all except for two will be in separate
buckets.
> When Linux's QoS control capability is widely used, such a web interface
> will sooner or later come into being.
>
> > New idea: we make this configurable and allow 3 types of hash functions:
> > 1) default as-is, perfect for marks 0..255
> > 2) all bits taken into account (your patch)
> > 3) bitmask + shift provided by the user just like
> > dsmark.
> >
> > Thoughts?
>
> Your suggestion is well worth considering, but it needs some more work.
> And isn't it somewhat bloated?
>
Why don't you run a quick test? It's very easy to do in user space.
Enter two sets of values using the two different approaches: yours, and
the current way tc uses nfmark (incremental). Then apply the jenkins
approach you used and see how the distribution looks. I think we know how
it will look with the current hash, but if you can show it's not so bad in
the case of jenkins as well, it may be an acceptable approach.
cheers,
jamal
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
2005-04-05 12:52 ` Thomas Graf
@ 2005-04-05 13:29 ` Wang Jian
0 siblings, 0 replies; 33+ messages in thread
From: Wang Jian @ 2005-04-05 13:29 UTC (permalink / raw)
To: Thomas Graf; +Cc: netdev, jamal
Hi Thomas Graf,
On Tue, 5 Apr 2005 14:52:37 +0200, Thomas Graf <tgraf@suug.ch> wrote:
>
> > When linux's QoS control capability is widely used, such web interface
> > sooner or later comes into being.
>
> That might be true but I will never ack on something that makes zero
> collision use of cls_fw impossible. I'm all for improving this but
> not at the cost of reduced performance for the most obvious use case
> of cls_fw.
Agree.
>
> > Your suggestion is very considerable. But that needs some more work. And,
> > isn't that some bloated?
>
> The shift + bitmask might be bloated and can be deferred a bit until
> someone comes up with this need. I can cook up a patch for this
> if you want, it's not much work.
I am glad that you are willing to hook this up. I really need the new
hashing scheme.
--
lark
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
2005-04-05 12:54 ` jamal
@ 2005-04-05 14:18 ` Wang Jian
2005-04-05 16:11 ` jamal
0 siblings, 1 reply; 33+ messages in thread
From: Wang Jian @ 2005-04-05 14:18 UTC (permalink / raw)
To: hadi; +Cc: Thomas Graf, netdev
Hi jamal,
On 05 Apr 2005 08:54:49 -0400, jamal <hadi@cyberus.ca> wrote:
>
> Why don't you run a quick test? It's very easy to do in user space.
> Enter two sets of values using the two different approaches: yours, and
> the current way tc uses nfmark (incremental). Then apply the jenkins
> approach you used and see how the distribution looks. I think we know how
> it will look with the current hash, but if you can show it's not so bad in
> the case of jenkins as well, it may be an acceptable approach.
>
I am not saying that we must use jenkins. We may use a hash function
less expensive than jenkins, but better than & 0xFF.
Anyway, I have done a userspace test for jhash. The following test was
done on an AMD Athlon 800MHz with no other CPU load.
-- snip jhash_test.c --
typedef unsigned int u32; /* 32 bits on both ILP32 and LP64 */
typedef unsigned char u8;
#include <linux/jhash.h>
#include <stdlib.h>
int main(void)
{
u32 i;
u32 h;
for (i = 0; i < 10000000; i++) {
h = jhash_1word(i, 0xF30A7129) & 0xFFL;
// printf("h is %u\n", h);
}
return 0;
}
-- snip --
[root@qos ~]# gcc jhash_test.c
[root@qos ~]# time ./a.out
0.77user 0.00system 0:00.77elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+81minor)pagefaults 0swaps
--snip simple_hash.c --
typedef unsigned int u32; /* 32 bits on both ILP32 and LP64 */
typedef unsigned char u8;
#include <stdlib.h>
int main(void)
{
u32 i;
u32 h;
for (i = 0; i < 10000000; i++) {
h = i & 0xFF;
// printf("h is %u\n", h);
}
return 0;
}
-- snip --
[root@qos ~]# gcc simple_hash.c
[root@qos ~]# time ./a.out
0.02user 0.00system 0:00.02elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+81minor)pagefaults 0swaps
The simple method is far better in performance. As an extreme case,
100Mbps ethernet carries about 148800 pps of minimum-size frames. Replace
10000000 with 150000.
[root@qos ~]# time ./a.out
0.01user 0.00system 0:00.01elapsed 83%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+81minor)pagefaults 0swaps
So using jhash is not a big deal at 100Mbps.
For 1000Mbps ethernet, replace 10000000 with 1489000.
[root@qos ~]# time ./a.out
0.11user 0.00system 0:00.11elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+81minor)pagefaults 0swaps
It's expected that a faster CPU would be used for GE, for example a
2.4GHz CPU. So
0.11 / (2.4/0.8) ≈ 0.04.
This is still not a big problem for a dedicated linux box for qos
control. We know that 500Mbps is already a bottleneck here.
--
lark
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
2005-04-05 14:18 ` Wang Jian
@ 2005-04-05 16:11 ` jamal
2005-04-06 6:45 ` Wang Jian
0 siblings, 1 reply; 33+ messages in thread
From: jamal @ 2005-04-05 16:11 UTC (permalink / raw)
To: Wang Jian; +Cc: Thomas Graf, netdev
Hi Wang,
On Tue, 2005-04-05 at 10:18, Wang Jian wrote:
> I am not saying that we must use jenkins. We may use a less expensive
> hash function than jenkins, but better than & 0xFF.
>
Sure; the point is, as long as it doesn't destroy the common use in place.
>Anyway, I have done userspace test for jhash. The following test
> is done in a AMD Athlon 800MHz without other CPU load.
>
No, the test I was asking for is to show the distribution of the
hash, not how long it took (which is also an ok test).
i.e., if you fed the jenkins hash with 256 buckets - let's pick 1024
samples of the data you showed earlier for how fwmark looks -
how well would the result look?
And what if you fed it with something like 1024 incremental fwmarks,
say 1..1024?
cheers,
jamal
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
2005-04-05 16:11 ` jamal
@ 2005-04-06 6:45 ` Wang Jian
2005-04-06 12:16 ` jamal
2005-04-06 12:30 ` Thomas Graf
0 siblings, 2 replies; 33+ messages in thread
From: Wang Jian @ 2005-04-06 6:45 UTC (permalink / raw)
To: hadi; +Cc: Thomas Graf, netdev
[-- Attachment #1: Type: text/plain, Size: 1502 bytes --]
Hi jamal,
On 05 Apr 2005 12:11:35 -0400, jamal <hadi@cyberus.ca> wrote:
> Hi Wang,
>
> On Tue, 2005-04-05 at 10:18, Wang Jian wrote:
>
> > I am not saying that we must use jenkins. We may use a hash function
> > less expensive than jenkins, but better than & 0xFF.
> >
>
> Sure; the point is, as long as it doesn't destroy the common use in place.
>
> > Anyway, I have done a userspace test for jhash. The following test was
> > done on an AMD Athlon 800MHz with no other CPU load.
> >
>
> No, the test I was asking for is to show the distribution of the
> hash, not how long it took (which is also an ok test).
Sorry, the test was not on the AMD Athlon 800MHz but on a 2.4G Pentium4!
I didn't notice that. So jhash's performance is not fast enough.
On AMD 800Mhz
line speed @ 1000Mbps
[root@home ~]# time ./a.out
0.31user 0.00system 0:00.32elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+83minor)pagefaults 0swaps
line speed @ 100Mbps
[root@home ~]# time ./a.out
0.03user 0.00system 0:00.03elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+83minor)pagefaults 0swaps
>
> i.e., if you fed the jenkins hash with 256 buckets - let's pick 1024
> samples of the data you showed earlier for how fwmark looks -
> how well would the result look?
> And what if you fed it with something like 1024 incremental fwmarks,
> say 1..1024?
>
The test results don't look good. See the attached file.
So let's find a better way.
--
lark
[-- Attachment #2: jhash-dist.txt --]
[-- Type: application/octet-stream, Size: 5499 bytes --]
$VAR1 = {
'32' => 1,
'127' => 3,
'90' => 6,
'118' => 3,
'206' => 2,
'71' => 2,
'102' => 7,
'200' => 4,
'18' => 7,
'125' => 4,
'16' => 8,
'44' => 2,
'55' => 4,
'84' => 6,
'27' => 4,
'190' => 5,
'233' => 4,
'161' => 9,
'57' => 2,
'194' => 3,
'95' => 5,
'220' => 2,
'20' => 2,
'163' => 2,
'231' => 1,
'109' => 4,
'243' => 3,
'151' => 3,
'89' => 3,
'175' => 3,
'148' => 7,
'31' => 2,
'35' => 2,
'11' => 5,
'208' => 5,
'78' => 4,
'93' => 2,
'106' => 1,
'157' => 5,
'65' => 3,
'29' => 5,
'197' => 7,
'203' => 2,
'138' => 1,
'114' => 4,
'199' => 6,
'226' => 2,
'58' => 7,
'15' => 3,
'153' => 1,
'211' => 5,
'137' => 1,
'81' => 5,
'60' => 3,
'73' => 2,
'101' => 5,
'76' => 5,
'86' => 6,
'62' => 6,
'247' => 2,
'67' => 4,
'204' => 3,
'241' => 3,
'165' => 7,
'139' => 6,
'198' => 6,
'129' => 7,
'2' => 3,
'17' => 5,
'186' => 4,
'82' => 6,
'110' => 2,
'147' => 9,
'228' => 2,
'236' => 4,
'249' => 3,
'202' => 6,
'168' => 3,
'218' => 1,
'14' => 1,
'135' => 3,
'184' => 5,
'69' => 5,
'112' => 7,
'172' => 3,
'145' => 4,
'49' => 4,
'178' => 3,
'24' => 5,
'224' => 3,
'140' => 7,
'187' => 4,
'223' => 5,
'124' => 1,
'104' => 3,
'131' => 5,
'181' => 2,
'234' => 4,
'79' => 3,
'121' => 4,
'212' => 3,
'154' => 6,
'0' => 5,
'23' => 1,
'96' => 7,
'126' => 3,
'238' => 3,
'159' => 6,
'251' => 3,
'253' => 2,
'160' => 6,
'176' => 2,
'47' => 7,
'8' => 6,
'98' => 4,
'209' => 4,
'216' => 4,
'37' => 4,
'117' => 2,
'43' => 4,
'195' => 7,
'5' => 4,
'170' => 4,
'33' => 5,
'63' => 5,
'21' => 6,
'7' => 5,
'227' => 3,
'80' => 4,
'26' => 4,
'193' => 1,
'119' => 3,
'99' => 1,
'180' => 3,
'244' => 4,
'162' => 2,
'72' => 4,
'179' => 3,
'255' => 3,
'246' => 3,
'74' => 4,
'240' => 5,
'182' => 2,
'61' => 5,
'230' => 4,
'108' => 2,
'115' => 6,
'92' => 5,
'103' => 5,
'201' => 2,
'232' => 4,
'10' => 3,
'113' => 6,
'152' => 2,
'189' => 6,
'225' => 3,
'207' => 2,
'142' => 5,
'91' => 1,
'167' => 5,
'48' => 6,
'107' => 4,
'87' => 3,
'77' => 2,
'174' => 6,
'214' => 4,
'133' => 4,
'149' => 3,
'123' => 4,
'221' => 2,
'50' => 7,
'39' => 1,
'210' => 2,
'64' => 2,
'97' => 5,
'41' => 4,
'12' => 4,
'52' => 4,
'173' => 1,
'56' => 3,
'229' => 2,
'66' => 4,
'45' => 6,
'19' => 4,
'54' => 4,
'237' => 2,
'70' => 4,
'188' => 4,
'68' => 4,
'166' => 2,
'1' => 2,
'136' => 6,
'116' => 8,
'88' => 2,
'30' => 2,
'141' => 4,
'144' => 5,
'100' => 4,
'222' => 6,
'128' => 3,
'25' => 5,
'252' => 3,
'28' => 5,
'120' => 2,
'75' => 2,
'83' => 5,
'40' => 2,
'134' => 5,
'156' => 5,
'192' => 6,
'250' => 8,
'59' => 7,
'215' => 4,
'254' => 3,
'177' => 2,
'150' => 3,
'130' => 3,
'155' => 4,
'53' => 4,
'217' => 6,
'245' => 4,
'239' => 4,
'122' => 1,
'143' => 6,
'205' => 2,
'42' => 3,
'158' => 5,
'22' => 4,
'46' => 9,
'219' => 5,
'13' => 2,
'105' => 3,
'235' => 9,
'6' => 1,
'85' => 4,
'185' => 5,
'3' => 5,
'36' => 5,
'183' => 6,
'94' => 7,
'213' => 5,
'248' => 5,
'146' => 4,
'111' => 5,
'9' => 6,
'38' => 5,
'4' => 6,
'34' => 9,
'132' => 3,
'169' => 7,
'164' => 6,
'196' => 5,
'171' => 4,
'242' => 5
};
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
2005-04-06 6:45 ` Wang Jian
@ 2005-04-06 12:16 ` jamal
2005-04-06 12:30 ` Thomas Graf
1 sibling, 0 replies; 33+ messages in thread
From: jamal @ 2005-04-06 12:16 UTC (permalink / raw)
To: Wang Jian; +Cc: Thomas Graf, netdev
On Wed, 2005-04-06 at 02:45, Wang Jian wrote:
> The test results don't look good. See the attached file.
>
Yes, that doesn't look too good.
> So let's find a better way.
Thomas?
cheers,
jamal
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
2005-04-06 6:45 ` Wang Jian
2005-04-06 12:16 ` jamal
@ 2005-04-06 12:30 ` Thomas Graf
2005-04-06 13:01 ` Wang Jian
` (2 more replies)
1 sibling, 3 replies; 33+ messages in thread
From: Thomas Graf @ 2005-04-06 12:30 UTC (permalink / raw)
To: Wang Jian; +Cc: hadi, netdev
* Wang Jian <20050406143842.026B.LARK@linux.net.cn> 2005-04-06 14:45
> On 05 Apr 2005 12:11:35 -0400, jamal <hadi@cyberus.ca> wrote:
> > i.e., if you fed the jenkins hash with 256 buckets - let's pick 1024
> > samples of the data you showed earlier for how fwmark looks -
> > how well would the result look?
> > And what if you fed it with something like 1024 incremental fwmarks,
> > say 1..1024?
> >
>
> The test result looks not good. See attached file.
>
> So let's find a better way.
We need to provide some kind of option so the user can specify his
needs. The & 0xFF will suit most just fine but has one essential
drawback: no distribution is done at all if the lower 8 bits are set
to 0. For static marks this is no issue at all, and even for enumerated
marks it takes quite some time to grow into an area where it starts
hurting. The problem obviously is if someone splits the mark field into
2 parts and uses the upper 16 bits for some special purpose, just like
you did. In such a case it would make sense to either take all bits
into account or let the user specify a bitmask + shift.
So here is the same idea I posted before but revised:
Let the user specify one of the hash tables via a new TLV:
- default: & 0xFF
- ((mark & mask) >> shift) & 0xFF
- jenkins for 16, 32, and 64 bits
- FNV for 16, 32, and 64 bits
Why variations for type sizes? The chance of collisions reduces
a lot if the user exactly knows he'll never use more than 16bits
but 255 marks are not enough.
I'm cooking up a patch for this today together with a fix to
allow 64bit values for the mark.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
2005-04-06 12:30 ` Thomas Graf
@ 2005-04-06 13:01 ` Wang Jian
2005-04-06 13:34 ` jamal
2005-04-06 13:36 ` [PATCH] improvement on net/sched/cls_fw.c's hash function Eran Mann
2 siblings, 0 replies; 33+ messages in thread
From: Wang Jian @ 2005-04-06 13:01 UTC (permalink / raw)
To: Thomas Graf; +Cc: hadi, netdev
Hi Thomas Graf,
On Wed, 6 Apr 2005 14:30:36 +0200, Thomas Graf <tgraf@suug.ch> wrote:
> * Wang Jian <20050406143842.026B.LARK@linux.net.cn> 2005-04-06 14:45
> > On 05 Apr 2005 12:11:35 -0400, jamal <hadi@cyberus.ca> wrote:
> > > i.e., if you fed the jenkins hash with 256 buckets - let's pick 1024
> > > samples of the data you showed earlier for how fwmark looks -
> > > how well would the result look?
> > > And what if you fed it with something like 1024 incremental fwmarks,
> > > say 1..1024?
> > >
> >
> > The test results don't look good. See the attached file.
> >
> > So let's find a better way.
>
> We need to provide some kind of option so the user can specify his
> needs. The & 0xFF will suit most just fine but has one essential
> drawback: no distribution is done at all if the lower 8 bits are set
> to 0. For static marks this is no issue at all, and even for enumerated
> marks it takes quite some time to grow into an area where it starts
> hurting. The problem obviously is if someone splits the mark field into
> 2 parts and uses the upper 16 bits for some special purpose, just like
> you did. In such a case it would make sense to either take all bits
> into account or let the user specify a bitmask + shift.
>
> So here is the same idea I posted before but revised:
>
> Let the user specify one of the hash tables via a new TLV:
> - default: & 0xFF
> - ((mark & mask) >> shift) & 0xFF
> - jenkins for 16, 32, and 64 bits
> - FNV for 16, 32, and 64 bits
>
> Why variations for type sizes? The chance of collisions reduces
> a lot if the user exactly knows he'll never use more than 16bits
> but 255 marks are not enough.
>
> I'm cooking up a patch for this today together with a fix to
> allow 64bit values for the mark.
Given that no one-hash-fits-all solution exists, I think your solution is
quite elegant.
--
lark
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
2005-04-06 12:30 ` Thomas Graf
2005-04-06 13:01 ` Wang Jian
@ 2005-04-06 13:34 ` jamal
2005-04-06 13:45 ` Thomas Graf
2005-04-06 13:36 ` [PATCH] improvement on net/sched/cls_fw.c's hash function Eran Mann
2 siblings, 1 reply; 33+ messages in thread
From: jamal @ 2005-04-06 13:34 UTC (permalink / raw)
To: Thomas Graf; +Cc: Wang Jian, netdev
On Wed, 2005-04-06 at 08:30, Thomas Graf wrote:
> Let the user specify one of the hash tables via a new TLV:
> - default: & 0xFF
> - ((mark & mask) >> shift) & 0xFF
> - jenkins for 16, 32, and 64 bits
> - FNV for 16, 32, and 64 bits
When does the user specify this? I would think you need to set it only
once per bootup. After that it will be quite complex to reset. It could
be done but sounds complex and unnecessary. i.e., it may be a boot or
module parameter more than a netlink-controlled value, no?
cheers,
jamal
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
2005-04-06 12:30 ` Thomas Graf
2005-04-06 13:01 ` Wang Jian
2005-04-06 13:34 ` jamal
@ 2005-04-06 13:36 ` Eran Mann
2005-04-06 13:53 ` Wang Jian
2 siblings, 1 reply; 33+ messages in thread
From: Eran Mann @ 2005-04-06 13:36 UTC (permalink / raw)
To: Thomas Graf; +Cc: Wang Jian, hadi, netdev
Thomas Graf wrote:
> We need to provide some kind of option so the user can specify his
> needs. The & 0xFF will suit most just fine but has one essential
> drawback: no distribution is done at all if the lower 8 bits are set
> to 0. For static marks this is no issue at all, and even for enumerated
> marks it takes quite some time to grow into an area where it starts
> hurting. The problem obviously is if someone splits the mark field into
> 2 parts and uses the upper 16 bits for some special purpose, just like
> you did. In such a case it would make sense to either take all bits
> into account or let the user specify a bitmask + shift.
>
> So here is the same idea I posted before but revised:
>
> Let the user specify one of the hash tables via a new TLV:
> - default: & 0xFF
> - ((mark & mask) >> shift) & 0xFF
> - jenkins for 16, 32, and 64 bits
> - FNV for 16, 32, and 64 bits
>
> Why variations for type sizes? The chance of collisions is reduced
> a lot if the user knows for certain he'll never use more than 16 bits
> but 255 marks are not enough.
>
> I'm cooking up a patch for this today together with a fix to
> allow 64bit values for the mark.
>
Maybe you could add to the list of options a XOR of bytes hash,
something like:
static inline u8 bytes_xor_u32(u32 key)
{
	u8 *dummy_array = (u8 *)&key;
	u8 hash = dummy_array[0] ^ dummy_array[1] ^
		  dummy_array[2] ^ dummy_array[3];

	return hash;
}
--
Eran Mann
MRV International
www.mrv.com
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
2005-04-06 13:34 ` jamal
@ 2005-04-06 13:45 ` Thomas Graf
2005-04-06 14:10 ` Thomas Graf
0 siblings, 1 reply; 33+ messages in thread
From: Thomas Graf @ 2005-04-06 13:45 UTC (permalink / raw)
To: jamal; +Cc: Wang Jian, netdev
* jamal <1112794459.1096.61.camel@jzny.localdomain> 2005-04-06 09:34
> On Wed, 2005-04-06 at 08:30, Thomas Graf wrote:
>
> > Let the user specify one of the hash tables via a new TLV:
> > - default: & 0xFF
> > - ((mark & mask) >> shift) & 0xFF
> > - jenkins for 16, 32, and 64 bits
> > - FNV for 16, 32, and 64 bits
>
> When does the user specify this? i would think you need to set it once
> only per bootup. After that it will be quite complex to reset. It could
> be done but sounds complex and unnecessary. i.e. it may be a boot or
> module parameter more than it is a netlink controlled value, no?
I'm not 100% sure about this yet but I think during fw_change so
we can have different hashes for different qdiscs filter chains.
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
2005-04-06 13:36 ` [PATCH] improvement on net/sched/cls_fw.c's hash function Eran Mann
@ 2005-04-06 13:53 ` Wang Jian
0 siblings, 0 replies; 33+ messages in thread
From: Wang Jian @ 2005-04-06 13:53 UTC (permalink / raw)
To: Eran Mann; +Cc: Thomas Graf, hadi, netdev
Hi Eran Mann,
On Wed, 06 Apr 2005 16:36:20 +0300, emann@mrv.com (Eran Mann) wrote:
>
> Maybe you could add to the list of options a XOR of bytes hash,
> something like:
> static inline u8 bytes_xor_u32(u32 key)
> {
> 	u8 *dummy_array = (u8 *)&key;
> 	u8 hash = dummy_array[0] ^ dummy_array[1] ^
> 		  dummy_array[2] ^ dummy_array[3];
>
> 	return hash;
> }
>
Ha, then add this hi-or-lo word hash, which assumes that either the high
or the low word is 0 and that keys are distributed in the old fashion
(0..255, etc.):
static inline u32 hi_or_lo_hash(u32 key)
{
	return (key >> 16 ^ key) & 0xFF;
}
--
lark
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
2005-04-06 13:45 ` Thomas Graf
@ 2005-04-06 14:10 ` Thomas Graf
2005-04-06 18:15 ` David S. Miller
0 siblings, 1 reply; 33+ messages in thread
From: Thomas Graf @ 2005-04-06 14:10 UTC (permalink / raw)
To: jamal; +Cc: Wang Jian, netdev
* Thomas Graf <20050406134502.GP26731@postel.suug.ch> 2005-04-06 15:45
> * jamal <1112794459.1096.61.camel@jzny.localdomain> 2005-04-06 09:34
> > When does the user specify this? i would think you need to set it once
> > only per bootup. After that it will be quite complex to reset. It could
> > be done but sounds complex and unnecessary. i.e. it may be a boot or
> > module parameter more than it is a netlink controlled value, no?
>
> I'm not 100% sure about this yet but I think during fw_change so
> we can have different hashes for different qdiscs filter chains.
The thing I'm worrying about is that I don't want to break the perfect
alignment of fw_head to good slab obj sizes but I guess there is no
way around it. I'd really like to make the hash size and hash function
configurable. For example a hash size of 1024 would perform much
better and would still fit into a single page on most systems.
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
2005-04-06 14:10 ` Thomas Graf
@ 2005-04-06 18:15 ` David S. Miller
2005-04-06 18:31 ` Thomas Graf
0 siblings, 1 reply; 33+ messages in thread
From: David S. Miller @ 2005-04-06 18:15 UTC (permalink / raw)
To: Thomas Graf; +Cc: hadi, lark, netdev
On Wed, 6 Apr 2005 16:10:20 +0200
Thomas Graf <tgraf@suug.ch> wrote:
> The thing I'm worrying about is that I don't want to break the perfect
> alignment of fw_head to good slab obj sizes but I guess there is no
> way around it. I'd really like to make the hash size and hash function
> configurable. For example a hash size of 1024 would perform much
> better and would still fit into a single page on most systems.
I think a hash xor'ing the high bits into the low 8 bits, as has
been suggested a few times already, meets your criteria and solves
Lark's problem.
The hash table size, if still an issue, can be dynamically sized based
upon some criteria with some reasonable default initial selection
(such as the current 256).
I think dynamic hash function selection is madness. You have a key
that needs to be hashed, you are aware of one very common usage
(the one that perfectly hashes currently) and you are also aware of
a method by which the cases that don't fit into that mold can be made
to perform acceptably too (xor'ing in the upper bits). We can thus
do this with one single hash function.
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
2005-04-06 18:15 ` David S. Miller
@ 2005-04-06 18:31 ` Thomas Graf
2005-04-07 0:55 ` [RFC] dynamic hash table size & xor hash function for cls_fw Thomas Graf
0 siblings, 1 reply; 33+ messages in thread
From: Thomas Graf @ 2005-04-06 18:31 UTC (permalink / raw)
To: David S. Miller; +Cc: hadi, lark, netdev
* David S. Miller <20050406111509.0462abcf.davem@davemloft.net> 2005-04-06 11:15
> On Wed, 6 Apr 2005 16:10:20 +0200
> Thomas Graf <tgraf@suug.ch> wrote:
>
> > The thing I'm worrying about is that I don't want to break the perfect
> > alignment of fw_head to good slab obj sizes but I guess there is no
> > way around it. I'd really like to make the hash size and hash function
> > configurable. For example a hash size of 1024 would perform much
> > better and would still fit into a single page on most systems.
>
> I think a hash xor'ing in the high bits into the low 8 bits, as has
> been suggested a few times already, meets your criteria and solves
> Lark's problem.
Yes, it sounds pretty good. I can't find any scenario where it
would perform unacceptably; it's not perfect but fair enough
for everyone, I guess.
> The hash table size, if still an issue, can be dynamically sized based
> upon some criteria with some reasonable default initial selection
> (such as the current 256).
I guess sometimes it would make a lot of sense to spend a few dozen
pages on the hashtable, but it's not worth playing games with the
perfectly aligned fw_head just in case. PAGE_SIZE/address_size would
make sense though. If someone needs something bigger it can be changed
in the source.
* [RFC] dynamic hash table size & xor hash function for cls_fw
2005-04-06 18:31 ` Thomas Graf
@ 2005-04-07 0:55 ` Thomas Graf
2005-04-07 10:38 ` jamal
0 siblings, 1 reply; 33+ messages in thread
From: Thomas Graf @ 2005-04-07 0:55 UTC (permalink / raw)
To: David S. Miller; +Cc: hadi, lark, netdev
* Thomas Graf <20050406183134.GR26731@postel.suug.ch> 2005-04-06 20:31
> Yes, it sounds pretty good. I can't find any scenario where it
> would perform unacceptably; it's not perfect but fair enough
> for everyone, I guess.
Changes the hashtable size to PAGE_SIZE/sizeof(unsigned long) and
adapts the hashing algorithm to xor several bit fields of the handle
in order to maintain collision-free hashing for the most common use
case while providing some kind of distribution for rarer cases.
The hashing function looks really ugly, and it is, but gcc will
optimize away all the unneeded branches, which makes it short again.
The 1024-bucket case ignores the two most significant bits of the
handle; I don't think this has any impact, and including them would
just be wasted cycles. It looks inefficient, but the manual shifting
& xoring only takes one instruction more than the HTSIZE == 256 case
on my x86 box, so the increased hashing area should be worth it.
Thoughts?
===== net/sched/cls_fw.c 1.21 vs edited =====
--- 1.21/net/sched/cls_fw.c 2005-03-29 02:45:46 +02:00
+++ edited/net/sched/cls_fw.c 2005-04-07 02:41:46 +02:00
@@ -46,9 +46,11 @@
#include <net/act_api.h>
#include <net/pkt_cls.h>
+#define HTSIZE (PAGE_SIZE/sizeof(struct fw_filter *))
+
struct fw_head
{
- struct fw_filter *ht[256];
+ struct fw_filter *ht[HTSIZE];
};
struct fw_filter
@@ -69,7 +71,28 @@
static __inline__ int fw_hash(u32 handle)
{
- return handle&0xFF;
+ if (HTSIZE == 4096)
+ return ((handle >> 24) & 0xFFF) ^
+ ((handle >> 12) & 0xFFF) ^
+ (handle & 0xFFF);
+ else if (HTSIZE == 2048)
+ return ((handle >> 22) & 0x7FF) ^
+ ((handle >> 11) & 0x7FF) ^
+ (handle & 0x7FF);
+ else if (HTSIZE == 1024)
+ return ((handle >> 20) & 0x3FF) ^
+ ((handle >> 10) & 0x3FF) ^
+ (handle & 0x3FF);
+ else if (HTSIZE == 512)
+ return (handle >> 27) ^
+ ((handle >> 18) & 0x1FF) ^
+ ((handle >> 9) & 0x1FF) ^
+ (handle & 0x1FF);
+ else if (HTSIZE == 256) {
+ u8 *t = (u8 *) &handle;
+ return t[0] ^ t[1] ^ t[2] ^ t[3];
+ } else
+ return handle & (HTSIZE - 1);
}
static int fw_classify(struct sk_buff *skb, struct tcf_proto *tp,
@@ -152,7 +175,7 @@
if (head == NULL)
return;
- for (h=0; h<256; h++) {
+ for (h=0; h<HTSIZE; h++) {
while ((f=head->ht[h]) != NULL) {
head->ht[h] = f->next;
fw_delete_filter(tp, f);
@@ -291,7 +314,7 @@
if (arg->stop)
return;
- for (h = 0; h < 256; h++) {
+ for (h = 0; h < HTSIZE; h++) {
struct fw_filter *f;
for (f = head->ht[h]; f; f = f->next) {
* Re: [RFC] dynamic hash table size & xor hash function for cls_fw
2005-04-07 0:55 ` [RFC] dynamic hash table size & xor hash function for cls_fw Thomas Graf
@ 2005-04-07 10:38 ` jamal
2005-04-07 10:47 ` Wang Jian
2005-04-07 10:51 ` Thomas Graf
0 siblings, 2 replies; 33+ messages in thread
From: jamal @ 2005-04-07 10:38 UTC (permalink / raw)
To: Thomas Graf; +Cc: David S. Miller, lark, netdev
On Wed, 2005-04-06 at 20:55, Thomas Graf wrote:
>
> static __inline__ int fw_hash(u32 handle)
> {
> - return handle&0xFF;
> + if (HTSIZE == 4096)
> + return ((handle >> 24) & 0xFFF) ^
> + ((handle >> 12) & 0xFFF) ^
> + (handle & 0xFFF);
> + else if (HTSIZE == 2048)
> + return ((handle >> 22) & 0x7FF) ^
> + ((handle >> 11) & 0x7FF) ^
> + (handle & 0x7FF);
> + else if (HTSIZE == 1024)
> + return ((handle >> 20) & 0x3FF) ^
> + ((handle >> 10) & 0x3FF) ^
> + (handle & 0x3FF);
> + else if (HTSIZE == 512)
> + return (handle >> 27) ^
> + ((handle >> 18) & 0x1FF) ^
> + ((handle >> 9) & 0x1FF) ^
> + (handle & 0x1FF);
> + else if (HTSIZE == 256) {
> + u8 *t = (u8 *) &handle;
> + return t[0] ^ t[1] ^ t[2] ^ t[3];
> + } else
> + return handle & (HTSIZE - 1);
> }
Does HTSIZE change at runtime? How does migrating from one size to the
other take place?
Also, why not have a function pointer with each of these as a separate
function instead of doing the if checks? BTW it does seem any one of
those hashes may be sufficient, no?
cheers,
jamal
* Re: [RFC] dynamic hash table size & xor hash function for cls_fw
2005-04-07 10:38 ` jamal
@ 2005-04-07 10:47 ` Wang Jian
2005-04-07 10:51 ` Thomas Graf
1 sibling, 0 replies; 33+ messages in thread
From: Wang Jian @ 2005-04-07 10:47 UTC (permalink / raw)
To: hadi; +Cc: Thomas Graf, David S. Miller, netdev
Hi jamal,
I think Thomas decided to support only one hash function, chosen at
compile time, with no switch at runtime.
HTSIZE is a constant, so the if branches will be optimized away by gcc
at compile time; only one hash is left.
On 07 Apr 2005 06:38:27 -0400, jamal <hadi@cyberus.ca> wrote:
> On Wed, 2005-04-06 at 20:55, Thomas Graf wrote:
>
> Does HTSIZE change at runtime? How does migrating from one size to the
> other take place?
> Also, why not have a function pointer with each of these as a separate
> function instead of doing the if checks? BTW it does seem any one of
> those hashes may be sufficient, no?
>
> cheers,
> jamal
--
lark
* Re: [RFC] dynamic hash table size & xor hash function for cls_fw
2005-04-07 10:38 ` jamal
2005-04-07 10:47 ` Wang Jian
@ 2005-04-07 10:51 ` Thomas Graf
2005-04-07 11:07 ` jamal
1 sibling, 1 reply; 33+ messages in thread
From: Thomas Graf @ 2005-04-07 10:51 UTC (permalink / raw)
To: jamal; +Cc: David S. Miller, lark, netdev
* jamal <1112870307.1118.91.camel@jzny.localdomain> 2005-04-07 06:38
> On Wed, 2005-04-06 at 20:55, Thomas Graf wrote:
>
> Does HTSIZE change at runtime? How does migrating from one size to the
> other take place?
No, the size is static (PAGE_SIZE/sizeof(unsigned long)) (typically 1024).
> Also why not have a function pointer with a series of these being
> separate instead of doing the if checks?
Because it is static, gcc will optimize and remove all branches leaving
only the one that is actually needed, i.e. on most systems the hash
function will get down to the contents of the if (HTSIZE == 1024) branch.
> BTW it does seem any one of those hashes may be sufficient, no?
If we want to do the xor'ing to maintain collision-free hashing
for the most common case, we need the separate cases. It would be
possible to play games and compute the shifts generically, but gcc
has a hard time optimizing such things.
* Re: [RFC] dynamic hash table size & xor hash function for cls_fw
2005-04-07 10:51 ` Thomas Graf
@ 2005-04-07 11:07 ` jamal
2005-04-07 13:09 ` [PATCH] [PKT_SCHED]: improve hashing performance of cls_fw Thomas Graf
0 siblings, 1 reply; 33+ messages in thread
From: jamal @ 2005-04-07 11:07 UTC (permalink / raw)
To: Thomas Graf; +Cc: David S. Miller, lark, netdev
ok, gotcha - good idea to leave it as is..
cheers,
jamal
On Thu, 2005-04-07 at 06:51, Thomas Graf wrote:
> * jamal <1112870307.1118.91.camel@jzny.localdomain> 2005-04-07 06:38
> > On Wed, 2005-04-06 at 20:55, Thomas Graf wrote:
> >
> > Does HTSIZE change at runtime? How does migrating from one to other take
> > place?
>
> No, the size is static (PAGE_SIZE/sizeof(unsigned long)) (typically 1024).
>
> > Also why not have a function pointer with a series of these being
> > separate instead of doing the if checks?
>
> Because it is static, gcc will optimize and remove all branches leaving
> only the one that is actually needed, i.e. on most systems the hash
> function will get down to the contents of the if (HTSIZE == 1024) branch.
>
> > BTW it does seem any one of those hashes maybe sufficient, no?
>
> If we want to do the xor'ing to maintain collision free hashing
> for the most common case we need to separate. It would be possible
> to play games and calculate the shifts etc but gcc has a hard
> time optimizing such things.
>
* [PATCH] [PKT_SCHED]: improve hashing performance of cls_fw
2005-04-07 11:07 ` jamal
@ 2005-04-07 13:09 ` Thomas Graf
2005-04-07 13:31 ` Wang Jian
0 siblings, 1 reply; 33+ messages in thread
From: Thomas Graf @ 2005-04-07 13:09 UTC (permalink / raw)
To: David S. Miller; +Cc: lark, Jamal Hadi Salim, netdev
Some test results for the new hashing
Test-1: hash 0..2147483648
old 256 buckets hash (enum):
empty buckets: 0 average chain length: 8388607.996 min: 8388607 max: 8388608
new 1024 buckets hash (enum):
empty buckets: 0 average chain length: 2097151.999 min: 2097151 max: 2097152
Test-2: hash 65536*rand()
old 256 buckets hash (rand):
empty buckets: 0 average chain length: 256.000 min: 222 max: 296
new 1024 buckets hash (rand):
empty buckets: 0 average chain length: 64.000 min: 40 max: 93
Test-3: hash 1024*rand()
old 256 buckets hash (rand):
empty buckets: 3 average chain length: 4.047 min: 0 max: 9
new 1024 buckets hash (rand):
empty buckets: 380 average chain length: 1.590 min: 0 max: 5
Test-4: total time for 2147483648 hashes, 10 runs
old hashing algorithm new hashing algorithm
1.619s 1.610s
1.623s 1.611s
1.625s 1.612s
1.629s 1.614s
1.633s 1.616s
1.637s 1.621s
1.638s 1.625s
1.639s 1.630s
1.644s 1.632s
1.737s 1.658s
Please do a
bk pull bk://kernel.bkbits.net/tgraf/net-2.6-cls_fw
This will update the following files:
net/sched/cls_fw.c | 31 +++++++++++++++++++++++++++----
1 files changed, 27 insertions(+), 4 deletions(-)
through these ChangeSets:
# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
# 2005/04/07 14:23:09+02:00 tgraf@suug.ch
# [PKT_SCHED]: improve hashing performance of cls_fw
#
# Calculate hashtable size to fit into a page instead of a hardcoded
# 256 buckets hash table. Results in a 1024 buckets hashtable on
# most systems.
#
# Replace old naive extract-8-lsb-bits algorithm with a better
# algorithm xor'ing 3 or 4 bit fields at the size of the hashtable
# array index in order to improve distribution if the majority of
# the lower bits are unused while keeping zero collision behaviour
# for the most common use case.
#
# Thanks to Wang Jian <lark@linux.net.cn> for bringing this issue
# to attention and to Eran Mann <emann@mrv.com> for the initial
# idea for this new algorithm.
#
# Signed-off-by: Thomas Graf <tgraf@suug.ch>
# Signed-off-by: David S. Miller <davem@davemloft.net>
#
# net/sched/cls_fw.c
# 2005/04/07 14:22:58+02:00 tgraf@suug.ch +27 -4
# [PKT_SCHED]: improve hashing performance of cls_fw
#
diff -Nru a/net/sched/cls_fw.c b/net/sched/cls_fw.c
--- a/net/sched/cls_fw.c 2005-04-07 14:55:55 +02:00
+++ b/net/sched/cls_fw.c 2005-04-07 14:55:55 +02:00
@@ -46,9 +46,11 @@
#include <net/act_api.h>
#include <net/pkt_cls.h>
+#define HTSIZE (PAGE_SIZE/sizeof(struct fw_filter *))
+
struct fw_head
{
- struct fw_filter *ht[256];
+ struct fw_filter *ht[HTSIZE];
};
struct fw_filter
@@ -69,7 +71,28 @@
static __inline__ int fw_hash(u32 handle)
{
- return handle&0xFF;
+ if (HTSIZE == 4096)
+ return ((handle >> 24) & 0xFFF) ^
+ ((handle >> 12) & 0xFFF) ^
+ (handle & 0xFFF);
+ else if (HTSIZE == 2048)
+ return ((handle >> 22) & 0x7FF) ^
+ ((handle >> 11) & 0x7FF) ^
+ (handle & 0x7FF);
+ else if (HTSIZE == 1024)
+ return ((handle >> 20) & 0x3FF) ^
+ ((handle >> 10) & 0x3FF) ^
+ (handle & 0x3FF);
+ else if (HTSIZE == 512)
+ return (handle >> 27) ^
+ ((handle >> 18) & 0x1FF) ^
+ ((handle >> 9) & 0x1FF) ^
+ (handle & 0x1FF);
+ else if (HTSIZE == 256) {
+ u8 *t = (u8 *) &handle;
+ return t[0] ^ t[1] ^ t[2] ^ t[3];
+ } else
+ return handle & (HTSIZE - 1);
}
static int fw_classify(struct sk_buff *skb, struct tcf_proto *tp,
@@ -152,7 +175,7 @@
if (head == NULL)
return;
- for (h=0; h<256; h++) {
+ for (h=0; h<HTSIZE; h++) {
while ((f=head->ht[h]) != NULL) {
head->ht[h] = f->next;
fw_delete_filter(tp, f);
@@ -291,7 +314,7 @@
if (arg->stop)
return;
- for (h = 0; h < 256; h++) {
+ for (h = 0; h < HTSIZE; h++) {
struct fw_filter *f;
for (f = head->ht[h]; f; f = f->next) {
* Re: [PATCH] [PKT_SCHED]: improve hashing performance of cls_fw
2005-04-07 13:09 ` [PATCH] [PKT_SCHED]: improve hashing performance of cls_fw Thomas Graf
@ 2005-04-07 13:31 ` Wang Jian
2005-04-07 13:52 ` Thomas Graf
0 siblings, 1 reply; 33+ messages in thread
From: Wang Jian @ 2005-04-07 13:31 UTC (permalink / raw)
To: Thomas Graf; +Cc: David S. Miller, Jamal Hadi Salim, netdev
Hi Thomas Graf,
Would you please test this case?
(0..1024) << 16
The old hash maps all of these into a single bucket with a chain depth
of 1024.
And I am not sure whether there is a bad range (0..n) << s which the
new hash maps into one bucket.
On Thu, 7 Apr 2005 15:09:25 +0200, Thomas Graf <tgraf@suug.ch> wrote:
> Some test results for the new hashing
>
> Test-1: hash 0..2147483648
>
> old 256 buckets hash (enum):
> empty buckets: 0 average chain length: 8388607.996 min: 8388607 max: 8388608
>
> new 1024 buckets hash (enum):
> empty buckets: 0 average chain length: 2097151.999 min: 2097151 max: 2097152
>
> Test-2: hash 65536*rand()
>
> old 256 buckets hash (rand):
> empty buckets: 0 average chain length: 256.000 min: 222 max: 296
>
> new 1024 buckets hash (rand):
> empty buckets: 0 average chain length: 64.000 min: 40 max: 93
>
> Test-3: hash 1024*rand()
>
> old 256 buckets hash (rand):
> empty buckets: 3 average chain length: 4.047 min: 0 max: 9
>
> new 1024 buckets hash (rand):
> empty buckets: 380 average chain length: 1.590 min: 0 max: 5
>
> Test-4: total time for 2147483648 hashes, 10 runs
>
> old hashing algorithm new hashing algorithm
> 1.619s 1.610s
> 1.623s 1.611s
> 1.625s 1.612s
> 1.629s 1.614s
> 1.633s 1.616s
> 1.637s 1.621s
> 1.638s 1.625s
> 1.639s 1.630s
> 1.644s 1.632s
> 1.737s 1.658s
--
lark
* Re: [PATCH] [PKT_SCHED]: improve hashing performance of cls_fw
2005-04-07 13:31 ` Wang Jian
@ 2005-04-07 13:52 ` Thomas Graf
2005-04-07 14:03 ` Wang Jian
0 siblings, 1 reply; 33+ messages in thread
From: Thomas Graf @ 2005-04-07 13:52 UTC (permalink / raw)
To: Wang Jian; +Cc: David S. Miller, Jamal Hadi Salim, netdev
* Wang Jian <20050407212340.02D2.LARK@linux.net.cn> 2005-04-07 21:31
> Would you please test this case?
>
> (0..1024) << 16
>
> The old hash gives a 1024 depth bucket for this case.
>
> And I am not sure if there is a bad range (0..n) << s which is mapped
> into one bucket.
old hash (enum shift):
empty buckets: 255 average chain length: 1024.000 min: 0 max: 1024
new hash (enum shift):
empty buckets: 0 average chain length: 1.000 min: 1 max: 1
* Re: [PATCH] [PKT_SCHED]: improve hashing performance of cls_fw
2005-04-07 13:52 ` Thomas Graf
@ 2005-04-07 14:03 ` Wang Jian
0 siblings, 0 replies; 33+ messages in thread
From: Wang Jian @ 2005-04-07 14:03 UTC (permalink / raw)
To: Thomas Graf; +Cc: David S. Miller, Jamal Hadi Salim, netdev
Hi Thomas Graf,
Thanks :-)
On Thu, 7 Apr 2005 15:52:40 +0200, Thomas Graf <tgraf@suug.ch> wrote:
> * Wang Jian <20050407212340.02D2.LARK@linux.net.cn> 2005-04-07 21:31
> > Would you please test this case?
> >
> > (0..1024) << 16
> >
> > The old hash gives a 1024 depth bucket for this case.
> >
> > And I am not sure if there is a bad range (0..n) << s which is mapped
> > into one bucket.
>
> old hash (enum shift):
> empty buckets: 255 average chain length: 1024.000 min: 0 max: 1024
>
> new hash (enum shift):
> empty buckets: 0 average chain length: 1.000 min: 1 max: 1
--
lark
end of thread, other threads:[~2005-04-07 14:03 UTC | newest]
Thread overview: 33+ messages
2005-04-05 5:35 [PATCH] improvement on net/sched/cls_fw.c's hash function Wang Jian
2005-04-05 5:37 ` David S. Miller
2005-04-05 6:05 ` Wang Jian
2005-04-05 10:25 ` jamal
2005-04-05 10:38 ` Thomas Graf
2005-04-05 11:25 ` Wang Jian
2005-04-05 12:16 ` Thomas Graf
2005-04-05 12:39 ` Wang Jian
2005-04-05 12:52 ` Thomas Graf
2005-04-05 13:29 ` Wang Jian
2005-04-05 12:54 ` jamal
2005-04-05 14:18 ` Wang Jian
2005-04-05 16:11 ` jamal
2005-04-06 6:45 ` Wang Jian
2005-04-06 12:16 ` jamal
2005-04-06 12:30 ` Thomas Graf
2005-04-06 13:01 ` Wang Jian
2005-04-06 13:34 ` jamal
2005-04-06 13:45 ` Thomas Graf
2005-04-06 14:10 ` Thomas Graf
2005-04-06 18:15 ` David S. Miller
2005-04-06 18:31 ` Thomas Graf
2005-04-07 0:55 ` [RFC] dynamic hash table size & xor hash function for cls_fw Thomas Graf
2005-04-07 10:38 ` jamal
2005-04-07 10:47 ` Wang Jian
2005-04-07 10:51 ` Thomas Graf
2005-04-07 11:07 ` jamal
2005-04-07 13:09 ` [PATCH] [PKT_SCHED]: improve hashing performance of cls_fw Thomas Graf
2005-04-07 13:31 ` Wang Jian
2005-04-07 13:52 ` Thomas Graf
2005-04-07 14:03 ` Wang Jian
2005-04-06 13:36 ` [PATCH] improvement on net/sched/cls_fw.c's hash function Eran Mann
2005-04-06 13:53 ` Wang Jian