* [PATCH] improvement on net/sched/cls_fw.c's hash function
@ 2005-04-05 5:35 Wang Jian
2005-04-05 5:37 ` David S. Miller
0 siblings, 1 reply; 33+ messages in thread
From: Wang Jian @ 2005-04-05 5:35 UTC (permalink / raw)
To: netdev
[-- Attachment #1: Type: text/plain, Size: 390 bytes --]
Hi,
This is a simple patch against net/sched/cls_fw.c. The idea of this
patch is discussed in this thread
https://lists.netfilter.org/pipermail/netfilter-devel/2005-March/018762.html
I chose 509 for FW_FILTER_HSIZE. If you feel it is a waste of memory, then
251 is good too.
BTW: I don't know much about hash performance or the hash distribution
of jhash. This is a quick fix.
--
lark
[-- Attachment #2: hash-cls_fw.diff --]
[-- Type: application/octet-stream, Size: 1029 bytes --]
Index: cls_fw.c
===================================================================
--- cls_fw.c (revision 1)
+++ cls_fw.c (working copy)
@@ -45,10 +45,13 @@
#include <net/sock.h>
#include <net/act_api.h>
#include <net/pkt_cls.h>
+#include <linux/jhash.h>
+#define FW_FILTER_HSIZE 509
+
struct fw_head
{
- struct fw_filter *ht[256];
+ struct fw_filter *ht[FW_FILTER_HSIZE];
};
struct fw_filter
@@ -69,7 +72,7 @@
static __inline__ int fw_hash(u32 handle)
{
- return handle&0xFF;
+ return (jhash_1word(handle, 0xF30A7129) % FW_FILTER_HSIZE);
}
static int fw_classify(struct sk_buff *skb, struct tcf_proto *tp,
@@ -152,7 +155,7 @@
if (head == NULL)
return;
- for (h=0; h<256; h++) {
+ for (h=0; h<FW_FILTER_HSIZE; h++) {
while ((f=head->ht[h]) != NULL) {
head->ht[h] = f->next;
fw_delete_filter(tp, f);
@@ -291,7 +294,7 @@
if (arg->stop)
return;
- for (h = 0; h < 256; h++) {
+ for (h = 0; h < FW_FILTER_HSIZE; h++) {
struct fw_filter *f;
for (f = head->ht[h]; f; f = f->next) {
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
  2005-04-05  5:35 [PATCH] improvement on net/sched/cls_fw.c's hash function Wang Jian
@ 2005-04-05  5:37 ` David S. Miller
  2005-04-05  6:05   ` Wang Jian
  0 siblings, 1 reply; 33+ messages in thread
From: David S. Miller @ 2005-04-05  5:37 UTC (permalink / raw)
To: Wang Jian; +Cc: netdev

On Tue, 05 Apr 2005 13:35:02 +0800
Wang Jian <lark@linux.net.cn> wrote:

> https://lists.netfilter.org/pipermail/netfilter-devel/2005-March/018762.html
>
> I chose 509 for FW_FILTER_HSIZE. If you feel it is a waste of memory, then
> 251 is good too.

Please use a power of two; the "%" is expensive on some CPUs.
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
  2005-04-05  5:37 ` David S. Miller
@ 2005-04-05  6:05   ` Wang Jian
  2005-04-05 10:25     ` jamal
  2005-04-05 10:38     ` Thomas Graf
  0 siblings, 2 replies; 33+ messages in thread
From: Wang Jian @ 2005-04-05  6:05 UTC (permalink / raw)
To: netdev

[-- Attachment #1: Type: text/plain, Size: 507 bytes --]

Hi David S. Miller,

New patch attached. The hash size is 256, the same as the old one.

On Mon, 4 Apr 2005 22:37:44 -0700, "David S. Miller" <davem@davemloft.net> wrote:

> On Tue, 05 Apr 2005 13:35:02 +0800
> Wang Jian <lark@linux.net.cn> wrote:
>
> > https://lists.netfilter.org/pipermail/netfilter-devel/2005-March/018762.html
> >
> > I chose 509 for FW_FILTER_HSIZE. If you feel it is a waste of memory, then
> > 251 is good too.
>
> Please use a power of two; the "%" is expensive on some CPUs.

--
lark

[-- Attachment #2: hash-cls_fw-2.diff --]
[-- Type: application/octet-stream, Size: 1104 bytes --]

Index: linux-2.6.11-w/net/sched/cls_fw.c
===================================================================
--- linux-2.6.11-w/net/sched/cls_fw.c (revision 1)
+++ linux-2.6.11-w/net/sched/cls_fw.c (working copy)
@@ -45,10 +45,13 @@
#include <net/sock.h>
#include <net/act_api.h>
#include <net/pkt_cls.h>
+#include <linux/jhash.h>
+#define FW_FILTER_HSIZE 256
+
struct fw_head
{
- struct fw_filter *ht[256];
+ struct fw_filter *ht[FW_FILTER_HSIZE];
};
struct fw_filter
@@ -69,7 +72,7 @@
static __inline__ int fw_hash(u32 handle)
{
- return handle&0xFF;
+ return (jhash_1word(handle, 0xF30A7129) % FW_FILTER_HSIZE);
}
static int fw_classify(struct sk_buff *skb, struct tcf_proto *tp,
@@ -152,7 +155,7 @@
if (head == NULL)
return;
- for (h=0; h<256; h++) {
+ for (h=0; h<FW_FILTER_HSIZE; h++) {
while ((f=head->ht[h]) != NULL) {
head->ht[h] = f->next;
fw_delete_filter(tp, f);
@@ -291,7 +294,7 @@
if (arg->stop)
return;
- for (h = 0; h < 256; h++) {
+ for (h = 0; h < FW_FILTER_HSIZE; h++) {
struct fw_filter *f;
for (f = head->ht[h]; f; f = f->next) {
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
  2005-04-05  6:05   ` Wang Jian
@ 2005-04-05 10:25     ` jamal
  2005-04-05 10:38     ` Thomas Graf
  1 sibling, 0 replies; 33+ messages in thread
From: jamal @ 2005-04-05 10:25 UTC (permalink / raw)
To: Wang Jian; +Cc: netdev

Wang,

I read that thread and I am a little confused. What is this change
supposed to improve?

cheers,
jamal

On Tue, 2005-04-05 at 02:05, Wang Jian wrote:
> Hi David S. Miller,
>
> New patch attached. The hash size is 256, the same as the old one.
>
> On Mon, 4 Apr 2005 22:37:44 -0700, "David S. Miller" <davem@davemloft.net> wrote:
>
> > On Tue, 05 Apr 2005 13:35:02 +0800
> > Wang Jian <lark@linux.net.cn> wrote:
> >
> > > https://lists.netfilter.org/pipermail/netfilter-devel/2005-March/018762.html
> > >
> > > I chose 509 for FW_FILTER_HSIZE. If you feel it is a waste of memory, then
> > > 251 is good too.
> >
> > Please use a power of two; the "%" is expensive on some CPUs.
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
  2005-04-05  6:05   ` Wang Jian
  2005-04-05 10:25     ` jamal
@ 2005-04-05 10:38     ` Thomas Graf
  2005-04-05 11:25       ` Wang Jian
  1 sibling, 1 reply; 33+ messages in thread
From: Thomas Graf @ 2005-04-05 10:38 UTC (permalink / raw)
To: Wang Jian; +Cc: netdev

* Wang Jian <20050405140342.024A.LARK@linux.net.cn> 2005-04-05 14:05
> New patch attached. The hash size is 256, the same as the old one.

Do you have any numbers that could show that this change actually
improves the hash distribution, and thus the overall lookup
performance?

The most often used, and thus most important, range of mark values is
definitely 0..255. I did not look into jhash, but the risk of
collisions definitely increases with this change, and it affects the
roughly 90% of fw users who could benefit from a collision-free hash
table so far.

I would appreciate it if you could provide some numbers proving both
the need for and the actual improvement of this change, since fwmark
is one of the most often used classifiers.

Cheers
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
  2005-04-05 10:38     ` Thomas Graf
@ 2005-04-05 11:25       ` Wang Jian
  2005-04-05 12:16         ` Thomas Graf
  0 siblings, 1 reply; 33+ messages in thread
From: Wang Jian @ 2005-04-05 11:25 UTC (permalink / raw)
To: Thomas Graf; +Cc: netdev, jamal

Hi Thomas Graf,

If you read the thread I pointed to, you know there is a chance that
nfmark is used as two 16-bit numbers (along with CONNMARK), where the
16-bit number can be mapped to a classid. This is one of many
possibilities. In that case, nfmark can be used like this:

0x00010000
0x00020000
0x00030000
...

0x00000001
0x00000002
0x00000003
...

The old hash function doesn't expect such a pattern.

I must admit that I am not very familiar with hash functions; I found
the problem and used a quick hack. My patch just points out the
existing risk. Anyone can improve on it with a faster and more evenly
distributed hash function.

Actually, for a hash size of 256, the second patch I sent can still be
improved:

return (jhash_1word(handle, 0xF30A7129) & 0xFF);

instead of

return (jhash_1word(handle, 0xF30A7129) % 256);

On Tue, 5 Apr 2005 12:38:27 +0200, Thomas Graf <tgraf@suug.ch> wrote:

> * Wang Jian <20050405140342.024A.LARK@linux.net.cn> 2005-04-05 14:05
> > New patch attached. The hash size is 256, the same as the old one.
>
> Do you have any numbers that could show that this change actually
> improves the hash distribution, and thus the overall lookup
> performance?
>
> The most often used, and thus most important, range of mark values is
> definitely 0..255. I did not look into jhash, but the risk of
> collisions definitely increases with this change, and it affects the
> roughly 90% of fw users who could benefit from a collision-free hash
> table so far.
>
> I would appreciate it if you could provide some numbers proving both
> the need for and the actual improvement of this change, since fwmark
> is one of the most often used classifiers.
>
> Cheers

--
lark
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
  2005-04-05 11:25       ` Wang Jian
@ 2005-04-05 12:16         ` Thomas Graf
  2005-04-05 12:39           ` Wang Jian
  0 siblings, 1 reply; 33+ messages in thread
From: Thomas Graf @ 2005-04-05 12:16 UTC (permalink / raw)
To: Wang Jian; +Cc: netdev, jamal

* Wang Jian <20050405190024.024D.LARK@linux.net.cn> 2005-04-05 19:25
> If you read the thread I pointed to, you know there is a chance that
> nfmark is used as two 16-bit numbers (along with CONNMARK), where the
> 16-bit number can be mapped to a classid. This is one of many
> possibilities. In that case, nfmark can be used like this:
>
> 0x00010000
> 0x00020000
> 0x00030000
> ...
>
> 0x00000001
> 0x00000002
> 0x00000003
> ...
>
> The old hash function doesn't expect such a pattern.

I'm aware of the problem you're facing: if the lower 8 bits are set to
0 for a large number of flows, all of those flows end up chained in
the first hash bucket.

> I must admit that I am not very familiar with hash functions; I found
> the problem and used a quick hack. My patch just points out the
> existing risk. Anyone can improve on it with a faster and more evenly
> distributed hash function.

I can't really give you feedback on this since I don't have the
background for it. Theoretically a prime hash size would distribute
better, but that is a poor fit for slab efficiency.

What I'm worried about is that we lose the zero-collision behaviour
for the most popular use case.

New idea: we make this configurable and allow 3 types of hash
functions:
1) default as-is, perfect for marks 0..255
2) all bits taken into account (your patch)
3) bitmask + shift provided by the user, just like dsmark

Thoughts?
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
  2005-04-05 12:16         ` Thomas Graf
@ 2005-04-05 12:39           ` Wang Jian
  2005-04-05 12:52             ` Thomas Graf
  2005-04-05 12:54             ` jamal
  0 siblings, 2 replies; 33+ messages in thread
From: Wang Jian @ 2005-04-05 12:39 UTC (permalink / raw)
To: Thomas Graf; +Cc: netdev, jamal

Hi Thomas Graf,

On Tue, 5 Apr 2005 14:16:05 +0200, Thomas Graf <tgraf@suug.ch> wrote:

> What I'm worried about is that we lose the zero-collision behaviour
> for the most popular use case.

If a web interface is used to generate netfilter/tc rules that use
nfmark, that assumption no longer holds: nfmark will be assigned
incrementally and wrap back to 0 at some point, like a process id, so
zero collisions become unlikely.

Once Linux's QoS capabilities are widely used, such web interfaces
will appear sooner or later.

> New idea: we make this configurable and allow 3 types of hash
> functions:
> 1) default as-is, perfect for marks 0..255
> 2) all bits taken into account (your patch)
> 3) bitmask + shift provided by the user, just like dsmark
>
> Thoughts?

Your suggestion is well worth considering, but it needs some more
work. And isn't it somewhat bloated?

--
lark
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
  2005-04-05 12:39           ` Wang Jian
@ 2005-04-05 12:52             ` Thomas Graf
  2005-04-05 13:29               ` Wang Jian
  2005-04-05 12:54             ` jamal
  1 sibling, 1 reply; 33+ messages in thread
From: Thomas Graf @ 2005-04-05 12:52 UTC (permalink / raw)
To: Wang Jian; +Cc: netdev, jamal

* Wang Jian <20050405202039.0250.LARK@linux.net.cn> 2005-04-05 20:39
> On Tue, 5 Apr 2005 14:16:05 +0200, Thomas Graf <tgraf@suug.ch> wrote:
> > What I'm worried about is that we lose the zero-collision behaviour
> > for the most popular use case.
>
> If a web interface is used to generate netfilter/tc rules that use
> nfmark, that assumption no longer holds: nfmark will be assigned
> incrementally and wrap back to 0 at some point, like a process id, so
> zero collisions become unlikely.

I did not claim that the assumption holds in all cases, but the most
common use of cls_fw is static marks set by netfilter to values in
0..255.

> Once Linux's QoS capabilities are widely used, such web interfaces
> will appear sooner or later.

That might be true, but I will never ack something that makes
zero-collision use of cls_fw impossible. I'm all for improving this,
just not at the cost of reduced performance for the most obvious use
case of cls_fw.

> Your suggestion is well worth considering, but it needs some more
> work. And isn't it somewhat bloated?

The shift + bitmask variant might be bloated and can be deferred until
someone comes up with a need for it. I can cook up a patch for this if
you want; it's not much work.
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
  2005-04-05 12:52             ` Thomas Graf
@ 2005-04-05 13:29               ` Wang Jian
  0 siblings, 0 replies; 33+ messages in thread
From: Wang Jian @ 2005-04-05 13:29 UTC (permalink / raw)
To: Thomas Graf; +Cc: netdev, jamal

Hi Thomas Graf,

On Tue, 5 Apr 2005 14:52:37 +0200, Thomas Graf <tgraf@suug.ch> wrote:

> That might be true, but I will never ack something that makes
> zero-collision use of cls_fw impossible. I'm all for improving this,
> just not at the cost of reduced performance for the most obvious use
> case of cls_fw.

Agreed.

> The shift + bitmask variant might be bloated and can be deferred
> until someone comes up with a need for it. I can cook up a patch for
> this if you want; it's not much work.

I am glad you are willing to hook this up. I really need the new
hashing scheme.

--
lark
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
  2005-04-05 12:39           ` Wang Jian
  2005-04-05 12:52             ` Thomas Graf
@ 2005-04-05 12:54             ` jamal
  2005-04-05 14:18               ` Wang Jian
  1 sibling, 1 reply; 33+ messages in thread
From: jamal @ 2005-04-05 12:54 UTC (permalink / raw)
To: Wang Jian; +Cc: Thomas Graf, netdev

On Tue, 2005-04-05 at 08:39, Wang Jian wrote:

> If a web interface is used to generate netfilter/tc rules that use
> nfmark, that assumption no longer holds: nfmark will be assigned
> incrementally and wrap back to 0 at some point, like a process id, so
> zero collisions become unlikely.

Yes, but the distribution is still very good even in that case. If you
have 257 entries, all except two will be in separate buckets.

> > New idea: we make this configurable and allow 3 types of hash
> > functions:
> > 1) default as-is, perfect for marks 0..255
> > 2) all bits taken into account (your patch)
> > 3) bitmask + shift provided by the user, just like dsmark
>
> Your suggestion is well worth considering, but it needs some more
> work. And isn't it somewhat bloated?

Why don't you run a quick test? It is very easy to do in user space.
Enter two sets of values using the two different approaches: yours,
and the current way tc uses nfmark (incremental). Then apply the
jenkins approach you had and see how the result looks. I think we know
how it will look with the current hash, but if you can show it's not
so bad in the jenkins case as well, it may be an acceptable approach.

cheers,
jamal
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
  2005-04-05 12:54             ` jamal
@ 2005-04-05 14:18               ` Wang Jian
  2005-04-05 16:11                 ` jamal
  0 siblings, 1 reply; 33+ messages in thread
From: Wang Jian @ 2005-04-05 14:18 UTC (permalink / raw)
To: hadi; +Cc: Thomas Graf, netdev

Hi jamal,

On 05 Apr 2005 08:54:49 -0400, jamal <hadi@cyberus.ca> wrote:

> Why don't you run a quick test? It is very easy to do in user space.
> Enter two sets of values using the two different approaches: yours,
> and the current way tc uses nfmark (incremental). Then apply the
> jenkins approach you had and see how the result looks. I think we
> know how it will look with the current hash, but if you can show it's
> not so bad in the jenkins case as well, it may be an acceptable
> approach.

I am not saying that we must use jenkins. We may use a hash function
less expensive than jenkins, but better than & 0xFF.

Anyway, I have done a userspace timing test for jhash. The following
test was done on an AMD Athlon 800MHz with no other CPU load.

-- snip jhash_test.c --
typedef unsigned long u32;
typedef unsigned char u8;

#include <linux/jhash.h>
#include <stdlib.h>

int main(void)
{
	u32 i;
	u32 h;

	for (i = 0; i < 10000000; i++) {
		h = jhash_1word(i, 0xF30A7129) & 0xFFL;
		// printf("h is %u\n", h);
	}
	return 0;
}
-- snip --

[root@qos ~]# gcc jhash_test.c
[root@qos ~]# time ./a.out
0.77user 0.00system 0:00.77elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+81minor)pagefaults 0swaps

-- snip simple_hash.c --
typedef unsigned long u32;
typedef unsigned char u8;

#include <stdlib.h>

int main(void)
{
	u32 i;
	u32 h;

	for (i = 0; i < 10000000; i++) {
		h = i & 0xFF;
		// printf("h is %u\n", h);
	}
	return 0;
}
-- snip --

[root@qos ~]# gcc simple_hash.c
[root@qos ~]# time ./a.out
0.02user 0.00system 0:00.02elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+81minor)pagefaults 0swaps

The simple method is far better in performance.

As an extreme case, 100Mbps ethernet carries about 148800 pps of
minimum-size frames. Replacing 10000000 with 150000:

[root@qos ~]# time ./a.out
0.01user 0.00system 0:00.01elapsed 83%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+81minor)pagefaults 0swaps

So using jhash is no big deal at 100Mbps. For 1000Mbps ethernet,
replace 10000000 with 1489000:

[root@qos ~]# time ./a.out
0.11user 0.00system 0:00.11elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+81minor)pagefaults 0swaps

A faster CPU would be expected for GE, for example a 2.4GHz CPU, so
0.11 / (2.4/0.8) = 0.04s. This is still not a big problem for a
dedicated Linux box doing QoS control; we know 500Mbps is already a
bottleneck there.

--
lark
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
  2005-04-05 14:18               ` Wang Jian
@ 2005-04-05 16:11                 ` jamal
  2005-04-06  6:45                   ` Wang Jian
  0 siblings, 1 reply; 33+ messages in thread
From: jamal @ 2005-04-05 16:11 UTC (permalink / raw)
To: Wang Jian; +Cc: Thomas Graf, netdev

Hi Wang,

On Tue, 2005-04-05 at 10:18, Wang Jian wrote:

> I am not saying that we must use jenkins. We may use a hash function
> less expensive than jenkins, but better than & 0xFF.

Sure; the point is that it must not destroy the common use in place.

> Anyway, I have done a userspace timing test for jhash. The following
> test was done on an AMD Athlon 800MHz with no other CPU load.

No, the test I was asking for is one that shows the distribution of
the hash, not how long it takes (which is also an ok test). That is:
if you fed the jenkins hash with 256 buckets, say, 1024 samples of the
data you showed earlier for how fwmark looks, how well would the
result look? And what if you fed it something like 1024 incremental
fwmarks, say 1..1024?

cheers,
jamal
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
  2005-04-05 16:11                 ` jamal
@ 2005-04-06  6:45                   ` Wang Jian
  2005-04-06 12:16                     ` jamal
  2005-04-06 12:30                     ` Thomas Graf
  0 siblings, 2 replies; 33+ messages in thread
From: Wang Jian @ 2005-04-06  6:45 UTC (permalink / raw)
To: hadi; +Cc: Thomas Graf, netdev

[-- Attachment #1: Type: text/plain, Size: 1502 bytes --]

Hi jamal,

On 05 Apr 2005 12:11:35 -0400, jamal <hadi@cyberus.ca> wrote:

> On Tue, 2005-04-05 at 10:18, Wang Jian wrote:
>
> > I am not saying that we must use jenkins. We may use a hash
> > function less expensive than jenkins, but better than & 0xFF.
>
> Sure; the point is that it must not destroy the common use in place.
>
> > Anyway, I have done a userspace timing test for jhash. The
> > following test was done on an AMD Athlon 800MHz with no other CPU
> > load.
>
> No, the test I was asking for is one that shows the distribution of
> the hash, not how long it takes (which is also an ok test).

Sorry, the timing test was not run on the AMD Athlon 800MHz but on a
2.4GHz Pentium 4! I didn't notice that, so jhash's performance is not
as fast as it looked. On the AMD 800MHz:

line speed @ 1000Mbps

[root@home ~]# time ./a.out
0.31user 0.00system 0:00.32elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+83minor)pagefaults 0swaps

line speed @ 100Mbps

[root@home ~]# time ./a.out
0.03user 0.00system 0:00.03elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+83minor)pagefaults 0swaps

> i.e. if you fed the jenkins hash with 256 buckets, say, 1024 samples
> of the data you showed earlier for how fwmark looks, how well would
> the result look? And what if you fed it something like 1024
> incremental fwmarks, say 1..1024?

The test result does not look good. See the attached file.

So let's find a better way.
-- lark [-- Attachment #2: jhash-dist.txt --] [-- Type: application/octet-stream, Size: 5499 bytes --] $VAR1 = { '32' => 1, '127' => 3, '90' => 6, '118' => 3, '206' => 2, '71' => 2, '102' => 7, '200' => 4, '18' => 7, '125' => 4, '16' => 8, '44' => 2, '55' => 4, '84' => 6, '27' => 4, '190' => 5, '233' => 4, '161' => 9, '57' => 2, '194' => 3, '95' => 5, '220' => 2, '20' => 2, '163' => 2, '231' => 1, '109' => 4, '243' => 3, '151' => 3, '89' => 3, '175' => 3, '148' => 7, '31' => 2, '35' => 2, '11' => 5, '208' => 5, '78' => 4, '93' => 2, '106' => 1, '157' => 5, '65' => 3, '29' => 5, '197' => 7, '203' => 2, '138' => 1, '114' => 4, '199' => 6, '226' => 2, '58' => 7, '15' => 3, '153' => 1, '211' => 5, '137' => 1, '81' => 5, '60' => 3, '73' => 2, '101' => 5, '76' => 5, '86' => 6, '62' => 6, '247' => 2, '67' => 4, '204' => 3, '241' => 3, '165' => 7, '139' => 6, '198' => 6, '129' => 7, '2' => 3, '17' => 5, '186' => 4, '82' => 6, '110' => 2, '147' => 9, '228' => 2, '236' => 4, '249' => 3, '202' => 6, '168' => 3, '218' => 1, '14' => 1, '135' => 3, '184' => 5, '69' => 5, '112' => 7, '172' => 3, '145' => 4, '49' => 4, '178' => 3, '24' => 5, '224' => 3, '140' => 7, '187' => 4, '223' => 5, '124' => 1, '104' => 3, '131' => 5, '181' => 2, '234' => 4, '79' => 3, '121' => 4, '212' => 3, '154' => 6, '0' => 5, '23' => 1, '96' => 7, '126' => 3, '238' => 3, '159' => 6, '251' => 3, '253' => 2, '160' => 6, '176' => 2, '47' => 7, '8' => 6, '98' => 4, '209' => 4, '216' => 4, '37' => 4, '117' => 2, '43' => 4, '195' => 7, '5' => 4, '170' => 4, '33' => 5, '63' => 5, '21' => 6, '7' => 5, '227' => 3, '80' => 4, '26' => 4, '193' => 1, '119' => 3, '99' => 1, '180' => 3, '244' => 4, '162' => 2, '72' => 4, '179' => 3, '255' => 3, '246' => 3, '74' => 4, '240' => 5, '182' => 2, '61' => 5, '230' => 4, '108' => 2, '115' => 6, '92' => 5, '103' => 5, '201' => 2, '232' => 4, '10' => 3, '113' => 6, '152' => 2, '189' => 6, '225' => 3, '207' => 2, '142' => 5, '91' => 1, '167' => 5, '48' => 6, '107' => 4, '87' => 
3, '77' => 2, '174' => 6, '214' => 4, '133' => 4, '149' => 3, '123' => 4, '221' => 2, '50' => 7, '39' => 1, '210' => 2, '64' => 2, '97' => 5, '41' => 4, '12' => 4, '52' => 4, '173' => 1, '56' => 3, '229' => 2, '66' => 4, '45' => 6, '19' => 4, '54' => 4, '237' => 2, '70' => 4, '188' => 4, '68' => 4, '166' => 2, '1' => 2, '136' => 6, '116' => 8, '88' => 2, '30' => 2, '141' => 4, '144' => 5, '100' => 4, '222' => 6, '128' => 3, '25' => 5, '252' => 3, '28' => 5, '120' => 2, '75' => 2, '83' => 5, '40' => 2, '134' => 5, '156' => 5, '192' => 6, '250' => 8, '59' => 7, '215' => 4, '254' => 3, '177' => 2, '150' => 3, '130' => 3, '155' => 4, '53' => 4, '217' => 6, '245' => 4, '239' => 4, '122' => 1, '143' => 6, '205' => 2, '42' => 3, '158' => 5, '22' => 4, '46' => 9, '219' => 5, '13' => 2, '105' => 3, '235' => 9, '6' => 1, '85' => 4, '185' => 5, '3' => 5, '36' => 5, '183' => 6, '94' => 7, '213' => 5, '248' => 5, '146' => 4, '111' => 5, '9' => 6, '38' => 5, '4' => 6, '34' => 9, '132' => 3, '169' => 7, '164' => 6, '196' => 5, '171' => 4, '242' => 5 };
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
  2005-04-06  6:45                   ` Wang Jian
@ 2005-04-06 12:16                     ` jamal
  0 siblings, 0 replies; 33+ messages in thread
From: jamal @ 2005-04-06 12:16 UTC (permalink / raw)
To: Wang Jian; +Cc: Thomas Graf, netdev

On Wed, 2005-04-06 at 02:45, Wang Jian wrote:

> The test result does not look good. See the attached file.

Yes, that doesn't look too good.

> So let's find a better way.

Thomas?

cheers,
jamal
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
  2005-04-06  6:45                   ` Wang Jian
  2005-04-06 12:16                     ` jamal
@ 2005-04-06 12:30                     ` Thomas Graf
  2005-04-06 13:01                       ` Wang Jian
  ` (2 more replies)
  1 sibling, 3 replies; 33+ messages in thread
From: Thomas Graf @ 2005-04-06 12:30 UTC (permalink / raw)
To: Wang Jian; +Cc: hadi, netdev

* Wang Jian <20050406143842.026B.LARK@linux.net.cn> 2005-04-06 14:45
> On 05 Apr 2005 12:11:35 -0400, jamal <hadi@cyberus.ca> wrote:
> > i.e. if you fed the jenkins hash with 256 buckets, say, 1024
> > samples of the data you showed earlier for how fwmark looks, how
> > well would the result look? And what if you fed it something like
> > 1024 incremental fwmarks, say 1..1024?
>
> The test result does not look good. See the attached file.
>
> So let's find a better way.

We need to provide some kind of option so the user can specify his
needs. The & 0xFF hash suits most users just fine, but it has one
essential drawback: no distribution happens at all if the lower 8 bits
are set to 0. For static marks this is no issue, and even for
enumerated marks it takes quite some time to grow into an area where
it starts hurting. The problem, obviously, is when someone splits the
mark field into 2 parts and uses the upper 16 bits for some special
purpose, just as you did. In such a case it would make sense to either
take all bits into account or let the user specify a bitmask + shift.

So here is the same idea I posted before, revised. Let the user
specify one of the hash functions via a new TLV:
- default: & 0xFF
- ((mark & mask) >> shift) & 0xFF
- jenkins for 16, 32, and 64 bits
- FNV for 16, 32, and 64 bits

Why variations for type sizes? The chance of collisions is reduced a
lot if the user knows he'll never use more than 16 bits but 255 marks
are not enough.

I'm cooking up a patch for this today, together with a fix to allow
64-bit values for the mark.
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
  2005-04-06 12:30                     ` Thomas Graf
@ 2005-04-06 13:01                       ` Wang Jian
  0 siblings, 0 replies; 33+ messages in thread
From: Wang Jian @ 2005-04-06 13:01 UTC (permalink / raw)
To: Thomas Graf; +Cc: hadi, netdev

Hi Thomas Graf,

On Wed, 6 Apr 2005 14:30:36 +0200, Thomas Graf <tgraf@suug.ch> wrote:

> So here is the same idea I posted before, revised. Let the user
> specify one of the hash functions via a new TLV:
> - default: & 0xFF
> - ((mark & mask) >> shift) & 0xFF
> - jenkins for 16, 32, and 64 bits
> - FNV for 16, 32, and 64 bits
>
> Why variations for type sizes? The chance of collisions is reduced a
> lot if the user knows he'll never use more than 16 bits but 255 marks
> are not enough.
>
> I'm cooking up a patch for this today, together with a fix to allow
> 64-bit values for the mark.

Given that no one-hash-fits-all solution exists, I think your solution
is quite elegant.

--
lark
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
  2005-04-06 12:30                     ` Thomas Graf
  2005-04-06 13:01                       ` Wang Jian
@ 2005-04-06 13:34                       ` jamal
  2005-04-06 13:45                         ` Thomas Graf
  2005-04-06 13:36                       ` [PATCH] improvement on net/sched/cls_fw.c's hash function Eran Mann
  2 siblings, 1 reply; 33+ messages in thread
From: jamal @ 2005-04-06 13:34 UTC (permalink / raw)
To: Thomas Graf; +Cc: Wang Jian, netdev

On Wed, 2005-04-06 at 08:30, Thomas Graf wrote:

> Let the user specify one of the hash functions via a new TLV:
> - default: & 0xFF
> - ((mark & mask) >> shift) & 0xFF
> - jenkins for 16, 32, and 64 bits
> - FNV for 16, 32, and 64 bits

When does the user specify this? I would think you need to set it only
once per bootup; after that it will be quite complex to reset. It
could be done, but it sounds complex and unnecessary. i.e. it may be a
boot or module parameter more than a netlink-controlled value, no?

cheers,
jamal
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
  2005-04-06 13:34                       ` jamal
@ 2005-04-06 13:45                         ` Thomas Graf
  2005-04-06 14:10                           ` Thomas Graf
  0 siblings, 1 reply; 33+ messages in thread
From: Thomas Graf @ 2005-04-06 13:45 UTC (permalink / raw)
To: jamal; +Cc: Wang Jian, netdev

* jamal <1112794459.1096.61.camel@jzny.localdomain> 2005-04-06 09:34
> On Wed, 2005-04-06 at 08:30, Thomas Graf wrote:
>
> > Let the user specify one of the hash functions via a new TLV:
> > - default: & 0xFF
> > - ((mark & mask) >> shift) & 0xFF
> > - jenkins for 16, 32, and 64 bits
> > - FNV for 16, 32, and 64 bits
>
> When does the user specify this? I would think you need to set it
> only once per bootup; after that it will be quite complex to reset.
> It could be done, but it sounds complex and unnecessary. i.e. it may
> be a boot or module parameter more than a netlink-controlled value,
> no?

I'm not 100% sure about this yet, but I think during fw_change, so we
can have different hashes for different qdisc filter chains.
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
2005-04-06 13:45 ` Thomas Graf
@ 2005-04-06 14:10 ` Thomas Graf
2005-04-06 18:15 ` David S. Miller
0 siblings, 1 reply; 33+ messages in thread
From: Thomas Graf @ 2005-04-06 14:10 UTC (permalink / raw)
To: jamal; +Cc: Wang Jian, netdev

* Thomas Graf <20050406134502.GP26731@postel.suug.ch> 2005-04-06 15:45
> * jamal <1112794459.1096.61.camel@jzny.localdomain> 2005-04-06 09:34
> > When does the user specify this? i would think you need to set it once
> > only per bootup. After that it will be quiet complex to reset. It could
> > be done but sounds complex and unnecessary. i.e it may be a boot or
> > module parameter more than it is a netlink controlled value, no?
>
> I'm not 100% sure about this yet but I think during fw_change so
> we can have different hashes for different qdiscs filter chains.

The thing I'm worrying about is that I don't want to break the perfect
alignment of fw_head to good slab object sizes, but I guess there is
no way around it. I'd really like to make the hash size and the hash
function configurable. For example, a hash size of 1024 would perform
much better and would still fit into a single page on most systems.

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
2005-04-06 14:10 ` Thomas Graf
@ 2005-04-06 18:15 ` David S. Miller
2005-04-06 18:31 ` Thomas Graf
0 siblings, 1 reply; 33+ messages in thread
From: David S. Miller @ 2005-04-06 18:15 UTC (permalink / raw)
To: Thomas Graf; +Cc: hadi, lark, netdev

On Wed, 6 Apr 2005 16:10:20 +0200
Thomas Graf <tgraf@suug.ch> wrote:

> The thing I'm worrying about is that I don't want to break the perfect
> alignment of fw_head to good slab obj sizes but I guess there is no
> way around. I'd really like to make hash size and hash function
> configureable. For example a hash size of 1024 would perform much
> better and would still fit into a single page on most systems.

I think a hash xor'ing the high bits into the low 8 bits, as has been
suggested a few times already, meets your criteria and solves Lark's
problem.

The hash table size, if still an issue, can be dynamically sized based
upon some criteria, with some reasonable default initial selection
(such as the current 256).

I think dynamic hash function selection is madness. You have a key
that needs to be hashed, you are aware of one very common usage (the
one that currently hashes perfectly), and you are also aware of a
method by which the cases that don't fit into that mold can be made
to perform acceptably too (xor'ing in the upper bits). We can thus do
this with one single hash function.

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
2005-04-06 18:15 ` David S. Miller
@ 2005-04-06 18:31 ` Thomas Graf
2005-04-07 0:55 ` [RFC] dynamic hash table size & xor hash function for cls_fw Thomas Graf
0 siblings, 1 reply; 33+ messages in thread
From: Thomas Graf @ 2005-04-06 18:31 UTC (permalink / raw)
To: David S. Miller; +Cc: hadi, lark, netdev

* David S. Miller <20050406111509.0462abcf.davem@davemloft.net> 2005-04-06 11:15
> On Wed, 6 Apr 2005 16:10:20 +0200
> Thomas Graf <tgraf@suug.ch> wrote:
>
> > The thing I'm worrying about is that I don't want to break the perfect
> > alignment of fw_head to good slab obj sizes but I guess there is no
> > way around. I'd really like to make hash size and hash function
> > configureable. For example a hash size of 1024 would perform much
> > better and would still fit into a single page on most systems.
>
> I think a hash xor'ing in the high bits into the low 8 bits, as has
> been suggested a few times already, meets your criteria and solves
> Lark's problem.

Yes, it sounds pretty good. I can't find any scenario where it would
perform unacceptably; it's not perfect, but fair enough for everyone I
guess.

> The hash table size, if still an issue, can be dynamically sized based
> upon some criteria with some reasonable default initial selection
> (such as the current 256).

I guess sometimes it would make a lot of sense to spend a few dozen
pages on the hashtable, but it's not worth playing games with the
perfectly aligned fw_head just in case. PAGE_SIZE/address_size would
make sense, though. If someone needs something bigger, it can be
changed in the source.

^ permalink raw reply	[flat|nested] 33+ messages in thread
* [RFC] dynamic hash table size & xor hash function for cls_fw
2005-04-06 18:31 ` Thomas Graf
@ 2005-04-07 0:55 ` Thomas Graf
2005-04-07 10:38 ` jamal
0 siblings, 1 reply; 33+ messages in thread
From: Thomas Graf @ 2005-04-07 0:55 UTC (permalink / raw)
To: David S. Miller; +Cc: hadi, lark, netdev

* Thomas Graf <20050406183134.GR26731@postel.suug.ch> 2005-04-06 20:31
> Yes, it sounds pretty good. I can't find any scenario where it
> would perform unacceptable, it's not perfect but fair enough
> for everyone I guess.

Changes the hashtable size to PAGE_SIZE/sizeof(unsigned long) and
adapts the hashing algorithm to xor various bit fields in order to
maintain collision-free hashing for the most common use case while
providing some kind of distribution for rarer cases.

The hashing function looks really ugly (it is), but gcc will optimize
away all the unneeded branches, which makes it short again. The case
for the 10-bit hash table ignores the two most significant bits; I
don't think this has any impact, they would rather be wasted cycles.

It looks inefficient, but the manual shifting & xoring only takes one
instruction more than the HTSIZE == 256 case on my x86 box, so the
increased hashing area should be worth it.

Thoughts?
===== net/sched/cls_fw.c 1.21 vs edited =====
--- 1.21/net/sched/cls_fw.c	2005-03-29 02:45:46 +02:00
+++ edited/net/sched/cls_fw.c	2005-04-07 02:41:46 +02:00
@@ -46,9 +46,11 @@
 #include <net/act_api.h>
 #include <net/pkt_cls.h>
 
+#define HTSIZE (PAGE_SIZE/sizeof(struct fw_filter *))
+
 struct fw_head
 {
-	struct fw_filter *ht[256];
+	struct fw_filter *ht[HTSIZE];
 };
 
 struct fw_filter
@@ -69,7 +71,28 @@
 
 static __inline__ int fw_hash(u32 handle)
 {
-	return handle&0xFF;
+	if (HTSIZE == 4096)
+		return ((handle >> 24) & 0xFFF) ^
+		       ((handle >> 12) & 0xFFF) ^
+		       (handle & 0xFFF);
+	else if (HTSIZE == 2048)
+		return ((handle >> 22) & 0x7FF) ^
+		       ((handle >> 11) & 0x7FF) ^
+		       (handle & 0x7FF);
+	else if (HTSIZE == 1024)
+		return ((handle >> 20) & 0x3FF) ^
+		       ((handle >> 10) & 0x3FF) ^
+		       (handle & 0x3FF);
+	else if (HTSIZE == 512)
+		return (handle >> 27) ^
+		       ((handle >> 18) & 0x1FF) ^
+		       ((handle >> 9) & 0x1FF) ^
+		       (handle & 0x1FF);
+	else if (HTSIZE == 256) {
+		u8 *t = (u8 *) &handle;
+		return t[0] ^ t[1] ^ t[2] ^ t[3];
+	} else
+		return handle & (HTSIZE - 1);
 }
 
 static int fw_classify(struct sk_buff *skb, struct tcf_proto *tp,
@@ -152,7 +175,7 @@
 	if (head == NULL)
 		return;
 
-	for (h=0; h<256; h++) {
+	for (h=0; h<HTSIZE; h++) {
 		while ((f=head->ht[h]) != NULL) {
 			head->ht[h] = f->next;
 			fw_delete_filter(tp, f);
@@ -291,7 +314,7 @@
 	if (arg->stop)
 		return;
 
-	for (h = 0; h < 256; h++) {
+	for (h = 0; h < HTSIZE; h++) {
 		struct fw_filter *f;
 
 		for (f = head->ht[h]; f; f = f->next) {

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: [RFC] dynamic hash table size & xor hash function for cls_fw
2005-04-07 0:55 ` [RFC] dynamic hash table size & xor hash function for cls_fw Thomas Graf
@ 2005-04-07 10:38 ` jamal
2005-04-07 10:47 ` Wang Jian
2005-04-07 10:51 ` Thomas Graf
0 siblings, 2 replies; 33+ messages in thread
From: jamal @ 2005-04-07 10:38 UTC (permalink / raw)
To: Thomas Graf; +Cc: David S. Miller, lark, netdev

On Wed, 2005-04-06 at 20:55, Thomas Graf wrote:

>  static __inline__ int fw_hash(u32 handle)
>  {
> -	return handle&0xFF;
> +	if (HTSIZE == 4096)
> +		return ((handle >> 24) & 0xFFF) ^
> +		       ((handle >> 12) & 0xFFF) ^
> +		       (handle & 0xFFF);
> +	else if (HTSIZE == 2048)
> +		return ((handle >> 22) & 0x7FF) ^
> +		       ((handle >> 11) & 0x7FF) ^
> +		       (handle & 0x7FF);
> +	else if (HTSIZE == 1024)
> +		return ((handle >> 20) & 0x3FF) ^
> +		       ((handle >> 10) & 0x3FF) ^
> +		       (handle & 0x3FF);
> +	else if (HTSIZE == 512)
> +		return (handle >> 27) ^
> +		       ((handle >> 18) & 0x1FF) ^
> +		       ((handle >> 9) & 0x1FF) ^
> +		       (handle & 0x1FF);
> +	else if (HTSIZE == 256) {
> +		u8 *t = (u8 *) &handle;
> +		return t[0] ^ t[1] ^ t[2] ^ t[3];
> +	} else
> +		return handle & (HTSIZE - 1);
>  }

Does HTSIZE change at runtime? How does migrating from one to the other
take place? Also, why not have a function pointer with a series of
these as separate functions instead of doing the if checks? BTW, it
does seem any one of those hashes may be sufficient, no?

cheers,
jamal

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: [RFC] dynamic hash table size & xor hash function for cls_fw
2005-04-07 10:38 ` jamal
@ 2005-04-07 10:47 ` Wang Jian
2005-04-07 10:51 ` Thomas Graf
1 sibling, 0 replies; 33+ messages in thread
From: Wang Jian @ 2005-04-07 10:47 UTC (permalink / raw)
To: hadi; +Cc: Thomas Graf, David S. Miller, netdev

Hi jamal,

I think Thomas decided to support only one hash function, selected at
compile time, with no switching at runtime. HTSIZE is a constant, so
the if branches will be optimized away by gcc at compile time; only
one hash is left.

On 07 Apr 2005 06:38:27 -0400, jamal <hadi@cyberus.ca> wrote:

> On Wed, 2005-04-06 at 20:55, Thomas Graf wrote:
>
> >  static __inline__ int fw_hash(u32 handle)
> >  {
> > -	return handle&0xFF;
> > +	if (HTSIZE == 4096)
> > +		return ((handle >> 24) & 0xFFF) ^
> > +		       ((handle >> 12) & 0xFFF) ^
> > +		       (handle & 0xFFF);
> > +	else if (HTSIZE == 2048)
> > +		return ((handle >> 22) & 0x7FF) ^
> > +		       ((handle >> 11) & 0x7FF) ^
> > +		       (handle & 0x7FF);
> > +	else if (HTSIZE == 1024)
> > +		return ((handle >> 20) & 0x3FF) ^
> > +		       ((handle >> 10) & 0x3FF) ^
> > +		       (handle & 0x3FF);
> > +	else if (HTSIZE == 512)
> > +		return (handle >> 27) ^
> > +		       ((handle >> 18) & 0x1FF) ^
> > +		       ((handle >> 9) & 0x1FF) ^
> > +		       (handle & 0x1FF);
> > +	else if (HTSIZE == 256) {
> > +		u8 *t = (u8 *) &handle;
> > +		return t[0] ^ t[1] ^ t[2] ^ t[3];
> > +	} else
> > +		return handle & (HTSIZE - 1);
> >  }
>
> Does HTSIZE change at runtime? How does migrating from one to other take
> place?
> Also why not have a function pointer with a series of these being
> separate instead of doing the if checks? BTW it does seem any one of
> those hashes maybe sufficient, no?
>
> cheers,
> jamal

-- 
lark

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: [RFC] dynamic hash table size & xor hash function for cls_fw
2005-04-07 10:38 ` jamal
2005-04-07 10:47 ` Wang Jian
@ 2005-04-07 10:51 ` Thomas Graf
2005-04-07 11:07 ` jamal
1 sibling, 1 reply; 33+ messages in thread
From: Thomas Graf @ 2005-04-07 10:51 UTC (permalink / raw)
To: jamal; +Cc: David S. Miller, lark, netdev

* jamal <1112870307.1118.91.camel@jzny.localdomain> 2005-04-07 06:38
> On Wed, 2005-04-06 at 20:55, Thomas Graf wrote:
>
> >  static __inline__ int fw_hash(u32 handle)
> >  {
> > -	return handle&0xFF;
> > +	if (HTSIZE == 4096)
> > +		return ((handle >> 24) & 0xFFF) ^
> > +		       ((handle >> 12) & 0xFFF) ^
> > +		       (handle & 0xFFF);
> > +	else if (HTSIZE == 2048)
> > +		return ((handle >> 22) & 0x7FF) ^
> > +		       ((handle >> 11) & 0x7FF) ^
> > +		       (handle & 0x7FF);
> > +	else if (HTSIZE == 1024)
> > +		return ((handle >> 20) & 0x3FF) ^
> > +		       ((handle >> 10) & 0x3FF) ^
> > +		       (handle & 0x3FF);
> > +	else if (HTSIZE == 512)
> > +		return (handle >> 27) ^
> > +		       ((handle >> 18) & 0x1FF) ^
> > +		       ((handle >> 9) & 0x1FF) ^
> > +		       (handle & 0x1FF);
> > +	else if (HTSIZE == 256) {
> > +		u8 *t = (u8 *) &handle;
> > +		return t[0] ^ t[1] ^ t[2] ^ t[3];
> > +	} else
> > +		return handle & (HTSIZE - 1);
> >  }
>
> Does HTSIZE change at runtime? How does migrating from one to other take
> place?

No, the size is static (PAGE_SIZE/sizeof(unsigned long)), typically 1024.

> Also why not have a function pointer with a series of these being
> separate instead of doing the if checks?

Because HTSIZE is static, gcc will optimize and remove all branches,
leaving only the one that is actually needed, i.e. on most systems the
hash function comes down to the contents of the if (HTSIZE == 1024)
branch.

> BTW it does seem any one of those hashes maybe sufficient, no?

If we want to do the xor'ing to maintain collision-free hashing for
the most common case, we need to keep them separate. It would be
possible to play games and calculate the shifts etc., but gcc has a
hard time optimizing such things.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC] dynamic hash table size & xor hash function for cls_fw
2005-04-07 10:51 ` Thomas Graf
@ 2005-04-07 11:07 ` jamal
2005-04-07 13:09 ` [PATCH] [PKT_SCHED]: improve hashing performance of cls_fw Thomas Graf
0 siblings, 1 reply; 33+ messages in thread
From: jamal @ 2005-04-07 11:07 UTC (permalink / raw)
To: Thomas Graf; +Cc: David S. Miller, lark, netdev

ok, gotcha - good idea to leave it as is..

cheers,
jamal

On Thu, 2005-04-07 at 06:51, Thomas Graf wrote:
> * jamal <1112870307.1118.91.camel@jzny.localdomain> 2005-04-07 06:38
> > On Wed, 2005-04-06 at 20:55, Thomas Graf wrote:
> >
> > >  static __inline__ int fw_hash(u32 handle)
> > >  {
> > > -	return handle&0xFF;
> > > +	if (HTSIZE == 4096)
> > > +		return ((handle >> 24) & 0xFFF) ^
> > > +		       ((handle >> 12) & 0xFFF) ^
> > > +		       (handle & 0xFFF);
> > > +	else if (HTSIZE == 2048)
> > > +		return ((handle >> 22) & 0x7FF) ^
> > > +		       ((handle >> 11) & 0x7FF) ^
> > > +		       (handle & 0x7FF);
> > > +	else if (HTSIZE == 1024)
> > > +		return ((handle >> 20) & 0x3FF) ^
> > > +		       ((handle >> 10) & 0x3FF) ^
> > > +		       (handle & 0x3FF);
> > > +	else if (HTSIZE == 512)
> > > +		return (handle >> 27) ^
> > > +		       ((handle >> 18) & 0x1FF) ^
> > > +		       ((handle >> 9) & 0x1FF) ^
> > > +		       (handle & 0x1FF);
> > > +	else if (HTSIZE == 256) {
> > > +		u8 *t = (u8 *) &handle;
> > > +		return t[0] ^ t[1] ^ t[2] ^ t[3];
> > > +	} else
> > > +		return handle & (HTSIZE - 1);
> > >  }
> >
> > Does HTSIZE change at runtime? How does migrating from one to other take
> > place?
>
> No, the size is static (PAGE_SIZE/sizeof(unsigned long)) (typically 1024).
>
> > Also why not have a function pointer with a series of these being
> > separate instead of doing the if checks?
>
> Because it is static, gcc will optimize and remove all branches leaving
> only the one that is actually needed, i.e. on most systems the hash
> function will get down to the contents of the if (HTSIZE == 1024) branch.
>
> > BTW it does seem any one of those hashes maybe sufficient, no?
>
> If we want to do the xor'ing to maintain collision free hashing
> for the most common case we need to separate. It would be possible
> to play games and calculate the shifts etc but gcc has a hard
> time optimizing such things.

^ permalink raw reply	[flat|nested] 33+ messages in thread
* [PATCH] [PKT_SCHED]: improve hashing performance of cls_fw
2005-04-07 11:07 ` jamal
@ 2005-04-07 13:09 ` Thomas Graf
2005-04-07 13:31 ` Wang Jian
0 siblings, 1 reply; 33+ messages in thread
From: Thomas Graf @ 2005-04-07 13:09 UTC (permalink / raw)
To: David S. Miller; +Cc: lark, Jamal Hadi Salim, netdev

Some test results for the new hashing:

Test-1: hash 0..2147483648

old 256 buckets hash (enum):
  empty buckets: 0  average chain length: 8388607.996  min: 8388607  max: 8388608

new 1024 buckets hash (enum):
  empty buckets: 0  average chain length: 2097151.999  min: 2097151  max: 2097152

Test-2: hash 65536*rand()

old 256 buckets hash (rand):
  empty buckets: 0  average chain length: 256.000  min: 222  max: 296

new 1024 buckets hash (rand):
  empty buckets: 0  average chain length: 64.000  min: 40  max: 93

Test-3: hash 1024*rand()

old 256 buckets hash (rand):
  empty buckets: 3  average chain length: 4.047  min: 0  max: 9

new 1024 buckets hash (rand):
  empty buckets: 380  average chain length: 1.590  min: 0  max: 5

Test-4: total time for 2147483648 hashes, 10 runs

old hashing algorithm   new hashing algorithm
1.619s                  1.610s
1.623s                  1.611s
1.625s                  1.612s
1.629s                  1.614s
1.633s                  1.616s
1.637s                  1.621s
1.638s                  1.625s
1.639s                  1.630s
1.644s                  1.632s
1.737s                  1.658s

Please do a

	bk pull bk://kernel.bkbits.net/tgraf/net-2.6-cls_fw

This will update the following files:

 net/sched/cls_fw.c |   31 +++++++++++++++++++++++++++----
 1 files changed, 27 insertions(+), 4 deletions(-)

through these ChangeSets:

# This is a BitKeeper generated diff -Nru style patch.
# # Replace old naive extract-8-lsb-bits algorithm with a better # algorithm xor'ing 3 or 4 bit fields at the size of the hashtable # array index in order to improve distribution if the majority of # the lower bits are unused while keeping zero collision behaviour # for the most common use case. # # Thanks to Wang Jian <lark@linux.net.cn> for bringing this issue # to attention and to Eran Mann <emann@mrv.com> for the initial # idea for this new algorithm. # # Signed-off-by: Thomas Graf <tgraf@suug.ch> # Signed-off-by: David S. Miller <davem@davemloft.net> # # net/sched/cls_fw.c # 2005/04/07 14:22:58+02:00 tgraf@suug.ch +27 -4 # [PKT_SCHED]: improve hashing performance of cls_fw # diff -Nru a/net/sched/cls_fw.c b/net/sched/cls_fw.c --- a/net/sched/cls_fw.c 2005-04-07 14:55:55 +02:00 +++ b/net/sched/cls_fw.c 2005-04-07 14:55:55 +02:00 @@ -46,9 +46,11 @@ #include <net/act_api.h> #include <net/pkt_cls.h> +#define HTSIZE (PAGE_SIZE/sizeof(struct fw_filter *)) + struct fw_head { - struct fw_filter *ht[256]; + struct fw_filter *ht[HTSIZE]; }; struct fw_filter @@ -69,7 +71,28 @@ static __inline__ int fw_hash(u32 handle) { - return handle&0xFF; + if (HTSIZE == 4096) + return ((handle >> 24) & 0xFFF) ^ + ((handle >> 12) & 0xFFF) ^ + (handle & 0xFFF); + else if (HTSIZE == 2048) + return ((handle >> 22) & 0x7FF) ^ + ((handle >> 11) & 0x7FF) ^ + (handle & 0x7FF); + else if (HTSIZE == 1024) + return ((handle >> 20) & 0x3FF) ^ + ((handle >> 10) & 0x3FF) ^ + (handle & 0x3FF); + else if (HTSIZE == 512) + return (handle >> 27) ^ + ((handle >> 18) & 0x1FF) ^ + ((handle >> 9) & 0x1FF) ^ + (handle & 0x1FF); + else if (HTSIZE == 256) { + u8 *t = (u8 *) &handle; + return t[0] ^ t[1] ^ t[2] ^ t[3]; + } else + return handle & (HTSIZE - 1); } static int fw_classify(struct sk_buff *skb, struct tcf_proto *tp, @@ -152,7 +175,7 @@ if (head == NULL) return; - for (h=0; h<256; h++) { + for (h=0; h<HTSIZE; h++) { while ((f=head->ht[h]) != NULL) { head->ht[h] = f->next; fw_delete_filter(tp, f); @@ 
-291,7 +314,7 @@ if (arg->stop) return; - for (h = 0; h < 256; h++) { + for (h = 0; h < HTSIZE; h++) { struct fw_filter *f; for (f = head->ht[h]; f; f = f->next) { ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] [PKT_SCHED]: improve hashing performance of cls_fw
2005-04-07 13:09 ` [PATCH] [PKT_SCHED]: improve hashing performance of cls_fw Thomas Graf
@ 2005-04-07 13:31 ` Wang Jian
2005-04-07 13:52 ` Thomas Graf
0 siblings, 1 reply; 33+ messages in thread
From: Wang Jian @ 2005-04-07 13:31 UTC (permalink / raw)
To: Thomas Graf; +Cc: David S. Miller, Jamal Hadi Salim, netdev

Hi Thomas Graf,

Would you please test this case?

	(0..1024) << 16

The old hash puts all of these into a single 1024-deep bucket. And I
am not sure whether there is a bad range (0..n) << s which the new
hash maps into one bucket.

On Thu, 7 Apr 2005 15:09:25 +0200, Thomas Graf <tgraf@suug.ch> wrote:

> Some test results for the new hashing
>
> Test-1: hash 0..2147483648
>
> old 256 buckets hash (enum):
> empty buckets: 0 average chain length: 8388607.996 min: 8388607 max: 8388608
>
> new 1024 buckets hash (enum):
> empty buckets: 0 average chain length: 2097151.999 min: 2097151 max: 2097152
>
> Test-2: hash 65536*rand()
>
> old 256 buckets hash (rand):
> empty buckets: 0 average chain length: 256.000 min: 222 max: 296
>
> new 1024 buckets hash (rand):
> empty buckets: 0 average chain length: 64.000 min: 40 max: 93
>
> Test-3: hash 1024*rand()
>
> old 256 buckets hash (rand):
> empty buckets: 3 average chain length: 4.047 min: 0 max: 9
>
> new 1024 buckets hash (rand):
> empty buckets: 380 average chain length: 1.590 min: 0 max: 5
>
> Test-4: total time for 2147483648 hashes, 10 runs
>
> old hashing algorithm new hashing algorithm
> 1.619s 1.610s
> 1.623s 1.611s
> 1.625s 1.612s
> 1.629s 1.614s
> 1.633s 1.616s
> 1.637s 1.621s
> 1.638s 1.625s
> 1.639s 1.630s
> 1.644s 1.632s
> 1.737s 1.658s
>
> Please do a
>
> 	bk pull bk://kernel.bkbits.net/tgraf/net-2.6-cls_fw
>
> This will update the following files:
>
>  net/sched/cls_fw.c |   31 +++++++++++++++++++++++++++----
>  1 files changed, 27 insertions(+), 4 deletions(-)
>
> through these ChangeSets:
>
> # This is a BitKeeper generated diff -Nru style patch.
> #
> # ChangeSet
> #   2005/04/07 14:23:09+02:00 tgraf@suug.ch
> #   [PKT_SCHED]: improve hashing performance of cls_fw
> #
> #   Calculate hashtable size to fit into a page instead of a hardcoded
> #   256 buckets hash table. Results in a 1024 buckets hashtable on
> #   most systems.
> #
> #   Replace old naive extract-8-lsb-bits algorithm with a better
> #   algorithm xor'ing 3 or 4 bit fields at the size of the hashtable
> #   array index in order to improve distribution if the majority of
> #   the lower bits are unused while keeping zero collision behaviour
> #   for the most common use case.
> #
> #   Thanks to Wang Jian <lark@linux.net.cn> for bringing this issue
> #   to attention and to Eran Mann <emann@mrv.com> for the initial
> #   idea for this new algorithm.
> #
> #   Signed-off-by: Thomas Graf <tgraf@suug.ch>
> #   Signed-off-by: David S. Miller <davem@davemloft.net>
> #
> # net/sched/cls_fw.c
> #   2005/04/07 14:22:58+02:00 tgraf@suug.ch +27 -4
> #   [PKT_SCHED]: improve hashing performance of cls_fw
> #
> diff -Nru a/net/sched/cls_fw.c b/net/sched/cls_fw.c
> --- a/net/sched/cls_fw.c	2005-04-07 14:55:55 +02:00
> +++ b/net/sched/cls_fw.c	2005-04-07 14:55:55 +02:00
> @@ -46,9 +46,11 @@
>  #include <net/act_api.h>
>  #include <net/pkt_cls.h>
>  
> +#define HTSIZE (PAGE_SIZE/sizeof(struct fw_filter *))
> +
>  struct fw_head
>  {
> -	struct fw_filter *ht[256];
> +	struct fw_filter *ht[HTSIZE];
>  };
>  
>  struct fw_filter
> @@ -69,7 +71,28 @@
>  
>  static __inline__ int fw_hash(u32 handle)
>  {
> -	return handle&0xFF;
> +	if (HTSIZE == 4096)
> +		return ((handle >> 24) & 0xFFF) ^
> +		       ((handle >> 12) & 0xFFF) ^
> +		       (handle & 0xFFF);
> +	else if (HTSIZE == 2048)
> +		return ((handle >> 22) & 0x7FF) ^
> +		       ((handle >> 11) & 0x7FF) ^
> +		       (handle & 0x7FF);
> +	else if (HTSIZE == 1024)
> +		return ((handle >> 20) & 0x3FF) ^
> +		       ((handle >> 10) & 0x3FF) ^
> +		       (handle & 0x3FF);
> +	else if (HTSIZE == 512)
> +		return (handle >> 27) ^
> +		       ((handle >> 18) & 0x1FF) ^
> +		       ((handle >> 9) & 0x1FF) ^
> +		       (handle & 0x1FF);
> +	else if (HTSIZE == 256) {
> +		u8 *t = (u8 *) &handle;
> +		return t[0] ^ t[1] ^ t[2] ^ t[3];
> +	} else
> +		return handle & (HTSIZE - 1);
>  }
>  
>  static int fw_classify(struct sk_buff *skb, struct tcf_proto *tp,
> @@ -152,7 +175,7 @@
>  	if (head == NULL)
>  		return;
>  
> -	for (h=0; h<256; h++) {
> +	for (h=0; h<HTSIZE; h++) {
>  		while ((f=head->ht[h]) != NULL) {
>  			head->ht[h] = f->next;
>  			fw_delete_filter(tp, f);
> @@ -291,7 +314,7 @@
>  	if (arg->stop)
>  		return;
>  
> -	for (h = 0; h < 256; h++) {
> +	for (h = 0; h < HTSIZE; h++) {
>  		struct fw_filter *f;
>  
>  		for (f = head->ht[h]; f; f = f->next) {

-- 
lark

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: [PATCH] [PKT_SCHED]: improve hashing performance of cls_fw
2005-04-07 13:31 ` Wang Jian
@ 2005-04-07 13:52 ` Thomas Graf
2005-04-07 14:03 ` Wang Jian
0 siblings, 1 reply; 33+ messages in thread
From: Thomas Graf @ 2005-04-07 13:52 UTC (permalink / raw)
To: Wang Jian; +Cc: David S. Miller, Jamal Hadi Salim, netdev

* Wang Jian <20050407212340.02D2.LARK@linux.net.cn> 2005-04-07 21:31
> Would you please test this case?
>
> 	(0..1024) << 16
>
> The old hash gives a 1024 depth bucket for this case.
>
> And I am not sure if there is a bad range (0..n) << s which is mapped
> into one bucket.

old hash (enum shift):
  empty buckets: 255  average chain length: 1024.000  min: 0  max: 1024

new hash (enum shift):
  empty buckets: 0  average chain length: 1.000  min: 1  max: 1

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: [PATCH] [PKT_SCHED]: improve hashing performance of cls_fw
2005-04-07 13:52 ` Thomas Graf
@ 2005-04-07 14:03 ` Wang Jian
0 siblings, 0 replies; 33+ messages in thread
From: Wang Jian @ 2005-04-07 14:03 UTC (permalink / raw)
To: Thomas Graf; +Cc: David S. Miller, Jamal Hadi Salim, netdev

Hi Thomas Graf,

Thanks :-)

On Thu, 7 Apr 2005 15:52:40 +0200, Thomas Graf <tgraf@suug.ch> wrote:

> * Wang Jian <20050407212340.02D2.LARK@linux.net.cn> 2005-04-07 21:31
> > Would you please test this case?
> >
> > 	(0..1024) << 16
> >
> > The old hash gives a 1024 depth bucket for this case.
> >
> > And I am not sure if there is a bad range (0..n) << s which is mapped
> > into one bucket.
>
> old hash (enum shift):
> empty buckets: 255 average chain length: 1024.000 min: 0 max: 1024
>
> new hash (enum shift):
> empty buckets: 0 average chain length: 1.000 min: 1 max: 1

-- 
lark

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
2005-04-06 12:30 ` Thomas Graf
2005-04-06 13:01 ` Wang Jian
2005-04-06 13:34 ` jamal
@ 2005-04-06 13:36 ` Eran Mann
2005-04-06 13:53 ` Wang Jian
2 siblings, 1 reply; 33+ messages in thread
From: Eran Mann @ 2005-04-06 13:36 UTC (permalink / raw)
To: Thomas Graf; +Cc: Wang Jian, hadi, netdev

Thomas Graf wrote:
> We need to provide some kind of option to the user so he can specify
> the needs. The & 0xFF will suit most just fine but has one essential
> drawback which is that no distribution is done at all if the lower 8
> bits are set to 0. For static marks this is no issue at all and even
> for enumerated marks growing it takes quite some time to grow into
> an area where it starts hurting. The problem obviously is if someone
> splits the mark field into 2 parts and uses the upper 16 bits for
> some special purpose just like you did. In such a case it would make
> sense to either take all bits into account or let the user specify
> a bitmask + shift.
>
> So here is the same idea I posted before but revised:
>
> Let the user specify one of the hash tables via a new TLV:
>  - default: & 0xFF
>  - ((mark & mask) >> shift) & 0xFF
>  - jenkins for 16, 32, and 64 bits
>  - FNV for 16, 32, and 64 bits
>
> Why variations for type sizes? The chance of collisions reduces
> a lot if the user knows exactly that he'll never use more than 16
> bits but 255 marks are not enough.
>
> I'm cooking up a patch for this today together with a fix to
> allow 64bit values for the mark.

Maybe you could add to the list of options a XOR-of-bytes hash,
something like:

static inline u8 bytes_xor_u32(u32 key)
{
	u8 *dummy_array = (u8 *)&key;
	u8 hash = dummy_array[0] ^ dummy_array[1] ^
		  dummy_array[2] ^ dummy_array[3];

	return hash;
}

-- 
Eran Mann
MRV International
www.mrv.com

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: [PATCH] improvement on net/sched/cls_fw.c's hash function
2005-04-06 13:36 ` [PATCH] improvement on net/sched/cls_fw.c's hash function Eran Mann
@ 2005-04-06 13:53 ` Wang Jian
0 siblings, 0 replies; 33+ messages in thread
From: Wang Jian @ 2005-04-06 13:53 UTC (permalink / raw)
To: Eran Mann; +Cc: Thomas Graf, hadi, netdev

Hi Eran Mann,

On Wed, 06 Apr 2005 16:36:20 +0300, emann@mrv.com (Eran Mann) wrote:

> Maybe you could add to the list of options a XOR of bytes hash,
> something like:
>
> static inline u8 bytes_xor_u32( u32 key )
> {
> 	u8 *dummy_array = (u8 *)&key;
> 	u8 hash = dummy_array[0] ^ dummy_array[1] ^
> 		  dummy_array[2] ^ dummy_array[3];
>
> 	return hash;
> }

Ha, then this hi-or-lo word hash, which assumes that the high or the
low word is 0 and the key is distributed in the old fashion (0..255,
etc.):

static inline u32 hi_or_lo_hash(u32 key)
{
	return (key >> 16 ^ key) & 0xFF;
}

-- 
lark

^ permalink raw reply	[flat|nested] 33+ messages in thread
end of thread, other threads:[~2005-04-07 14:03 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-04-05  5:35 [PATCH] improvement on net/sched/cls_fw.c's hash function Wang Jian
2005-04-05  5:37 ` David S. Miller
2005-04-05  6:05 ` Wang Jian
2005-04-05 10:25 ` jamal
2005-04-05 10:38 ` Thomas Graf
2005-04-05 11:25 ` Wang Jian
2005-04-05 12:16 ` Thomas Graf
2005-04-05 12:39 ` Wang Jian
2005-04-05 12:52 ` Thomas Graf
2005-04-05 13:29 ` Wang Jian
2005-04-05 12:54 ` jamal
2005-04-05 14:18 ` Wang Jian
2005-04-05 16:11 ` jamal
2005-04-06  6:45 ` Wang Jian
2005-04-06 12:16 ` jamal
2005-04-06 12:30 ` Thomas Graf
2005-04-06 13:01 ` Wang Jian
2005-04-06 13:34 ` jamal
2005-04-06 13:45 ` Thomas Graf
2005-04-06 14:10 ` Thomas Graf
2005-04-06 18:15 ` David S. Miller
2005-04-06 18:31 ` Thomas Graf
2005-04-07  0:55 ` [RFC] dynamic hash table size & xor hash function for cls_fw Thomas Graf
2005-04-07 10:38 ` jamal
2005-04-07 10:47 ` Wang Jian
2005-04-07 10:51 ` Thomas Graf
2005-04-07 11:07 ` jamal
2005-04-07 13:09 ` [PATCH] [PKT_SCHED]: improve hashing performance of cls_fw Thomas Graf
2005-04-07 13:31 ` Wang Jian
2005-04-07 13:52 ` Thomas Graf
2005-04-07 14:03 ` Wang Jian
2005-04-06 13:36 ` [PATCH] improvement on net/sched/cls_fw.c's hash function Eran Mann
2005-04-06 13:53 ` Wang Jian