* re: conntrack hash function comparison [not found] <20030120232634.24011.21218.Mailman@kashyyyk> @ 2003-01-21 6:19 ` Don Cohen 2003-01-21 11:12 ` Jozsef Kadlecsik 2003-01-21 6:40 ` Question: Variable sized matchinfo Don Cohen 1 sibling, 1 reply; 16+ messages in thread From: Don Cohen @ 2003-01-21 6:19 UTC (permalink / raw) To: netfilter-devel; +Cc: kadlec > Please look trough the web pages and comment it. I see several hash functions have been added since I last looked. Aftr some search I see the code in http://bei.bof.de/ (I suggest a reference to it from the advertised page.) This code looks a bit old. Below is the version I'm using at present. I suggest you try testing this version too. I expect it will be a little slower (using long numbers) but a little better, for instance, on table sizes that are powers of 2. Note A,B,C,D are in this code coef[0..3]. As described below these are randomly chosen at initialization. Hmm, I notice I've lost proto. The code below was meant for only one protocol. It should be ok to add key->proto to sum below. ==== static u32 flowhash (struct flowdesc *key) { unsigned long long ls, ld, lp; unsigned long sum; ls = (unsigned long long) key->saddr * coef[0]; ld = (unsigned long long) key->daddr * coef[1]; lp = (unsigned long long) coef[2] * (u32)((u32)(key->source << 16) + key->dest); sum = ((ls ^ (ls >> 32)) + (ld ^ (ld >> 32)) + (lp ^ (lp >> 32))) ^ coef[3]; return sum % flow_ht_size; } ==== A few details: - The ABCD(EF) algorithms were actually suggested to me by Rich Schroeppel. - I don't see what you consider odd about the random results. - I think the discussion on this list last summer contained an analysis of what we should expect from the random hash function. And now the important point: Besides performance, ABCD(EF) was designed to satisfy another goal: it is supposed to be difficult to attack. In particular, an attacker might try to cause you to spend lots of time in your hash function by arranging to send lots of packets that generate connections that all hash to the same bucket. If he knows your algorithm, including all of the relevant constants, this is pretty easy. The original hash algorithm, of course, poses no challenge to such an attacker. It turns out (as was discussed on the mailing list last summer) that CRC is also easy to attack. I don't know about the hash functions that have shown up more recently, of course. The idea behind ABCD(EF) is that the coefficients A,B,C,D(,E,F) are secrets, chosen randomly by each connection tracking machine. So far I don't know how this can be attacked. My current code assumes that, if it can be attacked at all, this will at least take a long time. The solution is that (1) I detect that the current coeffecients are not working as well as expected (simply that some bucket gets filled past a very unlikely threshold), and in that case (2) we choose new randome coeffecients and rehash everything (which probably takes less than a second unless you have millions of connections). ^ permalink raw reply [flat|nested] 16+ messages in thread
* re: conntrack hash function comparison 2003-01-21 6:19 ` conntrack hash function comparison Don Cohen @ 2003-01-21 11:12 ` Jozsef Kadlecsik 2003-01-21 16:59 ` Don Cohen 0 siblings, 1 reply; 16+ messages in thread From: Jozsef Kadlecsik @ 2003-01-21 11:12 UTC (permalink / raw) To: Don Cohen; +Cc: netfilter-devel Hi, On Mon, 20 Jan 2003, Don Cohen wrote: > > Please look trough the web pages and comment it. > > I see several hash functions have been added since I last looked. > Aftr some search I see the code in http://bei.bof.de/ > (I suggest a reference to it from the advertised page.) I added the reference to cttest to the frontpage as well. The extended cttest source I used hasn't been sent back to Patrick yet, because depending on the answer from the author, I might have to remove the buzhash code from the source. > This code looks a bit old. Below is the version I'm using at > present. I suggest you try testing this version too. Fine, I'll test it. > I expect it will be a little slower (using long numbers) but > a little better, for instance, on table sizes that are powers of 2. I don't think we should restrict ourselves to hashes which work with any kind of hashsize. Be the hast as fast as possible and as good as possible :-). > A few details: > - The ABCD(EF) algorithms were actually suggested to me by Rich > Schroeppel. Then credicts goes to him and to you as well. > - I don't see what you consider odd about the random results. I believe the random hash should give a smooth distribution of the entries. But for example in the case of the matrix dataset/8191 hash size it gives: Count Length CT 5 21 0.14% 3 20 0.22% 11 19 0.50% 29 18 1.19% 50 17 2.32% 99 16 4.43% 180 15 8.02% 290 14 13.41% 445 13 21.11% 662 12 31.67% 833 11 43.85% 1011 10 57.29% 1051 9 69.87% 1046 8 80.99% 911 7 89.47% 716 6 95.18% 442 5 98.12% 254 4 99.47% 105 3 99.89% 38 2 99.99% 7 1 100.00% 3 0 100.00% The number of the longest hash buckets is out of the distribution - there are more of them than the hash buckets just shorter than the longest one. (As I wrote on the pages, we actually test the random number generator in Linux too so I count these examples as the indication of little weaknesses in the generator.) This is the very reason why I "measured" the distances from the ideal hash instead. > The idea behind ABCD(EF) is that the coefficients A,B,C,D(,E,F) are > secrets, chosen randomly by each connection tracking machine. > So far I don't know how this can be attacked. My current code assumes > that, if it can be attacked at all, this will at least take a long > time. The solution is that (1) I detect that the current coeffecients > are not working as well as expected (simply that some bucket gets > filled past a very unlikely threshold), and in that case (2) we choose > new randome coeffecients and rehash everything (which probably takes > less than a second unless you have millions of connections). I don't really like the idea of rehashing if the hash function is (so) strong. What is your opinion on the abcdx variant, where the four random multipliers were chosen so that at any bit postition in the four numbers there are exactly two zeros and two ones (similarly to the rtable in buzhash, http://www.cl.cam.ac.uk/users/kw217/pub/Hash.adt.ps.gz). Best regards, Jozsef - E-mail : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt Address : KFKI Research Institute for Particle and Nuclear Physics H-1525 Budapest 114, POB. 49, Hungary ^ permalink raw reply [flat|nested] 16+ messages in thread
* re: conntrack hash function comparison 2003-01-21 11:12 ` Jozsef Kadlecsik @ 2003-01-21 16:59 ` Don Cohen 2003-01-31 11:41 ` Harald Welte 0 siblings, 1 reply; 16+ messages in thread From: Don Cohen @ 2003-01-21 16:59 UTC (permalink / raw) To: Jozsef Kadlecsik; +Cc: netfilter-devel > > I expect it will be a little slower (using long numbers) but > > a little better, for instance, on table sizes that are powers of 2. > > I don't think we should restrict ourselves to hashes which work with any > kind of hashsize. Be the hast as fast as possible and as good as possible > :-). If it's not going to work well on certain table sizes then you should really do something to ensure that the table size is one of the good ones. At least make the default good and add a comment saying which sizes are good/bad. Perhaps I should tell you what the changes are supposed to be good for. The code I saw in cttest (I only suppose you're using the same code) multiplies 32 bit source IP by 32 bit random number. The question is then what part of the result is used. If you do 32 bit adds then only the last 32 bits if the product is used. Note that, while the last bit of source address affects all 32 of these result bits, the first bit of source address affects only the first result bit, and that only if the last bit of the random number is 1. What I want is for every bit of input to affect every bit of output. I therefore now split the 64 bit result in two and add the two halves. (Which has that effect.) So before if the table size was, say, 2^13 (8K), only the last 13 bits of the inputs would affect the result. So 1.2.3.4 would hash to the same thing as 17.39.3.4, or even worse if you're unlucky with endianness, 1.2.3.4 might hash the same as 1.2.17.39 ! In any case, regardless of the likelihood of this sort of thing arising in legitimate data, it makes intentional attack a lot easier. > > - I don't see what you consider odd about the random results. > > I believe the random hash should give a smooth distribution of the > entries. But for example in the case of the matrix dataset/8191 hash size > it gives: > > Count Length CT > 5 21 0.14% > 3 20 0.22% > 11 19 0.50% > 29 18 1.19% > 50 17 2.32% > 99 16 4.43% > 180 15 8.02% > 290 14 13.41% > 445 13 21.11% > 662 12 31.67% > 833 11 43.85% > 1011 10 57.29% > 1051 9 69.87% > 1046 8 80.99% > 911 7 89.47% > 716 6 95.18% > 442 5 98.12% > 254 4 99.47% > 105 3 99.89% > 38 2 99.99% > 7 1 100.00% > 3 0 100.00% > > The number of the longest hash buckets is out of the distribution - there > are more of them than the hash buckets just shorter than the longest one. The expected number is "smooth", but of course you shouldn't expect a given test to follow the expected number exactly. This looks to me like a perfectly reasonable data point. Generally I'm happy if the longest bucket is not much longer than expected. That is, use the formula from last summer's discussion to see the expected number of buckets of length 21, and if it's > .1 then don't worry. If it's between .1 and .001 then try a few more tests. If it's 1e-12 then your hash function is no good. > I don't really like the idea of rehashing if the hash function is > (so) strong. At some point it's clearly worth rehashing. You have to decide what that point is. If the function is so good and successfully resists attack then you'll never (or hardly ever) need to rehash, so all you pay is a tiny amount of space for code that's never used. Whereas, if you're extremely unlucky, or a clever new attack method comes along, that small amount of code will help you a whole lot. You choose your table size and rehash threshold in order to make the search time acceptable (max length of bucket) and also the probability of rehash very small. I think at the moment I use 20 as my rehash threshold and a table size that's the number of entries I plan to put in it, so average bucket occupancy <= 1. (Dusting off my old code, 20 entries in a given bucket in that case seems to have probability ~ 1e-19. So even with 1e6 buckets, not very likely.) > What is your opinion on the abcdx variant, where the four random > multipliers were chosen so that at any bit postition in the four > numbers there are exactly two zeros and two ones (similarly to the > rtable in buzhash, http://www.cl.cam.ac.uk/users/kw217/pub/Hash.adt.ps.gz). Immediate reactions: - I'd like to hear the rationale for this - what good is it supposed to do? Conversely, what problems arise from not doing it? - Rich did mention to me that it would be good for the random numbers to each contain approximately the same number of 1's and 0's. It's at least clear that zero would be a bad choice. Perhaps when you choose random numbers you could just require, e.g., every byte contain 3-5 1's (and 0's), i.e., eliminate the cases of 0,1,2, of which there are 2+16+56=74, leaving 182 out of 256 possible values... Right now I simply rely on rehash to fix any such problems (or any OTHER problems for that matter) that actually arise. Since I argue you really ought to have that as a backup anyway, I don't mind using it to solve this problem. I do recommend a log entry if you ever have to rehash, and frequent rehash would be worth notification if you have such a facility. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: conntrack hash function comparison 2003-01-21 16:59 ` Don Cohen @ 2003-01-31 11:41 ` Harald Welte 2003-02-01 0:41 ` Don Cohen 0 siblings, 1 reply; 16+ messages in thread From: Harald Welte @ 2003-01-31 11:41 UTC (permalink / raw) To: Don Cohen; +Cc: Jozsef Kadlecsik, netfilter-devel [-- Attachment #1: Type: text/plain, Size: 1776 bytes --] On Tue, Jan 21, 2003 at 08:59:12AM -0800, Don Cohen wrote: > > I don't really like the idea of rehashing if the hash function is > > (so) strong. > At some point it's clearly worth rehashing. You have to decide what > that point is. If the function is so good and successfully resists > attack then you'll never (or hardly ever) need to rehash, so all you > pay is a tiny amount of space for code that's never used. > Whereas, if you're extremely unlucky, or a clever new attack method > comes along, that small amount of code will help you a whole lot. > You choose your table size and rehash threshold in order to make the > search time acceptable (max length of bucket) and also the probability > of rehash very small. not that I totally disagree with rehashing being a good idea... but in the case you add this 'small piece of code', you also introduce a new vulnerability: If somebody can provoke the firewall to rehash all the time ;( > Right now I simply rely on rehash to fix any such problems (or any > OTHER problems for that matter) that actually arise. Since I argue > you really ought to have that as a backup anyway, I don't mind using > it to solve this problem. I do recommend a log entry if you ever have > to rehash, and frequent rehash would be worth notification if you have > such a facility. yup. rehash parameters should be a sysctl, and rehash statistics kept somewhere in /proc. -- Live long and prosper - Harald Welte / laforge@gnumonks.org http://www.gnumonks.org/ ============================================================================ GCS/E/IT d- s-: a-- C+++ UL++++$ P+++ L++++$ E--- W- N++ o? K- w--- O- M- V-- PS+ PE-- Y+ PGP++ t++ 5-- !X !R tv-- b+++ DI? !D G+ e* h+ r% y+(*) [-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: conntrack hash function comparison 2003-01-31 11:41 ` Harald Welte @ 2003-02-01 0:41 ` Don Cohen 2003-02-01 6:57 ` Patrick Schaaf 0 siblings, 1 reply; 16+ messages in thread From: Don Cohen @ 2003-02-01 0:41 UTC (permalink / raw) To: Harald Welte; +Cc: Jozsef Kadlecsik, netfilter-devel Harald Welte writes: > not that I totally disagree with rehashing being a good idea... but in > the case you add this 'small piece of code', you also introduce a new > vulnerability: If somebody can provoke the firewall to rehash all the > time ;( The counter argument is that if they can do that then they have a way of causing lots of connections to go into the same bucket, and allowing that to happen is (at least in the extreme case) much worse than rehashing. Obviously it's also bad to rehash too eagerly. I think the only question is where to put the threshold. It's reasonable to make that threshold read/writable via /proc. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: conntrack hash function comparison 2003-02-01 0:41 ` Don Cohen @ 2003-02-01 6:57 ` Patrick Schaaf 2003-02-01 7:58 ` Don Cohen 0 siblings, 1 reply; 16+ messages in thread From: Patrick Schaaf @ 2003-02-01 6:57 UTC (permalink / raw) To: Don Cohen; +Cc: Harald Welte, Jozsef Kadlecsik, netfilter-devel On Fri, Jan 31, 2003 at 04:41:28PM -0800, Don Cohen wrote: > Harald Welte writes: > > not that I totally disagree with rehashing being a good idea... but in > > the case you add this 'small piece of code', you also introduce a new > > vulnerability: If somebody can provoke the firewall to rehash all the > > time ;( > The counter argument is that if they can do that then they have a way > of causing lots of connections to go into the same bucket, and allowing > that to happen is (at least in the extreme case) much worse than > rehashing. Obviously it's also bad to rehash too eagerly. > I think the only question is where to put the threshold. I imagine that instead of, or in addition to, a threshold on list lengths, I would like to have a "minimum time between rehashings" knob, set to e.g. one hour. That way, the cost is guaranteed independant of packet arrival rate. And, to practically disable rehashing, that knob can be set to an immensely large value. Hmm. Maybe even with a printk() 10 (N) minutes in advance, telling the admin "I think I'm going to rehash a bit, soon". No. Forget about all that. 1) metrics in /proc, including a "max list length during the last hour" statistic. 2) one read/writable /proc entry, containing an integer: the number of hash buckets. This is missing right now, anyway (have to grep the boot messages). Make it writable, and rehash exactly when a write changes the hash bucket count. This way, the policy (when to rehash) is safely in user space, the kernel gives mechanism, only - just like it should. What do you think? best regards Patrick ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: conntrack hash function comparison 2003-02-01 6:57 ` Patrick Schaaf @ 2003-02-01 7:58 ` Don Cohen 2003-02-01 8:37 ` Harald Welte 0 siblings, 1 reply; 16+ messages in thread From: Don Cohen @ 2003-02-01 7:58 UTC (permalink / raw) To: Patrick Schaaf; +Cc: Harald Welte, Jozsef Kadlecsik, netfilter-devel Patrick Schaaf writes: > On Fri, Jan 31, 2003 at 04:41:28PM -0800, Don Cohen wrote: > > Harald Welte writes: > > > not that I totally disagree with rehashing being a good idea... but in > > > the case you add this 'small piece of code', you also introduce a new > > > vulnerability: If somebody can provoke the firewall to rehash all the > > > time ;( > > The counter argument is that if they can do that then they have a way > > of causing lots of connections to go into the same bucket, and allowing > > that to happen is (at least in the extreme case) much worse than > > rehashing. Obviously it's also bad to rehash too eagerly. > > I think the only question is where to put the threshold. > > I imagine that instead of, or in addition to, a threshold on list > lengths, I would like to have a "minimum time between rehashings" knob, > set to e.g. one hour. That way, the cost is guaranteed independant of > packet arrival rate. And, to practically disable rehashing, that knob > can be set to an immensely large value. I don't understand the utility of such a control. Why should you rehash if the table is doing ok? Conversely, if it's not, why should you suffer for an hour? > 1) metrics in /proc, including a "max list length during the last hour" > statistic. not very practical if you think about it More reasonable: longest bucket length since last rehash If you really want to take the time to look at such things, I'd suggest a list of lengths of all buckets. > 2) one read/writable /proc entry, containing an integer: the number of > hash buckets. This is missing right now, anyway (have to grep the > boot messages). Make it writable, and rehash exactly when a write > changes the hash bucket count. It's currently a module parameter? Not that it would be so hard to change it at run time, but is that really so useful? You could unload the module and reload it with a different parameter. If the complaint is that you then lose the data, I suggest using the proc file to save and restore (copy the /proc data to a real file, then copy it back after unload/reload). That would even let you save and restore over a reboot. > This way, the policy (when to rehash) is safely in user space, the > kernel gives mechanism, only - just like it should. /proc control over the rehash threshold (max bucket size) seems like an appropriate "control over policy". Whereas a user space program doing something to cause a rehash seems to me to be a bad choice. (For one thing, when the kernel gets into trouble the user space program might never have a chance to run.) ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: conntrack hash function comparison 2003-02-01 7:58 ` Don Cohen @ 2003-02-01 8:37 ` Harald Welte 2003-02-01 19:05 ` Don Cohen 0 siblings, 1 reply; 16+ messages in thread From: Harald Welte @ 2003-02-01 8:37 UTC (permalink / raw) To: Don Cohen; +Cc: Patrick Schaaf, Jozsef Kadlecsik, netfilter-devel [-- Attachment #1: Type: text/plain, Size: 2503 bytes --] On Fri, Jan 31, 2003 at 11:58:37PM -0800, Don Cohen wrote: > not very practical if you think about it > More reasonable: longest bucket length since last rehash > If you really want to take the time to look at such things, I'd suggest > a list of lengths of all buckets. I think that whas what Jozsef really meant. > > 2) one read/writable /proc entry, containing an integer: the number of > > hash buckets. This is missing right now, anyway (have to grep the > > boot messages). Make it writable, and rehash exactly when a write > > changes the hash bucket count. > It's currently a module parameter? yes, it is. > Not that it would be so hard to > change it at run time, but is that really so useful? Especially in the case where it is not a kernel module ;) > You could unload the module and reload it with a different parameter. > If the complaint is that you then lose the data, I suggest using the > proc file to save and restore (copy the /proc data to a real file, then > copy it back after unload/reload). That would even let you save and > restore over a reboot. no way. no way. no way. sorry, but read the various discussions about the /proc/ip_conntrack interface in general, think about all the data that is not printed (like references to currently in-kernel packets, connection between expectations and master connection, ... lots of those issues are relevant for conntrack failover as well. This is only viable via ctnetlink, which is not part of the stock kernel yet. > > This way, the policy (when to rehash) is safely in user space, the > > kernel gives mechanism, only - just like it should. > /proc control over the rehash threshold (max bucket size) seems like > an appropriate "control over policy". > Whereas a user space program doing something to cause a rehash seems > to me to be a bad choice. (For one thing, when the kernel gets into > trouble the user space program might never have a chance to run.) yes, I agree. and that's why i like the idea of a 'min rehash interval' proc parameter. If you set this to '0', kernel will rehash never. '1' means maximum of one rehash per second, ... -- - Harald Welte / laforge@gnumonks.org http://www.gnumonks.org/ ============================================================================ "If this were a dictatorship, it'd be a heck of a lot easier, just so long as I'm the dictator." -- George W. Bush Dec 18, 2000 [-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: conntrack hash function comparison 2003-02-01 8:37 ` Harald Welte @ 2003-02-01 19:05 ` Don Cohen 2003-02-02 22:45 ` Harald Welte 0 siblings, 1 reply; 16+ messages in thread From: Don Cohen @ 2003-02-01 19:05 UTC (permalink / raw) To: Harald Welte; +Cc: Patrick Schaaf, Jozsef Kadlecsik, netfilter-devel Harald Welte writes: > > Not that it would be so hard to change [hash table size] at run time, but is that really so useful? > Especially in the case where it is not a kernel module ;) Well, then you need a way to initialize it. Is there a standard solution to this problem? Or is the inability to choose such values just the penalty you pay for choosing Y instead of M? > > You could unload the module and reload it with a different parameter. > > If the complaint is that you then lose the data, I suggest using the > > proc file to save and restore (copy the /proc data to a real file, then > > copy it back after unload/reload). That would even let you save and > > restore over a reboot. > > no way. no way. no way. > sorry, but read the various discussions about the /proc/ip_conntrack > interface in general, think about all the data that is not printed (like > references to currently in-kernel packets, connection between > expectations and master connection, ... lots of those issues are > relevant for conntrack failover as well. This is only viable via > ctnetlink, which is not part of the stock kernel yet. This was meant as a general suggestion, not specific to conntrack. I don't know all the details specific to conntrack. It seems surprising that you couldn't save all relevant state via /proc. Even more so if you think you could do it with netlink. The stuff above suggests that you're worried about highly volatile state, but the suggestion is meant only for "stable" states. It makes sense to remember connections across a reboot (if it doesn't take too long), not packets. You'd have to effectively disable networking (so there are no packets to save) before saving state. I plan to try it for my module that does conntrack-like things. > > > This way, the policy (when to rehash) is safely in user space, the > > > kernel gives mechanism, only - just like it should. > > /proc control over the rehash threshold (max bucket size) seems like > > an appropriate "control over policy". > > Whereas a user space program doing something to cause a rehash seems > > to me to be a bad choice. (For one thing, when the kernel gets into > > trouble the user space program might never have a chance to run.) > > yes, I agree. and that's why i like the idea of a 'min rehash interval' > proc parameter. If you set this to '0', kernel will rehash never. '1' > means maximum of one rehash per second, ... I think I finally understand what this is all about. It's not to automatically rehash after time t, but rather to make sure you don't spend all of your time rehashing. This seems reasonable. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: conntrack hash function comparison 2003-02-01 19:05 ` Don Cohen @ 2003-02-02 22:45 ` Harald Welte 0 siblings, 0 replies; 16+ messages in thread From: Harald Welte @ 2003-02-02 22:45 UTC (permalink / raw) To: Don Cohen; +Cc: Patrick Schaaf, Jozsef Kadlecsik, netfilter-devel [-- Attachment #1: Type: text/plain, Size: 1438 bytes --] On Sat, Feb 01, 2003 at 11:05:54AM -0800, Don Cohen wrote: > Harald Welte writes: > > > > Not that it would be so hard to change [hash table size] > at run time, but is that really so useful? > > Especially in the case where it is not a kernel module ;) > > Well, then you need a way to initialize it. Is there a standard > solution to this problem? Or is the inability to choose such values > just the penalty you pay for choosing Y instead of M? no, there is no standard solution. And yes, compilng conntrack into the kernel instead of dynamically loading it as a module _is_ a performance improvement. On x86 it's negligible - but on ppc it should make a measurable difference. > > yes, I agree. and that's why i like the idea of a 'min rehash interval' > > proc parameter. If you set this to '0', kernel will rehash never. '1' > > means maximum of one rehash per second, ... > > I think I finally understand what this is all about. > It's not to automatically rehash after time t, but rather to > make sure you don't spend all of your time rehashing. exactly. > This seems reasonable. -- - Harald Welte / laforge@gnumonks.org http://www.gnumonks.org/ ============================================================================ "If this were a dictatorship, it'd be a heck of a lot easier, just so long as I'm the dictator." -- George W. Bush Dec 18, 2000 [-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* re: Question: Variable sized matchinfo [not found] <20030120232634.24011.21218.Mailman@kashyyyk> 2003-01-21 6:19 ` conntrack hash function comparison Don Cohen @ 2003-01-21 6:40 ` Don Cohen 2003-01-21 11:42 ` Patrick McHardy 1 sibling, 1 reply; 16+ messages in thread From: Don Cohen @ 2003-01-21 6:40 UTC (permalink / raw) To: netfilter-devel, Patrick McHardy > I want to write a match where it would be nice to pass a variable sized > matchinfo to kernelspace. The answer when I asked this question in connection with u32 was no. In fact, I asked: > Would it work to make that 10 into a module parm (or three) ? Harald answered: unfortunately not. The size of this structure needs to be known at compile time of the kernel and iptables userspace (and they have to be the same, obviously). > Is this possible ? I want to avoid always using the largest possible > values (2^16 + a few bytes). > The data in question is a bpf program compiled with pcap_compile .. > This is probably not the most useful match but i think it beats u32 > because bpf syntax is already well-known > and very pleasant to use. Thanks for the reference. I'll read about it. I assume you mean the u32 I posted recently. So far the bpf language doesn't strike me as pleasant to use compared to the small language I made up for u32, but maybe that's just cause I'm not used to it. I gather 2^16 is the maximum possible size of a bpf program. How about supplying several different variants of your module, one that has data size 128 bytes, another with size 1K, another 8K and finally 64K. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Question: Variable sized matchinfo 2003-01-21 6:40 ` Question: Variable sized matchinfo Don Cohen @ 2003-01-21 11:42 ` Patrick McHardy 2003-01-21 16:07 ` Laszlo Valko 0 siblings, 1 reply; 16+ messages in thread From: Patrick McHardy @ 2003-01-21 11:42 UTC (permalink / raw) To: Don Cohen; +Cc: netfilter-devel Don Cohen wrote: > > I want to write a match where it would be nice to pass a variable sized > > matchinfo to kernelspace. >The answer when I asked this question in connection with u32 was no. >In fact, >I asked: > > Would it work to make that 10 into a module parm (or three) ? >Harald answered: > unfortunately not. The size of this structure needs to be known at > compile time of the kernel and iptables userspace (and they have to be > the same, obviously). > Oops, i probably missed that. Anyway this doesn't seem to be real problem, you could just pass pointers and copy_from_user the data. The probleme there is the match is not informed if its not needed anymore, so no chance to free the memory. > > > Is this possible ? I want to avoid always using the largest possible > > values (2^16 + a few bytes). > > The data in question is a bpf program compiled with pcap_compile .. > > This is probably not the most useful match but i think it beats u32 > > because bpf syntax is already well-known > > and very pleasant to use. >Thanks for the reference. I'll read about it. I assume you mean the >u32 I posted recently. >So far the bpf language doesn't strike me as pleasant to use compared >to the small language I made up for u32, but maybe that's just cause >I'm not used to it. >I gather 2^16 is the maximum possible size of a bpf program. > no they are not limited in size AFAIK. I recently portet PPP_FILTERs to isdn and they chose a limit of 2^16 which sounds sane. But i missed its not 2^16 bytes but 2^16 * sizeof(struct sock_fprog). If you want to look at a example for using sk_run_filter (matches bpf code in kernel) you can look at it at http://trash.net/~kaber >How about supplying several different variants of your module, >one that has data size 128 bytes, another with size 1K, another 8K >and finally 64K. > > no i decided its too ugly to do stuff like this without beeing able to either pass a variable-sized struct from userspace (the size-check is done by the module itself, so no problem there) or allocate the memory in kernelspace myself (and free it afterwards). Both doesn't seem like a big problem to do, but it's not worth it for a just-for-fun match. Another uglyness is it is not possible to display the bpf code in userspace with iptables -L because its already compiled and optimized. IIRC you ran into the same problem with u32. If anyone would really like a bpf match tell me and i might reconsider. Patrick ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Question: Variable sized matchinfo 2003-01-21 11:42 ` Patrick McHardy @ 2003-01-21 16:07 ` Laszlo Valko 2003-01-21 16:32 ` [OT " Patrick McHardy 0 siblings, 1 reply; 16+ messages in thread From: Laszlo Valko @ 2003-01-21 16:07 UTC (permalink / raw) To: Patrick McHardy; +Cc: Don Cohen, netfilter-devel On Tue, Jan 21, 2003 at 12:42:16PM +0100, Patrick McHardy wrote: > Oops, i probably missed that. > Anyway this doesn't seem to be real problem, you could just pass pointers > and copy_from_user the data. The probleme there is the match is not informed > if its not needed anymore, so no chance to free the memory. But consider that the pointer on the kernel-side might not look like what it looks like on the user-side! See my patch earlier with a sparc64 compatibility fix. Basically, you will have to implement a thunk which translates the 32-bit userspace pointer sent from iptables to a 64-bit kernelspace pointer. The best solution is to avoid pointers, as the pointer translation usually involves translating the whole struct being passed because of the field offset differences. Laszlo > > Patrick > > ^ permalink raw reply [flat|nested] 16+ messages in thread
* [OT Re: Question: Variable sized matchinfo 2003-01-21 16:07 ` Laszlo Valko @ 2003-01-21 16:32 ` Patrick McHardy 2003-01-21 16:46 ` Laszlo Valko 0 siblings, 1 reply; 16+ messages in thread From: Patrick McHardy @ 2003-01-21 16:32 UTC (permalink / raw) To: Laszlo Valko; +Cc: Don Cohen, netfilter-devel Laszlo Valko wrote: >On Tue, Jan 21, 2003 at 12:42:16PM +0100, Patrick McHardy wrote: > > >>Oops, i probably missed that. >>Anyway this doesn't seem to be real problem, you could just pass pointers >>and copy_from_user the data. The probleme there is the match is not informed >>if its not needed anymore, so no chance to free the memory. >> >> > >But consider that the pointer on the kernel-side might not look like >what it looks like on the user-side! > >See my patch earlier with a sparc64 compatibility fix. Basically, >you will have to implement a thunk which translates the 32-bit userspace >pointer sent from iptables to a 64-bit kernelspace pointer. The best >solution is to avoid pointers, as the pointer translation usually involves >translating the whole struct being passed because of the field >offset differences. > >Laszlo > > you're right of course .. one question: i see copy_from_user in many places in the kernel, often within with user supplied addresses. Do all of these don't work on sparc64 ? what a horrible architecture ;) regards, patrick ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [OT Re: Question: Variable sized matchinfo 2003-01-21 16:32 ` [OT " Patrick McHardy @ 2003-01-21 16:46 ` Laszlo Valko 0 siblings, 0 replies; 16+ messages in thread From: Laszlo Valko @ 2003-01-21 16:46 UTC (permalink / raw) To: Patrick McHardy; +Cc: Don Cohen, netfilter-devel On Tue, Jan 21, 2003 at 05:32:46PM +0100, Patrick McHardy wrote: > you're right of course .. one question: i see copy_from_user in many > places in the kernel, often within with user supplied > addresses. Do all of these don't work on sparc64 ? what a horrible > architecture ;) > > regards, > patrick > Look at arch/sparc64/kernel/*.c and you'll see magic in there... Just to have a small picture of the situation... And imagine what will happen with x86-64! I doubt that there'll be a compiler for that platform which produces working 64bit userspace code real soon. It took about 5 years for sparc... :))) Laszlo ^ permalink raw reply [flat|nested] 16+ messages in thread
* Question: Variable sized matchinfo @ 2003-01-20 12:14 Patrick McHardy 0 siblings, 0 replies; 16+ messages in thread From: Patrick McHardy @ 2003-01-20 12:14 UTC (permalink / raw) To: Netfilter Development Mailinglist Hi. I want to write a match where it would be nice to pass a variable sized matchinfo to kernelspace. Is this possible ? I want to avoid always using the largest possible values (2^16 + a few bytes). The data in question is a bpf program compiled with pcap_compile .. This is probably not the most useful match but i think it beats u32 because bpf syntax is already well-known and very pleasant to use. Regards, Patrick ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2003-02-02 22:45 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20030120232634.24011.21218.Mailman@kashyyyk>
2003-01-21 6:19 ` conntrack hash function comparison Don Cohen
2003-01-21 11:12 ` Jozsef Kadlecsik
2003-01-21 16:59 ` Don Cohen
2003-01-31 11:41 ` Harald Welte
2003-02-01 0:41 ` Don Cohen
2003-02-01 6:57 ` Patrick Schaaf
2003-02-01 7:58 ` Don Cohen
2003-02-01 8:37 ` Harald Welte
2003-02-01 19:05 ` Don Cohen
2003-02-02 22:45 ` Harald Welte
2003-01-21 6:40 ` Question: Variable sized matchinfo Don Cohen
2003-01-21 11:42 ` Patrick McHardy
2003-01-21 16:07 ` Laszlo Valko
2003-01-21 16:32 ` [OT " Patrick McHardy
2003-01-21 16:46 ` Laszlo Valko
2003-01-20 12:14 Patrick McHardy
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.