re: conntrack hash function comparison

All of lore.kernel.org
 help / color / mirror / Atom feed

* re: conntrack hash function comparison
       [not found] <20030120232634.24011.21218.Mailman@kashyyyk>
@ 2003-01-21  6:19 ` Don Cohen
  2003-01-21 11:12   ` Jozsef Kadlecsik
  2003-01-21  6:40 ` Question: Variable sized matchinfo Don Cohen
  1 sibling, 1 reply; 15+ messages in thread
From: Don Cohen @ 2003-01-21  6:19 UTC (permalink / raw)
  To: netfilter-devel; +Cc: kadlec

 > Please look trough the web pages and comment it.

I see several hash functions have been added since I last looked.
Aftr some search I see the code in http://bei.bof.de/
(I suggest a reference to it from the advertised page.)
This code looks a bit old.  Below is the version I'm using at
present.  I suggest you try testing this version too.
I expect it will be a little slower (using long numbers) but
a little better, for instance, on table sizes that are powers of 2.
Note A,B,C,D are in this code coef[0..3].  As described below these
are randomly chosen at initialization.  
Hmm, I notice I've lost proto.  The code below was meant for only
one protocol.  It should be ok to add key->proto to sum below.
====
static u32 flowhash (struct flowdesc *key)
{
  unsigned long long ls, ld, lp;
  unsigned long sum;
  ls = (unsigned long long) key->saddr * coef[0];
  ld = (unsigned long long) key->daddr * coef[1];
  lp = (unsigned long long) coef[2] *
    (u32)((u32)(key->source << 16) + key->dest);
  sum =
    ((ls ^ (ls >> 32)) +
     (ld ^ (ld >> 32)) +
     (lp ^ (lp >> 32))) ^ coef[3];
  return sum % flow_ht_size;
}
====

A few details:
- The ABCD(EF) algorithms were actually suggested to me by Rich
Schroeppel. 
- I don't see what you consider odd about the random results.
- I think the discussion on this list last summer contained an
analysis of what we should expect from the random hash function.

And now the important point:
Besides performance, ABCD(EF) was designed to satisfy another goal:
it is supposed to be difficult to attack.  In particular, an attacker
might try to cause you to spend lots of time in your hash function by
arranging to send lots of packets that generate connections that all
hash to the same bucket.  If he knows your algorithm, including all
of the relevant constants, this is pretty easy.  The original hash
algorithm, of course, poses no challenge to such an attacker.  
It turns out (as was discussed on the mailing list last summer) that
CRC is also easy to attack.  I don't know about the hash functions
that have shown up more recently, of course.
The idea behind ABCD(EF) is that the coefficients A,B,C,D(,E,F) are
secrets, chosen randomly by each connection tracking machine.
So far I don't know how this can be attacked.  My current code assumes
that, if it can be attacked at all, this will at least take a long
time.  The solution is that (1) I detect that the current coeffecients
are not working as well as expected (simply that some bucket gets
filled past a very unlikely threshold), and in that case (2) we choose
new randome coeffecients and rehash everything (which probably takes
less than a second unless you have millions of connections).

^ permalink raw reply	[flat|nested] 15+ messages in thread

* re: Question: Variable sized matchinfo
       [not found] <20030120232634.24011.21218.Mailman@kashyyyk>
  2003-01-21  6:19 ` conntrack hash function comparison Don Cohen
@ 2003-01-21  6:40 ` Don Cohen
  2003-01-21 11:42   ` Patrick McHardy
  1 sibling, 1 reply; 15+ messages in thread
From: Don Cohen @ 2003-01-21  6:40 UTC (permalink / raw)
  To: netfilter-devel, Patrick McHardy

 > I want to write a match where it would be nice to pass a variable sized 
 > matchinfo to kernelspace.
The answer when I asked this question in connection with u32 was no.
In fact,
I asked:
  > Would it work to make that 10 into a module parm (or three) ?
Harald answered:
  unfortunately not.  The size of this structure needs to be known at
  compile time of the kernel and iptables userspace (and they have to be
  the same, obviously).

 > Is this possible ? I want to avoid always using the largest possible 
 > values (2^16 + a few bytes).
 > The data in question is a bpf program compiled with pcap_compile ..
 > This is probably not the most useful match but i think it beats u32 
 > because bpf syntax is already well-known
 > and very pleasant to use.
Thanks for the reference.  I'll read about it.  I assume you mean the
u32 I posted recently.
So far the bpf language doesn't strike me as pleasant to use compared
to the small language I made up for u32, but maybe that's just cause
I'm not used to it.  
I gather 2^16 is the maximum possible size of a bpf program.
How about supplying several different variants of your module,
one that has data size 128 bytes, another with size 1K, another 8K
and finally 64K.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* re: conntrack hash function comparison
  2003-01-21  6:19 ` conntrack hash function comparison Don Cohen
@ 2003-01-21 11:12   ` Jozsef Kadlecsik
  2003-01-21 16:59     ` Don Cohen
  0 siblings, 1 reply; 15+ messages in thread
From: Jozsef Kadlecsik @ 2003-01-21 11:12 UTC (permalink / raw)
  To: Don Cohen; +Cc: netfilter-devel

Hi,

On Mon, 20 Jan 2003, Don Cohen wrote:

>  > Please look trough the web pages and comment it.
>
> I see several hash functions have been added since I last looked.
> Aftr some search I see the code in http://bei.bof.de/
> (I suggest a reference to it from the advertised page.)

I added the reference to cttest to the frontpage as well.

The extended cttest source I used hasn't been sent back to Patrick yet,
because depending on the answer from the author, I might have to remove
the buzhash code from the source.

> This code looks a bit old.  Below is the version I'm using at
> present.  I suggest you try testing this version too.

Fine, I'll test it.

> I expect it will be a little slower (using long numbers) but
> a little better, for instance, on table sizes that are powers of 2.

I don't think we should restrict ourselves to hashes which work with any
kind of hashsize. Be the hast as fast as possible and as good as possible
:-).

> A few details:
> - The ABCD(EF) algorithms were actually suggested to me by Rich
> Schroeppel.

Then credicts goes to him and to you as well.

> - I don't see what you consider odd about the random results.

I believe the random hash should give a smooth distribution of the
entries. But for example in the case of the matrix dataset/8191 hash size
it gives:

Count   Length  CT
5       21      0.14%
3       20      0.22%
11      19      0.50%
29      18      1.19%
50      17      2.32%
99      16      4.43%
180     15      8.02%
290     14      13.41%
445     13      21.11%
662     12      31.67%
833     11      43.85%
1011    10      57.29%
1051    9       69.87%
1046    8       80.99%
911     7       89.47%
716     6       95.18%
442     5       98.12%
254     4       99.47%
105     3       99.89%
38      2       99.99%
7       1       100.00%
3       0       100.00%

The number of the longest hash buckets is out of the distribution - there
are more of them than the hash buckets just shorter than the longest one.
(As I wrote on the pages, we actually test the random number generator in
Linux too so I count these examples as the indication of little
weaknesses in the generator.) This is the very reason why I "measured" the
distances from the ideal hash instead.

> The idea behind ABCD(EF) is that the coefficients A,B,C,D(,E,F) are
> secrets, chosen randomly by each connection tracking machine.
> So far I don't know how this can be attacked.  My current code assumes
> that, if it can be attacked at all, this will at least take a long
> time.  The solution is that (1) I detect that the current coeffecients
> are not working as well as expected (simply that some bucket gets
> filled past a very unlikely threshold), and in that case (2) we choose
> new randome coeffecients and rehash everything (which probably takes
> less than a second unless you have millions of connections).

I don't really like the idea of rehashing if the hash function is
(so) strong.

What is your opinion on the abcdx variant, where the four random
multipliers were chosen so that at any bit postition in the four
numbers there are exactly two zeros and two ones (similarly to the
rtable in buzhash, http://www.cl.cam.ac.uk/users/kw217/pub/Hash.adt.ps.gz).

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Question: Variable sized matchinfo
  2003-01-21  6:40 ` Question: Variable sized matchinfo Don Cohen
@ 2003-01-21 11:42   ` Patrick McHardy
  2003-01-21 16:07     ` Laszlo Valko
  0 siblings, 1 reply; 15+ messages in thread
From: Patrick McHardy @ 2003-01-21 11:42 UTC (permalink / raw)
  To: Don Cohen; +Cc: netfilter-devel

Don Cohen wrote:

> > I want to write a match where it would be nice to pass a variable sized 
> > matchinfo to kernelspace.
>The answer when I asked this question in connection with u32 was no.
>In fact,
>I asked:
>  > Would it work to make that 10 into a module parm (or three) ?
>Harald answered:
>  unfortunately not.  The size of this structure needs to be known at
>  compile time of the kernel and iptables userspace (and they have to be
>  the same, obviously).
>

Oops, i probably missed that.
Anyway this doesn't seem to be real problem, you could just pass pointers
and copy_from_user the data. The probleme there is the match is not informed
if its not needed anymore, so no chance to free the memory.

>
> > Is this possible ? I want to avoid always using the largest possible 
> > values (2^16 + a few bytes).
> > The data in question is a bpf program compiled with pcap_compile ..
> > This is probably not the most useful match but i think it beats u32 
> > because bpf syntax is already well-known
> > and very pleasant to use.
>Thanks for the reference.  I'll read about it.  I assume you mean the
>u32 I posted recently.
>So far the bpf language doesn't strike me as pleasant to use compared
>to the small language I made up for u32, but maybe that's just cause
>I'm not used to it.  
>I gather 2^16 is the maximum possible size of a bpf program.
>

no they are not limited in size AFAIK. I recently portet PPP_FILTERs to 
isdn and
they chose a limit of 2^16 which sounds sane. But i missed its not 2^16 
bytes
but 2^16 * sizeof(struct sock_fprog). If you want to look at a example 
for using
sk_run_filter (matches bpf code in kernel) you can look at it at 
http://trash.net/~kaber

>How about supplying several different variants of your module,
>one that has data size 128 bytes, another with size 1K, another 8K
>and finally 64K.
>  
>

no i decided its too ugly to do stuff like this without beeing able to 
either pass a variable-sized
struct from userspace (the size-check is done by the module itself, so 
no problem there) or allocate
the memory in kernelspace myself (and free it afterwards). Both doesn't 
seem like a big problem to do,
but it's not worth it for a just-for-fun match. Another uglyness is it 
is not possible to display the bpf
code in userspace with iptables -L because its already compiled and 
optimized. IIRC you ran into the same
problem with u32.
If anyone would really like a bpf match tell me and i might reconsider.

Patrick

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Question: Variable sized matchinfo
  2003-01-21 11:42   ` Patrick McHardy
@ 2003-01-21 16:07     ` Laszlo Valko
  2003-01-21 16:32       ` [OT " Patrick McHardy
  0 siblings, 1 reply; 15+ messages in thread
From: Laszlo Valko @ 2003-01-21 16:07 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Don Cohen, netfilter-devel

On Tue, Jan 21, 2003 at 12:42:16PM +0100, Patrick McHardy wrote:
> Oops, i probably missed that.
> Anyway this doesn't seem to be real problem, you could just pass pointers
> and copy_from_user the data. The probleme there is the match is not informed
> if its not needed anymore, so no chance to free the memory.

But consider that the pointer on the kernel-side might not look like
what it looks like on the user-side!

See my patch earlier with a sparc64 compatibility fix. Basically,
you will have to implement a thunk which translates the 32-bit userspace
pointer sent from iptables to a 64-bit kernelspace pointer. The best
solution is to avoid pointers, as the pointer translation usually involves
translating the whole struct being passed because of the field
offset differences.

Laszlo

> 
> Patrick
> 
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [OT Re: Question: Variable sized matchinfo
  2003-01-21 16:07     ` Laszlo Valko
@ 2003-01-21 16:32       ` Patrick McHardy
  2003-01-21 16:46         ` Laszlo Valko
  0 siblings, 1 reply; 15+ messages in thread
From: Patrick McHardy @ 2003-01-21 16:32 UTC (permalink / raw)
  To: Laszlo Valko; +Cc: Don Cohen, netfilter-devel

Laszlo Valko wrote:

>On Tue, Jan 21, 2003 at 12:42:16PM +0100, Patrick McHardy wrote:
>  
>
>>Oops, i probably missed that.
>>Anyway this doesn't seem to be real problem, you could just pass pointers
>>and copy_from_user the data. The probleme there is the match is not informed
>>if its not needed anymore, so no chance to free the memory.
>>    
>>
>
>But consider that the pointer on the kernel-side might not look like
>what it looks like on the user-side!
>
>See my patch earlier with a sparc64 compatibility fix. Basically,
>you will have to implement a thunk which translates the 32-bit userspace
>pointer sent from iptables to a 64-bit kernelspace pointer. The best
>solution is to avoid pointers, as the pointer translation usually involves
>translating the whole struct being passed because of the field
>offset differences.
>
>Laszlo
>  
>
you're right of course .. one question: i see copy_from_user in many 
places in the kernel, often within with user supplied
addresses. Do all of these don't work on sparc64 ? what a horrible 
architecture ;)

regards,
patrick

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [OT Re: Question: Variable sized matchinfo
  2003-01-21 16:32       ` [OT " Patrick McHardy
@ 2003-01-21 16:46         ` Laszlo Valko
  0 siblings, 0 replies; 15+ messages in thread
From: Laszlo Valko @ 2003-01-21 16:46 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Don Cohen, netfilter-devel

On Tue, Jan 21, 2003 at 05:32:46PM +0100, Patrick McHardy wrote:
> you're right of course .. one question: i see copy_from_user in many 
> places in the kernel, often within with user supplied
> addresses. Do all of these don't work on sparc64 ? what a horrible 
> architecture ;)
> 
> regards,
> patrick
> 

Look at arch/sparc64/kernel/*.c and you'll see magic in there...
Just to have a small picture of the situation...

And imagine what will happen with x86-64! I doubt that there'll be
a compiler for that platform which produces working 64bit
userspace code real soon. It took about 5 years for sparc... :)))

Laszlo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* re: conntrack hash function comparison
  2003-01-21 11:12   ` Jozsef Kadlecsik
@ 2003-01-21 16:59     ` Don Cohen
  2003-01-31 11:41       ` Harald Welte
  0 siblings, 1 reply; 15+ messages in thread
From: Don Cohen @ 2003-01-21 16:59 UTC (permalink / raw)
  To: Jozsef Kadlecsik; +Cc: netfilter-devel

 > > I expect it will be a little slower (using long numbers) but
 > > a little better, for instance, on table sizes that are powers of 2.
 > 
 > I don't think we should restrict ourselves to hashes which work with any
 > kind of hashsize. Be the hast as fast as possible and as good as possible
 > :-).
If it's not going to work well on certain table sizes then you should
really do something to ensure that the table size is one of the good
ones.  At least make the default good and add a comment saying which
sizes are good/bad.

Perhaps I should tell you what the changes are supposed to be good for.
The code I saw in cttest (I only suppose you're using the same code)
multiplies 32 bit source IP by 32 bit random number.  The question is
then what part of the result is used.  If you do 32 bit adds then only
the last 32 bits if the product is used.  Note that, while the last
bit of source address affects all 32 of these result bits, the first
bit of source address affects only the first result bit, and that only
if the last bit of the random number is 1.  What I want is for every
bit of input to affect every bit of output.  I therefore now split the
64 bit result in two and add the two halves.  (Which has that effect.)
So before if the table size was, say, 2^13 (8K), only the last 13 bits
of the inputs would affect the result.  So 1.2.3.4 would hash to the 
same thing as 17.39.3.4, or even worse if you're unlucky with
endianness, 1.2.3.4 might hash the same as 1.2.17.39 !
In any case, regardless of the likelihood of this sort of thing
arising in legitimate data, it makes intentional attack a lot easier.

 > > - I don't see what you consider odd about the random results.
 > 
 > I believe the random hash should give a smooth distribution of the
 > entries. But for example in the case of the matrix dataset/8191 hash size
 > it gives:
 > 
 > Count   Length  CT
 > 5       21      0.14%
 > 3       20      0.22%
 > 11      19      0.50%
 > 29      18      1.19%
 > 50      17      2.32%
 > 99      16      4.43%
 > 180     15      8.02%
 > 290     14      13.41%
 > 445     13      21.11%
 > 662     12      31.67%
 > 833     11      43.85%
 > 1011    10      57.29%
 > 1051    9       69.87%
 > 1046    8       80.99%
 > 911     7       89.47%
 > 716     6       95.18%
 > 442     5       98.12%
 > 254     4       99.47%
 > 105     3       99.89%
 > 38      2       99.99%
 > 7       1       100.00%
 > 3       0       100.00%
 > 
 > The number of the longest hash buckets is out of the distribution - there
 > are more of them than the hash buckets just shorter than the longest one.

The expected number is "smooth", but of course you shouldn't expect a
given test to follow the expected number exactly.  This looks to me
like a perfectly reasonable data point.  Generally I'm happy if the
longest bucket is not much longer than expected.  That is, use the
formula from last summer's discussion to see the expected number of 
buckets of length 21, and if it's > .1 then don't worry.  If it's
between .1 and .001 then try a few more tests.  If it's 1e-12 then
your hash function is no good.

 > I don't really like the idea of rehashing if the hash function is
 > (so) strong.
At some point it's clearly worth rehashing.  You have to decide what
that point is.  If the function is so good and successfully resists
attack then you'll never (or hardly ever) need to rehash, so all you
pay is a tiny amount of space for code that's never used.
Whereas, if you're extremely unlucky, or a clever new attack method
comes along, that small amount of code will help you a whole lot.
You choose your table size and rehash threshold in order to make the 
search time acceptable (max length of bucket) and also the probability
of rehash very small. 
I think at the moment I use 20 as my rehash threshold and a table
size that's the number of entries I plan to put in it, so average
bucket occupancy <= 1.  
(Dusting off my old code, 20 entries in a given bucket in that case
seems to have probability ~ 1e-19.
So even with 1e6 buckets, not very likely.)

 > What is your opinion on the abcdx variant, where the four random
 > multipliers were chosen so that at any bit postition in the four
 > numbers there are exactly two zeros and two ones (similarly to the
 > rtable in buzhash, http://www.cl.cam.ac.uk/users/kw217/pub/Hash.adt.ps.gz).

Immediate reactions:
- I'd like to hear the rationale for this - what good is it supposed
to do?  Conversely, what problems arise from not doing it?
- Rich did mention to me that it would be good for the random numbers
to each contain approximately the same number of 1's and 0's.
It's at least clear that zero would be a bad choice.  Perhaps when
you choose random numbers you could just require, e.g., every byte
contain 3-5 1's (and 0's), i.e., eliminate the cases of 0,1,2, of
which there are 2+16+56=74, leaving 182 out of 256 possible values... 

Right now I simply rely on rehash to fix any such problems (or any
OTHER problems for that matter) that actually arise.  Since I argue
you really ought to have that as a backup anyway, I don't mind using
it to solve this problem.  I do recommend a log entry if you ever have
to rehash, and frequent rehash would be worth notification if you have
such a facility.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: conntrack hash function comparison
  2003-01-21 16:59     ` Don Cohen
@ 2003-01-31 11:41       ` Harald Welte
  2003-02-01  0:41         ` Don Cohen
  0 siblings, 1 reply; 15+ messages in thread
From: Harald Welte @ 2003-01-31 11:41 UTC (permalink / raw)
  To: Don Cohen; +Cc: Jozsef Kadlecsik, netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 1776 bytes --]

On Tue, Jan 21, 2003 at 08:59:12AM -0800, Don Cohen wrote:
>  > I don't really like the idea of rehashing if the hash function is
>  > (so) strong.
> At some point it's clearly worth rehashing.  You have to decide what
> that point is.  If the function is so good and successfully resists
> attack then you'll never (or hardly ever) need to rehash, so all you
> pay is a tiny amount of space for code that's never used.
> Whereas, if you're extremely unlucky, or a clever new attack method
> comes along, that small amount of code will help you a whole lot.
> You choose your table size and rehash threshold in order to make the 
> search time acceptable (max length of bucket) and also the probability
> of rehash very small. 

not that I totally disagree with rehashing being a good idea... but in
the case you add this 'small piece of code', you also introduce a new
vulnerability: If somebody can provoke the firewall to rehash all the
time ;(

> Right now I simply rely on rehash to fix any such problems (or any
> OTHER problems for that matter) that actually arise.  Since I argue
> you really ought to have that as a backup anyway, I don't mind using
> it to solve this problem.  I do recommend a log entry if you ever have
> to rehash, and frequent rehash would be worth notification if you have
> such a facility.

yup. rehash parameters should be a sysctl, and rehash statistics kept
somewhere in /proc.

-- 
Live long and prosper
- Harald Welte / laforge@gnumonks.org               http://www.gnumonks.org/
============================================================================
GCS/E/IT d- s-: a-- C+++ UL++++$ P+++ L++++$ E--- W- N++ o? K- w--- O- M- 
V-- PS+ PE-- Y+ PGP++ t++ 5-- !X !R tv-- b+++ DI? !D G+ e* h+ r% y+(*)

[-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: conntrack hash function comparison
  2003-01-31 11:41       ` Harald Welte
@ 2003-02-01  0:41         ` Don Cohen
  2003-02-01  6:57           ` Patrick Schaaf
  0 siblings, 1 reply; 15+ messages in thread
From: Don Cohen @ 2003-02-01  0:41 UTC (permalink / raw)
  To: Harald Welte; +Cc: Jozsef Kadlecsik, netfilter-devel

Harald Welte writes:
 > not that I totally disagree with rehashing being a good idea... but in
 > the case you add this 'small piece of code', you also introduce a new
 > vulnerability: If somebody can provoke the firewall to rehash all the
 > time ;(
The counter argument is that if they can do that then they have a way
of causing lots of connections to go into the same bucket, and allowing
that to happen is (at least in the extreme case) much worse than
rehashing.  Obviously it's also bad to rehash too eagerly.
I think the only question is where to put the threshold.

It's reasonable to make that threshold read/writable via /proc.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: conntrack hash function comparison
  2003-02-01  0:41         ` Don Cohen
@ 2003-02-01  6:57           ` Patrick Schaaf
  2003-02-01  7:58             ` Don Cohen
  0 siblings, 1 reply; 15+ messages in thread
From: Patrick Schaaf @ 2003-02-01  6:57 UTC (permalink / raw)
  To: Don Cohen; +Cc: Harald Welte, Jozsef Kadlecsik, netfilter-devel

On Fri, Jan 31, 2003 at 04:41:28PM -0800, Don Cohen wrote:
> Harald Welte writes:
>  > not that I totally disagree with rehashing being a good idea... but in
>  > the case you add this 'small piece of code', you also introduce a new
>  > vulnerability: If somebody can provoke the firewall to rehash all the
>  > time ;(
> The counter argument is that if they can do that then they have a way
> of causing lots of connections to go into the same bucket, and allowing
> that to happen is (at least in the extreme case) much worse than
> rehashing.  Obviously it's also bad to rehash too eagerly.
> I think the only question is where to put the threshold.

I imagine that instead of, or in addition to, a threshold on list
lengths, I would like to have a "minimum time between rehashings" knob,
set to e.g. one hour. That way, the cost is guaranteed independant of
packet arrival rate. And, to practically disable rehashing, that knob
can be set to an immensely large value.

Hmm. Maybe even with a printk() 10 (N) minutes in advance, telling the
admin "I think I'm going to rehash a bit, soon".

No. Forget about all that.

1) metrics in /proc, including a "max list length during the last hour"
   statistic.
2) one read/writable /proc entry, containing an integer: the number of
   hash buckets. This is missing right now, anyway (have to grep the
   boot messages). Make it writable, and rehash exactly when a write
   changes the hash bucket count.

This way, the policy (when to rehash) is safely in user space, the
kernel gives mechanism, only - just like it should.

What do you think?

best regards
  Patrick

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: conntrack hash function comparison
  2003-02-01  6:57           ` Patrick Schaaf
@ 2003-02-01  7:58             ` Don Cohen
  2003-02-01  8:37               ` Harald Welte
  0 siblings, 1 reply; 15+ messages in thread
From: Don Cohen @ 2003-02-01  7:58 UTC (permalink / raw)
  To: Patrick Schaaf; +Cc: Harald Welte, Jozsef Kadlecsik, netfilter-devel

Patrick Schaaf writes:
 > On Fri, Jan 31, 2003 at 04:41:28PM -0800, Don Cohen wrote:
 > > Harald Welte writes:
 > >  > not that I totally disagree with rehashing being a good idea... but in
 > >  > the case you add this 'small piece of code', you also introduce a new
 > >  > vulnerability: If somebody can provoke the firewall to rehash all the
 > >  > time ;(
 > > The counter argument is that if they can do that then they have a way
 > > of causing lots of connections to go into the same bucket, and allowing
 > > that to happen is (at least in the extreme case) much worse than
 > > rehashing.  Obviously it's also bad to rehash too eagerly.
 > > I think the only question is where to put the threshold.
 > 
 > I imagine that instead of, or in addition to, a threshold on list
 > lengths, I would like to have a "minimum time between rehashings" knob,
 > set to e.g. one hour. That way, the cost is guaranteed independant of
 > packet arrival rate. And, to practically disable rehashing, that knob
 > can be set to an immensely large value.

I don't understand the utility of such a control.
Why should you rehash if the table is doing ok?
Conversely, if it's not, why should you suffer for an hour?

 > 1) metrics in /proc, including a "max list length during the last hour"
 >    statistic.
not very practical if you think about it
More reasonable: longest bucket length since last rehash
If you really want to take the time to look at such things, I'd suggest
a list of lengths of all buckets.

 > 2) one read/writable /proc entry, containing an integer: the number of
 >    hash buckets. This is missing right now, anyway (have to grep the
 >    boot messages). Make it writable, and rehash exactly when a write
 >    changes the hash bucket count.
It's currently a module parameter?  Not that it would be so hard to
change it at run time, but is that really so useful?
You could unload the module and reload it with a different parameter.  
If the complaint is that you then lose the data, I suggest using the
proc file to save and restore (copy the /proc data to a real file, then
copy it back after unload/reload).  That would even let you save and 
restore over a reboot.


 > This way, the policy (when to rehash) is safely in user space, the
 > kernel gives mechanism, only - just like it should.
/proc control over the rehash threshold (max bucket size) seems like
an appropriate "control over policy".  
Whereas a user space program doing something to cause a rehash seems
to me to be a bad choice.  (For one thing, when the kernel gets into 
trouble the user space program might never have a chance to run.)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: conntrack hash function comparison
  2003-02-01  7:58             ` Don Cohen
@ 2003-02-01  8:37               ` Harald Welte
  2003-02-01 19:05                 ` Don Cohen
  0 siblings, 1 reply; 15+ messages in thread
From: Harald Welte @ 2003-02-01  8:37 UTC (permalink / raw)
  To: Don Cohen; +Cc: Patrick Schaaf, Jozsef Kadlecsik, netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 2503 bytes --]

On Fri, Jan 31, 2003 at 11:58:37PM -0800, Don Cohen wrote:

> not very practical if you think about it
> More reasonable: longest bucket length since last rehash
> If you really want to take the time to look at such things, I'd suggest
> a list of lengths of all buckets.

I think that whas what Jozsef really meant.

>  > 2) one read/writable /proc entry, containing an integer: the number of
>  >    hash buckets. This is missing right now, anyway (have to grep the
>  >    boot messages). Make it writable, and rehash exactly when a write
>  >    changes the hash bucket count.
> It's currently a module parameter?  

yes, it is.  

> Not that it would be so hard to
> change it at run time, but is that really so useful?

Especially in the case where it is not a kernel module ;)

> You could unload the module and reload it with a different parameter.  
> If the complaint is that you then lose the data, I suggest using the
> proc file to save and restore (copy the /proc data to a real file, then
> copy it back after unload/reload).  That would even let you save and 
> restore over a reboot.

no way. no way. no way.
sorry, but read the various discussions about the /proc/ip_conntrack
interface in general, think about all the data that is not printed (like
references to currently in-kernel packets, connection between
expectations and master connection, ...  lots of those issues are
relevant for conntrack failover as well.  This is only viable via
ctnetlink, which is not part of the stock kernel yet.

>  > This way, the policy (when to rehash) is safely in user space, the
>  > kernel gives mechanism, only - just like it should.
> /proc control over the rehash threshold (max bucket size) seems like
> an appropriate "control over policy".  
> Whereas a user space program doing something to cause a rehash seems
> to me to be a bad choice.  (For one thing, when the kernel gets into 
> trouble the user space program might never have a chance to run.)

yes, I agree.  and that's why i like the idea of a 'min rehash interval'
proc parameter.  If you set this to '0', kernel will rehash never.  '1'
means maximum of one rehash per second, ...

-- 
- Harald Welte / laforge@gnumonks.org               http://www.gnumonks.org/
============================================================================
"If this were a dictatorship, it'd be a heck of a lot easier, just so long
 as I'm the dictator."  --  George W. Bush Dec 18, 2000

[-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: conntrack hash function comparison
  2003-02-01  8:37               ` Harald Welte
@ 2003-02-01 19:05                 ` Don Cohen
  2003-02-02 22:45                   ` Harald Welte
  0 siblings, 1 reply; 15+ messages in thread
From: Don Cohen @ 2003-02-01 19:05 UTC (permalink / raw)
  To: Harald Welte; +Cc: Patrick Schaaf, Jozsef Kadlecsik, netfilter-devel

Harald Welte writes:

 > > Not that it would be so hard to change [hash table size]
     at run time, but is that really so useful?
 > Especially in the case where it is not a kernel module ;)

Well, then you need a way to initialize it.  Is there a standard
solution to this problem?  Or is the inability to choose such values
just the penalty you pay for choosing Y instead of M?

 > > You could unload the module and reload it with a different parameter.  
 > > If the complaint is that you then lose the data, I suggest using the
 > > proc file to save and restore (copy the /proc data to a real file, then
 > > copy it back after unload/reload).  That would even let you save and 
 > > restore over a reboot.
 > 
 > no way. no way. no way.
 > sorry, but read the various discussions about the /proc/ip_conntrack
 > interface in general, think about all the data that is not printed (like
 > references to currently in-kernel packets, connection between
 > expectations and master connection, ...  lots of those issues are
 > relevant for conntrack failover as well.  This is only viable via
 > ctnetlink, which is not part of the stock kernel yet.

This was meant as a general suggestion, not specific to conntrack.

I don't know all the details specific to conntrack.  It seems
surprising that you couldn't save all relevant state via /proc.
Even more so if you think you could do it with netlink.
The stuff above suggests that you're worried about highly volatile
state, but the suggestion is meant only for "stable" states.  
It makes sense to remember connections across a reboot (if it doesn't 
take too long), not packets.  You'd have to effectively disable
networking (so there are no packets to save) before saving state.  

I plan to try it for my module that does conntrack-like things.

 > >  > This way, the policy (when to rehash) is safely in user space, the
 > >  > kernel gives mechanism, only - just like it should.
 > > /proc control over the rehash threshold (max bucket size) seems like
 > > an appropriate "control over policy".  
 > > Whereas a user space program doing something to cause a rehash seems
 > > to me to be a bad choice.  (For one thing, when the kernel gets into 
 > > trouble the user space program might never have a chance to run.)
 > 
 > yes, I agree.  and that's why i like the idea of a 'min rehash interval'
 > proc parameter.  If you set this to '0', kernel will rehash never.  '1'
 > means maximum of one rehash per second, ...

I think I finally understand what this is all about.
It's not to automatically rehash after time t, but rather to 
make sure you don't spend all of your time rehashing. 
This seems reasonable.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: conntrack hash function comparison
  2003-02-01 19:05                 ` Don Cohen
@ 2003-02-02 22:45                   ` Harald Welte
  0 siblings, 0 replies; 15+ messages in thread
From: Harald Welte @ 2003-02-02 22:45 UTC (permalink / raw)
  To: Don Cohen; +Cc: Patrick Schaaf, Jozsef Kadlecsik, netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 1438 bytes --]

On Sat, Feb 01, 2003 at 11:05:54AM -0800, Don Cohen wrote:
> Harald Welte writes:
> 
>  > > Not that it would be so hard to change [hash table size]
>      at run time, but is that really so useful?
>  > Especially in the case where it is not a kernel module ;)
> 
> Well, then you need a way to initialize it.  Is there a standard
> solution to this problem?  Or is the inability to choose such values
> just the penalty you pay for choosing Y instead of M?

no, there is no standard solution.  And yes, compilng conntrack into the
kernel instead of dynamically loading it as a module _is_ a performance
improvement.  On x86 it's negligible - but on ppc it should make a
measurable difference.

>  > yes, I agree.  and that's why i like the idea of a 'min rehash interval'
>  > proc parameter.  If you set this to '0', kernel will rehash never.  '1'
>  > means maximum of one rehash per second, ...
> 
> I think I finally understand what this is all about.
> It's not to automatically rehash after time t, but rather to 
> make sure you don't spend all of your time rehashing. 

exactly.

> This seems reasonable.

-- 
- Harald Welte / laforge@gnumonks.org               http://www.gnumonks.org/
============================================================================
"If this were a dictatorship, it'd be a heck of a lot easier, just so long
 as I'm the dictator."  --  George W. Bush Dec 18, 2000

[-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2003-02-02 22:45 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20030120232634.24011.21218.Mailman@kashyyyk>
2003-01-21  6:19 ` conntrack hash function comparison Don Cohen
2003-01-21 11:12   ` Jozsef Kadlecsik
2003-01-21 16:59     ` Don Cohen
2003-01-31 11:41       ` Harald Welte
2003-02-01  0:41         ` Don Cohen
2003-02-01  6:57           ` Patrick Schaaf
2003-02-01  7:58             ` Don Cohen
2003-02-01  8:37               ` Harald Welte
2003-02-01 19:05                 ` Don Cohen
2003-02-02 22:45                   ` Harald Welte
2003-01-21  6:40 ` Question: Variable sized matchinfo Don Cohen
2003-01-21 11:42   ` Patrick McHardy
2003-01-21 16:07     ` Laszlo Valko
2003-01-21 16:32       ` [OT " Patrick McHardy
2003-01-21 16:46         ` Laszlo Valko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.