* bloom filter in netfilter?
@ 2007-03-20 15:07 Sebastien Tandel
2007-03-20 15:20 ` Patrick McHardy
2007-03-20 15:25 ` Pablo Neira Ayuso
0 siblings, 2 replies; 12+ messages in thread
From: Sebastien Tandel @ 2007-03-20 15:07 UTC (permalink / raw)
To: netfilter-devel
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
Hi,
I'm wondering if bloom filters could not improve performance of the
conntracker. For a quick overwiew of bloom filters see
http://www.eecs.harvard.edu/~michaelm/NEWWORK/postscripts/BloomFilterSurvey.pdf
In a few words, a bloom filter is a data structure which represents
concisely a set. When you have a set, you can decide very quickly if an
element belongs to it.
I was then wondering if we could not get rid of these two
list_for_each_entry in the __nf_conntrack_confirm by using the bloom
filters.
You can use it as a key for a hash table as well. And therefore we may
use them at a later stage for the hash table as well. But let's see
first if you're interested by the previous idea ;)
What do you think?
Regards,
Sebastien Tandel
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFF//jDw76McB8jGxkRClszAJ0Zp2Jw1ctmldL3EX4NPhNchz9OZQCgluPh
i8edz1Zv/JuBUso/mqTx4EQ=
=xkDc
-----END PGP SIGNATURE-----
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: bloom filter in netfilter?
2007-03-20 15:07 bloom filter in netfilter? Sebastien Tandel
@ 2007-03-20 15:20 ` Patrick McHardy
2007-03-20 15:25 ` Pablo Neira Ayuso
1 sibling, 0 replies; 12+ messages in thread
From: Patrick McHardy @ 2007-03-20 15:20 UTC (permalink / raw)
To: Sebastien Tandel; +Cc: netfilter-devel
Sebastien Tandel wrote:
> I'm wondering if bloom filters could not improve performance of the
> conntracker. For a quick overwiew of bloom filters see
> http://www.eecs.harvard.edu/~michaelm/NEWWORK/postscripts/BloomFilterSurvey.pdf
>
> In a few words, a bloom filter is a data structure which represents
> concisely a set. When you have a set, you can decide very quickly if an
> element belongs to it.
>
> I was then wondering if we could not get rid of these two
> list_for_each_entry in the __nf_conntrack_confirm by using the bloom
> filters.
With a properly sizes hash table the buckets should have only
a single entry on average. The problem with bloom filters is
that its quite expensive to calculate enough hash bits for
a large filter with a low probability of false positives.
> You can use it as a key for a hash table as well. And therefore we may
> use them at a later stage for the hash table as well. But let's see
> first if you're interested by the previous idea ;)
>
> What do you think?
I thought about using them for caching packet classification results
(DROP/ACCEPT) some time ago. Never got around to try it though.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: bloom filter in netfilter?
2007-03-20 15:07 bloom filter in netfilter? Sebastien Tandel
2007-03-20 15:20 ` Patrick McHardy
@ 2007-03-20 15:25 ` Pablo Neira Ayuso
2007-03-20 15:26 ` Patrick McHardy
1 sibling, 1 reply; 12+ messages in thread
From: Pablo Neira Ayuso @ 2007-03-20 15:25 UTC (permalink / raw)
To: Sebastien Tandel; +Cc: netfilter-devel
Hi Sebastien,
Sebastien Tandel wrote:
> I'm wondering if bloom filters could not improve performance of the
> conntracker. For a quick overwiew of bloom filters see
> http://www.eecs.harvard.edu/~michaelm/NEWWORK/postscripts/BloomFilterSurvey.pdf
Yes, I know that work.
> In a few words, a bloom filter is a data structure which represents
> concisely a set. When you have a set, you can decide very quickly if an
> element belongs to it.
>
> I was then wondering if we could not get rid of these two
> list_for_each_entry in the __nf_conntrack_confirm by using the bloom
> filters.
We can't just get rid of it since bloom filters have false positives, so
it could happen that we could miss some new connections that are not
actually in the conntrack table.
--
The dawn of the fourth age of Linux firewalling is coming; a time of
great struggle and heroic deeds -- J.Kadlecsik got inspired by J.Morris
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: bloom filter in netfilter?
2007-03-20 15:25 ` Pablo Neira Ayuso
@ 2007-03-20 15:26 ` Patrick McHardy
2007-03-20 15:34 ` Patrick Schaaf
0 siblings, 1 reply; 12+ messages in thread
From: Patrick McHardy @ 2007-03-20 15:26 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: netfilter-devel
Pablo Neira Ayuso wrote:
>> I was then wondering if we could not get rid of these two
>> list_for_each_entry in the __nf_conntrack_confirm by using the bloom
>> filters.
>
>
> We can't just get rid of it since bloom filters have false positives, so
> it could happen that we could miss some new connections that are not
> actually in the conntrack table.
That wouldn't be a big problem in my opinion, you can freely tune the
probability.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: bloom filter in netfilter?
2007-03-20 15:26 ` Patrick McHardy
@ 2007-03-20 15:34 ` Patrick Schaaf
2007-03-20 15:43 ` Patrick McHardy
0 siblings, 1 reply; 12+ messages in thread
From: Patrick Schaaf @ 2007-03-20 15:34 UTC (permalink / raw)
To: Patrick McHardy; +Cc: netfilter-devel, Pablo Neira Ayuso
> > We can't just get rid of it since bloom filters have false positives, so
> > it could happen that we could miss some new connections that are not
> > actually in the conntrack table.
>
> That wouldn't be a big problem in my opinion, you can freely tune the
> probability.
Wouldn't two conntrack entries in the hash for the same tuples be
catastrophic somewhere? (I have no idea, actually)
The other thing about bloom filters that worries me, having read
the wikipedia entry, is that you cannot remove elements. Probability
of false positives thus increases as conntracks are added. And a
future repetition of exactly the same tuple would certainly be
a false positive. Tuples repeat pretty fast with things like BGP,
and a filter regeneration seems to involve shuffling all current
conntracks into a fresh bloom filter.
Either I misunderstand something fundamental, which is not
unlikely, or that kills the idea.
best regards
Patrick
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: bloom filter in netfilter?
2007-03-20 15:34 ` Patrick Schaaf
@ 2007-03-20 15:43 ` Patrick McHardy
0 siblings, 0 replies; 12+ messages in thread
From: Patrick McHardy @ 2007-03-20 15:43 UTC (permalink / raw)
To: Patrick Schaaf; +Cc: netfilter-devel, Pablo Neira Ayuso
Patrick Schaaf wrote:
>>>We can't just get rid of it since bloom filters have false positives, so
>>>it could happen that we could miss some new connections that are not
>>>actually in the conntrack table.
>>
>>That wouldn't be a big problem in my opinion, you can freely tune the
>>probability.
>
>
> Wouldn't two conntrack entries in the hash for the same tuples be
> catastrophic somewhere? (I have no idea, actually)
Yes, but bloom filters only have false positives (already in
hash, so skip confirmation), never false negatives, so that
would still not happen.
> The other thing about bloom filters that worries me, having read
> the wikipedia entry, is that you cannot remove elements. Probability
> of false positives thus increases as conntracks are added. And a
> future repetition of exactly the same tuple would certainly be
> a false positive. Tuples repeat pretty fast with things like BGP,
> and a filter regeneration seems to involve shuffling all current
> conntracks into a fresh bloom filter.
>
> Either I misunderstand something fundamental, which is not
> unlikely, or that kills the idea.
You're right, tuple reuse is a problem, it seems counting bloom
filters would be needed to deal with that, which are a lot less
memory efficient. The increasing probabilities of false positives
with increasing number of entries could be dealt with by using a
second bloom filter that is filled with new entries once a low
threshold is exceeded and replaces the active one once a high
threshold is exceeded.
I have to admit that I'm a huge fan of bloom filters and always
wanted to use them for something cool :)
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: bloom filter in netfilter?
@ 2007-03-20 16:31 Robert Iakobashvili
0 siblings, 0 replies; 12+ messages in thread
From: Robert Iakobashvili @ 2007-03-20 16:31 UTC (permalink / raw)
To: netfilter-devel
> You're right, tuple reuse is a problem, it seems counting bloom
> filters would be needed to deal with that, which are a lot less
> memory efficient. The increasing probabilities of false positives
> with increasing number of entries could be dealt with by using a
> second bloom filter that is filled with new entries once a low
> threshold is exceeded and replaces the active one once a high
> threshold is exceeded.
>
> I have to admit that I'm a huge fan of bloom filters and always
> wanted to use them for something cool :)
Recent paper from Luca Deri could be relevant here:
http://luca.ntop.org/Blooms.pdf
--
Sincerely,
Robert Iakobashvili,
coroberti %x40 gmail %x2e com
...................................................................
Navigare necesse est, vivere non est necesse
...................................................................
http://curl-loader.sourceforge.net
A powerful open-source HTTP/S, FTP/S traffic
generating, loading and testing tool.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: bloom filter in netfilter?
2007-03-20 16:37 [Fwd: Re: bloom filter in netfilter?] Sebastien Tandel
@ 2007-03-20 19:27 ` Pablo Neira Ayuso
2007-03-20 21:41 ` Sebastien Tandel
2007-03-21 12:46 ` Sebastien Tandel
0 siblings, 2 replies; 12+ messages in thread
From: Pablo Neira Ayuso @ 2007-03-20 19:27 UTC (permalink / raw)
To: Sebastien Tandel; +Cc: netfilter-devel
Sebastien Tandel wrote:
> Patrick McHardy wrote:
>> That wouldn't be a big problem in my opinion, you can freely tune the
>> probability.
>
> In the specific case I was speaking about, you don't expect to find
> anything. Therefore, as Patrick says, if you tune the probability of
> false positives, you should not expect ones really often. If one occurs,
> of course, you have to verify it in the list.
But the case in which we don't expect to find anything is the worst
case, eg. someone is generating trash traffic to stress the conntrack
system. So this can be considered as a hardening technique. Anyway, we
would need to evaluate the impact of such solution in terms of memory
consumption (probably something from 4KB to 16KB) and extra CPU cycles
due to hashing. Of course, previously we would have to get some numbers
to know how bad is currently the worst case.
I have some experimental stuff on bloom filters at my people netfilter
place that I needed for some works here at the university, it can't be
used for any productive purposes but you could use it as a starting point.
--
The dawn of the fourth age of Linux firewalling is coming; a time of
great struggle and heroic deeds -- J.Kadlecsik got inspired by J.Morris
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: bloom filter in netfilter?
2007-03-20 19:27 ` bloom filter in netfilter? Pablo Neira Ayuso
@ 2007-03-20 21:41 ` Sebastien Tandel
2007-03-21 12:45 ` Sebastien Tandel
2007-03-21 12:46 ` Sebastien Tandel
1 sibling, 1 reply; 12+ messages in thread
From: Sebastien Tandel @ 2007-03-20 21:41 UTC (permalink / raw)
To: netfilter-devel
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
>>> That wouldn't be a big problem in my opinion, you can freely tune
>>> the probability.
>> >>
>> >> In the specific case I was speaking about, you don't expect to
>> >> find anything. Therefore, as Patrick says, if you tune the
>> >> probability of false positives, you should not expect ones really
>> >> often. If one occurs, of course, you have to verify it in the
>> >> list.
> >
> > But the case in which we don't expect to find anything is the worst
> > case, eg. someone is generating trash traffic to stress the
> > conntrack system. So this can be considered as a hardening
> > technique.
In the case of the traversal of the two lists in "nf_conntrack_confirm"?
You're not expecting the race from NAT at every call, do you? Or have I
missed something in the code?
> > Anyway, we would need to evaluate the impact of such solution in
> > terms of memory consumption (probably something from 4KB to 16KB)
> > and extra CPU cycles due to hashing.
Yes, of course I do not imagine that it will be very useful for people
at home with only a few connections at a time.
> > Of course, previously we would have to get some numbers
> > to know how bad is currently the worst case.
Indeed, it would be good.
> > I have some experimental stuff on bloom filters at my people
> > netfilter place that I needed for some works here at the university,
> > it can't be used for any productive purposes but you could use it as
> > a starting point.
Why? what are your deceptions about these bloomy things?
I've looked at your bloom-experiments.tgz. Do you think one sha1 is
slower than this implementation of hash in your modulo.c? Do you really
think it will be random?
One sha1 computation should suffice to compute what you need. The digest
is of 160 bits. If you want 216 buckets, you may generate 10 bits hash.
Could the sha1 algorithm from the kernel take advantage of cryptography
dedicated chipset?
One another nice property of counting bloom filters is that you have the
opportunity to perform some kind of "repartition" in your hash table.
But as Patrick has mentioned earlier, the drawback is the memory
consumption, if you want to avoid this memory consumption, it will take
much more CPU cycles.
Regards,
Sebastien Tandel
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFGAFURw76McB8jGxkRCimEAJ9UV8Hl0ixV8vz0LvYxO57d7AvO+gCfZ8lm
ReyT0HwmqlxIi3oi3iTqFbA=
=SV8r
-----END PGP SIGNATURE-----
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: bloom filter in netfilter?
2007-03-20 21:41 ` Sebastien Tandel
@ 2007-03-21 12:45 ` Sebastien Tandel
0 siblings, 0 replies; 12+ messages in thread
From: Sebastien Tandel @ 2007-03-21 12:45 UTC (permalink / raw)
To: netfilter-devel
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
one must read 2^16 buckets and not 216 ... of course.
Sebastien Tandel wrote:
>>>> That wouldn't be a big problem in my opinion, you can freely tune
>>>> the probability.
>>>>> In the specific case I was speaking about, you don't expect to
>>>>> find anything. Therefore, as Patrick says, if you tune the
>>>>> probability of false positives, you should not expect ones really
>>>>> often. If one occurs, of course, you have to verify it in the
>>>>> list.
>>> But the case in which we don't expect to find anything is the worst
>>> case, eg. someone is generating trash traffic to stress the
>>> conntrack system. So this can be considered as a hardening
>>> technique.
>
> In the case of the traversal of the two lists in "nf_conntrack_confirm"?
> You're not expecting the race from NAT at every call, do you? Or have I
> missed something in the code?
>
>>> Anyway, we would need to evaluate the impact of such solution in
>>> terms of memory consumption (probably something from 4KB to 16KB)
>>> and extra CPU cycles due to hashing.
>
> Yes, of course I do not imagine that it will be very useful for people
> at home with only a few connections at a time.
>
>>> Of course, previously we would have to get some numbers
>>> to know how bad is currently the worst case.
>
> Indeed, it would be good.
>
>>> I have some experimental stuff on bloom filters at my people
>>> netfilter place that I needed for some works here at the university,
>>> it can't be used for any productive purposes but you could use it as
>>> a starting point.
>
> Why? what are your deceptions about these bloomy things?
>
> I've looked at your bloom-experiments.tgz. Do you think one sha1 is
> slower than this implementation of hash in your modulo.c? Do you really
> think it will be random?
> One sha1 computation should suffice to compute what you need. The digest
> is of 160 bits. If you want 216 buckets, you may generate 10 bits hash.
> Could the sha1 algorithm from the kernel take advantage of cryptography
> dedicated chipset?
>
> One another nice property of counting bloom filters is that you have the
> opportunity to perform some kind of "repartition" in your hash table.
> But as Patrick has mentioned earlier, the drawback is the memory
> consumption, if you want to avoid this memory consumption, it will take
> much more CPU cycles.
>
>
>
> Regards,
> Sebastien Tandel
>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFGASjbw76McB8jGxkRCuMKAJ9cpfePn3/jB5gApIRX5ZXUVdSBlgCeOI4O
7xQUdwQjMfWse1lPG/oKSHs=
=9wZr
-----END PGP SIGNATURE-----
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: bloom filter in netfilter?
2007-03-20 19:27 ` bloom filter in netfilter? Pablo Neira Ayuso
2007-03-20 21:41 ` Sebastien Tandel
@ 2007-03-21 12:46 ` Sebastien Tandel
2007-03-21 15:00 ` Jozsef Kadlecsik
1 sibling, 1 reply; 12+ messages in thread
From: Sebastien Tandel @ 2007-03-21 12:46 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: netfilter-devel
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
one potential bad news ? :)
http://people.netfilter.org/kadlec/hashtrie/
see the shared node case
Regards,
Sebastien Tandel
Pablo Neira Ayuso wrote:
> Sebastien Tandel wrote:
>> Patrick McHardy wrote:
>>> That wouldn't be a big problem in my opinion, you can freely tune the
>>> probability.
>>
>> In the specific case I was speaking about, you don't expect to find
>> anything. Therefore, as Patrick says, if you tune the probability of
>> false positives, you should not expect ones really often. If one occurs,
>> of course, you have to verify it in the list.
>
> But the case in which we don't expect to find anything is the worst
> case, eg. someone is generating trash traffic to stress the conntrack
> system. So this can be considered as a hardening technique. Anyway, we
> would need to evaluate the impact of such solution in terms of memory
> consumption (probably something from 4KB to 16KB) and extra CPU cycles
> due to hashing. Of course, previously we would have to get some numbers
> to know how bad is currently the worst case.
>
> I have some experimental stuff on bloom filters at my people netfilter
> place that I needed for some works here at the university, it can't be
> used for any productive purposes but you could use it as a starting point.
>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFGASkxw76McB8jGxkRCp46AKCaRRhop9jvHLjKltmq+Q4VzGbQswCfZbVi
2hl1DfEw/vzotcOPMLEFGo8=
=e95M
-----END PGP SIGNATURE-----
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: bloom filter in netfilter?
2007-03-21 12:46 ` Sebastien Tandel
@ 2007-03-21 15:00 ` Jozsef Kadlecsik
0 siblings, 0 replies; 12+ messages in thread
From: Jozsef Kadlecsik @ 2007-03-21 15:00 UTC (permalink / raw)
To: Sebastien Tandel; +Cc: netfilter-devel, Pablo Neira Ayuso
On Wed, 21 Mar 2007, Sebastien Tandel wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
>
> one potential bad news ? :)
>
> http://people.netfilter.org/kadlec/hashtrie/
>
> see the shared node case
As I wrote there, the bad results might be my fault. Feel free anyone to
plug in another bloom filter implementation into the code and test it.
Best regards,
Jozsef
-
E-mail : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
H-1525 Budapest 114, POB. 49, Hungary
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2007-03-21 15:00 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-03-20 15:07 bloom filter in netfilter? Sebastien Tandel
2007-03-20 15:20 ` Patrick McHardy
2007-03-20 15:25 ` Pablo Neira Ayuso
2007-03-20 15:26 ` Patrick McHardy
2007-03-20 15:34 ` Patrick Schaaf
2007-03-20 15:43 ` Patrick McHardy
-- strict thread matches above, loose matches on Subject: below --
2007-03-20 16:31 Robert Iakobashvili
2007-03-20 16:37 [Fwd: Re: bloom filter in netfilter?] Sebastien Tandel
2007-03-20 19:27 ` bloom filter in netfilter? Pablo Neira Ayuso
2007-03-20 21:41 ` Sebastien Tandel
2007-03-21 12:45 ` Sebastien Tandel
2007-03-21 12:46 ` Sebastien Tandel
2007-03-21 15:00 ` Jozsef Kadlecsik
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.