[Fwd: [ANNOUNCE] Layer-7 Filter for Linux QoS]

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [Fwd: [ANNOUNCE] Layer-7 Filter for Linux QoS]
@ 2003-05-19  3:01 David S. Miller
  2003-05-20  0:38 ` Jamal Hadi
  0 siblings, 1 reply; 19+ messages in thread
From: David S. Miller @ 2003-05-19  3:01 UTC (permalink / raw)
  To: linux-net; +Cc: netdev, sommere

[-- Attachment #1: Type: text/plain, Size: 266 bytes --]

I'm forwarding Ethan's announcement here.  Ethan, you'll get
better reception to your ideas if you post them to the correct
place.  Most networking hackers do not read linux-kernel due to
the sheer volume of traffic there :-)

-- 
David S. Miller <davem@redhat.com>

[-- Attachment #2: Forwarded message - [ANNOUNCE] Layer-7 Filter for Linux QoS --]
[-- Type: message/rfc822, Size: 3641 bytes --]

From: Ethan Sommer <sommere@ethanet.com>
To: linux-kernel@vger.kernel.org
Subject: [ANNOUNCE] Layer-7 Filter for Linux QoS
Date: Sun, 18 May 2003 21:23:45 -0500
Message-ID: <3EC84031.90300@ethanet.com>

We have written a filter for the QoS infrastructure that looks at the 
data segment of packets and uses regular expressions to identify the 
protocol of a stream of traffic regardless of port number.

Many peer-to-peer programs (such as Kazaa and Gnucleus) will change to 
use a different port (including well known ports such as, say, 80) if 
they find that they can get better throughput there. That means that the 
port based filtering is no longer sufficient. However, by analyzing the 
application layer data, we can differentiate Kazaa from non-Kazaa HTTP, 
and lower the priority of whichever we deem to be less important. :)

It is a filter in the existing QoS infrastructure, so it can be used in 
conjunction with u32 filters, HTB or CBQ scheduling, SFQ queueing etc, 
etc...

Commercial companies sell devices which do layer-7 classification for 
anywhere from $6000-$80,000 depending on the bandwidth required. If we 
can build a comprehensive set of patterns I don't see any reason why 
Linux can't beat the pants off the commercial devices; we already have 
excellent queueing, and scheduling.

Our home page is http://l7-filter.sourceforge.net/ but if you want to 
skip right to the downloads go to 
http://sourceforge.net/projects/l7-filter/ (there is a kernel patch, a 
patched version of tc, and some sample patterns for HTTP, POP3, IMAP, 
SSH, Kazaa, and FTP.) You'll notice the patch is a somewhat large, most 
of that is regexp code.

We're still working on it. It currently only does TCP for example... Do 
you guys/gals have any comments/suggestions/etc? I suspect that this is 
a post 2.6 thing, but it is very non-invasive (it only adds approx. 2 
lines of code that would affect anything if the user were not using the 
layer-7 filters,) so I still have a little bit of hope.

Ethan Sommer
Matt Strait
Justin Levandoski

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Fwd: [ANNOUNCE] Layer-7 Filter for Linux QoS]
  2003-05-19  3:01 [Fwd: [ANNOUNCE] Layer-7 Filter for Linux QoS] David S. Miller
@ 2003-05-20  0:38 ` Jamal Hadi
  2003-05-20  5:07   ` Ethan Sommer
  0 siblings, 1 reply; 19+ messages in thread
From: Jamal Hadi @ 2003-05-20  0:38 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-net, netdev, sommere

Hi,

The concept could be improved in my opinion.
_All_ filters could use the regex (finally cnews regex in the kernel,
yiiha). So what we need is this to be an extension to _all_ filters.
i.e define u32 filter then a regexp on a match with a start offset to
start scanning etc. I think we need to have extended matches in general
regardless of this. I have been thinking of it for sometime and even
implemented some simple prototype (u32 followed by a fwmark). This is one
of the nice concepts netfilter introduced in my opinion.

BTW, i am not sure how efficient the stuff that Henry spencer
wrote is in comparison to newer research on variants of boyer-moore
for regex searches. comment?

The code also does seem inefficient but thats beside the point at the
moment (ex you seem to malloc for every incoming packet so you can do a
regex).

cheers,
jamal

On Sun, 18 May 2003, David S. Miller wrote:

> I'm forwarding Ethan's announcement here.  Ethan, you'll get
> better reception to your ideas if you post them to the correct
> place.  Most networking hackers do not read linux-kernel due to
> the sheer volume of traffic there :-)
>
> --
> David S. Miller <davem@redhat.com>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Fwd: [ANNOUNCE] Layer-7 Filter for Linux QoS]
  2003-05-20  0:38 ` Jamal Hadi
@ 2003-05-20  5:07   ` Ethan Sommer
  2003-05-20 12:14     ` Jamal Hadi
  0 siblings, 1 reply; 19+ messages in thread
From: Ethan Sommer @ 2003-05-20  5:07 UTC (permalink / raw)
  To: Jamal Hadi; +Cc: David S. Miller, linux-net, netdev

Jamal Hadi wrote:

>Hi,
>
...

>BTW, i am not sure how efficient the stuff that Henry spencer
>wrote is in comparison to newer research on variants of boyer-moore
>for regex searches. comment?
>  
>
I picked Spencer's mainly because it was a pain to port the glibc 
version into kernel space. They are both basically the posix regexp 
interface, so if someone wants to get another version working in kernel 
space we can test and see which one performs better with real load 
pretty easily.


>The code also does seem inefficient but thats beside the point at the
>moment (ex you seem to malloc for every incoming packet so you can do a
>regex).
>  
>
Yep. We haven't spent a lot of time on optimizations. Obviously that 
example can be fixed pretty quickly... except I'm not sure we can avoid 
it and stay thread safe? (linux can route multiple packets at the same 
time on an smp box right? so we can't just use a staically defined 
buffer...)

Ethan Sommer
http://l7-filter.sf.net

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Fwd: [ANNOUNCE] Layer-7 Filter for Linux QoS]
  2003-05-20  5:07   ` Ethan Sommer
@ 2003-05-20 12:14     ` Jamal Hadi
  2003-05-20 14:39       ` Ethan Sommer
  0 siblings, 1 reply; 19+ messages in thread
From: Jamal Hadi @ 2003-05-20 12:14 UTC (permalink / raw)
  To: Ethan Sommer; +Cc: David S. Miller, linux-net, netdev



On Tue, 20 May 2003, Ethan Sommer wrote:

> I picked Spencer's mainly because it was a pain to port the glibc
> version into kernel space. They are both basically the posix regexp
> interface, so if someone wants to get another version working in kernel
> space we can test and see which one performs better with real load
> pretty easily.
>

Take a look at snorts implementation. I think snort is GPL so you may
just be able to borrow it.

>
> Yep. We haven't spent a lot of time on optimizations. Obviously that
> example can be fixed pretty quickly... except I'm not sure we can avoid
> it and stay thread safe? (linux can route multiple packets at the same
> time on an smp box right? so we can't just use a staically defined
> buffer...)
>

Your problem seesm to be the regexp scheme used. You dont need a static
buffer i think. You should be able to operate on an incoming packet
itself.

cheers,
jamal

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Fwd: [ANNOUNCE] Layer-7 Filter for Linux QoS]
  2003-05-20 12:14     ` Jamal Hadi
@ 2003-05-20 14:39       ` Ethan Sommer
  2003-05-20 15:00         ` Jamal Hadi
  0 siblings, 1 reply; 19+ messages in thread
From: Ethan Sommer @ 2003-05-20 14:39 UTC (permalink / raw)
  To: Jamal Hadi; +Cc: David S. Miller, linux-net, netdev

Jamal Hadi wrote:

>On Tue, 20 May 2003, Ethan Sommer wrote:
>  
>
>  
>
>>Yep. We haven't spent a lot of time on optimizations. Obviously that
>>example can be fixed pretty quickly... except I'm not sure we can avoid
>>it and stay thread safe? (linux can route multiple packets at the same
>>time on an smp box right? so we can't just use a staically defined
>>buffer...)
>>
>>    
>>
>
>Your problem seesm to be the regexp scheme used. You dont need a static
>buffer i think. You should be able to operate on an incoming packet
>itself.
>
>  
>
Nope. I need to strip out all the nulls from the packet, or any posix 
regex parser will think the string ends at the first null. (so protocols 
which use null's will be difficult/impossible to identify)

I could modify the regexec function to take a length, but then it 
wouldn't be the posix regexec prototype and I was hopeing someone would 
add those to the common library of kernel functions, so others could use 
them. (and hence make it easier to maintain.)

Ethan


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Fwd: [ANNOUNCE] Layer-7 Filter for Linux QoS]
  2003-05-20 14:39       ` Ethan Sommer
@ 2003-05-20 15:00         ` Jamal Hadi
  2003-05-20 15:15           ` Martin Josefsson
  2003-05-20 19:50           ` Ethan Sommer
  0 siblings, 2 replies; 19+ messages in thread
From: Jamal Hadi @ 2003-05-20 15:00 UTC (permalink / raw)
  To: Ethan Sommer; +Cc: David S. Miller, linux-net, netdev

On Tue, 20 May 2003, Ethan Sommer wrote:

> Nope. I need to strip out all the nulls from the packet, or any posix
> regex parser will think the string ends at the first null. (so protocols
> which use null's will be difficult/impossible to identify)

Ok, i see your dilema. How does snort do it? I dont think copying the
packet is the right way to do it. Could the null NOT be considered as
something speacial unless explicitly stated?

>
> I could modify the regexec function to take a length, but then it
> wouldn't be the posix regexec prototype and I was hopeing someone would
> add those to the common library of kernel functions, so others could use
> them. (and hence make it easier to maintain.)
>

This would be the first start. Check with the netfilter folks who are
famous for creating bread slicers - they may already have something along
these lines.
I am actually  interested in the kernel variant of such a
library. Actually once you have the library (which is efficient) we could
work together. I have some stuff cooking (and lotsa opinions on what i
would like to see in it that you could consider as requirements).

cheers,
jamal

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Fwd: [ANNOUNCE] Layer-7 Filter for Linux QoS]
  2003-05-20 15:00         ` Jamal Hadi
@ 2003-05-20 15:15           ` Martin Josefsson
  2003-05-21 12:39             ` Jamal Hadi
  2003-05-20 19:50           ` Ethan Sommer
  1 sibling, 1 reply; 19+ messages in thread
From: Martin Josefsson @ 2003-05-20 15:15 UTC (permalink / raw)
  To: Jamal Hadi; +Cc: Ethan Sommer, David S. Miller, linux-net, netdev

On Tue, 2003-05-20 at 17:00, Jamal Hadi wrote:
> On Tue, 20 May 2003, Ethan Sommer wrote:
> 
> > Nope. I need to strip out all the nulls from the packet, or any posix
> > regex parser will think the string ends at the first null. (so protocols
> > which use null's will be difficult/impossible to identify)
> 
> Ok, i see your dilema. How does snort do it? I dont think copying the
> packet is the right way to do it. Could the null NOT be considered as
> something speacial unless explicitly stated?

Maybe make it take a length parameter and if it's zero treat null's like
all other algorithms do and it's non-zero use the length instead.
Then you can hide it in a wrapper function for the "normal" case that
just calls the actual search-function but with 0 as length.

> >
> > I could modify the regexec function to take a length, but then it
> > wouldn't be the posix regexec prototype and I was hopeing someone would
> > add those to the common library of kernel functions, so others could use
> > them. (and hence make it easier to maintain.)
> >
> 
> This would be the first start. Check with the netfilter folks who are
> famous for creating bread slicers - they may already have something along
> these lines.
> I am actually  interested in the kernel variant of such a
> library. Actually once you have the library (which is efficient) we could
> work together. I have some stuff cooking (and lotsa opinions on what i
> would like to see in it that you could consider as requirements).

Well we don't have a that big bread slicer (yet) but take a look at
libqsearch, it is a library for searching and has been ported to the
linux kernel by the author. It has support for various algorithms that
have diffrent capabilities, unfortunately I don't think it has an
algorithm that has support for regexp yet (the framework is there, ie
the flag that says an algorithm supports regexp).
It's modular and I don't think it should be that hard to add an regexp
algorithm.
It looks quite nice and it can search for multiple strings at the same
time and call diffrent callbacks depending on which string matched.

-- 
/Martin

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Fwd: [ANNOUNCE] Layer-7 Filter for Linux QoS]
  2003-05-20 15:15           ` Martin Josefsson
@ 2003-05-21 12:39             ` Jamal Hadi
  2003-05-21 13:20               ` Philippe Biondi
  2003-05-21 15:42               ` Ethan Sommer
  0 siblings, 2 replies; 19+ messages in thread
From: Jamal Hadi @ 2003-05-21 12:39 UTC (permalink / raw)
  To: Martin Josefsson; +Cc: Ethan Sommer, David S. Miller, linux-net, netdev, biondi



On Tue, 20 May 2003, Martin Josefsson wrote:

> Maybe make it take a length parameter and if it's zero treat null's like
> all other algorithms do and it's non-zero use the length instead.
> Then you can hide it in a wrapper function for the "normal" case that
> just calls the actual search-function but with 0 as length.
>

Actually, the library that you pointed to seems to have callbacks
associated with every match - so it could be used on string matches.
The author is on the cc.

> Well we don't have a that big bread slicer (yet) but take a look at
> libqsearch, it is a library for searching and has been ported to the
> linux kernel by the author. It has support for various algorithms that

Didnt see anything kernel related in my quick scan.
The library certainly appears sane.

> have diffrent capabilities, unfortunately I don't think it has an
> algorithm that has support for regexp yet (the framework is there, ie
> the flag that says an algorithm supports regexp).
> It's modular and I don't think it should be that hard to add an regexp
> algorithm.

it does seems to imply regexp is available but wasnt anywhere i could
find.

> It looks quite nice and it can search for multiple strings at the same
> time and call diffrent callbacks depending on which string matched.
>

yep, can sed that packet easily with those callbacks ;-> s/val/val2/g

On Tue, 20 May 2003, Ethan Sommer wrote:

> One thing I should have pointed out earlier, it only copies that
> memory/does regex stuff until it finds a match or the first 8 packets,
> whichever is less. So, at least based on my tests, it doesn't seem to
> slow down 100BT much from what it would be otherwise. We might run into
> trouble if we look at GB or 10GB, but until we find a problem with
> speed, I think it is probably more important to make this as simple and
> easy to maintain as possible. If we see a need to make it more
> complicated due to speed issues, _then_ we should think about trying to
> get rid of that copy.
>

I think you should do some measurements - "it  doesnt slow 100Mbps" and
"lets worry about it when we get to 1 or 10Gbps" are handwaving at best.
Infact i would strongly recommend looking at the libqpsearch above.

cheers,
jamal

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Fwd: [ANNOUNCE] Layer-7 Filter for Linux QoS]
  2003-05-21 12:39             ` Jamal Hadi
@ 2003-05-21 13:20               ` Philippe Biondi
  2003-05-21 15:46                 ` Ethan Sommer
  2003-05-21 15:42               ` Ethan Sommer
  1 sibling, 1 reply; 19+ messages in thread
From: Philippe Biondi @ 2003-05-21 13:20 UTC (permalink / raw)
  To: Jamal Hadi
  Cc: Martin Josefsson, Ethan Sommer, David S. Miller, linux-net,
	netdev

Hi all,

> On Tue, 20 May 2003, Martin Josefsson wrote:
>
> > Maybe make it take a length parameter and if it's zero treat null's like
> > all other algorithms do and it's non-zero use the length instead.
> > Then you can hide it in a wrapper function for the "normal" case that
> > just calls the actual search-function but with 0 as length.
> >
>
> Actually, the library that you pointed to seems to have callbacks
> associated with every match - so it could be used on string matches.
> The author is on the cc.

There is only one callback, but it will be called with a per-pattern
cb_data pointer (and a per-search cb_data pointer too).


> > Well we don't have a that big bread slicer (yet) but take a look at
> > libqsearch, it is a library for searching and has been ported to the
> > linux kernel by the author. It has support for various algorithms that
>
> Didnt see anything kernel related in my quick scan.
> The library certainly appears sane.

Be sure you took the latest version :
cvs -d:pserver:anonymous@cvs.prelude-ids.org:/cvsroot/prelude co libqsearch

I confirm I ported it to kernel space. To be exact, I ported the API and
made a script that generate wrappers for algorithms, that are compiled
as-is for kernel space.

More infos here:
http://www.cartel-securite.fr/pbiondi/libqsearch.html

Lot more info here (presentation I made at FOSDEM03) :
http://www.cartel-securite.fr/pbiondi/conf/libqsearch.pdf


> > have diffrent capabilities, unfortunately I don't think it has an
> > algorithm that has support for regexp yet (the framework is there, ie
> > the flag that says an algorithm supports regexp).
> > It's modular and I don't think it should be that hard to add an regexp
> > algorithm.
>
> it does seems to imply regexp is available but wasnt anywhere i could
> find.

regexp support was planned but not done yet. (if someone know where I can
download more free time !).

The implementation should not be that hard, once you have the compiler to
transform the string describing the regexp to an automaton.

Note that to respect the framework, you have to deal with multiple
patterns (should not be that hard). If you have pat1 and pat2, searching
for (pat1|pat2) is not sufficient because for each match, you have to
point which pattern matched.


> > It looks quite nice and it can search for multiple strings at the same
> > time and call diffrent callbacks depending on which string matched.
> >
>
> yep, can sed that packet easily with those callbacks ;-> s/val/val2/g

Nothing is done to change the content, but you have the position of the
match in the buffer. You can modify it yourself.




-- 
Philippe Biondi <biondi@ cartel-securite.fr> Cartel Sécurité
Security Consultant/R&D                      http://www.cartel-securite.fr
Phone: +33 1 44 06 97 94                     Fax: +33 1 44 06 97 99
PGP KeyID:3D9A43E2  FingerPrint:C40A772533730E39330DC0985EE8FF5F3D9A43E2





^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Fwd: [ANNOUNCE] Layer-7 Filter for Linux QoS]
  2003-05-21 13:20               ` Philippe Biondi
@ 2003-05-21 15:46                 ` Ethan Sommer
  2003-05-21 23:11                   ` Philippe Biondi
  0 siblings, 1 reply; 19+ messages in thread
From: Ethan Sommer @ 2003-05-21 15:46 UTC (permalink / raw)
  To: Philippe Biondi
  Cc: Jamal Hadi, Martin Josefsson, David S. Miller, linux-net, netdev

Philippe Biondi wrote:

>regexp support was planned but not done yet. (if someone know where I can
>download more free time !).
>
>The implementation should not be that hard, once you have the compiler to
>transform the string describing the regexp to an automaton.
>
>Note that to respect the framework, you have to deal with multiple
>patterns (should not be that hard). If you have pat1 and pat2, searching
>for (pat1|pat2) is not sufficient because for each match, you have to
>point which pattern matched.
>  
>

We actually planned on doing that initially. You should note that if you 
want to generate one automaton for multiple patterns, that is not a 
regular language (and thus can not be represented by a FA or DFA.)  You 
will have to try matching against the first pattern, then the next and 
so on.

Ethan


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Fwd: [ANNOUNCE] Layer-7 Filter for Linux QoS]
  2003-05-21 15:46                 ` Ethan Sommer
@ 2003-05-21 23:11                   ` Philippe Biondi
  2003-05-21 23:26                     ` Ethan Sommer
  0 siblings, 1 reply; 19+ messages in thread
From: Philippe Biondi @ 2003-05-21 23:11 UTC (permalink / raw)
  To: Ethan Sommer
  Cc: Jamal Hadi, Martin Josefsson, David S. Miller, linux-net, netdev

On Wed, 21 May 2003, Ethan Sommer wrote:

> Philippe Biondi wrote:
>
> >regexp support was planned but not done yet. (if someone know where I can
> >download more free time !).
> >
> >The implementation should not be that hard, once you have the compiler to
> >transform the string describing the regexp to an automaton.
> >
> >Note that to respect the framework, you have to deal with multiple
> >patterns (should not be that hard). If you have pat1 and pat2, searching
> >for (pat1|pat2) is not sufficient because for each match, you have to
> >point which pattern matched.
> >
> >
>
> We actually planned on doing that initially. You should note that if you
> want to generate one automaton for multiple patterns, that is not a
> regular language (and thus can not be represented by a FA or DFA.)  You
> will have to try matching against the first pattern, then the next and
> so on.

If P and Q are regexps, P|Q is a regexp, so you do can.
So detection is clearly not a problem. But to fit in the libqsearch model,
you have to know which of the parterns matched. This is theorically
possible, but need a bit of work in comparison with using an
off-the-shelf regexp compiler on (Pat1|Pat2|..|Patn).



-- 
Philippe Biondi <biondi@ cartel-securite.fr> Cartel Sécurité
Security Consultant/R&D                      http://www.cartel-securite.fr
Phone: +33 1 44 06 97 94                     Fax: +33 1 44 06 97 99
PGP KeyID:3D9A43E2  FingerPrint:C40A772533730E39330DC0985EE8FF5F3D9A43E2

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Fwd: [ANNOUNCE] Layer-7 Filter for Linux QoS]
  2003-05-21 23:11                   ` Philippe Biondi
@ 2003-05-21 23:26                     ` Ethan Sommer
  2003-05-22  8:26                       ` Philippe Biondi
  2003-05-24  4:11                       ` Werner Almesberger
  0 siblings, 2 replies; 19+ messages in thread
From: Ethan Sommer @ 2003-05-21 23:26 UTC (permalink / raw)
  To: Philippe Biondi
  Cc: Jamal Hadi, Martin Josefsson, David S. Miller, linux-net, netdev

Philippe Biondi wrote:

>On Wed, 21 May 2003, Ethan Sommer wrote:
>
>  
>
>>Philippe Biondi wrote:
>>
>>    
>>
>>>regexp support was planned but not done yet. (if someone know where I can
>>>download more free time !).
>>>
>>>The implementation should not be that hard, once you have the compiler to
>>>transform the string describing the regexp to an automaton.
>>>
>>>Note that to respect the framework, you have to deal with multiple
>>>patterns (should not be that hard). If you have pat1 and pat2, searching
>>>for (pat1|pat2) is not sufficient because for each match, you have to
>>>point which pattern matched.
>>>
>>>
>>>      
>>>
>>We actually planned on doing that initially. You should note that if you
>>want to generate one automaton for multiple patterns, that is not a
>>regular language (and thus can not be represented by a FA or DFA.)  You
>>will have to try matching against the first pattern, then the next and
>>so on.
>>    
>>
>
>If P and Q are regexps, P|Q is a regexp, so you do can.
>So detection is clearly not a problem. But to fit in the libqsearch model,
>you have to know which of the parterns matched. This is theorically
>possible, but need a bit of work in comparison with using an
>off-the-shelf regexp compiler on (Pat1|Pat2|..|Patn).
>  
>
I take it back, it is regular (kinda) but you can't to it with a 
deterministic finite atomaton. If there is a cycle in pattern1, off of 
which pattern2 has a branch, then you would need to count how many times 
you have gone around the cycle to know where to jump to in pattern2 if 
it fails to match pattern1 (which you can't do, pumping lemma and all 
that.) If you use a non-determistic FA, you should be able to just go 
through each pattern until both crash or one matches and declare that 
the winner.

If your lib did that, it would work quite well for the layer-7 filter.

Ethan



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Fwd: [ANNOUNCE] Layer-7 Filter for Linux QoS]
  2003-05-21 23:26                     ` Ethan Sommer
@ 2003-05-22  8:26                       ` Philippe Biondi
  2003-05-22 14:40                         ` Ethan Sommer
  2003-05-24  4:11                       ` Werner Almesberger
  1 sibling, 1 reply; 19+ messages in thread
From: Philippe Biondi @ 2003-05-22  8:26 UTC (permalink / raw)
  To: Ethan Sommer
  Cc: Jamal Hadi, Martin Josefsson, David S. Miller, linux-net, netdev

> >
> I take it back, it is regular (kinda) but you can't to it with a
> deterministic finite atomaton. If there is a cycle in pattern1, off of
> which pattern2 has a branch, then you would need to count how many times
> you have gone around the cycle to know where to jump to in pattern2 if
> it fails to match pattern1 (which you can't do, pumping lemma and all
> that.) If you use a non-determistic FA, you should be able to just go
> through each pattern until both crash or one matches and declare that
> the winner.


Strange way of reasoning... what if pattern1 is "(subpat1|subpat2)" ?
regexp is a regular language so it is equivalent to a DFA.
For every NDFA, there exist a DFA that recognize the same language.
So, it is possible.

The question is : will we have enough memory to store a DFA that recognize
a big regexp ? The answer is : let loose some speed and use NDFA.

> If your lib did that, it would work quite well for the layer-7 filter.

Anyone here interested by doing a regexp compiler ? I can help for details
or libqsearch internals, but I won't find enough time to do all that
quickly enough.


-- 
Philippe Biondi <biondi@ cartel-securite.fr> Cartel Sécurité
Security Consultant/R&D                      http://www.cartel-securite.fr
Phone: +33 1 44 06 97 94                     Fax: +33 1 44 06 97 99
PGP KeyID:3D9A43E2  FingerPrint:C40A772533730E39330DC0985EE8FF5F3D9A43E2


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Fwd: [ANNOUNCE] Layer-7 Filter for Linux QoS]
  2003-05-22  8:26                       ` Philippe Biondi
@ 2003-05-22 14:40                         ` Ethan Sommer
  2003-05-24  7:22                           ` Werner Almesberger
  0 siblings, 1 reply; 19+ messages in thread
From: Ethan Sommer @ 2003-05-22 14:40 UTC (permalink / raw)
  To: Philippe Biondi
  Cc: Jamal Hadi, Martin Josefsson, David S. Miller, linux-net, netdev

Philippe Biondi wrote:

>>I take it back, it is regular (kinda) but you can't to it with a
>>deterministic finite atomaton. If there is a cycle in pattern1, off of
>>which pattern2 has a branch, then you would need to count how many times
>>you have gone around the cycle to know where to jump to in pattern2 if
>>it fails to match pattern1 (which you can't do, pumping lemma and all
>>that.) If you use a non-determistic FA, you should be able to just go
>>through each pattern until both crash or one matches and declare that
>>the winner.
>>    
>>
>
>
>Strange way of reasoning... what if pattern1 is "(subpat1|subpat2)" ?
>regexp is a regular language so it is equivalent to a DFA.
>For every NDFA, there exist a DFA that recognize the same language.
>So, it is possible.
>

That is true only if you only care if either are matched. Not if you 
care which is matched. By combining them you lose the ability to tell 
which matched.

>
>The question is : will we have enough memory to store a DFA that recognize
>a big regexp ? The answer is : let loose some speed and use NDFA.
>  
>

you will have to for the reason above.





^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Fwd: [ANNOUNCE] Layer-7 Filter for Linux QoS]
  2003-05-22 14:40                         ` Ethan Sommer
@ 2003-05-24  7:22                           ` Werner Almesberger
  0 siblings, 0 replies; 19+ messages in thread
From: Werner Almesberger @ 2003-05-24  7:22 UTC (permalink / raw)
  To: Ethan Sommer
  Cc: Philippe Biondi, Jamal Hadi, Martin Josefsson, David S. Miller,
	linux-net, netdev

Ethan Sommer wrote:
> Philippe Biondi wrote:
>> For every NDFA, there exist a DFA that recognize the same language.
>> So, it is possible.
>>
> 
> That is true only if you only care if either are matched. Not if you 
> care which is matched. By combining them you lose the ability to tell 
> which matched.

"exists a DFA" doesn't mean that there is only one :-) Typically,
there are a lot of DFAs for each NFA, usually an infinite number
of them. And among them are also those that don't combine states
you don't want to combine.

>> The question is : will we have enough memory to store a DFA that recognize
>> a big regexp ? The answer is : let loose some speed and use NDFA.

Also simpler DFAs would be interesting, e.g. acyclic ones. Size
shouldn't be a problem for them. In fact, for "traditional"
classification (i.e. well below layer 7), that's really all you
need.

- Werner

-- 
  _________________________________________________________________________
 / Werner Almesberger, Buenos Aires, Argentina         wa@almesberger.net /
/_http://www.almesberger.net/____________________________________________/

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Fwd: [ANNOUNCE] Layer-7 Filter for Linux QoS]
  2003-05-21 23:26                     ` Ethan Sommer
  2003-05-22  8:26                       ` Philippe Biondi
@ 2003-05-24  4:11                       ` Werner Almesberger
  2003-05-24  4:23                         ` Werner Almesberger
  1 sibling, 1 reply; 19+ messages in thread
From: Werner Almesberger @ 2003-05-24  4:11 UTC (permalink / raw)
  To: Ethan Sommer
  Cc: Philippe Biondi, Jamal Hadi, Martin Josefsson, David S. Miller,
	linux-net, netdev

Ethan Sommer wrote:
> I take it back, it is regular (kinda) but you can't to it with a 
> deterministic finite atomaton. If there is a cycle in pattern1, off of 
> which pattern2 has a branch, then you would need to count how many times 
> you have gone around the cycle to know where to jump to in pattern2 if 
> it fails to match pattern1

You mean things like

a*b -> 1
ac -> 2

? You can still express this with a DFA, i.e.

a(c|a*b)

Of course, if you have patterns that match the same string, you
need to decide at some point whether you want longest or shortest
match.

For the idea of using regexps for all types of classification: 
this is definitely something I'd love to see. The problem I've
encountered is that it seems prohibitively expensive to
construct an efficient DFA from human-friendly input.

I experimented a bit with this in tcng, but never got to a
usable solution. (tcng can handle arbitrary expressions, but
it does this by following a collection of rules for rewriting
the expression into a canonical and possibly redundant form.
Basically, it does what a human being would do if given the
task of simplifying a Boolan expression. This is usually quick,
but has very ugly worst-case overhead.)

I never thought of viewing this as a regexp, though. For this,
one would need to be able to introduce positions, e.g. through
the ability of putting "^" in the middle of an expression.
Example:

^(A|B|C)(^D)|^(E|D)

where A ... E would be strings containing 0, 1, or .

Example:

ip_tos == 0xb8:   ^........10111000

Now, how to turn such expressions with embedded ^ efficiently
into a DFA ?

- Werner

-- 
  _________________________________________________________________________
 / Werner Almesberger, Buenos Aires, Argentina         wa@almesberger.net /
/_http://www.almesberger.net/____________________________________________/

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Fwd: [ANNOUNCE] Layer-7 Filter for Linux QoS]
  2003-05-24  4:11                       ` Werner Almesberger
@ 2003-05-24  4:23                         ` Werner Almesberger
  0 siblings, 0 replies; 19+ messages in thread
From: Werner Almesberger @ 2003-05-24  4:23 UTC (permalink / raw)
  To: Ethan Sommer
  Cc: Philippe Biondi, Jamal Hadi, Martin Josefsson, David S. Miller,
	linux-net, netdev

I wrote:
> Now, how to turn such expressions with embedded ^ efficiently
> into a DFA ?

Well, that's too easy. The DFA should also test bits only in the
order in which they appear in the packet. Constructing a DFA that
jumps back and forth and tests the same bit over and over again
wouldn't be much of a challenge :-)

- Werner

-- 
  _________________________________________________________________________
 / Werner Almesberger, Buenos Aires, Argentina         wa@almesberger.net /
/_http://www.almesberger.net/____________________________________________/

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Fwd: [ANNOUNCE] Layer-7 Filter for Linux QoS]
  2003-05-21 12:39             ` Jamal Hadi
  2003-05-21 13:20               ` Philippe Biondi
@ 2003-05-21 15:42               ` Ethan Sommer
  1 sibling, 0 replies; 19+ messages in thread
From: Ethan Sommer @ 2003-05-21 15:42 UTC (permalink / raw)
  To: Jamal Hadi; +Cc: Martin Josefsson, David S. Miller, linux-net, netdev, biondi

Jamal Hadi wrote:

>I think you should do some measurements - "it  doesnt slow 100Mbps" and
>"lets worry about it when we get to 1 or 10Gbps" are handwaving at best.
>Infact i would strongly recommend looking at the libqpsearch above.
>  
>
Agreed. However, libqpsearch doesn't do regex yet, so we can't use it.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Fwd: [ANNOUNCE] Layer-7 Filter for Linux QoS]
  2003-05-20 15:00         ` Jamal Hadi
  2003-05-20 15:15           ` Martin Josefsson
@ 2003-05-20 19:50           ` Ethan Sommer
  1 sibling, 0 replies; 19+ messages in thread
From: Ethan Sommer @ 2003-05-20 19:50 UTC (permalink / raw)
  To: Jamal Hadi; +Cc: David S. Miller, linux-net, netdev

Jamal Hadi wrote:

>On Tue, 20 May 2003, Ethan Sommer wrote:
>
>  
>
>>Nope. I need to strip out all the nulls from the packet, or any posix
>>regex parser will think the string ends at the first null. (so protocols
>>which use null's will be difficult/impossible to identify)
>>    
>>
>
>Ok, i see your dilema. How does snort do it? I dont think copying the
>packet is the right way to do it. Could the null NOT be considered as
>something speacial unless explicitly stated?
>
>  
>
One thing I should have pointed out earlier, it only copies that 
memory/does regex stuff until it finds a match or the first 8 packets, 
whichever is less. So, at least based on my tests, it doesn't seem to 
slow down 100BT much from what it would be otherwise. We might run into 
trouble if we look at GB or 10GB, but until we find a problem with 
speed, I think it is probably more important to make this as simple and 
easy to maintain as possible. If we see a need to make it more 
complicated due to speed issues, _then_ we should think about trying to 
get rid of that copy.

Ethan
http://l7-filter.sf.net

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2003-05-24  7:22 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-05-19  3:01 [Fwd: [ANNOUNCE] Layer-7 Filter for Linux QoS] David S. Miller
2003-05-20  0:38 ` Jamal Hadi
2003-05-20  5:07   ` Ethan Sommer
2003-05-20 12:14     ` Jamal Hadi
2003-05-20 14:39       ` Ethan Sommer
2003-05-20 15:00         ` Jamal Hadi
2003-05-20 15:15           ` Martin Josefsson
2003-05-21 12:39             ` Jamal Hadi
2003-05-21 13:20               ` Philippe Biondi
2003-05-21 15:46                 ` Ethan Sommer
2003-05-21 23:11                   ` Philippe Biondi
2003-05-21 23:26                     ` Ethan Sommer
2003-05-22  8:26                       ` Philippe Biondi
2003-05-22 14:40                         ` Ethan Sommer
2003-05-24  7:22                           ` Werner Almesberger
2003-05-24  4:11                       ` Werner Almesberger
2003-05-24  4:23                         ` Werner Almesberger
2003-05-21 15:42               ` Ethan Sommer
2003-05-20 19:50           ` Ethan Sommer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).