Re: [RFC] string matching ematch

From: Thomas Graf <tgraf@suug.ch>
To: Pablo Neira <pablo@eurodev.net>
Cc: Jamal Hadi Salim <hadi@cyberus.ca>,
	Patrick McHardy <kaber@trash.net>,
	netdev@oss.sgi.com
Subject: Re: [RFC] string matching ematch
Date: Thu, 27 Jan 2005 21:51:47 +0100
Message-ID: <20050127205147.GS31837@postel.suug.ch>
In-Reply-To: <41F94C63.7010800@eurodev.net>

* Pablo Neira <41F94C63.7010800@eurodev.net> 2005-01-27 21:17
> Thomas Graf wrote:
> 
> >I'd like to discuss the string matching ematch, I don't care about the
> >algorithm used but rather whether to make it stateful, match over
> >fragments, etc. I attached a simple stateless string matching ematch
> >using the Knuth-Morris-Pratt algorithm as a starting point.
> > 
> 
> I've posted something similar after christmas in netfilter-devel[1]. 
> It's fragment aware, actually my implementation uses boyer-moore to look 
> for matches in the payload, and it uses brute force together with 
> Rusty's skb_iter stuff to look for matches on the edges.
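
To illustrate the brute-force edge handling described in the quote, here
is a minimal userspace sketch. It is not the posted netfilter code and
does not use skb_iter; the two fragments are assumed to be plain byte
buffers, and the names match_at() and search_edge() are hypothetical.
Only start positions whose match would straddle the fragment boundary
are tried.

```c
#include <stddef.h>

/* Compare pat[0..m) against the virtual concatenation a || b at offset
 * `start`, without copying either fragment. Returns 1 on a match. */
static int match_at(const unsigned char *a, size_t alen,
                    const unsigned char *b, size_t blen,
                    size_t start,
                    const unsigned char *pat, size_t m)
{
    size_t i;

    if (start + m > alen + blen)
        return 0;
    for (i = 0; i < m; i++) {
        size_t pos = start + i;
        unsigned char c = pos < alen ? a[pos] : b[pos - alen];

        if (c != pat[i])
            return 0;
    }
    return 1;
}

/* Brute-force search restricted to matches that begin in fragment a
 * and end in fragment b. Returns the offset within a, or -1. */
long search_edge(const unsigned char *a, size_t alen,
                 const unsigned char *b, size_t blen,
                 const unsigned char *pat, size_t m)
{
    size_t start, first;

    if (m < 2 || alen == 0 || blen == 0)
        return -1;                      /* nothing can straddle */
    /* A straddling match starts within the last m - 1 bytes of a. */
    first = alen >= m - 1 ? alen - (m - 1) : 0;
    for (start = first; start < alen; start++)
        if (match_at(a, alen, b, blen, start, pat, m))
            return (long)start;
    return -1;
}
```

At most m - 1 start positions are examined per boundary, which is why
brute force stays cheap for small patterns.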

I've seen it but stuck with KMP because it uses less memory. Their
search-phase time complexity is nearly equal, around O(nm), with
n being the length of T[] and m being the length of P[]. BM
definitely performs better for highly periodic P[]'s in a
periodic T[], though.
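
For readers unfamiliar with KMP, the following is a minimal sketch of
the textbook algorithm in plain C. It is not the ematch attached to the
original mail; the function names kmp_prefix() and kmp_search() are
assumptions made for illustration. The only auxiliary state is one
table entry per pattern byte, which is the memory advantage mentioned
above.

```c
#include <stddef.h>

/* Fill pi[0..m): pi[i] is the length of the longest proper prefix of
 * pat[0..i] that is also a suffix of pat[0..i]. */
void kmp_prefix(const unsigned char *pat, size_t m, size_t *pi)
{
    size_t i, k = 0;

    if (m == 0)
        return;
    pi[0] = 0;
    for (i = 1; i < m; i++) {
        while (k > 0 && pat[k] != pat[i])
            k = pi[k - 1];              /* fall back to a shorter border */
        if (pat[k] == pat[i])
            k++;
        pi[i] = k;
    }
}

/* Return the offset of the first occurrence of pat in text[0..n),
 * or -1 if there is none. pi must come from kmp_prefix(). */
long kmp_search(const unsigned char *text, size_t n,
                const unsigned char *pat, size_t m, const size_t *pi)
{
    size_t i, k = 0;                    /* k = bytes of pat matched so far */

    if (m == 0)
        return 0;
    for (i = 0; i < n; i++) {
        while (k > 0 && pat[k] != text[i])
            k = pi[k - 1];              /* skip shifts that cannot match */
        if (pat[k] == text[i])
            k++;
        if (k == m)
            return (long)(i + 1 - m);
    }
    return -1;
}
```

The text index i never moves backwards, so the search needs no
buffering of earlier input.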

I'm missing a few things in your string matching API, namely
the ability to define an upper limit on the search range, which
can give much better performance gains than the best optimization
can. A naive search method around the borders of fragments
is definitely easier, but even there you could benefit from ruling
out invalid shifts.
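
A rough sketch of what such a limit could look like follows. The
function name search_range() and the from/to parameters are
hypothetical, and the assumption that a match must lie entirely inside
[from, to) is mine, not taken from either implementation. A naive
comparison is used only to keep the example short.

```c
#include <stddef.h>
#include <string.h>

/* Search text[0..len) for pat, considering only matches that lie
 * entirely within [from, to). Returns the match offset or -1. */
long search_range(const unsigned char *text, size_t len,
                  size_t from, size_t to,
                  const unsigned char *pat, size_t m)
{
    size_t pos;

    if (to > len)
        to = len;                       /* clamp the limit to the data */
    if (m == 0 || from > to || to - from < m)
        return -1;
    for (pos = from; pos + m <= to; pos++)
        if (memcmp(text + pos, pat, m) == 0)
            return (long)pos;
    return -1;
}
```

Bytes at or beyond `to` are never inspected, so a caller that knows the
pattern can only appear in, say, the first few dozen bytes pays nothing
for the rest of the packet.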

> The worst case is not that bad for small patterns.

I don't think that any of the algorithms really makes a difference.
Theoretically yes, but what we should basically be looking for
is one with good average performance by detecting unnecessary
shifts. Our T[] is limited by the skb as long as we're not going
into stateful searches, and thus other resources matter more to me
than a few more cycles.

> I'll give it more spins these days since I've got some spare time. I'll 
> also have a look at your work. I think that we could join efforts and 
> push something good, thoughts?

Definitely. Being able to specify the upper limit is a must for me,
though. Another difference is that I compute the prefix table in
userspace.
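
If the prefix table arrives from userspace, the receiving side may want
a cheap sanity check before using it. The mail does not say whether or
how this is done; the struct layout and the function name
kmp_config_valid() below are purely hypothetical. The check relies on
two properties of any KMP prefix table: the first entry is 0, and each
entry exceeds its predecessor by at most 1.

```c
#include <stddef.h>

/* Hypothetical configuration as it might be handed over by userspace. */
struct kmp_config {
    size_t pattern_len;
    const unsigned char *pattern;
    const size_t *prefix_tbl;           /* precomputed in userspace */
};

/* Structural check only: it rejects tables that could send a KMP
 * search out of bounds or into an endless loop. It does not prove
 * that the table is the correct one for the pattern. */
int kmp_config_valid(const struct kmp_config *cfg)
{
    size_t i;

    if (!cfg || !cfg->pattern || !cfg->prefix_tbl || cfg->pattern_len == 0)
        return 0;
    if (cfg->prefix_tbl[0] != 0)
        return 0;
    for (i = 1; i < cfg->pattern_len; i++)
        if (cfg->prefix_tbl[i] > cfg->prefix_tbl[i - 1] + 1)
            return 0;                   /* implies prefix_tbl[i] <= i */
    return 1;
}
```

Because every accepted entry satisfies prefix_tbl[i] <= i, each
fallback step in the search strictly shrinks the matched length.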
