Re: [RFC] textsearch infrastructure et al v2

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Thomas Graf <tgraf@suug.ch>
To: jamal <hadi@cyberus.ca>
Cc: netdev@oss.sgi.com
Subject: Re: [RFC] textsearch infrastructure et al v2
Date: Sat, 28 May 2005 14:35:42 +0200	[thread overview]
Message-ID: <20050528123542.GR15391@postel.suug.ch> (raw)
In-Reply-To: <1117281581.6251.68.camel@localhost.localdomain>

* jamal <1117281581.6251.68.camel@localhost.localdomain> 2005-05-28 07:59
> I dont have  opinions on this since you and Pablo seem to agree on it.
> What I remember is that libssearch (or whatever that thing Harald was
> looking at) did it differently (callback invoked and it return a code
> which said to continue or not).

Same for my infrastructure with the difference that libqsearch uses
a single input buffer so no chance for non-linear data.

> Also the design should (I think you do already, just double checking) -
> should be wary of optimizing for a specific algorithmn; i see you have
> KMP but not Boyer-Moore for example and i am not sure what the
> repurcassions of above approach are etc etc.

This is a weakness of the current implementation, it could be
addresses via the other method that I proposed. The current patch
doesn't allow for random access over fragment borders which means
that Boyer-Moore would require a temporary buffer with a size of
the pattern length in order to do random access over multiple
fragments. However, I haven't seen any infrastructure yet that can
handle non-linear data with bm wihout a complete prefetch in the
beginning though. You could still implement a bm by either using a
naive search at the borders or simply define the limitation that you
can only look at the first fragment so you'd have to linearize.

> > So basically we save the state of the segment fetching
> > instead of saving the state of the search algorithm which
> > would be way too complex for things like regular expression
> > string matching.
> > 
> 
> Can you explain this a little more. Didnt quiet understand.
> If you are going across skbs, dont you need to save state?

Most people look at it from the angle of iterate over text segments
and apply a text search algorithm. I inverted this and said,
start the text search algorithm and iterate over all text
segments inside the algorithm. This makes many things a lot
easier because we only have to save the state of the iterator.
This brings up another limitation though, we cannot stop a search
in progress and continue later on.

> I think the best approach would be to first linearize then search. 

This is what I'm trying to avoid. ;->

> On ingress as well you are also better off to assemble fragments first
> on that side before doing text searches. 

We can still do this optionally, just because we support it doesn't
mean we have to use it.

> Perhaps implementing an action like the one used by openbsd to
> "normalize" packets before a string match.
> i.e 
> classifier (pkt header) --> normalize --> classifier(string match)-> ?

> Infact one could argue that if you are to scan a virus you may need
> to assemble more than one skb on ingress (essentially a message).

http://oss.sgi.com/archives/netdev/2005-05/msg00298.html

See the section after <<coffee break>>

> The only other comment i have is on the patch you called naive regexp;
> I think you should probably call it something else instead of regexp
> since you invented it.

I was never good at names, any suggestions?

next prev parent reply	other threads:[~2005-05-28 12:35 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-05-27 22:47 [RFC] textsearch infrastructure et al v2 Thomas Graf
2005-05-27 22:47 ` [PATCH 1/5] [LIB] textsearch infrastructure Thomas Graf
2005-05-27 22:48 ` [PATCH 2/5] [LIB] Knuth-Morris-Pratt string-matching algorithm Thomas Graf
2005-05-27 22:48 ` [PATCH 3/5] [LIB] Naive regular expression " Thomas Graf
2005-05-27 22:48 ` [PATCH 4/5] [NET] Add skb_find_text() to search for a text pattern in skbs Thomas Graf
2005-05-28  3:11   ` Pablo Neira
2005-05-28 11:32     ` Thomas Graf
2005-05-27 22:49 ` [PATCH 5/5] [PKT_SCHED] textsearch ematch Thomas Graf
2005-05-28 11:59 ` [RFC] textsearch infrastructure et al v2 jamal
2005-05-28 12:35   ` Thomas Graf [this message]
2005-05-28 12:56     ` Pablo Neira
2005-05-28 12:58       ` Pablo Neira
2005-05-28 12:58       ` Pablo Neira
2005-05-28 13:58       ` Thomas Graf
2005-05-31 22:05       ` David S. Miller
2005-05-31 21:56 ` David S. Miller
2005-05-31 22:44   ` Thomas Graf
2005-05-31 22:50     ` David S. Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20050528123542.GR15391@postel.suug.ch \
    --to=tgraf@suug.ch \
    --cc=hadi@cyberus.ca \
    --cc=netdev@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).