From: Thomas Graf <tgraf@suug.ch>
To: jamal <hadi@cyberus.ca>
Cc: netdev@oss.sgi.com, Pablo Neira <pablo@eurodev.net>
Subject: Re: [RFC] textsearch infrastructure + skb_find_text()
Date: Thu, 5 May 2005 16:12:22 +0200
Message-ID: <20050505141222.GA25977@postel.suug.ch>
In-Reply-To: <1115296937.7680.52.camel@localhost.localdomain>

* jamal <1115296937.7680.52.camel@localhost.localdomain> 2005-05-05 08:42
> How is this different from libqsearch? IIRC, it also kept pointers and
> callbacks.

The main difference is that my infrastructure is much simpler.

> BTW, I hope there's sync with libqsearch - at least some cannibalization
> of ideas.

I read the libqsearch code but fixing up all the issues required
for my use would have taken up more time than writing up something new.

> Also hopefully, plugging in new algorithms is trivial (e.g. boyer-moore
> could be included in addition to kmp etc.)

Very simple, ts_kmp.c is 108 lines whereas simple.c in libqsearch
is over 500.

> I have a lot of questions:
> - does a string have to be terminated by \0?

All strings are length-delimited; no terminating \0 is needed.

> - do you keep state of the string from the beginning? ex: how do you know
> that preceding "hanky" was "Need a"?

I store the shift of a match in struct ts_state, add the length of
the pattern in the call to textsearch_next() and provide this shift
as offset to the get_text() callback.
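
As a rough illustration of that bookkeeping, here is a minimal user-space
sketch. It is not the code from the RFC: struct demo_state, demo_find() and
demo_next() are hypothetical names, the text is a single flat buffer rather
than something fetched through a callback, and the matcher is a naive
memcmp() loop instead of kmp. It only shows the idea of remembering the
shift of a match and resuming at shift + pattern length.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_NOT_FOUND ((size_t)-1)

/* Hypothetical search state: remembers the shift of the last match. */
struct demo_state {
    size_t offset;
};

/* Naive scan of text[from..text_len) for pat; returns the match offset. */
static size_t demo_search(const char *text, size_t text_len,
                          const char *pat, size_t pat_len, size_t from)
{
    size_t i;

    if (pat_len == 0 || pat_len > text_len)
        return DEMO_NOT_FOUND;
    for (i = from; i + pat_len <= text_len; i++)
        if (memcmp(text + i, pat, pat_len) == 0)
            return i;
    return DEMO_NOT_FOUND;
}

/* First match: start at offset 0 and record the shift in the state. */
static size_t demo_find(struct demo_state *st, const char *text,
                        size_t text_len, const char *pat, size_t pat_len)
{
    st->offset = demo_search(text, text_len, pat, pat_len, 0);
    return st->offset;
}

/* Next match: resume right after the previous one (shift + pattern length). */
static size_t demo_next(struct demo_state *st, const char *text,
                        size_t text_len, const char *pat, size_t pat_len)
{
    if (st->offset == DEMO_NOT_FOUND)
        return DEMO_NOT_FOUND;
    st->offset = demo_search(text, text_len, pat, pat_len,
                             st->offset + pat_len);
    return st->offset;
}
```

Because the resume point is the old shift plus the pattern length, matches
found this way never overlap.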

> - all sorts of limits: how long is the string? etc

Both the length of a pattern and the length of a text block are
limited by INT_MAX. However, since there can be an arbitrary number
of blocks, there is no limit on the total text length. Depending on
the search algorithm there might be further limits; for example, kmp
uses an unsigned int for each prefix table entry, so this limits the
length of the pattern as well (theoretically).
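
To see why the entry type matters, here is a sketch of the textbook KMP
prefix table computation. It is not taken from ts_kmp.c and the function
name is made up for illustration. Each entry stores the length of a prefix
of the pattern, so the largest value an entry may need to hold is the
pattern length minus one.

```c
#include <stddef.h>

/*
 * Fill prefix_tbl[q] with the length of the longest proper prefix of
 * pattern[0..q] that is also a suffix of pattern[0..q].
 * prefix_tbl must have room for len entries.
 */
static void demo_compute_prefix_tbl(const unsigned char *pattern,
                                    unsigned int len,
                                    unsigned int *prefix_tbl)
{
    unsigned int k = 0, q;

    if (len == 0)
        return;

    prefix_tbl[0] = 0;
    for (q = 1; q < len; q++) {
        /* Fall back to shorter prefixes until one can be extended. */
        while (k > 0 && pattern[k] != pattern[q])
            k = prefix_tbl[k - 1];
        if (pattern[k] == pattern[q])
            k++;
        prefix_tbl[q] = k;
    }
}
```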

> - what happens if a string spans multiple skbs or even multiple
> fragments?

This was the single most important requirement. Actually you can
compose the text out of anything. I already wrote some code to
handle paged skbs, and multiple skbs can be implemented the same
way. You could for example store a ts_state per conntrack and
call textsearch_next() repeatedly until you find a match. It would
require some magic in the get_text() callback but shouldn't be too hard.
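
The following user-space sketch shows one way a match can span several
blocks. It is an assumption-laden illustration, not the RFC's code: the
struct names, the demo_get_text() signature and the 64-byte pattern cap are
all hypothetical. The point is that the KMP match state (how many pattern
bytes are matched so far) survives from one block to the next, so a pattern
split across block boundaries is still found.

```c
#include <stddef.h>

#define DEMO_MAX_PATTERN 64
#define DEMO_NOT_FOUND ((size_t)-1)

/* One chunk of text, e.g. one fragment; length-delimited, no NUL needed. */
struct demo_block {
    const unsigned char *data;
    size_t len;
};

/* Hypothetical text source: blocks that together form the text. */
struct demo_text {
    const struct demo_block *blocks;
    size_t nblocks;
};

/*
 * Callback in the spirit of get_text(): given the number of bytes already
 * consumed, point *dst at the data that follows and return how many bytes
 * are available there (0 once the text is exhausted).
 */
static size_t demo_get_text(size_t consumed, const unsigned char **dst,
                            void *priv)
{
    const struct demo_text *t = priv;
    size_t start = 0, i;

    for (i = 0; i < t->nblocks; i++) {
        size_t len = t->blocks[i].len;

        if (consumed < start + len) {
            *dst = t->blocks[i].data + (consumed - start);
            return len - (consumed - start);
        }
        start += len;
    }
    return 0;
}

/* KMP over a block stream; returns the absolute offset of the first match. */
static size_t demo_find_in_blocks(const unsigned char *pat, size_t pat_len,
                                  size_t (*get_text)(size_t,
                                                     const unsigned char **,
                                                     void *),
                                  void *priv)
{
    unsigned int prefix_tbl[DEMO_MAX_PATTERN];
    size_t consumed = 0, matched = 0, k = 0, i;

    if (pat_len == 0 || pat_len > DEMO_MAX_PATTERN)
        return DEMO_NOT_FOUND;

    /* Standard KMP prefix table. */
    prefix_tbl[0] = 0;
    for (i = 1; i < pat_len; i++) {
        while (k > 0 && pat[k] != pat[i])
            k = prefix_tbl[k - 1];
        if (pat[k] == pat[i])
            k++;
        prefix_tbl[i] = (unsigned int)k;
    }

    for (;;) {
        const unsigned char *text;
        size_t len = get_text(consumed, &text, priv);

        if (len == 0)
            break;
        /* 'matched' is deliberately not reset between blocks. */
        for (i = 0; i < len; i++) {
            while (matched > 0 && pat[matched] != text[i])
                matched = prefix_tbl[matched - 1];
            if (pat[matched] == text[i])
                matched++;
            if (matched == pat_len)
                return consumed + i + 1 - pat_len;
        }
        consumed += len;
    }
    return DEMO_NOT_FOUND;
}
```

A per-connection variant would keep consumed and matched in a state
structure that outlives a single call instead of in local variables.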

> 
> > You might wonder about the 1 given to _prepare(),  it indicates whether
> > to autoload modules because the ematches will need it to be able to drop
> > rtnl sem.
> > 
> 
> do you really wanna leave that decision up to the user?

We have to, otherwise we can't use it in ematches without the risk
of deadlocks. There is no way around having the caller drop its
locks and call prepare again with no locks held, at least I'm not
aware of one.
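
The drop-and-retry pattern can be sketched in user space as follows.
Everything here is a stand-in: demo_lock_held models the rtnl semaphore as a
plain flag, demo_prepare() is a hypothetical function rather than the RFC's
prepare call, and "loading a module" is just setting a boolean. The sketch
only shows the control flow: try without autoload while locked, and on
failure drop the lock, retry with autoload allowed, then take the lock back.

```c
#include <errno.h>
#include <stdbool.h>

static bool demo_lock_held;    /* stands in for the caller's lock */
static bool demo_algo_loaded;  /* is the search algorithm available? */

/*
 * Hypothetical prepare step. With autoload == 0 it never tries to load
 * anything and reports -ENOENT if the algorithm is missing. With
 * autoload == 1 it may "load" the algorithm, which must not happen while
 * the lock is held.
 */
static int demo_prepare(const char *algo, int autoload)
{
    (void)algo;

    if (demo_algo_loaded)
        return 0;
    if (!autoload)
        return -ENOENT;
    if (demo_lock_held)
        return -EDEADLK;       /* loading under the lock could deadlock */
    demo_algo_loaded = true;   /* models a successful module load */
    return 0;
}

/* Caller that enters with the lock held, as an ematch would. */
static int demo_configure_locked(const char *algo)
{
    int err = demo_prepare(algo, 0);

    if (err != -ENOENT)
        return err;

    demo_lock_held = false;    /* drop the lock before autoloading */
    err = demo_prepare(algo, 1);
    demo_lock_held = true;     /* re-acquire before returning */
    return err;
}
```

Real code would also have to revalidate whatever the lock protected after
taking it again, since that state may have changed while it was dropped.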

> It would be nice to have other utilities which could be loaded, e.g. case
> compare, regular expressions, strchr after you match, etc.

Indeed, I'm working on the regular expression thing but it has
some issues with textsearch_next() for patterns like .+abc.+ which
I want to resolve first.

