From: jamal <hadi@cyberus.ca>
To: Thomas Graf <tgraf@suug.ch>
Cc: netdev@oss.sgi.com, Pablo Neira <pablo@eurodev.net>
Subject: Re: [RFC] textsearch infrastructure + skb_find_text()
Date: Thu, 05 May 2005 08:42:17 -0400
Message-ID: <1115296937.7680.52.camel@localhost.localdomain>
In-Reply-To: <20050504234036.GH18452@postel.suug.ch>
On Thu, 2005-05-05 at 01:40 +0200, Thomas Graf wrote:
> The patch below is a report on the current state of the textsearch
> infrastructure and its first user skb_find_text(). The textsearch
> is kept as simple as possible but advanced enough to handle non-linear
> data such as skb fragments. Unlike in many other approaches, the text
> input is not seen as a single pointer but rather as a callback,
> get_text(), that is called repeatedly until it returns 0. This allows
> searching any kind of data and implementing customized from-to limits.
>
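To make the callback idea above concrete, here is a minimal userspace sketch. It is not taken from the patch: struct frag, struct frag_cursor, frag_get_text() and count_byte() are hypothetical names, and the callback signature is an assumption. It only shows the shape of "keep calling get_text() until it returns 0" over data that is not one contiguous buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for non-linear data such as skb fragments. */
struct frag {
	const char *data;
	size_t len;
};

/* Hypothetical cursor that the callback advances on every call. */
struct frag_cursor {
	const struct frag *frags;
	size_t nfrags;
	size_t next;	/* index of the next fragment to hand out */
};

/*
 * get_text()-style callback (assumed signature): stores a pointer to the
 * next block of text in *dst and returns its length, or 0 when no data
 * is left. Empty fragments are skipped so that 0 always means "end".
 */
size_t frag_get_text(struct frag_cursor *cur, const char **dst)
{
	assert(cur != NULL && dst != NULL);

	while (cur->next < cur->nfrags) {
		const struct frag *f = &cur->frags[cur->next++];

		if (f->len > 0) {
			*dst = f->data;
			return f->len;
		}
	}
	return 0;
}

/* Example consumer: pulls blocks until 0 is returned and counts a byte. */
size_t count_byte(struct frag_cursor *cur, char c)
{
	const char *blk;
	size_t n, i, hits = 0;

	while ((n = frag_get_text(cur, &blk)) != 0)
		for (i = 0; i < n; i++)
			if (blk[i] == c)
				hits++;
	return hits;
}
```

A caller would fill an array of struct frag, point a struct frag_cursor at it with next set to 0, and hand the cursor to the consumer. From-to limits could be implemented inside the callback by trimming the first and last block it returns.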
How is this different from libqsearch? IIRC, it also kept pointers and
callbacks.
BTW, I hope there's sync with libqsearch - at least some cannibalization
of ideas.
Also, hopefully, plugging in new algorithms is trivial (e.g. Boyer-Moore
could be included in addition to KMP etc.)
> The patch is separated into 3 parts, the first one being the textsearch
> infrastructure itself followed by a simple Knuth-Morris-Pratt
> implementation for reference. I'm also working on what could be called
> the smallest regular expression implementation ever but I left that
> out for now since it still has issues. Last but not least the
> function skb_find_text() written in a hurry and probably not yet
> correct but you should get the idea. From a userspace perspective
> the first user will be an ematch but writing it will be peanuts
> so I left it out for now.
>
nice
> Basically what it looks like right now is:
>
> int pos;
> struct ts_state state;
> struct ts_config *conf = textsearch_prepare("kmp", "hanky", 5, GFP_KERNEL, 1);
>
> /* search for "hanky" at offset 20 until end of packet */
> for (pos = skb_find_text(skb, 20, INT_MAX, conf, &state);
>      pos >= 0;
>      pos = textsearch_next(conf, &state)) {
> 	printk("Need a hanky? I found one at offset %d.\n", pos);
> }
>
I have a lot of questions:
- does a string have to be terminated by \0?
- do you keep state of the string from the beginning? ex: how do you know
  that preceding "hanky" was "Need a"?
- all sorts of limits: how long is the string? etc.
- what happens if a string spans multiple skbs or even multiple
  fragments?
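One possible way to handle the fragment-spanning and termination questions, sketched in userspace C. This is an illustration under my own assumptions, not what the patch does: struct kmp_config, struct kmp_state, kmp_prepare(), kmp_feed() and the KMP_MAX_PATTERN limit are all hypothetical. The pattern and each block carry explicit lengths, so nothing needs a \0 terminator, and the number of pattern bytes matched so far is kept in the state between blocks, so a match may start in one fragment and end in the next.

```c
#include <assert.h>
#include <stddef.h>

#define KMP_MAX_PATTERN 64	/* arbitrary limit chosen for this sketch */

struct kmp_config {
	const char *pattern;
	size_t len;
	/* fail[i]: length of the longest proper border of pattern[0..i] */
	size_t fail[KMP_MAX_PATTERN];
};

struct kmp_state {
	size_t matched;		/* pattern bytes matched so far, kept across blocks */
	size_t consumed;	/* total bytes consumed by earlier calls */
};

/* Build the KMP failure table for a pattern of explicit length. */
void kmp_prepare(struct kmp_config *conf, const char *pattern, size_t len)
{
	size_t i, k = 0;

	assert(len > 0 && len <= KMP_MAX_PATTERN);
	conf->pattern = pattern;
	conf->len = len;
	conf->fail[0] = 0;
	for (i = 1; i < len; i++) {
		while (k > 0 && pattern[i] != pattern[k])
			k = conf->fail[k - 1];
		if (pattern[i] == pattern[k])
			k++;
		conf->fail[i] = k;
	}
}

/*
 * Feed one block of text. Returns the absolute start offset of the first
 * match that ends inside this block, or -1 if there is none. On a match
 * only the bytes up to the end of the match are consumed; the caller may
 * feed the rest of the block again to look for further matches.
 */
long kmp_feed(const struct kmp_config *conf, struct kmp_state *st,
	      const char *blk, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		while (st->matched > 0 && blk[i] != conf->pattern[st->matched])
			st->matched = conf->fail[st->matched - 1];
		if (blk[i] == conf->pattern[st->matched])
			st->matched++;
		if (st->matched == conf->len) {
			long start = (long)(st->consumed + i + 1 - conf->len);

			/* fall back so that overlapping matches stay possible */
			st->matched = conf->fail[conf->len - 1];
			st->consumed += i + 1;
			return start;
		}
	}
	st->consumed += n;
	return -1;
}
```

With the state zeroed and the pattern "hanky" prepared, feeding "Need a han" finds nothing but leaves three bytes matched; feeding "ky?" next reports the match at absolute offset 7. The same state could in principle be carried from one skb to the next, though whether that is desirable is a separate design question.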
> textsearch_put(conf);
> kfree(conf);
>
> You might wonder about the 1 given to _prepare(), it indicates whether
> to autoload modules because the ematches will need it to be able to drop
> rtnl sem.
>
do you really wanna leave that decision up to the user?
> The code is not tested and certainly not bug free yet but should compile.
>
> Thoughts?
I don't have time to look at the patch to sufficiently critique it, but
it looks like a good start - maybe this weekend.
It would be nice to have other utilities which could be loaded, e.g. case
compare, regular expressions, strchr after you match, etc.
Of course all this to be followed by actions such as strtok etc.
Probably all this is a layer above this - but essentially, while you are
building this, keep those future needs in mind.
cheers,
jamal