From: jamal <hadi@cyberus.ca>
To: Thomas Graf <tgraf@suug.ch>
Cc: Pablo Neira <pablo@eurodev.net>, netdev@oss.sgi.com
Subject: Re: [RFC] textsearch infrastructure + skb_find_text()
Date: Fri, 06 May 2005 09:04:08 -0400 [thread overview]
Message-ID: <1115384649.7660.140.camel@localhost.localdomain> (raw)
In-Reply-To: <20050506123639.GE28419@postel.suug.ch>
On Fri, 2005-06-05 at 14:36 +0200, Thomas Graf wrote:
[..]
> function find-text(pattern)
> foreach text-segment do
> call search-bm(text-segment, pattern);
> done
> end
>
> This is what most people naturally think of when dealing with this
> problem and it might be sufficient for many situations, however it
> requires the searching algorithm to store its state in some persistent
> memory which can be troublesome depending on the volume of these
> variables.
>
[..]
I dont see any difference between the two in terms of state storage.
Can you really avoid storing state? Can you give an example?
> Why? Because it dramatically simplifies the searching algorithms
> since they can decide when to fetch the next block which is
> absolutely essential for cases like regular expression where the
> state machine gets quite complex and a single entry point
The other scheme is to have the callback return a code which says
"get me more text" or " enough/done" in which case the state machine for
the search alg is no different in both cases.
> (i.e.
> leaving the function and enter it again with new data) would make
> things much more complex. Minor advantages are a) the get_text()
> implementation is usually quite small and will hopefully not mess
> up the cacheline optimizations of the actual searching loop
It depends on the search algorithm - if it is small, you benefit from
either scheme. If it is large, you loose in both.
> and
> b) it allows the searching algorithm to do prefetching.
>
I am trying to sink this in; prefetching would be valuable for regexp,
but why would the other scheme not be able to do it?
> << coffee break >>
>
I have to run to work while you get coffee ;->
I will read the rest later.
cheers,
jamal
next prev parent reply other threads:[~2005-05-06 13:04 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-05-04 23:40 [RFC] textsearch infrastructure + skb_find_text() Thomas Graf
2005-05-05 12:42 ` jamal
2005-05-05 14:12 ` Thomas Graf
2005-05-05 17:02 ` Pablo Neira
2005-05-05 17:42 ` Thomas Graf
2005-05-06 1:33 ` Pablo Neira
2005-05-06 12:36 ` Thomas Graf
2005-05-06 13:04 ` jamal [this message]
2005-05-06 14:43 ` Thomas Graf
2005-05-07 13:03 ` Jamal Hadi Salim
2005-05-08 11:45 ` Thomas Graf
2005-05-06 21:44 ` Thomas Graf
2005-05-07 0:17 ` YOSHIFUJI Hideaki / 吉藤英明
2005-05-07 0:36 ` Thomas Graf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1115384649.7660.140.camel@localhost.localdomain \
--to=hadi@cyberus.ca \
--cc=netdev@oss.sgi.com \
--cc=pablo@eurodev.net \
--cc=tgraf@suug.ch \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).