Re: [RFC] textsearch infrastructure + skb_find_text()

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: jamal <hadi@cyberus.ca>
To: Thomas Graf <tgraf@suug.ch>
Cc: Pablo Neira <pablo@eurodev.net>, netdev@oss.sgi.com
Subject: Re: [RFC] textsearch infrastructure + skb_find_text()
Date: Fri, 06 May 2005 09:04:08 -0400	[thread overview]
Message-ID: <1115384649.7660.140.camel@localhost.localdomain> (raw)
In-Reply-To: <20050506123639.GE28419@postel.suug.ch>

On Fri, 2005-06-05 at 14:36 +0200, Thomas Graf wrote:
[..]

> function find-text(pattern)
>   foreach text-segment do
>     call search-bm(text-segment, pattern);
>   done
> end
> 
> This is what most people naturally think of when dealing with this
> problem and it might be sufficient for many situations, however it
> requires the searching algorithm to store its state in some persistent
> memory which can be troublesome depending on the volume of these
> variables.
> 

[..]

I dont see any difference between the two in terms of state storage.
Can you really avoid storing state? Can you give an example?

> Why? Because it dramatically simplifies the searching algorithms
> since they can decide when to fetch the next block which is
> absolutely essential for cases like regular expression where the
> state machine gets quite complex and a single entry point

The other scheme is to have the callback return a code which says
"get me more text" or " enough/done" in which case the state machine for
the search alg is no different in both cases.

>  (i.e.
> leaving the function and enter it again with new data) would make
> things much more complex. Minor advantages are a) the get_text()
> implementation is usually quite small and will hopefully not mess
> up the cacheline optimizations of the actual searching loop 

It depends on the search algorithm - if it is small, you benefit from
either scheme. If it is large, you loose in both.
 
> and
> b) it allows the searching algorithm to do prefetching.
> 

I am trying to sink this in; prefetching would be valuable for regexp,
but why would the other scheme not be able to do it? 

> << coffee break >>
> 

I have to run to work while you get coffee ;->
I will read the rest later.

cheers,
jamal

next prev parent reply	other threads:[~2005-05-06 13:04 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-05-04 23:40 [RFC] textsearch infrastructure + skb_find_text() Thomas Graf
2005-05-05 12:42 ` jamal
2005-05-05 14:12   ` Thomas Graf
2005-05-05 17:02 ` Pablo Neira
2005-05-05 17:42   ` Thomas Graf
2005-05-06  1:33     ` Pablo Neira
2005-05-06 12:36       ` Thomas Graf
2005-05-06 13:04         ` jamal [this message]
2005-05-06 14:43           ` Thomas Graf
2005-05-07 13:03             ` Jamal Hadi Salim
2005-05-08 11:45               ` Thomas Graf
2005-05-06 21:44 ` Thomas Graf
2005-05-07  0:17   ` YOSHIFUJI Hideaki / 吉藤英明
2005-05-07  0:36     ` Thomas Graf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1115384649.7660.140.camel@localhost.localdomain \
    --to=hadi@cyberus.ca \
    --cc=netdev@oss.sgi.com \
    --cc=pablo@eurodev.net \
    --cc=tgraf@suug.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).