[RFC] textsearch infrastructure et al v2

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Thomas Graf <tgraf@suug.ch>
To: netdev@oss.sgi.com
Cc: Jamal Hadi Salim <hadi@cyberus.ca>
Subject: [RFC] textsearch infrastructure et al v2
Date: Sat, 28 May 2005 00:47:25 +0200	[thread overview]
Message-ID: <20050527224725.GG15391@postel.suug.ch> (raw)

Dave, I'm going through another RFC round to hopefully get
some comments on the notes below.

I'm not yet satistifed with the core infrastructure wrt to
non-linear data, especially for complex cases such as non
linear skbs. The current way can be easly described as:

search_occurrance()
  search-algorithm()
    while get-next-segment()
      process-segment()

So basically we save the state of the segment fetching
instead of saving the state of the search algorithm which
would be way too complex for things like regular expression
string matching.

In the case of non linear skbs this would lead to:

1) start of search algorithm
2) first call to get_text(), return skb->data
3) processing of the text segment, return if match found
4) second call to get_text(), map and return first fragment
5) processing ...
6) third call to get_text(), unmap first fragment, map and
   return second fragment
7) processing ...
n) etc.

Assuming that the most common case is to have no fragments
at all or if we only process a very limited number of
fragments (due to restricted search area) this should be
quite fast. However, the complexity of get_text() for the
skb case including the map/unmap of the fragment interrupts
the search flow quite a bit which hurts performance.

In order to address this issue I'm having second thoughts
about whether to allow the textsearch user to provide an
array of text segments including their size, i.e. the skb
fragments could be maped in prior and the algorithm could
scan through it like nothing. ;-> However, this has one
essential drawback, we might map fragments which we'll
never use so this can be slower in the end.

Any other ideas around?

next             reply	other threads:[~2005-05-27 22:47 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-05-27 22:47 Thomas Graf [this message]
2005-05-27 22:47 ` [PATCH 1/5] [LIB] textsearch infrastructure Thomas Graf
2005-05-27 22:48 ` [PATCH 2/5] [LIB] Knuth-Morris-Pratt string-matching algorithm Thomas Graf
2005-05-27 22:48 ` [PATCH 3/5] [LIB] Naive regular expression " Thomas Graf
2005-05-27 22:48 ` [PATCH 4/5] [NET] Add skb_find_text() to search for a text pattern in skbs Thomas Graf
2005-05-28  3:11   ` Pablo Neira
2005-05-28 11:32     ` Thomas Graf
2005-05-27 22:49 ` [PATCH 5/5] [PKT_SCHED] textsearch ematch Thomas Graf
2005-05-28 11:59 ` [RFC] textsearch infrastructure et al v2 jamal
2005-05-28 12:35   ` Thomas Graf
2005-05-28 12:56     ` Pablo Neira
2005-05-28 12:58       ` Pablo Neira
2005-05-28 12:58       ` Pablo Neira
2005-05-28 13:58       ` Thomas Graf
2005-05-31 22:05       ` David S. Miller
2005-05-31 21:56 ` David S. Miller
2005-05-31 22:44   ` Thomas Graf
2005-05-31 22:50     ` David S. Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20050527224725.GG15391@postel.suug.ch \
    --to=tgraf@suug.ch \
    --cc=hadi@cyberus.ca \
    --cc=netdev@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).