From mboxrd@z Thu Jan 1 00:00:00 1970 From: Pablo Neira Subject: Re: [RFC] string matching based packet classification/filtering Date: Thu, 17 Feb 2005 02:00:57 +0100 Message-ID: <4213ECC9.30502@eurodev.net> References: <20050215203211.GL31837@postel.suug.ch> <42126C9B.5060406@eurodev.net> <20050216223038.GQ31837@postel.suug.ch> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Cc: netdev@oss.sgi.com, Netfilter Development Mailinglist , Harald Welte To: Thomas Graf In-Reply-To: <20050216223038.GQ31837@postel.suug.ch> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: netfilter-devel-bounces@lists.netfilter.org Errors-To: netfilter-devel-bounces@lists.netfilter.org List-Id: netdev.vger.kernel.org Thomas Graf wrote: >>Actually I wanted to contact Harald after this friday, once I'm done >>with my exams. I'd like to merge my work with his libqsearch hackings >>that were about to be finished. >> >> > >Thinking about it, we can use the architecture to implement a regular >expression match. The pattern would consist of an array of tokens >in the form of: > >struct regexp_token >{ > __u8 type; > __u8 recur; > __u8 value; > __u8 unused; >}; > > Yes, this is a good idea. This could be useful to match things like interrogation marks or asterisks that are usually used as wildcard. >where type would be > - specific character (must match `value') > - wildcard > - digit > - xdigit > - alpha ... > >and recur > - 1 > - 0..1 > - 0..n > - 1..0 > >The matching algorithm would parse the array as a finite automation >eating up byte by byte in the text. > Looks good but Boyer-Moore algorithm doesn't need this at all. BM doesn't support wildcards like '*' because of the shiftings that it uses to look for matches. This issue together with memory consumption are two limitations that we have to live with if we want to use BM. Anyway I think it's worth it because we can perform search in O(n/m). -- Pablo