From: Sam Vilain <sam@vilain.net>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: Jeff King <peff@peff.net>, git@vger.kernel.org
Subject: Re: [PATCH] Interpret :/<pattern> as a regular expression
Date: Thu, 14 Jun 2007 19:48:11 +1200
Message-ID: <4670F2BB.5060909@vilain.net>
In-Reply-To: <Pine.LNX.4.64.0706132317240.4059@racer.site>

Johannes Schindelin wrote:
> Actually, that's funny. Yesterday, I repeated my claim that pcre is 
> slow on IRC, and Sam Villain on IRC accused me of trolling. But as you can 
> see from my postings on this list ($gmane/41682), you can see that _I_ had 
> numbers to back up my claim.
>
> So no, I think pcre is just not worth it.
>   

A strange thing to conclude from your figures, which show pcre as the
fastest out of several libraries that you tested.

Your figures show exactly what I was saying on IRC: that a DFA engine
(external grep) is inherently faster than an NFA engine (most regex
libraries). The paper I linked to, which I chose because I had already
read much of the peer review it received, explains this in detail. The
one piece of feedback your numbers got on-list also mentioned this.
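
As an illustration only (this is not taken from the benchmark under
discussion): the linked paper's running example is the pattern a?^n a^n
matched against the string a^n. The sketch below hard-codes that one
pattern family and counts steps for a backtracking matcher and for a
matcher that tracks the set of live states, which is the approach the
paper argues for. The function names are made up for this note, and
step counts stand in for timings.

```python
def backtrack_steps(n, text):
    """Match a?{n}a{n} against text by backtracking; return (matched, steps)."""
    steps = 0

    def match(p, t):
        # p indexes the 2n pattern elements, t indexes the text.
        nonlocal steps
        steps += 1
        if p == 2 * n:
            return t == len(text)
        if p < n:
            # Optional 'a': try consuming it first, then try skipping it.
            if t < len(text) and text[t] == "a" and match(p + 1, t + 1):
                return True
            return match(p + 1, t)
        # Mandatory 'a'.
        return t < len(text) and text[t] == "a" and match(p + 1, t + 1)

    return match(0, 0), steps


def stateset_steps(n, text):
    """Match the same pattern by advancing a set of live states per character."""
    steps = 0

    def closure(states):
        # An optional element can be skipped, so state s < n also reaches
        # every state from s + 1 up to n without consuming input.
        out = set(states)
        for s in states:
            while s < n:
                s += 1
                out.add(s)
        return out

    current = closure({0})
    for ch in text:
        nxt = set()
        for s in current:
            steps += 1
            if s < 2 * n and ch == "a":
                nxt.add(s + 1)
        current = closure(nxt)
    return (2 * n in current), steps


if __name__ == "__main__":
    for n in (4, 8, 12, 16):
        _, slow = backtrack_steps(n, "a" * n)
        _, fast = stateset_steps(n, "a" * n)
        print(n, slow, fast)
```

The backtracking count grows at least as fast as 2^n on this input,
while the state-set count is bounded by (2n + 1) states per input
character. Real libraries differ in many other ways, so this only
shows the shape of the argument, not the size of the effect.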

However, there is a further flaw in your study. All but one of the
performance tests use an external program, which on a given system may
or may not be faster because of pipeline performance characteristics.
You could improve the quality of the result by using the 'pcregrep'
program as a data point. It might also be worth trying a few more
complex patterns. I suggest reading the paper
(http://swtch.com/~rsc/regexp/regexp1.html) for some background before
repeating the experiment.
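
To make the in-process versus external-program point concrete, here
is a rough sketch of a harness that runs the same pattern over the
same lines both ways. It is an assumption-laden stand-in, not the
original benchmark: the child process is a Python one-liner playing
the role of an external grep, and both function names are
hypothetical. Because both sides use the same regex engine, any
difference comes from process startup and the pipe, not from the
matcher.

```python
import re
import subprocess
import sys
import time


def time_in_process(pattern, lines, repeat=5):
    """Best-of-N wall time for matching inside this process."""
    rx = re.compile(pattern)
    best = float("inf")
    hits = 0
    for _ in range(repeat):
        start = time.perf_counter()
        hits = sum(1 for line in lines if rx.search(line))
        best = min(best, time.perf_counter() - start)
    return best, hits


def time_external(pattern, lines, repeat=5):
    """Best-of-N wall time for piping the same lines to a child process."""
    # Stand-in for an external grep: a child Python counting matching lines.
    child = (
        "import re, sys\n"
        "rx = re.compile(sys.argv[1])\n"
        "print(sum(1 for l in sys.stdin if rx.search(l)))\n"
    )
    data = "\n".join(lines) + "\n"
    best = float("inf")
    hits = 0
    for _ in range(repeat):
        start = time.perf_counter()
        out = subprocess.run(
            [sys.executable, "-c", child, pattern],
            input=data, capture_output=True, text=True, check=True,
        )
        best = min(best, time.perf_counter() - start)
        hits = int(out.stdout)
    return best, hits


if __name__ == "__main__":
    sample = ["fix bug", "add feature", "fix typo"] * 1000
    print("in-process:", time_in_process("fix", sample))
    print("external:  ", time_external("fix", sample))
```

Swapping the child command for grep or pcregrep would turn this into
the data point suggested above, at the cost of depending on what is
installed on the machine running it.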

Apologies for not reviewing your numbers at the time; it sure is hard to
keep on top of this list. Still, it is very interesting that they seem
to suggest pcre would be the best choice from a performance perspective,
even though the figures are very preliminary. Perhaps it is worth
pursuing after all.

Sam.
