* [PATCH 1/3] grep: Put calls to fixmatch() and regmatch() into patmatch() @ 2011-05-03 21:35 Michał Kiedrowicz 2011-05-03 21:35 ` [PATCH 2/3] Documentation: Add --line-number to git-grep synopsis Michał Kiedrowicz 2011-05-03 21:35 ` [RFC PATCH 3/3] git-grep: Learn PCRE Michał Kiedrowicz 0 siblings, 2 replies; 6+ messages in thread From: Michał Kiedrowicz @ 2011-05-03 21:35 UTC (permalink / raw) To: Git List; +Cc: Michał Kiedrowicz --- grep.c | 23 +++++++++++++++-------- 1 files changed, 15 insertions(+), 8 deletions(-) diff --git a/grep.c b/grep.c index 63c4280..d67baf9 100644 --- a/grep.c +++ b/grep.c @@ -412,6 +412,19 @@ static int regmatch(const regex_t *preg, char *line, char *eol, return regexec(preg, line, 1, match, eflags); } +static int patmatch(struct grep_pat *p, char *line, char *eol, + regmatch_t *match, int eflags) +{ + int hit; + + if (p->fixed) + hit = !fixmatch(p, line, eol, match); + else + hit = !regmatch(&p->regexp, line, eol, match, eflags); + + return hit; +} + static int strip_timestamp(char *bol, char **eol_p) { char *eol = *eol_p; @@ -461,10 +474,7 @@ static int match_one_pattern(struct grep_pat *p, char *bol, char *eol, } again: - if (p->fixed) - hit = !fixmatch(p, bol, eol, pmatch); - else - hit = !regmatch(&p->regexp, bol, eol, pmatch, eflags); + hit = patmatch(p, bol, eol, pmatch, eflags); if (hit && p->word_regexp) { if ((pmatch[0].rm_so < 0) || @@ -791,10 +801,7 @@ static int look_ahead(struct grep_opt *opt, int hit; regmatch_t m; - if (p->fixed) - hit = !fixmatch(p, bol, bol + *left_p, &m); - else - hit = !regmatch(&p->regexp, bol, bol + *left_p, &m, 0); + hit = patmatch(p, bol, bol + *left_p, &m, 0); if (!hit || m.rm_so < 0 || m.rm_eo < 0) continue; if (earliest < 0 || m.rm_so < earliest) -- 1.7.3.4 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 2/3] Documentation: Add --line-number to git-grep synopsis 2011-05-03 21:35 [PATCH 1/3] grep: Put calls to fixmatch() and regmatch() into patmatch() Michał Kiedrowicz @ 2011-05-03 21:35 ` Michał Kiedrowicz 2011-05-03 21:35 ` [RFC PATCH 3/3] git-grep: Learn PCRE Michał Kiedrowicz 1 sibling, 0 replies; 6+ messages in thread From: Michał Kiedrowicz @ 2011-05-03 21:35 UTC (permalink / raw) To: Git List; +Cc: Michał Kiedrowicz --- Documentation/git-grep.txt | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index d7523b3..4a58378 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -12,7 +12,7 @@ SYNOPSIS 'git grep' [-a | --text] [-I] [-i | --ignore-case] [-w | --word-regexp] [-v | --invert-match] [-h|-H] [--full-name] [-E | --extended-regexp] [-G | --basic-regexp] - [-F | --fixed-strings] [-n] + [-F | --fixed-strings] [-n | --line-number] [-l | --files-with-matches] [-L | --files-without-match] [(-O | --open-files-in-pager) [<pager>]] [-z | --null] -- 1.7.3.4 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* [RFC PATCH 3/3] git-grep: Learn PCRE 2011-05-03 21:35 [PATCH 1/3] grep: Put calls to fixmatch() and regmatch() into patmatch() Michał Kiedrowicz 2011-05-03 21:35 ` [PATCH 2/3] Documentation: Add --line-number to git-grep synopsis Michał Kiedrowicz @ 2011-05-03 21:35 ` Michał Kiedrowicz 2011-05-03 23:11 ` Junio C Hamano 2011-05-03 23:15 ` Martin Langhoff 1 sibling, 2 replies; 6+ messages in thread From: Michał Kiedrowicz @ 2011-05-03 21:35 UTC (permalink / raw) To: Git List; +Cc: Michał Kiedrowicz This is quite minimal (I hope) patch that teaches git-grep the --perl-regexp/-P options in order to allow specifying PCRE regexes on the command line. PCRE has a number of features which make them more handy to use than POSIX regexes, like consistent escaping rules, extended character classes, ungreedy matching etc. Here are some implementation notes: Names of command line options are borrowed from GNU grep. Makefile changes are based on curl handing. The configure script wasn't updated. This patch lacks tests. AFAIK grep.c is a part of libgit.a, so libpcre is linked with the git itself, which bloats it. I don't like it, especially because it makes 'make test' take: real 6m27.558s user 1m34.392s sys 2m11.029s instead of real 3m41.322s user 1m44.005s sys 2m32.403s on my PC. --- Documentation/git-grep.txt | 6 ++++ Makefile | 16 ++++++++++++ builtin/grep.c | 2 + grep.c | 57 ++++++++++++++++++++++++++++++++++++++++++++ grep.h | 8 ++++++ 5 files changed, 89 insertions(+), 0 deletions(-) diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index 4a58378..e150c77 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -12,6 +12,7 @@ SYNOPSIS 'git grep' [-a | --text] [-I] [-i | --ignore-case] [-w | --word-regexp] [-v | --invert-match] [-h|-H] [--full-name] [-E | --extended-regexp] [-G | --basic-regexp] + [-P | --perl-regexp] [-F | --fixed-strings] [-n | --line-number] [-l | --files-with-matches] [-L | --files-without-match] [(-O | --open-files-in-pager) [<pager>]] @@ -97,6 +98,11 @@ OPTIONS Use POSIX extended/basic regexp for patterns. Default is to use basic regexp. +-P:: +--perl-regexp:: + Use Perl-compatible regexp for patterns. Requires libpcre to be + compiled in. + -F:: --fixed-strings:: Use fixed strings for patterns (don't interpret pattern diff --git a/Makefile b/Makefile index 3a1fe20..98841dc 100644 --- a/Makefile +++ b/Makefile @@ -24,6 +24,12 @@ all:: # Define NO_OPENSSL environment variable if you do not have OpenSSL. # This also implies BLK_SHA1. # +# Define NO_LIBPCRE if you do not have libpcre installed. git-grep cannot use +# Perl-compatible regexes. +# +# Define LIBPCREDIR=/foo/bar if your libpcre header and library files are in +# /foo/bar/include and /foo/bar/lib directories. +# # Define NO_CURL if you do not have libcurl installed. git-http-pull and # git-http-push are not built, and you cannot use http:// and https:// # transports. @@ -1251,6 +1257,16 @@ ifdef NO_LIBGEN_H COMPAT_OBJS += compat/basename.o endif +ifdef NO_LIBPCRE + BASIC_CFLAGS += -DNO_LIBPCRE +else + ifdef LIBPCREDIR + BASIC_CFLAGS += -I$(LIBPCREDIR)/include + EXTLIBS += -L$(LIBPCREDIR)/$(lib) $(CC_LD_DYNPATH)$(LIBPCREDIR)/$(lib) + endif + EXTLIBS += -lpcre +endif + ifdef NO_CURL BASIC_CFLAGS += -DNO_CURL REMOTE_CURL_PRIMARY = diff --git a/builtin/grep.c b/builtin/grep.c index 10a1f65..6831975 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -781,6 +781,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix) REG_EXTENDED), OPT_BOOLEAN('F', "fixed-strings", &opt.fixed, "interpret patterns as fixed strings"), + OPT_BOOLEAN('P', "perl-regexp", &opt.pcre, + "use Perl-compatible regular expressions"), OPT_GROUP(""), OPT_BOOLEAN('n', "line-number", &opt.linenum, "show line numbers"), OPT_NEGBIT('h', NULL, &opt.pathname, "don't show filenames", 1), diff --git a/grep.c b/grep.c index d67baf9..4f5fcbb 100644 --- a/grep.c +++ b/grep.c @@ -70,6 +70,30 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt) if (p->fixed) return; + if (opt->pcre) { +#ifdef NO_LIBPCRE + die("cannot use Perl-compatible regexes when libpcre is not compiled in"); +#else + const char *error; + int erroffset; + int options = 0; + + if (opt->ignore_case) + options |= PCRE_CASELESS; + + p->re = pcre_compile(p->pattern, options, &error, + &erroffset, NULL); + if (!p->re) + die("'%s': %s", p->pattern, error); + + p->extra = pcre_study(p->re, 0, &error); + if (!p->extra && error) + die("%s", error); + + return; +#endif + } + err = regcomp(&p->regexp, p->pattern, opt->regflags); if (err) { char errbuf[1024]; @@ -320,7 +344,16 @@ void free_grep_patterns(struct grep_opt *opt) case GREP_PATTERN: /* atom */ case GREP_PATTERN_HEAD: case GREP_PATTERN_BODY: +#ifndef NO_LIBPCRE + if (p->re) { + pcre_free(p->re); + pcre_free(p->extra); + } else { + regfree(&p->regexp); + } +#else regfree(&p->regexp); +#endif break; default: break; @@ -401,6 +434,26 @@ static int fixmatch(struct grep_pat *p, char *line, char *eol, } } +#ifndef NO_LIBPCRE +static int pcrematch(const pcre *re, const pcre_extra *extra, char *line, + char *eol, regmatch_t *match) +{ + int ovector[30], ret; + + ret = pcre_exec(re, NULL, line, eol - line, 0, 0, ovector, + ARRAY_SIZE(ovector)); + if (ret < 0 && ret != PCRE_ERROR_NOMATCH) + die("pcre_exec failed with error code %d", ret); + if (ret > 0) { + ret = 0; + match->rm_so = ovector[0]; + match->rm_eo = ovector[1]; + } + + return ret; +} +#endif /* NO_LIBPCRE */ + static int regmatch(const regex_t *preg, char *line, char *eol, regmatch_t *match, int eflags) { @@ -419,6 +472,10 @@ static int patmatch(struct grep_pat *p, char *line, char *eol, if (p->fixed) hit = !fixmatch(p, line, eol, match); +#ifndef NO_LIBPCRE + else if (p->re) + hit = !pcrematch(p->re, p->extra, line, eol, match); +#endif /* NO_LIBPCRE */ else hit = !regmatch(&p->regexp, line, eol, match, eflags); diff --git a/grep.h b/grep.h index 06621fe..cd202a9 100644 --- a/grep.h +++ b/grep.h @@ -1,6 +1,9 @@ #ifndef GREP_H #define GREP_H #include "color.h" +#ifndef NO_LIBPCRE +#include <pcre.h> +#endif /* NO_LIBPCRE */ enum grep_pat_token { GREP_PATTERN, @@ -33,6 +36,10 @@ struct grep_pat { size_t patternlen; enum grep_header_field field; regex_t regexp; +#ifndef NO_LIBPCRE + pcre *re; + pcre_extra *extra; +#endif /* NO_LIBPCRE */ unsigned fixed:1; unsigned ignore_case:1; unsigned word_regexp:1; @@ -83,6 +90,7 @@ struct grep_opt { #define GREP_BINARY_TEXT 2 int binary; int extended; + int pcre; int relative; int pathname; int null_following_name; -- 1.7.3.4 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [RFC PATCH 3/3] git-grep: Learn PCRE 2011-05-03 21:35 ` [RFC PATCH 3/3] git-grep: Learn PCRE Michał Kiedrowicz @ 2011-05-03 23:11 ` Junio C Hamano 2011-05-04 6:16 ` Michal Kiedrowicz 2011-05-03 23:15 ` Martin Langhoff 1 sibling, 1 reply; 6+ messages in thread From: Junio C Hamano @ 2011-05-03 23:11 UTC (permalink / raw) To: Michał Kiedrowicz; +Cc: Git List Michał Kiedrowicz <michal.kiedrowicz@gmail.com> writes: > This patch lacks tests. > > AFAIK grep.c is a part of libgit.a, so libpcre is linked with the git > itself, which bloats it. I don't like it, especially because it makes > 'make test' take: > > real 6m27.558s > user 1m34.392s > sys 2m11.029s > > instead of > > real 3m41.322s > user 1m44.005s > sys 2m32.403s > > on my PC. I am not against pcre, but the performance numbers look rather yucky. > diff --git a/grep.c b/grep.c > index d67baf9..4f5fcbb 100644 > --- a/grep.c > +++ b/grep.c > @@ -70,6 +70,30 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt) > if (p->fixed) > return; > > + if (opt->pcre) { > +#ifdef NO_LIBPCRE > + die("cannot use Perl-compatible regexes when libpcre is not compiled in"); > +#else > + const char *error; > + ... > + return; > +#endif > + } > + Please avoid these #ifdefs in the middle of otherwise generic function. Make the above into something like: if (opt->pcre) { compile_pcre(p, opt); return; } and then have an extra section entirely devoted to pcre integration that has bunch of ... #ifdef NO_LIBPCRE static void compile_pcre_part(struct grep_pat *p, struct grep_opt *opt) { die("..."); } #else static void compile_pcre_part(struct grep_pat *p, struct grep_opt *opt) { ... } #endif ... that is totally separate from the generic part of the codebase. They could even be in a separate file, if you need numerous helpers. > @@ -320,7 +344,16 @@ void free_grep_patterns(struct grep_opt *opt) > case GREP_PATTERN: /* atom */ > case GREP_PATTERN_HEAD: > case GREP_PATTERN_BODY: > +#ifndef NO_LIBPCRE > + if (p->re) { > + pcre_free(p->re); > + pcre_free(p->extra); > + } else { > + regfree(&p->regexp); > + } > +#else > regfree(&p->regexp); > +#endif > break; > default: > break; > diff --git a/grep.h b/grep.h > index 06621fe..cd202a9 100644 > --- a/grep.h > +++ b/grep.h > @@ -1,6 +1,9 @@ > #ifndef GREP_H > #define GREP_H > #include "color.h" > +#ifndef NO_LIBPCRE > +#include <pcre.h> > +#endif /* NO_LIBPCRE */ This part might want to do something like #ifndef NO_LIBPCRE #include <pcre.h> #else typedef int pcre; /* dummy */ typedef int pcre_extra; /* dummy */ #endif if it makes it easier to keep the generic part of the code and structure definition clean. For example, the "free" part of your patch above then would become if (p->pcre_regexp) free_pcre_part(p); else regfree(&p->regexp); with the conditional enclosed entirely within the implementation of the free_pcre_part() helper function. Also by doing so, the struct definition below can lose #ifndef, and can become: struct grep_pat { ... + pcre *pcre_regexp; + pcre_extra *pcre_extra_info; ... }; > @@ -33,6 +36,10 @@ struct grep_pat { > size_t patternlen; > enum grep_header_field field; > regex_t regexp; > +#ifndef NO_LIBPCRE > + pcre *re; > + pcre_extra *extra; > +#endif /* NO_LIBPCRE */ > unsigned fixed:1; > unsigned ignore_case:1; > unsigned word_regexp:1; > @@ -83,6 +90,7 @@ struct grep_opt { > #define GREP_BINARY_TEXT 2 > int binary; > int extended; > + int pcre; > int relative; > int pathname; > int null_following_name; ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC PATCH 3/3] git-grep: Learn PCRE 2011-05-03 23:11 ` Junio C Hamano @ 2011-05-04 6:16 ` Michal Kiedrowicz 0 siblings, 0 replies; 6+ messages in thread From: Michal Kiedrowicz @ 2011-05-04 6:16 UTC (permalink / raw) To: Junio C Hamano; +Cc: Git List On 03.05.2011 16:11:06 -0700 Junio C Hamano <gitster@pobox.com> wrote: > Michał Kiedrowicz <michal.kiedrowicz@gmail.com> writes: > > > This patch lacks tests. > > > > AFAIK grep.c is a part of libgit.a, so libpcre is linked with the > > git itself, which bloats it. I don't like it, especially because it > > makes 'make test' take: > > > > real 6m27.558s > > user 1m34.392s > > sys 2m11.029s > > > > instead of > > > > real 3m41.322s > > user 1m44.005s > > sys 2m32.403s > > > > on my PC. > > I am not against pcre, but the performance numbers look rather yucky. > > > diff --git a/grep.c b/grep.c > > index d67baf9..4f5fcbb 100644 > > --- a/grep.c > > +++ b/grep.c > > @@ -70,6 +70,30 @@ static void compile_regexp(struct grep_pat *p, > > struct grep_opt *opt) if (p->fixed) > > return; > > > > + if (opt->pcre) { > > +#ifdef NO_LIBPCRE > > + die("cannot use Perl-compatible regexes when > > libpcre is not compiled in"); +#else > > + const char *error; > > + ... > > + return; > > +#endif > > + } > > + > > Please avoid these #ifdefs in the middle of otherwise generic > function. Yeah, I don't like them too. > > Make the above into something like: > > if (opt->pcre) { > compile_pcre(p, opt); > return; > } > > and then have an extra section entirely devoted to pcre integration > that has bunch of ... > > #ifdef NO_LIBPCRE > static void compile_pcre_part(struct grep_pat *p, struct > grep_opt *opt) { > die("..."); > } > #else > static void compile_pcre_part(struct grep_pat *p, struct > grep_opt *opt) { > ... > } > #endif > > ... that is totally separate from the generic part of the codebase. > They could even be in a separate file, if you need numerous helpers. > > > @@ -320,7 +344,16 @@ void free_grep_patterns(struct grep_opt *opt) > > case GREP_PATTERN: /* atom */ > > case GREP_PATTERN_HEAD: > > case GREP_PATTERN_BODY: > > +#ifndef NO_LIBPCRE > > + if (p->re) { > > + pcre_free(p->re); > > + pcre_free(p->extra); > > + } else { > > + regfree(&p->regexp); > > + } > > +#else > > regfree(&p->regexp); > > +#endif > > break; > > default: > > break; > > > diff --git a/grep.h b/grep.h > > index 06621fe..cd202a9 100644 > > --- a/grep.h > > +++ b/grep.h > > @@ -1,6 +1,9 @@ > > #ifndef GREP_H > > #define GREP_H > > #include "color.h" > > +#ifndef NO_LIBPCRE > > +#include <pcre.h> > > +#endif /* NO_LIBPCRE */ > > This part might want to do something like > > #ifndef NO_LIBPCRE > #include <pcre.h> > #else > typedef int pcre; /* dummy */ > typedef int pcre_extra; /* dummy */ > #endif > > if it makes it easier to keep the generic part of the code and > structure definition clean. > > For example, the "free" part of your patch above then would become > > if (p->pcre_regexp) > free_pcre_part(p); > else > regfree(&p->regexp); > > with the conditional enclosed entirely within the implementation of > the free_pcre_part() helper function. > > Also by doing so, the struct definition below can lose #ifndef, and > can become: > > struct grep_pat { > ... > + pcre *pcre_regexp; > + pcre_extra *pcre_extra_info; > ... > }; That makes sense. Will re-roll (and add Signed-off-by). > > > @@ -33,6 +36,10 @@ struct grep_pat { > > size_t patternlen; > > enum grep_header_field field; > > regex_t regexp; > > +#ifndef NO_LIBPCRE > > + pcre *re; > > + pcre_extra *extra; > > +#endif /* NO_LIBPCRE */ > > unsigned fixed:1; > > unsigned ignore_case:1; > > unsigned word_regexp:1; > > @@ -83,6 +90,7 @@ struct grep_opt { > > #define GREP_BINARY_TEXT 2 > > int binary; > > int extended; > > + int pcre; > > int relative; > > int pathname; > > int null_following_name; ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC PATCH 3/3] git-grep: Learn PCRE 2011-05-03 21:35 ` [RFC PATCH 3/3] git-grep: Learn PCRE Michał Kiedrowicz 2011-05-03 23:11 ` Junio C Hamano @ 2011-05-03 23:15 ` Martin Langhoff 1 sibling, 0 replies; 6+ messages in thread From: Martin Langhoff @ 2011-05-03 23:15 UTC (permalink / raw) To: Michał Kiedrowicz; +Cc: Git List 2011/5/3 Michał Kiedrowicz <michal.kiedrowicz@gmail.com>: > This is quite minimal (I hope) patch that teaches git-grep the > --perl-regexp/-P options in order to allow specifying PCRE regexes on > the command line. Oh joy! m -- martin.langhoff@gmail.com martin@laptop.org -- Software Architect - OLPC - ask interesting questions - don't get distracted with shiny stuff - working code first - http://wiki.laptop.org/go/User:Martinlanghoff ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2011-05-04 6:16 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-05-03 21:35 [PATCH 1/3] grep: Put calls to fixmatch() and regmatch() into patmatch() Michał Kiedrowicz 2011-05-03 21:35 ` [PATCH 2/3] Documentation: Add --line-number to git-grep synopsis Michał Kiedrowicz 2011-05-03 21:35 ` [RFC PATCH 3/3] git-grep: Learn PCRE Michał Kiedrowicz 2011-05-03 23:11 ` Junio C Hamano 2011-05-04 6:16 ` Michal Kiedrowicz 2011-05-03 23:15 ` Martin Langhoff
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).