From: Michał Kiedrowicz
Subject: [PATCH V2 4/5] git-grep: Learn PCRE
Date: Thu, 5 May 2011 00:00:20 +0200
Message-ID: <1304546421-25439-5-git-send-email-michal.kiedrowicz@gmail.com>
References: <1304546421-25439-1-git-send-email-michal.kiedrowicz@gmail.com>
In-Reply-To: <1304546421-25439-1-git-send-email-michal.kiedrowicz@gmail.com>
Cc: Junio C Hamano, Martin Langhoff, Michał Kiedrowicz
To: Git List
X-Mailer: git-send-email 1.7.3.4

This patch teaches git-grep the --perl-regexp/-P options (naming
borrowed from GNU grep) so that PCRE regexes can be specified on the
command line.

PCRE regexes have a number of features that make them handier than
POSIX regexes, such as consistent escaping rules, extended character
classes, and ungreedy matching.

Signed-off-by: Michał Kiedrowicz
---
 Documentation/git-grep.txt             |    6 +++
 Makefile                               |   16 +++++++
 builtin/grep.c                         |    2 +
 contrib/completion/git-completion.bash |    1 +
 grep.c                                 |   77 +++++++++++++++++++++++++++++++-
 grep.h                                 |    9 ++++
 6 files changed, 110 insertions(+), 1 deletions(-)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 4a58378..e150c77 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -12,6 +12,7 @@ SYNOPSIS
 'git grep' [-a | --text] [-I] [-i | --ignore-case] [-w | --word-regexp]
 	   [-v | --invert-match] [-h|-H] [--full-name]
 	   [-E | --extended-regexp] [-G | --basic-regexp]
+	   [-P | --perl-regexp]
 	   [-F | --fixed-strings] [-n | --line-number]
 	   [-l | --files-with-matches] [-L | --files-without-match]
 	   [(-O | --open-files-in-pager) [<pager>]]
@@ -97,6 +98,11 @@ OPTIONS
 	Use POSIX extended/basic regexp for patterns. Default
 	is to use basic regexp.
 
+-P::
+--perl-regexp::
+	Use Perl-compatible regexp for patterns. Requires libpcre to be
+	compiled in.
+
 -F::
 --fixed-strings::
 	Use fixed strings for patterns (don't interpret pattern
diff --git a/Makefile b/Makefile
index 3a1fe20..98841dc 100644
--- a/Makefile
+++ b/Makefile
@@ -24,6 +24,12 @@ all::
 # Define NO_OPENSSL environment variable if you do not have OpenSSL.
 # This also implies BLK_SHA1.
 #
+# Define NO_LIBPCRE if you do not have libpcre installed. git-grep cannot use
+# Perl-compatible regexes.
+#
+# Define LIBPCREDIR=/foo/bar if your libpcre header and library files are in
+# /foo/bar/include and /foo/bar/lib directories.
+#
 # Define NO_CURL if you do not have libcurl installed. git-http-pull and
 # git-http-push are not built, and you cannot use http:// and https://
 # transports.
@@ -1251,6 +1257,16 @@ ifdef NO_LIBGEN_H
 	COMPAT_OBJS += compat/basename.o
 endif
 
+ifdef NO_LIBPCRE
+	BASIC_CFLAGS += -DNO_LIBPCRE
+else
+	ifdef LIBPCREDIR
+		BASIC_CFLAGS += -I$(LIBPCREDIR)/include
+		EXTLIBS += -L$(LIBPCREDIR)/$(lib) $(CC_LD_DYNPATH)$(LIBPCREDIR)/$(lib)
+	endif
+	EXTLIBS += -lpcre
+endif
+
 ifdef NO_CURL
 	BASIC_CFLAGS += -DNO_CURL
 	REMOTE_CURL_PRIMARY =
diff --git a/builtin/grep.c b/builtin/grep.c
index 10a1f65..6831975 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -781,6 +781,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 			REG_EXTENDED),
 		OPT_BOOLEAN('F', "fixed-strings", &opt.fixed,
 			"interpret patterns as fixed strings"),
+		OPT_BOOLEAN('P', "perl-regexp", &opt.pcre,
+			"use Perl-compatible regular expressions"),
 		OPT_GROUP(""),
 		OPT_BOOLEAN('n', "line-number", &opt.linenum, "show line numbers"),
 		OPT_NEGBIT('h', NULL, &opt.pathname, "don't show filenames", 1),
diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
index 4b2654d..95790a1 100755
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -1487,6 +1487,7 @@ _git_grep ()
 			--text --ignore-case --word-regexp --invert-match
 			--full-name --line-number
 			--extended-regexp --basic-regexp --fixed-strings
+			--perl-regexp
 			--files-with-matches --name-only
 			--files-without-match
 			--max-depth
diff --git a/grep.c b/grep.c
index d67baf9..561b791 100644
--- a/grep.c
+++ b/grep.c
@@ -3,6 +3,71 @@
 #include "userdiff.h"
 #include "xdiff-interface.h"
 
+#ifdef NO_LIBPCRE
+static void compile_pcre_regexp(struct grep_pat *p, struct grep_opt *opt)
+{
+	die("cannot use Perl-compatible regexes when libpcre is not compiled in");
+}
+
+static int pcrematch(struct grep_pat *p, char *line, char *eol,
+		regmatch_t *match, int eflags)
+{
+	die("cannot use Perl-compatible regexes when libpcre is not compiled in");
+}
+
+static void free_pcre_regexp(struct grep_pat *p)
+{
+	die("cannot use Perl-compatible regexes when libpcre is not compiled in");
+}
+
+#else /* !NO_LIBPCRE */
+static void compile_pcre_regexp(struct grep_pat *p, struct grep_opt *opt)
+{
+	const char *error;
+	int erroffset;
+	int options = 0;
+
+	if (opt->ignore_case)
+		options |= PCRE_CASELESS;
+
+	p->pcre_regexp = pcre_compile(p->pattern, options, &error, &erroffset,
+			NULL);
+	if (!p->pcre_regexp)
+		die("'%s': %s", p->pattern, error);
+
+	p->extra = pcre_study(p->pcre_regexp, 0, &error);
+	if (!p->extra && error)
+		die("%s", error);
+}
+
+static int pcrematch(struct grep_pat *p, char *line, char *eol,
+		regmatch_t *match, int eflags)
+{
+	int ovector[30], ret, flags = 0;
+
+	if (eflags & REG_NOTBOL)
+		flags |= PCRE_NOTBOL;
+
+	ret = pcre_exec(p->pcre_regexp, p->extra, line, eol - line, 0, flags,
+			ovector, ARRAY_SIZE(ovector));
+	if (ret < 0 && ret != PCRE_ERROR_NOMATCH)
+		die("pcre_exec failed with error code %d", ret);
+	if (ret > 0) {
+		ret = 0;
+		match->rm_so = ovector[0];
+		match->rm_eo = ovector[1];
+	}
+
+	return ret;
+}
+
+static void free_pcre_regexp(struct grep_pat *p)
+{
+	pcre_free(p->pcre_regexp);
+	pcre_free(p->extra);
+}
+#endif /* !NO_LIBPCRE */
+
 void append_header_grep_pattern(struct grep_opt *opt, enum grep_header_field field, const char *pat)
 {
 	struct grep_pat *p = xcalloc(1, sizeof(*p));
@@ -70,6 +135,11 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 	if (p->fixed)
 		return;
 
+	if (opt->pcre) {
+		compile_pcre_regexp(p, opt);
+		return;
+	}
+
 	err = regcomp(&p->regexp, p->pattern, opt->regflags);
 	if (err) {
 		char errbuf[1024];
@@ -320,7 +390,10 @@ void free_grep_patterns(struct grep_opt *opt)
 		case GREP_PATTERN: /* atom */
 		case GREP_PATTERN_HEAD:
 		case GREP_PATTERN_BODY:
-			regfree(&p->regexp);
+			if (p->pcre_regexp)
+				free_pcre_regexp(p);
+			else
+				regfree(&p->regexp);
 			break;
 		default:
 			break;
@@ -419,6 +492,8 @@ static int patmatch(struct grep_pat *p, char *line, char *eol,
 
 	if (p->fixed)
 		hit = !fixmatch(p, line, eol, match);
+	else if (p->pcre_regexp)
+		hit = !pcrematch(p, line, eol, match, eflags);
 	else
 		hit = !regmatch(&p->regexp, line, eol, match, eflags);
 
diff --git a/grep.h b/grep.h
index 06621fe..68aa47a 100644
--- a/grep.h
+++ b/grep.h
@@ -1,6 +1,12 @@
 #ifndef GREP_H
 #define GREP_H
 #include "color.h"
+#ifndef NO_LIBPCRE
+#include <pcre.h>
+#else
+typedef int pcre;
+typedef int pcre_extra;
+#endif /* NO_LIBPCRE */
 
 enum grep_pat_token {
 	GREP_PATTERN,
@@ -33,6 +39,8 @@ struct grep_pat {
 	size_t patternlen;
 	enum grep_header_field field;
 	regex_t regexp;
+	pcre *pcre_regexp;
+	pcre_extra *extra;
 	unsigned fixed:1;
 	unsigned ignore_case:1;
 	unsigned word_regexp:1;
@@ -83,6 +91,7 @@ struct grep_opt {
 #define GREP_BINARY_TEXT 2
 	int binary;
 	int extended;
+	int pcre;
 	int relative;
 	int pathname;
 	int null_following_name;
-- 
1.7.3.4
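
For readers following the return-value handling in pcrematch(), here is
a minimal sketch (not part of the patch) of how a pcre_exec()-style
result is folded into the regexec()-style convention that patmatch()
relies on: 0 means "matched", which is why the caller writes
hit = !pcrematch(...). The helper name sketch_map_pcre_result and the
macro SKETCH_PCRE_ERROR_NOMATCH are made up for illustration; the macro
mirrors libpcre's PCRE_ERROR_NOMATCH value of -1 so that the snippet
builds without libpcre installed.

```c
#include <regex.h>	/* regmatch_t, the type grep.c's match helpers fill in */

/*
 * Assumption: same value as libpcre's PCRE_ERROR_NOMATCH (-1), redefined
 * here only so the sketch compiles without <pcre.h>.
 */
#define SKETCH_PCRE_ERROR_NOMATCH (-1)

/*
 * Hypothetical helper: takes the value pcre_exec() returned plus its
 * output vector and applies the same mapping as the tail of pcrematch().
 *
 * ret > 0: a match was found; report 0 and copy the offsets of the whole
 *          match (ovector[0], ovector[1]) into *match.
 * else:    pass the value through unchanged and leave *match untouched;
 *          a nonzero result makes the caller's "!" yield "no hit".
 */
static int sketch_map_pcre_result(int ret, const int *ovector,
		regmatch_t *match)
{
	if (ret > 0) {
		match->rm_so = ovector[0];
		match->rm_eo = ovector[1];
		return 0;
	}
	return ret;
}
```

With this mapping, a caller can treat the PCRE and POSIX back ends alike:
a zero result means *match holds the start and end offsets of the hit
within the line.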