git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/6] Add PCRE support to git-grep
@ 2011-05-09 21:52 Michał Kiedrowicz
  2011-05-09 21:52 ` [PATCH v3 1/6] grep: Fix a typo in a comment Michał Kiedrowicz
                   ` (5 more replies)
  0 siblings, 6 replies; 10+ messages in thread
From: Michał Kiedrowicz @ 2011-05-09 21:52 UTC (permalink / raw)
  To: Git List
  Cc: Junio C Hamano, Martin Langhoff, Bert Wesarg, Johannes Sixt,
	Alex Riesen, Michał Kiedrowicz

This is the 3rd attempt to add PCRE support to git-grep.

Changes from v2:

* Replaced NO_LIBPCRE with USE_LIBPCRE. This makes libpcre support
optional. I also considered WITH_LIBPCRE but I see most #defines start
with USE_.

* Renamed 'pcre_extra *extra' with 'pcre_extra *pcre_extra_info'. Now
this variable is more descriptive.

* Reworded description of USE_LIBPCRE in Makefile and configure.ac

* I _didn't_ change the die() message to contain only "Perl-compatible
regexes not supported", but left "cannot use Perl-compatible regexes
when not compiled with USE_LIBPCRE". In my opinion this is more
informative to the user. However, it's OK for me to change that if more
people prefer shorter version.

* Removed die() from pcrematch() and free_pcre_regexp(), because die()
is first called from compile_pcre_regexp(). I left die() there and not
moved it to option parsing because I'd like to keep as few '#ifdef
USE_LIBPCRE' as possible.

* Added compile_regexp_failed() which handles both regcomp() and
pcre_compile() failures.

* Added basic testcases. I'm not sure if I should repeat all
git-grep tests with -P enabled, this seems to be officiousness. Beyond
testing -P/--perl-regexp and their interaction with -i and -w,  I also
check if `git grep -F -P` and `git grep -E -P` was called, but do not
protect from `git grep -G -P` nor `git grep -E -G -P`. Fixing this
would require changing the way -G -E are parsed (currently they
set/unset REG_EXTENDED bit in opts.regflags). Personally, if I were
fixing this, I'd remove regflags completely from grep_opt and use it
only in functions which call regcomp()/regexec() directly. That would
make code more general. But it's also possible to just convert option
parsing code to properly detect used flags. That said, I don't think the
end result is worth the effort.

I'd like to thank for all comments!

Michał Kiedrowicz (6):
  grep: Fix a typo in a comment
  grep: Extract compile_regexp_failed() from compile_regexp()
  git-grep: Learn PCRE
  configure: Check for libpcre
  grep: Add basic tests
  git-grep: Bail out when -P is used with -F or -E

 Documentation/git-grep.txt             |    6 ++
 Makefile                               |   15 +++++
 builtin/grep.c                         |    6 ++-
 config.mak.in                          |    1 +
 configure.ac                           |   40 ++++++++++++
 contrib/completion/git-completion.bash |    1 +
 grep.c                                 |  102 ++++++++++++++++++++++++++++----
 grep.h                                 |    9 +++
 t/README                               |    5 ++
 t/t7810-grep.sh                        |   54 +++++++++++++++++
 t/test-lib.sh                          |    1 +
 11 files changed, 228 insertions(+), 12 deletions(-)

-- 
1.7.3.4

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v3 1/6] grep: Fix a typo in a comment
  2011-05-09 21:52 [PATCH v3 0/6] Add PCRE support to git-grep Michał Kiedrowicz
@ 2011-05-09 21:52 ` Michał Kiedrowicz
  2011-05-09 21:52 ` [PATCH v3 2/6] grep: Extract compile_regexp_failed() from compile_regexp() Michał Kiedrowicz
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Michał Kiedrowicz @ 2011-05-09 21:52 UTC (permalink / raw)
  To: Git List
  Cc: Junio C Hamano, Martin Langhoff, Bert Wesarg, Johannes Sixt,
	Alex Riesen, Michał Kiedrowicz


Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
---
 grep.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/grep.c b/grep.c
index d67baf9..250462e 100644
--- a/grep.c
+++ b/grep.c
@@ -898,7 +898,7 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name,
 		int hit;
 
 		/*
-		 * look_ahead() skips quicly to the line that possibly
+		 * look_ahead() skips quickly to the line that possibly
 		 * has the next hit; don't call it if we need to do
 		 * something more than just skipping the current line
 		 * in response to an unmatch for the current line.  E.g.
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v3 2/6] grep: Extract compile_regexp_failed() from compile_regexp()
  2011-05-09 21:52 [PATCH v3 0/6] Add PCRE support to git-grep Michał Kiedrowicz
  2011-05-09 21:52 ` [PATCH v3 1/6] grep: Fix a typo in a comment Michał Kiedrowicz
@ 2011-05-09 21:52 ` Michał Kiedrowicz
  2011-05-09 21:52 ` [PATCH v3 3/6] git-grep: Learn PCRE Michał Kiedrowicz
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Michał Kiedrowicz @ 2011-05-09 21:52 UTC (permalink / raw)
  To: Git List
  Cc: Junio C Hamano, Martin Langhoff, Bert Wesarg, Johannes Sixt,
	Alex Riesen, Michał Kiedrowicz

This simplifies compile_regexp() a little and allows re-using error
handling code.

Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
---
 grep.c |   25 ++++++++++++++++---------
 1 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/grep.c b/grep.c
index 250462e..870d10c 100644
--- a/grep.c
+++ b/grep.c
@@ -59,6 +59,21 @@ struct grep_opt *grep_opt_dup(const struct grep_opt *opt)
 	return ret;
 }
 
+static NORETURN void compile_regexp_failed(const struct grep_pat *p,
+		const char *error)
+{
+	char where[1024];
+
+	if (p->no)
+		sprintf(where, "In '%s' at %d, ", p->origin, p->no);
+	else if (p->origin)
+		sprintf(where, "%s, ", p->origin);
+	else
+		where[0] = 0;
+
+	die("%s'%s': %s", where, p->pattern, error);
+}
+
 static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 {
 	int err;
@@ -73,17 +88,9 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 	err = regcomp(&p->regexp, p->pattern, opt->regflags);
 	if (err) {
 		char errbuf[1024];
-		char where[1024];
-		if (p->no)
-			sprintf(where, "In '%s' at %d, ",
-				p->origin, p->no);
-		else if (p->origin)
-			sprintf(where, "%s, ", p->origin);
-		else
-			where[0] = 0;
 		regerror(err, &p->regexp, errbuf, 1024);
 		regfree(&p->regexp);
-		die("%s'%s': %s", where, p->pattern, errbuf);
+		compile_regexp_failed(p, errbuf);
 	}
 }
 
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v3 3/6] git-grep: Learn PCRE
  2011-05-09 21:52 [PATCH v3 0/6] Add PCRE support to git-grep Michał Kiedrowicz
  2011-05-09 21:52 ` [PATCH v3 1/6] grep: Fix a typo in a comment Michał Kiedrowicz
  2011-05-09 21:52 ` [PATCH v3 2/6] grep: Extract compile_regexp_failed() from compile_regexp() Michał Kiedrowicz
@ 2011-05-09 21:52 ` Michał Kiedrowicz
  2011-05-09 21:52 ` [PATCH v3 4/6] configure: Check for libpcre Michał Kiedrowicz
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Michał Kiedrowicz @ 2011-05-09 21:52 UTC (permalink / raw)
  To: Git List
  Cc: Junio C Hamano, Martin Langhoff, Bert Wesarg, Johannes Sixt,
	Alex Riesen, Michał Kiedrowicz

This patch teaches git-grep the --perl-regexp/-P options (naming
borrowed from GNU grep) in order to allow specifying PCRE regexes on the
command line.

PCRE has a number of features which make them more handy to use than
POSIX regexes, like consistent escaping rules, extended character
classes, ungreedy matching etc.

git isn't build with PCRE support automatically. USE_LIBPCRE environment
variable must be enabled (like `make USE_LIBPCRE=YesPlease`).

Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
---
 Documentation/git-grep.txt             |    6 +++
 Makefile                               |   15 ++++++
 builtin/grep.c                         |    2 +
 contrib/completion/git-completion.bash |    1 +
 grep.c                                 |   75 +++++++++++++++++++++++++++++++-
 grep.h                                 |    9 ++++
 6 files changed, 107 insertions(+), 1 deletions(-)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 4a58378..e150c77 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -12,6 +12,7 @@ SYNOPSIS
 'git grep' [-a | --text] [-I] [-i | --ignore-case] [-w | --word-regexp]
 	   [-v | --invert-match] [-h|-H] [--full-name]
 	   [-E | --extended-regexp] [-G | --basic-regexp]
+	   [-P | --perl-regexp]
 	   [-F | --fixed-strings] [-n | --line-number]
 	   [-l | --files-with-matches] [-L | --files-without-match]
 	   [(-O | --open-files-in-pager) [<pager>]]
@@ -97,6 +98,11 @@ OPTIONS
 	Use POSIX extended/basic regexp for patterns.  Default
 	is to use basic regexp.
 
+-P::
+--perl-regexp::
+	Use Perl-compatible regexp for patterns. Requires libpcre to be
+	compiled in.
+
 -F::
 --fixed-strings::
 	Use fixed strings for patterns (don't interpret pattern
diff --git a/Makefile b/Makefile
index 3a1fe20..717a47c 100644
--- a/Makefile
+++ b/Makefile
@@ -24,6 +24,12 @@ all::
 # Define NO_OPENSSL environment variable if you do not have OpenSSL.
 # This also implies BLK_SHA1.
 #
+# Define USE_LIBPCRE if you have and want to use libpcre. git-grep will be
+# able to use Perl-compatible regular expressions.
+#
+# Define LIBPCREDIR=/foo/bar if your libpcre header and library files are in
+# /foo/bar/include and /foo/bar/lib directories.
+#
 # Define NO_CURL if you do not have libcurl installed.  git-http-pull and
 # git-http-push are not built, and you cannot use http:// and https://
 # transports.
@@ -1251,6 +1257,15 @@ ifdef NO_LIBGEN_H
 	COMPAT_OBJS += compat/basename.o
 endif
 
+ifdef USE_LIBPCRE
+	BASIC_CFLAGS += -DUSE_LIBPCRE
+	ifdef LIBPCREDIR
+		BASIC_CFLAGS += -I$(LIBPCREDIR)/include
+		EXTLIBS += -L$(LIBPCREDIR)/$(lib) $(CC_LD_DYNPATH)$(LIBPCREDIR)/$(lib)
+	endif
+	EXTLIBS += -lpcre
+endif
+
 ifdef NO_CURL
 	BASIC_CFLAGS += -DNO_CURL
 	REMOTE_CURL_PRIMARY =
diff --git a/builtin/grep.c b/builtin/grep.c
index 10a1f65..6831975 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -781,6 +781,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 			REG_EXTENDED),
 		OPT_BOOLEAN('F', "fixed-strings", &opt.fixed,
 			"interpret patterns as fixed strings"),
+		OPT_BOOLEAN('P', "perl-regexp", &opt.pcre,
+				"use Perl-compatible regular expressions"),
 		OPT_GROUP(""),
 		OPT_BOOLEAN('n', "line-number", &opt.linenum, "show line numbers"),
 		OPT_NEGBIT('h', NULL, &opt.pathname, "don't show filenames", 1),
diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
index 4b2654d..95790a1 100755
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -1487,6 +1487,7 @@ _git_grep ()
 			--text --ignore-case --word-regexp --invert-match
 			--full-name --line-number
 			--extended-regexp --basic-regexp --fixed-strings
+			--perl-regexp
 			--files-with-matches --name-only
 			--files-without-match
 			--max-depth
diff --git a/grep.c b/grep.c
index 870d10c..d03d9e2 100644
--- a/grep.c
+++ b/grep.c
@@ -74,6 +74,69 @@ static NORETURN void compile_regexp_failed(const struct grep_pat *p,
 	die("%s'%s': %s", where, p->pattern, error);
 }
 
+#ifdef USE_LIBPCRE
+static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt)
+{
+	const char *error;
+	int erroffset;
+	int options = 0;
+
+	if (opt->ignore_case)
+		options |= PCRE_CASELESS;
+
+	p->pcre_regexp = pcre_compile(p->pattern, options, &error, &erroffset,
+			NULL);
+	if (!p->pcre_regexp)
+		compile_regexp_failed(p, error);
+
+	p->pcre_extra_info = pcre_study(p->pcre_regexp, 0, &error);
+	if (!p->pcre_extra_info && error)
+		die("%s", error);
+}
+
+static int pcrematch(struct grep_pat *p, const char *line, const char *eol,
+		regmatch_t *match, int eflags)
+{
+	int ovector[30], ret, flags = 0;
+
+	if (eflags & REG_NOTBOL)
+		flags |= PCRE_NOTBOL;
+
+	ret = pcre_exec(p->pcre_regexp, p->pcre_extra_info, line, eol - line,
+			0, flags, ovector, ARRAY_SIZE(ovector));
+	if (ret < 0 && ret != PCRE_ERROR_NOMATCH)
+		die("pcre_exec failed with error code %d", ret);
+	if (ret > 0) {
+		ret = 0;
+		match->rm_so = ovector[0];
+		match->rm_eo = ovector[1];
+	}
+
+	return ret;
+}
+
+static void free_pcre_regexp(struct grep_pat *p)
+{
+	pcre_free(p->pcre_regexp);
+	pcre_free(p->pcre_extra_info);
+}
+#else /* !USE_LIBPCRE */
+static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt)
+{
+	die("cannot use Perl-compatible regexes when not compiled with USE_LIBPCRE");
+}
+
+static int pcrematch(struct grep_pat *p, const char *line, const char *eol,
+		regmatch_t *match, int eflags)
+{
+	return 1;
+}
+
+static void free_pcre_regexp(struct grep_pat *p)
+{
+}
+#endif /* !USE_LIBPCRE */
+
 static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 {
 	int err;
@@ -85,6 +148,11 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 	if (p->fixed)
 		return;
 
+	if (opt->pcre) {
+		compile_pcre_regexp(p, opt);
+		return;
+	}
+
 	err = regcomp(&p->regexp, p->pattern, opt->regflags);
 	if (err) {
 		char errbuf[1024];
@@ -327,7 +395,10 @@ void free_grep_patterns(struct grep_opt *opt)
 		case GREP_PATTERN: /* atom */
 		case GREP_PATTERN_HEAD:
 		case GREP_PATTERN_BODY:
-			regfree(&p->regexp);
+			if (p->pcre_regexp)
+				free_pcre_regexp(p);
+			else
+				regfree(&p->regexp);
 			break;
 		default:
 			break;
@@ -426,6 +497,8 @@ static int patmatch(struct grep_pat *p, char *line, char *eol,
 
 	if (p->fixed)
 		hit = !fixmatch(p, line, eol, match);
+	else if (p->pcre_regexp)
+		hit = !pcrematch(p, line, eol, match, eflags);
 	else
 		hit = !regmatch(&p->regexp, line, eol, match, eflags);
 
diff --git a/grep.h b/grep.h
index 06621fe..cd055cd 100644
--- a/grep.h
+++ b/grep.h
@@ -1,6 +1,12 @@
 #ifndef GREP_H
 #define GREP_H
 #include "color.h"
+#ifdef USE_LIBPCRE
+#include <pcre.h>
+#else
+typedef int pcre;
+typedef int pcre_extra;
+#endif
 
 enum grep_pat_token {
 	GREP_PATTERN,
@@ -33,6 +39,8 @@ struct grep_pat {
 	size_t patternlen;
 	enum grep_header_field field;
 	regex_t regexp;
+	pcre *pcre_regexp;
+	pcre_extra *pcre_extra_info;
 	unsigned fixed:1;
 	unsigned ignore_case:1;
 	unsigned word_regexp:1;
@@ -83,6 +91,7 @@ struct grep_opt {
 #define GREP_BINARY_TEXT	2
 	int binary;
 	int extended;
+	int pcre;
 	int relative;
 	int pathname;
 	int null_following_name;
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v3 4/6] configure: Check for libpcre
  2011-05-09 21:52 [PATCH v3 0/6] Add PCRE support to git-grep Michał Kiedrowicz
                   ` (2 preceding siblings ...)
  2011-05-09 21:52 ` [PATCH v3 3/6] git-grep: Learn PCRE Michał Kiedrowicz
@ 2011-05-09 21:52 ` Michał Kiedrowicz
  2011-05-09 21:52 ` [PATCH v3 5/6] grep: Add basic tests Michał Kiedrowicz
  2011-05-09 21:52 ` [PATCH v3 6/6] git-grep: Bail out when -P is used with -F or -E Michał Kiedrowicz
  5 siblings, 0 replies; 10+ messages in thread
From: Michał Kiedrowicz @ 2011-05-09 21:52 UTC (permalink / raw)
  To: Git List
  Cc: Junio C Hamano, Martin Langhoff, Bert Wesarg, Johannes Sixt,
	Alex Riesen, Michał Kiedrowicz

This patch adds checks for libpcre to configure. By default libpcre is
disabled, --with-libpcre enables it (if it works).

Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
---
 config.mak.in |    1 +
 configure.ac  |   40 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+), 0 deletions(-)

diff --git a/config.mak.in b/config.mak.in
index e378534..f30130b 100644
--- a/config.mak.in
+++ b/config.mak.in
@@ -61,6 +61,7 @@ NO_INET_PTON=@NO_INET_PTON@
 NO_ICONV=@NO_ICONV@
 OLD_ICONV=@OLD_ICONV@
 NO_REGEX=@NO_REGEX@
+USE_LIBPCRE=@USE_LIBPCRE@
 NO_DEFLATE_BOUND=@NO_DEFLATE_BOUND@
 INLINE=@INLINE@
 SOCKLEN_T=@SOCKLEN_T@
diff --git a/configure.ac b/configure.ac
index fafd815..048a1d4 100644
--- a/configure.ac
+++ b/configure.ac
@@ -220,6 +220,27 @@ AS_HELP_STRING([--with-openssl],[use OpenSSL library (default is YES)])
 AS_HELP_STRING([],              [ARG can be prefix for openssl library and headers]),\
 GIT_PARSE_WITH(openssl))
 #
+# Define USE_LIBPCRE if you have and want to use libpcre. git-grep will be
+# able to use Perl-compatible regular expressions.
+#
+# Define LIBPCREDIR=/foo/bar if your libpcre header and library files are in
+# /foo/bar/include and /foo/bar/lib directories.
+#
+AC_ARG_WITH(libpcre,
+AS_HELP_STRING([--with-libpcre],[support Perl-compatible regexes (default is NO)])
+AS_HELP_STRING([],           [ARG can be also prefix for libpcre library and headers]),
+if test "$withval" = "no"; then \
+	USE_LIBPCRE=; \
+elif test "$withval" = "yes"; then \
+	USE_LIBPCRE=YesPlease; \
+else
+	USE_LIBPCRE=YesPlease; \
+	LIBPCREDIR=$withval; \
+	AC_MSG_NOTICE([Setting LIBPCREDIR to $withval]); \
+	GIT_CONF_APPEND_LINE(LIBPCREDIR=$withval); \
+fi \
+)
+#
 # Define NO_CURL if you do not have curl installed.  git-http-pull and
 # git-http-push are not built, and you cannot use http:// and https://
 # transports.
@@ -435,6 +456,25 @@ AC_SUBST(NEEDS_SSL_WITH_CRYPTO)
 AC_SUBST(NO_OPENSSL)
 
 #
+# Define USE_LIBPCRE if you have and want to use libpcre. git-grep will be
+# able to use Perl-compatible regular expressions.
+#
+
+if test -n "$USE_LIBPCRE"; then
+
+GIT_STASH_FLAGS($LIBPCREDIR)
+
+AC_CHECK_LIB([pcre], [pcre_version],
+[USE_LIBPCRE=YesPlease],
+[USE_LIBPCRE=])
+
+GIT_UNSTASH_FLAGS($LIBPCREDIR)
+
+AC_SUBST(USE_LIBPCRE)
+
+fi
+
+#
 # Define NO_CURL if you do not have libcurl installed.  git-http-pull and
 # git-http-push are not built, and you cannot use http:// and https://
 # transports.
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v3 5/6] grep: Add basic tests
  2011-05-09 21:52 [PATCH v3 0/6] Add PCRE support to git-grep Michał Kiedrowicz
                   ` (3 preceding siblings ...)
  2011-05-09 21:52 ` [PATCH v3 4/6] configure: Check for libpcre Michał Kiedrowicz
@ 2011-05-09 21:52 ` Michał Kiedrowicz
  2011-05-09 21:52 ` [PATCH v3 6/6] git-grep: Bail out when -P is used with -F or -E Michał Kiedrowicz
  5 siblings, 0 replies; 10+ messages in thread
From: Michał Kiedrowicz @ 2011-05-09 21:52 UTC (permalink / raw)
  To: Git List
  Cc: Junio C Hamano, Martin Langhoff, Bert Wesarg, Johannes Sixt,
	Alex Riesen, Michał Kiedrowicz

This modest patch adds simple tests for git grep -P/--perl-regexp and
its interoperation with -i and -w.

Tests are only enabled when prerequisite LIBPCRE is defined (it's
automatically set based on USE_LIBPCRE in test-lib.sh).

Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
---
 t/README        |    5 +++++
 t/t7810-grep.sh |   38 ++++++++++++++++++++++++++++++++++++++
 t/test-lib.sh   |    1 +
 3 files changed, 44 insertions(+), 0 deletions(-)

diff --git a/t/README b/t/README
index 428ee05..238729c 100644
--- a/t/README
+++ b/t/README
@@ -587,6 +587,11 @@ use these, and "test_set_prereq" for how to define your own.
    Test is not run by root user, and an attempt to write to an
    unwritable file is expected to fail correctly.
 
+ - LIBPCRE
+
+   Git was compiled with USE_LIBPCRE=YesPlease. Wrap any tests
+   that use git-grep --perl-regexp or git-grep -P in these.
+
 Tips for Writing Tests
 ----------------------
 
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index 8184c26..e845218 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -26,6 +26,12 @@ test_expect_success setup '
 		echo foo mmap bar_mmap
 		echo foo_mmap bar mmap baz
 	} >file &&
+	{
+		echo Hello world
+		echo HeLLo world
+		echo Hello_world
+		echo HeLLo_world
+	} >hello_world &&
 	echo vvv >v &&
 	echo ww w >w &&
 	echo x x xx x >x &&
@@ -599,4 +605,36 @@ test_expect_success 'grep -e -- -- path' '
 	test_cmp expected actual
 '
 
+cat >expected <<EOF
+hello.c:int main(int argc, const char **argv)
+hello.c:	printf("Hello world.\n");
+EOF
+
+test_expect_success LIBPCRE 'grep --perl-regexp pattern' '
+	git grep --perl-regexp "\p{Ps}.*?\p{Pe}" hello.c >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success LIBPCRE 'grep -P pattern' '
+	git grep -P "\p{Ps}.*?\p{Pe}" hello.c >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success LIBPCRE 'grep -P -i pattern' '
+	{
+		echo "hello.c:	printf(\"Hello world.\n\");"
+	} >expected &&
+	git grep -P -i "PRINTF\([^\d]+\)" hello.c >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success LIBPCRE 'grep -P -w pattern' '
+	{
+		echo "hello_world:Hello world"
+		echo "hello_world:HeLLo world"
+	} >expected &&
+	git grep -P -w "He((?i)ll)o" hello_world >actual &&
+	test_cmp expected actual
+'
+
 test_done
diff --git a/t/test-lib.sh b/t/test-lib.sh
index c5b18e2..368f7ae 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1069,6 +1069,7 @@ esac
 
 test -z "$NO_PERL" && test_set_prereq PERL
 test -z "$NO_PYTHON" && test_set_prereq PYTHON
+test -n "$USE_LIBPCRE" && test_set_prereq LIBPCRE
 
 # Can we rely on git's output in the C locale?
 if test -n "$GETTEXT_POISON"
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v3 6/6] git-grep: Bail out when -P is used with -F or -E
  2011-05-09 21:52 [PATCH v3 0/6] Add PCRE support to git-grep Michał Kiedrowicz
                   ` (4 preceding siblings ...)
  2011-05-09 21:52 ` [PATCH v3 5/6] grep: Add basic tests Michał Kiedrowicz
@ 2011-05-09 21:52 ` Michał Kiedrowicz
  2011-05-10  1:48   ` Junio C Hamano
  5 siblings, 1 reply; 10+ messages in thread
From: Michał Kiedrowicz @ 2011-05-09 21:52 UTC (permalink / raw)
  To: Git List
  Cc: Junio C Hamano, Martin Langhoff, Bert Wesarg, Johannes Sixt,
	Alex Riesen, Michał Kiedrowicz

This patch makes git-grep die() when -P is used on command line together
with -E/--extended-regexp or -F/--fixed-strings.

This also makes it bail out when grep.extendedRegexp is enabled.

But `git grep -G -P pattern` and `git grep -E -G -P pattern` still work
because -G and -E set opts.regflags during parse_options() and there is
no way to detect `-G` or `-E -G`.

Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
---
 builtin/grep.c  |    4 +++-
 t/t7810-grep.sh |   16 ++++++++++++++++
 2 files changed, 19 insertions(+), 1 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 6831975..8f26026 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -925,9 +925,11 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 
 	if (!opt.pattern_list)
 		die(_("no pattern given."));
+	if (opt.regflags != REG_NEWLINE && opt.pcre)
+		die(_("cannot mix --extended-regexp and --perl-regexp"));
 	if (!opt.fixed && opt.ignore_case)
 		opt.regflags |= REG_ICASE;
-	if ((opt.regflags != REG_NEWLINE) && opt.fixed)
+	if ((opt.regflags != REG_NEWLINE || opt.pcre) && opt.fixed)
 		die(_("cannot mix --fixed-strings and regexp"));
 
 #ifndef NO_PTHREADS
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index e845218..2a31eca 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -637,4 +637,20 @@ test_expect_success LIBPCRE 'grep -P -w pattern' '
 	test_cmp expected actual
 '
 
+test_expect_success LIBPCRE 'grep -P -F returns error' '
+	test_expect_code 128 git grep -P -F main
+'
+
+test_expect_success LIBPCRE 'grep -P -E returns error' '
+	test_expect_code 128 git grep -P -E main
+'
+
+test_expect_failure LIBPCRE 'grep -P -G returns error' '
+	test_expect_code 128 git grep -P -G main
+'
+
+test_expect_failure LIBPCRE 'grep -P -E -G returns error' '
+	test_expect_code 128 git grep -P -E -G main
+'
+
 test_done
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 6/6] git-grep: Bail out when -P is used with -F or -E
  2011-05-09 21:52 ` [PATCH v3 6/6] git-grep: Bail out when -P is used with -F or -E Michał Kiedrowicz
@ 2011-05-10  1:48   ` Junio C Hamano
  2011-05-10  5:24     ` Michal Kiedrowicz
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2011-05-10  1:48 UTC (permalink / raw)
  To: Michał Kiedrowicz
  Cc: Git List, Martin Langhoff, Bert Wesarg, Johannes Sixt,
	Alex Riesen

Michał Kiedrowicz <michal.kiedrowicz@gmail.com> writes:

> This also makes it bail out when grep.extendedRegexp is enabled.

That is a no-starter.  "git grep -P foo" from the command line should just
ignore the configured default.  It is not entirely your fault, as I think
you inherited the bug from the existing code that lets grep.extendedRegexp
interact with the "--fixed" option from the command line.

> But `git grep -G -P pattern` and `git grep -E -G -P pattern` still work
> because -G and -E set opts.regflags during parse_options() and there is
> no way to detect `-G` or `-E -G`.

How about following the usual pattern of letting the last one win?

Perhaps like this?  This is not even compile tested, but should apply
cleanly on top of, and can be squashed into, your 6/6.  You of course
would need to rewrite the commit log message and documentation, if you
said only one of these can be used.

We would need some tests for "grep -P", no?  Please throw in the "last one
wins" and "command line defeats configuration" when you add one.

Also I deliberately said "--ignore-case and -P are not compatible (yet)";
shouldn't you be able to do ignore case fairly easily, I wonder?  Isn't it
just the matter of wrapping each one with "(?i:" and ")" pair, or anything
more involved necessary?

 builtin/grep.c |   58 +++++++++++++++++++++++++++++++++++++++++++------------
 1 files changed, 45 insertions(+), 13 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 8e422b3..37f2331 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -753,6 +753,15 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	int i;
 	int dummy;
 	int use_index = 1;
+	enum {
+		pattern_type_unspecified = 0,
+		pattern_type_bre,
+		pattern_type_ere,
+		pattern_type_fixed,
+		pattern_type_pcre,
+	};
+	int pattern_type = pattern_type_unspecified;
+
 	struct option options[] = {
 		OPT_BOOLEAN(0, "cached", &cached,
 			"search in index instead of in the work tree"),
@@ -774,15 +783,18 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 			"descend at most <depth> levels", PARSE_OPT_NONEG,
 			NULL, 1 },
 		OPT_GROUP(""),
-		OPT_BIT('E', "extended-regexp", &opt.regflags,
-			"use extended POSIX regular expressions", REG_EXTENDED),
-		OPT_NEGBIT('G', "basic-regexp", &opt.regflags,
-			"use basic POSIX regular expressions (default)",
-			REG_EXTENDED),
-		OPT_BOOLEAN('F', "fixed-strings", &opt.fixed,
-			"interpret patterns as fixed strings"),
-		OPT_BOOLEAN('P', "perl-regexp", &opt.pcre,
-				"use Perl-compatible regular expressions"),
+		OPT_SET_INT('E', "extended-regexp", &pattern_type,
+			    "use extended POSIX regular expressions",
+			    pattern_type_ere),
+		OPT_SET_INT('G', "basic-regexp", &pattern_type,
+			    "use basic POSIX regular expressions (default)",
+			    pattern_type_bre),
+		OPT_SET_INT('F', "fixed-strings", &pattern_type,
+			    "interpret patterns as fixed strings",
+			    pattern_type_fixed),
+		OPT_SET_INT('P', "perl-regexp", &pattern_type,
+			    "use Perl-compatible regular expressions",
+			    pattern_type_pcre),
 		OPT_GROUP(""),
 		OPT_BOOLEAN('n', "line-number", &opt.linenum, "show line numbers"),
 		OPT_NEGBIT('h', NULL, &opt.pathname, "don't show filenames", 1),
@@ -888,6 +900,28 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 			     PARSE_OPT_KEEP_DASHDASH |
 			     PARSE_OPT_STOP_AT_NON_OPTION |
 			     PARSE_OPT_NO_INTERNAL_HELP);
+	switch (pattern_type) {
+	case pattern_type_fixed:
+		opt.fixed = 1;
+		opt.pcre = 0;
+		break;
+	case pattern_type_bre:
+		opt.fixed = 0;
+		opt.pcre = 0;
+		opt.regflags &= ~REG_EXTENDED;
+		break;
+	case pattern_type_ere:
+		opt.fixed = 0;
+		opt.pcre = 0;
+		opt.regflags |= REG_EXTENDED;
+		break;
+	case pattern_type_pcre:
+		opt.fixed = 0;
+		opt.pcre = 1;
+		break;
+	default:
+		break; /* nothing */
+	}
 
 	if (use_index && !startup_info->have_repository)
 		/* die the same way as if we did it at the beginning */
@@ -925,12 +959,10 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 
 	if (!opt.pattern_list)
 		die(_("no pattern given."));
-	if (opt.regflags != REG_NEWLINE && opt.pcre)
-		die(_("cannot mix --extended-regexp and --perl-regexp"));
 	if (!opt.fixed && opt.ignore_case)
 		opt.regflags |= REG_ICASE;
-	if ((opt.regflags != REG_NEWLINE || opt.pcre) && opt.fixed)
-		die(_("cannot mix --fixed-strings and regexp"));
+	if (opt.pcre && opt.ignore_case)
+		die(_("--ignore-case and -P are not compatible (yet)"));
 
 #ifndef NO_PTHREADS
 	if (online_cpus() == 1 || !grep_threads_ok(&opt))

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 6/6] git-grep: Bail out when -P is used with -F or -E
  2011-05-10  1:48   ` Junio C Hamano
@ 2011-05-10  5:24     ` Michal Kiedrowicz
  2011-05-10  6:11       ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Michal Kiedrowicz @ 2011-05-10  5:24 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Git List, Martin Langhoff, Bert Wesarg, Johannes Sixt,
	Alex Riesen

On 09.05.2011 18:48:36 -0700 Junio C Hamano <gitster@pobox.com> wrote:

> Michał Kiedrowicz <michal.kiedrowicz@gmail.com> writes:
> 
> > This also makes it bail out when grep.extendedRegexp is enabled.
> 
> That is a no-starter.  "git grep -P foo" from the command line should
> just ignore the configured default.  It is not entirely your fault,
> as I think you inherited the bug from the existing code that lets
> grep.extendedRegexp interact with the "--fixed" option from the
> command line.
> 
> > But `git grep -G -P pattern` and `git grep -E -G -P pattern` still
> > work because -G and -E set opts.regflags during parse_options() and
> > there is no way to detect `-G` or `-E -G`.
> 
> How about following the usual pattern of letting the last one win?
> 
> Perhaps like this?  This is not even compile tested, but should apply
> cleanly on top of, and can be squashed into, your 6/6.  You of course
> would need to rewrite the commit log message and documentation, if you
> said only one of these can be used.

Sounds good, thanks.

> 
> We would need some tests for "grep -P", no?  

What about those in patch 5/6?

> Please throw in the
> "last one wins" and "command line defeats configuration" when you add
> one.
> 
> Also I deliberately said "--ignore-case and -P are not compatible
> (yet)"; shouldn't you be able to do ignore case fairly easily, I
> wonder?  Isn't it just the matter of wrapping each one with "(?i:"
> and ")" pair, or anything more involved necessary?

If you look at patch 3/6 you will see:

+
+	if (opt->ignore_case)
+		options |= PCRE_CASELESS;
+
+	p->pcre_regexp = pcre_compile(p->pattern, options, &error,
&erroffset,
+			NULL);

and also

+test_expect_success LIBPCRE 'grep -P -i pattern' '

in patch 5/6 :). Or perhaps it doesn't work for you?

> 
>  builtin/grep.c |   58
> +++++++++++++++++++++++++++++++++++++++++++------------ 1 files
> changed, 45 insertions(+), 13 deletions(-)
> 
> diff --git a/builtin/grep.c b/builtin/grep.c
> index 8e422b3..37f2331 100644
> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -753,6 +753,15 @@ int cmd_grep(int argc, const char **argv, const
> char *prefix) int i;
>  	int dummy;
>  	int use_index = 1;
> +	enum {
> +		pattern_type_unspecified = 0,
> +		pattern_type_bre,
> +		pattern_type_ere,
> +		pattern_type_fixed,
> +		pattern_type_pcre,
> +	};
> +	int pattern_type = pattern_type_unspecified;
> +
>  	struct option options[] = {
>  		OPT_BOOLEAN(0, "cached", &cached,
>  			"search in index instead of in the work
> tree"), @@ -774,15 +783,18 @@ int cmd_grep(int argc, const char
> **argv, const char *prefix) "descend at most <depth> levels",
> PARSE_OPT_NONEG, NULL, 1 },
>  		OPT_GROUP(""),
> -		OPT_BIT('E', "extended-regexp", &opt.regflags,
> -			"use extended POSIX regular expressions",
> REG_EXTENDED),
> -		OPT_NEGBIT('G', "basic-regexp", &opt.regflags,
> -			"use basic POSIX regular expressions
> (default)",
> -			REG_EXTENDED),
> -		OPT_BOOLEAN('F', "fixed-strings", &opt.fixed,
> -			"interpret patterns as fixed strings"),
> -		OPT_BOOLEAN('P', "perl-regexp", &opt.pcre,
> -				"use Perl-compatible regular
> expressions"),
> +		OPT_SET_INT('E', "extended-regexp", &pattern_type,
> +			    "use extended POSIX regular expressions",
> +			    pattern_type_ere),
> +		OPT_SET_INT('G', "basic-regexp", &pattern_type,
> +			    "use basic POSIX regular expressions
> (default)",
> +			    pattern_type_bre),
> +		OPT_SET_INT('F', "fixed-strings", &pattern_type,
> +			    "interpret patterns as fixed strings",
> +			    pattern_type_fixed),
> +		OPT_SET_INT('P', "perl-regexp", &pattern_type,
> +			    "use Perl-compatible regular
> expressions",
> +			    pattern_type_pcre),
>  		OPT_GROUP(""),
>  		OPT_BOOLEAN('n', "line-number", &opt.linenum, "show
> line numbers"), OPT_NEGBIT('h', NULL, &opt.pathname, "don't show
> filenames", 1), @@ -888,6 +900,28 @@ int cmd_grep(int argc, const
> char **argv, const char *prefix) PARSE_OPT_KEEP_DASHDASH |
>  			     PARSE_OPT_STOP_AT_NON_OPTION |
>  			     PARSE_OPT_NO_INTERNAL_HELP);
> +	switch (pattern_type) {
> +	case pattern_type_fixed:
> +		opt.fixed = 1;
> +		opt.pcre = 0;
> +		break;
> +	case pattern_type_bre:
> +		opt.fixed = 0;
> +		opt.pcre = 0;
> +		opt.regflags &= ~REG_EXTENDED;
> +		break;
> +	case pattern_type_ere:
> +		opt.fixed = 0;
> +		opt.pcre = 0;
> +		opt.regflags |= REG_EXTENDED;
> +		break;
> +	case pattern_type_pcre:
> +		opt.fixed = 0;
> +		opt.pcre = 1;
> +		break;
> +	default:
> +		break; /* nothing */
> +	}
>  
>  	if (use_index && !startup_info->have_repository)
>  		/* die the same way as if we did it at the beginning
> */ @@ -925,12 +959,10 @@ int cmd_grep(int argc, const char **argv,
> const char *prefix) 
>  	if (!opt.pattern_list)
>  		die(_("no pattern given."));
> -	if (opt.regflags != REG_NEWLINE && opt.pcre)
> -		die(_("cannot mix --extended-regexp and
> --perl-regexp")); if (!opt.fixed && opt.ignore_case)
>  		opt.regflags |= REG_ICASE;
> -	if ((opt.regflags != REG_NEWLINE || opt.pcre) && opt.fixed)
> -		die(_("cannot mix --fixed-strings and regexp"));
> +	if (opt.pcre && opt.ignore_case)
> +		die(_("--ignore-case and -P are not compatible
> (yet)")); 
>  #ifndef NO_PTHREADS
>  	if (online_cpus() == 1 || !grep_threads_ok(&opt))
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 6/6] git-grep: Bail out when -P is used with -F or -E
  2011-05-10  5:24     ` Michal Kiedrowicz
@ 2011-05-10  6:11       ` Junio C Hamano
  0 siblings, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2011-05-10  6:11 UTC (permalink / raw)
  To: Michal Kiedrowicz
  Cc: Git List, Martin Langhoff, Bert Wesarg, Johannes Sixt,
	Alex Riesen

Michal Kiedrowicz <michal.kiedrowicz@gmail.com> writes:

>> We would need some tests for "grep -P", no?  
>
> What about those in patch 5/6?

There are some, but we would also want to see negative cases where
compilation detects an incorrect regexp.

>> Please throw in the
>> "last one wins" and "command line defeats configuration" when you add
>> one.

> +test_expect_success LIBPCRE 'grep -P -i pattern' '
>
> in patch 5/6 :). Or perhaps it doesn't work for you?

These I overlooked.  Will remove the "-i -P does not work yet".

Thanks.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2011-05-10  6:11 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-05-09 21:52 [PATCH v3 0/6] Add PCRE support to git-grep Michał Kiedrowicz
2011-05-09 21:52 ` [PATCH v3 1/6] grep: Fix a typo in a comment Michał Kiedrowicz
2011-05-09 21:52 ` [PATCH v3 2/6] grep: Extract compile_regexp_failed() from compile_regexp() Michał Kiedrowicz
2011-05-09 21:52 ` [PATCH v3 3/6] git-grep: Learn PCRE Michał Kiedrowicz
2011-05-09 21:52 ` [PATCH v3 4/6] configure: Check for libpcre Michał Kiedrowicz
2011-05-09 21:52 ` [PATCH v3 5/6] grep: Add basic tests Michał Kiedrowicz
2011-05-09 21:52 ` [PATCH v3 6/6] git-grep: Bail out when -P is used with -F or -E Michał Kiedrowicz
2011-05-10  1:48   ` Junio C Hamano
2011-05-10  5:24     ` Michal Kiedrowicz
2011-05-10  6:11       ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).