git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Filenames and prefixes in extended diffs
@ 2010-01-13 16:13 Andreas Gruenbacher
  2010-01-13 19:49 ` Junio C Hamano
  2010-01-14  0:16 ` Junio C Hamano
  0 siblings, 2 replies; 20+ messages in thread
From: Andreas Gruenbacher @ 2010-01-13 16:13 UTC (permalink / raw)
  To: git

I'm having a problem filename prefixes in git's extended diffs for patches 
which rename or copy files: those patches include the old and new filenames in 
"rename from", "rename to", "copy from", and "copy to" headers, e.g.,

	$ git show -M
	diff --git a/f b/g
	similarity index 87%
	rename from f
	rename to g
	index f00c965..3bb459b 100644
	--- a/f
	+++ b/g
	@@ -8,3 +8,4 @@
	 8
	 9
	 10
	+11

Unlike the filenames in the "diff --git", "---", and "+++" headers, the 
"rename from", "rename to", "copy from", and "copy to" filenames do not 
include prefixes.

Now when applying a patch, GNU patch's -p option determines the number of 
pathname components to strip off from filenames.  This obviously can't work 
consistently for the prefixed and prefix-less headers.

Can git be changed to include prefixes in all filenames?

The only alternative I see is to ignore the filenames in the rename/copy 
headers and rely only on the "diff --git" line.  (The "---" and "+++" headers 
are not guaranteed to exist.)  What's worse, as already discussed here, the 
"diff --git" line uses space as a separator between filenames yet it doesn't 
quote spaces in filenames.  When being forced to ignore rename/copy headers, 
this defect would make things much worse.


Any ideas?


Thanks,
Andreas

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Filenames and prefixes in extended diffs
  2010-01-13 16:13 Filenames and prefixes in extended diffs Andreas Gruenbacher
@ 2010-01-13 19:49 ` Junio C Hamano
  2010-01-14  0:16 ` Junio C Hamano
  1 sibling, 0 replies; 20+ messages in thread
From: Junio C Hamano @ 2010-01-13 19:49 UTC (permalink / raw)
  To: Andreas Gruenbacher; +Cc: git

Andreas Gruenbacher <agruen@suse.de> writes:

> Any ideas?

How about studying what "git-apply" does?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Filenames and prefixes in extended diffs
  2010-01-13 16:13 Filenames and prefixes in extended diffs Andreas Gruenbacher
  2010-01-13 19:49 ` Junio C Hamano
@ 2010-01-14  0:16 ` Junio C Hamano
  2010-01-15 13:32   ` Nanako Shiraishi
  2010-01-18 23:22   ` Filenames and prefixes in extended diffs Andreas Gruenbacher
  1 sibling, 2 replies; 20+ messages in thread
From: Junio C Hamano @ 2010-01-14  0:16 UTC (permalink / raw)
  To: Andreas Gruenbacher; +Cc: git

Andreas Gruenbacher <agruen@suse.de> writes:

> Can git be changed to ...

Just to save your time coming up with more ways to *change* git diff...

Even though I wouldn't say _any_ change is too late to bring in, change in
the output format from "git diff" family _must_ be usable by "git apply"
people have been using for the last 4 years or so.

Suppose your updated version of "git diff" with a certain set of options
produces output A, which is different from the output B you would get out
of today's "git diff" that is run with the same set of options.

If "git apply" people have been using understands B (i.e. current output)
and does something, the format change between A and B must be designed in
such a way that the same "git apply" accepts A (i.e. your output) and do
the same thing.

Two examples:

 - "git diff -M" (or "git show -M") is _defined_ to show the filenames
   without prefix on "rename from" line, and deployed "git apply" relies
   on this definition to apply the patch to the file the patch was meant
   to apply.  If your modified "git diff -M" changes it to add the prefix,
   and existing "git apply" changes behaviour (either by rejecting your
   output, or applying the patch to a wrong file), then such a change has
   *no chance* of getting in.  It is merely a breakage.

 - If you say "git diff --src-prefix=a/b/c --dst-prefix=x/y", it _might_
   produce something "git apply" won't grok (I haven't checked this,
   though).  You can suggest to change the output from such a case to work
   better.  We didn't work as expected so a change _could_ be a fix.

The output from "git diff --no-index" is an exception to the above rule.
It is primarily for people who have unmanaged contents and want to use
features of the git diff engine that are not found in other people's diff
implementations (e.g. wordwise colored diff), and the header part of its
output does not currently follow "git diff" convention to be grokkable by
"git apply".

Fixing _that_ is a welcome change, but I suspect that there are corner
cases, e.g. "git diff --no-index frotz-1.2.36/ /tmp/frotz/" (i.e. you have
a pristine version in frotz-1.2.36 directory, but your modified version is
in /tmp/frtoz/) that might make fixing it fundamentally impossible (I
haven't looked into it for a long time, so it could be easy, but my gut
feeling is it isn't).

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Filenames and prefixes in extended diffs
  2010-01-14  0:16 ` Junio C Hamano
@ 2010-01-15 13:32   ` Nanako Shiraishi
  2010-01-15 18:09     ` Junio C Hamano
  2010-01-18 23:22   ` Filenames and prefixes in extended diffs Andreas Gruenbacher
  1 sibling, 1 reply; 20+ messages in thread
From: Nanako Shiraishi @ 2010-01-15 13:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Quoting Junio C Hamano <gitster@pobox.com>

> The output from "git diff --no-index" is an exception to the above rule.
> It is primarily for people who have unmanaged contents and want to use
> features of the git diff engine that are not found in other people's diff
> implementations (e.g. wordwise colored diff),...

Is it possible to give --no-index option to "git grep", please?

-- 
Nanako Shiraishi
http://ivory.ap.teacup.com/nanako3/

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Filenames and prefixes in extended diffs
  2010-01-15 13:32   ` Nanako Shiraishi
@ 2010-01-15 18:09     ` Junio C Hamano
  2010-01-15 20:50       ` [PATCH] grep: prepare to run outside of a work tree Junio C Hamano
  2010-01-15 20:52       ` [PATCH] grep --no-index: allow use of "git grep" outside a git repository Junio C Hamano
  0 siblings, 2 replies; 20+ messages in thread
From: Junio C Hamano @ 2010-01-15 18:09 UTC (permalink / raw)
  To: Nanako Shiraishi; +Cc: git

Nanako Shiraishi <nanako3@lavabit.com> writes:

> Quoting Junio C Hamano <gitster@pobox.com>
>
>> The output from "git diff --no-index" is an exception to the above rule.
>> It is primarily for people who have unmanaged contents and want to use
>> features of the git diff engine that are not found in other people's diff
>> implementations (e.g. wordwise colored diff),...
>
> Is it possible to give --no-index option to "git grep", please?

Surely.  And "grep" is much easier to do than "diff".  Will send a patch
perhaps during my lunch break.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH] grep: prepare to run outside of a work tree
  2010-01-15 18:09     ` Junio C Hamano
@ 2010-01-15 20:50       ` Junio C Hamano
  2010-01-15 20:52       ` [PATCH] grep --no-index: allow use of "git grep" outside a git repository Junio C Hamano
  1 sibling, 0 replies; 20+ messages in thread
From: Junio C Hamano @ 2010-01-15 20:50 UTC (permalink / raw)
  To: git; +Cc: Nanako Shiraishi

This moves the call to setup_git_directory() for running "grep" from
the "git" wrapper to the implementation of the "grep" subcommand.  A
new variable "use_index" is always true at this stage in the series,
and when it is on, we require that we are in a directory that is under
git control.  To make sure we die the same way, we make a second call
into setup_git_directory() when we detect this situation.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 Junio C Hamano <gitster@pobox.com> writes:

 > Nanako Shiraishi <nanako3@lavabit.com> writes:
 >
 >> Is it possible to give --no-index option to "git grep", please?
 >
 > Surely.  And "grep" is much easier to do than "diff".  Will send a
 > patch perhaps during my lunch break.

 This is merely a preparatory step.

 builtin-grep.c |    7 +++++++
 git.c          |    2 +-
 2 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/builtin-grep.c b/builtin-grep.c
index 3d6ebb5..229555d 100644
--- a/builtin-grep.c
+++ b/builtin-grep.c
@@ -414,6 +414,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	const char **paths = NULL;
 	int i;
 	int dummy;
+	int nongit = 0, use_index = 1;
 	struct option options[] = {
 		OPT_BOOLEAN(0, "cached", &cached,
 			"search in index instead of in the work tree"),
@@ -497,6 +498,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		OPT_END()
 	};
 
+	prefix = setup_git_directory_gently(&nongit);
+
 	/*
 	 * 'git grep -h', unlike 'git grep -h <pattern>', is a request
 	 * to show usage information and exit.
@@ -534,6 +537,10 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 			     PARSE_OPT_STOP_AT_NON_OPTION |
 			     PARSE_OPT_NO_INTERNAL_HELP);
 
+	if (use_index && nongit)
+		/* die the same way as if we did it at the beginning */
+		setup_git_directory();
+
 	/* First unrecognized non-option token */
 	if (argc > 0 && !opt.pattern_list) {
 		append_grep_pattern(&opt, argv[0], "command line", 0,
diff --git a/git.c b/git.c
index 11544cd..ad07473 100644
--- a/git.c
+++ b/git.c
@@ -317,7 +317,7 @@ static void handle_internal_command(int argc, const char **argv)
 		{ "fsck-objects", cmd_fsck, RUN_SETUP },
 		{ "gc", cmd_gc, RUN_SETUP },
 		{ "get-tar-commit-id", cmd_get_tar_commit_id },
-		{ "grep", cmd_grep, RUN_SETUP | USE_PAGER },
+		{ "grep", cmd_grep, USE_PAGER },
 		{ "help", cmd_help },
 		{ "init", cmd_init_db },
 		{ "init-db", cmd_init_db },
-- 
1.6.6.324.g20f8f4.dirty

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH] grep --no-index: allow use of "git grep" outside a git repository
  2010-01-15 18:09     ` Junio C Hamano
  2010-01-15 20:50       ` [PATCH] grep: prepare to run outside of a work tree Junio C Hamano
@ 2010-01-15 20:52       ` Junio C Hamano
  2010-01-15 21:08         ` Jeff King
  1 sibling, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2010-01-15 20:52 UTC (permalink / raw)
  To: git; +Cc: Nanako Shiraishi

Just like some people wanted diff features that are not found in
other people's diff implementations outside of a git repository
and added --no-index mode to the command, this adds --no-index mode
to the "git grep" command.

Also, inside a git repository, --no-index mode allows you to grep
in untracked (but not ignored) files.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * This is much easier than "git diff --no-index" because it does not have
   to worry about corner cases like "git diff --no-index file1 file2".

 builtin-grep.c  |   26 ++++++++++++++++++++++++++
 t/t7002-grep.sh |   52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 78 insertions(+), 0 deletions(-)

diff --git a/builtin-grep.c b/builtin-grep.c
index 229555d..1283373 100644
--- a/builtin-grep.c
+++ b/builtin-grep.c
@@ -14,6 +14,7 @@
 #include "userdiff.h"
 #include "grep.h"
 #include "quote.h"
+#include "dir.h"
 
 static char const * const grep_usage[] = {
 	"git grep [options] [-e] <pattern> [<rev>...] [[--] path...]",
@@ -320,6 +321,21 @@ static int grep_object(struct grep_opt *opt, const char **paths,
 	die("unable to grep from object of type %s", typename(obj->type));
 }
 
+static int grep_directory(struct grep_opt *opt, const char **paths)
+{
+	struct dir_struct dir;
+	int i, hit = 0;
+
+	memset(&dir, 0, sizeof(dir));
+	setup_standard_excludes(&dir);
+
+	fill_directory(&dir, paths);
+	for (i = 0; i < dir.nr; i++)
+		hit |= grep_file(opt, dir.entries[i]->name);
+	free_grep_patterns(opt);
+	return hit;
+}
+
 static int context_callback(const struct option *opt, const char *arg,
 			    int unset)
 {
@@ -418,6 +434,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	struct option options[] = {
 		OPT_BOOLEAN(0, "cached", &cached,
 			"search in index instead of in the work tree"),
+		OPT_BOOLEAN(0, "index", &use_index,
+			"--no-index finds in contents not managed by git"),
 		OPT_GROUP(""),
 		OPT_BOOLEAN('v', "invert-match", &opt.invert,
 			"show non-matching lines"),
@@ -591,6 +609,14 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		paths[1] = NULL;
 	}
 
+	if (!use_index) {
+		if (cached)
+			die("--cached cannot be used with --no-index.");
+		if (list.nr)
+			die("--no-index cannot be used with revs.");
+		return !grep_directory(&opt, paths);
+	}
+
 	if (!list.nr) {
 		if (!cached)
 			setup_work_tree();
diff --git a/t/t7002-grep.sh b/t/t7002-grep.sh
index c369cdb..7eceb08 100755
--- a/t/t7002-grep.sh
+++ b/t/t7002-grep.sh
@@ -426,4 +426,56 @@ test_expect_success 'grep -Fi' '
 	test_cmp expected actual
 '
 
+test_expect_success 'outside of git repository' '
+	rm -fr non &&
+	mkdir -p non/git/sub &&
+	echo hello >non/git/file1 &&
+	echo world >non/git/sub/file2 &&
+	echo ".*o*" >non/git/.gitignore &&
+	{
+		echo file1:hello &&
+		echo sub/file2:world
+	} >non/expect.full &&
+	echo file2:world >non/expect.sub
+	(
+		GIT_CEILING_DIRECTORIES="$(pwd)/non/git" &&
+		export GIT_CEILING_DIRECTORIES &&
+		cd non/git &&
+		test_must_fail git grep o &&
+		git grep --no-index o >../actual.full &&
+		test_cmp ../expect.full ../actual.full
+		cd sub &&
+		test_must_fail git grep o &&
+		git grep --no-index o >../../actual.sub &&
+		test_cmp ../../expect.sub ../../actual.sub
+	)
+'
+
+test_expect_success 'inside git repository but with --no-index' '
+	rm -fr is &&
+	mkdir -p is/git/sub &&
+	echo hello >is/git/file1 &&
+	echo world >is/git/sub/file2 &&
+	echo ".*o*" >is/git/.gitignore &&
+	{
+		echo file1:hello &&
+		echo sub/file2:world
+	} >is/expect.full &&
+	: >is/expect.empty &&
+	echo file2:world >is/expect.sub
+	(
+		cd is/git &&
+		git init &&
+		test_must_fail git grep o >../actual.full &&
+		test_cmp ../expect.empty ../actual.full &&
+		git grep --no-index o >../actual.full &&
+		test_cmp ../expect.full ../actual.full &&
+		cd sub &&
+		test_must_fail git grep o >../../actual.sub &&
+		test_cmp ../../expect.empty ../../actual.sub &&
+		git grep --no-index o >../../actual.sub &&
+		test_cmp ../../expect.sub ../../actual.sub
+	)
+'
+
 test_done
-- 
1.6.6.324.g20f8f4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] grep --no-index: allow use of "git grep" outside a git repository
  2010-01-15 20:52       ` [PATCH] grep --no-index: allow use of "git grep" outside a git repository Junio C Hamano
@ 2010-01-15 21:08         ` Jeff King
  2010-01-16  1:05           ` Junio C Hamano
  0 siblings, 1 reply; 20+ messages in thread
From: Jeff King @ 2010-01-15 21:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Nanako Shiraishi

On Fri, Jan 15, 2010 at 12:52:40PM -0800, Junio C Hamano wrote:

> Just like some people wanted diff features that are not found in
> other people's diff implementations outside of a git repository
> and added --no-index mode to the command, this adds --no-index mode
> to the "git grep" command.

Out of curiosity, what are the interesting features in git grep versus
other greps?

-Peff

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] grep --no-index: allow use of "git grep" outside a git repository
  2010-01-15 21:08         ` Jeff King
@ 2010-01-16  1:05           ` Junio C Hamano
  2010-01-16  1:15             ` Jeff King
  0 siblings, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2010-01-16  1:05 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Nanako Shiraishi

Jeff King <peff@peff.net> writes:

> On Fri, Jan 15, 2010 at 12:52:40PM -0800, Junio C Hamano wrote:
>
>> Just like some people wanted diff features that are not found in
>> other people's diff implementations outside of a git repository
>> and added --no-index mode to the command, this adds --no-index mode
>> to the "git grep" command.
>
> Out of curiosity, what are the interesting features in git grep versus
> other greps?

Three examples:

    git grep -e Junio --and -e Dscho --and -e Peff

is different from

    grep "Junio.*Dscho.*Peff"

in that the latter wouldn't find a line that has these names in different
order.  You can of course give permutations explicitly, like

    grep -e "Junio.*Dscho.*Peff" \
         -e "Dscho.*Junio.*Peff" \
         ...
	 -e "Peff.*Dscho.*Junio"

I don't know how you would do these with "grep":

    git grep -e Junio --and -e Dscho --and --not -e Linus

    git grep --all-match -e Junio -e Dscho

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] grep --no-index: allow use of "git grep" outside a git repository
  2010-01-16  1:05           ` Junio C Hamano
@ 2010-01-16  1:15             ` Jeff King
  2010-01-16  4:15               ` Junio C Hamano
  0 siblings, 1 reply; 20+ messages in thread
From: Jeff King @ 2010-01-16  1:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Nanako Shiraishi

On Fri, Jan 15, 2010 at 05:05:42PM -0800, Junio C Hamano wrote:

> > Out of curiosity, what are the interesting features in git grep versus
> > other greps?
> 
> Three examples:
> 
>     git grep -e Junio --and -e Dscho --and -e Peff
> 
> is different from
> 
>     grep "Junio.*Dscho.*Peff"

Right. I would do:

  grep Junio | grep Dscho | grep Peff

No, it's not quite as accurate, as you are grepping the filenames too.

And no, it's not as efficient, but given that the first grep eliminates
most of your input anyway, it's generally not a big deal.

So the short answer to my question seems to be "git grep has logical
operators". I don't find that compelling, but I guess some people do.
Thanks for satisfying my curiosity.

> I don't know how you would do these with "grep":
> 
>     git grep -e Junio --and -e Dscho --and --not -e Linus

I would do "grep Junio | grep Dscho | grep -v Linus".

>     git grep --all-match -e Junio -e Dscho

That one is a little harder (though it is not something I do very often,
and I had to actually read the docs to find what --all-match does):

  grep Junio `grep -l Dscho *`

which of course has problems with exotic filenames.

-Peff

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] grep --no-index: allow use of "git grep" outside a git repository
  2010-01-16  1:15             ` Jeff King
@ 2010-01-16  4:15               ` Junio C Hamano
  2010-01-16  6:51                 ` David Aguilar
  2010-01-18  1:51                 ` Jeff King
  0 siblings, 2 replies; 20+ messages in thread
From: Junio C Hamano @ 2010-01-16  4:15 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Nanako Shiraishi

Jeff King <peff@peff.net> writes:

>>     git grep --all-match -e Junio -e Dscho
>
> That one is a little harder (though it is not something I do very often,
> and I had to actually read the docs to find what --all-match does):
>
>   grep Junio `grep -l Dscho *`
>
> which of course has problems with exotic filenames.

Also it doesn't find lines that match Dscho in the result ;-)

Realistically, this most often is used when grepping in the log, e.g.

    git log --all-match --author=peff --grep=test

I actually wish "log" to somehow default to --all-match mode at least when
using the --author option.  "Change by Jeff, or about test by anybody" is
rarely what I would want to look for.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] grep --no-index: allow use of "git grep" outside a git repository
  2010-01-16  4:15               ` Junio C Hamano
@ 2010-01-16  6:51                 ` David Aguilar
  2010-01-16  7:21                   ` Junio C Hamano
  2010-01-18  1:51                 ` Jeff King
  1 sibling, 1 reply; 20+ messages in thread
From: David Aguilar @ 2010-01-16  6:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git, Nanako Shiraishi

On Fri, Jan 15, 2010 at 08:15:49PM -0800, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
> 
> >>     git grep --all-match -e Junio -e Dscho
> >
> > That one is a little harder (though it is not something I do very often,
> > and I had to actually read the docs to find what --all-match does):
> >
> >   grep Junio `grep -l Dscho *`
> >
> > which of course has problems with exotic filenames.
> 
> Also it doesn't find lines that match Dscho in the result ;-)
> 
> Realistically, this most often is used when grepping in the log, e.g.
> 
>     git log --all-match --author=peff --grep=test
> 
> I actually wish "log" to somehow default to --all-match mode at least when
> using the --author option.  "Change by Jeff, or about test by anybody" is
> rarely what I would want to look for.

Kinda like this?

I originally had it set grep_filter.all_match in --author only,
but then I thought "why author and not commiter too", so changing
the default seemed like the natural thing to do.  Or it could be
a cat brained idea, I dunno ;)

-- -- -- 8< -- -- -- 8< -- -- --
From 2277a6e512c2f597c6240f06c9e7d5ff83e2fe3f Mon Sep 17 00:00:00 2001
From: David Aguilar <davvid@gmail.com>
Date: Fri, 15 Jan 2010 21:18:36 -0800
Subject: [PATCH] Make --all-match the default in "log" family

'git log --author=peff --grep=test' means "search for commits by Jeff,
or about test by anybody," which is rarely what what we want to do.
The original behavior can by achieved by specifying --no-all-match.

Reference: http://article.gmane.org/gmane.comp.version-control.git/137197
Signed-off-by: David Aguilar <davvid@gmail.com>
---
 Documentation/rev-list-options.txt |    1 +
 revision.c                         |    3 +++
 t/t7002-grep.sh                    |    2 +-
 3 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 1f57aed..0ce1008 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -179,6 +179,7 @@ endif::git-rev-list[]
 --all-match::
 	Limit the commits output to ones that match all given --grep,
 	--author and --committer instead of ones that match at least one.
+	--all-match is the defaullt and can be disabled with --no-all-match.
 
 -i::
 --regexp-ignore-case::
diff --git a/revision.c b/revision.c
index 25fa14d..64ebdc5 100644
--- a/revision.c
+++ b/revision.c
@@ -804,6 +804,7 @@ void init_revisions(struct rev_info *revs, const char *prefix)
 
 	revs->commit_format = CMIT_FMT_DEFAULT;
 
+	revs->grep_filter.all_match = 1;
 	revs->grep_filter.status_only = 1;
 	revs->grep_filter.pattern_tail = &(revs->grep_filter.pattern_list);
 	revs->grep_filter.regflags = REG_NEWLINE;
@@ -1222,6 +1223,8 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 		revs->grep_filter.fixed = 1;
 	} else if (!strcmp(arg, "--all-match")) {
 		revs->grep_filter.all_match = 1;
+	} else if (!strcmp(arg, "--no-all-match")) {
+		revs->grep_filter.all_match = 0;
 	} else if (!prefixcmp(arg, "--encoding=")) {
 		arg += 11;
 		if (strcmp(arg, "none"))
diff --git a/t/t7002-grep.sh b/t/t7002-grep.sh
index 76c5e09..92ef534 100755
--- a/t/t7002-grep.sh
+++ b/t/t7002-grep.sh
@@ -358,7 +358,7 @@ test_expect_success 'log grep (4)' '
 
 test_expect_success 'log grep (5)' '
 	git log --author=Thor -F --grep=Thu --pretty=tformat:%s >actual &&
-	( echo third ; echo initial ) >expect &&
+	: >expect &&
 	test_cmp expect actual
 '
 
-- 
1.6.6.197.g2277

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] grep --no-index: allow use of "git grep" outside a git repository
  2010-01-16  6:51                 ` David Aguilar
@ 2010-01-16  7:21                   ` Junio C Hamano
  0 siblings, 0 replies; 20+ messages in thread
From: Junio C Hamano @ 2010-01-16  7:21 UTC (permalink / raw)
  To: David Aguilar; +Cc: Jeff King, git, Nanako Shiraishi

David Aguilar <davvid@gmail.com> writes:

> On Fri, Jan 15, 2010 at 08:15:49PM -0800, Junio C Hamano wrote:
>
>> Realistically, this most often is used when grepping in the log, e.g.
>> 
>>     git log --all-match --author=peff --grep=test
>> 
>> I actually wish "log" to somehow default to --all-match mode at least when
>> using the --author option.  "Change by Jeff, or about test by anybody" is
>> rarely what I would want to look for.
>
> Kinda like this?

Not quite.  What I really want is

    git log --author=davvid --grep=difftool --grep=mergetool

to find all commits by you that is about (either diff or mergetool).  I
think your patch will limit the search only to your patch that talks about
both of these two tools (not necessarily on the same line, but in the same
commit).

The extended "grep" expression parser by default creates a list of OR'ed
terms.  What --all-match does is to make this top-level chain to mean "all
of these must trigger somewhere in the whole _document_ (not an individual
line), for the document to be considered a hit" for the purpose of "grep -l",
and when used with "log" family, --author/--committer/--grep are used to
limit the output to commits "grep -l" would say "yes, this document has
matched".

Currently,

    git log --author=davvid --grep=difftool --grep=mergetool

will parse to a list of three terms:

    GREP_PATTERN_HEAD("^author .*davvid")
    GREP_PATTERN_BODY("difftool")
    GREP_PATTERN_BODY("mergetool")

And giving --all-match will require all of these OR'ed terms to appear in
the commit object.

My dream one will probably has to make a list of two terms as its parse
tree instead, like this:

    GREP_PATTERN_HEAD("^author .*davvid")
    GREP_NODE_OR(
        GREP_PATTERN_BODY("difftool")
        GREP_PATTERN_BODY("mergetool")
    )           

and then run it with --all-match semantics.  The top-level consists of two
terms, and they both must hit, but the second term is an OR'ed one.

It is unclear how we would want to throw the committer in the mix.  If we
make this parse tree:

    GREP_PATTERN_HEAD("^author .*davvid")
    GREP_PATTERN_HEAD("^committer .*gitster")
    GREP_NODE_OR(
        GREP_PATTERN_BODY("difftool")
        GREP_PATTERN_BODY("mergetool")
    )           

we would be looking for your patch about either diff or mergetool _and_
it has to be committed by me.  On the other hand, if we do this:

    GREP_NODE_OR(
        GREP_PATTERN_HEAD("^author .*davvid")
        GREP_PATTERN_HEAD("^committer .*gitster")
    )
    GREP_NODE_OR(
        GREP_PATTERN_BODY("difftool")
        GREP_PATTERN_BODY("mergetool")
    )           

we would be looking for a patch about (either diff or mergetool) _and_
(either committed by me or written by you).

I think the former makes more sense in _our_ project (because there are
very few committers), but in the context of other projects, e.g. the Linux
kernel, you may want to give "Linus" to both --author and --committer to
track what he did (either as an author to some other subsystem, or as the
top-level integrator for the entire system), and for such a use case, the
latter would make more sense.

Unfortunately, the parsing of --grep/--author/--committer options to the
log family is quite limited (you cannot give --and, --or and --not, for
example), and it would be hard to express these distinction.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] grep --no-index: allow use of "git grep" outside a git repository
  2010-01-16  4:15               ` Junio C Hamano
  2010-01-16  6:51                 ` David Aguilar
@ 2010-01-18  1:51                 ` Jeff King
  2010-01-18  3:35                   ` Junio C Hamano
  1 sibling, 1 reply; 20+ messages in thread
From: Jeff King @ 2010-01-18  1:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Nanako Shiraishi

On Fri, Jan 15, 2010 at 08:15:49PM -0800, Junio C Hamano wrote:

> Realistically, this most often is used when grepping in the log, e.g.
> 
>     git log --all-match --author=peff --grep=test
> 
> I actually wish "log" to somehow default to --all-match mode at least when
> using the --author option.  "Change by Jeff, or about test by anybody" is
> rarely what I would want to look for.

Agreed. That is the most common log grep pattern for me (author + grep),
and I always want all-match. I see from later in the thread, though,
that implementing it is not as straightforward as we might hope.

I would personally be fine with "--all-match" being the default, but
that may be too big a change in behavior for some people to swallow (I
would also be fine with log.allmatch in the config, but every time I
suggest something like that people's heads explode and I get told to
make an alias).

-Peff

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] grep --no-index: allow use of "git grep" outside a git repository
  2010-01-18  1:51                 ` Jeff King
@ 2010-01-18  3:35                   ` Junio C Hamano
  2010-01-18  4:02                     ` Junio C Hamano
  0 siblings, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2010-01-18  3:35 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Nanako Shiraishi

Jeff King <peff@peff.net> writes:

> Agreed. That is the most common log grep pattern for me (author + grep),
> and I always want all-match. I see from later in the thread, though,
> that implementing it is not as straightforward as we might hope.

I haven't looked at the codepath for quite some time but I have a feeling
that it probably won't be too bad.

It just won't be as simple as flipping the all_match bit with a one-liner.

Before calling compile_grep_patterns() in revision.c::setup_revisions(),
we probably would want to massage revs->grep_filter to result in the
desired grep expression parse tree, i.e. from

    GREP_PATTERN_HEAD("^author .*davvid")
    GREP_PATTERN_HEAD("^committer .*gitster")
    GREP_PATTERN_BODY("difftool")
    GREP_PATTERN_BODY("mergetool")

to

    GREP_PATTERN_HEAD("^author .*davvid")
    GREP_PATTERN_HEAD("^committer .*gitster")
    GREP_NODE_OR(
        GREP_PATTERN_BODY("difftool")
        GREP_PATTERN_BODY("mergetool")
    )           

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] grep --no-index: allow use of "git grep" outside a git repository
  2010-01-18  3:35                   ` Junio C Hamano
@ 2010-01-18  4:02                     ` Junio C Hamano
  2010-01-18  5:57                       ` Jeff King
  0 siblings, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2010-01-18  4:02 UTC (permalink / raw)
  To: Jeff King; +Cc: git, David Aguilar, Nanako Shiraishi

Junio C Hamano <gitster@pobox.com> writes:

> Jeff King <peff@peff.net> writes:
>
>> Agreed. That is the most common log grep pattern for me (author + grep),
>> and I always want all-match. I see from later in the thread, though,
>> that implementing it is not as straightforward as we might hope.
>
> I haven't looked at the codepath for quite some time but I have a feeling
> that it probably won't be too bad.
>
> It just won't be as simple as flipping the all_match bit with a one-liner.

Perhaps something like this.

-- >8 --
Subject: "log --author=me --grep=it" should find intersection, not union

Historically, any grep filter in "git log" family of commands were taken
as restricting to commits with any of the words in the commit log message.
However, the user almost always want to find commits "done by this person
on that topic".  With "--all-match" option, a series of grep patterns can
be turned into a requirement that all of them must produce a match, but
that makes it impossible to ask for "done by me, on either this or that"
with:

	log --author=me --grep=this --grep=that

because it will require both "this" and "that" to appear.

Change the "header" parser of grep library to treat the headers specially.
When parsing the above, behave as if it was specified like this on the
command line:

	--all-match --author=me '(' --grep=this --grep=that ')'

Even though the "log" command line parser doesn't give direct access to
the extended grep syntax to group terms with parentheses, this change will
cover the majority of the case the users would want.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin-grep.c |    1 +
 grep.c         |   20 ++++++++++++++++++--
 grep.h         |    2 ++
 revision.c     |    1 +
 4 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/builtin-grep.c b/builtin-grep.c
index 529461f..d57c4d9 100644
--- a/builtin-grep.c
+++ b/builtin-grep.c
@@ -820,6 +820,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	opt.relative = 1;
 	opt.pathname = 1;
 	opt.pattern_tail = &opt.pattern_list;
+	opt.header_tail = &opt.header_list;
 	opt.regflags = REG_NEWLINE;
 	opt.max_depth = -1;
 
diff --git a/grep.c b/grep.c
index bdadf2c..f51fa4a 100644
--- a/grep.c
+++ b/grep.c
@@ -11,8 +11,8 @@ void append_header_grep_pattern(struct grep_opt *opt, enum grep_header_field fie
 	p->no = 0;
 	p->token = GREP_PATTERN_HEAD;
 	p->field = field;
-	*opt->pattern_tail = p;
-	opt->pattern_tail = &p->next;
+	*opt->header_tail = p;
+	opt->header_tail = &p->next;
 	p->next = NULL;
 }
 
@@ -173,6 +173,22 @@ void compile_grep_patterns(struct grep_opt *opt)
 {
 	struct grep_pat *p;
 
+	if (opt->header_list && !opt->all_match) {
+		struct grep_pat *p = opt->pattern_list;
+		opt->pattern_list = opt->header_list;
+		opt->pattern_tail = opt->header_tail;
+		opt->header_list = NULL;
+		opt->header_tail = NULL;
+
+		append_grep_pattern(opt, "(", "internal", 0, GREP_OPEN_PAREN);
+		while (p) {
+			*opt->pattern_tail = p;
+			opt->pattern_tail = &p->next;
+			p = p->next;
+		}
+		append_grep_pattern(opt, ")", "internal", 0, GREP_CLOSE_PAREN);
+		opt->all_match = 1;
+	}
 	if (opt->all_match)
 		opt->extended = 1;
 
diff --git a/grep.h b/grep.h
index 75370f6..e39e514 100644
--- a/grep.h
+++ b/grep.h
@@ -59,6 +59,8 @@ struct grep_expr {
 struct grep_opt {
 	struct grep_pat *pattern_list;
 	struct grep_pat **pattern_tail;
+	struct grep_pat *header_list;
+	struct grep_pat **header_tail;
 	struct grep_expr *pattern_expression;
 	const char *prefix;
 	int prefix_length;
diff --git a/revision.c b/revision.c
index 25fa14d..18a3658 100644
--- a/revision.c
+++ b/revision.c
@@ -806,6 +806,7 @@ void init_revisions(struct rev_info *revs, const char *prefix)
 
 	revs->grep_filter.status_only = 1;
 	revs->grep_filter.pattern_tail = &(revs->grep_filter.pattern_list);
+	revs->grep_filter.header_tail = &(revs->grep_filter.header_list);
 	revs->grep_filter.regflags = REG_NEWLINE;
 
 	diff_setup(&revs->diffopt);

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] grep --no-index: allow use of "git grep" outside a git repository
  2010-01-18  4:02                     ` Junio C Hamano
@ 2010-01-18  5:57                       ` Jeff King
  2010-01-18  6:30                         ` Junio C Hamano
  0 siblings, 1 reply; 20+ messages in thread
From: Jeff King @ 2010-01-18  5:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, David Aguilar, Nanako Shiraishi

On Sun, Jan 17, 2010 at 08:02:25PM -0800, Junio C Hamano wrote:

> Subject: "log --author=me --grep=it" should find intersection, not union
> 
> Historically, any grep filter in "git log" family of commands were taken
> as restricting to commits with any of the words in the commit log message.
> However, the user almost always want to find commits "done by this person
> on that topic".  With "--all-match" option, a series of grep patterns can
> be turned into a requirement that all of them must produce a match, but
> that makes it impossible to ask for "done by me, on either this or that"
> with:
> 
> 	log --author=me --grep=this --grep=that
> 
> because it will require both "this" and "that" to appear.
> 
> Change the "header" parser of grep library to treat the headers specially.
> When parsing the above, behave as if it was specified like this on the
> command line:
> 
> 	--all-match --author=me '(' --grep=this --grep=that ')'
> 
> Even though the "log" command line parser doesn't give direct access to
> the extended grep syntax to group terms with parentheses, this change will
> cover the majority of the case the users would want.

Hmm. I like the new behavior. The implementation feels a little
hack-ish, like we should really be supporting full-on:

  git log --author=me --and --grep=foo

That gets a little weird, though. We already have "--not" for ref
limiting, so clearly there is some conflict over exactly what logical
operators would be operating on. I guess we could use context to see
that the adjacent arguments were grep-related.

So perhaps, as you say, this is enough as it covers the usual case.

-Peff

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] grep --no-index: allow use of "git grep" outside a git repository
  2010-01-18  5:57                       ` Jeff King
@ 2010-01-18  6:30                         ` Junio C Hamano
  2010-01-18  6:50                           ` Jeff King
  0 siblings, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2010-01-18  6:30 UTC (permalink / raw)
  To: Jeff King; +Cc: git, David Aguilar, Nanako Shiraishi

Jeff King <peff@peff.net> writes:

> Hmm. I like the new behavior. The implementation feels a little
> hack-ish, like we should really be supporting full-on:
>
>   git log --author=me --and --grep=foo
>
> That gets a little weird, though. We already have "--not" for ref
> limiting, so clearly there is some conflict ...

That is fundamentally wrong.

Remember, "grep" works on two levels: a line matches or does not match the
given set of patterns (rather, the expression given), and matched lines
are shown.  A file as a whole is considered to have matched if one or more
lines produced a match, or under the --all-match option, only when all of
the top-level ORed terms in the expression have fired for some lines in
it.

And --not and --and are both elements of grep expression that determines
if the expression matches "a single line".  --author=me --and --grep=foo
would ask: does the "^author " line in the header have "me" _and_ also
string "foo" on it at the same time?

IOW, most of the "logical" stuff (including the precedence binding
parentheses) works at a line level.  --all-match is currently the only
thing that affects "grep -l" (and "will the commit get shown") behaviour
by collecting hits from the whole buffer.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] grep --no-index: allow use of "git grep" outside a git repository
  2010-01-18  6:30                         ` Junio C Hamano
@ 2010-01-18  6:50                           ` Jeff King
  0 siblings, 0 replies; 20+ messages in thread
From: Jeff King @ 2010-01-18  6:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, David Aguilar, Nanako Shiraishi

On Sun, Jan 17, 2010 at 10:30:19PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Hmm. I like the new behavior. The implementation feels a little
> > hack-ish, like we should really be supporting full-on:
> >
> >   git log --author=me --and --grep=foo
> >
> > That gets a little weird, though. We already have "--not" for ref
> > limiting, so clearly there is some conflict ...
> 
> That is fundamentally wrong.
> 
> Remember, "grep" works on two levels: a line matches or does not match the
> given set of patterns (rather, the expression given), and matched lines
> are shown.  A file as a whole is considered to have matched if one or more
> lines produced a match, or under the --all-match option, only when all of
> the top-level ORed terms in the expression have fired for some lines in
> it.

Fundamentally wrong for the way "log --grep" is currently implemented
perhaps, but I don't see anything wrong with considering each commit as
a single "record", just as regular grep considers each line to be a
record. That is a much more useful distinction for log traversal than
lines, which are useless from the user's perspective. If searching for
two terms, I care about whether they are in the same commit message, but
I don't care at all about line breaks.

Yes, I know that internally --author is really about line-matching the
commit headers, but that is an implementation detail. The mental model
we should present to the user is record-matching based on specific
fields like author, committer, or body text.

-Peff

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Filenames and prefixes in extended diffs
  2010-01-14  0:16 ` Junio C Hamano
  2010-01-15 13:32   ` Nanako Shiraishi
@ 2010-01-18 23:22   ` Andreas Gruenbacher
  1 sibling, 0 replies; 20+ messages in thread
From: Andreas Gruenbacher @ 2010-01-18 23:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thursday 14 January 2010 01:16:21 am Junio C Hamano wrote:
> Andreas Gruenbacher <agruen@suse.de> writes:
> > Can git be changed to ...
> 
> Just to save your time coming up with more ways to *change* git diff...
> 
> Even though I wouldn't say _any_ change is too late to bring in, change in
> the output format from "git diff" family _must_ be usable by "git apply"
> people have been using for the last 4 years or so.
> 
> Suppose your updated version of "git diff" with a certain set of options
> produces output A, which is different from the output B you would get out
> of today's "git diff" that is run with the same set of options.
> 
> If "git apply" people have been using understands B (i.e. current output)
> and does something, the format change between A and B must be designed in
> such a way that the same "git apply" accepts A (i.e. your output) and do
> the same thing.
> 
> Two examples:
> 
>  - "git diff -M" (or "git show -M") is _defined_ to show the filenames
>    without prefix on "rename from" line, and deployed "git apply" relies
>    on this definition to apply the patch to the file the patch was meant
>    to apply.  If your modified "git diff -M" changes it to add the prefix,
>    and existing "git apply" changes behaviour (either by rejecting your
>    output, or applying the patch to a wrong file), then such a change has
>    *no chance* of getting in.  It is merely a breakage.

Git apply is currently broken in some (uncommon) cases.  Consider the 
following two patches:

	$ cat add.diff
	diff --git a/d/f b/d/f
	new file mode 100644
	index 0000000..6a69f92
	--- /dev/null
	+++ b/d/f
	@@ -0,0 +1 @@
	+f

	$ cat rename.diff
	diff --git a/d/f b/d/g
	similarity index 100%
	rename from d/f
	rename to d/g

They apply fine after each other with plain "git apply".  When you try to 
apply them to a different location, things break though:

	$ git apply --directory e -p2 add.diff
	$ git apply --directory e -p2 rename.diff
	error: e/d/f: No such file or directory

Had the second patch not been generated with "git diff -M", things would just 
have worked; in other words, the -M format is broken.

I think the easiest way to fix this in "git apply" would be to figure out what 
the missing prefixes are in the rename and copy lines (in this case, just "a/" 
and "b/"), and to prepend those prefixes to the decoded filenames before 
stripping off pathname components.  This is the same as just taking the 
filenames from the "diff --git" line, and ignoring the filenames in the "copy 
from", "copy to", "rename from", and "rename to" headers.

I don't see a way how to fix this in the existing headers in a backwards 
compatible way.  Do you?

Independent of whether and how this is addressed, can I please at least have 
the "diff --git" line parsing problem fixed so that filenames which contain 
spaces are put in double quotes there?  Then I can at least ignore all the 
prefix-less filenames in GNU patch and still make it understand git's output.

>  - If you say "git diff --src-prefix=a/b/c --dst-prefix=x/y", it _might_
>    produce something "git apply" won't grok (I haven't checked this,
>    though).  You can suggest to change the output from such a case to work
>    better.  We didn't work as expected so a change _could_ be a fix.

The output format for that is fine.

> The output from "git diff --no-index" is an exception to the above rule.
> It is primarily for people who have unmanaged contents and want to use
> features of the git diff engine that are not found in other people's diff
> implementations (e.g. wordwise colored diff), and the header part of its
> output does not currently follow "git diff" convention to be grokkable by
> "git apply".
> 
> Fixing _that_ is a welcome change, but I suspect that there are corner
> cases, e.g. "git diff --no-index frotz-1.2.36/ /tmp/frotz/" (i.e. you have
> a pristine version in frotz-1.2.36 directory, but your modified version is
> in /tmp/frtoz/) that might make fixing it fundamentally impossible (I
> haven't looked into it for a long time, so it could be easy, but my gut
> feeling is it isn't).

Patches with a different number of components in the from and to prefixes are 
a really bad idea.  (GNU patch will prefer the pathname with the fewer 
components, but this yould just as well be the wrong one.)

Thanks,
Andreas

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2010-01-18 23:22 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-01-13 16:13 Filenames and prefixes in extended diffs Andreas Gruenbacher
2010-01-13 19:49 ` Junio C Hamano
2010-01-14  0:16 ` Junio C Hamano
2010-01-15 13:32   ` Nanako Shiraishi
2010-01-15 18:09     ` Junio C Hamano
2010-01-15 20:50       ` [PATCH] grep: prepare to run outside of a work tree Junio C Hamano
2010-01-15 20:52       ` [PATCH] grep --no-index: allow use of "git grep" outside a git repository Junio C Hamano
2010-01-15 21:08         ` Jeff King
2010-01-16  1:05           ` Junio C Hamano
2010-01-16  1:15             ` Jeff King
2010-01-16  4:15               ` Junio C Hamano
2010-01-16  6:51                 ` David Aguilar
2010-01-16  7:21                   ` Junio C Hamano
2010-01-18  1:51                 ` Jeff King
2010-01-18  3:35                   ` Junio C Hamano
2010-01-18  4:02                     ` Junio C Hamano
2010-01-18  5:57                       ` Jeff King
2010-01-18  6:30                         ` Junio C Hamano
2010-01-18  6:50                           ` Jeff King
2010-01-18 23:22   ` Filenames and prefixes in extended diffs Andreas Gruenbacher

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).