* Re: [PATCH 2/2] grep: use slash for path delimiter, not colon
@ 2013-09-22 19:15 Jonathon Mah
2013-09-24 6:57 ` Jeff King
0 siblings, 1 reply; 8+ messages in thread
From: Jonathon Mah @ 2013-09-22 19:15 UTC (permalink / raw)
To: phil.hord, Jeff King; +Cc: git@vger.kernel.org
> > HEAD:/some/path/to/foo.txt
> > HEAD:some/path/to/foo.txt
>
> With my patch it prints the latter.
>
> This is because get_sha1_with_context("HEAD:"...) returns an empty
> 'path' string. The code decides to use ':' as the delimiter in that
> case, but it sees there already is one at the end of "HEAD:".
A few days ago I came across the same "surprising" output of git-grep, tried to adjust the code to print "git show"-able object names, and ran into similar subtleties. I just found this thread, and Jeff's code handles more cases than mine did (I didn't try Phil's initial patch), but I can add some more test cases with non-showable output (again related to git-grep's path scoping):
$ git grep -l cache HEAD:./ | head -1
HEAD:./:.gitignore
$ cd Documentation
$ git grep -l cache HEAD | head -1
HEAD:CodingGuidelines
$ git grep -l cache HEAD:Documentation/CodingGuidelines
../HEAD:Documentation/CodingGuidelines
(woah!)
Sorry that I don't yet have anything useful to suggest! But I can tell the story of my use case:
I have a large repository (1.6GB bare) which I don't work on, but which contains code that I need to refer to. A checkout is ~600MB and 27k files, which I'd like to avoid (it's redundant data, and would slow down backups of my drive). I found myself "git-grep"ping through parts of the tree, looking through the results, and then "git-show"ing interesting files. Having a real object name in the grep output allows copy-and-paste of the object path.
Jonathon Mah
me@JonathonMah.com
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] grep: use slash for path delimiter, not colon 2013-09-22 19:15 [PATCH 2/2] grep: use slash for path delimiter, not colon Jonathon Mah @ 2013-09-24 6:57 ` Jeff King 0 siblings, 0 replies; 8+ messages in thread From: Jeff King @ 2013-09-24 6:57 UTC (permalink / raw) To: Jonathon Mah; +Cc: phil.hord, git@vger.kernel.org On Sun, Sep 22, 2013 at 12:15:21PM -0700, Jonathon Mah wrote: > A few days ago I came across the same "surprising" output of git-grep, > tried to adjust the code to print "git show"-able object names, and > ran into similar subtleties. I just found this thread, and Jeff's code > handles more cases than mine did (I didn't try Phil's initial patch), > but I can add some more test cases with non-showable output (again > related to git-grep's path scoping): > If you haven't read the side thread starting at [1], there are some arguments that git-grep is doing the right thing already. I think there are a few issues at play here: > $ git grep -l cache HEAD:./ | head -1 > HEAD:./:.gitignore As you show, using a colon separator from a tree-ish that contains a partial path looks bad. The downside of turning this into a slash, though, is that you lose the information of the tree-ish. See [2]. > $ cd Documentation > $ git grep -l cache HEAD | head -1 > HEAD:CodingGuidelines Grepping from a subdirectory produces relative paths that look like real tree:path specifications, but aren't. Showing the full path would potentially be cluttering if you are in a deep directory. These days we have the "./" syntax, though, so we could perhaps output: HEAD:./CodingGuidelines which is succinct but can be used to access the path in the tree (and makes more clear to the user that we have only grepped in the current subdirectory). > $ git grep -l cache HEAD:Documentation/CodingGuidelines > ../HEAD:Documentation/CodingGuidelines > (woah!) That one just seems nonsensical and buggy to me. We should not be applying ".." at all to a blob spec like this. -Peff [1] http://thread.gmane.org/gmane.comp.version-control.git/232892/focus=232980 [2] http://thread.gmane.org/gmane.comp.version-control.git/232892/focus=233004 ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCHv2] grep: use slash for path delimiter, not colon
@ 2013-08-26 19:53 Jeff King
2013-08-26 19:56 ` [PATCH 2/2] " Jeff King
0 siblings, 1 reply; 8+ messages in thread
From: Jeff King @ 2013-08-26 19:53 UTC (permalink / raw)
To: Phil Hord; +Cc: git, phil.hord, Junio C Hamano, Jonathan Nieder
On Mon, Aug 26, 2013 at 03:28:26PM -0400, Jeff King wrote:
> Changing the object_array API would be hard, but I don't think we need
> to do it here. Can we simply stop using object_array to pass the list,
> and instead just have a custom list?
>
> I'll see how painful that is.
Not very, I think. Here's the series.
[1/2]: grep: stop using object_array
[2/2]: grep: use slash for path delimiter, not colon
-Peff
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 2/2] grep: use slash for path delimiter, not colon 2013-08-26 19:53 [PATCHv2] " Jeff King @ 2013-08-26 19:56 ` Jeff King 2013-08-26 20:13 ` Johannes Sixt 0 siblings, 1 reply; 8+ messages in thread From: Jeff King @ 2013-08-26 19:56 UTC (permalink / raw) To: Phil Hord; +Cc: git, phil.hord, Junio C Hamano, Jonathan Nieder From: Phil Hord <hordp@cisco.com> When a commit is grepped and matching filenames are printed, grep-objects creates the filename by prefixing the original cmdline argument to the matched path separated by a colon. Normally this forms a valid blob reference to the filename, like this: git grep -l foo HEAD HEAD:some/path/to/foo.txt ^ But a tree path may be given to grep instead; in this case the colon is not a valid delimiter to use since it is placed inside a path. git grep -l foo HEAD:some HEAD:some:path/to/foo.txt ^ The slash path delimiter should be used instead. Fix git grep to discern the correct delimiter so it can report valid object names. git grep -l foo HEAD:some HEAD:some/path/to/foo.txt ^ Also, prevent the delimiter being added twice, as happens now in these examples: git grep -l foo HEAD: HEAD::some/path/to/foo.txt ^ git grep -l foo HEAD:some/ HEAD:some/:path/to/foo.txt ^ Add a test to confirm correct path forming. Signed-off-by: Jeff King <peff@peff.net> --- I left the author as you, since you have done all the hard work; this is really just me rebasing your patch on top of mine. But note that you did not signoff the original. builtin/grep.c | 13 +++++++++---- t/t7810-grep.sh | 15 +++++++++++++++ 2 files changed, 24 insertions(+), 4 deletions(-) diff --git a/builtin/grep.c b/builtin/grep.c index ee47d49..2df7986 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -26,6 +26,7 @@ struct object_to_grep { struct object_to_grep { struct object *item; const char *name; + unsigned has_path:1; }; static int use_threads = 1; @@ -462,7 +463,7 @@ static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec, } static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec, - struct object *obj, const char *name) + struct object *obj, const char *name, char delimiter) { if (obj->type == OBJ_BLOB) return grep_sha1(opt, obj->sha1, name, 0, NULL); @@ -485,7 +486,8 @@ static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec, strbuf_init(&base, PATH_MAX + len + 1); if (len) { strbuf_add(&base, name, len); - strbuf_addch(&base, ':'); + if (name[len-1] != delimiter) + strbuf_addch(&base, delimiter); } init_tree_desc(&tree, data, size); hit = grep_tree(opt, pathspec, &tree, &base, base.len, @@ -506,7 +508,8 @@ static int grep_objects(struct grep_opt *opt, const struct pathspec *pathspec, for (i = 0; i < nr; i++) { struct object *real_obj; real_obj = deref_tag(list[i].item, NULL, 0); - if (grep_object(opt, pathspec, real_obj, list[i].name)) { + if (grep_object(opt, pathspec, real_obj, list[i].name, + list[i].has_path ? '/' : ':')) { hit = 1; if (opt->status_only) break; @@ -822,8 +825,9 @@ int cmd_grep(int argc, const char **argv, const char *prefix) for (i = 0; i < argc; i++) { const char *arg = argv[i]; unsigned char sha1[20]; + struct object_context oc; /* Is it a rev? */ - if (!get_sha1(arg, sha1)) { + if (!get_sha1_with_context(arg, 0, sha1, &oc)) { struct object *object = parse_object_or_die(sha1, arg); if (!seen_dashdash) verify_non_filename(prefix, arg); @@ -831,6 +835,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix) ALLOC_GROW(list, list_nr+1, list_alloc); list[list_nr].item = object; list[list_nr].name = arg; + list[list_nr].has_path = !!oc.path[0]; list_nr++; continue; } diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh index f698001..2494bfc 100755 --- a/t/t7810-grep.sh +++ b/t/t7810-grep.sh @@ -886,6 +886,21 @@ cat >expected <<EOF ' cat >expected <<EOF +HEAD:t/a/v:1:vvv +HEAD:t/v:1:vvv +EOF + +test_expect_success "grep HEAD -- path/" ' + git grep -n -e vvv HEAD -- t/ >actual && + test_cmp expected actual +' + +test_expect_success "grep HEAD:path" ' + git grep -n -e vvv HEAD:t/ >actual && + test_cmp expected actual +' + +cat >expected <<EOF hello.c:int main(int argc, const char **argv) hello.c: printf("Hello world.\n"); EOF -- 1.8.4.2.g87d4a77 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] grep: use slash for path delimiter, not colon 2013-08-26 19:56 ` [PATCH 2/2] " Jeff King @ 2013-08-26 20:13 ` Johannes Sixt 2013-08-26 20:52 ` Phil Hord 2013-08-26 20:52 ` Jeff King 0 siblings, 2 replies; 8+ messages in thread From: Johannes Sixt @ 2013-08-26 20:13 UTC (permalink / raw) To: phil.hord; +Cc: Jeff King, Phil Hord, git, Junio C Hamano, Jonathan Nieder Am 26.08.2013 21:56, schrieb Jeff King: > Also, prevent the delimiter being added twice, as happens now in these > examples: > > git grep -l foo HEAD: > HEAD::some/path/to/foo.txt > ^ Which one of these two does it print then? HEAD:/some/path/to/foo.txt HEAD:some/path/to/foo.txt -- Hannes ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] grep: use slash for path delimiter, not colon 2013-08-26 20:13 ` Johannes Sixt @ 2013-08-26 20:52 ` Phil Hord 2013-08-26 20:52 ` Jeff King 1 sibling, 0 replies; 8+ messages in thread From: Phil Hord @ 2013-08-26 20:52 UTC (permalink / raw) To: Johannes Sixt Cc: Jeff King, Phil Hord, git@vger.kernel.org, Junio C Hamano, Jonathan Nieder On Mon, Aug 26, 2013 at 4:13 PM, Johannes Sixt <j6t@kdbg.org> wrote: > Am 26.08.2013 21:56, schrieb Jeff King: >> Also, prevent the delimiter being added twice, as happens now in these >> examples: >> >> git grep -l foo HEAD: >> HEAD::some/path/to/foo.txt >> ^ > > Which one of these two does it print then? > > HEAD:/some/path/to/foo.txt > HEAD:some/path/to/foo.txt With my patch it prints the latter. This is because get_sha1_with_context("HEAD:"...) returns an empty 'path' string. The code decides to use ':' as the delimiter in that case, but it sees there already is one at the end of "HEAD:". Phil ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] grep: use slash for path delimiter, not colon 2013-08-26 20:13 ` Johannes Sixt 2013-08-26 20:52 ` Phil Hord @ 2013-08-26 20:52 ` Jeff King 2013-08-26 21:03 ` Phil Hord 1 sibling, 1 reply; 8+ messages in thread From: Jeff King @ 2013-08-26 20:52 UTC (permalink / raw) To: Johannes Sixt; +Cc: phil.hord, Phil Hord, git, Junio C Hamano, Jonathan Nieder On Mon, Aug 26, 2013 at 10:13:14PM +0200, Johannes Sixt wrote: > Am 26.08.2013 21:56, schrieb Jeff King: > > Also, prevent the delimiter being added twice, as happens now in these > > examples: > > > > git grep -l foo HEAD: > > HEAD::some/path/to/foo.txt > > ^ > > Which one of these two does it print then? > > HEAD:/some/path/to/foo.txt > HEAD:some/path/to/foo.txt It should (and does) print the latter. But I do note that our pathspec handling for subdirectories seems buggy. If you do: $ cd Documentation $ git grep -l foo | head -1 RelNotes/1.5.1.5.txt that's fine; we limit to the current directory. But then if you do: $ git grep -l foo HEAD | head -1 HEAD:RelNotes/1.5.1.5.txt we still limit to the current directory, but the output does note note this (it should be "HEAD:./RelNotes/1.5.1.5.txt"). I think this bug is orthogonal to Phil's patch, though. -Peff ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] grep: use slash for path delimiter, not colon 2013-08-26 20:52 ` Jeff King @ 2013-08-26 21:03 ` Phil Hord 2013-08-26 21:13 ` Jeff King 0 siblings, 1 reply; 8+ messages in thread From: Phil Hord @ 2013-08-26 21:03 UTC (permalink / raw) To: Jeff King Cc: Johannes Sixt, Phil Hord, git@vger.kernel.org, Junio C Hamano, Jonathan Nieder On Mon, Aug 26, 2013 at 4:52 PM, Jeff King <peff@peff.net> wrote: > On Mon, Aug 26, 2013 at 10:13:14PM +0200, Johannes Sixt wrote: > >> Am 26.08.2013 21:56, schrieb Jeff King: >> > Also, prevent the delimiter being added twice, as happens now in these >> > examples: >> > >> > git grep -l foo HEAD: >> > HEAD::some/path/to/foo.txt >> > ^ >> >> Which one of these two does it print then? >> >> HEAD:/some/path/to/foo.txt >> HEAD:some/path/to/foo.txt > > It should (and does) print the latter. > > But I do note that our pathspec handling for subdirectories seems buggy. > If you do: > > $ cd Documentation > $ git grep -l foo | head -1 > RelNotes/1.5.1.5.txt > > that's fine; we limit to the current directory. But then if you do: > > $ git grep -l foo HEAD | head -1 > HEAD:RelNotes/1.5.1.5.txt > > we still limit to the current directory, but the output does not note > this (it should be "HEAD:./RelNotes/1.5.1.5.txt"). I think this bug is > orthogonal to Phil's patch, though. Maybe not. My path completes the assumption that the L:R value returned by grep is an object ref; but Junio still thought it wasn't. I think this is another case where his view was correct. There's more bad news on this front. $ cd Documentation $ git grep -l foo HEAD .. | head -1 HEAD:../.gitignore That's not a valid ref, either (though maybe it could be). Phil ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] grep: use slash for path delimiter, not colon 2013-08-26 21:03 ` Phil Hord @ 2013-08-26 21:13 ` Jeff King 0 siblings, 0 replies; 8+ messages in thread From: Jeff King @ 2013-08-26 21:13 UTC (permalink / raw) To: Phil Hord Cc: Johannes Sixt, Phil Hord, git@vger.kernel.org, Junio C Hamano, Jonathan Nieder On Mon, Aug 26, 2013 at 05:03:04PM -0400, Phil Hord wrote: > > $ git grep -l foo HEAD | head -1 > > HEAD:RelNotes/1.5.1.5.txt > > > > we still limit to the current directory, but the output does not note > > this (it should be "HEAD:./RelNotes/1.5.1.5.txt"). I think this bug is > > orthogonal to Phil's patch, though. > > Maybe not. My path completes the assumption that the L:R value > returned by grep is an object ref; but Junio still thought it wasn't. > I think this is another case where his view was correct. I certainly assumed it was, because it is in most cases it is. And something like "HEAD:RelNotes/1.5.1.5.txt" certainly _looks_ like one, and is generated by the current git. And what is the point of coming up with a file listing if the names you return do not actually exist? > There's more bad news on this front. > > $ cd Documentation > $ git grep -l foo HEAD .. | head -1 > HEAD:../.gitignore > > That's not a valid ref, either (though maybe it could be). Yes, though we seem to normalize paths already. So the other entries from that command are (in git.git): HEAD:../.mailmap HEAD:RelNotes/1.5.1.5.txt So we could either: 1. Prepend the current path before normalizing to yield: HEAD:.mailmap HEAD:Documentation/RelNotes/1.5.1.5.txt 2. Teach the get_sha1 path parser about "..", and prepend "./" when we are in a prefixed subdir. HEAD:./../.mailmap HEAD:./RelNotes/1.5.1.5.txt -Peff ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2013-09-24 6:57 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-09-22 19:15 [PATCH 2/2] grep: use slash for path delimiter, not colon Jonathon Mah 2013-09-24 6:57 ` Jeff King -- strict thread matches above, loose matches on Subject: below -- 2013-08-26 19:53 [PATCHv2] " Jeff King 2013-08-26 19:56 ` [PATCH 2/2] " Jeff King 2013-08-26 20:13 ` Johannes Sixt 2013-08-26 20:52 ` Phil Hord 2013-08-26 20:52 ` Jeff King 2013-08-26 21:03 ` Phil Hord 2013-08-26 21:13 ` Jeff King
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).