git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC/PATCH] Fix path prefixing in grep_object
@ 2013-08-25  1:35 Phil Hord
  2013-08-25  2:07 ` Phil Hord
  2013-08-25  4:41 ` Jeff King
  0 siblings, 2 replies; 19+ messages in thread
From: Phil Hord @ 2013-08-25  1:35 UTC (permalink / raw)
  To: git; +Cc: phil.hord, Phil Hord

When the pathspec given to grep includes a tree name, the full
name of matched files is assembled using colon as a separator.
If the pathspec includes a tree name, it should use a slash
instead.

Check if the pathspec already names a tree and ref (including
a colon) and use a slash if so.
---

I'm not sure about the detection I used here.  It works, but it is
not terribly robust.  Is there a better way to handle this?  Maybe
something like 'prefix_pathspec(name,"");'.

 builtin/grep.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 03bc442..d0deae4 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -480,8 +480,9 @@ static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec,
 		len = name ? strlen(name) : 0;
 		strbuf_init(&base, PATH_MAX + len + 1);
 		if (len) {
+			int has_colon = !!strchr(name,':');
 			strbuf_add(&base, name, len);
-			strbuf_addch(&base, ':');
+			strbuf_addch(&base, has_colon?'/':':');
 		}
 		init_tree_desc(&tree, data, size);
 		hit = grep_tree(opt, pathspec, &tree, &base, base.len,
-- 
1.8.4.557.g34b3a2e

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [RFC/PATCH] Fix path prefixing in grep_object
  2013-08-25  1:35 [RFC/PATCH] Fix path prefixing in grep_object Phil Hord
@ 2013-08-25  2:07 ` Phil Hord
  2013-08-25  3:35   ` Junio C Hamano
  2013-08-25  4:41 ` Jeff King
  1 sibling, 1 reply; 19+ messages in thread
From: Phil Hord @ 2013-08-25  2:07 UTC (permalink / raw)
  To: Phil Hord; +Cc: git@vger.kernel.org

On Sat, Aug 24, 2013 at 9:35 PM, Phil Hord <hordp@cisco.com> wrote:
>
> When the pathspec given to grep includes a tree name, the full
> name of matched files is assembled using colon as a separator.
> If the pathspec includes a tree name, it should use a slash
> instead.
>
> Check if the pathspec already names a tree and ref (including
> a colon) and use a slash if so.

I think I used lots of wrong terminology there.  What do I call these
things?

HEAD:path is a tree.

HEAD is a commit name.

Maybe like this?

  When a tree is given to grep, the full name of matched files
  is assembled using colon as a separator.

  If the tree name includes an object name, as in
  HEAD:some/path, it should use a slash instead.

Phil

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC/PATCH] Fix path prefixing in grep_object
  2013-08-25  2:07 ` Phil Hord
@ 2013-08-25  3:35   ` Junio C Hamano
  2013-08-25  4:23     ` Jonathan Nieder
  0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2013-08-25  3:35 UTC (permalink / raw)
  To: Phil Hord; +Cc: Phil Hord, git@vger.kernel.org

Phil Hord <phil.hord@gmail.com> writes:

> On Sat, Aug 24, 2013 at 9:35 PM, Phil Hord <hordp@cisco.com> wrote:
>>
>> When the pathspec given to grep includes a tree name, the full
>> name of matched files is assembled using colon as a separator.
>> If the pathspec includes a tree name, it should use a slash
>> instead.
>>
>> Check if the pathspec already names a tree and ref (including
>> a colon) and use a slash if so.
>
> I think I used lots of wrong terminology there.  What do I call these
> things?
>
> HEAD:path is a tree.

It is one way to name a tree object, yes (another obvious way is to
spell it out, e.g. 3610ac62).

> HEAD is a commit name.

It is one way to name the commit object, yes, but I am guessing that
you are interested in these in the context of resolving HEAD:path.
If that is the case, then "commit"-ness is not something you are
interested in---any tree-ish would do (e.g. v1.8.4, maint^{tree}).

> Maybe like this?
>
>   When a tree is given to grep, the full name of matched files
>   is assembled using colon as a separator.
>
>   If the tree name includes an object name, as in
>   HEAD:some/path, it should use a slash instead.

What problem are you trying to solve?  It should use HEAD:some/path,
not HEAD/some/path.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC/PATCH] Fix path prefixing in grep_object
  2013-08-25  3:35   ` Junio C Hamano
@ 2013-08-25  4:23     ` Jonathan Nieder
  2013-08-25  5:25       ` Junio C Hamano
  0 siblings, 1 reply; 19+ messages in thread
From: Jonathan Nieder @ 2013-08-25  4:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Phil Hord, Phil Hord, git@vger.kernel.org

Junio C Hamano wrote:
> Phil Hord <phil.hord@gmail.com> writes:
>> On Sat, Aug 24, 2013 at 9:35 PM, Phil Hord <hordp@cisco.com> wrote:

>>> When the pathspec given to grep includes a tree name, the full
>>> name of matched files is assembled using colon as a separator.
>>> If the pathspec includes a tree name, it should use a slash
>>> instead.
[...]
>>   If the tree name includes an object name, as in
>>   HEAD:some/path, it should use a slash instead.
>
> What problem are you trying to solve?  It should use HEAD:some/path,
> not HEAD/some/path.

I think Phil meant that when "git grep" is asked to search within
"HEAD:some/path", filenames tacked on at the end should be appended
with a '/' separator instead of the usual ':' (e.g.,
"HEAD:some/path/inner/path.c", not "HEAD:some/path:inner/path.c").
Otherwise I cannot copy and paste "git grep" output and get something
suitable for passing to "git show".

I don't think we have a standard name for the tree:path syntax.  I've
always just called it tree:path syntax. :)

Hope that helps,
Jonathan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC/PATCH] Fix path prefixing in grep_object
  2013-08-25  1:35 [RFC/PATCH] Fix path prefixing in grep_object Phil Hord
  2013-08-25  2:07 ` Phil Hord
@ 2013-08-25  4:41 ` Jeff King
  2013-08-25  5:41   ` Jonathan Nieder
  1 sibling, 1 reply; 19+ messages in thread
From: Jeff King @ 2013-08-25  4:41 UTC (permalink / raw)
  To: Phil Hord; +Cc: Junio C Hamano, Jonathan Nieder, git, phil.hord

On Sat, Aug 24, 2013 at 09:35:58PM -0400, Phil Hord wrote:

> When the pathspec given to grep includes a tree name, the full
> name of matched files is assembled using colon as a separator.
> If the pathspec includes a tree name, it should use a slash
> instead.
> 
> Check if the pathspec already names a tree and ref (including
> a colon) and use a slash if so.

Makes sense.

> I'm not sure about the detection I used here.  It works, but it is
> not terribly robust.  Is there a better way to handle this?  Maybe
> something like 'prefix_pathspec(name,"");'.

I think the information you want has been thrown away by the time we get
to grep_object. Only get_sha1 knows whether the name was really a direct
tree reference or if it had to traverse paths.

So we are necessarily reconstructing based on what we know of the
syntax. And I think that your rule is OK, because we know that refnames
cannot contain a colon. So even though pathnames can, we do not have to
care; we only want to know "is there a path in the name", and if we have
at least one colon, the answer is yes.

>  builtin/grep.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)

A test would be nice. Both to make sure we do not re-break it, and because
it helps demonstrate the problem very easily (it took me a minute to
figure out what was going on from your description).

> diff --git a/builtin/grep.c b/builtin/grep.c
> index 03bc442..d0deae4 100644
> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -480,8 +480,9 @@ static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec,
>  		len = name ? strlen(name) : 0;
>  		strbuf_init(&base, PATH_MAX + len + 1);
>  		if (len) {
> +			int has_colon = !!strchr(name,':');
>  			strbuf_add(&base, name, len);
> -			strbuf_addch(&base, ':');
> +			strbuf_addch(&base, has_colon?'/':':');

Please use whitespace with your ternary operator. The '/':':' made me
think I was reading Perl for a minute. :)

-Peff

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC/PATCH] Fix path prefixing in grep_object
  2013-08-25  4:23     ` Jonathan Nieder
@ 2013-08-25  5:25       ` Junio C Hamano
  2013-08-26  7:14         ` Junio C Hamano
  0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2013-08-25  5:25 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Phil Hord, Phil Hord, git@vger.kernel.org

Jonathan Nieder <jrnieder@gmail.com> writes:

> I think Phil meant that when "git grep" is asked to search within
> "HEAD:some/path", filenames tacked on at the end should be appended
> with a '/' separator instead of the usual ':' (e.g.,
> "HEAD:some/path/inner/path.c", not "HEAD:some/path:inner/path.c").

Ah, OK.

I am not sure if we have a proper hint in the revision parser
machinery, but it may not be too hard to teach rev-cmdline interface
to keep that kind of information (i.e. "This tree object name is a
result of parsing '<tree-ish>:path' syntax").

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC/PATCH] Fix path prefixing in grep_object
  2013-08-25  4:41 ` Jeff King
@ 2013-08-25  5:41   ` Jonathan Nieder
  2013-08-25  5:54     ` Jeff King
  0 siblings, 1 reply; 19+ messages in thread
From: Jonathan Nieder @ 2013-08-25  5:41 UTC (permalink / raw)
  To: Jeff King; +Cc: Phil Hord, Junio C Hamano, git, phil.hord

Jeff King wrote:

> So we are necessarily reconstructing based on what we know of the
> syntax. And I think that your rule is OK, because we know that refnames
> cannot contain a colon.

What happens with expressions like HEAD^{/test:}?

Jonathan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC/PATCH] Fix path prefixing in grep_object
  2013-08-25  5:41   ` Jonathan Nieder
@ 2013-08-25  5:54     ` Jeff King
  0 siblings, 0 replies; 19+ messages in thread
From: Jeff King @ 2013-08-25  5:54 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Phil Hord, Junio C Hamano, git, phil.hord

On Sat, Aug 24, 2013 at 10:41:42PM -0700, Jonathan Nieder wrote:

> Jeff King wrote:
> 
> > So we are necessarily reconstructing based on what we know of the
> > syntax. And I think that your rule is OK, because we know that refnames
> > cannot contain a colon.
> 
> What happens with expressions like HEAD^{/test:}?

Ugh, right. Names are more than just refnames.

So I think the only way to do this robustly is to ask get_sha1 to
remember more about what happened. We might even be able to get away
without teaching get_sha1_with_context anything else; it already records
the path, so we should be able to just check whether that is non-empty.

But we use an object_array to store the list of objects, and it has no
room for such a bit. So we'd probably want to refactor that, too.

-Peff

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC/PATCH] Fix path prefixing in grep_object
  2013-08-25  5:25       ` Junio C Hamano
@ 2013-08-26  7:14         ` Junio C Hamano
  2013-08-26 11:44           ` Phil Hord
  0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2013-08-26  7:14 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Phil Hord, Phil Hord, git@vger.kernel.org

Junio C Hamano <gitster@pobox.com> writes:

> Jonathan Nieder <jrnieder@gmail.com> writes:
>
>> I think Phil meant that when "git grep" is asked to search within
>> "HEAD:some/path", filenames tacked on at the end should be appended
>> with a '/' separator instead of the usual ':' (e.g.,
>> "HEAD:some/path/inner/path.c", not "HEAD:some/path:inner/path.c").
>
> Ah, OK.
>
> I am not sure if we have a proper hint in the revision parser
> machinery, but it may not be too hard to teach rev-cmdline interface
> to keep that kind of information (i.e. "This tree object name is a
> result of parsing '<tree-ish>:path' syntax").

Actually, having thought about this a bit more, I am no longer sure
if this is even a good idea to begin with.

An output line that begins with HEAD:some/path:inner/path.c from
"git grep" is an answer to this question: find the given pattern in
a tree-ish specified with "HEAD:some/path".

On the other hand, HEAD:some/path/inner/path.c is an answer to a
different question: find the pattern in a tree-ish specified with
"HEAD".  The question may optionally be further qualified with "but
limit the search to some/path".  Both the question and its answer
are more intuitive than the former one.

And we have a nice way to ask that question directly, i.e.

    $ git grep -e pattern HEAD some/path

which can be extended naturally to more than one path, e.g.

    $ git grep -e pattern HEAD some/path another/hierarchy

without having to repeat HEAD: part again for each path.

If the user asked the question of the former form, i.e.

    $ git grep -e pattern HEAD:some/path

there may be a reason why the user wanted to ask it in that
(somewhat strange) way.  I am not 100% sure if it is a good idea to
give an answer to a question different from what the user asked by
internally rewriting the question to

    $ git grep -e pattern HEAD -- some/path

So...

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC/PATCH] Fix path prefixing in grep_object
  2013-08-26  7:14         ` Junio C Hamano
@ 2013-08-26 11:44           ` Phil Hord
  2013-08-26 16:23             ` Junio C Hamano
  0 siblings, 1 reply; 19+ messages in thread
From: Phil Hord @ 2013-08-26 11:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Nieder, Phil Hord, git@vger.kernel.org

On Mon, Aug 26, 2013 at 3:14 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
>> Jonathan Nieder <jrnieder@gmail.com> writes:
>>
>>> I think Phil meant that when "git grep" is asked to search within
>>> "HEAD:some/path", filenames tacked on at the end should be appended
>>> with a '/' separator instead of the usual ':' (e.g.,
>>> "HEAD:some/path/inner/path.c", not "HEAD:some/path:inner/path.c").
>>
>> Ah, OK.
>>
>> I am not sure if we have a proper hint in the revision parser
>> machinery, but it may not be too hard to teach rev-cmdline interface
>> to keep that kind of information (i.e. "This tree object name is a
>> result of parsing '<tree-ish>:path' syntax").
>
> Actually, having thought about this a bit more, I am no longer sure
> if this is even a good idea to begin with.
>
> An output line that begins with HEAD:some/path:inner/path.c from
> "git grep" is an answer to this question: find the given pattern in
> a tree-ish specified with "HEAD:some/path".
>
> On the other hand, HEAD:some/path/inner/path.c is an answer to a
> different question: find the pattern in a tree-ish specified with
> "HEAD".  The question may optionally be further qualified with "but
> limit the search to some/path".  Both the question and its answer
> are more intuitive than the former one.

I disagree.  The man page says that git grep lists the filenames of
matching files.  And it usually does.  When told to search in a
different branch, the filename is helpfully shown in a form that other
git commands recognize, namely $branch:$path. This is useful for
scripts that want to do something with the resulting file names.

But when a path included in the query with the branch, the output is
useless to my scripts or finger memory without some cleanup first.
The aim of this patch is to fix that so the cleanup is not necessary.

   $ git grep -l setup_check HEAD Documentation
   HEAD:Documentation/technical/api-gitattributes.txt

   $ git grep -l setup_check HEAD:Documentation
   HEAD:Documentation:technical/api-gitattributes.txt

The path in the first example is meaningful.  The path in the second
example is erroneous.


> And we have a nice way to ask that question directly, i.e.
>
>     $ git grep -e pattern HEAD some/path
>
> which can be extended naturally to more than one path, e.g.
>
>     $ git grep -e pattern HEAD some/path another/hierarchy
>
> without having to repeat HEAD: part again for each path.

Yes, but that's not always what I want. Sometimes I want to search on
different trees. When doing so, why should I be crippled with broken
output?

    $ git grep -e pattern origin/master:some/path origin/next:another/hierarchy
    origin/master:some/path:sub/dir/foo.txt
    origin/next:another/hierarchy:path/frotz.c

I would prefer to have real paths I can pass to 'git show', ones with
just one meaningful colon rather than two vague ones:

    origin/master:some/path/sub/dir/foo.txt
    origin/next:another/hierarchy/path/frotz.c

> If the user asked the question of the former form, i.e.
>
>     $ git grep -e pattern HEAD:some/path
>
> there may be a reason why the user wanted to ask it in that
> (somewhat strange) way.  I am not 100% sure if it is a good idea to
> give an answer to a question different from what the user asked by
> internally rewriting the question to
>
>     $ git grep -e pattern HEAD -- some/path

We are not rewriting the question at all.

The current code assumes the user gave only an object name and is
trying to help by prefixing that name on the matched path using the
colon as a separator, as would be the norm.  But that is the wrong
separator in some cases, specifically when the tree reference includes
a path.

Phil

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC/PATCH] Fix path prefixing in grep_object
  2013-08-26 11:44           ` Phil Hord
@ 2013-08-26 16:23             ` Junio C Hamano
  2013-08-26 16:49               ` Phil Hord
  2013-08-26 17:03               ` Junio C Hamano
  0 siblings, 2 replies; 19+ messages in thread
From: Junio C Hamano @ 2013-08-26 16:23 UTC (permalink / raw)
  To: Phil Hord; +Cc: Jonathan Nieder, Phil Hord, git@vger.kernel.org

Phil Hord <phil.hord@gmail.com> writes:

> On Mon, Aug 26, 2013 at 3:14 AM, Junio C Hamano <gitster@pobox.com> wrote:
>> Junio C Hamano <gitster@pobox.com> writes:
>>
>>> Jonathan Nieder <jrnieder@gmail.com> writes:
>>>
>>>> I think Phil meant that when "git grep" is asked to search within
>>>> "HEAD:some/path", filenames tacked on at the end should be appended
>>>> with a '/' separator instead of the usual ':' (e.g.,
>>>> "HEAD:some/path/inner/path.c", not "HEAD:some/path:inner/path.c").
>>>
>>> Ah, OK.
>>>
>>> I am not sure if we have a proper hint in the revision parser
>>> machinery, but it may not be too hard to teach rev-cmdline interface
>>> to keep that kind of information (i.e. "This tree object name is a
>>> result of parsing '<tree-ish>:path' syntax").
>>
>> Actually, having thought about this a bit more, I am no longer sure
>> if this is even a good idea to begin with.
>>
>> An output line that begins with HEAD:some/path:inner/path.c from
>> "git grep" is an answer to this question: find the given pattern in
>> a tree-ish specified with "HEAD:some/path".
>>
>> On the other hand, HEAD:some/path/inner/path.c is an answer to a
>> different question: find the pattern in a tree-ish specified with
>> "HEAD".  The question may optionally be further qualified with "but
>> limit the search to some/path".  Both the question and its answer
>> are more intuitive than the former one.
>
> I disagree.  The man page says that git grep lists the filenames of
> matching files.

But you need to realize that the manual page gives a white lie in
order to stay short enough to be readable.  It does give filenames
when reading from the working tree, the index or a top-level tree
object.  When given arbitrary tree object that is not top-level, it
does give filenames relative to the given tree object, which is the
answer HEAD:some/path:inner/path.c to the question "where in the
tree HEAD:some/path do we have hits?".

>> If the user asked the question of the former form, i.e.
>>
>>     $ git grep -e pattern HEAD:some/path
>>
>> there may be a reason why the user wanted to ask it in that
>> (somewhat strange) way.  I am not 100% sure if it is a good idea to
>> give an answer to a question different from what the user asked by
>> internally rewriting the question to
>>
>>     $ git grep -e pattern HEAD -- some/path
>
> We are not rewriting the question at all.

That statement is trapped in a narrow "implementor" mind. I know you
are not rewriting the question in your implementation, but what do
the end users see? It gives an answer to a question different from
they asked; the observed behaviour is as if the question was
rewritten before the machinery went to work.

If your justification were "above says 'there may be a readon why
the user wanted to ask it in that way', i.e. 'find in this tree
object HEAD:some/path and report where hits appear', but the reason
can only be from laziness and/or broken script and the user always
wants the answer from within the top-level tree-ish", then that
argument may make some sense. You need to justify why it is OK to
lose information in the answer by rewriting the colon that separates
the question ("in this tree object") and the answer ("at this path
relative to the tree object given").

Whether you rewrite the input or the output is not important; you
are trying to give an answer to a different question, which is what
I find questionable.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC/PATCH] Fix path prefixing in grep_object
  2013-08-26 16:23             ` Junio C Hamano
@ 2013-08-26 16:49               ` Phil Hord
  2013-08-26 17:07                 ` Phil Hord
  2013-08-26 17:26                 ` Junio C Hamano
  2013-08-26 17:03               ` Junio C Hamano
  1 sibling, 2 replies; 19+ messages in thread
From: Phil Hord @ 2013-08-26 16:49 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Nieder, Phil Hord, git@vger.kernel.org

On Mon, Aug 26, 2013 at 12:23 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Phil Hord <phil.hord@gmail.com> writes:
>
>> On Mon, Aug 26, 2013 at 3:14 AM, Junio C Hamano <gitster@pobox.com> wrote:
>>> Junio C Hamano <gitster@pobox.com> writes:
>>>
>>>> Jonathan Nieder <jrnieder@gmail.com> writes:
>>>>
>>>>> I think Phil meant that when "git grep" is asked to search within
>>>>> "HEAD:some/path", filenames tacked on at the end should be appended
>>>>> with a '/' separator instead of the usual ':' (e.g.,
>>>>> "HEAD:some/path/inner/path.c", not "HEAD:some/path:inner/path.c").
>>>>
>>>> Ah, OK.
>>>>
>>>> I am not sure if we have a proper hint in the revision parser
>>>> machinery, but it may not be too hard to teach rev-cmdline interface
>>>> to keep that kind of information (i.e. "This tree object name is a
>>>> result of parsing '<tree-ish>:path' syntax").
>>>
>>> Actually, having thought about this a bit more, I am no longer sure
>>> if this is even a good idea to begin with.
>>>
>>> An output line that begins with HEAD:some/path:inner/path.c from
>>> "git grep" is an answer to this question: find the given pattern in
>>> a tree-ish specified with "HEAD:some/path".
>>>
>>> On the other hand, HEAD:some/path/inner/path.c is an answer to a
>>> different question: find the pattern in a tree-ish specified with
>>> "HEAD".  The question may optionally be further qualified with "but
>>> limit the search to some/path".  Both the question and its answer
>>> are more intuitive than the former one.
>>
>> I disagree.  The man page says that git grep lists the filenames of
>> matching files.
>
> But you need to realize that the manual page gives a white lie in
> order to stay short enough to be readable.  It does give filenames
> when reading from the working tree, the index or a top-level tree
> object.  When given arbitrary tree object that is not top-level, it
> does give filenames relative to the given tree object, which is the
> answer HEAD:some/path:inner/path.c to the question "where in the
> tree HEAD:some/path do we have hits?".
>
>>> If the user asked the question of the former form, i.e.
>>>
>>>     $ git grep -e pattern HEAD:some/path
>>>
>>> there may be a reason why the user wanted to ask it in that
>>> (somewhat strange) way.  I am not 100% sure if it is a good idea to
>>> give an answer to a question different from what the user asked by
>>> internally rewriting the question to
>>>
>>>     $ git grep -e pattern HEAD -- some/path
>>
>> We are not rewriting the question at all.
>
> That statement is trapped in a narrow "implementor" mind. I know you
> are not rewriting the question in your implementation, but what do
> the end users see? It gives an answer to a question different from
> they asked; the observed behaviour is as if the question was
> rewritten before the machinery went to work.
>
> If your justification were "above says 'there may be a readon why
> the user wanted to ask it in that way', i.e. 'find in this tree
> object HEAD:some/path and report where hits appear', but the reason
> can only be from laziness and/or broken script and the user always
> wants the answer from within the top-level tree-ish", then that
> argument may make some sense. You need to justify why it is OK to
> lose information in the answer by rewriting the colon that separates
> the question ("in this tree object") and the answer ("at this path
> relative to the tree object given").
>
> Whether you rewrite the input or the output is not important; you
> are trying to give an answer to a different question, which is what
> I find questionable.

Ok, so if I can summarize what I am inferring from your objection:

 1. The (tree-path, found-path) pair is useful information to get back
from git-grep.

 2. A colon is used to delimit these pieces of information, just as a
colon is used to delimit the filename from the matched-line results.

 3. The fact that the colon is also the separator used in object refs
is mere coincidence; the colon was _not_ chosen because it
conveniently turns the results list into valid object references.  A
comma could have been instead, or even a \t.

Have I got that right?

If so, then I would like to point out to you the convenience I
accidentally encountered using this tool.  Perhaps you didn't realize
how helpful it was when you chose to use a colon there.

On the other hand, perhaps a colon is an unfortunate syntactic
collision which should be corrected in the future.

Phil

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC/PATCH] Fix path prefixing in grep_object
  2013-08-26 16:23             ` Junio C Hamano
  2013-08-26 16:49               ` Phil Hord
@ 2013-08-26 17:03               ` Junio C Hamano
  2013-08-26 17:19                 ` Phil Hord
  1 sibling, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2013-08-26 17:03 UTC (permalink / raw)
  To: Phil Hord; +Cc: Jonathan Nieder, Phil Hord, git@vger.kernel.org

Junio C Hamano <gitster@pobox.com> writes:

> If your justification were "above says 'there may be a readon why
> the user wanted to ask it in that way', i.e. 'find in this tree
> object HEAD:some/path and report where hits appear', but the reason
> can only be from laziness and/or broken script and the user always
> wants the answer from within the top-level tree-ish", then that
> argument may make some sense. You need to justify why it is OK to
> lose information in the answer by rewriting the colon that separates
> the question ("in this tree object") and the answer ("at this path
> relative to the tree object given").
>
> Whether you rewrite the input or the output is not important; you
> are trying to give an answer to a different question, which is what
> I find questionable.

For example, one of the cases the proposed change will break that I
am worried about is a script that wants to take N trees and a
pattern, and report where in the given trees hits appear, something
like:

git grep -c -e $pattern "$@" |
perl -e '
	my @trees = @ARGV;
	my %found = ();
	while (<>) {
	        my $line = $_;
	        for (@trees) {
			my $tree_prefix = $_ . ":";
			my $len = len($tree_prefix);
			if (substr($line, 0, $len) eq $tree_prefix) {
				my ($path_count) = substr($line, $len);
				my ($path, $count) = $path_count =~ /^(.*):(\d+)$/
				$found{$tree} = [$path, $count];
			}
		}
	}
	# Do stats on %found
' "$@"

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC/PATCH] Fix path prefixing in grep_object
  2013-08-26 16:49               ` Phil Hord
@ 2013-08-26 17:07                 ` Phil Hord
  2013-08-26 17:26                 ` Junio C Hamano
  1 sibling, 0 replies; 19+ messages in thread
From: Phil Hord @ 2013-08-26 17:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Nieder, Phil Hord, git@vger.kernel.org

On Mon, Aug 26, 2013 at 12:49 PM, Phil Hord <phil.hord@gmail.com> wrote:
> If so, then I would like to point out to you the convenience I
> accidentally encountered using this tool.  Perhaps you didn't realize
> how helpful it was when you chose to use a colon there.

My itch comes from a case where I am looking for references in some
other branch and I want to narrow my search.

  $ git grep object origin/master
  origin/master:.gitignore:/git-count-objects
  origin/master:.gitignore:/git-fsck-objects
  origin/master:.gitignore:/git-hash-object
  <8600 lines more deleted>

I find hits in the region that interests me and then I try to zoom in
there.  Conveniently, the path I want to search is right there on my
screen.  So I copy the part of the object reference I want to limit my
search to, and I try again:

   $ git grep object origin/master:builtin/

Now, I find the file that has the code I wanted.  But I want to do
something fancier to it than grep, so this time I copy-and-paste the
filename into my command line after typing 'git show':

   $ git show origin/master:builtin/pack-objects.c | vim -

But this doesn't work if the delimiter used was a colon.

   $ git show origin/master:builtin:pack-objects.c | vim -

I have to edit my copy-and-pasted text, which means I have think about
it a bit, and it interrupts all the other things I was thinking about
already.  Eventually I might learn to do this edit automatically,
except it is not needed most of the time.  It is only needed when I
provide a tree-path instead of a "branch <space> path". In the latter
case, the full path is spelled out for me correctly.  And in all other
cases the filenames provided by git-grep are valid filenames or object
names.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC/PATCH] Fix path prefixing in grep_object
  2013-08-26 17:03               ` Junio C Hamano
@ 2013-08-26 17:19                 ` Phil Hord
  0 siblings, 0 replies; 19+ messages in thread
From: Phil Hord @ 2013-08-26 17:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Nieder, Phil Hord, git@vger.kernel.org

On Mon, Aug 26, 2013 at 1:03 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
>> If your justification were "above says 'there may be a readon why
>> the user wanted to ask it in that way', i.e. 'find in this tree
>> object HEAD:some/path and report where hits appear', but the reason
>> can only be from laziness and/or broken script and the user always
>> wants the answer from within the top-level tree-ish", then that
>> argument may make some sense. You need to justify why it is OK to
>> lose information in the answer by rewriting the colon that separates
>> the question ("in this tree object") and the answer ("at this path
>> relative to the tree object given").
>>
>> Whether you rewrite the input or the output is not important; you
>> are trying to give an answer to a different question, which is what
>> I find questionable.
>
> For example, one of the cases the proposed change will break that I
> am worried about is a script that wants to take N trees and a
> pattern, and report where in the given trees hits appear, something
> like:
>
> git grep -c -e $pattern "$@" |
> perl -e '
>         my @trees = @ARGV;
>         my %found = ();
>         while (<>) {
>                 my $line = $_;
>                 for (@trees) {
>                         my $tree_prefix = $_ . ":";
>                         my $len = len($tree_prefix);
>                         if (substr($line, 0, $len) eq $tree_prefix) {
>                                 my ($path_count) = substr($line, $len);
>                                 my ($path, $count) = $path_count =~ /^(.*):(\d+)$/
>                                 $found{$tree} = [$path, $count];
>                         }
>                 }
>         }
>         # Do stats on %found
> ' "$@"
>

I do understand there is potential breakage when a script is parsing
the output.  I did not consider that this was a feature someone may
want; it really only looks like a breakage to me, for the reasons I've
already given.

It's surprising just how broken it looks to me (given that I now know
it is not) since all the other filenames output from 'git-grep -l' are
valid filenames or object references.  There is only this one
tree-path instance which does not.  I suppose I will learn to live
with it.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC/PATCH] Fix path prefixing in grep_object
  2013-08-26 16:49               ` Phil Hord
  2013-08-26 17:07                 ` Phil Hord
@ 2013-08-26 17:26                 ` Junio C Hamano
  2013-08-26 17:45                   ` Phil Hord
  2013-08-27  4:07                   ` Junio C Hamano
  1 sibling, 2 replies; 19+ messages in thread
From: Junio C Hamano @ 2013-08-26 17:26 UTC (permalink / raw)
  To: Phil Hord; +Cc: Jonathan Nieder, Phil Hord, git@vger.kernel.org

Phil Hord <phil.hord@gmail.com> writes:

>> If your justification were "above says 'there may be a readon why
>> the user wanted to ask it in that way', i.e. 'find in this tree
>> object HEAD:some/path and report where hits appear', but the reason
>> can only be from laziness and/or broken script and the user always
>> wants the answer from within the top-level tree-ish", then that
>> argument may make some sense. You need to justify why it is OK to
>> lose information in the answer by rewriting the colon that separates
>> the question ("in this tree object") and the answer ("at this path
>> relative to the tree object given").
>>
>> Whether you rewrite the input or the output is not important; you
>> are trying to give an answer to a different question, which is what
>> I find questionable.
>
> Ok, so if I can summarize what I am inferring from your objection:
>
>  1. The (tree-path, found-path) pair is useful information to get back
> from git-grep.

At least that was the intent. I can be persuaded that your change
will not break anybody if you successfully argue that it is not a
useful information, though.

>  2. A colon is used to delimit these pieces of information, just as a
> colon is used to delimit the filename from the matched-line results.
>
>  3. The fact that the colon is also the separator used in object refs
> is mere coincidence; the colon was _not_ chosen because it
> conveniently turns the results list into valid object references.  A
> comma could have been instead, or even a \t.

Not necessarily.  If the user is asking the question in a more
natural way (I want to see where in 'next' branch's tip commit hits
appear, by the way, I know I am only interested in builtin/ so I'd
give pathspec as well when I am asking this question), the output
does give <commit> <colon> <path>, so it is more than coincidence.

I do not think it is worth doing only for this particular use case,
but it might be a good change to allow A:B:C to be parsed as a
proper extended SHA-1 expression and make it yield

	git rev-parse $(git rev-parse $(git rev-parse A):B):C

Right now, "B:C" is used as a path inside tree-ish "A", but I think
we can safely fall back, when path B:C does not appear in tree-ish
A, to see if path B appears in it and is a tree, and then turn it
into a look-up of path C in that tree A:B.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC/PATCH] Fix path prefixing in grep_object
  2013-08-26 17:26                 ` Junio C Hamano
@ 2013-08-26 17:45                   ` Phil Hord
  2013-08-27  4:07                   ` Junio C Hamano
  1 sibling, 0 replies; 19+ messages in thread
From: Phil Hord @ 2013-08-26 17:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Nieder, Phil Hord, git@vger.kernel.org

On Mon, Aug 26, 2013 at 1:26 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Phil Hord <phil.hord@gmail.com> writes:
>
>>> If your justification were "above says 'there may be a readon why
>>> the user wanted to ask it in that way', i.e. 'find in this tree
>>> object HEAD:some/path and report where hits appear', but the reason
>>> can only be from laziness and/or broken script and the user always
>>> wants the answer from within the top-level tree-ish", then that
>>> argument may make some sense. You need to justify why it is OK to
>>> lose information in the answer by rewriting the colon that separates
>>> the question ("in this tree object") and the answer ("at this path
>>> relative to the tree object given").
>>>
>>> Whether you rewrite the input or the output is not important; you
>>> are trying to give an answer to a different question, which is what
>>> I find questionable.
>>
>> Ok, so if I can summarize what I am inferring from your objection:
>>
>>  1. The (tree-path, found-path) pair is useful information to get back
>> from git-grep.
>
> At least that was the intent. I can be persuaded that your change
> will not break anybody if you successfully argue that it is not a
> useful information, though.

Anyone who is interested in the matched trees from the original
command-line only needs to compare the matched paths' prefixes with
the original args to see which one is responsible for each hit.  But
this is not convincing to me because they may not know the original
args to the grep command.

I do not know if this a good reason to keep supporting this mode,
though.  I can just as easily see some script being confused by four
colons in

    origin/master:some/:path/file.c:1:int main()

instead of only three that he is used to because someone passed in a
longer tree-path than expected.

I don't know if I can make that argument.

I think I _can_ argue that the colon separator here is historically
just a part of the filename because it is not affected by "--null".

>>  2. A colon is used to delimit these pieces of information, just as a
>> colon is used to delimit the filename from the matched-line results.
>>
>>  3. The fact that the colon is also the separator used in object refs
>> is mere coincidence; the colon was _not_ chosen because it
>> conveniently turns the results list into valid object references.  A
>> comma could have been instead, or even a \t.
>
> Not necessarily.  If the user is asking the question in a more
> natural way (I want to see where in 'next' branch's tip commit hits
> appear, by the way, I know I am only interested in builtin/ so I'd
> give pathspec as well when I am asking this question), the output
> does give <commit> <colon> <path>, so it is more than coincidence.
>
> I do not think it is worth doing only for this particular use case,
> but it might be a good change to allow A:B:C to be parsed as a
> proper extended SHA-1 expression and make it yield
>
>         git rev-parse $(git rev-parse $(git rev-parse A):B):C
>
> Right now, "B:C" is used as a path inside tree-ish "A", but I think
> we can safely fall back, when path B:C does not appear in tree-ish
> A, to see if path B appears in it and is a tree, and then turn it
> into a look-up of path C in that tree A:B.

That would also solve this problem, usually.  But I don't like it
nearly as much as my patch, and I agree it seems extreme for this one
corner-case.

Phil

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC/PATCH] Fix path prefixing in grep_object
  2013-08-26 17:26                 ` Junio C Hamano
  2013-08-26 17:45                   ` Phil Hord
@ 2013-08-27  4:07                   ` Junio C Hamano
  2013-08-27 11:54                     ` Phil Hord
  1 sibling, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2013-08-27  4:07 UTC (permalink / raw)
  To: Phil Hord; +Cc: Jonathan Nieder, Phil Hord, git@vger.kernel.org

Junio C Hamano <gitster@pobox.com> writes:

> Not necessarily.  If the user is asking the question in a more
> natural way (I want to see where in 'next' branch's tip commit hits
> appear, by the way, I know I am only interested in builtin/ so I'd
> give pathspec as well when I am asking this question), the output
> does give <commit> <colon> <path>, so it is more than coincidence.

This part needs to be qualified.  "Natural" is of course in the eyes
of beholder.  If we assume that your #1 holds true (i.e. the tuple
<in which tree object are we reporting, what path we saw hits> is
important and merging them into one <in what blob we saw hits> lose
information), then it is natural to ask "inside origin/master tree,
where do I have hits?  By the way, I am only interested in builtin/"
and get "origin/master:builtin/pack-objects.c" as an answer (this is
from your earlier example), than asking "inside origin/master:builtin
tree, where do I have hits?"

If we do not consider #1 is false and the tree information can be
discarded, then it does not matter if we converted the colon after
origin/master to slash when we answer the latter question, and the
latter question stops being unnatural.

> ...
> but it might be a good change to allow A:B:C to be parsed as a
> proper extended SHA-1 expression and make it yield
>
> 	git rev-parse $(git rev-parse $(git rev-parse A):B):C
>
> Right now, "B:C" is used as a path inside tree-ish "A", but I think
> we can safely fall back, when path B:C does not appear in tree-ish
> A, to see if path B appears in it and is a tree, and then turn it
> into a look-up of path C in that tree A:B.

And if we want to keep the <tree, path> tuple, but still want to
make it easier to work with the output, allowing A:B:C to be parsed
as an extended SHA-1 expression would be a reasonable solution, not
a work-around. The answer is given to the question asked in either
way (either "in origin/master, but limited to these pathspecs" or
"in the tree origin/master:builtin/") without losing information,
but the issue you had is that the answer to the latter form of
question is not understood by the object name parser, which I
personally think is a bug.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC/PATCH] Fix path prefixing in grep_object
  2013-08-27  4:07                   ` Junio C Hamano
@ 2013-08-27 11:54                     ` Phil Hord
  0 siblings, 0 replies; 19+ messages in thread
From: Phil Hord @ 2013-08-27 11:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Nieder, Phil Hord, git@vger.kernel.org

On Tue, Aug 27, 2013 at 12:07 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
>> Not necessarily.  If the user is asking the question in a more
>> natural way (I want to see where in 'next' branch's tip commit hits
>> appear, by the way, I know I am only interested in builtin/ so I'd
>> give pathspec as well when I am asking this question), the output
>> does give <commit> <colon> <path>, so it is more than coincidence.
>
> This part needs to be qualified.  "Natural" is of course in the eyes
> of beholder.  If we assume that your #1 holds true (i.e. the tuple
> <in which tree object are we reporting, what path we saw hits> is
> important and merging them into one <in what blob we saw hits> lose
> information),

"My #1" is really "what I inferred from your statements."  I did not
think the tuple was important, but I agree that may be my
shortsightedness.  If the tuple is important, then it seems to me that
the --null behavior is a bug (the colon is left intact); but changing
it now seems senseless or harmful.

> then it is natural to ask "inside origin/master tree,
> where do I have hits?  By the way, I am only interested in builtin/"
> and get "origin/master:builtin/pack-objects.c" as an answer (this is
> from your earlier example), than asking "inside origin/master:builtin
> tree, where do I have hits?"
>
> If we do not consider #1 is false and the tree information can be
> discarded, then it does not matter if we converted the colon after
> origin/master to slash when we answer the latter question, and the
> latter question stops being unnatural.
>
>> ...
>> but it might be a good change to allow A:B:C to be parsed as a
>> proper extended SHA-1 expression and make it yield
>>
>>       git rev-parse $(git rev-parse $(git rev-parse A):B):C
>>
>> Right now, "B:C" is used as a path inside tree-ish "A", but I think
>> we can safely fall back, when path B:C does not appear in tree-ish
>> A, to see if path B appears in it and is a tree, and then turn it
>> into a look-up of path C in that tree A:B.
>
> And if we want to keep the <tree, path> tuple, but still want to
> make it easier to work with the output, allowing A:B:C to be parsed
> as an extended SHA-1 expression would be a reasonable solution, not
> a work-around. The answer is given to the question asked in either
> way (either "in origin/master, but limited to these pathspecs" or
> "in the tree origin/master:builtin/") without losing information,
> but the issue you had is that the answer to the latter form of
> question is not understood by the object name parser, which I
> personally think is a bug.

I am beginning to agree, though we discovered some other weird output
from grep which also does not parse. (origin/master:../relative/path).

It seems that fixing this bug could be very confusing for
get_sha1_with_context unless the context was rewritten to match the
traditional syntax.

P

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2013-08-27 11:55 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-08-25  1:35 [RFC/PATCH] Fix path prefixing in grep_object Phil Hord
2013-08-25  2:07 ` Phil Hord
2013-08-25  3:35   ` Junio C Hamano
2013-08-25  4:23     ` Jonathan Nieder
2013-08-25  5:25       ` Junio C Hamano
2013-08-26  7:14         ` Junio C Hamano
2013-08-26 11:44           ` Phil Hord
2013-08-26 16:23             ` Junio C Hamano
2013-08-26 16:49               ` Phil Hord
2013-08-26 17:07                 ` Phil Hord
2013-08-26 17:26                 ` Junio C Hamano
2013-08-26 17:45                   ` Phil Hord
2013-08-27  4:07                   ` Junio C Hamano
2013-08-27 11:54                     ` Phil Hord
2013-08-26 17:03               ` Junio C Hamano
2013-08-26 17:19                 ` Phil Hord
2013-08-25  4:41 ` Jeff King
2013-08-25  5:41   ` Jonathan Nieder
2013-08-25  5:54     ` Jeff King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).