git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Enhancing --show-function and --function-context in default configurations
@ 2021-08-01 21:40 heapcrash heapcrash
  2021-08-02  2:21 ` Junio C Hamano
  2021-08-02  8:45 ` Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 7+ messages in thread
From: heapcrash heapcrash @ 2021-08-01 21:40 UTC (permalink / raw)
  To: git

Hello all!

Thanks to all of the Git community for making such amazing tools.

Some of my favorite features of Git are the --show-function and
--function-context features of git {grep,log,diff}.

However, the default configuration leaves a bit to be desired --
setting some simple flags in ~/.gitattributes for e.g.

    *.cpp diff=cpp
    *.py diff=python

Makes these features MUCH more accurate and usable.  However, one has
to know about gitattributes, diff filters, xfuncname, etc. in order to
turn these settings on.

I'd like to contribute changes to Git that makes the "obvious"
correlations be the default setting.  Before I start, I wanted to
gauge whether these changes would be accepted or not.  As far as I can
tell, these would not change the default behavior of plain git
{grep,log,diff} unless the --show-function or --function-context flags
are specified -- and if they are, the results would be improved.

Should I work on a patch that does this?

Thanks in advance,
Heapcrash

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Enhancing --show-function and --function-context in default configurations
  2021-08-01 21:40 Enhancing --show-function and --function-context in default configurations heapcrash heapcrash
@ 2021-08-02  2:21 ` Junio C Hamano
  2021-08-02  8:45 ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2021-08-02  2:21 UTC (permalink / raw)
  To: heapcrash heapcrash; +Cc: git

heapcrash heapcrash <heapcrash@gmail.com> writes:

> gauge whether these changes would be accepted or not.  As far as I can
> tell, these would not change the default behavior of plain git
> {grep,log,diff} unless the --show-function or --function-context flags
> are specified

Two other places may be diff hunk headers and --diff-words output, I think.

> Should I work on a patch that does this?

That depends.  If you are going to introduce a mechanism to
introduce hardcoded configurations, depending on how it is done, it
will become a huge security headache.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Enhancing --show-function and --function-context in default configurations
  2021-08-01 21:40 Enhancing --show-function and --function-context in default configurations heapcrash heapcrash
  2021-08-02  2:21 ` Junio C Hamano
@ 2021-08-02  8:45 ` Ævar Arnfjörð Bjarmason
  2021-08-02 16:17   ` Jeff King
  1 sibling, 1 reply; 7+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-08-02  8:45 UTC (permalink / raw)
  To: heapcrash heapcrash; +Cc: git


On Sun, Aug 01 2021, heapcrash heapcrash wrote:

> Hello all!
>
> Thanks to all of the Git community for making such amazing tools.
>
> Some of my favorite features of Git are the --show-function and
> --function-context features of git {grep,log,diff}.
>
> However, the default configuration leaves a bit to be desired --
> setting some simple flags in ~/.gitattributes for e.g.
>
>     *.cpp diff=cpp
>     *.py diff=python
>
> Makes these features MUCH more accurate and usable.  However, one has
> to know about gitattributes, diff filters, xfuncname, etc. in order to
> turn these settings on.
>
> I'd like to contribute changes to Git that makes the "obvious"
> correlations be the default setting.  Before I start, I wanted to
> gauge whether these changes would be accepted or not.  As far as I can
> tell, these would not change the default behavior of plain git
> {grep,log,diff} unless the --show-function or --function-context flags
> are specified -- and if they are, the results would be improved.
>
> Should I work on a patch that does this?

As Junio hinted at in his reply this doesn't just change diff behavior
with the -W or --function-context flag, but changes the work we do (even
if the output is the same) on all unified diff output.

I.e. to generate the "@@" context line we by default use a C function in
xdiff/, this is replaced with a regex-in-a-loop invoking the
userdiff.[ch] code.

I would like to see us have a setting to turn these on by default, but
think it would be better to make that a diff.* config setting to put
into ~/.gitconfig, i.e. we'd extend git itself to know about a list of
extensions for the given userdiff drivers, and use them when rendering
diffs.

It makes emitting the diffs take more CPU, but the same is true of other
options like colorMoved etc, so that in itself is not a dealbreaker.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Enhancing --show-function and --function-context in default configurations
  2021-08-02  8:45 ` Ævar Arnfjörð Bjarmason
@ 2021-08-02 16:17   ` Jeff King
  2021-08-02 17:06     ` Junio C Hamano
  0 siblings, 1 reply; 7+ messages in thread
From: Jeff King @ 2021-08-02 16:17 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: heapcrash heapcrash, git

On Mon, Aug 02, 2021 at 10:45:25AM +0200, Ævar Arnfjörð Bjarmason wrote:

> I would like to see us have a setting to turn these on by default, but
> think it would be better to make that a diff.* config setting to put
> into ~/.gitconfig, i.e. we'd extend git itself to know about a list of
> extensions for the given userdiff drivers, and use them when rendering
> diffs.

A long time ago we discussed doing this. The relevant thread is:

  https://lore.kernel.org/git/20111216110000.GA15676@sigill.intra.peff.net/

which references a few others:

  https://lore.kernel.org/git/4E569F10.8060808@panasas.com/

  https://lore.kernel.org/git/4E6E928A.6080003@sunshineco.com/

From my re-read of the threads, it seemed like people were mostly on
board with the idea, but:

  - there was some question about a few of the more obscure extensions
    (e.g., '.m' as objective-c versus matlab). Obviously we could drop
    any contentious ones, though I do wonder if this is simply opening
    up a can of worms where people will fight about what ought to go
    into the "official" list.

  - there was some question over whether some of our builtin funcname
    regexps were actually worse than the default behavior (especially
    the "cpp" one). In response we started using .gitattributes more
    heavily within git.git itself, and we've seen quite a few
    improvements in the intervening decade.

    So hopefully there is no reason anybody would _not_ want the
    content-specific driver to kick in, assuming that the extension
    mapping is accurate. There should be no security risk or anything
    like that, since we'd already respect such a mapping from an
    untrusted repository via its .gitattributes file.

It looks like Junio picked up the patch at one point, but it got bumped
to the next release cycle. It was marked as "expecting a re-roll" until
finally getting discarded in Oct 2011; you can see the progression
by searching for "jk/default-attr":

  https://lore.kernel.org/git/?q=jk%2Fdefault-attr

Then in Dec 2011 I posted that re-roll (the top link I gave above), and
it looks like it never got picked up.  I don't see any conclusions
either way in the thread, so I think it just kind of died out due to
lack of anybody pushing it forward (and I don't remember making a
decision either way on that; it may have been the "can of worms" thing
above scaring me off).

-Peff

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Enhancing --show-function and --function-context in default configurations
  2021-08-02 16:17   ` Jeff King
@ 2021-08-02 17:06     ` Junio C Hamano
  2021-08-03  2:34       ` heapcrash heapcrash
  0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2021-08-02 17:06 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, heapcrash heapcrash, git

Jeff King <peff@peff.net> writes:

> On Mon, Aug 02, 2021 at 10:45:25AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> I would like to see us have a setting to turn these on by default, but
>> think it would be better to make that a diff.* config setting to put
>> into ~/.gitconfig, i.e. we'd extend git itself to know about a list of
>> extensions for the given userdiff drivers, and use them when rendering
>> diffs.
>
> A long time ago we discussed doing this. The relevant thread is:
>
>   https://lore.kernel.org/git/20111216110000.GA15676@sigill.intra.peff.net/
>
> which references a few others:
>
>   https://lore.kernel.org/git/4E569F10.8060808@panasas.com/
>
>   https://lore.kernel.org/git/4E6E928A.6080003@sunshineco.com/
> ...

Thanks for pointers.

One good suggestion given there was to use diff=c and diff=perl in
our own .gitattributes to use the patterns ourselves, which we seem
to have been doing just fine ;-)

As long as the default built-in ones are

 (1) at least 90% of the time improvement over, or at least is not
     broken compared to, the unconfigured case, and

 (2) at the lowest priority that users can easily countermand for
     the rest 10% cases

I do not think it is too bad to resurrect the old patches from these
threads.

Thanks.



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Enhancing --show-function and --function-context in default configurations
  2021-08-02 17:06     ` Junio C Hamano
@ 2021-08-03  2:34       ` heapcrash heapcrash
  2021-08-03 14:55         ` Jeff King
  0 siblings, 1 reply; 7+ messages in thread
From: heapcrash heapcrash @ 2021-08-03  2:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Ævar Arnfjörð Bjarmason, git

Thanks all for the feedback!  I'll try to address it below:

> Two other places may be diff hunk headers and --diff-words output, I think.

I didn't think of those.  Does this mean that diffs generated with a
given e.g. diff=cpp configuration might not apply cleanly if run on
some other user's system without that setting?

> That depends.  If you are going to introduce a mechanism to
> introduce hardcoded configurations, depending on how it is done, it
> will become a huge security headache.

I don't intend to add hard-coded configurations, just hard-coded
defaults.  User configurations (~/.gitattributes and gitconfig
diff.funcname/diff.xcfuncname) will still take precedence.

> It makes emitting the diffs take more CPU, but the same is true of other
> options like colorMoved etc, so that in itself is not a dealbreaker.

I didn't think of this scenario, that it would add CPU time even
without -W/--function-context/--show-function.  I'd definitely be fine
with preserving the current behavior in these cases (more below).

> As long as the default built-in ones are
>
> (1) at least 90% of the time improvement over, or at least is not
>     broken compared to, the unconfigured case, and
>
> (2) at the lowest priority that users can easily countermand for
>     the rest 10% cases
>
> I do not think it is too bad to resurrect the old patches from these

The currently-hard-coded regexps for e.g. cpp, python, objc, etc. (in
userdiff.c) are all VAST improvements over the default, which is
effectively '^[A-Za-z0-9_:]+' (i.e., any word which starts on the
zeroth column, IIRC).  This has lots of issues for e.g. Python where
the function name is often indented (e.g. inside a class definition)
and cpp is compatible with, and an improvement over, the default "C"
setting (which is particularly broken as it marks any "goto" label as
a function start / end).

All of those points covered, I think we can make this work in a way
that preserves backward compatibility (no defaults at all, user must
manually configure ~/.gitattributes) by only setting "smart" defaults
(e.g. "*.cpp diff=cpp", "*.cc diff=cpp" etc) when the command is
EXPLICITLY invoked with -W/--function-context/--show-function.

Another case I didn't think of in the original post is with the "git
log -L:funcname:path/to/file.cpp" option, which tracks changes within
a function over time.  Having "better" default function boundary
detection here would also be useful.

Ultimately, I agree we should not have any extra overhead for any
commands (log, diff, grep, etc) unless the user EXPLICITLY uses flags
that indicate function boundary detection are key to functionality of
those flags.

On Mon, Aug 2, 2021 at 12:06 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Jeff King <peff@peff.net> writes:
>
> > On Mon, Aug 02, 2021 at 10:45:25AM +0200, Ævar Arnfjörð Bjarmason wrote:
> >
> >> I would like to see us have a setting to turn these on by default, but
> >> think it would be better to make that a diff.* config setting to put
> >> into ~/.gitconfig, i.e. we'd extend git itself to know about a list of
> >> extensions for the given userdiff drivers, and use them when rendering
> >> diffs.
> >
> > A long time ago we discussed doing this. The relevant thread is:
> >
> >   https://lore.kernel.org/git/20111216110000.GA15676@sigill.intra.peff.net/
> >
> > which references a few others:
> >
> >   https://lore.kernel.org/git/4E569F10.8060808@panasas.com/
> >
> >   https://lore.kernel.org/git/4E6E928A.6080003@sunshineco.com/
> > ...
>
> Thanks for pointers.
>
> One good suggestion given there was to use diff=c and diff=perl in
> our own .gitattributes to use the patterns ourselves, which we seem
> to have been doing just fine ;-)
>
> As long as the default built-in ones are
>
>  (1) at least 90% of the time improvement over, or at least is not
>      broken compared to, the unconfigured case, and
>
>  (2) at the lowest priority that users can easily countermand for
>      the rest 10% cases
>
> I do not think it is too bad to resurrect the old patches from these
> threads.
>
> Thanks.
>
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Enhancing --show-function and --function-context in default configurations
  2021-08-03  2:34       ` heapcrash heapcrash
@ 2021-08-03 14:55         ` Jeff King
  0 siblings, 0 replies; 7+ messages in thread
From: Jeff King @ 2021-08-03 14:55 UTC (permalink / raw)
  To: heapcrash heapcrash
  Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason, git

On Mon, Aug 02, 2021 at 09:34:32PM -0500, heapcrash heapcrash wrote:

> Thanks all for the feedback!  I'll try to address it below:
> 
> > Two other places may be diff hunk headers and --diff-words output, I think.
> 
> I didn't think of those.  Does this mean that diffs generated with a
> given e.g. diff=cpp configuration might not apply cleanly if run on
> some other user's system without that setting?

No, because the text in the hunk headers is purely cosmetic. They are
just for humans, and applying the patch ignores them.

The word-diffs are likewise just meant for human consumption, and can't
be applied at all.

> > It makes emitting the diffs take more CPU, but the same is true of other
> > options like colorMoved etc, so that in itself is not a dealbreaker.
> 
> I didn't think of this scenario, that it would add CPU time even
> without -W/--function-context/--show-function.  I'd definitely be fine
> with preserving the current behavior in these cases (more below).

I think the CPU increase is negligible. Either way, the funcname header
involves walking backwards through the file applying a pattern to see if
a line is a good candidate. The type-specific patterns are a little more
complicated, but I doubt the difference is even measurable in practice.

If we are going to have defaults for extension-to-diff mapping, I would
prefer apply them consistently (so for -W, but also for funcnames,
word-diffs, etc). That is both easier to implement and easier to explain
to users (and assuming our mapping is accurate, more likely to be what
they want).

> Another case I didn't think of in the original post is with the "git
> log -L:funcname:path/to/file.cpp" option, which tracks changes within
> a function over time.  Having "better" default function boundary
> detection here would also be useful.

Yes, I think this uses the same diff funcname patterns, so once we have
default mappings, it would start to use them.

-Peff

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-08-03 14:55 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-08-01 21:40 Enhancing --show-function and --function-context in default configurations heapcrash heapcrash
2021-08-02  2:21 ` Junio C Hamano
2021-08-02  8:45 ` Ævar Arnfjörð Bjarmason
2021-08-02 16:17   ` Jeff King
2021-08-02 17:06     ` Junio C Hamano
2021-08-03  2:34       ` heapcrash heapcrash
2021-08-03 14:55         ` Jeff King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).