All of lore.kernel.org
 help / color / mirror / Atom feed
From: Johannes Sixt <j.sixt@viscovery.net>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, Thomas Rast <tr@thomasrast.ch>
Subject: Re: [RFC/PATCH] diff: simplify cpp funcname regex
Date: Fri, 07 Mar 2014 08:23:05 +0100	[thread overview]
Message-ID: <531973D9.9070803@viscovery.net> (raw)
In-Reply-To: <20140306212835.GA11743@sigill.intra.peff.net>

Am 3/6/2014 22:28, schrieb Jeff King:
> On Wed, Mar 05, 2014 at 08:58:26AM +0100, Johannes Sixt wrote:
>> The pattern I chose also catches variable definition, not just
>> functions. That is what I need, but it hurts grep --function-context
>> That's the reason I didn't forward the patch, yet.
> 
> If by variable definition you mean:
> 
>    struct foo bar = {
>   -       old
>   +       new
>    };
> 
> I'd think that would be covered by the existing "struct|class|enum".
> Though I think we'd want to also allow keywords in front of it, like
> "static". I suspect the original was more meant to find:
> 
>    struct foo {
>   -old
>   +new
>    };

No, I meant lines like

    static double var;
   -static int old;
   +static int new;

The motivation is to show hints where in a file the change is located:
Anything that could be of significance for the author should be picked up.

But that does not necessarily help grep --function-context. For example,
when there is a longish block of global variable definitions and there is
a match in the middle, then --function-context would provide no context
because the line itself would be regarded as the beginning of a
"function", i.e., the context, and the next line (which also matches the
pattern) would be the beginning of the *next* function, and would not be
in the context anymore.

> 
>> The parts of the pattern have the following flaws:
>>
>> - The first part matches an identifier followed immediately by a colon and
>>   arbitrary text and is intended to reject goto labels and C++ access
>>   specifiers (public, private, protected). But this pattern also rejects
>>   C++ constructs, which look like this:
>>
>>     MyClass::MyClass()
>>     MyClass::~MyClass()
>>     MyClass::Item MyClass::Find(...
> 
> Makes sense. I noticed your fix is to look for end-of-line or comments
> afterwards.  Would it be simpler to just check for a non-colon, like:
> 
>   !^[ \t]*[A-Za-z_][A-Za-z_0-9]*:($|[^:])

I want to match [[:space:]] after the label's colon, because I have lot's
of C++ files with CRLF, and I need to match the CR. Your more liberal
pattern would fit as well, although it would pick up a bit field as in

   struct foo {
      unsigned
        flag: 1;
   -old
   +new

I would not mind ignoring this case ("garbage in, garbage out" ;-).

>> - The second part matches an identifier followed by a list of qualified
>>   names (i.e. identifiers separated by the C++ scope operator '::')
>> [...]
> 
> A tried to keep the "looks like a function definition" bit in mine, and
> yours loosens this quite a bit more. I think that may be OK. That is, I
> do not think there is any reason for somebody to do:
> 
>     void foo() {
>     call_to_bar();
>    -old
>    +new
>     }
> 
> That is, nobody would put a function _call_ without indentation. If
> something has alphanumerics at the left-most column, then it is probably
> interesting no matter what.
> 
>> - The third part of the pattern finally matches compound definitions. But
>>   it forgets about unions and namespaces, and also skips single-line
>>   definitions
>>
>>     struct random_iterator_tag {};
>>
>>   because no semicolon can occur on the line.
> 
> I don't see how that is an interesting line. The point is to find a
> block that is surrounding the changes, but that is not surrounding
> the lines below.

I more often than not want to have an answer to the question "where?", not
"wherein?" Then anything that helps locate a hunk is useful.

(The particular example, an empty struct, looks strange for C programmers,
of course, but it's a common idiom in C++ when it comes to template
meta-programming.)

>> Notice that all interesting anchor points begin with an identifier or
>> keyword. But since there is a large variety of syntactical constructs after
>> the first "word", the simplest is to require only this word and accept
>> everything else. Therefore, this boils down to a line that begins with a
>> letter or underscore (optionally preceded by the C++ scope operator '::'
>> to accept functions returning a type anchored at the global namespace).
>> Replace the second and third part by a single pattern that picks such a
>> line.
> 
> Yeah, this bit makes sense to me.
> 
> Both yours and mine will find the first line here in things like:
> 
>    void foo(void);
>   -void bar(void);
>   +void bar(int arg);
> 
> but I think that is OK. There _isn't_ any interesting surrounding
> context here. The current code will sometimes come up with an empty
> funcline (which is good), but it may just as easily come up with a
> totally bogus funcline in a case like:
> 
>    void unrelated(void)
>    {
>    }
> 
>    void foo(void);
>   -void bar(void);
>   +void bar(int arg);
> 
> So trying to be very restrictive and say "that doesn't look like a
> function" does not really buy us anything (and it creates tons of false
> negatives, as you documented, because C++ syntax has all kinds of crazy
> stuff).
> 
> _If_ the backwards search learned to terminate (e.g., seeing the "^}"
> line and saying "well, we can't be inside a function"), then such
> negative lines might be useful for coming up with an empty funcname
> rather than the bogus "void foo(void);". But we do not do that
> currently, and I do not think it is that useful (the funcname above is
> arguably just as or more useful than an empty one).

As I said, my motivation is not so much to find a "container", but rather
a clue to help locate a change while reading the patch text. I can speak
for myself, but I have no idea what is more important for the majority.

-- Hannes

  reply	other threads:[~2014-03-07  7:23 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-03-05  0:36 [RFC/PATCH] diff: simplify cpp funcname regex Jeff King
2014-03-05  7:58 ` Johannes Sixt
2014-03-06 21:28   ` Jeff King
2014-03-07  7:23     ` Johannes Sixt [this message]
2014-03-14  3:54       ` Jeff King
2014-03-14  6:56         ` Johannes Sixt
2014-03-18  5:24           ` Jeff King
2014-03-18  8:02             ` Johannes Sixt
2014-03-18 11:00               ` René Scharfe
2014-03-21 21:07                 ` [PATCH 00/10] userdiff: cpp pattern simplification and test framework Johannes Sixt
2014-03-21 21:07                   ` [PATCH 01/10] userdiff: support C++ ->* and .* operators in the word regexp Johannes Sixt
2014-03-21 21:07                   ` [PATCH 02/10] userdiff: support unsigned and long long suffixes of integer constants Johannes Sixt
2014-03-21 21:07                   ` [PATCH 03/10] t4018: an infrastructure to test hunk headers Johannes Sixt
2014-03-21 22:00                     ` Junio C Hamano
2014-03-22  6:56                       ` Johannes Sixt
2014-03-23 19:36                         ` Junio C Hamano
2014-03-24 21:36                     ` Jeff King
2014-03-24 21:39                       ` Jeff King
2014-03-25 20:07                         ` Johannes Sixt
2014-03-25 21:42                           ` Jeff King
2014-03-21 21:07                   ` [PATCH 04/10] t4018: convert perl pattern tests to the new infrastructure Johannes Sixt
2014-03-21 21:07                   ` [PATCH 05/10] t4018: convert java pattern test " Johannes Sixt
2014-03-21 21:07                   ` [PATCH 06/10] t4018: convert custom " Johannes Sixt
2014-03-21 21:07                   ` [PATCH 07/10] t4018: reduce test files for pattern compilation tests Johannes Sixt
2014-03-21 21:07                   ` [PATCH 08/10] t4018: test cases for the built-in cpp pattern Johannes Sixt
2014-03-21 21:07                   ` [PATCH 09/10] t4018: test cases showing that the cpp pattern misses many anchor points Johannes Sixt
2014-03-21 21:07                   ` [PATCH 10/10] userdiff: have 'cpp' hunk header pattern catch more C++ " Johannes Sixt
2014-03-21 22:25                   ` [PATCH 00/10] userdiff: cpp pattern simplification and test framework Junio C Hamano
2014-03-24 21:49                   ` Jeff King
2014-03-05 20:31 ` [RFC/PATCH] diff: simplify cpp funcname regex Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=531973D9.9070803@viscovery.net \
    --to=j.sixt@viscovery.net \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    --cc=tr@thomasrast.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.