public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Brian Norris <computersforpeace@gmail.com>
To: Joe Perches <joe@perches.com>
Cc: Andy Whitcroft <apw@canonical.com>, linux-kernel@vger.kernel.org
Subject: Re: [BUG] checkpatch: false positive for commits with quote characters
Date: Thu, 3 Dec 2015 17:39:48 -0800	[thread overview]
Message-ID: <20151204013948.GB116589@google.com> (raw)
In-Reply-To: <1449189388.17296.20.camel@perches.com>

On Thu, Dec 03, 2015 at 04:36:28PM -0800, Joe Perches wrote:
> On Thu, 2015-12-03 at 16:29 -0800, Joe Perches wrote:
> > On Thu, 2015-12-03 at 16:13 -0800, Brian Norris wrote:
> > > Ping? I've hit some different false positives today on the same rule.
> > > I'll stop bothering to report them if no one cares.
> > 
> > Perhaps this:

Aside from the non-breaking spaces your copy of Evolution inserted, it's
an improvement. (It doesn't give a false positive for the patch I
reported.) But it's still got some holes.

> (minus the debugging this time...)
> ---
>  scripts/checkpatch.pl | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index d4960f7..b23dff8 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -2397,20 +2397,20 @@ sub process {
>  			$long = 1 if ($line =~ /\bcommit\s+[0-9a-f]{41,}/i);
>  			$space = 0 if ($line =~ /\bcommit [0-9a-f]/i);
>  			$case = 0 if ($line =~ /\b[Cc]ommit\s+[0-9a-f]{5,40}[^A-F]/);
> -			if ($line =~ /\bcommit\s+[0-9a-f]{5,}\s+\("([^"]+)"\)/i) {
> +			if ($line =~ /\bcommit\s+[0-9a-f]{5,}\s+\("(.*)"\)/i) {

You're got the reverse problem now. This will match too broadly. For
instance, if there is another instance of terminating quote +
parentheses on the same line, then we'll get a false postive. e.g., if I
wrote a patch that included something like this:

  Commit 6dc0dcde406b ("parisc: Use platform_device_register_simple("rtc-generic")") is cool (as in "cool")

Or even if it's wrapped like this:

  Commit 6dc0dcde406b ("parisc: Use
  platform_device_register_simple("rtc-generic")") is cool (as in "cool")

Then the regexes will think the description was:

  parisc: Use platform_device_register_simple("rtc-generic")") is cool (as in "cool

A hacky workaround for that one: only check for (loosely, not proper
regex syntax):

	parsed_description =~ description . ")";

rather than:

	description == parsed_description

Not sure how far you want to go on chasing false positives...

>  				$orig_desc = $1;
>  				$hasparens = 1;
>  			} elsif ($line =~ /\bcommit\s+[0-9a-f]{5,}\s*$/i &&
>  				 defined $rawlines[$linenr] &&
> -				 $rawlines[$linenr] =~ /^\s*\("([^"]+)"\)/) {
> +				 $rawlines[$linenr] =~ /^\s*\("(.*)"\)/) {
>  				$orig_desc = $1;
>  				$hasparens = 1;
>  			} elsif ($line =~ /\bcommit\s+[0-9a-f]{5,}\s+\("[^"]+$/i &&
>  				 defined $rawlines[$linenr] &&
> -				 $rawlines[$linenr] =~ /^\s*[^"]+"\)/) {
> +				 $rawlines[$linenr] =~ /^\s*.*"\)/) {
>  				$line =~ /\bcommit\s+[0-9a-f]{5,}\s+\("([^"]+)$/i;
>  				$orig_desc = $1;
> -				$rawlines[$linenr] =~ /^\s*([^"]+)"\)/;
> +				$rawlines[$linenr] =~ /^\s*(.*)"\)/;
>  				$orig_desc .= " " . $1;
>  				$hasparens = 1;
>  			}

BTW, another false positive: just including this text in a commit
triggers a different one:

http://lkml.kernel.org/g/20151113220039.GA74382@google.com

A simple hack (appended, in addition to yours) would be to assume that
when people are trying to include badly-formatted commit hashes, they
will be preceding them with whitespace, the beginning of a line, or
encapsulating punctuation. Or use a URL parser.

Brian

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index d2993f19df3f..e7110ba3242b 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -2349,7 +2349,7 @@ sub process {
 # Check for git id commit length and improperly formed commit descriptions
 		if ($in_commit_log && !$commit_log_possible_stack_dump &&
 		    ($line =~ /\bcommit\s+[0-9a-f]{5,}\b/i ||
-		     ($line =~ /\b[0-9a-f]{12,40}\b/i &&
+		     ($line =~ /\b[\s^\("][0-9a-f]{12,40}\b/i &&
 		      $line !~ /[\<\[][0-9a-f]{12,40}[\>\]]/i &&
 		      $line !~ /\bfixes:\s*[0-9a-f]{12,40}/i))) {
 			my $init_char = "c";

      reply	other threads:[~2015-12-04  1:39 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-16 22:43 [BUG] checkpatch: false positive for commits with quote characters Brian Norris
2015-11-17 17:48 ` Joe Perches
2015-11-17 18:03   ` Brian Norris
2015-12-04  0:13     ` Brian Norris
2015-12-04  0:29       ` Joe Perches
2015-12-04  0:36         ` Joe Perches
2015-12-04  1:39           ` Brian Norris [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20151204013948.GB116589@google.com \
    --to=computersforpeace@gmail.com \
    --cc=apw@canonical.com \
    --cc=joe@perches.com \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox