* [PATCH] checkpatch: fix TYPO_SPELLING check for words with apostrophe
From: Dwaipayan Ray @ 2020-11-30 14:45 UTC
To: joe
Cc: linux-kernel-mentees, dwaipayanray1, linux-kernel, lukas.bulwahn, Peilin Ye

checkpatch reports a false TYPO_SPELLING warning for some words
containing an apostrophe.

A false positive is "doesn't". Occurrence of the word causes
checkpatch to emit the following warning:

"WARNING: 'doesn'' may be misspelled - perhaps 'doesn't'?"

Check the word boundary for such cases so that words like
"doesn't", "zig-zag", etc. aren't misinterpreted due to wrong
splitting of the word by the \b regex metacharacter.

Reported-by: Peilin Ye <yepeilin.cs@gmail.com>
Tested-by: Peilin Ye <yepeilin.cs@gmail.com>
Signed-off-by: Dwaipayan Ray <dwaipayanray1@gmail.com>
---
 scripts/checkpatch.pl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 3c86ea737e9c..be6d09929941 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -3106,7 +3106,7 @@ sub process {
 # Check for various typo / spelling mistakes
 		if (defined($misspellings) &&
 		    ($in_commit_log || $line =~ /^(?:\+|Subject:)/i)) {
-			while ($rawline =~ /(?:^|[^a-z@])($misspellings)(?:\b|$|[^a-z@])/gi) {
+			while ($rawline =~ /(?:^|[^a-z@])($misspellings)(?:\b(?![^\w]?[a-z@]+)|$|[^a-z@])/gi) {
 				my $typo = $1;
 				my $typo_fix = $spelling_fix{lc($typo)};
 				$typo_fix = ucfirst($typo_fix) if ($typo =~ /^[A-Z]/);
--
2.27.0
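A minimal sketch of the false positive and of the patch's fix. It uses Python's `re` module to stand in for Perl, on the assumption that the constructs involved (`\b`, character classes, negative lookahead, case-insensitive matching) behave the same way on ASCII text. The two-entry `SPELLING_FIX` dictionary and the `typos()` helper are hypothetical stand-ins for checkpatch's `%spelling_fix` hash and its `while` loop. The two patterns are copied from the removed and added lines of the diff.

```python
import re

# Hypothetical two-entry stand-in for checkpatch's %spelling_fix hash.
SPELLING_FIX = {"doesn'": "doesn't", "arne't": "aren't"}

# One alternation built from the dictionary keys (re.escape is a precaution
# added for this sketch only).
MISSPELLINGS = "|".join(re.escape(k) for k in sorted(SPELLING_FIX))

# Removed line of the patch: the bare \b alternative succeeds between the
# "'" and the "t" of "doesn't", so "doesn'" is reported.
ORIGINAL = re.compile(
    r"(?:^|[^a-z@])(" + MISSPELLINGS + r")(?:\b|$|[^a-z@])", re.IGNORECASE
)

# Added line of the patch: \b only counts when it is not followed by letters,
# either directly or after one non-word character such as "-".
PATCHED = re.compile(
    r"(?:^|[^a-z@])(" + MISSPELLINGS + r")(?:\b(?![^\w]?[a-z@]+)|$|[^a-z@])",
    re.IGNORECASE,
)


def typos(pattern, line):
    """Return (typo, fix) pairs, capitalizing the fix the way ucfirst() does."""
    found = []
    for m in pattern.finditer(line):
        typo = m.group(1)
        fix = SPELLING_FIX[typo.lower()]
        if typo[0].isupper():
            fix = fix[0].upper() + fix[1:]
        found.append((typo, fix))
    return found


line = 'A false positive is "doesn\'t". Occurrence of the word causes'
print(typos(ORIGINAL, line))  # [("doesn'", "doesn't")]
print(typos(PATCHED, line))  # []
print(typos(PATCHED, "misspelled doesn' here"))  # [("doesn'", "doesn't")]
```

The last call shows that a genuine standalone "doesn'" is still reported by the patched pattern, because the space after it satisfies the `[^a-z@]` alternative.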
* Re: [PATCH] checkpatch: fix TYPO_SPELLING check for words with apostrophe
From: Joe Perches @ 2020-11-30 16:43 UTC
To: Dwaipayan Ray
Cc: linux-kernel-mentees, linux-kernel, lukas.bulwahn, Peilin Ye

On Mon, 2020-11-30 at 20:15 +0530, Dwaipayan Ray wrote:
> checkpatch reports a false TYPO_SPELLING warning for some words
> containing an apostrophe.
>
> A false positive is "doesn't". Occurrence of the word causes
> checkpatch to emit the following warning:
>
> "WARNING: 'doesn'' may be misspelled - perhaps 'doesn't'?"
>
> Check the word boundary for such cases so that words like
> "doesn't", "zig-zag", etc. aren't misinterpreted due to wrong
> splitting of the word by the \b regex metacharacter.
[]
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
[]
> @@ -3106,7 +3106,7 @@ sub process {
>  # Check for various typo / spelling mistakes
>  		if (defined($misspellings) &&
>  		    ($in_commit_log || $line =~ /^(?:\+|Subject:)/i)) {
> -			while ($rawline =~ /(?:^|[^a-z@])($misspellings)(?:\b|$|[^a-z@])/gi) {
> +			while ($rawline =~ /(?:^|[^a-z@])($misspellings)(?:\b(?![^\w]?[a-z@]+)|$|[^a-z@])/gi) {

Wouldn't it be simpler to change the existing [^a-z@] blocks to [^a-z@'-] ?
* Re: [PATCH] checkpatch: fix TYPO_SPELLING check for words with apostrophe
From: Dwaipayan Ray @ 2020-11-30 17:03 UTC
To: Joe Perches
Cc: linux-kernel-mentees, linux-kernel, Lukas Bulwahn, Peilin Ye

On Mon, Nov 30, 2020 at 10:13 PM Joe Perches <joe@perches.com> wrote:
>
> On Mon, 2020-11-30 at 20:15 +0530, Dwaipayan Ray wrote:
[]
> > -			while ($rawline =~ /(?:^|[^a-z@])($misspellings)(?:\b|$|[^a-z@])/gi) {
> > +			while ($rawline =~ /(?:^|[^a-z@])($misspellings)(?:\b(?![^\w]?[a-z@]+)|$|[^a-z@])/gi) {
>
> Wouldn't it be simpler to change the existing [^a-z@] blocks to [^a-z@'-] ?
>
Hi,
I tried it and it doesn't seem to work, probably because the first group
already causes the word to be captured. In this case `doesn'` was already
captured because of the \b group.

Is the first group modification perhaps okay? Or would you suggest
something else?

Thank you,
Dwaipayan.
* Re: [PATCH] checkpatch: fix TYPO_SPELLING check for words with apostrophe
From: Joe Perches @ 2020-11-30 17:24 UTC
To: Dwaipayan Ray
Cc: linux-kernel-mentees, linux-kernel, Lukas Bulwahn, Peilin Ye

On Mon, 2020-11-30 at 22:33 +0530, Dwaipayan Ray wrote:
> On Mon, Nov 30, 2020 at 10:13 PM Joe Perches <joe@perches.com> wrote:
[]
> > Wouldn't it be simpler to change the existing [^a-z@] blocks to [^a-z@'-] ?
> >
> Hi,
> I tried it and it doesn't seem to work, probably because the first group
> already causes the word to be captured. In this case `doesn'` was already
> captured because of the \b group.
>
> Is the first group modification perhaps okay? Or would you suggest
> something else?
Seems to work for me:

$ git diff --stat -p scripts/checkpatch.pl
 scripts/checkpatch.pl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 7dc094445d83..a6d4d524ae66 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -3106,7 +3106,7 @@ sub process {
 # Check for various typo / spelling mistakes
 		if (defined($misspellings) &&
 		    ($in_commit_log || $line =~ /^(?:\+|Subject:)/i)) {
-			while ($rawline =~ /(?:^|[^a-z@])($misspellings)(?:\b|$|[^a-z@])/gi) {
+			while ($rawline =~ /(?:^|[^a-z@'-])($misspellings)(?:\b|$|[^a-z@'-])/gi) {
 				my $typo = $1;
 				my $typo_fix = $spelling_fix{lc($typo)};
 				$typo_fix = ucfirst($typo_fix) if ($typo =~ /^[A-Z]/);

$ cat t_spell.c
// SPDX-License-Identifier: GPL-2.0-only
void foo(void)
{
	//misspelled arne't word
}

$ ./scripts/checkpatch.pl -f --strict t_spell.c
CHECK: 'arne't' may be misspelled - perhaps 'aren't'?
#4: FILE: t_spell.c:4:
+	//misspelled arne't word

total: 0 errors, 0 warnings, 1 checks, 5 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

t_spell.c has style problems, please review.

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.
* Re: [PATCH] checkpatch: fix TYPO_SPELLING check for words with apostrophe
From: Dwaipayan Ray @ 2020-11-30 17:32 UTC
To: Joe Perches
Cc: linux-kernel-mentees, linux-kernel, Lukas Bulwahn, Peilin Ye

On Mon, Nov 30, 2020 at 10:54 PM Joe Perches <joe@perches.com> wrote:
>
> On Mon, 2020-11-30 at 22:33 +0530, Dwaipayan Ray wrote:
[]
> > I tried it and it doesn't seem to work, probably because the first group
> > already causes the word to be captured. In this case `doesn'` was already
> > captured because of the \b group.
>
> Seems to work for me:
>
[]
> -			while ($rawline =~ /(?:^|[^a-z@])($misspellings)(?:\b|$|[^a-z@])/gi) {
> +			while ($rawline =~ /(?:^|[^a-z@'-])($misspellings)(?:\b|$|[^a-z@'-])/gi) {
[]
> $ ./scripts/checkpatch.pl -f --strict t_spell.c
> CHECK: 'arne't' may be misspelled - perhaps 'aren't'?
> #4: FILE: t_spell.c:4:
> +	//misspelled arne't word
[]

Sorry, I think I explained wrong. For words like "doesn't", it still
has the same problem. With your suggested modification in place:

$ ./scripts/checkpatch.pl 0001-checkpatch-fix-TYPO_SPELLING-check-for-words-with-ap.patch --codespell
WARNING: 'doesn'' may be misspelled - perhaps 'doesn't'?
#9:
A false positive is "doesn't". Occurrence of the word causes

WARNING: 'doesn'' may be misspelled - perhaps 'doesn't'?
#15:
"doesn't", "zig-zag", etc. aren't misinterpreted due to wrong

Thank you,
Dwaipayan.
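The behaviour described in this reply can be reproduced with a small sketch, again using Python's `re` in place of Perl and a hypothetical two-entry dictionary. Adding `'` and `-` to both character classes does not help, because the trailing group tries `\b` first and `\b` already succeeds between the `'` and the `t` of "doesn't"; the `[^a-z@'-]` alternative is never consulted. Joe's `arne't` test passes simply because that word is followed by a space.

```python
import re

# Hypothetical subset of the spelling dictionary.
SPELLING_FIX = {"doesn'": "doesn't", "arne't": "aren't"}
MISSPELLINGS = "|".join(re.escape(k) for k in sorted(SPELLING_FIX))

# The suggested variant: "'" and "-" added to both character classes,
# while the trailing group still tries the bare \b alternative first.
CLASS_VARIANT = re.compile(
    r"(?:^|[^a-z@'-])(" + MISSPELLINGS + r")(?:\b|$|[^a-z@'-])", re.IGNORECASE
)


def found(line):
    """Return every dictionary entry the pattern reports in the line."""
    return [m.group(1) for m in CLASS_VARIANT.finditer(line)]


# Joe's test line: "arne't" is followed by a space, so it is reported.
print(found("\t//misspelled arne't word"))  # ["arne't"]
# Commit-log text: \b matches between "'" and "t", so "doesn'" is still
# reported inside "doesn't".
print(found('A false positive is "doesn\'t".'))  # ["doesn'"]
```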
* Re: [PATCH] checkpatch: fix TYPO_SPELLING check for words with apostrophe
From: Joe Perches @ 2020-11-30 18:01 UTC
To: Dwaipayan Ray
Cc: linux-kernel-mentees, linux-kernel, Lukas Bulwahn, Peilin Ye

On Mon, 2020-11-30 at 23:02 +0530, Dwaipayan Ray wrote:
> Sorry, I think I explained wrong. For words like "doesn't", it still
> has the same problem.

I think you explained it wrong when you didn't mention this is
_only_ a problem when using --codespell.

Likely it'd be better to use "(?:^|\s)($misspellings)(?=\s|$)"

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 7dc094445d83..b1783f02f745 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -3106,7 +3106,7 @@ sub process {
 # Check for various typo / spelling mistakes
 		if (defined($misspellings) &&
 		    ($in_commit_log || $line =~ /^(?:\+|Subject:)/i)) {
-			while ($rawline =~ /(?:^|[^a-z@])($misspellings)(?:\b|$|[^a-z@])/gi) {
+			while ($rawline =~ /(?:^|\s)($misspellings)(?=\s|$)/gi) {
 				my $typo = $1;
 				my $typo_fix = $spelling_fix{lc($typo)};
 				$typo_fix = ucfirst($typo_fix) if ($typo =~ /^[A-Z]/);

$ cat t_spell.c
// SPDX-License-Identifier: GPL-2.0-only
void foo(void)
{
	//misspelled doesn' doesn't arne't word
}

$ ./scripts/checkpatch.pl -f --strict t_spell.c --codespell --codespellfile /usr/lib/python3/dist-packages/codespell_lib/data/dictionary.txt
CHECK: 'doesn'' may be misspelled - perhaps 'doesn't'?
#4: FILE: t_spell.c:4:
+	//misspelled doesn' doesn't arne't word

CHECK: 'arne't' may be misspelled - perhaps 'aren't'?
#4: FILE: t_spell.c:4:
+	//misspelled doesn' doesn't arne't word

total: 0 errors, 0 warnings, 2 checks, 5 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

t_spell.c has style problems, please review.

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.
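A sketch of the whitespace-delimited pattern under the same assumptions (Python's `re` in place of Perl, hypothetical dictionary subset). Requiring whitespace or end of line after the word removes the "doesn't" false positive and reproduces the two CHECKs above. It also means that a word directly preceded or followed by punctuation is never reported, which is the gap raised in the next reply.

```python
import re

# Hypothetical subset of the spelling dictionary.
SPELLING_FIX = {"doesn'": "doesn't", "arne't": "aren't",
                "zeebra": "zebra", "ther": "there", "yourr": "your"}
MISSPELLINGS = "|".join(re.escape(k) for k in sorted(SPELLING_FIX))

# Whitespace (or start of line) before the word, and a non-consuming
# lookahead for whitespace (or end of line) after it.
WHITESPACE_ONLY = re.compile(
    r"(?:^|\s)(" + MISSPELLINGS + r")(?=\s|$)", re.IGNORECASE
)


def found(line):
    """Return every dictionary entry the pattern reports in the line."""
    return [m.group(1) for m in WHITESPACE_ONLY.finditer(line)]


print(found("+\t//misspelled doesn' doesn't arne't word"))  # ["doesn'", "arne't"]
print(found('"zeebra" ther, yourr.'))  # []
```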
* Re: [PATCH] checkpatch: fix TYPO_SPELLING check for words with apostrophe
From: Dwaipayan Ray @ 2020-11-30 18:26 UTC
To: Joe Perches
Cc: linux-kernel-mentees, linux-kernel, Lukas Bulwahn, Peilin Ye

On Mon, Nov 30, 2020 at 11:31 PM Joe Perches <joe@perches.com> wrote:
>
> On Mon, 2020-11-30 at 23:02 +0530, Dwaipayan Ray wrote:
> > Sorry, I think I explained wrong. For words like "doesn't", it still
> > has the same problem.
>
> I think you explained it wrong when you didn't mention this is
> _only_ a problem when using --codespell.
>
Sorry for that.

> Likely it'd be better to use "(?:^|\s)($misspellings)(?=\s|$)"
[]
> +			while ($rawline =~ /(?:^|\s)($misspellings)(?=\s|$)/gi) {
[]
> $ ./scripts/checkpatch.pl -f --strict t_spell.c --codespell --codespellfile /usr/lib/python3/dist-packages/codespell_lib/data/dictionary.txt
> CHECK: 'doesn'' may be misspelled - perhaps 'doesn't'?
> #4: FILE: t_spell.c:4:
> +	//misspelled doesn' doesn't arne't word
>

Thanks, this does resolve the original problem, but again the following
line throws 0 warnings:

"zeebra" ther, yourr.
Any punctuation separators are ignored :(

(?:^|\s)($misspellings)(?=[\s\.\,\:\;\"\?\!]|$)

Would this be acceptable instead? But again this doesn't
handle [therr] or (therr).

Thank you,
Dwaipayan.
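The punctuation-aware lookahead proposed above can be checked the same way (Python's `re` in place of Perl, hypothetical dictionary subset). In this sketch "ther," and "yourr." are now reported. The leading `(?:^|\s)` still rejects any word that starts right after punctuation, though, so `"zeebra"` appears to be missed for the same reason as `[therr]` and `(therr)`.

```python
import re

# Hypothetical subset of the spelling dictionary.
SPELLING_FIX = {"zeebra": "zebra", "ther": "there", "therr": "there",
                "yourr": "your"}
MISSPELLINGS = "|".join(re.escape(k) for k in sorted(SPELLING_FIX))

# Same leading (?:^|\s) as before; the lookahead now also accepts
# common punctuation after the word.
PUNCT_LOOKAHEAD = re.compile(
    r'(?:^|\s)(' + MISSPELLINGS + r')(?=[\s\.\,\:\;\"\?\!]|$)', re.IGNORECASE
)


def found(line):
    """Return every dictionary entry the pattern reports in the line."""
    return [m.group(1) for m in PUNCT_LOOKAHEAD.finditer(line)]


print(found('"zeebra" ther, yourr.'))  # ['ther', 'yourr']
print(found("[therr] (therr)"))  # []
```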
* Re: [PATCH] checkpatch: fix TYPO_SPELLING check for words with apostrophe
From: Joe Perches @ 2020-11-30 19:37 UTC
To: Dwaipayan Ray
Cc: linux-kernel-mentees, linux-kernel, Lukas Bulwahn, Peilin Ye

On Mon, 2020-11-30 at 23:56 +0530, Dwaipayan Ray wrote:
> Thanks, this does resolve the original problem, but again the following
> line throws 0 warnings:
>
> "zeebra" ther, yourr.
>
> Any punctuation separators are ignored :(
>
> (?:^|\s)($misspellings)(?=[\s\.\,\:\;\"\?\!]|$)
>
> Would this be acceptable instead? But again this doesn't
> handle [therr] or (therr).

No idea.

What does codespell use for its regex?
Maybe that should be used.

Maybe all the added lines should be collected and codespell
should be called on those lines instead.

Try other options and check if the overall cpu/wall clock use
is reduced.

Adding codespell's dictionary was kind of a 'nice to have' option
and it's not likely that it matters a lot if it's perfect or not.

My presumption is that it's not frequently used, but hey, who knows.
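One way to follow the suggestion of reusing codespell's approach is to split the line into words first and then look up each whole word, instead of searching for dictionary entries inside the line. codespell's `--regex` help text describes its default word as built from alphanumeric characters, the underscore, the hyphen and the apostrophe. The `WORD` pattern below is an assumed approximation of that, not codespell's actual code, and `find_typos()` and the dictionary are hypothetical.

```python
import re

# Hypothetical dictionary subset, as in the earlier sketches.
SPELLING_FIX = {"doesn'": "doesn't", "arne't": "aren't", "zeebra": "zebra",
                "ther": "there", "therr": "there", "yourr": "your"}

# Assumed approximation of codespell's default notion of a word:
# a run of word characters, apostrophes and hyphens.
WORD = re.compile(r"[\w'-]+")


def find_typos(line):
    """Tokenize the line, then look each whole token up in the dictionary."""
    hits = []
    for m in WORD.finditer(line):
        word = m.group(0)
        fix = SPELLING_FIX.get(word.lower())
        if fix is not None:
            # Mirror checkpatch's ucfirst() when the typo is capitalized.
            if word[0].isupper():
                fix = fix[0].upper() + fix[1:]
            hits.append((word, fix))
    return hits


print(find_typos("doesn' doesn't arne't zig-zag"))
# [("doesn'", "doesn't"), ("arne't", "aren't")]
print(find_typos('"zeebra" ther, yourr. [therr] (therr)'))
```

Because "doesn't" and "zig-zag" are single tokens, they are never split, and punctuation or brackets on either side no longer hide a typo. One trade-off: a word wrapped in single quotes, such as 'ther', is tokenized together with its quotes and so is not looked up as ther.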