* [PATCH] checkpatch: fix TYPO_SPELLING check for words with apostrophe
From: Dwaipayan Ray @ 2020-11-30 14:45 UTC
To: joe
Cc: linux-kernel-mentees, dwaipayanray1, linux-kernel, lukas.bulwahn,
Peilin Ye
checkpatch reports a false TYPO_SPELLING warning for some words
containing an apostrophe.
A false positive is "doesn't". Occurrence of the word causes
checkpatch to emit the following warning:
"WARNING: 'doesn'' may be misspelled - perhaps 'doesn't'?"
Check the word boundary for such cases so that words like
"doesn't", "zig-zag", etc. aren't misinterpreted due to wrong
splitting of the word by the \b regex metacharacter.
Reported-by: Peilin Ye <yepeilin.cs@gmail.com>
Tested-by: Peilin Ye <yepeilin.cs@gmail.com>
Signed-off-by: Dwaipayan Ray <dwaipayanray1@gmail.com>
---
scripts/checkpatch.pl | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 3c86ea737e9c..be6d09929941 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -3106,7 +3106,7 @@ sub process {
# Check for various typo / spelling mistakes
if (defined($misspellings) &&
($in_commit_log || $line =~ /^(?:\+|Subject:)/i)) {
- while ($rawline =~ /(?:^|[^a-z@])($misspellings)(?:\b|$|[^a-z@])/gi) {
+ while ($rawline =~ /(?:^|[^a-z@])($misspellings)(?:\b(?![^\w]?[a-z@]+)|$|[^a-z@])/gi) {
my $typo = $1;
my $typo_fix = $spelling_fix{lc($typo)};
$typo_fix = ucfirst($typo_fix) if ($typo =~ /^[A-Z]/);
--
2.27.0
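The effect of the one-line change can be reproduced outside checkpatch. The sketch below is a Python stand-in, not checkpatch code: it assumes a one-entry dictionary containing `doesn'` (which the quoted warning implies the loaded dictionary has) and relies on `\b`, lookahead, and character classes behaving the same in Python's `re` as in Perl for this input. The names `OLD`, `NEW`, and `typos` are made up for illustration.

```python
import re

# Hypothetical one-entry dictionary; in checkpatch the $misspellings
# alternation is built from the spelling files it loads.
MISSPELLINGS = re.escape("doesn'")

# The regex before and after the patch; only the trailing group differs.
OLD = re.compile(
    r"(?:^|[^a-z@])(" + MISSPELLINGS + r")(?:\b|$|[^a-z@])", re.I)
NEW = re.compile(
    r"(?:^|[^a-z@])(" + MISSPELLINGS + r")(?:\b(?![^\w]?[a-z@]+)|$|[^a-z@])",
    re.I)


def typos(pattern, line):
    """Return every typo candidate the pattern captures in line."""
    return [m.group(1) for m in pattern.finditer(line)]


line = 'A false positive is "doesn\'t".'
print(typos(OLD, line))  # ["doesn'"]  (\b sits between the apostrophe and "t")
print(typos(NEW, line))  # []          (the lookahead sees "t" and rejects \b)
```

A standalone `doesn'` followed by a space is still reported by the patched pattern, because the `[^a-z@]` alternative matches the space.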
* Re: [PATCH] checkpatch: fix TYPO_SPELLING check for words with apostrophe
From: Joe Perches @ 2020-11-30 16:43 UTC
To: Dwaipayan Ray
Cc: linux-kernel-mentees, linux-kernel, lukas.bulwahn, Peilin Ye
On Mon, 2020-11-30 at 20:15 +0530, Dwaipayan Ray wrote:
> checkpatch reports a false TYPO_SPELLING warning for some words
> containing an apostrophe.
>
> A false positive is "doesn't". Occurrence of the word causes
> checkpatch to emit the following warning:
>
> "WARNING: 'doesn'' may be misspelled - perhaps 'doesn't'?"
>
> Check the word boundary for such cases so that words like
> "doesn't", "zig-zag", etc. aren't misinterpreted due to wrong
> splitting of the word by the \b regex metacharacter.
[]
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
[]
> @@ -3106,7 +3106,7 @@ sub process {
> # Check for various typo / spelling mistakes
> if (defined($misspellings) &&
> ($in_commit_log || $line =~ /^(?:\+|Subject:)/i)) {
> - while ($rawline =~ /(?:^|[^a-z@])($misspellings)(?:\b|$|[^a-z@])/gi) {
> + while ($rawline =~ /(?:^|[^a-z@])($misspellings)(?:\b(?![^\w]?[a-z@]+)|$|[^a-z@])/gi) {
Wouldn't it be simpler to change the existing [^a-z@] blocks to [^a-z@'-] ?
* Re: [PATCH] checkpatch: fix TYPO_SPELLING check for words with apostrophe
From: Dwaipayan Ray @ 2020-11-30 17:03 UTC
To: Joe Perches; +Cc: linux-kernel-mentees, linux-kernel, Lukas Bulwahn, Peilin Ye
On Mon, Nov 30, 2020 at 10:13 PM Joe Perches <joe@perches.com> wrote:
>
> On Mon, 2020-11-30 at 20:15 +0530, Dwaipayan Ray wrote:
> > checkpatch reports a false TYPO_SPELLING warning for some words
> > containing an apostrophe.
> >
> > A false positive is "doesn't". Occurrence of the word causes
> > checkpatch to emit the following warning:
> >
> > "WARNING: 'doesn'' may be misspelled - perhaps 'doesn't'?"
> >
> > Check the word boundary for such cases so that words like
> > "doesn't", "zig-zag", etc. aren't misinterpreted due to wrong
> > splitting of the word by the \b regex metacharacter.
> []
> > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> []
> > @@ -3106,7 +3106,7 @@ sub process {
> > # Check for various typo / spelling mistakes
> > if (defined($misspellings) &&
> > ($in_commit_log || $line =~ /^(?:\+|Subject:)/i)) {
> > - while ($rawline =~ /(?:^|[^a-z@])($misspellings)(?:\b|$|[^a-z@])/gi) {
> > + while ($rawline =~ /(?:^|[^a-z@])($misspellings)(?:\b(?![^\w]?[a-z@]+)|$|[^a-z@])/gi) {
>
> Wouldn't it be simpler to change the existing [^a-z@] blocks to [^a-z@'-] ?
>
Hi,
I tried it and it doesn't seem to work. Probably because the first
group already causes the word to be captured. In this case `doesn'`
was already captured because of the \b group.
Is the first group modification perhaps okay? Or would you suggest
something else?
Thank you,
Dwaipayan.
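The explanation above can be checked in isolation. This is a Python sketch under stated assumptions, not checkpatch code: the dictionary is reduced to the single hypothetical entry `doesn'`, and `NO_BOUNDARY` is a comparison pattern invented here (nobody in the thread proposed it) to show that the `\b` alternative is what lets the match through.

```python
import re

WORD = re.escape("doesn'")  # hypothetical dictionary entry

# Suggested variant: apostrophe and hyphen added to both character classes.
WIDENED = re.compile(
    r"(?:^|[^a-z@'-])(" + WORD + r")(?:\b|$|[^a-z@'-])", re.I)

# Same variant with the \b alternative removed, for comparison only.
NO_BOUNDARY = re.compile(
    r"(?:^|[^a-z@'-])(" + WORD + r")(?:$|[^a-z@'-])", re.I)


def find(pattern, line):
    """Return every typo candidate the pattern captures in line."""
    return [m.group(1) for m in pattern.finditer(line)]


line = "it doesn't seem to work"
print(find(WIDENED, line))      # ["doesn'"]  (\b still matches before "t")
print(find(NO_BOUNDARY, line))  # []          ("t" is excluded by the class)
```

Widening the character classes alone leaves the `\b` alternative untouched, so the boundary between the apostrophe and the following letter still ends the match.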
* Re: [PATCH] checkpatch: fix TYPO_SPELLING check for words with apostrophe
From: Joe Perches @ 2020-11-30 17:24 UTC
To: Dwaipayan Ray
Cc: linux-kernel-mentees, linux-kernel, Lukas Bulwahn, Peilin Ye
On Mon, 2020-11-30 at 22:33 +0530, Dwaipayan Ray wrote:
> On Mon, Nov 30, 2020 at 10:13 PM Joe Perches <joe@perches.com> wrote:
> >
> > On Mon, 2020-11-30 at 20:15 +0530, Dwaipayan Ray wrote:
> > > checkpatch reports a false TYPO_SPELLING warning for some words
> > > containing an apostrophe.
> > >
> > > A false positive is "doesn't". Occurrence of the word causes
> > > checkpatch to emit the following warning:
> > >
> > > "WARNING: 'doesn'' may be misspelled - perhaps 'doesn't'?"
> > >
> > > Check the word boundary for such cases so that words like
> > > "doesn't", "zig-zag", etc. aren't misinterpreted due to wrong
> > > splitting of the word by the \b regex metacharacter.
> > []
> > > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > []
> > > @@ -3106,7 +3106,7 @@ sub process {
> > > # Check for various typo / spelling mistakes
> > > if (defined($misspellings) &&
> > > ($in_commit_log || $line =~ /^(?:\+|Subject:)/i)) {
> > > - while ($rawline =~ /(?:^|[^a-z@])($misspellings)(?:\b|$|[^a-z@])/gi) {
> > > + while ($rawline =~ /(?:^|[^a-z@])($misspellings)(?:\b(?![^\w]?[a-z@]+)|$|[^a-z@])/gi) {
> >
> > Wouldn't it be simpler to change the existing [^a-z@] blocks to [^a-z@'-] ?
> >
> Hi,
> I tried it and it doesn't seem to work. Probably because the first
> group already causes the word to be captured. In this case `doesn'`
> was already captured because of the \b group.
>
> Is the first group modification perhaps okay? Or would you suggest
> something else?
Seems to work for me:
$ git diff --stat -p scripts/checkpatch.pl
scripts/checkpatch.pl | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 7dc094445d83..a6d4d524ae66 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -3106,7 +3106,7 @@ sub process {
# Check for various typo / spelling mistakes
if (defined($misspellings) &&
($in_commit_log || $line =~ /^(?:\+|Subject:)/i)) {
- while ($rawline =~ /(?:^|[^a-z@])($misspellings)(?:\b|$|[^a-z@])/gi) {
+ while ($rawline =~ /(?:^|[^a-z@'-])($misspellings)(?:\b|$|[^a-z@'-])/gi) {
my $typo = $1;
my $typo_fix = $spelling_fix{lc($typo)};
$typo_fix = ucfirst($typo_fix) if ($typo =~ /^[A-Z]/);
$ cat t_spell.c
// SPDX-License-Identifier: GPL-2.0-only
void foo(void)
{
//misspelled arne't word
}
$ ./scripts/checkpatch.pl -f --strict t_spell.c
CHECK: 'arne't' may be misspelled - perhaps 'aren't'?
#4: FILE: t_spell.c:4:
+ //misspelled arne't word
total: 0 errors, 0 warnings, 1 checks, 5 lines checked
NOTE: For some of the reported defects, checkpatch may be able to
mechanically convert to the typical style using --fix or --fix-inplace.
t_spell.c has style problems, please review.
NOTE: If any of the errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.
* Re: [PATCH] checkpatch: fix TYPO_SPELLING check for words with apostrophe
From: Dwaipayan Ray @ 2020-11-30 17:32 UTC
To: Joe Perches; +Cc: linux-kernel-mentees, linux-kernel, Lukas Bulwahn, Peilin Ye
On Mon, Nov 30, 2020 at 10:54 PM Joe Perches <joe@perches.com> wrote:
>
> On Mon, 2020-11-30 at 22:33 +0530, Dwaipayan Ray wrote:
> > On Mon, Nov 30, 2020 at 10:13 PM Joe Perches <joe@perches.com> wrote:
> > >
> > > On Mon, 2020-11-30 at 20:15 +0530, Dwaipayan Ray wrote:
> > > > checkpatch reports a false TYPO_SPELLING warning for some words
> > > > containing an apostrophe.
> > > >
> > > > A false positive is "doesn't". Occurrence of the word causes
> > > > checkpatch to emit the following warning:
> > > >
> > > > "WARNING: 'doesn'' may be misspelled - perhaps 'doesn't'?"
> > > >
> > > > Check the word boundary for such cases so that words like
> > > > "doesn't", "zig-zag", etc. aren't misinterpreted due to wrong
> > > > splitting of the word by the \b regex metacharacter.
> > > []
> > > > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > > []
> > > > @@ -3106,7 +3106,7 @@ sub process {
> > > > # Check for various typo / spelling mistakes
> > > > if (defined($misspellings) &&
> > > > ($in_commit_log || $line =~ /^(?:\+|Subject:)/i)) {
> > > > - while ($rawline =~ /(?:^|[^a-z@])($misspellings)(?:\b|$|[^a-z@])/gi) {
> > > > + while ($rawline =~ /(?:^|[^a-z@])($misspellings)(?:\b(?![^\w]?[a-z@]+)|$|[^a-z@])/gi) {
> > >
> > > Wouldn't it be simpler to change the existing [^a-z@] blocks to [^a-z@'-] ?
> > >
> > Hi,
> > I tried it and it doesn't seem to work. Probably because the first
> > group already causes the word to be captured. In this case `doesn'`
> > was already captured because of the \b group.
> >
> > Is the first group modification perhaps okay? Or would you suggest
> > something else?
>
> Seems to work for me:
>
> $ git diff --stat -p scripts/checkpatch.pl
> scripts/checkpatch.pl | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index 7dc094445d83..a6d4d524ae66 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -3106,7 +3106,7 @@ sub process {
> # Check for various typo / spelling mistakes
> if (defined($misspellings) &&
> ($in_commit_log || $line =~ /^(?:\+|Subject:)/i)) {
> - while ($rawline =~ /(?:^|[^a-z@])($misspellings)(?:\b|$|[^a-z@])/gi) {
> + while ($rawline =~ /(?:^|[^a-z@'-])($misspellings)(?:\b|$|[^a-z@'-])/gi) {
> my $typo = $1;
> my $typo_fix = $spelling_fix{lc($typo)};
> $typo_fix = ucfirst($typo_fix) if ($typo =~ /^[A-Z]/);
>
> $ cat t_spell.c
> // SPDX-License-Identifier: GPL-2.0-only
> void foo(void)
> {
> //misspelled arne't word
> }
>
> $ ./scripts/checkpatch.pl -f --strict t_spell.c
> CHECK: 'arne't' may be misspelled - perhaps 'aren't'?
> #4: FILE: t_spell.c:4:
> + //misspelled arne't word
>
> total: 0 errors, 0 warnings, 1 checks, 5 lines checked
>
> NOTE: For some of the reported defects, checkpatch may be able to
> mechanically convert to the typical style using --fix or --fix-inplace.
>
> t_spell.c has style problems, please review.
>
> NOTE: If any of the errors are false positives, please report
> them to the maintainer, see CHECKPATCH in MAINTAINERS.
>
Sorry, I think I explained it wrong. For words like "doesn't", it still
has the same problem.
With your suggested modification in place:
$ ./scripts/checkpatch.pl 0001-checkpatch-fix-TYPO_SPELLING-check-for-words-with-ap.patch --codespell
WARNING: 'doesn'' may be misspelled - perhaps 'doesn't'?
#9:
A false positive is "doesn't". Occurrence of the word causes
WARNING: 'doesn'' may be misspelled - perhaps 'doesn't'?
#15:
"doesn't", "zig-zag", etc. aren't misinterpreted due to wrong
Thank you,
Dwaipayan.
* Re: [PATCH] checkpatch: fix TYPO_SPELLING check for words with apostrophe
From: Joe Perches @ 2020-11-30 18:01 UTC
To: Dwaipayan Ray
Cc: linux-kernel-mentees, linux-kernel, Lukas Bulwahn, Peilin Ye
On Mon, 2020-11-30 at 23:02 +0530, Dwaipayan Ray wrote:
> Sorry, I think I explained it wrong. For words like "doesn't", it still
> has the same problem.
I think you explained it wrong when you didn't mention this is
_only_ a problem when using --codespell.
Likely it'd be better to use "(?:^|\s)($misspellings)(?=\s|$)"
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 7dc094445d83..b1783f02f745 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -3106,7 +3106,7 @@ sub process {
# Check for various typo / spelling mistakes
if (defined($misspellings) &&
($in_commit_log || $line =~ /^(?:\+|Subject:)/i)) {
- while ($rawline =~ /(?:^|[^a-z@])($misspellings)(?:\b|$|[^a-z@])/gi) {
+ while ($rawline =~ /(?:^|\s)($misspellings)(?=\s|$)/gi) {
my $typo = $1;
my $typo_fix = $spelling_fix{lc($typo)};
$typo_fix = ucfirst($typo_fix) if ($typo =~ /^[A-Z]/);
$ cat t_spell.c
// SPDX-License-Identifier: GPL-2.0-only
void foo(void)
{
//misspelled doesn' doesn't arne't word
}
$ ./scripts/checkpatch.pl -f --strict t_spell.c --codespell --codespellfile /usr/lib/python3/dist-packages/codespell_lib/data/dictionary.txt
CHECK: 'doesn'' may be misspelled - perhaps 'doesn't'?
#4: FILE: t_spell.c:4:
+ //misspelled doesn' doesn't arne't word
CHECK: 'arne't' may be misspelled - perhaps 'aren't'?
#4: FILE: t_spell.c:4:
+ //misspelled doesn' doesn't arne't word
total: 0 errors, 0 warnings, 2 checks, 5 lines checked
NOTE: For some of the reported defects, checkpatch may be able to
mechanically convert to the typical style using --fix or --fix-inplace.
t_spell.c has style problems, please review.
NOTE: If any of the errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.
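The whitespace-delimited pattern can be tried on the same comment line without checkpatch. This is a Python sketch, not the script itself: `FIXES` is a hypothetical two-entry dictionary standing in for the loaded spelling files, and `find_typos` is an illustrative helper name.

```python
import re

# Hypothetical entries standing in for the loaded dictionaries.
FIXES = {"doesn'": "doesn't", "arne't": "aren't"}
ALTERNATION = "|".join(re.escape(word) for word in sorted(FIXES))

# A candidate must start at line start or after whitespace, and must be
# followed by whitespace or end of line (checked without consuming it).
WS_DELIMITED = re.compile(r"(?:^|\s)(" + ALTERNATION + r")(?=\s|$)", re.I)


def find_typos(line):
    """Return (typo, suggestion) pairs for whitespace-delimited matches."""
    return [(m.group(1), FIXES[m.group(1).lower()])
            for m in WS_DELIMITED.finditer(line)]


line = "//misspelled doesn' doesn't arne't word"
for typo, fix in find_typos(line):
    print(f"'{typo}' may be misspelled - perhaps '{fix}'?")
```

Under these assumptions the correctly spelled `doesn't` is skipped because the character after `doesn'` is `t`, not whitespace, while the standalone `doesn'` and `arne't` are both reported.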
* Re: [PATCH] checkpatch: fix TYPO_SPELLING check for words with apostrophe
From: Dwaipayan Ray @ 2020-11-30 18:26 UTC
To: Joe Perches; +Cc: linux-kernel-mentees, linux-kernel, Lukas Bulwahn, Peilin Ye
On Mon, Nov 30, 2020 at 11:31 PM Joe Perches <joe@perches.com> wrote:
>
> On Mon, 2020-11-30 at 23:02 +0530, Dwaipayan Ray wrote:
> > Sorry, I think I explained it wrong. For words like "doesn't", it still
> > has the same problem.
>
> I think you explained it wrong when you didn't mention this is
> _only_ a problem when using --codespell.
>
Sorry for that.
> Likely it'd be better to use "(?:^|\s)($misspellings)(?=\s|$)"
>
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index 7dc094445d83..b1783f02f745 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -3106,7 +3106,7 @@ sub process {
> # Check for various typo / spelling mistakes
> if (defined($misspellings) &&
> ($in_commit_log || $line =~ /^(?:\+|Subject:)/i)) {
> - while ($rawline =~ /(?:^|[^a-z@])($misspellings)(?:\b|$|[^a-z@])/gi) {
> + while ($rawline =~ /(?:^|\s)($misspellings)(?=\s|$)/gi) {
> my $typo = $1;
> my $typo_fix = $spelling_fix{lc($typo)};
> $typo_fix = ucfirst($typo_fix) if ($typo =~ /^[A-Z]/);
>
> $ cat t_spell.c
> // SPDX-License-Identifier: GPL-2.0-only
> void foo(void)
> {
> //misspelled doesn' doesn't arne't word
> }
>
> $ ./scripts/checkpatch.pl -f --strict t_spell.c --codespell --codespellfile /usr/lib/python3/dist-packages/codespell_lib/data/dictionary.txt
> CHECK: 'doesn'' may be misspelled - perhaps 'doesn't'?
> #4: FILE: t_spell.c:4:
> + //misspelled doesn' doesn't arne't word
>
Thanks, this does resolve the original problem, but again the following
line throws 0 warnings:
"zeebra" ther, yourr.
Punctuation is no longer treated as a word separator :(
(?:^|\s)($misspellings)(?=[\s\.\,\:\;\"\?\!]|$)
Would this be acceptable instead? But again this doesn't
handle [therr] or (therr).
Thank you,
Dwaipayan.
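The trade-off between the two lookaheads can be compared side by side. This is a Python sketch under stated assumptions: the four dictionary entries are hypothetical, and the patterns are transcribed from the thread into Python's `re`, which handles these constructs the same way as Perl for the inputs shown.

```python
import re

# Hypothetical dictionary entries for the words used in the thread.
WORDS = ["zeebra", "ther", "yourr", "therr"]
ALT = "|".join(re.escape(word) for word in sorted(WORDS))

# Whitespace-only delimiters.
WS = re.compile(r"(?:^|\s)(" + ALT + r")(?=\s|$)", re.I)
# Trailing punctuation also accepted as a delimiter.
PUNCT = re.compile(r'(?:^|\s)(' + ALT + r')(?=[\s\.\,\:\;\"\?\!]|$)', re.I)


def find(pattern, line):
    """Return every typo candidate the pattern captures in line."""
    return [m.group(1) for m in pattern.finditer(line)]


line = '"zeebra" ther, yourr.'
print(find(WS, line))     # []
print(find(PUNCT, line))  # ['ther', 'yourr']
print(find(PUNCT, "[therr] or (therr)"))  # []
```

In this sketch the extended lookahead only relaxes what may follow a word. Anything that puts a non-whitespace character in front of the word, such as the opening quote before `zeebra` or an opening bracket, still prevents a match.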
* Re: [PATCH] checkpatch: fix TYPO_SPELLING check for words with apostrophe
From: Joe Perches @ 2020-11-30 19:37 UTC
To: Dwaipayan Ray
Cc: linux-kernel-mentees, linux-kernel, Lukas Bulwahn, Peilin Ye
On Mon, 2020-11-30 at 23:56 +0530, Dwaipayan Ray wrote:
> Thanks, this does resolve the original problem, but again the following
> line throws 0 warnings:
>
> "zeebra" ther, yourr.
>
> Punctuation is no longer treated as a word separator :(
>
> (?:^|\s)($misspellings)(?=[\s\.\,\:\;\"\?\!]|$)
>
> Would this be acceptable instead? But again this doesn't
> handle [therr] or (therr).
No idea.
What does codespell use for its regex?
Maybe that should be used.
Maybe all the added lines should be collected and
codespell should be called on those lines instead.
Try other options and check whether the overall CPU and wall-clock time is reduced.
Adding codespell's dictionary was kind of a 'nice to have' option and
it's not likely that it matters a lot if it's perfect or not.
My presumption is that it's not frequently used, but hey, who knows.
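One way to picture the "collect the added lines and call codespell on them" idea is the sketch below. It is a Python illustration, not a proposal for checkpatch: `added_lines` and `run_codespell` are hypothetical helpers, and the only thing assumed about codespell is that a `codespell` executable on PATH accepts a file path as its argument.

```python
import os
import subprocess
import tempfile


def added_lines(diff_text):
    """Return the lines a unified diff adds, without the leading '+'.

    The '+++ b/file' header is skipped; as a simplification, an added
    line whose own content starts with '++' is skipped as well.
    """
    return [
        line[1:]
        for line in diff_text.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    ]


def run_codespell(lines):
    """Write lines to a temporary file, run codespell on it once,
    and return whatever codespell printed on stdout."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("\n".join(lines) + "\n")
    try:
        result = subprocess.run(
            ["codespell", tmp.name], capture_output=True, text=True)
        return result.stdout
    finally:
        os.unlink(tmp.name)
```

Running the external tool once per patch instead of matching a large alternation against every line is what would need to be measured for CPU and wall-clock cost.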