* [PATCH] checkpatch: use utf-8 match for spell checking
@ 2023-12-12 9:43 Antonio Borneo
2023-12-12 19:07 ` Joe Perches
2024-01-02 16:10 ` [PATCH v2] " Antonio Borneo
0 siblings, 2 replies; 6+ messages in thread
From: Antonio Borneo @ 2023-12-12 9:43 UTC (permalink / raw)
To: Andy Whitcroft, Joe Perches, Dwaipayan Ray, Lukas Bulwahn
Cc: Antonio Borneo, linux-kernel, Clément Léger,
Clément Le Goffic, linux-stm32
The current code that checks for misspellings verifies, as part of a more
complex regex, whether $rawline matches [^\w]($misspellings)[^\w].
Since $rawline is a byte string, a byte of a multi-byte utf-8 character in
$rawline can match the non-word-char class [^\w].
E.g.:
./script/checkpatch.pl --git 81c2f059ab9
WARNING: 'ment' may be misspelled - perhaps 'meant'?
#36: FILE: MAINTAINERS:14360:
+M: Clément Léger <clement.leger@bootlin.com>
^^^^
Use a utf-8 version of $rawline for spell checking.
Signed-off-by: Antonio Borneo <antonio.borneo@foss.st.com>
Reported-by: Clément Le Goffic <clement.legoffic@foss.st.com>
---
scripts/checkpatch.pl | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 25fdb7fda112..58646bd6ef56 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -3477,7 +3477,8 @@ sub process {
# Check for various typo / spelling mistakes
if (defined($misspellings) &&
($in_commit_log || $line =~ /^(?:\+|Subject:)/i)) {
- while ($rawline =~ /(?:^|[^\w\-'`])($misspellings)(?:[^\w\-'`]|$)/gi) {
+ my $rawline_utf8 = decode("utf8", $rawline);
+ while ($rawline_utf8 =~ /(?:^|[^\w\-'`])($misspellings)(?:[^\w\-'`]|$)/gi) {
my $typo = $1;
my $blank = copy_spacing($rawline);
my $ptr = substr($blank, 0, $-[1]) . "^" x length($typo);
base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86
--
2.42.0
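For readers following the thread, a minimal standalone sketch of the false positive described in the commit message; it is not part of the patch. It assumes a hypothetical one-entry $misspellings list ('ment') and a hand-built $rawline holding the UTF-8 bytes of the MAINTAINERS entry. The regex is copied from the diff above, and decode() comes from the Encode module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical stand-in for checkpatch's $misspellings alternation.
my $misspellings = 'ment';

# The line as raw bytes: "é" is the two bytes 0xC3 0xA9.
my $rawline = "+M: Cl\xC3\xA9ment L\xC3\xA9ger";

# Byte-string match: byte 0xA9 is not a word character here,
# so "ment" looks like a standalone word and is reported.
my $bytes_hit =
    ($rawline =~ /(?:^|[^\w\-'`])($misspellings)(?:[^\w\-'`]|$)/gi) ? 1 : 0;

# Decoded match: "é" is one word character, so "ment" stays
# inside "Clément" and is not reported.
my $rawline_utf8 = decode("utf8", $rawline);
my $chars_hit =
    ($rawline_utf8 =~ /(?:^|[^\w\-'`])($misspellings)(?:[^\w\-'`]|$)/gi) ? 1 : 0;

print "byte string matched: $bytes_hit, decoded string matched: $chars_hit\n";
```

The sketch deliberately avoids `use utf8` and `use feature 'unicode_strings'`, since either could change how \w treats bytes above 0x7F.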
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] checkpatch: use utf-8 match for spell checking
2023-12-12 9:43 [PATCH] checkpatch: use utf-8 match for spell checking Antonio Borneo
@ 2023-12-12 19:07 ` Joe Perches
2024-01-02 16:04 ` Antonio Borneo
2024-01-02 16:10 ` [PATCH v2] " Antonio Borneo
1 sibling, 1 reply; 6+ messages in thread
From: Joe Perches @ 2023-12-12 19:07 UTC (permalink / raw)
To: Antonio Borneo, Andy Whitcroft, Dwaipayan Ray, Lukas Bulwahn,
Andrew Morton
Cc: linux-kernel, Clément Léger, Clément Le Goffic,
linux-stm32
On Tue, 2023-12-12 at 10:43 +0100, Antonio Borneo wrote:
> The current code that checks for misspellings verifies, as part of a more
> complex regex, whether $rawline matches [^\w]($misspellings)[^\w].
>
> Since $rawline is a byte string, a byte of a multi-byte utf-8 character in
> $rawline can match the non-word-char class [^\w].
> E.g.:
> ./script/checkpatch.pl --git 81c2f059ab9
> WARNING: 'ment' may be misspelled - perhaps 'meant'?
> #36: FILE: MAINTAINERS:14360:
> +M: Clément Léger <clement.leger@bootlin.com>
> ^^^^
>
> Use a utf-8 version of $rawline for spell checking.
>
> Signed-off-by: Antonio Borneo <antonio.borneo@foss.st.com>
> Reported-by: Clément Le Goffic <clement.legoffic@foss.st.com>
Seems sensible, thanks, but:
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
[]
> @@ -3477,7 +3477,8 @@ sub process {
> # Check for various typo / spelling mistakes
> if (defined($misspellings) &&
> ($in_commit_log || $line =~ /^(?:\+|Subject:)/i)) {
> - while ($rawline =~ /(?:^|[^\w\-'`])($misspellings)(?:[^\w\-'`]|$)/gi) {
> + my $rawline_utf8 = decode("utf8", $rawline);
> + while ($rawline_utf8 =~ /(?:^|[^\w\-'`])($misspellings)(?:[^\w\-'`]|$)/gi) {
> my $typo = $1;
> my $blank = copy_spacing($rawline);
Maybe this needs to use $rawline_utf8 ?
> my $ptr = substr($blank, 0, $-[1]) . "^" x length($typo);
And maybe now the $fix bit will not always work properly
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] checkpatch: use utf-8 match for spell checking
2023-12-12 19:07 ` Joe Perches
@ 2024-01-02 16:04 ` Antonio Borneo
0 siblings, 0 replies; 6+ messages in thread
From: Antonio Borneo @ 2024-01-02 16:04 UTC (permalink / raw)
To: Joe Perches, Andy Whitcroft, Dwaipayan Ray, Lukas Bulwahn,
Andrew Morton
Cc: linux-kernel, Clément Léger, Clément Le Goffic,
linux-stm32
On Tue, 2023-12-12 at 11:07 -0800, Joe Perches wrote:
> On Tue, 2023-12-12 at 10:43 +0100, Antonio Borneo wrote:
> > The current code that checks for misspellings verifies, as part of a more
> > complex regex, whether $rawline matches [^\w]($misspellings)[^\w].
> >
> > Since $rawline is a byte string, a byte of a multi-byte utf-8 character in
> > $rawline can match the non-word-char class [^\w].
> > E.g.:
> > ./script/checkpatch.pl --git 81c2f059ab9
> > WARNING: 'ment' may be misspelled - perhaps 'meant'?
> > #36: FILE: MAINTAINERS:14360:
> > +M: Clément Léger <clement.leger@bootlin.com>
> > ^^^^
> >
> > Use a utf-8 version of $rawline for spell checking.
> >
> > Signed-off-by: Antonio Borneo <antonio.borneo@foss.st.com>
> > Reported-by: Clément Le Goffic <clement.legoffic@foss.st.com>
>
> Seems sensible, thanks, but:
>
> > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> []
> > @@ -3477,7 +3477,8 @@ sub process {
> > # Check for various typo / spelling mistakes
> > if (defined($misspellings) &&
> > ($in_commit_log || $line =~ /^(?:\+|Subject:)/i)) {
> > - while ($rawline =~ /(?:^|[^\w\-'`])($misspellings)(?:[^\w\-'`]|$)/gi) {
> > + my $rawline_utf8 = decode("utf8", $rawline);
> > + while ($rawline_utf8 =~ /(?:^|[^\w\-'`])($misspellings)(?:[^\w\-'`]|$)/gi) {
> > my $typo = $1;
> > my $blank = copy_spacing($rawline);
>
> Maybe this needs to use $rawline_utf8 ?
Correct, I will send a v2!
>
> > my $ptr = substr($blank, 0, $-[1]) . "^" x length($typo);
>
> And maybe now the $fix bit will not always work properly
I have run some tests and it looks OK with the current ASCII file scripts/spelling.txt.
I have also tested adding some utf-8 strings to the spelling file, but checkpatch reads it as
ASCII, and extending it to utf-8 would require further modifications in checkpatch, way beyond
this simple fix.
Thanks for the review.
Antonio
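A short sketch of why the loop body has to work on the same string that was matched, as raised in the review above. The caret pointer in the diff is built from $-[1], which is an offset into whichever string the regex ran against, so a multi-byte character earlier in the line makes the byte offset and the character offset disagree. The sample line and the single pattern 'recieve' are hypothetical, chosen only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical added line: a typo that follows a two-byte UTF-8 character.
my $rawline      = "+Cl\xC3\xA9ment will recieve it";
my $rawline_utf8 = decode("utf8", $rawline);

# Offset of the typo counted in bytes of the raw line.
$rawline =~ /(recieve)/ or die "no match in raw line";
my $byte_offset = $-[1];

# Offset of the same typo counted in characters of the decoded line.
$rawline_utf8 =~ /(recieve)/ or die "no match in decoded line";
my $char_offset = $-[1];

# "é" is two bytes but one character, so the two offsets differ by one.
print "byte offset: $byte_offset, character offset: $char_offset\n";
```

Mixing an offset from one string with spacing copied from the other would shift the "^" markers by one column per preceding multi-byte character.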
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v2] checkpatch: use utf-8 match for spell checking
2023-12-12 9:43 [PATCH] checkpatch: use utf-8 match for spell checking Antonio Borneo
2023-12-12 19:07 ` Joe Perches
@ 2024-01-02 16:10 ` Antonio Borneo
2024-05-06 12:07 ` Clement LE GOFFIC
1 sibling, 1 reply; 6+ messages in thread
From: Antonio Borneo @ 2024-01-02 16:10 UTC (permalink / raw)
To: Andy Whitcroft, Joe Perches, Dwaipayan Ray, Lukas Bulwahn
Cc: Antonio Borneo, linux-kernel, Clément Léger,
Clément Le Goffic, linux-stm32
The current code that checks for misspellings verifies, as part of a more
complex regex, whether $rawline matches [^\w]($misspellings)[^\w].
Since $rawline is a byte string, a byte of a multi-byte utf-8 character in
$rawline can match the non-word-char class [^\w].
E.g.:
./scripts/checkpatch.pl --git 81c2f059ab9
WARNING: 'ment' may be misspelled - perhaps 'meant'?
#36: FILE: MAINTAINERS:14360:
+M: Clément Léger <clement.leger@bootlin.com>
^^^^
Use a utf-8 version of $rawline for spell checking.
Signed-off-by: Antonio Borneo <antonio.borneo@foss.st.com>
Reported-by: Clément Le Goffic <clement.legoffic@foss.st.com>
---
Changes in v2:
- use $rawline_utf8 also in the while-loop's body;
- fix path of checkpatch in the commit message.
---
scripts/checkpatch.pl | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 25fdb7fda112..2d122d232c6d 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -3477,9 +3477,10 @@ sub process {
# Check for various typo / spelling mistakes
if (defined($misspellings) &&
($in_commit_log || $line =~ /^(?:\+|Subject:)/i)) {
- while ($rawline =~ /(?:^|[^\w\-'`])($misspellings)(?:[^\w\-'`]|$)/gi) {
+ my $rawline_utf8 = decode("utf8", $rawline);
+ while ($rawline_utf8 =~ /(?:^|[^\w\-'`])($misspellings)(?:[^\w\-'`]|$)/gi) {
my $typo = $1;
- my $blank = copy_spacing($rawline);
+ my $blank = copy_spacing($rawline_utf8);
my $ptr = substr($blank, 0, $-[1]) . "^" x length($typo);
my $hereptr = "$hereline$ptr\n";
my $typo_fix = $spelling_fix{lc($typo)};
base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86
--
2.42.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH v2] checkpatch: use utf-8 match for spell checking
2024-01-02 16:10 ` [PATCH v2] " Antonio Borneo
@ 2024-05-06 12:07 ` Clement LE GOFFIC
2025-05-20 13:14 ` Clement LE GOFFIC
0 siblings, 1 reply; 6+ messages in thread
From: Clement LE GOFFIC @ 2024-05-06 12:07 UTC (permalink / raw)
To: Antonio Borneo, Andy Whitcroft, Joe Perches, Dwaipayan Ray,
Lukas Bulwahn
Cc: linux-kernel, Clément Léger, linux-stm32
Hello,
A gentle reminder to review this patch.
Best regards,
Clément
On 1/2/24 17:10, Antonio Borneo wrote:
> The current code that checks for misspellings verifies, as part of a more
> complex regex, whether $rawline matches [^\w]($misspellings)[^\w].
>
> Since $rawline is a byte string, a byte of a multi-byte utf-8 character in
> $rawline can match the non-word-char class [^\w].
> E.g.:
> ./scripts/checkpatch.pl --git 81c2f059ab9
> WARNING: 'ment' may be misspelled - perhaps 'meant'?
> #36: FILE: MAINTAINERS:14360:
> +M: Clément Léger <clement.leger@bootlin.com>
> ^^^^
>
> Use a utf-8 version of $rawline for spell checking.
>
> Signed-off-by: Antonio Borneo <antonio.borneo@foss.st.com>
> Reported-by: Clément Le Goffic <clement.legoffic@foss.st.com>
> ---
> Changes in v2:
> - use $rawline_utf8 also in the while-loop's body;
> - fix path of checkpatch in the commit message.
> ---
> scripts/checkpatch.pl | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index 25fdb7fda112..2d122d232c6d 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -3477,9 +3477,10 @@ sub process {
> # Check for various typo / spelling mistakes
> if (defined($misspellings) &&
> ($in_commit_log || $line =~ /^(?:\+|Subject:)/i)) {
> - while ($rawline =~ /(?:^|[^\w\-'`])($misspellings)(?:[^\w\-'`]|$)/gi) {
> + my $rawline_utf8 = decode("utf8", $rawline);
> + while ($rawline_utf8 =~ /(?:^|[^\w\-'`])($misspellings)(?:[^\w\-'`]|$)/gi) {
> my $typo = $1;
> - my $blank = copy_spacing($rawline);
> + my $blank = copy_spacing($rawline_utf8);
> my $ptr = substr($blank, 0, $-[1]) . "^" x length($typo);
> my $hereptr = "$hereline$ptr\n";
> my $typo_fix = $spelling_fix{lc($typo)};
>
> base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2] checkpatch: use utf-8 match for spell checking
2024-05-06 12:07 ` Clement LE GOFFIC
@ 2025-05-20 13:14 ` Clement LE GOFFIC
0 siblings, 0 replies; 6+ messages in thread
From: Clement LE GOFFIC @ 2025-05-20 13:14 UTC (permalink / raw)
To: Antonio Borneo, Andy Whitcroft, Joe Perches, Dwaipayan Ray,
Lukas Bulwahn
Cc: linux-kernel, Clément Léger, linux-stm32
On 5/6/24 14:07, Clement LE GOFFIC wrote:
> Hello,
>
> A gentle reminder to review this patch.
>
> Best regards,
>
> Clément
>
> On 1/2/24 17:10, Antonio Borneo wrote:
>> The current code that checks for misspellings verifies, as part of a more
>> complex regex, whether $rawline matches [^\w]($misspellings)[^\w].
>>
>> Since $rawline is a byte string, a byte of a multi-byte utf-8 character in
>> $rawline can match the non-word-char class [^\w].
>> E.g.:
>> ./scripts/checkpatch.pl --git 81c2f059ab9
>> WARNING: 'ment' may be misspelled - perhaps 'meant'?
>> #36: FILE: MAINTAINERS:14360:
>> +M: Clément Léger <clement.leger@bootlin.com>
>> ^^^^
>>
>> Use a utf-8 version of $rawline for spell checking.
>>
>> Signed-off-by: Antonio Borneo <antonio.borneo@foss.st.com>
>> Reported-by: Clément Le Goffic <clement.legoffic@foss.st.com>
>> ---
>> Changes in v2:
>> - use $rawline_utf8 also in the while-loop's body;
>> - fix path of checkpatch in the commit message.
>> ---
>> scripts/checkpatch.pl | 5 +++--
>> 1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
>> index 25fdb7fda112..2d122d232c6d 100755
>> --- a/scripts/checkpatch.pl
>> +++ b/scripts/checkpatch.pl
>> @@ -3477,9 +3477,10 @@ sub process {
>> # Check for various typo / spelling mistakes
>> if (defined($misspellings) &&
>> ($in_commit_log || $line =~ /^(?:\+|Subject:)/i)) {
>> - while ($rawline =~ /(?:^|[^\w\-'`])($misspellings)(?:[^\w\-'`]|$)/gi) {
>> + my $rawline_utf8 = decode("utf8", $rawline);
>> + while ($rawline_utf8 =~ /(?:^|[^\w\-'`])($misspellings)(?:[^\w\-'`]|$)/gi) {
>> my $typo = $1;
>> - my $blank = copy_spacing($rawline);
>> + my $blank = copy_spacing($rawline_utf8);
>> my $ptr = substr($blank, 0, $-[1]) . "^" x length($typo);
>> my $hereptr = "$hereline$ptr\n";
>> my $typo_fix = $spelling_fix{lc($typo)};
>>
>> base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86
Hi,
Is it just due to -ENOTIME for the maintainers, or are there doubts
about this patch? (inspired by a response from Uwe).
Clément
^ permalink raw reply [flat|nested] 6+ messages in thread