linux-kernel.vger.kernel.org archive mirror
* [PATCH RESEND v2] checkpatch: use utf-8 match for spell checking
@ 2025-06-16  7:59 Clément Le Goffic
  2025-06-17  0:30 ` Andrew Morton
  2025-06-17  0:31 ` Andrew Morton
  0 siblings, 2 replies; 4+ messages in thread
From: Clément Le Goffic @ 2025-06-16  7:59 UTC (permalink / raw)
  To: Andy Whitcroft, Joe Perches, Dwaipayan Ray, Lukas Bulwahn
  Cc: linux-kernel, akpm, Antonio Borneo, Clément Le Goffic

From: Antonio Borneo <antonio.borneo@foss.st.com>

The current code that checks for misspelling verifies, in a more
complex regex, if $rawline matches [^\w]($misspellings)[^\w]

Because $rawline is a byte string, the bytes of a multi-byte utf-8
character in $rawline can match the non-word-char [^\w].
E.g.:
	./scripts/checkpatch.pl --git 81c2f059ab9
	WARNING: 'ment' may be misspelled - perhaps 'meant'?
	#36: FILE: MAINTAINERS:14360:
	+M:     Clément Léger <clement.leger@bootlin.com>
	            ^^^^

Use a utf-8 version of $rawline for spell checking.

Signed-off-by: Antonio Borneo <antonio.borneo@foss.st.com>
Reported-by: Clément Le Goffic <clement.legoffic@foss.st.com>
---
Signed-off-by: Clément Le Goffic <clement.legoffic@foss.st.com>
---
- Link to v1: https://lore.kernel.org/lkml/20231212094310.3633-1-antonio.borneo@foss.st.com/
---
 scripts/checkpatch.pl | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 664f7b7a622c..489b74d52abe 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -3502,9 +3502,10 @@ sub process {
 # Check for various typo / spelling mistakes
 		if (defined($misspellings) &&
 		    ($in_commit_log || $line =~ /^(?:\+|Subject:)/i)) {
-			while ($rawline =~ /(?:^|[^\w\-'`])($misspellings)(?:[^\w\-'`]|$)/gi) {
+			my $rawline_utf8 = decode("utf8", $rawline);
+			while ($rawline_utf8 =~ /(?:^|[^\w\-'`])($misspellings)(?:[^\w\-'`]|$)/gi) {
 				my $typo = $1;
-				my $blank = copy_spacing($rawline);
+				my $blank = copy_spacing($rawline_utf8);
 				my $ptr = substr($blank, 0, $-[1]) . "^" x length($typo);
 				my $hereptr = "$hereline$ptr\n";
 				my $typo_fix = $spelling_fix{lc($typo)};

---
base-commit: e04c78d86a9699d136910cfc0bdcf01087e3267e
change-id: 20250616-b4-checkpatch-upstream-a8ef45ce0fc7

Best regards,
-- 
Clément Le Goffic <clement.legoffic@foss.st.com>
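
The byte-versus-character effect described in the changelog can be reproduced outside checkpatch. The sketch below is a Python analogue, not the Perl code from the patch: it assumes a simplified pattern in which the single word "ment" stands in for the $misspellings alternation, and the helper names find_typo_bytes and find_typo_text are hypothetical. In a bytes pattern \w only covers ASCII word characters, so the trailing byte of "é" counts as a word boundary; after decoding, "é" is a word character and the false positive goes away.

```python
import re

# Simplified form of the boundary pattern from the patch; "ment" stands in
# for the $misspellings alternation (an assumption made for illustration).
PATTERN = r"(?:^|[^\w\-'`])(ment)(?:[^\w\-'`]|$)"

# The MAINTAINERS line from the changelog example, as raw utf-8 bytes.
RAW_LINE = "+M:     Clément Léger <clement.leger@bootlin.com>".encode("utf-8")


def find_typo_bytes(raw: bytes):
    """Search the undecoded bytes; \\w is ASCII-only for bytes patterns."""
    m = re.search(PATTERN.encode("ascii"), raw, re.IGNORECASE)
    return m.group(1).decode("ascii") if m else None


def find_typo_text(raw: bytes):
    """Decode first, then search; \\w now includes letters such as 'é'."""
    m = re.search(PATTERN, raw.decode("utf-8"), re.IGNORECASE)
    return m.group(1) if m else None


if __name__ == "__main__":
    print(find_typo_bytes(RAW_LINE))  # false positive on the byte string
    print(find_typo_text(RAW_LINE))   # no match once decoded
```

The first call reports "ment" because the second byte of "é" (0xA9) satisfies [^\w\-'`]; the second call returns None. Perl's decode("utf8", $rawline) plays the role that bytes.decode("utf-8") plays here.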



* Re: [PATCH RESEND v2] checkpatch: use utf-8 match for spell checking
  2025-06-16  7:59 [PATCH RESEND v2] checkpatch: use utf-8 match for spell checking Clément Le Goffic
@ 2025-06-17  0:30 ` Andrew Morton
  2025-06-17  0:31 ` Andrew Morton
  1 sibling, 0 replies; 4+ messages in thread
From: Andrew Morton @ 2025-06-17  0:30 UTC (permalink / raw)
  To: Clément Le Goffic
  Cc: Andy Whitcroft, Joe Perches, Dwaipayan Ray, Lukas Bulwahn,
	linux-kernel, Antonio Borneo

On Mon, 16 Jun 2025 09:59:13 +0200 Clément Le Goffic <clement.legoffic@foss.st.com> wrote:

> From: Antonio Borneo <antonio.borneo@foss.st.com>
> 
> The current code that checks for misspelling verifies, in a more
> complex regex, if $rawline matches [^\w]($misspellings)[^\w]
> 
> Being $rawline a byte-string, a utf-8 character in $rawline can
> match the non-word-char [^\w].
> E.g.:
> 	./scripts/checkpatch.pl --git 81c2f059ab9
> 	WARNING: 'ment' may be misspelled - perhaps 'meant'?
> 	#36: FILE: MAINTAINERS:14360:
> 	+M:     Clément Léger <clement.leger@bootlin.com>
> 	            ^^^^
> 
> Use a utf-8 version of $rawline for spell checking.
> 
> Signed-off-by: Antonio Borneo <antonio.borneo@foss.st.com>

This should have included your Signed-off-by, as you were on the patch
delivery path.  I have made that change to the mm.git copy of this
patch, thanks.



* Re: [PATCH RESEND v2] checkpatch: use utf-8 match for spell checking
  2025-06-16  7:59 [PATCH RESEND v2] checkpatch: use utf-8 match for spell checking Clément Le Goffic
  2025-06-17  0:30 ` Andrew Morton
@ 2025-06-17  0:31 ` Andrew Morton
  2025-06-17  7:35   ` Clement LE GOFFIC
  1 sibling, 1 reply; 4+ messages in thread
From: Andrew Morton @ 2025-06-17  0:31 UTC (permalink / raw)
  To: Clément Le Goffic
  Cc: Andy Whitcroft, Joe Perches, Dwaipayan Ray, Lukas Bulwahn,
	linux-kernel, Antonio Borneo

On Mon, 16 Jun 2025 09:59:13 +0200 Clément Le Goffic <clement.legoffic@foss.st.com> wrote:

> From: Antonio Borneo <antonio.borneo@foss.st.com>
> 
> The current code that checks for misspelling verifies, in a more
> complex regex, if $rawline matches [^\w]($misspellings)[^\w]
> 
> Being $rawline a byte-string, a utf-8 character in $rawline can
> match the non-word-char [^\w].
> E.g.:
> 	./scripts/checkpatch.pl --git 81c2f059ab9
> 	WARNING: 'ment' may be misspelled - perhaps 'meant'?
> 	#36: FILE: MAINTAINERS:14360:
> 	+M:     Clément Léger <clement.leger@bootlin.com>
> 	            ^^^^
> 
> Use a utf-8 version of $rawline for spell checking.
> 
> Signed-off-by: Antonio Borneo <antonio.borneo@foss.st.com>
> Reported-by: Clément Le Goffic <clement.legoffic@foss.st.com>
> ---
> Signed-off-by: Clément Le Goffic <clement.legoffic@foss.st.com>

Oh, there it is, after the "^---$", which marks end-of-changelog!
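
To illustrate why a tag placed after the "^---$" line is easy to miss: tools that apply the mail treat everything from that line onward as outside the changelog. The sketch below is a simplified Python model of that rule, not the actual git implementation; the function names changelog_part and signoffs are hypothetical, and the sample message is abridged from the patch in this thread.

```python
import re


def changelog_part(message: str) -> str:
    """Return the text before the first line that is exactly '---'."""
    kept = []
    for line in message.splitlines():
        if re.fullmatch(r"---", line):
            break  # end-of-changelog marker: ignore everything after it
        kept.append(line)
    return "\n".join(kept)


def signoffs(text: str):
    """List the names carried by Signed-off-by: lines in the given text."""
    return re.findall(r"^Signed-off-by: (.+)$", text, re.MULTILINE)


MESSAGE = """\
Use a utf-8 version of $rawline for spell checking.

Signed-off-by: Antonio Borneo <antonio.borneo@foss.st.com>
Reported-by: Clément Le Goffic <clement.legoffic@foss.st.com>
---
Signed-off-by: Clément Le Goffic <clement.legoffic@foss.st.com>
---
 scripts/checkpatch.pl | 5 +++--
"""

if __name__ == "__main__":
    # Only the sign-off above the first '---' line survives.
    print(signoffs(changelog_part(MESSAGE)))
```

Under this model only Antonio Borneo's sign-off is part of the changelog; the submitter's sign-off sits below the marker and would be dropped when the patch is applied.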


* Re: [PATCH RESEND v2] checkpatch: use utf-8 match for spell checking
  2025-06-17  0:31 ` Andrew Morton
@ 2025-06-17  7:35   ` Clement LE GOFFIC
  0 siblings, 0 replies; 4+ messages in thread
From: Clement LE GOFFIC @ 2025-06-17  7:35 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andy Whitcroft, Joe Perches, Dwaipayan Ray, Lukas Bulwahn,
	linux-kernel, Antonio Borneo

On 6/17/25 02:31, Andrew Morton wrote:
> On Mon, 16 Jun 2025 09:59:13 +0200 Clément Le Goffic <clement.legoffic@foss.st.com> wrote:
> 
>> From: Antonio Borneo <antonio.borneo@foss.st.com>
>>
>> The current code that checks for misspelling verifies, in a more
>> complex regex, if $rawline matches [^\w]($misspellings)[^\w]
>>
>> Being $rawline a byte-string, a utf-8 character in $rawline can
>> match the non-word-char [^\w].
>> E.g.:
>> 	./scripts/checkpatch.pl --git 81c2f059ab9
>> 	WARNING: 'ment' may be misspelled - perhaps 'meant'?
>> 	#36: FILE: MAINTAINERS:14360:
>> 	+M:     Clément Léger <clement.leger@bootlin.com>
>> 	            ^^^^
>>
>> Use a utf-8 version of $rawline for spell checking.
>>
>> Signed-off-by: Antonio Borneo <antonio.borneo@foss.st.com>
>> Reported-by: Clément Le Goffic <clement.legoffic@foss.st.com>
>> ---
>> Signed-off-by: Clément Le Goffic <clement.legoffic@foss.st.com>
> 
> Oh, there it is, after the "^---$", which marks end-of-changelog!

Hi, oh right, sorry!
Thank you for making the change.

Clément

