From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
To: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
Akira Yokosawa <akiyks@gmail.com>
Subject: Re: [PATCH 03/12] docs: kdoc: backslashectomy in kdoc_parser
Date: Mon, 4 Aug 2025 14:58:18 +0200
Message-ID: <20250804145818.3cc73ca2@foz.lan>
In-Reply-To: <87h5yrruki.fsf@trenco.lwn.net>
On Fri, 01 Aug 2025 08:21:49 -0600
Jonathan Corbet <corbet@lwn.net> wrote:
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:
>
> > On Thu, 31 Jul 2025 18:13:17 -0600
> > Jonathan Corbet <corbet@lwn.net> wrote:
> >
> >> A lot of the regular expressions in this file have extraneous backslashes
> >
> > This one is a bit scary... It could actually cause issues somewhere.
>
> What kind of issues?
I have caught several issues in the past caused by missing backslashes.
I don't recall the specific cases, but using reserved symbols without
escaping them has given me enough headaches.
Yet, see POSIX rules for some cases:
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03
like this one:
"The character sequences "[.", "[=", and "[:" shall be special
inside a bracket expression"
Basically, if you don't know exactly what you're doing and just
place special characters there without extra care, you may end up
in serious trouble. And note that this is just for BRE (basic regular
expressions). There is other weirdness with ERE (extended
regular expressions) as well:
"The <period>, <left-square-bracket>, <backslash>, and
<left-parenthesis> shall be special except when used
in a bracket expression"
> > Also, IMHO, some expressions look worse on my eyes ;-)
>
> Here I think we're going to disagree. The extra backslashes are really
> just visual noise as far as I'm concerned.
>
> >> that may have been needed in Perl, but aren't helpful here. Take them out
> >> to reduce slightly the visual noise.
> >
> > No idea if Perl actually requires, but, at least for me, I do prefer to
> > see all special characters properly escaped with a backslash. This way,
> > it is a lot clearer that what it is expecting is a string, instead of
> > using something that may affect regex processing.
>
> I guess my point is that, in the given cases, the characters in question
> *aren't* special.
They are special in the sense that we're using characters that
have meanings in regular expressions, and even placing them in
a random order may cause POSIX violations (and eventually cause
trouble if, for instance, we need to use "regex" instead of "re",
or if someone fixes Python's native "re" to be more POSIX compliant).
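To illustrate the ordering concern, here is a minimal sketch using the plain "re" module (not the KernRe wrapper; the sample strings and helper names are made up for illustration). It checks that the escaped and unescaped bracket expressions strip the same text, and records what CPython reports when the same two characters are written with "[" first. As far as I know, current CPython raises a FutureWarning ("Possible nested set") in that last case, but treat that as an observation about today's implementation rather than a guarantee.

```python
import re
import warnings

# Hypothetical parameter strings, only for illustration.
samples = ["foo[10]", "bar)(baz", "plain"]


def strip_tail(pattern, text):
    """Remove everything from the first match of pattern onwards."""
    return re.sub(pattern, "", text, count=1)


# Same set of literal characters, written without and with backslashes.
unescaped = [strip_tail(r"[)[].*", s) for s in samples]
escaped = [strip_tail(r"[\[\)].*", s) for s in samples]


def compile_warnings(pattern):
    """Return the warning messages emitted while compiling pattern."""
    re.purge()  # drop the compile cache so the pattern is parsed again
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        re.compile(pattern)
    return [str(w.message) for w in caught]


# Same two characters, but with "[" placed first inside the brackets.
reordered_warnings = compile_warnings(r"[[)].*")

print(unescaped == escaped)
print(reordered_warnings)
```

So the unescaped form is fine as written, but only as long as nobody reorders the characters inside the brackets.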
> >> - param = KernRe(r'[\[\)].*').sub('', param, count=1)
> >> + param = KernRe(r'[)[].*').sub('', param, count=1)
> >
> > This one, for instance, IMHO looks a lot worse for my eyes to understand
> > that there is a "[" that it is not an operator, but instead a string.
> > The open close parenthesis also looks weird. My regex-trained eyes think
> > that this would be part of a capture group.
>
> ...and mine say "that's in [brackets] why are you escaping it?" :)
Heh, after all those years writing and reviewing kernel code,
seeing unmatched parentheses/brackets really bugs me... perhaps
it triggers some sort of OCD ;-)
Perhaps one alternative would be to have a separate variable, like:

	# Before touching this, see:
	# https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04
	# as some char sequences inside brackets have special meanings
	escape_chars = ")["
	param = KernRe(rf'[{escape_chars}].*').sub('', param, count=1)

or to use re.escape().
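A rough sketch of the re.escape() variant, again with the plain "re" module instead of KernRe. The names literal_chars and strip_from_first_literal are hypothetical, not something from kdoc_parser:

```python
import re

# Characters that must be matched literally inside the bracket expression.
literal_chars = ")["

# re.escape() adds the backslashes, so the source shows neither unmatched
# brackets nor hand-written escapes.
char_class = "[" + re.escape(literal_chars) + "]"
pattern = re.compile(char_class + r".*")


def strip_from_first_literal(param):
    """Drop everything from the first ')' or '[' to the end of param."""
    return pattern.sub("", param, count=1)


print(strip_from_first_literal("foo[10]"))
```

The generated pattern is equivalent to the hand-escaped one, so this only changes how the expression reads in the source, not what it matches.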
> >> if dtype == "" and param.endswith("..."):
> >> if KernRe(r'\w\.\.\.$').search(param):
> >> @@ -405,7 +405,7 @@ class KernelDoc:
> >>
> >> for arg in args.split(splitter):
> >> # Strip comments
> >> - arg = KernRe(r'\/\*.*\*\/').sub('', arg)
> >> + arg = KernRe(r'/\*.*\*/').sub('', arg)
> >
> > A pattern like /..../ is a standard way to pass search group with Regex
> > on many languages and utils that accept regular expressions like the
> > sed command. Dropping the backslash here IMHO makes it confusing ;-)
>
> ...but it is definitely not any such in Python and never has been, so
> escaping slashes looks weird and makes the reader wonder what they are
> missing.
After re-reading, this specific change is actually OK, but yeah, I
still need to read it two or three times, as in sed, Perl and other
tools that are more POSIX compliant, /re/ delimits a regex:
https://en.wikipedia.org/wiki/Regular_expression
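For the record, a quick check that both spellings behave the same in Python, where "/" has no special meaning (the sample argument string is made up):

```python
import re

# Hypothetical argument with a trailing C comment.
arg = "int count /* number of items */"

# "\/" and "/" both match a literal slash in Python's "re".
with_escaped_slashes = re.sub(r"\/\*.*\*\/", "", arg)
with_plain_slashes = re.sub(r"/\*.*\*/", "", arg)

print(with_escaped_slashes == with_plain_slashes)
```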
> > Seriously, IMHO this patch makes a lot worse to understand what brackets,
> > parenthesis and dots are strings, and which ones are part of the regex
> > syntax.
>
> So I guess I won't fight this one to the death, but I really do
> disagree. Writing regexes in a non-canonical style just makes it harder
> for anybody else who comes along to figure out what is going on; it
> certainly made it harder for me.
Heh, for me, my main concerns are:

- unmatched brackets/parentheses
- POSIX violations - it may work today, but future Python versions
  that fix the "re" module will cause regressions. It is also annoying
  to write/understand regexes that only work in Python.

I can live with the other ones.
Thanks,
Mauro