* Re: [PATCH v2 2/5] docs: automarkup.py: Fix regexes to solve sphinx 3 warnings
@ 2020-10-14 20:09 Nícolas F. R. A. Prado
2020-10-14 20:16 ` Jonathan Corbet
0 siblings, 1 reply; 5+ messages in thread
From: Nícolas F. R. A. Prado @ 2020-10-14 20:09 UTC
To: Jonathan Corbet
Cc: Mauro Carvalho Chehab, linux-doc, linux-kernel, lkcamp,
andrealmeid
On Wed Oct 14, 2020 at 4:11 PM -03, Jonathan Corbet wrote:
>
> On Tue, 13 Oct 2020 23:13:17 +0000
> Nícolas F. R. A. Prado <nfraprado@protonmail.com> wrote:
>
> > The warnings were caused by the expressions matching words in the
> > translated versions of the documentation, since any unicode character
> > was matched.
> >
> > Fix the regular expression by making the C regexes use ASCII
>
> I don't quite understand this part, can you give an example of the kinds
> of warnings you were seeing?
Hi Jon,
sure.
One I had noted down was:
WARNING: Unparseable C cross-reference: '调用debugfs_rename'
which I believe occurred in the Chinese translation.
I think the problem is that Chinese text normally has no spaces between
words, so even if I had made the regexes match only at the beginning of a word
(which I hadn't, but this patch fixes that with the \b), they would still try
to cross-reference a symbol containing Chinese characters, which Sphinx
cannot parse.
So, since valid C identifiers are ASCII-only anyway, I used the ASCII flag
to make \w and \d match only ASCII characters; otherwise they match any Unicode
character.
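A minimal sketch with Python's `re` module reproducing the behaviour described above, using the old and new `RE_function` patterns from the patch. `BOUNDARY_ONLY` is a hypothetical intermediate pattern (the old character classes plus `\b`), and the input string is an assumed reconstruction of the text behind the warning:

```python
import re

# Old pattern: Unicode \w, no word-boundary anchor.
OLD_RE_function = re.compile(r'(([\w_][\w\d_]+)\(\))')
# Hypothetical: adding \b alone. In Unicode mode CJK characters are still
# word characters, so the match still starts at the first Chinese character.
BOUNDARY_ONLY = re.compile(r'\b(([\w_][\w\d_]+)\(\))')
# New pattern from the patch: \b plus re.ASCII.
NEW_RE_function = re.compile(r'\b(([a-zA-Z_]\w+)\(\))', flags=re.ASCII)

# Assumed sample: Chinese text with no space before the identifier
# ("调用" means "call").
text = '调用debugfs_rename()'

print(OLD_RE_function.search(text).group(2))   # 调用debugfs_rename
print(BOUNDARY_ONLY.search(text).group(2))     # 调用debugfs_rename
print(NEW_RE_function.search(text).group(2))   # debugfs_rename
```

With re.ASCII, `\b` also treats the CJK characters as non-word characters, which is why the new pattern finds a boundary right before `debugfs_rename`.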
If you want to look at other warnings or more complete output, let me know
and I will recompile those versions. That warning was the only thing I noted
down, but I think it gives a good idea of the problem.
Thanks,
Nícolas
>
> Thanks,
>
> jon
* Re: [PATCH v2 2/5] docs: automarkup.py: Fix regexes to solve sphinx 3 warnings
2020-10-14 20:09 [PATCH v2 2/5] docs: automarkup.py: Fix regexes to solve sphinx 3 warnings Nícolas F. R. A. Prado
@ 2020-10-14 20:16 ` Jonathan Corbet
2020-10-15 6:31 ` Mauro Carvalho Chehab
0 siblings, 1 reply; 5+ messages in thread
From: Jonathan Corbet @ 2020-10-14 20:16 UTC
To: Nícolas F. R. A. Prado
Cc: Mauro Carvalho Chehab, linux-doc, linux-kernel, lkcamp,
andrealmeid
On Wed, 14 Oct 2020 20:09:10 +0000
Nícolas F. R. A. Prado <nfraprado@protonmail.com> wrote:
> One I had noted down was:
>
> WARNING: Unparseable C cross-reference: '调用debugfs_rename'
>
> which I believe occurred in the chinese translation.
>
> I think the problem is that in chinese there normally isn't space between the
> words, so even if I had made the regexes only match the beginning of the word
> (which I didn't, but I fixed this in this patch with the \b), it would still try
> to cross-reference to that symbol containing chinese characters, which is
> unparsable to sphinx.
>
> So since valid identifiers in C are only in ASCII anyway, I used the ASCII flag
> to make \w, and \d only match ASCII characters, otherwise they match any unicode
> character.
OK, this all makes sense, as does your fix. The one thing I would ask
would be to put that warning into the changelog for future reference.
Thanks,
jon
* Re: [PATCH v2 2/5] docs: automarkup.py: Fix regexes to solve sphinx 3 warnings
2020-10-14 20:16 ` Jonathan Corbet
@ 2020-10-15 6:31 ` Mauro Carvalho Chehab
0 siblings, 0 replies; 5+ messages in thread
From: Mauro Carvalho Chehab @ 2020-10-15 6:31 UTC
To: Jonathan Corbet
Cc: Nícolas F. R. A. Prado, linux-doc, linux-kernel, lkcamp,
andrealmeid
Em Wed, 14 Oct 2020 14:16:16 -0600
Jonathan Corbet <corbet@lwn.net> escreveu:
> On Wed, 14 Oct 2020 20:09:10 +0000
> Nícolas F. R. A. Prado <nfraprado@protonmail.com> wrote:
>
> > One I had noted down was:
> >
> > WARNING: Unparseable C cross-reference: '调用debugfs_rename'
> >
> > which I believe occurred in the chinese translation.
> >
> > I think the problem is that in chinese there normally isn't space between the
> > words, so even if I had made the regexes only match the beginning of the word
> > (which I didn't, but I fixed this in this patch with the \b), it would still try
> > to cross-reference to that symbol containing chinese characters, which is
> > unparsable to sphinx.
> >
> > So since valid identifiers in C are only in ASCII anyway, I used the ASCII flag
> > to make \w, and \d only match ASCII characters, otherwise they match any unicode
> > character.
>
> OK, this all makes sense, as does your fix. The one thing I would ask
> would be to put that warning into the changelog for future reference.
Yesterday I added patches 1 to 4 of Nícolas' series to my -next tree:
https://git.linuxtv.org/mchehab/media-next.git/log/
Today, I changed the changelog to better describe the ASCII issue:
https://git.linuxtv.org/mchehab/media-next.git/commit/?id=f66e47f98c1e827a85654a8cfa1ba539bb381a1b
If this is enough, I'll likely send the PR to Linus later today or tomorrow,
depending on the -next merge results.
Patch 5 can be added later, after we find a way to keep it safe for
parallel reading.
Thanks,
Mauro
* [PATCH v2 0/5] docs: automarkup.py: Make automarkup ready for Sphinx 3.1+
@ 2020-10-13 23:13 Nícolas F. R. A. Prado
2020-10-13 23:13 ` [PATCH v2 2/5] docs: automarkup.py: Fix regexes to solve sphinx 3 warnings Nícolas F. R. A. Prado
0 siblings, 1 reply; 5+ messages in thread
From: Nícolas F. R. A. Prado @ 2020-10-13 23:13 UTC
To: Jonathan Corbet, Mauro Carvalho Chehab
Cc: linux-doc, linux-kernel, lkcamp, andrealmeid
Hi,
this patch series makes the automatic markup extension ready for Sphinx 3.1+.
It is based on Mauro's Sphinx patch series and requires it for the namespaces
to work, but it could also be merged through the docs tree without regressions
(other than the increased build time explained below).
The first three patches make automarkup compatible with Sphinx 3.1. The first
patch makes use of the new C roles in Sphinx 3 instead of the generic type role
from Sphinx 2, while patches 2 and 3 solve the warnings caused by Sphinx 3's
stricter C domain.
Patch 4 adds cross-referencing to C macros with parameters for Sphinx 3.
Patch 5 enables cross-referencing inside C namespaces, which are new to Sphinx
3.1.
One important note:
In order to be able to support automatic cross-referencing inside C namespaces,
I needed to disable parallel source reading for Sphinx in patch 5. On my
machine, this makes the build process take about 4 additional minutes. This is
very bad, since the documentation building process already takes too long, but I
couldn't think of a way to sidestep this issue. If anyone has any idea, it would
be greatly appreciated.
Also, for some reason, disabling the source read parallelization makes
Sphinx output 2 warnings saying so, which is another annoyance.
Thanks,
Nícolas
Changes in v2:
- Split the single patch into patches 1, 2 and 3
- Change sphinx version verification in patch 1
- Thanks Mauro for the clarifications in v1:
- Add patches 4 and 5 for the missing functionalities
Nícolas F. R. A. Prado (5):
docs: automarkup.py: Use new C roles in Sphinx 3
docs: automarkup.py: Fix regexes to solve sphinx 3 warnings
docs: automarkup.py: Skip C reserved words when cross-referencing
docs: automarkup.py: Add cross-reference for parametrized C macros
docs: automarkup.py: Allow automatic cross-reference inside C
namespace
Documentation/sphinx/automarkup.py | 188 +++++++++++++++++++++++------
1 file changed, 154 insertions(+), 34 deletions(-)
--
2.28.0
* [PATCH v2 2/5] docs: automarkup.py: Fix regexes to solve sphinx 3 warnings
2020-10-13 23:13 [PATCH v2 0/5] docs: automarkup.py: Make automarkup ready for Sphinx 3.1+ Nícolas F. R. A. Prado
@ 2020-10-13 23:13 ` Nícolas F. R. A. Prado
2020-10-14 19:11 ` Jonathan Corbet
0 siblings, 1 reply; 5+ messages in thread
From: Nícolas F. R. A. Prado @ 2020-10-13 23:13 UTC
To: Jonathan Corbet, Mauro Carvalho Chehab
Cc: linux-doc, linux-kernel, lkcamp, andrealmeid
With the transition to Sphinx 3, new warnings were generated by
automarkup, exposing bugs in the regexes.
The warnings were caused by the expressions matching words in the
translated versions of the documentation, since any unicode character
was matched.
Fix the regular expressions by making the C regexes use ASCII-only matching and
ensuring the expressions match only at the beginning of words.
Signed-off-by: Nícolas F. R. A. Prado <nfraprado@protonmail.com>
---
Documentation/sphinx/automarkup.py | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/Documentation/sphinx/automarkup.py b/Documentation/sphinx/automarkup.py
index db13fb15cedc..43dd9025fc77 100644
--- a/Documentation/sphinx/automarkup.py
+++ b/Documentation/sphinx/automarkup.py
@@ -22,12 +22,13 @@ from itertools import chain
# :c:func: block (i.e. ":c:func:`mmap()`s" flakes out), so the last
# bit tries to restrict matches to things that won't create trouble.
#
-RE_function = re.compile(r'(([\w_][\w\d_]+)\(\))')
+RE_function = re.compile(r'\b(([a-zA-Z_]\w+)\(\))', flags=re.ASCII)
#
# Sphinx 2 uses the same :c:type role for struct, union, enum and typedef
#
-RE_generic_type = re.compile(r'(struct|union|enum|typedef)\s+([\w_][\w\d_]+)')
+RE_generic_type = re.compile(r'\b(struct|union|enum|typedef)\s+([a-zA-Z_]\w+)',
+ flags=re.ASCII)
#
# Sphinx 3 uses a different C role for each one of struct, union, enum and
@@ -42,7 +43,7 @@ RE_typedef = re.compile(r'\b(typedef)\s+([a-zA-Z_]\w+)', flags=re.ASCII)
# Detects a reference to a documentation page of the form Documentation/... with
# an optional extension
#
-RE_doc = re.compile(r'Documentation(/[\w\-_/]+)(\.\w+)*')
+RE_doc = re.compile(r'\bDocumentation(/[\w\-_/]+)(\.\w+)*')
#
# Many places in the docs refer to common system calls. It is
--
2.28.0
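A quick sanity check of the patched patterns, copied from the `+` lines of the diff; the sample strings are made up for illustration. Note that `RE_doc` only gains the `\b` anchor, not re.ASCII:

```python
import re

# Patterns copied from the "+" lines of the diff.
RE_function = re.compile(r'\b(([a-zA-Z_]\w+)\(\))', flags=re.ASCII)
RE_generic_type = re.compile(r'\b(struct|union|enum|typedef)\s+([a-zA-Z_]\w+)',
                             flags=re.ASCII)
RE_doc = re.compile(r'\bDocumentation(/[\w\-_/]+)(\.\w+)*')

print(RE_function.search('call kmalloc() here').group(2))  # kmalloc

# A keyword glued to CJK text is still found: in ASCII mode the CJK
# characters are non-word characters, so \b matches before "struct".
print(RE_generic_type.search('使用struct file_operations').groups())
# ('struct', 'file_operations')

# A keyword embedded in a longer ASCII word no longer matches.
print(RE_generic_type.search('mystruct foo'))  # None

# A non-ASCII "identifier" is rejected instead of producing an
# unparseable cross-reference.
print(RE_generic_type.search('struct 结构'))  # None

# Documentation paths must now start at a word boundary.
print(RE_doc.search('see Documentation/process/howto.rst').groups())
# ('/process/howto', '.rst')
print(RE_doc.search('myDocumentation/foo'))  # None
```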