* [PATCH] docs: kfigure.py: don't crash during read/write
From: Mauro Carvalho Chehab @ 2025-08-20 9:09 UTC
To: Jonathan Corbet, Linux Doc Mailing List
Cc: Mauro Carvalho Chehab, Mauro Carvalho Chehab, Kees Cook,
linux-kernel

By default, Python does a very bad job when reading from and writing
to files, as it tries to enforce that every character is < 128.
Nothing prevents an SVG file from containing, for instance, a comment
with a UTF-8 accented copyright notice, or even an invalid UTF-8
character.

While testing PDF and HTML builds, I recently hit a build
that failed in kfigure.py with an error saying that a character was > 128,
crashing the PDF output.

To avoid such issues, let's use the PEP 383 surrogateescape error
handler to prevent read/write errors in such cases.

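A minimal Python sketch of what the surrogateescape handler does, for readers unfamiliar with PEP 383. The sample bytes (a UTF-8 comment followed by a stray 0xFF byte) are made up for illustration and are not taken from any file in the tree:

```python
# Sketch of the PEP 383 "surrogateescape" error handler.
# The sample data is illustrative only.

# A UTF-8 comment with an accented name, followed by a stray 0xFF byte,
# which can never appear in valid UTF-8.
raw = "<!-- \u00a9 Jos\u00e9 -->".encode("utf-8") + b"\xff"

# Strict decoding (the default error handler) rejects the stray byte.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict decode failed:", exc.reason)

# surrogateescape maps each undecodable byte 0xNN (0x80..0xFF) to the
# lone surrogate U+DCNN instead of raising.
text = raw.decode("utf-8", errors="surrogateescape")
print(repr(text[-1]))

# Encoding with the same handler turns the surrogate back into the
# original byte, so the round trip is lossless.
assert text.encode("utf-8", errors="surrogateescape") == raw
```

Decoding no longer raises, and encoding with the same handler restores the original bytes, so content that kfigure.py only passes through would come out byte-for-byte unchanged.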
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
Documentation/sphinx/kfigure.py | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/Documentation/sphinx/kfigure.py b/Documentation/sphinx/kfigure.py
index ad495c0da270..8ba07344a1c8 100644
--- a/Documentation/sphinx/kfigure.py
+++ b/Documentation/sphinx/kfigure.py
@@ -88,7 +88,7 @@ def mkdir(folder, mode=0o775):
         os.makedirs(folder, mode)
 
 def file2literal(fname):
-    with open(fname, "r") as src:
+    with open(fname, "r", encoding='utf8', errors='surrogateescape') as src:
         data = src.read()
         node = nodes.literal_block(data, data)
     return node
@@ -355,7 +355,7 @@ def dot2format(app, dot_fname, out_fname):
     cmd = [dot_cmd, '-T%s' % out_format, dot_fname]
     exit_code = 42
 
-    with open(out_fname, "w") as out:
+    with open(out_fname, "w", encoding='utf8', errors='surrogateescape') as out:
         exit_code = subprocess.call(cmd, stdout = out)
         if exit_code != 0:
             logger.warning(
@@ -533,7 +533,7 @@ def visit_kernel_render(self, node):
     literal_block = node[0]
 
     code = literal_block.astext()
-    hashobj = code.encode('utf-8') # str(node.attributes)
+    hashobj = code.encode('utf-8', errors='surrogateescape') # str(node.attributes)
     fname = path.join('%s-%s' % (srclang, sha1(hashobj).hexdigest()))
 
     tmp_fname = path.join(
@@ -541,7 +541,7 @@ def visit_kernel_render(self, node):
 
     if not path.isfile(tmp_fname):
         mkdir(path.dirname(tmp_fname))
-        with open(tmp_fname, "w") as out:
+        with open(tmp_fname, "w", encoding='utf8', errors='surrogateescape') as out:
             out.write(code)
 
     img_node = nodes.image(node.rawsource, **node.attributes)
--
2.50.1
* Re: [PATCH] docs: kfigure.py: don't crash during read/write
From: Jonathan Corbet @ 2025-08-20 12:42 UTC
To: Mauro Carvalho Chehab, Linux Doc Mailing List
Cc: Mauro Carvalho Chehab, Mauro Carvalho Chehab, Kees Cook,
linux-kernel
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:
> By default, Python does a very bad job when reading from and writing
> to files, as it tries to enforce that every character is < 128.
> Nothing prevents an SVG file from containing, for instance, a comment
> with a UTF-8 accented copyright notice, or even an invalid UTF-8
> character.
Do you have a locale that expects everything to be ASCII? This seems a
bit weird. I would expect utf8 to work by default these days.
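One way to reproduce the kind of failure described in the patch without changing the system locale is to pass encoding="ascii" explicitly, standing in for an ASCII-only locale default. That the reporter's environment was ASCII-only is an assumption; the thread does not confirm it. The .dot contents and the read_with() helper below are made up for illustration:

```python
import locale
import os
import tempfile

# Roughly the encoding open() falls back to when encoding= is omitted
# (UTF-8 Mode, enabled with "python -X utf8" or PYTHONUTF8=1, forces UTF-8).
print("default text encoding:", locale.getpreferredencoding(False))


def read_with(fname, encoding):
    """Hypothetical helper: read fname as text, returning the str or the
    UnicodeDecodeError that was raised."""
    try:
        with open(fname, "r", encoding=encoding) as src:
            return src.read()
    except UnicodeDecodeError as exc:
        return exc


# Made-up .dot file with a UTF-8 accented copyright comment.
dot = "// Copyright \u00a9 2025 Jos\u00e9\ndigraph G { a -> b }\n"
with tempfile.NamedTemporaryFile("wb", suffix=".dot", delete=False) as tmp:
    tmp.write(dot.encode("utf-8"))
    fname = tmp.name

as_ascii = read_with(fname, "ascii")   # stands in for an ASCII-only locale
as_utf8 = read_with(fname, "utf-8")    # explicit encoding, as in the patch
os.unlink(fname)

print(type(as_ascii).__name__, "-", getattr(as_ascii, "reason", ""))
print("utf-8 read matches:", as_utf8 == dot)
```

The ascii codec reports "ordinal not in range(128)", which would be consistent with the "char was > 128" symptom mentioned in the patch, while the explicit UTF-8 read succeeds.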
> While testing PDF and HTML builds, I recently hit a build
> that failed in kfigure.py with an error saying that a character was > 128,
> crashing the PDF output.
>
> To avoid such issues, let's use the PEP 383 surrogateescape error
> handler to prevent read/write errors in such cases.
Being explicit about utf8 is good...but where are the errors coming
from? Is this really a utf8 file?
Thanks,
jon
* Re: [PATCH] docs: kfigure.py: don't crash during read/write
From: Mauro Carvalho Chehab @ 2025-08-20 15:48 UTC
To: Jonathan Corbet
Cc: Mauro Carvalho Chehab, Linux Doc Mailing List, linux-kernel
On Wed, Aug 20, 2025 at 06:42:29AM -0600, Jonathan Corbet wrote:
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:
>
> > By default, Python does a very bad job when reading from and writing
> > to files, as it tries to enforce that every character is < 128.
> > Nothing prevents an SVG file from containing, for instance, a comment
> > with a UTF-8 accented copyright notice, or even an invalid UTF-8
> > character.
>
> Do you have a locale that expects everything to be ASCII? This seems a
> bit weird. I would expect utf8 to work by default these days.
>
> > While testing PDF and HTML builds, I recently hit a build
> > that failed in kfigure.py with an error saying that a character was > 128,
> > crashing the PDF output.
> >
> > To avoid such issues, let's use the PEP 383 surrogateescape error
> > handler to prevent read/write errors in such cases.
>
> Being explicit about utf8 is good...but where are the errors coming
> from? Is this really a utf8 file?
Unfortunately, I forgot to take a note when I got the error...
heh, I almost forgot to write/submit this one, too ;-)

Still, note that kfigure.py reads .dot and .svg files, and both may contain
UTF-8 characters in strings. For instance, they may have an accent inside
a copyright comment, a Greek letter, a math symbol, ...

So, IMO we should change the reads to use an explicit encoding, with a
fallback like PEP 383.
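The "explicit encoding with a PEP 383 fallback" idea could look roughly like the sketch below. read_source() is a hypothetical helper, not part of kfigure.py: it tries strict UTF-8 first and only falls back to surrogateescape, with a logged warning, when the file is not valid UTF-8. The SVG bytes are made up:

```python
import logging
import os
import tempfile

logger = logging.getLogger(__name__)


def read_source(fname):
    """Hypothetical helper: return the contents of fname as str.

    Tries strict UTF-8 first; if that fails, logs a warning and falls
    back to PEP 383 surrogateescape so the build does not crash.
    Unlike text-mode open(), no newline translation is done here.
    """
    with open(fname, "rb") as src:
        raw = src.read()
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        logger.warning("%s: not valid UTF-8 (%s), escaping bad bytes",
                       fname, exc.reason)
        return raw.decode("utf-8", errors="surrogateescape")


# Demo: an SVG whose comment contains a lone Latin-1 0xA9 byte, which
# is not valid UTF-8 on its own.
svg = b"<!-- Copyright \xa9 2025 -->\n<svg/>\n"
with tempfile.NamedTemporaryFile("wb", suffix=".svg", delete=False) as tmp:
    tmp.write(svg)
    name = tmp.name

text = read_source(name)
os.unlink(name)

# The bad byte became U+DCA9; encoding with the same handler restores it.
assert text.encode("utf-8", errors="surrogateescape") == svg
print(repr(text.splitlines()[0]))
```

Trying strict decoding first keeps well-formed files on the normal path and makes the fallback visible in the build log, rather than silently accepting any byte sequence.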

Now, I did a treewide git grep on the svg and dot files. Currently,
they're all ASCII-only.
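The grep check above could also be scripted. The sketch below is a hypothetical helper (classify() and scan() are illustrative names, not existing kernel tooling) that walks a directory and reports .svg and .dot files that are not pure ASCII, separating valid UTF-8 from invalid byte sequences:

```python
import os


def classify(data):
    """Classify a file's bytes as 'ascii', 'utf-8' or 'invalid'."""
    if data.isascii():
        return "ascii"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "invalid"
    return "utf-8"


def scan(root, exts=(".svg", ".dot")):
    """Yield (path, kind) for each non-ASCII file under root ending in exts."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip git metadata; sorting keeps the output order stable.
        dirnames[:] = sorted(d for d in dirnames if d != ".git")
        for name in sorted(filenames):
            if not name.endswith(exts):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                kind = classify(f.read())
            if kind != "ascii":
                yield path, kind
```

For example, iterating over scan(".") from the top of a kernel tree and printing each (path, kind) pair would list any offending files; going by the grep above, it would be expected to print nothing today.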

That said, I guess the error I got was during a write. This script
writes in "w" (text) mode instead of "wb" (binary) mode; that code dates
from the Python 2.7 days, when Python followed the typical Linux
conventions for writes.
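To illustrate the "w" versus "wb" distinction, the sketch below writes the same string both ways with an explicit UTF-8 encoding and compares the resulting bytes. The two writer functions are hypothetical; the point is that neither depends on the locale once the encoding is spelled out:

```python
import os
import tempfile


def write_text_mode(fname, text):
    # "w" with an explicit encoding: the locale no longer matters.
    # newline="" turns off "\n" -> os.linesep translation (only relevant
    # on Windows) so the bytes match the binary-mode writer exactly.
    with open(fname, "w", encoding="utf-8", errors="surrogateescape",
              newline="") as out:
        out.write(text)


def write_binary_mode(fname, text):
    # "wb": Python does no encoding of its own, so encode explicitly.
    with open(fname, "wb") as out:
        out.write(text.encode("utf-8", errors="surrogateescape"))


# Made-up content: an accented label plus an escaped stray byte (U+DCFF),
# as a surrogateescape read would produce.
code = 'digraph G { label="Jos\u00e9" }\n// stray byte: \udcff\n'

results = []
for writer in (write_text_mode, write_binary_mode):
    fd, path = tempfile.mkstemp(suffix=".dot")
    os.close(fd)
    writer(path, code)
    with open(path, "rb") as f:
        results.append(f.read())
    os.unlink(path)

print("identical bytes:", results[0] == results[1])
```

Either form would avoid the locale-dependent default; binary mode simply makes the encoding step explicit at the call site.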
Anyway, let's not apply this one for now. It will require extra
changes.
I'll return to this when I have some time.
--
Thanks,
Mauro