From: "Daniel P. Berrange" <berrange@redhat.com>
To: Eric Blake <eblake@redhat.com>
Cc: qemu-devel@nongnu.org, "Alex Bennée" <alex.bennee@linaro.org>,
"Fam Zheng" <famz@redhat.com>,
"Philippe Mathieu-Daudé" <f4bug@amsat.org>,
"Markus Armbruster" <armbru@redhat.com>,
"Eduardo Habkost" <ehabkost@redhat.com>,
"Paolo Bonzini" <pbonzini@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v4 07/13] qapi: force a UTF-8 locale for running Python
Date: Mon, 15 Jan 2018 17:28:00 +0000
Message-ID: <20180115172800.GV8218@redhat.com>
In-Reply-To: <33212c79-61aa-3089-dda3-2715cd43caf0@redhat.com>
On Mon, Jan 15, 2018 at 11:15:01AM -0600, Eric Blake wrote:
> On 01/15/2018 11:02 AM, Daniel P. Berrange wrote:
> > Python2 did not validate locale correctness when reading input data, so
> > would happily read UTF-8 data in non-UTF-8 locales. Python3 is strict so
> > if you try to read UTF-8 data in the C locale, it will raise an error
> > for any UTF-8 bytes that aren't representable in 7-bit ascii encoding.
>
> Urgh, that sounds like a Python bug. The C locale is defined by POSIX to
> be 8-bit clean (ie. a superset of ascii with 256 characters, not strict
> ascii with only 128 characters and 128 bytes that form encoding errors).
> But that doesn't change the fact that we have to work around python's
> braindead misinterpretation of reality.
FYI there is some background on this behaviour here:
https://www.python.org/dev/peps/pep-0538/
NB that doc says the new C-is-UTF-8 assumption is for Python 3.7 or later,
but Fedora backported it to F27's Python 3.6 :-)
So on Fedora the failure can only be seen with Python 3.0 through 3.5. (BTW
you can install many Python 3.x versions concurrently on Fedora, which is
handy for testing.)
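If you want to see the failure mode without installing an old Python 3.x,
it is simply UTF-8 bytes being decoded with the ASCII codec. A minimal
sketch follows; try_decode() and the SAMPLE line are hypothetical, not taken
from qapi.py. 'ascii' rejects the 0xc3 lead byte (the same byte as in the
traceback quoted below), 'utf-8' decodes it, and an 8-bit-clean codec such
as latin-1 never raises but silently produces mojibake instead.

```python
def try_decode(data: bytes, encoding: str):
    """Decode data the way a text-mode read() would under `encoding`.

    Returns (text, None) on success, or (None, (byte_value, offset))
    identifying the first byte the codec rejected.
    """
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as err:
        # err.start is the offset of the first undecodable byte
        return None, (data[err.start], err.start)


# Hypothetical schema comment containing a non-ASCII character:
# U+00E9 is encoded in UTF-8 as the two bytes 0xc3 0xa9.
SAMPLE = "# Maintainer: Philippe Mathieu-Daud\u00e9\n".encode("utf-8")

if __name__ == "__main__":
    for codec in ("ascii", "utf-8", "latin-1"):
        text, error = try_decode(SAMPLE, codec)
        if error:
            print(f"{codec}: cannot decode byte {error[0]:#x} at position {error[1]}")
        else:
            print(f"{codec}: {text!r}")
```

The latin-1 case is roughly what an 8-bit-clean reading of the C locale
would give: no exception, but the two bytes come back as two separate
characters rather than one.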
> > e.g.
> >
> > UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 54: ordinal not in range(128)
> > Traceback (most recent call last):
> >   File "/tmp/qemu-test/src/scripts/qapi-commands.py", line 317, in <module>
> >     schema = QAPISchema(input_file)
> >   File "/tmp/qemu-test/src/scripts/qapi.py", line 1468, in __init__
> >     parser = QAPISchemaParser(open(fname, 'r'))
> >   File "/tmp/qemu-test/src/scripts/qapi.py", line 301, in __init__
> >     previously_included)
> >   File "/tmp/qemu-test/src/scripts/qapi.py", line 348, in _include
> >     exprs_include = QAPISchemaParser(fobj, previously_included, info)
> >   File "/tmp/qemu-test/src/scripts/qapi.py", line 271, in __init__
> >     self.src = fp.read()
> >   File "/usr/lib64/python3.5/encodings/ascii.py", line 26, in decode
> >     return codecs.ascii_decode(input, self.errors)[0]
> >
> > Many distros support a new C.UTF-8 locale that is like the C locale,
> > but with UTF-8 instead of 7-bit ASCII. That is not entirely portable
> > though, so this patch instead forces the en_US.UTF-8 locale, which
> > is pretty similar but more widely available.
> >
> > We set LANG, rather than only LC_CTYPE, since generated source ought
> > to be independant of all of the user's locale settings.
>
> s/independant/independent/
>
> LANG is the lowest-priority setting - if the user has explicitly set
> LC_CTYPE or LC_ALL, their settings override what is in LANG.
>
> >
> > This patch only forces UTF-8 for QAPI scripts, since that is the one
> > showing the immediate error under Python3 with C locale, but potentially
> > we ought to force this for all python scripts used in the build process.
> >
> > Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
> > ---
> > Makefile | 22 ++++++++++++----------
> > 1 file changed, 12 insertions(+), 10 deletions(-)
> >
> > diff --git a/Makefile b/Makefile
> > index d86ecd2dd4..fde91cc42d 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -17,6 +17,8 @@ ifneq ($(wildcard config-host.mak),)
> > all:
> > include config-host.mak
> >
> > +PYTHON_UTF8 = LANG=en_US.UTF-8 $(PYTHON)
>
> I'm worried that this is not reproducible in the face of a user that
> explicitly sets different locale env-vars with higher priority than LANG.
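You might remember a similar issue affecting libvirt-glib/libosinfo when
glib-mkenums was rewritten to use Python instead of Perl. For that I ended
up doing

LC_ALL= LANG=C LC_CTYPE=en_US.UTF-8

Clearing LC_ALL is the important part, since a non-empty LC_ALL overrides
LC_CTYPE; LANG=C then only acts as the fallback for any categories the user
has not set explicitly.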
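Both the concern about LANG and the glib-mkenums workaround come down to
the POSIX precedence for the LC_CTYPE category: a non-empty LC_ALL wins,
then a non-empty LC_CTYPE, then LANG. The sketch below models only that
lookup; it does not call setlocale(), the helper names and the
fr_FR.ISO-8859-1 user setting are made up for illustration, and the "C"
fallback when nothing is set is an assumption (it is implementation-defined,
though glibc uses "C").

```python
from typing import Dict, Mapping


def effective_ctype(env: Mapping[str, str]) -> str:
    """Return the locale governing LC_CTYPE for a process started with env.

    A variable set to the empty string is treated as unset, as POSIX
    specifies for LC_ALL, LC_* and LANG.
    """
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            return value
    return "C"  # assumed implementation default


def lang_only_env(base: Mapping[str, str]) -> Dict[str, str]:
    """Hypothetical helper: the env that PYTHON_UTF8 = LANG=... produces."""
    env = dict(base)
    env["LANG"] = "en_US.UTF-8"
    return env


def forced_ctype_env(base: Mapping[str, str]) -> Dict[str, str]:
    """Hypothetical helper: the env from LC_ALL= LANG=C LC_CTYPE=en_US.UTF-8."""
    env = dict(base)
    env.update(LC_ALL="", LANG="C", LC_CTYPE="en_US.UTF-8")
    return env


if __name__ == "__main__":
    # Hypothetical user environment that sets a higher-priority variable.
    user = {"LC_ALL": "fr_FR.ISO-8859-1"}
    print("LANG only:   ", effective_ctype(lang_only_env(user)))
    print("forced ctype:", effective_ctype(forced_ctype_env(user)))
```

With LANG alone, the user's LC_ALL still decides the encoding; with LC_ALL
cleared and LC_CTYPE set, the UTF-8 setting wins regardless of the caller.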
> > +
> > git-submodule-update:
> >
> > .PHONY: git-submodule-update
> > @@ -471,17 +473,17 @@ qapi-py = $(SRC_PATH)/scripts/qapi.py $(SRC_PATH)/scripts/ordereddict.py
> >
> > qga/qapi-generated/qga-qapi-types.c qga/qapi-generated/qga-qapi-types.h :\
> > $(SRC_PATH)/qga/qapi-schema.json $(SRC_PATH)/scripts/qapi-types.py $(qapi-py)
> > - $(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-types.py \
> > + $(call quiet-command,$(PYTHON_UTF8) $(SRC_PATH)/scripts/qapi-types.py \
>
> But once we agree on the right override to stuff into PYTHON_UTF8, the
> rest of the patch converting invocations to PYTHON_UTF8 makes sense.
Any thoughts on whether we should apply this more widely in our build,
to make its output predictable regardless of the user's locale?
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|