Re: [Qemu-devel] [PATCH] misc: Avoid UTF-8 in error messages

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: John Snow <jsnow@redhat.com>
To: Eric Blake <eblake@redhat.com>, qemu-devel@nongnu.org
Cc: qemu-trivial@nongnu.org, Markus Armbruster <armbru@redhat.com>
Subject: Re: [Qemu-devel] [PATCH] misc: Avoid UTF-8 in error messages
Date: Tue, 20 Nov 2018 17:43:04 -0500	[thread overview]
Message-ID: <53e365fc-bc5f-5a3a-0b96-68e308c1e5b7@redhat.com> (raw)
In-Reply-To: <1671d82c-da21-de1e-58c4-dd22696f9a62@redhat.com>



On 11/20/18 5:01 PM, Eric Blake wrote:
> [adding Markus in CC, since git didn't do it automatically from the
> 'Reported-by']
> 
> On 11/20/18 3:28 PM, John Snow wrote:
>>
>>
>> On 11/20/18 3:36 PM, Eric Blake wrote:
>>> While most developers are now using UTF-8 environments, it's
>>> harder to guarantee that error messages will be output to
>>> a multibyte locale. Rather than risking error messages that
>>> get corrupted into mojibake when the user runs qemu in a
>>> non-multibyte locale, let's stick to straight ASCII error
>>> messages, rather than assuming that our use of UTF-8 in source
>>> code string constants will work unchanged in other locales.
>>>
>>> Reported-by: Markus Armbruster <armbru@redhat.com>
>>> Signed-off-by: Eric Blake <eblake@redhat.com>
>>> ---
>>>   hw/misc/tmp105.c | 2 +-
>>>   hw/misc/tmp421.c | 2 +-
>>>   2 files changed, 2 insertions(+), 2 deletions(-)
> 
>>
>> Do we have any policy in place to prohibit this in the future?
>> (Presumably a policy that is automatic and won't interfere with QEMU
>> localization efforts which may rightly attempt to use UTF-8 for those
>> locales.)
> 
> Not that I know of.
> 
>>
>> Do you have a script or trick to find utf-8 containing strings in our
>> source?
> 
> Markus found these two, probably by reading over a list resulting from
> his claim of finding 217 out of 6455 files (53 of them binary, which
> don't count):
> https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg04017.html
> 
> My quick and dirty attempt, which does not quite reproduce his numbers:
> 
> $ LC_ALL=C git grep -l $'[\x80-\xff]' | wc
>     279     279    7490
> 
> Thus, by forcing a unibyte locale (where encoding errors are impossible)
> with sane range expressions (POSIX says only the C locale is required to
> interpret regex ranges according to byte value - all bets are off in
> other locales) and using $'' to type non-UTF-8 bytes into my search, I
> found 279 files with at least one byte outside of ASCII.  But the use of
> -l has no easy way to filter which of those files are binary; while
> dropping -l claims 2138 "lines" with non-ASCII, which gets tedious to
> scroll through, especially considering there ARE binary files in the mix.
> 
> Narrowing the search to a more specific pattern:
> 
> $ LC_ALL=C git grep $'".*[\x80-\xff].*"' | grep -v 'Binary file' | wc
>     129     685    8808
> 
> is a bit more manageable, with MOST of the hits in pc-bios/qemu.rsrc
> (false positive hits, due to interesting? comments), in po/ (which
> doesn't count), or in scripts/ for python.  And the proof for THIS patch:
> 
> $ LC_ALL=C git grep -l $'".*[\x80-\xff].*"' origin -- '**/*.[ch]' | cat
> origin:hw/misc/tmp105.c
> origin:hw/misc/tmp421.c
> 

Aha!

Thanks. I was surprised we've managed to introduce only two cases of
this so far. Your patch seems complete in this case.

Since I took the time to read along, I may as well:

Reviewed-by: John Snow <jsnow@redhat.com>

>>
>> Only curious, don't hold this patch up on my account. I'm not raising a
>> challenge.
> 
> Maybe checkpatch.pl could be taught to do a similar check?
>

next prev parent reply	other threads:[~2018-11-20 22:57 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-20 20:36 [Qemu-devel] [PATCH] misc: Avoid UTF-8 in error messages Eric Blake
2018-11-20 21:28 ` John Snow
2018-11-20 22:01   ` Eric Blake
2018-11-20 22:43     ` John Snow [this message]
2018-11-21 11:39     ` Philippe Mathieu-Daudé
2018-11-21 16:20       ` Markus Armbruster
2018-11-21  6:03 ` Thomas Huth
2018-11-22 12:27 ` [Qemu-devel] [Qemu-trivial] " Laurent Vivier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53e365fc-bc5f-5a3a-0b96-68e308c1e5b7@redhat.com \
    --to=jsnow@redhat.com \
    --cc=armbru@redhat.com \
    --cc=eblake@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-trivial@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).