References: <1448300659-23559-1-git-send-email-pbonzini@redhat.com>
	<1448300659-23559-3-git-send-email-pbonzini@redhat.com>
	<565353ED.1090502@redhat.com> <56537170.8070600@redhat.com>
From: Laszlo Ersek <lersek@redhat.com>
Message-ID: <5653766A.3090300@redhat.com>
Date: Mon, 23 Nov 2015 21:26:18 +0100
MIME-Version: 1.0
In-Reply-To: <56537170.8070600@redhat.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] [PATCH v2 2/4] qjson: do not save/restore contexts
To: Eric Blake <eblake@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>, qemu-devel@nongnu.org
Cc: armbru@redhat.com

On 11/23/15 21:05, Eric Blake wrote:
> On 11/23/2015 10:59 AM, Laszlo Ersek wrote:
>> On 11/23/15 18:44, Paolo Bonzini wrote:
>>> JSON is LL(1) and our parser indeed needs only 1 token lookahead.
>>> Saving the parser context is mostly unnecessary; we can replace it
>>> with peeking at the next token, or remove it altogether when the
>>> restore only happens on errors.  The token list is destroyed anyway
>>> on errors.
>>>
>>> The only interesting thing is that parse_keyword always eats
>>> a TOKEN_KEYWORD, even if it is invalid, so it must come last in
>>> parse_value (otherwise, NULL is returned, parse_literal is invoked
>>> and it tries to peek beyond end of input).  This is caught by
>>> /errors/unterminated/literal, which actually checks for an unterminated
>>> keyword. ಠ_ಠ
>>
>> Is it accepted practice to put UTF-8 in commit messages? (Or, actually,
>> anywhere in patches, except maybe the notes section?)
>>
>
> Git handles UTF-8 just fine (and for any other encoding, properly
> transmitted in the email, git transcodes to UTF-8 before writing it into
> the repository).
>

Yes, I know. I use latin2:

$ locale

LANG=
LC_CTYPE=hu_HU.ISO8859-2
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

and from my git config:

[i18n]
	logOutputEncoding = latin2
	commitencoding = latin2

This works very well (as long as it doesn't choke on something outside
of latin2): both the glibc locale support and git are doing their jobs
perfectly fine. My question concerned any other users who decided to
stay with single-byte encodings (with an ASCII subset).
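A quick way to see where a latin2 setup like the one above stops working: the sketch below uses Python's `iso8859_2` codec as a stand-in for the hu_HU.ISO8859-2 locale and checks whether a string is representable in it. Python and the helper name `fits_latin2` are illustrative assumptions; this is not how glibc or git perform the conversion internally.

```python
def fits_latin2(text: str) -> bool:
    """Return True if every character of text has a byte in ISO 8859-2 (latin2).

    Hypothetical helper: Python's "iso8859_2" codec stands in for the
    hu_HU.ISO8859-2 locale; glibc and git use their own conversion code.
    """
    try:
        text.encode("iso8859_2")
    except UnicodeEncodeError:
        return False  # at least one code point has no latin2 byte
    return True


samples = {
    "plain ASCII": "qjson: do not save/restore contexts",
    "Hungarian": "árvíztűrő tükörfúrógép",  # accented letters latin2 was made for
    "disapproval": "ಠ_ಠ",                  # Kannada letter, outside latin2
}

for label, text in samples.items():
    print(label, fits_latin2(text))
```

The ASCII and Hungarian samples fit, since ISO 8859-2 covers ASCII plus Central European letters such as ő and ű; the ಠ_ಠ from the commit message quoted above does not, which is exactly the case where a single-byte setup chokes.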

(I believe that RFCs stick with ASCII to this day, and I also think that
our source code and docs/ should stick with ASCII; but I know I can't
plausibly argue for the same in commit messages, since I'm presumably
alone in that anyway.

BTW I should have written "non-ASCII Unicode code points" in my original
question, rather than "UTF-8".)
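To make the code point vs. encoding distinction concrete, here is a small Python sketch (Python and the helper name `describe` are illustrative assumptions, not anything from this thread). It prints each character's Unicode code point next to its UTF-8 bytes and its latin2 byte, if one exists: ő is a non-ASCII code point that both encodings can carry, while ಠ exists only in the Unicode-based encodings.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Return the code point, Unicode name, and UTF-8 / latin2 bytes of ch.

    Hypothetical helper for illustration; "latin2" is None when ISO 8859-2
    has no byte for the code point.
    """
    try:
        latin2 = ch.encode("iso8859_2").hex().upper()
    except UnicodeEncodeError:
        latin2 = None  # code point not representable in latin2
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        "utf8": ch.encode("utf-8").hex().upper(),
        "latin2": latin2,
    }


for ch in ("a", "ő", "ಠ"):
    print(describe(ch))
```

The same code point U+0151 (ő) becomes two bytes in UTF-8 but one byte in latin2, so "non-ASCII code points" and "UTF-8" are really separate questions.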

Thanks!
Laszlo