From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:57504) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZtKV8-0003RX-8A for qemu-devel@nongnu.org; Mon, 02 Nov 2015 14:10:43 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ZtKV4-0003iU-V8 for qemu-devel@nongnu.org; Mon, 02 Nov 2015 14:10:42 -0500 Received: from mx1.redhat.com ([209.132.183.28]:51747) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZtKV4-0003iM-It for qemu-devel@nongnu.org; Mon, 02 Nov 2015 14:10:38 -0500 Received: from int-mx09.intmail.prod.int.phx2.redhat.com (int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22]) by mx1.redhat.com (Postfix) with ESMTPS id A43384C64A for ; Mon, 2 Nov 2015 19:10:37 +0000 (UTC) From: Markus Armbruster References: <5633C8EC.8030309@redhat.com> <874mh44wvs.fsf@blackfin.pond.sub.org> <56378572.5020203@redhat.com> Date: Mon, 02 Nov 2015 20:10:33 +0100 In-Reply-To: <56378572.5020203@redhat.com> (Eric Blake's message of "Mon, 2 Nov 2015 08:46:58 -0700") Message-ID: <87egg8nro5.fsf@blackfin.pond.sub.org> MIME-Version: 1.0 Content-Type: text/plain Subject: Re: [Qemu-devel] RFC: libyajl for JSON List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Eric Blake Cc: Luiz Capitulino , "qemu-devel@nongnu.org" , "Dr. David Alan Gilbert" Eric Blake writes: > On 11/02/2015 01:40 AM, Markus Armbruster wrote: >> Eric Blake writes: >> >>> Loaded question in response to >>> https://lists.gnu.org/archive/html/qemu-devel/2015-10/msg06988.html, but >> >> Discussion of our parser's enormous appetite for wasting RAM. Fixable, >> but it's work, and it's not its only defect. >> > >>> On the other hand, one of the "features" of qemu's hand-rolled json >>> parser is the ability to do qobject_from_jsonf("{'foo':%s}", "bar") >>> (that is, we extended JSON with our notion of single-quoted strings, and >>> with our notion of %s and similar escape sequences for piecing together >>> multiple inputs into a single input stream without having to first >>> g_strdup_printf our pieces into a single string). I don't know if >>> libyajl lets us add extensions to the language it parses. >> >> Actually two separate extensions: >> >> * Single-quoted strings >> >> The extension's purpose is avoiding quotes in C. Example: >> >> "{ 'execute': 'migrate_set_speed', 'arguments': { 'value': 10 } }" >> >> is easier to read than >> >> "{ \"execute\": \"migrate_set_speed\", \"arguments\": { \"value\": 10 } }" >> >> Most actual uses are in tests. > > Except that we have documented that QMP clients may use it; and indeed: > > $ printf "{'execute':'qmp_capabilities'}\n{'execute':'quit'}" | \ > ./x86_64-softmmu/qemu-system-x86_64 -nographic -nodefaults -qmp stdio > {"QMP": {"version": {"qemu": {"micro": 50, "minor": 4, "major": 2}, > "package": ""}, "capabilities": []}} > {"return": {}} > {"return": {}} > {"timestamp": {"seconds": 1446478389, "microseconds": 265886}, "event": > "SHUTDOWN"} Unnecessary mistake :( > So that is an absolute must for whatever parser we choose. My first > glance at libyajl is that it does NOT currently allow single quotes (not > even with any of its existing options), so we'd have to pre-process > input before feeding it to yajl. I'm not sure reusing an existing parser makes much sense if we need to preprocess anyway. After all, parsing JSON isn't exactly a moon shot, at least if you know what you're doing. >> JSON5, a proposed extension to JSON also supports single-quoted >> strings. So does Python. > > It would be an interesting task to see if yajl would accept patches to > parse JSON5 as additional options (for example, yajl already has options > on whether to diagnose or ignore UTF8 errors, and on whether to allow /* > */ javascript comments). But then we'd be requiring an > as-yet-unreleased version of libyajl rather than being able to rely on > what distros already have, at least for a few years; or we'd have to > carry new-enough yajl as a git submodule within qemu.git. > > I'll ask on the yajl mailing lists, to get a feel for community > receptiveness, before attempting to actually write patches on that front. Makes sense. >> * Optional interpolation >> >> If you pass a va_list to the parser, it replaces certain escape >> sequences by values from that va_list. The escape sequences are a >> subset of printf conversion specifiers, to enable compile-time >> checking with __attribute__((format...)). >> >> We used to rely on this for creating "rich" error objects, but those >> are long gone (commit df1e608). Perhaps two dozen uses are left. > > The testsuite is probably the biggest user, but we still have uses > throughout the code base. > > Based on my look at libyajl, I think we CAN get away with stream > interpolation - yajl maintains state such that you do NOT have to feed > it the entire stream in one go. So this one is not insurmountable; we > could rewrite our qobject_from_jsonf() to make multiple calls into yajl > without having to pre-format a single string. Sounds complicated... >> We could convert them to use "normal" interpolation, i.e. first format >> the arguments as JSON, then interpolate with g_strdup_printf() or >> similar, then parse. Formatting only to parse it right back is >> inelegant. Also inefficient, but that probably doesn't matter here. > > Or indeed, this is still an option. > >> >> The current interface could be retained as convenience function to >> interpolate and feed the result to the JSOn parser. >> >> Alternatively, we could build the QObject manually instead. More >> efficient than either kind of interpolation, but a good deal less >> readable. > > At any rate, it is certainly less of a show-stopper when compared to the > need for supporting single-quoted strings. > >> Got one more, actually a pitfall, not an extension: >> >> * Representation of JSON numbers >> >> RFC 7159 doesn't specify how numbers are to be represented. >> >> Many JSON implementations represent any JSON number as double. This >> can represent signed integers up to 53 bits exactly. >> >> Our parser represents numbers as int64_t when possible, else as >> double. >> >> Unlike the extensions above, this one is essential: any parser that >> can't do it is useless for us. Can yajl do it? > > Yes; the yajl parser has 4 modes of parse operation based on which of > three callbacks you implement: double-only (yajl_double), long long-only > (yajl_integer), double-and-int (both yajl_double and yajl_integer, not > sure which one has precedence if input satisfies both formats), or > custom number (yajl_number, which is given a string, and you turn it > into the format of your choice). Likewise, the yajl formatter has 2 > functions: yajl_gen_double() (formats doubles) and yajl_gen_number() > (you provide literal strings formatted how you like). Good. >> More extensions or pitfalls might be lurking in our parser. Therefore, >> replacing our parser by a suitable library is not without risk. I guess >> we could do it over a full development cycle. No way for 2.5. >> > > Absolutely not for 2.5; we've missed softfreeze, and this would count as > a new feature. > >> If we decide to attempt replacing it in the next development cycle, we >> might want to refrain from fixing its bugs except for true show >> stoppers. >>