From: Eric Blake <eblake@redhat.com>
To: Markus Armbruster <armbru@redhat.com>
Cc: qemu-devel@nongnu.org, lcapitulino@redhat.com
Subject: Re: [Qemu-devel] [PATCH 4/4] json-streamer: Limit number of tokens in addition to total size
Date: Thu, 29 Oct 2015 17:35:38 -0600 [thread overview]
Message-ID: <5632AD4A.8040504@redhat.com> (raw)
In-Reply-To: <87io5psf7m.fsf@blackfin.pond.sub.org>
[-- Attachment #1: Type: text/plain, Size: 2333 bytes --]
On 10/29/2015 12:27 PM, Markus Armbruster wrote:
>> Sounds like we have some quadratic (or worse) scaling in the parser.
>> Worth fixing some day, but I also agree that we don't have to tackle it
>> in this series.
>
> I believe it's linear with a criminally negligent constant (several KiB
> per token). The first hog is actually fairly obvious: we use on QDict
> per token.
Wow. I just read through the code, and you're right. We are passing
around QDicts right and left (one per token, with 4 keys, which is
several mallocs per token), and then creating a QList of those tokens.
Prior to commit 65c0f1e9, when we were really incrementing the refcount
of each token on each level of nesting (as part of copying context, for
definite quadratic behavior), the refcounts let us ensure tokens would
be cleaned up at the end. But I'm hard pressed to see the refcount of
tokens going above one in the current code, which means we aren't
gaining anything by using QDict for reference counting. For that
matter, JSON is quite linear - the code talks about needing to backtrack
in different contexts, but JSON doesn't have ambiguities that need
backtracking to try multiple different rules. It seems like the code is
overengineered because it is copied from another language where
backtracking to try several alternate parses actually makes sense.
I suspect that using a C struct per token, and a C array of those
structs, would go a long way towards alleviating memory abuse per token.
Are you tackling that, or would you like me to take a stab at it while
you work on flushing my pending qapi patches?
>
>> I'm assuming you temporarily patched check-qjson to use larger constants
>> when you hit your ~100K token testing? Because I am definitely seeing a
>> lot of execution time spent on large_dict when running tests/check-qjson
>> by hand, in relation to all the other tests of that file, but not
>> minutes worth. Care to post the diff you played with?
>
> I tested on a slow machine.
I guess it was all the malloc pressure on a low-memory system that would
make it so much slower than what I'm seeing, if you stuck with the
default gen_test_json(gstr, 10, 100).
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]
next prev parent reply other threads:[~2015-10-29 23:35 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-10-29 12:44 [Qemu-devel] [PATCH 0/4] json-streamer: Fix up code to limit nesting and size Markus Armbruster
2015-10-29 12:44 ` [Qemu-devel] [PATCH 1/4] json-streamer: Apply nesting limit more sanely Markus Armbruster
2015-10-29 16:22 ` Eric Blake
2015-10-29 12:44 ` [Qemu-devel] [PATCH 2/4] json-streamer: Don't crash when input exceeds nesting limit Markus Armbruster
2015-10-29 16:25 ` Eric Blake
2015-11-23 17:21 ` Markus Armbruster
2015-10-29 12:44 ` [Qemu-devel] [PATCH 3/4] check-qjson: Add test for JSON nesting depth limit Markus Armbruster
2015-10-29 16:36 ` Eric Blake
2015-10-29 18:33 ` Markus Armbruster
2015-10-29 12:44 ` [Qemu-devel] [PATCH 4/4] json-streamer: Limit number of tokens in addition to total size Markus Armbruster
2015-10-29 16:43 ` Eric Blake
2015-10-29 18:27 ` Markus Armbruster
2015-10-29 23:35 ` Eric Blake [this message]
2015-10-30 7:52 ` Markus Armbruster
2015-10-30 15:22 ` Eric Blake
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5632AD4A.8040504@redhat.com \
--to=eblake@redhat.com \
--cc=armbru@redhat.com \
--cc=lcapitulino@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).