From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:39445) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZrwjZ-0005HG-Q8 for qemu-devel@nongnu.org; Thu, 29 Oct 2015 19:35:54 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ZrwjV-0007Yj-QS for qemu-devel@nongnu.org; Thu, 29 Oct 2015 19:35:53 -0400 Received: from mx1.redhat.com ([209.132.183.28]:43599) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZrwjV-0007Yc-Ha for qemu-devel@nongnu.org; Thu, 29 Oct 2015 19:35:49 -0400 Received: from int-mx14.intmail.prod.int.phx2.redhat.com (int-mx14.intmail.prod.int.phx2.redhat.com [10.5.11.27]) by mx1.redhat.com (Postfix) with ESMTPS id 1F284B8BAB for ; Thu, 29 Oct 2015 23:35:48 +0000 (UTC) References: <1446122683-2355-1-git-send-email-armbru@redhat.com> <1446122683-2355-5-git-send-email-armbru@redhat.com> <56324CB8.5060303@redhat.com> <87io5psf7m.fsf@blackfin.pond.sub.org> From: Eric Blake Message-ID: <5632AD4A.8040504@redhat.com> Date: Thu, 29 Oct 2015 17:35:38 -0600 MIME-Version: 1.0 In-Reply-To: <87io5psf7m.fsf@blackfin.pond.sub.org> Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="5w9nO7mxK2sKdLpsRFJObfQ3EBu9b4HKU" Subject: Re: [Qemu-devel] [PATCH 4/4] json-streamer: Limit number of tokens in addition to total size List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Markus Armbruster Cc: qemu-devel@nongnu.org, lcapitulino@redhat.com This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --5w9nO7mxK2sKdLpsRFJObfQ3EBu9b4HKU Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On 10/29/2015 12:27 PM, Markus Armbruster wrote: >> Sounds like we have some quadratic (or worse) scaling in the parser. >> Worth fixing some day, but I also agree that we don't have to tackle i= t >> in this series. >=20 > I believe it's linear with a criminally negligent constant (several KiB= > per token). The first hog is actually fairly obvious: we use on QDict > per token. Wow. I just read through the code, and you're right. We are passing around QDicts right and left (one per token, with 4 keys, which is several mallocs per token), and then creating a QList of those tokens. Prior to commit 65c0f1e9, when we were really incrementing the refcount of each token on each level of nesting (as part of copying context, for definite quadratic behavior), the refcounts let us ensure tokens would be cleaned up at the end. But I'm hard pressed to see the refcount of tokens going above one in the current code, which means we aren't gaining anything by using QDict for reference counting. For that matter, JSON is quite linear - the code talks about needing to backtrack in different contexts, but JSON doesn't have ambiguities that need backtracking to try multiple different rules. It seems like the code is overengineered because it is copied from another language where backtracking to try several alternate parses actually makes sense. I suspect that using a C struct per token, and a C array of those structs, would go a long way towards alleviating memory abuse per token. Are you tackling that, or would you like me to take a stab at it while you work on flushing my pending qapi patches? >=20 >> I'm assuming you temporarily patched check-qjson to use larger constan= ts >> when you hit your ~100K token testing? Because I am definitely seeing= a >> lot of execution time spent on large_dict when running tests/check-qjs= on >> by hand, in relation to all the other tests of that file, but not >> minutes worth. Care to post the diff you played with? >=20 > I tested on a slow machine. I guess it was all the malloc pressure on a low-memory system that would make it so much slower than what I'm seeing, if you stuck with the default gen_test_json(gstr, 10, 100). --=20 Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org --5w9nO7mxK2sKdLpsRFJObfQ3EBu9b4HKU Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 Comment: Public key at http://people.redhat.com/eblake/eblake.gpg Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQEcBAEBCAAGBQJWMq1KAAoJEKeha0olJ0NqvCoIAKtpcNf6DxVqibWZC54xXMqH aMAflO0bWPcpijvon91w1lNY5Yem5/AJkGC5VJhUDFAd2XJYfs5bRRuTf/HEKI2G vbUM+4fDrLMQJhmT14aOB1ay/8UojaoonsZXRu3wgopuNcSRrrtqnXvNPH4Mmv0a 7N4RrV+4i+Zeh3gEO/xmGb+xqaMneKfnXUnAFrCveO2zcM78t6qpxQRCK9mRqh21 5Pdy8jIsCEodtzmpB3wwf+TVceEA3H+BnXqMv53Owf6rAQNI/fPHIHHMHgPlJUZd 2mZFwf4G7eF3DtSxWRGI7IpEbwcONEojtkOxQKLcObDrGIepz2spNUC5ZmaD5nY= =KfIY -----END PGP SIGNATURE----- --5w9nO7mxK2sKdLpsRFJObfQ3EBu9b4HKU--