Re: [RFC PATCH] qobject: Rewrite implementation of QDict for in-order traversal

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: "Daniel P. Berrangé" <berrange@redhat.com>
To: Markus Armbruster <armbru@redhat.com>
Cc: qemu-devel@nongnu.org, kwolf@redhat.com, hreitz@redhat.com,
	peter.maydell@linaro.org
Subject: Re: [RFC PATCH] qobject: Rewrite implementation of QDict for in-order traversal
Date: Thu, 7 Jul 2022 16:37:56 +0100	[thread overview]
Message-ID: <Ysb91P/X8WO7y3h+@redhat.com> (raw)
In-Reply-To: <20220705095421.2455041-1-armbru@redhat.com>

On Tue, Jul 05, 2022 at 11:54:21AM +0200, Markus Armbruster wrote:
> QDict is implemented as a simple hash table of fixed size.  Observe:
> 
> * Slow for large n.  Not sure this matters.

I presume you're referring qdict_find() here, which would
ideally be O(1).

Our bucket size is 512, so for hash tables less than say
2000, it is close enough to O(1) that it likely doesn't
matter (except for our deterministic hash function which
can be abused to overfill specific buckets).

Ignoring the latter attack though, the fixed hash bucket
count isn't likely a speed issue for normal usage as our
QDict element counts are just not that big typically. So
it is mostly a memory wastage issue.

Historically QEMU's JSON input has come from sources that
are more trusted than QEMU itself, so didn't matter. As
we split up QEMU into co-operating processes with potentially
varying privileges, this may cease to be a safe assumption.

For pre-emptive robustness though I'd favour a guaranteed
O(1) impl, which would mean a dynamically resizing bucket
count, along with a non-deterministic (ideally cryptographically
strong) key hash function.

> * A QDict with n entries takes 4120 + n * 32 bytes on my box.  Wastes
>   space for small n, which is a common case.

So effectively 8k usage for every QDict instance at a minimum.
This is not so great with widespread QDict usage.

> * Order of traversal depends on the hash function and on insertion
>   order, because it iterates first over buckets, then collision
>   chains.
> 
> * Special code ensures qdict_size() takes constant time.
> 
> Replace the hash table by a linked list.  Observe:
> 
> * Even slower for large n.  Might be bad enough to matter.

Guaranteed O(n) every time, even for small values of 'n'.
Just feels like a bad idea to me.

> * A QDict with n entries takes 32 + n * 24 bytes.
> 
> * Traversal is in insertion order.
> 
> * qdict_size() is linear in the number of entries.
> 
> This is an experiment.  Do not commit to master as is.

Two alternative ideas.

 * Implement it is both a hashtable and a linked list.
   Hashtable to get O(1) lookup, linked list to get
   stable iteration order based on insertion order.
   Makes the insert/delete operations more expensive,
   and slightly greater memory overhead.

 * Merely change the users to apply the ordering they
   require when iterating.

In both those cases, I'd suggest we consider use of GHashTable, to give
us a more dynamic hash table impl with resizing buckets, so it is more
memory efficient and stronger guarantee of O(1) lookups. It also quite
simple to iterate over the keys in a fixed order, as you can get a GList
of keys, and invoke g_list_sort with any comparator. While we could add
more APIs to do this with QDict and QLIST, re-inventing the wheel feels
dubious unless there's a compelling benefit to our existing impl.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

next prev parent reply	other threads:[~2022-07-07 15:43 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-05  9:54 [RFC PATCH] qobject: Rewrite implementation of QDict for in-order traversal Markus Armbruster
2022-07-06 11:35 ` Markus Armbruster
2022-07-06 11:49   ` Mark Cave-Ayland
2022-07-08 11:01   ` Daniel P. Berrangé
2022-07-11 10:32     ` Peter Maydell
2022-07-11 11:09       ` Daniel P. Berrangé
2022-07-11 11:15         ` Peter Maydell
2022-07-18 10:45           ` Markus Armbruster
2022-07-07 12:57 ` Peter Maydell
2022-07-07 14:27   ` Markus Armbruster
2022-07-07 15:07     ` Daniel P. Berrangé
2022-07-07 14:43 ` Daniel P. Berrangé
2022-07-07 15:37 ` Daniel P. Berrangé [this message]
2022-07-07 15:52 ` Alex Bennée
2022-07-08 10:19   ` Markus Armbruster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Ysb91P/X8WO7y3h+@redhat.com \
    --to=berrange@redhat.com \
    --cc=armbru@redhat.com \
    --cc=hreitz@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).