qemu-devel.nongnu.org archive mirror
From: "Emilio G. Cota" <cota@braap.org>
To: Pranith Kumar <bobby.prani@gmail.com>
Cc: "Richard Henderson" <rth@twiddle.net>,
	"Alex Bennée" <alex.bennee@linaro.org>,
	"Markus Armbruster" <armbru@redhat.com>,
	"Sergey Fedorov" <sergey.fedorov@linaro.org>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"open list:All patches CC here" <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] [RFC PATCH] qht: Align sequence lock to cache line
Date: Tue, 25 Oct 2016 16:55:58 -0400	[thread overview]
Message-ID: <20161025205558.GB22860@flamenco> (raw)
In-Reply-To: <20161025153507.27110-1-bobby.prani@gmail.com>

On Tue, Oct 25, 2016 at 11:35:06 -0400, Pranith Kumar wrote:
> Using perf, I see that sequence lock is being a bottleneck since it is
> being read by everyone. Giving it its own cache-line seems to help
> things quite a bit.
> 
> Using qht-bench, I measured the following for:
> 
> $ ./tests/qht-bench -d 10 -n 24 -u <x>
> 
> throughput base   patch  %change
> update
> 0          8.07   13.33  +65%
> 10         7.10   8.90   +25%
> 20         6.34   7.02   +10%
> 30         5.48   6.11   +9.6%
> 40         4.90   5.46   +11.42%
> 
> I am not able to see any significant increases for lower thread counts though.

Honestly I don't know what you're measuring here.

Your results are low (I assume you're showing per-thread throughput here,
but even so they're low), and it makes no sense that increasing the
cache-line footprint would make a *read-only* workload (0% updates) run
faster.

More below.

> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
> ---
>  include/qemu/seqlock.h | 2 +-
>  util/qht.c             | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/include/qemu/seqlock.h b/include/qemu/seqlock.h
> index 8dee11d..954abe8 100644
> --- a/include/qemu/seqlock.h
> +++ b/include/qemu/seqlock.h
> @@ -21,7 +21,7 @@ typedef struct QemuSeqLock QemuSeqLock;
>  
>  struct QemuSeqLock {
>      unsigned sequence;
> -};
> +} QEMU_ALIGNED(64);
>  
>  static inline void seqlock_init(QemuSeqLock *sl)
>  {
> diff --git a/util/qht.c b/util/qht.c
> index ff4d2e6..4d82609 100644
> --- a/util/qht.c
> +++ b/util/qht.c
> @@ -101,14 +101,14 @@
>   * be grabbed first.
>   */
>  struct qht_bucket {
> -    QemuSpin lock;
>      QemuSeqLock sequence;
> +    QemuSpin lock;
>      uint32_t hashes[QHT_BUCKET_ENTRIES];
>      void *pointers[QHT_BUCKET_ENTRIES];
>      struct qht_bucket *next;
>  } QEMU_ALIGNED(QHT_BUCKET_ALIGN);

I understand this is a hack but this would have been more localized:

diff --git a/util/qht.c b/util/qht.c
index ff4d2e6..55db907 100644
--- a/util/qht.c
+++ b/util/qht.c
@@ -101,14 +101,16 @@
  * be grabbed first.
  */
 struct qht_bucket {
+    struct {
+        QemuSeqLock sequence;
+    } QEMU_ALIGNED(QHT_BUCKET_ALIGN);
     QemuSpin lock;
-    QemuSeqLock sequence;
     uint32_t hashes[QHT_BUCKET_ENTRIES];
     void *pointers[QHT_BUCKET_ENTRIES];
     struct qht_bucket *next;
 } QEMU_ALIGNED(QHT_BUCKET_ALIGN);
 
So I tested my change above vs. master on a 16-core (32-way) Intel machine
(Xeon E5-2690 @ 2.90GHz with turbo-boost disabled), making sure threads are
scheduled on separate cores, favouring same-socket ones.
Results: http://imgur.com/a/c4dTB

So really I don't know what you're measuring.
The idea of decoupling the seqlock from the spinlock's cache line doesn't
make sense to me, because:
- Bucket lock holders are very likely to update the seqlock, so it makes sense
  to have them in the same cache line (exceptions to this are resizes or
  traversals, but those are very rare and we're not measuring those in qht-bench)
- Thanks to resizing + good hashing, bucket chains are very short, so
  a single seqlock per bucket is all we need.
- We can have *many* buckets (200K is not crazy for the TB htable), so
  anything that increases their size needs very good justification (see
  200K results above).

		E.

Thread overview: 10+ messages
2016-10-25 15:35 [Qemu-devel] [RFC PATCH] qht: Align sequence lock to cache line Pranith Kumar
2016-10-25 15:41 ` Paolo Bonzini
2016-10-25 15:49   ` Pranith Kumar
2016-10-25 15:54     ` Paolo Bonzini
2016-10-25 19:12       ` Pranith Kumar
2016-10-25 20:02         ` Paolo Bonzini
2016-10-25 20:35           ` Pranith Kumar
2016-10-25 20:45             ` Emilio G. Cota
2016-10-25 20:58               ` Paolo Bonzini
2016-10-25 20:55 ` Emilio G. Cota [this message]
