From: "Emilio G. Cota" <cota@braap.org>
To: Richard Henderson <richard.henderson@linaro.org>
Cc: qemu-devel@nongnu.org, laurent@vivier.eu, qemu-arm@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 0/2] linux-user: Change mmap_lock to rwlock
Date: Sat, 23 Jun 2018 14:20:39 -0400
Message-ID: <20180623182039.GA4920@flamenco>
In-Reply-To: <1319a0f0-0009-ebfc-dab4-eec196ba8ba5@linaro.org>

On Sat, Jun 23, 2018 at 08:25:52 -0700, Richard Henderson wrote:
> On 06/22/2018 02:12 PM, Emilio G. Cota wrote:
> > I'm curious to see how much perf could be gained. It seems that the hold
> > times in SVE code for readers might not be very large, which
> > then wouldn't let us amortize the atomic inc of the read lock
> > (IOW, we might not see much of a difference compared to a regular
> > mutex).
>
> In theory, the uncontended case for rwlocks is the same as a mutex.

In the fast path, wr_lock/unlock perform one more atomic operation
than mutex_lock/unlock. The performance difference is quite large in
microbenchmarks. For example, changing tests/atomic_add-bench to use
pthread_mutex or pthread_rwlock_wrlock instead of an atomic operation
(enabled with the added -m flag) gives:

    $ taskset -c 0 perf record tests/atomic_add-bench-mutex -d 4 -m
    Throughput: 62.05 Mops/s
    $ taskset -c 0 perf record tests/atomic_add-bench-rwlock -d 4 -m
    Throughput: 37.68 Mops/s
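
The shape of that measurement can be approximated with a single-threaded
loop like the one below. This is a sketch under stated assumptions, not
the code of tests/atomic_add-bench: bench_mutex_add, bench_rwlock_add and
their helpers are hypothetical names. Each iteration takes the lock,
increments a counter and releases the lock, and the functions report
throughput in Mops/s. Run pinned to one CPU (as with taskset -c 0 above),
the locks are never contended, so any gap between the two reflects
fast-path cost only.

```c
/* Hypothetical sketch (not the real tests/atomic_add-bench): time n
 * lock/increment/unlock iterations on one thread, using either a mutex
 * or an rwlock taken for writing. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

static pthread_mutex_t bench_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_rwlock_t bench_rwlock = PTHREAD_RWLOCK_INITIALIZER;

/* Monotonic clock reading in seconds. */
static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Convert an iteration count and elapsed seconds to Mops/s. */
static double mops(uint64_t n, double secs)
{
    return secs > 0.0 ? (double)n / secs / 1e6 : 0.0;
}

/* Increment *counter n times, each under pthread_mutex_lock/unlock. */
double bench_mutex_add(uint64_t *counter, uint64_t n)
{
    assert(counter != NULL && n > 0);
    double t0 = now_sec();
    for (uint64_t i = 0; i < n; i++) {
        pthread_mutex_lock(&bench_mutex);
        (*counter)++;
        pthread_mutex_unlock(&bench_mutex);
    }
    return mops(n, now_sec() - t0);
}

/* Same loop, but each increment runs under pthread_rwlock_wrlock/unlock. */
double bench_rwlock_add(uint64_t *counter, uint64_t n)
{
    assert(counter != NULL && n > 0);
    double t0 = now_sec();
    for (uint64_t i = 0; i < n; i++) {
        pthread_rwlock_wrlock(&bench_rwlock);
        (*counter)++;
        pthread_rwlock_unlock(&bench_rwlock);
    }
    return mops(n, now_sec() - t0);
}
```

Build with -pthread and call both functions from main() with the same
iteration count. Absolute numbers depend on the libc version and CPU, so
only the relative gap is meaningful.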

That said, real user-space code (i.e. not microbenchmarks) is unlikely
to be sensitive to the additional delay and/or lower scalability.
Programs commonly avoid frequent calls to mmap(2) because of potential
serialization in the kernel. Memory allocators are a good example: they
make a few large mmap calls and then manage that memory themselves.
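
The allocator pattern just described can be sketched as a minimal bump
(arena) allocator: a single mmap(2) call reserves a large region, and
later allocations are carved out of it with no further system calls.
This is an illustration only, not how Hoard or any particular allocator
works; struct arena and the arena_* functions are hypothetical, and the
sketch is not thread-safe and never frees individual blocks.

```c
/* Hypothetical bump allocator: one mmap(2) for the whole arena, then
 * pointer arithmetic for every allocation. Not thread-safe. */
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

struct arena {
    unsigned char *base; /* start of the mapped region (page-aligned) */
    size_t size;         /* total bytes mapped */
    size_t used;         /* bytes handed out so far */
};

/* Map the whole arena with a single mmap call; returns 0 on success. */
int arena_init(struct arena *a, size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return -1;
    }
    a->base = p;
    a->size = size;
    a->used = 0;
    return 0;
}

/* Return n bytes aligned to 16, or NULL once the arena is exhausted.
 * No system call is made here. */
void *arena_alloc(struct arena *a, size_t n)
{
    assert(a->base != NULL);
    size_t start = (a->used + 15) & ~(size_t)15;
    if (start > a->size || n > a->size - start) {
        return NULL;
    }
    a->used = start + n;
    return a->base + start;
}

/* Release the whole arena with a single munmap call. */
void arena_destroy(struct arena *a)
{
    munmap(a->base, a->size);
    a->base = NULL;
    a->size = 0;
    a->used = 0;
}
```

Under qemu-linux-user, guest mmap(2) calls are handled while holding the
mmap lock this series converts, so a program structured this way would
take that lock once at setup rather than once per allocation.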

To double-check, I ran some multi-threaded benchmarks from Hoard [1]
under qemu-linux-user, with and without the rwlock change, and could
not measure a significant difference.

[1] https://github.com/emeryberger/Hoard/tree/master/benchmarks

> > Are you using any benchmark that shows any perf difference?
>
> Not so far. Glibc has some microbenchmarks for strings, which I will try next
> week, but they are not multi-threaded. Maybe just run 4 threads of those
> benchmarks?

I'd run more threads if possible. I have access to a 64-core machine,
so ping me once you identify benchmarks of interest.
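
Running several copies of a single-threaded benchmark at the same time
can be sketched with a small pthread harness like the one below. It is
only a sketch: run_concurrently, strlen_bench and struct strlen_job are
hypothetical names, and strlen_bench merely stands in for a real glibc
string benchmark. Whether a given workload actually exercises the lock
under discussion is an assumption to verify separately.

```c
/* Hypothetical harness: start one copy of a single-threaded benchmark
 * body per thread, all at once, then wait for every copy to finish. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

typedef void (*bench_fn)(void *arg);

struct worker {
    pthread_t tid;
    bench_fn fn;
    void *arg;
};

static void *worker_main(void *opaque)
{
    struct worker *w = opaque;
    w->fn(w->arg);
    return NULL;
}

/* Start nthreads copies of fn (args[i] goes to thread i) and join them.
 * Returns 0 on success, -1 if allocation or any pthread_create failed. */
int run_concurrently(bench_fn fn, void **args, int nthreads)
{
    assert(fn != NULL && nthreads > 0);
    struct worker *w = calloc((size_t)nthreads, sizeof(*w));
    if (w == NULL) {
        return -1;
    }
    int started = 0, rc = 0;
    for (int i = 0; i < nthreads; i++) {
        w[i].fn = fn;
        w[i].arg = args[i];
        if (pthread_create(&w[i].tid, NULL, worker_main, &w[i]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }
    /* Join only the threads that were actually created. */
    for (int i = 0; i < started; i++) {
        pthread_join(w[i].tid, NULL);
    }
    free(w);
    return rc;
}

/* Stand-in workload: each thread repeatedly measures the length of its
 * own NUL-terminated buffer and accumulates the result. */
struct strlen_job {
    char buf[256];
    size_t total;
};

void strlen_bench(void *arg)
{
    struct strlen_job *job = arg;
    for (int i = 0; i < 10000; i++) {
        job->total += strlen(job->buf);
    }
}
```

nthreads can be raised towards the machine's core count (e.g. 64) to
look for scalability differences between the mutex and rwlock builds.
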
Emilio