From: Jan Kiszka <jan.kiszka@siemens.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 08/30] rcu: add rcu library
Date: Mon, 01 Jul 2013 11:47:06 +0200 [thread overview]
Message-ID: <51D1501A.60503@siemens.com> (raw)
In-Reply-To: <1372444009-11544-9-git-send-email-pbonzini@redhat.com>
On 2013-06-28 20:26, Paolo Bonzini wrote:
> This includes a (mangled) copy of the urcu-qsbr code from liburcu.
> The main changes are: 1) removing dependencies on many other header files
> in liburcu; 2) removing for simplicity the tentative busy waiting in
> synchronize_rcu, which has limited performance effects; 3) replacing
> futexes in synchronize_rcu with QemuEvents for Win32 portability.
> The API is the same as liburcu, so it should be possible in the future
> to require liburcu on POSIX systems for example and use our copy only
> on Windows.
>
> Among the various versions available I chose urcu-qsbr, which has the
> fastest rcu_read_{lock,unlock} but requires the program to manually
> annotate quiescent points or intervals. QEMU threads usually have easily
> identified quiescent periods, so this should not be a problem.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> docs/rcu.txt | 303 +++++++++++++++++++++++++++++++++++++++++++++
> hw/9pfs/virtio-9p-synth.c | 1 +
> include/qemu/queue.h | 13 ++
> include/qemu/rcu-pointer.h | 110 ++++++++++++++++
> include/qemu/rcu.h | 141 +++++++++++++++++++++
> include/qemu/thread.h | 3 -
> libcacard/Makefile | 3 +-
> util/Makefile.objs | 1 +
> util/rcu.c | 195 +++++++++++++++++++++++++++++
> 9 files changed, 766 insertions(+), 4 deletions(-)
> create mode 100644 docs/rcu.txt
> create mode 100644 include/qemu/rcu-pointer.h
> create mode 100644 include/qemu/rcu.h
> create mode 100644 util/rcu.c
>
> diff --git a/docs/rcu.txt b/docs/rcu.txt
> new file mode 100644
> index 0000000..4869ec7
> --- /dev/null
> +++ b/docs/rcu.txt
> @@ -0,0 +1,303 @@
> +Using RCU (Read-Copy-Update) for synchronization
> +================================================
> +
> +Read-copy update (RCU) is a synchronization mechanism that is used to
> +protect read-mostly data structures. RCU is very efficient and scalable
> +on the read side (it is wait-free), and thus can make the read paths
> +extremely fast.
> +
> +RCU supports concurrency between a single writer and multiple readers,
> +thus it is not used alone. Typically, the write-side will use a lock to
> +serialize multiple updates, but other approaches are possible (e.g.,
> +restricting updates to a single task). In QEMU, when a lock is used,
> +this will often be the "iothread mutex", also known as the "big QEMU
> +lock" (BQL). Also, restricting updates to a single task is done in
> +QEMU using the "bottom half" API.
> +
> +RCU is fundamentally a "wait-to-finish" mechanism. The read side marks
> +sections of code with "critical sections", and the update side will wait
> +for the execution of all *currently running* critical sections before
> +proceeding, or before asynchronously executing a callback.
> +
> +The key point here is that only the currently running critical sections
> +are waited for; critical sections that are started _after_ the beginning
> +of the wait do not extend the wait, despite running concurrently with
> +the updater. This is the reason why RCU is more scalable than,
> +for example, reader-writer locks. It is so much more scalable that
> +the system will have a single instance of the RCU mechanism; a single
> +mechanism can be used for an arbitrary number of "things", without
> +having to worry about things such as contention or deadlocks.
> +
> +How is this possible? The basic idea is to split updates in two phases,
> +"removal" and "reclamation". During removal, we ensure that subsequent
> +readers will not be able to get a reference to the old data. After
> +removal has completed, a critical section will not be able to access
> +the old data. Therefore, critical sections that begin after removal
> +do not matter; as soon as all previous critical sections have finished,
> +there cannot be any readers who hold references to the data structure,
> +which may not be safely reclaimed (e.g., freed or unref'ed).
> +
> +Here is a picutre:
> +
> + thread 1 thread 2 thread 3
> + ------------------- ------------------------ -------------------
> + enter RCU crit.sec.
> + | finish removal phase
> + | begin wait
> + | | enter RCU crit.sec.
> + exit RCU crit.sec | |
> + complete wait |
> + begin reclamation phase |
> + exit RCU crit.sec.
> +
> +
> +Note how thread 3 is still executing its critical section when thread 2
> +starts reclaiming data. This is possible, because the old version of the
> +data structure was not accessible at the time thread 3 began executing
> +that critical section.
> +
> +
> +RCU API
> +=======
> +
> +The core RCU API is small:
> +
> + void rcu_read_lock(void);
> +
> + Used by a reader to inform the reclaimer that the reader is
> + entering an RCU read-side critical section.
> +
> + void rcu_read_unlock(void);
> +
> + Used by a reader to inform the reclaimer that the reader is
> + exiting an RCU read-side critical section. Note that RCU
> + read-side critical sections may be nested and/or overlapping.
> +
> + void synchronize_rcu(void);
> +
> + Blocks until all pre-existing RCU read-side critical sections
> + on all threads have completed. This marks the end of the removal
> + phase and the beginning of reclamation phase.
> +
> + Note that it would be valid for another update to come while
> + synchronize_rcu is running. Because of this, it is better that
> + the updater releases any locks it may hold before calling
> + synchronize_rcu.
> +
> + typeof(*p) rcu_dereference(p);
> + typeof(p) rcu_assign_pointer(p, typeof(p) v);
> +
> + These macros are similar to atomic_mb_read() and atomic_mb_set()
> + respectively. However, they make some assumptions on the code
> + that calls them, which allows a more optimized implementation.
> +
> + rcu_assign_pointer assumes that the update side is not going
> + to read from the data structure after "publishing" the new
> + values; that is, it assumes that all assignments happen at
> + the very end of the removal phase.
> +
> + rcu_dereference assumes that whenever a single RCU critical
> + section reads multiple shared data, these reads are either
> + data-dependent or need no ordering. This is almost always the
> + case when using RCU. If this were not the case, you can use
> + atomic_mb_read() or smp_rmb().
> +
> + If you are going to be fetching multiple fields from the
> + RCU-protected structure, repeated rcu_dereference() calls
> + would look ugly and incur unnecessary overhead on Alpha CPUs.
> + You can then do this:
> +
> + p = &rcu_dereference(head);
> + foo = head->foo;
> + bar = head->bar;
> +
> +
> +RCU QUIESCENT STATES
> +====================
> +
> +An efficient implementation of rcu_read_lock() and rcu_read_unlock()
> +relies on the availability of fast thread-local storage. Unfortunately,
> +this is not possible on all the systems supported by QEMU (in particular
> +on many POSIX systems other than Linux and Solaris).
> +
> +For this reason, QEMU's RCU implementation resorts to manual annotation
> +of "quiescent states", i.e. points where no RCU read-side critical
> +section can be active. All threads that participate in the RCU mechanism
> +need to annotate such points.
"... can be active" also implies that
rcu_read_lock();
rcu_thread_offline();
is actually an illegal ordering, right? Can we add some optional debug
code that performs runtime checking of this requirements? That should
also catch (sooner or later) leaking RCU locks.
Jan
--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
next prev parent reply other threads:[~2013-07-01 9:47 UTC|newest]
Thread overview: 68+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-06-28 18:26 [Qemu-devel] [PATCH 00/30] Memory API changes for 1.6: RCU-protected address space dispatch Paolo Bonzini
2013-06-28 18:26 ` [Qemu-devel] [PATCH 01/30] memory: access FlatView from a local variable Paolo Bonzini
2013-06-28 20:01 ` Anthony Liguori
2013-06-28 18:26 ` [Qemu-devel] [PATCH 02/30] memory: use a new FlatView pointer on every topology update Paolo Bonzini
2013-06-28 20:02 ` Anthony Liguori
2013-06-28 18:26 ` [Qemu-devel] [PATCH 03/30] memory: add reference counting to FlatView Paolo Bonzini
2013-06-28 20:07 ` Anthony Liguori
2013-06-28 18:26 ` [Qemu-devel] [PATCH 04/30] add a header file for atomic operations Paolo Bonzini
2013-06-28 20:41 ` Anthony Liguori
2013-07-01 10:21 ` Paolo Bonzini
2013-07-01 13:00 ` Anthony Liguori
2013-07-01 13:04 ` Paolo Bonzini
2013-07-01 13:20 ` Anthony Liguori
2013-07-04 5:24 ` liu ping fan
2013-07-01 11:08 ` Peter Maydell
2013-07-03 2:24 ` liu ping fan
2013-07-03 5:59 ` Paolo Bonzini
2013-07-03 7:07 ` liu ping fan
2013-06-28 18:26 ` [Qemu-devel] [PATCH 05/30] exec: do not use qemu/tls.h Paolo Bonzini
2013-06-28 20:43 ` Anthony Liguori
2013-06-28 23:53 ` Ed Maste
2013-07-01 10:16 ` Paolo Bonzini
2013-06-29 10:55 ` Peter Maydell
2013-07-01 10:45 ` Paolo Bonzini
2013-07-01 11:05 ` Peter Maydell
2013-07-01 16:21 ` Paolo Bonzini
2013-07-01 16:26 ` Peter Maydell
2013-07-01 20:52 ` Paolo Bonzini
2013-07-01 21:34 ` Peter Maydell
2013-07-02 13:40 ` Andreas Färber
2013-07-02 14:06 ` Alexander Graf
2013-06-28 18:26 ` [Qemu-devel] [PATCH 06/30] qemu-thread: add TLS wrappers Paolo Bonzini
2013-06-28 18:26 ` [Qemu-devel] [PATCH 07/30] qemu-thread: add QemuEvent Paolo Bonzini
2013-06-28 18:26 ` [Qemu-devel] [PATCH 08/30] rcu: add rcu library Paolo Bonzini
2013-07-01 9:47 ` Jan Kiszka [this message]
2013-06-28 18:26 ` [Qemu-devel] [PATCH 09/30] qemu-thread: register threads with RCU Paolo Bonzini
2013-06-28 18:26 ` [Qemu-devel] [PATCH 10/30] rcu: add call_rcu Paolo Bonzini
2013-06-28 18:26 ` [Qemu-devel] [PATCH 11/30] rcu: add rcutorture Paolo Bonzini
2013-06-28 18:26 ` [Qemu-devel] [PATCH 12/30] rcu: allow nested calls to rcu_thread_offline/rcu_thread_online Paolo Bonzini
2013-06-28 18:26 ` [Qemu-devel] [PATCH 13/30] qemu-thread: report RCU quiescent states Paolo Bonzini
2013-06-28 18:26 ` [Qemu-devel] [PATCH 14/30] event loop: " Paolo Bonzini
2013-06-28 18:26 ` [Qemu-devel] [PATCH 15/30] cpus: " Paolo Bonzini
2013-06-28 18:26 ` [Qemu-devel] [PATCH 16/30] block: " Paolo Bonzini
2013-06-28 18:26 ` [Qemu-devel] [PATCH 17/30] migration: " Paolo Bonzini
2013-06-28 18:26 ` [Qemu-devel] [PATCH 18/30] memory: protect current_map by RCU Paolo Bonzini
2013-06-28 18:26 ` [Qemu-devel] [PATCH 19/30] memory: avoid ref/unref in memory_region_find Paolo Bonzini
2013-06-28 18:26 ` [Qemu-devel] [PATCH 20/30] exec: change well-known physical sections to macros Paolo Bonzini
2013-06-28 18:26 ` [Qemu-devel] [PATCH 21/30] exec: separate current memory map from the one being built Paolo Bonzini
2013-07-02 14:41 ` Jan Kiszka
2013-06-28 18:26 ` [Qemu-devel] [PATCH 22/30] memory: move MemoryListener declaration earlier Paolo Bonzini
2013-07-02 14:41 ` Jan Kiszka
2013-06-28 18:26 ` [Qemu-devel] [PATCH 23/30] exec: move listener from AddressSpaceDispatch to AddressSpace Paolo Bonzini
2013-07-02 14:41 ` Jan Kiszka
2013-06-28 18:26 ` [Qemu-devel] [PATCH 24/30] exec: separate current radix tree from the one being built Paolo Bonzini
2013-07-02 14:41 ` Jan Kiszka
2013-06-28 18:26 ` [Qemu-devel] [PATCH 25/30] exec: put memory map in AddressSpaceDispatch Paolo Bonzini
2013-07-02 14:42 ` Jan Kiszka
2013-07-02 15:08 ` Paolo Bonzini
2013-07-02 15:48 ` Jan Kiszka
2013-06-28 18:26 ` [Qemu-devel] [PATCH 26/30] exec: remove cur_map Paolo Bonzini
2013-06-28 18:26 ` [Qemu-devel] [PATCH 27/30] exec: change some APIs to take AddressSpaceDispatch Paolo Bonzini
2013-07-02 14:47 ` Jan Kiszka
2013-06-28 18:26 ` [Qemu-devel] [PATCH 28/30] exec: change iotlb " Paolo Bonzini
2013-07-02 10:00 ` Jan Kiszka
2013-06-28 18:26 ` [Qemu-devel] [PATCH 29/30] exec: add a reference to the region returned by address_space_translate Paolo Bonzini
2013-06-28 18:26 ` [Qemu-devel] [PATCH 30/30] exec: put address space dispatch under RCU critical section Paolo Bonzini
2013-06-28 19:38 ` Jan Kiszka
2013-07-01 11:48 ` Paolo Bonzini
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=51D1501A.60503@siemens.com \
--to=jan.kiszka@siemens.com \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.