From: Stefan Hajnoczi <stefanha@redhat.com>
To: qemu-devel@nongnu.org
Cc: Peter Maydell <peter.maydell@linaro.org>,
Stefan Hajnoczi <stefanha@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>
Subject: [Qemu-devel] [PULL v3 29/38] coroutine: rewrite pool to avoid mutex
Date: Tue, 13 Jan 2015 13:48:07 +0000
Message-ID: <1421156896-11599-30-git-send-email-stefanha@redhat.com>
In-Reply-To: <1421156896-11599-1-git-send-email-stefanha@redhat.com>
From: Paolo Bonzini <pbonzini@redhat.com>
This patch removes the mutex by using fancy lock-free manipulation of
the pool. Lock-free stacks and queues are not hard, but they can suffer
from the ABA problem so they are better avoided unless you have some
deferred reclamation scheme like RCU. Otherwise you have to stick
with adding to a list, and emptying it completely. This is what this
patch does, by coupling a lock-free global list of available coroutines
with per-thread lists that are actually used on coroutine creation.
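The "add to a list, and empty it completely" restriction can be sketched
in isolation. This is an illustration only, written with C11 <stdatomic.h>
rather than the qemu/atomic.h helpers and QSLIST macros the patch uses;
Node, shared_head, push_atomic and take_all_atomic are hypothetical names.
Because no single element is ever popped, nothing dereferences a node that
another thread might have recycled, which is why ABA does not arise here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical node type; stands in for Coroutine and its pool_next link. */
typedef struct Node {
    struct Node *next;
    int id;
} Node;

/* Shared list head; static storage starts out as NULL. */
static _Atomic(Node *) shared_head;

/* Push one node: retry the compare-and-swap until it succeeds. */
static void push_atomic(Node *n)
{
    assert(n != NULL);
    Node *old = atomic_load(&shared_head);
    do {
        /* On CAS failure 'old' is refreshed with the current head. */
        n->next = old;
    } while (!atomic_compare_exchange_weak(&shared_head, &old, n));
}

/* Detach the whole list with one exchange; the caller then owns all nodes. */
static Node *take_all_atomic(void)
{
    return atomic_exchange(&shared_head, NULL);
}
```

Once take_all_atomic() returns, the detached chain is private to the
calling thread and can be walked without further atomic operations.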
Whenever the destruction pool is big enough, the next thread that runs
out of coroutines will steal the whole destruction pool. This is positive
in two ways:
1) the allocation does not have to do any atomic operation in the fast
path; it uses thread-local storage only. Once every POOL_BATCH_SIZE
allocations it will do a single atomic_xchg. Release does an atomic_cmpxchg
loop, which hopefully doesn't cause any starvation, and an atomic_inc.
A later patch will also remove atomic operations from the release path,
and try to avoid the atomic_xchg altogether---succeeding in doing so if
all devices either use ioeventfd or are not submitting requests actively.
2) in theory this should be completely adaptive. The number of coroutines
around should be a little more than POOL_BATCH_SIZE * number of allocating
threads; so this also empties qemu_coroutine_adjust_pool_size. (The previous
pool size was POOL_BATCH_SIZE * number of block backends, so it was a bit
more generous. But if you actually have many high-iodepth disks, it's better
to put them in different iothreads, which will also use separate thread
pools and aio=native file descriptors).
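The two-level scheme described in points 1) and 2) can be sketched as a
standalone program fragment. This is a simplified model under stated
assumptions, not the code in the diff below: it uses C11 atomics and
calloc/free in place of QEMU's helpers, and Item, item_get, item_put and
BATCH_SIZE are hypothetical names chosen for the illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define BATCH_SIZE 64u   /* mirrors POOL_BATCH_SIZE */

/* Hypothetical stand-in for Coroutine. */
typedef struct Item {
    struct Item *next;
} Item;

static _Atomic(Item *) release_list;    /* shared, lock-free */
static atomic_uint release_count;       /* approximate size, heuristic only */
static _Thread_local Item *alloc_list;  /* private to each thread */

/* Get an item; may return NULL only if a fresh allocation fails. */
static Item *item_get(void)
{
    Item *it = alloc_list;

    if (!it && atomic_load(&release_count) > BATCH_SIZE) {
        /* Slow path: steal the whole shared list with one exchange. */
        atomic_store(&release_count, 0);
        alloc_list = atomic_exchange(&release_list, NULL);
        it = alloc_list;
    }
    if (it) {
        alloc_list = it->next;   /* fast path: thread-local, no atomics */
        return it;
    }
    return calloc(1, sizeof(*it));   /* both pools empty: allocate fresh */
}

/* Return an item to the shared list, or free it if the list is large. */
static void item_put(Item *it)
{
    assert(it != NULL);
    if (atomic_load(&release_count) < BATCH_SIZE * 2) {
        Item *old = atomic_load(&release_list);
        do {
            it->next = old;
        } while (!atomic_compare_exchange_weak(&release_list, &old, it));
        atomic_fetch_add(&release_count, 1);
        return;
    }
    free(it);
}
```

As in the patch, the counter is allowed to drift from the real list length;
it only decides when stealing or freeing is worthwhile, so an exact value
is not required.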
This speeds up perf/cost (in tests/test-coroutine) by a factor of ~1.33.
Whether or not we end up with some kind of coroutine bypass scheme,
it cannot hurt to optimize hot code.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1417518350-6167-6-git-send-email-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
qemu-coroutine.c | 92 +++++++++++++++++++++++++-------------------------------
1 file changed, 41 insertions(+), 51 deletions(-)
diff --git a/qemu-coroutine.c b/qemu-coroutine.c
index bd574aa..93fddc7 100644
--- a/qemu-coroutine.c
+++ b/qemu-coroutine.c
@@ -15,31 +15,57 @@
#include "trace.h"
#include "qemu-common.h"
#include "qemu/thread.h"
+#include "qemu/atomic.h"
#include "block/coroutine.h"
#include "block/coroutine_int.h"
enum {
- POOL_DEFAULT_SIZE = 64,
+ POOL_BATCH_SIZE = 64,
};
/** Free list to speed up creation */
-static QemuMutex pool_lock;
-static QSLIST_HEAD(, Coroutine) pool = QSLIST_HEAD_INITIALIZER(pool);
-static unsigned int pool_size;
-static unsigned int pool_max_size = POOL_DEFAULT_SIZE;
+static QSLIST_HEAD(, Coroutine) release_pool = QSLIST_HEAD_INITIALIZER(pool);
+static unsigned int release_pool_size;
+static __thread QSLIST_HEAD(, Coroutine) alloc_pool = QSLIST_HEAD_INITIALIZER(pool);
+static __thread Notifier coroutine_pool_cleanup_notifier;
+
+static void coroutine_pool_cleanup(Notifier *n, void *value)
+{
+ Coroutine *co;
+ Coroutine *tmp;
+
+ QSLIST_FOREACH_SAFE(co, &alloc_pool, pool_next, tmp) {
+ QSLIST_REMOVE_HEAD(&alloc_pool, pool_next);
+ qemu_coroutine_delete(co);
+ }
+}
Coroutine *qemu_coroutine_create(CoroutineEntry *entry)
{
Coroutine *co = NULL;
if (CONFIG_COROUTINE_POOL) {
- qemu_mutex_lock(&pool_lock);
- co = QSLIST_FIRST(&pool);
+ co = QSLIST_FIRST(&alloc_pool);
+ if (!co) {
+ if (release_pool_size > POOL_BATCH_SIZE) {
+ /* Slow path; a good place to register the destructor, too. */
+ if (!coroutine_pool_cleanup_notifier.notify) {
+ coroutine_pool_cleanup_notifier.notify = coroutine_pool_cleanup;
+ qemu_thread_atexit_add(&coroutine_pool_cleanup_notifier);
+ }
+
+ /* This is not exact; there could be a little skew between
+ * release_pool_size and the actual size of release_pool. But
+ * it is just a heuristic, it does not need to be perfect.
+ */
+ release_pool_size = 0;
+ QSLIST_MOVE_ATOMIC(&alloc_pool, &release_pool);
+ co = QSLIST_FIRST(&alloc_pool);
+ }
+ }
if (co) {
- QSLIST_REMOVE_HEAD(&pool, pool_next);
- pool_size--;
+ QSLIST_REMOVE_HEAD(&alloc_pool, pool_next);
}
- qemu_mutex_unlock(&pool_lock);
}
if (!co) {
@@ -53,39 +79,19 @@ Coroutine *qemu_coroutine_create(CoroutineEntry *entry)
static void coroutine_delete(Coroutine *co)
{
+ co->caller = NULL;
+
if (CONFIG_COROUTINE_POOL) {
- qemu_mutex_lock(&pool_lock);
- if (pool_size < pool_max_size) {
- QSLIST_INSERT_HEAD(&pool, co, pool_next);
- co->caller = NULL;
- pool_size++;
- qemu_mutex_unlock(&pool_lock);
+ if (release_pool_size < POOL_BATCH_SIZE * 2) {
+ QSLIST_INSERT_HEAD_ATOMIC(&release_pool, co, pool_next);
+ atomic_inc(&release_pool_size);
return;
}
- qemu_mutex_unlock(&pool_lock);
}
qemu_coroutine_delete(co);
}
-static void __attribute__((constructor)) coroutine_pool_init(void)
-{
- qemu_mutex_init(&pool_lock);
-}
-
-static void __attribute__((destructor)) coroutine_pool_cleanup(void)
-{
- Coroutine *co;
- Coroutine *tmp;
-
- QSLIST_FOREACH_SAFE(co, &pool, pool_next, tmp) {
- QSLIST_REMOVE_HEAD(&pool, pool_next);
- qemu_coroutine_delete(co);
- }
-
- qemu_mutex_destroy(&pool_lock);
-}
-
static void coroutine_swap(Coroutine *from, Coroutine *to)
{
CoroutineAction ret;
@@ -140,20 +146,4 @@ void coroutine_fn qemu_coroutine_yield(void)
void qemu_coroutine_adjust_pool_size(int n)
{
- qemu_mutex_lock(&pool_lock);
-
- pool_max_size += n;
-
- /* Callers should never take away more than they added */
- assert(pool_max_size >= POOL_DEFAULT_SIZE);
-
- /* Trim oversized pool down to new max */
- while (pool_size > pool_max_size) {
- Coroutine *co = QSLIST_FIRST(&pool);
- QSLIST_REMOVE_HEAD(&pool, pool_next);
- pool_size--;
- qemu_coroutine_delete(co);
- }
-
- qemu_mutex_unlock(&pool_lock);
}
--
2.1.0