Re: [Qemu-devel] [PATCH 0/7] coroutine: optimizations


From: Ming Lei <ming.lei@canonical.com>
To: Peter Lieven <pl@kamp.de>
Cc: Kevin Wolf <kwolf@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	qemu-devel <qemu-devel@nongnu.org>,
	Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 0/7] coroutine: optimizations
Date: Mon, 1 Dec 2014 15:46:03 +0800
Message-ID: <20141201154603.2e6a0565@tom-ThinkPad-T410>
In-Reply-To: <547C132D.3070303@kamp.de>

On Mon, 01 Dec 2014 08:05:17 +0100
Peter Lieven <pl@kamp.de> wrote:

> On 01.12.2014 06:55, Ming Lei wrote:
> > On Fri, Nov 28, 2014 at 10:12 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> >> As discussed in the other thread, this brings speedups from
> >> dropping the coroutine mutex (which serializes multiple iothreads,
> >> too) and using ELF thread-local storage.
> >>
> >> The speedup in perf/cost is about 30% (190->145).  Windows port tested
> >> with tests/test-coroutine.exe under Wine.
> > The data is very nice, and in my laptop, 'perf cost' can be decreased
> > from 244ns to 174ns.
> >
> > BTW, the cost by using coroutine to run function isn't only from these
> > helpers(*_yield, *_enter, *_create, and perf-cost just measures
> > this part of cost), but also some implicit/invisible part. I have some
> > test cases which can show the problem. If someone is interested,
> > I can post them in list.
> 
> Of course, maybe the problem can be solved or impaired.

OK, please try the patch below:

From 917d5cc0a273f9825b10abd52152c54e08c81ef8 Mon Sep 17 00:00:00 2001
From: Ming Lei <ming.lei@canonical.com>
Date: Mon, 1 Dec 2014 11:11:23 +0800
Subject: [PATCH] test-coroutine: introduce perf-cost-with-load

The perf/cost test case only covers the explicit cost of
using coroutines.

This patch provides an open/close file test case, and
from this case we can see that there is also some implicit
or invisible cost beyond the cost measured by /perf/cost.

In my environment, the test results after applying this
patch and running perf/cost and perf/cost-with-load are as follows:

	{*LOG(start):{/perf/cost}:LOG*}
	/perf/cost: {*LOG(message):{Run operation 40000000 iterations 7.539413
	s, 5305K operations/s, 188ns per coroutine}:LOG*}
	OK
	{*LOG(stop):(0;0;7.539497):LOG*}

	{*LOG(start):{/perf/cost-with-load}:LOG*}
	/perf/cost-with-load: {*LOG(message):{Run operation 1000000 iterations
	2.648014 s, 377K operations/s, 2648ns per operation without using
	coroutine}:LOG*}
	{*LOG(message):{Run operation 1000000 iterations 2.919133 s, 342K
	operations/s, 2919ns per operation, 271ns(cost introduced by coroutine)
	per operation with using coroutine}:LOG*}
	OK
	{*LOG(stop):(0;0;5.567333):LOG*}

From the above data, we can see that 188ns is introduced for running
one coroutine, but in /perf/cost-with-load the actual cost introduced
is 271ns, so the extra 83ns of cost is invisible and implicit.

A similar result can be seen in the following test cases too:
	- read from /dev/nullb0 opened with O_DIRECT (a sort of aio
	  read simulation; needs a 3.13+ kernel for /dev/nullbX
	  support via 'modprobe null_blk'); this case can show
	  +150ns of extra cost
	- statvfs() syscall: there is ~30ns of extra cost for running
	  one statvfs() in a coroutine
---
 tests/test-coroutine.c |   67 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)

diff --git a/tests/test-coroutine.c b/tests/test-coroutine.c
index 27d1b6f..7323a91 100644
--- a/tests/test-coroutine.c
+++ b/tests/test-coroutine.c
@@ -311,6 +311,72 @@ static void perf_baseline(void)
         maxcycles, duration);
 }
 
+static void perf_cost_load_worker(void *opaque)
+{
+    int fd;
+
+    fd = open("/proc/self/exe", O_RDONLY);
+    assert(fd >= 0);
+    close(fd);
+}
+
+static __attribute__((noinline)) void perf_cost_load_func(void *opaque)
+{
+    perf_cost_load_worker(opaque);
+    qemu_coroutine_yield();
+}
+
+static double perf_cost_load(unsigned long maxcycles, bool use_co)
+{
+    unsigned long i = 0;
+    double duration;
+
+    g_test_timer_start();
+    if (use_co) {
+        Coroutine *co;
+        while (i++ < maxcycles) {
+            co = qemu_coroutine_create(perf_cost_load_func);
+            qemu_coroutine_enter(co, &i);
+            qemu_coroutine_enter(co, NULL);
+        }
+    } else {
+        while (i++ < maxcycles) {
+            perf_cost_load_worker(&i);
+        }
+    }
+    duration = g_test_timer_elapsed();
+
+    return duration;
+}
+
+static void perf_cost_with_load(void)
+{
+    const unsigned long maxcycles = 1000000;
+    double duration;
+    unsigned long ops;
+    unsigned long cost_co, cost;
+
+    duration = perf_cost_load(maxcycles, false);
+    ops = (long)(maxcycles / (duration * 1000));
+    cost = (unsigned long)(1000000000.0 * duration / maxcycles);
+    g_test_message("Run operation %lu iterations %f s, %luK operations/s, "
+                   "%luns per operation without using coroutine",
+                   maxcycles,
+                   duration, ops,
+                   cost);
+
+    duration = perf_cost_load(maxcycles, true);
+    ops = (long)(maxcycles / (duration * 1000));
+    cost_co = (unsigned long)(1000000000.0 * duration / maxcycles);
+    g_test_message("Run operation %lu iterations %f s, %luK operations/s, "
+                   "%luns per operation, "
+                   "%luns(cost introduced by coroutine) per operation "
+                   "with using coroutine",
+                   maxcycles,
+                   duration, ops,
+                   cost_co, cost_co - cost);
+}
+
 static __attribute__((noinline)) void perf_cost_func(void *opaque)
 {
     qemu_coroutine_yield();
@@ -355,6 +421,7 @@ int main(int argc, char **argv)
         g_test_add_func("/perf/yield", perf_yield);
         g_test_add_func("/perf/function-call", perf_baseline);
         g_test_add_func("/perf/cost", perf_cost);
+        g_test_add_func("/perf/cost-with-load", perf_cost_with_load);
     }
     return g_test_run();
 }
-- 
1.7.9.5


Thanks,
-- 
Ming Lei
