From: Kevin Wolf <kwolf@redhat.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>,
	Fam Zheng <famz@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Ming Lei <ming.lei@canonical.com>,
	qemu-devel <qemu-devel@nongnu.org>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
Date: Tue, 5 Aug 2014 16:47:28 +0200
Message-ID: <20140805144728.GH4391@noname.str.redhat.com>
In-Reply-To: <20140805134815.GD12251@stefanha-thinkpad.redhat.com>

On 05.08.2014 at 15:48, Stefan Hajnoczi wrote:
> On Tue, Aug 05, 2014 at 06:00:22PM +0800, Ming Lei wrote:
> > On Tue, Aug 5, 2014 at 5:48 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> > > On 05.08.2014 at 05:33, Ming Lei wrote:
> > >> Hi,
> > >>
> > >> These patches bring up below 4 changes:
> > >>         - introduce object allocation pool and apply it to
> > >>         virtio-blk dataplane for improving its performance
> > >>
> > >>         - introduce selective coroutine bypass mechanism
> > >>         for improving performance of virtio-blk dataplane with
> > >>         raw format image
> > >
> > > Before applying any bypassing patches, I think we should understand in
> > > detail where we are losing performance with coroutines enabled.
> > 
> > The profiling data below shows that the CPU executes instructions more
> > slowly with coroutines and that CPU dcache misses increase, so this is
> > very likely caused by frequent stack switching.
> > 
> > http://marc.info/?l=qemu-devel&m=140679721126306&w=2
> > 
> > http://pastebin.com/ae0vnQ6V
> 
> I have been wondering how to prove that the root cause is the ucontext
> coroutine mechanism (stack switching).  Here is an idea:
> 
> Hack your "bypass" code path to run the request inside a coroutine.
> That way you can compare "bypass without coroutine" against "bypass with
> coroutine".
> 
> Right now I think there are doubts because the bypass code path is
> indeed a different (and not 100% correct) code path.  So this approach
> might prove that the coroutines are adding the overhead and not
> something that you bypassed.

My doubts aren't only that the overhead might not come from the
coroutines, but also whether any coroutine-related overhead is really
unavoidable. If we can optimise coroutines, I'd strongly prefer to do
just that instead of introducing additional code paths.

Another thought I had was this: If the performance difference is indeed
only coroutines, then that is completely inside the block layer and we
don't actually need a VM to test it. We could instead use something
like a simple qemu-img based benchmark and should observe the same
difference there.

I played a bit with the following, I hope it's not too naive. I couldn't
see a difference with your patches, but at least one reason for this is
probably that my laptop SSD isn't fast enough to make the CPU the
bottleneck. Haven't tried ramdisk yet, that would probably be the next
thing. (I actually wrote the patch up just for some profiling on my own,
not for comparing throughput, but it should be usable for that as well.)
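
For throughput comparisons, the run would need to be timed. The following is a minimal sketch, not part of the patch: elapsed_seconds() and requests_per_second() are hypothetical helper names, and the sketch assumes the two timestamps are taken with clock_gettime(CLOCK_MONOTONIC, ...) around the request loop.

```c
#include <time.h>

/* Elapsed wall-clock seconds between two monotonic clock samples. */
static double elapsed_seconds(const struct timespec *start,
                              const struct timespec *end)
{
    return (double)(end->tv_sec - start->tv_sec)
         + (double)(end->tv_nsec - start->tv_nsec) / 1e9;
}

/*
 * Completed requests per second for a run of nr_requests requests.
 * Returns 0 when no measurable time has passed, to avoid dividing by zero.
 */
static double requests_per_second(long nr_requests,
                                  const struct timespec *start,
                                  const struct timespec *end)
{
    double secs = elapsed_seconds(start, end);

    return secs > 0.0 ? (double)nr_requests / secs : 0.0;
}
```

One sample would be taken just before the first bench_cb() call and one after the main_loop_wait() loop has drained, with the fixed request count (75000 in the patch) passed as nr_requests.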

Kevin


diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index d029609..ae64b3d 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -9,6 +9,12 @@ STEXI
 @table @option
 ETEXI
 
+DEF("bench", img_bench,
+    "bench [-q] [-f fmt] [-n] [-t cache] filename")
+STEXI
+@item bench [-q] [-f @var{fmt}] [-n] [-t @var{cache}] filename
+ETEXI
+
 DEF("check", img_check,
     "check [-q] [-f fmt] [--output=ofmt]  [-r [leaks | all]] filename")
 STEXI
diff --git a/qemu-img.c b/qemu-img.c
index d4518e7..92e9529 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2789,6 +2789,132 @@ out:
     return 0;
 }
 
+typedef struct BenchData {
+    BlockDriverState *bs;
+    int bufsize;
+    int nrreq;
+    int n;
+    uint8_t *buf;
+    QEMUIOVector *qiov;
+
+    int in_flight;
+    uint64_t sector;
+} BenchData;
+
+static void bench_cb(void *opaque, int ret)
+{
+    BenchData *b = opaque;
+    BlockDriverAIOCB *acb;
+
+    if (ret < 0) {
+        error_report("Failed request: %s", strerror(-ret));
+        exit(EXIT_FAILURE);
+    }
+    if (b->in_flight > 0) {
+        b->n--;
+        b->in_flight--;
+    }
+
+    while (b->n > b->in_flight && b->in_flight < b->nrreq) {
+        acb = bdrv_aio_readv(b->bs, b->sector, b->qiov,
+                             b->bufsize >> BDRV_SECTOR_BITS,
+                             bench_cb, b);
+        if (!acb) {
+            error_report("Failed to issue request");
+            exit(EXIT_FAILURE);
+        }
+        b->in_flight++;
+        b->sector += b->bufsize >> BDRV_SECTOR_BITS;
+        b->sector %= b->bs->total_sectors;
+    }
+}
+
+static int img_bench(int argc, char **argv)
+{
+    int c, ret = 0;
+    const char *fmt = NULL, *filename;
+    bool quiet = false;
+    BlockDriverState *bs = NULL;
+    int flags = BDRV_O_FLAGS;
+    int i;
+
+    for (;;) {
+        c = getopt(argc, argv, "hf:nqt:");
+        if (c == -1) {
+            break;
+        }
+
+        switch (c) {
+            case 'h':
+            case '?':
+                help();
+                break;
+            case 'f':
+                fmt = optarg;
+                break;
+            case 'n':
+                flags |= BDRV_O_NATIVE_AIO;
+                break;
+            case 'q':
+                quiet = true;
+                break;
+            case 't':
+                ret = bdrv_parse_cache_flags(optarg, &flags);
+                if (ret < 0) {
+                    /* data is not set up yet, so skip the cleanup at out: */
+                    error_report("Invalid cache mode");
+                    return 1;
+                }
+                break;
+        }
+    }
+
+    if (optind != argc - 1) {
+        error_exit("Expecting one image file name");
+    }
+    filename = argv[argc - 1];
+
+    BenchData data = { /* initialised before any goto out below */
+        .bufsize = 0x1000,
+        .nrreq = 64,
+        .n = 75000,
+    };
+
+    bs = bdrv_new_open("image", filename, fmt, flags, true, quiet);
+    if (!bs) {
+        error_report("Could not open image '%s'", filename);
+        ret = -1;
+        goto out;
+    }
+    data.bs = bs;
+
+    data.buf = qemu_blockalign(bs, data.nrreq * data.bufsize);
+    data.qiov = g_new(QEMUIOVector, data.nrreq);
+    for (i = 0; i < data.nrreq; i++) {
+        qemu_iovec_init(&data.qiov[i], 1);
+        qemu_iovec_add(&data.qiov[i],
+                       data.buf + i * data.bufsize, data.bufsize);
+    }
+
+    bench_cb(&data, 0);
+
+    while (data.n > 0) {
+        main_loop_wait(false);
+    }
+
+out:
+    qemu_vfree(data.buf);
+    if (bs) {
+        bdrv_unref(bs);
+    }
+
+    if (ret) {
+        return 1;
+    }
+    return 0;
+}
+
+
 static const img_cmd_t img_cmds[] = {
 #define DEF(option, callback, arg_string)        \
     { option, callback },
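
The request bookkeeping in bench_cb() can be examined without a block layer. The following is a minimal standalone sketch, not part of the patch: PumpState, pump_refill(), pump_complete() and pump_run() are hypothetical names, completions are simulated synchronously instead of arriving through main_loop_wait(), and the start sector is assumed to advance in sector units.

```c
#include <stdint.h>

/* Hypothetical stand-in for BenchData; only the counters are modelled. */
typedef struct PumpState {
    int nrreq;              /* maximum number of requests in flight */
    int n;                  /* requests that still have to complete */
    int in_flight;          /* requests submitted but not yet completed */
    uint64_t sector;        /* start sector of the next request */
    uint64_t total_sectors; /* image size in sectors, must be non-zero */
    int sectors_per_req;    /* request size in sectors */
    int submitted;          /* total number of requests issued so far */
} PumpState;

/* Top up the queue, using the same loop condition as bench_cb(). */
static void pump_refill(PumpState *s)
{
    while (s->n > s->in_flight && s->in_flight < s->nrreq) {
        /* A real implementation would issue an asynchronous read here. */
        s->in_flight++;
        s->submitted++;
        s->sector = (s->sector + s->sectors_per_req) % s->total_sectors;
    }
}

/* Account for one completed request, then refill the queue. */
static void pump_complete(PumpState *s)
{
    if (s->in_flight > 0) {
        s->n--;
        s->in_flight--;
    }
    pump_refill(s);
}

/* Drive the model until all requests are done; returns the number issued. */
static int pump_run(PumpState *s)
{
    pump_refill(s);
    while (s->n > 0) {
        pump_complete(s);
    }
    return s->submitted;
}
```

Because new requests are only issued while n exceeds in_flight, the total number submitted never exceeds the initial n, and the queue drains to zero once the last request completes. Calling pump_run() on a state initialised with the patch's values (nrreq = 64, n = 75000) exercises the same sequence of counter updates.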
