From: 王洪浩 <wanghonghao@bytedance.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: kwolf@redhat.com, pbonzini@redhat.com, fam@euphon.net,
	qemu-devel@nongnu.org
Subject: PING: [PATCH 2/2] coroutine: take exactly one batch from global pool at a time
Date: Tue, 29 Sep 2020 11:24:14 +0800	[thread overview]
Message-ID: <CADzM5uRNSZurnZ-wm8-FG7H3y7_bg+V5oNo4AjNiFSWmMJcijA@mail.gmail.com> (raw)
In-Reply-To: <CADzM5uQnVRPaH6Xtef95BMJtLRCgNq2OcaMQi0xTG-dxUjJ1Fg@mail.gmail.com>

Hi, I'd like to know if there are any other problems with this patch,
or if there is a better implementation for improving the coroutine pool.
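
To recap the idea for reviewers, here is a minimal standalone sketch of
the scheme described in the quoted thread below. The names are
illustrative and it uses C11 atomics; the actual patch builds on QEMU's
QSLIST macros plus the atomic replace operation added in PATCH 1/2, and
the simplified pop below ignores the ABA problem that the real code has
to handle.

/*
 * Sketch: a thread releases coroutines into its private batch and
 * publishes the batch to a global lock-free stack once it is full;
 * an allocating thread whose private batch is empty takes exactly
 * one batch from the global stack.
 */
#include <stdatomic.h>
#include <stdlib.h>

#define POOL_BATCH_SIZE 16

typedef struct Coroutine Coroutine;

typedef struct Batch {
    struct Batch *next;                 /* link in the global stack */
    Coroutine *co[POOL_BATCH_SIZE];
    size_t n;                           /* number of valid entries */
} Batch;

static _Atomic(Batch *) global_stack;   /* lock-free stack of full batches */
static __thread Batch *local;           /* thread-local pool, no atomics */

static void global_push(Batch *b)
{
    Batch *top = atomic_load(&global_stack);
    do {
        b->next = top;
    } while (!atomic_compare_exchange_weak(&global_stack, &top, b));
}

static Batch *global_pop(void)
{
    /* Simplified: a real pop must cope with the ABA problem when it
     * dereferences top->next, which is what the QSLIST atomic replace
     * operation is for. */
    Batch *top = atomic_load(&global_stack);
    while (top &&
           !atomic_compare_exchange_weak(&global_stack, &top, top->next)) {
        /* CAS failure refreshed top; retry */
    }
    return top;
}

static Coroutine *pool_get(void)
{
    if (!local || local->n == 0) {
        Batch *b = global_pop();        /* take exactly one batch */
        if (!b) {
            return NULL;                /* caller falls back to
                                         * qemu_coroutine_new() */
        }
        free(local);                    /* the real code recycles the
                                         * emptied batch */
        local = b;
    }
    return local->co[--local->n];       /* fast path: atomic-free */
}

static void pool_put(Coroutine *co)
{
    if (!local) {
        local = calloc(1, sizeof(*local));
    }
    if (local->n == POOL_BATCH_SIZE) {
        global_push(local);             /* publish a full batch */
        local = calloc(1, sizeof(*local));
    }
    local->co[local->n++] = co;         /* fast path: atomic-free */
}

Taking a whole batch at a time is what keeps both hot paths confined to
the thread-local batch, so atomic operations only happen when a batch is
published to or taken from the global stack.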

On Wed, Aug 26, 2020 at 2:06 PM 王洪浩 <wanghonghao@bytedance.com> wrote:

>
> The purpose of this patch is to improve performance without increasing
> memory consumption.
>
> My test case:
> QEMU command-line arguments:
> -drive file=/dev/nvme2n1p1,format=raw,if=none,id=local0,cache=none,aio=native \
>     -device virtio-blk,id=blk0,drive=local0,iothread=iothread0,num-queues=4 \
> -drive file=/dev/nvme3n1p1,format=raw,if=none,id=local1,cache=none,aio=native \
>     -device virtio-blk,id=blk1,drive=local1,iothread=iothread1,num-queues=4 \
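> (iothread0 and iothread1 are presumably defined by matching -object
> iothread arguments omitted from this excerpt, i.e. something like
> "-object iothread,id=iothread0 -object iothread,id=iothread1".)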
>
> Run these two fio jobs at the same time:
> [job-vda]
> filename=/dev/vda
> iodepth=64
> ioengine=libaio
> rw=randrw
> bs=4k
> size=300G
> rwmixread=80
> direct=1
> numjobs=2
> runtime=60
>
> [job-vdb]
> filename=/dev/vdb
> iodepth=64
> ioengine=libaio
> rw=randrw
> bs=4k
> size=300G
> rwmixread=90
> direct=1
> numjobs=2
> loops=1
> runtime=60
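>
> (To reproduce: put both sections in a single job file, e.g. a
> hypothetical jobs.fio, and a single "fio jobs.fio" run inside the
> guest starts the two jobs concurrently; fio runs all jobs in a job
> file at the same time unless stonewall is set.)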
>
> Without this patch, over three runs:
> total IOPS: 278548.1, 312374.1, 276638.2 (average 289186.8)
> With this patch, over three runs:
> total IOPS: 368370.9, 335693.2, 327693.1 (average 343919.1)
>
> That is an 18.9% improvement on average.
>
> In addition, we also use a distributed block storage backend whose I/O
> latency is much higher than that of local NVMe devices because of
> network overhead, so it needs a higher iodepth (>= 256) to reach its
> maximum throughput.
> Without this patch, more than 5% of allocations fall through to
> `qemu_coroutine_new` and the IOPS stays below 100K; with this patch
> the IOPS is about 260K.
>
> On the other hand, a simpler way to reduce or eliminate the cost of
> `qemu_coroutine_new` would be to increase POOL_BATCH_SIZE, but that
> would also bring much higher memory consumption, which we want to
> avoid. Hence this patch.
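>
> (For a rough sense of scale, assuming QEMU's default 1 MiB coroutine
> stack: each pooled coroutine pins about 1 MiB, so one 16-coroutine
> batch holds roughly 16 MiB of stacks, and raising POOL_BATCH_SIZE to,
> say, 128 would pin roughly 128 MiB per batch.)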
>
> On Tue, Aug 25, 2020 at 10:52 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >
> > On Mon, Aug 24, 2020 at 12:31:21PM +0800, wanghonghao wrote:
> > > This patch replaces the global coroutine queue with a lock-free stack
> > > whose elements are coroutine queues. Threads can push coroutine queues
> > > onto the stack or take queues from it, and each coroutine queue holds
> > > exactly POOL_BATCH_SIZE coroutines. Note that the stack is not strictly
> > > LIFO, but that is good enough for a buffer pool.
> > >
> > > On release, coroutines are put into thread-local pools first. Now the
> > > fast paths of both allocation and release are atomic-free, and not too
> > > many coroutines can remain in a single thread since POOL_BATCH_SIZE has
> > > been reduced to 16.
> > >
> > > In practice, I've run a VM with two block devices bound to two
> > > different iothreads, and run fio with iodepth 128 on each device.
> > > Without this patch it maintains around 400 coroutines and has about a
> > > 1% chance of calling `qemu_coroutine_new`. With this patch it
> > > maintains no more than 273 coroutines and doesn't call
> > > `qemu_coroutine_new` at all after the initial allocations.
> >
> > Does throughput or IOPS change?
> >
> > Is the main purpose of this patch to reduce memory consumption?
> >
> > Stefan


Thread overview: 10+ messages
2020-08-24  4:31 [PATCH 1/2] QSLIST: add atomic replace operation wanghonghao
2020-08-24  4:31 ` [PATCH 2/2] coroutine: take exactly one batch from global pool at a time wanghonghao
2020-08-25 14:52   ` Stefan Hajnoczi
2020-08-26  6:06     ` [External] " 王洪浩
2020-09-29  3:24       ` 王洪浩 [this message]
2020-10-13 10:04         ` PING: " Stefan Hajnoczi
2020-08-24 15:26 ` [PATCH 1/2] QSLIST: add atomic replace operation Stefan Hajnoczi
2020-08-25  3:33   ` [External] " 王洪浩
2020-08-25  3:37   ` [PATCH v2 " wanghonghao
2020-08-25  3:37     ` [PATCH v2 2/2] coroutine: take exactly one batch from global pool at a time wanghonghao
