From mboxrd@z Thu Jan 1 00:00:00 1970
From: Xiao Guangrong
Subject: Re: [PATCH v2 0/5] migration: improve multithreads
Date: Mon, 12 Nov 2018 11:07:59 +0800
Message-ID: <2c351ac2-ad51-13de-6aea-ffc014edeffe@gmail.com>
References: <20181106122025.3487-1-xiaoguangrong@tencent.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: kvm@vger.kernel.org, quintela@redhat.com, Xiao Guangrong, qemu-devel@nongnu.org, peterx@redhat.com, dgilbert@redhat.com, wei.w.wang@intel.com, cota@braap.org, jiang.biao2@zte.com.cn
To: pbonzini@redhat.com, mst@redhat.com, mtosatti@redhat.com
In-Reply-To: <20181106122025.3487-1-xiaoguangrong@tencent.com>
Content-Language: en-US
Errors-To: qemu-devel-bounces+gceq-qemu-devel2=m.gmane.org@nongnu.org
Sender: "Qemu-devel"
List-Id: kvm.vger.kernel.org

Hi,

Ping...

On 11/6/18 8:20 PM, guangrong.xiao@gmail.com wrote:
> From: Xiao Guangrong
>
> Changelog in v2:
> These changes are based on Paolo's suggestion:
> 1) rename the lockless multithreads model to threaded workqueue
> 2) greatly improve the internal design: all requests are placed in
>    a large array that is properly partitioned and assigned to the
>    threads, and bitmaps are used to sync up the threads and the
>    submitter; as a result, ptr_ring and the spinlock are dropped
> 3) introduce event wait for the submitter
>
> These changes are based on Emilio's review:
> 4) add a more detailed description of threaded workqueue
> 5) add a benchmark for threaded workqueue
>
> The previous version can be found at
> https://marc.info/?l=kvm&m=153968821910007&w=2
>
> Here is a simple performance measurement comparing the two versions;
> the environment is the same as the one listed for the previous version.
>
> Use 8 threads to compress the data in the source QEMU
> - with compress-wait-thread = off
>
>
>       total time        busy-ratio
> --------------------------------------------------
> v1    125066            0.38
> v2    120444            0.35
>
> - with compress-wait-thread = on
>       total time        busy-ratio
> --------------------------------------------------
> v1    164426            0
> v2    142609            0
>
> v2 wins slightly.
>
> Xiao Guangrong (5):
>   bitops: introduce change_bit_atomic
>   util: introduce threaded workqueue
>   migration: use threaded workqueue for compression
>   migration: use threaded workqueue for decompression
>   tests: add threaded-workqueue-bench
>
>  include/qemu/bitops.h             |  13 +
>  include/qemu/threaded-workqueue.h |  94 +++++++
>  migration/ram.c                   | 538 ++++++++++++++------------------------
>  tests/Makefile.include            |   5 +-
>  tests/threaded-workqueue-bench.c  | 256 ++++++++++++++++++
>  util/Makefile.objs                |   1 +
>  util/threaded-workqueue.c         | 466 +++++++++++++++++++++++++++++++++
>  7 files changed, 1030 insertions(+), 343 deletions(-)
>  create mode 100644 include/qemu/threaded-workqueue.h
>  create mode 100644 tests/threaded-workqueue-bench.c
>  create mode 100644 util/threaded-workqueue.c
>
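
For readers who have not opened the series, item 2 of the changelog (a
request array partitioned across threads, with bitmaps syncing the threads
and the submitter) can be pictured roughly as below. This is only a minimal
sketch under assumptions, not the code in util/threaded-workqueue.c: the
names WorkerSlots, find_free_slot, submit and worker_poll are hypothetical,
it uses C11 atomics instead of QEMU's bitops helpers, it models a single
worker's partition with a single submitter, and it leaves out the event
wait from item 3. The assumed rule is that a slot is free while its two
bits agree and busy while they differ.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NREQ 64 /* hypothetical: number of request slots owned by one worker */

/* Hypothetical per-worker partition; field names are illustrative only. */
typedef struct {
    _Atomic uint64_t fill_bitmap; /* bit i is toggled only by the submitter */
    _Atomic uint64_t done_bitmap; /* bit i is toggled only by the worker */
    int payload[NREQ];            /* request input, written by the submitter */
    int result[NREQ];             /* request output, written by the worker */
} WorkerSlots;

/* Slot i is free when fill bit i equals done bit i. Returns an index or -1. */
static int find_free_slot(WorkerSlots *w)
{
    uint64_t busy = atomic_load(&w->fill_bitmap) ^ atomic_load(&w->done_bitmap);

    for (int i = 0; i < NREQ; i++) {
        if (!(busy & (UINT64_C(1) << i))) {
            return i;
        }
    }
    return -1;
}

/* Submitter side: fill a free slot, then flip its fill bit to publish it. */
static int submit(WorkerSlots *w, int value)
{
    int i = find_free_slot(w);

    if (i < 0) {
        return -1; /* all slots busy; the caller would wait or fall back */
    }
    assert(i < NREQ);
    w->payload[i] = value;
    atomic_fetch_xor(&w->fill_bitmap, UINT64_C(1) << i);
    return i;
}

/* Worker side: handle every slot whose bits differ, then flip its done bit. */
static int worker_poll(WorkerSlots *w)
{
    uint64_t pending = atomic_load(&w->fill_bitmap) ^ atomic_load(&w->done_bitmap);
    int handled = 0;

    for (int i = 0; i < NREQ; i++) {
        if (pending & (UINT64_C(1) << i)) {
            w->result[i] = w->payload[i] * 2; /* stand-in for real work */
            atomic_fetch_xor(&w->done_bitmap, UINT64_C(1) << i);
            handled++;
        }
    }
    return handled;
}
```

In this sketch each bit has exactly one writer, so neither side needs a
lock or a ring to hand a slot over, which is presumably what lets ptr_ring
and the spinlock go away. In real use worker_poll would run on its own
thread and sleep when nothing is pending.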