From: Paolo Bonzini
Date: Tue, 20 Nov 2018 19:27:38 +0100
Subject: Re: [Qemu-devel] [PATCH v2 0/5] migration: improve multithreads
To: Xiao Guangrong, mst@redhat.com, mtosatti@redhat.com
Cc: qemu-devel@nongnu.org, kvm@vger.kernel.org, dgilbert@redhat.com,
    peterx@redhat.com, wei.w.wang@intel.com, jiang.biao2@zte.com.cn,
    eblake@redhat.com, quintela@redhat.com, cota@braap.org, Xiao Guangrong
References: <20181106122025.3487-1-xiaoguangrong@tencent.com>
    <2c351ac2-ad51-13de-6aea-ffc014edeffe@gmail.com>
In-Reply-To: <2c351ac2-ad51-13de-6aea-ffc014edeffe@gmail.com>

On 12/11/18 04:07, Xiao Guangrong wrote:
>
> Hi,
>
> Ping...

Hi Guangrong, I think this isn't being reviewed because we're in freeze.

Paolo

> On 11/6/18 8:20 PM, guangrong.xiao@gmail.com wrote:
>> From: Xiao Guangrong
>>
>> Changelog in v2:
>> These changes are based on Paolo's suggestion:
>> 1) rename the lockless multithreads model to threaded workqueue
>> 2) hugely improve the internal design, that make all the request be
>>     a large array, properly partition it, assign requests to threads
>>     respectively and use bitmaps to sync up threads and the submitter,
>>     after that ptr_ring and spinlock are dropped
>> 3) introduce event wait for the submitter
>>
>> These changes are based on Emilio's review:
>> 4) make more detailed description for threaded workqueue
>> 5) add a benchmark for threaded workqueue
>>
>> The previous version can be found at
>>     https://marc.info/?l=kvm&m=153968821910007&w=2
>>
>> There's the simple performance measurement comparing these two versions,
>> the environment is the same as we listed in the previous version.
>>
>> Use 8 threads to compress the data in the source QEMU
>> - with compress-wait-thread = off
>>
>>
>>       total time    busy-ratio
>> --------------------------------------------------
>> v1    125066        0.38
>> v2    120444        0.35
>>
>> - with compress-wait-thread = on
>>       total time    busy-ratio
>> --------------------------------------------------
>> v1    164426        0
>> v2    142609        0
>>
>> The v2 win slightly.
>>
>> Xiao Guangrong (5):
>>    bitops: introduce change_bit_atomic
>>    util: introduce threaded workqueue
>>    migration: use threaded workqueue for compression
>>    migration: use threaded workqueue for decompression
>>    tests: add threaded-workqueue-bench
>>
>>   include/qemu/bitops.h             |  13 +
>>   include/qemu/threaded-workqueue.h |  94 +++++++
>>   migration/ram.c                   | 538 ++++++++++++++------------------------
>>   tests/Makefile.include            |   5 +-
>>   tests/threaded-workqueue-bench.c  | 256 ++++++++++++++++++
>>   util/Makefile.objs                |   1 +
>>   util/threaded-workqueue.c         | 466 +++++++++++++++++++++++++++++++++
>>   7 files changed, 1030 insertions(+), 343 deletions(-)
>>   create mode 100644 include/qemu/threaded-workqueue.h
>>   create mode 100644 tests/threaded-workqueue-bench.c
>>   create mode 100644 util/threaded-workqueue.c
>>
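
For readers skimming the thread: item 2 of the quoted changelog (requests
kept in an array, with bitmaps used to sync the threads and the submitter
instead of ptr_ring and a spinlock) could look roughly like the sketch
below. This is only an illustration under my own assumptions, not the code
from the series. The names WorkerSlots, toggle_bit_atomic, submit and
worker_run_one are hypothetical, the "work" is a stand-in, and it uses C11
atomics rather than QEMU's own helpers. The idea shown is one bit per
request in two bitmaps: the submitter flips the "fill" bit, the worker
flips the "done" bit, and a slot is pending exactly when the two bits
differ.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NREQ 64 /* hypothetical: requests owned by one worker, one bit each */

typedef struct {
    _Atomic uint64_t fill; /* bit toggled by the submitter on submit */
    _Atomic uint64_t done; /* bit toggled by the worker on completion */
    int payload[NREQ];     /* stand-in for the real request data */
} WorkerSlots;

static void slots_init(WorkerSlots *w)
{
    atomic_init(&w->fill, 0);
    atomic_init(&w->done, 0);
}

/* Atomically flip one bit; release ordering publishes earlier payload writes. */
static void toggle_bit_atomic(_Atomic uint64_t *map, unsigned bit)
{
    assert(bit < NREQ);
    atomic_fetch_xor_explicit(map, UINT64_C(1) << bit, memory_order_release);
}

/* Index of the lowest set bit, or -1 if no bit is set. */
static int lowest_set_bit(uint64_t v)
{
    for (unsigned i = 0; i < NREQ; i++) {
        if (v & (UINT64_C(1) << i)) {
            return (int)i;
        }
    }
    return -1;
}

/* Submitter side: a slot is free when its fill and done bits are equal. */
static int find_free_slot(WorkerSlots *w)
{
    uint64_t busy = atomic_load_explicit(&w->fill, memory_order_relaxed) ^
                    atomic_load_explicit(&w->done, memory_order_acquire);
    return lowest_set_bit(~busy);
}

/* Submitter side: fill a free slot and mark it pending; -1 if all are busy. */
static int submit(WorkerSlots *w, int value)
{
    int slot = find_free_slot(w);
    if (slot < 0) {
        return -1;
    }
    w->payload[slot] = value;
    toggle_bit_atomic(&w->fill, (unsigned)slot);
    return slot;
}

/* Worker side: process one pending slot; returns false if nothing is pending. */
static bool worker_run_one(WorkerSlots *w)
{
    uint64_t pending = atomic_load_explicit(&w->fill, memory_order_acquire) ^
                       atomic_load_explicit(&w->done, memory_order_relaxed);
    int slot = lowest_set_bit(pending);
    if (slot < 0) {
        return false;
    }
    w->payload[slot] *= 2; /* placeholder for the real work, e.g. compression */
    toggle_bit_atomic(&w->done, (unsigned)slot);
    return true;
}
```

Each bitmap has a single writer in this sketch (the submitter owns "fill",
the worker owns "done"), which is why no lock is taken. A real
implementation would also need the event wait mentioned in item 3 so that
neither side spins when there is nothing to do.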