From: Xiao Guangrong <guangrong.xiao@gmail.com>
To: Peter Xu <peterx@redhat.com>
Cc: pbonzini@redhat.com, mst@redhat.com, mtosatti@redhat.com,
qemu-devel@nongnu.org, kvm@vger.kernel.org, dgilbert@redhat.com,
jiang.biao2@zte.com.cn, wei.w.wang@intel.com,
Xiao Guangrong <xiaoguangrong@tencent.com>
Subject: Re: [Qemu-devel] [PATCH 00/12] migration: improve multithreads for compression and decompression
Date: Tue, 12 Jun 2018 11:19:14 +0800 [thread overview]
Message-ID: <9127095c-dff2-a963-d9d1-3d1fe22b7937@gmail.com> (raw)
In-Reply-To: <20180611080042.GK7736@xz-mi>
On 06/11/2018 04:00 PM, Peter Xu wrote:
> On Mon, Jun 04, 2018 at 05:55:08PM +0800, guangrong.xiao@gmail.com wrote:
>> From: Xiao Guangrong <xiaoguangrong@tencent.com>
>>
>> Background
>> ----------
>> The current implementation of compression and decompression is very
>> hard to enable in production. We noticed that too many wait/wake
>> operations go to kernel space and CPU usage is very low even when
>> the system is mostly idle.
>>
>> The reasons are:
>> 1) too many locks are used for synchronization: there is a global
>> lock and each thread has its own lock; the migration thread and the
>> worker threads have to go to sleep if these locks are busy
>>
>> 2) the migration thread submits requests to the threads one by one;
>> however, only one request can be pending per thread, which means a
>> thread has to go back to sleep after finishing each request
>>
>> Our Ideas
>> ---------
>> To make it work better, we introduce a new multithreaded model: the
>> user (currently the migration thread) submits requests to the threads
>> in a round-robin manner; each thread has its own ring with a capacity
>> of 4 and puts its results into a global ring that is lockless for
>> multiple producers; the user fetches results from the global ring and
>> does the remaining work for each request, e.g. sending the compressed
>> data out for migration on the source QEMU.
>>
>> The other work in this patchset adds statistics so we can see whether
>> compression behaves as expected, and makes the migration thread faster
>> so it can feed more requests to the threads.
>
> Hi, Guangrong,
>
> I'm not sure whether my understanding is correct, but AFAIU the old
> code has a major defect in that it depends too much on the big lock.
> The critical section of the small lock always seems to be very short,
> and it is per-thread. However, we use the big lock in lots of places:
> flushing compressed data, queuing every page, and sending the
> notifications in the compression threads.
>
The lock is one issue; another is that a thread has to go to sleep
after finishing each request, and the main thread (the live migration
thread) needs to go to kernel space to wake the thread up for every
single request.

We also observed that linearly scanning the threads one by one to see
which is free is not cache-friendly...
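
With the new model, the migration thread no longer scans for a free
thread or wakes anyone up per request; it just round-robins over small
per-thread request rings. Here is a minimal sketch of that submission
path (illustrative names and sizes only, e.g. CompressReq, NR_THREADS,
submit_request; this is not the actual patch code):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define REQ_RING_SIZE 4      /* per-thread request ring capacity        */
#define NR_THREADS    8      /* illustrative; really a tunable          */

typedef struct CompressReq CompressReq;

/* One single-producer/single-consumer ring per worker thread. */
typedef struct {
    CompressReq *slots[REQ_RING_SIZE];
    _Atomic uint32_t head;   /* advanced by the worker (consumer)       */
    _Atomic uint32_t tail;   /* advanced by the migration thread        */
} ReqRing;

static ReqRing req_rings[NR_THREADS];

/*
 * Migration thread: place the request on the first worker that has a
 * free slot, starting from where we stopped last time (round-robin).
 * No lock and no syscall on this path.
 */
static bool submit_request(CompressReq *req)
{
    static unsigned next;    /* only the migration thread touches this  */

    for (unsigned i = 0; i < NR_THREADS; i++) {
        ReqRing *r = &req_rings[(next + i) % NR_THREADS];
        uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
        uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

        if (tail - head < REQ_RING_SIZE) {          /* ring not full    */
            r->slots[tail % REQ_RING_SIZE] = req;
            atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
            next = (next + i + 1) % NR_THREADS;
            return true;
        }
    }
    return false;  /* all workers busy; caller handles the page itself  */
}

The worker only pops from its own ring and sleeps solely when the ring
stays empty, so the per-request wait/wake syscalls disappear from the
common path.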
> I haven't yet read the whole series; the work seems to be quite nice
> according to your test results. However, have you thought about first
> removing the big lock without touching much of the rest of the code,
> and then continuing to improve it? Or have you ever tried to do so?
> I don't think you need to do extra work for this, but I would
> appreciate it if you have existing test results to share.
>
If you really want the performance results, I will try it...

Actually, the first version we used in production had a lockless
multithreaded model (only one atomic operation is needed for both the
producer and the consumer), but only one request could be fed to each
thread. That is comparable to your suggestion (and should be much
faster than your suggestion).

We observed that the shortcoming of that solution is that too many
waits and wakeups trap into the kernel, so the CPU is idle and the
bandwidth is low.
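
Roughly, that first version looked like the sketch below (a simplified
illustration, not our production code; the names Mailbox/submit/worker
are made up here and POSIX semaphores stand in for whatever wake-up
primitive was actually used): one request slot per worker and a single
atomic to publish the work, but a semaphore round-trip for every
request, which is exactly where the kernel wait/wake overhead came from.

#include <semaphore.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct CompressReq CompressReq;

/* One mailbox per worker: holds at most a single pending request. */
typedef struct {
    _Atomic(CompressReq *) pending;   /* NULL means the worker is free  */
    sem_t kick;                       /* producer -> worker wakeup      */
    sem_t done;                       /* worker -> producer wakeup      */
} Mailbox;

/* Migration thread: one atomic store, but also one sem_post per page. */
static void submit(Mailbox *m, CompressReq *req)
{
    atomic_store_explicit(&m->pending, req, memory_order_release);
    sem_post(&m->kick);               /* kernel wakeup, every request   */
}

/* Worker thread: sleeps in the kernel between every two requests. */
static void *worker(void *opaque)
{
    Mailbox *m = opaque;

    for (;;) {
        sem_wait(&m->kick);           /* back to sleep after each job   */
        CompressReq *req =
            atomic_exchange_explicit(&m->pending, NULL, memory_order_acquire);
        (void)req;                    /* compress_page(req) runs here   */
        sem_post(&m->done);           /* and wake the migration thread  */
    }
    return NULL;
}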
> In other words, would it be nicer to separate the work into two
> pieces?
>
> - one to refactor the existing locks, to see what we can gain by
> simplifying the locking to the minimum. AFAIU the locking used now is
> still not ideal; my thinking is that _maybe_ we can start by removing
> the big lock and using a semaphore or something to replace the "done"
> notification while still keeping the small lock? Even some busy
> looping?
>
Note: no lock is used after this patchset...
> - one to introduce the lockless ring buffer, to demonstrate how the
> lockless data structure helps compared to the locking approach
>
> Then we can know how much each item contributed to the performance
> numbers. After all, the new ring and thread model seem to be a big
> chunk of work (sorry I haven't read them yet, but I will).
It is really a huge burden to refactor the old code and then completely
remove it later.

We redesigned the data structures and algorithm completely and
abstracted the model to clean up the code used for compression and
decompression; it's not easy to modify the old code part by part... :(

But... if you think it is really needed, I will try to figure out a way
to address your suggestion. :)
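
Just to give the flavour of the new design (again a simplified sketch
with made-up names such as ResRing, push_result and fetch_result; the
actual ring in patch 09 is more careful about barriers, cache-line
placement and error handling): the workers publish finished requests
into one shared ring with a single fetch-add, and the migration thread
drains it without taking any lock.

#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RES_RING_SIZE 64   /* power of two, >= total in-flight requests */

typedef struct CompressReq CompressReq;

/* Multi-producer (workers) / single-consumer (migration thread) ring. */
typedef struct {
    _Atomic(CompressReq *) slots[RES_RING_SIZE];
    _Atomic uint32_t tail;   /* slots claimed by workers via fetch_add  */
    uint32_t head;           /* only the migration thread touches this  */
} ResRing;

static ResRing res_ring;

/*
 * Worker: claim a slot with one atomic fetch-add, then publish the
 * pointer.  Because in-flight requests never exceed RES_RING_SIZE
 * (they are bounded by nr_threads * per-thread ring size), the claimed
 * slot is guaranteed to have been consumed (reset to NULL) already.
 */
static void push_result(CompressReq *req)
{
    uint32_t slot = atomic_fetch_add_explicit(&res_ring.tail, 1,
                                              memory_order_relaxed);
    atomic_store_explicit(&res_ring.slots[slot % RES_RING_SIZE], req,
                          memory_order_release);
}

/* Migration thread: take the next finished request, or NULL if none. */
static CompressReq *fetch_result(void)
{
    CompressReq *req = atomic_exchange_explicit(
        &res_ring.slots[res_ring.head % RES_RING_SIZE], NULL,
        memory_order_acquire);

    if (req) {
        res_ring.head++;     /* consume strictly in claim order         */
    }
    return req;
}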
Thanks!
Thread overview: 78+ messages
2018-06-04 9:55 [Qemu-devel] [PATCH 00/12] migration: improve multithreads for compression and decompression guangrong.xiao
2018-06-04 9:55 ` [Qemu-devel] [PATCH 01/12] migration: do not wait if no free thread guangrong.xiao
2018-06-11 7:39 ` Peter Xu
2018-06-12 2:42 ` Xiao Guangrong
2018-06-12 3:15 ` Peter Xu
2018-06-13 15:43 ` Dr. David Alan Gilbert
2018-06-14 3:19 ` Xiao Guangrong
2018-06-04 9:55 ` [Qemu-devel] [PATCH 02/12] migration: fix counting normal page for compression guangrong.xiao
2018-06-13 15:51 ` Dr. David Alan Gilbert
2018-06-14 3:32 ` Xiao Guangrong
2018-06-04 9:55 ` [Qemu-devel] [PATCH 03/12] migration: fix counting xbzrle cache_miss_rate guangrong.xiao
2018-06-13 16:09 ` Dr. David Alan Gilbert
2018-06-15 11:30 ` Dr. David Alan Gilbert
2018-06-04 9:55 ` [Qemu-devel] [PATCH 04/12] migration: introduce migration_update_rates guangrong.xiao
2018-06-13 16:17 ` Dr. David Alan Gilbert
2018-06-14 3:35 ` Xiao Guangrong
2018-06-15 11:32 ` Dr. David Alan Gilbert
2018-06-04 9:55 ` [Qemu-devel] [PATCH 05/12] migration: show the statistics of compression guangrong.xiao
2018-06-04 22:31 ` Eric Blake
2018-06-06 12:44 ` Xiao Guangrong
2018-06-13 16:25 ` Dr. David Alan Gilbert
2018-06-14 6:48 ` Xiao Guangrong
2018-07-16 19:01 ` Dr. David Alan Gilbert
2018-07-18 8:51 ` Xiao Guangrong
2018-06-04 9:55 ` [Qemu-devel] [PATCH 06/12] migration: do not detect zero page for compression guangrong.xiao
2018-06-19 7:30 ` Peter Xu
2018-06-28 9:12 ` Xiao Guangrong
2018-06-28 9:36 ` Daniel P. Berrangé
2018-06-29 3:50 ` Xiao Guangrong
2018-06-29 9:54 ` Dr. David Alan Gilbert
2018-06-29 9:42 ` Dr. David Alan Gilbert
2018-07-03 3:53 ` Xiao Guangrong
2018-07-16 18:58 ` Dr. David Alan Gilbert
2018-07-18 8:46 ` Xiao Guangrong
2018-07-22 16:05 ` Michael S. Tsirkin
2018-07-23 7:12 ` Xiao Guangrong
2018-06-04 9:55 ` [Qemu-devel] [PATCH 07/12] migration: hold the lock only if it is really needed guangrong.xiao
2018-06-19 7:36 ` Peter Xu
2018-06-28 9:33 ` Xiao Guangrong
2018-06-29 11:22 ` Dr. David Alan Gilbert
2018-07-03 6:27 ` Xiao Guangrong
2018-07-11 8:21 ` Peter Xu
2018-07-12 7:47 ` Xiao Guangrong
2018-07-12 8:26 ` Peter Xu
2018-07-18 8:56 ` Xiao Guangrong
2018-07-18 10:18 ` Peter Xu
2018-07-13 17:44 ` Dr. David Alan Gilbert
2018-06-04 9:55 ` [Qemu-devel] [PATCH 08/12] migration: do not flush_compressed_data at the end of each iteration guangrong.xiao
2018-07-13 18:01 ` Dr. David Alan Gilbert
2018-07-18 8:44 ` Xiao Guangrong
2018-06-04 9:55 ` [Qemu-devel] [PATCH 09/12] ring: introduce lockless ring buffer guangrong.xiao
2018-06-20 4:52 ` Peter Xu
2018-06-28 10:02 ` Xiao Guangrong
2018-06-28 11:55 ` Wei Wang
2018-06-29 3:55 ` Xiao Guangrong
2018-07-03 15:55 ` Paul E. McKenney
2018-06-20 5:55 ` Peter Xu
2018-06-28 14:00 ` Xiao Guangrong
2018-06-20 12:38 ` Michael S. Tsirkin
2018-06-29 7:30 ` Xiao Guangrong
2018-06-29 13:08 ` Michael S. Tsirkin
2018-07-03 7:31 ` Xiao Guangrong
2018-06-28 13:36 ` Jason Wang
2018-06-29 3:59 ` Xiao Guangrong
2018-06-29 6:15 ` Jason Wang
2018-06-29 7:47 ` Xiao Guangrong
2018-06-29 4:23 ` Michael S. Tsirkin
2018-06-29 7:44 ` Xiao Guangrong
2018-06-04 9:55 ` [Qemu-devel] [PATCH 10/12] migration: introduce lockless multithreads model guangrong.xiao
2018-06-20 6:52 ` Peter Xu
2018-06-28 14:25 ` Xiao Guangrong
2018-07-13 16:24 ` Dr. David Alan Gilbert
2018-07-18 7:12 ` Xiao Guangrong
2018-06-04 9:55 ` [Qemu-devel] [PATCH 11/12] migration: use lockless Multithread model for compression guangrong.xiao
2018-06-04 9:55 ` [Qemu-devel] [PATCH 12/12] migration: use lockless Multithread model for decompression guangrong.xiao
2018-06-11 8:00 ` [Qemu-devel] [PATCH 00/12] migration: improve multithreads for compression and decompression Peter Xu
2018-06-12 3:19 ` Xiao Guangrong [this message]
2018-06-12 5:36 ` Peter Xu