From: Xiao Guangrong
To: Peter Xu
Cc: "Dr. David Alan Gilbert", liang.z.li@intel.com, kvm@vger.kernel.org,
    quintela@redhat.com, mtosatti@redhat.com, Xiao Guangrong,
    qemu-devel@nongnu.org, mst@redhat.com, pbonzini@redhat.com
Subject: Re: [Qemu-devel] [PATCH 1/8] migration: stop compressing page in migration thread
Date: Mon, 26 Mar 2018 23:43:33 +0800
Message-ID: <73e25db4-997f-0fbf-0c73-6589283c4005@gmail.com>
In-Reply-To: <20180326090213.GB17789@xz-mi>
References: <20180313075739.11194-1-xiaoguangrong@tencent.com>
 <20180313075739.11194-2-xiaoguangrong@tencent.com>
 <20180315102501.GA3062@work-vm>
 <423c901d-16b6-67fb-262b-3021e30871ec@gmail.com>
 <20180321081923.GB20571@xz-mi>
 <20180326090213.GB17789@xz-mi>

On 03/26/2018 05:02 PM, Peter Xu wrote:
> On Thu, Mar 22, 2018 at 07:38:07PM +0800, Xiao Guangrong wrote:
>>
>> On 03/21/2018 04:19 PM, Peter Xu wrote:
>>> On Fri, Mar 16, 2018 at 04:05:14PM +0800, Xiao Guangrong wrote:
>>>>
>>>> Hi David,
>>>>
>>>> Thanks for your review.
>>>>
>>>> On 03/15/2018 06:25 PM, Dr. David Alan Gilbert wrote:
>>>>
>>>>>>  migration/ram.c | 32 ++++++++++++++++----------------
>>>>>
>>>>> Hi,
>>>>>   Do you have some performance numbers to show that this helps?
>>>>> Were those taken on a normal system, or with one of the compression
>>>>> accelerators (which I think the compression migration was designed
>>>>> for)?
>>>>
>>>> Yes, I tested it on my desktop (i7-4790 + 16G) by live migrating, on
>>>> the same host, a VM with 8 vCPUs + 6G of memory, with max-bandwidth
>>>> limited to 350.
>>>>
>>>> During the migration, a workload with 8 threads repeatedly wrote to
>>>> the whole 6G of memory in the VM. Before this patchset the migration
>>>> bandwidth was ~25 mbps; after applying it, the bandwidth is ~50 mbps.
>>>
>>> Hi, Guangrong,
>>>
>>> Not really review comments, but I have some questions. :)
>>
>> Your comments are always valuable to me! :)
>>
>>>
>>> IIUC this patch only changes the behavior when last_sent_block
>>> changes. I see that the performance is doubled after the change,
>>> which is really promising. However I don't fully understand why it
>>> brings such a big difference, considering that IMHO the current code
>>> sends dirty pages per-RAMBlock. I mean, IMHO last_sent_block should
>>> not change frequently? Or am I wrong?
>>
>> That depends on the configuration: each memory region that is RAM- or
>> file-backed has its own RAMBlock.
>>
>> Actually, more of the benefit comes from the improved performance and
>> throughput of the compression threads, since those threads are fed by
>> the migration thread and their results are consumed by the migration
>> thread.
>
> I'm not sure whether I got your point - I think you mean that the
> compression threads and the migration thread can form a better
> pipeline if the migration thread does not do any compression at all.
>
> I think I agree with that.
>
> However, it does not really explain to me why a very rare event
> (sending the first page of a RAMBlock, considering bitmap sync is
> rare) can affect the performance so greatly (it shows a doubled
> boost).

I understand it is tricky indeed, but it is not very hard to explain.
With the original code, the compression threads (8 CPUs in our test)
sit idle for long stretches. After our patch, the normal page is
posted out asynchronously, which is extremely fast as you said (the
network is almost idle in the current implementation), so the CPUs
spend far more of their time effectively generating compressed data
than before.
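To make that pipeline concrete, here is a minimal standalone sketch of
the idea (hypothetical code, NOT the actual migration/ram.c
implementation; all names, queue sizes, and page counts are made up):
a "migration thread" that only feeds a bounded queue while dedicated
workers do all the zlib work, so the workers are never starved by
inline compression.

/*
 * pipeline_sketch.c - hypothetical sketch, not QEMU code.
 * Build: gcc -std=gnu99 -O2 -pthread pipeline_sketch.c -lz
 */
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <zlib.h>

#define PAGE_SIZE 4096
#define N_WORKERS 8
#define N_PAGES   20000
#define QUEUE_LEN 64

static unsigned char guest_page[PAGE_SIZE]; /* stand-in for a dirty page */

/* Bounded work queue: producer = migration thread, consumers = workers. */
static int queue[QUEUE_LEN];
static int q_head, q_tail, q_count, q_done;
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t q_not_empty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t q_not_full = PTHREAD_COND_INITIALIZER;

static void enqueue(int page_idx)
{
    pthread_mutex_lock(&q_lock);
    while (q_count == QUEUE_LEN)
        pthread_cond_wait(&q_not_full, &q_lock);
    queue[q_tail] = page_idx;
    q_tail = (q_tail + 1) % QUEUE_LEN;
    q_count++;
    pthread_cond_signal(&q_not_empty);
    pthread_mutex_unlock(&q_lock);
}

static int dequeue(int *page_idx)
{
    pthread_mutex_lock(&q_lock);
    while (q_count == 0 && !q_done)
        pthread_cond_wait(&q_not_empty, &q_lock);
    if (q_count == 0) {                  /* drained and done */
        pthread_mutex_unlock(&q_lock);
        return 0;
    }
    *page_idx = queue[q_head];
    q_head = (q_head + 1) % QUEUE_LEN;
    q_count--;
    pthread_cond_signal(&q_not_full);
    pthread_mutex_unlock(&q_lock);
    return 1;
}

/* Worker: compresses pages as fast as the producer can feed them. */
static void *compress_worker(void *arg)
{
    unsigned char out[PAGE_SIZE * 2];
    int idx;

    (void)arg;
    while (dequeue(&idx)) {
        uLongf out_len = sizeof(out);
        compress2(out, &out_len, guest_page, PAGE_SIZE, Z_BEST_SPEED);
        /* real code would hand 'out' back to be put on the wire */
    }
    return NULL;
}

int main(void)
{
    pthread_t workers[N_WORKERS];
    int i;

    memset(guest_page, 0x5a, sizeof(guest_page));
    for (i = 0; i < N_WORKERS; i++)
        pthread_create(&workers[i], NULL, compress_worker, NULL);

    /* "Migration thread": it only feeds pages; it never calls
     * compress2() itself, so it is never the pipeline bottleneck. */
    for (i = 0; i < N_PAGES; i++)
        enqueue(i);

    pthread_mutex_lock(&q_lock);
    q_done = 1;
    pthread_cond_broadcast(&q_not_empty);
    pthread_mutex_unlock(&q_lock);

    for (i = 0; i < N_WORKERS; i++)
        pthread_join(workers[i], NULL);
    return 0;
}

The only point of the sketch is the division of labor: once the
producer never blocks on compress2(), the consumers can saturate the
CPUs.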
> Btw, about the numbers: IMHO the numbers might not be really "true
> numbers". Or say, even if the bandwidth is doubled, IMHO it does not
> mean the performance is doubled, because the data has changed.
>
> Previously there were only compressed pages, and now for each cycle
> of the RAMBlock loop we'll send a normal page (then we'll get more
> things to send). So IMHO we don't really know whether we sent more
> pages with this patch; we only know we sent more bytes (e.g., an
> extreme case is that the extra 25 mbps is all caused by those normal
> pages, and we may be sending exactly the same number of pages as
> before, or even worse?).

The current implementation uses the CPUs very ineffectively (fixing
that is our next work to be posted out), so the network is almost idle
and posting more data out is the better choice. Furthermore, the
migration thread is part of the parallel pipeline, so it is better to
make it fast.

>>
>>>
>>> Another follow-up question would be: have you measured how long it
>>> takes to compress a 4K page, and how long to send it? I think
>>> "sending the page" is not really meaningful considering that we just
>>> put the page into a buffer (which should be extremely fast since we
>>> don't really flush it every time); however, I would be curious how
>>> slow compressing a page would be.
>>
>> I haven't benchmarked the performance of zlib; I think it is a
>> CPU-intensive workload, particularly as there is no compression
>> accelerator (e.g., QAT) in our production environment. BTW, we were
>> using lzo instead of zlib, which worked better for some workloads.
>
> Never mind. Good to know about that.
>
>>
>> Putting a page into the buffer depends on the network, i.e., if the
>> network is congested it can take a long time. :)
>
> Again, considering that I don't know much about compression (I have
> hardly used it), mine are only questions, which should not block your
> patches from being queued/merged/reposted when proper. :)

Yes, I see. The discussion may well lead to a better solution.

Thanks for your comments, Peter!
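P.S. Since neither of us has actual numbers for zlib on a 4K page,
below is a quick standalone micro-benchmark sketch that could produce
some (hypothetical code, not from the QEMU tree; the buffer contents,
iteration count, and chosen levels are arbitrary):

/*
 * zbench.c - hypothetical micro-benchmark: zlib cost per 4K page.
 * Build: gcc -std=gnu99 -O2 zbench.c -lz
 */
#include <stdio.h>
#include <time.h>
#include <zlib.h>

#define PAGE_SIZE 4096
#define ITERS     10000

int main(void)
{
    static unsigned char page[PAGE_SIZE];
    unsigned char out[PAGE_SIZE * 2];
    struct timespec t0, t1;
    int i, level;

    /* mildly compressible content; an all-zero page would be too easy */
    for (i = 0; i < PAGE_SIZE; i++)
        page[i] = (unsigned char)(i & 0x3f);

    for (level = 1; level <= 9; level += 4) {    /* levels 1, 5, 9 */
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (i = 0; i < ITERS; i++) {
            uLongf out_len = sizeof(out);
            compress2(out, &out_len, page, PAGE_SIZE, level);
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double ns = (t1.tv_sec - t0.tv_sec) * 1e9
                  + (t1.tv_nsec - t0.tv_nsec);
        printf("level %d: %.1f us/page\n", level, ns / ITERS / 1e3);
    }
    return 0;
}

Comparing its per-page times against how long it takes to memcpy a
page into a send buffer should make the compress-versus-buffer
imbalance you asked about directly measurable.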