From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4BDEBC09.5020501@linux.vnet.ibm.com>
Date: Mon, 03 May 2010 07:05:29 -0500
From: Anthony Liguori
References: <1271829445-5328-1-git-send-email-tamura.yoshiaki@lab.ntt.co.jp>
 <1271829445-5328-6-git-send-email-tamura.yoshiaki@lab.ntt.co.jp>
 <4BD0A35E.8000205@linux.vnet.ibm.com> <4BD11604.3060309@lab.ntt.co.jp>
 <4BD19F12.2020004@linux.vnet.ibm.com> <4BD1A52C.1090406@redhat.com>
Subject: [Qemu-devel] Re: [RFC PATCH 05/20] Introduce put_vector() and
 get_vector to QEMUFile and qemu_fopen_ops().
List-Id: qemu-devel.nongnu.org
To: Yoshiaki Tamura
Cc: ohmura.kei@lab.ntt.co.jp, kvm@vger.kernel.org, mtosatti@redhat.com,
 Anthony Liguori, qemu-devel@nongnu.org, yoshikawa.takuya@oss.ntt.co.jp,
 Avi Kivity

On 05/03/2010 04:32 AM, Yoshiaki Tamura wrote:
> 2010/4/23 Avi Kivity:
>
>> On 04/23/2010 04:22 PM, Anthony Liguori wrote:
>>
>>>> I currently don't have data, but I'll prepare it.
>>>> There were two things I wanted to avoid:
>>>>
>>>> 1. Pages being copied into the QEMUFile buf through qemu_put_buffer.
>>>> 2. Calling write() every time, even when we want to send multiple
>>>>    pages at once.
>>>>
>>>> I think 2 may be negligible.
>>>> But 1 seems to be problematic if we want to make the latency as small
>>>> as possible, no?
>>>>
>>>
>>> Copying often has strange CPU characteristics depending on whether the
>>> data is already in cache.  It's better to drive these sorts of
>>> optimizations through performance measurement, because the effects are
>>> not always obvious.
>>>
>> Copying always introduces more cache pollution, so even if the data is
>> in the cache, avoiding it is worthwhile (not disagreeing with the need
>> to measure).
>>
> Anthony,
>
> I measured how long it takes to send all guest pages during migration,
> and I would like to share the results in this message.  For convenience,
> I modified the code to do plain migration rather than "live migration",
> which means the buffered file is not used here.
>
> In summary, the performance improvement from using writev instead of
> write/send over GbE seems to be negligible; however, when the underlying
> network was fast (InfiniBand with IPoIB in this case), writev performed
> 17% faster than write/send, and therefore it may be worthwhile to
> introduce vectors.
>
> Since QEMU compresses pages, I copied a junk file to tmpfs to dirty
> pages, so that QEMU would transfer a sufficient number of uncompressed
> pages.  After setting up the guest, I used cpu_get_real_ticks() to
> measure the time spent in the while loop calling ram_save_block() in
> ram_save_live().  I removed the qemu_file_rate_limit() check to disable
> the buffered-file rate limiting, so all of the pages were transferred in
> the first round.
>
> I measured 10 runs for each case and took the average and standard
> deviation.  Considering the results, I think the number of trials was
> enough.  In addition to the elapsed time, the number of writev/write
> calls and the number of pages that were compressed (dup) / not
> compressed (nodup) are shown.
>
> Test environment:
> CPU: 2x Intel Xeon Dual Core 3GHz
> Mem size: 6GB
> Network: GbE, InfiniBand (IPoIB)
>
> Host OS: Fedora 11 (kernel 2.6.34-rc1)
> Guest OS: Fedora 11 (kernel 2.6.33)
> Guest Mem size: 512MB
>
> * GbE writev
> time (sec): 35.732 (std 0.002)
> write count: 4 (std 0)
> writev count: 8269 (std 1)
> dup count: 36157 (std 124)
> nodup count: 1016808 (std 147)
>
> * GbE write
> time (sec): 35.780 (std 0.164)
> write count: 127367 (std 21)
> writev count: 0 (std 0)
> dup count: 36134 (std 108)
> nodup count: 1016853 (std 165)
>
> * IPoIB writev
> time (sec): 13.889 (std 0.155)
> write count: 4 (std 0)
> writev count: 8267 (std 1)
> dup count: 36147 (std 105)
> nodup count: 1016838 (std 111)
>
> * IPoIB write
> time (sec): 16.777 (std 0.239)
> write count: 127364 (std 24)
> writev count: 0 (std 0)
> dup count: 36173 (std 169)
> nodup count: 1016840 (std 190)
>
> Although the improvement wasn't obvious when the network was GbE,
> introducing writev may be worthwhile when we focus on faster networks
> like InfiniBand/10GbE.
>
> I agree with separating this optimization from the main logic of Kemari,
> since this modification must be applied widely and carefully at the same
> time.
>

Okay.  It's clear that this is a win, so let's split it out of the main
series and treat it separately.  I imagine we'll see even more positive
results on 10 GbE, particularly if we move migration out into a separate
thread.

Regards,

Anthony Liguori

> Thanks,
>
> Yoshi
>
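
To make the write-versus-writev comparison above concrete, here is a
minimal, self-contained sketch of the two I/O strategies being measured.
This is illustrative C, not the actual QEMU patch: the function names
send_pages_write()/send_pages_writev(), the BUF_PAGES and IOV_BATCH
constants, and the omission of short-write handling are all simplifying
assumptions; the real series instead adds put_vector()/get_vector()
callbacks to QEMUFile and qemu_fopen_ops(), as the subject says.

#include <stdint.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

#define PAGE_SIZE 4096          /* guest page size assumed here */
#define BUF_PAGES 8             /* illustrative staging-buffer size */
#define IOV_BATCH 128           /* illustrative writev batch size */

/*
 * Strategy 1: copy each page into a staging buffer and flush it with
 * write().  Every page is memcpy'd once (extra cache pollution) and each
 * flush is a separate system call.  Short writes are not retried, for
 * brevity.
 */
static ssize_t send_pages_write(int fd, uint8_t *pages[], int npages)
{
    static uint8_t buf[BUF_PAGES * PAGE_SIZE];
    ssize_t total = 0;
    int i, filled = 0;

    for (i = 0; i < npages; i++) {
        memcpy(buf + filled * PAGE_SIZE, pages[i], PAGE_SIZE);
        if (++filled == BUF_PAGES || i == npages - 1) {
            ssize_t n = write(fd, buf, (size_t)filled * PAGE_SIZE);
            if (n < 0) {
                return -1;
            }
            total += n;
            filled = 0;
        }
    }
    return total;
}

/*
 * Strategy 2: gather pointers to the guest pages into an iovec and hand a
 * whole batch to writev().  No copy into a staging buffer is made, and one
 * system call covers up to IOV_BATCH pages.
 */
static ssize_t send_pages_writev(int fd, uint8_t *pages[], int npages)
{
    struct iovec iov[IOV_BATCH];
    ssize_t total = 0;
    int i, cnt = 0;

    for (i = 0; i < npages; i++) {
        iov[cnt].iov_base = pages[i];
        iov[cnt].iov_len  = PAGE_SIZE;
        if (++cnt == IOV_BATCH || i == npages - 1) {
            ssize_t n = writev(fd, iov, cnt);
            if (n < 0) {
                return -1;
            }
            total += n;
            cnt = 0;
        }
    }
    return total;
}

int main(void)
{
    static uint8_t page_a[PAGE_SIZE], page_b[PAGE_SIZE];
    uint8_t *pages[] = { page_a, page_b };

    memset(page_a, 'A', PAGE_SIZE);
    memset(page_b, 'B', PAGE_SIZE);

    /* Send two pages each way to stdout, e.g. piped into "wc -c". */
    send_pages_write(STDOUT_FILENO, pages, 2);
    send_pages_writev(STDOUT_FILENO, pages, 2);
    return 0;
}

The second variant hands guest pages to the kernel by reference, so the
memcpy() into a staging buffer disappears and far fewer system calls are
issued, which is consistent with the write/writev call counts and the
IPoIB timings reported above.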