From: Yoshiaki Tamura
Date: Tue, 4 May 2010 00:36:14 +0900
Subject: [Qemu-devel] Re: [RFC PATCH 05/20] Introduce put_vector() and get_vector to QEMUFile and qemu_fopen_ops().
To: Anthony Liguori
Cc: ohmura.kei@lab.ntt.co.jp, kvm@vger.kernel.org, mtosatti@redhat.com, Anthony Liguori, qemu-devel@nongnu.org, yoshikawa.takuya@oss.ntt.co.jp, Avi Kivity
In-Reply-To: <4BDEBC09.5020501@linux.vnet.ibm.com>
List-Id: qemu-devel.nongnu.org

2010/5/3 Anthony Liguori:
> On 05/03/2010 04:32 AM, Yoshiaki Tamura wrote:
>>
>> 2010/4/23 Avi Kivity:
>>>
>>> On 04/23/2010 04:22 PM, Anthony Liguori wrote:
>>>
>>>>> I currently don't have data, but I'll prepare it.
>>>>> There were two things I wanted to avoid.
>>>>>
>>>>> 1. Pages being copied to the QEMUFile buf through qemu_put_buffer.
>>>>> 2. Calling write() every time, even when we want to send multiple
>>>>> pages at once.
>>>>>
>>>>> I think 2 may be negligible.
>>>>> But 1 seems to be problematic if we want to make the latency as
>>>>> small as possible, no?
>>>>
>>>> Copying often has strange CPU characteristics depending on whether
>>>> the data is already in cache.  It's better to drive these sorts of
>>>> optimizations through performance measurement, because changes are
>>>> not always obvious.
>>>
>>> Copying always introduces more cache pollution, so even if the data
>>> is in the cache, it is worthwhile (not disagreeing with the need to
>>> measure).
>>
>> Anthony,
>>
>> I measured how long it takes to send all guest pages during migration,
>> and I would like to share the results in this message.  For
>> convenience, I modified the code to do a plain migration rather than a
>> "live migration", which means the buffered file is not used here.
>>
>> In summary, the performance improvement from using writev instead of
>> write/send over GbE seems to be negligible; however, when the
>> underlying network was fast (InfiniBand with IPoIB in this case),
>> writev performed 17% faster than write/send, so it may be worthwhile
>> to introduce vectors.
>>
>> Since QEMU compresses pages, I copied a junk file to tmpfs to dirty
>> pages, so that QEMU would transfer a fair number of pages.  After
>> setting up the guest, I used cpu_get_real_ticks() to measure the time
>> spent in the while loop that calls ram_save_block() in ram_save_live().
>> I removed the qemu_file_rate_limit() call to disable the buffered-file
>> rate limiting, so all of the pages are transferred in the first round.
>>
>> I measured 10 times for each case and took the average and standard
>> deviation.  Considering the results, I think the number of trials was
>> enough.  In addition to the elapsed time, the number of writev/write
>> calls and the number of pages that were compressed (dup) / not
>> compressed (nodup) are shown.
>>
>> Test Environment:
>> CPU: 2x Intel Xeon Dual Core 3GHz
>> Mem size: 6GB
>> Network: GbE, InfiniBand (IPoIB)
>>
>> Host OS: Fedora 11 (kernel 2.6.34-rc1)
>> Guest OS: Fedora 11 (kernel 2.6.33)
>> Guest Mem size: 512MB
>>
>> * GbE writev
>> time (sec): 35.732 (std 0.002)
>> write count: 4 (std 0)
>> writev count: 8269 (std 1)
>> dup count: 36157 (std 124)
>> nodup count: 1016808 (std 147)
>>
>> * GbE write
>> time (sec): 35.780 (std 0.164)
>> write count: 127367 (std 21)
>> writev count: 0 (std 0)
>> dup count: 36134 (std 108)
>> nodup count: 1016853 (std 165)
>>
>> * IPoIB writev
>> time (sec): 13.889 (std 0.155)
>> write count: 4 (std 0)
>> writev count: 8267 (std 1)
>> dup count: 36147 (std 105)
>> nodup count: 1016838 (std 111)
>>
>> * IPoIB write
>> time (sec): 16.777 (std 0.239)
>> write count: 127364 (std 24)
>> writev count: 0 (std 0)
>> dup count: 36173 (std 169)
>> nodup count: 1016840 (std 190)
>>
>> Although the improvement wasn't obvious when the network was GbE,
>> introducing writev may be worthwhile when we focus on faster networks
>> like InfiniBand/10GE.
>>
>> I agree with separating this optimization from the main logic of
>> Kemari, since this modification must be made widely and carefully at
>> the same time.
>
> Okay.  It looks like it's clear that it's a win, so let's split it out
> of the main series and we'll treat it separately.  I imagine we'll see
> even more positive results on 10 gbit, particularly if we move
> migration out into a separate thread.

Great!  I also wanted to test with 10GE, but I'm physically away from my
office now and can't set up the test environment.  I'll measure the
numbers with 10GE next week.

BTW, I was thinking of writing a patch to use separate threads for both
the sender and receiver sides of migration.  Kemari especially needs a
separate receiver thread, so that the monitor can accept commands from
other HA tools.  Is someone already working on this?  If not, I'll add
it to my task list :-)

Thanks,

Yoshi

>
> Regards,
>
> Anthony Liguori
>
>> Thanks,
>>
>> Yoshi
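The optimization measured above replaces the per-page copy into the
QEMUFile buffer and the per-buffer write() with a gather list that is
flushed by a single writev().  Below is a minimal sketch of that
gather-and-flush pattern; the names (VecWriter, vec_put_page, vec_flush)
and the 64-entry vector size are illustrative assumptions, not the
put_vector()/get_vector() API from the actual patch.

```c
#include <stddef.h>
#include <sys/uio.h>
#include <unistd.h>

#define VEC_MAX 64                  /* illustrative: flush after 64 pages */

typedef struct {
    int fd;                         /* migration socket */
    struct iovec iov[VEC_MAX];      /* pointers into guest RAM, no copies */
    int cnt;
} VecWriter;

/* Flush every queued page with a single writev(), handling short writes. */
static int vec_flush(VecWriter *w)
{
    struct iovec *iov = w->iov;
    int cnt = w->cnt;

    while (cnt > 0) {
        ssize_t n = writev(w->fd, iov, cnt);
        if (n < 0) {
            return -1;              /* real code would retry on EINTR/EAGAIN */
        }
        /* Skip iovecs that were fully written, trim a partially written one. */
        while (cnt > 0 && (size_t)n >= iov->iov_len) {
            n -= (ssize_t)iov->iov_len;
            iov++;
            cnt--;
        }
        if (cnt > 0) {
            iov->iov_base = (char *)iov->iov_base + n;
            iov->iov_len -= (size_t)n;
        }
    }
    w->cnt = 0;
    return 0;
}

/* Queue one guest page without copying it into an intermediate buffer. */
static int vec_put_page(VecWriter *w, void *page, size_t len)
{
    w->iov[w->cnt].iov_base = page;
    w->iov[w->cnt].iov_len  = len;
    if (++w->cnt == VEC_MAX) {
        return vec_flush(w);
    }
    return 0;
}
```

For scale, the counts in the measurements above work out to about 123
pages per writev call versus about 8 pages per write call on the
non-vectored path, the latter presumably corresponding to one QEMUFile
buffer flush per write().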
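The timing method in the message brackets the page-sending loop with
cpu_get_real_ticks() reads.  The self-contained sketch below reproduces
that shape, but swaps in clock_gettime(CLOCK_MONOTONIC) and a stub in
place of ram_save_block(), since the real function lives inside QEMU;
both substitutions are assumptions for illustration only.

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Stand-in for QEMU's ram_save_block(): returns nonzero while there are
 * still pages to send.  Here it is only a stub so the sketch compiles. */
static int ram_save_block_stub(void)
{
    static int remaining = 3;
    return remaining-- > 0;
}

/* Monotonic nanoseconds, used instead of cpu_get_real_ticks(). */
static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

int main(void)
{
    int64_t start = now_ns();

    /* With qemu_file_rate_limit() removed, the loop runs until every page
     * has been sent in the first round, as described in the message. */
    while (ram_save_block_stub()) {
        ;
    }

    printf("elapsed: %.6f sec\n", (now_ns() - start) / 1e9);
    return 0;
}
```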