From mboxrd@z Thu Jan 1 00:00:00 1970
References: <1457082167-12254-1-git-send-email-jitendra.kolhe@hpe.com> <20160310094912.GC9715@rkaganb.sw.ru> <56E25EBF.6050109@hpe.com>
From: Jitendra Kolhe
Message-ID: <56E29BD9.8010306@hpe.com>
Date: Fri, 11 Mar 2016 15:50:09 +0530
MIME-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Subject: Re: [Qemu-devel] [PATCH v1] migration: skip sending ram pages released by virtio-balloon driver.
To: "Li, Liang Z", Roman Kagan, "qemu-devel@nongnu.org", "dgilbert@redhat.com", "simhan@hpe.com", "mohan_parthasarathy@hpe.com"

On 3/11/2016 12:55 PM, Li, Liang Z wrote:
>> On 3/10/2016 3:19 PM, Roman Kagan wrote:
>>> On Fri, Mar 04, 2016 at 02:32:47PM +0530, Jitendra Kolhe wrote:
>>>> Even though the pages which are returned to the host by the
>>>> virtio-balloon driver are zero pages, the migration algorithm will
>>>> still end up scanning the entire page (ram_find_and_save_block() ->
>>>> ram_save_page/ram_save_compressed_page -> save_zero_page() ->
>>>> is_zero_range()). We also end up sending some control information
>>>> over the network for these pages during migration. This adds to total
>>>> migration time.
>>>
>>> I wonder if it is the scanning for zeros or sending the whiteout which
>>> affects the total migration time more. If it is the former (as I would
>>> expect) then a rather local change to is_zero_range() to make use of
>>> the mapping information before scanning would get you all the speedups
>>> without protocol changes, interfering with postcopy, etc.
>>>
>>> Roman.
>>>
>>
>> Localizing the solution to the zero page scan check is a good idea. I too
>> agree that most of the time is spent scanning for zero pages, in which
>> case we should be able to localize the solution to is_zero_range().
>> However, in the case of ballooned-out pages (which can be seen as a
>> subset of guest zero pages) we also spend a very small portion of the
>> total migration time in sending the control information, which can also
>> be avoided.
>> From my tests on a 16GB idle guest of which 12GB was ballooned out, the
>> zero page scan time for the 12GB of ballooned-out pages was ~1789 ms and
>> save_page_header + qemu_put_byte(f, 0); for the same 12GB of
>> ballooned-out pages was ~556 ms. Total migration time was ~8000 ms.
>
> How did you do the tests? ~556 ms seems too long for putting several
> bytes into the buffer. It's likely the time you measured contains the
> portion to process the other 4GB of guest memory pages.
>
> Liang
>

I modified save_zero_page() as below and updated the timers only for
ballooned-out pages, so is_zero_range() should return true (also
qemu_balloon_bitmap_test() from my patchset returned 1).
With the below instrumentation, I got t1 = ~1789 ms and t2 = ~556 ms. Also,
the total migration time noted (~8000 ms) is for the unmodified QEMU source.
It seems to add up to the final migration time with the proposed patchset.
Here is the last entry from “another round” of the test; this time it's ~547 ms:

JK: block=7f5417a345e0, offset=3ffe42020, zero_page_scan_time=1218 ns,
save_page_header_time=184 ns, total_save_zero_page_time=1453 ns
cumulated vals: zero_page_scan_time=1723920378 ns,
save_page_header_time=547514618 ns, total_save_zero_page_time=2371059239 ns

static int save_zero_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
                          uint8_t *p, uint64_t *bytes_transferred)
{
    int pages = -1;
    int64_t time1, time2, time3, time4;
    static int64_t t1 = 0, t2 = 0, t3 = 0;

    time1 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
    if (is_zero_range(p, TARGET_PAGE_SIZE)) {
        time2 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
        acct_info.dup_pages++;
        *bytes_transferred += save_page_header(f, block,
                                               offset | RAM_SAVE_FLAG_COMPRESS);
        qemu_put_byte(f, 0);
        time3 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
        *bytes_transferred += 1;
        pages = 1;
    }
    time4 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
    /* Accumulate only for ballooned-out pages; for those is_zero_range()
     * is expected to return true, so time2/time3 are set above. */
    if (qemu_balloon_bitmap_test(block, offset) == 1) {
        t1 += (time2 - time1);
        t2 += (time3 - time2);
        t3 += (time4 - time1);
        fprintf(stderr, "block=%lx, offset=%lx, zero_page_scan_time=%ld ns, "
                        "save_page_header_time=%ld ns, "
                        "total_save_zero_page_time=%ld ns\n"
                        "cumulated vals: zero_page_scan_time=%ld ns, "
                        "save_page_header_time=%ld ns, "
                        "total_save_zero_page_time=%ld ns\n",
                (unsigned long)block, (unsigned long)offset,
                (time2 - time1), (time3 - time2), (time4 - time1),
                t1, t2, t3);
    }
    return pages;
}

Thanks,
- Jitendra

>> if (is_zero_range(p, TARGET_PAGE_SIZE)) {
>>          acct_info.dup_pages++;
>>          *bytes_transferred += save_page_header(f, block,
>>                                                 offset | RAM_SAVE_FLAG_COMPRESS);
>>          qemu_put_byte(f, 0);
>>          *bytes_transferred += 1;
>>          pages = 1;
>>      }
>> Would moving the solution to save_zero_page() be good enough?
>>
>> Thanks,
>> - Jitendra
>
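
For illustration, here is a rough sketch of what moving the check into
save_zero_page() could look like. This is not the actual patch; it assumes the
qemu_balloon_bitmap_test() helper from the proposed patchset and keeps the
header + zero byte on the wire, so only the zero-page scan is skipped for
ballooned-out pages:

static int save_zero_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
                          uint8_t *p, uint64_t *bytes_transferred)
{
    int pages = -1;

    /* Consult the balloon bitmap first: a ballooned-out page is known to
     * be zero, so the byte-by-byte scan in is_zero_range() can be skipped. */
    if (qemu_balloon_bitmap_test(block, offset) == 1 ||
        is_zero_range(p, TARGET_PAGE_SIZE)) {
        acct_info.dup_pages++;
        *bytes_transferred += save_page_header(f, block,
                                               offset | RAM_SAVE_FLAG_COMPRESS);
        qemu_put_byte(f, 0);
        *bytes_transferred += 1;
        pages = 1;
    }

    return pages;
}

Note that such a check only avoids the scan, which is where most of the time
goes; the control information (header + zero byte) is still sent for
ballooned-out pages, which is the smaller ~556 ms cost discussed above. The
same idea could instead be pushed down into is_zero_range() itself, as Roman
suggested, if the block/offset information is made available there.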