From: Jitendra Kolhe <jitendra.kolhe@hpe.com>
To: "Li, Liang Z" <liang.z.li@intel.com>,
Roman Kagan <rkagan@virtuozzo.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
"dgilbert@redhat.com" <dgilbert@redhat.com>,
"simhan@hpe.com" <simhan@hpe.com>,
"mohan_parthasarathy@hpe.com" <mohan_parthasarathy@hpe.com>
Subject: Re: [Qemu-devel] [PATCH v1] migration: skip sending ram pages released by virtio-balloon driver.
Date: Fri, 11 Mar 2016 20:09:09 +0530
Message-ID: <56E2D88D.2060702@hpe.com>
In-Reply-To: <F2CBF3009FA73547804AE4C663CAB28E0414B554@shsmsx102.ccr.corp.intel.com>
On 3/11/2016 4:24 PM, Li, Liang Z wrote:
>>>>> I wonder if it is the scanning for zeros or sending the whiteout
>>>>> which affects the total migration time more. If it is the former
>>>>> (as I would expect) then a rather local change to is_zero_range()
>>>>> to make use of the mapping information before scanning would get
>>>>> you all the speedups without protocol changes, interfering with
>>>>> postcopy etc.
>>>>>
>>>>> Roman.
>>>>>
>>>>
>>>> Localizing the solution to the zero page scan check is a good idea. I too
>>>> agree that most of the time is spent in scanning for zero pages, in
>>>> which case we should be able to localize the solution to is_zero_range().
>>>> However, in the case of ballooned out pages (which can be seen as a subset
>>>> of guest zero pages) we also spend a very small portion of total
>>>> migration time in sending the control information, which can also be
>>>> avoided.
>>>> From my tests for a 16GB idle guest of which 12GB was ballooned out,
>>>> the zero page scan time for the 12GB of ballooned out pages was ~1789 ms and
>>>> save_page_header + qemu_put_byte(f, 0); for the same 12GB of ballooned out
>>>> pages was ~556 ms. Total migration time was ~8000 ms
>>>
>>> How did you do the tests? ~556ms seems too long for putting several
>>> bytes to the buffer.
>>> It's likely the time you measured contains the portion to process the
>>> other 4GB guest memory pages.
>>>
>>> Liang
>>>
>>
>> I modified save_zero_page() as below and updated the timers only for ballooned
>> out pages, so is_zero_range() should return true (also
>> qemu_balloon_bitmap_test() from my patchset returned 1). With the
>> instrumentation below, I got t1 = ~1789ms and t2 = ~556ms. Also, the total
>> migration time noted (~8000ms) is for the unmodified qemu source.
>
> You mean the total live migration time for the unmodified qemu and the 'you modified for test' qemu
> are almost the same?
>
Not sure I understand the question, but if 'you modified for test' means
the modifications to save_zero_page() below, then the answer is no. Here is
what I tried. Let's say we have 3 versions of qemu (the timings below are for
a 16GB idle guest with 12GB ballooned out):
v1. Unmodified qemu, with absolutely no code change: Total migration time
= ~7600ms (I rounded this one to ~8000ms)
v2. Modified qemu 1, with the proposed patch set (which skips both the zero
page scan and migrating control information for ballooned out pages):
Total migration time = ~5700ms
v3. Modified qemu 2, with only the changes to save_zero_page() discussed
in the previous mail (and of course using the proposed patch set only to
maintain the bitmap for ballooned out pages): Total migration time is
irrelevant in this case.
Total zero page scan time = ~1789ms
Total (save_page_header + qemu_put_byte(f, 0)) = ~556ms.
Everything seems to add up here (may not be exact): 5700+1789+556 = ~8000ms.
I see 2 factors that we have not considered in this sum: a. the overhead
of migrating the balloon bitmap to the target, and b. as you mentioned below,
the overhead of qemu_clock_get_ns().
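
To make the sum above easy to re-check, here is a minimal sketch in C that
only redoes the arithmetic with the figures quoted in this mail. The enum
and function names are made up for illustration and are not part of qemu or
of the patch set.

```c
#include <assert.h>
#include <stdio.h>

/* Timings in milliseconds, copied from the v1/v2/v3 figures in this mail. */
enum {
    V1_TOTAL_MS  = 7600, /* unmodified qemu, total migration time */
    V2_TOTAL_MS  = 5700, /* with the proposed patch set */
    ZERO_SCAN_MS = 1789, /* zero page scan for ballooned out pages (v3) */
    HEADER_MS    = 556   /* save_page_header + qemu_put_byte(f, 0) (v3) */
};

/* Patched total plus the two costs the patch set skips. */
static int predicted_total_ms(void)
{
    return V2_TOTAL_MS + ZERO_SCAN_MS + HEADER_MS;
}

/* Difference between the predicted and the measured unmodified total. */
static int unexplained_ms(void)
{
    return predicted_total_ms() - V1_TOTAL_MS;
}

/* Print both numbers; purely informational. */
static void print_summary(void)
{
    assert(V1_TOTAL_MS > 0);
    printf("predicted=%d ms, measured=%d ms, gap=%d ms\n",
           predicted_total_ms(), V1_TOTAL_MS, unexplained_ms());
}
```

The gap between the predicted and the measured total is where the two
unaccounted factors (balloon bitmap transfer and timer overhead) would have
to show up.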
>> It seems to add up to the final migration time with the proposed patchset.
>>
>> Here is the last entry for "another round" of test, this time it's ~547ms
>> JK: block=7f5417a345e0, offset=3ffe42020, zero_page_scan_time=1218 us,
>> save_page_header_time=184 us, total_save_zero_page_time=1453 us
>> cumulated vals: zero_page_scan_time=1723920378 us,
>> save_page_header_time=547514618 us,
>> total_save_zero_page_time=2371059239 us
>>
>> static int save_zero_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
>>                           uint8_t *p, uint64_t *bytes_transferred)
>> {
>>     int pages = -1;
>>     /* qemu_clock_get_ns() returns nanoseconds, so the values printed
>>      * below are in ns even though the log labels say "us". */
>>     int64_t time1, time2, time3, time4;
>>     static int64_t t1 = 0, t2 = 0, t3 = 0; /* cumulative totals */
>>
>>     time1 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
>>     if (is_zero_range(p, TARGET_PAGE_SIZE)) {
>>         time2 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
>>         acct_info.dup_pages++;
>>         *bytes_transferred += save_page_header(f, block,
>>                                                offset | RAM_SAVE_FLAG_COMPRESS);
>>         qemu_put_byte(f, 0);
>>         time3 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
>>         *bytes_transferred += 1;
>>         pages = 1;
>>     }
>>     time4 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
>>
>>     /* Only ballooned out pages are accounted. time2 and time3 are set
>>      * only when the page is zero, which is expected for these pages. */
>>     if (qemu_balloon_bitmap_test(block, offset) == 1) {
>>         t1 += (time2 - time1); /* zero page scan */
>>         t2 += (time3 - time2); /* save_page_header + qemu_put_byte */
>>         t3 += (time4 - time1); /* whole function */
>>         fprintf(stderr,
>>                 "block=%lx, offset=%lx, zero_page_scan_time=%ld us, "
>>                 "save_page_header_time=%ld us, "
>>                 "total_save_zero_page_time=%ld us\n"
>>                 "cumulated vals: zero_page_scan_time=%ld us, "
>>                 "save_page_header_time=%ld us, "
>>                 "total_save_zero_page_time=%ld us\n",
>>                 (unsigned long)block, (unsigned long)offset,
>>                 (time2 - time1), (time3 - time2), (time4 - time1),
>>                 t1, t2, t3);
>>     }
>>     return pages;
>> }
>>
>
> Thanks for your description.
> The issue here is that there are too many qemu_clock_get_ns() calls; the cost of the function
> itself may become the main time-consuming operation. You can measure the time consumed
> by the qemu_clock_get_ns() calls you added for the test by comparing the result with the version
> that does not add qemu_clock_get_ns().
>
> Liang
>
Yes, we can try to measure the overhead of the qemu_clock_get_ns() calls and
see if things add up perfectly.
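
One possible way to estimate that overhead outside of qemu is sketched below
in C. It is only a rough sketch under stated assumptions: now_ns() is a
hypothetical stand-in built on POSIX clock_gettime(), not the real
qemu_clock_get_ns(), and the helper names are made up for illustration. It
times a tight loop of clock reads to get an average per-call cost, then
scales that by the number of pages and the number of clock reads per page.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for qemu_clock_get_ns(): a POSIX clock read in ns. */
static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Average cost in ns of one now_ns() call over `calls` back-to-back calls. */
static double clock_call_cost_ns(int64_t calls)
{
    assert(calls > 0);
    volatile int64_t sink = 0; /* keeps the loop from being optimized away */
    int64_t start = now_ns();
    for (int64_t i = 0; i < calls; i++) {
        sink = now_ns();
    }
    int64_t end = now_ns();
    (void)sink;
    return (double)(end - start) / (double)calls;
}

/* Estimated total instrumentation overhead in ms for `pages` pages with
 * `calls_per_page` clock reads each, given a per-call cost in ns. */
static double instrumentation_overhead_ms(int64_t pages, int calls_per_page,
                                          double cost_ns)
{
    return (double)pages * calls_per_page * cost_ns / 1e6;
}
```

For the test above, assuming 4KB pages, 12GB of ballooned out memory is
3145728 pages, and the instrumented save_zero_page() reads the clock 4 times
per zero page, so the per-call cost from clock_call_cost_ns() would be passed
in as instrumentation_overhead_ms(3145728, 4, cost). The result is only an
estimate; comparing total migration time with and without the added calls,
as suggested, is still the more direct check.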
Thanks,
- Jitendra