From: Juan Quintela <quintela@redhat.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: qemu-devel@nongnu.org, "Eduardo Habkost" <eduardo@habkost.net>,
	"Peter Xu" <peterx@redhat.com>,
	"Philippe Mathieu-Daudé" <f4bug@amsat.org>,
	"Yanan Wang" <wangyanan55@huawei.com>,
	"Leonardo Bras" <leobras@redhat.com>,
	"Marcel Apfelbaum" <marcel.apfelbaum@gmail.com>
Subject: Re: [PATCH v6 00/13] Migration: Transmit and detect zero pages in the multifd threads
Date: Mon, 16 May 2022 12:45:37 +0200	[thread overview]
Message-ID: <87pmkdsqlq.fsf@secure.mitica> (raw)
In-Reply-To: <Yn0OMzygfmlXgl8w@work-vm> (David Alan Gilbert's message of "Thu,  12 May 2022 14:40:03 +0100")

"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> * Juan Quintela (quintela@redhat.com) wrote:

>> 16GB guest
>>                 Precopy            upstream          zero page
>>                 Time    Downtime   Time    Downtime  Time    Downtime
>> clean idle      1548     93         1359   48         866    167

>                                            866/1359 = 64%


>> dirty idle     16222    220         2092   371       1870    258

>                                            1870/2092 = 89%

>> busy 4GB       doesn't converge    31000   308       1604    371
>> 
>> In the dirty idle case, there is some weirdness in the precopy
>> numbers; I tried several times and it always took too much time.  It
>> should be faster.
>> 
>> In the busy 4GB case, precopy doesn't converge (expected), and
>> without zero page, multifd is on the limit: it _almost_ doesn't
>> converge, it took 187 iterations to converge.
>> 
>> 1TB
>>                 Precopy            upstream          zero page
>>                 Time    Downtime   Time    Downtime  Time    Downtime
>> clean idle     83174    381        72075   345       52966   273

>                                           52966/72075=74%

>> dirty idle                        104587   381       75601   269

>                                           75601/104587=72%

>> busy 2GB                           79912   345       58953   348
>> 
>> I only tried the clean idle case with precopy at 1TB.  Notice that it
>> is already significantly slower.  With 1TB RAM, zero page is clearly
>> superior in all tests.
>> 
>> 4TB
>>                 upstream          zero page
>>                 Time    Downtime  Time    Downtime
>> clean idle      317054  552       215567  500

>                 215567/317054 = 68%

>> dirty idle      357581  553       317428  744

>                 317428/357581 = 89%

>
> The 1TB dirty/idle is a bit of an unusual outlier at 72% time; but still
> the 89% on the 16GB/4TB dirty case is still a useful improvement - I wasn't
> expecting the dirty case to be as good - I wonder if there's some side
> benefit, like meaning the page is only read by the data threads and not
> also read by the main thread so only in one cache?

That could help, but I think that it is much simpler than that:

live_migration thread with upstream:

>    5.07%  live_migration   qemu-system-x86_64       [.] buffer_zero_avx512
>    0.95%  live_migration   qemu-system-x86_64       [.] ram_find_and_save_block.part.0
>    0.88%  live_migration   qemu-system-x86_64       [.] bitmap_test_and_clear_atomic
>    0.36%  live_migration   qemu-system-x86_64       [.] ram_bytes_total_common
>    0.26%  live_migration   qemu-system-x86_64       [.] qemu_ram_is_migratable

Almost 8% CPU.

live_migration thread with zero page:

>    1.59%  live_migration   qemu-system-x86_64       [.] ram_find_and_save_block.part.0
>    1.45%  live_migration   libc.so.6                [.] __pthread_mutex_unlock_usercnt
>    1.28%  live_migration   libc.so.6                [.] __pthread_mutex_lock
>    0.69%  live_migration   qemu-system-x86_64       [.] multifd_send_pages
>    0.48%  live_migration   qemu-system-x86_64       [.] qemu_mutex_unlock_impl
>    0.48%  live_migration   qemu-system-x86_64       [.] qemu_mutex_lock_impl

less than 6% CPU, and remember, we are going way faster, so we are doing
much more work here.  I *think* it is mostly because we spend less time
waiting for the migration thread.  Remember that at this point we are
already limited by the network.
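
To make that concrete, here is a minimal sketch of where the zero-page
scan runs in each scheme.  This is my own illustration, not the actual
QEMU code: page_is_zero() just stands in for buffer_is_zero() /
buffer_zero_avx512(), and the queueing is reduced to comments.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Stand-in for QEMU's buffer_is_zero() / buffer_zero_avx512(). */
static bool page_is_zero(const uint8_t *page)
{
    static const uint8_t zeroes[PAGE_SIZE];
    return memcmp(page, zeroes, PAGE_SIZE) == 0;
}

/* Upstream: the migration thread scans every dirty page itself, which
 * is why buffer_zero_avx512 shows up in the live_migration profile. */
static void migration_thread_upstream(uint8_t **pages, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (page_is_zero(pages[i])) {
            /* write a small "zero page" marker on the main channel */
        } else {
            /* queue the page data for a multifd send thread */
        }
    }
}

/* With this series: the migration thread only queues pages, and each
 * multifd send thread scans its own batch, so the scan cost (and the
 * cache traffic of touching the page) moves off the migration thread. */
static void multifd_send_thread_with_zero_page(uint8_t **batch, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (page_is_zero(batch[i])) {
            /* record the offset in the packet's zero-page list */
        } else {
            /* add the page to the packet's iovec and offset list */
        }
    }
}

int main(void)
{
    static uint8_t zero_page[PAGE_SIZE];          /* all zeroes */
    static uint8_t data_page[PAGE_SIZE] = { 1 };  /* first byte non-zero */
    uint8_t *pages[] = { zero_page, data_page };

    migration_thread_upstream(pages, 2);
    multifd_send_thread_with_zero_page(pages, 2);
    return 0;
}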

I think the zero page case is much better for explaining it: we move
from upstream:

>  44.27%  live_migration   qemu-system-x86_64       [.] buffer_zero_avx512
>   10.21%  live_migration   qemu-system-x86_64       [.] ram_find_and_save_block.part.0
>    6.58%  live_migration   qemu-system-x86_64       [.] add_to_iovec
>    4.25%  live_migration   qemu-system-x86_64       [.] ram_bytes_total_common
>    2.70%  live_migration   qemu-system-x86_64       [.] qemu_put_byte.part.0
>    2.43%  live_migration   qemu-system-x86_64       [.] bitmap_test_and_clear_atomic
>    2.34%  live_migration   qemu-system-x86_64       [.] qemu_ram_is_migratable
>    1.59%  live_migration   qemu-system-x86_64       [.] qemu_put_be32
>    1.30%  live_migration   qemu-system-x86_64       [.] find_next_bit
>    1.08%  live_migration   qemu-system-x86_64       [.] migrate_ignore_shared
>    0.98%  live_migration   qemu-system-x86_64       [.] ram_save_iterate
>    0.67%  live_migration   [kernel.kallsyms]        [k] copy_user_enhanced_fast_string
>    0.61%  live_migration   qemu-system-x86_64       [.] save_zero_page_to_file.part.0
>    0.45%  live_migration   qemu-system-x86_64       [.] qemu_put_byte
>    0.42%  live_migration   qemu-system-x86_64       [.] save_page_header
>    0.41%  live_migration   qemu-system-x86_64       [.] qemu_put_be64
>    0.35%  live_migration   qemu-system-x86_64       [.] migrate_postcopy_ram

More than 80% of the CPU in total (I am too lazy to do the sum), to zero
page detection in the multifd threads with:

>  15.49%  live_migration   qemu-system-x86_64       [.] ram_find_and_save_block.part.0
>    3.20%  live_migration   qemu-system-x86_64       [.] ram_bytes_total_common
>    2.67%  live_migration   qemu-system-x86_64       [.] multifd_queue_page
>    2.33%  live_migration   qemu-system-x86_64       [.] bitmap_test_and_clear_atomic
>    2.19%  live_migration   qemu-system-x86_64       [.] qemu_ram_is_migratable
>    1.19%  live_migration   qemu-system-x86_64       [.] find_next_bit
>    1.18%  live_migration   qemu-system-x86_64       [.] migrate_ignore_shared
>    1.14%  live_migration   qemu-system-x86_64       [.] multifd_send_pages
>    0.96%  live_migration   [kernel.kallsyms]        [k] futex_wake
>    0.81%  live_migration   [kernel.kallsyms]        [k] send_call_function_single_ipi
>    0.71%  live_migration   qemu-system-x86_64       [.] ram_save_iterate

almost 32% (also too lazy to do the sum).
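
For the record, a quick way to check both sums (the percentages are just
copied from the two perf listings above; nothing here comes from the
series itself):

#include <stddef.h>
#include <stdio.h>

int main(void)
{
    const double upstream[] = {
        44.27, 10.21, 6.58, 4.25, 2.70, 2.43, 2.34, 1.59, 1.30,
        1.08, 0.98, 0.67, 0.61, 0.45, 0.42, 0.41, 0.35
    };
    const double zero_in_multifd[] = {
        15.49, 3.20, 2.67, 2.33, 2.19, 1.19, 1.18, 1.14, 0.96, 0.81, 0.71
    };
    double a = 0.0, b = 0.0;

    for (size_t i = 0; i < sizeof(upstream) / sizeof(upstream[0]); i++) {
        a += upstream[i];
    }
    for (size_t i = 0; i < sizeof(zero_in_multifd) / sizeof(zero_in_multifd[0]); i++) {
        b += zero_in_multifd[i];
    }
    /* Prints: upstream: 80.64%  zero page in multifd: 31.87% */
    printf("upstream: %.2f%%  zero page in multifd: %.2f%%\n", a, b);
    return 0;
}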

> (the 10% improvement on the dirty case is more important to me than the
> more impressive number for the clean case)

Fully agree.  Getting this series to go faster with huge guests (1TB/4TB)
was relatively easy.  Making sure that we didn't hurt the smaller guests
was more complicated.  The other added benefit is that we don't send any
RAM page through the main migration channel anymore, which makes things
much better because we have way less overhead there.
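
To see why no RAM has to cross the main channel, here is a rough,
hypothetical sketch of what a multifd packet can carry (simplified for
illustration; this is not the series' actual MultiFDPacket_t layout):

#include <stdint.h>

#define PACKET_PAGE_COUNT 128   /* hypothetical capacity, just for illustration */

/* Zero pages travel as offsets only; normal pages as offsets plus their
 * data appended after this header on the multifd channel. */
struct multifd_packet_sketch {
    uint32_t flags;
    uint32_t normal_pages;               /* pages whose data follows the header */
    uint32_t zero_pages;                 /* pages that are entirely zero */
    uint64_t offset[PACKET_PAGE_COUNT];  /* normal offsets first, then zero offsets */
};

The destination can then fill the zero pages locally instead of reading
their contents from the stream.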

Later, Juan.


