From: Juan Quintela <quintela@redhat.com>
To: Fabiano Rosas <farosas@suse.de>
Cc: qemu-devel@nongnu.org,  Peter Xu <peterx@redhat.com>,
	 Leonardo Bras <leobras@redhat.com>,
	 Elena Ufimtseva <elena.ufimtseva@oracle.com>
Subject: Re: [RFC PATCH v2 3/6] migration/multifd: Decouple control flow from the SYNC packet
Date: Thu, 19 Oct 2023 12:28:39 +0200	[thread overview]
Message-ID: <87fs267voo.fsf@secure.mitica> (raw)
In-Reply-To: <20231012140651.13122-4-farosas@suse.de> (Fabiano Rosas's message of "Thu, 12 Oct 2023 11:06:48 -0300")

Fabiano Rosas <farosas@suse.de> wrote:
> We currently use the 'sem_sync' semaphore on the sending side:
>
> 1) to know when the multifd_send_thread() has finished sending the
>    MULTIFD_FLAG_SYNC packet;
>
>   This is unnecessary. Multifd sends packets one by one and completion
>   is already bound by the 'sem' semaphore. The SYNC packet has nothing
>   special that would require it to have a separate semaphore on the
>   sending side.

What migration basically does is:

sync_dirty_bitmap()
while (too_much_dirty_memory)
   foreach(dirty_page)
      send(dirty_page)
   sync_dirty_bitmap()

I know this is an oversimplification, but it is enough to explain the
problem that this code tries to fix.
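As a toy model of that loop (plain Python with hypothetical names, not
QEMU code), it could be sketched as:

```python
def migrate(sync_dirty_bitmap, send_page, threshold):
    # Toy model of the iteration loop above.  sync_dirty_bitmap()
    # returns the set of pages dirtied since the last call;
    # send_page() transmits one page over the wire.
    dirty = sync_dirty_bitmap()
    while len(dirty) > threshold:
        for page in sorted(dirty):
            send_page(page)
        dirty = sync_dirty_bitmap()
    return dirty  # small enough to send with the guest stopped
```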

Once we have multifd, we can have several channels, each of them going
through a different network connection.  Networks are a black box and
there are no guarantees about how packets arrive on different sockets.

In one iteration, page 99 is dirty.  We send it through channel 0.
We end the iteration and we synchronize the bitmap again.
We send the SYNC packet in both channels.
Page 99 is dirty again, and this time it gets sent through channel 1.

What we want is for the communication to be:

    Channel 0          migration thread    Channel 1

(1) sent page 99
(2)                    sync bitmap
(3)                                        sent page 99

And we want the destination to make sure that it never applies the page
99 from channel 0 AFTER the page 99 from channel 1.

Notice that this is highly improbable, but it _can_ happen (and zero
copy increases the probability of it).
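The hazard can be shown with a toy model: each channel delivers in FIFO
order, but nothing orders one channel against another (plain Python,
illustrative names only, not QEMU code):

```python
from collections import deque

# Each channel is a FIFO: in-order *within* the channel.  Entries are
# (iteration, page) pairs; a higher iteration means fresher data.
ch0 = deque([(1, 99)])   # page 99 as sent in iteration 1 (stale)
ch1 = deque([(2, 99)])   # page 99 as resent in iteration 2 (fresh)

ram = {}
# A perfectly legal delivery order: channel 1 drains before channel 0.
for chan in (ch1, ch0):
    while chan:
        iteration, page = chan.popleft()
        ram[page] = iteration        # last write wins

# The stale iteration-1 copy has overwritten the fresh one.
```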

So we created the SYNC packet, which works like this:

foreach (channel)
   create_job_to_send_sync_packet()
foreach (channel)
   wait_until_it_has_sent_the_sync_packet()

Notice that this is the main migration thread; it will not send new work
to any channel until it has received the sem_sync for every channel.
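That send-side handshake can be sketched as a toy Python model with
threads and semaphores (the names SendChannel, wire, etc. are invented
for illustration; this is not QEMU code):

```python
import threading

class SendChannel:
    """Toy model of one multifd send channel."""
    def __init__(self):
        self.sem = threading.Semaphore(0)       # "you have work"
        self.sem_sync = threading.Semaphore(0)  # "SYNC is on the wire"
        self.wire = []
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        self.sem.acquire()          # wait for the SYNC job
        self.wire.append("SYNC")    # write(SYNC)
        self.sem_sync.release()     # report completion to the main thread

def multifd_send_sync_main(channels):
    for ch in channels:             # queue a SYNC job on every channel...
        ch.sem.release()
    for ch in channels:             # ...then block until all have sent it
        ch.sem_sync.acquire()

channels = [SendChannel() for _ in range(2)]
multifd_send_sync_main(channels)
# At this point every channel is guaranteed to have written its SYNC.
```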

Now, this is how we deal with it on the target:

foreach (channel)
   wait(sem_sync)
foreach (channel)
   send(sem_sync)

So, trying to do my best at ASCII art, here is what happens when we end
a round of memory iteration:

MigrationThread(send)           MigrationThread(recv)   channel_n (send)         channel_n(recv)

sync_bitmap()
foreach(channel)
   create_job_with_SYNC
   post(channel->sem)
                                                        wait(channel->sem)
                                                        write(SYNC)
                                                        post(channel->sem_sync)
foreach(channel)
   wait(channel->sem_sync)

write(MULTIFD_FLUSH)
                                                                                  read(SYNC)
                                                                                  post(main->sem_sync)
                                                                                  wait(channel->sem_sync)
                                read(MULTIFD_FLUSH)
                                foreach(channel)
                                   wait(main->sem_sync)
                                foreach(channel)
                                   post(channel->sem_sync)

Interesting points:

1- We guarantee that packets inside the same channel are in order.

2- The migration send thread doesn't send a MULTIFD_FLUSH packet until
   every channel has sent a SYNC packet.

3- After reception of a SYNC packet, a channel:
   a -> communicates to the main migration thread that it has received
        it (post(main->sem_sync))
   b -> waits on (channel->sem_sync)

4- When the main recv thread receives a MULTIFD_FLUSH, it:
   a -> waits for every channel to say that it has received a SYNC
        packet
   b -> communicates to every channel that it can continue.

Send channels can send new data after the main channel does a
write(MULTIFD_FLUSH).  But reception channels will not read it until the
main recv thread is sure that every reception channel has received a
SYNC, so we are sure that (in the previous example) page 99 from channel
0 is already written.
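The destination half of the handshake can be modeled the same way
(again a toy Python sketch with invented names, not QEMU code):

```python
import threading

main_sem_sync = threading.Semaphore(0)

class RecvChannel:
    """Toy model of one multifd recv channel."""
    def __init__(self):
        self.sem_sync = threading.Semaphore(0)
        self.resumed = threading.Event()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        # ... read(SYNC) from the wire ...
        main_sem_sync.release()   # a: tell main we have seen the SYNC
        self.sem_sync.acquire()   # b: park until main lets us continue
        self.resumed.set()        # only now may this channel read again

def main_recv_flush(channels):
    # read(MULTIFD_FLUSH) on the main channel, then:
    for _ in channels:
        main_sem_sync.acquire()   # every channel reported its SYNC
    for ch in channels:
        ch.sem_sync.release()     # everyone may continue reading

channels = [RecvChannel() for _ in range(2)]
main_recv_flush(channels)
```

No channel resumes reading until every channel has reported its SYNC,
which is exactly the cross-channel barrier described above.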

Is it clearer now?

And yes, after discussion I will convert this email into documentation.

> 2) to wait for the multifd threads to finish before cleaning up;
>
>    This happens because multifd_send_sync_main() blocks
>    ram_save_complete() from finishing until the semaphore is
>    posted. This is surprising and not documented.
>
> Clarify the above situation by renaming 'sem_sync' to 'sem_done' and
> making the #2 usage the main one. Post to 'sem_sync' only when there's
> no more pending_jobs.
>
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
> I remove the parts about the receiving side. I wasn't sure about them
> and we don't need to mix the two. Potentially we need the sem_sync on
> the recv to ensure all channels wait before becoming available to read
> once again after a FLUSH.

I think this is not needed.  It is the source that decides when it needs
to wait for all the in-flight packets.

Later, Juan.


