From: Peter Xu <peterx@redhat.com>
To: Juan Quintela <quintela@redhat.com>
Cc: Fabiano Rosas <farosas@suse.de>,
qemu-devel@nongnu.org, Leonardo Bras <leobras@redhat.com>,
Elena Ufimtseva <elena.ufimtseva@oracle.com>
Subject: Re: [RFC PATCH v2 3/6] migration/multifd: Decouple control flow from the SYNC packet
Date: Thu, 19 Oct 2023 11:31:15 -0400 [thread overview]
Message-ID: <ZTFLw/cgrOcSJTxG@x1n> (raw)
In-Reply-To: <87fs267voo.fsf@secure.mitica>
On Thu, Oct 19, 2023 at 12:28:39PM +0200, Juan Quintela wrote:
> Fabiano Rosas <farosas@suse.de> wrote:
> > We currently use the 'sem_sync' semaphore on the sending side:
> >
> > 1) to know when the multifd_send_thread() has finished sending the
> > MULTIFD_FLAG_SYNC packet;
> >
> > This is unnecessary. Multifd sends packets one by one and completion
> > is already bound by the 'sem' semaphore. The SYNC packet has nothing
> > special that would require it to have a separate semaphore on the
> > sending side.
>
> What migration basically does is:
>
> sync_dirty_bitmap()
> while (too_much_dirty_memory)
>     foreach(dirty_page)
>         send(dirty_page)
>     sync_dirty_bitmap()
>
> I know this is an oversimplification, but it is enough to explain the
> problem that this code tries to fix.
>
> Once we have multifd, we can have several channels, each one going
> through a different network connection.  Yes, networks are a black box
> and there are no guarantees about how packets arrive on different
> sockets.
>
> In one iteration, page 99 is dirty.  We send it through channel 0.
> We end the iteration and we synchronize the bitmap again.
> We send the SYNC packet on both channels.
> Page 99 is dirty again, and this time it gets sent through channel 1.
>
> What we want is for the communication to be:
>
> Channel 0               migration thread           Channel 1
>
> (1) sent page 99
>                         (2) sync bitmap
>                                                    (3) sent page 99
>
> And we want the destination to make sure that it never gets the packet
> with page 99 from channel 0 AFTER the one with page 99 from channel 1.
>
> Notice that this is something that is highly improbable, but it _can_
> happen (and zero copy increases the probability of it).
>
> So we create this SYNC packet that does that:
>
> foreach (channel)
>     create_job_to_send_sync_packet()
> foreach (channel)
>     wait_until_it_has_sent_the_sync_packet()
>
> Notice that this is the main migration thread; it will not send new work
> to any channel until it receives the sem_sync for every channel.
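
As a side note, since this is going to become documentation: below is a
minimal, self-contained toy model of that send-side handshake, in case it
helps the text.  To be clear, it is not the real QEMU code; the names
(send_sync_to_all_channels, channel_thread, sync_job, N_CHANNELS) and the
plain POSIX semaphores are made up for illustration only.  Only 'sem' and
'sem_sync' correspond to the per-channel semaphores you describe.

    /* Toy model of the send-side SYNC handshake (illustration only). */
    /* sem_init() and thread creation are omitted. */
    #include <semaphore.h>
    #include <stdbool.h>

    #define N_CHANNELS 2

    struct channel {
        sem_t sem;        /* main -> channel: "you have a new job" */
        sem_t sem_sync;   /* channel -> main: "the SYNC packet went out" */
        bool  sync_job;   /* the queued job is a SYNC packet */
    };

    static struct channel channels[N_CHANNELS];

    /* Main migration thread, at the end of a dirty-bitmap iteration. */
    static void send_sync_to_all_channels(void)
    {
        for (int i = 0; i < N_CHANNELS; i++) {
            channels[i].sync_job = true;      /* queue a SYNC job ...     */
            sem_post(&channels[i].sem);       /* ... and wake the channel */
        }
        for (int i = 0; i < N_CHANNELS; i++) {
            sem_wait(&channels[i].sem_sync);  /* channel wrote its SYNC   */
        }
        /* in the scheme above, MULTIFD_FLUSH is only written after this */
    }

    /* Per-channel send thread (only the part relevant here). */
    static void *channel_thread(void *opaque)
    {
        struct channel *c = opaque;

        for (;;) {
            sem_wait(&c->sem);                /* wait for a job from main */
            if (c->sync_job) {
                c->sync_job = false;
                /* write a packet with the SYNC flag to this socket */
                sem_post(&c->sem_sync);       /* report back to main */
            } else {
                /* write a normal packet with pages */
            }
        }
        return NULL;
    }
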
>
> Now, how do we deal with this on the target:
>
> foreach (channel)
>     wait(sem_sync)
> foreach (channel)
>     post(sem_sync)
>
> So, trying to do my best at ASCII art, this is what happens when we end
> a round of memory iteration:
>
> MigrationThread(send)   MigrationThread(recv)   channel_n(send)         channel_n(recv)
>
> sync_bitmap()
> foreach(channel)
>     create_job_with_SYNC
>     post(channel->sem)
>                                                 wait(channel->sem)
>                                                 write(SYNC)
>                                                 post(channel->sem_sync)
> foreach(channel)
>     wait(channel->sem_sync)
>
> write(MULTIFD_FLUSH)
>                                                                         read(SYNC)
>                                                                         post(main->sem_sync)
>                                                                         wait(channel->sem_sync)
>                         read(MULTIFD_FLUSH)
>                         foreach(channel)
>                             wait(main->sem_sync)
>                         foreach(channel)
>                             post(channel->sem_sync)
Hmm, I think I missed the fact that the main thread also sends something
(in this case, MULTIFD_FLUSH) when I raised the question about sync in
the other thread.

Now I think it should indeed work.
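
To double-check that I read the diagram right, the destination side would
be roughly the toy model below.  Again the names (channel_got_sync,
main_got_flush, main_sem_sync, channel_sem_sync, N_CHANNELS) are made up
and only mirror the main->sem_sync / channel->sem_sync pairs in the
diagram; this is not the actual code.

    /* Toy model of the destination-side handshake (illustration only). */
    /* sem_init() and thread creation are omitted. */
    #include <semaphore.h>

    #define N_CHANNELS 2

    static sem_t main_sem_sync;                 /* recv channels -> main recv thread */
    static sem_t channel_sem_sync[N_CHANNELS];  /* main recv thread -> recv channels */

    /* Per-channel recv thread, right after reading a SYNC packet. */
    static void channel_got_sync(int i)
    {
        sem_post(&main_sem_sync);        /* "I have reached my SYNC point" */
        sem_wait(&channel_sem_sync[i]);  /* read nothing more until released */
    }

    /* Main recv thread, right after reading MULTIFD_FLUSH. */
    static void main_got_flush(void)
    {
        for (int i = 0; i < N_CHANNELS; i++) {
            sem_wait(&main_sem_sync);        /* every channel reached its SYNC */
        }
        for (int i = 0; i < N_CHANNELS; i++) {
            sem_post(&channel_sem_sync[i]);  /* let the channels read again */
        }
    }

IOW, no recv channel consumes anything sent after the flush point until
every channel has drained what was sent before it, which is what makes
the page 99 ordering safe.
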
I'm not sure what the case was before this commit, though:
commit 294e5a4034e81b3d8db03b4e0f691386f20d6ed3
Author: Juan Quintela <quintela@redhat.com>
Date:   Tue Jun 21 13:36:11 2022 +0200

    multifd: Only flush once each full round of memory
>
> Interesting points:
>
> 1- We guarantee that packets inside the same channel are in order.
>
> 2- The migration send thread doesn't send a MULTIFD_FLUSH packet until
>    every channel has sent a SYNC packet.
IMHO this is not required?  The main thread can send MULTIFD_FLUSH early
too; I think the important thing is that the dest recv threads will
always sync with it, early or late.  Then it's impossible for a new page
to appear earlier than an older version of it (on another channel),
because by the time the new page is read, the dest threads must have
digested the old one.
>
> 3- After reception of a SYNC packet, a channel:
>    a -> communicates to the main migration thread that it has received
>         it (post(main->sem_sync))
>    b -> waits on (channel->sem_sync)
>
> 4- The main recv thread receives a MULTIFD_FLUSH:
>    a -> waits for every channel to say that it has received a SYNC
>         packet
>    b -> communicates to every channel that it can continue.
>
> Send channels can send new data after the main channel does a
> write(MULTIFD_FLUSH).  But reception channels will not read it until
> the main recv thread is sure that every reception channel has received
> a SYNC, so we are sure that (in the previous example) page 99 from
> channel 0 is already written.
>
> Is it clearer now?
>
> And yes, after discussion I will convert this email into documentation.
Please do so, and I suggest we start to do more with docs/*.rst and not
use wiki pages unless necessary, so the docs live with the code.

We may consider creating docs/migration/, and this could become
multifd.rst.
Thanks,
--
Peter Xu
Thread overview: 30+ messages
2023-10-12 14:06 [RFC PATCH v2 0/6] migration/multifd: Locking changes Fabiano Rosas
2023-10-12 14:06 ` [RFC PATCH v2 1/6] migration/multifd: Remove channels_ready semaphore Fabiano Rosas
2023-10-19 9:06 ` Juan Quintela
2023-10-19 14:35 ` Peter Xu
2023-10-19 15:00 ` Juan Quintela
2023-10-19 15:46 ` Peter Xu
2023-10-19 18:28 ` Juan Quintela
2023-10-19 18:50 ` Peter Xu
2023-10-20 7:56 ` Juan Quintela
2023-10-19 14:55 ` Fabiano Rosas
2023-10-19 15:18 ` Juan Quintela
2023-10-19 15:56 ` Fabiano Rosas
2023-10-19 18:41 ` Juan Quintela
2023-10-19 19:04 ` Peter Xu
2023-10-20 7:53 ` Juan Quintela
2023-10-20 12:48 ` Fabiano Rosas
2023-10-22 20:17 ` Peter Xu
2023-10-12 14:06 ` [RFC PATCH v2 2/6] migration/multifd: Stop checking p->quit in multifd_send_thread Fabiano Rosas
2023-10-19 9:08 ` Juan Quintela
2023-10-19 14:58 ` Fabiano Rosas
2023-10-19 15:19 ` Peter Xu
2023-10-19 15:19 ` Juan Quintela
2023-10-12 14:06 ` [RFC PATCH v2 3/6] migration/multifd: Decouple control flow from the SYNC packet Fabiano Rosas
2023-10-19 10:28 ` Juan Quintela
2023-10-19 15:31 ` Peter Xu [this message]
2023-10-12 14:06 ` [RFC PATCH v2 4/6] migration/multifd: Extract sem_done waiting into a function Fabiano Rosas
2023-10-12 14:06 ` [RFC PATCH v2 5/6] migration/multifd: Stop setting 'quit' outside of channels Fabiano Rosas
2023-10-19 10:35 ` Juan Quintela
2023-10-12 14:06 ` [RFC PATCH v2 6/6] migration/multifd: Bring back the 'ready' semaphore Fabiano Rosas
2023-10-19 10:43 ` Juan Quintela