* [RFC PATCH v2 1/6] migration/multifd: Remove channels_ready semaphore
2023-10-12 14:06 [RFC PATCH v2 0/6] migration/multifd: Locking changes Fabiano Rosas
@ 2023-10-12 14:06 ` Fabiano Rosas
2023-10-19 9:06 ` Juan Quintela
2023-10-12 14:06 ` [RFC PATCH v2 2/6] migration/multifd: Stop checking p->quit in multifd_send_thread Fabiano Rosas
` (4 subsequent siblings)
5 siblings, 1 reply; 30+ messages in thread
From: Fabiano Rosas @ 2023-10-12 14:06 UTC (permalink / raw)
To: qemu-devel; +Cc: Juan Quintela, Peter Xu, Leonardo Bras, Elena Ufimtseva
The channels_ready semaphore is a global variable not linked to any
single multifd channel. Waiting on it only means that "some" channel
has become ready to send data. Since we need to address the channels
by index (multifd_send_state->params[i]), that information adds
nothing of value. The channel being addressed is not necessarily the
one that just released the semaphore.
The only usage of this semaphore that makes sense is to wait for it in
a loop that iterates for the number of channels. That could mean: all
channels have been set up and are operational OR all channels have
finished their work and are idle.
Currently all code that waits on channels_ready is redundant. There is
always a subsequent lock or semaphore that does the actual data
protection/synchronization.
- at multifd_send_pages: Waiting on channels_ready doesn't mean the
'next_channel' is ready, it could be any other channel. So there are
already cases where this code runs as if no semaphore was there.
Waiting outside of the loop is also incorrect because if the current
channel already has a pending_job, then it will loop into the next
one without waiting on the semaphore and the count will be greater than
zero at the end of the execution.
Checking that "any" channel is ready as a proxy for all channels
being ready would work, but it's not what the code is doing and not
really needed because the channel lock and 'sem' would be enough.
- at multifd_send_sync: This usage is correct, but it is made
redundant by the wait on sem_sync. What this piece of code is doing
is making sure all channels have sent the SYNC packet and become
idle afterwards.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
migration/multifd.c | 10 ----------
1 file changed, 10 deletions(-)
diff --git a/migration/multifd.c b/migration/multifd.c
index 0f6b203877..e26f5f246d 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -362,8 +362,6 @@ struct {
MultiFDPages_t *pages;
/* global number of generated multifd packets */
uint64_t packet_num;
- /* send channels ready */
- QemuSemaphore channels_ready;
/*
* Have we already run terminate threads. There is a race when it
* happens that we got one error while we are exiting.
@@ -403,7 +401,6 @@ static int multifd_send_pages(QEMUFile *f)
return -1;
}
- qemu_sem_wait(&multifd_send_state->channels_ready);
/*
* next_channel can remain from a previous migration that was
* using more channels, so ensure it doesn't overflow if the
@@ -554,7 +551,6 @@ void multifd_save_cleanup(void)
error_free(local_err);
}
}
- qemu_sem_destroy(&multifd_send_state->channels_ready);
g_free(multifd_send_state->params);
multifd_send_state->params = NULL;
multifd_pages_clear(multifd_send_state->pages);
@@ -630,7 +626,6 @@ int multifd_send_sync_main(QEMUFile *f)
for (i = 0; i < migrate_multifd_channels(); i++) {
MultiFDSendParams *p = &multifd_send_state->params[i];
- qemu_sem_wait(&multifd_send_state->channels_ready);
trace_multifd_send_sync_main_wait(p->id);
qemu_sem_wait(&p->sem_sync);
@@ -664,7 +659,6 @@ static void *multifd_send_thread(void *opaque)
p->num_packets = 1;
while (true) {
- qemu_sem_post(&multifd_send_state->channels_ready);
qemu_sem_wait(&p->sem);
if (qatomic_read(&multifd_send_state->exiting)) {
@@ -759,7 +753,6 @@ out:
*/
if (ret != 0) {
qemu_sem_post(&p->sem_sync);
- qemu_sem_post(&multifd_send_state->channels_ready);
}
qemu_mutex_lock(&p->mutex);
@@ -796,7 +789,6 @@ static void multifd_tls_outgoing_handshake(QIOTask *task,
* is not created, and then tell who pay attention to me.
*/
p->quit = true;
- qemu_sem_post(&multifd_send_state->channels_ready);
qemu_sem_post(&p->sem_sync);
}
}
@@ -874,7 +866,6 @@ static void multifd_new_send_channel_cleanup(MultiFDSendParams *p,
{
migrate_set_error(migrate_get_current(), err);
/* Error happen, we need to tell who pay attention to me */
- qemu_sem_post(&multifd_send_state->channels_ready);
qemu_sem_post(&p->sem_sync);
/*
* Although multifd_send_thread is not created, but main migration
@@ -919,7 +910,6 @@ int multifd_save_setup(Error **errp)
multifd_send_state = g_malloc0(sizeof(*multifd_send_state));
multifd_send_state->params = g_new0(MultiFDSendParams, thread_count);
multifd_send_state->pages = multifd_pages_init(page_count);
- qemu_sem_init(&multifd_send_state->channels_ready, 0);
qatomic_set(&multifd_send_state->exiting, 0);
multifd_send_state->ops = multifd_ops[migrate_multifd_compression()];
--
2.35.3
* Re: [RFC PATCH v2 1/6] migration/multifd: Remove channels_ready semaphore
2023-10-12 14:06 ` [RFC PATCH v2 1/6] migration/multifd: Remove channels_ready semaphore Fabiano Rosas
@ 2023-10-19 9:06 ` Juan Quintela
2023-10-19 14:35 ` Peter Xu
2023-10-19 14:55 ` Fabiano Rosas
0 siblings, 2 replies; 30+ messages in thread
From: Juan Quintela @ 2023-10-19 9:06 UTC (permalink / raw)
To: Fabiano Rosas; +Cc: qemu-devel, Peter Xu, Leonardo Bras, Elena Ufimtseva
Fabiano Rosas <farosas@suse.de> wrote:
> The channels_ready semaphore is a global variable not linked to any
> single multifd channel. Waiting on it only means that "some" channel
> has become ready to send data. Since we need to address the channels
> by index (multifd_send_state->params[i]), that information adds
> nothing of value.
NAK.
I disagree here O:-)
the reason why that semaphore exists is multifd_send_pages()
And simplifying the function what it does is:
sem_wait(channels_ready);
for_each_channel()
look if it is empty()
But with the semaphore, we guarantee that when we go to the loop, there
is a channel ready, so we know we don't busy wait searching for a
channel that is free.
Notice that I fully agree that the sem is not needed for locking.
Locking is done with the mutex. It is just used to make sure that we
don't busy loop on that loop.
And we use a sem, because it is the easiest way to know how many
channels are ready (even when we only care if there is one when we
arrive to that code).
We lost count of that counter, and we fixed that here:
commit d2026ee117147893f8d80f060cede6d872ecbd7f
Author: Juan Quintela <quintela@redhat.com>
Date: Wed Apr 26 12:20:36 2023 +0200
multifd: Fix the number of channels ready
We don't wait in the sem when we are doing a sync_main. Make it
And we were addressing the problem that some users were finding that we
were busy waiting on that loop.
> The channel being addressed is not necessarily the
> one that just released the semaphore.
We only care that there is at least one free. We are going to search
the next one.
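As a generic sketch of the idea (plain POSIX primitives and made-up names,
not the actual QEMU code): each channel posts the semaphore when it goes
idle, and the producer waits on it before scanning, so the scan never
spins looking for a free channel.

    #include <pthread.h>
    #include <semaphore.h>
    #include <stdbool.h>

    #define NCHANNELS 4

    /* NB: channels_ready and lock[] need sem_init()/pthread_mutex_init()
     * at startup; omitted here to keep the sketch short. */
    static sem_t channels_ready;              /* counts idle channels */
    static pthread_mutex_t lock[NCHANNELS];
    static bool busy[NCHANNELS];

    static int pick_idle_channel(void)
    {
        sem_wait(&channels_ready);            /* at least one is free */
        for (int i = 0; ; i = (i + 1) % NCHANNELS) {
            pthread_mutex_lock(&lock[i]);
            if (!busy[i]) {
                busy[i] = true;               /* hand the job over */
                pthread_mutex_unlock(&lock[i]);
                return i;
            }
            pthread_mutex_unlock(&lock[i]);
        }
    }

    static void channel_idle_again(int i)     /* called by channel i */
    {
        pthread_mutex_lock(&lock[i]);
        busy[i] = false;
        pthread_mutex_unlock(&lock[i]);
        sem_post(&channels_ready);            /* wake the producer */
    }

The sem_wait() is what guarantees the scan finds a free channel on the
first pass, which is the whole point.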
Does this explanation make sense?
Later, Juan.
> The only usage of this semaphore that makes sense is to wait for it in
> a loop that iterates for the number of channels. That could mean: all
> channels have been set up and are operational OR all channels have
> finished their work and are idle.
>
> Currently all code that waits on channels_ready is redundant. There is
> always a subsequent lock or semaphore that does the actual data
> protection/synchronization.
>
> - at multifd_send_pages: Waiting on channels_ready doesn't mean the
> 'next_channel' is ready, it could be any other channel. So there are
> already cases where this code runs as if no semaphore was there.
> Waiting outside of the loop is also incorrect because if the current
> channel already has a pending_job, then it will loop into the next
> one without waiting on the semaphore and the count will be greater than
> zero at the end of the execution.
>
> Checking that "any" channel is ready as a proxy for all channels
> being ready would work, but it's not what the code is doing and not
> really needed because the channel lock and 'sem' would be enough.
>
> - at multifd_send_sync: This usage is correct, but it is made
> redundant by the wait on sem_sync. What this piece of code is doing
> is making sure all channels have sent the SYNC packet and become
> idle afterwards.
>
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
> migration/multifd.c | 10 ----------
> 1 file changed, 10 deletions(-)
>
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 0f6b203877..e26f5f246d 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -362,8 +362,6 @@ struct {
> MultiFDPages_t *pages;
> /* global number of generated multifd packets */
> uint64_t packet_num;
> - /* send channels ready */
> - QemuSemaphore channels_ready;
> /*
> * Have we already run terminate threads. There is a race when it
> * happens that we got one error while we are exiting.
> @@ -403,7 +401,6 @@ static int multifd_send_pages(QEMUFile *f)
> return -1;
> }
>
> - qemu_sem_wait(&multifd_send_state->channels_ready);
> /*
> * next_channel can remain from a previous migration that was
> * using more channels, so ensure it doesn't overflow if the
> @@ -554,7 +551,6 @@ void multifd_save_cleanup(void)
> error_free(local_err);
> }
> }
> - qemu_sem_destroy(&multifd_send_state->channels_ready);
> g_free(multifd_send_state->params);
> multifd_send_state->params = NULL;
> multifd_pages_clear(multifd_send_state->pages);
> @@ -630,7 +626,6 @@ int multifd_send_sync_main(QEMUFile *f)
> for (i = 0; i < migrate_multifd_channels(); i++) {
> MultiFDSendParams *p = &multifd_send_state->params[i];
>
> - qemu_sem_wait(&multifd_send_state->channels_ready);
> trace_multifd_send_sync_main_wait(p->id);
> qemu_sem_wait(&p->sem_sync);
>
> @@ -664,7 +659,6 @@ static void *multifd_send_thread(void *opaque)
> p->num_packets = 1;
>
> while (true) {
> - qemu_sem_post(&multifd_send_state->channels_ready);
> qemu_sem_wait(&p->sem);
>
> if (qatomic_read(&multifd_send_state->exiting)) {
> @@ -759,7 +753,6 @@ out:
> */
> if (ret != 0) {
> qemu_sem_post(&p->sem_sync);
> - qemu_sem_post(&multifd_send_state->channels_ready);
> }
>
> qemu_mutex_lock(&p->mutex);
> @@ -796,7 +789,6 @@ static void multifd_tls_outgoing_handshake(QIOTask *task,
> * is not created, and then tell who pay attention to me.
> */
> p->quit = true;
> - qemu_sem_post(&multifd_send_state->channels_ready);
> qemu_sem_post(&p->sem_sync);
> }
> }
> @@ -874,7 +866,6 @@ static void multifd_new_send_channel_cleanup(MultiFDSendParams *p,
> {
> migrate_set_error(migrate_get_current(), err);
> /* Error happen, we need to tell who pay attention to me */
> - qemu_sem_post(&multifd_send_state->channels_ready);
> qemu_sem_post(&p->sem_sync);
> /*
> * Although multifd_send_thread is not created, but main migration
> @@ -919,7 +910,6 @@ int multifd_save_setup(Error **errp)
> multifd_send_state = g_malloc0(sizeof(*multifd_send_state));
> multifd_send_state->params = g_new0(MultiFDSendParams, thread_count);
> multifd_send_state->pages = multifd_pages_init(page_count);
> - qemu_sem_init(&multifd_send_state->channels_ready, 0);
> qatomic_set(&multifd_send_state->exiting, 0);
> multifd_send_state->ops = multifd_ops[migrate_multifd_compression()];
* Re: [RFC PATCH v2 1/6] migration/multifd: Remove channels_ready semaphore
2023-10-19 9:06 ` Juan Quintela
@ 2023-10-19 14:35 ` Peter Xu
2023-10-19 15:00 ` Juan Quintela
2023-10-19 14:55 ` Fabiano Rosas
1 sibling, 1 reply; 30+ messages in thread
From: Peter Xu @ 2023-10-19 14:35 UTC (permalink / raw)
To: Juan Quintela; +Cc: Fabiano Rosas, qemu-devel, Leonardo Bras, Elena Ufimtseva
Fabiano,
Sorry to look at this series late; I messed up my inbox after I reworked my
arrangement methodology of emails. ;)
On Thu, Oct 19, 2023 at 11:06:06AM +0200, Juan Quintela wrote:
> Fabiano Rosas <farosas@suse.de> wrote:
> > The channels_ready semaphore is a global variable not linked to any
> > single multifd channel. Waiting on it only means that "some" channel
> > has become ready to send data. Since we need to address the channels
> > by index (multifd_send_state->params[i]), that information adds
> > nothing of value.
>
> NAK.
>
> I disagree here O:-)
>
> the reason why that semaphore exists is multifd_send_pages()
>
> And simplifying the function what it does is:
>
> sem_wait(channels_ready);
>
> for_each_channel()
> look if it is empty()
>
> But with the semaphore, we guarantee that when we go to the loop, there
> is a channel ready, so we know we don't busy wait searching for a
> channel that is free.
>
> Notice that I fully agree that the sem is not needed for locking.
> Locking is done with the mutex. It is just used to make sure that we
> don't busy loop on that loop.
>
> And we use a sem, because it is the easiest way to know how many
> channels are ready (even when we only care if there is one when we
> arrive to that code).
>
> We lost count of that counter, and we fixed that here:
>
> commit d2026ee117147893f8d80f060cede6d872ecbd7f
> Author: Juan Quintela <quintela@redhat.com>
> Date: Wed Apr 26 12:20:36 2023 +0200
>
> multifd: Fix the number of channels ready
>
> We don't wait in the sem when we are doing a sync_main. Make it
>
> And we were addressing the problem that some users were finding that we
> were busy waiting on that loop.
Juan,
I can understand why send_pages needs that sem, but not when sync main.
IOW, why multifd_send_sync_main() needs:
qemu_sem_wait(&multifd_send_state->channels_ready);
If it has:
qemu_sem_wait(&p->sem_sync);
How does a busy loop happen?
Thanks,
--
Peter Xu
* Re: [RFC PATCH v2 1/6] migration/multifd: Remove channels_ready semaphore
2023-10-19 14:35 ` Peter Xu
@ 2023-10-19 15:00 ` Juan Quintela
2023-10-19 15:46 ` Peter Xu
0 siblings, 1 reply; 30+ messages in thread
From: Juan Quintela @ 2023-10-19 15:00 UTC (permalink / raw)
To: Peter Xu; +Cc: Fabiano Rosas, qemu-devel, Leonardo Bras, Elena Ufimtseva
Peter Xu <peterx@redhat.com> wrote:
> Fabiano,
>
> Sorry to look at this series late; I messed up my inbox after I reworked my
> arrangement methodology of emails. ;)
>
> On Thu, Oct 19, 2023 at 11:06:06AM +0200, Juan Quintela wrote:
>> Fabiano Rosas <farosas@suse.de> wrote:
>> > The channels_ready semaphore is a global variable not linked to any
>> > single multifd channel. Waiting on it only means that "some" channel
>> > has become ready to send data. Since we need to address the channels
>> > by index (multifd_send_state->params[i]), that information adds
>> > nothing of value.
>>
>> NAK.
>>
>> I disagree here O:-)
>>
>> the reason why that semaphore exists is multifd_send_pages()
>>
>> And simplifying the function what it does is:
>>
>> sem_wait(channels_ready);
>>
>> for_each_channel()
>> look if it is empty()
>>
>> But with the semaphore, we guarantee that when we go to the loop, there
>> is a channel ready, so we know we don't busy wait searching for a
>> channel that is free.
>>
>> Notice that I fully agree that the sem is not needed for locking.
>> Locking is done with the mutex. It is just used to make sure that we
>> don't busy loop on that loop.
>>
>> And we use a sem, because it is the easiest way to know how many
>> channels are ready (even when we only care if there is one when we
>> arrive to that code).
>>
>> We lost count of that counter, and we fixed that here:
>>
>> commit d2026ee117147893f8d80f060cede6d872ecbd7f
>> Author: Juan Quintela <quintela@redhat.com>
>> Date: Wed Apr 26 12:20:36 2023 +0200
>>
>> multifd: Fix the number of channels ready
>>
>> We don't wait in the sem when we are doing a sync_main. Make it
>>
>> And we were addressing the problem that some users were finding that we
>> were busy waiting on that loop.
>
> Juan,
>
> I can understand why send_pages needs that sem, but not when sync main.
> IOW, why multifd_send_sync_main() needs:
>
> qemu_sem_wait(&multifd_send_state->channels_ready);
>
> If it has:
>
> qemu_sem_wait(&p->sem_sync);
>
> How does a busy loop happen?
What does multifd_send_thread() do for a SYNC packet?
static void *multifd_send_thread(void *opaque)
{
while (true) {
qemu_sem_post(&multifd_send_state->channels_ready);
qemu_sem_wait(&p->sem);
qemu_mutex_lock(&p->mutex);
if (p->pending_job) {
....
qemu_mutex_unlock(&p->mutex);
if (flags & MULTIFD_FLAG_SYNC) {
qemu_sem_post(&p->sem_sync);
}
}
}
I have simplified it a lot, but you get the idea.
See the 1st post of channels_ready in the loop above.
We do it for every packet sent. Even for the SYNC ones.
Now, what does multifd_send_pages() do?
static int multifd_send_pages(QEMUFile *f)
{
qemu_sem_wait(&multifd_send_state->channels_ready);
....
}
See, we are decreasing the number of channels_ready because we know we
are using one.
As we are sending packets for multifd_send_sync_main(), we need to do a
hack in multifd_send_thread() and say that sync packets don't
count. Or we need to decrease that semaphore in multifd_send_sync_main()
int multifd_send_sync_main(QEMUFile *f)
{
....
for (i = 0; i < migrate_multifd_channels(); i++) {
qemu_sem_wait(&multifd_send_state->channels_ready);
...
}
}
And that is what we do here.
We didn't have this last line (not needed for making sure the channels
are ready here).
But needed to make sure that we are maintaining channels_ready exact.
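Summarised as a comment (my wording, not something that is in the tree
today), the accounting that keeps the counter exact is:

    /*
     * multifd_send_thread() posts channels_ready once per loop
     * iteration, i.e. once per data packet and once per SYNC packet.
     * Every post needs a matching wait to keep the count exact:
     *   - multifd_send_pages() waits once per data packet;
     *   - multifd_send_sync_main() waits once per channel for the
     *     SYNC packets (the line added by commit d2026ee11714).
     */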
Later, Juan.
* Re: [RFC PATCH v2 1/6] migration/multifd: Remove channels_ready semaphore
2023-10-19 15:00 ` Juan Quintela
@ 2023-10-19 15:46 ` Peter Xu
2023-10-19 18:28 ` Juan Quintela
0 siblings, 1 reply; 30+ messages in thread
From: Peter Xu @ 2023-10-19 15:46 UTC (permalink / raw)
To: Juan Quintela; +Cc: Fabiano Rosas, qemu-devel, Leonardo Bras, Elena Ufimtseva
On Thu, Oct 19, 2023 at 05:00:02PM +0200, Juan Quintela wrote:
> Peter Xu <peterx@redhat.com> wrote:
> > Fabiano,
> >
> > Sorry to look at this series late; I messed up my inbox after I reworked my
> > arrangement methodology of emails. ;)
> >
> > On Thu, Oct 19, 2023 at 11:06:06AM +0200, Juan Quintela wrote:
> >> Fabiano Rosas <farosas@suse.de> wrote:
> >> > The channels_ready semaphore is a global variable not linked to any
> >> > single multifd channel. Waiting on it only means that "some" channel
> >> > has become ready to send data. Since we need to address the channels
> >> > by index (multifd_send_state->params[i]), that information adds
> >> > nothing of value.
> >>
> >> NAK.
> >>
> >> I disagree here O:-)
> >>
> >> the reason why that semaphore exists is multifd_send_pages()
> >>
> >> And simplifying the function what it does is:
> >>
> >> sem_wait(channels_ready);
> >>
> >> for_each_channel()
> >> look if it is empty()
> >>
> >> But with the semaphore, we guarantee that when we go to the loop, there
> >> is a channel ready, so we know we don't busy wait searching for a
> >> channel that is free.
> >>
> >> Notice that I fully agree that the sem is not needed for locking.
> >> Locking is done with the mutex. It is just used to make sure that we
> >> don't busy loop on that loop.
> >>
> >> And we use a sem, because it is the easiest way to know how many
> >> channels are ready (even when we only care if there is one when we
> >> arrive to that code).
> >>
> >> We lost count of that counter, and we fixed that here:
> >>
> >> commit d2026ee117147893f8d80f060cede6d872ecbd7f
> >> Author: Juan Quintela <quintela@redhat.com>
> >> Date: Wed Apr 26 12:20:36 2023 +0200
> >>
> >> multifd: Fix the number of channels ready
> >>
> >> We don't wait in the sem when we are doing a sync_main. Make it
> >>
> >> And we were addressing the problem that some users were finding that we
> >> were busy waiting on that loop.
> >
> > Juan,
> >
> > I can understand why send_pages needs that sem, but not when sync main.
> > IOW, why multifd_send_sync_main() needs:
> >
> > qemu_sem_wait(&multifd_send_state->channels_ready);
> >
> > If it has:
> >
> > qemu_sem_wait(&p->sem_sync);
> >
> > How does a busy loop happen?
>
> What does multifd_send_thread() do for a SYNC packet?
>
> static void *multifd_send_thread(void *opaque)
> {
> while (true) {
> qemu_sem_post(&multifd_send_state->channels_ready);
> qemu_sem_wait(&p->sem);
>
> qemu_mutex_lock(&p->mutex);
>
> if (p->pending_job) {
> ....
> qemu_mutex_unlock(&p->mutex);
>
> if (flags & MULTIFD_FLAG_SYNC) {
> qemu_sem_post(&p->sem_sync);
> }
> }
> }
>
> I have simplified it a lot, but you get the idea.
>
> See the 1st post of channels_ready in the loop above.
> We do it for every packet sent. Even for the SYNC ones.
>
> Now, what does multifd_send_pages() do?
>
> static int multifd_send_pages(QEMUFile *f)
> {
> qemu_sem_wait(&multifd_send_state->channels_ready);
> ....
> }
>
> See, we are decreasing the number of channels_ready because we know we
> are using one.
>
> As we are sending packets for multifd_send_sync_main(), we need to do a
> hack in multifd_send_thread() and say that sync packets don't
> count. Or we need to decrease that semaphore in multifd_send_sync_main()
>
> int multifd_send_sync_main(QEMUFile *f)
> {
> ....
> for (i = 0; i < migrate_multifd_channels(); i++) {
> qemu_sem_wait(&multifd_send_state->channels_ready);
> ...
> }
> }
>
> And that is what we do here.
> We didn't have this last line (not needed for making sure the channels
> are ready here).
>
> But needed to make sure that we are maintaining channels_ready exact.
I didn't expect it to be exact, I think that's the major part of confusion.
For example, I see this comment:
static void *multifd_send_thread(void *opaque)
...
} else {
qemu_mutex_unlock(&p->mutex);
/* sometimes there are spurious wakeups */
}
So do we have spurious wakeup anywhere for either p->sem or channels_ready?
They are related, because if we got spurious p->sem wakeups, then we'll
boost channels_ready one more time too there.
I think two ways to go here:
- If we want to make them all exact: we'd figure out where the spurious
wake ups come from and fix all of them. Or,
- IMHO we can also make them not exact. It means they can allow
spurious wakeups, and the code can actually also work like that. One
example: what happens if we get a spurious wakeup in multifd_send_pages()
for channels_ready? We simply do some cpu loops, as long as we double
check each channel again; we can even do better: if looping over N
channels sees them all busy, "goto retry" and wait on the sem again
(rough sketch below).
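A rough sketch of that second option, only to illustrate the idea (not a
tested patch; the quit/error handling is omitted):

    /* hypothetical: tolerate a spurious channels_ready wakeup */
retry:
    qemu_sem_wait(&multifd_send_state->channels_ready);
    for (i = 0; i < migrate_multifd_channels(); i++) {
        p = &multifd_send_state->params[next_channel];
        next_channel = (next_channel + 1) % migrate_multifd_channels();
        qemu_mutex_lock(&p->mutex);
        if (!p->pending_job) {
            p->pending_job++;
            qemu_mutex_unlock(&p->mutex);
            break;
        }
        qemu_mutex_unlock(&p->mutex);
    }
    if (i == migrate_multifd_channels()) {
        /* every channel was busy, so the wakeup was spurious */
        goto retry;
    }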
What do you think?
Thanks,
--
Peter Xu
* Re: [RFC PATCH v2 1/6] migration/multifd: Remove channels_ready semaphore
2023-10-19 15:46 ` Peter Xu
@ 2023-10-19 18:28 ` Juan Quintela
2023-10-19 18:50 ` Peter Xu
0 siblings, 1 reply; 30+ messages in thread
From: Juan Quintela @ 2023-10-19 18:28 UTC (permalink / raw)
To: Peter Xu; +Cc: Fabiano Rosas, qemu-devel, Leonardo Bras, Elena Ufimtseva
Peter Xu <peterx@redhat.com> wrote:
> On Thu, Oct 19, 2023 at 05:00:02PM +0200, Juan Quintela wrote:
>> Peter Xu <peterx@redhat.com> wrote:
>> > Fabiano,
>> >
>> > Sorry to look at this series late; I messed up my inbox after I reworked my
>> > arrangement methodology of emails. ;)
>> >
>> > On Thu, Oct 19, 2023 at 11:06:06AM +0200, Juan Quintela wrote:
>> >> Fabiano Rosas <farosas@suse.de> wrote:
>> >> > The channels_ready semaphore is a global variable not linked to any
>> >> > single multifd channel. Waiting on it only means that "some" channel
>> >> > has become ready to send data. Since we need to address the channels
>> >> > by index (multifd_send_state->params[i]), that information adds
>> >> > nothing of value.
>> And that is what we do here.
>> We didn't have this last line (not needed for making sure the channels
>> are ready here).
>>
>> But needed to make sure that we are maintaining channels_ready exact.
>
> I didn't expect it to be exact, I think that's the major part of confusion.
> For example, I see this comment:
>
> static void *multifd_send_thread(void *opaque)
> ...
> } else {
> qemu_mutex_unlock(&p->mutex);
> /* sometimes there are spurious wakeups */
> }
I put that there during development, and left it there just to be safe.
Years later I put an assert() there and did lots of migrations, never
hit it.
> So do we have spurious wakeup anywhere for either p->sem or channels_ready?
> They are related, because if we got spurious p->sem wakeups, then we'll
> boost channels_ready one more time too there.
I think that we can change that for g_assert_not_reached()
> I think two ways to go here:
>
> - If we want to make them all exact: we'd figure out where are spurious
> wake ups and we fix all of them. Or,
This one.
> - IMHO we can also make them not exact. It means they can allow
> spurious, and code can actually also work like that. One example is
> e.g. what happens if we get spurious wakeup in multifd_send_pages() for
> channels_ready? We simply do some cpu loops as long as we double check
> with each channel again, we can even do better that if looping over N
> channels and see all busy, "goto retry" and wait on the sem again.
>
> What do you think?
Make sure that it is exact O:-)
Later, Juan.
* Re: [RFC PATCH v2 1/6] migration/multifd: Remove channels_ready semaphore
2023-10-19 18:28 ` Juan Quintela
@ 2023-10-19 18:50 ` Peter Xu
2023-10-20 7:56 ` Juan Quintela
0 siblings, 1 reply; 30+ messages in thread
From: Peter Xu @ 2023-10-19 18:50 UTC (permalink / raw)
To: Juan Quintela; +Cc: Fabiano Rosas, qemu-devel, Leonardo Bras, Elena Ufimtseva
On Thu, Oct 19, 2023 at 08:28:05PM +0200, Juan Quintela wrote:
> Peter Xu <peterx@redhat.com> wrote:
> > On Thu, Oct 19, 2023 at 05:00:02PM +0200, Juan Quintela wrote:
> >> Peter Xu <peterx@redhat.com> wrote:
> >> > Fabiano,
> >> >
> >> > Sorry to look at this series late; I messed up my inbox after I reworked my
> >> > arrangement methodology of emails. ;)
> >> >
> >> > On Thu, Oct 19, 2023 at 11:06:06AM +0200, Juan Quintela wrote:
> >> >> Fabiano Rosas <farosas@suse.de> wrote:
> >> >> > The channels_ready semaphore is a global variable not linked to any
> >> >> > single multifd channel. Waiting on it only means that "some" channel
> >> >> > has become ready to send data. Since we need to address the channels
> >> >> > by index (multifd_send_state->params[i]), that information adds
> >> >> > nothing of value.
>
> >> And that is what we do here.
> >> We didn't have this last line (not needed for making sure the channels
> >> are ready here).
> >>
> >> But needed to make sure that we are maintaining channels_ready exact.
> >
> > I didn't expect it to be exact, I think that's the major part of confusion.
> > For example, I see this comment:
> >
> > static void *multifd_send_thread(void *opaque)
> > ...
> > } else {
> > qemu_mutex_unlock(&p->mutex);
> > /* sometimes there are spurious wakeups */
> > }
>
> I put that there during development, and left it there just to be safe.
> Years later I put an assert() there and did lots of migrations, never
> hit it.
>
> > So do we have spurious wakeup anywhere for either p->sem or channels_ready?
> > They are related, because if we got spurious p->sem wakeups, then we'll
> > boost channels_ready one more time too there.
>
> I think that we can change that for g_assert_not_reached()
Sounds good. We can also use an error_report_once(), depending on your
confidence of that. :) Dropping that comment definitely helps.
I had a quick look, indeed I think it's safe even with assert. We may want
to put some more comment on when one should kick p->sem; IIUC it can only
be kicked in either (1) pending_job increased, or (2) set exiting=1. Then
it seems all guaranteed.
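Something like this above p->sem, maybe (the wording is only a suggestion):

    /*
     * Posted by the migration thread to kick the send thread, and only
     * when either (1) p->pending_job was just incremented with p->mutex
     * held, or (2) multifd_send_state->exiting was set to 1.  Any other
     * post would be a bug.
     */
    QemuSemaphore sem;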
Thanks,
--
Peter Xu
* Re: [RFC PATCH v2 1/6] migration/multifd: Remove channels_ready semaphore
2023-10-19 18:50 ` Peter Xu
@ 2023-10-20 7:56 ` Juan Quintela
0 siblings, 0 replies; 30+ messages in thread
From: Juan Quintela @ 2023-10-20 7:56 UTC (permalink / raw)
To: Peter Xu; +Cc: Fabiano Rosas, qemu-devel, Leonardo Bras, Elena Ufimtseva
Peter Xu <peterx@redhat.com> wrote:
> On Thu, Oct 19, 2023 at 08:28:05PM +0200, Juan Quintela wrote:
>> Peter Xu <peterx@redhat.com> wrote:
>> > On Thu, Oct 19, 2023 at 05:00:02PM +0200, Juan Quintela wrote:
>> >> Peter Xu <peterx@redhat.com> wrote:
>> >> > Fabiano,
>> >> >
>> >> > Sorry to look at this series late; I messed up my inbox after I reworked my
>> >> > arrangement methodology of emails. ;)
>> >> >
>> >> > On Thu, Oct 19, 2023 at 11:06:06AM +0200, Juan Quintela wrote:
>> >> >> Fabiano Rosas <farosas@suse.de> wrote:
>> >> >> > The channels_ready semaphore is a global variable not linked to any
>> >> >> > single multifd channel. Waiting on it only means that "some" channel
>> >> >> > has become ready to send data. Since we need to address the channels
>> >> >> > by index (multifd_send_state->params[i]), that information adds
>> >> >> > nothing of value.
>>
>> >> And that is what we do here.
>> >> We didn't have this last line (not needed for making sure the channels
>> >> are ready here).
>> >>
>> >> But needed to make sure that we are maintaining channels_ready exact.
>> >
>> > I didn't expect it to be exact, I think that's the major part of confusion.
>> > For example, I see this comment:
>> >
>> > static void *multifd_send_thread(void *opaque)
>> > ...
>> > } else {
>> > qemu_mutex_unlock(&p->mutex);
>> > /* sometimes there are spurious wakeups */
>> > }
>>
>> I put that there during development, and left it there just to be safe.
>> Years later I put an assert() there and did lots of migrations, never
>> hit it.
>>
>> > So do we have spurious wakeup anywhere for either p->sem or channels_ready?
>> > They are related, because if we got spurious p->sem wakeups, then we'll
>> > boost channels_ready one more time too there.
>>
>> I think that we can change that for g_assert_not_reached()
>
> Sounds good. We can also use an error_report_once(), depending on your
> confidence of that. :) Dropping that comment definitely helps.
>
> I had a quick look, indeed I think it's safe even with assert. We may want
> to put some more comment on when one should kick p->sem; IIUC it can only
> be kicked in either (1) pending_job increased, or (2) set exiting=1. Then
> it seems all guaranteed.
I think we can change the end of the loop from:
qemu_mutex_unlock(&p->mutex);
if (flags & MULTIFD_FLAG_SYNC) {
qemu_sem_post(&p->sem_sync);
}
} else {
qemu_mutex_unlock(&p->mutex);
/* sometimes there are spurious wakeups */
}
to:
if (flags & MULTIFD_FLAG_SYNC) {
qemu_sem_post(&p->sem_sync);
}
}
qemu_mutex_unlock(&p->mutex);
And call it a day. But we can leave one assert there.
But I would prefer to do this kind of locking change at the beginning
of the next cycle.
Later, Juan.
* Re: [RFC PATCH v2 1/6] migration/multifd: Remove channels_ready semaphore
2023-10-19 9:06 ` Juan Quintela
2023-10-19 14:35 ` Peter Xu
@ 2023-10-19 14:55 ` Fabiano Rosas
2023-10-19 15:18 ` Juan Quintela
1 sibling, 1 reply; 30+ messages in thread
From: Fabiano Rosas @ 2023-10-19 14:55 UTC (permalink / raw)
To: quintela; +Cc: qemu-devel, Peter Xu, Leonardo Bras, Elena Ufimtseva
Juan Quintela <quintela@redhat.com> writes:
> Fabiano Rosas <farosas@suse.de> wrote:
>> The channels_ready semaphore is a global variable not linked to any
>> single multifd channel. Waiting on it only means that "some" channel
>> has become ready to send data. Since we need to address the channels
>> by index (multifd_send_state->params[i]), that information adds
>> nothing of value.
>
> NAK.
>
> I disagree here O:-)
>
> the reason why that semaphore exists is multifd_send_pages()
>
> And simplifying the function what it does is:
>
> sem_wait(channels_ready);
>
> for_each_channel()
> look if it is empty()
>
> But with the semaphore, we guarantee that when we go to the loop, there
> is a channel ready, so we know we don't busy wait searching for a
> channel that is free.
>
Ok, so that clarifies the channels_ready usage.
Now, thinking out loud... can't we simply (famous last words) remove the
"if (!p->pending_job)" line and let multifd_send_pages() prepare another
payload for the channel? That way multifd_send_pages() could already
return and the channel would see one more pending_job and proceed to
send it.
Or, since there's no resending anyway, we could dec pending_jobs earlier
before unlocking the channel. It seems the channel could be made ready
for another job as soon as the packet is built and the lock is released.
That way we could remove the semaphore and let the mutex do the job of
waiting for the channel to become ready.
> Notice that I fully agree that the sem is not needed for locking.
> Locking is done with the mutex. It is just used to make sure that we
> don't busy loop on that loop.
>
> And we use a sem, because it is the easiest way to know how many
> channels are ready (even when we only care if there is one when we
> arrive to that code).
Yep, that's fine, no objection here.
>
> We lost count of that counter, and we fixed that here:
Kind of, because we still don't wait on it during cleanup. If we did,
then we could have an assert at the end to make sure this doesn't
regress again.
And maybe use channels_ready.count for other types of introspection.
> commit d2026ee117147893f8d80f060cede6d872ecbd7f
> Author: Juan Quintela <quintela@redhat.com>
> Date: Wed Apr 26 12:20:36 2023 +0200
>
> multifd: Fix the number of channels ready
>
> We don't wait in the sem when we are doing a sync_main. Make it
>
> And we were addressing the problem that some users were finding that we
> were busy waiting on that loop.
>
>> The channel being addressed is not necessarily the
>> one that just released the semaphore.
>
> We only care that there is at least one free. We are going to search
> the next one.
>
> Does this explanation make sense?
It does, thanks for taking the time to educate us =)
I made some suggestions above, but I might be missing something still.
* Re: [RFC PATCH v2 1/6] migration/multifd: Remove channels_ready semaphore
2023-10-19 14:55 ` Fabiano Rosas
@ 2023-10-19 15:18 ` Juan Quintela
2023-10-19 15:56 ` Fabiano Rosas
0 siblings, 1 reply; 30+ messages in thread
From: Juan Quintela @ 2023-10-19 15:18 UTC (permalink / raw)
To: Fabiano Rosas; +Cc: qemu-devel, Peter Xu, Leonardo Bras, Elena Ufimtseva
Fabiano Rosas <farosas@suse.de> wrote:
> Juan Quintela <quintela@redhat.com> writes:
>
>> Fabiano Rosas <farosas@suse.de> wrote:
>>> The channels_ready semaphore is a global variable not linked to any
>>> single multifd channel. Waiting on it only means that "some" channel
>>> has become ready to send data. Since we need to address the channels
>>> by index (multifd_send_state->params[i]), that information adds
>>> nothing of value.
>>
>> NAK.
>>
>> I disagree here O:-)
>>
>> the reason why that semaphore exists is multifd_send_pages()
>>
>> And simplifying the function what it does is:
>>
>> sem_wait(channels_ready);
>>
>> for_each_channel()
>> look if it is empty()
>>
>> But with the semaphore, we guarantee that when we go to the loop, there
>> is a channel ready, so we know we don't busy wait searching for a
>> channel that is free.
>>
>
> Ok, so that clarifies the channels_ready usage.
>
> Now, thinking out loud... can't we simply (famous last words) remove the
> "if (!p->pending_job)" line and let multifd_send_pages() prepare another
> payload for the channel? That way multifd_send_pages() could already
> return and the channel would see one more pending_job and proceed to
> send it.
No.
See the while loop in multifd_send_thread()
while (true) {
qemu_mutex_lock(&p->mutex);
if (p->pending_job) {
......
Do things with parts of the struct that are shared with the
migration thread
....
qemu_mutex_unlock(&p->mutex);
// Drop the lock
// Do more things on the channel, pending_job means that
// it is working
// mutex dropped means that migration_thread can use the
// shared variables, but not the channel
// now here we decrease pending_job, so main thread can
// change things as it wants
// But we need to take the lock first.
qemu_mutex_lock(&p->mutex);
p->pending_job--;
qemu_mutex_unlock(&p->mutex);
......
}
}
This is a common pattern for concurrency. To not have your mutex locked
too long, you put a variable (that can only be tested/changed with the
lock) to explain that the "channel" is busy, while the struct that the lock
protects is not (see how we make sure that the channel doesn't use any
variable of the struct without the locking).
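Stripped of the migration details, the pattern is just this (a generic
pthread sketch, not QEMU code):

    #include <pthread.h>
    #include <stdbool.h>

    struct worker {
        pthread_mutex_t mutex;
        bool busy;      /* the "pending_job": only touched under mutex */
        int data;       /* shared state, also only touched under mutex */
    };

    /* producer: hand over work only if the worker is idle */
    static bool try_submit(struct worker *w, int value)
    {
        bool ok = false;

        pthread_mutex_lock(&w->mutex);
        if (!w->busy) {
            w->data = value;
            w->busy = true;         /* worker now owns the shared state */
            ok = true;
        }
        pthread_mutex_unlock(&w->mutex);
        return ok;
    }

    /* worker: copy what it needs, then do the slow part without the lock */
    static int process_one(struct worker *w)
    {
        int value;

        pthread_mutex_lock(&w->mutex);
        value = w->data;
        pthread_mutex_unlock(&w->mutex);

        /* ... the long I/O happens here, mutex not held ... */

        pthread_mutex_lock(&w->mutex);
        w->busy = false;            /* producer may hand over new work */
        pthread_mutex_unlock(&w->mutex);
        return value;
    }

While busy is set the producer never writes to the struct, even though
the mutex itself is free most of the time.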
> Or, since there's no resending anyway, we could dec pending_jobs earlier
> before unlocking the channel. It seems the channel could be made ready
> for another job as soon as the packet is built and the lock is released.
pending_jobs can be transformed into a bool. We just need to make sure
that we didn't screw it up in _sync().
> That way we could remove the semaphore and let the mutex do the job of
> waiting for the channel to become ready.
As said, we don't want that. Because channels can go at different speeds
due to factors outside of our control.
If the semaphore bothers you, you can change it to a condition
variable, but you just move the complexity from one side to the other
(Initial implementation had a condition variable, but Paolo said that
the semaphore is more efficient, so he won)
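The condition variable version would look roughly like this (a generic
pthread sketch, not the exact code we had back then):

    #include <pthread.h>

    static pthread_mutex_t ready_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t ready_cond = PTHREAD_COND_INITIALIZER;
    static int ready_count;             /* how many channels are idle */

    /* channel thread: announce it can take more work */
    static void channel_ready(void)
    {
        pthread_mutex_lock(&ready_lock);
        ready_count++;
        pthread_cond_signal(&ready_cond);
        pthread_mutex_unlock(&ready_lock);
    }

    /* migration thread: block until at least one channel is idle */
    static void wait_for_a_channel(void)
    {
        pthread_mutex_lock(&ready_lock);
        while (ready_count == 0) {
            pthread_cond_wait(&ready_cond, &ready_lock);
        }
        ready_count--;
        pthread_mutex_unlock(&ready_lock);
    }

which is just a semaphore built by hand, so nothing is really gained.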
>> Notice that I fully agree that the sem is not needed for locking.
>> Locking is done with the mutex. It is just used to make sure that we
>> don't busy loop on that loop.
>>
>> And we use a sem, because it is the easiest way to know how many
>> channels are ready (even when we only care if there is one when we
>> arrive to that code).
>
> Yep, that's fine, no objection here.
>
>>
>> We lost count of that counter, and we fixed that here:
>
> Kind of, because we still don't wait on it during cleanup. If we did,
> then we could have an assert at the end to make sure this doesn't
> regress again.
>
> And maybe use channels_ready.count for other types of introspection.
We could.
>> commit d2026ee117147893f8d80f060cede6d872ecbd7f
>> Author: Juan Quintela <quintela@redhat.com>
>> Date: Wed Apr 26 12:20:36 2023 +0200
>>
>> multifd: Fix the number of channels ready
>>
>> We don't wait in the sem when we are doing a sync_main. Make it
>>
>> And we were addressing the problem that some users were finding that we
>> were busy waiting on that loop.
>>
>>> The channel being addressed is not necessarily the
>>> one that just released the semaphore.
>>
>> We only care that there is at least one free. We are going to search
>> the next one.
>>
>> Does this explanation make sense?
>
> It does, thanks for taking the time to educate us =)
>
> I made some suggestions above, but I might be missing something still.
I think that the current code is already quite efficient.
But I will have to think about whether that improves anything major.
Later, Juan.
* Re: [RFC PATCH v2 1/6] migration/multifd: Remove channels_ready semaphore
2023-10-19 15:18 ` Juan Quintela
@ 2023-10-19 15:56 ` Fabiano Rosas
2023-10-19 18:41 ` Juan Quintela
0 siblings, 1 reply; 30+ messages in thread
From: Fabiano Rosas @ 2023-10-19 15:56 UTC (permalink / raw)
To: quintela; +Cc: qemu-devel, Peter Xu, Leonardo Bras, Elena Ufimtseva
Juan Quintela <quintela@redhat.com> writes:
> Fabiano Rosas <farosas@suse.de> wrote:
>> Juan Quintela <quintela@redhat.com> writes:
>>
>>> Fabiano Rosas <farosas@suse.de> wrote:
>>>> The channels_ready semaphore is a global variable not linked to any
>>>> single multifd channel. Waiting on it only means that "some" channel
>>>> has become ready to send data. Since we need to address the channels
>>>> by index (multifd_send_state->params[i]), that information adds
>>>> nothing of value.
>>>
>>> NAK.
>>>
>>> I disagree here O:-)
>>>
>>> the reason why that semaphore exists is multifd_send_pages()
>>>
>>> And simplifying the function what it does is:
>>>
>>> sem_wait(channels_ready);
>>>
>>> for_each_channel()
>>> look if it is empty()
>>>
>>> But with the semaphore, we guarantee that when we go to the loop, there
>>> is a channel ready, so we know we don't busy wait searching for a
>>> channel that is free.
>>>
>>
>> Ok, so that clarifies the channels_ready usage.
>>
>> Now, thinking out loud... can't we simply (famous last words) remove the
>> "if (!p->pending_job)" line and let multifd_send_pages() prepare another
>> payload for the channel? That way multifd_send_pages() could already
>> return and the channel would see one more pending_job and proceed to
>> send it.
>
> No.
>
> See the while loop in multifd_send_thread()
>
> while (true) {
> qemu_mutex_lock(&p->mutex);
>
> if (p->pending_job) {
>
> ......
> Do things with parts of the struct that are shared with the
> migration thread
> ....
> qemu_mutex_unlock(&p->mutex);
>
> // Drop the lock
> // Do more things on the channel, pending_job means that
> // it is working
> // mutex dropped means that migration_thread can use the
> // shared variables, but not the channel
>
> // now here we decrease pending_job, so main thread can
> // change things as it wants
> // But we need to take the lock first.
> qemu_mutex_lock(&p->mutex);
> p->pending_job--;
> qemu_mutex_unlock(&p->mutex);
> ......
> }
> }
>
> This is a common pattern for concurrency. To not have your mutex locked
> too long, you put a variable (that can only be tested/changed with the
> lock) to explain that the "channel" is busy, while the struct that the lock
> protects is not (see how we make sure that the channel doesn't use any
> variable of the struct without the locking).
Sure, but what is the purpose of marking the channel as busy? The migration
thread cannot access the p->packet anyway. From multifd_send_pages()
perspective, as soon as the channel releases the lock to start with the
IO, the packet has been sent. It could start preparing the next pages
struct while the channel is doing IO. No?
We don't touch p after the IO aside from p->pending_jobs-- and we
already distribute the load uniformly by incrementing next_channel.
I'm not saying this would be more performant, just wondering if it would
be possible.
>
>
>> Or, since there's no resending anyway, we could dec pending_jobs earlier
>> before unlocking the channel. It seems the channel could be made ready
>> for another job as soon as the packet is built and the lock is released.
>
> pending_jobs can be transformed into a bool. We just need to make sure
> that we didn't screw it up in _sync().
>
>> That way we could remove the semaphore and let the mutex do the job of
>> waiting for the channel to become ready.
>
> As said, we don't want that. Because channels can go at different speeds
> due to factors outside of our control.
>
> If the semaphore bothers you, you can change it to a condition
> variable, but you just move the complexity from one side to the other
> (Initial implementation had a condition variable, but Paolo said that
> the semaphore is more efficient, so he won)
Oh, it doesn't bother me. I'm just trying to unequivocally understand
its effects. And if it logically follows that it's not necessary, only
then remove it.
* Re: [RFC PATCH v2 1/6] migration/multifd: Remove channels_ready semaphore
2023-10-19 15:56 ` Fabiano Rosas
@ 2023-10-19 18:41 ` Juan Quintela
2023-10-19 19:04 ` Peter Xu
0 siblings, 1 reply; 30+ messages in thread
From: Juan Quintela @ 2023-10-19 18:41 UTC (permalink / raw)
To: Fabiano Rosas; +Cc: qemu-devel, Peter Xu, Leonardo Bras, Elena Ufimtseva
Fabiano Rosas <farosas@suse.de> wrote:
> Juan Quintela <quintela@redhat.com> writes:
>
>> Fabiano Rosas <farosas@suse.de> wrote:
>>> Juan Quintela <quintela@redhat.com> writes:
>>>
>>
>> This is a common pattern for concurrency. To not have your mutex locked
>> too long, you put a variable (that can only be tested/changed with the
>> lock) to explain that the "channel" is busy, while the struct that the lock
>> protects is not (see how we make sure that the channel doesn't use any
>> variable of the struct without the locking).
>
> Sure, but what is the purpose of marking the channel as busy? The migration
> thread cannot access the p->packet anyway. From multifd_send_pages()
> perspective, as soon as the channel releases the lock to start with the
> IO, the packet has been sent. It could start preparing the next pages
> struct while the channel is doing IO. No?
Ok, say we remove the pending_job check.
Then we are sending that packet.
But see what happens on multifd_send_pages()
channels_ready is 0.
this is channel 1
next_channel == 1
channel 0 gets ready, so it increases channels_ready.
static int multifd_send_pages(QEMUFile *f)
{
qemu_sem_wait(&multifd_send_state->channels_ready);
// we pass this
next_channel %= migrate_multifd_channels();
for (i = next_channel;; i = (i + 1) % migrate_multifd_channels()) {
p = &multifd_send_state->params[i];
// remember that i == 1
qemu_mutex_lock(&p->mutex);
if (p->quit) {
error_report("%s: channel %d has already quit!", __func__, i);
qemu_mutex_unlock(&p->mutex);
return -1;
}
if (!p->pending_job) {
p->pending_job++;
next_channel = (i + 1) % migrate_multifd_channels();
break;
}
qemu_mutex_unlock(&p->mutex);
// We choose 1, to send the packet through it.
// channel 1 is busy.
// channel 0 is idle but receives no work.
}
...
}
So the variable is there to differentiate which channels are busy/idle so
we can send the work to an idle channel.
> We don't touch p after the IO aside from p->pending_jobs-- and we
> already distribute the load uniformly by incrementing next_channel.
I know. After multifd_send_thread() releases the mutex it will only
touch ->pending_job (taking the mutex 1st).
> I'm not saying this would be more performant, just wondering if it would
> be possible.
Yeap, but as said before quite suboptimal.
>> As said, we don't want that. Because channels can go at different speeds
>> due to factors outside of our control.
>>
>> If the semaphore bothers you, you can change it to a condition
>> variable, but you just move the complexity from one side to the other
>> (Initial implementation had a condition variable, but Paolo said that
>> the semaphore is more efficient, so he won)
>
> Oh, it doesn't bother me. I'm just trying to unequivocally understand
> its effects. And if it logically follows that it's not necessary, only
> then remove it.
Both channels_ready and pending_job make the scheme more performant.
Without them it will not fail, just work way slower.
In the example I just showed you, if we always started from channel 0
to search for an idle channel, we would do even worse (that would be an
actual error):
start with channels_ready == 0;
channels_ready is 0.
channel 1 gets ready, so it increases channels_ready.
static int multifd_send_pages(QEMUFile *f)
{
qemu_sem_wait(&multifd_send_state->channels_ready);
// we pass this
for (i = 0;; i = (i + 1) % migrate_multifd_channels()) {
p = &multifd_send_state->params[i];
// remember that i == 0
qemu_mutex_lock(&p->mutex);
if (p->quit) {
error_report("%s: channel %d has already quit!", __func__, i);
qemu_mutex_unlock(&p->mutex);
return -1;
}
if (!p->pending_job) {
p->pending_job++;
next_channel = (i + 1) % migrate_multifd_channels();
break;
}
// As there is no test to see if this is idle, we put the page here
qemu_mutex_unlock(&p->mutex);
// We put here the page info
}
...
}
channel 2 gets ready, so it increases channels_ready
static int multifd_send_pages(QEMUFile *f)
{
qemu_sem_wait(&multifd_send_state->channels_ready);
// we pass this
for (i = 0;; i = (i + 1) % migrate_multifd_channels()) {
p = &multifd_send_state->params[i];
// remember that i == 0
qemu_mutex_lock(&p->mutex);
if (p->quit) {
error_report("%s: channel %d has already quit!", __func__, i);
qemu_mutex_unlock(&p->mutex);
return -1;
}
if (!p->pending_job) {
p->pending_job++;
next_channel = (i + 1) % migrate_multifd_channels();
break;
}
// As there is no test to see if this is idle, we put the page here
qemu_mutex_unlock(&p->mutex);
// We put here the page info
// channel 0 is still transmitting the 1st page
// And overwrote the previous page info
}
...
}
In this particular case, using next_channel in round robin would have
saved us. When you put info there for a channel to consume
asynchronously, you need to mark somehow that the channel has finished
using the data before ordering it to do more work.
We can change pending_job to a bool if you prefer. I think that we
have nailed all the off_by_one errors by now (famous last words).
Later, Juan.
* Re: [RFC PATCH v2 1/6] migration/multifd: Remove channels_ready semaphore
2023-10-19 18:41 ` Juan Quintela
@ 2023-10-19 19:04 ` Peter Xu
2023-10-20 7:53 ` Juan Quintela
0 siblings, 1 reply; 30+ messages in thread
From: Peter Xu @ 2023-10-19 19:04 UTC (permalink / raw)
To: Juan Quintela; +Cc: Fabiano Rosas, qemu-devel, Leonardo Bras, Elena Ufimtseva
On Thu, Oct 19, 2023 at 08:41:26PM +0200, Juan Quintela wrote:
> We can change pending_job to a bool if you prefer. I think that we
> have nailed all the off_by_one errors by now (famous last words).
Would it work to make pending_job a bool, even with SYNC? It seems to me
multifd_send_sync_main() now can boost pending_job even if pending_job==1.
That's also the place that I find really confusing; it looks
like the migration thread can modify a pending job's flag as long as it is
fast enough before the send thread puts that onto the wire. Then it's
unpredictable whether the SYNC flag will be sent with the current packet
(which, due to pending_jobs==1 already, can contain valid pages), or will
only be set for the next one (which will have 0 real pages).
IMHO it'll be good to separate the sync task, then we can change
pending_jobs to bool. Something like:
bool pending_send_page;
bool pending_send_sync;
Then multifd_send_thread() handles them separately, only attaching
p->flags=SYNC when pending_send_sync is requested. It guarantees a SYNC
message will always be a separate packet, which will be crystal clear then.
--
Peter Xu
* Re: [RFC PATCH v2 1/6] migration/multifd: Remove channels_ready semaphore
2023-10-19 19:04 ` Peter Xu
@ 2023-10-20 7:53 ` Juan Quintela
2023-10-20 12:48 ` Fabiano Rosas
0 siblings, 1 reply; 30+ messages in thread
From: Juan Quintela @ 2023-10-20 7:53 UTC (permalink / raw)
To: Peter Xu; +Cc: Fabiano Rosas, qemu-devel, Leonardo Bras, Elena Ufimtseva
Peter Xu <peterx@redhat.com> wrote:
> On Thu, Oct 19, 2023 at 08:41:26PM +0200, Juan Quintela wrote:
>> We can change pending_job to a bool if you prefer. I think that we
>> have nailed all the off_by_one errors by now (famous last words).
>
> Would it work to make pending_job a bool, even with SYNC? It seems to me
> multifd_send_sync_main() now can boost pending_job even if pending_job==1.
Then an int is ok, I think.
> That's also the place that I find really confusing; it looks
> like the migration thread can modify a pending job's flag as long as it is
> fast enough before the send thread puts that onto the wire.
It never does.
for (i = next_channel;; i = (i + 1) % migrate_multifd_channels()) {
qemu_mutex_lock(&p->mutex);
...
if (!p->pending_job) {
p->pending_job++;
next_channel = (i + 1) % migrate_multifd_channels();
break;
}
qemu_mutex_unlock(&p->mutex);
}
If pending_job == 0 -> owner of the channel is migration_thread and it
can use it.
If pending_job > 0 -> owner of the channel is the channel thread and
migration_thread can't use it.
I think that this is easy to understand. You are right that it is not
_explained_. And clearly, if you have to ask, it is not obvious O:-)
Yes, it was obvious to me, that is the reason why I wrote it in the 1st
place. Notice also that it is a common idiom in multithreaded apps.
That allows it to do stuff without having to have a mutex locked, so
other threads can "look" into the state.
> Then it's
> unpredictable whether the SYNC flag will be sent with the current packet
> (which, due to pending_jobs==1 already, can contain valid pages), or will
> only be set for the next one (which will have 0 real pages).
I have to think about this one.
Decrease pending_jobs there if we are doing multiple jobs?
But we still have the issue of the semaphore.
> IMHO it'll be good to separate the sync task, then we can change
> pending_jobs to bool. Something like:
>
> bool pending_send_page;
> bool pending_send_sync;
current code:
qemu_mutex_lock(&p->mutex);
if (p->pending_job) {
uint64_t packet_num = p->packet_num;
uint32_t flags;
p->normal_num = 0;
if (use_zero_copy_send) {
p->iovs_num = 0;
} else {
p->iovs_num = 1;
}
for (int i = 0; i < p->pages->num; i++) {
p->normal[p->normal_num] = p->pages->offset[i];
p->normal_num++;
}
if (p->normal_num) {
ret = multifd_send_state->ops->send_prepare(p, &local_err);
if (ret != 0) {
qemu_mutex_unlock(&p->mutex);
break;
}
}
multifd_send_fill_packet(p);
flags = p->flags;
p->flags = 0;
p->num_packets++;
p->total_normal_pages += p->normal_num;
p->pages->num = 0;
p->pages->block = NULL;
qemu_mutex_unlock(&p->mutex);
trace_multifd_send(p->id, packet_num, p->normal_num, flags,
p->next_packet_size);
if (use_zero_copy_send) {
/* Send header first, without zerocopy */
ret = qio_channel_write_all(p->c, (void *)p->packet,
p->packet_len, &local_err);
if (ret != 0) {
break;
}
} else {
/* Send header using the same writev call */
p->iov[0].iov_len = p->packet_len;
p->iov[0].iov_base = p->packet;
}
ret = qio_channel_writev_full_all(p->c, p->iov, p->iovs_num, NULL,
0, p->write_flags, &local_err);
if (ret != 0) {
break;
}
stat64_add(&mig_stats.multifd_bytes,
p->next_packet_size + p->packet_len);
stat64_add(&mig_stats.transferred,
p->next_packet_size + p->packet_len);
p->next_packet_size = 0;
qemu_mutex_lock(&p->mutex);
p->pending_job--;
qemu_mutex_unlock(&p->mutex);
if (flags & MULTIFD_FLAG_SYNC) {
qemu_sem_post(&p->sem_sync);
}
} else {
qemu_mutex_unlock(&p->mutex);
/* sometimes there are spurious wakeups */
}
Your suggested change:
qemu_mutex_lock(&p->mutex);
if (p->pending_job_page) {
uint64_t packet_num = p->packet_num;
uint32_t flags;
p->normal_num = 0;
if (use_zero_copy_send) {
p->iovs_num = 0;
} else {
p->iovs_num = 1;
}
for (int i = 0; i < p->pages->num; i++) {
p->normal[p->normal_num] = p->pages->offset[i];
p->normal_num++;
}
if (p->normal_num) {
ret = multifd_send_state->ops->send_prepare(p, &local_err);
if (ret != 0) {
qemu_mutex_unlock(&p->mutex);
break;
}
}
multifd_send_fill_packet(p);
flags = p->flags;
p->flags = 0;
p->num_packets++;
p->total_normal_pages += p->normal_num;
p->pages->num = 0;
p->pages->block = NULL;
qemu_mutex_unlock(&p->mutex);
trace_multifd_send(p->id, packet_num, p->normal_num, flags,
p->next_packet_size);
if (use_zero_copy_send) {
/* Send header first, without zerocopy */
ret = qio_channel_write_all(p->c, (void *)p->packet,
p->packet_len, &local_err);
if (ret != 0) {
break;
}
} else {
/* Send header using the same writev call */
p->iov[0].iov_len = p->packet_len;
p->iov[0].iov_base = p->packet;
}
ret = qio_channel_writev_full_all(p->c, p->iov, p->iovs_num, NULL,
0, p->write_flags, &local_err);
if (ret != 0) {
break;
}
stat64_add(&mig_stats.multifd_bytes,
p->next_packet_size + p->packet_len);
stat64_add(&mig_stats.transferred,
p->next_packet_size + p->packet_len);
p->next_packet_size = 0;
qemu_mutex_lock(&p->mutex);
p->pending_job_page = false;
qemu_mutex_unlock(&p->mutex);
} else if (p->pending_job_sync) {
uint64_t packet_num = p->packet_num;
uint32_t flags;
p->normal_num = 0;
if (use_zero_copy_send) {
p->iovs_num = 0;
} else {
p->iovs_num = 1;
}
multifd_send_fill_packet(p);
flags = p->flags;
p->flags = 0;
p->num_packets++;
p->total_normal_pages += p->normal_num;
p->pages->num = 0;
p->pages->block = NULL;
qemu_mutex_unlock(&p->mutex);
trace_multifd_send(p->id, packet_num, p->normal_num, flags,
p->next_packet_size);
if (use_zero_copy_send) {
/* Send header first, without zerocopy */
ret = qio_channel_write_all(p->c, (void *)p->packet,
p->packet_len, &local_err);
if (ret != 0) {
break;
}
} else {
/* Send header using the same writev call */
p->iov[0].iov_len = p->packet_len;
p->iov[0].iov_base = p->packet;
}
ret = qio_channel_writev_full_all(p->c, p->iov, p->iovs_num, NULL,
0, p->write_flags, &local_err);
if (ret != 0) {
break;
}
stat64_add(&mig_stats.multifd_bytes,
p->next_packet_size + p->packet_len);
stat64_add(&mig_stats.transferred,
p->next_packet_size + p->packet_len);
p->next_packet_size = 0;
qemu_mutex_lock(&p->mutex);
p->pending_job_sync = false;
qemu_mutex_unlock(&p->mutex);
if (flags & MULTIFD_FLAG_SYNC) {
qemu_sem_post(&p->sem_sync);
}
} else {
qemu_mutex_unlock(&p->mutex);
/* sometimes there are spurious wakeups */
}
I.e. we duplicate much more code than we remove. I am not
convinced.
> Then multifd_send_thread() handles them separately, only attaching
> p->flags=SYNC when pending_send_sync is requested. It guarantees a SYNC
> message will always be a separate packet, which will be crystal clear then.
This is not a requirement.
Code should handle the reception of SYNC with a page. We just don't
send them because it is more complex.
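(For illustration, the receive side already copes with that combination;
roughly, from memory and simplified, multifd_recv_thread() does:

    ret = multifd_recv_unfill_packet(p, &local_err);
    flags = p->flags;
    /* pages and the SYNC flag can arrive in the same packet */
    if (p->normal_num) {
        ret = multifd_recv_state->ops->recv_pages(p, &local_err);
        if (ret != 0) {
            break;
        }
    }
    if (flags & MULTIFD_FLAG_SYNC) {
        qemu_sem_post(&multifd_recv_state->sem_sync);
        qemu_sem_wait(&p->sem_sync);
    }

so a SYNC packet that also carried pages would be consumed just fine; it
is only the send side that never produces one.)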
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH v2 1/6] migration/multifd: Remove channels_ready semaphore
2023-10-20 7:53 ` Juan Quintela
@ 2023-10-20 12:48 ` Fabiano Rosas
2023-10-22 20:17 ` Peter Xu
0 siblings, 1 reply; 30+ messages in thread
From: Fabiano Rosas @ 2023-10-20 12:48 UTC (permalink / raw)
To: quintela, Peter Xu; +Cc: qemu-devel, Leonardo Bras, Elena Ufimtseva
Juan Quintela <quintela@redhat.com> writes:
> Peter Xu <peterx@redhat.com> wrote:
>> On Thu, Oct 19, 2023 at 08:41:26PM +0200, Juan Quintela wrote:
>>> We can changing pending_job to a bool if you preffer. I think that we
>>> have nailed all the off_by_one errors by now (famous last words).
>>
>> Would it work to make pending_job a bool, even with SYNC? It seems to me
>> multifd_send_sync_main() now can boost pending_job even if pending_job==1.
>
> Then a int is ok, I think.
>
>> That's also the place where I really think confusing too; where it looks
>> like the migration thread can modify a pending job's flag as long as it is
>> fast enough before the send thread put that onto the wire.
>
> It never does.
>
> for (i = next_channel;; i = (i + 1) % migrate_multifd_channels()) {
> qemu_mutex_lock(&p->mutex);
> ...
> if (!p->pending_job) {
> p->pending_job++;
> next_channel = (i + 1) % migrate_multifd_channels();
> break;
> }
> qemu_mutex_unlock(&p->mutex);
> }
We copy the flags into the packet header at multifd_send_fill_packet()
before unlocking. I think that is even independent of the pending_jobs
scheme.
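(For reference, the part of multifd_send_fill_packet() I mean is roughly
this, abridged from memory:

    MultiFDPacket_t *packet = p->packet;
    ...
    packet->flags = cpu_to_be32(p->flags);
    packet->packet_num = cpu_to_be64(p->packet_num);

i.e. whatever is in p->flags at fill time is snapshotted into the packet
while the lock is still held.)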
> If pending_job == 0 -> owner of the channel is migration_thread and it
> can use it.
>
> If pending_job > 0 -> owner of the channel is the channel thread and
> migration_thread can't use it.
Do you really mean "migration_thread" here or just multifd_send_pages()?
Because multifd_send_sync_main() doesn't care about this ownership
rule. Does that mean that code is incorrect?
> I think that this is easy to understand. You are right that it is not
> _explained_. And clearly, if you have to ask, it is not obvious O:-)
It is explained at multifd.h. But it doesn't really matter because code
can drift and documentation doesn't assure correctness. That's why we
have to ask about seemingly obvious stuff.
> Yes, it was obvious to me, that is the reason why I wrote it on the
> 1st place. Notice also that it is a common idiom in multithreaded
> apps. That allows it to do stuff without having to have a mutex
> locked, so other threads can "look" into the state.
>> Then it's
>> unpredictable whether the SYNC flag will be sent with current packet (where
>> due to pending_jobs==1 already, can contain valid pages), or will be only
>> set for the next one (where there will have 0 real page).
>
> I have to think about this one.
> Decrease pending_jobs there if we are doing multiple jobs?
> But we still have the issue of the semaphore.
>
>> IMHO it'll be good to separate the sync task, then we can change
>> pending_jobs to bool. Something like:
>>
>> bool pending_send_page;
>> bool pending_send_sync;
>
> current code:
>
> qemu_mutex_lock(&p->mutex);
>
> if (p->pending_job) {
> uint64_t packet_num = p->packet_num;
> uint32_t flags;
> p->normal_num = 0;
>
> if (use_zero_copy_send) {
> p->iovs_num = 0;
> } else {
> p->iovs_num = 1;
> }
>
> for (int i = 0; i < p->pages->num; i++) {
> p->normal[p->normal_num] = p->pages->offset[i];
> p->normal_num++;
> }
>
> if (p->normal_num) {
> ret = multifd_send_state->ops->send_prepare(p, &local_err);
> if (ret != 0) {
> qemu_mutex_unlock(&p->mutex);
> break;
> }
> }
> multifd_send_fill_packet(p);
> flags = p->flags;
> p->flags = 0;
> p->num_packets++;
> p->total_normal_pages += p->normal_num;
> p->pages->num = 0;
> p->pages->block = NULL;
> qemu_mutex_unlock(&p->mutex);
>
> trace_multifd_send(p->id, packet_num, p->normal_num, flags,
> p->next_packet_size);
>
> if (use_zero_copy_send) {
> /* Send header first, without zerocopy */
> ret = qio_channel_write_all(p->c, (void *)p->packet,
> p->packet_len, &local_err);
> if (ret != 0) {
> break;
> }
> } else {
> /* Send header using the same writev call */
> p->iov[0].iov_len = p->packet_len;
> p->iov[0].iov_base = p->packet;
> }
>
> ret = qio_channel_writev_full_all(p->c, p->iov, p->iovs_num, NULL,
> 0, p->write_flags, &local_err);
> if (ret != 0) {
> break;
> }
>
> stat64_add(&mig_stats.multifd_bytes,
> p->next_packet_size + p->packet_len);
> stat64_add(&mig_stats.transferred,
> p->next_packet_size + p->packet_len);
> p->next_packet_size = 0;
> qemu_mutex_lock(&p->mutex);
> p->pending_job--;
> qemu_mutex_unlock(&p->mutex);
>
> if (flags & MULTIFD_FLAG_SYNC) {
> qemu_sem_post(&p->sem_sync);
> }
> } else {
> qemu_mutex_unlock(&p->mutex);
> /* sometimes there are spurious wakeups */
> }
>
> Your suggested change:
>
> qemu_mutex_lock(&p->mutex);
>
> if (p->pending_job_page) {
It's a semantic issue really, but I'd rather we avoid locking ourselves
more into the "pages" idea for multifd threads. The data being sent by
the multifd thread should be opaque.
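To illustrate what I mean by opaque - a hypothetical sketch, not existing
code - something along these lines:

    /* hypothetical: a typed payload so the channel doesn't assume pages */
    typedef enum {
        MULTIFD_PAYLOAD_NONE,
        MULTIFD_PAYLOAD_RAM,
        /* later: device state, etc. */
    } MultiFDPayloadType;

    typedef struct {
        MultiFDPayloadType type;
        union {
            MultiFDPages_t ram;
        } u;
    } MultiFDSendData;

The channel thread would then only care that there is *a* payload queued,
not that it is pages.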
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH v2 1/6] migration/multifd: Remove channels_ready semaphore
2023-10-20 12:48 ` Fabiano Rosas
@ 2023-10-22 20:17 ` Peter Xu
0 siblings, 0 replies; 30+ messages in thread
From: Peter Xu @ 2023-10-22 20:17 UTC (permalink / raw)
To: Fabiano Rosas; +Cc: quintela, qemu-devel, Leonardo Bras, Elena Ufimtseva
On Fri, Oct 20, 2023 at 09:48:54AM -0300, Fabiano Rosas wrote:
> > If pending_job == 0 -> owner of the channel is migration_thread and it
> > can use it.
> >
> > If pending_job > 0 -> owner of the channel is the channel thread and
> > migration_thread can't use it.
>
> Do you really mean "migration_thread" here or just multifd_send_pages()?
> Because multifd_send_sync_main() doesn't care about this ownership
> rule. Does that mean that code is incorrect?
Yes, that's also what I was referring to as the confusion.
[...]
> It's a semantic issue really, but I'd rather we avoid locking ourselves
> more into the "pages" idea for multifd threads. The data being sent by
> the multifd thread should be opaque.
I've put these ideas into an RFC patchset here:
[PATCH RFC 0/7] migration/multifd: quit unitifications and separate sync packet
I kept it "pending_job" there, avoid using "pages" as a name.
Fabiano, I have a patch there that dropped p->quit, so there will be
crossovers with your patchset. I tried to leave that alone, but found I'd
better clean that up when adding the send thread helpers. Let's see how it
goes.
Thanks,
--
Peter Xu
^ permalink raw reply [flat|nested] 30+ messages in thread
* [RFC PATCH v2 2/6] migration/multifd: Stop checking p->quit in multifd_send_thread
2023-10-12 14:06 [RFC PATCH v2 0/6] migration/multifd: Locking changes Fabiano Rosas
2023-10-12 14:06 ` [RFC PATCH v2 1/6] migration/multifd: Remove channels_ready semaphore Fabiano Rosas
@ 2023-10-12 14:06 ` Fabiano Rosas
2023-10-19 9:08 ` Juan Quintela
2023-10-12 14:06 ` [RFC PATCH v2 3/6] migration/multifd: Decouple control flow from the SYNC packet Fabiano Rosas
` (3 subsequent siblings)
5 siblings, 1 reply; 30+ messages in thread
From: Fabiano Rosas @ 2023-10-12 14:06 UTC (permalink / raw)
To: qemu-devel; +Cc: Juan Quintela, Peter Xu, Leonardo Bras, Elena Ufimtseva
We don't need to check p->quit in the multifd_send_thread() because it
is shadowed by the 'exiting' flag. Ever since that flag was added
p->quit became obsolete as a way to stop the thread.
Since p->quit is set at multifd_send_terminate_threads() under the
p->mutex lock, the thread will only see it once it loops, so 'exiting'
will always be seen first.
Note that setting p->quit at multifd_send_terminate_threads() still
makes sense because we need a way to inform multifd_send_pages() that
the channel has stopped.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
migration/multifd.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/migration/multifd.c b/migration/multifd.c
index e26f5f246d..92ae61a50f 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -731,9 +731,6 @@ static void *multifd_send_thread(void *opaque)
if (flags & MULTIFD_FLAG_SYNC) {
qemu_sem_post(&p->sem_sync);
}
- } else if (p->quit) {
- qemu_mutex_unlock(&p->mutex);
- break;
} else {
qemu_mutex_unlock(&p->mutex);
/* sometimes there are spurious wakeups */
--
2.35.3
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [RFC PATCH v2 2/6] migration/multifd: Stop checking p->quit in multifd_send_thread
2023-10-12 14:06 ` [RFC PATCH v2 2/6] migration/multifd: Stop checking p->quit in multifd_send_thread Fabiano Rosas
@ 2023-10-19 9:08 ` Juan Quintela
2023-10-19 14:58 ` Fabiano Rosas
0 siblings, 1 reply; 30+ messages in thread
From: Juan Quintela @ 2023-10-19 9:08 UTC (permalink / raw)
To: Fabiano Rosas; +Cc: qemu-devel, Peter Xu, Leonardo Bras, Elena Ufimtseva
Fabiano Rosas <farosas@suse.de> wrote:
> We don't need to check p->quit in the multifd_send_thread() because it
> is shadowed by the 'exiting' flag. Ever since that flag was added
> p->quit became obsolete as a way to stop the thread.
>
> Since p->quit is set at multifd_send_terminate_threads() under the
> p->mutex lock, the thread will only see it once it loops, so 'exiting'
> will always be seen first.
>
> Note that setting p->quit at multifd_send_terminate_threads() still
> makes sense because we need a way to inform multifd_send_pages() that
> the channel has stopped.
>
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
But then should we remove the quit altogether?
> ---
> migration/multifd.c | 3 ---
> 1 file changed, 3 deletions(-)
>
> diff --git a/migration/multifd.c b/migration/multifd.c
> index e26f5f246d..92ae61a50f 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -731,9 +731,6 @@ static void *multifd_send_thread(void *opaque)
> if (flags & MULTIFD_FLAG_SYNC) {
> qemu_sem_post(&p->sem_sync);
> }
> - } else if (p->quit) {
> - qemu_mutex_unlock(&p->mutex);
> - break;
> } else {
> qemu_mutex_unlock(&p->mutex);
> /* sometimes there are spurious wakeups */
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH v2 2/6] migration/multifd: Stop checking p->quit in multifd_send_thread
2023-10-19 9:08 ` Juan Quintela
@ 2023-10-19 14:58 ` Fabiano Rosas
2023-10-19 15:19 ` Peter Xu
2023-10-19 15:19 ` Juan Quintela
0 siblings, 2 replies; 30+ messages in thread
From: Fabiano Rosas @ 2023-10-19 14:58 UTC (permalink / raw)
To: quintela; +Cc: qemu-devel, Peter Xu, Leonardo Bras, Elena Ufimtseva
Juan Quintela <quintela@redhat.com> writes:
> Fabiano Rosas <farosas@suse.de> wrote:
>> We don't need to check p->quit in the multifd_send_thread() because it
>> is shadowed by the 'exiting' flag. Ever since that flag was added
>> p->quit became obsolete as a way to stop the thread.
>>
>> Since p->quit is set at multifd_send_terminate_threads() under the
>> p->mutex lock, the thread will only see it once it loops, so 'exiting'
>> will always be seen first.
>>
>> Note that setting p->quit at multifd_send_terminate_threads() still
>> makes sense because we need a way to inform multifd_send_pages() that
>> the channel has stopped.
>>
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>
> Reviewed-by: Juan Quintela <quintela@redhat.com>
>
> But then should we remove the quit altogether?
>
It still serves a purpose to allow multifd_send_pages() to see that the
channel has exited. While that function does also check
multifd_send_state->exiting, it could already be waiting at the mutex
when the channel aborts. So we need to either check 'exiting' again or
keep p->quit.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH v2 2/6] migration/multifd: Stop checking p->quit in multifd_send_thread
2023-10-19 14:58 ` Fabiano Rosas
@ 2023-10-19 15:19 ` Peter Xu
2023-10-19 15:19 ` Juan Quintela
1 sibling, 0 replies; 30+ messages in thread
From: Peter Xu @ 2023-10-19 15:19 UTC (permalink / raw)
To: Fabiano Rosas; +Cc: quintela, qemu-devel, Leonardo Bras, Elena Ufimtseva
On Thu, Oct 19, 2023 at 11:58:13AM -0300, Fabiano Rosas wrote:
> Juan Quintela <quintela@redhat.com> writes:
>
> > Fabiano Rosas <farosas@suse.de> wrote:
> >> We don't need to check p->quit in the multifd_send_thread() because it
> >> is shadowed by the 'exiting' flag. Ever since that flag was added
> >> p->quit became obsolete as a way to stop the thread.
> >>
> >> Since p->quit is set at multifd_send_terminate_threads() under the
> >> p->mutex lock, the thread will only see it once it loops, so 'exiting'
> >> will always be seen first.
> >>
> >> Note that setting p->quit at multifd_send_terminate_threads() still
> >> makes sense because we need a way to inform multifd_send_pages() that
> >> the channel has stopped.
> >>
> >> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> >
> > Reviewed-by: Juan Quintela <quintela@redhat.com>
> >
> > But then should we remove the quit altogether?
> >
>
> It still serves a purpose to allow multifd_send_pages() to see that the
> channel has exited. While that function does also check
> multifd_send_state->exiting, it could already be waiting at the mutex
> when the channel aborts. So we need to either check 'exiting' again or
> keep p->quit.
Sounds better to just move multifd_send_state->exiting check into the loop,
then drop the per-channel ->quit?
diff --git a/migration/multifd.c b/migration/multifd.c
index 1fe53d3b98..381a9a14e4 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -399,10 +399,6 @@ static int multifd_send_pages(QEMUFile *f)
MultiFDSendParams *p = NULL; /* make happy gcc */
MultiFDPages_t *pages = multifd_send_state->pages;
- if (qatomic_read(&multifd_send_state->exiting)) {
- return -1;
- }
-
qemu_sem_wait(&multifd_send_state->channels_ready);
/*
* next_channel can remain from a previous migration that was
@@ -413,12 +409,11 @@ static int multifd_send_pages(QEMUFile *f)
for (i = next_channel;; i = (i + 1) % migrate_multifd_channels()) {
p = &multifd_send_state->params[i];
- qemu_mutex_lock(&p->mutex);
- if (p->quit) {
- error_report("%s: channel %d has already quit!", __func__, i);
- qemu_mutex_unlock(&p->mutex);
+ if (qatomic_read(&multifd_send_state->exiting)) {
return -1;
}
+
+ qemu_mutex_lock(&p->mutex);
if (!p->pending_job) {
p->pending_job++;
next_channel = (i + 1) % migrate_multifd_channels();
Thanks,
--
Peter Xu
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [RFC PATCH v2 2/6] migration/multifd: Stop checking p->quit in multifd_send_thread
2023-10-19 14:58 ` Fabiano Rosas
2023-10-19 15:19 ` Peter Xu
@ 2023-10-19 15:19 ` Juan Quintela
1 sibling, 0 replies; 30+ messages in thread
From: Juan Quintela @ 2023-10-19 15:19 UTC (permalink / raw)
To: Fabiano Rosas; +Cc: qemu-devel, Peter Xu, Leonardo Bras, Elena Ufimtseva
Fabiano Rosas <farosas@suse.de> wrote:
> Juan Quintela <quintela@redhat.com> writes:
>
>> Fabiano Rosas <farosas@suse.de> wrote:
>>> We don't need to check p->quit in the multifd_send_thread() because it
>>> is shadowed by the 'exiting' flag. Ever since that flag was added
>>> p->quit became obsolete as a way to stop the thread.
>>>
>>> Since p->quit is set at multifd_send_terminate_threads() under the
>>> p->mutex lock, the thread will only see it once it loops, so 'exiting'
>>> will always be seen first.
>>>
>>> Note that setting p->quit at multifd_send_terminate_threads() still
>>> makes sense because we need a way to inform multifd_send_pages() that
>>> the channel has stopped.
>>>
>>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>>
>> Reviewed-by: Juan Quintela <quintela@redhat.com>
>>
>> But then should we remove the quit altogether?
>>
>
> It still serves a purpose to allow multifd_send_pages() to see that the
> channel has exited. While that function does also check
> multifd_send_state->exiting, it could already be waiting at the mutex
> when the channel aborts. So we need to either check 'exiting' again or
> keep p->quit.
At the point that we are putting p->quit = true, we are really next to
where we put p->running = false.
I think we can move to use only p->running.
Later, Juan.
^ permalink raw reply [flat|nested] 30+ messages in thread
* [RFC PATCH v2 3/6] migration/multifd: Decouple control flow from the SYNC packet
2023-10-12 14:06 [RFC PATCH v2 0/6] migration/multifd: Locking changes Fabiano Rosas
2023-10-12 14:06 ` [RFC PATCH v2 1/6] migration/multifd: Remove channels_ready semaphore Fabiano Rosas
2023-10-12 14:06 ` [RFC PATCH v2 2/6] migration/multifd: Stop checking p->quit in multifd_send_thread Fabiano Rosas
@ 2023-10-12 14:06 ` Fabiano Rosas
2023-10-19 10:28 ` Juan Quintela
2023-10-12 14:06 ` [RFC PATCH v2 4/6] migration/multifd: Extract sem_done waiting into a function Fabiano Rosas
` (2 subsequent siblings)
5 siblings, 1 reply; 30+ messages in thread
From: Fabiano Rosas @ 2023-10-12 14:06 UTC (permalink / raw)
To: qemu-devel; +Cc: Juan Quintela, Peter Xu, Leonardo Bras, Elena Ufimtseva
We currently use the 'sem_sync' semaphore on the sending side:
1) to know when the multifd_send_thread() has finished sending the
MULTIFD_FLAG_SYNC packet;
This is unnecessary. Multifd sends packets one by one and completion
is already bound by the 'sem' semaphore. The SYNC packet has nothing
special that would require it to have a separate semaphore on the
sending side.
2) to wait for the multifd threads to finish before cleaning up;
This happens because multifd_send_sync_main() blocks
ram_save_complete() from finishing until the semaphore is
posted. This is surprising and not documented.
Clarify the above situation by renaming 'sem_sync' to 'sem_done' and
making the #2 usage the main one. Post to 'sem_done' only when there's
no more pending_jobs.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
I remove the parts about the receiving side. I wasn't sure about them
and we don't need to mix the two. Potentially we need the sem_sync on
the recv to ensure all channels wait before becoming available to read
once again after a FLUSH.
---
migration/multifd.c | 76 ++++++++++++++++++++++++------------------
migration/multifd.h | 4 +--
migration/trace-events | 2 +-
3 files changed, 47 insertions(+), 35 deletions(-)
diff --git a/migration/multifd.c b/migration/multifd.c
index 92ae61a50f..94f4ae5ff8 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -533,7 +533,7 @@ void multifd_save_cleanup(void)
p->c = NULL;
qemu_mutex_destroy(&p->mutex);
qemu_sem_destroy(&p->sem);
- qemu_sem_destroy(&p->sem_sync);
+ qemu_sem_destroy(&p->sem_done);
g_free(p->name);
p->name = NULL;
multifd_pages_clear(p->pages);
@@ -591,6 +591,44 @@ int multifd_send_sync_main(QEMUFile *f)
}
}
+ for (i = 0; i < migrate_multifd_channels(); i++) {
+ MultiFDSendParams *p = &multifd_send_state->params[i];
+
+ trace_multifd_send_sync_main_signal(p->id);
+
+ qemu_mutex_lock(&p->mutex);
+
+ if (p->quit) {
+ error_report("%s: channel %d has already quit", __func__, i);
+ qemu_mutex_unlock(&p->mutex);
+ return -1;
+ }
+
+ p->packet_num = multifd_send_state->packet_num++;
+ p->flags |= MULTIFD_FLAG_SYNC;
+ p->pending_job++;
+ qemu_mutex_unlock(&p->mutex);
+ qemu_sem_post(&p->sem);
+ }
+
+ /* wait for all channels to be idle */
+ for (i = 0; i < migrate_multifd_channels(); i++) {
+ MultiFDSendParams *p = &multifd_send_state->params[i];
+
+ /*
+ * Even idle channels will wait for p->sem at the top of the
+ * loop.
+ */
+ qemu_sem_post(&p->sem);
+
+ trace_multifd_send_wait(migrate_multifd_channels() - i);
+ qemu_sem_wait(&p->sem_done);
+
+ qemu_mutex_lock(&p->mutex);
+ assert(!p->pending_job || p->quit);
+ qemu_mutex_unlock(&p->mutex);
+ }
+
/*
* When using zero-copy, it's necessary to flush the pages before any of
* the pages can be sent again, so we'll make sure the new version of the
@@ -601,34 +639,11 @@ int multifd_send_sync_main(QEMUFile *f)
* to be less frequent, e.g. only after we finished one whole scanning of
* all the dirty bitmaps.
*/
-
flush_zero_copy = migrate_zero_copy_send();
for (i = 0; i < migrate_multifd_channels(); i++) {
MultiFDSendParams *p = &multifd_send_state->params[i];
- trace_multifd_send_sync_main_signal(p->id);
-
- qemu_mutex_lock(&p->mutex);
-
- if (p->quit) {
- error_report("%s: channel %d has already quit", __func__, i);
- qemu_mutex_unlock(&p->mutex);
- return -1;
- }
-
- p->packet_num = multifd_send_state->packet_num++;
- p->flags |= MULTIFD_FLAG_SYNC;
- p->pending_job++;
- qemu_mutex_unlock(&p->mutex);
- qemu_sem_post(&p->sem);
- }
- for (i = 0; i < migrate_multifd_channels(); i++) {
- MultiFDSendParams *p = &multifd_send_state->params[i];
-
- trace_multifd_send_sync_main_wait(p->id);
- qemu_sem_wait(&p->sem_sync);
-
if (flush_zero_copy && p->c && (multifd_zero_copy_flush(p->c) < 0)) {
return -1;
}
@@ -728,12 +743,9 @@ static void *multifd_send_thread(void *opaque)
p->pending_job--;
qemu_mutex_unlock(&p->mutex);
- if (flags & MULTIFD_FLAG_SYNC) {
- qemu_sem_post(&p->sem_sync);
- }
} else {
qemu_mutex_unlock(&p->mutex);
- /* sometimes there are spurious wakeups */
+ qemu_sem_post(&p->sem_done);
}
}
@@ -749,7 +761,7 @@ out:
* who pay attention to me.
*/
if (ret != 0) {
- qemu_sem_post(&p->sem_sync);
+ qemu_sem_post(&p->sem_done);
}
qemu_mutex_lock(&p->mutex);
@@ -786,7 +798,7 @@ static void multifd_tls_outgoing_handshake(QIOTask *task,
* is not created, and then tell who pay attention to me.
*/
p->quit = true;
- qemu_sem_post(&p->sem_sync);
+ qemu_sem_post(&p->sem_done);
}
}
@@ -863,7 +875,7 @@ static void multifd_new_send_channel_cleanup(MultiFDSendParams *p,
{
migrate_set_error(migrate_get_current(), err);
/* Error happen, we need to tell who pay attention to me */
- qemu_sem_post(&p->sem_sync);
+ qemu_sem_post(&p->sem_done);
/*
* Although multifd_send_thread is not created, but main migration
* thread need to judge whether it is running, so we need to mark
@@ -915,7 +927,7 @@ int multifd_save_setup(Error **errp)
qemu_mutex_init(&p->mutex);
qemu_sem_init(&p->sem, 0);
- qemu_sem_init(&p->sem_sync, 0);
+ qemu_sem_init(&p->sem_done, 0);
p->quit = false;
p->pending_job = 0;
p->id = i;
diff --git a/migration/multifd.h b/migration/multifd.h
index a835643b48..71bd66974d 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -90,8 +90,8 @@ typedef struct {
/* sem where to wait for more work */
QemuSemaphore sem;
- /* syncs main thread and channels */
- QemuSemaphore sem_sync;
+ /* channel is done transmitting until more pages are queued */
+ QemuSemaphore sem_done;
/* this mutex protects the following parameters */
QemuMutex mutex;
diff --git a/migration/trace-events b/migration/trace-events
index ee9c8f4d63..3aef79a951 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -137,7 +137,7 @@ multifd_send(uint8_t id, uint64_t packet_num, uint32_t normal, uint32_t flags, u
multifd_send_error(uint8_t id) "channel %u"
multifd_send_sync_main(long packet_num) "packet num %ld"
multifd_send_sync_main_signal(uint8_t id) "channel %u"
-multifd_send_sync_main_wait(uint8_t id) "channel %u"
+multifd_send_wait(uint8_t n) "waiting for %u channels to finish sending"
multifd_send_terminate_threads(bool error) "error %d"
multifd_send_thread_end(uint8_t id, uint64_t packets, uint64_t normal_pages) "channel %u packets %" PRIu64 " normal pages %" PRIu64
multifd_send_thread_start(uint8_t id) "%u"
--
2.35.3
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [RFC PATCH v2 3/6] migration/multifd: Decouple control flow from the SYNC packet
2023-10-12 14:06 ` [RFC PATCH v2 3/6] migration/multifd: Decouple control flow from the SYNC packet Fabiano Rosas
@ 2023-10-19 10:28 ` Juan Quintela
2023-10-19 15:31 ` Peter Xu
0 siblings, 1 reply; 30+ messages in thread
From: Juan Quintela @ 2023-10-19 10:28 UTC (permalink / raw)
To: Fabiano Rosas; +Cc: qemu-devel, Peter Xu, Leonardo Bras, Elena Ufimtseva
Fabiano Rosas <farosas@suse.de> wrote:
> We currently use the 'sem_sync' semaphore on the sending side:
>
> 1) to know when the multifd_send_thread() has finished sending the
> MULTIFD_FLAG_SYNC packet;
>
> This is unnecessary. Multifd sends packets one by one and completion
> is already bound by the 'sem' semaphore. The SYNC packet has nothing
> special that would require it to have a separate semaphore on the
> sending side.
What migration basically does is:
sync_dirty_bitmap()
while (too_much_dirty_memory)
foreach(dirty_page)
send(dirty_page)
sync_dirty_bitmap()
I know, this is an over simplification, but it is enough to explain the
problem that this code tries to fix.
Once we have multifd, we can have several channels, each one
going through a different network connection. Yes, networks are a black
box and there are no guarantees about how packets arrive on different
sockets.
In one iteration, page 99 is dirty. We send it through channel 0.
We end the iteration and we synchronize the bitmap again.
We send the SYNC packet in both channels.
Page 99 is dirty again, and this time it gets sent through channel 1.
What we want, we want the communication to be:
Channel 0 migration thread Channel 1
(1) sent page 99
(2) sync bitmap
(3) sent page 99
And we want destination to make sure that it never gets packet with page
99 from channel 0 AFTER page 99 from channel 1.
Notice, this is something that it is highly improbable to happen, but it
_can_ happen (and zero copy increases the probability of it).
So we create this SYNC packet that does that:
foreach (channel)
create_job_to_send_sync() packet
foreach (channel)
wait_until_it_has_sent_the_sync_packet()
Notice that this is the main migration thread, it will not send new work
to any channel until it receives the sem_sync for every channel.
Now, how do we deal on the target:
foreach (channel)
wait(sem_sync)
foreach (channel)
send(sem_sync)
So, trying to do my best at ASCII art, what happens when we end a round
of memory iteration
MigrationThread(send) MigrationThread(recv) channel_n (send) channel_n(recv)
sync_bitmap()
foreach(channel)
create_job_with_SYNC
post(channel->sem)
wait(channel->sem)
write(SYNC)
post(channel->sem_sync)
foreach(channel)
wait(channel->sem_sync)
write(MULTIFD_FLUSH)
read(SYNC)
post(main->sem_sync)
wait(channel->sem_sync)
read(MULTIFD_FLUSH)
foreach(channel)
wait(main->sem_sync)
foreach(channel)
post(channel->sem_sync)
Interesting points:
1- We guarantee that packets inside the same channel are in order.
2- Migration send thread doesn't send a MULTIFD_FLUSH packet until every
channel has sent a SYNC packet
3- After reception of a SYNC packet. A channel:
a -> communicates to the main migration thread that it has received
it (post(main->sem_sync))
b -> it waits on (channel->sem_sync)
4- Main recv thread receives a MULTIFD_FLUSH
a -> waits for every channel to say that it has received a SYNC
packet
b -> communicates to every channel that they can continue.
Send channels can send new data after the main channel does a
write(MULTIFD_FLUSH). But reception channels will not read it until the
main recv thread is sure that every reception channel has received a
SYNC, so we are sure that (in the previous example) page 99 from thread
0 is already written.
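In code, the destination side of that is roughly (simplified, from
memory, see multifd_recv_sync_main()):

    /* main recv thread, when it reads MULTIFD_FLUSH */
    for (i = 0; i < migrate_multifd_channels(); i++) {
        /* wait until every recv channel has seen its SYNC packet */
        qemu_sem_wait(&multifd_recv_state->sem_sync);
    }
    for (i = 0; i < migrate_multifd_channels(); i++) {
        /* now let every recv channel continue reading */
        qemu_sem_post(&multifd_recv_state->params[i].sem_sync);
    }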
Is it clearer now?
And yes, after discussion I will convert this email into documentation.
> 2) to wait for the multifd threads to finish before cleaning up;
>
> This happens because multifd_send_sync_main() blocks
> ram_save_complete() from finishing until the semaphore is
> posted. This is surprising and not documented.
>
> Clarify the above situation by renaming 'sem_sync' to 'sem_done' and
> making the #2 usage the main one. Post to 'sem_sync' only when there's
> no more pending_jobs.
>
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
> I remove the parts about the receiving side. I wasn't sure about them
> and we don't need to mix the two. Potentially we need the sem_sync on
> the recv to ensure all channels wait before becoming available to read
> once again after a FLUSH.
I think this is not needed. It is the source that decides when it is
needed to wait for all the packets in the middle.
Later, Juan.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH v2 3/6] migration/multifd: Decouple control flow from the SYNC packet
2023-10-19 10:28 ` Juan Quintela
@ 2023-10-19 15:31 ` Peter Xu
0 siblings, 0 replies; 30+ messages in thread
From: Peter Xu @ 2023-10-19 15:31 UTC (permalink / raw)
To: Juan Quintela; +Cc: Fabiano Rosas, qemu-devel, Leonardo Bras, Elena Ufimtseva
On Thu, Oct 19, 2023 at 12:28:39PM +0200, Juan Quintela wrote:
> Fabiano Rosas <farosas@suse.de> wrote:
> > We currently use the 'sem_sync' semaphore on the sending side:
> >
> > 1) to know when the multifd_send_thread() has finished sending the
> > MULTIFD_FLAG_SYNC packet;
> >
> > This is unnecessary. Multifd sends packets one by one and completion
> > is already bound by the 'sem' semaphore. The SYNC packet has nothing
> > special that would require it to have a separate semaphore on the
> > sending side.
>
> What migration basically does is:
>
> sync_dirty_bitmap()
> while (too_much_dirty_memory)
> foreach(dirty_page)
> send(dirty_page)
> sync_dirty_bitmap()
>
> I know, this is an over simplification, but it is enough to explain the
> problem that this code tries to fix.
>
> Once we have multifd, we can have several channels, each one
> going through a different network connection. Yes, networks are a black
> box and there are no guarantees about how packets arrive on different
> sockets.
>
> In one iteration, page 99 is dirty. We send it through channel 0.
> We end the iteration and we synchronize the bitmap again.
> We send the SYNC packet in both channels.
> Page 99 is dirty again, and this time it gets sent through channel 1.
>
> What we want, we want the communication to be:
>
> Channel 0 migration thread Channel 1
>
> (1) sent page 99
> (2) sync bitmap
> (3) sent page 99
>
> And we want destination to make sure that it never gets packet with page
> 99 from channel 0 AFTER page 99 from channel 1.
>
> Notice, this is something that it is highly improbable to happen, but it
> _can_ happen (and zero copy increases the probability of it).
>
> So we create this SYNC packet that does that:
>
> foreach (channel)
> create_job_to_send_sync() packet
> foreach (channel)
> wait_until_it_has_sent_the_sync_packet()
>
> Notice that this is the main migration thread, it will not send new work
> to any channel until it receives the sem_sync for every channel.
>
> Now, how do we deal on the target:
>
> foreach (channel)
> wait(sem_sync)
> foreach (channel)
> send(sem_sync)
>
> So, trying to do my best at ASCII art, what happens when we end a round
> of memory iteration
>
> MigrationThread(send) MigrationThread(recv) channel_n (send) channel_n(recv)
>
> sync_bitmap()
> foreach(channel)
> create_job_with_SYNC
> post(channel->sem)
> wait(channel->sem)
> write(SYNC)
> post(channel->sem_sync)
> foreach(channel)
> wait(channel->sem_sync)
>
> write(MULTIFD_FLUSH)
> read(SYNC)
> post(main->sem_sync)
> wait(channel->sem_sync)
> read(MULTIFD_FLUSH)
> foreach(channel)
> wait(main->sem_sync)
> foreach(channel)
> post(channel->sem_sync)
Hmm, I think I missed the fact that the main thread also sends something
(in this case, MULTIFD_FLUSH), when I was raising the question in the other
thread on sync.
Now I think it should work indeed.
I'm not sure what is the case before this commit, though:
commit 294e5a4034e81b3d8db03b4e0f691386f20d6ed3
Author: Juan Quintela <quintela@redhat.com>
Date: Tue Jun 21 13:36:11 2022 +0200
multifd: Only flush once each full round of memory
>
> Interesting points:
>
> 1- We guarantee that packets inside the same channel are in order.
>
> 2- Migration send thread doesn't send a MULTIFD_FLUSH packet until every
> channel has sent a SYNC packet
IMHO this is not required? Main thread can send MULTIFD_FLUSH early too, I
think the important thing is dest recv threads will always sync with this
one, early or late. Then it's impossible that one new page appears earlier
than another old version (in another channel) because when reading the new
page dest threads must have digested the old.
>
> 3- After reception of a SYNC packet. A channel:
> a -> communicates to the main migration thread that it has received
> it (post(main->sem_sync))
> b -> it waits on (channel->sem_sync)
>
> 4- Main recv thread receives a MULTIFD_FLUSH
> a -> waits for every channel to say that it has received a SYNC
> packet
> b -> communicates to every channel that they can continue.
>
> Send channels can send new data after the main channel does a
> write(MULTIFD_FLUSH). But reception channels will not read it until the
> main recv thread is sure that every reception channel has received a
> SYNC, so we are sure that (in the previous example) page 99 from thread
> 0 is already written.
>
> Is it clearer now?
>
> And yes, after discussion I will convert this email into documentation.
Please do so, and I suggest we start to do more with docs/*.rst and not use
wiki pages unless necessary, then doc is with code.
We may consider creating docs/migration/ and this can belong in multifd.rst.
Thanks,
--
Peter Xu
^ permalink raw reply [flat|nested] 30+ messages in thread
* [RFC PATCH v2 4/6] migration/multifd: Extract sem_done waiting into a function
2023-10-12 14:06 [RFC PATCH v2 0/6] migration/multifd: Locking changes Fabiano Rosas
` (2 preceding siblings ...)
2023-10-12 14:06 ` [RFC PATCH v2 3/6] migration/multifd: Decouple control flow from the SYNC packet Fabiano Rosas
@ 2023-10-12 14:06 ` Fabiano Rosas
2023-10-12 14:06 ` [RFC PATCH v2 5/6] migration/multifd: Stop setting 'quit' outside of channels Fabiano Rosas
2023-10-12 14:06 ` [RFC PATCH v2 6/6] migration/multifd: Bring back the 'ready' semaphore Fabiano Rosas
5 siblings, 0 replies; 30+ messages in thread
From: Fabiano Rosas @ 2023-10-12 14:06 UTC (permalink / raw)
To: qemu-devel; +Cc: Juan Quintela, Peter Xu, Leonardo Bras, Elena Ufimtseva
This helps document the intent of the loop via the function name and
we can reuse this in the future.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
migration/multifd.c | 47 +++++++++++++++++++++++++++++----------------
1 file changed, 30 insertions(+), 17 deletions(-)
diff --git a/migration/multifd.c b/migration/multifd.c
index 94f4ae5ff8..4ffa67339c 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -576,6 +576,35 @@ static int multifd_zero_copy_flush(QIOChannel *c)
return ret;
}
+static void multifd_send_wait(void)
+{
+ int i;
+
+ /* wait for all channels to be idle */
+ for (i = 0; i < migrate_multifd_channels(); i++) {
+ MultiFDSendParams *p = &multifd_send_state->params[i];
+
+ /*
+ * Even idle channels will wait for p->sem at the top of the
+ * loop.
+ */
+ qemu_sem_post(&p->sem);
+
+ trace_multifd_send_wait(migrate_multifd_channels() - i);
+ qemu_sem_wait(&p->sem_done);
+
+ qemu_mutex_lock(&p->mutex);
+ assert(!p->pending_job || p->quit);
+ qemu_mutex_unlock(&p->mutex);
+ }
+
+ /*
+ * All channels went idle and have no more jobs. Unless we send
+ * them more work, we're good to allow any cleanup code to run at
+ * this point.
+ */
+}
+
int multifd_send_sync_main(QEMUFile *f)
{
int i;
@@ -611,23 +640,7 @@ int multifd_send_sync_main(QEMUFile *f)
qemu_sem_post(&p->sem);
}
- /* wait for all channels to be idle */
- for (i = 0; i < migrate_multifd_channels(); i++) {
- MultiFDSendParams *p = &multifd_send_state->params[i];
-
- /*
- * Even idle channels will wait for p->sem at the top of the
- * loop.
- */
- qemu_sem_post(&p->sem);
-
- trace_multifd_send_wait(migrate_multifd_channels() - i);
- qemu_sem_wait(&p->sem_done);
-
- qemu_mutex_lock(&p->mutex);
- assert(!p->pending_job || p->quit);
- qemu_mutex_unlock(&p->mutex);
- }
+ multifd_send_wait();
/*
* When using zero-copy, it's necessary to flush the pages before any of
--
2.35.3
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [RFC PATCH v2 5/6] migration/multifd: Stop setting 'quit' outside of channels
2023-10-12 14:06 [RFC PATCH v2 0/6] migration/multifd: Locking changes Fabiano Rosas
` (3 preceding siblings ...)
2023-10-12 14:06 ` [RFC PATCH v2 4/6] migration/multifd: Extract sem_done waiting into a function Fabiano Rosas
@ 2023-10-12 14:06 ` Fabiano Rosas
2023-10-19 10:35 ` Juan Quintela
2023-10-12 14:06 ` [RFC PATCH v2 6/6] migration/multifd: Bring back the 'ready' semaphore Fabiano Rosas
5 siblings, 1 reply; 30+ messages in thread
From: Fabiano Rosas @ 2023-10-12 14:06 UTC (permalink / raw)
To: qemu-devel; +Cc: Juan Quintela, Peter Xu, Leonardo Bras, Elena Ufimtseva
We shouldn't really be touching channel state from outside the
channels except for the transfer of pages.
Move the setting of p->quit into the channel. Keep posting the 'sem'
from multifd_send_terminate_threads() so any channel waiting on the
semaphore will unblock and see the 'exiting' flag and quit by itself.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
migration/multifd.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/migration/multifd.c b/migration/multifd.c
index 4ffa67339c..b7ba3fe0e6 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -497,13 +497,11 @@ static void multifd_send_terminate_threads(Error *err)
for (i = 0; i < migrate_multifd_channels(); i++) {
MultiFDSendParams *p = &multifd_send_state->params[i];
- qemu_mutex_lock(&p->mutex);
- p->quit = true;
+ /* kick the channel if it was waiting for work */
qemu_sem_post(&p->sem);
if (p->c) {
qio_channel_shutdown(p->c, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
}
- qemu_mutex_unlock(&p->mutex);
}
}
@@ -690,6 +688,9 @@ static void *multifd_send_thread(void *opaque)
qemu_sem_wait(&p->sem);
if (qatomic_read(&multifd_send_state->exiting)) {
+ qemu_mutex_lock(&p->mutex);
+ p->quit = true;
+ qemu_mutex_unlock(&p->mutex);
break;
}
qemu_mutex_lock(&p->mutex);
@@ -765,6 +766,11 @@ static void *multifd_send_thread(void *opaque)
out:
if (local_err) {
trace_multifd_send_error(p->id);
+
+ qemu_mutex_lock(&p->mutex);
+ p->quit = true;
+ qemu_mutex_unlock(&p->mutex);
+
multifd_send_terminate_threads(local_err);
error_free(local_err);
}
--
2.35.3
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [RFC PATCH v2 5/6] migration/multifd: Stop setting 'quit' outside of channels
2023-10-12 14:06 ` [RFC PATCH v2 5/6] migration/multifd: Stop setting 'quit' outside of channels Fabiano Rosas
@ 2023-10-19 10:35 ` Juan Quintela
0 siblings, 0 replies; 30+ messages in thread
From: Juan Quintela @ 2023-10-19 10:35 UTC (permalink / raw)
To: Fabiano Rosas; +Cc: qemu-devel, Peter Xu, Leonardo Bras, Elena Ufimtseva
Fabiano Rosas <farosas@suse.de> wrote:
> We shouldn't really be touching channel state from outside the
> channels except for the transfer of pages.
>
> Move the setting of p->quit into the channel. Keep posting the 'sem'
> from multifd_send_terminate_threads() so any channel waiting on the
> semaphore will unblock and see the 'exiting' flag and quit by itself.
>
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
> migration/multifd.c | 12 +++++++++---
> 1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 4ffa67339c..b7ba3fe0e6 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -497,13 +497,11 @@ static void multifd_send_terminate_threads(Error *err)
> for (i = 0; i < migrate_multifd_channels(); i++) {
> MultiFDSendParams *p = &multifd_send_state->params[i];
>
> - qemu_mutex_lock(&p->mutex);
> - p->quit = true;
> + /* kick the channel if it was waiting for work */
> qemu_sem_post(&p->sem);
> if (p->c) {
> qio_channel_shutdown(p->c, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
> }
> - qemu_mutex_unlock(&p->mutex);
Can we do two qio_channel_* operations at the same time on different
threads? The mutex is also protecting that. This is a question, I don't
know the answer.
> }
> }
>
> @@ -690,6 +688,9 @@ static void *multifd_send_thread(void *opaque)
> qemu_sem_wait(&p->sem);
>
> if (qatomic_read(&multifd_send_state->exiting)) {
> + qemu_mutex_lock(&p->mutex);
> + p->quit = true;
> + qemu_mutex_unlock(&p->mutex);
> break;
> }
> qemu_mutex_lock(&p->mutex);
> @@ -765,6 +766,11 @@ static void *multifd_send_thread(void *opaque)
If we are going this route, we can just put it here, just in one place.
> out:
> if (local_err) {
> trace_multifd_send_error(p->id);
> +
> + qemu_mutex_lock(&p->mutex);
> + p->quit = true;
> + qemu_mutex_unlock(&p->mutex);
> +
> multifd_send_terminate_threads(local_err);
> error_free(local_err);
> }
But now ->quit has basically the same meaning as ->running, so we
could probably merge both.
Later, Juan.
^ permalink raw reply [flat|nested] 30+ messages in thread
* [RFC PATCH v2 6/6] migration/multifd: Bring back the 'ready' semaphore
2023-10-12 14:06 [RFC PATCH v2 0/6] migration/multifd: Locking changes Fabiano Rosas
` (4 preceding siblings ...)
2023-10-12 14:06 ` [RFC PATCH v2 5/6] migration/multifd: Stop setting 'quit' outside of channels Fabiano Rosas
@ 2023-10-12 14:06 ` Fabiano Rosas
2023-10-19 10:43 ` Juan Quintela
5 siblings, 1 reply; 30+ messages in thread
From: Fabiano Rosas @ 2023-10-12 14:06 UTC (permalink / raw)
To: qemu-devel; +Cc: Juan Quintela, Peter Xu, Leonardo Bras, Elena Ufimtseva
Bring back the 'ready' semaphore, but this time make it per-channel,
so that we can do true lockstep switching of MultiFDPages without
taking the channel lock.
Drop the channel lock as it now becomes useless. The rules for
accessing the MultiFDSendParams are:
- between sem and sem_done/ready if we're the channel
qemu_sem_post(&p->ready);
qemu_sem_wait(&p->sem);
<owns p>
qemu_sem_post(&p->sem_done);
- between ready and sem if we're not the channel
qemu_sem_wait(&p->ready);
<owns p>
qemu_sem_post(&p->sem);
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
One issue I can see with this is that we might now have to wait at
multifd_send_pages() if a channel takes too long to send its packet
and come back to p->ready. We would need to find a way of ignoring a
slow channel and skipping ahead to the next one in line.
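One way that could perhaps be done - untested sketch, and assuming
something like qemu_sem_timedwait() is acceptable here - would be to poll
'ready' instead of blocking on it:

    for (i = next_channel;; i = (i + 1) % migrate_multifd_channels()) {
        p = &multifd_send_state->params[i];

        /* skip channels that are still busy sending */
        if (qemu_sem_timedwait(&p->ready, 0) != 0) {
            continue;
        }
        ...
    }

That busy-waits when every channel is busy, though, so it would need some
fallback, e.g. blocking on one channel after a full unsuccessful pass.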
---
migration/multifd.c | 45 +++++++++++++--------------------------------
migration/multifd.h | 5 ++---
2 files changed, 15 insertions(+), 35 deletions(-)
diff --git a/migration/multifd.c b/migration/multifd.c
index b7ba3fe0e6..7fa7bc33fd 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -410,10 +410,10 @@ static int multifd_send_pages(QEMUFile *f)
for (i = next_channel;; i = (i + 1) % migrate_multifd_channels()) {
p = &multifd_send_state->params[i];
- qemu_mutex_lock(&p->mutex);
+ qemu_sem_wait(&p->ready);
+
if (p->quit) {
error_report("%s: channel %d has already quit!", __func__, i);
- qemu_mutex_unlock(&p->mutex);
return -1;
}
if (!p->pending_job) {
@@ -421,7 +421,6 @@ static int multifd_send_pages(QEMUFile *f)
next_channel = (i + 1) % migrate_multifd_channels();
break;
}
- qemu_mutex_unlock(&p->mutex);
}
assert(!p->pages->num);
assert(!p->pages->block);
@@ -429,7 +428,6 @@ static int multifd_send_pages(QEMUFile *f)
p->packet_num = multifd_send_state->packet_num++;
multifd_send_state->pages = p->pages;
p->pages = pages;
- qemu_mutex_unlock(&p->mutex);
qemu_sem_post(&p->sem);
return 1;
@@ -529,9 +527,9 @@ void multifd_save_cleanup(void)
}
socket_send_channel_destroy(p->c);
p->c = NULL;
- qemu_mutex_destroy(&p->mutex);
qemu_sem_destroy(&p->sem);
qemu_sem_destroy(&p->sem_done);
+ qemu_sem_destroy(&p->ready);
g_free(p->name);
p->name = NULL;
multifd_pages_clear(p->pages);
@@ -586,14 +584,12 @@ static void multifd_send_wait(void)
* Even idle channels will wait for p->sem at the top of the
* loop.
*/
+ qemu_sem_wait(&p->ready);
qemu_sem_post(&p->sem);
trace_multifd_send_wait(migrate_multifd_channels() - i);
qemu_sem_wait(&p->sem_done);
-
- qemu_mutex_lock(&p->mutex);
assert(!p->pending_job || p->quit);
- qemu_mutex_unlock(&p->mutex);
}
/*
@@ -621,20 +617,17 @@ int multifd_send_sync_main(QEMUFile *f)
for (i = 0; i < migrate_multifd_channels(); i++) {
MultiFDSendParams *p = &multifd_send_state->params[i];
+ qemu_sem_wait(&p->ready);
trace_multifd_send_sync_main_signal(p->id);
- qemu_mutex_lock(&p->mutex);
-
if (p->quit) {
error_report("%s: channel %d has already quit", __func__, i);
- qemu_mutex_unlock(&p->mutex);
return -1;
}
p->packet_num = multifd_send_state->packet_num++;
p->flags |= MULTIFD_FLAG_SYNC;
p->pending_job++;
- qemu_mutex_unlock(&p->mutex);
qemu_sem_post(&p->sem);
}
@@ -685,15 +678,14 @@ static void *multifd_send_thread(void *opaque)
p->num_packets = 1;
while (true) {
+ qemu_sem_post(&p->ready);
qemu_sem_wait(&p->sem);
if (qatomic_read(&multifd_send_state->exiting)) {
- qemu_mutex_lock(&p->mutex);
p->quit = true;
- qemu_mutex_unlock(&p->mutex);
+ qemu_sem_post(&p->sem_done);
break;
}
- qemu_mutex_lock(&p->mutex);
if (p->pending_job) {
uint64_t packet_num = p->packet_num;
@@ -714,7 +706,6 @@ static void *multifd_send_thread(void *opaque)
if (p->normal_num) {
ret = multifd_send_state->ops->send_prepare(p, &local_err);
if (ret != 0) {
- qemu_mutex_unlock(&p->mutex);
break;
}
}
@@ -725,7 +716,6 @@ static void *multifd_send_thread(void *opaque)
p->total_normal_pages += p->normal_num;
p->pages->num = 0;
p->pages->block = NULL;
- qemu_mutex_unlock(&p->mutex);
trace_multifd_send(p->id, packet_num, p->normal_num, flags,
p->next_packet_size);
@@ -753,12 +743,9 @@ static void *multifd_send_thread(void *opaque)
stat64_add(&mig_stats.multifd_bytes, p->next_packet_size);
stat64_add(&mig_stats.transferred, p->next_packet_size);
- qemu_mutex_lock(&p->mutex);
p->pending_job--;
- qemu_mutex_unlock(&p->mutex);
} else {
- qemu_mutex_unlock(&p->mutex);
qemu_sem_post(&p->sem_done);
}
}
@@ -766,11 +753,8 @@ static void *multifd_send_thread(void *opaque)
out:
if (local_err) {
trace_multifd_send_error(p->id);
-
- qemu_mutex_lock(&p->mutex);
p->quit = true;
- qemu_mutex_unlock(&p->mutex);
-
+ qemu_sem_post(&p->ready);
multifd_send_terminate_threads(local_err);
error_free(local_err);
}
@@ -780,12 +764,10 @@ out:
* who pay attention to me.
*/
if (ret != 0) {
- qemu_sem_post(&p->sem_done);
+ p->quit = true;
+ qemu_sem_post(&p->ready);
}
-
- qemu_mutex_lock(&p->mutex);
p->running = false;
- qemu_mutex_unlock(&p->mutex);
rcu_unregister_thread();
migration_threads_remove(thread);
@@ -817,7 +799,7 @@ static void multifd_tls_outgoing_handshake(QIOTask *task,
* is not created, and then tell who pay attention to me.
*/
p->quit = true;
- qemu_sem_post(&p->sem_done);
+ qemu_sem_post(&p->ready);
}
}
@@ -893,14 +875,13 @@ static void multifd_new_send_channel_cleanup(MultiFDSendParams *p,
QIOChannel *ioc, Error *err)
{
migrate_set_error(migrate_get_current(), err);
- /* Error happen, we need to tell who pay attention to me */
- qemu_sem_post(&p->sem_done);
/*
* Although multifd_send_thread is not created, but main migration
* thread need to judge whether it is running, so we need to mark
* its status.
*/
p->quit = true;
+ qemu_sem_post(&p->ready);
object_unref(OBJECT(ioc));
error_free(err);
}
@@ -944,9 +925,9 @@ int multifd_save_setup(Error **errp)
for (i = 0; i < thread_count; i++) {
MultiFDSendParams *p = &multifd_send_state->params[i];
- qemu_mutex_init(&p->mutex);
qemu_sem_init(&p->sem, 0);
qemu_sem_init(&p->sem_done, 0);
+ qemu_sem_init(&p->ready, 0);
p->quit = false;
p->pending_job = 0;
p->id = i;
diff --git a/migration/multifd.h b/migration/multifd.h
index 71bd66974d..6bb10b07aa 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -93,8 +93,8 @@ typedef struct {
/* channel is done transmitting until more pages are queued */
QemuSemaphore sem_done;
- /* this mutex protects the following parameters */
- QemuMutex mutex;
+ QemuSemaphore ready;
+
/* is this channel thread running */
bool running;
/* should this thread finish */
@@ -209,4 +209,3 @@ typedef struct {
void multifd_register_ops(int method, MultiFDMethods *ops);
#endif
-
--
2.35.3
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [RFC PATCH v2 6/6] migration/multifd: Bring back the 'ready' semaphore
2023-10-12 14:06 ` [RFC PATCH v2 6/6] migration/multifd: Bring back the 'ready' semaphore Fabiano Rosas
@ 2023-10-19 10:43 ` Juan Quintela
0 siblings, 0 replies; 30+ messages in thread
From: Juan Quintela @ 2023-10-19 10:43 UTC (permalink / raw)
To: Fabiano Rosas; +Cc: qemu-devel, Peter Xu, Leonardo Bras, Elena Ufimtseva
Fabiano Rosas <farosas@suse.de> wrote:
> Bring back the 'ready' semaphore, but this time make it per-channel,
> so that we can do true lockstep switching of MultiFDPages without
> taking the channel lock.
>
> Drop the channel lock as it now becomes useless. The rules for
> accessing the MultiFDSendParams are:
>
> - between sem and sem_done/ready if we're the channel
>
> qemu_sem_post(&p->ready);
> qemu_sem_wait(&p->sem);
> <owns p>
> qemu_sem_post(&p->sem_done);
>
> - between ready and sem if we're not the channel
>
> qemu_sem_wait(&p->ready);
> <owns p>
> qemu_sem_post(&p->sem);
>
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
> One issue I can see with this is that we might now have to wait at
> multifd_send_pages() if a channel takes too long to send it's packet
> and come back to p->ready. We would need to find a way of ignoring a
> slow channel and skipping ahead to the next one in line.
See my 1st patch in the series. That is exactly what the channels_ready
sem does. If your network is faster than your multifd channels can
write, the main thread never waits on channels_ready, because it is always
positive.
In case there are no channels ready, we wait on the sem instead of
busy waiting.
And searching for the channel that is ready is really fast:
static int multifd_send_pages(QEMUFile *f)
{
[...]
qemu_sem_wait(&multifd_send_state->channels_ready);
// taking a sem that is positive is basically 1 instruction
next_channel %= migrate_multifd_channels();
// we do crude load balancing here.
for (i = next_channel;; i = (i + 1) % migrate_multifd_channels()) {
qemu_mutex_lock(&p->mutex); // 1 lock instruction
if (p->quit) { // 1 check
....
}
if (!p->pending_job) { // 1 check
p->pending_job++;
next_channel = (i + 1) % migrate_multifd_channels();
break;
}
qemu_mutex_unlock(&p->mutex); // 1 unlock (another instruction)
}
....
So checking a channel is:
- taking a mutex that is almost always free.
- 2 checks
- unlock the mutex
So I can't really see the need to optimize this.
Notice that your scheme only has real advantages if:
- all channels are busy
- the channel that becomes ready is just the previous one to
next_channel or so
And in that case, I think that the solution is to have more channels or
faster networking.
Later, Juan.
^ permalink raw reply [flat|nested] 30+ messages in thread