* [PATCH 0/2] migration/multifd: Fix rb->receivedmap cleanup race
@ 2024-09-13 22:05 Fabiano Rosas
0 siblings, 0 replies; 9+ messages in thread
From: Fabiano Rosas @ 2024-09-13 22:05 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Peter Maydell
This fixes the crash we've been seing recently in migration-test. The
first patch is a cleanup to have only one place calling
qemu_loadvm_state_cleanup() and the second patch reorders the cleanup
calls to make multifd_recv_cleanup() run first and stop the recv
threads.
CI run: https://gitlab.com/farosas/qemu/-/pipelines/1453038652
Fabiano Rosas (2):
migration/savevm: Remove extra load cleanup calls
migration/multifd: Fix rb->receivedmap cleanup race
migration/migration.c | 1 +
migration/migration.h | 1 -
migration/savevm.c | 11 -----------
3 files changed, 1 insertion(+), 12 deletions(-)
--
2.35.3
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 0/2] migration/multifd: Fix rb->receivedmap cleanup race
@ 2024-09-17 18:58 Fabiano Rosas
2024-09-17 18:58 ` [PATCH 1/2] migration/savevm: Remove extra load cleanup calls Fabiano Rosas
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Fabiano Rosas @ 2024-09-17 18:58 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Peter Maydell
v2: Keep skipping the cpu_synchronize_all_post_init() call if the
postcopy listen thread is live. Don't copy stable on the first patch.
CI run: https://gitlab.com/farosas/qemu/-/pipelines/1457418838
====
v1:
https://lore.kernel.org/r/20240913220542.18305-1-farosas@suse.de
This fixes the crash we've been seing recently in migration-test. The
first patch is a cleanup to have only one place calling
qemu_loadvm_state_cleanup() and the second patch reorders the cleanup
calls to make multifd_recv_cleanup() run first and stop the recv
threads.
Fabiano Rosas (2):
migration/savevm: Remove extra load cleanup calls
migration/multifd: Fix rb->receivedmap cleanup race
migration/migration.c | 1 +
migration/savevm.c | 8 ++++----
2 files changed, 5 insertions(+), 4 deletions(-)
--
2.35.3
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 1/2] migration/savevm: Remove extra load cleanup calls
2024-09-17 18:58 [PATCH 0/2] migration/multifd: Fix rb->receivedmap cleanup race Fabiano Rosas
@ 2024-09-17 18:58 ` Fabiano Rosas
2024-09-17 19:16 ` Peter Xu
2024-09-17 18:58 ` [PATCH 2/2] migration/multifd: Fix rb->receivedmap cleanup race Fabiano Rosas
2024-09-17 20:01 ` [PATCH 0/2] " Peter Xu
2 siblings, 1 reply; 9+ messages in thread
From: Fabiano Rosas @ 2024-09-17 18:58 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Peter Maydell
There are two qemu_loadvm_state_cleanup() calls that were introduced
when qemu_loadvm_state_setup() was still called before loading the
configuration section, so there was state to be cleaned up if the
header checks failed.
However, commit 9e14b84908 ("migration/savevm: load_header before
load_setup") has moved that configuration section part to
qemu_loadvm_state_header() which now happens before
qemu_loadvm_state_setup().
Remove the cleanup calls that are now misplaced.
Fixes: 9e14b84908 ("migration/savevm: load_header before load_setup")
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
migration/savevm.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/migration/savevm.c b/migration/savevm.c
index d500eae979..d0759694fd 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2732,13 +2732,11 @@ static int qemu_loadvm_state_header(QEMUFile *f)
if (migrate_get_current()->send_configuration) {
if (qemu_get_byte(f) != QEMU_VM_CONFIGURATION) {
error_report("Configuration section missing");
- qemu_loadvm_state_cleanup();
return -EINVAL;
}
ret = vmstate_load_state(f, &vmstate_configuration, &savevm_state, 0);
if (ret) {
- qemu_loadvm_state_cleanup();
return ret;
}
}
--
2.35.3
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 2/2] migration/multifd: Fix rb->receivedmap cleanup race
2024-09-17 18:58 [PATCH 0/2] migration/multifd: Fix rb->receivedmap cleanup race Fabiano Rosas
2024-09-17 18:58 ` [PATCH 1/2] migration/savevm: Remove extra load cleanup calls Fabiano Rosas
@ 2024-09-17 18:58 ` Fabiano Rosas
2024-09-17 19:20 ` Peter Xu
2024-09-17 20:01 ` [PATCH 0/2] " Peter Xu
2 siblings, 1 reply; 9+ messages in thread
From: Fabiano Rosas @ 2024-09-17 18:58 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Peter Maydell, qemu-stable
Fix a segmentation fault in multifd when rb->receivedmap is cleared
too early.
After commit 5ef7e26bdb ("migration/multifd: solve zero page causing
multiple page faults"), multifd started using the rb->receivedmap
bitmap, which belongs to ram.c and is initialized and *freed* from the
ram SaveVMHandlers.
Multifd threads are live until migration_incoming_state_destroy(),
which is called after qemu_loadvm_state_cleanup(), leading to a crash
when accessing rb->receivedmap.
process_incoming_migration_co() ...
qemu_loadvm_state() multifd_nocomp_recv()
qemu_loadvm_state_cleanup() ramblock_recv_bitmap_set_offset()
rb->receivedmap = NULL set_bit_atomic(..., rb->receivedmap)
...
migration_incoming_state_destroy()
multifd_recv_cleanup()
multifd_recv_terminate_threads(NULL)
Move the loadvm cleanup into migration_incoming_state_destroy(), after
multifd_recv_cleanup() to ensure multifd threads have already exited
when rb->receivedmap is cleared.
Adjust the postcopy listen thread comment to indicate that we still
want to skip the cpu synchronization.
CC: qemu-stable@nongnu.org
Fixes: 5ef7e26bdb ("migration/multifd: solve zero page causing multiple page faults")
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
migration/migration.c | 1 +
migration/savevm.c | 6 ++++--
2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 3dea06d577..b190a574b1 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -378,6 +378,7 @@ void migration_incoming_state_destroy(void)
struct MigrationIncomingState *mis = migration_incoming_get_current();
multifd_recv_cleanup();
+ qemu_loadvm_state_cleanup();
if (mis->to_src_file) {
/* Tell source that we are done */
diff --git a/migration/savevm.c b/migration/savevm.c
index d0759694fd..7e1e27182a 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2979,7 +2979,10 @@ int qemu_loadvm_state(QEMUFile *f)
trace_qemu_loadvm_state_post_main(ret);
if (mis->have_listen_thread) {
- /* Listen thread still going, can't clean up yet */
+ /*
+ * Postcopy listen thread still going, don't synchronize the
+ * cpus yet.
+ */
return ret;
}
@@ -3022,7 +3025,6 @@ int qemu_loadvm_state(QEMUFile *f)
}
}
- qemu_loadvm_state_cleanup();
cpu_synchronize_all_post_init();
return ret;
--
2.35.3
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] migration/savevm: Remove extra load cleanup calls
2024-09-17 18:58 ` [PATCH 1/2] migration/savevm: Remove extra load cleanup calls Fabiano Rosas
@ 2024-09-17 19:16 ` Peter Xu
2024-09-17 19:20 ` Fabiano Rosas
0 siblings, 1 reply; 9+ messages in thread
From: Peter Xu @ 2024-09-17 19:16 UTC (permalink / raw)
To: Fabiano Rosas; +Cc: qemu-devel, Peter Maydell
On Tue, Sep 17, 2024 at 03:58:01PM -0300, Fabiano Rosas wrote:
> There are two qemu_loadvm_state_cleanup() calls that were introduced
> when qemu_loadvm_state_setup() was still called before loading the
> configuration section, so there was state to be cleaned up if the
> header checks failed.
>
> However, commit 9e14b84908 ("migration/savevm: load_header before
> load_setup") has moved that configuration section part to
> qemu_loadvm_state_header() which now happens before
> qemu_loadvm_state_setup().
>
> Remove the cleanup calls that are now misplaced.
>
> Fixes: 9e14b84908 ("migration/savevm: load_header before load_setup")
Considering it's a cleanup, do you mind if I further remove this Fixes but
just mention it in commit message (so as to make Michael's life easier when
backport)?
> Reviewed-by: Peter Xu <peterx@redhat.com>
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
> migration/savevm.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/migration/savevm.c b/migration/savevm.c
> index d500eae979..d0759694fd 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -2732,13 +2732,11 @@ static int qemu_loadvm_state_header(QEMUFile *f)
> if (migrate_get_current()->send_configuration) {
> if (qemu_get_byte(f) != QEMU_VM_CONFIGURATION) {
> error_report("Configuration section missing");
> - qemu_loadvm_state_cleanup();
> return -EINVAL;
> }
> ret = vmstate_load_state(f, &vmstate_configuration, &savevm_state, 0);
>
> if (ret) {
> - qemu_loadvm_state_cleanup();
> return ret;
> }
> }
> --
> 2.35.3
>
--
Peter Xu
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] migration/multifd: Fix rb->receivedmap cleanup race
2024-09-17 18:58 ` [PATCH 2/2] migration/multifd: Fix rb->receivedmap cleanup race Fabiano Rosas
@ 2024-09-17 19:20 ` Peter Xu
2024-09-17 19:29 ` Fabiano Rosas
0 siblings, 1 reply; 9+ messages in thread
From: Peter Xu @ 2024-09-17 19:20 UTC (permalink / raw)
To: Fabiano Rosas; +Cc: qemu-devel, Peter Maydell, qemu-stable
On Tue, Sep 17, 2024 at 03:58:02PM -0300, Fabiano Rosas wrote:
> Fix a segmentation fault in multifd when rb->receivedmap is cleared
> too early.
>
> After commit 5ef7e26bdb ("migration/multifd: solve zero page causing
> multiple page faults"), multifd started using the rb->receivedmap
> bitmap, which belongs to ram.c and is initialized and *freed* from the
> ram SaveVMHandlers.
>
> Multifd threads are live until migration_incoming_state_destroy(),
> which is called after qemu_loadvm_state_cleanup(), leading to a crash
> when accessing rb->receivedmap.
>
> process_incoming_migration_co() ...
> qemu_loadvm_state() multifd_nocomp_recv()
> qemu_loadvm_state_cleanup() ramblock_recv_bitmap_set_offset()
> rb->receivedmap = NULL set_bit_atomic(..., rb->receivedmap)
> ...
> migration_incoming_state_destroy()
> multifd_recv_cleanup()
> multifd_recv_terminate_threads(NULL)
>
> Move the loadvm cleanup into migration_incoming_state_destroy(), after
> multifd_recv_cleanup() to ensure multifd threads have already exited
> when rb->receivedmap is cleared.
>
> Adjust the postcopy listen thread comment to indicate that we still
> want to skip the cpu synchronization.
>
> CC: qemu-stable@nongnu.org
> Fixes: 5ef7e26bdb ("migration/multifd: solve zero page causing multiple page faults")
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
One trivial question below..
> ---
> migration/migration.c | 1 +
> migration/savevm.c | 6 ++++--
> 2 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 3dea06d577..b190a574b1 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -378,6 +378,7 @@ void migration_incoming_state_destroy(void)
> struct MigrationIncomingState *mis = migration_incoming_get_current();
>
> multifd_recv_cleanup();
Would you mind I add a comment squashed here when queue?
/*
* RAM state cleanup needs to happen after multifd cleanup, because
* multifd threads can use some of its states (receivedmap).
*/
> + qemu_loadvm_state_cleanup();
>
> if (mis->to_src_file) {
> /* Tell source that we are done */
> diff --git a/migration/savevm.c b/migration/savevm.c
> index d0759694fd..7e1e27182a 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -2979,7 +2979,10 @@ int qemu_loadvm_state(QEMUFile *f)
> trace_qemu_loadvm_state_post_main(ret);
>
> if (mis->have_listen_thread) {
> - /* Listen thread still going, can't clean up yet */
> + /*
> + * Postcopy listen thread still going, don't synchronize the
> + * cpus yet.
> + */
> return ret;
> }
>
> @@ -3022,7 +3025,6 @@ int qemu_loadvm_state(QEMUFile *f)
> }
> }
>
> - qemu_loadvm_state_cleanup();
> cpu_synchronize_all_post_init();
>
> return ret;
> --
> 2.35.3
>
--
Peter Xu
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] migration/savevm: Remove extra load cleanup calls
2024-09-17 19:16 ` Peter Xu
@ 2024-09-17 19:20 ` Fabiano Rosas
0 siblings, 0 replies; 9+ messages in thread
From: Fabiano Rosas @ 2024-09-17 19:20 UTC (permalink / raw)
To: Peter Xu; +Cc: qemu-devel, Peter Maydell
Peter Xu <peterx@redhat.com> writes:
> On Tue, Sep 17, 2024 at 03:58:01PM -0300, Fabiano Rosas wrote:
>> There are two qemu_loadvm_state_cleanup() calls that were introduced
>> when qemu_loadvm_state_setup() was still called before loading the
>> configuration section, so there was state to be cleaned up if the
>> header checks failed.
>>
>> However, commit 9e14b84908 ("migration/savevm: load_header before
>> load_setup") has moved that configuration section part to
>> qemu_loadvm_state_header() which now happens before
>> qemu_loadvm_state_setup().
>>
>> Remove the cleanup calls that are now misplaced.
>>
>> Fixes: 9e14b84908 ("migration/savevm: load_header before load_setup")
>
> Considering it's a cleanup, do you mind if I further remove this Fixes but
> just mention it in commit message (so as to make Michael's life easier when
> backport)?
Sure, go ahead.
>
>> Reviewed-by: Peter Xu <peterx@redhat.com>
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> ---
>> migration/savevm.c | 2 --
>> 1 file changed, 2 deletions(-)
>>
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index d500eae979..d0759694fd 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -2732,13 +2732,11 @@ static int qemu_loadvm_state_header(QEMUFile *f)
>> if (migrate_get_current()->send_configuration) {
>> if (qemu_get_byte(f) != QEMU_VM_CONFIGURATION) {
>> error_report("Configuration section missing");
>> - qemu_loadvm_state_cleanup();
>> return -EINVAL;
>> }
>> ret = vmstate_load_state(f, &vmstate_configuration, &savevm_state, 0);
>>
>> if (ret) {
>> - qemu_loadvm_state_cleanup();
>> return ret;
>> }
>> }
>> --
>> 2.35.3
>>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] migration/multifd: Fix rb->receivedmap cleanup race
2024-09-17 19:20 ` Peter Xu
@ 2024-09-17 19:29 ` Fabiano Rosas
0 siblings, 0 replies; 9+ messages in thread
From: Fabiano Rosas @ 2024-09-17 19:29 UTC (permalink / raw)
To: Peter Xu; +Cc: qemu-devel, Peter Maydell, qemu-stable
Peter Xu <peterx@redhat.com> writes:
> On Tue, Sep 17, 2024 at 03:58:02PM -0300, Fabiano Rosas wrote:
>> Fix a segmentation fault in multifd when rb->receivedmap is cleared
>> too early.
>>
>> After commit 5ef7e26bdb ("migration/multifd: solve zero page causing
>> multiple page faults"), multifd started using the rb->receivedmap
>> bitmap, which belongs to ram.c and is initialized and *freed* from the
>> ram SaveVMHandlers.
>>
>> Multifd threads are live until migration_incoming_state_destroy(),
>> which is called after qemu_loadvm_state_cleanup(), leading to a crash
>> when accessing rb->receivedmap.
>>
>> process_incoming_migration_co() ...
>> qemu_loadvm_state() multifd_nocomp_recv()
>> qemu_loadvm_state_cleanup() ramblock_recv_bitmap_set_offset()
>> rb->receivedmap = NULL set_bit_atomic(..., rb->receivedmap)
>> ...
>> migration_incoming_state_destroy()
>> multifd_recv_cleanup()
>> multifd_recv_terminate_threads(NULL)
>>
>> Move the loadvm cleanup into migration_incoming_state_destroy(), after
>> multifd_recv_cleanup() to ensure multifd threads have already exited
>> when rb->receivedmap is cleared.
>>
>> Adjust the postcopy listen thread comment to indicate that we still
>> want to skip the cpu synchronization.
>>
>> CC: qemu-stable@nongnu.org
>> Fixes: 5ef7e26bdb ("migration/multifd: solve zero page causing multiple page faults")
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>
> Reviewed-by: Peter Xu <peterx@redhat.com>
>
> One trivial question below..
>
>> ---
>> migration/migration.c | 1 +
>> migration/savevm.c | 6 ++++--
>> 2 files changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 3dea06d577..b190a574b1 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -378,6 +378,7 @@ void migration_incoming_state_destroy(void)
>> struct MigrationIncomingState *mis = migration_incoming_get_current();
>>
>> multifd_recv_cleanup();
>
> Would you mind I add a comment squashed here when queue?
>
> /*
> * RAM state cleanup needs to happen after multifd cleanup, because
> * multifd threads can use some of its states (receivedmap).
> */
Yeah, that's ok.
>
>> + qemu_loadvm_state_cleanup();
>>
>> if (mis->to_src_file) {
>> /* Tell source that we are done */
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index d0759694fd..7e1e27182a 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -2979,7 +2979,10 @@ int qemu_loadvm_state(QEMUFile *f)
>> trace_qemu_loadvm_state_post_main(ret);
>>
>> if (mis->have_listen_thread) {
>> - /* Listen thread still going, can't clean up yet */
>> + /*
>> + * Postcopy listen thread still going, don't synchronize the
>> + * cpus yet.
>> + */
>> return ret;
>> }
>>
>> @@ -3022,7 +3025,6 @@ int qemu_loadvm_state(QEMUFile *f)
>> }
>> }
>>
>> - qemu_loadvm_state_cleanup();
>> cpu_synchronize_all_post_init();
>>
>> return ret;
>> --
>> 2.35.3
>>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/2] migration/multifd: Fix rb->receivedmap cleanup race
2024-09-17 18:58 [PATCH 0/2] migration/multifd: Fix rb->receivedmap cleanup race Fabiano Rosas
2024-09-17 18:58 ` [PATCH 1/2] migration/savevm: Remove extra load cleanup calls Fabiano Rosas
2024-09-17 18:58 ` [PATCH 2/2] migration/multifd: Fix rb->receivedmap cleanup race Fabiano Rosas
@ 2024-09-17 20:01 ` Peter Xu
2 siblings, 0 replies; 9+ messages in thread
From: Peter Xu @ 2024-09-17 20:01 UTC (permalink / raw)
To: Fabiano Rosas; +Cc: qemu-devel, Peter Maydell
On Tue, Sep 17, 2024 at 03:58:00PM -0300, Fabiano Rosas wrote:
> v2: Keep skipping the cpu_synchronize_all_post_init() call if the
> postcopy listen thread is live. Don't copy stable on the first patch.
>
> CI run: https://gitlab.com/farosas/qemu/-/pipelines/1457418838
> ====
> v1:
> https://lore.kernel.org/r/20240913220542.18305-1-farosas@suse.de
>
> This fixes the crash we've been seing recently in migration-test. The
> first patch is a cleanup to have only one place calling
> qemu_loadvm_state_cleanup() and the second patch reorders the cleanup
> calls to make multifd_recv_cleanup() run first and stop the recv
> threads.
>
> Fabiano Rosas (2):
> migration/savevm: Remove extra load cleanup calls
> migration/multifd: Fix rb->receivedmap cleanup race
queued.
Let's see whether this can quiesce all multifd cancel test failures.. If
not, we can still follow that up.
--
Peter Xu
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2024-09-17 20:01 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-17 18:58 [PATCH 0/2] migration/multifd: Fix rb->receivedmap cleanup race Fabiano Rosas
2024-09-17 18:58 ` [PATCH 1/2] migration/savevm: Remove extra load cleanup calls Fabiano Rosas
2024-09-17 19:16 ` Peter Xu
2024-09-17 19:20 ` Fabiano Rosas
2024-09-17 18:58 ` [PATCH 2/2] migration/multifd: Fix rb->receivedmap cleanup race Fabiano Rosas
2024-09-17 19:20 ` Peter Xu
2024-09-17 19:29 ` Fabiano Rosas
2024-09-17 20:01 ` [PATCH 0/2] " Peter Xu
-- strict thread matches above, loose matches on Subject: below --
2024-09-13 22:05 Fabiano Rosas
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).