* [PATCH 0/3] migration: Fix multifd cancel test
@ 2023-06-06 14:45 Fabiano Rosas
2023-06-06 14:45 ` [PATCH 1/3] migration/multifd: Rename threadinfo.c functions Fabiano Rosas
` (2 more replies)
0 siblings, 3 replies; 21+ messages in thread
From: Fabiano Rosas @ 2023-06-06 14:45 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Maydell, Juan Quintela, Jiang Jiacheng, Peter Xu,
Leonardo Bras
When doing cleanup of the multifd send threads we're calling
QLIST_REMOVE concurrently on the migration_threads list. This seems to
be the source of the crashes we've seen on the
multifd/tcp/plain/cancel tests.
I'm running the test in a loop and after a few dozen iterations I see
the crash in dmesg.
QTEST_QEMU_BINARY=./qemu-system-x86_64 \
QEMU_TEST_FLAKY_TESTS=1 \
./tests/qtest/migration-test -p /x86_64/migration/multifd/tcp/plain/cancel
multifdsend_10[11382]: segfault at 18 ip 0000564b77de1e25 sp
00007fdf767fb610 error 6 in qemu-system-x86_64[564b777b4000+e1c000]
Code: ec 10 48 89 7d f8 48 83 7d f8 00 74 58 48 8b 45 f8 48 8b 40 10
48 85 c0 74 14 48 8b 45 f8 48 8b 40 10 48 8b 55 f8 48 8b 52 18 <48> 89
50 18 48 8b 45 f8 48 8b 40 18 48 8b 55 f8 48 8b 52 10 48 89
the offending instruction is a mov dereferencing the
thread->node.le_next pointer at QLIST_REMOVE in MigrationThreadDel:
void MigrationThreadDel(MigrationThread *thread)
{
if (thread) {
QLIST_REMOVE(thread, node);
g_free(thread);
}
}
where:
#define QLIST_REMOVE(elm, field) do { \
if ((elm)->field.le_next != NULL) \
(elm)->field.le_next->field.le_prev = \ <-- HERE
(elm)->field.le_prev; \
*(elm)->field.le_prev = (elm)->field.le_next; \
(elm)->field.le_next = NULL; \
(elm)->field.le_prev = NULL; \
} while (/*CONSTCOND*/0)
The MigrationThreadDel function is called from the multifd threads and
is not under any lock, so several calls can race when accessing the
list.
(I actually hit this first on my fixed-ram branch which changes some
synchronization in multifd and makes the issue more frequent)
CI run: https://gitlab.com/farosas/qemu/-/pipelines/891000519
Fabiano Rosas (3):
migration/multifd: Rename threadinfo.c functions
migration/multifd: Protect accesses to migration_threads
tests/qtest: Re-enable multifd cancel test
migration/migration.c | 7 +++++--
migration/multifd.c | 5 +++--
migration/threadinfo.c | 23 ++++++++++++++++++++---
migration/threadinfo.h | 8 ++++----
tests/qtest/migration-test.c | 10 ++--------
5 files changed, 34 insertions(+), 19 deletions(-)
--
2.35.3
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 1/3] migration/multifd: Rename threadinfo.c functions
2023-06-06 14:45 [PATCH 0/3] migration: Fix multifd cancel test Fabiano Rosas
@ 2023-06-06 14:45 ` Fabiano Rosas
2023-06-06 18:38 ` Peter Xu
` (2 more replies)
2023-06-06 14:45 ` [PATCH 2/3] migration/multifd: Protect accesses to migration_threads Fabiano Rosas
2023-06-06 14:45 ` [PATCH 3/3] tests/qtest: Re-enable multifd cancel test Fabiano Rosas
2 siblings, 3 replies; 21+ messages in thread
From: Fabiano Rosas @ 2023-06-06 14:45 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Maydell, Juan Quintela, Jiang Jiacheng, Peter Xu,
Leonardo Bras
The code in threadinfo.c is only used for the QMP command
query-migrationthreads. Make it explicit that this is something
related to QMP.
The current names are also too generic for a piece of code that
doesn't affect the migration directly in any way.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
migration/migration.c | 4 ++--
migration/multifd.c | 4 ++--
migration/threadinfo.c | 4 ++--
migration/threadinfo.h | 5 ++---
4 files changed, 8 insertions(+), 9 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index dc05c6f6ea..e731fc98a1 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2922,7 +2922,7 @@ static void *migration_thread(void *opaque)
MigThrError thr_error;
bool urgent = false;
- thread = MigrationThreadAdd("live_migration", qemu_get_thread_id());
+ thread = qmp_migration_threads_add("live_migration", qemu_get_thread_id());
rcu_register_thread();
@@ -3000,7 +3000,7 @@ static void *migration_thread(void *opaque)
migration_iteration_finish(s);
object_unref(OBJECT(s));
rcu_unregister_thread();
- MigrationThreadDel(thread);
+ qmp_migration_threads_remove(thread);
return NULL;
}
diff --git a/migration/multifd.c b/migration/multifd.c
index 0bf5958a9c..5ec1ac5c64 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -651,7 +651,7 @@ static void *multifd_send_thread(void *opaque)
int ret = 0;
bool use_zero_copy_send = migrate_zero_copy_send();
- thread = MigrationThreadAdd(p->name, qemu_get_thread_id());
+ thread = qmp_migration_threads_add(p->name, qemu_get_thread_id());
trace_multifd_send_thread_start(p->id);
rcu_register_thread();
@@ -767,7 +767,7 @@ out:
qemu_mutex_unlock(&p->mutex);
rcu_unregister_thread();
- MigrationThreadDel(thread);
+ qmp_migration_threads_remove(thread);
trace_multifd_send_thread_end(p->id, p->num_packets, p->total_normal_pages);
return NULL;
diff --git a/migration/threadinfo.c b/migration/threadinfo.c
index 1de8b31855..c3e85c33e8 100644
--- a/migration/threadinfo.c
+++ b/migration/threadinfo.c
@@ -14,7 +14,7 @@
static QLIST_HEAD(, MigrationThread) migration_threads;
-MigrationThread *MigrationThreadAdd(const char *name, int thread_id)
+MigrationThread *qmp_migration_threads_add(const char *name, int thread_id)
{
MigrationThread *thread = g_new0(MigrationThread, 1);
thread->name = name;
@@ -25,7 +25,7 @@ MigrationThread *MigrationThreadAdd(const char *name, int thread_id)
return thread;
}
-void MigrationThreadDel(MigrationThread *thread)
+void qmp_migration_threads_remove(MigrationThread *thread)
{
if (thread) {
QLIST_REMOVE(thread, node);
diff --git a/migration/threadinfo.h b/migration/threadinfo.h
index 4d69423c0a..61b990f5e3 100644
--- a/migration/threadinfo.h
+++ b/migration/threadinfo.h
@@ -23,6 +23,5 @@ struct MigrationThread {
QLIST_ENTRY(MigrationThread) node;
};
-MigrationThread *MigrationThreadAdd(const char *name, int thread_id);
-
-void MigrationThreadDel(MigrationThread *info);
+MigrationThread *qmp_migration_threads_add(const char *name, int thread_id);
+void qmp_migration_threads_remove(MigrationThread *info);
--
2.35.3
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH 2/3] migration/multifd: Protect accesses to migration_threads
2023-06-06 14:45 [PATCH 0/3] migration: Fix multifd cancel test Fabiano Rosas
2023-06-06 14:45 ` [PATCH 1/3] migration/multifd: Rename threadinfo.c functions Fabiano Rosas
@ 2023-06-06 14:45 ` Fabiano Rosas
2023-06-06 18:43 ` Peter Xu
2023-06-07 8:26 ` Juan Quintela
2023-06-06 14:45 ` [PATCH 3/3] tests/qtest: Re-enable multifd cancel test Fabiano Rosas
2 siblings, 2 replies; 21+ messages in thread
From: Fabiano Rosas @ 2023-06-06 14:45 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Maydell, Juan Quintela, Jiang Jiacheng, Peter Xu,
Leonardo Bras
This doubly linked list is common for all the multifd and migration
threads so we need to avoid concurrent access.
Add a mutex to protect the data from concurrent access. This fixes a
crash when removing two MigrationThread objects from the list at the
same time during cleanup of multifd threads.
To avoid destroying the mutex before the last element has been
removed, move calls to qmp_migration_thread_remove so they run before
multifd_save_cleanup joins the threads.
Fixes: 671326201d ("migration: Introduce interface query-migrationthreads")
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
migration/migration.c | 5 ++++-
migration/multifd.c | 3 ++-
migration/threadinfo.c | 19 ++++++++++++++++++-
migration/threadinfo.h | 5 +++--
4 files changed, 27 insertions(+), 5 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index e731fc98a1..b3b8345eb2 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1146,6 +1146,7 @@ static void migrate_fd_cleanup(MigrationState *s)
qemu_mutex_lock_iothread();
multifd_save_cleanup();
+ qmp_migration_threads_cleanup();
qemu_mutex_lock(&s->qemu_file_lock);
tmp = s->to_dst_file;
s->to_dst_file = NULL;
@@ -1405,6 +1406,8 @@ void migrate_init(MigrationState *s)
s->vm_old_state = -1;
s->iteration_initial_bytes = 0;
s->threshold_size = 0;
+
+ qmp_migration_threads_init();
}
int migrate_add_blocker_internal(Error *reason, Error **errp)
@@ -2997,10 +3000,10 @@ static void *migration_thread(void *opaque)
}
trace_migration_thread_after_loop();
+ qmp_migration_threads_remove(thread);
migration_iteration_finish(s);
object_unref(OBJECT(s));
rcu_unregister_thread();
- qmp_migration_threads_remove(thread);
return NULL;
}
diff --git a/migration/multifd.c b/migration/multifd.c
index 5ec1ac5c64..ee7944560a 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -762,12 +762,13 @@ out:
qemu_sem_post(&multifd_send_state->channels_ready);
}
+ qmp_migration_threads_remove(thread);
+
qemu_mutex_lock(&p->mutex);
p->running = false;
qemu_mutex_unlock(&p->mutex);
rcu_unregister_thread();
- qmp_migration_threads_remove(thread);
trace_multifd_send_thread_end(p->id, p->num_packets, p->total_normal_pages);
return NULL;
diff --git a/migration/threadinfo.c b/migration/threadinfo.c
index c3e85c33e8..1fe64a02dd 100644
--- a/migration/threadinfo.c
+++ b/migration/threadinfo.c
@@ -10,23 +10,40 @@
* See the COPYING file in the top-level directory.
*/
+#include "qemu/osdep.h"
+#include "qemu/queue.h"
+#include "qemu/lockable.h"
#include "threadinfo.h"
+QemuMutex migration_threads_lock;
static QLIST_HEAD(, MigrationThread) migration_threads;
+void qmp_migration_threads_init(void)
+{
+ qemu_mutex_init(&migration_threads_lock);
+}
+
+void qmp_migration_threads_cleanup(void)
+{
+ qemu_mutex_destroy(&migration_threads_lock);
+}
+
MigrationThread *qmp_migration_threads_add(const char *name, int thread_id)
{
MigrationThread *thread = g_new0(MigrationThread, 1);
thread->name = name;
thread->thread_id = thread_id;
- QLIST_INSERT_HEAD(&migration_threads, thread, node);
+ WITH_QEMU_LOCK_GUARD(&migration_threads_lock) {
+ QLIST_INSERT_HEAD(&migration_threads, thread, node);
+ }
return thread;
}
void qmp_migration_threads_remove(MigrationThread *thread)
{
+ QEMU_LOCK_GUARD(&migration_threads_lock);
if (thread) {
QLIST_REMOVE(thread, node);
g_free(thread);
diff --git a/migration/threadinfo.h b/migration/threadinfo.h
index 61b990f5e3..eb7f8e5bb2 100644
--- a/migration/threadinfo.h
+++ b/migration/threadinfo.h
@@ -10,8 +10,6 @@
* See the COPYING file in the top-level directory.
*/
-#include "qemu/queue.h"
-#include "qemu/osdep.h"
#include "qapi/error.h"
#include "qapi/qapi-commands-migration.h"
@@ -23,5 +21,8 @@ struct MigrationThread {
QLIST_ENTRY(MigrationThread) node;
};
+void qmp_migration_threads_init(void);
+void qmp_migration_threads_cleanup(void);
+
MigrationThread *qmp_migration_threads_add(const char *name, int thread_id);
void qmp_migration_threads_remove(MigrationThread *info);
--
2.35.3
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH 3/3] tests/qtest: Re-enable multifd cancel test
2023-06-06 14:45 [PATCH 0/3] migration: Fix multifd cancel test Fabiano Rosas
2023-06-06 14:45 ` [PATCH 1/3] migration/multifd: Rename threadinfo.c functions Fabiano Rosas
2023-06-06 14:45 ` [PATCH 2/3] migration/multifd: Protect accesses to migration_threads Fabiano Rosas
@ 2023-06-06 14:45 ` Fabiano Rosas
2023-06-07 8:27 ` Juan Quintela
2 siblings, 1 reply; 21+ messages in thread
From: Fabiano Rosas @ 2023-06-06 14:45 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Maydell, Juan Quintela, Jiang Jiacheng, Peter Xu,
Leonardo Bras, Thomas Huth, Laurent Vivier, Paolo Bonzini
We've found the source of flakiness in this test, so re-enable it.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
tests/qtest/migration-test.c | 10 ++--------
1 file changed, 2 insertions(+), 8 deletions(-)
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index b0c355bbd9..800ad23b75 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2778,14 +2778,8 @@ int main(int argc, char **argv)
}
qtest_add_func("/migration/multifd/tcp/plain/none",
test_multifd_tcp_none);
- /*
- * This test is flaky and sometimes fails in CI and otherwise:
- * don't run unless user opts in via environment variable.
- */
- if (getenv("QEMU_TEST_FLAKY_TESTS")) {
- qtest_add_func("/migration/multifd/tcp/plain/cancel",
- test_multifd_tcp_cancel);
- }
+ qtest_add_func("/migration/multifd/tcp/plain/cancel",
+ test_multifd_tcp_cancel);
qtest_add_func("/migration/multifd/tcp/plain/zlib",
test_multifd_tcp_zlib);
#ifdef CONFIG_ZSTD
--
2.35.3
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH 1/3] migration/multifd: Rename threadinfo.c functions
2023-06-06 14:45 ` [PATCH 1/3] migration/multifd: Rename threadinfo.c functions Fabiano Rosas
@ 2023-06-06 18:38 ` Peter Xu
2023-06-06 19:34 ` Fabiano Rosas
2023-06-07 6:30 ` Juan Quintela
2023-06-07 7:56 ` Philippe Mathieu-Daudé
2 siblings, 1 reply; 21+ messages in thread
From: Peter Xu @ 2023-06-06 18:38 UTC (permalink / raw)
To: Fabiano Rosas
Cc: qemu-devel, Peter Maydell, Juan Quintela, Jiang Jiacheng,
Leonardo Bras
On Tue, Jun 06, 2023 at 11:45:49AM -0300, Fabiano Rosas wrote:
> The code in threadinfo.c is only used for the QMP command
> query-migrationthreads. Make it explicit that this is something
> related to QMP.
>
> The current names are also too generic for a piece of code that
> doesn't affect the migration directly in any way.
>
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
Looks good here, but shall we reserve the qmp_* prefix to mostly qmp stuff
only? Dropping "qmp_" in the new names would look better to me..
> ---
> migration/migration.c | 4 ++--
> migration/multifd.c | 4 ++--
> migration/threadinfo.c | 4 ++--
> migration/threadinfo.h | 5 ++---
> 4 files changed, 8 insertions(+), 9 deletions(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index dc05c6f6ea..e731fc98a1 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2922,7 +2922,7 @@ static void *migration_thread(void *opaque)
> MigThrError thr_error;
> bool urgent = false;
>
> - thread = MigrationThreadAdd("live_migration", qemu_get_thread_id());
> + thread = qmp_migration_threads_add("live_migration", qemu_get_thread_id());
>
> rcu_register_thread();
>
> @@ -3000,7 +3000,7 @@ static void *migration_thread(void *opaque)
> migration_iteration_finish(s);
> object_unref(OBJECT(s));
> rcu_unregister_thread();
> - MigrationThreadDel(thread);
> + qmp_migration_threads_remove(thread);
> return NULL;
> }
>
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 0bf5958a9c..5ec1ac5c64 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -651,7 +651,7 @@ static void *multifd_send_thread(void *opaque)
> int ret = 0;
> bool use_zero_copy_send = migrate_zero_copy_send();
>
> - thread = MigrationThreadAdd(p->name, qemu_get_thread_id());
> + thread = qmp_migration_threads_add(p->name, qemu_get_thread_id());
>
> trace_multifd_send_thread_start(p->id);
> rcu_register_thread();
> @@ -767,7 +767,7 @@ out:
> qemu_mutex_unlock(&p->mutex);
>
> rcu_unregister_thread();
> - MigrationThreadDel(thread);
> + qmp_migration_threads_remove(thread);
> trace_multifd_send_thread_end(p->id, p->num_packets, p->total_normal_pages);
>
> return NULL;
> diff --git a/migration/threadinfo.c b/migration/threadinfo.c
> index 1de8b31855..c3e85c33e8 100644
> --- a/migration/threadinfo.c
> +++ b/migration/threadinfo.c
> @@ -14,7 +14,7 @@
>
> static QLIST_HEAD(, MigrationThread) migration_threads;
>
> -MigrationThread *MigrationThreadAdd(const char *name, int thread_id)
> +MigrationThread *qmp_migration_threads_add(const char *name, int thread_id)
> {
> MigrationThread *thread = g_new0(MigrationThread, 1);
> thread->name = name;
> @@ -25,7 +25,7 @@ MigrationThread *MigrationThreadAdd(const char *name, int thread_id)
> return thread;
> }
>
> -void MigrationThreadDel(MigrationThread *thread)
> +void qmp_migration_threads_remove(MigrationThread *thread)
> {
> if (thread) {
> QLIST_REMOVE(thread, node);
> diff --git a/migration/threadinfo.h b/migration/threadinfo.h
> index 4d69423c0a..61b990f5e3 100644
> --- a/migration/threadinfo.h
> +++ b/migration/threadinfo.h
> @@ -23,6 +23,5 @@ struct MigrationThread {
> QLIST_ENTRY(MigrationThread) node;
> };
>
> -MigrationThread *MigrationThreadAdd(const char *name, int thread_id);
> -
> -void MigrationThreadDel(MigrationThread *info);
> +MigrationThread *qmp_migration_threads_add(const char *name, int thread_id);
> +void qmp_migration_threads_remove(MigrationThread *info);
> --
> 2.35.3
>
--
Peter Xu
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 2/3] migration/multifd: Protect accesses to migration_threads
2023-06-06 14:45 ` [PATCH 2/3] migration/multifd: Protect accesses to migration_threads Fabiano Rosas
@ 2023-06-06 18:43 ` Peter Xu
2023-06-07 8:26 ` Juan Quintela
1 sibling, 0 replies; 21+ messages in thread
From: Peter Xu @ 2023-06-06 18:43 UTC (permalink / raw)
To: Fabiano Rosas
Cc: qemu-devel, Peter Maydell, Juan Quintela, Jiang Jiacheng,
Leonardo Bras
On Tue, Jun 06, 2023 at 11:45:50AM -0300, Fabiano Rosas wrote:
> This doubly linked list is common for all the multifd and migration
> threads so we need to avoid concurrent access.
>
> Add a mutex to protect the data from concurrent access. This fixes a
> crash when removing two MigrationThread objects from the list at the
> same time during cleanup of multifd threads.
>
> To avoid destroying the mutex before the last element has been
> removed, move calls to qmp_migration_thread_remove so they run before
> multifd_save_cleanup joins the threads.
>
> Fixes: 671326201d ("migration: Introduce interface query-migrationthreads")
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
> migration/migration.c | 5 ++++-
> migration/multifd.c | 3 ++-
> migration/threadinfo.c | 19 ++++++++++++++++++-
> migration/threadinfo.h | 5 +++--
> 4 files changed, 27 insertions(+), 5 deletions(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index e731fc98a1..b3b8345eb2 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1146,6 +1146,7 @@ static void migrate_fd_cleanup(MigrationState *s)
> qemu_mutex_lock_iothread();
>
> multifd_save_cleanup();
> + qmp_migration_threads_cleanup();
> qemu_mutex_lock(&s->qemu_file_lock);
> tmp = s->to_dst_file;
> s->to_dst_file = NULL;
> @@ -1405,6 +1406,8 @@ void migrate_init(MigrationState *s)
> s->vm_old_state = -1;
> s->iteration_initial_bytes = 0;
> s->threshold_size = 0;
> +
> + qmp_migration_threads_init();
> }
>
> int migrate_add_blocker_internal(Error *reason, Error **errp)
> @@ -2997,10 +3000,10 @@ static void *migration_thread(void *opaque)
> }
>
> trace_migration_thread_after_loop();
> + qmp_migration_threads_remove(thread);
> migration_iteration_finish(s);
> object_unref(OBJECT(s));
> rcu_unregister_thread();
> - qmp_migration_threads_remove(thread);
> return NULL;
> }
>
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 5ec1ac5c64..ee7944560a 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -762,12 +762,13 @@ out:
> qemu_sem_post(&multifd_send_state->channels_ready);
> }
>
> + qmp_migration_threads_remove(thread);
> +
> qemu_mutex_lock(&p->mutex);
> p->running = false;
> qemu_mutex_unlock(&p->mutex);
>
> rcu_unregister_thread();
> - qmp_migration_threads_remove(thread);
> trace_multifd_send_thread_end(p->id, p->num_packets, p->total_normal_pages);
>
> return NULL;
> diff --git a/migration/threadinfo.c b/migration/threadinfo.c
> index c3e85c33e8..1fe64a02dd 100644
> --- a/migration/threadinfo.c
> +++ b/migration/threadinfo.c
> @@ -10,23 +10,40 @@
> * See the COPYING file in the top-level directory.
> */
>
> +#include "qemu/osdep.h"
> +#include "qemu/queue.h"
> +#include "qemu/lockable.h"
> #include "threadinfo.h"
>
> +QemuMutex migration_threads_lock;
> static QLIST_HEAD(, MigrationThread) migration_threads;
>
> +void qmp_migration_threads_init(void)
> +{
> + qemu_mutex_init(&migration_threads_lock);
> +}
> +
> +void qmp_migration_threads_cleanup(void)
> +{
> + qemu_mutex_destroy(&migration_threads_lock);
> +}
> +
> MigrationThread *qmp_migration_threads_add(const char *name, int thread_id)
> {
> MigrationThread *thread = g_new0(MigrationThread, 1);
> thread->name = name;
> thread->thread_id = thread_id;
>
> - QLIST_INSERT_HEAD(&migration_threads, thread, node);
> + WITH_QEMU_LOCK_GUARD(&migration_threads_lock) {
> + QLIST_INSERT_HEAD(&migration_threads, thread, node);
> + }
>
> return thread;
> }
>
> void qmp_migration_threads_remove(MigrationThread *thread)
> {
> + QEMU_LOCK_GUARD(&migration_threads_lock);
> if (thread) {
> QLIST_REMOVE(thread, node);
> g_free(thread);
qmp_query_migrationthreads() better also have the lock?
Other than that looks good, thanks!
> diff --git a/migration/threadinfo.h b/migration/threadinfo.h
> index 61b990f5e3..eb7f8e5bb2 100644
> --- a/migration/threadinfo.h
> +++ b/migration/threadinfo.h
> @@ -10,8 +10,6 @@
> * See the COPYING file in the top-level directory.
> */
>
> -#include "qemu/queue.h"
> -#include "qemu/osdep.h"
> #include "qapi/error.h"
> #include "qapi/qapi-commands-migration.h"
>
> @@ -23,5 +21,8 @@ struct MigrationThread {
> QLIST_ENTRY(MigrationThread) node;
> };
>
> +void qmp_migration_threads_init(void);
> +void qmp_migration_threads_cleanup(void);
> +
> MigrationThread *qmp_migration_threads_add(const char *name, int thread_id);
> void qmp_migration_threads_remove(MigrationThread *info);
> --
> 2.35.3
>
--
Peter Xu
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/3] migration/multifd: Rename threadinfo.c functions
2023-06-06 18:38 ` Peter Xu
@ 2023-06-06 19:34 ` Fabiano Rosas
2023-06-06 20:03 ` Peter Xu
0 siblings, 1 reply; 21+ messages in thread
From: Fabiano Rosas @ 2023-06-06 19:34 UTC (permalink / raw)
To: Peter Xu
Cc: qemu-devel, Peter Maydell, Juan Quintela, Jiang Jiacheng,
Leonardo Bras
Peter Xu <peterx@redhat.com> writes:
> On Tue, Jun 06, 2023 at 11:45:49AM -0300, Fabiano Rosas wrote:
>> The code in threadinfo.c is only used for the QMP command
>> query-migrationthreads. Make it explicit that this is something
>> related to QMP.
>>
>> The current names are also too generic for a piece of code that
>> doesn't affect the migration directly in any way.
>>
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>
> Looks good here, but shall we reserve the qmp_* prefix to mostly qmp stuff
> only? Dropping "qmp_" in the new names would look better to me..
>
Well, we're just putting the thread name and id on a list so that QMP
can use them later. It is nothing "important" enough to have a generic
name like migration_thread.
Perhaps:
thread_info_add
thread_info_remove
thread_info_init
thread_info_cleanup
Anyway, as long as we drop that camel case I'm ok with just removing the
qmp =)
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/3] migration/multifd: Rename threadinfo.c functions
2023-06-06 19:34 ` Fabiano Rosas
@ 2023-06-06 20:03 ` Peter Xu
0 siblings, 0 replies; 21+ messages in thread
From: Peter Xu @ 2023-06-06 20:03 UTC (permalink / raw)
To: Fabiano Rosas
Cc: qemu-devel, Peter Maydell, Juan Quintela, Jiang Jiacheng,
Leonardo Bras
On Tue, Jun 06, 2023 at 04:34:31PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
>
> > On Tue, Jun 06, 2023 at 11:45:49AM -0300, Fabiano Rosas wrote:
> >> The code in threadinfo.c is only used for the QMP command
> >> query-migrationthreads. Make it explicit that this is something
> >> related to QMP.
> >>
> >> The current names are also too generic for a piece of code that
> >> doesn't affect the migration directly in any way.
> >>
> >> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> >
> > Looks good here, but shall we reserve the qmp_* prefix to mostly qmp stuff
> > only? Dropping "qmp_" in the new names would look better to me..
> >
>
> Well, we're just putting the thread name and id on a list so that QMP
> can use them later. It is nothing "important" enough to have a generic
> name like migration_thread.
>
> Perhaps:
>
> thread_info_add
> thread_info_remove
> thread_info_init
> thread_info_cleanup
>
> Anyway, as long as we drop that camel case I'm ok with just removing the
> qmp =)
Thanks. To me OTOH it's good as long as "qmp_" dropped. :)
I don't worry on using "migration_thread_" as prefix, that's exactly what
the api does to me. Or, migration_thread_info_*(), migration_thr_mgr_*(),
etc.
--
Peter Xu
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/3] migration/multifd: Rename threadinfo.c functions
2023-06-06 14:45 ` [PATCH 1/3] migration/multifd: Rename threadinfo.c functions Fabiano Rosas
2023-06-06 18:38 ` Peter Xu
@ 2023-06-07 6:30 ` Juan Quintela
2023-06-07 7:56 ` Philippe Mathieu-Daudé
2 siblings, 0 replies; 21+ messages in thread
From: Juan Quintela @ 2023-06-07 6:30 UTC (permalink / raw)
To: Fabiano Rosas
Cc: qemu-devel, Peter Maydell, Jiang Jiacheng, Peter Xu,
Leonardo Bras
Fabiano Rosas <farosas@suse.de> wrote:
> The code in threadinfo.c is only used for the QMP command
> query-migrationthreads. Make it explicit that this is something
> related to QMP.
>
> The current names are also too generic for a piece of code that
> doesn't affect the migration directly in any way.
>
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Don't like the caml case. And don't really care with/without the qmp_
preffix. You got it eitherwise.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/3] migration/multifd: Rename threadinfo.c functions
2023-06-06 14:45 ` [PATCH 1/3] migration/multifd: Rename threadinfo.c functions Fabiano Rosas
2023-06-06 18:38 ` Peter Xu
2023-06-07 6:30 ` Juan Quintela
@ 2023-06-07 7:56 ` Philippe Mathieu-Daudé
2 siblings, 0 replies; 21+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-06-07 7:56 UTC (permalink / raw)
To: Fabiano Rosas, qemu-devel
Cc: Peter Maydell, Juan Quintela, Jiang Jiacheng, Peter Xu,
Leonardo Bras
On 6/6/23 16:45, Fabiano Rosas wrote:
> The code in threadinfo.c is only used for the QMP command
> query-migrationthreads. Make it explicit that this is something
> related to QMP.
>
> The current names are also too generic for a piece of code that
> doesn't affect the migration directly in any way.
>
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
> migration/migration.c | 4 ++--
> migration/multifd.c | 4 ++--
> migration/threadinfo.c | 4 ++--
> migration/threadinfo.h | 5 ++---
> 4 files changed, 8 insertions(+), 9 deletions(-)
> -MigrationThread *MigrationThreadAdd(const char *name, int thread_id);
> -
> -void MigrationThreadDel(MigrationThread *info);
> +MigrationThread *qmp_migration_threads_add(const char *name, int thread_id);
> +void qmp_migration_threads_remove(MigrationThread *info);
Dropping 'qmp_' prefix:
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 2/3] migration/multifd: Protect accesses to migration_threads
2023-06-06 14:45 ` [PATCH 2/3] migration/multifd: Protect accesses to migration_threads Fabiano Rosas
2023-06-06 18:43 ` Peter Xu
@ 2023-06-07 8:26 ` Juan Quintela
2023-06-07 12:00 ` Fabiano Rosas
1 sibling, 1 reply; 21+ messages in thread
From: Juan Quintela @ 2023-06-07 8:26 UTC (permalink / raw)
To: Fabiano Rosas
Cc: qemu-devel, Peter Maydell, Jiang Jiacheng, Peter Xu,
Leonardo Bras
Fabiano Rosas <farosas@suse.de> wrote:
> This doubly linked list is common for all the multifd and migration
> threads so we need to avoid concurrent access.
>
> Add a mutex to protect the data from concurrent access. This fixes a
> crash when removing two MigrationThread objects from the list at the
> same time during cleanup of multifd threads.
>
> To avoid destroying the mutex before the last element has been
> removed, move calls to qmp_migration_thread_remove so they run before
> multifd_save_cleanup joins the threads.
>
> Fixes: 671326201d ("migration: Introduce interface query-migrationthreads")
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
I agree with Peter here. Why don't you have to protect the walking?
> ---
> migration/migration.c | 5 ++++-
> migration/multifd.c | 3 ++-
> migration/threadinfo.c | 19 ++++++++++++++++++-
> migration/threadinfo.h | 5 +++--
> 4 files changed, 27 insertions(+), 5 deletions(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index e731fc98a1..b3b8345eb2 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1146,6 +1146,7 @@ static void migrate_fd_cleanup(MigrationState *s)
> qemu_mutex_lock_iothread();
>
> multifd_save_cleanup();
> + qmp_migration_threads_cleanup();
I think I will spare this one as the mutex is static, so we are not
winning any memory back.
> }
>
> trace_migration_thread_after_loop();
> + qmp_migration_threads_remove(thread);
> migration_iteration_finish(s);
I can understand moving it here, but why before migration_iteration_finish?
> object_unref(OBJECT(s));
> rcu_unregister_thread();
> - qmp_migration_threads_remove(thread);
> return NULL;
> }
> + qmp_migration_threads_remove(thread);
> +
> qemu_mutex_lock(&p->mutex);
> p->running = false;
> qemu_mutex_unlock(&p->mutex);
>
> rcu_unregister_thread();
> - qmp_migration_threads_remove(thread);
> trace_multifd_send_thread_end(p->id, p->num_packets, p->total_normal_pages);
>
> return NULL;
Here it looks like the right place.
> +#include "qemu/osdep.h"
> +#include "qemu/queue.h"
> +#include "qemu/lockable.h"
> #include "threadinfo.h"
Ouch, it missed Markus cleanup. Thanks.
For the rest it looks good.
Later, Juan.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 3/3] tests/qtest: Re-enable multifd cancel test
2023-06-06 14:45 ` [PATCH 3/3] tests/qtest: Re-enable multifd cancel test Fabiano Rosas
@ 2023-06-07 8:27 ` Juan Quintela
2024-01-08 6:42 ` Peter Xu
0 siblings, 1 reply; 21+ messages in thread
From: Juan Quintela @ 2023-06-07 8:27 UTC (permalink / raw)
To: Fabiano Rosas
Cc: qemu-devel, Peter Maydell, Jiang Jiacheng, Peter Xu,
Leonardo Bras, Thomas Huth, Laurent Vivier, Paolo Bonzini
Fabiano Rosas <farosas@suse.de> wrote:
> We've found the source of flakiness in this test, so re-enable it.
>
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
> tests/qtest/migration-test.c | 10 ++--------
> 1 file changed, 2 insertions(+), 8 deletions(-)
>
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index b0c355bbd9..800ad23b75 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -2778,14 +2778,8 @@ int main(int argc, char **argv)
> }
> qtest_add_func("/migration/multifd/tcp/plain/none",
> test_multifd_tcp_none);
> - /*
> - * This test is flaky and sometimes fails in CI and otherwise:
> - * don't run unless user opts in via environment variable.
> - */
> - if (getenv("QEMU_TEST_FLAKY_TESTS")) {
> - qtest_add_func("/migration/multifd/tcp/plain/cancel",
> - test_multifd_tcp_cancel);
> - }
> + qtest_add_func("/migration/multifd/tcp/plain/cancel",
> + test_multifd_tcp_cancel);
> qtest_add_func("/migration/multifd/tcp/plain/zlib",
> test_multifd_tcp_zlib);
> #ifdef CONFIG_ZSTD
Reviewed-by: Juan Quintela <quintela@redhat.com>
There was another failure with migration test that I will post during
the rest of the day. It needs both to get it right.
Later, Juan.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 2/3] migration/multifd: Protect accesses to migration_threads
2023-06-07 8:26 ` Juan Quintela
@ 2023-06-07 12:00 ` Fabiano Rosas
2023-06-07 13:25 ` Peter Xu
0 siblings, 1 reply; 21+ messages in thread
From: Fabiano Rosas @ 2023-06-07 12:00 UTC (permalink / raw)
To: quintela; +Cc: qemu-devel, Peter Maydell, Jiang Jiacheng, Peter Xu,
Leonardo Bras
Juan Quintela <quintela@redhat.com> writes:
> Fabiano Rosas <farosas@suse.de> wrote:
>> This doubly linked list is common for all the multifd and migration
>> threads so we need to avoid concurrent access.
>>
>> Add a mutex to protect the data from concurrent access. This fixes a
>> crash when removing two MigrationThread objects from the list at the
>> same time during cleanup of multifd threads.
>>
>> To avoid destroying the mutex before the last element has been
>> removed, move calls to qmp_migration_thread_remove so they run before
>> multifd_save_cleanup joins the threads.
>>
>> Fixes: 671326201d ("migration: Introduce interface query-migrationthreads")
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>
> I agree with Peter here. Why don't you have to protect the walking?
>
Oversight on my part.
>> ---
>> migration/migration.c | 5 ++++-
>> migration/multifd.c | 3 ++-
>> migration/threadinfo.c | 19 ++++++++++++++++++-
>> migration/threadinfo.h | 5 +++--
>> 4 files changed, 27 insertions(+), 5 deletions(-)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index e731fc98a1..b3b8345eb2 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -1146,6 +1146,7 @@ static void migrate_fd_cleanup(MigrationState *s)
>> qemu_mutex_lock_iothread();
>>
>> multifd_save_cleanup();
>> + qmp_migration_threads_cleanup();
>
> I think I will spare this one as the mutex is static, so we are not
> winning any memory back.
>
Ok
>> }
>>
>> trace_migration_thread_after_loop();
>> + qmp_migration_threads_remove(thread);
>> migration_iteration_finish(s);
>
> I can understand moving it here, but why before migration_iteration_finish?
>
Because migration_iteration_finish schedules migrate_fd_cleanup and it
calls qmp_migration_threads_cleanup. So I wanted to be sure that the
removal happens before destroying the mutex.
>> object_unref(OBJECT(s));
>> rcu_unregister_thread();
>> - qmp_migration_threads_remove(thread);
>> return NULL;
>> }
>> + qmp_migration_threads_remove(thread);
>> +
>> qemu_mutex_lock(&p->mutex);
>> p->running = false;
>> qemu_mutex_unlock(&p->mutex);
>>
>> rcu_unregister_thread();
>> - qmp_migration_threads_remove(thread);
>> trace_multifd_send_thread_end(p->id, p->num_packets, p->total_normal_pages);
>>
>> return NULL;
>
> Here it looks like the right place.
>
Yep, we shouldn't really put any new code after that p->running =
false. Because multifd_save_cleanup will happily start cleaning up
everything while this thread is still running if it sees p->running ==
false.
>
>> +#include "qemu/osdep.h"
>> +#include "qemu/queue.h"
>> +#include "qemu/lockable.h"
>> #include "threadinfo.h"
>
> Ouch, it missed Markus cleanup. Thanks.
>
> For the rest it looks good.
>
> Later, Juan.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 2/3] migration/multifd: Protect accesses to migration_threads
2023-06-07 12:00 ` Fabiano Rosas
@ 2023-06-07 13:25 ` Peter Xu
2023-06-07 16:58 ` Juan Quintela
0 siblings, 1 reply; 21+ messages in thread
From: Peter Xu @ 2023-06-07 13:25 UTC (permalink / raw)
To: Fabiano Rosas
Cc: quintela, qemu-devel, Peter Maydell, Jiang Jiacheng,
Leonardo Bras
On Wed, Jun 07, 2023 at 09:00:14AM -0300, Fabiano Rosas wrote:
> >> diff --git a/migration/migration.c b/migration/migration.c
> >> index e731fc98a1..b3b8345eb2 100644
> >> --- a/migration/migration.c
> >> +++ b/migration/migration.c
> >> @@ -1146,6 +1146,7 @@ static void migrate_fd_cleanup(MigrationState *s)
> >> qemu_mutex_lock_iothread();
> >>
> >> multifd_save_cleanup();
> >> + qmp_migration_threads_cleanup();
> >
> > I think I will spare this one as the mutex is static, so we are not
> > winning any memory back.
> >
>
> Ok
We could consider __attribute__((__constructor__)) in this case.
--
Peter Xu
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 2/3] migration/multifd: Protect accesses to migration_threads
2023-06-07 13:25 ` Peter Xu
@ 2023-06-07 16:58 ` Juan Quintela
0 siblings, 0 replies; 21+ messages in thread
From: Juan Quintela @ 2023-06-07 16:58 UTC (permalink / raw)
To: Peter Xu
Cc: Fabiano Rosas, open list:All patches CC here, Peter Maydell,
Jiang Jiacheng, Leonardo Bras
[-- Attachment #1: Type: text/plain, Size: 764 bytes --]
Sounds good.
On Wed, Jun 7, 2023, 18:28 Peter Xu <peterx@redhat.com> wrote:
> On Wed, Jun 07, 2023 at 09:00:14AM -0300, Fabiano Rosas wrote:
> > >> diff --git a/migration/migration.c b/migration/migration.c
> > >> index e731fc98a1..b3b8345eb2 100644
> > >> --- a/migration/migration.c
> > >> +++ b/migration/migration.c
> > >> @@ -1146,6 +1146,7 @@ static void migrate_fd_cleanup(MigrationState
> *s)
> > >> qemu_mutex_lock_iothread();
> > >>
> > >> multifd_save_cleanup();
> > >> + qmp_migration_threads_cleanup();
> > >
> > > I think I will spare this one as the mutex is static, so we are not
> > > winning any memory back.
> > >
> >
> > Ok
>
> We could consider __attribute__((__constructor__)) in this case.
>
> --
> Peter Xu
>
>
>
[-- Attachment #2: Type: text/html, Size: 1224 bytes --]
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 3/3] tests/qtest: Re-enable multifd cancel test
2023-06-07 8:27 ` Juan Quintela
@ 2024-01-08 6:42 ` Peter Xu
2024-01-08 14:26 ` Fabiano Rosas
0 siblings, 1 reply; 21+ messages in thread
From: Peter Xu @ 2024-01-08 6:42 UTC (permalink / raw)
To: Fabiano Rosas
Cc: Fabiano Rosas, qemu-devel, Peter Maydell, Jiang Jiacheng,
Leonardo Bras, Thomas Huth, Laurent Vivier, Paolo Bonzini
On Wed, Jun 07, 2023 at 10:27:15AM +0200, Juan Quintela wrote:
> Fabiano Rosas <farosas@suse.de> wrote:
> > We've found the source of flakiness in this test, so re-enable it.
> >
> > Signed-off-by: Fabiano Rosas <farosas@suse.de>
> > ---
> > tests/qtest/migration-test.c | 10 ++--------
> > 1 file changed, 2 insertions(+), 8 deletions(-)
> >
> > diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> > index b0c355bbd9..800ad23b75 100644
> > --- a/tests/qtest/migration-test.c
> > +++ b/tests/qtest/migration-test.c
> > @@ -2778,14 +2778,8 @@ int main(int argc, char **argv)
> > }
> > qtest_add_func("/migration/multifd/tcp/plain/none",
> > test_multifd_tcp_none);
> > - /*
> > - * This test is flaky and sometimes fails in CI and otherwise:
> > - * don't run unless user opts in via environment variable.
> > - */
> > - if (getenv("QEMU_TEST_FLAKY_TESTS")) {
> > - qtest_add_func("/migration/multifd/tcp/plain/cancel",
> > - test_multifd_tcp_cancel);
> > - }
> > + qtest_add_func("/migration/multifd/tcp/plain/cancel",
> > + test_multifd_tcp_cancel);
> > qtest_add_func("/migration/multifd/tcp/plain/zlib",
> > test_multifd_tcp_zlib);
> > #ifdef CONFIG_ZSTD
>
> Reviewed-by: Juan Quintela <quintela@redhat.com>
>
>
> There was another failure with migration test that I will post during
> the rest of the day. It needs both to get it right.
This one didn't yet land upstream. I'm not sure, but maybe Juan was saying
about this change:
commit d2026ee117147893f8d80f060cede6d872ecbd7f
Author: Juan Quintela <quintela@trasno.org>
Date: Wed Apr 26 12:20:36 2023 +0200
multifd: Fix the number of channels ready
Fabiano, did you try to reproduce multifd-cancel with current master? I'm
wondering whether this test has already been completely fixed, then maybe
we can pick up this patch now.
--
Peter Xu
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 3/3] tests/qtest: Re-enable multifd cancel test
2024-01-08 6:42 ` Peter Xu
@ 2024-01-08 14:26 ` Fabiano Rosas
2024-01-09 2:12 ` Peter Xu
0 siblings, 1 reply; 21+ messages in thread
From: Fabiano Rosas @ 2024-01-08 14:26 UTC (permalink / raw)
To: Peter Xu
Cc: qemu-devel, Peter Maydell, Jiang Jiacheng, Leonardo Bras,
Thomas Huth, Laurent Vivier, Paolo Bonzini
Peter Xu <peterx@redhat.com> writes:
> On Wed, Jun 07, 2023 at 10:27:15AM +0200, Juan Quintela wrote:
>> Fabiano Rosas <farosas@suse.de> wrote:
>> > We've found the source of flakiness in this test, so re-enable it.
>> >
>> > Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> > ---
>> > tests/qtest/migration-test.c | 10 ++--------
>> > 1 file changed, 2 insertions(+), 8 deletions(-)
>> >
>> > diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
>> > index b0c355bbd9..800ad23b75 100644
>> > --- a/tests/qtest/migration-test.c
>> > +++ b/tests/qtest/migration-test.c
>> > @@ -2778,14 +2778,8 @@ int main(int argc, char **argv)
>> > }
>> > qtest_add_func("/migration/multifd/tcp/plain/none",
>> > test_multifd_tcp_none);
>> > - /*
>> > - * This test is flaky and sometimes fails in CI and otherwise:
>> > - * don't run unless user opts in via environment variable.
>> > - */
>> > - if (getenv("QEMU_TEST_FLAKY_TESTS")) {
>> > - qtest_add_func("/migration/multifd/tcp/plain/cancel",
>> > - test_multifd_tcp_cancel);
>> > - }
>> > + qtest_add_func("/migration/multifd/tcp/plain/cancel",
>> > + test_multifd_tcp_cancel);
>> > qtest_add_func("/migration/multifd/tcp/plain/zlib",
>> > test_multifd_tcp_zlib);
>> > #ifdef CONFIG_ZSTD
>>
>> Reviewed-by: Juan Quintela <quintela@redhat.com>
>>
>>
>> There was another failure with migration test that I will post during
>> the rest of the day. It needs both to get it right.
>
> This one didn't yet land upstream. I'm not sure, but maybe Juan was saying
> about this change:
>
> commit d2026ee117147893f8d80f060cede6d872ecbd7f
> Author: Juan Quintela <quintela@trasno.org>
> Date: Wed Apr 26 12:20:36 2023 +0200
>
> multifd: Fix the number of channels ready
That's not it. It was something in the test itself around the fact that
we use two sets of: from/to. There was supposed to be a situation where
we'd start 'to2' while 'to' was still running and that would cause
issues (possibly with sockets).
I think what might have happened is that someone merged a fix through
another tree and Juan didn't notice. I think this is the one:
commit f2d063e61ee2026700ab44bef967f663e976bec8
Author: Xuzhou Cheng <xuzhou.cheng@windriver.com>
Date: Fri Oct 28 12:57:32 2022 +0800
tests/qtest: migration-test: Make sure QEMU process "to" exited after migration is canceled
Make sure QEMU process "to" exited before launching another target
for migration in the test_multifd_tcp_cancel case.
Signed-off-by: Xuzhou Cheng <xuzhou.cheng@windriver.com>
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20221028045736.679903-8-bin.meng@windriver.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
> Fabiano, did you try to reproduce multifd-cancel with current master? I'm
> wondering whether this test has already been completely fixed, then maybe
> we can pick up this patch now.
Yes, let's merge it. I have kept it enabled during testing of all of the
recent race conditions we've debugged and haven't seen it fail. Current
master also looks fine.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 3/3] tests/qtest: Re-enable multifd cancel test
2024-01-08 14:26 ` Fabiano Rosas
@ 2024-01-09 2:12 ` Peter Xu
2024-01-09 7:21 ` Thomas Huth
0 siblings, 1 reply; 21+ messages in thread
From: Peter Xu @ 2024-01-09 2:12 UTC (permalink / raw)
To: Fabiano Rosas
Cc: qemu-devel, Peter Maydell, Jiang Jiacheng, Leonardo Bras,
Thomas Huth, Laurent Vivier, Paolo Bonzini
On Mon, Jan 08, 2024 at 11:26:04AM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
>
> > On Wed, Jun 07, 2023 at 10:27:15AM +0200, Juan Quintela wrote:
> >> Fabiano Rosas <farosas@suse.de> wrote:
> >> > We've found the source of flakiness in this test, so re-enable it.
> >> >
> >> > Signed-off-by: Fabiano Rosas <farosas@suse.de>
> >> > ---
> >> > tests/qtest/migration-test.c | 10 ++--------
> >> > 1 file changed, 2 insertions(+), 8 deletions(-)
> >> >
> >> > diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> >> > index b0c355bbd9..800ad23b75 100644
> >> > --- a/tests/qtest/migration-test.c
> >> > +++ b/tests/qtest/migration-test.c
> >> > @@ -2778,14 +2778,8 @@ int main(int argc, char **argv)
> >> > }
> >> > qtest_add_func("/migration/multifd/tcp/plain/none",
> >> > test_multifd_tcp_none);
> >> > - /*
> >> > - * This test is flaky and sometimes fails in CI and otherwise:
> >> > - * don't run unless user opts in via environment variable.
> >> > - */
> >> > - if (getenv("QEMU_TEST_FLAKY_TESTS")) {
> >> > - qtest_add_func("/migration/multifd/tcp/plain/cancel",
> >> > - test_multifd_tcp_cancel);
> >> > - }
> >> > + qtest_add_func("/migration/multifd/tcp/plain/cancel",
> >> > + test_multifd_tcp_cancel);
> >> > qtest_add_func("/migration/multifd/tcp/plain/zlib",
> >> > test_multifd_tcp_zlib);
> >> > #ifdef CONFIG_ZSTD
> >>
> >> Reviewed-by: Juan Quintela <quintela@redhat.com>
> >>
> >>
> >> There was another failure with migration test that I will post during
> >> the rest of the day. It needs both to get it right.
> >
> > This one didn't yet land upstream. I'm not sure, but maybe Juan was saying
> > about this change:
> >
> > commit d2026ee117147893f8d80f060cede6d872ecbd7f
> > Author: Juan Quintela <quintela@trasno.org>
> > Date: Wed Apr 26 12:20:36 2023 +0200
> >
> > multifd: Fix the number of channels ready
>
> That's not it. It was something in the test itself around the fact that
> we use two sets of: from/to. There was supposed to be a situation where
> we'd start 'to2' while 'to' was still running and that would cause
> issues (possibly with sockets).
>
> I think what might have happened is that someone merged a fix through
> another tree and Juan didn't notice. I think this is the one:
>
> commit f2d063e61ee2026700ab44bef967f663e976bec8
> Author: Xuzhou Cheng <xuzhou.cheng@windriver.com>
> Date: Fri Oct 28 12:57:32 2022 +0800
>
> tests/qtest: migration-test: Make sure QEMU process "to" exited after migration is canceled
>
> Make sure QEMU process "to" exited before launching another target
> for migration in the test_multifd_tcp_cancel case.
>
> Signed-off-by: Xuzhou Cheng <xuzhou.cheng@windriver.com>
> Signed-off-by: Bin Meng <bin.meng@windriver.com>
> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> Message-Id: <20221028045736.679903-8-bin.meng@windriver.com>
> Signed-off-by: Thomas Huth <thuth@redhat.com>
Hmm, i see.
>
> > Fabiano, did you try to reproduce multifd-cancel with current master? I'm
> > wondering whether this test has already been completely fixed, then maybe
> > we can pick up this patch now.
>
> Yes, let's merge it. I have kept it enabled during testing of all of the
> recent race conditions we've debugged and haven't seen it fail. Current
> master also looks fine.
It needs a trivial touchup, but then I queued it.
Thanks,
--
Peter Xu
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 3/3] tests/qtest: Re-enable multifd cancel test
2024-01-09 2:12 ` Peter Xu
@ 2024-01-09 7:21 ` Thomas Huth
2024-01-09 7:48 ` Peter Xu
0 siblings, 1 reply; 21+ messages in thread
From: Thomas Huth @ 2024-01-09 7:21 UTC (permalink / raw)
To: Peter Xu, Fabiano Rosas
Cc: qemu-devel, Peter Maydell, Jiang Jiacheng, Leonardo Bras,
Laurent Vivier, Paolo Bonzini
On 09/01/2024 03.12, Peter Xu wrote:
> On Mon, Jan 08, 2024 at 11:26:04AM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>>
>>> On Wed, Jun 07, 2023 at 10:27:15AM +0200, Juan Quintela wrote:
>>>> Fabiano Rosas <farosas@suse.de> wrote:
>>>>> We've found the source of flakiness in this test, so re-enable it.
>>>>>
>>>>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>>>>> ---
>>>>> tests/qtest/migration-test.c | 10 ++--------
>>>>> 1 file changed, 2 insertions(+), 8 deletions(-)
>>>>>
>>>>> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
>>>>> index b0c355bbd9..800ad23b75 100644
>>>>> --- a/tests/qtest/migration-test.c
>>>>> +++ b/tests/qtest/migration-test.c
>>>>> @@ -2778,14 +2778,8 @@ int main(int argc, char **argv)
>>>>> }
>>>>> qtest_add_func("/migration/multifd/tcp/plain/none",
>>>>> test_multifd_tcp_none);
>>>>> - /*
>>>>> - * This test is flaky and sometimes fails in CI and otherwise:
>>>>> - * don't run unless user opts in via environment variable.
>>>>> - */
>>>>> - if (getenv("QEMU_TEST_FLAKY_TESTS")) {
>>>>> - qtest_add_func("/migration/multifd/tcp/plain/cancel",
>>>>> - test_multifd_tcp_cancel);
>>>>> - }
>>>>> + qtest_add_func("/migration/multifd/tcp/plain/cancel",
>>>>> + test_multifd_tcp_cancel);
>>>>> qtest_add_func("/migration/multifd/tcp/plain/zlib",
>>>>> test_multifd_tcp_zlib);
>>>>> #ifdef CONFIG_ZSTD
>>>>
>>>> Reviewed-by: Juan Quintela <quintela@redhat.com>
>>>>
>>>>
>>>> There was another failure with migration test that I will post during
>>>> the rest of the day. It needs both to get it right.
>>>
>>> This one didn't yet land upstream. I'm not sure, but maybe Juan was saying
>>> about this change:
>>>
>>> commit d2026ee117147893f8d80f060cede6d872ecbd7f
>>> Author: Juan Quintela <quintela@trasno.org>
>>> Date: Wed Apr 26 12:20:36 2023 +0200
>>>
>>> multifd: Fix the number of channels ready
>>
>> That's not it. It was something in the test itself around the fact that
>> we use two sets of: from/to. There was supposed to be a situation where
>> we'd start 'to2' while 'to' was still running and that would cause
>> issues (possibly with sockets).
>>
>> I think what might have happened is that someone merged a fix through
>> another tree and Juan didn't notice. I think this is the one:
>>
>> commit f2d063e61ee2026700ab44bef967f663e976bec8
>> Author: Xuzhou Cheng <xuzhou.cheng@windriver.com>
>> Date: Fri Oct 28 12:57:32 2022 +0800
>>
>> tests/qtest: migration-test: Make sure QEMU process "to" exited after migration is canceled
>>
>> Make sure QEMU process "to" exited before launching another target
>> for migration in the test_multifd_tcp_cancel case.
>>
>> Signed-off-by: Xuzhou Cheng <xuzhou.cheng@windriver.com>
>> Signed-off-by: Bin Meng <bin.meng@windriver.com>
>> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
>> Message-Id: <20221028045736.679903-8-bin.meng@windriver.com>
>> Signed-off-by: Thomas Huth <thuth@redhat.com>
>
> Hmm, i see.
Sorry for that :-( Maybe it's better if we remove the migration-test from
the qtest section in MAINTAINERS? Since the migration test is very well
maintained already, there's IMHO no need for picking up the patches via the
qtest tree, so something like this should prevent these problems:
diff --git a/MAINTAINERS b/MAINTAINERS
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3269,6 +3269,7 @@ F: tests/qtest/
F: docs/devel/qgraph.rst
F: docs/devel/qtest.rst
X: tests/qtest/bios-tables-test*
+X: tests/qtest/migration-*
Device Fuzzing
M: Alexander Bulekov <alxndr@bu.edu>
(as you can see, we're doing it in a similar way for the bios tables test
already)
If you agree, I can send out a proper patch for this later today.
Thomas
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 3/3] tests/qtest: Re-enable multifd cancel test
2024-01-09 7:21 ` Thomas Huth
@ 2024-01-09 7:48 ` Peter Xu
2024-01-09 8:44 ` Thomas Huth
0 siblings, 1 reply; 21+ messages in thread
From: Peter Xu @ 2024-01-09 7:48 UTC (permalink / raw)
To: Thomas Huth
Cc: Fabiano Rosas, qemu-devel, Peter Maydell, Jiang Jiacheng,
Leonardo Bras, Laurent Vivier, Paolo Bonzini
Hi, Thomas,
On Tue, Jan 09, 2024 at 08:21:53AM +0100, Thomas Huth wrote:
> Sorry for that :-(
Not at all! I actually appreciate more people looking after it.
> Maybe it's better if we remove the migration-test from
> the qtest section in MAINTAINERS? Since the migration test is very well
> maintained already, there's IMHO no need for picking up the patches via the
> qtest tree, so something like this should prevent these problems:
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3269,6 +3269,7 @@ F: tests/qtest/
> F: docs/devel/qgraph.rst
> F: docs/devel/qtest.rst
> X: tests/qtest/bios-tables-test*
> +X: tests/qtest/migration-*
>
> Device Fuzzing
> M: Alexander Bulekov <alxndr@bu.edu>
>
> (as you can see, we're doing it in a similar way for the bios tables test
> already)
>
> If you agree, I can send out a proper patch for this later today.
Currently the file is covered by both groups of people, which is the best
condition to me:
$ ./scripts/get_maintainer.pl -f tests/qtest/migration-test.c
Peter Xu <peterx@redhat.com> (maintainer:Migration)
Fabiano Rosas <farosas@suse.de> (maintainer:Migration)
Thomas Huth <thuth@redhat.com> (maintainer:qtest)
Laurent Vivier <lvivier@redhat.com> (maintainer:qtest)
Paolo Bonzini <pbonzini@redhat.com> (reviewer:qtest)
qemu-devel@nongnu.org (open list:All patches CC here)
It makes sense to me e.g. when qtest reworks the framework, and we'd like
migration-test.c to be covered in that same reworks series and
reviewed/pulled together, for example, then those can go via qtest's tree
directly.
If patch submitter follows the MAINTAINERS file it means all of us will be
in the loop and that's the perfect condition, IMHO. It's just that this
patch didn't have any migration people copied, which caused a very slight
confusion.
It'll be great in that case if qtest maintainers can help submitters to
copy us if the submitters forgot to do so. I think we should do the same
when there's major changes for qtest framework for a new migration test.
Would that work the best for us?
Thanks,
--
Peter Xu
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 3/3] tests/qtest: Re-enable multifd cancel test
2024-01-09 7:48 ` Peter Xu
@ 2024-01-09 8:44 ` Thomas Huth
0 siblings, 0 replies; 21+ messages in thread
From: Thomas Huth @ 2024-01-09 8:44 UTC (permalink / raw)
To: Peter Xu
Cc: Fabiano Rosas, qemu-devel, Peter Maydell, Jiang Jiacheng,
Leonardo Bras, Laurent Vivier, Paolo Bonzini
On 09/01/2024 08.48, Peter Xu wrote:
> Hi, Thomas,
>
> On Tue, Jan 09, 2024 at 08:21:53AM +0100, Thomas Huth wrote:
>> Sorry for that :-(
>
> Not at all! I actually appreciate more people looking after it.
>
>> Maybe it's better if we remove the migration-test from
>> the qtest section in MAINTAINERS? Since the migration test is very well
>> maintained already, there's IMHO no need for picking up the patches via the
>> qtest tree, so something like this should prevent these problems:
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -3269,6 +3269,7 @@ F: tests/qtest/
>> F: docs/devel/qgraph.rst
>> F: docs/devel/qtest.rst
>> X: tests/qtest/bios-tables-test*
>> +X: tests/qtest/migration-*
>>
>> Device Fuzzing
>> M: Alexander Bulekov <alxndr@bu.edu>
>>
>> (as you can see, we're doing it in a similar way for the bios tables test
>> already)
>>
>> If you agree, I can send out a proper patch for this later today.
>
> Currently the file is covered by both groups of people, which is the best
> condition to me:
>
> $ ./scripts/get_maintainer.pl -f tests/qtest/migration-test.c
> Peter Xu <peterx@redhat.com> (maintainer:Migration)
> Fabiano Rosas <farosas@suse.de> (maintainer:Migration)
> Thomas Huth <thuth@redhat.com> (maintainer:qtest)
> Laurent Vivier <lvivier@redhat.com> (maintainer:qtest)
> Paolo Bonzini <pbonzini@redhat.com> (reviewer:qtest)
> qemu-devel@nongnu.org (open list:All patches CC here)
>
> It makes sense to me e.g. when qtest reworks the framework, and we'd like
> migration-test.c to be covered in that same reworks series and
> reviewed/pulled together, for example, then those can go via qtest's tree
> directly.
>
> If patch submitter follows the MAINTAINERS file it means all of us will be
> in the loop and that's the perfect condition, IMHO. It's just that this
> patch didn't have any migration people copied, which caused a very slight
> confusion.
>
> It'll be great in that case if qtest maintainers can help submitters to
> copy us if the submitters forgot to do so. I think we should do the same
> when there's major changes for qtest framework for a new migration test.
> Would that work the best for us?
Ok, makes sense, let's try it that way!
Thomas
^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2024-01-09 8:45 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-06-06 14:45 [PATCH 0/3] migration: Fix multifd cancel test Fabiano Rosas
2023-06-06 14:45 ` [PATCH 1/3] migration/multifd: Rename threadinfo.c functions Fabiano Rosas
2023-06-06 18:38 ` Peter Xu
2023-06-06 19:34 ` Fabiano Rosas
2023-06-06 20:03 ` Peter Xu
2023-06-07 6:30 ` Juan Quintela
2023-06-07 7:56 ` Philippe Mathieu-Daudé
2023-06-06 14:45 ` [PATCH 2/3] migration/multifd: Protect accesses to migration_threads Fabiano Rosas
2023-06-06 18:43 ` Peter Xu
2023-06-07 8:26 ` Juan Quintela
2023-06-07 12:00 ` Fabiano Rosas
2023-06-07 13:25 ` Peter Xu
2023-06-07 16:58 ` Juan Quintela
2023-06-06 14:45 ` [PATCH 3/3] tests/qtest: Re-enable multifd cancel test Fabiano Rosas
2023-06-07 8:27 ` Juan Quintela
2024-01-08 6:42 ` Peter Xu
2024-01-08 14:26 ` Fabiano Rosas
2024-01-09 2:12 ` Peter Xu
2024-01-09 7:21 ` Thomas Huth
2024-01-09 7:48 ` Peter Xu
2024-01-09 8:44 ` Thomas Huth
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).