* [PATCH v2 0/3] migration: Fix multifd cancel test
@ 2023-06-07 16:13 Fabiano Rosas
2023-06-07 16:13 ` [PATCH v2 1/3] migration/multifd: Rename threadinfo.c functions Fabiano Rosas
` (3 more replies)
0 siblings, 4 replies; 7+ messages in thread
From: Fabiano Rosas @ 2023-06-07 16:13 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Maydell, Juan Quintela, Jiang Jiacheng, Peter Xu,
Leonardo Bras
v2:
- patch 1: dropped the qmp_ prefix;
- patch 2: dropped the qemu_mutex_destroy;
stopped moving the _remove functions (don't strictly need it
anymore since not destroying the mutex explicitly);
added the lock to protect the loop in
qmp_query_migrationthreads;
added __attribute__((constructor)).
CI run: https://gitlab.com/farosas/qemu/-/pipelines/892563231
v1:
https://lore.kernel.org/r/20230606144551.24367-1-farosas@suse.de
When doing cleanup of the multifd send threads we're calling
QLIST_REMOVE concurrently on the migration_threads list. This seems to
be the source of the crashes we've seen on the
multifd/tcp/plain/cancel tests.
I'm running the test in a loop and after a few dozen iterations I see
the crash in dmesg.
QTEST_QEMU_BINARY=./qemu-system-x86_64 \
QEMU_TEST_FLAKY_TESTS=1 \
./tests/qtest/migration-test -p /x86_64/migration/multifd/tcp/plain/cancel
multifdsend_10[11382]: segfault at 18 ip 0000564b77de1e25 sp
00007fdf767fb610 error 6 in qemu-system-x86_64[564b777b4000+e1c000]
Code: ec 10 48 89 7d f8 48 83 7d f8 00 74 58 48 8b 45 f8 48 8b 40 10
48 85 c0 74 14 48 8b 45 f8 48 8b 40 10 48 8b 55 f8 48 8b 52 18 <48> 89
50 18 48 8b 45 f8 48 8b 40 18 48 8b 55 f8 48 8b 52 10 48 89
the offending instruction is a mov dereferencing the
thread->node.le_next pointer at QLIST_REMOVE in MigrationThreadDel:
void MigrationThreadDel(MigrationThread *thread)
{
if (thread) {
QLIST_REMOVE(thread, node);
g_free(thread);
}
}
where:
#define QLIST_REMOVE(elm, field) do { \
if ((elm)->field.le_next != NULL) \
(elm)->field.le_next->field.le_prev = \ <-- HERE
(elm)->field.le_prev; \
*(elm)->field.le_prev = (elm)->field.le_next; \
(elm)->field.le_next = NULL; \
(elm)->field.le_prev = NULL; \
} while (/*CONSTCOND*/0)
The MigrationThreadDel function is called from the multifd threads and
is not under any lock, so several calls can race when accessing the
list.
(I actually hit this first on my fixed-ram branch which changes some
synchronization in multifd and makes the issue more frequent)
CI run: https://gitlab.com/farosas/qemu/-/pipelines/891000519
Fabiano Rosas (3):
migration/multifd: Rename threadinfo.c functions
migration/multifd: Protect accesses to migration_threads
tests/qtest: Re-enable multifd cancel test
migration/migration.c | 4 ++--
migration/multifd.c | 4 ++--
migration/threadinfo.c | 19 ++++++++++++++++---
migration/threadinfo.h | 7 ++-----
tests/qtest/migration-test.c | 10 ++--------
5 files changed, 24 insertions(+), 20 deletions(-)
--
2.35.3
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v2 1/3] migration/multifd: Rename threadinfo.c functions
2023-06-07 16:13 [PATCH v2 0/3] migration: Fix multifd cancel test Fabiano Rosas
@ 2023-06-07 16:13 ` Fabiano Rosas
2023-06-07 16:13 ` [PATCH v2 2/3] migration/multifd: Protect accesses to migration_threads Fabiano Rosas
` (2 subsequent siblings)
3 siblings, 0 replies; 7+ messages in thread
From: Fabiano Rosas @ 2023-06-07 16:13 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Maydell, Juan Quintela, Jiang Jiacheng, Peter Xu,
Leonardo Bras, Philippe Mathieu-Daudé
We're about to add more functions to this file so make it use the same
coding style as the rest of the code.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
---
migration/migration.c | 4 ++--
migration/multifd.c | 4 ++--
migration/threadinfo.c | 4 ++--
migration/threadinfo.h | 5 ++---
4 files changed, 8 insertions(+), 9 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index dc05c6f6ea..3a001dd042 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2922,7 +2922,7 @@ static void *migration_thread(void *opaque)
MigThrError thr_error;
bool urgent = false;
- thread = MigrationThreadAdd("live_migration", qemu_get_thread_id());
+ thread = migration_threads_add("live_migration", qemu_get_thread_id());
rcu_register_thread();
@@ -3000,7 +3000,7 @@ static void *migration_thread(void *opaque)
migration_iteration_finish(s);
object_unref(OBJECT(s));
rcu_unregister_thread();
- MigrationThreadDel(thread);
+ migration_threads_remove(thread);
return NULL;
}
diff --git a/migration/multifd.c b/migration/multifd.c
index 3387d8277f..4c6cee6547 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -651,7 +651,7 @@ static void *multifd_send_thread(void *opaque)
int ret = 0;
bool use_zero_copy_send = migrate_zero_copy_send();
- thread = MigrationThreadAdd(p->name, qemu_get_thread_id());
+ thread = migration_threads_add(p->name, qemu_get_thread_id());
trace_multifd_send_thread_start(p->id);
rcu_register_thread();
@@ -767,7 +767,7 @@ out:
qemu_mutex_unlock(&p->mutex);
rcu_unregister_thread();
- MigrationThreadDel(thread);
+ migration_threads_remove(thread);
trace_multifd_send_thread_end(p->id, p->num_packets, p->total_normal_pages);
return NULL;
diff --git a/migration/threadinfo.c b/migration/threadinfo.c
index 1de8b31855..3dd9b14ae6 100644
--- a/migration/threadinfo.c
+++ b/migration/threadinfo.c
@@ -14,7 +14,7 @@
static QLIST_HEAD(, MigrationThread) migration_threads;
-MigrationThread *MigrationThreadAdd(const char *name, int thread_id)
+MigrationThread *migration_threads_add(const char *name, int thread_id)
{
MigrationThread *thread = g_new0(MigrationThread, 1);
thread->name = name;
@@ -25,7 +25,7 @@ MigrationThread *MigrationThreadAdd(const char *name, int thread_id)
return thread;
}
-void MigrationThreadDel(MigrationThread *thread)
+void migration_threads_remove(MigrationThread *thread)
{
if (thread) {
QLIST_REMOVE(thread, node);
diff --git a/migration/threadinfo.h b/migration/threadinfo.h
index 4d69423c0a..8aa6999d58 100644
--- a/migration/threadinfo.h
+++ b/migration/threadinfo.h
@@ -23,6 +23,5 @@ struct MigrationThread {
QLIST_ENTRY(MigrationThread) node;
};
-MigrationThread *MigrationThreadAdd(const char *name, int thread_id);
-
-void MigrationThreadDel(MigrationThread *info);
+MigrationThread *migration_threads_add(const char *name, int thread_id);
+void migration_threads_remove(MigrationThread *info);
--
2.35.3
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v2 2/3] migration/multifd: Protect accesses to migration_threads
2023-06-07 16:13 [PATCH v2 0/3] migration: Fix multifd cancel test Fabiano Rosas
2023-06-07 16:13 ` [PATCH v2 1/3] migration/multifd: Rename threadinfo.c functions Fabiano Rosas
@ 2023-06-07 16:13 ` Fabiano Rosas
2023-06-09 10:28 ` Juan Quintela
2023-06-07 16:13 ` [PATCH v2 3/3] tests/qtest: Re-enable multifd cancel test Fabiano Rosas
2023-06-07 16:24 ` [PATCH v2 0/3] migration: Fix " Peter Xu
3 siblings, 1 reply; 7+ messages in thread
From: Fabiano Rosas @ 2023-06-07 16:13 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Maydell, Juan Quintela, Jiang Jiacheng, Peter Xu,
Leonardo Bras
This doubly linked list is common for all the multifd and migration
threads so we need to avoid concurrent access.
Add a mutex to protect the data from concurrent access. This fixes a
crash when removing two MigrationThread objects from the list at the
same time during cleanup of multifd threads.
Fixes: 671326201d ("migration: Introduce interface query-migrationthreads")
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
migration/threadinfo.c | 15 ++++++++++++++-
migration/threadinfo.h | 2 --
2 files changed, 14 insertions(+), 3 deletions(-)
diff --git a/migration/threadinfo.c b/migration/threadinfo.c
index 3dd9b14ae6..262990dd75 100644
--- a/migration/threadinfo.c
+++ b/migration/threadinfo.c
@@ -10,23 +10,35 @@
* See the COPYING file in the top-level directory.
*/
+#include "qemu/osdep.h"
+#include "qemu/queue.h"
+#include "qemu/lockable.h"
#include "threadinfo.h"
+QemuMutex migration_threads_lock;
static QLIST_HEAD(, MigrationThread) migration_threads;
+static void __attribute__((constructor)) migration_threads_init(void)
+{
+ qemu_mutex_init(&migration_threads_lock);
+}
+
MigrationThread *migration_threads_add(const char *name, int thread_id)
{
MigrationThread *thread = g_new0(MigrationThread, 1);
thread->name = name;
thread->thread_id = thread_id;
- QLIST_INSERT_HEAD(&migration_threads, thread, node);
+ WITH_QEMU_LOCK_GUARD(&migration_threads_lock) {
+ QLIST_INSERT_HEAD(&migration_threads, thread, node);
+ }
return thread;
}
void migration_threads_remove(MigrationThread *thread)
{
+ QEMU_LOCK_GUARD(&migration_threads_lock);
if (thread) {
QLIST_REMOVE(thread, node);
g_free(thread);
@@ -39,6 +51,7 @@ MigrationThreadInfoList *qmp_query_migrationthreads(Error **errp)
MigrationThreadInfoList **tail = &head;
MigrationThread *thread = NULL;
+ QEMU_LOCK_GUARD(&migration_threads_lock);
QLIST_FOREACH(thread, &migration_threads, node) {
MigrationThreadInfo *info = g_new0(MigrationThreadInfo, 1);
info->name = g_strdup(thread->name);
diff --git a/migration/threadinfo.h b/migration/threadinfo.h
index 8aa6999d58..2f356ff312 100644
--- a/migration/threadinfo.h
+++ b/migration/threadinfo.h
@@ -10,8 +10,6 @@
* See the COPYING file in the top-level directory.
*/
-#include "qemu/queue.h"
-#include "qemu/osdep.h"
#include "qapi/error.h"
#include "qapi/qapi-commands-migration.h"
--
2.35.3
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v2 3/3] tests/qtest: Re-enable multifd cancel test
2023-06-07 16:13 [PATCH v2 0/3] migration: Fix multifd cancel test Fabiano Rosas
2023-06-07 16:13 ` [PATCH v2 1/3] migration/multifd: Rename threadinfo.c functions Fabiano Rosas
2023-06-07 16:13 ` [PATCH v2 2/3] migration/multifd: Protect accesses to migration_threads Fabiano Rosas
@ 2023-06-07 16:13 ` Fabiano Rosas
2023-06-09 10:30 ` Juan Quintela
2023-06-07 16:24 ` [PATCH v2 0/3] migration: Fix " Peter Xu
3 siblings, 1 reply; 7+ messages in thread
From: Fabiano Rosas @ 2023-06-07 16:13 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Maydell, Juan Quintela, Jiang Jiacheng, Peter Xu,
Leonardo Bras, Thomas Huth, Laurent Vivier, Paolo Bonzini
We've found the source of flakiness in this test, so re-enable it.
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
tests/qtest/migration-test.c | 10 ++--------
1 file changed, 2 insertions(+), 8 deletions(-)
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index b0c355bbd9..800ad23b75 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2778,14 +2778,8 @@ int main(int argc, char **argv)
}
qtest_add_func("/migration/multifd/tcp/plain/none",
test_multifd_tcp_none);
- /*
- * This test is flaky and sometimes fails in CI and otherwise:
- * don't run unless user opts in via environment variable.
- */
- if (getenv("QEMU_TEST_FLAKY_TESTS")) {
- qtest_add_func("/migration/multifd/tcp/plain/cancel",
- test_multifd_tcp_cancel);
- }
+ qtest_add_func("/migration/multifd/tcp/plain/cancel",
+ test_multifd_tcp_cancel);
qtest_add_func("/migration/multifd/tcp/plain/zlib",
test_multifd_tcp_zlib);
#ifdef CONFIG_ZSTD
--
2.35.3
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v2 0/3] migration: Fix multifd cancel test
2023-06-07 16:13 [PATCH v2 0/3] migration: Fix multifd cancel test Fabiano Rosas
` (2 preceding siblings ...)
2023-06-07 16:13 ` [PATCH v2 3/3] tests/qtest: Re-enable multifd cancel test Fabiano Rosas
@ 2023-06-07 16:24 ` Peter Xu
3 siblings, 0 replies; 7+ messages in thread
From: Peter Xu @ 2023-06-07 16:24 UTC (permalink / raw)
To: Fabiano Rosas
Cc: qemu-devel, Peter Maydell, Juan Quintela, Jiang Jiacheng,
Leonardo Bras
On Wed, Jun 07, 2023 at 01:13:03PM -0300, Fabiano Rosas wrote:
> Fabiano Rosas (3):
> migration/multifd: Rename threadinfo.c functions
> migration/multifd: Protect accesses to migration_threads
> tests/qtest: Re-enable multifd cancel test
Reviewed-by: Peter Xu <peterx@redhat.com>
--
Peter Xu
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v2 2/3] migration/multifd: Protect accesses to migration_threads
2023-06-07 16:13 ` [PATCH v2 2/3] migration/multifd: Protect accesses to migration_threads Fabiano Rosas
@ 2023-06-09 10:28 ` Juan Quintela
0 siblings, 0 replies; 7+ messages in thread
From: Juan Quintela @ 2023-06-09 10:28 UTC (permalink / raw)
To: Fabiano Rosas
Cc: qemu-devel, Peter Maydell, Jiang Jiacheng, Peter Xu,
Leonardo Bras
Fabiano Rosas <farosas@suse.de> wrote:
> This doubly linked list is common for all the multifd and migration
> threads so we need to avoid concurrent access.
>
> Add a mutex to protect the data from concurrent access. This fixes a
> crash when removing two MigrationThread objects from the list at the
> same time during cleanup of multifd threads.
>
> Fixes: 671326201d ("migration: Introduce interface query-migrationthreads")
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v2 3/3] tests/qtest: Re-enable multifd cancel test
2023-06-07 16:13 ` [PATCH v2 3/3] tests/qtest: Re-enable multifd cancel test Fabiano Rosas
@ 2023-06-09 10:30 ` Juan Quintela
0 siblings, 0 replies; 7+ messages in thread
From: Juan Quintela @ 2023-06-09 10:30 UTC (permalink / raw)
To: Fabiano Rosas
Cc: qemu-devel, Peter Maydell, Jiang Jiacheng, Peter Xu,
Leonardo Bras, Thomas Huth, Laurent Vivier, Paolo Bonzini
Fabiano Rosas <farosas@suse.de> wrote:
> We've found the source of flakiness in this test, so re-enable it.
>
> Reviewed-by: Juan Quintela <quintela@redhat.com>
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
See my series sent yesterday.
Your fix is necesary, but we still have another failure. If it takes
too long for "to" guest to die we could be waiting on the wrong
"dst_serial". See my series that I sent yesterday that fixes this.
Queueding patch 1 and 2 for next PULL, waiting until my series get
reviewed to put this one.
Later, Juan.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2023-06-09 10:31 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-06-07 16:13 [PATCH v2 0/3] migration: Fix multifd cancel test Fabiano Rosas
2023-06-07 16:13 ` [PATCH v2 1/3] migration/multifd: Rename threadinfo.c functions Fabiano Rosas
2023-06-07 16:13 ` [PATCH v2 2/3] migration/multifd: Protect accesses to migration_threads Fabiano Rosas
2023-06-09 10:28 ` Juan Quintela
2023-06-07 16:13 ` [PATCH v2 3/3] tests/qtest: Re-enable multifd cancel test Fabiano Rosas
2023-06-09 10:30 ` Juan Quintela
2023-06-07 16:24 ` [PATCH v2 0/3] migration: Fix " Peter Xu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).