From: Fabiano Rosas <farosas@suse.de>
To: qemu-devel@nongnu.org
Cc: Peter Maydell <peter.maydell@linaro.org>,
	Juan Quintela <quintela@redhat.com>,
	Jiang Jiacheng <jiangjiacheng@huawei.com>,
	Peter Xu <peterx@redhat.com>, Leonardo Bras <leobras@redhat.com>
Subject: [PATCH 0/3] migration: Fix multifd cancel test
Date: Tue,  6 Jun 2023 11:45:48 -0300
Message-ID: <20230606144551.24367-1-farosas@suse.de>

When doing cleanup of the multifd send threads we're calling
QLIST_REMOVE concurrently on the migration_threads list. This seems to
be the source of the crashes we've seen on the
multifd/tcp/plain/cancel tests.

I'm running the test in a loop and after a few dozen iterations I see
the crash in dmesg.

  QTEST_QEMU_BINARY=./qemu-system-x86_64 \
  QEMU_TEST_FLAKY_TESTS=1 \
  ./tests/qtest/migration-test -p /x86_64/migration/multifd/tcp/plain/cancel

  multifdsend_10[11382]: segfault at 18 ip 0000564b77de1e25 sp
  00007fdf767fb610 error 6 in qemu-system-x86_64[564b777b4000+e1c000]
  Code: ec 10 48 89 7d f8 48 83 7d f8 00 74 58 48 8b 45 f8 48 8b 40 10
  48 85 c0 74 14 48 8b 45 f8 48 8b 40 10 48 8b 55 f8 48 8b 52 18 <48> 89
  50 18 48 8b 45 f8 48 8b 40 18 48 8b 55 f8 48 8b 52 10 48 89

The offending instruction is a mov that stores through the
thread->node.le_next pointer in QLIST_REMOVE, called from
MigrationThreadDel:

  void MigrationThreadDel(MigrationThread *thread)
  {
      if (thread) {
          QLIST_REMOVE(thread, node);
          g_free(thread);
      }
  }

where:
  #define QLIST_REMOVE(elm, field) do {                   \
          if ((elm)->field.le_next != NULL)               \
                  (elm)->field.le_next->field.le_prev =   \ <-- HERE
                      (elm)->field.le_prev;               \
          *(elm)->field.le_prev = (elm)->field.le_next;   \
          (elm)->field.le_next = NULL;                    \
          (elm)->field.le_prev = NULL;                    \
  } while (/*CONSTCOND*/0)
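
As a cross-check of the fault address, here is a minimal sketch that
computes where le_prev would sit inside the element. The DemoThread
layout is an assumption made for illustration (a pointer, an int, then
the list entry), not a copy of the QEMU definition, and the helper name
is hypothetical. On a typical 64-bit ABI it places node.le_next at
offset 0x10 and node.le_prev at 0x18, which matches the 0x10 and 0x18
displacements visible in the Code: bytes. A store to le_next->le_prev
with le_next == NULL would then fault at address 0x18, which is
consistent with "segfault at 18 ... error 6" (a user-mode write).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Assumed layout for illustration only: a name pointer, a thread id,
 * then a BSD-style list entry (le_next, le_prev).
 */
typedef struct DemoThread {
    const char *name;
    int thread_id;
    struct {
        struct DemoThread *le_next;   /* next element, or NULL at the tail */
        struct DemoThread **le_prev;  /* address of the previous le_next */
    } node;
} DemoThread;

/* Offset of le_prev from the start of an element. */
static size_t le_prev_offset(void)
{
    return offsetof(DemoThread, node.le_prev);
}

/* Offset of le_next from the start of an element. */
static size_t le_next_offset(void)
{
    return offsetof(DemoThread, node.le_next);
}
```

If le_next passed the NULL check and was then reloaded as NULL, some
other thread must have rewritten it in between, for example by removing
the successor element at the same time.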

The MigrationThreadDel function is called from the multifd threads and
is not under any lock, so several calls can race when accessing the
list.
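
One way to serialize the removals is to take a mutex around every list
update. The following is a minimal, self-contained sketch of that idea
using plain pthreads and a hand-rolled list. All names in it (Node,
list_lock, node_add, node_del, run_demo) are hypothetical and it is not
the code from this series, which may use different primitives.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical list element, linked the same way as a QLIST entry. */
typedef struct Node {
    struct Node *next;
    struct Node **prev;   /* address of the pointer that points at us */
} Node;

static Node *head;
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Insert a new node at the head of the list, under the lock. */
static Node *node_add(void)
{
    Node *n = calloc(1, sizeof(*n));

    if (!n) {
        return NULL;
    }
    pthread_mutex_lock(&list_lock);
    n->next = head;
    if (head) {
        head->prev = &n->next;
    }
    head = n;
    n->prev = &head;
    pthread_mutex_unlock(&list_lock);
    return n;
}

/* Unlink and free a node; the lock makes concurrent removals safe. */
static void node_del(Node *n)
{
    if (!n) {
        return;
    }
    pthread_mutex_lock(&list_lock);
    assert(n->prev != NULL);
    if (n->next) {
        n->next->prev = n->prev;
    }
    *n->prev = n->next;
    pthread_mutex_unlock(&list_lock);
    free(n);
}

static void *worker(void *arg)
{
    node_del(arg);
    return NULL;
}

/*
 * Add nthreads nodes, remove each one from its own thread, and return
 * how many nodes are left on the list (-1 on allocation failure).
 */
static int run_demo(int nthreads)
{
    pthread_t *tids = calloc(nthreads, sizeof(*tids));
    Node **nodes = calloc(nthreads, sizeof(*nodes));
    int left = 0;

    if (!tids || !nodes) {
        free(tids);
        free(nodes);
        return -1;
    }
    for (int i = 0; i < nthreads; i++) {
        nodes[i] = node_add();
    }
    for (int i = 0; i < nthreads; i++) {
        if (nodes[i] && pthread_create(&tids[i], NULL, worker, nodes[i])) {
            node_del(nodes[i]);   /* thread creation failed: remove inline */
            nodes[i] = NULL;
        }
    }
    for (int i = 0; i < nthreads; i++) {
        if (nodes[i]) {
            pthread_join(tids[i], NULL);
        }
    }
    for (Node *n = head; n; n = n->next) {
        left++;
    }
    free(tids);
    free(nodes);
    return left;
}
```

Calling run_demo(16) should return 0. Dropping the lock/unlock pair in
node_del reintroduces the kind of race described above, although it may
take many runs to show up.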

(I actually hit this first on my fixed-ram branch, which changes some
synchronization in multifd and makes the issue more frequent.)

CI run: https://gitlab.com/farosas/qemu/-/pipelines/891000519

Fabiano Rosas (3):
  migration/multifd: Rename threadinfo.c functions
  migration/multifd: Protect accesses to migration_threads
  tests/qtest: Re-enable multifd cancel test

 migration/migration.c        |  7 +++++--
 migration/multifd.c          |  5 +++--
 migration/threadinfo.c       | 23 ++++++++++++++++++++---
 migration/threadinfo.h       |  8 ++++----
 tests/qtest/migration-test.c | 10 ++--------
 5 files changed, 34 insertions(+), 19 deletions(-)

-- 
2.35.3



