From: Peter Xu <peterx@redhat.com>
To: Fabiano Rosas <farosas@suse.de>
Cc: qemu-devel@nongnu.org, Bryan Zhang <bryan.zhang@bytedance.com>,
Prasad Pandit <ppandit@redhat.com>,
Yuan Liu <yuan1.liu@intel.com>, Avihai Horon <avihaih@nvidia.com>,
Hao Xiang <hao.xiang@bytedance.com>
Subject: Re: [PATCH 00/14] migration/multifd: Refactor ->send_prepare() and cleanups
Date: Thu, 1 Feb 2024 13:47:45 +0800
Message-ID: <ZbswgRJTXP4yKiuf@x1n>
In-Reply-To: <871q9xjey8.fsf@suse.de>

On Wed, Jan 31, 2024 at 07:49:51PM -0300, Fabiano Rosas wrote:
> peterx@redhat.com writes:
>
> > From: Peter Xu <peterx@redhat.com>
> >
> > This patchset contains quite a few refactorings to current multifd:
> >
> > - It picked up some patches from an old series of mine [0] (the last
> >   patches were dropped, though; I did the cleanup slightly differently):
> >
> >   I still managed to include one patch to split pending_job, but I
> >   rewrote the patch here.
> >
> > - It tries to clean up multiple multifd paths here and there; the
> >   ultimate goal is to redefine send_prepare() to be something like:
> >
> >     p->pages -----------> send_prepare() -------------> IOVs
> >
> >   So there's no functional change yet to multifd_ops besides the
> >   redefined interface for send_prepare(). We may want a separate set of
> >   ops for file migration later.
> >
> > For the second point, one benefit was already demonstrated by Fabiano in
> > his other series [1] on cleaning up zero copy, but this patchset addresses
> > it quite differently, and hopefully also more gradually. The other benefit
> > is that once we have a more concrete API for send_prepare() and can reach
> > an initial consensus, the recent compression accelerator series can be
> > rebased on top of this one.
> >
> > This also prepares for the case where the input is extended beyond
> > p->pages to arbitrary data (like VFIO's potential use case in the
> > future?). But that is also left for later, even if it proves reasonable.
> >
> > Please have a look. Thanks,
> >
> > [0] https://lore.kernel.org/r/20231022201211.452861-1-peterx@redhat.com
> > [1] https://lore.kernel.org/qemu-devel/20240126221943.26628-1-farosas@suse.de
> >
> > Peter Xu (14):
> > migration/multifd: Drop stale comment for multifd zero copy
> > migration/multifd: multifd_send_kick_main()
> > migration/multifd: Drop MultiFDSendParams.quit, cleanup error paths
> > migration/multifd: Postpone reset of MultiFDPages_t
> > migration/multifd: Drop MultiFDSendParams.normal[] array
> > migration/multifd: Separate SYNC request with normal jobs
> > migration/multifd: Simplify locking in sender thread
> > migration/multifd: Drop pages->num check in sender thread
> > migration/multifd: Rename p->num_packets and clean it up
> > migration/multifd: Move total_normal_pages accounting
> > migration/multifd: Move trace_multifd_send|recv()
> > migration/multifd: multifd_send_prepare_header()
> > migration/multifd: Move header prepare/fill into send_prepare()
> > migration/multifd: Forbid spurious wakeups
> >
> > migration/multifd.h | 34 +++--
> > migration/multifd-zlib.c | 11 +-
> > migration/multifd-zstd.c | 11 +-
> > migration/multifd.c | 291 +++++++++++++++++++--------------------
> > 4 files changed, 182 insertions(+), 165 deletions(-)
>
> This series didn't survive my 9999-iteration test on the openSUSE
> machine.
>
> # Running /x86_64/migration/multifd/tcp/tls/x509/reject-anon-client
> ...
> kill_qemu() detected QEMU death from signal 11 (Segmentation fault) (core dumped)
>
>
> #0 0x00005575dda06399 in qemu_mutex_lock_impl (mutex=0x18, file=0x5575ddce9cc3 "../util/qemu-thread-posix.c", line=275) at ../util/qemu-thread-posix.c:92
> #1 0x00005575dda06a94 in qemu_sem_post (sem=0x18) at ../util/qemu-thread-posix.c:275
> #2 0x00005575dd56a512 in multifd_send_thread (opaque=0x5575df054ef8) at ../migration/multifd.c:720
> #3 0x00005575dda0709b in qemu_thread_start (args=0x7fd404001d50) at ../util/qemu-thread-posix.c:541
> #4 0x00007fd45e8a26ea in start_thread (arg=0x7fd3faffd700) at pthread_create.c:477
> #5 0x00007fd45cd2150f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
>
> The multifd thread is posting channels_ready with an already freed
> multifd_send_state.
>
> This is the bug Avihai has hit. We're going into multifd_save_cleanup()
> so early that multifd_new_send_channel_async() hasn't even had the
> chance to set p->running. So it misses the join and frees everything up
> while a second multifd thread is just starting.
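
To make the interleaving concrete, here is a rough timeline reconstructed
from the backtrace and the analysis above; the exact ordering, and the
reading of sem=0x18 as a NULL multifd_send_state plus the offset of
channels_ready, are inferences rather than verified facts:

    main thread                        multifd channel thread
    -----------                        ----------------------
    multifd_new_send_channel_async()
      (p->running not yet set)
    multifd_save_cleanup()
      sees p->running == false,
        so skips qemu_thread_join()
      frees multifd_send_state
                                       multifd_send_thread() starts
                                         qemu_sem_post(&multifd_send_state
                                                        ->channels_ready)
                                         -> crashes in qemu_sem_post()
                                            with sem == 0x18 (frames #0/#1)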

Thanks for doing that.

Would this series make that bug easier to hit? I didn't do much testing on
it; it only survived the smoke test and the CI job I kicked off. I think we
can still decide to fix that issue separately; but if this series makes it
easier to hit, then that's definitely bad.

--
Peter Xu