From: Fabiano Rosas <farosas@suse.de>
To: peterx@redhat.com, qemu-devel@nongnu.org
Cc: Bryan Zhang <bryan.zhang@bytedance.com>,
Prasad Pandit <ppandit@redhat.com>,
peterx@redhat.com, Yuan Liu <yuan1.liu@intel.com>,
Avihai Horon <avihaih@nvidia.com>,
Hao Xiang <hao.xiang@bytedance.com>
Subject: Re: [PATCH 00/14] migration/multifd: Refactor ->send_prepare() and cleanups
Date: Wed, 31 Jan 2024 19:49:51 -0300 [thread overview]
Message-ID: <871q9xjey8.fsf@suse.de> (raw)
In-Reply-To: <20240131103111.306523-1-peterx@redhat.com>
peterx@redhat.com writes:
> From: Peter Xu <peterx@redhat.com>
>
> This patchset contains quite a few refactorings to current multifd:
>
> - It picked up some patches from an old series of mine [0] (the last
> patches were dropped, though; I did the cleanup slightly differently):
>
> I still managed to include one patch to split pending_job, but I
> rewrote the patch here.
>
> - It tries to cleanup multiple multifd paths here and there, the ultimate
> goal is to redefine send_prepare() to be something like:
>
> p->pages -----------> send_prepare() -------------> IOVs
>
> So that there's no obvious change yet on multifd_ops besides redefined
> interface for send_prepare(). We may want a separate OPs for file
> later.
>
> For 2), one benefit is already presented by Fabiano in his other series [1]
> on cleaning up zero copy, but this patchset addressed it quite differently,
> and hopefully also more gradually. The other benefit is for sure if we
> have a more concrete API for send_prepare() and if we can reach an initial
> consensus, then we can have the recent compression accelerators rebased on
> top of this one.
>
> This also prepares for the case where the input can be extended to even not
> any p->pages, but arbitrary data (like VFIO's potential use case in the
> future?). But that will also for later even if reasonable.
>
> Please have a look. Thanks,
>
> [0] https://lore.kernel.org/r/20231022201211.452861-1-peterx@redhat.com
> [1] https://lore.kernel.org/qemu-devel/20240126221943.26628-1-farosas@suse.de
>
> Peter Xu (14):
> migration/multifd: Drop stale comment for multifd zero copy
> migration/multifd: multifd_send_kick_main()
> migration/multifd: Drop MultiFDSendParams.quit, cleanup error paths
> migration/multifd: Postpone reset of MultiFDPages_t
> migration/multifd: Drop MultiFDSendParams.normal[] array
> migration/multifd: Separate SYNC request with normal jobs
> migration/multifd: Simplify locking in sender thread
> migration/multifd: Drop pages->num check in sender thread
> migration/multifd: Rename p->num_packets and clean it up
> migration/multifd: Move total_normal_pages accounting
> migration/multifd: Move trace_multifd_send|recv()
> migration/multifd: multifd_send_prepare_header()
> migration/multifd: Move header prepare/fill into send_prepare()
> migration/multifd: Forbid spurious wakeups
>
> migration/multifd.h | 34 +++--
> migration/multifd-zlib.c | 11 +-
> migration/multifd-zstd.c | 11 +-
> migration/multifd.c | 291 +++++++++++++++++++--------------------
> 4 files changed, 182 insertions(+), 165 deletions(-)
This series didn't survive my 9999 iterations test on the opensuse
machine.
# Running /x86_64/migration/multifd/tcp/tls/x509/reject-anon-client
...
kill_qemu() detected QEMU death from signal 11 (Segmentation fault) (core dumped)
#0 0x00005575dda06399 in qemu_mutex_lock_impl (mutex=0x18, file=0x5575ddce9cc3 "../util/qemu-thread-posix.c", line=275) at ../util/qemu-thread-posix.c:92
#1 0x00005575dda06a94 in qemu_sem_post (sem=0x18) at ../util/qemu-thread-posix.c:275
#2 0x00005575dd56a512 in multifd_send_thread (opaque=0x5575df054ef8) at ../migration/multifd.c:720
#3 0x00005575dda0709b in qemu_thread_start (args=0x7fd404001d50) at ../util/qemu-thread-posix.c:541
#4 0x00007fd45e8a26ea in start_thread (arg=0x7fd3faffd700) at pthread_create.c:477
#5 0x00007fd45cd2150f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
The multifd thread is posting channels_ready with an already freed
multifd_send_state.
This is the bug Avihai has hit. We're going into multifd_save_cleanup()
so early that multifd_new_send_channel_async() hasn't even had the
chance to set p->running. So it misses the join and frees everything up
while a second multifd thread is just starting.
next prev parent reply other threads:[~2024-01-31 22:50 UTC|newest]
Thread overview: 45+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-31 10:30 [PATCH 00/14] migration/multifd: Refactor ->send_prepare() and cleanups peterx
2024-01-31 10:30 ` [PATCH 01/14] migration/multifd: Drop stale comment for multifd zero copy peterx
2024-01-31 10:30 ` [PATCH 02/14] migration/multifd: multifd_send_kick_main() peterx
2024-01-31 10:31 ` [PATCH 03/14] migration/multifd: Drop MultiFDSendParams.quit, cleanup error paths peterx
2024-01-31 15:05 ` Fabiano Rosas
2024-02-01 9:28 ` Peter Xu
2024-02-01 13:30 ` Fabiano Rosas
2024-02-02 0:21 ` Peter Xu
2024-01-31 10:31 ` [PATCH 04/14] migration/multifd: Postpone reset of MultiFDPages_t peterx
2024-01-31 15:27 ` Fabiano Rosas
2024-02-01 10:01 ` Peter Xu
2024-02-01 15:21 ` Fabiano Rosas
2024-02-02 0:28 ` Peter Xu
2024-02-02 0:37 ` Peter Xu
2024-02-02 12:15 ` Fabiano Rosas
2024-01-31 10:31 ` [PATCH 05/14] migration/multifd: Drop MultiFDSendParams.normal[] array peterx
2024-01-31 16:02 ` Fabiano Rosas
2024-01-31 10:31 ` [PATCH 06/14] migration/multifd: Separate SYNC request with normal jobs peterx
2024-01-31 18:45 ` Fabiano Rosas
2024-01-31 10:31 ` [PATCH 07/14] migration/multifd: Simplify locking in sender thread peterx
2024-01-31 20:21 ` Fabiano Rosas
2024-02-01 10:37 ` Peter Xu
2024-01-31 10:31 ` [PATCH 08/14] migration/multifd: Drop pages->num check " peterx
2024-01-31 21:19 ` Fabiano Rosas
2024-01-31 10:31 ` [PATCH 09/14] migration/multifd: Rename p->num_packets and clean it up peterx
2024-01-31 21:24 ` Fabiano Rosas
2024-01-31 10:31 ` [PATCH 10/14] migration/multifd: Move total_normal_pages accounting peterx
2024-01-31 21:26 ` Fabiano Rosas
2024-01-31 10:31 ` [PATCH 11/14] migration/multifd: Move trace_multifd_send|recv() peterx
2024-01-31 21:26 ` Fabiano Rosas
2024-01-31 10:31 ` [PATCH 12/14] migration/multifd: multifd_send_prepare_header() peterx
2024-01-31 21:32 ` Fabiano Rosas
2024-02-01 10:02 ` Peter Xu
2024-01-31 10:31 ` [PATCH 13/14] migration/multifd: Move header prepare/fill into send_prepare() peterx
2024-01-31 21:42 ` Fabiano Rosas
2024-02-01 10:15 ` Peter Xu
2024-02-02 3:57 ` Peter Xu
2024-01-31 10:31 ` [PATCH 14/14] migration/multifd: Forbid spurious wakeups peterx
2024-01-31 21:43 ` Fabiano Rosas
2024-02-01 6:01 ` Peter Xu
2024-01-31 22:49 ` Fabiano Rosas [this message]
2024-02-01 5:47 ` [PATCH 00/14] migration/multifd: Refactor ->send_prepare() and cleanups Peter Xu
2024-02-01 12:51 ` Avihai Horon
2024-02-01 21:46 ` Fabiano Rosas
2024-02-02 2:12 ` Peter Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=871q9xjey8.fsf@suse.de \
--to=farosas@suse.de \
--cc=avihaih@nvidia.com \
--cc=bryan.zhang@bytedance.com \
--cc=hao.xiang@bytedance.com \
--cc=peterx@redhat.com \
--cc=ppandit@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=yuan1.liu@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.