From: Fabiano Rosas <farosas@suse.de>
To: peterx@redhat.com, qemu-devel@nongnu.org
Cc: Hao Xiang <hao.xiang@bytedance.com>,
Bryan Zhang <bryan.zhang@bytedance.com>,
peterx@redhat.com, Avihai Horon <avihaih@nvidia.com>,
Yuan Liu <yuan1.liu@intel.com>,
Prasad Pandit <ppandit@redhat.com>
Subject: Re: [PATCH v2 06/23] migration/multifd: Separate SYNC request with normal jobs
Date: Fri, 02 Feb 2024 16:21:11 -0300 [thread overview]
Message-ID: <87ttmqhdug.fsf@suse.de> (raw)
In-Reply-To: <20240202102857.110210-7-peterx@redhat.com>
peterx@redhat.com writes:
> From: Peter Xu <peterx@redhat.com>
>
> Multifd provide a threaded model for processing jobs. On sender side,
> there can be two kinds of job: (1) a list of pages to send, or (2) a sync
> request.
>
> The sync request is a very special kind of job. It never contains a page
> array, but only a multifd packet telling the dest side to synchronize with
> sent pages.
>
> Before this patch, both requests use the pending_job field, no matter what
> the request is, it will boost pending_job, while multifd sender thread will
> decrement it after it finishes one job.
>
> However this should be racy, because SYNC is special in that it needs to
> set p->flags with MULTIFD_FLAG_SYNC, showing that this is a sync request.
> Consider a sequence of operations where:
>
> - migration thread enqueue a job to send some pages, pending_job++ (0->1)
>
> - [...before the selected multifd sender thread wakes up...]
>
> - migration thread enqueue another job to sync, pending_job++ (1->2),
> setup p->flags=MULTIFD_FLAG_SYNC
>
> - multifd sender thread wakes up, found pending_job==2
> - send the 1st packet with MULTIFD_FLAG_SYNC and list of pages
> - send the 2nd packet with flags==0 and no pages
>
> This is not expected, because MULTIFD_FLAG_SYNC should hopefully be done
> after all the pages are received. Meanwhile, the 2nd packet will be
> completely useless, which contains zero information.
>
> I didn't verify above, but I think this issue is still benign in that at
> least on the recv side we always receive pages before handling
> MULTIFD_FLAG_SYNC. However that's not always guaranteed and just tricky.
>
> One other reason I want to separate it is using p->flags to communicate
> between the two threads is also not clearly defined, it's very hard to read
> and understand why accessing p->flags is always safe; see the current impl
> of multifd_send_thread() where we tried to cache only p->flags. It doesn't
> need to be that complicated.
>
> This patch introduces pending_sync, a separate flag just to show that the
> requester needs a sync. Alongside, we remove the tricky caching of
> p->flags now because after this patch p->flags should only be used by
> multifd sender thread now, which will be crystal clear. So it is always
> thread safe to access p->flags.
>
> With that, we can also safely convert the pending_job into a boolean,
> because we don't support >1 pending jobs anyway.
>
> Always use atomic ops to access both flags to make sure no cache effect.
> When at it, drop the initial setting of "pending_job = 0" because it's
> always allocated using g_new0().
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
next prev parent reply other threads:[~2024-02-02 19:21 UTC|newest]
Thread overview: 49+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-02 10:28 [PATCH v2 00/23] migration/multifd: Refactor ->send_prepare() and cleanups peterx
2024-02-02 10:28 ` [PATCH v2 01/23] migration/multifd: Drop stale comment for multifd zero copy peterx
2024-02-02 10:28 ` [PATCH v2 02/23] migration/multifd: multifd_send_kick_main() peterx
2024-02-02 10:28 ` [PATCH v2 03/23] migration/multifd: Drop MultiFDSendParams.quit, cleanup error paths peterx
2024-02-02 19:15 ` Fabiano Rosas
2024-02-02 10:28 ` [PATCH v2 04/23] migration/multifd: Postpone reset of MultiFDPages_t peterx
2024-02-02 10:28 ` [PATCH v2 05/23] migration/multifd: Drop MultiFDSendParams.normal[] array peterx
2024-02-09 0:06 ` [External] " Hao Xiang
2024-02-09 12:20 ` Fabiano Rosas
2024-02-14 2:16 ` Hao Xiang
2024-02-14 17:17 ` Fabiano Rosas
2024-02-02 10:28 ` [PATCH v2 06/23] migration/multifd: Separate SYNC request with normal jobs peterx
2024-02-02 19:21 ` Fabiano Rosas [this message]
2024-02-02 10:28 ` [PATCH v2 07/23] migration/multifd: Simplify locking in sender thread peterx
2024-02-02 19:23 ` Fabiano Rosas
2024-02-02 10:28 ` [PATCH v2 08/23] migration/multifd: Drop pages->num check " peterx
2024-02-02 10:28 ` [PATCH v2 09/23] migration/multifd: Rename p->num_packets and clean it up peterx
2024-02-02 10:28 ` [PATCH v2 10/23] migration/multifd: Move total_normal_pages accounting peterx
2024-02-02 10:28 ` [PATCH v2 11/23] migration/multifd: Move trace_multifd_send|recv() peterx
2024-02-02 10:28 ` [PATCH v2 12/23] migration/multifd: multifd_send_prepare_header() peterx
2024-02-02 10:28 ` [PATCH v2 13/23] migration/multifd: Move header prepare/fill into send_prepare() peterx
2024-02-02 19:26 ` Fabiano Rosas
2024-02-02 10:28 ` [PATCH v2 14/23] migration/multifd: Forbid spurious wakeups peterx
2024-02-02 10:28 ` [PATCH v2 15/23] migration/multifd: Split multifd_send_terminate_threads() peterx
2024-02-02 19:28 ` Fabiano Rosas
2024-02-02 10:28 ` [PATCH v2 16/23] migration/multifd: Change retval of multifd_queue_page() peterx
2024-02-02 19:29 ` Fabiano Rosas
2024-02-02 10:28 ` [PATCH v2 17/23] migration/multifd: Change retval of multifd_send_pages() peterx
2024-02-02 19:30 ` Fabiano Rosas
2024-02-02 10:28 ` [PATCH v2 18/23] migration/multifd: Rewrite multifd_queue_page() peterx
2024-02-02 20:47 ` Fabiano Rosas
2024-02-05 4:03 ` Peter Xu
2024-02-02 10:28 ` [PATCH v2 19/23] migration/multifd: Cleanup multifd_save_cleanup() peterx
2024-02-02 20:54 ` Fabiano Rosas
2024-02-05 4:25 ` Peter Xu
2024-02-02 10:28 ` [PATCH v2 20/23] migration/multifd: Cleanup multifd_load_cleanup() peterx
2024-02-02 20:55 ` Fabiano Rosas
2024-02-02 10:28 ` [PATCH v2 21/23] migration/multifd: Stick with send/recv on function names peterx
2024-02-02 21:03 ` Fabiano Rosas
2024-02-02 10:28 ` [PATCH v2 22/23] migration/multifd: Fix MultiFDSendParams.packet_num race peterx
2024-02-02 21:08 ` Fabiano Rosas
2024-02-05 4:05 ` Peter Xu
2024-02-02 10:28 ` [PATCH v2 23/23] migration/multifd: Optimize sender side to be lockless peterx
2024-02-02 21:34 ` Fabiano Rosas
2024-02-05 4:35 ` Peter Xu
2024-02-05 14:10 ` Fabiano Rosas
2024-02-05 14:24 ` Peter Xu
2024-02-05 17:59 ` Fabiano Rosas
2024-02-06 3:05 ` [PATCH v2 00/23] migration/multifd: Refactor ->send_prepare() and cleanups Peter Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87ttmqhdug.fsf@suse.de \
--to=farosas@suse.de \
--cc=avihaih@nvidia.com \
--cc=bryan.zhang@bytedance.com \
--cc=hao.xiang@bytedance.com \
--cc=peterx@redhat.com \
--cc=ppandit@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=yuan1.liu@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).