All of lore.kernel.org
 help / color / mirror / Atom feed
From: Fabiano Rosas <farosas@suse.de>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, Hao Xiang <hao.xiang@bytedance.com>,
	Yuan Liu <yuan1.liu@intel.com>,
	Bryan Zhang <bryan.zhang@bytedance.com>,
	Avihai Horon <avihaih@nvidia.com>
Subject: Re: [PATCH 1/5] migration/multifd: Separate compression ops from non-compression
Date: Tue, 30 Jan 2024 12:11:47 -0300	[thread overview]
Message-ID: <8734uedff0.fsf@suse.de> (raw)
In-Reply-To: <Zbi2XDfeJHcUpUp9@x1n>

Peter Xu <peterx@redhat.com> writes:

> On Mon, Jan 29, 2024 at 09:42:24AM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > On Fri, Jan 26, 2024 at 07:19:39PM -0300, Fabiano Rosas wrote:
>> >> +static MultiFDMethods multifd_socket_ops = {
>> >> +    .send_setup = multifd_socket_send_setup,
>> >> +    .send_cleanup = multifd_socket_send_cleanup,
>> >> +    .send_prepare = multifd_socket_send_prepare,
>> >
>> > Here it's named with "socket", however not all socket-based multifd
>> > migrations will go into this route, e.g., when zstd compression enabled it
>> > will not go via this route, even if zstd also uses sockets as transport.
>> > From that pov, this may be slightly confusing.  Maybe it suites more to be
>> > called "socket_plain" / "socket_no_comp"?
>> >
>> > One step back, I had a feeling that the current proposal tried to provide a
>> > single ->ops to cover a model where we may need more than one layer of
>> > abstraction.
>> >
>> > Since it might be helpful to allow multifd send arbitrary data (e.g. for
>> > VFIO?  Avihai might have an answer there..), I'll try to even consider that
>> > into the picture.
>> >
>> > Let's consider the ultimate goal of multifd, where the simplest model could
>> > look like this in my mind (I'm only discussing sender side, but it'll be
>> > similar on recv side):
>> >
>> >                prepare()           send()
>> >   Input   ----------------> IOVs ------------> iochannels
>> >
>> > [I used prepare/send, but please think them as generic terms, not 100%
>> >  aligned with what we have with existing multifd_ops, or what you proposed
>> >  later]
>> >
>> > Here what are sure, IMHO, is:
>> >
>> >   - We always can have some input data to dump; I didn't use "guest pages"
>> >     just to say we may allow arbitrary data.  For any multifd user that
>> >     would like to dump arbitrary data, they can already provide IOVs, so
>> >     here input can be either "MultiFDPages_t" or "IOVs".
>> 
>> Or anything else, since the client code also has control over send(),
>> no? So it could give multifd a pointer to some memory and then use
>> send() to do whatever it wants with it. Multifd is just providing worker
>> threads and "scheduling".
>
> IOVs contain the case of one single buffer, where n_iovs==1.  Here I
> mentioned IOVs explicitly because I want to make it part of the protocol so
> that the interface might be clearer, on what is not changing, and what can
> change for a multifd client.

Got it. I agree.

>> 
>> Also note that multifd clients currently _do not_ provide IOVs. They
>> merely provide data to multifd (p->pages) and then convert that data
>> into IOVs at prepare(). This is different, because multifd currently
>> holds that p->pages (and turns that into p->normal), which means the
>> client code does not need to store the data across iterations (in the
>> case of RAM which is iterative).
>
> They provide?  AFAIU that's exactly MultiFDSendParams.iov as of now, while
> iov_nums is the length.

Before that, the ram code needs to pass in the p->pages->offset array
first. Then, that gets put into p->normal. Then, that gets put into
p->iov at prepare(). So it's not a simple "fill p->iov and pass it to
multifd".

Hmm, could we just replace multifd_send_state->pages with a
multifd_send_state->iov? I don't really understand why do we need to
carry that pages->offset around.

>> 
>> >
>> >   - We may always want to have IOVs to represent the buffers at some point,
>> >     no matter what the input it
>> >
>> >   - We always flush the IOVs to iochannels; basically I want to say we can
>> >     always assume the last layer is connecting to QIOChannel APIs, while I
>> >     don't think there's outliers here so far, even if the send() may differ.
>> >
>> > Then _maybe_ it's clearer that we can have two layers of OPs?
>> >
>> >   - prepare(): it tells how the "input" will be converted into a scatter
>> >     gatter list of buffers.  All compression methods fall into this afaiu.
>> >     This has _nothing_ to do on how the buffers will be sent.  For
>> >     arbitrary-typed input, this can already be a no-op since the IOVs
>> >     provided can already be passed over to send().
>> >
>> >   - send(): how to dump the IOVs to the iochannels.  AFAIU this is motly
>> >     only useful for fixed-ram migrations.
>> >
>> > Would this be clearer, rather than keep using a single multifd_ops?
>> 
>> Sorry, I don't see how what you describe is any different than what we
>> have. And I don't see how any of this would mean more than one
>> multifd_ops. We already have multifd_ops->prepare() and
>> multifd_ops->send(). What am I missing?
>
> I meant instead of having a single MultiFDMethods, we can have
> MultifdPrepareOps and MultifdSendOps separately.
>
> Now with single MultiFDMethods, it must always provide e.g. both prepare()
> and send() in one set of OPs for one use case.  What I wanted to say is
> maybe it is cleaner we split it into two OPs, then all the socket-based
> scenarios can already stick with the same send() method, even though they
> can prepare() differently.

Hmm, so zlib/zstd implement all ops except for the send one. And
socket_plain and file implement all prepare hooks plus the send. So we'd
have sort of a data handling layer and a transport layer. I'll see how
it looks.

>
> IOW, for this base patchset to pave way for compression accelerators, IIUC
> we don't need a send() yet so far?  Should they still work pretty well with
> qio_channel_writev_full_all() with proper touchups on p->write_flags just
> for zero copy purposes?

Yes. The point here is to just give everyone a heads-up so we avoid
changing the code in incompatible ways.

>
> I'll have a read again to your previous multifd-packet-cleanups branch I
> guess.  but this series definitely doesn't apply there already.

multifd-packet-cleanups attempts to replace MultiFDPages_t with a
generic data structure. That's a separate issue.


  reply	other threads:[~2024-01-30 15:12 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-26 22:19 [PATCH 0/5] migration/multifd: Prerequisite cleanups for ongoing work Fabiano Rosas
2024-01-26 22:19 ` [PATCH 1/5] migration/multifd: Separate compression ops from non-compression Fabiano Rosas
2024-01-29  6:29   ` Peter Xu
2024-01-29 12:42     ` Fabiano Rosas
2024-01-30  8:42       ` Peter Xu
2024-01-30 15:11         ` Fabiano Rosas [this message]
2024-01-31  7:24           ` Peter Xu
2024-01-31 13:14             ` Fabiano Rosas
2024-02-01  3:25               ` Peter Xu
2024-01-26 22:19 ` [PATCH 2/5] migration/multifd: Move multifd_socket_ops to socket.c Fabiano Rosas
2024-01-26 22:19 ` [PATCH 3/5] migration/multifd: Add multifd_ops->send Fabiano Rosas
2024-01-26 22:19 ` [PATCH 4/5] migration/multifd: Simplify zero copy send Fabiano Rosas
2024-01-26 22:19 ` [PATCH 5/5] migration/multifd: Move zero copy flag into multifd_socket_setup Fabiano Rosas
2024-01-29  1:41 ` [PATCH 0/5] migration/multifd: Prerequisite cleanups for ongoing work Liu, Yuan1
2024-01-29  7:36   ` Peter Xu
2024-01-29 12:51     ` Fabiano Rosas
2024-01-31  9:29       ` Peter Xu
2024-01-31 13:19         ` Fabiano Rosas
2024-02-01  1:11           ` [External] " Hao Xiang
2024-02-01 13:23             ` Fabiano Rosas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8734uedff0.fsf@suse.de \
    --to=farosas@suse.de \
    --cc=avihaih@nvidia.com \
    --cc=bryan.zhang@bytedance.com \
    --cc=hao.xiang@bytedance.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=yuan1.liu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.