From: Fabiano Rosas <farosas@suse.de>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, Hao Xiang <hao.xiang@bytedance.com>,
Yuan Liu <yuan1.liu@intel.com>,
Bryan Zhang <bryan.zhang@bytedance.com>,
Avihai Horon <avihaih@nvidia.com>
Subject: Re: [PATCH 1/5] migration/multifd: Separate compression ops from non-compression
Date: Wed, 31 Jan 2024 10:14:58 -0300 [thread overview]
Message-ID: <875xz9lk4t.fsf@suse.de> (raw)
In-Reply-To: <Zbn1mcXXGKdTTz6O@x1n>
Peter Xu <peterx@redhat.com> writes:
> On Tue, Jan 30, 2024 at 12:11:47PM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>>
>> > On Mon, Jan 29, 2024 at 09:42:24AM -0300, Fabiano Rosas wrote:
>> >> Peter Xu <peterx@redhat.com> writes:
>> >>
>> >> > On Fri, Jan 26, 2024 at 07:19:39PM -0300, Fabiano Rosas wrote:
>> >> >> +static MultiFDMethods multifd_socket_ops = {
>> >> >> + .send_setup = multifd_socket_send_setup,
>> >> >> + .send_cleanup = multifd_socket_send_cleanup,
>> >> >> + .send_prepare = multifd_socket_send_prepare,
>> >> >
>> >> > Here it's named with "socket", however not all socket-based multifd
>> >> > migrations will go into this route, e.g., when zstd compression enabled it
>> >> > will not go via this route, even if zstd also uses sockets as transport.
>> >> > From that pov, this may be slightly confusing. Maybe it suites more to be
>> >> > called "socket_plain" / "socket_no_comp"?
>> >> >
>> >> > One step back, I had a feeling that the current proposal tried to provide a
>> >> > single ->ops to cover a model where we may need more than one layer of
>> >> > abstraction.
>> >> >
>> >> > Since it might be helpful to allow multifd send arbitrary data (e.g. for
>> >> > VFIO? Avihai might have an answer there..), I'll try to even consider that
>> >> > into the picture.
>> >> >
>> >> > Let's consider the ultimate goal of multifd, where the simplest model could
>> >> > look like this in my mind (I'm only discussing sender side, but it'll be
>> >> > similar on recv side):
>> >> >
>> >> > prepare() send()
>> >> > Input ----------------> IOVs ------------> iochannels
>> >> >
>> >> > [I used prepare/send, but please think them as generic terms, not 100%
>> >> > aligned with what we have with existing multifd_ops, or what you proposed
>> >> > later]
>> >> >
>> >> > Here what are sure, IMHO, is:
>> >> >
>> >> > - We always can have some input data to dump; I didn't use "guest pages"
>> >> > just to say we may allow arbitrary data. For any multifd user that
>> >> > would like to dump arbitrary data, they can already provide IOVs, so
>> >> > here input can be either "MultiFDPages_t" or "IOVs".
>> >>
>> >> Or anything else, since the client code also has control over send(),
>> >> no? So it could give multifd a pointer to some memory and then use
>> >> send() to do whatever it wants with it. Multifd is just providing worker
>> >> threads and "scheduling".
>> >
>> > IOVs contain the case of one single buffer, where n_iovs==1. Here I
>> > mentioned IOVs explicitly because I want to make it part of the protocol so
>> > that the interface might be clearer, on what is not changing, and what can
>> > change for a multifd client.
>>
>> Got it. I agree.
>>
>> >>
>> >> Also note that multifd clients currently _do not_ provide IOVs. They
>> >> merely provide data to multifd (p->pages) and then convert that data
>> >> into IOVs at prepare(). This is different, because multifd currently
>> >> holds that p->pages (and turns that into p->normal), which means the
>> >> client code does not need to store the data across iterations (in the
>> >> case of RAM which is iterative).
>> >
>> > They provide? AFAIU that's exactly MultiFDSendParams.iov as of now, while
>> > iov_nums is the length.
>>
>> Before that, the ram code needs to pass in the p->pages->offset array
>> first. Then, that gets put into p->normal. Then, that gets put into
>> p->iov at prepare(). So it's not a simple "fill p->iov and pass it to
>> multifd".
>>
>> Hmm, could we just replace multifd_send_state->pages with a
>> multifd_send_state->iov? I don't really understand why do we need to
>> carry that pages->offset around.
>
> I am thinking the p->normal is mostly redundant.. at least on the sender
> side that I just read. Since I'll be preparing a new spin of the multifd
> cleanup series I posted, maybe I can append one more to try dropping
> p->normal[] completely.
Just for reference, you don't have to use it, but I have this patch:
https://gitlab.com/farosas/qemu/-/commit/4316e145ae7e7bf378ef7fde64c2b02260362847
>>
>> >>
>> >> >
>> >> > - We may always want to have IOVs to represent the buffers at some point,
>> >> > no matter what the input it
>> >> >
>> >> > - We always flush the IOVs to iochannels; basically I want to say we can
>> >> > always assume the last layer is connecting to QIOChannel APIs, while I
>> >> > don't think there's outliers here so far, even if the send() may differ.
>> >> >
>> >> > Then _maybe_ it's clearer that we can have two layers of OPs?
>> >> >
>> >> > - prepare(): it tells how the "input" will be converted into a scatter
>> >> > gatter list of buffers. All compression methods fall into this afaiu.
>> >> > This has _nothing_ to do on how the buffers will be sent. For
>> >> > arbitrary-typed input, this can already be a no-op since the IOVs
>> >> > provided can already be passed over to send().
>> >> >
>> >> > - send(): how to dump the IOVs to the iochannels. AFAIU this is motly
>> >> > only useful for fixed-ram migrations.
>> >> >
>> >> > Would this be clearer, rather than keep using a single multifd_ops?
>> >>
>> >> Sorry, I don't see how what you describe is any different than what we
>> >> have. And I don't see how any of this would mean more than one
>> >> multifd_ops. We already have multifd_ops->prepare() and
>> >> multifd_ops->send(). What am I missing?
>> >
>> > I meant instead of having a single MultiFDMethods, we can have
>> > MultifdPrepareOps and MultifdSendOps separately.
>> >
>> > Now with single MultiFDMethods, it must always provide e.g. both prepare()
>> > and send() in one set of OPs for one use case. What I wanted to say is
>> > maybe it is cleaner we split it into two OPs, then all the socket-based
>> > scenarios can already stick with the same send() method, even though they
>> > can prepare() differently.
>>
>> Hmm, so zlib/zstd implement all ops except for the send one. And
>> socket_plain and file implement all prepare hooks plus the send. So we'd
>> have sort of a data handling layer and a transport layer. I'll see how
>> it looks.
>
> Yeah something like that if you agree; I'd think socket_plain can also use
> the same socket send() with all the rest? Again, I don't see its specialty
> except the zero copy possibility, while the latter should be able to be
> covered by proper setup of p->write_flags.
>
I see. Makes sense.
>>
>> >
>> > IOW, for this base patchset to pave way for compression accelerators, IIUC
>> > we don't need a send() yet so far? Should they still work pretty well with
>> > qio_channel_writev_full_all() with proper touchups on p->write_flags just
>> > for zero copy purposes?
>>
>> Yes. The point here is to just give everyone a heads-up so we avoid
>> changing the code in incompatible ways.
>>
>> >
>> > I'll have a read again to your previous multifd-packet-cleanups branch I
>> > guess. but this series definitely doesn't apply there already.
>>
>> multifd-packet-cleanups attempts to replace MultiFDPages_t with a
>> generic data structure. That's a separate issue.
>>
next prev parent reply other threads:[~2024-01-31 13:15 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-26 22:19 [PATCH 0/5] migration/multifd: Prerequisite cleanups for ongoing work Fabiano Rosas
2024-01-26 22:19 ` [PATCH 1/5] migration/multifd: Separate compression ops from non-compression Fabiano Rosas
2024-01-29 6:29 ` Peter Xu
2024-01-29 12:42 ` Fabiano Rosas
2024-01-30 8:42 ` Peter Xu
2024-01-30 15:11 ` Fabiano Rosas
2024-01-31 7:24 ` Peter Xu
2024-01-31 13:14 ` Fabiano Rosas [this message]
2024-02-01 3:25 ` Peter Xu
2024-01-26 22:19 ` [PATCH 2/5] migration/multifd: Move multifd_socket_ops to socket.c Fabiano Rosas
2024-01-26 22:19 ` [PATCH 3/5] migration/multifd: Add multifd_ops->send Fabiano Rosas
2024-01-26 22:19 ` [PATCH 4/5] migration/multifd: Simplify zero copy send Fabiano Rosas
2024-01-26 22:19 ` [PATCH 5/5] migration/multifd: Move zero copy flag into multifd_socket_setup Fabiano Rosas
2024-01-29 1:41 ` [PATCH 0/5] migration/multifd: Prerequisite cleanups for ongoing work Liu, Yuan1
2024-01-29 7:36 ` Peter Xu
2024-01-29 12:51 ` Fabiano Rosas
2024-01-31 9:29 ` Peter Xu
2024-01-31 13:19 ` Fabiano Rosas
2024-02-01 1:11 ` [External] " Hao Xiang
2024-02-01 13:23 ` Fabiano Rosas
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=875xz9lk4t.fsf@suse.de \
--to=farosas@suse.de \
--cc=avihaih@nvidia.com \
--cc=bryan.zhang@bytedance.com \
--cc=hao.xiang@bytedance.com \
--cc=peterx@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=yuan1.liu@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.