From: "Daniel P. Berrangé" <berrange@redhat.com>
To: Fabiano Rosas <farosas@suse.de>,
qemu-devel@nongnu.org, armbru@redhat.com,
Juan Quintela <quintela@redhat.com>, Peter Xu <peterx@redhat.com>,
Leonardo Bras <leobras@redhat.com>,
Claudio Fontana <cfontana@suse.de>
Subject: Re: [RFC PATCH v3 15/30] io: Add a pwritev/preadv version that takes a discontiguous iovec
Date: Wed, 17 Jan 2024 14:27:26 +0000
Message-ID: <Zafjzq5YDTfbYzV-@redhat.com>
In-Reply-To: <ZafKft029nRUKC4z@redhat.com>

On Wed, Jan 17, 2024 at 12:39:26PM +0000, Daniel P. Berrangé wrote:
> On Mon, Nov 27, 2023 at 05:25:57PM -0300, Fabiano Rosas wrote:
> > For the upcoming support for fixed-ram migration with multifd, we need
> > to be able to accept an iovec array with non-contiguous data.
> >
> > Add a pwritev and preadv version that splits the array into contiguous
> > segments before writing. With that we can have the ram code continue
> > to add pages in any order and the multifd code continue to send large
> > arrays for reading and writing.
> >
> > Signed-off-by: Fabiano Rosas <farosas@suse.de>
> > ---
> > - split the API that was merged into a single function
> > - use uintptr_t for compatibility with 32-bit
> > ---
> > include/io/channel.h | 26 ++++++++++++++++
> > io/channel.c | 70 ++++++++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 96 insertions(+)
> >
> > diff --git a/include/io/channel.h b/include/io/channel.h
> > index 7986c49c71..25383db5aa 100644
> > --- a/include/io/channel.h
> > +++ b/include/io/channel.h
> > @@ -559,6 +559,19 @@ int qio_channel_close(QIOChannel *ioc,
> > ssize_t qio_channel_pwritev(QIOChannel *ioc, const struct iovec *iov,
> > size_t niov, off_t offset, Error **errp);
> >
> > +/**
> > + * qio_channel_pwritev_all:
> > + * @ioc: the channel object
> > + * @iov: the array of memory regions to write data from
> > + * @niov: the length of the @iov array
> > + * @offset: the iovec offset in the file where to write the data
> > + * @errp: pointer to a NULL-initialized error object
> > + *
> > + * Returns: 0 if all bytes were written, or -1 on error
> > + */
> > +int qio_channel_pwritev_all(QIOChannel *ioc, const struct iovec *iov,
> > + size_t niov, off_t offset, Error **errp);
> > +
> > /**
> > * qio_channel_pwrite
> > * @ioc: the channel object
> > @@ -595,6 +608,19 @@ ssize_t qio_channel_pwrite(QIOChannel *ioc, char *buf, size_t buflen,
> > ssize_t qio_channel_preadv(QIOChannel *ioc, const struct iovec *iov,
> > size_t niov, off_t offset, Error **errp);
> >
> > +/**
> > + * qio_channel_preadv_all:
> > + * @ioc: the channel object
> > + * @iov: the array of memory regions to read data to
> > + * @niov: the length of the @iov array
> > + * @offset: the iovec offset in the file from where to read the data
> > + * @errp: pointer to a NULL-initialized error object
> > + *
> > + * Returns: 0 if all bytes were read, or -1 on error
> > + */
> > +int qio_channel_preadv_all(QIOChannel *ioc, const struct iovec *iov,
> > + size_t niov, off_t offset, Error **errp);
> > +
> > /**
> > * qio_channel_pread
> > * @ioc: the channel object
> > diff --git a/io/channel.c b/io/channel.c
> > index a1f12f8e90..2f1745d052 100644
> > --- a/io/channel.c
> > +++ b/io/channel.c
> > @@ -472,6 +472,69 @@ ssize_t qio_channel_pwritev(QIOChannel *ioc, const struct iovec *iov,
> > return klass->io_pwritev(ioc, iov, niov, offset, errp);
> > }
> >
> > +static int qio_channel_preadv_pwritev_contiguous(QIOChannel *ioc,
> > + const struct iovec *iov,
> > + size_t niov, off_t offset,
> > + bool is_write, Error **errp)
> > +{
> > + ssize_t ret = -1;
> > + int i, slice_idx, slice_num;
> > + uintptr_t base, next, file_offset;
> > + size_t len;
> > +
> > + slice_idx = 0;
> > + slice_num = 1;
> > +
> > + /*
> > + * If the iov array doesn't have contiguous elements, we need to
> > + * split it in slices because we only have one (file) 'offset' for
> > + * the whole iov. Do this here so callers don't need to break the
> > + * iov array themselves.
> > + */
> > + for (i = 0; i < niov; i++, slice_num++) {
> > + base = (uintptr_t) iov[i].iov_base;
> > +
> > + if (i != niov - 1) {
> > + len = iov[i].iov_len;
> > + next = (uintptr_t) iov[i + 1].iov_base;
> > +
> > + if (base + len == next) {
> > + continue;
> > + }
> > + }
> > +
> > + /*
> > + * Use the offset of the first element of the segment that
> > + * we're sending.
> > + */
> > + file_offset = offset + (uintptr_t) iov[slice_idx].iov_base;
> > +
> > + if (is_write) {
> > + ret = qio_channel_pwritev(ioc, &iov[slice_idx], slice_num,
> > + file_offset, errp);
> > + } else {
> > + ret = qio_channel_preadv(ioc, &iov[slice_idx], slice_num,
> > + file_offset, errp);
> > + }
>
> iov_base is a pointer into host RAM, so its value could
> potentially be any 64-bit value.
>
> We're computing file_offset by adding a user-supplied offset
> to this pointer address, and then using the result as an offset
> on disk. First, this could result in 64-bit overflow when 'offset'
> is added to 'iov_base'; second, this could result in a file
> that's 16 exabytes in size (with holes, of course).
>
> I don't get how this is supposed to work, or be used?

I feel like this whole method might become clearer if we separated
out the logic for merging memory-adjacent iovecs into its own helper.
How about adding an 'iov_collapse' method in iov.h / iov.c to do
the merging, and then letting the actual I/O code be simpler?

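
Roughly what I have in mind, as an untested sketch: the name
iov_collapse comes from the suggestion above, but the signature and
semantics below are my assumption, not an existing QEMU API. It copies
the array, merging each run of entries where one entry ends exactly
where the next one begins in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: copy @src into @dst, merging runs of entries
 * whose memory ranges are adjacent (one entry's base + len equals the
 * next entry's base) into a single entry. @dst must have room for
 * @niov entries and must be either @src itself or a non-overlapping
 * array. In-place use is safe because each write goes to an index no
 * greater than the entry currently being read, so unread entries are
 * never clobbered. Returns the number of entries stored in @dst.
 */
static size_t iov_collapse(struct iovec *dst, const struct iovec *src,
                           size_t niov)
{
    size_t out = 0;

    assert(dst != NULL || niov == 0);

    for (size_t i = 0; i < niov; i++) {
        uintptr_t base = (uintptr_t)src[i].iov_base;

        if (out > 0) {
            struct iovec *prev = &dst[out - 1];
            uintptr_t prev_end = (uintptr_t)prev->iov_base + prev->iov_len;

            if (prev_end == base) {
                /* Memory-adjacent: grow the previous segment. */
                prev->iov_len += src[i].iov_len;
                continue;
            }
        }
        /* First entry, or a gap in memory: start a new segment. */
        dst[out++] = src[i];
    }
    return out;
}
```

With the merging factored out, the pwritev/preadv wrapper would only
need to iterate over the collapsed entries and issue one positioned
I/O per entry, which leaves just the question of how each entry's
file offset is derived.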
>
> > +
> > + if (ret < 0) {
> > + break;
> > + }
> > +
> > + slice_idx += slice_num;
> > + slice_num = 0;
> > + }
> > +
> > + return (ret < 0) ? -1 : 0;
> > +}
> > +
> > +int qio_channel_pwritev_all(QIOChannel *ioc, const struct iovec *iov,
> > + size_t niov, off_t offset, Error **errp)
> > +{
> > + return qio_channel_preadv_pwritev_contiguous(ioc, iov, niov,
> > + offset, true, errp);
> > +}
> > +
> > ssize_t qio_channel_pwrite(QIOChannel *ioc, char *buf, size_t buflen,
> > off_t offset, Error **errp)
> > {
> > @@ -501,6 +564,13 @@ ssize_t qio_channel_preadv(QIOChannel *ioc, const struct iovec *iov,
> > return klass->io_preadv(ioc, iov, niov, offset, errp);
> > }
> >
> > +int qio_channel_preadv_all(QIOChannel *ioc, const struct iovec *iov,
> > + size_t niov, off_t offset, Error **errp)
> > +{
> > + return qio_channel_preadv_pwritev_contiguous(ioc, iov, niov,
> > + offset, false, errp);
> > +}
> > +
> > ssize_t qio_channel_pread(QIOChannel *ioc, char *buf, size_t buflen,
> > off_t offset, Error **errp)
> > {
> > --
> > 2.35.3
> >
>
> With regards,
> Daniel
> --
> |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o- https://fstop138.berrange.com :|
> |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|