From: Peter Xu <peterx@redhat.com>
To: Fabiano Rosas <farosas@suse.de>
Cc: qemu-devel@nongnu.org, berrange@redhat.com, armbru@redhat.com,
	Juan Quintela <quintela@redhat.com>,
	Leonardo Bras <leobras@redhat.com>,
	Claudio Fontana <cfontana@suse.de>
Subject: Re: [RFC PATCH v3 15/30] io: Add a pwritev/preadv version that takes a discontiguous iovec
Date: Wed, 17 Jan 2024 17:48:14 +0800
Message-ID: <ZaeiXra5hLSo0jnt@x1n>
In-Reply-To: <875xztxhyh.fsf@suse.de>

On Tue, Jan 16, 2024 at 03:15:50PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Mon, Nov 27, 2023 at 05:25:57PM -0300, Fabiano Rosas wrote:
> >> For the upcoming support to fixed-ram migration with multifd, we need
> >> to be able to accept an iovec array with non-contiguous data.
> >> 
> >> Add a pwritev and preadv version that splits the array into contiguous
> >> segments before writing. With that we can have the ram code continue
> >> to add pages in any order and the multifd code continue to send large
> >> arrays for reading and writing.
> >> 
> >> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> >> ---
> >> - split the API that was merged into a single function
> >> - use uintptr_t for compatibility with 32-bit
> >> ---
> >>  include/io/channel.h | 26 ++++++++++++++++
> >>  io/channel.c         | 70 ++++++++++++++++++++++++++++++++++++++++++++
> >>  2 files changed, 96 insertions(+)
> >> 
> >> diff --git a/include/io/channel.h b/include/io/channel.h
> >> index 7986c49c71..25383db5aa 100644
> >> --- a/include/io/channel.h
> >> +++ b/include/io/channel.h
> >> @@ -559,6 +559,19 @@ int qio_channel_close(QIOChannel *ioc,
> >>  ssize_t qio_channel_pwritev(QIOChannel *ioc, const struct iovec *iov,
> >>                              size_t niov, off_t offset, Error **errp);
> >>  
> >> +/**
> >> + * qio_channel_pwritev_all:
> >> + * @ioc: the channel object
> >> + * @iov: the array of memory regions to write data from
> >> + * @niov: the length of the @iov array
> >> + * @offset: the iovec offset in the file where to write the data
> >> + * @errp: pointer to a NULL-initialized error object
> >> + *
> >> + * Returns: 0 if all bytes were written, or -1 on error
> >> + */
> >> +int qio_channel_pwritev_all(QIOChannel *ioc, const struct iovec *iov,
> >> +                            size_t niov, off_t offset, Error **errp);
> >> +
> >>  /**
> >>   * qio_channel_pwrite
> >>   * @ioc: the channel object
> >> @@ -595,6 +608,19 @@ ssize_t qio_channel_pwrite(QIOChannel *ioc, char *buf, size_t buflen,
> >>  ssize_t qio_channel_preadv(QIOChannel *ioc, const struct iovec *iov,
> >>                             size_t niov, off_t offset, Error **errp);
> >>  
> >> +/**
> >> + * qio_channel_preadv_all:
> >> + * @ioc: the channel object
> >> + * @iov: the array of memory regions to read data to
> >> + * @niov: the length of the @iov array
> >> + * @offset: the iovec offset in the file from where to read the data
> >> + * @errp: pointer to a NULL-initialized error object
> >> + *
> >> + * Returns: 0 if all bytes were read, or -1 on error
> >> + */
> >> +int qio_channel_preadv_all(QIOChannel *ioc, const struct iovec *iov,
> >> +                           size_t niov, off_t offset, Error **errp);
> >> +
> >>  /**
> >>   * qio_channel_pread
> >>   * @ioc: the channel object
> >> diff --git a/io/channel.c b/io/channel.c
> >> index a1f12f8e90..2f1745d052 100644
> >> --- a/io/channel.c
> >> +++ b/io/channel.c
> >> @@ -472,6 +472,69 @@ ssize_t qio_channel_pwritev(QIOChannel *ioc, const struct iovec *iov,
> >>      return klass->io_pwritev(ioc, iov, niov, offset, errp);
> >>  }
> >>  
> >> +static int qio_channel_preadv_pwritev_contiguous(QIOChannel *ioc,
> >> +                                                 const struct iovec *iov,
> >> +                                                 size_t niov, off_t offset,
> >> +                                                 bool is_write, Error **errp)
> >> +{
> >> +    ssize_t ret = -1;
> >> +    int i, slice_idx, slice_num;
> >> +    uintptr_t base, next, file_offset;
> >> +    size_t len;
> >> +
> >> +    slice_idx = 0;
> >> +    slice_num = 1;
> >> +
> >> +    /*
> >> +     * If the iov array doesn't have contiguous elements, we need to
> >> +     * split it in slices because we only have one (file) 'offset' for
> >> +     * the whole iov. Do this here so callers don't need to break the
> >> +     * iov array themselves.
> >> +     */
> >> +    for (i = 0; i < niov; i++, slice_num++) {
> >> +        base = (uintptr_t) iov[i].iov_base;
> >> +
> >> +        if (i != niov - 1) {
> >> +            len = iov[i].iov_len;
> >> +            next = (uintptr_t) iov[i + 1].iov_base;
> >> +
> >> +            if (base + len == next) {
> >> +                continue;
> >> +            }
> >> +        }
> >> +
> >> +        /*
> >> +         * Use the offset of the first element of the segment that
> >> +         * we're sending.
> >> +         */
> >> +        file_offset = offset + (uintptr_t) iov[slice_idx].iov_base;
> >> +
> >> +        if (is_write) {
> >> +            ret = qio_channel_pwritev(ioc, &iov[slice_idx], slice_num,
> >> +                                      file_offset, errp);
> >> +        } else {
> >> +            ret = qio_channel_preadv(ioc, &iov[slice_idx], slice_num,
> >> +                                     file_offset, errp);
> >> +        }
> >> +
> >> +        if (ret < 0) {
> >> +            break;
> >> +        }
> >> +
> >> +        slice_idx += slice_num;
> >> +        slice_num = 0;
> >> +    }
> >> +
> >> +    return (ret < 0) ? -1 : 0;
> >> +}
> >> +
> >> +int qio_channel_pwritev_all(QIOChannel *ioc, const struct iovec *iov,
> >> +                            size_t niov, off_t offset, Error **errp)
> >> +{
> >> +    return qio_channel_preadv_pwritev_contiguous(ioc, iov, niov,
> >> +                                                 offset, true, errp);
> >> +}
> >
> > I'm not sure how Dan thinks about this, but I don't think this is pretty..
> >
> > With this implementation, iochannels' preadv/pwritev is completely not
> > compatible with most OSes now, afaiu.
> 
> This is internal QEMU code. I hope no one is expecting qio_channel_foo()
> to behave like some OS's foo() system call. We cannot guarantee that
> compatibility save for the simplest of wrappers.

I was expecting that when I started to read. :)

https://man.freebsd.org/cgi/man.cgi?query=pwritev
https://linux.die.net/man/2/pwritev

It's not just "some OSes", it's nearly all of them.  I can understand that
you prefer this approach, but even so, shall we at least avoid using
pwritev/preadv as the names?
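
To make the expectation concrete, here is a minimal sketch (not from the
patch) of what a reader used to the POSIX call would assume: one file
offset for the whole vector, with the buffers landing back to back in the
file no matter where they sit in memory.  The function name and the temp
file path are made up for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Write two separate buffers with a single pwritev() at file offset
 * 'off', then read the 8 bytes back into 'out'.
 * Returns 0 on success, -1 on any failure or short transfer.
 */
static int pwritev_layout_demo(off_t off, char out[8])
{
    char path[] = "/tmp/pwritev-demo-XXXXXX";
    char a[4] = { 'A', 'A', 'A', 'A' };
    char b[4] = { 'B', 'B', 'B', 'B' };
    /* The addresses of a and b play no role in where the data lands. */
    struct iovec iov[2] = {
        { .iov_base = a, .iov_len = sizeof(a) },
        { .iov_base = b, .iov_len = sizeof(b) },
    };
    int ret = -1;
    int fd = mkstemp(path);

    if (fd < 0) {
        return -1;
    }
    unlink(path);

    /* One offset for the whole vector: a goes to off, b to off + 4. */
    if (pwritev(fd, iov, 2, off) == 8 &&
        pread(fd, out, 8, off) == 8) {
        ret = 0;
    }

    close(fd);
    return ret;
}
```

With the patch as posted, the same call shape would instead place each
contiguous run at 'offset' plus the host address of its first buffer, which
is the part I would not guess from the name.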

> 
> >
> > The definition of offset in preadv/pwritev of current iochannel is hard to
> > understand.. if I read it right it'll later be set to:
> >       
> >                 /*
> >                  * If we subtract the host page now, we don't need to
> >                  * pass it into qio_channel_pwritev_all() below.
> >                  */
> >                 write_base = p->pages->block->pages_offset -
> >                     (uintptr_t)p->pages->block->host;
> >
> > Which I cannot easily tell what it is.. besides being an unsigned int.
> 
> This description was unfortunately dropped along the way:
> 
> "Since iovs can be non contiguous, we'd need a separate array on the
> side to carry an extra file offset for each of them, so I'm relying on
> the fact that iovs are all within a same host page and passing in an
> encoded offset that takes the host page into account."
> 
> > IIUC it's also based on the assumption that the host address of each iov
> > entry is linear to its offset in the file, but it may not be true for
> > future iochannel users of such interface called as pwritev/preadv.  So
> > error prone.
> 
> Yes, but it's also our choice whether to make this a generic API. We may
> have good reasons to consider a migration-specific function here.
> 
> > Would it be possible we keep using the offset array (p->pages->offset[x])?
> > We have it already anyway, right?  Wouldn't that be clearer?
> >
> 
> We'd have to make a copy of the array because p->pages is expected to
> change while the IO happens.

Hmm, I don't see why p->pages can change.  IIUC p->pages stays stable at
least until all IO syscalls have completed; only then will the next call
to, e.g., multifd_send_pages() swap it with multifd_send_state->pages.  But
I think I get your point, given the below.

> And while we already have a copy in
> p->normal, my intention for multifd was to eliminate p->normal in the
> future, so it would be nice if we could avoid it.
> 
> Also, we cannot use p->pages->offset alone because we still need the
> pages_offset, i.e. the file offset where that ramblocks's pages begin.
> So that means also adding that to each element of the new array.
> 
> It would probably be overall clearer and less wasteful to pass in the
> host page address instead of an array of offsets. I don't see an issue
> with restricting the iovs to the same host page. The migration code is
> the only user for this code and AFAIK we don't have plans to change that
> invariant.

So I think I get your point now.  My only remaining concern (besides
naming) is that I still want to avoid an interface that contains a
hard-to-understand field like write_base.

How about this?

  /**
   * multifd_write_ramblock_iov: Write IO vector (of ramblock) to channel
   *
   * @ioc: The iochannel to write to.  The IOC must implement the
   *       pwritev/preadv interface.
   * @iov: The IO vector to write.  All addresses must be within the
   *       ramblock host address range.
   * @iov_len: The IO vector size
   * @ramblock: The ramblock that covers all buffers in this IO vector
   */
  int multifd_write_ramblock_iov(ioc, iov, iov_len, ramblock);
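
A rough sketch of how such a helper could work, under stated assumptions:
it uses a plain file descriptor and POSIX pwritev() in place of the
iochannel, and DemoRamBlock is a hypothetical stand-in holding only the
fields needed here (host, length, pages_offset).  It is not the QEMU
implementation; short writes are simply treated as errors.  The point is
that the file offset is derived inside the helper from the ramblock, so the
caller never has to pass an encoded value.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical stand-in for the few RAMBlock fields used here. */
typedef struct {
    uint8_t *host;         /* start of the block in host memory */
    size_t   len;          /* size of the block in bytes */
    off_t    pages_offset; /* file offset where this block's pages begin */
} DemoRamBlock;

/*
 * Write each iov entry at pages_offset + (iov_base - host).  Entries that
 * are adjacent in memory are merged into a single pwritev() call.
 * Returns 0 on success, -1 on error, short write, or out-of-range buffer.
 */
static int demo_write_ramblock_iov(int fd, const struct iovec *iov,
                                   int iov_len, const DemoRamBlock *rb)
{
    uintptr_t host = (uintptr_t)rb->host;
    int start = 0;

    while (start < iov_len) {
        size_t total = 0;
        int end = start;
        off_t off;

        /* Extend the run while the next entry begins where this one ends. */
        for (;;) {
            uintptr_t p = (uintptr_t)iov[end].iov_base;
            size_t n = iov[end].iov_len;

            if (p < host || n > rb->len || p - host > rb->len - n) {
                return -1; /* buffer is not inside the ramblock */
            }
            total += n;
            end++;
            if (end == iov_len || (uintptr_t)iov[end].iov_base != p + n) {
                break;
            }
        }

        /* File offset of the run comes from its first buffer's address. */
        off = rb->pages_offset +
              (off_t)((uintptr_t)iov[start].iov_base - host);

        if (pwritev(fd, &iov[start], end - start, off) != (ssize_t)total) {
            return -1;
        }
        start = end;
    }

    return 0;
}
```

For example, with pages_offset = 100 and 16-byte "pages", an iov covering
pages 0, 1 and 3 would issue two writes: 32 bytes at offset 100 and 16
bytes at offset 148.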

-- 
Peter Xu


