qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Steven Sistare <steven.sistare@oracle.com>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, Fabiano Rosas <farosas@suse.de>,
	Markus Armbruster <armbru@redhat.com>
Subject: Re: [RFC V1 1/6] migration: SCM_RIGHTS for QEMUFile
Date: Fri, 16 Aug 2024 11:13:20 -0400	[thread overview]
Message-ID: <92e5309e-9b16-4cf4-8ffb-e1383201cbd0@oracle.com> (raw)
In-Reply-To: <Zr5r4_lyKAPVZY3Y@x1n>

On 8/15/2024 4:58 PM, Peter Xu wrote:
> On Sun, Jun 30, 2024 at 12:44:03PM -0700, Steve Sistare wrote:
>> Define functions to put/get file descriptors to/from a QEMUFile, for qio
>> channels that support SCM_RIGHTS.  Maintain ordering such that
>>    put(A), put(fd), put(B)
>> followed by
>>    get(A), get(fd), get(B)
>> always succeeds.  Other get orderings may succeed but are not guaranteed.
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> 
> It's a slight pity that we'll extend fd to qemufiles, rather than changing
> qemufiles back to iochannels.. for the long term we want to remove qemufile.

Thanks, I did not know that removing QEMUFile is a goal.

Currently QEMUFile buffers I/O in userland to reduce system calls.  Do you
plan to eliminate that buffering, or move it to QIOChannel, perhaps in a
new QIOChannelBuffered class?

If you eliminate the buffering, then migration might take longer.
If you keep the buffering, then this patch's logic will still be needed
in the QIOChannelBuffered code.

> Would you think we can start to introduce iochannel-compatible vmstate
> loader from cpr-[exec/transfer] here?  The idea is that we'd want
> vmstate_load_iochannel() then take that from an iochannel and start getting
> rid of qemufile API.  It'll already bring two benefits:
> 
>    - We don't need this patch then I assume, but stick with iochannel API
> 
>    - We can have Error** as last parameter of vmstate_load_iochannel(), then
>      as we discussed in the other thread cpr_state_load() can fail with
>      explicit errors without error_report()s (and as you pointed out, the
>      load side of Error** support is yet missing)
> 
> There's a 3rd potential benefit, and will come to play when we want to
> allow multifd threads to load device states / VMSDs at some point, as
> multifd doesn't maintain qemufiles, but only iochannels.
> 
> I'm not sure whether it adds too much to you yet, but I'm curious how you
> think about it.

A decent idea, but the task quickly mushrooms.  All of the VMSTATE macros used
in cpr.c would need to be converted, and that stack is deep. eg:

   VMSTATE_INT32
     vmstate_info_int32
       put_int32
         qemu_put_sbe32s
           qemu_put_byte
             add_buf_to_iovec
               qemu_fflush
                qio_channel_writev_all

- Steve

>> ---
>>   migration/qemu-file.c  | 83 +++++++++++++++++++++++++++++++++++++++++++++++---
>>   migration/qemu-file.h  |  2 ++
>>   migration/trace-events |  2 ++
>>   3 files changed, 83 insertions(+), 4 deletions(-)
>>
>> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
>> index b6d2f58..424c27d 100644
>> --- a/migration/qemu-file.c
>> +++ b/migration/qemu-file.c
>> @@ -37,6 +37,11 @@
>>   #define IO_BUF_SIZE 32768
>>   #define MAX_IOV_SIZE MIN_CONST(IOV_MAX, 64)
>>   
>> +typedef struct FdEntry {
>> +    QTAILQ_ENTRY(FdEntry) entry;
>> +    int fd;
>> +} FdEntry;
>> +
>>   struct QEMUFile {
>>       QIOChannel *ioc;
>>       bool is_writable;
>> @@ -51,6 +56,9 @@ struct QEMUFile {
>>   
>>       int last_error;
>>       Error *last_error_obj;
>> +
>> +    bool fd_pass;
>> +    QTAILQ_HEAD(, FdEntry) fds;
>>   };
>>   
>>   /*
>> @@ -109,6 +117,8 @@ static QEMUFile *qemu_file_new_impl(QIOChannel *ioc, bool is_writable)
>>       object_ref(ioc);
>>       f->ioc = ioc;
>>       f->is_writable = is_writable;
>> +    f->fd_pass = qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_FD_PASS);
>> +    QTAILQ_INIT(&f->fds);
>>   
>>       return f;
>>   }
>> @@ -310,6 +320,10 @@ static ssize_t coroutine_mixed_fn qemu_fill_buffer(QEMUFile *f)
>>       int len;
>>       int pending;
>>       Error *local_error = NULL;
>> +    g_autofree int *fds = NULL;
>> +    size_t nfd = 0;
>> +    int **pfds = f->fd_pass ? &fds : NULL;
>> +    size_t *pnfd = f->fd_pass ? &nfd : NULL;
>>   
>>       assert(!qemu_file_is_writable(f));
>>   
>> @@ -325,10 +339,9 @@ static ssize_t coroutine_mixed_fn qemu_fill_buffer(QEMUFile *f)
>>       }
>>   
>>       do {
>> -        len = qio_channel_read(f->ioc,
>> -                               (char *)f->buf + pending,
>> -                               IO_BUF_SIZE - pending,
>> -                               &local_error);
>> +        struct iovec iov = { f->buf + pending, IO_BUF_SIZE - pending };
>> +        len = qio_channel_readv_full(f->ioc, &iov, 1, pfds, pnfd, 0,
>> +                                     &local_error);
>>           if (len == QIO_CHANNEL_ERR_BLOCK) {
>>               if (qemu_in_coroutine()) {
>>                   qio_channel_yield(f->ioc, G_IO_IN);
>> @@ -348,9 +361,65 @@ static ssize_t coroutine_mixed_fn qemu_fill_buffer(QEMUFile *f)
>>           qemu_file_set_error_obj(f, len, local_error);
>>       }
>>   
>> +    for (int i = 0; i < nfd; i++) {
>> +        FdEntry *fde = g_new0(FdEntry, 1);
>> +        fde->fd = fds[i];
>> +        QTAILQ_INSERT_TAIL(&f->fds, fde, entry);
>> +    }
>> +
>>       return len;
>>   }
>>   
>> +int qemu_file_put_fd(QEMUFile *f, int fd)
>> +{
>> +    int ret = 0;
>> +    QIOChannel *ioc = qemu_file_get_ioc(f);
>> +    Error *err = NULL;
>> +    struct iovec iov = { (void *)" ", 1 };
>> +
>> +    /*
>> +     * Send a dummy byte so qemu_fill_buffer on the receiving side does not
>> +     * fail with a len=0 error.  Flush first to maintain ordering wrt other
>> +     * data.
>> +     */
>> +
>> +    qemu_fflush(f);
>> +    if (qio_channel_writev_full(ioc, &iov, 1, &fd, 1, 0, &err) < 1) {
>> +        error_report_err(error_copy(err));
>> +        qemu_file_set_error_obj(f, -EIO, err);
>> +        ret = -1;
>> +    }
>> +    trace_qemu_file_put_fd(f->ioc->name, fd, ret);
>> +    return 0;
>> +}
>> +
>> +int qemu_file_get_fd(QEMUFile *f)
>> +{
>> +    int fd = -1;
>> +    FdEntry *fde;
>> +
>> +    if (!f->fd_pass) {
>> +        Error *err = NULL;
>> +        error_setg(&err, "%s does not support fd passing", f->ioc->name);
>> +        error_report_err(error_copy(err));
>> +        qemu_file_set_error_obj(f, -EIO, err);
>> +        goto out;
>> +    }
>> +
>> +    /* Force the dummy byte and its fd passenger to appear. */
>> +    qemu_peek_byte(f, 0);
>> +
>> +    fde = QTAILQ_FIRST(&f->fds);
>> +    if (fde) {
>> +        qemu_get_byte(f);       /* Drop the dummy byte */
>> +        fd = fde->fd;
>> +        QTAILQ_REMOVE(&f->fds, fde, entry);
>> +    }
>> +out:
>> +    trace_qemu_file_get_fd(f->ioc->name, fd);
>> +    return fd;
>> +}
>> +
>>   /** Closes the file
>>    *
>>    * Returns negative error value if any error happened on previous operations or
>> @@ -361,11 +430,17 @@ static ssize_t coroutine_mixed_fn qemu_fill_buffer(QEMUFile *f)
>>    */
>>   int qemu_fclose(QEMUFile *f)
>>   {
>> +    FdEntry *fde, *next;
>>       int ret = qemu_fflush(f);
>>       int ret2 = qio_channel_close(f->ioc, NULL);
>>       if (ret >= 0) {
>>           ret = ret2;
>>       }
>> +    QTAILQ_FOREACH_SAFE(fde, &f->fds, entry, next) {
>> +        warn_report("qemu_fclose: received fd %d was never claimed", fde->fd);
>> +        close(fde->fd);
>> +        g_free(fde);
>> +    }
>>       g_clear_pointer(&f->ioc, object_unref);
>>       error_free(f->last_error_obj);
>>       g_free(f);
>> diff --git a/migration/qemu-file.h b/migration/qemu-file.h
>> index 11c2120..3e47a20 100644
>> --- a/migration/qemu-file.h
>> +++ b/migration/qemu-file.h
>> @@ -79,5 +79,7 @@ size_t qemu_get_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen,
>>                             off_t pos);
>>   
>>   QIOChannel *qemu_file_get_ioc(QEMUFile *file);
>> +int qemu_file_put_fd(QEMUFile *f, int fd);
>> +int qemu_file_get_fd(QEMUFile *f);
>>   
>>   #endif
>> diff --git a/migration/trace-events b/migration/trace-events
>> index 173f2c0..064b22d 100644
>> --- a/migration/trace-events
>> +++ b/migration/trace-events
>> @@ -88,6 +88,8 @@ put_qlist_end(const char *field_name, const char *vmsd_name) "%s(%s)"
>>   
>>   # qemu-file.c
>>   qemu_file_fclose(void) ""
>> +qemu_file_put_fd(const char *name, int fd, int ret) "ioc %s, fd %d -> status %d"
>> +qemu_file_get_fd(const char *name, int fd) "ioc %s -> fd %d"
>>   
>>   # ram.c
>>   get_queued_page(const char *block_name, uint64_t tmp_offset, unsigned long page_abs) "%s/0x%" PRIx64 " page_abs=0x%lx"
>> -- 
>> 1.8.3.1
>>
> 


  reply	other threads:[~2024-08-16 15:14 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-30 19:44 [RFC V1 0/6] Live update: cpr-transfer Steve Sistare
2024-06-30 19:44 ` [RFC V1 1/6] migration: SCM_RIGHTS for QEMUFile Steve Sistare
2024-08-02  8:20   ` Euan Turner
2024-08-05 19:06     ` Steven Sistare
2024-08-15 20:58   ` Peter Xu
2024-08-16 15:13     ` Steven Sistare [this message]
2024-08-16 15:51       ` Peter Xu
2024-06-30 19:44 ` [RFC V1 2/6] migration: VMSTATE_FD Steve Sistare
2024-06-30 19:44 ` [RFC V1 3/6] migration: cpr-transfer save and load Steve Sistare
2024-06-30 19:44 ` [RFC V1 4/6] migration: cpr-uri parameter Steve Sistare
2024-08-15 20:46   ` Peter Xu
2024-08-16 15:13     ` Steven Sistare
2024-06-30 19:44 ` [RFC V1 5/6] migration: cpr-uri option Steve Sistare
2024-06-30 19:44 ` [RFC V1 6/6] migration: cpr-transfer mode Steve Sistare
2024-08-13 21:27   ` Steven Sistare
2024-07-18 15:36 ` [RFC V1 0/6] Live update: cpr-transfer Peter Xu
2024-07-20 20:07   ` Steven Sistare
2024-08-15 20:28     ` Peter Xu
2024-08-16  8:42       ` Daniel P. Berrangé
2024-08-16 15:14         ` Steven Sistare
2024-08-16 16:07           ` Peter Xu
2024-08-16 15:13       ` Steven Sistare
2024-08-16 15:23         ` Peter Xu
2024-08-16 15:36           ` Daniel P. Berrangé
2024-08-16 15:59             ` Peter Xu
2024-08-16 18:34               ` Steven Sistare
2024-08-20 16:29                 ` Steven Sistare
2024-09-04 21:14                   ` Steven Sistare
2024-09-04 22:09                     ` Peter Xu
2024-09-05 17:30                   ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=92e5309e-9b16-4cf4-8ffb-e1383201cbd0@oracle.com \
    --to=steven.sistare@oracle.com \
    --cc=armbru@redhat.com \
    --cc=farosas@suse.de \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).