From: Steven Sistare <steven.sistare@oracle.com>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, Fabiano Rosas <farosas@suse.de>,
Markus Armbruster <armbru@redhat.com>
Subject: Re: [RFC V1 1/6] migration: SCM_RIGHTS for QEMUFile
Date: Fri, 16 Aug 2024 11:13:20 -0400 [thread overview]
Message-ID: <92e5309e-9b16-4cf4-8ffb-e1383201cbd0@oracle.com> (raw)
In-Reply-To: <Zr5r4_lyKAPVZY3Y@x1n>
On 8/15/2024 4:58 PM, Peter Xu wrote:
> On Sun, Jun 30, 2024 at 12:44:03PM -0700, Steve Sistare wrote:
>> Define functions to put/get file descriptors to/from a QEMUFile, for qio
>> channels that support SCM_RIGHTS. Maintain ordering such that
>> put(A), put(fd), put(B)
>> followed by
>> get(A), get(fd), get(B)
>> always succeeds. Other get orderings may succeed but are not guaranteed.
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>
> It's a slight pity that we'll extend fd to qemufiles, rather than changing
> qemufiles back to iochannels.. for the long term we want to remove qemufile.
Thanks, I did not know that removing QEMUFile is a goal.
Currently QEMUFile buffers I/O in userland to reduce system calls. Do you
plan to eliminate that buffering, or move it to QIOChannel, perhaps in a
new QIOChannelBuffered class?
If you eliminate the buffering, then migration might take longer.
If you keep the buffering, then this patch's logic will still be needed
in the QIOChannelBuffered code.
> Would you think we can start to introduce iochannel-compatible vmstate
> loader from cpr-[exec/transfer] here? The idea is that we'd want
> vmstate_load_iochannel() then take that from an iochannel and start getting
> rid of qemufile API. It'll already bring two benefits:
>
> - We don't need this patch then I assume, but stick with iochannel API
>
> - We can have Error** as last parameter of vmstate_load_iochannel(), then
> as we discussed in the other thread cpr_state_load() can fail with
> explicit errors without error_report()s (and as you pointed out, the
> load side of Error** support is yet missing)
>
> There's a 3rd potential benefit, and will come to play when we want to
> allow multifd threads to load device states / VMSDs at some point, as
> multifd doesn't maintain qemufiles, but only iochannels.
>
> I'm not sure whether it adds too much to you yet, but I'm curious how you
> think about it.
A decent idea, but the task quickly mushrooms. All of the VMSTATE macros used
in cpr.c would need to be converted, and that stack is deep. eg:
VMSTATE_INT32
vmstate_info_int32
put_int32
qemu_put_sbe32s
qemu_put_byte
add_buf_to_iovec
qemu_fflush
qio_channel_writev_all
- Steve
>> ---
>> migration/qemu-file.c | 83 +++++++++++++++++++++++++++++++++++++++++++++++---
>> migration/qemu-file.h | 2 ++
>> migration/trace-events | 2 ++
>> 3 files changed, 83 insertions(+), 4 deletions(-)
>>
>> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
>> index b6d2f58..424c27d 100644
>> --- a/migration/qemu-file.c
>> +++ b/migration/qemu-file.c
>> @@ -37,6 +37,11 @@
>> #define IO_BUF_SIZE 32768
>> #define MAX_IOV_SIZE MIN_CONST(IOV_MAX, 64)
>>
>> +typedef struct FdEntry {
>> + QTAILQ_ENTRY(FdEntry) entry;
>> + int fd;
>> +} FdEntry;
>> +
>> struct QEMUFile {
>> QIOChannel *ioc;
>> bool is_writable;
>> @@ -51,6 +56,9 @@ struct QEMUFile {
>>
>> int last_error;
>> Error *last_error_obj;
>> +
>> + bool fd_pass;
>> + QTAILQ_HEAD(, FdEntry) fds;
>> };
>>
>> /*
>> @@ -109,6 +117,8 @@ static QEMUFile *qemu_file_new_impl(QIOChannel *ioc, bool is_writable)
>> object_ref(ioc);
>> f->ioc = ioc;
>> f->is_writable = is_writable;
>> + f->fd_pass = qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_FD_PASS);
>> + QTAILQ_INIT(&f->fds);
>>
>> return f;
>> }
>> @@ -310,6 +320,10 @@ static ssize_t coroutine_mixed_fn qemu_fill_buffer(QEMUFile *f)
>> int len;
>> int pending;
>> Error *local_error = NULL;
>> + g_autofree int *fds = NULL;
>> + size_t nfd = 0;
>> + int **pfds = f->fd_pass ? &fds : NULL;
>> + size_t *pnfd = f->fd_pass ? &nfd : NULL;
>>
>> assert(!qemu_file_is_writable(f));
>>
>> @@ -325,10 +339,9 @@ static ssize_t coroutine_mixed_fn qemu_fill_buffer(QEMUFile *f)
>> }
>>
>> do {
>> - len = qio_channel_read(f->ioc,
>> - (char *)f->buf + pending,
>> - IO_BUF_SIZE - pending,
>> - &local_error);
>> + struct iovec iov = { f->buf + pending, IO_BUF_SIZE - pending };
>> + len = qio_channel_readv_full(f->ioc, &iov, 1, pfds, pnfd, 0,
>> + &local_error);
>> if (len == QIO_CHANNEL_ERR_BLOCK) {
>> if (qemu_in_coroutine()) {
>> qio_channel_yield(f->ioc, G_IO_IN);
>> @@ -348,9 +361,65 @@ static ssize_t coroutine_mixed_fn qemu_fill_buffer(QEMUFile *f)
>> qemu_file_set_error_obj(f, len, local_error);
>> }
>>
>> + for (int i = 0; i < nfd; i++) {
>> + FdEntry *fde = g_new0(FdEntry, 1);
>> + fde->fd = fds[i];
>> + QTAILQ_INSERT_TAIL(&f->fds, fde, entry);
>> + }
>> +
>> return len;
>> }
>>
>> +int qemu_file_put_fd(QEMUFile *f, int fd)
>> +{
>> + int ret = 0;
>> + QIOChannel *ioc = qemu_file_get_ioc(f);
>> + Error *err = NULL;
>> + struct iovec iov = { (void *)" ", 1 };
>> +
>> + /*
>> + * Send a dummy byte so qemu_fill_buffer on the receiving side does not
>> + * fail with a len=0 error. Flush first to maintain ordering wrt other
>> + * data.
>> + */
>> +
>> + qemu_fflush(f);
>> + if (qio_channel_writev_full(ioc, &iov, 1, &fd, 1, 0, &err) < 1) {
>> + error_report_err(error_copy(err));
>> + qemu_file_set_error_obj(f, -EIO, err);
>> + ret = -1;
>> + }
>> + trace_qemu_file_put_fd(f->ioc->name, fd, ret);
>> + return 0;
>> +}
>> +
>> +int qemu_file_get_fd(QEMUFile *f)
>> +{
>> + int fd = -1;
>> + FdEntry *fde;
>> +
>> + if (!f->fd_pass) {
>> + Error *err = NULL;
>> + error_setg(&err, "%s does not support fd passing", f->ioc->name);
>> + error_report_err(error_copy(err));
>> + qemu_file_set_error_obj(f, -EIO, err);
>> + goto out;
>> + }
>> +
>> + /* Force the dummy byte and its fd passenger to appear. */
>> + qemu_peek_byte(f, 0);
>> +
>> + fde = QTAILQ_FIRST(&f->fds);
>> + if (fde) {
>> + qemu_get_byte(f); /* Drop the dummy byte */
>> + fd = fde->fd;
>> + QTAILQ_REMOVE(&f->fds, fde, entry);
>> + }
>> +out:
>> + trace_qemu_file_get_fd(f->ioc->name, fd);
>> + return fd;
>> +}
>> +
>> /** Closes the file
>> *
>> * Returns negative error value if any error happened on previous operations or
>> @@ -361,11 +430,17 @@ static ssize_t coroutine_mixed_fn qemu_fill_buffer(QEMUFile *f)
>> */
>> int qemu_fclose(QEMUFile *f)
>> {
>> + FdEntry *fde, *next;
>> int ret = qemu_fflush(f);
>> int ret2 = qio_channel_close(f->ioc, NULL);
>> if (ret >= 0) {
>> ret = ret2;
>> }
>> + QTAILQ_FOREACH_SAFE(fde, &f->fds, entry, next) {
>> + warn_report("qemu_fclose: received fd %d was never claimed", fde->fd);
>> + close(fde->fd);
>> + g_free(fde);
>> + }
>> g_clear_pointer(&f->ioc, object_unref);
>> error_free(f->last_error_obj);
>> g_free(f);
>> diff --git a/migration/qemu-file.h b/migration/qemu-file.h
>> index 11c2120..3e47a20 100644
>> --- a/migration/qemu-file.h
>> +++ b/migration/qemu-file.h
>> @@ -79,5 +79,7 @@ size_t qemu_get_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen,
>> off_t pos);
>>
>> QIOChannel *qemu_file_get_ioc(QEMUFile *file);
>> +int qemu_file_put_fd(QEMUFile *f, int fd);
>> +int qemu_file_get_fd(QEMUFile *f);
>>
>> #endif
>> diff --git a/migration/trace-events b/migration/trace-events
>> index 173f2c0..064b22d 100644
>> --- a/migration/trace-events
>> +++ b/migration/trace-events
>> @@ -88,6 +88,8 @@ put_qlist_end(const char *field_name, const char *vmsd_name) "%s(%s)"
>>
>> # qemu-file.c
>> qemu_file_fclose(void) ""
>> +qemu_file_put_fd(const char *name, int fd, int ret) "ioc %s, fd %d -> status %d"
>> +qemu_file_get_fd(const char *name, int fd) "ioc %s -> fd %d"
>>
>> # ram.c
>> get_queued_page(const char *block_name, uint64_t tmp_offset, unsigned long page_abs) "%s/0x%" PRIx64 " page_abs=0x%lx"
>> --
>> 1.8.3.1
>>
>
next prev parent reply other threads:[~2024-08-16 15:14 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-30 19:44 [RFC V1 0/6] Live update: cpr-transfer Steve Sistare
2024-06-30 19:44 ` [RFC V1 1/6] migration: SCM_RIGHTS for QEMUFile Steve Sistare
2024-08-02 8:20 ` Euan Turner
2024-08-05 19:06 ` Steven Sistare
2024-08-15 20:58 ` Peter Xu
2024-08-16 15:13 ` Steven Sistare [this message]
2024-08-16 15:51 ` Peter Xu
2024-06-30 19:44 ` [RFC V1 2/6] migration: VMSTATE_FD Steve Sistare
2024-06-30 19:44 ` [RFC V1 3/6] migration: cpr-transfer save and load Steve Sistare
2024-06-30 19:44 ` [RFC V1 4/6] migration: cpr-uri parameter Steve Sistare
2024-08-15 20:46 ` Peter Xu
2024-08-16 15:13 ` Steven Sistare
2024-06-30 19:44 ` [RFC V1 5/6] migration: cpr-uri option Steve Sistare
2024-06-30 19:44 ` [RFC V1 6/6] migration: cpr-transfer mode Steve Sistare
2024-08-13 21:27 ` Steven Sistare
2024-07-18 15:36 ` [RFC V1 0/6] Live update: cpr-transfer Peter Xu
2024-07-20 20:07 ` Steven Sistare
2024-08-15 20:28 ` Peter Xu
2024-08-16 8:42 ` Daniel P. Berrangé
2024-08-16 15:14 ` Steven Sistare
2024-08-16 16:07 ` Peter Xu
2024-08-16 15:13 ` Steven Sistare
2024-08-16 15:23 ` Peter Xu
2024-08-16 15:36 ` Daniel P. Berrangé
2024-08-16 15:59 ` Peter Xu
2024-08-16 18:34 ` Steven Sistare
2024-08-20 16:29 ` Steven Sistare
2024-09-04 21:14 ` Steven Sistare
2024-09-04 22:09 ` Peter Xu
2024-09-05 17:30 ` Peter Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=92e5309e-9b16-4cf4-8ffb-e1383201cbd0@oracle.com \
--to=steven.sistare@oracle.com \
--cc=armbru@redhat.com \
--cc=farosas@suse.de \
--cc=peterx@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).