From: Fabiano Rosas <farosas@suse.de>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, berrange@redhat.com, armbru@redhat.com,
Claudio Fontana <cfontana@suse.de>, Jim Fehlig <jfehlig@suse.com>,
Thomas Huth <thuth@redhat.com>,
Laurent Vivier <lvivier@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH v2 18/18] migration/ram: Add direct-io support to precopy file migration
Date: Wed, 12 Jun 2024 15:08:02 -0300 [thread overview]
Message-ID: <87ed92c9vh.fsf@suse.de> (raw)
In-Reply-To: <87le3cwo9e.fsf@suse.de>
Fabiano Rosas <farosas@suse.de> writes:
> Peter Xu <peterx@redhat.com> writes:
>
>> On Mon, Jun 10, 2024 at 02:45:53PM -0300, Fabiano Rosas wrote:
>>> >> AIUI, the issue here that users are already allowed to specify in
>>> >> libvirt the equivalent to direct-io and multifd independent of each
>>> >> other (bypass-cache, parallel). To start requiring both together now in
>>> >> some situations would be a regression. I confess I don't know libvirt
>>> >> code to know whether this can be worked around somehow, but as I said,
>>> >> it's a relatively simple change from the QEMU side.
>>> >
>>> > Firstly, I definitely want to already avoid all the calls to either
>>> > migration_direct_io_start() or *_finish(), now we already need to
>>> > explicitly call them in three paths, and that's not intuitive and less
>>> > readable, just like the hard coded rdma codes.
>>>
>>> Right, but that's just a side-effect of how the code is structured and
>>> the fact that writes to the stream happen in small chunks. Setting
>>> O_DIRECT needs to happen around aligned IO. We could move the calls
>>> further down into qemu_put_buffer_at(), but that would be four fcntl()
>>> calls for every page.
>>
>> Hmm.. why we need four fcntl()s instead of two?
>
> Because we need to first get the flags before flipping the O_DIRECT
> bit. And we do this once to enable and once to disable.
>
> int flags = fcntl(fioc->fd, F_GETFL);
> if (enabled) {
> flags |= O_DIRECT;
> } else {
> flags &= ~O_DIRECT;
> }
> fcntl(fioc->fd, F_SETFL, flags);
>
>>>
>>> A tangent:
>>> one thing that occured to me now is that we may be able to restrict
>>> calls to qemu_fflush() to internal code like add_to_iovec() and maybe
>>> use that function to gather the correct amount of data before writing,
>>> making sure it disables O_DIRECT in case alignment is about to be
>>> broken?
>>
>> IIUC dio doesn't require alignment if we don't care about perf? I meant it
>> should be legal to write(fd, buffer, 5) even if O_DIRECT?
>
> No, we may get an -EINVAL. See Daniel's reply.
>
>>
>> I just noticed the asserts you added in previous patch, I think that's
>> better indeed, but still I'm wondering whether we can avoid enabling it on
>> qemufile.
>>
>> It makes me feel slightly nervous when introducing dio to QEMUFile rather
>> than iochannels - the API design of QEMUFile seems to easily encourage
>> breaking things in dio worlds with a default and static buffering. And if
>> we're going to blacklist most of the API anyway except the new one for
>> mapped-ram, I start to wonder too why bother on top of QEMUFile anyway.
>>
>> IIRC you also mentioned in the previous doc patch so that libvirt should
>> always pass in two fds anyway to the fdset if dio is enabled. I wonder
>> whether it's also true for multifd=off and directio=on, then would it be
>> possible to use the dio for guest pages with one fd, while keeping the
>> normal stream to use !dio with the other fd. I'm not sure whether it's
>> easy to avoid qemufile in the dio fd, even if not looks like we may avoid
>> frequent fcntl()s?
>
> Hm, sounds like a good idea. We'd need a place to put that new ioc
> though. Either QEMUFile.direct_ioc and then make use of it in
> qemu_put_buffer_at() or a more transparent QIOChannelFile.direct_fd that
> gets set somewhere during file_start_outgoing_migration(). Let me try to
> come up with something.
I looked into this and it's cumbersome:
- We'd need to check migrate_direct_io() several times, once to get the
second fd and during every IO to know to use the fd.
- Even getting the second fd is not straight forward, we need to create
a new ioc for it with qio_channel_new_path(). But QEMUFile is generic
code, so we'd probably need to call this channel-file specific
function from migration_channel_connect().
- With the new ioc, do we put it in QEMUFile, or do we take the fd only?
Or maybe an ioc with two fds? Or a new QIOChannelDirect? All options
look bad to me.
So I suggest we proceed proceed with the 1 multifd channel approach,
passing 2 fds into QEMU just like we do for the n channels. Is that ok
from libvirt's perspective? I assume libvirt users are mostly interested
in _enabling_ parallelism with --parallel, instead of _avoiding_ it with
the ommision of the option, so main thread + 1 channel should not be a
bad thing.
Choosing to use 1 multifd channel now is also a gentler introduction for
when we finally move all of the vmstate migration into multifd (I've
been looking into this, but don't hold your breaths).
next prev parent reply other threads:[~2024-06-12 18:08 UTC|newest]
Thread overview: 60+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-05-23 19:05 [PATCH v2 00/18] migration/mapped-ram: Add direct-io support Fabiano Rosas
2024-05-23 19:05 ` [PATCH v2 01/18] migration: Fix file migration with fdset Fabiano Rosas
2024-05-24 10:51 ` Prasad Pandit
2024-05-24 12:30 ` Fabiano Rosas
2024-05-25 6:16 ` Prasad Pandit
2024-05-30 16:11 ` Peter Xu
2024-05-31 14:58 ` Fabiano Rosas
2024-06-03 10:20 ` Daniel P. Berrangé
2024-05-23 19:05 ` [PATCH v2 02/18] tests/qtest/migration: Fix file migration offset check Fabiano Rosas
2024-05-30 16:14 ` Peter Xu
2024-06-03 10:21 ` Daniel P. Berrangé
2024-05-23 19:05 ` [PATCH v2 03/18] tests/qtest/migration: Add a precopy file test with fdset Fabiano Rosas
2024-05-30 16:18 ` Peter Xu
2024-05-23 19:05 ` [PATCH v2 04/18] monitor: Drop monitor_fdset_dup_fd_add() Fabiano Rosas
2024-06-03 10:26 ` Daniel P. Berrangé
2024-05-23 19:05 ` [PATCH v2 05/18] monitor: Introduce monitor_fdset_*free Fabiano Rosas
2024-05-30 20:03 ` Peter Xu
2024-05-31 15:01 ` Fabiano Rosas
2024-05-23 19:05 ` [PATCH v2 06/18] monitor: Stop removing non-duplicated fds Fabiano Rosas
2024-05-30 21:05 ` Peter Xu
2024-05-31 15:25 ` Fabiano Rosas
2024-05-31 15:56 ` Peter Xu
2024-06-04 23:40 ` Dr. David Alan Gilbert
2024-06-05 12:31 ` Fabiano Rosas
2024-05-23 19:05 ` [PATCH v2 07/18] monitor: Simplify fdset and fd removal Fabiano Rosas
2024-05-31 15:58 ` Peter Xu
2024-05-23 19:05 ` [PATCH v2 08/18] monitor: Report errors from monitor_fdset_dup_fd_add Fabiano Rosas
2024-05-30 21:08 ` Peter Xu
2024-05-23 19:05 ` [PATCH v2 09/18] io: Stop using qemu_open_old in channel-file Fabiano Rosas
2024-05-30 21:10 ` Peter Xu
2024-05-23 19:05 ` [PATCH v2 10/18] migration: Add direct-io parameter Fabiano Rosas
2024-05-30 21:12 ` Peter Xu
2024-05-23 19:05 ` [PATCH v2 11/18] migration/multifd: Add direct-io support Fabiano Rosas
2024-05-30 21:35 ` Peter Xu
2024-05-31 15:27 ` Fabiano Rosas
2024-05-23 19:05 ` [PATCH v2 12/18] tests/qtest/migration: Add tests for file migration with direct-io Fabiano Rosas
2024-05-23 19:05 ` [PATCH v2 13/18] monitor: fdset: Match against O_DIRECT Fabiano Rosas
2024-05-30 21:41 ` Peter Xu
2024-05-31 15:42 ` Fabiano Rosas
2024-05-31 15:58 ` Peter Xu
2024-05-23 19:05 ` [PATCH v2 14/18] migration: Add documentation for fdset with multifd + file Fabiano Rosas
2024-06-04 20:46 ` Peter Xu
2024-05-23 19:05 ` [PATCH v2 15/18] tests/qtest/migration: Add a test for mapped-ram with passing of fds Fabiano Rosas
2024-06-04 20:51 ` Peter Xu
2024-05-23 19:05 ` [PATCH v2 16/18] io/channel-file: Add direct-io support Fabiano Rosas
2024-06-03 10:32 ` Daniel P. Berrangé
2024-05-23 19:05 ` [PATCH v2 17/18] migration: Add direct-io helpers Fabiano Rosas
2024-05-23 19:05 ` [PATCH v2 18/18] migration/ram: Add direct-io support to precopy file migration Fabiano Rosas
2024-06-04 20:56 ` Peter Xu
2024-06-07 18:42 ` Fabiano Rosas
2024-06-07 20:39 ` Jim Fehlig
2024-06-10 16:09 ` Peter Xu
2024-06-10 17:45 ` Fabiano Rosas
2024-06-10 19:02 ` Peter Xu
2024-06-10 19:07 ` Daniel P. Berrangé
2024-06-10 20:12 ` Fabiano Rosas
2024-06-12 18:08 ` Fabiano Rosas [this message]
2024-06-12 18:15 ` Daniel P. Berrangé
2024-06-12 18:27 ` Peter Xu
2024-06-12 18:44 ` Fabiano Rosas
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87ed92c9vh.fsf@suse.de \
--to=farosas@suse.de \
--cc=armbru@redhat.com \
--cc=berrange@redhat.com \
--cc=cfontana@suse.de \
--cc=jfehlig@suse.com \
--cc=lvivier@redhat.com \
--cc=pbonzini@redhat.com \
--cc=peterx@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=thuth@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.