From: Fabiano Rosas <farosas@suse.de>
To: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>,
qemu-devel@nongnu.org, Juan Quintela <quintela@redhat.com>,
Peter Xu <peterx@redhat.com>, Leonardo Bras <leobras@redhat.com>,
Claudio Fontana <cfontana@suse.de>,
Eric Blake <eblake@redhat.com>
Subject: Re: [PATCH v2 28/29] migration: Add direct-io parameter
Date: Mon, 30 Oct 2023 19:51:34 -0300 [thread overview]
Message-ID: <87cywvenbd.fsf@suse.de> (raw)
In-Reply-To: <ZTkpllWQdLSMw3pP@redhat.com>
Daniel P. Berrangé <berrange@redhat.com> writes:
> On Wed, Oct 25, 2023 at 11:32:00AM -0300, Fabiano Rosas wrote:
>> Daniel P. Berrangé <berrange@redhat.com> writes:
>>
>> > On Tue, Oct 24, 2023 at 04:32:10PM -0300, Fabiano Rosas wrote:
>> >> Markus Armbruster <armbru@redhat.com> writes:
>> >>
>> >> > Fabiano Rosas <farosas@suse.de> writes:
>> >> >
>> >> >> Add the direct-io migration parameter that tells the migration code to
>> >> >> use O_DIRECT when opening the migration stream file whenever possible.
>> >> >>
>> >> >> This is currently only used for the secondary channels of fixed-ram
>> >> >> migration, which can guarantee that writes are page aligned.
>> >> >>
>> >> >> However the parameter could be made to affect other types of
>> >> >> file-based migrations in the future.
>> >> >>
>> >> >> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> >> >
>> >> > When would you want to enable @direct-io, and when would you want to
>> >> > leave it disabled?
>> >>
>> >> That depends on a performance analysis. You'd generally leave it
>> >> disabled unless there's some indication that the operating system is
>> >> having trouble draining the page cache.
>> >
>> > That's not the usage model I would suggest.
>> >
>>
>> Hehe I took a shot at answering but I really wanted to say "ask Daniel".
>>
>> > The biggest value of the page cache comes when it holds data that
>> > will be repeatedly accessed.
>> >
>> > When you are saving/restoring a guest to file, that data is used
>> > once only (assuming there's a large gap between save & restore).
>> > By using the page cache to save a big guest we essentially purge
>> > the page cache of most of its existing data that is likely to be
>> > reaccessed, to fill it up with data never to be reaccessed.
>> >
>> > I usually describe save/restore operations as trashing the page
>> > cache.
>> >
>> > IMHO, mgmt apps should request O_DIRECT always unless they expect
>> > the save/restore operation to run in quick succession, or if they
>> > know that the host has oodles of free RAM such that existing data
>> > in the page cache won't be trashed, or
>>
>> Thanks, I'll try to incorporate this to some kind of doc in the next
>> version.
>>
>> > if the host FS does not support O_DIRECT of course.
>>
>> Should we try to probe for this and inform the user?
>
> qemu_open_internall will already do a nice error message. If it gets
> EINVAL when using O_DIRECT, it'll retry without O_DIRECT and if that
> works, it'll reoprt "filesystem does not support O_DIRECT"
>
> Having said that I see a problem with /dev/fdset handling, because
> we're only validating O_ACCMODE and that excludes O_DIRECT.
>
> If the mgmt apps passes an FD with O_DIRECT already set, then it
> won't work for VMstate saving which is unaligned.
>
> If the mgmt app passes an FD without O_DIRECT set, then we are
> not setting O_DIRECT for the multifd RAM threads.
I could use some advice on how to solve this situation. The fdset code
at monitor/fds.c and the add-fd command don't seem to be usable outside
the original use-case of passing fds with different open flags.
There are several problems, the biggest one being that there's no way to
manipulate the set of file descriptors aside from asking for duplication
of an fd that matches a particular set of flags.
That doesn't work for us because the two fds we need (one for main
channel, other for secondary channels) will have the same open flags. So
the fdset code will always return the first one it finds in the set.
Another problem (or feature) of the fdset code is that it only removes
an fd when qmp_remove_fd() is called if the VM runstate is RUNNING,
which means that the migration code cannot remove the fds after use
reliably. We need to be able to remove them to make sure we use the
correct fds in a subsequent migration.
I see some paths forward:
1) Require the user to put the fds in separate fdsets.
This would be the easiest to handle in the migration code, but we
would have to come up with special file: URL syntax to accomodate more
than one fdset. Perhaps "file:/dev/fdsets/1,2" ?
2) Require the two fds in the same fdset and separate them ourselves.
This would require extending the fdset code to allow more ways of
manipulating the fdset. There's two options here:
a) Implement a pop() operation in the fdset code. We receive the
fdset, pop one fd from it and put it in a new fdset. I did some
experimentation with this by having an fd->present flag and just
skipping the fd during query-fdsets and
monitor_fdset_dup_fd_add(). It works, but it's convoluted.
b) Add support for removing the original fd when given the dup()ed
fd. The list of duplicated fds is currently by-fdset and not
by-original-fd, so this would require a larger code change.
3) Design a whole new URI.
Here, there are the usual benefits and drawbacks of doing something
from scratch. With the added drawback of dissociating from the file:
URI which is already well tested and easy to use when doing QEMU-only
migration.
With the three options above there's still the issue of removing the
fd. I think the original commit[1] might have been mistaken in adding
the runstate_is_running() check for *both* the "removed = true" clause
and the "fd was never duplicated" clause. But it's hard to tell since
this whole feature is a bit opaque to me.
1- ebe52b592d (monitor: Prevent removing fd from set during init)
https://gitlab.com/qemu-project/qemu/-/commit/ebe52b592d
All in all, I'm inclined to consider the first option, unless someone
has a better idea. Assuming we can figure out the removal issue, that
is.
Thoughts?
next prev parent reply other threads:[~2023-10-30 22:52 UTC|newest]
Thread overview: 128+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-23 20:35 [PATCH v2 00/29] migration: File based migration with multifd and fixed-ram Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 01/29] tests/qtest: migration events Fabiano Rosas
2023-10-25 9:44 ` Thomas Huth
2023-10-25 10:14 ` Daniel P. Berrangé
2023-10-25 13:21 ` Fabiano Rosas
2023-10-25 13:33 ` Steven Sistare
2023-10-23 20:35 ` [PATCH v2 02/29] tests/qtest: Move QTestMigrationState to libqtest Fabiano Rosas
2023-10-25 10:17 ` Daniel P. Berrangé
2023-10-25 13:19 ` Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 03/29] tests/qtest: Allow waiting for migration events Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 04/29] migration: Return the saved state from global_state_store Fabiano Rosas
2023-10-25 10:19 ` Daniel P. Berrangé
2023-10-23 20:35 ` [PATCH v2 05/29] migration: Introduce global_state_store_once Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 06/29] migration: Add auto-pause capability Fabiano Rosas
2023-10-24 5:25 ` Markus Armbruster
2023-10-24 18:12 ` Fabiano Rosas
2023-10-25 5:33 ` Markus Armbruster
2023-10-25 8:48 ` Daniel P. Berrangé
2023-10-25 13:57 ` Fabiano Rosas
2023-10-25 14:20 ` Daniel P. Berrangé
2023-10-25 14:58 ` Peter Xu
2023-10-25 15:25 ` Daniel P. Berrangé
2023-10-25 15:36 ` Peter Xu
2023-10-25 15:40 ` Daniel P. Berrangé
2023-10-25 17:20 ` Peter Xu
2023-10-25 17:31 ` Daniel P. Berrangé
2023-10-25 19:28 ` Peter Xu
2023-10-23 20:35 ` [PATCH v2 07/29] migration: Run "file:" migration with a stopped VM Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 08/29] tests/qtest: File migration auto-pause tests Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 09/29] io: add and implement QIO_CHANNEL_FEATURE_SEEKABLE for channel file Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 10/29] io: Add generic pwritev/preadv interface Fabiano Rosas
2023-10-24 8:18 ` Daniel P. Berrangé
2023-10-24 19:06 ` Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 11/29] io: implement io_pwritev/preadv for QIOChannelFile Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 12/29] migration/qemu-file: add utility methods for working with seekable channels Fabiano Rosas
2023-10-25 10:22 ` Daniel P. Berrangé
2023-10-23 20:35 ` [PATCH v2 13/29] migration: fixed-ram: Add URI compatibility check Fabiano Rosas
2023-10-25 10:27 ` Daniel P. Berrangé
2023-10-31 16:06 ` Peter Xu
2023-10-23 20:35 ` [PATCH v2 14/29] migration/ram: Introduce 'fixed-ram' migration capability Fabiano Rosas
2023-10-24 5:33 ` Markus Armbruster
2023-10-24 18:35 ` Fabiano Rosas
2023-10-25 6:18 ` Markus Armbruster
2023-10-23 20:35 ` [PATCH v2 15/29] migration/ram: Add support for 'fixed-ram' outgoing migration Fabiano Rosas
2023-10-25 9:39 ` Daniel P. Berrangé
2023-10-25 14:03 ` Fabiano Rosas
2023-11-01 15:23 ` Peter Xu
2023-11-01 15:52 ` Daniel P. Berrangé
2023-11-01 16:24 ` Peter Xu
2023-11-01 16:37 ` Daniel P. Berrangé
2023-11-01 17:30 ` Peter Xu
2023-10-31 16:52 ` Peter Xu
2023-10-31 17:33 ` Fabiano Rosas
2023-10-31 17:59 ` Peter Xu
2023-10-23 20:35 ` [PATCH v2 16/29] migration/ram: Add support for 'fixed-ram' migration restore Fabiano Rosas
2023-10-25 9:43 ` Daniel P. Berrangé
2023-10-25 14:07 ` Fabiano Rosas
2023-10-31 19:03 ` Peter Xu
2023-11-01 9:26 ` Daniel P. Berrangé
2023-11-01 14:21 ` Peter Xu
2023-11-01 14:28 ` Daniel P. Berrangé
2023-11-01 15:00 ` Peter Xu
2023-11-06 13:18 ` Fabiano Rosas
2023-11-06 21:00 ` Peter Xu
2023-11-07 9:02 ` Daniel P. Berrangé
2023-10-31 19:09 ` Peter Xu
2023-10-31 20:00 ` Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 17/29] tests/qtest: migration-test: Add tests for fixed-ram file-based migration Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 18/29] migration/multifd: Allow multifd without packets Fabiano Rosas
2023-10-23 20:35 ` [PATCH v2 19/29] migration/multifd: Add outgoing QIOChannelFile support Fabiano Rosas
2023-10-25 9:52 ` Daniel P. Berrangé
2023-10-25 14:12 ` Fabiano Rosas
2023-10-25 14:23 ` Daniel P. Berrangé
2023-10-25 15:00 ` Fabiano Rosas
2023-10-25 15:26 ` Daniel P. Berrangé
2023-10-31 20:11 ` Peter Xu
2023-10-23 20:35 ` [PATCH v2 20/29] migration/multifd: Add incoming " Fabiano Rosas
2023-10-25 10:29 ` Daniel P. Berrangé
2023-10-25 14:18 ` Fabiano Rosas
2023-10-31 21:28 ` Peter Xu
2023-10-23 20:36 ` [PATCH v2 21/29] migration/multifd: Add pages to the receiving side Fabiano Rosas
2023-10-31 22:10 ` Peter Xu
2023-10-31 23:18 ` Fabiano Rosas
2023-11-01 15:55 ` Peter Xu
2023-11-01 17:20 ` Fabiano Rosas
2023-11-01 17:35 ` Peter Xu
2023-11-01 18:14 ` Fabiano Rosas
2023-10-23 20:36 ` [PATCH v2 22/29] io: Add a pwritev/preadv version that takes a discontiguous iovec Fabiano Rosas
2023-10-24 8:50 ` Daniel P. Berrangé
2023-10-23 20:36 ` [PATCH v2 23/29] migration/ram: Add a wrapper for fixed-ram shadow bitmap Fabiano Rosas
2023-11-01 14:29 ` Peter Xu
2023-10-23 20:36 ` [PATCH v2 24/29] migration/ram: Ignore multifd flush when doing fixed-ram migration Fabiano Rosas
2023-10-25 9:09 ` Daniel P. Berrangé
2023-10-23 20:36 ` [PATCH v2 25/29] migration/multifd: Support outgoing fixed-ram stream format Fabiano Rosas
2023-10-25 9:23 ` Daniel P. Berrangé
2023-10-25 14:21 ` Fabiano Rosas
2023-10-23 20:36 ` [PATCH v2 26/29] migration/multifd: Support incoming " Fabiano Rosas
2023-10-23 20:36 ` [PATCH v2 27/29] tests/qtest: Add a multifd + fixed-ram migration test Fabiano Rosas
2023-10-23 20:36 ` [PATCH v2 28/29] migration: Add direct-io parameter Fabiano Rosas
2023-10-24 5:41 ` Markus Armbruster
2023-10-24 19:32 ` Fabiano Rosas
2023-10-25 6:23 ` Markus Armbruster
2023-10-25 8:44 ` Daniel P. Berrangé
2023-10-25 14:32 ` Fabiano Rosas
2023-10-25 14:43 ` Daniel P. Berrangé
2023-10-25 17:30 ` Fabiano Rosas
2023-10-25 17:45 ` Daniel P. Berrangé
2023-10-25 18:10 ` Fabiano Rosas
2023-10-30 22:51 ` Fabiano Rosas [this message]
2023-10-31 9:03 ` Daniel P. Berrangé
2023-10-31 13:05 ` Fabiano Rosas
2023-10-31 13:45 ` Daniel P. Berrangé
2023-10-31 14:33 ` Fabiano Rosas
2023-10-31 15:22 ` Daniel P. Berrangé
2023-10-31 15:52 ` Fabiano Rosas
2023-10-31 15:58 ` Daniel P. Berrangé
2023-10-31 19:05 ` Fabiano Rosas
2023-11-01 9:30 ` Daniel P. Berrangé
2023-11-01 12:16 ` Fabiano Rosas
2023-11-01 12:23 ` Daniel P. Berrangé
2023-11-01 12:30 ` Fabiano Rosas
2023-10-24 8:33 ` Daniel P. Berrangé
2023-10-24 19:06 ` Fabiano Rosas
2023-10-25 9:07 ` Daniel P. Berrangé
2023-10-25 14:48 ` Fabiano Rosas
2023-10-25 15:22 ` Daniel P. Berrangé
2023-10-23 20:36 ` [PATCH v2 29/29] tests/qtest: Add a test for migration with direct-io and multifd Fabiano Rosas
2023-10-25 9:25 ` Daniel P. Berrangé
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87cywvenbd.fsf@suse.de \
--to=farosas@suse.de \
--cc=armbru@redhat.com \
--cc=berrange@redhat.com \
--cc=cfontana@suse.de \
--cc=eblake@redhat.com \
--cc=leobras@redhat.com \
--cc=peterx@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=quintela@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).