From: Peter Xu <peterx@redhat.com>
To: David Hildenbrand <david@redhat.com>
Cc: Steve Sistare <steven.sistare@oracle.com>,
qemu-devel@nongnu.org, Fabiano Rosas <farosas@suse.de>,
Marcel Apfelbaum <marcel.apfelbaum@gmail.com>,
Eduardo Habkost <eduardo@habkost.net>,
Philippe Mathieu-Daude <philmd@linaro.org>,
Paolo Bonzini <pbonzini@redhat.com>,
"Daniel P. Berrange" <berrange@redhat.com>,
Markus Armbruster <armbru@redhat.com>,
Igor Mammedov <imammedo@redhat.com>
Subject: Re: [PATCH V2 01/13] machine: alloc-anon option
Date: Mon, 7 Oct 2024 15:05:04 -0400 [thread overview]
Message-ID: <ZwQw4P43IhoEIe66@x1n> (raw)
In-Reply-To: <47ba3f75-147d-4ea6-a576-eba6ef168643@redhat.com>
On Mon, Oct 07, 2024 at 06:23:26PM +0200, David Hildenbrand wrote:
> > Yes I thought it could be unconditionally. We can discuss downside below,
> > I think we can still use a new flag otherwise, but the idea would be the
> > same, where I want the flag to be explicit in the callers not implicitly
> > with the object type check, which I think can be hackish.
>
> I agree that the caller should specify it.
>
> But I don't think using shared memory where shared memory is not warranted
> is a reasonable approach.
>
> I'm quite surprise you're considering such changes with unclear impacts on
> other OSes besides Linux (Freedbsd? Windows that doeasn';t even support
> shared memory?) just to make one corner-case QEMU use case happy.
>
> But I'm sure there are valid reasons why you had that idea, so I'm happy to
> learn why using shared memory unconditionally here is better than providing
> a clean alternative path with the feature enabled and memfd actually being
> supported on the setup (e.g., newer Linux kernel).
I was thinking whether cpr-transfer can be enabled by default, so whenever
people want to use it, no code reset needed. It's also easier for Libvirt
to not need to add yet another machine flags if possible.
Currently this parameter is the only one left that needs to be manually
enabled on src. It means if we can get rid of it then any QEMU based VM on
Linux can do a cpr-transfer any time as long as QEMU supports it.
Without it, this new parameter will need to be manually enabled otherwise
another system code reboot / live migration needs to happen first without
CPR, just to enable this flag.
But yeah I don't think I think it all through, so I left my pure question.
I think it looks still like an option, the other option if we still want to
enable it by default is, keep the option, then only enable it on new
machines that is based on Linux.
OS dependency is definitely an issue. AFAICT CPR is only available for
Linux anyway, but I'm happy to be corrected.. IOW, those chunk of new code
(if only unconditionally done..) would need proper #ifdef, so that
non-Linux OSes work like before.
>
> >
> > >
> > > >
> > > > >
> > > > > >
> > > > > > I think RAM_SHARED can actually be that flag already - I mean, in all paths
> > > > > > that we may create anon mem (but not memory-backend-* objects), is it
> > > > > > always safe we always switch to RAM_SHARED from anon?
> > > > >
> > > > > Do you mean only setting the flag (-> anonymous shmem) or switching also to
> > > > > memfd, which is a bigger change?
> > > >
> > > > Switching to memfd. I thought anon shmem (mmap(MAP_SHARED)) is mostly the
> > > > same internally, if we create memfd then mmap(MAP_SHARED) on top of it, no?
> > >
> > > Memfd is Linux specific, keep that in mind. Apart from that there shouldn't
> > > be much difference between anon shmem and memfd (there are memory commit
> > > differences, though).
> >
> > Could you elaborate the memory commit difference and what does that imply
> > to QEMU's usage?
>
> Note how memfd code passed VM_NORESERVE to shmem_file_setup() and
> shmem_zero_setup() effectively doesn't (unless MAP_NORESERVE was specified
> IIRC).
>
> Not sure if the change makes a big impact in QEMU's usage, it's just one of
> these differences between memfd and shared anonymous memory. (responding to
> your "mostly the same").
So yeah, I hoped the memory commit won't be a problem, because I think they
should be corner case MRs, and should be small.
Vram can take up to 16MB, that's the max I'm aware of, but indeed I don't
know all the use cases to be sure. I think it means some tens of MBs can
be accounted later during fault rather than upfront to fail QEMU from boot.
Ideally mgmt apps should leave enough space for these ones, but if we worry
on that we can stick with the current option (but create a new flag besides
RAM_SHARED).
>
> >
> > >
> > > Of course, there is a difference between anon memory and shmem, for example
> > > regarding what viritofsd faced (e.g., KSM) recently.
> >
> > The four paths shouldn't be KSM target, AFAICT.
>
> Do you have a good overview of what is deduplicated in practice and why
> these don't apply? For example, I thought these functions are also used for
> hosting the BIOS, and that might just be deduplciated between VMs?
I was thinking KSM was for merging OS/App pages (rather than BIOSs, which
are normally very, very small)? Though I could be wrong.
>
> Anyhow, there are obviously other differences with shmem vs. anonymous (THP
> handling, page fault performance, userfaultfd compatibility on older
> kernels) at least on Linux, but I have absolutely no clue how that would
> differ on other host OSes.
>
> None of them are major
>
> This is probably going to result in a bigger discussion, for which I don't
> have any time. So my opinion on it is above.
>
> Anyhow, this sounds like one of the suggestions I wouldn't suggest Steve to
> actually implement.
We don't need to make it a bigger discussion. If there's concern with it,
we can stick with a new flag.
The next question is whether if with a new flag we should enable it by
default sometimes (e.g. on new machine types on Linux). But when with a
new option, that can be discussed later.
Thanks,
--
Peter Xu
next prev parent reply other threads:[~2024-10-07 19:05 UTC|newest]
Thread overview: 79+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-09-30 19:40 [PATCH V2 00/13] Live update: cpr-transfer Steve Sistare
2024-09-30 19:40 ` [PATCH V2 01/13] machine: alloc-anon option Steve Sistare
2024-10-03 16:14 ` Peter Xu
2024-10-04 10:14 ` David Hildenbrand
2024-10-04 12:33 ` Peter Xu
2024-10-04 12:54 ` David Hildenbrand
2024-10-04 13:24 ` Peter Xu
2024-10-07 16:23 ` David Hildenbrand
2024-10-07 19:05 ` Peter Xu [this message]
2024-10-07 15:36 ` Peter Xu
2024-10-07 19:30 ` Steven Sistare
2024-09-30 19:40 ` [PATCH V2 02/13] migration: cpr-state Steve Sistare
2024-10-07 14:14 ` Peter Xu
2024-10-07 19:30 ` Steven Sistare
2024-09-30 19:40 ` [PATCH V2 03/13] migration: save cpr mode Steve Sistare
2024-10-07 15:18 ` Peter Xu
2024-10-07 19:31 ` Steven Sistare
2024-10-07 20:10 ` Peter Xu
2024-10-08 15:57 ` Steven Sistare
2024-09-30 19:40 ` [PATCH V2 04/13] migration: stop vm earlier for cpr Steve Sistare
2024-10-07 15:27 ` Peter Xu
2024-10-07 20:52 ` Steven Sistare
2024-10-08 15:35 ` Peter Xu
2024-10-08 19:13 ` Steven Sistare
2024-09-30 19:40 ` [PATCH V2 05/13] physmem: preserve ram blocks " Steve Sistare
2024-10-07 15:49 ` Peter Xu
2024-10-07 16:28 ` Peter Xu
2024-10-08 15:17 ` Steven Sistare
2024-10-08 16:26 ` Peter Xu
2024-10-08 21:05 ` Steven Sistare
2024-10-08 21:32 ` Peter Xu
2024-10-31 20:32 ` Steven Sistare
2024-09-30 19:40 ` [PATCH V2 06/13] hostmem-memfd: preserve " Steve Sistare
2024-10-07 15:52 ` Peter Xu
2024-09-30 19:40 ` [PATCH V2 07/13] migration: SCM_RIGHTS for QEMUFile Steve Sistare
2024-10-07 16:06 ` Peter Xu
2024-10-07 16:35 ` Daniel P. Berrangé
2024-10-07 18:12 ` Peter Xu
2024-09-30 19:40 ` [PATCH V2 08/13] migration: VMSTATE_FD Steve Sistare
2024-10-07 16:36 ` Peter Xu
2024-10-07 19:31 ` Steven Sistare
2024-09-30 19:40 ` [PATCH V2 09/13] migration: cpr-transfer save and load Steve Sistare
2024-10-07 16:47 ` Peter Xu
2024-10-07 19:31 ` Steven Sistare
2024-10-08 15:36 ` Peter Xu
2024-09-30 19:40 ` [PATCH V2 10/13] migration: cpr-uri parameter Steve Sistare
2024-10-07 16:49 ` Peter Xu
2024-09-30 19:40 ` [PATCH V2 11/13] migration: cpr-uri option Steve Sistare
2024-10-07 16:50 ` Peter Xu
2024-09-30 19:40 ` [PATCH V2 12/13] migration: split qmp_migrate Steve Sistare
2024-10-07 19:18 ` Peter Xu
2024-09-30 19:40 ` [PATCH V2 13/13] migration: cpr-transfer mode Steve Sistare
2024-10-07 19:44 ` Peter Xu
2024-10-07 20:39 ` Steven Sistare
2024-10-08 15:45 ` Peter Xu
2024-10-08 19:12 ` Steven Sistare
2024-10-08 19:38 ` Peter Xu
2024-10-08 18:28 ` Fabiano Rosas
2024-10-08 18:47 ` Peter Xu
2024-10-08 19:11 ` Fabiano Rosas
2024-10-08 19:33 ` Steven Sistare
2024-10-08 19:48 ` Peter Xu
2024-10-09 18:43 ` Steven Sistare
2024-10-09 19:06 ` Peter Xu
2024-10-09 19:59 ` Peter Xu
2024-10-09 20:18 ` Steven Sistare
2024-10-09 20:57 ` Peter Xu
2024-10-09 22:08 ` Fabiano Rosas
2024-10-10 20:05 ` Steven Sistare
2024-10-09 20:09 ` Steven Sistare
2024-10-09 20:36 ` Peter Xu
2024-10-10 20:06 ` Steven Sistare
2024-10-10 21:23 ` Peter Xu
2024-10-24 21:12 ` Steven Sistare
2024-10-25 13:55 ` Peter Xu
2024-10-25 15:04 ` Steven Sistare
2024-10-08 19:29 ` Steven Sistare
2024-10-08 14:33 ` [PATCH V2 00/13] Live update: cpr-transfer Vladimir Sementsov-Ogievskiy
2024-10-08 21:13 ` Steven Sistare
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZwQw4P43IhoEIe66@x1n \
--to=peterx@redhat.com \
--cc=armbru@redhat.com \
--cc=berrange@redhat.com \
--cc=david@redhat.com \
--cc=eduardo@habkost.net \
--cc=farosas@suse.de \
--cc=imammedo@redhat.com \
--cc=marcel.apfelbaum@gmail.com \
--cc=pbonzini@redhat.com \
--cc=philmd@linaro.org \
--cc=qemu-devel@nongnu.org \
--cc=steven.sistare@oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).