qemu-devel.nongnu.org archive mirror
From: David Hildenbrand <david@redhat.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>,
	"Michael S . Tsirkin" <mst@redhat.com>,
	Markus Armbruster <armbru@redhat.com>,
	David Hildenbrand <david@redhat.com>,
	qemu-devel@nongnu.org, Greg Kurz <groug@kaod.org>,
	Alex Williamson <alex.williamson@redhat.com>,
	Murilo Opsfelder Araujo <muriloo@linux.ibm.com>,
	Paolo Bonzini <pbonzini@redhat.com>, Stefan Weil <sw@weilnetz.de>,
	Richard Henderson <rth@twiddle.net>
Subject: Re: [PATCH v1 00/13] Ram blocks with resizable anonymous allocations under POSIX
Date: Thu, 6 Feb 2020 21:31:06 +0100
Message-ID: <13585E49-B84C-41D8-8825-F96841F475D0@redhat.com>
In-Reply-To: <20200206201121.GM3655@work-vm>



> On 06.02.2020 at 21:11, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
> 
> * David Hildenbrand (david@redhat.com) wrote:
>> We already allow resizable ram blocks for anonymous memory; however, they
>> are not actually resized. All memory is mmap()ed R/W, including the memory
>> exceeding the used_length, up to the max_length.
>> 
>> When resizing, effectively only the boundary is moved. Implement actually
>> resizable anonymous allocations and make use of them in resizable ram
>> blocks when possible. Memory exceeding the used_length will be
>> inaccessible. Ram block notifiers in particular require care.
>> 
>> Having actually resizable anonymous allocations (via mmap-hackery) makes
>> it possible to reserve a big region in virtual address space and grow the
>> accessible/usable part on demand. Even if "/proc/sys/vm/overcommit_memory"
>> is set to 2 ("never overcommit") under Linux, huge reservations will
>> succeed. If there is not enough memory when resizing (to populate parts of
>> the reserved region), the resize will fail. Only the actually used size is
>> reserved (committed) in the OS.
>> 
>> E.g., virtio-mem [1] wants to reserve big resizable memory regions and
>> grow the usable part on demand. I think this change is worth sending out
>> individually. It comes with a bunch of minor fixes and cleanups.
>> 
>> [1] https://lore.kernel.org/kvm/20191212171137.13872-1-david@redhat.com/
> 
> There are a few bits I've not understood from skimming the patches:
> 

Thanks for having a look!

>  a) Am I correct in thinking you PROT_NONE the extra space so you can
> grow/shrink it?

Yes!
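
A minimal sketch of the PROT_NONE approach, assuming Linux and page-aligned sizes. The helpers reserve_region(), grow_region() and shrink_region() are hypothetical names, not the series' mmap_reserve()/mmap_populate(): the whole max_length is reserved as an inaccessible mapping, growing maps fresh read/write memory over part of it with MAP_FIXED, and shrinking maps PROT_NONE back over the tail. As far as I understand Linux commit accounting, a private PROT_NONE mapping is not writable and therefore not charged, which is why huge reservations succeed even under strict overcommit while a grow may fail with ENOMEM.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: reserve max_size bytes of virtual address space
 * without making any of it accessible (nothing is writable yet).
 */
static void *reserve_region(size_t max_size)
{
    void *ptr = mmap(NULL, max_size, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return ptr == MAP_FAILED ? NULL : ptr;
}

/*
 * Hypothetical helper: make [base + old_size, base + new_size) usable by
 * mapping fresh anonymous R/W memory over it. This is the point where the
 * memory gets committed, so under strict overcommit it may fail.
 */
static bool grow_region(void *base, size_t old_size, size_t new_size)
{
    assert(new_size >= old_size);
    if (new_size == old_size) {
        return true;                 /* nothing to do; mmap rejects length 0 */
    }
    void *ptr = mmap((char *)base + old_size, new_size - old_size,
                     PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    return ptr != MAP_FAILED;
}

/*
 * Hypothetical helper: discard [base + new_size, base + old_size) and make
 * it inaccessible again by replacing it with a fresh PROT_NONE mapping.
 */
static bool shrink_region(void *base, size_t old_size, size_t new_size)
{
    assert(new_size <= old_size);
    if (new_size == old_size) {
        return true;
    }
    void *ptr = mmap((char *)base + new_size, old_size - new_size, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE | MAP_FIXED,
                     -1, 0);
    return ptr != MAP_FAILED;
}
```

Note that a shrink followed by a grow yields zero-filled pages: the contents of the shrunk part are discarded, not preserved.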

>  b) What does kvm see - does it have a slot for the whole space or for
> only the used space?

Only the used space. Resizing triggers a resize of the memory region, which in turn triggers the memory listeners: they remove the old KVM memslot and add a new one with the new size. (That's existing handling, so nothing new.)

So KVM will not see PROT_NONE when creating a slot.
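
At the KVM level, a memslot cannot be resized in place (the KVM API documentation for KVM_SET_USER_MEMORY_REGION only allows moving an existing slot or changing its flags), so a resize amounts to deleting the slot by passing a memory_size of 0 and creating it again with the new size. A sketch, assuming a Linux host with <linux/kvm.h>; build_memslot_resize() and apply_memslot_resize() are hypothetical helpers, not QEMU's kvm-all.c code. Only the used length is ever handed to KVM, so the PROT_NONE tail never becomes part of a slot.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Hypothetical helper: describe a memslot resize as the two requests KVM
 * understands: delete the slot, then re-create it with the same slot id,
 * guest physical address and host address, but the new used size.
 */
static void build_memslot_resize(uint32_t slot, uint64_t gpa, void *hva,
                                 uint64_t new_used_size,
                                 struct kvm_userspace_memory_region *del,
                                 struct kvm_userspace_memory_region *add)
{
    *del = (struct kvm_userspace_memory_region) {
        .slot = slot,
        .flags = 0,
        .guest_phys_addr = gpa,
        .memory_size = 0,                 /* a size of 0 deletes the slot */
        .userspace_addr = (uint64_t)(uintptr_t)hva,
    };
    *add = *del;
    add->memory_size = new_used_size;     /* used part only, never the PROT_NONE tail */
}

/* Hypothetical helper: issue both requests on a KVM VM file descriptor. */
int apply_memslot_resize(int vm_fd,
                         const struct kvm_userspace_memory_region *del,
                         const struct kvm_userspace_memory_region *add)
{
    if (ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, del) < 0) {
        return -1;
    }
    return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, add);
}
```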

>     I ask because we found with virtiofs/DAX experiments that on Power,
> kvm gets upset if you give it a mapping with PROT_NONE.
>     (That may be less of an issue if you change the mapping after the
> slot is created.)

That should work as expected. Resizing *while* KVM is running is tricky, but that's not part of this series and a different story :) Right now, resizing is only valid on reboot/incoming migration.

> 
>  c) It's interesting that this is keyed off the RAMBlock notifiers - do
>     memory listeners on the address space the block is mapped into get
>     triggered?  I'm wondering how vhost (and vhost-user) in particular
>     see this.

Yes, memory listeners get triggered. The old region is removed and the new one is added. Nothing changed on that front.

The issue with ram block notifiers is that they did not do a "remove old, add new" on resizes; they only ever added the full ram block (up to max_length). That's bad: e.g., vfio wants to pin all memory, which would fail on PROT_NONE ranges.

For HAX, e.g., there is no kernel ioctl to remove a ram block. For SEV there is one, but I am not sure about the implications of converting memory back and forth between encrypted and unencrypted. So SEV and HAX require legacy handling.
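
Patch 10 of the series introduces ram_block_notify_resized() and ram_block_notifiers_support_resize(). A toy sketch of the idea, assuming the support check simply requires every registered notifier to provide a resize callback; RamNotifier, PinTracker and notifiers_support_resize() are hypothetical names, not QEMU's RAMBlockNotifier API. A pinning user in the spirit of vfio is only ever told about the used length and about every resize, so it never touches the PROT_NONE tail; a notifier without the callback (think HAX) makes the check fail, and the caller would fall back to the legacy handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a ram block notifier. */
typedef struct RamNotifier RamNotifier;
struct RamNotifier {
    void (*added)(RamNotifier *n, void *host, size_t size);
    void (*removed)(RamNotifier *n, void *host, size_t size);
    /* Optional: NULL if this user cannot cope with resizes. */
    void (*resized)(RamNotifier *n, void *host, size_t old_size,
                    size_t new_size);
};

/* Actual resizing is only allowed if every notifier understands it. */
static bool notifiers_support_resize(RamNotifier *const *list, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (!list[i]->resized) {
            return false;
        }
    }
    return true;
}

/* Toy pinning user: tracks how many bytes it currently has pinned. */
typedef struct {
    RamNotifier n;   /* must stay the first member for the casts below */
    size_t pinned;
} PinTracker;

static void pin_added(RamNotifier *n, void *host, size_t size)
{
    (void)host;
    ((PinTracker *)n)->pinned += size;
}

static void pin_removed(RamNotifier *n, void *host, size_t size)
{
    PinTracker *t = (PinTracker *)n;
    (void)host;
    assert(t->pinned >= size);
    t->pinned -= size;
}

static void pin_resized(RamNotifier *n, void *host, size_t old_size,
                        size_t new_size)
{
    PinTracker *t = (PinTracker *)n;
    (void)host;
    assert(t->pinned >= old_size);
    t->pinned = t->pinned - old_size + new_size;  /* unpin/pin only the delta */
}
```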

Cheers!




Thread overview: 44+ messages
2020-02-03 18:31 [PATCH v1 00/13] Ram blocks with resizable anonymous allocations under POSIX David Hildenbrand
2020-02-03 18:31 ` [PATCH v1 01/13] util: vfio-helpers: Factor out and fix processing of existings ram blocks David Hildenbrand
2020-02-03 18:31 ` [PATCH v1 02/13] exec: Factor out setting ram settings (madvise ...) into qemu_ram_apply_settings() David Hildenbrand
2020-02-06 11:42   ` Richard Henderson
2020-02-03 18:31 ` [PATCH v1 03/13] exec: Reuse qemu_ram_apply_settings() in qemu_ram_remap() David Hildenbrand
2020-02-06 11:43   ` Richard Henderson
2020-02-03 18:31 ` [PATCH v1 04/13] exec: Drop "shared" parameter from ram_block_add() David Hildenbrand
2020-02-06 11:44   ` Richard Henderson
2020-02-03 18:31 ` [PATCH v1 05/13] util/mmap-alloc: Factor out calculation of pagesize to mmap_pagesize() David Hildenbrand
2020-02-05 19:37   ` Murilo Opsfelder Araújo
2020-02-06 11:46   ` Richard Henderson
2020-02-03 18:31 ` [PATCH v1 06/13] util/mmap-alloc: Factor out reserving of a memory region to mmap_reserve() David Hildenbrand
2020-02-05 19:40   ` Murilo Opsfelder Araújo
2020-02-06 11:55   ` Richard Henderson
2020-02-06 13:16     ` David Hildenbrand
2020-02-03 18:31 ` [PATCH v1 07/13] util/mmap-alloc: Factor out populating of memory to mmap_populate() David Hildenbrand
2020-02-05 19:56   ` Murilo Opsfelder Araújo
2020-02-06  9:26     ` David Hildenbrand
2020-02-06 11:59   ` Richard Henderson
2020-02-03 18:31 ` [PATCH v1 08/13] util/mmap-alloc: Prepare for resizable mmaps David Hildenbrand
2020-02-05 23:00   ` Murilo Opsfelder Araújo
2020-02-06  8:52     ` David Hildenbrand
2020-02-06 12:31       ` Murilo Opsfelder Araújo
2020-02-06 13:16         ` David Hildenbrand
2020-02-06 15:13     ` David Hildenbrand
2020-02-06 12:02   ` Richard Henderson
2020-02-03 18:31 ` [PATCH v1 09/13] util/mmap-alloc: Implement " David Hildenbrand
2020-02-06 12:08   ` Richard Henderson
2020-02-06 13:22     ` David Hildenbrand
2020-02-06 15:27       ` David Hildenbrand
2020-02-07  0:29   ` Murilo Opsfelder Araújo
2020-02-10  9:39     ` David Hildenbrand
2020-02-03 18:31 ` [PATCH v1 10/13] numa: Introduce ram_block_notify_resized() and ram_block_notifiers_support_resize() David Hildenbrand
2020-02-03 18:31 ` [PATCH v1 11/13] util: vfio-helpers: Implement ram_block_resized() David Hildenbrand
2020-02-10 13:41   ` David Hildenbrand
2020-02-03 18:31 ` [PATCH v1 12/13] util: oslib: Resizable anonymous allocations under POSIX David Hildenbrand
2020-02-03 18:31 ` [PATCH v1 13/13] exec: Ram blocks with resizable " David Hildenbrand
2020-02-10 10:12   ` David Hildenbrand
2020-02-06  9:27 ` [PATCH v1 00/13] " Michael S. Tsirkin
2020-02-06  9:45   ` David Hildenbrand
2020-02-06 20:11 ` Dr. David Alan Gilbert
2020-02-06 20:31   ` David Hildenbrand [this message]
2020-02-07 15:28     ` Dr. David Alan Gilbert
2020-02-10  9:47       ` David Hildenbrand
