From: David Hildenbrand <david@redhat.com>
To: qemu-devel@nongnu.org
Cc: "Juan Quintela" <quintela@redhat.com>,
"Marcel Apfelbaum" <mapfelba@redhat.com>,
"Cornelia Huck" <cohuck@redhat.com>,
"Eduardo Habkost" <ehabkost@redhat.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
"Stefan Weil" <sw@weilnetz.de>,
"Murilo Opsfelder Araujo" <muriloo@linux.ibm.com>,
"Richard Henderson" <richard.henderson@linaro.org>,
"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
"Peter Xu" <peterx@redhat.com>, "Greg Kurz" <groug@kaod.org>,
"Halil Pasic" <pasic@linux.ibm.com>,
"Christian Borntraeger" <borntraeger@de.ibm.com>,
"Stefan Hajnoczi" <stefanha@redhat.com>,
"Igor Mammedov" <imammedo@redhat.com>,
"Thomas Huth" <thuth@redhat.com>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Philippe Mathieu-Daudé" <philmd@redhat.com>,
"Igor Kotrasinski" <i.kotrasinsk@partner.samsung.com>
Subject: Re: [PATCH v3 11/12] util/mmap-alloc: Support RAM_NORESERVE via MAP_NORESERVE
Date: Wed, 10 Mar 2021 11:28:56 +0100
Message-ID: <ba7a08d4-3cef-6c0a-5fb7-4f8837eb8d65@redhat.com>
In-Reply-To: <20210308150600.14440-12-david@redhat.com>
On 08.03.21 16:05, David Hildenbrand wrote:
> Let's support RAM_NORESERVE via MAP_NORESERVE. At least on Linux,
> the flag has no effect on most shared mappings - except for hugetlbfs
> and anonymous memory.
>
> Linux man page:
> "MAP_NORESERVE: Do not reserve swap space for this mapping. When swap
> space is reserved, one has the guarantee that it is possible to modify
> the mapping. When swap space is not reserved one might get SIGSEGV
> upon a write if no physical memory is available. See also the discussion
> of the file /proc/sys/vm/overcommit_memory in proc(5). In kernels before
> 2.6, this flag had effect only for private writable mappings."
>
> Note that the "guarantee" part is wrong with memory overcommit in Linux.
>
> Also, in Linux hugetlbfs is treated differently - we configure reservation
> of huge pages from the pool, not reservation of swap space (huge pages
> cannot be swapped).
>
> The rough behavior is [1]:
> a) !Hugetlbfs:
>
> 1) Without MAP_NORESERVE *or* with memory overcommit under Linux
> disabled ("/proc/sys/vm/overcommit_memory == 2"), the following
> accounting/reservation happens:
> For a file backed map
> SHARED or READ-only - 0 cost (the file is the map not swap)
> PRIVATE WRITABLE - size of mapping per instance
>
> For an anonymous or /dev/zero map
> SHARED - size of mapping
> PRIVATE READ-only - 0 cost (but of little use)
> PRIVATE WRITABLE - size of mapping per instance
>
> 2) With MAP_NORESERVE, no accounting/reservation happens.
>
> b) Hugetlbfs:
>
> 1) Without MAP_NORESERVE, huge pages are reserved.
>
> 2) With MAP_NORESERVE, no huge pages are reserved.
>
> Note: With "/proc/sys/vm/overcommit_memory == 0", we were already able
> to configure it for !hugetlbfs globally; this toggle now allows
> configuring it per mapping instead of for the whole system.
>
> The target use case is virtio-mem, which dynamically exposes memory
> inside a large, sparse memory area to the VM.
>
> [1] https://www.kernel.org/doc/Documentation/vm/overcommit-accounting
>
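To make the target use case concrete, here is a minimal standalone sketch (not part of this patch) of mapping a large, sparse private anonymous region without reserving swap space. The helper name map_sparse_noreserve() is made up for illustration, and the #ifdef guard is an assumption to keep it building on hosts that do not define the flag.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper, not a QEMU function: map "size" bytes of private
 * anonymous memory and ask the kernel not to account/reserve for it.
 * Returns NULL on failure.
 */
static void *map_sparse_noreserve(size_t size)
{
    int flags = MAP_PRIVATE | MAP_ANONYMOUS;
    void *ptr;

#ifdef MAP_NORESERVE
    /* Only a hint: Linux ignores it when overcommit_memory is 2. */
    flags |= MAP_NORESERVE;
#endif
    ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
    return ptr == MAP_FAILED ? NULL : ptr;
}
```

Pages are only populated once they are touched, so a mostly untouched region like this costs little physical memory; whether the flag actually skips accounting depends on the host configuration described above.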
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
> softmmu/physmem.c | 1 +
> util/mmap-alloc.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 66 insertions(+), 1 deletion(-)
>
> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
> index dcc1fb74aa..199c5a4985 100644
> --- a/softmmu/physmem.c
> +++ b/softmmu/physmem.c
> @@ -2229,6 +2229,7 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
> flags = MAP_FIXED;
> flags |= block->flags & RAM_SHARED ?
> MAP_SHARED : MAP_PRIVATE;
> + flags |= block->flags & RAM_NORESERVE ? MAP_NORESERVE : 0;
> if (block->fd >= 0) {
> area = mmap(vaddr, length, PROT_READ | PROT_WRITE,
> flags, block->fd, offset);
> diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> index ecace41ad5..c511a68bbe 100644
> --- a/util/mmap-alloc.c
> +++ b/util/mmap-alloc.c
> @@ -20,6 +20,7 @@
> #include "qemu/osdep.h"
> #include "qemu/mmap-alloc.h"
> #include "qemu/host-utils.h"
> +#include "qemu/cutils.h"
> #include "qemu/error-report.h"
>
> #define HUGETLBFS_MAGIC 0x958458f6
> @@ -125,6 +126,7 @@ static void *mmap_activate(void *ptr, size_t size, int fd, uint32_t mmap_flags,
> const bool readonly = mmap_flags & QEMU_RAM_MMAP_READONLY;
> const bool shared = mmap_flags & QEMU_RAM_MMAP_SHARED;
> const bool is_pmem = mmap_flags & QEMU_RAM_MMAP_PMEM;
> + const bool noreserve = mmap_flags & QEMU_RAM_MMAP_NORESERVE;
> const int prot = PROT_READ | (readonly ? 0 : PROT_WRITE);
> int map_sync_flags = 0;
> int flags = MAP_FIXED;
> @@ -132,6 +134,7 @@ static void *mmap_activate(void *ptr, size_t size, int fd, uint32_t mmap_flags,
>
> flags |= fd == -1 ? MAP_ANONYMOUS : 0;
> flags |= shared ? MAP_SHARED : MAP_PRIVATE;
> + flags |= noreserve ? MAP_NORESERVE : 0;
> if (shared && is_pmem) {
> map_sync_flags = MAP_SYNC | MAP_SHARED_VALIDATE;
> }
> @@ -174,6 +177,66 @@ static inline size_t mmap_guard_pagesize(int fd)
> #endif
> }
>
> +#define OVERCOMMIT_MEMORY_PATH "/proc/sys/vm/overcommit_memory"
> +static bool map_noreserve_effective(int fd, uint32_t mmap_flags)
> +{
> +#if defined(__linux__)
> + const bool readonly = mmap_flags & QEMU_RAM_MMAP_READONLY;
> + const bool shared = mmap_flags & QEMU_RAM_MMAP_SHARED;
> + gchar *content = NULL;
> + const char *endptr;
> + unsigned int tmp;
> +
> + /*
> + * hugetlb accounting is different from ordinary swap reservation:
> + * a) Hugetlb pages from the pool are reserved for both private and
> + * shared mappings. For shared mappings, reservations are tracked
> + * per file -- all mappers have to specify MAP_NORESERVE.
> + * b) MAP_NORESERVE is not affected by /proc/sys/vm/overcommit_memory.
> + */
> + if (qemu_fd_getpagesize(fd) != qemu_real_host_page_size) {
> + return true;
> + }
> +
> + /*
> + * Accountable mappings in the kernel that can be affected by MAP_NORESERVE
> + * are private writable mappings (see mm/mmap.c:accountable_mapping() in
> + * Linux). For all shared or readonly mappings, MAP_NORESERVE is always
> + * implicitly active -- no reservation; this includes shmem. The only
> + * exception is shared anonymous memory, which is accounted like private
> + * anonymous memory.
> + */
> + if (readonly || (shared && fd >= 0)) {
> + return true;
> + }
> +
> + /*
> + * MAP_NORESERVE is globally ignored for private writable mappings when
> + * overcommit is set to "never". Sparse memory regions aren't really
> + * possible in this system configuration.
> + *
> + * Bail out now instead of silently committing way more memory than
> + * currently desired by the user.
> + */
> + if (g_file_get_contents(OVERCOMMIT_MEMORY_PATH, &content, NULL, NULL) &&
> + !qemu_strtoui(content, &endptr, 0, &tmp) &&
> + (!endptr || *endptr == '\n')) {
> + if (tmp == 2) {
> + error_report("Skipping reservation of swap space is not supported:"
> + " \"" OVERCOMMIT_MEMORY_PATH "\" is \"2\"");
> + return false;
> + }
> + return true;
> + }
> + /* this interface has been around since Linux 2.6 */
> + error_report("Skipping reservation of swap space is not supported:"
> + " Could not read: \"" OVERCOMMIT_MEMORY_PATH "\"");
> + return false;
> +#else

I'll return "false" here for now, after learning that, e.g., FreeBSD never
implemented the flag and removed it a while ago:

https://github.com/Clozure/ccl/issues/17

So I'll unlock it only for Linux, which makes sense because I only
test there (and only care about MAP_NORESERVE on Linux).

> + return true;
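
For reference, the overall decision with the non-Linux case returning "false" could look roughly like the standalone sketch below. This is only an illustration under assumptions, not the code I'll send: noreserve_effective_sketch() and read_overcommit_mode() are made-up names, the host/hugetlb checks are passed in as plain booleans instead of using the QEMU helpers, and the file is parsed with plain stdio instead of glib.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: read an overcommit mode (0, 1 or 2) from "path".
 * Returns -1 if the file cannot be read or holds an unexpected value.
 */
static int read_overcommit_mode(const char *path)
{
    FILE *f = fopen(path, "r");
    int mode = -1;

    if (!f) {
        return -1;
    }
    if (fscanf(f, "%d", &mode) != 1 || mode < 0 || mode > 2) {
        mode = -1;
    }
    fclose(f);
    return mode;
}

/*
 * Hypothetical pure decision function mirroring the checks in the patch,
 * with the non-Linux case reporting "not effective".
 */
static bool noreserve_effective_sketch(bool is_linux, bool is_hugetlb,
                                       bool readonly, bool shared, int fd,
                                       int overcommit_mode)
{
    if (!is_linux) {
        /* Other hosts may define the flag but never implement it. */
        return false;
    }
    if (is_hugetlb) {
        /* Huge page reservation is independent of overcommit_memory. */
        return true;
    }
    if (readonly || (shared && fd >= 0)) {
        /* Never accounted, so MAP_NORESERVE is implicitly active. */
        return true;
    }
    /* Private writable or shared anonymous: ignored with mode 2. */
    return overcommit_mode == 0 || overcommit_mode == 1;
}
```

A caller would pass read_overcommit_mode("/proc/sys/vm/overcommit_memory") as the last argument; an unreadable file (-1) is treated like "never overcommit", matching the bail-out in the patch.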
--
Thanks,
David / dhildenb