qemu-devel.nongnu.org archive mirror
From: David Hildenbrand <david@redhat.com>
To: qemu-devel@nongnu.org
Cc: "Juan Quintela" <quintela@redhat.com>,
	"Marcel Apfelbaum" <mapfelba@redhat.com>,
	"Cornelia Huck" <cohuck@redhat.com>,
	"Eduardo Habkost" <ehabkost@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Stefan Weil" <sw@weilnetz.de>,
	"Murilo Opsfelder Araujo" <muriloo@linux.ibm.com>,
	"Richard Henderson" <richard.henderson@linaro.org>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	"Peter Xu" <peterx@redhat.com>, "Greg Kurz" <groug@kaod.org>,
	"Halil Pasic" <pasic@linux.ibm.com>,
	"Christian Borntraeger" <borntraeger@de.ibm.com>,
	"Stefan Hajnoczi" <stefanha@redhat.com>,
	"Igor Mammedov" <imammedo@redhat.com>,
	"Thomas Huth" <thuth@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Philippe Mathieu-Daudé" <philmd@redhat.com>,
	"Igor Kotrasinski" <i.kotrasinsk@partner.samsung.com>
Subject: Re: [PATCH v3 11/12] util/mmap-alloc: Support RAM_NORESERVE via MAP_NORESERVE
Date: Wed, 10 Mar 2021 11:28:56 +0100
Message-ID: <ba7a08d4-3cef-6c0a-5fb7-4f8837eb8d65@redhat.com>
In-Reply-To: <20210308150600.14440-12-david@redhat.com>

On 08.03.21 16:05, David Hildenbrand wrote:
> Let's support RAM_NORESERVE via MAP_NORESERVE. At least on Linux,
> the flag has no effect on most shared mappings - except for hugetlbfs
> and anonymous memory.
> 
> Linux man page:
>    "MAP_NORESERVE: Do not reserve swap space for this mapping. When swap
>    space is reserved, one has the guarantee that it is possible to modify
>    the mapping. When swap space is not reserved one might get SIGSEGV
>    upon a write if no physical memory is available. See also the discussion
>    of the file /proc/sys/vm/overcommit_memory in proc(5). In kernels before
>    2.6, this flag had effect only for private writable mappings."
> 
> Note that the "guarantee" part is wrong with memory overcommit in Linux.
> 
> Also, in Linux hugetlbfs is treated differently - we configure reservation
> of huge pages from the pool, not reservation of swap space (huge pages
> cannot be swapped).
> 
> The rough behavior is [1]:
> a) !Hugetlbfs:
> 
>    1) Without MAP_NORESERVE *or* with memory overcommit under Linux
>       disabled ("/proc/sys/vm/overcommit_memory == 2"), the following
>       accounting/reservation happens:
>        For a file backed map
>         SHARED or READ-only - 0 cost (the file is the map not swap)
>         PRIVATE WRITABLE - size of mapping per instance
> 
>        For an anonymous or /dev/zero map
>         SHARED   - size of mapping
>         PRIVATE READ-only - 0 cost (but of little use)
>         PRIVATE WRITABLE - size of mapping per instance
> 
>    2) With MAP_NORESERVE, no accounting/reservation happens.
> 
> b) Hugetlbfs:
> 
>    1) Without MAP_NORESERVE, huge pages are reserved.
> 
>    2) With MAP_NORESERVE, no huge pages are reserved.
> 
> Note: With "/proc/sys/vm/overcommit_memory == 0", we were already able
> to configure it for !hugetlbfs globally; this toggle now allows
> configuring it at a finer granularity rather than for the whole system.
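
The accounting rules quoted above can be condensed into a small predicate. This is a minimal sketch of the simplified model from the kernel's overcommit-accounting document, not kernel or QEMU code: struct mapping_desc, expect_reservation() and the OC_* names are hypothetical, and per-instance charging of private mappings is not modeled.

```c
#include <stdbool.h>

/* Values as written to /proc/sys/vm/overcommit_memory (hypothetical names). */
enum overcommit_mode {
    OC_GUESS = 0,   /* heuristic overcommit (default) */
    OC_ALWAYS = 1,  /* always overcommit */
    OC_NEVER = 2,   /* strict accounting, MAP_NORESERVE ignored */
};

/* Hypothetical description of a single mmap() request. */
struct mapping_desc {
    bool hugetlbfs;  /* backed by a hugetlbfs file */
    bool anonymous;  /* anonymous or /dev/zero mapping */
    bool shared;     /* MAP_SHARED instead of MAP_PRIVATE */
    bool writable;   /* PROT_WRITE requested */
    bool noreserve;  /* MAP_NORESERVE requested */
};

/* True if the model expects the mapping to be accounted/reserved. */
static bool expect_reservation(const struct mapping_desc *m,
                               enum overcommit_mode mode)
{
    if (m->hugetlbfs) {
        /* b) huge pages are reserved unless MAP_NORESERVE is given;
         * the overcommit mode plays no role here. */
        return !m->noreserve;
    }
    /* a) 2) MAP_NORESERVE skips accounting unless overcommit is "never". */
    if (m->noreserve && mode != OC_NEVER) {
        return false;
    }
    if (m->anonymous) {
        /* shared: size of mapping; private: only if writable */
        return m->shared || m->writable;
    }
    /* file-backed: only private writable mappings are charged */
    return !m->shared && m->writable;
}
```

For example, a private writable anonymous mapping with MAP_NORESERVE is not accounted under mode 0, but is accounted again under mode 2.
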
> 
> The target use case is virtio-mem, which dynamically exposes memory
> inside a large, sparse memory area to the VM.
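
To illustrate that use case, here is a minimal standalone sketch using plain Linux mmap() rather than QEMU's qemu_ram_mmap(); map_sparse_region() is a hypothetical helper. A large private anonymous area is mapped with MAP_NORESERVE and only a small prefix is ever touched.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map `size` bytes of private anonymous memory without
 * swap-space reservation and populate only the first `used` bytes.
 * Returns the mapping, or NULL if mmap() fails.
 */
static void *map_sparse_region(size_t size, size_t used)
{
    void *area = mmap(NULL, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);

    if (area == MAP_FAILED) {
        return NULL;
    }
    /* Only the pages touched here get backed by memory. */
    memset(area, 0xab, used < size ? used : size);
    return area;
}
```

Note that with overcommit_memory == 2 the kernel ignores MAP_NORESERVE for such a mapping and commits the full size anyway.
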
> 
> [1] https://www.kernel.org/doc/Documentation/vm/overcommit-accounting
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>   softmmu/physmem.c |  1 +
>   util/mmap-alloc.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++-
>   2 files changed, 66 insertions(+), 1 deletion(-)
> 
> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
> index dcc1fb74aa..199c5a4985 100644
> --- a/softmmu/physmem.c
> +++ b/softmmu/physmem.c
> @@ -2229,6 +2229,7 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
>                   flags = MAP_FIXED;
>                   flags |= block->flags & RAM_SHARED ?
>                            MAP_SHARED : MAP_PRIVATE;
> +                flags |= block->flags & RAM_NORESERVE ? MAP_NORESERVE : 0;
>                   if (block->fd >= 0) {
>                       area = mmap(vaddr, length, PROT_READ | PROT_WRITE,
>                                   flags, block->fd, offset);
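
qemu_ram_remap() replaces part of a RAM block with a fresh mapping via MAP_FIXED. Because MAP_FIXED creates a new mapping instead of modifying the old one, the original mmap() flags do not carry over, which is why the hunk above has to pass MAP_NORESERVE again. A minimal standalone sketch of the anonymous case (remap_anon_range() is a hypothetical helper; the QEMU code also handles fd-backed blocks):

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: replace [addr, addr + len) of an existing mapping
 * with fresh, zeroed anonymous memory. MAP_FIXED creates a brand-new
 * mapping, so MAP_NORESERVE must be requested again if it is still wanted.
 */
static bool remap_anon_range(void *addr, size_t len, bool shared,
                             bool noreserve)
{
    int flags = MAP_FIXED | MAP_ANONYMOUS;

    flags |= shared ? MAP_SHARED : MAP_PRIVATE;
    flags |= noreserve ? MAP_NORESERVE : 0;
    /* With MAP_FIXED, success means the mapping lands exactly at addr. */
    return mmap(addr, len, PROT_READ | PROT_WRITE, flags, -1, 0) == addr;
}
```
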
> diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> index ecace41ad5..c511a68bbe 100644
> --- a/util/mmap-alloc.c
> +++ b/util/mmap-alloc.c
> @@ -20,6 +20,7 @@
>   #include "qemu/osdep.h"
>   #include "qemu/mmap-alloc.h"
>   #include "qemu/host-utils.h"
> +#include "qemu/cutils.h"
>   #include "qemu/error-report.h"
>   
>   #define HUGETLBFS_MAGIC       0x958458f6
> @@ -125,6 +126,7 @@ static void *mmap_activate(void *ptr, size_t size, int fd, uint32_t mmap_flags,
>       const bool readonly = mmap_flags & QEMU_RAM_MMAP_READONLY;
>       const bool shared = mmap_flags & QEMU_RAM_MMAP_SHARED;
>       const bool is_pmem = mmap_flags & QEMU_RAM_MMAP_PMEM;
> +    const bool noreserve = mmap_flags & QEMU_RAM_MMAP_NORESERVE;
>       const int prot = PROT_READ | (readonly ? 0 : PROT_WRITE);
>       int map_sync_flags = 0;
>       int flags = MAP_FIXED;
> @@ -132,6 +134,7 @@ static void *mmap_activate(void *ptr, size_t size, int fd, uint32_t mmap_flags,
>   
>       flags |= fd == -1 ? MAP_ANONYMOUS : 0;
>       flags |= shared ? MAP_SHARED : MAP_PRIVATE;
> +    flags |= noreserve ? MAP_NORESERVE : 0;
>       if (shared && is_pmem) {
>           map_sync_flags = MAP_SYNC | MAP_SHARED_VALIDATE;
>       }
> @@ -174,6 +177,66 @@ static inline size_t mmap_guard_pagesize(int fd)
>   #endif
>   }
>   
> +#define OVERCOMMIT_MEMORY_PATH "/proc/sys/vm/overcommit_memory"
> +static bool map_noreserve_effective(int fd, uint32_t mmap_flags)
> +{
> +#if defined(__linux__)
> +    const bool readonly = mmap_flags & QEMU_RAM_MMAP_READONLY;
> +    const bool shared = mmap_flags & QEMU_RAM_MMAP_SHARED;
> +    g_autofree gchar *content = NULL;
> +    const char *endptr;
> +    unsigned int tmp;
> +
> +    /*
> +     * hugetlb accounting is different from ordinary swap reservation:
> +     * a) Hugetlb pages from the pool are reserved for both private and
> +     *    shared mappings. For shared mappings, reservations are tracked
> +     *    per file -- all mappers have to specify MAP_NORESERVE.
> +     * b) MAP_NORESERVE is not affected by /proc/sys/vm/overcommit_memory.
> +     */
> +    if (qemu_fd_getpagesize(fd) != qemu_real_host_page_size) {
> +        return true;
> +    }
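
The check above treats any fd whose page size differs from the base page size as hugetlbfs-backed. A minimal sketch of how such a page size can be obtained on Linux, assuming (similar to QEMU's qemu_fd_getpagesize()) that hugetlbfs reports its huge page size in f_bsize; sketch_fd_pagesize() and SKETCH_HUGETLBFS_MAGIC are hypothetical names, the magic value is the HUGETLBFS_MAGIC defined earlier in this file:

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/vfs.h>
#include <unistd.h>

#define SKETCH_HUGETLBFS_MAGIC 0x958458f6UL

/*
 * Hypothetical stand-in for qemu_fd_getpagesize(): return the huge page
 * size for hugetlbfs-backed fds and the base page size otherwise
 * (including fd == -1 for anonymous memory).
 */
static size_t sketch_fd_pagesize(int fd)
{
    struct statfs fs;

    if (fd >= 0 && fstatfs(fd, &fs) == 0 &&
        (unsigned long)fs.f_type == SKETCH_HUGETLBFS_MAGIC) {
        /* hugetlbfs reports its huge page size as the block size */
        return (size_t)fs.f_bsize;
    }
    return (size_t)sysconf(_SC_PAGESIZE);
}
```
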
> +
> +    /*
> +     * Accountable mappings in the kernel that can be affected by MAP_NORESERVE
> +     * are private writable mappings (see mm/mmap.c:accountable_mapping() in
> +     * Linux). For all shared or readonly mappings, MAP_NORESERVE is always
> +     * implicitly active -- no reservation; this includes shmem. The only
> +     * exception is shared anonymous memory, which is accounted like private
> +     * anonymous memory.
> +     */
> +    if (readonly || (shared && fd >= 0)) {
> +        return true;
> +    }
> +
> +    /*
> +     * MAP_NORESERVE is globally ignored for private writable mappings when
> +     * overcommit is set to "never". Sparse memory regions aren't really
> +     * possible in this system configuration.
> +     *
> +     * Bail out now instead of silently committing way more memory than
> +     * currently desired by the user.
> +     */
> +    if (g_file_get_contents(OVERCOMMIT_MEMORY_PATH, &content, NULL, NULL) &&
> +        !qemu_strtoui(content, &endptr, 0, &tmp) &&
> +        (!endptr || *endptr == '\n')) {
> +        if (tmp == 2) {
> +            error_report("Skipping reservation of swap space is not supported:"
> +                         " \"" OVERCOMMIT_MEMORY_PATH "\" is \"2\"");
> +            return false;
> +        }
> +        return true;
> +    }
> +    /* this interface has been around since Linux 2.6 */
> +    error_report("Skipping reservation of swap space is not supported:"
> +                 " Could not read: \"" OVERCOMMIT_MEMORY_PATH "\"");
> +    return false;
> +#else

I'll return "false" here for now, after learning that FreeBSD, for
example, never implemented the flag and removed it a while ago:
	https://github.com/Clozure/ccl/issues/17

So I'll enable it only for Linux, which makes sense because that is the
only platform I test on (and the only one I care about for MAP_NORESERVE).

> +    return true;
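
Following the plan above, a minimal standalone sketch of the overcommit check for private writable mappings, with the non-Linux branch returning false. It uses plain stdio instead of GLib's g_file_get_contents() and QEMU's qemu_strtoui(), omits the hugetlbfs and shared/readonly early returns, and noreserve_honored() is a hypothetical name:

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical variant of the overcommit check in map_noreserve_effective().
 * `path` is normally "/proc/sys/vm/overcommit_memory"; it is a parameter
 * only so the logic can be exercised against a test file.
 */
static bool noreserve_honored(const char *path)
{
#if defined(__linux__)
    char buf[32];
    char *end;
    unsigned long mode;
    FILE *f = fopen(path, "r");

    if (!f) {
        return false;          /* cannot tell: refuse rather than guess */
    }
    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return false;
    }
    fclose(f);

    mode = strtoul(buf, &end, 10);
    if (end == buf || (*end != '\n' && *end != '\0')) {
        return false;          /* malformed content */
    }
    return mode != 2;          /* mode 2 ("never") ignores MAP_NORESERVE */
#else
    (void)path;
    return false;              /* only Linux is known to honor the flag */
#endif
}
```
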


-- 
Thanks,

David / dhildenb



Thread overview: 31+ messages
2021-03-08 15:05 [PATCH v3 00/12] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property David Hildenbrand
2021-03-08 15:05 ` [PATCH v3 01/12] softmmu/physmem: Mark shared anonymous memory RAM_SHARED David Hildenbrand
2021-03-08 15:05 ` [PATCH v3 02/12] softmmu/physmem: Fix ram_block_discard_range() to handle shared anonymous memory David Hildenbrand
2021-03-11 16:39   ` Dr. David Alan Gilbert
2021-03-11 16:45     ` David Hildenbrand
2021-03-11 17:11       ` Peter Xu
2021-03-11 17:15         ` David Hildenbrand
2021-03-11 17:18           ` David Hildenbrand
2021-03-11 17:22           ` Peter Xu
2021-03-11 17:41             ` David Hildenbrand
2021-03-11 21:25               ` Peter Xu
2021-03-11 21:37   ` Peter Xu
2021-03-11 21:49     ` David Hildenbrand
2021-03-08 15:05 ` [PATCH v3 03/12] softmmu/physmem: Fix qemu_ram_remap() " David Hildenbrand
2021-03-08 15:05 ` [PATCH v3 04/12] util/mmap-alloc: Factor out calculation of the pagesize for the guard page David Hildenbrand
2021-03-08 15:05 ` [PATCH v3 05/12] util/mmap-alloc: Factor out reserving of a memory region to mmap_reserve() David Hildenbrand
2021-03-08 15:05 ` [PATCH v3 06/12] util/mmap-alloc: Factor out activating of memory to mmap_activate() David Hildenbrand
2021-03-08 15:05 ` [PATCH v3 07/12] softmmu/memory: Pass ram_flags into qemu_ram_alloc_from_fd() David Hildenbrand
2021-03-08 15:05 ` [PATCH v3 08/12] softmmu/memory: Pass ram_flags into memory_region_init_ram_shared_nomigrate() David Hildenbrand
2021-03-08 15:05 ` [PATCH v3 09/12] util/mmap-alloc: Pass flags instead of separate bools to qemu_ram_mmap() David Hildenbrand
2021-03-09 20:04   ` Peter Xu
2021-03-09 20:27     ` David Hildenbrand
2021-03-09 20:58       ` Peter Xu
2021-03-10  8:41         ` David Hildenbrand
2021-03-10 10:11           ` David Hildenbrand
2021-03-10 10:55             ` David Hildenbrand
2021-03-10 16:27               ` Peter Xu
2021-03-08 15:05 ` [PATCH v3 10/12] memory: introduce RAM_NORESERVE and wire it up in qemu_ram_mmap() David Hildenbrand
2021-03-08 15:05 ` [PATCH v3 11/12] util/mmap-alloc: Support RAM_NORESERVE via MAP_NORESERVE David Hildenbrand
2021-03-10 10:28   ` David Hildenbrand [this message]
2021-03-08 15:06 ` [PATCH v3 12/12] hostmem: Wire up RAM_NORESERVE via "reserve" property David Hildenbrand
