From: Peter Xu <peterx@redhat.com>
To: David Hildenbrand <david@redhat.com>
Cc: "Thomas Huth" <thuth@redhat.com>,
	"Cornelia Huck" <cohuck@redhat.com>,
	"Eduardo Habkost" <ehabkost@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Stefan Weil" <sw@weilnetz.de>,
	"Murilo Opsfelder Araujo" <muriloo@linux.ibm.com>,
	"Richard Henderson" <richard.henderson@linaro.org>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	"Juan Quintela" <quintela@redhat.com>,
	qemu-devel@nongnu.org, "Halil Pasic" <pasic@linux.ibm.com>,
	"Christian Borntraeger" <borntraeger@de.ibm.com>,
	"Greg Kurz" <groug@kaod.org>,
	"Stefan Hajnoczi" <stefanha@redhat.com>,
	"Igor Mammedov" <imammedo@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Philippe Mathieu-Daudé" <philmd@redhat.com>,
	"Igor Kotrasinski" <i.kotrasinsk@partner.samsung.com>
Subject: Re: [PATCH v1 8/9] util/mmap-alloc: support RAM_NORESERVE via MAP_NORESERVE
Date: Tue, 2 Mar 2021 16:44:44 -0500
Message-ID: <20210302214444.GQ397383@xz-x1>
In-Reply-To: <522c672e-9c16-48ef-24a8-3687b5332b2a@redhat.com>

On Tue, Mar 02, 2021 at 08:01:11PM +0100, David Hildenbrand wrote:
> On 02.03.21 18:51, Peter Xu wrote:
> > On Tue, Feb 09, 2021 at 02:49:38PM +0100, David Hildenbrand wrote:
> > > +#define OVERCOMMIT_MEMORY_PATH "/proc/sys/vm/overcommit_memory"
> > > +static bool map_noreserve_effective(int fd, bool shared)
> > > +{
> > > +#if defined(__linux__)
> > > +    gchar *content = NULL;
> > > +    const char *endptr;
> > > +    unsigned int tmp;
> > > +
> > > +    /* hugetlbfs behaves differently */
> > > +    if (qemu_fd_getpagesize(fd) != qemu_real_host_page_size) {
> > > +        return true;
> > > +    }
> > > +
> > > +    /* only private shared mappings are accounted (ignoring /dev/zero) */
> > > +    if (fd != -1 && shared) {
> > > +        return true;
> > > +    }

[1]

> > > +
> > > +    if (g_file_get_contents(OVERCOMMIT_MEMORY_PATH, &content, NULL, NULL) &&
> > > +        !qemu_strtoui(content, &endptr, 0, &tmp) &&
> > > +        (!endptr || *endptr == '\n')) {
> > > +        if (tmp == 2) {
> > > +            error_report("Skipping reservation of swap space is not supported: "
> > > +                         " \"" OVERCOMMIT_MEMORY_PATH "\" is \"2\"");
> > > +            return false;
> > > +        }
> > > +        return true;
> > > +    }
> > > +    /* this interface has been around since Linux 2.6 */
> > > +    error_report("Skipping reservation of swap space is not supported: "
> > > +                 " Could not read: \"" OVERCOMMIT_MEMORY_PATH "\"");
> > > +    return false;
> > > +#else
> > > +    return true;
> > > +#endif
> > > +}
> > 
> > I feel like this helper wants to fail gracefully for some conditions.  Could
> > you elaborate one example and attach to the commit log?
> 
> Sure. The case is "/proc/sys/vm/overcommit_memory == 2" (never overcommit)
> 
> MAP_NORESERVE is without effect and sparse memory regions are somewhat
> impossible.
> 
> > 
> > I'm also wondering whether it would worth to check the global value.  Even if
> > overcommit is globally disabled, do we (as an application process) need to care
> > about it?  I think the MAP_NORESERVE would simply be silently ignored by the
> > kernel and that seems to be design of it, otherwise would all apps who uses
> > MAP_NORESERVE would need to do similar things too?
> 
> Right, I want to catch the "gets silently ignored" part, because someone
> requested "reserved=off" (!default) but does not actually get what he asked
> for.
> 
> As one example, glibc manages heaps via:
> 
> a) Creating a new heap: mmap(PROT_NONE, MAP_NORESERVE) the maximum size,
> then mprotect(PROT_READ|PROT_WRITE) the initial heap size. Even if
> MAP_NORESERVE is ignored, only !PROT_NONE memory ever gets committed
> ("reserve swap space") in Linux.
> 
> b) Growing the heap via mprotect(PROT_READ|PROT_WRITE) within the existing
> mmap. This will commit memory in case MAP_NORESERVE got ignored.
> 
> c) Shrinking the heap ("discard memory") via MADV_DONTNEED *unless*
> "/proc/sys/vm/overcommit_memory == 2" - the only way to undo
> mprotect(PROT_READ|PROT_WRITE) and to un-commit memory is by doing a
> mmap(PROT_NONE, MAP_FIXED) over the problematic region.
> 
> If you're interested, you can take a look at:
> 
> malloc/arena.c
> sysdeps/unix/sysv/linux/malloc-sysdep.h:check_may_shrink_heap()

Thanks for the context.  It's interesting to know libc has such special heap
operations.
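
The heap pattern described in a) to c) above can be sketched roughly as
follows.  This is a simplified illustration, not glibc's actual code:
heap_reserve(), heap_grow() and heap_shrink() are hypothetical names, and all
sizes are assumed to be multiples of the host page size.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* a) Reserve address space only: inaccessible, no swap reservation asked. */
static void *heap_reserve(size_t max_size)
{
    void *p = mmap(NULL, max_size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/*
 * a)/b) Make the first new_size bytes usable.  If MAP_NORESERVE was ignored
 * (overcommit_memory == 2), this is the point where memory gets committed.
 */
static int heap_grow(void *heap, size_t new_size)
{
    return mprotect(heap, new_size, PROT_READ | PROT_WRITE);
}

/*
 * c) Shrink from old_size to new_size by mapping a fresh PROT_NONE region
 * over the tail, which also un-commits it.  Contents of the tail are lost.
 */
static int heap_shrink(void *heap, size_t old_size, size_t new_size)
{
    assert(new_size <= old_size);
    if (new_size == old_size) {
        return 0;
    }
    void *tail = (char *)heap + new_size;
    void *p = mmap(tail, old_size - new_size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE | MAP_FIXED,
                   -1, 0);
    return p == MAP_FAILED ? -1 : 0;
}
```

When overcommit is allowed, the tail could instead be discarded with
madvise(MADV_DONTNEED); the MAP_FIXED remap is the variant that matters for
the overcommit_memory == 2 case discussed here.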

Glibc shrinks the heap to save memory in the no-overcommit case.  In our case,
however, we'd currently fail users who combine overcommit_memory=2 with
reserve=off.  Even if we didn't fail them up front, mmap() could still fail if
memory is overcommitted; and even if that mmap() succeeded, things would fail
very easily later on iiuc, right?
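
For reference, the early check being discussed boils down to something like
the sketch below, written in plain C without glib.  The function names and the
path parameter are hypothetical (the path is only a parameter so the helper
can be exercised against a test file); the one fact assumed from the kernel
side is that MAP_NORESERVE is ignored when the policy is 2.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read the overcommit policy from a file such as
 * "/proc/sys/vm/overcommit_memory".  Returns the parsed value
 * (0, 1 or 2 on Linux), or -1 if the file cannot be read or parsed.
 */
static int read_overcommit_policy(const char *path)
{
    assert(path != NULL);

    FILE *f = fopen(path, "r");
    int mode;

    if (!f) {
        return -1;
    }
    if (fscanf(f, "%d", &mode) != 1) {
        mode = -1;
    }
    fclose(f);
    return mode;
}

/* MAP_NORESERVE only has an effect with heuristic (0) or always (1) policy. */
static int noreserve_is_honored(int mode)
{
    return mode == 0 || mode == 1;
}
```

A caller would pass the real procfs path and treat both -1 and 2 as "cannot
skip the reservation", which matches the two error paths in the quoted helper.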

I think it's fine to have that early failure.  It just seems less helpful than
what glibc does, which shrinks active memory for real.  Meanwhile, the helper
seems to encode some very detailed OS information, so it's just a bit less
charming.

Btw, the "fd != -1 && shared" check at [1] above looks weird to me.

Firstly, it bypasses the overcommit_memory==2 check and returns true directly,
is that right?  I thought the global setting is meaningful for all memory
except hugetlbfs (see do_mmap() in Linux).

Meanwhile, I don't see why file-backed shared memory is so special either.
From your commit message, I'm not sure whether you wanted to return false
instead; however, that's still not right IIUC, since e.g. /dev/shm still does
accounting, while MAP_NORESERVE will skip it.

-- 
Peter Xu



Thread overview: 30+ messages
2021-02-09 13:49 [PATCH v1 0/9] RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property David Hildenbrand
2021-02-09 13:49 ` [PATCH v1 1/9] softmmu/physmem: drop "shared" parameter from ram_block_add() David Hildenbrand
2021-02-09 13:49 ` [PATCH v1 2/9] util/mmap-alloc: factor out calculation of the pagesize for the guard page David Hildenbrand
2021-02-09 13:49 ` [PATCH v1 3/9] util/mmap-alloc: factor out reserving of a memory region to mmap_reserve() David Hildenbrand
2021-02-09 13:49 ` [PATCH v1 4/9] util/mmap-alloc: factor out activating of memory to mmap_activate() David Hildenbrand
2021-02-09 13:49 ` [PATCH v1 5/9] softmmu/memory: pass ram_flags into qemu_ram_alloc_from_fd() David Hildenbrand
2021-03-02 17:17   ` Peter Xu
2021-02-09 13:49 ` [PATCH v1 6/9] softmmu/memory: pass ram_flags into memory_region_init_ram_shared_nomigrate() David Hildenbrand
2021-03-02 17:17   ` Peter Xu
2021-02-09 13:49 ` [PATCH v1 7/9] memory: introduce RAM_NORESERVE and wire it up in qemu_ram_mmap() David Hildenbrand
2021-03-02 17:32   ` Peter Xu
2021-03-02 19:02     ` David Hildenbrand
2021-03-02 20:54       ` Peter Xu
2021-03-02 20:58         ` David Hildenbrand
2021-03-03 11:35       ` Cornelia Huck
2021-03-03 11:37         ` David Hildenbrand
2021-03-03 12:12           ` Thomas Huth
2021-03-03 12:24             ` David Hildenbrand
2021-03-03 11:39         ` Thomas Huth
2021-03-03 11:41           ` David Hildenbrand
2021-02-09 13:49 ` [PATCH v1 8/9] util/mmap-alloc: support RAM_NORESERVE via MAP_NORESERVE David Hildenbrand
2021-03-02 17:51   ` Peter Xu
2021-03-02 19:01     ` David Hildenbrand
2021-03-02 21:44       ` Peter Xu [this message]
2021-03-03 10:14         ` David Hildenbrand
2021-03-03 17:05           ` Peter Xu
2021-03-04 16:15             ` David Hildenbrand
2021-02-09 13:49 ` [PATCH v1 9/9] hostmem: wire up RAM_NORESERVE via "reserve" property David Hildenbrand
2021-03-02 17:55   ` Peter Xu
2021-03-02 13:12 ` [PATCH v1 0/9] RAM_NORESERVE, MAP_NORESERVE and hostmem " David Hildenbrand
