Re: [PATCH 0/2] overcommit: introduce mem-lock-onfault

From: Peter Xu <peterx@redhat.com>
To: Daniil Tatianin <d-tatianin@yandex-team.ru>
Cc: Paolo Bonzini <pbonzini@redhat.com>, Stefan Weil <sw@weilnetz.de>,
	Fabiano Rosas <farosas@suse.de>,
	qemu-devel@nongnu.org
Subject: Re: [PATCH 0/2] overcommit: introduce mem-lock-onfault
Date: Thu, 5 Dec 2024 20:08:53 -0500
Message-ID: <Z1JOpadES2iV_i0v@x1n>
In-Reply-To: <20241205231909.1161950-1-d-tatianin@yandex-team.ru>

On Fri, Dec 06, 2024 at 02:19:06AM +0300, Daniil Tatianin wrote:
> Currently, passing mem-lock=on to QEMU causes memory usage to grow by
> huge amounts:
> 
> no memlock:
>     $ qemu-system-x86_64 -overcommit mem-lock=off
>     $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
>     45652
> 
>     $ ./qemu-system-x86_64 -overcommit mem-lock=off -enable-kvm
>     $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
>     39756
> 
> memlock:
>     $ qemu-system-x86_64 -overcommit mem-lock=on
>     $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
>     1309876
> 
>     $ ./qemu-system-x86_64 -overcommit mem-lock=on -enable-kvm
>     $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
>     259956
> 
> This is caused by the fact that mlockall(2) automatically
> write-faults every existing and future anonymous mappings in the
> process right away.
> 
> One of the reasons to enable mem-lock is to protect a QEMU process'
> pages from being compacted and migrated by kcompactd (which does so
> by messing with a live process page tables causing thousands of TLB
> flush IPIs per second) basically stealing all guest time while it's
> active.
> 
> mem-lock=on helps against this (given compact_unevictable_allowed is 0),
> but the memory overhead it introduces is an undesirable side effect,
> which we can completely avoid by passing MCL_ONFAULT to mlockall, which
> is what this series allows to do with a new command line option called
> mem-lock-onfault.

IMHO it's always helpful to dig into and explain why such a difference
exists.  E.g. guest memory should normally be the major memory sink, and
that definitely won't be affected by whether ON_FAULT is used or not.

I had a quick look at tcg specifically (as that really surprised me a bit).
Looking at the mappings, there's a constant 1G shmem mapping that always
gets locked and populated.

It turns out to be tcg's jit buffer, allocated in
alloc_code_gen_buffer_splitwx_memfd():

    buf_rw = qemu_memfd_alloc("tcg-jit", size, 0, &fd, errp);
    if (buf_rw == NULL) {
        goto fail;
    }

    buf_rx = mmap(NULL, size, host_prot_read_exec(), MAP_SHARED, fd, 0);
    if (buf_rx == MAP_FAILED) {
        error_setg_errno(errp, errno,
                         "failed to map shared memory for execute");
        goto fail;
    }

That looks like the major reason why mlockall constantly bloats tcg by
roughly 1G; the allocation seems to come from tcg_init_machine().  I didn't
check kvm.
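
To repeat this check for kvm (or any other configuration), one way is to
scan /proc/<pid>/smaps for the mapping with the largest Rss.  A minimal
sketch, not something from QEMU: largest_rss_mapping() is a hypothetical
helper name, and it assumes the usual smaps layout of one header line per
mapping followed by "Key: value kB" lines.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not a QEMU function): scan smaps-formatted text
 * from `f` and report the mapping with the largest "Rss:" value.
 * Returns that Rss in kB and copies the mapping's header line (address
 * range, permissions, path) into `hdr`.  Returns -1 if no Rss line
 * was found.
 */
long largest_rss_mapping(FILE *f, char *hdr, size_t hdr_len)
{
    char line[512], cur[512] = "";
    long best = -1, kb;

    while (fgets(line, sizeof(line), f)) {
        unsigned long start, end;

        line[strcspn(line, "\n")] = '\0';
        /* Header lines begin with "start-end" in hex. */
        if (sscanf(line, "%lx-%lx ", &start, &end) == 2) {
            strcpy(cur, line);
        } else if (sscanf(line, "Rss: %ld kB", &kb) == 1 && kb > best) {
            /* New maximum: remember which mapping it belongs to. */
            best = kb;
            snprintf(hdr, hdr_len, "%s", cur);
        }
    }
    return best;
}
```

Passing it a FILE * opened on /proc/<pid>/smaps of a running guest gives
the header line of the biggest resident mapping, which should be enough to
tell whether a single large buffer dominates there as well.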

Logically, having an on-fault option won't ever hurt, so it's probably fine
to have it anyway.  Still, I'm sharing my finding above since IIUC that's
mostly why it was bloated for tcg, so maybe there are other options too.
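
For reference, the difference between the two locking modes can be seen
outside QEMU with a small standalone test.  This is only a sketch under my
own assumptions: resident_after_mlockall() is a hypothetical name, it needs
Linux with MCL_ONFAULT support (4.4+), and it reports -1 when mlockall()
is refused, e.g. because RLIMIT_MEMLOCK is too low.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the series): lock the process with
 * mlockall(MCL_CURRENT | MCL_FUTURE | extra_flags), create a fresh
 * anonymous mapping of `npages` pages without touching it, and return
 * how many of its pages mincore() reports as resident.
 * `extra_flags` is expected to be 0 or MCL_ONFAULT.
 * Returns -1 on any failure.
 */
long resident_after_mlockall(size_t npages, int extra_flags)
{
    long page = sysconf(_SC_PAGESIZE);
    unsigned char vec[64];
    long resident = -1;

    if (page <= 0 || npages == 0 || npages > sizeof(vec)) {
        return -1;
    }
    if (mlockall(MCL_CURRENT | MCL_FUTURE | extra_flags) != 0) {
        return -1;
    }

    size_t len = npages * (size_t)page;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p != MAP_FAILED) {
        /* One byte per page in vec; bit 0 set means resident. */
        if (mincore(p, len, vec) == 0) {
            resident = 0;
            for (size_t i = 0; i < npages; i++) {
                resident += vec[i] & 1;
            }
        }
        munmap(p, len);
    }
    munlockall();   /* undo the process-wide lock before returning */
    return resident;
}
```

On a typical Linux host I'd expect resident_after_mlockall(8, 0) to report
all 8 pages and resident_after_mlockall(8, MCL_ONFAULT) to report none,
since nothing has touched the mapping yet.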

> 
> memlock-onfault:
>     $ qemu-system-x86_64 -overcommit mem-lock-onfault=on
>     $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
>     54004
> 
>     $ ./qemu-system-x86_64 -overcommit mem-lock-onfault=on -enable-kvm
>     $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
>     47772
> 
> You may notice the memory usage is still slightly higher, in this case
> by a few megabytes over the mem-lock=off case. I was able to trace this
> down to a bug in the linux kernel with MCL_ONFAULT not being honored for
> the early process heap (with brk(2) etc.) so it is still write-faulted in
> this case, but it's still way less than it was with just the mem-lock=on.
> 
> Daniil Tatianin (2):
>   os: add an ability to lock memory on_fault
>   overcommit: introduce mem-lock-onfault
> 
>  include/sysemu/os-posix.h |  2 +-
>  include/sysemu/os-win32.h |  3 ++-
>  include/sysemu/sysemu.h   |  1 +
>  migration/postcopy-ram.c  |  4 ++--
>  os-posix.c                | 10 ++++++++--
>  qemu-options.hx           | 13 ++++++++++---
>  system/globals.c          |  1 +
>  system/vl.c               | 18 ++++++++++++++++--
>  8 files changed, 41 insertions(+), 11 deletions(-)
> 
> -- 
> 2.34.1
> 
> 

-- 
Peter Xu
