From: Paolo Bonzini
Date: Tue, 28 Jun 2016 13:37:02 +0200
Subject: Re: [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage
To: Peter Lieven, qemu-devel@nongnu.org
Cc: kwolf@redhat.com, peter.maydell@linaro.org, mst@redhat.com, dgilbert@redhat.com, mreitz@redhat.com, kraxel@redhat.com
In-Reply-To: <1467104499-27517-1-git-send-email-pl@kamp.de>

On 28/06/2016 11:01, Peter Lieven wrote:
> I recently found that Qemu is using several hundred megabytes of RSS memory
> more than older versions such as Qemu 2.2.0. So I started tracing
> memory allocation and found 2 major reasons for this.
>
> 1) We changed the qemu coroutine pool to have a per thread and a global release
>    pool. The chosen pool size and the changed algorithm could lead to up to
>    192 free coroutines with just a single iothread. Each of the coroutines
>    in the pool has 1MB of stack memory.

But the fix, as you correctly note, is to reduce the stack size.  It
would be nice to compile block-obj-y with -Wstack-usage=2048 too.
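To illustrate what -Wstack-usage=2048 would catch, here is a minimal sketch, not code from the series. GCC warns when a function's stack frame may exceed the given number of bytes. The function names and SCRATCH_SIZE are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical scratch size, deliberately larger than the 2048-byte limit. */
#define SCRATCH_SIZE 4096

/*
 * The 4 KiB array lives in this function's stack frame, so building with
 * gcc -Wstack-usage=2048 is expected to warn about this function.
 */
size_t checksum_on_stack(const unsigned char *data, size_t len)
{
    unsigned char scratch[SCRATCH_SIZE];
    size_t sum = 0;

    assert(len <= SCRATCH_SIZE);
    memcpy(scratch, data, len);
    for (size_t i = 0; i < len; i++) {
        sum += scratch[i];
    }
    return sum;
}

/*
 * Same work with the scratch buffer on the heap: the stack frame stays
 * small, which is what makes a 64kB coroutine stack safer to use.
 */
size_t checksum_on_heap(const unsigned char *data, size_t len)
{
    unsigned char *scratch = malloc(SCRATCH_SIZE);
    size_t sum = 0;

    assert(len <= SCRATCH_SIZE);
    if (scratch == NULL) {
        abort();    /* this sketch simply gives up on allocation failure */
    }
    memcpy(scratch, data, len);
    for (size_t i = 0; i < len; i++) {
        sum += scratch[i];
    }
    free(scratch);
    return sum;
}
```

Both functions return the same result; only the first has a frame large enough to trip the warning.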

> 2) Between Qemu 2.2.0 and 2.3.0 RCU was introduced, which led to delayed freeing
>    of memory. This led to higher heap allocations which could not effectively
>    be returned to the kernel (most likely due to fragmentation).

I agree that some of the exec.c allocations need some care, but I would
prefer to use a custom free list or lazy allocation instead of mmap.

Changing allocations to use mmap also is not really useful if you do it
for objects that are never freed (as in patches 8-9-10-15 at least, and
probably 11 too, which is one of the most contentious).

In other words, the effort tracking down the allocation is really,
really appreciated.  But the patches look like you only had a hammer at
hand, and everything looked like a nail. :)

Paolo

> The following series is what I came up with. Besides the coroutine patches I changed
> some allocations to forcibly use mmap. None of these allocations are made repeatedly
> during runtime, so the impact of using mmap should be negligible.
>
> There are still some big malloced allocations left which cannot be easily changed
> (e.g. the pixman buffers in VNC). So it might be an idea to set a lower mmap threshold for
> malloc since this threshold seems to be in the order of several Megabytes on modern systems.
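
For reference, the custom free list suggested in the reply above could look roughly like the following. This is only a sketch under assumptions: BufPool, pool_get, pool_put, BUF_SIZE and MAX_CACHED are hypothetical names, not QEMU APIs, and the code is not thread-safe. Freed objects are kept on a small list and handed out again, so a frequently recycled allocation keeps reusing the same few heap chunks instead of fragmenting the heap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BUF_SIZE   4096   /* hypothetical fixed object size */
#define MAX_CACHED 4      /* hypothetical cap on cached free objects */

/* A free object is reused as a list node while it sits in the pool. */
typedef struct FreeNode {
    struct FreeNode *next;
} FreeNode;

typedef struct BufPool {
    FreeNode *free_list;  /* singly linked list of cached objects */
    size_t cached;        /* number of objects currently on the list */
} BufPool;

/* Take an object from the free list, or fall back to malloc. */
void *pool_get(BufPool *p)
{
    if (p->free_list != NULL) {
        FreeNode *n = p->free_list;
        p->free_list = n->next;
        p->cached--;
        return n;
    }
    return malloc(BUF_SIZE);
}

/* Return an object: cache it if there is room, otherwise free it. */
void pool_put(BufPool *p, void *buf)
{
    FreeNode *n = buf;

    assert(buf != NULL);
    if (p->cached >= MAX_CACHED) {
        free(buf);
        return;
    }
    n->next = p->free_list;
    p->free_list = n;
    p->cached++;
}
```

The cap bounds how much memory the pool can hold on to, which is the same trade-off the coroutine pool size discussion is about.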
>
> Peter Lieven (15):
>   coroutine-ucontext: mmap stack memory
>   coroutine-ucontext: add a switch to monitor maximum stack size
>   coroutine-ucontext: reduce stack size to 64kB
>   coroutine: add a knob to disable the shared release pool
>   util: add a helper to mmap private anonymous memory
>   exec: use mmap for subpages
>   qapi: use mmap for QmpInputVisitor
>   virtio: use mmap for VirtQueue
>   loader: use mmap for ROMs
>   vmware_svga: use mmap for scratch pad
>   qom: use mmap for bigger Objects
>   util: add a function to realloc mmapped memory
>   exec: use mmap for PhysPageMap->nodes
>   vnc-tight: make the encoding palette static
>   vnc: use mmap for VncState
>
>  configure                 | 33 ++++++++++++++++++--
>  exec.c                    | 11 ++++---
>  hw/core/loader.c          | 16 +++++-----
>  hw/display/vmware_vga.c   |  3 +-
>  hw/virtio/virtio.c        |  5 +--
>  include/qemu/mmap-alloc.h |  7 +++++
>  include/qom/object.h      |  1 +
>  qapi/qmp-input-visitor.c  |  5 +--
>  qom/object.c              | 20 ++++++++++--
>  ui/vnc-enc-tight.c        | 21 ++++++-------
>  ui/vnc.c                  |  5 +--
>  ui/vnc.h                  |  1 +
>  util/coroutine-ucontext.c | 66 +++++++++++++++++++++++++++++++++++++--
>  util/mmap-alloc.c         | 27 ++++++++++++++++
>  util/qemu-coroutine.c     | 79 ++++++++++++++++++++++++++---------------------
>  15 files changed, 225 insertions(+), 75 deletions(-)
>