From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:38135)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1bHuNN-0007y5-79 for qemu-devel@nongnu.org;
	Tue, 28 Jun 2016 10:52:34 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1bHuNF-0005xk-HS for qemu-devel@nongnu.org;
	Tue, 28 Jun 2016 10:52:32 -0400
Received: from mx-v6.kamp.de ([2a02:248:0:51::16]:54547 helo=mx01.kamp.de)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1bHuNF-0005wb-71 for qemu-devel@nongnu.org;
	Tue, 28 Jun 2016 10:52:25 -0400
References: <1467104499-27517-1-git-send-email-pl@kamp.de>
	<57726A20.4000808@kamp.de>
	<1564831478.2624143.1467116962342.JavaMail.zimbra@redhat.com>
	<57726E7E.3060709@kamp.de>
	<20160628125620.GI2243@work-vm>
	<57728D26.7080201@kamp.de>
From: Peter Lieven
Message-ID: <57728F11.3080802@kamp.de>
Date: Tue, 28 Jun 2016 16:52:01 +0200
MIME-Version: 1.0
In-Reply-To: <57728D26.7080201@kamp.de>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 8bit
Subject: Re: [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage
To: "Dr. David Alan Gilbert"
Cc: Paolo Bonzini , qemu-devel@nongnu.org, kwolf@redhat.com,
	peter maydell , mst@redhat.com, mreitz@redhat.com, kraxel@redhat.com

On 28.06.2016 at 16:43, Peter Lieven wrote:
> On 28.06.2016 at 14:56, Dr. David Alan Gilbert wrote:
>> * Peter Lieven (pl@kamp.de) wrote:
>>> On 28.06.2016 at 14:29, Paolo Bonzini wrote:
>>>>> On 28.06.2016 at 13:37, Paolo Bonzini wrote:
>>>>>> On 28/06/2016 11:01, Peter Lieven wrote:
>>>>>>> I recently found that Qemu is using several hundred megabytes of RSS
>>>>>>> memory more than older versions such as Qemu 2.2.0. So I started
>>>>>>> tracing memory allocation and found 2 major reasons for this.
>>>>>>>
>>>>>>> 1) We changed the qemu coroutine pool to have a per thread and a
>>>>>>>    global release pool. The chosen pool size and the changed
>>>>>>>    algorithm could lead to up to 192 free coroutines with just a
>>>>>>>    single iothread, each of the coroutines in the pool having 1MB
>>>>>>>    of stack memory.
>>>>>> But the fix, as you correctly note, is to reduce the stack size. It
>>>>>> would be nice to compile block-obj-y with -Wstack-usage=2048 too.
>>>>> To reveal if there are any big stack allocations in the block layer?
>>>> Yes. Most should be fixed by now, but a handful are probably still there
>>>> (definitely one in vvfat.c).
>>>>
>>>>> As it seems, reducing to 64kB breaks live migration in some (non-reproducible) cases.
>>>> Does it hit the guard page?
>>> What would that look like? I get segfaults like this:
>>>
>>> segfault at 7f91aa642b78 ip 0000555ab714ef7d sp 00007f91aa642b50 error 6 in qemu-system-x86_64[555ab6f2c000+794000]
>>>
>>> Most of the time it is error 6, sometimes error 7. The faulting address is near the sp.
>> A backtrace would be good.
>
> Here we go. My old friend nc_sendv_compat ;-)

This has already been fixed in master. My test systems use an older Qemu ;-)

Peter

>
> Again the question: Would you go for reducing the stack size and eliminating all stack eaters?
>
> The static netbuf in nc_sendv_compat is no problem.
>
> And: I would go for adding the guard page without MAP_GROWSDOWN and mmapping the rest of the
> stack with this flag if available. So we are safe on non-Linux systems, on Linux before 3.9, or with merged memory regions.
>
> Peter
>
> ---
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x0000555555a2ee35 in nc_sendv_compat (nc=0x0, iov=0x0, iovcnt=0, flags=0)
>     at net/net.c:701
> (gdb) bt full
> #0  0x0000555555a2ee35 in nc_sendv_compat (nc=0x0, iov=0x0, iovcnt=0, flags=0)
>     at net/net.c:701
>         buf = '\000' ...
>         buffer = 0x0
>         offset = 0
> #1  0x0000555555a2f058 in qemu_deliver_packet_iov (sender=0x5555565a46b0,
>     flags=0, iov=0x7ffff7e98d20, iovcnt=1, opaque=0x555557802370)
>     at net/net.c:745
>         nc = 0x555557802370
>         ret = 21845
> #2  0x0000555555a3132d in qemu_net_queue_deliver (queue=0x555557802590,
>     sender=0x5555565a46b0, flags=0, data=0x55555659e2a8 "", size=74)
>     at net/queue.c:163
>         ret = -1
>         iov = {iov_base = 0x55555659e2a8, iov_len = 74}
> #3  0x0000555555a3178b in qemu_net_queue_flush (queue=0x555557802590)
>     at net/queue.c:260
>         packet = 0x55555659e280
>         ret = 21845
> #4  0x0000555555a2eb7a in qemu_flush_or_purge_queued_packets (
>     nc=0x555557802370, purge=false) at net/net.c:629
> No locals.
> #5  0x0000555555a2ebe4 in qemu_flush_queued_packets (nc=0x555557802370)
>     at net/net.c:642
> No locals.
> #6  0x00005555557747b7 in virtio_net_set_status (vdev=0x555556fb32a8,
>     status=7 '\a') at /usr/src/qemu-2.5.0/hw/net/virtio-net.c:178
>         ncs = 0x555557802370
>         queue_started = true
>         n = 0x555556fb32a8
>         __func__ = "virtio_net_set_status"
>         q = 0x555557308b50
>         i = 0
>         queue_status = 7 '\a'
> #7  0x0000555555795501 in virtio_set_status (vdev=0x555556fb32a8, val=7 '\a')
>     at /usr/src/qemu-2.5.0/hw/virtio/virtio.c:618
>         k = 0x55555657eb40
>         __func__ = "virtio_set_status"
> #8  0x00005555557985e6 in virtio_vmstate_change (opaque=0x555556fb32a8,
>     running=1, state=RUN_STATE_RUNNING)
>     at /usr/src/qemu-2.5.0/hw/virtio/virtio.c:1539
>         vdev = 0x555556fb32a8
>         qbus = 0x555556fb3240
>         __func__ = "virtio_vmstate_change"
>         k = 0x555556570420
>         backend_run = true
> #9  0x00005555558592ae in vm_state_notify (running=1, state=RUN_STATE_RUNNING)
>     at vl.c:1601
>         e = 0x555557320cf0
>         next = 0x555557af4c40
> #10 0x000055555585737d in vm_start () at vl.c:756
>         requested = RUN_STATE_MAX
> #11 0x0000555555a209ec in process_incoming_migration_co (opaque=0x5555566a1600)
>     at migration/migration.c:392
>         f = 0x5555566a1600
>         local_err = 0x0
>         mis = 0x5555575ab0e0
>         ps = POSTCOPY_INCOMING_NONE
>         ret = 0
> #12 0x0000555555b61efd in coroutine_trampoline (i0=1465036928, i1=21845)
>     at util/coroutine-ucontext.c:80
>         arg = {p = 0x55555752b080, i = {1465036928, 21845}}
>         self = 0x55555752b080
>         co = 0x55555752b080
> #13 0x00007ffff5cb7800 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> No symbol table info available.
> #14 0x00007fffffffcb40 in ?? ()
> No symbol table info available.
> #15 0x0000000000000000 in ?? ()
> No symbol table info available.
>
>
>>
>> Dave
>>
>>>
>>>>>>> 2) Between Qemu 2.2.0 and 2.3.0 RCU was introduced, which led to
>>>>>>>    delayed freeing of memory. This led to higher heap allocations
>>>>>>>    which could not effectively be returned to the kernel (most
>>>>>>>    likely due to fragmentation).
>>>>>> I agree that some of the exec.c allocations need some care, but I would
>>>>>> prefer to use a custom free list or lazy allocation instead of mmap.
>>>>> This would only help if the elements from the free list were allocated
>>>>> using mmap? The issue is that RCU delays the freeing, so that the number of
>>>>> concurrent allocations is high and then a bunch is freed at once. If the memory
>>>>> were malloced it would still have caused trouble.
>>>> The free list should improve reuse and fragmentation. I'll take a look at
>>>> lazy allocation of subpages, too.
>>> Ok, that would be good. And for the PhysPageMap we use mmap and try to avoid
>>> the realloc?
>>>
>>> Peter
>>>
>> --
>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
>

-- 

Kind regards

Peter Lieven

...........................................................

  KAMP Netzwerkdienste GmbH
  Vestische Str. 89-91 | 46117 Oberhausen
  Tel: +49 (0) 208.89 402-50 | Fax: +49 (0) 208.89 402-40
  pl@kamp.de | http://www.kamp.de

  Managing Directors: Heiner Lante | Michael Lante
  Amtsgericht Duisburg | HRB Nr. 12154
  USt-Id-Nr.: DE 120607556

...........................................................