From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:38135)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from <pl@kamp.de>)
	id 1bHuNN-0007y5-79
	for qemu-devel@nongnu.org; Tue, 28 Jun 2016 10:52:34 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <pl@kamp.de>) id 1bHuNF-0005xk-HS
	for qemu-devel@nongnu.org; Tue, 28 Jun 2016 10:52:32 -0400
Received: from mx-v6.kamp.de ([2a02:248:0:51::16]:54547 helo=mx01.kamp.de)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <pl@kamp.de>)
	id 1bHuNF-0005wb-71
	for qemu-devel@nongnu.org; Tue, 28 Jun 2016 10:52:25 -0400
References: <1467104499-27517-1-git-send-email-pl@kamp.de>
	<db2ee39f-a46b-0e91-15e1-cfd20fe82e60@redhat.com>
	<57726A20.4000808@kamp.de>
	<1564831478.2624143.1467116962342.JavaMail.zimbra@redhat.com>
	<57726E7E.3060709@kamp.de> <20160628125620.GI2243@work-vm>
	<57728D26.7080201@kamp.de>
From: Peter Lieven <pl@kamp.de>
Message-ID: <57728F11.3080802@kamp.de>
Date: Tue, 28 Jun 2016 16:52:01 +0200
MIME-Version: 1.0
In-Reply-To: <57728D26.7080201@kamp.de>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 8bit
Subject: Re: [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>, qemu-devel@nongnu.org, kwolf@redhat.com, peter maydell <peter.maydell@linaro.org>, mst@redhat.com, mreitz@redhat.com, kraxel@redhat.com

On 28.06.2016 at 16:43, Peter Lieven wrote:
> On 28.06.2016 at 14:56, Dr. David Alan Gilbert wrote:
>> * Peter Lieven (pl@kamp.de) wrote:
>>> On 28.06.2016 at 14:29, Paolo Bonzini wrote:
>>>>> On 28.06.2016 at 13:37, Paolo Bonzini wrote:
>>>>>> On 28/06/2016 11:01, Peter Lieven wrote:
>>>>>>> I recently found that Qemu uses several hundred megabytes more RSS memory
>>>>>>> than older versions such as Qemu 2.2.0. So I started tracing memory
>>>>>>> allocations and found 2 major reasons for this.
>>>>>>>
>>>>>>> 1) We changed the qemu coroutine pool to have a per-thread and a global
>>>>>>>    release pool. The chosen pool size and the changed algorithm could lead
>>>>>>>    to up to 192 free coroutines with just a single iothread. Each of the
>>>>>>>    coroutines in the pool has 1MB of stack memory.
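For scale, the worst case in point 1 multiplies out as follows, treating the stated 1MB as 1 MiB and the 64kB stack size discussed further down as 64 KiB. This is an upper bound on stack memory; only stack pages that are actually touched count toward RSS.

```latex
\begin{aligned}
192 \times 1\,\text{MiB}  &= 192\,\text{MiB} \\
192 \times 64\,\text{KiB} &= 12\,288\,\text{KiB} = 12\,\text{MiB}
\end{aligned}
```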
>>>>>> But the fix, as you correctly note, is to reduce the stack size.  It
>>>>>> would be nice to compile block-obj-y with -Wstack-usage=2048 too.
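As an illustration of the kind of allocation -Wstack-usage=2048 would report, here is a minimal sketch (the names `PKT_BUF_MAX` and `flatten_iov` are hypothetical, not QEMU code). It replaces a large fixed on-stack packet buffer with a heap buffer sized to the actual payload. A fixed array of this size alone is larger than a 64kB coroutine stack; the `buf` local of `nc_sendv_compat` in the backtrace below is at least 65890 bytes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical packet-size limit; assumed to be larger than 64 KiB. */
#define PKT_BUF_MAX (68 * 1024)

/*
 * The pattern -Wstack-usage=2048 would flag is a frame such as
 *
 *     unsigned char buf[PKT_BUF_MAX];   // ~68 KiB on the stack
 *
 * This hypothetical helper instead copies the scattered iovec into a
 * heap buffer of exactly the payload size. Returns a malloc'd buffer
 * (caller frees) and stores its length in *out_len, or returns NULL if
 * the payload exceeds the limit or allocation fails.
 */
static unsigned char *flatten_iov(const struct iovec *iov, int iovcnt,
                                  size_t *out_len)
{
    assert(out_len != NULL);

    size_t total = 0;
    for (int i = 0; i < iovcnt; i++) {
        total += iov[i].iov_len;
    }
    if (total > PKT_BUF_MAX) {
        return NULL;            /* keep the same upper bound as before */
    }

    unsigned char *buf = malloc(total ? total : 1);
    if (!buf) {
        return NULL;
    }

    size_t off = 0;
    for (int i = 0; i < iovcnt; i++) {
        memcpy(buf + off, iov[i].iov_base, iov[i].iov_len);
        off += iov[i].iov_len;
    }
    *out_len = total;
    return buf;
}
```

Compiling a function that declares the fixed array with `gcc -Wstack-usage=2048` should produce a stack-usage warning, while the heap version keeps the frame small at the cost of one allocation per call.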
>>>>> To reveal if there are any big stack allocations in the block layer?
>>>> Yes.  Most should be fixed by now, but a handful are probably still there.
>>>> (definitely one in vvfat.c).
>>>>
>>>>> It seems that reducing it to 64kB breaks live migration in some (non-reproducible) cases.
>>>> Does it hit the guard page?
>>> What would that look like? I get segfaults like this:
>>>
>>> segfault at 7f91aa642b78 ip 0000555ab714ef7d sp 00007f91aa642b50 error 6 in qemu-system-x86_64[555ab6f2c000+794000]
>>>
>>> Most of the time it is error 6, sometimes error 7. The segfault address is near the sp.
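The number after "error" in these kernel messages is the x86 page-fault error code. A small decoder for its three low bits (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode) shows that error 6 is a user-mode write to a not-present page and error 7 a user-mode write that violated page protection. With the fault address just above sp, both fit a write landing outside the usable stack. The helper name `describe_pf_error` is hypothetical; higher bits are ignored in this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Low bits of the x86 page-fault error code. */
enum {
    PF_PROT  = 1 << 0, /* 0: page not present, 1: protection violation */
    PF_WRITE = 1 << 1, /* 0: read access,      1: write access          */
    PF_USER  = 1 << 2, /* 0: kernel mode,      1: user mode             */
};

/* Hypothetical helper: writes a short description of err into buf. */
static void describe_pf_error(unsigned err, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "%s-mode %s, %s",
             (err & PF_USER)  ? "user" : "kernel",
             (err & PF_WRITE) ? "write" : "read",
             (err & PF_PROT)  ? "protection violation" : "page not present");
}
```

For example, `describe_pf_error(6, buf, sizeof buf)` yields "user-mode write, page not present".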
>> A backtrace would be good.
>
> Here we go. My old friend nc_sendv_compat ;-)

This has already been fixed in master. My test systems use an older Qemu ;-)

Peter

>
> Again the question: would you go for reducing the stack size and eliminating all stack eaters?
>
> The static netbuf in nc_sendv_compat is not a problem.
>
> And: I would add the guard page without MAP_GROWSDOWN and mmap the rest of the
> stack with this flag if available. That way we are safe on non-Linux systems, on Linux before 3.9, and with merged memory regions.
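A minimal sketch of the guard-page part, assuming a plain mmap/mprotect approach with MAP_ANONYMOUS (available on Linux and the BSDs); the helper names are hypothetical, not QEMU's API. The stack is mapped with one extra page and the lowest page is made PROT_NONE, so on a downward-growing stack an overflow faults immediately instead of silently writing into a neighbouring mapping. MAP_GROWSDOWN is deliberately left out, which keeps the sketch portable.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes (glibc) */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round n up to a whole number of pages (page size is a power of two). */
static size_t page_round(size_t n, size_t page)
{
    return (n + page - 1) & ~(page - 1);
}

/*
 * Hypothetical helper: allocate a stack of at least `size` bytes with a
 * PROT_NONE guard page below it. Returns the lowest usable address (just
 * above the guard page) or NULL on failure. Anonymous pages are populated
 * lazily, so untouched stack pages do not count toward RSS.
 */
static void *alloc_stack_with_guard(size_t size)
{
    assert(size > 0);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size = page_round(size, page);

    char *base = mmap(NULL, size + page, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) {
        return NULL;
    }
    /* The lowest page becomes the guard page. */
    if (mprotect(base, page, PROT_NONE) != 0) {
        munmap(base, size + page);
        return NULL;
    }
    return base + page;
}

/* Release a stack from alloc_stack_with_guard(), including its guard page. */
static void free_stack_with_guard(void *stack, size_t size)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    munmap((char *)stack - page, page_round(size, page) + page);
}
```

The stack pointer for a new coroutine would start at `stack + size` and grow down towards the guard page.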
>
> Peter
>
> ---
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x0000555555a2ee35 in nc_sendv_compat (nc=0x0, iov=0x0, iovcnt=0, flags=0)
>     at net/net.c:701
> (gdb) bt full
> #0  0x0000555555a2ee35 in nc_sendv_compat (nc=0x0, iov=0x0, iovcnt=0, flags=0)
>     at net/net.c:701
>         buf = '\000' <repeats 65890 times>...
>         buffer = 0x0
>         offset = 0
> #1  0x0000555555a2f058 in qemu_deliver_packet_iov (sender=0x5555565a46b0,
>     flags=0, iov=0x7ffff7e98d20, iovcnt=1, opaque=0x555557802370)
>     at net/net.c:745
>         nc = 0x555557802370
>         ret = 21845
> #2  0x0000555555a3132d in qemu_net_queue_deliver (queue=0x555557802590,
>     sender=0x5555565a46b0, flags=0, data=0x55555659e2a8 "", size=74)
>     at net/queue.c:163
>         ret = -1
>         iov = {iov_base = 0x55555659e2a8, iov_len = 74}
> #3  0x0000555555a3178b in qemu_net_queue_flush (queue=0x555557802590)
>     at net/queue.c:260
>         packet = 0x55555659e280
>         ret = 21845
> #4  0x0000555555a2eb7a in qemu_flush_or_purge_queued_packets (
>     nc=0x555557802370, purge=false) at net/net.c:629
> No locals.
> #5  0x0000555555a2ebe4 in qemu_flush_queued_packets (nc=0x555557802370)
>     at net/net.c:642
> No locals.
> #6  0x00005555557747b7 in virtio_net_set_status (vdev=0x555556fb32a8,
>     status=7 '\a') at /usr/src/qemu-2.5.0/hw/net/virtio-net.c:178
>         ncs = 0x555557802370
>         queue_started = true
>         n = 0x555556fb32a8
>         __func__ = "virtio_net_set_status"
>         q = 0x555557308b50
>         i = 0
>         queue_status = 7 '\a'
> #7  0x0000555555795501 in virtio_set_status (vdev=0x555556fb32a8, val=7 '\a')
>     at /usr/src/qemu-2.5.0/hw/virtio/virtio.c:618
>         k = 0x55555657eb40
>         __func__ = "virtio_set_status"
> #8  0x00005555557985e6 in virtio_vmstate_change (opaque=0x555556fb32a8,
>     running=1, state=RUN_STATE_RUNNING)
>     at /usr/src/qemu-2.5.0/hw/virtio/virtio.c:1539
>         vdev = 0x555556fb32a8
>         qbus = 0x555556fb3240
>         __func__ = "virtio_vmstate_change"
>         k = 0x555556570420
>         backend_run = true
> #9  0x00005555558592ae in vm_state_notify (running=1, state=RUN_STATE_RUNNING)
>     at vl.c:1601
>         e = 0x555557320cf0
>         next = 0x555557af4c40
> #10 0x000055555585737d in vm_start () at vl.c:756
>         requested = RUN_STATE_MAX
> #11 0x0000555555a209ec in process_incoming_migration_co (opaque=0x5555566a1600)
>     at migration/migration.c:392
>         f = 0x5555566a1600
>         local_err = 0x0
>         mis = 0x5555575ab0e0
>         ps = POSTCOPY_INCOMING_NONE
>         ret = 0
> #12 0x0000555555b61efd in coroutine_trampoline (i0=1465036928, i1=21845)
>     at util/coroutine-ucontext.c:80
>         arg = {p = 0x55555752b080, i = {1465036928, 21845}}
>         self = 0x55555752b080
>         co = 0x55555752b080
> #13 0x00007ffff5cb7800 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> No symbol table info available.
> #14 0x00007fffffffcb40 in ?? ()
> No symbol table info available.
> #15 0x0000000000000000 in ?? ()
> No symbol table info available.
>
>
>>
>> Dave
>>
>>>
>>>>>>> 2) Between Qemu 2.2.0 and 2.3.0 RCU was introduced, which led to delayed
>>>>>>>    freeing of memory. This led to higher heap allocations, which could not
>>>>>>>    effectively be returned to the kernel (most likely due to fragmentation).
>>>>>> I agree that some of the exec.c allocations need some care, but I would
>>>>>> prefer to use a custom free list or lazy allocation instead of mmap.
>>>>> This would only help if the elements on the free list were allocated
>>>>> using mmap? The issue is that RCU delays the freeing, so the number of
>>>>> concurrent allocations is high and then a bunch is freed at once. If the memory
>>>>> was malloc'ed, it would still cause trouble.
>>>> The free list should improve reuse and fragmentation.  I'll take a look at
>>>> lazy allocation of subpages, too.
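A free list in its simplest form, sketched with hypothetical names: freed fixed-size objects are pushed onto a singly linked list and handed out again by the next allocation, so a burst of RCU-deferred frees is reused for same-sized objects instead of leaving holes in the malloc heap. This sketch is not thread-safe and never shrinks; a real version would need locking (or per-thread lists) and a cap on the number of cached objects.

```c
#include <assert.h>
#include <stdlib.h>

/* A freed object is reused as a list node. */
typedef struct FreeNode {
    struct FreeNode *next;
} FreeNode;

typedef struct {
    size_t obj_size;  /* size of each object, at least sizeof(FreeNode) */
    FreeNode *head;   /* most recently freed object */
    size_t nfree;     /* number of objects currently cached */
} ObjCache;

static void cache_init(ObjCache *c, size_t obj_size)
{
    c->obj_size = obj_size < sizeof(FreeNode) ? sizeof(FreeNode) : obj_size;
    c->head = NULL;
    c->nfree = 0;
}

/* Reuse a cached object if there is one, otherwise fall back to malloc. */
static void *cache_alloc(ObjCache *c)
{
    if (c->head) {
        FreeNode *n = c->head;
        c->head = n->next;
        c->nfree--;
        return n;
    }
    return malloc(c->obj_size);
}

/* Keep the object for reuse instead of returning it to malloc. */
static void cache_free(ObjCache *c, void *p)
{
    assert(p != NULL);
    FreeNode *n = p;
    n->next = c->head;
    c->head = n;
    c->nfree++;
}

/* Return every cached object to malloc. */
static void cache_destroy(ObjCache *c)
{
    while (c->head) {
        FreeNode *n = c->head;
        c->head = n->next;
        free(n);
    }
    c->nfree = 0;
}
```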
>>> Ok, that would be good. And for the PhysPageMap we use mmap and try to avoid
>>> the realloc?
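One way to read the mmap suggestion, as a hypothetical sketch (not QEMU's PhysPageMap code): back the table with its own anonymous mapping and, when it must grow, map a larger region, copy, and munmap the old one. The copy remains, as with realloc, but the old pages go straight back to the kernel instead of staying in a possibly fragmented heap. On Linux, mremap(2) with MREMAP_MAYMOVE could avoid the user-space copy; it is omitted here for portability.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes (glibc) */
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

typedef struct {
    void *data;
    size_t bytes;  /* mapped size, always a multiple of the page size */
} MapTable;

static size_t round_to_pages(size_t n)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    return (n + page - 1) & ~(page - 1);
}

/*
 * Grow t so it holds at least `need` bytes, preserving existing contents.
 * Capacity at least doubles to amortize copies. Returns 0 or -1.
 */
static int map_table_reserve(MapTable *t, size_t need)
{
    assert(t != NULL);
    if (need <= t->bytes) {
        return 0;
    }
    size_t bytes = round_to_pages(need > 2 * t->bytes ? need : 2 * t->bytes);
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return -1;
    }
    if (t->data) {
        memcpy(p, t->data, t->bytes);
        munmap(t->data, t->bytes);  /* old pages go back to the kernel */
    }
    t->data = p;
    t->bytes = bytes;
    return 0;
}

static void map_table_free(MapTable *t)
{
    if (t->data) {
        munmap(t->data, t->bytes);
    }
    t->data = NULL;
    t->bytes = 0;
}
```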
>>>
>>> Peter
>>>
>> -- 
>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
>


-- 

Kind regards

Peter Lieven

...........................................................

   KAMP Netzwerkdienste GmbH
   Vestische Str. 89-91 | 46117 Oberhausen
   Tel: +49 (0) 208.89 402-50 | Fax: +49 (0) 208.89 402-40
   pl@kamp.de | http://www.kamp.de

   Managing Directors: Heiner Lante | Michael Lante
   Register court: Amtsgericht Duisburg | HRB No. 12154
   VAT ID No.: DE 120607556

...........................................................