From: Peter Lieven
Date: Mon, 4 Jul 2016 13:36:54 +0200
Subject: Re: [Qemu-devel] [PATCH 07/15] qapi: use mmap for QmpInputVisitor
To: Markus Armbruster, Paolo Bonzini
Cc: kwolf@redhat.com, peter.maydell@linaro.org, mst@redhat.com, qemu-devel@nongnu.org, mreitz@redhat.com, kraxel@redhat.com, dgilbert@redhat.com
Message-ID: <577A4A56.10901@kamp.de>
In-Reply-To: <87vb0llkic.fsf@dusky.pond.sub.org>
References: <1467104499-27517-1-git-send-email-pl@kamp.de> <1467104499-27517-8-git-send-email-pl@kamp.de> <87bn2i6a0z.fsf@dusky.pond.sub.org> <5e0cdb47-1446-e575-a1c7-c437ac7b8e41@redhat.com> <87vb0llkic.fsf@dusky.pond.sub.org>

On 04.07.2016 at 13:18, Markus Armbruster wrote:
> Paolo Bonzini writes:
>
>> On 30/06/2016 16:12, Markus Armbruster wrote:
>>> Implementing a stack as "big enough" array can be wasteful.
>>> Implementing it as dynamically allocated list is differently wasteful.
>>> Saving several mallocs and frees can be worth "wasting" a few pages of
>>> memory for a short time.
>> Most usage of QmpInputVisitor at startup comes from
>> object_property_set_qobject, which only sets small scalar objects.  The
>> stack is entirely unused in this case.
>
> A quick test run shows ~300 qmp_input_visitor_new() calls during
> startup, with at most two alive at the same time.
>
> Why would it matter whether these are in the order of 150 bytes or 25000
> bytes each?  How could this materially impact RSS?
>
> There's one type of waste here that I understand: we zero the whole
> QmpInputVisitor on allocation.
>
> I'm not opposed to changing how the stack is implemented, I just want to
> first understand why the current implementation behaves badly (assuming
> it does).

The history behind this is that I observed a dramatic increase in QEMU's
RSS usage between QEMU 2.2.0 and 2.5.0. This is easy to see in our setup:
we use hugetlbfs everywhere, so I can clearly distinguish QEMU's own
memory from VM memory.

After I had bisected one increase in RSS usage to the introduction of RCU,
the theory came up that the memory gets fragmented because the allocation
and deallocation patterns have changed. So I started to trace all malloc
calls above 4 kB and to use mmap wherever possible.

To give you an idea of the difference, here is an example. I have a blade
with 22 vServers running on it. Including the OS, the allocated memory with
current master is approx. 6.5 GB. With current master and the following
environment variable set:

MALLOC_MMAP_THRESHOLD_=32768

the allocated memory stays at approx. 2 GB.

Peter
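
For anyone who wants to repeat this kind of comparison, a minimal sketch
follows. It is an illustration only, not the tooling behind the numbers
above. It assumes Linux with glibc (MALLOC_MMAP_THRESHOLD_ is a glibc
malloc tunable read at process start) and takes RSS from the VmRSS field
of /proc/<pid>/status. All helper names are made up for this example.

```python
import os
import re
import subprocess


def env_with_mmap_threshold(threshold_bytes, base_env=None):
    """Return a copy of the environment with glibc's mmap threshold set."""
    env = dict(os.environ if base_env is None else base_env)
    # glibc reads this variable at startup; note the trailing underscore.
    env["MALLOC_MMAP_THRESHOLD_"] = str(int(threshold_bytes))
    return env


def parse_vmrss_kb(status_text):
    """Extract VmRSS (in kB) from the text of /proc/<pid>/status, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def rss_kb(pid):
    """Read the resident set size of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vmrss_kb(f.read())


def launch(argv, threshold_bytes=32768):
    """Start argv (e.g. a QEMU command line) with the threshold applied."""
    return subprocess.Popen(argv, env=env_with_mmap_threshold(threshold_bytes))
```

launch() takes whatever command line the guest is normally started with.
Call rss_kb(proc.pid) once the guest has settled, then repeat the run
without the variable to see the difference.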