From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:43964)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from <pl@kamp.de>)
	id 1bGMk8-0008Ex-0B
	for qemu-devel@nongnu.org; Fri, 24 Jun 2016 04:45:41 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <pl@kamp.de>) id 1bGMk4-0002oT-Lg
	for qemu-devel@nongnu.org; Fri, 24 Jun 2016 04:45:39 -0400
Received: from mx-v6.kamp.de ([2a02:248:0:51::16]:32932 helo=mx01.kamp.de)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <pl@kamp.de>)
	id 1bGMk4-0002o5-C2
	for qemu-devel@nongnu.org; Fri, 24 Jun 2016 04:45:36 -0400
Message-ID: <576CF32B.7040101@kamp.de>
Date: Fri, 24 Jun 2016 10:45:31 +0200
From: Peter Lieven <pl@kamp.de>
MIME-Version: 1.0
References: <5768F923.7040502@kamp.de> <576BF910.70304@kamp.de>
	<178ee05d-cb23-e1ba-5a7f-87a5caef1e91@redhat.com>
	<576C00D1.9020202@kamp.de>
	<48f0c4a6-8c26-446d-1dfd-c79da0c18707@redhat.com>
	<576C0C1D.9090709@kamp.de>
	<cd9542a2-1141-6345-6597-0d7f3dc0eed7@redhat.com>
	<576C5481.6070605@kamp.de>
	<7575263.1646445.1466741414660.JavaMail.zimbra@redhat.com>
	<576CEB1D.6040609@kamp.de>
	<dbb956d8-fcdd-9392-8d3e-54acf4dc2cae@redhat.com>
In-Reply-To: <dbb956d8-fcdd-9392-8d3e-54acf4dc2cae@redhat.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] Qemu and heavily increased RSS usage
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>, qemu-devel@nongnu.org, Fam Zheng <famz@redhat.com>, Peter Maydell <peter.maydell@linaro.org>

On 24.06.2016 at 10:20, Paolo Bonzini wrote:
>
> On 24/06/2016 10:11, Peter Lieven wrote:
>> On 24.06.2016 at 06:10, Paolo Bonzini wrote:
>>>>> If it's 10M nothing.  If there is a 100M regression that is also caused
>>>>> by RCU, we have to give up on it for that data structure, or mmap/munmap
>>>>> the affected data structures.
>>>> If it was only 10MB I would agree. But if I run the VM described earlier
>>>> in this thread it goes from ~35MB with Qemu-2.2.0 to ~130-150MB with
>>>> current master. This is with coroutine pool disabled. With the coroutine pool
>>>> it can grow to sth like 300-350MB.
>>>>
>>>> Is there an easy way to determinate if RCU is the problem? I have the same
>>>> symptoms, valgrind doesn't see the allocated memory. Is it possible
>>>> to make rcu_call directly invoking the function - maybe with a lock around it
>>>> that serializes the calls? Even if its expensive it might show if we search
>>>> at the right place.
>>> Yes, you can do that.  Just make it call the function without locks, for
>>> a quick PoC it will be okay.
>> Unfortunately, it leads to immediate segfaults because a lot of things seem
>> to go horribly wrong ;-)
>>
>> Do you have any other idea than reverting all the rcu patches for this section?
> Try freeing under the big QEMU lock:
>
> 	if (!qemu_mutex_iothread_locked()) {
> 	    unlock = true;
> 	    qemu_mutex_lock_iothread();
> 	}
> 		...
> 	if (unlock) {
> 	    qemu_mutex_unlock_iothread();
> 	}
>
> afbe70535ff1a8a7a32910cc15ebecc0ba92e7da should be easy to backport.

Will check this out. Meanwhile I read a little about returning RSS to the kernel, as I was wondering
why RSS and HWM are almost at the same high level. It seems that ptmalloc (the glibc default allocator)
is very reluctant to return memory to the kernel. There is indeed no guarantee that freed memory is
returned. Only mmap'ed memory that is unmapped is guaranteed to be returned.

So I tried the following without reverting anything:

MALLOC_MMAP_THRESHOLD_=4096 ./x86_64-softmmu/qemu-system-x86_64  ...

No idea on performance impact yet, but it solves the issue.
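
For anyone who wants to experiment without the environment variable: as far as
I understand it, the same glibc tunable can be set from inside the process with
mallopt(). The helper below is only a sketch to illustrate that;
set_mmap_threshold() is a made-up name and nothing like it exists in QEMU.
Setting the threshold explicitly also turns off glibc's dynamic adjustment of
it, which is part of why the performance impact is unknown.

```c
#include <malloc.h>   /* mallopt(), M_MMAP_THRESHOLD (glibc-specific) */
#include <stdlib.h>

/*
 * Hypothetical helper, not part of QEMU: ask glibc to serve every
 * allocation of at least `bytes` bytes with mmap(), so that free()
 * unmaps it and the pages are handed back to the kernel right away.
 *
 * Returns 1 on success and 0 on error (the mallopt() convention).
 */
static int set_mmap_threshold(int bytes)
{
    return mallopt(M_MMAP_THRESHOLD, bytes);
}
```

It would have to be called early, before the allocations of interest happen,
e.g. set_mmap_threshold(4096) to mirror the command line above.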

With the default threshold my test VM rises to 154MB RSS:

VmHWM:      154284 kB
VmRSS:      154284 kB

With the option it looks like this:

VmHWM:       50588 kB
VmRSS:       41920 kB
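
The values above come from /proc/<pid>/status. In case someone wants to log
them periodically while reproducing, here is a rough sketch of a reader for
that file format; read_status_kb() is a hypothetical helper name, and it
assumes the usual "Key:   <number> kB" layout of the status file.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return the value (in kB) of a field such as
 * "VmHWM" or "VmRSS" from a /proc/<pid>/status style file, or -1 if
 * the file cannot be opened or the field is missing.
 */
static long read_status_kb(const char *path, const char *key)
{
    FILE *f = fopen(path, "r");
    char line[256];
    long kb = -1;
    size_t klen = strlen(key);

    if (!f) {
        return -1;
    }
    while (fgets(line, sizeof(line), f)) {
        /* Match "<key>:" at the start of the line. */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            /* %ld skips the tabs/spaces in front of the number. */
            if (sscanf(line + klen + 1, "%ld", &kb) != 1) {
                kb = -1;
            }
            break;
        }
    }
    fclose(f);
    return kb;
}
```

For the running process itself this would be called as
read_status_kb("/proc/self/status", "VmRSS").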

With jemalloc I can observe that the HWM is still high and RSS stays below it, but RSS is still on the order of about 100MB.

Peter