From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Tue, 30 Jun 2015 14:10:29 +0200
From: Andrew Jones
Message-ID: <20150630121029.GE3016@hawk.localdomain>
References: <55918593.6090703@huawei.com>
In-Reply-To: <55918593.6090703@huawei.com>
Subject: Re: [Qemu-devel] QEMU + KVM PSCI and VCPU creation / destruction
To: Claudio Fontana
Cc: Peter Maydell, "qemu-devel@nongnu.org"

On Mon, Jun 29, 2015 at 07:51:15PM +0200, Claudio Fontana wrote:
> Hello,
>
> While heavily testing PSCI on QEMU+KVM during OSv enablement, I encountered,
> among others, the following issue:
>
> I am running a test in which I boot an OS at EL1 under KVM, then boot a
> secondary VCPU, then immediately call PSCI for a SYSTEM_RESET (reboot).
>
> This loops infinitely, or, as a matter of fact, until I run out of memory
> in the Foundation Model.
>
> Now, before submitting another support request for the Model, I checked the
> code for the handling of PSCI. It turns out that KVM handles the HVC and then
> sets an exit reason for QEMU to check, which in turn sets
> system_reset_requested to true, which causes a qemu_system_reset.
>
> In there I see the calls to qemu_devices_reset() and
> cpu_synchronize_all_post_reset(), but is the VCPU actually destroyed?
> Is the VM destroyed?
> Or are new resources allocated at the next boot, whenever PSCI asks for
> another VCPU to be booted via KVM_CREATE_VCPU etc.?
>
> If the resources associated with the VCPU (and VM?) are not freed, isn't
> this always going to cause a leak in the host?
>
> After around 3 hours of continuous PSCI secondary boot followed by
> SYSTEM_RESET I run out of memory on the host.

I can reproduce this with kvm-unit-tests. I don't see much sign of a
leak using tcg with x86 for the host, nor with using kvm on an aarch64
host. However, using tcg on an aarch64 host shows memory getting used
pretty quickly. I didn't test with arm, only aarch64.

I would just provide the test, which is only

int main(void)
{
	assert(cpumask_weight(&cpu_present_mask) > 1);
	smp_boot_secondary(1, halt);
	psci_sys_reset();
	return 0;
}

but, in order to cleanly handle system reset, I ended up tweaking the
framework in several places. So I've created a branch for this, which
is here

https://github.com/rhdrjones/kvm-unit-tests/commits/arm/test-reset

On an aarch64 host, just do

./configure
make
export QEMU=
$QEMU -machine virt,accel=tcg \
	-device virtio-serial-device \
	-device virtconsole,chardev=ctd \
	-chardev testdev,id=ctd \
	-display none -serial stdio \
	-kernel arm/secondary-leak.flat \
	-smp 2

I've expanded the command line because

arm/run arm/secondary-leak.flat -smp 2

would use kvm by default on the aarch64 host, and we need tcg. (Maybe
I'll add an arm/run command-line switch for that someday.)

The test will never output anything and never exit. While it's running,
just check 'free' several times in another terminal to see that memory
is going away, and then hit ^C on the test whenever you want to quit.

drew