From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=oWd0=WS=vger.kernel.org=kvm-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-5.6 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,
	SPF_PASS,USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 9F2AEC3A59D
	for <kvm@archiver.kernel.org>; Thu, 22 Aug 2019 10:23:10 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.kernel.org (Postfix) with ESMTP id 7251E233FD
	for <kvm@archiver.kernel.org>; Thu, 22 Aug 2019 10:23:10 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=default; t=1566469390;
	bh=JciWaJ+SBpOdjNSHdQD+B3WnVrW5khAvTWNaNMnbe7U=;
	h=Subject:To:Cc:References:From:Date:In-Reply-To:List-ID:From;
	b=XUboe5UUCsLqKjkrGssUSXfGncrJt5jKET3jMEH9wSTHuIzy4k0O3y0YEJ+jxZcb7
	 sH7JUBGccRH/zGwU4rqwkbqOuo89nMCrn8GEhDSw0p0+ZOhQlz0aPMeEbaHOz870YR
	 2IiM05GG6w7r1a3MuM0dvmrbxULxJLsnNQlCPots=
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S2387543AbfHVKXJ (ORCPT <rfc822;kvm@archiver.kernel.org>);
        Thu, 22 Aug 2019 06:23:09 -0400
Received: from foss.arm.com ([217.140.110.172]:43166 "EHLO foss.arm.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1730918AbfHVKXJ (ORCPT <rfc822;kvm@vger.kernel.org>);
        Thu, 22 Aug 2019 06:23:09 -0400
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
        by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 71380337;
        Thu, 22 Aug 2019 03:23:08 -0700 (PDT)
Received: from [10.1.197.61] (usa-sjc-imap-foss1.foss.arm.com [10.121.207.14])
        by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 41E723F246;
        Thu, 22 Aug 2019 03:23:06 -0700 (PDT)
Subject: Re: Can we boot a 512U kvm guest?
To:     Auger Eric <eric.auger@redhat.com>,
        Zenghui Yu <yuzenghui@huawei.com>,
        kvmarm@lists.cs.columbia.edu, qemu-arm@nongnu.org
Cc:     James Morse <james.morse@arm.com>,
        Julien Thierry <julien.thierry.kdev@gmail.com>,
        suzuki.poulose@arm.com, peter.maydell@linaro.org,
        kvm@vger.kernel.org, "Wanghaibin (D)" <wanghaibin.wang@huawei.com>,
        zhang.zhanghailiang@huawei.com
References: <86aa9609-7dc9-1461-ae47-f50897cd0875@huawei.com>
 <da5c87d6-8b66-75f9-e720-9f1d80a76d7d@redhat.com>
 <fbeb47df-7ea2-04ce-5fe3-a6a6a4751b8b@kernel.org>
 <681f59e8-a193-6d3e-0bcc-5e52f4203868@redhat.com>
From:   Marc Zyngier <maz@kernel.org>
Organization: Approximate
Message-ID: <dc78b8af-e761-dc87-982e-1885cfb98100@kernel.org>
Date:   Thu, 22 Aug 2019 11:23:05 +0100
User-Agent: Mozilla/5.0 (X11; Linux aarch64; rv:60.0) Gecko/20100101
 Thunderbird/60.8.0
MIME-Version: 1.0
In-Reply-To: <681f59e8-a193-6d3e-0bcc-5e52f4203868@redhat.com>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 8bit
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
List-ID: <kvm.vger.kernel.org>
X-Mailing-List: kvm@vger.kernel.org

On 22/08/2019 10:50, Auger Eric wrote:
> Hi Marc,
> 
> On 8/22/19 11:29 AM, Marc Zyngier wrote:
>> Hi Eric,
>>
>> On 22/08/2019 10:08, Auger Eric wrote:
>>> Hi Zenghui,
>>>
>>> On 8/13/19 10:50 AM, Zenghui Yu wrote:
>>>> Hi folks,
>>>>
>>>> Since commit e25028c8ded0 ("KVM: arm/arm64: Bump VGIC_V3_MAX_CPUS to
>>>> 512"), we seemed to be allowed to boot a 512U guest.  But I failed to
>>>> start it up with the latest QEMU.  I guess there are at least *two*
>>>> reasons (limitations).
>>>>
>>>> First I got a QEMU abort:
>>>>     "kvm_set_irq: Invalid argument"
>>>>
>>>> Enable the trace_kvm_irq_line() under debugfs, when it comed with
>>>> vcpu-256, I got:
>>>>     "Inject UNKNOWN interrupt (3), vcpu->idx: 0, num: 23, level: 0"
>>>> and kvm_vm_ioctl_irq_line() returns -EINVAL to user-space...
>>>>
>>>> So the thing is that we only have 8 bits for vcpu_index field ([23:16])
>>>> in KVM_IRQ_LINE ioctl.  irq_type field will be corrupted if we inject a
>>>> PPI to vcpu-256, whose vcpu_index will take 9 bits.
>>>>
>>>> I temporarily patched the KVM and QEMU with the following diff:
>>>>
>>>> ---8<---
>>>> diff --git a/arch/arm64/include/uapi/asm/kvm.h
>>>> b/arch/arm64/include/uapi/asm/kvm.h
>>>> index 95516a4..39a0fb1 100644
>>>> --- a/arch/arm64/include/uapi/asm/kvm.h
>>>> +++ b/arch/arm64/include/uapi/asm/kvm.h
>>>> @@ -325,10 +325,10 @@ struct kvm_vcpu_events {
>>>>  #define   KVM_ARM_VCPU_TIMER_IRQ_PTIMER        1
>>>>
>>>>  /* KVM_IRQ_LINE irq field index values */
>>>> -#define KVM_ARM_IRQ_TYPE_SHIFT        24
>>>> -#define KVM_ARM_IRQ_TYPE_MASK        0xff
>>>> +#define KVM_ARM_IRQ_TYPE_SHIFT        28
>>>> +#define KVM_ARM_IRQ_TYPE_MASK        0xf
>>>>  #define KVM_ARM_IRQ_VCPU_SHIFT        16
>>>> -#define KVM_ARM_IRQ_VCPU_MASK        0xff
>>>> +#define KVM_ARM_IRQ_VCPU_MASK        0xfff
>>>>  #define KVM_ARM_IRQ_NUM_SHIFT        0
>>>>  #define KVM_ARM_IRQ_NUM_MASK        0xffff
>>>>
>>>> ---8<---
>>>>
>>>> It makes things a bit better, it also immediately BREAKs the api with
>>>> old versions.
>>>>
>>>>
>>>> Next comes one more QEMU abort (with the "fix" above):
>>>>     "Failed to set device address: No space left on device"
>>>>
>>>> We register two io devices (rd_dev and sgi_dev) on KVM_MMIO_BUS for
>>>> each redistributor. 512 vcpus take 1024 io devices, which is beyond the
>>>> maximum limitation of the current kernel - NR_IOBUS_DEVS (1000).
>>>> So we get a ENOSPC error here.
>>>
>>> Do you plan to send a patch for increasing the NR_IOBUS_DEVS? Otherwise
>>> I can do it.
>>
>> I really wonder whether that's a sensible thing to do on its own.
>>
>> Looking at the implementation of kvm_io_bus_register_dev (which copies
>> the whole array each time we insert a device), we have an obvious issue
>> with systems that create a large number of device structures, leading to
>> large transient memory usage and slow guest start.
>>
>> We could also try and reduce the number of devices we insert by making
>> the redistributor a single device (which it is in reality). It probably
>> means we need to make the MMIO decoding more flexible.
> 
> Yes it makes sense. If no objection, I can work on this as I am the
> source of the mess ;-)

Sure, if you have some spare bandwidth, feel free to give it a go. I'd
certainly like to see the userspace counterpart to my earlier patch, and
some agreement on the ABI change.

Thanks,

	M.
-- 
Jazz is not dead, it just smells funny...