From: Marc Zyngier <maz@kernel.org>
To: Ganapatrao Kulkarni
Cc: Miguel Luis, Eric Auger, kvmarm@lists.linux.dev, kvm@vger.kernel.org,
    linux-arm-kernel@lists.infradead.org, Alexandru Elisei, Andre Przywara,
    Chase Conklin, Christoffer Dall, Darren Hart, Jintack Lim, Russell King,
    James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu
Subject: Re: [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support
Date: Tue, 11 Jul 2023 13:30:21 +0100
Message-ID: <86bkgiwrz6.wl-maz@kernel.org>
In-Reply-To: <853a5f76-74fe-a38d-f2cd-785963177c8a@os.amperecomputing.com>
References: <20230515173103.1017669-1-maz@kernel.org>
    <877crmzr5j.wl-maz@kernel.org>
    <04ec8efb-33ff-153e-3be5-5c84a01bff2a@os.amperecomputing.com>
    <853a5f76-74fe-a38d-f2cd-785963177c8a@os.amperecomputing.com>

On Tue, 11 Jul 2023 12:56:48 +0100,
Ganapatrao Kulkarni wrote:
>
> On 07-07-2023 03:16 pm, Ganapatrao Kulkarni wrote:
> >
> > On 04-07-2023 06:01 pm, Ganapatrao Kulkarni wrote:
> >>
> >> Hi Marc,
> >>
> >> On 29-06-2023 12:33 pm, Marc Zyngier wrote:
> >>> Hi Ganapatrao,
> >>>
> >>> On Wed, 28 Jun 2023 07:45:55 +0100,
> >>> Ganapatrao Kulkarni wrote:
> >>>>
> >>>> Hi Marc,
> >>>>
> >>>> On 15-05-2023 11:00 pm, Marc Zyngier wrote:
> >>>>> This is the 4th drop of NV support on arm64 for this year.
> >>>>>
> >>>>> For the previous episodes, see [1].
> >>>>>
> >>>>> What's changed:
> >>>>>
> >>>>> - New framework to track system register traps that are reinjected in
> >>>>>   guest EL2. It is expected to replace the discrete handling we have
> >>>>>   enjoyed so far, which didn't scale at all. This has already fixed a
> >>>>>   number of bugs that were hidden (a bunch of traps were never
> >>>>>   forwarded...). Still a work in progress, but this is going in the
> >>>>>   right direction.
> >>>>>
> >>>>> - Allow the L1 hypervisor to have a S2 that has an input larger than
> >>>>>   the L0 IPA space. This fixes a number of subtle issues, depending on
> >>>>>   how the initial guest was created.
> >>>>>
> >>>>> - Consequently, the patch series has gone longer again. Boo. But
> >>>>>   hopefully some of it is easier to review...
> >>>>>
> >>>>
> >>>> I am facing issue in booting NestedVM with V9 as well with 10 patchset.
> >>>>
> >>>> I have tried V9/V10 on Ampere platform using kvmtool and I could boot
> >>>> Guest-Hypervisor and then NestedVM without any issue.
> >>>> However when I try to boot using QEMU(not using EDK2/EFI),
> >>>> Guest-Hypervisor is booted with Fedora 37 using virtio disk.
> >>>> From Guest-Hypervisor console(or ssh shell), If I try to boot NestedVM,
> >>>> boot hangs very early stage of the boot.
> >>>>
> >>>> I did some debug using ftrace and it seems the Guest-Hypervisor is
> >>>> getting very high rate of arch-timer interrupts,
> >>>> due to that all CPU time is going on in serving the Guest-Hypervisor
> >>>> and it is never going back to NestedVM.
> >>>>
> >>>> I am using QEMU vanilla version v7.2.0 with top-up patches for NV [1]
> >>>
> >>> So I went ahead and gave QEMU a go. On my systems, *nothing* works (I
> >>> cannot even boot a L1 with 'virtualization=on" (the guest is stuck at
> >>> the point where virtio gets probed and waits for its first interrupt).
> >>>
> >>> Worse, booting a hVHE guest results in QEMU generating an assert as it
> >>> tries to inject an interrupt using the QEMU GICv3 model, something
> >>> that should *NEVER* be in use with KVM.
> >>>
> >>> With help from Eric, I got to a point where the hVHE guest could boot
> >>> as long as I kept injecting console interrupts, which is again a
> >>> symptom of the vGIC not being used.
> >>>
> >>> So something is *majorly* wrong with the QEMU patches. I don't know
> >>> what makes it possible for you to even boot the L1 - if the GIC is
> >>> external, injecting an interrupt in the L2 is simply impossible.
> >>>
> >>> Miguel, can you please investigate this?
> >>>
> >>> In the meantime, I'll add some code to the kernel side to refuse the
> >>> external interrupt controller configuration with NV. Hopefully that
> >>> will lead to some clues about what is going on.
> >>
> >> Continued debugging of the issue and it seems the endless ptimer
> >> interrupts on Ampere platform is due to some mess up of CVAL of
> >> ptimer, resulting in interrupt triggered always when it is enabled.
> >>
> >> I see function "timer_set_offset" called from kvm_arm_timer_set_reg
> >> in QEMU case but there is no such calls in kvmtool boot.
> >>
> >> If I comment the timer_set_offset calls in kvm_arm_timer_set_reg
> >> function, then I could boot the Guest-Hypervisor then NestedVM from
> >> GH/L1.
> >>
> >> I also observed in QEMU case, kvm_arm_timer_set_reg is called to
> >> set CNT, CVAL and CTL of both vtimer and ptimer.
> >> Not sure why QEMU is setting these registers explicitly? need to dig.
> >>
> >
> > I don't see any direct ioctl calls to change any timer
> > registers. Looks like it is happening from the emulation
> > code(target/arm/helper.c)?
>
> function "write_list_to_kvmstate(target/arm/kvm.c)" is issuing ioctl
> to write timer registers. PTIMER_CNT and TIMER_CNT writes from this
> function is resulting in offsets change.

Madness. Why is QEMU doing this? It has no business writing to the
timer at any stage, at least with KVM.

This confirms my suspicions that QEMU is confused about what mode it
runs in, and does all sorts of funky things it was never expected to
do.

	M.

-- 
Without deviation from the norm, progress is not possible.
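
For readers following the thread, here is a minimal sketch of why a userspace write to a timer's CNT register shows up as an offset change. It assumes the write is turned into "offset = physical count - written value", and that the guest-visible count is "physical count - offset". ToyTimer and its methods are hypothetical names for illustration only, not kernel or QEMU code, and the numbers are made up. It only shows how a counter write can leave an already programmed CVAL in the past, so the timer condition holds whenever the timer is enabled; the thread does not establish that this is the exact failure on the Ampere platform.

```python
MASK64 = (1 << 64) - 1  # the architected counter is 64 bits wide


class ToyTimer:
    """Hypothetical model of one timer context (illustration only)."""

    def __init__(self, cval, enabled=True, masked=False):
        self.offset = 0
        self.cval = cval
        self.enabled = enabled
        self.masked = masked

    def guest_count(self, phys_count):
        # Guest-visible counter: physical counter minus offset, modulo 2**64.
        return (phys_count - self.offset) & MASK64

    def set_count(self, phys_count, value):
        # Assumed effect of a userspace CNT write: the value itself is not
        # stored; the offset is recomputed so the guest reads `value` now.
        self.offset = (phys_count - value) & MASK64

    def should_fire(self, phys_count):
        # Timer condition: enabled, not masked, and count has reached CVAL.
        return (self.enabled and not self.masked
                and self.guest_count(phys_count) >= self.cval)


if __name__ == "__main__":
    timer = ToyTimer(cval=150_000)
    timer.offset = 900_000
    print(timer.should_fire(1_000_000))   # CVAL still in the future
    timer.set_count(1_000_000, 5_000_000)  # userspace rewrites the counter
    print(timer.should_fire(1_000_000))   # CVAL is now in the past
```

In this model the second check stays true for every later physical count until the guest reprograms CVAL or disables the timer, which is the shape of the "interrupt triggered always when it is enabled" symptom quoted above.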