From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 10 Jan 2023 14:05:20 +0000
Message-ID: <86cz7mo57j.wl-maz@kernel.org>
From: Marc Zyngier <maz@kernel.org>
To: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Cc: catalin.marinas@arm.com, will@kernel.org,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu,
	kvm@vger.kernel.org, scott@os.amperecomputing.com,
	Darren Hart <darren@os.amperecomputing.com>
Subject: Re: [PATCH 0/3] KVM: arm64: nv: Fixes for Nested Virtualization issues
In-Reply-To: <6171dc7c-5d83-d378-db9e-d94f27afe43a@os.amperecomputing.com>
References: <20220824060304.21128-1-gankulkarni@os.amperecomputing.com>
	<6171dc7c-5d83-d378-db9e-d94f27afe43a@os.amperecomputing.com>
List-ID: <kvm.vger.kernel.org>

Hi Ganapatrao,

On Tue, 10 Jan 2023 12:17:20 +0000,
Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
> 
> Hi Marc,
> 
> On 24-08-2022 11:33 am, Ganapatrao Kulkarni wrote:
> > This series contains 3 fixes which were found while testing the
> > ARM64 Nested Virtualization patch series.
> >
> > The first patch avoids restarting the hrtimer when the timer
> > interrupt is fired/forwarded to the Guest-Hypervisor.
> >
> > The second patch fixes the vtimer interrupt drop from the
> > Guest-Hypervisor.
> >
> > The third patch fixes the NestedVM boot hang seen when the Guest
> > Hypervisor is configured with a 64K pagesize whereas the Host
> > Hypervisor uses 4K.
> >
> > These patches are rebased on the Nested Virtualization V6 patchset[1].
> 
> If I boot a Guest Hypervisor with more cores, then booting a NestedVM
> with an equal number of cores, or booting multiple NestedVMs
> (simultaneously) with a lower number of cores, results in very slow
> booting and sometimes an RCU soft-lockup in a NestedVM. I have
> debugged this, and it turned out to be due to many SGIs being
> asserted to all vCPUs of the Guest-Hypervisor when the
> Guest-Hypervisor's KVM code prepares the NestedVM for WFI
> wakeup/return.
> 
> When the Guest Hypervisor prepares the NestedVM while
> returning/resuming from WFI, it loads the guest context, vGIC and
> timer contexts, etc. The function gic_poke_irq (called from
> irq_set_irqchip_state with a spinlock held) writes to the
> GICD_ISACTIVER register in the Guest-Hypervisor's KVM code, resulting
> in a mem-abort trap to the Host Hypervisor. As part of handling the
> guest mem abort, the Host Hypervisor calls io_mem_abort, which in
> turn calls vgic_mmio_write_sactive, which prepares every vCPU of the
> Guest Hypervisor by sending an SGI. The number of SGI/IPI calls goes
> up dramatically as more and more cores are used to boot the Guest
> Hypervisor.

This really isn't surprising. NV combined with oversubscription is
bound to be absolutely terrible. The write to GICD_ISACTIVER is only
symptomatic of the interrupt amplification problem that already exists
without NV (any IPI in a guest is likely to result in at least one IPI
in the host). Short of having direct injection of interrupts for *all*
interrupts, you end up with this sort of emulation that relies on being
able to synchronise all CPUs.

Is it bad? Yes. Very.
> 
> Code trace:
> 
> At the Guest-Hypervisor:
> kvm_timer_vcpu_load->kvm_timer_vcpu_load_gic->set_timer_irq_phys_active->
> irq_set_irqchip_state->gic_poke_irq
> 
> At the Host-Hypervisor:
> io_mem_abort->kvm_io_bus_write->__kvm_io_bus_write->dispatch_mmio_write->
> vgic_mmio_write_sactive->vgic_access_active_prepare->
> kvm_kick_many_cpus->smp_call_function_many
> 
> I am currently working around this by passing the "nohlt" kernel
> parameter to the NestedVM. Any suggestions to handle/fix this issue
> and avoid the slow booting of a NestedVM with more cores?

At the moment, I'm focussing on correctness rather than performance.
Maybe we can restrict the conditions in which we perform this
synchronisation, but that's pretty low on my radar at the moment.

Once things are in a state that can be merged and that works
correctly, we can look into it.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.