From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 10 Jan 2023 14:05:20 +0000
Message-ID: <86cz7mo57j.wl-maz@kernel.org>
From: Marc Zyngier <maz@kernel.org>
To: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Cc: catalin.marinas@arm.com, will@kernel.org,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu,
	kvm@vger.kernel.org, scott@os.amperecomputing.com,
	Darren Hart <darren@os.amperecomputing.com>
Subject: Re: [PATCH 0/3] KVM: arm64: nv: Fixes for Nested Virtualization issues
In-Reply-To: <6171dc7c-5d83-d378-db9e-d94f27afe43a@os.amperecomputing.com>
References: <20220824060304.21128-1-gankulkarni@os.amperecomputing.com>
	<6171dc7c-5d83-d378-db9e-d94f27afe43a@os.amperecomputing.com>
List-ID: <kvm.vger.kernel.org>

Hi Ganapatrao,

On Tue, 10 Jan 2023 12:17:20 +0000,
Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
> 
> Hi Marc,
> 
> On 24-08-2022 11:33 am, Ganapatrao Kulkarni wrote:
> > This series contains 3 fixes which were found while testing the
> > ARM64 Nested Virtualization patch series.
> >
> > The first patch avoids restarting the hrtimer when the timer
> > interrupt is fired/forwarded to the Guest-Hypervisor.
> >
> > The second patch fixes the vtimer interrupt drop from the
> > Guest-Hypervisor.
> >
> > The third patch fixes the NestedVM boot hang seen when the Guest
> > Hypervisor is configured with a 64K pagesize whereas the Host
> > Hypervisor uses 4K.
> >
> > These patches are rebased on the Nested Virtualization V6 patchset[1].
> 
> If I boot a Guest Hypervisor with more cores, then booting a NestedVM
> with an equal number of cores, or booting multiple NestedVMs
> (simultaneously) with a lower number of cores, results in very slow
> booting and sometimes an RCU soft-lockup in a NestedVM. I have
> debugged this, and it turned out to be due to many SGIs being
> asserted to all vCPUs of the Guest-Hypervisor when the
> Guest-Hypervisor's KVM code prepares the NestedVM for WFI
> wakeup/return.
> 
> When the Guest Hypervisor prepares the NestedVM while
> returning/resuming from WFI, it loads the guest context, vGIC and
> timer contexts, etc. The function gic_poke_irq (called from
> irq_set_irqchip_state with a spinlock held) writes to the
> GICD_ISACTIVER register in the Guest-Hypervisor's KVM code, resulting
> in a mem-abort trap to the Host Hypervisor. As part of handling the
> guest mem abort, the Host Hypervisor calls io_mem_abort, which in
> turn calls vgic_mmio_write_sactive, which prepares every vCPU of the
> Guest Hypervisor by sending an SGI. The number of SGI/IPI calls goes
> up dramatically as more and more cores are used to boot the Guest
> Hypervisor.

This really isn't surprising. NV combined with oversubscription is
bound to be absolutely terrible. The write to GICD_ISACTIVER is only
symptomatic of the interrupt amplification problem that already exists
without NV (any IPI in a guest is likely to result in at least one IPI
in the host). Short of having direct injection of interrupts for *all*
interrupts, you end up with this sort of emulation that relies on being
able to synchronise all CPUs.

Is it bad? Yes. Very.
> 
> Code trace:
> 
> At the Guest-Hypervisor:
> kvm_timer_vcpu_load->kvm_timer_vcpu_load_gic->set_timer_irq_phys_active->
> irq_set_irqchip_state->gic_poke_irq
> 
> At the Host-Hypervisor:
> io_mem_abort->kvm_io_bus_write->__kvm_io_bus_write->dispatch_mmio_write->
> vgic_mmio_write_sactive->vgic_access_active_prepare->
> kvm_kick_many_cpus->smp_call_function_many
> 
> I am currently working around this by passing the "nohlt" kernel
> parameter to the NestedVM. Any suggestions to handle/fix this issue
> and avoid the slow booting of a NestedVM with more cores?

At the moment, I'm focussing on correctness rather than performance.
Maybe we can restrict the conditions in which we perform this
synchronisation, but that's pretty low on my radar at the moment.

Once things are in a state that can be merged and that works
correctly, we can look into it.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.