From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.lore.kernel.org (Postfix) with ESMTPS id 613DAC47258
	for <linux-arm-kernel@archiver.kernel.org>; Thu, 25 Jan 2024 08:59:25 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
	d=lists.infradead.org; s=bombadil.20210309; h=Sender:
	Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post:
	List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To:
	Subject:Cc:To:From:Message-ID:Date:Reply-To:Content-ID:Content-Description:
	Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:
	List-Owner; bh=Vf8kJNxOBTy/XR6opCAoHALr4YdmhEYrvKXuGBd+wq4=; b=LGGK8RwN9BH25+
	0F7YIPs9OJNreKLXZ4X5HwZHsB9Edh94WUVYTzKaUnPq0jq4T28uLj9m0plvw5iwJkn4lvgUCuJX2
	V5ytFg9MXQUM04bkDbUpmWb/qAqCzTamA+CSn3dARUAi4s4BXKsEBxlAfEflvmHOEgTAaIomttUGU
	RBko43KUXp6vG6pAPEBgPbFinO3QXhoGteMhAAMjjnqh+oVePgOB0iU1wVp0kXJZqJsPT4UD+lDjq
	c7dFrli6+Ez/BuoZ6jvbXtrKHu4ZlYk5/UjZEm6X3wpTT2y69SzQnrdonQqA/nt1DM+ESoGNbDejO
	g4E7mYxer7Nuv8x66Jgg==;
Received: from localhost ([::1] helo=bombadil.infradead.org)
	by bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux))
	id 1rSvZV-007QTM-2t;
	Thu, 25 Jan 2024 08:58:53 +0000
Received: from dfw.source.kernel.org ([139.178.84.217])
	by bombadil.infradead.org with esmtps (Exim 4.96 #2 (Red Hat Linux))
	id 1rSvZS-007QRZ-35
	for linux-arm-kernel@lists.infradead.org;
	Thu, 25 Jan 2024 08:58:52 +0000
Received: from smtp.kernel.org (transwarp.subspace.kernel.org [100.75.92.58])
	by dfw.source.kernel.org (Postfix) with ESMTP id 3E0046211F;
	Thu, 25 Jan 2024 08:58:50 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id E0365C433F1;
	Thu, 25 Jan 2024 08:58:49 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1706173129;
	bh=5ESz6JtE+U09p1ZlznRZpR/p4SYH9bbFXDckngkPJGk=;
	h=Date:From:To:Cc:Subject:In-Reply-To:References:From;
	b=OpMMtDBvdUl60ZwdLDe95Zfh/slGKD9gv37QAaVH9zeexBt8z4t/Im1KFyaI8OmyY
	 8Ch8UXPtoOlpdEhI4/9MBBGePNM52SR+tav+FYEOU3Ru4+wTiL3x8Lq9naKnBIQISF
	 mXleuLOdD3i3pwqlgZMsnwMXGVNM9daixDnSQBj4JvZcCqiMRC7g005WwT2jIhLCJf
	 bwpMUJvu9R1hpzIZi59c0sx9CXvXRBhKZ1FrDOIJzfmVws+jISnGQdF9OC7SZ9wPzB
	 u3izUF1B3/HVz0GdOGPcDgfpeDTb258nLy/wTVbXJgly35pMe3U6ct/MyDq2+LMse9
	 7fHsqntZcrdNQ==
Received: from sofa.misterjones.org ([185.219.108.64] helo=goblin-girl.misterjones.org)
	by disco-boy.misterjones.org with esmtpsa  (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
	(Exim 4.95)
	(envelope-from <maz@kernel.org>)
	id 1rSvZP-00EemY-8z;
	Thu, 25 Jan 2024 08:58:47 +0000
Date: Thu, 25 Jan 2024 08:58:46 +0000
Message-ID: <868r4d94c9.wl-maz@kernel.org>
From: Marc Zyngier <maz@kernel.org>
To: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Cc: kvmarm@lists.linux.dev,
	kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	Alexandru Elisei <alexandru.elisei@arm.com>,
	Andre Przywara <andre.przywara@arm.com>,
	Chase Conklin <chase.conklin@arm.com>,
	Christoffer Dall <christoffer.dall@arm.com>,
	Darren Hart <darren@os.amperecomputing.com>,
	Jintack Lim <jintack@cs.columbia.edu>,
	Russell King <rmk+kernel@armlinux.org.uk>,
	Miguel Luis <miguel.luis@oracle.com>,
	James Morse <james.morse@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Oliver Upton <oliver.upton@linux.dev>,
	Zenghui Yu <yuzenghui@huawei.com>,
	D Scott Phillips <scott@os.amperecomputing.com>
Subject: Re: [PATCH v11 17/43] KVM: arm64: nv: Support multiple nested Stage-2 mmu structures
In-Reply-To: <3b51d760-fd32-41b7-b142-5974fdf3e90e@os.amperecomputing.com>
References: <20231120131027.854038-1-maz@kernel.org>
	<20231120131027.854038-18-maz@kernel.org>
	<f0416fa9-b4f1-4bad-a73b-b1d7ecbffc62@os.amperecomputing.com>
	<86le8g86t6.wl-maz@kernel.org>
	<3b51d760-fd32-41b7-b142-5974fdf3e90e@os.amperecomputing.com>
User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue)
 FLIM-LB/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL-LB/10.8 EasyPG/1.0.0 Emacs/29.1
 (aarch64-unknown-linux-gnu) MULE/6.0 (HANACHIRUSATO)
MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue")
X-SA-Exim-Connect-IP: 185.219.108.64
X-SA-Exim-Rcpt-To: gankulkarni@os.amperecomputing.com, kvmarm@lists.linux.dev, kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, alexandru.elisei@arm.com, andre.przywara@arm.com, chase.conklin@arm.com, christoffer.dall@arm.com, darren@os.amperecomputing.com, jintack@cs.columbia.edu, rmk+kernel@armlinux.org.uk, miguel.luis@oracle.com, james.morse@arm.com, suzuki.poulose@arm.com, oliver.upton@linux.dev, yuzenghui@huawei.com, scott@os.amperecomputing.com
X-SA-Exim-Mail-From: maz@kernel.org
X-SA-Exim-Scanned: No (on disco-boy.misterjones.org); SAEximRunCond expanded to false
X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 
X-CRM114-CacheID: sfid-20240125_005851_101057_AEF62B47 
X-CRM114-Status: GOOD (  37.37  )
X-BeenThere: linux-arm-kernel@lists.infradead.org
X-Mailman-Version: 2.1.34
Precedence: list
List-Id: <linux-arm-kernel.lists.infradead.org>
List-Unsubscribe: <http://lists.infradead.org/mailman/options/linux-arm-kernel>,
 <mailto:linux-arm-kernel-request@lists.infradead.org?subject=unsubscribe>
List-Archive: <http://lists.infradead.org/pipermail/linux-arm-kernel/>
List-Post: <mailto:linux-arm-kernel@lists.infradead.org>
List-Help: <mailto:linux-arm-kernel-request@lists.infradead.org?subject=help>
List-Subscribe: <http://lists.infradead.org/mailman/listinfo/linux-arm-kernel>,
 <mailto:linux-arm-kernel-request@lists.infradead.org?subject=subscribe>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org>
Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org

On Thu, 25 Jan 2024 08:14:32 +0000,
Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
> 
> 
> Hi Marc,
> 
> On 23-01-2024 07:56 pm, Marc Zyngier wrote:
> > Hi Ganapatrao,
> > 
> > On Tue, 23 Jan 2024 09:55:32 +0000,
> > Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
> >> 
> >> Hi Marc,
> >> 
> >>> +void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu)
> >>> +{
> >>> +	if (is_hyp_ctxt(vcpu)) {
> >>> +		vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu;
> >>> +	} else {
> >>> +		write_lock(&vcpu->kvm->mmu_lock);
> >>> +		vcpu->arch.hw_mmu = get_s2_mmu_nested(vcpu);
> >>> +		write_unlock(&vcpu->kvm->mmu_lock);
> >>> +	}
> >> 
> >> Due to race, there is a non-existing L2's mmu table is getting loaded
> >> for some of vCPU while booting L1(noticed with L1 boot using large
> >> number of vCPUs). This is happening since at the early stage the
> >> e2h(hyp-context) is not set and trap to eret of L1 boot-strap code
> >> resulting in context switch as if it is returning to L2(guest enter)
> >> and loading not initialized mmu table on those vCPUs resulting in
> >> unrecoverable traps and aborts.
> > 
> > I'm not sure I understand the problem you're describing here.
> > 
> 
> IIUC, When the S2 fault happens, the faulted vCPU gets the pages from
> qemu process and maps in S2 and copies the code to allocated
> memory. Mean while other vCPUs which are in race to come online, when
> they switches over to dummy S2 finds the mapping and returns to L1 and
> subsequent execution does not fault instead fetches from memory where
> no code exists yet(for some) and generates stage 1 instruction abort
> and jumps to abort handler and even there is no code exist and keeps
> aborting. This is happening on random vCPUs(no pattern).

Why is that any different from the way we handle faults in the
non-nested case? If there is a case where we can map the PTE at S2
before the data is available, this is a generic bug that can trigger
irrespective of NV.

> 
> > What is the race exactly? Why isn't the shadow S2 good enough? Not
> > having HCR_EL2.VM set doesn't mean we can use the same S2, as the TLBs
> > are tagged by a different VMID, so staying on the canonical S2 seems
> > wrong.
> 
> IMO, it is unnecessary to switch-over for first ERET while L1 is
> booting and repeat the faults and page allocation which is anyway
> dummy once L1 switches to E2H.

It is mandated by the architecture. EL1 is, by definition, a different
translation regime from EL2. So we *must* have a different S2, because
that defines the boundaries of TLB creation and invalidation. The
fact that these are the same pages is totally irrelevant.

> Let L1 use its S2 always which is created by L0. Even we should
> consider avoiding the entry created for L1 in array(first entry in the
> array) of S2-MMUs and avoid unnecessary iteration/lookup while unmap
> of NestedVMs.

I'm sorry, but this is just wrong. You are merging the EL1 and EL2
translation regimes, which is not acceptable.

> I am anticipating this unwanted switch-over wont happen when we have
> NV2 only support in V12?

V11 is already NV2 only, so I really don't get what you mean here.
Everything stays the same, and there is nothing to change here.

What you describe looks like a terrible bug somewhere on the
page-fault path that has the potential to impact non-NV, and I'd like
to focus on that.

I've been booting my L1 with a fairly large number of vcpus (32 vcpu
for 6 physical CPUs), and I don't see this.

Since you seem to have a way to trigger it on your HW, can you please
pinpoint the situation where we map the page without having the
corresponding data?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel