From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EAA40C77B6E for ; Thu, 13 Apr 2023 20:58:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229611AbjDMU6h (ORCPT ); Thu, 13 Apr 2023 16:58:37 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42382 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229522AbjDMU6e (ORCPT ); Thu, 13 Apr 2023 16:58:34 -0400 Received: from mail-pf1-x44a.google.com (mail-pf1-x44a.google.com [IPv6:2607:f8b0:4864:20::44a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 09EF983EA for ; Thu, 13 Apr 2023 13:58:34 -0700 (PDT) Received: by mail-pf1-x44a.google.com with SMTP id d2e1a72fcca58-6323eb1a289so1495367b3a.1 for ; Thu, 13 Apr 2023 13:58:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1681419513; x=1684011513; h=content-transfer-encoding:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:from:to:cc:subject:date:message-id :reply-to; bh=nEv2HVs1S+Egpav/jnzfzsQjcllfLnLuCFh+OUzwj8A=; b=N/NxeDGiy0A3sp3zsLS3ONiWVymFkZsvGZW/ifZ+0iNpy4Cnu0yvtFO/vEmGTqLWiH KXoYXslYuFxGsyEH33NwF0ERBM0GsmdjykR3Dc67bRTjpbFQ6HEMvFLZiBlBl5vRR2up +3M0y1sLKYIO3HqKiayhyGXRcqPTcmkSVYAJBXoefUccwC7wXAkJ7K/acVdWDJF/zSZE Ws36H1mGkrL56HIgJFH9kyVJunsDFoGM4WLNHVehJERDrCAYFDw4/1tidyOx0ddiMGCT 1M62BnNL7CUWlD6K+tKPpRpxO8IEQ/EADHzLI4PjUt2cbDJunv4i411h+dc9f9uqIlsy KO1Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1681419513; x=1684011513; h=content-transfer-encoding:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:x-gm-message-state:from:to:cc:subject :date:message-id:reply-to; bh=nEv2HVs1S+Egpav/jnzfzsQjcllfLnLuCFh+OUzwj8A=; b=fDGE++C9a5emEk1JVBJDM1gj10AzHzTZFGSz0mnMApMBuM4A4D0nAgHXMXCnWn2pyo WBgqJdrEfuN6wSrDIk+tfXXbSaze1mlrammttnAYzYbvKXffzyf1hjgSKsB/ohgeBiCV aSAk0tcVWukbFoCT/lnfxZVPQKEGseiAqiV6qYCqt2vw/48z8cTpflo4kYIKKz3TWIN1 +g/KaNVq3wwXMtEY2Wch5KT0RY8LGCf81x7mTciTifLGQgC97/Qq8UN77W16mii7R6xe tqiFTimEhcDO+obCtSguumxDcBcnuoBIXATafJHspYLWtN6GOKklapoVIr635PppC2Qc bMHA== X-Gm-Message-State: AAQBX9dh4kei22R94qZUc9zOGnTJZOamqdAZmMtKrikj+qzJGx1nLkZM awKGsWbE5J4dqO11ZdCVVBDCgqKdijY= X-Google-Smtp-Source: AKy350bSXma0H87PRt68CUCJhjZ6Fcq2Rn85m8fx/BXyeVGSzmlgqoQTUIRypWDhyXYiGsOUwAoC45uhcEQ= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a05:6a00:2d25:b0:62e:1972:fab5 with SMTP id fa37-20020a056a002d2500b0062e1972fab5mr1835355pfb.4.1681419513538; Thu, 13 Apr 2023 13:58:33 -0700 (PDT) Date: Thu, 13 Apr 2023 13:58:32 -0700 In-Reply-To: Mime-Version: 1.0 References: <20230227171751.1211786-1-jpiotrowski@linux.microsoft.com> <959c5bce-beb5-b463-7158-33fc4a4f910c@linux.microsoft.com> <61d131da-7239-6aae-753f-2eb4f1b84c24@linux.microsoft.com> Message-ID: Subject: Re: [PATCH] KVM: SVM: Disable TDP MMU when running on Hyper-V From: Sean Christopherson To: David Matlack Cc: Jeremi Piotrowski , linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Vitaly Kuznetsov , Paolo Bonzini , Tianyu Lan , Michael Kelley Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Apr 13, 2023, David Matlack wrote: > On Thu, Apr 13, 2023 at 12:10=E2=80=AFPM Sean Christopherson wrote: > > > > On Thu, Apr 13, 2023, Sean Christopherson wrote: > > > Aha! Idea. There are _at most_ 4 possible roots the TDP MMU can enc= ounter. > > > 4-level non-SMM, 4-level SMM, 5-level non-SMM, and 5-level SMM. I.e.= not keeping > > > inactive roots on a per-VM basis is just monumentally stupid. > > > > One correction: there are 6 possible roots: > > > > 1. 4-level !SMM !guest_mode (i.e. not nested) > > 2. 4-level SMM !guest_mode > > 3. 5-level !SMM !guest_mode > > 4. 5-level SMM !guest_mode > > 5. 4-level !SMM guest_mode > > 6. 5-level !SMM guest_mode > > > > I forgot that KVM still uses the TDP MMU when running L2 if L1 doesn't = enable > > EPT/TDP, i.e. if L1 is using shadow paging for L2. But that really doe= sn't change > > anything as each vCPU can already track 4 roots, i.e. userspace can sat= urate all > > 6 roots anyways. And in practice, no sane VMM will create a VM with bo= th 4-level > > and 5-level roots (KVM keys off of guest.MAXPHYADDR for the TDP root le= vel). >=20 > Why do we create a new root for guest_mode=3D1 if L1 disables EPT/NPT? Because "private", a.k.a. KVM-internal, memslots are visible to L1 but not = L2. Which for TDP means the APIC-access page. From commit 3a2936dedd20: kvm: mmu: Don't expose private memslots to L2 =20 These private pages have special purposes in the virtualization of L1, but not in the virtualization of L2. In particular, L1's APIC access page should never be entered into L2's page tables, because this causes a great deal of confusion when the APIC virtualization hardware is being used to accelerate L2's accesses to its own APIC. FWIW, I _think_ KVM could actually let L2 access the APIC-access page when = L1 is running without any APIC virtualization, i.e. when L1 is passing its APIC t= hrough to L2. E.g. something like the below, but I ain't touching that with a 10 = foot pole unless someone explicitly asks for it :-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 039fb16560a0..8aa12f5f2c30 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -4370,10 +4370,13 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu,= struct kvm_page_fault *fault if (!kvm_is_visible_memslot(slot)) { /* Don't expose private memslots to L2. */ if (is_guest_mode(vcpu)) { - fault->slot =3D NULL; - fault->pfn =3D KVM_PFN_NOSLOT; - fault->map_writable =3D false; - return RET_PF_CONTINUE; + if (!slot || slot->id !=3D APIC_ACCESS_PAGE_PRIVATE= _MEMSLOT || + nested_cpu_has_virtual_apic(vcpu)) { + fault->slot =3D NULL; + fault->pfn =3D KVM_PFN_NOSLOT; + fault->map_writable =3D false; + return RET_PF_CONTINUE; + } } /* * If the APIC access page exists but is disabled, go direc= tly