Date: Tue, 19 Aug 2025 15:50:57 -0700
X-Mailing-List: kvm@vger.kernel.org
Mime-Version: 1.0
References: <4266fc8f76c152a3ffcbb2d2ebafd608aa0fb949.1750432368.git.jpiotrowski@linux.microsoft.com>
 <875xghoaac.fsf@redhat.com> <87o6tttliq.fsf@redhat.com> <87tt2nm6ie.fsf@redhat.com>
Subject: Re: [RFC PATCH 1/1] KVM: VMX: Use Hyper-V EPT flush for local TLB flushes
From: Sean Christopherson
To: Jeremi Piotrowski
Cc: Vitaly Kuznetsov, Dave Hansen, linux-kernel@vger.kernel.org,
 alanjiang@microsoft.com, chinang.ma@microsoft.com,
 andrea.pellegrini@microsoft.com, Kevin Tian, "K. Y. Srinivasan",
 Haiyang Zhang, Wei Liu, Dexuan Cui, linux-hyperv@vger.kernel.org,
 Paolo Bonzini, kvm@vger.kernel.org
Content-Type: text/plain; charset="us-ascii"

On Fri, Aug 15, 2025, Jeremi Piotrowski wrote:
> On Tue, Aug 05, 2025 at 04:42:46PM -0700, Sean Christopherson wrote:
> I started working on extending patch 5, wanted to post it here to make sure I'm
> on the right track.
>
> It works in testing so far and shows promising performance - it gets rid of all
> the pathological cases I saw before.

Nice :-)

> I haven't checked whether I broke SVM yet, and I need figure out a way to
> always keep the cpumask "offstack" so that we don't blow up every struct
> kvm_mmu_page instance with an inline cpumask - it needs to stay optional.

Doh, I meant to include an idea or two for this in my earlier response.  The
best I can come up with is

> I also came across kvm_mmu_is_dummy_root(), that check is included in
> root_to_sp(). Can you think of any other checks that we might need to handle?

Don't think so?

> @@ -3827,6 +3829,9 @@ static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, int quadrant,
>  	sp = kvm_mmu_get_shadow_page(vcpu, gfn, role);
>  	++sp->root_count;
>
> +	if (level >= PT64_ROOT_4LEVEL)

Was this my code?  If so, we should move this into the VMX code, because the
fact that PAE roots can be ignored is really a detail of nested EPT, not the
overall scheme.

> +		kvm_x86_call(alloc_root_cpu_mask)(sp);

Ah shoot.
Allocating here won't work, because mmu_lock is held and allocating might
sleep.  I don't want to force an atomic allocation, because that can dip into
pools that KVM really shouldn't use.

The "standard" way KVM deals with this is to utilize a kvm_mmu_memory_cache.  If
we do that and add e.g. kvm_vcpu_arch.mmu_roots_flushed_cache, then we trivially
do the allocation in mmu_topup_memory_caches().  That would eliminate the error
handling in vmx_alloc_root_cpu_mask(), and might make it slightly less awful to
deal with the "offstack" cpumask.

Hmm, and then instead of calling into VMX to do the allocation, maybe just have
a flag to communicate that vendor code wants per-root flush tracking?  I haven't
thought hard about SVM, but I wouldn't be surprised if SVM ends up wanting the
same functionality after we switch to per-vCPU ASIDs.

> +
>  	return __pa(sp->spt);
>  }

...

> @@ -3307,22 +3309,34 @@ void vmx_flush_tlb_guest(struct kvm_vcpu *vcpu)
>  	vpid_sync_context(vmx_get_current_vpid(vcpu));
>  }
>
> -static void __vmx_flush_ept_on_pcpu_migration(hpa_t root_hpa)
> +void vmx_alloc_root_cpu_mask(struct kvm_mmu_page *root)
>  {

This should be conditioned on enable_ept.

> +	WARN_ON_ONCE(!zalloc_cpumask_var(&root->cpu_flushed_mask,
> +		GFP_KERNEL_ACCOUNT));
> +}
> +
> +static void __vmx_flush_ept_on_pcpu_migration(hpa_t root_hpa, int cpu)
> +{
> +	struct kvm_mmu_page *root;
> +
>  	if (!VALID_PAGE(root_hpa))
>  		return;
>
> +	root = root_to_sp(root_hpa);
> +	if (!root || cpumask_test_and_set_cpu(cpu, root->cpu_flushed_mask))

Hmm, this should flush if "root" is NULL, because the aforementioned "special"
roots don't have a shadow page.  But unless I'm missing an edge case (of an edge
case), this particular code can WARN_ON_ONCE() since EPT should never need to
use any of the special roots.  We might need to filter out dummy roots somewhere
to avoid false positives, but that should be easy enough.
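
To illustrate the top-up-then-consume pattern behind the kvm_mmu_memory_cache
suggestion above, here is a rough userspace sketch.  It is not kernel code and
not the real KVM API: struct obj_cache, cache_topup(), cache_pop() and
CACHE_CAPACITY are all hypothetical names.  The only point is the ordering: the
step that may allocate (and, in the kernel, sleep) runs before the lock is
taken, and the step that runs under the lock only pops a preallocated object.

```c
#include <assert.h>
#include <stdlib.h>

#define CACHE_CAPACITY 4	/* hypothetical upper bound on cached objects */

/* Hypothetical stand-in for a per-vCPU cache of preallocated objects. */
struct obj_cache {
	void *objs[CACHE_CAPACITY];
	int nobjs;
};

/*
 * Fill the cache up to at least @min zeroed objects of @size bytes.
 * This is the only step that allocates, so in the kernel analogy it
 * would run before mmu_lock is acquired.  Returns 0 on success, -1 on
 * failure.
 */
static int cache_topup(struct obj_cache *c, int min, size_t size)
{
	if (min > CACHE_CAPACITY)
		return -1;

	while (c->nobjs < min) {
		void *p = calloc(1, size);

		if (!p)
			return -1;
		c->objs[c->nobjs++] = p;
	}
	return 0;
}

/*
 * Take one preallocated object.  Never allocates, so in the kernel
 * analogy it is safe with the lock held, provided the top-up succeeded.
 */
static void *cache_pop(struct obj_cache *c)
{
	assert(c->nobjs > 0);
	return c->objs[--c->nobjs];
}
```

With this shape, allocation failure is reported by the top-up step, where the
caller can still bail out cleanly, and the consumer under the lock has no error
path of its own.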
For the mask, it's probably worth splitting test_and_set into separate
operations, as the common case will likely be that the root has been used on
this pCPU.  The test_and_set version will generate a LOCK BTS instruction, and
so for the common case where the bit is already set, KVM will generate an atomic
access, which can cause noise/bottlenecks.

E.g.

	if (WARN_ON_ONCE(!root))
		goto flush;

	if (cpumask_test_cpu(cpu, root->cpu_flushed_mask))
		return;

	cpumask_set_cpu(cpu, root->cpu_flushed_mask);

flush:
	vmx_flush_tlb_ept_root(root_hpa);

> +		return;
> +
>  	vmx_flush_tlb_ept_root(root_hpa);
>  }
>
> -static void vmx_flush_ept_on_pcpu_migration(struct kvm_mmu *mmu)
> +static void vmx_flush_ept_on_pcpu_migration(struct kvm_mmu *mmu, int cpu)
>  {
>  	int i;
>
> -	__vmx_flush_ept_on_pcpu_migration(mmu->root.hpa);
> +	__vmx_flush_ept_on_pcpu_migration(mmu->root.hpa, cpu);
>
>  	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
> -		__vmx_flush_ept_on_pcpu_migration(mmu->prev_roots[i].hpa);
> +		__vmx_flush_ept_on_pcpu_migration(mmu->prev_roots[i].hpa, cpu);
>  }
>
>  void vmx_ept_load_pdptrs(struct kvm_vcpu *vcpu)
> diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
> index b4596f651232..4406d53e6ebe 100644
> --- a/arch/x86/kvm/vmx/x86_ops.h
> +++ b/arch/x86/kvm/vmx/x86_ops.h
> @@ -84,6 +84,7 @@ void vmx_flush_tlb_all(struct kvm_vcpu *vcpu);
>  void vmx_flush_tlb_current(struct kvm_vcpu *vcpu);
>  void vmx_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t addr);
>  void vmx_flush_tlb_guest(struct kvm_vcpu *vcpu);
> +void vmx_alloc_root_cpu_mask(struct kvm_mmu_page *root);
>  void vmx_set_interrupt_shadow(struct kvm_vcpu *vcpu, int mask);
>  u32 vmx_get_interrupt_shadow(struct kvm_vcpu *vcpu);
>  void vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall);
> --
> 2.39.5
>
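
The split test/set suggested above can also be tried outside the kernel.  The
following is a userspace sketch using C11 atomics, under stated assumptions:
struct root_mask, root_needs_flush() and MAX_CPUS are hypothetical, and a single
64-bit word stands in for the cpumask.  The plain load covers the common
"already flushed on this pCPU" case without an atomic read-modify-write; only
the first use on a given pCPU pays for the locked operation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CPUS 64	/* hypothetical: one bit per pCPU in a single word */

/* Hypothetical stand-in for a root's cpu_flushed_mask. */
struct root_mask {
	_Atomic unsigned long long bits;
};

/*
 * Return true if the caller must flush for this (root, cpu) pair.
 * A NULL mask models a root without a shadow page: always flush.
 */
static bool root_needs_flush(struct root_mask *m, unsigned int cpu)
{
	unsigned long long bit;

	if (m == NULL)
		return true;

	assert(cpu < MAX_CPUS);
	bit = 1ULL << cpu;

	/* Fast path: plain load, no atomic read-modify-write. */
	if (atomic_load_explicit(&m->bits, memory_order_relaxed) & bit)
		return false;

	/* Slow path: first use on this pCPU, record it and flush. */
	atomic_fetch_or_explicit(&m->bits, bit, memory_order_relaxed);
	return true;
}
```

If two callers race on the same bit, both may see it clear and both flush.  That
costs a redundant flush but never skips a needed one, which is the safe
direction for this kind of tracking.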