From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 11 May 2026 16:35:06 -0700
In-Reply-To: <20260102142429.896101-5-griffoul@gmail.com>
Precedence: bulk
X-Mailing-List: linux-kselftest@vger.kernel.org
Mime-Version: 1.0
References: <20260102142429.896101-1-griffoul@gmail.com> <20260102142429.896101-5-griffoul@gmail.com>
Subject: Re: [PATCH v4 04/10] KVM: nVMX: Implement cache for L1 APIC pages
From: Sean Christopherson
To: Fred Griffoul
Cc: kvm@vger.kernel.org, pbonzini@redhat.com, vkuznets@redhat.com, shuah@kernel.org, dwmw@amazon.co.uk, linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org, Fred Griffoul
Content-Type: text/plain; charset="us-ascii"

On Fri, Jan 02, 2026, Fred Griffoul wrote:
> From: Fred Griffoul
>
> Replace kvm_host_map usage with gfn_to_pfn_cache for L1 APIC
> virtualization pages (APIC access, virtual APIC, and posted interrupt
> descriptor pages) to improve performance with unmanaged guest memory.
>
> The conversion involves several changes:
>
> - Page loading in nested_get_vmcs12_pages(): load vmcs02 fields with
>   pfncache PFNs after each cache has been checked and possibly activated
>   or refreshed, during OUTSIDE_GUEST_MODE vCPU mode.
>
> - Invalidation window handling: since nested_get_vmcs12_pages() runs in
>   OUTSIDE_GUEST_MODE, there's a window where caches can be invalidated
>   by MMU notifications before entering IN_GUEST_MODE. Implement the
>   is_nested_state_invalid() callback to monitor cache validity between
>   OUTSIDE_GUEST_MODE and IN_GUEST_MODE transitions. This triggers
>   KVM_REQ_GET_NESTED_STATE_PAGES when needed.
>
> - Cache access in event callbacks: the virtual APIC and posted interrupt
>   descriptor pages are accessed by KVM in the has_events() and
>   check_events() nested_ops callbacks. These use the kernel HVA following
>   the pfncache pattern of check/refresh, with both callbacks able to
>   sleep if a cache refresh is required.
>
> This eliminates expensive memremap/memunmap cycles for each L2 VM
> entry/exit, providing substantial performance improvements when using
> unmanaged memory.
>
> Signed-off-by: Fred Griffoul
> ---
>  arch/x86/kvm/vmx/nested.c | 182 +++++++++++++++++++++++++++++---------
>  arch/x86/kvm/vmx/vmx.h    |   8 +-
>  include/linux/kvm_host.h  |   5 ++
>  3 files changed, 150 insertions(+), 45 deletions(-)

Please split this up; one "cache" per patch.  It's a bit gratuitous, but the
usage of the vapic and pid structures is just complex enough to make this
more than a straightforward, "boring" conversion.

> @@ -3112,8 +3173,9 @@ static int nested_vmx_check_controls(struct kvm_vcpu *vcpu,
>  static int nested_vmx_check_controls_late(struct kvm_vcpu *vcpu,
>  					  struct vmcs12 *vmcs12)
>  {
> -	void *vapic = to_vmx(vcpu)->nested.virtual_apic_map.hva;
> -	u32 vtpr = vapic ? (*(u32 *)(vapic + APIC_TASKPRI)) >> 4 : 0;
> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> +	void *vapic;
> +	u32 vtpr = 0;
>
>  	/*
>  	 * Don't bother with the consistency checks if KVM isn't configured to
> @@ -3130,6 +3192,12 @@ static int nested_vmx_check_controls_late(struct kvm_vcpu *vcpu,
>  	if (!warn_on_missed_cc)
>  		return 0;
>
> +	vapic = nested_lock_vapic(vmx);
> +	if (vapic) {
> +		vtpr = (*(u32 *)(vapic + APIC_TASKPRI)) >> 4;
> +		nested_unlock_vapic(vmx);

This is *very* dangerous.  vapic will still be a legal kernel pointer, and
the code _relies_ on it to be non-NULL, but the pointer could become stale at
any time.  Happily, my CLASS() approach makes this a non-issue.

> +	}
> +
>  	if ((exec_controls_get(to_vmx(vcpu)) & CPU_BASED_TPR_SHADOW) &&
>  	    nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW) &&
>  	    !nested_cpu_has_vid(vmcs12) &&

...

> @@ -3543,7 +3609,16 @@ static bool vmx_get_nested_state_pages(struct kvm_vcpu *vcpu)
>
>  static bool vmx_is_nested_state_invalid(struct kvm_vcpu *vcpu)
>  {
> -	return false;
> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> +
> +	/*
> +	 * @vcpu is in IN_GUEST_MODE, eliminating the need for individual gpc

This is straight up wrong.  gpc->active is effectively protected by
vcpu->mutex.
And for gpc->valid, it's not completely wrong, but it's partially wrong and
definitely misleading.  It too is protected by vcpu->mutex, but only for
%false => %true transitions.  And even for %true => %false transitions,
IN_GUEST_MODE doesn't exactly provide safety, rather it ensures that KVM will
kick the vCPU out of the guest before completing the invalidation.

But that's all moot, because the very existence of this code feels wrong.
KVM shouldn't be sending a "no action" request and then requiring vendor code
to check out-of-band data to proxy that into a "real" request.  Presumably
the older code that you're reverting used KVM_REQ_OUTSIDE_GUEST_MODE; that
doesn't necessarily mean it was a good idea, it got yanked out for a
reason :-)

IMO, the least awful way to deal with this is to tag
KVM_REQ_GET_NESTED_STATE_PAGES with KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP,
and then pass the specific request that needs to be made when invalidating a
vCPU-mapped gpc as part of __kvm_gpc_init() (though maybe call it
vcpu_gpc_init() instead?).  That way KVM doesn't need this wonky request =>
kvm_gpc_invalid() => request proxying, and we should be able to entirely
avoid the confusingly-named kvm_gpc_invalid().