Date: Mon, 2 Mar 2026 18:22:27 -0800
In-Reply-To: <20260228033328.2285047-5-chengkev@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20260228033328.2285047-1-chengkev@google.com> <20260228033328.2285047-5-chengkev@google.com>
Subject: Re: [PATCH V4 4/4] KVM: SVM: Raise #UD if VMMCALL instruction is not intercepted
From: Sean Christopherson
To: Kevin Cheng
Cc: pbonzini@redhat.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, yosry@kernel.org, Vitaly Kuznetsov
Content-Type: text/plain; charset="us-ascii"

+Vitaly

On Sat, Feb 28, 2026, Kevin Cheng wrote:
> The AMD APM states that if VMMCALL instruction is not intercepted, the
> instruction raises a #UD exception.
>
> Create a vmmcall exit handler that generates a #UD if a VMMCALL exit
> from L2 is being handled by L0, which means that L1 did not intercept
> the VMMCALL instruction. The exception to this is if the exiting
> instruction was for Hyper-V L2 TLB flush hypercalls as they are handled
> by L0.

*sigh*  Except this changelog doesn't capture *any* of the subtlety, and were
it not for an internal bug discussion, I would have literally no clue WTF is
going on.

There's no generic missed-#UD bug, because this code in recalc_intercepts()
effectively disables the VMMCALL intercept in vmcb02 if the intercept isn't
set in vmcb12.

	/*
	 * We want to see VMMCALLs from a nested guest only when Hyper-V L2 TLB
	 * flush feature is enabled.
	 */
	if (!nested_svm_l2_tlb_flush_enabled(&svm->vcpu))
		vmcb_clr_intercept(c, INTERCEPT_VMMCALL);

I.e. the only bug *knowingly* being fixed, maybe, is an edge case where
Hyper-V TLB flushes are enabled for L2 and the hypercall is something other
than one of the blessed Hyper-V hypercalls.  But in that case, it's not at all
clear to me that synthesizing a #UD into L2 is correct.  I can't find anything
in the TLFS (not surprising), so I guess anything goes?

Vitaly,

The scenario in question is where HV_X64_NESTED_DIRECT_FLUSH is enabled, L1
doesn't intercept VMMCALL, and L2 executes VMMCALL with something other than
one of the Hyper-V TLB flush hypercalls.
The proposed change is to synthesize #UD (which is what happens if
HV_X64_NESTED_DIRECT_FLUSH isn't enabled).  Does that sound sane?  Should KVM
instead return an error?

As for bugs that are *unknowingly* being fixed, intercepting VMMCALL and
manually injecting a #UD effectively fixes a bad interaction with KVM's
asinine KVM_X86_QUIRK_FIX_HYPERCALL_INSN.  If KVM doesn't intercept VMMCALL
while L2 is active (L1 doesn't want to intercept VMMCALL and the Hyper-V L2
TLB flush hypercall is disabled), then L2 will hang on the VMMCALL as KVM will
intercept the #UD, then "emulate" VMMCALL by trying to fixup the opcode and
restarting the instruction.

That can be "fixed" by disabling the quirk, or by hacking the fixup like so:

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index db3f393192d9..3f6d9950f8f8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10506,17 +10506,22 @@ static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt)
 	 * If the quirk is disabled, synthesize a #UD and let the guest pick up
 	 * the pieces.
 	 */
-	if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_FIX_HYPERCALL_INSN)) {
-		ctxt->exception.error_code_valid = false;
-		ctxt->exception.vector = UD_VECTOR;
-		ctxt->have_exception = true;
-		return X86EMUL_PROPAGATE_FAULT;
-	}
+	if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_FIX_HYPERCALL_INSN))
+		goto inject_ud;
 
 	kvm_x86_call(patch_hypercall)(vcpu, instruction);
 
+	if (is_guest_mode(vcpu) && !memcmp(instruction, ctxt->fetch.data, 3))
+		goto inject_ud;
+
 	return emulator_write_emulated(ctxt, rip, instruction, 3,
 				       &ctxt->exception);
+
+inject_ud:
+	ctxt->exception.error_code_valid = false;
+	ctxt->exception.vector = UD_VECTOR;
+	ctxt->have_exception = true;
+	return X86EMUL_PROPAGATE_FAULT;
 }
 
 static int dm_request_for_irq_injection(struct kvm_vcpu *vcpu)
--

But that's extremely convoluted for no purpose that I can see.  Not
intercepting VMMCALL requires _more_ code and is overall more complex.
So unless I'm missing something, I'm going to tack on this to fix the L2
infinite loop, and then figure out what to do about Hyper-V, pending Vitaly's
input.

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 45d1496031a7..a55af647649c 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -156,13 +156,6 @@ void recalc_intercepts(struct vcpu_svm *svm)
 			vmcb_clr_intercept(c, INTERCEPT_VINTR);
 	}
 
-	/*
-	 * We want to see VMMCALLs from a nested guest only when Hyper-V L2 TLB
-	 * flush feature is enabled.
-	 */
-	if (!nested_svm_l2_tlb_flush_enabled(&svm->vcpu))
-		vmcb_clr_intercept(c, INTERCEPT_VMMCALL);
-
 	for (i = 0; i < MAX_INTERCEPT; i++)
 		c->intercepts[i] |= g->intercepts[i];