From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-pj1-f74.google.com (mail-pj1-f74.google.com [209.85.216.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B8DF2BE5E for ; Tue, 16 Dec 2025 16:01:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.74 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765900867; cv=none; b=qcLX8rxuXYhQrIjop9I3Il8IlZQGMpI6/qfSZNUlfVpfa4HruPcdnSPr+7WNHJMYQLBFDHRvXh6M5/ZVVHwvidRDId4nY8T4uyQtzEt1WXdsLFmxKTAdTmqeGmQricM8vgOeyo7IK7x8lEP9GTlAG/Q/4X9/hKMic8MKpvTmFsI= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765900867; c=relaxed/simple; bh=yNFnE3QRWlC1z1b1Pzzpu0pYl/KVYC+2f5/9GLoa/4w=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=RhFHI/S94jgZ8AwI0ttiTvG82ZNr138ZxfVfIoe8oTqWVMODW/oQbCsNhZSsnze0DLclngpKgDRSJXn3ug7goSnvK4w018pLISIysBhFtM57pjfzLvfQGMScIFJ2Oix+1SEjb97d5aQcvSag6xVyjokIpFp43pTKhf4wCx/uxsY= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=W5DKYl8t; arc=none smtp.client-ip=209.85.216.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="W5DKYl8t" Received: by mail-pj1-f74.google.com with SMTP id 98e67ed59e1d1-34ab459c051so11539786a91.0 for ; Tue, 16 Dec 2025 08:01:05 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1765900865; x=1766505665; darn=vger.kernel.org; h=content-transfer-encoding:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:from:to:cc:subject:date:message-id :reply-to; bh=ed0NKnHXDUx6y67ShmBXuv80GX62o6+xq78eUXQD85M=; b=W5DKYl8tlV1YqJlX20UbYj5sbOZ3rtQsFDAAWEMDGiAJc/gvpmXmzCTNh09o7uRn8y h13hCy1/lfk38zguFoV6CRQAodYP0Ft89Ecn/jB4tirNfcTZwsWkr0vMOc58UPZTF2iI 4sqLCHIJhTgulsUNCdjTDdiGGwiLyBE6vB2oYxOxn8j1RVePvaNuzjxVJHL4OgLo91Is enxHbTx1xhuTP2myZOpnVumgHuFJ9bX504WxbQQtF/FKhgSgP8wmxZoCW0QE9NrOUWpi Jy3ePjNgyJJ8EfrmcXDWpADtDXR19W38OFkE8zFtpcp0Xc2QT+9req7N03HHYS4RNMJv S3sw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1765900865; x=1766505665; h=content-transfer-encoding:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:x-gm-message-state:from:to:cc:subject :date:message-id:reply-to; bh=ed0NKnHXDUx6y67ShmBXuv80GX62o6+xq78eUXQD85M=; b=BE/g80F2eFoQKPLbgjehE7ycnHLs8Qiae06+L/mrvR/MTkjAuhPHeLBgeGF5ZJSNPj /3jkKCWqXQrOfCWAAFK3GCmbXt26tY7jguf1H47CyO+/8EGgYIw/naW+na/EsgB1DhRZ SlaBo4XRtWBaQ+AwBdb2Dfe83jSpW3lCdj0ohH+gTIEc7Dm9h54jDC+T+ONXRovHiVBZ dhMTf0pkJO5xLuX8k2WSMQQ1GmQMjGe+pJMAliLi1EgRxO5c496lAgm2UZptP1EWlXnI fVVMmhkJwwLb/NI7T+JckzKaqxfyaI0JyFajvoqifbdSle9Ah1XWQmw7lDc7oRqA7t6e q8TA== X-Forwarded-Encrypted: i=1; AJvYcCWAEGOwPxyz1nW+51C8dyHrXlXPBoYR+geg0Ru9Pryy94IRdSlPs+SmqWn0mz5FPtEELEE=@vger.kernel.org X-Gm-Message-State: AOJu0YzryRNjOowXOEHBvxqCA5gUhlurE82XakvLQsstdW7jX2ecZQsE +RJyrbsm77TC0CWWw07z/nefFdpwGdKHLd0T3P5IAQWRXBqFWP+qT2KghPiOaWX/IYzXQ9N+AQU /VWx4ew== X-Google-Smtp-Source: AGHT+IHxGgJVMrExe1w1FALuUErnemaqW+x4EsczgP3V99F1HVfGLEgM+LG40KSurJS2g3VWhV/ZeVpDzeA= X-Received: from pjbqx13.prod.google.com ([2002:a17:90b:3e4d:b0:34c:811d:e3ca]) (user=seanjc job=prod-delivery.src-stubby-dispatcher) by 2002:a17:90a:d450:b0:34c:f92a:ad05 with SMTP id 98e67ed59e1d1-34cf92ac1ebmr161260a91.11.1765900864883; Tue, 16 Dec 2025 08:01:04 -0800 (PST) Date: Tue, 16 Dec 2025 08:01:03 -0800 In-Reply-To: Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20251205070630.4013452-1-chengkev@google.com> Message-ID: Subject: Re: [PATCH] KVM: SVM: Don't allow L1 intercepts for instructions not advertised From: Sean Christopherson To: Kevin Cheng Cc: pbonzini@redhat.com, jmattson@google.com, yosry.ahmed@linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable On Sat, Dec 13, 2025, Kevin Cheng wrote: > On Tue, Dec 9, 2025 at 11:52=E2=80=AFAM Sean Christopherson wrote: > > On Fri, Dec 05, 2025, Kevin Cheng wrote: > > static > > void __nested_copy_vmcb_control_to_cache(struct kvm_vcpu *vcpu, > > struct vmcb_ctrl_area_cached *= to, > > struct vmcb_control_area *from= ) > > { > > unsigned int i; > > > > for (i =3D 0; i < MAX_INTERCEPT; i++) > > to->intercepts[i] =3D from->intercepts[i]; > > > > nested_svm_sanitize_intercept(vcpu, to, RDTSCP); > > nested_svm_sanitize_intercept(vcpu, to, SKINIT); > > __nested_svm_sanitize_intercept(vcpu, to, XSAVE, XSETBV); > > nested_svm_sanitize_intercept(vcpu, to, RDPRU); > > nested_svm_sanitize_intercept(vcpu, to, INVPCID); > > > > Side topic, do we care about handling the case where userspace sets CPU= ID after > > stuffing guest state? I'm very tempted to send a patch disallowing KVM= _SET_CPUID > > if is_guest_mode() is true, and hoping no one cares. >=20 > Hmm good point I haven't thought about that. Would it be better to > instead update the nested state in svm_vcpu_after_set_cpuid() if > KVM_SET_CPUID is executed when is_guest_mode() is true? Allowing CPUID to change (or VMX feature MSRs) when the vCPU is in L2 creat= es a massive set of potential complications that would be all but impossible to = handle. E.g. if userspace clears features that are being used to run L2, then KVM c= ould end up with conflicting/nonsensical state for the vCPU. Note, the same holds true for non-nested features, which is largely what le= d to 63f5a1909f9e ("KVM: x86: Alert userspace that KVM_SET_CPUID{,2} after KVM_R= UN is broken") and feb627e8d6f6 ("KVM: x86: Forbid KVM_SET_CPUID{,2} after KVM_RU= N"). > Also sorry if this is a dumb question, but in general if KVM_SET_CPUID > is disallowed, then how does userspace handle a failed IOCTL call? The VMM logs an error and exits. > Do they just try again later or accept that the call has failed? Or is th= ere > an error code that signals that the vcpu is executing in guest mode and > should wait until not in guest mode to call the IOCTL? In practice, CPUID (and feature MSRs) are set very early on, when userspace= is creating vCPUs. With one exception, any other approach simply can't work, = because as above CPUID can't be configured after the vCPU has run. The only way fo= r the vCPU to be "running" L2 is if userspace stuffed guest state via KVM_SET_NES= TED_STATE, and actually changing CPUID after that point simply can't work (it'll "succ= eed", but KVM provides no guarantees as to the viability of the vCPU). The exception is that KVM allows setting _identical_ CPUID after KVM_RUN to support QEMU's vCPU hotplug implementation, which is what commit c6617c61e8= fe ("KVM: x86: Partially allow KVM_SET_CPUID{,2} after KVM_RUN") is all about. Trying to _completely_ protect userspace from itself is a fool's errand, as= KVM's granular set of ioctls for loading vCPU state and the subjectivity of what = is a "sane" vCPU model make it all but impossible to fully prevent userspace fro= m hurting itself without creating a maintenance nightmare. But I can't think= of any scenario where it would be useful for userspace to set nested state, th= en go back and change the vCPU model. I'll send a patch and see what happens. Worst case scenario we break users= pace and get to learn about crazy VMM behavior :-)