Date: Tue, 16 Jun 2026 10:46:31 -0700
Subject: Re: [PATCH v4 26/30] KVM: x86: Don't treat interrupts as allowed just because a nested run is pending
From: Sean Christopherson
To: Yosry Ahmed
Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Kai Huang

On Mon, Jun 15, 2026, Yosry Ahmed wrote:
> > > The code makes sense to me but I am trying to make sense of the changelog.
> >
> > What part (parts?) is confusing? Honest question. I'm trying to reword the
> > changelog to make it "better", but I'm failing miserably because I don't know
> > what's wrong :-)
>
> 1. For kvm_vcpu_has_events() being unaffected, the explanation in
> paragraph #3 is focused on the code path from nested_vmx_run() ->
> kvm_emulate_halt_noskip(). I don't immediately see how
> kvm_arch_vcpu_runnable() is unaffected.

To reach kvm_vcpu_has_events(), kvm_vcpu_running() needs to return false. For
that to happen, vcpu->arch.mp_state needs to be something other than RUNNABLE.
If nested_run_pending is true, then mp_state *must* be RUNNABLE (barring bugs or
stupid userspace), because KVM shouldn't emulate VMRUN/VMLAUNCH/VMRESUME while
the vCPU is !RUNNABLE.

I didn't include that in the changelog because I thought it was obvious, but
obviously (LOL) not :-D

I called out the GUEST_ACTIVITY_HLT case because (to me) that is less obvious.

> 2. More importantly, paragraphs #3 and #4 read like this patch would
> regress kvm_vcpu_ready_for_interrupt_injection() and
> kvm_vcpu_has_events() if it affected them. Maybe clearly state that
> this patch is the right thing to do for these 2 functions as well, but
> they are more-or-less unaffected by the bug anyway?
> For kvm_vcpu_ready_for_interrupt_injection(), maybe just make it more
> clear in paragraph #4 that it currently incorrectly treats interrupts
> as allowed in the problematic scenario, but it is not a problem
> because ..., and it only results in a spurious exit to userspace (or
> not even that?).

Is this better?

When querying whether or not interrupts (IRQs) are allowed, check for a pending
nested run _after_ checking whether or not interrupts are blocked. If L1 is
running L2 _without_ nested_exit_on_intr(), i.e. if L1 IRQs can be blocked while
running L2, and interrupts will indeed be blocked once the nested VM-Enter to
L2 is completed, then KVM should treat interrupts as not being allowed.

For injection, this avoids an unnecessary (forced) VM-Exit, as KVM can
immediately request an IRQ window, instead of forcing an exit and _then_
requesting an IRQ window (because after the forced exit, KVM will see that
interrupts are blocked).

For non-injection usage, only kvm_vcpu_ready_for_interrupt_injection() is
affected in practice. Barring KVM bugs or misbehaving userspace (at which point
all architectural guarantees are off), kvm_vcpu_has_events() is unreachable
when a nested run is pending. To reach kvm_vcpu_has_events(),
kvm_vcpu_running() needs to return false, i.e. vcpu->arch.mp_state needs to be
something other than RUNNABLE. If nested_run_pending is true, then mp_state
*must* be RUNNABLE (again barring bugs or stupid userspace), because KVM
shouldn't emulate VMRUN/VMLAUNCH/VMRESUME while the vCPU is !RUNNABLE.

The one "near miss" is VMX's GUEST_ACTIVITY_STATE field, which allows L1 to put
the vCPU into HLT or WFS as part of nested VMLAUNCH/VMRESUME. However, KVM
clears nested_run_pending prior to calling kvm_emulate_halt_noskip() when
putting L2 into HLT via GUEST_ACTIVITY_HLT, and also clears the flag before
setting mp_state to INIT_RECEIVED. SVM has no equivalent to
GUEST_ACTIVITY_STATE. I.e.
the vCPU will always be runnable if a nested run is pending, and thus
kvm_arch_vcpu_runnable() => kvm_vcpu_has_events() is effectively dead code, as
is __kvm_emulate_halt() => kvm_vcpu_has_events(). Oh, and TDX doesn't support
nested VMX.

Similarly, kvm_can_do_async_pf() is unreachable as KVM shouldn't be faulting in
memory with a pending nested VM-Enter.

As for kvm_vcpu_ready_for_interrupt_injection(), KVM's current behavior of
incorrectly treating interrupts as being allowed could result in KVM prematurely
exiting to userspace to accept an ExtINT. But, KVM will still hold/block the
ExtINT and request its own IRQ window. I.e. the net effect is more or less the
same as in the for-injection case; the unnecessary exit just happens at a
different boundary.
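To make the reordering concrete, below is a minimal, self-contained C model of the "are IRQs allowed?" check with the pending-nested-run test before and after the blocked test. This is NOT KVM code: struct vcpu_model, its fields, and all model_* functions are hypothetical stand-ins. The model also assumes, per the subject line, that a non-injection caller treats any nonzero return (including -EBUSY) as "allowed", and that the injection path treats a negative return as "force an exit" and 0 as "request an IRQ window".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified vCPU state; NOT KVM's struct kvm_vcpu. */
struct vcpu_model {
	bool nested_run_pending; /* nested VM-Enter emulated but not yet completed */
	bool guest_mode;         /* vCPU is running, or about to run, L2 */
	bool exit_on_intr;       /* stand-in for nested_exit_on_intr() */
	bool irq_window_open;    /* stand-in for RFLAGS.IF=1 and no interrupt shadow */
};

/* If L1 intercepts IRQs they are never "blocked": they become exits to L1. */
static bool model_interrupt_blocked(const struct vcpu_model *v)
{
	if (v->guest_mode && v->exit_on_intr)
		return false;
	return !v->irq_window_open;
}

/* Old ordering: a pending nested run short-circuits the blocked check. */
static int model_interrupt_allowed_old(const struct vcpu_model *v, bool for_injection)
{
	assert(v);
	if (v->nested_run_pending)
		return -EBUSY;
	if (for_injection && v->guest_mode && v->exit_on_intr)
		return -EBUSY;
	return !model_interrupt_blocked(v);
}

/* New ordering: report "blocked" first, only then consider the pending run. */
static int model_interrupt_allowed_new(const struct vcpu_model *v, bool for_injection)
{
	assert(v);
	if (model_interrupt_blocked(v))
		return 0;
	if (v->nested_run_pending)
		return -EBUSY;
	if (for_injection && v->guest_mode && v->exit_on_intr)
		return -EBUSY;
	return 1;
}

/* Hypothetical non-injection caller: any nonzero value reads as "allowed". */
static bool model_non_injection_allowed(int r)
{
	return r != 0;
}
```

With nested_run_pending set, L1 not intercepting IRQs, and the interrupt window closed, the old model returns -EBUSY: the injection path would force an exit, and a caller testing for nonzero would read it as "allowed". The new model returns 0 in both cases, so the injection path can request an IRQ window immediately and the non-injection caller sees "not allowed".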
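The "effectively dead code" argument rests on one invariant (nested_run_pending implies mp_state == RUNNABLE) plus the short-circuit shape of kvm_arch_vcpu_runnable() => kvm_vcpu_running() || kvm_vcpu_has_events(). The sketch below models only that; every name in it (struct mp_vcpu_model, the MP_* states, model_nested_vmenter(), etc.) is hypothetical, and it is not how KVM implements mp_state handling. It counts how often the "has events" path is evaluated, and mirrors the described GUEST_ACTIVITY_HLT/WFS handling by clearing the pending flag before leaving RUNNABLE.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of mp_state values; names are illustrative only. */
enum mp_model_state { MP_RUNNABLE, MP_HALTED, MP_INIT_RECEIVED };

struct mp_vcpu_model {
	enum mp_model_state mp_state;
	bool nested_run_pending;
	bool pending_event;   /* stand-in for "kvm_vcpu_has_events() would be true" */
	int events_checks;    /* how many times the events path was evaluated */
};

static bool model_running(const struct mp_vcpu_model *v)
{
	return v->mp_state == MP_RUNNABLE;
}

static bool model_has_events(struct mp_vcpu_model *v)
{
	v->events_checks++;
	return v->pending_event;
}

/* "running || has_events": the second check only runs when !RUNNABLE. */
static bool model_runnable(struct mp_vcpu_model *v)
{
	return model_running(v) || model_has_events(v);
}

/*
 * Emulated nested VM-Enter with an optional activity state (HLT -> HALTED,
 * WFS -> INIT_RECEIVED). The pending flag is cleared *before* mp_state
 * leaves RUNNABLE, so "nested_run_pending implies RUNNABLE" always holds.
 */
static void model_nested_vmenter(struct mp_vcpu_model *v, enum mp_model_state activity)
{
	assert(v->mp_state == MP_RUNNABLE); /* never emulate VM-Enter while !RUNNABLE */
	v->nested_run_pending = true;
	if (activity != MP_RUNNABLE) {
		v->nested_run_pending = false;
		v->mp_state = activity;
	}
}

static bool model_invariant_holds(const struct mp_vcpu_model *v)
{
	return !v->nested_run_pending || v->mp_state == MP_RUNNABLE;
}
```

In the model, whenever nested_run_pending is true the vCPU is RUNNABLE, so model_runnable() returns before model_has_events() is ever called; the events path is only evaluated after the flag has already been cleared.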