Date: Tue, 6 Jan 2026 09:19:19 -0800
In-Reply-To: <20260104093221.494510-1-alessandro@0x65c.net>
References: <20260104093221.494510-1-alessandro@0x65c.net>
Subject: Re: [PATCH] KVM: x86: Retry guest entry on -EBUSY from kvm_check_nested_events()
From: Sean Christopherson
To: Alessandro Ratti
Cc: pbonzini@redhat.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, syzbot+1522459a74d26b0ac33a@syzkaller.appspotmail.com
Content-Type: text/plain; charset="us-ascii"

On Sun, Jan 04, 2026, Alessandro Ratti wrote:
> When a vCPU running in nested guest mode attempts to block (e.g., due
> to HLT), kvm_check_nested_events() may return -EBUSY to indicate that a
> nested event is pending but cannot be injected immediately, such as
> when event delivery is temporarily blocked in the guest.
>
> Currently, vcpu_block() logs a WARN_ON_ONCE() and then treats -EBUSY
> like any other error, returning 0 to exit to userspace. This can cause
> the vCPU to repeatedly block without making forward progress, delaying
> event injection and potentially leading to guest hangs under rare timing
> conditions.
>
> Remove the WARN_ON_ONCE() and handle -EBUSY explicitly by returning 1
> to retry guest entry instead of exiting to userspace. This allows the
> nested event to be injected once the temporary blocking condition
> clears, ensuring forward progress.
>
> This issue was triggered by syzkaller while exercising nested
> virtualization.
Syzkaller always ruins the fun :-(

> Fixes: 45405155d876 ("KVM: x86: WARN if a vCPU gets a valid wakeup that KVM can't yet inject")
> Reported-by: syzbot+1522459a74d26b0ac33a@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=1522459a74d26b0ac33a
> Tested-by: syzbot+1522459a74d26b0ac33a@syzkaller.appspotmail.com
> Signed-off-by: Alessandro Ratti
> ---
>  arch/x86/kvm/x86.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index ff8812f3a129..d5cf9a7ff8c5 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -11596,7 +11596,15 @@ static inline int vcpu_block(struct kvm_vcpu *vcpu)
>  	if (is_guest_mode(vcpu)) {
>  		int r = kvm_check_nested_events(vcpu);
>
> -		WARN_ON_ONCE(r == -EBUSY);
> +		/*
> +		 * -EBUSY indicates a nested event is pending but cannot be
> +		 * injected immediately (e.g., event delivery is temporarily
> +		 * blocked). Return to the vCPU run loop to retry guest entry
> +		 * instead of blocking, which would lose the pending event.
> +		 */
> +		if (r == -EBUSY)
> +			return 1;

The code and the comment are both wrong.

Returning immediately will incorrectly leave vcpu->arch.mp_state in a
non-RUNNABLE state, and _that_ will put the vCPU into an infinite loop.  The
for-loop in vcpu_run() will always see the vCPU as !running and so will call
back into vcpu_block().  vcpu_block() will see the vCPU as _runnable_ (but
still not fully running!) because of the pending (and injected) event, check
nested events again, hit -EBUSY again, and repeat until the VMM kills the VM.

And returning '0' doesn't block the vCPU, it triggers an exit to userspace.
In most cases, the spurious exit will be KVM_EXIT_UNKNOWN, but it could be
something else entirely if KVM filled vcpu->run->exit_reason but didn't
complete the exit to userspace.  And as above, the pending event isn't lost,
it'll still be pending if userspace invokes KVM_RUN again.
Of course, unless userspace stuffs MP_STATE, the infinite loop will still
occur, just with userspace's KVM_RUN loop being the outermost loop (assuming
userspace doesn't simply kill the VM).

I said above that syzkaller ruins the fun because, as noted by the changelog
in the Fixes commit, this scenario _should_ be impossible.  And AFAICT,
within KVM itself, that still holds true.

I finally found one of syzbot's reproducers that is straightforward, i.e.
doesn't require hitting a timing window with threading.  In that reproducer
(see Link below), userspace stuffs MP_STATE and an "injected" event, thus
forcing the vCPU into what is effectively an impossible state.  All of the
other reproducers get into HALTED naturally by executing HLT in L2, and then
stuff an injected event.  I've never been able to repro those, because
hitting the WARN requires forcing the vCPU to exit to userspace (e.g. with a
signal) just after HLT is executed so that userspace can stuff event state.
But in principle it's the same scenario: userspace stuffs impossible vCPU
state.

For now, I'm pretty sure the least awful "fix" is to drop the WARN and
continue with waking the vCPU.  In all likelihood, the garbage event stuffed
by userspace will generate a failed VM-Entry, which KVM will reflect to L1.
So L2 might die, but L1 should live on, which is more than good enough when
userspace is being stupid, and is about as good as we can do if KVM itself
is buggy, i.e. if there's a legitimate KVM bug that generates impossible
state.

I'll post the below as part of a series, as there is at least one cleanup
that can be done on top to consolidate handling of -EBUSY, and I'm hopeful
that the spirit of the WARN can be preserved, e.g. by adding/extending WARNs
in paths where KVM (re)injects events.
--
From: Sean Christopherson
Date: Tue, 6 Jan 2026 07:46:38 -0800
Subject: [PATCH] KVM: x86: Ignore -EBUSY when checking nested events from vcpu_block()

Ignore -EBUSY when checking nested events after exiting a blocking state
while L2 is active, as exiting to userspace will generate a spurious
userspace exit, usually with KVM_EXIT_UNKNOWN, and likely lead to the VM's
demise.  Continuing with the wakeup isn't perfect either, as *something*
has gone sideways if a vCPU is awakened in L2 with an injected event (or
worse, a nested run pending), but continuing on gives the VM a decent
chance of surviving without any major side effects.

As explained in the Fixes commits, it _should_ be impossible for a vCPU to
be put into a blocking state with an already-injected event (exception,
IRQ, or NMI).  Unfortunately, userspace can stuff MP_STATE and/or injected
events, and thus put the vCPU into what should be an impossible state.

Don't bother trying to preserve the WARN, e.g. with an anti-syzkaller
Kconfig, as WARNs can (hopefully) be added in paths where _KVM_ would be
violating x86 architecture, e.g. by WARNing if KVM attempts to inject an
exception or interrupt while the vCPU isn't running.
Cc: Alessandro Ratti
Cc: stable@vger.kernel.org
Fixes: 26844fee6ade ("KVM: x86: never write to memory from kvm_vcpu_check_block()")
Fixes: 45405155d876 ("KVM: x86: WARN if a vCPU gets a valid wakeup that KVM can't yet inject")
Link: https://syzkaller.appspot.com/text?tag=ReproC&x=10d4261a580000
Reported-by: syzbot+1522459a74d26b0ac33a@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/671bc7a7.050a0220.455e8.022a.GAE@google.com
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/x86.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ff8812f3a129..4bf9be1e17a7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11596,8 +11596,7 @@ static inline int vcpu_block(struct kvm_vcpu *vcpu)
 	if (is_guest_mode(vcpu)) {
 		int r = kvm_check_nested_events(vcpu);

-		WARN_ON_ONCE(r == -EBUSY);
-		if (r < 0)
+		if (r < 0 && r != -EBUSY)
 			return 0;
 	}

base-commit: 9448598b22c50c8a5bb77a9103e2d49f134c9578
--