Linux Kernel Selftest development
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: "K. Y. Srinivasan" <kys@microsoft.com>,
	Haiyang Zhang <haiyangz@microsoft.com>,
	 Wei Liu <wei.liu@kernel.org>, Dexuan Cui <decui@microsoft.com>,
	 Juergen Gross <jgross@suse.com>,
	Stefano Stabellini <sstabellini@kernel.org>,
	 Paolo Bonzini <pbonzini@redhat.com>,
	Ingo Molnar <mingo@redhat.com>,
	 Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	 Vincent Guittot <vincent.guittot@linaro.org>,
	Shuah Khan <shuah@kernel.org>,  Marc Zyngier <maz@kernel.org>,
	Oliver Upton <oliver.upton@linux.dev>,
	 Sean Christopherson <seanjc@google.com>
Cc: linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org,
	 xen-devel@lists.xenproject.org, kvm@vger.kernel.org,
	 linux-kselftest@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,  kvmarm@lists.linux.dev,
	K Prateek Nayak <kprateek.nayak@amd.com>,
	 David Matlack <dmatlack@google.com>
Subject: [PATCH v3 09/13] KVM: Disallow binding multiple irqfds to an eventfd with a priority waiter
Date: Thu, 22 May 2025 16:52:19 -0700	[thread overview]
Message-ID: <20250522235223.3178519-10-seanjc@google.com> (raw)
In-Reply-To: <20250522235223.3178519-1-seanjc@google.com>

Disallow binding an irqfd to an eventfd that already has a priority waiter,
i.e. to an eventfd that already has an attached irqfd.  KVM always
operates in exclusive mode for EPOLL_IN (unconditionally returns '1'),
i.e. only the first waiter will be notified.

KVM already disallows binding multiple irqfds to an eventfd in a single
VM, but doesn't guard against multiple VMs binding to an eventfd.  Adding
the extra protection reduces the pain of a userspace VMM bug, e.g. if
userspace fails to de-assign before re-assigning when transferring state
for intra-host migration, then the migration will explicitly fail as
opposed to dropping IRQs on the destination VM.

Temporarily keep KVM's manual check on irqfds.items, but add a WARN, e.g.
to allow sanity checking the waitqueue enforcement.

Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: David Matlack <dmatlack@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 virt/kvm/eventfd.c | 55 +++++++++++++++++++++++++++++++---------------
 1 file changed, 37 insertions(+), 18 deletions(-)

diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index c7969904637a..7b2e1f858f6d 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -291,38 +291,57 @@ static void kvm_irqfd_register(struct file *file, wait_queue_head_t *wqh,
 	struct kvm_kernel_irqfd *tmp;
 	struct kvm *kvm = p->kvm;
 
+	/*
+	 * Note, irqfds.lock protects the irqfd's irq_entry, i.e. its routing,
+	 * and irqfds.items.  It does NOT protect registering with the eventfd.
+	 */
 	spin_lock_irq(&kvm->irqfds.lock);
 
-	list_for_each_entry(tmp, &kvm->irqfds.items, list) {
-		if (irqfd->eventfd != tmp->eventfd)
-			continue;
-		/* This fd is used for another irq already. */
-		p->ret = -EBUSY;
-		spin_unlock_irq(&kvm->irqfds.lock);
-		return;
-	}
-
+	/*
+	 * Initialize the routing information prior to adding the irqfd to the
+	 * eventfd's waitqueue, as irqfd_wakeup() can be invoked as soon as the
+	 * irqfd is registered.
+	 */
 	irqfd_update(kvm, irqfd);
 
-	list_add_tail(&irqfd->list, &kvm->irqfds.items);
-
 	/*
 	 * Add the irqfd as a priority waiter on the eventfd, with a custom
 	 * wake-up handler, so that KVM *and only KVM* is notified whenever the
-	 * underlying eventfd is signaled.  Temporarily lie to lockdep about
-	 * holding irqfds.lock to avoid a false positive regarding potential
-	 * deadlock with irqfd_wakeup() (see irqfd_wakeup() for details).
+	 * underlying eventfd is signaled.
 	 */
 	init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup);
 
+	/*
+	 * Temporarily lie to lockdep about holding irqfds.lock to avoid a
+	 * false positive regarding potential deadlock with irqfd_wakeup()
+	 * (see irqfd_wakeup() for details).
+	 *
+	 * Adding to the wait queue will fail if there is already a priority
+	 * waiter, i.e. if the eventfd is associated with another irqfd (in any
+	 * VM).  Note, kvm_irqfd_deassign() waits for all in-flight shutdown
+	 * jobs to complete, i.e. ensures the irqfd has been removed from the
+	 * eventfd's waitqueue before returning to userspace.
+	 */
 	spin_release(&kvm->irqfds.lock.dep_map, _RET_IP_);
-	irqfd->wait.flags |= WQ_FLAG_EXCLUSIVE;
-	add_wait_queue_priority(wqh, &irqfd->wait);
+	p->ret = add_wait_queue_priority_exclusive(wqh, &irqfd->wait);
 	spin_acquire(&kvm->irqfds.lock.dep_map, 0, 0, _RET_IP_);
+	if (p->ret)
+		goto out;
 
+	list_for_each_entry(tmp, &kvm->irqfds.items, list) {
+		if (irqfd->eventfd != tmp->eventfd)
+			continue;
+
+		WARN_ON_ONCE(1);
+		/* This fd is used for another irq already. */
+		p->ret = -EBUSY;
+		goto out;
+	}
+
+	list_add_tail(&irqfd->list, &kvm->irqfds.items);
+
+out:
 	spin_unlock_irq(&kvm->irqfds.lock);
-
-	p->ret = 0;
 }
 
 #if IS_ENABLED(CONFIG_HAVE_KVM_IRQ_BYPASS)
-- 
2.49.0.1151.ga128411c76-goog


  parent reply	other threads:[~2025-05-22 23:52 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-22 23:52 [PATCH v3 00/13] KVM: Make irqfd registration globally unique Sean Christopherson
2025-05-22 23:52 ` [PATCH v3 01/13] KVM: Use a local struct to do the initial vfs_poll() on an irqfd Sean Christopherson
2025-05-22 23:52 ` [PATCH v3 02/13] KVM: Acquire SCRU lock outside of irqfds.lock during assignment Sean Christopherson
2025-05-22 23:52 ` [PATCH v3 03/13] KVM: Initialize irqfd waitqueue callback when adding to the queue Sean Christopherson
2025-05-22 23:52 ` [PATCH v3 04/13] KVM: Add irqfd to KVM's list via the vfs_poll() callback Sean Christopherson
2025-05-22 23:52 ` [PATCH v3 05/13] KVM: Add irqfd to eventfd's waitqueue while holding irqfds.lock Sean Christopherson
2025-05-22 23:52 ` [PATCH v3 06/13] sched/wait: Drop WQ_FLAG_EXCLUSIVE from add_wait_queue_priority() Sean Christopherson
2025-05-22 23:52 ` [PATCH v3 07/13] xen: privcmd: Don't mark eventfd waiter as EXCLUSIVE Sean Christopherson
2025-05-22 23:52 ` [PATCH v3 08/13] sched/wait: Add a waitqueue helper for fully exclusive priority waiters Sean Christopherson
2025-05-30  8:45   ` K Prateek Nayak
2025-05-22 23:52 ` Sean Christopherson [this message]
2025-05-22 23:52 ` [PATCH v3 10/13] KVM: Drop sanity check that per-VM list of irqfds is unique Sean Christopherson
2025-05-22 23:52 ` [PATCH v3 11/13] KVM: selftests: Assert that eventfd() succeeds in Xen shinfo test Sean Christopherson
2025-05-22 23:52 ` [PATCH v3 12/13] KVM: selftests: Add utilities to create eventfds and do KVM_IRQFD Sean Christopherson
2025-05-22 23:52 ` [PATCH v3 13/13] KVM: selftests: Add a KVM_IRQFD test to verify uniqueness requirements Sean Christopherson
2025-05-23  7:23   ` Sairaj Kodilkar
2025-05-23 14:33     ` Sean Christopherson
2025-05-26  3:36       ` Sairaj Kodilkar
2025-05-23 11:14 ` [PATCH v3 00/13] KVM: Make irqfd registration globally unique Peter Zijlstra
2025-05-30  8:49 ` K Prateek Nayak
2025-06-24 19:38 ` Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250522235223.3178519-10-seanjc@google.com \
    --to=seanjc@google.com \
    --cc=decui@microsoft.com \
    --cc=dmatlack@google.com \
    --cc=haiyangz@microsoft.com \
    --cc=jgross@suse.com \
    --cc=juri.lelli@redhat.com \
    --cc=kprateek.nayak@amd.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.linux.dev \
    --cc=kys@microsoft.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=maz@kernel.org \
    --cc=mingo@redhat.com \
    --cc=oliver.upton@linux.dev \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=shuah@kernel.org \
    --cc=sstabellini@kernel.org \
    --cc=vincent.guittot@linaro.org \
    --cc=wei.liu@kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox