From: Sean Christopherson <seanjc@google.com>
To: "K. Y. Srinivasan" <kys@microsoft.com>,
Haiyang Zhang <haiyangz@microsoft.com>,
Wei Liu <wei.liu@kernel.org>, Dexuan Cui <decui@microsoft.com>,
Juergen Gross <jgross@suse.com>,
Stefano Stabellini <sstabellini@kernel.org>,
Paolo Bonzini <pbonzini@redhat.com>,
Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Shuah Khan <shuah@kernel.org>, Marc Zyngier <maz@kernel.org>,
Oliver Upton <oliver.upton@linux.dev>,
Sean Christopherson <seanjc@google.com>
Cc: linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org,
xen-devel@lists.xenproject.org, kvm@vger.kernel.org,
linux-kselftest@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
K Prateek Nayak <kprateek.nayak@amd.com>,
David Matlack <dmatlack@google.com>
Subject: [PATCH v3 06/13] sched/wait: Drop WQ_FLAG_EXCLUSIVE from add_wait_queue_priority()
Date: Thu, 22 May 2025 16:52:16 -0700 [thread overview]
Message-ID: <20250522235223.3178519-7-seanjc@google.com> (raw)
In-Reply-To: <20250522235223.3178519-1-seanjc@google.com>
Drop the setting of WQ_FLAG_EXCLUSIVE from add_wait_queue_priority() and
instead have callers manually add the flag prior to adding their structure
to the queue. Blindly setting WQ_FLAG_EXCLUSIVE is flawed, as the nature
of exclusive, priority waiters means that only the first waiter added will
ever receive notifications.
Pushing the flawed behavior to callers will allow fixing the problem one
hypervisor at a time (KVM added the flawed API, and then KVM's code was
copy+pasted nearly verbatim by Xen and Hyper-V), and will also allow for
adding an API that provides true exclusivity, i.e. that guarantees at most
one priority waiter is in the queue.
Opportunistically add a comment in Hyper-V to call out the mess. Xen
privcmd's irqfd_wakefup() doesn't actually operate in exclusive mode, i.e.
can be "fixed" simply by dropping WQ_FLAG_EXCLUSIVE. And KVM is primed to
switch to the aforementioned fully exclusive API, i.e. won't be carrying
the flawed code for long.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
drivers/hv/mshv_eventfd.c | 8 ++++++++
drivers/xen/privcmd.c | 1 +
kernel/sched/wait.c | 4 ++--
virt/kvm/eventfd.c | 1 +
4 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/drivers/hv/mshv_eventfd.c b/drivers/hv/mshv_eventfd.c
index 8dd22be2ca0b..b348928871c2 100644
--- a/drivers/hv/mshv_eventfd.c
+++ b/drivers/hv/mshv_eventfd.c
@@ -368,6 +368,14 @@ static void mshv_irqfd_queue_proc(struct file *file, wait_queue_head_t *wqh,
container_of(polltbl, struct mshv_irqfd, irqfd_polltbl);
irqfd->irqfd_wqh = wqh;
+
+ /*
+ * TODO: Ensure there isn't already an exclusive, priority waiter, e.g.
+ * that the irqfd isn't already bound to another partition. Only the
+ * first exclusive waiter encountered will be notified, and
+ * add_wait_queue_priority() doesn't enforce exclusivity.
+ */
+ irqfd->irqfd_wait.flags |= WQ_FLAG_EXCLUSIVE;
add_wait_queue_priority(wqh, &irqfd->irqfd_wait);
}
diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index 13a10f3294a8..c08ec8a7d27c 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -957,6 +957,7 @@ irqfd_poll_func(struct file *file, wait_queue_head_t *wqh, poll_table *pt)
struct privcmd_kernel_irqfd *kirqfd =
container_of(pt, struct privcmd_kernel_irqfd, pt);
+ kirqfd->wait.flags |= WQ_FLAG_EXCLUSIVE;
add_wait_queue_priority(wqh, &kirqfd->wait);
}
diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index 51e38f5f4701..4ab3ab195277 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -40,7 +40,7 @@ void add_wait_queue_priority(struct wait_queue_head *wq_head, struct wait_queue_
{
unsigned long flags;
- wq_entry->flags |= WQ_FLAG_EXCLUSIVE | WQ_FLAG_PRIORITY;
+ wq_entry->flags |= WQ_FLAG_PRIORITY;
spin_lock_irqsave(&wq_head->lock, flags);
__add_wait_queue(wq_head, wq_entry);
spin_unlock_irqrestore(&wq_head->lock, flags);
@@ -64,7 +64,7 @@ EXPORT_SYMBOL(remove_wait_queue);
* the non-exclusive tasks. Normally, exclusive tasks will be at the end of
* the list and any non-exclusive tasks will be woken first. A priority task
* may be at the head of the list, and can consume the event without any other
- * tasks being woken.
+ * tasks being woken if it's also an exclusive task.
*
* There are circumstances in which we can try to wake a task which has already
* started to run but is not in state TASK_RUNNING. try_to_wake_up() returns
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 04877b297267..c7969904637a 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -316,6 +316,7 @@ static void kvm_irqfd_register(struct file *file, wait_queue_head_t *wqh,
init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup);
spin_release(&kvm->irqfds.lock.dep_map, _RET_IP_);
+ irqfd->wait.flags |= WQ_FLAG_EXCLUSIVE;
add_wait_queue_priority(wqh, &irqfd->wait);
spin_acquire(&kvm->irqfds.lock.dep_map, 0, 0, _RET_IP_);
--
2.49.0.1151.ga128411c76-goog
next prev parent reply other threads:[~2025-05-22 23:52 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-22 23:52 [PATCH v3 00/13] KVM: Make irqfd registration globally unique Sean Christopherson
2025-05-22 23:52 ` [PATCH v3 01/13] KVM: Use a local struct to do the initial vfs_poll() on an irqfd Sean Christopherson
2025-05-22 23:52 ` [PATCH v3 02/13] KVM: Acquire SCRU lock outside of irqfds.lock during assignment Sean Christopherson
2025-05-22 23:52 ` [PATCH v3 03/13] KVM: Initialize irqfd waitqueue callback when adding to the queue Sean Christopherson
2025-05-22 23:52 ` [PATCH v3 04/13] KVM: Add irqfd to KVM's list via the vfs_poll() callback Sean Christopherson
2025-05-22 23:52 ` [PATCH v3 05/13] KVM: Add irqfd to eventfd's waitqueue while holding irqfds.lock Sean Christopherson
2025-05-22 23:52 ` Sean Christopherson [this message]
2025-05-22 23:52 ` [PATCH v3 07/13] xen: privcmd: Don't mark eventfd waiter as EXCLUSIVE Sean Christopherson
2025-05-22 23:52 ` [PATCH v3 08/13] sched/wait: Add a waitqueue helper for fully exclusive priority waiters Sean Christopherson
2025-05-30 8:45 ` K Prateek Nayak
2025-05-22 23:52 ` [PATCH v3 09/13] KVM: Disallow binding multiple irqfds to an eventfd with a priority waiter Sean Christopherson
2025-05-22 23:52 ` [PATCH v3 10/13] KVM: Drop sanity check that per-VM list of irqfds is unique Sean Christopherson
2025-05-22 23:52 ` [PATCH v3 11/13] KVM: selftests: Assert that eventfd() succeeds in Xen shinfo test Sean Christopherson
2025-05-22 23:52 ` [PATCH v3 12/13] KVM: selftests: Add utilities to create eventfds and do KVM_IRQFD Sean Christopherson
2025-05-22 23:52 ` [PATCH v3 13/13] KVM: selftests: Add a KVM_IRQFD test to verify uniqueness requirements Sean Christopherson
2025-05-23 7:23 ` Sairaj Kodilkar
2025-05-23 14:33 ` Sean Christopherson
2025-05-26 3:36 ` Sairaj Kodilkar
2025-05-23 11:14 ` [PATCH v3 00/13] KVM: Make irqfd registration globally unique Peter Zijlstra
2025-05-30 8:49 ` K Prateek Nayak
2025-06-24 19:38 ` Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250522235223.3178519-7-seanjc@google.com \
--to=seanjc@google.com \
--cc=decui@microsoft.com \
--cc=dmatlack@google.com \
--cc=haiyangz@microsoft.com \
--cc=jgross@suse.com \
--cc=juri.lelli@redhat.com \
--cc=kprateek.nayak@amd.com \
--cc=kvm@vger.kernel.org \
--cc=kvmarm@lists.linux.dev \
--cc=kys@microsoft.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-hyperv@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=maz@kernel.org \
--cc=mingo@redhat.com \
--cc=oliver.upton@linux.dev \
--cc=pbonzini@redhat.com \
--cc=peterz@infradead.org \
--cc=shuah@kernel.org \
--cc=sstabellini@kernel.org \
--cc=vincent.guittot@linaro.org \
--cc=wei.liu@kernel.org \
--cc=xen-devel@lists.xenproject.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox