From: Sean Christopherson <seanjc@google.com>
To: Michael Kelley <mhklinux@outlook.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
Nuno Das Neves <nunodasneves@linux.microsoft.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Ingo Molnar <mingo@redhat.com>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Marc Zyngier <maz@kernel.org>,
Oliver Upton <oliver.upton@linux.dev>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-arm-kernel@lists.infradead.org"
<linux-arm-kernel@lists.infradead.org>,
"kvmarm@lists.linux.dev" <kvmarm@lists.linux.dev>,
K Prateek Nayak <kprateek.nayak@amd.com>,
David Matlack <dmatlack@google.com>,
Juergen Gross <jgross@suse.com>,
Stefano Stabellini <sstabellini@kernel.org>,
Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Subject: Re: [PATCH v2 08/12] sched/wait: Drop WQ_FLAG_EXCLUSIVE from add_wait_queue_priority()
Date: Wed, 21 May 2025 08:05:09 -0700 [thread overview]
Message-ID: <aC3rpZChhtw4NODS@google.com> (raw)
In-Reply-To: <BN7PR02MB4148503E1599C1310F408863D49EA@BN7PR02MB4148.namprd02.prod.outlook.com>
On Wed, May 21, 2025, Michael Kelley wrote:
> From: Peter Zijlstra <peterz@infradead.org> Sent: Wednesday, May 21, 2025 4:43 AM
> >
> > On Tue, May 20, 2025 at 03:20:00PM -0700, Sean Christopherson wrote:
> > > On Tue, May 20, 2025, Peter Zijlstra wrote:
> > > > On Mon, May 19, 2025 at 11:55:10AM -0700, Sean Christopherson wrote:
> > > > > Drop the setting of WQ_FLAG_EXCLUSIVE from add_wait_queue_priority() to
> > > > > differentiate it from add_wait_queue_priority_exclusive(). The one and
> > > > > only user add_wait_queue_priority(), Xen privcmd's irqfd_wakeup(),
> > > > > unconditionally returns '0', i.e. doesn't actually operate in exclusive
> > > > > mode.
> > > >
> > > > I find:
> > > >
> > > > drivers/hv/mshv_eventfd.c: add_wait_queue_priority(wqh, &irqfd->irqfd_wait);
> > > > drivers/xen/privcmd.c: add_wait_queue_priority(wqh, &kirqfd->wait);
> > > >
> > > > I mean, it might still be true and all, but hyperv seems to also use
> > > > this now.
> > >
> > > Oh FFS, another "heavily inspired by KVM". I should have bribed someone to take
> > > this series when I had the chance. *sigh*
> > >
> > > Unfortunately, the Hyper-V code does actually operate in exclusive mode. Unless
> > > you have a better idea, I'll tweak the series to:
> > >
> > > 1. Drop WQ_FLAG_EXCLUSIVE from add_wait_queue_priority() and have the callers
> > > explicitly set the flag,
> > > 2. Add a patch to drop WQ_FLAG_EXCLUSIVE from Xen privcmd entirely.
> > > 3. Introduce add_wait_queue_priority_exclusive() and switch KVM to use it.
> > >
> > > That has an added bonus of introducing the Xen change in a dedicated patch, i.e.
> > > is probably a sequence anyways.
> > >
> > > Alternatively, I could rewrite the Hyper-V code a la the KVM changes, but I'm not
> > > feeling very charitable at the moment (the complete lack of documentation for
> > > their ioctl doesn't help).
> >
> > Works for me. Michael is typically very responsive wrt hyperv (but you
> > probably know this).
>
> I can't be much help on this issue. This Hyper-V code is for Linux running in
> the root partition (i.e., "dom0") and I don't have a setup where I can run and
> test that configuration.
>
> Adding Nuno Das Neves from Microsoft for his thoughts.
A slightly more helpful, less ranty explanation of what's going on:
KVM's irqfd code, which was pretty copied verbatim for Hyper-V partitions, disallows
binding an eventfd to a single VM multiple times, but doesn't handle the scenario
where an eventfd is bound to multiple VMs, i.e. to multiple partitions. What's
particular "fun" about such a scenario is that WQ_FLAG_EXCLUSIVE+WQ_FLAG_PRIORITY
means only the first VM/partition that bound the eventfd will be notified.
For KVM-based setups, this is a legitimate concern because KVM supports intra-host
migration. E.g. to upgrade the userspace VMM, a guest can be "migrated" from the
old VMM's "struct kvm" instance to the new VMM's "struct kvm". If userspace mucks
up the migration, e.g. doesn't *unbind* the eventfd from the old VM(M) before
resuming the guest in the new VM(M), KVM will effectively drop virtual IRQs.
This is purely a hardening exercise, i.e. isn't required for correctness, assuming
userspace userspace is bug-free. The KVM patches surrounding this patch show how
I am planning on ensuring a 1:1 eventfd:VM binding.
To not block the KVM hardening on Hyper-V's eventfd usage, I am planning on making
this change in the next version of the series:
diff --git a/drivers/hv/mshv_eventfd.c b/drivers/hv/mshv_eventfd.c
index 8dd22be2ca0b..b348928871c2 100644
--- a/drivers/hv/mshv_eventfd.c
+++ b/drivers/hv/mshv_eventfd.c
@@ -368,6 +368,14 @@ static void mshv_irqfd_queue_proc(struct file *file, wait_queue_head_t *wqh,
container_of(polltbl, struct mshv_irqfd, irqfd_polltbl);
irqfd->irqfd_wqh = wqh;
+
+ /*
+ * TODO: Ensure there isn't already an exclusive, priority waiter, e.g.
+ * that the irqfd isn't already bound to another partition. Only the
+ * first exclusive waiter encountered will be notified, and
+ * add_wait_queue_priority() doesn't enforce exclusivity.
+ */
+ irqfd->irqfd_wait.flags |= WQ_FLAG_EXCLUSIVE;
add_wait_queue_priority(wqh, &irqfd->irqfd_wait);
}
next prev parent reply other threads:[~2025-05-21 15:05 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-19 18:55 [PATCH v2 00/12] KVM: Make irqfd registration globally unique Sean Christopherson
2025-05-19 18:55 ` [PATCH v2 01/12] KVM: Use a local struct to do the initial vfs_poll() on an irqfd Sean Christopherson
2025-05-19 18:55 ` [PATCH v2 02/12] KVM: Acquire SCRU lock outside of irqfds.lock during assignment Sean Christopherson
2025-05-19 18:55 ` [PATCH v2 03/12] KVM: Initialize irqfd waitqueue callback when adding to the queue Sean Christopherson
2025-05-19 18:55 ` [PATCH v2 04/12] KVM: Add irqfd to KVM's list via the vfs_poll() callback Sean Christopherson
2025-05-19 18:55 ` [PATCH v2 05/12] KVM: Add irqfd to eventfd's waitqueue while holding irqfds.lock Sean Christopherson
2025-05-19 18:55 ` [PATCH v2 06/12] sched/wait: Add a waitqueue helper for fully exclusive priority waiters Sean Christopherson
2025-05-20 19:17 ` Peter Zijlstra
2025-05-20 20:57 ` Sean Christopherson
2025-05-19 18:55 ` [PATCH v2 07/12] KVM: Disallow binding multiple irqfds to an eventfd with a priority waiter Sean Christopherson
2025-05-19 18:55 ` [PATCH v2 08/12] sched/wait: Drop WQ_FLAG_EXCLUSIVE from add_wait_queue_priority() Sean Christopherson
2025-05-20 19:18 ` Peter Zijlstra
2025-05-20 22:20 ` Sean Christopherson
2025-05-21 11:42 ` Peter Zijlstra
2025-05-21 14:44 ` Michael Kelley
2025-05-21 15:05 ` Sean Christopherson [this message]
2025-05-21 13:22 ` Jürgen Groß
2025-05-19 18:55 ` [PATCH v2 09/12] KVM: Drop sanity check that per-VM list of irqfds is unique Sean Christopherson
2025-05-19 18:55 ` [PATCH v2 10/12] KVM: selftests: Assert that eventfd() succeeds in Xen shinfo test Sean Christopherson
2025-05-19 18:55 ` [PATCH v2 11/12] KVM: selftests: Add utilities to create eventfds and do KVM_IRQFD Sean Christopherson
2025-05-19 18:55 ` [PATCH v2 12/12] KVM: selftests: Add a KVM_IRQFD test to verify uniqueness requirements Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aC3rpZChhtw4NODS@google.com \
--to=seanjc@google.com \
--cc=dmatlack@google.com \
--cc=jgross@suse.com \
--cc=juri.lelli@redhat.com \
--cc=kprateek.nayak@amd.com \
--cc=kvm@vger.kernel.org \
--cc=kvmarm@lists.linux.dev \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=maz@kernel.org \
--cc=mhklinux@outlook.com \
--cc=mingo@redhat.com \
--cc=nunodasneves@linux.microsoft.com \
--cc=oleksandr_tyshchenko@epam.com \
--cc=oliver.upton@linux.dev \
--cc=pbonzini@redhat.com \
--cc=peterz@infradead.org \
--cc=sstabellini@kernel.org \
--cc=vincent.guittot@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.