All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Kai Huang <kai.huang@intel.com>
Cc: "dwmw2@infradead.org" <dwmw2@infradead.org>,
	Jon Kohler <jon@nutanix.com>,
	 "khushit.shah@nutanix.com" <khushit.shah@nutanix.com>,
	"x86@kernel.org" <x86@kernel.org>, "bp@alien8.de" <bp@alien8.de>,
	 "hpa@zytor.com" <hpa@zytor.com>,
	"mingo@redhat.com" <mingo@redhat.com>,
	 "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	 "dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	 "stable@vger.kernel.org" <stable@vger.kernel.org>,
	 "shaju.abraham@nutanix.com" <shaju.abraham@nutanix.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	 "tglx@linutronix.de" <tglx@linutronix.de>
Subject: Re: [PATCH v6] KVM: x86: Add x2APIC "features" to control EOI broadcast suppression
Date: Wed, 28 Jan 2026 06:57:46 -0800	[thread overview]
Message-ID: <aXoj6szBMy6BSzYO@google.com> (raw)
In-Reply-To: <62f9bdb9f3c7f55db747a29c292fa632bb6ec749.camel@intel.com>

On Wed, Jan 28, 2026, Kai Huang wrote:
> On Tue, 2026-01-27 at 19:48 -0800, David Woodhouse wrote:
> > On Wed, 2026-01-28 at 02:22 +0000, Huang, Kai wrote:
> > >  
> > > > Ah, so userspace which checks all the kernel's capabilities *first*
> > > > will not see KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST advertised,
> > > > because it needs to enable KVM_CAP_SPLIT_IRQCHIP first?
> > > > 
> > > > I guess that's tolerable¹ but the documentation could make it clearer,
> > > > perhaps? I can see VMMs silently failing to detect the feature because
> > > > they just don't set split-irqchip before checking for it? 
> > > > 
> > > > 
> > > > ¹ although I still kind of hate it and would have preferred to have the
> > > >    I/O APIC patch; userspace still has to intentionally *enable* that
> > > >    combination. But OK, I've reluctantly conceded that.
> > > 
> > > To make it even more robust, perhaps we can grab kvm->lock mutex in
> > > kvm_vm_ioctl_enable_cap() for KVM_CAP_X2APIC_API, so that it won't race with
> > > KVM_CREATE_IRQCHIP (which already grabs kvm->lock) and
> > > KVM_CAP_SPLIT_IRQCHIP?
> > > 
> > > Even more, we can add additional check in KVM_CREATE_IRQCHIP to return -
> > > EINVAL when it sees kvm->arch.suppress_eoi_broadcast_mode is
> > > KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST?
> > 
> > If we do that, then the query for KVM_CAP_X2APIC_API could advertise
> > the KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST for a freshly created KVM,
> > even before userspace has enabled *either* KVM_CREATE_IRQCHIP nor
> > KVM_CAP_SPLIT_IRQCHIP?
> 
> No IIUC it doesn't change that?
> 
> The change I mentioned above is only related to "enable" part, but not
> "query" part.
> 
> The "query" is done via kvm_vm_ioctl_check_extension(KVM_CAP_X2APIC_API),
> and in this patch, it does:
> 
> @@ -4931,6 +4933,8 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long
> ext)
>  		break;
>  	case KVM_CAP_X2APIC_API:
>  		r = KVM_X2APIC_API_VALID_FLAGS;
> +		if (kvm && !irqchip_split(kvm))
> +			r &= ~KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST;
> 
> IIRC if this is called before KVM_CREATE_IRQCHIP and KVM_CAP_SPLIT_IRQCHIP,
> then !irqchip_split() will be true, so it will NOT advertise
> KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST.
> 
> If it is called after KVM_CAP_SPLIT_IRQCHIP, then it will advertise
> KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST.

Yep.  And when called at system-scope, i.e. with @kvm=NULL, userspace will see
the maximal support with KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST.

> Btw, it doesn't grab kvm->lock either, so theoretically it could race with
> KVM_CREATE_IRQCHIP and kvm_vm_ioctl_enable_cap(KVM_CAP_SPLIT_IRQCHIP) too.

That's totally fine.

> > That would be slightly better than the existing proposed awfulness
> > where the kernel doesn't *admit* to having the _ENABLE_ capability
> > until userspace first enables the KVM_CAP_SPLIT_IRQCHIP.
> 
> We could also make kvm_vm_ioctl_check_extension(KVM_CAP_X2APIC_API) to
> _always_ advertise KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST if that's
> better.

No, because then we'd need new uAPI if we add support for ENABLE_SUPPRESS_EOI_BROADCAST
with an in-kernel I/O APIC.

> I suppose what we need is to document such behaviour -- that albeit 
> KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST is advertise as supposed, but it
> cannot be enabled together with KVM_CREATE_IRQCHIP -- one will fail
> depending on which is called first.

No, we don't need to explicitly document this, because it's super duper basic
multi-threaded programming.  KVM only needs to documented that
KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST requires a VM with KVM_CAP_SPLIT_IRQCHIP.

> As a bonus, it can get rid of "calling irqchip_split() w/o holding kvm-
> >lock" awfulness too.

No, it's not awfulness.  It's userspace's responsibility to not be stupid.  KVM
taking kvm->lock changes *nothing*.  All holding kvm->lock does is serialize KVM
code, it doesn't prevent a race.  I.e. it just changes whether tasks are racing
to acquire kvm->lock versus racing against irqchip_mode.

If userspace invokes KVM_CAP_SPLIT_IRQCHIP and KVM_ENABLE_CAP concurrently on two
separate tasks, then KVM_ENABLE_CAP will fail ~50% of the time regardless of
whether or not KVM takes kvm->lock.

CPU0                         CPU1

1. Locked Failure
----------------------------------------------------
                             lock(kvm->lock)
			     KVM_ENABLE_CAP = EINVAL
			     unlock(kvm->lock)
lock(kvm->lock)
KVM_CAP_SPLIT_IRQCHIP = 0
unlock(kvm->lock)

1. Locked Success
----------------------------------------------------
lock(kvm->lock)
KVM_CAP_SPLIT_IRQCHIP = 0
unlock(kvm->lock)
                             lock(kvm->lock)
			     KVM_ENABLE_CAP = 0
			     unlock(kvm->lock)

3. Lockless Failure
----------------------------------------------------
			     KVM_ENABLE_CAP = EINVAL
KVM_CAP_SPLIT_IRQCHIP = 0


4. Lockless Success
----------------------------------------------------
CPU0                         CPU1
KVM_CAP_SPLIT_IRQCHIP = 0
			     KVM_ENABLE_CAP = 0

  reply	other threads:[~2026-01-28 14:57 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-23 12:56 [PATCH v6] KVM: x86: Add x2APIC "features" to control EOI broadcast suppression Khushit Shah
2026-01-27  2:21 ` Khushit Shah
2026-01-27  2:41   ` Khushit Shah
2026-01-27 21:09 ` David Woodhouse
2026-01-27 21:49   ` Sean Christopherson
2026-01-27 22:36     ` David Woodhouse
2026-01-28  2:22       ` Huang, Kai
2026-01-28  3:48         ` David Woodhouse
     [not found]           ` <SA2PR02MB756478359EE9185285ACE6158891A@SA2PR02MB7564.namprd02.prod.outlook.com>
2026-01-28  5:17             ` Khushit Shah
2026-01-28  5:32               ` David Woodhouse
2026-01-28  6:40               ` Huang, Kai
2026-01-28 15:04               ` Sean Christopherson
2026-01-28  6:15           ` Huang, Kai
2026-01-28 14:57             ` Sean Christopherson [this message]
2026-01-28 21:10               ` Huang, Kai
2026-01-28 14:44       ` Sean Christopherson
2026-02-04  0:10 ` Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aXoj6szBMy6BSzYO@google.com \
    --to=seanjc@google.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=dwmw2@infradead.org \
    --cc=hpa@zytor.com \
    --cc=jon@nutanix.com \
    --cc=kai.huang@intel.com \
    --cc=khushit.shah@nutanix.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=shaju.abraham@nutanix.com \
    --cc=stable@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.