in-kernel interrupt controller steering


* in-kernel interrupt controller steering
@ 2013-03-04 22:20 Alexander Graf
  2013-03-05  0:59 ` Scott Wood
                   ` (2 more replies)
  0 siblings, 3 replies; 47+ messages in thread
From: Alexander Graf @ 2013-03-04 22:20 UTC (permalink / raw)
  To: Overall list:Overall
  Cc: kvm-ppc, Gleb Natapov, Stuart Yoder, Scott Wood, Paul Mackerras,
	Peter Maydell

Howdy,

We just sat down to discuss the proposed XICS and MPIC interfaces and how we can take bits of each to create an interface that works for everyone. It feels like we came to some conclusions, some of which we had already reached earlier but forgot in between :).

I hope I didn't forget too many pieces. Scott, Paul and Stuart, please add whatever you find missing in here.

Alex

1) We need to set the generic interrupt type of the system before we create vcpus.

This is a new ioctl that sets the overall system interrupt controller type to a specific model. It is used so that when we create vcpus, we can create the appended "local interrupt controller" state without the actual interrupt controller device being available yet. It is also used later to switch between interrupt controller implementations.

This interrupt type is write-once and is frozen once the first vcpu has been created.
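
A minimal sketch of the write-once rule, assuming a toy per-VM structure rather than the real struct kvm. The names toy_vm, toy_set_irq_arch, toy_create_vcpu and the IRQ_ARCH_* constants are hypothetical, and the specific error codes are only one plausible choice:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model constants; the thread does not define any. */
enum irq_arch { IRQ_ARCH_UNSET = 0, IRQ_ARCH_MPIC, IRQ_ARCH_XICS };

/* Toy stand-in for per-VM state, not the real struct kvm. */
struct toy_vm {
    enum irq_arch irq_arch;
    int nr_vcpus;
};

/* Select the system interrupt controller model: once, and before any vcpu. */
static int toy_set_irq_arch(struct toy_vm *vm, enum irq_arch arch)
{
    assert(vm != NULL);
    if (arch == IRQ_ARCH_UNSET)
        return -EINVAL;
    if (vm->nr_vcpus > 0)
        return -EBUSY;   /* frozen once the first vcpu exists */
    if (vm->irq_arch != IRQ_ARCH_UNSET)
        return -EEXIST;  /* write-once */
    vm->irq_arch = arch;
    return 0;
}

/* Create a vcpu and return its index. */
static int toy_create_vcpu(struct toy_vm *vm)
{
    assert(vm != NULL);
    /* The per-vcpu "local interrupt controller" state would be set up
     * here, based on vm->irq_arch. */
    return vm->nr_vcpus++;
}
```

In this sketch a second toy_set_irq_arch() call fails with -EEXIST, and any call made after toy_create_vcpu() fails with -EBUSY.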

2) Interrupt controllers (XICS / MPIC) get created by the device create API

Getting and setting state of an interrupt controller also happens through this. Getting and setting state from vcpus happens through ONE_REG. Injecting interrupts happens through the normal irqchip ioctl (we probably need to encode the target device id in there somehow).

This fits in nicely with a model where the interrupt controller is a proper QOM device in QEMU, since we can create it long after vcpus have been created.

3) We open code interrupt controller distinction

There is no need for function pointers. We just switch() based on the type that gets set in the initial ioctl to determine which code to call. The retrieval of the irq type happens through a static inline function in a header that can return a constant number for configurations that don't support multiple in-kernel irqchips.
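
A minimal sketch of what the open-coded dispatch could look like. Everything here is hypothetical: the toy_vm structure, the TOY_ONLY_MPIC build-time knob and the toy_*_deliver() stubs exist only to show the shape of the switch():

```c
#include <assert.h>
#include <stddef.h>

enum irq_arch { IRQ_ARCH_NONE = 0, IRQ_ARCH_MPIC, IRQ_ARCH_XICS };

/* Toy stand-in for per-VM state, not the real struct kvm. */
struct toy_vm {
    enum irq_arch irq_arch;
};

/*
 * Accessor that would live in a header. With the hypothetical
 * TOY_ONLY_MPIC knob defined, it returns a constant, so the compiler
 * can drop the other switch cases.
 */
static inline enum irq_arch toy_irq_arch(const struct toy_vm *vm)
{
#ifdef TOY_ONLY_MPIC
    (void)vm;
    return IRQ_ARCH_MPIC;
#else
    return vm->irq_arch;
#endif
}

/* Stub back ends; the return values only make the chosen path visible. */
static int toy_mpic_deliver(int irq) { return 1000 + irq; }
static int toy_xics_deliver(int irq) { return 2000 + irq; }

/* Open-coded dispatch: a switch() instead of function pointers. */
static int toy_deliver_irq(const struct toy_vm *vm, int irq)
{
    assert(vm != NULL);
    switch (toy_irq_arch(vm)) {
    case IRQ_ARCH_MPIC:
        return toy_mpic_deliver(irq);
    case IRQ_ARCH_XICS:
        return toy_xics_deliver(irq);
    default:
        return -1; /* no in-kernel irqchip selected */
    }
}
```

The point of the accessor is that callers never read the field directly, so a single-irqchip build pays nothing for the dispatch.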

4) The device attribute API has separate groups that target different use cases

Paul needs live migration, so he will implement device attributes that enable him to do live migration.
Scott doesn't implement live migration, so his MPIC attribute groups are solely for debugging purposes today.

5) There is no need for atomic device control accessors today.

Live migration happens with vcpus stopped, so we don't need to be atomic in the kernel <-> user space interface.

6) The device attribute API will keep read and write (get / set) accessors.

There is no specific need for a generic "command" ioctl.
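
A minimal sketch of a get/set-only attribute interface, assuming a toy device with two attribute groups. The toy_attr layout, the group numbers and the function names are hypothetical and are not taken from the device control API patch:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_GROUPS 2   /* e.g. one for migration state, one for debug */
#define TOY_NR_ATTRS  4

/* Hypothetical attribute descriptor: which value, and where user data lives. */
struct toy_attr {
    uint32_t group;
    uint32_t attr;
    uint64_t *addr;
};

/* Toy device: one 64-bit value per (group, attr) pair. */
struct toy_irqchip {
    uint64_t regs[TOY_NR_GROUPS][TOY_NR_ATTRS];
};

/* Resolve an attribute to its storage slot, or NULL if out of range. */
static uint64_t *toy_slot(struct toy_irqchip *dev, const struct toy_attr *a)
{
    assert(dev != NULL && a != NULL);
    if (a->group >= TOY_NR_GROUPS || a->attr >= TOY_NR_ATTRS)
        return NULL;
    return &dev->regs[a->group][a->attr];
}

/* "set" accessor: copy the caller's value into the device. */
static int toy_set_attr(struct toy_irqchip *dev, const struct toy_attr *a)
{
    uint64_t *slot = toy_slot(dev, a);

    if (slot == NULL || a->addr == NULL)
        return -EINVAL;
    *slot = *a->addr;
    return 0;
}

/* "get" accessor: copy the device's value out to the caller. */
static int toy_get_attr(struct toy_irqchip *dev, const struct toy_attr *a)
{
    uint64_t *slot = toy_slot(dev, a);

    if (slot == NULL || a->addr == NULL)
        return -EINVAL;
    *a->addr = *slot;
    return 0;
}
```

With only these two accessors, anything that needs a combined read-and-write in one call would have to wait for a separate command interface.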

7) Interrupt line connections to vcpus are implicit

We don't explicitly mark which in-kernel irqchip interrupt line goes to which vcpu. This is done implicitly. If we see a need for it, we create a new irqchip device type that allows us to explicitly configure vcpu connections.


* Re: in-kernel interrupt controller steering
  2013-03-04 22:20 in-kernel interrupt controller steering Alexander Graf
@ 2013-03-05  0:59 ` Scott Wood
  2013-03-05  5:44   ` Paul Mackerras
  2013-03-05 15:25 ` Gleb Natapov
  2013-03-06  0:23 ` Benjamin Herrenschmidt
  2 siblings, 1 reply; 47+ messages in thread
From: Scott Wood @ 2013-03-05  0:59 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Overall list:Overall, kvm-ppc, Gleb Natapov, Stuart Yoder,
	Paul Mackerras, Peter Maydell

On 03/04/2013 04:20:47 PM, Alexander Graf wrote:
> Howdy,
> 
> We just sat down to discuss the proposed XICS and MPIC interfaces and  
> how we can take bits of each and create an interface that works for  
> everyone. In this, it feels like we came to some conclusions. Some of  
> which we already reached earlier, but forgot in between :).
> 
> I hope I didn't forget too many pieces. Scott, Paul and Stuart,  
> please add whatever you find missing in here.

It looks about right.

> 1) We need to set the generic interrupt type of the system before we  
> create vcpus.
> 
> This is a new ioctl that sets the overall system interrupt controller  
> type to a specific model. This used so that when we create vcpus, we  
> can create the appended "local interrupt controller" state without  
> the actual interrupt controller device available yet. It is also used  
> later to switch between interrupt controller implementations.
> 
> This interrupt type is write once and frozen after the first vcpu got  
> created.

Who is going to write up this patch?

> 2) Interrupt controllers (XICS / MPIC) get created by the device  
> create api
> 
> Getting and setting state of an interrupt controller also happens  
> through this. Getting and setting state from vcpus happens through  
> ONE_REG. Injecting interrupt happens through the normal irqchip ioctl  
> (we probably need to encode the target device id in there somehow).
> 
> This fits in nicely with a model where the interrupt controller is a  
> proper QOM device in QEMU, since we can create it long after vcpus  
> have been created.
> 
> 
> 3) We open code interrupt controller distinction
> 
> There is no need for function pointers. We just switch() based on the  
> type that gets set in the initial ioctl to determine which code to  
> call. The retrieval of the irq type happens through a static inline  
> function in a header that can return a constant number for  
> configurations that don't support multiple in-kernel irqchips.
> 
> 
> 4) The device attribute API has separate groups that target different  
> use cases
> 
> Paul needs live migration, so he will implement device attributes  
> that enable him to do live migration.
> Scott doesn't implement live migration, so his MPIC attribute groups  
> are solely for debugging purposes today.
> 
> 
> 5) There is no need for atomic device control accessors today.
> 
> Live migration happens with vcpus stopped, so we don't need to be  
> atomic in the kernel <-> user space interface.
> 
> 
> 6) The device attribute API will keep read and write (get / set)  
> accessors.
> 
> There is no specific need for a generic "command" ioctl.

Gleb, is this OK?  A bidirectional command accessor could be added  
later if a need arises.

Will attributes still be renamed to "commands", even if the get/set  
approach is retained?

> 7) Interrupt line connections to vcpus are implicit
> 
> We don't explicitly mark which in-kernel irqchip interrupt line goes  
> to which vcpu. This is done implicitly. If we see a need for it, we  
> create a new irqchip device type that allows us to explicitly  
> configure vcpu connections.

Are there any changes needed to the device control api patch (just  
patch 1/6, not the rest of the patchset), besides Christoffer's request  
to tone down one of the comments, and whatever the response is to the  
questions in #6?

Should we add a "size" field in kvm_device, both for error checking and  
to assist tools such as strace?

-Scott


* Re: in-kernel interrupt controller steering
  2013-03-05  0:59 ` Scott Wood
@ 2013-03-05  5:44   ` Paul Mackerras
  0 siblings, 0 replies; 47+ messages in thread
From: Paul Mackerras @ 2013-03-05  5:44 UTC (permalink / raw)
  To: Scott Wood
  Cc: Alexander Graf, Overall list:Overall, kvm-ppc, Gleb Natapov,
	Stuart Yoder, Peter Maydell

On Mon, Mar 04, 2013 at 06:59:16PM -0600, Scott Wood wrote:
> On 03/04/2013 04:20:47 PM, Alexander Graf wrote:
> >1) We need to set the generic interrupt type of the system before
> >we create vcpus.
> >
> >This is a new ioctl that sets the overall system interrupt
> >controller type to a specific model. This used so that when we
> >create vcpus, we can create the appended "local interrupt
> >controller" state without the actual interrupt controller device
> >available yet. It is also used later to switch between interrupt
> >controller implementations.
> >
> >This interrupt type is write once and frozen after the first vcpu
> >got created.
> 
> Who is going to write up this patch?

I'll have a stab at it.

Paul.


* Re: in-kernel interrupt controller steering
  2013-03-04 22:20 in-kernel interrupt controller steering Alexander Graf
  2013-03-05  0:59 ` Scott Wood
@ 2013-03-05 15:25 ` Gleb Natapov
  2013-03-06  9:40   ` Paolo Bonzini
  2013-03-06  0:23 ` Benjamin Herrenschmidt
  2 siblings, 1 reply; 47+ messages in thread
From: Gleb Natapov @ 2013-03-05 15:25 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Overall list:Overall, kvm-ppc, Stuart Yoder, Scott Wood,
	Paul Mackerras, Peter Maydell

On Mon, Mar 04, 2013 at 11:20:47PM +0100, Alexander Graf wrote:
> Howdy,
> 
> We just sat down to discuss the proposed XICS and MPIC interfaces and how we can take bits of each and create an interface that works for everyone. In this, it feels like we came to some conclusions. Some of which we already reached earlier, but forgot in between :).
> 
> I hope I didn't forget too many pieces. Scott, Paul and Stuart, please add whatever you find missing in here.
> 
> 
> Alex
> 
Great! Thank you guys for collaborating on this.

> 
> 1) We need to set the generic interrupt type of the system before we create vcpus.
> 
> This is a new ioctl that sets the overall system interrupt controller type to a specific model. This used so that when we create vcpus, we can create the appended "local interrupt controller" state without the actual interrupt controller device available yet. It is also used later to switch between interrupt controller implementations.
> 
> This interrupt type is write once and frozen after the first vcpu got created.
>
Why is an explicit ioctl needed? Why not require a specific irqchip to be
created before the first vcpu? The device created determines the system
interrupt controller type.

> 
> 2) Interrupt controllers (XICS / MPIC) get created by the device create api
> 
> Getting and setting state of an interrupt controller also happens through this. Getting and setting state from vcpus happens through ONE_REG. Injecting interrupt happens through the normal irqchip ioctl (we probably need to encode the target device id in there somehow).
> 
Sounds fine. MSI goes through KVM_SIGNAL_MSI?

> This fits in nicely with a model where the interrupt controller is a proper QOM device in QEMU, since we can create it long after vcpus have been created.
> 
> 
> 3) We open code interrupt controller distinction
> 
> There is no need for function pointers. We just switch() based on the type that gets set in the initial ioctl to determine which code to call. The retrieval of the irq type happens through a static inline function in a header that can return a constant number for configurations that don't support multiple in-kernel irqchips.
> 
That's internal implementation detail, so less important to set in stone.

> 
> 4) The device attribute API has separate groups that target different use cases
> 
> Paul needs live migration, so he will implement device attributes that enable him to do live migration.
> Scott doesn't implement live migration, so his MPIC attribute groups are solely for debugging purposes today.
> 
What's the difference? The only difference I see is that for migration
you need to make all internal state accessible, while for debug this is
not necessary. But since the proposed API accesses each bit of state one
at a time, the debug interface should be extensible to become a migration
interface just by adding accessible state, no?

> 
> 5) There is no need for atomic device control accessors today.
> 
> Live migration happens with vcpus stopped, so we don't need to be atomic in the kernel <-> user space interface.
> 
Do you mean control that retrieves the whole device state in one ioctl
call? Yes, we do not need it.

> 
> 6) The device attribute API will keep read and write (get / set) accessors.
> 
> There is no specific need for a generic "command" ioctl.
That depends on how people will use get/set accessors :)
Since for interrupt injection normal irqchip ioctl will be used we can
probably skip adding "command" ioctl now.

> 
> 
> 7) Interrupt line connections to vcpus are implicit
> 
> We don't explicitly mark which in-kernel irqchip interrupt line goes to which vcpu. This is done implicitly. If we see a need for it, we create a new irqchip device type that allows us to explicitly configure vcpu connections.
OK.

--
			Gleb.


* Re: in-kernel interrupt controller steering
  2013-03-04 22:20 in-kernel interrupt controller steering Alexander Graf
  2013-03-05  0:59 ` Scott Wood
  2013-03-05 15:25 ` Gleb Natapov
@ 2013-03-06  0:23 ` Benjamin Herrenschmidt
  2013-03-06  0:33   ` Alexander Graf
  2 siblings, 1 reply; 47+ messages in thread
From: Benjamin Herrenschmidt @ 2013-03-06  0:23 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Overall list:Overall, kvm-ppc, Gleb Natapov, Stuart Yoder,
	Scott Wood, Paul Mackerras, Peter Maydell

On Mon, 2013-03-04 at 23:20 +0100, Alexander Graf wrote:
> 7) Interrupt line connections to vcpus are implicit
> 
> We don't explicitly mark which in-kernel irqchip interrupt line goes
> to which vcpu. This is done implicitly. If we see a need for it, we
> create a new irqchip device type that allows us to explicitly
> configure vcpu connections.

I don't understand that one. Which vcpu an irq source is connected to
is an intrinsic part of the configuration of that interrupt line on XICS
and MPIC, and probably on any other controller, and it is set explicitly
by the guest using MMIO or hcalls ...

Cheers,
Ben.


* Re: in-kernel interrupt controller steering
  2013-03-06  0:23 ` Benjamin Herrenschmidt
@ 2013-03-06  0:33   ` Alexander Graf
  0 siblings, 0 replies; 47+ messages in thread
From: Alexander Graf @ 2013-03-06  0:33 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Overall list:Overall, kvm-ppc, Gleb Natapov, Stuart Yoder,
	Scott Wood, Paul Mackerras, Peter Maydell


On 06.03.2013, at 01:23, Benjamin Herrenschmidt wrote:

> On Mon, 2013-03-04 at 23:20 +0100, Alexander Graf wrote:
>> 7) Interrupt line connections to vcpus are implicit
>> 
>> We don't explicitly mark which in-kernel irqchip interrupt line goes
>> to which vcpu. This is done implicitly. If we see a need for it, we
>> create a new irqchip device type that allows us to explicitly
>> configure vcpu connections.
> 
> I don't understand that one. The connection from irq source goes to
> which vcpu is an intrinsic part of the configuration of that interrupt
> line on XICS and MPIC and probably any other, which is set explicitly by
> the guest using MMIO or hcalls ...

Yes, I was referring to the other side of the connection. You really only have a pool of vcpus. But your interrupt controller needs to know "CPU 0 is the one over there". That connection is implicit by vcpu id today, but we could make it user space settable later.
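
A minimal sketch of that implicit mapping, assuming a toy vcpu pool. The structures and the toy_irqchip_cpu_to_vcpu name are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Toy vcpu: only the id matters for this sketch. */
struct toy_vcpu {
    int vcpu_id;
};

/* Toy VM: an unordered pool of vcpus. */
struct toy_vm {
    struct toy_vcpu *vcpus;
    size_t nr_vcpus;
};

/*
 * Implicit connection: the interrupt controller's "CPU n" is simply the
 * vcpu whose vcpu_id is n. No explicit wiring is configured anywhere.
 */
static struct toy_vcpu *toy_irqchip_cpu_to_vcpu(const struct toy_vm *vm, int cpu)
{
    size_t i;

    assert(vm != NULL);
    for (i = 0; i < vm->nr_vcpus; i++) {
        if (vm->vcpus[i].vcpu_id == cpu)
            return &vm->vcpus[i];
    }
    return NULL; /* no vcpu with that id has been created */
}
```

Making the connection user space settable would amount to replacing the vcpu_id comparison with a lookup in a table that user space fills in.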


Alex


* Re: in-kernel interrupt controller steering
  2013-03-05 15:25 ` Gleb Natapov
@ 2013-03-06  9:40   ` Paolo Bonzini
  2013-03-06  9:58     ` Gleb Natapov
  0 siblings, 1 reply; 47+ messages in thread
From: Paolo Bonzini @ 2013-03-06  9:40 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Alexander Graf, kvm@vger.kernel.org, kvm-ppc, Stuart Yoder,
	Scott Wood, Paul Mackerras, Peter Maydell

On 05/03/2013 16:25, Gleb Natapov wrote:
>> 1) We need to set the generic interrupt type of the system before we create vcpus.
>>
>> This is a new ioctl that sets the overall system interrupt controller type to a specific model. This used so that when we create vcpus, we can create the appended "local interrupt controller" state without the actual interrupt controller device available yet. It is also used later to switch between interrupt controller implementations.
>>
>> This interrupt type is write once and frozen after the first vcpu got created.
>
> Why explicit ioctl is needed? Why not require specific irqchip to be
> created before first vcpu. The device created determines system interrupt
> controller type.

QEMU creates CPUs before devices, and CPUs need to know what kind of
local interrupt controller to create.  Similar to how in-kernel LAPIC
state is created long before the userspace device that proxies the LAPIC.

I think the above design makes sense.  The alternative would be to
entirely separate the creation of CPUs and devices.  You could even
support heterogeneous systems with some in-kernel irqchips and some
userspace irqchips; sounds cool, but useless too.

>> 7) Interrupt line connections to vcpus are implicit
>>
>> We don't explicitly mark which in-kernel irqchip interrupt line
>> goes to which vcpu. This is done implicitly. If we see a need for it, we
>> create a new irqchip device type that allows us to explicitly configure
>> vcpu connections.
> 
> OK.

Paolo


* Re: in-kernel interrupt controller steering
  2013-03-06  9:40   ` Paolo Bonzini
@ 2013-03-06  9:58     ` Gleb Natapov
  2013-03-06 10:04       ` Alexander Graf
                         ` (2 more replies)
  0 siblings, 3 replies; 47+ messages in thread
From: Gleb Natapov @ 2013-03-06  9:58 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Alexander Graf, kvm@vger.kernel.org, kvm-ppc, Stuart Yoder,
	Scott Wood, Paul Mackerras, Peter Maydell

On Wed, Mar 06, 2013 at 10:40:18AM +0100, Paolo Bonzini wrote:
> On 05/03/2013 16:25, Gleb Natapov wrote:
> >> 1) We need to set the generic interrupt type of the system before we create vcpus.
> >>
> >> This is a new ioctl that sets the overall system interrupt controller type to a specific model. This used so that when we create vcpus, we can create the appended "local interrupt controller" state without the actual interrupt controller device available yet. It is also used later to switch between interrupt controller implementations.
> >>
> >> This interrupt type is write once and frozen after the first vcpu got created.
> >
> > Why explicit ioctl is needed? Why not require specific irqchip to be
> > created before first vcpu. The device created determines system interrupt
> > controller type.
> 
> QEMU creates CPUs before devices, and CPUs need to know what kind of
> local interrupt controller to create.  Similar to how in-kernel LAPIC
> state is created long before the userspace device that proxies the LAPIC.
> 
So what is the difference between calling this special ioctl before
creating vcpus, and calling the create device ioctl instead and creating
the QEMU proxy device at whatever point in time QEMU wants to create it?

> I think the above design makes sense.  The alternative would be to
> entirely separate the creation of CPUs and devices.  You could even
> support heterogeneous systems with some in-kernel irqchips and some
> userspace irqchips; sounds cool, but useless too.
Not so useless. It may make sense for x86 to emulate only ioapic in
kernel and leave PIC in userspace.

--
			Gleb.


* Re: in-kernel interrupt controller steering
  2013-03-06  9:58     ` Gleb Natapov
@ 2013-03-06 10:04       ` Alexander Graf
  2013-03-06 10:12         ` Gleb Natapov
  2013-03-06 10:38       ` Paolo Bonzini
  2013-03-06 10:38       ` Paolo Bonzini
  2 siblings, 1 reply; 47+ messages in thread
From: Alexander Graf @ 2013-03-06 10:04 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Paolo Bonzini, kvm@vger.kernel.org, kvm-ppc@vger.kernel.org,
	Stuart Yoder, Scott Wood, Paul Mackerras, Peter Maydell



On 06.03.2013 at 10:58, Gleb Natapov <gleb@redhat.com> wrote:

> On Wed, Mar 06, 2013 at 10:40:18AM +0100, Paolo Bonzini wrote:
>> On 05/03/2013 16:25, Gleb Natapov wrote:
>>>> 1) We need to set the generic interrupt type of the system before we create vcpus.
>>>> 
>>>> This is a new ioctl that sets the overall system interrupt controller type to a specific model. This used so that when we create vcpus, we can create the appended "local interrupt controller" state without the actual interrupt controller device available yet. It is also used later to switch between interrupt controller implementations.
>>>> 
>>>> This interrupt type is write once and frozen after the first vcpu got created.
>>> 
>>> Why explicit ioctl is needed? Why not require specific irqchip to be
>>> created before first vcpu. The device created determines system interrupt
>>> controller type.
>> 
>> QEMU creates CPUs before devices, and CPUs need to know what kind of
>> local interrupt controller to create.  Similar to how in-kernel LAPIC
>> state is created long before the userspace device that proxies the LAPIC.
> So what is the difference between calling this special ioctl before
> creating vcpus and calling create device ioctl instead and create QEMU
> proxy device at whatever point in time QEMU wants to create it?

I don't understand the question really. What proxy device?


Alex



* Re: in-kernel interrupt controller steering
  2013-03-06 10:04       ` Alexander Graf
@ 2013-03-06 10:12         ` Gleb Natapov
  0 siblings, 0 replies; 47+ messages in thread
From: Gleb Natapov @ 2013-03-06 10:12 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Paolo Bonzini, kvm@vger.kernel.org, kvm-ppc@vger.kernel.org,
	Stuart Yoder, Scott Wood, Paul Mackerras, Peter Maydell

On Wed, Mar 06, 2013 at 11:04:21AM +0100, Alexander Graf wrote:
> 
> 
> On 06.03.2013 at 10:58, Gleb Natapov <gleb@redhat.com> wrote:
> 
> > On Wed, Mar 06, 2013 at 10:40:18AM +0100, Paolo Bonzini wrote:
> >> On 05/03/2013 16:25, Gleb Natapov wrote:
> >>>> 1) We need to set the generic interrupt type of the system before we create vcpus.
> >>>> 
> >>>> This is a new ioctl that sets the overall system interrupt controller type to a specific model. This used so that when we create vcpus, we can create the appended "local interrupt controller" state without the actual interrupt controller device available yet. It is also used later to switch between interrupt controller implementations.
> >>>> 
> >>>> This interrupt type is write once and frozen after the first vcpu got created.
> >>> 
> >>> Why explicit ioctl is needed? Why not require specific irqchip to be
> >>> created before first vcpu. The device created determines system interrupt
> >>> controller type.
> >> 
> >> QEMU creates CPUs before devices, and CPUs need to know what kind of
> >> local interrupt controller to create.  Similar to how in-kernel LAPIC
> >> state is created long before the userspace device that proxies the LAPIC.
> > So what is the difference between calling this special ioctl before
> > creating vcpus and calling create device ioctl instead and create QEMU
> > proxy device at whatever point in time QEMU wants to create it?
> 
> I don't understand the question really. What proxy device?
> 
That's what Paolo called the QEMU part of the kernel irqchip device.

So the question is this. You propose a special ioctl to set the "irqchip
architecture". Let's call it SET_IRQCHIP_ARCH. QEMU is supposed to do this:

ioctl(SET_IRQCHIP_ARCH, MPIC)
create_vcpus()
create_devices()
  create_irqchips()
    ioctl(CREATE_DEVICE, MPIC)
    ioctl(SET_ATTR, attr1)
    ioctl(SET_ATTR, attr2)
 

Why can't it do:
ioctl(CREATE_DEVICE, MPIC)
create_vcpus()
  create_irqchips()
   ioctl(SET_ATTR, attr1)
   ioctl(SET_ATTR, attr2)

The question is rhetorical though, because I know it can; it does that
for x86. So the real question is: what are the disadvantages that warrant
a separate ioctl?

--
			Gleb.


* Re: in-kernel interrupt controller steering
  2013-03-06  9:58     ` Gleb Natapov
  2013-03-06 10:04       ` Alexander Graf
@ 2013-03-06 10:38       ` Paolo Bonzini
  2013-03-06 10:38       ` Paolo Bonzini
  2 siblings, 0 replies; 47+ messages in thread
From: Paolo Bonzini @ 2013-03-06 10:38 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Alexander Graf, kvm, kvm-ppc, Stuart Yoder, Scott Wood,
	Paul Mackerras, Peter Maydell



----- Original message -----
> From: "Gleb Natapov" <gleb@redhat.com>
> To: "Paolo Bonzini" <pbonzini@redhat.com>
> Cc: "Alexander Graf" <agraf@suse.de>, kvm@vger.kernel.org, kvm-ppc@vger.kernel.org, "Stuart Yoder"
> <stuart.yoder@freescale.com>, "Scott Wood" <scottwood@freescale.com>, "Paul Mackerras" <paulus@samba.org>, "Peter
> Maydell" <peter.maydell@linaro.org>
> Sent: Wednesday, 6 March 2013 10:58:35
> Subject: Re: in-kernel interrupt controller steering
> 
> On Wed, Mar 06, 2013 at 10:40:18AM +0100, Paolo Bonzini wrote:
> > On 05/03/2013 16:25, Gleb Natapov wrote:
> > >> 1) We need to set the generic interrupt type of the system
> > >> before we create vcpus.
> > >>
> > >> This is a new ioctl that sets the overall system interrupt
> > >> controller type to a specific model. This used so that when we
> > >> create vcpus, we can create the appended "local interrupt
> > >> controller" state without the actual interrupt controller
> > >> device available yet. It is also used later to switch between
> > >> interrupt controller implementations.
> > >>
> > >> This interrupt type is write once and frozen after the first
> > >> vcpu got created.
> > >
> > > Why explicit ioctl is needed? Why not require specific irqchip to
> > > be
> > > created before first vcpu. The device created determines system
> > > interrupt
> > > controller type.
> > 
> > QEMU creates CPUs before devices, and CPUs need to know what kind of
> > local interrupt controller to create.  Similar to how in-kernel LAPIC
> > state is created long before the userspace device that proxies the
> > LAPIC.
>
> So what is the difference between calling this special ioctl before
> creating vcpus and calling create device ioctl instead and create
> QEMU proxy device at whatever point in time QEMU wants to create it?

Because you'd have to stash the handle that KVM_CREATE_DEVICE returns
somewhere, waiting for the QEMU device to be created.

Perhaps it's just a problem of naming, and KVM_CREATE_DEVICE is simply
not the right name for the interface.  Once both KVM_CREATE_IRQCHIP_ARGS
and KVM_CREATE_DEVICE are added, it really will not create the device anymore.
Devices will be created by KVM_CREATE_IRQCHIP_ARGS, and possibly by
KVM_CREATE_VCPU.  KVM_CREATE_DEVICE is really only returning an id.

So we can have this instead:
- KVM_CREATE_IRQCHIP_ARGS becomes KVM_SET_IRQCHIP_TYPE (and "none"
can be a valid irqchip type).

- KVM_CREATE_DEVICE becomes KVM_GET_IRQCHIP_DEVICE, and you pass it a
device type and possibly a VCPU number.

It's mostly about names, but one important property is that
KVM_GET_IRQCHIP_DEVICE can be called at any time and, in fact,
multiple times.  Gleb, do you like this more?
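
A minimal sketch of how the two renamed calls could behave, assuming toy state and hypothetical function names (the ioctl names above are only proposals in this thread). The optional VCPU number argument is left out:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum toy_irqchip_type { TOY_IRQCHIP_NONE = 0, TOY_IRQCHIP_MPIC, TOY_IRQCHIP_XICS };

/* Toy stand-in for per-VM state, not the real struct kvm. */
struct toy_vm {
    enum toy_irqchip_type type;
    int type_set;     /* nonzero once the irqchip type has been chosen */
    int irqchip_id;   /* id of the in-kernel device, or -1 if there is none */
};

/* Models the proposed KVM_SET_IRQCHIP_TYPE: this is where the device is created. */
static int toy_set_irqchip_type(struct toy_vm *vm, enum toy_irqchip_type type)
{
    assert(vm != NULL);
    if (vm->type_set)
        return -EEXIST;
    vm->type = type;
    vm->type_set = 1;
    vm->irqchip_id = (type == TOY_IRQCHIP_NONE) ? -1 : 1; /* arbitrary id */
    return 0;
}

/*
 * Models the proposed KVM_GET_IRQCHIP_DEVICE: it creates nothing and only
 * returns the id, so it can be called at any time and repeatedly.
 */
static int toy_get_irqchip_device(const struct toy_vm *vm, enum toy_irqchip_type type)
{
    assert(vm != NULL);
    if (!vm->type_set || type == TOY_IRQCHIP_NONE || type != vm->type)
        return -ENODEV;
    return vm->irqchip_id;
}
```

Because the lookup has no side effects, user space does not need to stash a handle between vcpu creation and device creation; it can ask again when the QEMU device is realized.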

Paolo


* Re: in-kernel interrupt controller steering
  2013-03-06 10:38       ` Paolo Bonzini
@ 2013-03-06 11:26         ` Gleb Natapov
  2013-03-06 11:44           ` Paolo Bonzini
  2013-03-06 11:44           ` Alexander Graf
  0 siblings, 2 replies; 47+ messages in thread
From: Gleb Natapov @ 2013-03-06 11:26 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Alexander Graf, kvm, kvm-ppc, Stuart Yoder, Scott Wood,
	Paul Mackerras, Peter Maydell

On Wed, Mar 06, 2013 at 05:38:33AM -0500, Paolo Bonzini wrote:
> 
> 
> ----- Original message -----
> > From: "Gleb Natapov" <gleb@redhat.com>
> > To: "Paolo Bonzini" <pbonzini@redhat.com>
> > Cc: "Alexander Graf" <agraf@suse.de>, kvm@vger.kernel.org, kvm-ppc@vger.kernel.org, "Stuart Yoder"
> > <stuart.yoder@freescale.com>, "Scott Wood" <scottwood@freescale.com>, "Paul Mackerras" <paulus@samba.org>, "Peter
> > Maydell" <peter.maydell@linaro.org>
> > Sent: Wednesday, 6 March 2013 10:58:35
> > Subject: Re: in-kernel interrupt controller steering
> > 
> > On Wed, Mar 06, 2013 at 10:40:18AM +0100, Paolo Bonzini wrote:
> > > On 05/03/2013 16:25, Gleb Natapov wrote:
> > > >> 1) We need to set the generic interrupt type of the system
> > > >> before we create vcpus.
> > > >>
> > > >> This is a new ioctl that sets the overall system interrupt
> > > >> controller type to a specific model. This is used so that when we
> > > >> create vcpus, we can create the appended "local interrupt
> > > >> controller" state without the actual interrupt controller
> > > >> device available yet. It is also used later to switch between
> > > >> interrupt controller implementations.
> > > >>
> > > >> This interrupt type is write-once and frozen after the first
> > > >> vcpu has been created.
> > > >
> > > > Why is an explicit ioctl needed? Why not require a specific irqchip
> > > > to be created before the first vcpu? The device created determines
> > > > the system interrupt controller type.
> > > 
> > > QEMU creates CPUs before devices, and CPUs need to know what kind of
> > > local interrupt controller to create.  Similar to how in-kernel LAPIC
> > > state is created long before the userspace device that proxies the
> > > LAPIC.
> >
> > So what is the difference between calling this special ioctl before
> > creating vcpus and calling create device ioctl instead and create
> > QEMU proxy device at whatever point in time QEMU wants to create it?
> 
> Because you'd have to stash the handle that KVM_CREATE_DEVICE returns
> somewhere, waiting for the QEMU device to be created.
> 
OK, but we try not to add interfaces just for the convenience of one
userspace. Is this such an insurmountable problem for QEMU?

> Perhaps it's just a problem of naming, and KVM_CREATE_DEVICE is simply
> not the right name for the interface.  Once both KVM_CREATE_IRQCHIP_ARGS
> and KVM_CREATE_DEVICE are added, it really will not create the device anymore.
> Devices will be created by KVM_CREATE_IRQCHIP_ARGS, and possibly by
> KVM_CREATE_VCPU.  KVM_CREATE_DEVICE is really only returning an id.
> 
> So we can have this instead:
> - KVM_CREATE_IRQCHIP_ARGS becomes KVM_SET_IRQCHIP_TYPE (and "none"
> can be a valid irqchip type).
> 
> - KVM_CREATE_DEVICE becomes KVM_GET_IRQCHIP_DEVICE, and you pass it a
> device type and possibly a VCPU number.
> 
> It's mostly about names, but one important property is that
> KVM_GET_IRQCHIP_DEVICE can be called at any time and, in fact,
> multiple times.  Gleb, do you like this more?
> 
If you put it like this it sounds better (well, you've just stashed the
handle in the kernel for QEMU's convenience :)), but you've made the
interface irqchip-specific again, and this is what we are trying to avoid.

--
			Gleb.


* Re: in-kernel interrupt controller steering
  2013-03-06 11:26         ` Gleb Natapov
  2013-03-06 11:44           ` Paolo Bonzini
@ 2013-03-06 11:44           ` Alexander Graf
  2013-03-06 11:46             ` Paolo Bonzini
  1 sibling, 1 reply; 47+ messages in thread
From: Alexander Graf @ 2013-03-06 11:44 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Paolo Bonzini, kvm, kvm-ppc, Stuart Yoder, Scott Wood,
	Paul Mackerras, Peter Maydell


On 06.03.2013, at 12:26, Gleb Natapov wrote:

> On Wed, Mar 06, 2013 at 05:38:33AM -0500, Paolo Bonzini wrote:
>> 
>> 
>> ----- Original message -----
>>> From: "Gleb Natapov" <gleb@redhat.com>
>>> To: "Paolo Bonzini" <pbonzini@redhat.com>
>>> Cc: "Alexander Graf" <agraf@suse.de>, kvm@vger.kernel.org, kvm-ppc@vger.kernel.org, "Stuart Yoder"
>>> <stuart.yoder@freescale.com>, "Scott Wood" <scottwood@freescale.com>, "Paul Mackerras" <paulus@samba.org>, "Peter
>>> Maydell" <peter.maydell@linaro.org>
>>> Sent: Wednesday, 6 March 2013 10:58:35
>>> Subject: Re: in-kernel interrupt controller steering
>>> 
>>> On Wed, Mar 06, 2013 at 10:40:18AM +0100, Paolo Bonzini wrote:
>>>> On 05/03/2013 16:25, Gleb Natapov wrote:
>>>>>> 1) We need to set the generic interrupt type of the system
>>>>>> before we create vcpus.
>>>>>> 
>>>>>> This is a new ioctl that sets the overall system interrupt
>>>>>> controller type to a specific model. This is used so that when we
>>>>>> create vcpus, we can create the appended "local interrupt
>>>>>> controller" state without the actual interrupt controller
>>>>>> device available yet. It is also used later to switch between
>>>>>> interrupt controller implementations.
>>>>>> 
>>>>>> This interrupt type is write-once and frozen after the first
>>>>>> vcpu has been created.
>>>>> 
>>>>> Why is an explicit ioctl needed? Why not require a specific irqchip
>>>>> to be created before the first vcpu? The device created determines
>>>>> the system interrupt controller type.
>>>> 
>>>> QEMU creates CPUs before devices, and CPUs need to know what kind of
>>>> local interrupt controller to create.  Similar to how in-kernel LAPIC
>>>> state is created long before the userspace device that proxies the
>>>> LAPIC.
>>> 
>>> So what is the difference between calling this special ioctl before
>>> creating vcpus and calling create device ioctl instead and create
>>> QEMU proxy device at whatever point in time QEMU wants to create it?
>> 
>> Because you'd have to stash the handle that KVM_CREATE_DEVICE returns
>> somewhere, waiting for the QEMU device to be created.
>> 
> OK, we try not to add interfaces for one userspace convenience though.
> Is this such insurmountable problem for QEMU?

Please go ahead and try to describe an interface the way you envision it. It needs to fulfill the following criteria:

  * different machine models have different interrupt controller types
  * we need to be able to fetch information from interrupt controllers; this should be as flexible as possible because we don't know today all the future state we will want to synchronize
  * user space creates its virtual representation of an interrupt controller after the vcpus have been created
  * user space needs a token to an interrupt controller, so that we have the possibility to add a second in-kernel irqchip if the need arises

What the current interface does is:

  SET_IRQCHIP_TYPE:

    * declare CPUs as listeners to a specific irqchip bus
    * set the path that interrupt injection takes (this could probably be changed to dynamic lookups though, based on device tokens)

  CREATE_DEVICE:

    * spawn one or multiple in-kernel irqchip devices that hook up to CPUs using the irqchip bus
    * tell user space a token to access this irqchip

I really don't see why you wouldn't want to have that split.
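
As a rough illustration of that split, here is a user-space toy model in
C. It is only a sketch: toy_vm, toy_vcpu and the toy_* functions are
hypothetical names, not the proposed kernel interface, and the token is
simply an array index. It assumes the type is write-once and frozen once
the first vcpu exists, as described in the summary at the top of the
thread.

```c
#include <assert.h>

enum irqchip_type { IRQCHIP_NONE = 0, IRQCHIP_MPIC, IRQCHIP_XICS };

#define MAX_VCPUS   4
#define MAX_DEVICES 4

struct toy_vcpu {
    enum irqchip_type bus;      /* irqchip bus this vcpu listens on */
};

struct toy_vm {
    enum irqchip_type type;
    int type_set;
    int nr_vcpus;
    struct toy_vcpu vcpus[MAX_VCPUS];
    int nr_devices;
    enum irqchip_type devices[MAX_DEVICES];
};

void toy_vm_init(struct toy_vm *vm)
{
    vm->type = IRQCHIP_NONE;
    vm->type_set = 0;
    vm->nr_vcpus = 0;
    vm->nr_devices = 0;
}

/* SET_IRQCHIP_TYPE: write-once, frozen once the first vcpu exists. */
int toy_set_irqchip_type(struct toy_vm *vm, enum irqchip_type type)
{
    if (vm->type_set || vm->nr_vcpus > 0)
        return -1;
    vm->type = type;
    vm->type_set = 1;
    return 0;
}

/* vcpu creation: the vcpu becomes a listener on the chosen bus. */
int toy_create_vcpu(struct toy_vm *vm)
{
    if (vm->nr_vcpus >= MAX_VCPUS)
        return -1;
    vm->vcpus[vm->nr_vcpus].bus = vm->type;
    return vm->nr_vcpus++;
}

/* CREATE_DEVICE: spawn an irqchip device on the bus, return a token. */
int toy_create_device(struct toy_vm *vm, enum irqchip_type type)
{
    if (type == IRQCHIP_NONE || type != vm->type)
        return -1;
    if (vm->nr_devices >= MAX_DEVICES)
        return -1;
    vm->devices[vm->nr_devices] = type;
    return vm->nr_devices++;
}

/* Usage: type first, vcpus next, devices whenever user space is ready. */
void toy_split_demo(void)
{
    struct toy_vm vm;

    toy_vm_init(&vm);
    assert(toy_set_irqchip_type(&vm, IRQCHIP_XICS) == 0);
    assert(toy_create_vcpu(&vm) == 0);
    assert(toy_set_irqchip_type(&vm, IRQCHIP_MPIC) == -1); /* frozen */
    assert(toy_create_device(&vm, IRQCHIP_XICS) == 0);
    assert(toy_create_device(&vm, IRQCHIP_XICS) == 1);     /* 2nd chip */
}
```

Because each device gets its own token, a second in-kernel irqchip can be
added later without changing how the vcpus were set up.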


Alex


* Re: in-kernel interrupt controller steering
  2013-03-06 11:26         ` Gleb Natapov
@ 2013-03-06 11:44           ` Paolo Bonzini
  2013-03-06 11:46             ` Alexander Graf
  2013-03-06 11:44           ` Alexander Graf
  1 sibling, 1 reply; 47+ messages in thread
From: Paolo Bonzini @ 2013-03-06 11:44 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Alexander Graf, kvm, kvm-ppc, Stuart Yoder, Scott Wood,
	Paul Mackerras, Peter Maydell


> > > So what is the difference between calling this special ioctl before
> > > creating vcpus and calling create device ioctl instead and create
> > > QEMU proxy device at whatever point in time QEMU wants to create
> > > it?
> > 
> > Because you'd have to stash the handle that KVM_CREATE_DEVICE
> > returns somewhere, waiting for the QEMU device to be created.
> 
> OK, we try not to add interfaces for one userspace convenience
> though. Is this such insurmountable problem for QEMU?

Nothing is insurmountable.  However, forcing a particular order
of device creation is not very nice to userspace.  If the hypervisor
wants to do that, it can do userspace the favor of keeping the id
in the kernel.  :)

> > Perhaps it's just a problem of naming, and KVM_CREATE_DEVICE is simply
> > not the right name for the interface.  Once both KVM_CREATE_IRQCHIP_ARGS
> > and KVM_CREATE_DEVICE are added, it really will not create the
> > device anymore.
> > Devices will be created by KVM_CREATE_IRQCHIP_ARGS, and possibly by
> > KVM_CREATE_VCPU.  KVM_CREATE_DEVICE is really only returning an id.
> > 
> > So we can have this instead:
> > - KVM_CREATE_IRQCHIP_ARGS becomes KVM_SET_IRQCHIP_TYPE (and "none"
> > can be a valid irqchip type).
> > 
> > - KVM_CREATE_DEVICE becomes KVM_GET_IRQCHIP_DEVICE, and you pass it
> > a device type and possibly a VCPU number.
> > 
> > It's mostly about names, but one important property is that
> > KVM_GET_IRQCHIP_DEVICE can be called at any time and, in fact,
> > multiple times.  Gleb, do you like this more?
> 
> If you put it like this it sounds better (well you've just stashed
> the handle in kernel for QEMU convenience :)), but you've made the
> interface irqchips specific again and this is what we are trying to avoid.

Yes, KVM_GET_IRQCHIP_DEVICE is specific to irqchips because (following
the model of x86) the irqchip type is chosen before creating VCPUs.
I don't see an alternative unless we stop having irqchip as an
all-or-nothing choice.

I'm not saying KVM_CREATE_DEVICE is a bad interface, but I'm not
sure it is really what is needed in this case.  KVM_CREATE_DEVICE
would be perfect as a replacement for KVM_CREATE_PIT2, for example.
But in this case creating a device is not what we're really doing;
the creation is done magically by the hypervisor by virtue of
the previous KVM_CREATE_IRQCHIP_ARGS.

Paolo


* Re: in-kernel interrupt controller steering
  2013-03-06 11:44           ` Alexander Graf
@ 2013-03-06 11:46             ` Paolo Bonzini
  2013-03-06 11:47               ` Alexander Graf
  0 siblings, 1 reply; 47+ messages in thread
From: Paolo Bonzini @ 2013-03-06 11:46 UTC (permalink / raw)
  To: Alexander Graf
  Cc: kvm, kvm-ppc, Stuart Yoder, Scott Wood, Paul Mackerras,
	Peter Maydell, Gleb Natapov

> Please go ahead and try to describe an interface the way you envision
> it. It needs to fulfill the following criteria:
> 
>   * different machine models have different interrupt controller
>   types
>   * we need to be able to fetch information from interrupt
>   controllers, this should be as flexible as possible because we
>   don't know all future state we want to synchronize today
>   * user space creates its virtual representation of an interrupt
>   controller after the vcpus got created
>   * user space needs a token to an interrupt controller, so that we
>   have the possibility to add a second in-kernel irqchip if the need
>   arises
> 
> What the current interface does is:
> 
>   SET_IRQCHIP_TYPE:
> 
>     * declare CPUs as listeners to a specific irqchip bus
>     * set the path that interrupt injection takes (this could
>     probably be changed to dynamic lookups though, based on device
>     tokens)
> 
>   CREATE_DEVICE:
> 
>     * spawn one or multiple in-kernel irqchip devices that hook up to
>     CPUs using the irqchip bus
>     * tell user space a token to access this irqchip
> 
> I really don't see why you wouldn't want to have that split.

I agree.  But is the device really being created at CREATE_DEVICE time?
What happens if you create N CPUs and N-1 irqchips?

On x86, the LAPIC is created magically together with the VCPU.

Paolo


* Re: in-kernel interrupt controller steering
  2013-03-06 11:44           ` Paolo Bonzini
@ 2013-03-06 11:46             ` Alexander Graf
  2013-03-06 11:59               ` Gleb Natapov
  0 siblings, 1 reply; 47+ messages in thread
From: Alexander Graf @ 2013-03-06 11:46 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Gleb Natapov, kvm, kvm-ppc, Stuart Yoder, Scott Wood,
	Paul Mackerras, Peter Maydell


On 06.03.2013, at 12:44, Paolo Bonzini wrote:

> 
>>>> So what is the difference between calling this special ioctl before
>>>> creating vcpus and calling create device ioctl instead and create
>>>> QEMU proxy device at whatever point in time QEMU wants to create
>>>> it?
>>> 
>>> Because you'd have to stash the handle that KVM_CREATE_DEVICE
>>> returns somewhere, waiting for the QEMU device to be created.
>> 
>> OK, we try not to add interfaces for one userspace convenience
>> though. Is this such insurmountable problem for QEMU?
> 
> Nothing is insurmountable.  However, forcing a particular order
> of device creation is not very nice on userspace.  If the hypervisor
> wants to do that, it can do userspace the favor of keeping the id
> in kernel.  :)
> 
>>> Perhaps it's just a problem of naming, and KVM_CREATE_DEVICE is simply
>>> not the right name for the interface.  Once both KVM_CREATE_IRQCHIP_ARGS
>>> and KVM_CREATE_DEVICE are added, it really will not create the
>>> device anymore.
>>> Devices will be created by KVM_CREATE_IRQCHIP_ARGS, and possibly by
>>> KVM_CREATE_VCPU.  KVM_CREATE_DEVICE is really only returning an id.
>>> 
>>> So we can have this instead:
>>> - KVM_CREATE_IRQCHIP_ARGS becomes KVM_SET_IRQCHIP_TYPE (and "none"
>>> can be a valid irqchip type).
>>> 
>>> - KVM_CREATE_DEVICE becomes KVM_GET_IRQCHIP_DEVICE, and you pass it
>>> a device type and possibly a VCPU number.
>>> 
>>> It's mostly about names, but one important property is that
>>> KVM_GET_IRQCHIP_DEVICE can be called at any time and, in fact,
>>> multiple times.  Gleb, do you like this more?
>> 
>> If you put it like this it sounds better (well you've just stashed
>> the handle in kernel for QEMU convenience :)), but you've made the
>> interface irqchips specific again and this is what we are trying to avoid.
> 
> Yes, KVM_GET_IRQCHIP_DEVICE is specific to irqchips because (following
> the model of x86) the irqchip type is chosen before creating VCPUs.
> I don't see an alternative unless we stop having irqchip as an
> all-or-nothing choice.
> 
> I'm not saying KVM_CREATE_DEVICE is a bad interface, but I'm not
> sure it is really what is needed in this case.  KVM_CREATE_DEVICE
> would be perfect as a replacement for KVM_CREATE_PIT2, for example.
> But in this case creating a device is not what we're really doing;
> the creation is done magically by the hypervisor by virtue of
> the previous KVM_CREATE_IRQCHIP_ARGS.

No, it's not and it shouldn't be. To speak in x86 terms:

  KVM_SET_IRQCHIP_TYPE spawns LAPICs (indirectly, they only get spawned on vcpu creation)
  KVM_CREATE_DEVICE spawns IOAPICs.


Alex


> 
> Paolo


* Re: in-kernel interrupt controller steering
  2013-03-06 11:46             ` Paolo Bonzini
@ 2013-03-06 11:47               ` Alexander Graf
  2013-03-06 11:57                 ` Paolo Bonzini
  0 siblings, 1 reply; 47+ messages in thread
From: Alexander Graf @ 2013-03-06 11:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, kvm-ppc, Stuart Yoder, Scott Wood, Paul Mackerras,
	Peter Maydell, Gleb Natapov


On 06.03.2013, at 12:46, Paolo Bonzini wrote:

>> Please go ahead and try to describe an interface the way you envision
>> it. It needs to fulfill the following criteria:
>> 
>>  * different machine models have different interrupt controller
>>  types
>>  * we need to be able to fetch information from interrupt
>>  controllers, this should be as flexible as possible because we
>>  don't know all future state we want to synchronize today
>>  * user space creates its virtual representation of an interrupt
>>  controller after the vcpus got created
>>  * user space needs a token to an interrupt controller, so that we
>>  have the possibility to add a second in-kernel irqchip if the need
>>  arises
>> 
>> What the current interface does is:
>> 
>>  SET_IRQCHIP_TYPE:
>> 
>>    * declare CPUs as listeners to a specific irqchip bus
>>    * set the path that interrupt injection takes (this could
>>    probably be changed to dynamic lookups though, based on device
>>    tokens)
>> 
>>  CREATE_DEVICE:
>> 
>>    * spawn one or multiple in-kernel irqchip devices that hook up to
>>    CPUs using the irqchip bus
>>    * tell user space a token to access this irqchip
>> 
>> I really don't see why you wouldn't want to have that split.
> 
> I agree.  But is the device really being created at CREATE_DEVICE time?
> What happens if you create N CPUs and N-1 irqchips?

irqchip in CREATE_DEVICE is the IOAPIC, not the LAPIC. The LAPIC gets spawned at vcpu creation.

> On x86, the LAPIC is created magically together with the VCPU.

Yes, and so far I haven't seen any proposal to change this even in the CREATE_DEVICE world.


Alex



* Re: in-kernel interrupt controller steering
  2013-03-06 11:47               ` Alexander Graf
@ 2013-03-06 11:57                 ` Paolo Bonzini
  2013-03-06 11:58                   ` Alexander Graf
  0 siblings, 1 reply; 47+ messages in thread
From: Paolo Bonzini @ 2013-03-06 11:57 UTC (permalink / raw)
  To: Alexander Graf
  Cc: kvm, kvm-ppc, Stuart Yoder, Scott Wood, Paul Mackerras,
	Peter Maydell, Gleb Natapov

> > I agree.  But is the device really being created at CREATE_DEVICE
> > time? What happens if you create N CPUs and N-1 irqchips?
> 
> irqchip in CREATE_DEVICE is the IOAPIC, not the LAPIC. The LAPIC gets
> spawned at vcpu creation.
>
> > On x86, the LAPIC is created magically together with the VCPU.
> 
> Yes, and so far I haven't seen any proposal to change this even in
> the CREATE_DEVICE world.

But don't you need an id anyway to get/set the device properties of
the per-VCPU irqchip?  If you were adding the x86 irqchip with the
new API, what would replace KVM_GET_LAPIC/KVM_SET_LAPIC?

Paolo


* Re: in-kernel interrupt controller steering
  2013-03-06 11:57                 ` Paolo Bonzini
@ 2013-03-06 11:58                   ` Alexander Graf
  2013-03-06 13:16                     ` Gleb Natapov
  0 siblings, 1 reply; 47+ messages in thread
From: Alexander Graf @ 2013-03-06 11:58 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, kvm-ppc, Stuart Yoder, Scott Wood, Paul Mackerras,
	Peter Maydell, Gleb Natapov


On 06.03.2013, at 12:57, Paolo Bonzini wrote:

>>> I agree.  But is the device really being created at CREATE_DEVICE
>>> time? What happens if you create N CPUs and N-1 irqchips?
>> 
>> irqchip in CREATE_DEVICE is the IOAPIC, not the LAPIC. The LAPIC gets
>> spawned at vcpu creation.
>> 
>>> On x86, the LAPIC is created magically together with the VCPU.
>> 
>> Yes, and so far I haven't seen any proposal to change this even in
>> the CREATE_DEVICE world.
> 
> But don't you need anyway an id to get/set the device properties of
> the per-VCPU irqchip?  If you were adding the x86 irqchip with the
> new API, what would be the replacement of KVM_GET_LAPIC/KVM_SET_LAPIC?

In the current model, that would be ONE_REG registers on the vcpus.
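
A minimal sketch of what that access pattern looks like, written as a
user-space toy rather than real ioctl code. The toy_one_reg struct copies
the shape of KVM's struct kvm_one_reg (a register id plus a user-space
address); the register ids, the toy_vcpu fields and the toy_* functions
are assumptions made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same shape as struct kvm_one_reg: register id + user buffer address. */
struct toy_one_reg {
    uint64_t id;
    uint64_t addr;
};

/* Hypothetical ids for per-vcpu interrupt controller state. */
#define TOY_REG_IRQ_PRIORITY 0x1001ULL
#define TOY_REG_IRQ_PENDING  0x1002ULL

/* Toy per-vcpu "local interrupt controller" state. */
struct toy_vcpu {
    uint64_t irq_priority;
    uint64_t irq_pending;
};

/* Map a register id to its backing storage, or NULL if unknown. */
static uint64_t *toy_reg_slot(struct toy_vcpu *vcpu, uint64_t id)
{
    switch (id) {
    case TOY_REG_IRQ_PRIORITY: return &vcpu->irq_priority;
    case TOY_REG_IRQ_PENDING:  return &vcpu->irq_pending;
    default:                   return NULL;
    }
}

/* Models a GET_ONE_REG call: copy the register out to reg->addr. */
int toy_get_one_reg(struct toy_vcpu *vcpu, const struct toy_one_reg *reg)
{
    uint64_t *slot = toy_reg_slot(vcpu, reg->id);

    if (!slot)
        return -1;
    memcpy((void *)(uintptr_t)reg->addr, slot, sizeof(*slot));
    return 0;
}

/* Models a SET_ONE_REG call: copy the value at reg->addr into the vcpu. */
int toy_set_one_reg(struct toy_vcpu *vcpu, const struct toy_one_reg *reg)
{
    uint64_t *slot = toy_reg_slot(vcpu, reg->id);

    if (!slot)
        return -1;
    memcpy(slot, (const void *)(uintptr_t)reg->addr, sizeof(*slot));
    return 0;
}
```

No device id is needed in this model because the vcpu itself is the
handle; new state can be exposed later by adding register ids.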


Alex


* Re: in-kernel interrupt controller steering
  2013-03-06 11:46             ` Alexander Graf
@ 2013-03-06 11:59               ` Gleb Natapov
  2013-03-06 12:02                 ` Alexander Graf
  2013-03-06 12:14                 ` Paolo Bonzini
  0 siblings, 2 replies; 47+ messages in thread
From: Gleb Natapov @ 2013-03-06 11:59 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Paolo Bonzini, kvm, kvm-ppc, Stuart Yoder, Scott Wood,
	Paul Mackerras, Peter Maydell

On Wed, Mar 06, 2013 at 12:46:52PM +0100, Alexander Graf wrote:
> 
> On 06.03.2013, at 12:44, Paolo Bonzini wrote:
> 
> > 
> >>>> So what is the difference between calling this special ioctl before
> >>>> creating vcpus and calling create device ioctl instead and create
> >>>> QEMU proxy device at whatever point in time QEMU wants to create
> >>>> it?
> >>> 
> >>> Because you'd have to stash the handle that KVM_CREATE_DEVICE
> >>> returns somewhere, waiting for the QEMU device to be created.
> >> 
> >> OK, we try not to add interfaces for one userspace convenience
> >> though. Is this such insurmountable problem for QEMU?
> > 
> > Nothing is insurmountable.  However, forcing a particular order
> > of device creation is not very nice on userspace.  If the hypervisor
> > wants to do that, it can do userspace the favor of keeping the id
> > in kernel.  :)
> > 
> >>> Perhaps it's just a problem of naming, and KVM_CREATE_DEVICE is simply
> >>> not the right name for the interface.  Once both KVM_CREATE_IRQCHIP_ARGS
> >>> and KVM_CREATE_DEVICE are added, it really will not create the
> >>> device anymore.
> >>> Devices will be created by KVM_CREATE_IRQCHIP_ARGS, and possibly by
> >>> KVM_CREATE_VCPU.  KVM_CREATE_DEVICE is really only returning an id.
> >>> 
> >>> So we can have this instead:
> >>> - KVM_CREATE_IRQCHIP_ARGS becomes KVM_SET_IRQCHIP_TYPE (and "none"
> >>> can be a valid irqchip type).
> >>> 
> >>> - KVM_CREATE_DEVICE becomes KVM_GET_IRQCHIP_DEVICE, and you pass it
> >>> a device type and possibly a VCPU number.
> >>> 
> >>> It's mostly about names, but one important property is that
> >>> KVM_GET_IRQCHIP_DEVICE can be called at any time and, in fact,
> >>> multiple times.  Gleb, do you like this more?
> >> 
> >> If you put it like this it sounds better (well you've just stashed
> >> the handle in kernel for QEMU convenience :)), but you've made the
> >> interface irqchips specific again and this is what we are trying to avoid.
> > 
> > Yes, KVM_GET_IRQCHIP_DEVICE is specific to irqchips because (following
> > the model of x86) the irqchip type is chosen before creating VCPUs.
> > I don't see an alternative unless we stop having irqchip as an
> > all-or-nothing choice.
> > 
> > I'm not saying KVM_CREATE_DEVICE is a bad interface, but I'm not
> > sure it is really what is needed in this case.  KVM_CREATE_DEVICE
> > would be perfect as a replacement for KVM_CREATE_PIT2, for example.
> > But in this case creating a device is not what we're really doing;
> > the creation is done magically by the hypervisor by virtue of
> > the previous KVM_CREATE_IRQCHIP_ARGS.
> 
> No, it's not and it shouldn't be. To speak in x86 terms:
> 
>   KVM_SET_IRQCHIP_TYPE spawns LAPICs (indirectly, they only get spawned on vcpu creation)
>   KVM_CREATE_DEVICE spawns IOAPICs.
> 
> 
Agreed. Lumping the in-kernel LAPIC and IRQCHIPs together under one
in-kernel irqchip umbrella was a mistake on x86, one we should not force
on others.

--
			Gleb.


* Re: in-kernel interrupt controller steering
  2013-03-06 11:59               ` Gleb Natapov
@ 2013-03-06 12:02                 ` Alexander Graf
  2013-03-06 12:14                 ` Paolo Bonzini
  1 sibling, 0 replies; 47+ messages in thread
From: Alexander Graf @ 2013-03-06 12:02 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Paolo Bonzini, kvm, kvm-ppc, Stuart Yoder, Scott Wood,
	Paul Mackerras, Peter Maydell


On 06.03.2013, at 12:59, Gleb Natapov wrote:

> On Wed, Mar 06, 2013 at 12:46:52PM +0100, Alexander Graf wrote:
>> 
>> On 06.03.2013, at 12:44, Paolo Bonzini wrote:
>> 
>>> 
>>>>>> So what is the difference between calling this special ioctl before
>>>>>> creating vcpus and calling create device ioctl instead and create
>>>>>> QEMU proxy device at whatever point in time QEMU wants to create
>>>>>> it?
>>>>> 
>>>>> Because you'd have to stash the handle that KVM_CREATE_DEVICE
>>>>> returns somewhere, waiting for the QEMU device to be created.
>>>> 
>>>> OK, we try not to add interfaces for one userspace convenience
>>>> though. Is this such insurmountable problem for QEMU?
>>> 
>>> Nothing is insurmountable.  However, forcing a particular order
>>> of device creation is not very nice on userspace.  If the hypervisor
>>> wants to do that, it can do userspace the favor of keeping the id
>>> in kernel.  :)
>>> 
>>>>> Perhaps it's just a problem of naming, and KVM_CREATE_DEVICE is simply
>>>>> not the right name for the interface.  Once both KVM_CREATE_IRQCHIP_ARGS
>>>>> and KVM_CREATE_DEVICE are added, it really will not create the
>>>>> device anymore.
>>>>> Devices will be created by KVM_CREATE_IRQCHIP_ARGS, and possibly by
>>>>> KVM_CREATE_VCPU.  KVM_CREATE_DEVICE is really only returning an id.
>>>>> 
>>>>> So we can have this instead:
>>>>> - KVM_CREATE_IRQCHIP_ARGS becomes KVM_SET_IRQCHIP_TYPE (and "none"
>>>>> can be a valid irqchip type).
>>>>> 
>>>>> - KVM_CREATE_DEVICE becomes KVM_GET_IRQCHIP_DEVICE, and you pass it
>>>>> a device type and possibly a VCPU number.
>>>>> 
>>>>> It's mostly about names, but one important property is that
>>>>> KVM_GET_IRQCHIP_DEVICE can be called at any time and, in fact,
>>>>> multiple times.  Gleb, do you like this more?
>>>> 
>>>> If you put it like this it sounds better (well you've just stashed
>>>> the handle in kernel for QEMU convenience :)), but you've made the
>>>> interface irqchips specific again and this is what we are trying to avoid.
>>> 
>>> Yes, KVM_GET_IRQCHIP_DEVICE is specific to irqchips because (following
>>> the model of x86) the irqchip type is chosen before creating VCPUs.
>>> I don't see an alternative unless we stop having irqchip as an
>>> all-or-nothing choice.
>>> 
>>> I'm not saying KVM_CREATE_DEVICE is a bad interface, but I'm not
>>> sure it is really what is needed in this case.  KVM_CREATE_DEVICE
>>> would be perfect as a replacement for KVM_CREATE_PIT2, for example.
>>> But in this case creating a device is not what we're really doing;
>>> the creation is done magically by the hypervisor by virtue of
>>> the previous KVM_CREATE_IRQCHIP_ARGS.
>> 
>> No, it's not and it shouldn't be. To speak in x86 terms:
>> 
>>  KVM_SET_IRQCHIP_TYPE spawns LAPICs (indirectly, they only get spawned on vcpu creation)
>>  KVM_CREATE_DEVICE spawns IOAPICs.
>> 
>> 
> Agree. Lumping up in-kernel LAPIC and IRQCHIPS under one in-kernel
> irqchip umbrella was a mistake on x86. The one we should not force on
> others.

Good :). So where do we disagree now?


Alex


* Re: in-kernel interrupt controller steering
  2013-03-06 11:59               ` Gleb Natapov
  2013-03-06 12:02                 ` Alexander Graf
@ 2013-03-06 12:14                 ` Paolo Bonzini
  2013-03-06 12:20                   ` Alexander Graf
  1 sibling, 1 reply; 47+ messages in thread
From: Paolo Bonzini @ 2013-03-06 12:14 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: kvm, kvm-ppc, Stuart Yoder, Scott Wood, Paul Mackerras,
	Peter Maydell, Alexander Graf


> > >>>> So what is the difference between calling this special ioctl
> > >>>> before
> > >>>> creating vcpus and calling create device ioctl instead and
> > >>>> create
> > >>>> QEMU proxy device at whatever point in time QEMU wants to
> > >>>> create
> > >>>> it?
> > >>> 
> > >>> Because you'd have to stash the handle that KVM_CREATE_DEVICE
> > >>> returns somewhere, waiting for the QEMU device to be created.
> > >> 
> > >> OK, we try not to add interfaces for one userspace convenience
> > >> though. Is this such insurmountable problem for QEMU?
> > > 
> > > Nothing is insurmountable.  However, forcing a particular order
> > > of device creation is not very nice on userspace.  If the
> > > hypervisor
> > > wants to do that, it can do userspace the favor of keeping the id
> > > in kernel.  :)
> > > 
> > >>> Perhaps it's just a problem of naming, and KVM_CREATE_DEVICE is
> > >>> simply
> > >>> not the right name for the interface.  Once both
> > >>> KVM_CREATE_IRQCHIP_ARGS
> > >>> and KVM_CREATE_DEVICE are added, it really will not create the
> > >>> device anymore.
> > >>> Devices will be created by KVM_CREATE_IRQCHIP_ARGS, and
> > >>> possibly by
> > >>> KVM_CREATE_VCPU.  KVM_CREATE_DEVICE is really only returning an
> > >>> id.
> > >>> 
> > >>> So we can have this instead:
> > >>> - KVM_CREATE_IRQCHIP_ARGS becomes KVM_SET_IRQCHIP_TYPE (and
> > >>> "none"
> > >>> can be a valid irqchip type).
> > >>> 
> > >>> - KVM_CREATE_DEVICE becomes KVM_GET_IRQCHIP_DEVICE, and you
> > >>> pass it
> > >>> a device type and possibly a VCPU number.
> > >>> 
> > >>> It's mostly about names, but one important property is that
> > >>> KVM_GET_IRQCHIP_DEVICE can be called at any time and, in fact,
> > >>> multiple times.  Gleb, do you like this more?
> > >> 
> > >> If you put it like this it sounds better (well you've just
> > >> stashed
> > >> the handle in kernel for QEMU convenience :)), but you've made
> > >> the
> > >> interface irqchips specific again and this is what we are trying
> > >> to avoid.
> > > 
> > > Yes, KVM_GET_IRQCHIP_DEVICE is specific to irqchips because
> > > (following
> > > the model of x86) the irqchip type is chosen before creating
> > > VCPUs.
> > > I don't see an alternative unless we stop having irqchip as an
> > > all-or-nothing choice.
> > > 
> > > I'm not saying KVM_CREATE_DEVICE is a bad interface, but I'm not
> > > sure it is really what is needed in this case.  KVM_CREATE_DEVICE
> > > would be perfect as a replacement for KVM_CREATE_PIT2, for
> > > example.
> > > But in this case creating a device is not what we're really
> > > doing;
> > > the creation is done magically by the hypervisor by virtue of
> > > the previous KVM_CREATE_IRQCHIP_ARGS.
> > 
> > No, it's not and it shouldn't be. To speak in x86 terms:
> > 
> >   KVM_SET_IRQCHIP_TYPE spawns LAPICs (indirectly, they only get
> >   spawned on vcpu creation)
> >   KVM_CREATE_DEVICE spawns IOAPICs.

Ok, that makes sense.

> Agree. Lumping up in-kernel LAPIC and IRQCHIPS under one in-kernel
> irqchip umbrella was a mistake on x86. The one we should not force on
> others.

Alex, would the PPC patches let you run with in-kernel "LAPICs"
and userspace "IOAPICs"?  If so, the new model would not be a
problem for QEMU at all.

The problem would only start if KVM_SET_IRQCHIP_TYPE (new name of
KVM_CREATE_IRQCHIP_ARGS) forced you to later call KVM_CREATE_DEVICE.

Paolo


* Re: in-kernel interrupt controller steering
  2013-03-06 12:14                 ` Paolo Bonzini
@ 2013-03-06 12:20                   ` Alexander Graf
  2013-03-06 12:28                     ` Paolo Bonzini
  2013-03-06 13:14                     ` Gleb Natapov
  0 siblings, 2 replies; 47+ messages in thread
From: Alexander Graf @ 2013-03-06 12:20 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Gleb Natapov, kvm, kvm-ppc, Stuart Yoder, Scott Wood,
	Paul Mackerras, Peter Maydell


On 06.03.2013, at 13:14, Paolo Bonzini wrote:

> 
>>>>>>> So what is the difference between calling this special ioctl
>>>>>>> before
>>>>>>> creating vcpus and calling create device ioctl instead and
>>>>>>> create
>>>>>>> QEMU proxy device at whatever point in time QEMU wants to
>>>>>>> create
>>>>>>> it?
>>>>>> 
>>>>>> Because you'd have to stash the handle that KVM_CREATE_DEVICE
>>>>>> returns somewhere, waiting for the QEMU device to be created.
>>>>> 
>>>>> OK, we try not to add interfaces for one userspace convenience
>>>>> though. Is this such insurmountable problem for QEMU?
>>>> 
>>>> Nothing is insurmountable.  However, forcing a particular order
>>>> of device creation is not very nice on userspace.  If the
>>>> hypervisor
>>>> wants to do that, it can do userspace the favor of keeping the id
>>>> in kernel.  :)
>>>> 
>>>>>> Perhaps it's just a problem of naming, and KVM_CREATE_DEVICE is
>>>>>> simply
>>>>>> not the right name for the interface.  Once both
>>>>>> KVM_CREATE_IRQCHIP_ARGS
>>>>>> and KVM_CREATE_DEVICE are added, it really will not create the
>>>>>> device anymore.
>>>>>> Devices will be created by KVM_CREATE_IRQCHIP_ARGS, and
>>>>>> possibly by
>>>>>> KVM_CREATE_VCPU.  KVM_CREATE_DEVICE is really only returning an
>>>>>> id.
>>>>>> 
>>>>>> So we can have this instead:
>>>>>> - KVM_CREATE_IRQCHIP_ARGS becomes KVM_SET_IRQCHIP_TYPE (and
>>>>>> "none"
>>>>>> can be a valid irqchip type).
>>>>>> 
>>>>>> - KVM_CREATE_DEVICE becomes KVM_GET_IRQCHIP_DEVICE, and you
>>>>>> pass it
>>>>>> a device type and possibly a VCPU number.
>>>>>> 
>>>>>> It's mostly about names, but one important property is that
>>>>>> KVM_GET_IRQCHIP_DEVICE can be called at any time and, in fact,
>>>>>> multiple times.  Gleb, do you like this more?
>>>>> 
>>>>> If you put it like this it sounds better (well you've just
>>>>> stashed
>>>>> the handle in kernel for QEMU convenience :)), but you've made
>>>>> the
>>>>> interface irqchips specific again and this is what we are trying
>>>>> to avoid.
>>>> 
>>>> Yes, KVM_GET_IRQCHIP_DEVICE is specific to irqchips because
>>>> (following
>>>> the model of x86) the irqchip type is chosen before creating
>>>> VCPUs.
>>>> I don't see an alternative unless we stop having irqchip as an
>>>> all-or-nothing choice.
>>>> 
>>>> I'm not saying KVM_CREATE_DEVICE is a bad interface, but I'm not
>>>> sure it is really what is needed in this case.  KVM_CREATE_DEVICE
>>>> would be perfect as a replacement for KVM_CREATE_PIT2, for
>>>> example.
>>>> But in this case creating a device is not what we're really
>>>> doing;
>>>> the creation is done magically by the hypervisor by virtue of
>>>> the previous KVM_CREATE_IRQCHIP_ARGS.
>>> 
>>> No, it's not and it shouldn't be. To speak in x86 terms:
>>> 
>>>  KVM_SET_IRQCHIP_TYPE spawns LAPICs (indirectly, they only get
>>>  spawned on vcpu creation)
>>>  KVM_CREATE_DEVICE spawns IOAPICs.
> 
> Ok, that makes sense.
> 
>> Agree. Lumping up in-kernel LAPIC and IRQCHIPS under one in-kernel
>> irqchip umbrella was a mistake on x86. The one we should not force on
>> others.
> 
> Alex, would the PPC patches let you run with in-kernel "LAPICs"
> and userspace "IOAPICs"?  If so, the new model would not be a
> problem with QEMU at all.

The split on PPC isn't that clean. The MPIC doesn't split it at all for example. There we only have an "IOAPIC" without a "LAPIC". So setting the irqchip type to MPIC would be a nop.

For XICS, we would have something similar to a LAPIC. We would however have to communicate with that piece to tell it that interrupts are pending or not. I suppose this might be doable through the ONE_REG interface that Paul implemented, but I'm not sure.

I don't really think doing such a split makes sense though :).

> The problem would only start if KVM_SET_IRQCHIP_TYPE (new name of
> KVM_CREATE_IRQCHIP_ARGS) forced you to later call KVM_CREATE_DEVICE.

Ah, I see. I don't see why it would. The fact that there is a "LAPIC" doesn't mean that the per-vcpu SET_INTERRUPT ioctl stops working. So if SET_IRQCHIP_TYPE(!none) breaks user-space interrupt controller emulation I would consider that a bug.


Alex

> 
> Paolo

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: in-kernel interrupt controller steering
  2013-03-06 12:20                   ` Alexander Graf
@ 2013-03-06 12:28                     ` Paolo Bonzini
  2013-03-06 13:14                     ` Gleb Natapov
  1 sibling, 0 replies; 47+ messages in thread
From: Paolo Bonzini @ 2013-03-06 12:28 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Gleb Natapov, kvm, kvm-ppc, Stuart Yoder, Scott Wood,
	Paul Mackerras, Peter Maydell

> > Alex, would the PPC patches let you run with in-kernel "LAPICs"
> > and userspace "IOAPICs"?  If so, the new model would not be a
> > problem with QEMU at all.
> 
> The split on PPC isn't that clean. The MPIC doesn't split it at all
> for example. There we only have an "IOAPIC" without a "LAPIC". So
> setting the irqchip type to MPIC would be a nop.
> 
> For XICS, we would have something similar to a LAPIC. We would
> however have to communicate with that piece to tell it that
> interrupts are pending or not. I suppose this might be doable
> through the ONE_REG interface that Paul implemented, but I'm not
> sure.

Paul, can you confirm?

> I don't really think doing such a split makes sense though :).
> 
> > The problem would only start if KVM_SET_IRQCHIP_TYPE (new name of
> > KVM_CREATE_IRQCHIP_ARGS) forced you to later call
> > KVM_CREATE_DEVICE.
> 
> Ah, I see. I don't see why it would. The fact that there is a "LAPIC"
> doesn't mean that the per-vcpu SET_INTERRUPT ioctl stops working. So
> if SET_IRQCHIP_TYPE(!none) breaks user-space interrupt controller
> emulation I would consider that a bug.

If there's agreement that KVM_SET_IRQCHIP_TYPE doesn't force
subsequent KVM_CREATE_DEVICE calls (for CPU interrupt controllers
that's because they're controlled via ONE_REG and not DEVICE_ATTR),
I think we're fine.
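
The semantics agreed on here can be sketched as a small toy model in C. This is not kernel code: every type and function name below (vm_set_irqchip_type, vm_create_vcpu, vm_create_irqchip_device, vcpu_set_interrupt) is hypothetical, and the PPC-style "drive the CPU pin" behaviour is an assumption taken from Alex's description. The only point is that choosing a type shapes vcpu creation and does not force a later device creation call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model for illustration only; none of this is a real KVM API. */
enum irqchip_type { IRQCHIP_UNSET, IRQCHIP_NONE, IRQCHIP_XICS, IRQCHIP_MPIC };

#define MAX_VCPUS 4

struct vcpu {
    bool has_local_irqchip;   /* "LAPIC"-like per-vcpu state */
    bool ext_irq_pending;     /* level of the CPU's external interrupt pin */
};

struct vm {
    enum irqchip_type type;
    bool type_frozen;         /* set once the first vcpu exists */
    bool has_irqchip_device;  /* "IOAPIC"-like device, created separately */
    size_t nr_vcpus;
    struct vcpu vcpus[MAX_VCPUS];
};

/* Models KVM_SET_IRQCHIP_TYPE: rejected once the first vcpu was created. */
static int vm_set_irqchip_type(struct vm *vm, enum irqchip_type type)
{
    assert(vm != NULL);
    if (vm->type_frozen || type == IRQCHIP_UNSET)
        return -1;
    vm->type = type;
    return 0;
}

/* Models vcpu creation: per-vcpu irqchip state follows the chosen type. */
static struct vcpu *vm_create_vcpu(struct vm *vm)
{
    assert(vm != NULL);
    if (vm->nr_vcpus == MAX_VCPUS)
        return NULL;
    vm->type_frozen = true;
    struct vcpu *v = &vm->vcpus[vm->nr_vcpus++];
    /* Assumption: only XICS has a per-vcpu part; for MPIC this is a no-op. */
    v->has_local_irqchip = (vm->type == IRQCHIP_XICS);
    v->ext_irq_pending = false;
    return v;
}

/* Models KVM_CREATE_DEVICE for the "IOAPIC"-like part; entirely optional. */
static int vm_create_irqchip_device(struct vm *vm)
{
    assert(vm != NULL);
    if (vm->type == IRQCHIP_UNSET || vm->type == IRQCHIP_NONE ||
        vm->has_irqchip_device)
        return -1;
    vm->has_irqchip_device = true;
    return 0;
}

/* Models the per-vcpu interrupt ioctl with PPC-like semantics: it drives
 * the CPU pin whether or not an in-kernel device was ever created. */
static void vcpu_set_interrupt(struct vcpu *v, bool level)
{
    assert(v != NULL);
    v->ext_irq_pending = level;
}
```

In this model a caller can set the type to XICS, create a vcpu, never call vm_create_irqchip_device(), and still raise the interrupt pin with vcpu_set_interrupt().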

Paolo

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: in-kernel interrupt controller steering
  2013-03-06 12:20                   ` Alexander Graf
  2013-03-06 12:28                     ` Paolo Bonzini
@ 2013-03-06 13:14                     ` Gleb Natapov
  2013-03-06 13:22                       ` Alexander Graf
  2013-03-06 13:41                       ` Paolo Bonzini
  1 sibling, 2 replies; 47+ messages in thread
From: Gleb Natapov @ 2013-03-06 13:14 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Paolo Bonzini, kvm, kvm-ppc, Stuart Yoder, Scott Wood,
	Paul Mackerras, Peter Maydell

On Wed, Mar 06, 2013 at 01:20:39PM +0100, Alexander Graf wrote:
> > The problem would only start if KVM_SET_IRQCHIP_TYPE (new name of
> > KVM_CREATE_IRQCHIP_ARGS) forced you to later call KVM_CREATE_DEVICE.
> 
> Ah, I see. I don't see why it would. The fact that there is a "LAPIC" doesn't mean that the per-vcpu SET_INTERRUPT ioctl stops working. So if SET_IRQCHIP_TYPE(!none) breaks user-space interrupt controller emulation I would consider that a bug.
> 
For x86 this is the case though. I do not see how it can't be. If the
LAPIC is emulated in userspace, SET_INTERRUPT is used to pass the IRQ
vector that should be handled as a result of LAPIC emulation.

--
			Gleb.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: in-kernel interrupt controller steering
  2013-03-06 11:58                   ` Alexander Graf
@ 2013-03-06 13:16                     ` Gleb Natapov
  0 siblings, 0 replies; 47+ messages in thread
From: Gleb Natapov @ 2013-03-06 13:16 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Paolo Bonzini, kvm, kvm-ppc, Stuart Yoder, Scott Wood,
	Paul Mackerras, Peter Maydell

On Wed, Mar 06, 2013 at 12:58:39PM +0100, Alexander Graf wrote:
> 
> On 06.03.2013, at 12:57, Paolo Bonzini wrote:
> 
> >>> I agree.  But is the device really being created at CREATE_DEVICE
> >>> time? What happens if you create N CPUs and N-1 irqchips?
> >> 
> >> irqchip in CREATE_DEVICE is the IOAPIC, not the LAPIC. The LAPIC gets
> >> spawned at vcpu creation.
> >> 
> >>> On x86, the LAPIC is created magically together with the VCPU.
> >> 
> >> Yes, and so far I haven't seen any proposal to change this even in
> >> the CREATE_DEVICE world.
> > 
> > But don't you need anyway an id to get/set the device properties of
> > the per-VCPU irqchip?  If you were adding the x86 irqchip with the
> > new API, what would be the replacement of KVM_GET_LAPIC/KVM_SET_LAPIC?
> 
> In the current model, that would be ONE_REG registers on the vcpus.
> 
Yes, the same way KVM_GET_LAPIC is a per-cpu ioctl. No need for a special
ID.

--
			Gleb.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: in-kernel interrupt controller steering
  2013-03-06 13:14                     ` Gleb Natapov
@ 2013-03-06 13:22                       ` Alexander Graf
  2013-03-06 13:56                         ` Gleb Natapov
  2013-03-06 13:41                       ` Paolo Bonzini
  1 sibling, 1 reply; 47+ messages in thread
From: Alexander Graf @ 2013-03-06 13:22 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Paolo Bonzini, kvm, kvm-ppc, Stuart Yoder, Scott Wood,
	Paul Mackerras, Peter Maydell


On 06.03.2013, at 14:14, Gleb Natapov wrote:

> On Wed, Mar 06, 2013 at 01:20:39PM +0100, Alexander Graf wrote:
>>> The problem would only start if KVM_SET_IRQCHIP_TYPE (new name of
>>> KVM_CREATE_IRQCHIP_ARGS) forced you to later call KVM_CREATE_DEVICE.
>> 
>> Ah, I see. I don't see why it would. The fact that there is a "LAPIC" doesn't mean that the per-vcpu SET_INTERRUPT ioctl stops working. So if SET_IRQCHIP_TYPE(!none) breaks user-space interrupt controller emulation I would consider that a bug.
>> 
> For x86 this is the case though. I do not see how it can't be. If
> LAPIC is emulated in userspace SET_INTERRUPT is used to pass IRQ
> vector that should be handled as a result of LAPIC emulation.

So SET_INTERRUPT on a vcpu triggers a line on the LAPIC emulation in that vcpu? For us it directly controls the CPU interrupt pin.


Alex

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: in-kernel interrupt controller steering
  2013-03-06 13:14                     ` Gleb Natapov
  2013-03-06 13:22                       ` Alexander Graf
@ 2013-03-06 13:41                       ` Paolo Bonzini
  2013-03-06 14:11                         ` Gleb Natapov
  1 sibling, 1 reply; 47+ messages in thread
From: Paolo Bonzini @ 2013-03-06 13:41 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Alexander Graf, kvm, kvm-ppc, Stuart Yoder, Scott Wood,
	Paul Mackerras, Peter Maydell

Il 06/03/2013 14:14, Gleb Natapov ha scritto:
>>>> The problem would only start if KVM_SET_IRQCHIP_TYPE (new name of
>>>> KVM_CREATE_IRQCHIP_ARGS) forced you to later call KVM_CREATE_DEVICE.
>>> 
>>> Ah, I see. I don't see why it would. The fact that there is a
>>> "LAPIC" doesn't mean that the per-vcpu SET_INTERRUPT ioctl stops
>>> working. So if SET_IRQCHIP_TYPE(!none) breaks user-space interrupt
>>> controller emulation I would consider that a bug.
>> 
> For x86 this is the case though. I do not see how it can't be. If
> LAPIC is emulated in userspace SET_INTERRUPT is used to pass IRQ
> vector that should be handled as a result of LAPIC emulation.

SET_IRQCHIP_TYPE creates the LAPICs; it would indeed break userspace
LAPIC emulation because the LAPICs would not cause userspace exits anymore.

However, it need not mandate the usage of an in-kernel IOAPIC or PIC.
KVM_INTERRUPT, the docs say, "is only useful if in-kernel local
APIC or equivalent is not used", but it is really only useful if the
in-kernel *IOAPIC* is not used.  The userspace IOAPIC can use it to
inject the interrupts to the in-kernel LAPIC.

So, it would be possible to create the IOAPIC or PIC separately with
KVM_CREATE_DEVICE, and have the userspace devices inject the interrupts
with KVM_IRQ_LINE_STATUS (PIC->IOAPIC) or KVM_INTERRUPT (IOAPIC->LAPIC).

Paolo

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: in-kernel interrupt controller steering
  2013-03-06 13:22                       ` Alexander Graf
@ 2013-03-06 13:56                         ` Gleb Natapov
  2013-03-06 14:03                           ` Alexander Graf
  0 siblings, 1 reply; 47+ messages in thread
From: Gleb Natapov @ 2013-03-06 13:56 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Paolo Bonzini, kvm, kvm-ppc, Stuart Yoder, Scott Wood,
	Paul Mackerras, Peter Maydell

On Wed, Mar 06, 2013 at 02:22:15PM +0100, Alexander Graf wrote:
> 
> On 06.03.2013, at 14:14, Gleb Natapov wrote:
> 
> > On Wed, Mar 06, 2013 at 01:20:39PM +0100, Alexander Graf wrote:
> >>> The problem would only start if KVM_SET_IRQCHIP_TYPE (new name of
> >>> KVM_CREATE_IRQCHIP_ARGS) forced you to later call KVM_CREATE_DEVICE.
> >> 
> >> Ah, I see. I don't see why it would. The fact that there is a "LAPIC" doesn't mean that the per-vcpu SET_INTERRUPT ioctl stops working. So if SET_IRQCHIP_TYPE(!none) breaks user-space interrupt controller emulation I would consider that a bug.
> >> 
> > For x86 this is the case though. I do not see how it can't be. If
> > LAPIC is emulated in userspace SET_INTERRUPT is used to pass IRQ
> > vector that should be handled as a result of LAPIC emulation.
> 
> So SET_INTERRUPT on a vcpu triggers a line on the LAPIC emulation in that vcpu? For us it directly controls the CPU interrupt pin.
> 
No, SET_INTERRUPT on a vcpu tells the vcpu which vector in the IDT it needs
to jump to immediately. The LAPIC is really part of the cpu and we cut it
out and put it into userspace, so the interface between the kernel and the
userspace LAPIC emulation is really low level and has to be synchronous.
X86 has two interrupt lines, NMI and INTR, and we do not have an interface
to trigger the latter.  KVM_IRQ_LINE works on GSI lines which do not go
into the CPU directly. They go either via the PIC (which triggers INTR or
APIC LINT0) or via the IOAPIC, which on real HW communicates with the APICs
via a bus, but in our emulation just calls the APICs directly.

--
			Gleb.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: in-kernel interrupt controller steering
  2013-03-06 13:56                         ` Gleb Natapov
@ 2013-03-06 14:03                           ` Alexander Graf
  2013-03-06 14:12                             ` Paolo Bonzini
  2013-03-06 14:41                             ` Gleb Natapov
  0 siblings, 2 replies; 47+ messages in thread
From: Alexander Graf @ 2013-03-06 14:03 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Paolo Bonzini, kvm, kvm-ppc, Stuart Yoder, Scott Wood,
	Paul Mackerras, Peter Maydell

On 06.03.2013, at 14:56, Gleb Natapov wrote:

> On Wed, Mar 06, 2013 at 02:22:15PM +0100, Alexander Graf wrote:
>> 
>> On 06.03.2013, at 14:14, Gleb Natapov wrote:
>> 
>>> On Wed, Mar 06, 2013 at 01:20:39PM +0100, Alexander Graf wrote:
>>>>> The problem would only start if KVM_SET_IRQCHIP_TYPE (new name of
>>>>> KVM_CREATE_IRQCHIP_ARGS) forced you to later call KVM_CREATE_DEVICE.
>>>> 
>>>> Ah, I see. I don't see why it would. The fact that there is a "LAPIC" doesn't mean that the per-vcpu SET_INTERRUPT ioctl stops working. So if SET_IRQCHIP_TYPE(!none) breaks user-space interrupt controller emulation I would consider that a bug.
>>>> 
>>> For x86 this is the case though. I do not see how it can't be. If
>>> LAPIC is emulated in userspace SET_INTERRUPT is used to pass IRQ
>>> vector that should be handled as a result of LAPIC emulation.
>> 
>> So SET_INTERRUPT on a vcpu triggers a line on the LAPIC emulation in that vcpu? For us it directly controls the CPU interrupt pin.
>> 
> No SET_INTERRUPT on a vcpu tells vcpu to which vector in IDT it needs to
> jump immediately. LAPIC is really part of a cpu and we cut it and put into
> userspace, so interface between userspace LAPIC emulation is really low
> level and has to be synchronous. X86 has two interrupt lines NMI and INTR
> and we do not have interface to trigger the later.  KVM_IRQ_LINE works on
> GSI lines which do not go into CPU directly. They go either via PIC (which
> triggers INTR or APIC LINT0) or via IOAPIC which on real HW communicates
> with APICs via bus, but in our emulation just calls APICs directly.

Great :). It's similar for us. SET_INTERRUPT directly asserts the INTR line of the vcpu. There is nothing like an IDT on PPC, so external interrupts simply arrive at a specific vector. That vector can differ for critical or NMI interrupts IIRC, but I'm not sure we implement that right now. If so, it'd be a different line for SET_INTERRUPT.

So in a way, it's the same. And SET_INTERRUPT should work regardless of whether a LAPIC is used or not really. At least it would for us :).

KVM_IRQ_LINE is basically an IOAPIC interrupt line assert. That's fine. That ioctl should get an ioapic device handle to work on. Whether we call the IOAPIC PINs GSIs or something different is really just a naming question. I'd probably call it IRQ number :). But it's the same idea. The "IOAPIC" would then talk to the in-kernel "LAPIC" style bits (or in case of the MPIC just integrate them inside of itself). That's why by the time we create an "IOAPIC", the "LAPIC"s in the system have to be populated.

So again, I'm failing to see where we think differently :).

Alex

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: in-kernel interrupt controller steering
  2013-03-06 13:41                       ` Paolo Bonzini
@ 2013-03-06 14:11                         ` Gleb Natapov
  2013-03-06 14:31                           ` Alexander Graf
  0 siblings, 1 reply; 47+ messages in thread
From: Gleb Natapov @ 2013-03-06 14:11 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Alexander Graf, kvm, kvm-ppc, Stuart Yoder, Scott Wood,
	Paul Mackerras, Peter Maydell

On Wed, Mar 06, 2013 at 02:41:04PM +0100, Paolo Bonzini wrote:
> Il 06/03/2013 14:14, Gleb Natapov ha scritto:
> >>>> The problem would only start if KVM_SET_IRQCHIP_TYPE (new name of
> >>>> KVM_CREATE_IRQCHIP_ARGS) forced you to later call KVM_CREATE_DEVICE.
> >>> 
> >>> Ah, I see. I don't see why it would. The fact that there is a
> >>> "LAPIC" doesn't mean that the per-vcpu SET_INTERRUPT ioctl stops
> >>> working. So if SET_IRQCHIP_TYPE(!none) breaks user-space interrupt
> >>> controller emulation I would consider that a bug.
> >> 
> > For x86 this is the case though. I do not see how it can't be. If
> > LAPIC is emulated in userspace SET_INTERRUPT is used to pass IRQ
> > vector that should be handled as a result of LAPIC emulation.
> 
> SET_IRQCHIP_TYPE creates the LAPICs; it would indeed break userspace
> LAPIC emulation because the LAPICs would not cause userspace exits anymore.
> 
The reason it will break userspace is much more fundamental than that.
There is no interface suitable for communication between a userspace
PIC/IOAPIC and an in-kernel LAPIC right now.

Why should LAPICs cause a userspace exit? The reason we exit to userspace
without an in-kernel irqchip is because we need LAPIC emulation to run.
 
> However, it need not mandate the usage of an in-kernel IOAPIC or PIC
> though. 
There is no such need, it's just how things are implemented right now. To
allow IOAPIC/PIC to be in userspace and LAPIC in kernel, additional
interfaces are needed, and since we need to keep the current way for the
foreseeable future the value of doing the work is not clear.

>          KVM_INTERRUPT, the docs say, "is only useful if in-kernel local
> APIC or equivalent is not used", but it is really only useful for if
> in-kernel *IOAPIC* is not used.  The userspace IOAPIC can use it to
> inject the interrupts to the in-kernel LAPIC.
KVM_INTERRUPT is synchronous. There is no reason for the userspace
IOAPIC<->kernel space LAPIC interface to be synchronous. KVM_IRQ_LINE is
a much better fit.

> 
> So, it would be possible to create the IOAPIC or PIC separately with
> KVM_CREATE_DEVICE, and have the userspace devices inject the interrupts
> with KVM_IRQ_LINE_STATUS (PIC->IOAPIC) or KVM_INTERRUPT (IOAPIC->LAPIC).
> 
PIC never injects into IOAPIC. Some GSIs are handled by both.

--
			Gleb.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: in-kernel interrupt controller steering
  2013-03-06 14:03                           ` Alexander Graf
@ 2013-03-06 14:12                             ` Paolo Bonzini
  2013-03-06 14:30                               ` Alexander Graf
  2013-03-06 14:41                             ` Gleb Natapov
  1 sibling, 1 reply; 47+ messages in thread
From: Paolo Bonzini @ 2013-03-06 14:12 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Gleb Natapov, kvm, kvm-ppc, Stuart Yoder, Scott Wood,
	Paul Mackerras, Peter Maydell

Il 06/03/2013 15:03, Alexander Graf ha scritto:
> KVM_IRQ_LINE is basically an IOAPIC interrupt line assert. That's
> fine. That ioctl should get an ioapic device handle to work on.

It would be a KVM_SET_DEVICE_ATTR in your case, right?

> Whether we call the IOAPIC PINs GSIs or something different is really
> just a naming question. I'd probably call it IRQ number :).

Yup.

> So again, I'm failing to see where we think differently :).

I think we're not, just making sure that the existing x86 ioctls can be
clearly mapped to what you're proposing.

The only change that came up is the rename of KVM_CREATE_IRQCHIP_ARGS,
and the addition of a "none" type.  Everything else is just clarifying
the desired semantics (and Gleb correcting me on several accounts---I
hope I haven't caused more confusion).

Paolo

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: in-kernel interrupt controller steering
  2013-03-06 14:12                             ` Paolo Bonzini
@ 2013-03-06 14:30                               ` Alexander Graf
  2013-03-06 14:37                                 ` Paolo Bonzini
  0 siblings, 1 reply; 47+ messages in thread
From: Alexander Graf @ 2013-03-06 14:30 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Gleb Natapov, kvm, kvm-ppc, Stuart Yoder, Scott Wood,
	Paul Mackerras, Peter Maydell


On 06.03.2013, at 15:12, Paolo Bonzini wrote:

> Il 06/03/2013 15:03, Alexander Graf ha scritto:
>> KVM_IRQ_LINE is basically an IOAPIC interrupt line assert. That's
>> fine. That ioctl should get an ioapic device handle to work on.
> 
> It would be a KVM_SET_DEVICE_ATTR in your case, right?

No, it would be KVM_IRQ_LINE. It's basically a command ("do this interrupt"), not an attribute modification. Unless we want to implement the IRQ pin levels on the "IOAPIC" as attributes. Then it'd be a SET_DEVICE_ATTR. But that makes edge interrupt injection harder / less obvious ;).


Alex

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: in-kernel interrupt controller steering
  2013-03-06 14:11                         ` Gleb Natapov
@ 2013-03-06 14:31                           ` Alexander Graf
  2013-03-06 18:46                             ` Peter Maydell
  0 siblings, 1 reply; 47+ messages in thread
From: Alexander Graf @ 2013-03-06 14:31 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Paolo Bonzini, kvm, kvm-ppc, Stuart Yoder, Scott Wood,
	Paul Mackerras, Peter Maydell


On 06.03.2013, at 15:11, Gleb Natapov wrote:

> On Wed, Mar 06, 2013 at 02:41:04PM +0100, Paolo Bonzini wrote:
>> Il 06/03/2013 14:14, Gleb Natapov ha scritto:
>>>>>> The problem would only start if KVM_SET_IRQCHIP_TYPE (new name of
>>>>>> KVM_CREATE_IRQCHIP_ARGS) forced you to later call KVM_CREATE_DEVICE.
>>>>> 
>>>>> Ah, I see. I don't see why it would. The fact that there is a
>>>>> "LAPIC" doesn't mean that the per-vcpu SET_INTERRUPT ioctl stops
>>>>> working. So if SET_IRQCHIP_TYPE(!none) breaks user-space interrupt
>>>>> controller emulation I would consider that a bug.
>>>> 
>>> For x86 this is the case though. I do not see how it can't be. If
>>> LAPIC is emulated in userspace SET_INTERRUPT is used to pass IRQ
>>> vector that should be handled as a result of LAPIC emulation.
>> 
>> SET_IRQCHIP_TYPE creates the LAPICs; it would indeed break userspace
>> LAPIC emulation because the LAPICs would not cause userspace exits anymore.
>> 
> The reason it will break userspace is much more fundamental than that.
> There is not interface suitable for communication between userspace
> PIC/IOAPIC and in-kernel LAPIC right now.
> 
> Why LAPICs should cause userspace exit? The reason we exit to userspace
> without in kernel irqchip is because we need LAPIC emulation to run.
> 
>> However, it need not mandate the usage of an in-kernel IOAPIC or PIC
>> though. 
> There is no such need, its just how things are implemented right now. To
> allow IOAPIC/PIC to be in userspace and LAPIC in kernel additional
> interfaces are needed and since we need to keep current way for
> foreseeable feature the value of doing the work is not clear.
> 
>>         KVM_INTERRUPT, the docs say, "is only useful if in-kernel local
>> APIC or equivalent is not used", but it is really only useful for if
>> in-kernel *IOAPIC* is not used.  The userspace IOAPIC can use it to
>> inject the interrupts to the in-kernel LAPIC.
> KVM_INTERRUPT is synchronous.

Ah, I think that's where my thinko is. On PPC, KVM_INTERRUPT is asynchronous.


Alex

> There is no reason for userspace
> IOAPIC<->kernel space LAPIC interface to be synchronous. KVM_IRQ_LINE is
> much better fit.
> 
>> 
>> So, it would be possible to create the IOAPIC or PIC separately with
>> KVM_CREATE_DEVICE, and have the userspace devices inject the interrupts
>> with KVM_IRQ_LINE_STATUS (PIC->IOAPIC) or KVM_INTERRUPT (IOAPIC->LAPIC).
>> 
> PIC never injects into IOAPIC. Some GSIs are handled by both.
> 
> --
> 			Gleb.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: in-kernel interrupt controller steering
  2013-03-06 14:30                               ` Alexander Graf
@ 2013-03-06 14:37                                 ` Paolo Bonzini
  2013-03-06 14:40                                   ` Alexander Graf
  0 siblings, 1 reply; 47+ messages in thread
From: Paolo Bonzini @ 2013-03-06 14:37 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Gleb Natapov, kvm, kvm-ppc, Stuart Yoder, Scott Wood,
	Paul Mackerras, Peter Maydell

Il 06/03/2013 15:30, Alexander Graf ha scritto:
>>> KVM_IRQ_LINE is basically an IOAPIC interrupt line assert. That's
>>> fine. That ioctl should get an ioapic device handle to work on.
>> 
>> It would be a KVM_SET_DEVICE_ATTR in your case, right?
> No, it would be KVM_IRQ_LINE. It's basically a command ("do this
> interrupt"), not an attribute modification. Unless we want to
> implement the IRQ pin levels on the "IOAPIC" as attributes. Then it'd
> be a SET_DEVICE_ATTR. But that makes edge interrupt injection harder
> / less obvious ;).

Why is it harder?  You don't really inject interrupts, you inject
changes of the pin status, don't you?

Paolo

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: in-kernel interrupt controller steering
  2013-03-06 14:37                                 ` Paolo Bonzini
@ 2013-03-06 14:40                                   ` Alexander Graf
  0 siblings, 0 replies; 47+ messages in thread
From: Alexander Graf @ 2013-03-06 14:40 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Gleb Natapov, kvm, kvm-ppc, Stuart Yoder, Scott Wood,
	Paul Mackerras, Peter Maydell


On 06.03.2013, at 15:37, Paolo Bonzini wrote:

> Il 06/03/2013 15:30, Alexander Graf ha scritto:
>>>> KVM_IRQ_LINE is basically an IOAPIC interrupt line assert. That's
>>>> fine. That ioctl should get an ioapic device handle to work on.
>>> 
>>> It would be a KVM_SET_DEVICE_ATTR in your case, right?
>> No, it would be KVM_IRQ_LINE. It's basically a command ("do this
>> interrupt"), not an attribute modification. Unless we want to
>> implement the IRQ pin levels on the "IOAPIC" as attributes. Then it'd
>> be a SET_DEVICE_ATTR. But that makes edge interrupt injection harder
>> / less obvious ;).
> 
> Why is it harder?  You don't really inject interrupts, you inject
> changes of the pin status, don't you?

Because we need to somehow model irqfd as well, at least for MSIs. So I'd prefer to reuse the same interface :). Whether we plumb this behind a SET_DEVICE_ATTR ioctl or behind a KVM_IRQ_LINE ioctl is something I don't care much about though.

Since irqfd is essentially a command, it just feels more natural to treat it as such.
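
The command-versus-attribute distinction can be illustrated with a toy pin model in C. This is only a sketch under stated assumptions: struct pin and the functions pin_set_level, pin_asserted and pin_trigger are hypothetical names, not KVM interfaces. It shows why an edge interrupt fits a "do this interrupt" command more naturally than a write to a level attribute.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy input pin of an interrupt controller; not a KVM type. */
struct pin {
    bool level;        /* current line level */
    bool edge_mode;    /* true: latch on rising edge; false: level-triggered */
    unsigned pending;  /* number of latched edge interrupts */
};

/* Attribute-style interface: the caller writes the new pin level. */
static void pin_set_level(struct pin *p, bool level)
{
    assert(p != NULL);
    if (p->edge_mode && !p->level && level)
        p->pending++;          /* a rising edge latches one interrupt */
    p->level = level;
}

/* Is an interrupt currently being presented by this pin? */
static bool pin_asserted(const struct pin *p)
{
    assert(p != NULL);
    return p->edge_mode ? p->pending > 0 : p->level;
}

/* Command-style interface: "do this interrupt", as an MSI or irqfd would.
 * For an edge pin this pulses the line; for a level pin it raises it. */
static void pin_trigger(struct pin *p)
{
    pin_set_level(p, true);
    if (p->edge_mode)
        pin_set_level(p, false);   /* return the line to idle */
}
```

With the attribute-style call alone, writing level 1 twice to an edge pin latches only one interrupt, because the second write produces no rising edge; the caller has to remember to write 0 in between. The command-style call hides that pulse.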


Alex


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: in-kernel interrupt controller steering
  2013-03-06 14:03                           ` Alexander Graf
  2013-03-06 14:12                             ` Paolo Bonzini
@ 2013-03-06 14:41                             ` Gleb Natapov
  2013-03-06 14:48                               ` Alexander Graf
  1 sibling, 1 reply; 47+ messages in thread
From: Gleb Natapov @ 2013-03-06 14:41 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Paolo Bonzini, kvm, kvm-ppc, Stuart Yoder, Scott Wood,
	Paul Mackerras, Peter Maydell

On Wed, Mar 06, 2013 at 03:03:53PM +0100, Alexander Graf wrote:
> 
> On 06.03.2013, at 14:56, Gleb Natapov wrote:
> 
> > On Wed, Mar 06, 2013 at 02:22:15PM +0100, Alexander Graf wrote:
> >> 
> >> On 06.03.2013, at 14:14, Gleb Natapov wrote:
> >> 
> >>> On Wed, Mar 06, 2013 at 01:20:39PM +0100, Alexander Graf wrote:
> >>>>> The problem would only start if KVM_SET_IRQCHIP_TYPE (new name of
> >>>>> KVM_CREATE_IRQCHIP_ARGS) forced you to later call KVM_CREATE_DEVICE.
> >>>> 
> >>>> Ah, I see. I don't see why it would. The fact that there is a "LAPIC" doesn't mean that the per-vcpu SET_INTERRUPT ioctl stops working. So if SET_IRQCHIP_TYPE(!none) breaks user-space interrupt controller emulation I would consider that a bug.
> >>>> 
> >>> For x86 this is the case though. I do not see how it can't be. If
> >>> LAPIC is emulated in userspace SET_INTERRUPT is used to pass IRQ
> >>> vector that should be handled as a result of LAPIC emulation.
> >> 
> >> So SET_INTERRUPT on a vcpu triggers a line on the LAPIC emulation in that vcpu? For us it directly controls the CPU interrupt pin.
> >> 
> > No SET_INTERRUPT on a vcpu tells vcpu to which vector in IDT it needs to
> > jump immediately. LAPIC is really part of a cpu and we cut it and put into
> > userspace, so interface between userspace LAPIC emulation is really low
> > level and has to be synchronous. X86 has two interrupt lines NMI and INTR
> > and we do not have interface to trigger the later.  KVM_IRQ_LINE works on
> > GSI lines which do not go into CPU directly. They go either via PIC (which
> > triggers INTR or APIC LINT0) or via IOAPIC which on real HW communicates
> > with APICs via bus, but in our emulation just calls APICs directly.
> 
> Great :). It's similar for us. SET_INTERRUPT directly asserts the INTR line of the vcpu. There is nothing like an IDT on PPC, so external interrupts simply arrive at a specific vector. That vector can differ for critical or NMI interrupts IIRC, but I'm not sure we implement that right now. If so, it'd be a different line for SET_INTERRUPT.
> 
> So in a way, it's the same. And SET_INTERRUPT should work regardless of whether a LAPIC is used or not really. At least it would for us :).
>
Is it possible for some devices to inject interrupts directly and others
to go through the interrupt controller?
 
> KVM_IRQ_LINE is basically an IOAPIC interrupt line assert. That's fine. That ioctl should get an ioapic device handle to work on. Whether we call the IOAPIC PINs GSIs or something different is really just a naming question. I'd probably call it IRQ number :).
Yes and no. On sane archs we can call it IRQ number (lucky you!), but on
X86 there is a GSI that can be IRQ2 if it goes through the IOAPIC and IRQ0
if it goes through the PIC, so an additional entity was invented: irq
routing. It maps between a GSI and irqchip pins. The same GSI may go to
more than one irqchip. This is why for x86 having an irqchip device handle
as a parameter to KVM_IRQ_LINE does not make sense. It makes sense to
provide it to the irq router, and this is how it works now except that the
"device handles" are hard coded.
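
A minimal sketch of the routing idea in C, assuming a made-up table: the enum, struct route, the routes[] entries and set_gsi() are all hypothetical and only mirror the example above (one GSI reaching pin 0 on the PIC and pin 2 on the IOAPIC). It is not the kernel's routing code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical irqchip identifiers; in this model they are hard coded,
 * much like the "device handles" mentioned above. */
enum chip { CHIP_PIC, CHIP_IOAPIC, CHIP_COUNT };

struct route {
    unsigned gsi;      /* what the caller names */
    enum chip chip;    /* which irqchip the GSI reaches */
    unsigned pin;      /* which input pin on that irqchip */
};

/* Illustrative table only: GSI 0 fans out to two irqchips. */
static const struct route routes[] = {
    { 0, CHIP_PIC,    0 },
    { 0, CHIP_IOAPIC, 2 },
    { 5, CHIP_IOAPIC, 5 },
};

#define PINS_PER_CHIP 24
static int pin_level[CHIP_COUNT][PINS_PER_CHIP];

/* Models a GSI-based line call: the caller passes a GSI, not a device
 * handle. Returns how many irqchip pins were driven. */
static size_t set_gsi(unsigned gsi, int level)
{
    size_t hits = 0;

    assert(level == 0 || level == 1);
    for (size_t i = 0; i < sizeof(routes) / sizeof(routes[0]); i++) {
        if (routes[i].gsi != gsi)
            continue;
        pin_level[routes[i].chip][routes[i].pin] = level;
        hits++;
    }
    return hits;
}
```

Because a single set_gsi(0, 1) drives pins on two different chips, a per-call device handle would not describe the target; the handle belongs in the routing table entries instead.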
 
> But it's the same idea. The "IOAPIC" would then talk to to in-kernel "LAPIC" style bits (or in case of the MPIC just integrate them inside of itself). That's why by the time we create an "IOAPIC", the "LAPIC"s in the system have to be populated.
The restriction that the LAPIC has to be created before the IOAPIC would be
a bug that needs to be fixed on X86. The reason is cpu hotplug. If you have
to support cpu hotplug you have to be able to create LAPICs after the
IOAPIC, and at that point you can create the IOAPIC before any LAPICs as
well. I understand this may not be the case for all architectures right
now, but it is something to keep in mind.
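
One way to picture an ordering-independent design is a toy model in C where the "IOAPIC" looks up the destination "LAPIC" at delivery time instead of binding to it at creation time. Everything here (struct machine, create_ioapic, create_lapic, ioapic_deliver) is a hypothetical illustration, not how any real implementation is structured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CPUS 8

/* Hypothetical per-cpu controller state. */
struct lapic {
    bool present;
    unsigned last_vector;   /* last vector delivered to this cpu */
};

struct machine {
    bool ioapic_present;
    struct lapic lapics[MAX_CPUS];
};

/* Creating the "IOAPIC" does not require any "LAPIC" to exist yet. */
static void create_ioapic(struct machine *m)
{
    assert(m != NULL);
    m->ioapic_present = true;
}

/* A "LAPIC" may appear at any time, e.g. when a cpu is hotplugged. */
static int create_lapic(struct machine *m, size_t cpu)
{
    assert(m != NULL);
    if (cpu >= MAX_CPUS || m->lapics[cpu].present)
        return -1;
    m->lapics[cpu].present = true;
    return 0;
}

/* The destination is resolved when the interrupt is delivered, so the
 * creation order of the two controller kinds does not matter. */
static int ioapic_deliver(struct machine *m, size_t dest_cpu, unsigned vector)
{
    assert(m != NULL);
    if (!m->ioapic_present || dest_cpu >= MAX_CPUS)
        return -1;
    if (!m->lapics[dest_cpu].present)
        return -1;          /* destination cpu does not exist (yet) */
    m->lapics[dest_cpu].last_vector = vector;
    return 0;
}
```

In this model, delivery to a cpu that has not been created simply fails, and starts working once create_lapic() has been called for it, regardless of when create_ioapic() ran.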

> 
> So again, I'm failing to see where we think differently :).
> 
The difference is very minor really. I am still trying to justify to myself
why we need a separate ioctl() to announce what irqchip we are going to
create before creating one (other than to save QEMU some trouble). The
question is: can this ioctl be useful by itself? The seemingly unlikely
scenario in which we allow IOAPIC/PIC emulation in userspace while the
LAPIC is in the kernel may be such a case. QEMU would call it before
creating vcpus to tell KVM that LAPICs need to be created along with the
VCPUs, but no irqchip would be created.

--
			Gleb.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: in-kernel interrupt controller steering
  2013-03-06 14:41                             ` Gleb Natapov
@ 2013-03-06 14:48                               ` Alexander Graf
  2013-03-06 14:59                                 ` Alexander Graf
                                                   ` (2 more replies)
  0 siblings, 3 replies; 47+ messages in thread
From: Alexander Graf @ 2013-03-06 14:48 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Paolo Bonzini, kvm, kvm-ppc, Stuart Yoder, Scott Wood,
	Paul Mackerras, Peter Maydell


On 06.03.2013, at 15:41, Gleb Natapov wrote:

> On Wed, Mar 06, 2013 at 03:03:53PM +0100, Alexander Graf wrote:
>> 
>> On 06.03.2013, at 14:56, Gleb Natapov wrote:
>> 
>>> On Wed, Mar 06, 2013 at 02:22:15PM +0100, Alexander Graf wrote:
>>>> 
>>>> On 06.03.2013, at 14:14, Gleb Natapov wrote:
>>>> 
>>>>> On Wed, Mar 06, 2013 at 01:20:39PM +0100, Alexander Graf wrote:
>>>>>>> The problem would only start if KVM_SET_IRQCHIP_TYPE (new name of
>>>>>>> KVM_CREATE_IRQCHIP_ARGS) forced you to later call KVM_CREATE_DEVICE.
>>>>>> 
>>>>>> Ah, I see. I don't see why it would. The fact that there is a "LAPIC" doesn't mean that the per-vcpu SET_INTERRUPT ioctl stops working. So if SET_IRQCHIP_TYPE(!none) breaks user-space interrupt controller emulation I would consider that a bug.
>>>>>> 
>>>>> For x86 this is the case though. I do not see how it can't be. If
>>>>> LAPIC is emulated in userspace SET_INTERRUPT is used to pass IRQ
>>>>> vector that should be handled as a result of LAPIC emulation.
>>>> 
>>>> So SET_INTERRUPT on a vcpu triggers a line on the LAPIC emulation in that vcpu? For us it directly controls the CPU interrupt pin.
>>>> 
>>> No SET_INTERRUPT on a vcpu tells vcpu to which vector in IDT it needs to
>>> jump immediately. LAPIC is really part of a cpu and we cut it and put into
>>> userspace, so interface between userspace LAPIC emulation is really low
>>> level and has to be synchronous. X86 has two interrupt lines NMI and INTR
>>> and we do not have interface to trigger the later.  KVM_IRQ_LINE works on
>>> GSI lines which do not go into CPU directly. They go either via PIC (which
>>> triggers INTR or APIC LINT0) or via IOAPIC which on real HW communicates
>>> with APICs via bus, but in our emulation just calls APICs directly.
>> 
>> Great :). It's similar for us. SET_INTERRUPT directly asserts the INTR line of the vcpu. There is nothing like an IDT on PPC, so external interrupts simply arrive at a specific vector. That vector can differ for critical or NMI interrupts IIRC, but I'm not sure we implement that right now. If so, it'd be a different line for SET_INTERRUPT.
>> 
>> So in a way, it's the same. And SET_INTERRUPT should work regardless of whether a LAPIC is used or not really. At least it would for us :).
>> 
> Is it possible for some devices to inject interrupt directly and other
> to go through interrupt controller?

It would be racy if both assert + deassert the same line, but I don't see why we should keep anyone from doing it. If user space wants to run such a configuration, it needs to ensure that only one of the 2 is actively used at any given time.

>> KVM_IRQ_LINE is basically an IOAPIC interrupt line assert. That's fine. That ioctl should get an ioapic device handle to work on. Whether we call the IOAPIC PINs GSIs or something different is really just a naming question. I'd probably call it IRQ number :).
> Yes and no. On sane archs we can call it IRQ number (lucky you!), but on
> X86 there is a GSI that can be IRQ2 if it goes through IOAPIC and IRQ0
> if it goes through PIC, so additional entity was invented: irq routing.
> It maps between GSI and irqchips pin. Same GSI may go to more than one
> irqchip. This is why for x86 having irqchip device handle as a parameter
> to KVM_IRQ_LINE does not make sense. It make sense to provide it to irq
> router and this is how it work now except that "device handlers" are
> hard coded.

Then you would create a new "irq router" device that does the multiplexing and can also receive IRQs. You could then directly assert an IOAPIC/PIC line or a multiplexer line. Or am I misunderstanding something?
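
To make the routing idea concrete, here is a minimal sketch in C. All names (struct gsi_route, route_gsi, the chip enum, the pin count) are hypothetical and are not the KVM API; the table only mirrors the example quoted above, where one GSI reaches pin 0 on the PIC and pin 2 on the IOAPIC.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical irqchip identifiers; not taken from the KVM headers. */
enum chip { CHIP_PIC, CHIP_IOAPIC, CHIP_COUNT };

#define PINS_PER_CHIP 24

/* One routing entry: a GSI is wired to one pin on one irqchip. */
struct gsi_route {
    unsigned gsi;
    enum chip chip;
    unsigned pin;
};

/* Line level per chip and pin, as the emulated controllers would see it. */
static int pin_level[CHIP_COUNT][PINS_PER_CHIP];

/* The same GSI may appear more than once, i.e. reach several irqchips. */
static const struct gsi_route routes[] = {
    { 0, CHIP_PIC,    0 },
    { 0, CHIP_IOAPIC, 2 },
    { 1, CHIP_PIC,    1 },
    { 1, CHIP_IOAPIC, 1 },
};

/* Drive every pin the GSI is routed to; returns the number of pins hit. */
static int route_gsi(unsigned gsi, int level)
{
    int hits = 0;

    for (size_t i = 0; i < sizeof(routes) / sizeof(routes[0]); i++) {
        if (routes[i].gsi != gsi)
            continue;
        assert(routes[i].pin < PINS_PER_CHIP);
        pin_level[routes[i].chip][routes[i].pin] = level;
        hits++;
    }
    return hits;
}
```

With a router like this, asserting a single pin directly and asserting a multiplexed line are just two different entry points into the same pin_level state.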

> 
>> But it's the same idea. The "IOAPIC" would then talk to to in-kernel "LAPIC" style bits (or in case of the MPIC just integrate them inside of itself). That's why by the time we create an "IOAPIC", the "LAPIC"s in the system have to be populated.
> The restriction that LAPIC has to be created before IOAPIC would be a
> bug that need to be fixed on X86. The reason is cpu hotplug. If you have
> to support cpu hotplug you have to be able to create LAPICs after IOAPIC
> and at this point you can create IOAPIC before any LAPICs as well. I
> understand this may not be the case for all architectures right now, but
> something to keep in mind.

Paul, Scott, do you think we can move the "this CPU can receive interrupts from MPIC / XICS" part into an ENABLE_CAP that gets set dynamically? That ENABLE_CAP would allocate the structures in the vcpu and register the vcpu with the interrupt controller pool.

The interrupt controller device would still iterate through all vcpus to find the ones that match so that we support the ENABLE_CAP at any point in time.
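
A rough sketch of what I mean, in C. Everything here is made up for illustration (struct names, enable_listener_cap, create_controller); it is not existing KVM code. The point is only that both orderings end up with the same set of registered vcpus.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_VCPUS 8

struct vcpu {
    bool created;
    bool listener;              /* set by the hypothetical ENABLE_CAP */
};

struct irq_controller {
    bool created;
    struct vcpu *targets[MAX_VCPUS];
    size_t nr_targets;
};

static struct vcpu vcpus[MAX_VCPUS];
static struct irq_controller ctrl;

static struct vcpu *create_vcpu(size_t id)
{
    assert(id < MAX_VCPUS);
    vcpus[id].created = true;
    return &vcpus[id];
}

/* Add a vcpu to the controller's pool, ignoring duplicates. */
static void register_target(struct vcpu *v)
{
    for (size_t i = 0; i < ctrl.nr_targets; i++)
        if (ctrl.targets[i] == v)
            return;
    assert(ctrl.nr_targets < MAX_VCPUS);
    ctrl.targets[ctrl.nr_targets++] = v;
}

/* Stand-in for the ENABLE_CAP: works before or after controller creation. */
static void enable_listener_cap(struct vcpu *v)
{
    v->listener = true;
    if (ctrl.created)
        register_target(v);
}

/* Controller creation picks up every vcpu that already enabled the cap. */
static void create_controller(void)
{
    ctrl.created = true;
    for (size_t i = 0; i < MAX_VCPUS; i++)
        if (vcpus[i].created && vcpus[i].listener)
            register_target(&vcpus[i]);
}
```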

> 
>> 
>> So again, I'm failing to see where we think differently :).
>> 
> The difference is very minor really. I still try to justify to myself
> why we need separate ioctl() to announce what irqchip we are going to
> create before creating one (except save QEMU some troubles). The question
> is: is this ioctl can be useful by itself? Seems like unlikely scenario
> that we will allow IOAPIC/PIC emulation in uesrspace while LAPIC is in
> kernel may be such case. QEMU will call it before creating vcpus to
> tell KVM that LAPICs need to be created along with VCPUs, but no
> irqchip will be created.

I don't have a real answer for you yet, but so far the general design mantra of "small, individual pieces that plug together" worked out way better for us than the "have one call that does it all" one. Being explicit simply makes sure that we support more scenarios we don't think of today.


Alex

* Re: in-kernel interrupt controller steering
  2013-03-06 14:48                               ` Alexander Graf
@ 2013-03-06 14:59                                 ` Alexander Graf
  2013-03-06 15:02                                   ` Paolo Bonzini
  2013-03-06 15:30                                 ` Gleb Natapov
  2013-03-07  0:32                                 ` Paul Mackerras
  2 siblings, 1 reply; 47+ messages in thread
From: Alexander Graf @ 2013-03-06 14:59 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Paolo Bonzini, kvm, kvm-ppc, Stuart Yoder, Scott Wood,
	Paul Mackerras, Peter Maydell


On 06.03.2013, at 15:48, Alexander Graf wrote:

> 
> On 06.03.2013, at 15:41, Gleb Natapov wrote:
> 
>> On Wed, Mar 06, 2013 at 03:03:53PM +0100, Alexander Graf wrote:
>>> 
>>> On 06.03.2013, at 14:56, Gleb Natapov wrote:
>>> 
>>>> On Wed, Mar 06, 2013 at 02:22:15PM +0100, Alexander Graf wrote:
>>>>> 
>>>>> On 06.03.2013, at 14:14, Gleb Natapov wrote:
>>>>> 
>>>>>> On Wed, Mar 06, 2013 at 01:20:39PM +0100, Alexander Graf wrote:
>>>>>>>> The problem would only start if KVM_SET_IRQCHIP_TYPE (new name of
>>>>>>>> KVM_CREATE_IRQCHIP_ARGS) forced you to later call KVM_CREATE_DEVICE.
>>>>>>> 
>>>>>>> Ah, I see. I don't see why it would. The fact that there is a "LAPIC" doesn't mean that the per-vcpu SET_INTERRUPT ioctl stops working. So if SET_IRQCHIP_TYPE(!none) breaks user-space interrupt controller emulation I would consider that a bug.
>>>>>>> 
>>>>>> For x86 this is the case though. I do not see how it can't be. If
>>>>>> LAPIC is emulated in userspace SET_INTERRUPT is used to pass IRQ
>>>>>> vector that should be handled as a result of LAPIC emulation.
>>>>> 
>>>>> So SET_INTERRUPT on a vcpu triggers a line on the LAPIC emulation in that vcpu? For us it directly controls the CPU interrupt pin.
>>>>> 
>>>> No SET_INTERRUPT on a vcpu tells vcpu to which vector in IDT it needs to
>>>> jump immediately. LAPIC is really part of a cpu and we cut it and put into
>>>> userspace, so interface between userspace LAPIC emulation is really low
>>>> level and has to be synchronous. X86 has two interrupt lines NMI and INTR
>>>> and we do not have interface to trigger the later.  KVM_IRQ_LINE works on
>>>> GSI lines which do not go into CPU directly. They go either via PIC (which
>>>> triggers INTR or APIC LINT0) or via IOAPIC which on real HW communicates
>>>> with APICs via bus, but in our emulation just calls APICs directly.
>>> 
>>> Great :). It's similar for us. SET_INTERRUPT directly asserts the INTR line of the vcpu. There is nothing like an IDT on PPC, so external interrupts simply arrive at a specific vector. That vector can differ for critical or NMI interrupts IIRC, but I'm not sure we implement that right now. If so, it'd be a different line for SET_INTERRUPT.
>>> 
>>> So in a way, it's the same. And SET_INTERRUPT should work regardless of whether a LAPIC is used or not really. At least it would for us :).
>>> 
>> Is it possible for some devices to inject interrupt directly and other
>> to go through interrupt controller?
> 
> It would be racy if both assert + deassert the same line, but I don't see why we should keep anyone from doing it. If user space wants to run such a configuration, it needs to ensure that only one of the 2 is actively used at any given time.
> 
>>> KVM_IRQ_LINE is basically an IOAPIC interrupt line assert. That's fine. That ioctl should get an ioapic device handle to work on. Whether we call the IOAPIC PINs GSIs or something different is really just a naming question. I'd probably call it IRQ number :).
>> Yes and no. On sane archs we can call it IRQ number (lucky you!), but on
>> X86 there is a GSI that can be IRQ2 if it goes through IOAPIC and IRQ0
>> if it goes through PIC, so additional entity was invented: irq routing.
>> It maps between GSI and irqchips pin. Same GSI may go to more than one
>> irqchip. This is why for x86 having irqchip device handle as a parameter
>> to KVM_IRQ_LINE does not make sense. It make sense to provide it to irq
>> router and this is how it work now except that "device handlers" are
>> hard coded.
> 
> Then you would create a new "irq router" device that does the multiplexing and can also receive IRQs. You could then directly assert an IOAPIC/PIC line or a multiplexer line. Or am I misunderstanding something?
> 
>> 
>>> But it's the same idea. The "IOAPIC" would then talk to to in-kernel "LAPIC" style bits (or in case of the MPIC just integrate them inside of itself). That's why by the time we create an "IOAPIC", the "LAPIC"s in the system have to be populated.
>> The restriction that LAPIC has to be created before IOAPIC would be a
>> bug that need to be fixed on X86. The reason is cpu hotplug. If you have
>> to support cpu hotplug you have to be able to create LAPICs after IOAPIC
>> and at this point you can create IOAPIC before any LAPICs as well. I
>> understand this may not be the case for all architectures right now, but
>> something to keep in mind.
> 
> Paul, Scott, do you think we can move the "this CPU can receive interrupts from MPIC / XICS" part into an ENABLE_CAP that gets set dynamically? That ENABLE_CAP would allocate the structures in the vcpu and register the vcpu with the interrupt controller pool.
> 
> The interrupt controller device

creation

> would still iterate through all vcpus to find the ones that match so that we support the ENABLE_CAP at any point in time.

Actually, thinking about this a bit more. If we had explicit interrupt connections, user space would take care of all this:

-- machine init --

for (i = 0; i < smp_cpus; i++) {
    cpus[i] = create_cpu();
}

mpic = create_device(DEVICE_MPIC);
for (i = 0; i < smp_cpus; i++) {
    enable_cap(cpus[i], CAP_MPIC_LISTENER);
    mpic_hook_up_irqline(mpic, i, cpus[i]);
}

-- hotplug add --

i = smp_cpus++;
cpus[i] = create_cpu();
enable_cap(cpus[i], CAP_MPIC_LISTENER);
mpic_hook_up_irqline(mpic, i, cpus[i]);


Then we don't care about any ordering at all anymore from KVM's perspective. Alternatively, the above code could live inside kvm as well of course. create_vcpu() would have to register itself with "the interrupt controller" then to allow for hotplug.


Alex

* Re: in-kernel interrupt controller steering
  2013-03-06 14:59                                 ` Alexander Graf
@ 2013-03-06 15:02                                   ` Paolo Bonzini
  0 siblings, 0 replies; 47+ messages in thread
From: Paolo Bonzini @ 2013-03-06 15:02 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Gleb Natapov, kvm, kvm-ppc, Stuart Yoder, Scott Wood,
	Paul Mackerras, Peter Maydell

On 06/03/2013 15:59, Alexander Graf wrote:
> Then we don't care about any ordering at all anymore from KVM's
> perspective. Alternatively, the above code could live inside kvm as
> well of course. create_vcpu() would have to register itself with "the
> interrupt controller" then to allow for hotplug.

One of the series from Scott already does that.  He adds a notifier for
the creation of VCPUs and hooks the MPIC to that notifier.
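
For readers who have not seen that series, the notifier pattern can be sketched roughly as follows. This is not Scott's code; the names (add_vcpu_create_notifier, mpic_vcpu_created, struct mpic) are invented here purely to show the shape of the idea.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NOTIFIERS 4
#define MAX_VCPUS     8

typedef void (*vcpu_create_fn)(int vcpu_id, void *opaque);

struct notifier {
    vcpu_create_fn fn;
    void *opaque;
};

static struct notifier notifiers[MAX_NOTIFIERS];
static size_t nr_notifiers;

/* Register a callback that fires whenever a vcpu is created. */
static int add_vcpu_create_notifier(vcpu_create_fn fn, void *opaque)
{
    if (nr_notifiers == MAX_NOTIFIERS)
        return -1;
    notifiers[nr_notifiers].fn = fn;
    notifiers[nr_notifiers].opaque = opaque;
    nr_notifiers++;
    return 0;
}

/* vcpu creation walks the notifier list; this also covers hotplug. */
static void create_vcpu(int vcpu_id)
{
    for (size_t i = 0; i < nr_notifiers; i++)
        notifiers[i].fn(vcpu_id, notifiers[i].opaque);
}

/* Hypothetical controller state: the vcpus it can deliver to. */
struct mpic {
    int cpu_ids[MAX_VCPUS];
    size_t nr_cpus;
};

static void mpic_vcpu_created(int vcpu_id, void *opaque)
{
    struct mpic *m = opaque;

    assert(m->nr_cpus < MAX_VCPUS);
    m->cpu_ids[m->nr_cpus++] = vcpu_id;
}
```

A vcpu created before the notifier is registered is not seen by this sketch; a real implementation would additionally have to scan the existing vcpus when the controller is created.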

BTW, I don't think the enable_cap is necessary.

Paolo

* Re: in-kernel interrupt controller steering
  2013-03-06 14:48                               ` Alexander Graf
  2013-03-06 14:59                                 ` Alexander Graf
@ 2013-03-06 15:30                                 ` Gleb Natapov
  2013-03-06 16:33                                   ` Alexander Graf
  2013-03-07  0:32                                 ` Paul Mackerras
  2 siblings, 1 reply; 47+ messages in thread
From: Gleb Natapov @ 2013-03-06 15:30 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Paolo Bonzini, kvm, kvm-ppc, Stuart Yoder, Scott Wood,
	Paul Mackerras, Peter Maydell

On Wed, Mar 06, 2013 at 03:48:54PM +0100, Alexander Graf wrote:
> 
> On 06.03.2013, at 15:41, Gleb Natapov wrote:
> 
> > On Wed, Mar 06, 2013 at 03:03:53PM +0100, Alexander Graf wrote:
> >> 
> >> On 06.03.2013, at 14:56, Gleb Natapov wrote:
> >> 
> >>> On Wed, Mar 06, 2013 at 02:22:15PM +0100, Alexander Graf wrote:
> >>>> 
> >>>> On 06.03.2013, at 14:14, Gleb Natapov wrote:
> >>>> 
> >>>>> On Wed, Mar 06, 2013 at 01:20:39PM +0100, Alexander Graf wrote:
> >>>>>>> The problem would only start if KVM_SET_IRQCHIP_TYPE (new name of
> >>>>>>> KVM_CREATE_IRQCHIP_ARGS) forced you to later call KVM_CREATE_DEVICE.
> >>>>>> 
> >>>>>> Ah, I see. I don't see why it would. The fact that there is a "LAPIC" doesn't mean that the per-vcpu SET_INTERRUPT ioctl stops working. So if SET_IRQCHIP_TYPE(!none) breaks user-space interrupt controller emulation I would consider that a bug.
> >>>>>> 
> >>>>> For x86 this is the case though. I do not see how it can't be. If
> >>>>> LAPIC is emulated in userspace SET_INTERRUPT is used to pass IRQ
> >>>>> vector that should be handled as a result of LAPIC emulation.
> >>>> 
> >>>> So SET_INTERRUPT on a vcpu triggers a line on the LAPIC emulation in that vcpu? For us it directly controls the CPU interrupt pin.
> >>>> 
> >>> No SET_INTERRUPT on a vcpu tells vcpu to which vector in IDT it needs to
> >>> jump immediately. LAPIC is really part of a cpu and we cut it and put into
> >>> userspace, so interface between userspace LAPIC emulation is really low
> >>> level and has to be synchronous. X86 has two interrupt lines NMI and INTR
> >>> and we do not have interface to trigger the later.  KVM_IRQ_LINE works on
> >>> GSI lines which do not go into CPU directly. They go either via PIC (which
> >>> triggers INTR or APIC LINT0) or via IOAPIC which on real HW communicates
> >>> with APICs via bus, but in our emulation just calls APICs directly.
> >> 
> >> Great :). It's similar for us. SET_INTERRUPT directly asserts the INTR line of the vcpu. There is nothing like an IDT on PPC, so external interrupts simply arrive at a specific vector. That vector can differ for critical or NMI interrupts IIRC, but I'm not sure we implement that right now. If so, it'd be a different line for SET_INTERRUPT.
> >> 
> >> So in a way, it's the same. And SET_INTERRUPT should work regardless of whether a LAPIC is used or not really. At least it would for us :).
> >> 
> > Is it possible for some devices to inject interrupt directly and other
> > to go through interrupt controller?
> 
> It would be racy if both assert + deassert the same line, but I don't see why we should keep anyone from doing it. If user space wants to run such a configuration, it needs to ensure that only one of the 2 is actively used at any given time.
> 
> >> KVM_IRQ_LINE is basically an IOAPIC interrupt line assert. That's fine. That ioctl should get an ioapic device handle to work on. Whether we call the IOAPIC PINs GSIs or something different is really just a naming question. I'd probably call it IRQ number :).
> > Yes and no. On sane archs we can call it IRQ number (lucky you!), but on
> > X86 there is a GSI that can be IRQ2 if it goes through IOAPIC and IRQ0
> > if it goes through PIC, so additional entity was invented: irq routing.
> > It maps between GSI and irqchips pin. Same GSI may go to more than one
> > irqchip. This is why for x86 having irqchip device handle as a parameter
> > to KVM_IRQ_LINE does not make sense. It make sense to provide it to irq
> > router and this is how it work now except that "device handlers" are
> > hard coded.
> 
> Then you would create a new "irq router" device that does the multiplexing and can also receive IRQs. You could then directly assert an IOAPIC/PIC line or a multiplexer line. Or am I misunderstanding something?
> 
The usefulness of such flexibility is questionable, but you are right, it can be implemented this way.

> > 
> >> But it's the same idea. The "IOAPIC" would then talk to to in-kernel "LAPIC" style bits (or in case of the MPIC just integrate them inside of itself). That's why by the time we create an "IOAPIC", the "LAPIC"s in the system have to be populated.
> > The restriction that LAPIC has to be created before IOAPIC would be a
> > bug that need to be fixed on X86. The reason is cpu hotplug. If you have
> > to support cpu hotplug you have to be able to create LAPICs after IOAPIC
> > and at this point you can create IOAPIC before any LAPICs as well. I
> > understand this may not be the case for all architectures right now, but
> > something to keep in mind.
> 
> Paul, Scott, do you think we can move the "this CPU can receive interrupts from MPIC / XICS" part into an ENABLE_CAP that gets set dynamically? That ENABLE_CAP would allocate the structures in the vcpu and register the vcpu with the interrupt controller pool.
> 
> The interrupt controller device would still iterate through all vcpus to find the ones that match so that we support the ENABLE_CAP at any point in time.
> 
> > 
> >> 
> >> So again, I'm failing to see where we think differently :).
> >> 
> > The difference is very minor really. I still try to justify to myself
> > why we need separate ioctl() to announce what irqchip we are going to
> > create before creating one (except save QEMU some troubles). The question
> > is: is this ioctl can be useful by itself? Seems like unlikely scenario
> > that we will allow IOAPIC/PIC emulation in uesrspace while LAPIC is in
> > kernel may be such case. QEMU will call it before creating vcpus to
> > tell KVM that LAPICs need to be created along with VCPUs, but no
> > irqchip will be created.
> 
> I don't have a real answer for you yet, but so far the general design mantra of "small, individual pieces that plug together" worked out way better for us than the "have one call that does it all" one. Being explicit simply makes sure that we support more scenarios we don't think of today.
> 
Suppose we are going with the IRQ_CHIP_ARCH ioctl. What happens if
userspace calls ioctl(IRQ_CHIP_ARCH, MPIC) and then tries to call KVM_RUN
before creating the MPIC device?

--
			Gleb.

* Re: in-kernel interrupt controller steering
  2013-03-06 15:30                                 ` Gleb Natapov
@ 2013-03-06 16:33                                   ` Alexander Graf
  0 siblings, 0 replies; 47+ messages in thread
From: Alexander Graf @ 2013-03-06 16:33 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Paolo Bonzini, kvm@vger.kernel.org, kvm-ppc@vger.kernel.org,
	Stuart Yoder, Scott Wood, Paul Mackerras, Peter Maydell



On 06.03.2013 at 16:30, Gleb Natapov <gleb@redhat.com> wrote:

> On Wed, Mar 06, 2013 at 03:48:54PM +0100, Alexander Graf wrote:
>> 
>> On 06.03.2013, at 15:41, Gleb Natapov wrote:
>> 
>>> On Wed, Mar 06, 2013 at 03:03:53PM +0100, Alexander Graf wrote:
>>>> 
>>>> On 06.03.2013, at 14:56, Gleb Natapov wrote:
>>>> 
>>>>> On Wed, Mar 06, 2013 at 02:22:15PM +0100, Alexander Graf wrote:
>>>>>> 
>>>>>> On 06.03.2013, at 14:14, Gleb Natapov wrote:
>>>>>> 
>>>>>>> On Wed, Mar 06, 2013 at 01:20:39PM +0100, Alexander Graf wrote:
>>>>>>>>> The problem would only start if KVM_SET_IRQCHIP_TYPE (new name of
>>>>>>>>> KVM_CREATE_IRQCHIP_ARGS) forced you to later call KVM_CREATE_DEVICE.
>>>>>>>> 
>>>>>>>> Ah, I see. I don't see why it would. The fact that there is a "LAPIC" doesn't mean that the per-vcpu SET_INTERRUPT ioctl stops working. So if SET_IRQCHIP_TYPE(!none) breaks user-space interrupt controller emulation I would consider that a bug.
>>>>>>> For x86 this is the case though. I do not see how it can't be. If
>>>>>>> LAPIC is emulated in userspace SET_INTERRUPT is used to pass IRQ
>>>>>>> vector that should be handled as a result of LAPIC emulation.
>>>>>> 
>>>>>> So SET_INTERRUPT on a vcpu triggers a line on the LAPIC emulation in that vcpu? For us it directly controls the CPU interrupt pin.
>>>>> No SET_INTERRUPT on a vcpu tells vcpu to which vector in IDT it needs to
>>>>> jump immediately. LAPIC is really part of a cpu and we cut it and put into
>>>>> userspace, so interface between userspace LAPIC emulation is really low
>>>>> level and has to be synchronous. X86 has two interrupt lines NMI and INTR
>>>>> and we do not have interface to trigger the later.  KVM_IRQ_LINE works on
>>>>> GSI lines which do not go into CPU directly. They go either via PIC (which
>>>>> triggers INTR or APIC LINT0) or via IOAPIC which on real HW communicates
>>>>> with APICs via bus, but in our emulation just calls APICs directly.
>>>> 
>>>> Great :). It's similar for us. SET_INTERRUPT directly asserts the INTR line of the vcpu. There is nothing like an IDT on PPC, so external interrupts simply arrive at a specific vector. That vector can differ for critical or NMI interrupts IIRC, but I'm not sure we implement that right now. If so, it'd be a different line for SET_INTERRUPT.
>>>> 
>>>> So in a way, it's the same. And SET_INTERRUPT should work regardless of whether a LAPIC is used or not really. At least it would for us :).
>>> Is it possible for some devices to inject interrupt directly and other
>>> to go through interrupt controller?
>> 
>> It would be racy if both assert + deassert the same line, but I don't see why we should keep anyone from doing it. If user space wants to run such a configuration, it needs to ensure that only one of the 2 is actively used at any given time.
>> 
>>>> KVM_IRQ_LINE is basically an IOAPIC interrupt line assert. That's fine. That ioctl should get an ioapic device handle to work on. Whether we call the IOAPIC PINs GSIs or something different is really just a naming question. I'd probably call it IRQ number :).
>>> Yes and no. On sane archs we can call it IRQ number (lucky you!), but on
>>> X86 there is a GSI that can be IRQ2 if it goes through IOAPIC and IRQ0
>>> if it goes through PIC, so additional entity was invented: irq routing.
>>> It maps between GSI and irqchips pin. Same GSI may go to more than one
>>> irqchip. This is why for x86 having irqchip device handle as a parameter
>>> to KVM_IRQ_LINE does not make sense. It make sense to provide it to irq
>>> router and this is how it work now except that "device handlers" are
>>> hard coded.
>> 
>> Then you would create a new "irq router" device that does the multiplexing and can also receive IRQs. You could then directly assert an IOAPIC/PIC line or a multiplexer line. Or am I misunderstanding something?
> The usefulness of such flexibility is questionable, but you are right, it can be implemented this way.
> 
>>> 
>>>> But it's the same idea. The "IOAPIC" would then talk to to in-kernel "LAPIC" style bits (or in case of the MPIC just integrate them inside of itself). That's why by the time we create an "IOAPIC", the "LAPIC"s in the system have to be populated.
>>> The restriction that LAPIC has to be created before IOAPIC would be a
>>> bug that need to be fixed on X86. The reason is cpu hotplug. If you have
>>> to support cpu hotplug you have to be able to create LAPICs after IOAPIC
>>> and at this point you can create IOAPIC before any LAPICs as well. I
>>> understand this may not be the case for all architectures right now, but
>>> something to keep in mind.
>> 
>> Paul, Scott, do you think we can move the "this CPU can receive interrupts from MPIC / XICS" part into an ENABLE_CAP that gets set dynamically? That ENABLE_CAP would allocate the structures in the vcpu and register the vcpu with the interrupt controller pool.
>> 
>> The interrupt controller device would still iterate through all vcpus to find the ones that match so that we support the ENABLE_CAP at any point in time.
>> 
>>> 
>>>> 
>>>> So again, I'm failing to see where we think differently :).
>>> The difference is very minor really. I still try to justify to myself
>>> why we need separate ioctl() to announce what irqchip we are going to
>>> create before creating one (except save QEMU some troubles). The question
>>> is: is this ioctl can be useful by itself? Seems like unlikely scenario
>>> that we will allow IOAPIC/PIC emulation in uesrspace while LAPIC is in
>>> kernel may be such case. QEMU will call it before creating vcpus to
>>> tell KVM that LAPICs need to be created along with VCPUs, but no
>>> irqchip will be created.
>> 
>> I don't have a real answer for you yet, but so far the general design mantra of "small, individual pieces that plug together" worked out way better for us than the "have one call that does it all" one. Being explicit simply makes sure that we support more scenarios we don't think of today.
> Suppose we are going with the IRQ_CHIP_ARCH ioctl. What happens if
> userspace calls ioctl(IRQ_CHIP_ARCH, MPIC) and tries to call KVM_RUN
> before creating MPIC device?

User space can't access the MPIC :). So it has to do SET_INTERRUPT on vcpus like it does when it doesn't set the irq arch.
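
As a small model of that state, in C: the irq arch is set, no MPIC device exists yet, and the per-vcpu interrupt pin keeps working. The struct layout, function names and the choice of -ENODEV are assumptions made for this sketch only, not what any patch currently does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct vm {
    bool arch_is_mpic;      /* set by the hypothetical IRQ_CHIP_ARCH call */
    bool mpic_created;      /* set once the MPIC device has been created */
};

struct vcpu {
    struct vm *vm;
    int ext_irq;            /* level of the external interrupt pin */
};

/* Per-vcpu injection does not depend on the irq arch or on any device. */
static int vcpu_set_interrupt(struct vcpu *v, int level)
{
    assert(v);
    v->ext_irq = level ? 1 : 0;
    return 0;
}

/* Controller-level injection needs the device to exist first. */
static int mpic_irq_line(struct vm *vm, unsigned irq, int level)
{
    (void)irq;
    (void)level;
    if (!vm->mpic_created)
        return -ENODEV;     /* illustrative error code */
    return 0;
}
```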

Alex

> 
> --
>            Gleb.

* Re: in-kernel interrupt controller steering
  2013-03-06 14:31                           ` Alexander Graf
@ 2013-03-06 18:46                             ` Peter Maydell
  2013-03-06 19:20                               ` Alexander Graf
  0 siblings, 1 reply; 47+ messages in thread
From: Peter Maydell @ 2013-03-06 18:46 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Gleb Natapov, Paolo Bonzini, kvm, kvm-ppc, Stuart Yoder,
	Scott Wood, Paul Mackerras

On 6 March 2013 22:31, Alexander Graf <agraf@suse.de> wrote:
> On 06.03.2013, at 15:11, Gleb Natapov wrote:
>> KVM_INTERRUPT is synchronous.
>
> Ah, I think that's where my thinko is. On PPC, KVM_INTERRUPT is
> asynchronous.

As an aside, would it be worth moving PPC into line with other
archs by providing and using a KVM_IRQ_LINE for async
interrupts (ie continuing to provide an asynchronous KVM_INTERRUPT
but only as a deprecated backward compatibility thing) ?

-- PMM

* Re: in-kernel interrupt controller steering
  2013-03-06 18:46                             ` Peter Maydell
@ 2013-03-06 19:20                               ` Alexander Graf
  0 siblings, 0 replies; 47+ messages in thread
From: Alexander Graf @ 2013-03-06 19:20 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Gleb Natapov, Paolo Bonzini, kvm@vger.kernel.org,
	kvm-ppc@vger.kernel.org, Stuart Yoder, Scott Wood, Paul Mackerras



On 06.03.2013 at 19:46, Peter Maydell <peter.maydell@linaro.org> wrote:

> On 6 March 2013 22:31, Alexander Graf <agraf@suse.de> wrote:
>> On 06.03.2013, at 15:11, Gleb Natapov wrote:
>>> KVM_INTERRUPT is synchronous.
>> 
>> Ah, I think that's where my thinko is. On PPC, KVM_INTERRUPT is
>> asynchronous.
> 
> As an aside, would it be worth moving PPC into line with other
> archs by providing and using a KVM_IRQ_LINE for async
> interrupts (ie continuing to provide an asynchronous KVM_INTERRUPT
> but only as a deprecated backward compatibility thing) ?

Since we need to maintain backwards compatibility anyway, I don't think it makes sense; it would only clutter the code :)

Alex

> 
> -- PMM

* Re: in-kernel interrupt controller steering
  2013-03-06 14:48                               ` Alexander Graf
  2013-03-06 14:59                                 ` Alexander Graf
  2013-03-06 15:30                                 ` Gleb Natapov
@ 2013-03-07  0:32                                 ` Paul Mackerras
  2013-03-07  7:43                                   ` Paolo Bonzini
  2 siblings, 1 reply; 47+ messages in thread
From: Paul Mackerras @ 2013-03-07  0:32 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Gleb Natapov, Paolo Bonzini, kvm, kvm-ppc, Stuart Yoder,
	Scott Wood, Peter Maydell

On Wed, Mar 06, 2013 at 03:48:54PM +0100, Alexander Graf wrote:
> 
> Paul, Scott, do you think we can move the "this CPU can receive
> interrupts from MPIC / XICS" part into an ENABLE_CAP that gets set
> dynamically? That ENABLE_CAP would allocate the structures in the vcpu
> and register the vcpu with the interrupt controller pool.
> 
> The interrupt controller device would still iterate through all
> vcpus to find the ones that match so that we support the ENABLE_CAP at
> any point in time. 

When you say "gets set dynamically", do you mean some time in the
interval between vcpu creation and when it starts running, or do you
mean at any time, potentially after the vcpu has accessed and modified
its per-vcpu interrupt controller (~ LAPIC) state?

If the former, then sure, I don't see a major problem.  If the latter,
then we'd have to atomically transfer the "LAPIC" state from userspace
to the kernel at the same time as we did the ENABLE_CAP - which is
certainly possible, but we'd need the vcpu to be not running at the
time.
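
The simple case (enable only before the vcpu has ever run) could be enforced along these lines. This is a sketch under that assumption; the has_run flag, the function names and the -EBUSY return are illustrative and do not describe existing code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct vcpu {
    bool has_run;           /* set the first time the vcpu enters the guest */
    bool in_kernel_irq;     /* per-vcpu "LAPIC"-like state lives in the kernel */
};

/* Stand-in for the ENABLE_CAP: only allowed before the first run. */
static int enable_irq_listener(struct vcpu *v)
{
    assert(v);
    if (v->has_run)
        return -EBUSY;      /* would need a state transfer from userspace */
    v->in_kernel_irq = true;
    return 0;
}

static void run_vcpu(struct vcpu *v)
{
    v->has_run = true;
}
```

Rejecting the late case keeps the interface free of any atomic state hand-over between userspace and the kernel.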

Paul.

* Re: in-kernel interrupt controller steering
  2013-03-07  0:32                                 ` Paul Mackerras
@ 2013-03-07  7:43                                   ` Paolo Bonzini
  0 siblings, 0 replies; 47+ messages in thread
From: Paolo Bonzini @ 2013-03-07  7:43 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Gleb Natapov, kvm, kvm-ppc, Stuart Yoder, Scott Wood,
	Peter Maydell, Alexander Graf


> On Wed, Mar 06, 2013 at 03:48:54PM +0100, Alexander Graf wrote:
> > 
> > Paul, Scott, do you think we can move the "this CPU can receive
> > interrupts from MPIC / XICS" part into an ENABLE_CAP that gets set
> > dynamically? That ENABLE_CAP would allocate the structures in the
> > vcpu and register the vcpu with the interrupt controller pool.
> > 
> > The interrupt controller device would still iterate through all
> > vcpus to find the ones that match so that we support the ENABLE_CAP
> > at any point in time.
> 
> When you say "gets set dynamically", do you mean some time in the
> interval between vcpu creation and when it starts running, or do you
> mean at any time, potentially after the vcpu has accessed and
> modified
> its per-vcpu interrupt controller (~ LAPIC) state?
> 
> If the former, then sure, I don't see a major problem.

Only the former.  But I don't think you need a capability even.
KVM_SET_IRQCHIP_TYPE should force the usage of in-kernel per-VCPU
interrupt controllers.
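
In code, the behaviour I have in mind would look roughly like the sketch below. The enum values, struct fields and the -EBUSY return are assumptions for illustration; only the rule itself (type frozen once the first vcpu exists, vcpus created afterwards get in-kernel per-VCPU state) comes from this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum irqchip_type { IRQCHIP_NONE, IRQCHIP_MPIC, IRQCHIP_XICS };

struct vm {
    enum irqchip_type type;
    int nr_vcpus;
};

struct vcpu {
    bool has_local_irqchip;     /* in-kernel per-VCPU controller state */
};

/* Stand-in for KVM_SET_IRQCHIP_TYPE: frozen once a vcpu exists. */
static int set_irqchip_type(struct vm *vm, enum irqchip_type type)
{
    if (vm->nr_vcpus > 0)
        return -EBUSY;
    vm->type = type;
    return 0;
}

/* vcpu creation consults the VM-wide type; no extra capability needed. */
static void create_vcpu(struct vm *vm, struct vcpu *v)
{
    assert(vm && v);
    v->has_local_irqchip = (vm->type != IRQCHIP_NONE);
    vm->nr_vcpus++;
}
```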

Paolo

Thread overview: 47+ messages
2013-03-04 22:20 in-kernel interrupt controller steering Alexander Graf
2013-03-05  0:59 ` Scott Wood
2013-03-05  5:44   ` Paul Mackerras
2013-03-05 15:25 ` Gleb Natapov
2013-03-06  9:40   ` Paolo Bonzini
2013-03-06  9:58     ` Gleb Natapov
2013-03-06 10:04       ` Alexander Graf
2013-03-06 10:12         ` Gleb Natapov
2013-03-06 10:38       ` Paolo Bonzini
2013-03-06 10:38       ` Paolo Bonzini
2013-03-06 11:26         ` Gleb Natapov
2013-03-06 11:44           ` Paolo Bonzini
2013-03-06 11:46             ` Alexander Graf
2013-03-06 11:59               ` Gleb Natapov
2013-03-06 12:02                 ` Alexander Graf
2013-03-06 12:14                 ` Paolo Bonzini
2013-03-06 12:20                   ` Alexander Graf
2013-03-06 12:28                     ` Paolo Bonzini
2013-03-06 13:14                     ` Gleb Natapov
2013-03-06 13:22                       ` Alexander Graf
2013-03-06 13:56                         ` Gleb Natapov
2013-03-06 14:03                           ` Alexander Graf
2013-03-06 14:12                             ` Paolo Bonzini
2013-03-06 14:30                               ` Alexander Graf
2013-03-06 14:37                                 ` Paolo Bonzini
2013-03-06 14:40                                   ` Alexander Graf
2013-03-06 14:41                             ` Gleb Natapov
2013-03-06 14:48                               ` Alexander Graf
2013-03-06 14:59                                 ` Alexander Graf
2013-03-06 15:02                                   ` Paolo Bonzini
2013-03-06 15:30                                 ` Gleb Natapov
2013-03-06 16:33                                   ` Alexander Graf
2013-03-07  0:32                                 ` Paul Mackerras
2013-03-07  7:43                                   ` Paolo Bonzini
2013-03-06 13:41                       ` Paolo Bonzini
2013-03-06 14:11                         ` Gleb Natapov
2013-03-06 14:31                           ` Alexander Graf
2013-03-06 18:46                             ` Peter Maydell
2013-03-06 19:20                               ` Alexander Graf
2013-03-06 11:44           ` Alexander Graf
2013-03-06 11:46             ` Paolo Bonzini
2013-03-06 11:47               ` Alexander Graf
2013-03-06 11:57                 ` Paolo Bonzini
2013-03-06 11:58                   ` Alexander Graf
2013-03-06 13:16                     ` Gleb Natapov
2013-03-06  0:23 ` Benjamin Herrenschmidt
2013-03-06  0:33   ` Alexander Graf
