* [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-14 0:04 Christoffer Dall
2012-10-14 0:04 ` [RFC PATCH 1/3] KVM: ARM: Introduce KVM_INIT_IRQCHIP ioctl Christoffer Dall
` (3 more replies)
0 siblings, 4 replies; 59+ messages in thread
From: Christoffer Dall @ 2012-10-14 0:04 UTC (permalink / raw)
To: kvmarm, kvm; +Cc: peter.maydell, marc.zyngier, Christoffer Dall
*** warning: this RFC patch series is only compile-tested ***
We need a way to specify the address at which we expect VMs to access
the interrupt controller (both the emulated distributor and the hardware
interface supporting virtualization). User space should decide on this
address as user space decides on an emulated board and loads a device
tree describing these details directly to the guest.
Instead of modifying the copying KVM_CREATE_IRQCHIP to an ARM specific
ioctl with a a highly device specific set of parameters, we try
something slightly more generic, that should fit well with how user
space (read QEMU) first builds the individual devices and later sets up
the emulated platform.
Comments welcome!
Christoffer Dall (3):
KVM: ARM: Introduce KVM_INIT_IRQCHIP ioctl
KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl
KVM: ARM: Split KVM_CREATE_IRQCHIP and KVM_INIT_IRQCHIP
Documentation/virtual/kvm/api.txt | 46 +++++++++++++++++++++++
arch/arm/include/asm/kvm.h | 13 +++++++
arch/arm/include/asm/kvm_mmu.h | 1 +
arch/arm/include/asm/kvm_vgic.h | 12 ++++++
arch/arm/kvm/arm.c | 38 ++++++++++++++++++-
arch/arm/kvm/vgic.c | 74 +++++++++++++++++++++++++++++++------
include/linux/kvm.h | 11 ++++++
7 files changed, 183 insertions(+), 12 deletions(-)
--
1.7.9.5
^ permalink raw reply [flat|nested] 59+ messages in thread* [RFC PATCH 1/3] KVM: ARM: Introduce KVM_INIT_IRQCHIP ioctl 2012-10-14 0:04 [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses Christoffer Dall @ 2012-10-14 0:04 ` Christoffer Dall 2012-10-17 20:21 ` Peter Maydell 2012-10-18 12:20 ` Avi Kivity 2012-10-14 0:04 ` [RFC PATCH 2/3] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl Christoffer Dall ` (2 subsequent siblings) 3 siblings, 2 replies; 59+ messages in thread From: Christoffer Dall @ 2012-10-14 0:04 UTC (permalink / raw) To: kvmarm, kvm; +Cc: peter.maydell, marc.zyngier, Christoffer Dall Used to initialize the in-kernel interrupt controller. On ARM we need to map the virtual generic interrupt controller (vGIC) into Hyp the guest's physicall address space so the guest can access the virtual cpu interface. This must be done after the IRQ chips is create and after a base address has been provided for the emulated platform (patch is following), but before the CPU is initally run. Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com> --- Documentation/virtual/kvm/api.txt | 16 ++++++++++++++++ arch/arm/kvm/arm.c | 1 + include/linux/kvm.h | 3 +++ 3 files changed, 20 insertions(+) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 25eacc6..26e953d 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2102,6 +2102,22 @@ This ioctl returns the guest registers that are supported for the KVM_GET_ONE_REG/KVM_SET_ONE_REG calls. +4.79 KVM_INIT_IRQCHIP + +Capability: KVM_CAP_INIT_IRQCHIP +Architectures: arm +Type: vm ioctl +Parameters: none +Returns: 0 on success, -1 on error + +Initialize the in-kernel interrupt controller. On ARM we need to map the +virtual generic interrupt controller (vGIC) into Hyp the guest's physicall +address space so the guest can access the virtual cpu interface. This must be +done after the IRQ chips is create and after a base address has been provided +for the emulated platofrm (see KVM_SET_DEVICE_ADDRESS), but before the CPU is +initally run. + + 5. The kvm_run structure ------------------------ diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index f8c377b..85c76e4 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -195,6 +195,7 @@ int kvm_dev_ioctl_check_extension(long ext) switch (ext) { #ifdef CONFIG_KVM_ARM_VGIC case KVM_CAP_IRQCHIP: + case KVM_CAP_INIT_IRQCHIP: r = vgic_present; break; #endif diff --git a/include/linux/kvm.h b/include/linux/kvm.h index 8091b1d..90ee023 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -626,6 +626,7 @@ struct kvm_ppc_smmu_info { #ifdef __KVM_HAVE_READONLY_MEM #define KVM_CAP_READONLY_MEM 81 #endif +#define KVM_CAP_INIT_IRQCHIP 82 #ifdef KVM_CAP_IRQ_ROUTING @@ -839,6 +840,8 @@ struct kvm_s390_ucas_mapping { #define KVM_PPC_GET_SMMU_INFO _IOR(KVMIO, 0xa6, struct kvm_ppc_smmu_info) /* Available with KVM_CAP_PPC_ALLOC_HTAB */ #define KVM_PPC_ALLOCATE_HTAB _IOWR(KVMIO, 0xa7, __u32) +/* Available with KVM_CAP_INIT_IRQCHIP */ +#define KVM_INIT_IRQCHIP _IO(KVMIO, 0xa8) /* * ioctls for vcpu fds -- 1.7.9.5 ^ permalink raw reply related [flat|nested] 59+ messages in thread
* Re: [RFC PATCH 1/3] KVM: ARM: Introduce KVM_INIT_IRQCHIP ioctl 2012-10-14 0:04 ` [RFC PATCH 1/3] KVM: ARM: Introduce KVM_INIT_IRQCHIP ioctl Christoffer Dall @ 2012-10-17 20:21 ` Peter Maydell 2012-10-17 20:23 ` Christoffer Dall 2012-10-18 12:20 ` Avi Kivity 1 sibling, 1 reply; 59+ messages in thread From: Peter Maydell @ 2012-10-17 20:21 UTC (permalink / raw) To: Christoffer Dall; +Cc: kvmarm, kvm, marc.zyngier On 14 October 2012 01:04, Christoffer Dall <c.dall@virtualopensystems.com> wrote: > Used to initialize the in-kernel interrupt controller. On ARM we need to > map the virtual generic interrupt controller (vGIC) into Hyp the guest's > physicall address space so the guest can access the virtual cpu > interface. This must be done after the IRQ chips is create and after a > base address has been provided for the emulated platform (patch is > following), but before the CPU is initally run. I've now written the code for that patch but don't have access to a machine with the ARM cross compile setup to build it until tomorrow. > > Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com> > --- > Documentation/virtual/kvm/api.txt | 16 ++++++++++++++++ > arch/arm/kvm/arm.c | 1 + > include/linux/kvm.h | 3 +++ > 3 files changed, 20 insertions(+) > > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt > index 25eacc6..26e953d 100644 > --- a/Documentation/virtual/kvm/api.txt > +++ b/Documentation/virtual/kvm/api.txt > @@ -2102,6 +2102,22 @@ This ioctl returns the guest registers that are supported for the > KVM_GET_ONE_REG/KVM_SET_ONE_REG calls. > > > +4.79 KVM_INIT_IRQCHIP > + > +Capability: KVM_CAP_INIT_IRQCHIP > +Architectures: arm > +Type: vm ioctl > +Parameters: none > +Returns: 0 on success, -1 on error > + > +Initialize the in-kernel interrupt controller. On ARM we need to map the > +virtual generic interrupt controller (vGIC) into Hyp the guest's physicall Should that "Hyp" be deleted? "physical" > +address space so the guest can access the virtual cpu interface. This must be > +done after the IRQ chips is create and after a base address has been provided "chip". "created". > +for the emulated platofrm (see KVM_SET_DEVICE_ADDRESS), but before the CPU is > +initally run. "initially". (all these typos are also in your commit message) > + > + > 5. The kvm_run structure > ------------------------ > > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c > index f8c377b..85c76e4 100644 > --- a/arch/arm/kvm/arm.c > +++ b/arch/arm/kvm/arm.c > @@ -195,6 +195,7 @@ int kvm_dev_ioctl_check_extension(long ext) > switch (ext) { > #ifdef CONFIG_KVM_ARM_VGIC > case KVM_CAP_IRQCHIP: > + case KVM_CAP_INIT_IRQCHIP: > r = vgic_present; > break; > #endif > diff --git a/include/linux/kvm.h b/include/linux/kvm.h > index 8091b1d..90ee023 100644 > --- a/include/linux/kvm.h > +++ b/include/linux/kvm.h > @@ -626,6 +626,7 @@ struct kvm_ppc_smmu_info { > #ifdef __KVM_HAVE_READONLY_MEM > #define KVM_CAP_READONLY_MEM 81 > #endif > +#define KVM_CAP_INIT_IRQCHIP 82 > > #ifdef KVM_CAP_IRQ_ROUTING > > @@ -839,6 +840,8 @@ struct kvm_s390_ucas_mapping { > #define KVM_PPC_GET_SMMU_INFO _IOR(KVMIO, 0xa6, struct kvm_ppc_smmu_info) > /* Available with KVM_CAP_PPC_ALLOC_HTAB */ > #define KVM_PPC_ALLOCATE_HTAB _IOWR(KVMIO, 0xa7, __u32) > +/* Available with KVM_CAP_INIT_IRQCHIP */ > +#define KVM_INIT_IRQCHIP _IO(KVMIO, 0xa8) > > /* > * ioctls for vcpu fds > -- > 1.7.9.5 > -- PMM ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC PATCH 1/3] KVM: ARM: Introduce KVM_INIT_IRQCHIP ioctl 2012-10-17 20:21 ` Peter Maydell @ 2012-10-17 20:23 ` Christoffer Dall 2012-10-17 20:31 ` Peter Maydell 0 siblings, 1 reply; 59+ messages in thread From: Christoffer Dall @ 2012-10-17 20:23 UTC (permalink / raw) To: Peter Maydell; +Cc: kvmarm, kvm, marc.zyngier On Wed, Oct 17, 2012 at 4:21 PM, Peter Maydell <peter.maydell@linaro.org> wrote: > On 14 October 2012 01:04, Christoffer Dall > <c.dall@virtualopensystems.com> wrote: >> Used to initialize the in-kernel interrupt controller. On ARM we need to >> map the virtual generic interrupt controller (vGIC) into Hyp the guest's >> physicall address space so the guest can access the virtual cpu >> interface. This must be done after the IRQ chips is create and after a >> base address has been provided for the emulated platform (patch is >> following), but before the CPU is initally run. > > I've now written the code for that patch but don't have access to a machine > with the ARM cross compile setup to build it until tomorrow. > >> >> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com> >> --- >> Documentation/virtual/kvm/api.txt | 16 ++++++++++++++++ >> arch/arm/kvm/arm.c | 1 + >> include/linux/kvm.h | 3 +++ >> 3 files changed, 20 insertions(+) >> >> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt >> index 25eacc6..26e953d 100644 >> --- a/Documentation/virtual/kvm/api.txt >> +++ b/Documentation/virtual/kvm/api.txt >> @@ -2102,6 +2102,22 @@ This ioctl returns the guest registers that are supported for the >> KVM_GET_ONE_REG/KVM_SET_ONE_REG calls. >> >> >> +4.79 KVM_INIT_IRQCHIP >> + >> +Capability: KVM_CAP_INIT_IRQCHIP >> +Architectures: arm >> +Type: vm ioctl >> +Parameters: none >> +Returns: 0 on success, -1 on error >> + >> +Initialize the in-kernel interrupt controller. On ARM we need to map the >> +virtual generic interrupt controller (vGIC) into Hyp the guest's physicall > > Should that "Hyp" be deleted? yup > > "physical" > >> +address space so the guest can access the virtual cpu interface. This must be >> +done after the IRQ chips is create and after a base address has been provided > > "chip". "created". > >> +for the emulated platofrm (see KVM_SET_DEVICE_ADDRESS), but before the CPU is >> +initally run. > > "initially". thanks a bunch for those, and sorry about the sloppyness. > > (all these typos are also in your commit message) > yeah, you caught my -ECUTANDPASTE there ;) ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC PATCH 1/3] KVM: ARM: Introduce KVM_INIT_IRQCHIP ioctl 2012-10-17 20:23 ` Christoffer Dall @ 2012-10-17 20:31 ` Peter Maydell 2012-10-17 20:39 ` Christoffer Dall 0 siblings, 1 reply; 59+ messages in thread From: Peter Maydell @ 2012-10-17 20:31 UTC (permalink / raw) To: Christoffer Dall; +Cc: kvmarm, kvm, marc.zyngier On 17 October 2012 21:23, Christoffer Dall <c.dall@virtualopensystems.com> wrote: > On Wed, Oct 17, 2012 at 4:21 PM, Peter Maydell <peter.maydell@linaro.org> wrote: >>> +for the emulated platofrm (see KVM_SET_DEVICE_ADDRESS), but before the CPU is >>> +initally run. >> >> "initially". > > thanks a bunch for those, and sorry about the sloppyness. No problem. Also just noticed "platform" there :-) -- PMM ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC PATCH 1/3] KVM: ARM: Introduce KVM_INIT_IRQCHIP ioctl 2012-10-17 20:31 ` Peter Maydell @ 2012-10-17 20:39 ` Christoffer Dall 0 siblings, 0 replies; 59+ messages in thread From: Christoffer Dall @ 2012-10-17 20:39 UTC (permalink / raw) To: Peter Maydell; +Cc: kvmarm, kvm, marc.zyngier On Wed, Oct 17, 2012 at 4:31 PM, Peter Maydell <peter.maydell@linaro.org> wrote: > On 17 October 2012 21:23, Christoffer Dall > <c.dall@virtualopensystems.com> wrote: >> On Wed, Oct 17, 2012 at 4:21 PM, Peter Maydell <peter.maydell@linaro.org> wrote: >>>> +for the emulated platofrm (see KVM_SET_DEVICE_ADDRESS), but before the CPU is >>>> +initally run. >>> >>> "initially". >> >> thanks a bunch for those, and sorry about the sloppyness. > > No problem. Also just noticed "platform" there :-) > I'll spell check the diff just to be sure. :) ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC PATCH 1/3] KVM: ARM: Introduce KVM_INIT_IRQCHIP ioctl 2012-10-14 0:04 ` [RFC PATCH 1/3] KVM: ARM: Introduce KVM_INIT_IRQCHIP ioctl Christoffer Dall 2012-10-17 20:21 ` Peter Maydell @ 2012-10-18 12:20 ` Avi Kivity 2012-10-19 18:42 ` Christoffer Dall 1 sibling, 1 reply; 59+ messages in thread From: Avi Kivity @ 2012-10-18 12:20 UTC (permalink / raw) To: Christoffer Dall; +Cc: kvmarm, kvm, peter.maydell, marc.zyngier On 10/14/2012 02:04 AM, Christoffer Dall wrote: > Used to initialize the in-kernel interrupt controller. On ARM we need to > map the virtual generic interrupt controller (vGIC) into Hyp the guest's > physicall address space so the guest can access the virtual cpu > interface. This must be done after the IRQ chips is create and after a > base address has been provided for the emulated platform (patch is > following), but before the CPU is initally run. > > > +4.79 KVM_INIT_IRQCHIP > + > +Capability: KVM_CAP_INIT_IRQCHIP > +Architectures: arm > +Type: vm ioctl > +Parameters: none > +Returns: 0 on success, -1 on error > + > +Initialize the in-kernel interrupt controller. On ARM we need to map the > +virtual generic interrupt controller (vGIC) into Hyp the guest's physicall > +address space so the guest can access the virtual cpu interface. This must be > +done after the IRQ chips is create and after a base address has been provided > +for the emulated platofrm (see KVM_SET_DEVICE_ADDRESS), but before the CPU is > +initally run. > + What enforces this? Can it be done automatically? issue a kvm_make_request(KVM_REQ_INIT_IRQCHIP) on vcpu creation, and you'll automatically be notified before the first guest entry. Having an ioctl that must be called after point A but before point B seems pointless, when A and B are both known. > + > 5. The kvm_run structure > ------------------------ > > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c > index f8c377b..85c76e4 100644 > --- a/arch/arm/kvm/arm.c > +++ b/arch/arm/kvm/arm.c > @@ -195,6 +195,7 @@ int kvm_dev_ioctl_check_extension(long ext) > switch (ext) { > #ifdef CONFIG_KVM_ARM_VGIC > case KVM_CAP_IRQCHIP: > + case KVM_CAP_INIT_IRQCHIP: This could be part of a baseline, if you don't envision ever taking it out. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC PATCH 1/3] KVM: ARM: Introduce KVM_INIT_IRQCHIP ioctl 2012-10-18 12:20 ` Avi Kivity @ 2012-10-19 18:42 ` Christoffer Dall 0 siblings, 0 replies; 59+ messages in thread From: Christoffer Dall @ 2012-10-19 18:42 UTC (permalink / raw) To: Avi Kivity; +Cc: kvmarm, kvm, peter.maydell, marc.zyngier On Thu, Oct 18, 2012 at 8:20 AM, Avi Kivity <avi@redhat.com> wrote: > On 10/14/2012 02:04 AM, Christoffer Dall wrote: >> Used to initialize the in-kernel interrupt controller. On ARM we need to >> map the virtual generic interrupt controller (vGIC) into Hyp the guest's >> physicall address space so the guest can access the virtual cpu >> interface. This must be done after the IRQ chips is create and after a >> base address has been provided for the emulated platform (patch is >> following), but before the CPU is initally run. >> >> >> +4.79 KVM_INIT_IRQCHIP >> + >> +Capability: KVM_CAP_INIT_IRQCHIP >> +Architectures: arm >> +Type: vm ioctl >> +Parameters: none >> +Returns: 0 on success, -1 on error >> + >> +Initialize the in-kernel interrupt controller. On ARM we need to map the >> +virtual generic interrupt controller (vGIC) into Hyp the guest's physicall >> +address space so the guest can access the virtual cpu interface. This must be >> +done after the IRQ chips is create and after a base address has been provided >> +for the emulated platofrm (see KVM_SET_DEVICE_ADDRESS), but before the CPU is >> +initally run. >> + > > What enforces this? > > Can it be done automatically? issue a > kvm_make_request(KVM_REQ_INIT_IRQCHIP) on vcpu creation, and you'll > automatically be notified before the first guest entry. > > Having an ioctl that must be called after point A but before point B > seems pointless, when A and B are both known. > I reworked this according to your comments, patches on the way. thanks for the input. >> + >> 5. The kvm_run structure >> ------------------------ >> >> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c >> index f8c377b..85c76e4 100644 >> --- a/arch/arm/kvm/arm.c >> +++ b/arch/arm/kvm/arm.c >> @@ -195,6 +195,7 @@ int kvm_dev_ioctl_check_extension(long ext) >> switch (ext) { >> #ifdef CONFIG_KVM_ARM_VGIC >> case KVM_CAP_IRQCHIP: >> + case KVM_CAP_INIT_IRQCHIP: > > This could be part of a baseline, if you don't envision ever taking it out. > ^ permalink raw reply [flat|nested] 59+ messages in thread
* [RFC PATCH 2/3] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl 2012-10-14 0:04 [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses Christoffer Dall 2012-10-14 0:04 ` [RFC PATCH 1/3] KVM: ARM: Introduce KVM_INIT_IRQCHIP ioctl Christoffer Dall @ 2012-10-14 0:04 ` Christoffer Dall 2012-10-17 20:29 ` Peter Maydell 2012-10-14 0:04 ` [RFC PATCH 3/3] KVM: ARM: Split KVM_CREATE_IRQCHIP and KVM_INIT_IRQCHIP Christoffer Dall 2012-10-17 20:38 ` [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses Alexander Graf 3 siblings, 1 reply; 59+ messages in thread From: Christoffer Dall @ 2012-10-14 0:04 UTC (permalink / raw) To: kvmarm, kvm; +Cc: peter.maydell, marc.zyngier, Christoffer Dall On ARM (and possibly other architectures) some bits are specific to the model being emulated for the guest and user space needs a way to tell the kernel about those bits. An example is mmio device base addresses, where KVM must know the base address for a given device to properly emulate mmio accesses within a certain address range or directly map a device with virtualiation extensions into the guest address space. We try to make this API slightly more generic than for our specific use, but so far only the VGIC uses this feature. Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com> --- Documentation/virtual/kvm/api.txt | 30 ++++++++++++++++++++++++++++++ arch/arm/include/asm/kvm.h | 13 +++++++++++++ arch/arm/include/asm/kvm_mmu.h | 1 + arch/arm/include/asm/kvm_vgic.h | 6 ++++++ arch/arm/kvm/arm.c | 31 ++++++++++++++++++++++++++++++- arch/arm/kvm/vgic.c | 34 +++++++++++++++++++++++++++++++--- include/linux/kvm.h | 8 ++++++++ 7 files changed, 119 insertions(+), 4 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 26e953d..30ddcac 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2118,6 +2118,36 @@ for the emulated platofrm (see KVM_SET_DEVICE_ADDRESS), but before the CPU is initally run. +4.80 KVM_SET_DEVICE_ADDRESS + +Capability: KVM_CAP_SET_DEVICE_ADDRESS +Architectures: arm +Type: vm ioctl +Parameters: struct kvm_device_address (in) +Returns: 0 on success, -1 on error +Errors: + ENODEV: The device id is unknwown + ENXIO: Device not supported in configuration + E2BIG: Address outside of guest physical address space + +struct kvm_device_address { + __u32 id; + __u64 addr; +}; + +Specify a device address in the guest's physical address space where guests +can access emulated or directly exposed devices, which the host kernel needs +to know about. The id field is an architecture specific identifier for a +specific device. + +ARM divides the id field into two parts, a device ID and an address type id +specific to the individual device. + + bits: | 31 ... 16 | 15 ... 0 | + field: | device id | addr type id | + + + 5. The kvm_run structure ------------------------ diff --git a/arch/arm/include/asm/kvm.h b/arch/arm/include/asm/kvm.h index b6eaf0c..dfd60cc 100644 --- a/arch/arm/include/asm/kvm.h +++ b/arch/arm/include/asm/kvm.h @@ -42,6 +42,19 @@ struct kvm_regs { #define KVM_ARM_TARGET_CORTEX_A15 0 #define KVM_ARM_NUM_TARGETS 1 +/* KVM_SET_DEVICE_ADDRESS ioctl id encoding */ +#define KVM_DEVICE_TYPE_SHIFT 0 +#define KVM_DEVICE_TYPE_MASK (0xffff << KVM_DEVICE_TYPE_SHIFT) +#define KVM_DEVICE_ID_SHIFT 16 +#define KVM_DEVICE_ID_MASK (0xffff << KVM_DEVICE_ID_SHIFT) + +/* Supported device IDs */ +#define KVM_ARM_DEVICE_VGIC_V2 0 + +/* Supported VGIC address types */ +#define KVM_VGIC_V2_ADDR_TYPE_DIST 0 +#define KVM_VGIC_V2_ADDR_TYPE_CPU 1 + struct kvm_vcpu_init { __u32 target; __u32 features[7]; diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index ecfaaf0..0aef24f 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -26,6 +26,7 @@ * To save a bit of memory and to avoid alignment issues we assume 39-bit IPA * for now, but remember that the level-1 table must be aligned to its size. */ +#define KVM_MAX_IPA ((1ULL << 38) - 1) #define PTRS_PER_PGD2 512 #define PGD2_ORDER get_order(PTRS_PER_PGD2 * sizeof(pgd_t)) diff --git a/arch/arm/include/asm/kvm_vgic.h b/arch/arm/include/asm/kvm_vgic.h index 588c637..a688132 100644 --- a/arch/arm/include/asm/kvm_vgic.h +++ b/arch/arm/include/asm/kvm_vgic.h @@ -242,6 +242,7 @@ struct kvm_exit_mmio; #ifdef CONFIG_KVM_ARM_VGIC int kvm_vgic_hyp_init(void); +int kvm_vgic_set_addr(struct kvm *kvm, unsigned long type, u64 addr); int kvm_vgic_init(struct kvm *kvm); void kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu); void kvm_vgic_sync_to_cpu(struct kvm_vcpu *vcpu); @@ -261,6 +262,11 @@ static inline int kvm_vgic_hyp_init(void) return 0; } +static inline int kvm_vgic_set_addr(struct kvm *kvm, unsigned long type, u64 addr) +{ + return 0; +} + static inline int kvm_vgic_init(struct kvm *kvm) { return 0; diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index 85c76e4..67c8cc2 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -207,6 +207,9 @@ int kvm_dev_ioctl_check_extension(long ext) case KVM_CAP_COALESCED_MMIO: r = KVM_COALESCED_MMIO_PAGE_OFFSET; break; + case KVM_CAP_SET_DEVICE_ADDR: + r = 1; + break; default: r = 0; break; @@ -859,20 +862,46 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log) return -EINVAL; } +static int kvm_vm_ioctl_set_device_address(struct kvm *kvm, + struct kvm_device_address *dev_addr) +{ + unsigned long dev_id, type; + + dev_id = (dev_addr->id & KVM_DEVICE_ID_MASK) >> KVM_DEVICE_ID_SHIFT; + type = (dev_addr->id & KVM_DEVICE_TYPE_MASK) >> KVM_DEVICE_TYPE_SHIFT; + + switch (dev_id) { + case KVM_ARM_DEVICE_VGIC_V2: + if (!vgic_present) + return -ENXIO; + return kvm_vgic_set_addr(kvm, type, dev_addr->addr); + default: + return -ENODEV; + } +} + long kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) { + struct kvm *kvm = filp->private_data; + void __user *argp = (void __user *)arg; switch (ioctl) { #ifdef CONFIG_KVM_ARM_VGIC case KVM_CREATE_IRQCHIP: { - struct kvm *kvm = filp->private_data; if (vgic_present) return kvm_vgic_init(kvm); else return -EINVAL; } #endif + case KVM_SET_DEVICE_ADDRESS: { + struct kvm_device_address dev_addr; + + if (copy_from_user(&dev_addr, argp, sizeof(dev_addr))) + return -EFAULT; + return kvm_vm_ioctl_set_device_address(kvm, &dev_addr); + } default: return -EINVAL; } diff --git a/arch/arm/kvm/vgic.c b/arch/arm/kvm/vgic.c index 494d94d..1e4be2d 100644 --- a/arch/arm/kvm/vgic.c +++ b/arch/arm/kvm/vgic.c @@ -65,12 +65,17 @@ * interrupt line to be sampled again. */ -/* Temporary hacks, need to be provided by userspace emulation */ -#define VGIC_DIST_BASE 0x2c001000 +#define VGIC_ADDR_UNDEF (-1) #define VGIC_DIST_SIZE 0x1000 -#define VGIC_CPU_BASE 0x2c002000 #define VGIC_CPU_SIZE 0x2000 +/* Physical address of vgic virtual cpu interface */ +static phys_addr_t vgic_vcpu_base; + +/* Guest physical addresses used by the guest to access the vgic */ +static unsigned long vgic_guest_dist_base = VGIC_ADDR_UNDEF; +static unsigned long vgic_guest_cpu_base = VGIC_ADDR_UNDEF; + /* Virtual control interface base address */ static void __iomem *vgic_vctrl_base; @@ -1117,3 +1122,26 @@ out: return ret; } + +int kvm_vgic_set_addr(struct kvm *kvm, unsigned long type, u64 addr) +{ + int r = 0; + + if (addr > KVM_MAX_IPA) + return -E2BIG; + + mutex_lock(&kvm->lock); + switch (type) { + case KVM_VGIC_V2_ADDR_TYPE_DIST: + vgic_guest_dist_base = addr; + break; + case KVM_VGIC_V2_ADDR_TYPE_CPU: + vgic_guest_cpu_base = addr; + break; + default: + r = -ENODEV; + } + + mutex_unlock(&kvm->lock); + return r; +} diff --git a/include/linux/kvm.h b/include/linux/kvm.h index 90ee023..e9da8ec 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -627,6 +627,7 @@ struct kvm_ppc_smmu_info { #define KVM_CAP_READONLY_MEM 81 #endif #define KVM_CAP_INIT_IRQCHIP 82 +#define KVM_CAP_SET_DEVICE_ADDR 83 #ifdef KVM_CAP_IRQ_ROUTING @@ -760,6 +761,11 @@ struct kvm_msi { __u8 pad[16]; }; +struct kvm_device_address { + __u32 id; + __u64 addr; +}; + /* * ioctls for VM fds */ @@ -842,6 +848,8 @@ struct kvm_s390_ucas_mapping { #define KVM_PPC_ALLOCATE_HTAB _IOWR(KVMIO, 0xa7, __u32) /* Available with KVM_CAP_INIT_IRQCHIP */ #define KVM_INIT_IRQCHIP _IO(KVMIO, 0xa8) +/* Available with KVM_CAP_SET_DEVICE_ADDR */ +#define KVM_SET_DEVICE_ADDRESS _IOW(KVMIO, 0xa9, struct kvm_device_address) /* * ioctls for vcpu fds -- 1.7.9.5 ^ permalink raw reply related [flat|nested] 59+ messages in thread
* Re: [RFC PATCH 2/3] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl 2012-10-14 0:04 ` [RFC PATCH 2/3] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl Christoffer Dall @ 2012-10-17 20:29 ` Peter Maydell 2012-10-19 18:46 ` Christoffer Dall 0 siblings, 1 reply; 59+ messages in thread From: Peter Maydell @ 2012-10-17 20:29 UTC (permalink / raw) To: Christoffer Dall; +Cc: kvmarm, kvm, marc.zyngier On 14 October 2012 01:04, Christoffer Dall <c.dall@virtualopensystems.com> wrote: > On ARM (and possibly other architectures) some bits are specific to the > model being emulated for the guest and user space needs a way to tell > the kernel about those bits. An example is mmio device base addresses, > where KVM must know the base address for a given device to properly > emulate mmio accesses within a certain address range or directly map a > device with virtualiation extensions into the guest address space. > > We try to make this API slightly more generic than for our specific use, > but so far only the VGIC uses this feature. > > Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com> > --- > Documentation/virtual/kvm/api.txt | 30 ++++++++++++++++++++++++++++++ > arch/arm/include/asm/kvm.h | 13 +++++++++++++ > arch/arm/include/asm/kvm_mmu.h | 1 + > arch/arm/include/asm/kvm_vgic.h | 6 ++++++ > arch/arm/kvm/arm.c | 31 ++++++++++++++++++++++++++++++- > arch/arm/kvm/vgic.c | 34 +++++++++++++++++++++++++++++++--- > include/linux/kvm.h | 8 ++++++++ > 7 files changed, 119 insertions(+), 4 deletions(-) > > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt > index 26e953d..30ddcac 100644 > --- a/Documentation/virtual/kvm/api.txt > +++ b/Documentation/virtual/kvm/api.txt > @@ -2118,6 +2118,36 @@ for the emulated platofrm (see KVM_SET_DEVICE_ADDRESS), but before the CPU is > initally run. > > > +4.80 KVM_SET_DEVICE_ADDRESS > + > +Capability: KVM_CAP_SET_DEVICE_ADDRESS > +Architectures: arm > +Type: vm ioctl > +Parameters: struct kvm_device_address (in) > +Returns: 0 on success, -1 on error > +Errors: > + ENODEV: The device id is unknwown "unknown" > + ENXIO: Device not supported in configuration "in this configuration" ? (I'm guessing this is for "you tried to map a GIC when this CPU doesn't have a GIC" and similar errors?) > + E2BIG: Address outside of guest physical address space I would say "outside" rather than "outside of" here. > + > +struct kvm_device_address { > + __u32 id; > + __u64 addr; > +}; > + > +Specify a device address in the guest's physical address space where guests > +can access emulated or directly exposed devices, which the host kernel needs > +to know about. The id field is an architecture specific identifier for a > +specific device. > + > +ARM divides the id field into two parts, a device ID and an address type id We should be consistent about whether ID is capitalised or not. > +specific to the individual device. > + > + bits: | 31 ... 16 | 15 ... 0 | > + field: | device id | addr type id | This doesn't say whether userspace is allowed to make this ioctl multiple times for the same device. This could be any of: * undefined behaviour * second call fails with some errno * second call overrides first one It also doesn't say that you're supposed to call this after CREATE and before INIT of the irqchip. (Nor does it say what happens if you call it at some other time.) -- PMM ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC PATCH 2/3] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl 2012-10-17 20:29 ` Peter Maydell @ 2012-10-19 18:46 ` Christoffer Dall 2012-10-19 20:24 ` Peter Maydell 0 siblings, 1 reply; 59+ messages in thread From: Christoffer Dall @ 2012-10-19 18:46 UTC (permalink / raw) To: Peter Maydell; +Cc: kvmarm, kvm, marc.zyngier On Wed, Oct 17, 2012 at 4:29 PM, Peter Maydell <peter.maydell@linaro.org> wrote: > On 14 October 2012 01:04, Christoffer Dall > <c.dall@virtualopensystems.com> wrote: >> On ARM (and possibly other architectures) some bits are specific to the >> model being emulated for the guest and user space needs a way to tell >> the kernel about those bits. An example is mmio device base addresses, >> where KVM must know the base address for a given device to properly >> emulate mmio accesses within a certain address range or directly map a >> device with virtualiation extensions into the guest address space. >> >> We try to make this API slightly more generic than for our specific use, >> but so far only the VGIC uses this feature. >> >> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com> >> --- >> Documentation/virtual/kvm/api.txt | 30 ++++++++++++++++++++++++++++++ >> arch/arm/include/asm/kvm.h | 13 +++++++++++++ >> arch/arm/include/asm/kvm_mmu.h | 1 + >> arch/arm/include/asm/kvm_vgic.h | 6 ++++++ >> arch/arm/kvm/arm.c | 31 ++++++++++++++++++++++++++++++- >> arch/arm/kvm/vgic.c | 34 +++++++++++++++++++++++++++++++--- >> include/linux/kvm.h | 8 ++++++++ >> 7 files changed, 119 insertions(+), 4 deletions(-) >> >> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt >> index 26e953d..30ddcac 100644 >> --- a/Documentation/virtual/kvm/api.txt >> +++ b/Documentation/virtual/kvm/api.txt >> @@ -2118,6 +2118,36 @@ for the emulated platofrm (see KVM_SET_DEVICE_ADDRESS), but before the CPU is >> initally run. >> >> >> +4.80 KVM_SET_DEVICE_ADDRESS >> + >> +Capability: KVM_CAP_SET_DEVICE_ADDRESS >> +Architectures: arm >> +Type: vm ioctl >> +Parameters: struct kvm_device_address (in) >> +Returns: 0 on success, -1 on error >> +Errors: >> + ENODEV: The device id is unknwown > > "unknown" > >> + ENXIO: Device not supported in configuration > > "in this configuration" ? (I'm guessing this is for "you tried to > map a GIC when this CPU doesn't have a GIC" and similar errors?) > >> + E2BIG: Address outside of guest physical address space > > I would say "outside" rather than "outside of" here. > >> + >> +struct kvm_device_address { >> + __u32 id; >> + __u64 addr; >> +}; >> + >> +Specify a device address in the guest's physical address space where guests >> +can access emulated or directly exposed devices, which the host kernel needs >> +to know about. The id field is an architecture specific identifier for a >> +specific device. >> + >> +ARM divides the id field into two parts, a device ID and an address type id > > We should be consistent about whether ID is capitalised or not. > indeed >> +specific to the individual device. >> + >> + bits: | 31 ... 16 | 15 ... 0 | >> + field: | device id | addr type id | > > This doesn't say whether userspace is allowed to make this ioctl > multiple times for the same device. This could be any of: > * undefined behaviour > * second call fails with some errno > * second call overrides first one > I added an error condition EEXIST, but since this is trying to not be arm-vgic specific this is really up to the individual device - maybe we can have some polymorphic device that moves around later. > It also doesn't say that you're supposed to call this after CREATE > and before INIT of the irqchip. (Nor does it say what happens if > you call it at some other time.) > same non-device specific argument as above. Thanks, -Christoffer ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC PATCH 2/3] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl 2012-10-19 18:46 ` Christoffer Dall @ 2012-10-19 20:24 ` Peter Maydell 2012-10-19 20:27 ` Christoffer Dall 0 siblings, 1 reply; 59+ messages in thread From: Peter Maydell @ 2012-10-19 20:24 UTC (permalink / raw) To: Christoffer Dall; +Cc: kvmarm, kvm, marc.zyngier On 19 October 2012 19:46, Christoffer Dall <c.dall@virtualopensystems.com> wrote: > On Wed, Oct 17, 2012 at 4:29 PM, Peter Maydell <peter.maydell@linaro.org> wrote: >> This doesn't say whether userspace is allowed to make this ioctl >> multiple times for the same device. This could be any of: >> * undefined behaviour >> * second call fails with some errno >> * second call overrides first one >> > > I added an error condition EEXIST, but since this is trying to not be > arm-vgic specific this is really up to the individual device - maybe > we can have some polymorphic device that moves around later. > >> It also doesn't say that you're supposed to call this after CREATE >> and before INIT of the irqchip. (Nor does it say what happens if >> you call it at some other time.) >> > > same non-device specific argument as above. We could have a section in the docs that says "On ARM platforms there are devices X and Y and they have such-and-such properties and requirements" [and other devices later can have further docs as appropriate]. -- PMM ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC PATCH 2/3] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl 2012-10-19 20:24 ` Peter Maydell @ 2012-10-19 20:27 ` Christoffer Dall 2012-10-19 20:33 ` Christoffer Dall 0 siblings, 1 reply; 59+ messages in thread From: Christoffer Dall @ 2012-10-19 20:27 UTC (permalink / raw) To: Peter Maydell; +Cc: kvmarm, kvm, marc.zyngier On Fri, Oct 19, 2012 at 4:24 PM, Peter Maydell <peter.maydell@linaro.org> wrote: > On 19 October 2012 19:46, Christoffer Dall > <c.dall@virtualopensystems.com> wrote: >> On Wed, Oct 17, 2012 at 4:29 PM, Peter Maydell <peter.maydell@linaro.org> wrote: >>> This doesn't say whether userspace is allowed to make this ioctl >>> multiple times for the same device. This could be any of: >>> * undefined behaviour >>> * second call fails with some errno >>> * second call overrides first one >>> >> >> I added an error condition EEXIST, but since this is trying to not be >> arm-vgic specific this is really up to the individual device - maybe >> we can have some polymorphic device that moves around later. >> >>> It also doesn't say that you're supposed to call this after CREATE >>> and before INIT of the irqchip. (Nor does it say what happens if >>> you call it at some other time.) >>> >> >> same non-device specific argument as above. > > We could have a section in the docs that says "On ARM platforms > there are devices X and Y and they have such-and-such properties > and requirements" [and other devices later can have further docs > as appropriate]. > sure, I can add that. -Christoffer ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC PATCH 2/3] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl 2012-10-19 20:27 ` Christoffer Dall @ 2012-10-19 20:33 ` Christoffer Dall 0 siblings, 0 replies; 59+ messages in thread From: Christoffer Dall @ 2012-10-19 20:33 UTC (permalink / raw) To: Peter Maydell; +Cc: kvmarm, kvm, marc.zyngier On Fri, Oct 19, 2012 at 4:27 PM, Christoffer Dall <c.dall@virtualopensystems.com> wrote: > On Fri, Oct 19, 2012 at 4:24 PM, Peter Maydell <peter.maydell@linaro.org> wrote: >> On 19 October 2012 19:46, Christoffer Dall >> <c.dall@virtualopensystems.com> wrote: >>> On Wed, Oct 17, 2012 at 4:29 PM, Peter Maydell <peter.maydell@linaro.org> wrote: >>>> This doesn't say whether userspace is allowed to make this ioctl >>>> multiple times for the same device. This could be any of: >>>> * undefined behaviour >>>> * second call fails with some errno >>>> * second call overrides first one >>>> >>> >>> I added an error condition EEXIST, but since this is trying to not be >>> arm-vgic specific this is really up to the individual device - maybe >>> we can have some polymorphic device that moves around later. >>> >>>> It also doesn't say that you're supposed to call this after CREATE >>>> and before INIT of the irqchip. (Nor does it say what happens if >>>> you call it at some other time.) >>>> >>> >>> same non-device specific argument as above. >> >> We could have a section in the docs that says "On ARM platforms >> there are devices X and Y and they have such-and-such properties >> and requirements" [and other devices later can have further docs >> as appropriate]. >> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 65aacc5..1380885 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2131,6 +2131,12 @@ specific to the individual device. bits: | 31 ... 16 | 15 ... 0 | field: | device id | addr type id | +ARM currently only require this when using the in-kernel GIC support for the +hardware vGIC features, using KVM_ARM_DEVICE_VGIC_V2 as the device id. When +setting the base address for the guest's mapping of the vGIC virtual CPU +and distributor interface, the ioctl must be called after calling +KVM_CREATE_IRQCHIP, but before calling KVM_RUN on any of the VCPUs. Calling +this ioctl twice for any of the base addresses will return -EEXIST. 5. The kvm_run structure ^ permalink raw reply related [flat|nested] 59+ messages in thread
* [RFC PATCH 3/3] KVM: ARM: Split KVM_CREATE_IRQCHIP and KVM_INIT_IRQCHIP 2012-10-14 0:04 [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses Christoffer Dall 2012-10-14 0:04 ` [RFC PATCH 1/3] KVM: ARM: Introduce KVM_INIT_IRQCHIP ioctl Christoffer Dall 2012-10-14 0:04 ` [RFC PATCH 2/3] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl Christoffer Dall @ 2012-10-14 0:04 ` Christoffer Dall 2012-10-18 11:15 ` Peter Maydell 2012-10-17 20:38 ` [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses Alexander Graf 3 siblings, 1 reply; 59+ messages in thread From: Christoffer Dall @ 2012-10-14 0:04 UTC (permalink / raw) To: kvmarm, kvm; +Cc: peter.maydell, marc.zyngier, Christoffer Dall We need this two factor initialization step to support a sane user space initialization of the emulated model. We simply follow the names of the ioctls for the internal vgic implementation steps and check if we have everything we need on the host side when we create the vgic and set up the rest on init. Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com> --- arch/arm/include/asm/kvm_vgic.h | 6 ++++++ arch/arm/kvm/arm.c | 6 ++++++ arch/arm/kvm/vgic.c | 40 +++++++++++++++++++++++++++++++-------- 3 files changed, 44 insertions(+), 8 deletions(-) diff --git a/arch/arm/include/asm/kvm_vgic.h b/arch/arm/include/asm/kvm_vgic.h index a688132..8bd1426 100644 --- a/arch/arm/include/asm/kvm_vgic.h +++ b/arch/arm/include/asm/kvm_vgic.h @@ -243,6 +243,7 @@ struct kvm_exit_mmio; #ifdef CONFIG_KVM_ARM_VGIC int kvm_vgic_hyp_init(void); int kvm_vgic_set_addr(struct kvm *kvm, unsigned long type, u64 addr); +int kvm_vgic_create(struct kvm *kvm); int kvm_vgic_init(struct kvm *kvm); void kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu); void kvm_vgic_sync_to_cpu(struct kvm_vcpu *vcpu); @@ -267,6 +268,11 @@ static inline int kvm_vgic_set_addr(struct kvm *kvm, unsigned long type, u64 add return 0; } +static inline int kvm_vgic_create(struct kvm *kvm) +{ + return 0; +} + static inline int kvm_vgic_init(struct kvm *kvm) { return 0; diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index 67c8cc2..c0af87e 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -890,6 +890,12 @@ long kvm_arch_vm_ioctl(struct file *filp, #ifdef CONFIG_KVM_ARM_VGIC case KVM_CREATE_IRQCHIP: { if (vgic_present) + return kvm_vgic_create(kvm); + else + return -EINVAL; + } + case KVM_INIT_IRQCHIP: { + if (vgic_present) return kvm_vgic_init(kvm); else return -EINVAL; diff --git a/arch/arm/kvm/vgic.c b/arch/arm/kvm/vgic.c index 1e4be2d..78d590c 100644 --- a/arch/arm/kvm/vgic.c +++ b/arch/arm/kvm/vgic.c @@ -1084,12 +1084,12 @@ out_free_irq: int kvm_vgic_init(struct kvm *kvm) { int ret, i; - struct resource vcpu_res; mutex_lock(&kvm->lock); - if (of_address_to_resource(vgic_node, 3, &vcpu_res)) { - kvm_err("Cannot obtain VCPU resource\n"); + if (vgic_guest_dist_base == VGIC_ADDR_UNDEF || + vgic_guest_cpu_base == VGIC_ADDR_UNDEF) { + kvm_err("Need to set vgic cpu and dist addresses first\n"); ret = -ENXIO; goto out; } @@ -1101,11 +1101,12 @@ int kvm_vgic_init(struct kvm *kvm) spin_lock_init(&kvm->arch.vgic.lock); kvm->arch.vgic.vctrl_base = vgic_vctrl_base; - kvm->arch.vgic.vgic_dist_base = VGIC_DIST_BASE; + kvm->arch.vgic.vgic_dist_base = vgic_guest_dist_base; kvm->arch.vgic.vgic_dist_size = VGIC_DIST_SIZE; - ret = kvm_phys_addr_ioremap(kvm, VGIC_CPU_BASE, - vcpu_res.start, VGIC_CPU_SIZE); + ret = kvm_phys_addr_ioremap(kvm, vgic_guest_cpu_base, + vgic_vcpu_base, VGIC_CPU_SIZE); + if (ret) { kvm_err("Unable to remap VGIC CPU to VCPU\n"); goto out; @@ -1113,13 +1114,36 @@ int kvm_vgic_init(struct kvm *kvm) for (i = 32; i < VGIC_NR_IRQS; i += 4) vgic_set_target_reg(kvm, 0, i); - out: mutex_unlock(&kvm->lock); - if (!ret) kvm_timer_init(kvm); + return ret; +} + +int kvm_vgic_create(struct kvm *kvm) +{ + int ret; + struct resource vcpu_res; + + mutex_lock(&kvm->lock); + if (of_address_to_resource(vgic_node, 3, &vcpu_res)) { + kvm_err("Cannot obtain VCPU resource\n"); + ret = -ENXIO; + goto out; + } + + if (atomic_read(&kvm->online_vcpus) || kvm->arch.vgic.vctrl_base) { + ret = -EEXIST; + goto out; + } + + + vgic_vcpu_base = vcpu_res.start; + ret = 0; +out: + mutex_unlock(&kvm->lock); return ret; } -- 1.7.9.5 ^ permalink raw reply related [flat|nested] 59+ messages in thread
* Re: [RFC PATCH 3/3] KVM: ARM: Split KVM_CREATE_IRQCHIP and KVM_INIT_IRQCHIP 2012-10-14 0:04 ` [RFC PATCH 3/3] KVM: ARM: Split KVM_CREATE_IRQCHIP and KVM_INIT_IRQCHIP Christoffer Dall @ 2012-10-18 11:15 ` Peter Maydell 0 siblings, 0 replies; 59+ messages in thread From: Peter Maydell @ 2012-10-18 11:15 UTC (permalink / raw) To: Christoffer Dall; +Cc: kvmarm, kvm, marc.zyngier On 14 October 2012 01:04, Christoffer Dall <c.dall@virtualopensystems.com> wrote: > We need this two factor initialization step to support a sane user space > initialization of the emulated model. We simply follow the names of the > ioctls for the internal vgic implementation steps and check if we have > everything we need on the host side when we create the vgic and set up > the rest on init. With this patch I find that KVM_INIT_IRQCHIP fails EEXIST because it's hitting the check in kvm_vgic_init() that online_vcpus is 0. I think this check should be removed now as INIT_IRQCHIP will always happen late, after we've created vcpus. (The patch puts this check in kvm_vgic_create() so I guess I'm saying the check should be moved rather than copied.) On the other hand, I removed that check, and the host kernel oopses: Unable to handle kernel paging request at virtual address 78656e75 pgd = dea94c80 [78656e75] *pgd=00000000 Internal error: Oops: 205 [#1] SMP ARM CPU: 1 Tainted: G W (3.6.0+ #87) PC is at vsnprintf+0x38/0x400 LR is at panic+0x60/0x1dc pc : [<c01bb7c4>] lr : [<c037eb98>] psr: 20000093 sp : de145df0 ip : de145e6c fp : 00000020 r10: c04c3dd4 r9 : 00000000 r8 : de145e6c r7 : 00000000 r6 : cfdfdfdf r5 : 00000000 r4 : 78656e75 r3 : de145e6c r2 : 78656e75 r1 : 00000400 r0 : c04c3dd4 Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user Control: 30c5387d Table: 9ea94c80 DAC: 00000000 Process qemu-system-arm (pid: 1078, stack limit = 0xde1442f8) Stack: (0xde145df0 to 0xde146000) 5de0: 00030003 c006a07c 00000000 00000001 5e00: c038830c c04c3dd4 00000400 00000000 00000000 00000000 00000001 c04c3db0 5e20: 00000000 cfdfdfdf 00000000 00000000 00000000 00000001 de1642c4 c037eb98 5e40: 00000000 de145e6c fffffce0 00000000 cfdfdfdf 00000000 00000000 00000000 5e60: 00000001 c0022ac8 78656e75 c0022ac8 00000000 00000000 de144000 00000001 5e80: 00000002 00000000 00000000 00010000 9eacc000 00000000 7ffbfeff fffffffe 5ea0: 00000000 de164000 dea96600 00000000 de144000 00000000 00000000 0000ae80 5ec0: b63da6a4 c001ef5c 00000001 00000002 00000000 c00689a4 00000000 c003c794 5ee0: de073ec4 0000000a de400e18 00000000 dea96600 0000000b c000ed08 de144000 5f00: 00000000 c00c36fc fffffffa 00000434 00000000 c04c3618 c04974c0 c006cdf4 5f20: 00000100 3fb69f7c 00000000 00000004 00000084 7fffffff 00000001 00000001 5f40: 00000081 00000000 00000001 007bc068 de144000 00000000 b63da6a4 c0069418 5f60: 00000002 dea96600 00000000 0000ae80 0000000b c000ed08 de144000 00000000 5f80: b63da6a4 c00c3ca8 00000002 00000001 00000000 b63da470 00000000 00000000 5fa0: 00000036 c000eb80 b63da470 00000000 0000000b 0000ae80 00000000 0000ae80 5fc0: b63da470 00000000 00000000 00000036 00000000 00000000 beab7628 b63da6a4 5fe0: 0037e250 b63d9d6c 00290181 b6d9c2ec 600f0010 0000000b dfdfdfcf cfdfdfdf [<c01bb7c4>] (vsnprintf+0x38/0x400) from [<c037eb98>] (panic+0x60/0x1dc) [<c037eb98>] (panic+0x60/0x1dc) from [<c0022ac8>] (kvm_arch_vcpu_ioctl_run+0xe8/0x404) [<c0022ac8>] (kvm_arch_vcpu_ioctl_run+0xe8/0x404) from [<c001ef5c>] (kvm_vcpu_ioctl+0x4d0/0x6a0) [<c001ef5c>] (kvm_vcpu_ioctl+0x4d0/0x6a0) from [<c00c36fc>] (do_vfs_ioctl+0x84/0x5f8) [<c00c36fc>] (do_vfs_ioctl+0x84/0x5f8) from [<c00c3ca8>] (sys_ioctl+0x38/0x5c) [<c00c3ca8>] (sys_ioctl+0x38/0x5c) from [<c000eb80>] (ret_fast_syscall+0x0/0x30) Code: ba0000e5 e59da014 e3a0b020 e59d1018 (e5d23000) ---[ end trace 1b75b31a2719ed1e ]--- QEMU test code is here: git://git.linaro.org/people/pmaydell/qemu-arm.git kvm-arm-dev-addr-test thanks -- PMM ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses 2012-10-14 0:04 [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses Christoffer Dall ` (2 preceding siblings ...) 2012-10-14 0:04 ` [RFC PATCH 3/3] KVM: ARM: Split KVM_CREATE_IRQCHIP and KVM_INIT_IRQCHIP Christoffer Dall @ 2012-10-17 20:38 ` Alexander Graf 2012-10-17 20:39 ` Christoffer Dall 3 siblings, 1 reply; 59+ messages in thread From: Alexander Graf @ 2012-10-17 20:38 UTC (permalink / raw) To: Christoffer Dall; +Cc: kvmarm, kvm, Benjamin Herrenschmidt, kvm-ppc On 10/14/2012 02:04 AM, Christoffer Dall wrote: > *** warning: this RFC patch series is only compile-tested *** > > We need a way to specify the address at which we expect VMs to access > the interrupt controller (both the emulated distributor and the hardware > interface supporting virtualization). User space should decide on this > address as user space decides on an emulated board and loads a device > tree describing these details directly to the guest. > > Instead of modifying the copying KVM_CREATE_IRQCHIP to an ARM specific > ioctl with a a highly device specific set of parameters, we try > something slightly more generic, that should fit well with how user > space (read QEMU) first builds the individual devices and later sets up > the emulated platform. Have you talked to Ben about this one? He wanted to design a new, more flexible irqchip API that would work for XICS & MPIC. Maybe there's some room for cooperation here? Alex ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses 2012-10-17 20:38 ` [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses Alexander Graf @ 2012-10-17 20:39 ` Christoffer Dall 2012-10-17 21:19 ` Benjamin Herrenschmidt 2012-10-17 22:10 ` Paul Mackerras 0 siblings, 2 replies; 59+ messages in thread From: Christoffer Dall @ 2012-10-17 20:39 UTC (permalink / raw) To: Alexander Graf; +Cc: kvmarm, kvm, Benjamin Herrenschmidt, kvm-ppc On Wed, Oct 17, 2012 at 4:38 PM, Alexander Graf <agraf@suse.de> wrote: > On 10/14/2012 02:04 AM, Christoffer Dall wrote: >> >> *** warning: this RFC patch series is only compile-tested *** >> >> We need a way to specify the address at which we expect VMs to access >> the interrupt controller (both the emulated distributor and the hardware >> interface supporting virtualization). User space should decide on this >> address as user space decides on an emulated board and loads a device >> tree describing these details directly to the guest. >> >> Instead of modifying the copying KVM_CREATE_IRQCHIP to an ARM specific >> ioctl with a a highly device specific set of parameters, we try >> something slightly more generic, that should fit well with how user >> space (read QEMU) first builds the individual devices and later sets up >> the emulated platform. > > > Have you talked to Ben about this one? He wanted to design a new, more > flexible irqchip API that would work for XICS & MPIC. Maybe there's some > room for cooperation here? > I have not - Ben, what do you have in mind? -Christoffer ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses 2012-10-17 20:39 ` Christoffer Dall @ 2012-10-17 21:19 ` Benjamin Herrenschmidt 2012-10-17 22:10 ` Paul Mackerras 1 sibling, 0 replies; 59+ messages in thread From: Benjamin Herrenschmidt @ 2012-10-17 21:19 UTC (permalink / raw) To: Christoffer Dall; +Cc: Alexander Graf, kvmarm, kvm, kvm-ppc, Paul Mackerras On Wed, 2012-10-17 at 16:39 -0400, Christoffer Dall wrote: > > Have you talked to Ben about this one? He wanted to design a new, more > > flexible irqchip API that would work for XICS & MPIC. Maybe there's some > > room for cooperation here? > > > I have not - Ben, what do you have in mind? I've been sidetracked to some other stuff so for now Paul (CC) is taking over my interrupt patches. We initially changes IRQ_CREATE_IRQCHIP to take an argument but that was causing an x86 ABI breakage (ioctl number changing). So we'll probably be creating a new one. >From there, nothing fancy really, just an ioctl with an IRQ chip type at the beginning followed by a union of type-specific parameters. The main problem we haven't sorted out yet is how to replace some of the horrors related to mapping interrupts that have tendrils all the way into virtio-pci etc... in kemu that don't apply to use (well mostly) and the interaction with in-kernel generated interrupts to avoid going through qemu for vhost ec... Cheers, Ben. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses 2012-10-17 20:39 ` Christoffer Dall 2012-10-17 21:19 ` Benjamin Herrenschmidt @ 2012-10-17 22:10 ` Paul Mackerras 2012-10-17 23:58 ` Benjamin Herrenschmidt 1 sibling, 1 reply; 59+ messages in thread From: Paul Mackerras @ 2012-10-17 22:10 UTC (permalink / raw) To: Christoffer Dall Cc: Alexander Graf, kvmarm, kvm, Benjamin Herrenschmidt, kvm-ppc On Wed, Oct 17, 2012 at 04:39:57PM -0400, Christoffer Dall wrote: > On Wed, Oct 17, 2012 at 4:38 PM, Alexander Graf <agraf@suse.de> wrote: > > On 10/14/2012 02:04 AM, Christoffer Dall wrote: > >> > >> *** warning: this RFC patch series is only compile-tested *** > >> > >> We need a way to specify the address at which we expect VMs to access > >> the interrupt controller (both the emulated distributor and the hardware > >> interface supporting virtualization). User space should decide on this > >> address as user space decides on an emulated board and loads a device > >> tree describing these details directly to the guest. > >> > >> Instead of modifying the copying KVM_CREATE_IRQCHIP to an ARM specific > >> ioctl with a a highly device specific set of parameters, we try > >> something slightly more generic, that should fit well with how user > >> space (read QEMU) first builds the individual devices and later sets up > >> the emulated platform. > > > > > > Have you talked to Ben about this one? He wanted to design a new, more > > flexible irqchip API that would work for XICS & MPIC. Maybe there's some > > room for cooperation here? > > > I have not - Ben, what do you have in mind? I've taken over Ben's patches in this area and I'm currently working on getting them ready for submission. So far we only have XICS emulation, and it is accessed through hypercalls, so there are no addresses in the create-iochip ioctl argument yet. What we have so far is a new ioctl: #define KVM_CREATE_IRQCHIP_ARGS _IOW(KVMIO, 0xac, struct kvm_irqchip_args) where kvm_irqchip_args is defined in an arch header and currently looks like this: /* for KVM_CAP_SPAPR_XICS */ #define __KVM_HAVE_IRQCHIP_ARGS struct kvm_irqchip_args { #define KVM_IRQCHIP_TYPE_ICP 0 /* XICS: ICP (presentation controller) */ #define KVM_IRQCHIP_TYPE_ICS 1 /* XICS: ICS (source controller) */ __u32 type; union { /* XICS ICP arguments. This needs to be called once before * creating any VCPU to initialize the main kernel XICS data * structures. */ struct { #define KVM_ICP_FLAG_NOREALMODE 0x00000001 /* Disable real mode ICP */ __u32 flags; } icp; /* XICS ICS arguments. You can call this for every BUID you * want to make available. * * The BUID is 12 bits, the interrupt number within a BUID * is up to 12 bits as well. The resulting interrupt numbers * exposed to the guest are BUID || IRQ which is 24 bit * * BUID cannot be 0. */ struct { __u32 flags; __u16 buid; __u16 nr_irqs; } ics; }; }; With the XICS, there are two types of irqchip: a source controller and a presentation controller. There is one presentation controller per vcpu and typically one source controller per PCI host bridge (a source controller can manage multiple sources). The "buid" above is basically an identifier for a source controller. So with the above, it would be quite easy to add new types and arguments for them. Thoughts? Paul. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses 2012-10-17 22:10 ` Paul Mackerras @ 2012-10-17 23:58 ` Benjamin Herrenschmidt 2012-10-18 13:48 ` Christoffer Dall 0 siblings, 1 reply; 59+ messages in thread From: Benjamin Herrenschmidt @ 2012-10-17 23:58 UTC (permalink / raw) To: Paul Mackerras; +Cc: Christoffer Dall, Alexander Graf, kvmarm, kvm, kvm-ppc On Thu, 2012-10-18 at 09:10 +1100, Paul Mackerras wrote: > > With the XICS, there are two types of irqchip: a source controller and > a presentation controller. There is one presentation controller per > vcpu and typically one source controller per PCI host bridge (a source > controller can manage multiple sources). The "buid" above is > basically an identifier for a source controller. > > So with the above, it would be quite easy to add new types and > arguments for them. The only possible issue is that afiak, the ioctl number depends on the structure size, no ? If it does, then we should add some padding to the union to leave room for new types. Cheers, Ben. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses 2012-10-17 23:58 ` Benjamin Herrenschmidt @ 2012-10-18 13:48 ` Christoffer Dall 2012-10-18 13:49 ` Alexander Graf 2012-10-23 10:48 ` Jan Kiszka 0 siblings, 2 replies; 59+ messages in thread From: Christoffer Dall @ 2012-10-18 13:48 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Paul Mackerras, Alexander Graf, kvmarm, kvm, kvm-ppc, Peter Maydell On Wed, Oct 17, 2012 at 7:58 PM, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote: > On Thu, 2012-10-18 at 09:10 +1100, Paul Mackerras wrote: >> >> With the XICS, there are two types of irqchip: a source controller and >> a presentation controller. There is one presentation controller per >> vcpu and typically one source controller per PCI host bridge (a source >> controller can manage multiple sources). The "buid" above is >> basically an identifier for a source controller. >> >> So with the above, it would be quite easy to add new types and >> arguments for them. > > The only possible issue is that afiak, the ioctl number depends on the > structure size, no ? If it does, then we should add some padding to the > union to leave room for new types. > It sounds overall ok to me, however, Peter Maydell pointed out that QEMU is architected in such a way that creating the IRQ chip happens before QEMU knows how the chip fits with the model it is emulating and therefore lacks bits of information that we require for initializing the irq chip. Specifically on ARM the information needed is the base address of a hardware peripheral, which is virtualization aware, and is mapped directly into the guest's physical address space. One could argue that's not a concern for designing a good kernel api, but on the other hand, qemu is already quite a large system on its own, and we have to be practical. Another alternative is to just let qemu init the device, later give the base address and check before each execution of the vcpu whether the base address has been set and simply return an error if not. This latter approach just seems horrible and unintuitive to me. Thoughts? -Christoffer ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses 2012-10-18 13:48 ` Christoffer Dall @ 2012-10-18 13:49 ` Alexander Graf 2012-10-18 15:25 ` Avi Kivity 2012-10-23 10:48 ` Jan Kiszka 1 sibling, 1 reply; 59+ messages in thread From: Alexander Graf @ 2012-10-18 13:49 UTC (permalink / raw) To: Christoffer Dall Cc: Benjamin Herrenschmidt, Paul Mackerras, kvmarm, kvm, kvm-ppc, Peter Maydell On 18.10.2012, at 15:48, Christoffer Dall wrote: > On Wed, Oct 17, 2012 at 7:58 PM, Benjamin Herrenschmidt > <benh@kernel.crashing.org> wrote: >> On Thu, 2012-10-18 at 09:10 +1100, Paul Mackerras wrote: >>> >>> With the XICS, there are two types of irqchip: a source controller and >>> a presentation controller. There is one presentation controller per >>> vcpu and typically one source controller per PCI host bridge (a source >>> controller can manage multiple sources). The "buid" above is >>> basically an identifier for a source controller. >>> >>> So with the above, it would be quite easy to add new types and >>> arguments for them. >> >> The only possible issue is that afiak, the ioctl number depends on the >> structure size, no ? If it does, then we should add some padding to the >> union to leave room for new types. >> > It sounds overall ok to me, however, Peter Maydell pointed out that > QEMU is architected in such a way that creating the IRQ chip happens > before QEMU knows how the chip fits with the model it is emulating and > therefore lacks bits of information that we require for initializing > the irq chip. > > Specifically on ARM the information needed is the base address of a > hardware peripheral, which is virtualization aware, and is mapped > directly into the guest's physical address space. > > One could argue that's not a concern for designing a good kernel api, > but on the other hand, qemu is already quite a large system on its > own, and we have to be practical. > > Another alternative is to just let qemu init the device, later give > the base address and check before each execution of the vcpu whether > the base address has been set and simply return an error if not. This > latter approach just seems horrible and unintuitive to me. > > Thoughts? Wasn't there some "sane" field in the vcpu structs that you can hijack to make sure you only ever VCPU_RUN when the VM is complete? Alex ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses 2012-10-18 13:49 ` Alexander Graf @ 2012-10-18 15:25 ` Avi Kivity 0 siblings, 0 replies; 59+ messages in thread From: Avi Kivity @ 2012-10-18 15:25 UTC (permalink / raw) To: Alexander Graf Cc: Christoffer Dall, Benjamin Herrenschmidt, Paul Mackerras, kvmarm, kvm, kvm-ppc, Peter Maydell On 10/18/2012 03:49 PM, Alexander Graf wrote: > > On 18.10.2012, at 15:48, Christoffer Dall wrote: > >> On Wed, Oct 17, 2012 at 7:58 PM, Benjamin Herrenschmidt >> <benh@kernel.crashing.org> wrote: >>> On Thu, 2012-10-18 at 09:10 +1100, Paul Mackerras wrote: >>>> >>>> With the XICS, there are two types of irqchip: a source controller and >>>> a presentation controller. There is one presentation controller per >>>> vcpu and typically one source controller per PCI host bridge (a source >>>> controller can manage multiple sources). The "buid" above is >>>> basically an identifier for a source controller. >>>> >>>> So with the above, it would be quite easy to add new types and >>>> arguments for them. >>> >>> The only possible issue is that afiak, the ioctl number depends on the >>> structure size, no ? If it does, then we should add some padding to the >>> union to leave room for new types. >>> >> It sounds overall ok to me, however, Peter Maydell pointed out that >> QEMU is architected in such a way that creating the IRQ chip happens >> before QEMU knows how the chip fits with the model it is emulating and >> therefore lacks bits of information that we require for initializing >> the irq chip. >> >> Specifically on ARM the information needed is the base address of a >> hardware peripheral, which is virtualization aware, and is mapped >> directly into the guest's physical address space. >> >> One could argue that's not a concern for designing a good kernel api, >> but on the other hand, qemu is already quite a large system on its >> own, and we have to be practical. >> >> Another alternative is to just let qemu init the device, later give >> the base address and check before each execution of the vcpu whether >> the base address has been set and simply return an error if not. This >> latter approach just seems horrible and unintuitive to me. >> >> Thoughts? > > Wasn't there some "sane" field in the vcpu structs that you can hijack to make sure you only ever VCPU_RUN when the VM is complete? vcpu->requests is checked on each entry. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses 2012-10-18 13:48 ` Christoffer Dall 2012-10-18 13:49 ` Alexander Graf @ 2012-10-23 10:48 ` Jan Kiszka 2012-10-23 10:52 ` Peter Maydell 2012-10-24 0:50 ` Paul Mackerras 1 sibling, 2 replies; 59+ messages in thread From: Jan Kiszka @ 2012-10-23 10:48 UTC (permalink / raw) To: Christoffer Dall Cc: Benjamin Herrenschmidt, Paul Mackerras, Alexander Graf, kvmarm, kvm, kvm-ppc, Peter Maydell On 2012-10-18 15:48, Christoffer Dall wrote: > On Wed, Oct 17, 2012 at 7:58 PM, Benjamin Herrenschmidt > <benh@kernel.crashing.org> wrote: >> On Thu, 2012-10-18 at 09:10 +1100, Paul Mackerras wrote: >>> >>> With the XICS, there are two types of irqchip: a source controller and >>> a presentation controller. There is one presentation controller per >>> vcpu and typically one source controller per PCI host bridge (a source >>> controller can manage multiple sources). The "buid" above is >>> basically an identifier for a source controller. >>> >>> So with the above, it would be quite easy to add new types and >>> arguments for them. >> >> The only possible issue is that afiak, the ioctl number depends on the >> structure size, no ? If it does, then we should add some padding to the >> union to leave room for new types. >> > It sounds overall ok to me, however, Peter Maydell pointed out that > QEMU is architected in such a way that creating the IRQ chip happens > before QEMU knows how the chip fits with the model it is emulating and > therefore lacks bits of information that we require for initializing > the irq chip. > > Specifically on ARM the information needed is the base address of a > hardware peripheral, which is virtualization aware, and is mapped > directly into the guest's physical address space. > > One could argue that's not a concern for designing a good kernel api, > but on the other hand, qemu is already quite a large system on its > own, and we have to be practical. > > Another alternative is to just let qemu init the device, later give > the base address and check before each execution of the vcpu whether > the base address has been set and simply return an error if not. This > latter approach just seems horrible and unintuitive to me. > > Thoughts? The current irqchip API is like this: KVM_CREATE_IRQCHIP (without any parameters) ... KVM_CREATE_VCPU KVM_SET_IRQCHIP (or the other way around) ... KVM_RUN The arguments you cannot pass via KVM_CREATE_IRQCHIP - which is more like a "Hey, there will be an IRQ chip!" - could be passed via KVM_SET_IRQCHIP (it has 512 bytes space). Provided there are sane configuration defaults, at least after KVM_CREATE_VCPU, KVM_SET_IRQCHIP becomes optional. Then you don't need the a check on KVM_RUN. What do we need in addition to this in any of the non-x86 archs? Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses 2012-10-23 10:48 ` Jan Kiszka @ 2012-10-23 10:52 ` Peter Maydell 2012-10-23 11:00 ` Jan Kiszka 2012-10-24 0:50 ` Paul Mackerras 1 sibling, 1 reply; 59+ messages in thread From: Peter Maydell @ 2012-10-23 10:52 UTC (permalink / raw) To: Jan Kiszka Cc: Christoffer Dall, Benjamin Herrenschmidt, Paul Mackerras, Alexander Graf, kvmarm, kvm, kvm-ppc On 23 October 2012 11:48, Jan Kiszka <jan.kiszka@siemens.com> wrote: > The current irqchip API is like this: > > KVM_CREATE_IRQCHIP (without any parameters) > ... > KVM_CREATE_VCPU > KVM_SET_IRQCHIP (or the other way around) > ... > KVM_RUN > > The arguments you cannot pass via KVM_CREATE_IRQCHIP - which is more > like a "Hey, there will be an IRQ chip!" - could be passed via > KVM_SET_IRQCHIP (it has 512 bytes space). Provided there are sane > configuration defaults, at least after KVM_CREATE_VCPU, KVM_SET_IRQCHIP > becomes optional. Then you don't need the a check on KVM_RUN. KVM_SET_IRQCHIP is the state load ioctl, right (companion to KVM_GET_IRQCHIP)? It seems like a bit of an abuse to use that for configuration rather than for migration/sync of state with userspace... (For ARM we will just use the ONE_REG ABI for irqchip state save/load anyway, so KVM_SET_IRQCHIP isn't relevant.) -- PMM ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses 2012-10-23 10:52 ` Peter Maydell @ 2012-10-23 11:00 ` Jan Kiszka 2012-10-23 11:04 ` Peter Maydell 0 siblings, 1 reply; 59+ messages in thread From: Jan Kiszka @ 2012-10-23 11:00 UTC (permalink / raw) To: Peter Maydell Cc: Christoffer Dall, Benjamin Herrenschmidt, Paul Mackerras, Alexander Graf, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, kvm-ppc On 2012-10-23 12:52, Peter Maydell wrote: > On 23 October 2012 11:48, Jan Kiszka <jan.kiszka@siemens.com> wrote: >> The current irqchip API is like this: >> >> KVM_CREATE_IRQCHIP (without any parameters) >> ... >> KVM_CREATE_VCPU >> KVM_SET_IRQCHIP (or the other way around) >> ... >> KVM_RUN >> >> The arguments you cannot pass via KVM_CREATE_IRQCHIP - which is more >> like a "Hey, there will be an IRQ chip!" - could be passed via >> KVM_SET_IRQCHIP (it has 512 bytes space). Provided there are sane >> configuration defaults, at least after KVM_CREATE_VCPU, KVM_SET_IRQCHIP >> becomes optional. Then you don't need the a check on KVM_RUN. > > KVM_SET_IRQCHIP is the state load ioctl, right (companion to > KVM_GET_IRQCHIP)? It seems like a bit of an abuse to use that > for configuration rather than for migration/sync of state > with userspace... Depends on how reconfigurable your chip is. x86 also sends the mapping down this way, which happens to be guest configurable but is practically static. > > (For ARM we will just use the ONE_REG ABI for irqchip state > save/load anyway, so KVM_SET_IRQCHIP isn't relevant.) The less problems you should have using the SET interface to perform one-time configuration. BTW, I guess we will regret that one-reg ABI one day and have to introduce a multi-reg version again for hot-standby, i.e. continuous state migration. I know we also do this for c86 MSRs - that interface has the same limitation. Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses 2012-10-23 11:00 ` Jan Kiszka @ 2012-10-23 11:04 ` Peter Maydell 2012-10-23 11:08 ` Jan Kiszka 0 siblings, 1 reply; 59+ messages in thread From: Peter Maydell @ 2012-10-23 11:04 UTC (permalink / raw) To: Jan Kiszka Cc: Christoffer Dall, Benjamin Herrenschmidt, Paul Mackerras, Alexander Graf, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, kvm-ppc On 23 October 2012 12:00, Jan Kiszka <jan.kiszka@siemens.com> wrote: > BTW, I guess we will regret that one-reg ABI one day and have to > introduce a multi-reg version again for hot-standby, i.e. continuous > state migration. I know we also do this for c86 MSRs - that interface > has the same limitation. The multi-reg ABI idea has been floated, it would be easy enough to add. We just don't want to tie up getting ARM KVM support in with arguments over yet another ABI -- ONE_REG is sufficient for our current purposes. -- PMM ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses 2012-10-23 11:04 ` Peter Maydell @ 2012-10-23 11:08 ` Jan Kiszka 0 siblings, 0 replies; 59+ messages in thread From: Jan Kiszka @ 2012-10-23 11:08 UTC (permalink / raw) To: Peter Maydell Cc: Christoffer Dall, Benjamin Herrenschmidt, Paul Mackerras, Alexander Graf, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, kvm-ppc On 2012-10-23 13:04, Peter Maydell wrote: > On 23 October 2012 12:00, Jan Kiszka <jan.kiszka@siemens.com> wrote: >> BTW, I guess we will regret that one-reg ABI one day and have to >> introduce a multi-reg version again for hot-standby, i.e. continuous >> state migration. I know we also do this for c86 MSRs - that interface >> has the same limitation. > > The multi-reg ABI idea has been floated, it would be easy enough to add. > We just don't want to tie up getting ARM KVM support in with arguments > over yet another ABI -- ONE_REG is sufficient for our current > purposes. Good that the number of IOCTLs we can encode is still large. ;) Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses 2012-10-23 10:48 ` Jan Kiszka 2012-10-23 10:52 ` Peter Maydell @ 2012-10-24 0:50 ` Paul Mackerras 2012-10-25 11:57 ` Jan Kiszka 2012-10-25 16:14 ` Paolo Bonzini 1 sibling, 2 replies; 59+ messages in thread From: Paul Mackerras @ 2012-10-24 0:50 UTC (permalink / raw) To: Jan Kiszka Cc: Christoffer Dall, Benjamin Herrenschmidt, Alexander Graf, kvmarm, kvm, kvm-ppc, Peter Maydell On Tue, Oct 23, 2012 at 12:48:28PM +0200, Jan Kiszka wrote: > The current irqchip API is like this: > > KVM_CREATE_IRQCHIP (without any parameters) > ... > KVM_CREATE_VCPU > KVM_SET_IRQCHIP (or the other way around) > ... > KVM_RUN > > The arguments you cannot pass via KVM_CREATE_IRQCHIP - which is more > like a "Hey, there will be an IRQ chip!" - could be passed via > KVM_SET_IRQCHIP (it has 512 bytes space). Provided there are sane > configuration defaults, at least after KVM_CREATE_VCPU, KVM_SET_IRQCHIP > becomes optional. Then you don't need the a check on KVM_RUN. Interesting. How many times do you call KVM_CREATE_IRQCHIP per VM? Just once? > What do we need in addition to this in any of the non-x86 archs? What we have with the XICS, and to some extent with the OpenPIC, is a separation between "source" and "presentation" controllers, with a message-passing fabric between them. The source controllers handle the details of some number of interrupt sources, such as the priority and destination of each interrupt source, and the presentation controllers handle the interface to the CPUs, so there is one presentation controller per CPU. The presentation controller for a CPU has registers for the CPU priority, IPI request priority, and pending interrupt status. So we could indeed use the existing KVM_CREATE_IRQCHIP to tell KVM to create a presentation controller per vcpu. But then how do we tell KVM how many source controllers we want and how many interrupts each source controller should handle? The source controllers are not tied to any particular vcpu, and a source controller could potentially have 100s of interrupts or more (particularly with MSIs). Configuration of each source fits into 64 bits, so if we tried to use KVM_SET_IRQCHIP for configuring a source controller we'd be limited to 64 interrupts per source controller. What I have in my patches to do XICS emulation in the kernel is a new KVM_CREATE_IRQCHIP_ARGS ioctl, which takes an argument struct with a type, and for source controllers, an identifying number ("bus unit ID" or BUID, since that's what the hardware calls it) and the number of sources. Then for saving/restoring the presentation controller state there's a register identifier for the KVM_GET/SET_ONE_REG ioctls, and for the source controllers there are new KVM_IRQCHIP_GET/SET_SOURCES ioctls that take an argument struct like this: struct kvm_irq_sources { __u32 start_irq_number; __u32 nr_irqs; __u64 *irqbuf; }; OpenPIC also can handle 100s or 1000s of interrupt sources and can have the sources divided up into blocks (which tends to make it desirable to have multiple source controllers). It also has per-CPU interrupt delivery registers and per-source interrupt source registers. So I think all this could be shoehorned into KVM_CREATE/GET/SET_IRQCHIP for small configurations, but it seems like it would run out of space for larger configurations. Paul. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses 2012-10-24 0:50 ` Paul Mackerras @ 2012-10-25 11:57 ` Jan Kiszka 2012-10-25 16:14 ` Paolo Bonzini 1 sibling, 0 replies; 59+ messages in thread From: Jan Kiszka @ 2012-10-25 11:57 UTC (permalink / raw) To: Paul Mackerras Cc: Christoffer Dall, Benjamin Herrenschmidt, Alexander Graf, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, kvm-ppc, Peter Maydell On 2012-10-24 02:50, Paul Mackerras wrote: > On Tue, Oct 23, 2012 at 12:48:28PM +0200, Jan Kiszka wrote: > >> The current irqchip API is like this: >> >> KVM_CREATE_IRQCHIP (without any parameters) >> ... >> KVM_CREATE_VCPU >> KVM_SET_IRQCHIP (or the other way around) >> ... >> KVM_RUN >> >> The arguments you cannot pass via KVM_CREATE_IRQCHIP - which is more >> like a "Hey, there will be an IRQ chip!" - could be passed via >> KVM_SET_IRQCHIP (it has 512 bytes space). Provided there are sane >> configuration defaults, at least after KVM_CREATE_VCPU, KVM_SET_IRQCHIP >> becomes optional. Then you don't need the a check on KVM_RUN. > > Interesting. How many times do you call KVM_CREATE_IRQCHIP per VM? > Just once? Yes, just once. The model is that we switch between user space and kernel space emulation (or hardware-assisted virtualization) of the irqchip(s). > >> What do we need in addition to this in any of the non-x86 archs? > > What we have with the XICS, and to some extent with the OpenPIC, is a > separation between "source" and "presentation" controllers, with a > message-passing fabric between them. The source controllers handle > the details of some number of interrupt sources, such as the priority > and destination of each interrupt source, and the presentation > controllers handle the interface to the CPUs, so there is one > presentation controller per CPU. The presentation controller for a > CPU has registers for the CPU priority, IPI request priority, and > pending interrupt status. > > So we could indeed use the existing KVM_CREATE_IRQCHIP to tell KVM to > create a presentation controller per vcpu. But then how do we tell > KVM how many source controllers we want and how many interrupts each > source controller should handle? The source controllers are not tied > to any particular vcpu, and a source controller could potentially have > 100s of interrupts or more (particularly with MSIs). Configuration of > each source fits into 64 bits, so if we tried to use KVM_SET_IRQCHIP > for configuring a source controller we'd be limited to 64 interrupts > per source controller. > > What I have in my patches to do XICS emulation in the kernel is a new > KVM_CREATE_IRQCHIP_ARGS ioctl, which takes an argument struct with a > type, and for source controllers, an identifying number ("bus unit ID" > or BUID, since that's what the hardware calls it) and the number of > sources. Then for saving/restoring the presentation controller state > there's a register identifier for the KVM_GET/SET_ONE_REG ioctls, and > for the source controllers there are new KVM_IRQCHIP_GET/SET_SOURCES > ioctls that take an argument struct like this: > > struct kvm_irq_sources { > __u32 start_irq_number; > __u32 nr_irqs; > __u64 *irqbuf; > }; > > OpenPIC also can handle 100s or 1000s of interrupt sources and can > have the sources divided up into blocks (which tends to make it > desirable to have multiple source controllers). It also has per-CPU > interrupt delivery registers and per-source interrupt source > registers. > > So I think all this could be shoehorned into KVM_CREATE/GET/SET_IRQCHIP > for small configurations, but it seems like it would run out of space > for larger configurations. Our architectures are not that different. We'll have the same challenge on x86 one day as well as there can be several IOAPICs (source controllers), not just one as today. Those should be addressed via chip_id of struct kvm_irqchip (we have a 32-bit address space there). Also there the question is when to instantiate the chips. Without adding another IOCTL, they could be created on first SET_IRQCHIP. For Power, the number of IRQ lines can become a set-once field in the source controller state, i.e. must never be written twice with different values. But, of course, some KVM_CREATE_IRQCHIP[2|_ARGS] that takes extra arguments and specifies those details is also be an option. There the question is how often it should be called: once with a list of all necessary parameters or multiple times as in your model. As I'd like to see a new IOCTL being able to replace the old one (though we will still support it for older user space, of course), I'm leaning more toward the first option. Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses 2012-10-24 0:50 ` Paul Mackerras 2012-10-25 11:57 ` Jan Kiszka @ 2012-10-25 16:14 ` Paolo Bonzini 2012-10-25 16:32 ` Jan Kiszka 2012-10-25 19:39 ` Benjamin Herrenschmidt 1 sibling, 2 replies; 59+ messages in thread From: Paolo Bonzini @ 2012-10-25 16:14 UTC (permalink / raw) To: Paul Mackerras Cc: Jan Kiszka, Christoffer Dall, Benjamin Herrenschmidt, Alexander Graf, kvmarm, kvm, kvm-ppc, Peter Maydell Il 24/10/2012 02:50, Paul Mackerras ha scritto: > So we could indeed use the existing KVM_CREATE_IRQCHIP to tell KVM to > create a presentation controller per vcpu. But then how do we tell > KVM how many source controllers we want and how many interrupts each > source controller should handle? The source controllers are not tied > to any particular vcpu, and a source controller could potentially have > 100s of interrupts or more (particularly with MSIs). Configuration of > each source fits into 64 bits, so if we tried to use KVM_SET_IRQCHIP > for configuring a source controller we'd be limited to 64 interrupts > per source controller. As Jan said, there's more than a few similarities between the x86 and PPC model. Taking inspiration from the x86 API, configuring the source controller seems to be more of a task for KVM_SET_GSI_ROUTING (a very x86-centric name, I must admit). Paolo ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses 2012-10-25 16:14 ` Paolo Bonzini @ 2012-10-25 16:32 ` Jan Kiszka 2012-10-25 18:27 ` Paolo Bonzini 2012-10-25 19:39 ` Benjamin Herrenschmidt 1 sibling, 1 reply; 59+ messages in thread From: Jan Kiszka @ 2012-10-25 16:32 UTC (permalink / raw) To: Paolo Bonzini Cc: Paul Mackerras, Christoffer Dall, Benjamin Herrenschmidt, Alexander Graf, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, kvm-ppc, Peter Maydell On 2012-10-25 18:14, Paolo Bonzini wrote: > Il 24/10/2012 02:50, Paul Mackerras ha scritto: >> So we could indeed use the existing KVM_CREATE_IRQCHIP to tell KVM to >> create a presentation controller per vcpu. But then how do we tell >> KVM how many source controllers we want and how many interrupts each >> source controller should handle? The source controllers are not tied >> to any particular vcpu, and a source controller could potentially have >> 100s of interrupts or more (particularly with MSIs). Configuration of >> each source fits into 64 bits, so if we tried to use KVM_SET_IRQCHIP >> for configuring a source controller we'd be limited to 64 interrupts >> per source controller. > > As Jan said, there's more than a few similarities between the x86 and > PPC model. > > Taking inspiration from the x86 API, configuring the source controller > seems to be more of a task for KVM_SET_GSI_ROUTING (a very x86-centric > name, I must admit). For wiring things together, I agree. But the IOAPIC has a fixed number of input lines per chip, Power needs to configure them. I don't think deriving this from addressed lines is the best solution. Some lines may be configured (too) late, when the chip should have defined its configuration already. Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses 2012-10-25 16:32 ` Jan Kiszka @ 2012-10-25 18:27 ` Paolo Bonzini 2012-10-25 19:40 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 59+ messages in thread From: Paolo Bonzini @ 2012-10-25 18:27 UTC (permalink / raw) To: Jan Kiszka Cc: Paul Mackerras, Christoffer Dall, Benjamin Herrenschmidt, Alexander Graf, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, kvm-ppc, Peter Maydell Il 25/10/2012 18:32, Jan Kiszka ha scritto: > For wiring things together, I agree. But the IOAPIC has a fixed number > of input lines per chip, Power needs to configure them. I don't think > deriving this from addressed lines is the best solution. Some lines may > be configured (too) late, when the chip should have defined its > configuration already. Sure, I was replying to this only: "Configuration of each source fits into 64 bits, so if we tried to use KVM_SET_IRQCHIP for configuring a source controller we'd be limited to 64 interrupts per source controller" (I figure 64 is the size of the KVM_SET_IRQCHIP union / 64 bits per entry). Probably you do need a variant of KVM_CREATE_IRQCHIP to create the IOAPICs/source controllers (Paul's proposal at http://permalink.gmane.org/gmane.comp.emulators.kvm.powerpc.devel/5674 for example), assign chip ids to them, set the number of input lines, etc. but the configuration should work well with the existing ioctls, with no limit on the number of sources. Paolo ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses 2012-10-25 18:27 ` Paolo Bonzini @ 2012-10-25 19:40 ` Benjamin Herrenschmidt 2012-10-26 9:58 ` Paolo Bonzini 0 siblings, 1 reply; 59+ messages in thread From: Benjamin Herrenschmidt @ 2012-10-25 19:40 UTC (permalink / raw) To: Paolo Bonzini Cc: Jan Kiszka, Paul Mackerras, Christoffer Dall, Alexander Graf, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, kvm-ppc, Peter Maydell On Thu, 2012-10-25 at 20:27 +0200, Paolo Bonzini wrote: > Probably you do need a variant of KVM_CREATE_IRQCHIP to create the > IOAPICs/source controllers (Paul's proposal at > http://permalink.gmane.org/gmane.comp.emulators.kvm.powerpc.devel/5674 > for example), assign chip ids to them, set the number of input lines, > etc. but the configuration should work well with the existing ioctls, > with no limit on the number of sources. But what do you mean by "configuration" really ? I don't see anything in common there. Cheers, Ben. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses 2012-10-25 19:40 ` Benjamin Herrenschmidt @ 2012-10-26 9:58 ` Paolo Bonzini 2012-10-26 10:09 ` Peter Maydell 2012-10-26 10:37 ` Benjamin Herrenschmidt 0 siblings, 2 replies; 59+ messages in thread From: Paolo Bonzini @ 2012-10-26 9:58 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Jan Kiszka, Paul Mackerras, Christoffer Dall, Alexander Graf, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, kvm-ppc, Peter Maydell Il 25/10/2012 21:40, Benjamin Herrenschmidt ha scritto: >> > Probably you do need a variant of KVM_CREATE_IRQCHIP to create the >> > IOAPICs/source controllers (Paul's proposal at >> > http://permalink.gmane.org/gmane.comp.emulators.kvm.powerpc.devel/5674 >> > for example), assign chip ids to them, set the number of input lines, >> > etc. but the configuration should work well with the existing ioctls, >> > with no limit on the number of sources. > But what do you mean by "configuration" really ? I don't see anything in > common there. Wiring which MSI-X interrupts go to which source controllers. If you have one source controller per PCI bridge, you need to tell the kernel the mapping between MSI messages interrupts and PCI bridges, and update it whenever the MSI masking changes. The other problem is configuring the redirection table. If you need >64 sources you need ioctls like KVM_GET/SET_IRQCHIP_ONE_REG. Paolo ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses 2012-10-26 9:58 ` Paolo Bonzini @ 2012-10-26 10:09 ` Peter Maydell 2012-10-26 10:15 ` Paolo Bonzini 2012-10-26 10:37 ` Benjamin Herrenschmidt 1 sibling, 1 reply; 59+ messages in thread From: Peter Maydell @ 2012-10-26 10:09 UTC (permalink / raw) To: Paolo Bonzini Cc: Benjamin Herrenschmidt, Jan Kiszka, Paul Mackerras, Christoffer Dall, Alexander Graf, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, kvm-ppc On 26 October 2012 10:58, Paolo Bonzini <pbonzini@redhat.com> wrote: > Wiring which MSI-X interrupts go to which source controllers. If you > have one source controller per PCI bridge, you need to tell the kernel > the mapping between MSI messages interrupts and PCI bridges, and update > it whenever the MSI masking changes. > > The other problem is configuring the redirection table. If you need >64 > sources you need ioctls like KVM_GET/SET_IRQCHIP_ONE_REG. Why would you want an extra ONE_REG-like ioctl? The existing ONE_REG ioctls have plenty of space in the ID range to allow you to devote a subsection of it to your irqchip. (This is exactly how the ARM VGIC save/load is going to work.) Whether you want to do startup configuration and board wiring via the same ioctl that handles runtime state save/load/migration is a different question, of course. -- PMM ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses 2012-10-26 10:09 ` Peter Maydell @ 2012-10-26 10:15 ` Paolo Bonzini 2012-10-26 10:22 ` Jan Kiszka 2012-10-26 10:44 ` Benjamin Herrenschmidt 0 siblings, 2 replies; 59+ messages in thread From: Paolo Bonzini @ 2012-10-26 10:15 UTC (permalink / raw) To: Peter Maydell Cc: Benjamin Herrenschmidt, Jan Kiszka, Paul Mackerras, Christoffer Dall, Alexander Graf, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, kvm-ppc Il 26/10/2012 12:09, Peter Maydell ha scritto: >> > >> > The other problem is configuring the redirection table. If you need >64 >> > sources you need ioctls like KVM_GET/SET_IRQCHIP_ONE_REG. > Why would you want an extra ONE_REG-like ioctl? The existing ONE_REG > ioctls have plenty of space in the ID range to allow you to devote > a subsection of it to your irqchip. (This is exactly how the ARM > VGIC save/load is going to work.) Ok, I stand corrected. :) > Whether you want to do startup configuration and board wiring via > the same ioctl that handles runtime state save/load/migration is > a different question, of course. QEMU's MSI-X routing is not x86-specific, so it should use the same KVM_SET_GSI_ROUTING ioctl that x86 uses. Paolo ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses 2012-10-26 10:15 ` Paolo Bonzini @ 2012-10-26 10:22 ` Jan Kiszka 2012-10-26 10:44 ` Benjamin Herrenschmidt 1 sibling, 0 replies; 59+ messages in thread From: Jan Kiszka @ 2012-10-26 10:22 UTC (permalink / raw) To: Paolo Bonzini Cc: Peter Maydell, Benjamin Herrenschmidt, Paul Mackerras, Christoffer Dall, Alexander Graf, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, kvm-ppc On 2012-10-26 12:15, Paolo Bonzini wrote: > Il 26/10/2012 12:09, Peter Maydell ha scritto: >>>> >>>> The other problem is configuring the redirection table. If you need >64 >>>> sources you need ioctls like KVM_GET/SET_IRQCHIP_ONE_REG. >> Why would you want an extra ONE_REG-like ioctl? The existing ONE_REG >> ioctls have plenty of space in the ID range to allow you to devote >> a subsection of it to your irqchip. (This is exactly how the ARM >> VGIC save/load is going to work.) > > Ok, I stand corrected. :) > >> Whether you want to do startup configuration and board wiring via >> the same ioctl that handles runtime state save/load/migration is >> a different question, of course. > > QEMU's MSI-X routing is not x86-specific, so it should use the same > KVM_SET_GSI_ROUTING ioctl that x86 uses. And it's not only MSI[-X]. Most IRQ sources need to be rounted, either from userspace or from irqfd or from some other in-kernel source to a specific IRQ controller. That allows to customize things according to a specific board / SoC emulation. Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses 2012-10-26 10:15 ` Paolo Bonzini 2012-10-26 10:22 ` Jan Kiszka @ 2012-10-26 10:44 ` Benjamin Herrenschmidt 2012-10-26 11:00 ` Jan Kiszka 2012-10-26 11:17 ` Peter Maydell 1 sibling, 2 replies; 59+ messages in thread From: Benjamin Herrenschmidt @ 2012-10-26 10:44 UTC (permalink / raw) To: Paolo Bonzini Cc: Peter Maydell, Jan Kiszka, Paul Mackerras, Christoffer Dall, Alexander Graf, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, kvm-ppc On Fri, 2012-10-26 at 12:15 +0200, Paolo Bonzini wrote: > > Whether you want to do startup configuration and board wiring via > > the same ioctl that handles runtime state save/load/migration is > > a different question, of course. > > QEMU's MSI-X routing is not x86-specific, so it should use the same > KVM_SET_GSI_ROUTING ioctl that x86 uses. Well, that's the thing, I haven't managed to figure that out so far, it looks very x86-specific to me. To begin with there's no such thing as a "GSI" in our world. Basically we have a global interrupt number space. Interrupt numbers are 24-bit long quantities. On real HW, some bits are called the "BUID" and identify a given source controller and some bits are the interrupt within that source controller but that's fairly flexible and generally the OS doesn't care about it. The firmware sets up the mappings and tells us the final numbers via the device-tree. Under a hypervisor, it's totally virtualized already so we show a flat 24 bit number space to the guest. MSIs don't work exactly like x86 either. On real HW, we have a different MSI port per "partitionable endpoint" which are use purely for validation of access permission. The message itself contain the interrupt source number within the BUID of the bridge. A given bridge today can contains up to 256 of these on a P7IOC chip but upcoming stuff can have thousands. The final interrupt number seen by the OS is thus just that MSI message in the bottom bits and the BUID in the top bits. Here too, under a hypervisor, it's all virtualized so qemu just gives 24 bit numbers to the various emulated MSIs as part of the global interrupt number space. I'm not sure how any of that would need kernel communication. All we need is to be able to associate a given global interrupt with an eventfd. I might just miss some subtleties here but so far I haven't been able to figure out how to "shoehorn" our stuff in the very x86-centric existing interfaces to the kernel APICs. In fact all that code is in a generic location in kvm but is really x86/ia64 centric and the interfaces seem to be as well. Cheers, Ben. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses 2012-10-26 10:44 ` Benjamin Herrenschmidt @ 2012-10-26 11:00 ` Jan Kiszka 2012-10-26 11:09 ` Benjamin Herrenschmidt 2012-10-26 11:17 ` Peter Maydell 1 sibling, 1 reply; 59+ messages in thread From: Jan Kiszka @ 2012-10-26 11:00 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Paolo Bonzini, Peter Maydell, Paul Mackerras, Christoffer Dall, Alexander Graf, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, kvm-ppc On 2012-10-26 12:44, Benjamin Herrenschmidt wrote: > On Fri, 2012-10-26 at 12:15 +0200, Paolo Bonzini wrote: >>> Whether you want to do startup configuration and board wiring via >>> the same ioctl that handles runtime state save/load/migration is >>> a different question, of course. >> >> QEMU's MSI-X routing is not x86-specific, so it should use the same >> KVM_SET_GSI_ROUTING ioctl that x86 uses. > > Well, that's the thing, I haven't managed to figure that out so far, it > looks very x86-specific to me. To begin with there's no such thing as a > "GSI" in our world. > > Basically we have a global interrupt number space. Interrupt numbers are > 24-bit long quantities. On real HW, some bits are called the "BUID" and > identify a given source controller and some bits are the interrupt > within that source controller but that's fairly flexible and generally > the OS doesn't care about it. The firmware sets up the mappings and > tells us the final numbers via the device-tree. > > Under a hypervisor, it's totally virtualized already so we show a flat > 24 bit number space to the guest. > > MSIs don't work exactly like x86 either. On real HW, we have a different > MSI port per "partitionable endpoint" which are use purely for > validation of access permission. The message itself contain the > interrupt source number within the BUID of the bridge. A given bridge > today can contains up to 256 of these on a P7IOC chip but upcoming stuff > can have thousands. The final interrupt number seen by the OS is thus > just that MSI message in the bottom bits and the BUID in the top bits. > > Here too, under a hypervisor, it's all virtualized so qemu just gives 24 > bit numbers to the various emulated MSIs as part of the global interrupt > number space. > > I'm not sure how any of that would need kernel communication. All we > need is to be able to associate a given global interrupt with an > eventfd. And at latest there you will need the IRQ routing infrastructure of KVM. It tells KVM which "virtual IRQ" (badly named "GSI") triggers which event at which input, e.g. a physical IRQ line at some IRQ controller or a specific message at some MSI receiver. You shouldn't try to invent a Power wheel here, rather tune the existing one to become more generic. We could even try to get rid of that unfortunate GSI name (when leaving aliases behind), though that is cosmetic. > > I might just miss some subtleties here but so far I haven't been able to > figure out how to "shoehorn" our stuff in the very x86-centric existing > interfaces to the kernel APICs. In fact all that code is in a generic > location in kvm but is really x86/ia64 centric and the interfaces seem > to be as well. That's not true in general, though you surely find a lot of traces and still a few concrete x86 bits under virt/kvm. Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses 2012-10-26 11:00 ` Jan Kiszka @ 2012-10-26 11:09 ` Benjamin Herrenschmidt 2012-10-26 11:57 ` Paolo Bonzini 0 siblings, 1 reply; 59+ messages in thread From: Benjamin Herrenschmidt @ 2012-10-26 11:09 UTC (permalink / raw) To: Jan Kiszka Cc: Paolo Bonzini, Peter Maydell, Paul Mackerras, Christoffer Dall, Alexander Graf, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, kvm-ppc On Fri, 2012-10-26 at 13:00 +0200, Jan Kiszka wrote: > And at latest there you will need the IRQ routing infrastructure of KVM. > It tells KVM which "virtual IRQ" (badly named "GSI") triggers which > event at which input, e.g. a physical IRQ line at some IRQ controller or > a specific message at some MSI receiver. You shouldn't try to invent a > Power wheel here, rather tune the existing one to become more generic. > We could even try to get rid of that unfortunate GSI name (when leaving > aliases behind), though that is cosmetic. Ok, there must be something wrong with me, I still don't understand what you are talking about. What "MSI receiver" ? What physical IRQ line are you talking about ? How is the kernel involved ? The only cases I can think of are the association between a virtual interrupt (ie, an interrupt in the guest interrupt number space) and an in-kernel source for that interrupt, ie, vhost and PCI pass-through essentially. Anything else is under qemu control. IE. MSIs or LSIs generated by emulated devices are just normal interrupt that go through our ioctl to trigger the in-kernel source with the same number. I don't see any "routing" happening anywhere in that picture really. The firmware calls done by the guest to change the target of interrupts (which CPU/presentation controller to direct a given interrupt to) are handled entirely in the kernel in platform specific code and update our internal ICS state. > > > > I might just miss some subtleties here but so far I haven't been able to > > figure out how to "shoehorn" our stuff in the very x86-centric existing > > interfaces to the kernel APICs. In fact all that code is in a generic > > location in kvm but is really x86/ia64 centric and the interfaces seem > > to be as well. > > That's not true in general, though you surely find a lot of traces and > still a few concrete x86 bits under virt/kvm. Well, I haven't found anything in virt/kvm/irq_comm.c that was of any use to us. Again, I might be sufferring from a major misunderstanding here but as far as I can tell, the model is totally different. Besides, that file has a hard coded list of what looks like completely x86 specific mappings between "GSI" and "interrupt numbers" (again I don't understand completely the distinction and I don't think we have anything like it on power). Ben. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses 2012-10-26 11:09 ` Benjamin Herrenschmidt @ 2012-10-26 11:57 ` Paolo Bonzini 2012-10-26 12:08 ` Peter Maydell 2012-10-26 20:21 ` Benjamin Herrenschmidt 0 siblings, 2 replies; 59+ messages in thread From: Paolo Bonzini @ 2012-10-26 11:57 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Jan Kiszka, Peter Maydell, Paul Mackerras, Christoffer Dall, Alexander Graf, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, kvm-ppc Il 26/10/2012 13:09, Benjamin Herrenschmidt ha scritto: > The only cases I can think of are the association between a virtual > interrupt (ie, an interrupt in the guest interrupt number space) and an > in-kernel source for that interrupt, ie, vhost and PCI pass-through > essentially. If you exclude old-style PCI pass-through and limit yourself to vhost and VFIO, you can treat irqfd as "the" in-kernel source of the interrupt. Then you need a mapping between MSIs and numbers used in KVM_IRQFD ("GSIs"). This is what KVM_SET_GSI_ROUTING modifies, and basically the mapping is modified every time a vector is masked/unmasked in the MSI-X table. Paolo ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses 2012-10-26 11:57 ` Paolo Bonzini @ 2012-10-26 12:08 ` Peter Maydell 2012-10-26 12:41 ` Jan Kiszka 2012-10-26 20:21 ` Benjamin Herrenschmidt 1 sibling, 1 reply; 59+ messages in thread From: Peter Maydell @ 2012-10-26 12:08 UTC (permalink / raw) To: Paolo Bonzini Cc: Benjamin Herrenschmidt, Jan Kiszka, Paul Mackerras, Christoffer Dall, Alexander Graf, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, kvm-ppc On 26 October 2012 12:57, Paolo Bonzini <pbonzini@redhat.com> wrote: > If you exclude old-style PCI pass-through and limit yourself to vhost > and VFIO, you can treat irqfd as "the" in-kernel source of the > interrupt. Then you need a mapping between MSIs and numbers used in > KVM_IRQFD ("GSIs"). > > This is what KVM_SET_GSI_ROUTING modifies, and basically the mapping is > modified every time a vector is masked/unmasked in the MSI-X table. So SET_GSI_ROUTING sets the routing for MSIs? Very logical... -- PMM ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses 2012-10-26 12:08 ` Peter Maydell @ 2012-10-26 12:41 ` Jan Kiszka 0 siblings, 0 replies; 59+ messages in thread From: Jan Kiszka @ 2012-10-26 12:41 UTC (permalink / raw) To: Peter Maydell Cc: Paolo Bonzini, Benjamin Herrenschmidt, Paul Mackerras, Christoffer Dall, Alexander Graf, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, kvm-ppc On 2012-10-26 14:08, Peter Maydell wrote: > On 26 October 2012 12:57, Paolo Bonzini <pbonzini@redhat.com> wrote: >> If you exclude old-style PCI pass-through and limit yourself to vhost >> and VFIO, you can treat irqfd as "the" in-kernel source of the >> interrupt. Then you need a mapping between MSIs and numbers used in >> KVM_IRQFD ("GSIs"). >> >> This is what KVM_SET_GSI_ROUTING modifies, and basically the mapping is >> modified every time a vector is masked/unmasked in the MSI-X table. > > So SET_GSI_ROUTING sets the routing for MSIs? Very logical... See my reply to Ben: It is used for MSIs as well, but not only. The concept is absolutely generic, you just need to define specific target types and provide ways to associate specific sources with a virtual IRQ number ("GSI"). Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses 2012-10-26 11:57 ` Paolo Bonzini 2012-10-26 12:08 ` Peter Maydell @ 2012-10-26 20:21 ` Benjamin Herrenschmidt 1 sibling, 0 replies; 59+ messages in thread From: Benjamin Herrenschmidt @ 2012-10-26 20:21 UTC (permalink / raw) To: Paolo Bonzini Cc: Jan Kiszka, Peter Maydell, Paul Mackerras, Christoffer Dall, Alexander Graf, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, kvm-ppc On Fri, 2012-10-26 at 13:57 +0200, Paolo Bonzini wrote: > Il 26/10/2012 13:09, Benjamin Herrenschmidt ha scritto: > > The only cases I can think of are the association between a virtual > > interrupt (ie, an interrupt in the guest interrupt number space) and an > > in-kernel source for that interrupt, ie, vhost and PCI pass-through > > essentially. > > If you exclude old-style PCI pass-through and limit yourself to vhost > and VFIO, you can treat irqfd as "the" in-kernel source of the > interrupt. Then you need a mapping between MSIs and numbers used in > KVM_IRQFD ("GSIs"). Argh. Ok, I get that we need a mapping between an irqfd and a global number, I don't see where MSIs come into the picture at all here. At least for us they don't. > This is what KVM_SET_GSI_ROUTING modifies, and basically the mapping is > modified every time a vector is masked/unmasked in the MSI-X table. Right and that doesn't make sense for us. In fact I don't understand how it makes sense for x86 either, but that's because I don't understand how the APIC works I suppose. Ben. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses 2012-10-26 10:44 ` Benjamin Herrenschmidt 2012-10-26 11:00 ` Jan Kiszka @ 2012-10-26 11:17 ` Peter Maydell 2012-10-26 11:39 ` Benjamin Herrenschmidt 1 sibling, 1 reply; 59+ messages in thread From: Peter Maydell @ 2012-10-26 11:17 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Paolo Bonzini, Jan Kiszka, Paul Mackerras, Christoffer Dall, Alexander Graf, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, kvm-ppc On 26 October 2012 11:44, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote: > On Fri, 2012-10-26 at 12:15 +0200, Paolo Bonzini wrote: >> QEMU's MSI-X routing is not x86-specific, so it should use the same >> KVM_SET_GSI_ROUTING ioctl that x86 uses. > > Well, that's the thing, I haven't managed to figure that out so far, it > looks very x86-specific to me. To begin with there's no such thing as a > "GSI" in our world. This was roughly the feeling I had looking at these APIs. There might be some underlying generic concept but there is a definite tendency for the surface representation to use x86 specific terminology to the extent that you can't tell whether an API is x86 specific or merely apparently so... -- PMM ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses 2012-10-26 11:17 ` Peter Maydell @ 2012-10-26 11:39 ` Benjamin Herrenschmidt 2012-10-26 12:39 ` Jan Kiszka 0 siblings, 1 reply; 59+ messages in thread From: Benjamin Herrenschmidt @ 2012-10-26 11:39 UTC (permalink / raw) To: Peter Maydell Cc: Paolo Bonzini, Jan Kiszka, Paul Mackerras, Christoffer Dall, Alexander Graf, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, kvm-ppc On Fri, 2012-10-26 at 12:17 +0100, Peter Maydell wrote: > > Well, that's the thing, I haven't managed to figure that out so far, > it > > looks very x86-specific to me. To begin with there's no such thing > as a > > "GSI" in our world. > > This was roughly the feeling I had looking at these APIs. There > might be some underlying generic concept but there is a definite > tendency for the surface representation to use x86 specific > terminology to the extent that you can't tell whether an API > is x86 specific or merely apparently so... Right. Which is why I'm sure I'm actually missing something there :-) And I'm hoping Paolo and Jan will help shed some light. It might help if somebody could explain a bit more what a GSI is in x86 land and how it relates to the various APICs, along with what exactly they mean by "routing" , ie. what are the different elements that get associated. Basically, if somebody could describe how the x86 stuff works, that might help. >From my view of things, we have various "sources" of interrupts. On my list are emulated device LSIs, emulated device MSIs, both in qemu, then vhost and finally pass-through, I suppose on some platforms IPIs come in as well though. Those "sources" need, one way or another, to hit a source controller which will then itself, in a platform specific way, shoot the interrupt to a presentation controller. The routing between source and presentation controllers is fairly platform specific as far as I can tell even within a given CPU family. Ie the way an OpenPIC (aka MPIC, used on macs) does it is different than the way the XICS system does it on pseries, and is different from most embedded stuff (which typically doesn't have that source/presentation distinction but just cascaded dumber PICs). The amount of configurability, the type of configuration information etc... that governs such a layout is also very specific to the platform and the type of interrupt controller system used on it. Remains the "routing" between source of "events" and actual "inputs" to a source controller. This too doesn't seem totally obvious to generalize. For example an embedded platform with a bunch of cascaded dumb interrupt controllers doesn't have a concept of a flat number space in HW, an interrupt "input" to be identified properly, needs to identify the controller and the interrupt within that controller. However, within KVM/qemu, it's pretty easy to assign to each controller a number and by collating the two, get some kind of flat space, though it's not arbitrary and the routing is thus fairly constrained if not totally fixed. In the pseries case, the global number is split in two bit fields, the BUID identifying the specific source controller and the source within that controller. Here too it's fairly fixed though. So the ioctl we use to create a source controller in the kernel takes the BUID as an argument, and from there the kernel will "find" the right source controller based solely on the interrupt number. So basically on one side we have a global interrupt number that identifies an "input", I assume that's what x86 calls a GSI ? Remains how to associate the various sources of interrupts to that 'global number'... and that is fairly specific to each source type isn't it ? In our current powerpc code, the emulated devices toggle the qirq which ends up shooting an ioctl to set/reset or "message" (for MSIs) the corresponding global interrupt. The mapping is established entirely within qemu, we just tell the kernel to trigger a given interrupt. We haven't really sorted vhost out yet so I'm not sure how that will work out but the idea would be to have an ioctl to associate an eventfd or whatever vhost uses as interrupt "outputs" with a global interrupt number. For pass-through, currently our VFIO is dumb, interrupts get to qemu which then shoots them back to the kernel using the standard qirq stuff used by emulated devices. Here I suppose we would want something similar to vhost to associate the VFIO irq fd with a "global number". Is that what the existing ioctl's provide ? Their semantics aren't totally obvious to me. Note that for pass-through at least, and possibly for vhost, we'd like to actually totally bypass the irqfd & eventfd stuff for performance reasons. At least for VFIO, if we are going to get the max performance out of it, we need to take all generic code out of the picture. IE. If the interrupts are routed to the physical CPU where the guest is running, we want to be able to catch and distribute the interrupts to the guest entirely within guest context, ie, with KVM arch specific low level code that runs in "real mode" (ie MMU off) without context switching the MMU back to the host, which on POWER is fairly costly. That means that at least the association between a guest global interrupt number and a host global interrupt number for pass-through will be something that goes entirely through arch specific code path. We might still be able to use generic APIs to establish it if they are suitable though. Cheers, Ben. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses 2012-10-26 11:39 ` Benjamin Herrenschmidt @ 2012-10-26 12:39 ` Jan Kiszka 2012-10-26 20:45 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 59+ messages in thread From: Jan Kiszka @ 2012-10-26 12:39 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Peter Maydell, Paolo Bonzini, Paul Mackerras, Christoffer Dall, Alexander Graf, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, kvm-ppc On 2012-10-26 13:39, Benjamin Herrenschmidt wrote: > On Fri, 2012-10-26 at 12:17 +0100, Peter Maydell wrote: >>> Well, that's the thing, I haven't managed to figure that out so far, >> it >>> looks very x86-specific to me. To begin with there's no such thing >> as a >>> "GSI" in our world. >> >> This was roughly the feeling I had looking at these APIs. There >> might be some underlying generic concept but there is a definite >> tendency for the surface representation to use x86 specific >> terminology to the extent that you can't tell whether an API >> is x86 specific or merely apparently so... > > Right. Which is why I'm sure I'm actually missing something there :-) > And I'm hoping Paolo and Jan will help shed some light. > > It might help if somebody could explain a bit more what a GSI is in x86 > land and how it relates to the various APICs, along with what exactly > they mean by "routing" , ie. what are the different elements that get > associated. Basically, if somebody could describe how the x86 stuff > works, that might help. > > From my view of things, we have various "sources" of interrupts. On my > list are emulated device LSIs, emulated device MSIs, both in qemu, then > vhost and finally pass-through, I suppose on some platforms IPIs come in > as well though. Those "sources" need, one way or another, to hit a > source controller which will then itself, in a platform specific way, > shoot the interrupt to a presentation controller. > > The routing between source and presentation controllers is fairly > platform specific as far as I can tell even within a given CPU family. > Ie the way an OpenPIC (aka MPIC, used on macs) does it is different than > the way the XICS system does it on pseries, and is different from most > embedded stuff (which typically doesn't have that source/presentation > distinction but just cascaded dumber PICs). The amount of > configurability, the type of configuration information etc... that > governs such a layout is also very specific to the platform and the type > of interrupt controller system used on it. But we are just talking about sending messages from A to B or soldering an input to an output pin. That's pretty generic. Give each output event a virtual IRQ number and define where its output "line" should be linked to (input pin of target controller). All what will be specific are the IDs of those controllers. Of course, all that provided you do their emulation in kernel space. For x86, that even makes sense when the IRQ sources are in user space as the guest may still have to interact during IRQ delivery with IOAPIC, thus we save some costly heavy-weight exits when putting it in the kernel. > > Remains the "routing" between source of "events" and actual "inputs" to > a source controller. > > This too doesn't seem totally obvious to generalize. For example an > embedded platform with a bunch of cascaded dumb interrupt controllers > doesn't have a concept of a flat number space in HW, an interrupt > "input" to be identified properly, needs to identify the controller and > the interrupt within that controller. However, within KVM/qemu, it's > pretty easy to assign to each controller a number and by collating the > two, get some kind of flat space, though it's not arbitrary and the > routing is thus fairly constrained if not totally fixed. IRQ routing entry: - virq number ("gsi") - type (controller ID, MSI, whatever you like) - some flags (to extend it) - type-specific data (MSI message, controller input pin, etc.) And there can be multiple entries with the same virq, thus you can deliver to multiple targets. I bet you can model quite a lot of your platform specific routing this way. I'm not saying our generic code will work out of the box, but at least the interfaces and concepts are there. > > In the pseries case, the global number is split in two bit fields, the > BUID identifying the specific source controller and the source within > that controller. Here too it's fairly fixed though. So the ioctl we use > to create a source controller in the kernel takes the BUID as an > argument, and from there the kernel will "find" the right source > controller based solely on the interrupt number. > > So basically on one side we have a global interrupt number that > identifies an "input", I assume that's what x86 calls a GSI ? Right. The virtual IRQ numbers we call "GSI" is partially occupied by the actual x86-GSIs (0..n, with n=23 so far), directed to the IOAPIC and PIC there, and then followed by IRQs that are mapped on MSI messages. But that's just how we _use_ it on x86, not how it has to work for other archs. > > Remains how to associate the various sources of interrupts to that > 'global number'... and that is fairly specific to each source type isn't > it ? > > In our current powerpc code, the emulated devices toggle the qirq which > ends up shooting an ioctl to set/reset or "message" (for MSIs) the > corresponding global interrupt. The mapping is established entirely > within qemu, we just tell the kernel to trigger a given interrupt. > > We haven't really sorted vhost out yet so I'm not sure how that will > work out but the idea would be to have an ioctl to associate an eventfd > or whatever vhost uses as interrupt "outputs" with a global interrupt > number. KVM_IRQFD is already there. It associates an irqfd file descriptor with a virtual IRQ. Once that triggers, the IRQ routing table is used to define the actual interrupt type and destination chip to use, see above. > > For pass-through, currently our VFIO is dumb, interrupts get to qemu > which then shoots them back to the kernel using the standard qirq stuff > used by emulated devices. Here I suppose we would want something similar > to vhost to associate the VFIO irq fd with a "global number". > > Is that what the existing ioctl's provide ? Their semantics aren't > totally obvious to me. Provided you want to trigger a MSI message, you first need to register it via kvm_irqchip_add_msi_route (will trigger KVM_SET_GSI_ROUTING). That will give you a virtual IRQ number which can be associate with an irqfd file descriptor as explained above (KVM_IRQFD). But you may also create a different kind of routing table entry if MSI is not all you need to inject via irqfd. Could be a plain IRQ line as well, routed to a specific in-kernel IRQ controller model. > > Note that for pass-through at least, and possibly for vhost, we'd like > to actually totally bypass the irqfd & eventfd stuff for performance > reasons. At least for VFIO, if we are going to get the max performance > out of it, we need to take all generic code out of the picture. IE. If > the interrupts are routed to the physical CPU where the guest is > running, we want to be able to catch and distribute the interrupts to > the guest entirely within guest context, ie, with KVM arch specific low > level code that runs in "real mode" (ie MMU off) without context > switching the MMU back to the host, which on POWER is fairly costly. > > That means that at least the association between a guest global > interrupt number and a host global interrupt number for pass-through > will be something that goes entirely through arch specific code path. We > might still be able to use generic APIs to establish it if they are > suitable though. The same will happen on x86: direct injection to a target VCPU. Maybe again a topic for our IRQ routing table, just with specialized target types. Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses 2012-10-26 12:39 ` Jan Kiszka @ 2012-10-26 20:45 ` Benjamin Herrenschmidt 2012-10-26 22:03 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 59+ messages in thread From: Benjamin Herrenschmidt @ 2012-10-26 20:45 UTC (permalink / raw) To: Jan Kiszka Cc: Peter Maydell, Paolo Bonzini, Paul Mackerras, Christoffer Dall, Alexander Graf, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, kvm-ppc On Fri, 2012-10-26 at 14:39 +0200, Jan Kiszka wrote: > But we are just talking about sending messages from A to B or soldering > an input to an output pin. That's pretty generic. Give each output event > a virtual IRQ number and define where its output "line" should be linked > to (input pin of target controller). All what will be specific are the > IDs of those controllers. Hrm you seem to be saying something very different from Paolo here. Unless it's just a very very confused terminology. So let's see the powerpc "pseries" case. Things like embedded etc... might be quite different. We have essentially two "outputs" here. One is qemu itself shooting interrupts (emulated devices, virtio, etc...). This is an ioctl and that gives you a global interrupt number. So this goes directly to the source controller which then uses it's internal logic to send that to the presentation controller in ways that are entirely implementation specific. The specific source controller is located using the top bits of the global interrupt number (the BUID). When we create source controllers, we pass as argument to the ioctl the BUID for that source controller and the number of interrupts it handles. The other "output" is irqfd for kernel originated events. Here I assume there's an in-kernel way to directly call a function rather than queue something for qemu to consume later, anything else would be horribly wasteful. Here too, what we need here is a global interrupt number, so we can find the source controller by BUID and shoot it the interrupt. So that's the only case I see where we need an association of some kind, which is irqfs -> global number. I don't see where the "MSIs" that Paolo keep talking about come into play. User space (emulated) MSIs are dealt within qemu entirely and MSIs from VFIO end up as irqfd. Finally there is the "routing" between a given interrupt source (an entry in the source controller state table) and the target processor (the corresponding presentation controller). That routing is purely a field in the source controller field, which is there along with the interrupt priority and a few state bits. (We don't need to deal with level/edge because of the way the ICS work, we just say at the time of the triggering of an interrupt whether it's a level set, level reset, or message, and it will do the right thing). This field is accessed (programmed) by the guest using a firmware interface that is implemented in the kernel part of KVM. It's a platform specific API and it accesses the source controller (it's implemented three really). I don't see where any generic API here would make sense other than maybe adding useless bloat. The only place where qemu might "see" that stuff is for migration where it needs to save all the state of all the sources and restore it on the target. The actual communication between source controllers and presentation controllers is also entirely platform specific. It follows a somewhat specified protocol (we mimmic what the HW actually does) and here too, I see no room for anything generic. > Of course, all that provided you do their emulation in kernel space. For > x86, that even makes sense when the IRQ sources are in user space as the > guest may still have to interact during IRQ delivery with IOAPIC, thus > we save some costly heavy-weight exits when putting it in the kernel. We have a way to lower that cost. Since the interaction with the presentation controller is done by hypervisor calls, we handle them directly in real mode within the guest MMU context unless some exceptional condition is hit (such as the need to trigger a resend from one of the source controllers or an interrupt rejection). > > > > Remains the "routing" between source of "events" and actual "inputs" to > > a source controller. > > > > This too doesn't seem totally obvious to generalize. For example an > > embedded platform with a bunch of cascaded dumb interrupt controllers > > doesn't have a concept of a flat number space in HW, an interrupt > > "input" to be identified properly, needs to identify the controller and > > the interrupt within that controller. However, within KVM/qemu, it's > > pretty easy to assign to each controller a number and by collating the > > two, get some kind of flat space, though it's not arbitrary and the > > routing is thus fairly constrained if not totally fixed. > > IRQ routing entry: > - virq number ("gsi") > - type (controller ID, MSI, whatever you like) What is "controller ID" ? That doesn't mean anything to me. In our case, the specific source controller is known from the virq number (the top bits of it basically). > - some flags (to extend it) > - type-specific data (MSI message, controller input pin, etc.) I don't understand that business about MSIs really. I suppose it has to do with the way you do old-style device assignment ? Either MSIs come from virtual/emulated devices in which case they are a qemu fiction and qemu just sends us an ioctl with the virq number or they come from real devices in which case they are setup normally by the host kernel using host kernel MSI addresses, and we catch them via irqfd (or some platform specific bypass that we might implement in the future). We do have some per-interrupt data I mentioned earlier, the target presentation controller (known as server ID, basically the HW CPU number of the target) and a few bits of state. We only ever access it from qemu for migration though. > And there can be multiple entries with the same virq, thus you can > deliver to multiple targets. I bet you can model quite a lot of your > platform specific routing this way. I'm not saying our generic code will > work out of the box, but at least the interfaces and concepts are there. I don't see how we can model anything using that. Qemu doesn't actually look or modify any of that state other than during migration anyway. We do have a concept of delivery to multiple targets via a "link" mechanism which allows an interrupt to bounce to another target within a "ring" if the original target is busy (that's 2 bits of state) but this too is configured via firmware interfaces that are handled entirely in the kernel, not in qemu. > > In the pseries case, the global number is split in two bit fields, the > > BUID identifying the specific source controller and the source within > > that controller. Here too it's fairly fixed though. So the ioctl we use > > to create a source controller in the kernel takes the BUID as an > > argument, and from there the kernel will "find" the right source > > controller based solely on the interrupt number. > > > > So basically on one side we have a global interrupt number that > > identifies an "input", I assume that's what x86 calls a GSI ? > > Right. The virtual IRQ numbers we call "GSI" is partially occupied by > the actual x86-GSIs (0..n, with n=23 so far), directed to the IOAPIC and > PIC there, and then followed by IRQs that are mapped on MSI messages. > But that's just how we _use_ it on x86, not how it has to work for other > archs. > > > > > Remains how to associate the various sources of interrupts to that > > 'global number'... and that is fairly specific to each source type isn't > > it ? > > > > In our current powerpc code, the emulated devices toggle the qirq which > > ends up shooting an ioctl to set/reset or "message" (for MSIs) the > > corresponding global interrupt. The mapping is established entirely > > within qemu, we just tell the kernel to trigger a given interrupt. > > > > We haven't really sorted vhost out yet so I'm not sure how that will > > work out but the idea would be to have an ioctl to associate an eventfd > > or whatever vhost uses as interrupt "outputs" with a global interrupt > > number. > > KVM_IRQFD is already there. It associates an irqfd file descriptor with > a virtual IRQ. Once that triggers, the IRQ routing table is used to > define the actual interrupt type and destination chip to use, see above. We only need irqfd to virtual irq. The rests doesn't make sense to me (the "routing table bit"). > > For pass-through, currently our VFIO is dumb, interrupts get to qemu > > which then shoots them back to the kernel using the standard qirq stuff > > used by emulated devices. Here I suppose we would want something similar > > to vhost to associate the VFIO irq fd with a "global number". > > > > Is that what the existing ioctl's provide ? Their semantics aren't > > totally obvious to me. > > Provided you want to trigger a MSI message, you first need to register > it via kvm_irqchip_add_msi_route (will trigger KVM_SET_GSI_ROUTING). Why ? Again, I don't get it. > That will give you a virtual IRQ number which can be associate with an > irqfd file descriptor as explained above (KVM_IRQFD). But virq numbers are entirely under qemu control. qemu creates the source controllers, and assign the virq numbers to the devices. I really don't quite get how that concept of "GSI routing" means anything for us. We certainly don't want to have the kernel return a virq number to qemu. That whole business with MSIs makes very little sense to me. > But you may also > create a different kind of routing table entry if MSI is not all you > need to inject via irqfd. Could be a plain IRQ line as well, routed to a > specific in-kernel IRQ controller model. Sorry, I must be totally stupid or something but I don't understand. Doesn't make sense to me. > > Note that for pass-through at least, and possibly for vhost, we'd like > > to actually totally bypass the irqfd & eventfd stuff for performance > > reasons. At least for VFIO, if we are going to get the max performance > > out of it, we need to take all generic code out of the picture. IE. If > > the interrupts are routed to the physical CPU where the guest is > > running, we want to be able to catch and distribute the interrupts to > > the guest entirely within guest context, ie, with KVM arch specific low > > level code that runs in "real mode" (ie MMU off) without context > > switching the MMU back to the host, which on POWER is fairly costly. > > > > That means that at least the association between a guest global > > interrupt number and a host global interrupt number for pass-through > > will be something that goes entirely through arch specific code path. We > > might still be able to use generic APIs to establish it if they are > > suitable though. > > The same will happen on x86: direct injection to a target VCPU. Maybe > again a topic for our IRQ routing table, just with specialized target types. Ben. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses 2012-10-26 20:45 ` Benjamin Herrenschmidt @ 2012-10-26 22:03 ` Benjamin Herrenschmidt 2012-10-27 8:06 ` Jan Kiszka 2012-10-27 10:01 ` Peter Maydell 0 siblings, 2 replies; 59+ messages in thread From: Benjamin Herrenschmidt @ 2012-10-26 22:03 UTC (permalink / raw) To: Jan Kiszka Cc: Peter Maydell, Paolo Bonzini, Paul Mackerras, Christoffer Dall, Alexander Graf, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, kvm-ppc On Sat, 2012-10-27 at 07:45 +1100, Benjamin Herrenschmidt wrote: > On Fri, 2012-10-26 at 14:39 +0200, Jan Kiszka wrote: > > > But we are just talking about sending messages from A to B or soldering > > an input to an output pin. That's pretty generic. Give each output event > > a virtual IRQ number and define where its output "line" should be linked > > to (input pin of target controller). All what will be specific are the > > IDs of those controllers. > > Hrm you seem to be saying something very different from Paolo here. > Unless it's just a very very confused terminology. > > So let's see the powerpc "pseries" case. Things like embedded etc... > might be quite different. So I had a chat with Anthony who explained to me a bit more about what the x86 stuff is about. It's pretty horrible I must say :-) So correct me if I'm wrong but you essentially have to differentiate between MSI "outputs" and other (GSI) "outputs" due to the fact that MSIs in x86 land don't act as normal interrupts going through a source controller but instead get shot directly to the target CPU. Then you have to establish some kind of "routing" from those GSIs to some IO/APIC, and from MSIs to local APICs. That's where I think there is a fairly fundamental difference with us. So let's cut that problem in two. The GSI bit and the MSI bit. The reason is that the way x86 does MSIs seems to be fairly x86 specific, I wouldn't be surprised if everybody else did MSIs like we do them, that is turn them into normal interrupts (ie, GSIs). But let's discuss that below. So the GSI bit. We can assume that GSIs in that context are basically our "global interrupt number". This would apply to pretty much every platform indeed. The routing here, if I understand things correctly, consists of associating such a global interrupt number with a specific input pin (or virtual pin) of a specific source controller (ie, IO APIC). This would generally make sense in embedded space as well I suppose, where you can have multiple or even cascaded interrupt controllers of different breeds etc... However, in the pseries system, this routing is essentially encoded in the interrupt number itself. As I think I explained earlier, the number is arbitrarily split in two parts, the top bits indicating the source controller and the bottom bits the source within that controller. In qemu/kvm we have made an arbitrary split (whose size I don't remember precisely) and we currently create only one fairly big source controller but we might change that in the future. This there is no such thing as needing to "associate" or create routing entries here. qemu will directly shoot "GSIs" using an ioctl and our code can directly map that to a source controller without any routing table of any sort. In fact, adding one would complicate things since we'd have a requirement that it's populated 1:1 or thing would get very confused indeed so overall, there's no point for us to implement or use that API or the "generic" code behind it, it would be pure bloat, complication and problems. However, making that code more generic might make sense for other platforms (including other powerpc platforms such as embedded) where multiple interrupt controllers may exist though here too, it's probably going to be fairly common that the GSI numbers are essentially be a bit field split with entire ranges assigned to a given PIC. We don't have to emulate x86+ACPI ability to individually remap interrupts. The case if MSIs now. My understanding from what Anthony says is that your MSIs essentially bypass the IO APIC and route directly to the local APIC, which is equivalent to our presentation controller. You thus need specific APIs to associate an MSI (which isn't a GSI) to as specific local APIC. We have no such need at all. Our MSIs are decoded by the PCI host bridge and directly turned into "normal" interrupt. In fact, in HW, our bridges contain a special source controller that *is* essentially the thing that gets hit by MSIs. So our MSIs are just normal interrupts in the global space. Their numbers are assigned by qemu, the kernel never knows about them. When an emulated device triggers an MSI that turns into a normal "trigger global interrupt X" ioctl to the kernel. The only "knowledge" the kernel emulation gets along the way is an argument to the ioctl that indicates whether this is a level set, level reset, or edge type action (MSIs are edge obviously) which dictates how the delivery state machine will work (one shot vs. continuous until cleared). So qemu assigns interrupt numbers to MSIs and there's never any routing to establish at the kernel level. That also means that the current API that has tendrils all the way into devices in qemu for "getting the virq for a given MSI" is totally unsuitable for us. In fact we don't need a different API for KVM vs. full emulation. Everything in qemu side is the same, until the qirq gets actually delivered in which case with KVM we'll shoot an ioctl rather than emulating the source controller. So the only APIs we need as these: - Create the IRQ chips themselves - Shoot an interrupt - Save and Restore of individual source state for migration (The content of the state is very specific to a given IRQ chip implementation). I fail to see how we can shoe horn any of that in generic code, it doesn't fit the model you currently have at all and making it do so would add bloat and complexity without any benefit. IE. We wouldn't "share" code, we would "add" code not otherwise useful. The ARM situation might be different (and the powerpc situation for other platforms such as mac99 and embedded) in that there might be some value in having that GSI -> PIC input mapping, but here too I tend to doubt it. We are probably better off starting with a cleaner slate without the gross x86 baggage and use a unified flat number space in the kernel, leaving all the complication of who is connected to whome to qemu. Cheers, Ben. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses 2012-10-26 22:03 ` Benjamin Herrenschmidt @ 2012-10-27 8:06 ` Jan Kiszka 2012-10-27 10:01 ` Peter Maydell 1 sibling, 0 replies; 59+ messages in thread From: Jan Kiszka @ 2012-10-27 8:06 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Peter Maydell, Paolo Bonzini, Paul Mackerras, Christoffer Dall, Alexander Graf, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, kvm-ppc [-- Attachment #1: Type: text/plain, Size: 8944 bytes --] On 2012-10-27 00:03, Benjamin Herrenschmidt wrote: > On Sat, 2012-10-27 at 07:45 +1100, Benjamin Herrenschmidt wrote: >> On Fri, 2012-10-26 at 14:39 +0200, Jan Kiszka wrote: >> >>> But we are just talking about sending messages from A to B or soldering >>> an input to an output pin. That's pretty generic. Give each output event >>> a virtual IRQ number and define where its output "line" should be linked >>> to (input pin of target controller). All what will be specific are the >>> IDs of those controllers. >> >> Hrm you seem to be saying something very different from Paolo here. >> Unless it's just a very very confused terminology. >> >> So let's see the powerpc "pseries" case. Things like embedded etc... >> might be quite different. > > So I had a chat with Anthony who explained to me a bit more about what > the x86 stuff is about. It's pretty horrible I must say :-) > > So correct me if I'm wrong but you essentially have to differentiate > between MSI "outputs" and other (GSI) "outputs" due to the fact that > MSIs in x86 land don't act as normal interrupts going through a source > controller but instead get shot directly to the target CPU. > > Then you have to establish some kind of "routing" from those GSIs to > some IO/APIC, and from MSIs to local APICs. I'm afraid there are still some misconceptions about what is happening on x86 and what role the bits play that are more generic. The fact that we can inject MSI messages directly to the target APIC doesn't affect the need to have IRQ routing support. That is used for two reason on x86: - define the wiring from a classic IRQ line to the various (legacy) IRQ controllers we have, namely the IOAPIC and the PIC - define the IRQ input that should be generated when an irqfd triggers (that's currently just irqfd->MSI associations, but irqfd->irqchip may come as well for vfio) > > That's where I think there is a fairly fundamental difference with us. > > So let's cut that problem in two. The GSI bit and the MSI bit. The > reason is that the way x86 does MSIs seems to be fairly x86 specific, I > wouldn't be surprised if everybody else did MSIs like we do them, that > is turn them into normal interrupts (ie, GSIs). But let's discuss that > below. > > So the GSI bit. We can assume that GSIs in that context are basically > our "global interrupt number". This would apply to pretty much every > platform indeed. > > The routing here, if I understand things correctly, consists of > associating such a global interrupt number with a specific input pin (or > virtual pin) of a specific source controller (ie, IO APIC). ...or PIC or whatever you have on your platform. > > This would generally make sense in embedded space as well I suppose, > where you can have multiple or even cascaded interrupt controllers of > different breeds etc... > > However, in the pseries system, this routing is essentially encoded in > the interrupt number itself. As I think I explained earlier, the number > is arbitrarily split in two parts, the top bits indicating the source > controller and the bottom bits the source within that controller. In > qemu/kvm we have made an arbitrary split (whose size I don't remember > precisely) and we currently create only one fairly big source controller > but we might change that in the future. > > This there is no such thing as needing to "associate" or create routing > entries here. qemu will directly shoot "GSIs" using an ioctl and our > code can directly map that to a source controller without any routing > table of any sort. In fact, adding one would complicate things since > we'd have a requirement that it's populated 1:1 or thing would get very > confused indeed so overall, there's no point for us to implement or use > that API or the "generic" code behind it, it would be pure bloat, > complication and problems. OK, that puts the IRQ injection IOCTL on pseries in the same category as KVM_SIGNAL_MSI on x86. Both require no routing as the target address is fully encoded and not otherwise dynamically remapped in the kernel. > > However, making that code more generic might make sense for other > platforms (including other powerpc platforms such as embedded) where > multiple interrupt controllers may exist though here too, it's probably > going to be fairly common that the GSI numbers are essentially be a bit > field split with entire ranges assigned to a given PIC. We don't have to > emulate x86+ACPI ability to individually remap interrupts. In KVM, a "GSI" is an index to its central IRQ routing table where the target IRQ controller or MSI message is defined. Other interpretations of the number passed down the IRQ injection IOCTL are of course possible (like you do on pseries), but then it becomes arch-specific. > > The case if MSIs now. My understanding from what Anthony says is that > your MSIs essentially bypass the IO APIC and route directly to the local > APIC, which is equivalent to our presentation controller. You thus need > specific APIs to associate an MSI (which isn't a GSI) to as specific > local APIC. First part is right, second part not. We only need special APIs on x86 for the legacy PCI assignment mess. For MSI delivery, we now have KVM_SIGNAL_MSI, and - as indicated above - that bypasses any routing. Only if your MSI event is delivered to the kernel via a virtual IRQ line (means irqfd today), you need a routing table entry. > > We have no such need at all. Our MSIs are decoded by the PCI host bridge > and directly turned into "normal" interrupt. In fact, in HW, our bridges > contain a special source controller that *is* essentially the thing that > gets hit by MSIs. > > So our MSIs are just normal interrupts in the global space. Their > numbers are assigned by qemu, the kernel never knows about them. When an > emulated device triggers an MSI that turns into a normal "trigger global > interrupt X" ioctl to the kernel. The only "knowledge" the kernel > emulation gets along the way is an argument to the ioctl that indicates > whether this is a level set, level reset, or edge type action (MSIs are > edge obviously) which dictates how the delivery state machine will work > (one shot vs. continuous until cleared). > > So qemu assigns interrupt numbers to MSIs and there's never any routing > to establish at the kernel level. That also means that the current API > that has tendrils all the way into devices in qemu for "getting the virq > for a given MSI" is totally unsuitable for us. Just like it is for x86 when the MSI event is generated in userspace - and that's why we avoid it in that case. > In fact we don't need a > different API for KVM vs. full emulation. Everything in qemu side is the > same, until the qirq gets actually delivered in which case with KVM > we'll shoot an ioctl rather than emulating the source controller. > > So the only APIs we need as these: > > - Create the IRQ chips themselves > - Shoot an interrupt > - Save and Restore of individual source state for migration > > (The content of the state is very specific to a given IRQ chip > implementation). > > I fail to see how we can shoe horn any of that in generic code, it > doesn't fit the model you currently have at all and making it do so > would add bloat and complexity without any benefit. IE. We wouldn't > "share" code, we would "add" code not otherwise useful. I agree that you have no logical need for IRQ routing. Still, if you want to use irqfd without reimplementing it for pseries, you will have to populate a GSI routing table. That's because irqfd injects a "GSI" (via kvm_set_irq), and that is dispatched according to the routing table. So you need to define for the GSI number of an irqfd which pseries global IRQ number should be generated. There is a bit of refactoring needed in irq_comm.c to pull out remaining x86 bits and place some arch callbacks, e.g. in setup_routing_entry so that you can hook up your specific IRQ handler. Options for configuring IRQ chips from userspace were discussed already. > > The ARM situation might be different (and the powerpc situation for > other platforms such as mac99 and embedded) in that there might be some > value in having that GSI -> PIC input mapping, but here too I tend to > doubt it. We are probably better off starting with a cleaner slate > without the gross x86 baggage and use a unified flat number space in the > kernel, leaving all the complication of who is connected to whome to > qemu. Because the specialties of pseries have not much use for IRQ routing, you shouldn't derive that it is useless for all other archs. The interface we have are generic enough, and there is no excuse to ignore them without actually having tried them on ARM or embedded Power. Jan [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 259 bytes --] ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses 2012-10-26 22:03 ` Benjamin Herrenschmidt 2012-10-27 8:06 ` Jan Kiszka @ 2012-10-27 10:01 ` Peter Maydell 2012-10-28 22:19 ` Benjamin Herrenschmidt 1 sibling, 1 reply; 59+ messages in thread From: Peter Maydell @ 2012-10-27 10:01 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Jan Kiszka, Paolo Bonzini, Paul Mackerras, Christoffer Dall, Alexander Graf, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, kvm-ppc On 26 October 2012 23:03, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote: > So the GSI bit. We can assume that GSIs in that context are basically > our "global interrupt number". This would apply to pretty much every > platform indeed. > > The routing here, if I understand things correctly, consists of > associating such a global interrupt number with a specific input pin (or > virtual pin) of a specific source controller (ie, IO APIC). > > This would generally make sense in embedded space as well I suppose, > where you can have multiple or even cascaded interrupt controllers of > different breeds etc... > > However, in the pseries system, this routing is essentially encoded in > the interrupt number itself. As I think I explained earlier, the number > is arbitrarily split in two parts, the top bits indicating the source > controller and the bottom bits the source within that controller. In > qemu/kvm we have made an arbitrary split (whose size I don't remember > precisely) and we currently create only one fairly big source controller > but we might change that in the future. This is more or less how ARM has done it (though our specific encoding of interrupt numbers is different, obviously). If I were designing an interface for this kind of thing from scratch I'd probably want it to look like "create a specific irqchip and give me some kind of handle to it" and then have an interface for "assert interrupt line X on that irqchip". Lacking that, a plausible encoding scheme on the global interrupt numbers works OK given that you know there aren't going to be that many irqchips in practice... -- PMM ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses 2012-10-27 10:01 ` Peter Maydell @ 2012-10-28 22:19 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 59+ messages in thread From: Benjamin Herrenschmidt @ 2012-10-28 22:19 UTC (permalink / raw) To: Peter Maydell Cc: Jan Kiszka, Paolo Bonzini, Paul Mackerras, Christoffer Dall, Alexander Graf, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, kvm-ppc On Sat, 2012-10-27 at 11:01 +0100, Peter Maydell wrote: > This is more or less how ARM has done it (though our specific encoding > of interrupt numbers is different, obviously). > > If I were designing an interface for this kind of thing from scratch > I'd probably want it to look like "create a specific irqchip and give > me some kind of handle to it" and then have an interface for "assert > interrupt line X on that irqchip". Lacking that, a plausible encoding > scheme on the global interrupt numbers works OK given that you know > there aren't going to be that many irqchips in practice... Agreed. It feels like the whole business with routing is not useful for either of us and just adds confusion... Note: I'll be in KVM forum next week (hopefully) so if necessary (and if you guys are there too) we can have a "live" discussion. Cheers, Ben. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses 2012-10-26 9:58 ` Paolo Bonzini 2012-10-26 10:09 ` Peter Maydell @ 2012-10-26 10:37 ` Benjamin Herrenschmidt 2012-10-26 10:40 ` Paolo Bonzini 1 sibling, 1 reply; 59+ messages in thread From: Benjamin Herrenschmidt @ 2012-10-26 10:37 UTC (permalink / raw) To: Paolo Bonzini Cc: Jan Kiszka, Paul Mackerras, Christoffer Dall, Alexander Graf, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, kvm-ppc, Peter Maydell On Fri, 2012-10-26 at 11:58 +0200, Paolo Bonzini wrote: > Il 25/10/2012 21:40, Benjamin Herrenschmidt ha scritto: > >> > Probably you do need a variant of KVM_CREATE_IRQCHIP to create the > >> > IOAPICs/source controllers (Paul's proposal at > >> > http://permalink.gmane.org/gmane.comp.emulators.kvm.powerpc.devel/5674 > >> > for example), assign chip ids to them, set the number of input lines, > >> > etc. but the configuration should work well with the existing ioctls, > >> > with no limit on the number of sources. > > But what do you mean by "configuration" really ? I don't see anything in > > common there. > > Wiring which MSI-X interrupts go to which source controllers. If you > have one source controller per PCI bridge, you need to tell the kernel > the mapping between MSI messages interrupts and PCI bridges, and update > it whenever the MSI masking changes. Not sure I get it. Are you talking in the context of PCI pass-through ? Each PCI bridge on POWER has its own set of MSIs though for emulated bridges it's a non-issue, it's all dealt with by qemu, so I'm not sure what you mean here. > The other problem is configuring the redirection table. If you need >64 > sources you need ioctls like KVM_GET/SET_IRQCHIP_ONE_REG. Well, all of that is totally specific to the IO-APIC design & limitations as far as I can tell. What is the "redirection table" ? Cheers, Ben. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses 2012-10-26 10:37 ` Benjamin Herrenschmidt @ 2012-10-26 10:40 ` Paolo Bonzini 2012-10-26 10:47 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 59+ messages in thread From: Paolo Bonzini @ 2012-10-26 10:40 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Jan Kiszka, Paul Mackerras, Christoffer Dall, Alexander Graf, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, kvm-ppc, Peter Maydell Il 26/10/2012 12:37, Benjamin Herrenschmidt ha scritto: >> > Wiring which MSI-X interrupts go to which source controllers. If you >> > have one source controller per PCI bridge, you need to tell the kernel >> > the mapping between MSI messages interrupts and PCI bridges, and update >> > it whenever the MSI masking changes. > Not sure I get it. Are you talking in the context of PCI pass-through ? Not just that, it's also for emulated devices that do MSI-X (virtio-pci does). > > The other problem is configuring the redirection table. If you need >64 > > sources you need ioctls like KVM_GET/SET_IRQCHIP_ONE_REG. > Well, all of that is totally specific to the IO-APIC design & > limitations as far as I can tell. What is the "redirection table" ? The wiring between source and presentation controllers, roughly. I suppose that's what Paul referred to when he said there's 64 bits of config info per source in the source controllers. Paolo ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses 2012-10-26 10:40 ` Paolo Bonzini @ 2012-10-26 10:47 ` Benjamin Herrenschmidt 2012-10-26 11:47 ` Paolo Bonzini 0 siblings, 1 reply; 59+ messages in thread From: Benjamin Herrenschmidt @ 2012-10-26 10:47 UTC (permalink / raw) To: Paolo Bonzini Cc: Jan Kiszka, Paul Mackerras, Christoffer Dall, Alexander Graf, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, kvm-ppc, Peter Maydell On Fri, 2012-10-26 at 12:40 +0200, Paolo Bonzini wrote: > Il 26/10/2012 12:37, Benjamin Herrenschmidt ha scritto: > >> > Wiring which MSI-X interrupts go to which source controllers. If you > >> > have one source controller per PCI bridge, you need to tell the kernel > >> > the mapping between MSI messages interrupts and PCI bridges, and update > >> > it whenever the MSI masking changes. > > Not sure I get it. Are you talking in the context of PCI pass-through ? > > Not just that, it's also for emulated devices that do MSI-X (virtio-pci > does). Then I really don't get it. > > > The other problem is configuring the redirection table. If you need >64 > > > sources you need ioctls like KVM_GET/SET_IRQCHIP_ONE_REG. > > Well, all of that is totally specific to the IO-APIC design & > > limitations as far as I can tell. What is the "redirection table" ? > > The wiring between source and presentation controllers, roughly. I > suppose that's what Paul referred to when he said there's 64 bits of > config info per source in the source controllers. But that's the point. We don't have such "wiring". The interrupt number space is global. In HW it's via special messages in the fabric. The firmware configures the various source controllers at boot time by assigning them a BUID which basically comprises the top bits of the interrupt number. Or do you mean the routing configured by the user ? IE. Affinity ? If yes, then that's indeed what the 64-bit per source is. Each interrupt source has some state including the configured target presentation controller (plus associated link info for distributed interrupts), a priority setting, and some internal state bits that need to be preserved in the case of migration. Cheers, Ben. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses 2012-10-26 10:47 ` Benjamin Herrenschmidt @ 2012-10-26 11:47 ` Paolo Bonzini 0 siblings, 0 replies; 59+ messages in thread From: Paolo Bonzini @ 2012-10-26 11:47 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Jan Kiszka, Paul Mackerras, Christoffer Dall, Alexander Graf, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, kvm-ppc, Peter Maydell [snipping some parts that Jan answered about already] Il 26/10/2012 12:47, Benjamin Herrenschmidt ha scritto: > Or do you mean the routing configured by the user ? IE. Affinity ? If > yes, then that's indeed what the 64-bit per source is. Each interrupt > source has some state including the configured target presentation > controller (plus associated link info for distributed interrupts), a > priority setting, and some internal state bits that need to be preserved > in the case of migration. Yes, that's pretty much the contents of the IOAPIC redirection table. x86 has more stuff such as the polarity (low/high), masking, triggering mode (edge/level), etc., but the main thing is the destination and vector. Paolo ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses 2012-10-25 16:14 ` Paolo Bonzini 2012-10-25 16:32 ` Jan Kiszka @ 2012-10-25 19:39 ` Benjamin Herrenschmidt 1 sibling, 0 replies; 59+ messages in thread From: Benjamin Herrenschmidt @ 2012-10-25 19:39 UTC (permalink / raw) To: Paolo Bonzini Cc: Paul Mackerras, Jan Kiszka, Christoffer Dall, Alexander Graf, kvmarm, kvm, kvm-ppc, Peter Maydell On Thu, 2012-10-25 at 18:14 +0200, Paolo Bonzini wrote: > As Jan said, there's more than a few similarities between the x86 and > PPC model. > > Taking inspiration from the x86 API, configuring the source controller > seems to be more of a task for KVM_SET_GSI_ROUTING (a very x86-centric > name, I must admit). Very x86 centric name, very x86 centric functionality as well with a very x86 centric implementation. I don't know what makes you think there's anything remotely re-usable but I looked at it a while back and there isn't that I can find. The "configuration" of the source controller consists of creating it with a number of sources and a BUID, that's about it. Then there's also APIs to extract/set its state for migration. Cheers, Ben. ^ permalink raw reply [flat|nested] 59+ messages in thread
end of thread, other threads:[~2012-10-28 22:19 UTC | newest] Thread overview: 59+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-10-14 0:04 [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses Christoffer Dall 2012-10-14 0:04 ` [RFC PATCH 1/3] KVM: ARM: Introduce KVM_INIT_IRQCHIP ioctl Christoffer Dall 2012-10-17 20:21 ` Peter Maydell 2012-10-17 20:23 ` Christoffer Dall 2012-10-17 20:31 ` Peter Maydell 2012-10-17 20:39 ` Christoffer Dall 2012-10-18 12:20 ` Avi Kivity 2012-10-19 18:42 ` Christoffer Dall 2012-10-14 0:04 ` [RFC PATCH 2/3] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl Christoffer Dall 2012-10-17 20:29 ` Peter Maydell 2012-10-19 18:46 ` Christoffer Dall 2012-10-19 20:24 ` Peter Maydell 2012-10-19 20:27 ` Christoffer Dall 2012-10-19 20:33 ` Christoffer Dall 2012-10-14 0:04 ` [RFC PATCH 3/3] KVM: ARM: Split KVM_CREATE_IRQCHIP and KVM_INIT_IRQCHIP Christoffer Dall 2012-10-18 11:15 ` Peter Maydell 2012-10-17 20:38 ` [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses Alexander Graf 2012-10-17 20:39 ` Christoffer Dall 2012-10-17 21:19 ` Benjamin Herrenschmidt 2012-10-17 22:10 ` Paul Mackerras 2012-10-17 23:58 ` Benjamin Herrenschmidt 2012-10-18 13:48 ` Christoffer Dall 2012-10-18 13:49 ` Alexander Graf 2012-10-18 15:25 ` Avi Kivity 2012-10-23 10:48 ` Jan Kiszka 2012-10-23 10:52 ` Peter Maydell 2012-10-23 11:00 ` Jan Kiszka 2012-10-23 11:04 ` Peter Maydell 2012-10-23 11:08 ` Jan Kiszka 2012-10-24 0:50 ` Paul Mackerras 2012-10-25 11:57 ` Jan Kiszka 2012-10-25 16:14 ` Paolo Bonzini 2012-10-25 16:32 ` Jan Kiszka 2012-10-25 18:27 ` Paolo Bonzini 2012-10-25 19:40 ` Benjamin Herrenschmidt 2012-10-26 9:58 ` Paolo Bonzini 2012-10-26 10:09 ` Peter Maydell 2012-10-26 10:15 ` Paolo Bonzini 2012-10-26 10:22 ` Jan Kiszka 2012-10-26 10:44 ` Benjamin Herrenschmidt 2012-10-26 11:00 ` Jan Kiszka 2012-10-26 11:09 ` Benjamin Herrenschmidt 2012-10-26 11:57 ` Paolo Bonzini 2012-10-26 12:08 ` Peter Maydell 2012-10-26 12:41 ` Jan Kiszka 2012-10-26 20:21 ` Benjamin Herrenschmidt 2012-10-26 11:17 ` Peter Maydell 2012-10-26 11:39 ` Benjamin Herrenschmidt 2012-10-26 12:39 ` Jan Kiszka 2012-10-26 20:45 ` Benjamin Herrenschmidt 2012-10-26 22:03 ` Benjamin Herrenschmidt 2012-10-27 8:06 ` Jan Kiszka 2012-10-27 10:01 ` Peter Maydell 2012-10-28 22:19 ` Benjamin Herrenschmidt 2012-10-26 10:37 ` Benjamin Herrenschmidt 2012-10-26 10:40 ` Paolo Bonzini 2012-10-26 10:47 ` Benjamin Herrenschmidt 2012-10-26 11:47 ` Paolo Bonzini 2012-10-25 19:39 ` Benjamin Herrenschmidt
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).