* [RFC PATCH] arm64: KVM: Allow userspace to configure guest MPIDR_EL1
From: Ashok Kumar @ 2016-04-20 14:08 UTC
To: linux-arm-kernel
For guests with NUMA configuration, Node ID needs to
be recorded in the respective affinity byte of MPIDR_EL1.
Cache the MPIDR_EL1 programmed by userspace and use it for
subsequent reset_mpidr calls.
Signed-off-by: Ashok Kumar <ashoks@broadcom.com>
---
arch/arm64/include/asm/kvm_host.h | 1 +
arch/arm64/kvm/sys_regs.c | 44 ++++++++++++++++++++++++++++-----------
2 files changed, 33 insertions(+), 12 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index f5c6bd2..1fc723d 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -197,6 +197,7 @@ struct kvm_vcpu_arch {
/* HYP configuration */
u64 hcr_el2;
u32 mdcr_el2;
+ u64 vmpidr_el2;
/* Exception Information */
struct kvm_vcpu_fault_info fault;
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 7bbe3ff..468f251 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -424,21 +424,29 @@ static void reset_amair_el1(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
vcpu_sys_reg(vcpu, AMAIR_EL1) = amair;
}
+static int set_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
+ const struct kvm_one_reg *reg, void __user *uaddr);
+
static void reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
{
u64 mpidr;
- /*
- * Map the vcpu_id into the first three affinity level fields of
- * the MPIDR. We limit the number of VCPUs in level 0 due to a
- * limitation to 16 CPUs in that level in the ICC_SGIxR registers
- * of the GICv3 to be able to address each CPU directly when
- * sending IPIs.
- */
- mpidr = (vcpu->vcpu_id & 0x0f) << MPIDR_LEVEL_SHIFT(0);
- mpidr |= ((vcpu->vcpu_id >> 4) & 0xff) << MPIDR_LEVEL_SHIFT(1);
- mpidr |= ((vcpu->vcpu_id >> 12) & 0xff) << MPIDR_LEVEL_SHIFT(2);
- vcpu_sys_reg(vcpu, MPIDR_EL1) = (1ULL << 31) | mpidr;
+ if (!vcpu->arch.vmpidr_el2) {
+ /*
+ * Map the vcpu_id into the first three affinity level fields of
+ * the MPIDR. We limit the number of VCPUs in level 0 due to a
+ * limitation to 16 CPUs in that level in the ICC_SGIxR registers
+ * of the GICv3 to be able to address each CPU directly when
+ * sending IPIs.
+ */
+ mpidr = (vcpu->vcpu_id & 0x0f) << MPIDR_LEVEL_SHIFT(0);
+ mpidr |= ((vcpu->vcpu_id >> 4) & 0xff) << MPIDR_LEVEL_SHIFT(1);
+ mpidr |= ((vcpu->vcpu_id >> 12) & 0xff) << MPIDR_LEVEL_SHIFT(2);
+ vcpu_sys_reg(vcpu, MPIDR_EL1) = (1ULL << 31) | mpidr;
+ } else {
+ /* use the userspace configured value */
+ vcpu_sys_reg(vcpu, MPIDR_EL1) = vcpu->arch.vmpidr_el2;
+ }
}
static void reset_pmcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
@@ -902,7 +910,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
/* MPIDR_EL1 */
{ Op0(0b11), Op1(0b000), CRn(0b0000), CRm(0b0000), Op2(0b101),
- NULL, reset_mpidr, MPIDR_EL1 },
+ NULL, reset_mpidr, MPIDR_EL1, 0, NULL, set_mpidr },
/* SCTLR_EL1 */
{ Op0(0b11), Op1(0b000), CRn(0b0001), CRm(0b0000), Op2(0b000),
access_vm_reg, reset_val, SCTLR_EL1, 0x00C50078 },
@@ -2034,6 +2042,18 @@ static int demux_c15_set(u64 id, void __user *uaddr)
}
}
+static int set_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
+ const struct kvm_one_reg *reg, void __user *uaddr)
+{
+ int ret;
+
+ ret = reg_from_user(&vcpu_sys_reg(vcpu, rd->reg), uaddr, reg->id);
+ if (!ret)
+ vcpu->arch.vmpidr_el2 = vcpu_sys_reg(vcpu, rd->reg);
+
+ return ret;
+}
+
int kvm_arm_sys_reg_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
{
const struct sys_reg_desc *r;
--
2.1.0
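For reference, the default packing above can be exercised outside the kernel.
A minimal standalone sketch (MPIDR_LEVEL_SHIFT is simplified here to the byte
offsets the kernel macro yields for levels 0-2, and default_mpidr is a made-up
name):

#include <stdint.h>
#include <stdio.h>

/* the kernel macro evaluates to 0, 8, 16 for levels 0, 1, 2 */
#define MPIDR_LEVEL_SHIFT(level)	((level) * 8)

static uint64_t default_mpidr(uint32_t vcpu_id)
{
	uint64_t mpidr;

	/* 16 vcpus at Aff0 (GICv3 ICC_SGIxR limit), then 8 bits each at
	 * Aff1 and Aff2; bit 31 (RES1) is always set. */
	mpidr  = (uint64_t)(vcpu_id & 0x0f) << MPIDR_LEVEL_SHIFT(0);
	mpidr |= (uint64_t)((vcpu_id >> 4) & 0xff) << MPIDR_LEVEL_SHIFT(1);
	mpidr |= (uint64_t)((vcpu_id >> 12) & 0xff) << MPIDR_LEVEL_SHIFT(2);
	return (1ULL << 31) | mpidr;
}

int main(void)
{
	/* vcpu 21 -> Aff0 = 5, Aff1 = 1, i.e. 0x80000105 */
	printf("0x%llx\n", (unsigned long long)default_mpidr(21));
	return 0;
}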
* [RFC PATCH] arm64: KVM: Allow userspace to configure guest MPIDR_EL1
From: Mark Rutland @ 2016-04-20 14:38 UTC
To: linux-arm-kernel
On Wed, Apr 20, 2016 at 07:08:39AM -0700, Ashok Kumar wrote:
> For guests with NUMA configuration, Node ID needs to
> be recorded in the respective affinity byte of MPIDR_EL1.
The MPIDR.Aff* fields are effectively arbitrary, and do not encode NUMA
topology information. They may describe some level of closeness of CPUs
relative to each other, but that is independent of the memory hierarchy,
and even then is unfortunately not all that informative.
NUMA topology information has to come from FW. With ACPI that's
SRAT+SLIT, and with DT that's something like [1].
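As a rough illustration of the DT side (this assumes the numa-node-id
property proposed in [1]; the values are invented), NUMA placement is
described explicitly per cpu node, independent of the MPIDR-derived reg
value:

cpus {
	cpu@0 {
		device_type = "cpu";
		compatible = "arm,armv8";
		reg = <0x0 0x0>;	/* MPIDR Aff fields */
		numa-node-id = <0>;	/* placement comes from here */
	};
	cpu@10000 {
		device_type = "cpu";
		compatible = "arm,armv8";
		reg = <0x0 0x10000>;
		numa-node-id = <1>;
	};
};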
There may be reasons to want to configure the MPIDR_EL1.Aff* fields, but
this is not required for NUMA, nor is it a good idea for an OS to assume
NUMA topology from the MPIDR_EL1.Aff* fields.
Thanks,
Mark.
[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-April/420900.html
* [RFC PATCH] arm64: KVM: Allow userspace to configure guest MPIDR_EL1
From: Andrew Jones @ 2016-04-20 14:39 UTC
To: linux-arm-kernel
On Wed, Apr 20, 2016 at 07:08:39AM -0700, Ashok Kumar wrote:
> For guests with NUMA configuration, Node ID needs to
> be recorded in the respective affinity byte of MPIDR_EL1.
>
> Cache the MPIDR_EL1 programmed by userspace and use it for
> subsequent reset_mpidr calls.
>
It shouldn't be necessary for NUMA, but it is necessary for cpu
topology, and likely a "socket" will map to a numa node in many
cases. We shouldn't count on that though, and thus we shouldn't
associate the word NUMA with MPIDR here. Let's change this to
"In order for guests to be configured with arbitrary cpu
topologies we need to allow userspace to program the MPIDR."
or some such.
> Signed-off-by: Ashok Kumar <ashoks@broadcom.com>
> ---
> arch/arm64/include/asm/kvm_host.h | 1 +
> arch/arm64/kvm/sys_regs.c | 44 ++++++++++++++++++++++++++++-----------
> 2 files changed, 33 insertions(+), 12 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index f5c6bd2..1fc723d 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -197,6 +197,7 @@ struct kvm_vcpu_arch {
> /* HYP configuration */
> u64 hcr_el2;
> u32 mdcr_el2;
> + u64 vmpidr_el2;
As this state is only used if set by the user, I wonder if we
should use a _user type of name. Or, as this cached value may have
other uses (I have one in mind), then we may want to always write
vcpu_sys_reg(vcpu, MPIDR_EL1) to it at the end of vcpu initialization.
>
> /* Exception Information */
> struct kvm_vcpu_fault_info fault;
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 7bbe3ff..468f251 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -424,21 +424,29 @@ static void reset_amair_el1(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
> vcpu_sys_reg(vcpu, AMAIR_EL1) = amair;
> }
>
> +static int set_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
> + const struct kvm_one_reg *reg, void __user *uaddr);
> +
> static void reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
> {
> u64 mpidr;
>
> - /*
> - * Map the vcpu_id into the first three affinity level fields of
> - * the MPIDR. We limit the number of VCPUs in level 0 due to a
> - * limitation to 16 CPUs in that level in the ICC_SGIxR registers
> - * of the GICv3 to be able to address each CPU directly when
> - * sending IPIs.
> - */
> - mpidr = (vcpu->vcpu_id & 0x0f) << MPIDR_LEVEL_SHIFT(0);
> - mpidr |= ((vcpu->vcpu_id >> 4) & 0xff) << MPIDR_LEVEL_SHIFT(1);
> - mpidr |= ((vcpu->vcpu_id >> 12) & 0xff) << MPIDR_LEVEL_SHIFT(2);
> - vcpu_sys_reg(vcpu, MPIDR_EL1) = (1ULL << 31) | mpidr;
> + if (!vcpu->arch.vmpidr_el2) {
OK, so here we're counting on at least bit 31 being non-zero, as all
affinity levels may be zero, per userspace's request. If we can count
on bit 31, then that's fine, otherwise we should consider using
INVALID_HWID, but that would require initializing vmpidr_el2...
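Something like this untested fragment, say (INVALID_HWID is ~0UL, which
can never be a legal MPIDR value):

	/* at vcpu init */
	vcpu->arch.vmpidr_el2 = INVALID_HWID;

	/* in reset_mpidr() */
	if (vcpu->arch.vmpidr_el2 != INVALID_HWID) {
		/* use the userspace configured value */
		vcpu_sys_reg(vcpu, MPIDR_EL1) = vcpu->arch.vmpidr_el2;
		return;
	}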
> + /*
> + * Map the vcpu_id into the first three affinity level fields of
> + * the MPIDR. We limit the number of VCPUs in level 0 due to a
> + * limitation to 16 CPUs in that level in the ICC_SGIxR registers
> + * of the GICv3 to be able to address each CPU directly when
> + * sending IPIs.
> + */
> + mpidr = (vcpu->vcpu_id & 0x0f) << MPIDR_LEVEL_SHIFT(0);
> + mpidr |= ((vcpu->vcpu_id >> 4) & 0xff) << MPIDR_LEVEL_SHIFT(1);
> + mpidr |= ((vcpu->vcpu_id >> 12) & 0xff) << MPIDR_LEVEL_SHIFT(2);
> + vcpu_sys_reg(vcpu, MPIDR_EL1) = (1ULL << 31) | mpidr;
> + } else {
> + /* use the userspace configured value */
> + vcpu_sys_reg(vcpu, MPIDR_EL1) = vcpu->arch.vmpidr_el2;
> + }
> }
>
> static void reset_pmcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
> @@ -902,7 +910,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>
> /* MPIDR_EL1 */
> { Op0(0b11), Op1(0b000), CRn(0b0000), CRm(0b0000), Op2(0b101),
> - NULL, reset_mpidr, MPIDR_EL1 },
> + NULL, reset_mpidr, MPIDR_EL1, 0, NULL, set_mpidr },
> /* SCTLR_EL1 */
> { Op0(0b11), Op1(0b000), CRn(0b0001), CRm(0b0000), Op2(0b000),
> access_vm_reg, reset_val, SCTLR_EL1, 0x00C50078 },
> @@ -2034,6 +2042,18 @@ static int demux_c15_set(u64 id, void __user *uaddr)
> }
> }
>
> +static int set_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
> + const struct kvm_one_reg *reg, void __user *uaddr)
> +{
> + int ret;
> +
> + ret = reg_from_user(&vcpu_sys_reg(vcpu, rd->reg), uaddr, reg->id);
> + if (!ret)
> + vcpu->arch.vmpidr_el2 = vcpu_sys_reg(vcpu, rd->reg);
I think we should either sanity check bit 31 is set, or just OR it in to
make sure it is. We should also either check or mask off all bits outside
MPIDR_HWID_BITMASK.
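An untested sketch of that hardening (the policy choice to reject rather
than mask reserved bits is mine, not part of the patch):

static int set_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
		     const struct kvm_one_reg *reg, void __user *uaddr)
{
	u64 val;
	int ret;

	ret = reg_from_user(&val, uaddr, reg->id);
	if (ret)
		return ret;

	/* reject anything outside the Aff fields and RES1 bit 31 */
	if (val & ~(MPIDR_HWID_BITMASK | (1ULL << 31)))
		return -EINVAL;

	val |= 1ULL << 31;	/* RES1; also flags "set by userspace" */
	vcpu_sys_reg(vcpu, rd->reg) = val;
	vcpu->arch.vmpidr_el2 = val;
	return 0;
}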
> +
> + return ret;
> +}
> +
> int kvm_arm_sys_reg_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> {
> const struct sys_reg_desc *r;
> --
> 2.1.0
>
Thanks,
drew
* [RFC PATCH] arm64: KVM: Allow userspace to configure guest MPIDR_EL1
From: Marc Zyngier @ 2016-04-20 17:33 UTC
To: linux-arm-kernel
On Wed, 20 Apr 2016 07:08:39 -0700
Ashok Kumar <ashoks@broadcom.com> wrote:
> For guests with NUMA configuration, Node ID needs to
> be recorded in the respective affinity byte of MPIDR_EL1.
As others have said before, the mapping between the NUMA hierarchy and
MPIDR_EL1 is completely arbitrary, and only the firmware description
can help the kernel in interpreting the affinity levels.
If you want any patch like this one to be considered, I'd like to see
the corresponding userspace that:
- programs the affinity into the vcpus,
- pins the vcpus to specific physical CPUs,
- exposes the corresponding firmware description (either DT or ACPI) to
the kernel.
Short of having all these elements together, there is little point in
letting userspace mess with the guest's affinity registers.
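For the first item, a minimal user-space sketch (set_vcpu_mpidr is a
hypothetical helper; the ARM64_SYS_REG macro is defined locally here,
mirroring what kvmtool/QEMU carry) of programming MPIDR_EL1 through
KVM_SET_ONE_REG, using the Op0=3, Op1=0, CRn=0, CRm=0, Op2=5 encoding
from the patch:

#include <linux/kvm.h>
#include <stdint.h>
#include <sys/ioctl.h>

#define ARM64_SYS_REG(op0, op1, crn, crm, op2)			\
	(KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM64_SYSREG |	\
	 ((uint64_t)(op0) << KVM_REG_ARM64_SYSREG_OP0_SHIFT) |	\
	 ((uint64_t)(op1) << KVM_REG_ARM64_SYSREG_OP1_SHIFT) |	\
	 ((uint64_t)(crn) << KVM_REG_ARM64_SYSREG_CRN_SHIFT) |	\
	 ((uint64_t)(crm) << KVM_REG_ARM64_SYSREG_CRM_SHIFT) |	\
	 ((uint64_t)(op2) << KVM_REG_ARM64_SYSREG_OP2_SHIFT))

static int set_vcpu_mpidr(int vcpu_fd, uint64_t mpidr)
{
	struct kvm_one_reg one_reg = {
		.id   = ARM64_SYS_REG(3, 0, 0, 0, 5),	/* MPIDR_EL1 */
		.addr = (uint64_t)&mpidr,
	};

	return ioctl(vcpu_fd, KVM_SET_ONE_REG, &one_reg);
}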
Thanks,
M.
--
Jazz is not dead. It just smells funny.
* [RFC PATCH] arm64: KVM: Allow userspace to configure guest MPIDR_EL1
From: Andrew Jones @ 2016-04-21 7:04 UTC
To: linux-arm-kernel
On Wed, Apr 20, 2016 at 06:33:54PM +0100, Marc Zyngier wrote:
> On Wed, 20 Apr 2016 07:08:39 -0700
> Ashok Kumar <ashoks@broadcom.com> wrote:
>
> > For guests with NUMA configuration, Node ID needs to
> > be recorded in the respective affinity byte of MPIDR_EL1.
>
> As others have said before, the mapping between the NUMA hierarchy and
> MPIDR_EL1 is completely arbitrary, and only the firmware description
> can help the kernel in interpreting the affinity levels.
>
> If you want any patch like this one to be considered, I'd like to see
> the corresponding userspace that:
>
> - programs the affinity into the vcpus,
I have a start on this for QEMU that I can dust off and send as an RFC
soon.
> - pins the vcpus to specific physical CPUs,
This wouldn't be part of the userspace directly interacting with KVM,
but rather a higher level (even higher than libvirt, e.g.
openstack/ovirt). I also don't think we should need to worry about
which/how the physical cpus get chosen. Let's assume that entity
knows how to best map the guest's virtual topology to a physical one.
> - exposes the corresponding firmware description (either DT or ACPI) to
> the kernel.
The QEMU patches I've started on already generate the DT (the cpu-map
node). I started looking into how to do it for ACPI too, but there
were some questions about whether or not the topology description
tables added to the 6.1 spec were sufficient. I can send the DT part
soon, and continue to look into the ACPI part later though.
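For reference, the cpu-map node in question looks roughly like this
(a sketch following the arm topology binding; a hypothetical 2-cluster,
2-cores-per-cluster guest with CPU0..CPU3 phandle labels assumed):

cpu-map {
	cluster0 {
		core0 { cpu = <&CPU0>; };
		core1 { cpu = <&CPU1>; };
	};
	cluster1 {
		core0 { cpu = <&CPU2>; };
		core1 { cpu = <&CPU3>; };
	};
};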
Thanks,
drew
* [RFC PATCH] arm64: KVM: Allow userspace to configure guest MPIDR_EL1
2016-04-21 7:04 ` Andrew Jones
@ 2016-04-21 9:25 ` Marc Zyngier
2016-04-21 9:56 ` Andrew Jones
0 siblings, 1 reply; 8+ messages in thread
From: Marc Zyngier @ 2016-04-21 9:25 UTC
To: linux-arm-kernel
Hey Andrew,
On 21/04/16 08:04, Andrew Jones wrote:
> On Wed, Apr 20, 2016 at 06:33:54PM +0100, Marc Zyngier wrote:
>> On Wed, 20 Apr 2016 07:08:39 -0700
>> Ashok Kumar <ashoks@broadcom.com> wrote:
>>
>>> For guests with NUMA configuration, Node ID needs to
>>> be recorded in the respective affinity byte of MPIDR_EL1.
>>
>> As others have said before, the mapping between the NUMA hierarchy and
>> MPIDR_EL1 is completely arbitrary, and only the firmware description
>> can help the kernel in interpreting the affinity levels.
>>
>> If you want any patch like this one to be considered, I'd like to see
>> the corresponding userspace that:
>>
>> - programs the affinity into the vcpus,
>
> I have a start on this for QEMU that I can dust off and send as an RFC
> soon.
>
>> - pins the vcpus to specific physical CPUs,
>
> This wouldn't be part of the userspace directly interacting with KVM,
> but rather a higher level (even higher than libvirt, e.g.
> openstack/ovirt). I also don't think we should need to worry about
> which/how the physical cpus get chosen. Let's assume that entity
> knows how to best map the guest's virtual topology to a physical one.
Surely the platform emulation userspace has to implement the pinning
itself, because I can't see high level tools being involved in the
creation of the vcpu threads themselves.
Also, I'd like to have a "simple" tool to test this without having to
deploy openstack (the day this becomes mandatory for kernel development,
I'll move my career to something more... agricultural).
So something in QEMU would be really good...
>
>> - exposes the corresponding firmware description (either DT or ACPI) to
>> the kernel.
>
> The QEMU patches I've started on already generate the DT (the cpu-map
> node). I started looking into how to do it for ACPI too, but there
> were some questions about whether or not the topology description
> tables added to the 6.1 spec were sufficient. I can send the DT part
> soon, and continue to look into the ACPI part later though.
That'd be great. Can you please sync with Ashok so that we have
something consistent between the two of you?
Thanks!
M.
--
Jazz is not dead. It just smells funny...
* [RFC PATCH] arm64: KVM: Allow userspace to configure guest MPIDR_EL1
From: Andrew Jones @ 2016-04-21 9:56 UTC
To: linux-arm-kernel
On Thu, Apr 21, 2016 at 10:25:24AM +0100, Marc Zyngier wrote:
> Hey Andrew,
>
> On 21/04/16 08:04, Andrew Jones wrote:
> > On Wed, Apr 20, 2016 at 06:33:54PM +0100, Marc Zyngier wrote:
> >> On Wed, 20 Apr 2016 07:08:39 -0700
> >> Ashok Kumar <ashoks@broadcom.com> wrote:
> >>
> >>> For guests with NUMA configuration, Node ID needs to
> >>> be recorded in the respective affinity byte of MPIDR_EL1.
> >>
> >> As others have said before, the mapping between the NUMA hierarchy and
> >> MPIDR_EL1 is completely arbitrary, and only the firmware description
> >> can help the kernel in interpreting the affinity levels.
> >>
> >> If you want any patch like this one to be considered, I'd like to see
> >> the corresponding userspace that:
> >>
> >> - programs the affinity into the vcpus,
> >
> > I have a start on this for QEMU that I can dust off and send as an RFC
> > soon.
> >
> >> - pins the vcpus to specific physical CPUs,
> >
> > This wouldn't be part of the userspace directly interacting with KVM,
> > but rather a higher level (even higher than libvirt, e.g.
> > openstack/ovirt). I also don't think we should need to worry about
> > which/how the physical cpus get chosen. Let's assume that entity
> > knows how to best map the guest's virtual topology to a physical one.
>
> Surely the platform emulation userspace has to implement the pinning
> itself, because I can't see high level tools being involved in the
> creation of the vcpu threads themselves.
The pinning comes after the threads are created, but before they are
run. The virtual topology created for a guest may or may not map well
to the physical topology of a given host. That's not the problem of
the emulation though. That's a problem of a high level application
trying to fit it.
>
> Also, I'd like to have a "simple" tool to test this without having to
> deploy openstack (the day this becomes mandatory for kernel development,
> I'll move my career to something more... agricultural).
>
> So something in QEMU would be really good...
>
To test the virtual topology only requires booting a guest, whether
the vcpus are pinned or not. To test that it was worth the effort to
create a virtual topology does require the pinning, and the perf
measuring. However we still don't need the pinning in QEMU. We can
start a guest paused, run a script that does a handful of tasksets,
and then resumes the guest. Or, just use libvirt, which allows one
to save vcpu affinities, and thus on guest launch it will automatically
do the affinity setting for you.
> >
> >> - exposes the corresponding firmware description (either DT or ACPI) to
> >> the kernel.
> >
> > The QEMU patches I've started on already generate the DT (the cpu-map
> > node). I started looking into how to do it for ACPI too, but there
> > were some questions about whether or not the topology description
> > tables added to the 6.1 spec were sufficient. I can send the DT part
> > soon, and continue to look into the ACPI part later though.
>
> That'd be great. Can you please sync with Ashok so that we have
> something consistent between the two of you?
Yup. I'm hoping Ashok will chime in to share any userspace status
they have.
Thanks,
drew
* [RFC PATCH] arm64: KVM: Allow userspace to configure guest MPIDR_EL1
From: Ashok Kumar @ 2016-04-21 11:46 UTC
To: linux-arm-kernel
On Thu, Apr 21, 2016 at 11:56:03AM +0200, Andrew Jones wrote:
> > >> - exposes the corresponding firmware description (either DT or ACPI) to
> > >> the kernel.
> > >
> > > The QEMU patches I've started on already generate the DT (the cpu-map
> > > node). I started looking into how to do it for ACPI too, but there
> > > were some questions about whether or not the topology description
> > > tables added to the 6.1 spec were sufficient. I can send the DT part
> > > soon, and continue to look into the ACPI part later though.
> >
> > That'd be great. Can you please sync with Ashok so that we have
> > something consistent between the two of you?
>
> Yup. I'm hoping Ashok will chime in to share any userspace status
> they have.
I tested it using qemu's arm numa patchset [1] and I don't have any
changes for cpu-map.
I just used qemu's thread,core,socket information from the "-smp" command
line argument to populate the affinity.
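Illustratively (this is not the actual qemu code, which is unposted;
topo_to_mpidr is a made-up helper), that packing amounts to:

static uint64_t topo_to_mpidr(uint32_t socket, uint32_t core, uint32_t thread)
{
	/* thread -> Aff0, core -> Aff1, socket -> Aff2; bit 31 is RES1 */
	return (1ULL << 31) |
	       ((uint64_t)socket << 16) |
	       ((uint64_t)core << 8) |
	       (uint64_t)thread;
}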
I was hoping to see the reception of this patch and then post the qemu
changes.
[1] https://lists.gnu.org/archive/html/qemu-arm/2016-01/msg00363.html
Thanks,
Ashok
>
> Thanks,
> drew