Re: [PATCH v3 1/1] KVM: PPC: Book3S HV: pack VCORE IDs to access full VCPU ID space

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Paul Mackerras <paulus@ozlabs.org>
To: Sam Bobroff <sbobroff@linux.ibm.com>
Cc: clg@kaod.org, linuxppc-dev@lists.ozlabs.org,
	kvm-ppc@vger.kernel.org, kvm@vger.kernel.org,
	david@gibson.dropbear.id.au
Subject: Re: [PATCH v3 1/1] KVM: PPC: Book3S HV: pack VCORE IDs to access full VCPU ID space
Date: Mon, 23 Jul 2018 05:43:37 +0000	[thread overview]
Message-ID: <20180723054337.GA29207@fergus> (raw)
In-Reply-To: <1fb3aea5f44f1029866ee10db40abde7e18b24ad.1531967105.git.sbobroff@linux.ibm.com>

On Thu, Jul 19, 2018 at 12:25:10PM +1000, Sam Bobroff wrote:
> From: Sam Bobroff <sam.bobroff@au1.ibm.com>
> 
> It is not currently possible to create the full number of possible
> VCPUs (KVM_MAX_VCPUS) on Power9 with KVM-HV when the guest uses less
> threads per core than it's core stride (or "VSMT mode"). This is
> because the VCORE ID and XIVE offsets to grow beyond KVM_MAX_VCPUS
> even though the VCPU ID is less than KVM_MAX_VCPU_ID.
> 
> To address this, "pack" the VCORE ID and XIVE offsets by using
> knowledge of the way the VCPU IDs will be used when there are less
> guest threads per core than the core stride. The primary thread of
> each core will always be used first. Then, if the guest uses more than
> one thread per core, these secondary threads will sequentially follow
> the primary in each core.
> 
> So, the only way an ID above KVM_MAX_VCPUS can be seen, is if the
> VCPUs are being spaced apart, so at least half of each core is empty
> and IDs between KVM_MAX_VCPUS and (KVM_MAX_VCPUS * 2) can be mapped
> into the second half of each core (4..7, in an 8-thread core).
> 
> Similarly, if IDs above KVM_MAX_VCPUS * 2 are seen, at least 3/4 of
> each core is being left empty, and we can map down into the second and
> third quarters of each core (2, 3 and 5, 6 in an 8-thread core).
> 
> Lastly, if IDs above KVM_MAX_VCPUS * 4 are seen, only the primary
> threads are being used and 7/8 of the core is empty, allowing use of
> the 1, 3, 5 and 7 thread slots.
> 
> (Strides less than 8 are handled similarly.)
> 
> This allows the VCORE ID or offset to be calculated quickly from the
> VCPU ID or XIVE server numbers, without access to the VCPU structure.
> 
> Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>

I have some comments relating to the situation where the stride
(i.e. kvm->arch.emul_smt_mode) is less than 8; see below.

[snip]
> +static inline u32 kvmppc_pack_vcpu_id(struct kvm *kvm, u32 id)
> +{
> +	const int block_offsets[MAX_SMT_THREADS] = {0, 4, 2, 6, 1, 3, 5, 7};

This needs to be {0, 4, 2, 6, 1, 5, 3, 7} (with the 3 and 5 swapped
from what you have) for the case when stride = 4 and block = 3.  In
that case we need block_offsets[block] to be 3; if it is 5, then we
will collide with the case where block = 2 for the next virtual core.

> +	int stride = kvm->arch.emul_smt_mode;
> +	int block = (id / KVM_MAX_VCPUS) * (MAX_SMT_THREADS / stride);
> +	u32 packed_id;
> +
> +	BUG_ON(block >= MAX_SMT_THREADS);
> +	packed_id = (id % KVM_MAX_VCPUS) + block_offsets[block];
> +	BUG_ON(packed_id >= KVM_MAX_VCPUS);
> +	return packed_id;
> +}
> +
>  #endif /* __ASM_KVM_BOOK3S_H__ */
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index de686b340f4a..363c2fb0d89e 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -1816,7 +1816,7 @@ static int threads_per_vcore(struct kvm *kvm)
>  	return threads_per_subcore;
>  }
>  
> -static struct kvmppc_vcore *kvmppc_vcore_create(struct kvm *kvm, int core)
> +static struct kvmppc_vcore *kvmppc_vcore_create(struct kvm *kvm, int id)
>  {
>  	struct kvmppc_vcore *vcore;
>  
> @@ -1830,7 +1830,7 @@ static struct kvmppc_vcore *kvmppc_vcore_create(struct kvm *kvm, int core)
>  	init_swait_queue_head(&vcore->wq);
>  	vcore->preempt_tb = TB_NIL;
>  	vcore->lpcr = kvm->arch.lpcr;
> -	vcore->first_vcpuid = core * kvm->arch.smt_mode;
> +	vcore->first_vcpuid = id;
>  	vcore->kvm = kvm;
>  	INIT_LIST_HEAD(&vcore->preempt_list);
>  
> @@ -2048,12 +2048,18 @@ static struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm,
>  	mutex_lock(&kvm->lock);
>  	vcore = NULL;
>  	err = -EINVAL;
> -	core = id / kvm->arch.smt_mode;
> +	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> +		BUG_ON(kvm->arch.smt_mode != 1);
> +		core = kvmppc_pack_vcpu_id(kvm, id);

We now have a way for userspace to trigger a BUG_ON, as far as I can
see.  The only check on id up to this point is that it is less than
KVM_MAX_VCPU_ID, which means that the BUG_ON(block >= MAX_SMT_THREADS)
can be triggered, if kvm->arch.emul_smt_mode < MAX_SMT_THREADS, by
giving an id that is greater than or equal to KVM_MAX_VCPUS *
kvm->arch.emul_smt+mode.

> +	} else {
> +		core = id / kvm->arch.smt_mode;
> +	}
>  	if (core < KVM_MAX_VCORES) {
>  		vcore = kvm->arch.vcores[core];
> +		BUG_ON(cpu_has_feature(CPU_FTR_ARCH_300) && vcore);

Doesn't this just mean that userspace has chosen an id big enough to
cause a collision in the output space of kvmppc_pack_vcpu_id()?  How
is this not user-triggerable?

Paul.

WARNING: multiple messages have this Message-ID (diff)

From: Paul Mackerras <paulus@ozlabs.org>
To: Sam Bobroff <sbobroff@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org, kvm@vger.kernel.org,
	kvm-ppc@vger.kernel.org, david@gibson.dropbear.id.au,
	clg@kaod.org
Subject: Re: [PATCH v3 1/1] KVM: PPC: Book3S HV: pack VCORE IDs to access full VCPU ID space
Date: Mon, 23 Jul 2018 15:43:37 +1000	[thread overview]
Message-ID: <20180723054337.GA29207@fergus> (raw)
In-Reply-To: <1fb3aea5f44f1029866ee10db40abde7e18b24ad.1531967105.git.sbobroff@linux.ibm.com>

On Thu, Jul 19, 2018 at 12:25:10PM +1000, Sam Bobroff wrote:
> From: Sam Bobroff <sam.bobroff@au1.ibm.com>
> 
> It is not currently possible to create the full number of possible
> VCPUs (KVM_MAX_VCPUS) on Power9 with KVM-HV when the guest uses less
> threads per core than it's core stride (or "VSMT mode"). This is
> because the VCORE ID and XIVE offsets to grow beyond KVM_MAX_VCPUS
> even though the VCPU ID is less than KVM_MAX_VCPU_ID.
> 
> To address this, "pack" the VCORE ID and XIVE offsets by using
> knowledge of the way the VCPU IDs will be used when there are less
> guest threads per core than the core stride. The primary thread of
> each core will always be used first. Then, if the guest uses more than
> one thread per core, these secondary threads will sequentially follow
> the primary in each core.
> 
> So, the only way an ID above KVM_MAX_VCPUS can be seen, is if the
> VCPUs are being spaced apart, so at least half of each core is empty
> and IDs between KVM_MAX_VCPUS and (KVM_MAX_VCPUS * 2) can be mapped
> into the second half of each core (4..7, in an 8-thread core).
> 
> Similarly, if IDs above KVM_MAX_VCPUS * 2 are seen, at least 3/4 of
> each core is being left empty, and we can map down into the second and
> third quarters of each core (2, 3 and 5, 6 in an 8-thread core).
> 
> Lastly, if IDs above KVM_MAX_VCPUS * 4 are seen, only the primary
> threads are being used and 7/8 of the core is empty, allowing use of
> the 1, 3, 5 and 7 thread slots.
> 
> (Strides less than 8 are handled similarly.)
> 
> This allows the VCORE ID or offset to be calculated quickly from the
> VCPU ID or XIVE server numbers, without access to the VCPU structure.
> 
> Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>

I have some comments relating to the situation where the stride
(i.e. kvm->arch.emul_smt_mode) is less than 8; see below.

[snip]
> +static inline u32 kvmppc_pack_vcpu_id(struct kvm *kvm, u32 id)
> +{
> +	const int block_offsets[MAX_SMT_THREADS] = {0, 4, 2, 6, 1, 3, 5, 7};

This needs to be {0, 4, 2, 6, 1, 5, 3, 7} (with the 3 and 5 swapped
from what you have) for the case when stride == 4 and block == 3.  In
that case we need block_offsets[block] to be 3; if it is 5, then we
will collide with the case where block == 2 for the next virtual core.

> +	int stride = kvm->arch.emul_smt_mode;
> +	int block = (id / KVM_MAX_VCPUS) * (MAX_SMT_THREADS / stride);
> +	u32 packed_id;
> +
> +	BUG_ON(block >= MAX_SMT_THREADS);
> +	packed_id = (id % KVM_MAX_VCPUS) + block_offsets[block];
> +	BUG_ON(packed_id >= KVM_MAX_VCPUS);
> +	return packed_id;
> +}
> +
>  #endif /* __ASM_KVM_BOOK3S_H__ */
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index de686b340f4a..363c2fb0d89e 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -1816,7 +1816,7 @@ static int threads_per_vcore(struct kvm *kvm)
>  	return threads_per_subcore;
>  }
>  
> -static struct kvmppc_vcore *kvmppc_vcore_create(struct kvm *kvm, int core)
> +static struct kvmppc_vcore *kvmppc_vcore_create(struct kvm *kvm, int id)
>  {
>  	struct kvmppc_vcore *vcore;
>  
> @@ -1830,7 +1830,7 @@ static struct kvmppc_vcore *kvmppc_vcore_create(struct kvm *kvm, int core)
>  	init_swait_queue_head(&vcore->wq);
>  	vcore->preempt_tb = TB_NIL;
>  	vcore->lpcr = kvm->arch.lpcr;
> -	vcore->first_vcpuid = core * kvm->arch.smt_mode;
> +	vcore->first_vcpuid = id;
>  	vcore->kvm = kvm;
>  	INIT_LIST_HEAD(&vcore->preempt_list);
>  
> @@ -2048,12 +2048,18 @@ static struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm,
>  	mutex_lock(&kvm->lock);
>  	vcore = NULL;
>  	err = -EINVAL;
> -	core = id / kvm->arch.smt_mode;
> +	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> +		BUG_ON(kvm->arch.smt_mode != 1);
> +		core = kvmppc_pack_vcpu_id(kvm, id);

We now have a way for userspace to trigger a BUG_ON, as far as I can
see.  The only check on id up to this point is that it is less than
KVM_MAX_VCPU_ID, which means that the BUG_ON(block >= MAX_SMT_THREADS)
can be triggered, if kvm->arch.emul_smt_mode < MAX_SMT_THREADS, by
giving an id that is greater than or equal to KVM_MAX_VCPUS *
kvm->arch.emul_smt+mode.

> +	} else {
> +		core = id / kvm->arch.smt_mode;
> +	}
>  	if (core < KVM_MAX_VCORES) {
>  		vcore = kvm->arch.vcores[core];
> +		BUG_ON(cpu_has_feature(CPU_FTR_ARCH_300) && vcore);

Doesn't this just mean that userspace has chosen an id big enough to
cause a collision in the output space of kvmppc_pack_vcpu_id()?  How
is this not user-triggerable?

Paul.

WARNING: multiple messages have this Message-ID (diff)

From: Paul Mackerras <paulus@ozlabs.org>
To: Sam Bobroff <sbobroff@linux.ibm.com>
Cc: clg@kaod.org, linuxppc-dev@lists.ozlabs.org,
	kvm-ppc@vger.kernel.org, kvm@vger.kernel.org,
	david@gibson.dropbear.id.au
Subject: Re: [PATCH v3 1/1] KVM: PPC: Book3S HV: pack VCORE IDs to access full VCPU ID space
Date: Mon, 23 Jul 2018 15:43:37 +1000	[thread overview]
Message-ID: <20180723054337.GA29207@fergus> (raw)
In-Reply-To: <1fb3aea5f44f1029866ee10db40abde7e18b24ad.1531967105.git.sbobroff@linux.ibm.com>

On Thu, Jul 19, 2018 at 12:25:10PM +1000, Sam Bobroff wrote:
> From: Sam Bobroff <sam.bobroff@au1.ibm.com>
> 
> It is not currently possible to create the full number of possible
> VCPUs (KVM_MAX_VCPUS) on Power9 with KVM-HV when the guest uses less
> threads per core than it's core stride (or "VSMT mode"). This is
> because the VCORE ID and XIVE offsets to grow beyond KVM_MAX_VCPUS
> even though the VCPU ID is less than KVM_MAX_VCPU_ID.
> 
> To address this, "pack" the VCORE ID and XIVE offsets by using
> knowledge of the way the VCPU IDs will be used when there are less
> guest threads per core than the core stride. The primary thread of
> each core will always be used first. Then, if the guest uses more than
> one thread per core, these secondary threads will sequentially follow
> the primary in each core.
> 
> So, the only way an ID above KVM_MAX_VCPUS can be seen, is if the
> VCPUs are being spaced apart, so at least half of each core is empty
> and IDs between KVM_MAX_VCPUS and (KVM_MAX_VCPUS * 2) can be mapped
> into the second half of each core (4..7, in an 8-thread core).
> 
> Similarly, if IDs above KVM_MAX_VCPUS * 2 are seen, at least 3/4 of
> each core is being left empty, and we can map down into the second and
> third quarters of each core (2, 3 and 5, 6 in an 8-thread core).
> 
> Lastly, if IDs above KVM_MAX_VCPUS * 4 are seen, only the primary
> threads are being used and 7/8 of the core is empty, allowing use of
> the 1, 3, 5 and 7 thread slots.
> 
> (Strides less than 8 are handled similarly.)
> 
> This allows the VCORE ID or offset to be calculated quickly from the
> VCPU ID or XIVE server numbers, without access to the VCPU structure.
> 
> Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>

I have some comments relating to the situation where the stride
(i.e. kvm->arch.emul_smt_mode) is less than 8; see below.

[snip]
> +static inline u32 kvmppc_pack_vcpu_id(struct kvm *kvm, u32 id)
> +{
> +	const int block_offsets[MAX_SMT_THREADS] = {0, 4, 2, 6, 1, 3, 5, 7};

This needs to be {0, 4, 2, 6, 1, 5, 3, 7} (with the 3 and 5 swapped
from what you have) for the case when stride == 4 and block == 3.  In
that case we need block_offsets[block] to be 3; if it is 5, then we
will collide with the case where block == 2 for the next virtual core.

> +	int stride = kvm->arch.emul_smt_mode;
> +	int block = (id / KVM_MAX_VCPUS) * (MAX_SMT_THREADS / stride);
> +	u32 packed_id;
> +
> +	BUG_ON(block >= MAX_SMT_THREADS);
> +	packed_id = (id % KVM_MAX_VCPUS) + block_offsets[block];
> +	BUG_ON(packed_id >= KVM_MAX_VCPUS);
> +	return packed_id;
> +}
> +
>  #endif /* __ASM_KVM_BOOK3S_H__ */
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index de686b340f4a..363c2fb0d89e 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -1816,7 +1816,7 @@ static int threads_per_vcore(struct kvm *kvm)
>  	return threads_per_subcore;
>  }
>  
> -static struct kvmppc_vcore *kvmppc_vcore_create(struct kvm *kvm, int core)
> +static struct kvmppc_vcore *kvmppc_vcore_create(struct kvm *kvm, int id)
>  {
>  	struct kvmppc_vcore *vcore;
>  
> @@ -1830,7 +1830,7 @@ static struct kvmppc_vcore *kvmppc_vcore_create(struct kvm *kvm, int core)
>  	init_swait_queue_head(&vcore->wq);
>  	vcore->preempt_tb = TB_NIL;
>  	vcore->lpcr = kvm->arch.lpcr;
> -	vcore->first_vcpuid = core * kvm->arch.smt_mode;
> +	vcore->first_vcpuid = id;
>  	vcore->kvm = kvm;
>  	INIT_LIST_HEAD(&vcore->preempt_list);
>  
> @@ -2048,12 +2048,18 @@ static struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm,
>  	mutex_lock(&kvm->lock);
>  	vcore = NULL;
>  	err = -EINVAL;
> -	core = id / kvm->arch.smt_mode;
> +	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> +		BUG_ON(kvm->arch.smt_mode != 1);
> +		core = kvmppc_pack_vcpu_id(kvm, id);

We now have a way for userspace to trigger a BUG_ON, as far as I can
see.  The only check on id up to this point is that it is less than
KVM_MAX_VCPU_ID, which means that the BUG_ON(block >= MAX_SMT_THREADS)
can be triggered, if kvm->arch.emul_smt_mode < MAX_SMT_THREADS, by
giving an id that is greater than or equal to KVM_MAX_VCPUS *
kvm->arch.emul_smt+mode.

> +	} else {
> +		core = id / kvm->arch.smt_mode;
> +	}
>  	if (core < KVM_MAX_VCORES) {
>  		vcore = kvm->arch.vcores[core];
> +		BUG_ON(cpu_has_feature(CPU_FTR_ARCH_300) && vcore);

Doesn't this just mean that userspace has chosen an id big enough to
cause a collision in the output space of kvmppc_pack_vcpu_id()?  How
is this not user-triggerable?

Paul.

next prev parent reply	other threads:[~2018-07-23  5:43 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-19  2:25 [PATCH v3 1/1] KVM: PPC: Book3S HV: pack VCORE IDs to access full VCPU ID space Sam Bobroff
2018-07-19  2:25 ` Sam Bobroff
2018-07-19  2:25 ` Sam Bobroff
2018-07-19  5:28 ` David Gibson
2018-07-19  5:28   ` David Gibson
2018-07-19  5:28   ` David Gibson
2018-07-23  5:43 ` Paul Mackerras [this message]
2018-07-23  5:43   ` Paul Mackerras
2018-07-23  5:43   ` Paul Mackerras
2018-07-25  5:26   ` Sam Bobroff
2018-07-25  5:26     ` Sam Bobroff
2018-07-25  5:26     ` Sam Bobroff
2018-07-24  9:07 ` Cédric Le Goater
2018-07-24  9:07   ` Cédric Le Goater
2018-07-24  9:07   ` Cédric Le Goater

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180723054337.GA29207@fergus \
    --to=paulus@ozlabs.org \
    --cc=clg@kaod.org \
    --cc=david@gibson.dropbear.id.au \
    --cc=kvm-ppc@vger.kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=sbobroff@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.