From: Sam Bobroff <sbobroff@linux.ibm.com>
To: Paul Mackerras <paulus@ozlabs.org>
Cc: linuxppc-dev@lists.ozlabs.org, kvm@vger.kernel.org,
kvm-ppc@vger.kernel.org, david@gibson.dropbear.id.au,
clg@kaod.org
Subject: Re: [PATCH v3 1/1] KVM: PPC: Book3S HV: pack VCORE IDs to access full VCPU ID space
Date: Wed, 25 Jul 2018 15:26:04 +1000
Message-ID: <20180725052603.GA4799@tungsten.ozlabs.ibm.com>
In-Reply-To: <20180723054337.GA29207@fergus>
On Mon, Jul 23, 2018 at 03:43:37PM +1000, Paul Mackerras wrote:
> On Thu, Jul 19, 2018 at 12:25:10PM +1000, Sam Bobroff wrote:
> > From: Sam Bobroff <sam.bobroff@au1.ibm.com>
> >
> > It is not currently possible to create the full number of possible
> > VCPUs (KVM_MAX_VCPUS) on Power9 with KVM-HV when the guest uses fewer
> > threads per core than its core stride (or "VSMT mode"). This is
> > because the VCORE ID and XIVE offsets grow beyond KVM_MAX_VCPUS
> > even though the VCPU ID is less than KVM_MAX_VCPU_ID.
> >
> > To address this, "pack" the VCORE ID and XIVE offsets by using
> > knowledge of the way the VCPU IDs will be used when there are fewer
> > guest threads per core than the core stride. The primary thread of
> > each core will always be used first. Then, if the guest uses more than
> > one thread per core, these secondary threads will sequentially follow
> > the primary in each core.
> >
> > So, the only way an ID above KVM_MAX_VCPUS can be seen is if the
> > VCPUs are being spaced apart, so at least half of each core is empty
> > and IDs between KVM_MAX_VCPUS and (KVM_MAX_VCPUS * 2) can be mapped
> > into the second half of each core (4..7, in an 8-thread core).
> >
> > Similarly, if IDs above KVM_MAX_VCPUS * 2 are seen, at least 3/4 of
> > each core is being left empty, and we can map down into the second and
> > third quarters of each core (2, 3 and 5, 6 in an 8-thread core).
> >
> > Lastly, if IDs above KVM_MAX_VCPUS * 4 are seen, only the primary
> > threads are being used and 7/8 of the core is empty, allowing use of
> > the 1, 3, 5 and 7 thread slots.
> >
> > (Strides less than 8 are handled similarly.)
> >
> > This allows the VCORE ID or offset to be calculated quickly from the
> > VCPU ID or XIVE server numbers, without access to the VCPU structure.
> >
> > Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
>
> I have some comments relating to the situation where the stride
> (i.e. kvm->arch.emul_smt_mode) is less than 8; see below.
>
> [snip]
> > +static inline u32 kvmppc_pack_vcpu_id(struct kvm *kvm, u32 id)
> > +{
> > + const int block_offsets[MAX_SMT_THREADS] = {0, 4, 2, 6, 1, 3, 5, 7};
>
> This needs to be {0, 4, 2, 6, 1, 5, 3, 7} (with the 3 and 5 swapped
> from what you have) for the case when stride == 4 and block == 3. In
> that case we need block_offsets[block] to be 3; if it is 5, then we
> will collide with the case where block == 2 for the next virtual core.
Agh! Yes it does.
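To make the collision concrete, here is a small user-space sketch (not
kernel code). It reuses the arithmetic from kvmppc_pack_vcpu_id() with
both tables. The names pack_id(), count_bad_ids(), offsets_v3 and
offsets_fixed, and the tiny KVM_MAX_VCPUS of 16, are assumptions made up
for illustration only.

```c
#include <stdbool.h>
#include <string.h>

/* Illustrative values only; the kernel's real KVM_MAX_VCPUS is much larger. */
#define MAX_SMT_THREADS 8
#define KVM_MAX_VCPUS   16

/* Table as posted in v3, and the table with 3 and 5 swapped. */
static const int offsets_v3[MAX_SMT_THREADS]    = {0, 4, 2, 6, 1, 3, 5, 7};
static const int offsets_fixed[MAX_SMT_THREADS] = {0, 4, 2, 6, 1, 5, 3, 7};

/* Same arithmetic as kvmppc_pack_vcpu_id(), minus the BUG_ON checks. */
static unsigned int pack_id(const int *block_offsets, int stride,
			    unsigned int id)
{
	int block = (id / KVM_MAX_VCPUS) * (MAX_SMT_THREADS / stride);

	return (id % KVM_MAX_VCPUS) + block_offsets[block];
}

/*
 * Pack every primary-thread VCPU ID (multiples of stride below
 * KVM_MAX_VCPUS * stride) and count the results that are either out of
 * range or already taken by an earlier ID. stride must divide
 * MAX_SMT_THREADS.
 */
static int count_bad_ids(const int *block_offsets, int stride)
{
	bool used[KVM_MAX_VCPUS];
	unsigned int limit = (unsigned int)(KVM_MAX_VCPUS * stride);
	unsigned int id;
	int bad = 0;

	memset(used, 0, sizeof(used));
	for (id = 0; id < limit; id += (unsigned int)stride) {
		unsigned int packed = pack_id(block_offsets, stride, id);

		if (packed >= KVM_MAX_VCPUS || used[packed])
			bad++;
		else
			used[packed] = true;
	}
	return bad;
}
```

With these toy values, count_bad_ids(offsets_v3, 4) reports four bad
IDs: the IDs that receive offset 5 either land on a slot already taken
by the next core or run past KVM_MAX_VCPUS. count_bad_ids(offsets_fixed, 4)
reports none.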
> > + int stride = kvm->arch.emul_smt_mode;
> > + int block = (id / KVM_MAX_VCPUS) * (MAX_SMT_THREADS / stride);
> > + u32 packed_id;
> > +
> > + BUG_ON(block >= MAX_SMT_THREADS);
> > + packed_id = (id % KVM_MAX_VCPUS) + block_offsets[block];
> > + BUG_ON(packed_id >= KVM_MAX_VCPUS);
> > + return packed_id;
> > +}
> > +
> > #endif /* __ASM_KVM_BOOK3S_H__ */
> > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> > index de686b340f4a..363c2fb0d89e 100644
> > --- a/arch/powerpc/kvm/book3s_hv.c
> > +++ b/arch/powerpc/kvm/book3s_hv.c
> > @@ -1816,7 +1816,7 @@ static int threads_per_vcore(struct kvm *kvm)
> > return threads_per_subcore;
> > }
> >
> > -static struct kvmppc_vcore *kvmppc_vcore_create(struct kvm *kvm, int core)
> > +static struct kvmppc_vcore *kvmppc_vcore_create(struct kvm *kvm, int id)
> > {
> > struct kvmppc_vcore *vcore;
> >
> > @@ -1830,7 +1830,7 @@ static struct kvmppc_vcore *kvmppc_vcore_create(struct kvm *kvm, int core)
> > init_swait_queue_head(&vcore->wq);
> > vcore->preempt_tb = TB_NIL;
> > vcore->lpcr = kvm->arch.lpcr;
> > - vcore->first_vcpuid = core * kvm->arch.smt_mode;
> > + vcore->first_vcpuid = id;
> > vcore->kvm = kvm;
> > INIT_LIST_HEAD(&vcore->preempt_list);
> >
> > @@ -2048,12 +2048,18 @@ static struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm,
> > mutex_lock(&kvm->lock);
> > vcore = NULL;
> > err = -EINVAL;
> > - core = id / kvm->arch.smt_mode;
> > + if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> > + BUG_ON(kvm->arch.smt_mode != 1);
> > + core = kvmppc_pack_vcpu_id(kvm, id);
>
> We now have a way for userspace to trigger a BUG_ON, as far as I can
> see. The only check on id up to this point is that it is less than
> KVM_MAX_VCPU_ID, which means that the BUG_ON(block >= MAX_SMT_THREADS)
> can be triggered, if kvm->arch.emul_smt_mode < MAX_SMT_THREADS, by
> giving an id that is greater than or equal to KVM_MAX_VCPUS *
> kvm->arch.emul_smt_mode.
>
> > + } else {
> > + core = id / kvm->arch.smt_mode;
> > + }
> > if (core < KVM_MAX_VCORES) {
> > vcore = kvm->arch.vcores[core];
> > + BUG_ON(cpu_has_feature(CPU_FTR_ARCH_300) && vcore);
>
> Doesn't this just mean that userspace has chosen an id big enough to
> cause a collision in the output space of kvmppc_pack_vcpu_id()? How
> is this not user-triggerable?
>
> Paul.
Yep, good point. Particularly when dealing with a malicious userspace
that won't follow QEMU's allocation pattern.

I'll rework it and re-post. I'll discuss the changes in the next
version.
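
One possible direction, sketched in user-space C and not necessarily
what the next version will do: return an error for IDs that cannot be
packed, so that userspace cannot reach a BUG_ON. The name
pack_vcpu_id_checked() is hypothetical, KVM_MAX_VCPUS is a toy value, and
the stride is passed in directly rather than read from
kvm->arch.emul_smt_mode.

```c
#include <errno.h>

/* Illustrative values only; the kernel's real KVM_MAX_VCPUS is much larger. */
#define MAX_SMT_THREADS 8
#define KVM_MAX_VCPUS   16

/*
 * Hypothetical checked variant: returns the packed ID on success or
 * -EINVAL when the ID cannot be packed for the given stride.
 */
static int pack_vcpu_id_checked(int stride, unsigned int id)
{
	static const int block_offsets[MAX_SMT_THREADS] =
		{0, 4, 2, 6, 1, 5, 3, 7};
	unsigned int block, packed_id;

	/* The stride must be a divisor of MAX_SMT_THREADS (1, 2, 4 or 8). */
	if (stride < 1 || stride > MAX_SMT_THREADS || MAX_SMT_THREADS % stride)
		return -EINVAL;

	/* IDs at or above KVM_MAX_VCPUS * stride have no block to map to. */
	block = (id / KVM_MAX_VCPUS) * (MAX_SMT_THREADS / stride);
	if (block >= MAX_SMT_THREADS)
		return -EINVAL;

	/* IDs that do not follow the expected layout can overflow the range. */
	packed_id = (id % KVM_MAX_VCPUS) + block_offsets[block];
	if (packed_id >= KVM_MAX_VCPUS)
		return -EINVAL;

	return (int)packed_id;
}
```

This only covers the range checks. Two different IDs can still pack to
the same value (with a stride of 4, IDs 3 and 48 both give 3 here), so
the caller would also need to fail cleanly when kvm->arch.vcores[core]
is already populated.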
Thanks for the review!
Sam.