Date: Mon, 16 Apr 2018 14:09:42 +1000
From: David Gibson
To: Sam Bobroff
Cc: linuxppc-dev@lists.ozlabs.org, kvm@vger.kernel.org, kvm-ppc@vger.kernel.org, paulus@samba.org, clg@kaod.org
Subject: Re: [PATCH RFC 1/1] KVM: PPC: Book3S HV: pack VCORE IDs to access full VCPU ID space
Message-ID: <20180416040942.GB20551@umbus.fritz.box>
References: <70974cfb62a7f09a53ec914d2909639884228244.1523516498.git.sam.bobroff@au1.ibm.com>
In-Reply-To: <70974cfb62a7f09a53ec914d2909639884228244.1523516498.git.sam.bobroff@au1.ibm.com>

On Thu, Apr 12, 2018 at 05:02:06PM +1000, Sam Bobroff wrote:
> It is not currently possible to create the full number of possible
> VCPUs (KVM_MAX_VCPUS) on Power9 with KVM-HV when the guest uses fewer
> threads per core than its core stride (or "VSMT mode"). This is
> because the VCORE ID and XIVE offsets grow beyond KVM_MAX_VCPUS
> even though the VCPU ID is less than KVM_MAX_VCPU_ID.
>
> To address this, "pack" the VCORE ID and XIVE offsets by using
> knowledge of the way the VCPU IDs will be used when there are fewer
> guest threads per core than the core stride. The primary thread of
> each core will always be used first. Then, if the guest uses more than
> one thread per core, these secondary threads will sequentially follow
> the primary in each core.
>
> So, the only way an ID above KVM_MAX_VCPUS can be seen is if the
> VCPUs are being spaced apart, so at least half of each core is empty
> and IDs between KVM_MAX_VCPUS and (KVM_MAX_VCPUS * 2) can be mapped
> into the second half of each core (4..7, in an 8-thread core).
>
> Similarly, if IDs above KVM_MAX_VCPUS * 2 are seen, at least 3/4 of
> each core is being left empty, and we can map down into the second and
> third quarters of each core (2, 3 and 5, 6 in an 8-thread core).
>
> Lastly, if IDs above KVM_MAX_VCPUS * 4 are seen, only the primary
> threads are being used and 7/8 of the core is empty, allowing use of
> the 1, 3, 5 and 7 thread slots.
>
> (Strides less than 8 are handled similarly.)
>
> This allows the VCORE ID or offset to be calculated quickly from the
> VCPU ID or XIVE server numbers, without access to the VCPU structure.
>
> Signed-off-by: Sam Bobroff
> ---
> Hello everyone,
>
> I've tested this on P8 and P9, in lots of combinations of host and guest
> threading modes, and it has been fine, but it does feel like a "tricky"
> approach, so I still feel somewhat wary about it.
>
> I've posted it as an RFC because I have not tested it with guest native-XIVE,
> and I suspect that it will take some work to support it.
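To make the ID layout described in the quoted commit message concrete, here is a small standalone sketch. It is not part of the patch: the stride, thread and core counts are illustrative assumptions, and the ID formula simply restates the layout described above (primary thread of each core first, secondaries following it), not QEMU's actual allocation code.

/*
 * Illustrative sketch only (not from the patch): with a core stride S,
 * core c / thread t gets VCPU ID c * S + t, so a guest using fewer
 * threads per core than its stride leaves most of each block of S IDs
 * unused, and the IDs in use run well past the number of VCPUs.
 */
#include <stdio.h>

int main(void)
{
	const int stride = 8;            /* core stride ("VSMT mode"), assumed */
	const int threads_per_core = 2;  /* guest threads per core, assumed */
	const int cores = 4;             /* guest cores, assumed */

	for (int c = 0; c < cores; c++)
		for (int t = 0; t < threads_per_core; t++)
			printf("core %d thread %d -> VCPU ID %d\n",
			       c, t, c * stride + t);
	/* 8 VCPUs, but IDs 0, 1, 8, 9, 16, 17, 24, 25: the spread that the
	 * packing scheme below is designed to fold back down. */
	return 0;
}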
>
>  arch/powerpc/include/asm/kvm_book3s.h | 19 +++++++++++++++++++
>  arch/powerpc/kvm/book3s_hv.c          | 14 ++++++++++----
>  arch/powerpc/kvm/book3s_xive.c        |  9 +++++++--
>  3 files changed, 36 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
> index 376ae803b69c..1295056d564a 100644
> --- a/arch/powerpc/include/asm/kvm_book3s.h
> +++ b/arch/powerpc/include/asm/kvm_book3s.h
> @@ -368,4 +368,23 @@ extern int kvmppc_h_logical_ci_store(struct kvm_vcpu *vcpu);
>  #define SPLIT_HACK_MASK		0xff000000
>  #define SPLIT_HACK_OFFS		0xfb000000
>  
> +/* Pack a VCPU ID from the [0..KVM_MAX_VCPU_ID) space down to the
> + * [0..KVM_MAX_VCPUS) space, while using knowledge of the guest's core stride
> + * (but not it's actual threading mode, which is not available) to avoid
> + * collisions.
> + */
> +static inline u32 kvmppc_pack_vcpu_id(struct kvm *kvm, u32 id)
> +{
> +	const int block_offsets[MAX_SMT_THREADS] = {0, 4, 2, 6, 1, 5, 3, 7};

I'd suggest 1, 3, 5, 7 at the end rather than 1, 5, 3, 7 - it accomplishes
roughly the same thing, but I think it makes the pattern more obvious.

> +	int stride = kvm->arch.emul_smt_mode > 1 ?
> +		kvm->arch.emul_smt_mode : kvm->arch.smt_mode;

AFAICT from the BUG_ON()s etc. at the call sites, kvm->arch.smt_mode must
always be 1 when this is called, so the conditional here doesn't seem
useful.

> +	int block = (id / KVM_MAX_VCPUS) * (MAX_SMT_THREADS / stride);
> +	u32 packed_id;
> +
> +	BUG_ON(block >= MAX_SMT_THREADS);
> +	packed_id = (id % KVM_MAX_VCPUS) + block_offsets[block];
> +	BUG_ON(packed_id >= KVM_MAX_VCPUS);
> +	return packed_id;
> +}

It took me a while to wrap my head around the packing function, but I
think I got there in the end.  It's pretty clever.

One thing bothers me, though.  This certainly packs things under
KVM_MAX_VCPUS, but not necessarily under the actual number of vcpus.
E.g. KVM_MAX_VCPUS == 16, 8 vcpus total, stride 8, 2 vthreads/vcore (as
qemu sees it) gives unpacked IDs of (0, 1, 8, 9, 16, 17, 24, 25) and
packed IDs of (0, 1, 8, 9, 4, 5, 12, 13) - leaving 2, 3, 6, 7, etc.
unused.

So again, the question is what exactly these remapped IDs are useful
for.  If we're indexing into a bare array of structures of size
KVM_MAX_VCPUS, then we're *already* wasting a bunch of space by having
more entries than vcpus.  If we're indexing into something sparser,
then why is the remapping worthwhile?
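The arithmetic in that example can be checked by lifting the packing function into a small standalone program. This is only a sketch: KVM_MAX_VCPUS and MAX_SMT_THREADS are set to the values assumed in the example above rather than the kernel's real configuration, and the stride is passed in directly instead of being read from struct kvm.

/* Standalone sketch of the packing arithmetic from kvmppc_pack_vcpu_id();
 * the constants mirror the example above and are not the kernel's values. */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define KVM_MAX_VCPUS	16	/* assumed for illustration */
#define MAX_SMT_THREADS	8

static uint32_t pack_vcpu_id(uint32_t id, int stride)
{
	const int block_offsets[MAX_SMT_THREADS] = {0, 4, 2, 6, 1, 5, 3, 7};
	int block = (id / KVM_MAX_VCPUS) * (MAX_SMT_THREADS / stride);

	assert(block < MAX_SMT_THREADS);
	return (id % KVM_MAX_VCPUS) + block_offsets[block];
}

int main(void)
{
	/* The unpacked IDs from the example: 8 vcpus, stride 8, 2 vthreads/vcore. */
	const uint32_t ids[] = {0, 1, 8, 9, 16, 17, 24, 25};

	for (unsigned int i = 0; i < sizeof(ids) / sizeof(ids[0]); i++)
		printf("id %2u -> packed %2u\n",
		       (unsigned int)ids[i],
		       (unsigned int)pack_vcpu_id(ids[i], 8));
	/* Prints packed IDs 0, 1, 8, 9, 4, 5, 12, 13: everything lands below
	 * KVM_MAX_VCPUS, but 2, 3, 6 and 7 stay unused, as noted above. */
	return 0;
}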
> +
>  #endif /* __ASM_KVM_BOOK3S_H__ */
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 9cb9448163c4..49165cc90051 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -1762,7 +1762,7 @@ static int threads_per_vcore(struct kvm *kvm)
>  	return threads_per_subcore;
>  }
>  
> -static struct kvmppc_vcore *kvmppc_vcore_create(struct kvm *kvm, int core)
> +static struct kvmppc_vcore *kvmppc_vcore_create(struct kvm *kvm, int id)
>  {
>  	struct kvmppc_vcore *vcore;
>  
> @@ -1776,7 +1776,7 @@ static struct kvmppc_vcore *kvmppc_vcore_create(struct kvm *kvm, int core)
>  	init_swait_queue_head(&vcore->wq);
>  	vcore->preempt_tb = TB_NIL;
>  	vcore->lpcr = kvm->arch.lpcr;
> -	vcore->first_vcpuid = core * kvm->arch.smt_mode;
> +	vcore->first_vcpuid = id;
>  	vcore->kvm = kvm;
>  	INIT_LIST_HEAD(&vcore->preempt_list);
>  
> @@ -1992,12 +1992,18 @@ static struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm,
>  	mutex_lock(&kvm->lock);
>  	vcore = NULL;
>  	err = -EINVAL;
> -	core = id / kvm->arch.smt_mode;
> +	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> +		BUG_ON(kvm->arch.smt_mode != 1);
> +		core = kvmppc_pack_vcpu_id(kvm, id);
> +	} else {
> +		core = id / kvm->arch.smt_mode;
> +	}
>  	if (core < KVM_MAX_VCORES) {
>  		vcore = kvm->arch.vcores[core];
> +		BUG_ON(cpu_has_feature(CPU_FTR_ARCH_300) && vcore);
>  		if (!vcore) {
>  			err = -ENOMEM;
> -			vcore = kvmppc_vcore_create(kvm, core);
> +			vcore = kvmppc_vcore_create(kvm, id & ~(kvm->arch.smt_mode - 1));
>  			kvm->arch.vcores[core] = vcore;
>  			kvm->arch.online_vcores++;
>  		}
> diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
> index f9818d7d3381..681dfe12a5f3 100644
> --- a/arch/powerpc/kvm/book3s_xive.c
> +++ b/arch/powerpc/kvm/book3s_xive.c
> @@ -317,6 +317,11 @@ static int xive_select_target(struct kvm *kvm, u32 *server, u8 prio)
>  	return -EBUSY;
>  }
>  
> +static u32 xive_vp(struct kvmppc_xive *xive, u32 server)
> +{
> +	return xive->vp_base + kvmppc_pack_vcpu_id(xive->kvm, server);
> +}
> +

I'm finding the XIVE indexing really baffling.  There are a bunch of
other places where the code uses (xive->vp_base + NUMBER) directly.
If those are host side references, I guess they don't need updates
for this.  But if that's the case, then how does indexing into the
same array with both host and guest server numbers make sense?

>  static u8 xive_lock_and_mask(struct kvmppc_xive *xive,
>  			     struct kvmppc_xive_src_block *sb,
>  			     struct kvmppc_xive_irq_state *state)
> @@ -1084,7 +1089,7 @@ int kvmppc_xive_connect_vcpu(struct kvm_device *dev,
>  		pr_devel("Duplicate !\n");
>  		return -EEXIST;
>  	}
> -	if (cpu >= KVM_MAX_VCPUS) {
> +	if (cpu >= KVM_MAX_VCPU_ID) {
>  		pr_devel("Out of bounds !\n");
>  		return -EINVAL;
>  	}
> @@ -1098,7 +1103,7 @@ int kvmppc_xive_connect_vcpu(struct kvm_device *dev,
>  	xc->xive = xive;
>  	xc->vcpu = vcpu;
>  	xc->server_num = cpu;
> -	xc->vp_id = xive->vp_base + cpu;
> +	xc->vp_id = xive_vp(xive, cpu);
>  	xc->mfrr = 0xff;
>  	xc->valid = true;
>  

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
				http://www.ozlabs.org/~dgibson