Date: Fri, 5 Apr 2019 13:54:13 +0100
From: Dave Martin
To: Andrew Jones
Subject: Re: [PATCH v7 23/27] KVM: arm64/sve: Add pseudo-register for the guest's vector lengths
Message-ID: <20190405125413.GA3567@e103592.cambridge.arm.com>
References: <1553864452-15080-1-git-send-email-Dave.Martin@arm.com>
 <1553864452-15080-24-git-send-email-Dave.Martin@arm.com>
 <20190404201854.7dzga3ctjj7427hk@kamzik.brq.redhat.com>
 <20190405093603.GM3567@e103592.cambridge.arm.com>
 <20190405101451.xtx7ggyomvrbrpxl@kamzik.brq.redhat.com>
In-Reply-To: <20190405101451.xtx7ggyomvrbrpxl@kamzik.brq.redhat.com>
Cc: Okamoto Takayuki, Christoffer Dall, Ard Biesheuvel, Marc Zyngier,
 Catalin Marinas, Will Deacon, Zhang Lei, Julien Grall,
 kvmarm@lists.cs.columbia.edu, linux-arm-kernel@lists.infradead.org

On Fri, Apr 05, 2019 at 12:14:51PM +0200, Andrew Jones wrote:
> On Fri, Apr 05, 2019 at 10:36:03AM +0100, Dave Martin wrote:
> > On Thu, Apr 04, 2019 at 10:18:54PM +0200, Andrew Jones wrote:
> > > On Fri, Mar 29, 2019 at 01:00:48PM +0000, Dave Martin wrote:
> > > > This patch adds a new pseudo-register KVM_REG_ARM64_SVE_VLS to
> > > > allow userspace to set and query the set of vector lengths visible
> > > > to the guest.
> > > >
> > > > In the future, multiple register slices per SVE register may be
> > > > visible through the ioctl interface.  Once the set of slices has
> > > > been determined we would not be able to allow the vector length set
> > > > to be changed any more, in order to avoid userspace seeing
> > > > inconsistent sets of registers.  For this reason, this patch adds
> > > > support for explicit finalization of the SVE configuration via the
> > > > KVM_ARM_VCPU_FINALIZE ioctl.
> > > >
> > > > Finalization is the proper place to allocate the SVE register state
> > > > storage in vcpu->arch.sve_state, so this patch adds that as
> > > > appropriate.  The data is freed via kvm_arch_vcpu_uninit(), which
> > > > was previously a no-op on arm64.
> > > >
> > > > To simplify the logic for determining what vector lengths can be
> > > > supported, some code is added to KVM init to work this out, in the
> > > > kvm_arm_init_arch_resources() hook.
> > > >
> > > > The KVM_REG_ARM64_SVE_VLS pseudo-register is not exposed yet.
> > > > Subsequent patches will allow SVE to be turned on for guest vcpus,
> > > > making it visible.
> > > >
> > > > Signed-off-by: Dave Martin
> > > > Reviewed-by: Julien Thierry
> > > > Tested-by: zhang.lei
> > > >
> > > > ---
> >
> > [...]
> >
> > > > diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> > > > index 2aa80a5..086ab05 100644
> > > > --- a/arch/arm64/kvm/guest.c
> > > > +++ b/arch/arm64/kvm/guest.c
> > > > @@ -206,6 +206,73 @@ static int set_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > > >  	return err;
> > > >  }
> > > >
> > > > +#define vq_word(vq) (((vq) - SVE_VQ_MIN) / 64)
> > > > +#define vq_mask(vq) ((u64)1 << ((vq) - SVE_VQ_MIN) % 64)
> > > > +
> > > > +static bool vq_present(
> > > > +	const u64 (*const vqs)[DIV_ROUND_UP(SVE_VQ_MAX - SVE_VQ_MIN + 1, 64)],
> > > > +	unsigned int vq)
> > > > +{
> > > > +	return (*vqs)[vq_word(vq)] & vq_mask(vq);
> > > > +}
> > > > +
> > > > +static int get_sve_vls(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > > > +{
> > > > +	unsigned int max_vq, vq;
> > > > +	u64 vqs[DIV_ROUND_UP(SVE_VQ_MAX - SVE_VQ_MIN + 1, 64)];
> > > > +
> > > > +	if (WARN_ON(!sve_vl_valid(vcpu->arch.sve_max_vl)))
> > > > +		return -EINVAL;
> > > > +
> > > > +	memset(vqs, 0, sizeof(vqs));
> > > > +
> > > > +	max_vq = sve_vq_from_vl(vcpu->arch.sve_max_vl);
> > > > +	for (vq = SVE_VQ_MIN; vq <= max_vq; ++vq)
> > > > +		if (sve_vq_available(vq))
> > > > +			vqs[vq_word(vq)] |= vq_mask(vq);
> > > > +
> > > > +	if (copy_to_user((void __user *)reg->addr, vqs, sizeof(vqs)))
> > > > +		return -EFAULT;
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +
> > > > +static int set_sve_vls(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > > > +{
> > > > +	unsigned int max_vq, vq;
> > > > +	u64 vqs[DIV_ROUND_UP(SVE_VQ_MAX - SVE_VQ_MIN + 1, 64)];
> > >
> > > There are three of these long 'DIV_ROUND_UP(SVE_VQ_MAX - SVE_VQ_MIN + 1, 64)'.
> > > A macro is probably warranted.
> >
> > Fair point.  These are a bit cumbersome.
> > How about:
> >
> > #define SVE_VLS_WORDS DIV_ROUND_UP(SVE_VQ_MAX - SVE_VQ_MIN + 1, 64)
> >
> > Annoyingly, this is virtually identical to a Linux bitmap: the base type
> > is u64 instead of unsigned long; otherwise there's no intentional
> > difference.
>
> Yeah, I noticed it was a reinvented bitmap, but I liked that it was,
> because, for the userspace api, removing the bitmap abstractions
> ensures we know what is what and where exactly it is.

Agreed.

I thought of using the bitmap API inside the kernel, but even if this is
arm64-specific code, it made me uneasy: the compiler doesn't necessarily
think that u64 and unsigned long are the same type even if they're both
64-bit, so there's the potential for weird optimisations to happen.

> > (Aside: do you know why the KVM API insists on everything being u64?
> > This makes sense for non-native types (like guest registers), but not
> > for native things like host userspace addresses etc.)
>
> Would that work when the kernel is 64-bit and the userspace is 32?  I
> personally like the interfaces being explicit with type size, as it
> removes the brain burden of translating long and int when moving arch
> to arch and kernel to userspace.

Ah yes, the joys of compat.  And ioctl.  Ignore me.

> > > > +
> > > > +	if (kvm_arm_vcpu_sve_finalized(vcpu))
> > > > +		return -EPERM; /* too late! */
> > > > +
> > > > +	if (WARN_ON(vcpu->arch.sve_state))
> > > > +		return -EINVAL;
> > > > +
> > > > +	if (copy_from_user(vqs, (const void __user *)reg->addr, sizeof(vqs)))
> > > > +		return -EFAULT;
> > > > +
> > > > +	max_vq = 0;
> > > > +	for (vq = SVE_VQ_MIN; vq <= SVE_VQ_MAX; ++vq)
> > > > +		if (vq_present(&vqs, vq))
> > > > +			max_vq = vq;
> > > > +
> > > > +	if (max_vq > sve_vq_from_vl(kvm_sve_max_vl))
> > > > +		return -EINVAL;
> > > > +
> > > > +	for (vq = SVE_VQ_MIN; vq <= max_vq; ++vq)
> > > > +		if (vq_present(&vqs, vq) != sve_vq_available(vq))
> > > > +			return -EINVAL;
> > >
> > > What about supporting the optional non-power-of-2 vector lengths?  For
> > > example, if the maximum VL is 512, then in addition to 512, 128, and
> > > 256 there could be a 384.  If we plan to start a guest on a machine
> > > that has all four and then migrate it to a machine that only has
> > > 128,256,512 we would want to set the VL set to 128,256,512, even on
> > > the machine that supports 384.  If I understand the above test correctly,
> > > then that wouldn't work.
> >
> > This is exactly what the code is trying to forbid, though it may not be
> > obvious here why.
> >
> > The trouble is, we can't correctly emulate a vcpu supporting
> > {128,256,512} on hardware that also supports 384-bit vectors: the
> > architecture says that the vector length you get is determined by
> > rounding ZCR_EL1.LEN down to the next vector length actually
> > supported.
> >
> > So, the guest would expect writing ZCR_EL1.LEN = 2 to give a vector
> > length of 256 bits, whereas on this hardware the actual resulting
> > vector length would be 384 bits.
>
> Oh, I see.
>
> > sve_probe_vqs() relies on this property to detect the set of available
> > vector lengths.  Simple, bare-metal guests might also just set
> > ZCR_ELx.LEN = 0x1ff to just get the max available.
> >
> > The architecture says categorically that the set of vector lengths
> > supported is a fixed property of the (emulated) hardware -- we can't
> > have this changing under the guest's feet.
> >
> > Fixing this would require an architectural way to filter out individual
> > vector lengths from the supported set, not just a way to clamp the
> > maximum (which is all ZCR_EL2.LEN gives us).
> >
> > The general expectation is that sanely designed clusters will be
> > "homogeneous enough" and won't trigger this check -- it's here
> > just in case.
>
> Data centers evolve as they expand and hardware is replaced.  I wouldn't
> expect them to stay homogeneous for long, not without keeping them that
> way at substantial cost.  This isn't the first thing that is forcing Arm
> virt data centers to be homogeneous (we still use qemu's '-cpu host'),
> but it's disappointing to have yet another one.

This may or may not be a problem in practice, since I suspect
implementations that support non-power-of-two vector lengths may be in
the minority anyway.  The ARM Fast Model and probably QEMU can do it,
but people are less likely to try to build a high-performance cluster
out of those...

> > I attempt to capture this in api.txt, leaving the door open in case the
> > architecture gives a way to support this in future:
> >
> > | KVM_REG_ARM64_SVE_VLS
> > |
> > | [...]
> > |
> > | Apart from simply removing all vector lengths from the host set
> > | that exceed some value, support for arbitrarily chosen sets of
> > | vector lengths is hardware-dependent and may not be available.
> > | Attempting to configure an invalid set of vector lengths via
> > | KVM_SET_ONE_REG will fail with EINVAL.
> >
> > Is this enough, or do you think more explanation is needed somewhere?
>
> I saw that before, so I guess more is needed :) What you wrote above about
> how we can only shorten the list due to how zcr_el1.len works was the
> missing piece for me.
> Maybe that should be integrated into the text
> somehow.

If you think the above is enough for ABI documentation purposes, I will
aim to drop the following comment into set_sve_vls():

	/*
	 * Vector lengths supported by the host can't currently be
	 * hidden from the guest individually: instead we can only set a
	 * maximum via ZCR_EL2.LEN.  So, make sure the available vector
	 * lengths match the set requested exactly up to the requested
	 * maximum:
	 */
	for (vq = SVE_VQ_MIN; vq <= max_vq; ++vq)
		if (vq_present(&vqs, vq) != sve_vq_available(vq))
			return -EINVAL;

Do you think that's enough?

Cheers
---Dave

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel