Re: [PATCHv2] kvm: arm64: Add SVE support for nVHE.

From: Dave Martin <Dave.Martin@arm.com>
To: Daniel Kiss <Daniel.Kiss@arm.com>
Cc: Marc Zyngier <maz@kernel.org>, Mark Brown <broonie@kernel.org>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>,
	"kvmarm@lists.cs.columbia.edu" <kvmarm@lists.cs.columbia.edu>
Subject: Re: [PATCHv2] kvm: arm64: Add SVE support for nVHE.
Date: Mon, 8 Feb 2021 14:46:31 +0000	[thread overview]
Message-ID: <20210208144629.GD21837@arm.com> (raw)
In-Reply-To: <2C9C9BA7-6872-4420-9EDB-BCCEBD96BF6C@arm.com>

On Fri, Feb 05, 2021 at 12:12:51AM +0000, Daniel Kiss wrote:
> 
> 
> > On 4 Feb 2021, at 18:36, Dave Martin <Dave.Martin@arm.com> wrote:
> > 
> > On Tue, Feb 02, 2021 at 07:52:54PM +0100, Daniel Kiss wrote:
> >> CPUs that support SVE are architecturally required to support the
> >> Virtualization Host Extensions (VHE), so far the kernel supported
> >> SVE alongside KVM with VHE enabled. In same cases it is desired to
> >> run nVHE config even when VHE is available.
> >> This patch add support for SVE for nVHE configuration too.
> >> 
> >> Tested on FVP with a Linux guest VM that run with a different VL than
> >> the host system.
> >> 
> >> Signed-off-by: Daniel Kiss <daniel.kiss@arm.com>

[...]

> >> diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
> >> index 3e081d556e81..8f29b468e989 100644
> >> --- a/arch/arm64/kvm/fpsimd.c
> >> +++ b/arch/arm64/kvm/fpsimd.c
> >> @@ -42,6 +42,16 @@ int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu)
> >> 	if (ret)
> >> 		goto error;
> >> 
> >> +	if (!has_vhe() && vcpu->arch.sve_state) {
> >> +		void *sve_state_end = vcpu->arch.sve_state +
> >> +					    SVE_SIG_REGS_SIZE(
> >> +						sve_vq_from_vl(vcpu->arch.sve_max_vl));
> >> +		ret = create_hyp_mappings(vcpu->arch.sve_state,
> >> +					  sve_state_end,
> >> +					  PAGE_HYP);
> >> +		if (ret)
> >> +			goto error;
> >> +	}
> >> 	vcpu->arch.host_thread_info = kern_hyp_va(ti);
> >> 	vcpu->arch.host_fpsimd_state = kern_hyp_va(fpsimd);
> >> error:
> >> @@ -109,10 +119,22 @@ void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu)
> >> 	local_irq_save(flags);
> >> 
> >> 	if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED) {
> >> +		if (guest_has_sve) {
> >> +			if (has_vhe())
> >> +				__vcpu_sys_reg(vcpu, ZCR_EL1) = read_sysreg_s(SYS_ZCR_EL12);
> >> +			else {
> >> +				__vcpu_sys_reg(vcpu, ZCR_EL1) = read_sysreg_s(SYS_ZCR_EL1);
> >> +				/*
> >> +				 * vcpu could set ZCR_EL1 to a shorter VL then the max VL but
> >> +				 * the context is still valid there. Save the whole context.
> >> +				 * In nVHE case we need to reset the ZCR_EL1 for that
> >> +				 * because the save will be done in EL1.
> >> +				 */
> >> +				write_sysreg_s(sve_vq_from_vl(vcpu->arch.sve_max_vl) - 1,
> >> +					       SYS_ZCR_EL1);
> > 
> > This still doesn't feel right.  We're already in EL1 here I think, in
> > which case ZCR_EL1 has an immediate effect on what state the
> > architecture guarantees to preserve: if we need to change ZCR_EL1, it's
> > because it might be wrong.  If it's wrong, it might be too small.  And
> > if it's too small, we may have already lost some SVE register bits that
> > the guest cares about.
> "On taking an exception from an Exception level that is more constrained
>  to a target Exception level that is less constrained, or on writing a larger value
>  to ZCR_ELx.LEN, then the previously inaccessible bits of these registers that 
>  become accessible have a value of either zero or the value they had before
>  executing at the more constrained size.”
> If the CPU zeros the register when ZCR is written or exception is taken my reading
>  of the above is that the register content maybe lost when we land in EL2.
> No code shall not count on the register content after reduces the VL in ZCR.
> 
> I see my comment also not clear enough.
> Maybe we shouldn’t save the guest’s sve_max_vl here, would enough to save up to
> the actual VL.

Maybe you're right, but I may be missing some information here.

Can you sketch out more explicitly how it works, showing how all the
bits the host cares about (and only those bits) are saved/restored for
the host, and how all the bits the guest cares about (and only those
bits) are saved/restored for the guest?

Various optimisations are possible, but there is a risk of breaking
assumptions elsewhere.  For example, the KVM_{SET,GET}_ONE_REG code
makes assmuptions about the layout of the data in
vcpu->arch.sve_state.

The architectural rules about when SVE register bits are also complex,
with many interactions.  We also don't want to aggressively optimise in
a way that might be hard to apply to nested virt.

My instinct is to keep it simple while this patch matures, and continue
to save/restore based on vcpu->arch.sve_max_vl.  This keeps a clear
split between the emulated "hardware" (which doesn't change while the
guest runs), and changeable runtime state (like the guest's ZCR_EL1).

I'm happy to review proposed optimisations, but I think those should be
separated out as later patches (or a separate series).  My experience
is that it's much easier to get this wrong than to get it right!

When it's wrong, it can be a nightmare to debug.

> > I think that we need to handle this on our way out of hyp instead,
> > _before_ returning back to EL1.
> > 
> > When at EL2 exiting back to the host: if and only if
> > KVM_ARM64_FP_ENABLED is set then save the guest's ZCR_EL1 and ZCR_EL1 to
> > match the guest's sve_max_vl (possibly by just cloning ZCR_EL2).
> > 
> >> +			}
> >> +		}
> >> 		fpsimd_save_and_flush_cpu_state();
> >> -
> >> -		if (guest_has_sve)
> >> -			__vcpu_sys_reg(vcpu, ZCR_EL1) = read_sysreg_s(SYS_ZCR_EL12);
> >> 	} else if (host_has_sve) {
> >> 		/*
> >> 		 * The FPSIMD/SVE state in the CPU has not been touched, and we

[...]

> >> diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
> >> index f3d0e9eca56c..df9e912d1278 100644
> >> --- a/arch/arm64/kvm/hyp/nvhe/switch.c
> >> +++ b/arch/arm64/kvm/hyp/nvhe/switch.c
> >> @@ -45,6 +45,18 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
> >> 	if (!update_fp_enabled(vcpu)) {
> >> 		val |= CPTR_EL2_TFP;
> >> 		__activate_traps_fpsimd32(vcpu);
> >> +	} else {
> >> +		if (vcpu_has_sve(vcpu)) {
> >> +			/*
> >> +			 * The register access will not be trapped so restore
> >> +			 * ZCR_EL1/ZCR_EL2 because those were set for the host.
> >> +			 */
> >> +			write_sysreg_s(__vcpu_sys_reg(vcpu, ZCR_EL1), SYS_ZCR_EL1);
> >> +			write_sysreg_s(
> >> +				sve_vq_from_vl(vcpu->arch.sve_max_vl) - 1,
> >> +				SYS_ZCR_EL2);
> >> +			val &= ~CPTR_EL2_TZ;
> >> +		}
> >> 	}
> >> 
> >> 	write_sysreg(val, cptr_el2);
> >> @@ -110,6 +122,17 @@ static void __load_host_stage2(void)
> >> 	write_sysreg(0, vttbr_el2);
> >> }
> >> 
> >> +static void __restore_host_sve_state(struct kvm_vcpu *vcpu)
> >> +{
> >> +	/*
> >> +	 * If the guest uses SVE, the ZCR_EL2 was configured for the guest.
> >> +	 * Host might save the context in EL1 but for that the ZCR_EL2 need
> >> +	 * to be reset to the host's default.
> > 
> > This isn't just about saving the guest context correctly.  The host have
> > be using larger vectors than the guest's sve_max_vl allows.
> Right.

Can you clarify the comment please (unless you've done so already)?

> > 
> >> +	 */
> >> +	if (vcpu_has_sve(vcpu) && (vcpu->arch.flags |= KVM_ARM64_FP_ENABLED))
> >> +		write_sysreg_s(ZCR_ELx_LEN_MASK, SYS_ZCR_EL2);
> > 
> > I'm not sure if it's worth having a special #define for this, but it
> > would be a good idea at least to put comments here and in el2_setup.h to
> > remind people that the ZCR_EL2 settings need to match.  Otherwise this
> > might get mis-maintained in the future.
> I will add a comment to el2_setup.h

Can you put comments in both places please?  Maintainers of either bit
of code need to be aware of the other code.

[...]

> >> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> >> index 47f3f035f3ea..17cc5e87adcd 100644
> >> --- a/arch/arm64/kvm/reset.c
> >> +++ b/arch/arm64/kvm/reset.c
> >> @@ -74,10 +74,6 @@ static int kvm_vcpu_enable_sve(struct kvm_vcpu *vcpu)
> >> 	if (!system_supports_sve())
> >> 		return -EINVAL;
> >> 
> >> -	/* Verify that KVM startup enforced this when SVE was detected: */
> >> -	if (WARN_ON(!has_vhe()))
> >> -		return -EINVAL;
> >> -
> >> 	vcpu->arch.sve_max_vl = kvm_sve_max_vl;
> >> 
> >> 	/*
> >> @@ -113,7 +109,7 @@ static int kvm_vcpu_finalize_sve(struct kvm_vcpu *vcpu)
> >> 	buf = kzalloc(SVE_SIG_REGS_SIZE(sve_vq_from_vl(vl)), GFP_KERNEL);
> >> 	if (!buf)
> >> 		return -ENOMEM;
> >> -
> >> +	__vcpu_sys_reg(vcpu, ZCR_EL1) = sve_vq_from_vl(vcpu->arch.sve_max_vl) - 1;
> > 
> > What's this for?
> > 
> > __vcpu_sys_reg(vcpu, ZCR_EL1) should already be being reset sensibly
> > somewhere.  If not, that would be a bug…
> It is not needed indeed. 

Ah, OK.

Cheers
---Dave
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm