Re: [PATCH 5/5] KVM: arm64: uapi: Add kvm_debug_exit_arch.hsr_high

From: Marc Zyngier <maz@kernel.org>
To: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: catalin.marinas@arm.com, will@kernel.org,
	kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH 5/5] KVM: arm64: uapi: Add kvm_debug_exit_arch.hsr_high
Date: Tue, 12 Apr 2022 12:30:47 +0100	[thread overview]
Message-ID: <87bkx6bkt4.wl-maz@kernel.org> (raw)
In-Reply-To: <YlQGh6Ky2oqrS89y@e121798.cambridge.arm.com>

On Mon, 11 Apr 2022 11:46:09 +0100,
Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> 
> Hi Marc,
> 
> On Fri, Apr 08, 2022 at 08:47:00AM +0100, Marc Zyngier wrote:
> > Hi Alex,
> > 
> > On Thu, 07 Apr 2022 17:23:27 +0100,
> > Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> > > 
> > > When userspace is debugging a VM, the kvm_debug_exit_arch part of the
> > > kvm_run struct contains arm64 specific debug information: the ESR_EL2
> > > value, encoded in the field "hsr", and the address of the instruction
> > > that caused the exception, encoded in the field "far".
> > > 
> > > Linux has moved to treating ESR_EL2 as a 64-bit register, but unfortunately
> > > kvm_debug_exit_arch.hsr cannot be changed to match because that would
> > > change the memory layout of the struct on big endian machines:
> > > 
> > > Current layout:			| Layout with "hsr" extended to 64 bits:
> > > 				|
> > > offset 0: ESR_EL2[31:0] (hsr)   | offset 0: ESR_EL2[61:32] (hsr[61:32])
> > > offset 4: padding		| offset 4: ESR_EL2[31:0]  (hsr[31:0])
> > > offset 8: FAR_EL2[61:0] (far)	| offset 8: FAR_EL2[61:0]  (far)
> > > 
> > > which breaks existing code.
> > > 
> > > The padding is inserted by the compiler because the "far" field must be
> > > aligned to 8 bytes (each field must be naturally aligned - aapcs64 [1],
> > > page 18) and the struct itself must be aligned to 8 bytes (the struct must
> > > be aligned to the maximum alignment of its fields - aapcs64, page 18),
> > > which means that "hsr" must be aligned to 8 bytes as it is the first field
> > > in the struct.
> > > 
> > > To avoid changing the struct size and layout for the existing fields, add a
> > > new field, "hsr_high", which replaces the existing padding. "hsr_high" will
> > > be used to hold the ESR_EL2[61:32] bits of the register. The memory layout,
> > > both on big and little endian machine, becomes:
> > > 
> > > Layout with "hsr_high" added:
> > > 
> > > offset 0: ESR_EL2[31:0]  (hsr)
> > > offset 4: ESR_EL2[61:32] (hsr_high)
> > > offset 8: FAR_EL2[61:0]  (far)
> > 
> > My concern with this change is that it isn't clear what the padding is
> > currently initialised to, and I don't think there is any guarantee
> > that it is zeroed. With that, a new userspace on an old kernel would
> > interpret hsr_high, and potentially observe stuff that wasn't supposed
> > to be interpreted.
> 
> You are right, I didn't think about this scenario. Did some digging, and
> C99 explicitely states that padding is uninitialized (September 7, 2007,
> draft, page 38), so I assume that it's the same for C89 (couldn't find a
> free source for the standard).
> 
> > 
> > That's yet another mistake in our userspace ABI (where is the time
> > machine when you need it?).
> 
> To avoid this sort of thing happening in the future, KVM/arm64 could
> mandate that all structs that are part of the userspace API have the
> padding explicitly declared as a struct field and that padding must always
> be set to zero by userspace before a syscall.

That's what we were aiming for for most structures, but clearly missed
some. Debug is obviously an unloved part of the architecture.

> KVM would then check that the
> padding is zero and return -EINVAL if userspace didn't clear it correctly,
> to enforce this convention. I think that should be forward compatible with
> repurposing the padding to add another field, unless 0 has a special
> meaning for the field that is added, which I believe to be highly unlikely.
> 
> > 
> > In order to do this, we must advertise to userspace that we provide
> > more information. This probably means adding a flag of some sort to
> > kvm_run (there are at least 128 bits of x86 stuff that can be readily
> > reclaimed).
> 
> We could add a flag to kvm_run.flags that is set only when
> kvm_run.exit_reason == KVM_EXIT_DEBUG, something like
> KVM_DEBUG_ARM_HSR_HIGH_SET (or PRESENT). flags has 16 bits which are unused
> today, so I don't think it's too costly to use one bit for this.

That'd be a sensible approach. We use any for KVM/arm64 yet, and this
is all arch-specific anyway.

> 
> One other option would be to wait to expose the upper 32 bits until KVM
> supports a hardware feature that makes use of those bits. That means
> advertising the feature to userspace, which KVM might or might not want to
> do. And KVM today doesn't do any sanitisation/masking on the hsr value that
> is reported to userspace, and tying hsr_high to a particular feature might
> mean that KVM will have to sanitise the upper bits if said feature is
> opt-in by userspace.
> 
> My preference is for the first approach because the second approach looks
> more complicated.

+1.

> There's always the option to wait until KVM makes use of the upper 32 bits
> and decide then, when we have more information.

The sooner, the better. Updating userspace takes *years*, so let's do
it ASAP.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm