Re: [PATCH 2/4] KVM: x86/xen: Compatibility fixes for shared runstate area

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Sean Christopherson <seanjc@google.com>
To: David Woodhouse <dwmw2@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>, kvm@vger.kernel.org, mhal@rbox.co
Subject: Re: [PATCH 2/4] KVM: x86/xen: Compatibility fixes for shared runstate area
Date: Tue, 22 Nov 2022 18:39:16 +0000	[thread overview]
Message-ID: <Y30XVDXmkAIlRX4N@google.com> (raw)
In-Reply-To: <20221119094659.11868-2-dwmw2@infradead.org>

On Sat, Nov 19, 2022, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> The guest runstate area can be arbitrarily byte-aligned. In fact, even
> when a sane 32-bit guest aligns the overall structure nicely, the 64-bit
> fields in the structure end up being unaligned due to the fact that the
> 32-bit ABI only aligns them to 32 bits.
> 
> So setting the ->state_entry_time field to something|XEN_RUNSTATE_UPDATE
> is buggy, because if it's unaligned then we can't update the whole field
> atomically; the low bytes might be observable before the _UPDATE bit is.
> Xen actually updates the *byte* containing that top bit, on its own. KVM
> should do the same.

I think we're using the wrong APIs to update the runstate.  The VMCS/VMCB pages
_need_ the host pfn, i.e. need to use a gpc (eventually).  The Xen PV stuff on the
other hand most definitely doesn't need to know the pfn.

The event channel code would be difficult to convert due to lack of uaccess
primitives, but I don't see anything in the runstate code that prevents KVM from
using a gfn_to_hva_cache.  That will naturally handle page splits by sending them
down a slow path and would yield far simpler code.

If taking the slow path is an issue, then the guest really should be fixed to not
split pages.  And if that's not an acceptable answer, the gfn_to_hva_cache code
could be updated to use the fast path if the region is contiguous in the host
virtual address space.

> +static void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, bool atomic)
>  {
>  	struct kvm_vcpu_xen *vx = &v->arch.xen;
> -	u64 now = get_kvmclock_ns(v->kvm);
> -	u64 delta_ns = now - vx->runstate_entry_time;
> -	u64 run_delay = current->sched_info.run_delay;
> +	struct gfn_to_pfn_cache *gpc1 = &vx->runstate_cache;
> +	struct gfn_to_pfn_cache *gpc2 = &vx->runstate2_cache;
> +	size_t user_len, user_len1, user_len2;
> +	struct vcpu_runstate_info rs;
> +	int *rs_state = &rs.state;
> +	unsigned long flags;
> +	size_t times_ofs;
> +	u8 *update_bit;
>  
> -	if (unlikely(!vx->runstate_entry_time))
> -		vx->current_runstate = RUNSTATE_offline;
> +	/*
> +	 * The only difference between 32-bit and 64-bit versions of the
> +	 * runstate struct us the alignment of uint64_t in 32-bit, which

s/us/is

> +	 * means that the 64-bit version has an additional 4 bytes of
> +	 * padding after the first field 'state'.
> +	 */
> +	BUILD_BUG_ON(offsetof(struct vcpu_runstate_info, state) != 0);
> +	BUILD_BUG_ON(offsetof(struct compat_vcpu_runstate_info, state) != 0);
> +	BUILD_BUG_ON(sizeof(struct compat_vcpu_runstate_info) != 0x2c);
> +#ifdef CONFIG_X86_64
> +	BUILD_BUG_ON(offsetof(struct vcpu_runstate_info, state_entry_time) !=
> +		     offsetof(struct compat_vcpu_runstate_info, state_entry_time) + 4);
> +	BUILD_BUG_ON(offsetof(struct vcpu_runstate_info, time) !=
> +		     offsetof(struct compat_vcpu_runstate_info, time) + 4);
> +#endif
> +
> +	if (IS_ENABLED(CONFIG_64BIT) && v->kvm->arch.xen.long_mode) {
> +		user_len = sizeof(struct vcpu_runstate_info);
> +		times_ofs = offsetof(struct vcpu_runstate_info,
> +				     state_entry_time);
> +	} else {
> +		user_len = sizeof(struct compat_vcpu_runstate_info);
> +		times_ofs = offsetof(struct compat_vcpu_runstate_info,
> +				     state_entry_time);
> +		rs_state++;

...

> +	*rs_state = vx->current_runstate;
> +#ifdef CONFIG_X86_64
> +	/* Don't leak kernel memory through the padding in the 64-bit struct */
> +	if (rs_state == &rs.state)
> +		rs_state[1] = 0;

Oof, that's difficult to follow.  Rather than pointer magic, what about zeroing
the first word unconditionally?  Likely faster than a CMP+CMOV or whatever gets
generated.  Or just zero the entire struct.

	/* Blah blah blah */
	*(unsigned long *)&rs.state = 0;

> +#endif

next prev parent reply	other threads:[~2022-11-22 18:39 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-19  9:46 [PATCH 1/4] MAINTAINERS: Add KVM x86/xen maintainer list David Woodhouse
2022-11-19  9:46 ` [PATCH 2/4] KVM: x86/xen: Compatibility fixes for shared runstate area David Woodhouse
2022-11-19 11:38   ` Durrant, Paul
2022-11-19 11:46     ` David Woodhouse
2022-11-22 18:39   ` Sean Christopherson [this message]
2022-11-22 23:04     ` David Woodhouse
2022-11-23 17:17       ` Sean Christopherson
2022-11-23 17:58         ` David Woodhouse
2022-11-23 18:16           ` Sean Christopherson
2022-11-23 18:48             ` David Woodhouse
2022-11-22 23:46   ` Michal Luczaj
2022-11-23  0:03     ` David Woodhouse
2022-11-23 18:21   ` Sean Christopherson
2022-11-23 19:22     ` David Woodhouse
2022-11-23 19:32       ` Sean Christopherson
2022-11-23 20:07         ` David Woodhouse
2022-11-23 20:26           ` Sean Christopherson
2022-11-19  9:46 ` [PATCH 3/4] KVM: Update gfn_to_pfn_cache khva when it moves within the same page David Woodhouse
2022-11-19 11:40   ` Durrant, Paul
2022-11-22 16:23     ` Sean Christopherson
2022-11-22 16:49       ` Paul Durrant
2022-11-22 17:02         ` David Woodhouse
2022-11-22 19:09           ` Sean Christopherson
2022-11-22 23:13             ` David Woodhouse
2022-11-24  0:31           ` Paolo Bonzini
2022-11-24  0:45             ` David Woodhouse
2022-11-24  0:54             ` David Woodhouse
2022-11-24  1:11               ` Paolo Bonzini
2022-11-24 10:41                 ` David Woodhouse
2022-11-24 11:30                   ` David Woodhouse
2022-11-19  9:46 ` [PATCH 4/4] KVM: x86/xen: Add runstate tests for 32-bit mode and crossing page boundary David Woodhouse
2022-11-19 11:42   ` Durrant, Paul

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y30XVDXmkAIlRX4N@google.com \
    --to=seanjc@google.com \
    --cc=dwmw2@infradead.org \
    --cc=kvm@vger.kernel.org \
    --cc=mhal@rbox.co \
    --cc=pbonzini@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.