public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: paul@xen.org
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	David Woodhouse <dwmw2@infradead.org>,
	kvm@vger.kernel.org,  linux-kernel@vger.kernel.org,
	 syzbot+352e553a86e0d75f5120@syzkaller.appspotmail.com,
	 Paul Durrant <pdurrant@amazon.com>,
	David Woodhouse <dwmw@amazon.co.uk>,
	 Vitaly Kuznetsov <vkuznets@redhat.com>
Subject: Re: [PATCH 06/10] KVM: x86/xen: Use guest's copy of pvclock when starting timer
Date: Tue, 21 Jan 2025 10:45:46 -0800	[thread overview]
Message-ID: <Z4_rWp97tzzZy0Po@google.com> (raw)
In-Reply-To: <06482c0e-e519-47ca-9f70-da3ab12ed2e4@xen.org>

On Tue, Jan 21, 2025, Paul Durrant wrote:
> On 18/01/2025 00:55, Sean Christopherson wrote:
> > Use the guest's copy of its pvclock when starting a Xen timer, as KVM's
> > reference copy may not be up-to-date, i.e. may yield a false positive of
> > sorts.  In the unlikely scenario that the guest is starting a Xen timer
> > and has used a Xen pvclock in the past, but has since but turned it "off",
> > then vcpu->arch.hv_clock may be stale, as KVM's reference copy is updated
> > if and only if at least pvclock is enabled.
> > 
> > Furthermore, vcpu->arch.hv_clock is currently used by three different
> > pvclocks: kvmclock, Xen, and Xen compat.  While it's extremely unlikely a
> > guest would ever enable multiple pvclocks, effectively sharing KVM's
> > reference clock could yield very weird behavior.  Using the guest's active
> > Xen pvclock instead of KVM's reference will allow dropping KVM's
> > reference copy.
> > 
> > Fixes: 451a707813ae ("KVM: x86/xen: improve accuracy of Xen timers")
> > Cc: Paul Durrant <pdurrant@amazon.com>
> > Cc: David Woodhouse <dwmw@amazon.co.uk>
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >   arch/x86/kvm/xen.c | 58 ++++++++++++++++++++++++++++++++++++++++++----
> >   1 file changed, 53 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
> > index a909b817b9c0..b82c28223585 100644
> > --- a/arch/x86/kvm/xen.c
> > +++ b/arch/x86/kvm/xen.c
> > @@ -150,11 +150,46 @@ static enum hrtimer_restart xen_timer_callback(struct hrtimer *timer)
> >   	return HRTIMER_NORESTART;
> >   }
> > +static int xen_get_guest_pvclock(struct kvm_vcpu *vcpu,
> > +				 struct pvclock_vcpu_time_info *hv_clock,
> > +				 struct gfn_to_pfn_cache *gpc,
> > +				 unsigned int offset)
> > +{
> > +	struct pvclock_vcpu_time_info *guest_hv_clock;
> > +	unsigned long flags;
> > +	int r;
> > +
> > +	read_lock_irqsave(&gpc->lock, flags);
> > +	while (!kvm_gpc_check(gpc, offset + sizeof(*guest_hv_clock))) {
> > +		read_unlock_irqrestore(&gpc->lock, flags);
> > +
> > +		r = kvm_gpc_refresh(gpc, offset + sizeof(*guest_hv_clock));
> > +		if (r)
> > +			return r;
> > +
> > +		read_lock_irqsave(&gpc->lock, flags);
> > +	}
> > +
> 
> I guess I must be missing something subtle... What is setting guest_hv_clock
> to point at something meaningful before this line?

Nope, you're not missing anything, this code is completely broken.  As pointed
out by the kernel test bot, the caller is also busted, because the "xen" pointer
is never initialied.

	struct kvm_vcpu_xen *xen;

	...

	do {
		...

		if (xen->vcpu_info_cache.active)
			r = xen_get_guest_pvclock(vcpu, &hv_clock, &xen->vcpu_info_cache,
						offsetof(struct compat_vcpu_info, time));
		else if (xen->vcpu_time_info_cache.active)
			r = xen_get_guest_pvclock(vcpu, &hv_clock, &xen->vcpu_time_info_cache, 0);
		if (r)
			break;
	}


I suspect the selftest passes because the @gpc passed to xen_get_guest_pvclock()
is garbage, which likely results in kvm_gpc_refresh() failing, and so KVM falls
backs to the less precise method:

	if (r) {
		/*
		 * Without CONSTANT_TSC, get_kvmclock_ns() is the only option.
		 *
		 * Also if the guest PV clock hasn't been set up yet, as is
		 * likely to be the case during migration when the vCPU has
		 * not been run yet. It would be possible to calculate the
		 * scaling factors properly in that case but there's not much
		 * point in doing so. The get_kvmclock_ns() drift accumulates
		 * over time, so it's OK to use it at startup. Besides, on
		 * migration there's going to be a little bit of skew in the
		 * precise moment at which timers fire anyway. Often they'll
		 * be in the "past" by the time the VM is running again after
		 * migration.
		 */
		guest_now = get_kvmclock_ns(vcpu->kvm);
		kernel_now = ktime_get();
	}

Ugh.  And the reason my build tests didn't catch this is because the only config
I test with KVM_XEN=y also has KASAN=y, which is incompatible with KVM_ERROR=y
(unless the global WERROR=y is enabled).

Time to punt KASAN=y to it's own Kconfig I guess...

I'll verify the happy path is actually being tested before posting v2.

  reply	other threads:[~2025-01-21 18:45 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-18  0:55 [PATCH 00/10] KVM: x86: pvclock fixes and cleanups Sean Christopherson
2025-01-18  0:55 ` [PATCH 01/10] KVM: x86: Don't take kvm->lock when iterating over vCPUs in suspend notifier Sean Christopherson
2025-01-21 16:01   ` Paul Durrant
2025-01-18  0:55 ` [PATCH 02/10] KVM: x86: Eliminate "handling" of impossible errors during SUSPEND Sean Christopherson
2025-01-21 16:03   ` Paul Durrant
2025-01-18  0:55 ` [PATCH 03/10] KVM: x86: Drop local pvclock_flags variable in kvm_guest_time_update() Sean Christopherson
2025-01-21 16:05   ` Paul Durrant
2025-01-18  0:55 ` [PATCH 04/10] KVM: x86: Set PVCLOCK_GUEST_STOPPED only for kvmclock, not for Xen PV clock Sean Christopherson
2025-01-21 16:42   ` Paul Durrant
2025-01-21 17:09     ` Sean Christopherson
2025-01-21 17:15       ` Paul Durrant
2025-01-21 18:32         ` Sean Christopherson
2025-01-18  0:55 ` [PATCH 05/10] KVM: x86: Don't bleed PVCLOCK_GUEST_STOPPED across PV clocks Sean Christopherson
2025-01-21 16:54   ` Paul Durrant
2025-01-21 17:11     ` Sean Christopherson
2025-01-18  0:55 ` [PATCH 06/10] KVM: x86/xen: Use guest's copy of pvclock when starting timer Sean Christopherson
2025-01-21 16:58   ` Paul Durrant
2025-01-21 18:45     ` Sean Christopherson [this message]
2025-01-18  0:55 ` [PATCH 07/10] KVM: x86: Pass reference pvclock as a param to kvm_setup_guest_pvclock() Sean Christopherson
2025-01-21 17:00   ` Paul Durrant
2025-01-18  0:55 ` [PATCH 08/10] KVM: x86: Remove per-vCPU "cache" of its reference pvclock Sean Christopherson
2025-01-21 17:03   ` Paul Durrant
2025-01-18  0:55 ` [PATCH 09/10] KVM: x86: Setup Hyper-V TSC page before Xen PV clocks (during clock update) Sean Christopherson
2025-01-20 14:49   ` Vitaly Kuznetsov
2025-01-21 15:44     ` Sean Christopherson
2025-01-21 15:59       ` Paul Durrant
2025-01-21 17:16         ` David Woodhouse
2025-01-21 17:30           ` Paul Durrant
2025-01-18  0:55 ` [PATCH 10/10] KVM: x86: Override TSC_STABLE flag for Xen PV clocks in kvm_guest_time_update() Sean Christopherson
2025-01-21 17:05   ` Paul Durrant

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z4_rWp97tzzZy0Po@google.com \
    --to=seanjc@google.com \
    --cc=dwmw2@infradead.org \
    --cc=dwmw@amazon.co.uk \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paul@xen.org \
    --cc=pbonzini@redhat.com \
    --cc=pdurrant@amazon.com \
    --cc=syzbot+352e553a86e0d75f5120@syzkaller.appspotmail.com \
    --cc=vkuznets@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox