kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Paolo Bonzini <pbonzini@redhat.com>
To: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	dmatlack@google.com, luto@kernel.org, peterhornyack@google.com,
	x86@kernel.org
Subject: Re: [PATCH 2/2] x86, kvm: use kvmclock to compute TSC deadline value
Date: Thu, 15 Sep 2016 23:02:43 +0200	[thread overview]
Message-ID: <c61d107c-5f08-af75-1183-b6bc5a4b7651@redhat.com> (raw)
In-Reply-To: <20160915195949.GA17095@potion>



On 15/09/2016 21:59, Radim Krčmář wrote:
> 2016-09-15 18:00+0200, Paolo Bonzini:
>> So virtual_tsc_khz does change on migration, at least if your host
>> doesn't have TSC scaling (which is pretty much all Intel hosts in
>> existence).
> 
> Ugh, I'd consider exposing constant TSC to the guest (which depends
> solely on CPUID -- modern models have it), allowing migration, and not
> preserving the TSC rate as a userspace bug.

Technically it is, but without hardware help I guess it's justified...

> Invariant TSC seems to prevent migration and the only thing that changed
> between constant and invariant TSC is that invariant TSC doesn't stop in
> guest-controlled deep C states.
> 
> Hm, and Linux uses TSC_DEADLINE_TIMER without invariant TSC: setting a
> timer and halting could mean that the timer never fires ...
> maybe real hardware always has both, so it is an implicit condition?

Yeah, I guess so.  And the kvmclock-based clockevent would not be able
to fix this.  So we also need to blacklist the TSC deadline timer if
kvmclock doesn't set PVCLOCK_TSC_STABLE_BIT.

>>              Safety against TSC rate changes is pretty much the reason
>> why kvmclock exists: it used to be that TSC rate changed on pCPU
>> migration, now it's only host migration but the idea is the same. :)
> 
> Yep.  I think that TSC shouldn't have been allowed outside of kvmclock.

Luckily TSC is only used by kvmclock (which is okay) and... the TSC
deadline timer.  Also by nested KVM's kvmclock, but that's the last of
our worries I guess.  So we're close.

>>> When we are already going the paravirtual route, we could add an
>>> interface that accepts the deadline in kvmclock nanoseconds.
>>> It would be much more maintanable than adding a fragile paravirtual
>>> layer on top of random interfaces.
>>
>> Good idea.
> 
> I'll prepare a prototype.

So how would this work?  A single MSR, used after setting TSC deadline
mode in LVTT?  Could you write it and read TSC deadline or vice versa?

My idea would be "yes" for writing nsec deadline and reading TSC
deadline, but "no" for writing TSC deadline and reading nsec deadline.
In the latter case, reading nsec deadline might return an impossible
value such as -1; this lets userspace decide whether to set a nsec-based
deadline or a TSC-based deadline after migration.  If you have other
ideas, I'm all ears!

>>             This still wouldn't handle old hosts of course.
> 
> The question is whether we want to carry around 150 LOC because of old
> hosts.  I'd just fix Linux to avoid deadline TSC without invariant TSC.
> :)

Yes, that would automatically blacklist it on KVM.  You'd also need to
update the recent optimization to the TSC deadline timer, to also work
on other APIC timer modes or at least in your new PV mode.

>>> Don't we completely replace the native apic timer, so it can't even be
>>> registered?  The comment doesn't explain what we expect from the call,
>>> so it's a bit confusing.
>>>
>>> I think the call expects that per_cpu(lapic_events, cpu).event_handler
>>> == NULL, so we do it to print the warning and disable the LAPIC timer.
>>
>> This pvop is always called instead of native_local_apic_timer_interrupt.
>>  We need to defer to the native implementation if the kvmclock
>> clocksource is not in use, for example if there's no TSC deadline timer
>> in the guest.
> 
> No, the pvop is for that.  If there is no TSC deadline timer, then the
> pvop will call native_local_apic_timer_interrupt(), because we don't
> overwrite it:
> 
>>>> +	if (boot_cpu_has(X86_FEATURE_TSC_DEADLINE_TIMER) &&
>>>> +	    !disable_apic && !disable_apic_timer) {
>>>> +		pv_time_ops.local_apic_timer_interrupt = kvmclock_local_apic_timer_interrupt;
>>>> +		x86_init.timers.setup_percpu_clockev = setup_kvmclock_timer;
>>>> +		x86_cpuinit.setup_percpu_clockev = setup_kvmclock_timer;
>>>> +	}
>>
>> So I should do s/hasn't been setup yet/isn't in use/.
> 
> Worse that no comment, IMO. ;)
> 
> I still think it is only to call this block in
> native_local_apic_timer_interrupt():
> 
>  	if (!evt->event_handler) {
>  		pr_warning("Spurious LAPIC timer interrupt on cpu %d\n", cpu);
>  		/* Switch it off */
>  		lapic_timer_shutdown(evt);
>  		return;
>  	}

No, it was needed for something else. :)  I just don't recall for what
anymore, since your observation on
pv_time_ops.local_apic_timer_interrupt is obviously correct.

Paolo

  reply	other threads:[~2016-09-15 21:02 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-06 22:29 [PATCH v2 0/2] if running under KVM, use kvmclock to compute TSC deadline value Paolo Bonzini
2016-09-06 22:29 ` [PATCH 1/2] x86: paravirt: add local_apic_timer_interrupt to pv_ops Paolo Bonzini
2016-09-07  6:25   ` kbuild test robot
2016-09-07  6:33   ` kbuild test robot
2016-09-06 22:29 ` [PATCH 2/2] x86, kvm: use kvmclock to compute TSC deadline value Paolo Bonzini
2016-09-08 22:13   ` David Matlack
2016-09-09 16:38     ` Paolo Bonzini
2016-09-09 20:05       ` David Matlack
2016-10-11  4:05       ` Wanpeng Li
2016-09-15 15:09   ` Radim Krčmář
2016-09-15 16:00     ` Paolo Bonzini
2016-09-15 19:59       ` Radim Krčmář
2016-09-15 21:02         ` Paolo Bonzini [this message]
2016-09-16 14:59           ` Radim Krčmář
2016-09-16 15:06             ` Paolo Bonzini
2016-09-16 15:24               ` Radim Krčmář

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c61d107c-5f08-af75-1183-b6bc5a4b7651@redhat.com \
    --to=pbonzini@redhat.com \
    --cc=dmatlack@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=peterhornyack@google.com \
    --cc=rkrcmar@redhat.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).