From mboxrd@z Thu Jan 1 00:00:00 1970
From: Radim Krčmář
Subject: Re: [PATCH 2/2] x86, kvm: use kvmclock to compute TSC deadline value
Date: Fri, 16 Sep 2016 17:24:44 +0200
Message-ID: <20160916152443.GG17296@potion>
References: <1473200999-123004-1-git-send-email-pbonzini@redhat.com>
 <1473200999-123004-3-git-send-email-pbonzini@redhat.com>
 <20160915150851.GA15815@potion>
 <20160915195949.GA17095@potion>
 <20160916145957.GF17296@potion>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, dmatlack@google.com, luto@kernel.org, peterhornyack@google.com, x86@kernel.org
To: Paolo Bonzini
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:43358 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755857AbcIPPYs (ORCPT ); Fri, 16 Sep 2016 11:24:48 -0400
Content-Disposition: inline
In-Reply-To:
Sender: kvm-owner@vger.kernel.org
List-ID:

2016-09-16 17:06+0200, Paolo Bonzini:
> On 16/09/2016 16:59, Radim Krčmář wrote:
>> KVM_MSR_DEADLINE would be interface in kvmclock nanosecond values and
>> MSR_IA32_TSCDEADLINE in TSC values. KVM_MSR_DEADLINE would follow
>> similar rules as MSR_IA32_TSCDEADLINE -- the interrupt fires when
>> kvmclock reaches the value, you read what you write, and 0 disarms it.
>>
>> If the TSC deadline timer was enabled, then the guest could write to
>> both MSR_IA32_TSCDEADLINE and KVM_MSR_DEADLINE, but only one could be
>> armed at any time (non-zero write to one will set the other to 0).
>>
>> The dual interface would allow unconditional addition of the PV feature
>> without regressing users that currently use MSR_IA32_TSCDEADLINE and
>> adapted their stack to handle KVM's TSC shortcomings ...
>
> So far so good. My question is: what happens if you write to
> KVM_MSR_DEADLINE and read from MSR_IA32_TSCDEADLINE, or vice versa?

(The second paragraph covered it ;])

> The possibilities are:
>
> a) you read a 0

This one.

> b) you read the value converted to the other unit

Too much hassle. :)

> c) you read another value such as -1

Having a common "disarmed" value is nicer and MSR_IA32_TSCDEADLINE has 0.

> (a) and (c) are the simplest of course. (c) may make sense when writing
> to MSR_IA32_TSCDEADLINE and reading from KVM_MSR_DEADLINE, since we can
> decide which values are valid or not; -1 is technically a valid TSC
> deadline.
>
> I'm not sure about whether to allow (b). In the end KVM is going to
> convert a nsec deadline to a TSC value internally, and vice versa.

It is not necessary to convert the nsec deadline to guest-TSC, only to
host-TSC in case the VMX_PREEMPTION_TIMER is used. I would only have
the host-TSC internal representation, which is not exportable to the
guest or migratable.

> On
> the other hand, if we do, userspace needs to figure out (on migration)
> whether the guest set up a TSC or a nanosecond deadline.

Yeah, I think the solution described below (writing 0 doesn't disarm the
other one) is not bad.

>>> this lets userspace decide whether to set a nsec-based
>>> deadline or a TSC-based deadline after migration.
>>
>> Hm, isn't switching to TSC-based deadline after migration pointless?
>
> Yes, but I didn't mean that. I meant preserving which MSR was written
> to arm the timer, and redoing the same on the destination.

Ah, I see. Both MSRs read back the deadline written to them (if they
are armed) and at most one can be non-zero. KVM will add
MSR_IA32_TSCDEADLINE to the list of emulated MSRs, so userspace will
save/restore both deadline MSRs and zero writes will not disarm the
other timer, so the correct timer will be armed. No special logic to
try to avoid TSC-related bugs.

>>>>> This still wouldn't handle old hosts of course.
>>>>
>>>> The question is whether we want to carry around 150 LOC because of old
>>>> hosts. I'd just fix Linux to avoid deadline TSC without invariant TSC.
>>>> :)
>>>
>>> Yes, that would automatically blacklist it on KVM. You'd also need to
>>> update the recent optimization to the TSC deadline timer, to also work
>>> on other APIC timer modes or at least in your new PV mode.
>>
>> All modes shouldn't be much harder than just the PV mode.
>
> The PV mode would still be a bit easier since it's still the TSC
> deadline timer just with a nicer interface that is not based on the TSC.
> Depends on how you code it though, I guess.

Yeah, we'll see. I am planning to carry around the deadline value in
nanoseconds (to avoid needless conversions), so it would have similar
requirements as the APIC timer.
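
The MSR semantics discussed above (each MSR reads back what was written
to it, at most one is non-zero, a non-zero write clears the other MSR,
and a zero write leaves the other one alone) can be sketched as a small
standalone C model. This is only an illustration under those stated
rules, not KVM code: struct deadline_state, enum deadline_msr,
deadline_write() and deadline_read() are hypothetical names, and the
sketch ignores timer arming and interrupt delivery entirely.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the two deadline MSRs; not actual KVM code. */
enum deadline_msr {
	DEADLINE_TSC  = 0,	/* stands in for MSR_IA32_TSCDEADLINE (TSC units) */
	DEADLINE_NSEC = 1,	/* stands in for KVM_MSR_DEADLINE (kvmclock nsec) */
};

struct deadline_state {
	uint64_t val[2];	/* value last written to each MSR, 0 == disarmed */
};

/*
 * Write to one deadline MSR.
 * - A non-zero write arms this MSR and sets the other one to 0.
 * - A zero write disarms only this MSR; the other one is untouched.
 */
static void deadline_write(struct deadline_state *s,
			   enum deadline_msr msr, uint64_t v)
{
	enum deadline_msr other =
		(msr == DEADLINE_TSC) ? DEADLINE_NSEC : DEADLINE_TSC;

	s->val[msr] = v;
	if (v != 0)
		s->val[other] = 0;

	/* Invariant from the discussion: at most one MSR is non-zero. */
	assert(s->val[DEADLINE_TSC] == 0 || s->val[DEADLINE_NSEC] == 0);
}

/* Read back what was written; no unit conversion between the MSRs. */
static uint64_t deadline_read(const struct deadline_state *s,
			      enum deadline_msr msr)
{
	return s->val[msr];
}
```

With these rules, userspace could restore both saved MSR values in
either order after migration: the zero write does nothing to the other
MSR, so whichever MSR the guest armed ends up armed again.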