public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Paolo Bonzini <pbonzini@redhat.com>
To: Igor Mammedov <imammedo@redhat.com>,
	Marcelo Tosatti <mtosatti@redhat.com>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, x86@kernel.org
Subject: Re: [PATCH] ensure guest's kvmclock never goes backwards when TSC jumps backward
Date: Wed, 16 Jul 2014 16:16:17 +0200	[thread overview]
Message-ID: <53C68931.7000308@redhat.com> (raw)
In-Reply-To: <20140716155523.174a6b0e@igors-macbook-pro.local>

Il 16/07/2014 15:55, Igor Mammedov ha scritto:
> On Wed, 16 Jul 2014 08:41:00 -0300
> Marcelo Tosatti <mtosatti@redhat.com> wrote:
>
>> On Wed, Jul 16, 2014 at 12:18:37PM +0200, Paolo Bonzini wrote:
>>> Il 16/07/2014 11:52, Igor Mammedov ha scritto:
>>>> There are buggy hosts in the wild that advertise invariant
>>>> TSC and as result host uses TSC as clocksource, but TSC on
>>>> such host sometimes sporadically jumps backwards.
>>>>
>>>> This causes kvmclock to go backwards if host advertises
>>>> PVCLOCK_TSC_STABLE_BIT, which turns off aggregated clock
>>>> accumulator and returns:
>>>>  pvclock_vcpu_time_info.system_timestamp + offset
>>>> where 'offset' is calculated using TSC.
>>>> Since TSC is not virtualized in KVM, it makes guest see
>>>> TSC jumped backwards and leads to kvmclock going backwards
>>>> as well.
>>>>
>>>> This is defensive patch that keeps per CPU last clock value
>>>> and ensures that clock will never go backwards even with
>>>> using PVCLOCK_TSC_STABLE_BIT enabled path.
>>>
>>> I'm not sure that a per-CPU value is enough; your patch can make the
>>> problem much less frequent of course, but I'm not sure neither
>>> detection nor correction are 100% reliable.  Your addition is
>>> basically a faster but less reliable version of the last_value
>>> logic.
> How is it less reliable than last_value logic?

Suppose CPU 1 is behind by 3 nanoseconds

    CPU 0                      CPU 1
    time = 100                              (at time 100)
                               time = 99    (at time 102)
    time = 104                              (at time 104)
                               time = 105   (at time 108)

Your patch will not detect this.

>>> If may be okay to have detection that is faster but not 100%
>>> reliable. However, once you find that the host is buggy I think the
>>> correct thing to do is to write last_value and kill
>>> PVCLOCK_TSC_STABLE_BIT from valid_flags.
> that might be an option, but what value we need to store into
> last_value?

You can write the value that was in the per-CPU variable (not perfect 
correction)...

> To make sure that clock won't go back we need to track
> it on all CPUs and store highest value to last_value, at this point
> there is no point in switching to last_value path since we have to
> track per CPU anyway.

... or loop over all CPUs and find the highest value.  You would only 
have to do this once.

>> Can we move detection to the host TSC clocksource driver ?
>
> I haven't looked much at host side solution yet,
> but to detection reliable it needs to be run constantly,
> from read_native_tsc().
>
> it's possible to put detection into check_system_tsc_reliable() but
> that would increase boot time and it's not clear for how long test
> should run to make detection reliable (in my case it takes ~5-10sec
> to detect first failure).

Is 5-10sec the time that it takes for tsc_wrap_test to fail?

> Best we could at boot time is mark TSC as unstable on affected hardware,
> but for this we need to figure out if it's specific machine or CPU issue
> to do it properly. (I'm in process of finding out who to bug with it)

Thanks, this would be best.

> PS: it appears that host runs stably.
>
> but kvm_get_time_and_clockread() is affected since it uses its own
> version of do_monotonic()->vgettsc() which will lead to cycles
> go backwards and overflow of nano secs in timespec. We should mimic
> vread_tsc() here so not to run into this kind of issues.

I'm not sure I understand, the code is similar:

	arch/x86/kvm/x86.c	arch/x86/vdso/vclock_gettime.c
	do_monotonic		do_monotonic
	vgettsc			vgetsns
	read_tsc		vread_tsc
	vget_cycles
	__native_read_tsc	__native_read_tsc

The VDSO inlines timespec_add_ns.

Paolo

  reply	other threads:[~2014-07-16 14:16 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-07-16  9:52 [PATCH] ensure guest's kvmclock never goes backwards when TSC jumps backward Igor Mammedov
2014-07-16 10:18 ` Paolo Bonzini
2014-07-16 11:41   ` Marcelo Tosatti
2014-07-16 13:55     ` Igor Mammedov
2014-07-16 14:16       ` Paolo Bonzini [this message]
2014-07-16 14:51         ` Igor Mammedov
2014-07-16 14:55           ` Paolo Bonzini
2014-07-16 15:18             ` Igor Mammedov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53C68931.7000308@redhat.com \
    --to=pbonzini@redhat.com \
    --cc=imammedo@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mtosatti@redhat.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox