All of lore.kernel.org
 help / color / mirror / Atom feed
From: Paolo Bonzini <pbonzini@redhat.com>
To: Igor Mammedov <imammedo@redhat.com>,
	Marcelo Tosatti <mtosatti@redhat.com>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, x86@kernel.org
Subject: Re: [PATCH] ensure guest's kvmclock never goes backwards when TSC jumps backward
Date: Wed, 16 Jul 2014 16:16:17 +0200	[thread overview]
Message-ID: <53C68931.7000308@redhat.com> (raw)
In-Reply-To: <20140716155523.174a6b0e@igors-macbook-pro.local>

Il 16/07/2014 15:55, Igor Mammedov ha scritto:
> On Wed, 16 Jul 2014 08:41:00 -0300
> Marcelo Tosatti <mtosatti@redhat.com> wrote:
>
>> On Wed, Jul 16, 2014 at 12:18:37PM +0200, Paolo Bonzini wrote:
>>> Il 16/07/2014 11:52, Igor Mammedov ha scritto:
>>>> There are buggy hosts in the wild that advertise invariant
>>>> TSC and as result host uses TSC as clocksource, but TSC on
>>>> such host sometimes sporadically jumps backwards.
>>>>
>>>> This causes kvmclock to go backwards if host advertises
>>>> PVCLOCK_TSC_STABLE_BIT, which turns off aggregated clock
>>>> accumulator and returns:
>>>>  pvclock_vcpu_time_info.system_timestamp + offset
>>>> where 'offset' is calculated using TSC.
>>>> Since TSC is not virtualized in KVM, it makes guest see
>>>> TSC jumped backwards and leads to kvmclock going backwards
>>>> as well.
>>>>
>>>> This is defensive patch that keeps per CPU last clock value
>>>> and ensures that clock will never go backwards even with
>>>> using PVCLOCK_TSC_STABLE_BIT enabled path.
>>>
>>> I'm not sure that a per-CPU value is enough; your patch can make the
>>> problem much less frequent of course, but I'm not sure neither
>>> detection nor correction are 100% reliable.  Your addition is
>>> basically a faster but less reliable version of the last_value
>>> logic.
> How is it less reliable than last_value logic?

Suppose CPU 1 is behind by 3 nanoseconds

    CPU 0                      CPU 1
    time = 100                              (at time 100)
                               time = 99    (at time 102)
    time = 104                              (at time 104)
                               time = 105   (at time 108)

Your patch will not detect this.

>>> If may be okay to have detection that is faster but not 100%
>>> reliable. However, once you find that the host is buggy I think the
>>> correct thing to do is to write last_value and kill
>>> PVCLOCK_TSC_STABLE_BIT from valid_flags.
> that might be an option, but what value we need to store into
> last_value?

You can write the value that was in the per-CPU variable (not perfect 
correction)...

> To make sure that clock won't go back we need to track
> it on all CPUs and store highest value to last_value, at this point
> there is no point in switching to last_value path since we have to
> track per CPU anyway.

... or loop over all CPUs and find the highest value.  You would only 
have to do this once.

>> Can we move detection to the host TSC clocksource driver ?
>
> I haven't looked much at host side solution yet,
> but to detection reliable it needs to be run constantly,
> from read_native_tsc().
>
> it's possible to put detection into check_system_tsc_reliable() but
> that would increase boot time and it's not clear for how long test
> should run to make detection reliable (in my case it takes ~5-10sec
> to detect first failure).

Is 5-10sec the time that it takes for tsc_wrap_test to fail?

> Best we could at boot time is mark TSC as unstable on affected hardware,
> but for this we need to figure out if it's specific machine or CPU issue
> to do it properly. (I'm in process of finding out who to bug with it)

Thanks, this would be best.

> PS: it appears that host runs stably.
>
> but kvm_get_time_and_clockread() is affected since it uses its own
> version of do_monotonic()->vgettsc() which will lead to cycles
> go backwards and overflow of nano secs in timespec. We should mimic
> vread_tsc() here so not to run into this kind of issues.

I'm not sure I understand, the code is similar:

	arch/x86/kvm/x86.c	arch/x86/vdso/vclock_gettime.c
	do_monotonic		do_monotonic
	vgettsc			vgetsns
	read_tsc		vread_tsc
	vget_cycles
	__native_read_tsc	__native_read_tsc

The VDSO inlines timespec_add_ns.

Paolo

  reply	other threads:[~2014-07-16 14:16 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-07-16  9:52 [PATCH] ensure guest's kvmclock never goes backwards when TSC jumps backward Igor Mammedov
2014-07-16 10:18 ` Paolo Bonzini
2014-07-16 11:41   ` Marcelo Tosatti
2014-07-16 13:55     ` Igor Mammedov
2014-07-16 14:16       ` Paolo Bonzini [this message]
2014-07-16 14:51         ` Igor Mammedov
2014-07-16 14:55           ` Paolo Bonzini
2014-07-16 15:18             ` Igor Mammedov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53C68931.7000308@redhat.com \
    --to=pbonzini@redhat.com \
    --cc=imammedo@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mtosatti@redhat.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.