From mboxrd@z Thu Jan  1 00:00:00 1970
From: Radim =?utf-8?B?S3LEjW3DocWZ?= <rkrcmar@redhat.com>
Subject: Re: [QEMU PATCH] kvmclock: advance clock by time window between
 vm_stop and pre_save
Date: Fri, 4 Nov 2016 18:16:06 +0100
Message-ID: <20161104171605.GE5388@potion>
References: <20161104094322.GA16930@amt.cnet>
 <20161104152522.GC5388@potion>
 <1c69a083-eef0-8fa0-0e74-5a4e25a066a0@redhat.com>
 <20161104154827.GD5388@potion>
 <1217f6ce-8143-8d0e-97f6-470926438992@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Cc: Marcelo Tosatti <mtosatti@redhat.com>, kvm@vger.kernel.org,
        qemu-devel <qemu-devel@nongnu.org>,
        "Dr. David Alan Gilbert" <dgilbert@redhat.com>,
        Juan Quintela <quintela@redhat.com>,
        Eduardo Habkost <ehabkost@redhat.com>,
        Roman Kagan <rkagan@virtuozzo.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mx1.redhat.com ([209.132.183.28]:42428 "EHLO mx1.redhat.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1756963AbcKDRQK (ORCPT <rfc822;kvm@vger.kernel.org>);
        Fri, 4 Nov 2016 13:16:10 -0400
Content-Disposition: inline
In-Reply-To: <1217f6ce-8143-8d0e-97f6-470926438992@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

2016-11-04 16:57+0100, Paolo Bonzini:
> On 04/11/2016 16:48, Radim Krčmář wrote:
>> 2016-11-04 16:33+0100, Paolo Bonzini:
>>> On 04/11/2016 16:25, Radim Krčmář wrote:
>>>>>>  
>>>>>> +        if (s->advance_clock && s->clock + s->advance_clock > s->clock) {
>>>>>> +            s->clock += s->advance_clock;
>>>>>> +            s->advance_clock = 0;
>>>>>> +        }
>>>> Can't the advance_clock added to the migrated KVMClockState instead of
>>>> passing it as another parameter?
>>>>
>>>> (It is sad that we can't just query KVMClockState in kvmclock_pre_save
>>>>  because of the Linux bug.)
>>>
>>> What Linux bug?  The one that makes us use kvmclock_current_nsec?
>> 
>> No, the one that forced Marcelo to add the 10 minute limit to the
>> advance_clock.  We wouldn't need this advance_clock hack if we could
>> just call KVM_GET_CLOCK like we did before 00f4d64ee76e ("kvmclock:
>> clock should count only if vm is running").
> 
> There are two cases:
> 
> - migrating a paused guest
> 
> - pausing at the end of migration
> 
> In the first case, kvmclock_vm_state_change's !running branch will see
> state == RUN_STATE_FINISH_MIGRATE && s->clock_valid.  In the second
> case, it will see state == RUN_STATE_FINISH_MIGRATE && !s->clock_valid.

I lift my case, marcelo's said that stopping the time is a feature ...
(*kittens die*)

> In the second case it should do nothing.  Then kvmclock_pre_save will
> see !s->clock_valid and call KVM_GET_CLOCK.  A bonus is that, if
> migration fails and migration_thread() calls vm_start(), then the guest
> will not see the clock drift backwards.

Yes, moving KVM_GET_CLOCK to kvmclock_pre_save() and doing nothing in
kvmclock_vm_state_change() to avoid the drift on failed migration would
be nicer.  We'd then also get rid of the ns migration parameter.

>>> It should work with 4.9-rc (well, once Linus applies my pull request).
>>> 4.9-rc will not return ktime_get_ns for KVM_GET_CLOCK; it will return
>>> the raw value from the kernel timekeeper.

Oh, and this does introduce a minor bug to this patch -- the time
counted by KVM_GET_CLOCK is has different frequency CLOCK_MONOTONIC.
Not accounting for that is bearable.

>>> I'm thinking that we should add a KVM capability for this, and skip
>>> kvmclock_current_nsec if the capability is present.  The first part is
>>> trivial, so we can do it even during Linux rc period.
>> 
>> I agree, that sounds like a nice improvement.
> 
> Ok, I'll send the KVM patch then.  It can even be the value *returned*
> for KVM_CAP_ADJUST_CLOCK---right now it returns 1, we can return 3 (bit
> 0 for KVM_CAP_ADJUST_CLOCK_RESENT and bit 1 for returning the raw
> value).  Any suggestion for the name?

KVM_CAP_GET_CLOCK_MASTERCLOCK?

to show that the time is counted differently when using master clock.
Another idea would be KVM_CAP_GET_CLOCK_ACTUAL, because we should now
read what the guest is seeing.