qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Alexander Graf <agraf@suse.de>
To: Alexey Kardashevskiy <aik@ozlabs.ru>, qemu-devel@nongnu.org
Cc: Paul Mackerras <paulus@samba.org>, qemu-ppc <qemu-ppc@nongnu.org>
Subject: Re: [Qemu-devel] [PATCH 4/4] spapr: Add support for time base offset migration
Date: Fri, 11 Apr 2014 11:40:34 +0200	[thread overview]
Message-ID: <5347B892.5010107@suse.de> (raw)
In-Reply-To: <5346AB55.8030807@ozlabs.ru>


On 10.04.14 16:31, Alexey Kardashevskiy wrote:
> On 04/10/2014 10:34 PM, Alexander Graf wrote:
>> On 03.04.14 15:14, Alexey Kardashevskiy wrote:
>>> This allows guests to have a different timebase origin from the host.
>>>
>>> This is needed for migration, where a guest can migrate from one host
>>> to another and the two hosts might have a different timebase origin.
>>> However, the timebase seen by the guest must not go backwards, and
>>> should go forwards only by a small amount corresponding to the time
>>> taken for the migration.
>>>
>>> This is only supported for recent POWER hardware which has the TBU40
>>> (timebase upper 40 bits) register. That includes POWER6, 7, 8 but not
>>> 970.
>>>
>>> This adds kvm_access_one_reg() to access a special register which is not
>>> in env->spr.
>>>
>>> The feature must be present in the host kernel.
>>>
>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> ---
>>> Changes:
>>> v4:
>>> * made it per machine timebase offser rather than per CPU
>>>
>>> v3:
>>> * kvm_access_one_reg moved out to a separate patch
>>> * tb_offset and host_timebase were replaced with guest_timebase as
>>> the destionation does not really care of offset on the source
>>>
>>> v2:
>>> * bumped the vmstate_ppc_cpu version
>>> * defined version for the env.tb_env field
>>> ---
>>>    hw/ppc/ppc.c           | 120
>>> +++++++++++++++++++++++++++++++++++++++++++++++++
>>>    hw/ppc/spapr.c         |   3 +-
>>>    include/hw/ppc/spapr.h |   2 +
>>>    target-ppc/cpu-qom.h   |  16 +++++++
>>>    target-ppc/kvm.c       |   5 +++
>>>    target-ppc/machine.c   |   4 +-
>>>    trace-events           |   3 ++
>>>    7 files changed, 151 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
>>> index 9c2a132..b51db1b 100644
>>> --- a/hw/ppc/ppc.c
>>> +++ b/hw/ppc/ppc.c
>>> @@ -29,9 +29,11 @@
>>>    #include "sysemu/cpus.h"
>>>    #include "hw/timer/m48t59.h"
>>>    #include "qemu/log.h"
>>> +#include "qemu/error-report.h"
>>>    #include "hw/loader.h"
>>>    #include "sysemu/kvm.h"
>>>    #include "kvm_ppc.h"
>>> +#include "trace.h"
>>>      //#define PPC_DEBUG_IRQ
>>>    //#define PPC_DEBUG_TB
>>> @@ -797,6 +799,124 @@ static void cpu_ppc_set_tb_clk (void *opaque,
>>> uint32_t freq)
>>>        cpu_ppc_store_purr(cpu, 0x0000000000000000ULL);
>>>    }
>>>    +/*
>>> + * Calculate timebase on the destination side of migration
>>> + *
>>> + * We calculate new timebase offset as shown below:
>>> + * 1) Gtb2 = Gtb1 + max(tod2 - tod1, 0)
>>> + *    Gtb2 = tb2 + off2
>>> + * 2) tb2 + off2 = Gtb1 + max(tod2 - tod1, 0)
>>> + * 3) off2 = Gtb1 - tb2 + max(tod2 - tod1, 0)
>>> + *
>>> + * where:
>>> + * Gtb2 - destination guest timebase
>>> + * tb2 - destination host timebase
>>> + * off2 - destination timebase offset
>>> + * tod2 - destination time of the day
>>> + * Gtb1 - source guest timebase
>>> + * tod1 - source time of the day
>>> + *
>>> + * The result we want is in @off2
>>> + *
>>> + * Two conditions must be met for @off2:
>>> + * 1) off2 must be multiple of 2^24 ticks as it will be set via TBU40 SPR
>>> + * 2) Gtb2 >= Gtb1
>>> + */
>>> +static int64_t cpu_ppc_adjust_tb_offset(PPCTimebaseOffset *tb)
>>> +{
>>> +    uint64_t tb2, tod2;
>>> +    int64_t off2;
>>> +    int ratio = tb->freq / 1000000;
>>> +    struct timeval tv;
>>> +
>>> +    tb2 = cpu_get_real_ticks();
>>> +    gettimeofday(&tv, NULL);
>>> +    tod2 = tv.tv_sec * 1000000 + tv.tv_usec;
>>> +
>>> +    off2 = tb->guest_timebase - tb2;
>>> +    if ((tod2 > tb->time_of_the_day) &&
>>> +        (tod2 - tb->time_of_the_day < 1000000)) {
>>> +        off2 += (tod2 - tb->time_of_the_day) * ratio;
>>> +    }
>>> +    off2 = ROUND_UP(off2, 1 << 24);
>>> +
>>> +    return off2;
>>> +}
>> I *think* what you're trying to say here is that you want
>>
>> assert(source_timebase_freq == timebase_freq);
>>
>> migration_duration_ns = host_ns - source_host_ns;
>> guest_tb = source_guest_tb + ns_scaled_to_tb(min(0, migration_duration_ns);
>> kvm_set_guest_tb(guest_tb);
>>    -> kvm_set_one_reg(KVM_REG_PPC_TB_OFFSET, guest_tb - mftb());
>>
>> But I honestly have not managed to read that from the code. Either this
>> really is what you're trying to do and the code is just very hard to read
>> (which means it needs to be written more easily) or you're doing something
>> different which I don't understand.
>
> Is this any better?
>
> static int64_t cpu_ppc_adjust_tb_offset(PPCTimebaseOffset *tb)
> {
> 	struct timeval tv;
> 	int64_t migration_duration_ns, migration_duration_tb;

If I read the code correctly you're operating in us, not ns, no?

> 	int64_t guest_tb, host_ns;
> 	int ratio = tb->freq / 1000000;

#define USEC_PER_SEC 1000000

You're also losing quite a bit of precision here, no?

> 	int64_t off;
>
> 	gettimeofday(&tv, NULL);
> 	host_ns = tv.tv_sec * 1000000 + tv.tv_usec;

host_us = get_clock_realtime() / 1000; ?

> 	migration_duration_ns = MIN(1000000,

Why is it MIN(1000000)? Is a migration supposed to last at least 1sec? Why?

> 		host_ns - tb->time_of_the_day);
> 	migration_duration_tb = migration_duration_ns * ratio;
>
> 	guest_tb = tb->guest_timebase + MIN(0, migration_duration_tb);
>
> 	off = guest_tb - cpu_get_real_ticks();

It's probably easier to read when you create one function that just 
returns a guest TB value adjusted by the time the last measurement 
happened. The fact that the KVM register wants an offset is a KVM 
implementation detail. The TB adjustment should happen generically.

>
> 	return off;
> }
>
>
>> We also designed the PPC_TB_OFFSET ONE_REG in a way that it always rounds
>> up to its 40 bit granularity, so no need to do this in QEMU. In fact, we
>> don't want to do it in QEMU in case there will be a more fine-grained SPR
>> in the future.
> I believe rounding was not in the kernel when I started making this...
>
>
>> And from all I understand the timebase frequency is now architecturally
>> specified, so it won't change for newer cores, no?
> I asked people in our lab. Everyone says that it should not change but
> noone would bet on it too much.

When it changes and you want to live migrate, you'll need to implement a 
guest TB scale register and the whole idea of a "TB offset" ONE_REG is 
absurd.

The more I think about this the more I realize we should have created a 
"guest TB value", not a "guest TB offset" ONE_REG.

>
>
>> And if we migrate TCG
>> guests it will be the same between two hosts.
> And G5 uses 33333333. I really do not understand why it is bad to
> send-and-check timer frequency. Why?

Because the guest will continue to run at a different TB frequency on 
the new host and break.

>
>
> Is the rest ok? Thanks for review!

Not sure. Please rework everything according to the comments, make the 
code readable enough that your wife understands it and then resend it :).


Alex

  reply	other threads:[~2014-04-11  9:40 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-04-03 13:14 [Qemu-devel] [PATCH 0/4] power7/8 migration patches Alexey Kardashevskiy
2014-04-03 13:14 ` [Qemu-devel] [PATCH 1/4] kvm: Add set_one_reg/get_one_reg helpers Alexey Kardashevskiy
2014-05-08 12:27   ` Alexander Graf
2014-05-09  1:35     ` Alexey Kardashevskiy
2014-05-09  8:06       ` [Qemu-devel] [PATCH] kvm: make one_reg helpers available for everyone Cornelia Huck
2014-05-13 10:01         ` Alexander Graf
2014-04-03 13:14 ` [Qemu-devel] [PATCH 2/4] spapr: Enable DABRX special register Alexey Kardashevskiy
2014-04-03 13:19   ` Alexander Graf
2014-04-04  6:13     ` Alexey Kardashevskiy
2014-04-04 12:21       ` Alexander Graf
2014-04-03 18:42   ` Tom Musta
2014-04-04  0:51     ` Alexey Kardashevskiy
2014-04-04 12:40       ` Tom Musta
2014-04-03 13:14 ` [Qemu-devel] [PATCH 3/4] KVM: PPC: Support POWER8 registers Alexey Kardashevskiy
2014-04-03 13:33   ` Alexander Graf
2014-04-03 19:12     ` Tom Musta
2014-04-04  6:58       ` Alexey Kardashevskiy
2014-04-04 12:23         ` Alexander Graf
2014-04-03 13:14 ` [Qemu-devel] [PATCH 4/4] spapr: Add support for time base offset migration Alexey Kardashevskiy
2014-04-10 12:34   ` Alexander Graf
2014-04-10 14:31     ` Alexey Kardashevskiy
2014-04-11  9:40       ` Alexander Graf [this message]
2014-04-11 21:55         ` Benjamin Herrenschmidt
2014-04-11 22:59           ` Alexander Graf
2014-04-11 23:03             ` [Qemu-devel] [Qemu-ppc] " Alexander Graf
2014-04-12  3:44           ` [Qemu-devel] " Alexey Kardashevskiy
2014-04-12  7:25             ` Alexander Graf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5347B892.5010107@suse.de \
    --to=agraf@suse.de \
    --cc=aik@ozlabs.ru \
    --cc=paulus@samba.org \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).