All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alexander Graf <agraf@suse.de>
To: Alexey Kardashevskiy <aik@ozlabs.ru>, qemu-devel@nongnu.org
Cc: Paul Mackerras <paulus@samba.org>, qemu-ppc <qemu-ppc@nongnu.org>
Subject: Re: [Qemu-devel] [PATCH 4/4] spapr: Add support for time base offset migration
Date: Fri, 11 Apr 2014 11:40:34 +0200	[thread overview]
Message-ID: <5347B892.5010107@suse.de> (raw)
In-Reply-To: <5346AB55.8030807@ozlabs.ru>


On 10.04.14 16:31, Alexey Kardashevskiy wrote:
> On 04/10/2014 10:34 PM, Alexander Graf wrote:
>> On 03.04.14 15:14, Alexey Kardashevskiy wrote:
>>> This allows guests to have a different timebase origin from the host.
>>>
>>> This is needed for migration, where a guest can migrate from one host
>>> to another and the two hosts might have a different timebase origin.
>>> However, the timebase seen by the guest must not go backwards, and
>>> should go forwards only by a small amount corresponding to the time
>>> taken for the migration.
>>>
>>> This is only supported for recent POWER hardware which has the TBU40
>>> (timebase upper 40 bits) register. That includes POWER6, 7, 8 but not
>>> 970.
>>>
>>> This adds kvm_access_one_reg() to access a special register which is not
>>> in env->spr.
>>>
>>> The feature must be present in the host kernel.
>>>
>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> ---
>>> Changes:
>>> v4:
>>> * made it per machine timebase offser rather than per CPU
>>>
>>> v3:
>>> * kvm_access_one_reg moved out to a separate patch
>>> * tb_offset and host_timebase were replaced with guest_timebase as
>>> the destionation does not really care of offset on the source
>>>
>>> v2:
>>> * bumped the vmstate_ppc_cpu version
>>> * defined version for the env.tb_env field
>>> ---
>>>    hw/ppc/ppc.c           | 120
>>> +++++++++++++++++++++++++++++++++++++++++++++++++
>>>    hw/ppc/spapr.c         |   3 +-
>>>    include/hw/ppc/spapr.h |   2 +
>>>    target-ppc/cpu-qom.h   |  16 +++++++
>>>    target-ppc/kvm.c       |   5 +++
>>>    target-ppc/machine.c   |   4 +-
>>>    trace-events           |   3 ++
>>>    7 files changed, 151 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
>>> index 9c2a132..b51db1b 100644
>>> --- a/hw/ppc/ppc.c
>>> +++ b/hw/ppc/ppc.c
>>> @@ -29,9 +29,11 @@
>>>    #include "sysemu/cpus.h"
>>>    #include "hw/timer/m48t59.h"
>>>    #include "qemu/log.h"
>>> +#include "qemu/error-report.h"
>>>    #include "hw/loader.h"
>>>    #include "sysemu/kvm.h"
>>>    #include "kvm_ppc.h"
>>> +#include "trace.h"
>>>      //#define PPC_DEBUG_IRQ
>>>    //#define PPC_DEBUG_TB
>>> @@ -797,6 +799,124 @@ static void cpu_ppc_set_tb_clk (void *opaque,
>>> uint32_t freq)
>>>        cpu_ppc_store_purr(cpu, 0x0000000000000000ULL);
>>>    }
>>>    +/*
>>> + * Calculate timebase on the destination side of migration
>>> + *
>>> + * We calculate new timebase offset as shown below:
>>> + * 1) Gtb2 = Gtb1 + max(tod2 - tod1, 0)
>>> + *    Gtb2 = tb2 + off2
>>> + * 2) tb2 + off2 = Gtb1 + max(tod2 - tod1, 0)
>>> + * 3) off2 = Gtb1 - tb2 + max(tod2 - tod1, 0)
>>> + *
>>> + * where:
>>> + * Gtb2 - destination guest timebase
>>> + * tb2 - destination host timebase
>>> + * off2 - destination timebase offset
>>> + * tod2 - destination time of the day
>>> + * Gtb1 - source guest timebase
>>> + * tod1 - source time of the day
>>> + *
>>> + * The result we want is in @off2
>>> + *
>>> + * Two conditions must be met for @off2:
>>> + * 1) off2 must be multiple of 2^24 ticks as it will be set via TBU40 SPR
>>> + * 2) Gtb2 >= Gtb1
>>> + */
>>> +static int64_t cpu_ppc_adjust_tb_offset(PPCTimebaseOffset *tb)
>>> +{
>>> +    uint64_t tb2, tod2;
>>> +    int64_t off2;
>>> +    int ratio = tb->freq / 1000000;
>>> +    struct timeval tv;
>>> +
>>> +    tb2 = cpu_get_real_ticks();
>>> +    gettimeofday(&tv, NULL);
>>> +    tod2 = tv.tv_sec * 1000000 + tv.tv_usec;
>>> +
>>> +    off2 = tb->guest_timebase - tb2;
>>> +    if ((tod2 > tb->time_of_the_day) &&
>>> +        (tod2 - tb->time_of_the_day < 1000000)) {
>>> +        off2 += (tod2 - tb->time_of_the_day) * ratio;
>>> +    }
>>> +    off2 = ROUND_UP(off2, 1 << 24);
>>> +
>>> +    return off2;
>>> +}
>> I *think* what you're trying to say here is that you want
>>
>> assert(source_timebase_freq == timebase_freq);
>>
>> migration_duration_ns = host_ns - source_host_ns;
>> guest_tb = source_guest_tb + ns_scaled_to_tb(min(0, migration_duration_ns);
>> kvm_set_guest_tb(guest_tb);
>>    -> kvm_set_one_reg(KVM_REG_PPC_TB_OFFSET, guest_tb - mftb());
>>
>> But I honestly have not managed to read that from the code. Either this
>> really is what you're trying to do and the code is just very hard to read
>> (which means it needs to be written more easily) or you're doing something
>> different which I don't understand.
>
> Is this any better?
>
> static int64_t cpu_ppc_adjust_tb_offset(PPCTimebaseOffset *tb)
> {
> 	struct timeval tv;
> 	int64_t migration_duration_ns, migration_duration_tb;

If I read the code correctly you're operating in us, not ns, no?

> 	int64_t guest_tb, host_ns;
> 	int ratio = tb->freq / 1000000;

#define USEC_PER_SEC 1000000

You're also losing quite a bit of precision here, no?

> 	int64_t off;
>
> 	gettimeofday(&tv, NULL);
> 	host_ns = tv.tv_sec * 1000000 + tv.tv_usec;

host_us = get_clock_realtime() / 1000; ?

> 	migration_duration_ns = MIN(1000000,

Why is it MIN(1000000)? Is a migration supposed to last at least 1sec? Why?

> 		host_ns - tb->time_of_the_day);
> 	migration_duration_tb = migration_duration_ns * ratio;
>
> 	guest_tb = tb->guest_timebase + MIN(0, migration_duration_tb);
>
> 	off = guest_tb - cpu_get_real_ticks();

It's probably easier to read when you create one function that just 
returns a guest TB value adjusted by the time the last measurement 
happened. The fact that the KVM register wants an offset is a KVM 
implementation detail. The TB adjustment should happen generically.

>
> 	return off;
> }
>
>
>> We also designed the PPC_TB_OFFSET ONE_REG in a way that it always rounds
>> up to its 40 bit granularity, so no need to do this in QEMU. In fact, we
>> don't want to do it in QEMU in case there will be a more fine-grained SPR
>> in the future.
> I believe rounding was not in the kernel when I started making this...
>
>
>> And from all I understand the timebase frequency is now architecturally
>> specified, so it won't change for newer cores, no?
> I asked people in our lab. Everyone says that it should not change but
> noone would bet on it too much.

When it changes and you want to live migrate, you'll need to implement a 
guest TB scale register and the whole idea of a "TB offset" ONE_REG is 
absurd.

The more I think about this the more I realize we should have created a 
"guest TB value", not a "guest TB offset" ONE_REG.

>
>
>> And if we migrate TCG
>> guests it will be the same between two hosts.
> And G5 uses 33333333. I really do not understand why it is bad to
> send-and-check timer frequency. Why?

Because the guest will continue to run at a different TB frequency on 
the new host and break.

>
>
> Is the rest ok? Thanks for review!

Not sure. Please rework everything according to the comments, make the 
code readable enough that your wife understands it and then resend it :).


Alex

  reply	other threads:[~2014-04-11  9:40 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-04-03 13:14 [Qemu-devel] [PATCH 0/4] power7/8 migration patches Alexey Kardashevskiy
2014-04-03 13:14 ` [Qemu-devel] [PATCH 1/4] kvm: Add set_one_reg/get_one_reg helpers Alexey Kardashevskiy
2014-05-08 12:27   ` Alexander Graf
2014-05-09  1:35     ` Alexey Kardashevskiy
2014-05-09  8:06       ` [Qemu-devel] [PATCH] kvm: make one_reg helpers available for everyone Cornelia Huck
2014-05-13 10:01         ` Alexander Graf
2014-04-03 13:14 ` [Qemu-devel] [PATCH 2/4] spapr: Enable DABRX special register Alexey Kardashevskiy
2014-04-03 13:19   ` Alexander Graf
2014-04-04  6:13     ` Alexey Kardashevskiy
2014-04-04 12:21       ` Alexander Graf
2014-04-03 18:42   ` Tom Musta
2014-04-04  0:51     ` Alexey Kardashevskiy
2014-04-04 12:40       ` Tom Musta
2014-04-03 13:14 ` [Qemu-devel] [PATCH 3/4] KVM: PPC: Support POWER8 registers Alexey Kardashevskiy
2014-04-03 13:33   ` Alexander Graf
2014-04-03 19:12     ` Tom Musta
2014-04-04  6:58       ` Alexey Kardashevskiy
2014-04-04 12:23         ` Alexander Graf
2014-04-03 13:14 ` [Qemu-devel] [PATCH 4/4] spapr: Add support for time base offset migration Alexey Kardashevskiy
2014-04-10 12:34   ` Alexander Graf
2014-04-10 14:31     ` Alexey Kardashevskiy
2014-04-11  9:40       ` Alexander Graf [this message]
2014-04-11 21:55         ` Benjamin Herrenschmidt
2014-04-11 22:59           ` Alexander Graf
2014-04-11 23:03             ` [Qemu-devel] [Qemu-ppc] " Alexander Graf
2014-04-12  3:44           ` [Qemu-devel] " Alexey Kardashevskiy
2014-04-12  7:25             ` Alexander Graf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5347B892.5010107@suse.de \
    --to=agraf@suse.de \
    --cc=aik@ozlabs.ru \
    --cc=paulus@samba.org \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.