From: Alexander Graf <agraf@suse.de>
To: Alexey Kardashevskiy <aik@ozlabs.ru>, qemu-devel@nongnu.org
Cc: Paul Mackerras <paulus@samba.org>, qemu-ppc <qemu-ppc@nongnu.org>
Subject: Re: [Qemu-devel] [PATCH 4/4] spapr: Add support for time base offset migration
Date: Fri, 11 Apr 2014 11:40:34 +0200 [thread overview]
Message-ID: <5347B892.5010107@suse.de> (raw)
In-Reply-To: <5346AB55.8030807@ozlabs.ru>
On 10.04.14 16:31, Alexey Kardashevskiy wrote:
> On 04/10/2014 10:34 PM, Alexander Graf wrote:
>> On 03.04.14 15:14, Alexey Kardashevskiy wrote:
>>> This allows guests to have a different timebase origin from the host.
>>>
>>> This is needed for migration, where a guest can migrate from one host
>>> to another and the two hosts might have a different timebase origin.
>>> However, the timebase seen by the guest must not go backwards, and
>>> should go forwards only by a small amount corresponding to the time
>>> taken for the migration.
>>>
>>> This is only supported for recent POWER hardware which has the TBU40
>>> (timebase upper 40 bits) register. That includes POWER6, 7, 8 but not
>>> 970.
>>>
>>> This adds kvm_access_one_reg() to access a special register which is not
>>> in env->spr.
>>>
>>> The feature must be present in the host kernel.
>>>
>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> ---
>>> Changes:
>>> v4:
>>> * made it per machine timebase offser rather than per CPU
>>>
>>> v3:
>>> * kvm_access_one_reg moved out to a separate patch
>>> * tb_offset and host_timebase were replaced with guest_timebase as
>>> the destionation does not really care of offset on the source
>>>
>>> v2:
>>> * bumped the vmstate_ppc_cpu version
>>> * defined version for the env.tb_env field
>>> ---
>>> hw/ppc/ppc.c | 120
>>> +++++++++++++++++++++++++++++++++++++++++++++++++
>>> hw/ppc/spapr.c | 3 +-
>>> include/hw/ppc/spapr.h | 2 +
>>> target-ppc/cpu-qom.h | 16 +++++++
>>> target-ppc/kvm.c | 5 +++
>>> target-ppc/machine.c | 4 +-
>>> trace-events | 3 ++
>>> 7 files changed, 151 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
>>> index 9c2a132..b51db1b 100644
>>> --- a/hw/ppc/ppc.c
>>> +++ b/hw/ppc/ppc.c
>>> @@ -29,9 +29,11 @@
>>> #include "sysemu/cpus.h"
>>> #include "hw/timer/m48t59.h"
>>> #include "qemu/log.h"
>>> +#include "qemu/error-report.h"
>>> #include "hw/loader.h"
>>> #include "sysemu/kvm.h"
>>> #include "kvm_ppc.h"
>>> +#include "trace.h"
>>> //#define PPC_DEBUG_IRQ
>>> //#define PPC_DEBUG_TB
>>> @@ -797,6 +799,124 @@ static void cpu_ppc_set_tb_clk (void *opaque,
>>> uint32_t freq)
>>> cpu_ppc_store_purr(cpu, 0x0000000000000000ULL);
>>> }
>>> +/*
>>> + * Calculate timebase on the destination side of migration
>>> + *
>>> + * We calculate new timebase offset as shown below:
>>> + * 1) Gtb2 = Gtb1 + max(tod2 - tod1, 0)
>>> + * Gtb2 = tb2 + off2
>>> + * 2) tb2 + off2 = Gtb1 + max(tod2 - tod1, 0)
>>> + * 3) off2 = Gtb1 - tb2 + max(tod2 - tod1, 0)
>>> + *
>>> + * where:
>>> + * Gtb2 - destination guest timebase
>>> + * tb2 - destination host timebase
>>> + * off2 - destination timebase offset
>>> + * tod2 - destination time of the day
>>> + * Gtb1 - source guest timebase
>>> + * tod1 - source time of the day
>>> + *
>>> + * The result we want is in @off2
>>> + *
>>> + * Two conditions must be met for @off2:
>>> + * 1) off2 must be multiple of 2^24 ticks as it will be set via TBU40 SPR
>>> + * 2) Gtb2 >= Gtb1
>>> + */
>>> +static int64_t cpu_ppc_adjust_tb_offset(PPCTimebaseOffset *tb)
>>> +{
>>> + uint64_t tb2, tod2;
>>> + int64_t off2;
>>> + int ratio = tb->freq / 1000000;
>>> + struct timeval tv;
>>> +
>>> + tb2 = cpu_get_real_ticks();
>>> + gettimeofday(&tv, NULL);
>>> + tod2 = tv.tv_sec * 1000000 + tv.tv_usec;
>>> +
>>> + off2 = tb->guest_timebase - tb2;
>>> + if ((tod2 > tb->time_of_the_day) &&
>>> + (tod2 - tb->time_of_the_day < 1000000)) {
>>> + off2 += (tod2 - tb->time_of_the_day) * ratio;
>>> + }
>>> + off2 = ROUND_UP(off2, 1 << 24);
>>> +
>>> + return off2;
>>> +}
>> I *think* what you're trying to say here is that you want
>>
>> assert(source_timebase_freq == timebase_freq);
>>
>> migration_duration_ns = host_ns - source_host_ns;
>> guest_tb = source_guest_tb + ns_scaled_to_tb(min(0, migration_duration_ns);
>> kvm_set_guest_tb(guest_tb);
>> -> kvm_set_one_reg(KVM_REG_PPC_TB_OFFSET, guest_tb - mftb());
>>
>> But I honestly have not managed to read that from the code. Either this
>> really is what you're trying to do and the code is just very hard to read
>> (which means it needs to be written more easily) or you're doing something
>> different which I don't understand.
>
> Is this any better?
>
> static int64_t cpu_ppc_adjust_tb_offset(PPCTimebaseOffset *tb)
> {
> struct timeval tv;
> int64_t migration_duration_ns, migration_duration_tb;
If I read the code correctly you're operating in us, not ns, no?
> int64_t guest_tb, host_ns;
> int ratio = tb->freq / 1000000;
#define USEC_PER_SEC 1000000
You're also losing quite a bit of precision here, no?
> int64_t off;
>
> gettimeofday(&tv, NULL);
> host_ns = tv.tv_sec * 1000000 + tv.tv_usec;
host_us = get_clock_realtime() / 1000; ?
> migration_duration_ns = MIN(1000000,
Why is it MIN(1000000)? Is a migration supposed to last at least 1sec? Why?
> host_ns - tb->time_of_the_day);
> migration_duration_tb = migration_duration_ns * ratio;
>
> guest_tb = tb->guest_timebase + MIN(0, migration_duration_tb);
>
> off = guest_tb - cpu_get_real_ticks();
It's probably easier to read when you create one function that just
returns a guest TB value adjusted by the time the last measurement
happened. The fact that the KVM register wants an offset is a KVM
implementation detail. The TB adjustment should happen generically.
>
> return off;
> }
>
>
>> We also designed the PPC_TB_OFFSET ONE_REG in a way that it always rounds
>> up to its 40 bit granularity, so no need to do this in QEMU. In fact, we
>> don't want to do it in QEMU in case there will be a more fine-grained SPR
>> in the future.
> I believe rounding was not in the kernel when I started making this...
>
>
>> And from all I understand the timebase frequency is now architecturally
>> specified, so it won't change for newer cores, no?
> I asked people in our lab. Everyone says that it should not change but
> noone would bet on it too much.
When it changes and you want to live migrate, you'll need to implement a
guest TB scale register and the whole idea of a "TB offset" ONE_REG is
absurd.
The more I think about this the more I realize we should have created a
"guest TB value", not a "guest TB offset" ONE_REG.
>
>
>> And if we migrate TCG
>> guests it will be the same between two hosts.
> And G5 uses 33333333. I really do not understand why it is bad to
> send-and-check timer frequency. Why?
Because the guest will continue to run at a different TB frequency on
the new host and break.
>
>
> Is the rest ok? Thanks for review!
Not sure. Please rework everything according to the comments, make the
code readable enough that your wife understands it and then resend it :).
Alex
next prev parent reply other threads:[~2014-04-11 9:40 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-04-03 13:14 [Qemu-devel] [PATCH 0/4] power7/8 migration patches Alexey Kardashevskiy
2014-04-03 13:14 ` [Qemu-devel] [PATCH 1/4] kvm: Add set_one_reg/get_one_reg helpers Alexey Kardashevskiy
2014-05-08 12:27 ` Alexander Graf
2014-05-09 1:35 ` Alexey Kardashevskiy
2014-05-09 8:06 ` [Qemu-devel] [PATCH] kvm: make one_reg helpers available for everyone Cornelia Huck
2014-05-13 10:01 ` Alexander Graf
2014-04-03 13:14 ` [Qemu-devel] [PATCH 2/4] spapr: Enable DABRX special register Alexey Kardashevskiy
2014-04-03 13:19 ` Alexander Graf
2014-04-04 6:13 ` Alexey Kardashevskiy
2014-04-04 12:21 ` Alexander Graf
2014-04-03 18:42 ` Tom Musta
2014-04-04 0:51 ` Alexey Kardashevskiy
2014-04-04 12:40 ` Tom Musta
2014-04-03 13:14 ` [Qemu-devel] [PATCH 3/4] KVM: PPC: Support POWER8 registers Alexey Kardashevskiy
2014-04-03 13:33 ` Alexander Graf
2014-04-03 19:12 ` Tom Musta
2014-04-04 6:58 ` Alexey Kardashevskiy
2014-04-04 12:23 ` Alexander Graf
2014-04-03 13:14 ` [Qemu-devel] [PATCH 4/4] spapr: Add support for time base offset migration Alexey Kardashevskiy
2014-04-10 12:34 ` Alexander Graf
2014-04-10 14:31 ` Alexey Kardashevskiy
2014-04-11 9:40 ` Alexander Graf [this message]
2014-04-11 21:55 ` Benjamin Herrenschmidt
2014-04-11 22:59 ` Alexander Graf
2014-04-11 23:03 ` [Qemu-devel] [Qemu-ppc] " Alexander Graf
2014-04-12 3:44 ` [Qemu-devel] " Alexey Kardashevskiy
2014-04-12 7:25 ` Alexander Graf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5347B892.5010107@suse.de \
--to=agraf@suse.de \
--cc=aik@ozlabs.ru \
--cc=paulus@samba.org \
--cc=qemu-devel@nongnu.org \
--cc=qemu-ppc@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).