From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:56697) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WYXwy-0000Qd-IT for qemu-devel@nongnu.org; Fri, 11 Apr 2014 05:40:50 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1WYXws-0003tK-9a for qemu-devel@nongnu.org; Fri, 11 Apr 2014 05:40:44 -0400 Message-ID: <5347B892.5010107@suse.de> Date: Fri, 11 Apr 2014 11:40:34 +0200 From: Alexander Graf MIME-Version: 1.0 References: <1396530891-6352-1-git-send-email-aik@ozlabs.ru> <1396530891-6352-5-git-send-email-aik@ozlabs.ru> <53468FBD.6040906@suse.de> <5346AB55.8030807@ozlabs.ru> In-Reply-To: <5346AB55.8030807@ozlabs.ru> Content-Type: text/plain; charset=KOI8-R; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH 4/4] spapr: Add support for time base offset migration List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Alexey Kardashevskiy , qemu-devel@nongnu.org Cc: Paul Mackerras , qemu-ppc On 10.04.14 16:31, Alexey Kardashevskiy wrote: > On 04/10/2014 10:34 PM, Alexander Graf wrote: >> On 03.04.14 15:14, Alexey Kardashevskiy wrote: >>> This allows guests to have a different timebase origin from the host. >>> >>> This is needed for migration, where a guest can migrate from one host >>> to another and the two hosts might have a different timebase origin. >>> However, the timebase seen by the guest must not go backwards, and >>> should go forwards only by a small amount corresponding to the time >>> taken for the migration. >>> >>> This is only supported for recent POWER hardware which has the TBU40 >>> (timebase upper 40 bits) register. That includes POWER6, 7, 8 but not >>> 970. >>> >>> This adds kvm_access_one_reg() to access a special register which is not >>> in env->spr. >>> >>> The feature must be present in the host kernel. >>> >>> Signed-off-by: Alexey Kardashevskiy >>> --- >>> Changes: >>> v4: >>> * made it per machine timebase offser rather than per CPU >>> >>> v3: >>> * kvm_access_one_reg moved out to a separate patch >>> * tb_offset and host_timebase were replaced with guest_timebase as >>> the destionation does not really care of offset on the source >>> >>> v2: >>> * bumped the vmstate_ppc_cpu version >>> * defined version for the env.tb_env field >>> --- >>> hw/ppc/ppc.c | 120 >>> +++++++++++++++++++++++++++++++++++++++++++++++++ >>> hw/ppc/spapr.c | 3 +- >>> include/hw/ppc/spapr.h | 2 + >>> target-ppc/cpu-qom.h | 16 +++++++ >>> target-ppc/kvm.c | 5 +++ >>> target-ppc/machine.c | 4 +- >>> trace-events | 3 ++ >>> 7 files changed, 151 insertions(+), 2 deletions(-) >>> >>> diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c >>> index 9c2a132..b51db1b 100644 >>> --- a/hw/ppc/ppc.c >>> +++ b/hw/ppc/ppc.c >>> @@ -29,9 +29,11 @@ >>> #include "sysemu/cpus.h" >>> #include "hw/timer/m48t59.h" >>> #include "qemu/log.h" >>> +#include "qemu/error-report.h" >>> #include "hw/loader.h" >>> #include "sysemu/kvm.h" >>> #include "kvm_ppc.h" >>> +#include "trace.h" >>> //#define PPC_DEBUG_IRQ >>> //#define PPC_DEBUG_TB >>> @@ -797,6 +799,124 @@ static void cpu_ppc_set_tb_clk (void *opaque, >>> uint32_t freq) >>> cpu_ppc_store_purr(cpu, 0x0000000000000000ULL); >>> } >>> +/* >>> + * Calculate timebase on the destination side of migration >>> + * >>> + * We calculate new timebase offset as shown below: >>> + * 1) Gtb2 = Gtb1 + max(tod2 - tod1, 0) >>> + * Gtb2 = tb2 + off2 >>> + * 2) tb2 + off2 = Gtb1 + max(tod2 - tod1, 0) >>> + * 3) off2 = Gtb1 - tb2 + max(tod2 - tod1, 0) >>> + * >>> + * where: >>> + * Gtb2 - destination guest timebase >>> + * tb2 - destination host timebase >>> + * off2 - destination timebase offset >>> + * tod2 - destination time of the day >>> + * Gtb1 - source guest timebase >>> + * tod1 - source time of the day >>> + * >>> + * The result we want is in @off2 >>> + * >>> + * Two conditions must be met for @off2: >>> + * 1) off2 must be multiple of 2^24 ticks as it will be set via TBU40 SPR >>> + * 2) Gtb2 >= Gtb1 >>> + */ >>> +static int64_t cpu_ppc_adjust_tb_offset(PPCTimebaseOffset *tb) >>> +{ >>> + uint64_t tb2, tod2; >>> + int64_t off2; >>> + int ratio = tb->freq / 1000000; >>> + struct timeval tv; >>> + >>> + tb2 = cpu_get_real_ticks(); >>> + gettimeofday(&tv, NULL); >>> + tod2 = tv.tv_sec * 1000000 + tv.tv_usec; >>> + >>> + off2 = tb->guest_timebase - tb2; >>> + if ((tod2 > tb->time_of_the_day) && >>> + (tod2 - tb->time_of_the_day < 1000000)) { >>> + off2 += (tod2 - tb->time_of_the_day) * ratio; >>> + } >>> + off2 = ROUND_UP(off2, 1 << 24); >>> + >>> + return off2; >>> +} >> I *think* what you're trying to say here is that you want >> >> assert(source_timebase_freq == timebase_freq); >> >> migration_duration_ns = host_ns - source_host_ns; >> guest_tb = source_guest_tb + ns_scaled_to_tb(min(0, migration_duration_ns); >> kvm_set_guest_tb(guest_tb); >> -> kvm_set_one_reg(KVM_REG_PPC_TB_OFFSET, guest_tb - mftb()); >> >> But I honestly have not managed to read that from the code. Either this >> really is what you're trying to do and the code is just very hard to read >> (which means it needs to be written more easily) or you're doing something >> different which I don't understand. > > Is this any better? > > static int64_t cpu_ppc_adjust_tb_offset(PPCTimebaseOffset *tb) > { > struct timeval tv; > int64_t migration_duration_ns, migration_duration_tb; If I read the code correctly you're operating in us, not ns, no? > int64_t guest_tb, host_ns; > int ratio = tb->freq / 1000000; #define USEC_PER_SEC 1000000 You're also losing quite a bit of precision here, no? > int64_t off; > > gettimeofday(&tv, NULL); > host_ns = tv.tv_sec * 1000000 + tv.tv_usec; host_us = get_clock_realtime() / 1000; ? > migration_duration_ns = MIN(1000000, Why is it MIN(1000000)? Is a migration supposed to last at least 1sec? Why? > host_ns - tb->time_of_the_day); > migration_duration_tb = migration_duration_ns * ratio; > > guest_tb = tb->guest_timebase + MIN(0, migration_duration_tb); > > off = guest_tb - cpu_get_real_ticks(); It's probably easier to read when you create one function that just returns a guest TB value adjusted by the time the last measurement happened. The fact that the KVM register wants an offset is a KVM implementation detail. The TB adjustment should happen generically. > > return off; > } > > >> We also designed the PPC_TB_OFFSET ONE_REG in a way that it always rounds >> up to its 40 bit granularity, so no need to do this in QEMU. In fact, we >> don't want to do it in QEMU in case there will be a more fine-grained SPR >> in the future. > I believe rounding was not in the kernel when I started making this... > > >> And from all I understand the timebase frequency is now architecturally >> specified, so it won't change for newer cores, no? > I asked people in our lab. Everyone says that it should not change but > noone would bet on it too much. When it changes and you want to live migrate, you'll need to implement a guest TB scale register and the whole idea of a "TB offset" ONE_REG is absurd. The more I think about this the more I realize we should have created a "guest TB value", not a "guest TB offset" ONE_REG. > > >> And if we migrate TCG >> guests it will be the same between two hosts. > And G5 uses 33333333. I really do not understand why it is bad to > send-and-check timer frequency. Why? Because the guest will continue to run at a different TB frequency on the new host and break. > > > Is the rest ok? Thanks for review! Not sure. Please rework everything according to the comments, make the code readable enough that your wife understands it and then resend it :). Alex