From: David Gibson <david@gibson.dropbear.id.au>
To: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Cc: quintela@redhat.com, Alexey Kardashevskiy <aik@ozlabs.ru>,
agraf@suse.de, qemu-devel@nongnu.org, qemu-ppc@nongnu.org,
amit.shah@redhat.com
Subject: Re: [Qemu-devel] Migrating decrementer (was: Re: [PATCH 4/4] target-ppc: ensure we include the decrementer value during migration)
Date: Mon, 25 Jan 2016 22:10:08 +1100 [thread overview]
Message-ID: <20160125111008.GG32205@voom.redhat.com> (raw)
In-Reply-To: <56A5B712.9070808@ilande.co.uk>
[-- Attachment #1: Type: text/plain, Size: 6642 bytes --]
On Mon, Jan 25, 2016 at 05:48:02AM +0000, Mark Cave-Ayland wrote:
> On 18/01/16 04:51, David Gibson wrote:
>
> > On Fri, Jan 15, 2016 at 05:46:10PM +0000, Mark Cave-Ayland wrote:
> >> On 12/01/16 02:44, David Gibson wrote:
> >>
> >>>>> In other words, isn't this just skipping the decrementer interrupts at
> >>>>> the qemu level rather than the guest level?
> >>>>>
> >>>>> It seems that instead we should be reconstructing the decrementer on
> >>>>> the destination based on an offset from the timebase.
> >>>>
> >>>> Well I haven't really looked at how time warping works during in
> >>>> migration for QEMU, however this seems to be the method used by
> >>>> hw/ppc/ppc.c's timebase_post_load() function but my understanding is
> >>>> that this isn't currently available for the g3beige/mac99 machines?
> >>>
> >>> Ah.. yes, it looks like the timebase migration stuff is only hooked in
> >>> on the pseries machine type. As far as I can tell it should be
> >>> trivial to add it to other machines though - it doesn't appear to rely
> >>> on anything outside the common ppc timebase stuff.
> >>>
> >>>> Should the patch in fact do this but also add decrementer support? And
> >>>> if it did, would this have a negative effect on pseries?
> >>>
> >>> Yes, I think that's the right approach. Note that rather than
> >>> duplicating the logic to adjust the decrementer over migration, it
> >>> should be possible to encode the decrementer as a diff from the
> >>> timebase across the migration.
> >>>
> >>> In fact.. I'm not sure it ever makes sense to store the decrementer
> >>> value as a direct value, since it's constantly changing - probably
> >>> makes more sense to derive it from the timebase whenever it is needed.
> >>>
> >>> As far as I know that should be fine for pseries. I think the current
> >>> behaviour is probably technically wrong for pseries as well, but the
> >>> timing code of our Linux guests is robust enough to handle a small
> >>> displacement to the time of the next decrementer interrupt.
> >>
> >> I've had a bit of an experiment trying to implement something suitable,
> >> but I'm not 100% certain I've got this right.
> >>
> >> >From the code my understanding is that the timebase is effectively
> >> free-running and so if a migration takes 5s then you use tb_offset to
> >> calculate the difference between the timebase before migration, and
> >> subsequently apply the offset for all future reads of the timebase for
> >> the lifetime of the CPU (i.e. the migrated guest is effectively living
> >> at a point in the past where the timebase is consistent).
> >
> > Um.. no. At least in the usual configuration, the timebase represents
> > real, wall-clock time, so we expect it to jump forward across the
> > migration downtime. This is important because the guest will use the
> > timebase to calculate real time differences.
> >
> > However, the absolute value of the timebase may be different on the
> > *host* between source and destination for migration. So what we need
> > to do is before migration we work out the delta between host and guest
> > notions of wall clock time (as defined by the guest timebase), and
> > transfer that in the migration stream.
> >
> > On the destination we initialize the guest timebase so that the guest
> > maintains the same realtime offset from the host. This means that as
> > long as source and destination system time is synchronized, guest
> > real-time tracking will continue correctly across the migration.
> >
> > We do also make sure that the guest timebase never goes backwards, but
> > that would only happen if the source and destination host times were
> > badly out of sync.
>
> I had a poke at trying to include the timebase state in the Mac machines
> but found that enabling this caused migration to freeze, even on top of
> my existing (known working) patchset.
>
> Looking closer at the code, I don't think it can work on an x86 TCG host
> in its current form since the host/guest clocks are used interchangeably
> in places, e.g. in timebase_pre_save() we have this:
>
> uint64_t ticks = cpu_get_host_ticks();
> ...
> tb->guest_timebase = ticks + first_ppc_cpu->env.tb_env->tb_offset;
>
> So this implies that guest_timebase is set in higher resolution host
> ticks. But then in timebase_post_load() we have this:
>
> migration_duration_tb = muldiv64(migration_duration_ns, freq,
> NANOSECONDS_PER_SECOND);
>
> which calculates the migration time in timebase ticks but then:
>
> guest_tb = tb_remote->guest_timebase + MIN(0, migration_duration_tb);
> tb_off_adj = guest_tb - cpu_get_host_ticks();
>
> which mixes tb_remote->guest_timebase in host ticks with
> migration_duration_tb in timebase ticks.
>
> I think that tb_off_adj should be in host ticks so should
> migration_duration_tb be dropped and guest_tb be calculated from
> migration_duration_ns instead? At least given the current mixing of host
> ticks and timebase ticks, I think this can only work correctly on PPC
> where the two are the same.
>
> One other thing I noticed is that this code "broke" my normal testing
> pattern when I tend to launch qemu-system-ppc with -loadvm foo -S with
> the machine stopped. In this case the migration duration calculation is
> performed at the moment state is restored, and not the point where the
> CPU is started.
>
> Should the adjustment calculation not take place in a cpu_start()
> function instead? Otherwise at the point when I resume the paused
> machine the tb_off_adj calculation will be based in the past and
> therefore wrong.
Um.. so the migration duration is a complete red herring, regardless
of the units.
Remember, we only ever compute the guest timebase value at the moment
the guest requests it - actually maintaining a current timebase value
makes sense in hardware, but would be nuts in software.
The timebase is a function of real, wall-clock time, and the migration
destination has a notion of wall-clock time without reference to the
source.
So what you need to transmit for migration is enough information to
compute the guest timebase from real-time - essentially just an offset
between real-time and the timebase.
The guest can potentially observe the migration duration as a jump in
timebase values, but qemu doesn't need to do any calculations with it.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
next prev parent reply other threads:[~2016-01-25 11:09 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-01-06 18:22 [Qemu-devel] [PATCH 0/4] target-ppc: migration fixups (TCG related) Mark Cave-Ayland
2016-01-06 18:22 ` [Qemu-devel] [PATCH 1/4] target-ppc: add CPU IRQ state to PPC VMStateDescription Mark Cave-Ayland
2016-01-08 2:20 ` Alexey Kardashevskiy
2016-01-06 18:22 ` [Qemu-devel] [PATCH 2/4] target-ppc: use cpu_write_xer() helper in cpu_post_load Mark Cave-Ayland
2016-01-08 2:25 ` Alexey Kardashevskiy
2016-01-18 3:12 ` [Qemu-devel] [Qemu-ppc] " David Gibson
2016-01-18 8:31 ` Mark Cave-Ayland
2016-01-19 0:11 ` David Gibson
2016-01-06 18:22 ` [Qemu-devel] [PATCH 3/4] target-ppc: add CPU access_type into the migration stream Mark Cave-Ayland
2016-01-08 2:29 ` Alexey Kardashevskiy
2016-01-25 19:03 ` Alexander Graf
2016-01-27 1:10 ` Alexey Kardashevskiy
2016-01-06 18:22 ` [Qemu-devel] [PATCH 4/4] target-ppc: ensure we include the decrementer value during migration Mark Cave-Ayland
2016-01-08 2:47 ` Alexey Kardashevskiy
2016-01-08 14:21 ` Mark Cave-Ayland
2016-01-11 1:18 ` Alexey Kardashevskiy
2016-01-11 4:55 ` David Gibson
2016-01-11 7:43 ` Mark Cave-Ayland
2016-01-12 2:44 ` David Gibson
2016-01-15 17:46 ` Mark Cave-Ayland
2016-01-18 4:51 ` David Gibson
2016-01-25 5:48 ` [Qemu-devel] Migrating decrementer (was: Re: [PATCH 4/4] target-ppc: ensure we include the decrementer value during migration) Mark Cave-Ayland
2016-01-25 11:10 ` David Gibson [this message]
2016-01-25 17:20 ` [Qemu-devel] [Qemu-ppc] " BALATON Zoltan
2016-01-26 5:51 ` David Gibson
2016-01-26 22:31 ` [Qemu-devel] Migrating decrementer Mark Cave-Ayland
2016-01-26 23:08 ` Mark Cave-Ayland
2016-02-01 0:52 ` David Gibson
2016-02-02 23:41 ` Mark Cave-Ayland
2016-02-03 4:59 ` David Gibson
2016-02-03 5:43 ` Alexander Graf
2016-02-23 21:27 ` Mark Cave-Ayland
2016-02-24 0:47 ` David Gibson
2016-02-24 12:31 ` Juan Quintela
2016-02-25 0:19 ` David Gibson
2016-02-25 4:33 ` Mark Cave-Ayland
2016-02-25 5:00 ` [Qemu-devel] [Qemu-ppc] " Mark Cave-Ayland
2016-02-25 9:50 ` Mark Cave-Ayland
2016-02-26 4:35 ` David Gibson
2016-02-26 12:29 ` Mark Cave-Ayland
2016-02-29 3:57 ` David Gibson
2016-02-29 20:21 ` Mark Cave-Ayland
2016-03-10 4:57 ` David Gibson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160125111008.GG32205@voom.redhat.com \
--to=david@gibson.dropbear.id.au \
--cc=agraf@suse.de \
--cc=aik@ozlabs.ru \
--cc=amit.shah@redhat.com \
--cc=mark.cave-ayland@ilande.co.uk \
--cc=qemu-devel@nongnu.org \
--cc=qemu-ppc@nongnu.org \
--cc=quintela@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).