qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Cc: quintela@redhat.com, Alexey Kardashevskiy <aik@ozlabs.ru>,
	agraf@suse.de, qemu-devel@nongnu.org, qemu-ppc@nongnu.org,
	amit.shah@redhat.com
Subject: Re: [Qemu-devel] Migrating decrementer (was: Re: [PATCH 4/4] target-ppc: ensure we include the decrementer value during migration)
Date: Mon, 25 Jan 2016 22:10:08 +1100	[thread overview]
Message-ID: <20160125111008.GG32205@voom.redhat.com> (raw)
In-Reply-To: <56A5B712.9070808@ilande.co.uk>

[-- Attachment #1: Type: text/plain, Size: 6642 bytes --]

On Mon, Jan 25, 2016 at 05:48:02AM +0000, Mark Cave-Ayland wrote:
> On 18/01/16 04:51, David Gibson wrote:
> 
> > On Fri, Jan 15, 2016 at 05:46:10PM +0000, Mark Cave-Ayland wrote:
> >> On 12/01/16 02:44, David Gibson wrote:
> >>
> >>>>> In other words, isn't this just skipping the decrementer interrupts at
> >>>>> the qemu level rather than the guest level?
> >>>>>
> >>>>> It seems that instead we should be reconstructing the decrementer on
> >>>>> the destination based on an offset from the timebase.
> >>>>
> >>>> Well I haven't really looked at how time warping works during in
> >>>> migration for QEMU, however this seems to be the method used by
> >>>> hw/ppc/ppc.c's timebase_post_load() function but my understanding is
> >>>> that this isn't currently available for the g3beige/mac99 machines?
> >>>
> >>> Ah.. yes, it looks like the timebase migration stuff is only hooked in
> >>> on the pseries machine type.  As far as I can tell it should be
> >>> trivial to add it to other machines though - it doesn't appear to rely
> >>> on anything outside the common ppc timebase stuff.
> >>>
> >>>> Should the patch in fact do this but also add decrementer support? And
> >>>> if it did, would this have a negative effect on pseries?
> >>>
> >>> Yes, I think that's the right approach.  Note that rather than
> >>> duplicating the logic to adjust the decrementer over migration, it
> >>> should be possible to encode the decrementer as a diff from the
> >>> timebase across the migration.
> >>>
> >>> In fact.. I'm not sure it ever makes sense to store the decrementer
> >>> value as a direct value, since it's constantly changing - probably
> >>> makes more sense to derive it from the timebase whenever it is needed.
> >>>
> >>> As far as I know that should be fine for pseries.  I think the current
> >>> behaviour is probably technically wrong for pseries as well, but the
> >>> timing code of our Linux guests is robust enough to handle a small
> >>> displacement to the time of the next decrementer interrupt.
> >>
> >> I've had a bit of an experiment trying to implement something suitable,
> >> but I'm not 100% certain I've got this right.
> >>
> >> >From the code my understanding is that the timebase is effectively
> >> free-running and so if a migration takes 5s then you use tb_offset to
> >> calculate the difference between the timebase before migration, and
> >> subsequently apply the offset for all future reads of the timebase for
> >> the lifetime of the CPU (i.e. the migrated guest is effectively living
> >> at a point in the past where the timebase is consistent).
> > 
> > Um.. no.  At least in the usual configuration, the timebase represents
> > real, wall-clock time, so we expect it to jump forward across the
> > migration downtime.  This is important because the guest will use the
> > timebase to calculate real time differences.
> > 
> > However, the absolute value of the timebase may be different on the
> > *host* between source and destination for migration.  So what we need
> > to do is before migration we work out the delta between host and guest
> > notions of wall clock time (as defined by the guest timebase), and
> > transfer that in the migration stream.
> > 
> > On the destination we initialize the guest timebase so that the guest
> > maintains the same realtime offset from the host.  This means that as
> > long as source and destination system time is synchronized, guest
> > real-time tracking will continue correctly across the migration.
> > 
> > We do also make sure that the guest timebase never goes backwards, but
> > that would only happen if the source and destination host times were
> > badly out of sync.
> 
> I had a poke at trying to include the timebase state in the Mac machines
> but found that enabling this caused migration to freeze, even on top of
> my existing (known working) patchset.
> 
> Looking closer at the code, I don't think it can work on an x86 TCG host
> in its current form since the host/guest clocks are used interchangeably
> in places, e.g. in timebase_pre_save() we have this:
> 
> uint64_t ticks = cpu_get_host_ticks();
> ...
> tb->guest_timebase = ticks + first_ppc_cpu->env.tb_env->tb_offset;
> 
> So this implies that guest_timebase is set in higher resolution host
> ticks. But then in timebase_post_load() we have this:
> 
> migration_duration_tb = muldiv64(migration_duration_ns, freq,
>     NANOSECONDS_PER_SECOND);
> 
> which calculates the migration time in timebase ticks but then:
> 
> guest_tb = tb_remote->guest_timebase + MIN(0, migration_duration_tb);
> tb_off_adj = guest_tb - cpu_get_host_ticks();
> 
> which mixes tb_remote->guest_timebase in host ticks with
> migration_duration_tb in timebase ticks.
> 
> I think that tb_off_adj should be in host ticks so should
> migration_duration_tb be dropped and guest_tb be calculated from
> migration_duration_ns instead? At least given the current mixing of host
> ticks and timebase ticks, I think this can only work correctly on PPC
> where the two are the same.
> 
> One other thing I noticed is that this code "broke" my normal testing
> pattern when I tend to launch qemu-system-ppc with -loadvm foo -S with
> the machine stopped. In this case the migration duration calculation is
> performed at the moment state is restored, and not the point where the
> CPU is started.
> 
> Should the adjustment calculation not take place in a cpu_start()
> function instead? Otherwise at the point when I resume the paused
> machine the tb_off_adj calculation will be based in the past and
> therefore wrong.

Um.. so the migration duration is a complete red herring, regardless
of the units.

Remember, we only ever compute the guest timebase value at the moment
the guest requests it - actually maintaining a current timebase value
makes sense in hardware, but would be nuts in software.

The timebase is a function of real, wall-clock time, and the migration
destination has a notion of wall-clock time without reference to the
source.

So what you need to transmit for migration is enough information to
compute the guest timebase from real-time - essentially just an offset
between real-time and the timebase.

The guest can potentially observe the migration duration as a jump in
timebase values, but qemu doesn't need to do any calculations with it.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

  reply	other threads:[~2016-01-25 11:09 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-01-06 18:22 [Qemu-devel] [PATCH 0/4] target-ppc: migration fixups (TCG related) Mark Cave-Ayland
2016-01-06 18:22 ` [Qemu-devel] [PATCH 1/4] target-ppc: add CPU IRQ state to PPC VMStateDescription Mark Cave-Ayland
2016-01-08  2:20   ` Alexey Kardashevskiy
2016-01-06 18:22 ` [Qemu-devel] [PATCH 2/4] target-ppc: use cpu_write_xer() helper in cpu_post_load Mark Cave-Ayland
2016-01-08  2:25   ` Alexey Kardashevskiy
2016-01-18  3:12     ` [Qemu-devel] [Qemu-ppc] " David Gibson
2016-01-18  8:31       ` Mark Cave-Ayland
2016-01-19  0:11         ` David Gibson
2016-01-06 18:22 ` [Qemu-devel] [PATCH 3/4] target-ppc: add CPU access_type into the migration stream Mark Cave-Ayland
2016-01-08  2:29   ` Alexey Kardashevskiy
2016-01-25 19:03     ` Alexander Graf
2016-01-27  1:10       ` Alexey Kardashevskiy
2016-01-06 18:22 ` [Qemu-devel] [PATCH 4/4] target-ppc: ensure we include the decrementer value during migration Mark Cave-Ayland
2016-01-08  2:47   ` Alexey Kardashevskiy
2016-01-08 14:21     ` Mark Cave-Ayland
2016-01-11  1:18       ` Alexey Kardashevskiy
2016-01-11  4:55         ` David Gibson
2016-01-11  7:43           ` Mark Cave-Ayland
2016-01-12  2:44             ` David Gibson
2016-01-15 17:46               ` Mark Cave-Ayland
2016-01-18  4:51                 ` David Gibson
2016-01-25  5:48                   ` [Qemu-devel] Migrating decrementer (was: Re: [PATCH 4/4] target-ppc: ensure we include the decrementer value during migration) Mark Cave-Ayland
2016-01-25 11:10                     ` David Gibson [this message]
2016-01-25 17:20                       ` [Qemu-devel] [Qemu-ppc] " BALATON Zoltan
2016-01-26  5:51                         ` David Gibson
2016-01-26 22:31                       ` [Qemu-devel] Migrating decrementer Mark Cave-Ayland
2016-01-26 23:08                         ` Mark Cave-Ayland
2016-02-01  0:52                         ` David Gibson
2016-02-02 23:41                           ` Mark Cave-Ayland
2016-02-03  4:59                             ` David Gibson
2016-02-03  5:43                               ` Alexander Graf
2016-02-23 21:27                               ` Mark Cave-Ayland
2016-02-24  0:47                                 ` David Gibson
2016-02-24 12:31                                   ` Juan Quintela
2016-02-25  0:19                                     ` David Gibson
2016-02-25  4:33                                     ` Mark Cave-Ayland
2016-02-25  5:00                                       ` [Qemu-devel] [Qemu-ppc] " Mark Cave-Ayland
2016-02-25  9:50                                         ` Mark Cave-Ayland
2016-02-26  4:35                                           ` David Gibson
2016-02-26 12:29                                             ` Mark Cave-Ayland
2016-02-29  3:57                                               ` David Gibson
2016-02-29 20:21                                                 ` Mark Cave-Ayland
2016-03-10  4:57                                                   ` David Gibson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160125111008.GG32205@voom.redhat.com \
    --to=david@gibson.dropbear.id.au \
    --cc=agraf@suse.de \
    --cc=aik@ozlabs.ru \
    --cc=amit.shah@redhat.com \
    --cc=mark.cave-ayland@ilande.co.uk \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    --cc=quintela@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).