qemu-devel.nongnu.org archive mirror
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Neil Skrypuch <neil@tembosocial.com>, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [regression] Clock jump on VM migration
Date: Fri, 8 Feb 2019 09:48:19 +0000	[thread overview]
Message-ID: <20190208094818.GA2608@work-vm> (raw)
In-Reply-To: <20190208062441.GF16257@stefanha-x1.localdomain>

* Stefan Hajnoczi (stefanha@redhat.com) wrote:
> On Thu, Feb 07, 2019 at 05:33:25PM -0500, Neil Skrypuch wrote:
> 
> Thanks for your email!
> 
> Please post your QEMU command-line.
> 
> > The clock jump numbers above are from NTP, but you can see that they are quite 
> > close to the amount of time spent in raw_co_invalidate_cache. So, it looks 
> > like flushing the cache is just taking a long time and stalling the guest, 
> > which causes the clock jump. This isn't too surprising as the entire disk 
> > image was just written as part of the block mirror and would likely still be 
> > in the cache.
> > 
> > I see the use case for this feature, but I don't think it applies here, as 
> > we're not technically using shared storage. I believe an option to toggle this 
> > behaviour on/off and/or some sort of heuristic to guess whether or not it 
> > should be enabled by default would be in order here.
> 
> It would be good to figure out how to perform the flush without
> affecting guest time at all.  The clock jump will also inconvenience
> users who do need the flush, so I'd rather not work around the clock jump
> for a subset of users only.

One thing that makes Neil's setup different is that, with the source
and destination on the same host, the fadvise is bound to drop pages
that are actually still in use by the source.

But I'm also curious about the point in the migration at which we call
the invalidate, and hence which threads get held up, and in what state.

Neil: Another printf would also be interesting, between the
bdrv_co_flush and the posix_fadvise; I'm assuming it's the
bdrv_co_flush that's taking the time, but it would be good to check.

Dave

> Stefan


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


Thread overview: 6+ messages
2019-02-07 22:33 [Qemu-devel] [regression] Clock jump on VM migration Neil Skrypuch
2019-02-08  6:24 ` Stefan Hajnoczi
2019-02-08  9:48   ` Dr. David Alan Gilbert [this message]
2019-02-08 22:52     ` Neil Skrypuch
2019-02-12  2:56       ` Stefan Hajnoczi
2019-02-26 10:45         ` Stefan Hajnoczi
