From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Alexey Perevalov <a.perevalov@samsung.com>
Cc: qemu-devel@nongnu.org, i.maximets@samsung.com
Subject: Re: [Qemu-devel] [RFC PATCH 0/2] Calcuate downtime for postcopy live migration
Date: Tue, 4 Apr 2017 20:06:48 +0100
Message-ID: <20170404190648.GN2147@work-vm>
In-Reply-To: <1489850003-5652-1-git-send-email-a.perevalov@samsung.com>

* Alexey Perevalov (a.perevalov@samsung.com) wrote:
> Hi David,
> 
> I already asked you about downtime calculation for postcopy live migration.
> As I remember, you said it's not worth calculating it per vCPU, or maybe I
> understood you incorrectly. I decided to prove it could be useful.

Thanks - apologies for taking so long to look at it.
Some higher level thoughts:
   a) It needs to be switchable - the tree etc look like they could use a fair
      amount of RAM.
   b) The cpu bitmask is a problem given we can have more than 64 CPUs.
   c) Tracing the pages that took the longest can be interesting - I've done
      graphs of latencies before - you get fun things like watching messes
      where you lose requests and the page eventually arrives anyway after
      a few seconds.
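
On point b), a minimal sketch of a per-vCPU mask that is not limited to
64 bits. It assumes plain C and a hypothetical CpuMask type; none of these
names come from the patch. Inside QEMU the existing helpers in
include/qemu/bitmap.h would probably be the more natural choice.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical type: one bit per vCPU, sized at run time. */
typedef struct {
    unsigned long *words;
    size_t nbits;
} CpuMask;

/* Allocate a zeroed mask for nbits vCPUs (nbits is assumed to be > 0). */
static bool cpumask_init(CpuMask *m, size_t nbits)
{
    size_t nwords = (nbits + BITS_PER_WORD - 1) / BITS_PER_WORD;

    m->words = calloc(nwords, sizeof(unsigned long));
    m->nbits = nbits;
    return m->words != NULL;
}

/* Mark a vCPU, e.g. as blocked on a userfault. */
static void cpumask_set(CpuMask *m, size_t cpu)
{
    m->words[cpu / BITS_PER_WORD] |= 1UL << (cpu % BITS_PER_WORD);
}

/* Unmark a vCPU, e.g. once its page has arrived. */
static void cpumask_clear(CpuMask *m, size_t cpu)
{
    m->words[cpu / BITS_PER_WORD] &= ~(1UL << (cpu % BITS_PER_WORD));
}

static bool cpumask_test(const CpuMask *m, size_t cpu)
{
    return (m->words[cpu / BITS_PER_WORD] >> (cpu % BITS_PER_WORD)) & 1UL;
}

/* True when every vCPU is marked, i.e. the guest makes no progress. */
static bool cpumask_all_set(const CpuMask *m)
{
    size_t full = m->nbits / BITS_PER_WORD;
    size_t rest = m->nbits % BITS_PER_WORD;

    for (size_t i = 0; i < full; i++) {
        if (m->words[i] != ~0UL) {
            return false;
        }
    }
    if (rest) {
        /* Only the low 'rest' bits of the last word are in use. */
        unsigned long mask = (1UL << rest) - 1;

        if ((m->words[full] & mask) != mask) {
            return false;
        }
    }
    return true;
}
```

The mask grows in whole words with the vCPU count, so 65 or 300 vCPUs
cost two or five words on a 64-bit host rather than hitting a hard limit.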
      
> This patch set is based on commit 272d7dee5951f926fad1911f2f072e5915cdcba0
> of QEMU master branch. It requires commit into Andreas git repository
> "userfaultfd: provide pid in userfault uffd_msg"
> 
> When I tested it I found the following points strange:
> 1. The first userfault always occurs due to access to ram in
> vapic_map_rom_writable; all vCPUs are sleeping at this time

That's probably not too surprising - I bet the vapic device load code does that?
I've sometimes wondered about preloading the queue on the source with some pages
that we know will need to be loaded early.

> 2. The latter half of all userfaults were initiated by kworkers, which is why
> I doubted whether current in handle_userfault inside the kernel is the proper
> task_struct for the pagefault initiator. All vCPUs were sleeping at that moment.

When you say kworkers - which ones?  I wonder what they are - perhaps incoming network
packets using vhost?

> 3. Also there is a discrepancy between the vCPU state and the real vCPU thread state.

What do you mean by that?

> This patch is just for showing an idea; if you are OK with this idea, the non-RFC
> patch will not include proc access and a lot of traces.
> Also I think it is worth guarding postcopy_downtime in MigrationIncomingState and
> returning the calculated downtime to the src, where query-migrate will be invoked.

I don't think it's worth it; we can always ask the destination, and sending stuff
back to the source is probably messy - especially at the end.

Dave

> Alexey Perevalov (2):
>   userfault: add pid into uffd_msg
>   migration: calculate downtime on dst side
> 
>  include/migration/migration.h     |  11 ++
>  linux-headers/linux/userfaultfd.h |   1 +
>  migration/migration.c             | 238 +++++++++++++++++++++++++++++++++++++-
>  migration/postcopy-ram.c          |  61 +++++++++-
>  migration/savevm.c                |   2 +
>  migration/trace-events            |  10 +-
>  6 files changed, 319 insertions(+), 4 deletions(-)
> 
> -- 
> 1.8.3.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
