All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Alexey Perevalov <a.perevalov@samsung.com>
Cc: dgilbert@redhat.com, qemu-devel@nongnu.org, i.maximets@samsung.com
Subject: Re: [Qemu-devel] [PATCH 4/6] migration: calculate downtime on dst side
Date: Tue, 25 Apr 2017 18:25:37 +0800	[thread overview]
Message-ID: <20170425102537.GD31709@pxdev.xzpeter.org> (raw)
In-Reply-To: <234489b0-d33b-9a49-ec17-b1a6823d4eb3@samsung.com>

On Tue, Apr 25, 2017 at 01:10:30PM +0300, Alexey Perevalov wrote:
> On 04/25/2017 11:24 AM, Peter Xu wrote:
> >On Fri, Apr 14, 2017 at 04:17:18PM +0300, Alexey Perevalov wrote:
> >
> >[...]
> >
> >>+/*
> >>+ * This function calculates downtime per cpu and trace it
> >>+ *
> >>+ *  Also it calculates total downtime as an interval's overlap,
> >>+ *  for many vCPU.
> >>+ *
> >>+ *  The approach is following:
> >>+ *  Initially intervals are represented in tree where key is
> >>+ *  pagefault address, and values:
> >>+ *   begin - page fault time
> >>+ *   end   - page load time
> >>+ *   cpus  - bit mask shows affected cpus
> >>+ *
> >>+ *  To calculate overlap on all cpus, intervals converted into
> >>+ *  array of points in time (downtime_points), the size of
> >>+ *  array is 2 * number of nodes in tree of intervals (2 array
> >>+ *  elements per one in element of interval).
> >>+ *  Each element is marked as end (E) or as start (S) of interval.
> >>+ *  The overlap downtime will be calculated for SE, only in case
> >>+ *  there is sequence S(0..N)E(M) for every vCPU.
> >>+ *
> >>+ * As example we have 3 CPU
> >>+ *
> >>+ *      S1        E1           S1               E1
> >>+ * -----***********------------xxx***************------------------------> CPU1
> >>+ *
> >>+ *             S2                E2
> >>+ * ------------****************xxx---------------------------------------> CPU2
> >>+ *
> >>+ *                         S3            E3
> >>+ * ------------------------****xxx********-------------------------------> CPU3
> >>+ *
> >>+ * We have sequence S1,S2,E1,S3,S1,E2,E3,E1
> >>+ * S2,E1 - doesn't match condition due to sequence S1,S2,E1 doesn't include CPU3
> >>+ * S3,S1,E2 - sequenece includes all CPUs, in this case overlap will be S1,E2
> >>+ * Legend of picture is following: * - means downtime per vCPU
> >>+ *                                 x - means overlapped downtime
> >>+ */
> >Not sure whether I get the point in this patch... iiuc we defined the
> >downtime here as the period when all vcpus are halted, right?
> >
> >If so, I have a few questions:
> >
> >- will this algorithm consume lots of memory? since I see we have one
> >   trace object per fault page address
> I don't think, it consumes too much, one DowntimeDuration
> takes (if I'm using bitmap_try_new, in this patch set I used pointer to
> uint64_t array to keep bitmap array,
> but I'm going to use include/qemu/bitmap.h, it works with pointers to long)
> 
> (2* int64 + (ROUND_UP((smp_cpus + BITS_PER_BYTE * sizeof(long) - 1 /
> (BITS_PER_BYTE * sizeof(long)))) * siezof(long)
> so it's about 16 + at least 4 bytes, per page fault,
> Lets assume we migration 256 vCPU and 256 Gb of ram and that ram is based on
> 4Kb pages - it's really bad case
> 16 + ((256 + 8 * 4 - 1) / ( 8 * 4 )) * 4 = 52 bytes
> (256 * 1024 * 1024 * 1024)/(4 * 1024) = 67108864 page faults, but not all of
> these pages will be pagefaulted, due to
> page pre-fetching
> 67108864 * 52 = 3489660928 bytes (3.5 Gb for that operation),
> but I have a doubt, who will use 4Kb pages for 256 Gb, probably
> 2Mb or 1G huge page will be chosen on x86, on ARM or other architecture it
> could be another values.

Hmm, it looks still big though...

> 
> >
> >- do we need to protect the tree to make sure there's no insertion
> >   when doing the calculation?
> I asked the same question when sent RFC patches,
> the answer here is no, we should not, due to right now,
> it's only one socket and one listen thread (maybe in future,
> it will be required, maybe after multi fd patch set),
> and calculation is doing synchronously right after migration complete.

Okay.

> 
> >
> >- if the only thing we want here is the "total downtime", whether
> >   below would work? (assuming N is vcpu numbers)
> >
> >   a. define array cpu_fault_addr[N], to store current faulted address
> >      for each vcpu. When vcpu X is running, cpu_fault_addr[X] should
> >      be 0.
> >
> >   b. when page fault happens on vcpu A, setup cpu_fault_addr[A] with
> >      corresponding fault address.
> at this time need to is fault happens for all another vCPU,
> and if it happens mark current time as total vCPU downtime start.
> 
> >   c. when page copy finished, loop over cpu_fault_addr[] to see
> >      whether that matches any, clear corresponding element if matched.
> so when page copy finished and mark for total vCPU is set,
> yes that interval is a part of total downtime.
> >
> >   Then, we can just measure the period when cpu_fault_addr[] is all
> >   set (by tracing at both b. and c.). Can this work?
> Yes, it works, but it's better to keep time - cpu_fault_time,
> address is not important here, it doesn't matter the reason of pagefault.

We still need the addresses? So that when we do COPY, we can check the
new page address against these stored ones, to know which vcpus to
clear the bit.

> 2 vCPU could fault due to access to one page, ok, it's not a problem, just
> store
> time when it was faulted.
> Looks like it's better algorithm, with lesser complexity,
> thank you a lot.

My pleasure. Thanks,

-- 
Peter Xu

  reply	other threads:[~2017-04-25 10:25 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CGME20170414131735eucas1p21f1fcadf426789276f567191372f7794@eucas1p2.samsung.com>
2017-04-14 13:17 ` [Qemu-devel] [PATCH 0/6] calculate downtime for postcopy live migration Alexey Perevalov
2017-04-14 13:17   ` [Qemu-devel] [PATCH 1/6] userfault: add pid into uffd_msg & update UFFD_FEATURE_* Alexey Perevalov
2017-04-14 13:17   ` [Qemu-devel] [PATCH 2/6] util: introduce glib-helper.c Alexey Perevalov
2017-04-14 16:05     ` Philippe Mathieu-Daudé
2017-04-17  7:07       ` Alexey
2017-04-21 10:01       ` Dr. David Alan Gilbert
2017-04-21 10:27     ` Peter Maydell
2017-04-21 15:10       ` Alexey
2017-04-21 15:49         ` Peter Maydell
2017-04-25 11:23           ` Dr. David Alan Gilbert
2017-04-14 13:17   ` [Qemu-devel] [PATCH 3/6] migration: add UFFD_FEATURE_THREAD_ID feature support Alexey Perevalov
2017-04-21 10:24     ` Dr. David Alan Gilbert
2017-04-21 15:22       ` Alexey
2017-04-24  8:03         ` Peter Xu
2017-04-24  8:12         ` Peter Xu
2017-04-24  8:38           ` Alexey
2017-04-24 17:10             ` Dr. David Alan Gilbert
2017-04-25  7:55               ` Alexey
2017-04-25 11:14                 ` Dr. David Alan Gilbert
2017-04-25 11:51                   ` Alexey Perevalov
2017-04-14 13:17   ` [Qemu-devel] [PATCH 4/6] migration: calculate downtime on dst side Alexey Perevalov
2017-04-21 12:00     ` Dr. David Alan Gilbert
2017-04-21 18:47       ` Alexey
2017-04-24 17:11         ` Dr. David Alan Gilbert
2017-04-22  9:49       ` [Qemu-devel] [PATCH 4/6] migration: calculate downtime on dst side (CPUMASK) Alexey
2017-04-24 17:13         ` Dr. David Alan Gilbert
2017-04-25  8:24     ` [Qemu-devel] [PATCH 4/6] migration: calculate downtime on dst side Peter Xu
2017-04-25 10:10       ` Alexey Perevalov
2017-04-25 10:25         ` Peter Xu [this message]
2017-04-25 10:47           ` Alexey Perevalov
2017-04-14 13:17   ` [Qemu-devel] [PATCH 5/6] migration: send postcopy downtime back to source Alexey Perevalov
2017-04-24 17:26     ` Dr. David Alan Gilbert
2017-04-25  5:51       ` Alexey
2017-04-14 13:17   ` [Qemu-devel] [PATCH 6/6] migration: detailed traces for postcopy Alexey Perevalov
2017-04-17 13:32     ` Philippe Mathieu-Daudé
2017-04-24 18:03     ` Dr. David Alan Gilbert
2017-04-17  2:32   ` [Qemu-devel] [PATCH 0/6] calculate downtime for postcopy live migration no-reply
2017-04-17  2:36   ` no-reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170425102537.GD31709@pxdev.xzpeter.org \
    --to=peterx@redhat.com \
    --cc=a.perevalov@samsung.com \
    --cc=dgilbert@redhat.com \
    --cc=i.maximets@samsung.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.