All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Alexey Perevalov <a.perevalov@samsung.com>
Cc: qemu-devel@nongnu.org, i.maximets@samsung.com, eblake@redhat.com,
	dgilbert@redhat.com
Subject: Re: [Qemu-devel] [PATCH V6 08/10] migration: calculate vCPU blocktime on dst side
Date: Wed, 24 May 2017 15:53:05 +0800	[thread overview]
Message-ID: <20170524075305.GG3873@pxdev.xzpeter.org> (raw)
In-Reply-To: <1495539071-12995-9-git-send-email-a.perevalov@samsung.com>

On Tue, May 23, 2017 at 02:31:09PM +0300, Alexey Perevalov wrote:
> This patch provides blocktime calculation per vCPU,
> as a summary and as a overlapped value for all vCPUs.
> 
> This approach was suggested by Peter Xu, as an improvements of
> previous approch where QEMU kept tree with faulted page address and cpus bitmask
> in it. Now QEMU is keeping array with faulted page address as value and vCPU
> as index. It helps to find proper vCPU at UFFD_COPY time. Also it keeps
> list for blocktime per vCPU (could be traced with page_fault_addr)
> 
> Blocktime will not calculated if postcopy_blocktime field of
> MigrationIncomingState wasn't initialized.
> 
> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> ---
>  migration/postcopy-ram.c | 102 ++++++++++++++++++++++++++++++++++++++++++++++-
>  migration/trace-events   |   5 ++-
>  2 files changed, 105 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index d647769..e70c44b 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -23,6 +23,7 @@
>  #include "postcopy-ram.h"
>  #include "sysemu/sysemu.h"
>  #include "sysemu/balloon.h"
> +#include <sys/param.h>
>  #include "qemu/error-report.h"
>  #include "trace.h"
>  
> @@ -577,6 +578,101 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
>      return 0;
>  }
>  
> +static int get_mem_fault_cpu_index(uint32_t pid)
> +{
> +    CPUState *cpu_iter;
> +
> +    CPU_FOREACH(cpu_iter) {
> +        if (cpu_iter->thread_id == pid) {
> +            return cpu_iter->cpu_index;
> +        }
> +    }
> +    trace_get_mem_fault_cpu_index(pid);
> +    return -1;
> +}
> +
> +static void mark_postcopy_blocktime_begin(uint64_t addr, uint32_t ptid,
> +        RAMBlock *rb)
> +{
> +    int cpu;
> +    unsigned long int nr_bit;
> +    MigrationIncomingState *mis = migration_incoming_get_current();
> +    PostcopyBlocktimeContext *dc = mis->blocktime_ctx;
> +    int64_t now_ms;
> +
> +    if (!dc || ptid == 0) {
> +        return;
> +    }
> +    cpu = get_mem_fault_cpu_index(ptid);
> +    if (cpu < 0) {
> +        return;
> +    }
> +    nr_bit = get_copied_bit_offset(addr);
> +    if (test_bit(nr_bit, mis->copied_pages)) {
> +        return;
> +    }
> +    now_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> +    if (dc->vcpu_addr[cpu] == 0) {
> +        atomic_inc(&dc->smp_cpus_down);
> +    }
> +
> +    atomic_xchg__nocheck(&dc->vcpu_addr[cpu], addr);
> +    atomic_xchg__nocheck(&dc->last_begin, now_ms);
> +    atomic_xchg__nocheck(&dc->page_fault_vcpu_time[cpu], now_ms);

Looks like this is not what you and Dave have discussed?

(Btw, sorry to have not followed the thread recently, so I just went
 over the discussion again...)

What I see that Dave suggested is (I copied from Dave's email):

blocktime_start:
   set CPU stall address
   check bitmap entry
     if set then zero stall-address

While here it is:

blocktime_start:
   check bitmap entry
     if set then return
   set CPU stall address

I don't think current version can really solve the risk condition. See
this possible sequence:

       receive-thread             fault-thread 
       --------------             ------------
                                  blocktime_start
                                    check bitmap entry,
                                      if set then return
       blocktime_end
         set bitmap entry
         read CPU stall address,
           if none-0 then zero it
                                    set CPU stall address [1]
        
Then imho the address set at [1] will be stall again until forever.

I think we should follow exactly what Dave has suggested.

And.. after a second thought, I am afraid even this would not satisfy
all risk conditions. What if we consider the UFFDIO_COPY ioctl in?
AFAIU after UFFDIO_COPY the faulted vcpu can be running again, then
the question is, can it quickly trigger another page fault?

Firstly, a workable sequence is (adding UFFDIO_COPY ioctl in, and
showing vcpu-thread X as well):

  receive-thread       fault-thread        vcpu-thread X
  --------------       ------------        -------------
                                           fault at addr A1
                       fault_addr[X]=A1
  UFFDIO_COPY page A1
  check fault_addr[X] with A1
    if match, clear fault_addr[X]
                                           vcpu X starts

This is fine.

While since "vcpu X starts" can be right after UFFDIO_COPY, can this
be possible?

  receive-thread       fault-thread        vcpu-thread X
  --------------       ------------        -------------
                                           fault at addr A1
                       fault_addr[X]=A1
  UFFDIO_COPY page A1
                                           vcpu X starts
                                           fault at addr A2
                       fault_addr[X]=A2
  check fault_addr[X] with A1
    if match, clear fault_addr[X]
        ^
        |
        +---------- here it will not match since now fault_addr[X]==A2

Then looks like fault_addr[X], which is currently A1, will stall
again?

(I feel like finally we may need something like a per-cpu lock... or I
 must have missed something)

> +
> +    trace_mark_postcopy_blocktime_begin(addr, dc, dc->page_fault_vcpu_time[cpu],
> +            cpu);
> +}
> +
> +static void mark_postcopy_blocktime_end(uint64_t addr)
> +{
> +    MigrationIncomingState *mis = migration_incoming_get_current();
> +    PostcopyBlocktimeContext *dc = mis->blocktime_ctx;
> +    int i, affected_cpu = 0;
> +    int64_t now_ms;
> +    bool vcpu_total_blocktime = false;
> +    unsigned long int nr_bit;
> +
> +    if (!dc) {
> +        return;
> +    }
> +    /* mark that page as copied */
> +    nr_bit = get_copied_bit_offset(addr);
> +    set_bit_atomic(nr_bit, mis->copied_pages);
> +
> +    now_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> +
> +    /* lookup cpu, to clear it,
> +     * that algorithm looks straighforward, but it's not
> +     * optimal, more optimal algorithm is keeping tree or hash
> +     * where key is address value is a list of  */
> +    for (i = 0; i < smp_cpus; i++) {
> +        uint64_t vcpu_blocktime = 0;
> +        if (atomic_fetch_add(&dc->vcpu_addr[i], 0) != addr) {
> +            continue;
> +        }
> +        atomic_xchg__nocheck(&dc->vcpu_addr[i], 0);

Why use *__nocheck() rather than atomic_xchg() or even atomic_read()?
Same thing happened a lot in current patch.

Thanks,

> +        vcpu_blocktime = now_ms -
> +            atomic_fetch_add(&dc->page_fault_vcpu_time[i], 0);
> +        affected_cpu += 1;
> +        /* we need to know is that mark_postcopy_end was due to
> +         * faulted page, another possible case it's prefetched
> +         * page and in that case we shouldn't be here */
> +        if (!vcpu_total_blocktime &&
> +            atomic_fetch_add(&dc->smp_cpus_down, 0) == smp_cpus) {
> +            vcpu_total_blocktime = true;
> +        }
> +        /* continue cycle, due to one page could affect several vCPUs */
> +        dc->vcpu_blocktime[i] += vcpu_blocktime;
> +    }
> +
> +    atomic_sub(&dc->smp_cpus_down, affected_cpu);
> +    if (vcpu_total_blocktime) {
> +        dc->total_blocktime += now_ms - atomic_fetch_add(&dc->last_begin, 0);
> +    }
> +    trace_mark_postcopy_blocktime_end(addr, dc, dc->total_blocktime);
> +}
> +
>  /*
>   * Handle faults detected by the USERFAULT markings
>   */
> @@ -654,8 +750,11 @@ static void *postcopy_ram_fault_thread(void *opaque)
>          rb_offset &= ~(qemu_ram_pagesize(rb) - 1);
>          trace_postcopy_ram_fault_thread_request(msg.arg.pagefault.address,
>                                                  qemu_ram_get_idstr(rb),
> -                                                rb_offset);
> +                                                rb_offset,
> +                                                msg.arg.pagefault.feat.ptid);
>  
> +        mark_postcopy_blocktime_begin((uintptr_t)(msg.arg.pagefault.address),
> +                msg.arg.pagefault.feat.ptid, rb);
>          /*
>           * Send the request to the source - we want to request one
>           * of our host page sizes (which is >= TPS)
> @@ -750,6 +849,7 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
>  
>          return -e;
>      }
> +    mark_postcopy_blocktime_end((uint64_t)(uintptr_t)host);
>  
>      trace_postcopy_place_page(host);
>      return 0;
> diff --git a/migration/trace-events b/migration/trace-events
> index 5b8ccf3..7bdadbb 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -112,6 +112,8 @@ process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
>  process_incoming_migration_co_postcopy_end_main(void) ""
>  migration_set_incoming_channel(void *ioc, const char *ioctype) "ioc=%p ioctype=%s"
>  migration_set_outgoing_channel(void *ioc, const char *ioctype, const char *hostname)  "ioc=%p ioctype=%s hostname=%s"
> +mark_postcopy_blocktime_begin(uint64_t addr, void *dd, int64_t time, int cpu) "addr 0x%" PRIx64 " dd %p time %" PRId64 " cpu %d"
> +mark_postcopy_blocktime_end(uint64_t addr, void *dd, int64_t time) "addr 0x%" PRIx64 " dd %p time %" PRId64
>  
>  # migration/rdma.c
>  qemu_rdma_accept_incoming_migration(void) ""
> @@ -188,7 +190,7 @@ postcopy_ram_enable_notify(void) ""
>  postcopy_ram_fault_thread_entry(void) ""
>  postcopy_ram_fault_thread_exit(void) ""
>  postcopy_ram_fault_thread_quit(void) ""
> -postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset) "Request for HVA=%" PRIx64 " rb=%s offset=%zx"
> +postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset, uint32_t pid) "Request for HVA=%" PRIx64 " rb=%s offset=%zx %u"
>  postcopy_ram_incoming_cleanup_closeuf(void) ""
>  postcopy_ram_incoming_cleanup_entry(void) ""
>  postcopy_ram_incoming_cleanup_exit(void) ""
> @@ -197,6 +199,7 @@ save_xbzrle_page_skipping(void) ""
>  save_xbzrle_page_overflow(void) ""
>  ram_save_iterate_big_wait(uint64_t milliconds, int iterations) "big wait: %" PRIu64 " milliseconds, %d iterations"
>  ram_load_complete(int ret, uint64_t seq_iter) "exit_code %d seq iteration %" PRIu64
> +get_mem_fault_cpu_index(uint32_t pid) "pid %u is not vCPU"
>  
>  # migration/exec.c
>  migration_exec_outgoing(const char *cmd) "cmd=%s"
> -- 
> 1.8.3.1
> 

-- 
Peter Xu

  reply	other threads:[~2017-05-24  7:53 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CGME20170523113120eucas1p2032ace2121aa8627067b6d7f03fbf482@eucas1p2.samsung.com>
2017-05-23 11:31 ` [Qemu-devel] [PATCH V6 00/10] calculate blocktime for postcopy live migration Alexey Perevalov
2017-05-23 11:31   ` [Qemu-devel] [PATCH V6 01/10] userfault: add pid into uffd_msg & update UFFD_FEATURE_* Alexey Perevalov
2017-05-23 11:31   ` [Qemu-devel] [PATCH V6 02/10] migration: pass MigrationIncomingState* into migration check functions Alexey Perevalov
2017-05-31 17:54     ` Dr. David Alan Gilbert
2017-06-05  5:59       ` Alexey Perevalov
2017-06-05  9:15         ` Dr. David Alan Gilbert
2017-05-23 11:31   ` [Qemu-devel] [PATCH V6 03/10] migration: fix hardcoded function name in error report Alexey Perevalov
2017-05-23 11:31   ` [Qemu-devel] [PATCH V6 04/10] migration: split ufd_version_check onto receive/request features part Alexey Perevalov
2017-05-24  2:36     ` Peter Xu
2017-05-24  6:45       ` Alexey
2017-05-24 11:33         ` Peter Xu
2017-05-24 11:47           ` Alexey Perevalov
2017-05-31 19:12             ` Dr. David Alan Gilbert
2017-05-23 11:31   ` [Qemu-devel] [PATCH V6 05/10] migration: introduce postcopy-blocktime capability Alexey Perevalov
2017-05-23 11:31   ` [Qemu-devel] [PATCH V6 06/10] migration: add postcopy blocktime ctx into MigrationIncomingState Alexey Perevalov
2017-05-24  3:31     ` Peter Xu
2017-06-05  6:31       ` Alexey Perevalov
2017-05-23 11:31   ` [Qemu-devel] [PATCH V6 07/10] migration: add bitmap for copied page Alexey Perevalov
2017-05-24  6:57     ` Peter Xu
2017-05-24  7:56       ` Alexey
2017-05-24 12:01         ` Peter Xu
2017-05-24 12:16           ` Alexey Perevalov
2017-05-24 23:30             ` Peter Xu
2017-05-25  6:28               ` Alexey Perevalov
2017-05-25  7:25                 ` Peter Xu
2017-05-31 19:25                   ` Dr. David Alan Gilbert
2017-05-23 11:31   ` [Qemu-devel] [PATCH V6 08/10] migration: calculate vCPU blocktime on dst side Alexey Perevalov
2017-05-24  7:53     ` Peter Xu [this message]
2017-05-24  9:37       ` Alexey
2017-05-24 11:22         ` Peter Xu
2017-05-24 11:37           ` Alexey Perevalov
2017-06-01 10:07           ` Dr. David Alan Gilbert
2017-06-01 10:50         ` Dr. David Alan Gilbert
2017-06-01 10:57     ` Dr. David Alan Gilbert
2017-06-07  7:34       ` Alexey Perevalov
2017-05-23 11:31   ` [Qemu-devel] [PATCH V6 09/10] migration: add postcopy total blocktime into query-migrate Alexey Perevalov
2017-06-01 11:35     ` Dr. David Alan Gilbert
2017-05-23 11:31   ` [Qemu-devel] [PATCH V6 10/10] migration: postcopy_blocktime documentation Alexey Perevalov
2017-06-01 11:37     ` Dr. David Alan Gilbert

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170524075305.GG3873@pxdev.xzpeter.org \
    --to=peterx@redhat.com \
    --cc=a.perevalov@samsung.com \
    --cc=dgilbert@redhat.com \
    --cc=eblake@redhat.com \
    --cc=i.maximets@samsung.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.