qemu-devel.nongnu.org archive mirror
From: Sebastian Tanase <sebastian.tanase@openwide.fr>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: kwolf@redhat.com, peter maydell <peter.maydell@linaro.org>,
	aliguori@amazon.com, wenchaoqemu@gmail.com, quintela@redhat.com,
	qemu-devel@nongnu.org, mjt@tls.msk.ru, mst@redhat.com,
	stefanha@redhat.com, armbru@redhat.com, lcapitulino@redhat.com,
	michael@walle.cc, alex@alex.org.uk, crobinso@redhat.com,
	afaerber@suse.de, rth@twiddle.net
Subject: Re: [Qemu-devel] [RFC PATCH V3 4/6] cpu_exec: Add sleeping algorithm
Date: Tue, 1 Jul 2014 17:44:56 +0200 (CEST)
Message-ID: <28370062.19195651.1404229496721.JavaMail.root@openwide.fr>
In-Reply-To: <53B19455.4060205@redhat.com>



----- Original Message -----
> From: "Paolo Bonzini" <pbonzini@redhat.com>
> To: "Sebastian Tanase" <sebastian.tanase@openwide.fr>, qemu-devel@nongnu.org
> Cc: aliguori@amazon.com, afaerber@suse.de, rth@twiddle.net, "peter maydell" <peter.maydell@linaro.org>,
> michael@walle.cc, alex@alex.org.uk, stefanha@redhat.com, lcapitulino@redhat.com, crobinso@redhat.com,
> armbru@redhat.com, wenchaoqemu@gmail.com, quintela@redhat.com, kwolf@redhat.com, mjt@tls.msk.ru, mst@redhat.com
> Sent: Monday, 30 June 2014 18:46:13
> Subject: Re: [RFC PATCH V3 4/6] cpu_exec: Add sleeping algorithm
> 
> On 30/06/2014 15:59, Sebastian Tanase wrote:
> > The goal is to sleep qemu whenever the guest clock
> > is in advance compared to the host clock (we use
> > the monotonic clocks). The amount of time to sleep
> > is calculated in the execution loop in cpu_exec.
> >
> > At first, we tried to approximate, at each iteration of the for loop, the
> > real time elapsed while searching for a TB (generating it or retrieving it
> > from cache) and executing it. We would then approximate the virtual time
> > corresponding to the number of virtual instructions executed. The
> > difference between these 2 values would tell us whether the guest is in
> > advance or delayed.
> > However, the function used for measuring the real time
> > (qemu_clock_get_ns(QEMU_CLOCK_REALTIME)) proved to be very expensive.
> > We had an added overhead of 13% of the total run time.
> >
> > Therefore, we modified the algorithm and only take into account the
> > difference between the 2 clocks at the beginning of the cpu_exec function.
> > During the for loop we try to reduce the advance of the guest only by
> > computing the virtual time elapsed and sleeping if necessary. The overhead
> > is thus reduced to 3%. Even though this method still has a noticeable
> > overhead, it no longer is a bottleneck in trying to achieve a better
> > guest frequency for which the guest clock is faster than the host one.
> >
> > As for the alignment of the 2 clocks, with the first algorithm
> > the guest clock was oscillating between -1 and 1ms compared to the host
> > clock. Using the second algorithm we notice that the guest is 5ms behind
> > the host, which is still acceptable for our use case.
> >
> > The tests were conducted using fio and stress. The host machine is an i5
> > CPU at 3.10GHz running Debian Jessie (kernel 3.12). The guest machine is
> > an arm versatile-pb built with buildroot.
> >
> > Currently, on our test machine, the lowest icount we can achieve that is
> > suitable for aligning the 2 clocks is 6. However, we observe that the IO
> > tests (using fio) are slower than the cpu tests (using stress).
> >
> > Signed-off-by: Sebastian Tanase <sebastian.tanase@openwide.fr>
> > Tested-by: Camille Bégué <camille.begue@openwide.fr>
> > ---
> >  cpu-exec.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 112 insertions(+)
> >
> > diff --git a/cpu-exec.c b/cpu-exec.c
> > index 38e5f02..ac741b7 100644
> > --- a/cpu-exec.c
> > +++ b/cpu-exec.c
> > @@ -22,6 +22,102 @@
> >  #include "tcg.h"
> >  #include "qemu/atomic.h"
> >  #include "sysemu/qtest.h"
> > +#include "qemu/timer.h"
> > +
> > +/* Structs and function pointers for delaying the host */
> > +typedef struct SyncClocks SyncClocks;
> > +typedef void (*init_delay_func)(SyncClocks *sc,
> > +                                const CPUState *cpu);
> > +typedef void (*perform_align_func)(SyncClocks *sc,
> > +                                   const CPUState *cpu);
> > +struct SyncClocks {
> > +    int64_t diff_clk;
> > +    int64_t original_instr_counter;
> > +    init_delay_func init_delay;
> > +    perform_align_func perform_align;
> > +};
> 
> I don't remember exactly what I had in mind :) but if I remove these
> pointers from your patches, the code already looks nice, with no
> CONFIG_USER_ONLY except just below here.
> 

Ok, I will remove the function pointers then :)

> > +#if !defined(CONFIG_USER_ONLY)
> > +/* Allow the guest to have a max 3ms advance.
> > + * The difference between the 2 clocks could therefore
> > + * oscillate around 0.
> > + */
> > +#define VM_CLOCK_ADVANCE 3000000
> 
> How did you tune this?
> 

I computed this value based on the tests run on my machine. Of course,
this value will be different on a different machine running different
tests.
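
To put the 3 ms threshold in perspective, it can be expressed as a number of guest instructions for a given icount shift. The following is only a sketch: the helper names `vtime_ns_from_instr` and `advance_in_instr` are hypothetical, the conversion mirrors the shift done in `instr_to_vtime`, and the shift of 6 used in the note below is taken from the commit message rather than measured again.

```c
#include <stdint.h>

#define VM_CLOCK_ADVANCE 3000000 /* ns, the value from the patch */

/* Hypothetical helper: virtual time in ns for a number of executed
 * instructions, where one instruction accounts for 2^icount_shift ns. */
static int64_t vtime_ns_from_instr(int64_t instr_executed, int icount_shift)
{
    return instr_executed * ((int64_t)1 << icount_shift);
}

/* Hypothetical helper: how many instructions the guest may run ahead
 * before its advance reaches VM_CLOCK_ADVANCE (rounded down). */
static int64_t advance_in_instr(int icount_shift)
{
    return VM_CLOCK_ADVANCE >> icount_shift;
}
```

With a shift of 6, one instruction accounts for 64 ns, so the 3 ms threshold corresponds to 46875 instructions; a different shift changes that instruction budget.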

> > +static int64_t delay_host(int64_t diff_clk)
> > +{
> > +    struct timespec sleep_delay, rem_delay;
> > +    if (diff_clk > VM_CLOCK_ADVANCE) {
> > +        sleep_delay.tv_sec = diff_clk / 1000000000LL;
> > +        sleep_delay.tv_nsec = diff_clk % 1000000000LL;
> > +        if (nanosleep(&sleep_delay, &rem_delay) < 0) {
> > +            diff_clk -= (sleep_delay.tv_sec - rem_delay.tv_sec) * 1000000000LL;
> > +            diff_clk -= sleep_delay.tv_nsec - rem_delay.tv_nsec;
> 
> I just remembered that nanosleep doesn't exist on Windows. :(  The
> rem_delay feature of nanosleep is very useful, and I don't think
> there
> is an equivalent.  So for now we shall make this POSIX only.
> 
> Paolo
> 

Should I surround the nanosleep call with #ifndef _WIN32 and add
Sleep for the Windows case, or just leave out Windows?
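
A minimal sketch of the first option, under stated assumptions: `delay_host_sketch` is a placeholder name, the Windows branch is untested and assumes the whole interval was slept because `Sleep()` takes milliseconds and reports no remaining time, and the POSIX branch returns the remainder reported by nanosleep, which equals what the patch computes by subtraction.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#ifndef _WIN32
#include <errno.h>
#include <time.h>
#else
#include <windows.h>
#endif

#define VM_CLOCK_ADVANCE 3000000 /* ns, the value from the patch */

/* Sleep off the guest advance; return the advance (ns) still left. */
static int64_t delay_host_sketch(int64_t diff_clk)
{
    if (diff_clk <= VM_CLOCK_ADVANCE) {
        return diff_clk; /* below the threshold: do not sleep */
    }
#ifndef _WIN32
    struct timespec sleep_delay, rem_delay;

    sleep_delay.tv_sec = diff_clk / 1000000000LL;
    sleep_delay.tv_nsec = diff_clk % 1000000000LL;
    if (nanosleep(&sleep_delay, &rem_delay) < 0 && errno == EINTR) {
        /* Interrupted by a signal: rem_delay holds the unslept time. */
        return rem_delay.tv_sec * 1000000000LL + rem_delay.tv_nsec;
    }
    return 0;
#else
    /* Assumption: no remaining time is available, so treat the sleep
     * as complete; sub-millisecond precision is lost. */
    Sleep((DWORD)(diff_clk / 1000000));
    return 0;
#endif
}
```

The trade-off is visible in the Windows branch: without an equivalent of rem_delay, an early wake-up cannot be accounted for there.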

Sebastian


> > +        } else {
> > +            diff_clk = 0;
> > +        }
> > +    }
> > +    return diff_clk;
> > +}
> > +
> > +static int64_t instr_to_vtime(int64_t instr_counter, const CPUState *cpu)
> > +{
> > +    int64_t instr_exec_time;
> > +    instr_exec_time = instr_counter -
> > +                      (cpu->icount_extra +
> > +                       cpu->icount_decr.u16.low);
> > +    instr_exec_time = instr_exec_time << icount_time_shift;
> > +
> > +    return instr_exec_time;
> > +}
> > +
> > +static void align_clocks(SyncClocks *sc, const CPUState *cpu)
> > +{
> > +    if (!icount_align_option) {
> > +        return;
> > +    }
> > +    sc->diff_clk += instr_to_vtime(sc->original_instr_counter, cpu);
> > +    sc->original_instr_counter = cpu->icount_extra + cpu->icount_decr.u16.low;
> > +    sc->diff_clk = delay_host(sc->diff_clk);
> > +}
> > +
> > +static void init_delay_params(SyncClocks *sc,
> > +                              const CPUState *cpu)
> > +{
> > +    static int64_t clocks_offset = -1;
> > +    int64_t realtime_clock_value, virtual_clock_value;
> > +    if (!icount_align_option) {
> > +        return;
> > +    }
> > +    /* On x86 target architecture, the PIT reset function (called
> > +       by qemu_system_reset) will end up calling qemu_clock_warp
> > +       and then icount_warp_rt changing vm_clock_warp_start from 0 (initial
> > +       value) to -1. This in turn will make us skip the initial offset
> > +       between the real and virtual clocks (initially virtual clock is 0).
> > +       Therefore we impose that the first time we run the cpu
> > +       the host and virtual clocks should be aligned; we don't alter any of
> > +       the clocks, we just calculate the difference between them. */
> > +    realtime_clock_value = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
> > +    virtual_clock_value = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
> > +    if (clocks_offset == -1) {
> > +        clocks_offset = realtime_clock_value - virtual_clock_value;
> > +    }
> > +    sc->diff_clk = virtual_clock_value - realtime_clock_value + clocks_offset;
> > +    sc->original_instr_counter = cpu->icount_extra + cpu->icount_decr.u16.low;
> > +}
> > +#else
> > +/* We don't use the align feature for User emulation
> > +   thus we add empty functions which shall be ignored
> > +   by the compiler */
> > +static void align_clocks(SyncClocks *sc, const CPUState *cpu)
> > +{
> > +}
> > +
> > +static void init_delay_params(SyncClocks *sc,
> > +                              const CPUState *cpu)
> > +{
> > +}
> > +#endif /* CONFIG USER ONLY */
> >
> >  void cpu_loop_exit(CPUState *cpu)
> >  {
> > @@ -227,6 +323,11 @@ int cpu_exec(CPUArchState *env)
> >      TranslationBlock *tb;
> >      uint8_t *tc_ptr;
> >      uintptr_t next_tb;
> > +    /* Delay algorithm */
> > +    static SyncClocks sc = {
> > +        .init_delay = init_delay_params,
> > +        .perform_align = align_clocks
> > +    };
> >      /* This must be volatile so it is not trashed by longjmp() */
> >      volatile bool have_tb_lock = false;
> >
> > @@ -283,6 +384,11 @@ int cpu_exec(CPUArchState *env)
> >  #endif
> >      cpu->exception_index = -1;
> >
> > +    /* Calculate difference between guest clock and host clock.
> > +       This delay includes the delay of the last cycle, so
> > +       what we have to do is sleep until it is 0. As for the
> > +       advance/delay we gain here, we try to fix it next time. */
> > +    sc.init_delay(&sc, cpu);
> >      /* prepare setjmp context for exception handling */
> >      for(;;) {
> >          if (sigsetjmp(cpu->jmp_env, 0) == 0) {
> > @@ -672,6 +778,9 @@ int cpu_exec(CPUArchState *env)
> >                              if (insns_left > 0) {
> >                                  /* Execute remaining instructions. */
> >                                  cpu_exec_nocache(env, insns_left, tb);
> > +                               /* Try to align the host and virtual clocks
> > +                                  if the guest is in advance. */
> > +                                sc.perform_align(&sc, cpu);
> >                              }
> >                              cpu->exception_index = EXCP_INTERRUPT;
> >                              next_tb = 0;
> > @@ -684,6 +793,9 @@ int cpu_exec(CPUArchState *env)
> >                      }
> >                  }
> >                  cpu->current_tb = NULL;
> > +                /* Try to align the host and virtual clocks
> > +                   if the guest is in advance */
> > +                sc.perform_align(&sc, cpu);
> >                  /* reset soft MMU for next block (it can currently
> >                     only be set by a memory fault) */
> >              } /* for(;;) */
> >
> 
> 
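
For reference, the offset bookkeeping done in init_delay_params can be modelled outside QEMU. This is a sketch only: `ClockModel` and `guest_advance_ns` are hypothetical names, the clock values are passed in as plain nanosecond counts instead of being read with qemu_clock_get_ns, and an explicit flag replaces the -1 sentinel used in the patch.

```c
#include <stdint.h>

/* Hypothetical standalone model of the state kept by init_delay_params. */
typedef struct {
    int64_t clocks_offset; /* realtime - virtual, captured on first call */
    int have_offset;       /* 0 until the offset has been captured */
} ClockModel;

/* Return how far the guest (virtual clock) is ahead of the host
 * (realtime clock) in ns; a negative result means the guest is late. */
static int64_t guest_advance_ns(ClockModel *m,
                                int64_t realtime_ns, int64_t virtual_ns)
{
    if (!m->have_offset) {
        /* First run: declare the two clocks aligned. */
        m->clocks_offset = realtime_ns - virtual_ns;
        m->have_offset = 1;
    }
    return virtual_ns - realtime_ns + m->clocks_offset;
}
```

The first call always yields 0, which matches the comment in the patch that the clocks are considered aligned the first time the cpu runs; later calls report the drift accumulated since then.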

Thread overview: 22+ messages
2014-06-30 13:59 [Qemu-devel] [RFC PATCH V3 0/6] icount: Implement delay algorithm between guest and host clocks Sebastian Tanase
2014-06-30 13:59 ` [Qemu-devel] [RFC PATCH V3 1/6] icount: Add QemuOpts for icount Sebastian Tanase
2014-07-01  8:59   ` Frederic Konrad
2014-06-30 13:59 ` [Qemu-devel] [RFC PATCH V3 2/6] icount: Add align option to icount Sebastian Tanase
2014-06-30 16:33   ` Paolo Bonzini
2014-07-01 15:26     ` Sebastian Tanase
2014-06-30 13:59 ` [Qemu-devel] [RFC PATCH V3 3/6] icount: Make icount_time_shift available everywhere Sebastian Tanase
2014-06-30 13:59 ` [Qemu-devel] [RFC PATCH V3 4/6] cpu_exec: Add sleeping algorithm Sebastian Tanase
2014-06-30 16:46   ` Paolo Bonzini
2014-07-01 15:44     ` Sebastian Tanase [this message]
2014-06-30 17:08   ` Paolo Bonzini
2014-06-30 13:59 ` [Qemu-devel] [RFC PATCH V3 5/6] cpu_exec: Print to console if the guest is late Sebastian Tanase
2014-06-30 17:11   ` Paolo Bonzini
2014-07-01 15:52     ` Sebastian Tanase
2014-06-30 13:59 ` [Qemu-devel] [RFC PATCH V3 6/6] monitor: Add drift info to 'info jit' Sebastian Tanase
2014-07-01  7:47   ` Frederic Konrad
2014-07-01 15:55     ` Sebastian Tanase
2014-06-30 17:16 ` [Qemu-devel] [RFC PATCH V3 0/6] icount: Implement delay algorithm between guest and host clocks Paolo Bonzini
2014-07-01 15:54   ` Sebastian Tanase
2014-07-04 15:36   ` Sebastian Tanase
2014-07-04 15:49     ` Paolo Bonzini
2014-07-04 16:05     ` Michael Tokarev
