From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:57314) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WvOiG-0000qq-1e for qemu-devel@nongnu.org; Fri, 13 Jun 2014 06:28:05 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1WvOiA-00008U-T3 for qemu-devel@nongnu.org; Fri, 13 Jun 2014 06:27:59 -0400 Received: from mx1.redhat.com ([209.132.183.28]:14678) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WvOiA-00008O-KO for qemu-devel@nongnu.org; Fri, 13 Jun 2014 06:27:54 -0400 Message-ID: <539AD215.9050505@redhat.com> Date: Fri, 13 Jun 2014 12:27:33 +0200 From: Paolo Bonzini MIME-Version: 1.0 References: <1402652437-13194-1-git-send-email-sebastian.tanase@openwide.fr> <1402652437-13194-4-git-send-email-sebastian.tanase@openwide.fr> In-Reply-To: <1402652437-13194-4-git-send-email-sebastian.tanase@openwide.fr> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [RFC PATCH V2 3/5] cpu_exec: Add sleeping algorithm List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Sebastian Tanase , qemu-devel@nongnu.org Cc: kwolf@redhat.com, peter.maydell@linaro.org, jeremy.rosen@openwide.fr, aliguori@amazon.com, wenchaoqemu@gmail.com, quintela@redhat.com, mjt@tls.msk.ru, mst@redhat.com, stefanha@redhat.com, armbru@redhat.com, lcapitulino@redhat.com, michael@walle.cc, camille.begue@openwide.fr, alex@alex.org.uk, crobinso@redhat.com, pierre.lemagourou@openwide.fr, afaerber@suse.de, rth@twiddle.net Il 13/06/2014 11:40, Sebastian Tanase ha scritto: > The goal is to sleep qemu whenever the guest clock > is in advance compared to the host clock (we use > the monotonic clocks). The amount of time to sleep > is calculated in the execution loop in cpu_exec. > > Basically, using QEMU_CLOCK_REALTIME, we calculate > the real time duration of the execution (meaning > generating TBs and executing them) and using the > the fields icount_decr.u16.low and icount_extra > (from the CPUState structure) shifted by icount_time_shift > we calculate the theoretical virtual time elapsed. > Having these 2 values, we can determine if the > guest is in advance and sleep. > > Signed-off-by: Sebastian Tanase > Tested-by: Camille B=C3=A9gu=C3=A9 > --- > > At first, we tried to approximate at each for loop the real time elapse= d > while searching for a TB (generating or retrieving from cache) and > executing it. We would then approximate the virtual time corresponding > to the number of virtual instructions executed. The difference between > these 2 values would allow us to know if the guest is in advance or del= ayed. > However, the function used for measuring the real time > (qemu_clock_get_ns(QEMU_CLOCK_REALTIME)) proved to be very expensive. > We had an added overhead of 13% of the total run time. > > Therefore, we modified the algorithm and only take into account the > difference between the 2 clocks at the begining of the cpu_exec functio= n. > During the for loop we try to reduce the advance of the guest only by > computing the virtual time elapsed and sleeping if necessary. The overh= ead > is thus reduced to 3%. Even though this method still has a noticeable > overhead, it no longer is a bottleneck in trying to achieve a better > guest frequency for which the guest clock is faster than the host one. > > As for the the alignement of the 2 clocks, with the first algorithm > the guest clock was oscillating between -1 and 1ms compared to the host= clock. > Using the second algorithm we notice that the guest is 5ms behind the h= ost, which > is still acceptable for our use case. > > The tests where conducted using fio and stress. The host machine in an = i5 CPU at > 3.10GHz running Debian Jessie (kernel 3.12). The guest machine is an ar= m versatile-pb built > with buildroot. > > Currently, on our test machine, the lowest icount we can achieve that i= s suitable for > aligning the 2 clocks is 6. However, we observe that the IO tests (usin= g fio) are > slower than the cpu tests (using stress). You can place this text in the commit message. > --- > cpu-exec.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++++++++++= +++++++- > 1 file changed, 77 insertions(+), 1 deletion(-) > > diff --git a/cpu-exec.c b/cpu-exec.c > index 38e5f02..ccc2c0e 100644 > --- a/cpu-exec.c > +++ b/cpu-exec.c > @@ -22,6 +22,7 @@ > #include "tcg.h" > #include "qemu/atomic.h" > #include "sysemu/qtest.h" > +#include "qemu/timer.h" > > void cpu_loop_exit(CPUState *cpu) > { > @@ -209,6 +210,55 @@ static void cpu_handle_debug_exception(CPUArchStat= e *env) > } > } > > +#if !(defined(CONFIG_USER_ONLY)) No need for ( ) here around defined. > +/* Allow the guest to have a max 3ms advance. > + * The difference between the 2 clocks could therefore > + * oscillate around 0. > + */ > +#define VM_CLOCK_ADVANCE 3000000 > + > +static int64_t delay_host(int64_t diff_clk) > +{ > + struct timespec sleep_delay, rem_delay; > + if (diff_clk > VM_CLOCK_ADVANCE) { > + sleep_delay.tv_sec =3D diff_clk / 1000000000LL; > + sleep_delay.tv_nsec =3D diff_clk % 1000000000LL; > + if (nanosleep(&sleep_delay, &rem_delay) < 0) { > + diff_clk -=3D (sleep_delay.tv_sec - rem_delay.tv_sec) * 10= 00000000LL; > + diff_clk -=3D sleep_delay.tv_nsec - rem_delay.tv_nsec; > + } else { > + diff_clk =3D 0; > + } > + } > + return diff_clk; > +} > + > +static int64_t update_clock_difference(int64_t diff_clk, > + int64_t *instr_counter, > + CPUState *cpu) > +{ > + int64_t instr_exec_time; > + instr_exec_time =3D *instr_counter - > + (cpu->icount_extra + > + cpu->icount_decr.u16.low); > + instr_exec_time =3D instr_exec_time << icount_time_shift; > + *instr_counter =3D cpu->icount_extra + cpu->icount_decr.u16.low; > + > + return diff_clk + instr_exec_time; > +} > + > +static int64_t align_clocks(int64_t diff_clk, int64_t *oic, CPUState *= cpu) > +{ > + if (icount_align_option) { > + diff_clk =3D update_clock_difference(diff_clk, oic, cpu); > + diff_clk =3D delay_host(diff_clk); > + } > + > + return diff_clk; > +} > +#endif /* CONFIG USER ONLY */ > + > /* main execution loop */ > > volatile sig_atomic_t exit_request; > @@ -227,6 +277,10 @@ int cpu_exec(CPUArchState *env) > TranslationBlock *tb; > uint8_t *tc_ptr; > uintptr_t next_tb; > +#if !(defined(CONFIG_USER_ONLY)) > + /* Delay algorithm */ > + int64_t diff_clk, original_instr_counter; > +#endif Perhaps these could become a struct too (an abstract data type with init=20 and realign operations), with a dummy implementation for=20 CONFIG_USER_ONLY. The compiler will remove the dead code. > /* This must be volatile so it is not trashed by longjmp() */ > volatile bool have_tb_lock =3D false; > > @@ -282,7 +336,17 @@ int cpu_exec(CPUArchState *env) > #error unsupported target CPU > #endif > cpu->exception_index =3D -1; > - > +#if !(defined(CONFIG_USER_ONLY)) > + if (icount_align_option) { > + /* Calculate difference between guest clock and host clock. > + This delay includes the delay of the last cycle, so > + what we have to do is sleep until it is 0. As for the > + advance/delay we gain here, we try to fix it next time. */ > + diff_clk =3D qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) - > + qemu_clock_get_ns(QEMU_CLOCK_REALTIME); This doesn't work yet, does it? Let's look for a less hacky solution to the problem you fix in patch 4,=20 and squash it in this patch. Paolo > + original_instr_counter =3D cpu->icount_extra + cpu->icount_dec= r.u16.low; > + } > +#endif > /* prepare setjmp context for exception handling */ > for(;;) { > if (sigsetjmp(cpu->jmp_env, 0) =3D=3D 0) { > @@ -672,6 +736,13 @@ int cpu_exec(CPUArchState *env) > if (insns_left > 0) { > /* Execute remaining instructions. */ > cpu_exec_nocache(env, insns_left, tb); > +#if !(defined(CONFIG_USER_ONLY)) > + /* We can ignore the return value > + because we exit anyway. */ > + align_clocks(diff_clk, > + &original_instr_counter, > + cpu); > +#endif > } > cpu->exception_index =3D EXCP_INTERRUPT; > next_tb =3D 0; > @@ -684,6 +755,11 @@ int cpu_exec(CPUArchState *env) > } > } > cpu->current_tb =3D NULL; > +#if !(defined(CONFIG_USER_ONLY)) > + diff_clk =3D align_clocks(diff_clk, > + &original_instr_counter, > + cpu); > +#endif > /* reset soft MMU for next block (it can currently > only be set by a memory fault) */ > } /* for(;;) */ >