From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:50835) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aiexK-0003vq-KG for qemu-devel@nongnu.org; Wed, 23 Mar 2016 05:20:00 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1aiexG-0008Fs-Fw for qemu-devel@nongnu.org; Wed, 23 Mar 2016 05:19:58 -0400 Received: from greensocs.com ([193.104.36.180]:54278) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aiexG-0008Ff-2d for qemu-devel@nongnu.org; Wed, 23 Mar 2016 05:19:54 -0400 References: <1458317932-1875-1-git-send-email-alex.bennee@linaro.org> <1458317932-1875-10-git-send-email-alex.bennee@linaro.org> From: KONRAD Frederic Message-ID: <56F25FB6.4070008@greensocs.com> Date: Wed, 23 Mar 2016 10:19:50 +0100 MIME-Version: 1.0 In-Reply-To: <1458317932-1875-10-git-send-email-alex.bennee@linaro.org> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [RFC v1 09/11] tcg: drop global lock during TCG code execution List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: =?UTF-8?Q?Alex_Benn=c3=a9e?= , mttcg@listserver.greensocs.com, a.rigo@virtualopensystems.com, serge.fdrv@gmail.com, cota@braap.org Cc: Eduardo Habkost , "Michael S. Tsirkin" , jan.kiszka@siemens.com, Peter Crosthwaite , mark.burton@greensocs.com, qemu-devel@nongnu.org, pbonzini@redhat.com, Richard Henderson Hi Alex, Thanks for having pulling all that together! About this patch the original author was Jan Kiszka=20 (https://lists.gnu.org/archive/html/qemu-devel/2011-09/msg03323.html) This has probably been dropped during a rebase. Thanks, Fred Le 18/03/2016 17:18, Alex Benn=C3=A9e a =C3=A9crit : > From: KONRAD Frederic > > This finally allows TCG to benefit from the iothread introduction: Drop > the global mutex while running pure TCG CPU code. Reacquire the lock > when entering MMIO or PIO emulation, or when leaving the TCG loop. > > We have to revert a few optimization for the current TCG threading > model, namely kicking the TCG thread in qemu_mutex_lock_iothread and no= t > kicking it in qemu_cpu_kick. We also need to disable RAM block > reordering until we have a more efficient locking mechanism at hand. > > Still, a Linux x86 UP guest and my Musicpal ARM model boot fine here. > These numbers demonstrate where we gain something: > > 20338 jan 20 0 331m 75m 6904 R 99 0.9 0:50.95 qemu-syste= m-arm > 20337 jan 20 0 331m 75m 6904 S 20 0.9 0:26.50 qemu-syste= m-arm > > The guest CPU was fully loaded, but the iothread could still run mostly > independent on a second core. Without the patch we don't get beyond > > 32206 jan 20 0 330m 73m 7036 R 82 0.9 1:06.00 qemu-syste= m-arm > 32204 jan 20 0 330m 73m 7036 S 21 0.9 0:17.03 qemu-syste= m-arm > > We don't benefit significantly, though, when the guest is not fully > loading a host CPU. > > Message-Id: <1439220437-23957-10-git-send-email-fred.konrad@greensocs.c= om> > Signed-off-by: KONRAD Frederic > [AJB: -smp single-threaded fix, rm old info from commit msg] > Signed-off-by: Alex Benn=C3=A9e > > --- > v1 (ajb): > - SMP failure now fixed by previous commit > Changes from Fred Konrad (mttcg-v7 via paolo): > * Rebase on the current HEAD. > * Fixes a deadlock in qemu_devices_reset(). > * Remove the mutex in address_space_* > --- > cpu-exec.c | 10 ++++++++++ > cpus.c | 26 +++----------------------- > cputlb.c | 4 ++++ > exec.c | 12 ++++++++++++ > hw/i386/kvmvapic.c | 3 +++ > softmmu_template.h | 17 +++++++++++++++++ > translate-all.c | 11 +++++++++-- > 7 files changed, 58 insertions(+), 25 deletions(-) > > diff --git a/cpu-exec.c b/cpu-exec.c > index 3572256..76891fd 100644 > --- a/cpu-exec.c > +++ b/cpu-exec.c > @@ -28,6 +28,7 @@ > #include "qemu/rcu.h" > #include "exec/tb-hash.h" > #include "exec/log.h" > +#include "qemu/main-loop.h" > #if defined(TARGET_I386) && !defined(CONFIG_USER_ONLY) > #include "hw/i386/apic.h" > #endif > @@ -449,6 +450,13 @@ int cpu_exec(CPUState *cpu) > for(;;) { > interrupt_request =3D cpu->interrupt_request; > if (unlikely(interrupt_request)) { > + /* FIXME: this needs to take the iothread lock. > + * For this we need to find all places in > + * cc->cpu_exec_interrupt that can call cpu_loop_e= xit, > + * and call qemu_unlock_iothread_mutex() there. E= lse, > + * add a flag telling cpu_loop_exit() to unlock it= . > + */ > + > if (unlikely(cpu->singlestep_enabled & SSTEP_NOIR= Q)) { > /* Mask out external interrupts for this step= . */ > interrupt_request &=3D ~CPU_INTERRUPT_SSTEP_M= ASK; > @@ -501,6 +509,7 @@ int cpu_exec(CPUState *cpu) > the program flow was changed */ > next_tb =3D 0; > } > + > } > if (unlikely(cpu->exit_request > || replay_has_interrupt())) { > @@ -618,6 +627,7 @@ int cpu_exec(CPUState *cpu) > g_assert(env =3D=3D &x86_cpu->env); > #endif > #endif /* buggy compiler */ > + > cpu->can_do_io =3D 1; > tb_lock_reset(); > } > diff --git a/cpus.c b/cpus.c > index a87fbf9..0e3d5f9 100644 > --- a/cpus.c > +++ b/cpus.c > @@ -911,8 +911,6 @@ static void qemu_kvm_init_cpu_signals(CPUState *cpu= ) > #endif /* _WIN32 */ > =20 > static QemuMutex qemu_global_mutex; > -static QemuCond qemu_io_proceeded_cond; > -static unsigned iothread_requesting_mutex; > =20 > static QemuThread io_thread; > =20 > @@ -928,7 +926,6 @@ void qemu_init_cpu_loop(void) > qemu_cond_init(&qemu_cpu_cond); > qemu_cond_init(&qemu_pause_cond); > qemu_cond_init(&qemu_work_cond); > - qemu_cond_init(&qemu_io_proceeded_cond); > qemu_mutex_init(&qemu_global_mutex); > =20 > qemu_thread_get_self(&io_thread); > @@ -1043,10 +1040,6 @@ static void qemu_tcg_wait_io_event(CPUState *cpu= ) > qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex); > } > =20 > - while (iothread_requesting_mutex) { > - qemu_cond_wait(&qemu_io_proceeded_cond, &qemu_global_mutex); > - } > - > CPU_FOREACH(cpu) { > qemu_wait_io_event_common(cpu); > } > @@ -1314,22 +1307,7 @@ bool qemu_mutex_iothread_locked(void) > =20 > void qemu_mutex_lock_iothread(void) > { > - atomic_inc(&iothread_requesting_mutex); > - /* In the simple case there is no need to bump the VCPU thread out= of > - * TCG code execution. > - */ > - if (!tcg_enabled() || qemu_in_vcpu_thread() || > - !first_cpu || !first_cpu->created) { > - qemu_mutex_lock(&qemu_global_mutex); > - atomic_dec(&iothread_requesting_mutex); > - } else { > - if (qemu_mutex_trylock(&qemu_global_mutex)) { > - qemu_cpu_kick_no_halt(); > - qemu_mutex_lock(&qemu_global_mutex); > - } > - atomic_dec(&iothread_requesting_mutex); > - qemu_cond_broadcast(&qemu_io_proceeded_cond); > - } > + qemu_mutex_lock(&qemu_global_mutex); > iothread_locked =3D true; > } > =20 > @@ -1575,7 +1553,9 @@ static int tcg_cpu_exec(CPUState *cpu) > cpu->icount_decr.u16.low =3D decr; > cpu->icount_extra =3D count; > } > + qemu_mutex_unlock_iothread(); > ret =3D cpu_exec(cpu); > + qemu_mutex_lock_iothread(); > #ifdef CONFIG_PROFILER > tcg_time +=3D profile_getclock() - ti; > #endif > diff --git a/cputlb.c b/cputlb.c > index 2f7a166..d607471 100644 > --- a/cputlb.c > +++ b/cputlb.c > @@ -30,6 +30,10 @@ > #include "exec/ram_addr.h" > #include "tcg/tcg.h" > =20 > +#ifdef CONFIG_SOFTMMU > +#include "qemu/main-loop.h" > +#endif > + > //#define DEBUG_TLB > //#define DEBUG_TLB_CHECK > =20 > diff --git a/exec.c b/exec.c > index 402b751..9986d59 100644 > --- a/exec.c > +++ b/exec.c > @@ -2113,6 +2113,12 @@ static void check_watchpoint(int offset, int len= , MemTxAttrs attrs, int flags) > } > cpu->watchpoint_hit =3D wp; > =20 > + /* > + * Unlock iothread mutex before calling cpu_loop_exit > + * or cpu_resume_from_signal. > + */ > + qemu_mutex_unlock_iothread(); > + > /* Unlocked by cpu_loop_exit or cpu_resume_from_signa= l. */ > tb_lock(); > tb_check_watchpoint(cpu); > @@ -2348,8 +2354,14 @@ static void io_mem_init(void) > memory_region_init_io(&io_mem_rom, NULL, &unassigned_mem_ops, NUL= L, NULL, UINT64_MAX); > memory_region_init_io(&io_mem_unassigned, NULL, &unassigned_mem_o= ps, NULL, > NULL, UINT64_MAX); > + > + /* io_mem_notdirty calls tb_invalidate_phys_page_fast, > + * which must be called without the iothread mutex. > + */ > memory_region_init_io(&io_mem_notdirty, NULL, ¬dirty_mem_ops, = NULL, > NULL, UINT64_MAX); > + memory_region_clear_global_locking(&io_mem_notdirty); > + > memory_region_init_io(&io_mem_watch, NULL, &watch_mem_ops, NULL, > NULL, UINT64_MAX); > } > diff --git a/hw/i386/kvmvapic.c b/hw/i386/kvmvapic.c > index 7c0d542..a0439a8 100644 > --- a/hw/i386/kvmvapic.c > +++ b/hw/i386/kvmvapic.c > @@ -447,6 +447,9 @@ static void patch_instruction(VAPICROMState *s, X86= CPU *cpu, target_ulong ip) > resume_all_vcpus(); > =20 > if (!kvm_enabled()) { > + /* Unlock iothread mutex before calling cpu_resume_from_signal= . */ > + qemu_mutex_unlock_iothread(); > + > /* Unlocked by cpu_resume_from_signal. */ > tb_lock(); > cs->current_tb =3D NULL; > diff --git a/softmmu_template.h b/softmmu_template.h > index 208f808..0b6c609 100644 > --- a/softmmu_template.h > +++ b/softmmu_template.h > @@ -151,6 +151,7 @@ static inline DATA_TYPE glue(io_read, SUFFIX)(CPUAr= chState *env, > CPUState *cpu =3D ENV_GET_CPU(env); > hwaddr physaddr =3D iotlbentry->addr; > MemoryRegion *mr =3D iotlb_to_region(cpu, physaddr, iotlbentry->a= ttrs); > + bool locked =3D false; > =20 > physaddr =3D (physaddr & TARGET_PAGE_MASK) + addr; > cpu->mem_io_pc =3D retaddr; > @@ -159,8 +160,15 @@ static inline DATA_TYPE glue(io_read, SUFFIX)(CPUA= rchState *env, > } > =20 > cpu->mem_io_vaddr =3D addr; > + if (mr->global_locking) { > + qemu_mutex_lock_iothread(); > + locked =3D true; > + } > memory_region_dispatch_read(mr, physaddr, &val, 1 << SHIFT, > iotlbentry->attrs); > + if (locked) { > + qemu_mutex_unlock_iothread(); > + } > return val; > } > #endif > @@ -358,6 +366,7 @@ static inline void glue(io_write, SUFFIX)(CPUArchSt= ate *env, > CPUState *cpu =3D ENV_GET_CPU(env); > hwaddr physaddr =3D iotlbentry->addr; > MemoryRegion *mr =3D iotlb_to_region(cpu, physaddr, iotlbentry->a= ttrs); > + bool locked =3D false; > =20 > physaddr =3D (physaddr & TARGET_PAGE_MASK) + addr; > if (mr !=3D &io_mem_rom && mr !=3D &io_mem_notdirty && !cpu->can_= do_io) { > @@ -366,8 +375,16 @@ static inline void glue(io_write, SUFFIX)(CPUArchS= tate *env, > =20 > cpu->mem_io_vaddr =3D addr; > cpu->mem_io_pc =3D retaddr; > + > + if (mr->global_locking) { > + qemu_mutex_lock_iothread(); > + locked =3D true; > + } > memory_region_dispatch_write(mr, physaddr, val, 1 << SHIFT, > iotlbentry->attrs); > + if (locked) { > + qemu_mutex_unlock_iothread(); > + } > } > =20 > void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYP= E val, > diff --git a/translate-all.c b/translate-all.c > index 1aab243..fe1392a 100644 > --- a/translate-all.c > +++ b/translate-all.c > @@ -1221,6 +1221,7 @@ void tb_invalidate_phys_range(tb_page_addr_t star= t, tb_page_addr_t end) > * this TB. > * > * Called with mmap_lock held for user-mode emulation > + * If called from generated code, iothread mutex must not be held. > */ > void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr= _t end, > int is_cpu_write_access) > @@ -1333,7 +1334,9 @@ void tb_invalidate_phys_page_range(tb_page_addr_t= start, tb_page_addr_t end, > } > =20 > #ifdef CONFIG_SOFTMMU > -/* len must be <=3D 8 and start must be a multiple of len */ > +/* len must be <=3D 8 and start must be a multiple of len. > + * Called via softmmu_template.h, with iothread mutex not held. > + */ > void tb_invalidate_phys_page_fast(tb_page_addr_t start, int len) > { > PageDesc *p; > @@ -1591,6 +1594,7 @@ static TranslationBlock *tb_find_pc(uintptr_t tc_= ptr) > } > =20 > #if !defined(CONFIG_USER_ONLY) > +/* If called from generated code, iothread mutex must not be held. */ > void tb_invalidate_phys_addr(AddressSpace *as, hwaddr addr) > { > ram_addr_t ram_addr; > @@ -1637,7 +1641,10 @@ void tb_check_watchpoint(CPUState *cpu) > =20 > #ifndef CONFIG_USER_ONLY > /* in deterministic execution mode, instructions doing device I/Os > - must be at the end of the TB */ > + * must be at the end of the TB. > + * > + * Called by softmmu_template.h, with iothread mutex not held. > + */ > void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr) > { > #if defined(TARGET_MIPS) || defined(TARGET_SH4)