Re: [Qemu-devel] [RFC PATCH V7 00/19] Multithread TCG.

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Frederic Konrad <fred.konrad@greensocs.com>
To: Paolo Bonzini <pbonzini@redhat.com>,
	qemu-devel@nongnu.org, mttcg@listserver.greensocs.com
Cc: mark.burton@greensocs.com, alex.bennee@linaro.org,
	a.rigo@virtualopensystems.com, guillaume.delbergue@greensocs.com
Subject: Re: [Qemu-devel] [RFC PATCH V7 00/19] Multithread TCG.
Date: Wed, 12 Aug 2015 17:19:22 +0200	[thread overview]
Message-ID: <55CB63FA.1070200@greensocs.com> (raw)
In-Reply-To: <55C9FFC0.7040502@greensocs.com>

On 11/08/2015 15:59, Frederic Konrad wrote:
> On 11/08/2015 14:45, Paolo Bonzini wrote:
>> On 10/08/2015 17:26, fred.konrad@greensocs.com wrote:
>>> From: KONRAD Frederic <fred.konrad@greensocs.com>
>>>
>>> This is the 7th round of the MTTCG patch series.
> Thanks to look at this.
>> Here is a list of issues that I found:
>>
>> - tb_lock usage in tb_find_fast is complicated and introduces the need
>> for other complicated code such as the tb_invalidate callback. Instead,
>> the tb locking should reuse the cpu-exec.c code for user-mode emulation,
>> with additional locking in the spots identified by Fred.
> The reason for this is that locking around tb_find_fast just kills the 
> performance.
>
>>
>> - tb_lock uses a recursive lock, but this is not necessary.  Did I ever
>> say I dislike recursive mutexes? :)  The wrappers
>> tb_lock()/tb_unlock()/tb_lock_reset() can catch recursive locking for
>> us, so it's not hard to do without it.
> True this is definitely not nice but seems some code eg: tb_invalidate 
> are called from
> different place with or without the tb_lock. We might probably be able 
> to lock the
> caller.
>
>>
>> - code_bitmap is not protected by any mutex
>> (tb_invalidate_phys_page_fast is called with the iothread mutex taken,
>> but other users of code_bitmap do not use it).  Writes should be
>> protected by the tb_lock, reads by either tb_lock or RCU.
> Yes good point I missed this one.
>
>> - memory barriers are probably requested around accesses to
>> ->exit_request.  ->thread_kicked also needs to be accessed with atomics,
>> because async_run_{,safe_}on_cpu can be called outside the big QEMU 
>> lock.
>>
>> - the whole signal-based qemu_cpu_kick can just go away.  Just setting
>> tcg_exit_req and exit_request will kick the TCG thread.  The hairy Win32
>> SuspendThread/ResumeThread goes away too.  I suggest doing it now,
>> because proving it unnecessary is easier than proving it correct.
> Just setting tcg_exit_req and exit_request and signal the 
> cpu->halt_cond I guess?
BTW that affect KVM as well. Seems this mechanism is used as well with
qemu_cpu_kick_self().. Which is a little strange as it seems the SIGIPI 
trigger a
dummy signal handler?

     memset(&sigact, 0, sizeof(sigact));
     sigact.sa_handler = dummy_signal;
     sigaction(SIG_IPI, &sigact, NULL);


>
>> - user-mode emulation is broken (does not compile)
>>
>> - the big QEMU lock is not taken anywhere for MMIO accesses that require
>> it (i.e. basically all of them)
> Isn't that handled by prepare_mmio_access?
>
>>
>> - some code wants to be called _outside_ the big QEMU lock, for example
>> because it longjmps back to cpu_exec.  For example, I suspect that the
>> notdirty callbacks must be marked with 
>> memory_region_clear_global_locking.
>
> Fred
>>
>> I've started looking at them (and documenting the locking conventions
>> for functions), and I hope to post it to some git repo later this week.
>>
>> Paolo
>>
>>> It can be cloned from:
>>> git@git.greensocs.com:fkonrad/mttcg.git branch multi_tcg_v7.
>>>
>>> This patch-set try to address the different issues in the global 
>>> picture of
>>> MTTCG, presented on the wiki.
>>>
>>> == Needed patch for our work ==
>>>
>>> Some preliminaries are needed for our work:
>>>   * current_cpu doesn't make sense in mttcg so a tcg_executing flag 
>>> is added to
>>>     the CPUState.
>>>   * We need to run some work safely when all VCPUs are outside their 
>>> execution
>>>     loop. This is done with the async_run_safe_work_on_cpu function 
>>> introduced
>>>     in this series.
>>>   * QemuSpin lock is introduced (on posix only yet) to allow a 
>>> faster handling of
>>>     atomic instruction.
>>>
>>> == Code generation and cache ==
>>>
>>> As Qemu stands, there is no protection at all against two threads 
>>> attempting to
>>> generate code at the same time or modifying a TranslationBlock.
>>> The "protect TBContext with tb_lock" patch address the issue of code 
>>> generation
>>> and makes all the tb_* function thread safe (except tb_flush).
>>> This raised the question of one or multiple caches. We choosed to 
>>> use one
>>> unified cache because it's easier as a first step and since the 
>>> structure of
>>> QEMU effectively has a ‘local’ cache per CPU in the form of the jump 
>>> cache, we
>>> don't see the benefit of having two pools of tbs.
>>>
>>> == Dirty tracking ==
>>>
>>> Protecting the IOs:
>>> To allows all VCPUs threads to run at the same time we need to drop the
>>> global_mutex as soon as possible. The io access need to take the 
>>> mutex. This is
>>> likely to change when 
>>> http://thread.gmane.org/gmane.comp.emulators.qemu/345258
>>> will be upstreamed.
>>>
>>> Invalidation of TranslationBlocks:
>>> We can have all VCPUs running during an invalidation. Each VCPU is 
>>> able to clean
>>> it's jump cache itself as it is in CPUState so that can be handled 
>>> by a simple
>>> call to async_run_on_cpu. However tb_invalidate also writes to the
>>> TranslationBlock which is shared as we have only one pool.
>>> Hence this part of invalidate requires all VCPUs to exit before it 
>>> can be done.
>>> Hence the async_run_safe_work_on_cpu is introduced to handle this case.
>>>
>>> == Atomic instruction ==
>>>
>>> For now only ARM on x64 is supported by using an cmpxchg instruction.
>>> Specifically the limitation of this approach is that it is harder to 
>>> support
>>> 64bit ARM on a host architecture that is multi-core, but only 
>>> supports 32 bit
>>> cmpxchg (we believe this could be the case for some PPC cores).  For 
>>> now this
>>> case is not correctly handled. The existing atomic patch will 
>>> attempt to execute
>>> the 64 bit cmpxchg functionality in a non thread safe fashion. Our 
>>> intention is
>>> to provide a new multi-thread ARM atomic patch for 64bit ARM on 
>>> effective 32bit
>>> hosts.
>>> This atomic instruction part has been tested with Alexander's atomic 
>>> stress repo
>>> available here:
>>> https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg05585.html
>>>
>>> The execution is a little slower than upstream probably because of 
>>> the different
>>> VCPU fight for the mutex. Swaping arm_exclusive_lock from mutex to 
>>> spin_lock
>>> reduce considerably the difference.
>>>
>>> == Testing ==
>>>
>>> A simple double dhrystone test in SMP 2 with vexpress-a15 in a linux 
>>> guest show
>>> a good performance progression: it takes basically 18s upstream to 
>>> complete vs
>>> 10s with MTTCG.
>>>
>>> Testing image is available here:
>>> https://cloud.greensocs.com/index.php/s/CfHSLzDH5pmTkW3
>>>
>>> Then simply:
>>> ./configure --target-list=arm-softmmu
>>> make -j8
>>> ./arm-softmmu/qemu-system-arm -M vexpress-a15 -smp 2 -kernel zImage
>>> -initrd rootfs.ext2 -dtb vexpress-v2p-ca15-tc1.dtb --nographic
>>> --append "console=ttyAMA0"
>>>
>>> login: root
>>>
>>> The dhrystone command is the last one in the history.
>>> "dhrystone 10000000 & dhrystone 10000000"
>>>
>>> The atomic spinlock benchmark from Alexander shows that atomic 
>>> basically work.
>>> Just follow the instruction here:
>>> https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg05585.html
>>>
>>> == Known issues ==
>>>
>>> * GDB stub:
>>>    GDB stub is not tested right now it will probably requires some 
>>> changes to
>>>    work.
>>>
>>> * deadlock on exit:
>>>    When exiting QEMU Ctrl-C some VCPU's thread are not able to exit 
>>> and continue
>>>    execution.
>>>    http://git.greensocs.com/fkonrad/mttcg/issues/1
>>>
>>> * memory_region_rom_device_set_romd from pflash01 just crashes the 
>>> TCG code.
>>>    Strangely this happen only with "-smp 4" and 2 in the DTB.
>>>    http://git.greensocs.com/fkonrad/mttcg/issues/2
>>>
>>> Changes V6 -> V7:
>>>    * global_lock:
>>>       * Don't protect softmmu read/write helper as it's now done in
>>>         adress_space_rw.
>>>    * tcg_exec_flag:
>>>       * Make the flag atomically test and set through an API.
>>>    * introduce async_safe_work:
>>>       * move qemu_cpu_kick_thread to avoid prototype declaration.
>>>       * use the work_mutex.
>>>    * async_work:
>>>       * protect it with a mutex (work_mutex) against concurent access.
>>>    * tb_lock:
>>>       * protect tcg_malloc_internal as well.
>>>    * signal the VCPU even of current_cpu is NULL.
>>>    * added PSCI patch.
>>>    * rebased on v2.4.0-rc0 (6169b60285fe1ff730d840a49527e721bfb30899).
>>>
>>> Changes V5 -> V6:
>>>    * Introduce async_safe_work to do the tb_flush and some part of 
>>> tb_invalidate.
>>>    * Introduce QemuSpin from Guillaume which allow a faster atomic 
>>> instruction
>>>      (6s to pass Alexander's atomic test instead of 30s before).
>>>    * Don't take tb_lock before tb_find_fast.
>>>    * Handle tb_flush with async_safe_work.
>>>    * Handle tb_invalidate with async_work and async_safe_work.
>>>    * Drop the tlb_flush_request mechanism and use async_work as well.
>>>    * Fix the wrong lenght in atomic patch.
>>>    * Fix the wrong return address for exception in atomic patch.
>>>
>>> Alex Bennée (1):
>>>    target-arm/psci.c: wake up sleeping CPUs (MTTCG)
>>>
>>> Guillaume Delbergue (1):
>>>    add support for spin lock on POSIX systems exclusively
>>>
>>> KONRAD Frederic (17):
>>>    cpus: protect queued_work_* with work_mutex.
>>>    cpus: add tcg_exec_flag.
>>>    cpus: introduce async_run_safe_work_on_cpu.
>>>    replace spinlock by QemuMutex.
>>>    remove unused spinlock.
>>>    protect TBContext with tb_lock.
>>>    tcg: remove tcg_halt_cond global variable.
>>>    Drop global lock during TCG code execution
>>>    cpu: remove exit_request global.
>>>    tcg: switch on multithread.
>>>    Use atomic cmpxchg to atomically check the exclusive value in a 
>>> STREX
>>>    add a callback when tb_invalidate is called.
>>>    cpu: introduce tlb_flush*_all.
>>>    arm: use tlb_flush*_all
>>>    translate-all: introduces tb_flush_safe.
>>>    translate-all: (wip) use tb_flush_safe when we can't alloc more tb.
>>>    mttcg: signal the associated cpu anyway.
>>>
>>>   cpu-exec.c                  |  98 +++++++++------
>>>   cpus.c                      | 295 
>>> +++++++++++++++++++++++++-------------------
>>>   cputlb.c                    |  81 ++++++++++++
>>>   include/exec/exec-all.h     |   8 +-
>>>   include/exec/spinlock.h     |  49 --------
>>>   include/qemu/thread-posix.h |   4 +
>>>   include/qemu/thread-win32.h |   4 +
>>>   include/qemu/thread.h       |   7 ++
>>>   include/qom/cpu.h           |  57 +++++++++
>>>   linux-user/main.c           |   6 +-
>>>   qom/cpu.c                   |  20 +++
>>>   target-arm/cpu.c            |  21 ++++
>>>   target-arm/cpu.h            |   6 +
>>>   target-arm/helper.c         |  58 +++------
>>>   target-arm/helper.h         |   4 +
>>>   target-arm/op_helper.c      | 128 ++++++++++++++++++-
>>>   target-arm/psci.c           |   2 +
>>>   target-arm/translate.c      | 101 +++------------
>>>   target-i386/mem_helper.c    |  16 ++-
>>>   target-i386/misc_helper.c   |  27 +++-
>>>   tcg/i386/tcg-target.c       |   8 ++
>>>   tcg/tcg.h                   |  14 ++-
>>>   translate-all.c             | 217 +++++++++++++++++++++++++++-----
>>>   util/qemu-thread-posix.c    |  45 +++++++
>>>   util/qemu-thread-win32.c    |  30 +++++
>>>   vl.c                        |   6 +
>>>   26 files changed, 934 insertions(+), 378 deletions(-)
>>>   delete mode 100644 include/exec/spinlock.h
>>>
>
>

next prev parent reply	other threads:[~2015-08-12 15:19 UTC|newest]

Thread overview: 81+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-08-10 15:26 [Qemu-devel] [RFC PATCH V7 00/19] Multithread TCG fred.konrad
2015-08-10 15:26 ` [Qemu-devel] [RFC PATCH V7 01/19] cpus: protect queued_work_* with work_mutex fred.konrad
2015-08-10 15:59   ` Paolo Bonzini
2015-08-10 16:04     ` Frederic Konrad
2015-08-10 16:06       ` Paolo Bonzini
2015-08-10 15:27 ` [Qemu-devel] [RFC PATCH V7 02/19] cpus: add tcg_exec_flag fred.konrad
2015-08-11 10:53   ` Paolo Bonzini
2015-08-11 11:11     ` Frederic Konrad
2015-08-11 12:57       ` Paolo Bonzini
2015-08-10 15:27 ` [Qemu-devel] [RFC PATCH V7 03/19] cpus: introduce async_run_safe_work_on_cpu fred.konrad
2015-08-10 15:27 ` [Qemu-devel] [RFC PATCH V7 04/19] replace spinlock by QemuMutex fred.konrad
2015-08-10 16:09   ` Paolo Bonzini
2015-08-10 15:27 ` [Qemu-devel] [RFC PATCH V7 05/19] remove unused spinlock fred.konrad
2015-08-10 15:27 ` [Qemu-devel] [RFC PATCH V7 06/19] add support for spin lock on POSIX systems exclusively fred.konrad
2015-08-10 16:10   ` Paolo Bonzini
2015-08-10 15:27 ` [Qemu-devel] [RFC PATCH V7 07/19] protect TBContext with tb_lock fred.konrad
2015-08-10 16:36   ` Paolo Bonzini
2015-08-10 16:50     ` Paolo Bonzini
2015-08-10 18:39       ` Alex Bennée
2015-08-11  8:31         ` Paolo Bonzini
2015-08-11  6:46     ` Frederic Konrad
2015-08-11  8:34       ` Paolo Bonzini
2015-08-11  9:21         ` Peter Maydell
2015-08-11  9:59           ` Paolo Bonzini
2015-08-12 17:45   ` Frederic Konrad
2015-08-12 18:20     ` Alex Bennée
2015-08-12 18:22       ` Paolo Bonzini
2015-08-14  8:38       ` Frederic Konrad
2015-08-15  0:04         ` Paolo Bonzini
2015-08-10 15:27 ` [Qemu-devel] [RFC PATCH V7 08/19] tcg: remove tcg_halt_cond global variable fred.konrad
2015-08-10 16:12   ` Paolo Bonzini
2015-08-10 15:27 ` [Qemu-devel] [RFC PATCH V7 09/19] Drop global lock during TCG code execution fred.konrad
2015-08-10 16:15   ` Paolo Bonzini
2015-08-11  6:55     ` Frederic Konrad
2015-08-11 20:12     ` Alex Bennée
2015-08-11 21:34       ` Frederic Konrad
2015-08-12  9:58         ` Paolo Bonzini
2015-08-12 12:32           ` Frederic Konrad
2015-08-10 15:27 ` [Qemu-devel] [RFC PATCH V7 10/19] cpu: remove exit_request global fred.konrad
2015-08-10 15:51   ` Paolo Bonzini
2015-08-10 15:27 ` [Qemu-devel] [RFC PATCH V7 11/19] tcg: switch on multithread fred.konrad
2015-08-13 11:17   ` Paolo Bonzini
2015-08-13 14:41     ` Frederic Konrad
2015-08-13 14:58       ` Paolo Bonzini
2015-08-13 15:18         ` Frederic Konrad
2015-08-10 15:27 ` [Qemu-devel] [RFC PATCH V7 12/19] Use atomic cmpxchg to atomically check the exclusive value in a STREX fred.konrad
2015-08-10 15:27 ` [Qemu-devel] [RFC PATCH V7 13/19] add a callback when tb_invalidate is called fred.konrad
2015-08-10 16:52   ` Paolo Bonzini
2015-08-10 18:41     ` Alex Bennée
2015-08-10 15:27 ` [Qemu-devel] [RFC PATCH V7 14/19] cpu: introduce tlb_flush*_all fred.konrad
2015-08-10 15:54   ` Paolo Bonzini
2015-08-10 16:00     ` Peter Maydell
2015-08-10 15:27 ` [Qemu-devel] [RFC PATCH V7 15/19] arm: use tlb_flush*_all fred.konrad
2015-08-10 15:27 ` [Qemu-devel] [RFC PATCH V7 16/19] translate-all: introduces tb_flush_safe fred.konrad
2015-08-10 16:26   ` Paolo Bonzini
2015-08-12 14:09   ` Paolo Bonzini
2015-08-12 14:11     ` Frederic Konrad
2015-08-12 14:14       ` Paolo Bonzini
2015-08-10 15:27 ` [Qemu-devel] [RFC PATCH V7 17/19] translate-all: (wip) use tb_flush_safe when we can't alloc more tb fred.konrad
2015-08-10 15:27 ` [Qemu-devel] [RFC PATCH V7 18/19] mttcg: signal the associated cpu anyway fred.konrad
2015-08-10 15:51   ` Paolo Bonzini
2015-08-10 15:27 ` [Qemu-devel] [RFC PATCH V7 19/19] target-arm/psci.c: wake up sleeping CPUs (MTTCG) fred.konrad
2015-08-10 16:41   ` Paolo Bonzini
2015-08-10 18:38     ` Alex Bennée
2015-08-10 18:34 ` [Qemu-devel] [RFC PATCH V7 00/19] Multithread TCG Alex Bennée
2015-08-10 23:02   ` Frederic Konrad
2015-08-11  6:15 ` Benjamin Herrenschmidt
2015-08-11  6:27   ` Frederic Konrad
2015-10-07 12:46     ` Claudio Fontana
2015-10-07 14:52       ` Frederic Konrad
2015-10-21 15:09         ` Claudio Fontana
2015-08-11  7:54   ` Alex Bennée
2015-08-11  9:22     ` Benjamin Herrenschmidt
2015-08-11  9:29       ` Peter Maydell
2015-08-11 10:09         ` Benjamin Herrenschmidt
2015-08-11 19:22       ` Alex Bennée
2015-08-11 12:45 ` Paolo Bonzini
2015-08-11 13:59   ` Frederic Konrad
2015-08-11 14:10     ` Paolo Bonzini
2015-08-12 15:19     ` Frederic Konrad [this message]
2015-08-12 15:39       ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55CB63FA.1070200@greensocs.com \
    --to=fred.konrad@greensocs.com \
    --cc=a.rigo@virtualopensystems.com \
    --cc=alex.bennee@linaro.org \
    --cc=guillaume.delbergue@greensocs.com \
    --cc=mark.burton@greensocs.com \
    --cc=mttcg@listserver.greensocs.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.