From: Frederic Konrad <fred.konrad@greensocs.com>
To: Paolo Bonzini <pbonzini@redhat.com>,
qemu-devel@nongnu.org, mttcg@listserver.greensocs.com
Cc: mark.burton@greensocs.com, alex.bennee@linaro.org,
a.rigo@virtualopensystems.com, guillaume.delbergue@greensocs.com
Subject: Re: [Qemu-devel] [RFC PATCH V7 00/19] Multithread TCG.
Date: Wed, 12 Aug 2015 17:19:22 +0200 [thread overview]
Message-ID: <55CB63FA.1070200@greensocs.com> (raw)
In-Reply-To: <55C9FFC0.7040502@greensocs.com>
On 11/08/2015 15:59, Frederic Konrad wrote:
> On 11/08/2015 14:45, Paolo Bonzini wrote:
>> On 10/08/2015 17:26, fred.konrad@greensocs.com wrote:
>>> From: KONRAD Frederic <fred.konrad@greensocs.com>
>>>
>>> This is the 7th round of the MTTCG patch series.
> Thanks to look at this.
>> Here is a list of issues that I found:
>>
>> - tb_lock usage in tb_find_fast is complicated and introduces the need
>> for other complicated code such as the tb_invalidate callback. Instead,
>> the tb locking should reuse the cpu-exec.c code for user-mode emulation,
>> with additional locking in the spots identified by Fred.
> The reason for this is that locking around tb_find_fast just kills the
> performance.
>
>>
>> - tb_lock uses a recursive lock, but this is not necessary. Did I ever
>> say I dislike recursive mutexes? :) The wrappers
>> tb_lock()/tb_unlock()/tb_lock_reset() can catch recursive locking for
>> us, so it's not hard to do without it.
> True this is definitely not nice but seems some code eg: tb_invalidate
> are called from
> different place with or without the tb_lock. We might probably be able
> to lock the
> caller.
>
>>
>> - code_bitmap is not protected by any mutex
>> (tb_invalidate_phys_page_fast is called with the iothread mutex taken,
>> but other users of code_bitmap do not use it). Writes should be
>> protected by the tb_lock, reads by either tb_lock or RCU.
> Yes good point I missed this one.
>
>> - memory barriers are probably requested around accesses to
>> ->exit_request. ->thread_kicked also needs to be accessed with atomics,
>> because async_run_{,safe_}on_cpu can be called outside the big QEMU
>> lock.
>>
>> - the whole signal-based qemu_cpu_kick can just go away. Just setting
>> tcg_exit_req and exit_request will kick the TCG thread. The hairy Win32
>> SuspendThread/ResumeThread goes away too. I suggest doing it now,
>> because proving it unnecessary is easier than proving it correct.
> Just setting tcg_exit_req and exit_request and signal the
> cpu->halt_cond I guess?
BTW that affect KVM as well. Seems this mechanism is used as well with
qemu_cpu_kick_self().. Which is a little strange as it seems the SIGIPI
trigger a
dummy signal handler?
memset(&sigact, 0, sizeof(sigact));
sigact.sa_handler = dummy_signal;
sigaction(SIG_IPI, &sigact, NULL);
>
>> - user-mode emulation is broken (does not compile)
>>
>> - the big QEMU lock is not taken anywhere for MMIO accesses that require
>> it (i.e. basically all of them)
> Isn't that handled by prepare_mmio_access?
>
>>
>> - some code wants to be called _outside_ the big QEMU lock, for example
>> because it longjmps back to cpu_exec. For example, I suspect that the
>> notdirty callbacks must be marked with
>> memory_region_clear_global_locking.
>
> Fred
>>
>> I've started looking at them (and documenting the locking conventions
>> for functions), and I hope to post it to some git repo later this week.
>>
>> Paolo
>>
>>> It can be cloned from:
>>> git@git.greensocs.com:fkonrad/mttcg.git branch multi_tcg_v7.
>>>
>>> This patch-set try to address the different issues in the global
>>> picture of
>>> MTTCG, presented on the wiki.
>>>
>>> == Needed patch for our work ==
>>>
>>> Some preliminaries are needed for our work:
>>> * current_cpu doesn't make sense in mttcg so a tcg_executing flag
>>> is added to
>>> the CPUState.
>>> * We need to run some work safely when all VCPUs are outside their
>>> execution
>>> loop. This is done with the async_run_safe_work_on_cpu function
>>> introduced
>>> in this series.
>>> * QemuSpin lock is introduced (on posix only yet) to allow a
>>> faster handling of
>>> atomic instruction.
>>>
>>> == Code generation and cache ==
>>>
>>> As Qemu stands, there is no protection at all against two threads
>>> attempting to
>>> generate code at the same time or modifying a TranslationBlock.
>>> The "protect TBContext with tb_lock" patch address the issue of code
>>> generation
>>> and makes all the tb_* function thread safe (except tb_flush).
>>> This raised the question of one or multiple caches. We choosed to
>>> use one
>>> unified cache because it's easier as a first step and since the
>>> structure of
>>> QEMU effectively has a ‘local’ cache per CPU in the form of the jump
>>> cache, we
>>> don't see the benefit of having two pools of tbs.
>>>
>>> == Dirty tracking ==
>>>
>>> Protecting the IOs:
>>> To allows all VCPUs threads to run at the same time we need to drop the
>>> global_mutex as soon as possible. The io access need to take the
>>> mutex. This is
>>> likely to change when
>>> http://thread.gmane.org/gmane.comp.emulators.qemu/345258
>>> will be upstreamed.
>>>
>>> Invalidation of TranslationBlocks:
>>> We can have all VCPUs running during an invalidation. Each VCPU is
>>> able to clean
>>> it's jump cache itself as it is in CPUState so that can be handled
>>> by a simple
>>> call to async_run_on_cpu. However tb_invalidate also writes to the
>>> TranslationBlock which is shared as we have only one pool.
>>> Hence this part of invalidate requires all VCPUs to exit before it
>>> can be done.
>>> Hence the async_run_safe_work_on_cpu is introduced to handle this case.
>>>
>>> == Atomic instruction ==
>>>
>>> For now only ARM on x64 is supported by using an cmpxchg instruction.
>>> Specifically the limitation of this approach is that it is harder to
>>> support
>>> 64bit ARM on a host architecture that is multi-core, but only
>>> supports 32 bit
>>> cmpxchg (we believe this could be the case for some PPC cores). For
>>> now this
>>> case is not correctly handled. The existing atomic patch will
>>> attempt to execute
>>> the 64 bit cmpxchg functionality in a non thread safe fashion. Our
>>> intention is
>>> to provide a new multi-thread ARM atomic patch for 64bit ARM on
>>> effective 32bit
>>> hosts.
>>> This atomic instruction part has been tested with Alexander's atomic
>>> stress repo
>>> available here:
>>> https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg05585.html
>>>
>>> The execution is a little slower than upstream probably because of
>>> the different
>>> VCPU fight for the mutex. Swaping arm_exclusive_lock from mutex to
>>> spin_lock
>>> reduce considerably the difference.
>>>
>>> == Testing ==
>>>
>>> A simple double dhrystone test in SMP 2 with vexpress-a15 in a linux
>>> guest show
>>> a good performance progression: it takes basically 18s upstream to
>>> complete vs
>>> 10s with MTTCG.
>>>
>>> Testing image is available here:
>>> https://cloud.greensocs.com/index.php/s/CfHSLzDH5pmTkW3
>>>
>>> Then simply:
>>> ./configure --target-list=arm-softmmu
>>> make -j8
>>> ./arm-softmmu/qemu-system-arm -M vexpress-a15 -smp 2 -kernel zImage
>>> -initrd rootfs.ext2 -dtb vexpress-v2p-ca15-tc1.dtb --nographic
>>> --append "console=ttyAMA0"
>>>
>>> login: root
>>>
>>> The dhrystone command is the last one in the history.
>>> "dhrystone 10000000 & dhrystone 10000000"
>>>
>>> The atomic spinlock benchmark from Alexander shows that atomic
>>> basically work.
>>> Just follow the instruction here:
>>> https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg05585.html
>>>
>>> == Known issues ==
>>>
>>> * GDB stub:
>>> GDB stub is not tested right now it will probably requires some
>>> changes to
>>> work.
>>>
>>> * deadlock on exit:
>>> When exiting QEMU Ctrl-C some VCPU's thread are not able to exit
>>> and continue
>>> execution.
>>> http://git.greensocs.com/fkonrad/mttcg/issues/1
>>>
>>> * memory_region_rom_device_set_romd from pflash01 just crashes the
>>> TCG code.
>>> Strangely this happen only with "-smp 4" and 2 in the DTB.
>>> http://git.greensocs.com/fkonrad/mttcg/issues/2
>>>
>>> Changes V6 -> V7:
>>> * global_lock:
>>> * Don't protect softmmu read/write helper as it's now done in
>>> adress_space_rw.
>>> * tcg_exec_flag:
>>> * Make the flag atomically test and set through an API.
>>> * introduce async_safe_work:
>>> * move qemu_cpu_kick_thread to avoid prototype declaration.
>>> * use the work_mutex.
>>> * async_work:
>>> * protect it with a mutex (work_mutex) against concurent access.
>>> * tb_lock:
>>> * protect tcg_malloc_internal as well.
>>> * signal the VCPU even of current_cpu is NULL.
>>> * added PSCI patch.
>>> * rebased on v2.4.0-rc0 (6169b60285fe1ff730d840a49527e721bfb30899).
>>>
>>> Changes V5 -> V6:
>>> * Introduce async_safe_work to do the tb_flush and some part of
>>> tb_invalidate.
>>> * Introduce QemuSpin from Guillaume which allow a faster atomic
>>> instruction
>>> (6s to pass Alexander's atomic test instead of 30s before).
>>> * Don't take tb_lock before tb_find_fast.
>>> * Handle tb_flush with async_safe_work.
>>> * Handle tb_invalidate with async_work and async_safe_work.
>>> * Drop the tlb_flush_request mechanism and use async_work as well.
>>> * Fix the wrong lenght in atomic patch.
>>> * Fix the wrong return address for exception in atomic patch.
>>>
>>> Alex Bennée (1):
>>> target-arm/psci.c: wake up sleeping CPUs (MTTCG)
>>>
>>> Guillaume Delbergue (1):
>>> add support for spin lock on POSIX systems exclusively
>>>
>>> KONRAD Frederic (17):
>>> cpus: protect queued_work_* with work_mutex.
>>> cpus: add tcg_exec_flag.
>>> cpus: introduce async_run_safe_work_on_cpu.
>>> replace spinlock by QemuMutex.
>>> remove unused spinlock.
>>> protect TBContext with tb_lock.
>>> tcg: remove tcg_halt_cond global variable.
>>> Drop global lock during TCG code execution
>>> cpu: remove exit_request global.
>>> tcg: switch on multithread.
>>> Use atomic cmpxchg to atomically check the exclusive value in a
>>> STREX
>>> add a callback when tb_invalidate is called.
>>> cpu: introduce tlb_flush*_all.
>>> arm: use tlb_flush*_all
>>> translate-all: introduces tb_flush_safe.
>>> translate-all: (wip) use tb_flush_safe when we can't alloc more tb.
>>> mttcg: signal the associated cpu anyway.
>>>
>>> cpu-exec.c | 98 +++++++++------
>>> cpus.c | 295
>>> +++++++++++++++++++++++++-------------------
>>> cputlb.c | 81 ++++++++++++
>>> include/exec/exec-all.h | 8 +-
>>> include/exec/spinlock.h | 49 --------
>>> include/qemu/thread-posix.h | 4 +
>>> include/qemu/thread-win32.h | 4 +
>>> include/qemu/thread.h | 7 ++
>>> include/qom/cpu.h | 57 +++++++++
>>> linux-user/main.c | 6 +-
>>> qom/cpu.c | 20 +++
>>> target-arm/cpu.c | 21 ++++
>>> target-arm/cpu.h | 6 +
>>> target-arm/helper.c | 58 +++------
>>> target-arm/helper.h | 4 +
>>> target-arm/op_helper.c | 128 ++++++++++++++++++-
>>> target-arm/psci.c | 2 +
>>> target-arm/translate.c | 101 +++------------
>>> target-i386/mem_helper.c | 16 ++-
>>> target-i386/misc_helper.c | 27 +++-
>>> tcg/i386/tcg-target.c | 8 ++
>>> tcg/tcg.h | 14 ++-
>>> translate-all.c | 217 +++++++++++++++++++++++++++-----
>>> util/qemu-thread-posix.c | 45 +++++++
>>> util/qemu-thread-win32.c | 30 +++++
>>> vl.c | 6 +
>>> 26 files changed, 934 insertions(+), 378 deletions(-)
>>> delete mode 100644 include/exec/spinlock.h
>>>
>
>
next prev parent reply other threads:[~2015-08-12 15:19 UTC|newest]
Thread overview: 81+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-08-10 15:26 [Qemu-devel] [RFC PATCH V7 00/19] Multithread TCG fred.konrad
2015-08-10 15:26 ` [Qemu-devel] [RFC PATCH V7 01/19] cpus: protect queued_work_* with work_mutex fred.konrad
2015-08-10 15:59 ` Paolo Bonzini
2015-08-10 16:04 ` Frederic Konrad
2015-08-10 16:06 ` Paolo Bonzini
2015-08-10 15:27 ` [Qemu-devel] [RFC PATCH V7 02/19] cpus: add tcg_exec_flag fred.konrad
2015-08-11 10:53 ` Paolo Bonzini
2015-08-11 11:11 ` Frederic Konrad
2015-08-11 12:57 ` Paolo Bonzini
2015-08-10 15:27 ` [Qemu-devel] [RFC PATCH V7 03/19] cpus: introduce async_run_safe_work_on_cpu fred.konrad
2015-08-10 15:27 ` [Qemu-devel] [RFC PATCH V7 04/19] replace spinlock by QemuMutex fred.konrad
2015-08-10 16:09 ` Paolo Bonzini
2015-08-10 15:27 ` [Qemu-devel] [RFC PATCH V7 05/19] remove unused spinlock fred.konrad
2015-08-10 15:27 ` [Qemu-devel] [RFC PATCH V7 06/19] add support for spin lock on POSIX systems exclusively fred.konrad
2015-08-10 16:10 ` Paolo Bonzini
2015-08-10 15:27 ` [Qemu-devel] [RFC PATCH V7 07/19] protect TBContext with tb_lock fred.konrad
2015-08-10 16:36 ` Paolo Bonzini
2015-08-10 16:50 ` Paolo Bonzini
2015-08-10 18:39 ` Alex Bennée
2015-08-11 8:31 ` Paolo Bonzini
2015-08-11 6:46 ` Frederic Konrad
2015-08-11 8:34 ` Paolo Bonzini
2015-08-11 9:21 ` Peter Maydell
2015-08-11 9:59 ` Paolo Bonzini
2015-08-12 17:45 ` Frederic Konrad
2015-08-12 18:20 ` Alex Bennée
2015-08-12 18:22 ` Paolo Bonzini
2015-08-14 8:38 ` Frederic Konrad
2015-08-15 0:04 ` Paolo Bonzini
2015-08-10 15:27 ` [Qemu-devel] [RFC PATCH V7 08/19] tcg: remove tcg_halt_cond global variable fred.konrad
2015-08-10 16:12 ` Paolo Bonzini
2015-08-10 15:27 ` [Qemu-devel] [RFC PATCH V7 09/19] Drop global lock during TCG code execution fred.konrad
2015-08-10 16:15 ` Paolo Bonzini
2015-08-11 6:55 ` Frederic Konrad
2015-08-11 20:12 ` Alex Bennée
2015-08-11 21:34 ` Frederic Konrad
2015-08-12 9:58 ` Paolo Bonzini
2015-08-12 12:32 ` Frederic Konrad
2015-08-10 15:27 ` [Qemu-devel] [RFC PATCH V7 10/19] cpu: remove exit_request global fred.konrad
2015-08-10 15:51 ` Paolo Bonzini
2015-08-10 15:27 ` [Qemu-devel] [RFC PATCH V7 11/19] tcg: switch on multithread fred.konrad
2015-08-13 11:17 ` Paolo Bonzini
2015-08-13 14:41 ` Frederic Konrad
2015-08-13 14:58 ` Paolo Bonzini
2015-08-13 15:18 ` Frederic Konrad
2015-08-10 15:27 ` [Qemu-devel] [RFC PATCH V7 12/19] Use atomic cmpxchg to atomically check the exclusive value in a STREX fred.konrad
2015-08-10 15:27 ` [Qemu-devel] [RFC PATCH V7 13/19] add a callback when tb_invalidate is called fred.konrad
2015-08-10 16:52 ` Paolo Bonzini
2015-08-10 18:41 ` Alex Bennée
2015-08-10 15:27 ` [Qemu-devel] [RFC PATCH V7 14/19] cpu: introduce tlb_flush*_all fred.konrad
2015-08-10 15:54 ` Paolo Bonzini
2015-08-10 16:00 ` Peter Maydell
2015-08-10 15:27 ` [Qemu-devel] [RFC PATCH V7 15/19] arm: use tlb_flush*_all fred.konrad
2015-08-10 15:27 ` [Qemu-devel] [RFC PATCH V7 16/19] translate-all: introduces tb_flush_safe fred.konrad
2015-08-10 16:26 ` Paolo Bonzini
2015-08-12 14:09 ` Paolo Bonzini
2015-08-12 14:11 ` Frederic Konrad
2015-08-12 14:14 ` Paolo Bonzini
2015-08-10 15:27 ` [Qemu-devel] [RFC PATCH V7 17/19] translate-all: (wip) use tb_flush_safe when we can't alloc more tb fred.konrad
2015-08-10 15:27 ` [Qemu-devel] [RFC PATCH V7 18/19] mttcg: signal the associated cpu anyway fred.konrad
2015-08-10 15:51 ` Paolo Bonzini
2015-08-10 15:27 ` [Qemu-devel] [RFC PATCH V7 19/19] target-arm/psci.c: wake up sleeping CPUs (MTTCG) fred.konrad
2015-08-10 16:41 ` Paolo Bonzini
2015-08-10 18:38 ` Alex Bennée
2015-08-10 18:34 ` [Qemu-devel] [RFC PATCH V7 00/19] Multithread TCG Alex Bennée
2015-08-10 23:02 ` Frederic Konrad
2015-08-11 6:15 ` Benjamin Herrenschmidt
2015-08-11 6:27 ` Frederic Konrad
2015-10-07 12:46 ` Claudio Fontana
2015-10-07 14:52 ` Frederic Konrad
2015-10-21 15:09 ` Claudio Fontana
2015-08-11 7:54 ` Alex Bennée
2015-08-11 9:22 ` Benjamin Herrenschmidt
2015-08-11 9:29 ` Peter Maydell
2015-08-11 10:09 ` Benjamin Herrenschmidt
2015-08-11 19:22 ` Alex Bennée
2015-08-11 12:45 ` Paolo Bonzini
2015-08-11 13:59 ` Frederic Konrad
2015-08-11 14:10 ` Paolo Bonzini
2015-08-12 15:19 ` Frederic Konrad [this message]
2015-08-12 15:39 ` Paolo Bonzini
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=55CB63FA.1070200@greensocs.com \
--to=fred.konrad@greensocs.com \
--cc=a.rigo@virtualopensystems.com \
--cc=alex.bennee@linaro.org \
--cc=guillaume.delbergue@greensocs.com \
--cc=mark.burton@greensocs.com \
--cc=mttcg@listserver.greensocs.com \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).