From: Frederic Konrad
Date: Wed, 07 Oct 2015 16:52:56 +0200
Subject: Re: [Qemu-devel] [RFC PATCH V7 00/19] Multithread TCG.
To: Claudio Fontana, Benjamin Herrenschmidt
Cc: mttcg@greensocs.com, mark.burton@greensocs.com, qemu-devel@nongnu.org,
    a.rigo@virtualopensystems.com, guillaume.delbergue@greensocs.com,
    pbonzini@redhat.com, alex.bennee@linaro.org
Message-ID: <561531C8.5000708@greensocs.com>
In-Reply-To: <56151427.4080809@huawei.com>

Hi Claudio,

I'll rebase soon, tomorrow with a bit of luck ;).

Thanks,
Fred

On 07/10/2015 14:46, Claudio Fontana wrote:
> Hello Frederic,
>
> On 11.08.2015 08:27, Frederic Konrad wrote:
>> On 11/08/2015 08:15, Benjamin Herrenschmidt wrote:
>>> On Mon, 2015-08-10 at 17:26 +0200, fred.konrad@greensocs.com wrote:
>>>> From: KONRAD Frederic
>>>>
>>>> This is the 7th round of the MTTCG patch series.
>>>>
>>>>
>>>> It can be cloned from:
>>>> git@git.greensocs.com:fkonrad/mttcg.git branch multi_tcg_v7.
> Would it be possible to rebase on latest qemu?
> I wonder if mttcg is diverging a bit too much from mainline,
> which will make it more difficult to rebase later. (Or did I get confused about all these repos?)
>
> Thank you!
>
> Claudio
>
>>>> This patch set tries to address the different issues in the global picture of
>>>> MTTCG presented on the wiki.
>>>>
>>>> == Needed patches for our work ==
>>>>
>>>> Some preliminaries are needed for our work:
>>>> * current_cpu doesn't make sense in mttcg, so a tcg_executing flag is added to
>>>> the CPUState.
>>> Can't you just make it a TLS?
>> True, that can be done as well. But the tcg_exec_flags has a second meaning, saying
>> "you can't start executing code right now because I want to do a safe_work".
>>>> * We need to run some work safely when all VCPUs are outside their execution
>>>> loop. This is done with the async_run_safe_work_on_cpu function introduced
>>>> in this series.
>>>> * A QemuSpin lock is introduced (only on POSIX for now) to allow faster handling of
>>>> atomic instructions.
>>> How do you handle the memory model? IE, ARM and PPC are OO while x86
>>> is (mostly) in order, so emulating ARM/PPC on x86 is fine but emulating
>>> x86 on ARM or PPC will lead to problems unless you generate memory
>>> barriers with every load/store...
>> For the moment we are trying to do the first case.
>>> At least on POWER7 and later on PPC we have the possibility of setting
>>> the attribute "Strong Access Ordering" with mremap/mprotect (I don't
>>> remember which one) which gives us x86-like memory semantics...
>>>
>>> I don't know if ARM supports something similar. On the other hand, when
>>> emulating ARM on PPC or vice-versa, we can probably get away with no
>>> barriers.
>>>
>>> Do you expose some kind of guest memory model info to the TCG backend so
>>> it can decide how to handle these things?
>>>
>>>> == Code generation and cache ==
>>>>
>>>> As QEMU stands, there is no protection at all against two threads attempting to
>>>> generate code at the same time or modifying a TranslationBlock.
>>>> The "protect TBContext with tb_lock" patch addresses the issue of code generation
>>>> and makes all the tb_* functions thread safe (except tb_flush).
>>>> This raised the question of one or multiple caches. We chose to use one
>>>> unified cache because it's easier as a first step, and since the structure of
>>>> QEMU effectively has a ‘local’ cache per CPU in the form of the jump cache, we
>>>> don't see the benefit of having two pools of TBs.
>>>>
>>>> == Dirty tracking ==
>>>>
>>>> Protecting the IOs:
>>>> To allow all VCPU threads to run at the same time we need to drop the
>>>> global_mutex as soon as possible. IO accesses need to take the mutex. This is
>>>> likely to change when http://thread.gmane.org/gmane.comp.emulators.qemu/345258
>>>> is upstreamed.
>>>>
>>>> Invalidation of TranslationBlocks:
>>>> We can have all VCPUs running during an invalidation. Each VCPU is able to clean
>>>> its jump cache itself, as it is in CPUState, so that can be handled by a simple
>>>> call to async_run_on_cpu. However, tb_invalidate also writes to the
>>>> TranslationBlock, which is shared as we have only one pool.
>>>> Hence this part of invalidate requires all VCPUs to exit before it can be done,
>>>> which is why async_run_safe_work_on_cpu is introduced to handle this case.
>>> What about the host MMU emulation? Is that multithreaded? It has
>>> potential issues when doing things like dirty bit updates into guest
>>> memory; those need to be done atomically. Also TLB invalidations on ARM
>>> and PPC are global, so they will need to invalidate the remote SW TLBs
>>> as well.
>>>
>>> Do you have a mechanism to synchronize with another thread?
>>> IE, make it pop out of TCG if already in and prevent it from getting in? That way
>>> you can "remotely" invalidate its TLB...
>> Yes, that's what the safe_work is doing: ask everybody to exit, prevent VCPUs from
>> resuming (tcg_exec_flag) and do the work when everybody is outside cpu-exec.
>>
>>>> == Atomic instruction ==
>>>>
>>>> For now only ARM on x64 is supported, by using a cmpxchg instruction.
>>>> Specifically, the limitation of this approach is that it is harder to support
>>>> 64-bit ARM on a host architecture that is multi-core but only supports 32-bit
>>>> cmpxchg (we believe this could be the case for some PPC cores).
>>> Right, on the other hand 64-bit will do fine. But then x86 has 2-value
>>> atomics nowadays, doesn't it? And that will be hard to emulate on
>>> anything. You might need to have some kind of global hashed lock list
>>> used by atomics (hash the physical address) as a fallback if you don't
>>> have a 1:1 match between host and guest capabilities.
>> VOS did a "Slow path for atomic instruction translation" series you can find here:
>> https://lists.gnu.org/archive/html/qemu-devel/2015-08/msg00971.html
>>
>> This is what will be used in the end.
>>
>> Thanks,
>> Fred
>>> Cheers,
>>> Ben.
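Ben's fallback idea, a global table of locks indexed by a hash of the guest physical address, can be illustrated with a minimal C sketch. It is not code from this series or from the VOS slow-path patches: the names atomic_locks_init, atomic_lock_for and emulated_cmpxchg2, the 256-bucket table and the assumption that 16-byte operands are naturally aligned are all hypothetical. It only serialises emulated atomics against each other; ordinary guest stores to the same address do not take the lock, so a complete solution would still have to deal with them.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sizing: 2^8 = 256 lock buckets. */
#define ATOMIC_LOCK_BITS    8
#define ATOMIC_LOCK_BUCKETS (1u << ATOMIC_LOCK_BITS)

static pthread_mutex_t atomic_locks[ATOMIC_LOCK_BUCKETS];

/* Initialise every bucket once, before any vCPU thread starts. */
static void atomic_locks_init(void)
{
    for (unsigned i = 0; i < ATOMIC_LOCK_BUCKETS; i++) {
        int rc = pthread_mutex_init(&atomic_locks[i], NULL);
        assert(rc == 0);
        (void)rc;
    }
}

/* Hash a guest physical address to a bucket. The low 4 bits are dropped so
 * that both halves of a naturally aligned 16-byte operand share one lock;
 * the multiply spreads neighbouring blocks across buckets (Fibonacci
 * hashing), and the top ATOMIC_LOCK_BITS bits select the bucket. */
static pthread_mutex_t *atomic_lock_for(uint64_t paddr)
{
    uint64_t h = (paddr >> 4) * 0x9E3779B97F4A7C15ULL;
    return &atomic_locks[h >> (64 - ATOMIC_LOCK_BITS)];
}

/* Emulate a guest double-word compare-and-swap. `host` points at the guest
 * memory as mapped in the host; `paddr` is only used to choose the lock.
 * Returns true and stores the new pair only if both old values matched. */
static bool emulated_cmpxchg2(uint64_t *host, uint64_t paddr,
                              uint64_t old_lo, uint64_t old_hi,
                              uint64_t new_lo, uint64_t new_hi)
{
    pthread_mutex_t *lock = atomic_lock_for(paddr);
    bool swapped = false;

    pthread_mutex_lock(lock);
    if (host[0] == old_lo && host[1] == old_hi) {
        host[0] = new_lo;
        host[1] = new_hi;
        swapped = true;
    }
    pthread_mutex_unlock(lock);
    return swapped;
}
```

Two atomics on the same 16-byte block always pick the same bucket, while unrelated addresses usually land in different buckets, so contention stays low as long as the table is large enough for the number of vCPUs.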
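The safe-work rendezvous described in the thread (ask every vCPU to exit, stop them from resuming through the tcg_exec_flag, then do the work once everybody is outside cpu-exec) can be sketched with a mutex and a condition variable. This is a hypothetical, simplified and synchronous sketch, not the async_run_safe_work_on_cpu code from the series: ExecGate, exec_gate_enter, exec_gate_exit, exec_gate_run_safe and count_work are invented names, and the step that kicks running vCPUs out of generated code is left out (the requester simply waits for them to leave).

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical gate shared by all vCPU threads. */
typedef struct ExecGate {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int running;        /* vCPUs currently inside the execution loop */
    int pending;        /* safe-work requests waiting or in progress */
} ExecGate;

#define EXEC_GATE_INITIALIZER \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0 }

/* Called by a vCPU before entering the execution loop: waits while any
 * safe work is pending, then counts itself as running. */
static void exec_gate_enter(ExecGate *g)
{
    pthread_mutex_lock(&g->lock);
    while (g->pending > 0) {
        pthread_cond_wait(&g->cond, &g->lock);
    }
    g->running++;
    pthread_mutex_unlock(&g->lock);
}

/* Called by a vCPU after leaving the execution loop. */
static void exec_gate_exit(ExecGate *g)
{
    pthread_mutex_lock(&g->lock);
    assert(g->running > 0);
    g->running--;
    pthread_cond_broadcast(&g->cond);   /* a safe-work requester may wait */
    pthread_mutex_unlock(&g->lock);
}

/* Runs fn(opaque) once no vCPU is inside the loop. Must be called from a
 * thread that is not itself counted in `running`, or it deadlocks. */
static void exec_gate_run_safe(ExecGate *g, void (*fn)(void *), void *opaque)
{
    pthread_mutex_lock(&g->lock);
    g->pending++;                       /* stop new vCPUs from entering */
    while (g->running > 0) {
        pthread_cond_wait(&g->cond, &g->lock);
    }
    fn(opaque);                         /* nobody is executing guest code */
    g->pending--;
    pthread_cond_broadcast(&g->cond);   /* let blocked vCPUs resume */
    pthread_mutex_unlock(&g->lock);
}

/* Example work item, standing in for e.g. a shared TranslationBlock update. */
static void count_work(void *opaque)
{
    (*(int *)opaque)++;
}
```

Using a counter rather than a boolean for `pending` keeps vCPUs blocked until every queued request has run, even when two threads ask for safe work at the same time.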