From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:50125) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YktjU-0003DV-QO for qemu-devel@nongnu.org; Wed, 22 Apr 2015 08:26:26 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1YktjP-00017Q-QT for qemu-devel@nongnu.org; Wed, 22 Apr 2015 08:26:24 -0400 Received: from greensocs.com ([193.104.36.180]:54231) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YktjP-000172-Fw for qemu-devel@nongnu.org; Wed, 22 Apr 2015 08:26:19 -0400 Message-ID: <55379366.9070304@greensocs.com> Date: Wed, 22 Apr 2015 14:26:14 +0200 From: Frederic Konrad MIME-Version: 1.0 References: <1421428797-23697-1-git-send-email-fred.konrad@greensocs.com> <87wq22bvmd.fsf@linaro.org> <551532CF.600@greensocs.com> <0ACA9B57-1DC6-4C75-8FA4-DEAA701B1E94@greensocs.com> <5527F444.7060808@greensocs.com> In-Reply-To: <5527F444.7060808@greensocs.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [RFC 00/10] MultiThread TCG. Reply-To: Frederic Konrad List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Peter Maydell , Mark Burton Cc: mttcg@listserver.greensocs.com, Jan Kiszka , QEMU Developers , Alexander Graf , Paolo Bonzini , =?UTF-8?B?QWxleCBCZW5uw6ll?= On 10/04/2015 18:03, Frederic Konrad wrote: > On 30/03/2015 23:46, Peter Maydell wrote: >> On 30 March 2015 at 07:52, Mark Burton =20 >> wrote: >>> So - Fred is unwilling to send the patch set as it stands, because=20 >>> frankly this part is totally broken. >>> >>> There is an independent patch set that needs splitting out which=20 >>> deals with just the atomic instruction issue - specifically for ARM=20 >>> (though I guess it=E2=80=99s applicable across the board)=E2=80=A6 >>> >>> So - in short - I HOPE to get the patch set onto the reflector=20 >>> sometime next week, and I=E2=80=99m sorry for the delay. >> What I really want to see is not so much the patch set >> but the design sketch I asked for that lists the >> various data structures and indicates which ones >> are going to be per-cpu, which ones will be shared >> (and with what locking), etc. >> >> -- PMM Does that makes sense? BTW here is the repository: git clone git@git.greensocs.com:fkonrad/mttcg.git -b multi_tcg_v4 Thanks, Fred > Hi everybody, > Hi Peter, > > I tried to recap what we did, how it "works" and what the status: > > All the mechanism are basically unchanged. > > A lot of TCG structures are not thread safe. > And all TCG threads can run at the same times and sometimes want to=20 > generate > code at the same time. > > Translation block related structure: > > struct TBContext { > > TranslationBlock *tbs; > TranslationBlock *tb_phys_hash[CODE_GEN_PHYS_HASH_SIZE]; > int nb_tbs; > /* any access to the tbs or the page table must use this lock */ > QemuMutex tb_lock; > > /* statistics */ > int tb_flush_count; > int tb_phys_invalidate_count; > > int tb_invalidated_flag; > }; > > This structure is used in TCGContext: TBContext tb_ctx; > > "tbs" is basically where the translated block are stored and=20 > tb_phys_hash an > hash table to find them quickly. > > There are two solutions to prevent thread issues: > A/ Just have two tb_ctx. > B/ Share it between CPUs and protect the tb_ctx access. > > We took the second solution so all CPUs can benefit of the translated T= B. > TBContext is written almost everywhere in translate-all.c. > When there are too much tbs a tb_flush occurs and destroy the array.=20 > We don't > handle this case right now. > tb_lock is already used by user-mode code, so we just convert it to=20 > QemuMutex so > we can reuse it in system-mode. > > struct TCGContext { > uint8_t *pool_cur, *pool_end; > TCGPool *pool_first, *pool_current, *pool_first_large; > TCGLabel *labels; > int nb_labels; > int nb_globals; > int nb_temps; > > /* goto_tb support */ > tcg_insn_unit *code_buf; > uintptr_t *tb_next; > uint16_t *tb_next_offset; > uint16_t *tb_jmp_offset; /* !=3D NULL if USE_DIRECT_JUMP */ > > /* liveness analysis */ > uint16_t *op_dead_args; /* for each operation, each bit tells if th= e > corresponding argument is dead */ > uint8_t *op_sync_args; /* for each operation, each bit tells if th= e > corresponding output argument needs to b= e > sync to memory. */ > > /* tells in which temporary a given register is. It does not take > into account fixed registers */ > int reg_to_temp[TCG_TARGET_NB_REGS]; > TCGRegSet reserved_regs; > intptr_t current_frame_offset; > intptr_t frame_start; > intptr_t frame_end; > int frame_reg; > > tcg_insn_unit *code_ptr; > TCGTemp temps[TCG_MAX_TEMPS]; /* globals first, temps after */ > TCGTempSet free_temps[TCG_TYPE_COUNT * 2]; > > GHashTable *helpers; > > #ifdef CONFIG_PROFILER > /* profiling info */ > int64_t tb_count1; > int64_t tb_count; > int64_t op_count; /* total insn count */ > int op_count_max; /* max insn per TB */ > int64_t temp_count; > int temp_count_max; > int64_t del_op_count; > int64_t code_in_len; > int64_t code_out_len; > int64_t interm_time; > int64_t code_time; > int64_t la_time; > int64_t opt_time; > int64_t restore_count; > int64_t restore_time; > #endif > > #ifdef CONFIG_DEBUG_TCG > int temps_in_use; > int goto_tb_issue_mask; > #endif > > uint16_t gen_opc_buf[OPC_BUF_SIZE]; > TCGArg gen_opparam_buf[OPPARAM_BUF_SIZE]; > > uint16_t *gen_opc_ptr; > TCGArg *gen_opparam_ptr; > target_ulong gen_opc_pc[OPC_BUF_SIZE]; > uint16_t gen_opc_icount[OPC_BUF_SIZE]; > uint8_t gen_opc_instr_start[OPC_BUF_SIZE]; > > /* Code generation. Note that we specifically do not use=20 > tcg_insn_unit > here, because there's too much arithmetic throughout that relies > on addition and subtraction working on bytes. Rely on the GCC > extension that allows arithmetic on void*. */ > int code_gen_max_blocks; > void *code_gen_prologue; > void *code_gen_buffer; > size_t code_gen_buffer_size; > /* threshold to flush the translated code buffer */ > size_t code_gen_buffer_max_size; > void *code_gen_ptr; > > TBContext tb_ctx; > > /* The TCGBackendData structure is private to tcg-target.c. */ > struct TCGBackendData *be; > }; > > This structure is used to translate the TBs. > The easier solution was to protect the generation of the code to only=20 > allow one > CPU to generate code at a time. This is normal as we don't want double=20 > generated > tb in the pool anyway. This is achieved with the tb_lock used above. > > TLB: > > TLB seems to be CPU dependant, so it is not really a problem as in our > implementation one CPU =3D one pthread. But sometimes a CPU wants to=20 > flush TLB, > through an instruction for example. It is very likely an other CPU in=20 > an other > thread is executing code at the same time. That's why we choose to=20 > create a > tlb_flush_mechanism: > When a CPU wants to flush it asks and wait all CPU to exit TCG and=20 > then exit > itself. This can be reused for tb_invalidate and or tb_flush as well. > > Atomic instructions: > > Atomic instructions are quite hard to implement. > The TranslationBlock implementing the atomic instruction can't be=20 > interrupted > during the execution (eg: by an interrupt or a signal) cmpxchg64=20 > helper is used > for that. > > QEMU's global lock: > > TCG thread take the lock during code execution. This is not ok for=20 > multi-thread > because that means only one thread will be running at a time. That's=20 > why we took > Jan's patch to allow TCG to run without the lock and take it when neede= d. > > What is the status: > > * We can start a vexpress-a15 simulation with two A15 and run two=20 > dhrystones at > a time, the performance are increased it's quite stable. > > What is missing: > > * tb_flush is not implemented correctly. > * PageDesc structure is not protected the patch which introduced a=20 > first_tb > array was not the right approach and is removed. This implies that > tb_invalidate is broken. > > For both issues we plan to use the same mechanism as tlb_flush:=20 > exiting all the > CPU, flushing, invalidating and let them continue. A generic mechanism=20 > must be > implemented for that. > > Known issues: > > * GDB stub is broken because it uses tb_invalidate and we didn't=20 > implement that > for now, and there are probably other issues. > * SMP > 2 crashes, probably because of tb_invalidate as well. > * We don't know the status of the user code, which is probably broken=20 > by our > changes. >