From: "Alex Bennée" <alex.bennee@linaro.org>
To: Frederic Konrad <fred.konrad@greensocs.com>
Cc: mttcg@listserver.greensocs.com,
Peter Maydell <peter.maydell@linaro.org>,
Jan Kiszka <jan.kiszka@siemens.com>,
Mark Burton <mark.burton@greensocs.com>,
Alexander Graf <agraf@suse.de>,
QEMU Developers <qemu-devel@nongnu.org>,
Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [Qemu-devel] [RFC 00/10] MultiThread TCG.
Date: Thu, 23 Apr 2015 16:44:28 +0100
Message-ID: <87wq126e8z.fsf@linaro.org>
In-Reply-To: <55379366.9070304@greensocs.com>
Frederic Konrad <fred.konrad@greensocs.com> writes:
> On 10/04/2015 18:03, Frederic Konrad wrote:
>> On 30/03/2015 23:46, Peter Maydell wrote:
>>> On 30 March 2015 at 07:52, Mark Burton <mark.burton@greensocs.com>
>>> wrote:
>>>> So - Fred is unwilling to send the patch set as it stands, because
>>>> frankly this part is totally broken.
>>>>
>>>> There is an independent patch set that needs splitting out which
>>>> deals with just the atomic instruction issue - specifically for ARM
>>>> (though I guess it’s applicable across the board)…
>>>>
>>>> So - in short - I HOPE to get the patch set onto the reflector
>>>> sometime next week, and I’m sorry for the delay.
>>> What I really want to see is not so much the patch set
>>> but the design sketch I asked for that lists the
>>> various data structures and indicates which ones
>>> are going to be per-cpu, which ones will be shared
>>> (and with what locking), etc.
>>>
>>> -- PMM
>
> Does that make sense?
>
> BTW here is the repository:
> git clone git@git.greensocs.com:fkonrad/mttcg.git -b multi_tcg_v4
Is there a non-authenticated read-only http or git:// access to this repo?
>
> Thanks,
> Fred
>
>> Hi everybody,
>> Hi Peter,
>>
>> I tried to recap what we did, how it "works" and what the status is:
>>
>> All the mechanisms are basically unchanged.
>>
>> A lot of TCG structures are not thread-safe, and all TCG threads can
>> run at the same time and sometimes want to generate code at the same
>> time.
>>
>> Translation block related structure:
>>
>> struct TBContext {
>>
>>     TranslationBlock *tbs;
>>     TranslationBlock *tb_phys_hash[CODE_GEN_PHYS_HASH_SIZE];
>>     int nb_tbs;
>>     /* any access to the tbs or the page table must use this lock */
>>     QemuMutex tb_lock;
>>
>>     /* statistics */
>>     int tb_flush_count;
>>     int tb_phys_invalidate_count;
>>
>>     int tb_invalidated_flag;
>> };
>>
>> This structure is used in TCGContext: TBContext tb_ctx;
>>
>> "tbs" is basically where the translated blocks are stored, and
>> tb_phys_hash is a hash table used to find them quickly.
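A simplified, single-threaded sketch of how a flat `tbs` array and a chained `tb_phys_hash` table fit together. The `Sketch*` types, the table sizes and the hash function are assumptions for illustration only, not QEMU's actual definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of TBContext: a flat array of TBs ("tbs")
 * plus a chained hash table ("tb_phys_hash") keyed on the guest physical PC.
 * Sizes are illustrative, not QEMU's. */
#define SKETCH_MAX_TBS   64
#define SKETCH_HASH_SIZE 16   /* power of two so the hash can be masked */

typedef struct SketchTB {
    uint64_t phys_pc;                 /* guest physical address of the block */
    struct SketchTB *phys_hash_next;  /* next TB in the same hash bucket */
} SketchTB;

typedef struct {
    SketchTB tbs[SKETCH_MAX_TBS];
    SketchTB *tb_phys_hash[SKETCH_HASH_SIZE];
    int nb_tbs;
} SketchTBContext;

static unsigned sketch_hash(uint64_t phys_pc)
{
    /* Drop the low (alignment) bits and mask to the table size. */
    return (unsigned)((phys_pc >> 2) & (SKETCH_HASH_SIZE - 1));
}

/* Take the next free slot in "tbs" and link it at the head of its bucket.
 * Returns NULL when the array is full. */
static SketchTB *sketch_tb_add(SketchTBContext *ctx, uint64_t phys_pc)
{
    if (ctx->nb_tbs >= SKETCH_MAX_TBS) {
        return NULL;
    }
    SketchTB *tb = &ctx->tbs[ctx->nb_tbs++];
    unsigned h = sketch_hash(phys_pc);
    tb->phys_pc = phys_pc;
    tb->phys_hash_next = ctx->tb_phys_hash[h];
    ctx->tb_phys_hash[h] = tb;
    return tb;
}

/* Walk a single bucket instead of scanning the whole "tbs" array. */
static SketchTB *sketch_tb_find(SketchTBContext *ctx, uint64_t phys_pc)
{
    for (SketchTB *tb = ctx->tb_phys_hash[sketch_hash(phys_pc)];
         tb != NULL; tb = tb->phys_hash_next) {
        if (tb->phys_pc == phys_pc) {
            return tb;
        }
    }
    return NULL;
}
```

When `sketch_tb_add` returns NULL, the real code would perform a tb_flush, which is the case the patch set does not handle yet.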
>>
>> There are two solutions to prevent threading issues:
>> A/ Just have two tb_ctx (i.e. one per CPU).
>> B/ Share it between CPUs and protect the tb_ctx access.
>>
>> We took the second solution so that all CPUs can benefit from the
>> translated TBs. TBContext is written almost everywhere in
>> translate-all.c. When there are too many TBs, a tb_flush occurs and
>> destroys the array. We don't handle this case right now.
>> tb_lock is already used by user-mode code, so we just converted it to a
>> QemuMutex so that we can reuse it in system mode.
>>
>> struct TCGContext {
>>     uint8_t *pool_cur, *pool_end;
>>     TCGPool *pool_first, *pool_current, *pool_first_large;
>>     TCGLabel *labels;
>>     int nb_labels;
>>     int nb_globals;
>>     int nb_temps;
>>
>>     /* goto_tb support */
>>     tcg_insn_unit *code_buf;
>>     uintptr_t *tb_next;
>>     uint16_t *tb_next_offset;
>>     uint16_t *tb_jmp_offset; /* != NULL if USE_DIRECT_JUMP */
>>
>>     /* liveness analysis */
>>     uint16_t *op_dead_args; /* for each operation, each bit tells if the
>>                                corresponding argument is dead */
>>     uint8_t *op_sync_args;  /* for each operation, each bit tells if the
>>                                corresponding output argument needs to be
>>                                sync to memory. */
>>
>>     /* tells in which temporary a given register is. It does not take
>>        into account fixed registers */
>>     int reg_to_temp[TCG_TARGET_NB_REGS];
>>     TCGRegSet reserved_regs;
>>     intptr_t current_frame_offset;
>>     intptr_t frame_start;
>>     intptr_t frame_end;
>>     int frame_reg;
>>
>>     tcg_insn_unit *code_ptr;
>>     TCGTemp temps[TCG_MAX_TEMPS]; /* globals first, temps after */
>>     TCGTempSet free_temps[TCG_TYPE_COUNT * 2];
>>
>>     GHashTable *helpers;
>>
>> #ifdef CONFIG_PROFILER
>>     /* profiling info */
>>     int64_t tb_count1;
>>     int64_t tb_count;
>>     int64_t op_count; /* total insn count */
>>     int op_count_max; /* max insn per TB */
>>     int64_t temp_count;
>>     int temp_count_max;
>>     int64_t del_op_count;
>>     int64_t code_in_len;
>>     int64_t code_out_len;
>>     int64_t interm_time;
>>     int64_t code_time;
>>     int64_t la_time;
>>     int64_t opt_time;
>>     int64_t restore_count;
>>     int64_t restore_time;
>> #endif
>>
>> #ifdef CONFIG_DEBUG_TCG
>>     int temps_in_use;
>>     int goto_tb_issue_mask;
>> #endif
>>
>>     uint16_t gen_opc_buf[OPC_BUF_SIZE];
>>     TCGArg gen_opparam_buf[OPPARAM_BUF_SIZE];
>>
>>     uint16_t *gen_opc_ptr;
>>     TCGArg *gen_opparam_ptr;
>>     target_ulong gen_opc_pc[OPC_BUF_SIZE];
>>     uint16_t gen_opc_icount[OPC_BUF_SIZE];
>>     uint8_t gen_opc_instr_start[OPC_BUF_SIZE];
>>
>>     /* Code generation. Note that we specifically do not use tcg_insn_unit
>>        here, because there's too much arithmetic throughout that relies
>>        on addition and subtraction working on bytes. Rely on the GCC
>>        extension that allows arithmetic on void*. */
>>     int code_gen_max_blocks;
>>     void *code_gen_prologue;
>>     void *code_gen_buffer;
>>     size_t code_gen_buffer_size;
>>     /* threshold to flush the translated code buffer */
>>     size_t code_gen_buffer_max_size;
>>     void *code_gen_ptr;
>>
>>     TBContext tb_ctx;
>>
>>     /* The TCGBackendData structure is private to tcg-target.c. */
>>     struct TCGBackendData *be;
>> };
>>
>> This structure is used to translate the TBs.
>> The easiest solution was to protect code generation so that only one
>> CPU can generate code at a time. This is fine, as we don't want
>> duplicate TBs in the pool anyway. This is achieved with the tb_lock
>> described above.
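A hypothetical sketch of serialized code generation: a single mutex (standing in for tb_lock) covers both the lookup and the generation, so two vCPUs that miss on the same PC cannot both translate it. The direct-mapped cache, the function names and the use of a plain pthread mutex instead of QemuMutex are assumptions for illustration.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CACHE_SIZE 32
#define SKETCH_MAX_VCPUS  8

typedef struct {
    uint64_t pc;
    int valid;
} SketchTB;

static pthread_mutex_t sketch_tb_lock = PTHREAD_MUTEX_INITIALIZER; /* models tb_lock */
static SketchTB sketch_cache[SKETCH_CACHE_SIZE];                   /* models the shared TB pool */
static int sketch_translations;                                    /* times code generation ran */

/* Caller must hold sketch_tb_lock. */
static SketchTB *sketch_lookup_locked(uint64_t pc)
{
    SketchTB *tb = &sketch_cache[pc % SKETCH_CACHE_SIZE];
    return (tb->valid && tb->pc == pc) ? tb : NULL;
}

/* Stand-in for the translator; caller must hold sketch_tb_lock. */
static SketchTB *sketch_gen_code_locked(uint64_t pc)
{
    SketchTB *tb = &sketch_cache[pc % SKETCH_CACHE_SIZE];
    tb->pc = pc;
    tb->valid = 1;
    sketch_translations++;
    return tb;
}

/* Lookup and generation happen under the same lock, so a vCPU that loses
 * the race finds the TB the winner just produced instead of generating a
 * duplicate. */
static SketchTB *sketch_tb_find_or_gen(uint64_t pc)
{
    pthread_mutex_lock(&sketch_tb_lock);
    SketchTB *tb = sketch_lookup_locked(pc);
    if (tb == NULL) {
        tb = sketch_gen_code_locked(pc);
    }
    pthread_mutex_unlock(&sketch_tb_lock);
    return tb;
}

static void *sketch_vcpu(void *opaque)
{
    uint64_t pc = *(const uint64_t *)opaque;
    for (int i = 0; i < 1000; i++) {
        sketch_tb_find_or_gen(pc);
    }
    return NULL;
}

/* Run nvcpus threads that all execute from the same guest PC and return
 * the total number of translations performed so far. */
static int sketch_run(int nvcpus, uint64_t pc)
{
    pthread_t tids[SKETCH_MAX_VCPUS];
    if (nvcpus > SKETCH_MAX_VCPUS) {
        nvcpus = SKETCH_MAX_VCPUS;
    }
    for (int i = 0; i < nvcpus; i++) {
        pthread_create(&tids[i], NULL, sketch_vcpu, &pc);
    }
    for (int i = 0; i < nvcpus; i++) {
        pthread_join(tids[i], NULL);
    }
    return sketch_translations;
}
```

Running four threads on the same PC should report exactly one translation; the price of this design is that code generation never runs in parallel.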
>>
>> TLB:
>>
>> The TLB seems to be CPU-dependent, so it is not really a problem, as in
>> our implementation one CPU = one pthread. But sometimes a CPU wants to
>> flush the TLB, through an instruction for example, and it is very
>> likely that another CPU in another thread is executing code at the
>> same time. That's why we chose to create a tlb_flush mechanism:
>> when a CPU wants to flush, it asks all CPUs to exit TCG, waits for
>> them to do so, and then exits itself. This can be reused for
>> tb_invalidate and/or tb_flush as well.
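A hypothetical sketch of the exit-wait-flush handshake using C11 atomics and pthreads. The flag and counter names, the polling with `sched_yield()` and the callback interface are assumptions; the requesting CPU is modelled by the calling thread rather than by one of the vCPU threads, and each "TLB" is just a counter. The callback form shows how the same handshake could carry a tb_flush or tb_invalidate instead.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NCPUS 2

static atomic_bool sketch_exit_request[SKETCH_NCPUS]; /* raised by the flushing CPU */
static atomic_int sketch_cpus_parked;                  /* vCPUs currently outside the loop */
static atomic_bool sketch_shutdown;
static int sketch_tlb_entries[SKETCH_NCPUS];          /* per-CPU "TLB" fill level */
static pthread_t sketch_threads[SKETCH_NCPUS];

static void *sketch_vcpu_thread(void *opaque)
{
    int idx = (int)(intptr_t)opaque;
    while (!atomic_load(&sketch_shutdown)) {
        if (atomic_load(&sketch_exit_request[idx])) {
            /* Exit "TCG" at a TB boundary and report that we are quiescent. */
            atomic_fetch_add(&sketch_cpus_parked, 1);
            while (atomic_load(&sketch_exit_request[idx])) {
                sched_yield();
            }
            atomic_fetch_sub(&sketch_cpus_parked, 1);
            continue;
        }
        sketch_tlb_entries[idx]++; /* "execute a TB", which fills the TLB */
    }
    return NULL;
}

/* Ask every vCPU to exit, wait until all of them have, run work(arg),
 * then let them continue. Returns the number of vCPUs that were parked. */
static int sketch_run_with_cpus_stopped(void (*work)(void *), void *arg)
{
    for (int i = 0; i < SKETCH_NCPUS; i++) {
        atomic_store(&sketch_exit_request[i], true);
    }
    while (atomic_load(&sketch_cpus_parked) < SKETCH_NCPUS) {
        sched_yield();
    }
    int parked = atomic_load(&sketch_cpus_parked);
    work(arg); /* no vCPU is running, so every TLB can be touched safely */
    for (int i = 0; i < SKETCH_NCPUS; i++) {
        atomic_store(&sketch_exit_request[i], false);
    }
    /* Wait until everyone has resumed so a later request cannot be missed. */
    while (atomic_load(&sketch_cpus_parked) > 0) {
        sched_yield();
    }
    return parked;
}

static void sketch_flush_all_tlbs(void *opaque)
{
    int *flush_count = opaque;
    for (int i = 0; i < SKETCH_NCPUS; i++) {
        sketch_tlb_entries[i] = 0;
    }
    (*flush_count)++;
}

static void sketch_start_cpus(void)
{
    for (int i = 0; i < SKETCH_NCPUS; i++) {
        pthread_create(&sketch_threads[i], NULL, sketch_vcpu_thread,
                       (void *)(intptr_t)i);
    }
}

static void sketch_stop_cpus(void)
{
    atomic_store(&sketch_shutdown, true);
    for (int i = 0; i < SKETCH_NCPUS; i++) {
        pthread_join(sketch_threads[i], NULL);
    }
}
```

Busy-waiting keeps the sketch short; a real implementation would more likely sleep on a condition variable while the vCPUs are parked.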
>>
>> Atomic instructions:
>>
>> Atomic instructions are quite hard to implement.
>> The TranslationBlock implementing the atomic instruction must not be
>> interrupted during its execution (e.g. by an interrupt or a signal);
>> a cmpxchg64 helper is used for that.
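A minimal sketch of the compare-and-swap semantics a cmpxchg64 helper provides, assuming the host offers C11 atomics. The helper's name and signature here are hypothetical, not QEMU's, and the sketch does not model the interrupt and signal aspect; it only shows why concurrent guest read-modify-write sequences stay consistent.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: if *addr still holds cmpval, store newval.
 * Either way, return the value that was found in memory. */
static uint64_t sketch_helper_cmpxchg64(_Atomic uint64_t *addr,
                                        uint64_t cmpval, uint64_t newval)
{
    uint64_t expected = cmpval;
    /* On failure, 'expected' is overwritten with the current contents. */
    atomic_compare_exchange_strong(addr, &expected, newval);
    return expected;
}

/* A guest read-modify-write built on the helper: retry until no other
 * vCPU has written the location in between. */
static void sketch_guest_atomic_add(_Atomic uint64_t *addr, uint64_t delta)
{
    uint64_t old = atomic_load(addr);
    for (;;) {
        uint64_t seen = sketch_helper_cmpxchg64(addr, old, old + delta);
        if (seen == old) {
            return;   /* our store won */
        }
        old = seen;   /* another vCPU got there first: retry */
    }
}

static _Atomic uint64_t sketch_counter;

static void *sketch_vcpu(void *opaque)
{
    int iters = *(const int *)opaque;
    for (int i = 0; i < iters; i++) {
        sketch_guest_atomic_add(&sketch_counter, 1);
    }
    return NULL;
}

/* Run nvcpus threads doing iters atomic increments each; return the total. */
static uint64_t sketch_run_adders(int nvcpus, int iters)
{
    pthread_t tids[8];
    if (nvcpus > 8) {
        nvcpus = 8;
    }
    atomic_store(&sketch_counter, 0);
    for (int i = 0; i < nvcpus; i++) {
        pthread_create(&tids[i], NULL, sketch_vcpu, &iters);
    }
    for (int i = 0; i < nvcpus; i++) {
        pthread_join(tids[i], NULL);
    }
    return atomic_load(&sketch_counter);
}
```

With four threads doing 100000 increments each, no update is lost and the counter ends at 400000.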
>>
>> QEMU's global lock:
>>
>> TCG threads take the lock during code execution. This is not OK for
>> multi-threading because it means only one thread can run at a time.
>> That's why we took Jan's patch, which allows TCG to run without the
>> lock and take it only when needed.
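A hypothetical sketch of that lock discipline: vCPU threads run translated code without the global lock and take it only around device accesses. The lock and function names are assumptions (this is not Jan's patch), and a plain pthread mutex stands in for QEMU's global mutex.

```c
#include <pthread.h>
#include <stdint.h>

#define SKETCH_MAX_VCPUS 8

static pthread_mutex_t sketch_global_lock = PTHREAD_MUTEX_INITIALIZER; /* models the global lock */
static uint32_t sketch_device_reg; /* device state: only touched with the global lock held */

/* Device (MMIO) accesses still take the global lock. */
static void sketch_mmio_write(uint32_t val)
{
    pthread_mutex_lock(&sketch_global_lock);
    sketch_device_reg += val;
    pthread_mutex_unlock(&sketch_global_lock);
}

typedef struct {
    int iters;
    uint64_t local_work; /* per-vCPU state, never shared */
} SketchVcpu;

static void *sketch_vcpu_loop(void *opaque)
{
    SketchVcpu *cpu = opaque;
    for (int i = 0; i < cpu->iters; i++) {
        /* "Execute TBs" without the global lock, so vCPUs run in parallel. */
        cpu->local_work += (uint64_t)i;
        /* Every 100th iteration the guest touches a device: lock only for that. */
        if (i % 100 == 0) {
            sketch_mmio_write(1);
        }
    }
    return NULL;
}

/* Run nvcpus vCPU threads and return the final device register value. */
static uint32_t sketch_run(int nvcpus, int iters)
{
    pthread_t tids[SKETCH_MAX_VCPUS];
    SketchVcpu cpus[SKETCH_MAX_VCPUS];
    if (nvcpus > SKETCH_MAX_VCPUS) {
        nvcpus = SKETCH_MAX_VCPUS;
    }
    sketch_device_reg = 0;
    for (int i = 0; i < nvcpus; i++) {
        cpus[i].iters = iters;
        cpus[i].local_work = 0;
        pthread_create(&tids[i], NULL, sketch_vcpu_loop, &cpus[i]);
    }
    for (int i = 0; i < nvcpus; i++) {
        pthread_join(tids[i], NULL);
    }
    return sketch_device_reg;
}
```

Only the short device access is serialized; the bulk of each vCPU loop runs concurrently.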
>>
>> What is the status:
>>
>> * We can start a vexpress-a15 simulation with two A15 cores and run
>>   two Dhrystone instances at a time; performance increases and it is
>>   quite stable.
>>
>> What is missing:
>>
>> * tb_flush is not implemented correctly.
>> * The PageDesc structure is not protected: the patch which introduced
>>   a first_tb array was not the right approach and has been removed.
>>   This implies that tb_invalidate is broken.
>>
>> For both issues we plan to use the same mechanism as tlb_flush: make
>> all the CPUs exit, flush or invalidate, and then let them continue. A
>> generic mechanism must be implemented for that.
>>
>> Known issues:
>>
>> * The GDB stub is broken because it uses tb_invalidate, which we
>>   haven't implemented yet, and there are probably other issues.
>> * SMP > 2 crashes, probably because of tb_invalidate as well.
>> * We don't know the status of the user-mode code, which is probably
>>   broken by our changes.
>>
--
Alex Bennée