Re: [Qemu-devel] [RFC 00/10] MultiThread TCG.

From: "Alex Bennée" <alex.bennee@linaro.org>
To: Frederic Konrad <fred.konrad@greensocs.com>
Cc: mttcg@listserver.greensocs.com,
	Peter Maydell <peter.maydell@linaro.org>,
	Jan Kiszka <jan.kiszka@siemens.com>,
	Mark Burton <mark.burton@greensocs.com>,
	Alexander Graf <agraf@suse.de>,
	QEMU Developers <qemu-devel@nongnu.org>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [Qemu-devel] [RFC 00/10] MultiThread TCG.
Date: Thu, 23 Apr 2015 16:44:28 +0100
Message-ID: <87wq126e8z.fsf@linaro.org>
In-Reply-To: <55379366.9070304@greensocs.com>


Frederic Konrad <fred.konrad@greensocs.com> writes:

> On 10/04/2015 18:03, Frederic Konrad wrote:
>> On 30/03/2015 23:46, Peter Maydell wrote:
>>> On 30 March 2015 at 07:52, Mark Burton <mark.burton@greensocs.com> wrote:
>>>> So - Fred is unwilling to send the patch set as it stands, because 
>>>> frankly this part is totally broken.
>>>>
>>>> There is an independent patch set that needs splitting out which 
>>>> deals with just the atomic instruction issue - specifically for ARM 
>>>> (though I guess it’s applicable across the board)…
>>>>
>>>> So - in short - I HOPE to get the patch set onto the reflector 
>>>> sometime next week, and I’m sorry for the delay.
>>> What I really want to see is not so much the patch set
>>> but the design sketch I asked for that lists the
>>> various data structures and indicates which ones
>>> are going to be per-cpu, which ones will be shared
>>> (and with what locking), etc.
>>>
>>> -- PMM
>
> Does that make sense?
>
> BTW here is the repository:
> git clone git@git.greensocs.com:fkonrad/mttcg.git -b multi_tcg_v4

Is there a non-authenticated read-only http or git:// access to this repo?

>
> Thanks,
> Fred
>
>> Hi everybody,
>> Hi Peter,
>>
>> I tried to recap what we did, how it "works" and what the status is:
>>
>> All the mechanisms are basically unchanged.
>>
>> A lot of TCG structures are not thread safe.
>> And all TCG threads can run at the same time and sometimes want to
>> generate code at the same time.
>>
>> Translation block related structure:
>>
>> struct TBContext {
>>
>>     TranslationBlock *tbs;
>>     TranslationBlock *tb_phys_hash[CODE_GEN_PHYS_HASH_SIZE];
>>     int nb_tbs;
>>     /* any access to the tbs or the page table must use this lock */
>>     QemuMutex tb_lock;
>>
>>     /* statistics */
>>     int tb_flush_count;
>>     int tb_phys_invalidate_count;
>>
>>     int tb_invalidated_flag;
>> };
>>
>> This structure is used in TCGContext: TBContext tb_ctx;
>>
>> "tbs" is basically where the translated blocks are stored, and
>> tb_phys_hash is a hash table to find them quickly.
>>
>> There are two solutions to prevent thread issues:
>>   A/ Just have two tb_ctx.
>>   B/ Share it between CPUs and protect the tb_ctx access.
>>
>> We took the second solution so all CPUs can benefit from the translated TBs.
>> TBContext is written almost everywhere in translate-all.c.
>> When there are too many tbs, a tb_flush occurs and destroys the array.
>> We don't handle this case right now.
>> tb_lock is already used by user-mode code, so we just convert it to
>> QemuMutex so we can reuse it in system-mode.
>>
>> struct TCGContext {
>>     uint8_t *pool_cur, *pool_end;
>>     TCGPool *pool_first, *pool_current, *pool_first_large;
>>     TCGLabel *labels;
>>     int nb_labels;
>>     int nb_globals;
>>     int nb_temps;
>>
>>     /* goto_tb support */
>>     tcg_insn_unit *code_buf;
>>     uintptr_t *tb_next;
>>     uint16_t *tb_next_offset;
>>     uint16_t *tb_jmp_offset; /* != NULL if USE_DIRECT_JUMP */
>>
>>     /* liveness analysis */
>>     uint16_t *op_dead_args; /* for each operation, each bit tells if the
>>                                corresponding argument is dead */
>>     uint8_t *op_sync_args;  /* for each operation, each bit tells if the
>>                                corresponding output argument needs to be
>>                                sync to memory. */
>>
>>     /* tells in which temporary a given register is. It does not take
>>        into account fixed registers */
>>     int reg_to_temp[TCG_TARGET_NB_REGS];
>>     TCGRegSet reserved_regs;
>>     intptr_t current_frame_offset;
>>     intptr_t frame_start;
>>     intptr_t frame_end;
>>     int frame_reg;
>>
>>     tcg_insn_unit *code_ptr;
>>     TCGTemp temps[TCG_MAX_TEMPS]; /* globals first, temps after */
>>     TCGTempSet free_temps[TCG_TYPE_COUNT * 2];
>>
>>     GHashTable *helpers;
>>
>> #ifdef CONFIG_PROFILER
>>     /* profiling info */
>>     int64_t tb_count1;
>>     int64_t tb_count;
>>     int64_t op_count; /* total insn count */
>>     int op_count_max; /* max insn per TB */
>>     int64_t temp_count;
>>     int temp_count_max;
>>     int64_t del_op_count;
>>     int64_t code_in_len;
>>     int64_t code_out_len;
>>     int64_t interm_time;
>>     int64_t code_time;
>>     int64_t la_time;
>>     int64_t opt_time;
>>     int64_t restore_count;
>>     int64_t restore_time;
>> #endif
>>
>> #ifdef CONFIG_DEBUG_TCG
>>     int temps_in_use;
>>     int goto_tb_issue_mask;
>> #endif
>>
>>     uint16_t gen_opc_buf[OPC_BUF_SIZE];
>>     TCGArg gen_opparam_buf[OPPARAM_BUF_SIZE];
>>
>>     uint16_t *gen_opc_ptr;
>>     TCGArg *gen_opparam_ptr;
>>     target_ulong gen_opc_pc[OPC_BUF_SIZE];
>>     uint16_t gen_opc_icount[OPC_BUF_SIZE];
>>     uint8_t gen_opc_instr_start[OPC_BUF_SIZE];
>>
>>     /* Code generation.  Note that we specifically do not use tcg_insn_unit
>>        here, because there's too much arithmetic throughout that relies
>>        on addition and subtraction working on bytes.  Rely on the GCC
>>        extension that allows arithmetic on void*.  */
>>     int code_gen_max_blocks;
>>     void *code_gen_prologue;
>>     void *code_gen_buffer;
>>     size_t code_gen_buffer_size;
>>     /* threshold to flush the translated code buffer */
>>     size_t code_gen_buffer_max_size;
>>     void *code_gen_ptr;
>>
>>     TBContext tb_ctx;
>>
>>     /* The TCGBackendData structure is private to tcg-target.c. */
>>     struct TCGBackendData *be;
>> };
>>
>> This structure is used to translate the TBs.
>> The easiest solution was to protect code generation so that only one
>> CPU can generate code at a time. This is reasonable, as we don't want
>> duplicate generated TBs in the pool anyway. This is achieved with the
>> tb_lock used above.
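
A minimal sketch of the "one CPU generates code at a time" idea described
above, assuming a toy TB pool and hash table rather than QEMU's real
TBContext. All Demo*/demo_* names and sizes are hypothetical, and a pthread
mutex stands in for QemuMutex. The lookup is repeated under the lock, so two
CPUs asking for the same pc cannot both generate a block:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_HASH_SIZE 256   /* hypothetical sizes, not QEMU's */
#define DEMO_POOL_SIZE 1024

typedef struct DemoTB {
    uint64_t pc;             /* guest address this block translates */
    struct DemoTB *next;     /* hash chain */
} DemoTB;

typedef struct DemoTBContext {
    DemoTB pool[DEMO_POOL_SIZE];
    DemoTB *hash[DEMO_HASH_SIZE];
    int nb_tbs;
    pthread_mutex_t lock;    /* plays the role of tb_lock */
} DemoTBContext;

static DemoTBContext demo_ctx = { .lock = PTHREAD_MUTEX_INITIALIZER };

/* Caller must hold demo_ctx.lock. */
static DemoTB *demo_tb_find_locked(uint64_t pc)
{
    for (DemoTB *tb = demo_ctx.hash[pc % DEMO_HASH_SIZE]; tb; tb = tb->next) {
        if (tb->pc == pc) {
            return tb;
        }
    }
    return NULL;
}

/* Return the block for pc, "translating" it if nobody has done so yet.
 * Returns NULL when the pool is full; real code would need a flush here. */
DemoTB *demo_tb_find_or_gen(uint64_t pc)
{
    pthread_mutex_lock(&demo_ctx.lock);
    DemoTB *tb = demo_tb_find_locked(pc);
    if (!tb && demo_ctx.nb_tbs < DEMO_POOL_SIZE) {
        tb = &demo_ctx.pool[demo_ctx.nb_tbs++];
        tb->pc = pc;
        tb->next = demo_ctx.hash[pc % DEMO_HASH_SIZE];
        demo_ctx.hash[pc % DEMO_HASH_SIZE] = tb;
    }
    pthread_mutex_unlock(&demo_ctx.lock);
    return tb;
}

/* Number of blocks generated so far. */
int demo_tb_count(void)
{
    pthread_mutex_lock(&demo_ctx.lock);
    int n = demo_ctx.nb_tbs;
    pthread_mutex_unlock(&demo_ctx.lock);
    return n;
}
```

Holding one lock across both lookup and generation is the simplest choice and
serializes all translation; it says nothing about how executing threads read
the table without the lock, which this sketch does not attempt.
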
>>
>> TLB:
>>
>> The TLB seems to be CPU-dependent, so it is not really a problem as in our
>> implementation one CPU = one pthread. But sometimes a CPU wants to
>> flush the TLB, through an instruction for example. It is very likely that
>> another CPU in another thread is executing code at the same time. That's
>> why we chose to create a tlb_flush mechanism:
>> When a CPU wants to flush, it asks all CPUs to exit TCG, waits for them
>> to do so, and then exits itself. This can be reused for tb_invalidate
>> and/or tb_flush as well.
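
A rough sketch of such an "everybody out, then flush" rendezvous, using a
pthread mutex and condition variable. This is an illustration under
assumptions, not the code from the patch set: the demo_* names are
hypothetical, and here the requesting CPU leaves its own execution loop
before it waits for the others.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

typedef struct DemoSync {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int running;         /* CPUs currently executing translated code */
    bool flush_pending;  /* some CPU has asked everybody to leave */
} DemoSync;

static DemoSync demo_sync = {
    .lock = PTHREAD_MUTEX_INITIALIZER,
    .cond = PTHREAD_COND_INITIALIZER,
};

/* A CPU calls this before it starts executing; blocks while a flush runs. */
void demo_exec_enter(void)
{
    pthread_mutex_lock(&demo_sync.lock);
    while (demo_sync.flush_pending) {
        pthread_cond_wait(&demo_sync.cond, &demo_sync.lock);
    }
    demo_sync.running++;
    pthread_mutex_unlock(&demo_sync.lock);
}

/* A CPU calls this when it leaves the execution loop. */
void demo_exec_exit(void)
{
    pthread_mutex_lock(&demo_sync.lock);
    demo_sync.running--;
    pthread_cond_broadcast(&demo_sync.cond);
    pthread_mutex_unlock(&demo_sync.lock);
}

/* Polled between blocks: should this CPU leave the execution loop? */
bool demo_exit_requested(void)
{
    pthread_mutex_lock(&demo_sync.lock);
    bool pending = demo_sync.flush_pending;
    pthread_mutex_unlock(&demo_sync.lock);
    return pending;
}

/* Called by a CPU that has already left the loop. Returns once no CPU is
 * executing; the caller may then flush safely. */
void demo_flush_begin(void)
{
    pthread_mutex_lock(&demo_sync.lock);
    while (demo_sync.flush_pending) {        /* one flush at a time */
        pthread_cond_wait(&demo_sync.cond, &demo_sync.lock);
    }
    demo_sync.flush_pending = true;
    while (demo_sync.running > 0) {
        pthread_cond_wait(&demo_sync.cond, &demo_sync.lock);
    }
    pthread_mutex_unlock(&demo_sync.lock);
}

/* Let the other CPUs resume execution. */
void demo_flush_end(void)
{
    pthread_mutex_lock(&demo_sync.lock);
    demo_sync.flush_pending = false;
    pthread_cond_broadcast(&demo_sync.cond);
    pthread_mutex_unlock(&demo_sync.lock);
}

/* Body of one CPU: run nb_blocks "blocks", leaving whenever asked to. */
int demo_cpu_run(int nb_blocks)
{
    int executed = 0;
    while (executed < nb_blocks) {
        demo_exec_enter();
        while (executed < nb_blocks && !demo_exit_requested()) {
            executed++;  /* stands in for running one TB */
        }
        demo_exec_exit();
    }
    return executed;
}
```

A CPU that wants to flush would call demo_exec_exit(), then
demo_flush_begin(), do the flush, and finish with demo_flush_end(). The same
begin/end pair could wrap a tb_invalidate or tb_flush.
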
>>
>> Atomic instructions:
>>
>> Atomic instructions are quite hard to implement.
>> The TranslationBlock implementing the atomic instruction can't be
>> interrupted during execution (e.g. by an interrupt or a signal); a
>> cmpxchg64 helper is used for that.
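
To make the cmpxchg idea concrete, here is a small sketch of a
load-exclusive/store-exclusive pair emulated with a 64-bit compare-and-swap
from C11 <stdatomic.h>. It is an assumption about the general technique, not
the helper from the patch set; DemoExclusive and the demo_* functions are
hypothetical names. The store succeeds only if memory still holds the value
seen by the load, so a write that puts the same value back goes unnoticed
(the usual ABA limitation).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Per-CPU record of the last exclusive load. */
typedef struct DemoExclusive {
    _Atomic uint64_t *addr;
    uint64_t val;
    bool valid;
} DemoExclusive;

/* Load a value and remember where it came from and what it was. */
uint64_t demo_load_exclusive(DemoExclusive *ex, _Atomic uint64_t *addr)
{
    ex->addr = addr;
    ex->val = atomic_load(addr);
    ex->valid = true;
    return ex->val;
}

/* Returns 0 on success and 1 on failure, like the STREX status result. */
int demo_store_exclusive(DemoExclusive *ex, _Atomic uint64_t *addr,
                         uint64_t newval)
{
    bool ok = false;
    if (ex->valid && ex->addr == addr) {
        uint64_t expected = ex->val;
        /* Write newval only if nobody changed the value in between. */
        ok = atomic_compare_exchange_strong(addr, &expected, newval);
    }
    ex->valid = false;  /* the reservation is consumed either way */
    return ok ? 0 : 1;
}

/* A guest-style retry loop built on the pair; returns the old value. */
uint64_t demo_atomic_add(_Atomic uint64_t *addr, uint64_t n)
{
    DemoExclusive ex = {0};
    uint64_t old;
    do {
        old = demo_load_exclusive(&ex, addr);
    } while (demo_store_exclusive(&ex, addr, old + n) != 0);
    return old;
}
```
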
>>
>> QEMU's global lock:
>>
>> TCG threads take the lock during code execution. This is not OK for
>> multi-threading because it means only one thread will be running at a
>> time. That's why we took Jan's patch to allow TCG to run without the
>> lock and take it only when needed.
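
The locking pattern described here can be sketched as follows, assuming a
single pthread mutex in place of QEMU's global lock and a made-up device
register. All demo_* names are hypothetical. Pure guest computation runs
with no lock held, and the lock is taken only around the shared device
access:

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t demo_global_lock = PTHREAD_MUTEX_INITIALIZER;
static int demo_device_reg;  /* shared device state, guarded by the lock */

/* Device access: the global lock is held only for this short section. */
static void demo_device_write(int v)
{
    pthread_mutex_lock(&demo_global_lock);
    demo_device_reg += v;
    pthread_mutex_unlock(&demo_global_lock);
}

int demo_device_read(void)
{
    pthread_mutex_lock(&demo_global_lock);
    int v = demo_device_reg;
    pthread_mutex_unlock(&demo_global_lock);
    return v;
}

/* One CPU executing nb_insns "instructions"; every io_every-th one touches
 * the device. io_every must be greater than zero. */
long demo_cpu_exec(int nb_insns, int io_every)
{
    long acc = 0;
    for (int i = 1; i <= nb_insns; i++) {
        acc += i;                  /* guest computation: no lock held */
        if (i % io_every == 0) {
            demo_device_write(1);  /* lock taken only when needed */
        }
    }
    return acc;
}
```

With the lock held for the whole of demo_cpu_exec() instead, several CPU
threads would run strictly one after another, which is the situation the
paragraph above describes.
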
>>
>> What is the status:
>>
>>  * We can start a vexpress-a15 simulation with two A15s and run two
>>    Dhrystones at a time; performance is increased and it is quite stable.
>>
>> What is missing:
>>
>>  * tb_flush is not implemented correctly.
>>  * The PageDesc structure is not protected. The patch which introduced a
>>    first_tb array was not the right approach and has been removed. This
>>    implies that tb_invalidate is broken.
>>
>> For both issues we plan to use the same mechanism as tlb_flush: exiting
>> all the CPUs, flushing or invalidating, and letting them continue. A
>> generic mechanism must be implemented for that.
>>
>> Known issues:
>>
>>  * The GDB stub is broken because it uses tb_invalidate, which we haven't
>>    implemented yet, and there are probably other issues.
>>  * SMP > 2 crashes, probably because of tb_invalidate as well.
>>  * We don't know the status of the user-mode code, which is probably
>>    broken by our changes.
>>

-- 
Alex Bennée
