Re: [Qemu-devel] [RFC 00/10] MultiThread TCG.


From: "Alex Bennée" <alex.bennee@linaro.org>
To: Frederic Konrad <fred.konrad@greensocs.com>
Cc: mttcg@listserver.greensocs.com,
	Peter Maydell <peter.maydell@linaro.org>,
	Jan Kiszka <jan.kiszka@siemens.com>,
	Mark Burton <mark.burton@greensocs.com>,
	Alexander Graf <agraf@suse.de>,
	QEMU Developers <qemu-devel@nongnu.org>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [Qemu-devel] [RFC 00/10] MultiThread TCG.
Date: Thu, 23 Apr 2015 16:44:28 +0100
Message-ID: <87wq126e8z.fsf@linaro.org>
In-Reply-To: <55379366.9070304@greensocs.com>


Frederic Konrad <fred.konrad@greensocs.com> writes:

> On 10/04/2015 18:03, Frederic Konrad wrote:
>> On 30/03/2015 23:46, Peter Maydell wrote:
>>> On 30 March 2015 at 07:52, Mark Burton <mark.burton@greensocs.com> wrote:
>>>> So - Fred is unwilling to send the patch set as it stands, because 
>>>> frankly this part is totally broken.
>>>>
>>>> There is an independent patch set that needs splitting out which 
>>>> deals with just the atomic instruction issue - specifically for ARM 
>>>> (though I guess it’s applicable across the board)…
>>>>
>>>> So - in short - I HOPE to get the patch set onto the reflector 
>>>> sometime next week, and I’m sorry for the delay.
>>> What I really want to see is not so much the patch set
>>> but the design sketch I asked for that lists the
>>> various data structures and indicates which ones
>>> are going to be per-cpu, which ones will be shared
>>> (and with what locking), etc.
>>>
>>> -- PMM
>
> Does that make sense?
>
> BTW here is the repository:
> git clone git@git.greensocs.com:fkonrad/mttcg.git -b multi_tcg_v4

Is there a non-authenticated read-only http or git:// access to this repo?

>
> Thanks,
> Fred
>
>> Hi everybody,
>> Hi Peter,
>>
>> I tried to recap what we did, how it "works" and what the status is:
>>
>> All the mechanisms are basically unchanged.
>>
>> A lot of TCG structures are not thread safe.
>> And all TCG threads can run at the same time and sometimes want to
>> generate code at the same time.
>>
>> Translation block related structure:
>>
>> struct TBContext {
>>
>>     TranslationBlock *tbs;
>>     TranslationBlock *tb_phys_hash[CODE_GEN_PHYS_HASH_SIZE];
>>     int nb_tbs;
>>     /* any access to the tbs or the page table must use this lock */
>>     QemuMutex tb_lock;
>>
>>     /* statistics */
>>     int tb_flush_count;
>>     int tb_phys_invalidate_count;
>>
>>     int tb_invalidated_flag;
>> };
>>
>> This structure is used in TCGContext: TBContext tb_ctx;
>>
>> "tbs" is basically where the translated blocks are stored, and
>> tb_phys_hash is a hash table to find them quickly.
>>
>> There are two solutions to prevent thread issues:
>>   A/ Just have two tb_ctx.
>>   B/ Share it between CPUs and protect the tb_ctx access.
>>
>> We took the second solution so all CPUs can benefit from the translated TBs.
>> TBContext is written almost everywhere in translate-all.c.
>> When there are too many tbs a tb_flush occurs and destroys the array.
>> We don't handle this case right now.
>> tb_lock is already used by user-mode code, so we just convert it to
>> QemuMutex so we can reuse it in system-mode.
>>
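
A minimal sketch of option B, assuming nothing about the real patch: one
table shared by all CPUs, with every access taken under a single mutex.
Block, SharedBlocks, HASH_SIZE and the shared_blocks_* functions are
hypothetical stand-ins, not QEMU names, and a plain pthread mutex takes the
place of QemuMutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define HASH_SIZE 64              /* hypothetical bucket count */

/* Simplified stand-in for a translated block. */
typedef struct Block {
    uint64_t phys_pc;             /* guest physical address of the block */
    struct Block *next;           /* collision chain within one bucket */
} Block;

/* Simplified stand-in for a shared, lock-protected block table. */
typedef struct {
    Block *hash[HASH_SIZE];
    int nb_blocks;
    pthread_mutex_t lock;         /* guards hash[] and nb_blocks */
} SharedBlocks;

static void shared_blocks_init(SharedBlocks *s)
{
    for (size_t i = 0; i < HASH_SIZE; i++) {
        s->hash[i] = NULL;
    }
    s->nb_blocks = 0;
    pthread_mutex_init(&s->lock, NULL);
}

/* Look up a block by physical PC; returns NULL if it is not present. */
static Block *shared_blocks_find(SharedBlocks *s, uint64_t phys_pc)
{
    pthread_mutex_lock(&s->lock);
    Block *b = s->hash[phys_pc % HASH_SIZE];
    while (b != NULL && b->phys_pc != phys_pc) {
        b = b->next;
    }
    pthread_mutex_unlock(&s->lock);
    return b;
}

/* Publish a block so that every CPU can find it. */
static void shared_blocks_insert(SharedBlocks *s, Block *b)
{
    assert(b != NULL);
    pthread_mutex_lock(&s->lock);
    Block **head = &s->hash[b->phys_pc % HASH_SIZE];
    b->next = *head;
    *head = b;
    s->nb_blocks++;
    pthread_mutex_unlock(&s->lock);
}
```

The sketch leaves out the overflow case mentioned above: nothing here
flushes or destroys the table.
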
>> struct TCGContext {
>>     uint8_t *pool_cur, *pool_end;
>>     TCGPool *pool_first, *pool_current, *pool_first_large;
>>     TCGLabel *labels;
>>     int nb_labels;
>>     int nb_globals;
>>     int nb_temps;
>>
>>     /* goto_tb support */
>>     tcg_insn_unit *code_buf;
>>     uintptr_t *tb_next;
>>     uint16_t *tb_next_offset;
>>     uint16_t *tb_jmp_offset; /* != NULL if USE_DIRECT_JUMP */
>>
>>     /* liveness analysis */
>>     uint16_t *op_dead_args; /* for each operation, each bit tells if the
>>                                corresponding argument is dead */
>>     uint8_t *op_sync_args;  /* for each operation, each bit tells if the
>>                                corresponding output argument needs to be
>>                                sync to memory. */
>>
>>     /* tells in which temporary a given register is. It does not take
>>        into account fixed registers */
>>     int reg_to_temp[TCG_TARGET_NB_REGS];
>>     TCGRegSet reserved_regs;
>>     intptr_t current_frame_offset;
>>     intptr_t frame_start;
>>     intptr_t frame_end;
>>     int frame_reg;
>>
>>     tcg_insn_unit *code_ptr;
>>     TCGTemp temps[TCG_MAX_TEMPS]; /* globals first, temps after */
>>     TCGTempSet free_temps[TCG_TYPE_COUNT * 2];
>>
>>     GHashTable *helpers;
>>
>> #ifdef CONFIG_PROFILER
>>     /* profiling info */
>>     int64_t tb_count1;
>>     int64_t tb_count;
>>     int64_t op_count; /* total insn count */
>>     int op_count_max; /* max insn per TB */
>>     int64_t temp_count;
>>     int temp_count_max;
>>     int64_t del_op_count;
>>     int64_t code_in_len;
>>     int64_t code_out_len;
>>     int64_t interm_time;
>>     int64_t code_time;
>>     int64_t la_time;
>>     int64_t opt_time;
>>     int64_t restore_count;
>>     int64_t restore_time;
>> #endif
>>
>> #ifdef CONFIG_DEBUG_TCG
>>     int temps_in_use;
>>     int goto_tb_issue_mask;
>> #endif
>>
>>     uint16_t gen_opc_buf[OPC_BUF_SIZE];
>>     TCGArg gen_opparam_buf[OPPARAM_BUF_SIZE];
>>
>>     uint16_t *gen_opc_ptr;
>>     TCGArg *gen_opparam_ptr;
>>     target_ulong gen_opc_pc[OPC_BUF_SIZE];
>>     uint16_t gen_opc_icount[OPC_BUF_SIZE];
>>     uint8_t gen_opc_instr_start[OPC_BUF_SIZE];
>>
>>     /* Code generation.  Note that we specifically do not use tcg_insn_unit
>>        here, because there's too much arithmetic throughout that relies
>>        on addition and subtraction working on bytes.  Rely on the GCC
>>        extension that allows arithmetic on void*.  */
>>     int code_gen_max_blocks;
>>     void *code_gen_prologue;
>>     void *code_gen_buffer;
>>     size_t code_gen_buffer_size;
>>     /* threshold to flush the translated code buffer */
>>     size_t code_gen_buffer_max_size;
>>     void *code_gen_ptr;
>>
>>     TBContext tb_ctx;
>>
>>     /* The TCGBackendData structure is private to tcg-target.c. */
>>     struct TCGBackendData *be;
>> };
>>
>> This structure is used to translate the TBs.
>> The easiest solution was to protect code generation so that only one
>> CPU can generate code at a time. This is fine, as we don't want
>> duplicate generated TBs in the pool anyway. This is achieved with the
>> tb_lock used above.
>>
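
To illustrate why doing the lookup and the generation under the same lock
avoids duplicate blocks, here is a small sketch with two threads walking the
same PCs. All names (gen_lock, translated, get_or_translate, run_two_vcpus)
are hypothetical, and setting a flag stands in for the real code generator.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define N_PCS 16                  /* hypothetical tiny guest address space */

static pthread_mutex_t gen_lock = PTHREAD_MUTEX_INITIALIZER;
static bool translated[N_PCS];    /* guarded by gen_lock */
static int translations;          /* guarded by gen_lock */

/* Make sure a translation for pc exists, generating it at most once. */
static void get_or_translate(unsigned pc)
{
    assert(pc < N_PCS);
    pthread_mutex_lock(&gen_lock);
    if (!translated[pc]) {        /* check and generate under one lock */
        translated[pc] = true;    /* stands in for code generation */
        translations++;
    }
    pthread_mutex_unlock(&gen_lock);
}

static void *vcpu_thread(void *arg)
{
    (void)arg;
    for (unsigned pc = 0; pc < N_PCS; pc++) {
        get_or_translate(pc);
    }
    return NULL;
}

/* Run two "vCPU" threads over the same PCs; returns the translation count. */
static int run_two_vcpus(void)
{
    pthread_t t[2];
    for (int i = 0; i < 2; i++) {
        pthread_create(&t[i], NULL, vcpu_thread, NULL);
    }
    for (int i = 0; i < 2; i++) {
        pthread_join(t[i], NULL);
    }
    return translations;
}
```

Whichever thread reaches a PC first generates it; the other finds it already
present, so the count never exceeds N_PCS. The cost is that generation is
fully serialized.
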
>> TLB:
>>
>> TLB seems to be CPU dependent, so it is not really a problem, as in our
>> implementation one CPU = one pthread. But sometimes a CPU wants to flush
>> the TLB, through an instruction for example. It is very likely that another
>> CPU in another thread is executing code at the same time. That's why we
>> chose to create a tlb_flush mechanism:
>> When a CPU wants to flush, it asks and waits for all CPUs to exit TCG and
>> then exits itself. This can be reused for tb_invalidate and/or tb_flush as
>> well.
>>
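
A single-threaded model of that rendezvous, as I read the description: the
flush is only performed once every CPU has left the execution loop. This is
a sketch under that assumption, not the code from the branch; FlushState,
flush_request and cpu_exit_tcg are hypothetical names, and a real version
would need a mutex and condition variable around this state.

```c
#include <assert.h>
#include <stdbool.h>

#define N_CPUS 2                  /* hypothetical CPU count */

typedef struct {
    bool flush_pending;           /* a CPU has asked for a flush */
    bool in_tcg[N_CPUS];          /* true while a CPU runs translated code */
    int flushes_done;
} FlushState;

static void flush_init(FlushState *s)
{
    s->flush_pending = false;
    s->flushes_done = 0;
    for (int i = 0; i < N_CPUS; i++) {
        s->in_tcg[i] = true;
    }
}

static bool all_cpus_out(const FlushState *s)
{
    for (int i = 0; i < N_CPUS; i++) {
        if (s->in_tcg[i]) {
            return false;
        }
    }
    return true;
}

/* A CPU asks for a flush; nothing is flushed yet. */
static void flush_request(FlushState *s)
{
    s->flush_pending = true;
}

/*
 * Called when `cpu` leaves its execution loop. If a flush is pending and
 * this was the last CPU still running, the flush is performed and all CPUs
 * are allowed back in. Returns true if the flush happened on this call.
 */
static bool cpu_exit_tcg(FlushState *s, int cpu)
{
    assert(cpu >= 0 && cpu < N_CPUS);
    s->in_tcg[cpu] = false;
    if (!s->flush_pending || !all_cpus_out(s)) {
        return false;
    }
    s->flushes_done++;            /* the actual flush would go here */
    s->flush_pending = false;
    for (int i = 0; i < N_CPUS; i++) {
        s->in_tcg[i] = true;      /* let every CPU resume */
    }
    return true;
}
```
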
>> Atomic instructions:
>>
>> Atomic instructions are quite hard to implement.
>> The TranslationBlock implementing the atomic instruction can't be
>> interrupted during execution (e.g. by an interrupt or a signal); the
>> cmpxchg64 helper is used for that.
>>
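
For readers unfamiliar with the compare-and-swap idea behind a cmpxchg64
style helper, a generic retry loop using C11 <stdatomic.h> looks like the
following. This is not the helper from the patch set, just an illustration
of the primitive; atomic_add_via_cmpxchg is a hypothetical name.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Atomically add `delta` to *addr with a 64-bit compare-and-swap retry
 * loop. Returns the value that was stored before the addition.
 */
static uint64_t atomic_add_via_cmpxchg(_Atomic uint64_t *addr, uint64_t delta)
{
    assert(addr != NULL);
    uint64_t old = atomic_load(addr);
    /* On failure, `old` is reloaded with the current value and we retry. */
    while (!atomic_compare_exchange_weak(addr, &old, old + delta)) {
        /* another thread changed *addr between the load and the swap */
    }
    return old;
}
```

The store only succeeds if nobody else modified the location in between,
which is the property an emulated exclusive load/store pair needs.
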
>> QEMU's global lock:
>>
>> TCG threads take the lock during code execution. This is not OK for
>> multi-thread because it means only one thread will be running at a time.
>> That's why we took Jan's patch to allow TCG to run without the lock and
>> take it only when needed.
>>
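
A rough sketch of "run without the lock, take it when needed", assuming the
only shared state is a device register: the execution loop touches
CPU-private state freely and takes the lock only around device access.
global_lock, device_write and run_guest are hypothetical names, and a
pthread mutex stands in for QEMU's global lock.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;
static int device_reg;            /* shared device state, guarded by lock */
static int lock_acquisitions;     /* guarded by lock, for illustration */

/* Hypothetical device write: the only place that takes the lock. */
static void device_write(int value)
{
    pthread_mutex_lock(&global_lock);
    device_reg = value;
    lock_acquisitions++;
    pthread_mutex_unlock(&global_lock);
}

/*
 * Execute n_insns "guest instructions". Every io_every-th instruction
 * writes to the device; everything else runs without the lock.
 */
static int run_guest(int n_insns, int io_every)
{
    assert(io_every > 0);
    int acc = 0;                  /* CPU-private state, no lock needed */
    for (int i = 1; i <= n_insns; i++) {
        acc += i;
        if (i % io_every == 0) {
            device_write(acc);
        }
    }
    return acc;
}
```
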
>> What is the status:
>>
>>  * We can start a vexpress-a15 simulation with two A15s and run two
>>    dhrystones at a time; performance is increased and it is quite stable.
>>
>> What is missing:
>>
>>  * tb_flush is not implemented correctly.
>>  * The PageDesc structure is not protected. The patch which introduced a
>>    first_tb array was not the right approach and has been removed. This
>>    implies that tb_invalidate is broken.
>>
>> For both issues we plan to use the same mechanism as tlb_flush: exiting
>> all the CPUs, flushing, invalidating, and letting them continue. A generic
>> mechanism must be implemented for that.
>>
>> Known issues:
>>
>>  * The GDB stub is broken because it uses tb_invalidate, which we have not
>>    implemented yet, and there are probably other issues.
>>  * SMP > 2 crashes, probably because of tb_invalidate as well.
>>  * We don't know the status of the user code, which is probably broken by
>>    our changes.
>>

-- 
Alex Bennée
