Re: [Qemu-devel] [RFC 00/10] MultiThread TCG.

From: Frederic Konrad <fred.konrad@greensocs.com>
To: Peter Maydell <peter.maydell@linaro.org>,
	Mark Burton <mark.burton@greensocs.com>
Cc: mttcg@listserver.greensocs.com,
	"Jan Kiszka" <jan.kiszka@siemens.com>,
	"QEMU Developers" <qemu-devel@nongnu.org>,
	"Alexander Graf" <agraf@suse.de>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Alex Bennée" <alex.bennee@linaro.org>
Subject: Re: [Qemu-devel] [RFC 00/10] MultiThread TCG.
Date: Fri, 10 Apr 2015 18:03:16 +0200
Message-ID: <5527F444.7060808@greensocs.com>
In-Reply-To: <CAFEAcA8FTNWbWfjfVd31e53=UCjctM5vCExqh6cqh2i8ePZ1Yg@mail.gmail.com>

On 30/03/2015 23:46, Peter Maydell wrote:
> On 30 March 2015 at 07:52, Mark Burton <mark.burton@greensocs.com> wrote:
>> So - Fred is unwilling to send the patch set as it stands, because frankly this part is totally broken.
>>
>> There is an independent patch set that needs splitting out which deals with just the atomic instruction issue - specifically for ARM (though I guess it’s applicable across the board)…
>>
>> So - in short - I HOPE to get the patch set onto the reflector sometime next week, and I’m sorry for the delay.
> What I really want to see is not so much the patch set
> but the design sketch I asked for that lists the
> various data structures and indicates which ones
> are going to be per-cpu, which ones will be shared
> (and with what locking), etc.
>
> -- PMM
Hi everybody,
Hi Peter,

I tried to recap what we did, how it "works" and what the status is:

All the mechanisms are basically unchanged.

A lot of TCG structures are not thread safe, yet all TCG threads can run at
the same time and sometimes want to generate code at the same time.

Translation block related structure:

struct TBContext {

     TranslationBlock *tbs;
     TranslationBlock *tb_phys_hash[CODE_GEN_PHYS_HASH_SIZE];
     int nb_tbs;
     /* any access to the tbs or the page table must use this lock */
     QemuMutex tb_lock;

     /* statistics */
     int tb_flush_count;
     int tb_phys_invalidate_count;

     int tb_invalidated_flag;
};

This structure is used in TCGContext: TBContext tb_ctx;

"tbs" is basically where the translated blocks are stored, and tb_phys_hash is
a hash table to find them quickly.

There are two solutions to prevent thread issues:
   A/ Just have two tb_ctx.
   B/ Share it between CPUs and protect the tb_ctx access.

We took the second solution so that all CPUs can benefit from the already
translated TBs.
TBContext is written almost everywhere in translate-all.c.
When there are too many TBs, a tb_flush occurs and destroys the array. We
don't handle this case right now.
tb_lock is already used by user-mode code, so we just converted it to a
QemuMutex so that we can reuse it in system mode.

struct TCGContext {
     uint8_t *pool_cur, *pool_end;
     TCGPool *pool_first, *pool_current, *pool_first_large;
     TCGLabel *labels;
     int nb_labels;
     int nb_globals;
     int nb_temps;

     /* goto_tb support */
     tcg_insn_unit *code_buf;
     uintptr_t *tb_next;
     uint16_t *tb_next_offset;
     uint16_t *tb_jmp_offset; /* != NULL if USE_DIRECT_JUMP */

     /* liveness analysis */
     uint16_t *op_dead_args; /* for each operation, each bit tells if the
                                corresponding argument is dead */
     uint8_t *op_sync_args;  /* for each operation, each bit tells if the
                                corresponding output argument needs to be
                                sync to memory. */

     /* tells in which temporary a given register is. It does not take
        into account fixed registers */
     int reg_to_temp[TCG_TARGET_NB_REGS];
     TCGRegSet reserved_regs;
     intptr_t current_frame_offset;
     intptr_t frame_start;
     intptr_t frame_end;
     int frame_reg;

     tcg_insn_unit *code_ptr;
     TCGTemp temps[TCG_MAX_TEMPS]; /* globals first, temps after */
     TCGTempSet free_temps[TCG_TYPE_COUNT * 2];

     GHashTable *helpers;

#ifdef CONFIG_PROFILER
     /* profiling info */
     int64_t tb_count1;
     int64_t tb_count;
     int64_t op_count; /* total insn count */
     int op_count_max; /* max insn per TB */
     int64_t temp_count;
     int temp_count_max;
     int64_t del_op_count;
     int64_t code_in_len;
     int64_t code_out_len;
     int64_t interm_time;
     int64_t code_time;
     int64_t la_time;
     int64_t opt_time;
     int64_t restore_count;
     int64_t restore_time;
#endif

#ifdef CONFIG_DEBUG_TCG
     int temps_in_use;
     int goto_tb_issue_mask;
#endif

     uint16_t gen_opc_buf[OPC_BUF_SIZE];
     TCGArg gen_opparam_buf[OPPARAM_BUF_SIZE];

     uint16_t *gen_opc_ptr;
     TCGArg *gen_opparam_ptr;
     target_ulong gen_opc_pc[OPC_BUF_SIZE];
     uint16_t gen_opc_icount[OPC_BUF_SIZE];
     uint8_t gen_opc_instr_start[OPC_BUF_SIZE];

     /* Code generation.  Note that we specifically do not use tcg_insn_unit
        here, because there's too much arithmetic throughout that relies
        on addition and subtraction working on bytes.  Rely on the GCC
        extension that allows arithmetic on void*.  */
     int code_gen_max_blocks;
     void *code_gen_prologue;
     void *code_gen_buffer;
     size_t code_gen_buffer_size;
     /* threshold to flush the translated code buffer */
     size_t code_gen_buffer_max_size;
     void *code_gen_ptr;

     TBContext tb_ctx;

     /* The TCGBackendData structure is private to tcg-target.c.  */
     struct TCGBackendData *be;
};

This structure is used to translate the TBs.
The easiest solution was to protect code generation so that only one CPU can
generate code at a time. This is reasonable, as we don't want duplicate
generated TBs in the pool anyway. This is achieved with the tb_lock used above.
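
To illustrate the "one generator at a time" idea, here is a minimal sketch
using plain pthreads instead of QemuMutex. All Sketch* names and sizes are
hypothetical and only mimic the shape of TBContext; this is not the code from
the patch set. The lookup and the generation happen under the same lock, so a
second CPU asking for the same pc finds the existing block instead of
generating a duplicate.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_HASH_SIZE 64   /* hypothetical, stands in for CODE_GEN_PHYS_HASH_SIZE */
#define SKETCH_MAX_TBS   128  /* hypothetical capacity of the tbs array */

typedef struct SketchTB {
    uint64_t pc;              /* guest address this block translates */
    struct SketchTB *next;    /* hash bucket chain */
} SketchTB;

typedef struct SketchTBContext {
    SketchTB tbs[SKETCH_MAX_TBS];
    SketchTB *hash[SKETCH_HASH_SIZE];
    int nb_tbs;
    int nb_translations;      /* how many times code was "generated" */
    pthread_mutex_t tb_lock;  /* protects every field above */
} SketchTBContext;

SketchTBContext sketch_ctx = { .tb_lock = PTHREAD_MUTEX_INITIALIZER };

/* Caller must hold tb_lock. */
SketchTB *sketch_tb_find_locked(SketchTBContext *c, uint64_t pc)
{
    for (SketchTB *tb = c->hash[pc % SKETCH_HASH_SIZE]; tb; tb = tb->next) {
        if (tb->pc == pc) {
            return tb;
        }
    }
    return NULL;
}

/* Return the block for pc, generating it only if no CPU has done so yet.
   Returns NULL when the array is full (the flush case is not handled). */
SketchTB *sketch_tb_find_or_gen(SketchTBContext *c, uint64_t pc)
{
    assert(c != NULL);
    pthread_mutex_lock(&c->tb_lock);
    SketchTB *tb = sketch_tb_find_locked(c, pc);
    if (!tb && c->nb_tbs < SKETCH_MAX_TBS) {
        tb = &c->tbs[c->nb_tbs++];
        tb->pc = pc;
        tb->next = c->hash[pc % SKETCH_HASH_SIZE];
        c->hash[pc % SKETCH_HASH_SIZE] = tb;
        c->nb_translations++;  /* real code generation would happen here */
    }
    pthread_mutex_unlock(&c->tb_lock);
    return tb;
}
```

Holding the lock across both the lookup and the generation is the simple
choice; it serializes translation but keeps the shared pool free of duplicates.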

TLB:

The TLB seems to be per-CPU, so it is not really a problem, as in our
implementation one CPU = one pthread. But sometimes a CPU wants to flush the
TLB, through an instruction for example. It is very likely that another CPU in
another thread is executing code at the same time. That's why we chose to
create a tlb_flush mechanism:
When a CPU wants to flush, it asks all CPUs to exit TCG, waits for them, and
then exits itself. This can be reused for tb_invalidate and/or tb_flush as
well.
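
The "ask everybody to exit, then flush" idea can be sketched with a mutex and
a condition variable. This is only an illustration under assumptions: the
Sketch* names are hypothetical, plain pthreads are used instead of QEMU's
wrappers, and the requesting CPU is assumed to have left the execution loop
itself before calling sketch_flush_all().

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef struct SketchFlushSync {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int nb_executing;   /* CPUs currently inside translated code */
    int flush_pending;  /* non-zero while a flush has been requested */
    int flush_count;    /* number of completed flushes */
} SketchFlushSync;

void sketch_sync_init(SketchFlushSync *s)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->cond, NULL);
    s->nb_executing = 0;
    s->flush_pending = 0;
    s->flush_count = 0;
}

/* A CPU calls this before running translated code; it blocks while a
   flush is pending. */
void sketch_cpu_exec_enter(SketchFlushSync *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->flush_pending) {
        pthread_cond_wait(&s->cond, &s->lock);
    }
    s->nb_executing++;
    pthread_mutex_unlock(&s->lock);
}

/* A CPU polls this between blocks to learn that it should exit. */
int sketch_cpu_exit_requested(SketchFlushSync *s)
{
    pthread_mutex_lock(&s->lock);
    int pending = s->flush_pending;
    pthread_mutex_unlock(&s->lock);
    return pending;
}

/* A CPU calls this when it leaves translated code. */
void sketch_cpu_exec_exit(SketchFlushSync *s)
{
    pthread_mutex_lock(&s->lock);
    assert(s->nb_executing > 0);
    s->nb_executing--;
    pthread_cond_broadcast(&s->cond);
    pthread_mutex_unlock(&s->lock);
}

/* Request a flush, wait until no CPU is executing, flush, then let the
   CPUs continue. */
void sketch_flush_all(SketchFlushSync *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->flush_pending) {          /* only one flush at a time */
        pthread_cond_wait(&s->cond, &s->lock);
    }
    s->flush_pending = 1;
    while (s->nb_executing > 0) {       /* wait for the others to exit */
        pthread_cond_wait(&s->cond, &s->lock);
    }
    s->flush_count++;                   /* the actual flush would go here */
    s->flush_pending = 0;
    pthread_cond_broadcast(&s->cond);
    pthread_mutex_unlock(&s->lock);
}
```

The same skeleton would serve tb_flush or tb_invalidate by replacing the line
marked as the actual flush.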

Atomic instructions:

Atomic instructions are quite hard to implement.
The TranslationBlock implementing the atomic instruction can't be interrupted
during execution (e.g. by an interrupt or a signal); a cmpxchg64 helper is
used for that.

QEMU's global lock:

TCG threads take the lock during code execution. This is not OK for
multi-threading because it means only one thread will be running at a time.
That's why we took Jan's patch to allow TCG to run without the lock and take
it only when needed.
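
The pattern is roughly the following. This is a sketch under assumptions, not
Jan's patch: the names are hypothetical, a pthread mutex stands in for the
global lock, and a single integer stands in for device state that still needs
the lock.

```c
#include <assert.h>
#include <pthread.h>

pthread_mutex_t sketch_global_lock = PTHREAD_MUTEX_INITIALIZER;
int sketch_device_reg;   /* shared device state, guarded by the global lock */

/* Device access is one of the places where the lock is still needed. */
void sketch_device_write(int value)
{
    pthread_mutex_lock(&sketch_global_lock);
    sketch_device_reg = value;
    pthread_mutex_unlock(&sketch_global_lock);
}

/* Run nb_insns "instructions"; every io_every-th one touches the device.
   Returns the number of times the global lock was taken. */
int sketch_cpu_run(int nb_insns, int io_every)
{
    int nb_io = 0;

    assert(io_every > 0);
    for (int i = 1; i <= nb_insns; i++) {
        /* ordinary guest code would execute here, without the lock */
        if (i % io_every == 0) {
            sketch_device_write(i);   /* lock held only for the access */
            nb_io++;
        }
    }
    return nb_io;
}
```

Several threads can then run sketch_cpu_run() concurrently and only contend on
the lock for the short device accesses.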

What is the status:

  * We can start a vexpress-a15 simulation with two A15s and run two
    dhrystones at a time; performance is increased and it is quite stable.

What is missing:

  * tb_flush is not implemented correctly.
  * The PageDesc structure is not protected: the patch which introduced a
    first_tb array was not the right approach and has been removed. This
    implies that tb_invalidate is broken.

For both issues we plan to use the same mechanism as tlb_flush: exit all the
CPUs, flush or invalidate, and let them continue. A generic mechanism must be
implemented for that.

Known issues:

  * The GDB stub is broken because it uses tb_invalidate, which we haven't
    implemented yet, and there are probably other issues.
  * SMP > 2 crashes, probably because of tb_invalidate as well.
  * We don't know the status of the user-mode code, which is probably broken
    by our changes.
