Re: [Qemu-devel] [PATCH v6 00/19] Remaining MTTCG Base patches and ARM enablement

From: "Alex Bennée" <alex.bennee@linaro.org>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-devel@nongnu.org, mttcg@greensocs.com,
	fred.konrad@greensocs.com, a.rigo@virtualopensystems.com,
	cota@braap.org, bobby.prani@gmail.com, nikunj@linux.vnet.ibm.com,
	mark.burton@greensocs.com, jan.kiszka@siemens.com,
	serge.fdrv@gmail.com, rth@twiddle.net, peter.maydell@linaro.org,
	claudio.fontana@huawei.com
Subject: Re: [Qemu-devel] [PATCH v6 00/19] Remaining MTTCG Base patches and ARM enablement
Date: Wed, 09 Nov 2016 18:38:09 +0000
Message-ID: <87lgws8pny.fsf@linaro.org>
In-Reply-To: <1b4e51fd-92c9-d731-4d15-63e330422f20@redhat.com>


Paolo Bonzini <pbonzini@redhat.com> writes:

> On 09/11/2016 15:57, Alex Bennée wrote:
>> The one outstanding question is how to deal with the TLB flush
>> semantics of the various guest architectures. Currently flushes to
>> other vCPUs will happen at the end of their currently executing
>> Translation Block which could mean the originating vCPU makes
>> assumptions about flushes having been completed when they haven't. In
>> practice this hasn't been a problem and I haven't been able to
>> construct a test case so far that would fail in such a case. This is
>> probably because most tear downs of the other vCPU TLBs tend to be
>> done while the other vCPUs are not doing much. If anyone can come up
>> with a test case that would fail if this assumption isn't met then
>> please let me know.
>
> Have you tried implementing ARM's DMB semantics correctly?

I've implemented stricter semantics with the proof-of-concept patch
below.
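
As a rough picture of the difference between the deferred behaviour and
the stricter one, here is a toy model in plain C. It is not QEMU code:
ToyVCPU and every toy_* name are made up for illustration, and a vCPU
"leaving its block" is reduced to a function call. It only shows that
stale entries can survive a deferred flush until the other vCPUs reach
an exit, and that a synchronisation point removes that window.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_VCPUS 4

/* Toy vCPU: one cached translation and at most one queued flush. */
typedef struct {
    bool tlb_valid;     /* is the (now stale) entry still cached? */
    bool flush_queued;  /* flush requested but not yet serviced */
} ToyVCPU;

static ToyVCPU vcpus[NR_VCPUS];

/* Start with every vCPU caching the entry and no work queued. */
static void toy_reset(void)
{
    for (int i = 0; i < NR_VCPUS; i++) {
        vcpus[i].tlb_valid = true;
        vcpus[i].flush_queued = false;
    }
}

/* Hypothetical stand-in for queueing async flush work on the others. */
static void toy_queue_flush_others(int src)
{
    assert(src >= 0 && src < NR_VCPUS);
    for (int i = 0; i < NR_VCPUS; i++) {
        if (i != src) {
            vcpus[i].flush_queued = true;
        }
    }
    /* In this model the source flushes its own entry straight away. */
    vcpus[src].tlb_valid = false;
}

/* A vCPU only services queued work when it leaves its current block. */
static void toy_exit_block(int cpu)
{
    if (vcpus[cpu].flush_queued) {
        vcpus[cpu].tlb_valid = false;
        vcpus[cpu].flush_queued = false;
    }
}

/* Synchronisation point: every vCPU leaves its block before we resume. */
static void toy_sync_point(void)
{
    for (int i = 0; i < NR_VCPUS; i++) {
        toy_exit_block(i);
    }
}

/* Number of vCPUs that still hold the stale entry. */
static int toy_count_stale(void)
{
    int n = 0;
    for (int i = 0; i < NR_VCPUS; i++) {
        n += vcpus[i].tlb_valid ? 1 : 0;
    }
    return n;
}
```

Calling toy_reset(), then toy_queue_flush_others(0), leaves the other
three vCPUs holding the stale entry until toy_sync_point() runs.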

I'm not sure how to do it on the DMB instruction itself, as a pending
flush is a run-time rather than a translation-time concept. Is this the
sort of state that could be pushed into the translation flags? I suspect
forcing generation of safe work synchronisation points for all DMBs
would slow things down a lot. Usually the DMBs will come right after the
flushes, but not always, so I doubt you can guarantee they will be in
the same basic block.
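
One way to picture the run-time option is a per-vCPU flag that the flush
helper sets and a barrier helper tests. The sketch below is only a toy
in plain C under that assumption: ToyCPU, toy_tlbi_broadcast() and
toy_barrier() are hypothetical names, not QEMU APIs, and the "sync" is
just a counter. It shows that only a barrier that follows a broadcast
flush pays for a synchronisation point, whichever block it ends up in.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-vCPU state; not a QEMU structure. */
typedef struct {
    bool flush_pending;  /* set once a cross-vCPU flush has been queued */
    int sync_points;     /* how often a barrier had to synchronise */
} ToyCPU;

/* Called by the toy flush helper after queueing the remote flushes. */
static void toy_tlbi_broadcast(ToyCPU *cpu)
{
    cpu->flush_pending = true;
}

/*
 * Run-time check made by the toy barrier helper. Returns true when the
 * barrier had to act as a synchronisation point, false in the cheap case.
 */
static bool toy_barrier(ToyCPU *cpu)
{
    if (!cpu->flush_pending) {
        return false;           /* no flush outstanding: plain barrier */
    }
    cpu->sync_points++;         /* stand-in for waiting on the other vCPUs */
    cpu->flush_pending = false;
    return true;
}
```

The cost of this approach is that the check runs on every barrier, even
though it is a single flag test in the common case.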

Thoughts?


--8<---------------cut here---------------start------------->8---

target-arm: ensure tlbi_aa64_vae1is_write completes (POC)

Previously flushes on other vCPUs would only get serviced when they
exited their TranslationBlocks. While this isn't overly problematic, it
violates the semantics of a TLB flush from the point of view of the
source vCPU.

This proof-of-concept solves this by introducing a new
tlb_flush_all_page_by_mmuidx which ensures all TLB flushes are completed
by the time execution continues on the vCPU. It does this by creating
a synchronisation point by scheduling its own flush via async_safe_work
and exiting the execution loop. Once the safe work has executed all TLBs
will have been updated.

4 files changed, 53 insertions(+), 9 deletions(-)
cputlb.c                   | 33 +++++++++++++++++++++++++++++++++
include/exec/exec-all.h    | 11 +++++++++++
target-arm/helper.c        | 17 ++++++++---------
target-arm/translate-a64.c |  1 +

modified   cputlb.c
@@ -354,6 +354,39 @@ void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, ...)
     }
 }

+/* This function affects all vCPUs and will ensure all work is
+ * complete by the time the loop restarts
+ */
+void tlb_flush_all_page_by_mmuidx(CPUState *src_cpu, target_ulong addr, ...)
+{
+    unsigned long mmu_idx_bitmap;
+    target_ulong addr_and_mmu_idx;
+    va_list argp;
+    CPUState *other_cs;
+
+    va_start(argp, addr);
+    mmu_idx_bitmap = make_mmu_index_bitmap(argp);
+    va_end(argp);
+
+    tlb_debug("addr: "TARGET_FMT_lx" mmu_idx:%lx\n", addr, mmu_idx_bitmap);
+
+    /* This should already be page aligned */
+    addr_and_mmu_idx = addr & TARGET_PAGE_MASK;
+    addr_and_mmu_idx |= mmu_idx_bitmap;
+
+    CPU_FOREACH(other_cs) {
+        if (other_cs != src_cpu) {
+            async_run_on_cpu(other_cs, tlb_check_page_and_flush_by_mmuidx_async_work,
+                             RUN_ON_CPU_TARGET_PTR(addr_and_mmu_idx));
+        } else {
+            async_safe_run_on_cpu(other_cs, tlb_check_page_and_flush_by_mmuidx_async_work,
+                                  RUN_ON_CPU_TARGET_PTR(addr_and_mmu_idx));
+        }
+    }
+
+    cpu_loop_exit(src_cpu);
+}
+
 void tlb_flush_page_all(target_ulong addr)
 {
     CPUState *cpu;
modified   include/exec/exec-all.h
@@ -115,6 +115,17 @@ void tlb_flush(CPUState *cpu, int flush_global);
  */
 void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, ...);
 /**
+ * tlb_flush_all_page_by_mmuidx:
+ * @cpu: Originating CPU of the flush
+ * @addr: virtual address of page to be flushed
+ * @...: list of MMU indexes to flush, terminated by a negative value
+ *
+ * Flush one page from the TLB of all CPUs, for the specified
+ * MMU indexes. This function does not return; the run loop will exit
+ * and restart once the flush is completed.
+ */
+void tlb_flush_all_page_by_mmuidx(CPUState *cpu, target_ulong addr, ...);
+/**
  * tlb_flush_by_mmuidx:
  * @cpu: CPU whose TLB should be flushed
  * @...: list of MMU indexes to flush, terminated by a negative value
modified   target-arm/helper.c
@@ -3047,20 +3047,19 @@ static void tlbi_aa64_vae3_write(CPUARMState *env, const ARMCPRegInfo *ri,
--8<---------------cut here---------------end--------------->8---
 static void tlbi_aa64_vae1is_write(CPUARMState *env, const ARMCPRegInfo *ri,
                                    uint64_t value)
 {
+    ARMCPU *cpu = arm_env_get_cpu(env);
+    CPUState *cs = CPU(cpu);
     bool sec = arm_is_secure_below_el3(env);
-    CPUState *other_cs;
     uint64_t pageaddr = sextract64(value << 12, 0, 56);

     fprintf(stderr,"%s: dbg\n", __func__);

-    CPU_FOREACH(other_cs) {
-        if (sec) {
-            tlb_flush_page_by_mmuidx(other_cs, pageaddr, ARMMMUIdx_S1SE1,
-                                     ARMMMUIdx_S1SE0, -1);
-        } else {
-            tlb_flush_page_by_mmuidx(other_cs, pageaddr, ARMMMUIdx_S12NSE1,
-                                     ARMMMUIdx_S12NSE0, -1);
-        }
+    if (sec) {
+        tlb_flush_all_page_by_mmuidx(cs, pageaddr, ARMMMUIdx_S1SE1,
+                                     ARMMMUIdx_S1SE0, -1);
+    } else {
+        tlb_flush_all_page_by_mmuidx(cs, pageaddr, ARMMMUIdx_S12NSE1,
+                                     ARMMMUIdx_S12NSE0, -1);
     }
 }

modified   target-arm/translate-a64.c
@@ -1588,6 +1588,7 @@ static void handle_sys(DisasContext *s, uint32_t insn, bool isread,
         } else if (ri->writefn) {
             TCGv_ptr tmpptr;
             tmpptr = tcg_const_ptr(ri);
+            gen_a64_set_pc_im(s->pc);
             gen_helper_set_cp_reg64(cpu_env, tmpptr, tcg_rt);
             tcg_temp_free_ptr(tmpptr);
         } else {
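
The cputlb.c hunk above folds the page-aligned address and the MMU index
bitmap into a single value so that it fits in one work-item argument.
The receiving side is not shown in this message, so the unpacking below
is an assumption about how such a value would be split again. The page
size (12 bits) and all toy_* names are likewise chosen for illustration
only. The trick relies on the bitmap fitting entirely in the bits below
the page boundary.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration: 4 KiB pages. */
#define TOY_PAGE_BITS 12
#define TOY_PAGE_MASK (~((UINT64_C(1) << TOY_PAGE_BITS) - 1))

/* Combine a page address and an MMU index bitmap into one word. */
static uint64_t toy_pack(uint64_t addr, unsigned long mmu_idx_bitmap)
{
    /* The bitmap must not spill into the page-number bits. */
    assert((mmu_idx_bitmap & TOY_PAGE_MASK) == 0);
    return (addr & TOY_PAGE_MASK) | mmu_idx_bitmap;
}

/* Recover the page address: the bits at and above the page boundary. */
static uint64_t toy_unpack_addr(uint64_t packed)
{
    return packed & TOY_PAGE_MASK;
}

/* Recover the MMU index bitmap: the bits below the page boundary. */
static unsigned long toy_unpack_bitmap(uint64_t packed)
{
    return (unsigned long)(packed & ~TOY_PAGE_MASK);
}
```

With 12 page-offset bits this leaves room for at most 12 MMU indexes in
the bitmap, which is why the assert guards the packing step.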

--
Alex Bennée
