From: "Alex Bennée" <alex.bennee@linaro.org>
To: Alvise Rigo <a.rigo@virtualopensystems.com>
Cc: mttcg@listserver.greensocs.com, claudio.fontana@huawei.com,
qemu-devel@nongnu.org, pbonzini@redhat.com,
jani.kokkonen@huawei.com, tech@virtualopensystems.com
Subject: Re: [Qemu-devel] [RFC v3 02/13] cputlb: Add new TLB_EXCL flag
Date: Thu, 16 Jul 2015 15:32:04 +0100 [thread overview]
Message-ID: <87k2u05g2j.fsf@linaro.org> (raw)
In-Reply-To: <1436516626-8322-3-git-send-email-a.rigo@virtualopensystems.com>
Alvise Rigo <a.rigo@virtualopensystems.com> writes:
> Add a new flag for the TLB entries to force all the accesses made to a
> page to follow the slow-path.
>
> In the case we remove a TLB entry marked as EXCL, we unset the
> corresponding exclusive bit in the bitmap.
>
> Mark the accessed page as dirty to invalidate any pending operation of
> LL/SC only if a vCPU writes to the protected address.
>
> Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
> Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
> Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
> ---
> cputlb.c | 18 ++++-
> include/exec/cpu-all.h | 2 +
> include/exec/cpu-defs.h | 4 +
> softmmu_template.h | 189 +++++++++++++++++++++++++++++++-----------------
> 4 files changed, 144 insertions(+), 69 deletions(-)
>
> diff --git a/cputlb.c b/cputlb.c
> index e5853fd..0aca407 100644
> --- a/cputlb.c
> +++ b/cputlb.c
> @@ -380,6 +380,16 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
> env->tlb_v_table[mmu_idx][vidx] = *te;
> env->iotlb_v[mmu_idx][vidx] = env->iotlb[mmu_idx][index];
>
> + if (!(te->addr_write & TLB_MMIO) && (te->addr_write & TLB_EXCL)) {
> + /* We are removing an exclusive entry, if the corresponding exclusive
> + * bit is set, unset it. */
> + hwaddr hw_addr = (env->iotlb[mmu_idx][index].addr & TARGET_PAGE_MASK) +
> + (te->addr_write & TARGET_PAGE_MASK);
> + if (cpu_physical_memory_excl_is_dirty(hw_addr)) {
> + cpu_physical_memory_set_excl_dirty(hw_addr);
> + }
I'm confused. I'm reading that as "if the dirty exclusive bit is set
then set the dirty exclusive bit", that doesn't seem right. The comment
seems to imply that should be a: cpu_physical_memory_clear_excl_dirty?
> + }
> +
> /* refill the tlb */
> env->iotlb[mmu_idx][index].addr = iotlb - vaddr;
> env->iotlb[mmu_idx][index].attrs = attrs;
> @@ -405,7 +415,13 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
> + xlat)) {
> te->addr_write = address | TLB_NOTDIRTY;
> } else {
> - te->addr_write = address;
> + if (!(address & TLB_MMIO) &&
> + !cpu_physical_memory_excl_is_dirty(section->mr->ram_addr
> + + xlat)) {
> + te->addr_write = address | TLB_EXCL;
> + } else {
> + te->addr_write = address;
> + }
> }
> } else {
> te->addr_write = -1;
> diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
> index ac06c67..632f6ce 100644
> --- a/include/exec/cpu-all.h
> +++ b/include/exec/cpu-all.h
> @@ -311,6 +311,8 @@ extern RAMList ram_list;
> #define TLB_NOTDIRTY (1 << 4)
> /* Set if TLB entry is an IO callback. */
> #define TLB_MMIO (1 << 5)
> +/* Set if TLB entry refers a page that requires exclusive access. */
> +#define TLB_EXCL (1 << 6)
I wonder if a compile time assert should be added here to trap the case
when TARGET_PAGE_MASK starts encroaching on the lower bits? It looks
like the smallest at the moment gives us 10 bits to play with.
>
> void dump_exec_info(FILE *f, fprintf_function cpu_fprintf);
> void dump_opcount_info(FILE *f, fprintf_function cpu_fprintf);
> diff --git a/include/exec/cpu-defs.h b/include/exec/cpu-defs.h
> index d5aecaf..c73a75f 100644
> --- a/include/exec/cpu-defs.h
> +++ b/include/exec/cpu-defs.h
> @@ -165,5 +165,9 @@ typedef struct CPUIOTLBEntry {
> #define CPU_COMMON \
> /* soft mmu support */ \
> CPU_COMMON_TLB \
> + \
> + /* Used for atomic instruction translation. */ \
> + bool ll_sc_context; \
> + hwaddr excl_protected_hwaddr; \
>
> #endif
> diff --git a/softmmu_template.h b/softmmu_template.h
> index 18871f5..0edd451 100644
> --- a/softmmu_template.h
> +++ b/softmmu_template.h
> @@ -141,6 +141,23 @@
> vidx >= 0; \
> })
>
> +#define lookup_cpus_ll_addr(addr) \
> +({ \
> + CPUState *cpu; \
> + CPUArchState *acpu; \
> + bool hit = false; \
> + \
> + CPU_FOREACH(cpu) { \
> + acpu = (CPUArchState *)cpu->env_ptr; \
> + if (cpu != current_cpu && acpu->excl_protected_hwaddr == addr) { \
> + hit = true; \
> + break; \
> + } \
> + } \
> + \
> + hit; \
> +})
> +
Is there a reason to abuse a #define like this instead of having an
inline and letting the compiler sort it out?
> #ifndef SOFTMMU_CODE_ACCESS
> static inline DATA_TYPE glue(io_read, SUFFIX)(CPUArchState *env,
> CPUIOTLBEntry *iotlbentry,
> @@ -414,43 +431,61 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
> tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
> }
>
> - /* Handle an IO access. */
> + /* Handle an IO access or exclusive access. */
> if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
> - CPUIOTLBEntry *iotlbentry;
> - if ((addr & (DATA_SIZE - 1)) != 0) {
> - goto do_unaligned_access;
> - }
> - iotlbentry = &env->iotlb[mmu_idx][index];
> -
> - /* ??? Note that the io helpers always read data in the target
> - byte ordering. We should push the LE/BE request down into io. */
> - val = TGT_LE(val);
> - glue(io_write, SUFFIX)(env, iotlbentry, val, addr, retaddr);
> - return;
> - }
> -
> - /* Handle slow unaligned access (it spans two pages or IO). */
> - if (DATA_SIZE > 1
> - && unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
> - >= TARGET_PAGE_SIZE)) {
> - int i;
> - do_unaligned_access:
> - if ((get_memop(oi) & MO_AMASK) == MO_ALIGN) {
> - cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
> - mmu_idx, retaddr);
> + CPUIOTLBEntry *iotlbentry = &env->iotlb[mmu_idx][index];
> + if ((tlb_addr & ~TARGET_PAGE_MASK) == TLB_EXCL) {
> + /* The slow-path has been forced since we are writing to
> + * exclusive-protected memory. */
> + hwaddr hw_addr = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
> +
> + bool set_to_dirty;
> +
> + /* Two cases of invalidation: the current vCPU is writing to another
> + * vCPU's exclusive address or the vCPU that issued the LoadLink is
> + * writing to it, but not through a StoreCond. */
> + set_to_dirty = lookup_cpus_ll_addr(hw_addr);
> + set_to_dirty |= env->ll_sc_context &&
> + (env->excl_protected_hwaddr == hw_addr);
> +
> + if (set_to_dirty) {
> + cpu_physical_memory_set_excl_dirty(hw_addr);
> + } /* the vCPU is legitimately writing to the protected address */
> + } else {
> + if ((addr & (DATA_SIZE - 1)) != 0) {
> + goto do_unaligned_access;
> + }
> +
> + /* ??? Note that the io helpers always read data in the target
> + byte ordering. We should push the LE/BE request down into io. */
> + val = TGT_LE(val);
> + glue(io_write, SUFFIX)(env, iotlbentry, val, addr, retaddr);
> + return;
> }
> - /* XXX: not efficient, but simple */
> - /* Note: relies on the fact that tlb_fill() does not remove the
> - * previous page from the TLB cache. */
> - for (i = DATA_SIZE - 1; i >= 0; i--) {
> - /* Little-endian extract. */
> - uint8_t val8 = val >> (i * 8);
> - /* Note the adjustment at the beginning of the function.
> - Undo that for the recursion. */
> - glue(helper_ret_stb, MMUSUFFIX)(env, addr + i, val8,
> - oi, retaddr + GETPC_ADJ);
> + } else {
> + /* Handle slow unaligned access (it spans two pages or IO). */
> + if (DATA_SIZE > 1
> + && unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
> + >= TARGET_PAGE_SIZE)) {
> + int i;
> + do_unaligned_access:
> + if ((get_memop(oi) & MO_AMASK) == MO_ALIGN) {
> + cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
> + mmu_idx, retaddr);
> + }
> + /* XXX: not efficient, but simple */
> + /* Note: relies on the fact that tlb_fill() does not remove the
> + * previous page from the TLB cache. */
> + for (i = DATA_SIZE - 1; i >= 0; i--) {
> + /* Little-endian extract. */
> + uint8_t val8 = val >> (i * 8);
> + /* Note the adjustment at the beginning of the function.
> + Undo that for the recursion. */
> + glue(helper_ret_stb, MMUSUFFIX)(env, addr + i, val8,
> + oi, retaddr + GETPC_ADJ);
> + }
> + return;
> }
> - return;
> }
OK I can just about follow what happened now with the 3 exit points and
extra goto thrown in but this function is starting to smell. The changes
seem reasonable but what happens to the next tweak to the function?
>
> /* Handle aligned access or unaligned access in the same page. */
> @@ -494,43 +529,61 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
> tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
> }
>
> - /* Handle an IO access. */
> + /* Handle an IO access or exclusive access. */
> if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
> - CPUIOTLBEntry *iotlbentry;
> - if ((addr & (DATA_SIZE - 1)) != 0) {
> - goto do_unaligned_access;
> - }
> - iotlbentry = &env->iotlb[mmu_idx][index];
> -
> - /* ??? Note that the io helpers always read data in the target
> - byte ordering. We should push the LE/BE request down into io. */
> - val = TGT_BE(val);
> - glue(io_write, SUFFIX)(env, iotlbentry, val, addr, retaddr);
> - return;
> - }
> -
> - /* Handle slow unaligned access (it spans two pages or IO). */
> - if (DATA_SIZE > 1
> - && unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
> - >= TARGET_PAGE_SIZE)) {
> - int i;
> - do_unaligned_access:
> - if ((get_memop(oi) & MO_AMASK) == MO_ALIGN) {
> - cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
> - mmu_idx, retaddr);
> + CPUIOTLBEntry *iotlbentry = &env->iotlb[mmu_idx][index];
> + if ((tlb_addr & ~TARGET_PAGE_MASK) == TLB_EXCL) {
> + /* The slow-path has been forced since we are writing to
> + * exclusive-protected memory. */
> + hwaddr hw_addr = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
> +
> + bool set_to_dirty;
> +
> + /* Two cases of invalidation: the current vCPU is writing to another
> + * vCPU's exclusive address or the vCPU that issued the LoadLink is
> + * writing to it, but not through a StoreCond. */
> + set_to_dirty = lookup_cpus_ll_addr(hw_addr);
> + set_to_dirty |= env->ll_sc_context &&
> + (env->excl_protected_hwaddr == hw_addr);
> +
> + if (set_to_dirty) {
> + cpu_physical_memory_set_excl_dirty(hw_addr);
> + } /* the vCPU is legitimately writing to the protected address */
> + } else {
> + if ((addr & (DATA_SIZE - 1)) != 0) {
> + goto do_unaligned_access;
> + }
> +
> + /* ??? Note that the io helpers always read data in the target
> + byte ordering. We should push the LE/BE request down into io. */
> + val = TGT_BE(val);
> + glue(io_write, SUFFIX)(env, iotlbentry, val, addr, retaddr);
> + return;
> }
> - /* XXX: not efficient, but simple */
> - /* Note: relies on the fact that tlb_fill() does not remove the
> - * previous page from the TLB cache. */
> - for (i = DATA_SIZE - 1; i >= 0; i--) {
> - /* Big-endian extract. */
> - uint8_t val8 = val >> (((DATA_SIZE - 1) * 8) - (i * 8));
> - /* Note the adjustment at the beginning of the function.
> - Undo that for the recursion. */
> - glue(helper_ret_stb, MMUSUFFIX)(env, addr + i, val8,
> - oi, retaddr + GETPC_ADJ);
> + } else {
> + /* Handle slow unaligned access (it spans two pages or IO). */
> + if (DATA_SIZE > 1
> + && unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
> + >= TARGET_PAGE_SIZE)) {
> + int i;
> + do_unaligned_access:
> + if ((get_memop(oi) & MO_AMASK) == MO_ALIGN) {
> + cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
> + mmu_idx, retaddr);
> + }
> + /* XXX: not efficient, but simple */
> + /* Note: relies on the fact that tlb_fill() does not remove the
> + * previous page from the TLB cache. */
> + for (i = DATA_SIZE - 1; i >= 0; i--) {
> + /* Big-endian extract. */
> + uint8_t val8 = val >> (((DATA_SIZE - 1) * 8) - (i * 8));
> + /* Note the adjustment at the beginning of the function.
> + Undo that for the recursion. */
> + glue(helper_ret_stb, MMUSUFFIX)(env, addr + i, val8,
> + oi, retaddr + GETPC_ADJ);
> + }
> + return;
> }
> - return;
> }
>
> /* Handle aligned access or unaligned access in the same page. */
--
Alex Bennée
next prev parent reply other threads:[~2015-07-16 14:32 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-07-10 8:23 [Qemu-devel] [RFC v3 00/13] Slow-path for atomic instruction translation Alvise Rigo
2015-07-10 8:23 ` [Qemu-devel] [RFC v3 01/13] exec: Add new exclusive bitmap to ram_list Alvise Rigo
2015-07-10 8:23 ` [Qemu-devel] [RFC v3 02/13] cputlb: Add new TLB_EXCL flag Alvise Rigo
2015-07-16 14:32 ` Alex Bennée [this message]
2015-07-16 15:04 ` alvise rigo
2015-07-10 8:23 ` [Qemu-devel] [RFC v3 03/13] softmmu: Add helpers for a new slow-path Alvise Rigo
2015-07-16 14:53 ` Alex Bennée
2015-07-16 15:15 ` alvise rigo
2015-07-10 8:23 ` [Qemu-devel] [RFC v3 04/13] tcg-op: create new TCG qemu_ldlink and qemu_stcond instructions Alvise Rigo
2015-07-17 9:49 ` Alex Bennée
2015-07-17 10:05 ` alvise rigo
2015-07-10 8:23 ` [Qemu-devel] [RFC v3 05/13] target-arm: translate: implement qemu_ldlink and qemu_stcond ops Alvise Rigo
2015-07-17 12:51 ` Alex Bennée
2015-07-17 13:01 ` alvise rigo
2015-07-10 8:23 ` [Qemu-devel] [RFC v3 06/13] target-i386: " Alvise Rigo
2015-07-17 12:56 ` Alex Bennée
2015-07-17 13:27 ` alvise rigo
2015-07-10 8:23 ` [Qemu-devel] [RFC v3 07/13] ram_addr.h: Make exclusive bitmap accessors atomic Alvise Rigo
2015-07-17 13:32 ` Alex Bennée
2015-07-10 8:23 ` [Qemu-devel] [RFC v3 08/13] exec.c: introduce a simple rendezvous support Alvise Rigo
2015-07-17 13:45 ` Alex Bennée
2015-07-17 13:54 ` alvise rigo
2015-07-10 8:23 ` [Qemu-devel] [RFC v3 09/13] cpus.c: introduce simple callback support Alvise Rigo
2015-07-10 9:36 ` Paolo Bonzini
2015-07-10 9:47 ` alvise rigo
2015-07-10 9:53 ` Frederic Konrad
2015-07-10 10:06 ` alvise rigo
2015-07-10 10:24 ` Paolo Bonzini
2015-07-10 12:16 ` Frederic Konrad
2015-07-10 8:23 ` [Qemu-devel] [RFC v3 10/13] Simple TLB flush wrap to use as exit callback Alvise Rigo
2015-07-10 8:23 ` [Qemu-devel] [RFC v3 11/13] Introduce exit_flush_req and tcg_excl_access_lock Alvise Rigo
2015-07-10 8:23 ` [Qemu-devel] [RFC v3 12/13] softmmu_llsc_template.h: move to multithreading Alvise Rigo
2015-07-17 15:27 ` Alex Bennée
2015-07-17 15:31 ` alvise rigo
2015-07-10 8:23 ` [Qemu-devel] [RFC v3 13/13] softmmu_template.h: " Alvise Rigo
2015-07-17 15:57 ` Alex Bennée
2015-07-17 16:19 ` alvise rigo
2015-07-10 8:31 ` [Qemu-devel] [RFC v3 00/13] Slow-path for atomic instruction translation Mark Burton
2015-07-10 8:58 ` alvise rigo
2015-07-10 8:39 ` Frederic Konrad
2015-07-10 9:04 ` alvise rigo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87k2u05g2j.fsf@linaro.org \
--to=alex.bennee@linaro.org \
--cc=a.rigo@virtualopensystems.com \
--cc=claudio.fontana@huawei.com \
--cc=jani.kokkonen@huawei.com \
--cc=mttcg@listserver.greensocs.com \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=tech@virtualopensystems.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).