Re: [PATCH v2] cpu-exec: simplify jump cache management

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Alex Bennée" <alex.bennee@linaro.org>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-devel@nongnu.org,  richard.henderson@linaro.org
Subject: Re: [PATCH v2] cpu-exec: simplify jump cache management
Date: Tue, 23 Jan 2024 22:02:11 +0000	[thread overview]
Message-ID: <87ede77lp8.fsf@draig.linaro.org> (raw)
In-Reply-To: <20240122153409.351959-1-pbonzini@redhat.com> (Paolo Bonzini's message of "Mon, 22 Jan 2024 16:34:09 +0100")

Paolo Bonzini <pbonzini@redhat.com> writes:

> Unless I'm missing something egregious, the jmp cache is only every
> populated with a valid entry by the same thread that reads the cache.
> Therefore, the contents of any valid entry are always consistent and
> there is no need for any acquire/release magic.
>
> Indeed ->tb has to be accessed with atomics, because concurrent
> invalidations would otherwise cause data races.  But ->pc is only ever
> accessed by one thread, and accesses to ->tb and ->pc within tb_lookup
> can never race with another tb_lookup.  While the TranslationBlock
> (especially the flags) could be modified by a concurrent invalidation,
> store-release and load-acquire operations on the cache entry would
> not add any additional ordering beyond what you get from performing
> the accesses within a single thread.
>
> Because of this, there is really nothing to win in splitting the CF_PCREL
> and !CF_PCREL paths.  It is easier to just always use the ->pc field in
> the jump cache.
>
> I noticed this while working on splitting commit 8ed558ec0cb
> ("accel/tcg: Introduce TARGET_TB_PCREL", 2022-10-04) into multiple
> pieces, for the sake of finding a more fine-grained bisection
> result for https://gitlab.com/qemu-project/qemu/-/issues/2092.
> It does not (and does not intend to) fix that issue; therefore
> it may make sense to not commit it until the root cause
> of issue #2092 is found.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  accel/tcg/tb-jmp-cache.h |  8 +++--
>  accel/tcg/cpu-exec.c     | 66 ++++++++++++++--------------------------
>  2 files changed, 28 insertions(+), 46 deletions(-)
>
> diff --git a/accel/tcg/tb-jmp-cache.h b/accel/tcg/tb-jmp-cache.h
> index bb424c8a05b..4ab8553afcc 100644
> --- a/accel/tcg/tb-jmp-cache.h
> +++ b/accel/tcg/tb-jmp-cache.h
> @@ -13,9 +13,11 @@
>  #define TB_JMP_CACHE_SIZE (1 << TB_JMP_CACHE_BITS)
>  
>  /*
> - * Accessed in parallel; all accesses to 'tb' must be atomic.
> - * For CF_PCREL, accesses to 'pc' must be protected by a
> - * load_acquire/store_release to 'tb'.
> + * Invalidated in parallel; all accesses to 'tb' must be atomic.
> + * A valid entry is read/written by a single CPU, therefore there is
> + * no need for qatomic_rcu_read() and pc is always consistent with a
> + * non-NULL value of 'tb'.  Strictly speaking pc is only needed for
> + * CF_PCREL, but it's used always for simplicity.
>   */

I can follow the logic and soak testing seems to show its stable. So:

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

However I would note that TranslationBlock has the comment:

     * jmp_lock also protects the CF_INVALID cflag; a jump must not be chained
     * to a destination TB that has CF_INVALID set.

which I don't think holds true. It might be worth summarising the lock
requirements of TranslationBlock in a kdoc header.

>  struct CPUJumpCache {
>      struct rcu_head rcu;
> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> index 67eda9865ee..40c268bfa1f 100644
> --- a/accel/tcg/cpu-exec.c
> +++ b/accel/tcg/cpu-exec.c
> @@ -253,43 +253,29 @@ static inline TranslationBlock *tb_lookup(CPUState *cpu, vaddr pc,
>      hash = tb_jmp_cache_hash_func(pc);
>      jc = cpu->tb_jmp_cache;
>  
> -    if (cflags & CF_PCREL) {
> -        /* Use acquire to ensure current load of pc from jc. */
> -        tb = qatomic_load_acquire(&jc->array[hash].tb);
> -
> -        if (likely(tb &&
> -                   jc->array[hash].pc == pc &&
> -                   tb->cs_base == cs_base &&
> -                   tb->flags == flags &&
> -                   tb_cflags(tb) == cflags)) {
> -            return tb;
> -        }
> -        tb = tb_htable_lookup(cpu, pc, cs_base, flags, cflags);
> -        if (tb == NULL) {
> -            return NULL;
> -        }
> -        jc->array[hash].pc = pc;
> -        /* Ensure pc is written first. */
> -        qatomic_store_release(&jc->array[hash].tb, tb);
> -    } else {
> -        /* Use rcu_read to ensure current load of pc from *tb. */
> -        tb = qatomic_rcu_read(&jc->array[hash].tb);
> -
> -        if (likely(tb &&
> -                   tb->pc == pc &&
> -                   tb->cs_base == cs_base &&
> -                   tb->flags == flags &&
> -                   tb_cflags(tb) == cflags)) {
> -            return tb;
> -        }
> -        tb = tb_htable_lookup(cpu, pc, cs_base, flags, cflags);
> -        if (tb == NULL) {
> -            return NULL;
> -        }
> -        /* Use the pc value already stored in tb->pc. */
> -        qatomic_set(&jc->array[hash].tb, tb);
> +    tb = qatomic_read(&jc->array[hash].tb);
> +    if (likely(tb &&
> +               jc->array[hash].pc == pc &&
> +               tb->cs_base == cs_base &&
> +               tb->flags == flags &&
> +               tb_cflags(tb) == cflags)) {
> +        goto hit;
>      }
>  
> +    tb = tb_htable_lookup(cpu, pc, cs_base, flags, cflags);
> +    if (tb == NULL) {
> +        return NULL;
> +    }
> +
> +    jc->array[hash].pc = pc;
> +    qatomic_set(&jc->array[hash].tb, tb);
> +
> +hit:
> +    /*
> +     * As long as tb is not NULL, the contents are consistent.  Therefore,
> +     * the virtual PC has to match for non-CF_PCREL translations.
> +     */
> +    assert((tb_cflags(tb) & CF_PCREL) || tb->pc == pc);
>      return tb;
>  }
>  
> @@ -1012,14 +998,8 @@ cpu_exec_loop(CPUState *cpu, SyncClocks *sc)
>                   */
>                  h = tb_jmp_cache_hash_func(pc);
>                  jc = cpu->tb_jmp_cache;
> -                if (cflags & CF_PCREL) {
> -                    jc->array[h].pc = pc;
> -                    /* Ensure pc is written first. */
> -                    qatomic_store_release(&jc->array[h].tb, tb);
> -                } else {
> -                    /* Use the pc value already stored in tb->pc. */
> -                    qatomic_set(&jc->array[h].tb, tb);
> -                }
> +                jc->array[h].pc = pc;
> +                qatomic_set(&jc->array[h].tb, tb);
>              }
>  
>  #ifndef CONFIG_USER_ONLY

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

next prev parent reply	other threads:[~2024-01-23 22:03 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-22 15:34 [PATCH v2] cpu-exec: simplify jump cache management Paolo Bonzini
2024-01-22 21:57 ` Richard Henderson
2024-01-27  7:43   ` Richard Henderson
2024-01-23 22:02 ` Alex Bennée [this message]
2024-01-24 15:44   ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87ede77lp8.fsf@draig.linaro.org \
    --to=alex.bennee@linaro.org \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=richard.henderson@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.