From: "Richard W.M. Jones" <rjones@redhat.com>
To: qemu-devel@nongnu.org, pbonzini@redhat.com
Subject: Re: tcg_flush_jmp_cache replacing qatomic_set loop with memset
Date: Mon, 16 Oct 2023 18:31:53 +0100
Message-ID: <20231016173153.GR7912@redhat.com>
In-Reply-To: <20231016154336.GA27462@redhat.com>

On Mon, Oct 16, 2023 at 04:43:36PM +0100, Richard W.M. Jones wrote:
> Hey Paolo,
> 
> Quick question.  I'm sure the transformation below is *not* correct,
> because it doesn't preserve the invariant of the lockless structure.
> Is there a way to do this while maintaining correctness?  For example
> putting barrier() after memset?  (Note I'm also zeroing .pc which may
> be a problem.)

Alright, so ignore this question :-(

After inspecting the assembly on x86-64, I can see that qatomic_set
simply expands to a regular store (the loop actually looks like it is
unrolled by 2):

  716340:       48 c7 00 00 00 00 00    movq   $0x0,(%rax)
  716347:       48 c7 40 10 00 00 00    movq   $0x0,0x10(%rax)
  71634e:       00 
  71634f:       48 83 c0 20             add    $0x20,%rax
  716353:       48 39 d0                cmp    %rdx,%rax
  716356:       75 e8                   jne    716340 <tcg_flush_jmp_cache+0x20>
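For reference, a minimal sketch of what the original loop does, assuming
qatomic_set() boils down to a relaxed atomic store (the GCC/Clang builtin
__atomic_store_n with __ATOMIC_RELAXED).  The names jc_entry, jump_cache,
jc_flush_relaxed and JC_SIZE are hypothetical simplifications of
CPUJumpCache and TB_JMP_CACHE_SIZE, not QEMU's definitions.  The 16-byte
{tb, pc} slot is an assumption, consistent with the 0x10 stride in the
listing.  On x86-64 an aligned 8-byte relaxed store needs no fence, which
is why each iteration is just a movq.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for QEMU's jump cache types. */
#define JC_SIZE 4096          /* assumed power of two, like TB_JMP_CACHE_SIZE */

struct tb;                    /* opaque translated block */

struct jc_entry {
    struct tb *tb;            /* read locklessly by lookups */
    uint64_t pc;              /* guest PC tag for this slot */
};

struct jump_cache {
    struct jc_entry array[JC_SIZE];
};

#if UINTPTR_MAX == UINT64_MAX
/* On LP64 hosts a slot is 16 bytes, consistent with the 0x10 stride
 * seen in the objdump listing. */
static_assert(sizeof(struct jc_entry) == 16, "expected 16-byte slots");
#endif

/* Clear .tb in every slot with a relaxed atomic store; .pc is untouched.
 * Assumption: this is what qatomic_set() expands to.  No fence is needed
 * for a relaxed store, so on x86-64 it compiles to a plain mov. */
static void jc_flush_relaxed(struct jump_cache *jc)
{
    for (size_t i = 0; i < JC_SIZE; i++) {
        __atomic_store_n(&jc->array[i].tb, NULL, __ATOMIC_RELAXED);
    }
}
```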

My memset version was twice as fast because it used AVX instructions.
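A sketch of the memset variant under the same hypothetical layout
(jc_entry, jump_cache, jc_flush_memset and JC_SIZE are illustrative
names).  It clears .pc as well as .tb, and the static_assert checks that
sizeof on the array member is the same byte count the patch writes as
TB_JMP_CACHE_SIZE * sizeof jc->array[0].  Unlike the loop, memset is not
an atomic operation and makes no promise about store width, so under the
C11 memory model it would race with a lock-free reader of .tb; that is
the correctness concern from the original question.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same hypothetical layout as a simplified CPUJumpCache. */
#define JC_SIZE 4096

struct tb;
struct jc_entry { struct tb *tb; uint64_t pc; };
struct jump_cache { struct jc_entry array[JC_SIZE]; };

/* sizeof the array member equals JC_SIZE * sizeof one slot, i.e. the
 * byte count the patch spells as TB_JMP_CACHE_SIZE * sizeof jc->array[0]. */
static_assert(sizeof(((struct jump_cache *)0)->array)
              == JC_SIZE * sizeof(struct jc_entry),
              "array member covers every slot");

/* Bulk clear: zeroes both .tb and .pc.  The compiler/libc may use wide
 * (e.g. vector) stores, but memset is not atomic, so a concurrent
 * lock-free reader of .tb would be a data race in C11 terms. */
static void jc_flush_memset(struct jump_cache *jc)
{
    memset(jc->array, 0, sizeof jc->array);
}
```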

I guess this would compile to something fancier on an aarch64 host...

Rich.

> The background to this is that I've been playing around with the very
> hot tb_lookup function.  Increasing the size of the jump cache (which
> hasn't changed since, erm, 2005!) looks like it could improve
> performance, plus a few other changes which I'm playing with.  However
> increasing the size causes profiles to be dominated by the loop in
> tcg_flush_jmp_cache, presumably because of all those serialized atomic ops.
> 
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index 8cb6ad3511..6a21b3dba8 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -796,9 +796,7 @@ void tcg_flush_jmp_cache(CPUState *cpu)
>          return;
>      }
>  
> -    for (int i = 0; i < TB_JMP_CACHE_SIZE; i++) {
> -        qatomic_set(&jc->array[i].tb, NULL);
> -    }
> +    memset(jc->array, 0, TB_JMP_CACHE_SIZE * sizeof jc->array[0]);
>  }
>  
>  /* This is a wrapper for common code that can not use CONFIG_SOFTMMU */
> 
> Rich.
> 
> -- 
> Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
> Read my programming and virtualization blog: http://rwmj.wordpress.com
> nbdkit - Flexible, fast NBD server with plugins
> https://gitlab.com/nbdkit/nbdkit
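To illustrate why a bigger jump cache makes the flush dominate profiles:
a lookup touches one slot, while a flush has to write every slot.  Below
is a single-threaded sketch of a direct-mapped cache.  The hash (pc >> 2,
masked to the table size), JC_BITS and the struct/function names are
hypothetical; they are not QEMU's tb_jmp_cache_hash_func() or
CPUJumpCache, and the memory ordering between .pc and .tb is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define JC_BITS 12                         /* hypothetical: 4096 slots */
#define JC_SIZE ((size_t)1 << JC_BITS)

static_assert((JC_SIZE & (JC_SIZE - 1)) == 0, "mask hash needs power of two");

struct tb { uint64_t pc; };                /* hypothetical translated block */
struct jc_entry { struct tb *tb; uint64_t pc; };
struct jump_cache { struct jc_entry array[JC_SIZE]; };

/* Hypothetical hash: drop the low bits and mask to the table size. */
static size_t jc_hash(uint64_t pc)
{
    return (size_t)((pc >> 2) & (JC_SIZE - 1));
}

/* Lookup reads one slot: its cost does not depend on JC_SIZE. */
static struct tb *jc_lookup(const struct jump_cache *jc, uint64_t pc)
{
    const struct jc_entry *e = &jc->array[jc_hash(pc)];
    return (e->tb != NULL && e->pc == pc) ? e->tb : NULL;
}

/* Insert overwrites whatever was in the slot (direct-mapped, no chaining). */
static void jc_insert(struct jump_cache *jc, struct tb *tb)
{
    struct jc_entry *e = &jc->array[jc_hash(tb->pc)];
    e->pc = tb->pc;
    e->tb = tb;
}

/* Flush writes every slot: cost is linear in JC_SIZE, so a larger
 * cache makes every flush proportionally more expensive. */
static size_t jc_flush(struct jump_cache *jc)
{
    size_t i;
    for (i = 0; i < JC_SIZE; i++) {
        jc->array[i].tb = NULL;
    }
    return i;   /* number of slots written */
}
```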



