All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Alex Bennée" <alex.bennee@linaro.org>
To: Richard Henderson <richard.henderson@linaro.org>
Cc: cota@braap.org, qemu-devel@nongnu.org
Subject: Re: [RFC PATCH 0/5] Experimenting with tb-lookup tweaks
Date: Thu, 25 Feb 2021 10:15:42 +0000	[thread overview]
Message-ID: <871rd4bgbv.fsf@linaro.org> (raw)
In-Reply-To: <7d23665f-fa20-028f-d48a-2ea79ab35b2f@linaro.org>


Richard Henderson <richard.henderson@linaro.org> writes:

> On 2/24/21 8:58 AM, Alex Bennée wrote:
>> Hi Richard,
>> 
>> Well I spun up some of the ideas we talked about to see if there was
>> anything to be squeezed out of the function. In the end the results
>> seem to be a washout with my pigz benchmark:
>> 
>>  qemu-system-aarch64 -cpu cortex-a57 \
>>    -machine type=virt,virtualization=on,gic-version=3 \
>>    -serial mon:stdio \
>>    -netdev user,id=unet,hostfwd=tcp::2222-:22 \
>>    -device virtio-net-pci,netdev=unet,id=virt-net,disable-legacy=on \
>>    -device virtio-scsi-pci,id=virt-scsi,disable-legacy=on \
>>    -blockdev driver=raw,node-name=hd,discard=unmap,file.driver=host_device,file.filename=/dev/zen-disk/debian-buster-arm64 \
>>    -device scsi-hd,drive=hd,id=virt-scsi-hd \
>>    -smp 4 -m 4096 \
>>    -kernel ~/lsrc/linux.git/builds/arm64/arch/arm64/boot/Image \
>>    -append "root=/dev/sda2 systemd.unit=benchmark-pigz.service" \
>>    -display none -snapshot
>> 
>> | Command | Mean [s]       | Min [s] | Max [s] | Relative |
>> |---------+----------------+---------+---------+----------|
>> | Before  | 46.597 ± 2.482 |  45.208 |  53.618 |     1.00 |
>> | After   | 46.867 ± 2.242 |  45.871 |  53.180 |     1.00 |
>
> Well that's disappointing.
>
>> Maybe the code cleanup itself makes it worthwhile. WDYT?
>
> I think there's little doubt that the first 3 patches are a good code cleanup.
>
> Patch 4 I think is still beneficial, simply so that we can add that "Above
> fields" comment.
>
> Patch 5 would only be worthwhile if we could measure any positive difference,
> which it seems we cannot.
>
> I have a follow-up patch to remove the parallel_cpus global variable which I
> will post in a moment.  While it removes a handful of insns from this
> fast-path, I doubt it helps.  But getting rid of a global is probably always
> positive, no?
>
> I was glancing through the lookup function for alpha, instead of aarch64 and saw:
>
>  21e:   33 43 18                xor    0x18(%rbx),%eax
>  221:   4c 31 e1                xor    %r12,%rcx
>  224:   44 31 ea                xor    %r13d,%edx
>  227:   09 c2                   or     %eax,%edx
>  229:   48 0b 4b 08             or     0x8(%rbx),%rcx
>
> and thought -- hang on, how come we're just ORing nor XORing here?  Of course
> it's the cs_base field, which alpha has set to zero.  The compiler has
> simplified bits |= 0 ^ tb->cs_base.
>
> Which got me thinking: what if we had a per-cpu
>
> typedef struct {
>     target_ulong pc;
>     ...
> } TranslationBlockID;
>
> static inline bool arch_tbid_cmp(TranslationBlockID x,
>                                  TranslationBlockID y)
> {
>     return x.pc == y.pc && ...;
> }
>
> We could potentially reduce this to memcmp(&x, &y).
>
> First, this would allow cs_base to be eliminated where it is not used.  Second,
> this would allow cs_base to be renamed for the non-x86 targets for which it is
> being abused.  Third, it would allow tb->flags to be either (a) elided or (b)
> extended by the target as needed.
>
> This final is directed at ARM, of course, where we've overflowed the uint32_t
> that is tb->flags.  We could now extend that to 64-bits.
>
> Obviously, some tweaks to tb_hash_func would be required as well, but that's
> manageable.
>
> What do you think about this last?

Sounds like a good idea for clean-up, especially to get rid of
cs_base/extend tbflags when needed. One concern would be where do we go
when we get to heterogeneous emulation? Will they share the same
translation area like the current cpu->cluster_index stuff or will that
only be for similar but not quite the same architectures? Maybe I'm
thinking too far ahead... 

>
>
> r~


-- 
Alex Bennée


  reply	other threads:[~2021-02-25 10:20 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-24 16:58 [RFC PATCH 0/5] Experimenting with tb-lookup tweaks Alex Bennée
2021-02-24 16:58 ` [RFC PATCH 1/5] accel/tcg: rename tb_lookup__cpu_state and hoist state extraction Alex Bennée
2021-02-24 16:58 ` [RFC PATCH 2/5] accel/tcg: move CF_CLUSTER calculation to curr_cflags Alex Bennée
2021-02-24 16:58 ` [RFC PATCH 3/5] accel/tcg: drop the use of CF_HASH_MASK and rename params Alex Bennée
2021-02-24 16:58 ` [RFC PATCH 4/5] include/exec: lightly re-arrange TranslationBlock Alex Bennée
2021-02-24 16:58 ` [RFC PATCH 5/5] include/exec/tb-lookup: try and reduce branch prediction issues Alex Bennée
2021-02-25  0:28 ` [RFC PATCH 0/5] Experimenting with tb-lookup tweaks Richard Henderson
2021-02-25 10:15   ` Alex Bennée [this message]
2021-02-25 15:45 ` no-reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=871rd4bgbv.fsf@linaro.org \
    --to=alex.bennee@linaro.org \
    --cc=cota@braap.org \
    --cc=qemu-devel@nongnu.org \
    --cc=richard.henderson@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.