From: Paolo Bonzini <pbonzini@redhat.com>
To: Peter Maydell <peter.maydell@linaro.org>,
Vijaya Kumar K <vijayak@caviumnetworks.com>
Cc: Prasun Kapoor <Prasun.Kapoor@caviumnetworks.com>,
Vijay <vijayak@cavium.com>, qemu-arm <qemu-arm@nongnu.org>,
QEMU Developers <qemu-devel@nongnu.org>,
Vijay Kilari <vijay.kilari@gmail.com>
Subject: Re: [Qemu-devel] [RFC PATCH v1 2/2] target-arm: Use Neon for zero checking
Date: Tue, 5 Apr 2016 17:21:18 +0200 [thread overview]
Message-ID: <5703D7EE.2000005@redhat.com> (raw)
In-Reply-To: <CAFEAcA_755uEOZoGj0SCtmT-5PyLMntkK3wZeuHvZ_kPG9A1aQ@mail.gmail.com>
On 05/04/2016 16:36, Peter Maydell wrote:
>> > +
>> > +#define BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR_NEON 16
>> > +
>> > +/*
>> > + * Zero page/buffer checking using SIMD(Neon)
>> > + */
>> > +
>> > +static bool
>> > +can_use_buffer_find_nonzero_offset_neon(const void *buf, size_t len)
>> > +{
>> > + return (len % (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR_NEON
>> > + * sizeof(NEON_VECTYPE)) == 0
>> > + && ((uintptr_t) buf) % sizeof(NEON_VECTYPE) == 0);
>> > +}
>> > +
>> > +static size_t buffer_find_nonzero_offset_neon(const void *buf, size_t len)
>> > +{
>> > + size_t i;
>> > + NEON_VECTYPE d0, d1, d2, d3, d4, d5, d6;
>> > + NEON_VECTYPE d7, d8, d9, d10, d11, d12, d13, d14;
>> > + uint64_t const *data = buf;
>> > +
>> > + assert(can_use_buffer_find_nonzero_offset_neon(buf, len));
>> > + len /= sizeof(unsigned long);
>> > +
>> > + for (i = 0; i < len; i += 32) {
>> > + d0 = NEON_LOAD_N_ORR(data[i], data[i + 2]);
>> > + d1 = NEON_LOAD_N_ORR(data[i + 4], data[i + 6]);
>> > + d2 = NEON_LOAD_N_ORR(data[i + 8], data[i + 10]);
>> > + d3 = NEON_LOAD_N_ORR(data[i + 12], data[i + 14]);
>> > + d4 = NEON_ORR(d0, d1);
>> > + d5 = NEON_ORR(d2, d3);
>> > + d6 = NEON_ORR(d4, d5);
>> > +
>> > + d7 = NEON_LOAD_N_ORR(data[i + 16], data[i + 18]);
>> > + d8 = NEON_LOAD_N_ORR(data[i + 20], data[i + 22]);
>> > + d9 = NEON_LOAD_N_ORR(data[i + 24], data[i + 26]);
>> > + d10 = NEON_LOAD_N_ORR(data[i + 28], data[i + 30]);
>> > + d11 = NEON_ORR(d7, d8);
>> > + d12 = NEON_ORR(d9, d10);
>> > + d13 = NEON_ORR(d11, d12);
>> > +
>> > + d14 = NEON_ORR(d6, d13);
>> > + if (NEON_EQ_ZERO(d14)) {
>> > + break;
>> > + }
>> > + }
> Both the other optimised find_nonzero implementations in this
> file have two loops, not just one. Is it OK that this
> implementation has only a single loop?
>
> Paolo: do you know why we have two loops in the other
> implementations?
Because usually the first one or two iterations are enough to exit the
function if the page is nonzero. It's measurably slower to go through
the unrolled loop in that case. On the other hand, once the first few
iterations found only zero bytes, the buffer is very likely entirely
zero and the unrolled loop helps.
But in theory it should be enough to add a new #elif branch like this:
#include "arm_neon.h"
#define VECTYPE uint64x2_t
#define VEC_OR(a, b) ((a) | (b))
#define ALL_EQ(a, b) /* ??? :) */
around the
/* vector definitions */
comment in util/cutils.c. GCC should do everything else.
Paolo
next prev parent reply other threads:[~2016-04-05 15:21 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <1459777195-7907-1-git-send-email-vijayak@caviumnetworks.com>
2016-04-04 13:39 ` [Qemu-devel] [RFC PATCH v1 1/2] target-arm: Update page size for aarch64 vijayak
2016-04-04 13:44 ` Peter Maydell
2016-04-04 16:40 ` Vijay Kilari
2016-04-04 16:44 ` Peter Maydell
2016-04-06 15:01 ` Vijay Kilari
2016-05-31 9:04 ` Vijay Kilari
2016-05-31 9:31 ` Peter Maydell
2016-04-04 13:39 ` [Qemu-devel] [RFC PATCH v1 2/2] target-arm: Use Neon for zero checking vijayak
2016-04-05 14:36 ` Peter Maydell
2016-04-05 15:21 ` Paolo Bonzini [this message]
2016-04-05 16:01 ` Peter Maydell
[not found] ` <C94A741879221447B4FC9B607EB4FFCD79EA34F4@DGGEMA504-MBX.china.huawei.com>
2017-03-23 16:56 ` [Qemu-devel] [Qemu-arm] about armv8's prefetch decode Pranith Kumar
2017-03-24 6:14 ` [Qemu-devel] [Qemu-arm] [patch 1/1]about " Wangjintang
2017-03-24 10:06 ` Peter Maydell
2017-03-25 2:22 ` Wangjintang
2017-03-25 12:35 ` Peter Maydell
2016-04-06 8:32 ` [Qemu-devel] [RFC PATCH v1 2/2] target-arm: Use Neon for zero checking Vijay Kilari
2016-04-05 15:28 ` Peter Maydell
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5703D7EE.2000005@redhat.com \
--to=pbonzini@redhat.com \
--cc=Prasun.Kapoor@caviumnetworks.com \
--cc=peter.maydell@linaro.org \
--cc=qemu-arm@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=vijay.kilari@gmail.com \
--cc=vijayak@cavium.com \
--cc=vijayak@caviumnetworks.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).