qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Vijay Kilari <vijay.kilari@gmail.com>
To: Peter Maydell <peter.maydell@linaro.org>
Cc: Vijaya Kumar K <vijayak@caviumnetworks.com>,
	QEMU Developers <qemu-devel@nongnu.org>,
	Prasun Kapoor <Prasun.Kapoor@caviumnetworks.com>,
	qemu-arm <qemu-arm@nongnu.org>, Vijay <vijayak@cavium.com>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [Qemu-devel] [RFC PATCH v1 2/2] target-arm: Use Neon for zero checking
Date: Wed, 6 Apr 2016 14:02:04 +0530	[thread overview]
Message-ID: <CALicx6tVZhH==LT5no5k_=y1x_ixHyu_u_+MSgB7SPAfamu80Q@mail.gmail.com> (raw)
In-Reply-To: <CAFEAcA_755uEOZoGj0SCtmT-5PyLMntkK3wZeuHvZ_kPG9A1aQ@mail.gmail.com>

On Tue, Apr 5, 2016 at 8:06 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 4 April 2016 at 14:39,  <vijayak@caviumnetworks.com> wrote:
>> From: Vijay <vijayak@cavium.com>
>>
>> Use Neon instructions to perform zero checking of
>> buffer. This is helps in reducing downtime during
>> live migration.
>>
>> Signed-off-by: Vijaya Kumar K <vijayak@caviumnetworks.com>
>> ---
>>  util/cutils.c |   81 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 81 insertions(+)
>>
>> diff --git a/util/cutils.c b/util/cutils.c
>> index 43d1afb..d343b9a 100644
>> --- a/util/cutils.c
>> +++ b/util/cutils.c
>> @@ -352,6 +352,87 @@ static void *can_use_buffer_find_nonzero_offset_ifunc(void)
>>      return func;
>>  }
>>  #pragma GCC pop_options
>> +
>> +#elif defined __aarch64__
>> +#include "arm_neon.h"
>
> Can we rely on all compilers having this, or do we need to
> test in configure?

GCC and armcc support the same intrinsics. Both needs inclusion
of arm_neon.h.

>
>> +
>> +#define NEON_VECTYPE               uint64x2_t
>> +#define NEON_LOAD_N_ORR(v1, v2)    vorrq_u64(vld1q_u64(&v1), vld1q_u64(&v2))
>> +#define NEON_ORR(v1, v2)           vorrq_u64(v1, v2)
>> +#define NEON_EQ_ZERO(v1) \
>> +        ((vgetq_lane_u64(vceqzq_u64(v1), 0) == 0) || \
>> +         (vgetq_lane_u64(vceqzq_u64(v1), 1)) == 0)
>
> The intrinsics are a bit confusing, but shouldn't we be
> testing that both lanes of v1 are 0, rather than whether
> either of them is? (so "&&", not "||").

Above check is correct. vceqzq() sets all bits to 1 if value is 0.
So if one lane is 0, then it means it is non-zero buffer. I think
redefining this macro as below would be better and avoid
vceqzq_u64()

#define NEON_NOT_EQ_ZERO(v1) \
        ((vgetq_lane_u64(v1, 0) != 0) || (vgetq_lane_u64(v1, 1)) != 0)

>
>> +
>> +#define BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR_NEON 16
>> +
>> +/*
>> + * Zero page/buffer checking using SIMD(Neon)
>> + */
>> +
>> +static bool
>> +can_use_buffer_find_nonzero_offset_neon(const void *buf, size_t len)
>> +{
>> +    return (len % (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR_NEON
>> +                   * sizeof(NEON_VECTYPE)) == 0
>> +            && ((uintptr_t) buf) % sizeof(NEON_VECTYPE) == 0);
>> +}
>> +
>> +static size_t buffer_find_nonzero_offset_neon(const void *buf, size_t len)
>> +{
>> +    size_t i;
>> +    NEON_VECTYPE d0, d1, d2, d3, d4, d5, d6;
>> +    NEON_VECTYPE d7, d8, d9, d10, d11, d12, d13, d14;
>> +    uint64_t const *data = buf;
>> +
>> +    assert(can_use_buffer_find_nonzero_offset_neon(buf, len));
>> +    len /= sizeof(unsigned long);
>> +
>> +    for (i = 0; i < len; i += 32) {
>> +        d0 = NEON_LOAD_N_ORR(data[i], data[i + 2]);
>> +        d1 = NEON_LOAD_N_ORR(data[i + 4], data[i + 6]);
>> +        d2 = NEON_LOAD_N_ORR(data[i + 8], data[i + 10]);
>> +        d3 = NEON_LOAD_N_ORR(data[i + 12], data[i + 14]);
>> +        d4 = NEON_ORR(d0, d1);
>> +        d5 = NEON_ORR(d2, d3);
>> +        d6 = NEON_ORR(d4, d5);
>> +
>> +        d7 = NEON_LOAD_N_ORR(data[i + 16], data[i + 18]);
>> +        d8 = NEON_LOAD_N_ORR(data[i + 20], data[i + 22]);
>> +        d9 = NEON_LOAD_N_ORR(data[i + 24], data[i + 26]);
>> +        d10 = NEON_LOAD_N_ORR(data[i + 28], data[i + 30]);
>> +        d11 = NEON_ORR(d7, d8);
>> +        d12 = NEON_ORR(d9, d10);
>> +        d13 = NEON_ORR(d11, d12);
>> +
>> +        d14 = NEON_ORR(d6, d13);
>> +        if (NEON_EQ_ZERO(d14)) {
>> +            break;
>> +        }
>> +    }
>
> Both the other optimised find_nonzero implementations in this
> file have two loops, not just one. Is it OK that this
> implementation has only a single loop?
>
> Paolo: do you know why we have two loops in the other
> implementations?

Paolo was right as he mentioned in the previous email.
But with two loops, I don't see much benefit. So restricted to
one loop.

>
>> +
>> +    return i * sizeof(unsigned long);
>> +}
>> +
>> +static inline bool neon_support(void)
>> +{
>> +    /*
>> +     * Check if neon feature is supported.
>> +     * By default neon is supported for aarch64.
>> +     */
>> +    return true;
>> +}
>
> There doesn't seem much point in this. We can assume Neon exists
> on any CPU we're going to run on (it's part of the ABI, the kernel
> assumes it, etc etc). So you can just implement the functions without
> the indirection functions below.
>
 Hmm. One reason was compilation fails if we don't call
can_use_buffer_find_nonzero_offset_inner() function from inside neon
implementation.
So I added this similar to AVX2 intel. Also thought if any platform
does not implement
Neon, then can simply skip changes this function.

>> +
>> +bool can_use_buffer_find_nonzero_offset(const void *buf, size_t len)
>> +{
>> +    return neon_support() ? can_use_buffer_find_nonzero_offset_neon(buf, len) :
>> +           can_use_buffer_find_nonzero_offset_inner(buf, len);
>> +}
>> +
>> +size_t buffer_find_nonzero_offset(const void *buf, size_t len)
>> +{
>> +    return neon_support() ? buffer_find_nonzero_offset_neon(buf, len) :
>> +           buffer_find_nonzero_offset_inner(buf, len);
>> +}
>>  #else
>>  bool can_use_buffer_find_nonzero_offset(const void *buf, size_t len)
>>  {
>> --
>
> thanks
> -- PMM

  parent reply	other threads:[~2016-04-06  8:32 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <1459777195-7907-1-git-send-email-vijayak@caviumnetworks.com>
2016-04-04 13:39 ` [Qemu-devel] [RFC PATCH v1 1/2] target-arm: Update page size for aarch64 vijayak
2016-04-04 13:44   ` Peter Maydell
2016-04-04 16:40     ` Vijay Kilari
2016-04-04 16:44       ` Peter Maydell
2016-04-06 15:01         ` Vijay Kilari
2016-05-31  9:04           ` Vijay Kilari
2016-05-31  9:31             ` Peter Maydell
2016-04-04 13:39 ` [Qemu-devel] [RFC PATCH v1 2/2] target-arm: Use Neon for zero checking vijayak
2016-04-05 14:36   ` Peter Maydell
2016-04-05 15:21     ` Paolo Bonzini
2016-04-05 16:01       ` Peter Maydell
     [not found]         ` <C94A741879221447B4FC9B607EB4FFCD79EA34F4@DGGEMA504-MBX.china.huawei.com>
2017-03-23 16:56           ` [Qemu-devel] [Qemu-arm] about armv8's prefetch decode Pranith Kumar
2017-03-24  6:14             ` [Qemu-devel] [Qemu-arm] [patch 1/1]about " Wangjintang
2017-03-24 10:06               ` Peter Maydell
2017-03-25  2:22                 ` Wangjintang
2017-03-25 12:35                   ` Peter Maydell
2016-04-06  8:32     ` Vijay Kilari [this message]
2016-04-05 15:28   ` [Qemu-devel] [RFC PATCH v1 2/2] target-arm: Use Neon for zero checking Peter Maydell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CALicx6tVZhH==LT5no5k_=y1x_ixHyu_u_+MSgB7SPAfamu80Q@mail.gmail.com' \
    --to=vijay.kilari@gmail.com \
    --cc=Prasun.Kapoor@caviumnetworks.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-arm@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=vijayak@cavium.com \
    --cc=vijayak@caviumnetworks.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).