From: Richard Henderson <richard.henderson@linaro.org>
To: Robert Hoo <robert.hu@linux.intel.com>,
qemu-devel@nongnu.org, pbonzini@redhat.com, laurent@vivier.eu,
philmd@redhat.com, berrange@redhat.com
Cc: robert.hu@intel.com
Subject: Re: [PATCH 2/2] util: add util function buffer_zero_avx512()
Date: Thu, 13 Feb 2020 10:20:36 -0800
Message-ID: <ee2ef44a-737b-e989-7f20-18a69e19d430@linaro.org>
In-Reply-To: <1581580379-54109-3-git-send-email-robert.hu@linux.intel.com>
On 2/12/20 11:52 PM, Robert Hoo wrote:
> And initialize buffer_is_zero() with it, when Intel AVX512F is
> available on host.
>
> This function utilizes Intel AVX512 fundamental instructions which
> perform over previous AVX2 instructions.
Is it not still true that any AVX512 insn will cause the entire CPU package,
not just the current core, to drop frequency by 20%?
As far as I know, the 512-bit instructions are only worth using when the
speedup outweighs that frequency drop, which seems unlikely in this case.
That said...
> + if (unlikely(len < 64)) { /*buff less than 512 bits, unlikely*/
> + return buffer_zero_int(buf, len);
> + }
First, len < 64 has been eliminated already in select_accel_fn.
Second, len < 256 is not handled properly by the code below...
> + /* Begin with an unaligned head of 64 bytes. */
> + t = _mm512_loadu_si512(buf);
> + p = (__m512i *)(((uintptr_t)buf + 5 * 64) & -64);
> + e = (__m512i *)(((uintptr_t)buf + len) & -64);
> +
> + /* Loop over 64-byte aligned blocks of 256. */
> + while (p < e) {
> + __builtin_prefetch(p);
> + if (unlikely(_mm512_test_epi64_mask(t, t))) {
> + return false;
> + }
> + t = p[-4] | p[-3] | p[-2] | p[-1];
> + p += 4;
> + }
> +
> + t |= _mm512_loadu_si512(buf + len - 4 * 64);
> + t |= _mm512_loadu_si512(buf + len - 3 * 64);
> + t |= _mm512_loadu_si512(buf + len - 2 * 64);
> + t |= _mm512_loadu_si512(buf + len - 1 * 64);
... because this final sequence loads 256 bytes.
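To make the arithmetic concrete, here is a minimal sketch of where the four tail loads start relative to buf. The helper names tail_start_offset and tail_in_bounds are made up for illustration and are not part of the patch; the only input taken from the quoted code is that the tail reads 4 * 64 bytes ending at buf + len.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Bytes covered by the four unaligned tail loads in the quoted patch. */
enum { TAIL_BYTES = 4 * 64 };

/*
 * Offset, relative to buf, of the lowest byte the tail loads touch.
 * A negative result means the first load starts before the buffer.
 * (Hypothetical helper, for illustration only.)
 */
static ptrdiff_t tail_start_offset(size_t len)
{
    return (ptrdiff_t)len - TAIL_BYTES;
}

/* True when the tail loads stay inside [buf, buf + len). */
static bool tail_in_bounds(size_t len)
{
    return tail_start_offset(len) >= 0;
}
```

With len = 64, the smallest length that reaches the function, the first tail load would begin 192 bytes before buf.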
Rather than add a second test against 256 in buffer_zero_avx512, I wonder if
it would be better to have select_accel_fn do the check. Have a global
variable buffer_accel_size alongside buffer_accel, so there's only one branch
(mis)predict to worry about.
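A rough sketch of that idea follows, under several assumptions: buffer_zero_int is replaced by a plain byte loop, buffer_zero_wide is a hypothetical placeholder for whichever vector routine gets selected, and accel_calls exists only so the dispatch can be observed. The real QEMU code differs; only the names select_accel_fn, buffer_accel and buffer_accel_size come from the discussion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the integer fallback: a plain byte loop (assumption). */
static bool buffer_zero_int(const void *buf, size_t len)
{
    const unsigned char *p = buf;

    for (size_t i = 0; i < len; i++) {
        if (p[i]) {
            return false;
        }
    }
    return true;
}

/* Demonstration-only counter of calls that took the accelerated path. */
static unsigned accel_calls;

/* Hypothetical placeholder for a vector routine that needs len >= 256. */
static bool buffer_zero_wide(const void *buf, size_t len)
{
    accel_calls++;
    return buffer_zero_int(buf, len);
}

/* Selected routine and the minimum length it can handle, set together. */
static bool (*buffer_accel)(const void *, size_t) = buffer_zero_wide;
static size_t buffer_accel_size = 256;

/* Single length test: short buffers never reach the accelerated routine. */
static bool select_accel_fn(const void *buf, size_t len)
{
    if (len >= buffer_accel_size) {
        return buffer_accel(buf, len);
    }
    return buffer_zero_int(buf, len);
}
```

Whatever code picks the routine at startup would set both variables at once, e.g. 64 for the existing implementations and 256 for one with a 256-byte tail.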
FWIW, something that the compiler should do, but doesn't currently, is use
vpternlogq to perform a 3-input OR. Something like
    /* 0xfe -> orABC */
    t = _mm512_ternarylogic_epi64(t, p[-4], p[-3], 0xfe);
    t = _mm512_ternarylogic_epi64(t, p[-2], p[-1], 0xfe);
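For anyone wondering where 0xfe comes from: the immediate is an 8-entry truth table indexed by the three input bits, and 0xfe (binary 11111110) is zero only when all three inputs are zero. A scalar model of one 64-bit lane, written as portable C so it runs without AVX512 hardware, might look like this. The function name ternlog64 is made up for the sketch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Scalar model of a three-input ternary-logic operation on one 64-bit lane.
 * For each bit position, the bits of a, b and c form a 3-bit index
 * (a is the most significant) that selects one bit of the immediate.
 * (Hypothetical helper, for illustration only.)
 */
static uint64_t ternlog64(uint64_t a, uint64_t b, uint64_t c, uint8_t imm)
{
    uint64_t r = 0;

    for (int i = 0; i < 64; i++) {
        unsigned idx = (unsigned)((((a >> i) & 1) << 2) |
                                  (((b >> i) & 1) << 1) |
                                  ((c >> i) & 1));
        r |= (uint64_t)((imm >> idx) & 1) << i;
    }
    return r;
}
```

With imm = 0xfe the result equals a | b | c for every input, which is why two such operations can fold the four block loads into t.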
r~