* [PATCH 0/9] Host-specific includes, begin cpuinfo.h @ 2023-05-18 4:40 Richard Henderson 2023-05-18 4:40 ` [PATCH 1/9] util: Introduce host-specific cpuinfo.h Richard Henderson ` (8 more replies) 0 siblings, 9 replies; 21+ messages in thread From: Richard Henderson @ 2023-05-18 4:40 UTC (permalink / raw) To: qemu-devel; +Cc: peter.maydell Hiya. This is looking toward cleaning up a couple of things: (1) There are 5 bits of x86 host detection, 3 of them for xbzrle. Unify this down to one, with additional cleanups for xbzrle. (2) Provides a host-specific include path for splitting atomic128.h and probably other stuff so as to avoid host-specific ifdefs. Actually splitting atomic128.h is so far left for further work. r~ Richard Henderson (9): util: Introduce host-specific cpuinfo.h util: Add cpuinfo-i386.c util: Add i386 CPUINFO_ATOMIC_VMOVDQU tcg/i386: Use cpuinfo.h util/bufferiszero: Use i386 cpuinfo.h migration/xbzrle: Shuffle function order migration/xbzrle: Use i386 cacheinfo.h migration: Build migration_files once util: Add cpuinfo-aarch64.c include/host/aarch64/cpuinfo.h | 22 ++ include/host/generic/cpuinfo.h | 4 + include/host/i386/cpuinfo.h | 39 +++ include/host/x86_64/cpuinfo.h | 1 + migration/xbzrle.h | 5 +- tcg/aarch64/tcg-target.h | 4 +- tcg/i386/tcg-target.h | 28 +- migration/ram.c | 34 +-- migration/xbzrle.c | 268 ++++++++++--------- tests/bench/xbzrle-bench.c | 469 --------------------------------- tests/unit/test-xbzrle.c | 49 +--- util/bufferiszero.c | 126 ++++----- util/cpuinfo-aarch64.c | 67 +++++ util/cpuinfo-i386.c | 99 +++++++ meson.build | 8 + migration/meson.build | 1 - tcg/aarch64/tcg-target.c.inc | 41 +-- tcg/i386/tcg-target.c.inc | 123 +-------- tests/bench/meson.build | 6 - util/meson.build | 6 + 20 files changed, 476 insertions(+), 924 deletions(-) create mode 100644 include/host/aarch64/cpuinfo.h create mode 100644 include/host/generic/cpuinfo.h create mode 100644 include/host/i386/cpuinfo.h create mode 100644 include/host/x86_64/cpuinfo.h delete mode 100644 tests/bench/xbzrle-bench.c create mode 100644 util/cpuinfo-aarch64.c create mode 100644 util/cpuinfo-i386.c -- 2.34.1 ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 1/9] util: Introduce host-specific cpuinfo.h 2023-05-18 4:40 [PATCH 0/9] Host-specific includes, begin cpuinfo.h Richard Henderson @ 2023-05-18 4:40 ` Richard Henderson 2023-05-18 9:30 ` Juan Quintela 2023-05-18 4:40 ` [PATCH 2/9] util: Add cpuinfo-i386.c Richard Henderson ` (7 subsequent siblings) 8 siblings, 1 reply; 21+ messages in thread From: Richard Henderson @ 2023-05-18 4:40 UTC (permalink / raw) To: qemu-devel Cc: peter.maydell, Paolo Bonzini, Marc-André Lureau, Daniel P. Berrangé, Thomas Huth, Philippe Mathieu-Daudé The entire contents of the header is host-specific, but the existence of such a header is not, which could prevent some host specific ifdefs at the top of the file for the include. Add include/host/{arch,generic} to the project arguments. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> --- Cc: Paolo Bonzini <pbonzini@redhat.com> (maintainer:Meson) Cc: "Marc-André Lureau" <marcandre.lureau@redhat.com> (reviewer:Meson) Cc: "Daniel P. Berrangé" <berrange@redhat.com> (reviewer:Meson) Cc: Thomas Huth <thuth@redhat.com> (reviewer:Meson) Cc: "Philippe Mathieu-Daudé" <philmd@linaro.org> (reviewer:Meson) --- include/host/generic/cpuinfo.h | 4 ++++ meson.build | 8 ++++++++ 2 files changed, 12 insertions(+) create mode 100644 include/host/generic/cpuinfo.h diff --git a/include/host/generic/cpuinfo.h b/include/host/generic/cpuinfo.h new file mode 100644 index 0000000000..eca672064a --- /dev/null +++ b/include/host/generic/cpuinfo.h @@ -0,0 +1,4 @@ +/* + * No host specific cpu indentification. + * SPDX-License-Identifier: GPL-2.0-or-later + */ diff --git a/meson.build b/meson.build index 4dddccb890..0dd806e8a5 100644 --- a/meson.build +++ b/meson.build @@ -292,6 +292,14 @@ add_project_arguments('-iquote', '.', '-iquote', meson.current_source_dir() / 'include', language: all_languages) +include_host = meson.current_source_dir() / 'include/host/' +if fs.is_dir(include_host / host_arch) + add_project_arguments('-iquote', include_host / host_arch, + language: all_languages) +endif +add_project_arguments('-iquote', include_host / 'generic', + language: all_languages) + sparse = find_program('cgcc', required: get_option('sparse')) if sparse.found() run_target('sparse', -- 2.34.1 ^ permalink raw reply related [flat|nested] 21+ messages in thread
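As an illustration of what the new include path buys (a sketch, not code from
the series; the consumer file and function are hypothetical, only cpuinfo,
cpuinfo_init() and the CPUINFO_* bits come from the patches that follow):

    /* consumer.c -- editorial sketch */
    #include <stdbool.h>
    #include "cpuinfo.h"    /* resolves to include/host/<arch>/cpuinfo.h when
                             * that directory exists, otherwise to the generic
                             * stub in include/host/generic/cpuinfo.h */

    static bool host_has_avx2(void)
    {
    #ifdef CPUINFO_AVX2     /* only the x86 header defines these bits */
        return cpuinfo_init() & CPUINFO_AVX2;
    #else
        return false;
    #endif
    }

The #include line itself needs no host guard; only the feature test does,
which is the "prevent some host specific ifdefs at the top of the file" point
in the commit message.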
* Re: [PATCH 1/9] util: Introduce host-specific cpuinfo.h
  From: Juan Quintela @ 2023-05-18  9:30 UTC
  To: Richard Henderson
  Cc: qemu-devel, peter.maydell, Paolo Bonzini, Marc-André Lureau,
      Daniel P. Berrangé, Thomas Huth, Philippe Mathieu-Daudé

Richard Henderson <richard.henderson@linaro.org> wrote:
> The entire contents of the header is host-specific, but the
> existence of such a header is not, which could prevent some
> host specific ifdefs at the top of the file for the include.
>
> Add include/host/{arch,generic} to the project arguments.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Juan Quintela <quintela@redhat.com>
* [PATCH 2/9] util: Add cpuinfo-i386.c 2023-05-18 4:40 [PATCH 0/9] Host-specific includes, begin cpuinfo.h Richard Henderson 2023-05-18 4:40 ` [PATCH 1/9] util: Introduce host-specific cpuinfo.h Richard Henderson @ 2023-05-18 4:40 ` Richard Henderson 2023-05-18 9:35 ` Juan Quintela 2023-05-18 4:40 ` [PATCH 3/9] util: Add i386 CPUINFO_ATOMIC_VMOVDQU Richard Henderson ` (6 subsequent siblings) 8 siblings, 1 reply; 21+ messages in thread From: Richard Henderson @ 2023-05-18 4:40 UTC (permalink / raw) To: qemu-devel; +Cc: peter.maydell Add cpuinfo.h for i386 and x86_64, and the initialization for that in util/. Populate that with a slightly altered copy of the tcg host probing code. Other uses of cpuid.h will be adjusted one patch at a time. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> --- include/host/i386/cpuinfo.h | 38 ++++++++++++++ include/host/x86_64/cpuinfo.h | 1 + util/cpuinfo-i386.c | 97 +++++++++++++++++++++++++++++++++++ util/meson.build | 4 ++ 4 files changed, 140 insertions(+) create mode 100644 include/host/i386/cpuinfo.h create mode 100644 include/host/x86_64/cpuinfo.h create mode 100644 util/cpuinfo-i386.c diff --git a/include/host/i386/cpuinfo.h b/include/host/i386/cpuinfo.h new file mode 100644 index 0000000000..e6f7461378 --- /dev/null +++ b/include/host/i386/cpuinfo.h @@ -0,0 +1,38 @@ +/* + * SPDX-License-Identifier: GPL-2.0-or-later + * Host specific cpu indentification for x86. + */ + +#ifndef HOST_CPUINFO_H +#define HOST_CPUINFO_H + +/* Digested version of <cpuid.h> */ + +#define CPUINFO_ALWAYS (1u << 0) /* so cpuinfo is nonzero */ +#define CPUINFO_CMOV (1u << 1) +#define CPUINFO_MOVBE (1u << 2) +#define CPUINFO_LZCNT (1u << 3) +#define CPUINFO_POPCNT (1u << 4) +#define CPUINFO_BMI1 (1u << 5) +#define CPUINFO_BMI2 (1u << 6) +#define CPUINFO_SSE2 (1u << 7) +#define CPUINFO_SSE4 (1u << 8) +#define CPUINFO_AVX1 (1u << 9) +#define CPUINFO_AVX2 (1u << 10) +#define CPUINFO_AVX512F (1u << 11) +#define CPUINFO_AVX512VL (1u << 12) +#define CPUINFO_AVX512BW (1u << 13) +#define CPUINFO_AVX512DQ (1u << 14) +#define CPUINFO_AVX512VBMI2 (1u << 15) +#define CPUINFO_ATOMIC_VMOVDQA (1u << 16) + +/* Initialized with a constructor. */ +extern unsigned cpuinfo; + +/* + * We cannot rely on constructor ordering, so other constructors must + * use the function interface rather than the variable above. + */ +unsigned cpuinfo_init(void); + +#endif /* HOST_CPUINFO_H */ diff --git a/include/host/x86_64/cpuinfo.h b/include/host/x86_64/cpuinfo.h new file mode 100644 index 0000000000..535a8d79d4 --- /dev/null +++ b/include/host/x86_64/cpuinfo.h @@ -0,0 +1 @@ +#include "host/i386/cpuinfo.h" diff --git a/util/cpuinfo-i386.c b/util/cpuinfo-i386.c new file mode 100644 index 0000000000..cb9475c688 --- /dev/null +++ b/util/cpuinfo-i386.c @@ -0,0 +1,97 @@ +/* + * SPDX-License-Identifier: GPL-2.0-or-later + * Host specific cpu indentification for x86. + */ + +#include "qemu/osdep.h" +#include "cpuinfo.h" +#ifdef CONFIG_CPUID_H +# include "qemu/cpuid.h" +#endif + +unsigned cpuinfo; + +/* Called both as constructor and (possibly) via other constructors. */ +unsigned __attribute__((constructor)) cpuinfo_init(void) +{ + unsigned info = cpuinfo; + + if (info) { + return info; + } + +#ifdef CONFIG_CPUID_H + unsigned max, a, b, c, d, b7 = 0, c7 = 0; + + max = __get_cpuid_max(0, 0); + + if (max >= 7) { + __cpuid_count(7, 0, a, b7, c7, d); + info |= (b7 & bit_BMI ? CPUINFO_BMI1 : 0); + info |= (b7 & bit_BMI2 ? 
CPUINFO_BMI2 : 0); + } + + if (max >= 1) { + __cpuid(1, a, b, c, d); + + info |= (d & bit_CMOV ? CPUINFO_CMOV : 0); + info |= (d & bit_SSE2 ? CPUINFO_SSE2 : 0); + info |= (c & bit_SSE4_1 ? CPUINFO_SSE4 : 0); + info |= (c & bit_MOVBE ? CPUINFO_MOVBE : 0); + info |= (c & bit_POPCNT ? CPUINFO_POPCNT : 0); + + /* For AVX features, we must check available and usable. */ + if ((c & bit_AVX) && (c & bit_OSXSAVE)) { + unsigned bv = xgetbv_low(0); + + if ((bv & 6) == 6) { + info |= CPUINFO_AVX1; + info |= (b7 & bit_AVX2 ? CPUINFO_AVX2 : 0); + + if ((bv & 0xe0) == 0xe0) { + info |= (b7 & bit_AVX512F ? CPUINFO_AVX512F : 0); + info |= (b7 & bit_AVX512VL ? CPUINFO_AVX512VL : 0); + info |= (b7 & bit_AVX512BW ? CPUINFO_AVX512BW : 0); + info |= (b7 & bit_AVX512DQ ? CPUINFO_AVX512DQ : 0); + info |= (c7 & bit_AVX512VBMI2 ? CPUINFO_AVX512VBMI2 : 0); + } + + /* + * The Intel SDM has added: + * Processors that enumerate support for Intel® AVX + * (by setting the feature flag CPUID.01H:ECX.AVX[bit 28]) + * guarantee that the 16-byte memory operations performed + * by the following instructions will always be carried + * out atomically: + * - MOVAPD, MOVAPS, and MOVDQA. + * - VMOVAPD, VMOVAPS, and VMOVDQA when encoded with VEX.128. + * - VMOVAPD, VMOVAPS, VMOVDQA32, and VMOVDQA64 when encoded + * with EVEX.128 and k0 (masking disabled). + * Note that these instructions require the linear addresses + * of their memory operands to be 16-byte aligned. + * + * AMD has provided an even stronger guarantee that processors + * with AVX provide 16-byte atomicity for all cachable, + * naturally aligned single loads and stores, e.g. MOVDQU. + * + * See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688 + */ + __cpuid(0, a, b, c, d); + if (c == signature_INTEL_ecx || c == signature_AMD_ecx) { + info |= CPUINFO_ATOMIC_VMOVDQA; + } + } + } + } + + max = __get_cpuid_max(0x8000000, 0); + if (max >= 1) { + __cpuid(0x80000001, a, b, c, d); + info |= (c & bit_LZCNT ? CPUINFO_LZCNT : 0); + } +#endif + + info |= CPUINFO_ALWAYS; + cpuinfo = info; + return info; +} diff --git a/util/meson.build b/util/meson.build index 3c2cfc6ede..714c783b4c 100644 --- a/util/meson.build +++ b/util/meson.build @@ -106,3 +106,7 @@ if have_block endif util_ss.add(when: 'CONFIG_LINUX', if_true: files('vfio-helpers.c')) endif + +if cpu in ['x86', 'x86_64'] + util_ss.add(files('cpuinfo-i386.c')) +endif -- 2.34.1 ^ permalink raw reply related [flat|nested] 21+ messages in thread
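The "constructor ordering" comment in the new header is the part most easily
missed: cpuinfo is filled in by a constructor in util/cpuinfo-i386.c, but a
constructor in another object file may legally run before it.  A sketch of
how such a user is expected to look (editorial illustration; my_init and
my_flags are hypothetical):

    #include "cpuinfo.h"

    static unsigned my_flags;

    static void __attribute__((constructor)) my_init(void)
    {
        /*
         * Do not read the cpuinfo variable here -- this constructor may
         * run before the one in cpuinfo-i386.c.  cpuinfo_init() probes
         * the host on its first call and is a cheap reload afterwards.
         */
        my_flags = cpuinfo_init() & (CPUINFO_SSE2 | CPUINFO_AVX2);
    }

Ordinary (non-constructor) code can simply test bits in cpuinfo directly, as
the later patches in the series do.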
* Re: [PATCH 2/9] util: Add cpuinfo-i386.c
  From: Juan Quintela @ 2023-05-18  9:35 UTC
  To: Richard Henderson; +Cc: qemu-devel, peter.maydell

Richard Henderson <richard.henderson@linaro.org> wrote:
> Add cpuinfo.h for i386 and x86_64, and the initialization
> for that in util/.  Populate that with a slightly altered
> copy of the tcg host probing code.  Other uses of cpuid.h
> will be adjusted one patch at a time.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Juan Quintela <quintela@redhat.com>

For what it's worth, my vote O:-)

> +#define CPUINFO_ALWAYS          (1u << 0)  /* so cpuinfo is nonzero */
> +#define CPUINFO_CMOV            (1u << 1)
> +#define CPUINFO_MOVBE           (1u << 2)
> +#define CPUINFO_LZCNT           (1u << 3)
> +#define CPUINFO_POPCNT          (1u << 4)
> +#define CPUINFO_BMI1            (1u << 5)
> +#define CPUINFO_BMI2            (1u << 6)
> +#define CPUINFO_SSE2            (1u << 7)
> +#define CPUINFO_SSE4            (1u << 8)
> +#define CPUINFO_AVX1            (1u << 9)
> +#define CPUINFO_AVX2            (1u << 10)
> +#define CPUINFO_AVX512F         (1u << 11)
> +#define CPUINFO_AVX512VL        (1u << 12)
> +#define CPUINFO_AVX512BW        (1u << 13)
> +#define CPUINFO_AVX512DQ        (1u << 14)
> +#define CPUINFO_AVX512VBMI2     (1u << 15)
> +#define CPUINFO_ATOMIC_VMOVDQA  (1u << 16)
> +
> +/* Initialized with a constructor. */
> +extern unsigned cpuinfo;

On one hand, it is weird having a flags variable that is only 32 bits wide;
I am so used to putting flags in 64 bits.  Future-proofing, blah, blah, ...
On the other hand, if tcg has survived for so long with only 16 bits, it is
entirely possible that 32 bits are more than enough.

> +unsigned cpuinfo;
> +
> +/* Called both as constructor and (possibly) via other constructors. */
> +unsigned __attribute__((constructor)) cpuinfo_init(void)
> +{
> +    unsigned info = cpuinfo;
> +
> +    if (info) {
> +        return info;
> +    }

I had to look at this several times, because info "needed to be" a static
variable, right?  O:-)
* Re: [PATCH 2/9] util: Add cpuinfo-i386.c 2023-05-18 9:35 ` Juan Quintela @ 2023-05-18 12:45 ` Richard Henderson 0 siblings, 0 replies; 21+ messages in thread From: Richard Henderson @ 2023-05-18 12:45 UTC (permalink / raw) To: quintela; +Cc: qemu-devel, peter.maydell On 5/18/23 02:35, Juan Quintela wrote: > Richard Henderson <richard.henderson@linaro.org> wrote: >> Add cpuinfo.h for i386 and x86_64, and the initialization >> for that in util/. Populate that with a slightly altered >> copy of the tcg host probing code. Other uses of cpuid.h >> will be adjusted one patch at a time. >> >> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> > > Reviewed-by: Juan Quintela <quintela@redhat.com> > > For what is worth my vote O:-) > >> +#define CPUINFO_ALWAYS (1u << 0) /* so cpuinfo is nonzero */ >> +#define CPUINFO_CMOV (1u << 1) >> +#define CPUINFO_MOVBE (1u << 2) >> +#define CPUINFO_LZCNT (1u << 3) >> +#define CPUINFO_POPCNT (1u << 4) >> +#define CPUINFO_BMI1 (1u << 5) >> +#define CPUINFO_BMI2 (1u << 6) >> +#define CPUINFO_SSE2 (1u << 7) >> +#define CPUINFO_SSE4 (1u << 8) >> +#define CPUINFO_AVX1 (1u << 9) >> +#define CPUINFO_AVX2 (1u << 10) >> +#define CPUINFO_AVX512F (1u << 11) >> +#define CPUINFO_AVX512VL (1u << 12) >> +#define CPUINFO_AVX512BW (1u << 13) >> +#define CPUINFO_AVX512DQ (1u << 14) >> +#define CPUINFO_AVX512VBMI2 (1u << 15) >> +#define CPUINFO_ATOMIC_VMOVDQA (1u << 16) >> + >> +/* Initialized with a constructor. */ >> +extern unsigned cpuinfo; > > On one hand, it is weird having a flags variable that is only 32bit. I > am so user to put 64 bit flags. Future proof, blah, blah, ... > > On the other hand, if tcg has survived for so long with only 16 bits, it > is inside posibility that 32bits are more than enough. Indeed. Nor is this an public abi that needs future-proofing -- in the event we need more bits, we change it. >> +/* Called both as constructor and (possibly) via other constructors. */ >> +unsigned __attribute__((constructor)) cpuinfo_init(void) >> +{ >> + unsigned info = cpuinfo; >> + >> + if (info) { >> + return info; >> + } > > Have to look several times to this, because info "needed to be"" a > static variable, right? O:-) :-) r~ ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 3/9] util: Add i386 CPUINFO_ATOMIC_VMOVDQU 2023-05-18 4:40 [PATCH 0/9] Host-specific includes, begin cpuinfo.h Richard Henderson 2023-05-18 4:40 ` [PATCH 1/9] util: Introduce host-specific cpuinfo.h Richard Henderson 2023-05-18 4:40 ` [PATCH 2/9] util: Add cpuinfo-i386.c Richard Henderson @ 2023-05-18 4:40 ` Richard Henderson 2023-05-18 15:52 ` Peter Maydell 2023-05-18 4:40 ` [PATCH 4/9] tcg/i386: Use cpuinfo.h Richard Henderson ` (5 subsequent siblings) 8 siblings, 1 reply; 21+ messages in thread From: Richard Henderson @ 2023-05-18 4:40 UTC (permalink / raw) To: qemu-devel; +Cc: peter.maydell Add a bit to indicate when VMOVDQU is also atomic if aligned. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> --- include/host/i386/cpuinfo.h | 1 + util/cpuinfo-i386.c | 4 +++- 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/include/host/i386/cpuinfo.h b/include/host/i386/cpuinfo.h index e6f7461378..a6537123cf 100644 --- a/include/host/i386/cpuinfo.h +++ b/include/host/i386/cpuinfo.h @@ -25,6 +25,7 @@ #define CPUINFO_AVX512DQ (1u << 14) #define CPUINFO_AVX512VBMI2 (1u << 15) #define CPUINFO_ATOMIC_VMOVDQA (1u << 16) +#define CPUINFO_ATOMIC_VMOVDQU (1u << 17) /* Initialized with a constructor. */ extern unsigned cpuinfo; diff --git a/util/cpuinfo-i386.c b/util/cpuinfo-i386.c index cb9475c688..b72374362f 100644 --- a/util/cpuinfo-i386.c +++ b/util/cpuinfo-i386.c @@ -77,8 +77,10 @@ unsigned __attribute__((constructor)) cpuinfo_init(void) * See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688 */ __cpuid(0, a, b, c, d); - if (c == signature_INTEL_ecx || c == signature_AMD_ecx) { + if (c == signature_INTEL_ecx) { info |= CPUINFO_ATOMIC_VMOVDQA; + } else if (c == signature_AMD_ecx) { + info |= CPUINFO_ATOMIC_VMOVDQA | CPUINFO_ATOMIC_VMOVDQU; } } } -- 2.34.1 ^ permalink raw reply related [flat|nested] 21+ messages in thread
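The practical difference between the two bits: with CPUINFO_ATOMIC_VMOVDQA
(set for both Intel and AMD) only the alignment-checking encodings are
documented atomic, so the generated code must use VMOVDQA and the address
must be 16-byte aligned; with CPUINFO_ATOMIC_VMOVDQU (AMD only) the unaligned
encoding is covered as well, provided the data really is naturally aligned
and cacheable.  A sketch of how a caller might pick a strategy (editorial
illustration; the enum and function are hypothetical, the bits are from this
patch):

    #include "cpuinfo.h"

    typedef enum {
        LD16_NOT_ATOMIC,   /* fall back to locking or smaller accesses     */
        LD16_VMOVDQA,      /* aligned encoding only; faults if misaligned  */
        LD16_VMOVDQU,      /* unaligned encoding also atomic for addresses */
                           /* that are in fact 16-byte aligned             */
    } Load16Kind;

    static Load16Kind pick_load16(void)
    {
        unsigned info = cpuinfo_init();

        if (info & CPUINFO_ATOMIC_VMOVDQU) {
            return LD16_VMOVDQU;
        }
        if (info & CPUINFO_ATOMIC_VMOVDQA) {
            return LD16_VMOVDQA;
        }
        return LD16_NOT_ATOMIC;
    }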
* Re: [PATCH 3/9] util: Add i386 CPUINFO_ATOMIC_VMOVDQU
  From: Peter Maydell @ 2023-05-18 15:52 UTC
  To: Richard Henderson; +Cc: qemu-devel

On Thu, 18 May 2023 at 05:41, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Add a bit to indicate when VMOVDQU is also atomic if aligned.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  include/host/i386/cpuinfo.h | 1 +
>  util/cpuinfo-i386.c         | 4 +++-
>  2 files changed, 4 insertions(+), 1 deletion(-)

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM
* [PATCH 4/9] tcg/i386: Use cpuinfo.h 2023-05-18 4:40 [PATCH 0/9] Host-specific includes, begin cpuinfo.h Richard Henderson ` (2 preceding siblings ...) 2023-05-18 4:40 ` [PATCH 3/9] util: Add i386 CPUINFO_ATOMIC_VMOVDQU Richard Henderson @ 2023-05-18 4:40 ` Richard Henderson 2023-05-18 15:53 ` Peter Maydell 2023-05-18 4:40 ` [PATCH 5/9] util/bufferiszero: Use i386 cpuinfo.h Richard Henderson ` (4 subsequent siblings) 8 siblings, 1 reply; 21+ messages in thread From: Richard Henderson @ 2023-05-18 4:40 UTC (permalink / raw) To: qemu-devel; +Cc: peter.maydell Use the CPUINFO_* bits instead of the individual boolean variables that we had been using. Remove all of the init code that was moved over to cpuinfo-i386.c. Note that have_avx512* check both AVX512{F,VL}, as we had previously done during tcg_target_init. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> --- tcg/i386/tcg-target.h | 28 +++++---- tcg/i386/tcg-target.c.inc | 123 ++------------------------------------ 2 files changed, 22 insertions(+), 129 deletions(-) diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h index 0b5a2c68c5..0e1759c0b4 100644 --- a/tcg/i386/tcg-target.h +++ b/tcg/i386/tcg-target.h @@ -25,6 +25,8 @@ #ifndef I386_TCG_TARGET_H #define I386_TCG_TARGET_H +#include "cpuinfo.h" + #define TCG_TARGET_INSN_UNIT_SIZE 1 #define TCG_TARGET_TLB_DISPLACEMENT_BITS 31 @@ -111,16 +113,22 @@ typedef enum { # define TCG_TARGET_CALL_RET_I128 TCG_CALL_RET_BY_REF #endif -extern bool have_bmi1; -extern bool have_popcnt; -extern bool have_avx1; -extern bool have_avx2; -extern bool have_avx512bw; -extern bool have_avx512dq; -extern bool have_avx512vbmi2; -extern bool have_avx512vl; -extern bool have_movbe; -extern bool have_atomic16; +#define have_bmi1 (cpuinfo & CPUINFO_BMI1) +#define have_popcnt (cpuinfo & CPUINFO_POPCNT) +#define have_avx1 (cpuinfo & CPUINFO_AVX1) +#define have_avx2 (cpuinfo & CPUINFO_AVX2) +#define have_movbe (cpuinfo & CPUINFO_MOVBE) +#define have_atomic16 (cpuinfo & CPUINFO_ATOMIC_VMOVDQA) + +/* + * There are interesting instructions in AVX512, so long as we have AVX512VL, + * which indicates support for EVEX on sizes smaller than 512 bits. + */ +#define have_avx512vl ((cpuinfo & CPUINFO_AVX512VL) && \ + (cpuinfo & CPUINFO_AVX512F)) +#define have_avx512bw ((cpuinfo & CPUINFO_AVX512BW) && have_avx512vl) +#define have_avx512dq ((cpuinfo & CPUINFO_AVX512DQ) && have_avx512vl) +#define have_avx512vbmi2 ((cpuinfo & CPUINFO_AVX512VBMI2) && have_avx512vl) /* optional instructions */ #define TCG_TARGET_HAS_div2_i32 1 diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc index 8b9a5f00e5..bfe9d98b7e 100644 --- a/tcg/i386/tcg-target.c.inc +++ b/tcg/i386/tcg-target.c.inc @@ -158,42 +158,14 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot) # define SOFTMMU_RESERVE_REGS 0 #endif -/* The host compiler should supply <cpuid.h> to enable runtime features - detection, as we're not going to go so far as our own inline assembly. - If not available, default values will be assumed. */ -#if defined(CONFIG_CPUID_H) -#include "qemu/cpuid.h" -#endif - /* For 64-bit, we always know that CMOV is available. */ #if TCG_TARGET_REG_BITS == 64 -# define have_cmov 1 -#elif defined(CONFIG_CPUID_H) -static bool have_cmov; +# define have_cmov true #else -# define have_cmov 0 -#endif - -/* We need these symbols in tcg-target.h, and we can't properly conditionalize - it there. Therefore we always define the variable. 
*/ -bool have_bmi1; -bool have_popcnt; -bool have_avx1; -bool have_avx2; -bool have_avx512bw; -bool have_avx512dq; -bool have_avx512vbmi2; -bool have_avx512vl; -bool have_movbe; -bool have_atomic16; - -#ifdef CONFIG_CPUID_H -static bool have_bmi2; -static bool have_lzcnt; -#else -# define have_bmi2 0 -# define have_lzcnt 0 +# define have_cmov (cpuinfo & CPUINFO_CMOV) #endif +#define have_bmi2 (cpuinfo & CPUINFO_BMI2) +#define have_lzcnt (cpuinfo & CPUINFO_LZCNT) static const tcg_insn_unit *tb_ret_addr; @@ -3961,93 +3933,6 @@ static void tcg_out_nop_fill(tcg_insn_unit *p, int count) static void tcg_target_init(TCGContext *s) { -#ifdef CONFIG_CPUID_H - unsigned a, b, c, d, b7 = 0, c7 = 0; - unsigned max = __get_cpuid_max(0, 0); - - if (max >= 7) { - /* BMI1 is available on AMD Piledriver and Intel Haswell CPUs. */ - __cpuid_count(7, 0, a, b7, c7, d); - have_bmi1 = (b7 & bit_BMI) != 0; - have_bmi2 = (b7 & bit_BMI2) != 0; - } - - if (max >= 1) { - __cpuid(1, a, b, c, d); -#ifndef have_cmov - /* For 32-bit, 99% certainty that we're running on hardware that - supports cmov, but we still need to check. In case cmov is not - available, we'll use a small forward branch. */ - have_cmov = (d & bit_CMOV) != 0; -#endif - - /* MOVBE is only available on Intel Atom and Haswell CPUs, so we - need to probe for it. */ - have_movbe = (c & bit_MOVBE) != 0; - have_popcnt = (c & bit_POPCNT) != 0; - - /* There are a number of things we must check before we can be - sure of not hitting invalid opcode. */ - if (c & bit_OSXSAVE) { - unsigned bv = xgetbv_low(0); - - if ((bv & 6) == 6) { - have_avx1 = (c & bit_AVX) != 0; - have_avx2 = (b7 & bit_AVX2) != 0; - - /* - * There are interesting instructions in AVX512, so long - * as we have AVX512VL, which indicates support for EVEX - * on sizes smaller than 512 bits. We are required to - * check that OPMASK and all extended ZMM state are enabled - * even if we're not using them -- the insns will fault. - */ - if ((bv & 0xe0) == 0xe0 - && (b7 & bit_AVX512F) - && (b7 & bit_AVX512VL)) { - have_avx512vl = true; - have_avx512bw = (b7 & bit_AVX512BW) != 0; - have_avx512dq = (b7 & bit_AVX512DQ) != 0; - have_avx512vbmi2 = (c7 & bit_AVX512VBMI2) != 0; - } - - /* - * The Intel SDM has added: - * Processors that enumerate support for Intel® AVX - * (by setting the feature flag CPUID.01H:ECX.AVX[bit 28]) - * guarantee that the 16-byte memory operations performed - * by the following instructions will always be carried - * out atomically: - * - MOVAPD, MOVAPS, and MOVDQA. - * - VMOVAPD, VMOVAPS, and VMOVDQA when encoded with VEX.128. - * - VMOVAPD, VMOVAPS, VMOVDQA32, and VMOVDQA64 when encoded - * with EVEX.128 and k0 (masking disabled). - * Note that these instructions require the linear addresses - * of their memory operands to be 16-byte aligned. - * - * AMD has provided an even stronger guarantee that processors - * with AVX provide 16-byte atomicity for all cachable, - * naturally aligned single loads and stores, e.g. MOVDQU. - * - * See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688 - */ - if (have_avx1) { - __cpuid(0, a, b, c, d); - have_atomic16 = (c == signature_INTEL_ecx || - c == signature_AMD_ecx); - } - } - } - } - - max = __get_cpuid_max(0x8000000, 0); - if (max >= 1) { - __cpuid(0x80000001, a, b, c, d); - /* LZCNT was introduced with AMD Barcelona and Intel Haswell CPUs. 
*/ - have_lzcnt = (c & bit_LZCNT) != 0; - } -#endif /* CONFIG_CPUID_H */ - tcg_target_available_regs[TCG_TYPE_I32] = ALL_GENERAL_REGS; if (TCG_TARGET_REG_BITS == 64) { tcg_target_available_regs[TCG_TYPE_I64] = ALL_GENERAL_REGS; -- 2.34.1 ^ permalink raw reply related [flat|nested] 21+ messages in thread
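One detail worth restating from the commit message: the old tcg_target_init()
code set have_avx512vl only when both AVX512F and AVX512VL were reported, and
set the remaining have_avx512* flags only on top of that.  The new macros keep
that behaviour by chaining, so a single test still implies all three CPUID
bits.  Roughly (editorial sketch, not code from the patch; the function name
is hypothetical):

    #include <stdbool.h>
    #include "cpuinfo.h"

    /* What a have_avx512bw test amounts to after this patch. */
    static bool host_has_evex_bw(void)
    {
        return (cpuinfo & CPUINFO_AVX512BW) &&
               (cpuinfo & CPUINFO_AVX512VL) &&
               (cpuinfo & CPUINFO_AVX512F);
    }

Because the macros read the cpuinfo word filled in by the constructor in
util/cpuinfo-i386.c, tcg_target_init() no longer probes anything itself.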
* Re: [PATCH 4/9] tcg/i386: Use cpuinfo.h
  From: Peter Maydell @ 2023-05-18 15:53 UTC
  To: Richard Henderson; +Cc: qemu-devel

On Thu, 18 May 2023 at 05:41, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Use the CPUINFO_* bits instead of the individual boolean
> variables that we had been using.  Remove all of the init
> code that was moved over to cpuinfo-i386.c.
>
> Note that have_avx512* check both AVX512{F,VL}, as we had
> previously done during tcg_target_init.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/i386/tcg-target.h     |  28 +++++----
>  tcg/i386/tcg-target.c.inc | 123 ++------------------------------------
>  2 files changed, 22 insertions(+), 129 deletions(-)

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM
* [PATCH 5/9] util/bufferiszero: Use i386 cpuinfo.h 2023-05-18 4:40 [PATCH 0/9] Host-specific includes, begin cpuinfo.h Richard Henderson ` (3 preceding siblings ...) 2023-05-18 4:40 ` [PATCH 4/9] tcg/i386: Use cpuinfo.h Richard Henderson @ 2023-05-18 4:40 ` Richard Henderson 2023-05-18 9:49 ` Juan Quintela 2023-05-18 4:40 ` [PATCH 6/9] migration/xbzrle: Shuffle function order Richard Henderson ` (3 subsequent siblings) 8 siblings, 1 reply; 21+ messages in thread From: Richard Henderson @ 2023-05-18 4:40 UTC (permalink / raw) To: qemu-devel; +Cc: peter.maydell Use cpuinfo_init() during init_accel(), and the variable cpuinfo during test_buffer_is_zero_next_accel(). Adjust the logic that cycles through the set of accelerators for testing. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> --- util/bufferiszero.c | 126 ++++++++++++++++---------------------------- 1 file changed, 45 insertions(+), 81 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index 1886bc5ba4..f216d07b76 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -24,6 +24,7 @@ #include "qemu/osdep.h" #include "qemu/cutils.h" #include "qemu/bswap.h" +#include "cpuinfo.h" static bool buffer_zero_int(const void *buf, size_t len) @@ -184,111 +185,74 @@ buffer_zero_avx512(const void *buf, size_t len) } #endif /* CONFIG_AVX512F_OPT */ - -/* Note that for test_buffer_is_zero_next_accel, the most preferred - * ISA must have the least significant bit. - */ -#define CACHE_AVX512F 1 -#define CACHE_AVX2 2 -#define CACHE_SSE4 4 -#define CACHE_SSE2 8 - -/* Make sure that these variables are appropriately initialized when +/* + * Make sure that these variables are appropriately initialized when * SSE2 is enabled on the compiler command-line, but the compiler is * too old to support CONFIG_AVX2_OPT. 
*/ #if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT) -# define INIT_CACHE 0 -# define INIT_ACCEL buffer_zero_int +# define INIT_USED 0 +# define INIT_LENGTH 0 +# define INIT_ACCEL buffer_zero_int #else # ifndef __SSE2__ # error "ISA selection confusion" # endif -# define INIT_CACHE CACHE_SSE2 -# define INIT_ACCEL buffer_zero_sse2 +# define INIT_USED CPUINFO_SSE2 +# define INIT_LENGTH 64 +# define INIT_ACCEL buffer_zero_sse2 #endif -static unsigned cpuid_cache = INIT_CACHE; +static unsigned used_accel = INIT_USED; +static unsigned length_to_accel = INIT_LENGTH; static bool (*buffer_accel)(const void *, size_t) = INIT_ACCEL; -static int length_to_accel = 64; -static void init_accel(unsigned cache) +static unsigned __attribute__((noinline)) +select_accel_cpuinfo(unsigned info) { - bool (*fn)(const void *, size_t) = buffer_zero_int; - if (cache & CACHE_SSE2) { - fn = buffer_zero_sse2; - length_to_accel = 64; - } -#ifdef CONFIG_AVX2_OPT - if (cache & CACHE_SSE4) { - fn = buffer_zero_sse4; - length_to_accel = 64; - } - if (cache & CACHE_AVX2) { - fn = buffer_zero_avx2; - length_to_accel = 128; - } -#endif + static const struct { + unsigned bit; + unsigned len; + bool (*fn)(const void *, size_t); + } all[] = { #ifdef CONFIG_AVX512F_OPT - if (cache & CACHE_AVX512F) { - fn = buffer_zero_avx512; - length_to_accel = 256; - } + { CPUINFO_AVX512F, 256, buffer_zero_avx512 }, #endif - buffer_accel = fn; +#ifdef CONFIG_AVX2_OPT + { CPUINFO_AVX2, 128, buffer_zero_avx2 }, + { CPUINFO_SSE4, 64, buffer_zero_sse4 }, +#endif + { CPUINFO_SSE2, 64, buffer_zero_sse2 }, + { CPUINFO_ALWAYS, 0, buffer_zero_int }, + }; + + for (unsigned i = 0; i < ARRAY_SIZE(all); ++i) { + if (info & all[i].bit) { + length_to_accel = all[i].len; + buffer_accel = all[i].fn; + return all[i].bit; + } + } + return 0; } #if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT) -#include "qemu/cpuid.h" - -static void __attribute__((constructor)) init_cpuid_cache(void) +static void __attribute__((constructor)) init_accel(void) { - unsigned max = __get_cpuid_max(0, NULL); - int a, b, c, d; - unsigned cache = 0; - - if (max >= 1) { - __cpuid(1, a, b, c, d); - if (d & bit_SSE2) { - cache |= CACHE_SSE2; - } - if (c & bit_SSE4_1) { - cache |= CACHE_SSE4; - } - - /* We must check that AVX is not just available, but usable. */ - if ((c & bit_OSXSAVE) && (c & bit_AVX) && max >= 7) { - unsigned bv = xgetbv_low(0); - __cpuid_count(7, 0, a, b, c, d); - if ((bv & 0x6) == 0x6 && (b & bit_AVX2)) { - cache |= CACHE_AVX2; - } - /* 0xe6: - * XCR0[7:5] = 111b (OPMASK state, upper 256-bit of ZMM0-ZMM15 - * and ZMM16-ZMM31 state are enabled by OS) - * XCR0[2:1] = 11b (XMM state and YMM state are enabled by OS) - */ - if ((bv & 0xe6) == 0xe6 && (b & bit_AVX512F)) { - cache |= CACHE_AVX512F; - } - } - } - cpuid_cache = cache; - init_accel(cache); + used_accel = select_accel_cpuinfo(cpuinfo_init()); } #endif /* CONFIG_AVX2_OPT */ bool test_buffer_is_zero_next_accel(void) { - /* If no bits set, we just tested buffer_zero_int, and there - are no more acceleration options to test. */ - if (cpuid_cache == 0) { - return false; - } - /* Disable the accelerator we used before and select a new one. */ - cpuid_cache &= cpuid_cache - 1; - init_accel(cpuid_cache); - return true; + /* + * Accumulate the accelerators that we've already tested, and + * remove them from the set to test this round. We'll get back + * a zero from select_accel_cpuinfo when there are no more. 
+ */ + unsigned used = select_accel_cpuinfo(cpuinfo & ~used_accel); + used_accel |= used; + return used; } static bool select_accel_fn(const void *buf, size_t len) -- 2.34.1 ^ permalink raw reply related [flat|nested] 21+ messages in thread
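After this patch the constructor (or, when the compiler is too old for the
AVX paths, the static INIT_* defaults) selects the most-preferred
implementation up front, and each call to test_buffer_is_zero_next_accel()
switches to the next one the host supports, returning false once everything
down to buffer_zero_int has been tried.  A sketch of how a unit test can
exercise all of them (editorial illustration; check_all is hypothetical, the
two prototypes are the existing declarations from qemu/cutils.h):

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* as declared in include/qemu/cutils.h */
    bool buffer_is_zero(const void *buf, size_t len);
    bool test_buffer_is_zero_next_accel(void);

    static void check_all(const void *buf, size_t len, bool expect)
    {
        do {
            assert(buffer_is_zero(buf, len) == expect);
            /* returns false once every implementation has been tried */
        } while (test_buffer_is_zero_next_accel());
    }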
* Re: [PATCH 5/9] util/bufferiszero: Use i386 cpuinfo.h
  From: Juan Quintela @ 2023-05-18  9:49 UTC
  To: Richard Henderson; +Cc: qemu-devel, peter.maydell

Richard Henderson <richard.henderson@linaro.org> wrote:
> Use cpuinfo_init() during init_accel(), and the variable cpuinfo
> during test_buffer_is_zero_next_accel().  Adjust the logic that
> cycles through the set of accelerators for testing.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Semi-related to this:

For migration, I check every single page to see if it is full of zeros, but
I can promise that it is exactly one page (i.e. 4KiB, 16KiB or 64KiB, with
the right alignment, the right length, ...).

Would it make sense to have a special function for that?

Yes, I have found with perf that bufferiszero() is quite high.  No, I
haven't yet tried experimenting with a function that is optimized for the
architecture's page size.

What do you think?

Later, Juan.
* Re: [PATCH 5/9] util/bufferiszero: Use i386 cpuinfo.h 2023-05-18 9:49 ` Juan Quintela @ 2023-05-18 12:48 ` Richard Henderson 0 siblings, 0 replies; 21+ messages in thread From: Richard Henderson @ 2023-05-18 12:48 UTC (permalink / raw) To: quintela; +Cc: qemu-devel, peter.maydell On 5/18/23 02:49, Juan Quintela wrote: > Richard Henderson <richard.henderson@linaro.org> wrote: >> Use cpuinfo_init() during init_accel(), and the variable cpuinfo >> during test_buffer_is_zero_next_accel(). Adjust the logic that >> cycles through the set of accelerators for testing. >> >> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> > > Semi related to this. > > For migration, I check every single page to see if it is full of zeros. > But I can promisse that it is just a page (i.e. 4KiB, 16KiB or 64KiB, > correct alignation, correct length, ...). > > Will do it make sense to have an special function for that? > > Yes, I have found with perf that bufferiszero() is quite high. No, I > haven't try to experiment using a function that is optimized for the > page size in the architecture. > > What do you think? The optimized bufferiszero functions are already optimized for their blocklength (64, 128, 256). I don't think adding page multiples will do much more. r~ ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 6/9] migration/xbzrle: Shuffle function order 2023-05-18 4:40 [PATCH 0/9] Host-specific includes, begin cpuinfo.h Richard Henderson ` (4 preceding siblings ...) 2023-05-18 4:40 ` [PATCH 5/9] util/bufferiszero: Use i386 cpuinfo.h Richard Henderson @ 2023-05-18 4:40 ` Richard Henderson 2023-05-18 9:19 ` Juan Quintela 2023-05-18 4:40 ` [PATCH 7/9] migration/xbzrle: Use i386 cacheinfo.h Richard Henderson ` (2 subsequent siblings) 8 siblings, 1 reply; 21+ messages in thread From: Richard Henderson @ 2023-05-18 4:40 UTC (permalink / raw) To: qemu-devel; +Cc: peter.maydell, Juan Quintela, Peter Xu, Leonardo Bras Place the CONFIG_AVX512BW_OPT block at the top, which will aid function selection in the next patch. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> --- Cc: Juan Quintela <quintela@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Leonardo Bras <leobras@redhat.com> --- migration/xbzrle.c | 244 ++++++++++++++++++++++----------------------- 1 file changed, 122 insertions(+), 122 deletions(-) diff --git a/migration/xbzrle.c b/migration/xbzrle.c index 258e4959c9..751b5428f7 100644 --- a/migration/xbzrle.c +++ b/migration/xbzrle.c @@ -15,6 +15,128 @@ #include "qemu/host-utils.h" #include "xbzrle.h" +#if defined(CONFIG_AVX512BW_OPT) +#include <immintrin.h> + +int __attribute__((target("avx512bw"))) +xbzrle_encode_buffer_avx512(uint8_t *old_buf, uint8_t *new_buf, int slen, + uint8_t *dst, int dlen) +{ + uint32_t zrun_len = 0, nzrun_len = 0; + int d = 0, i = 0, num = 0; + uint8_t *nzrun_start = NULL; + /* add 1 to include residual part in main loop */ + uint32_t count512s = (slen >> 6) + 1; + /* countResidual is tail of data, i.e., countResidual = slen % 64 */ + uint32_t count_residual = slen & 0b111111; + bool never_same = true; + uint64_t mask_residual = 1; + mask_residual <<= count_residual; + mask_residual -= 1; + __m512i r = _mm512_set1_epi32(0); + + while (count512s) { + int bytes_to_check = 64; + uint64_t mask = 0xffffffffffffffff; + if (count512s == 1) { + bytes_to_check = count_residual; + mask = mask_residual; + } + __m512i old_data = _mm512_mask_loadu_epi8(r, + mask, old_buf + i); + __m512i new_data = _mm512_mask_loadu_epi8(r, + mask, new_buf + i); + uint64_t comp = _mm512_cmpeq_epi8_mask(old_data, new_data); + count512s--; + + bool is_same = (comp & 0x1); + while (bytes_to_check) { + if (d + 2 > dlen) { + return -1; + } + if (is_same) { + if (nzrun_len) { + d += uleb128_encode_small(dst + d, nzrun_len); + if (d + nzrun_len > dlen) { + return -1; + } + nzrun_start = new_buf + i - nzrun_len; + memcpy(dst + d, nzrun_start, nzrun_len); + d += nzrun_len; + nzrun_len = 0; + } + /* 64 data at a time for speed */ + if (count512s && (comp == 0xffffffffffffffff)) { + i += 64; + zrun_len += 64; + break; + } + never_same = false; + num = ctz64(~comp); + num = (num < bytes_to_check) ? num : bytes_to_check; + zrun_len += num; + bytes_to_check -= num; + comp >>= num; + i += num; + if (bytes_to_check) { + /* still has different data after same data */ + d += uleb128_encode_small(dst + d, zrun_len); + zrun_len = 0; + } else { + break; + } + } + if (never_same || zrun_len) { + /* + * never_same only acts if + * data begins with diff in first count512s + */ + d += uleb128_encode_small(dst + d, zrun_len); + zrun_len = 0; + never_same = false; + } + /* has diff, 64 data at a time for speed */ + if ((bytes_to_check == 64) && (comp == 0x0)) { + i += 64; + nzrun_len += 64; + break; + } + num = ctz64(comp); + num = (num < bytes_to_check) ? 
num : bytes_to_check; + nzrun_len += num; + bytes_to_check -= num; + comp >>= num; + i += num; + if (bytes_to_check) { + /* mask like 111000 */ + d += uleb128_encode_small(dst + d, nzrun_len); + /* overflow */ + if (d + nzrun_len > dlen) { + return -1; + } + nzrun_start = new_buf + i - nzrun_len; + memcpy(dst + d, nzrun_start, nzrun_len); + d += nzrun_len; + nzrun_len = 0; + is_same = true; + } + } + } + + if (nzrun_len != 0) { + d += uleb128_encode_small(dst + d, nzrun_len); + /* overflow */ + if (d + nzrun_len > dlen) { + return -1; + } + nzrun_start = new_buf + i - nzrun_len; + memcpy(dst + d, nzrun_start, nzrun_len); + d += nzrun_len; + } + return d; +} +#endif + /* page = zrun nzrun | zrun nzrun page @@ -175,125 +297,3 @@ int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t *dst, int dlen) return d; } - -#if defined(CONFIG_AVX512BW_OPT) -#include <immintrin.h> - -int __attribute__((target("avx512bw"))) -xbzrle_encode_buffer_avx512(uint8_t *old_buf, uint8_t *new_buf, int slen, - uint8_t *dst, int dlen) -{ - uint32_t zrun_len = 0, nzrun_len = 0; - int d = 0, i = 0, num = 0; - uint8_t *nzrun_start = NULL; - /* add 1 to include residual part in main loop */ - uint32_t count512s = (slen >> 6) + 1; - /* countResidual is tail of data, i.e., countResidual = slen % 64 */ - uint32_t count_residual = slen & 0b111111; - bool never_same = true; - uint64_t mask_residual = 1; - mask_residual <<= count_residual; - mask_residual -= 1; - __m512i r = _mm512_set1_epi32(0); - - while (count512s) { - int bytes_to_check = 64; - uint64_t mask = 0xffffffffffffffff; - if (count512s == 1) { - bytes_to_check = count_residual; - mask = mask_residual; - } - __m512i old_data = _mm512_mask_loadu_epi8(r, - mask, old_buf + i); - __m512i new_data = _mm512_mask_loadu_epi8(r, - mask, new_buf + i); - uint64_t comp = _mm512_cmpeq_epi8_mask(old_data, new_data); - count512s--; - - bool is_same = (comp & 0x1); - while (bytes_to_check) { - if (d + 2 > dlen) { - return -1; - } - if (is_same) { - if (nzrun_len) { - d += uleb128_encode_small(dst + d, nzrun_len); - if (d + nzrun_len > dlen) { - return -1; - } - nzrun_start = new_buf + i - nzrun_len; - memcpy(dst + d, nzrun_start, nzrun_len); - d += nzrun_len; - nzrun_len = 0; - } - /* 64 data at a time for speed */ - if (count512s && (comp == 0xffffffffffffffff)) { - i += 64; - zrun_len += 64; - break; - } - never_same = false; - num = ctz64(~comp); - num = (num < bytes_to_check) ? num : bytes_to_check; - zrun_len += num; - bytes_to_check -= num; - comp >>= num; - i += num; - if (bytes_to_check) { - /* still has different data after same data */ - d += uleb128_encode_small(dst + d, zrun_len); - zrun_len = 0; - } else { - break; - } - } - if (never_same || zrun_len) { - /* - * never_same only acts if - * data begins with diff in first count512s - */ - d += uleb128_encode_small(dst + d, zrun_len); - zrun_len = 0; - never_same = false; - } - /* has diff, 64 data at a time for speed */ - if ((bytes_to_check == 64) && (comp == 0x0)) { - i += 64; - nzrun_len += 64; - break; - } - num = ctz64(comp); - num = (num < bytes_to_check) ? 
num : bytes_to_check; - nzrun_len += num; - bytes_to_check -= num; - comp >>= num; - i += num; - if (bytes_to_check) { - /* mask like 111000 */ - d += uleb128_encode_small(dst + d, nzrun_len); - /* overflow */ - if (d + nzrun_len > dlen) { - return -1; - } - nzrun_start = new_buf + i - nzrun_len; - memcpy(dst + d, nzrun_start, nzrun_len); - d += nzrun_len; - nzrun_len = 0; - is_same = true; - } - } - } - - if (nzrun_len != 0) { - d += uleb128_encode_small(dst + d, nzrun_len); - /* overflow */ - if (d + nzrun_len > dlen) { - return -1; - } - nzrun_start = new_buf + i - nzrun_len; - memcpy(dst + d, nzrun_start, nzrun_len); - d += nzrun_len; - } - return d; -} -#endif -- 2.34.1 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH 6/9] migration/xbzrle: Shuffle function order
  From: Juan Quintela @ 2023-05-18  9:19 UTC
  To: Richard Henderson; +Cc: qemu-devel, peter.maydell, Peter Xu, Leonardo Bras

Richard Henderson <richard.henderson@linaro.org> wrote:
> Place the CONFIG_AVX512BW_OPT block at the top,
> which will aid function selection in the next patch.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Juan Quintela <quintela@redhat.com>

Queued.
* [PATCH 7/9] migration/xbzrle: Use i386 cacheinfo.h 2023-05-18 4:40 [PATCH 0/9] Host-specific includes, begin cpuinfo.h Richard Henderson ` (5 preceding siblings ...) 2023-05-18 4:40 ` [PATCH 6/9] migration/xbzrle: Shuffle function order Richard Henderson @ 2023-05-18 4:40 ` Richard Henderson 2023-05-18 9:44 ` Juan Quintela 2023-05-18 4:40 ` [PATCH 8/9] migration: Build migration_files once Richard Henderson 2023-05-18 4:40 ` [PATCH 9/9] util: Add cpuinfo-aarch64.c Richard Henderson 8 siblings, 1 reply; 21+ messages in thread From: Richard Henderson @ 2023-05-18 4:40 UTC (permalink / raw) To: qemu-devel; +Cc: peter.maydell, Juan Quintela, Peter Xu, Leonardo Bras Perform the function selection once, and only if CONFIG_AVX512_OPT is enabled. Centralize the selection to xbzrle.c, instead of spreading the init across 3 files. Remove xbzrle-bench.c. The benefit of being able to benchmark the different implementations is less important than peeking into the internals of the implementation. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> --- Cc: Juan Quintela <quintela@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Leonardo Bras <leobras@redhat.com> --- migration/xbzrle.h | 5 +- migration/ram.c | 34 +-- migration/xbzrle.c | 26 +- tests/bench/xbzrle-bench.c | 469 ------------------------------------- tests/unit/test-xbzrle.c | 49 +--- tests/bench/meson.build | 6 - 6 files changed, 39 insertions(+), 550 deletions(-) delete mode 100644 tests/bench/xbzrle-bench.c diff --git a/migration/xbzrle.h b/migration/xbzrle.h index 6feb49160a..39e651b9ec 100644 --- a/migration/xbzrle.h +++ b/migration/xbzrle.h @@ -18,8 +18,5 @@ int xbzrle_encode_buffer(uint8_t *old_buf, uint8_t *new_buf, int slen, uint8_t *dst, int dlen); int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t *dst, int dlen); -#if defined(CONFIG_AVX512BW_OPT) -int xbzrle_encode_buffer_avx512(uint8_t *old_buf, uint8_t *new_buf, int slen, - uint8_t *dst, int dlen); -#endif + #endif diff --git a/migration/ram.c b/migration/ram.c index f69d8d42b0..f9e35a45e1 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -90,34 +90,6 @@ #define RAM_SAVE_FLAG_MULTIFD_FLUSH 0x200 /* We can't use any flag that is bigger than 0x200 */ -int (*xbzrle_encode_buffer_func)(uint8_t *, uint8_t *, int, - uint8_t *, int) = xbzrle_encode_buffer; -#if defined(CONFIG_AVX512BW_OPT) -#include "qemu/cpuid.h" -static void __attribute__((constructor)) init_cpu_flag(void) -{ - unsigned max = __get_cpuid_max(0, NULL); - int a, b, c, d; - if (max >= 1) { - __cpuid(1, a, b, c, d); - /* We must check that AVX is not just available, but usable. 
*/ - if ((c & bit_OSXSAVE) && (c & bit_AVX) && max >= 7) { - int bv; - __asm("xgetbv" : "=a"(bv), "=d"(d) : "c"(0)); - __cpuid_count(7, 0, a, b, c, d); - /* 0xe6: - * XCR0[7:5] = 111b (OPMASK state, upper 256-bit of ZMM0-ZMM15 - * and ZMM16-ZMM31 state are enabled by OS) - * XCR0[2:1] = 11b (XMM state and YMM state are enabled by OS) - */ - if ((bv & 0xe6) == 0xe6 && (b & bit_AVX512BW)) { - xbzrle_encode_buffer_func = xbzrle_encode_buffer_avx512; - } - } - } -} -#endif - XBZRLECacheStats xbzrle_counters; /* used by the search for pages to send */ @@ -660,9 +632,9 @@ static int save_xbzrle_page(RAMState *rs, PageSearchStatus *pss, memcpy(XBZRLE.current_buf, *current_data, TARGET_PAGE_SIZE); /* XBZRLE encoding (if there is no overflow) */ - encoded_len = xbzrle_encode_buffer_func(prev_cached_page, XBZRLE.current_buf, - TARGET_PAGE_SIZE, XBZRLE.encoded_buf, - TARGET_PAGE_SIZE); + encoded_len = xbzrle_encode_buffer(prev_cached_page, XBZRLE.current_buf, + TARGET_PAGE_SIZE, XBZRLE.encoded_buf, + TARGET_PAGE_SIZE); /* * Update the cache contents, so that it corresponds to the data diff --git a/migration/xbzrle.c b/migration/xbzrle.c index 751b5428f7..57da6f4c96 100644 --- a/migration/xbzrle.c +++ b/migration/xbzrle.c @@ -17,8 +17,9 @@ #if defined(CONFIG_AVX512BW_OPT) #include <immintrin.h> +#include "cpuinfo.h" -int __attribute__((target("avx512bw"))) +static int __attribute__((target("avx512bw"))) xbzrle_encode_buffer_avx512(uint8_t *old_buf, uint8_t *new_buf, int slen, uint8_t *dst, int dlen) { @@ -135,6 +136,29 @@ xbzrle_encode_buffer_avx512(uint8_t *old_buf, uint8_t *new_buf, int slen, } return d; } + +static int xbzrle_encode_buffer_int(uint8_t *old_buf, uint8_t *new_buf, + int slen, uint8_t *dst, int dlen); + +static int (*accel_func)(uint8_t *, uint8_t *, int, uint8_t *, int); + +static void __attribute__((constructor)) init_accel(void) +{ + unsigned info = cpuinfo_init(); + if (info & CPUINFO_AVX512BW) { + accel_func = xbzrle_encode_buffer_avx512; + } else { + accel_func = xbzrle_encode_buffer_int; + } +} + +int xbzrle_encode_buffer(uint8_t *old_buf, uint8_t *new_buf, int slen, + uint8_t *dst, int dlen) +{ + return accel_func(old_buf, new_buf, slen, dst, dlen); +} + +#define xbzrle_encode_buffer xbzrle_encode_buffer_int #endif /* diff --git a/tests/bench/xbzrle-bench.c b/tests/bench/xbzrle-bench.c deleted file mode 100644 index 8848a3a32d..0000000000 --- a/tests/bench/xbzrle-bench.c +++ /dev/null @@ -1,469 +0,0 @@ -/* - * Xor Based Zero Run Length Encoding unit tests. - * - * Copyright 2013 Red Hat, Inc. and/or its affiliates - * - * Authors: - * Orit Wasserman <owasserm@redhat.com> - * - * This work is licensed under the terms of the GNU GPL, version 2 or later. - * See the COPYING file in the top-level directory. - * - */ -#include "qemu/osdep.h" -#include "qemu/cutils.h" -#include "../migration/xbzrle.h" - -#if defined(CONFIG_AVX512BW_OPT) -#define XBZRLE_PAGE_SIZE 4096 -static bool is_cpu_support_avx512bw; -#include "qemu/cpuid.h" -static void __attribute__((constructor)) init_cpu_flag(void) -{ - unsigned max = __get_cpuid_max(0, NULL); - int a, b, c, d; - is_cpu_support_avx512bw = false; - if (max >= 1) { - __cpuid(1, a, b, c, d); - /* We must check that AVX is not just available, but usable. 
*/ - if ((c & bit_OSXSAVE) && (c & bit_AVX) && max >= 7) { - int bv; - __asm("xgetbv" : "=a"(bv), "=d"(d) : "c"(0)); - __cpuid_count(7, 0, a, b, c, d); - /* 0xe6: - * XCR0[7:5] = 111b (OPMASK state, upper 256-bit of ZMM0-ZMM15 - * and ZMM16-ZMM31 state are enabled by OS) - * XCR0[2:1] = 11b (XMM state and YMM state are enabled by OS) - */ - if ((bv & 0xe6) == 0xe6 && (b & bit_AVX512BW)) { - is_cpu_support_avx512bw = true; - } - } - } - return ; -} - -struct ResTime { - float t_raw; - float t_512; -}; - - -/* Function prototypes -int xbzrle_encode_buffer_avx512(uint8_t *old_buf, uint8_t *new_buf, int slen, - uint8_t *dst, int dlen); -*/ -static void encode_decode_zero(struct ResTime *res) -{ - uint8_t *buffer = g_malloc0(XBZRLE_PAGE_SIZE); - uint8_t *compressed = g_malloc0(XBZRLE_PAGE_SIZE); - uint8_t *buffer512 = g_malloc0(XBZRLE_PAGE_SIZE); - uint8_t *compressed512 = g_malloc0(XBZRLE_PAGE_SIZE); - int i = 0; - int dlen = 0, dlen512 = 0; - int diff_len = g_test_rand_int_range(0, XBZRLE_PAGE_SIZE - 1006); - - for (i = diff_len; i > 0; i--) { - buffer[1000 + i] = i; - buffer512[1000 + i] = i; - } - - buffer[1000 + diff_len + 3] = 103; - buffer[1000 + diff_len + 5] = 105; - - buffer512[1000 + diff_len + 3] = 103; - buffer512[1000 + diff_len + 5] = 105; - - /* encode zero page */ - time_t t_start, t_end, t_start512, t_end512; - t_start = clock(); - dlen = xbzrle_encode_buffer(buffer, buffer, XBZRLE_PAGE_SIZE, compressed, - XBZRLE_PAGE_SIZE); - t_end = clock(); - float time_val = difftime(t_end, t_start); - g_assert(dlen == 0); - - t_start512 = clock(); - dlen512 = xbzrle_encode_buffer_avx512(buffer512, buffer512, XBZRLE_PAGE_SIZE, - compressed512, XBZRLE_PAGE_SIZE); - t_end512 = clock(); - float time_val512 = difftime(t_end512, t_start512); - g_assert(dlen512 == 0); - - res->t_raw = time_val; - res->t_512 = time_val512; - - g_free(buffer); - g_free(compressed); - g_free(buffer512); - g_free(compressed512); - -} - -static void test_encode_decode_zero_avx512(void) -{ - int i; - float time_raw = 0.0, time_512 = 0.0; - struct ResTime res; - for (i = 0; i < 10000; i++) { - encode_decode_zero(&res); - time_raw += res.t_raw; - time_512 += res.t_512; - } - printf("Zero test:\n"); - printf("Raw xbzrle_encode time is %f ms\n", time_raw); - printf("512 xbzrle_encode time is %f ms\n", time_512); -} - -static void encode_decode_unchanged(struct ResTime *res) -{ - uint8_t *compressed = g_malloc0(XBZRLE_PAGE_SIZE); - uint8_t *test = g_malloc0(XBZRLE_PAGE_SIZE); - uint8_t *compressed512 = g_malloc0(XBZRLE_PAGE_SIZE); - uint8_t *test512 = g_malloc0(XBZRLE_PAGE_SIZE); - int i = 0; - int dlen = 0, dlen512 = 0; - int diff_len = g_test_rand_int_range(0, XBZRLE_PAGE_SIZE - 1006); - - for (i = diff_len; i > 0; i--) { - test[1000 + i] = i + 4; - test512[1000 + i] = i + 4; - } - - test[1000 + diff_len + 3] = 107; - test[1000 + diff_len + 5] = 109; - - test512[1000 + diff_len + 3] = 107; - test512[1000 + diff_len + 5] = 109; - - /* test unchanged buffer */ - time_t t_start, t_end, t_start512, t_end512; - t_start = clock(); - dlen = xbzrle_encode_buffer(test, test, XBZRLE_PAGE_SIZE, compressed, - XBZRLE_PAGE_SIZE); - t_end = clock(); - float time_val = difftime(t_end, t_start); - g_assert(dlen == 0); - - t_start512 = clock(); - dlen512 = xbzrle_encode_buffer_avx512(test512, test512, XBZRLE_PAGE_SIZE, - compressed512, XBZRLE_PAGE_SIZE); - t_end512 = clock(); - float time_val512 = difftime(t_end512, t_start512); - g_assert(dlen512 == 0); - - res->t_raw = time_val; - res->t_512 = time_val512; - - g_free(test); - 
g_free(compressed); - g_free(test512); - g_free(compressed512); - -} - -static void test_encode_decode_unchanged_avx512(void) -{ - int i; - float time_raw = 0.0, time_512 = 0.0; - struct ResTime res; - for (i = 0; i < 10000; i++) { - encode_decode_unchanged(&res); - time_raw += res.t_raw; - time_512 += res.t_512; - } - printf("Unchanged test:\n"); - printf("Raw xbzrle_encode time is %f ms\n", time_raw); - printf("512 xbzrle_encode time is %f ms\n", time_512); -} - -static void encode_decode_1_byte(struct ResTime *res) -{ - uint8_t *buffer = g_malloc0(XBZRLE_PAGE_SIZE); - uint8_t *test = g_malloc0(XBZRLE_PAGE_SIZE); - uint8_t *compressed = g_malloc(XBZRLE_PAGE_SIZE); - uint8_t *buffer512 = g_malloc0(XBZRLE_PAGE_SIZE); - uint8_t *test512 = g_malloc0(XBZRLE_PAGE_SIZE); - uint8_t *compressed512 = g_malloc(XBZRLE_PAGE_SIZE); - int dlen = 0, rc = 0, dlen512 = 0, rc512 = 0; - uint8_t buf[2]; - uint8_t buf512[2]; - - test[XBZRLE_PAGE_SIZE - 1] = 1; - test512[XBZRLE_PAGE_SIZE - 1] = 1; - - time_t t_start, t_end, t_start512, t_end512; - t_start = clock(); - dlen = xbzrle_encode_buffer(buffer, test, XBZRLE_PAGE_SIZE, compressed, - XBZRLE_PAGE_SIZE); - t_end = clock(); - float time_val = difftime(t_end, t_start); - g_assert(dlen == (uleb128_encode_small(&buf[0], 4095) + 2)); - - rc = xbzrle_decode_buffer(compressed, dlen, buffer, XBZRLE_PAGE_SIZE); - g_assert(rc == XBZRLE_PAGE_SIZE); - g_assert(memcmp(test, buffer, XBZRLE_PAGE_SIZE) == 0); - - t_start512 = clock(); - dlen512 = xbzrle_encode_buffer_avx512(buffer512, test512, XBZRLE_PAGE_SIZE, - compressed512, XBZRLE_PAGE_SIZE); - t_end512 = clock(); - float time_val512 = difftime(t_end512, t_start512); - g_assert(dlen512 == (uleb128_encode_small(&buf512[0], 4095) + 2)); - - rc512 = xbzrle_decode_buffer(compressed512, dlen512, buffer512, - XBZRLE_PAGE_SIZE); - g_assert(rc512 == XBZRLE_PAGE_SIZE); - g_assert(memcmp(test512, buffer512, XBZRLE_PAGE_SIZE) == 0); - - res->t_raw = time_val; - res->t_512 = time_val512; - - g_free(buffer); - g_free(compressed); - g_free(test); - g_free(buffer512); - g_free(compressed512); - g_free(test512); - -} - -static void test_encode_decode_1_byte_avx512(void) -{ - int i; - float time_raw = 0.0, time_512 = 0.0; - struct ResTime res; - for (i = 0; i < 10000; i++) { - encode_decode_1_byte(&res); - time_raw += res.t_raw; - time_512 += res.t_512; - } - printf("1 byte test:\n"); - printf("Raw xbzrle_encode time is %f ms\n", time_raw); - printf("512 xbzrle_encode time is %f ms\n", time_512); -} - -static void encode_decode_overflow(struct ResTime *res) -{ - uint8_t *compressed = g_malloc0(XBZRLE_PAGE_SIZE); - uint8_t *test = g_malloc0(XBZRLE_PAGE_SIZE); - uint8_t *buffer = g_malloc0(XBZRLE_PAGE_SIZE); - uint8_t *compressed512 = g_malloc0(XBZRLE_PAGE_SIZE); - uint8_t *test512 = g_malloc0(XBZRLE_PAGE_SIZE); - uint8_t *buffer512 = g_malloc0(XBZRLE_PAGE_SIZE); - int i = 0, rc = 0, rc512 = 0; - - for (i = 0; i < XBZRLE_PAGE_SIZE / 2 - 1; i++) { - test[i * 2] = 1; - test512[i * 2] = 1; - } - - /* encode overflow */ - time_t t_start, t_end, t_start512, t_end512; - t_start = clock(); - rc = xbzrle_encode_buffer(buffer, test, XBZRLE_PAGE_SIZE, compressed, - XBZRLE_PAGE_SIZE); - t_end = clock(); - float time_val = difftime(t_end, t_start); - g_assert(rc == -1); - - t_start512 = clock(); - rc512 = xbzrle_encode_buffer_avx512(buffer512, test512, XBZRLE_PAGE_SIZE, - compressed512, XBZRLE_PAGE_SIZE); - t_end512 = clock(); - float time_val512 = difftime(t_end512, t_start512); - g_assert(rc512 == -1); - - res->t_raw = time_val; - res->t_512 = 
time_val512; - - g_free(buffer); - g_free(compressed); - g_free(test); - g_free(buffer512); - g_free(compressed512); - g_free(test512); - -} - -static void test_encode_decode_overflow_avx512(void) -{ - int i; - float time_raw = 0.0, time_512 = 0.0; - struct ResTime res; - for (i = 0; i < 10000; i++) { - encode_decode_overflow(&res); - time_raw += res.t_raw; - time_512 += res.t_512; - } - printf("Overflow test:\n"); - printf("Raw xbzrle_encode time is %f ms\n", time_raw); - printf("512 xbzrle_encode time is %f ms\n", time_512); -} - -static void encode_decode_range_avx512(struct ResTime *res) -{ - uint8_t *buffer = g_malloc0(XBZRLE_PAGE_SIZE); - uint8_t *compressed = g_malloc(XBZRLE_PAGE_SIZE); - uint8_t *test = g_malloc0(XBZRLE_PAGE_SIZE); - uint8_t *buffer512 = g_malloc0(XBZRLE_PAGE_SIZE); - uint8_t *compressed512 = g_malloc(XBZRLE_PAGE_SIZE); - uint8_t *test512 = g_malloc0(XBZRLE_PAGE_SIZE); - int i = 0, rc = 0, rc512 = 0; - int dlen = 0, dlen512 = 0; - - int diff_len = g_test_rand_int_range(0, XBZRLE_PAGE_SIZE - 1006); - - for (i = diff_len; i > 0; i--) { - buffer[1000 + i] = i; - test[1000 + i] = i + 4; - buffer512[1000 + i] = i; - test512[1000 + i] = i + 4; - } - - buffer[1000 + diff_len + 3] = 103; - test[1000 + diff_len + 3] = 107; - - buffer[1000 + diff_len + 5] = 105; - test[1000 + diff_len + 5] = 109; - - buffer512[1000 + diff_len + 3] = 103; - test512[1000 + diff_len + 3] = 107; - - buffer512[1000 + diff_len + 5] = 105; - test512[1000 + diff_len + 5] = 109; - - /* test encode/decode */ - time_t t_start, t_end, t_start512, t_end512; - t_start = clock(); - dlen = xbzrle_encode_buffer(test, buffer, XBZRLE_PAGE_SIZE, compressed, - XBZRLE_PAGE_SIZE); - t_end = clock(); - float time_val = difftime(t_end, t_start); - rc = xbzrle_decode_buffer(compressed, dlen, test, XBZRLE_PAGE_SIZE); - g_assert(rc < XBZRLE_PAGE_SIZE); - g_assert(memcmp(test, buffer, XBZRLE_PAGE_SIZE) == 0); - - t_start512 = clock(); - dlen512 = xbzrle_encode_buffer_avx512(test512, buffer512, XBZRLE_PAGE_SIZE, - compressed512, XBZRLE_PAGE_SIZE); - t_end512 = clock(); - float time_val512 = difftime(t_end512, t_start512); - rc512 = xbzrle_decode_buffer(compressed512, dlen512, test512, XBZRLE_PAGE_SIZE); - g_assert(rc512 < XBZRLE_PAGE_SIZE); - g_assert(memcmp(test512, buffer512, XBZRLE_PAGE_SIZE) == 0); - - res->t_raw = time_val; - res->t_512 = time_val512; - - g_free(buffer); - g_free(compressed); - g_free(test); - g_free(buffer512); - g_free(compressed512); - g_free(test512); - -} - -static void test_encode_decode_avx512(void) -{ - int i; - float time_raw = 0.0, time_512 = 0.0; - struct ResTime res; - for (i = 0; i < 10000; i++) { - encode_decode_range_avx512(&res); - time_raw += res.t_raw; - time_512 += res.t_512; - } - printf("Encode decode test:\n"); - printf("Raw xbzrle_encode time is %f ms\n", time_raw); - printf("512 xbzrle_encode time is %f ms\n", time_512); -} - -static void encode_decode_random(struct ResTime *res) -{ - uint8_t *buffer = g_malloc0(XBZRLE_PAGE_SIZE); - uint8_t *compressed = g_malloc(XBZRLE_PAGE_SIZE); - uint8_t *test = g_malloc0(XBZRLE_PAGE_SIZE); - uint8_t *buffer512 = g_malloc0(XBZRLE_PAGE_SIZE); - uint8_t *compressed512 = g_malloc(XBZRLE_PAGE_SIZE); - uint8_t *test512 = g_malloc0(XBZRLE_PAGE_SIZE); - int i = 0, rc = 0, rc512 = 0; - int dlen = 0, dlen512 = 0; - - int diff_len = g_test_rand_int_range(0, XBZRLE_PAGE_SIZE - 1); - /* store the index of diff */ - int dirty_index[diff_len]; - for (int j = 0; j < diff_len; j++) { - dirty_index[j] = g_test_rand_int_range(0, XBZRLE_PAGE_SIZE - 1); - } 
- for (i = diff_len - 1; i >= 0; i--) { - buffer[dirty_index[i]] = i; - test[dirty_index[i]] = i + 4; - buffer512[dirty_index[i]] = i; - test512[dirty_index[i]] = i + 4; - } - - time_t t_start, t_end, t_start512, t_end512; - t_start = clock(); - dlen = xbzrle_encode_buffer(test, buffer, XBZRLE_PAGE_SIZE, compressed, - XBZRLE_PAGE_SIZE); - t_end = clock(); - float time_val = difftime(t_end, t_start); - rc = xbzrle_decode_buffer(compressed, dlen, test, XBZRLE_PAGE_SIZE); - g_assert(rc < XBZRLE_PAGE_SIZE); - - t_start512 = clock(); - dlen512 = xbzrle_encode_buffer_avx512(test512, buffer512, XBZRLE_PAGE_SIZE, - compressed512, XBZRLE_PAGE_SIZE); - t_end512 = clock(); - float time_val512 = difftime(t_end512, t_start512); - rc512 = xbzrle_decode_buffer(compressed512, dlen512, test512, XBZRLE_PAGE_SIZE); - g_assert(rc512 < XBZRLE_PAGE_SIZE); - - res->t_raw = time_val; - res->t_512 = time_val512; - - g_free(buffer); - g_free(compressed); - g_free(test); - g_free(buffer512); - g_free(compressed512); - g_free(test512); - -} - -static void test_encode_decode_random_avx512(void) -{ - int i; - float time_raw = 0.0, time_512 = 0.0; - struct ResTime res; - for (i = 0; i < 10000; i++) { - encode_decode_random(&res); - time_raw += res.t_raw; - time_512 += res.t_512; - } - printf("Random test:\n"); - printf("Raw xbzrle_encode time is %f ms\n", time_raw); - printf("512 xbzrle_encode time is %f ms\n", time_512); -} -#endif - -int main(int argc, char **argv) -{ - g_test_init(&argc, &argv, NULL); - g_test_rand_int(); - #if defined(CONFIG_AVX512BW_OPT) - if (likely(is_cpu_support_avx512bw)) { - g_test_add_func("/xbzrle/encode_decode_zero", test_encode_decode_zero_avx512); - g_test_add_func("/xbzrle/encode_decode_unchanged", - test_encode_decode_unchanged_avx512); - g_test_add_func("/xbzrle/encode_decode_1_byte", test_encode_decode_1_byte_avx512); - g_test_add_func("/xbzrle/encode_decode_overflow", - test_encode_decode_overflow_avx512); - g_test_add_func("/xbzrle/encode_decode", test_encode_decode_avx512); - g_test_add_func("/xbzrle/encode_decode_random", test_encode_decode_random_avx512); - } - #endif - return g_test_run(); -} diff --git a/tests/unit/test-xbzrle.c b/tests/unit/test-xbzrle.c index 547046d093..b6996de69a 100644 --- a/tests/unit/test-xbzrle.c +++ b/tests/unit/test-xbzrle.c @@ -16,35 +16,6 @@ #define XBZRLE_PAGE_SIZE 4096 -int (*xbzrle_encode_buffer_func)(uint8_t *, uint8_t *, int, - uint8_t *, int) = xbzrle_encode_buffer; -#if defined(CONFIG_AVX512BW_OPT) -#include "qemu/cpuid.h" -static void __attribute__((constructor)) init_cpu_flag(void) -{ - unsigned max = __get_cpuid_max(0, NULL); - int a, b, c, d; - if (max >= 1) { - __cpuid(1, a, b, c, d); - /* We must check that AVX is not just available, but usable. 
*/ - if ((c & bit_OSXSAVE) && (c & bit_AVX) && max >= 7) { - int bv; - __asm("xgetbv" : "=a"(bv), "=d"(d) : "c"(0)); - __cpuid_count(7, 0, a, b, c, d); - /* 0xe6: - * XCR0[7:5] = 111b (OPMASK state, upper 256-bit of ZMM0-ZMM15 - * and ZMM16-ZMM31 state are enabled by OS) - * XCR0[2:1] = 11b (XMM state and YMM state are enabled by OS) - */ - if ((bv & 0xe6) == 0xe6 && (b & bit_AVX512BW)) { - xbzrle_encode_buffer_func = xbzrle_encode_buffer_avx512; - } - } - } - return ; -} -#endif - static void test_uleb(void) { uint32_t i, val; @@ -83,8 +54,8 @@ static void test_encode_decode_zero(void) buffer[1000 + diff_len + 5] = 105; /* encode zero page */ - dlen = xbzrle_encode_buffer_func(buffer, buffer, XBZRLE_PAGE_SIZE, compressed, - XBZRLE_PAGE_SIZE); + dlen = xbzrle_encode_buffer(buffer, buffer, XBZRLE_PAGE_SIZE, + compressed, XBZRLE_PAGE_SIZE); g_assert(dlen == 0); g_free(buffer); @@ -107,8 +78,8 @@ static void test_encode_decode_unchanged(void) test[1000 + diff_len + 5] = 109; /* test unchanged buffer */ - dlen = xbzrle_encode_buffer_func(test, test, XBZRLE_PAGE_SIZE, compressed, - XBZRLE_PAGE_SIZE); + dlen = xbzrle_encode_buffer(test, test, XBZRLE_PAGE_SIZE, + compressed, XBZRLE_PAGE_SIZE); g_assert(dlen == 0); g_free(test); @@ -125,8 +96,8 @@ static void test_encode_decode_1_byte(void) test[XBZRLE_PAGE_SIZE - 1] = 1; - dlen = xbzrle_encode_buffer_func(buffer, test, XBZRLE_PAGE_SIZE, compressed, - XBZRLE_PAGE_SIZE); + dlen = xbzrle_encode_buffer(buffer, test, XBZRLE_PAGE_SIZE, + compressed, XBZRLE_PAGE_SIZE); g_assert(dlen == (uleb128_encode_small(&buf[0], 4095) + 2)); rc = xbzrle_decode_buffer(compressed, dlen, buffer, XBZRLE_PAGE_SIZE); @@ -150,8 +121,8 @@ static void test_encode_decode_overflow(void) } /* encode overflow */ - rc = xbzrle_encode_buffer_func(buffer, test, XBZRLE_PAGE_SIZE, compressed, - XBZRLE_PAGE_SIZE); + rc = xbzrle_encode_buffer(buffer, test, XBZRLE_PAGE_SIZE, + compressed, XBZRLE_PAGE_SIZE); g_assert(rc == -1); g_free(buffer); @@ -181,8 +152,8 @@ static void encode_decode_range(void) test[1000 + diff_len + 5] = 109; /* test encode/decode */ - dlen = xbzrle_encode_buffer_func(test, buffer, XBZRLE_PAGE_SIZE, compressed, - XBZRLE_PAGE_SIZE); + dlen = xbzrle_encode_buffer(test, buffer, XBZRLE_PAGE_SIZE, + compressed, XBZRLE_PAGE_SIZE); rc = xbzrle_decode_buffer(compressed, dlen, test, XBZRLE_PAGE_SIZE); g_assert(rc < XBZRLE_PAGE_SIZE); diff --git a/tests/bench/meson.build b/tests/bench/meson.build index 4e6b469066..3c799dbd98 100644 --- a/tests/bench/meson.build +++ b/tests/bench/meson.build @@ -3,12 +3,6 @@ qht_bench = executable('qht-bench', sources: 'qht-bench.c', dependencies: [qemuutil]) -if have_system -xbzrle_bench = executable('xbzrle-bench', - sources: 'xbzrle-bench.c', - dependencies: [qemuutil,migration]) -endif - qtree_bench = executable('qtree-bench', sources: 'qtree-bench.c', dependencies: [qemuutil]) -- 2.34.1 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH 7/9] migration/xbzrle: Use i386 cacheinfo.h 2023-05-18 4:40 ` [PATCH 7/9] migration/xbzrle: Use i386 cacheinfo.h Richard Henderson @ 2023-05-18 9:44 ` Juan Quintela 0 siblings, 0 replies; 21+ messages in thread From: Juan Quintela @ 2023-05-18 9:44 UTC (permalink / raw) To: Richard Henderson; +Cc: qemu-devel, peter.maydell, Peter Xu, Leonardo Bras Richard Henderson <richard.henderson@linaro.org> wrote: > Perform the function selection once, and only if CONFIG_AVX512_OPT > is enabled. Centralize the selection to xbzrle.c, instead of > spreading the init across 3 files. > > Remove xbzrle-bench.c. The benefit of being able to benchmark > the different implementations is less important than not peeking into > the internals of the implementation. Agreed. If AVX512 is not better than a plain C implementation, better to not have it O:-) > Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Not queued. It needs your other patches; I think it is better that the whole series goes through your tree. > +static void __attribute__((constructor)) init_accel(void) > +{ > + unsigned info = cpuinfo_init(); > + if (info & CPUINFO_AVX512BW) { > + accel_func = xbzrle_encode_buffer_avx512; > + } else { > + accel_func = xbzrle_encode_buffer_int; > + } > +} Wow. Compare it with the previous implementation, which did it by hand with asm. ^ permalink raw reply [flat|nested] 21+ messages in thread
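The hunk quoted above only covers the selection. For readers without the full patch in front of them, the sketch below illustrates how such a constructor-selected function pointer is typically consumed, with the public xbzrle_encode_buffer() forwarding every call to it. It is an illustration rather than the verbatim patch hunk: the parameter names and the two prototypes are assumptions, while accel_func, cpuinfo_init(), CPUINFO_AVX512BW, xbzrle_encode_buffer_int and xbzrle_encode_buffer_avx512 are the identifiers quoted in the review.

    /*
     * Sketch of the dispatch pattern in migration/xbzrle.c (illustrative,
     * not the exact patch hunk). Prototypes below are assumed here so the
     * sketch stands alone.
     */
    #include "qemu/osdep.h"
    #include "xbzrle.h"
    #include "cpuinfo.h"

    int xbzrle_encode_buffer_int(uint8_t *, uint8_t *, int, uint8_t *, int);
    int xbzrle_encode_buffer_avx512(uint8_t *, uint8_t *, int, uint8_t *, int);

    static int (*accel_func)(uint8_t *, uint8_t *, int, uint8_t *, int);

    static void __attribute__((constructor)) init_accel(void)
    {
        /* Selected once at startup, based on the detected host features. */
        unsigned info = cpuinfo_init();
        accel_func = (info & CPUINFO_AVX512BW) ? xbzrle_encode_buffer_avx512
                                               : xbzrle_encode_buffer_int;
    }

    int xbzrle_encode_buffer(uint8_t *old_buf, uint8_t *new_buf, int slen,
                             uint8_t *dst, int dlen)
    {
        /* Callers never see which implementation was chosen. */
        return accel_func(old_buf, new_buf, slen, dst, dlen);
    }

With the selection centralized like this, the unit tests can call xbzrle_encode_buffer() directly and still exercise whichever implementation the host supports, which is why the function-pointer plumbing disappears from test-xbzrle.c above.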
* [PATCH 8/9] migration: Build migration_files once 2023-05-18 4:40 [PATCH 0/9] Host-specific includes, begin cpuinfo.h Richard Henderson ` (6 preceding siblings ...) 2023-05-18 4:40 ` [PATCH 7/9] migration/xbzrle: Use i386 cacheinfo.h Richard Henderson @ 2023-05-18 4:40 ` Richard Henderson 2023-05-18 9:20 ` Juan Quintela 2023-05-18 4:40 ` [PATCH 9/9] util: Add cpuinfo-aarch64.c Richard Henderson 8 siblings, 1 reply; 21+ messages in thread From: Richard Henderson @ 2023-05-18 4:40 UTC (permalink / raw) To: qemu-devel; +Cc: peter.maydell, Juan Quintela, Peter Xu, Leonardo Bras The items in migration_files are built for libmigration and included into softmmu_ss from there; no need to also include them directly. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> --- Cc: Juan Quintela <quintela@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Leonardo Bras <leobras@redhat.com> --- migration/meson.build | 1 - 1 file changed, 1 deletion(-) diff --git a/migration/meson.build b/migration/meson.build index dc8b1daef5..21ac014496 100644 --- a/migration/meson.build +++ b/migration/meson.build @@ -7,7 +7,6 @@ migration_files = files( 'qemu-file.c', 'yank_functions.c', ) -softmmu_ss.add(migration_files) softmmu_ss.add(files( 'block-dirty-bitmap.c', -- 2.34.1 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH 8/9] migration: Build migration_files once 2023-05-18 4:40 ` [PATCH 8/9] migration: Build migration_files once Richard Henderson @ 2023-05-18 9:20 ` Juan Quintela 0 siblings, 0 replies; 21+ messages in thread From: Juan Quintela @ 2023-05-18 9:20 UTC (permalink / raw) To: Richard Henderson; +Cc: qemu-devel, peter.maydell, Peter Xu, Leonardo Bras Richard Henderson <richard.henderson@linaro.org> wrote: > The items in migration_files are built for libmigration and included > into softmmu_ss from there; no need to also include them directly. > > Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Ouch. Good catch. Reviewed-by: Juan Quintela <quintela@redhat.com> queued. ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 9/9] util: Add cpuinfo-aarch64.c 2023-05-18 4:40 [PATCH 0/9] Host-specific includes, begin cpuinfo.h Richard Henderson ` (7 preceding siblings ...) 2023-05-18 4:40 ` [PATCH 8/9] migration: Build migration_files once Richard Henderson @ 2023-05-18 4:40 ` Richard Henderson 2023-05-18 15:55 ` Peter Maydell 8 siblings, 1 reply; 21+ messages in thread From: Richard Henderson @ 2023-05-18 4:40 UTC (permalink / raw) To: qemu-devel; +Cc: peter.maydell Move the code from tcg/. The only use of these bits so far is with respect to the atomicity of tcg operations. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> --- include/host/aarch64/cpuinfo.h | 22 +++++++++++ tcg/aarch64/tcg-target.h | 4 +- util/cpuinfo-aarch64.c | 67 ++++++++++++++++++++++++++++++++++ tcg/aarch64/tcg-target.c.inc | 41 +-------------------- util/meson.build | 4 +- 5 files changed, 95 insertions(+), 43 deletions(-) create mode 100644 include/host/aarch64/cpuinfo.h create mode 100644 util/cpuinfo-aarch64.c diff --git a/include/host/aarch64/cpuinfo.h b/include/host/aarch64/cpuinfo.h new file mode 100644 index 0000000000..82227890b4 --- /dev/null +++ b/include/host/aarch64/cpuinfo.h @@ -0,0 +1,22 @@ +/* + * SPDX-License-Identifier: GPL-2.0-or-later + * Host specific cpu indentification for AArch64. + */ + +#ifndef HOST_CPUINFO_H +#define HOST_CPUINFO_H + +#define CPUINFO_ALWAYS (1u << 0) /* so cpuinfo is nonzero */ +#define CPUINFO_LSE (1u << 1) +#define CPUINFO_LSE2 (1u << 2) + +/* Initialized with a constructor. */ +extern unsigned cpuinfo; + +/* + * We cannot rely on constructor ordering, so other constructors must + * use the function interface rather than the variable above. + */ +unsigned cpuinfo_init(void); + +#endif /* HOST_CPUINFO_H */ diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h index 74ee2ed255..b6ff440e15 100644 --- a/tcg/aarch64/tcg-target.h +++ b/tcg/aarch64/tcg-target.h @@ -57,8 +57,8 @@ typedef enum { #define TCG_TARGET_CALL_ARG_I128 TCG_CALL_ARG_EVEN #define TCG_TARGET_CALL_RET_I128 TCG_CALL_RET_NORMAL -extern bool have_lse; -extern bool have_lse2; +#define have_lse (cpuinfo & CPUINFO_LSE) +#define have_lse2 (cpuinfo & CPUINFO_LSE2) /* optional instructions */ #define TCG_TARGET_HAS_div_i32 1 diff --git a/util/cpuinfo-aarch64.c b/util/cpuinfo-aarch64.c new file mode 100644 index 0000000000..a774fb170f --- /dev/null +++ b/util/cpuinfo-aarch64.c @@ -0,0 +1,67 @@ +/* + * SPDX-License-Identifier: GPL-2.0-or-later + * Host specific cpu indentification for AArch64. + */ + +#include "qemu/osdep.h" +#include "cpuinfo.h" + +#ifdef CONFIG_LINUX +# ifdef CONFIG_GETAUXVAL +# include <sys/auxv.h> +# else +# include <asm/hwcap.h> +# include "elf.h" +# endif +#endif +#ifdef CONFIG_DARWIN +# include <sys/sysctl.h> +#endif + +unsigned cpuinfo; + +#ifdef CONFIG_DARWIN +static bool sysctl_for_bool(const char *name) +{ + int val = 0; + size_t len = sizeof(val); + + if (sysctlbyname(name, &val, &len, NULL, 0) == 0) { + return val != 0; + } + + /* + * We might in the future ask for properties not present in older kernels, + * but we're only asking about static properties, all of which should be + * 'int'. So we shouln't see ENOMEM (val too small), or any of the other + * more exotic errors. + */ + assert(errno == ENOENT); + return false; +} +#endif + +/* Called both as constructor and (possibly) via other constructors. 
*/ +unsigned __attribute__((constructor)) cpuinfo_init(void) +{ + unsigned info = cpuinfo; + + if (info) { + return info; + } + + info = CPUINFO_ALWAYS; + +#ifdef CONFIG_LINUX + unsigned long hwcap = qemu_getauxval(AT_HWCAP); + info |= (hwcap & HWCAP_ATOMICS ? CPUINFO_LSE : 0); + info |= (hwcap & HWCAP_USCAT ? CPUINFO_LSE2 : 0); +#endif +#ifdef CONFIG_DARWIN + info |= sysctl_for_bool("hw.optional.arm.FEAT_LSE") * CPUINFO_LSE; + info |= sysctl_for_bool("hw.optional.arm.FEAT_LSE2") * CPUINFO_LSE2; +#endif + + cpuinfo = info; + return info; +} diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc index bc6b99a1bd..1e5ffb7a49 100644 --- a/tcg/aarch64/tcg-target.c.inc +++ b/tcg/aarch64/tcg-target.c.inc @@ -13,12 +13,7 @@ #include "../tcg-ldst.c.inc" #include "../tcg-pool.c.inc" #include "qemu/bitops.h" -#ifdef __linux__ -#include <asm/hwcap.h> -#endif -#ifdef CONFIG_DARWIN -#include <sys/sysctl.h> -#endif +#include "cpuinfo.h" /* We're going to re-use TCGType in setting of the SF bit, which controls the size of the operation performed. If we know the values match, it @@ -77,9 +72,6 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot) return TCG_REG_X0 + slot; } -bool have_lse; -bool have_lse2; - #define TCG_REG_TMP TCG_REG_X30 #define TCG_VEC_TMP TCG_REG_V31 @@ -2878,39 +2870,8 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op) } } -#ifdef CONFIG_DARWIN -static bool sysctl_for_bool(const char *name) -{ - int val = 0; - size_t len = sizeof(val); - - if (sysctlbyname(name, &val, &len, NULL, 0) == 0) { - return val != 0; - } - - /* - * We might in the future ask for properties not present in older kernels, - * but we're only asking about static properties, all of which should be - * 'int'. So we shouln't see ENOMEM (val too small), or any of the other - * more exotic errors. - */ - assert(errno == ENOENT); - return false; -} -#endif - static void tcg_target_init(TCGContext *s) { -#ifdef __linux__ - unsigned long hwcap = qemu_getauxval(AT_HWCAP); - have_lse = hwcap & HWCAP_ATOMICS; - have_lse2 = hwcap & HWCAP_USCAT; -#endif -#ifdef CONFIG_DARWIN - have_lse = sysctl_for_bool("hw.optional.arm.FEAT_LSE"); - have_lse2 = sysctl_for_bool("hw.optional.arm.FEAT_LSE2"); -#endif - tcg_target_available_regs[TCG_TYPE_I32] = 0xffffffffu; tcg_target_available_regs[TCG_TYPE_I64] = 0xffffffffu; tcg_target_available_regs[TCG_TYPE_V64] = 0xffffffff00000000ull; diff --git a/util/meson.build b/util/meson.build index 714c783b4c..c43b910aa7 100644 --- a/util/meson.build +++ b/util/meson.build @@ -107,6 +107,8 @@ if have_block util_ss.add(when: 'CONFIG_LINUX', if_true: files('vfio-helpers.c')) endif -if cpu in ['x86', 'x86_64'] +if cpu == 'aarch64' + util_ss.add(files('cpuinfo-aarch64.c')) +elif cpu in ['x86', 'x86_64'] util_ss.add(files('cpuinfo-i386.c')) endif -- 2.34.1 ^ permalink raw reply related [flat|nested] 21+ messages in thread
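A note on the constructor-ordering comment in the new header: because C does not define the order in which constructors from different translation units run, any other constructor that needs the feature bits should call cpuinfo_init() rather than read the cpuinfo variable, and cpuinfo_init() returns early once the value is populated, so repeated calls are cheap. The sketch below shows a hypothetical consumer; the flag and function names are invented for illustration and are not part of the series.

    /*
     * Hypothetical consumer in another translation unit (illustrative only;
     * the flag and function names are invented).
     */
    #include "qemu/osdep.h"
    #include "cpuinfo.h"

    static bool use_lse2_fast_path;

    static void __attribute__((constructor)) init_fast_path(void)
    {
        /*
         * Do not read 'cpuinfo' directly here: this constructor may run
         * before the one in cpuinfo-aarch64.c. cpuinfo_init() is safe to
         * call because it fills in the value on first use.
         */
        unsigned info = cpuinfo_init();
        use_lse2_fast_path = (info & CPUINFO_LSE2) != 0;
    }

The have_lse/have_lse2 macros in tcg-target.h, by contrast, can read the variable directly, since tcg_target_init() runs from normal startup code after all constructors have completed.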
* Re: [PATCH 9/9] util: Add cpuinfo-aarch64.c 2023-05-18 4:40 ` [PATCH 9/9] util: Add cpuinfo-aarch64.c Richard Henderson @ 2023-05-18 15:55 ` Peter Maydell 0 siblings, 0 replies; 21+ messages in thread From: Peter Maydell @ 2023-05-18 15:55 UTC (permalink / raw) To: Richard Henderson; +Cc: qemu-devel On Thu, 18 May 2023 at 05:41, Richard Henderson <richard.henderson@linaro.org> wrote: > > Move the code from tcg/. The only use of these bits so far > is with respect to the atomicity of tcg operations. > > Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> thanks -- PMM ^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread

Thread overview: 21+ messages (newest: 2023-05-18 15:56 UTC):
2023-05-18 4:40 [PATCH 0/9] Host-specific includes, begin cpuinfo.h Richard Henderson
2023-05-18 4:40 ` [PATCH 1/9] util: Introduce host-specific cpuinfo.h Richard Henderson
2023-05-18 9:30 ` Juan Quintela
2023-05-18 4:40 ` [PATCH 2/9] util: Add cpuinfo-i386.c Richard Henderson
2023-05-18 9:35 ` Juan Quintela
2023-05-18 12:45 ` Richard Henderson
2023-05-18 4:40 ` [PATCH 3/9] util: Add i386 CPUINFO_ATOMIC_VMOVDQU Richard Henderson
2023-05-18 15:52 ` Peter Maydell
2023-05-18 4:40 ` [PATCH 4/9] tcg/i386: Use cpuinfo.h Richard Henderson
2023-05-18 15:53 ` Peter Maydell
2023-05-18 4:40 ` [PATCH 5/9] util/bufferiszero: Use i386 cpuinfo.h Richard Henderson
2023-05-18 9:49 ` Juan Quintela
2023-05-18 12:48 ` Richard Henderson
2023-05-18 4:40 ` [PATCH 6/9] migration/xbzrle: Shuffle function order Richard Henderson
2023-05-18 9:19 ` Juan Quintela
2023-05-18 4:40 ` [PATCH 7/9] migration/xbzrle: Use i386 cacheinfo.h Richard Henderson
2023-05-18 9:44 ` Juan Quintela
2023-05-18 4:40 ` [PATCH 8/9] migration: Build migration_files once Richard Henderson
2023-05-18 9:20 ` Juan Quintela
2023-05-18 4:40 ` [PATCH 9/9] util: Add cpuinfo-aarch64.c Richard Henderson
2023-05-18 15:55 ` Peter Maydell