* [PULL v2 00/61] Misc patches for soft freeze @ 2020-03-16 22:06 Paolo Bonzini 2020-03-16 22:06 ` [PULL 06/61] util: add util function buffer_zero_avx512() Paolo Bonzini 2020-03-17 11:03 ` [PULL v2 00/61] Misc patches for soft freeze Peter Maydell 0 siblings, 2 replies; 9+ messages in thread From: Paolo Bonzini @ 2020-03-16 22:06 UTC (permalink / raw) To: qemu-devel The following changes since commit a98135f727595382e200d04c2996e868b7925a01: Merge remote-tracking branch 'remotes/kraxel/tags/vga-20200316-pull-request' into staging (2020-03-16 14:55:59 +0000) are available in the git repository at: git://github.com/bonzini/qemu.git tags/for-upstream for you to fetch changes up to 9d04fea181318684a899fadd99cef7e04097456b: hw/arm: Let devices own the MemoryRegion they create (2020-03-16 23:02:30 +0100) ---------------------------------------------------------------- * Bugfixes all over the place * get/set_uint cleanups (Felipe) * Lock guard support (Stefan) * MemoryRegion ownership cleanup (Philippe) * AVX512 optimization for buffer_is_zero (Robert) ---------------------------------------------------------------- v1->v2: fix for clang build Christian Ehrhardt (1): modules: load modules from versioned /var/run dir Christophe de Dinechin (1): scsi/qemu-pr-helper: Fix out-of-bounds access to trnptid_list[] Colin Xu (1): MAINTAINERS: Add entry for Guest X86 HAXM CPUs Dr. David Alan Gilbert (1): exec/rom_reset: Free rom data during inmigrate skip Eduardo Habkost (1): Use -isystem for linux-headers dir Felipe Franciosi (4): qom/object: enable setter for uint types ich9: fix getter type for sci_int property ich9: Simplify ich9_lpc_initfn qom/object: Use common get/set uint helpers Jan Kiszka (1): hw/i386/intel_iommu: Fix out-of-bounds access on guest IRT Joe Richey (1): optionrom/pvh: scan entire RSDP Area Julio Faracco (1): i386: Fix GCC warning with snprintf when HAX is enabled Kashyap Chamarthy (1): qemu-cpu-models.rst: Document -noTSX, mds-no, taa-no, and tsx-ctrl Longpeng (Mike) (1): cpus: avoid pause_all_vcpus getting stuck due to race Marc-André Lureau (1): build-sys: do not make qemu-ga link with pixman Matt Borgerson (1): memory: Fix start offset for bitmap log_clear hook Paolo Bonzini (1): oslib-posix: initialize mutex and condition variable Peter Maydell (1): softmmu/vl.c: Handle '-cpu help' and '-device help' before 'no default machine' Philippe Mathieu-Daudé (36): misc: Replace zero-length arrays with flexible array member (automatic) misc: Replace zero-length arrays with flexible array member (manual) configure: Fix building with SASL on Windows tests/docker: Install SASL library to extend code coverage on amd64 Makefile: Align 'help' target output Makefile: Let the 'help' target list the tools targets hw/audio/fmopl: Move ENV_CURVE to .heap to save 32KiB of .bss hw/audio/intel-hda: Use memory region alias to reduce .rodata by 4.34MB hw/usb/quirks: Use smaller types to reduce .rodata by 10KiB ui/curses: Make control_characters[] array const ui/curses: Move arrays to .heap to save 74KiB of .bss memory: Correctly return alias region type memory: Simplify memory_region_init_rom_nomigrate() to ease review scripts/cocci: Rename memory-region-{init-ram -> housekeeping} scripts/cocci: Patch to replace memory_region_init_{ram,readonly -> rom} hw/arm: Use memory_region_init_rom() with read-only regions hw/display: Use memory_region_init_rom() with read-only regions hw/m68k: Use memory_region_init_rom() with read-only regions hw/net: Use memory_region_init_rom() with read-only regions hw/pci-host: Use memory_region_init_rom() with read-only regions hw/ppc: Use memory_region_init_rom() with read-only regions hw/riscv: Use memory_region_init_rom() with read-only regions hw/sh4: Use memory_region_init_rom() with read-only regions hw/sparc: Use memory_region_init_rom() with read-only regions scripts/cocci: Patch to detect potential use of memory_region_init_rom scripts/cocci: Patch to remove unnecessary memory_region_set_readonly() scripts/cocci: Patch to let devices own their MemoryRegions hw/core: Let devices own the MemoryRegion they create hw/display: Let devices own the MemoryRegion they create hw/dma: Let devices own the MemoryRegion they create hw/riscv: Let devices own the MemoryRegion they create hw/char: Let devices own the MemoryRegion they create hw/arm/stm32: Use memory_region_init_rom() with read-only regions hw/ppc/ppc405: Use memory_region_init_rom() with read-only regions hw/arm: Remove unnecessary memory_region_set_readonly() on ROM alias hw/arm: Let devices own the MemoryRegion they create Robert Hoo (2): configure: add configure option avx512f_opt util: add util function buffer_zero_avx512() Stefan Hajnoczi (2): lockable: add lock guards lockable: add QemuRecMutex support Sunil Muthuswamy (3): WHPX: TSC get and set should be dependent on VM state WHPX: Use QEMU values for trapped CPUID WHPX: Use proper synchronization primitives while processing MAINTAINERS | 12 ++ Makefile | 49 +++-- Makefile.target | 2 +- block/linux-aio.c | 2 +- block/vmdk.c | 2 +- bsd-user/qemu.h | 2 +- configure | 62 +++++- contrib/libvhost-user/libvhost-user.h | 2 +- contrib/vhost-user-gpu/Makefile.objs | 6 +- .../vhost-user-gpu/{main.c => vhost-user-gpu.c} | 0 cpus.c | 6 +- docs/interop/vhost-user.rst | 4 +- docs/system/cpu-models-x86.rst.inc | 57 +++++- exec.c | 9 +- hw/acpi/ich9.c | 99 +--------- hw/acpi/nvdimm.c | 6 +- hw/acpi/pcihp.c | 7 +- hw/acpi/piix4.c | 12 +- hw/arm/exynos4210.c | 14 +- hw/arm/fsl-imx25.c | 10 +- hw/arm/fsl-imx31.c | 6 +- hw/arm/fsl-imx6.c | 6 +- hw/arm/fsl-imx6ul.c | 9 +- hw/arm/mainstone.c | 3 +- hw/arm/msf2-soc.c | 6 +- hw/arm/nrf51_soc.c | 2 +- hw/arm/omap_sx1.c | 6 +- hw/arm/palm.c | 3 +- hw/arm/spitz.c | 3 +- hw/arm/stellaris.c | 3 +- hw/arm/stm32f205_soc.c | 11 +- hw/arm/stm32f405_soc.c | 12 +- hw/arm/tosa.c | 3 +- hw/arm/xlnx-zynqmp.c | 11 +- hw/audio/fmopl.c | 4 +- hw/audio/intel-hda.c | 24 +-- hw/char/sclpconsole-lm.c | 2 +- hw/char/sclpconsole.c | 2 +- hw/char/serial.c | 7 +- hw/core/loader.c | 25 ++- hw/core/platform-bus.c | 3 +- hw/display/cg3.c | 5 +- hw/display/g364fb.c | 3 +- hw/display/macfb.c | 4 +- hw/display/tcx.c | 5 +- hw/dma/i8257.c | 2 +- hw/dma/rc4030.c | 4 +- hw/dma/soc_dma.c | 2 +- hw/i386/intel_iommu.c | 6 + hw/i386/x86.c | 2 +- hw/isa/lpc_ich9.c | 27 +-- hw/m68k/bootinfo.h | 2 +- hw/m68k/q800.c | 3 +- hw/misc/edu.c | 13 +- hw/misc/omap_l4.c | 2 +- hw/net/dp8393x.c | 5 +- hw/nvram/eeprom93xx.c | 2 +- hw/pci-host/prep.c | 5 +- hw/pci-host/q35.c | 14 +- hw/ppc/mac_newworld.c | 3 +- hw/ppc/mac_oldworld.c | 3 +- hw/ppc/ppc405_boards.c | 6 +- hw/ppc/spapr.c | 36 +--- hw/ppc/spapr_drc.c | 3 +- hw/rdma/vmw/pvrdma_qp_ops.c | 4 +- hw/riscv/sifive_e.c | 9 +- hw/riscv/sifive_u.c | 2 +- hw/s390x/virtio-ccw.c | 2 +- hw/sh4/shix.c | 3 +- hw/sparc/leon3.c | 3 +- hw/usb/dev-network.c | 2 +- hw/usb/dev-smartcard-reader.c | 4 +- hw/usb/quirks.c | 4 +- hw/usb/quirks.h | 22 ++- hw/virtio/virtio.c | 4 +- hw/xen/xen_pt.h | 2 +- include/hw/acpi/acpi-defs.h | 16 +- include/hw/arm/smmu-common.h | 2 +- include/hw/boards.h | 2 +- include/hw/i386/intel_iommu.h | 3 +- include/hw/s390x/event-facility.h | 2 +- include/hw/s390x/sclp.h | 8 +- include/hw/virtio/virtio-iommu.h | 2 +- include/qemu/cpuid.h | 3 + include/qemu/lockable.h | 67 +++++++ include/qom/object.h | 48 ++++- include/sysemu/cryptodev.h | 2 +- include/sysemu/whpx.h | 7 + include/tcg/tcg.h | 2 +- memory.c | 31 +-- net/queue.c | 2 +- pc-bios/optionrom/pvh_main.c | 2 +- pc-bios/s390-ccw/bootmap.h | 2 +- pc-bios/s390-ccw/sclp.h | 2 +- plugins/core.c | 7 +- plugins/loader.c | 16 +- qom/object.c | 212 ++++++++++++++++++--- .../coccinelle/memory-region-housekeeping.cocci | 159 ++++++++++++++++ scripts/coccinelle/memory-region-init-ram.cocci | 38 ---- scsi/qemu-pr-helper.c | 17 +- softmmu/vl.c | 26 ++- target/arm/cpu.c | 22 +-- target/i386/hax-posix.c | 33 +--- target/i386/hax-windows.c | 33 +--- target/i386/sev.c | 106 +---------- target/i386/whp-dispatch.h | 9 + target/i386/whpx-all.c | 162 +++++++++++----- target/s390x/ioinst.c | 2 +- tests/docker/dockerfiles/debian-amd64.docker | 1 + tests/qtest/libqos/ahci.h | 2 +- ui/console.c | 4 +- ui/curses.c | 10 +- util/bufferiszero.c | 71 ++++++- util/module.c | 14 ++ util/oslib-posix.c | 7 + util/qemu-timer.c | 23 ++- 116 files changed, 1145 insertions(+), 764 deletions(-) rename contrib/vhost-user-gpu/{main.c => vhost-user-gpu.c} (100%) create mode 100644 scripts/coccinelle/memory-region-housekeeping.cocci delete mode 100644 scripts/coccinelle/memory-region-init-ram.cocci -- 1.8.3.1 ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PULL 06/61] util: add util function buffer_zero_avx512() 2020-03-16 22:06 [PULL v2 00/61] Misc patches for soft freeze Paolo Bonzini @ 2020-03-16 22:06 ` Paolo Bonzini 2020-03-17 11:03 ` [PULL v2 00/61] Misc patches for soft freeze Peter Maydell 1 sibling, 0 replies; 9+ messages in thread From: Paolo Bonzini @ 2020-03-16 22:06 UTC (permalink / raw) To: qemu-devel; +Cc: Robert Hoo From: Robert Hoo <robert.hu@linux.intel.com> And intialize buffer_is_zero() with it, when Intel AVX512F is available on host. This function utilizes Intel AVX512 fundamental instructions which is faster than its implementation with AVX2 (in my unit test, with 4K buffer, on CascadeLake SP, ~36% faster, buffer_zero_avx512() V.S. buffer_zero_avx2()). Signed-off-by: Robert Hoo <robert.hu@linux.intel.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- include/qemu/cpuid.h | 3 +++ util/bufferiszero.c | 71 ++++++++++++++++++++++++++++++++++++++++++++-------- 2 files changed, 64 insertions(+), 10 deletions(-) diff --git a/include/qemu/cpuid.h b/include/qemu/cpuid.h index 6930170..09fc245 100644 --- a/include/qemu/cpuid.h +++ b/include/qemu/cpuid.h @@ -45,6 +45,9 @@ #ifndef bit_AVX2 #define bit_AVX2 (1 << 5) #endif +#ifndef bit_AVX512F +#define bit_AVX512F (1 << 16) +#endif #ifndef bit_BMI2 #define bit_BMI2 (1 << 8) #endif diff --git a/util/bufferiszero.c b/util/bufferiszero.c index bfb2605..6639035 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -63,11 +63,11 @@ buffer_zero_int(const void *buf, size_t len) } } -#if defined(CONFIG_AVX2_OPT) || defined(__SSE2__) +#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT) || defined(__SSE2__) /* Do not use push_options pragmas unnecessarily, because clang * does not support them. */ -#ifdef CONFIG_AVX2_OPT +#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT) #pragma GCC push_options #pragma GCC target("sse2") #endif @@ -104,7 +104,7 @@ buffer_zero_sse2(const void *buf, size_t len) return _mm_movemask_epi8(_mm_cmpeq_epi8(t, zero)) == 0xFFFF; } -#ifdef CONFIG_AVX2_OPT +#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT) #pragma GCC pop_options #endif @@ -187,18 +187,54 @@ buffer_zero_avx2(const void *buf, size_t len) #pragma GCC pop_options #endif /* CONFIG_AVX2_OPT */ +#ifdef CONFIG_AVX512F_OPT +#pragma GCC push_options +#pragma GCC target("avx512f") +#include <immintrin.h> + +static bool +buffer_zero_avx512(const void *buf, size_t len) +{ + /* Begin with an unaligned head of 64 bytes. */ + __m512i t = _mm512_loadu_si512(buf); + __m512i *p = (__m512i *)(((uintptr_t)buf + 5 * 64) & -64); + __m512i *e = (__m512i *)(((uintptr_t)buf + len) & -64); + + /* Loop over 64-byte aligned blocks of 256. */ + while (p <= e) { + __builtin_prefetch(p); + if (unlikely(_mm512_test_epi64_mask(t, t))) { + return false; + } + t = p[-4] | p[-3] | p[-2] | p[-1]; + p += 4; + } + + t |= _mm512_loadu_si512(buf + len - 4 * 64); + t |= _mm512_loadu_si512(buf + len - 3 * 64); + t |= _mm512_loadu_si512(buf + len - 2 * 64); + t |= _mm512_loadu_si512(buf + len - 1 * 64); + + return !_mm512_test_epi64_mask(t, t); + +} +#pragma GCC pop_options +#endif + + /* Note that for test_buffer_is_zero_next_accel, the most preferred * ISA must have the least significant bit. */ -#define CACHE_AVX2 1 -#define CACHE_SSE4 2 -#define CACHE_SSE2 4 +#define CACHE_AVX512F 1 +#define CACHE_AVX2 2 +#define CACHE_SSE4 4 +#define CACHE_SSE2 8 /* Make sure that these variables are appropriately initialized when * SSE2 is enabled on the compiler command-line, but the compiler is * too old to support CONFIG_AVX2_OPT. */ -#ifdef CONFIG_AVX2_OPT +#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT) # define INIT_CACHE 0 # define INIT_ACCEL buffer_zero_int #else @@ -211,6 +247,7 @@ buffer_zero_avx2(const void *buf, size_t len) static unsigned cpuid_cache = INIT_CACHE; static bool (*buffer_accel)(const void *, size_t) = INIT_ACCEL; +static int length_to_accel = 64; static void init_accel(unsigned cache) { @@ -226,10 +263,16 @@ static void init_accel(unsigned cache) fn = buffer_zero_avx2; } #endif +#ifdef CONFIG_AVX512F_OPT + if (cache & CACHE_AVX512F) { + fn = buffer_zero_avx512; + length_to_accel = 256; + } +#endif buffer_accel = fn; } -#ifdef CONFIG_AVX2_OPT +#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT) #include "qemu/cpuid.h" static void __attribute__((constructor)) init_cpuid_cache(void) @@ -252,9 +295,17 @@ static void __attribute__((constructor)) init_cpuid_cache(void) int bv; __asm("xgetbv" : "=a"(bv), "=d"(d) : "c"(0)); __cpuid_count(7, 0, a, b, c, d); - if ((bv & 6) == 6 && (b & bit_AVX2)) { + if ((bv & 0x6) == 0x6 && (b & bit_AVX2)) { cache |= CACHE_AVX2; } + /* 0xe6: + * XCR0[7:5] = 111b (OPMASK state, upper 256-bit of ZMM0-ZMM15 + * and ZMM16-ZMM31 state are enabled by OS) + * XCR0[2:1] = 11b (XMM state and YMM state are enabled by OS) + */ + if ((bv & 0xe6) == 0xe6 && (b & bit_AVX512F)) { + cache |= CACHE_AVX512F; + } } } cpuid_cache = cache; @@ -277,7 +328,7 @@ bool test_buffer_is_zero_next_accel(void) static bool select_accel_fn(const void *buf, size_t len) { - if (likely(len >= 64)) { + if (likely(len >= length_to_accel)) { return buffer_accel(buf, len); } return buffer_zero_int(buf, len); -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PULL v2 00/61] Misc patches for soft freeze 2020-03-16 22:06 [PULL v2 00/61] Misc patches for soft freeze Paolo Bonzini 2020-03-16 22:06 ` [PULL 06/61] util: add util function buffer_zero_avx512() Paolo Bonzini @ 2020-03-17 11:03 ` Peter Maydell 2020-03-17 12:02 ` Philippe Mathieu-Daudé 1 sibling, 1 reply; 9+ messages in thread From: Peter Maydell @ 2020-03-17 11:03 UTC (permalink / raw) To: Paolo Bonzini; +Cc: QEMU Developers On Mon, 16 Mar 2020 at 22:07, Paolo Bonzini <pbonzini@redhat.com> wrote: > > The following changes since commit a98135f727595382e200d04c2996e868b7925a01: > > Merge remote-tracking branch 'remotes/kraxel/tags/vga-20200316-pull-request' into staging (2020-03-16 14:55:59 +0000) > > are available in the git repository at: > > > git://github.com/bonzini/qemu.git tags/for-upstream > > for you to fetch changes up to 9d04fea181318684a899fadd99cef7e04097456b: > > hw/arm: Let devices own the MemoryRegion they create (2020-03-16 23:02:30 +0100) > > ---------------------------------------------------------------- > * Bugfixes all over the place > * get/set_uint cleanups (Felipe) > * Lock guard support (Stefan) > * MemoryRegion ownership cleanup (Philippe) > * AVX512 optimization for buffer_is_zero (Robert) Hi; this generates a new warning on netbsd: /home/qemu/qemu-test.N42OXz/src/util/qemu-timer.c: In function 'timerlist_expired': /home/qemu/qemu-test.N42OXz/src/util/qemu-timer.c:197:12: warning: 'expire_time' may be used uninitialized in this function [-Wmaybe-uninitialized] return expire_time <= qemu_clock_get_ns(timer_list->clock->type); ^ /home/qemu/qemu-test.N42OXz/src/util/qemu-timer.c: In function 'timerlist_deadline_ns': /home/qemu/qemu-test.N42OXz/src/util/qemu-timer.c:235:11: warning: 'expire_time' may be used uninitialized in this function [-Wmaybe-uninitialized] delta = expire_time - qemu_clock_get_ns(timer_list->clock->type); ^ This is probably just the compiler being not smart enough to figure out that there's no code path where it's not initialized. thanks -- PMM ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PULL v2 00/61] Misc patches for soft freeze 2020-03-17 11:03 ` [PULL v2 00/61] Misc patches for soft freeze Peter Maydell @ 2020-03-17 12:02 ` Philippe Mathieu-Daudé 2020-03-17 14:26 ` Stefan Hajnoczi 0 siblings, 1 reply; 9+ messages in thread From: Philippe Mathieu-Daudé @ 2020-03-17 12:02 UTC (permalink / raw) To: Stefan Hajnoczi; +Cc: Peter Maydell, QEMU Developers, Paolo Bonzini Cc'ing Stefan On 3/17/20 12:03 PM, Peter Maydell wrote: > On Mon, 16 Mar 2020 at 22:07, Paolo Bonzini <pbonzini@redhat.com> wrote: >> >> The following changes since commit a98135f727595382e200d04c2996e868b7925a01: >> >> Merge remote-tracking branch 'remotes/kraxel/tags/vga-20200316-pull-request' into staging (2020-03-16 14:55:59 +0000) >> >> are available in the git repository at: >> >> >> git://github.com/bonzini/qemu.git tags/for-upstream >> >> for you to fetch changes up to 9d04fea181318684a899fadd99cef7e04097456b: >> >> hw/arm: Let devices own the MemoryRegion they create (2020-03-16 23:02:30 +0100) >> >> ---------------------------------------------------------------- >> * Bugfixes all over the place >> * get/set_uint cleanups (Felipe) >> * Lock guard support (Stefan) >> * MemoryRegion ownership cleanup (Philippe) >> * AVX512 optimization for buffer_is_zero (Robert) > > Hi; this generates a new warning on netbsd: > > /home/qemu/qemu-test.N42OXz/src/util/qemu-timer.c: In function > 'timerlist_expired': > /home/qemu/qemu-test.N42OXz/src/util/qemu-timer.c:197:12: warning: > 'expire_time' may be used uninitialized in this function > [-Wmaybe-uninitialized] > return expire_time <= qemu_clock_get_ns(timer_list->clock->type); > ^ > /home/qemu/qemu-test.N42OXz/src/util/qemu-timer.c: In function > 'timerlist_deadline_ns': > /home/qemu/qemu-test.N42OXz/src/util/qemu-timer.c:235:11: warning: > 'expire_time' may be used uninitialized in this function > [-Wmaybe-uninitialized] > delta = expire_time - qemu_clock_get_ns(timer_list->clock->type); > ^ > > This is probably just the compiler being not smart enough > to figure out that there's no code path where it's not > initialized. > > thanks > -- PMM > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PULL v2 00/61] Misc patches for soft freeze 2020-03-17 12:02 ` Philippe Mathieu-Daudé @ 2020-03-17 14:26 ` Stefan Hajnoczi 2020-03-17 14:47 ` Paolo Bonzini 0 siblings, 1 reply; 9+ messages in thread From: Stefan Hajnoczi @ 2020-03-17 14:26 UTC (permalink / raw) To: Peter Maydell; +Cc: Paolo Bonzini, philmd, QEMU Developers [-- Attachment #1: Type: text/plain, Size: 2308 bytes --] On Tue, Mar 17, 2020 at 01:02:48PM +0100, Philippe Mathieu-Daudé wrote: > Cc'ing Stefan > > On 3/17/20 12:03 PM, Peter Maydell wrote: > > On Mon, 16 Mar 2020 at 22:07, Paolo Bonzini <pbonzini@redhat.com> wrote: > > > > > > The following changes since commit a98135f727595382e200d04c2996e868b7925a01: > > > > > > Merge remote-tracking branch 'remotes/kraxel/tags/vga-20200316-pull-request' into staging (2020-03-16 14:55:59 +0000) > > > > > > are available in the git repository at: > > > > > > > > > git://github.com/bonzini/qemu.git tags/for-upstream > > > > > > for you to fetch changes up to 9d04fea181318684a899fadd99cef7e04097456b: > > > > > > hw/arm: Let devices own the MemoryRegion they create (2020-03-16 23:02:30 +0100) > > > > > > ---------------------------------------------------------------- > > > * Bugfixes all over the place > > > * get/set_uint cleanups (Felipe) > > > * Lock guard support (Stefan) > > > * MemoryRegion ownership cleanup (Philippe) > > > * AVX512 optimization for buffer_is_zero (Robert) > > > > Hi; this generates a new warning on netbsd: > > > > /home/qemu/qemu-test.N42OXz/src/util/qemu-timer.c: In function > > 'timerlist_expired': > > /home/qemu/qemu-test.N42OXz/src/util/qemu-timer.c:197:12: warning: > > 'expire_time' may be used uninitialized in this function > > [-Wmaybe-uninitialized] > > return expire_time <= qemu_clock_get_ns(timer_list->clock->type); > > ^ > > /home/qemu/qemu-test.N42OXz/src/util/qemu-timer.c: In function > > 'timerlist_deadline_ns': > > /home/qemu/qemu-test.N42OXz/src/util/qemu-timer.c:235:11: warning: > > 'expire_time' may be used uninitialized in this function > > [-Wmaybe-uninitialized] > > delta = expire_time - qemu_clock_get_ns(timer_list->clock->type); > > ^ > > > > This is probably just the compiler being not smart enough > > to figure out that there's no code path where it's not > > initialized. Yes, looks like the compiler can't figure out the control flow on NetBSD. We could drop the WITH_QEMU_LOCK_GUARD() macro and use this idiom instead: { QEMU_LOCK_GUARD(&mutex); ... } But it's unusual for C code to create scopes without a statement (for, if, while). Opinions? Stefan [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PULL v2 00/61] Misc patches for soft freeze 2020-03-17 14:26 ` Stefan Hajnoczi @ 2020-03-17 14:47 ` Paolo Bonzini 2020-03-17 15:42 ` Philippe Mathieu-Daudé 0 siblings, 1 reply; 9+ messages in thread From: Paolo Bonzini @ 2020-03-17 14:47 UTC (permalink / raw) To: Stefan Hajnoczi, Peter Maydell; +Cc: philmd, QEMU Developers [-- Attachment #1.1: Type: text/plain, Size: 3466 bytes --] On 17/03/20 15:26, Stefan Hajnoczi wrote: > Yes, looks like the compiler can't figure out the control flow on > NetBSD. > > We could drop the WITH_QEMU_LOCK_GUARD() macro and use this idiom > instead: > > { > QEMU_LOCK_GUARD(&mutex); > ... > } > > But it's unusual for C code to create scopes without a statement (for, > if, while). After staring at compiler dumps for a while I have just concluded that this could actually be considered a bug in WITH_QEMU_LOCK_GUARD. QEMU_MAKE_LOCKABLE returns NULL if passed a NULL argument. This is the root cause of the NetBSD failure, as the compiler doesn't figure out that &timer_list->active_timers_lock is non-NULL and therefore doesn't simplify the qemu_make_lockable function. But why does that cause an uninitialized variable warning? Because if WITH_QEMU_LOCK_GUARD were passed NULL, it would not execute its body! So I'm going to squash the following in the series, mostly through a new patch "lockable: introduce QEMU_MAKE_LOCKABLE_NONNULL": diff --git a/include/qemu/lockable.h b/include/qemu/lockable.h index 44b3f4b..1aeb2cb 100644 --- a/include/qemu/lockable.h +++ b/include/qemu/lockable.h @@ -67,7 +67,7 @@ qemu_make_lockable(void *x, QemuLockable *lockable) * In C++ it would be different, but then C++ wouldn't need QemuLockable * either... */ -#define QEMU_MAKE_LOCKABLE_(x) qemu_make_lockable((x), &(QemuLockable) { \ +#define QEMU_MAKE_LOCKABLE_(x) (&(QemuLockable) { \ .object = (x), \ .lock = QEMU_LOCK_FUNC(x), \ .unlock = QEMU_UNLOCK_FUNC(x), \ @@ -75,14 +75,27 @@ qemu_make_lockable(void *x, QemuLockable *lockable) /* QEMU_MAKE_LOCKABLE - Make a polymorphic QemuLockable * - * @x: a lock object (currently one of QemuMutex, CoMutex, QemuSpin). + * @x: a lock object (currently one of QemuMutex, QemuRecMutex, CoMutex, QemuSpin). * * Returns a QemuLockable object that can be passed around - * to a function that can operate with locks of any kind. + * to a function that can operate with locks of any kind, or + * NULL if @x is %NULL. */ #define QEMU_MAKE_LOCKABLE(x) \ QEMU_GENERIC(x, \ (QemuLockable *, (x)), \ + qemu_make_lockable((x), QEMU_MAKE_LOCKABLE_(x))) + +/* QEMU_MAKE_LOCKABLE_NONNULL - Make a polymorphic QemuLockable + * + * @x: a lock object (currently one of QemuMutex, QemuRecMutex, CoMutex, QemuSpin). + * + * Returns a QemuLockable object that can be passed around + * to a function that can operate with locks of any kind. + */ +#define QEMU_MAKE_LOCKABLE_NONNULL(x) \ + QEMU_GENERIC(x, \ + (QemuLockable *, (x)), \ QEMU_MAKE_LOCKABLE_(x)) static inline void qemu_lockable_lock(QemuLockable *x) @@ -112,7 +125,7 @@ G_DEFINE_AUTOPTR_CLEANUP_FUNC(QemuLockable, qemu_lockable_auto_unlock) #define WITH_QEMU_LOCK_GUARD_(x, var) \ for (g_autoptr(QemuLockable) var = \ - qemu_lockable_auto_lock(QEMU_MAKE_LOCKABLE((x))); \ + qemu_lockable_auto_lock(QEMU_MAKE_LOCKABLE_NONNULL((x))); \ var; \ qemu_lockable_auto_unlock(var), var = NULL) So thank you NetBSD compiler, I guess. :P Paolo [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PULL v2 00/61] Misc patches for soft freeze 2020-03-17 14:47 ` Paolo Bonzini @ 2020-03-17 15:42 ` Philippe Mathieu-Daudé 0 siblings, 0 replies; 9+ messages in thread From: Philippe Mathieu-Daudé @ 2020-03-17 15:42 UTC (permalink / raw) To: Paolo Bonzini, Stefan Hajnoczi, Peter Maydell; +Cc: QEMU Developers On 3/17/20 3:47 PM, Paolo Bonzini wrote: > On 17/03/20 15:26, Stefan Hajnoczi wrote: >> Yes, looks like the compiler can't figure out the control flow on >> NetBSD. >> >> We could drop the WITH_QEMU_LOCK_GUARD() macro and use this idiom >> instead: >> >> { >> QEMU_LOCK_GUARD(&mutex); >> ... >> } >> >> But it's unusual for C code to create scopes without a statement (for, >> if, while). > > After staring at compiler dumps for a while I have just concluded that > this could actually be considered a bug in WITH_QEMU_LOCK_GUARD. > > QEMU_MAKE_LOCKABLE returns NULL if passed a NULL argument. This is the > root cause of the NetBSD failure, as the compiler doesn't figure out > that &timer_list->active_timers_lock is non-NULL and therefore doesn't > simplify the qemu_make_lockable function. > > But why does that cause an uninitialized variable warning? Because if > WITH_QEMU_LOCK_GUARD were passed NULL, it would not execute its body! > > So I'm going to squash the following in the series, mostly through a new > patch "lockable: introduce QEMU_MAKE_LOCKABLE_NONNULL": > > diff --git a/include/qemu/lockable.h b/include/qemu/lockable.h > index 44b3f4b..1aeb2cb 100644 > --- a/include/qemu/lockable.h > +++ b/include/qemu/lockable.h > @@ -67,7 +67,7 @@ qemu_make_lockable(void *x, QemuLockable *lockable) > * In C++ it would be different, but then C++ wouldn't need QemuLockable > * either... > */ > -#define QEMU_MAKE_LOCKABLE_(x) qemu_make_lockable((x), &(QemuLockable) { \ > +#define QEMU_MAKE_LOCKABLE_(x) (&(QemuLockable) { \ > .object = (x), \ > .lock = QEMU_LOCK_FUNC(x), \ > .unlock = QEMU_UNLOCK_FUNC(x), \ > @@ -75,14 +75,27 @@ qemu_make_lockable(void *x, QemuLockable *lockable) > > /* QEMU_MAKE_LOCKABLE - Make a polymorphic QemuLockable > * > - * @x: a lock object (currently one of QemuMutex, CoMutex, QemuSpin). > + * @x: a lock object (currently one of QemuMutex, QemuRecMutex, CoMutex, QemuSpin). > * > * Returns a QemuLockable object that can be passed around > - * to a function that can operate with locks of any kind. > + * to a function that can operate with locks of any kind, or > + * NULL if @x is %NULL. > */ > #define QEMU_MAKE_LOCKABLE(x) \ > QEMU_GENERIC(x, \ > (QemuLockable *, (x)), \ > + qemu_make_lockable((x), QEMU_MAKE_LOCKABLE_(x))) > + > +/* QEMU_MAKE_LOCKABLE_NONNULL - Make a polymorphic QemuLockable > + * > + * @x: a lock object (currently one of QemuMutex, QemuRecMutex, CoMutex, QemuSpin). > + * > + * Returns a QemuLockable object that can be passed around > + * to a function that can operate with locks of any kind. > + */ > +#define QEMU_MAKE_LOCKABLE_NONNULL(x) \ > + QEMU_GENERIC(x, \ > + (QemuLockable *, (x)), \ > QEMU_MAKE_LOCKABLE_(x)) > > static inline void qemu_lockable_lock(QemuLockable *x) > @@ -112,7 +125,7 @@ G_DEFINE_AUTOPTR_CLEANUP_FUNC(QemuLockable, qemu_lockable_auto_unlock) > > #define WITH_QEMU_LOCK_GUARD_(x, var) \ > for (g_autoptr(QemuLockable) var = \ > - qemu_lockable_auto_lock(QEMU_MAKE_LOCKABLE((x))); \ > + qemu_lockable_auto_lock(QEMU_MAKE_LOCKABLE_NONNULL((x))); \ > var; \ > qemu_lockable_auto_unlock(var), var = NULL) > > > So thank you NetBSD compiler, I guess. :P Yep, new patch looks good. Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> > > Paolo > ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PULL 00/61] Misc patches for soft freeze @ 2020-03-16 21:26 Paolo Bonzini 2020-03-16 21:26 ` [PULL 06/61] util: add util function buffer_zero_avx512() Paolo Bonzini 0 siblings, 1 reply; 9+ messages in thread From: Paolo Bonzini @ 2020-03-16 21:26 UTC (permalink / raw) To: qemu-devel The following changes since commit a98135f727595382e200d04c2996e868b7925a01: Merge remote-tracking branch 'remotes/kraxel/tags/vga-20200316-pull-request' into staging (2020-03-16 14:55:59 +0000) are available in the git repository at: git://github.com/bonzini/qemu.git tags/for-upstream for you to fetch changes up to 39633510cca6dee4129258c3ba8fb4a677cc65fd: hw/arm: Let devices own the MemoryRegion they create (2020-03-16 22:07:52 +0100) ---------------------------------------------------------------- * Bugfixes all over the place * get/set_uint cleanups (Felipe) * Lock guard support (Stefan) * MemoryRegion ownership cleanup (Philippe) ---------------------------------------------------------------- Christian Ehrhardt (1): modules: load modules from versioned /var/run dir Christophe de Dinechin (1): scsi/qemu-pr-helper: Fix out-of-bounds access to trnptid_list[] Colin Xu (1): MAINTAINERS: Add entry for Guest X86 HAXM CPUs Dr. David Alan Gilbert (1): exec/rom_reset: Free rom data during inmigrate skip Eduardo Habkost (1): Use -isystem for linux-headers dir Felipe Franciosi (4): qom/object: enable setter for uint types ich9: fix getter type for sci_int property ich9: Simplify ich9_lpc_initfn qom/object: Use common get/set uint helpers Jan Kiszka (1): hw/i386/intel_iommu: Fix out-of-bounds access on guest IRT Joe Richey (1): optionrom/pvh: scan entire RSDP Area Julio Faracco (1): i386: Fix GCC warning with snprintf when HAX is enabled Kashyap Chamarthy (1): qemu-cpu-models.rst: Document -noTSX, mds-no, taa-no, and tsx-ctrl Longpeng (Mike) (1): cpus: avoid pause_all_vcpus getting stuck due to race Marc-André Lureau (1): build-sys: do not make qemu-ga link with pixman Matt Borgerson (1): memory: Fix start offset for bitmap log_clear hook Paolo Bonzini (1): oslib-posix: initialize mutex and condition variable Peter Maydell (1): softmmu/vl.c: Handle '-cpu help' and '-device help' before 'no default machine' Philippe Mathieu-Daudé (36): misc: Replace zero-length arrays with flexible array member (automatic) misc: Replace zero-length arrays with flexible array member (manual) configure: Fix building with SASL on Windows tests/docker: Install SASL library to extend code coverage on amd64 Makefile: Align 'help' target output Makefile: Let the 'help' target list the tools targets hw/audio/fmopl: Move ENV_CURVE to .heap to save 32KiB of .bss hw/audio/intel-hda: Use memory region alias to reduce .rodata by 4.34MB hw/usb/quirks: Use smaller types to reduce .rodata by 10KiB ui/curses: Make control_characters[] array const ui/curses: Move arrays to .heap to save 74KiB of .bss memory: Correctly return alias region type memory: Simplify memory_region_init_rom_nomigrate() to ease review scripts/cocci: Rename memory-region-{init-ram -> housekeeping} scripts/cocci: Patch to replace memory_region_init_{ram,readonly -> rom} hw/arm: Use memory_region_init_rom() with read-only regions hw/display: Use memory_region_init_rom() with read-only regions hw/m68k: Use memory_region_init_rom() with read-only regions hw/net: Use memory_region_init_rom() with read-only regions hw/pci-host: Use memory_region_init_rom() with read-only regions hw/ppc: Use memory_region_init_rom() with read-only regions hw/riscv: Use memory_region_init_rom() with read-only regions hw/sh4: Use memory_region_init_rom() with read-only regions hw/sparc: Use memory_region_init_rom() with read-only regions scripts/cocci: Patch to detect potential use of memory_region_init_rom scripts/cocci: Patch to remove unnecessary memory_region_set_readonly() scripts/cocci: Patch to let devices own their MemoryRegions hw/core: Let devices own the MemoryRegion they create hw/display: Let devices own the MemoryRegion they create hw/dma: Let devices own the MemoryRegion they create hw/riscv: Let devices own the MemoryRegion they create hw/char: Let devices own the MemoryRegion they create hw/arm/stm32: Use memory_region_init_rom() with read-only regions hw/ppc/ppc405: Use memory_region_init_rom() with read-only regions hw/arm: Remove unnecessary memory_region_set_readonly() on ROM alias hw/arm: Let devices own the MemoryRegion they create Robert Hoo (2): configure: add configure option avx512f_opt util: add util function buffer_zero_avx512() Stefan Hajnoczi (2): lockable: add lock guards lockable: add QemuRecMutex support Sunil Muthuswamy (3): WHPX: TSC get and set should be dependent on VM state WHPX: Use QEMU values for trapped CPUID WHPX: Use proper synchronization primitives while processing MAINTAINERS | 12 ++ Makefile | 49 +++-- Makefile.target | 2 +- block/linux-aio.c | 2 +- block/vmdk.c | 2 +- bsd-user/qemu.h | 2 +- configure | 62 +++++- contrib/libvhost-user/libvhost-user.h | 2 +- contrib/vhost-user-gpu/Makefile.objs | 6 +- .../vhost-user-gpu/{main.c => vhost-user-gpu.c} | 0 cpus.c | 6 +- docs/interop/vhost-user.rst | 4 +- docs/system/cpu-models-x86.rst.inc | 57 +++++- exec.c | 9 +- hw/acpi/ich9.c | 99 +--------- hw/acpi/nvdimm.c | 6 +- hw/acpi/pcihp.c | 7 +- hw/acpi/piix4.c | 12 +- hw/arm/exynos4210.c | 14 +- hw/arm/fsl-imx25.c | 10 +- hw/arm/fsl-imx31.c | 6 +- hw/arm/fsl-imx6.c | 6 +- hw/arm/fsl-imx6ul.c | 9 +- hw/arm/mainstone.c | 3 +- hw/arm/msf2-soc.c | 6 +- hw/arm/nrf51_soc.c | 2 +- hw/arm/omap_sx1.c | 6 +- hw/arm/palm.c | 3 +- hw/arm/spitz.c | 3 +- hw/arm/stellaris.c | 3 +- hw/arm/stm32f205_soc.c | 11 +- hw/arm/stm32f405_soc.c | 12 +- hw/arm/tosa.c | 3 +- hw/arm/xlnx-zynqmp.c | 11 +- hw/audio/fmopl.c | 4 +- hw/audio/intel-hda.c | 24 +-- hw/char/sclpconsole-lm.c | 2 +- hw/char/sclpconsole.c | 2 +- hw/char/serial.c | 7 +- hw/core/loader.c | 25 ++- hw/core/platform-bus.c | 3 +- hw/display/cg3.c | 5 +- hw/display/g364fb.c | 3 +- hw/display/macfb.c | 4 +- hw/display/tcx.c | 5 +- hw/dma/i8257.c | 2 +- hw/dma/rc4030.c | 4 +- hw/dma/soc_dma.c | 2 +- hw/i386/intel_iommu.c | 6 + hw/i386/x86.c | 2 +- hw/isa/lpc_ich9.c | 27 +-- hw/m68k/bootinfo.h | 2 +- hw/m68k/q800.c | 3 +- hw/misc/edu.c | 13 +- hw/misc/omap_l4.c | 2 +- hw/net/dp8393x.c | 5 +- hw/nvram/eeprom93xx.c | 2 +- hw/pci-host/prep.c | 5 +- hw/pci-host/q35.c | 14 +- hw/ppc/mac_newworld.c | 3 +- hw/ppc/mac_oldworld.c | 3 +- hw/ppc/ppc405_boards.c | 6 +- hw/ppc/spapr.c | 36 +--- hw/ppc/spapr_drc.c | 3 +- hw/rdma/vmw/pvrdma_qp_ops.c | 4 +- hw/riscv/sifive_e.c | 9 +- hw/riscv/sifive_u.c | 2 +- hw/s390x/virtio-ccw.c | 2 +- hw/sh4/shix.c | 3 +- hw/sparc/leon3.c | 3 +- hw/usb/dev-network.c | 2 +- hw/usb/dev-smartcard-reader.c | 4 +- hw/usb/quirks.c | 4 +- hw/usb/quirks.h | 22 ++- hw/virtio/virtio.c | 4 +- hw/xen/xen_pt.h | 2 +- include/hw/acpi/acpi-defs.h | 16 +- include/hw/arm/smmu-common.h | 2 +- include/hw/boards.h | 2 +- include/hw/i386/intel_iommu.h | 3 +- include/hw/s390x/event-facility.h | 2 +- include/hw/s390x/sclp.h | 8 +- include/hw/virtio/virtio-iommu.h | 2 +- include/qemu/cpuid.h | 3 + include/qemu/lockable.h | 67 +++++++ include/qom/object.h | 48 ++++- include/sysemu/cryptodev.h | 2 +- include/sysemu/whpx.h | 7 + include/tcg/tcg.h | 2 +- memory.c | 31 +-- net/queue.c | 2 +- pc-bios/optionrom/pvh_main.c | 2 +- pc-bios/s390-ccw/bootmap.h | 2 +- pc-bios/s390-ccw/sclp.h | 2 +- plugins/core.c | 7 +- plugins/loader.c | 16 +- qom/object.c | 212 ++++++++++++++++++--- .../coccinelle/memory-region-housekeeping.cocci | 159 ++++++++++++++++ scripts/coccinelle/memory-region-init-ram.cocci | 38 ---- scsi/qemu-pr-helper.c | 17 +- softmmu/vl.c | 26 ++- target/arm/cpu.c | 22 +-- target/i386/hax-posix.c | 33 +--- target/i386/hax-windows.c | 33 +--- target/i386/sev.c | 106 +---------- target/i386/whp-dispatch.h | 9 + target/i386/whpx-all.c | 162 +++++++++++----- target/s390x/ioinst.c | 2 +- tests/docker/dockerfiles/debian-amd64.docker | 1 + tests/qtest/libqos/ahci.h | 2 +- ui/console.c | 4 +- ui/curses.c | 10 +- util/bufferiszero.c | 74 ++++++- util/module.c | 14 ++ util/oslib-posix.c | 7 + util/qemu-timer.c | 23 ++- 116 files changed, 1148 insertions(+), 764 deletions(-) rename contrib/vhost-user-gpu/{main.c => vhost-user-gpu.c} (100%) create mode 100644 scripts/coccinelle/memory-region-housekeeping.cocci delete mode 100644 scripts/coccinelle/memory-region-init-ram.cocci -- 1.8.3.1 ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PULL 06/61] util: add util function buffer_zero_avx512() 2020-03-16 21:26 [PULL " Paolo Bonzini @ 2020-03-16 21:26 ` Paolo Bonzini 2020-03-16 22:02 ` Paolo Bonzini 0 siblings, 1 reply; 9+ messages in thread From: Paolo Bonzini @ 2020-03-16 21:26 UTC (permalink / raw) To: qemu-devel; +Cc: Robert Hoo From: Robert Hoo <robert.hu@linux.intel.com> And intialize buffer_is_zero() with it, when Intel AVX512F is available on host. This function utilizes Intel AVX512 fundamental instructions which is faster than its implementation with AVX2 (in my unit test, with 4K buffer, on CascadeLake SP, ~36% faster, buffer_zero_avx512() V.S. buffer_zero_avx2()). Signed-off-by: Robert Hoo <robert.hu@linux.intel.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- include/qemu/cpuid.h | 3 +++ util/bufferiszero.c | 74 +++++++++++++++++++++++++++++++++++++++++++++------- 2 files changed, 67 insertions(+), 10 deletions(-) diff --git a/include/qemu/cpuid.h b/include/qemu/cpuid.h index 6930170..09fc245 100644 --- a/include/qemu/cpuid.h +++ b/include/qemu/cpuid.h @@ -45,6 +45,9 @@ #ifndef bit_AVX2 #define bit_AVX2 (1 << 5) #endif +#ifndef bit_AVX512F +#define bit_AVX512F (1 << 16) +#endif #ifndef bit_BMI2 #define bit_BMI2 (1 << 8) #endif diff --git a/util/bufferiszero.c b/util/bufferiszero.c index bfb2605..b6eab0c 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -63,11 +63,11 @@ buffer_zero_int(const void *buf, size_t len) } } -#if defined(CONFIG_AVX2_OPT) || defined(__SSE2__) +#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT) || defined(__SSE2__) /* Do not use push_options pragmas unnecessarily, because clang * does not support them. */ -#ifdef CONFIG_AVX2_OPT +#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT) #pragma GCC push_options #pragma GCC target("sse2") #endif @@ -104,7 +104,7 @@ buffer_zero_sse2(const void *buf, size_t len) return _mm_movemask_epi8(_mm_cmpeq_epi8(t, zero)) == 0xFFFF; } -#ifdef CONFIG_AVX2_OPT +#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT) #pragma GCC pop_options #endif @@ -187,18 +187,54 @@ buffer_zero_avx2(const void *buf, size_t len) #pragma GCC pop_options #endif /* CONFIG_AVX2_OPT */ +#ifdef CONFIG_AVX512F_OPT +#pragma GCC push_options +#pragma GCC target("avx512f") +#include <immintrin.h> + +static bool +buffer_zero_avx512(const void *buf, size_t len) +{ + /* Begin with an unaligned head of 64 bytes. */ + __m512i t = _mm512_loadu_si512(buf); + __m512i *p = (__m512i *)(((uintptr_t)buf + 5 * 64) & -64); + __m512i *e = (__m512i *)(((uintptr_t)buf + len) & -64); + + /* Loop over 64-byte aligned blocks of 256. */ + while (p <= e) { + __builtin_prefetch(p); + if (unlikely(_mm512_test_epi64_mask(t, t))) { + return false; + } + t = p[-4] | p[-3] | p[-2] | p[-1]; + p += 4; + } + + t |= _mm512_loadu_si512(buf + len - 4 * 64); + t |= _mm512_loadu_si512(buf + len - 3 * 64); + t |= _mm512_loadu_si512(buf + len - 2 * 64); + t |= _mm512_loadu_si512(buf + len - 1 * 64); + + return !_mm512_test_epi64_mask(t, t); + +} +#pragma GCC pop_options +#endif + + /* Note that for test_buffer_is_zero_next_accel, the most preferred * ISA must have the least significant bit. */ -#define CACHE_AVX2 1 -#define CACHE_SSE4 2 -#define CACHE_SSE2 4 +#define CACHE_AVX512F 1 +#define CACHE_AVX2 2 +#define CACHE_SSE4 4 +#define CACHE_SSE2 8 /* Make sure that these variables are appropriately initialized when * SSE2 is enabled on the compiler command-line, but the compiler is * too old to support CONFIG_AVX2_OPT. */ -#ifdef CONFIG_AVX2_OPT +#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT) # define INIT_CACHE 0 # define INIT_ACCEL buffer_zero_int #else @@ -211,25 +247,35 @@ buffer_zero_avx2(const void *buf, size_t len) static unsigned cpuid_cache = INIT_CACHE; static bool (*buffer_accel)(const void *, size_t) = INIT_ACCEL; +static int length_to_accel; static void init_accel(unsigned cache) { bool (*fn)(const void *, size_t) = buffer_zero_int; if (cache & CACHE_SSE2) { fn = buffer_zero_sse2; + length_to_accel = 64; } #ifdef CONFIG_AVX2_OPT if (cache & CACHE_SSE4) { fn = buffer_zero_sse4; + length_to_accel = 64; } if (cache & CACHE_AVX2) { fn = buffer_zero_avx2; + length_to_accel = 64; + } +#endif +#ifdef CONFIG_AVX512F_OPT + if (cache & CACHE_AVX512F) { + fn = buffer_zero_avx512; + length_to_accel = 256; } #endif buffer_accel = fn; } -#ifdef CONFIG_AVX2_OPT +#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT) #include "qemu/cpuid.h" static void __attribute__((constructor)) init_cpuid_cache(void) @@ -252,9 +298,17 @@ static void __attribute__((constructor)) init_cpuid_cache(void) int bv; __asm("xgetbv" : "=a"(bv), "=d"(d) : "c"(0)); __cpuid_count(7, 0, a, b, c, d); - if ((bv & 6) == 6 && (b & bit_AVX2)) { + if ((bv & 0x6) == 0x6 && (b & bit_AVX2)) { cache |= CACHE_AVX2; } + /* 0xe6: + * XCR0[7:5] = 111b (OPMASK state, upper 256-bit of ZMM0-ZMM15 + * and ZMM16-ZMM31 state are enabled by OS) + * XCR0[2:1] = 11b (XMM state and YMM state are enabled by OS) + */ + if ((bv & 0xe6) == 0xe6 && (b & bit_AVX512F)) { + cache |= CACHE_AVX512F; + } } } cpuid_cache = cache; @@ -277,7 +331,7 @@ bool test_buffer_is_zero_next_accel(void) static bool select_accel_fn(const void *buf, size_t len) { - if (likely(len >= 64)) { + if (likely(len >= length_to_accel)) { return buffer_accel(buf, len); } return buffer_zero_int(buf, len); -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PULL 06/61] util: add util function buffer_zero_avx512() 2020-03-16 21:26 ` [PULL 06/61] util: add util function buffer_zero_avx512() Paolo Bonzini @ 2020-03-16 22:02 ` Paolo Bonzini 0 siblings, 0 replies; 9+ messages in thread From: Paolo Bonzini @ 2020-03-16 22:02 UTC (permalink / raw) To: qemu-devel; +Cc: Robert Hoo This breaks if AVX2 and AVX512 are both disabled but SSE2 isn't, because it calls buffer_is_zero_sse2 even with length < 64. Fix: diff --git a/util/bufferiszero.c b/util/bufferiszero.c index b6eab0c..6639035 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -247,23 +247,20 @@ buffer_zero_avx512(const void *buf, size_t len) static unsigned cpuid_cache = INIT_CACHE; static bool (*buffer_accel)(const void *, size_t) = INIT_ACCEL; -static int length_to_accel; +static int length_to_accel = 64; static void init_accel(unsigned cache) { bool (*fn)(const void *, size_t) = buffer_zero_int; if (cache & CACHE_SSE2) { fn = buffer_zero_sse2; - length_to_accel = 64; } #ifdef CONFIG_AVX2_OPT if (cache & CACHE_SSE4) { fn = buffer_zero_sse4; - length_to_accel = 64; } if (cache & CACHE_AVX2) { fn = buffer_zero_avx2; - length_to_accel = 64; } #endif #ifdef CONFIG_AVX512F_OPT Paolo ^ permalink raw reply related [flat|nested] 9+ messages in thread
end of thread, other threads:[~2020-03-17 15:49 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2020-03-16 22:06 [PULL v2 00/61] Misc patches for soft freeze Paolo Bonzini 2020-03-16 22:06 ` [PULL 06/61] util: add util function buffer_zero_avx512() Paolo Bonzini 2020-03-17 11:03 ` [PULL v2 00/61] Misc patches for soft freeze Peter Maydell 2020-03-17 12:02 ` Philippe Mathieu-Daudé 2020-03-17 14:26 ` Stefan Hajnoczi 2020-03-17 14:47 ` Paolo Bonzini 2020-03-17 15:42 ` Philippe Mathieu-Daudé -- strict thread matches above, loose matches on Subject: below -- 2020-03-16 21:26 [PULL " Paolo Bonzini 2020-03-16 21:26 ` [PULL 06/61] util: add util function buffer_zero_avx512() Paolo Bonzini 2020-03-16 22:02 ` Paolo Bonzini
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).