qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 00/10] Optimize buffer_is_zero
@ 2024-02-15  8:14 Richard Henderson
  2024-02-15  8:14 ` [PATCH v4 01/10] util/bufferiszero: Remove SSE4.1 variant Richard Henderson
                   ` (10 more replies)
  0 siblings, 11 replies; 27+ messages in thread
From: Richard Henderson @ 2024-02-15  8:14 UTC (permalink / raw)
  To: qemu-devel; +Cc: amonakov, mmromanov

v3: https://patchew.org/QEMU/20240206204809.9859-1-amonakov@ispras.ru/

Changes for v4:
  - Keep separate >= 256 entry point, but only keep constant length
    check inline.  This allows the indirect function call to be hidden
    and optimized away when the pointer is constant.
  - Split out a >= 256 integer routine.
  - Simplify acceleration selection for testing.
  - Add function pointer typedef.
  - Implement new aarch64 accelerations.


r~


Alexander Monakov (5):
  util/bufferiszero: Remove SSE4.1 variant
  util/bufferiszero: Remove AVX512 variant
  util/bufferiszero: Reorganize for early test for acceleration
  util/bufferiszero: Remove useless prefetches
  util/bufferiszero: Optimize SSE2 and AVX2 variants

Richard Henderson (5):
  util/bufferiszero: Improve scalar variant
  util/bufferiszero: Introduce biz_accel_fn typedef
  util/bufferiszero: Simplify test_buffer_is_zero_next_accel
  util/bufferiszero: Add simd acceleration for aarch64
  util/bufferiszero: Add sve acceleration for aarch64

 host/include/aarch64/host/cpuinfo.h |   1 +
 include/qemu/cutils.h               |  15 +-
 util/bufferiszero.c                 | 500 ++++++++++++++++------------
 util/cpuinfo-aarch64.c              |   1 +
 meson.build                         |  13 +
 5 files changed, 323 insertions(+), 207 deletions(-)

-- 
2.34.1



^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2024-02-16 22:29 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-02-15  8:14 [PATCH v4 00/10] Optimize buffer_is_zero Richard Henderson
2024-02-15  8:14 ` [PATCH v4 01/10] util/bufferiszero: Remove SSE4.1 variant Richard Henderson
2024-02-15  8:14 ` [PATCH v4 02/10] util/bufferiszero: Remove AVX512 variant Richard Henderson
2024-02-15  8:14 ` [PATCH v4 03/10] util/bufferiszero: Reorganize for early test for acceleration Richard Henderson
2024-02-15  8:14 ` [PATCH v4 04/10] util/bufferiszero: Remove useless prefetches Richard Henderson
2024-02-15  8:14 ` [PATCH v4 05/10] util/bufferiszero: Optimize SSE2 and AVX2 variants Richard Henderson
2024-02-15  8:14 ` [PATCH v4 06/10] util/bufferiszero: Improve scalar variant Richard Henderson
2024-02-15  8:14 ` [PATCH v4 07/10] util/bufferiszero: Introduce biz_accel_fn typedef Richard Henderson
2024-02-15  8:34   ` Philippe Mathieu-Daudé
2024-02-15  8:14 ` [PATCH v4 08/10] util/bufferiszero: Simplify test_buffer_is_zero_next_accel Richard Henderson
2024-02-15  8:40   ` Philippe Mathieu-Daudé
2024-02-15  8:14 ` [PATCH v4 09/10] util/bufferiszero: Add simd acceleration for aarch64 Richard Henderson
2024-02-15  8:47   ` Alexander Monakov
2024-02-15 17:47     ` Richard Henderson
2024-02-15 18:46       ` Alexander Monakov
2024-02-15 21:10         ` Richard Henderson
2024-02-15  8:14 ` [RFC PATCH v4 10/10] util/bufferiszero: Add sve " Richard Henderson
2024-02-16  9:33   ` Alex Bennée
2024-02-16 11:05   ` Alex Bennée
2024-02-15  8:57 ` [PATCH v4 00/10] Optimize buffer_is_zero Alexander Monakov
2024-02-15 21:16   ` Richard Henderson
2024-02-15 21:36     ` Alexander Monakov
2024-02-15 22:27       ` Richard Henderson
2024-02-15 23:37         ` Alexander Monakov
2024-02-16  8:11           ` Richard Henderson
2024-02-16 20:20             ` Alexander Monakov
2024-02-16 22:28               ` Richard Henderson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).