[PATCH 00/13] hexagon: add missing HVX float instructions

public inbox for qemu-devel@nongnu.org

* [PATCH 00/13] hexagon: add missing HVX float instructions
@ 2026-03-23 13:15 Matheus Tavares Bernardino
  2026-03-23 13:15 ` [PATCH 01/13] tests/docker: Update hexagon cross toolchain to 22.1.0 Matheus Tavares Bernardino
                   ` (12 more replies)
  0 siblings, 13 replies; 39+ messages in thread
From: Matheus Tavares Bernardino @ 2026-03-23 13:15 UTC
  To: qemu-devel
  Cc: brian.cain, ale, anjo, ltaylorsimpson, marco.liebel, philmd,
	quic_mburton, sid.manning

This patchset adds 59 HVX floating-point instructions from Hexagon
revisions v68 and v73 that were missing in QEMU. Tests for them are
added at the end of the series.

Brian Cain (1):
  tests/docker: Update hexagon cross toolchain to 22.1.0

Matheus Tavares Bernardino (12):
  target/hexagon: fix incorrect/too-permissive HVX encodings
  target/hexagon/cpu: add HVX IEEE FP extension
  target/hexagon: add v68 HVX IEEE float arithmetic insns
  target/hexagon: add v68 HVX IEEE float min/max insns
  target/hexagon: add v68 HVX IEEE float misc insns
  target/hexagon: add v68 HVX IEEE float conversion insns
  target/hexagon: add v68 HVX IEEE float compare insns
  target/hexagon: add v73 HVX IEEE bfloat16 insns
  tests/hexagon: add tests for v68 HVX IEEE float arithmetics
  tests/hexagon: add tests for v68 HVX IEEE float min/max
  tests/hexagon: add tests for v68 HVX IEEE float conversions
  tests/hexagon: add tests for v68 HVX IEEE float comparisons

 target/hexagon/cpu.h                          |   1 +
 target/hexagon/mmvec/kvx_ieee.h               | 119 ++++++
 target/hexagon/mmvec/macros.h                 |  16 +
 target/hexagon/mmvec/mmvec.h                  |   3 +
 target/hexagon/translate.h                    |   1 +
 tests/tcg/hexagon/hex_test.h                  |  16 +
 tests/tcg/hexagon/hvx_misc.h                  |  14 +
 target/hexagon/attribs_def.h.inc              |   9 +
 target/hexagon/cpu.c                          |   1 +
 target/hexagon/decode.c                       |  22 ++
 target/hexagon/mmvec/kvx_ieee.c               | 234 ++++++++++++
 target/hexagon/translate.c                    |   1 +
 tests/tcg/hexagon/fp_hvx.c                    | 150 ++++++++
 tests/tcg/hexagon/fp_hvx_cmp.c                |  58 +++
 tests/tcg/hexagon/fp_hvx_cvt.c                | 194 ++++++++++
 tests/tcg/hexagon/fp_hvx_disabled.c           |  32 ++
 target/hexagon/hex_common.py                  |   2 +
 target/hexagon/imported/mmvec/encode_ext.def  | 127 +++++--
 target/hexagon/imported/mmvec/ext.idef        | 357 +++++++++++++++++-
 target/hexagon/meson.build                    |   1 +
 .../dockerfiles/debian-hexagon-cross.docker   |  10 +-
 tests/tcg/hexagon/Makefile.target             |  14 +
 22 files changed, 1354 insertions(+), 28 deletions(-)
 create mode 100644 target/hexagon/mmvec/kvx_ieee.h
 create mode 100644 target/hexagon/mmvec/kvx_ieee.c
 create mode 100644 tests/tcg/hexagon/fp_hvx.c
 create mode 100644 tests/tcg/hexagon/fp_hvx_cmp.c
 create mode 100644 tests/tcg/hexagon/fp_hvx_cvt.c
 create mode 100644 tests/tcg/hexagon/fp_hvx_disabled.c

-- 
2.37.2




* [PATCH 01/13] tests/docker: Update hexagon cross toolchain to 22.1.0
  2026-03-23 13:15 [PATCH 00/13] hexagon: add missing HVX float instructions Matheus Tavares Bernardino
@ 2026-03-23 13:15 ` Matheus Tavares Bernardino
  2026-03-23 13:15 ` [PATCH 02/13] target/hexagon: fix incorrect/too-permissive HVX encodings Matheus Tavares Bernardino
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Matheus Tavares Bernardino @ 2026-03-23 13:15 UTC
  To: qemu-devel
  Cc: brian.cain, ale, anjo, ltaylorsimpson, marco.liebel, philmd,
	quic_mburton, sid.manning, Pierrick Bouvier, Alex Bennée

From: Brian Cain <brian.cain@oss.qualcomm.com>

Update the hexagon cross-compiler Docker container to use toolchain
version 22.1.0, replacing the previous 12.Dec.2023 release.

Changes to accommodate the new toolchain:

- Add libc++1, libc++abi1, libunwind-19 runtime deps for the new
  LLVM-based toolchain
- Add zstd for the new .tar.zst archive format
- Update artifact URL domain to artifacts.codelinaro.org

Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Brian Cain <brian.cain@oss.qualcomm.com>
---

I've added this patch because the tests that are added at the end of the
series depend on the new toolchain. Brian, feel free to drop this patch
when you rebase the series onto your hex-next if the patch is already
present there.

 tests/docker/dockerfiles/debian-hexagon-cross.docker | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/tests/docker/dockerfiles/debian-hexagon-cross.docker b/tests/docker/dockerfiles/debian-hexagon-cross.docker
index 91d4b71ac9..636d0ca8a0 100644
--- a/tests/docker/dockerfiles/debian-hexagon-cross.docker
+++ b/tests/docker/dockerfiles/debian-hexagon-cross.docker
@@ -19,7 +19,11 @@ RUN apt-get update && \
         curl \
         ccache \
         xz-utils \
+        zstd \
         ca-certificates \
+        libc++1 \
+        libc++abi1 \
+        libunwind-19 \
         bison \
         flex \
         git \
@@ -40,12 +44,12 @@ RUN apt-get update && \
     dpkg-query --showformat '${Package}_${Version}_${Architecture}\n' --show > /packages.txt
 
 ENV TOOLCHAIN_INSTALL /opt
-ENV TOOLCHAIN_RELEASE 12.Dec.2023
+ENV TOOLCHAIN_RELEASE 22.1.0
 ENV TOOLCHAIN_BASENAME "clang+llvm-${TOOLCHAIN_RELEASE}-cross-hexagon-unknown-linux-musl"
-ENV TOOLCHAIN_URL https://codelinaro.jfrog.io/artifactory/codelinaro-toolchain-for-hexagon/${TOOLCHAIN_RELEASE}/${TOOLCHAIN_BASENAME}.tar.xz
+ENV TOOLCHAIN_URL https://artifacts.codelinaro.org/artifactory/codelinaro-toolchain-for-hexagon/${TOOLCHAIN_RELEASE}_/${TOOLCHAIN_BASENAME}.tar.zst
 ENV CCACHE_WRAPPERSDIR "/usr/libexec/ccache-wrappers"
 
-RUN curl -#SL "$TOOLCHAIN_URL" | tar -xJC "$TOOLCHAIN_INSTALL"
+RUN curl -#SL "$TOOLCHAIN_URL" | tar --zstd -xC "$TOOLCHAIN_INSTALL"
 ENV PATH $PATH:${TOOLCHAIN_INSTALL}/${TOOLCHAIN_BASENAME}/x86_64-linux-gnu/bin
 ENV MAKE /usr/bin/make
 # As a final step configure the user (if env is defined)
-- 
2.37.2




* [PATCH 02/13] target/hexagon: fix incorrect/too-permissive HVX encodings
  2026-03-23 13:15 [PATCH 00/13] hexagon: add missing HVX float instructions Matheus Tavares Bernardino
  2026-03-23 13:15 ` [PATCH 01/13] tests/docker: Update hexagon cross toolchain to 22.1.0 Matheus Tavares Bernardino
@ 2026-03-23 13:15 ` Matheus Tavares Bernardino
  2026-03-23 19:21   ` Taylor Simpson
  2026-03-23 13:15 ` [PATCH 03/13] target/hexagon/cpu: add HVX IEEE FP extension Matheus Tavares Bernardino
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 39+ messages in thread
From: Matheus Tavares Bernardino @ 2026-03-23 13:15 UTC
  To: qemu-devel
  Cc: brian.cain, ale, anjo, ltaylorsimpson, marco.liebel, philmd,
	quic_mburton, sid.manning

The following encodings have become stricter since v68:

    - V6_vunpackob, V6_vunpackoh: ---00 -> --000
    - V6_vaddbq/hq/wq, V6_vaddbnq/hnq/wnq: ---01 -> --001
    - V6_vsubbq/hq, V6_vsubwq/bnq/hnq/wnq: ---01/---10 -> --001/--010
    - V6_vhist, V6_vwhist128/256, V6_vwhist128/256_sat: ---00 -> --000
    - V6_vhistq, V6_vwhist128/256q, V6_vwhist128/256q_sat: ---10 -> --010

Pre-v68 compilers already emit "0" by default for the previously
unspecified bit that became fixed in v68. So, unless someone is writing
the binary encoding by hand, this should not cause any backwards
incompatibility with pre-v68 binaries.

Signed-off-by: Matheus Tavares Bernardino <matheus.bernardino@oss.qualcomm.com>
---
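
Reviewer note (not part of the commit): a minimal sketch of why the
extra fixed bit matters, assuming an encoding row can be reduced to a
mask/match pair. The EncPattern type and the enc_* helpers are
hypothetical names used only for illustration; QEMU's generated decoder
is organized differently. Only the five-bit field that changed is
modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mask/match pair for one encoding pattern. */
typedef struct {
    uint32_t mask;   /* 1 where the pattern fixes a bit */
    uint32_t match;  /* required value of the fixed bits */
} EncPattern;

/*
 * Build a mask/match pair from an MSB-first pattern string.
 * '0' and '1' are fixed bits, spaces are skipped, and every other
 * character (such as '-') is treated as a don't-care bit.
 */
static EncPattern enc_from_string(const char *s)
{
    EncPattern p = { 0, 0 };

    for (; *s; s++) {
        if (*s == ' ') {
            continue;
        }
        p.mask <<= 1;
        p.match <<= 1;
        if (*s == '0' || *s == '1') {
            p.mask |= 1;
            p.match |= (uint32_t)(*s == '1');
        }
    }
    return p;
}

/* True when the fixed bits of the pattern agree with the given bits. */
static bool enc_matches(EncPattern p, uint32_t bits)
{
    return (bits & p.mask) == p.match;
}
```

With these helpers, the field value 0b00100 matches the old "---00"
pattern but not the new "--000" one, while 0b00000 matches both. That
is the case the commit message relies on for pre-v68 binaries.
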
 target/hexagon/imported/mmvec/encode_ext.def | 48 ++++++++++----------
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/target/hexagon/imported/mmvec/encode_ext.def b/target/hexagon/imported/mmvec/encode_ext.def
index 402438f566..6d70086b5f 100644
--- a/target/hexagon/imported/mmvec/encode_ext.def
+++ b/target/hexagon/imported/mmvec/encode_ext.def
@@ -647,36 +647,36 @@ DEF_ENC(V6_vsubububb_sat,    ICLASS_CJ" 1 110 101 vvvvv PP 0 uuuuu 101 ddddd")
 DEF_ENC(V6_vmpyewuh_64,        ICLASS_CJ" 1 110 101 vvvvv PP 0 uuuuu 110 ddddd")
 
 DEF_FIELDROW_DESC32(        ICLASS_CJ" 1 110 --0 ----- PP 1 ----- ----- ---","Vx32=Vu32")
-DEF_ENC(V6_vunpackob,         ICLASS_CJ" 1 110 --0 ---00 PP 1 uuuuu 000 xxxxx") //
-DEF_ENC(V6_vunpackoh,         ICLASS_CJ" 1 110 --0 ---00 PP 1 uuuuu 001 xxxxx") //
+DEF_ENC(V6_vunpackob,         ICLASS_CJ" 1 110 --0 --000 PP 1 uuuuu 000 xxxxx") //
+DEF_ENC(V6_vunpackoh,         ICLASS_CJ" 1 110 --0 --000 PP 1 uuuuu 001 xxxxx") //
 //DEF_ENC(V6_vunpackow,     ICLASS_CJ" 1 110 --0 ---00 PP 1 uuuuu 010 xxxxx") //
 
-DEF_ENC(V6_vhist,            ICLASS_CJ" 1 110 --0 ---00 PP 1 -000- 100 -----")
-DEF_ENC(V6_vwhist256,        ICLASS_CJ" 1 110 --0 ---00 PP 1 -0010 100 -----")
-DEF_ENC(V6_vwhist256_sat,    ICLASS_CJ" 1 110 --0 ---00 PP 1 -0011 100 -----")
-DEF_ENC(V6_vwhist128,        ICLASS_CJ" 1 110 --0 ---00 PP 1 -010- 100 -----")
-DEF_ENC(V6_vwhist128m,        ICLASS_CJ" 1 110 --0 ---00 PP 1 -011i 100 -----")
+DEF_ENC(V6_vhist,            ICLASS_CJ" 1 110 --0 --000 PP 1 -000- 100 -----")
+DEF_ENC(V6_vwhist256,        ICLASS_CJ" 1 110 --0 --000 PP 1 -0010 100 -----")
+DEF_ENC(V6_vwhist256_sat,    ICLASS_CJ" 1 110 --0 --000 PP 1 -0011 100 -----")
+DEF_ENC(V6_vwhist128,        ICLASS_CJ" 1 110 --0 --000 PP 1 -010- 100 -----")
+DEF_ENC(V6_vwhist128m,        ICLASS_CJ" 1 110 --0 --000 PP 1 -011i 100 -----")
 
 DEF_FIELDROW_DESC32(        ICLASS_CJ" 1 110 --0 ----- PP 1 ----- ----- ---","if (Qv4) Vx32=Vu32")
-DEF_ENC(V6_vaddbq,             ICLASS_CJ" 1 110 vv0 ---01 PP 1 uuuuu 000 xxxxx") //
-DEF_ENC(V6_vaddhq,             ICLASS_CJ" 1 110 vv0 ---01 PP 1 uuuuu 001 xxxxx") //
-DEF_ENC(V6_vaddwq,             ICLASS_CJ" 1 110 vv0 ---01 PP 1 uuuuu 010 xxxxx") //
-DEF_ENC(V6_vaddbnq,         ICLASS_CJ" 1 110 vv0 ---01 PP 1 uuuuu 011 xxxxx") //
-DEF_ENC(V6_vaddhnq,         ICLASS_CJ" 1 110 vv0 ---01 PP 1 uuuuu 100 xxxxx") //
-DEF_ENC(V6_vaddwnq,         ICLASS_CJ" 1 110 vv0 ---01 PP 1 uuuuu 101 xxxxx") //
-DEF_ENC(V6_vsubbq,             ICLASS_CJ" 1 110 vv0 ---01 PP 1 uuuuu 110 xxxxx") //
-DEF_ENC(V6_vsubhq,             ICLASS_CJ" 1 110 vv0 ---01 PP 1 uuuuu 111 xxxxx") //
+DEF_ENC(V6_vaddbq,             ICLASS_CJ" 1 110 vv0 --001 PP 1 uuuuu 000 xxxxx") //
+DEF_ENC(V6_vaddhq,             ICLASS_CJ" 1 110 vv0 --001 PP 1 uuuuu 001 xxxxx") //
+DEF_ENC(V6_vaddwq,             ICLASS_CJ" 1 110 vv0 --001 PP 1 uuuuu 010 xxxxx") //
+DEF_ENC(V6_vaddbnq,         ICLASS_CJ" 1 110 vv0 --001 PP 1 uuuuu 011 xxxxx") //
+DEF_ENC(V6_vaddhnq,         ICLASS_CJ" 1 110 vv0 --001 PP 1 uuuuu 100 xxxxx") //
+DEF_ENC(V6_vaddwnq,         ICLASS_CJ" 1 110 vv0 --001 PP 1 uuuuu 101 xxxxx") //
+DEF_ENC(V6_vsubbq,             ICLASS_CJ" 1 110 vv0 --001 PP 1 uuuuu 110 xxxxx") //
+DEF_ENC(V6_vsubhq,             ICLASS_CJ" 1 110 vv0 --001 PP 1 uuuuu 111 xxxxx") //
 
-DEF_ENC(V6_vsubwq,             ICLASS_CJ" 1 110 vv0 ---10 PP 1 uuuuu 000 xxxxx") //
-DEF_ENC(V6_vsubbnq,         ICLASS_CJ" 1 110 vv0 ---10 PP 1 uuuuu 001 xxxxx") //
-DEF_ENC(V6_vsubhnq,         ICLASS_CJ" 1 110 vv0 ---10 PP 1 uuuuu 010 xxxxx") //
-DEF_ENC(V6_vsubwnq,         ICLASS_CJ" 1 110 vv0 ---10 PP 1 uuuuu 011 xxxxx") //
+DEF_ENC(V6_vsubwq,             ICLASS_CJ" 1 110 vv0 --010 PP 1 uuuuu 000 xxxxx") //
+DEF_ENC(V6_vsubbnq,         ICLASS_CJ" 1 110 vv0 --010 PP 1 uuuuu 001 xxxxx") //
+DEF_ENC(V6_vsubhnq,         ICLASS_CJ" 1 110 vv0 --010 PP 1 uuuuu 010 xxxxx") //
+DEF_ENC(V6_vsubwnq,         ICLASS_CJ" 1 110 vv0 --010 PP 1 uuuuu 011 xxxxx") //
 
-DEF_ENC(V6_vhistq,            ICLASS_CJ" 1 110 vv0 ---10 PP 1 --00- 100 -----")
-DEF_ENC(V6_vwhist256q,        ICLASS_CJ" 1 110 vv0 ---10 PP 1 --010 100 -----")
-DEF_ENC(V6_vwhist256q_sat,    ICLASS_CJ" 1 110 vv0 ---10 PP 1 --011 100 -----")
-DEF_ENC(V6_vwhist128q,        ICLASS_CJ" 1 110 vv0 ---10 PP 1 --10- 100 -----")
-DEF_ENC(V6_vwhist128qm,        ICLASS_CJ" 1 110 vv0 ---10 PP 1 --11i 100 -----")
+DEF_ENC(V6_vhistq,            ICLASS_CJ" 1 110 vv0 --010 PP 1 --00- 100 -----")
+DEF_ENC(V6_vwhist256q,        ICLASS_CJ" 1 110 vv0 --010 PP 1 --010 100 -----")
+DEF_ENC(V6_vwhist256q_sat,    ICLASS_CJ" 1 110 vv0 --010 PP 1 --011 100 -----")
+DEF_ENC(V6_vwhist128q,        ICLASS_CJ" 1 110 vv0 --010 PP 1 --10- 100 -----")
+DEF_ENC(V6_vwhist128qm,        ICLASS_CJ" 1 110 vv0 --010 PP 1 --11i 100 -----")
 
 
 DEF_ENC(V6_vandvqv,            ICLASS_CJ" 1 110 vv0 ---11 PP 1 uuuuu 000 ddddd")
-- 
2.37.2




* [PATCH 03/13] target/hexagon/cpu: add HVX IEEE FP extension
  2026-03-23 13:15 [PATCH 00/13] hexagon: add missing HVX float instructions Matheus Tavares Bernardino
  2026-03-23 13:15 ` [PATCH 01/13] tests/docker: Update hexagon cross toolchain to 22.1.0 Matheus Tavares Bernardino
  2026-03-23 13:15 ` [PATCH 02/13] target/hexagon: fix incorrect/too-permissive HVX encodings Matheus Tavares Bernardino
@ 2026-03-23 13:15 ` Matheus Tavares Bernardino
  2026-03-23 19:32   ` Taylor Simpson
  2026-03-23 13:15 ` [PATCH 04/13] target/hexagon: add v68 HVX IEEE float arithmetic insns Matheus Tavares Bernardino
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 39+ messages in thread
From: Matheus Tavares Bernardino @ 2026-03-23 13:15 UTC
  To: qemu-devel
  Cc: brian.cain, ale, anjo, ltaylorsimpson, marco.liebel, philmd,
	quic_mburton, sid.manning

Add an "ieee-fp" CPU property (ieee_fp_extension, enabled by default) to
control the HVX IEEE float instructions, which are only available on
some Hexagon cores. When the extension is disabled, these instructions
are converted to nops at decode time.

Signed-off-by: Matheus Tavares Bernardino <matheus.bernardino@oss.qualcomm.com>
---
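
Reviewer note (not part of the commit): a stand-alone sketch of the
decode-time filtering described above, assuming a much simplified
instruction record. SketchInsn, the OP_* opcodes, and the sketch_*
functions are hypothetical stand-ins for QEMU's Insn, its opcode enum,
GET_ATTRIB(), and convert_to_nop(). The real code also resets fields
that are not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical opcodes; only OP_HVX_FP_ADD carries the IEEE FP attribute. */
enum { OP_NOP, OP_INT_ADD, OP_HVX_FP_ADD };

/* Hypothetical, trimmed-down instruction record. */
typedef struct {
    int opcode;
    int dest_idx;
    bool is_endloop;
} SketchInsn;

/* Stand-in for GET_ATTRIB(opcode, A_HVX_IEEE_FP). */
static bool sketch_is_ieee_fp(int opcode)
{
    return opcode == OP_HVX_FP_ADD;
}

/* Clear the instruction but keep its endloop marker, as the patch does. */
static void sketch_convert_to_nop(SketchInsn *insn)
{
    bool is_endloop = insn->is_endloop;

    memset(insn, 0, sizeof(*insn));
    insn->opcode = OP_NOP;
    insn->dest_idx = -1;
    insn->is_endloop = is_endloop;
}

/* Replace IEEE FP instructions when the extension is off; return the count. */
static int sketch_filter_packet(SketchInsn *insns, int n, bool ieee_fp_enabled)
{
    int replaced = 0;

    if (!ieee_fp_enabled) {
        for (int i = 0; i < n; i++) {
            if (sketch_is_ieee_fp(insns[i].opcode)) {
                sketch_convert_to_nop(&insns[i]);
                replaced++;
            }
        }
    }
    return replaced;
}
```

Preserving is_endloop matters because the packet's loop-end marker is
carried by an instruction slot, so dropping it would change control
flow rather than just skip the FP operation.
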
 target/hexagon/cpu.h             |  1 +
 target/hexagon/translate.h       |  1 +
 target/hexagon/attribs_def.h.inc |  3 +++
 target/hexagon/cpu.c             |  1 +
 target/hexagon/decode.c          | 22 ++++++++++++++++++++++
 target/hexagon/translate.c       |  1 +
 6 files changed, 29 insertions(+)

diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h
index 85afd59277..77822a48b6 100644
--- a/target/hexagon/cpu.h
+++ b/target/hexagon/cpu.h
@@ -127,6 +127,7 @@ struct ArchCPU {
     bool lldb_compat;
     target_ulong lldb_stack_adjust;
     bool short_circuit;
+    bool ieee_fp_extension;
 };
 
 #include "cpu_bits.h"
diff --git a/target/hexagon/translate.h b/target/hexagon/translate.h
index b37cb49238..516aab7038 100644
--- a/target/hexagon/translate.h
+++ b/target/hexagon/translate.h
@@ -70,6 +70,7 @@ typedef struct DisasContext {
     target_ulong branch_dest;
     bool is_tight_loop;
     bool short_circuit;
+    bool ieee_fp_extension;
     bool read_after_write;
     bool has_hvx_overlap;
     TCGv new_value[TOTAL_PER_THREAD_REGS];
diff --git a/target/hexagon/attribs_def.h.inc b/target/hexagon/attribs_def.h.inc
index 9e3a05f882..c85cd5d17c 100644
--- a/target/hexagon/attribs_def.h.inc
+++ b/target/hexagon/attribs_def.h.inc
@@ -173,5 +173,8 @@ DEF_ATTRIB(NOTE_SHIFT_RESOURCE, "Uses the HVX shift resource.", "", "")
 DEF_ATTRIB(RESTRICT_NOSLOT1_STORE, "Packet must not have slot 1 store", "", "")
 DEF_ATTRIB(RESTRICT_LATEPRED, "Predicate can not be used as a .new.", "", "")
 
+/* HVX IEEE FP extension attributes */
+DEF_ATTRIB(HVX_IEEE_FP, "HVX IEEE FP extension instruction", "", "")
+
 /* Keep this as the last attribute: */
 DEF_ATTRIB(ZZ_LASTATTRIB, "Last attribute in the file", "", "")
diff --git a/target/hexagon/cpu.c b/target/hexagon/cpu.c
index ffd14bb467..8b72a5d3c8 100644
--- a/target/hexagon/cpu.c
+++ b/target/hexagon/cpu.c
@@ -54,6 +54,7 @@ static const Property hexagon_cpu_properties[] = {
     DEFINE_PROP_UNSIGNED("lldb-stack-adjust", HexagonCPU, lldb_stack_adjust, 0,
                          qdev_prop_uint32, target_ulong),
     DEFINE_PROP_BOOL("short-circuit", HexagonCPU, short_circuit, true),
+    DEFINE_PROP_BOOL("ieee-fp", HexagonCPU, ieee_fp_extension, true),
 };
 
 const char * const hexagon_regnames[TOTAL_PER_THREAD_REGS] = {
diff --git a/target/hexagon/decode.c b/target/hexagon/decode.c
index dbc9c630e8..d832a64a17 100644
--- a/target/hexagon/decode.c
+++ b/target/hexagon/decode.c
@@ -696,6 +696,18 @@ static bool pkt_has_write_conflict(Packet *pkt)
     return !bitmap_empty(conflict, 32);
 }
 
+static void convert_to_nop(Insn *insn)
+{
+    bool is_endloop = insn->is_endloop;
+    memset(insn, 0, sizeof(*insn));
+    insn->opcode = A2_nop;
+    insn->new_read_idx = -1;
+    insn->dest_idx = -1;
+    insn->generate = opcode_genptr[insn->opcode];
+    insn->iclass = 0b111;
+    insn->is_endloop = is_endloop;
+}
+
 /*
  * decode_packet
  * Decodes packet with given words
@@ -746,6 +758,16 @@ int decode_packet(DisasContext *ctx, int max_words, const uint32_t *words,
         /* Ran out of words! */
         return 0;
     }
+
+    /* Disable HVX IEEE instruction if extension is disabled. */
+    if (!ctx->ieee_fp_extension) {
+        for (i = 0; i < num_insns; i++) {
+            if (GET_ATTRIB(pkt->insn[i].opcode, A_HVX_IEEE_FP)) {
+                convert_to_nop(&pkt->insn[i]);
+            }
+        }
+    }
+
     pkt->encod_pkt_size_in_bytes = words_read * 4;
     pkt->pkt_has_hvx = false;
     for (i = 0; i < num_insns; i++) {
diff --git a/target/hexagon/translate.c b/target/hexagon/translate.c
index 8a223f6e13..9f8104f949 100644
--- a/target/hexagon/translate.c
+++ b/target/hexagon/translate.c
@@ -988,6 +988,7 @@ static void hexagon_tr_init_disas_context(DisasContextBase *dcbase,
     ctx->branch_cond = TCG_COND_NEVER;
     ctx->is_tight_loop = FIELD_EX32(hex_flags, TB_FLAGS, IS_TIGHT_LOOP);
     ctx->short_circuit = hex_cpu->short_circuit;
+    ctx->ieee_fp_extension = hex_cpu->ieee_fp_extension;
 }
 
 static void hexagon_tr_tb_start(DisasContextBase *db, CPUState *cpu)
-- 
2.37.2




* [PATCH 04/13] target/hexagon: add v68 HVX IEEE float arithmetic insns
  2026-03-23 13:15 [PATCH 00/13] hexagon: add missing HVX float instructions Matheus Tavares Bernardino
                   ` (2 preceding siblings ...)
  2026-03-23 13:15 ` [PATCH 03/13] target/hexagon/cpu: add HVX IEEE FP extension Matheus Tavares Bernardino
@ 2026-03-23 13:15 ` Matheus Tavares Bernardino
  2026-03-23 20:28   ` Taylor Simpson
  2026-03-23 13:15 ` [PATCH 05/13] target/hexagon: add v68 HVX IEEE float min/max insns Matheus Tavares Bernardino
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 39+ messages in thread
From: Matheus Tavares Bernardino @ 2026-03-23 13:15 UTC
  To: qemu-devel
  Cc: brian.cain, ale, anjo, ltaylorsimpson, marco.liebel, philmd,
	quic_mburton, sid.manning

Add HVX IEEE floating-point arithmetic instructions:
- vmpy_sf_sf, vmpy_sf_hf, vmpy_hf_hf: multiply operations
- vdmpy_sf_hf: dot-product multiply
- vmpy_sf_hf_acc, vmpy_hf_hf_acc, vdmpy_sf_hf_acc: multiply-accumulate
- vadd_sf_sf, vsub_sf_sf, vadd_sf_hf, vsub_sf_hf: add/sub with sf output
- vadd_hf_hf, vsub_hf_hf: add/sub with hf output

Signed-off-by: Matheus Tavares Bernardino <matheus.bernardino@oss.qualcomm.com>
---
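
Reviewer note (not part of the commit): a rough sketch of the NaN
canonicalization and the pairwise dot product that the helpers in this
patch implement. It assumes host float arithmetic in the default
rounding mode as a stand-in for softfloat, so it ignores float_status
entirely. FP32_DEF_NAN is taken from the patch; every sketch_* name,
the bit-cast helpers, and half_to_sf_bits() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FP32_DEF_NAN 0x7FFFFFFFu  /* Hexagon canonical NaN, from the patch */

static float bits_to_float(uint32_t b)
{
    float f;
    memcpy(&f, &b, sizeof(f));
    return f;
}

static uint32_t float_to_bits(float f)
{
    uint32_t b;
    memcpy(&b, &f, sizeof(b));
    return b;
}

/* NaN test on the raw bits: exponent all ones, mantissa non-zero. */
static bool is_nan_bits(uint32_t b)
{
    return (b & 0x7FFFFFFFu) > 0x7F800000u;
}

/* Any NaN input or NaN result collapses to the canonical NaN. */
static uint32_t sketch_add_sf_sf(uint32_t a1, uint32_t a2)
{
    if (is_nan_bits(a1) || is_nan_bits(a2)) {
        return FP32_DEF_NAN;
    }
    uint32_t r = float_to_bits(bits_to_float(a1) + bits_to_float(a2));
    return is_nan_bits(r) ? FP32_DEF_NAN : r;
}

static uint32_t sketch_mul_sf_sf(uint32_t a1, uint32_t a2)
{
    if (is_nan_bits(a1) || is_nan_bits(a2)) {
        return FP32_DEF_NAN;
    }
    uint32_t r = float_to_bits(bits_to_float(a1) * bits_to_float(a2));
    return is_nan_bits(r) ? FP32_DEF_NAN : r;
}

/* Exact widening of IEEE half-precision bits to single-precision bits. */
static uint32_t half_to_sf_bits(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    int exp = (h >> 10) & 0x1F;
    uint32_t man = h & 0x3FFu;

    if (exp == 0x1F) {                 /* infinity or NaN */
        return sign | 0x7F800000u | (man << 13);
    }
    if (exp == 0) {
        if (man == 0) {                /* signed zero */
            return sign;
        }
        exp = 1;                       /* subnormal: normalize the mantissa */
        while (!(man & 0x400u)) {
            man <<= 1;
            exp--;
        }
        man &= 0x3FFu;
    }
    return sign | ((uint32_t)(exp + 112) << 23) | (man << 13);
}

/* Same operand pairing as fp_vdmpy in the patch: a1*a3 + a2*a4. */
static uint32_t sketch_vdmpy(uint16_t a1, uint16_t a2, uint16_t a3,
                             uint16_t a4)
{
    uint32_t prod1 = sketch_mul_sf_sf(half_to_sf_bits(a1), half_to_sf_bits(a3));
    uint32_t prod2 = sketch_mul_sf_sf(half_to_sf_bits(a2), half_to_sf_bits(a4));
    return sketch_add_sf_sf(prod1, prod2);
}
```

For example, sketch_vdmpy() on the half-precision values 1.0, 2.0, 3.0
and 4.0 computes 1.0 * 3.0 + 2.0 * 4.0, and any NaN operand yields
FP32_DEF_NAN instead of propagating the input payload.
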
 target/hexagon/mmvec/kvx_ieee.h              | 47 ++++++++++
 target/hexagon/mmvec/macros.h                |  1 +
 target/hexagon/mmvec/mmvec.h                 |  2 +
 target/hexagon/attribs_def.h.inc             |  4 +
 target/hexagon/mmvec/kvx_ieee.c              | 87 ++++++++++++++++++
 target/hexagon/hex_common.py                 |  1 +
 target/hexagon/imported/mmvec/encode_ext.def | 18 ++++
 target/hexagon/imported/mmvec/ext.idef       | 93 ++++++++++++++++++++
 target/hexagon/meson.build                   |  1 +
 9 files changed, 254 insertions(+)
 create mode 100644 target/hexagon/mmvec/kvx_ieee.h
 create mode 100644 target/hexagon/mmvec/kvx_ieee.c

diff --git a/target/hexagon/mmvec/kvx_ieee.h b/target/hexagon/mmvec/kvx_ieee.h
new file mode 100644
index 0000000000..e92ddebeb9
--- /dev/null
+++ b/target/hexagon/mmvec/kvx_ieee.h
@@ -0,0 +1,47 @@
+/*
+ *  Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
+ *
+ *  SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef HEXAGON_KVX_IEEE_H
+#define HEXAGON_KVX_IEEE_H
+
+#include "fpu/softfloat.h"
+
+/* Hexagon canonical NaN */
+#define FP32_DEF_NAN      0x7FFFFFFF
+#define FP16_DEF_NAN      0x7FFF
+
+/*
+ * IEEE - FP ADD/SUB/MPY instructions
+ */
+uint32_t fp_mult_sf_sf(uint32_t a1, uint32_t a2, float_status *fp_status);
+uint32_t fp_add_sf_sf(uint32_t a1, uint32_t a2, float_status *fp_status);
+uint32_t fp_sub_sf_sf(uint32_t a1, uint32_t a2, float_status *fp_status);
+
+uint16_t fp_mult_hf_hf(uint16_t a1, uint16_t a2, float_status *fp_status);
+uint16_t fp_add_hf_hf(uint16_t a1, uint16_t a2, float_status *fp_status);
+uint16_t fp_sub_hf_hf(uint16_t a1, uint16_t a2, float_status *fp_status);
+
+uint32_t fp_mult_sf_hf(uint16_t a1, uint16_t a2, float_status *fp_status);
+uint32_t fp_add_sf_hf(uint16_t a1, uint16_t a2, float_status *fp_status);
+uint32_t fp_sub_sf_hf(uint16_t a1, uint16_t a2, float_status *fp_status);
+
+/*
+ * IEEE - FP Accumulate instructions
+ */
+uint16_t fp_mult_hf_hf_acc(uint16_t a1, uint16_t a2, uint16_t acc,
+                           float_status *fp_status);
+uint32_t fp_mult_sf_hf_acc(uint16_t a1, uint16_t a2, uint32_t acc,
+                           float_status *fp_status);
+
+/*
+ * IEEE - FP Reduce instructions
+ */
+uint32_t fp_vdmpy(uint16_t a1, uint16_t a2, uint16_t a3, uint16_t a4,
+                  float_status *fp_status);
+uint32_t fp_vdmpy_acc(uint32_t acc, uint16_t a1, uint16_t a2, uint16_t a3,
+                      uint16_t a4, float_status *fp_status);
+
+#endif
diff --git a/target/hexagon/mmvec/macros.h b/target/hexagon/mmvec/macros.h
index c7840fbf2e..2af3d2d747 100644
--- a/target/hexagon/mmvec/macros.h
+++ b/target/hexagon/mmvec/macros.h
@@ -23,6 +23,7 @@
 #include "mmvec/system_ext_mmvec.h"
 #include "accel/tcg/getpc.h"
 #include "accel/tcg/probe.h"
+#include "mmvec/kvx_ieee.h"
 
 #ifndef QEMU_GENERATE
 #define VdV      (*(MMVector *restrict)(VdV_void))
diff --git a/target/hexagon/mmvec/mmvec.h b/target/hexagon/mmvec/mmvec.h
index 52d470709c..eaedfe0d6d 100644
--- a/target/hexagon/mmvec/mmvec.h
+++ b/target/hexagon/mmvec/mmvec.h
@@ -38,6 +38,8 @@ typedef union {
     int16_t   h[MAX_VEC_SIZE_BYTES / 2];
     uint8_t  ub[MAX_VEC_SIZE_BYTES / 1];
     int8_t    b[MAX_VEC_SIZE_BYTES / 1];
+    int32_t  sf[MAX_VEC_SIZE_BYTES / 4];   /* single float (32-bit) */
+    int16_t  hf[MAX_VEC_SIZE_BYTES / 2];   /* half float (16-bit) */
 } MMVector;
 
 typedef union {
diff --git a/target/hexagon/attribs_def.h.inc b/target/hexagon/attribs_def.h.inc
index c85cd5d17c..d3c4bf6301 100644
--- a/target/hexagon/attribs_def.h.inc
+++ b/target/hexagon/attribs_def.h.inc
@@ -175,6 +175,10 @@ DEF_ATTRIB(RESTRICT_LATEPRED, "Predicate can not be used as a .new.", "", "")
 
 /* HVX IEEE FP extension attributes */
 DEF_ATTRIB(HVX_IEEE_FP, "HVX IEEE FP extension instruction", "", "")
+DEF_ATTRIB(HVX_IEEE_FP_ACC, "HVX IEEE FP accumulate instruction", "", "")
+DEF_ATTRIB(HVX_IEEE_FP_OUT_16, "HVX IEEE FP 16-bit output", "", "")
+DEF_ATTRIB(HVX_IEEE_FP_OUT_32, "HVX IEEE FP 32-bit output", "", "")
+DEF_ATTRIB(CVI_VX_NO_TMP_LD, "HVX multiply without tmp load", "", "")
 
 /* Keep this as the last attribute: */
 DEF_ATTRIB(ZZ_LASTATTRIB, "Last attribute in the file", "", "")
diff --git a/target/hexagon/mmvec/kvx_ieee.c b/target/hexagon/mmvec/kvx_ieee.c
new file mode 100644
index 0000000000..b763899aa3
--- /dev/null
+++ b/target/hexagon/mmvec/kvx_ieee.c
@@ -0,0 +1,87 @@
+/*
+ *  Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
+ *
+ *  SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "kvx_ieee.h"
+
+#define DEF_FP_INSN_2(name, rt, a1t, a2t, op) \
+    uint##rt##_t fp_##name(uint##a1t##_t a1, uint##a2t##_t a2, \
+                           float_status *fp_status) { \
+        float##a1t f1 = make_float##a1t(a1); \
+        float##a2t f2 = make_float##a2t(a2); \
+        \
+        if (float##a1t##_is_any_nan(f1) || float##a2t##_is_any_nan(f2)) { \
+            return FP##rt##_DEF_NAN; \
+        } \
+        float##rt result = op; \
+        \
+        if (float##rt##_is_any_nan(result)) { \
+            return FP##rt##_DEF_NAN; \
+        } \
+        return result; \
+    }
+
+#define DEF_FP_INSN_3(name, rt, a1t, a2t, a3t, op) \
+    uint##rt##_t fp_##name(uint##a1t##_t a1, uint##a2t##_t a2, \
+                           uint##a3t##_t a3, float_status *fp_status) { \
+        float##a1t f1 = make_float##a1t(a1); \
+        float##a2t f2 = make_float##a2t(a2); \
+        float##a3t f3 = make_float##a3t(a3); \
+        \
+        if (float##a1t##_is_any_nan(f1) || float##a2t##_is_any_nan(f2) || \
+            float##a3t##_is_any_nan(f3)) \
+            return FP##rt##_DEF_NAN; \
+        \
+        float##rt result = op; \
+        \
+        if (float##rt##_is_any_nan(result)) \
+            return FP##rt##_DEF_NAN; \
+        return result; \
+    }
+
+DEF_FP_INSN_2(mult_sf_sf, 32, 32, 32, float32_mul(f1, f2, fp_status))
+DEF_FP_INSN_2(add_sf_sf, 32, 32, 32, float32_add(f1, f2, fp_status))
+DEF_FP_INSN_2(sub_sf_sf, 32, 32, 32, float32_sub(f1, f2, fp_status))
+
+DEF_FP_INSN_2(mult_hf_hf, 16, 16, 16, float16_mul(f1, f2, fp_status))
+DEF_FP_INSN_2(add_hf_hf, 16, 16, 16, float16_add(f1, f2, fp_status))
+DEF_FP_INSN_2(sub_hf_hf, 16, 16, 16, float16_sub(f1, f2, fp_status))
+
+DEF_FP_INSN_2(mult_sf_hf, 32, 16, 16,
+              float32_mul(float16_to_float32(f1, true, fp_status),
+                          float16_to_float32(f2, true, fp_status),
+                          fp_status))
+DEF_FP_INSN_2(add_sf_hf, 32, 16, 16,
+              float32_add(float16_to_float32(f1, true, fp_status),
+                          float16_to_float32(f2, true, fp_status),
+                          fp_status))
+DEF_FP_INSN_2(sub_sf_hf, 32, 16, 16,
+              float32_sub(float16_to_float32(f1, true, fp_status),
+                          float16_to_float32(f2, true, fp_status),
+                          fp_status))
+
+DEF_FP_INSN_3(mult_hf_hf_acc, 16, 16, 16, 16,
+              float16_muladd(f1, f2, f3, 0, fp_status))
+DEF_FP_INSN_3(mult_sf_hf_acc, 32, 16, 16, 32,
+              float32_muladd(float16_to_float32(f1, true, fp_status),
+                             float16_to_float32(f2, true, fp_status),
+                             f3, 0, fp_status))
+
+uint32_t fp_vdmpy(uint16_t a1, uint16_t a2, uint16_t a3, uint16_t a4,
+                 float_status *fp_status)
+{
+    float32 prod1 = fp_mult_sf_hf(a1, a3, fp_status);
+    float32 prod2 = fp_mult_sf_hf(a2, a4, fp_status);
+    return fp_add_sf_sf(float32_val(prod1), float32_val(prod2), fp_status);
+}
+
+uint32_t fp_vdmpy_acc(uint32_t acc, uint16_t a1, uint16_t a2,
+                      uint16_t a3, uint16_t a4,
+                      float_status *fp_status)
+{
+    float32 red = fp_vdmpy(a1, a2, a3, a4, fp_status);
+    return fp_add_sf_sf(float32_val(red), acc, fp_status);
+}
diff --git a/target/hexagon/hex_common.py b/target/hexagon/hex_common.py
index c0e9f26aeb..f6a2848db1 100755
--- a/target/hexagon/hex_common.py
+++ b/target/hexagon/hex_common.py
@@ -215,6 +215,7 @@ def need_env(tag):
             "A_LOAD" in attribdict[tag] or
             "A_CVI_GATHER" in attribdict[tag] or
             "A_CVI_SCATTER" in attribdict[tag] or
+            "A_HVX_IEEE_FP" in attribdict[tag] or
             "A_IMPLICIT_WRITES_USR" in attribdict[tag])
 
 
diff --git a/target/hexagon/imported/mmvec/encode_ext.def b/target/hexagon/imported/mmvec/encode_ext.def
index 6d70086b5f..4ce87d09fd 100644
--- a/target/hexagon/imported/mmvec/encode_ext.def
+++ b/target/hexagon/imported/mmvec/encode_ext.def
@@ -804,5 +804,23 @@ DEF_ENC(V6_vmpyewuh,    ICLASS_CJ" 1 111 111 vvvvv PP 0 uuuuu 101 ddddd")
 DEF_ENC(V6_vmpyowh,        ICLASS_CJ" 1 111 111 vvvvv PP 0 uuuuu 111 ddddd")
 DEF_ENC(V6_vmpyuhvs,"00011111110vvvvvPP1uuuuu111ddddd")
 
+/* IEEE FP multiply instructions */
+DEF_ENC(V6_vmpy_sf_sf,"00011111100vvvvvPP1uuuuu001ddddd")
+DEF_ENC(V6_vmpy_sf_hf,"00011111100vvvvvPP1uuuuu010ddddd")
+DEF_ENC(V6_vmpy_hf_hf,"00011111100vvvvvPP1uuuuu011ddddd")
+DEF_ENC(V6_vdmpy_sf_hf,"00011111101vvvvvPP1uuuuu110ddddd")
+
+/* IEEE FP multiply-accumulate instructions */
+DEF_ENC(V6_vmpy_sf_hf_acc,"00011100010vvvvvPP1uuuuu001xxxxx")
+DEF_ENC(V6_vmpy_hf_hf_acc,"00011100010vvvvvPP1uuuuu010xxxxx")
+DEF_ENC(V6_vdmpy_sf_hf_acc,"00011100010vvvvvPP1uuuuu011xxxxx")
+
+/* IEEE FP add/sub instructions */
+DEF_ENC(V6_vadd_sf_sf,"00011111100vvvvvPP1uuuuu110ddddd")
+DEF_ENC(V6_vsub_sf_sf,"00011111100vvvvvPP1uuuuu111ddddd")
+DEF_ENC(V6_vadd_sf_hf,"00011111100vvvvvPP1uuuuu100ddddd")
+DEF_ENC(V6_vsub_sf_hf,"00011111100vvvvvPP1uuuuu101ddddd")
+DEF_ENC(V6_vadd_hf_hf,"00011111101vvvvvPP1uuuuu111ddddd")
+DEF_ENC(V6_vsub_hf_hf,"00011111011vvvvvPP1uuuuu000ddddd")
 
 #endif /* NO MMVEC */
diff --git a/target/hexagon/imported/mmvec/ext.idef b/target/hexagon/imported/mmvec/ext.idef
index 03d31f6181..3f0d8e366e 100644
--- a/target/hexagon/imported/mmvec/ext.idef
+++ b/target/hexagon/imported/mmvec/ext.idef
@@ -2895,9 +2895,102 @@ EXTINSN(V6_vprefixqw,"Vd32.w=prefixsum(Qv4)",   ATTRIBS(A_EXTENSION,A_CVI,A_CVI_
     }
     } )
 
+/* KVX - IEEE FP Instructions */
 
+/* Single pipe, 32-bit output */
+#define ITERATOR_INSN_IEEE_FP_32(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, \
+ATTRIBS(A_EXTENSION,A_HVX_IEEE_FP,A_CVI,A_CVI_VX,A_HVX_IEEE_FP_OUT_32), \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
 
+/* Single pipe, 16-bit output */
+#define ITERATOR_INSN_IEEE_FP_16(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, \
+ATTRIBS(A_EXTENSION,A_HVX_IEEE_FP,A_CVI,A_CVI_VX,A_HVX_IEEE_FP_OUT_16), \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
 
+/* Two pipes: P2 & P3, single output: P2, 32-bit output */
+#define ITERATOR_INSN_IEEE_FP_DOUBLE_SINGLE_32(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, \
+ATTRIBS(A_EXTENSION,A_HVX_IEEE_FP,A_CVI,A_CVI_VX_DV,A_HVX_IEEE_FP_OUT_32), \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+/* Two pipes: P2 & P3, two outputs, 32-bit output */
+#define ITERATOR_INSN_IEEE_FP_DOUBLE_32(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, \
+ATTRIBS(A_EXTENSION,A_HVX_IEEE_FP,A_CVI,A_CVI_VX_DV,A_HVX_IEEE_FP_OUT_32), \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+/*
+ * single pipe, accumulate instruction, produces 16-bit output, requires 16-bit
+ * accumulate input
+ */
+#define ITERATOR_INSN_IEEE_FP_ACC_16(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, \
+ATTRIBS(A_EXTENSION,A_HVX_IEEE_FP,A_CVI,A_CVI_VX,A_HVX_IEEE_FP_ACC,A_HVX_IEEE_FP_OUT_16,A_CVI_VX_NO_TMP_LD), \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+/*
+ * single pipe, accumulate instruction, produces 32-bit output, requires 32-bit
+ * accumulate input
+ */
+#define ITERATOR_INSN_IEEE_FP_ACC_32(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, \
+ATTRIBS(A_EXTENSION,A_HVX_IEEE_FP,A_CVI,A_CVI_VX,A_HVX_IEEE_FP_ACC,A_HVX_IEEE_FP_OUT_32,A_CVI_VX_NO_TMP_LD), \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+/* IEEE FP multiply instructions */
+ITERATOR_INSN_IEEE_FP_DOUBLE_SINGLE_32(32, vmpy_sf_sf,
+    "Vd32.sf=vmpy(Vu32.sf,Vv32.sf)", "Vector IEEE mul: sf",
+    VdV.sf[i] = fp_mult_sf_sf(VuV.sf[i], VvV.sf[i], &env->fp_status))
+ITERATOR_INSN_IEEE_FP_DOUBLE_32(32, vmpy_sf_hf,
+    "Vdd32.sf=vmpy(Vu32.hf,Vv32.hf)", "Vector IEEE mul: hf widen to sf",
+    VddV.v[0].sf[i] = fp_mult_sf_hf(VuV.hf[2*i], VvV.hf[2*i], &env->fp_status);
+    VddV.v[1].sf[i] = fp_mult_sf_hf(VuV.hf[2*i+1], VvV.hf[2*i+1], &env->fp_status))
+ITERATOR_INSN_IEEE_FP_16(16, vmpy_hf_hf,     "Vd32.hf=vmpy(Vu32.hf,Vv32.hf)",
+    "Vector IEEE mul: hf",
+    VdV.hf[i] = fp_mult_hf_hf(VuV.hf[i], VvV.hf[i], &env->fp_status))
+ITERATOR_INSN_IEEE_FP_32(32, vdmpy_sf_hf,     "Vd32.sf=vdmpy(Vu32.hf,Vv32.hf)",
+    "Vector IEEE mul reduction: hf widen to sf",
+    VdV.sf[i] = fp_vdmpy(VuV.hf[2*i+1], VuV.hf[2*i], VvV.hf[2*i+1],
+        VvV.hf[2*i], &env->fp_status))
+
+/* IEEE FP multiply-accumulate instructions */
+ITERATOR_INSN_IEEE_FP_DOUBLE_32(32, vmpy_sf_hf_acc,
+    "Vxx32.sf+=vmpy(Vu32.hf,Vv32.hf)", "Vector IEEE fma: hf widen to sf",
+    VxxV.v[0].sf[i] = fp_mult_sf_hf_acc(VuV.hf[2*i], VvV.hf[2*i],
+        VxxV.v[0].sf[i], &env->fp_status);
+    VxxV.v[1].sf[i] = fp_mult_sf_hf_acc(VuV.hf[2*i+1], VvV.hf[2*i+1],
+        VxxV.v[1].sf[i], &env->fp_status))
+ITERATOR_INSN_IEEE_FP_ACC_16(16, vmpy_hf_hf_acc,
+    "Vx32.hf+=vmpy(Vu32.hf,Vv32.hf)", "Vector IEEE fma: hf",
+    VxV.hf[i] = fp_mult_hf_hf_acc(VuV.hf[i], VvV.hf[i], VxV.hf[i], &env->fp_status))
+ITERATOR_INSN_IEEE_FP_ACC_32(32, vdmpy_sf_hf_acc,
+    "Vx32.sf+=vdmpy(Vu32.hf,Vv32.hf)", "Vector IEEE fma reduce: hf widen to sf",
+    VxV.sf[i] = fp_vdmpy_acc(VxV.sf[i], VuV.hf[2*i+1], VuV.hf[2*i], VvV.hf[2*i+1],
+        VvV.hf[2*i], &env->fp_status))
+
+/* IEEE FP add/sub instructions */
+ITERATOR_INSN_IEEE_FP_32(32, vadd_sf_sf, "Vd32.sf=vadd(Vu32.sf,Vv32.sf)",
+    "Vector IEEE add: sf",
+    VdV.sf[i] = fp_add_sf_sf(VuV.sf[i], VvV.sf[i], &env->fp_status))
+ITERATOR_INSN_IEEE_FP_32(32, vsub_sf_sf, "Vd32.sf=vsub(Vu32.sf,Vv32.sf)",
+    "Vector IEEE sub: sf",
+    VdV.sf[i] = fp_sub_sf_sf(VuV.sf[i], VvV.sf[i], &env->fp_status))
+ITERATOR_INSN_IEEE_FP_16(16, vadd_hf_hf, "Vd32.hf=vadd(Vu32.hf,Vv32.hf)",
+    "Vector IEEE add: hf",
+    VdV.hf[i] = fp_add_hf_hf(VuV.hf[i], VvV.hf[i], &env->fp_status))
+ITERATOR_INSN_IEEE_FP_16(16, vsub_hf_hf, "Vd32.hf=vsub(Vu32.hf,Vv32.hf)",
+    "Vector IEEE sub: hf",
+    VdV.hf[i] = fp_sub_hf_hf(VuV.hf[i], VvV.hf[i], &env->fp_status))
+ITERATOR_INSN_IEEE_FP_DOUBLE_32(32, vadd_sf_hf,
+    "Vdd32.sf=vadd(Vu32.hf,Vv32.hf)",  "Vector IEEE add: hf widen to sf",
+    VddV.v[0].sf[i] = fp_add_sf_hf(VuV.hf[2*i], VvV.hf[2*i], &env->fp_status);
+    VddV.v[1].sf[i] = fp_add_sf_hf(VuV.hf[2*i+1], VvV.hf[2*i+1], &env->fp_status))
+ITERATOR_INSN_IEEE_FP_DOUBLE_32(32, vsub_sf_hf,
+    "Vdd32.sf=vsub(Vu32.hf,Vv32.hf)",  "Vector IEEE sub: hf widen to sf",
+    VddV.v[0].sf[i] = fp_sub_sf_hf(VuV.hf[2*i], VvV.hf[2*i], &env->fp_status);
+    VddV.v[1].sf[i] = fp_sub_sf_hf(VuV.hf[2*i+1], VvV.hf[2*i+1], &env->fp_status))
 
 /******************************************************************************
  DEBUG Vector/Register Printing
diff --git a/target/hexagon/meson.build b/target/hexagon/meson.build
index d169cf71b2..f9a93975ad 100644
--- a/target/hexagon/meson.build
+++ b/target/hexagon/meson.build
@@ -250,6 +250,7 @@ hexagon_ss.add(files(
     'fma_emu.c',
     'mmvec/decode_ext_mmvec.c',
     'mmvec/system_ext_mmvec.c',
+    'mmvec/kvx_ieee.c',
 ))
 
 #
-- 
2.37.2




* [PATCH 05/13] target/hexagon: add v68 HVX IEEE float min/max insns
  2026-03-23 13:15 [PATCH 00/13] hexagon: add missing HVX float instructions Matheus Tavares Bernardino
                   ` (3 preceding siblings ...)
  2026-03-23 13:15 ` [PATCH 04/13] target/hexagon: add v68 HVX IEEE float arithmetic insns Matheus Tavares Bernardino
@ 2026-03-23 13:15 ` Matheus Tavares Bernardino
  2026-03-23 20:47   ` Taylor Simpson
  2026-03-23 13:15 ` [PATCH 06/13] target/hexagon: add v68 HVX IEEE float misc insns Matheus Tavares Bernardino
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 39+ messages in thread
From: Matheus Tavares Bernardino @ 2026-03-23 13:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: brian.cain, ale, anjo, ltaylorsimpson, marco.liebel, philmd,
	quic_mburton, sid.manning

Add HVX IEEE floating-point min/max instructions:
- vfmin_hf, vfmin_sf: IEEE floating-point minimum
- vfmax_hf, vfmax_sf: IEEE floating-point maximum
- vmax_hf, vmax_sf: qfloat IEEE maximum
- vmin_hf, vmin_sf: qfloat IEEE minimum

The Hexagon qfloat variants are similar to the IEEE-754 ones, but they
handle NaN differently: +NaN is treated as greater than +INF and -NaN as
smaller than -INF. See the comment in kvx_ieee.h.
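
For readers skimming the thread, the qfloat NaN ordering can be sketched in
plain C without softfloat. This is only an illustration: the sketch_* names
and the bit-level helpers are hypothetical and not part of this patch, and
the non-NaN path uses an ordinary float comparison instead of
float32_min/float32_max, so the sign of zero is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers for illustration; not the QEMU softfloat API. */
static bool sketch_is_nan(uint32_t a)
{
    /* NaN: exponent all ones and a non-zero fraction. */
    return (a & 0x7f800000u) == 0x7f800000u && (a & 0x007fffffu) != 0;
}

static bool sketch_is_neg(uint32_t a)
{
    return (a >> 31) != 0;
}

static float sketch_to_float(uint32_t a)
{
    float f;
    memcpy(&f, &a, sizeof(f));
    return f;
}

/* +NaN wins a max, -NaN loses it; otherwise compare as floats. */
uint32_t sketch_qf_max_sf(uint32_t a1, uint32_t a2)
{
    bool pos_nan1 = sketch_is_nan(a1) && !sketch_is_neg(a1);
    bool neg_nan1 = sketch_is_nan(a1) && sketch_is_neg(a1);
    bool pos_nan2 = sketch_is_nan(a2) && !sketch_is_neg(a2);
    bool neg_nan2 = sketch_is_nan(a2) && sketch_is_neg(a2);

    if (pos_nan1 || neg_nan2) {
        return a1;
    }
    if (pos_nan2 || neg_nan1) {
        return a2;
    }
    return sketch_to_float(a1) >= sketch_to_float(a2) ? a1 : a2;
}

/* Mirror image: -NaN wins a min, +NaN loses it. */
uint32_t sketch_qf_min_sf(uint32_t a1, uint32_t a2)
{
    bool pos_nan1 = sketch_is_nan(a1) && !sketch_is_neg(a1);
    bool neg_nan1 = sketch_is_nan(a1) && sketch_is_neg(a1);
    bool pos_nan2 = sketch_is_nan(a2) && !sketch_is_neg(a2);
    bool neg_nan2 = sketch_is_nan(a2) && sketch_is_neg(a2);

    if (pos_nan1 || neg_nan2) {
        return a2;
    }
    if (pos_nan2 || neg_nan1) {
        return a1;
    }
    return sketch_to_float(a1) <= sketch_to_float(a2) ? a1 : a2;
}
```

With these definitions, the max of +NaN (0x7fc00000) and +INF (0x7f800000)
is the +NaN pattern, while the max of -NaN (0xffc00000) and -INF (0xff800000)
is -INF.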

Signed-off-by: Matheus Tavares Bernardino <matheus.bernardino@oss.qualcomm.com>
---
 target/hexagon/mmvec/kvx_ieee.h              | 12 +++++
 target/hexagon/mmvec/kvx_ieee.c              | 46 ++++++++++++++++++++
 target/hexagon/imported/mmvec/encode_ext.def | 11 +++++
 target/hexagon/imported/mmvec/ext.idef       | 28 +++++++++++-
 4 files changed, 96 insertions(+), 1 deletion(-)

diff --git a/target/hexagon/mmvec/kvx_ieee.h b/target/hexagon/mmvec/kvx_ieee.h
index e92ddebeb9..78f546eb8e 100644
--- a/target/hexagon/mmvec/kvx_ieee.h
+++ b/target/hexagon/mmvec/kvx_ieee.h
@@ -44,4 +44,16 @@ uint32_t fp_vdmpy(uint16_t a1, uint16_t a2, uint16_t a3, uint16_t a4,
 uint32_t fp_vdmpy_acc(uint32_t acc, uint16_t a1, uint16_t a2, uint16_t a3,
                       uint16_t a4, float_status *fp_status);
 
+/* IEEE - FP min/max instructions */
+uint32_t fp_min_sf(uint32_t a1, uint32_t a2, float_status *fp_status);
+uint32_t fp_max_sf(uint32_t a1, uint32_t a2, float_status *fp_status);
+uint16_t fp_min_hf(uint16_t a1, uint16_t a2, float_status *fp_status);
+uint16_t fp_max_hf(uint16_t a1, uint16_t a2, float_status *fp_status);
+
+/* Qfloat min/max treat +NaN as greater than +INF and -NaN as smaller than -INF */
+uint32_t qf_max_sf(uint32_t a1, uint32_t a2, float_status *fp_status);
+uint32_t qf_min_sf(uint32_t a1, uint32_t a2, float_status *fp_status);
+uint16_t qf_max_hf(uint16_t a1, uint16_t a2, float_status *fp_status);
+uint16_t qf_min_hf(uint16_t a1, uint16_t a2, float_status *fp_status);
+
 #endif
diff --git a/target/hexagon/mmvec/kvx_ieee.c b/target/hexagon/mmvec/kvx_ieee.c
index b763899aa3..33621a15f3 100644
--- a/target/hexagon/mmvec/kvx_ieee.c
+++ b/target/hexagon/mmvec/kvx_ieee.c
@@ -85,3 +85,49 @@ uint32_t fp_vdmpy_acc(uint32_t acc, uint16_t a1, uint16_t a2,
     float32 red = fp_vdmpy(a1, a2, a3, a4, fp_status);
     return fp_add_sf_sf(float32_val(red), acc, fp_status);
 }
+
+DEF_FP_INSN_2(min_sf, 32, 32, 32, float32_min(f1, f2, fp_status))
+DEF_FP_INSN_2(max_sf, 32, 32, 32, float32_max(f1, f2, fp_status))
+DEF_FP_INSN_2(min_hf, 16, 16, 16, float16_min(f1, f2, fp_status))
+DEF_FP_INSN_2(max_hf, 16, 16, 16, float16_max(f1, f2, fp_status))
+
+#define float32_is_pos_nan(X) (float32_is_any_nan(X) && !float32_is_neg(X))
+#define float32_is_neg_nan(X) (float32_is_any_nan(X) && float32_is_neg(X))
+#define float16_is_pos_nan(X) (float16_is_any_nan(X) && !float16_is_neg(X))
+#define float16_is_neg_nan(X) (float16_is_any_nan(X) && float16_is_neg(X))
+
+uint32_t qf_max_sf(uint32_t a1, uint32_t a2, float_status *fp_status)
+{
+    float32 f1 = make_float32(a1);
+    float32 f2 = make_float32(a2);
+    if (float32_is_pos_nan(f1) || float32_is_neg_nan(f2)) return a1;
+    if (float32_is_pos_nan(f2) || float32_is_neg_nan(f1)) return a2;
+    return fp_max_sf(a1, a2, fp_status);
+}
+
+uint32_t qf_min_sf(uint32_t a1, uint32_t a2, float_status *fp_status)
+{
+    float32 f1 = make_float32(a1);
+    float32 f2 = make_float32(a2);
+    if (float32_is_pos_nan(f1) || float32_is_neg_nan(f2)) return a2;
+    if (float32_is_pos_nan(f2) || float32_is_neg_nan(f1)) return a1;
+    return fp_min_sf(a1, a2, fp_status);
+}
+
+uint16_t qf_max_hf(uint16_t a1, uint16_t a2, float_status *fp_status)
+{
+    float16 f1 = make_float16(a1);
+    float16 f2 = make_float16(a2);
+    if (float16_is_pos_nan(f1) || float16_is_neg_nan(f2)) return a1;
+    if (float16_is_pos_nan(f2) || float16_is_neg_nan(f1)) return a2;
+    return fp_max_hf(a1, a2, fp_status);
+}
+
+uint16_t qf_min_hf(uint16_t a1, uint16_t a2, float_status *fp_status)
+{
+    float16 f1 = make_float16(a1);
+    float16 f2 = make_float16(a2);
+    if (float16_is_pos_nan(f1) || float16_is_neg_nan(f2)) return a2;
+    if (float16_is_pos_nan(f2) || float16_is_neg_nan(f1)) return a1;
+    return fp_min_hf(a1, a2, fp_status);
+}
diff --git a/target/hexagon/imported/mmvec/encode_ext.def b/target/hexagon/imported/mmvec/encode_ext.def
index 4ce87d09fd..23fbb75743 100644
--- a/target/hexagon/imported/mmvec/encode_ext.def
+++ b/target/hexagon/imported/mmvec/encode_ext.def
@@ -823,4 +823,15 @@ DEF_ENC(V6_vsub_sf_hf,"00011111100vvvvvPP1uuuuu101ddddd")
 DEF_ENC(V6_vadd_hf_hf,"00011111101vvvvvPP1uuuuu111ddddd")
 DEF_ENC(V6_vsub_hf_hf,"00011111011vvvvvPP1uuuuu000ddddd")
 
+/* IEEE FP min/max instructions */
+DEF_ENC(V6_vfmin_hf,"00011100011vvvvvPP1uuuuu000ddddd")
+DEF_ENC(V6_vfmin_sf,"00011100011vvvvvPP1uuuuu001ddddd")
+DEF_ENC(V6_vfmax_hf,"00011100011vvvvvPP1uuuuu010ddddd")
+DEF_ENC(V6_vfmax_sf,"00011100011vvvvvPP1uuuuu011ddddd")
+DEF_ENC(V6_vmax_sf,"00011111110vvvvvPP1uuuuu001ddddd")
+DEF_ENC(V6_vmin_sf,"00011111110vvvvvPP1uuuuu010ddddd")
+DEF_ENC(V6_vmax_hf,"00011111110vvvvvPP1uuuuu011ddddd")
+DEF_ENC(V6_vmin_hf,"00011111110vvvvvPP1uuuuu100ddddd")
+DEF_ENC(V6_vcvt_ub_hf,"00011111110vvvvvPP1uuuuu101ddddd")
+
 #endif /* NO MMVEC */
diff --git a/target/hexagon/imported/mmvec/ext.idef b/target/hexagon/imported/mmvec/ext.idef
index 3f0d8e366e..43153366b1 100644
--- a/target/hexagon/imported/mmvec/ext.idef
+++ b/target/hexagon/imported/mmvec/ext.idef
@@ -43,7 +43,9 @@
 EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA),  \
 DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
 
-
+#define ITERATOR_INSN_ANY_SLOT_2SRC(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA,A_CVI_VA_2SRC,A_HVX_FLT),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
 
 #define ITERATOR_INSN2_ANY_SLOT(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
 ITERATOR_INSN_ANY_SLOT(WIDTH,TAG,SYNTAX2,DESCR,CODE)
@@ -2992,6 +2994,30 @@ ITERATOR_INSN_IEEE_FP_DOUBLE_32(32, vsub_sf_hf,
     VddV.v[0].sf[i] = fp_sub_sf_hf(VuV.hf[2*i], VvV.hf[2*i], &env->fp_status);
     VddV.v[1].sf[i] = fp_sub_sf_hf(VuV.hf[2*i+1], VvV.hf[2*i+1], &env->fp_status))
 
+#define ITERATOR_INSN_IEEE_FP_16_32_LATE(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, \
+        ATTRIBS(A_EXTENSION,A_HVX_IEEE_FP,A_CVI,A_CVI_VX,A_HVX_IEEE_FP_OUT_16,A_HVX_IEEE_FP_OUT_32), \
+        DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+/* IEEE FP min/max instructions */
+ITERATOR_INSN_IEEE_FP_16_32_LATE(16, vfmin_hf, "Vd32.hf=vfmin(Vu32.hf,Vv32.hf)", \
+    "Vector IEEE min: hf",  VdV.hf[i] = fp_min_hf(VuV.hf[i], VvV.hf[i], &env->fp_status))
+ITERATOR_INSN_IEEE_FP_16_32_LATE(32, vfmin_sf, "Vd32.sf=vfmin(Vu32.sf,Vv32.sf)", \
+    "Vector IEEE min: sf",  VdV.sf[i] = fp_min_sf(VuV.sf[i], VvV.sf[i], &env->fp_status))
+ITERATOR_INSN_IEEE_FP_16_32_LATE(16, vfmax_hf,  "Vd32.hf=vfmax(Vu32.hf,Vv32.hf)", \
+    "Vector IEEE max: hf", VdV.hf[i] = fp_max_hf(VuV.hf[i], VvV.hf[i], &env->fp_status))
+ITERATOR_INSN_IEEE_FP_16_32_LATE(32, vfmax_sf,  "Vd32.sf=vfmax(Vu32.sf,Vv32.sf)", \
+    "Vector IEEE max: sf", VdV.sf[i] = fp_max_sf(VuV.sf[i], VvV.sf[i], &env->fp_status))
+
+ITERATOR_INSN_ANY_SLOT_2SRC(32,vmax_sf,"Vd32.sf=vmax(Vu32.sf,Vv32.sf)", \
+    "Vector max of sf input", VdV.sf[i] = qf_max_sf(VuV.sf[i], VvV.sf[i], &env->fp_status))
+ITERATOR_INSN_ANY_SLOT_2SRC(32,vmin_sf,"Vd32.sf=vmin(Vu32.sf,Vv32.sf)", \
+    "Vector min of sf input", VdV.sf[i] = qf_min_sf(VuV.sf[i], VvV.sf[i], &env->fp_status))
+ITERATOR_INSN_ANY_SLOT_2SRC(16,vmax_hf,"Vd32.hf=vmax(Vu32.hf,Vv32.hf)", \
+    "Vector max of hf input", VdV.hf[i] = qf_max_hf(VuV.hf[i], VvV.hf[i], &env->fp_status))
+ITERATOR_INSN_ANY_SLOT_2SRC(16,vmin_hf,"Vd32.hf=vmin(Vu32.hf,Vv32.hf)", \
+    "Vector min of hf input", VdV.hf[i] = qf_min_hf(VuV.hf[i], VvV.hf[i], &env->fp_status))
+
 /******************************************************************************
  DEBUG Vector/Register Printing
  ******************************************************************************/
-- 
2.37.2




* [PATCH 06/13] target/hexagon: add v68 HVX IEEE float misc insns
  2026-03-23 13:15 [PATCH 00/13] hexagon: add missing HVX float instructions Matheus Tavares Bernardino
                   ` (4 preceding siblings ...)
  2026-03-23 13:15 ` [PATCH 05/13] target/hexagon: add v68 HVX IEEE float min/max insns Matheus Tavares Bernardino
@ 2026-03-23 13:15 ` Matheus Tavares Bernardino
  2026-03-23 21:08   ` Taylor Simpson
  2026-03-23 13:15 ` [PATCH 07/13] target/hexagon: add v68 HVX IEEE float conversion insns Matheus Tavares Bernardino
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 39+ messages in thread
From: Matheus Tavares Bernardino @ 2026-03-23 13:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: brian.cain, ale, anjo, ltaylorsimpson, marco.liebel, philmd,
	quic_mburton, sid.manning

Add HVX IEEE floating-point miscellaneous instructions:
- vassign_fp (vfmv): vector move
- vfneg_hf, vfneg_sf: vector floating-point negate
- vabs_hf, vabs_sf: vector absolute value
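
As a rough illustration of what negate and abs amount to on the raw element
bits, the sketch below flips or clears the IEEE sign bit. The sketch_* names
are hypothetical and exist only for this example. Clearing the sign bit with
a mask is assumed to be equivalent to the "flip it if set" form, since both
leave the sign bit at zero.

```c
#include <assert.h>
#include <stdint.h>

/* Negate: toggle the sign bit, leave exponent and fraction untouched. */
uint32_t sketch_fneg_sf(uint32_t a)
{
    return a ^ 0x80000000u;
}

uint16_t sketch_fneg_hf(uint16_t a)
{
    return (uint16_t)(a ^ 0x8000u);
}

/* Abs: force the sign bit to zero. */
uint32_t sketch_fabs_sf(uint32_t a)
{
    return a & 0x7fffffffu;
}

uint16_t sketch_fabs_hf(uint16_t a)
{
    return (uint16_t)(a & 0x7fffu);
}
```

Because these are pure bit operations, NaN payloads pass through unchanged
and only the sign is affected, e.g. sketch_fabs_hf(0xfe00) gives 0x7e00.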

Signed-off-by: Matheus Tavares Bernardino <matheus.bernardino@oss.qualcomm.com>
---
 target/hexagon/mmvec/kvx_ieee.h              |  3 +++
 target/hexagon/imported/mmvec/encode_ext.def |  7 +++++++
 target/hexagon/imported/mmvec/ext.idef       | 14 ++++++++++++++
 3 files changed, 24 insertions(+)

diff --git a/target/hexagon/mmvec/kvx_ieee.h b/target/hexagon/mmvec/kvx_ieee.h
index 78f546eb8e..263feb7e94 100644
--- a/target/hexagon/mmvec/kvx_ieee.h
+++ b/target/hexagon/mmvec/kvx_ieee.h
@@ -13,6 +13,9 @@
 #define FP32_DEF_NAN      0x7FFFFFFF
 #define FP16_DEF_NAN      0x7FFF
 
+#define signF32UI(a) ((bool)((uint32_t)(a) >> 31))
+#define signF16UI(a) ((bool)((uint16_t)(a) >> 15))
+
 /*
  * IEEE - FP ADD/SUB/MPY instructions
  */
diff --git a/target/hexagon/imported/mmvec/encode_ext.def b/target/hexagon/imported/mmvec/encode_ext.def
index 23fbb75743..7138e593dd 100644
--- a/target/hexagon/imported/mmvec/encode_ext.def
+++ b/target/hexagon/imported/mmvec/encode_ext.def
@@ -834,4 +834,11 @@ DEF_ENC(V6_vmax_hf,"00011111110vvvvvPP1uuuuu011ddddd")
 DEF_ENC(V6_vmin_hf,"00011111110vvvvvPP1uuuuu100ddddd")
 DEF_ENC(V6_vcvt_ub_hf,"00011111110vvvvvPP1uuuuu101ddddd")
 
+/* IEEE FP move, negate, abs instructions */
+DEF_ENC(V6_vassign_fp,"00011110--0-0110PP1uuuuu001ddddd")
+DEF_ENC(V6_vfneg_hf,"00011110--0-0110PP1uuuuu010ddddd")
+DEF_ENC(V6_vfneg_sf,"00011110--0-0110PP1uuuuu011ddddd")
+DEF_ENC(V6_vabs_hf,"00011110--0-0110PP1uuuuu100ddddd")
+DEF_ENC(V6_vabs_sf,"00011110--0-0110PP1uuuuu101ddddd")
+
 #endif /* NO MMVEC */
diff --git a/target/hexagon/imported/mmvec/ext.idef b/target/hexagon/imported/mmvec/ext.idef
index 43153366b1..5ef5baa404 100644
--- a/target/hexagon/imported/mmvec/ext.idef
+++ b/target/hexagon/imported/mmvec/ext.idef
@@ -3018,6 +3018,20 @@ ITERATOR_INSN_ANY_SLOT_2SRC(16,vmax_hf,"Vd32.hf=vmax(Vu32.hf,Vv32.hf)", \
 ITERATOR_INSN_ANY_SLOT_2SRC(16,vmin_hf,"Vd32.hf=vmin(Vu32.hf,Vv32.hf)", \
     "Vector min of hf input", VdV.hf[i] = qf_min_hf(VuV.hf[i], VvV.hf[i], &env->fp_status))
 
+/* IEEE FP move, negate, abs instructions */
+ITERATOR_INSN_IEEE_FP_16_32_LATE(32, vassign_fp, "Vd32.w=vfmv(Vu32.w)", \
+    "Vector IEEE move", VdV.w[i]  = VuV.w[i])
+ITERATOR_INSN_IEEE_FP_16_32_LATE(16, vfneg_hf, "Vd32.hf=vfneg(Vu32.hf)", \
+    "Vector IEEE neg: hf", VdV.hf[i] = (VuV.hf[i] ^ 0x8000))
+ITERATOR_INSN_IEEE_FP_16_32_LATE(32, vfneg_sf, "Vd32.sf=vfneg(Vu32.sf)", \
+    "Vector IEEE neg: sf", VdV.sf[i] = (VuV.sf[i] ^ 0x80000000))
+ITERATOR_INSN_IEEE_FP_16_32_LATE(16, vabs_hf,  "Vd32.hf=vabs(Vu32.hf)", \
+    "Vector IEEE abs: hf", \
+    VdV.hf[i] = ((signF16UI(VuV.hf[i])) ? (VuV.hf[i] ^ 0x8000) : VuV.hf[i]))
+ITERATOR_INSN_IEEE_FP_16_32_LATE(32, vabs_sf,  "Vd32.sf=vabs(Vu32.sf)", \
+    "Vector IEEE abs: sf", \
+    VdV.sf[i] = ((signF32UI(VuV.sf[i])) ? (VuV.sf[i] ^ 0x80000000) : VuV.sf[i]))
+
 /******************************************************************************
  DEBUG Vector/Register Printing
  ******************************************************************************/
-- 
2.37.2




* [PATCH 07/13] target/hexagon: add v68 HVX IEEE float conversion insns
  2026-03-23 13:15 [PATCH 00/13] hexagon: add missing HVX float instructions Matheus Tavares Bernardino
                   ` (5 preceding siblings ...)
  2026-03-23 13:15 ` [PATCH 06/13] target/hexagon: add v68 HVX IEEE float misc insns Matheus Tavares Bernardino
@ 2026-03-23 13:15 ` Matheus Tavares Bernardino
  2026-03-23 21:25   ` Taylor Simpson
  2026-03-23 13:15 ` [PATCH 08/13] target/hexagon: add v68 HVX IEEE float compare insns Matheus Tavares Bernardino
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 39+ messages in thread
From: Matheus Tavares Bernardino @ 2026-03-23 13:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: brian.cain, ale, anjo, ltaylorsimpson, marco.liebel, philmd,
	quic_mburton, sid.manning

Add HVX IEEE floating-point conversion instructions:
- vconv_hf_h, vconv_h_hf, vconv_sf_w, vconv_w_sf: same-width int <-> float
  conversions
- vcvt_hf_sf, vcvt_sf_hf: float <-> half float conversions
- vcvt_hf_b, vcvt_hf_h, vcvt_hf_ub, vcvt_hf_uh: int to half float
- vcvt_b_hf, vcvt_h_hf, vcvt_ub_hf, vcvt_uh_hf: half float to int
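
The saturating float-to-int direction of vconv can be sketched with host
floats as below. This is an approximation for illustration only:
sketch_conv_w_sf is a hypothetical name, it takes a host float rather than
raw bits, and it does not touch any floating-point status flags. It assumes
the intended behavior is "NaN, infinity, or out-of-range saturates by sign,
everything else truncates toward zero".

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

int32_t sketch_conv_w_sf(float f)
{
    /*
     * INT32_MAX is not representable as a float; converting it rounds
     * to 2^31, so any input at or beyond +/-2^31 is treated as saturated.
     */
    if (isnan(f) || isinf(f) ||
        f >= 2147483648.0f || f <= -2147483648.0f) {
        return signbit(f) ? INT32_MIN : INT32_MAX;
    }
    /* In range: the C cast truncates toward zero. */
    return (int32_t)f;
}
```

For example, 1.9f converts to 1, -1.9f to -1, and 3e9f saturates to
INT32_MAX.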

Signed-off-by: Matheus Tavares Bernardino <matheus.bernardino@oss.qualcomm.com>
---
 target/hexagon/mmvec/kvx_ieee.h              | 21 +++++
 target/hexagon/mmvec/kvx_ieee.c              | 98 ++++++++++++++++++++
 target/hexagon/imported/mmvec/encode_ext.def | 18 ++++
 target/hexagon/imported/mmvec/ext.idef       | 97 +++++++++++++++++++
 4 files changed, 234 insertions(+)

diff --git a/target/hexagon/mmvec/kvx_ieee.h b/target/hexagon/mmvec/kvx_ieee.h
index 263feb7e94..8a6816f6b3 100644
--- a/target/hexagon/mmvec/kvx_ieee.h
+++ b/target/hexagon/mmvec/kvx_ieee.h
@@ -59,4 +59,25 @@ uint32_t qf_min_sf(uint32_t a1, uint32_t a2, float_status *fp_status);
 uint16_t qf_max_hf(uint16_t a1, uint16_t a2, float_status *fp_status);
 uint16_t qf_min_hf(uint16_t a1, uint16_t a2, float_status *fp_status);
 
+/*
+ * IEEE - FP Convert instructions
+ */
+uint16_t f32_to_f16(uint32_t a, float_status *fp_status);
+uint32_t f16_to_f32(uint16_t a, float_status *fp_status);
+
+uint16_t f16_to_uh(uint16_t op1, float_status *fp_status);
+int16_t  f16_to_h(uint16_t op1, float_status *fp_status);
+uint8_t  f16_to_ub(uint16_t op1, float_status *fp_status);
+int8_t   f16_to_b(uint16_t op1, float_status *fp_status);
+
+uint16_t uh_to_f16(uint16_t op1);
+uint16_t h_to_f16(int16_t op1);
+uint16_t ub_to_f16(uint8_t op1);
+uint16_t b_to_f16(int8_t op1);
+
+int32_t conv_sf_w(int32_t a, float_status *fp_status);
+int16_t conv_hf_h(int16_t a, float_status *fp_status);
+int32_t conv_w_sf(uint32_t a, float_status *fp_status);
+int16_t conv_h_hf(uint16_t a, float_status *fp_status);
+
 #endif
diff --git a/target/hexagon/mmvec/kvx_ieee.c b/target/hexagon/mmvec/kvx_ieee.c
index 33621a15f3..bbeec09707 100644
--- a/target/hexagon/mmvec/kvx_ieee.c
+++ b/target/hexagon/mmvec/kvx_ieee.c
@@ -131,3 +131,101 @@ uint16_t qf_min_hf(uint16_t a1, uint16_t a2, float_status *fp_status)
     if (float16_is_pos_nan(f2) || float16_is_neg_nan(f1)) return a1;
     return fp_min_hf(a1, a2, fp_status);
 }
+
+uint16_t f32_to_f16(uint32_t a, float_status *fp_status)
+{
+    return float16_val(float32_to_float16(make_float32(a), true, fp_status));
+}
+
+uint32_t f16_to_f32(uint16_t a, float_status *fp_status)
+{
+    return float32_val(float16_to_float32(make_float16(a), true, fp_status));
+}
+
+uint16_t f16_to_uh(uint16_t op1, float_status *fp_status)
+{
+    return float16_to_uint16_scalbn(make_float16(op1),
+                                    float_round_nearest_even,
+                                    0, fp_status);
+}
+
+int16_t f16_to_h(uint16_t op1, float_status *fp_status)
+{
+    return float16_to_int16_scalbn(make_float16(op1),
+                                   float_round_nearest_even,
+                                   0, fp_status);
+}
+
+uint8_t f16_to_ub(uint16_t op1, float_status *fp_status)
+{
+    return float16_to_uint8_scalbn(make_float16(op1),
+                                   float_round_nearest_even,
+                                   0, fp_status);
+}
+
+int8_t f16_to_b(uint16_t op1, float_status *fp_status)
+{
+    return float16_to_int8_scalbn(make_float16(op1),
+                                  float_round_nearest_even,
+                                  0, fp_status);
+}
+
+uint16_t uh_to_f16(uint16_t op1)
+{
+    return uint64_to_float16_scalbn(op1, float_round_nearest_even, 0);
+}
+
+uint16_t h_to_f16(int16_t op1)
+{
+    return int64_to_float16_scalbn(op1, float_round_nearest_even, 0);
+}
+
+uint16_t ub_to_f16(uint8_t op1)
+{
+    return uint64_to_float16_scalbn(op1, float_round_nearest_even, 0);
+}
+
+uint16_t b_to_f16(int8_t op1)
+{
+    return int64_to_float16_scalbn(op1, float_round_nearest_even, 0);
+}
+
+int32_t conv_sf_w(int32_t a, float_status *fp_status)
+{
+    return float32_val(int32_to_float32(a, fp_status));
+}
+
+int16_t conv_hf_h(int16_t a, float_status *fp_status)
+{
+    return float16_val(int16_to_float16(a, fp_status));
+}
+
+int32_t conv_w_sf(uint32_t a, float_status *fp_status)
+{
+    float_status scratch_fpst = {};
+    const float32 W_MAX = int32_to_float32(INT32_MAX, &scratch_fpst);
+    const float32 W_MIN = int32_to_float32(INT32_MIN, &scratch_fpst);
+    float32 f1 = make_float32(a);
+
+    if (float32_is_any_nan(f1) || float32_is_infinity(f1) ||
+        float32_le_quiet(W_MAX, f1, fp_status) ||
+        float32_le_quiet(f1, W_MIN, fp_status)) {
+        return float32_is_neg(f1) ? INT32_MIN : INT32_MAX;
+    }
+    return float32_to_int32_round_to_zero(f1, fp_status);
+}
+
+int16_t conv_h_hf(uint16_t a, float_status *fp_status)
+{
+    float_status scratch_fpst = {};
+    const float16 H_MAX = int16_to_float16(INT16_MAX, &scratch_fpst);
+    const float16 H_MIN = int16_to_float16(INT16_MIN, &scratch_fpst);
+    float16 f1 = make_float16(a);
+
+    if (float16_is_any_nan(f1) || float16_is_infinity(f1) ||
+        float16_le_quiet(H_MAX, f1, fp_status) ||
+        float16_le_quiet(f1, H_MIN, fp_status)) {
+        return float16_is_neg(f1) ? INT16_MIN : INT16_MAX;
+    }
+    return float16_to_int16_round_to_zero(f1, fp_status);
+}
diff --git a/target/hexagon/imported/mmvec/encode_ext.def b/target/hexagon/imported/mmvec/encode_ext.def
index 7138e593dd..5325bbd704 100644
--- a/target/hexagon/imported/mmvec/encode_ext.def
+++ b/target/hexagon/imported/mmvec/encode_ext.def
@@ -841,4 +841,22 @@ DEF_ENC(V6_vfneg_sf,"00011110--0-0110PP1uuuuu011ddddd")
 DEF_ENC(V6_vabs_hf,"00011110--0-0110PP1uuuuu100ddddd")
 DEF_ENC(V6_vabs_sf,"00011110--0-0110PP1uuuuu101ddddd")
 
+/* IEEE FP vcvt instructions */
+DEF_ENC(V6_vcvt_sf_hf,"00011110--0-0100PP1uuuuu100ddddd")
+DEF_ENC(V6_vcvt_hf_sf,"00011111011vvvvvPP1uuuuu001ddddd")
+DEF_ENC(V6_vcvt_hf_ub,"00011110--0-0100PP1uuuuu001ddddd")
+DEF_ENC(V6_vcvt_hf_b,"00011110--0-0100PP1uuuuu010ddddd")
+DEF_ENC(V6_vcvt_hf_uh,"00011110--0-0100PP1uuuuu101ddddd")
+DEF_ENC(V6_vcvt_hf_h,"00011110--0-0100PP1uuuuu111ddddd")
+DEF_ENC(V6_vcvt_uh_hf,"00011110--0--101PP1uuuuu000ddddd")
+DEF_ENC(V6_vcvt_h_hf,"00011110--0-0110PP1uuuuu000ddddd")
+DEF_ENC(V6_vcvt_ub_hf,"00011111110vvvvvPP1uuuuu101ddddd")
+DEF_ENC(V6_vcvt_b_hf,"00011111110vvvvvPP1uuuuu110ddddd")
+
+/* IEEE FP vconv instructions */
+DEF_ENC(V6_vconv_sf_w,"00011110--0--101PP1uuuuu011ddddd")
+DEF_ENC(V6_vconv_w_sf,"00011110--0--101PP1uuuuu001ddddd")
+DEF_ENC(V6_vconv_hf_h,"00011110--0--101PP1uuuuu100ddddd")
+DEF_ENC(V6_vconv_h_hf,"00011110--0--101PP1uuuuu010ddddd")
+
 #endif /* NO MMVEC */
diff --git a/target/hexagon/imported/mmvec/ext.idef b/target/hexagon/imported/mmvec/ext.idef
index 5ef5baa404..8b832166e0 100644
--- a/target/hexagon/imported/mmvec/ext.idef
+++ b/target/hexagon/imported/mmvec/ext.idef
@@ -63,6 +63,9 @@ ITERATOR_INSN_ANY_SLOT_DOUBLE_VEC(WIDTH,TAG,SYNTAX2,DESCR,CODE)
 EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VS),  \
 DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
 
+#define ITERATOR_INSN_SHIFT_SLOT_FLT(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VS,A_HVX_FLT),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
 
 #define ITERATOR_INSN_SHIFT3_SLOT(WIDTH,TAG,SYNTAX,DESCR,CODE) \
 EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VS,A_CVI_VS_3SRC,A_NOTE_SHIFT_RESOURCE,A_NOTE_NOVP,A_NOTE_VA_UNARY),  \
@@ -3032,6 +3035,100 @@ ITERATOR_INSN_IEEE_FP_16_32_LATE(32, vabs_sf,  "Vd32.sf=vabs(Vu32.sf)", \
     "Vector IEEE abs: sf", \
     VdV.sf[i] = ((signF32UI(VuV.sf[i])) ? (VuV.sf[i] ^ 0x80000000) : VuV.sf[i]))
 
+/* Two pipes: P2 & P3, two outputs, 16-bit */
+#define ITERATOR_INSN_IEEE_FP_DOUBLE_16(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, \
+ATTRIBS(A_EXTENSION,A_HVX_IEEE_FP,A_CVI,A_CVI_VX_DV,A_HVX_IEEE_FP_OUT_16), \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+/* Two pipes: P2 & P3, two outputs, 32-bit output */
+#define ITERATOR_INSN_IEEE_FP_DOUBLE_32(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, \
+    ATTRIBS(A_EXTENSION,A_HVX_IEEE_FP,A_CVI,A_CVI_VX_DV,A_HVX_IEEE_FP_OUT_32), \
+    DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+/* Single pipe, 16-bit output */
+#define ITERATOR_INSN_IEEE_FP_16(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, \
+    ATTRIBS(A_EXTENSION,A_HVX_IEEE_FP,A_CVI,A_CVI_VX,A_HVX_IEEE_FP_OUT_16), \
+    DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+/* single pipe, output can feed 16- or 32-bit accumulate */
+#define ITERATOR_INSN_IEEE_FP_16_32(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, \
+    ATTRIBS(A_EXTENSION,A_HVX_IEEE_FP,A_CVI,A_CVI_VX,A_HVX_IEEE_FP_OUT_16,A_HVX_IEEE_FP_OUT_32), \
+    DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+/******************************************************************************
+ * IEEE FP convert instructions
+ ******************************************************************************/
+
+ITERATOR_INSN_IEEE_FP_DOUBLE_16(32,  vcvt_hf_ub, "Vdd32.hf=vcvt(Vu32.ub)",
+    "Vector IEEE cvt from int: ub widen to hf",
+    VddV.v[0].hf[2*i]   = ub_to_f16(VuV.ub[4*i]);
+    VddV.v[0].hf[2*i+1] = ub_to_f16(VuV.ub[4*i+1]);
+    VddV.v[1].hf[2*i]   = ub_to_f16(VuV.ub[4*i+2]);
+    VddV.v[1].hf[2*i+1] = ub_to_f16(VuV.ub[4*i+3]))
+
+ITERATOR_INSN_IEEE_FP_DOUBLE_16(32,  vcvt_hf_b,  "Vdd32.hf=vcvt(Vu32.b)",
+    "Vector IEEE cvt from int: b widen to hf",
+    VddV.v[0].hf[2*i]   = b_to_f16(VuV.b[4*i]);
+    VddV.v[0].hf[2*i+1] = b_to_f16(VuV.b[4*i+1]);
+    VddV.v[1].hf[2*i]   = b_to_f16(VuV.b[4*i+2]);
+    VddV.v[1].hf[2*i+1] = b_to_f16(VuV.b[4*i+3]))
+
+ITERATOR_INSN_IEEE_FP_DOUBLE_32(32, vcvt_sf_hf, "Vdd32.sf=vcvt(Vu32.hf)",
+    "Vector IEEE cvt: hf widen to sf",
+    VddV.v[0].sf[i] = f16_to_f32(VuV.hf[2*i], &env->fp_status);
+    VddV.v[1].sf[i] = f16_to_f32(VuV.hf[2*i+1], &env->fp_status))
+
+ITERATOR_INSN_IEEE_FP_16(16,    vcvt_hf_uh, "Vd32.hf=vcvt(Vu32.uh)",
+    "Vector IEEE cvt from int: uh to hf",
+    VdV.hf[i] = uh_to_f16(VuV.uh[i]))
+ITERATOR_INSN_IEEE_FP_16(16,    vcvt_hf_h,  "Vd32.hf=vcvt(Vu32.h)",
+    "Vector IEEE cvt from int: h to hf",
+    VdV.hf[i] = h_to_f16(VuV.h[i]))
+ITERATOR_INSN_IEEE_FP_16_32(16, vcvt_uh_hf, "Vd32.uh=vcvt(Vu32.hf)",
+    "Vector IEEE cvt to int: hf to uh",
+    VdV.uh[i] = f16_to_uh(VuV.hf[i], &env->fp_status))
+ITERATOR_INSN_IEEE_FP_16_32(16, vcvt_h_hf,  "Vd32.h=vcvt(Vu32.hf)",
+    "Vector IEEE cvt to int: hf to h",
+    VdV.h[i]  = f16_to_h(VuV.hf[i], &env->fp_status))
+
+ITERATOR_INSN_IEEE_FP_16(32, vcvt_hf_sf, "Vd32.hf=vcvt(Vu32.sf,Vv32.sf)",
+    "Vector IEEE cvt: sf to hf",
+    VdV.hf[2*i]   = f32_to_f16(VuV.sf[i], &env->fp_status);
+    VdV.hf[2*i+1] = f32_to_f16(VvV.sf[i], &env->fp_status))
+
+ITERATOR_INSN_IEEE_FP_16_32(32, vcvt_ub_hf, "Vd32.ub=vcvt(Vu32.hf,Vv32.hf)", "Vector cvt to int: hf narrow to ub",
+    VdV.ub[4*i]   = f16_to_ub(VuV.hf[2*i], &env->fp_status);
+    VdV.ub[4*i+1] = f16_to_ub(VuV.hf[2*i+1], &env->fp_status);
+    VdV.ub[4*i+2] = f16_to_ub(VvV.hf[2*i], &env->fp_status);
+    VdV.ub[4*i+3] = f16_to_ub(VvV.hf[2*i+1], &env->fp_status))
+
+ITERATOR_INSN_IEEE_FP_16_32(32, vcvt_b_hf,  "Vd32.b=vcvt(Vu32.hf,Vv32.hf)",
+    "Vector cvt to int: hf narrow to b",
+    VdV.b[4*i]   = f16_to_b(VuV.hf[2*i], &env->fp_status);
+    VdV.b[4*i+1] = f16_to_b(VuV.hf[2*i+1], &env->fp_status);
+    VdV.b[4*i+2] = f16_to_b(VvV.hf[2*i], &env->fp_status);
+    VdV.b[4*i+3] = f16_to_b(VvV.hf[2*i+1], &env->fp_status))
+
+ITERATOR_INSN_SHIFT_SLOT_FLT(32, vconv_w_sf,"Vd32.w=Vu32.sf",
+    "Vector conversion of sf32 format to int w",
+    VdV.w[i] = conv_w_sf(VuV.sf[i], &env->fp_status))
+
+ITERATOR_INSN_SHIFT_SLOT_FLT(16, vconv_h_hf,"Vd32.h=Vu32.hf",
+    "Vector conversion of hf16 format to int hw",
+    VdV.h[i] = conv_h_hf(VuV.hf[i], &env->fp_status))
+
+ITERATOR_INSN_SHIFT_SLOT_FLT(32, vconv_sf_w,"Vd32.sf=Vu32.w",
+    "Vector conversion of int w format to sf32",
+    VdV.sf[i] = conv_sf_w(VuV.w[i], &env->fp_status))
+
+ITERATOR_INSN_SHIFT_SLOT_FLT(16, vconv_hf_h,"Vd32.hf=Vu32.h",
+    "Vector conversion of int hw format to hf16",
+    VdV.hf[i] = conv_hf_h(VuV.h[i], &env->fp_status))
+
 /******************************************************************************
  DEBUG Vector/Register Printing
  ******************************************************************************/
-- 
2.37.2




* [PATCH 08/13] target/hexagon: add v68 HVX IEEE float compare insns
  2026-03-23 13:15 [PATCH 00/13] hexagon: add missing HVX float instructions Matheus Tavares Bernardino
                   ` (6 preceding siblings ...)
  2026-03-23 13:15 ` [PATCH 07/13] target/hexagon: add v68 HVX IEEE float conversion insns Matheus Tavares Bernardino
@ 2026-03-23 13:15 ` Matheus Tavares Bernardino
  2026-03-23 21:42   ` Taylor Simpson
  2026-03-23 13:15 ` [PATCH 09/13] target/hexagon: add v73 HVX IEEE bfloat16 insns Matheus Tavares Bernardino
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 39+ messages in thread
From: Matheus Tavares Bernardino @ 2026-03-23 13:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: brian.cain, ale, anjo, ltaylorsimpson, marco.liebel, philmd,
	quic_mburton, sid.manning

Add HVX IEEE floating-point compare instructions:
- V6_vgthf, V6_vgtsf: greater-than compare
- V6_vgthf_and, V6_vgtsf_and: greater-than with predicate-and
- V6_vgthf_or, V6_vgtsf_or: greater-than with predicate-or
- V6_vgthf_xor, V6_vgtsf_xor: greater-than with predicate-xor
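
The greater-than compare added here falls back to a signed integer
comparison of the raw bits when either operand is NaN. A standalone sketch
of that rule, using hypothetical sketch_* helpers rather than the softfloat
calls from the patch, might look like this. It assumes the usual
two's-complement result when casting uint32_t to int32_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers for illustration; not the QEMU softfloat API. */
static bool sketch_cmp_is_nan(uint32_t a)
{
    return (a & 0x7f800000u) == 0x7f800000u && (a & 0x007fffffu) != 0;
}

static float sketch_cmp_to_float(uint32_t a)
{
    float f;
    memcpy(&f, &a, sizeof(f));
    return f;
}

bool sketch_cmpgt_sf(uint32_t a, uint32_t b)
{
    if (sketch_cmp_is_nan(a) || sketch_cmp_is_nan(b)) {
        /* NaN involved: order the raw bit patterns as signed integers. */
        return (int32_t)a > (int32_t)b;
    }
    /* No NaN: ordinary floating-point greater-than. */
    return sketch_cmp_to_float(a) > sketch_cmp_to_float(b);
}
```

Under this rule a +NaN (0x7fc00000) compares greater than +INF (0x7f800000),
while +0 and -0 still compare as equal because they take the floating-point
path.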

Signed-off-by: Matheus Tavares Bernardino <matheus.bernardino@oss.qualcomm.com>
---
 target/hexagon/mmvec/macros.h                | 10 ++++
 target/hexagon/attribs_def.h.inc             |  2 +
 target/hexagon/hex_common.py                 |  1 +
 target/hexagon/imported/mmvec/encode_ext.def | 10 ++++
 target/hexagon/imported/mmvec/ext.idef       | 61 ++++++++++++++++++++
 5 files changed, 84 insertions(+)

diff --git a/target/hexagon/mmvec/macros.h b/target/hexagon/mmvec/macros.h
index 2af3d2d747..c342507d1a 100644
--- a/target/hexagon/mmvec/macros.h
+++ b/target/hexagon/mmvec/macros.h
@@ -356,4 +356,14 @@
                extract32(VAL, POS * 8, 8); \
     } while (0);
 
+#define fCMPGT_SF(A, B) \
+    (float32_is_any_nan(A) || float32_is_any_nan(B) ? \
+     (int32_t)(A) > (int32_t)(B) : \
+     float32_compare((A), (B), &env->fp_status) == float_relation_greater)
+
+#define fCMPGT_HF(A, B) \
+    (float16_is_any_nan(A) || float16_is_any_nan(B) ? \
+    (int16_t)(A) > (int16_t)(B) : \
+    float16_compare((A), (B), &env->fp_status) == float_relation_greater)
+
 #endif
diff --git a/target/hexagon/attribs_def.h.inc b/target/hexagon/attribs_def.h.inc
index d3c4bf6301..2d0fc7e9c0 100644
--- a/target/hexagon/attribs_def.h.inc
+++ b/target/hexagon/attribs_def.h.inc
@@ -81,6 +81,7 @@ DEF_ATTRIB(CVI_SCATTER, "CVI Scatter operation", "", "")
 DEF_ATTRIB(CVI_SCATTER_RELEASE, "CVI Store Release for scatter", "", "")
 DEF_ATTRIB(CVI_TMP_DST, "CVI instruction that doesn't write a register", "", "")
 DEF_ATTRIB(CVI_SLOT23, "Can execute in slot 2 or slot 3 (HVX)", "", "")
+DEF_ATTRIB(CVI_VA_2SRC, "Execs on multimedia vector engine; requires two srcs", "", "")
 
 DEF_ATTRIB(VTCM_ALLBANK_ACCESS, "Allocates in all VTCM schedulers.", "", "")
 
@@ -179,6 +180,7 @@ DEF_ATTRIB(HVX_IEEE_FP_ACC, "HVX IEEE FP accumulate instruction", "", "")
 DEF_ATTRIB(HVX_IEEE_FP_OUT_16, "HVX IEEE FP 16-bit output", "", "")
 DEF_ATTRIB(HVX_IEEE_FP_OUT_32, "HVX IEEE FP 32-bit output", "", "")
 DEF_ATTRIB(CVI_VX_NO_TMP_LD, "HVX multiply without tmp load", "", "")
+DEF_ATTRIB(HVX_FLT, "This is a floating point HVX instruction.", "", "")
 
 /* Keep this as the last attribute: */
 DEF_ATTRIB(ZZ_LASTATTRIB, "Last attribute in the file", "", "")
diff --git a/target/hexagon/hex_common.py b/target/hexagon/hex_common.py
index f6a2848db1..f93c7559d2 100755
--- a/target/hexagon/hex_common.py
+++ b/target/hexagon/hex_common.py
@@ -216,6 +216,7 @@ def need_env(tag):
             "A_CVI_GATHER" in attribdict[tag] or
             "A_CVI_SCATTER" in attribdict[tag] or
             "A_HVX_IEEE_FP" in attribdict[tag] or
+            "A_HVX_FLT" in attribdict[tag] or
             "A_IMPLICIT_WRITES_USR" in attribdict[tag])
 
 
diff --git a/target/hexagon/imported/mmvec/encode_ext.def b/target/hexagon/imported/mmvec/encode_ext.def
index 5325bbd704..3f84a1691b 100644
--- a/target/hexagon/imported/mmvec/encode_ext.def
+++ b/target/hexagon/imported/mmvec/encode_ext.def
@@ -859,4 +859,14 @@ DEF_ENC(V6_vconv_w_sf,"00011110--0--101PP1uuuuu001ddddd")
 DEF_ENC(V6_vconv_hf_h,"00011110--0--101PP1uuuuu100ddddd")
 DEF_ENC(V6_vconv_h_hf,"00011110--0--101PP1uuuuu010ddddd")
 
+/* IEEE FP compare instructions */
+DEF_ENC(V6_vgtsf,"00011100100vvvvvPP1uuuuu011100dd")
+DEF_ENC(V6_vgthf,"00011100100vvvvvPP1uuuuu011101dd")
+DEF_ENC(V6_vgtsf_and,"00011100100vvvvvPP1uuuuu110010xx")
+DEF_ENC(V6_vgthf_and,"00011100100vvvvvPP1uuuuu110011xx")
+DEF_ENC(V6_vgtsf_or,"00011100100vvvvvPP1uuuuu001100xx")
+DEF_ENC(V6_vgthf_or,"00011100100vvvvvPP1uuuuu001101xx")
+DEF_ENC(V6_vgtsf_xor,"00011100100vvvvvPP1uuuuu111010xx")
+DEF_ENC(V6_vgthf_xor,"00011100100vvvvvPP1uuuuu111011xx")
+
 #endif /* NO MMVEC */
diff --git a/target/hexagon/imported/mmvec/ext.idef b/target/hexagon/imported/mmvec/ext.idef
index 8b832166e0..304c4966d8 100644
--- a/target/hexagon/imported/mmvec/ext.idef
+++ b/target/hexagon/imported/mmvec/ext.idef
@@ -3129,6 +3129,67 @@ ITERATOR_INSN_SHIFT_SLOT_FLT(16, vconv_hf_h,"Vd32.hf=Vu32.h",
     "Vector conversion of int hw format to hf16",
     VdV.hf[i] = conv_hf_h(VuV.h[i], &env->fp_status))
 
+/******************************************************************************
+ * IEEE FP compare instructions
+ ******************************************************************************/
+
+#define VCMPGT_SF(DEST, ASRC, ASRCOP, CMP, N, SRC, MASK, WIDTH) \
+{ \
+    for (fHIDE(int) i = 0; i < fVBYTES(); i += WIDTH) { \
+        fHIDE(int) VAL = fCMPGT_SF(VuV.SRC[i/WIDTH],VvV.SRC[i/WIDTH]) ? MASK : 0; \
+        fSETQBITS(DEST,WIDTH,MASK,i,ASRC ASRCOP VAL); \
+    } \
+}
+
+#define VCMPGT_HF(DEST, ASRC, ASRCOP, CMP, N, SRC, MASK, WIDTH) \
+{ \
+    for (fHIDE(int) i = 0; i < fVBYTES(); i += WIDTH) { \
+        fHIDE(int) VAL = fCMPGT_HF(VuV.SRC[i/WIDTH],VvV.SRC[i/WIDTH]) ? MASK : 0; \
+        fSETQBITS(DEST,WIDTH,MASK,i,ASRC ASRCOP VAL); \
+    } \
+}
+
+/* Vector SF compare */
+#define MMVEC_CMPGT_SF(TYPE,TYPE2,DESCR,N,MASK,WIDTH,SRC) \
+    EXTINSN(V6_vgt##TYPE##_and, "Qx4&=vcmp.gt(Vu32." TYPE2 ",Vv32." TYPE2 ")", \
+        ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA,A_CVI_VA_2SRC,A_HVX_FLT), \
+        DESCR" greater than with predicate-and", \
+        VCMPGT_SF(QxV, fGETQBITS(QxV,WIDTH,MASK,i), &, ">", N, SRC, MASK, WIDTH)) \
+    EXTINSN(V6_vgt##TYPE##_xor, "Qx4^=vcmp.gt(Vu32." TYPE2 ",Vv32." TYPE2 ")", \
+        ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA,A_CVI_VA_2SRC,A_HVX_FLT), \
+        DESCR" greater than with predicate-xor", \
+        VCMPGT_SF(QxV, fGETQBITS(QxV,WIDTH,MASK,i), ^, ">", N, SRC, MASK, WIDTH)) \
+    EXTINSN(V6_vgt##TYPE##_or, "Qx4|=vcmp.gt(Vu32." TYPE2 ",Vv32." TYPE2 ")", \
+        ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA,A_CVI_VA_2SRC,A_HVX_FLT), \
+        DESCR" greater than with predicate-or", \
+        VCMPGT_SF(QxV, fGETQBITS(QxV,WIDTH,MASK,i), |, ">", N, SRC, MASK, WIDTH)) \
+    EXTINSN(V6_vgt##TYPE, "Qd4=vcmp.gt(Vu32." TYPE2 ",Vv32." TYPE2 ")", \
+        ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA,A_CVI_VA_2SRC,A_HVX_FLT), \
+        DESCR" greater than", \
+        VCMPGT_SF(QdV, , , ">", N, SRC, MASK, WIDTH))
+
+/* Vector HF compare */
+#define MMVEC_CMPGT_HF(TYPE,TYPE2,DESCR,N,MASK,WIDTH,SRC) \
+    EXTINSN(V6_vgt##TYPE##_and, "Qx4&=vcmp.gt(Vu32." TYPE2 ",Vv32." TYPE2 ")", \
+        ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA,A_CVI_VA_2SRC,A_HVX_FLT), \
+        DESCR" greater than with predicate-and", \
+        VCMPGT_HF(QxV, fGETQBITS(QxV,WIDTH,MASK,i), &, ">", N, SRC, MASK, WIDTH)) \
+    EXTINSN(V6_vgt##TYPE##_xor, "Qx4^=vcmp.gt(Vu32." TYPE2 ",Vv32." TYPE2 ")", \
+        ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA,A_CVI_VA_2SRC,A_HVX_FLT), \
+        DESCR" greater than with predicate-xor", \
+        VCMPGT_HF(QxV, fGETQBITS(QxV,WIDTH,MASK,i), ^, ">", N, SRC, MASK, WIDTH)) \
+    EXTINSN(V6_vgt##TYPE##_or, "Qx4|=vcmp.gt(Vu32." TYPE2 ",Vv32." TYPE2 ")", \
+        ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA,A_CVI_VA_2SRC,A_HVX_FLT), \
+        DESCR" greater than with predicate-or", \
+        VCMPGT_HF(QxV, fGETQBITS(QxV,WIDTH,MASK,i), |, ">", N, SRC, MASK, WIDTH)) \
+    EXTINSN(V6_vgt##TYPE, "Qd4=vcmp.gt(Vu32." TYPE2 ",Vv32." TYPE2 ")", \
+        ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA,A_CVI_VA_2SRC,A_HVX_FLT), \
+        DESCR" greater than", \
+        VCMPGT_HF(QdV, , , ">", N, SRC, MASK, WIDTH))
+
+MMVEC_CMPGT_SF(sf,"sf","Vector sf Compare ", fVELEM(32), 0xF, 4, sf)
+MMVEC_CMPGT_HF(hf,"hf","Vector hf Compare ", fVELEM(16), 0x3, 2, hf)
+
 /******************************************************************************
  DEBUG Vector/Register Printing
  ******************************************************************************/
-- 
2.37.2




* [PATCH 09/13] target/hexagon: add v73 HVX IEEE bfloat16 insns
  2026-03-23 13:15 [PATCH 00/13] hexagon: add missing HVX float instructions Matheus Tavares Bernardino
                   ` (7 preceding siblings ...)
  2026-03-23 13:15 ` [PATCH 08/13] target/hexagon: add v68 HVX IEEE float compare insns Matheus Tavares Bernardino
@ 2026-03-23 13:15 ` Matheus Tavares Bernardino
  2026-03-23 22:03   ` Taylor Simpson
  2026-03-23 13:15 ` [PATCH 10/13] tests/hexagon: add tests for v68 HVX IEEE float arithmetics Matheus Tavares Bernardino
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 39+ messages in thread
From: Matheus Tavares Bernardino @ 2026-03-23 13:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: brian.cain, ale, anjo, ltaylorsimpson, marco.liebel, philmd,
	quic_mburton, sid.manning

Add HVX IEEE bfloat16 (bf16) instructions:

Arithmetic operations:
- V6_vadd_sf_bf, V6_vsub_sf_bf: add/sub bf16 widening to sf output
- V6_vmpy_sf_bf: multiply bf16 widening to sf output
- V6_vmpy_sf_bf_acc: multiply-accumulate bf16 widening to sf output

Min/Max operations:
- V6_vmin_bf, V6_vmax_bf: bf16 min/max

Comparison operations:
- V6_vgtbf: greater-than compare
- V6_vgtbf_and, V6_vgtbf_or, V6_vgtbf_xor: predicate variants

Conversion operations:
- V6_vcvt_bf_sf: convert sf to bf16
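
As an illustration of the conversion above, here is a minimal standalone
sketch of the sf to bf16 narrowing used by sf_to_bf in this patch: round to
nearest with ties to even on the 16 discarded bits, and any NaN input
replaced by a default NaN. DEF_NAN_SF is an assumption (0x7fffffff, matching
the NAN_SF value used by the tests in this series); the real FP32_DEF_NAN
definition is not shown here. The widening direction is a plain 16-bit left
shift.

```c
#include <stdint.h>

/* Assumed default NaN pattern; the real FP32_DEF_NAN may differ. */
#define DEF_NAN_SF 0x7fffffffu

/* Widen bf16 to sf: bf16 is the upper half of the 32-bit pattern. */
static uint32_t bf_to_sf_bits(uint16_t bf)
{
    return (uint32_t)bf << 16;
}

/* Narrow sf to bf16 with round-to-nearest-even on the low 16 bits. */
static uint16_t sf_to_bf_bits(uint32_t sf)
{
    uint32_t r = sf;
    int is_nan = (sf & 0x7fffffffu) > 0x7f800000u;

    if ((r & 0x1ffffu) == 0x08000u) {
        /* Exact tie and the kept LSB is already even: truncate. */
    } else if (r & 0x8000u) {
        /* Above the tie, or a tie with an odd kept LSB: round up. */
        r += 0x8000u;
    }
    if (is_nan) {
        r = DEF_NAN_SF;
    }
    return (uint16_t)(r >> 16);
}
```

For example, 0x3f808000 is an exact tie with an even kept bit and truncates
to 0x3f80, while 0x3f818000 is a tie with an odd kept bit and rounds up to
0x3f82.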

Signed-off-by: Matheus Tavares Bernardino <matheus.bernardino@oss.qualcomm.com>
---
 target/hexagon/mmvec/kvx_ieee.h              | 36 +++++++++++
 target/hexagon/mmvec/macros.h                |  5 ++
 target/hexagon/mmvec/mmvec.h                 |  1 +
 target/hexagon/mmvec/kvx_ieee.c              |  3 +
 target/hexagon/imported/mmvec/encode_ext.def | 15 +++++
 target/hexagon/imported/mmvec/ext.idef       | 64 ++++++++++++++++++++
 6 files changed, 124 insertions(+)

diff --git a/target/hexagon/mmvec/kvx_ieee.h b/target/hexagon/mmvec/kvx_ieee.h
index 8a6816f6b3..eb670d4ec3 100644
--- a/target/hexagon/mmvec/kvx_ieee.h
+++ b/target/hexagon/mmvec/kvx_ieee.h
@@ -80,4 +80,40 @@ int16_t conv_hf_h(int16_t a, float_status *fp_status);
 int32_t conv_w_sf(uint32_t a, float_status *fp_status);
 int16_t conv_h_hf(uint16_t a, float_status *fp_status);
 
+/* IEEE BFloat instructions */
+
+#define fp_mult_sf_bf(A, B) \
+    fp_mult_sf_sf(((uint32_t)(A)) << 16, ((uint32_t)(B)) << 16, &env->fp_status)
+#define fp_add_sf_bf(A, B) \
+    fp_add_sf_sf(((uint32_t)(A)) << 16, ((uint32_t)(B)) << 16, &env->fp_status)
+#define fp_sub_sf_bf(A, B) \
+    fp_sub_sf_sf(((uint32_t)(A)) << 16, ((uint32_t)(B)) << 16, &env->fp_status)
+
+uint32_t fp_mult_sf_bf_acc(uint16_t op1, uint16_t op2, uint32_t acc,
+                           float_status *fp_status);
+
+#define bf_to_sf(A) (((uint32_t)(A)) << 16)
+
+#define fp_min_bf(A, B) ({ \
+    uint32_t _bf_res = fp_min_sf(bf_to_sf(A), bf_to_sf(B), &env->fp_status); \
+    (uint16_t)((_bf_res >> 16) & 0xffff); \
+})
+
+#define fp_max_bf(A, B) ({ \
+    uint32_t _bf_res = fp_max_sf(bf_to_sf(A), bf_to_sf(B), &env->fp_status); \
+    (uint16_t)((_bf_res >> 16) & 0xffff); \
+})
+
+static inline uint16_t sf_to_bf(int32_t A)
+{
+    uint32_t rslt = A;
+    if ((rslt & 0x1FFFF) == 0x08000) {
+        /* do not round up if exactly .5 and even already */
+    } else if ((rslt & 0x8000) == 0x8000) {
+        rslt += 0x8000; /* rounding to nearest number */
+    }
+    rslt = float32_is_any_nan(A) ? FP32_DEF_NAN : rslt;
+    return rslt >> 16;
+}
+
 #endif
diff --git a/target/hexagon/mmvec/macros.h b/target/hexagon/mmvec/macros.h
index c342507d1a..b70996578e 100644
--- a/target/hexagon/mmvec/macros.h
+++ b/target/hexagon/mmvec/macros.h
@@ -25,6 +25,9 @@
 #include "accel/tcg/probe.h"
 #include "mmvec/kvx_ieee.h"
 
+#define fBFLOAT()
+#define fCVI_VX_NO_TMP_LD()
+
 #ifndef QEMU_GENERATE
 #define VdV      (*(MMVector *restrict)(VdV_void))
 #define VsV      (*(MMVector *restrict)(VsV_void))
@@ -366,4 +369,6 @@
     (int16_t)(A) > (int16_t)(B) : \
     float16_compare((A), (B), &env->fp_status) == float_relation_greater)
 
+#define fCMPGT_BF(A, B) fCMPGT_SF(bf_to_sf(A), bf_to_sf(B))
+
 #endif
diff --git a/target/hexagon/mmvec/mmvec.h b/target/hexagon/mmvec/mmvec.h
index eaedfe0d6d..9d8d57c7c6 100644
--- a/target/hexagon/mmvec/mmvec.h
+++ b/target/hexagon/mmvec/mmvec.h
@@ -40,6 +40,7 @@ typedef union {
     int8_t    b[MAX_VEC_SIZE_BYTES / 1];
     int32_t  sf[MAX_VEC_SIZE_BYTES / 4];   /* single float (32-bit) */
     int16_t  hf[MAX_VEC_SIZE_BYTES / 2];   /* half float (16-bit) */
+    uint16_t bf[MAX_VEC_SIZE_BYTES / 2];   /* bfloat16 */
 } MMVector;
 
 typedef union {
diff --git a/target/hexagon/mmvec/kvx_ieee.c b/target/hexagon/mmvec/kvx_ieee.c
index bbeec09707..b5c434ad6d 100644
--- a/target/hexagon/mmvec/kvx_ieee.c
+++ b/target/hexagon/mmvec/kvx_ieee.c
@@ -229,3 +229,6 @@ int16_t conv_h_hf(uint16_t a, float_status *fp_status)
     }
     return float16_to_int16_round_to_zero(f1, fp_status);
 }
+
+DEF_FP_INSN_3(mult_sf_bf_acc, 32, 16, 16, 32,
+              float32_muladd(bf_to_sf(f1), bf_to_sf(f2), f3, 0, fp_status))
diff --git a/target/hexagon/imported/mmvec/encode_ext.def b/target/hexagon/imported/mmvec/encode_ext.def
index 3f84a1691b..352a8ec14b 100644
--- a/target/hexagon/imported/mmvec/encode_ext.def
+++ b/target/hexagon/imported/mmvec/encode_ext.def
@@ -869,4 +869,19 @@ DEF_ENC(V6_vgthf_or,"00011100100vvvvvPP1uuuuu001101xx")
 DEF_ENC(V6_vgtsf_xor,"00011100100vvvvvPP1uuuuu111010xx")
 DEF_ENC(V6_vgthf_xor,"00011100100vvvvvPP1uuuuu111011xx")
 
+/* BFLOAT instructions */
+DEF_ENC(V6_vmpy_sf_bf,"00011101010vvvvvPP1uuuuu100ddddd")
+DEF_ENC(V6_vmpy_sf_bf_acc,"00011101000vvvvvPP1uuuuu000xxxxx")
+DEF_ENC(V6_vadd_sf_bf,"00011101010vvvvvPP1uuuuu110ddddd")
+DEF_ENC(V6_vsub_sf_bf,"00011101010vvvvvPP1uuuuu101ddddd")
+DEF_ENC(V6_vmax_bf,"00011101010vvvvvPP1uuuuu111ddddd")
+DEF_ENC(V6_vmin_bf,"00011101010vvvvvPP1uuuuu000ddddd")
+DEF_ENC(V6_vcvt_bf_sf,"00011101010vvvvvPP1uuuuu011ddddd")
+
+/* BFLOAT compare instructions */
+DEF_ENC(V6_vgtbf,"00011100100vvvvvPP1uuuuu011110dd")
+DEF_ENC(V6_vgtbf_and,"00011100100vvvvvPP1uuuuu110100xx")
+DEF_ENC(V6_vgtbf_or,"00011100100vvvvvPP1uuuuu001110xx")
+DEF_ENC(V6_vgtbf_xor,"00011100100vvvvvPP1uuuuu111100xx")
+
 #endif /* NO MMVEC */
diff --git a/target/hexagon/imported/mmvec/ext.idef b/target/hexagon/imported/mmvec/ext.idef
index 304c4966d8..afe9de3716 100644
--- a/target/hexagon/imported/mmvec/ext.idef
+++ b/target/hexagon/imported/mmvec/ext.idef
@@ -3149,6 +3149,15 @@ ITERATOR_INSN_SHIFT_SLOT_FLT(16, vconv_hf_h,"Vd32.hf=Vu32.h",
     } \
 }
 
+#define VCMPGT_BF(DEST, ASRC, ASRCOP, CMP, N, SRC, MASK, WIDTH) \
+{ \
+    fBFLOAT(); \
+    for (fHIDE(int) i = 0; i < fVBYTES(); i += WIDTH) { \
+        fHIDE(int) VAL = fCMPGT_BF(VuV.SRC[i/WIDTH],VvV.SRC[i/WIDTH]) ? MASK : 0; \
+        fSETQBITS(DEST,WIDTH,MASK,i,ASRC ASRCOP VAL); \
+    } \
+}
+
 /* Vector SF compare */
 #define MMVEC_CMPGT_SF(TYPE,TYPE2,DESCR,N,MASK,WIDTH,SRC) \
     EXTINSN(V6_vgt##TYPE##_and, "Qx4&=vcmp.gt(Vu32." TYPE2 ",Vv32." TYPE2 ")", \
@@ -3187,8 +3196,63 @@ ITERATOR_INSN_SHIFT_SLOT_FLT(16, vconv_hf_h,"Vd32.hf=Vu32.h",
         DESCR" greater than", \
         VCMPGT_HF(QdV, , , ">", N, SRC, MASK, WIDTH))
 
+/* Vector BF compare */
+#define MMVEC_CMPGT_BF(TYPE,TYPE2,DESCR,N,MASK,WIDTH,SRC) \
+    EXTINSN(V6_vgt##TYPE##_and, "Qx4&=vcmp.gt(Vu32." TYPE2 ",Vv32." TYPE2 ")",\
+        ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA,A_CVI_VA_2SRC,A_HVX_FLT), \
+        DESCR" greater than with predicate-and", \
+        VCMPGT_BF(QxV, fGETQBITS(QxV,WIDTH,MASK,i), &, ">", N, SRC, MASK, WIDTH)) \
+    EXTINSN(V6_vgt##TYPE##_xor, "Qx4^=vcmp.gt(Vu32." TYPE2 ",Vv32." TYPE2 ")", \
+        ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA,A_CVI_VA_2SRC,A_HVX_FLT), \
+        DESCR" greater than with predicate-xor", \
+        VCMPGT_BF(QxV, fGETQBITS(QxV,WIDTH,MASK,i), ^, ">", N, SRC, MASK, WIDTH)) \
+    EXTINSN(V6_vgt##TYPE##_or, "Qx4|=vcmp.gt(Vu32." TYPE2 ",Vv32." TYPE2 ")", \
+        ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA,A_CVI_VA_2SRC,A_HVX_FLT), \
+        DESCR" greater than with predicate-or", \
+        VCMPGT_BF(QxV, fGETQBITS(QxV,WIDTH,MASK,i), |, ">", N, SRC, MASK, WIDTH)) \
+    EXTINSN(V6_vgt##TYPE, "Qd4=vcmp.gt(Vu32." TYPE2 ",Vv32." TYPE2 ")", \
+        ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA,A_CVI_VA_2SRC,A_HVX_FLT), \
+        DESCR" greater than", \
+        VCMPGT_BF(QdV, , , ">", N, SRC, MASK, WIDTH))
+
 MMVEC_CMPGT_SF(sf,"sf","Vector sf Compare ", fVELEM(32), 0xF, 4, sf)
 MMVEC_CMPGT_HF(hf,"hf","Vector hf Compare ", fVELEM(16), 0x3, 2, hf)
+MMVEC_CMPGT_BF(bf,"bf","Vector bf Compare ", fVELEM(16), 0x3, 2, bf)
+
+/******************************************************************************
+ BFloat arithmetic and max/min instructions
+ ******************************************************************************/
+
+ITERATOR_INSN_IEEE_FP_DOUBLE_32(32, vadd_sf_bf,
+    "Vdd32.sf=vadd(Vu32.bf,Vv32.bf)",  "Vector IEEE add: bf widen to sf",
+    VddV.v[0].sf[i] = fp_add_sf_bf(VuV.bf[2*i], VvV.bf[2*i]);
+    VddV.v[1].sf[i] = fp_add_sf_bf(VuV.bf[2*i+1], VvV.bf[2*i+1]); fBFLOAT())
+ITERATOR_INSN_IEEE_FP_DOUBLE_32(32, vsub_sf_bf,
+    "Vdd32.sf=vsub(Vu32.bf,Vv32.bf)",  "Vector IEEE sub: bf widen to sf",
+    VddV.v[0].sf[i] = fp_sub_sf_bf(VuV.bf[2*i], VvV.bf[2*i]);
+    VddV.v[1].sf[i] = fp_sub_sf_bf(VuV.bf[2*i+1], VvV.bf[2*i+1]); fBFLOAT())
+ITERATOR_INSN_IEEE_FP_DOUBLE_32(32, vmpy_sf_bf,
+    "Vdd32.sf=vmpy(Vu32.bf,Vv32.bf)",  "Vector IEEE mul: bf widen to sf",
+    VddV.v[0].sf[i] = fp_mult_sf_bf(VuV.bf[2*i], VvV.bf[2*i]);
+    VddV.v[1].sf[i] = fp_mult_sf_bf(VuV.bf[2*i+1], VvV.bf[2*i+1]); fBFLOAT())
+ITERATOR_INSN_IEEE_FP_DOUBLE_32(32, vmpy_sf_bf_acc,
+    "Vxx32.sf+=vmpy(Vu32.bf,Vv32.bf)", "Vector IEEE fma: bf widen to sf",
+    VxxV.v[0].sf[i] = fp_mult_sf_bf_acc(VuV.bf[2*i], VvV.bf[2*i],
+                                        VxxV.v[0].sf[i], &env->fp_status);
+    VxxV.v[1].sf[i] = fp_mult_sf_bf_acc(VuV.bf[2*i+1], VvV.bf[2*i+1],
+                                        VxxV.v[1].sf[i], &env->fp_status);
+    fCVI_VX_NO_TMP_LD(); fBFLOAT())
+ITERATOR_INSN_IEEE_FP_16(32, vcvt_bf_sf,
+    "Vd32.bf=vcvt(Vu32.sf,Vv32.sf)",   "Vector IEEE cvt: sf to bf",
+    VdV.bf[2*i]   = sf_to_bf(VuV.sf[i]);
+    VdV.bf[2*i+1] = sf_to_bf(VvV.sf[i]); fBFLOAT())
+
+ITERATOR_INSN_IEEE_FP_16_32_LATE(16, vmax_bf, "Vd32.bf=vmax(Vu32.bf,Vv32.bf)",
+    "Vector IEEE max: bf", VdV.bf[i] = fp_max_bf(VuV.bf[i], VvV.bf[i]);
+    fBFLOAT())
+ITERATOR_INSN_IEEE_FP_16_32_LATE(16, vmin_bf, "Vd32.bf=vmin(Vu32.bf,Vv32.bf)",
+    "Vector IEEE min: bf", VdV.bf[i] = fp_min_bf(VuV.bf[i], VvV.bf[i]);
+    fBFLOAT())
 
 /******************************************************************************
  DEBUG Vector/Register Printing
-- 
2.37.2




* [PATCH 10/13] tests/hexagon: add tests for v68 HVX IEEE float arithmetics
  2026-03-23 13:15 [PATCH 00/13] hexagon: add missing HVX float instructions Matheus Tavares Bernardino
                   ` (8 preceding siblings ...)
  2026-03-23 13:15 ` [PATCH 09/13] target/hexagon: add v73 HVX IEEE bfloat16 insns Matheus Tavares Bernardino
@ 2026-03-23 13:15 ` Matheus Tavares Bernardino
  2026-03-24 19:05   ` Taylor Simpson
  2026-03-23 13:15 ` [PATCH 11/13] tests/hexagon: add tests for v68 HVX IEEE float min/max Matheus Tavares Bernardino
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 39+ messages in thread
From: Matheus Tavares Bernardino @ 2026-03-23 13:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: brian.cain, ale, anjo, ltaylorsimpson, marco.liebel, philmd,
	quic_mburton, sid.manning

Signed-off-by: Matheus Tavares Bernardino <matheus.bernardino@oss.qualcomm.com>
---
 tests/tcg/hexagon/hvx_misc.h        |  12 +++
 tests/tcg/hexagon/fp_hvx.c          | 129 ++++++++++++++++++++++++++++
 tests/tcg/hexagon/fp_hvx_disabled.c |  32 +++++++
 tests/tcg/hexagon/Makefile.target   |   8 ++
 4 files changed, 181 insertions(+)
 create mode 100644 tests/tcg/hexagon/fp_hvx.c
 create mode 100644 tests/tcg/hexagon/fp_hvx_disabled.c

diff --git a/tests/tcg/hexagon/hvx_misc.h b/tests/tcg/hexagon/hvx_misc.h
index 2e868340fd..771a4a22b6 100644
--- a/tests/tcg/hexagon/hvx_misc.h
+++ b/tests/tcg/hexagon/hvx_misc.h
@@ -34,8 +34,10 @@ typedef union {
     uint64_t ud[MAX_VEC_SIZE_BYTES / 8];
     int64_t   d[MAX_VEC_SIZE_BYTES / 8];
     uint32_t uw[MAX_VEC_SIZE_BYTES / 4];
+    uint32_t sf[MAX_VEC_SIZE_BYTES / 4]; /* convenience alias */
     int32_t   w[MAX_VEC_SIZE_BYTES / 4];
     uint16_t uh[MAX_VEC_SIZE_BYTES / 2];
+    uint16_t hf[MAX_VEC_SIZE_BYTES / 2]; /* convenience alias */
     int16_t   h[MAX_VEC_SIZE_BYTES / 2];
     uint8_t  ub[MAX_VEC_SIZE_BYTES / 1];
     int8_t    b[MAX_VEC_SIZE_BYTES / 1];
@@ -63,7 +65,9 @@ static inline void check_output_##FIELD(int line, size_t num_vectors) \
 
 CHECK_OUTPUT_FUNC(d,  8)
 CHECK_OUTPUT_FUNC(w,  4)
+CHECK_OUTPUT_FUNC(sf, 4)
 CHECK_OUTPUT_FUNC(h,  2)
+CHECK_OUTPUT_FUNC(hf, 2)
 CHECK_OUTPUT_FUNC(b,  1)
 
 static inline void init_buffers(void)
@@ -175,4 +179,12 @@ static inline void test_##NAME(bool invert) \
     check_output_b(__LINE__, BUFSIZE); \
 }
 
+#define float_sf(x) ({ typeof(x) _x = (x); *((float *)&(_x)); })
+#define float_hf(x) ({ typeof(x) _x = (x); *((_Float16 *) &(_x)); })
+#define raw_sf(x) ({ typeof(x) _x = (x); *((uint32_t *)&(_x)); })
+#define raw_hf(x) ({ typeof(x) _x = (x); *((uint16_t *)&(_x)); })
+#define float_hf_to_sf(x) ((float)(x))
+#define bytes_hf 2
+#define bytes_sf 4
+
 #endif
diff --git a/tests/tcg/hexagon/fp_hvx.c b/tests/tcg/hexagon/fp_hvx.c
new file mode 100644
index 0000000000..85b8ff78ed
--- /dev/null
+++ b/tests/tcg/hexagon/fp_hvx.c
@@ -0,0 +1,129 @@
+/*
+ *  Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
+ *
+ *  SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <string.h>
+#include <hexagon_types.h>
+#include <hvx_hexagon_protos.h>
+
+int err;
+#include "hvx_misc.h"
+
+#if __HEXAGON_ARCH__ > 75
+#error "After v75, compiler will replace some FP HVX instructions."
+#endif
+
+/******************************************************************************
+ * NAN handling
+ *****************************************************************************/
+
+#define isnan(X) \
+     (sizeof(X) == bytes_hf ? ((raw_hf(X) & ~0x8000) > 0x7c00) : \
+                              ((raw_sf(X) & ~(1 << 31)) > 0x7f800000UL))
+
+#define CHECK_NAN(A, DEF_NAN) (isnan(A) ? DEF_NAN : (A))
+#define NAN_SF float_sf(0x7FFFFFFF)
+#define NAN_HF float_hf(0x7FFF)
+
+/******************************************************************************
+ * Binary operations
+ *****************************************************************************/
+
+#define DEF_TEST_OP_2(vop, op, type_res, type_arg) \
+    static void test_##vop##_##type_res##_##type_arg(void) \
+    { \
+        memset(expect, 0xff, sizeof(expect)); \
+        memset(output, 0xff, sizeof(output)); \
+        HVX_Vector *hvx_output = (HVX_Vector *)&output[0]; \
+        HVX_Vector hvx_buffer0 = *(HVX_Vector *)&buffer0[0]; \
+        HVX_Vector hvx_buffer1 = *(HVX_Vector *)&buffer1[0]; \
+        \
+        *hvx_output = \
+            Q6_V##type_res##_##vop##_V##type_arg##V##type_arg(hvx_buffer0, \
+                                                              hvx_buffer1); \
+        \
+        for (int i = 0; i < MAX_VEC_SIZE_BYTES / bytes_##type_res; i++) { \
+            expect[0].type_res[i] = \
+                raw_##type_res(op(float_##type_arg(buffer0[0].type_arg[i]), \
+                                  float_##type_arg(buffer1[0].type_arg[i]))); \
+        } \
+        check_output_##type_res(__LINE__, 1); \
+    }
+
+#define SUM(X, Y, DEF_NAN) CHECK_NAN((X) + (Y), DEF_NAN)
+#define SUB(X, Y, DEF_NAN) CHECK_NAN((X) - (Y), DEF_NAN)
+#define MULT(X, Y, DEF_NAN) CHECK_NAN((X) * (Y), DEF_NAN)
+
+#define SUM_SF(X, Y) SUM(X, Y, NAN_SF)
+#define SUM_HF(X, Y) SUM(X, Y, NAN_HF)
+#define SUB_SF(X, Y) SUB(X, Y, NAN_SF)
+#define SUB_HF(X, Y) SUB(X, Y, NAN_HF)
+#define MULT_SF(X, Y) MULT(X, Y, NAN_SF)
+#define MULT_HF(X, Y) MULT(X, Y, NAN_HF)
+
+DEF_TEST_OP_2(vadd, SUM_SF, sf, sf);
+DEF_TEST_OP_2(vadd, SUM_HF, hf, hf);
+DEF_TEST_OP_2(vsub, SUB_SF, sf, sf);
+DEF_TEST_OP_2(vsub, SUB_HF, hf, hf);
+DEF_TEST_OP_2(vmpy, MULT_SF, sf, sf);
+DEF_TEST_OP_2(vmpy, MULT_HF, hf, hf);
+
+/******************************************************************************
+ * Other tests
+ *****************************************************************************/
+
+void test_vdmpy_sf_hf(bool acc)
+{
+    HVX_Vector *hvx_output = (HVX_Vector *)&output[0];
+    HVX_Vector hvx_buffer0 = *(HVX_Vector *)&buffer0[0];
+    HVX_Vector hvx_buffer1 = *(HVX_Vector *)&buffer1[0];
+
+    uint32_t PREFIL_VAL = 0x111222;
+    memset(expect, 0xff, sizeof(expect));
+    *hvx_output = Q6_V_vsplat_R(PREFIL_VAL);
+
+    if (!acc) {
+        *hvx_output = Q6_Vsf_vdmpy_VhfVhf(hvx_buffer0, hvx_buffer1);
+    } else {
+        *hvx_output = Q6_Vsf_vdmpyacc_VsfVhfVhf(*hvx_output, hvx_buffer0,
+                                                hvx_buffer1);
+    }
+
+    for (int i = 0; i < MAX_VEC_SIZE_BYTES / 4; i++) {
+        float a1 = float_hf_to_sf(float_hf(buffer0[0].hf[2 * i + 1]));
+        float a2 = float_hf_to_sf(float_hf(buffer0[0].hf[2 * i]));
+        float a3 = float_hf_to_sf(float_hf(buffer1[0].hf[2 * i + 1]));
+        float a4 = float_hf_to_sf(float_hf(buffer1[0].hf[2 * i]));
+        float prev = acc ? float_sf(PREFIL_VAL) : 0;
+        expect[0].sf[i] = raw_sf(CHECK_NAN((a1 * a3) + (a2 * a4) + prev, NAN_SF));
+    }
+
+    check_output_sf(__LINE__, 1);
+}
+
+int main(void)
+{
+    init_buffers();
+
+    /* add/sub */
+    test_vadd_sf_sf();
+    test_vadd_hf_hf();
+    test_vsub_sf_sf();
+    test_vsub_hf_hf();
+
+    /* multiply */
+    test_vmpy_sf_sf();
+    test_vmpy_hf_hf();
+
+    /* dot product */
+    test_vdmpy_sf_hf(false);
+    test_vdmpy_sf_hf(true);
+
+    puts(err ? "FAIL" : "PASS");
+    return err ? 1 : 0;
+}
diff --git a/tests/tcg/hexagon/fp_hvx_disabled.c b/tests/tcg/hexagon/fp_hvx_disabled.c
new file mode 100644
index 0000000000..af409ab8d2
--- /dev/null
+++ b/tests/tcg/hexagon/fp_hvx_disabled.c
@@ -0,0 +1,32 @@
+/*
+ *  Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
+ *
+ *  SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <hexagon_types.h>
+#include <hvx_hexagon_protos.h>
+
+int err;
+#include "hvx_misc.h"
+
+int main(void)
+{
+    asm volatile("r0 = #0xff\n"
+                 "v0 = vsplat(r0)\n"
+                 "vmem(%1 + #0) = v0\n"
+                 "r1 = #0x1\n"
+                 "v1 = vsplat(r1)\n"
+                 "v2 = vsplat(r1)\n"
+                 "v0.sf = vadd(v1.sf, v2.sf)\n"
+                 "vmem(%0 + #0) = v0\n"
+                 :
+                 : "r"(output), "r"(expect)
+                 : "r0", "r1", "v0", "v1", "v2", "memory");
+
+    check_output_w(__LINE__, 1);
+    puts(err ? "FAIL" : "PASS");
+    return err ? 1 : 0;
+}
diff --git a/tests/tcg/hexagon/Makefile.target b/tests/tcg/hexagon/Makefile.target
index a70ef2f660..16072c96fd 100644
--- a/tests/tcg/hexagon/Makefile.target
+++ b/tests/tcg/hexagon/Makefile.target
@@ -50,6 +50,8 @@ HEX_TESTS += vector_add_int
 HEX_TESTS += scatter_gather
 HEX_TESTS += hvx_misc
 HEX_TESTS += hvx_histogram
+HEX_TESTS += fp_hvx
+HEX_TESTS += fp_hvx_disabled
 HEX_TESTS += invalid-slots
 HEX_TESTS += invalid-encoding
 HEX_TESTS += multiple-writes
@@ -123,6 +125,12 @@ v68_hvx: CFLAGS += -mhvx -Wno-unused-function
 v69_hvx: v69_hvx.c hvx_misc.h
 v69_hvx: CFLAGS += -mhvx -Wno-unused-function
 v73_scalar: CFLAGS += -Wno-unused-function
+fp_hvx: fp_hvx.c hvx_misc.h
+fp_hvx: CFLAGS += -mhvx -mhvx-ieee-fp
+fp_hvx_disabled: fp_hvx_disabled.c hvx_misc.h
+fp_hvx_disabled: CFLAGS += -mhvx -mhvx-ieee-fp
+
+run-fp_hvx_disabled: QEMU_OPTS += -cpu v73,ieee-fp=false
 
 hvx_histogram: hvx_histogram.c hvx_histogram_row.S
 	$(CC) $(CFLAGS) $(CROSS_CC_GUEST_CFLAGS) $^ -o $@ $(LDFLAGS)
-- 
2.37.2




* [PATCH 11/13] tests/hexagon: add tests for v68 HVX IEEE float min/max
  2026-03-23 13:15 [PATCH 00/13] hexagon: add missing HVX float instructions Matheus Tavares Bernardino
                   ` (9 preceding siblings ...)
  2026-03-23 13:15 ` [PATCH 10/13] tests/hexagon: add tests for v68 HVX IEEE float arithmetics Matheus Tavares Bernardino
@ 2026-03-23 13:15 ` Matheus Tavares Bernardino
  2026-03-24 19:07   ` Taylor Simpson
  2026-03-23 13:15 ` [PATCH 12/13] tests/hexagon: add tests for v68 HVX IEEE float conversions Matheus Tavares Bernardino
  2026-03-23 13:15 ` [PATCH 13/13] tests/hexagon: add tests for v68 HVX IEEE float comparisons Matheus Tavares Bernardino
  12 siblings, 1 reply; 39+ messages in thread
From: Matheus Tavares Bernardino @ 2026-03-23 13:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: brian.cain, ale, anjo, ltaylorsimpson, marco.liebel, philmd,
	quic_mburton, sid.manning

Signed-off-by: Matheus Tavares Bernardino <matheus.bernardino@oss.qualcomm.com>
---
 tests/tcg/hexagon/fp_hvx.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/tests/tcg/hexagon/fp_hvx.c b/tests/tcg/hexagon/fp_hvx.c
index 85b8ff78ed..ded3a80f8f 100644
--- a/tests/tcg/hexagon/fp_hvx.c
+++ b/tests/tcg/hexagon/fp_hvx.c
@@ -73,6 +73,21 @@ DEF_TEST_OP_2(vsub, SUB_HF, hf, hf);
 DEF_TEST_OP_2(vmpy, MULT_SF, sf, sf);
 DEF_TEST_OP_2(vmpy, MULT_HF, hf, hf);
 
+#define MIN(X, Y, DEF_NAN) \
+    ((isnan(X) || isnan(Y)) ? DEF_NAN : ((X) < (Y) ? (X) : (Y)))
+#define MAX(X, Y, DEF_NAN) \
+    ((isnan(X) || isnan(Y)) ? DEF_NAN : ((X) > (Y) ? (X) : (Y)))
+
+#define MIN_HF(X, Y) MIN(X, Y, NAN_HF)
+#define MAX_HF(X, Y) MAX(X, Y, NAN_HF)
+#define MIN_SF(X, Y) MIN(X, Y, NAN_SF)
+#define MAX_SF(X, Y) MAX(X, Y, NAN_SF)
+
+DEF_TEST_OP_2(vfmin, MIN_SF, sf, sf);
+DEF_TEST_OP_2(vfmax, MAX_SF, sf, sf);
+DEF_TEST_OP_2(vfmin, MIN_HF, hf, hf);
+DEF_TEST_OP_2(vfmax, MAX_HF, hf, hf);
+
 /******************************************************************************
  * Other tests
  *****************************************************************************/
@@ -124,6 +139,12 @@ int main(void)
     test_vdmpy_sf_hf(false);
     test_vdmpy_sf_hf(true);
 
+    /* min/max */
+    test_vfmin_sf_sf();
+    test_vfmin_hf_hf();
+    test_vfmax_sf_sf();
+    test_vfmax_hf_hf();
+
     puts(err ? "FAIL" : "PASS");
     return err ? 1 : 0;
 }
-- 
2.37.2




* [PATCH 12/13] tests/hexagon: add tests for v68 HVX IEEE float conversions
  2026-03-23 13:15 [PATCH 00/13] hexagon: add missing HVX float instructions Matheus Tavares Bernardino
                   ` (10 preceding siblings ...)
  2026-03-23 13:15 ` [PATCH 11/13] tests/hexagon: add tests for v68 HVX IEEE float min/max Matheus Tavares Bernardino
@ 2026-03-23 13:15 ` Matheus Tavares Bernardino
  2026-03-24 19:30   ` Taylor Simpson
  2026-03-23 13:15 ` [PATCH 13/13] tests/hexagon: add tests for v68 HVX IEEE float comparisons Matheus Tavares Bernardino
  12 siblings, 1 reply; 39+ messages in thread
From: Matheus Tavares Bernardino @ 2026-03-23 13:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: brian.cain, ale, anjo, ltaylorsimpson, marco.liebel, philmd,
	quic_mburton, sid.manning

Signed-off-by: Matheus Tavares Bernardino <matheus.bernardino@oss.qualcomm.com>
---
 tests/tcg/hexagon/hex_test.h      |  15 +++
 tests/tcg/hexagon/hvx_misc.h      |   2 +
 tests/tcg/hexagon/fp_hvx_cvt.c    | 194 ++++++++++++++++++++++++++++++
 tests/tcg/hexagon/Makefile.target |   3 +
 4 files changed, 214 insertions(+)
 create mode 100644 tests/tcg/hexagon/fp_hvx_cvt.c

diff --git a/tests/tcg/hexagon/hex_test.h b/tests/tcg/hexagon/hex_test.h
index cfed06a58b..28522faf04 100644
--- a/tests/tcg/hexagon/hex_test.h
+++ b/tests/tcg/hexagon/hex_test.h
@@ -109,6 +109,20 @@ static inline void __check64_ne(int line, uint64_t val, uint64_t expect)
     "usr = r2\n\t"
 
 /* Some useful floating point values */
+const uint16_t HF_INF = 0x7c00;
+const uint16_t HF_INF_neg = 0xfc00;
+const uint16_t HF_QNaN = 0x7e00;
+const uint16_t HF_SNaN = 0x7d00;
+const uint16_t HF_QNaN_neg = 0xfe00;
+const uint16_t HF_zero = 0x0000;
+const uint16_t HF_zero_neg = 0x8000;
+const uint16_t HF_one = 0x3c00;
+const uint16_t HF_one_recip = 0x3bf9;
+const uint16_t HF_two = 0x4000;
+const uint16_t HF_small_neg = 0x8010;
+const uint16_t HF_any = 0x3c00;
+const uint16_t HF_neg_two = 0xc000;
+
 const uint32_t SF_INF =              0x7f800000;
 const uint32_t SF_QNaN =             0x7fc00000;
 const uint32_t SF_QNaN_special =     0x7f800001;
@@ -128,6 +142,7 @@ const uint32_t SF_large_pos =        0x5afa572e;
 const uint32_t SF_any =              0x3f800000;
 const uint32_t SF_denorm =           0x00000001;
 const uint32_t SF_random =           0x346001d6;
+const uint32_t SF_neg_two =          0xc0000000;
 
 const uint64_t DF_QNaN =             0x7ff8000000000000ULL;
 const uint64_t DF_SNaN =             0x7ff7000000000000ULL;
diff --git a/tests/tcg/hexagon/hvx_misc.h b/tests/tcg/hexagon/hvx_misc.h
index 771a4a22b6..26dd9ad774 100644
--- a/tests/tcg/hexagon/hvx_misc.h
+++ b/tests/tcg/hexagon/hvx_misc.h
@@ -67,7 +67,9 @@ CHECK_OUTPUT_FUNC(d,  8)
 CHECK_OUTPUT_FUNC(w,  4)
 CHECK_OUTPUT_FUNC(sf, 4)
 CHECK_OUTPUT_FUNC(h,  2)
+CHECK_OUTPUT_FUNC(uh, 2)
 CHECK_OUTPUT_FUNC(hf, 2)
+CHECK_OUTPUT_FUNC(ub,  1)
 CHECK_OUTPUT_FUNC(b,  1)
 
 static inline void init_buffers(void)
diff --git a/tests/tcg/hexagon/fp_hvx_cvt.c b/tests/tcg/hexagon/fp_hvx_cvt.c
new file mode 100644
index 0000000000..7497455ac6
--- /dev/null
+++ b/tests/tcg/hexagon/fp_hvx_cvt.c
@@ -0,0 +1,194 @@
+/*
+ *  Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
+ *
+ *  SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <string.h>
+#include <hexagon_types.h>
+#include <hvx_hexagon_protos.h>
+
+#if __HEXAGON_ARCH__ > 75
+#error "After v75, compiler will replace some FP HVX instructions."
+#endif
+
+int err;
+#include "hvx_misc.h"
+#include "hex_test.h"
+
+#define TEST_EXP(TO, FROM, VAL, EXP) do { \
+    ((MMVector *)&buffer)->FROM[index] = VAL; \
+    expect[0].TO[index] = EXP; \
+    index++; \
+} while (0)
+
+#define DEF_TEST_CVT(TO, FROM, TESTS) \
+    void test_vcvt_##TO##_##FROM(void) \
+    { \
+        HVX_Vector *hvx_output = (HVX_Vector *)&output[0]; \
+        HVX_Vector buffer; \
+        int index = 0; \
+        memset(&buffer, 0, sizeof(buffer)); \
+        memset(expect, 0, sizeof(expect)); \
+        TESTS \
+        *hvx_output = Q6_V##TO##_vcvt_V##FROM(buffer); \
+        check_output_##TO(__LINE__, 1); \
+    }
+
+DEF_TEST_CVT(uh, hf, { \
+    TEST_EXP(uh, hf, HF_QNaN, UINT16_MAX); \
+    TEST_EXP(uh, hf, HF_SNaN, UINT16_MAX); \
+    TEST_EXP(uh, hf, HF_QNaN_neg, UINT16_MAX); \
+    TEST_EXP(uh, hf, HF_INF, UINT16_MAX); \
+    TEST_EXP(uh, hf, HF_INF_neg, 0); \
+    TEST_EXP(uh, hf, HF_neg_two, 0); \
+    TEST_EXP(uh, hf, HF_zero_neg, 0); \
+    TEST_EXP(uh, hf, raw_hf((_Float16)2.1), 2); \
+    TEST_EXP(uh, hf, HF_one_recip, 1); \
+})
+
+DEF_TEST_CVT(h, hf, { \
+    TEST_EXP(h, hf, HF_QNaN, INT16_MAX); \
+    TEST_EXP(h, hf, HF_SNaN, INT16_MAX); \
+    TEST_EXP(h, hf, HF_QNaN_neg, INT16_MAX); \
+    TEST_EXP(h, hf, HF_INF, INT16_MAX); \
+    TEST_EXP(h, hf, HF_INF_neg, INT16_MIN); \
+    TEST_EXP(h, hf, HF_neg_two, -2); \
+    TEST_EXP(h, hf, HF_zero_neg, 0); \
+    TEST_EXP(h, hf, raw_hf((_Float16)2.1), 2); \
+    TEST_EXP(h, hf, HF_one_recip, 1); \
+})
+
+/*
+ * Some cvt operations take two vectors as input and perform the following:
+ *    VdV.TO[4*i]   = OP(VuV.FROM[2*i]);
+ *    VdV.TO[4*i+1] = OP(VuV.FROM[2*i+1]);
+ *    VdV.TO[4*i+2] = OP(VvV.FROM[2*i]);
+ *    VdV.TO[4*i+3] = OP(VvV.FROM[2*i+1]);
+ * We use bf_index and index in a way that the tests are always done either
+ * using the first or third line of the above snippet.
+ */
+#define TEST_EXP_2(TO, FROM, VAL, EXP) do { \
+    ((MMVector *)&buffers[bf_index])->FROM[2 * index] = VAL; \
+    expect[0].TO[(4 * index) + (2 * bf_index)] = EXP; \
+    index++; \
+    bf_index = (bf_index + 1) % 2; \
+} while (0)
+
+#define DEF_TEST_CVT_2(TO, FROM, TESTS) \
+    void test_vcvt_##TO##_##FROM(void) \
+    { \
+        HVX_Vector *hvx_output = (HVX_Vector *)&output[0]; \
+        HVX_Vector buffers[2]; \
+        int index = 0, bf_index = 0; \
+        memset(&buffers, 0, sizeof(buffers)); \
+        memset(expect, 0, sizeof(expect)); \
+        TESTS \
+        *hvx_output = Q6_V##TO##_vcvt_V##FROM##V##FROM(buffers[0], buffers[1]); \
+        check_output_##TO(__LINE__, 1); \
+    }
+
+DEF_TEST_CVT_2(ub, hf, { \
+    TEST_EXP_2(ub, hf, HF_QNaN, UINT8_MAX); \
+    TEST_EXP_2(ub, hf, HF_SNaN, UINT8_MAX); \
+    TEST_EXP_2(ub, hf, HF_QNaN_neg, UINT8_MAX); \
+    TEST_EXP_2(ub, hf, HF_INF, UINT8_MAX); \
+    TEST_EXP_2(ub, hf, HF_INF_neg, 0); \
+    TEST_EXP_2(ub, hf, HF_small_neg, 0); \
+    TEST_EXP_2(ub, hf, HF_neg_two, 0); \
+    TEST_EXP_2(ub, hf, HF_zero_neg, 0); \
+    TEST_EXP_2(ub, hf, raw_hf((_Float16)2.1), 2); \
+    TEST_EXP_2(ub, hf, HF_one_recip, 1); \
+})
+
+DEF_TEST_CVT_2(b, hf, { \
+    TEST_EXP_2(b, hf, HF_QNaN, INT8_MAX); \
+    TEST_EXP_2(b, hf, HF_SNaN, INT8_MAX); \
+    TEST_EXP_2(b, hf, HF_QNaN_neg, INT8_MAX); \
+    TEST_EXP_2(b, hf, HF_INF, INT8_MAX); \
+    TEST_EXP_2(b, hf, HF_INF_neg, INT8_MIN); \
+    TEST_EXP_2(b, hf, HF_small_neg, 0); \
+    TEST_EXP_2(b, hf, HF_neg_two, -2); \
+    TEST_EXP_2(b, hf, HF_zero_neg, 0); \
+    TEST_EXP_2(b, hf, raw_hf((_Float16)2.1), 2); \
+    TEST_EXP_2(b, hf, HF_one_recip, 1); \
+})
+
+#define DEF_TEST_VCONV(TO, FROM, TESTS) \
+    void test_vconv_##TO##_##FROM(void) \
+    { \
+        HVX_Vector *hvx_output = (HVX_Vector *)&output[0]; \
+        HVX_Vector buffer; \
+        int index = 0; \
+        memset(&buffer, 0, sizeof(buffer)); \
+        memset(expect, 0, sizeof(expect)); \
+        TESTS \
+        *hvx_output = Q6_V##TO##_equals_V##FROM(buffer); \
+        check_output_##TO(__LINE__, 1); \
+    }
+
+DEF_TEST_VCONV(w, sf, { \
+    TEST_EXP(w, sf, SF_QNaN, INT32_MAX); \
+    TEST_EXP(w, sf, SF_SNaN, INT32_MAX); \
+    TEST_EXP(w, sf, SF_QNaN_neg, INT32_MIN); \
+    TEST_EXP(w, sf, SF_INF, INT32_MAX); \
+    TEST_EXP(w, sf, SF_INF_neg, INT32_MIN); \
+    TEST_EXP(w, sf, SF_small_neg, 0); \
+    TEST_EXP(w, sf, SF_neg_two, -2); \
+    TEST_EXP(w, sf, SF_zero_neg, 0); \
+    TEST_EXP(w, sf, raw_sf(2.1f), 2); \
+    TEST_EXP(w, sf, raw_sf(2.8f), 2); \
+})
+
+DEF_TEST_VCONV(h, hf, { \
+    TEST_EXP(h, hf, HF_QNaN, INT16_MAX); \
+    TEST_EXP(h, hf, HF_SNaN, INT16_MAX); \
+    TEST_EXP(h, hf, HF_QNaN_neg, INT16_MIN); \
+    TEST_EXP(h, hf, HF_INF, INT16_MAX); \
+    TEST_EXP(h, hf, HF_INF_neg, INT16_MIN); \
+    TEST_EXP(h, hf, HF_small_neg, 0); \
+    TEST_EXP(h, hf, HF_neg_two, -2); \
+    TEST_EXP(h, hf, HF_zero_neg, 0); \
+    TEST_EXP(h, hf, raw_hf(2.1), 2); \
+    TEST_EXP(h, hf, raw_hf(2.8), 2); \
+})
+
+DEF_TEST_VCONV(hf, h, { \
+    TEST_EXP(hf, h, INT16_MAX, HF_QNaN); \
+    TEST_EXP(hf, h, INT16_MAX, HF_SNaN); \
+    TEST_EXP(hf, h, INT16_MIN, HF_QNaN_neg); \
+    TEST_EXP(hf, h, INT16_MAX, HF_INF); \
+    TEST_EXP(hf, h, INT16_MIN, HF_INF_neg); \
+    TEST_EXP(hf, h, 0, HF_small_neg); \
+    TEST_EXP(hf, h, -2, HF_neg_two); \
+    TEST_EXP(hf, h, 0, HF_zero_neg); \
+    TEST_EXP(hf, h, 2, raw_hf(2.1)); \
+    TEST_EXP(hf, h, 2, raw_hf(2.8)); \
+})
+
+DEF_TEST_VCONV(sf, w, { \
+    TEST_EXP(sf, w, INT32_MAX, SF_QNaN); \
+    TEST_EXP(sf, w, INT32_MAX, SF_SNaN); \
+    TEST_EXP(sf, w, INT32_MIN, SF_QNaN_neg); \
+    TEST_EXP(sf, w, INT32_MAX, SF_INF); \
+    TEST_EXP(sf, w, INT32_MIN, SF_INF_neg); \
+    TEST_EXP(sf, w, 0, SF_small_neg); \
+    TEST_EXP(sf, w, -2, SF_neg_two); \
+    TEST_EXP(sf, w, 0, SF_zero_neg); \
+    TEST_EXP(sf, w, 2, raw_sf(2.1f)); \
+    TEST_EXP(sf, w, 2, raw_sf(2.8f)); \
+})
+
+int main(void)
+{
+    test_vcvt_uh_hf();
+    test_vcvt_h_hf();
+    test_vcvt_ub_hf();
+    test_vcvt_b_hf();
+    test_vconv_w_sf();
+    puts(err ? "FAIL" : "PASS");
+    return err ? 1 : 0;
+}
diff --git a/tests/tcg/hexagon/Makefile.target b/tests/tcg/hexagon/Makefile.target
index 16072c96fd..e240372fd2 100644
--- a/tests/tcg/hexagon/Makefile.target
+++ b/tests/tcg/hexagon/Makefile.target
@@ -51,6 +51,7 @@ HEX_TESTS += scatter_gather
 HEX_TESTS += hvx_misc
 HEX_TESTS += hvx_histogram
 HEX_TESTS += fp_hvx
+HEX_TESTS += fp_hvx_cvt
 HEX_TESTS += fp_hvx_disabled
 HEX_TESTS += invalid-slots
 HEX_TESTS += invalid-encoding
@@ -129,6 +130,8 @@ fp_hvx: fp_hvx.c hvx_misc.h
 fp_hvx: CFLAGS += -mhvx -mhvx-ieee-fp
 fp_hvx_disabled: fp_hvx_disabled.c hvx_misc.h
 fp_hvx_disabled: CFLAGS += -mhvx -mhvx-ieee-fp
+fp_hvx_cvt: fp_hvx_cvt.c hvx_misc.h
+fp_hvx_cvt: CFLAGS += -mhvx -mhvx-ieee-fp
 
 run-fp_hvx_disabled: QEMU_OPTS += -cpu v73,ieee-fp=false
 
-- 
2.37.2
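
The two-input conversion layout described in the comment above TEST_EXP_2 can be sketched as plain index arithmetic. This is only an illustration of the comment, assuming the layout is exactly as written there; the helper names cvt2_src_index and cvt2_dst_index are hypothetical and do not appear in the patch.

```c
#include <stddef.h>

/*
 * Hypothetical helpers mirroring the layout comment in fp_hvx_cvt.c:
 *   i   - index of the input pair
 *   src - 0 for the first input vector (VuV), 1 for the second (VvV)
 *   k   - 0 or 1, position inside the pair
 */
static size_t cvt2_src_index(size_t i, size_t k)
{
    /* Element read from the chosen source vector. */
    return 2 * i + k;
}

static size_t cvt2_dst_index(size_t i, size_t src, size_t k)
{
    /* Element written in the destination vector. */
    return 4 * i + 2 * src + k;
}
```

TEST_EXP_2 always uses k = 0 and alternates src between 0 and 1 as index grows, so successive test values land in destination elements 0, 6, 8, 14, and so on.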



* [PATCH 13/13] tests/hexagon: add tests for v68 HVX IEEE float comparisons
  2026-03-23 13:15 [PATCH 00/13] hexagon: add missing HVX float instructions Matheus Tavares Bernardino
                   ` (11 preceding siblings ...)
  2026-03-23 13:15 ` [PATCH 12/13] tests/hexagon: add tests for v68 HVX IEEE float conversions Matheus Tavares Bernardino
@ 2026-03-23 13:15 ` Matheus Tavares Bernardino
  2026-03-24 19:37   ` Taylor Simpson
  12 siblings, 1 reply; 39+ messages in thread
From: Matheus Tavares Bernardino @ 2026-03-23 13:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: brian.cain, ale, anjo, ltaylorsimpson, marco.liebel, philmd,
	quic_mburton, sid.manning

Signed-off-by: Matheus Tavares Bernardino <matheus.bernardino@oss.qualcomm.com>
---
 tests/tcg/hexagon/hex_test.h      |  1 +
 tests/tcg/hexagon/fp_hvx_cmp.c    | 58 +++++++++++++++++++++++++++++++
 tests/tcg/hexagon/Makefile.target |  3 ++
 3 files changed, 62 insertions(+)
 create mode 100644 tests/tcg/hexagon/fp_hvx_cmp.c

diff --git a/tests/tcg/hexagon/hex_test.h b/tests/tcg/hexagon/hex_test.h
index 28522faf04..5bc7a76a17 100644
--- a/tests/tcg/hexagon/hex_test.h
+++ b/tests/tcg/hexagon/hex_test.h
@@ -124,6 +124,7 @@ const uint16_t HF_any = 0x3c00;
 const uint16_t HF_neg_two = 0xc000;
 
 const uint32_t SF_INF =              0x7f800000;
+const uint32_t SF_INF_neg =          0xff800000;
 const uint32_t SF_QNaN =             0x7fc00000;
 const uint32_t SF_QNaN_special =     0x7f800001;
 const uint32_t SF_SNaN =             0x7fb00000;
diff --git a/tests/tcg/hexagon/fp_hvx_cmp.c b/tests/tcg/hexagon/fp_hvx_cmp.c
new file mode 100644
index 0000000000..e925c973f3
--- /dev/null
+++ b/tests/tcg/hexagon/fp_hvx_cmp.c
@@ -0,0 +1,58 @@
+/*
+ *  Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
+ *
+ *  SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <string.h>
+#include <hexagon_types.h>
+#include <hvx_hexagon_protos.h>
+
+#if __HEXAGON_ARCH__ > 75
+#error "After v75, compiler will replace some FP HVX instructions."
+#endif
+
+int err;
+#include "hvx_misc.h"
+#include "hex_test.h"
+
+#define TEST_CMP(VAL1, VAL2, EXP) do { \
+    ((MMVector *)&buffers[0])->sf[index] = VAL1; \
+    ((MMVector *)&buffers[1])->sf[index] = VAL2; \
+    expect[0].w[index] = EXP ? 0xffffffff : 0; \
+    index++; \
+} while (0)
+
+int main(void)
+{
+    HVX_Vector *hvx_output = (HVX_Vector *)&output[0];
+    HVX_Vector buffers[2], true_vec, false_vec;
+    HVX_VectorPred pred;
+    int index = 0;
+
+    memset(&buffers, 0, sizeof(buffers));
+    memset(expect, 0, sizeof(expect));
+    memset(&true_vec, 0xff, sizeof(true_vec));
+    memset(&false_vec, 0, sizeof(false_vec));
+
+    TEST_CMP(raw_sf(2.2),  raw_sf(2.1),  true);
+    TEST_CMP(raw_sf(2.2),  raw_sf(2.2),  false);
+    TEST_CMP(raw_sf(0),    raw_sf(-2.2), true);
+    TEST_CMP(SF_SNaN,      SF_SNaN,      false);
+    TEST_CMP(SF_INF,       SF_INF_neg,   true);
+    TEST_CMP(SF_INF_neg,   SF_INF,       false);
+    TEST_CMP(SF_SNaN,      SF_QNaN,      false);
+    TEST_CMP(SF_QNaN,      SF_SNaN,      true);
+    TEST_CMP(SF_QNaN,      SF_QNaN_neg,  true);
+
+    pred = Q6_Q_vcmp_gt_VsfVsf(buffers[0], buffers[1]);
+    *hvx_output = Q6_V_vmux_QVV(pred, true_vec, false_vec);
+
+    check_output_sf(__LINE__, 1);
+
+    puts(err ? "FAIL" : "PASS");
+    return err ? 1 : 0;
+}
diff --git a/tests/tcg/hexagon/Makefile.target b/tests/tcg/hexagon/Makefile.target
index e240372fd2..ba93ffab17 100644
--- a/tests/tcg/hexagon/Makefile.target
+++ b/tests/tcg/hexagon/Makefile.target
@@ -52,6 +52,7 @@ HEX_TESTS += hvx_misc
 HEX_TESTS += hvx_histogram
 HEX_TESTS += fp_hvx
 HEX_TESTS += fp_hvx_cvt
+HEX_TESTS += fp_hvx_cmp
 HEX_TESTS += fp_hvx_disabled
 HEX_TESTS += invalid-slots
 HEX_TESTS += invalid-encoding
@@ -132,6 +133,8 @@ fp_hvx_disabled: fp_hvx_disabled.c hvx_misc.h
 fp_hvx_disabled: CFLAGS += -mhvx -mhvx-ieee-fp
 fp_hvx_cvt: fp_hvx_cvt.c hvx_misc.h
 fp_hvx_cvt: CFLAGS += -mhvx -mhvx-ieee-fp
+fp_hvx_cmp: fp_hvx_cmp.c hvx_misc.h
+fp_hvx_cmp: CFLAGS += -mhvx -mhvx-ieee-fp
 
 run-fp_hvx_disabled: QEMU_OPTS += -cpu v73,ieee-fp=false
 
-- 
2.37.2
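
The compare test above builds a predicate with a vector compare and then expands it to all-ones or all-zeros words with a vector mux. A scalar sketch of that pattern follows, assuming one predicate flag per 32-bit lane; demo_cmp_gt_f and demo_vmux_w are hypothetical names, and the ordinary C ">" used here does not model the NaN cases that the test expects from the instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model: set pred[i] when a[i] > b[i] (plain C compare). */
static void demo_cmp_gt_f(const float *a, const float *b, uint8_t *pred,
                          size_t n)
{
    for (size_t i = 0; i < n; i++) {
        pred[i] = a[i] > b[i];
    }
}

/* Hypothetical scalar model: per-lane select between two word vectors. */
static void demo_vmux_w(const uint8_t *pred, const uint32_t *t,
                        const uint32_t *f, uint32_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = pred[i] ? t[i] : f[i];
    }
}
```

Feeding an all-ones vector as the "true" input and an all-zeros vector as the "false" input yields 0xffffffff or 0 per lane, which is the shape of the expected values that TEST_CMP stores.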



* Re: [PATCH 02/13] target/hexagon: fix incorrect/too-permissive HVX encodings
  2026-03-23 13:15 ` [PATCH 02/13] target/hexagon: fix incorrect/too-permissive HVX encodings Matheus Tavares Bernardino
@ 2026-03-23 19:21   ` Taylor Simpson
  0 siblings, 0 replies; 39+ messages in thread
From: Taylor Simpson @ 2026-03-23 19:21 UTC (permalink / raw)
  To: Matheus Tavares Bernardino
  Cc: qemu-devel, brian.cain, ale, anjo, marco.liebel, philmd,
	quic_mburton, sid.manning

On Mon, Mar 23, 2026 at 7:15 AM Matheus Tavares Bernardino <
matheus.bernardino@oss.qualcomm.com> wrote:

> The following encodings have become stricter since v68:
>
>     - V6_vunpackob, V6_vunpackoh: ---00 -> --000
>     - V6_vaddbq/hq/wq, V6_vaddbnq/hnq/wnq: ---01 -> --001
>     - V6_vsubbq/hq, V6_vsubwq/bnq/hnq/wnq: ---01/---10 -> --001/--010
>     - V6_vhist, V6_vwhist128/256, V6_vwhist128/256_sat: ---00 -> --000
>     - V6_vhistq, V6_vwhist128/256q, V6_vwhist128/256q_sat: ---10 -> --010
>
> Pre-v68 compilers, by default, already use "0" for the non-specified bit
> that changed in v68, so unless someone is manually writing the binary
> encoding, this should not cause any backwards incompatibility with
> pre-v68 binaries.
>
> Signed-off-by: Matheus Tavares Bernardino <
> matheus.bernardino@oss.qualcomm.com>
> ---
>  target/hexagon/imported/mmvec/encode_ext.def | 48 ++++++++++----------
>  1 file changed, 24 insertions(+), 24 deletions(-)
>

Reviewed-by: Taylor Simpson <ltaylorsimpson@gmail.com>
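
To make the "---00 -> --000" notation in the quoted commit message concrete: a "-" is a don't-care bit, so the stricter pattern additionally requires the third-lowest bit to be 0. A minimal sketch on the 5-bit fragments quoted above, using a hypothetical helper enc_matches (this is not QEMU's decoder):

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: check a bit field against a pattern string in which
 * '0' and '1' are fixed bits and any other character is a don't-care.
 * The leftmost pattern character is the most significant bit.
 * Patterns of up to 32 characters are supported.
 */
static bool enc_matches(const char *pattern, uint32_t bits)
{
    size_t n = strlen(pattern);
    uint32_t mask = 0, match = 0;

    for (size_t i = 0; i < n; i++) {
        mask <<= 1;
        match <<= 1;
        if (pattern[i] == '0' || pattern[i] == '1') {
            mask |= 1;
            match |= (pattern[i] == '1');
        }
    }
    return (bits & mask) == match;
}
```

With this helper, the fragment 0b00100 matches "---00" but not "--000", while 0b00000 matches both. The second case is the one the commit message says pre-v68 compilers already emit.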

* Re: [PATCH 03/13] target/hexagon/cpu: add HVX IEEE FP extension
  2026-03-23 13:15 ` [PATCH 03/13] target/hexagon/cpu: add HVX IEEE FP extension Matheus Tavares Bernardino
@ 2026-03-23 19:32   ` Taylor Simpson
  2026-03-24 16:52     ` Matheus Bernardino
  0 siblings, 1 reply; 39+ messages in thread
From: Taylor Simpson @ 2026-03-23 19:32 UTC (permalink / raw)
  To: Matheus Tavares Bernardino
  Cc: qemu-devel, brian.cain, ale, anjo, marco.liebel, philmd,
	quic_mburton, sid.manning

On Mon, Mar 23, 2026 at 7:15 AM Matheus Tavares Bernardino <
matheus.bernardino@oss.qualcomm.com> wrote:

> This flag will be used to control the HVX IEEE float instructions, which
> are only available on some Hexagon cores. When unavailable, the
> instruction is essentially treated as a no-op.
>
> Signed-off-by: Matheus Tavares Bernardino <
> matheus.bernardino@oss.qualcomm.com>
> ---
>  target/hexagon/cpu.h             |  1 +
>  target/hexagon/translate.h       |  1 +
>  target/hexagon/attribs_def.h.inc |  3 +++
>  target/hexagon/cpu.c             |  1 +
>  target/hexagon/decode.c          | 22 ++++++++++++++++++++++
>  target/hexagon/translate.c       |  1 +
>  6 files changed, 29 insertions(+)
>
>
> diff --git a/target/hexagon/decode.c b/target/hexagon/decode.c
> index dbc9c630e8..d832a64a17 100644
> --- a/target/hexagon/decode.c
> +++ b/target/hexagon/decode.c
> @@ -696,6 +696,18 @@ static bool pkt_has_write_conflict(Packet *pkt)
>      return !bitmap_empty(conflict, 32);
>  }
>
> +static void convert_to_nop(Insn *insn)
> +{
> +    bool is_endloop = insn->is_endloop;
> +    memset(insn, 0, sizeof(*insn));
> +    insn->opcode = A2_nop;
> +    insn->new_read_idx = -1;
> +    insn->dest_idx = -1;
> +    insn->generate = opcode_genptr[insn->opcode];
> +    insn->iclass = 0b111;
> +    insn->is_endloop = is_endloop;
> +}
> +
>  /*
>   * decode_packet
>   * Decodes packet with given words
> @@ -746,6 +758,16 @@ int decode_packet(DisasContext *ctx, int max_words, const uint32_t *words,
>          /* Ran out of words! */
>          return 0;
>      }
> +
> +    /* Disable HVX IEEE instruction if extension is disabled. */
> +    if (!ctx->ieee_fp_extension) {
> +        for (i = 0; i < num_insns; i++) {
> +            if (GET_ATTRIB(pkt->insn[i].opcode, A_HVX_IEEE_FP)) {
> +                convert_to_nop(&pkt->insn[i]);
> +            }
> +        }
> +    }
> +
>

Better to leave the instruction alone and turn it into a nop by not
generating any TCG.

That way, the disassembly (-d in_asm) will still show what's actually in
the binary.  You could add the check in gen_tcg_funcs.py.

You could also consider adding some sort of marker in the disassembly to
indicate that the flag is needed for the instruction to do anything.
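
The difference between the two approaches can be sketched in isolation. In the sketch below the decoded instruction is left untouched and only code generation is skipped, so a disassembler that reads the opcode afterwards still sees the original instruction. DemoInsn, DemoCtx and demo_gen_insn are hypothetical stand-ins, not QEMU's Insn, DisasContext or generated TCG functions.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a decoded instruction. */
typedef struct {
    int opcode;           /* kept as decoded, never rewritten */
    bool needs_ieee_fp;   /* models an attribute such as A_HVX_IEEE_FP */
} DemoInsn;

/* Hypothetical stand-in for the translation context. */
typedef struct {
    bool ieee_fp_extension;
    int ops_emitted;      /* counts instructions that produced code */
} DemoCtx;

/* Returns true if code was emitted for the instruction. */
static bool demo_gen_insn(DemoCtx *ctx, const DemoInsn *insn)
{
    if (insn->needs_ieee_fp && !ctx->ieee_fp_extension) {
        /* Extension disabled: behave as a nop by emitting nothing. */
        return false;
    }
    ctx->ops_emitted++;
    return true;
}
```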

Thanks,
Taylor

* Re: [PATCH 04/13] target/hexagon: add v68 HVX IEEE float arithmetic insns
  2026-03-23 13:15 ` [PATCH 04/13] target/hexagon: add v68 HVX IEEE float arithmetic insns Matheus Tavares Bernardino
@ 2026-03-23 20:28   ` Taylor Simpson
  2026-03-24 19:30     ` Matheus Bernardino
  0 siblings, 1 reply; 39+ messages in thread
From: Taylor Simpson @ 2026-03-23 20:28 UTC (permalink / raw)
  To: Matheus Tavares Bernardino
  Cc: qemu-devel, brian.cain, ale, anjo, marco.liebel, philmd,
	quic_mburton, sid.manning

On Mon, Mar 23, 2026 at 7:15 AM Matheus Tavares Bernardino <
matheus.bernardino@oss.qualcomm.com> wrote:

> Add HVX IEEE floating-point arithmetic instructions:
> - vmpy_sf_sf, vmpy_sf_hf, vmpy_hf_hf: multiply operations
> - vdmpy_sf_hf: dot-product multiply
> - vmpy_sf_hf_acc, vmpy_hf_hf_acc, vdmpy_sf_hf_acc: multiply-accumulate
> - vadd_sf_sf, vsub_sf_sf, vadd_sf_hf, vsub_sf_hf: add/sub with sf output
> - vadd_hf_hf, vsub_hf_hf: add/sub with hf output
>
> Signed-off-by: Matheus Tavares Bernardino <
> matheus.bernardino@oss.qualcomm.com>
> ---
>  target/hexagon/mmvec/kvx_ieee.h              | 47 ++++++++++
>  target/hexagon/mmvec/macros.h                |  1 +
>  target/hexagon/mmvec/mmvec.h                 |  2 +
>  target/hexagon/attribs_def.h.inc             |  4 +
>  target/hexagon/mmvec/kvx_ieee.c              | 87 ++++++++++++++++++
>  target/hexagon/hex_common.py                 |  1 +
>  target/hexagon/imported/mmvec/encode_ext.def | 18 ++++
>  target/hexagon/imported/mmvec/ext.idef       | 93 ++++++++++++++++++++
>  target/hexagon/meson.build                   |  1 +
>  9 files changed, 254 insertions(+)
>  create mode 100644 target/hexagon/mmvec/kvx_ieee.h
>  create mode 100644 target/hexagon/mmvec/kvx_ieee.c
>

I'm curious why the prefix is kvx instead of hvx.


>
> diff --git a/target/hexagon/mmvec/kvx_ieee.h b/target/hexagon/mmvec/kvx_ieee.h
> new file mode 100644
> index 0000000000..e92ddebeb9
> --- /dev/null
> +++ b/target/hexagon/mmvec/kvx_ieee.h
> @@ -0,0 +1,47 @@
> +/*
> + *  Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
> + *
> + *  SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#ifndef HEXAGON_KVX_IEEE_H
> +#define HEXAGON_KVX_IEEE_H
> +
> +#include "fpu/softfloat.h"
> +
> +/* Hexagon canonical NaN */
> +#define FP32_DEF_NAN      0x7FFFFFFF
> +#define FP16_DEF_NAN      0x7FFF
>

These are the same as the scalar core, right?  If so, there's already a
call to set_float_default_nan_pattern in hexagon_cpu_reset_hold.

If the patterns are different, you'll need to call
set_float_default_nan_pattern before each scalar FP instruction and before
each HVX FP instruction.
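
For readers checking the two constants in the quoted header: both are all-ones below the sign bit, which decodes as a quiet NaN in IEEE 754 binary32 and binary16. A small sketch on raw bit patterns follows; the helper names are hypothetical and are not softfloat functions, and the quiet-bit test assumes the usual convention that the top mantissa bit marks a quiet NaN.

```c
#include <stdbool.h>
#include <stdint.h>

/* NaN: exponent field all ones and a non-zero mantissa (binary32). */
static bool f32_bits_is_nan(uint32_t b)
{
    return (b & 0x7f800000u) == 0x7f800000u && (b & 0x007fffffu) != 0;
}

/* Quiet bit: most significant mantissa bit (assumed convention). */
static bool f32_bits_is_quiet(uint32_t b)
{
    return (b & 0x00400000u) != 0;
}

/* NaN: exponent field all ones and a non-zero mantissa (binary16). */
static bool f16_bits_is_nan(uint16_t b)
{
    return (b & 0x7c00u) == 0x7c00u && (b & 0x03ffu) != 0;
}
```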


> +
> +/*
> + * IEEE - FP ADD/SUB/MPY instructions
> + */
> +uint32_t fp_mult_sf_sf(uint32_t a1, uint32_t a2, float_status *fp_status);
> +uint32_t fp_add_sf_sf(uint32_t a1, uint32_t a2, float_status *fp_status);
> +uint32_t fp_sub_sf_sf(uint32_t a1, uint32_t a2, float_status *fp_status);
> +
> +uint16_t fp_mult_hf_hf(uint16_t a1, uint16_t a2, float_status *fp_status);
> +uint16_t fp_add_hf_hf(uint16_t a1, uint16_t a2, float_status *fp_status);
> +uint16_t fp_sub_hf_hf(uint16_t a1, uint16_t a2, float_status *fp_status);
> +
> +uint32_t fp_mult_sf_hf(uint16_t a1, uint16_t a2, float_status *fp_status);
> +uint32_t fp_add_sf_hf(uint16_t a1, uint16_t a2, float_status *fp_status);
> +uint32_t fp_sub_sf_hf(uint16_t a1, uint16_t a2, float_status *fp_status);
> +
> +/*
> + * IEEE - FP Accumulate instructions
> + */
> +uint16_t fp_mult_hf_hf_acc(uint16_t a1, uint16_t a2, uint16_t acc,
> +                           float_status *fp_status);
> +uint32_t fp_mult_sf_hf_acc(uint16_t a1, uint16_t a2, uint32_t acc,
> +                           float_status *fp_status);
> +
> +/*
> + * IEEE - FP Reduce instructions
> + */
> +uint32_t fp_vdmpy(uint16_t a1, uint16_t a2, uint16_t a3, uint16_t a4,
> +                  float_status *fp_status);
> +uint32_t fp_vdmpy_acc(uint32_t acc, uint16_t a1, uint16_t a2, uint16_t a3,
> +                      uint16_t a4, float_status *fp_status);
> +
>

Consider using macros similar to the ones in the .c file to create these
protos.
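
One way to do this is an X-macro list that is expanded once for prototypes and once for definitions, so the two cannot drift apart. The sketch below uses plain integer arithmetic and hypothetical names (DEMO_FP_INSN_LIST, demo_add_w_w, demo_mul_w_h) only to show the mechanism; it is not the patch's DEF_FP_INSN_2 and makes no softfloat calls.

```c
#include <stdint.h>

/* Single list of (name, result bits, arg1 bits, arg2 bits, expression). */
#define DEMO_FP_INSN_LIST(X) \
    X(demo_add_w_w, 32, 32, 32, a1 + a2) \
    X(demo_mul_w_h, 32, 16, 16, (uint32_t)a1 * a2)

/* Expansion 1: prototypes, as they would appear in the header. */
#define DEMO_PROTO(name, rt, a1t, a2t, expr) \
    uint##rt##_t name(uint##a1t##_t a1, uint##a2t##_t a2);

/* Expansion 2: definitions, as they would appear in the .c file. */
#define DEMO_DEFINE(name, rt, a1t, a2t, expr) \
    uint##rt##_t name(uint##a1t##_t a1, uint##a2t##_t a2) \
    { \
        return (expr); \
    }

DEMO_FP_INSN_LIST(DEMO_PROTO)
DEMO_FP_INSN_LIST(DEMO_DEFINE)
```

In a real split, the list macro and DEMO_PROTO would live in the header and DEMO_DEFINE in the source file.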


> +#endif
> diff --git a/target/hexagon/mmvec/kvx_ieee.c b/target/hexagon/mmvec/kvx_ieee.c
> new file mode 100644
> index 0000000000..b763899aa3
> --- /dev/null
> +++ b/target/hexagon/mmvec/kvx_ieee.c
> @@ -0,0 +1,87 @@
> +/*
> + *  Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
> + *
> + *  SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#include "qemu/osdep.h"
> +#include "kvx_ieee.h"
> +
> +#define DEF_FP_INSN_2(name, rt, a1t, a2t, op) \
> +    uint##rt##_t fp_##name(uint##a1t##_t a1, uint##a2t##_t a2, \
> +                           float_status *fp_status) { \
> +        float##a1t f1 = make_float##a1t(a1); \
> +        float##a2t f2 = make_float##a2t(a2); \
> +        \
> +        if (float##a1t##_is_any_nan(f1) || float##a2t##_is_any_nan(f2)) { \
> +            return FP##rt##_DEF_NAN; \
> +        } \
>

These nan checks shouldn't be needed if you're using QEMU softfloat
properly.


> +        float##rt result = op; \
> +        \
> +        if (float##rt##_is_any_nan(result)) { \
> +            return FP##rt##_DEF_NAN; \
> +        } \
>

Ditto


> +        return result; \
> +    }
> +
> +#define DEF_FP_INSN_3(name, rt, a1t, a2t, a3t, op) \
> +    uint##rt##_t fp_##name(uint##a1t##_t a1, uint##a2t##_t a2, \
> +                           uint##a3t##_t a3, float_status *fp_status) { \
> +        float##a1t f1 = make_float##a1t(a1); \
> +        float##a2t f2 = make_float##a2t(a2); \
> +        float##a3t f3 = make_float##a3t(a3); \
> +        \
> +        if (float##a1t##_is_any_nan(f1) || float##a2t##_is_any_nan(f2) || \
> +            float##a3t##_is_any_nan(f3)) \
> +            return FP##rt##_DEF_NAN; \
>

Ditto


> +        \
> +        float##rt result = op; \
> +        \
> +        if (float##rt##_is_any_nan(result)) \
> +            return FP##rt##_DEF_NAN; \
>

Ditto


> +        return result; \
> +    }
> +
> diff --git a/target/hexagon/imported/mmvec/ext.idef b/target/hexagon/imported/mmvec/ext.idef
> index 03d31f6181..3f0d8e366e 100644
> --- a/target/hexagon/imported/mmvec/ext.idef
> +++ b/target/hexagon/imported/mmvec/ext.idef
> @@ -2895,9 +2895,102 @@ EXTINSN(V6_vprefixqw,"Vd32.w=prefixsum(Qv4)",
>  ATTRIBS(A_EXTENSION,A_CVI,A_CVI_
>      }
>      } )
>
> +/* KVX - IEEE FP Instructions */
>
> +/* Single pipe, 32-bit output */
> +#define ITERATOR_INSN_IEEE_FP_32(WIDTH,TAG,SYNTAX,DESCR,CODE) \
> +EXTINSN(V6_##TAG, SYNTAX, \
> +ATTRIBS(A_EXTENSION,A_HVX_IEEE_FP,A_CVI,A_CVI_VX,A_HVX_IEEE_FP_OUT_32), \
> +DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
>
> +/* Single pipe, 16-bit output */
> +#define ITERATOR_INSN_IEEE_FP_16(WIDTH,TAG,SYNTAX,DESCR,CODE) \
> +EXTINSN(V6_##TAG, SYNTAX, \
> +ATTRIBS(A_EXTENSION,A_HVX_IEEE_FP,A_CVI,A_CVI_VX,A_HVX_IEEE_FP_OUT_16), \
> +DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
>
> +/* Two pipes: P2 & P3, single output: P2, 32-bit output */
> +#define ITERATOR_INSN_IEEE_FP_DOUBLE_SINGLE_32(WIDTH,TAG,SYNTAX,DESCR,CODE) \
> +EXTINSN(V6_##TAG, SYNTAX, \
> +ATTRIBS(A_EXTENSION,A_HVX_IEEE_FP,A_CVI,A_CVI_VX_DV,A_HVX_IEEE_FP_OUT_32), \
> +DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
> +
> +/* Two pipes: P2 & P3, two outputs, 32-bit output */
> +#define ITERATOR_INSN_IEEE_FP_DOUBLE_32(WIDTH,TAG,SYNTAX,DESCR,CODE) \
> +EXTINSN(V6_##TAG, SYNTAX, \
> +ATTRIBS(A_EXTENSION,A_HVX_IEEE_FP,A_CVI,A_CVI_VX_DV,A_HVX_IEEE_FP_OUT_32), \
> +DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
> +
> +/*
> + * single pipe, accumulate instruction, produces 16-bit output, requires 16-bit
> + * accumulate input
> + */
> +#define ITERATOR_INSN_IEEE_FP_ACC_16(WIDTH,TAG,SYNTAX,DESCR,CODE) \
> +EXTINSN(V6_##TAG, SYNTAX, \
> +ATTRIBS(A_EXTENSION,A_HVX_IEEE_FP,A_CVI,A_CVI_VX,A_HVX_IEEE_FP_ACC,A_HVX_IEEE_FP_OUT_16,A_CVI_VX_NO_TMP_LD), \
> +DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
> +
> +/*
> + * single pipe, accumulate instruction, produces 32-bit output, requires 32-bit
> + * accumulate input
> + */
> +#define ITERATOR_INSN_IEEE_FP_ACC_32(WIDTH,TAG,SYNTAX,DESCR,CODE) \
> +EXTINSN(V6_##TAG, SYNTAX, \
> +ATTRIBS(A_EXTENSION,A_HVX_IEEE_FP,A_CVI,A_CVI_VX,A_HVX_IEEE_FP_ACC,A_HVX_IEEE_FP_OUT_32,A_CVI_VX_NO_TMP_LD), \
> +DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
> +
> +/* IEEE FP multiply instructions */
> +ITERATOR_INSN_IEEE_FP_DOUBLE_SINGLE_32(32, vmpy_sf_sf,
> +    "Vd32.sf=vmpy(Vu32.sf,Vv32.sf)", "Vector IEEE mul: sf",
> +    VdV.sf[i] = fp_mult_sf_sf(VuV.sf[i], VvV.sf[i], &env->fp_status))
>

Do these instructions interact with the FP bits in USR (e.g., rounding
mode, FP exceptions)?

If so, you'll need something similar to arch_fpop_start/arch_fpop_end at
the start/end of each helper.  This can be done in gen_helper_funcs.py.

Thanks,
Taylor

* Re: [PATCH 05/13] target/hexagon: add v68 HVX IEEE float min/max insns
  2026-03-23 13:15 ` [PATCH 05/13] target/hexagon: add v68 HVX IEEE float min/max insns Matheus Tavares Bernardino
@ 2026-03-23 20:47   ` Taylor Simpson
  2026-03-24 20:15     ` Matheus Bernardino
  0 siblings, 1 reply; 39+ messages in thread
From: Taylor Simpson @ 2026-03-23 20:47 UTC (permalink / raw)
  To: Matheus Tavares Bernardino
  Cc: qemu-devel, brian.cain, ale, anjo, marco.liebel, philmd,
	quic_mburton, sid.manning

On Mon, Mar 23, 2026 at 7:15 AM Matheus Tavares Bernardino <
matheus.bernardino@oss.qualcomm.com> wrote:

> Add HVX IEEE floating-point min/max instructions:
> - vfmin_hf, vfmin_sf: IEEE floating-point minimum
> - vfmax_hf, vfmax_sf: IEEE floating-point maximum
> - vmax_hf, vmax_sf: qfloat IEEE maximum
> - vmin_hf, vmin_sf: qfloat IEEE minimum
>
> The Hexagon qfloat variants are similar to the IEEE-754 ones, but they
> handle NaN slightly differently. See the comment in kvx_ieee.h.
>
> Signed-off-by: Matheus Tavares Bernardino <
> matheus.bernardino@oss.qualcomm.com>
> ---
>  target/hexagon/mmvec/kvx_ieee.h              | 12 +++++
>  target/hexagon/mmvec/kvx_ieee.c              | 46 ++++++++++++++++++++
>  target/hexagon/imported/mmvec/encode_ext.def | 11 +++++
>  target/hexagon/imported/mmvec/ext.idef       | 28 +++++++++++-
>  4 files changed, 96 insertions(+), 1 deletion(-)
>
> diff --git a/target/hexagon/mmvec/kvx_ieee.h b/target/hexagon/mmvec/kvx_ieee.h
> index e92ddebeb9..78f546eb8e 100644
> --- a/target/hexagon/mmvec/kvx_ieee.h
> +++ b/target/hexagon/mmvec/kvx_ieee.h
> @@ -44,4 +44,16 @@ uint32_t fp_vdmpy(uint16_t a1, uint16_t a2, uint16_t a3, uint16_t a4,
>  uint32_t fp_vdmpy_acc(uint32_t acc, uint16_t a1, uint16_t a2, uint16_t a3,
>                        uint16_t a4, float_status *fp_status);
>
> +/* IEEE - FP min/max instructions */
> +uint32_t fp_min_sf(uint32_t a1, uint32_t a2, float_status *fp_status);
> +uint32_t fp_max_sf(uint32_t a1, uint32_t a2, float_status *fp_status);
> +uint16_t fp_min_hf(uint16_t a1, uint16_t a2, float_status *fp_status);
> +uint16_t fp_max_hf(uint16_t a1, uint16_t a2, float_status *fp_status);
> +
> +/* Qfloat min/max treat +NaN as greater than +INF and -NaN as smaller than -INF */
> +uint32_t qf_max_sf(uint32_t a1, uint32_t a2, float_status *fp_status);
> +uint32_t qf_min_sf(uint32_t a1, uint32_t a2, float_status *fp_status);
> +uint16_t qf_max_hf(uint16_t a1, uint16_t a2, float_status *fp_status);
> +uint16_t qf_min_hf(uint16_t a1, uint16_t a2, float_status *fp_status);
>

Why are we including Qfloat stuff in a patch series for IEEE float?
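
For context on the quoted comment: an ordering in which +NaN sorts above +INF and -NaN sorts below -INF is what falls out of comparing the raw sign-magnitude bit patterns. The sketch below shows one way to realize that ordering for 32-bit values. It is an illustration under that assumption, with hypothetical names (sf_order_key, demo_qf_max_sf, demo_qf_min_sf), and is not the code in kvx_ieee.c.

```c
#include <stdint.h>

/*
 * Map a binary32 bit pattern to an unsigned key whose integer order follows
 * the sign-magnitude order of the pattern: negative values are inverted,
 * non-negative values are moved above them.
 */
static uint32_t sf_order_key(uint32_t bits)
{
    return (bits & 0x80000000u) ? ~bits : (bits | 0x80000000u);
}

/* Hypothetical max under the bit-pattern ordering. */
static uint32_t demo_qf_max_sf(uint32_t a, uint32_t b)
{
    return sf_order_key(a) >= sf_order_key(b) ? a : b;
}

/* Hypothetical min under the bit-pattern ordering. */
static uint32_t demo_qf_min_sf(uint32_t a, uint32_t b)
{
    return sf_order_key(a) <= sf_order_key(b) ? a : b;
}
```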


> +
>  #endif
> diff --git a/target/hexagon/imported/mmvec/encode_ext.def b/target/hexagon/imported/mmvec/encode_ext.def
> index 4ce87d09fd..23fbb75743 100644
> --- a/target/hexagon/imported/mmvec/encode_ext.def
> +++ b/target/hexagon/imported/mmvec/encode_ext.def
> @@ -823,4 +823,15 @@
> DEF_ENC(V6_vsub_sf_hf,"00011111100vvvvvPP1uuuuu101ddddd")
>  DEF_ENC(V6_vadd_hf_hf,"00011111101vvvvvPP1uuuuu111ddddd")
>  DEF_ENC(V6_vsub_hf_hf,"00011111011vvvvvPP1uuuuu000ddddd")
>
> +/* IEEE FP min/max instructions */
> +DEF_ENC(V6_vfmin_hf,"00011100011vvvvvPP1uuuuu000ddddd")
> +DEF_ENC(V6_vfmin_sf,"00011100011vvvvvPP1uuuuu001ddddd")
> +DEF_ENC(V6_vfmax_hf,"00011100011vvvvvPP1uuuuu010ddddd")
> +DEF_ENC(V6_vfmax_sf,"00011100011vvvvvPP1uuuuu011ddddd")
> +DEF_ENC(V6_vmax_sf,"00011111110vvvvvPP1uuuuu001ddddd")
> +DEF_ENC(V6_vmin_sf,"00011111110vvvvvPP1uuuuu010ddddd")
> +DEF_ENC(V6_vmax_hf,"00011111110vvvvvPP1uuuuu011ddddd")
> +DEF_ENC(V6_vmin_hf,"00011111110vvvvvPP1uuuuu100ddddd")
> +DEF_ENC(V6_vcvt_ub_hf,"00011111110vvvvvPP1uuuuu101ddddd")
>

Minor nit - this is a conversion instruction and is repeated in patch 7.
Remove it from this patch.

Thanks,
Taylor

* Re: [PATCH 06/13] target/hexagon: add v68 HVX IEEE float misc insns
  2026-03-23 13:15 ` [PATCH 06/13] target/hexagon: add v68 HVX IEEE float misc insns Matheus Tavares Bernardino
@ 2026-03-23 21:08   ` Taylor Simpson
  2026-03-24 20:25     ` Matheus Bernardino
  0 siblings, 1 reply; 39+ messages in thread
From: Taylor Simpson @ 2026-03-23 21:08 UTC (permalink / raw)
  To: Matheus Tavares Bernardino
  Cc: qemu-devel, brian.cain, ale, anjo, marco.liebel, philmd,
	quic_mburton, sid.manning

On Mon, Mar 23, 2026 at 7:15 AM Matheus Tavares Bernardino <
matheus.bernardino@oss.qualcomm.com> wrote:

> Add HVX IEEE floating-point miscellaneous instructions:
> - vassign_fp (vfmv): vector move
> - vfneg_hf, vfneg_sf: vector floating-point negate
> - vabs_hf, vabs_sf: vector absolute value
>
> Signed-off-by: Matheus Tavares Bernardino <
> matheus.bernardino@oss.qualcomm.com>
> ---
>  target/hexagon/mmvec/kvx_ieee.h              |  3 +++
>  target/hexagon/imported/mmvec/encode_ext.def |  7 +++++++
>  target/hexagon/imported/mmvec/ext.idef       | 14 ++++++++++++++
>  3 files changed, 24 insertions(+)
>
> diff --git a/target/hexagon/mmvec/kvx_ieee.h b/target/hexagon/mmvec/kvx_ieee.h
> index 78f546eb8e..263feb7e94 100644
> --- a/target/hexagon/mmvec/kvx_ieee.h
> +++ b/target/hexagon/mmvec/kvx_ieee.h
> @@ -13,6 +13,9 @@
>  #define FP32_DEF_NAN      0x7FFFFFFF
>  #define FP16_DEF_NAN      0x7FFF
>
> +#define signF32UI(a) ((bool)((uint32_t)(a) >> 31))
> +#define signF16UI(a) ((bool)((uint16_t)(a) >> 15))
>

Use softfloat routines here
    !float32_is_neg
    !float16_is_neg

Actually, these aren't needed.  See below.


> diff --git a/target/hexagon/imported/mmvec/ext.idef b/target/hexagon/imported/mmvec/ext.idef
> index 43153366b1..5ef5baa404 100644
> --- a/target/hexagon/imported/mmvec/ext.idef
> +++ b/target/hexagon/imported/mmvec/ext.idef
> @@ -3018,6 +3018,20 @@
> ITERATOR_INSN_ANY_SLOT_2SRC(16,vmax_hf,"Vd32.hf=vmax(Vu32.hf,Vv32.hf)", \
>  ITERATOR_INSN_ANY_SLOT_2SRC(16,vmin_hf,"Vd32.hf=vmin(Vu32.hf,Vv32.hf)", \
>      "Vector min of hf input", VdV.hf[i] = qf_min_hf(VuV.hf[i], VvV.hf[i],
> &env->fp_status))
>
> +/* IEEE FP move, negate, abs instructions */
> +ITERATOR_INSN_IEEE_FP_16_32_LATE(32, vassign_fp, "Vd32.w=vfmv(Vu32.w)", \
> +    "Vector IEEE move", VdV.w[i]  = VuV.w[i])
> +ITERATOR_INSN_IEEE_FP_16_32_LATE(16, vfneg_hf, "Vd32.hf=vfneg(Vu32.hf)", \
> +    "Vector IEEE neg: hf", VdV.hf[i] = (VuV.hf[i] ^ 0x8000))
>

Use softfloat routines
    VdV.hf[i] = float16_set_sign(VuV.hf[i], float16_is_neg(VuV.hf[i]) ? 0 : 1)


> +ITERATOR_INSN_IEEE_FP_16_32_LATE(32, vfneg_sf, "Vd32.sf=vfneg(Vu32.sf)", \
> +    "Vector IEEE neg: sf", VdV.sf[i] = (VuV.sf[i] ^ 0x80000000))
>

Ditto, but float32_


> +ITERATOR_INSN_IEEE_FP_16_32_LATE(16, vabs_hf,  "Vd32.hf=vabs(Vu32.hf)", \
> +    "Vector IEEE abs: hf", \
> +    VdV.hf[i] = ((signF16UI(VuV.hf[i])) ? (VuV.hf[i] ^ 0x8000) : VuV.hf[i]))
>

Use softfloat routines
    VdV.hf[i] = float16_abs(VuV.hf[i])


> +ITERATOR_INSN_IEEE_FP_16_32_LATE(32, vabs_sf,  "Vd32.sf=vabs(Vu32.sf)", \
> +    "Vector IEEE abs: sf", \
> +    VdV.sf[i] = ((signF32UI(VuV.sf[i])) ? (VuV.sf[i] ^ 0x80000000) : VuV.sf[i]))
>

Ditto, but float32_
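
Both the open-coded expressions in the patch and the softfloat helpers suggested above come down to operations on the sign bit. A minimal sketch on raw bit patterns, with hypothetical helper names standing in for the softfloat calls:

```c
#include <stdint.h>

/* Negate: flip the sign bit, leave exponent and mantissa untouched. */
static uint16_t hf_bits_neg(uint16_t b)
{
    return (uint16_t)(b ^ 0x8000u);
}

static uint32_t sf_bits_neg(uint32_t b)
{
    return b ^ 0x80000000u;
}

/* Absolute value: clear the sign bit. */
static uint16_t hf_bits_abs(uint16_t b)
{
    return (uint16_t)(b & 0x7fffu);
}

static uint32_t sf_bits_abs(uint32_t b)
{
    return b & 0x7fffffffu;
}
```

Clearing the sign bit gives the same result as the conditional XOR in the quoted patch, without needing a separate sign test. For example, negating HF_two (0x4000) yields HF_neg_two (0xc000) from hex_test.h.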

Thanks,
Taylor

* Re: [PATCH 07/13] target/hexagon: add v68 HVX IEEE float conversion insns
  2026-03-23 13:15 ` [PATCH 07/13] target/hexagon: add v68 HVX IEEE float conversion insns Matheus Tavares Bernardino
@ 2026-03-23 21:25   ` Taylor Simpson
  2026-03-24 21:04     ` Matheus Bernardino
  0 siblings, 1 reply; 39+ messages in thread
From: Taylor Simpson @ 2026-03-23 21:25 UTC (permalink / raw)
  To: Matheus Tavares Bernardino
  Cc: qemu-devel, brian.cain, ale, anjo, marco.liebel, philmd,
	quic_mburton, sid.manning

[-- Attachment #1: Type: text/plain, Size: 4878 bytes --]

On Mon, Mar 23, 2026 at 7:16 AM Matheus Tavares Bernardino <
matheus.bernardino@oss.qualcomm.com> wrote:

> Add HVX IEEE floating-point conversion instructions:
> - vconv_hf_h, vconv_h_hf, vconv_sf_w, vconv_w_sf: vconv operations
> - vcvt_hf_sf, vcvt_sf_hf: float <-> half float conversions
> - vcvt_hf_b, vcvt_hf_h, vcvt_hf_ub, vcvt_hf_uh: int to half float
> - vcvt_b_hf, vcvt_h_hf, vcvt_ub_hf, vcvt_uh_hf: half float to int
>
> Signed-off-by: Matheus Tavares Bernardino <
> matheus.bernardino@oss.qualcomm.com>
> ---
>  target/hexagon/mmvec/kvx_ieee.h              | 21 +++++
>  target/hexagon/mmvec/kvx_ieee.c              | 98 ++++++++++++++++++++
>  target/hexagon/imported/mmvec/encode_ext.def | 18 ++++
>  target/hexagon/imported/mmvec/ext.idef       | 97 +++++++++++++++++++
>  4 files changed, 234 insertions(+)
>
> diff --git a/target/hexagon/mmvec/kvx_ieee.c
> b/target/hexagon/mmvec/kvx_ieee.c
> index 33621a15f3..bbeec09707 100644
> --- a/target/hexagon/mmvec/kvx_ieee.c
> +++ b/target/hexagon/mmvec/kvx_ieee.c
> @@ -131,3 +131,101 @@ uint16_t qf_min_hf(uint16_t a1, uint16_t a2,
> float_status *fp_status)
>      if (float16_is_pos_nan(f2) || float16_is_neg_nan(f1)) return a1;
>      return fp_min_hf(a1, a2, fp_status);
>  }
> +
> +uint16_t f32_to_f16(uint32_t a, float_status *fp_status)
> +{
> +    return float16_val(float32_to_float16(make_float32(a), true,
> fp_status));
> +}
> +
> +uint32_t f16_to_f32(uint16_t a, float_status *fp_status)
> +{
> +    return float32_val(float16_to_float32(make_float16(a), true,
> fp_status));
> +}
> +
> +uint16_t f16_to_uh(uint16_t op1, float_status *fp_status)
> +{
> +    return float16_to_uint16_scalbn(make_float16(op1),
> +                                    float_round_nearest_even,
>

Does HVX always use this rounding mode?  The scalar core uses the rounding
mode in USR.

There are several more instances below.
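
To illustrate why the answer matters, here is a small standalone sketch (not QEMU code) that rounds a value given in halves under four rounding modes. The enum and function names are hypothetical. A guest that sets a non-default mode in USR would observe different results if the helper hard-codes nearest-even.

```c
#include <stdint.h>

/* Hypothetical rounding modes, named only for this sketch. */
enum round_mode { RM_NEAREST_EVEN, RM_TOWARD_ZERO, RM_UP, RM_DOWN };

/* Round x/2 (x counts halves, so x = 5 means 2.5) to an integer. */
static int32_t round_halves(int32_t x, enum round_mode mode)
{
    /* floor(x / 2), computed without relying on negative division rules */
    int32_t floor_q = (x >= 0) ? x / 2 : -((-x + 1) / 2);
    /* 1 when x/2 has a fractional part of exactly one half, else 0 */
    int32_t has_half = x - 2 * floor_q;

    switch (mode) {
    case RM_DOWN:
        return floor_q;
    case RM_UP:
        return floor_q + has_half;
    case RM_TOWARD_ZERO:
        return (x >= 0) ? floor_q : floor_q + has_half;
    case RM_NEAREST_EVEN:
    default:
        if (!has_half) {
            return floor_q;
        }
        /* Exact tie: pick whichever neighbour is even. */
        return (floor_q & 1) ? floor_q + 1 : floor_q;
    }
}
```

For example, 2.5 converts to 2 under nearest-even and toward-zero but to 3 under round-up, and -2.5 converts to -3 only under round-down.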


> +                                    0, fp_status);
> +}
> +
> +int16_t f16_to_h(uint16_t op1, float_status *fp_status)
> +{
> +    return float16_to_int16_scalbn(make_float16(op1),
> +                                   float_round_nearest_even,
> +                                   0, fp_status);
> +}
> +
> +uint8_t f16_to_ub(uint16_t op1, float_status *fp_status)
> +{
> +    return float16_to_uint8_scalbn(make_float16(op1),
> +                                   float_round_nearest_even,
> +                                   0, fp_status);
> +}
> +
> +int8_t f16_to_b(uint16_t op1, float_status *fp_status)
> +{
> +    return float16_to_int8_scalbn(make_float16(op1),
> +                                   float_round_nearest_even,
> +                                   0, fp_status);
> +}
> +
> +uint16_t uh_to_f16(uint16_t op1)
> +{
> +    return uint64_to_float16_scalbn(op1, float_round_nearest_even, 0);
> +}
> +
> +uint16_t h_to_f16(int16_t op1)
> +{
> +    return int64_to_float16_scalbn(op1, float_round_nearest_even, 0);
> +}
> +
> +uint16_t ub_to_f16(uint8_t op1)
> +{
> +    return uint64_to_float16_scalbn(op1, float_round_nearest_even, 0);
> +}
> +
> +uint16_t b_to_f16(int8_t op1)
> +{
> +    return int64_to_float16_scalbn(op1, float_round_nearest_even, 0);
> +}
> +
> +int32_t conv_sf_w(int32_t a, float_status *fp_status)
> +{
> +    return float32_val(int32_to_float32(a, fp_status));
> +}
> +
> +int16_t conv_hf_h(int16_t a, float_status *fp_status)
> +{
> +    return float16_val(int16_to_float16(a, fp_status));
> +}
> +
> +int32_t conv_w_sf(uint32_t a, float_status *fp_status)
> +{
> +    float_status scratch_fpst = {};
> +    const float32 W_MAX = int32_to_float32(INT32_MAX, &scratch_fpst);
> +    const float32 W_MIN = int32_to_float32(INT32_MIN, &scratch_fpst);
> +    float32 f1 = make_float32(a);
> +
> +    if (float32_is_any_nan(f1) || float32_is_infinity(f1) ||
> +        float32_le_quiet(W_MAX, f1, fp_status) ||
> +        float32_le_quiet(f1, W_MIN, fp_status)) {
> +        return float32_is_neg(f1) ? INT32_MIN : INT32_MAX;
> +    }
>

Does float32_to_int32 handle these checks?
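
For comparison, a standalone model of what the explicit checks in the patch compute, as I read them, written against the raw binary32 pattern. The function name is hypothetical and this says nothing about what softfloat does internally; it only pins down the behaviour any replacement would have to match.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical model of conv_w_sf: saturate on NaN, infinity and
 * out-of-range inputs, picking the bound from the sign bit.
 */
static int32_t sat_sf_to_w(uint32_t bits)
{
    uint32_t mag = bits & 0x7fffffffu;  /* magnitude without the sign bit */
    int neg = (bits >> 31) != 0;
    float f;

    /*
     * 0x4f000000 is 2^31 as binary32. Anything at or above it in
     * magnitude (including infinities and NaNs) does not fit in int32.
     */
    if (mag >= 0x4f000000u) {
        return neg ? INT32_MIN : INT32_MAX;
    }

    memcpy(&f, &bits, sizeof(f));
    return (int32_t)f;  /* in range, so the C cast truncates toward zero */
}
```

The case worth checking against softfloat is NaN with the sign bit set, which this model maps to INT32_MIN.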


> +    return float32_to_int32_round_to_zero(f1, fp_status);
>

Rounding mode?


> +}
> +
> +int16_t conv_h_hf(uint16_t a, float_status *fp_status)
> +{
> +    float_status scratch_fpst = {};
> +    const float16 H_MAX = int16_to_float16(INT16_MAX, &scratch_fpst);
> +    const float16 H_MIN = int16_to_float16(INT16_MIN, &scratch_fpst);
> +    float16 f1 = make_float16(a);
> +
> +    if (float16_is_any_nan(f1) || float16_is_infinity(f1) ||
> +        float16_le_quiet(H_MAX, f1, fp_status) ||
> +        float16_le_quiet(f1, H_MIN, fp_status)) {
> +        return float16_is_neg(f1) ? INT16_MIN : INT16_MAX;
> +    }
> +    return float16_to_int16_round_to_zero(f1, fp_status);
> +}
>

Ditto

Thanks,
Taylor


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 08/13] target/hexagon: add v68 HVX IEEE float compare insns
  2026-03-23 13:15 ` [PATCH 08/13] target/hexagon: add v68 HVX IEEE float compare insns Matheus Tavares Bernardino
@ 2026-03-23 21:42   ` Taylor Simpson
  2026-03-26 13:00     ` Matheus Bernardino
  0 siblings, 1 reply; 39+ messages in thread
From: Taylor Simpson @ 2026-03-23 21:42 UTC (permalink / raw)
  To: Matheus Tavares Bernardino
  Cc: qemu-devel, brian.cain, ale, anjo, marco.liebel, philmd,
	quic_mburton, sid.manning

[-- Attachment #1: Type: text/plain, Size: 1675 bytes --]

On Mon, Mar 23, 2026 at 7:16 AM Matheus Tavares Bernardino <
matheus.bernardino@oss.qualcomm.com> wrote:

> Add HVX IEEE floating-point compare instructions:
> - V6_vgthf, V6_vgtsf: greater-than compare
> - V6_vgthf_and, V6_vgtsf_and: greater-than with predicate-and
> - V6_vgthf_or, V6_vgtsf_or: greater-than with predicate-or
> - V6_vgthf_xor, V6_vgtsf_xor: greater-than with predicate-xor
>
> Signed-off-by: Matheus Tavares Bernardino <
> matheus.bernardino@oss.qualcomm.com>
> ---
>  target/hexagon/mmvec/macros.h                | 10 ++++
>  target/hexagon/attribs_def.h.inc             |  2 +
>  target/hexagon/hex_common.py                 |  1 +
>  target/hexagon/imported/mmvec/encode_ext.def | 10 ++++
>  target/hexagon/imported/mmvec/ext.idef       | 61 ++++++++++++++++++++
>  5 files changed, 84 insertions(+)
>
> diff --git a/target/hexagon/mmvec/macros.h b/target/hexagon/mmvec/macros.h
> index 2af3d2d747..c342507d1a 100644
> --- a/target/hexagon/mmvec/macros.h
> +++ b/target/hexagon/mmvec/macros.h
> @@ -356,4 +356,14 @@
>                 extract32(VAL, POS * 8, 8); \
>      } while (0);
>
> +#define fCMPGT_SF(A, B) \
> +    (float32_is_any_nan(A) || float32_is_any_nan(B) ? \
> +     (int32_t)(A) > (int32_t)(B) : \
>

Seems odd to do an integer comparison when either operand is a NaN.
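
A standalone sketch of what the NaN branch computes, for anyone following along. The function names are hypothetical; the second one just reproduces the signed compare of the raw bit patterns from the macro.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: NaN test on a raw binary32 pattern. */
static bool sf_is_nan(uint32_t bits)
{
    return (bits & 0x7fffffffu) > 0x7f800000u;
}

/* What the NaN branch of fCMPGT_SF evaluates: a signed integer compare. */
static bool nan_branch_gt(uint32_t a, uint32_t b)
{
    return (int32_t)a > (int32_t)b;
}
```

With this, a positive quiet NaN (0x7fc00000) compares greater than 1.0 and even greater than +infinity, while 1.0 compares greater than a negative NaN (0xffc00000). An IEEE-style quiet comparison would give false in all of these cases, so it would help to know whether this mirrors the hardware.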


> +     float32_compare((A), (B), &env->fp_status) == float_relation_greater)
> +
> +#define fCMPGT_HF(A, B) \
> +    (float16_is_any_nan(A) || float16_is_any_nan(B) ? \
> +    (int16_t)(A) > (int16_t)(B) : \
>

Ditto


> +    float16_compare((A), (B), &env->fp_status) == float_relation_greater)
> +
>  #endif
>

Thanks,
Taylor


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 09/13] target/hexagon: add v73 HVX IEEE bfloat16 insns
  2026-03-23 13:15 ` [PATCH 09/13] target/hexagon: add v73 HVX IEEE bfloat16 insns Matheus Tavares Bernardino
@ 2026-03-23 22:03   ` Taylor Simpson
  0 siblings, 0 replies; 39+ messages in thread
From: Taylor Simpson @ 2026-03-23 22:03 UTC (permalink / raw)
  To: Matheus Tavares Bernardino
  Cc: qemu-devel, brian.cain, ale, anjo, marco.liebel, philmd,
	quic_mburton, sid.manning

[-- Attachment #1: Type: text/plain, Size: 4448 bytes --]

On Mon, Mar 23, 2026 at 7:16 AM Matheus Tavares Bernardino <
matheus.bernardino@oss.qualcomm.com> wrote:

> Add HVX IEEE bfloat16 (bf16) instructions:
>
> Arithmetic operations:
> - V6_vadd_sf_bf, V6_vsub_sf_bf: add/sub bf16 widening to sf output
> - V6_vmpy_sf_bf: multiply bf16 widening to sf output
> - V6_vmpy_sf_bf_acc: multiply-accumulate bf16 widening to sf output
>
> Min/Max operations:
> - V6_vmin_bf, V6_vmax_bf: bf16 min/max
>
> Comparison operations:
> - V6_vgtbf: greater-than compare
> - V6_vgtbf_and, V6_vgtbf_or, V6_vgtbf_xor: predicate variants
>
> Conversion operations:
> - V6_vcvt_bf_sf: convert sf to bf16
>
> Signed-off-by: Matheus Tavares Bernardino <
> matheus.bernardino@oss.qualcomm.com>
> ---
>  target/hexagon/mmvec/kvx_ieee.h              | 36 +++++++++++
>  target/hexagon/mmvec/macros.h                |  5 ++
>  target/hexagon/mmvec/mmvec.h                 |  1 +
>  target/hexagon/mmvec/kvx_ieee.c              |  3 +
>  target/hexagon/imported/mmvec/encode_ext.def | 15 +++++
>  target/hexagon/imported/mmvec/ext.idef       | 64 ++++++++++++++++++++
>  6 files changed, 124 insertions(+)
>
> diff --git a/target/hexagon/mmvec/kvx_ieee.h
> b/target/hexagon/mmvec/kvx_ieee.h
> index 8a6816f6b3..eb670d4ec3 100644
> --- a/target/hexagon/mmvec/kvx_ieee.h
> +++ b/target/hexagon/mmvec/kvx_ieee.h
> @@ -80,4 +80,40 @@ int16_t conv_hf_h(int16_t a, float_status *fp_status);
>  int32_t conv_w_sf(uint32_t a, float_status *fp_status);
>  int16_t conv_h_hf(uint16_t a, float_status *fp_status);
>
> +/* IEEE BFloat instructions */
> +
> +#define fp_mult_sf_bf(A, B) \
> +    fp_mult_sf_sf(((uint32_t)(A)) << 16, ((uint32_t)(B)) << 16,
> &env->fp_status)
> +#define fp_add_sf_bf(A, B) \
> +    fp_add_sf_sf(((uint32_t)(A)) << 16, ((uint32_t)(B)) << 16,
> &env->fp_status)
> +#define fp_sub_sf_bf(A, B) \
> +    fp_sub_sf_sf(((uint32_t)(A)) << 16, ((uint32_t)(B)) << 16,
> &env->fp_status)
>

Can we use softfloat routine bfloat16_to_float32 instead of shifting by 16?
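
For context, a standalone sketch of the shift-based widening on raw bit patterns. The helper names are hypothetical. The shift is exact because bfloat16 is the upper half of a binary32 value; a softfloat routine may additionally deal with signaling NaNs and exception flags, which a bare shift does not (I have not checked the details).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen a raw bfloat16 pattern to a binary32 pattern. */
static uint32_t bf16_bits_to_sf_bits(uint16_t bf)
{
    return (uint32_t)bf << 16;
}

/* Hypothetical helper: reinterpret a binary32 pattern as a C float. */
static float sf_bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof(f));
    return f;
}
```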


> +
> +uint32_t fp_mult_sf_bf_acc(uint16_t op1, uint16_t op2, uint32_t acc,
> +                           float_status *fp_status);
> +
> +#define bf_to_sf(A) (((uint32_t)(A)) << 16)
>

Ditto


> +
> +#define fp_min_bf(A, B) ({ \
> +    uint32_t _bf_res = fp_min_sf(bf_to_sf(A), bf_to_sf(B),
> &env->fp_status); \
> +    (uint16_t)((_bf_res >> 16) & 0xffff); \
>

float32_to_bfloat16


> +})
> +
> +#define fp_max_bf(A, B) ({ \
> +    uint32_t _bf_res = fp_max_sf(bf_to_sf(A), bf_to_sf(B),
> &env->fp_status); \
> +    (uint16_t)((_bf_res >> 16) & 0xffff); \
>

Ditto


> +})
> +
> +static inline uint16_t sf_to_bf(int32_t A)
> +{
> +    uint32_t rslt = A;
> +    if ((rslt & 0x1FFFF) == 0x08000) {
> +        /* do not round up if exactly .5 and even already */
> +    } else if ((rslt & 0x8000) == 0x8000) {
> +        rslt += 0x8000; /* rounding to nearest number */
> +    }
> +    rslt = float32_is_any_nan(A) ? FP32_DEF_NAN : rslt;
> +    return rslt >> 16;
> +}
>

float32_to_bfloat16
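
For comparison, a standalone sketch of round-to-nearest-even narrowing in its usual compact form. As far as I can tell it gives the same results as the open-coded version above for non-NaN inputs. The function name is hypothetical, and the default NaN value 0x7fff is an assumption (the upper half of 0x7fffffff); softfloat's own NaN handling may differ.

```c
#include <stdint.h>

/* Assumption for this sketch: the default bf16 NaN pattern. */
#define BF16_DEFAULT_NAN 0x7fffu

/* Hypothetical helper: narrow a raw binary32 pattern to bfloat16 (RNE). */
static uint16_t sf_bits_to_bf16_rne(uint32_t bits)
{
    uint32_t lsb;

    /* Any NaN input collapses to the default NaN. */
    if ((bits & 0x7fffffffu) > 0x7f800000u) {
        return BF16_DEFAULT_NAN;
    }

    /*
     * Adding 0x7fff rounds everything above the halfway point up.
     * Adding the kept LSB as well pushes exact ties up only when the
     * kept value is odd, which is round-to-nearest-even.
     */
    lsb = (bits >> 16) & 1u;
    bits += 0x7fffu + lsb;
    return (uint16_t)(bits >> 16);
}
```

The largest finite binary32 value rounds up to infinity here, which is the expected overflow behaviour for this rounding mode.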


> +
>  #endif
> diff --git a/target/hexagon/mmvec/macros.h b/target/hexagon/mmvec/macros.h
> index c342507d1a..b70996578e 100644
> --- a/target/hexagon/mmvec/macros.h
> +++ b/target/hexagon/mmvec/macros.h
> @@ -25,6 +25,9 @@
>  #include "accel/tcg/probe.h"
>  #include "mmvec/kvx_ieee.h"
>
> +#define fBFLOAT()
> +#define fCVI_VX_NO_TMP_LD()
> +
>  #ifndef QEMU_GENERATE
>  #define VdV      (*(MMVector *restrict)(VdV_void))
>  #define VsV      (*(MMVector *restrict)(VsV_void))
> @@ -366,4 +369,6 @@
>      (int16_t)(A) > (int16_t)(B) : \
>      float16_compare((A), (B), &env->fp_status) == float_relation_greater)
>
> +#define fCMPGT_BF(A, B) fCMPGT_SF(((int)A) << 16, ((int)B) << 16)
>

bfloat16_to_float32


> +
>  #endif
> diff --git a/target/hexagon/mmvec/mmvec.h b/target/hexagon/mmvec/mmvec.h
> index eaedfe0d6d..9d8d57c7c6 100644
> --- a/target/hexagon/mmvec/mmvec.h
> +++ b/target/hexagon/mmvec/mmvec.h
> @@ -40,6 +40,7 @@ typedef union {
>      int8_t    b[MAX_VEC_SIZE_BYTES / 1];
>      int32_t  sf[MAX_VEC_SIZE_BYTES / 4];   /* single float (32-bit) */
>      int16_t  hf[MAX_VEC_SIZE_BYTES / 2];   /* half float (16-bit) */
> +    uint16_t bf[MAX_VEC_SIZE_BYTES / 2];   /* bfloat16 */
>

Consider using bfloat16

Also float32 for sf and float16 for hf.


>  } MMVector;


Thanks,
Taylor


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 03/13] target/hexagon/cpu: add HVX IEEE FP extension
  2026-03-23 19:32   ` Taylor Simpson
@ 2026-03-24 16:52     ` Matheus Bernardino
  2026-03-24 18:48       ` Taylor Simpson
  0 siblings, 1 reply; 39+ messages in thread
From: Matheus Bernardino @ 2026-03-24 16:52 UTC (permalink / raw)
  To: Taylor Simpson
  Cc: qemu-devel, brian.cain, ale, anjo, marco.liebel, philmd,
	quic_mburton, sid.manning

On Mon, Mar 23, 2026 at 4:33 PM Taylor Simpson <ltaylorsimpson@gmail.com> wrote:
>
>
>
> On Mon, Mar 23, 2026 at 7:15 AM Matheus Tavares Bernardino <matheus.bernardino@oss.qualcomm.com> wrote:
>>
>> This flag will be used to control the HVX IEEE float instructions, which
>> are only available at some Hexagon cores. When unavailable, the
>> instruction is essentially treated as a no-op.
>>
>> Signed-off-by: Matheus Tavares Bernardino <matheus.bernardino@oss.qualcomm.com>
>> ---
>>  target/hexagon/cpu.h             |  1 +
>>  target/hexagon/translate.h       |  1 +
>>  target/hexagon/attribs_def.h.inc |  3 +++
>>  target/hexagon/cpu.c             |  1 +
>>  target/hexagon/decode.c          | 22 ++++++++++++++++++++++
>>  target/hexagon/translate.c       |  1 +
>>  6 files changed, 29 insertions(+)
>>
>>
>> diff --git a/target/hexagon/decode.c b/target/hexagon/decode.c
>> index dbc9c630e8..d832a64a17 100644
>> --- a/target/hexagon/decode.c
>> +++ b/target/hexagon/decode.c
>> @@ -696,6 +696,18 @@ static bool pkt_has_write_conflict(Packet *pkt)
>>      return !bitmap_empty(conflict, 32);
>>  }
>>
>> +static void convert_to_nop(Insn *insn)
>> +{
>> +    bool is_endloop = insn->is_endloop;
>> +    memset(insn, 0, sizeof(*insn));
>> +    insn->opcode = A2_nop;
>> +    insn->new_read_idx = -1;
>> +    insn->dest_idx = -1;
>> +    insn->generate = opcode_genptr[insn->opcode];
>> +    insn->iclass = 0b111;
>> +    insn->is_endloop = is_endloop;
>> +}
>> +
>>  /*
>>   * decode_packet
>>   * Decodes packet with given words
>> @@ -746,6 +758,16 @@ int decode_packet(DisasContext *ctx, int max_words, const uint32_t *words,
>>          /* Ran out of words! */
>>          return 0;
>>      }
>> +
>> +    /* Disable HVX IEEE instruction if extension is disabled. */
>> +    if (!ctx->ieee_fp_extension) {
>> +        for (i = 0; i < num_insns; i++) {
>> +            if (GET_ATTRIB(pkt->insn[i].opcode, A_HVX_IEEE_FP)) {
>> +                convert_to_nop(&pkt->insn[i]);
>> +            }
>> +        }
>> +    }
>> +
>
>
> Better to leave the instruction alone and turn it into a nop by not generating any TCG.
>
> That way, the disassembly (-d in_asm) will still show what's actually in the binary.  You could add the check in gen_tcg_funcs.py.
>
> You could also consider adding some sort of marker in the disassembly to indicate that the flag is needed for the instruction to do anything.

Ah, good idea. Will do both for the next round, thanks.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 03/13] target/hexagon/cpu: add HVX IEEE FP extension
  2026-03-24 16:52     ` Matheus Bernardino
@ 2026-03-24 18:48       ` Taylor Simpson
  2026-03-24 19:20         ` Brian Cain
  0 siblings, 1 reply; 39+ messages in thread
From: Taylor Simpson @ 2026-03-24 18:48 UTC (permalink / raw)
  To: Matheus Bernardino
  Cc: qemu-devel, brian.cain, ale, anjo, marco.liebel, philmd,
	quic_mburton, sid.manning

[-- Attachment #1: Type: text/plain, Size: 3423 bytes --]

On Tue, Mar 24, 2026 at 10:52 AM Matheus Bernardino <
matheus.bernardino@oss.qualcomm.com> wrote:

> On Mon, Mar 23, 2026 at 4:33 PM Taylor Simpson <ltaylorsimpson@gmail.com>
> wrote:
> >
> >
> >
> > On Mon, Mar 23, 2026 at 7:15 AM Matheus Tavares Bernardino <
> matheus.bernardino@oss.qualcomm.com> wrote:
> >>
> >> This flag will be used to control the HVX IEEE float instructions, which
> >> are only available at some Hexagon cores. When unavailable, the
> >> instruction is essentially treated as a no-op.
> >>
> >> Signed-off-by: Matheus Tavares Bernardino <
> matheus.bernardino@oss.qualcomm.com>
> >> ---
> >>  target/hexagon/cpu.h             |  1 +
> >>  target/hexagon/translate.h       |  1 +
> >>  target/hexagon/attribs_def.h.inc |  3 +++
> >>  target/hexagon/cpu.c             |  1 +
> >>  target/hexagon/decode.c          | 22 ++++++++++++++++++++++
> >>  target/hexagon/translate.c       |  1 +
> >>  6 files changed, 29 insertions(+)
> >>
> >>
> >> diff --git a/target/hexagon/decode.c b/target/hexagon/decode.c
> >> index dbc9c630e8..d832a64a17 100644
> >> --- a/target/hexagon/decode.c
> >> +++ b/target/hexagon/decode.c
> >> @@ -696,6 +696,18 @@ static bool pkt_has_write_conflict(Packet *pkt)
> >>      return !bitmap_empty(conflict, 32);
> >>  }
> >>
> >> +static void convert_to_nop(Insn *insn)
> >> +{
> >> +    bool is_endloop = insn->is_endloop;
> >> +    memset(insn, 0, sizeof(*insn));
> >> +    insn->opcode = A2_nop;
> >> +    insn->new_read_idx = -1;
> >> +    insn->dest_idx = -1;
> >> +    insn->generate = opcode_genptr[insn->opcode];
> >> +    insn->iclass = 0b111;
> >> +    insn->is_endloop = is_endloop;
> >> +}
> >> +
> >>  /*
> >>   * decode_packet
> >>   * Decodes packet with given words
> >> @@ -746,6 +758,16 @@ int decode_packet(DisasContext *ctx, int
> max_words, const uint32_t *words,
> >>          /* Ran out of words! */
> >>          return 0;
> >>      }
> >> +
> >> +    /* Disable HVX IEEE instruction if extension is disabled. */
> >> +    if (!ctx->ieee_fp_extension) {
> >> +        for (i = 0; i < num_insns; i++) {
> >> +            if (GET_ATTRIB(pkt->insn[i].opcode, A_HVX_IEEE_FP)) {
> >> +                convert_to_nop(&pkt->insn[i]);
> >> +            }
> >> +        }
> >> +    }
> >> +
> >
> >
> > Better to leave the instruction alone and turn it into a nop by not
> generating any TCG.
> >
> > That way, the disassembly (-d in_asm) will still show what's actually in
> the binary.  You could add the check in gen_tcg_funcs.py.
> >
> > You could also consider adding some sort of marker in the disassembly to
> indicate that the flag is needed for the instruction to do anything.
>
> Ah, good idea. Will do both for the next round, thanks.
>

Note that we'll need to be careful with packets that use the result vector
in a .new context.  For example
    { V0.sf = vadd(V1.sf,V2.sf)
      vmem(R19+#0x0) = V0.new }
The problem is that the store wants to read the value from future_VRegs.
However, if the vadd is a nop, there is junk in future_VRegs.  So, we'll
either have to get the store to read from the real VRegs or have the vadd
copy the old value of the destination into the future_VRegs value.  The
first option will be more efficient because it will avoid the vector copy.
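
A toy model of the second option, just to make the hazard concrete. Everything here (ToyState, toy_vadd, toy_store_new, one 32-bit lane per register, integer add standing in for the FP add) is made up for illustration and does not reflect the actual QEMU data structures or generated TCG.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_NUM_VREGS 32

/* Hypothetical state: one 32-bit lane per vector register for brevity. */
typedef struct {
    uint32_t vregs[TOY_NUM_VREGS];        /* committed values */
    uint32_t future_vregs[TOY_NUM_VREGS]; /* values produced in this packet */
} ToyState;

/* Producer: writes its result into the future slot of the destination. */
static void toy_vadd(ToyState *s, int d, int u, int v, bool ieee_enabled)
{
    if (ieee_enabled) {
        s->future_vregs[d] = s->vregs[u] + s->vregs[v];
    } else {
        /*
         * Treated as a nop: copy the old destination value so that a
         * .new consumer in the same packet does not read stale junk.
         */
        s->future_vregs[d] = s->vregs[d];
    }
}

/* Consumer: a .new store reads the future slot, not the committed one. */
static uint32_t toy_store_new(const ToyState *s, int d)
{
    return s->future_vregs[d];
}
```

Without the copy in the disabled branch, the store would return whatever a previous packet left in the future slot.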

We should also add a test to fp_hvx_disabled for this case.

HTH,
Taylor


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 10/13] tests/hexagon: add tests for v68 HVX IEEE float arithmetics
  2026-03-23 13:15 ` [PATCH 10/13] tests/hexagon: add tests for v68 HVX IEEE float arithmetics Matheus Tavares Bernardino
@ 2026-03-24 19:05   ` Taylor Simpson
  0 siblings, 0 replies; 39+ messages in thread
From: Taylor Simpson @ 2026-03-24 19:05 UTC (permalink / raw)
  To: Matheus Tavares Bernardino
  Cc: qemu-devel, brian.cain, ale, anjo, marco.liebel, philmd,
	quic_mburton, sid.manning

[-- Attachment #1: Type: text/plain, Size: 7261 bytes --]

On Mon, Mar 23, 2026 at 7:16 AM Matheus Tavares Bernardino <
matheus.bernardino@oss.qualcomm.com> wrote:

> Signed-off-by: Matheus Tavares Bernardino <
> matheus.bernardino@oss.qualcomm.com>
> ---
>  tests/tcg/hexagon/hvx_misc.h        |  12 +++
>  tests/tcg/hexagon/fp_hvx.c          | 129 ++++++++++++++++++++++++++++
>  tests/tcg/hexagon/fp_hvx_disabled.c |  32 +++++++
>  tests/tcg/hexagon/Makefile.target   |   8 ++
>  4 files changed, 181 insertions(+)
>  create mode 100644 tests/tcg/hexagon/fp_hvx.c
>  create mode 100644 tests/tcg/hexagon/fp_hvx_disabled.c
>
> diff --git a/tests/tcg/hexagon/fp_hvx.c b/tests/tcg/hexagon/fp_hvx.c
> new file mode 100644
> index 0000000000..85b8ff78ed
> --- /dev/null
> +++ b/tests/tcg/hexagon/fp_hvx.c
> @@ -0,0 +1,129 @@
> +/*
> + *  Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
> + *
> + *  SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#include <stdio.h>
> +#include <stdint.h>
> +#include <stdbool.h>
> +#include <string.h>
> +#include <hexagon_types.h>
> +#include <hvx_hexagon_protos.h>
> +
> +int err;
> +#include "hvx_misc.h"
> +
> +#if __HEXAGON_ARCH__ > 75
> +#error "After v75, compiler will replace some FP HVX instructions."
> +#endif
> +
> +/******************************************************************************
> + * NAN handling
> + *****************************************************************************/
> +
> +#define isnan(X) \
> +     (sizeof(X) == bytes_hf ? ((raw_hf(X) & ~0x8000) > 0x7c00) : \
> +                              ((raw_sf(X) & ~(1 << 31)) > 0x7f800000UL))
> +
> +#define CHECK_NAN(A, DEF_NAN) (isnan(A) ? DEF_NAN : (A))
> +#define NAN_SF float_sf(0x7FFFFFFF)
> +#define NAN_HF float_hf(0x7FFF)
> +
> +/******************************************************************************
> + * Binary operations
> + *****************************************************************************/
> +
> +#define DEF_TEST_OP_2(vop, op, type_res, type_arg) \
> +    static void test_##vop##_##type_res##_##type_arg(void) \
> +    { \
> +        memset(expect, 0xff, sizeof(expect)); \
> +        memset(output, 0xff, sizeof(expect)); \
>

sizeof(output)


> +        HVX_Vector *hvx_output = (HVX_Vector *)&output[0]; \
> +        HVX_Vector hvx_buffer0 = *(HVX_Vector *)&buffer0[0]; \
> +        HVX_Vector hvx_buffer1 = *(HVX_Vector *)&buffer1[0]; \
> +        \
> +        *hvx_output = \
> +            Q6_V##type_res##_##vop##_V##type_arg##V##type_arg(hvx_buffer0, \
> +                                                              hvx_buffer1); \
> +        \
> +        for (int i = 0; i < MAX_VEC_SIZE_BYTES / bytes_##type_res; i++) { \
> +            expect[0].type_res[i] = \
> +                raw_##type_res(op(float_##type_arg(buffer0[0].type_arg[i]), \
> +                                  float_##type_arg(buffer1[0].type_arg[i]))); \
> +        } \
>

Put this in a loop over the input buffers to get more input values.  Then
change the second argument to check_output below.


> +        check_output_##type_res(__LINE__, 1); \
> +    }
> +
> +#define SUM(X, Y, DEF_NAN) CHECK_NAN((X) + (Y), DEF_NAN)
> +#define SUB(X, Y, DEF_NAN) CHECK_NAN((X) - (Y), DEF_NAN)
> +#define MULT(X, Y, DEF_NAN) CHECK_NAN((X) * (Y), DEF_NAN)
> +
> +#define SUM_SF(X, Y) SUM(X, Y, NAN_SF)
> +#define SUM_HF(X, Y) SUM(X, Y, NAN_HF)
> +#define SUB_SF(X, Y) SUB(X, Y, NAN_SF)
> +#define SUB_HF(X, Y) SUB(X, Y, NAN_HF)
> +#define MULT_SF(X, Y) MULT(X, Y, NAN_SF)
> +#define MULT_HF(X, Y) MULT(X, Y, NAN_HF)
> +
> +DEF_TEST_OP_2(vadd, SUM_SF, sf, sf);
> +DEF_TEST_OP_2(vadd, SUM_HF, hf, hf);
> +DEF_TEST_OP_2(vsub, SUB_SF, sf, sf);
> +DEF_TEST_OP_2(vsub, SUB_HF, hf, hf);
> +DEF_TEST_OP_2(vmpy, MULT_SF, sf, sf);
> +DEF_TEST_OP_2(vmpy, MULT_HF, hf, hf);
> +
> +/******************************************************************************
> + * Other tests
> + *****************************************************************************/
> +
> +void test_vdmpy_sf_hf(bool acc)
> +{
> +    HVX_Vector *hvx_output = (HVX_Vector *)&output[0];
> +    HVX_Vector hvx_buffer0 = *(HVX_Vector *)&buffer0[0];
> +    HVX_Vector hvx_buffer1 = *(HVX_Vector *)&buffer1[0];
> +
> +    uint32_t PREFIL_VAL = 0x111222;
> +    memset(expect, 0xff, sizeof(expect));
> +    *hvx_output = Q6_V_vsplat_R(PREFIL_VAL);
> +
> +    if (!acc) {
> +        *hvx_output = Q6_Vsf_vdmpy_VhfVhf(hvx_buffer0, hvx_buffer1);
> +    } else {
> +        *hvx_output = Q6_Vsf_vdmpyacc_VsfVhfVhf(*hvx_output, hvx_buffer0,
> +                                                hvx_buffer1);
> +    }
> +
> +    for (int i = 0; i < MAX_VEC_SIZE_BYTES / 4; i++) {
> +        float a1 = float_hf_to_sf(float_hf(buffer0[0].hf[2 * i + 1]));
> +        float a2 = float_hf_to_sf(float_hf(buffer0[0].hf[2 * i]));
> +        float a3 = float_hf_to_sf(float_hf(buffer1[0].hf[2 * i + 1]));
> +        float a4 = float_hf_to_sf(float_hf(buffer1[0].hf[2 * i]));
> +        float prev = acc ? float_sf(PREFIL_VAL) : 0;
> +        expect[0].sf[i] = raw_sf(CHECK_NAN((a1 * a3) + (a2 * a4) + prev,
> NAN_SF));
> +    }
>

Put this into a loop also.


> +
> +    check_output_sf(__LINE__, 1);
> +}
> +
> +int main(void)
> +{
> +    init_buffers();
>

The init_buffers function is designed to create inputs for non-FP functions.
Create a new function to initialize the buffers with interesting FP values
(e.g., NaN, large FP values that will lead to overflow).
Also, see my prior comment about FP flags.  We'll want to check those here.
We should also add some tests with packets.  See my prior comment about
.new values.
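
Something along these lines could seed the FP inputs. This is only a sketch: the table contents, the names (interesting_sf, fill_sf_lanes) and the offset parameter are my own suggestions and would need adapting to the buffer types in hvx_misc.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Raw binary32 patterns that tend to exercise corner cases. */
static const uint32_t interesting_sf[] = {
    0x00000000u, /* +0.0 */
    0x80000000u, /* -0.0 */
    0x3f800000u, /* 1.0 */
    0xbf800000u, /* -1.0 */
    0x00000001u, /* smallest positive subnormal */
    0x7f7fffffu, /* largest finite value (overflows when doubled) */
    0xff7fffffu, /* most negative finite value */
    0x7f800000u, /* +infinity */
    0xff800000u, /* -infinity */
    0x7fc00000u, /* quiet NaN */
    0x7fa00000u, /* signaling NaN */
};

#define NUM_INTERESTING_SF \
    (sizeof(interesting_sf) / sizeof(interesting_sf[0]))

/*
 * Fill n lanes by cycling through the table. A different offset per
 * buffer pairs each lane with a different value in the other operand.
 */
static void fill_sf_lanes(uint32_t *lanes, size_t n, size_t offset)
{
    for (size_t i = 0; i < n; i++) {
        lanes[i] = interesting_sf[(i + offset) % NUM_INTERESTING_SF];
    }
}
```

With offset 0 for one buffer and offset 1 for the other, lane 7 pairs +infinity with -infinity, so vadd gets a NaN-producing input without any special casing.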


> +
> +    /* add/sub */
> +    test_vadd_sf_sf();
> +    test_vadd_hf_hf();
> +    test_vsub_sf_sf();
> +    test_vsub_hf_hf();
> +
> +    /* multiply */
> +    test_vmpy_sf_sf();
> +    test_vmpy_hf_hf();
> +
> +    /* dot product */
> +    test_vdmpy_sf_hf(false);
> +    test_vdmpy_sf_hf(true);
> +
> +    puts(err ? "FAIL" : "PASS");
> +    return err ? 1 : 0;
> +}
> diff --git a/tests/tcg/hexagon/fp_hvx_disabled.c
> b/tests/tcg/hexagon/fp_hvx_disabled.c
> new file mode 100644
> index 0000000000..af409ab8d2
> --- /dev/null
> +++ b/tests/tcg/hexagon/fp_hvx_disabled.c
> @@ -0,0 +1,32 @@
> +/*
> + *  Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
> + *
> + *  SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#include <stdio.h>
> +#include <string.h>
> +#include <hexagon_types.h>
> +#include <hvx_hexagon_protos.h>
> +
> +int err;
> +#include "hvx_misc.h"
> +
> +int main(void)
> +{
> +    asm volatile("r0 = #0xff\n"
> +                 "v0 = vsplat(r0)\n"
> +                 "vmem(%1 + #0) = v0\n"
> +                 "r1 = #0x1\n"
> +                 "v1 = vsplat(r1)\n"
> +                 "v2 = vsplat(r1)\n"
> +                 "v0.sf = vadd(v1.sf, v2.sf)\n"
> +                 "vmem(%0 + #0) = v0\n"
> +                 :
> +                 : "r"(output), "r"(expect)
> +                 : "r0", "r1", "v0", "v1", "v2", "memory");
>

Add a test where the result is used in a .new context.


> +
> +    check_output_w(__LINE__, 1);
> +    puts(err ? "FAIL" : "PASS");
> +    return err ? 1 : 0;
> +}
>
>


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 11/13] tests/hexagon: add tests for v68 HVX IEEE float min/max
  2026-03-23 13:15 ` [PATCH 11/13] tests/hexagon: add tests for v68 HVX IEEE float min/max Matheus Tavares Bernardino
@ 2026-03-24 19:07   ` Taylor Simpson
  0 siblings, 0 replies; 39+ messages in thread
From: Taylor Simpson @ 2026-03-24 19:07 UTC (permalink / raw)
  To: Matheus Tavares Bernardino
  Cc: qemu-devel, brian.cain, ale, anjo, marco.liebel, philmd,
	quic_mburton, sid.manning

[-- Attachment #1: Type: text/plain, Size: 361 bytes --]

On Mon, Mar 23, 2026 at 7:16 AM Matheus Tavares Bernardino <
matheus.bernardino@oss.qualcomm.com> wrote:

> Signed-off-by: Matheus Tavares Bernardino <
> matheus.bernardino@oss.qualcomm.com>
> ---
>  tests/tcg/hexagon/fp_hvx.c | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
>

Reviewed-by: Taylor Simpson <ltaylorsimpson@gmail.com>


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 03/13] target/hexagon/cpu: add HVX IEEE FP extension
  2026-03-24 18:48       ` Taylor Simpson
@ 2026-03-24 19:20         ` Brian Cain
  2026-03-24 19:46           ` Taylor Simpson
  0 siblings, 1 reply; 39+ messages in thread
From: Brian Cain @ 2026-03-24 19:20 UTC (permalink / raw)
  To: Taylor Simpson
  Cc: Matheus Bernardino, qemu-devel, ale, anjo, marco.liebel, philmd,
	quic_mburton, sid.manning

[-- Attachment #1: Type: text/plain, Size: 4104 bytes --]

On Tue, Mar 24, 2026 at 1:48 PM Taylor Simpson <ltaylorsimpson@gmail.com>
wrote:

>
>
> On Tue, Mar 24, 2026 at 10:52 AM Matheus Bernardino <
> matheus.bernardino@oss.qualcomm.com> wrote:
>
>> On Mon, Mar 23, 2026 at 4:33 PM Taylor Simpson <ltaylorsimpson@gmail.com>
>> wrote:
>> >
>> >
>> >
>> > On Mon, Mar 23, 2026 at 7:15 AM Matheus Tavares Bernardino <
>> matheus.bernardino@oss.qualcomm.com> wrote:
>> >>
>> >> This flag will be used to control the HVX IEEE float instructions,
>> which
>> >> are only available at some Hexagon cores. When unavailable, the
>> >> instruction is essentially treated as a no-op.
>> >>
>> >> Signed-off-by: Matheus Tavares Bernardino <
>> matheus.bernardino@oss.qualcomm.com>
>> >> ---
>> >>  target/hexagon/cpu.h             |  1 +
>> >>  target/hexagon/translate.h       |  1 +
>> >>  target/hexagon/attribs_def.h.inc |  3 +++
>> >>  target/hexagon/cpu.c             |  1 +
>> >>  target/hexagon/decode.c          | 22 ++++++++++++++++++++++
>> >>  target/hexagon/translate.c       |  1 +
>> >>  6 files changed, 29 insertions(+)
>> >>
>> >>
>> >> diff --git a/target/hexagon/decode.c b/target/hexagon/decode.c
>> >> index dbc9c630e8..d832a64a17 100644
>> >> --- a/target/hexagon/decode.c
>> >> +++ b/target/hexagon/decode.c
>> >> @@ -696,6 +696,18 @@ static bool pkt_has_write_conflict(Packet *pkt)
>> >>      return !bitmap_empty(conflict, 32);
>> >>  }
>> >>
>> >> +static void convert_to_nop(Insn *insn)
>> >> +{
>> >> +    bool is_endloop = insn->is_endloop;
>> >> +    memset(insn, 0, sizeof(*insn));
>> >> +    insn->opcode = A2_nop;
>> >> +    insn->new_read_idx = -1;
>> >> +    insn->dest_idx = -1;
>> >> +    insn->generate = opcode_genptr[insn->opcode];
>> >> +    insn->iclass = 0b111;
>> >> +    insn->is_endloop = is_endloop;
>> >> +}
>> >> +
>> >>  /*
>> >>   * decode_packet
>> >>   * Decodes packet with given words
>> >> @@ -746,6 +758,16 @@ int decode_packet(DisasContext *ctx, int
>> max_words, const uint32_t *words,
>> >>          /* Ran out of words! */
>> >>          return 0;
>> >>      }
>> >> +
>> >> +    /* Disable HVX IEEE instruction if extension is disabled. */
>> >> +    if (!ctx->ieee_fp_extension) {
>> >> +        for (i = 0; i < num_insns; i++) {
>> >> +            if (GET_ATTRIB(pkt->insn[i].opcode, A_HVX_IEEE_FP)) {
>> >> +                convert_to_nop(&pkt->insn[i]);
>> >> +            }
>> >> +        }
>> >> +    }
>> >> +
>> >
>> >
>> > Better to leave the instruction alone and turn it into a nop by not
>> generating any TCG.
>> >
>> > That way, the disassembly (-d in_asm) will still show what's actually
>> in the binary.  You could add the check in gen_tcg_funcs.py.
>> >
>> > You could also consider adding some sort of marker in the disassembly
>> to indicate that the flag is needed for the instruction to do anything.
>>
>> Ah, good idea. Will do both for the next round, thanks.
>>
>
> Note that we'll need to be careful with packets that use the result vector
> in a .new context.  For example
>     { V0.sf = vadd(V1.sf,V2.sf)
>       vmem(R19+#0x0) = V0.new }
> The problem is that the store wants to read the value from future_VRegs.
> However, if the vadd is a nop, there is junk in future_VRegs.  So, we'll
> either have to get the store to read from the real VRegs or have the vadd
> copy the old value of the destination into the future_VRegs value.  The
> first option will be more efficient because it will avoid the vector copy.
>
>
For the sake of ease-of-verification we'll want to do whatever the ISS
does.  It's not very obvious to me what it would do in this packet context
based on the description of the nop-like behavior, but we'll follow the
ISS' lead.  In practical terms the garbage in future_VRegs is probably just
as bad or good as any other value - if you bothered to execute this packet
on the target w/o support for this opcode you probably don't care much
about the result.



> We should also add a test to fp_hvx_disabled for this case.
>
> HTH,
> Taylor
>


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 04/13] target/hexagon: add v68 HVX IEEE float arithmetic insns
  2026-03-23 20:28   ` Taylor Simpson
@ 2026-03-24 19:30     ` Matheus Bernardino
  2026-03-24 19:51       ` Taylor Simpson
  0 siblings, 1 reply; 39+ messages in thread
From: Matheus Bernardino @ 2026-03-24 19:30 UTC (permalink / raw)
  To: Taylor Simpson
  Cc: qemu-devel, brian.cain, ale, anjo, marco.liebel, philmd,
	quic_mburton, sid.manning

On Mon, Mar 23, 2026 at 5:29 PM Taylor Simpson <ltaylorsimpson@gmail.com> wrote:
>
>
>
> On Mon, Mar 23, 2026 at 7:15 AM Matheus Tavares Bernardino <matheus.bernardino@oss.qualcomm.com> wrote:
>>
>> Add HVX IEEE floating-point arithmetic instructions:
>> - vmpy_sf_sf, vmpy_sf_hf, vmpy_hf_hf: multiply operations
>> - vdmpy_sf_hf: dot-product multiply
>> - vmpy_sf_hf_acc, vmpy_hf_hf_acc, vdmpy_sf_hf_acc: multiply-accumulate
>> - vadd_sf_sf, vsub_sf_sf, vadd_sf_hf, vsub_sf_hf: add/sub with sf output
>> - vadd_hf_hf, vsub_hf_hf: add/sub with hf output
>>
>> Signed-off-by: Matheus Tavares Bernardino <matheus.bernardino@oss.qualcomm.com>
>> ---
>>  target/hexagon/mmvec/kvx_ieee.h              | 47 ++++++++++
>>  target/hexagon/mmvec/macros.h                |  1 +
>>  target/hexagon/mmvec/mmvec.h                 |  2 +
>>  target/hexagon/attribs_def.h.inc             |  4 +
>>  target/hexagon/mmvec/kvx_ieee.c              | 87 ++++++++++++++++++
>>  target/hexagon/hex_common.py                 |  1 +
>>  target/hexagon/imported/mmvec/encode_ext.def | 18 ++++
>>  target/hexagon/imported/mmvec/ext.idef       | 93 ++++++++++++++++++++
>>  target/hexagon/meson.build                   |  1 +
>>  9 files changed, 254 insertions(+)
>>  create mode 100644 target/hexagon/mmvec/kvx_ieee.h
>>  create mode 100644 target/hexagon/mmvec/kvx_ieee.c
>
>
> I'm curious why the prefix is kvx instead of hvx.

Actually, not sure either... These files were imported from the arch
simulator. Brian suggested it was a leftover acronym that didn't
catch on. I'll rename it to hvx_ieee.

>>
>> diff --git a/target/hexagon/mmvec/kvx_ieee.h b/target/hexagon/mmvec/kvx_ieee.h
>> new file mode 100644
>> index 0000000000..e92ddebeb9
>> --- /dev/null
>> +++ b/target/hexagon/mmvec/kvx_ieee.h
>> @@ -0,0 +1,47 @@
>> +/*
>> + *  Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
>> + *
>> + *  SPDX-License-Identifier: GPL-2.0-or-later
>> + */
>> +
>> +#ifndef HEXAGON_KVX_IEEE_H
>> +#define HEXAGON_KVX_IEEE_H
>> +
>> +#include "fpu/softfloat.h"
>> +
>> +/* Hexagon canonical NaN */
>> +#define FP32_DEF_NAN      0x7FFFFFFF
>> +#define FP16_DEF_NAN      0x7FFF
>
>
> These are the same as the scalar core, right?  If so, there's already a call to set_float_default_nan_pattern in hexagon_cpu_reset_hold.
>
> If the patterns are different, you'll need to call set_float_default_nan_pattern before each scalar FP instruction and before each HVX FP instruction.

Not the same, no. I'll adjust the set_float_default_nan_pattern call
accordingly.
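
For reference, a standalone sketch of how an 8-bit default NaN pattern
could expand into the two constants quoted above. It assumes the
convention I recall from QEMU's softfloat (bit 7 is the sign, bits 6..0
are the top fraction bits, the remaining fraction bits copy bit 0); it
is not QEMU code, and expand_dnan32/expand_dnan16 are hypothetical
helper names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers, not part of QEMU or of this series.
 * Assumed pattern layout: bit 7 = sign, bits 6..0 = top 7 fraction bits,
 * all lower fraction bits = copies of bit 0.  The exponent is all ones.
 */
static uint32_t expand_dnan32(uint8_t pattern)
{
    /* A pattern with no fraction bits set would encode infinity, not NaN. */
    assert((pattern & 0x7F) != 0);

    uint32_t sign = (uint32_t)(pattern >> 7) << 31;
    uint32_t exp = 0xFFu << 23;                          /* 8 exponent bits */
    uint32_t frac_hi = (uint32_t)(pattern & 0x7F) << 16; /* fraction bits 22..16 */
    uint32_t frac_lo = (pattern & 1) ? 0xFFFFu : 0;      /* fraction bits 15..0 */

    return sign | exp | frac_hi | frac_lo;
}

static uint16_t expand_dnan16(uint8_t pattern)
{
    assert((pattern & 0x7F) != 0);

    uint32_t sign = (uint32_t)(pattern >> 7) << 15;
    uint32_t exp = 0x1Fu << 10;                          /* 5 exponent bits */
    uint32_t frac_hi = (uint32_t)(pattern & 0x7F) << 3;  /* fraction bits 9..3 */
    uint32_t frac_lo = (pattern & 1) ? 0x7u : 0;         /* fraction bits 2..0 */

    return (uint16_t)(sign | exp | frac_hi | frac_lo);
}
```

Under that assumed layout, a pattern of 0x7F expands to 0x7FFFFFFF and
0x7FFF, which match FP32_DEF_NAN and FP16_DEF_NAN in the patch.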

>> +
>> +/*
>> + * IEEE - FP ADD/SUB/MPY instructions
>> + */
>> +uint32_t fp_mult_sf_sf(uint32_t a1, uint32_t a2, float_status *fp_status);
>> +uint32_t fp_add_sf_sf(uint32_t a1, uint32_t a2, float_status *fp_status);
>> +uint32_t fp_sub_sf_sf(uint32_t a1, uint32_t a2, float_status *fp_status);
>> +
>> +uint16_t fp_mult_hf_hf(uint16_t a1, uint16_t a2, float_status *fp_status);
>> +uint16_t fp_add_hf_hf(uint16_t a1, uint16_t a2, float_status *fp_status);
>> +uint16_t fp_sub_hf_hf(uint16_t a1, uint16_t a2, float_status *fp_status);
>> +
>> +uint32_t fp_mult_sf_hf(uint16_t a1, uint16_t a2, float_status *fp_status);
>> +uint32_t fp_add_sf_hf(uint16_t a1, uint16_t a2, float_status *fp_status);
>> +uint32_t fp_sub_sf_hf(uint16_t a1, uint16_t a2, float_status *fp_status);
>> +
>> +/*
>> + * IEEE - FP Accumulate instructions
>> + */
>> +uint16_t fp_mult_hf_hf_acc(uint16_t a1, uint16_t a2, uint16_t acc,
>> +                           float_status *fp_status);
>> +uint32_t fp_mult_sf_hf_acc(uint16_t a1, uint16_t a2, uint32_t acc,
>> +                           float_status *fp_status);
>> +
>> +/*
>> + * IEEE - FP Reduce instructions
>> + */
>> +uint32_t fp_vdmpy(uint16_t a1, uint16_t a2, uint16_t a3, uint16_t a4,
>> +                  float_status *fp_status);
>> +uint32_t fp_vdmpy_acc(uint32_t acc, uint16_t a1, uint16_t a2, uint16_t a3,
>> +                      uint16_t a4, float_status *fp_status);
>> +
>
>
> Consider using macros similar to the ones in the .c file to create these protos.

Hmm, I think in this case, the boilerplate size will outweigh the
benefit of the macros.

>>
>> +#endif
>> diff --git a/target/hexagon/mmvec/kvx_ieee.c b/target/hexagon/mmvec/kvx_ieee.c
>> new file mode 100644
>> index 0000000000..b763899aa3
>> --- /dev/null
>> +++ b/target/hexagon/mmvec/kvx_ieee.c
>> @@ -0,0 +1,87 @@
>> +/*
>> + *  Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
>> + *
>> + *  SPDX-License-Identifier: GPL-2.0-or-later
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "kvx_ieee.h"
>> +
>> +#define DEF_FP_INSN_2(name, rt, a1t, a2t, op) \
>> +    uint##rt##_t fp_##name(uint##a1t##_t a1, uint##a2t##_t a2, \
>> +                           float_status *fp_status) { \
>> +        float##a1t f1 = make_float##a1t(a1); \
>> +        float##a2t f2 = make_float##a2t(a2); \
>> +        \
>> +        if (float##a1t##_is_any_nan(f1) || float##a2t##_is_any_nan(f2)) { \
>> +            return FP##rt##_DEF_NAN; \
>> +        } \
>
>
> These nan checks shouldn't be needed if you're using QEMU softfloat properly.

Ah, indeed. Just checked that. Will change.

>> +
>> diff --git a/target/hexagon/imported/mmvec/ext.idef b/target/hexagon/imported/mmvec/ext.idef
>> index 03d31f6181..3f0d8e366e 100644
>> --- a/target/hexagon/imported/mmvec/ext.idef
>> +++ b/target/hexagon/imported/mmvec/ext.idef
>> @@ -2895,9 +2895,102 @@ EXTINSN(V6_vprefixqw,"Vd32.w=prefixsum(Qv4)",   ATTRIBS(A_EXTENSION,A_CVI,A_CVI_
>>      }
>>      } )
>>
>> +/* KVX - IEEE FP Instructions */
>>
>> +/* Single pipe, 32-bit output */
>> +#define ITERATOR_INSN_IEEE_FP_32(WIDTH,TAG,SYNTAX,DESCR,CODE) \
>> +EXTINSN(V6_##TAG, SYNTAX, \
>> +ATTRIBS(A_EXTENSION,A_HVX_IEEE_FP,A_CVI,A_CVI_VX,A_HVX_IEEE_FP_OUT_32), \
>> +DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
>>
>> +/* Single pipe, 16-bit output */
>> +#define ITERATOR_INSN_IEEE_FP_16(WIDTH,TAG,SYNTAX,DESCR,CODE) \
>> +EXTINSN(V6_##TAG, SYNTAX, \
>> +ATTRIBS(A_EXTENSION,A_HVX_IEEE_FP,A_CVI,A_CVI_VX,A_HVX_IEEE_FP_OUT_16), \
>> +DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
>>
>> +/* Two pipes: P2 & P3, single output: P2, 32-bit output */
>> +#define ITERATOR_INSN_IEEE_FP_DOUBLE_SINGLE_32(WIDTH,TAG,SYNTAX,DESCR,CODE) \
>> +EXTINSN(V6_##TAG, SYNTAX, \
>> +ATTRIBS(A_EXTENSION,A_HVX_IEEE_FP,A_CVI,A_CVI_VX_DV,A_HVX_IEEE_FP_OUT_32), \
>> +DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
>> +
>> +/* Two pipes: P2 & P3, two outputs, 32-bit output */
>> +#define ITERATOR_INSN_IEEE_FP_DOUBLE_32(WIDTH,TAG,SYNTAX,DESCR,CODE) \
>> +EXTINSN(V6_##TAG, SYNTAX, \
>> +ATTRIBS(A_EXTENSION,A_HVX_IEEE_FP,A_CVI,A_CVI_VX_DV,A_HVX_IEEE_FP_OUT_32), \
>> +DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
>> +
>> +/*
>> + * single pipe, accumulate instruction, produces 16-bit output, requires 16-bit
>> + * accumulate input
>> + */
>> +#define ITERATOR_INSN_IEEE_FP_ACC_16(WIDTH,TAG,SYNTAX,DESCR,CODE) \
>> +EXTINSN(V6_##TAG, SYNTAX, \
>> +ATTRIBS(A_EXTENSION,A_HVX_IEEE_FP,A_CVI,A_CVI_VX,A_HVX_IEEE_FP_ACC,A_HVX_IEEE_FP_OUT_16,A_CVI_VX_NO_TMP_LD), \
>> +DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
>> +
>> +/*
>> + * single pipe, accumulate instruction, produces 32-bit output, requires 32-bit
>> + * accumulate input
>> + */
>> +#define ITERATOR_INSN_IEEE_FP_ACC_32(WIDTH,TAG,SYNTAX,DESCR,CODE) \
>> +EXTINSN(V6_##TAG, SYNTAX, \
>> +ATTRIBS(A_EXTENSION,A_HVX_IEEE_FP,A_CVI,A_CVI_VX,A_HVX_IEEE_FP_ACC,A_HVX_IEEE_FP_OUT_32,A_CVI_VX_NO_TMP_LD), \
>> +DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
>> +
>> +/* IEEE FP multiply instructions */
>> +ITERATOR_INSN_IEEE_FP_DOUBLE_SINGLE_32(32, vmpy_sf_sf,
>> +    "Vd32.sf=vmpy(Vu32.sf,Vv32.sf)", "Vector IEEE mul: sf",
>> +    VdV.sf[i] = fp_mult_sf_sf(VuV.sf[i], VvV.sf[i], &env->fp_status))
>
>
> Do these instructions interact with the FP bits in USR (e.g., rounding mode, FP exceptions)?

They do not. I'll add a new env->hvx_fp_status and use that for the
default nan. This way we can avoid messing with the scalar
fp_status.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 12/13] tests/hexagon: add tests for v68 HVX IEEE float conversions
  2026-03-23 13:15 ` [PATCH 12/13] tests/hexagon: add tests for v68 HVX IEEE float conversions Matheus Tavares Bernardino
@ 2026-03-24 19:30   ` Taylor Simpson
  0 siblings, 0 replies; 39+ messages in thread
From: Taylor Simpson @ 2026-03-24 19:30 UTC (permalink / raw)
  To: Matheus Tavares Bernardino
  Cc: qemu-devel, brian.cain, ale, anjo, marco.liebel, philmd,
	quic_mburton, sid.manning

[-- Attachment #1: Type: text/plain, Size: 8004 bytes --]

On Mon, Mar 23, 2026 at 7:16 AM Matheus Tavares Bernardino <
matheus.bernardino@oss.qualcomm.com> wrote:

> Signed-off-by: Matheus Tavares Bernardino <
> matheus.bernardino@oss.qualcomm.com>
> ---
>  tests/tcg/hexagon/hex_test.h      |  15 +++
>  tests/tcg/hexagon/hvx_misc.h      |   2 +
>  tests/tcg/hexagon/fp_hvx_cvt.c    | 194 ++++++++++++++++++++++++++++++
>  tests/tcg/hexagon/Makefile.target |   3 +
>  4 files changed, 214 insertions(+)
>  create mode 100644 tests/tcg/hexagon/fp_hvx_cvt.c
>
> diff --git a/tests/tcg/hexagon/fp_hvx_cvt.c
> b/tests/tcg/hexagon/fp_hvx_cvt.c
> new file mode 100644
> index 0000000000..7497455ac6
> --- /dev/null
> +++ b/tests/tcg/hexagon/fp_hvx_cvt.c
> @@ -0,0 +1,194 @@
> +/*
> + *  Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
> + *
> + *  SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#include <stdio.h>
> +#include <stdint.h>
> +#include <stdbool.h>
> +#include <string.h>
> +#include <hexagon_types.h>
> +#include <hvx_hexagon_protos.h>
> +
> +#if __HEXAGON_ARCH__ > 75
> +#error "After v75, compiler will replace some FP HVX instructions."
> +#endif
> +
> +int err;
> +#include "hvx_misc.h"
> +#include "hex_test.h"
> +
> +#define TEST_EXP(TO, FROM, VAL, EXP) do { \
> +    ((MMVector *)&buffer)->FROM[index] = VAL; \
> +    expect[0].TO[index] = EXP; \
> +    index++; \
> +} while (0)
> +
> +#define DEF_TEST_CVT(TO, FROM, TESTS) \
> +    void test_vcvt_##TO##_##FROM(void) \
>

static


> +    { \
> +        HVX_Vector *hvx_output = (HVX_Vector *)&output[0]; \
> +        HVX_Vector buffer; \
> +        int index = 0; \
> +        memset(&buffer, 0, sizeof(buffer)); \
> +        memset(expect, 0, sizeof(expect)); \
> +        TESTS \
> +        *hvx_output = Q6_V##TO##_vcvt_V##FROM(buffer); \
> +        check_output_##TO(__LINE__, 1); \
> +    }
> +
> +DEF_TEST_CVT(uh, hf, { \
> +    TEST_EXP(uh, hf, HF_QNaN, UINT16_MAX); \
> +    TEST_EXP(uh, hf, HF_SNaN, UINT16_MAX); \
> +    TEST_EXP(uh, hf, HF_QNaN_neg, UINT16_MAX); \
> +    TEST_EXP(uh, hf, HF_INF, UINT16_MAX); \
> +    TEST_EXP(uh, hf, HF_INF_neg, 0); \
> +    TEST_EXP(uh, hf, HF_neg_two, 0); \
> +    TEST_EXP(uh, hf, HF_zero_neg, 0); \
> +    TEST_EXP(uh, hf, raw_hf((_Float16)2.1), 2); \
> +    TEST_EXP(uh, hf, HF_one_recip, 1); \
> +})
> +
> +DEF_TEST_CVT(h, hf, { \
> +    TEST_EXP(h, hf, HF_QNaN, INT16_MAX); \
> +    TEST_EXP(h, hf, HF_SNaN, INT16_MAX); \
> +    TEST_EXP(h, hf, HF_QNaN_neg, INT16_MAX); \
> +    TEST_EXP(h, hf, HF_INF, INT16_MAX); \
> +    TEST_EXP(h, hf, HF_INF_neg, INT16_MIN); \
> +    TEST_EXP(h, hf, HF_neg_two, -2); \
> +    TEST_EXP(h, hf, HF_zero_neg, 0); \
> +    TEST_EXP(h, hf, raw_hf((_Float16)2.1), 2); \
> +    TEST_EXP(h, hf, HF_one_recip, 1); \
> +})
> +
> +/*
> + * Some cvt operations take two vectors as input and perform the
> following:
> + *    VdV.TO[4*i]   = OP(VuV.FROM[2*i]);
> + *    VdV.TO[4*i+1] = OP(VuV.FROM[2*i+1]);
> + *    VdV.TO[4*i+2] = OP(VvV.FROM[2*i]);
> + *    VdV.TO[4*i+3] = OP(VvV.FROM[2*i+1]);
> + * We use bf_index and index in a way that the tests are always done
> either
> + * using the first or third line of the above snippet.
> + */
> +#define TEST_EXP_2(TO, FROM, VAL, EXP) do { \
> +    ((MMVector *)&buffers[bf_index])->FROM[2 * index] = VAL; \
> +    expect[0].TO[(4 * index) + (2 * bf_index)] = EXP; \
> +    index++; \
> +    bf_index = (bf_index + 1) % 2; \
> +} while (0)
> +
> +#define DEF_TEST_CVT_2(TO, FROM, TESTS) \
> +    void test_vcvt_##TO##_##FROM(void) \
>

static


> +    { \
> +        HVX_Vector *hvx_output = (HVX_Vector *)&output[0]; \
> +        HVX_Vector buffers[2]; \
> +        int index = 0, bf_index = 0; \
> +        memset(&buffers, 0, sizeof(buffers)); \
> +        memset(expect, 0, sizeof(expect)); \
> +        TESTS \
> +        *hvx_output = Q6_V##TO##_vcvt_V##FROM##V##FROM(buffers[0],
> buffers[1]); \
> +        check_output_##TO(__LINE__, 1); \
> +    }
> +
> +DEF_TEST_CVT_2(ub, hf, { \
> +    TEST_EXP_2(ub, hf, HF_QNaN, UINT8_MAX); \
> +    TEST_EXP_2(ub, hf, HF_SNaN, UINT8_MAX); \
> +    TEST_EXP_2(ub, hf, HF_QNaN_neg, UINT8_MAX); \
> +    TEST_EXP_2(ub, hf, HF_INF, UINT8_MAX); \
> +    TEST_EXP_2(ub, hf, HF_INF_neg, 0); \
> +    TEST_EXP_2(ub, hf, HF_small_neg, 0); \
> +    TEST_EXP_2(ub, hf, HF_neg_two, 0); \
> +    TEST_EXP_2(ub, hf, HF_zero_neg, 0); \
> +    TEST_EXP_2(ub, hf, raw_hf((_Float16)2.1), 2); \
> +    TEST_EXP_2(ub, hf, HF_one_recip, 1); \
> +})
> +
> +DEF_TEST_CVT_2(b, hf, { \
> +    TEST_EXP_2(b, hf, HF_QNaN, INT8_MAX); \
> +    TEST_EXP_2(b, hf, HF_SNaN, INT8_MAX); \
> +    TEST_EXP_2(b, hf, HF_QNaN_neg, INT8_MAX); \
> +    TEST_EXP_2(b, hf, HF_INF, INT8_MAX); \
> +    TEST_EXP_2(b, hf, HF_INF_neg, INT8_MIN); \
> +    TEST_EXP_2(b, hf, HF_small_neg, 0); \
> +    TEST_EXP_2(b, hf, HF_neg_two, -2); \
> +    TEST_EXP_2(b, hf, HF_zero_neg, 0); \
> +    TEST_EXP_2(b, hf, raw_hf((_Float16)2.1), 2); \
> +    TEST_EXP_2(b, hf, HF_one_recip, 1); \
> +})
> +
> +#define DEF_TEST_VCONV(TO, FROM, TESTS) \
> +    void test_vconv_##TO##_##FROM(void) \
>

static


> +    { \
> +        HVX_Vector *hvx_output = (HVX_Vector *)&output[0]; \
> +        HVX_Vector buffer; \
> +        int index = 0; \
> +        memset(&buffer, 0, sizeof(buffer)); \
> +        memset(expect, 0, sizeof(expect)); \
> +        TESTS \
> +        *hvx_output = Q6_V##TO##_equals_V##FROM(buffer); \
> +        check_output_##TO(__LINE__, 1); \
> +    }
> +
> +DEF_TEST_VCONV(w, sf, { \
> +    TEST_EXP(w, sf, SF_QNaN, INT32_MAX); \
> +    TEST_EXP(w, sf, SF_SNaN, INT32_MAX); \
> +    TEST_EXP(w, sf, SF_QNaN_neg, INT32_MIN); \
> +    TEST_EXP(w, sf, SF_INF, INT32_MAX); \
> +    TEST_EXP(w, sf, SF_INF_neg, INT32_MIN); \
> +    TEST_EXP(w, sf, SF_small_neg, 0); \
> +    TEST_EXP(w, sf, SF_neg_two, -2); \
> +    TEST_EXP(w, sf, SF_zero_neg, 0); \
> +    TEST_EXP(w, sf, raw_sf(2.1f), 2); \
> +    TEST_EXP(w, sf, raw_sf(2.8f), 2); \
> +})
> +
> +DEF_TEST_VCONV(h, hf, { \
> +    TEST_EXP(h, hf, HF_QNaN, INT16_MAX); \
> +    TEST_EXP(h, hf, HF_SNaN, INT16_MAX); \
> +    TEST_EXP(h, hf, HF_QNaN_neg, INT16_MIN); \
> +    TEST_EXP(h, hf, HF_INF, INT16_MAX); \
> +    TEST_EXP(h, hf, HF_INF_neg, INT16_MIN); \
> +    TEST_EXP(h, hf, HF_small_neg, 0); \
> +    TEST_EXP(h, hf, HF_neg_two, -2); \
> +    TEST_EXP(h, hf, HF_zero_neg, 0); \
> +    TEST_EXP(h, hf, raw_hf(2.1), 2); \
> +    TEST_EXP(h, hf, raw_hf(2.8), 2); \
> +})
> +
> +DEF_TEST_VCONV(hf, h, { \
> +    TEST_EXP(hf, h, INT16_MAX, HF_QNaN); \
> +    TEST_EXP(hf, h, INT16_MAX, HF_SNaN); \
> +    TEST_EXP(hf, h, INT16_MIN, HF_QNaN_neg); \
> +    TEST_EXP(hf, h, INT16_MAX, HF_INF); \
> +    TEST_EXP(hf, h, INT16_MIN, HF_INF_neg); \
> +    TEST_EXP(hf, h, 0, HF_small_neg); \
> +    TEST_EXP(hf, h, -2, HF_neg_two); \
> +    TEST_EXP(hf, h, 0, HF_zero_neg); \
> +    TEST_EXP(hf, h, 2, raw_hf(2.1)); \
> +    TEST_EXP(hf, h, 2, raw_hf(2.8)); \
> +})
> +
> +DEF_TEST_VCONV(sf, w, { \
> +    TEST_EXP(sf, w, INT32_MAX, SF_QNaN); \
> +    TEST_EXP(sf, w, INT32_MAX, SF_SNaN); \
> +    TEST_EXP(sf, w, INT32_MIN, SF_QNaN_neg); \
> +    TEST_EXP(sf, w, INT32_MAX, SF_INF); \
> +    TEST_EXP(sf, w, INT32_MIN, SF_INF_neg); \
> +    TEST_EXP(sf, w, 0, SF_small_neg); \
> +    TEST_EXP(sf, w, -2, SF_neg_two); \
> +    TEST_EXP(sf, w, 0, SF_zero_neg); \
> +    TEST_EXP(sf, w, 2, raw_sf(2.1f)); \
> +    TEST_EXP(sf, w, 2, raw_sf(2.8f)); \
> +})
> +
> +int main(void)
> +{
> +    test_vcvt_uh_hf();
> +    test_vcvt_h_hf();
> +    test_vcvt_ub_hf();
> +    test_vcvt_b_hf();
> +    test_vconv_w_sf();
>

Several more of these were created above but not called here.  Marking them
static will flag the errors.


> +    puts(err ? "FAIL" : "PASS");
> +    return err ? 1 : 0;
> +}
>

Also, add checks for FP flags.

Thanks,
Taylor


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 13/13] tests/hexagon: add tests for v68 HVX IEEE float comparisons
  2026-03-23 13:15 ` [PATCH 13/13] tests/hexagon: add tests for v68 HVX IEEE float comparisons Matheus Tavares Bernardino
@ 2026-03-24 19:37   ` Taylor Simpson
  0 siblings, 0 replies; 39+ messages in thread
From: Taylor Simpson @ 2026-03-24 19:37 UTC (permalink / raw)
  To: Matheus Tavares Bernardino
  Cc: qemu-devel, brian.cain, ale, anjo, marco.liebel, philmd,
	quic_mburton, sid.manning

[-- Attachment #1: Type: text/plain, Size: 2715 bytes --]

On Mon, Mar 23, 2026 at 7:16 AM Matheus Tavares Bernardino <
matheus.bernardino@oss.qualcomm.com> wrote:

> Signed-off-by: Matheus Tavares Bernardino <
> matheus.bernardino@oss.qualcomm.com>
> ---
>  tests/tcg/hexagon/hex_test.h      |  1 +
>  tests/tcg/hexagon/fp_hvx_cmp.c    | 58 +++++++++++++++++++++++++++++++
>  tests/tcg/hexagon/Makefile.target |  3 ++
>  3 files changed, 62 insertions(+)
>  create mode 100644 tests/tcg/hexagon/fp_hvx_cmp.c
>
> diff --git a/tests/tcg/hexagon/fp_hvx_cmp.c
> b/tests/tcg/hexagon/fp_hvx_cmp.c
> new file mode 100644
> index 0000000000..e925c973f3
> --- /dev/null
> +++ b/tests/tcg/hexagon/fp_hvx_cmp.c
> @@ -0,0 +1,58 @@
> +/*
> + *  Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
> + *
> + *  SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#include <stdio.h>
> +#include <stdint.h>
> +#include <stdbool.h>
> +#include <string.h>
> +#include <hexagon_types.h>
> +#include <hvx_hexagon_protos.h>
> +
> +#if __HEXAGON_ARCH__ > 75
> +#error "After v75, compiler will replace some FP HVX instructions."
> +#endif
> +
> +int err;
> +#include "hvx_misc.h"
> +#include "hex_test.h"
> +
> +#define TEST_CMP(VAL1, VAL2, EXP) do { \
> +    ((MMVector *)&buffers[0])->sf[index] = VAL1; \
> +    ((MMVector *)&buffers[1])->sf[index] = VAL2; \
> +    expect[0].w[index] = EXP ? 0xffffffff : 0; \
> +    index++; \
> +} while (0)
> +
> +int main(void)
> +{
> +    HVX_Vector *hvx_output = (HVX_Vector *)&output[0];
> +    HVX_Vector buffers[2], true_vec, false_vec;
> +    HVX_VectorPred pred;
> +    int index = 0;
> +
> +    memset(&buffers, 0, sizeof(buffers));
> +    memset(expect, 0, sizeof(expect));
> +    memset(&true_vec, 0xff, sizeof(true_vec));
> +    memset(&false_vec, 0, sizeof(false_vec));
> +
> +    TEST_CMP(raw_sf(2.2),  raw_sf(2.1),  true);
> +    TEST_CMP(raw_sf(2.2),  raw_sf(2.2),  false);
> +    TEST_CMP(raw_sf(0),    raw_sf(-2.2), true);
> +    TEST_CMP(SF_SNaN,      SF_SNaN,      false);
> +    TEST_CMP(SF_INF,       SF_INF_neg,   true);
> +    TEST_CMP(SF_INF_neg,   SF_INF,       false);
> +    TEST_CMP(SF_SNaN,      SF_QNaN,      false);
> +    TEST_CMP(SF_QNaN,      SF_SNaN,      true);
> +    TEST_CMP(SF_QNaN,      SF_QNaN_neg,  true);
> +
> +    pred = Q6_Q_vcmp_gt_VsfVsf(buffers[0], buffers[1]);
> +    *hvx_output = Q6_V_vmux_QVV(pred, true_vec, false_vec);
> +
> +    check_output_sf(__LINE__, 1);
> +
> +    puts(err ? "FAIL" : "PASS");
> +    return err ? 1 : 0;
> +}
>

We should add some half-float tests as well as and/or/xor versions.
Also, check the FP flags.
Finally, we should add another test for bfloat.

Thanks,
Taylor


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 03/13] target/hexagon/cpu: add HVX IEEE FP extension
  2026-03-24 19:20         ` Brian Cain
@ 2026-03-24 19:46           ` Taylor Simpson
  0 siblings, 0 replies; 39+ messages in thread
From: Taylor Simpson @ 2026-03-24 19:46 UTC (permalink / raw)
  To: Brian Cain
  Cc: Matheus Bernardino, qemu-devel, ale, anjo, marco.liebel, philmd,
	quic_mburton, sid.manning

[-- Attachment #1: Type: text/plain, Size: 4310 bytes --]

On Tue, Mar 24, 2026 at 1:21 PM Brian Cain <brian.cain@oss.qualcomm.com>
wrote:

>
>
> On Tue, Mar 24, 2026 at 1:48 PM Taylor Simpson <ltaylorsimpson@gmail.com>
> wrote:
>
>>
>>
>> On Tue, Mar 24, 2026 at 10:52 AM Matheus Bernardino <
>> matheus.bernardino@oss.qualcomm.com> wrote:
>>
>>> On Mon, Mar 23, 2026 at 4:33 PM Taylor Simpson <ltaylorsimpson@gmail.com>
>>> wrote:
>>> >
>>> >
>>> >
>>> > On Mon, Mar 23, 2026 at 7:15 AM Matheus Tavares Bernardino <
>>> matheus.bernardino@oss.qualcomm.com> wrote:
>>> >>
>>> >> This flag will be used to control the HVX IEEE float instructions,
>>> which
>>> >> are only available on some Hexagon cores. When unavailable, the
>>> >> instruction is essentially treated as a no-op.
>>> >>
>>> >> Signed-off-by: Matheus Tavares Bernardino <
>>> matheus.bernardino@oss.qualcomm.com>
>>> >> ---
>>> >>  target/hexagon/cpu.h             |  1 +
>>> >>  target/hexagon/translate.h       |  1 +
>>> >>  target/hexagon/attribs_def.h.inc |  3 +++
>>> >>  target/hexagon/cpu.c             |  1 +
>>> >>  target/hexagon/decode.c          | 22 ++++++++++++++++++++++
>>> >>  target/hexagon/translate.c       |  1 +
>>> >>  6 files changed, 29 insertions(+)
>>> >>
>>> >>
>>> >> diff --git a/target/hexagon/decode.c b/target/hexagon/decode.c
>>> >> index dbc9c630e8..d832a64a17 100644
>>> >> --- a/target/hexagon/decode.c
>>> >> +++ b/target/hexagon/decode.c
>>> >> @@ -696,6 +696,18 @@ static bool pkt_has_write_conflict(Packet *pkt)
>>> >>      return !bitmap_empty(conflict, 32);
>>> >>  }
>>> >>
>>> >> +static void convert_to_nop(Insn *insn)
>>> >> +{
>>> >> +    bool is_endloop = insn->is_endloop;
>>> >> +    memset(insn, 0, sizeof(*insn));
>>> >> +    insn->opcode = A2_nop;
>>> >> +    insn->new_read_idx = -1;
>>> >> +    insn->dest_idx = -1;
>>> >> +    insn->generate = opcode_genptr[insn->opcode];
>>> >> +    insn->iclass = 0b111;
>>> >> +    insn->is_endloop = is_endloop;
>>> >> +}
>>> >> +
>>> >>  /*
>>> >>   * decode_packet
>>> >>   * Decodes packet with given words
>>> >> @@ -746,6 +758,16 @@ int decode_packet(DisasContext *ctx, int
>>> max_words, const uint32_t *words,
>>> >>          /* Ran out of words! */
>>> >>          return 0;
>>> >>      }
>>> >> +
>>> >> +    /* Disable HVX IEEE instruction if extension is disabled. */
>>> >> +    if (!ctx->ieee_fp_extension) {
>>> >> +        for (i = 0; i < num_insns; i++) {
>>> >> +            if (GET_ATTRIB(pkt->insn[i].opcode, A_HVX_IEEE_FP)) {
>>> >> +                convert_to_nop(&pkt->insn[i]);
>>> >> +            }
>>> >> +        }
>>> >> +    }
>>> >> +
>>> >
>>> >
>>> > Better to leave the instruction alone and turn it into a nop by not
>>> generating any TCG.
>>> >
>>> > That way, the disassembly (-d in_asm) will still show what's actually
>>> in the binary.  You could add the check in gen_tcg_funcs.py.
>>> >
>>> > You could also consider adding some sort of marker in the disassembly
>>> to indicate that the flag is needed for the instruction to do anything.
>>>
>>> Ah, good idea. Will do both for the next round, thanks.
>>>
>>
>> Note that we'll need to be careful with packets that use the result
>> vector in a .new context.  For example
>>     { V0.sf = vadd(V1.sf,V2.sf)
>>       vmem(R19+#0x0) = V0.new }
>> The problem is that the store wants to read the value from future_VRegs.
>> However, if the vadd is a nop, there is junk in future_VRegs.  So, we'll
>> either have to get the store to read from the real VRegs or have the vadd
>> copy the old value of the destination into the future_VRegs value.  The
>> first option will be more efficient because it will avoid the vector copy.
>>
>>
> For the sake of ease-of-verification we'll want to do whatever the ISS
> does.  It's not very obvious to me what it would do in this packet context
> based on the description of the nop-like behavior, but we'll follow the
> ISS' lead.  In practical terms the garbage in future_VRegs is probably just
> as bad or good as any other value - if you bothered to execute this packet
> on the target w/o support for this opcode you probably don't care much
> about the result.
>

I'll be interested to know what the ISS and hardware do in this case.
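
To make the two options concrete, here is a toy model of the .new read.
It is only a sketch: the VRegs/future_VRegs names come from the
discussion above, but ToyEnv, ToyVector, toy_read_new and
toy_skip_producer are made up for illustration and do not reflect the
actual QEMU data structures or whatever the ISS does.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_NUM_VREGS 32
#define TOY_VEC_WORDS 32    /* toy size: 128-byte vector as 32 words */

typedef struct {
    uint32_t w[TOY_VEC_WORDS];
} ToyVector;

typedef struct {
    ToyVector VRegs[TOY_NUM_VREGS];        /* committed register state */
    ToyVector future_VRegs[TOY_NUM_VREGS]; /* results produced in this packet */
} ToyEnv;

/*
 * Option 1: the .new consumer reads the committed register when the
 * producer was skipped, so no vector copy is needed.
 */
static const ToyVector *toy_read_new(const ToyEnv *env, int reg,
                                     bool producer_skipped)
{
    return producer_skipped ? &env->VRegs[reg] : &env->future_VRegs[reg];
}

/*
 * Option 2: the skipped producer copies the old destination value into
 * its future slot, so the consumer can keep reading future_VRegs.
 */
static void toy_skip_producer(ToyEnv *env, int reg)
{
    memcpy(&env->future_VRegs[reg], &env->VRegs[reg], sizeof(ToyVector));
}
```

In this model both options make the store see the old value of V0;
option 1 avoids the copy, while option 2 leaves the consumer untouched.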

Thanks,
Taylor


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 04/13] target/hexagon: add v68 HVX IEEE float arithmetic insns
  2026-03-24 19:30     ` Matheus Bernardino
@ 2026-03-24 19:51       ` Taylor Simpson
  2026-03-24 19:59         ` Matheus Bernardino
  0 siblings, 1 reply; 39+ messages in thread
From: Taylor Simpson @ 2026-03-24 19:51 UTC (permalink / raw)
  To: Matheus Bernardino
  Cc: qemu-devel, brian.cain, ale, anjo, marco.liebel, philmd,
	quic_mburton, sid.manning

[-- Attachment #1: Type: text/plain, Size: 1141 bytes --]

On Tue, Mar 24, 2026 at 1:30 PM Matheus Bernardino <
matheus.bernardino@oss.qualcomm.com> wrote:

> On Mon, Mar 23, 2026 at 5:29 PM Taylor Simpson <ltaylorsimpson@gmail.com>
> wrote:
> >> +/*
> >> + * IEEE - FP Reduce instructions
> >> + */
> >> +uint32_t fp_vdmpy(uint16_t a1, uint16_t a2, uint16_t a3, uint16_t a4,
> >> +                  float_status *fp_status);
> >> +uint32_t fp_vdmpy_acc(uint32_t acc, uint16_t a1, uint16_t a2, uint16_t
> a3,
> >> +                      uint16_t a4, float_status *fp_status);
> >> +
> >
> >
> > Consider using macros similar to the ones in the .c file to create these
> protos.
>
> Hmm, I think in this case, the boilerplate size will outweigh the
> benefit of the macros.
>

OK


> >
> > Do these instructions interact with the FP bits in USR (e.g., rounding
> mode, FP exceptions)?
>
> They do not. I'll add a new env->hvx_fp_status and use that for the
> default nan. This way we can avoid messing with the scalar
> fp_status.
>

That will work for the nan.  Is there any programmer-visible state for
rounding mode or FP exceptions?

Thanks,
Taylor


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 04/13] target/hexagon: add v68 HVX IEEE float arithmetic insns
  2026-03-24 19:51       ` Taylor Simpson
@ 2026-03-24 19:59         ` Matheus Bernardino
  2026-03-25  1:18           ` Taylor Simpson
  0 siblings, 1 reply; 39+ messages in thread
From: Matheus Bernardino @ 2026-03-24 19:59 UTC (permalink / raw)
  To: Taylor Simpson
  Cc: qemu-devel, brian.cain, ale, anjo, marco.liebel, philmd,
	quic_mburton, sid.manning

On Tue, Mar 24, 2026 at 4:51 PM Taylor Simpson <ltaylorsimpson@gmail.com> wrote:
>
>
>
> On Tue, Mar 24, 2026 at 1:30 PM Matheus Bernardino <matheus.bernardino@oss.qualcomm.com> wrote:
>>
>> On Mon, Mar 23, 2026 at 5:29 PM Taylor Simpson <ltaylorsimpson@gmail.com> wrote:
>>
>> >
>> > Do these instructions interact with the FP bits in USR (e.g., rounding mode, FP exceptions)?
>>
>> They do not. I'll add a new env->hvx_fp_status and use that for the
>> default nan. This way we can avoid messing with the scalar
>> fp_status.
>
>
> That will work for the nan.  Is there any programmer-visible state for rounding mode or FP exceptions?

No, rounding is always float_round_nearest_even (the default) and
FWICT the HVX IEEE FP functions don't track or report any FP
exceptions.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 05/13] target/hexagon: add v68 HVX IEEE float min/max insns
  2026-03-23 20:47   ` Taylor Simpson
@ 2026-03-24 20:15     ` Matheus Bernardino
  0 siblings, 0 replies; 39+ messages in thread
From: Matheus Bernardino @ 2026-03-24 20:15 UTC (permalink / raw)
  To: Taylor Simpson
  Cc: qemu-devel, brian.cain, ale, anjo, marco.liebel, philmd,
	quic_mburton, sid.manning

On Mon, Mar 23, 2026 at 5:48 PM Taylor Simpson <ltaylorsimpson@gmail.com> wrote:
>
>
>
> On Mon, Mar 23, 2026 at 7:15 AM Matheus Tavares Bernardino <matheus.bernardino@oss.qualcomm.com> wrote:
>>
>> Add HVX IEEE floating-point min/max instructions:
>> - vfmin_hf, vfmin_sf: IEEE floating-point minimum
>> - vfmax_hf, vfmax_sf: IEEE floating-point maximum
>> - vmax_hf, vmax_sf: qfloat IEEE maximum
>> - vmin_hf, vmin_sf: qfloat IEEE minimum
>>
>> The Hexagon qfloat variants are similar to the IEEE-754 ones, but they
>> handle NaN slightly differently. See comment on kvx_ieee.h
>>
>> Signed-off-by: Matheus Tavares Bernardino <matheus.bernardino@oss.qualcomm.com>
>> ---
>>  target/hexagon/mmvec/kvx_ieee.h              | 12 +++++
>>  target/hexagon/mmvec/kvx_ieee.c              | 46 ++++++++++++++++++++
>>  target/hexagon/imported/mmvec/encode_ext.def | 11 +++++
>>  target/hexagon/imported/mmvec/ext.idef       | 28 +++++++++++-
>>  4 files changed, 96 insertions(+), 1 deletion(-)
>>
>> diff --git a/target/hexagon/mmvec/kvx_ieee.h b/target/hexagon/mmvec/kvx_ieee.h
>> index e92ddebeb9..78f546eb8e 100644
>> --- a/target/hexagon/mmvec/kvx_ieee.h
>> +++ b/target/hexagon/mmvec/kvx_ieee.h
>> @@ -44,4 +44,16 @@ uint32_t fp_vdmpy(uint16_t a1, uint16_t a2, uint16_t a3, uint16_t a4,
>>  uint32_t fp_vdmpy_acc(uint32_t acc, uint16_t a1, uint16_t a2, uint16_t a3,
>>                        uint16_t a4, float_status *fp_status);
>>
>> +/* IEEE - FP min/max instructions */
>> +uint32_t fp_min_sf(uint32_t a1, uint32_t a2, float_status *fp_status);
>> +uint32_t fp_max_sf(uint32_t a1, uint32_t a2, float_status *fp_status);
>> +uint16_t fp_min_hf(uint16_t a1, uint16_t a2, float_status *fp_status);
>> +uint16_t fp_max_hf(uint16_t a1, uint16_t a2, float_status *fp_status);
>> +
>> +/* Qfloat min/max treat +NaN as greater than +INF and -NaN as smaller than -INF */
>> +uint32_t qf_max_sf(uint32_t a1, uint32_t a2, float_status *fp_status);
>> +uint32_t qf_min_sf(uint32_t a1, uint32_t a2, float_status *fp_status);
>> +uint16_t qf_max_hf(uint16_t a1, uint16_t a2, float_status *fp_status);
>> +uint16_t qf_min_hf(uint16_t a1, uint16_t a2, float_status *fp_status);
>
>
> Why are we including Qfloat stuff in a patch series for IEEE float?

Well, good point. We can separate those. I've only included them here
because their semantics are very similar to the respective IEEE
variants, and they are more common in Hexagon code as the IEEE
variants need the extension, which is not available in all cores.
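
For context, the NaN ordering described in the kvx_ieee.h comment (+NaN
above +INF, -NaN below -INF) can be sketched as a sign-magnitude compare
on the raw bits. This is a standalone illustration and not the code from
the patch: the toy_* names are hypothetical, and details such as how the
real instructions order -0 against +0 or pick between NaN payloads are
assumptions here.

```c
#include <stdint.h>

/*
 * Map float32 bits to a key whose unsigned order matches sign-magnitude
 * order: negative values are bit-inverted, non-negative values get the
 * top bit set.  This places -NaN < -INF < ... < -0 < +0 < ... < +INF < +NaN.
 */
static uint32_t toy_order_key32(uint32_t bits)
{
    return (bits & 0x80000000u) ? ~bits : (bits | 0x80000000u);
}

/* Toy max: returns whichever operand sorts higher under the key above. */
static uint32_t toy_qf_max_sf(uint32_t a, uint32_t b)
{
    return toy_order_key32(a) >= toy_order_key32(b) ? a : b;
}

/* Toy min: returns whichever operand sorts lower under the key above. */
static uint32_t toy_qf_min_sf(uint32_t a, uint32_t b)
{
    return toy_order_key32(a) <= toy_order_key32(b) ? a : b;
}
```

With this ordering, the max of +NaN and +INF is +NaN and the min of -NaN
and -INF is -NaN, whereas an IEEE-style min/max would treat the NaN
operand specially.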

>>
>> +
>>  #endif
>> diff --git a/target/hexagon/imported/mmvec/encode_ext.def b/target/hexagon/imported/mmvec/encode_ext.def
>> index 4ce87d09fd..23fbb75743 100644
>> --- a/target/hexagon/imported/mmvec/encode_ext.def
>> +++ b/target/hexagon/imported/mmvec/encode_ext.def
>> @@ -823,4 +823,15 @@ DEF_ENC(V6_vsub_sf_hf,"00011111100vvvvvPP1uuuuu101ddddd")
>>  DEF_ENC(V6_vadd_hf_hf,"00011111101vvvvvPP1uuuuu111ddddd")
>>  DEF_ENC(V6_vsub_hf_hf,"00011111011vvvvvPP1uuuuu000ddddd")
>>
>> +/* IEEE FP min/max instructions */
>> +DEF_ENC(V6_vfmin_hf,"00011100011vvvvvPP1uuuuu000ddddd")
>> +DEF_ENC(V6_vfmin_sf,"00011100011vvvvvPP1uuuuu001ddddd")
>> +DEF_ENC(V6_vfmax_hf,"00011100011vvvvvPP1uuuuu010ddddd")
>> +DEF_ENC(V6_vfmax_sf,"00011100011vvvvvPP1uuuuu011ddddd")
>> +DEF_ENC(V6_vmax_sf,"00011111110vvvvvPP1uuuuu001ddddd")
>> +DEF_ENC(V6_vmin_sf,"00011111110vvvvvPP1uuuuu010ddddd")
>> +DEF_ENC(V6_vmax_hf,"00011111110vvvvvPP1uuuuu011ddddd")
>> +DEF_ENC(V6_vmin_hf,"00011111110vvvvvPP1uuuuu100ddddd")
>> +DEF_ENC(V6_vcvt_ub_hf,"00011111110vvvvvPP1uuuuu101ddddd")
>
>
> Minor nit - this is a conversion instruction and is repeated in patch 7.  Remove it from this patch.

Will do, thanks.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 06/13] target/hexagon: add v68 HVX IEEE float misc insns
  2026-03-23 21:08   ` Taylor Simpson
@ 2026-03-24 20:25     ` Matheus Bernardino
  0 siblings, 0 replies; 39+ messages in thread
From: Matheus Bernardino @ 2026-03-24 20:25 UTC (permalink / raw)
  To: Taylor Simpson
  Cc: qemu-devel, brian.cain, ale, anjo, marco.liebel, philmd,
	quic_mburton, sid.manning

On Mon, Mar 23, 2026 at 6:08 PM Taylor Simpson <ltaylorsimpson@gmail.com> wrote:
>
>
>
> On Mon, Mar 23, 2026 at 7:15 AM Matheus Tavares Bernardino <matheus.bernardino@oss.qualcomm.com> wrote:
>>
>> Add HVX IEEE floating-point miscellaneous instructions:
>> - vassign_fp (vfmv): vector move
>> - vfneg_hf, vfneg_sf: vector floating-point negate
>> - vabs_hf, vabs_sf: vector absolute value
>>
>> Signed-off-by: Matheus Tavares Bernardino <matheus.bernardino@oss.qualcomm.com>
>> ---
>>  target/hexagon/mmvec/kvx_ieee.h              |  3 +++
>>  target/hexagon/imported/mmvec/encode_ext.def |  7 +++++++
>>  target/hexagon/imported/mmvec/ext.idef       | 14 ++++++++++++++
>>  3 files changed, 24 insertions(+)
>>
>> diff --git a/target/hexagon/mmvec/kvx_ieee.h b/target/hexagon/mmvec/kvx_ieee.h
>> index 78f546eb8e..263feb7e94 100644
>> --- a/target/hexagon/mmvec/kvx_ieee.h
>> +++ b/target/hexagon/mmvec/kvx_ieee.h
>> @@ -13,6 +13,9 @@
>>  #define FP32_DEF_NAN      0x7FFFFFFF
>>  #define FP16_DEF_NAN      0x7FFF
>>
>> +#define signF32UI(a) ((bool)((uint32_t)(a) >> 31))
>> +#define signF16UI(a) ((bool)((uint16_t)(a) >> 15))
>
>
> Use softfloat routines here
>     !float32_is_neg
>     !float16_is_neg
>
> Actually, these aren't needed.  See below.

Ah, good idea, thanks!
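
For readers following along, the quoted sign macros and the misc
instructions in this patch (negate, absolute value) all come down to the
top bit of the raw encoding. The following is a standalone sketch of
that idea on plain integers; the sketch_* names are hypothetical, and in
QEMU itself the softfloat helpers the review points to would be used
instead of open-coded bit operations.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sign bit of a raw IEEE single / half encoding. */
static bool sketch_sign_f32(uint32_t bits)
{
    return bits >> 31;
}

static bool sketch_sign_f16(uint16_t bits)
{
    return bits >> 15;
}

/* Negate: flip only the sign bit, so NaN payloads are left untouched. */
static uint32_t sketch_fneg_sf(uint32_t bits)
{
    return bits ^ 0x80000000u;
}

/* Absolute value: clear only the sign bit. */
static uint32_t sketch_fabs_sf(uint32_t bits)
{
    return bits & 0x7FFFFFFFu;
}
```

For example, negating 1.0f (0x3F800000) gives 0xBF800000, and the
absolute value of a negative NaN (0xFFC00000) is 0x7FC00000.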


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 07/13] target/hexagon: add v68 HVX IEEE float conversion insns
  2026-03-23 21:25   ` Taylor Simpson
@ 2026-03-24 21:04     ` Matheus Bernardino
  2026-03-25  1:15       ` Taylor Simpson
  0 siblings, 1 reply; 39+ messages in thread
From: Matheus Bernardino @ 2026-03-24 21:04 UTC (permalink / raw)
  To: Taylor Simpson
  Cc: qemu-devel, brian.cain, ale, anjo, marco.liebel, philmd,
	quic_mburton, sid.manning

On Mon, Mar 23, 2026 at 6:26 PM Taylor Simpson <ltaylorsimpson@gmail.com> wrote:
>
>
>
> On Mon, Mar 23, 2026 at 7:16 AM Matheus Tavares Bernardino <matheus.bernardino@oss.qualcomm.com> wrote:
>>
>>
>> diff --git a/target/hexagon/mmvec/kvx_ieee.c b/target/hexagon/mmvec/kvx_ieee.c
>> index 33621a15f3..bbeec09707 100644
>> --- a/target/hexagon/mmvec/kvx_ieee.c
>> +++ b/target/hexagon/mmvec/kvx_ieee.c
>> @@ -131,3 +131,101 @@ uint16_t qf_min_hf(uint16_t a1, uint16_t a2, float_status *fp_status)
>>      if (float16_is_pos_nan(f2) || float16_is_neg_nan(f1)) return a1;
>>      return fp_min_hf(a1, a2, fp_status);
>>  }
>> +
>> +uint16_t f16_to_uh(uint16_t op1, float_status *fp_status)
>> +{
>> +    return float16_to_uint16_scalbn(make_float16(op1),
>> +                                    float_round_nearest_even,
>
>
> Does HVX always use this rounding mode?  The scalar core uses the rounding mode in USR.

Yeah, almost always this mode, with the exception of some
instructions. It's not configurable via USR (or anything else).

>> +
>> +int32_t conv_w_sf(uint32_t a, float_status *fp_status)
>> +{
>> +    float_status scratch_fpst = {};
>> +    const float32 W_MAX = int32_to_float32(INT32_MAX, &scratch_fpst);
>> +    const float32 W_MIN = int32_to_float32(INT32_MIN, &scratch_fpst);
>> +    float32 f1 = make_float32(a);
>> +
>> +    if (float32_is_any_nan(f1) || float32_is_infinity(f1) ||
>> +        float32_le_quiet(W_MAX, f1, fp_status) ||
>> +        float32_le_quiet(f1, W_MIN, fp_status)) {
>> +        return float32_is_neg(f1) ? INT32_MIN : INT32_MAX;
>> +    }
>
>
> Does float32_to_int32 handle these checks?

Hmm, I don't think they are exactly the same. For example,
float32_to_int32 will return INT32_MAX for any NaN, but the Hexagon
implementation here returns INT32_MIN for a negative NaN.
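
A standalone sketch of the saturation rule in the quoted conv_w_sf,
written against raw bit patterns rather than softfloat, may help show
the sign-dependent NaN result. The name sketch_conv_w_sf is
hypothetical, and the exponent test is my restatement of the quoted
range checks (|x| >= 2^31, Inf, or NaN), not code from the patch.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical sketch: float32 bits -> int32 with sign-based saturation.
 * Biased exponent >= 127 + 31 covers |x| >= 2^31 as well as Inf and NaN
 * (exponent 0xFF), so all of them saturate according to the sign bit.
 */
static int32_t sketch_conv_w_sf(uint32_t bits)
{
    uint32_t sign = bits >> 31;
    uint32_t exp = (bits >> 23) & 0xFFu;
    float f;

    if (exp >= 127 + 31) {
        return sign ? INT32_MIN : INT32_MAX;
    }
    memcpy(&f, &bits, sizeof(f));
    return (int32_t)f;  /* in range here; the C cast truncates toward zero */
}
```

Under this sketch a negative NaN such as 0xFFC00000 yields INT32_MIN,
while a positive NaN yields INT32_MAX, which is the difference from a
conversion that maps every NaN to one fixed value.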

>>
>> +    return float32_to_int32_round_to_zero(f1, fp_status);
>
>
> Rounding mode?

This is one of those exceptions I mentioned earlier.
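
To illustrate the two rounding behaviours discussed in this message
(round-to-nearest-even as the usual HVX mode, round-toward-zero for this
conversion), here is a small standalone sketch in plain C. Both function
names are hypothetical, and the code assumes inputs small enough that
the intermediate float arithmetic is exact; it is not how softfloat
implements rounding.

```c
#include <stdint.h>

/* Round toward zero: the C float-to-int cast already does this. */
static int32_t sketch_to_int_trunc(float f)
{
    return (int32_t)f;
}

/* Round to nearest, ties to even (assumes |f| is well below 2^23). */
static int32_t sketch_to_int_nearest_even(float f)
{
    int32_t t = (int32_t)f;               /* truncated value */
    float frac = f - (float)t;            /* same sign as f, |frac| < 1 */
    float mag = frac < 0 ? -frac : frac;
    int32_t step = f < 0 ? -1 : 1;        /* direction away from zero */

    if (mag > 0.5f || (mag == 0.5f && (t & 1))) {
        t += step;
    }
    return t;
}
```

For instance, 2.5f and 3.5f round to 2 and 4 under nearest-even, while
truncation gives 2 and 3.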


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 07/13] target/hexagon: add v68 HVX IEEE float conversion insns
  2026-03-24 21:04     ` Matheus Bernardino
@ 2026-03-25  1:15       ` Taylor Simpson
  0 siblings, 0 replies; 39+ messages in thread
From: Taylor Simpson @ 2026-03-25  1:15 UTC (permalink / raw)
  To: Matheus Bernardino
  Cc: qemu-devel, brian.cain, ale, anjo, marco.liebel, philmd,
	quic_mburton, sid.manning

On Tue, Mar 24, 2026 at 3:04 PM Matheus Bernardino <matheus.bernardino@oss.qualcomm.com> wrote:

> On Mon, Mar 23, 2026 at 6:26 PM Taylor Simpson <ltaylorsimpson@gmail.com> wrote:
> >
> >
> >
> > On Mon, Mar 23, 2026 at 7:16 AM Matheus Tavares Bernardino <matheus.bernardino@oss.qualcomm.com> wrote:
> >>
> >>
> >> diff --git a/target/hexagon/mmvec/kvx_ieee.c b/target/hexagon/mmvec/kvx_ieee.c
> >> index 33621a15f3..bbeec09707 100644
> >> --- a/target/hexagon/mmvec/kvx_ieee.c
> >> +++ b/target/hexagon/mmvec/kvx_ieee.c
> >> @@ -131,3 +131,101 @@ uint16_t qf_min_hf(uint16_t a1, uint16_t a2, uint16_t a2, float_status *fp_status)
> >>      if (float16_is_pos_nan(f2) || float16_is_neg_nan(f1)) return a1;
> >>      return fp_min_hf(a1, a2, fp_status);
> >>  }
> >> +
> >> +uint16_t f16_to_uh(uint16_t op1, float_status *fp_status)
> >> +{
> >> +    return float16_to_uint16_scalbn(make_float16(op1),
> >> +                                    float_round_nearest_even,
> >
> >
> > Does HVX always use this rounding mode?  The scalar core uses the rounding mode in USR.
>
> Yeah, almost always this mode, with the exception of some
> instructions. It's not configurable via USR (or anything else).
>

You can set that in the hvx_fp_status, and the softfloat lib will handle it
from there.


>
> >> +
> >> +int32_t conv_w_sf(uint32_t a, float_status *fp_status)
> >> +{
> >> +    float_status scratch_fpst = {};
> >> +    const float32 W_MAX = int32_to_float32(INT32_MAX, &scratch_fpst);
> >> +    const float32 W_MIN = int32_to_float32(INT32_MIN, &scratch_fpst);
> >> +    float32 f1 = make_float32(a);
> >> +
> >> +    if (float32_is_any_nan(f1) || float32_is_infinity(f1) ||
> >> +        float32_le_quiet(W_MAX, f1, fp_status) ||
> >> +        float32_le_quiet(f1, W_MIN, fp_status)) {
> >> +        return float32_is_neg(f1) ? INT32_MIN : INT32_MAX;
> >> +    }
> >
> >
> > Does float32_to_int32 handle these checks?
>
> Hmm, I don't think they are exactly the same. For example,
> float32_to_int32 will return INT32_MAX for any NAN. But the hexagon
> implementation here returns INT32_MIN for negative NAN.
>

Look around in the softfloat code - especially fields in float_status.  The
scalar core has a few exceptions, but not a lot.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 04/13] target/hexagon: add v68 HVX IEEE float arithmetic insns
  2026-03-24 19:59         ` Matheus Bernardino
@ 2026-03-25  1:18           ` Taylor Simpson
  0 siblings, 0 replies; 39+ messages in thread
From: Taylor Simpson @ 2026-03-25  1:18 UTC (permalink / raw)
  To: Matheus Bernardino
  Cc: qemu-devel, brian.cain, ale, anjo, marco.liebel, philmd,
	quic_mburton, sid.manning

On Tue, Mar 24, 2026 at 2:00 PM Matheus Bernardino <matheus.bernardino@oss.qualcomm.com> wrote:

> On Tue, Mar 24, 2026 at 4:51 PM Taylor Simpson <ltaylorsimpson@gmail.com> wrote:
> >
> >
> >
> > On Tue, Mar 24, 2026 at 1:30 PM Matheus Bernardino <matheus.bernardino@oss.qualcomm.com> wrote:
> >>
> >> On Mon, Mar 23, 2026 at 5:29 PM Taylor Simpson <ltaylorsimpson@gmail.com> wrote:
> >>
> >> >
> >> > Do these instructions interact with the FP bits in USR (e.g., rounding mode, FP exceptions)?
> >>
> >> They do not. I'll add a new env->hvx_fp_status and use that for the
> >> default nan. This way we can avoid messing up with the scalar
> >> fp_status.
> >
> >
> > That will work for the nan.  Is there any programmer-visible state for rounding mode or FP exceptions?
>
> No, rounding is always float_round_nearest_even (the default) and
> FWICT the HVX IEEE FP functions don't track or report any FP
> exceptions.
>

OK, then disregard my comments about checking the exceptions.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 08/13] target/hexagon: add v68 HVX IEEE float compare insns
  2026-03-23 21:42   ` Taylor Simpson
@ 2026-03-26 13:00     ` Matheus Bernardino
  0 siblings, 0 replies; 39+ messages in thread
From: Matheus Bernardino @ 2026-03-26 13:00 UTC (permalink / raw)
  To: Taylor Simpson
  Cc: qemu-devel, brian.cain, ale, anjo, marco.liebel, philmd,
	quic_mburton, sid.manning

On Mon, Mar 23, 2026 at 6:42 PM Taylor Simpson <ltaylorsimpson@gmail.com> wrote:
>
>
>
> On Mon, Mar 23, 2026 at 7:16 AM Matheus Tavares Bernardino <matheus.bernardino@oss.qualcomm.com> wrote:
>>
>> Add HVX IEEE floating-point compare instructions:
>> - V6_vgthf, V6_vgtsf: greater-than compare
>> - V6_vgthf_and, V6_vgtsf_and: greater-than with predicate-and
>> - V6_vgthf_or, V6_vgtsf_or: greater-than with predicate-or
>> - V6_vgthf_xor, V6_vgtsf_xor: greater-than with predicate-xor
>>
>> Signed-off-by: Matheus Tavares Bernardino <matheus.bernardino@oss.qualcomm.com>
>> ---
>>  target/hexagon/mmvec/macros.h                | 10 ++++
>>  target/hexagon/attribs_def.h.inc             |  2 +
>>  target/hexagon/hex_common.py                 |  1 +
>>  target/hexagon/imported/mmvec/encode_ext.def | 10 ++++
>>  target/hexagon/imported/mmvec/ext.idef       | 61 ++++++++++++++++++++
>>  5 files changed, 84 insertions(+)
>>
>> diff --git a/target/hexagon/mmvec/macros.h b/target/hexagon/mmvec/macros.h
>> index 2af3d2d747..c342507d1a 100644
>> --- a/target/hexagon/mmvec/macros.h
>> +++ b/target/hexagon/mmvec/macros.h
>> @@ -356,4 +356,14 @@
>>                 extract32(VAL, POS * 8, 8); \
>>      } while (0);
>>
>> +#define fCMPGT_SF(A, B) \
>> +    (float32_is_any_nan(A) || float32_is_any_nan(B) ? \
>> +     (int32_t)(A) > (int32_t)(B) : \
>
>
> Seems odd to do an integer comparison of two NaN's

Oh, this is incorrect, indeed. HVX ordering goes like this: QNaN >
SNaN > +Inf > numbers > -Inf > SNaN_neg > QNaN_neg

I'll fix it in v2.
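
As a rough illustration of the ordering above (QNaN > SNaN > +Inf >
numbers > -Inf > -SNaN > -QNaN), here is a standalone sketch of a
greater-than compare on raw float32 bits. It maps each sign-magnitude
encoding to an unsigned key so that larger keys mean "greater", and uses
the key only when a NaN is involved. The names order_key, is_nan_bits
and sketch_cmpgt_sf are hypothetical, ordering NaNs of the same kind by
payload is an assumption, and this is not the v2 implementation.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Map sign-magnitude bits to an unsigned key that sorts in the stated
 * order: negatives are inverted (bigger magnitude -> smaller key),
 * positives get the top bit set so they sort above all negatives.
 */
static uint32_t order_key(uint32_t bits)
{
    return (bits & 0x80000000u) ? ~bits : (bits | 0x80000000u);
}

static bool is_nan_bits(uint32_t bits)
{
    return (bits & 0x7FFFFFFFu) > 0x7F800000u;
}

/* Hypothetical sketch of a > b under the ordering described above. */
static bool sketch_cmpgt_sf(uint32_t a, uint32_t b)
{
    float fa, fb;

    if (is_nan_bits(a) || is_nan_bits(b)) {
        return order_key(a) > order_key(b);
    }
    memcpy(&fa, &a, sizeof(fa));
    memcpy(&fb, &b, sizeof(fb));
    return fa > fb;  /* ordinary numbers and infinities */
}
```

Note that a plain signed integer compare, as in the quoted macro, gets
the negative side backwards: 0xFFC00000 (-QNaN) is larger than
0xFF800000 (-Inf) as an int32_t, whereas the key above ranks -Inf
higher.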


^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2026-03-26 13:02 UTC | newest]

Thread overview: 39+ messages
2026-03-23 13:15 [PATCH 00/13] hexagon: add missing HVX float instructions Matheus Tavares Bernardino
2026-03-23 13:15 ` [PATCH 01/13] tests/docker: Update hexagon cross toolchain to 22.1.0 Matheus Tavares Bernardino
2026-03-23 13:15 ` [PATCH 02/13] target/hexagon: fix incorrect/too-permissive HVX encodings Matheus Tavares Bernardino
2026-03-23 19:21   ` Taylor Simpson
2026-03-23 13:15 ` [PATCH 03/13] target/hexagon/cpu: add HVX IEEE FP extension Matheus Tavares Bernardino
2026-03-23 19:32   ` Taylor Simpson
2026-03-24 16:52     ` Matheus Bernardino
2026-03-24 18:48       ` Taylor Simpson
2026-03-24 19:20         ` Brian Cain
2026-03-24 19:46           ` Taylor Simpson
2026-03-23 13:15 ` [PATCH 04/13] target/hexagon: add v68 HVX IEEE float arithmetic insns Matheus Tavares Bernardino
2026-03-23 20:28   ` Taylor Simpson
2026-03-24 19:30     ` Matheus Bernardino
2026-03-24 19:51       ` Taylor Simpson
2026-03-24 19:59         ` Matheus Bernardino
2026-03-25  1:18           ` Taylor Simpson
2026-03-23 13:15 ` [PATCH 05/13] target/hexagon: add v68 HVX IEEE float min/max insns Matheus Tavares Bernardino
2026-03-23 20:47   ` Taylor Simpson
2026-03-24 20:15     ` Matheus Bernardino
2026-03-23 13:15 ` [PATCH 06/13] target/hexagon: add v68 HVX IEEE float misc insns Matheus Tavares Bernardino
2026-03-23 21:08   ` Taylor Simpson
2026-03-24 20:25     ` Matheus Bernardino
2026-03-23 13:15 ` [PATCH 07/13] target/hexagon: add v68 HVX IEEE float conversion insns Matheus Tavares Bernardino
2026-03-23 21:25   ` Taylor Simpson
2026-03-24 21:04     ` Matheus Bernardino
2026-03-25  1:15       ` Taylor Simpson
2026-03-23 13:15 ` [PATCH 08/13] target/hexagon: add v68 HVX IEEE float compare insns Matheus Tavares Bernardino
2026-03-23 21:42   ` Taylor Simpson
2026-03-26 13:00     ` Matheus Bernardino
2026-03-23 13:15 ` [PATCH 09/13] target/hexagon: add v73 HVX IEEE bfloat16 insns Matheus Tavares Bernardino
2026-03-23 22:03   ` Taylor Simpson
2026-03-23 13:15 ` [PATCH 10/13] tests/hexagon: add tests for v68 HVX IEEE float arithmetics Matheus Tavares Bernardino
2026-03-24 19:05   ` Taylor Simpson
2026-03-23 13:15 ` [PATCH 11/13] tests/hexagon: add tests for v68 HVX IEEE float min/max Matheus Tavares Bernardino
2026-03-24 19:07   ` Taylor Simpson
2026-03-23 13:15 ` [PATCH 12/13] tests/hexagon: add tests for v68 HVX IEEE float conversions Matheus Tavares Bernardino
2026-03-24 19:30   ` Taylor Simpson
2026-03-23 13:15 ` [PATCH 13/13] tests/hexagon: add tests for v68 HVX IEEE float comparisons Matheus Tavares Bernardino
2026-03-24 19:37   ` Taylor Simpson
