* [PATCH v6 00/35] target/mips: add missing Octeon user-mode support
@ 2026-05-11 18:22 James Hilliard
  2026-05-11 18:22 ` [PATCH v6 01/35] linux-user/mips: implement sysmips(MIPS_FLUSH_CACHE) James Hilliard
                   ` (34 more replies)
  0 siblings, 35 replies; 52+ messages in thread
From: James Hilliard @ 2026-05-11 18:22 UTC
  To: qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier,
	Philippe Mathieu-Daudé, Aurelien Jarno, Jiaxun Yang,
	Aleksandar Rikalo, Huacai Chen, James Hilliard, Richard Henderson

This series updates MIPS linux-user unaligned-access behavior and fills
in missing Octeon user-mode instruction support used by existing Octeon
binaries.

The first patches model the Linux/MIPS sysmips ABI pieces needed by
linux-user, including MIPS_FLUSH_CACHE, MIPS_ATOMIC_SET, and the
MIPS_FIXADE policy used to control unaligned scalar access fixups.
User-mode unaligned scalar accesses default to software fixups, and
sysmips(MIPS_FIXADE) can switch them to raise SIGBUS/BUS_ADRALN instead.

The Octeon patches add integer, indexed memory, atomic, fixed-point
QMAC, multiplier, COP2 crypto, CHORD, LLM, and CvmCount RDHWR support.
The series also adds a small mips64/mips64el TCG guest test covering
representative Octeon integer, fixed-point, multiplier, RDHWR, and COP2
selector paths. The final patch corrects the Octeon68XX CP1 feature
bits and FCR defaults.

Changes since v1:
- Split BADDU/DMUL destination fixes into a separate patch.
- Split the SEQ/SNE decode refactoring into a separate patch.
- Moved Octeon multiplier state to uint64_t arrays and updated VMState.
- Switched Octeon helper ABIs to i64/uint64_t where applicable.
- Moved COP2 selector decode/support logic into octeon_translate.c.
- Added in-tree TCG tests for mips64 and mips64el linux-user.
- Used switch ranges and g_assert_not_reached() for SHA3/ZUC shared
  selector handling.
- Dropped Octeon prefixes from generic Camellia helper routines.
- Replaced the reflected GFM 64-bit carryless multiply loop with
  crypto/clmul.h.
- Moved the Octeon68XX CP1 CPU-model correction to the end of the
  series.
- Added migration coverage for Octeon COP2 crypto and LLM sparse state.

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
---
Changes in v6:
- Added Octeon QMAC/QMACS fixed-point accumulator support and smoke
  coverage.
- Added Octeon RDHWR $31/CvmCount support and smoke coverage.
- Clarified MTM0/VMM0 deterministic handling of architecturally
  unpredictable multiplier lanes.
- Fixed MTP0 to zero P1 per the CN71XX register-state table and added
  smoke coverage.
- Fixed VMM0 to apply the full MTM0-style multiplier-state reset and
  added smoke coverage for MPL1.
- Cleaned up internal VMUL, LA*, COP2 payload/state, and COP2 selector
  naming to better match hardware register/selector terminology.
- Renamed the MIPS_FIXADE TB flag, HSH register word-packing helpers,
  and sparse LLM backing fields to match ABI and hardware terminology.
- Link to v5: https://lore.kernel.org/qemu-devel/20260510-mips-octeon-missing-insns-v2-v5-0-d5d2668d15ab@gmail.com

Changes in v5:
- Added Richard Henderson's Reviewed-by tags for LBX, LHUX, LWUX, SAA,
  and SAAD, plus Acked-by tags for ZCB and ZCBT.
- Dropped the separate Octeon+ feature bit; QEMU has a single Octeon CPU
  model today, so SAA/SAAD stay under the existing Octeon feature bucket.
- Folded ZCBT into the ZCB decodetree entry with a selector comment.
- Link to v4: https://lore.kernel.org/qemu-devel/20260509-mips-octeon-missing-insns-v2-v4-0-d669dcd05c2f@gmail.com

Changes in v4:
- Added Richard Henderson's Reviewed-by tags to the reviewed sysmips and
  Octeon translator cleanup patches.
- Kept the Octeon3 MPL3-MPL5/P3-P5 high-lane multiplier state
  documented by Cavium SDK/toolchain sources.
- Documented the Octeon3 two-source MTM/MTP forms and preserved the rt
  high-lane operands while legacy one-source encodings use rt == $zero.
- Simplified SAA/SAAD translation to use the i64 TCG atomic add path for
  both word and doubleword sizes.
- Marked SAA/SAAD as Octeon+ instructions and gated them behind a
  separate Octeon+ feature bit.
- Simplified LA* translation to use i64 TCG atomic helpers for word and
  doubleword operations, with MO_SL selecting word result sign-extension.
- Link to v3: https://lore.kernel.org/qemu-devel/20260508-mips-octeon-missing-insns-v2-v3-0-bcbec96357d9@gmail.com

Changes in v3:
- Rebased on current qemu.git master.
- Split sysmips support into separate MIPS_FLUSH_CACHE, MIPS_ATOMIC_SET,
  and MIPS_FIXADE patches.
- Made MIPS_ATOMIC_SET always use the MIPS separate error-result register
  path for successful returns.
- Removed redundant Octeon MIPS64 checks and target-long guards from the
  translator paths.
- Removed zero-register fast paths where gen_store_gpr() already handles
  discarded writes.
- Reworked SEQ/SNE decode and LA* translator helpers as requested.
- Split the Octeon arithmetic/memory patch into narrower state, indexed
  load, SAA/SAAD, ZCB, multiplier, and test patches.
- Switched Octeon multiplier limb accumulation to uadd64_overflow().
- Link to v2: https://lore.kernel.org/qemu-devel/20260421-mips-octeon-missing-insns-v2-v2-0-a0791df188c9@gmail.com

To: qemu-devel@nongnu.org
Cc: Laurent Vivier <laurent@vivier.eu>
Cc: Helge Deller <deller@gmx.de>
Cc: Pierrick Bouvier <pierrick.bouvier@oss.qualcomm.com>
Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: Aleksandar Rikalo <arikalo@gmail.com>
Cc: Huacai Chen <chenhuacai@kernel.org>

---
James Hilliard (35):
      linux-user/mips: implement sysmips(MIPS_FLUSH_CACHE)
      linux-user/mips: implement sysmips(MIPS_ATOMIC_SET)
      linux-user/mips, target/mips: honor MIPS_FIXADE for unaligned accesses
      target/mips: fix Octeon arithmetic destination handling
      target/mips: split Octeon SEQ/SNE decode
      target/mips: drop Octeon zero-register fast paths
      target/mips: add Octeon multiplier state
      target/mips: add Octeon LBX instruction
      target/mips: add Octeon LHUX instruction
      target/mips: add Octeon LWUX instruction
      target/mips: add Octeon SAA instruction
      target/mips: add Octeon SAAD instruction
      target/mips: add Octeon ZCB instruction
      target/mips: add Octeon ZCBT instruction
      target/mips: add Octeon MTM0 instruction
      target/mips: add Octeon MTP0 instruction
      target/mips: add Octeon MTP1 instruction
      target/mips: add Octeon MTP2 instruction
      target/mips: add Octeon MTM1 instruction
      target/mips: add Octeon MTM2 instruction
      target/mips: add Octeon VMULU instruction
      target/mips: add Octeon VMM0 instruction
      target/mips: add Octeon V3MULU instruction
      target/mips: add Octeon QMAC instructions
      tests/tcg/mips: add Octeon instruction smoke test
      target/mips: add Octeon LA* atomic instructions
      target/mips: add Octeon COP2 crypto core support
      target/mips: add Octeon SMS4 crypto support
      target/mips: add Octeon SHA3 crypto support
      target/mips: add Octeon ZUC crypto support
      target/mips: add Octeon Camellia crypto support
      target/mips: add Octeon CHORD and LLM COP2 support
      target/mips: add Octeon CvmCount RDHWR support
      tests/tcg/mips: cover Octeon QMAC and CvmCount
      target/mips: expose Octeon68XX floating-point support

 MAINTAINERS                                   |    2 +
 linux-user/mips/cpu_loop.c                    |    5 +
 linux-user/mips/target_syscall.h              |    3 +
 linux-user/mips64/target_syscall.h            |    3 +
 linux-user/syscall.c                          |   56 +
 target/mips/cpu-defs.c.inc                    |   10 +-
 target/mips/cpu.c                             |   75 +-
 target/mips/cpu.h                             |  257 +++
 target/mips/helper.h                          |    9 +
 target/mips/internal.h                        |    3 +
 target/mips/system/machine.c                  |  142 ++
 target/mips/tcg/meson.build                   |    1 +
 target/mips/tcg/octeon.decode                 |   51 +-
 target/mips/tcg/octeon_crypto.c               | 2479 +++++++++++++++++++++++++
 target/mips/tcg/octeon_translate.c            |  569 +++++-
 target/mips/tcg/op_helper.c                   |  176 +-
 target/mips/tcg/translate.c                   |   34 +-
 target/mips/tcg/translate.h                   |    1 +
 tests/tcg/mips/Makefile.target                |   11 +
 tests/tcg/mips/user/isa/octeon/octeon-insns.c |  332 ++++
 tests/tcg/mips64/Makefile.target              |   20 +
 tests/tcg/mips64el/Makefile.target            |    8 +
 22 files changed, 4188 insertions(+), 59 deletions(-)
---
base-commit: 5e61afe211e82a9af15a8794a0bd29bb574e953b
change-id: 20260420-mips-octeon-missing-insns-v2-5e693770cf2c

Best regards,
--  
James Hilliard <james.hilliard1@gmail.com>




* [PATCH v6 01/35] linux-user/mips: implement sysmips(MIPS_FLUSH_CACHE)
  2026-05-11 18:22 [PATCH v6 00/35] target/mips: add missing Octeon user-mode support James Hilliard
@ 2026-05-11 18:22 ` James Hilliard
  2026-05-11 18:22 ` [PATCH v6 02/35] linux-user/mips: implement sysmips(MIPS_ATOMIC_SET) James Hilliard
                   ` (33 subsequent siblings)
  34 siblings, 0 replies; 52+ messages in thread
From: James Hilliard @ 2026-05-11 18:22 UTC
  To: qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier,
	Philippe Mathieu-Daudé, Aurelien Jarno, Jiaxun Yang,
	Aleksandar Rikalo, Huacai Chen, James Hilliard, Richard Henderson

Add the target sysmips dispatcher and implement MIPS_FLUSH_CACHE as a
successful no-op for linux-user.

Self-modifying code is handled by QEMU's normal user-mode translation
invalidation machinery, so the target ABI only needs the syscall command
to be accepted.
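The following is a host-side sketch of the dispatch shape this patch adds, for illustration only. The model_* names are hypothetical and are not QEMU functions. The command value is taken from TARGET_SYSMIPS_FLUSH_CACHE in the diff below. QEMU already invalidates translated code when guest code pages change, so accepting the command is all that is required:

```c
#include <assert.h>
#include <errno.h>

/* Command number as defined by TARGET_SYSMIPS_FLUSH_CACHE in this patch. */
#define MODEL_SYSMIPS_FLUSH_CACHE 3

/*
 * Hypothetical model of do_sysmips(): MIPS_FLUSH_CACHE needs no work
 * because stale translations are already invalidated when guest code is
 * modified, so the command simply succeeds.
 */
static long model_sysmips(long cmd)
{
    switch (cmd) {
    case MODEL_SYSMIPS_FLUSH_CACHE:
        return 0;
    default:
        return -EINVAL;   /* unknown commands are rejected */
    }
}
```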

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
---
Changes v2 -> v3:
  - Split MIPS_FLUSH_CACHE out of the combined sysmips/MIPS_FIXADE patch.
    (suggested by Richard Henderson)
---
 linux-user/mips/target_syscall.h   |  1 +
 linux-user/mips64/target_syscall.h |  1 +
 linux-user/syscall.c               | 17 +++++++++++++++++
 3 files changed, 19 insertions(+)

diff --git a/linux-user/mips/target_syscall.h b/linux-user/mips/target_syscall.h
index dfcdf320b7..3f36c1695a 100644
--- a/linux-user/mips/target_syscall.h
+++ b/linux-user/mips/target_syscall.h
@@ -10,6 +10,7 @@
 #define TARGET_MCL_ONFAULT 4
 
 #define TARGET_FORCE_SHMLBA
+#define TARGET_SYSMIPS_FLUSH_CACHE     3
 
 static inline abi_ulong target_shmlba(CPUMIPSState *env)
 {
diff --git a/linux-user/mips64/target_syscall.h b/linux-user/mips64/target_syscall.h
index 9135bf5e8b..20ea7c6ab9 100644
--- a/linux-user/mips64/target_syscall.h
+++ b/linux-user/mips64/target_syscall.h
@@ -10,6 +10,7 @@
 #define TARGET_MCL_ONFAULT 4
 
 #define TARGET_FORCE_SHMLBA
+#define TARGET_SYSMIPS_FLUSH_CACHE     3
 
 static inline abi_ulong target_shmlba(CPUMIPSState *env)
 {
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index d3d9fffb54..73f09bb775 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -6630,6 +6630,19 @@ static abi_long do_prctl_syscall_user_dispatch(CPUArchState *env,
     }
 }
 
+#ifdef TARGET_NR_sysmips
+static abi_long do_sysmips(CPUArchState *env, abi_long cmd, abi_long arg1,
+                           abi_long arg2)
+{
+    switch (cmd) {
+    case TARGET_SYSMIPS_FLUSH_CACHE:
+        return 0;
+    default:
+        return -TARGET_EINVAL;
+    }
+}
+#endif
+
 static abi_long do_prctl(CPUArchState *env, abi_long option, abi_long arg2,
                          abi_long arg3, abi_long arg4, abi_long arg5)
 {
@@ -12102,6 +12115,10 @@ static abi_long do_syscall1(CPUArchState *cpu_env, int num, abi_long arg1,
     case TARGET_NR_prctl:
         return do_prctl(cpu_env, arg1, arg2, arg3, arg4, arg5);
         break;
+#ifdef TARGET_NR_sysmips
+    case TARGET_NR_sysmips:
+        return do_sysmips(cpu_env, arg1, arg2, arg3);
+#endif
 #ifdef TARGET_NR_arch_prctl
     case TARGET_NR_arch_prctl:
         return do_arch_prctl(cpu_env, arg1, arg2);

-- 
2.54.0




* [PATCH v6 02/35] linux-user/mips: implement sysmips(MIPS_ATOMIC_SET)
  2026-05-11 18:22 [PATCH v6 00/35] target/mips: add missing Octeon user-mode support James Hilliard
  2026-05-11 18:22 ` [PATCH v6 01/35] linux-user/mips: implement sysmips(MIPS_FLUSH_CACHE) James Hilliard
@ 2026-05-11 18:22 ` James Hilliard
  2026-05-11 18:22 ` [PATCH v6 03/35] linux-user/mips, target/mips: honor MIPS_FIXADE for unaligned accesses James Hilliard
                   ` (32 subsequent siblings)
  34 siblings, 0 replies; 52+ messages in thread
From: James Hilliard @ 2026-05-11 18:22 UTC
  To: qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier,
	Philippe Mathieu-Daudé, Aurelien Jarno, Jiaxun Yang,
	Aleksandar Rikalo, Huacai Chen, James Hilliard, Richard Henderson

Implement the MIPS_ATOMIC_SET sysmips command as an aligned 32-bit atomic
exchange in target memory.
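Below is a minimal host-side sketch of these semantics. It assumes C11 <stdatomic.h> is available on the host, and model_atomic_set() is a hypothetical name. The sketch omits two things the patch does: guest/host byte swapping (tswap32()) and guest address validation through lock_user().

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical model of MIPS_ATOMIC_SET: reject a guest address that is
 * not 4-byte aligned, otherwise swap in the new 32-bit value atomically
 * and report the previous contents through *old.
 */
static int model_atomic_set(_Atomic uint32_t *word, uint64_t guest_addr,
                            uint32_t value, uint32_t *old)
{
    if (guest_addr & 3) {
        return -EINVAL;   /* same alignment check as (addr & 3) in the patch */
    }
    /* One atomic read-modify-write, like qatomic_xchg() in the patch. */
    *old = atomic_exchange(word, value);
    return 0;
}
```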

MIPS reports syscall errors through a separate register ($a3), so the old
value returned on success may legitimately fall in the range that the
common linux-user path would interpret as -errno.  Write the result ($v0)
and error flag directly and return -QEMU_ESIGRETURN so the common syscall
path leaves both registers unchanged.
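The hypothetical sketch below shows why inferring the error flag from the return value breaks here. It uses 32-bit registers for simplicity. The threshold of 1133 (EDQUOT, the largest Linux/MIPS errno) is an assumption that stands in for the range the common path treats as -errno:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: values in the top 1133 of the unsigned range mean -errno. */
#define MODEL_MAX_ERRNO 1133u

typedef struct {
    uint32_t v0;   /* $2: syscall result */
    uint32_t a3;   /* $7: error flag */
} model_sysret;

/* Generic path: infer the error flag from the value itself. */
static model_sysret model_encode_inferred(uint32_t ret)
{
    model_sysret r;

    if (ret >= 0u - MODEL_MAX_ERRNO) {
        r.a3 = 1;
        r.v0 = 0u - ret;   /* positive errno */
    } else {
        r.a3 = 0;
        r.v0 = ret;
    }
    return r;
}

/* Explicit path used for atomic_set: the old value is always a success. */
static model_sysret model_encode_success(uint32_t old)
{
    model_sysret r = { .v0 = old, .a3 = 0 };
    return r;
}
```

A word holding 0xfffffff2 is a perfectly valid old value. The inferred encoding nevertheless reports it as error 14, while the explicit encoding returns it unchanged with the error flag clear.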

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
---
Changes v2 -> v3:
  - Split MIPS_ATOMIC_SET out of the combined sysmips/MIPS_FIXADE patch.
    (suggested by Richard Henderson)
  - Always use the explicit MIPS return-register path for successful
    atomic_set results.  (suggested by Richard Henderson)
---
 linux-user/mips/target_syscall.h   |  1 +
 linux-user/mips64/target_syscall.h |  1 +
 linux-user/syscall.c               | 31 +++++++++++++++++++++++++++++++
 3 files changed, 33 insertions(+)

diff --git a/linux-user/mips/target_syscall.h b/linux-user/mips/target_syscall.h
index 3f36c1695a..9206694f4f 100644
--- a/linux-user/mips/target_syscall.h
+++ b/linux-user/mips/target_syscall.h
@@ -11,6 +11,7 @@
 
 #define TARGET_FORCE_SHMLBA
 #define TARGET_SYSMIPS_FLUSH_CACHE     3
+#define TARGET_SYSMIPS_ATOMIC_SET   2001
 
 static inline abi_ulong target_shmlba(CPUMIPSState *env)
 {
diff --git a/linux-user/mips64/target_syscall.h b/linux-user/mips64/target_syscall.h
index 20ea7c6ab9..e07687f8ac 100644
--- a/linux-user/mips64/target_syscall.h
+++ b/linux-user/mips64/target_syscall.h
@@ -11,6 +11,7 @@
 
 #define TARGET_FORCE_SHMLBA
 #define TARGET_SYSMIPS_FLUSH_CACHE     3
+#define TARGET_SYSMIPS_ATOMIC_SET   2001
 
 static inline abi_ulong target_shmlba(CPUMIPSState *env)
 {
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 73f09bb775..3786a34041 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -6631,10 +6631,41 @@ static abi_long do_prctl_syscall_user_dispatch(CPUArchState *env,
 }
 
 #ifdef TARGET_NR_sysmips
+static abi_long do_sysmips_atomic_set(CPUArchState *env, abi_ulong addr,
+                                      abi_long value)
+{
+    uint32_t *ptr;
+    abi_long old;
+
+    if (addr & 3) {
+        return -TARGET_EINVAL;
+    }
+
+    ptr = lock_user(VERIFY_WRITE, addr, sizeof(*ptr), true);
+    if (!ptr) {
+        return -TARGET_EINVAL;
+    }
+
+    old = tswap32(qatomic_xchg(ptr, tswap32((uint32_t)value)));
+    unlock_user(ptr, addr, sizeof(*ptr));
+
+    /*
+     * MIPS uses a separate error flag, but the common linux-user syscall
+     * path infers that flag from the return value.  Successful atomic_set
+     * results can overlap the target errno range, so write the result
+     * registers here and ask the CPU loop to leave them alone.
+     */
+    env->active_tc.gpr[2] = old;
+    env->active_tc.gpr[7] = 0;
+    return -QEMU_ESIGRETURN;
+}
+
 static abi_long do_sysmips(CPUArchState *env, abi_long cmd, abi_long arg1,
                            abi_long arg2)
 {
     switch (cmd) {
+    case TARGET_SYSMIPS_ATOMIC_SET:
+        return do_sysmips_atomic_set(env, arg1, arg2);
     case TARGET_SYSMIPS_FLUSH_CACHE:
         return 0;
     default:

-- 
2.54.0




* [PATCH v6 03/35] linux-user/mips, target/mips: honor MIPS_FIXADE for unaligned accesses
  2026-05-11 18:22 [PATCH v6 00/35] target/mips: add missing Octeon user-mode support James Hilliard
  2026-05-11 18:22 ` [PATCH v6 01/35] linux-user/mips: implement sysmips(MIPS_FLUSH_CACHE) James Hilliard
  2026-05-11 18:22 ` [PATCH v6 02/35] linux-user/mips: implement sysmips(MIPS_ATOMIC_SET) James Hilliard
@ 2026-05-11 18:22 ` James Hilliard
  2026-05-14 14:36   ` Philippe Mathieu-Daudé
  2026-05-11 18:22 ` [PATCH v6 04/35] target/mips: fix Octeon arithmetic destination handling James Hilliard
                   ` (31 subsequent siblings)
  34 siblings, 1 reply; 52+ messages in thread
From: James Hilliard @ 2026-05-11 18:22 UTC
  To: qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier,
	Philippe Mathieu-Daudé, Aurelien Jarno, Jiaxun Yang,
	Aleksandar Rikalo, Huacai Chen, James Hilliard, Richard Henderson

Linux/MIPS enables software fixups for user-mode unaligned scalar
accesses by default through MIPS_FIXADE/TIF_FIXADE.  QEMU linux-user did
not model that ABI, so MIPS guests took fatal AdEL/AdES exceptions unless
translation was forced to use unaligned host accesses.

Include the linux-user unaligned-access policy in the MIPS TB flags so
translated blocks are keyed on it, implement sysmips(MIPS_FIXADE) to
toggle that policy, and raise SIGBUS/BUS_ADRALN on AdEL/AdES when fixups
are disabled.
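Here is a rough model of the resulting policy. The model_* names are hypothetical stand-ins for CPUState::prctl_unalign_sigbus, TB_FLAG_MIPS_FIXADE and the memop choice in mips_tr_init_disas_context(). It sketches the logic only and is not QEMU code:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit value as TB_FLAG_MIPS_FIXADE in this patch. */
#define MODEL_TB_FLAG_MIPS_FIXADE 0x40000000u

typedef struct {
    bool unalign_sigbus;   /* models CPUState::prctl_unalign_sigbus */
} model_cpu;

/* sysmips(MIPS_FIXADE, arg): bit 0 enables fixups; bits above 1 are invalid. */
static long model_sysmips_fixade(model_cpu *cpu, long arg)
{
    if (arg & ~3L) {
        return -EINVAL;
    }
    cpu->unalign_sigbus = !(arg & 1);
    return 0;
}

/* Fold the policy into the TB flags used to look up translated blocks. */
static uint32_t model_tb_flags(const model_cpu *cpu, uint32_t hflags)
{
    return hflags | (cpu->unalign_sigbus ? 0u : MODEL_TB_FLAG_MIPS_FIXADE);
}

/* Translation side: with fixups enabled, memory ops permit unaligned access. */
static bool model_allows_unaligned(uint32_t tb_flags, bool isa_allows_unaligned)
{
    return isa_allows_unaligned || (tb_flags & MODEL_TB_FLAG_MIPS_FIXADE);
}
```

Because the policy is part of the TB flags, a block translated with MO_UNALN is never reused after the guest switches to SIGBUS mode, and the reverse holds as well.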

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
---
Changes v5 -> v6:
  - Rename the TB flag from TB_FLAG_UNALIGN to TB_FLAG_MIPS_FIXADE
    to match the MIPS_FIXADE ABI policy.

Changes v2 -> v3:
  - Split MIPS_FLUSH_CACHE and MIPS_ATOMIC_SET into preparatory sysmips
    patches.  (suggested by Richard Henderson)
---
 linux-user/mips/cpu_loop.c         | 5 +++++
 linux-user/mips/target_syscall.h   | 1 +
 linux-user/mips64/target_syscall.h | 1 +
 linux-user/syscall.c               | 8 ++++++++
 target/mips/cpu.c                  | 8 ++++++--
 target/mips/cpu.h                  | 4 ++++
 target/mips/tcg/translate.c        | 6 +++++-
 7 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/linux-user/mips/cpu_loop.c b/linux-user/mips/cpu_loop.c
index fa264b27ec..ff9d293c29 100644
--- a/linux-user/mips/cpu_loop.c
+++ b/linux-user/mips/cpu_loop.c
@@ -161,6 +161,11 @@ done_syscall:
         case EXCP_DSPDIS:
             force_sig(TARGET_SIGILL);
             break;
+        case EXCP_AdEL:
+        case EXCP_AdES:
+            force_sig_fault(TARGET_SIGBUS, TARGET_BUS_ADRALN,
+                            env->CP0_BadVAddr);
+            break;
         case EXCP_INTERRUPT:
             /* just indicate that signals should be handled asap */
             break;
diff --git a/linux-user/mips/target_syscall.h b/linux-user/mips/target_syscall.h
index 9206694f4f..be6942445a 100644
--- a/linux-user/mips/target_syscall.h
+++ b/linux-user/mips/target_syscall.h
@@ -11,6 +11,7 @@
 
 #define TARGET_FORCE_SHMLBA
 #define TARGET_SYSMIPS_FLUSH_CACHE     3
+#define TARGET_SYSMIPS_FIXADE          7
 #define TARGET_SYSMIPS_ATOMIC_SET   2001
 
 static inline abi_ulong target_shmlba(CPUMIPSState *env)
diff --git a/linux-user/mips64/target_syscall.h b/linux-user/mips64/target_syscall.h
index e07687f8ac..c11d0a0888 100644
--- a/linux-user/mips64/target_syscall.h
+++ b/linux-user/mips64/target_syscall.h
@@ -11,6 +11,7 @@
 
 #define TARGET_FORCE_SHMLBA
 #define TARGET_SYSMIPS_FLUSH_CACHE     3
+#define TARGET_SYSMIPS_FIXADE          7
 #define TARGET_SYSMIPS_ATOMIC_SET   2001
 
 static inline abi_ulong target_shmlba(CPUMIPSState *env)
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 3786a34041..551caa8e7c 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -6663,9 +6663,17 @@ static abi_long do_sysmips_atomic_set(CPUArchState *env, abi_ulong addr,
 static abi_long do_sysmips(CPUArchState *env, abi_long cmd, abi_long arg1,
                            abi_long arg2)
 {
+    CPUState *cs = env_cpu(env);
+
     switch (cmd) {
     case TARGET_SYSMIPS_ATOMIC_SET:
         return do_sysmips_atomic_set(env, arg1, arg2);
+    case TARGET_SYSMIPS_FIXADE:
+        if (arg1 & ~3) {
+            return -TARGET_EINVAL;
+        }
+        cs->prctl_unalign_sigbus = !(arg1 & 1);
+        return 0;
     case TARGET_SYSMIPS_FLUSH_CACHE:
         return 0;
     default:
diff --git a/target/mips/cpu.c b/target/mips/cpu.c
index f803d47763..6e827c72de 100644
--- a/target/mips/cpu.c
+++ b/target/mips/cpu.c
@@ -565,11 +565,15 @@ static int mips_cpu_mmu_index(CPUState *cs, bool ifunc)
 static TCGTBCPUState mips_get_tb_cpu_state(CPUState *cs)
 {
     CPUMIPSState *env = cpu_env(cs);
+    uint32_t flags = env->hflags & MIPS_HFLAG_TB_MASK;
+
+#ifdef CONFIG_USER_ONLY
+    flags |= TB_FLAG_MIPS_FIXADE * !cs->prctl_unalign_sigbus;
+#endif
 
     return (TCGTBCPUState){
         .pc = env->active_tc.PC,
-        .flags = env->hflags & (MIPS_HFLAG_TMASK | MIPS_HFLAG_BMASK |
-                                MIPS_HFLAG_HWRENA_ULR),
+        .flags = flags,
     };
 }
 
diff --git a/target/mips/cpu.h b/target/mips/cpu.h
index cbb9b3e1b1..b478f834c1 100644
--- a/target/mips/cpu.h
+++ b/target/mips/cpu.h
@@ -1161,6 +1161,10 @@ typedef struct CPUArchState {
 #define MIPS_HFLAG_ELPA  0x4000000
 #define MIPS_HFLAG_ITC_CACHE  0x8000000 /* CACHE instr. operates on ITC tag */
 #define MIPS_HFLAG_ERL   0x10000000 /* error level flag */
+#define MIPS_HFLAG_TB_MASK (MIPS_HFLAG_TMASK | MIPS_HFLAG_BMASK | \
+                            MIPS_HFLAG_HWRENA_ULR)
+
+#define TB_FLAG_MIPS_FIXADE  0x40000000
     target_ulong btarget;        /* Jump / branch target               */
     target_ulong bcond;          /* Branch condition (if needed)       */
 
diff --git a/target/mips/tcg/translate.c b/target/mips/tcg/translate.c
index 54ed253a7d..dac30aff8d 100644
--- a/target/mips/tcg/translate.c
+++ b/target/mips/tcg/translate.c
@@ -15070,6 +15070,7 @@ static void mips_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
 {
     DisasContext *ctx = container_of(dcbase, DisasContext, base);
     CPUMIPSState *env = cpu_env(cs);
+    uint32_t tb_flags = ctx->base.tb->flags;
 
     ctx->page_start = ctx->base.pc_first & TARGET_PAGE_MASK;
     ctx->saved_pc = -1;
@@ -15092,7 +15093,7 @@ static void mips_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
     ctx->CP0_LLAddr_shift = env->CP0_LLAddr_shift;
     ctx->cmgcr = (env->CP0_Config3 >> CP0C3_CMGCR) & 1;
     /* Restore delay slot state from the tb context.  */
-    ctx->hflags = (uint32_t)ctx->base.tb->flags; /* FIXME: maybe use 64 bits? */
+    ctx->hflags = tb_flags & MIPS_HFLAG_TB_MASK;
     ctx->ulri = (env->CP0_Config3 >> CP0C3_ULRI) & 1;
     ctx->ps = ((env->active_fpu.fcr0 >> FCR0_PS) & 1) ||
              (env->insn_flags & (INSN_LOONGSON2E | INSN_LOONGSON2F));
@@ -15112,6 +15113,9 @@ static void mips_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
     ctx->default_tcg_memop_mask = (!(ctx->insn_flags & ISA_NANOMIPS32) &&
                                   (ctx->insn_flags & (ISA_MIPS_R6 |
                                   INSN_LOONGSON3A))) ? MO_UNALN : MO_ALIGN;
+    if (tb_flags & TB_FLAG_MIPS_FIXADE) {
+        ctx->default_tcg_memop_mask = MO_UNALN;
+    }
 
     /*
      * Execute a branch and its delay slot as a single instruction.

-- 
2.54.0




* [PATCH v6 04/35] target/mips: fix Octeon arithmetic destination handling
  2026-05-11 18:22 [PATCH v6 00/35] target/mips: add missing Octeon user-mode support James Hilliard
                   ` (2 preceding siblings ...)
  2026-05-11 18:22 ` [PATCH v6 03/35] linux-user/mips, target/mips: honor MIPS_FIXADE for unaligned accesses James Hilliard
@ 2026-05-11 18:22 ` James Hilliard
  2026-05-14 10:30   ` Philippe Mathieu-Daudé
  2026-05-11 18:22 ` [PATCH v6 05/35] target/mips: split Octeon SEQ/SNE decode James Hilliard
                   ` (30 subsequent siblings)
  34 siblings, 1 reply; 52+ messages in thread
From: James Hilliard @ 2026-05-11 18:22 UTC
  To: qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier,
	Philippe Mathieu-Daudé, Aurelien Jarno, Jiaxun Yang,
	Aleksandar Rikalo, Huacai Chen, James Hilliard, Richard Henderson

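BADDU and DMUL write their results to rd, not rt, yet their translators
skipped the instruction whenever rt == $zero.  Drop that check and route
the write through gen_store_gpr() so a write to rd == $zero is discarded
consistently.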
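The following is a small hypothetical model of the intended semantics, not QEMU code. Its store helper drops writes to register 0, as gen_store_gpr() does. Under the old rt == $zero check, the BADDU rd, rs, $zero case exercised below would have been skipped entirely:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-entry register file; index 0 models $zero. */
typedef struct {
    uint64_t gpr[32];
} model_regs;

/* Like gen_store_gpr(): writes to $zero are silently discarded. */
static void model_store_gpr(model_regs *r, unsigned reg, uint64_t val)
{
    assert(reg < 32);
    if (reg != 0) {
        r->gpr[reg] = val;
    }
}

/* BADDU rd, rs, rt: low 8 bits of rs + rt, written to rd. */
static void model_baddu(model_regs *r, unsigned rd, unsigned rs, unsigned rt)
{
    model_store_gpr(r, rd, (r->gpr[rs] + r->gpr[rt]) & 0xff);
}

/* DMUL rd, rs, rt: low 64 bits of rs * rt, written to rd. */
static void model_dmul(model_regs *r, unsigned rd, unsigned rs, unsigned rt)
{
    model_store_gpr(r, rd, r->gpr[rs] * r->gpr[rt]);
}
```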

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
---
Changes v1 -> v2:
  - Split the BADDU/DMUL destination handling fix out of the Octeon
    arithmetic instruction patch.  (suggested by Philippe Mathieu-Daudé)

Changes v2 -> v3:
  - Remove the rd == $zero fast paths and let gen_store_gpr() discard
    writes to $zero.  (suggested by Richard Henderson)
---
 target/mips/tcg/octeon_translate.c | 16 ++++------------
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/target/mips/tcg/octeon_translate.c b/target/mips/tcg/octeon_translate.c
index e1f52d444a..4dd7626835 100644
--- a/target/mips/tcg/octeon_translate.c
+++ b/target/mips/tcg/octeon_translate.c
@@ -45,18 +45,14 @@ static bool trans_BADDU(DisasContext *ctx, arg_BADDU *a)
 {
     TCGv_i64 t0, t1;
 
-    if (a->rt == 0) {
-        /* nop */
-        return true;
-    }
-
     t0 = tcg_temp_new_i64();
     t1 = tcg_temp_new_i64();
     gen_load_gpr(t0, a->rs);
     gen_load_gpr(t1, a->rt);
 
     tcg_gen_add_i64(t0, t0, t1);
-    tcg_gen_andi_i64(cpu_gpr[a->rd], t0, 0xff);
+    tcg_gen_andi_i64(t0, t0, 0xff);
+    gen_store_gpr(t0, a->rd);
     return true;
 }
 
@@ -64,17 +60,13 @@ static bool trans_DMUL(DisasContext *ctx, arg_DMUL *a)
 {
     TCGv_i64 t0, t1;
 
-    if (a->rt == 0) {
-        /* nop */
-        return true;
-    }
-
     t0 = tcg_temp_new_i64();
     t1 = tcg_temp_new_i64();
     gen_load_gpr(t0, a->rs);
     gen_load_gpr(t1, a->rt);
 
-    tcg_gen_mul_i64(cpu_gpr[a->rd], t0, t1);
+    tcg_gen_mul_i64(t0, t0, t1);
+    gen_store_gpr(t0, a->rd);
     return true;
 }
 

-- 
2.54.0




* [PATCH v6 05/35] target/mips: split Octeon SEQ/SNE decode
  2026-05-11 18:22 [PATCH v6 00/35] target/mips: add missing Octeon user-mode support James Hilliard
                   ` (3 preceding siblings ...)
  2026-05-11 18:22 ` [PATCH v6 04/35] target/mips: fix Octeon arithmetic destination handling James Hilliard
@ 2026-05-11 18:22 ` James Hilliard
  2026-05-11 18:22 ` [PATCH v6 06/35] target/mips: drop Octeon zero-register fast paths James Hilliard
                   ` (29 subsequent siblings)
  34 siblings, 0 replies; 52+ messages in thread
From: James Hilliard @ 2026-05-11 18:22 UTC
  To: qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier,
	Philippe Mathieu-Daudé, Aurelien Jarno, Jiaxun Yang,
	Aleksandar Rikalo, Huacai Chen, James Hilliard, Richard Henderson

Decode the equality and inequality forms as explicit SEQ/SNE and
SEQI/SNEI instructions rather than using shared generated SEQNE/SEQNEI
entries.

The explicit decoder names match the architectural mnemonics, which makes
the translator entry points and trace/debug output easier to correlate
with the instruction set.
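For reference, here is a hypothetical sketch of what the four decoded forms compute. It assumes the imm:s10 field is sign-extended to 64 bits before the comparison, as happens when the translator passes a->imm to tcg_gen_setcondi_i64():

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the 10-bit immediate field (imm:s10 in octeon.decode). */
static int64_t model_sext10(uint32_t field)
{
    assert(field < (1u << 10));
    return (field & 0x200) ? (int64_t)field - 0x400 : (int64_t)field;
}

/* SEQ/SNE rd, rs, rt: 1 if rs == rt (resp. rs != rt), else 0. */
static uint64_t model_seq(uint64_t rs, uint64_t rt) { return rs == rt; }
static uint64_t model_sne(uint64_t rs, uint64_t rt) { return rs != rt; }

/* SEQI/SNEI rt, rs, imm: compare rs with the sign-extended immediate. */
static uint64_t model_seqi(uint64_t rs, uint32_t imm10)
{
    return rs == (uint64_t)model_sext10(imm10);
}

static uint64_t model_snei(uint64_t rs, uint32_t imm10)
{
    return rs != (uint64_t)model_sext10(imm10);
}
```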

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
---
Changes v1 -> v2:
  - Split the SEQ/SNE decode cleanup out of the Octeon arithmetic
    instruction patch.  (suggested by Philippe Mathieu-Daudé)

Changes v2 -> v3:
  - Remove the decoded ne field now that the instructions are split.
  - Reuse @r3 for SEQ/SNE and pass the TCG condition into a shared
    translator helper.  (suggested by Richard Henderson)
---
 target/mips/tcg/octeon.decode      |  7 +++--
 target/mips/tcg/octeon_translate.c | 52 ++++++++++++++++++++------------------
 2 files changed, 32 insertions(+), 27 deletions(-)

diff --git a/target/mips/tcg/octeon.decode b/target/mips/tcg/octeon.decode
index 102a05860d..a2bfd0751d 100644
--- a/target/mips/tcg/octeon.decode
+++ b/target/mips/tcg/octeon.decode
@@ -30,6 +30,7 @@ BBIT         11 set:1 . 10 rs:5 ..... offset:s16 p=%bbit_p
 # SNEI rt, rs, immediate
 
 @r3          ...... rs:5 rt:5 rd:5 ..... ......
+&cmpi        rs rt imm
 %bitfield_p  0:1 6:5
 @bitfield    ...... rs:5 rt:5 lenm1:5 ..... ..... . p=%bitfield_p
 
@@ -38,8 +39,10 @@ DMUL         011100 ..... ..... ..... 00000 000011 @r3
 EXTS         011100 ..... ..... ..... ..... 11101 . @bitfield
 CINS         011100 ..... ..... ..... ..... 11001 . @bitfield
 POP          011100 rs:5 00000 rd:5 00000 10110 dw:1
-SEQNE        011100 rs:5 rt:5 rd:5 00000 10101 ne:1
-SEQNEI       011100 rs:5 rt:5 imm:s10 10111 ne:1
+SEQ          011100 ..... ..... ..... 00000 101010 @r3
+SNE          011100 ..... ..... ..... 00000 101011 @r3
+SEQI         011100 rs:5 rt:5 imm:s10 101110 &cmpi
+SNEI         011100 rs:5 rt:5 imm:s10 101111 &cmpi
 
 &lx          base index rd
 @lx          ...... base:5 index:5 rd:5 ...... ..... &lx
diff --git a/target/mips/tcg/octeon_translate.c b/target/mips/tcg/octeon_translate.c
index 4dd7626835..8e49e16b5a 100644
--- a/target/mips/tcg/octeon_translate.c
+++ b/target/mips/tcg/octeon_translate.c
@@ -121,52 +121,54 @@ static bool trans_POP(DisasContext *ctx, arg_POP *a)
     return true;
 }
 
-static bool trans_SEQNE(DisasContext *ctx, arg_SEQNE *a)
+static bool do_seq_sne(DisasContext *ctx, const arg_decode_ext_octeon1 *a,
+                       TCGCond cond)
 {
     TCGv_i64 t0, t1;
 
-    if (a->rd == 0) {
-        /* nop */
-        return true;
-    }
-
     t0 = tcg_temp_new_i64();
     t1 = tcg_temp_new_i64();
 
     gen_load_gpr(t0, a->rs);
     gen_load_gpr(t1, a->rt);
 
-    if (a->ne) {
-        tcg_gen_setcond_i64(TCG_COND_NE, cpu_gpr[a->rd], t1, t0);
-    } else {
-        tcg_gen_setcond_i64(TCG_COND_EQ, cpu_gpr[a->rd], t1, t0);
-    }
+    tcg_gen_setcond_i64(cond, t0, t1, t0);
+    gen_store_gpr(t0, a->rd);
     return true;
 }
 
-static bool trans_SEQNEI(DisasContext *ctx, arg_SEQNEI *a)
+static bool trans_SEQ(DisasContext *ctx, arg_SEQ *a)
 {
-    TCGv_i64 t0;
+    return do_seq_sne(ctx, a, TCG_COND_EQ);
+}
 
-    if (a->rt == 0) {
-        /* nop */
-        return true;
-    }
+static bool trans_SNE(DisasContext *ctx, arg_SNE *a)
+{
+    return do_seq_sne(ctx, a, TCG_COND_NE);
+}
 
-    t0 = tcg_temp_new_i64();
+static bool do_seqi_snei(DisasContext *ctx, const arg_cmpi *a, TCGCond cond)
+{
+    TCGv_i64 t0;
 
+    t0 = tcg_temp_new_i64();
     gen_load_gpr(t0, a->rs);
 
-    /* Sign-extend to 64 bit value */
-    target_ulong imm = a->imm;
-    if (a->ne) {
-        tcg_gen_setcondi_i64(TCG_COND_NE, cpu_gpr[a->rt], t0, imm);
-    } else {
-        tcg_gen_setcondi_i64(TCG_COND_EQ, cpu_gpr[a->rt], t0, imm);
-    }
+    tcg_gen_setcondi_i64(cond, t0, t0, a->imm);
+    gen_store_gpr(t0, a->rt);
     return true;
 }
 
+static bool trans_SEQI(DisasContext *ctx, arg_SEQI *a)
+{
+    return do_seqi_snei(ctx, a, TCG_COND_EQ);
+}
+
+static bool trans_SNEI(DisasContext *ctx, arg_SNEI *a)
+{
+    return do_seqi_snei(ctx, a, TCG_COND_NE);
+}
+
 static bool trans_lx(DisasContext *ctx, arg_lx *a, MemOp mop)
 {
     gen_lx(ctx, a->rd, a->base, a->index, mop);

-- 
2.54.0




* [PATCH v6 06/35] target/mips: drop Octeon zero-register fast paths
  2026-05-11 18:22 [PATCH v6 00/35] target/mips: add missing Octeon user-mode support James Hilliard
                   ` (4 preceding siblings ...)
  2026-05-11 18:22 ` [PATCH v6 05/35] target/mips: split Octeon SEQ/SNE decode James Hilliard
@ 2026-05-11 18:22 ` James Hilliard
  2026-05-14 10:31   ` Philippe Mathieu-Daudé
  2026-05-11 18:22 ` [PATCH v6 07/35] target/mips: add Octeon multiplier state James Hilliard
                   ` (28 subsequent siblings)
  34 siblings, 1 reply; 52+ messages in thread
From: James Hilliard @ 2026-05-11 18:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier,
	Philippe Mathieu-Daudé, Aurelien Jarno, Jiaxun Yang,
	Aleksandar Rikalo, Huacai Chen, James Hilliard, Richard Henderson

EXTS, CINS, and POP route their destination writes through
gen_store_gpr(), which already discards writes to $zero. Remove the
remaining translator fast paths for destination $zero so these Octeon
instructions follow the same shape as BADDU/DMUL and the generic MIPS
translator helpers.
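
As a host-side illustration of why the fast path is redundant, here is a minimal C sketch that assumes a plain 32-entry register array. `GprFile`, `load_gpr()`, `store_gpr()`, `sextract64()` and `exts()` are hypothetical names, not QEMU APIs. The store helper drops writes to register 0, so an EXTS-style operation with rt == 0 needs no special case.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the MIPS GPR file (not QEMU code). */
typedef struct {
    uint64_t gpr[32];
} GprFile;

/* Reads of $zero always return 0. */
static uint64_t load_gpr(const GprFile *f, unsigned reg)
{
    return reg == 0 ? 0 : f->gpr[reg];
}

/* Like gen_store_gpr(): writes to $zero are silently discarded. */
static void store_gpr(GprFile *f, unsigned reg, uint64_t value)
{
    if (reg != 0) {
        f->gpr[reg] = value;
    }
}

/* Sign-extending extract of len bits starting at bit pos. */
static uint64_t sextract64(uint64_t v, unsigned pos, unsigned len)
{
    assert(len >= 1 && pos + len <= 64);
    /*
     * Move the field to the top, then arithmetic-shift it back down.
     * Relies on the two's-complement conversions GCC and Clang provide.
     */
    return (uint64_t)((int64_t)(v << (64 - pos - len)) >> (64 - len));
}

/* EXTS-style operation with no rt == 0 fast path. */
static void exts(GprFile *f, unsigned rt, unsigned rs,
                 unsigned p, unsigned lenm1)
{
    store_gpr(f, rt, sextract64(load_gpr(f, rs), p, lenm1 + 1));
}
```

In this model the extract is still computed for rt == 0 and its result is simply dropped; the patch relies on gen_store_gpr() for the same effect.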

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
---
Changes v2 -> v3:
  - Remove the remaining destination $zero fast paths and let
    gen_store_gpr() discard writes to $zero.  (suggested by Richard
    Henderson)
---
 target/mips/tcg/octeon_translate.c | 15 ---------------
 1 file changed, 15 deletions(-)

diff --git a/target/mips/tcg/octeon_translate.c b/target/mips/tcg/octeon_translate.c
index 8e49e16b5a..5497ddfb10 100644
--- a/target/mips/tcg/octeon_translate.c
+++ b/target/mips/tcg/octeon_translate.c
@@ -74,11 +74,6 @@ static bool trans_EXTS(DisasContext *ctx, arg_EXTS *a)
 {
     TCGv_i64 t0;
 
-    if (a->rt == 0) {
-        /* nop */
-        return true;
-    }
-
     t0 = tcg_temp_new_i64();
     gen_load_gpr(t0, a->rs);
     tcg_gen_sextract_i64(t0, t0, a->p, a->lenm1 + 1);
@@ -90,11 +85,6 @@ static bool trans_CINS(DisasContext *ctx, arg_CINS *a)
 {
     TCGv_i64 t0;
 
-    if (a->rt == 0) {
-        /* nop */
-        return true;
-    }
-
     t0 = tcg_temp_new_i64();
     gen_load_gpr(t0, a->rs);
     tcg_gen_deposit_z_i64(t0, t0, a->p, a->lenm1 + 1);
@@ -106,11 +96,6 @@ static bool trans_POP(DisasContext *ctx, arg_POP *a)
 {
     TCGv_i64 t0;
 
-    if (a->rd == 0) {
-        /* nop */
-        return true;
-    }
-
     t0 = tcg_temp_new_i64();
     gen_load_gpr(t0, a->rs);
     if (!a->dw) {

-- 
2.54.0




* [PATCH v6 07/35] target/mips: add Octeon multiplier state
  2026-05-11 18:22 [PATCH v6 00/35] target/mips: add missing Octeon user-mode support James Hilliard
                   ` (5 preceding siblings ...)
  2026-05-11 18:22 ` [PATCH v6 06/35] target/mips: drop Octeon zero-register fast paths James Hilliard
@ 2026-05-11 18:22 ` James Hilliard
  2026-05-11 18:31   ` Richard Henderson
  2026-05-14 10:27   ` Philippe Mathieu-Daudé
  2026-05-11 18:22 ` [PATCH v6 08/35] target/mips: add Octeon LBX instruction James Hilliard
                   ` (27 subsequent siblings)
  34 siblings, 2 replies; 52+ messages in thread
From: James Hilliard @ 2026-05-11 18:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier,
	Philippe Mathieu-Daudé, Aurelien Jarno, Jiaxun Yang,
	Aleksandar Rikalo, Huacai Chen, James Hilliard

Add per-thread Octeon multiplier state for the MPL and P limb banks used
by the VMULU/VMM0/V3MULU instruction family.

Octeon3 extends the older MPL0-MPL2/P0-P2 state with high lanes
MPL3-MPL5/P3-P5, programmed by the two-source MTM/MTP forms. Represent
both banks as uint64_t arrays so the TC state matches the architected
64-bit limb layout used by Octeon68XX user-mode code.

Migrate the multiplier registers in an Octeon-only subsection so
non-Octeon CPU models do not grow migration state.
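
The lane numbering can be pictured with a small standalone C sketch. The struct mirrors the `octeon` member added to `TCState` below, and the rs/rt-to-lane mapping follows the comment added in cpu.h; `OcteonMulState` and `octeon_write_pair()` are hypothetical illustration names.

```c
#include <assert.h>
#include <stdint.h>

#define OCTEON_MULTIPLIER_LANES 3
#define OCTEON_MULTIPLIER_REGS (2 * OCTEON_MULTIPLIER_LANES)

/* Illustrative copy of the per-TC layout: lanes 0..2 low, 3..5 high. */
typedef struct {
    uint64_t MPL[OCTEON_MULTIPLIER_REGS];
    uint64_t P[OCTEON_MULTIPLIER_REGS];
} OcteonMulState;

/*
 * Two-source form for lane n (0..2): the rs value goes to bank[n] and
 * the rt value to the Octeon3 high lane bank[n + 3].
 */
static void octeon_write_pair(uint64_t bank[OCTEON_MULTIPLIER_REGS],
                              unsigned n, uint64_t rs_val, uint64_t rt_val)
{
    assert(n < OCTEON_MULTIPLIER_LANES);
    bank[n] = rs_val;
    bank[n + OCTEON_MULTIPLIER_LANES] = rt_val;
}
```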

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
---
Changes v2 -> v3:
  - Split the multiplier state out of the combined Octeon arithmetic and
    memory instruction patch.  (requested by Richard Henderson)

Changes v3 -> v4:
  - Document and keep the Octeon3 MPL3-MPL5/P3-P5 high-lane state used by
    the two-source MTM/MTP forms.
---
 target/mips/cpu.h            | 12 ++++++++++++
 target/mips/system/machine.c | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+)

diff --git a/target/mips/cpu.h b/target/mips/cpu.h
index b478f834c1..346713705a 100644
--- a/target/mips/cpu.h
+++ b/target/mips/cpu.h
@@ -459,6 +459,14 @@ typedef struct mips_def_t mips_def_t;
 
 
 typedef struct TCState TCState;
+
+/*
+ * Octeon3 adds a second bank of multiplier/product limbs used by the
+ * two-source MTM/MTP forms: MPL0..2/P0..2 from rs and MPL3..5/P3..5 from rt.
+ */
+#define OCTEON_MULTIPLIER_LANES 3
+#define OCTEON_MULTIPLIER_REGS (2 * OCTEON_MULTIPLIER_LANES)
+
 struct TCState {
     target_ulong gpr[32];
 #if defined(TARGET_MIPS64)
@@ -497,6 +505,10 @@ struct TCState {
     target_ulong CP0_TCScheFBack;
     int32_t CP0_Debug_tcstatus;
     target_ulong CP0_UserLocal;
+    struct {
+        uint64_t MPL[OCTEON_MULTIPLIER_REGS];
+        uint64_t P[OCTEON_MULTIPLIER_REGS];
+    } octeon;
 
     int32_t msacsr;
 
diff --git a/target/mips/system/machine.c b/target/mips/system/machine.c
index 5880b401b0..f988b3695b 100644
--- a/target/mips/system/machine.c
+++ b/target/mips/system/machine.c
@@ -120,6 +120,17 @@ static const VMStateDescription vmstate_inactive_tc = {
     .fields = vmstate_tc_fields
 };
 
+static const VMStateDescription vmstate_octeon_multiplier_tc = {
+    .name = "cpu/tc/octeon_multiplier",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (const VMStateField[]) {
+        VMSTATE_UINT64_ARRAY(octeon.MPL, TCState, OCTEON_MULTIPLIER_REGS),
+        VMSTATE_UINT64_ARRAY(octeon.P, TCState, OCTEON_MULTIPLIER_REGS),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 /* MVP state */
 
 static const VMStateDescription vmstate_mvp = {
@@ -247,6 +258,27 @@ static const VMStateDescription mips_vmstate_timer = {
     }
 };
 
+static bool mips_octeon_needed(void *opaque)
+{
+    MIPSCPU *cpu = opaque;
+
+    return cpu->env.insn_flags & INSN_OCTEON;
+}
+
+static const VMStateDescription mips_vmstate_octeon_multiplier = {
+    .name = "cpu/octeon_multiplier",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = mips_octeon_needed,
+    .fields = (const VMStateField[]) {
+        VMSTATE_STRUCT(env.active_tc, MIPSCPU, 1,
+                       vmstate_octeon_multiplier_tc, TCState),
+        VMSTATE_STRUCT_ARRAY(env.tcs, MIPSCPU, MIPS_SHADOW_SET_MAX, 1,
+                             vmstate_octeon_multiplier_tc, TCState),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 const VMStateDescription vmstate_mips_cpu = {
     .name = "cpu",
     .version_id = 21,
@@ -363,6 +395,7 @@ const VMStateDescription vmstate_mips_cpu = {
     },
     .subsections = (const VMStateDescription * const []) {
         &mips_vmstate_timer,
+        &mips_vmstate_octeon_multiplier,
         NULL
     }
 };

-- 
2.54.0




* [PATCH v6 08/35] target/mips: add Octeon LBX instruction
  2026-05-11 18:22 [PATCH v6 00/35] target/mips: add missing Octeon user-mode support James Hilliard
                   ` (6 preceding siblings ...)
  2026-05-11 18:22 ` [PATCH v6 07/35] target/mips: add Octeon multiplier state James Hilliard
@ 2026-05-11 18:22 ` James Hilliard
  2026-05-14  9:49   ` Philippe Mathieu-Daudé
  2026-05-11 18:22 ` [PATCH v6 09/35] target/mips: add Octeon LHUX instruction James Hilliard
                   ` (26 subsequent siblings)
  34 siblings, 1 reply; 52+ messages in thread
From: James Hilliard @ 2026-05-11 18:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier,
	Philippe Mathieu-Daudé, Aurelien Jarno, Jiaxun Yang,
	Aleksandar Rikalo, Huacai Chen, James Hilliard, Richard Henderson

LBX performs an indexed signed byte load from base + index and writes the
sign-extended result to rd.

Wire the existing indexed-load helper to MO_SB so Octeon user-mode
binaries can use the signed byte variant alongside the existing LBUX
path.
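
To make the MO_SB versus MO_UB difference concrete, the following is a minimal host-side C sketch. It assumes a flat byte buffer `mem` standing in for guest memory, with base + index used directly as an offset; `model_lbx()` and `model_lbux()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* LBX (MO_SB): load one byte from base + index, sign-extend to 64 bits. */
static uint64_t model_lbx(const uint8_t *mem, size_t mem_len,
                          uint64_t base, uint64_t index)
{
    uint64_t off = base + index;
    int8_t byte;

    assert(off < mem_len);          /* stay inside the modeled buffer */
    memcpy(&byte, mem + off, sizeof(byte));
    return (uint64_t)(int64_t)byte;
}

/* LBUX (MO_UB): same address, but the byte is zero-extended. */
static uint64_t model_lbux(const uint8_t *mem, size_t mem_len,
                           uint64_t base, uint64_t index)
{
    uint64_t off = base + index;

    assert(off < mem_len);
    return mem[off];
}
```

For a byte value of 0x80, LBX yields 0xffffffffffffff80 while LBUX yields 0x80.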

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
---
Changes v2 -> v3:
  - Split LBX out of the combined Octeon arithmetic and memory
    instruction patch.  (requested by Richard Henderson)
---
 target/mips/tcg/octeon.decode      | 1 +
 target/mips/tcg/octeon_translate.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/target/mips/tcg/octeon.decode b/target/mips/tcg/octeon.decode
index a2bfd0751d..efb1a48b38 100644
--- a/target/mips/tcg/octeon.decode
+++ b/target/mips/tcg/octeon.decode
@@ -49,4 +49,5 @@ SNEI         011100 rs:5 rt:5 imm:s10 101111 &cmpi
 LWX          011111 ..... ..... ..... 00000 001010 @lx
 LHX          011111 ..... ..... ..... 00100 001010 @lx
 LBUX         011111 ..... ..... ..... 00110 001010 @lx
+LBX          011111 ..... ..... ..... 10110 001010 @lx
 LDX          011111 ..... ..... ..... 01000 001010 @lx
diff --git a/target/mips/tcg/octeon_translate.c b/target/mips/tcg/octeon_translate.c
index 5497ddfb10..451737cda1 100644
--- a/target/mips/tcg/octeon_translate.c
+++ b/target/mips/tcg/octeon_translate.c
@@ -161,6 +161,7 @@ static bool trans_lx(DisasContext *ctx, arg_lx *a, MemOp mop)
     return true;
 }
 
+TRANS(LBX,  trans_lx, MO_SB);
 TRANS(LBUX, trans_lx, MO_UB);
 TRANS(LHX,  trans_lx, MO_SW);
 TRANS(LWX,  trans_lx, MO_SL);

-- 
2.54.0




* [PATCH v6 09/35] target/mips: add Octeon LHUX instruction
  2026-05-11 18:22 [PATCH v6 00/35] target/mips: add missing Octeon user-mode support James Hilliard
                   ` (7 preceding siblings ...)
  2026-05-11 18:22 ` [PATCH v6 08/35] target/mips: add Octeon LBX instruction James Hilliard
@ 2026-05-11 18:22 ` James Hilliard
  2026-05-14  9:50   ` Philippe Mathieu-Daudé
  2026-05-11 18:22 ` [PATCH v6 10/35] target/mips: add Octeon LWUX instruction James Hilliard
                   ` (25 subsequent siblings)
  34 siblings, 1 reply; 52+ messages in thread
From: James Hilliard @ 2026-05-11 18:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier,
	Philippe Mathieu-Daudé, Aurelien Jarno, Jiaxun Yang,
	Aleksandar Rikalo, Huacai Chen, James Hilliard, Richard Henderson

LHUX performs an indexed unsigned halfword load from base + index and
zero-extends the result into rd.

Add the decode entry and reuse the common indexed-load translator with
MO_UW.
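
A rough host-side sketch of the halfword case follows, contrasting LHUX (MO_UW) with LHX (MO_SW). It assumes a flat byte buffer for guest memory and takes the byte order as a parameter, since mo_endian(ctx) selects it in the translator. `model_lhux()` and `model_lhx()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* LHUX (MO_UW): 16-bit load from base + index, zero-extended. */
static uint64_t model_lhux(const uint8_t *mem, size_t mem_len,
                           uint64_t base, uint64_t index, bool big_endian)
{
    uint64_t off = base + index;
    uint16_t v;

    assert(off < mem_len && mem_len - off >= 2);
    v = big_endian ? (uint16_t)(mem[off] << 8 | mem[off + 1])
                   : (uint16_t)(mem[off + 1] << 8 | mem[off]);
    return v;   /* bits 63..16 stay clear */
}

/* LHX (MO_SW) for contrast: the same halfword, sign-extended. */
static uint64_t model_lhx(const uint8_t *mem, size_t mem_len,
                          uint64_t base, uint64_t index, bool big_endian)
{
    uint64_t v = model_lhux(mem, mem_len, base, index, big_endian);

    return (v ^ 0x8000) - 0x8000;   /* portable 16-bit sign extension */
}
```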

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
---
Changes v2 -> v3:
  - Split LHUX out of the combined Octeon arithmetic and memory
    instruction patch.  (requested by Richard Henderson)
---
 target/mips/tcg/octeon.decode      | 1 +
 target/mips/tcg/octeon_translate.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/target/mips/tcg/octeon.decode b/target/mips/tcg/octeon.decode
index efb1a48b38..8a755075e8 100644
--- a/target/mips/tcg/octeon.decode
+++ b/target/mips/tcg/octeon.decode
@@ -48,6 +48,7 @@ SNEI         011100 rs:5 rt:5 imm:s10 101111 &cmpi
 @lx          ...... base:5 index:5 rd:5 ...... ..... &lx
 LWX          011111 ..... ..... ..... 00000 001010 @lx
 LHX          011111 ..... ..... ..... 00100 001010 @lx
+LHUX         011111 ..... ..... ..... 10100 001010 @lx
 LBUX         011111 ..... ..... ..... 00110 001010 @lx
 LBX          011111 ..... ..... ..... 10110 001010 @lx
 LDX          011111 ..... ..... ..... 01000 001010 @lx
diff --git a/target/mips/tcg/octeon_translate.c b/target/mips/tcg/octeon_translate.c
index 451737cda1..f897b42807 100644
--- a/target/mips/tcg/octeon_translate.c
+++ b/target/mips/tcg/octeon_translate.c
@@ -164,5 +164,6 @@ static bool trans_lx(DisasContext *ctx, arg_lx *a, MemOp mop)
 TRANS(LBX,  trans_lx, MO_SB);
 TRANS(LBUX, trans_lx, MO_UB);
 TRANS(LHX,  trans_lx, MO_SW);
+TRANS(LHUX, trans_lx, MO_UW);
 TRANS(LWX,  trans_lx, MO_SL);
 TRANS(LDX,  trans_lx, MO_UQ);

-- 
2.54.0




* [PATCH v6 10/35] target/mips: add Octeon LWUX instruction
  2026-05-11 18:22 [PATCH v6 00/35] target/mips: add missing Octeon user-mode support James Hilliard
                   ` (8 preceding siblings ...)
  2026-05-11 18:22 ` [PATCH v6 09/35] target/mips: add Octeon LHUX instruction James Hilliard
@ 2026-05-11 18:22 ` James Hilliard
  2026-05-14  9:50   ` Philippe Mathieu-Daudé
  2026-05-11 18:22 ` [PATCH v6 11/35] target/mips: add Octeon SAA instruction James Hilliard
                   ` (24 subsequent siblings)
  34 siblings, 1 reply; 52+ messages in thread
From: James Hilliard @ 2026-05-11 18:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier,
	Philippe Mathieu-Daudé, Aurelien Jarno, Jiaxun Yang,
	Aleksandar Rikalo, Huacai Chen, James Hilliard, Richard Henderson

LWUX performs an indexed unsigned word load from base + index and
zero-extends the result into rd.

Add the decode entry and route it through the common indexed-load
translator with MO_UL.
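
All of the indexed loads share opcode 011111 and function 001010 and differ only in the selector in bits 10..6. As a rough illustration, a standalone C decoder for the family might look like the sketch below. The table is transcribed from the octeon.decode patterns and TRANS() MemOps in this series; `LxOp`, `decode_lx()` and `lx_extend()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one indexed-load variant. */
typedef struct {
    const char *name;
    unsigned size;    /* access size in bytes */
    bool is_signed;   /* sign-extend (true) or zero-extend (false) */
} LxOp;

/*
 * Check opcode 011111 / function 001010, then look up the selector in
 * bits 10..6.  Returns NULL for anything not listed in octeon.decode.
 */
static const LxOp *decode_lx(uint32_t insn)
{
    static const struct {
        uint32_t sel;
        LxOp op;
    } table[] = {
        { 0x00, { "LWX",  4, true  } },   /* 00000, MO_SL */
        { 0x04, { "LHX",  2, true  } },   /* 00100, MO_SW */
        { 0x14, { "LHUX", 2, false } },   /* 10100, MO_UW */
        { 0x06, { "LBUX", 1, false } },   /* 00110, MO_UB */
        { 0x10, { "LWUX", 4, false } },   /* 10000, MO_UL */
        { 0x16, { "LBX",  1, true  } },   /* 10110, MO_SB */
        { 0x08, { "LDX",  8, false } },   /* 01000, MO_UQ */
    };

    if ((insn >> 26) != 0x1f || (insn & 0x3f) != 0x0a) {
        return NULL;
    }
    uint32_t sel = (insn >> 6) & 0x1f;
    for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (table[i].sel == sel) {
            return &table[i].op;
        }
    }
    return NULL;
}

/* Apply the variant's sign or zero extension to a raw loaded value. */
static uint64_t lx_extend(const LxOp *op, uint64_t raw)
{
    unsigned bits = op->size * 8;

    assert(op->size == 1 || op->size == 2 || op->size == 4 || op->size == 8);
    if (bits == 64) {
        return raw;
    }
    raw &= (UINT64_C(1) << bits) - 1;
    if (op->is_signed) {
        uint64_t sign = UINT64_C(1) << (bits - 1);
        raw = (raw ^ sign) - sign;
    }
    return raw;
}
```

With the same raw word 0x80000000, LWUX keeps 0x80000000 while LWX produces 0xffffffff80000000.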

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
---
Changes v2 -> v3:
  - Split LWUX out of the combined Octeon arithmetic and memory
    instruction patch.  (requested by Richard Henderson)
---
 target/mips/tcg/octeon.decode      | 1 +
 target/mips/tcg/octeon_translate.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/target/mips/tcg/octeon.decode b/target/mips/tcg/octeon.decode
index 8a755075e8..db7d5f55f0 100644
--- a/target/mips/tcg/octeon.decode
+++ b/target/mips/tcg/octeon.decode
@@ -50,5 +50,6 @@ LWX          011111 ..... ..... ..... 00000 001010 @lx
 LHX          011111 ..... ..... ..... 00100 001010 @lx
 LHUX         011111 ..... ..... ..... 10100 001010 @lx
 LBUX         011111 ..... ..... ..... 00110 001010 @lx
+LWUX         011111 ..... ..... ..... 10000 001010 @lx
 LBX          011111 ..... ..... ..... 10110 001010 @lx
 LDX          011111 ..... ..... ..... 01000 001010 @lx
diff --git a/target/mips/tcg/octeon_translate.c b/target/mips/tcg/octeon_translate.c
index f897b42807..401c4bd14b 100644
--- a/target/mips/tcg/octeon_translate.c
+++ b/target/mips/tcg/octeon_translate.c
@@ -166,4 +166,5 @@ TRANS(LBUX, trans_lx, MO_UB);
 TRANS(LHX,  trans_lx, MO_SW);
 TRANS(LHUX, trans_lx, MO_UW);
 TRANS(LWX,  trans_lx, MO_SL);
+TRANS(LWUX, trans_lx, MO_UL);
 TRANS(LDX,  trans_lx, MO_UQ);

-- 
2.54.0




* [PATCH v6 11/35] target/mips: add Octeon SAA instruction
  2026-05-11 18:22 [PATCH v6 00/35] target/mips: add missing Octeon user-mode support James Hilliard
                   ` (9 preceding siblings ...)
  2026-05-11 18:22 ` [PATCH v6 10/35] target/mips: add Octeon LWUX instruction James Hilliard
@ 2026-05-11 18:22 ` James Hilliard
  2026-05-14 10:08   ` Philippe Mathieu-Daudé
  2026-05-11 18:22 ` [PATCH v6 12/35] target/mips: add Octeon SAAD instruction James Hilliard
                   ` (23 subsequent siblings)
  34 siblings, 1 reply; 52+ messages in thread
From: James Hilliard @ 2026-05-11 18:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier,
	Philippe Mathieu-Daudé, Aurelien Jarno, Jiaxun Yang,
	Aleksandar Rikalo, Huacai Chen, James Hilliard, Richard Henderson

SAA atomically adds rt to the naturally aligned 32-bit word at base and
discards the old memory value.

Implement the common SAA/SAAD translator with tcg_gen_atomic_fetch_add_i64().
The MemOp selects the word or doubleword transaction size.  QEMU only has
one Octeon CPU model today, so keep SAA/SAAD under the existing Octeon
instruction feature bucket instead of adding a finer-grained Octeon+
feature bit.
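
For readers less familiar with the TCG atomic helpers, the user-visible effect of SAA can be approximated on the host with C11 `<stdatomic.h>`. This is only a sketch; `model_saa()` is a hypothetical name.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * SAA: atomically add the low 32 bits of rt to an aligned word and
 * discard the previous contents (MO_UL | MO_ALIGN in the patch).
 */
static void model_saa(_Atomic uint32_t *word, uint64_t rt)
{
    assert(((uintptr_t)word & 3) == 0);           /* natural alignment */
    (void)atomic_fetch_add(word, (uint32_t)rt);   /* sum wraps mod 2^32 */
}
```

Discarding the fetched value corresponds to the otherwise unused `old` temporary in trans_saa().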

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
---
Changes v2 -> v3:
  - Split SAA out of the combined Octeon arithmetic and memory
    instruction patch.  (requested by Richard Henderson)

Changes v3 -> v4:
  - Gate SAA/SAAD behind an Octeon+ feature bit.  (reported by Richard
    Henderson)
  - Use the i64 TCG atomic add path for both word and doubleword sizes.
    (suggested by Richard Henderson)

Changes v4 -> v5:
  - Drop the separate Octeon+ feature bit; QEMU only has one Octeon CPU
    model today.  (comment by Richard Henderson)
---
 target/mips/tcg/octeon.decode      |  4 ++++
 target/mips/tcg/octeon_translate.c | 14 ++++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/target/mips/tcg/octeon.decode b/target/mips/tcg/octeon.decode
index db7d5f55f0..d6b241de42 100644
--- a/target/mips/tcg/octeon.decode
+++ b/target/mips/tcg/octeon.decode
@@ -44,6 +44,10 @@ SNE          011100 ..... ..... ..... 00000 101011 @r3
 SEQI         011100 rs:5 rt:5 imm:s10 101110 &cmpi
 SNEI         011100 rs:5 rt:5 imm:s10 101111 &cmpi
 
+&saa         base rt
+@saa         ...... base:5 rt:5 ................ &saa
+SAA          011100 ..... ..... 00000 00000 011000 @saa
+
 &lx          base index rd
 @lx          ...... base:5 index:5 rd:5 ...... ..... &lx
 LWX          011111 ..... ..... ..... 00000 001010 @lx
diff --git a/target/mips/tcg/octeon_translate.c b/target/mips/tcg/octeon_translate.c
index 401c4bd14b..441d71d57b 100644
--- a/target/mips/tcg/octeon_translate.c
+++ b/target/mips/tcg/octeon_translate.c
@@ -161,6 +161,20 @@ static bool trans_lx(DisasContext *ctx, arg_lx *a, MemOp mop)
     return true;
 }
 
+static bool trans_saa(DisasContext *ctx, arg_saa *a, MemOp mop)
+{
+    TCGv_i64 addr = tcg_temp_new_i64();
+    TCGv_i64 value = tcg_temp_new_i64();
+    TCGv_i64 old = tcg_temp_new_i64();
+    MemOp amo = mo_endian(ctx) | mop | MO_ALIGN;
+
+    gen_base_offset_addr(ctx, addr, a->base, 0);
+    gen_load_gpr(value, a->rt);
+    tcg_gen_atomic_fetch_add_i64(old, addr, value, ctx->mem_idx, amo);
+    return true;
+}
+
+TRANS(SAA,  trans_saa, MO_UL);
 TRANS(LBX,  trans_lx, MO_SB);
 TRANS(LBUX, trans_lx, MO_UB);
 TRANS(LHX,  trans_lx, MO_SW);

-- 
2.54.0




* [PATCH v6 12/35] target/mips: add Octeon SAAD instruction
  2026-05-11 18:22 [PATCH v6 00/35] target/mips: add missing Octeon user-mode support James Hilliard
                   ` (10 preceding siblings ...)
  2026-05-11 18:22 ` [PATCH v6 11/35] target/mips: add Octeon SAA instruction James Hilliard
@ 2026-05-11 18:22 ` James Hilliard
  2026-05-14 10:08   ` Philippe Mathieu-Daudé
  2026-05-11 18:22 ` [PATCH v6 13/35] target/mips: add Octeon ZCB instruction James Hilliard
                   ` (22 subsequent siblings)
  34 siblings, 1 reply; 52+ messages in thread
From: James Hilliard @ 2026-05-11 18:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier,
	Philippe Mathieu-Daudé, Aurelien Jarno, Jiaxun Yang,
	Aleksandar Rikalo, Huacai Chen, James Hilliard, Richard Henderson

SAAD is the doubleword form of SAA: it atomically adds rt to the
naturally aligned 64-bit doubleword at base and discards the old memory
value.

Route it through the common SAA/SAAD translator so the MemOp selects the
aligned doubleword transaction size.
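
The shared path can be sketched in host C as one helper keyed on the access size, in the same way trans_saa() is keyed on its MemOp. This is a hypothetical approximation using C11 atomics, not the QEMU implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * One helper for both forms: size 4 models SAA (MO_UL) and size 8
 * models SAAD (MO_UQ).  The previous memory value is discarded.
 */
static void model_saa_common(void *addr, uint64_t rt, unsigned size)
{
    assert(size == 4 || size == 8);
    assert(((uintptr_t)addr & (size - 1)) == 0);   /* MO_ALIGN */

    if (size == 4) {
        (void)atomic_fetch_add((_Atomic uint32_t *)addr, (uint32_t)rt);
    } else {
        (void)atomic_fetch_add((_Atomic uint64_t *)addr, rt);
    }
}
```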

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
---
Changes v2 -> v3:
  - Split SAAD out of the combined Octeon arithmetic and memory
    instruction patch.  (requested by Richard Henderson)

Changes v3 -> v4:
  - Note that SAAD shares the Octeon+ gated SAA translator path.

Changes v4 -> v5:
  - Drop the Octeon+ gated wording/path and keep SAAD under the existing
    Octeon feature bucket.
---
 target/mips/tcg/octeon.decode      | 1 +
 target/mips/tcg/octeon_translate.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/target/mips/tcg/octeon.decode b/target/mips/tcg/octeon.decode
index d6b241de42..d77717cd50 100644
--- a/target/mips/tcg/octeon.decode
+++ b/target/mips/tcg/octeon.decode
@@ -47,6 +47,7 @@ SNEI         011100 rs:5 rt:5 imm:s10 101111 &cmpi
 &saa         base rt
 @saa         ...... base:5 rt:5 ................ &saa
 SAA          011100 ..... ..... 00000 00000 011000 @saa
+SAAD         011100 ..... ..... 00000 00000 011001 @saa
 
 &lx          base index rd
 @lx          ...... base:5 index:5 rd:5 ...... ..... &lx
diff --git a/target/mips/tcg/octeon_translate.c b/target/mips/tcg/octeon_translate.c
index 441d71d57b..daeaf07072 100644
--- a/target/mips/tcg/octeon_translate.c
+++ b/target/mips/tcg/octeon_translate.c
@@ -175,6 +175,7 @@ static bool trans_saa(DisasContext *ctx, arg_saa *a, MemOp mop)
 }
 
 TRANS(SAA,  trans_saa, MO_UL);
+TRANS(SAAD, trans_saa, MO_UQ);
 TRANS(LBX,  trans_lx, MO_SB);
 TRANS(LBUX, trans_lx, MO_UB);
 TRANS(LHX,  trans_lx, MO_SW);

-- 
2.54.0




* [PATCH v6 13/35] target/mips: add Octeon ZCB instruction
  2026-05-11 18:22 [PATCH v6 00/35] target/mips: add missing Octeon user-mode support James Hilliard
                   ` (11 preceding siblings ...)
  2026-05-11 18:22 ` [PATCH v6 12/35] target/mips: add Octeon SAAD instruction James Hilliard
@ 2026-05-11 18:22 ` James Hilliard
  2026-05-14 10:25   ` Philippe Mathieu-Daudé
  2026-05-11 18:22 ` [PATCH v6 14/35] target/mips: add Octeon ZCBT instruction James Hilliard
                   ` (21 subsequent siblings)
  34 siblings, 1 reply; 52+ messages in thread
From: James Hilliard @ 2026-05-11 18:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier,
	Philippe Mathieu-Daudé, Aurelien Jarno, Jiaxun Yang,
	Aleksandar Rikalo, Huacai Chen, James Hilliard, Richard Henderson

ZCB zeros the 128-byte cache block containing the base address.

Model the user-mode-visible effect by aligning the address down to a
128-byte line and storing sixteen zero doublewords to guest memory.
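
The align-down-and-clear behavior can be checked with a small host-side C sketch. It assumes `mem` models guest memory starting at guest address 0; `ZCB_LINE_SIZE`, `zcb_line_base()` and `model_zcb()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ZCB_LINE_SIZE 128u   /* cache-block size used by the patch */

/* Align down to the containing line, as andi(addr, ~0x7f) does. */
static uint64_t zcb_line_base(uint64_t addr)
{
    return addr & ~(uint64_t)(ZCB_LINE_SIZE - 1);
}

/* 'mem' stands in for guest memory starting at guest address 0. */
static void model_zcb(uint8_t *mem, size_t mem_len, uint64_t addr)
{
    uint64_t line = zcb_line_base(addr);

    assert(line + ZCB_LINE_SIZE <= mem_len);
    /* Same bytes as the translator's sixteen 8-byte zero stores. */
    memset(mem + line, 0, ZCB_LINE_SIZE);
}
```

Like the sixteen stores in trans_ZCB(), the memset() is not a single atomic access.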

Acked-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
---
Changes v2 -> v3:
  - Split ZCB out of the combined Octeon arithmetic and memory
    instruction patch.  (requested by Richard Henderson)
---
 target/mips/tcg/octeon.decode      |  3 +++
 target/mips/tcg/octeon_translate.c | 24 ++++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/target/mips/tcg/octeon.decode b/target/mips/tcg/octeon.decode
index d77717cd50..d8a1bfce77 100644
--- a/target/mips/tcg/octeon.decode
+++ b/target/mips/tcg/octeon.decode
@@ -49,6 +49,9 @@ SNEI         011100 rs:5 rt:5 imm:s10 101111 &cmpi
 SAA          011100 ..... ..... 00000 00000 011000 @saa
 SAAD         011100 ..... ..... 00000 00000 011001 @saa
 
+&zcb         base
+ZCB          011100 base:5 00000 00000 11100 011111 &zcb
+
 &lx          base index rd
 @lx          ...... base:5 index:5 rd:5 ...... ..... &lx
 LWX          011111 ..... ..... ..... 00000 001010 @lx
diff --git a/target/mips/tcg/octeon_translate.c b/target/mips/tcg/octeon_translate.c
index daeaf07072..75b28c4338 100644
--- a/target/mips/tcg/octeon_translate.c
+++ b/target/mips/tcg/octeon_translate.c
@@ -174,6 +174,30 @@ static bool trans_saa(DisasContext *ctx, arg_saa *a, MemOp mop)
     return true;
 }
 
+static bool trans_ZCB(DisasContext *ctx, arg_ZCB *a)
+{
+    TCGv_i64 addr = tcg_temp_new_i64();
+    TCGv_i64 line = tcg_temp_new_i64();
+    TCGv_i64 zero = tcg_constant_i64(0);
+
+    gen_base_offset_addr(ctx, addr, a->base, 0);
+
+    /*
+     * QEMU models ZCB/ZCBT as zeroing the containing 128-byte cache line
+     * in guest memory.
+     */
+    tcg_gen_andi_i64(line, addr, ~0x7fULL);
+
+    for (int i = 0; i < 16; i++) {
+        TCGv_i64 slot = tcg_temp_new_i64();
+
+        tcg_gen_addi_i64(slot, line, i * 8);
+        tcg_gen_qemu_st_i64(zero, slot, ctx->mem_idx, mo_endian(ctx) | MO_UQ);
+    }
+
+    return true;
+}
+
 TRANS(SAA,  trans_saa, MO_UL);
 TRANS(SAAD, trans_saa, MO_UQ);
 TRANS(LBX,  trans_lx, MO_SB);

-- 
2.54.0




* [PATCH v6 14/35] target/mips: add Octeon ZCBT instruction
  2026-05-11 18:22 [PATCH v6 00/35] target/mips: add missing Octeon user-mode support James Hilliard
                   ` (12 preceding siblings ...)
  2026-05-11 18:22 ` [PATCH v6 13/35] target/mips: add Octeon ZCB instruction James Hilliard
@ 2026-05-11 18:22 ` James Hilliard
  2026-05-14 10:03   ` Philippe Mathieu-Daudé
  2026-05-11 18:22 ` [PATCH v6 15/35] target/mips: add Octeon MTM0 instruction James Hilliard
                   ` (20 subsequent siblings)
  34 siblings, 1 reply; 52+ messages in thread
From: James Hilliard @ 2026-05-11 18:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier,
	Philippe Mathieu-Daudé, Aurelien Jarno, Jiaxun Yang,
	Aleksandar Rikalo, Huacai Chen, James Hilliard, Richard Henderson

ZCBT has the same user-mode memory effect as ZCB for QEMU's purposes.

Reuse the ZCB translator so both cache-block-zero forms clear the
containing 128-byte line.
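
The two encodings differ only in bit 6 of the selector field (11100 versus 11101). A standalone C matcher for both patterns could look like this sketch; `decode_zcb()` is a hypothetical name, and the field layout is transcribed from the octeon.decode lines in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Match the two ZCB patterns from octeon.decode:
 *   011100 base:5 00000 00000 11100 011111   (ZCB)
 *   011100 base:5 00000 00000 11101 011111   (ZCBT)
 * Both report the same base register, so they can share one translator.
 */
static bool decode_zcb(uint32_t insn, unsigned *base)
{
    uint32_t opcode = insn >> 26;           /* bits 31..26 */
    uint32_t zero1  = (insn >> 16) & 0x1f;  /* bits 20..16, must be 0 */
    uint32_t zero2  = (insn >> 11) & 0x1f;  /* bits 15..11, must be 0 */
    uint32_t sel    = (insn >> 6) & 0x1f;   /* bits 10..6 */
    uint32_t funct  = insn & 0x3f;          /* bits 5..0 */

    assert(base != NULL);
    if (opcode != 0x1c || zero1 != 0 || zero2 != 0 || funct != 0x1f ||
        (sel != 0x1c && sel != 0x1d)) {
        return false;
    }
    *base = (insn >> 21) & 0x1f;
    return true;
}
```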

Acked-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
---
Changes v2 -> v3:
  - Split ZCBT out of the combined Octeon arithmetic and memory
    instruction patch.  (requested by Richard Henderson)

Changes v4 -> v5:
  - Fold ZCBT into the ZCB decodetree entry with a selector comment
    instead of adding a separate translator thunk.  (suggested by Richard
    Henderson)
---
 target/mips/tcg/octeon.decode | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/mips/tcg/octeon.decode b/target/mips/tcg/octeon.decode
index d8a1bfce77..5377f7b3ef 100644
--- a/target/mips/tcg/octeon.decode
+++ b/target/mips/tcg/octeon.decode
@@ -51,6 +51,7 @@ SAAD         011100 ..... ..... 00000 00000 011001 @saa
 
 &zcb         base
 ZCB          011100 base:5 00000 00000 11100 011111 &zcb
+ZCB          011100 base:5 00000 00000 11101 011111 &zcb  # ZCBT
 
 &lx          base index rd
 @lx          ...... base:5 index:5 rd:5 ...... ..... &lx

-- 
2.54.0




* [PATCH v6 15/35] target/mips: add Octeon MTM0 instruction
  2026-05-11 18:22 [PATCH v6 00/35] target/mips: add missing Octeon user-mode support James Hilliard
                   ` (13 preceding siblings ...)
  2026-05-11 18:22 ` [PATCH v6 14/35] target/mips: add Octeon ZCBT instruction James Hilliard
@ 2026-05-11 18:22 ` James Hilliard
  2026-05-11 18:40   ` Richard Henderson
  2026-05-11 18:22 ` [PATCH v6 16/35] target/mips: add Octeon MTP0 instruction James Hilliard
                   ` (19 subsequent siblings)
  34 siblings, 1 reply; 52+ messages in thread
From: James Hilliard @ 2026-05-11 18:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier,
	Philippe Mathieu-Daudé, Aurelien Jarno, Jiaxun Yang,
	Aleksandar Rikalo, Huacai Chen, James Hilliard

MTM0 loads the low Octeon3 multiplier operand pair from rs/rt into
MPL[0] and MPL[3], starts a new multiplier chain, sets MPL[1] to zero,
and resets partial products. Model the architecturally unpredictable
MPL[2], MPL[4], and MPL[5] lanes as zero for deterministic emulation.

Legacy single-source encodings have rt encoded as $zero, so the same
translator path also preserves the older Octeon behavior.
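
A host-side model can make the state transition easier to review. The sketch below is a standalone C approximation that assumes the MPL/P layout added earlier in the series. `OcteonMulState` and `model_mtm()` are hypothetical names, and the `index` parameter simply mirrors trans_mtm(), which this patch only instantiates for MTM0.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define OCTEON_MULTIPLIER_LANES 3
#define OCTEON_MULTIPLIER_REGS (2 * OCTEON_MULTIPLIER_LANES)

typedef struct {
    uint64_t MPL[OCTEON_MULTIPLIER_REGS];
    uint64_t P[OCTEON_MULTIPLIER_REGS];
} OcteonMulState;

/* Host model of trans_mtm(); only index 0 (MTM0) is wired up here. */
static void model_mtm(OcteonMulState *s, unsigned index,
                      uint64_t rs_val, uint64_t rt_val)
{
    assert(index < OCTEON_MULTIPLIER_LANES);

    s->MPL[index] = rs_val;
    /* rt is $zero in legacy one-source encodings, so this stores 0. */
    s->MPL[index + OCTEON_MULTIPLIER_LANES] = rt_val;
    if (index == 0) {
        s->MPL[1] = 0;   /* defined as zero for MTM0 */
        s->MPL[2] = 0;   /* unpredictable lanes, modeled as zero */
        s->MPL[4] = 0;
        s->MPL[5] = 0;
    }
    memset(s->P, 0, sizeof(s->P));   /* partial products are reset */
}
```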

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
---
Changes v2 -> v3:
  - Split MTM0 out of the combined Octeon arithmetic and memory
    instruction patch.  (requested by Richard Henderson)

Changes v3 -> v4:
  - Keep the Octeon3 two-source rt high-lane operand and document that
    legacy one-source MTM encodings use rt == $zero.

Changes v5 -> v6:
  - Clarify the CN71XX-defined MPL1 zeroing and the modeled-zero
    unpredictable MPL lanes.
---
 target/mips/tcg/octeon.decode      |  2 ++
 target/mips/tcg/octeon_translate.c | 57 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 59 insertions(+)

diff --git a/target/mips/tcg/octeon.decode b/target/mips/tcg/octeon.decode
index 5377f7b3ef..bf1dab61e1 100644
--- a/target/mips/tcg/octeon.decode
+++ b/target/mips/tcg/octeon.decode
@@ -43,6 +43,8 @@ SEQ          011100 ..... ..... ..... 00000 101010 @r3
 SNE          011100 ..... ..... ..... 00000 101011 @r3
 SEQI         011100 rs:5 rt:5 imm:s10 101110 &cmpi
 SNEI         011100 rs:5 rt:5 imm:s10 101111 &cmpi
+&r2          rs rt
+MTM0         011100 rs:5 rt:5 00000 00000 001000 &r2
 
 &saa         base rt
 @saa         ...... base:5 rt:5 ................ &saa
diff --git a/target/mips/tcg/octeon_translate.c b/target/mips/tcg/octeon_translate.c
index 75b28c4338..4507f8a5bc 100644
--- a/target/mips/tcg/octeon_translate.c
+++ b/target/mips/tcg/octeon_translate.c
@@ -198,6 +198,62 @@ static bool trans_ZCB(DisasContext *ctx, arg_ZCB *a)
     return true;
 }
 
+static void octeon_store_mpl(unsigned int index, TCGv_i64 value)
+{
+    tcg_gen_st_i64(value, tcg_env,
+                   offsetof(CPUMIPSState, active_tc.octeon.MPL) +
+                   index * sizeof(uint64_t));
+}
+
+static void octeon_store_p(unsigned int index, TCGv_i64 value)
+{
+    tcg_gen_st_i64(value, tcg_env,
+                   offsetof(CPUMIPSState, active_tc.octeon.P) +
+                   index * sizeof(uint64_t));
+}
+
+static void octeon_zero_partial_product_state(void)
+{
+    TCGv_i64 zero = tcg_constant_i64(0);
+
+    for (int i = 0; i < OCTEON_MULTIPLIER_REGS; i++) {
+        octeon_store_p(i, zero);
+    }
+}
+
+static void octeon_reset_mtm0_mpl_state(void)
+{
+    TCGv_i64 zero = tcg_constant_i64(0);
+
+    /*
+     * MTM0 defines MPL1 as zero; model the architecturally unpredictable
+     * MPL2/MPL4/MPL5 lanes as zero for deterministic emulation.
+     */
+    octeon_store_mpl(1, zero);
+    octeon_store_mpl(2, zero);
+    octeon_store_mpl(4, zero);
+    octeon_store_mpl(5, zero);
+}
+
+static bool trans_mtm(DisasContext *ctx, arg_r2 *a, unsigned int index)
+{
+    TCGv_i64 value = tcg_temp_new_i64();
+
+    /*
+     * Octeon3 two-source MTM forms load lane index from rs and lane index + 3
+     * from rt.  Legacy one-source forms encode rt as $zero.
+     */
+    gen_load_gpr(value, a->rs);
+    octeon_store_mpl(index, value);
+    gen_load_gpr(value, a->rt);
+    octeon_store_mpl(index + 3, value);
+    if (index == 0) {
+        octeon_reset_mtm0_mpl_state();
+    }
+    octeon_zero_partial_product_state();
+    return true;
+}
+
 TRANS(SAA,  trans_saa, MO_UL);
 TRANS(SAAD, trans_saa, MO_UQ);
 TRANS(LBX,  trans_lx, MO_SB);
@@ -207,3 +263,4 @@ TRANS(LHUX, trans_lx, MO_UW);
 TRANS(LWX,  trans_lx, MO_SL);
 TRANS(LWUX, trans_lx, MO_UL);
 TRANS(LDX,  trans_lx, MO_UQ);
+TRANS(MTM0, trans_mtm, 0);

-- 
2.54.0




* [PATCH v6 16/35] target/mips: add Octeon MTP0 instruction
  2026-05-11 18:22 [PATCH v6 00/35] target/mips: add missing Octeon user-mode support James Hilliard
                   ` (14 preceding siblings ...)
  2026-05-11 18:22 ` [PATCH v6 15/35] target/mips: add Octeon MTM0 instruction James Hilliard
@ 2026-05-11 18:22 ` James Hilliard
  2026-05-11 18:46   ` Richard Henderson
  2026-05-11 18:22 ` [PATCH v6 17/35] target/mips: add Octeon MTP1 instruction James Hilliard
                   ` (18 subsequent siblings)
  34 siblings, 1 reply; 52+ messages in thread
From: James Hilliard @ 2026-05-11 18:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier,
	Philippe Mathieu-Daudé, Aurelien Jarno, Jiaxun Yang,
	Aleksandar Rikalo, Huacai Chen, James Hilliard

MTP0 loads the low Octeon3 partial-product pair from rs/rt into P[0]
and P[3] and sets P[1] to zero. Model the architecturally unpredictable
P[2], P[4], and P[5] lanes as zero for deterministic emulation.

Legacy single-source encodings have rt encoded as $zero, so the same
translator path also preserves the older Octeon behavior. Add the
translator storage path so subsequent VMULU/VMM0/V3MULU operations can
consume guest-managed partial products.
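
The sketch below approximates MTP0 in host C under the same assumptions as the MPL/P layout in this series; `OcteonMulState` and `model_mtp()` are hypothetical names, with `index` mirroring trans_mtp(). It shows that the P bank is written while the MPL bank is left untouched, so multiplier operands loaded earlier survive.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define OCTEON_MULTIPLIER_LANES 3
#define OCTEON_MULTIPLIER_REGS (2 * OCTEON_MULTIPLIER_LANES)

typedef struct {
    uint64_t MPL[OCTEON_MULTIPLIER_REGS];
    uint64_t P[OCTEON_MULTIPLIER_REGS];
} OcteonMulState;

/* Host model of trans_mtp() as described above; index 0 is MTP0. */
static void model_mtp(OcteonMulState *s, unsigned index,
                      uint64_t rs_val, uint64_t rt_val)
{
    assert(index < OCTEON_MULTIPLIER_LANES);

    s->P[index] = rs_val;
    s->P[index + OCTEON_MULTIPLIER_LANES] = rt_val;  /* 0 for legacy forms */
    if (index == 0) {
        s->P[1] = 0;   /* defined as zero */
        s->P[2] = 0;   /* unpredictable lanes, modeled as zero */
        s->P[4] = 0;
        s->P[5] = 0;
    }
    /* MPL[] is not touched here. */
}
```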

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
---
Changes v2 -> v3:
  - Split MTP0 out of the combined Octeon arithmetic and memory
    instruction patch.  (requested by Richard Henderson)

Changes v3 -> v4:
  - Keep the Octeon3 two-source rt high-lane operand and document that
    legacy one-source MTP encodings use rt == $zero.

Changes v5 -> v6:
  - Zero P1 and model P2/P4/P5 as zero after checking the CN71XX
    register-state table and description.
---
 target/mips/tcg/octeon.decode      |  1 +
 target/mips/tcg/octeon_translate.c | 28 ++++++++++++++++++++++++++++
 2 files changed, 29 insertions(+)

diff --git a/target/mips/tcg/octeon.decode b/target/mips/tcg/octeon.decode
index bf1dab61e1..59ab7401ab 100644
--- a/target/mips/tcg/octeon.decode
+++ b/target/mips/tcg/octeon.decode
@@ -45,6 +45,7 @@ SEQI         011100 rs:5 rt:5 imm:s10 101110 &cmpi
 SNEI         011100 rs:5 rt:5 imm:s10 101111 &cmpi
 &r2          rs rt
 MTM0         011100 rs:5 rt:5 00000 00000 001000 &r2
+MTP0         011100 rs:5 rt:5 00000 00000 001001 &r2
 
 &saa         base rt
 @saa         ...... base:5 rt:5 ................ &saa
diff --git a/target/mips/tcg/octeon_translate.c b/target/mips/tcg/octeon_translate.c
index 4507f8a5bc..88f64e791d 100644
--- a/target/mips/tcg/octeon_translate.c
+++ b/target/mips/tcg/octeon_translate.c
@@ -254,6 +254,33 @@ static bool trans_mtm(DisasContext *ctx, arg_r2 *a, unsigned int index)
     return true;
 }
 
+static bool trans_mtp(DisasContext *ctx, arg_r2 *a, unsigned int index)
+{
+    TCGv_i64 value = tcg_temp_new_i64();
+
+    /*
+     * Octeon3 two-source MTP forms load lane index from rs and lane index + 3
+     * from rt.  Legacy one-source forms encode rt as $zero.
+     */
+    gen_load_gpr(value, a->rs);
+    octeon_store_p(index, value);
+    gen_load_gpr(value, a->rt);
+    octeon_store_p(index + 3, value);
+    if (index == 0) {
+        /*
+         * The hardware description and register-state table define P1 as zero;
+         * model P2/P4/P5 as zero for deterministic emulation.
+         */
+        TCGv_i64 zero = tcg_constant_i64(0);
+
+        octeon_store_p(1, zero);
+        octeon_store_p(2, zero);
+        octeon_store_p(4, zero);
+        octeon_store_p(5, zero);
+    }
+    return true;
+}
+
 TRANS(SAA,  trans_saa, MO_UL);
 TRANS(SAAD, trans_saa, MO_UQ);
 TRANS(LBX,  trans_lx, MO_SB);
@@ -264,3 +291,4 @@ TRANS(LWX,  trans_lx, MO_SL);
 TRANS(LWUX, trans_lx, MO_UL);
 TRANS(LDX,  trans_lx, MO_UQ);
 TRANS(MTM0, trans_mtm, 0);
+TRANS(MTP0, trans_mtp, 0);

-- 
2.54.0


* [PATCH v6 17/35] target/mips: add Octeon MTP1 instruction
  2026-05-11 18:22 [PATCH v6 00/35] target/mips: add missing Octeon user-mode support James Hilliard
                   ` (15 preceding siblings ...)
  2026-05-11 18:22 ` [PATCH v6 16/35] target/mips: add Octeon MTP0 instruction James Hilliard
@ 2026-05-11 18:22 ` James Hilliard
  2026-05-11 18:22 ` [PATCH v6 18/35] target/mips: add Octeon MTP2 instruction James Hilliard
                   ` (17 subsequent siblings)
  34 siblings, 0 replies; 52+ messages in thread
From: James Hilliard @ 2026-05-11 18:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier,
	Philippe Mathieu-Daudé, Aurelien Jarno, Jiaxun Yang,
	Aleksandar Rikalo, Huacai Chen, James Hilliard

MTP1 loads the middle Octeon3 partial-product pair from rs/rt into P[1]
and P[4].

This exposes the second guest-visible partial-product slot consumed by
the Octeon multiplier helpers.

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
---
Changes v2 -> v3:
  - Split MTP1 out of the combined Octeon arithmetic and memory
    instruction patch.  (requested by Richard Henderson)

Changes v3 -> v4:
  - Describe the Octeon3 rs/rt P[1]/P[4] pair handled by the shared MTP
    translator.
---
 target/mips/tcg/octeon.decode      | 1 +
 target/mips/tcg/octeon_translate.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/target/mips/tcg/octeon.decode b/target/mips/tcg/octeon.decode
index 59ab7401ab..670a12a956 100644
--- a/target/mips/tcg/octeon.decode
+++ b/target/mips/tcg/octeon.decode
@@ -46,6 +46,7 @@ SNEI         011100 rs:5 rt:5 imm:s10 101111 &cmpi
 &r2          rs rt
 MTM0         011100 rs:5 rt:5 00000 00000 001000 &r2
 MTP0         011100 rs:5 rt:5 00000 00000 001001 &r2
+MTP1         011100 rs:5 rt:5 00000 00000 001010 &r2
 
 &saa         base rt
 @saa         ...... base:5 rt:5 ................ &saa
diff --git a/target/mips/tcg/octeon_translate.c b/target/mips/tcg/octeon_translate.c
index 88f64e791d..3f954eb134 100644
--- a/target/mips/tcg/octeon_translate.c
+++ b/target/mips/tcg/octeon_translate.c
@@ -292,3 +292,4 @@ TRANS(LWUX, trans_lx, MO_UL);
 TRANS(LDX,  trans_lx, MO_UQ);
 TRANS(MTM0, trans_mtm, 0);
 TRANS(MTP0, trans_mtp, 0);
+TRANS(MTP1, trans_mtp, 1);

-- 
2.54.0


* [PATCH v6 18/35] target/mips: add Octeon MTP2 instruction
  2026-05-11 18:22 [PATCH v6 00/35] target/mips: add missing Octeon user-mode support James Hilliard
                   ` (16 preceding siblings ...)
  2026-05-11 18:22 ` [PATCH v6 17/35] target/mips: add Octeon MTP1 instruction James Hilliard
@ 2026-05-11 18:22 ` James Hilliard
  2026-05-11 18:22 ` [PATCH v6 19/35] target/mips: add Octeon MTM1 instruction James Hilliard
                   ` (16 subsequent siblings)
  34 siblings, 0 replies; 52+ messages in thread
From: James Hilliard @ 2026-05-11 18:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier,
	Philippe Mathieu-Daudé, Aurelien Jarno, Jiaxun Yang,
	Aleksandar Rikalo, Huacai Chen, James Hilliard

MTP2 loads the high Octeon3 partial-product pair from rs/rt into P[2]
and P[5].

This exposes the final guest-managed partial-product slot for the
Octeon multiplier operations.

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
---
Changes v2 -> v3:
  - Split MTP2 out of the combined Octeon arithmetic and memory
    instruction patch.  (requested by Richard Henderson)

Changes v3 -> v4:
  - Describe the Octeon3 rs/rt P[2]/P[5] pair handled by the shared MTP
    translator.
---
 target/mips/tcg/octeon.decode      | 1 +
 target/mips/tcg/octeon_translate.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/target/mips/tcg/octeon.decode b/target/mips/tcg/octeon.decode
index 670a12a956..99606c80f6 100644
--- a/target/mips/tcg/octeon.decode
+++ b/target/mips/tcg/octeon.decode
@@ -47,6 +47,7 @@ SNEI         011100 rs:5 rt:5 imm:s10 101111 &cmpi
 MTM0         011100 rs:5 rt:5 00000 00000 001000 &r2
 MTP0         011100 rs:5 rt:5 00000 00000 001001 &r2
 MTP1         011100 rs:5 rt:5 00000 00000 001010 &r2
+MTP2         011100 rs:5 rt:5 00000 00000 001011 &r2
 
 &saa         base rt
 @saa         ...... base:5 rt:5 ................ &saa
diff --git a/target/mips/tcg/octeon_translate.c b/target/mips/tcg/octeon_translate.c
index 3f954eb134..55e9ca298c 100644
--- a/target/mips/tcg/octeon_translate.c
+++ b/target/mips/tcg/octeon_translate.c
@@ -293,3 +293,4 @@ TRANS(LDX,  trans_lx, MO_UQ);
 TRANS(MTM0, trans_mtm, 0);
 TRANS(MTP0, trans_mtp, 0);
 TRANS(MTP1, trans_mtp, 1);
+TRANS(MTP2, trans_mtp, 2);

-- 
2.54.0


* [PATCH v6 19/35] target/mips: add Octeon MTM1 instruction
  2026-05-11 18:22 [PATCH v6 00/35] target/mips: add missing Octeon user-mode support James Hilliard
                   ` (17 preceding siblings ...)
  2026-05-11 18:22 ` [PATCH v6 18/35] target/mips: add Octeon MTP2 instruction James Hilliard
@ 2026-05-11 18:22 ` James Hilliard
  2026-05-11 18:22 ` [PATCH v6 20/35] target/mips: add Octeon MTM2 instruction James Hilliard
                   ` (15 subsequent siblings)
  34 siblings, 0 replies; 52+ messages in thread
From: James Hilliard @ 2026-05-11 18:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier,
	Philippe Mathieu-Daudé, Aurelien Jarno, Jiaxun Yang,
	Aleksandar Rikalo, Huacai Chen, James Hilliard

MTM1 loads the middle Octeon3 multiplier operand pair from rs/rt into
MPL[1] and MPL[4].

Like the other MTM writes, it resets partial products so the following
multiplier operation starts from the newly programmed operand state.
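
The shared MTMn behavior can be sketched as follows, assuming that
octeon_zero_partial_product_state() clears all six P lanes. The extra
state reset that MTM0 performs through octeon_reset_mtm0_mpl_state() is
not modeled here, and OcteonMulModel and mtm_model() are hypothetical
names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the multiplier operand (MPL) and P lanes. */
typedef struct {
    uint64_t mpl[6];
    uint64_t p[6];
} OcteonMulModel;

/*
 * MTMn (n = 0, 1, 2): rs is written to MPL[n] and rt to MPL[n + 3], then
 * the partial products are cleared so the next multiplier operation starts
 * from the new operand.  MTM0's additional MPL reset is not modeled.
 */
static void mtm_model(OcteonMulModel *s, unsigned n, uint64_t rs, uint64_t rt)
{
    assert(n < 3);
    s->mpl[n] = rs;
    s->mpl[n + 3] = rt;
    memset(s->p, 0, sizeof(s->p));
}
```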

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
---
Changes v2 -> v3:
  - Split MTM1 out of the combined Octeon arithmetic and memory
    instruction patch.  (requested by Richard Henderson)

Changes v3 -> v4:
  - Describe the Octeon3 rs/rt MPL[1]/MPL[4] pair handled by the shared
    MTM translator.
---
 target/mips/tcg/octeon.decode      | 1 +
 target/mips/tcg/octeon_translate.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/target/mips/tcg/octeon.decode b/target/mips/tcg/octeon.decode
index 99606c80f6..c85199ae32 100644
--- a/target/mips/tcg/octeon.decode
+++ b/target/mips/tcg/octeon.decode
@@ -48,6 +48,7 @@ MTM0         011100 rs:5 rt:5 00000 00000 001000 &r2
 MTP0         011100 rs:5 rt:5 00000 00000 001001 &r2
 MTP1         011100 rs:5 rt:5 00000 00000 001010 &r2
 MTP2         011100 rs:5 rt:5 00000 00000 001011 &r2
+MTM1         011100 rs:5 rt:5 00000 00000 001100 &r2
 
 &saa         base rt
 @saa         ...... base:5 rt:5 ................ &saa
diff --git a/target/mips/tcg/octeon_translate.c b/target/mips/tcg/octeon_translate.c
index 55e9ca298c..c9ea72d832 100644
--- a/target/mips/tcg/octeon_translate.c
+++ b/target/mips/tcg/octeon_translate.c
@@ -291,6 +291,7 @@ TRANS(LWX,  trans_lx, MO_SL);
 TRANS(LWUX, trans_lx, MO_UL);
 TRANS(LDX,  trans_lx, MO_UQ);
 TRANS(MTM0, trans_mtm, 0);
+TRANS(MTM1, trans_mtm, 1);
 TRANS(MTP0, trans_mtp, 0);
 TRANS(MTP1, trans_mtp, 1);
 TRANS(MTP2, trans_mtp, 2);

-- 
2.54.0


* [PATCH v6 20/35] target/mips: add Octeon MTM2 instruction
  2026-05-11 18:22 [PATCH v6 00/35] target/mips: add missing Octeon user-mode support James Hilliard
                   ` (18 preceding siblings ...)
  2026-05-11 18:22 ` [PATCH v6 19/35] target/mips: add Octeon MTM1 instruction James Hilliard
@ 2026-05-11 18:22 ` James Hilliard
  2026-05-11 18:22 ` [PATCH v6 21/35] target/mips: add Octeon VMULU instruction James Hilliard
                   ` (14 subsequent siblings)
  34 siblings, 0 replies; 52+ messages in thread
From: James Hilliard @ 2026-05-11 18:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier,
	Philippe Mathieu-Daudé, Aurelien Jarno, Jiaxun Yang,
	Aleksandar Rikalo, Huacai Chen, James Hilliard

MTM2 loads the high Octeon3 multiplier operand pair from rs/rt into
MPL[2] and MPL[5].

This supplies the final operand pair consumed by the V3MULU multiplier
helper.

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
---
Changes v2 -> v3:
  - Split MTM2 out of the combined Octeon arithmetic and memory
    instruction patch.  (requested by Richard Henderson)

Changes v3 -> v4:
  - Describe the Octeon3 rs/rt MPL[2]/MPL[5] pair handled by the shared
    MTM translator.
---
 target/mips/tcg/octeon.decode      | 1 +
 target/mips/tcg/octeon_translate.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/target/mips/tcg/octeon.decode b/target/mips/tcg/octeon.decode
index c85199ae32..682473b011 100644
--- a/target/mips/tcg/octeon.decode
+++ b/target/mips/tcg/octeon.decode
@@ -49,6 +49,7 @@ MTP0         011100 rs:5 rt:5 00000 00000 001001 &r2
 MTP1         011100 rs:5 rt:5 00000 00000 001010 &r2
 MTP2         011100 rs:5 rt:5 00000 00000 001011 &r2
 MTM1         011100 rs:5 rt:5 00000 00000 001100 &r2
+MTM2         011100 rs:5 rt:5 00000 00000 001101 &r2
 
 &saa         base rt
 @saa         ...... base:5 rt:5 ................ &saa
diff --git a/target/mips/tcg/octeon_translate.c b/target/mips/tcg/octeon_translate.c
index c9ea72d832..86b384d312 100644
--- a/target/mips/tcg/octeon_translate.c
+++ b/target/mips/tcg/octeon_translate.c
@@ -292,6 +292,7 @@ TRANS(LWUX, trans_lx, MO_UL);
 TRANS(LDX,  trans_lx, MO_UQ);
 TRANS(MTM0, trans_mtm, 0);
 TRANS(MTM1, trans_mtm, 1);
+TRANS(MTM2, trans_mtm, 2);
 TRANS(MTP0, trans_mtp, 0);
 TRANS(MTP1, trans_mtp, 1);
 TRANS(MTP2, trans_mtp, 2);

-- 
2.54.0


* [PATCH v6 21/35] target/mips: add Octeon VMULU instruction
  2026-05-11 18:22 [PATCH v6 00/35] target/mips: add missing Octeon user-mode support James Hilliard
                   ` (19 preceding siblings ...)
  2026-05-11 18:22 ` [PATCH v6 20/35] target/mips: add Octeon MTM2 instruction James Hilliard
@ 2026-05-11 18:22 ` James Hilliard
  2026-05-11 18:22 ` [PATCH v6 22/35] target/mips: add Octeon VMM0 instruction James Hilliard
                   ` (13 subsequent siblings)
  34 siblings, 0 replies; 52+ messages in thread
From: James Hilliard @ 2026-05-11 18:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier,
	Philippe Mathieu-Daudé, Aurelien Jarno, Jiaxun Yang,
	Aleksandar Rikalo, Huacai Chen, James Hilliard

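VMULU multiplies the 128-bit operand held in MPL[1]:MPL[0] by rs, adds
rt and the partial product held in P[1]:P[0], writes the low 64 bits of
the result to rd, and stores the next two 64-bit limbs back into P[0]
and P[1]. A carry out of the top limb is discarded.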

Add helper and translator support for the two-limb accumulator operation.
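
As a cross-check for the limb arithmetic, the VMULU result can be
computed independently with 128-bit intermediates. This sketch assumes a
compiler that provides unsigned __int128 (GCC and Clang on 64-bit hosts);
vmulu_model() is a hypothetical name and is not the helper added by this
patch.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;   /* GCC/Clang extension (assumed) */

/*
 * Reference for VMULU: (MPL[1]:MPL[0]) * rs + rt + (P[1]:P[0]), kept to
 * three 64-bit limbs.  Limb 0 is returned (the rd value); limbs 1 and 2
 * become the new P[0] and P[1].  Carries out of limb 2 are dropped.
 */
static uint64_t vmulu_model(const uint64_t mpl[2], uint64_t p[2],
                            uint64_t rs, uint64_t rt)
{
    u128 m0 = (u128)mpl[0] * rs;            /* weight 2^0  */
    u128 m1 = (u128)mpl[1] * rs;            /* weight 2^64 */
    u128 acc;
    uint64_t r0, r1, r2;

    /* Limb 0: low half of m0, plus rt and P[0]. */
    acc = (u128)(uint64_t)m0 + rt + p[0];
    r0 = (uint64_t)acc;

    /* Limb 1: carry, high half of m0, low half of m1, and P[1]. */
    acc = (acc >> 64) + (uint64_t)(m0 >> 64) + (uint64_t)m1 + p[1];
    r1 = (uint64_t)acc;

    /* Limb 2: carry and high half of m1. */
    acc = (acc >> 64) + (uint64_t)(m1 >> 64);
    r2 = (uint64_t)acc;

    p[0] = r1;
    p[1] = r2;
    return r0;
}
```

With every input set to all ones, (2^128 - 1) * (2^64 - 1) + (2^64 - 1)
+ (2^128 - 1) = 2^192 - 1, so rd and both P lanes come back as all ones,
which exercises every carry path.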

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
---
Changes v2 -> v3:
  - Split VMULU out of the combined Octeon arithmetic and memory
    instruction patch.  (requested by Richard Henderson)
  - Use uadd64_overflow() for multiplier limb carry accumulation.
    (suggested by Richard Henderson)

Changes v5 -> v6:
  - Rename the translator helper callback typedef for clarity.
---
 target/mips/helper.h               |  1 +
 target/mips/tcg/octeon.decode      |  1 +
 target/mips/tcg/octeon_translate.c | 17 +++++++++++++++++
 target/mips/tcg/op_helper.c        | 32 ++++++++++++++++++++++++++++++++
 4 files changed, 51 insertions(+)

diff --git a/target/mips/helper.h b/target/mips/helper.h
index e2b83a1d19..f1e78ae329 100644
--- a/target/mips/helper.h
+++ b/target/mips/helper.h
@@ -24,6 +24,7 @@ DEF_HELPER_FLAGS_1(dbitswap, TCG_CALL_NO_RWG_SE, tl, tl)
 DEF_HELPER_3(crc32, tl, tl, tl, i32)
 DEF_HELPER_3(crc32c, tl, tl, tl, i32)
 DEF_HELPER_FLAGS_4(rotx, TCG_CALL_NO_RWG_SE, tl, tl, i32, i32, i32)
+DEF_HELPER_3(octeon_vmulu, i64, env, i64, i64)
 
 /* microMIPS functions */
 DEF_HELPER_4(lwm, void, env, tl, tl, i32)
diff --git a/target/mips/tcg/octeon.decode b/target/mips/tcg/octeon.decode
index 682473b011..75834afc6c 100644
--- a/target/mips/tcg/octeon.decode
+++ b/target/mips/tcg/octeon.decode
@@ -50,6 +50,7 @@ MTP1         011100 rs:5 rt:5 00000 00000 001010 &r2
 MTP2         011100 rs:5 rt:5 00000 00000 001011 &r2
 MTM1         011100 rs:5 rt:5 00000 00000 001100 &r2
 MTM2         011100 rs:5 rt:5 00000 00000 001101 &r2
+VMULU        011100 ..... ..... ..... 00000 001111 @r3
 
 &saa         base rt
 @saa         ...... base:5 rt:5 ................ &saa
diff --git a/target/mips/tcg/octeon_translate.c b/target/mips/tcg/octeon_translate.c
index 86b384d312..348d0d8601 100644
--- a/target/mips/tcg/octeon_translate.c
+++ b/target/mips/tcg/octeon_translate.c
@@ -13,6 +13,8 @@
 /* Include the auto-generated decoder.  */
 #include "decode-octeon.c.inc"
 
+typedef void gen_helper_octeon_vmul(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i64);
+
 static bool trans_BBIT(DisasContext *ctx, arg_BBIT *a)
 {
     TCGv_i64 p;
@@ -281,6 +283,20 @@ static bool trans_mtp(DisasContext *ctx, arg_r2 *a, unsigned int index)
     return true;
 }
 
+static bool trans_vmul(DisasContext *ctx, arg_decode_ext_octeon1 *a,
+                       gen_helper_octeon_vmul *helper)
+{
+    TCGv_i64 rs = tcg_temp_new_i64();
+    TCGv_i64 rt = tcg_temp_new_i64();
+    TCGv_i64 rd = tcg_temp_new_i64();
+
+    gen_load_gpr(rs, a->rs);
+    gen_load_gpr(rt, a->rt);
+    helper(rd, tcg_env, rs, rt);
+    gen_store_gpr(rd, a->rd);
+    return true;
+}
+
 TRANS(SAA,  trans_saa, MO_UL);
 TRANS(SAAD, trans_saa, MO_UQ);
 TRANS(LBX,  trans_lx, MO_SB);
@@ -296,3 +312,4 @@ TRANS(MTM2, trans_mtm, 2);
 TRANS(MTP0, trans_mtp, 0);
 TRANS(MTP1, trans_mtp, 1);
 TRANS(MTP2, trans_mtp, 2);
+TRANS(VMULU, trans_vmul, gen_helper_octeon_vmulu);
diff --git a/target/mips/tcg/op_helper.c b/target/mips/tcg/op_helper.c
index 4502ae2b5b..ab3fb06a16 100644
--- a/target/mips/tcg/op_helper.c
+++ b/target/mips/tcg/op_helper.c
@@ -144,6 +144,38 @@ target_ulong helper_rotx(target_ulong rs, uint32_t shift, uint32_t shiftx,
     return (int64_t)(int32_t)(uint32_t)tmp5;
 }
 
+static void octeon_add_limb(uint64_t *sum, int limb_count,
+                            uint64_t value, int limb)
+{
+    while (limb < limb_count &&
+           uadd64_overflow(sum[limb], value, &sum[limb])) {
+        value = 1;
+        limb++;
+    }
+}
+
+uint64_t helper_octeon_vmulu(CPUMIPSState *env, uint64_t rs, uint64_t rt)
+{
+    uint64_t lo, hi;
+    uint64_t sum[3] = {};
+
+    mulu64(&lo, &hi, env->active_tc.octeon.MPL[0], rs);
+    sum[0] = lo;
+    sum[1] = hi;
+
+    mulu64(&lo, &hi, env->active_tc.octeon.MPL[1], rs);
+    octeon_add_limb(sum, 3, lo, 1);
+    octeon_add_limb(sum, 3, hi, 2);
+
+    octeon_add_limb(sum, 3, rt, 0);
+    octeon_add_limb(sum, 3, env->active_tc.octeon.P[0], 0);
+    octeon_add_limb(sum, 3, env->active_tc.octeon.P[1], 1);
+
+    env->active_tc.octeon.P[0] = sum[1];
+    env->active_tc.octeon.P[1] = sum[2];
+    return sum[0];
+}
+
 /* these crc32 functions are based on target/loongarch/tcg/op_helper.c */
 target_ulong helper_crc32(target_ulong val, target_ulong m, uint32_t sz)
 {

-- 
2.54.0


* [PATCH v6 22/35] target/mips: add Octeon VMM0 instruction
  2026-05-11 18:22 [PATCH v6 00/35] target/mips: add missing Octeon user-mode support James Hilliard
                   ` (20 preceding siblings ...)
  2026-05-11 18:22 ` [PATCH v6 21/35] target/mips: add Octeon VMULU instruction James Hilliard
@ 2026-05-11 18:22 ` James Hilliard
  2026-05-11 18:22 ` [PATCH v6 23/35] target/mips: add Octeon V3MULU instruction James Hilliard
                   ` (12 subsequent siblings)
  34 siblings, 0 replies; 52+ messages in thread
From: James Hilliard @ 2026-05-11 18:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier,
	Philippe Mathieu-Daudé, Aurelien Jarno, Jiaxun Yang,
	Aleksandar Rikalo, Huacai Chen, James Hilliard

VMM0 performs the VMULU accumulation and writes the low 64-bit result to
rd, then feeds that result back into the multiplier as MTM0 would with a
zero high operand (MPL[0] = result, MPL[3] = 0). It sets MPL[1] to zero,
clears the partial products, and models the remaining architecturally
unpredictable multiplier lanes (MPL[2], MPL[4], and MPL[5]) as zero.

Add helper and translator support for this multiplier chain-update
operation.
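
Because VMM0 discards the partial products it has just produced, only
limb 0 of the VMULU sum survives, and that limb depends only on
MPL[0] * rs + rt + P[0] modulo 2^64. A minimal sketch of the resulting
state transition follows; OcteonMulModel and vmm0_model() are
hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the multiplier operand (MPL) and P lanes. */
typedef struct {
    uint64_t mpl[6];
    uint64_t p[6];
} OcteonMulModel;

/*
 * VMM0 sketch: only limb 0 of the VMULU sum is kept, and limb 0 depends
 * only on MPL[0] * rs + rt + P[0] (unsigned arithmetic wraps mod 2^64).
 * Afterwards MPL[0] holds the result; every other MPL lane and all P
 * lanes are zero.
 */
static uint64_t vmm0_model(OcteonMulModel *s, uint64_t rs, uint64_t rt)
{
    uint64_t lo = s->mpl[0] * rs + rt + s->p[0];

    memset(s->mpl, 0, sizeof(s->mpl));
    memset(s->p, 0, sizeof(s->p));
    s->mpl[0] = lo;
    return lo;
}
```

For instance, with MPL[0] = 3, MPL[1] = 99, P[0] = 5, rs = 10, and
rt = 2, the result is 37; the MPL[1] * rs term only affects limb 1, which
VMM0 throws away.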

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
---
Changes v2 -> v3:
  - Split VMM0 out of the combined Octeon arithmetic and memory
    instruction patch.  (requested by Richard Henderson)

Changes v3 -> v4:
  - Keep the Octeon3 MTM0-style high-lane update and set MPL[3] to zero
    when feeding the low result back.

Changes v5 -> v6:
  - Zero MPL1 and model the remaining MTM0 lanes as zero for
    deterministic emulation after checking the CN71XX VMM0 definition.
---
 target/mips/helper.h               |  1 +
 target/mips/tcg/octeon.decode      |  1 +
 target/mips/tcg/octeon_translate.c |  1 +
 target/mips/tcg/op_helper.c        | 20 ++++++++++++++++++++
 4 files changed, 23 insertions(+)

diff --git a/target/mips/helper.h b/target/mips/helper.h
index f1e78ae329..46ccad95c3 100644
--- a/target/mips/helper.h
+++ b/target/mips/helper.h
@@ -25,6 +25,7 @@ DEF_HELPER_3(crc32, tl, tl, tl, i32)
 DEF_HELPER_3(crc32c, tl, tl, tl, i32)
 DEF_HELPER_FLAGS_4(rotx, TCG_CALL_NO_RWG_SE, tl, tl, i32, i32, i32)
 DEF_HELPER_3(octeon_vmulu, i64, env, i64, i64)
+DEF_HELPER_3(octeon_vmm0, i64, env, i64, i64)
 
 /* microMIPS functions */
 DEF_HELPER_4(lwm, void, env, tl, tl, i32)
diff --git a/target/mips/tcg/octeon.decode b/target/mips/tcg/octeon.decode
index 75834afc6c..c60af2d39a 100644
--- a/target/mips/tcg/octeon.decode
+++ b/target/mips/tcg/octeon.decode
@@ -51,6 +51,7 @@ MTP2         011100 rs:5 rt:5 00000 00000 001011 &r2
 MTM1         011100 rs:5 rt:5 00000 00000 001100 &r2
 MTM2         011100 rs:5 rt:5 00000 00000 001101 &r2
 VMULU        011100 ..... ..... ..... 00000 001111 @r3
+VMM0         011100 ..... ..... ..... 00000 010000 @r3
 
 &saa         base rt
 @saa         ...... base:5 rt:5 ................ &saa
diff --git a/target/mips/tcg/octeon_translate.c b/target/mips/tcg/octeon_translate.c
index 348d0d8601..75ab1daa70 100644
--- a/target/mips/tcg/octeon_translate.c
+++ b/target/mips/tcg/octeon_translate.c
@@ -313,3 +313,4 @@ TRANS(MTP0, trans_mtp, 0);
 TRANS(MTP1, trans_mtp, 1);
 TRANS(MTP2, trans_mtp, 2);
 TRANS(VMULU, trans_vmul, gen_helper_octeon_vmulu);
+TRANS(VMM0, trans_vmul, gen_helper_octeon_vmm0);
diff --git a/target/mips/tcg/op_helper.c b/target/mips/tcg/op_helper.c
index ab3fb06a16..45e208ca43 100644
--- a/target/mips/tcg/op_helper.c
+++ b/target/mips/tcg/op_helper.c
@@ -176,6 +176,26 @@ uint64_t helper_octeon_vmulu(CPUMIPSState *env, uint64_t rs, uint64_t rt)
     return sum[0];
 }
 
+uint64_t helper_octeon_vmm0(CPUMIPSState *env, uint64_t rs, uint64_t rt)
+{
+    uint64_t lo = helper_octeon_vmulu(env, rs, rt);
+
+    /*
+     * VMM0 is architecturally equivalent to VMULU followed by MTM0 with
+     * the low result and a zero high operand.
+     */
+    env->active_tc.octeon.MPL[0] = lo;
+    env->active_tc.octeon.MPL[1] = 0;
+    env->active_tc.octeon.MPL[2] = 0;
+    env->active_tc.octeon.MPL[3] = 0;
+    env->active_tc.octeon.MPL[4] = 0;
+    env->active_tc.octeon.MPL[5] = 0;
+    for (int i = 0; i < ARRAY_SIZE(env->active_tc.octeon.P); i++) {
+        env->active_tc.octeon.P[i] = 0;
+    }
+    return lo;
+}
+
 /* these crc32 functions are based on target/loongarch/tcg/op_helper.c */
 target_ulong helper_crc32(target_ulong val, target_ulong m, uint32_t sz)
 {

-- 
2.54.0


* [PATCH v6 23/35] target/mips: add Octeon V3MULU instruction
  2026-05-11 18:22 [PATCH v6 00/35] target/mips: add missing Octeon user-mode support James Hilliard
                   ` (21 preceding siblings ...)
  2026-05-11 18:22 ` [PATCH v6 22/35] target/mips: add Octeon VMM0 instruction James Hilliard
@ 2026-05-11 18:22 ` James Hilliard
  2026-05-11 18:22 ` [PATCH v6 24/35] target/mips: add Octeon QMAC instructions James Hilliard
                   ` (11 subsequent siblings)
  34 siblings, 0 replies; 52+ messages in thread
From: James Hilliard @ 2026-05-11 18:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier,
	Philippe Mathieu-Daudé, Aurelien Jarno, Jiaxun Yang,
	Aleksandar Rikalo, Huacai Chen, James Hilliard

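V3MULU extends VMULU across the full Octeon3 multiplier state: it
multiplies the six-limb operand in MPL[5]..MPL[0] by rs and adds rt and
the six-limb partial product in P[5]..P[0].

The low 64 bits of the result are written to rd, and the remaining
accumulated limbs are shifted back into P[0] through P[5].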
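
The same accumulation can be written as a generic schoolbook loop over n
limbs, which may make the relationship between VMULU (n = 2) and V3MULU
(n = 6) easier to review. This is a sketch assuming unsigned __int128
support; vmul_model() is a hypothetical name. Each step fits in 128 bits
because (2^64 - 1)^2 + 2 * (2^64 - 1) = 2^128 - 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef unsigned __int128 u128;   /* GCC/Clang extension (assumed) */

/*
 * Schoolbook multiply-accumulate over n limbs:
 *   (mpl[n-1]..mpl[0]) * rs + rt + (p[n-1]..p[0]), truncated to n + 1 limbs.
 * Limb 0 is returned (the rd value); limbs 1..n are shifted into p[0..n-1].
 * n = 2 corresponds to VMULU and n = 6 to V3MULU.
 */
static uint64_t vmul_model(const uint64_t *mpl, uint64_t *p, size_t n,
                           uint64_t rs, uint64_t rt)
{
    uint64_t carry = rt;   /* rt enters at limb 0 */
    uint64_t out0 = 0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        /* At most 2^128 - 1, so the 128-bit temporary cannot overflow. */
        u128 t = (u128)mpl[i] * rs + p[i] + carry;
        uint64_t limb = (uint64_t)t;

        carry = (uint64_t)(t >> 64);
        if (i == 0) {
            out0 = limb;
        } else {
            p[i - 1] = limb;   /* p[i - 1] was already consumed */
        }
    }
    p[n - 1] = carry;          /* top limb */
    return out0;
}
```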

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
---
Changes v2 -> v3:
  - Split V3MULU out of the combined Octeon arithmetic and memory
    instruction patch.  (requested by Richard Henderson)

Changes v3 -> v4:
  - Keep the Octeon3 MPL3-MPL5/P3-P5 high lanes used by the two-source
    MTM/MTP forms and Cavium SDK/runtime code.
---
 target/mips/helper.h               |  1 +
 target/mips/tcg/octeon.decode      |  1 +
 target/mips/tcg/octeon_translate.c |  1 +
 target/mips/tcg/op_helper.c        | 46 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 49 insertions(+)

diff --git a/target/mips/helper.h b/target/mips/helper.h
index 46ccad95c3..08fda55ae1 100644
--- a/target/mips/helper.h
+++ b/target/mips/helper.h
@@ -26,6 +26,7 @@ DEF_HELPER_3(crc32c, tl, tl, tl, i32)
 DEF_HELPER_FLAGS_4(rotx, TCG_CALL_NO_RWG_SE, tl, tl, i32, i32, i32)
 DEF_HELPER_3(octeon_vmulu, i64, env, i64, i64)
 DEF_HELPER_3(octeon_vmm0, i64, env, i64, i64)
+DEF_HELPER_3(octeon_v3mulu, i64, env, i64, i64)
 
 /* microMIPS functions */
 DEF_HELPER_4(lwm, void, env, tl, tl, i32)
diff --git a/target/mips/tcg/octeon.decode b/target/mips/tcg/octeon.decode
index c60af2d39a..9c1fe8f4f1 100644
--- a/target/mips/tcg/octeon.decode
+++ b/target/mips/tcg/octeon.decode
@@ -52,6 +52,7 @@ MTM1         011100 rs:5 rt:5 00000 00000 001100 &r2
 MTM2         011100 rs:5 rt:5 00000 00000 001101 &r2
 VMULU        011100 ..... ..... ..... 00000 001111 @r3
 VMM0         011100 ..... ..... ..... 00000 010000 @r3
+V3MULU       011100 ..... ..... ..... 00000 010001 @r3
 
 &saa         base rt
 @saa         ...... base:5 rt:5 ................ &saa
diff --git a/target/mips/tcg/octeon_translate.c b/target/mips/tcg/octeon_translate.c
index 75ab1daa70..2d836afddb 100644
--- a/target/mips/tcg/octeon_translate.c
+++ b/target/mips/tcg/octeon_translate.c
@@ -314,3 +314,4 @@ TRANS(MTP1, trans_mtp, 1);
 TRANS(MTP2, trans_mtp, 2);
 TRANS(VMULU, trans_vmul, gen_helper_octeon_vmulu);
 TRANS(VMM0, trans_vmul, gen_helper_octeon_vmm0);
+TRANS(V3MULU, trans_vmul, gen_helper_octeon_v3mulu);
diff --git a/target/mips/tcg/op_helper.c b/target/mips/tcg/op_helper.c
index 45e208ca43..740c181d27 100644
--- a/target/mips/tcg/op_helper.c
+++ b/target/mips/tcg/op_helper.c
@@ -196,6 +196,52 @@ uint64_t helper_octeon_vmm0(CPUMIPSState *env, uint64_t rs, uint64_t rt)
     return lo;
 }
 
+uint64_t helper_octeon_v3mulu(CPUMIPSState *env, uint64_t rs, uint64_t rt)
+{
+    uint64_t lo, hi;
+    uint64_t sum[OCTEON_MULTIPLIER_REGS + 1] = {};
+
+    mulu64(&lo, &hi, env->active_tc.octeon.MPL[0], rs);
+    sum[0] = lo;
+    sum[1] = hi;
+
+    mulu64(&lo, &hi, env->active_tc.octeon.MPL[1], rs);
+    octeon_add_limb(sum, ARRAY_SIZE(sum), lo, 1);
+    octeon_add_limb(sum, ARRAY_SIZE(sum), hi, 2);
+
+    mulu64(&lo, &hi, env->active_tc.octeon.MPL[2], rs);
+    octeon_add_limb(sum, ARRAY_SIZE(sum), lo, 2);
+    octeon_add_limb(sum, ARRAY_SIZE(sum), hi, 3);
+
+    mulu64(&lo, &hi, env->active_tc.octeon.MPL[3], rs);
+    octeon_add_limb(sum, ARRAY_SIZE(sum), lo, 3);
+    octeon_add_limb(sum, ARRAY_SIZE(sum), hi, 4);
+
+    mulu64(&lo, &hi, env->active_tc.octeon.MPL[4], rs);
+    octeon_add_limb(sum, ARRAY_SIZE(sum), lo, 4);
+    octeon_add_limb(sum, ARRAY_SIZE(sum), hi, 5);
+
+    mulu64(&lo, &hi, env->active_tc.octeon.MPL[5], rs);
+    octeon_add_limb(sum, ARRAY_SIZE(sum), lo, 5);
+    octeon_add_limb(sum, ARRAY_SIZE(sum), hi, 6);
+
+    octeon_add_limb(sum, ARRAY_SIZE(sum), rt, 0);
+    octeon_add_limb(sum, ARRAY_SIZE(sum), env->active_tc.octeon.P[0], 0);
+    octeon_add_limb(sum, ARRAY_SIZE(sum), env->active_tc.octeon.P[1], 1);
+    octeon_add_limb(sum, ARRAY_SIZE(sum), env->active_tc.octeon.P[2], 2);
+    octeon_add_limb(sum, ARRAY_SIZE(sum), env->active_tc.octeon.P[3], 3);
+    octeon_add_limb(sum, ARRAY_SIZE(sum), env->active_tc.octeon.P[4], 4);
+    octeon_add_limb(sum, ARRAY_SIZE(sum), env->active_tc.octeon.P[5], 5);
+
+    env->active_tc.octeon.P[0] = sum[1];
+    env->active_tc.octeon.P[1] = sum[2];
+    env->active_tc.octeon.P[2] = sum[3];
+    env->active_tc.octeon.P[3] = sum[4];
+    env->active_tc.octeon.P[4] = sum[5];
+    env->active_tc.octeon.P[5] = sum[6];
+    return sum[0];
+}
+
 /* these crc32 functions are based on target/loongarch/tcg/op_helper.c */
 target_ulong helper_crc32(target_ulong val, target_ulong m, uint32_t sz)
 {

-- 
2.54.0


* [PATCH v6 24/35] target/mips: add Octeon QMAC instructions
  2026-05-11 18:22 [PATCH v6 00/35] target/mips: add missing Octeon user-mode support James Hilliard
                   ` (22 preceding siblings ...)
  2026-05-11 18:22 ` [PATCH v6 23/35] target/mips: add Octeon V3MULU instruction James Hilliard
@ 2026-05-11 18:22 ` James Hilliard
  2026-05-11 18:22 ` [PATCH v6 25/35] tests/tcg/mips: add Octeon instruction smoke test James Hilliard
                   ` (10 subsequent siblings)
  34 siblings, 0 replies; 52+ messages in thread
From: James Hilliard @ 2026-05-11 18:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier,
	Philippe Mathieu-Daudé, Aurelien Jarno, Jiaxun Yang,
	Aleksandar Rikalo, Huacai Chen, James Hilliard

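QMAC.0x and QMACS.0x multiply the signed Q15 halfword selected by the
lane field (rs<16*lane+15:16*lane>) by rt<15:0> and accumulate the Q31
product into the Octeon HI/LO accumulator state. A (-1.0) x (-1.0)
product, which is not representable in Q31, is clamped to INT32_MAX.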
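
A worked example of the Q15 x Q15 -> Q31 product, as a minimal sketch
that mirrors the lane selection and the clamp performed by
octeon_mul_q15_q15() in this patch; qmac_product_model() is a
hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Q31 product of the signed Q15 halfword in lane `lane` of rs and
 * rt<15:0>.  The integer product of two Q15 values has 30 fraction bits,
 * so it is doubled to reach Q31.
 */
static int32_t qmac_product_model(uint64_t rs, uint64_t rt, unsigned lane,
                                  bool *saturated)
{
    assert(lane < 4);
    int16_t a = (int16_t)(uint16_t)(rs >> (lane * 16));
    int16_t b = (int16_t)(uint16_t)rt;

    /* (-1.0) * (-1.0) = +1.0 does not fit in Q31; clamp to INT32_MAX. */
    if (a == INT16_MIN && b == INT16_MIN) {
        *saturated = true;
        return INT32_MAX;
    }
    return (int32_t)a * b * 2;
}
```

With 0x4000 (0.5) in lane 2 of rs and 0x2000 (0.25) in rt<15:0>, the
result is 2 * 16384 * 8192 = 0x10000000, which is 0.125 in Q31. Bits
above rt<15:0> are ignored.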

QMAC updates the full 64-bit HI/LO accumulator. QMACS saturates the
32-bit Q31 result in LO and keeps HI<0> as the sticky saturation flag.
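
The difference between the two accumulator updates can be sketched as
follows, taking the Q31 product as input. HiLoModel, qmac_acc(), and
qmacs_acc() are hypothetical names; HI and LO are modeled as the
sign-extended 64-bit register values used by the helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* HI and LO as sign-extended 64-bit register values (hypothetical model). */
typedef struct {
    uint64_t hi, lo;
} HiLoModel;

/* QMAC: HI<31:0>:LO<31:0> form one 64-bit accumulator; add with wraparound. */
static void qmac_acc(HiLoModel *r, int32_t product)
{
    uint64_t acc = ((uint64_t)(uint32_t)r->hi << 32) | (uint32_t)r->lo;

    acc += (uint64_t)(int64_t)product;
    r->lo = (uint64_t)(int64_t)(int32_t)(uint32_t)acc;
    r->hi = (uint64_t)(int64_t)(int32_t)(uint32_t)(acc >> 32);
}

/* QMACS: saturating 32-bit add into LO; HI holds a sticky overflow flag. */
static void qmacs_acc(HiLoModel *r, int32_t product, bool product_saturated)
{
    bool sticky = (r->hi & 1) || product_saturated;
    int64_t sum = (int64_t)(int32_t)(uint32_t)r->lo + product;

    assert(!product_saturated || product == INT32_MAX);
    if (sum > INT32_MAX) {
        sum = INT32_MAX;
        sticky = true;
    } else if (sum < INT32_MIN) {
        sum = INT32_MIN;
        sticky = true;
    }
    r->lo = (uint64_t)sum;   /* sum fits in int32, so this sign-extends */
    r->hi = sticky;
}
```

For example, starting from LO = 0x7fffffff and HI = 0, adding 1 with
QMAC carries into the 64-bit accumulator (the HI:LO value becomes
0x80000000), while QMACS clamps LO at 0x7fffffff and sets HI to 1, which
then stays set across later QMACS operations.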

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
---
 target/mips/helper.h               |  2 ++
 target/mips/tcg/octeon.decode      |  5 ++++
 target/mips/tcg/octeon_translate.c | 16 +++++++++++
 target/mips/tcg/op_helper.c        | 59 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 82 insertions(+)

diff --git a/target/mips/helper.h b/target/mips/helper.h
index 08fda55ae1..e93bc37903 100644
--- a/target/mips/helper.h
+++ b/target/mips/helper.h
@@ -27,6 +27,8 @@ DEF_HELPER_FLAGS_4(rotx, TCG_CALL_NO_RWG_SE, tl, tl, i32, i32, i32)
 DEF_HELPER_3(octeon_vmulu, i64, env, i64, i64)
 DEF_HELPER_3(octeon_vmm0, i64, env, i64, i64)
 DEF_HELPER_3(octeon_v3mulu, i64, env, i64, i64)
+DEF_HELPER_4(octeon_qmac, void, env, i64, i64, i32)
+DEF_HELPER_4(octeon_qmacs, void, env, i64, i64, i32)
 
 /* microMIPS functions */
 DEF_HELPER_4(lwm, void, env, tl, tl, i32)
diff --git a/target/mips/tcg/octeon.decode b/target/mips/tcg/octeon.decode
index 9c1fe8f4f1..5edcd95884 100644
--- a/target/mips/tcg/octeon.decode
+++ b/target/mips/tcg/octeon.decode
@@ -28,9 +28,12 @@ BBIT         11 set:1 . 10 rs:5 ..... offset:s16 p=%bbit_p
 # SEQI rt, rs, immediate
 # SNE rd, rs, rt
 # SNEI rt, rs, immediate
+# QMAC.0x rs, rt
+# QMACS.0x rs, rt
 
 @r3          ...... rs:5 rt:5 rd:5 ..... ......
 &cmpi        rs rt imm
+&qmac        rs rt lane
 %bitfield_p  0:1 6:5
 @bitfield    ...... rs:5 rt:5 lenm1:5 ..... ..... . p=%bitfield_p
 
@@ -43,6 +46,8 @@ SEQ          011100 ..... ..... ..... 00000 101010 @r3
 SNE          011100 ..... ..... ..... 00000 101011 @r3
 SEQI         011100 rs:5 rt:5 imm:s10 101110 &cmpi
 SNEI         011100 rs:5 rt:5 imm:s10 101111 &cmpi
+QMACS        011100 rs:5 rt:5 00000 000 lane:2 010010 &qmac
+QMAC         011100 rs:5 rt:5 00000 100 lane:2 010010 &qmac
 &r2          rs rt
 MTM0         011100 rs:5 rt:5 00000 00000 001000 &r2
 MTP0         011100 rs:5 rt:5 00000 00000 001001 &r2
diff --git a/target/mips/tcg/octeon_translate.c b/target/mips/tcg/octeon_translate.c
index 2d836afddb..b41bc1f81e 100644
--- a/target/mips/tcg/octeon_translate.c
+++ b/target/mips/tcg/octeon_translate.c
@@ -14,6 +14,8 @@
 #include "decode-octeon.c.inc"
 
 typedef void gen_helper_octeon_vmul(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i64);
+typedef void gen_helper_octeon_qmac_fn(TCGv_ptr, TCGv_i64, TCGv_i64,
+                                       TCGv_i32);
 
 static bool trans_BBIT(DisasContext *ctx, arg_BBIT *a)
 {
@@ -156,6 +158,18 @@ static bool trans_SNEI(DisasContext *ctx, arg_SNEI *a)
     return do_seqi_snei(ctx, a, TCG_COND_NE);
 }
 
+static bool trans_qmac(DisasContext *ctx, arg_qmac *a,
+                       gen_helper_octeon_qmac_fn *helper)
+{
+    TCGv_i64 rs = tcg_temp_new_i64();
+    TCGv_i64 rt = tcg_temp_new_i64();
+
+    gen_load_gpr(rs, a->rs);
+    gen_load_gpr(rt, a->rt);
+    helper(tcg_env, rs, rt, tcg_constant_i32(a->lane));
+    return true;
+}
+
 static bool trans_lx(DisasContext *ctx, arg_lx *a, MemOp mop)
 {
     gen_lx(ctx, a->rd, a->base, a->index, mop);
@@ -299,6 +313,8 @@ static bool trans_vmul(DisasContext *ctx, arg_decode_ext_octeon1 *a,
 
 TRANS(SAA,  trans_saa, MO_UL);
 TRANS(SAAD, trans_saa, MO_UQ);
+TRANS(QMAC,  trans_qmac, gen_helper_octeon_qmac);
+TRANS(QMACS, trans_qmac, gen_helper_octeon_qmacs);
 TRANS(LBX,  trans_lx, MO_SB);
 TRANS(LBUX, trans_lx, MO_UB);
 TRANS(LHX,  trans_lx, MO_SW);
diff --git a/target/mips/tcg/op_helper.c b/target/mips/tcg/op_helper.c
index 740c181d27..0a892e31a8 100644
--- a/target/mips/tcg/op_helper.c
+++ b/target/mips/tcg/op_helper.c
@@ -144,6 +144,65 @@ target_ulong helper_rotx(target_ulong rs, uint32_t shift, uint32_t shiftx,
     return (int64_t)(int32_t)(uint32_t)tmp5;
 }
 
+static int32_t octeon_mul_q15_q15(int16_t a, int16_t b, bool *overflow)
+{
+    if (a == INT16_MIN && b == INT16_MIN) {
+        *overflow = true;
+        return INT32_MAX;
+    }
+    return (int32_t)a * b * 2;
+}
+
+static int32_t octeon_sat32_acc_q31(int32_t acc, int32_t value,
+                                    bool *overflow)
+{
+    int64_t sum = (int64_t)acc + value;
+
+    if (sum > INT32_MAX) {
+        *overflow = true;
+        return INT32_MAX;
+    }
+    if (sum < INT32_MIN) {
+        *overflow = true;
+        return INT32_MIN;
+    }
+    return sum;
+}
+
+static int16_t octeon_qmac_lane(uint64_t rs, uint32_t lane)
+{
+    return (int16_t)(uint16_t)extract64(rs, lane * 16, 16);
+}
+
+void helper_octeon_qmac(CPUMIPSState *env, uint64_t rs, uint64_t rt,
+                        uint32_t lane)
+{
+    bool overflow = false;
+    int32_t product;
+    int64_t acc;
+
+    product = octeon_mul_q15_q15((int16_t)(uint16_t)rt,
+                                 octeon_qmac_lane(rs, lane), &overflow);
+    acc = deposit64(env->active_tc.LO[0], 32, 32, env->active_tc.HI[0]);
+    acc += product;
+
+    env->active_tc.LO[0] = (int64_t)(int32_t)acc;
+    env->active_tc.HI[0] = (int64_t)(int32_t)((uint64_t)acc >> 32);
+}
+
+void helper_octeon_qmacs(CPUMIPSState *env, uint64_t rs, uint64_t rt,
+                         uint32_t lane)
+{
+    bool overflow = env->active_tc.HI[0] & 1;
+    int32_t product;
+
+    product = octeon_mul_q15_q15((int16_t)(uint16_t)rt,
+                                 octeon_qmac_lane(rs, lane), &overflow);
+    env->active_tc.LO[0] = octeon_sat32_acc_q31(
+        (int32_t)(uint32_t)env->active_tc.LO[0], product, &overflow);
+    env->active_tc.HI[0] = overflow;
+}
+
 static void octeon_add_limb(uint64_t *sum, int limb_count,
                             uint64_t value, int limb)
 {

-- 
2.54.0




* [PATCH v6 25/35] tests/tcg/mips: add Octeon instruction smoke test
  2026-05-11 18:22 [PATCH v6 00/35] target/mips: add missing Octeon user-mode support James Hilliard
                   ` (23 preceding siblings ...)
  2026-05-11 18:22 ` [PATCH v6 24/35] target/mips: add Octeon QMAC instructions James Hilliard
@ 2026-05-11 18:22 ` James Hilliard
  2026-05-11 18:22 ` [PATCH v6 26/35] target/mips: add Octeon LA* atomic instructions James Hilliard
                   ` (9 subsequent siblings)
  34 siblings, 0 replies; 52+ messages in thread
From: James Hilliard @ 2026-05-11 18:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier,
	Philippe Mathieu-Daudé, Aurelien Jarno, Jiaxun Yang,
	Aleksandar Rikalo, Huacai Chen, James Hilliard

Add a mips64/mips64el linux-user TCG smoke test covering representative
Octeon integer, comparison, population-count, and multiplier
instructions. Include hardware-backed regression checks that VMM0 zeroes
MPL1 and that MTP0 zeroes P1.

Run the test with -cpu Octeon68XX and share the source between the
mips64 and mips64el target directories.
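The expected values in main() can be derived with plain host arithmetic. The following is a minimal sketch that is not part of the patch. The ref_* names are hypothetical, and the sketch assumes that BADDU adds only the low bytes of its operands and that VMULU/VMM0 return the low 64 bits of MPL0 * rs + P0 + rt, with MPL1/MPL2 and the remaining P registers zero (and P0 zero in the VMULU case).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical reference models. They only cover the operand ranges used
 * by the smoke test and ignore carries into MPL1/MPL2 and P1/P2.
 */
static uint64_t ref_baddu(uint64_t rs, uint64_t rt)
{
    return (uint8_t)(rs + rt);      /* byte add, result zero-extended */
}

static uint64_t ref_dpop(uint64_t rs)
{
    uint64_t n = 0;

    for (; rs; rs &= rs - 1) {      /* clear the lowest set bit each pass */
        n++;
    }
    return n;
}

static uint64_t ref_vmm0(uint64_t mpl0, uint64_t p0, uint64_t rs, uint64_t rt)
{
    return mpl0 * rs + p0 + rt;     /* low 64 bits of the product sum */
}
```

With these models, ref_vmm0(5, 0, 7, 11) gives the 46 expected from octeon_vmulu(5, 7, 11), and ref_vmm0(5, 13, 7, 11) gives the 59 expected from octeon_vmm0().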

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
---
Changes v2 -> v3:
  - Split the smoke test out of the combined Octeon arithmetic and memory
    instruction patch.  (requested by Richard Henderson)

Changes v5 -> v6:
  - Add VMM0/MPL1 and MTP0/P1 reset checks for the CN71XX-defined
    reset-state behavior.
---
 MAINTAINERS                                   |   2 +
 tests/tcg/mips/Makefile.target                |  11 ++
 tests/tcg/mips/user/isa/octeon/octeon-insns.c | 204 ++++++++++++++++++++++++++
 tests/tcg/mips64/Makefile.target              |  20 +++
 tests/tcg/mips64el/Makefile.target            |   8 +
 5 files changed, 245 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 93a1e4e482..f7e9c1b6b0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -311,6 +311,8 @@ F: target/mips/
 F: disas/*mips.c
 F: docs/system/cpu-models-mips.rst.inc
 F: tests/tcg/mips/
+F: tests/tcg/mips64/
+F: tests/tcg/mips64el/
 
 OpenRISC TCG CPUs
 M: Stafford Horne <shorne@gmail.com>
diff --git a/tests/tcg/mips/Makefile.target b/tests/tcg/mips/Makefile.target
index 5d17c1706e..d9dc16f8ec 100644
--- a/tests/tcg/mips/Makefile.target
+++ b/tests/tcg/mips/Makefile.target
@@ -8,6 +8,17 @@ MIPS_SRC=$(SRC_PATH)/tests/tcg/mips
 # Set search path for all sources
 VPATH 		+= $(MIPS_SRC)
 
+ifneq ($(findstring 64,$(TARGET_NAME)),)
+VPATH += $(MIPS_SRC)/user/isa/octeon
+
+MIPS64_TESTS=octeon-insns
+
+TESTS += $(MIPS64_TESTS)
+
+octeon-insns: CFLAGS+=-mabi=64
+run-octeon-insns: QEMU_OPTS+=-cpu Octeon68XX
+endif
+
 # hello-mips is 32 bit only
 ifeq ($(findstring 64,$(TARGET_NAME)),)
 MIPS_TESTS=hello-mips
diff --git a/tests/tcg/mips/user/isa/octeon/octeon-insns.c b/tests/tcg/mips/user/isa/octeon/octeon-insns.c
new file mode 100644
index 0000000000..9153e37e9e
--- /dev/null
+++ b/tests/tcg/mips/user/isa/octeon/octeon-insns.c
@@ -0,0 +1,204 @@
+/*
+ * Test Octeon-specific user-mode instructions.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include <assert.h>
+#include <stdint.h>
+
+static uint64_t octeon_baddu(uint64_t rs, uint64_t rt)
+{
+    uint64_t rd;
+
+    asm volatile(
+        "move $8, %[rs]\n\t"
+        "move $9, %[rt]\n\t"
+        ".word 0x71095028\n\t" /* baddu $10, $8, $9 */
+        "move %[rd], $10\n\t"
+        : [rd] "=r" (rd)
+        : [rs] "r" (rs), [rt] "r" (rt)
+        : "$8", "$9", "$10");
+
+    return rd;
+}
+
+static uint64_t octeon_dmul(uint64_t rs, uint64_t rt)
+{
+    uint64_t rd;
+
+    asm volatile(
+        "move $8, %[rs]\n\t"
+        "move $9, %[rt]\n\t"
+        ".word 0x71095003\n\t" /* dmul $10, $8, $9 */
+        "move %[rd], $10\n\t"
+        : [rd] "=r" (rd)
+        : [rs] "r" (rs), [rt] "r" (rt)
+        : "$8", "$9", "$10");
+
+    return rd;
+}
+
+static uint64_t octeon_dpop(uint64_t rs)
+{
+    uint64_t rd;
+
+    asm volatile(
+        "move $8, %[rs]\n\t"
+        ".word 0x7100502d\n\t" /* dpop $10, $8 */
+        "move %[rd], $10\n\t"
+        : [rd] "=r" (rd)
+        : [rs] "r" (rs)
+        : "$8", "$10");
+
+    return rd;
+}
+
+static uint64_t octeon_seq(uint64_t rs, uint64_t rt)
+{
+    uint64_t rd;
+
+    asm volatile(
+        "move $8, %[rs]\n\t"
+        "move $9, %[rt]\n\t"
+        ".word 0x7109502a\n\t" /* seq $10, $8, $9 */
+        "move %[rd], $10\n\t"
+        : [rd] "=r" (rd)
+        : [rs] "r" (rs), [rt] "r" (rt)
+        : "$8", "$9", "$10");
+
+    return rd;
+}
+
+static uint64_t octeon_sne(uint64_t rs, uint64_t rt)
+{
+    uint64_t rd;
+
+    asm volatile(
+        "move $8, %[rs]\n\t"
+        "move $9, %[rt]\n\t"
+        ".word 0x7109502b\n\t" /* sne $10, $8, $9 */
+        "move %[rd], $10\n\t"
+        : [rd] "=r" (rd)
+        : [rs] "r" (rs), [rt] "r" (rt)
+        : "$8", "$9", "$10");
+
+    return rd;
+}
+
+static uint64_t octeon_vmulu(uint64_t mpl0, uint64_t rs, uint64_t rt)
+{
+    uint64_t rd;
+
+    asm volatile(
+        "move $8, %[mpl0]\n\t"
+        "move $9, $0\n\t"
+        ".word 0x71090008\n\t" /* mtm0 $8, $9 */
+        "move $8, %[rs]\n\t"
+        "move $9, %[rt]\n\t"
+        ".word 0x7109500f\n\t" /* vmulu $10, $8, $9 */
+        "move %[rd], $10\n\t"
+        : [rd] "=r" (rd)
+        : [mpl0] "r" (mpl0), [rs] "r" (rs), [rt] "r" (rt)
+        : "$8", "$9", "$10");
+
+    return rd;
+}
+
+static uint64_t octeon_vmm0(uint64_t mpl0, uint64_t p0,
+                            uint64_t rs, uint64_t rt)
+{
+    uint64_t rd;
+
+    asm volatile(
+        "move $8, %[mpl0]\n\t"
+        "move $9, $0\n\t"
+        ".word 0x71090008\n\t" /* mtm0 $8, $9 */
+        "move $8, %[p0]\n\t"
+        "move $9, $0\n\t"
+        ".word 0x71090009\n\t" /* mtp0 $8, $9 */
+        "move $8, %[rs]\n\t"
+        "move $9, %[rt]\n\t"
+        ".word 0x71095010\n\t" /* vmm0 $10, $8, $9 */
+        "move %[rd], $10\n\t"
+        : [rd] "=r" (rd)
+        : [mpl0] "r" (mpl0), [p0] "r" (p0),
+          [rs] "r" (rs), [rt] "r" (rt)
+        : "$8", "$9", "$10");
+
+    return rd;
+}
+
+static uint64_t octeon_vmm0_zeroes_mpl1(void)
+{
+    uint64_t rd;
+
+    asm volatile(
+        "move $8, %[mpl0]\n\t"
+        "move $9, $0\n\t"
+        ".word 0x71090008\n\t" /* mtm0 $8, $9 */
+        "move $8, %[mpl1]\n\t"
+        "move $9, $0\n\t"
+        ".word 0x7109000c\n\t" /* mtm1 $8, $9 */
+        "move $8, %[vmm0_rs]\n\t"
+        "move $9, $0\n\t"
+        ".word 0x71095010\n\t" /* vmm0 $10, $8, $9 */
+        "move $8, %[vmulu_rs]\n\t"
+        "move $9, $0\n\t"
+        ".word 0x7109500f\n\t" /* vmulu $10, $8, $9 */
+        "move $8, $0\n\t"
+        "move $9, $0\n\t"
+        ".word 0x7109500f\n\t" /* vmulu $10, $8, $9 */
+        "move %[rd], $10\n\t"
+        : [rd] "=r" (rd)
+        : [mpl0] "r" (1ULL), [mpl1] "r" (1ULL),
+          [vmm0_rs] "r" (2ULL), [vmulu_rs] "r" (1ULL)
+        : "$8", "$9", "$10");
+
+    return rd;
+}
+
+static uint64_t octeon_mtp0_zeroes_p1(void)
+{
+    uint64_t rd;
+
+    asm volatile(
+        "move $8, %[mpl0]\n\t"
+        "move $9, $0\n\t"
+        ".word 0x71090008\n\t" /* mtm0 $8, $9 */
+        "move $8, %[p1]\n\t"
+        "move $9, $0\n\t"
+        ".word 0x7109000a\n\t" /* mtp1 $8, $9 */
+        "move $8, $0\n\t"
+        "move $9, $0\n\t"
+        ".word 0x71090009\n\t" /* mtp0 $8, $9 */
+        "move $8, $0\n\t"
+        "move $9, $0\n\t"
+        ".word 0x7109500f\n\t" /* vmulu $10, $8, $9 */
+        "move $8, $0\n\t"
+        "move $9, $0\n\t"
+        ".word 0x7109500f\n\t" /* vmulu $10, $8, $9 */
+        "move %[rd], $10\n\t"
+        : [rd] "=r" (rd)
+        : [mpl0] "r" (0ULL), [p1] "r" (1ULL)
+        : "$8", "$9", "$10");
+
+    return rd;
+}
+
+int main(void)
+{
+    assert(octeon_baddu(0x123, 0x0f0) == 0x13);
+    assert(octeon_dmul(0x12345678, 0x10) == 0x123456780);
+    assert(octeon_dpop(0xf0f0f0f0f0f0f0f0ULL) == 32);
+    assert(octeon_seq(0xabc, 0xabc) == 1);
+    assert(octeon_seq(0xabc, 0xdef) == 0);
+    assert(octeon_sne(0xabc, 0xabc) == 0);
+    assert(octeon_sne(0xabc, 0xdef) == 1);
+    assert(octeon_vmulu(5, 7, 11) == 46);
+    assert(octeon_vmm0(5, 13, 7, 11) == 59);
+    assert(octeon_vmm0_zeroes_mpl1() == 0);
+    assert(octeon_mtp0_zeroes_p1() == 0);
+
+    return 0;
+}
diff --git a/tests/tcg/mips64/Makefile.target b/tests/tcg/mips64/Makefile.target
new file mode 100644
index 0000000000..042855844a
--- /dev/null
+++ b/tests/tcg/mips64/Makefile.target
@@ -0,0 +1,20 @@
+# -*- Mode: makefile -*-
+#
+# SPDX-License-Identifier: GPL-2.0-or-later
+#
+# MIPS64 - included from tests/tcg/Makefile.target
+#
+
+MIPS64_SRC=$(SRC_PATH)/tests/tcg/mips64
+MIPS_OCTEON_SRC=$(SRC_PATH)/tests/tcg/mips/user/isa/octeon
+
+# Set search path for all sources
+VPATH 		+= $(MIPS64_SRC) $(MIPS_OCTEON_SRC)
+
+MIPS64_TESTS=octeon-insns
+
+TESTS += $(MIPS64_TESTS)
+
+$(MIPS64_TESTS): CFLAGS+=-mabi=64
+
+run-octeon-insns: QEMU_OPTS+=-cpu Octeon68XX
diff --git a/tests/tcg/mips64el/Makefile.target b/tests/tcg/mips64el/Makefile.target
new file mode 100644
index 0000000000..dbc5f8dc5f
--- /dev/null
+++ b/tests/tcg/mips64el/Makefile.target
@@ -0,0 +1,8 @@
+# -*- Mode: makefile -*-
+#
+# SPDX-License-Identifier: GPL-2.0-or-later
+#
+# MIPS64 little-endian - included from tests/tcg/Makefile.target
+#
+
+include $(SRC_PATH)/tests/tcg/mips64/Makefile.target

-- 
2.54.0




* [PATCH v6 26/35] target/mips: add Octeon LA* atomic instructions
  2026-05-11 18:22 [PATCH v6 00/35] target/mips: add missing Octeon user-mode support James Hilliard
                   ` (24 preceding siblings ...)
  2026-05-11 18:22 ` [PATCH v6 25/35] tests/tcg/mips: add Octeon instruction smoke test James Hilliard
@ 2026-05-11 18:22 ` James Hilliard
  2026-05-11 18:22 ` [PATCH v6 27/35] target/mips: add Octeon COP2 crypto core support James Hilliard
                   ` (8 subsequent siblings)
  34 siblings, 0 replies; 52+ messages in thread
From: James Hilliard @ 2026-05-11 18:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier,
	Philippe Mathieu-Daudé, Aurelien Jarno, Jiaxun Yang,
	Aleksandar Rikalo, Huacai Chen, James Hilliard

Implement the Octeon LA* read-modify-write atomic instruction family:
LAI/LAID, LAD/LADD, LAA/LAAD, LAS/LASD, LAC/LACD, and LAW/LAWD.

Unlike SAA/SAAD, which only store, these operations return the previous
memory value in rd. Existing Octeon user-mode code uses them for atomic
counters, bit operations, and exchange-style updates.
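The per-instruction semantics can be summarized with C11 atomics. This is a host-side sketch that assumes each LA* form behaves as the TRANS() table in octeon_translate.c maps it (fetch-add of 1, -1 or rt; exchange with -1, 0 or rt) and writes the old memory value to rd. The model_* functions are hypothetical, only the doubleword forms plus LAI are shown, and alignment faults are not modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Each model atomically updates *p and returns the pre-update value (rd). */
static uint64_t model_laid(_Atomic uint64_t *p)              /* increment */
{
    return atomic_fetch_add(p, 1);
}

static uint64_t model_ladd(_Atomic uint64_t *p)              /* decrement */
{
    return atomic_fetch_sub(p, 1);  /* same as the patch's fetch-add of -1 */
}

static uint64_t model_laad(_Atomic uint64_t *p, uint64_t v)  /* add rt */
{
    return atomic_fetch_add(p, v);
}

static uint64_t model_lasd(_Atomic uint64_t *p)              /* set all ones */
{
    return atomic_exchange(p, UINT64_MAX);
}

static uint64_t model_lacd(_Atomic uint64_t *p)              /* clear */
{
    return atomic_exchange(p, 0);
}

static uint64_t model_lawd(_Atomic uint64_t *p, uint64_t v)  /* swap rt */
{
    return atomic_exchange(p, v);
}

/* Word form: the old 32-bit value is sign-extended, as MO_SL selects. */
static int64_t model_lai(_Atomic int32_t *p)
{
    return atomic_fetch_add(p, 1);  /* C11 atomics wrap on signed overflow */
}
```

The word and doubleword variants differ only in access size and in the sign extension of the returned word, which is why the translator can share one i64 path and select the width through the MemOp.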

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
---
Changes v1 -> v2:
  - Keep LA* atomics naturally aligned per Octeon L2 transaction
    semantics.
  - Use explicit i64 TCG ops in the LA* translator paths.  (suggested by
    Philippe Mathieu-Daudé)

Changes v2 -> v3:
  - Drop redundant TARGET_LONG_BITS guards from doubleword atomic paths.
    (suggested by Richard Henderson)
  - Group LA* translator wrappers by argument shape instead of adding one
    wrapper per instruction.  (suggested by Richard Henderson)

Changes v3 -> v4:
  - Use i64 atomic helpers for both word and doubleword paths and select
    word sign-extension through MO_SL.  (suggested by Richard Henderson)

Changes v5 -> v6:
  - Rename the shared translator helpers to distinguish fetch-add and
    exchange operations.
---
 target/mips/tcg/octeon.decode      | 17 +++++++++
 target/mips/tcg/octeon_translate.c | 74 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 91 insertions(+)

diff --git a/target/mips/tcg/octeon.decode b/target/mips/tcg/octeon.decode
index 5edcd95884..801d35d680 100644
--- a/target/mips/tcg/octeon.decode
+++ b/target/mips/tcg/octeon.decode
@@ -64,6 +64,23 @@ V3MULU       011100 ..... ..... ..... 00000 010001 @r3
 SAA          011100 ..... ..... 00000 00000 011000 @saa
 SAAD         011100 ..... ..... 00000 00000 011001 @saa
 
+&la          base rd
+&laa         base add rd
+@la          ...... base:5 ..... rd:5 ........... &la
+@laa         ...... base:5 add:5 rd:5 ........... &laa
+LAI          011100 ..... 00000 ..... 00010 011111 @la
+LAID         011100 ..... 00000 ..... 00011 011111 @la
+LAD          011100 ..... 00000 ..... 00110 011111 @la
+LADD         011100 ..... 00000 ..... 00111 011111 @la
+LAA          011100 ..... ..... ..... 10010 011111 @laa
+LAAD         011100 ..... ..... ..... 10011 011111 @laa
+LAS          011100 ..... 00000 ..... 01010 011111 @la
+LASD         011100 ..... 00000 ..... 01011 011111 @la
+LAC          011100 ..... 00000 ..... 01110 011111 @la
+LACD         011100 ..... 00000 ..... 01111 011111 @la
+LAW          011100 ..... ..... ..... 10110 011111 @laa
+LAWD         011100 ..... ..... ..... 10111 011111 @laa
+
 &zcb         base
 ZCB          011100 base:5 00000 00000 11100 011111 &zcb
 ZCB          011100 base:5 00000 00000 11101 011111 &zcb  # ZCBT
diff --git a/target/mips/tcg/octeon_translate.c b/target/mips/tcg/octeon_translate.c
index b41bc1f81e..07674f0d44 100644
--- a/target/mips/tcg/octeon_translate.c
+++ b/target/mips/tcg/octeon_translate.c
@@ -190,6 +190,68 @@ static bool trans_saa(DisasContext *ctx, arg_saa *a, MemOp mop)
     return true;
 }
 
+static bool trans_la_fetch_add(DisasContext *ctx, int base, int add_reg,
+                               int rd, int64_t imm, MemOp mop)
+{
+    TCGv_i64 addr = tcg_temp_new_i64();
+    TCGv_i64 value = tcg_temp_new_i64();
+    TCGv_i64 old = tcg_temp_new_i64();
+    MemOp amo = mo_endian(ctx) | mop | MO_ALIGN;
+
+    gen_base_offset_addr(ctx, addr, base, 0);
+
+    if (add_reg >= 0) {
+        gen_load_gpr(value, add_reg);
+    } else {
+        tcg_gen_movi_i64(value, imm);
+    }
+
+    tcg_gen_atomic_fetch_add_i64(old, addr, value, ctx->mem_idx, amo);
+    gen_store_gpr(old, rd);
+    return true;
+}
+
+static bool trans_la_xchg(DisasContext *ctx, int base, int add_reg, int rd,
+                          int64_t imm, MemOp mop)
+{
+    TCGv_i64 addr = tcg_temp_new_i64();
+    TCGv_i64 value = tcg_temp_new_i64();
+    TCGv_i64 old = tcg_temp_new_i64();
+    MemOp amo = mo_endian(ctx) | mop | MO_ALIGN;
+
+    gen_base_offset_addr(ctx, addr, base, 0);
+
+    if (add_reg >= 0) {
+        gen_load_gpr(value, add_reg);
+    } else {
+        tcg_gen_movi_i64(value, imm);
+    }
+
+    tcg_gen_atomic_xchg_i64(old, addr, value, ctx->mem_idx, amo);
+    gen_store_gpr(old, rd);
+    return true;
+}
+
+static bool do_la_imm_add(DisasContext *ctx, arg_la *a, int64_t imm, MemOp mop)
+{
+    return trans_la_fetch_add(ctx, a->base, -1, a->rd, imm, mop);
+}
+
+static bool do_la_reg_add(DisasContext *ctx, arg_laa *a, MemOp mop)
+{
+    return trans_la_fetch_add(ctx, a->base, a->add, a->rd, 0, mop);
+}
+
+static bool do_la_imm_xchg(DisasContext *ctx, arg_la *a, int64_t imm, MemOp mop)
+{
+    return trans_la_xchg(ctx, a->base, -1, a->rd, imm, mop);
+}
+
+static bool do_la_reg_xchg(DisasContext *ctx, arg_laa *a, MemOp mop)
+{
+    return trans_la_xchg(ctx, a->base, a->add, a->rd, 0, mop);
+}
+
 static bool trans_ZCB(DisasContext *ctx, arg_ZCB *a)
 {
     TCGv_i64 addr = tcg_temp_new_i64();
@@ -313,6 +375,18 @@ static bool trans_vmul(DisasContext *ctx, arg_decode_ext_octeon1 *a,
 
 TRANS(SAA,  trans_saa, MO_UL);
 TRANS(SAAD, trans_saa, MO_UQ);
+TRANS(LAI,  do_la_imm_add, 1, MO_SL);
+TRANS(LAID, do_la_imm_add, 1, MO_UQ);
+TRANS(LAD,  do_la_imm_add, -1, MO_SL);
+TRANS(LADD, do_la_imm_add, -1, MO_UQ);
+TRANS(LAA,  do_la_reg_add, MO_SL);
+TRANS(LAAD, do_la_reg_add, MO_UQ);
+TRANS(LAS,  do_la_imm_xchg, -1, MO_SL);
+TRANS(LASD, do_la_imm_xchg, -1, MO_UQ);
+TRANS(LAC,  do_la_imm_xchg, 0, MO_SL);
+TRANS(LACD, do_la_imm_xchg, 0, MO_UQ);
+TRANS(LAW,  do_la_reg_xchg, MO_SL);
+TRANS(LAWD, do_la_reg_xchg, MO_UQ);
 TRANS(QMAC,  trans_qmac, gen_helper_octeon_qmac);
 TRANS(QMACS, trans_qmac, gen_helper_octeon_qmacs);
 TRANS(LBX,  trans_lx, MO_SB);

-- 
2.54.0




* [PATCH v6 27/35] target/mips: add Octeon COP2 crypto core support
  2026-05-11 18:22 [PATCH v6 00/35] target/mips: add missing Octeon user-mode support James Hilliard
                   ` (25 preceding siblings ...)
  2026-05-11 18:22 ` [PATCH v6 26/35] target/mips: add Octeon LA* atomic instructions James Hilliard
@ 2026-05-11 18:22 ` James Hilliard
  2026-05-11 18:22 ` [PATCH v6 28/35] target/mips: add Octeon SMS4 crypto support James Hilliard
                   ` (7 subsequent siblings)
  34 siblings, 0 replies; 52+ messages in thread
From: James Hilliard @ 2026-05-11 18:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier,
	Philippe Mathieu-Daudé, Aurelien Jarno, Jiaxun Yang,
	Aleksandar Rikalo, Huacai Chen, James Hilliard

Octeon processors expose their crypto engines through selector-driven
DMFC2/DMTC2 accesses rather than architected standalone opcodes. Add the
common COP2 state, selector decode, and helper plumbing for the base
engine set.

This covers the hash, AES, CRC, GFM, 3DES, KASUMI, and SNOW3G engines
and places the implementation in a new octeon_crypto.c file to keep the
MIPS helper layer manageable.
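As an illustration of the CRC engine's non-reflected mode, the bit loop in octeon_crc_update_normal() is the classic MSB-first shift-and-XOR CRC. A standalone sketch of the same loop follows. The function name and the CRC-32/MPEG-2 parameters used to check it (poly 0x04C11DB7, initial value 0xFFFFFFFF, no final XOR) are chosen for illustration and are not taken from Octeon documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * MSB-first (non-reflected) CRC-32 update over a byte buffer, using the
 * same per-bit shift-and-XOR step as octeon_crc_update_normal(). The
 * initial value and any final XOR are left to the caller.
 */
static uint32_t crc32_msb_update(uint32_t crc, uint32_t poly,
                                 const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint32_t)buf[i] << 24;      /* fold the next byte into the top */
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 0x80000000u) ? (crc << 1) ^ poly : crc << 1;
        }
    }
    return crc;
}
```

Feeding the standard check string "123456789" with those parameters yields 0x0376E6E7, the published CRC-32/MPEG-2 check value. The Octeon helper differs only in that it takes its bytes MSB first from a 64-bit register value instead of from memory.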

Model the AES RESINP selector bank as writable as well as
readable. Octeon COP2 engines use these slots as result-input staging
registers, so the shared bank belongs with the base selector support.

Extend the TCG smoke test with AES key-register readback checks for the
selector window introduced here.
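The selector-driven access model can be sketched as a pair of dispatch functions with the same shape as the octeon_cop2_dmtc2/octeon_cop2_dmfc2 helpers declared in helper.h: the value travels through a GPR and the selector chooses the target register. This toy version is hypothetical. It covers only the AES key bank and the writable RESINP bank, uses selector values copied from MIPSOcteonCop2Sel, and returns 0 for every other selector instead of starting an engine operation.

```c
#include <assert.h>
#include <stdint.h>

/* Selector values copied from MIPSOcteonCop2Sel in this patch. */
enum {
    SEL_AES_RESINP0 = 0x0100,
    SEL_AES_RESINP1 = 0x0101,
    SEL_AES_KEY0    = 0x0104,
    SEL_AES_KEY3    = 0x0107,
};

/* Hypothetical, trimmed-down COP2 state: AES key and RESINP banks only. */
typedef struct {
    uint64_t aes_key[4];
    uint64_t aes_result[2];
} ToyCop2;

/* DMTC2 model: the selector picks the register that receives the value. */
static void toy_dmtc2(ToyCop2 *s, uint64_t val, uint32_t sel)
{
    if (sel >= SEL_AES_KEY0 && sel <= SEL_AES_KEY3) {
        s->aes_key[sel - SEL_AES_KEY0] = val;
    } else if (sel == SEL_AES_RESINP0 || sel == SEL_AES_RESINP1) {
        s->aes_result[sel - SEL_AES_RESINP0] = val;  /* RESINP is writable */
    }
    /* Operation selectors (e.g. AES_ENC0) are not modeled here. */
}

/* DMFC2 model: the same selector reads the register back. */
static uint64_t toy_dmfc2(const ToyCop2 *s, uint32_t sel)
{
    if (sel >= SEL_AES_KEY0 && sel <= SEL_AES_KEY3) {
        return s->aes_key[sel - SEL_AES_KEY0];
    }
    if (sel == SEL_AES_RESINP0 || sel == SEL_AES_RESINP1) {
        return s->aes_result[sel - SEL_AES_RESINP0];
    }
    return 0;  /* simplification for unmodeled selectors */
}
```

A write to selector 0x0106 (AES_KEY2) followed by a read of the same selector returns the stored key word, which is the kind of readback the smoke-test extension checks.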

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
---
Changes v1 -> v2:
  - Fold the AES COP2 selector readback smoke coverage into this patch.
  - Move Octeon COP2 decode plumbing into octeon_translate.c.
    (suggested by Philippe Mathieu-Daudé)
  - Use uint64_t/i64 helper types for Octeon COP2 state transfers.
    (suggested by Philippe Mathieu-Daudé)

Changes v2 -> v3:
  - Remove redundant MIPS64 checks from Octeon COP2 translation; the
    opcode path is already restricted to TARGET_MIPS64 Octeon CPUs.
    (suggested by Richard Henderson)

Changes v5 -> v6:
  - Rename COP2 selector constants and comments to use RESINP/INP,
    HSH_STARTSHA, and MF/MT direction suffixes from the hardware selector
    naming.
  - Rename COP2 crypto state fields and shared-window helpers to use
    HSH_IV/HSH_DAT/HSH_IVW/HSH_DATW and GFM_RESINP hardware naming.
  - Rename HSH register word-packing helpers from octeon_hash_* to
    octeon_hsh_*.
---
 target/mips/cpu.h                             |  165 +++
 target/mips/helper.h                          |    2 +
 target/mips/system/machine.c                  |   37 +
 target/mips/tcg/meson.build                   |    1 +
 target/mips/tcg/octeon_crypto.c               | 1654 +++++++++++++++++++++++++
 target/mips/tcg/octeon_translate.c            |  204 +++
 target/mips/tcg/translate.c                   |    9 +
 target/mips/tcg/translate.h                   |    1 +
 tests/tcg/mips/user/isa/octeon/octeon-insns.c |   71 ++
 9 files changed, 2144 insertions(+)

diff --git a/target/mips/cpu.h b/target/mips/cpu.h
index 346713705a..e16f0f6e98 100644
--- a/target/mips/cpu.h
+++ b/target/mips/cpu.h
@@ -537,6 +537,169 @@ struct TCState {
 };
 
 struct MIPSITUState;
+typedef enum MIPSOcteonSharedMode {
+    OCTEON_SHARED_MODE_NONE = 0,
+    OCTEON_SHARED_MODE_SHA512,
+    OCTEON_SHARED_MODE_SNOW3G,
+} MIPSOcteonSharedMode;
+
+typedef enum MIPSOcteonCop2Sel {
+    OCTEON_COP2_SEL_HSH_DAT0 = 0x0040,
+    OCTEON_COP2_SEL_HSH_DAT1,
+    OCTEON_COP2_SEL_HSH_DAT2,
+    OCTEON_COP2_SEL_HSH_DAT3,
+    OCTEON_COP2_SEL_HSH_DAT4,
+    OCTEON_COP2_SEL_HSH_DAT5,
+    OCTEON_COP2_SEL_HSH_DAT6,
+    OCTEON_COP2_SEL_HSH_IV0 = 0x0048,
+    OCTEON_COP2_SEL_HSH_IV1,
+    OCTEON_COP2_SEL_HSH_IV2,
+    OCTEON_COP2_SEL_HSH_IV3,
+    OCTEON_COP2_SEL_SHA3_DAT24 = 0x0050,
+    OCTEON_COP2_SEL_SHA3_DAT15_MT = 0x0051,
+    OCTEON_COP2_SEL_HSH_STARTSHA_COMPAT = 0x0057,
+    OCTEON_COP2_SEL_GFM_MUL_REFLECT0 = 0x0058,
+    OCTEON_COP2_SEL_GFM_MUL_REFLECT1,
+    OCTEON_COP2_SEL_GFM_RESINP_REFLECT0 = 0x005a,
+    OCTEON_COP2_SEL_GFM_RESINP_REFLECT1,
+    OCTEON_COP2_SEL_GFM_XOR0_REFLECT = 0x005c,
+    OCTEON_COP2_SEL_3DES_KEY0 = 0x0080,
+    OCTEON_COP2_SEL_3DES_KEY1,
+    OCTEON_COP2_SEL_3DES_KEY2,
+    OCTEON_COP2_SEL_3DES_IV = 0x0084,
+    OCTEON_COP2_SEL_3DES_RESULT_MF = 0x0088,
+    OCTEON_COP2_SEL_3DES_RESULT_MT = 0x0098,
+    OCTEON_COP2_SEL_3DES_ENC_CBC = 0x4088,
+    /*
+     * Octeon reuses the 3DES key/result bank for KASUMI and only adds
+     * KASUMI-specific operation selectors.
+     */
+    OCTEON_COP2_SEL_KAS_ENC_CBC = 0x4089,
+    OCTEON_COP2_SEL_3DES_ENC = 0x408a,
+    OCTEON_COP2_SEL_KAS_ENC = 0x408b,
+    OCTEON_COP2_SEL_3DES_DEC_CBC = 0x408c,
+    OCTEON_COP2_SEL_3DES_DEC = 0x408e,
+    OCTEON_COP2_SEL_AES_RESINP0 = 0x0100,
+    OCTEON_COP2_SEL_AES_RESINP1,
+    OCTEON_COP2_SEL_AES_IV0,
+    OCTEON_COP2_SEL_AES_IV1,
+    OCTEON_COP2_SEL_AES_KEY0,
+    OCTEON_COP2_SEL_AES_KEY1,
+    OCTEON_COP2_SEL_AES_KEY2,
+    OCTEON_COP2_SEL_AES_KEY3,
+    OCTEON_COP2_SEL_AES_ENC_CBC0 = 0x0108,
+    OCTEON_COP2_SEL_AES_ENC0 = 0x010a,
+    OCTEON_COP2_SEL_AES_DEC_CBC0 = 0x010c,
+    OCTEON_COP2_SEL_AES_DEC0 = 0x010e,
+    OCTEON_COP2_SEL_AES_KEYLENGTH = 0x0110,
+    OCTEON_COP2_SEL_AES_INP0 = 0x0111,
+    OCTEON_COP2_SEL_CRC_POLYNOMIAL = 0x0200,
+    OCTEON_COP2_SEL_CRC_IV = 0x0201,
+    OCTEON_COP2_SEL_CRC_LEN = 0x0202,
+    OCTEON_COP2_SEL_CRC_IV_REFLECT = 0x0203,
+    OCTEON_COP2_SEL_CRC_WRITE_BYTE = 0x0204,
+    OCTEON_COP2_SEL_CRC_WRITE_HALF = 0x0205,
+    OCTEON_COP2_SEL_CRC_WRITE_WORD = 0x0206,
+    OCTEON_COP2_SEL_CRC_WRITE_POLYNOMIAL = 0x4200,
+    OCTEON_COP2_SEL_CRC_WRITE_LEN = 0x1202,
+    OCTEON_COP2_SEL_CRC_WRITE_IV_REFLECT = 0x0211,
+    OCTEON_COP2_SEL_CRC_WRITE_BYTE_REFLECT = 0x0214,
+    OCTEON_COP2_SEL_CRC_WRITE_HALF_REFLECT = 0x0215,
+    OCTEON_COP2_SEL_CRC_WRITE_WORD_REFLECT = 0x0216,
+    OCTEON_COP2_SEL_CRC_WRITE_DWORD = 0x1207,
+    OCTEON_COP2_SEL_CRC_WRITE_VAR = 0x1208,
+    OCTEON_COP2_SEL_CRC_WRITE_POLYNOMIAL_REFLECT = 0x4210,
+    OCTEON_COP2_SEL_CRC_WRITE_DWORD_REFLECT = 0x1217,
+    OCTEON_COP2_SEL_CRC_WRITE_VAR_REFLECT = 0x1218,
+    /*
+     * Octeon shares 0x0240..0x0257 between SHA512 state/data and the SNOW3G
+     * RESULT/FSM/LFSR window.
+     */
+    OCTEON_COP2_SEL_HSH_DATW0 = 0x0240,
+    OCTEON_COP2_SEL_HSH_DATW1,
+    OCTEON_COP2_SEL_HSH_DATW2,
+    OCTEON_COP2_SEL_HSH_DATW3,
+    OCTEON_COP2_SEL_HSH_DATW4,
+    OCTEON_COP2_SEL_HSH_DATW5,
+    OCTEON_COP2_SEL_HSH_DATW6,
+    OCTEON_COP2_SEL_HSH_DATW7,
+    OCTEON_COP2_SEL_HSH_DATW8,
+    OCTEON_COP2_SEL_HSH_DATW9,
+    OCTEON_COP2_SEL_HSH_DATW10,
+    OCTEON_COP2_SEL_HSH_DATW11,
+    OCTEON_COP2_SEL_HSH_DATW12,
+    OCTEON_COP2_SEL_HSH_DATW13,
+    OCTEON_COP2_SEL_HSH_DATW14,
+    OCTEON_COP2_SEL_HSH_DATW15,
+    OCTEON_COP2_SEL_HSH_IVW0 = 0x0250,
+    OCTEON_COP2_SEL_HSH_IVW1,
+    OCTEON_COP2_SEL_HSH_IVW2,
+    OCTEON_COP2_SEL_HSH_IVW3,
+    OCTEON_COP2_SEL_HSH_IVW4,
+    OCTEON_COP2_SEL_HSH_IVW5,
+    OCTEON_COP2_SEL_HSH_IVW6,
+    OCTEON_COP2_SEL_HSH_IVW7,
+    OCTEON_COP2_SEL_SNOW3G_LFSR0 = 0x0240,
+    OCTEON_COP2_SEL_SNOW3G_LFSR1,
+    OCTEON_COP2_SEL_SNOW3G_LFSR2,
+    OCTEON_COP2_SEL_SNOW3G_LFSR3,
+    OCTEON_COP2_SEL_SNOW3G_LFSR4,
+    OCTEON_COP2_SEL_SNOW3G_LFSR5,
+    OCTEON_COP2_SEL_SNOW3G_LFSR6,
+    OCTEON_COP2_SEL_SNOW3G_LFSR7,
+    OCTEON_COP2_SEL_SNOW3G_RESULT = 0x0250,
+    OCTEON_COP2_SEL_SNOW3G_FSM0,
+    OCTEON_COP2_SEL_SNOW3G_FSM1,
+    OCTEON_COP2_SEL_SNOW3G_FSM2,
+    OCTEON_COP2_SEL_GFM_MUL0 = 0x0258,
+    OCTEON_COP2_SEL_GFM_MUL1,
+    OCTEON_COP2_SEL_GFM_RESINP0,
+    OCTEON_COP2_SEL_GFM_RESINP1,
+    OCTEON_COP2_SEL_GFM_XOR0,
+    OCTEON_COP2_SEL_GFM_POLY = 0x025e,
+    OCTEON_COP2_SEL_AES_ENC_CBC1 = 0x3109,
+    OCTEON_COP2_SEL_AES_ENC1 = 0x310b,
+    OCTEON_COP2_SEL_AES_DEC_CBC1 = 0x310d,
+    OCTEON_COP2_SEL_AES_DEC1 = 0x310f,
+    OCTEON_COP2_SEL_HSH_STARTMD5 = 0x4047,
+    OCTEON_COP2_SEL_SNOW3G_START = 0x404d,
+    OCTEON_COP2_SEL_SNOW3G_MORE = 0x404e,
+    OCTEON_COP2_SEL_HSH_STARTSHA256 = 0x404f,
+    OCTEON_COP2_SEL_GFM_XORMUL1_REFLECT = 0x405d,
+    OCTEON_COP2_SEL_HSH_STARTSHA = 0x4057,
+    OCTEON_COP2_SEL_HSH_STARTSHA512 = 0x424f,
+    OCTEON_COP2_SEL_GFM_XORMUL1 = 0x425d,
+} MIPSOcteonCop2Sel;
+
+typedef struct MIPSOcteonCryptoState {
+    uint64_t des3_key[3];
+    uint64_t des3_iv;
+    uint64_t des3_result;
+    uint64_t hsh_iv[4];
+    uint64_t hsh_dat[8];
+    uint64_t hsh_ivw[8];
+    uint64_t hsh_datw[16];
+    uint64_t aes_iv[2];
+    uint64_t aes_key[4];
+    uint64_t aes_result[2];
+    uint64_t aes_input[2];
+    uint64_t gfm_mul[2];
+    uint64_t gfm_resinp[2];
+    uint64_t gfm_xor0;
+    uint64_t gfm_reflect_mul[2];
+    uint64_t gfm_reflect_resinp[2];
+    uint64_t gfm_reflect_xor0;
+    uint16_t gfm_poly;
+    uint8_t aes_keylen;
+    uint32_t shared_mode;
+    uint32_t crc_poly;
+    uint32_t crc_iv;
+    uint32_t crc_len;
+    uint32_t snow3g_fsm[3];
+    uint32_t snow3g_lfsr[16];
+    uint64_t snow3g_result;
+} MIPSOcteonCryptoState;
+
 typedef struct CPUArchState {
     TCState active_tc;
     CPUMIPSFPUContext active_fpu;
@@ -558,6 +721,8 @@ typedef struct CPUArchState {
 #define MSAIR_ProcID    8
 #define MSAIR_Rev       0
 
+    MIPSOcteonCryptoState octeon_crypto;
+
 /*
  * CP0 Register 0
  */
diff --git a/target/mips/helper.h b/target/mips/helper.h
index e93bc37903..52fe18a8f8 100644
--- a/target/mips/helper.h
+++ b/target/mips/helper.h
@@ -29,6 +29,8 @@ DEF_HELPER_3(octeon_vmm0, i64, env, i64, i64)
 DEF_HELPER_3(octeon_v3mulu, i64, env, i64, i64)
 DEF_HELPER_4(octeon_qmac, void, env, i64, i64, i32)
 DEF_HELPER_4(octeon_qmacs, void, env, i64, i64, i32)
+DEF_HELPER_2(octeon_cop2_dmfc2, i64, env, i32)
+DEF_HELPER_3(octeon_cop2_dmtc2, void, env, i64, i32)
 
 /* microMIPS functions */
 DEF_HELPER_4(lwm, void, env, tl, tl, i32)
diff --git a/target/mips/system/machine.c b/target/mips/system/machine.c
index f988b3695b..ebfa0a9eb0 100644
--- a/target/mips/system/machine.c
+++ b/target/mips/system/machine.c
@@ -279,6 +279,42 @@ static const VMStateDescription mips_vmstate_octeon_multiplier = {
     }
 };
 
+static const VMStateDescription mips_vmstate_octeon_crypto = {
+    .name = "cpu/octeon_crypto",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = mips_octeon_needed,
+    .fields = (const VMStateField[]) {
+        VMSTATE_UINT64_ARRAY(env.octeon_crypto.des3_key, MIPSCPU, 3),
+        VMSTATE_UINT64(env.octeon_crypto.des3_iv, MIPSCPU),
+        VMSTATE_UINT64(env.octeon_crypto.des3_result, MIPSCPU),
+        VMSTATE_UINT64_ARRAY(env.octeon_crypto.hsh_iv, MIPSCPU, 4),
+        VMSTATE_UINT64_ARRAY(env.octeon_crypto.hsh_dat, MIPSCPU, 8),
+        VMSTATE_UINT64_ARRAY(env.octeon_crypto.hsh_ivw, MIPSCPU, 8),
+        VMSTATE_UINT64_ARRAY(env.octeon_crypto.hsh_datw, MIPSCPU, 16),
+        VMSTATE_UINT64_ARRAY(env.octeon_crypto.aes_iv, MIPSCPU, 2),
+        VMSTATE_UINT64_ARRAY(env.octeon_crypto.aes_key, MIPSCPU, 4),
+        VMSTATE_UINT64_ARRAY(env.octeon_crypto.aes_result, MIPSCPU, 2),
+        VMSTATE_UINT64_ARRAY(env.octeon_crypto.aes_input, MIPSCPU, 2),
+        VMSTATE_UINT64_ARRAY(env.octeon_crypto.gfm_mul, MIPSCPU, 2),
+        VMSTATE_UINT64_ARRAY(env.octeon_crypto.gfm_resinp, MIPSCPU, 2),
+        VMSTATE_UINT64(env.octeon_crypto.gfm_xor0, MIPSCPU),
+        VMSTATE_UINT64_ARRAY(env.octeon_crypto.gfm_reflect_mul, MIPSCPU, 2),
+        VMSTATE_UINT64_ARRAY(env.octeon_crypto.gfm_reflect_resinp, MIPSCPU, 2),
+        VMSTATE_UINT64(env.octeon_crypto.gfm_reflect_xor0, MIPSCPU),
+        VMSTATE_UINT16(env.octeon_crypto.gfm_poly, MIPSCPU),
+        VMSTATE_UINT8(env.octeon_crypto.aes_keylen, MIPSCPU),
+        VMSTATE_UINT32(env.octeon_crypto.shared_mode, MIPSCPU),
+        VMSTATE_UINT32(env.octeon_crypto.crc_poly, MIPSCPU),
+        VMSTATE_UINT32(env.octeon_crypto.crc_iv, MIPSCPU),
+        VMSTATE_UINT32(env.octeon_crypto.crc_len, MIPSCPU),
+        VMSTATE_UINT32_ARRAY(env.octeon_crypto.snow3g_fsm, MIPSCPU, 3),
+        VMSTATE_UINT32_ARRAY(env.octeon_crypto.snow3g_lfsr, MIPSCPU, 16),
+        VMSTATE_UINT64(env.octeon_crypto.snow3g_result, MIPSCPU),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 const VMStateDescription vmstate_mips_cpu = {
     .name = "cpu",
     .version_id = 21,
@@ -396,6 +432,7 @@ const VMStateDescription vmstate_mips_cpu = {
     .subsections = (const VMStateDescription * const []) {
         &mips_vmstate_timer,
         &mips_vmstate_octeon_multiplier,
+        &mips_vmstate_octeon_crypto,
         NULL
     }
 };
diff --git a/target/mips/tcg/meson.build b/target/mips/tcg/meson.build
index fff9cd6c7f..4ee359874a 100644
--- a/target/mips/tcg/meson.build
+++ b/target/mips/tcg/meson.build
@@ -18,6 +18,7 @@ mips_ss.add(files(
   'lmmi_helper.c',
   'msa_helper.c',
   'msa_translate.c',
+  'octeon_crypto.c',
   'op_helper.c',
   'rel6_translate.c',
   'translate.c',
diff --git a/target/mips/tcg/octeon_crypto.c b/target/mips/tcg/octeon_crypto.c
new file mode 100644
index 0000000000..8b3260c4d6
--- /dev/null
+++ b/target/mips/tcg/octeon_crypto.c
@@ -0,0 +1,1654 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * MIPS Octeon crypto emulation helpers.
+ *
+ * Copyright (c) 2026 James Hilliard
+ */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "internal.h"
+#include "exec/helper-proto.h"
+#include "crypto/aes.h"
+#include "crypto/clmul.h"
+#include "qemu/bitops.h"
+#include "qemu/host-utils.h"
+
+static inline void octeon_set_shared_mode(MIPSOcteonCryptoState *crypto,
+                                          MIPSOcteonSharedMode mode)
+{
+    crypto->shared_mode = mode;
+}
+
+static inline uint32_t octeon_crc_reflect32_by_byte(uint32_t v)
+{
+    return bswap32(revbit32(v));
+}
+
+static uint32_t octeon_crc_state_reflect(const MIPSOcteonCryptoState *crypto)
+{
+    return octeon_crc_reflect32_by_byte(crypto->crc_iv);
+}
+
+static void octeon_crc_set_state_reflect(MIPSOcteonCryptoState *crypto,
+                                         uint32_t state)
+{
+    crypto->crc_iv = octeon_crc_reflect32_by_byte(state);
+}
+
+static void octeon_crc_update_normal(MIPSOcteonCryptoState *crypto,
+                                     uint64_t value, unsigned int bytes)
+{
+    uint32_t crc = crypto->crc_iv;
+    uint32_t poly = crypto->crc_poly;
+
+    for (unsigned int i = 0; i < bytes; i++) {
+        uint8_t byte = value >> ((bytes - 1 - i) * 8);
+
+        crc ^= (uint32_t)byte << 24;
+        for (int bit = 0; bit < 8; bit++) {
+            if (crc & 0x80000000U) {
+                crc = (crc << 1) ^ poly;
+            } else {
+                crc <<= 1;
+            }
+        }
+    }
+
+    crypto->crc_iv = crc;
+}
+
+static void octeon_crc_update_reflect(MIPSOcteonCryptoState *crypto,
+                                      uint64_t value, unsigned int bytes)
+{
+    uint32_t crc = octeon_crc_state_reflect(crypto);
+    uint32_t poly = bswap32(crypto->crc_poly);
+
+    for (unsigned int i = 0; i < bytes; i++) {
+        uint8_t byte = value >> ((bytes - 1 - i) * 8);
+
+        crc ^= byte;
+        for (int bit = 0; bit < 8; bit++) {
+            if (crc & 1U) {
+                crc = (crc >> 1) ^ poly;
+            } else {
+                crc >>= 1;
+            }
+        }
+    }
+
+    octeon_crc_set_state_reflect(crypto, crc);
+}
+
+static uint64_t octeon_gfm_reduce64(Int128 product, uint8_t poly)
+{
+    uint64_t lo = int128_getlo(product);
+    uint64_t hi = int128_gethi(product);
+
+    while (hi) {
+        int bit = 63 - clz64(hi);
+        uint64_t shifted_poly = (uint64_t)poly << bit;
+
+        hi ^= 1ULL << bit;
+        lo ^= shifted_poly;
+        if (bit > 56) {
+            hi ^= (uint64_t)poly >> (64 - bit);
+        }
+    }
+
+    return lo;
+}
+
+static void octeon_gfm_mul64_uia2(const uint64_t x[2], const uint64_t y[2],
+                                  uint8_t poly, uint64_t out[2])
+{
+    uint64_t vx = revbit64(x[1]);
+    uint64_t vy = revbit64(y[0]);
+    Int128 product = clmul_64(vx, vy);
+    uint64_t res = octeon_gfm_reduce64(product, revbit32(poly) >> 24);
+
+    out[0] = 0;
+    out[1] = revbit64(res);
+}
+
+static void octeon_gfm_mul_reflect(MIPSOcteonCryptoState *crypto, uint64_t data)
+{
+    uint64_t in[2] = {
+        crypto->gfm_reflect_resinp[0] ^ crypto->gfm_reflect_xor0,
+        crypto->gfm_reflect_resinp[1] ^ data,
+    };
+
+    octeon_gfm_mul64_uia2(in, crypto->gfm_reflect_mul,
+                          crypto->gfm_poly, crypto->gfm_reflect_resinp);
+    crypto->gfm_reflect_xor0 = 0;
+}
+
+static inline void octeon_hsh_load_reg_words_be(uint64_t reg,
+                                                 uint32_t *hi, uint32_t *lo)
+{
+    uint8_t buf[8];
+
+    stq_be_p(buf, reg);
+    *hi = ldl_be_p(buf);
+    *lo = ldl_be_p(buf + 4);
+}
+
+static inline void octeon_hsh_load_reg_words_le(uint64_t reg,
+                                                 uint32_t *lo0, uint32_t *lo1)
+{
+    uint8_t buf[8];
+
+    stq_be_p(buf, reg);
+    *lo0 = ldl_le_p(buf);
+    *lo1 = ldl_le_p(buf + 4);
+}
+
+static inline uint64_t octeon_hsh_store_reg_words_be(uint32_t hi, uint32_t lo)
+{
+    uint8_t buf[8];
+
+    stl_be_p(buf, hi);
+    stl_be_p(buf + 4, lo);
+    return ldq_be_p(buf);
+}
+
+static inline uint64_t octeon_hsh_store_reg_words_le(uint32_t lo0,
+                                                      uint32_t lo1)
+{
+    uint8_t buf[8];
+
+    stl_le_p(buf, lo0);
+    stl_le_p(buf + 4, lo1);
+    return ldq_be_p(buf);
+}
+
+static void octeon_md5_transform(MIPSOcteonCryptoState *crypto)
+{
+    static const uint32_t k[64] = {
+        0xd76aa478U, 0xe8c7b756U, 0x242070dbU, 0xc1bdceeeU,
+        0xf57c0fafU, 0x4787c62aU, 0xa8304613U, 0xfd469501U,
+        0x698098d8U, 0x8b44f7afU, 0xffff5bb1U, 0x895cd7beU,
+        0x6b901122U, 0xfd987193U, 0xa679438eU, 0x49b40821U,
+        0xf61e2562U, 0xc040b340U, 0x265e5a51U, 0xe9b6c7aaU,
+        0xd62f105dU, 0x02441453U, 0xd8a1e681U, 0xe7d3fbc8U,
+        0x21e1cde6U, 0xc33707d6U, 0xf4d50d87U, 0x455a14edU,
+        0xa9e3e905U, 0xfcefa3f8U, 0x676f02d9U, 0x8d2a4c8aU,
+        0xfffa3942U, 0x8771f681U, 0x6d9d6122U, 0xfde5380cU,
+        0xa4beea44U, 0x4bdecfa9U, 0xf6bb4b60U, 0xbebfbc70U,
+        0x289b7ec6U, 0xeaa127faU, 0xd4ef3085U, 0x04881d05U,
+        0xd9d4d039U, 0xe6db99e5U, 0x1fa27cf8U, 0xc4ac5665U,
+        0xf4292244U, 0x432aff97U, 0xab9423a7U, 0xfc93a039U,
+        0x655b59c3U, 0x8f0ccc92U, 0xffeff47dU, 0x85845dd1U,
+        0x6fa87e4fU, 0xfe2ce6e0U, 0xa3014314U, 0x4e0811a1U,
+        0xf7537e82U, 0xbd3af235U, 0x2ad7d2bbU, 0xeb86d391U,
+    };
+    static const uint8_t s[64] = {
+        7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22,
+        5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20,
+        4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23,
+        6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21,
+    };
+    uint8_t block_bytes[64];
+    uint32_t m[16];
+    uint32_t a, b, c, d;
+    uint32_t aa, bb, cc, dd;
+    int i;
+
+    for (i = 0; i < 8; i++) {
+        stq_be_p(block_bytes + (i * 8), crypto->hsh_dat[i]);
+        m[i * 2] = ldl_le_p(block_bytes + (i * 8));
+        m[i * 2 + 1] = ldl_le_p(block_bytes + (i * 8) + 4);
+    }
+
+    octeon_hsh_load_reg_words_le(crypto->hsh_iv[0], &a, &b);
+    octeon_hsh_load_reg_words_le(crypto->hsh_iv[1], &c, &d);
+    aa = a;
+    bb = b;
+    cc = c;
+    dd = d;
+
+    for (i = 0; i < 64; i++) {
+        uint32_t f, g, tmp;
+
+        if (i < 16) {
+            f = (b & c) | ((~b) & d);
+            g = i;
+        } else if (i < 32) {
+            f = (d & b) | ((~d) & c);
+            g = (5 * i + 1) & 0xf;
+        } else if (i < 48) {
+            f = b ^ c ^ d;
+            g = (3 * i + 5) & 0xf;
+        } else {
+            f = c ^ (b | (~d));
+            g = (7 * i) & 0xf;
+        }
+
+        tmp = d;
+        d = c;
+        c = b;
+        b = b + rol32(a + f + k[i] + m[g], s[i]);
+        a = tmp;
+    }
+
+    a += aa;
+    b += bb;
+    c += cc;
+    d += dd;
+    crypto->hsh_iv[0] = octeon_hsh_store_reg_words_le(a, b);
+    crypto->hsh_iv[1] = octeon_hsh_store_reg_words_le(c, d);
+}
+
+static void octeon_sha1_transform(MIPSOcteonCryptoState *crypto)
+{
+    uint32_t w[80];
+    uint32_t a, b, c, d, e;
+    int i;
+
+    for (i = 0; i < 8; i++) {
+        octeon_hsh_load_reg_words_be(crypto->hsh_dat[i],
+                                      &w[i * 2], &w[i * 2 + 1]);
+    }
+    for (i = 16; i < 80; i++) {
+        w[i] = rol32(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);
+    }
+
+    octeon_hsh_load_reg_words_be(crypto->hsh_iv[0], &a, &b);
+    octeon_hsh_load_reg_words_be(crypto->hsh_iv[1], &c, &d);
+    e = crypto->hsh_iv[2] >> 32;
+
+    for (i = 0; i < 80; i++) {
+        uint32_t f, k, temp;
+
+        if (i < 20) {
+            f = (b & c) | ((~b) & d);
+            k = 0x5a827999;
+        } else if (i < 40) {
+            f = b ^ c ^ d;
+            k = 0x6ed9eba1;
+        } else if (i < 60) {
+            f = (b & c) | (b & d) | (c & d);
+            k = 0x8f1bbcdc;
+        } else {
+            f = b ^ c ^ d;
+            k = 0xca62c1d6;
+        }
+
+        temp = rol32(a, 5) + f + e + k + w[i];
+        e = d;
+        d = c;
+        c = rol32(b, 30);
+        b = a;
+        a = temp;
+    }
+
+    octeon_hsh_load_reg_words_be(crypto->hsh_iv[0], &w[0], &w[1]);
+    octeon_hsh_load_reg_words_be(crypto->hsh_iv[1], &w[2], &w[3]);
+    w[4] = crypto->hsh_iv[2] >> 32;
+    w[0] += a;
+    w[1] += b;
+    w[2] += c;
+    w[3] += d;
+    w[4] += e;
+    crypto->hsh_iv[0] = octeon_hsh_store_reg_words_be(w[0], w[1]);
+    crypto->hsh_iv[1] = octeon_hsh_store_reg_words_be(w[2], w[3]);
+    crypto->hsh_iv[2] = (uint64_t)w[4] << 32;
+}
+
+static void octeon_sha256_transform(MIPSOcteonCryptoState *crypto)
+{
+    static const uint32_t k[64] = {
+        0x428a2f98U, 0x71374491U, 0xb5c0fbcfU, 0xe9b5dba5U,
+        0x3956c25bU, 0x59f111f1U, 0x923f82a4U, 0xab1c5ed5U,
+        0xd807aa98U, 0x12835b01U, 0x243185beU, 0x550c7dc3U,
+        0x72be5d74U, 0x80deb1feU, 0x9bdc06a7U, 0xc19bf174U,
+        0xe49b69c1U, 0xefbe4786U, 0x0fc19dc6U, 0x240ca1ccU,
+        0x2de92c6fU, 0x4a7484aaU, 0x5cb0a9dcU, 0x76f988daU,
+        0x983e5152U, 0xa831c66dU, 0xb00327c8U, 0xbf597fc7U,
+        0xc6e00bf3U, 0xd5a79147U, 0x06ca6351U, 0x14292967U,
+        0x27b70a85U, 0x2e1b2138U, 0x4d2c6dfcU, 0x53380d13U,
+        0x650a7354U, 0x766a0abbU, 0x81c2c92eU, 0x92722c85U,
+        0xa2bfe8a1U, 0xa81a664bU, 0xc24b8b70U, 0xc76c51a3U,
+        0xd192e819U, 0xd6990624U, 0xf40e3585U, 0x106aa070U,
+        0x19a4c116U, 0x1e376c08U, 0x2748774cU, 0x34b0bcb5U,
+        0x391c0cb3U, 0x4ed8aa4aU, 0x5b9cca4fU, 0x682e6ff3U,
+        0x748f82eeU, 0x78a5636fU, 0x84c87814U, 0x8cc70208U,
+        0x90befffaU, 0xa4506cebU, 0xbef9a3f7U, 0xc67178f2U,
+    };
+    uint32_t w[64];
+    uint32_t a, b, c, d, e, f, g, h;
+    uint32_t orig[8];
+    int i;
+
+    for (i = 0; i < 8; i++) {
+        octeon_hsh_load_reg_words_be(crypto->hsh_dat[i],
+                                      &w[i * 2], &w[i * 2 + 1]);
+    }
+    for (i = 16; i < 64; i++) {
+        uint32_t s0 = ror32(w[i - 15], 7) ^
+                      ror32(w[i - 15], 18) ^
+                      (w[i - 15] >> 3);
+        uint32_t s1 = ror32(w[i - 2], 17) ^
+                      ror32(w[i - 2], 19) ^
+                      (w[i - 2] >> 10);
+        w[i] = w[i - 16] + s0 + w[i - 7] + s1;
+    }
+
+    for (i = 0; i < 4; i++) {
+        octeon_hsh_load_reg_words_be(crypto->hsh_iv[i],
+                                      &orig[i * 2], &orig[i * 2 + 1]);
+    }
+    a = orig[0];
+    b = orig[1];
+    c = orig[2];
+    d = orig[3];
+    e = orig[4];
+    f = orig[5];
+    g = orig[6];
+    h = orig[7];
+
+    for (i = 0; i < 64; i++) {
+        uint32_t s1 = ror32(e, 6) ^
+                      ror32(e, 11) ^
+                      ror32(e, 25);
+        uint32_t ch = (e & f) ^ ((~e) & g);
+        uint32_t temp1 = h + s1 + ch + k[i] + w[i];
+        uint32_t s0 = ror32(a, 2) ^
+                      ror32(a, 13) ^
+                      ror32(a, 22);
+        uint32_t maj = (a & b) ^ (a & c) ^ (b & c);
+        uint32_t temp2 = s0 + maj;
+
+        h = g;
+        g = f;
+        f = e;
+        e = d + temp1;
+        d = c;
+        c = b;
+        b = a;
+        a = temp1 + temp2;
+    }
+
+    orig[0] += a;
+    orig[1] += b;
+    orig[2] += c;
+    orig[3] += d;
+    orig[4] += e;
+    orig[5] += f;
+    orig[6] += g;
+    orig[7] += h;
+    for (i = 0; i < 4; i++) {
+        crypto->hsh_iv[i] =
+            octeon_hsh_store_reg_words_be(orig[i * 2], orig[i * 2 + 1]);
+    }
+}
+
+static void octeon_sha512_transform(MIPSOcteonCryptoState *crypto)
+{
+    static const uint64_t k[80] = {
+        0x428a2f98d728ae22ULL, 0x7137449123ef65cdULL,
+        0xb5c0fbcfec4d3b2fULL, 0xe9b5dba58189dbbcULL,
+        0x3956c25bf348b538ULL, 0x59f111f1b605d019ULL,
+        0x923f82a4af194f9bULL, 0xab1c5ed5da6d8118ULL,
+        0xd807aa98a3030242ULL, 0x12835b0145706fbeULL,
+        0x243185be4ee4b28cULL, 0x550c7dc3d5ffb4e2ULL,
+        0x72be5d74f27b896fULL, 0x80deb1fe3b1696b1ULL,
+        0x9bdc06a725c71235ULL, 0xc19bf174cf692694ULL,
+        0xe49b69c19ef14ad2ULL, 0xefbe4786384f25e3ULL,
+        0x0fc19dc68b8cd5b5ULL, 0x240ca1cc77ac9c65ULL,
+        0x2de92c6f592b0275ULL, 0x4a7484aa6ea6e483ULL,
+        0x5cb0a9dcbd41fbd4ULL, 0x76f988da831153b5ULL,
+        0x983e5152ee66dfabULL, 0xa831c66d2db43210ULL,
+        0xb00327c898fb213fULL, 0xbf597fc7beef0ee4ULL,
+        0xc6e00bf33da88fc2ULL, 0xd5a79147930aa725ULL,
+        0x06ca6351e003826fULL, 0x142929670a0e6e70ULL,
+        0x27b70a8546d22ffcULL, 0x2e1b21385c26c926ULL,
+        0x4d2c6dfc5ac42aedULL, 0x53380d139d95b3dfULL,
+        0x650a73548baf63deULL, 0x766a0abb3c77b2a8ULL,
+        0x81c2c92e47edaee6ULL, 0x92722c851482353bULL,
+        0xa2bfe8a14cf10364ULL, 0xa81a664bbc423001ULL,
+        0xc24b8b70d0f89791ULL, 0xc76c51a30654be30ULL,
+        0xd192e819d6ef5218ULL, 0xd69906245565a910ULL,
+        0xf40e35855771202aULL, 0x106aa07032bbd1b8ULL,
+        0x19a4c116b8d2d0c8ULL, 0x1e376c085141ab53ULL,
+        0x2748774cdf8eeb99ULL, 0x34b0bcb5e19b48a8ULL,
+        0x391c0cb3c5c95a63ULL, 0x4ed8aa4ae3418acbULL,
+        0x5b9cca4f7763e373ULL, 0x682e6ff3d6b2b8a3ULL,
+        0x748f82ee5defb2fcULL, 0x78a5636f43172f60ULL,
+        0x84c87814a1f0ab72ULL, 0x8cc702081a6439ecULL,
+        0x90befffa23631e28ULL, 0xa4506cebde82bde9ULL,
+        0xbef9a3f7b2c67915ULL, 0xc67178f2e372532bULL,
+        0xca273eceea26619cULL, 0xd186b8c721c0c207ULL,
+        0xeada7dd6cde0eb1eULL, 0xf57d4f7fee6ed178ULL,
+        0x06f067aa72176fbaULL, 0x0a637dc5a2c898a6ULL,
+        0x113f9804bef90daeULL, 0x1b710b35131c471bULL,
+        0x28db77f523047d84ULL, 0x32caab7b40c72493ULL,
+        0x3c9ebe0a15c9bebcULL, 0x431d67c49c100d4cULL,
+        0x4cc5d4becb3e42b6ULL, 0x597f299cfc657e2aULL,
+        0x5fcb6fab3ad6faecULL, 0x6c44198c4a475817ULL,
+    };
+    uint64_t w[80];
+    uint64_t a, b, c, d, e, f, g, h;
+    int i;
+
+    for (i = 0; i < 16; i++) {
+        w[i] = crypto->hsh_datw[i];
+    }
+    for (i = 16; i < 80; i++) {
+        uint64_t s0 = ror64(w[i - 15], 1) ^
+                      ror64(w[i - 15], 8) ^
+                      (w[i - 15] >> 7);
+        uint64_t s1 = ror64(w[i - 2], 19) ^
+                      ror64(w[i - 2], 61) ^
+                      (w[i - 2] >> 6);
+        w[i] = w[i - 16] + s0 + w[i - 7] + s1;
+    }
+
+    a = crypto->hsh_ivw[0];
+    b = crypto->hsh_ivw[1];
+    c = crypto->hsh_ivw[2];
+    d = crypto->hsh_ivw[3];
+    e = crypto->hsh_ivw[4];
+    f = crypto->hsh_ivw[5];
+    g = crypto->hsh_ivw[6];
+    h = crypto->hsh_ivw[7];
+
+    for (i = 0; i < 80; i++) {
+        uint64_t s0 = ror64(a, 28) ^
+                      ror64(a, 34) ^
+                      ror64(a, 39);
+        uint64_t s1 = ror64(e, 14) ^
+                      ror64(e, 18) ^
+                      ror64(e, 41);
+        uint64_t ch = (e & f) ^ ((~e) & g);
+        uint64_t maj = (a & b) ^ (a & c) ^ (b & c);
+        uint64_t temp1 = h + s1 + ch + k[i] + w[i];
+        uint64_t temp2 = s0 + maj;
+
+        h = g;
+        g = f;
+        f = e;
+        e = d + temp1;
+        d = c;
+        c = b;
+        b = a;
+        a = temp1 + temp2;
+    }
+
+    crypto->hsh_ivw[0] += a;
+    crypto->hsh_ivw[1] += b;
+    crypto->hsh_ivw[2] += c;
+    crypto->hsh_ivw[3] += d;
+    crypto->hsh_ivw[4] += e;
+    crypto->hsh_ivw[5] += f;
+    crypto->hsh_ivw[6] += g;
+    crypto->hsh_ivw[7] += h;
+}
+
+static void octeon_store_shared_hsh_window(MIPSOcteonCryptoState *crypto,
+                                           uint32_t sel, uint64_t value)
+{
+    switch (sel) {
+    case OCTEON_COP2_SEL_HSH_DATW0 ... OCTEON_COP2_SEL_HSH_DATW14:
+        crypto->hsh_datw[sel - OCTEON_COP2_SEL_HSH_DATW0] = value;
+        break;
+    case OCTEON_COP2_SEL_HSH_IVW0 ... OCTEON_COP2_SEL_HSH_IVW7:
+        crypto->hsh_ivw[sel - OCTEON_COP2_SEL_HSH_IVW0] = value;
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static const uint8_t octeon_snow3g_sr[256] = {
+    0x63, 0x7c, 0x77, 0x7b, 0xf2, 0x6b, 0x6f, 0xc5,
+    0x30, 0x01, 0x67, 0x2b, 0xfe, 0xd7, 0xab, 0x76,
+    0xca, 0x82, 0xc9, 0x7d, 0xfa, 0x59, 0x47, 0xf0,
+    0xad, 0xd4, 0xa2, 0xaf, 0x9c, 0xa4, 0x72, 0xc0,
+    0xb7, 0xfd, 0x93, 0x26, 0x36, 0x3f, 0xf7, 0xcc,
+    0x34, 0xa5, 0xe5, 0xf1, 0x71, 0xd8, 0x31, 0x15,
+    0x04, 0xc7, 0x23, 0xc3, 0x18, 0x96, 0x05, 0x9a,
+    0x07, 0x12, 0x80, 0xe2, 0xeb, 0x27, 0xb2, 0x75,
+    0x09, 0x83, 0x2c, 0x1a, 0x1b, 0x6e, 0x5a, 0xa0,
+    0x52, 0x3b, 0xd6, 0xb3, 0x29, 0xe3, 0x2f, 0x84,
+    0x53, 0xd1, 0x00, 0xed, 0x20, 0xfc, 0xb1, 0x5b,
+    0x6a, 0xcb, 0xbe, 0x39, 0x4a, 0x4c, 0x58, 0xcf,
+    0xd0, 0xef, 0xaa, 0xfb, 0x43, 0x4d, 0x33, 0x85,
+    0x45, 0xf9, 0x02, 0x7f, 0x50, 0x3c, 0x9f, 0xa8,
+    0x51, 0xa3, 0x40, 0x8f, 0x92, 0x9d, 0x38, 0xf5,
+    0xbc, 0xb6, 0xda, 0x21, 0x10, 0xff, 0xf3, 0xd2,
+    0xcd, 0x0c, 0x13, 0xec, 0x5f, 0x97, 0x44, 0x17,
+    0xc4, 0xa7, 0x7e, 0x3d, 0x64, 0x5d, 0x19, 0x73,
+    0x60, 0x81, 0x4f, 0xdc, 0x22, 0x2a, 0x90, 0x88,
+    0x46, 0xee, 0xb8, 0x14, 0xde, 0x5e, 0x0b, 0xdb,
+    0xe0, 0x32, 0x3a, 0x0a, 0x49, 0x06, 0x24, 0x5c,
+    0xc2, 0xd3, 0xac, 0x62, 0x91, 0x95, 0xe4, 0x79,
+    0xe7, 0xc8, 0x37, 0x6d, 0x8d, 0xd5, 0x4e, 0xa9,
+    0x6c, 0x56, 0xf4, 0xea, 0x65, 0x7a, 0xae, 0x08,
+    0xba, 0x78, 0x25, 0x2e, 0x1c, 0xa6, 0xb4, 0xc6,
+    0xe8, 0xdd, 0x74, 0x1f, 0x4b, 0xbd, 0x8b, 0x8a,
+    0x70, 0x3e, 0xb5, 0x66, 0x48, 0x03, 0xf6, 0x0e,
+    0x61, 0x35, 0x57, 0xb9, 0x86, 0xc1, 0x1d, 0x9e,
+    0xe1, 0xf8, 0x98, 0x11, 0x69, 0xd9, 0x8e, 0x94,
+    0x9b, 0x1e, 0x87, 0xe9, 0xce, 0x55, 0x28, 0xdf,
+    0x8c, 0xa1, 0x89, 0x0d, 0xbf, 0xe6, 0x42, 0x68,
+    0x41, 0x99, 0x2d, 0x0f, 0xb0, 0x54, 0xbb, 0x16,
+};
+
+static const uint8_t octeon_snow3g_sq[256] = {
+    0x25, 0x24, 0x73, 0x67, 0xd7, 0xae, 0x5c, 0x30,
+    0xa4, 0xee, 0x6e, 0xcb, 0x7d, 0xb5, 0x82, 0xdb,
+    0xe4, 0x8e, 0x48, 0x49, 0x4f, 0x5d, 0x6a, 0x78,
+    0x70, 0x88, 0xe8, 0x5f, 0x5e, 0x84, 0x65, 0xe2,
+    0xd8, 0xe9, 0xcc, 0xed, 0x40, 0x2f, 0x11, 0x28,
+    0x57, 0xd2, 0xac, 0xe3, 0x4a, 0x15, 0x1b, 0xb9,
+    0xb2, 0x80, 0x85, 0xa6, 0x2e, 0x02, 0x47, 0x29,
+    0x07, 0x4b, 0x0e, 0xc1, 0x51, 0xaa, 0x89, 0xd4,
+    0xca, 0x01, 0x46, 0xb3, 0xef, 0xdd, 0x44, 0x7b,
+    0xc2, 0x7f, 0xbe, 0xc3, 0x9f, 0x20, 0x4c, 0x64,
+    0x83, 0xa2, 0x68, 0x42, 0x13, 0xb4, 0x41, 0xcd,
+    0xba, 0xc6, 0xbb, 0x6d, 0x4d, 0x71, 0x21, 0xf4,
+    0x8d, 0xb0, 0xe5, 0x93, 0xfe, 0x8f, 0xe6, 0xcf,
+    0x43, 0x45, 0x31, 0x22, 0x37, 0x36, 0x96, 0xfa,
+    0xbc, 0x0f, 0x08, 0x52, 0x1d, 0x55, 0x1a, 0xc5,
+    0x4e, 0x23, 0x69, 0x7a, 0x92, 0xff, 0x5b, 0x5a,
+    0xeb, 0x9a, 0x1c, 0xa9, 0xd1, 0x7e, 0x0d, 0xfc,
+    0x50, 0x8a, 0xb6, 0x62, 0xf5, 0x0a, 0xf8, 0xdc,
+    0x03, 0x3c, 0x0c, 0x39, 0xf1, 0xb8, 0xf3, 0x3d,
+    0xf2, 0xd5, 0x97, 0x66, 0x81, 0x32, 0xa0, 0x00,
+    0x06, 0xce, 0xf6, 0xea, 0xb7, 0x17, 0xf7, 0x8c,
+    0x79, 0xd6, 0xa7, 0xbf, 0x8b, 0x3f, 0x1f, 0x53,
+    0x63, 0x75, 0x35, 0x2c, 0x60, 0xfd, 0x27, 0xd3,
+    0x94, 0xa5, 0x7c, 0xa1, 0x05, 0x58, 0x2d, 0xbd,
+    0xd9, 0xc7, 0xaf, 0x6b, 0x54, 0x0b, 0xe0, 0x38,
+    0x04, 0xc8, 0x9d, 0xe7, 0x14, 0xb1, 0x87, 0x9c,
+    0xdf, 0x6f, 0xf9, 0xda, 0x2a, 0xc4, 0x59, 0x16,
+    0x74, 0x91, 0xab, 0x26, 0x61, 0x76, 0x34, 0x2b,
+    0xad, 0x99, 0xfb, 0x72, 0xec, 0x33, 0x12, 0xde,
+    0x98, 0x3b, 0xc0, 0x9b, 0x3e, 0x18, 0x10, 0x3a,
+    0x56, 0xe1, 0x77, 0xc9, 0x1e, 0x9e, 0x95, 0xa3,
+    0x90, 0x19, 0xa8, 0x6c, 0x09, 0xd0, 0xf0, 0x86,
+};
+
+static inline uint8_t octeon_snow3g_mulx(uint8_t v, uint8_t c)
+{
+    return (v & 0x80) ? ((v << 1) ^ c) : (v << 1);
+}
+
+static uint8_t octeon_snow3g_mulxpow(uint8_t v, unsigned int n, uint8_t c)
+{
+    while (n-- > 0) {
+        v = octeon_snow3g_mulx(v, c);
+    }
+    return v;
+}
+
+static inline uint32_t octeon_snow3g_pack32(uint8_t b0, uint8_t b1,
+                                            uint8_t b2, uint8_t b3)
+{
+    return ((uint32_t)b0 << 24) | ((uint32_t)b1 << 16) |
+           ((uint32_t)b2 << 8) | b3;
+}
+
+static uint32_t octeon_snow3g_mulalpha(uint8_t c)
+{
+    return octeon_snow3g_pack32(octeon_snow3g_mulxpow(c, 23, 0xa9),
+                                octeon_snow3g_mulxpow(c, 245, 0xa9),
+                                octeon_snow3g_mulxpow(c, 48, 0xa9),
+                                octeon_snow3g_mulxpow(c, 239, 0xa9));
+}
+
+static uint32_t octeon_snow3g_divalpha(uint8_t c)
+{
+    return octeon_snow3g_pack32(octeon_snow3g_mulxpow(c, 16, 0xa9),
+                                octeon_snow3g_mulxpow(c, 39, 0xa9),
+                                octeon_snow3g_mulxpow(c, 6, 0xa9),
+                                octeon_snow3g_mulxpow(c, 64, 0xa9));
+}
+
+static uint32_t octeon_snow3g_s1(uint32_t w)
+{
+    uint8_t x0 = octeon_snow3g_sr[w >> 24];
+    uint8_t x1 = octeon_snow3g_sr[(uint8_t)(w >> 16)];
+    uint8_t x2 = octeon_snow3g_sr[(uint8_t)(w >> 8)];
+    uint8_t x3 = octeon_snow3g_sr[(uint8_t)w];
+    uint8_t r0 = octeon_snow3g_mulx(x0, 0x1b) ^ x1 ^ x2 ^
+                 octeon_snow3g_mulx(x3, 0x1b) ^ x3;
+    uint8_t r1 = octeon_snow3g_mulx(x0, 0x1b) ^ x0 ^
+                 octeon_snow3g_mulx(x1, 0x1b) ^ x2 ^ x3;
+    uint8_t r2 = x0 ^ octeon_snow3g_mulx(x1, 0x1b) ^ x1 ^
+                 octeon_snow3g_mulx(x2, 0x1b) ^ x3;
+    uint8_t r3 = x0 ^ x1 ^ octeon_snow3g_mulx(x2, 0x1b) ^ x2 ^
+                 octeon_snow3g_mulx(x3, 0x1b);
+
+    return octeon_snow3g_pack32(r0, r1, r2, r3);
+}
+
+static uint32_t octeon_snow3g_s2(uint32_t w)
+{
+    uint8_t x0 = octeon_snow3g_sq[w >> 24];
+    uint8_t x1 = octeon_snow3g_sq[(uint8_t)(w >> 16)];
+    uint8_t x2 = octeon_snow3g_sq[(uint8_t)(w >> 8)];
+    uint8_t x3 = octeon_snow3g_sq[(uint8_t)w];
+    uint8_t r0 = octeon_snow3g_mulx(x0, 0x69) ^ x1 ^ x2 ^
+                 octeon_snow3g_mulx(x3, 0x69) ^ x3;
+    uint8_t r1 = octeon_snow3g_mulx(x0, 0x69) ^ x0 ^
+                 octeon_snow3g_mulx(x1, 0x69) ^ x2 ^ x3;
+    uint8_t r2 = x0 ^ octeon_snow3g_mulx(x1, 0x69) ^ x1 ^
+                 octeon_snow3g_mulx(x2, 0x69) ^ x3;
+    uint8_t r3 = x0 ^ x1 ^ octeon_snow3g_mulx(x2, 0x69) ^ x2 ^
+                 octeon_snow3g_mulx(x3, 0x69);
+
+    return octeon_snow3g_pack32(r0, r1, r2, r3);
+}
+
+static uint32_t octeon_snow3g_clock_fsm(MIPSOcteonCryptoState *crypto)
+{
+    uint32_t f = (uint32_t)(crypto->snow3g_lfsr[15] + crypto->snow3g_fsm[0]) ^
+                 crypto->snow3g_fsm[1];
+    uint32_t r = (uint32_t)(crypto->snow3g_fsm[1] +
+                            (crypto->snow3g_fsm[2] ^ crypto->snow3g_lfsr[5]));
+
+    crypto->snow3g_fsm[2] = octeon_snow3g_s2(crypto->snow3g_fsm[1]);
+    crypto->snow3g_fsm[1] = octeon_snow3g_s1(crypto->snow3g_fsm[0]);
+    crypto->snow3g_fsm[0] = r;
+    return f;
+}
+
+static void octeon_snow3g_clock_lfsr(MIPSOcteonCryptoState *crypto,
+                                     bool init_mode, uint32_t f)
+{
+    uint32_t s0 = crypto->snow3g_lfsr[0];
+    uint32_t s11 = crypto->snow3g_lfsr[11];
+    uint32_t v = (s0 << 8) ^ octeon_snow3g_mulalpha(s0 >> 24) ^
+                 crypto->snow3g_lfsr[2] ^ (s11 >> 8) ^
+                 octeon_snow3g_divalpha((uint8_t)s11);
+    int i;
+
+    if (init_mode) {
+        v ^= f;
+    }
+
+    for (i = 0; i < 15; i++) {
+        crypto->snow3g_lfsr[i] = crypto->snow3g_lfsr[i + 1];
+    }
+    crypto->snow3g_lfsr[15] = v;
+}
+
+static uint32_t octeon_snow3g_generate_word(MIPSOcteonCryptoState *crypto)
+{
+    uint32_t f = octeon_snow3g_clock_fsm(crypto);
+    uint32_t z = f ^ crypto->snow3g_lfsr[0];
+
+    octeon_snow3g_clock_lfsr(crypto, false, 0);
+    return z;
+}
+
+static void octeon_snow3g_queue_result(MIPSOcteonCryptoState *crypto)
+{
+    uint32_t z0 = octeon_snow3g_generate_word(crypto);
+    uint32_t z1 = octeon_snow3g_generate_word(crypto);
+
+    crypto->snow3g_result = ((uint64_t)z0 << 32) | z1;
+}
+
+static void octeon_snow3g_start(MIPSOcteonCryptoState *crypto, uint64_t data)
+{
+    int i;
+
+    octeon_set_shared_mode(crypto, OCTEON_SHARED_MODE_SNOW3G);
+    for (i = 0; i < 7; i++) {
+        uint64_t pair = crypto->hsh_datw[i];
+
+        crypto->snow3g_lfsr[i * 2] = pair >> 32;
+        crypto->snow3g_lfsr[i * 2 + 1] = pair;
+    }
+    crypto->snow3g_lfsr[14] = data >> 32;
+    crypto->snow3g_lfsr[15] = data;
+    memset(crypto->snow3g_fsm, 0, sizeof(crypto->snow3g_fsm));
+
+    for (i = 0; i < 32; i++) {
+        uint32_t f = octeon_snow3g_clock_fsm(crypto);
+
+        octeon_snow3g_clock_lfsr(crypto, true, f);
+    }
+
+    (void)octeon_snow3g_clock_fsm(crypto);
+    octeon_snow3g_clock_lfsr(crypto, false, 0);
+    octeon_snow3g_queue_result(crypto);
+}
+
+static void octeon_snow3g_more(MIPSOcteonCryptoState *crypto)
+{
+    octeon_set_shared_mode(crypto, OCTEON_SHARED_MODE_SNOW3G);
+    octeon_snow3g_queue_result(crypto);
+}
+
+static int octeon_aes_key_bits(const MIPSOcteonCryptoState *crypto)
+{
+    enum {
+        OCTEON_AES_KEYLEN_128 = 1,
+        OCTEON_AES_KEYLEN_192 = 2,
+        OCTEON_AES_KEYLEN_256 = 3,
+    };
+
+    switch (crypto->aes_keylen) {
+    case OCTEON_AES_KEYLEN_128:
+        return 128;
+    case OCTEON_AES_KEYLEN_192:
+        return 192;
+    case OCTEON_AES_KEYLEN_256:
+        return 256;
+    default:
+        return 0;
+    }
+}
+
+static const uint8_t octeon_des_ip[64] = {
+    58, 50, 42, 34, 26, 18, 10,  2,
+    60, 52, 44, 36, 28, 20, 12,  4,
+    62, 54, 46, 38, 30, 22, 14,  6,
+    64, 56, 48, 40, 32, 24, 16,  8,
+    57, 49, 41, 33, 25, 17,  9,  1,
+    59, 51, 43, 35, 27, 19, 11,  3,
+    61, 53, 45, 37, 29, 21, 13,  5,
+    63, 55, 47, 39, 31, 23, 15,  7,
+};
+
+static const uint8_t octeon_des_fp[64] = {
+    40,  8, 48, 16, 56, 24, 64, 32,
+    39,  7, 47, 15, 55, 23, 63, 31,
+    38,  6, 46, 14, 54, 22, 62, 30,
+    37,  5, 45, 13, 53, 21, 61, 29,
+    36,  4, 44, 12, 52, 20, 60, 28,
+    35,  3, 43, 11, 51, 19, 59, 27,
+    34,  2, 42, 10, 50, 18, 58, 26,
+    33,  1, 41,  9, 49, 17, 57, 25,
+};
+
+static const uint8_t octeon_des_e[48] = {
+    32,  1,  2,  3,  4,  5,
+     4,  5,  6,  7,  8,  9,
+     8,  9, 10, 11, 12, 13,
+    12, 13, 14, 15, 16, 17,
+    16, 17, 18, 19, 20, 21,
+    20, 21, 22, 23, 24, 25,
+    24, 25, 26, 27, 28, 29,
+    28, 29, 30, 31, 32,  1,
+};
+
+static const uint8_t octeon_des_p[32] = {
+    16,  7, 20, 21, 29, 12, 28, 17,
+     1, 15, 23, 26,  5, 18, 31, 10,
+     2,  8, 24, 14, 32, 27,  3,  9,
+    19, 13, 30,  6, 22, 11,  4, 25,
+};
+
+static const uint8_t octeon_des_pc1[56] = {
+    57, 49, 41, 33, 25, 17,  9,
+     1, 58, 50, 42, 34, 26, 18,
+    10,  2, 59, 51, 43, 35, 27,
+    19, 11,  3, 60, 52, 44, 36,
+    63, 55, 47, 39, 31, 23, 15,
+     7, 62, 54, 46, 38, 30, 22,
+    14,  6, 61, 53, 45, 37, 29,
+    21, 13,  5, 28, 20, 12,  4,
+};
+
+static const uint8_t octeon_des_pc2[48] = {
+    14, 17, 11, 24,  1,  5,
+     3, 28, 15,  6, 21, 10,
+    23, 19, 12,  4, 26,  8,
+    16,  7, 27, 20, 13,  2,
+    41, 52, 31, 37, 47, 55,
+    30, 40, 51, 45, 33, 48,
+    44, 49, 39, 56, 34, 53,
+    46, 42, 50, 36, 29, 32,
+};
+
+static const uint8_t octeon_des_rotations[16] = {
+    1, 1, 2, 2, 2, 2, 2, 2,
+    1, 2, 2, 2, 2, 2, 2, 1,
+};
+
+static const uint8_t octeon_des_sboxes[8][64] = {
+    {
+        14, 4, 13, 1,  2, 15, 11, 8,  3, 10, 6, 12, 5, 9, 0, 7,
+         0, 15, 7, 4, 14,  2, 13, 1, 10,  6, 12, 11, 9, 5, 3, 8,
+         4, 1, 14, 8, 13,  6,  2, 11, 15, 12, 9,  7, 3, 10, 5, 0,
+        15, 12, 8, 2,  4,  9,  1, 7,  5, 11, 3, 14, 10, 0, 6, 13,
+    },
+    {
+        15, 1,  8, 14, 6, 11, 3, 4,  9, 7, 2, 13, 12, 0, 5, 10,
+         3, 13, 4, 7, 15, 2,  8, 14, 12, 0, 1, 10,  6, 9, 11, 5,
+         0, 14, 7, 11, 10, 4, 13, 1,  5, 8, 12, 6,  9, 3,  2, 15,
+        13, 8, 10, 1,  3, 15, 4, 2, 11, 6, 7, 12,  0, 5, 14, 9,
+    },
+    {
+        10, 0,  9, 14, 6, 3, 15, 5,  1, 13, 12, 7, 11, 4, 2, 8,
+        13, 7,  0, 9,  3, 4, 6,  10, 2, 8,  5, 14, 12, 11, 15, 1,
+        13, 6,  4, 9,  8, 15, 3, 0, 11, 1,  2, 12,  5, 10, 14, 7,
+         1, 10, 13, 0,  6, 9, 8, 7,  4, 15, 14, 3, 11, 5,  2, 12,
+    },
+    {
+         7, 13, 14, 3,  0, 6, 9, 10, 1, 2, 8,  5, 11, 12, 4, 15,
+        13, 8,  11, 5,  6, 15, 0, 3,  4, 7, 2, 12, 1,  10, 14, 9,
+        10, 6,  9,  0, 12, 11, 7, 13, 15, 1, 3, 14, 5,  2,  8,  4,
+         3, 15, 0,  6, 10, 1, 13, 8,  9, 4, 5, 11, 12, 7,  2,  14,
+    },
+    {
+         2, 12, 4,  1,  7, 10, 11, 6,  8, 5, 3, 15, 13, 0, 14, 9,
+        14, 11, 2,  12, 4, 7,  13, 1,  5, 0, 15, 10, 3,  9, 8,  6,
+         4, 2,  1,  11, 10, 13, 7, 8, 15, 9, 12, 5,  6,  3, 0, 14,
+        11, 8,  12, 7,  1, 14, 2, 13, 6, 15, 0,  9, 10, 4, 5,  3,
+    },
+    {
+        12, 1,  10, 15, 9, 2,  6, 8,  0, 13, 3, 4, 14, 7, 5, 11,
+        10, 15, 4,  2,  7, 12, 9, 5,  6, 1,  13, 14, 0, 11, 3, 8,
+         9, 14, 15, 5,  2, 8,  12, 3,  7, 0,  4, 10, 1, 13, 11, 6,
+         4, 3,  2,  12, 9, 5,  15, 10, 11, 14, 1, 7,  6, 0,  8, 13,
+    },
+    {
+         4, 11, 2, 14, 15, 0, 8, 13, 3, 12, 9, 7,  5, 10, 6, 1,
+        13, 0,  11, 7,  4, 9, 1, 10, 14, 3, 5, 12, 2, 15, 8, 6,
+         1, 4,  11, 13, 12, 3, 7, 14, 10, 15, 6, 8,  0, 5,  9, 2,
+         6, 11, 13, 8,  1, 4, 10, 7,  9, 5,  0, 15, 14, 2,  3, 12,
+    },
+    {
+        13, 2,  8, 4,  6, 15, 11, 1, 10, 9, 3, 14, 5, 0, 12, 7,
+         1, 15, 13, 8, 10, 3,  7,  4, 12, 5, 6, 11, 0, 14, 9,  2,
+         7, 11, 4,  1,  9, 12, 14, 2, 0,  6, 10, 13, 15, 3, 5, 8,
+         2, 1,  14, 7,  4, 10, 8,  13, 15, 12, 9, 0,  3,  5, 6, 11,
+    },
+};
+
+static const uint8_t octeon_kasumi_s7[128] = {
+     54,  50,  62,  56,  22,  34,  94,  96,  38,   6,  63,  93,   2,  18,
+    123,  33,  55, 113,  39, 114,  21,  67,  65,  12,  47,  73,  46,  27,
+     25, 111, 124,  81,  53,   9, 121,  79,  52,  60,  58,  48, 101, 127,
+     40, 120, 104,  70,  71,  43,  20, 122,  72,  61,  23, 109,  13, 100,
+     77,   1,  16,   7,  82,  10, 105,  98, 117, 116,  76,  11,  89, 106,
+      0, 125, 118,  99,  86,  69,  30,  57, 126,  87, 112,  51,  17,   5,
+     95,  14,  90,  84,  91,   8,  35, 103,  32,  97,  28,  66, 102,  31,
+     26,  45,  75,   4,  85,  92,  37,  74,  80,  49,  68,  29, 115,  44,
+     64, 107, 108,  24, 110,  83,  36,  78,  42,  19,  15,  41,  88, 119,
+     59,   3,
+};
+
+static const uint16_t octeon_kasumi_s9[512] = {
+    167, 239, 161, 379, 391, 334,   9, 338,  38, 226,  48, 358, 452, 385,
+     90, 397, 183, 253, 147, 331, 415, 340,  51, 362, 306, 500, 262,  82,
+    216, 159, 356, 177, 175, 241, 489,  37, 206,  17,   0, 333,  44, 254,
+    378,  58, 143, 220,  81, 400,  95,   3, 315, 245,  54, 235, 218, 405,
+    472, 264, 172, 494, 371, 290, 399,  76, 165, 197, 395, 121, 257, 480,
+    423, 212, 240,  28, 462, 176, 406, 507, 288, 223, 501, 407, 249, 265,
+     89, 186, 221, 428, 164,  74, 440, 196, 458, 421, 350, 163, 232, 158,
+    134, 354,  13, 250, 491, 142, 191,  69, 193, 425, 152, 227, 366, 135,
+    344, 300, 276, 242, 437, 320, 113, 278,  11, 243,  87, 317,  36,  93,
+    496,  27, 487, 446, 482,  41,  68, 156, 457, 131, 326, 403, 339,  20,
+     39, 115, 442, 124, 475, 384, 508,  53, 112, 170, 479, 151, 126, 169,
+     73, 268, 279, 321, 168, 364, 363, 292,  46, 499, 393, 327, 324,  24,
+    456, 267, 157, 460, 488, 426, 309, 229, 439, 506, 208, 271, 349, 401,
+    434, 236,  16, 209, 359,  52,  56, 120, 199, 277, 465, 416, 252, 287,
+    246,   6,  83, 305, 420, 345, 153, 502,  65,  61, 244, 282, 173, 222,
+    418,  67, 386, 368, 261, 101, 476, 291, 195, 430,  49,  79, 166, 330,
+    280, 383, 373, 128, 382, 408, 155, 495, 367, 388, 274, 107, 459, 417,
+     62, 454, 132, 225, 203, 316, 234,  14, 301,  91, 503, 286, 424, 211,
+    347, 307, 140, 374,  35, 103, 125, 427,  19, 214, 453, 146, 498, 314,
+    444, 230, 256, 329, 198, 285,  50, 116,  78, 410,  10, 205, 510, 171,
+    231,  45, 139, 467,  29,  86, 505,  32,  72,  26, 342, 150, 313, 490,
+    431, 238, 411, 325, 149, 473,  40, 119, 174, 355, 185, 233, 389,  71,
+    448, 273, 372,  55, 110, 178, 322,  12, 469, 392, 369, 190,   1, 109,
+    375, 137, 181,  88,  75, 308, 260, 484,  98, 272, 370, 275, 412, 111,
+    336, 318,   4, 504, 492, 259, 304,  77, 337, 435,  21, 357, 303, 332,
+    483,  18,  47,  85,  25, 497, 474, 289, 100, 269, 296, 478, 270, 106,
+     31, 104, 433,  84, 414, 486, 394,  96,  99, 154, 511, 148, 413, 361,
+    409, 255, 162, 215, 302, 201, 266, 351, 343, 144, 441, 365, 108, 298,
+    251,  34, 182, 509, 138, 210, 335, 133, 311, 352, 328, 141, 396, 346,
+    123, 319, 450, 281, 429, 228, 443, 481,  92, 404, 485, 422, 248, 297,
+     23, 213, 130, 466,  22, 217, 283,  70, 294, 360, 419, 127, 312, 377,
+      7, 468, 194,   2, 117, 295, 463, 258, 224, 447, 247, 187,  80, 398,
+    284, 353, 105, 390, 299, 471, 470, 184,  57, 200, 348,  63, 204, 188,
+     33, 451,  97,  30, 310, 219,  94, 160, 129, 493,  64, 179, 263, 102,
+    189, 207, 114, 402, 438, 477, 387, 122, 192,  42, 381,   5, 145, 118,
+    180, 449, 293, 323, 136, 380,  43,  66,  60, 455, 341, 445, 202, 432,
+      8, 237,  15, 376, 436, 464,  59, 461,
+};
+
+static const uint16_t octeon_kasumi_constants[8] = {
+    0x0123, 0x4567, 0x89ab, 0xcdef, 0xfedc, 0xba98, 0x7654, 0x3210,
+};
+
+typedef struct OcteonKasumiSubkeys {
+    uint16_t kli1[8];
+    uint16_t kli2[8];
+    uint16_t koi1[8];
+    uint16_t koi2[8];
+    uint16_t koi3[8];
+    uint16_t kii1[8];
+    uint16_t kii2[8];
+    uint16_t kii3[8];
+} OcteonKasumiSubkeys;
+
+static uint64_t octeon_des_permute(uint64_t input, const uint8_t *table,
+                                   size_t output_bits, size_t input_bits)
+{
+    uint64_t out = 0;
+
+    for (size_t i = 0; i < output_bits; i++) {
+        unsigned src = table[i] - 1;
+
+        out = (out << 1) | ((input >> (input_bits - 1 - src)) & 1);
+    }
+    return out;
+}
+
+static uint32_t octeon_des_rotate28(uint32_t v, unsigned shift)
+{
+    return ((v << shift) | (v >> (28 - shift))) & 0x0fffffffU;
+}
+
+static void octeon_des_expand_subkeys(uint64_t key, uint64_t subkeys[16])
+{
+    uint64_t permuted = octeon_des_permute(key, octeon_des_pc1,
+                                           ARRAY_SIZE(octeon_des_pc1), 64);
+    uint32_t c = (permuted >> 28) & 0x0fffffffU;
+    uint32_t d = permuted & 0x0fffffffU;
+
+    for (int i = 0; i < 16; i++) {
+        c = octeon_des_rotate28(c, octeon_des_rotations[i]);
+        d = octeon_des_rotate28(d, octeon_des_rotations[i]);
+        subkeys[i] = octeon_des_permute(((uint64_t)c << 28) | d,
+                                        octeon_des_pc2,
+                                        ARRAY_SIZE(octeon_des_pc2), 56);
+    }
+}
+
+static uint32_t octeon_des_f(uint32_t r, uint64_t subkey)
+{
+    uint64_t expanded = octeon_des_permute(r, octeon_des_e,
+                                           ARRAY_SIZE(octeon_des_e), 32);
+    uint32_t out = 0;
+
+    expanded ^= subkey;
+    for (int i = 0; i < 8; i++) {
+        uint8_t sextet = (expanded >> (42 - i * 6)) & 0x3f;
+        uint8_t row = ((sextet & 0x20) >> 4) | (sextet & 0x01);
+        uint8_t col = (sextet >> 1) & 0x0f;
+
+        out = (out << 4) | octeon_des_sboxes[i][row * 16 + col];
+    }
+
+    return octeon_des_permute(out, octeon_des_p, ARRAY_SIZE(octeon_des_p), 32);
+}
+
+static uint64_t octeon_des_block_crypt(uint64_t block, uint64_t key,
+                                       bool encrypt)
+{
+    uint64_t subkeys[16];
+    uint64_t permuted = octeon_des_permute(block, octeon_des_ip,
+                                           ARRAY_SIZE(octeon_des_ip), 64);
+    uint32_t l = permuted >> 32;
+    uint32_t r = permuted;
+
+    octeon_des_expand_subkeys(key, subkeys);
+
+    for (int i = 0; i < 16; i++) {
+        uint32_t next = l ^ octeon_des_f(r, subkeys[encrypt ? i : 15 - i]);
+
+        l = r;
+        r = next;
+    }
+
+    return octeon_des_permute(((uint64_t)r << 32) | l,
+                              octeon_des_fp, ARRAY_SIZE(octeon_des_fp), 64);
+}
+
+static uint64_t octeon_3des_block_crypt(uint64_t block, const uint64_t keys[3],
+                                        bool encrypt)
+{
+    if (encrypt) {
+        block = octeon_des_block_crypt(block, keys[0], true);
+        block = octeon_des_block_crypt(block, keys[1], false);
+        block = octeon_des_block_crypt(block, keys[2], true);
+    } else {
+        block = octeon_des_block_crypt(block, keys[2], false);
+        block = octeon_des_block_crypt(block, keys[1], true);
+        block = octeon_des_block_crypt(block, keys[0], false);
+    }
+    return block;
+}
+
+static void octeon_3des_crypt_common(MIPSOcteonCryptoState *crypto,
+                                     uint64_t input_reg,
+                                     bool encrypt, bool cbc)
+{
+    const uint64_t keys[3] = {
+        crypto->des3_key[0],
+        crypto->des3_key[1],
+        crypto->des3_key[2],
+    };
+    uint64_t block = input_reg;
+
+    if (cbc) {
+        if (encrypt) {
+            block ^= crypto->des3_iv;
+            block = octeon_3des_block_crypt(block, keys, true);
+            crypto->des3_iv = block;
+        } else {
+            block = octeon_3des_block_crypt(block, keys, false);
+            block ^= crypto->des3_iv;
+            crypto->des3_iv = input_reg;
+        }
+    } else {
+        block = octeon_3des_block_crypt(block, keys, encrypt);
+    }
+
+    crypto->des3_result = block;
+}
+
+static inline uint16_t octeon_rol16(uint16_t value, unsigned int bits)
+{
+    return (value << bits) | (value >> (16 - bits));
+}
+
+static void octeon_kasumi_key_schedule(const uint64_t key_regs[2],
+                                       OcteonKasumiSubkeys *subkeys)
+{
+    uint16_t key[8];
+    uint16_t key_prime[8];
+
+    key[0] = key_regs[0] >> 48;
+    key[1] = key_regs[0] >> 32;
+    key[2] = key_regs[0] >> 16;
+    key[3] = key_regs[0];
+    key[4] = key_regs[1] >> 48;
+    key[5] = key_regs[1] >> 32;
+    key[6] = key_regs[1] >> 16;
+    key[7] = key_regs[1];
+
+    for (int i = 0; i < 8; i++) {
+        key_prime[i] = key[i] ^ octeon_kasumi_constants[i];
+    }
+
+    for (int i = 0; i < 8; i++) {
+        subkeys->kli1[i] = octeon_rol16(key[i], 1);
+        subkeys->kli2[i] = key_prime[(i + 2) & 7];
+        subkeys->koi1[i] = octeon_rol16(key[(i + 1) & 7], 5);
+        subkeys->koi2[i] = octeon_rol16(key[(i + 5) & 7], 8);
+        subkeys->koi3[i] = octeon_rol16(key[(i + 6) & 7], 13);
+        subkeys->kii1[i] = key_prime[(i + 4) & 7];
+        subkeys->kii2[i] = key_prime[(i + 3) & 7];
+        subkeys->kii3[i] = key_prime[(i + 7) & 7];
+    }
+}
+
+static uint16_t octeon_kasumi_fi(uint16_t in, uint16_t subkey)
+{
+    uint16_t nine = in >> 7;
+    uint16_t seven = in & 0x7f;
+
+    nine = octeon_kasumi_s9[nine] ^ seven;
+    seven = octeon_kasumi_s7[seven] ^ (nine & 0x7f);
+    seven ^= subkey >> 9;
+    nine ^= subkey & 0x1ff;
+    nine = octeon_kasumi_s9[nine] ^ seven;
+    seven = octeon_kasumi_s7[seven] ^ (nine & 0x7f);
+    return (seven << 9) | nine;
+}
+
+static uint32_t octeon_kasumi_fo(uint32_t in, int index,
+                                 const OcteonKasumiSubkeys *subkeys)
+{
+    uint16_t left = in >> 16;
+    uint16_t right = in;
+
+    left ^= subkeys->koi1[index];
+    left = octeon_kasumi_fi(left, subkeys->kii1[index]);
+    left ^= right;
+    right ^= subkeys->koi2[index];
+    right = octeon_kasumi_fi(right, subkeys->kii2[index]);
+    right ^= left;
+    left ^= subkeys->koi3[index];
+    left = octeon_kasumi_fi(left, subkeys->kii3[index]);
+    left ^= right;
+
+    return ((uint32_t)right << 16) | left;
+}
+
+static uint32_t octeon_kasumi_fl(uint32_t in, int index,
+                                 const OcteonKasumiSubkeys *subkeys)
+{
+    uint16_t left = in >> 16;
+    uint16_t right = in;
+    uint16_t a = left & subkeys->kli1[index];
+    uint16_t b;
+
+    right ^= octeon_rol16(a, 1);
+    b = right | subkeys->kli2[index];
+    left ^= octeon_rol16(b, 1);
+    return ((uint32_t)left << 16) | right;
+}
+
+static uint64_t octeon_kasumi_block_encrypt(uint64_t block,
+                                            const uint64_t key_regs[2])
+{
+    OcteonKasumiSubkeys subkeys;
+    uint32_t left = block >> 32;
+    uint32_t right = block;
+
+    octeon_kasumi_key_schedule(key_regs, &subkeys);
+
+    for (int i = 0; i < 8; ) {
+        uint32_t temp = octeon_kasumi_fl(left, i, &subkeys);
+
+        temp = octeon_kasumi_fo(temp, i++, &subkeys);
+        right ^= temp;
+        temp = octeon_kasumi_fo(right, i, &subkeys);
+        temp = octeon_kasumi_fl(temp, i++, &subkeys);
+        left ^= temp;
+    }
+
+    return ((uint64_t)left << 32) | right;
+}
+
+static void octeon_kasumi_crypt_common(MIPSOcteonCryptoState *crypto,
+                                       uint64_t input_reg, bool cbc)
+{
+    const uint64_t key_regs[2] = {
+        crypto->des3_key[0],
+        crypto->des3_key[1],
+    };
+    uint64_t block = input_reg;
+
+    if (cbc) {
+        block ^= crypto->des3_iv;
+    }
+
+    block = octeon_kasumi_block_encrypt(block, key_regs);
+    if (cbc) {
+        crypto->des3_iv = block;
+    }
+    crypto->des3_result = block;
+}
+
+static void octeon_aes_load_key(const MIPSOcteonCryptoState *crypto,
+                                uint8_t *key, size_t keylen)
+{
+    stq_be_p(key, crypto->aes_key[0]);
+    stq_be_p(key + 8, crypto->aes_key[1]);
+    if (keylen > 16) {
+        stq_be_p(key + 16, crypto->aes_key[2]);
+    }
+    if (keylen > 24) {
+        stq_be_p(key + 24, crypto->aes_key[3]);
+    }
+}
+
+static void octeon_aes_load_block(const uint64_t regs[2], uint8_t *block)
+{
+    stq_be_p(block, regs[0]);
+    stq_be_p(block + 8, regs[1]);
+}
+
+static void octeon_aes_store_block(uint64_t regs[2], const uint8_t *block)
+{
+    regs[0] = ldq_be_p(block);
+    regs[1] = ldq_be_p(block + 8);
+}
+
+static void octeon_aes_encrypt_common(MIPSOcteonCryptoState *crypto, bool cbc)
+{
+    AES_KEY key;
+    uint8_t in[16];
+    uint8_t out[16];
+    uint8_t iv[16];
+    uint8_t raw_key[32] = {};
+    int bits = octeon_aes_key_bits(crypto);
+
+    if (!bits) {
+        return;
+    }
+
+    octeon_aes_load_key(crypto, raw_key, bits / 8);
+    octeon_aes_load_block(crypto->aes_input, in);
+    if (cbc) {
+        int i;
+
+        octeon_aes_load_block(crypto->aes_iv, iv);
+        for (i = 0; i < sizeof(in); i++) {
+            in[i] ^= iv[i];
+        }
+    }
+
+    AES_set_encrypt_key(raw_key, bits, &key);
+    AES_encrypt(in, out, &key);
+    octeon_aes_store_block(crypto->aes_result, out);
+    if (cbc) {
+        octeon_aes_store_block(crypto->aes_iv, out);
+    }
+}
+
+static void octeon_aes_decrypt_common(MIPSOcteonCryptoState *crypto, bool cbc)
+{
+    AES_KEY key;
+    uint8_t in[16];
+    uint8_t out[16];
+    uint8_t iv[16];
+    uint8_t next_iv[16];
+    uint8_t raw_key[32] = {};
+    int bits = octeon_aes_key_bits(crypto);
+    int i;
+
+    if (!bits) {
+        return;
+    }
+
+    octeon_aes_load_key(crypto, raw_key, bits / 8);
+    octeon_aes_load_block(crypto->aes_input, in);
+    if (cbc) {
+        memcpy(next_iv, in, sizeof(next_iv));
+        octeon_aes_load_block(crypto->aes_iv, iv);
+    }
+
+    AES_set_decrypt_key(raw_key, bits, &key);
+    AES_decrypt(in, out, &key);
+    if (cbc) {
+        for (i = 0; i < sizeof(out); i++) {
+            out[i] ^= iv[i];
+        }
+    }
+
+    octeon_aes_store_block(crypto->aes_result, out);
+    if (cbc) {
+        octeon_aes_store_block(crypto->aes_iv, next_iv);
+    }
+}
+
+static void octeon_gfm_mul(const uint64_t x[2], const uint64_t y[2],
+                           uint16_t poly, uint64_t out[2])
+{
+    uint64_t zh = 0, zl = 0;
+    uint64_t vh = y[0], vl = y[1];
+    uint64_t rh = (uint64_t)poly << 48;
+    int i;
+
+    for (i = 0; i < 128; i++) {
+        bool bit;
+        bool lsb;
+
+        if (i < 64) {
+            bit = (x[0] >> (63 - i)) & 1;
+        } else {
+            bit = (x[1] >> (127 - i)) & 1;
+        }
+        if (bit) {
+            zh ^= vh;
+            zl ^= vl;
+        }
+
+        lsb = vl & 1;
+        vl = (vh << 63) | (vl >> 1);
+        vh >>= 1;
+        if (lsb) {
+            vh ^= rh;
+        }
+    }
+
+    out[0] = zh;
+    out[1] = zl;
+}
+
+uint64_t helper_octeon_cop2_dmfc2(CPUMIPSState *env, uint32_t sel)
+{
+    MIPSOcteonCryptoState *crypto = &env->octeon_crypto;
+
+    if (crypto->shared_mode == OCTEON_SHARED_MODE_SNOW3G) {
+        if (sel >= OCTEON_COP2_SEL_SNOW3G_LFSR0 &&
+            sel <= OCTEON_COP2_SEL_SNOW3G_LFSR7) {
+            unsigned int idx = sel - OCTEON_COP2_SEL_SNOW3G_LFSR0;
+
+            return ((uint64_t)crypto->snow3g_lfsr[idx * 2] << 32) |
+                   crypto->snow3g_lfsr[idx * 2 + 1];
+        }
+        switch (sel) {
+        case OCTEON_COP2_SEL_SNOW3G_RESULT:
+            return crypto->snow3g_result;
+        case OCTEON_COP2_SEL_SNOW3G_FSM0:
+        case OCTEON_COP2_SEL_SNOW3G_FSM1:
+        case OCTEON_COP2_SEL_SNOW3G_FSM2:
+            return crypto->snow3g_fsm[sel - OCTEON_COP2_SEL_SNOW3G_FSM0];
+        default:
+            break;
+        }
+    }
+
+    switch (sel) {
+    case OCTEON_COP2_SEL_3DES_KEY0:
+    case OCTEON_COP2_SEL_3DES_KEY1:
+    case OCTEON_COP2_SEL_3DES_KEY2:
+        return crypto->des3_key[sel - OCTEON_COP2_SEL_3DES_KEY0];
+    case OCTEON_COP2_SEL_3DES_IV:
+        return crypto->des3_iv;
+    case OCTEON_COP2_SEL_3DES_RESULT_MF:
+    case OCTEON_COP2_SEL_3DES_RESULT_MT:
+        return crypto->des3_result;
+    case OCTEON_COP2_SEL_AES_RESINP0:
+    case OCTEON_COP2_SEL_AES_RESINP1:
+        return crypto->aes_result[sel - OCTEON_COP2_SEL_AES_RESINP0];
+    case OCTEON_COP2_SEL_AES_KEY0:
+    case OCTEON_COP2_SEL_AES_KEY1:
+    case OCTEON_COP2_SEL_AES_KEY2:
+    case OCTEON_COP2_SEL_AES_KEY3:
+        return crypto->aes_key[sel - OCTEON_COP2_SEL_AES_KEY0];
+    case OCTEON_COP2_SEL_AES_KEYLENGTH:
+        return crypto->aes_keylen;
+    case OCTEON_COP2_SEL_AES_INP0:
+        return crypto->aes_input[0];
+    case OCTEON_COP2_SEL_AES_IV0:
+    case OCTEON_COP2_SEL_AES_IV1:
+        return crypto->aes_iv[sel - OCTEON_COP2_SEL_AES_IV0];
+    case OCTEON_COP2_SEL_CRC_POLYNOMIAL:
+        return crypto->crc_poly;
+    case OCTEON_COP2_SEL_CRC_IV:
+        return crypto->crc_iv;
+    case OCTEON_COP2_SEL_CRC_LEN:
+        return crypto->crc_len;
+    case OCTEON_COP2_SEL_CRC_IV_REFLECT:
+        return octeon_crc_reflect32_by_byte(crypto->crc_iv);
+    case OCTEON_COP2_SEL_HSH_DATW0:
+    case OCTEON_COP2_SEL_HSH_DATW1:
+    case OCTEON_COP2_SEL_HSH_DATW2:
+    case OCTEON_COP2_SEL_HSH_DATW3:
+    case OCTEON_COP2_SEL_HSH_DATW4:
+    case OCTEON_COP2_SEL_HSH_DATW5:
+    case OCTEON_COP2_SEL_HSH_DATW6:
+    case OCTEON_COP2_SEL_HSH_DATW7:
+    case OCTEON_COP2_SEL_HSH_DATW8:
+    case OCTEON_COP2_SEL_HSH_DATW9:
+    case OCTEON_COP2_SEL_HSH_DATW10:
+    case OCTEON_COP2_SEL_HSH_DATW11:
+    case OCTEON_COP2_SEL_HSH_DATW12:
+    case OCTEON_COP2_SEL_HSH_DATW13:
+    case OCTEON_COP2_SEL_HSH_DATW14:
+        return crypto->hsh_datw[sel - OCTEON_COP2_SEL_HSH_DATW0];
+    case OCTEON_COP2_SEL_HSH_DATW15:
+        return crypto->hsh_datw[15];
+    case OCTEON_COP2_SEL_HSH_IVW0:
+    case OCTEON_COP2_SEL_HSH_IVW1:
+    case OCTEON_COP2_SEL_HSH_IVW2:
+    case OCTEON_COP2_SEL_HSH_IVW3:
+    case OCTEON_COP2_SEL_HSH_IVW4:
+    case OCTEON_COP2_SEL_HSH_IVW5:
+    case OCTEON_COP2_SEL_HSH_IVW6:
+    case OCTEON_COP2_SEL_HSH_IVW7:
+        return crypto->hsh_ivw[sel - OCTEON_COP2_SEL_HSH_IVW0];
+    case OCTEON_COP2_SEL_HSH_IV0:
+    case OCTEON_COP2_SEL_HSH_IV1:
+    case OCTEON_COP2_SEL_HSH_IV2:
+    case OCTEON_COP2_SEL_HSH_IV3:
+        return crypto->hsh_iv[sel - OCTEON_COP2_SEL_HSH_IV0];
+    case OCTEON_COP2_SEL_GFM_MUL_REFLECT0:
+    case OCTEON_COP2_SEL_GFM_MUL_REFLECT1:
+        return crypto->gfm_reflect_mul[sel - OCTEON_COP2_SEL_GFM_MUL_REFLECT0];
+    case OCTEON_COP2_SEL_GFM_RESINP_REFLECT0:
+    case OCTEON_COP2_SEL_GFM_RESINP_REFLECT1:
+        return crypto->gfm_reflect_resinp[
+            sel - OCTEON_COP2_SEL_GFM_RESINP_REFLECT0];
+    case OCTEON_COP2_SEL_GFM_MUL0:
+    case OCTEON_COP2_SEL_GFM_MUL1:
+        return crypto->gfm_mul[sel - OCTEON_COP2_SEL_GFM_MUL0];
+    case OCTEON_COP2_SEL_GFM_RESINP0:
+    case OCTEON_COP2_SEL_GFM_RESINP1:
+        return crypto->gfm_resinp[sel - OCTEON_COP2_SEL_GFM_RESINP0];
+    case OCTEON_COP2_SEL_GFM_POLY:
+        return crypto->gfm_poly;
+    default:
+        return 0;
+    }
+}
+
+void helper_octeon_cop2_dmtc2(CPUMIPSState *env, uint64_t value,
+                              uint32_t sel)
+{
+    MIPSOcteonCryptoState *crypto = &env->octeon_crypto;
+    uint64_t data = value;
+
+    switch (sel) {
+    case OCTEON_COP2_SEL_3DES_KEY0:
+    case OCTEON_COP2_SEL_3DES_KEY1:
+    case OCTEON_COP2_SEL_3DES_KEY2:
+        crypto->des3_key[sel - OCTEON_COP2_SEL_3DES_KEY0] = data;
+        break;
+    case OCTEON_COP2_SEL_3DES_IV:
+        crypto->des3_iv = data;
+        break;
+    case OCTEON_COP2_SEL_3DES_RESULT_MT:
+        crypto->des3_result = data;
+        break;
+    case OCTEON_COP2_SEL_3DES_ENC_CBC:
+        octeon_3des_crypt_common(crypto, data, true, true);
+        break;
+    case OCTEON_COP2_SEL_KAS_ENC_CBC:
+        octeon_kasumi_crypt_common(crypto, data, true);
+        break;
+    case OCTEON_COP2_SEL_3DES_ENC:
+        octeon_3des_crypt_common(crypto, data, true, false);
+        break;
+    case OCTEON_COP2_SEL_KAS_ENC:
+        octeon_kasumi_crypt_common(crypto, data, false);
+        break;
+    case OCTEON_COP2_SEL_3DES_DEC_CBC:
+        octeon_3des_crypt_common(crypto, data, false, true);
+        break;
+    case OCTEON_COP2_SEL_3DES_DEC:
+        octeon_3des_crypt_common(crypto, data, false, false);
+        break;
+    case OCTEON_COP2_SEL_AES_RESINP0:
+    case OCTEON_COP2_SEL_AES_RESINP1:
+        crypto->aes_input[sel - OCTEON_COP2_SEL_AES_RESINP0] = data;
+        crypto->aes_result[sel - OCTEON_COP2_SEL_AES_RESINP0] = data;
+        break;
+    case OCTEON_COP2_SEL_AES_IV0:
+    case OCTEON_COP2_SEL_AES_IV1:
+        crypto->aes_iv[sel - OCTEON_COP2_SEL_AES_IV0] = data;
+        break;
+    case OCTEON_COP2_SEL_AES_KEY0:
+    case OCTEON_COP2_SEL_AES_KEY1:
+    case OCTEON_COP2_SEL_AES_KEY2:
+    case OCTEON_COP2_SEL_AES_KEY3:
+        crypto->aes_key[sel - OCTEON_COP2_SEL_AES_KEY0] = data;
+        break;
+    case OCTEON_COP2_SEL_AES_ENC_CBC0:
+    case OCTEON_COP2_SEL_AES_ENC0:
+    case OCTEON_COP2_SEL_AES_DEC_CBC0:
+    case OCTEON_COP2_SEL_AES_DEC0:
+        crypto->aes_input[0] = data;
+        break;
+    case OCTEON_COP2_SEL_AES_KEYLENGTH:
+        crypto->aes_keylen = data;
+        break;
+    case OCTEON_COP2_SEL_CRC_WRITE_POLYNOMIAL:
+    case OCTEON_COP2_SEL_CRC_WRITE_POLYNOMIAL_REFLECT:
+        crypto->crc_poly = data;
+        break;
+    case OCTEON_COP2_SEL_CRC_IV:
+        crypto->crc_iv = data;
+        break;
+    case OCTEON_COP2_SEL_CRC_WRITE_LEN:
+        crypto->crc_len = data;
+        break;
+    case OCTEON_COP2_SEL_CRC_WRITE_IV_REFLECT:
+        crypto->crc_iv = octeon_crc_reflect32_by_byte((uint32_t)data);
+        break;
+    case OCTEON_COP2_SEL_CRC_WRITE_BYTE:
+        octeon_crc_update_normal(crypto, data, 1);
+        break;
+    case OCTEON_COP2_SEL_CRC_WRITE_HALF:
+        octeon_crc_update_normal(crypto, data, 2);
+        break;
+    case OCTEON_COP2_SEL_CRC_WRITE_WORD:
+        octeon_crc_update_normal(crypto, data, 4);
+        break;
+    case OCTEON_COP2_SEL_CRC_WRITE_DWORD:
+        octeon_crc_update_normal(crypto, data, 8);
+        break;
+    case OCTEON_COP2_SEL_CRC_WRITE_VAR:
+        octeon_crc_update_normal(crypto, data, MIN(8U, crypto->crc_len));
+        break;
+    case OCTEON_COP2_SEL_CRC_WRITE_BYTE_REFLECT:
+        octeon_crc_update_reflect(crypto, data, 1);
+        break;
+    case OCTEON_COP2_SEL_CRC_WRITE_HALF_REFLECT:
+        octeon_crc_update_reflect(crypto, data, 2);
+        break;
+    case OCTEON_COP2_SEL_CRC_WRITE_WORD_REFLECT:
+        octeon_crc_update_reflect(crypto, data, 4);
+        break;
+    case OCTEON_COP2_SEL_CRC_WRITE_DWORD_REFLECT:
+        octeon_crc_update_reflect(crypto, data, 8);
+        break;
+    case OCTEON_COP2_SEL_CRC_WRITE_VAR_REFLECT:
+        octeon_crc_update_reflect(crypto, data, MIN(8U, crypto->crc_len));
+        break;
+    case OCTEON_COP2_SEL_HSH_DATW0:
+    case OCTEON_COP2_SEL_HSH_DATW1:
+    case OCTEON_COP2_SEL_HSH_DATW2:
+    case OCTEON_COP2_SEL_HSH_DATW3:
+    case OCTEON_COP2_SEL_HSH_DATW4:
+    case OCTEON_COP2_SEL_HSH_DATW5:
+    case OCTEON_COP2_SEL_HSH_DATW6:
+    case OCTEON_COP2_SEL_HSH_DATW7:
+    case OCTEON_COP2_SEL_HSH_DATW8:
+    case OCTEON_COP2_SEL_HSH_DATW9:
+    case OCTEON_COP2_SEL_HSH_DATW10:
+    case OCTEON_COP2_SEL_HSH_DATW11:
+    case OCTEON_COP2_SEL_HSH_DATW12:
+    case OCTEON_COP2_SEL_HSH_DATW13:
+    case OCTEON_COP2_SEL_HSH_DATW14:
+        octeon_store_shared_hsh_window(crypto, sel, data);
+        break;
+    case OCTEON_COP2_SEL_HSH_DATW15:
+    case OCTEON_COP2_SEL_HSH_STARTSHA512:
+        crypto->hsh_datw[15] = data;
+        octeon_set_shared_mode(crypto, OCTEON_SHARED_MODE_SHA512);
+        octeon_sha512_transform(crypto);
+        break;
+    case OCTEON_COP2_SEL_HSH_IVW0:
+    case OCTEON_COP2_SEL_HSH_IVW1:
+    case OCTEON_COP2_SEL_HSH_IVW2:
+    case OCTEON_COP2_SEL_HSH_IVW3:
+    case OCTEON_COP2_SEL_HSH_IVW4:
+    case OCTEON_COP2_SEL_HSH_IVW5:
+    case OCTEON_COP2_SEL_HSH_IVW6:
+    case OCTEON_COP2_SEL_HSH_IVW7:
+        octeon_store_shared_hsh_window(crypto, sel, data);
+        break;
+    case OCTEON_COP2_SEL_GFM_MUL_REFLECT0:
+    case OCTEON_COP2_SEL_GFM_MUL_REFLECT1:
+        crypto->gfm_reflect_mul[
+            sel - OCTEON_COP2_SEL_GFM_MUL_REFLECT0] = data;
+        break;
+    case OCTEON_COP2_SEL_GFM_XOR0_REFLECT:
+        crypto->gfm_reflect_xor0 = data;
+        break;
+    case OCTEON_COP2_SEL_GFM_MUL0:
+    case OCTEON_COP2_SEL_GFM_MUL1:
+        crypto->gfm_mul[sel - OCTEON_COP2_SEL_GFM_MUL0] = data;
+        break;
+    case OCTEON_COP2_SEL_GFM_RESINP0:
+    case OCTEON_COP2_SEL_GFM_RESINP1:
+        crypto->gfm_resinp[sel - OCTEON_COP2_SEL_GFM_RESINP0] = data;
+        break;
+    case OCTEON_COP2_SEL_GFM_XOR0:
+        crypto->gfm_xor0 = data;
+        break;
+    case OCTEON_COP2_SEL_GFM_POLY:
+        crypto->gfm_poly = data;
+        break;
+    case OCTEON_COP2_SEL_HSH_DAT0:
+    case OCTEON_COP2_SEL_HSH_DAT1:
+    case OCTEON_COP2_SEL_HSH_DAT2:
+    case OCTEON_COP2_SEL_HSH_DAT3:
+    case OCTEON_COP2_SEL_HSH_DAT4:
+    case OCTEON_COP2_SEL_HSH_DAT5:
+    case OCTEON_COP2_SEL_HSH_DAT6:
+        crypto->hsh_dat[sel - OCTEON_COP2_SEL_HSH_DAT0] = data;
+        break;
+    case OCTEON_COP2_SEL_HSH_IV0:
+    case OCTEON_COP2_SEL_HSH_IV1:
+    case OCTEON_COP2_SEL_HSH_IV2:
+    case OCTEON_COP2_SEL_HSH_IV3:
+        crypto->hsh_iv[sel - OCTEON_COP2_SEL_HSH_IV0] = data;
+        break;
+    case OCTEON_COP2_SEL_HSH_STARTMD5:
+        crypto->hsh_dat[7] = data;
+        octeon_md5_transform(crypto);
+        break;
+    case OCTEON_COP2_SEL_HSH_STARTSHA256:
+        crypto->hsh_dat[7] = data;
+        octeon_sha256_transform(crypto);
+        break;
+    case OCTEON_COP2_SEL_HSH_STARTSHA_COMPAT:
+    case OCTEON_COP2_SEL_HSH_STARTSHA:
+        crypto->hsh_dat[7] = data;
+        octeon_sha1_transform(crypto);
+        break;
+    case OCTEON_COP2_SEL_GFM_XORMUL1_REFLECT:
+        octeon_gfm_mul_reflect(crypto, data);
+        break;
+    case OCTEON_COP2_SEL_AES_ENC_CBC1:
+        crypto->aes_input[1] = data;
+        octeon_aes_encrypt_common(crypto, true);
+        break;
+    case OCTEON_COP2_SEL_AES_ENC1:
+        crypto->aes_input[1] = data;
+        octeon_aes_encrypt_common(crypto, false);
+        break;
+    case OCTEON_COP2_SEL_AES_DEC_CBC1:
+        crypto->aes_input[1] = data;
+        octeon_aes_decrypt_common(crypto, true);
+        break;
+    case OCTEON_COP2_SEL_AES_DEC1:
+        crypto->aes_input[1] = data;
+        octeon_aes_decrypt_common(crypto, false);
+        break;
+    case OCTEON_COP2_SEL_GFM_XORMUL1: {
+        uint64_t in[2] = {
+            crypto->gfm_resinp[0] ^ crypto->gfm_xor0,
+            crypto->gfm_resinp[1] ^ data,
+        };
+
+        /*
+         * A 64-bit reflected GFM operation uses this XORMUL1 path when only
+         * MUL0 is programmed (MUL1 is zero), the polynomial fits in 8 bits,
+         * and the high input half is zero. Detect that shape and route it to
+         * octeon_gfm_mul64_uia2() instead of the GHASH-style multiplier.
+         */
+        if (crypto->gfm_poly <= 0xff &&
+            crypto->gfm_mul[1] == 0 &&
+            in[0] == 0) {
+            octeon_gfm_mul64_uia2(in, crypto->gfm_mul,
+                                  crypto->gfm_poly, crypto->gfm_resinp);
+        } else {
+            octeon_gfm_mul(in, crypto->gfm_mul, crypto->gfm_poly,
+                           crypto->gfm_resinp);
+        }
+        /*
+         * GFM_XOR0 is a write-only staging half consumed by the next XORMUL1
+         * operation, so clear it once the combined multiply has been issued.
+         */
+        crypto->gfm_xor0 = 0;
+        break;
+    }
+    case OCTEON_COP2_SEL_SNOW3G_START:
+        octeon_snow3g_start(crypto, data);
+        break;
+    case OCTEON_COP2_SEL_SNOW3G_MORE:
+        octeon_snow3g_more(crypto);
+        break;
+    default:
+        break;
+    }
+}
diff --git a/target/mips/tcg/octeon_translate.c b/target/mips/tcg/octeon_translate.c
index 07674f0d44..86e8c4b93d 100644
--- a/target/mips/tcg/octeon_translate.c
+++ b/target/mips/tcg/octeon_translate.c
@@ -17,6 +17,210 @@ typedef void gen_helper_octeon_vmul(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i64);
 typedef void gen_helper_octeon_qmac_fn(TCGv_ptr, TCGv_i64, TCGv_i64,
                                        TCGv_i32);
 
+static bool octeon_cop2_is_supported_dmfc2(uint16_t sel)
+{
+    switch (sel) {
+    case OCTEON_COP2_SEL_3DES_KEY0:
+    case OCTEON_COP2_SEL_3DES_KEY1:
+    case OCTEON_COP2_SEL_3DES_KEY2:
+    case OCTEON_COP2_SEL_3DES_IV:
+    case OCTEON_COP2_SEL_3DES_RESULT_MF:
+    case OCTEON_COP2_SEL_3DES_RESULT_MT:
+    case OCTEON_COP2_SEL_AES_RESINP0:
+    case OCTEON_COP2_SEL_AES_RESINP1:
+    case OCTEON_COP2_SEL_AES_KEY0:
+    case OCTEON_COP2_SEL_AES_KEY1:
+    case OCTEON_COP2_SEL_AES_KEY2:
+    case OCTEON_COP2_SEL_AES_KEY3:
+    case OCTEON_COP2_SEL_AES_KEYLENGTH:
+    case OCTEON_COP2_SEL_CRC_POLYNOMIAL:
+    case OCTEON_COP2_SEL_AES_IV0:
+    case OCTEON_COP2_SEL_AES_IV1:
+    case OCTEON_COP2_SEL_CRC_IV:
+    case OCTEON_COP2_SEL_CRC_LEN:
+    case OCTEON_COP2_SEL_CRC_IV_REFLECT:
+    case OCTEON_COP2_SEL_HSH_DATW0:
+    case OCTEON_COP2_SEL_HSH_DATW1:
+    case OCTEON_COP2_SEL_HSH_DATW2:
+    case OCTEON_COP2_SEL_HSH_DATW3:
+    case OCTEON_COP2_SEL_HSH_DATW4:
+    case OCTEON_COP2_SEL_HSH_DATW5:
+    case OCTEON_COP2_SEL_HSH_DATW6:
+    case OCTEON_COP2_SEL_HSH_DATW7:
+    case OCTEON_COP2_SEL_HSH_DATW8:
+    case OCTEON_COP2_SEL_HSH_DATW9:
+    case OCTEON_COP2_SEL_HSH_DATW10:
+    case OCTEON_COP2_SEL_HSH_DATW11:
+    case OCTEON_COP2_SEL_HSH_DATW12:
+    case OCTEON_COP2_SEL_HSH_DATW13:
+    case OCTEON_COP2_SEL_HSH_DATW14:
+    case OCTEON_COP2_SEL_HSH_DATW15:
+    case OCTEON_COP2_SEL_HSH_IV0:
+    case OCTEON_COP2_SEL_HSH_IV1:
+    case OCTEON_COP2_SEL_HSH_IV2:
+    case OCTEON_COP2_SEL_HSH_IV3:
+    case OCTEON_COP2_SEL_HSH_IVW0:
+    case OCTEON_COP2_SEL_HSH_IVW1:
+    case OCTEON_COP2_SEL_HSH_IVW2:
+    case OCTEON_COP2_SEL_HSH_IVW3:
+    case OCTEON_COP2_SEL_HSH_IVW4:
+    case OCTEON_COP2_SEL_HSH_IVW5:
+    case OCTEON_COP2_SEL_HSH_IVW6:
+    case OCTEON_COP2_SEL_HSH_IVW7:
+    case OCTEON_COP2_SEL_AES_INP0:
+    case OCTEON_COP2_SEL_GFM_MUL_REFLECT0:
+    case OCTEON_COP2_SEL_GFM_MUL_REFLECT1:
+    case OCTEON_COP2_SEL_GFM_RESINP_REFLECT0:
+    case OCTEON_COP2_SEL_GFM_RESINP_REFLECT1:
+    case OCTEON_COP2_SEL_GFM_MUL0:
+    case OCTEON_COP2_SEL_GFM_MUL1:
+    case OCTEON_COP2_SEL_GFM_RESINP0:
+    case OCTEON_COP2_SEL_GFM_RESINP1:
+    case OCTEON_COP2_SEL_GFM_POLY:
+        return true;
+    default:
+        return false;
+    }
+}
+
+static bool octeon_cop2_is_supported_dmtc2(uint16_t sel)
+{
+    switch (sel) {
+    case OCTEON_COP2_SEL_3DES_KEY0:
+    case OCTEON_COP2_SEL_3DES_KEY1:
+    case OCTEON_COP2_SEL_3DES_KEY2:
+    case OCTEON_COP2_SEL_3DES_IV:
+    case OCTEON_COP2_SEL_3DES_RESULT_MT:
+    case OCTEON_COP2_SEL_3DES_ENC_CBC:
+    case OCTEON_COP2_SEL_KAS_ENC_CBC:
+    case OCTEON_COP2_SEL_3DES_ENC:
+    case OCTEON_COP2_SEL_KAS_ENC:
+    case OCTEON_COP2_SEL_3DES_DEC_CBC:
+    case OCTEON_COP2_SEL_3DES_DEC:
+    case OCTEON_COP2_SEL_AES_RESINP0:
+    case OCTEON_COP2_SEL_AES_RESINP1:
+    case OCTEON_COP2_SEL_AES_IV0:
+    case OCTEON_COP2_SEL_AES_IV1:
+    case OCTEON_COP2_SEL_AES_KEY0:
+    case OCTEON_COP2_SEL_AES_KEY1:
+    case OCTEON_COP2_SEL_AES_KEY2:
+    case OCTEON_COP2_SEL_AES_KEY3:
+    case OCTEON_COP2_SEL_AES_ENC_CBC0:
+    case OCTEON_COP2_SEL_AES_ENC0:
+    case OCTEON_COP2_SEL_AES_DEC_CBC0:
+    case OCTEON_COP2_SEL_AES_DEC0:
+    case OCTEON_COP2_SEL_AES_KEYLENGTH:
+    case OCTEON_COP2_SEL_CRC_WRITE_POLYNOMIAL:
+    case OCTEON_COP2_SEL_CRC_IV:
+    case OCTEON_COP2_SEL_CRC_WRITE_LEN:
+    case OCTEON_COP2_SEL_CRC_WRITE_IV_REFLECT:
+    case OCTEON_COP2_SEL_CRC_WRITE_BYTE:
+    case OCTEON_COP2_SEL_CRC_WRITE_HALF:
+    case OCTEON_COP2_SEL_CRC_WRITE_WORD:
+    case OCTEON_COP2_SEL_CRC_WRITE_DWORD:
+    case OCTEON_COP2_SEL_CRC_WRITE_VAR:
+    case OCTEON_COP2_SEL_CRC_WRITE_POLYNOMIAL_REFLECT:
+    case OCTEON_COP2_SEL_CRC_WRITE_BYTE_REFLECT:
+    case OCTEON_COP2_SEL_CRC_WRITE_HALF_REFLECT:
+    case OCTEON_COP2_SEL_CRC_WRITE_WORD_REFLECT:
+    case OCTEON_COP2_SEL_CRC_WRITE_DWORD_REFLECT:
+    case OCTEON_COP2_SEL_CRC_WRITE_VAR_REFLECT:
+    case OCTEON_COP2_SEL_HSH_DAT0:
+    case OCTEON_COP2_SEL_HSH_DAT1:
+    case OCTEON_COP2_SEL_HSH_DAT2:
+    case OCTEON_COP2_SEL_HSH_DAT3:
+    case OCTEON_COP2_SEL_HSH_DAT4:
+    case OCTEON_COP2_SEL_HSH_DAT5:
+    case OCTEON_COP2_SEL_HSH_DAT6:
+    case OCTEON_COP2_SEL_HSH_IV0:
+    case OCTEON_COP2_SEL_HSH_IV1:
+    case OCTEON_COP2_SEL_HSH_IV2:
+    case OCTEON_COP2_SEL_HSH_IV3:
+    case OCTEON_COP2_SEL_HSH_DATW0:
+    case OCTEON_COP2_SEL_HSH_DATW1:
+    case OCTEON_COP2_SEL_HSH_DATW2:
+    case OCTEON_COP2_SEL_HSH_DATW3:
+    case OCTEON_COP2_SEL_HSH_DATW4:
+    case OCTEON_COP2_SEL_HSH_DATW5:
+    case OCTEON_COP2_SEL_HSH_DATW6:
+    case OCTEON_COP2_SEL_HSH_DATW7:
+    case OCTEON_COP2_SEL_HSH_DATW8:
+    case OCTEON_COP2_SEL_HSH_DATW9:
+    case OCTEON_COP2_SEL_HSH_DATW10:
+    case OCTEON_COP2_SEL_HSH_DATW11:
+    case OCTEON_COP2_SEL_HSH_DATW12:
+    case OCTEON_COP2_SEL_HSH_DATW13:
+    case OCTEON_COP2_SEL_HSH_DATW14:
+    case OCTEON_COP2_SEL_HSH_DATW15:
+    case OCTEON_COP2_SEL_HSH_IVW0:
+    case OCTEON_COP2_SEL_HSH_IVW1:
+    case OCTEON_COP2_SEL_HSH_IVW2:
+    case OCTEON_COP2_SEL_HSH_IVW3:
+    case OCTEON_COP2_SEL_HSH_IVW4:
+    case OCTEON_COP2_SEL_HSH_IVW5:
+    case OCTEON_COP2_SEL_HSH_IVW6:
+    case OCTEON_COP2_SEL_HSH_IVW7:
+    case OCTEON_COP2_SEL_GFM_MUL_REFLECT0:
+    case OCTEON_COP2_SEL_GFM_MUL_REFLECT1:
+    case OCTEON_COP2_SEL_GFM_XOR0_REFLECT:
+    case OCTEON_COP2_SEL_GFM_MUL0:
+    case OCTEON_COP2_SEL_GFM_MUL1:
+    case OCTEON_COP2_SEL_GFM_RESINP0:
+    case OCTEON_COP2_SEL_GFM_RESINP1:
+    case OCTEON_COP2_SEL_GFM_XOR0:
+    case OCTEON_COP2_SEL_GFM_POLY:
+    case OCTEON_COP2_SEL_HSH_STARTSHA_COMPAT:
+    case OCTEON_COP2_SEL_HSH_STARTMD5:
+    case OCTEON_COP2_SEL_SNOW3G_START:
+    case OCTEON_COP2_SEL_SNOW3G_MORE:
+    case OCTEON_COP2_SEL_HSH_STARTSHA256:
+    case OCTEON_COP2_SEL_HSH_STARTSHA:
+    case OCTEON_COP2_SEL_GFM_XORMUL1_REFLECT:
+    case OCTEON_COP2_SEL_HSH_STARTSHA512:
+    case OCTEON_COP2_SEL_GFM_XORMUL1:
+    case OCTEON_COP2_SEL_AES_ENC_CBC1:
+    case OCTEON_COP2_SEL_AES_ENC1:
+    case OCTEON_COP2_SEL_AES_DEC_CBC1:
+    case OCTEON_COP2_SEL_AES_DEC1:
+        return true;
+    default:
+        return false;
+    }
+}
+
+bool gen_octeon_cop2(DisasContext *ctx)
+{
+    enum {
+        OCTEON_CP2_RS_DMFC2 = 0x01,
+        OCTEON_CP2_RS_DMTC2 = 0x05,
+    };
+    int rs = extract32(ctx->opcode, 21, 5);
+    int rt = extract32(ctx->opcode, 16, 5);
+    uint16_t sel = extract32(ctx->opcode, 0, 16);
+    TCGv_i64 t0;
+
+    switch (rs) {
+    case OCTEON_CP2_RS_DMFC2:
+        if (!octeon_cop2_is_supported_dmfc2(sel)) {
+            return false;
+        }
+        t0 = tcg_temp_new_i64();
+        gen_helper_octeon_cop2_dmfc2(t0, tcg_env, tcg_constant_i32(sel));
+        gen_store_gpr(t0, rt);
+        return true;
+    case OCTEON_CP2_RS_DMTC2:
+        if (!octeon_cop2_is_supported_dmtc2(sel)) {
+            return false;
+        }
+        t0 = tcg_temp_new_i64();
+        gen_load_gpr(t0, rt);
+        gen_helper_octeon_cop2_dmtc2(tcg_env, t0, tcg_constant_i32(sel));
+        return true;
+    default:
+        return false;
+    }
+}
+
 static bool trans_BBIT(DisasContext *ctx, arg_BBIT *a)
 {
     TCGv_i64 p;
diff --git a/target/mips/tcg/translate.c b/target/mips/tcg/translate.c
index dac30aff8d..767d64718a 100644
--- a/target/mips/tcg/translate.c
+++ b/target/mips/tcg/translate.c
@@ -14863,6 +14863,15 @@ static bool decode_opc_legacy(CPUMIPSState *env, DisasContext *ctx)
         }
         break;
     case OPC_CP2:
+#if defined(TARGET_MIPS64)
+        if (ctx->insn_flags & INSN_OCTEON) {
+            if (gen_octeon_cop2(ctx)) {
+                break;
+            }
+            generate_exception_err(ctx, EXCP_CpU, 2);
+            break;
+        }
+#endif
         check_insn(ctx, ASE_LMMI);
         /* Note that these instructions use different fields.  */
         gen_loongson_multimedia(ctx, sa, rd, rt);
diff --git a/target/mips/tcg/translate.h b/target/mips/tcg/translate.h
index 89dde1e712..feb3c47c44 100644
--- a/target/mips/tcg/translate.h
+++ b/target/mips/tcg/translate.h
@@ -232,6 +232,7 @@ bool decode_ext_loongson(DisasContext *ctx, uint32_t insn);
 bool decode_ase_lcsr(DisasContext *ctx, uint32_t insn);
 bool decode_ext_tx79(DisasContext *ctx, uint32_t insn);
 bool decode_ext_octeon(DisasContext *ctx, uint32_t insn);
+bool gen_octeon_cop2(DisasContext *ctx);
 #endif
 bool decode_ext_vr54xx(DisasContext *ctx, uint32_t insn);
 
diff --git a/tests/tcg/mips/user/isa/octeon/octeon-insns.c b/tests/tcg/mips/user/isa/octeon/octeon-insns.c
index 9153e37e9e..435ccfa347 100644
--- a/tests/tcg/mips/user/isa/octeon/octeon-insns.c
+++ b/tests/tcg/mips/user/isa/octeon/octeon-insns.c
@@ -186,6 +186,70 @@ static uint64_t octeon_mtp0_zeroes_p1(void)
     return rd;
 }
 
+static uint64_t octeon_cop2_key0_readback(uint64_t value)
+{
+    uint64_t rd;
+
+    asm volatile(
+        "move $8, %[value]\n\t"
+        ".word 0x48a80104\n\t" /* dmtc2 $8, AES_KEY0 selector */
+        ".word 0x482a0104\n\t" /* dmfc2 $10, AES_KEY0 selector */
+        "move %[rd], $10\n\t"
+        : [rd] "=r" (rd)
+        : [value] "r" (value)
+        : "$8", "$10");
+
+    return rd;
+}
+
+static uint64_t octeon_cop2_key2_readback(uint64_t value)
+{
+    uint64_t rd;
+
+    asm volatile(
+        "move $8, %[value]\n\t"
+        ".word 0x48a80106\n\t" /* dmtc2 $8, AES_KEY2 selector */
+        ".word 0x482a0106\n\t" /* dmfc2 $10, AES_KEY2 selector */
+        "move %[rd], $10\n\t"
+        : [rd] "=r" (rd)
+        : [value] "r" (value)
+        : "$8", "$10");
+
+    return rd;
+}
+
+static uint64_t octeon_cop2_key3_readback(uint64_t value)
+{
+    uint64_t rd;
+
+    asm volatile(
+        "move $8, %[value]\n\t"
+        ".word 0x48a80107\n\t" /* dmtc2 $8, AES_KEY3 selector */
+        ".word 0x482a0107\n\t" /* dmfc2 $10, AES_KEY3 selector */
+        "move %[rd], $10\n\t"
+        : [rd] "=r" (rd)
+        : [value] "r" (value)
+        : "$8", "$10");
+
+    return rd;
+}
+
+static uint64_t octeon_cop2_keylength_readback(uint64_t value)
+{
+    uint64_t rd;
+
+    asm volatile(
+        "move $8, %[value]\n\t"
+        ".word 0x48a80110\n\t" /* dmtc2 $8, AES_KEYLENGTH selector */
+        ".word 0x482a0110\n\t" /* dmfc2 $10, AES_KEYLENGTH selector */
+        "move %[rd], $10\n\t"
+        : [rd] "=r" (rd)
+        : [value] "r" (value)
+        : "$8", "$10");
+
+    return rd;
+}
+
 int main(void)
 {
     assert(octeon_baddu(0x123, 0x0f0) == 0x13);
@@ -199,6 +263,13 @@ int main(void)
     assert(octeon_vmm0(5, 13, 7, 11) == 59);
     assert(octeon_vmm0_zeroes_mpl1() == 0);
     assert(octeon_mtp0_zeroes_p1() == 0);
+    assert(octeon_cop2_key0_readback(0x1122334455667788ULL) ==
+           0x1122334455667788ULL);
+    assert(octeon_cop2_key2_readback(0x8877665544332211ULL) ==
+           0x8877665544332211ULL);
+    assert(octeon_cop2_key3_readback(0x0102030405060708ULL) ==
+           0x0102030405060708ULL);
+    assert(octeon_cop2_keylength_readback(0xa5) == 0xa5);
 
     return 0;
 }

-- 
2.54.0



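The hand-assembled .word constants in the test follow the COP2 field layout that gen_octeon_cop2() decodes:
- the COP2 major opcode (0x12) in bits 31..26,
- rs (0x01 for DMFC2, 0x05 for DMTC2) in bits 25..21,
- the GPR number in bits 20..16,
- the 16-bit selector in bits 15..0.

Below is a minimal C sketch, assuming only those field positions, that rebuilds and decodes the constants. cop2_encode() and the other helpers are hypothetical names for illustration, not QEMU functions.

```c
#include <assert.h>
#include <stdint.h>

/* Field values taken from gen_octeon_cop2() and the MIPS COP2 opcode. */
enum {
    COP2_MAJOR_OPCODE = 0x12,
    COP2_RS_DMFC2 = 0x01,
    COP2_RS_DMTC2 = 0x05,
};

/* Build a DMFC2/DMTC2 word: opcode | rs | rt | 16-bit selector. */
static uint32_t cop2_encode(unsigned rs, unsigned rt, uint16_t sel)
{
    return ((uint32_t)COP2_MAJOR_OPCODE << 26) |
           ((uint32_t)(rs & 0x1f) << 21) |
           ((uint32_t)(rt & 0x1f) << 16) |
           sel;
}

/* Field extraction mirroring the extract32() calls in gen_octeon_cop2(). */
static unsigned cop2_rs(uint32_t insn) { return (insn >> 21) & 0x1f; }
static unsigned cop2_rt(uint32_t insn) { return (insn >> 16) & 0x1f; }
static uint16_t cop2_sel(uint32_t insn) { return (uint16_t)insn; }

/* Cross-check against the .word constants used in octeon-insns.c. */
static void cop2_encoding_selftest(void)
{
    assert(cop2_encode(COP2_RS_DMTC2, 8, 0x0104) == 0x48a80104u);
    assert(cop2_encode(COP2_RS_DMFC2, 10, 0x0104) == 0x482a0104u);
    assert(cop2_encode(COP2_RS_DMTC2, 8, 0x0110) == 0x48a80110u);
}
```

The selector is simply the low half-word of the instruction. This is why gen_octeon_cop2() can take it with a plain uint16_t truncation of ctx->opcode.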

* [PATCH v6 28/35] target/mips: add Octeon SMS4 crypto support
  2026-05-11 18:22 [PATCH v6 00/35] target/mips: add missing Octeon user-mode support James Hilliard
                   ` (26 preceding siblings ...)
  2026-05-11 18:22 ` [PATCH v6 27/35] target/mips: add Octeon COP2 crypto core support James Hilliard
@ 2026-05-11 18:22 ` James Hilliard
  2026-05-11 18:23 ` [PATCH v6 29/35] target/mips: add Octeon SHA3 " James Hilliard
                   ` (6 subsequent siblings)
  34 siblings, 0 replies; 52+ messages in thread
From: James Hilliard @ 2026-05-11 18:22 UTC
  To: qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier,
	Philippe Mathieu-Daudé, Aurelien Jarno, Jiaxun Yang,
	Aleksandar Rikalo, Huacai Chen, James Hilliard

On Octeon, the SMS4 engine is exposed through selectors that alias the
AES register bank; only the final ENC/DEC trigger selectors are
SMS4-specific. Add those selectors and model the RESINP, IV, and key
state shared with AES so that the emulated interface matches the
hardware behaviour.

Use the in-tree SM4 tables (sm4_subword() and sm4_ck[] from
crypto/sm4.h) to implement the block operation without adding a host
crypto dependency.

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
---
Changes v1 -> v2:
  - Add selector dispatch updates in octeon_translate.c after moving
    COP2 decode out of translate.c.  (suggested by Philippe
    Mathieu-Daudé)

Changes v5 -> v6:
  - Use RESINP wording for the SMS4 shared selector aliases.
---
 target/mips/cpu.h                  |  18 ++++++
 target/mips/tcg/octeon_crypto.c    | 109 +++++++++++++++++++++++++++++++++++++
 target/mips/tcg/octeon_translate.c |   4 ++
 3 files changed, 131 insertions(+)

diff --git a/target/mips/cpu.h b/target/mips/cpu.h
index e16f0f6e98..dc883bfb4a 100644
--- a/target/mips/cpu.h
+++ b/target/mips/cpu.h
@@ -593,6 +593,20 @@ typedef enum MIPSOcteonCop2Sel {
     OCTEON_COP2_SEL_AES_DEC0 = 0x010e,
     OCTEON_COP2_SEL_AES_KEYLENGTH = 0x0110,
     OCTEON_COP2_SEL_AES_INP0 = 0x0111,
+    /*
+     * SMS4 reuses the AES RESINP, IV, and key banks and only adds
+     * operation selectors for ECB/CBC encrypt/decrypt.
+     */
+    OCTEON_COP2_SEL_SMS4_RESINP0 = OCTEON_COP2_SEL_AES_RESINP0,
+    OCTEON_COP2_SEL_SMS4_RESINP1 = OCTEON_COP2_SEL_AES_RESINP1,
+    OCTEON_COP2_SEL_SMS4_IV0 = OCTEON_COP2_SEL_AES_IV0,
+    OCTEON_COP2_SEL_SMS4_IV1 = OCTEON_COP2_SEL_AES_IV1,
+    OCTEON_COP2_SEL_SMS4_KEY0 = OCTEON_COP2_SEL_AES_KEY0,
+    OCTEON_COP2_SEL_SMS4_KEY1 = OCTEON_COP2_SEL_AES_KEY1,
+    OCTEON_COP2_SEL_SMS4_ENC_CBC0 = OCTEON_COP2_SEL_AES_ENC_CBC0,
+    OCTEON_COP2_SEL_SMS4_ENC0 = OCTEON_COP2_SEL_AES_ENC0,
+    OCTEON_COP2_SEL_SMS4_DEC_CBC0 = OCTEON_COP2_SEL_AES_DEC_CBC0,
+    OCTEON_COP2_SEL_SMS4_DEC0 = OCTEON_COP2_SEL_AES_DEC0,
     OCTEON_COP2_SEL_CRC_POLYNOMIAL = 0x0200,
     OCTEON_COP2_SEL_CRC_IV = 0x0201,
     OCTEON_COP2_SEL_CRC_LEN = 0x0202,
@@ -661,6 +675,10 @@ typedef enum MIPSOcteonCop2Sel {
     OCTEON_COP2_SEL_AES_ENC1 = 0x310b,
     OCTEON_COP2_SEL_AES_DEC_CBC1 = 0x310d,
     OCTEON_COP2_SEL_AES_DEC1 = 0x310f,
+    OCTEON_COP2_SEL_SMS4_ENC_CBC1 = 0x3119,
+    OCTEON_COP2_SEL_SMS4_ENC1 = 0x311b,
+    OCTEON_COP2_SEL_SMS4_DEC_CBC1 = 0x311d,
+    OCTEON_COP2_SEL_SMS4_DEC1 = 0x311f,
     OCTEON_COP2_SEL_HSH_STARTMD5 = 0x4047,
     OCTEON_COP2_SEL_SNOW3G_START = 0x404d,
     OCTEON_COP2_SEL_SNOW3G_MORE = 0x404e,
diff --git a/target/mips/tcg/octeon_crypto.c b/target/mips/tcg/octeon_crypto.c
index 8b3260c4d6..2d2b19ad30 100644
--- a/target/mips/tcg/octeon_crypto.c
+++ b/target/mips/tcg/octeon_crypto.c
@@ -12,6 +12,7 @@
 #include "exec/helper-proto.h"
 #include "crypto/aes.h"
 #include "crypto/clmul.h"
+#include "crypto/sm4.h"
 #include "qemu/bitops.h"
 #include "qemu/host-utils.h"
 
@@ -745,6 +746,57 @@ static int octeon_aes_key_bits(const MIPSOcteonCryptoState *crypto)
     }
 }
 
+static inline uint32_t octeon_sms4_t(uint32_t x)
+{
+    x = sm4_subword(x);
+    return x ^ rol32(x, 2) ^ rol32(x, 10) ^
+           rol32(x, 18) ^ rol32(x, 24);
+}
+
+static inline uint32_t octeon_sms4_t_key(uint32_t x)
+{
+    x = sm4_subword(x);
+    return x ^ rol32(x, 13) ^ rol32(x, 23);
+}
+
+static void octeon_sms4_expand_key(const uint8_t *key, uint32_t round_keys[32])
+{
+    static const uint32_t fk[4] = {
+        0xa3b1bac6U, 0x56aa3350U, 0x677d9197U, 0xb27022dcU,
+    };
+    uint32_t k[36];
+
+    for (int i = 0; i < 4; i++) {
+        k[i] = ldl_be_p(key + i * 4) ^ fk[i];
+    }
+    for (int i = 0; i < 32; i++) {
+        k[i + 4] = k[i] ^ octeon_sms4_t_key(k[i + 1] ^ k[i + 2] ^
+                                            k[i + 3] ^ sm4_ck[i]);
+        round_keys[i] = k[i + 4];
+    }
+}
+
+static void octeon_sms4_crypt_block(const uint8_t *in, uint8_t *out,
+                                    const uint32_t round_keys[32],
+                                    bool encrypt)
+{
+    uint32_t x[36];
+
+    for (int i = 0; i < 4; i++) {
+        x[i] = ldl_be_p(in + i * 4);
+    }
+    for (int i = 0; i < 32; i++) {
+        uint32_t rk = round_keys[encrypt ? i : 31 - i];
+
+        x[i + 4] = x[i] ^ octeon_sms4_t(x[i + 1] ^ x[i + 2] ^
+                                        x[i + 3] ^ rk);
+    }
+    stl_be_p(out, x[35]);
+    stl_be_p(out + 4, x[34]);
+    stl_be_p(out + 8, x[33]);
+    stl_be_p(out + 12, x[32]);
+}
+
 static const uint8_t octeon_des_ip[64] = {
     58, 50, 42, 34, 26, 18, 10,  2,
     60, 52, 44, 36, 28, 20, 12,  4,
@@ -1198,6 +1250,47 @@ static void octeon_aes_store_block(uint64_t regs[2], const uint8_t *block)
     regs[1] = ldq_be_p(block + 8);
 }
 
+static void octeon_sms4_crypt_common(MIPSOcteonCryptoState *crypto,
+                                     bool encrypt, bool cbc)
+{
+    uint8_t key[16];
+    uint8_t in[16];
+    uint8_t out[16];
+    uint8_t iv[16];
+    uint8_t next_iv[16];
+    uint32_t round_keys[32];
+
+    /*
+     * SMS4 aliases the AES state onto the RESINP, IV, and KEY banks;
+     * only the final ENC/DEC trigger selectors (the *1 variants) differ.
+     */
+    octeon_aes_load_key(crypto, key, sizeof(key));
+    octeon_aes_load_block(crypto->aes_input, in);
+    if (cbc) {
+        octeon_aes_load_block(crypto->aes_iv, iv);
+        if (encrypt) {
+            for (int i = 0; i < sizeof(in); i++) {
+                in[i] ^= iv[i];
+            }
+        } else {
+            memcpy(next_iv, in, sizeof(next_iv));
+        }
+    }
+
+    octeon_sms4_expand_key(key, round_keys);
+    octeon_sms4_crypt_block(in, out, round_keys, encrypt);
+    if (cbc && !encrypt) {
+        for (int i = 0; i < sizeof(out); i++) {
+            out[i] ^= iv[i];
+        }
+    }
+
+    octeon_aes_store_block(crypto->aes_result, out);
+    if (cbc) {
+        octeon_aes_store_block(crypto->aes_iv, encrypt ? out : next_iv);
+    }
+}
+
 static void octeon_aes_encrypt_common(MIPSOcteonCryptoState *crypto, bool cbc)
 {
     AES_KEY key;
@@ -1614,6 +1707,22 @@ void helper_octeon_cop2_dmtc2(CPUMIPSState *env, uint64_t value,
         crypto->aes_input[1] = data;
         octeon_aes_decrypt_common(crypto, false);
         break;
+    case OCTEON_COP2_SEL_SMS4_ENC_CBC1:
+        crypto->aes_input[1] = data;
+        octeon_sms4_crypt_common(crypto, true, true);
+        break;
+    case OCTEON_COP2_SEL_SMS4_ENC1:
+        crypto->aes_input[1] = data;
+        octeon_sms4_crypt_common(crypto, true, false);
+        break;
+    case OCTEON_COP2_SEL_SMS4_DEC_CBC1:
+        crypto->aes_input[1] = data;
+        octeon_sms4_crypt_common(crypto, false, true);
+        break;
+    case OCTEON_COP2_SEL_SMS4_DEC1:
+        crypto->aes_input[1] = data;
+        octeon_sms4_crypt_common(crypto, false, false);
+        break;
     case OCTEON_COP2_SEL_GFM_XORMUL1: {
         uint64_t in[2] = {
             crypto->gfm_resinp[0] ^ crypto->gfm_xor0,
diff --git a/target/mips/tcg/octeon_translate.c b/target/mips/tcg/octeon_translate.c
index 86e8c4b93d..fac7b8fa2e 100644
--- a/target/mips/tcg/octeon_translate.c
+++ b/target/mips/tcg/octeon_translate.c
@@ -182,6 +182,10 @@ static bool octeon_cop2_is_supported_dmtc2(uint16_t sel)
     case OCTEON_COP2_SEL_AES_ENC1:
     case OCTEON_COP2_SEL_AES_DEC_CBC1:
     case OCTEON_COP2_SEL_AES_DEC1:
+    case OCTEON_COP2_SEL_SMS4_ENC_CBC1:
+    case OCTEON_COP2_SEL_SMS4_ENC1:
+    case OCTEON_COP2_SEL_SMS4_DEC_CBC1:
+    case OCTEON_COP2_SEL_SMS4_DEC1:
         return true;
     default:
         return false;

-- 
2.54.0


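octeon_sms4_crypt_block() uses the SM4 unbalanced Feistel structure. It runs 32 rounds of X[i+4] = X[i] ^ T(X[i+1] ^ X[i+2] ^ X[i+3] ^ rk[i]), then reverses the word order. Decryption reuses the same loop with the round keys in reverse order.

The sketch below shows why that reversal is enough for decryption to undo encryption. It assumes a hypothetical placeholder toy_t() in place of SM4's real S-box-based T transform. It is therefore not SM4 and must not be used as a cipher.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t rotl32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));   /* callers use 0 < n < 32 */
}

/* HYPOTHETICAL stand-in for SM4's T: arbitrary mixing plus SM4's L layer. */
static uint32_t toy_t(uint32_t x)
{
    x *= 0x9e3779b1u;
    return x ^ rotl32(x, 2) ^ rotl32(x, 10) ^ rotl32(x, 18) ^ rotl32(x, 24);
}

/* Same round and output-reversal structure as octeon_sms4_crypt_block(). */
static void sm4_like_crypt(const uint32_t in[4], uint32_t out[4],
                           const uint32_t rk[32], int encrypt)
{
    uint32_t x[36];

    for (int i = 0; i < 4; i++) {
        x[i] = in[i];
    }
    for (int i = 0; i < 32; i++) {
        /* Decryption replays the rounds with the keys in reverse order. */
        uint32_t k = rk[encrypt ? i : 31 - i];

        x[i + 4] = x[i] ^ toy_t(x[i + 1] ^ x[i + 2] ^ x[i + 3] ^ k);
    }
    /* Final reverse transform: emit X35, X34, X33, X32. */
    out[0] = x[35];
    out[1] = x[34];
    out[2] = x[33];
    out[3] = x[32];
}

/* Encrypt then decrypt one block and check that the input comes back. */
static void sm4_like_roundtrip_check(const uint32_t pt[4],
                                     const uint32_t rk[32])
{
    uint32_t ct[4], back[4];

    sm4_like_crypt(pt, ct, rk, 1);
    sm4_like_crypt(ct, back, rk, 0);
    for (int i = 0; i < 4; i++) {
        assert(back[i] == pt[i]);
    }
}
```

Each round only XORs T(...) into one word, so the structure inverts for any T. The cryptographic strength comes from the real T and the key schedule, which the patch builds from crypto/sm4.h.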

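For the CBC selectors, octeon_sms4_crypt_common() chains blocks as follows:
- On encryption, it XORs the IV into the plaintext before the block operation and stores the ciphertext back as the next IV.
- On decryption, it saves the incoming ciphertext as the next IV, then XORs the old IV into the block-operation output.

Below is a minimal sketch of that chaining. It assumes a hypothetical toy 16-byte permutation, toy_enc()/toy_dec(), in place of SMS4; it is illustrative only and not a cipher.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { BLK = 16 };

/* HYPOTHETICAL invertible stand-in for the SMS4 block operation. */
static void toy_enc(const uint8_t key[BLK], const uint8_t in[BLK],
                    uint8_t out[BLK])
{
    for (int i = 0; i < BLK; i++) {
        out[i] = in[(i + 1) % BLK] ^ key[i];
    }
}

static void toy_dec(const uint8_t key[BLK], const uint8_t in[BLK],
                    uint8_t out[BLK])
{
    for (int i = 0; i < BLK; i++) {
        out[(i + 1) % BLK] = in[i] ^ key[i];
    }
}

/* One CBC step; iv is updated in place, like crypto->aes_iv. */
static void cbc_step(const uint8_t key[BLK], uint8_t iv[BLK],
                     const uint8_t in[BLK], uint8_t out[BLK], int encrypt)
{
    uint8_t tmp[BLK];

    if (encrypt) {
        for (int i = 0; i < BLK; i++) {
            tmp[i] = in[i] ^ iv[i];
        }
        toy_enc(key, tmp, out);
        memcpy(iv, out, BLK);          /* next IV = this ciphertext */
    } else {
        uint8_t next_iv[BLK];

        memcpy(next_iv, in, BLK);      /* save before out may overwrite in */
        toy_dec(key, in, tmp);
        for (int i = 0; i < BLK; i++) {
            out[i] = tmp[i] ^ iv[i];
        }
        memcpy(iv, next_iv, BLK);
    }
}
```

Because the IV lives in guest-visible registers, a guest can stream many blocks through the *1 trigger selectors without reloading the IV between blocks.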

* [PATCH v6 29/35] target/mips: add Octeon SHA3 crypto support
  2026-05-11 18:22 [PATCH v6 00/35] target/mips: add missing Octeon user-mode support James Hilliard
                   ` (27 preceding siblings ...)
  2026-05-11 18:22 ` [PATCH v6 28/35] target/mips: add Octeon SMS4 crypto support James Hilliard
@ 2026-05-11 18:23 ` James Hilliard
  2026-05-11 18:23 ` [PATCH v6 30/35] target/mips: add Octeon ZUC " James Hilliard
                   ` (5 subsequent siblings)
  34 siblings, 0 replies; 52+ messages in thread
From: James Hilliard @ 2026-05-11 18:23 UTC
  To: qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier,
	Philippe Mathieu-Daudé, Aurelien Jarno, Jiaxun Yang,
	Aleksandar Rikalo, Huacai Chen, James Hilliard

Add the Octeon SHA3 register window, the XORDAT0..XORDAT17 absorb
selectors, and the STARTOP selector.

Keep writes through the shared HSH window coherent between the
HSH/SHA512 and SHA3 views. Model the dedicated 25-lane Keccak state
and the Keccak-f[1600] permutation so that the COP2 SHA3 interface
follows the hardware behaviour.

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
---
Changes v1 -> v2:
  - Use switch ranges and g_assert_not_reached() for SHA3 selector
    position decoding.  (suggested by Philippe Mathieu-Daudé)
  - Add selector dispatch updates in octeon_translate.c after moving
    COP2 decode out of translate.c.  (suggested by Philippe
    Mathieu-Daudé)

Changes v5 -> v6:
  - Rename SHA3 DAT15 selector aliases with MF/MT direction suffixes.
---
 target/mips/cpu.h                  |  22 +++++
 target/mips/system/machine.c       |   1 +
 target/mips/tcg/octeon_crypto.c    | 171 +++++++++++++++++++++++++++++++++++++
 target/mips/tcg/octeon_translate.c |  22 +++++
 4 files changed, 216 insertions(+)

diff --git a/target/mips/cpu.h b/target/mips/cpu.h
index dc883bfb4a..258db2babe 100644
--- a/target/mips/cpu.h
+++ b/target/mips/cpu.h
@@ -541,6 +541,7 @@ typedef enum MIPSOcteonSharedMode {
     OCTEON_SHARED_MODE_NONE = 0,
     OCTEON_SHARED_MODE_SHA512,
     OCTEON_SHARED_MODE_SNOW3G,
+    OCTEON_SHARED_MODE_SHA3,
 } MIPSOcteonSharedMode;
 
 typedef enum MIPSOcteonCop2Sel {
@@ -645,6 +646,7 @@ typedef enum MIPSOcteonCop2Sel {
     OCTEON_COP2_SEL_HSH_DATW13,
     OCTEON_COP2_SEL_HSH_DATW14,
     OCTEON_COP2_SEL_HSH_DATW15,
+    OCTEON_COP2_SEL_SHA3_DAT15_MF = 0x024f,
     OCTEON_COP2_SEL_HSH_IVW0 = 0x0250,
     OCTEON_COP2_SEL_HSH_IVW1,
     OCTEON_COP2_SEL_HSH_IVW2,
@@ -671,6 +673,24 @@ typedef enum MIPSOcteonCop2Sel {
     OCTEON_COP2_SEL_GFM_RESINP1,
     OCTEON_COP2_SEL_GFM_XOR0,
     OCTEON_COP2_SEL_GFM_POLY = 0x025e,
+    OCTEON_COP2_SEL_SHA3_XORDAT0 = 0x02c0,
+    OCTEON_COP2_SEL_SHA3_XORDAT1,
+    OCTEON_COP2_SEL_SHA3_XORDAT2,
+    OCTEON_COP2_SEL_SHA3_XORDAT3,
+    OCTEON_COP2_SEL_SHA3_XORDAT4,
+    OCTEON_COP2_SEL_SHA3_XORDAT5,
+    OCTEON_COP2_SEL_SHA3_XORDAT6,
+    OCTEON_COP2_SEL_SHA3_XORDAT7,
+    OCTEON_COP2_SEL_SHA3_XORDAT8,
+    OCTEON_COP2_SEL_SHA3_XORDAT9,
+    OCTEON_COP2_SEL_SHA3_XORDAT10,
+    OCTEON_COP2_SEL_SHA3_XORDAT11,
+    OCTEON_COP2_SEL_SHA3_XORDAT12,
+    OCTEON_COP2_SEL_SHA3_XORDAT13,
+    OCTEON_COP2_SEL_SHA3_XORDAT14,
+    OCTEON_COP2_SEL_SHA3_XORDAT15,
+    OCTEON_COP2_SEL_SHA3_XORDAT16,
+    OCTEON_COP2_SEL_SHA3_XORDAT17,
     OCTEON_COP2_SEL_AES_ENC_CBC1 = 0x3109,
     OCTEON_COP2_SEL_AES_ENC1 = 0x310b,
     OCTEON_COP2_SEL_AES_DEC_CBC1 = 0x310d,
@@ -683,6 +703,7 @@ typedef enum MIPSOcteonCop2Sel {
     OCTEON_COP2_SEL_SNOW3G_START = 0x404d,
     OCTEON_COP2_SEL_SNOW3G_MORE = 0x404e,
     OCTEON_COP2_SEL_HSH_STARTSHA256 = 0x404f,
+    OCTEON_COP2_SEL_SHA3_STARTOP = 0x4052,
     OCTEON_COP2_SEL_GFM_XORMUL1_REFLECT = 0x405d,
     OCTEON_COP2_SEL_HSH_STARTSHA = 0x4057,
     OCTEON_COP2_SEL_HSH_STARTSHA512 = 0x424f,
@@ -697,6 +718,7 @@ typedef struct MIPSOcteonCryptoState {
     uint64_t hsh_dat[8];
     uint64_t hsh_ivw[8];
     uint64_t hsh_datw[16];
+    uint64_t sha3_state[25];
     uint64_t aes_iv[2];
     uint64_t aes_key[4];
     uint64_t aes_result[2];
diff --git a/target/mips/system/machine.c b/target/mips/system/machine.c
index ebfa0a9eb0..e6336534f4 100644
--- a/target/mips/system/machine.c
+++ b/target/mips/system/machine.c
@@ -292,6 +292,7 @@ static const VMStateDescription mips_vmstate_octeon_crypto = {
         VMSTATE_UINT64_ARRAY(env.octeon_crypto.hsh_dat, MIPSCPU, 8),
         VMSTATE_UINT64_ARRAY(env.octeon_crypto.hsh_ivw, MIPSCPU, 8),
         VMSTATE_UINT64_ARRAY(env.octeon_crypto.hsh_datw, MIPSCPU, 16),
+        VMSTATE_UINT64_ARRAY(env.octeon_crypto.sha3_state, MIPSCPU, 25),
         VMSTATE_UINT64_ARRAY(env.octeon_crypto.aes_iv, MIPSCPU, 2),
         VMSTATE_UINT64_ARRAY(env.octeon_crypto.aes_key, MIPSCPU, 4),
         VMSTATE_UINT64_ARRAY(env.octeon_crypto.aes_result, MIPSCPU, 2),
diff --git a/target/mips/tcg/octeon_crypto.c b/target/mips/tcg/octeon_crypto.c
index 2d2b19ad30..42e68f4205 100644
--- a/target/mips/tcg/octeon_crypto.c
+++ b/target/mips/tcg/octeon_crypto.c
@@ -487,21 +487,150 @@ static void octeon_sha512_transform(MIPSOcteonCryptoState *crypto)
     crypto->hsh_ivw[7] += h;
 }
 
+static const uint64_t octeon_sha3_round_constants[24] = {
+    0x0000000000000001ULL, 0x0000000000008082ULL,
+    0x800000000000808aULL, 0x8000000080008000ULL,
+    0x000000000000808bULL, 0x0000000080000001ULL,
+    0x8000000080008081ULL, 0x8000000000008009ULL,
+    0x000000000000008aULL, 0x0000000000000088ULL,
+    0x0000000080008009ULL, 0x000000008000000aULL,
+    0x000000008000808bULL, 0x800000000000008bULL,
+    0x8000000000008089ULL, 0x8000000000008003ULL,
+    0x8000000000008002ULL, 0x8000000000000080ULL,
+    0x000000000000800aULL, 0x800000008000000aULL,
+    0x8000000080008081ULL, 0x8000000000008080ULL,
+    0x0000000080000001ULL, 0x8000000080008008ULL,
+};
+
+static const uint8_t octeon_sha3_rotation_constants[24] = {
+     1,  3,  6, 10, 15, 21, 28, 36, 45, 55,  2, 14,
+    27, 41, 56,  8, 25, 43, 62, 18, 39, 61, 20, 44,
+};
+
+static const uint8_t octeon_sha3_pi_lanes[24] = {
+    10,  7, 11, 17, 18,  3,  5, 16,  8, 21, 24,  4,
+    15, 23, 19, 13, 12,  2, 20, 14, 22,  9,  6,  1,
+};
+
+static void octeon_sha3_permute(MIPSOcteonCryptoState *crypto)
+{
+    uint64_t *state = crypto->sha3_state;
+
+    for (int round = 0; round < 24; round++) {
+        uint64_t bc[5];
+        uint64_t temp;
+
+        for (int x = 0; x < 5; x++) {
+            bc[x] = state[x] ^ state[5 + x] ^ state[10 + x] ^
+                    state[15 + x] ^ state[20 + x];
+        }
+        for (int x = 0; x < 5; x++) {
+            temp = bc[(x + 4) % 5] ^ rol64(bc[(x + 1) % 5], 1);
+            for (int y = 0; y < 25; y += 5) {
+                state[y + x] ^= temp;
+            }
+        }
+
+        temp = state[1];
+        for (int i = 0; i < 24; i++) {
+            uint64_t next = state[octeon_sha3_pi_lanes[i]];
+
+            state[octeon_sha3_pi_lanes[i]] =
+                rol64(temp, octeon_sha3_rotation_constants[i]);
+            temp = next;
+        }
+
+        for (int y = 0; y < 25; y += 5) {
+            for (int x = 0; x < 5; x++) {
+                bc[x] = state[y + x];
+            }
+            for (int x = 0; x < 5; x++) {
+                state[y + x] = bc[x] ^ ((~bc[(x + 1) % 5]) & bc[(x + 2) % 5]);
+            }
+        }
+
+        state[0] ^= octeon_sha3_round_constants[round];
+    }
+}
+
+static bool octeon_sha3_is_dat_sel(uint32_t sel)
+{
+    switch (sel) {
+    case OCTEON_COP2_SEL_HSH_DATW0 ... OCTEON_COP2_SEL_HSH_DATW15:
+    case OCTEON_COP2_SEL_HSH_IVW0 ... OCTEON_COP2_SEL_HSH_IVW7:
+    case OCTEON_COP2_SEL_SHA3_DAT15_MT:
+    case OCTEON_COP2_SEL_SHA3_DAT24:
+        return true;
+    default:
+        return false;
+    }
+}
+
+static int octeon_sha3_dat_pos_from_sel(uint32_t sel)
+{
+    switch (sel) {
+    case OCTEON_COP2_SEL_HSH_DATW0 ... OCTEON_COP2_SEL_HSH_DATW14:
+        return sel - OCTEON_COP2_SEL_HSH_DATW0;
+    case OCTEON_COP2_SEL_HSH_IVW0 ... OCTEON_COP2_SEL_HSH_IVW7:
+        return 16 + (sel - OCTEON_COP2_SEL_HSH_IVW0);
+    case OCTEON_COP2_SEL_HSH_DATW15:
+    case OCTEON_COP2_SEL_SHA3_DAT15_MT:
+        return 15;
+    case OCTEON_COP2_SEL_SHA3_DAT24:
+        return 24;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static uint64_t octeon_sha3_reg_to_lane(uint64_t value)
+{
+    /*
+     * The COP2 register interface is consumed by big-endian MIPS code as
+     * 64-bit register values, while Keccak lanes are byte-little-endian.
+     */
+    return bswap64(value);
+}
+
+static uint64_t octeon_sha3_lane_to_reg(uint64_t value)
+{
+    return bswap64(value);
+}
+
 static void octeon_store_shared_hsh_window(MIPSOcteonCryptoState *crypto,
                                          uint32_t sel, uint64_t value)
 {
     switch (sel) {
     case OCTEON_COP2_SEL_HSH_DATW0 ... OCTEON_COP2_SEL_HSH_DATW14:
         crypto->hsh_datw[sel - OCTEON_COP2_SEL_HSH_DATW0] = value;
+        crypto->sha3_state[sel - OCTEON_COP2_SEL_HSH_DATW0] =
+            octeon_sha3_reg_to_lane(value);
         break;
     case OCTEON_COP2_SEL_HSH_IVW0 ... OCTEON_COP2_SEL_HSH_IVW7:
         crypto->hsh_ivw[sel - OCTEON_COP2_SEL_HSH_IVW0] = value;
+        crypto->sha3_state[16 + (sel - OCTEON_COP2_SEL_HSH_IVW0)] =
+            octeon_sha3_reg_to_lane(value);
+        break;
+    case OCTEON_COP2_SEL_SHA3_DAT15_MT:
+        crypto->sha3_state[15] = octeon_sha3_reg_to_lane(value);
+        break;
+    case OCTEON_COP2_SEL_SHA3_DAT24:
+        crypto->sha3_state[24] = octeon_sha3_reg_to_lane(value);
         break;
     default:
         g_assert_not_reached();
     }
 }
 
+static int octeon_sha3_xordat_pos_from_sel(uint32_t sel)
+{
+    if (sel >= OCTEON_COP2_SEL_SHA3_XORDAT0 &&
+        sel <= OCTEON_COP2_SEL_SHA3_XORDAT17) {
+        return sel - OCTEON_COP2_SEL_SHA3_XORDAT0;
+    }
+    g_assert_not_reached();
+}
+
 static const uint8_t octeon_snow3g_sr[256] = {
     0x63, 0x7c, 0x77, 0x7b, 0xf2, 0x6b, 0x6f, 0xc5,
     0x30, 0x01, 0x67, 0x2b, 0xfe, 0xd7, 0xab, 0x76,
@@ -1396,6 +1525,7 @@ static void octeon_gfm_mul(const uint64_t x[2], const uint64_t y[2],
 uint64_t helper_octeon_cop2_dmfc2(CPUMIPSState *env, uint32_t sel)
 {
     MIPSOcteonCryptoState *crypto = &env->octeon_crypto;
+    int sha3_pos;
 
     if (crypto->shared_mode == OCTEON_SHARED_MODE_SNOW3G) {
         if (sel >= OCTEON_COP2_SEL_SNOW3G_LFSR0 &&
@@ -1417,6 +1547,12 @@ uint64_t helper_octeon_cop2_dmfc2(CPUMIPSState *env, uint32_t sel)
         }
     }
 
+    if (crypto->shared_mode == OCTEON_SHARED_MODE_SHA3 &&
+        octeon_sha3_is_dat_sel(sel)) {
+        sha3_pos = octeon_sha3_dat_pos_from_sel(sel);
+        return octeon_sha3_lane_to_reg(crypto->sha3_state[sha3_pos]);
+    }
+
     switch (sel) {
     case OCTEON_COP2_SEL_3DES_KEY0:
     case OCTEON_COP2_SEL_3DES_KEY1:
@@ -1507,6 +1643,7 @@ void helper_octeon_cop2_dmtc2(CPUMIPSState *env, uint64_t value,
 {
     MIPSOcteonCryptoState *crypto = &env->octeon_crypto;
     uint64_t data = value;
+    int sha3_pos;
 
     switch (sel) {
     case OCTEON_COP2_SEL_3DES_KEY0:
@@ -1628,6 +1765,14 @@ void helper_octeon_cop2_dmtc2(CPUMIPSState *env, uint64_t value,
         octeon_set_shared_mode(crypto, OCTEON_SHARED_MODE_SHA512);
         octeon_sha512_transform(crypto);
         break;
+    case OCTEON_COP2_SEL_SHA3_DAT15_MT:
+        octeon_set_shared_mode(crypto, OCTEON_SHARED_MODE_SHA3);
+        octeon_store_shared_hsh_window(crypto, sel, data);
+        break;
+    case OCTEON_COP2_SEL_SHA3_DAT24:
+        octeon_set_shared_mode(crypto, OCTEON_SHARED_MODE_SHA3);
+        octeon_store_shared_hsh_window(crypto, sel, data);
+        break;
     case OCTEON_COP2_SEL_HSH_IVW0:
     case OCTEON_COP2_SEL_HSH_IVW1:
     case OCTEON_COP2_SEL_HSH_IVW2:
@@ -1688,6 +1833,32 @@ void helper_octeon_cop2_dmtc2(CPUMIPSState *env, uint64_t value,
         crypto->hsh_dat[7] = data;
         octeon_sha1_transform(crypto);
         break;
+    case OCTEON_COP2_SEL_SHA3_XORDAT0:
+    case OCTEON_COP2_SEL_SHA3_XORDAT1:
+    case OCTEON_COP2_SEL_SHA3_XORDAT2:
+    case OCTEON_COP2_SEL_SHA3_XORDAT3:
+    case OCTEON_COP2_SEL_SHA3_XORDAT4:
+    case OCTEON_COP2_SEL_SHA3_XORDAT5:
+    case OCTEON_COP2_SEL_SHA3_XORDAT6:
+    case OCTEON_COP2_SEL_SHA3_XORDAT7:
+    case OCTEON_COP2_SEL_SHA3_XORDAT8:
+    case OCTEON_COP2_SEL_SHA3_XORDAT9:
+    case OCTEON_COP2_SEL_SHA3_XORDAT10:
+    case OCTEON_COP2_SEL_SHA3_XORDAT11:
+    case OCTEON_COP2_SEL_SHA3_XORDAT12:
+    case OCTEON_COP2_SEL_SHA3_XORDAT13:
+    case OCTEON_COP2_SEL_SHA3_XORDAT14:
+    case OCTEON_COP2_SEL_SHA3_XORDAT15:
+    case OCTEON_COP2_SEL_SHA3_XORDAT16:
+    case OCTEON_COP2_SEL_SHA3_XORDAT17:
+        octeon_set_shared_mode(crypto, OCTEON_SHARED_MODE_SHA3);
+        sha3_pos = octeon_sha3_xordat_pos_from_sel(sel);
+        crypto->sha3_state[sha3_pos] ^= octeon_sha3_reg_to_lane(data);
+        break;
+    case OCTEON_COP2_SEL_SHA3_STARTOP:
+        octeon_set_shared_mode(crypto, OCTEON_SHARED_MODE_SHA3);
+        octeon_sha3_permute(crypto);
+        break;
     case OCTEON_COP2_SEL_GFM_XORMUL1_REFLECT:
         octeon_gfm_mul_reflect(crypto, data);
         break;
diff --git a/target/mips/tcg/octeon_translate.c b/target/mips/tcg/octeon_translate.c
index fac7b8fa2e..5bb638147d 100644
--- a/target/mips/tcg/octeon_translate.c
+++ b/target/mips/tcg/octeon_translate.c
@@ -68,6 +68,7 @@ static bool octeon_cop2_is_supported_dmfc2(uint16_t sel)
     case OCTEON_COP2_SEL_HSH_IVW6:
     case OCTEON_COP2_SEL_HSH_IVW7:
     case OCTEON_COP2_SEL_AES_INP0:
+    case OCTEON_COP2_SEL_SHA3_DAT24:
     case OCTEON_COP2_SEL_GFM_MUL_REFLECT0:
     case OCTEON_COP2_SEL_GFM_MUL_REFLECT1:
     case OCTEON_COP2_SEL_GFM_RESINP_REFLECT0:
@@ -152,6 +153,8 @@ static bool octeon_cop2_is_supported_dmtc2(uint16_t sel)
     case OCTEON_COP2_SEL_HSH_DATW13:
     case OCTEON_COP2_SEL_HSH_DATW14:
     case OCTEON_COP2_SEL_HSH_DATW15:
+    case OCTEON_COP2_SEL_SHA3_DAT24:
+    case OCTEON_COP2_SEL_SHA3_DAT15_MT:
     case OCTEON_COP2_SEL_HSH_IVW0:
     case OCTEON_COP2_SEL_HSH_IVW1:
     case OCTEON_COP2_SEL_HSH_IVW2:
@@ -169,11 +172,30 @@ static bool octeon_cop2_is_supported_dmtc2(uint16_t sel)
     case OCTEON_COP2_SEL_GFM_RESINP1:
     case OCTEON_COP2_SEL_GFM_XOR0:
     case OCTEON_COP2_SEL_GFM_POLY:
+    case OCTEON_COP2_SEL_SHA3_XORDAT0:
+    case OCTEON_COP2_SEL_SHA3_XORDAT1:
+    case OCTEON_COP2_SEL_SHA3_XORDAT2:
+    case OCTEON_COP2_SEL_SHA3_XORDAT3:
+    case OCTEON_COP2_SEL_SHA3_XORDAT4:
+    case OCTEON_COP2_SEL_SHA3_XORDAT5:
+    case OCTEON_COP2_SEL_SHA3_XORDAT6:
+    case OCTEON_COP2_SEL_SHA3_XORDAT7:
+    case OCTEON_COP2_SEL_SHA3_XORDAT8:
+    case OCTEON_COP2_SEL_SHA3_XORDAT9:
+    case OCTEON_COP2_SEL_SHA3_XORDAT10:
+    case OCTEON_COP2_SEL_SHA3_XORDAT11:
+    case OCTEON_COP2_SEL_SHA3_XORDAT12:
+    case OCTEON_COP2_SEL_SHA3_XORDAT13:
+    case OCTEON_COP2_SEL_SHA3_XORDAT14:
+    case OCTEON_COP2_SEL_SHA3_XORDAT15:
+    case OCTEON_COP2_SEL_SHA3_XORDAT16:
+    case OCTEON_COP2_SEL_SHA3_XORDAT17:
     case OCTEON_COP2_SEL_HSH_STARTSHA_COMPAT:
     case OCTEON_COP2_SEL_HSH_STARTMD5:
     case OCTEON_COP2_SEL_SNOW3G_START:
     case OCTEON_COP2_SEL_SNOW3G_MORE:
     case OCTEON_COP2_SEL_HSH_STARTSHA256:
+    case OCTEON_COP2_SEL_SHA3_STARTOP:
     case OCTEON_COP2_SEL_HSH_STARTSHA:
     case OCTEON_COP2_SEL_GFM_XORMUL1_REFLECT:
     case OCTEON_COP2_SEL_HSH_STARTSHA512:

-- 
2.54.0



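The SHA3 selectors split the sponge into two guest-visible steps. Each XORDATn write XORs one 64-bit lane into the state, and STARTOP runs Keccak-f[1600]. The 18-lane XORDAT window is exactly the SHA3-224 rate (1600 - 2 * 224 = 1152 bits = 144 bytes); SHA3-256 uses 17 of those lanes.

Below is a self-contained C sketch of that flow for SHA3-256, reusing the round, rotation, and pi-lane tables from the patch. guest_bytes_to_lane() models a big-endian guest ld followed by octeon_sha3_reg_to_lane(). All names here are hypothetical, not QEMU functions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static const uint64_t keccak_rc[24] = {
    0x0000000000000001ULL, 0x0000000000008082ULL, 0x800000000000808aULL,
    0x8000000080008000ULL, 0x000000000000808bULL, 0x0000000080000001ULL,
    0x8000000080008081ULL, 0x8000000000008009ULL, 0x000000000000008aULL,
    0x0000000000000088ULL, 0x0000000080008009ULL, 0x000000008000000aULL,
    0x000000008000808bULL, 0x800000000000008bULL, 0x8000000000008089ULL,
    0x8000000000008003ULL, 0x8000000000008002ULL, 0x8000000000000080ULL,
    0x000000000000800aULL, 0x800000008000000aULL, 0x8000000080008081ULL,
    0x8000000000008080ULL, 0x0000000080000001ULL, 0x8000000080008008ULL,
};
static const uint8_t keccak_rotc[24] = {
     1,  3,  6, 10, 15, 21, 28, 36, 45, 55,  2, 14,
    27, 41, 56,  8, 25, 43, 62, 18, 39, 61, 20, 44,
};
static const uint8_t keccak_piln[24] = {
    10,  7, 11, 17, 18,  3,  5, 16,  8, 21, 24,  4,
    15, 23, 19, 13, 12,  2, 20, 14, 22,  9,  6,  1,
};

static uint64_t rotl64(uint64_t x, unsigned n)
{
    return (x << n) | (x >> (64 - n));  /* all callers use 0 < n < 64 */
}

/* Keccak-f[1600], same step order as octeon_sha3_permute() ("STARTOP"). */
static void keccak_f1600(uint64_t st[25])
{
    for (int round = 0; round < 24; round++) {
        uint64_t bc[5], t;

        for (int x = 0; x < 5; x++) {                   /* theta */
            bc[x] = st[x] ^ st[x + 5] ^ st[x + 10] ^ st[x + 15] ^ st[x + 20];
        }
        for (int x = 0; x < 5; x++) {
            t = bc[(x + 4) % 5] ^ rotl64(bc[(x + 1) % 5], 1);
            for (int y = 0; y < 25; y += 5) {
                st[y + x] ^= t;
            }
        }
        t = st[1];                                      /* rho + pi */
        for (int i = 0; i < 24; i++) {
            uint64_t next = st[keccak_piln[i]];
            st[keccak_piln[i]] = rotl64(t, keccak_rotc[i]);
            t = next;
        }
        for (int y = 0; y < 25; y += 5) {               /* chi */
            for (int x = 0; x < 5; x++) {
                bc[x] = st[y + x];
            }
            for (int x = 0; x < 5; x++) {
                st[y + x] = bc[x] ^ (~bc[(x + 1) % 5] & bc[(x + 2) % 5]);
            }
        }
        st[0] ^= keccak_rc[round];                      /* iota */
    }
}

/* Big-endian guest "ld" of 8 bytes, then the bswap64() of reg_to_lane. */
static uint64_t guest_bytes_to_lane(const uint8_t *p)
{
    uint64_t reg = 0, lane = 0;

    for (int i = 0; i < 8; i++) {
        reg = (reg << 8) | p[i];
    }
    for (int i = 0; i < 8; i++) {
        lane = (lane << 8) | (reg & 0xff);
        reg >>= 8;
    }
    return lane;
}

/* SHA3-256: per 136-byte block, XOR 17 lanes ("XORDATn"), then permute. */
static void sha3_256(const uint8_t *msg, size_t len, uint8_t digest[32])
{
    enum { RATE = 136 };
    uint64_t st[25] = { 0 };
    uint8_t block[RATE];

    for (;;) {
        size_t take = len < RATE ? len : RATE;

        memset(block, 0, RATE);
        memcpy(block, msg, take);
        if (take < RATE) {              /* SHA3 domain bits + pad10*1 */
            block[take] ^= 0x06;
            block[RATE - 1] ^= 0x80;
        }
        for (int i = 0; i < RATE / 8; i++) {
            st[i] ^= guest_bytes_to_lane(block + 8 * i);
        }
        keccak_f1600(st);               /* "STARTOP" */
        if (take < RATE) {
            break;
        }
        msg += RATE;
        len -= RATE;
    }
    for (int i = 0; i < 32; i++) {     /* digest bytes are little-endian lanes */
        digest[i] = (uint8_t)(st[i / 8] >> (8 * (i % 8)));
    }
}
```

The big-endian load followed by a byte swap equals a little-endian load of the same bytes. This equivalence is what octeon_sha3_reg_to_lane() relies on for big-endian guests.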

* [PATCH v6 30/35] target/mips: add Octeon ZUC crypto support
  2026-05-11 18:22 [PATCH v6 00/35] target/mips: add missing Octeon user-mode support James Hilliard
                   ` (28 preceding siblings ...)
  2026-05-11 18:23 ` [PATCH v6 29/35] target/mips: add Octeon SHA3 " James Hilliard
@ 2026-05-11 18:23 ` James Hilliard
  2026-05-11 18:23 ` [PATCH v6 31/35] target/mips: add Octeon Camellia " James Hilliard
                   ` (4 subsequent siblings)
  34 siblings, 0 replies; 52+ messages in thread
From: James Hilliard @ 2026-05-11 18:23 UTC
  To: qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier,
	Philippe Mathieu-Daudé, Aurelien Jarno, Jiaxun Yang,
	Aleksandar Rikalo, Huacai Chen, James Hilliard

Add the Octeon ZUC START and MORE selectors and model the shared state
window used by the hardware interface.

This covers the keystream and MAC engine state, including the
save-and-restore view that overlaps the HSH/SHA3 bank. Shared-window
writes also update the SHA512/SHA3 backing state so guests can switch
between engines without stale register contents.

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
---
Changes v1 -> v2:
  - Add shared-window selector predicates and assert on unreachable ZUC
    selector switches.  (suggested by Philippe Mathieu-Daudé)
  - Preserve aliased HSH/SHA3/SHA512 backing state during ZUC
    shared-window writes.
  - Add selector dispatch updates in octeon_translate.c after moving
    COP2 decode out of translate.c.  (suggested by Philippe
    Mathieu-Daudé)

Changes v5 -> v6:
  - Use the manual-aligned HSH_DATW field and shared HSH window helper
    names introduced by the COP2 crypto core patch.
---
 target/mips/cpu.h                  |  11 +-
 target/mips/system/machine.c       |   4 +
 target/mips/tcg/octeon_crypto.c    | 353 +++++++++++++++++++++++++++++++++++++
 target/mips/tcg/octeon_translate.c |   2 +
 4 files changed, 368 insertions(+), 2 deletions(-)

diff --git a/target/mips/cpu.h b/target/mips/cpu.h
index 258db2babe..0b16382dd3 100644
--- a/target/mips/cpu.h
+++ b/target/mips/cpu.h
@@ -541,6 +541,7 @@ typedef enum MIPSOcteonSharedMode {
     OCTEON_SHARED_MODE_NONE = 0,
     OCTEON_SHARED_MODE_SHA512,
     OCTEON_SHARED_MODE_SNOW3G,
+    OCTEON_SHARED_MODE_ZUC,
     OCTEON_SHARED_MODE_SHA3,
 } MIPSOcteonSharedMode;
 
@@ -627,8 +628,8 @@ typedef enum MIPSOcteonCop2Sel {
     OCTEON_COP2_SEL_CRC_WRITE_DWORD_REFLECT = 0x1217,
     OCTEON_COP2_SEL_CRC_WRITE_VAR_REFLECT = 0x1218,
     /*
-     * Octeon shares 0x0240..0x0257 between SHA512 state/data and the SNOW3G
-     * RESULT/FSM/LFSR window.
+     * Octeon shares 0x0240..0x0257 across the HSH/SHA512, SHA3, SNOW3G,
+     * and ZUC selector windows.
      */
     OCTEON_COP2_SEL_HSH_DATW0 = 0x0240,
     OCTEON_COP2_SEL_HSH_DATW1,
@@ -704,6 +705,8 @@ typedef enum MIPSOcteonCop2Sel {
     OCTEON_COP2_SEL_SNOW3G_MORE = 0x404e,
     OCTEON_COP2_SEL_HSH_STARTSHA256 = 0x404f,
     OCTEON_COP2_SEL_SHA3_STARTOP = 0x4052,
+    OCTEON_COP2_SEL_ZUC_START = 0x4055,
+    OCTEON_COP2_SEL_ZUC_MORE = 0x4056,
     OCTEON_COP2_SEL_GFM_XORMUL1_REFLECT = 0x405d,
     OCTEON_COP2_SEL_HSH_STARTSHA = 0x4057,
     OCTEON_COP2_SEL_HSH_STARTSHA512 = 0x424f,
@@ -738,6 +741,10 @@ typedef struct MIPSOcteonCryptoState {
     uint32_t snow3g_fsm[3];
     uint32_t snow3g_lfsr[16];
     uint64_t snow3g_result;
+    uint32_t zuc_fsm[2];
+    uint32_t zuc_lfsr[16];
+    uint32_t zuc_window[3];
+    uint32_t zuc_tresult;
 } MIPSOcteonCryptoState;
 
 typedef struct CPUArchState {
diff --git a/target/mips/system/machine.c b/target/mips/system/machine.c
index e6336534f4..9bcb066245 100644
--- a/target/mips/system/machine.c
+++ b/target/mips/system/machine.c
@@ -312,6 +312,10 @@ static const VMStateDescription mips_vmstate_octeon_crypto = {
         VMSTATE_UINT32_ARRAY(env.octeon_crypto.snow3g_fsm, MIPSCPU, 3),
         VMSTATE_UINT32_ARRAY(env.octeon_crypto.snow3g_lfsr, MIPSCPU, 16),
         VMSTATE_UINT64(env.octeon_crypto.snow3g_result, MIPSCPU),
+        VMSTATE_UINT32_ARRAY(env.octeon_crypto.zuc_fsm, MIPSCPU, 2),
+        VMSTATE_UINT32_ARRAY(env.octeon_crypto.zuc_lfsr, MIPSCPU, 16),
+        VMSTATE_UINT32_ARRAY(env.octeon_crypto.zuc_window, MIPSCPU, 3),
+        VMSTATE_UINT32(env.octeon_crypto.zuc_tresult, MIPSCPU),
         VMSTATE_END_OF_LIST()
     }
 };
diff --git a/target/mips/tcg/octeon_crypto.c b/target/mips/tcg/octeon_crypto.c
index 42e68f4205..30df901e6b 100644
--- a/target/mips/tcg/octeon_crypto.c
+++ b/target/mips/tcg/octeon_crypto.c
@@ -631,6 +631,277 @@ static int octeon_sha3_xordat_pos_from_sel(uint32_t sel)
     return -1;
 }
 
+static const uint8_t octeon_zuc_s0[256] = {
+    0x3e, 0x72, 0x5b, 0x47, 0xca, 0xe0, 0x00, 0x33,
+    0x04, 0xd1, 0x54, 0x98, 0x09, 0xb9, 0x6d, 0xcb,
+    0x7b, 0x1b, 0xf9, 0x32, 0xaf, 0x9d, 0x6a, 0xa5,
+    0xb8, 0x2d, 0xfc, 0x1d, 0x08, 0x53, 0x03, 0x90,
+    0x4d, 0x4e, 0x84, 0x99, 0xe4, 0xce, 0xd9, 0x91,
+    0xdd, 0xb6, 0x85, 0x48, 0x8b, 0x29, 0x6e, 0xac,
+    0xcd, 0xc1, 0xf8, 0x1e, 0x73, 0x43, 0x69, 0xc6,
+    0xb5, 0xbd, 0xfd, 0x39, 0x63, 0x20, 0xd4, 0x38,
+    0x76, 0x7d, 0xb2, 0xa7, 0xcf, 0xed, 0x57, 0xc5,
+    0xf3, 0x2c, 0xbb, 0x14, 0x21, 0x06, 0x55, 0x9b,
+    0xe3, 0xef, 0x5e, 0x31, 0x4f, 0x7f, 0x5a, 0xa4,
+    0x0d, 0x82, 0x51, 0x49, 0x5f, 0xba, 0x58, 0x1c,
+    0x4a, 0x16, 0xd5, 0x17, 0xa8, 0x92, 0x24, 0x1f,
+    0x8c, 0xff, 0xd8, 0xae, 0x2e, 0x01, 0xd3, 0xad,
+    0x3b, 0x4b, 0xda, 0x46, 0xeb, 0xc9, 0xde, 0x9a,
+    0x8f, 0x87, 0xd7, 0x3a, 0x80, 0x6f, 0x2f, 0xc8,
+    0xb1, 0xb4, 0x37, 0xf7, 0x0a, 0x22, 0x13, 0x28,
+    0x7c, 0xcc, 0x3c, 0x89, 0xc7, 0xc3, 0x96, 0x56,
+    0x07, 0xbf, 0x7e, 0xf0, 0x0b, 0x2b, 0x97, 0x52,
+    0x35, 0x41, 0x79, 0x61, 0xa6, 0x4c, 0x10, 0xfe,
+    0xbc, 0x26, 0x95, 0x88, 0x8a, 0xb0, 0xa3, 0xfb,
+    0xc0, 0x18, 0x94, 0xf2, 0xe1, 0xe5, 0xe9, 0x5d,
+    0xd0, 0xdc, 0x11, 0x66, 0x64, 0x5c, 0xec, 0x59,
+    0x42, 0x75, 0x12, 0xf5, 0x74, 0x9c, 0xaa, 0x23,
+    0x0e, 0x86, 0xab, 0xbe, 0x2a, 0x02, 0xe7, 0x67,
+    0xe6, 0x44, 0xa2, 0x6c, 0xc2, 0x93, 0x9f, 0xf1,
+    0xf6, 0xfa, 0x36, 0xd2, 0x50, 0x68, 0x9e, 0x62,
+    0x71, 0x15, 0x3d, 0xd6, 0x40, 0xc4, 0xe2, 0x0f,
+    0x8e, 0x83, 0x77, 0x6b, 0x25, 0x05, 0x3f, 0x0c,
+    0x30, 0xea, 0x70, 0xb7, 0xa1, 0xe8, 0xa9, 0x65,
+    0x8d, 0x27, 0x1a, 0xdb, 0x81, 0xb3, 0xa0, 0xf4,
+    0x45, 0x7a, 0x19, 0xdf, 0xee, 0x78, 0x34, 0x60,
+};
+
+static const uint8_t octeon_zuc_s1[256] = {
+    0x55, 0xc2, 0x63, 0x71, 0x3b, 0xc8, 0x47, 0x86,
+    0x9f, 0x3c, 0xda, 0x5b, 0x29, 0xaa, 0xfd, 0x77,
+    0x8c, 0xc5, 0x94, 0x0c, 0xa6, 0x1a, 0x13, 0x00,
+    0xe3, 0xa8, 0x16, 0x72, 0x40, 0xf9, 0xf8, 0x42,
+    0x44, 0x26, 0x68, 0x96, 0x81, 0xd9, 0x45, 0x3e,
+    0x10, 0x76, 0xc6, 0xa7, 0x8b, 0x39, 0x43, 0xe1,
+    0x3a, 0xb5, 0x56, 0x2a, 0xc0, 0x6d, 0xb3, 0x05,
+    0x22, 0x66, 0xbf, 0xdc, 0x0b, 0xfa, 0x62, 0x48,
+    0xdd, 0x20, 0x11, 0x06, 0x36, 0xc9, 0xc1, 0xcf,
+    0xf6, 0x27, 0x52, 0xbb, 0x69, 0xf5, 0xd4, 0x87,
+    0x7f, 0x84, 0x4c, 0xd2, 0x9c, 0x57, 0xa4, 0xbc,
+    0x4f, 0x9a, 0xdf, 0xfe, 0xd6, 0x8d, 0x7a, 0xeb,
+    0x2b, 0x53, 0xd8, 0x5c, 0xa1, 0x14, 0x17, 0xfb,
+    0x23, 0xd5, 0x7d, 0x30, 0x67, 0x73, 0x08, 0x09,
+    0xee, 0xb7, 0x70, 0x3f, 0x61, 0xb2, 0x19, 0x8e,
+    0x4e, 0xe5, 0x4b, 0x93, 0x8f, 0x5d, 0xdb, 0xa9,
+    0xad, 0xf1, 0xae, 0x2e, 0xcb, 0x0d, 0xfc, 0xf4,
+    0x2d, 0x46, 0x6e, 0x1d, 0x97, 0xe8, 0xd1, 0xe9,
+    0x4d, 0x37, 0xa5, 0x75, 0x5e, 0x83, 0x9e, 0xab,
+    0x82, 0x9d, 0xb9, 0x1c, 0xe0, 0xcd, 0x49, 0x89,
+    0x01, 0xb6, 0xbd, 0x58, 0x24, 0xa2, 0x5f, 0x38,
+    0x78, 0x99, 0x15, 0x90, 0x50, 0xb8, 0x95, 0xe4,
+    0xd0, 0x91, 0xc7, 0xce, 0xed, 0x0f, 0xb4, 0x6f,
+    0xa0, 0xcc, 0xf0, 0x02, 0x4a, 0x79, 0xc3, 0xde,
+    0xa3, 0xef, 0xea, 0x51, 0xe6, 0x6b, 0x18, 0xec,
+    0x1b, 0x2c, 0x80, 0xf7, 0x74, 0xe7, 0xff, 0x21,
+    0x5a, 0x6a, 0x54, 0x1e, 0x41, 0x31, 0x92, 0x35,
+    0xc4, 0x33, 0x07, 0x0a, 0xba, 0x7e, 0x0e, 0x34,
+    0x88, 0xb1, 0x98, 0x7c, 0xf3, 0x3d, 0x60, 0x6c,
+    0x7b, 0xca, 0xd3, 0x1f, 0x32, 0x65, 0x04, 0x28,
+    0x64, 0xbe, 0x85, 0x9b, 0x2f, 0x59, 0x8a, 0xd7,
+    0xb0, 0x25, 0xac, 0xaf, 0x12, 0x03, 0xe2, 0xf2,
+};
+
+static inline uint32_t octeon_zuc_addm(uint32_t a, uint32_t b)
+{
+    uint32_t c = a + b;
+
+    c = (c & 0x7fffffffU) + (c >> 31);
+    return c ? c : 0x7fffffffU;
+}
+
+static inline uint32_t octeon_zuc_mul_by_pow2(uint32_t v, unsigned int shift)
+{
+    return ((v << shift) | (v >> (31 - shift))) & 0x7fffffffU;
+}
+
+static inline uint32_t octeon_zuc_make_u32(uint8_t a, uint8_t b,
+                                           uint8_t c, uint8_t d)
+{
+    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
+           ((uint32_t)c << 8) | d;
+}
+
+static inline uint64_t octeon_zuc_pack_pair(uint32_t hi, uint32_t lo)
+{
+    return ((uint64_t)hi << 32) | lo;
+}
+
+static void octeon_zuc_bit_reorganization(const MIPSOcteonCryptoState *crypto,
+                                          uint32_t x[4])
+{
+    x[0] = ((crypto->zuc_lfsr[15] & 0x7fff8000U) << 1) |
+           (crypto->zuc_lfsr[14] & 0xffffU);
+    x[1] = ((crypto->zuc_lfsr[11] & 0xffffU) << 16) |
+           (crypto->zuc_lfsr[9] >> 15);
+    x[2] = ((crypto->zuc_lfsr[7] & 0xffffU) << 16) |
+           (crypto->zuc_lfsr[5] >> 15);
+    x[3] = ((crypto->zuc_lfsr[2] & 0xffffU) << 16) |
+           (crypto->zuc_lfsr[0] >> 15);
+}
+
+static inline uint32_t octeon_zuc_l1(uint32_t x)
+{
+    return x ^ rol32(x, 2) ^ rol32(x, 10) ^
+           rol32(x, 18) ^ rol32(x, 24);
+}
+
+static inline uint32_t octeon_zuc_l2(uint32_t x)
+{
+    return x ^ rol32(x, 8) ^ rol32(x, 14) ^
+           rol32(x, 22) ^ rol32(x, 30);
+}
+
+static uint32_t octeon_zuc_f(MIPSOcteonCryptoState *crypto, const uint32_t x[4])
+{
+    uint32_t w = (x[0] ^ crypto->zuc_fsm[0]) + crypto->zuc_fsm[1];
+    uint32_t w1 = crypto->zuc_fsm[0] + x[1];
+    uint32_t w2 = crypto->zuc_fsm[1] ^ x[2];
+    uint32_t u = octeon_zuc_l1((w1 << 16) | (w2 >> 16));
+    uint32_t v = octeon_zuc_l2((w2 << 16) | (w1 >> 16));
+
+    crypto->zuc_fsm[0] = octeon_zuc_make_u32(octeon_zuc_s0[u >> 24],
+                                             octeon_zuc_s1[(uint8_t)(u >> 16)],
+                                             octeon_zuc_s0[(uint8_t)(u >> 8)],
+                                             octeon_zuc_s1[(uint8_t)u]);
+    crypto->zuc_fsm[1] = octeon_zuc_make_u32(octeon_zuc_s0[v >> 24],
+                                             octeon_zuc_s1[(uint8_t)(v >> 16)],
+                                             octeon_zuc_s0[(uint8_t)(v >> 8)],
+                                             octeon_zuc_s1[(uint8_t)v]);
+    return w;
+}
+
+static void octeon_zuc_lfsr_step(MIPSOcteonCryptoState *crypto,
+                                 bool init_mode, uint32_t u)
+{
+    uint32_t f = crypto->zuc_lfsr[0];
+
+    f = octeon_zuc_addm(f, octeon_zuc_mul_by_pow2(crypto->zuc_lfsr[0], 8));
+    f = octeon_zuc_addm(f, octeon_zuc_mul_by_pow2(crypto->zuc_lfsr[4], 20));
+    f = octeon_zuc_addm(f, octeon_zuc_mul_by_pow2(crypto->zuc_lfsr[10], 21));
+    f = octeon_zuc_addm(f, octeon_zuc_mul_by_pow2(crypto->zuc_lfsr[13], 17));
+    f = octeon_zuc_addm(f, octeon_zuc_mul_by_pow2(crypto->zuc_lfsr[15], 15));
+    if (init_mode) {
+        f = octeon_zuc_addm(f, u);
+    }
+
+    memmove(&crypto->zuc_lfsr[0], &crypto->zuc_lfsr[1],
+            15 * sizeof(crypto->zuc_lfsr[0]));
+    crypto->zuc_lfsr[15] = f;
+}
+
+static uint32_t octeon_zuc_generate_word(MIPSOcteonCryptoState *crypto)
+{
+    uint32_t x[4];
+    uint32_t z;
+
+    octeon_zuc_bit_reorganization(crypto, x);
+    z = octeon_zuc_f(crypto, x) ^ x[3];
+    octeon_zuc_lfsr_step(crypto, false, 0);
+    return z;
+}
+
+static void octeon_zuc_fill_window(MIPSOcteonCryptoState *crypto)
+{
+    crypto->zuc_window[0] = octeon_zuc_generate_word(crypto);
+    crypto->zuc_window[1] = octeon_zuc_generate_word(crypto);
+    crypto->zuc_window[2] = octeon_zuc_generate_word(crypto);
+}
+
+static inline uint32_t
+octeon_zuc_window_word(const MIPSOcteonCryptoState *crypto, unsigned int bit)
+{
+    if (bit == 0) {
+        return crypto->zuc_window[0];
+    }
+    if (bit < 32) {
+        return (crypto->zuc_window[0] << bit) |
+               (crypto->zuc_window[1] >> (32 - bit));
+    }
+    if (bit == 32) {
+        return crypto->zuc_window[1];
+    }
+    return (crypto->zuc_window[1] << (bit - 32)) |
+           (crypto->zuc_window[2] >> (64 - bit));
+}
+
+static void octeon_zuc_advance_window(MIPSOcteonCryptoState *crypto)
+{
+    crypto->zuc_window[0] = crypto->zuc_window[2];
+    crypto->zuc_window[1] = octeon_zuc_generate_word(crypto);
+    crypto->zuc_window[2] = octeon_zuc_generate_word(crypto);
+}
+
+static void octeon_zuc_start(MIPSOcteonCryptoState *crypto, uint64_t data)
+{
+    uint32_t x[4];
+    bool restore_active = crypto->shared_mode == OCTEON_SHARED_MODE_ZUC;
+
+    octeon_set_shared_mode(crypto, OCTEON_SHARED_MODE_ZUC);
+    if (!restore_active) {
+        for (int i = 0; i < 7; i++) {
+            uint64_t pair = crypto->hsh_datw[i];
+
+            crypto->zuc_lfsr[i * 2] = (pair >> 32) & 0x7fffffffU;
+            crypto->zuc_lfsr[i * 2 + 1] = pair & 0x7fffffffU;
+        }
+    }
+    crypto->zuc_lfsr[14] = (data >> 32) & 0x7fffffffU;
+    crypto->zuc_lfsr[15] = data & 0x7fffffffU;
+    crypto->zuc_fsm[0] = 0;
+    crypto->zuc_fsm[1] = 0;
+    crypto->zuc_tresult = 0;
+
+    for (int i = 0; i < 32; i++) {
+        octeon_zuc_bit_reorganization(crypto, x);
+        octeon_zuc_lfsr_step(crypto, true, octeon_zuc_f(crypto, x) >> 1);
+    }
+
+    octeon_zuc_bit_reorganization(crypto, x);
+    (void)octeon_zuc_f(crypto, x);
+    octeon_zuc_lfsr_step(crypto, false, 0);
+    octeon_zuc_fill_window(crypto);
+}
+
+static void octeon_zuc_more(MIPSOcteonCryptoState *crypto, uint64_t data)
+{
+    uint32_t t = crypto->zuc_tresult;
+
+    octeon_set_shared_mode(crypto, OCTEON_SHARED_MODE_ZUC);
+    for (unsigned int bit = 0; bit < 64; bit++) {
+        if ((data >> (63 - bit)) & 1) {
+            t ^= octeon_zuc_window_word(crypto, bit);
+        }
+    }
+    crypto->zuc_tresult = t;
+    octeon_zuc_advance_window(crypto);
+}
+
+static bool octeon_zuc_is_shared_dmfc2_sel(uint32_t sel)
+{
+    switch (sel) {
+    case OCTEON_COP2_SEL_HSH_DATW0 ... OCTEON_COP2_SEL_HSH_DATW11:
+    case OCTEON_COP2_SEL_HSH_IVW0 ... OCTEON_COP2_SEL_HSH_IVW3:
+    case OCTEON_COP2_SEL_SHA3_DAT15_MF:
+    case OCTEON_COP2_SEL_SHA3_DAT24:
+        return true;
+    default:
+        return false;
+    }
+}
+
+static bool octeon_zuc_is_shared_dmtc2_sel(uint32_t sel)
+{
+    switch (sel) {
+    case OCTEON_COP2_SEL_HSH_DATW0 ... OCTEON_COP2_SEL_HSH_DATW11:
+    case OCTEON_COP2_SEL_HSH_IVW0 ... OCTEON_COP2_SEL_HSH_IVW3:
+    case OCTEON_COP2_SEL_SHA3_DAT15_MT:
+    case OCTEON_COP2_SEL_SHA3_DAT24:
+        return true;
+    default:
+        return false;
+    }
+}
+
 static const uint8_t octeon_snow3g_sr[256] = {
     0x63, 0x7c, 0x77, 0x7b, 0xf2, 0x6b, 0x6f, 0xc5,
     0x30, 0x01, 0x67, 0x2b, 0xfe, 0xd7, 0xab, 0x76,
@@ -1527,6 +1798,39 @@ uint64_t helper_octeon_cop2_dmfc2(CPUMIPSState *env, uint32_t sel)
     MIPSOcteonCryptoState *crypto = &env->octeon_crypto;
     int sha3_pos;
 
+    if (crypto->shared_mode == OCTEON_SHARED_MODE_ZUC &&
+        octeon_zuc_is_shared_dmfc2_sel(sel)) {
+        if (sel >= OCTEON_COP2_SEL_HSH_DATW0 &&
+            sel <= OCTEON_COP2_SEL_HSH_DATW7) {
+            unsigned int idx = sel - OCTEON_COP2_SEL_HSH_DATW0;
+
+            return octeon_zuc_pack_pair(crypto->zuc_lfsr[idx * 2],
+                                        crypto->zuc_lfsr[idx * 2 + 1]);
+        }
+        switch (sel) {
+        case OCTEON_COP2_SEL_HSH_DATW8:
+            return octeon_zuc_pack_pair(crypto->zuc_fsm[0], crypto->zuc_fsm[1]);
+        case OCTEON_COP2_SEL_HSH_DATW9:
+        case OCTEON_COP2_SEL_HSH_IVW0:
+            return octeon_zuc_pack_pair(crypto->zuc_window[0],
+                                        crypto->zuc_window[1]);
+        case OCTEON_COP2_SEL_HSH_DATW10:
+            return crypto->zuc_window[2];
+        case OCTEON_COP2_SEL_HSH_DATW11:
+        case OCTEON_COP2_SEL_HSH_IVW3:
+            return crypto->zuc_tresult;
+        case OCTEON_COP2_SEL_SHA3_DAT15_MF:
+        case OCTEON_COP2_SEL_SHA3_DAT24:
+            return 0;
+        case OCTEON_COP2_SEL_HSH_IVW1:
+            return crypto->zuc_fsm[0];
+        case OCTEON_COP2_SEL_HSH_IVW2:
+            return crypto->zuc_fsm[1];
+        default:
+            g_assert_not_reached();
+        }
+    }
+
     if (crypto->shared_mode == OCTEON_SHARED_MODE_SNOW3G) {
         if (sel >= OCTEON_COP2_SEL_SNOW3G_LFSR0 &&
             sel <= OCTEON_COP2_SEL_SNOW3G_LFSR7) {
@@ -1645,6 +1949,49 @@ void helper_octeon_cop2_dmtc2(CPUMIPSState *env, uint64_t value,
     uint64_t data = value;
     int sha3_pos;
 
+    if (crypto->shared_mode == OCTEON_SHARED_MODE_ZUC &&
+        octeon_zuc_is_shared_dmtc2_sel(sel)) {
+        octeon_store_shared_hsh_window(crypto, sel, data);
+
+        if (sel >= OCTEON_COP2_SEL_HSH_DATW0 &&
+            sel <= OCTEON_COP2_SEL_HSH_DATW7) {
+            unsigned int idx = sel - OCTEON_COP2_SEL_HSH_DATW0;
+
+            crypto->zuc_lfsr[idx * 2] = (data >> 32) & 0x7fffffffU;
+            crypto->zuc_lfsr[idx * 2 + 1] = data & 0x7fffffffU;
+            return;
+        }
+        switch (sel) {
+        case OCTEON_COP2_SEL_HSH_DATW8:
+            crypto->zuc_fsm[0] = data >> 32;
+            crypto->zuc_fsm[1] = data;
+            return;
+        case OCTEON_COP2_SEL_HSH_DATW9:
+        case OCTEON_COP2_SEL_HSH_IVW0:
+            crypto->zuc_window[0] = data >> 32;
+            crypto->zuc_window[1] = data;
+            return;
+        case OCTEON_COP2_SEL_HSH_DATW10:
+            crypto->zuc_window[2] = data;
+            return;
+        case OCTEON_COP2_SEL_HSH_DATW11:
+        case OCTEON_COP2_SEL_HSH_IVW3:
+            crypto->zuc_tresult = data;
+            return;
+        case OCTEON_COP2_SEL_SHA3_DAT15_MT:
+        case OCTEON_COP2_SEL_SHA3_DAT24:
+            return;
+        case OCTEON_COP2_SEL_HSH_IVW1:
+            crypto->zuc_fsm[0] = data;
+            return;
+        case OCTEON_COP2_SEL_HSH_IVW2:
+            crypto->zuc_fsm[1] = data;
+            return;
+        default:
+            g_assert_not_reached();
+        }
+    }
+
     switch (sel) {
     case OCTEON_COP2_SEL_3DES_KEY0:
     case OCTEON_COP2_SEL_3DES_KEY1:
@@ -1859,6 +2206,12 @@ void helper_octeon_cop2_dmtc2(CPUMIPSState *env, uint64_t value,
         octeon_set_shared_mode(crypto, OCTEON_SHARED_MODE_SHA3);
         octeon_sha3_permute(crypto);
         break;
+    case OCTEON_COP2_SEL_ZUC_START:
+        octeon_zuc_start(crypto, data);
+        break;
+    case OCTEON_COP2_SEL_ZUC_MORE:
+        octeon_zuc_more(crypto, data);
+        break;
     case OCTEON_COP2_SEL_GFM_XORMUL1_REFLECT:
         octeon_gfm_mul_reflect(crypto, data);
         break;
diff --git a/target/mips/tcg/octeon_translate.c b/target/mips/tcg/octeon_translate.c
index 5bb638147d..b4aa4917fd 100644
--- a/target/mips/tcg/octeon_translate.c
+++ b/target/mips/tcg/octeon_translate.c
@@ -196,6 +196,8 @@ static bool octeon_cop2_is_supported_dmtc2(uint16_t sel)
     case OCTEON_COP2_SEL_SNOW3G_MORE:
     case OCTEON_COP2_SEL_HSH_STARTSHA256:
     case OCTEON_COP2_SEL_SHA3_STARTOP:
+    case OCTEON_COP2_SEL_ZUC_START:
+    case OCTEON_COP2_SEL_ZUC_MORE:
     case OCTEON_COP2_SEL_HSH_STARTSHA:
     case OCTEON_COP2_SEL_GFM_XORMUL1_REFLECT:
     case OCTEON_COP2_SEL_HSH_STARTSHA512:

-- 
2.54.0



^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v6 31/35] target/mips: add Octeon Camellia crypto support
  2026-05-11 18:22 [PATCH v6 00/35] target/mips: add missing Octeon user-mode support James Hilliard
                   ` (29 preceding siblings ...)
  2026-05-11 18:23 ` [PATCH v6 30/35] target/mips: add Octeon ZUC " James Hilliard
@ 2026-05-11 18:23 ` James Hilliard
  2026-05-11 18:23 ` [PATCH v6 32/35] target/mips: add Octeon CHORD and LLM COP2 support James Hilliard
                   ` (3 subsequent siblings)
  34 siblings, 0 replies; 52+ messages in thread
From: James Hilliard @ 2026-05-11 18:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier,
	Philippe Mathieu-Daudé, Aurelien Jarno, Jiaxun Yang,
	Aleksandar Rikalo, Huacai Chen, James Hilliard

Add the Octeon Camellia ROUND, FL, and FLINV selectors and model the
round engine that reuses the AES RESINP bank.

Implement the Camellia F-function and FL layers directly from RFC 3713
so guest-managed key schedules can drive the engine through the hardware
interface.

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
---
Changes v1 -> v2:
  - Drop the Octeon prefix from generic Camellia helper routines.
    (suggested by Philippe Mathieu-Daudé)
  - Add selector dispatch updates in octeon_translate.c after moving
    COP2 decode out of translate.c.  (suggested by Philippe
    Mathieu-Daudé)

Changes v5 -> v6:
  - Use RESINP wording for the Camellia shared selector aliases.
---
 target/mips/cpu.h                  |   9 +++
 target/mips/tcg/octeon_crypto.c    | 120 +++++++++++++++++++++++++++++++++++++
 target/mips/tcg/octeon_translate.c |   3 +
 3 files changed, 132 insertions(+)

diff --git a/target/mips/cpu.h b/target/mips/cpu.h
index 0b16382dd3..ba886735d5 100644
--- a/target/mips/cpu.h
+++ b/target/mips/cpu.h
@@ -595,6 +595,14 @@ typedef enum MIPSOcteonCop2Sel {
     OCTEON_COP2_SEL_AES_DEC0 = 0x010e,
     OCTEON_COP2_SEL_AES_KEYLENGTH = 0x0110,
     OCTEON_COP2_SEL_AES_INP0 = 0x0111,
+    /*
+     * Camellia reuses the AES RESINP bank and adds per-round and
+     * diffusion-layer selectors for the guest-managed key schedule.
+     */
+    OCTEON_COP2_SEL_CAMELLIA_RESINP0 = OCTEON_COP2_SEL_AES_RESINP0,
+    OCTEON_COP2_SEL_CAMELLIA_RESINP1 = OCTEON_COP2_SEL_AES_RESINP1,
+    OCTEON_COP2_SEL_CAMELLIA_FL = 0x0115,
+    OCTEON_COP2_SEL_CAMELLIA_FLINV = 0x0116,
     /*
      * SMS4 reuses the AES RESINP, IV, and key banks and only adds
      * operation selectors for ECB/CBC encrypt/decrypt.
@@ -696,6 +704,7 @@ typedef enum MIPSOcteonCop2Sel {
     OCTEON_COP2_SEL_AES_ENC1 = 0x310b,
     OCTEON_COP2_SEL_AES_DEC_CBC1 = 0x310d,
     OCTEON_COP2_SEL_AES_DEC1 = 0x310f,
+    OCTEON_COP2_SEL_CAMELLIA_ROUND = 0x3114,
     OCTEON_COP2_SEL_SMS4_ENC_CBC1 = 0x3119,
     OCTEON_COP2_SEL_SMS4_ENC1 = 0x311b,
     OCTEON_COP2_SEL_SMS4_DEC_CBC1 = 0x311d,
diff --git a/target/mips/tcg/octeon_crypto.c b/target/mips/tcg/octeon_crypto.c
index 30df901e6b..27e34b7f43 100644
--- a/target/mips/tcg/octeon_crypto.c
+++ b/target/mips/tcg/octeon_crypto.c
@@ -1650,6 +1650,117 @@ static void octeon_aes_store_block(uint64_t regs[2], const uint8_t *block)
     regs[1] = ldq_be_p(block + 8);
 }
 
+static const uint8_t camellia_sbox1[256] = {
+    112, 130,  44, 236, 179,  39, 192, 229, 228, 133,  87,  53, 234,  12,
+    174,  65,  35, 239, 107, 147,  69,  25, 165,  33, 237,  14,  79,  78,
+     29, 101, 146, 189, 134, 184, 175, 143, 124, 235,  31, 206,  62,  48,
+    220,  95,  94, 197,  11,  26, 166, 225,  57, 202, 213,  71,  93,  61,
+    217,   1,  90, 214,  81,  86, 108,  77, 139,  13, 154, 102, 251, 204,
+    176,  45, 116,  18,  43,  32, 240, 177, 132, 153, 223,  76, 203, 194,
+     52, 126, 118,   5, 109, 183, 169,  49, 209,  23,   4, 215,  20,  88,
+     58,  97, 222,  27,  17,  28,  50,  15, 156,  22,  83,  24, 242,  34,
+    254,  68, 207, 178, 195, 181, 122, 145,  36,   8, 232, 168,  96, 252,
+    105,  80, 170, 208, 160, 125, 161, 137,  98, 151,  84,  91,  30, 149,
+    224, 255, 100, 210,  16, 196,   0,  72, 163, 247, 117, 219, 138,   3,
+    230, 218,   9,  63, 221, 148, 135,  92, 131,   2, 205,  74, 144,  51,
+    115, 103, 246, 243, 157, 127, 191, 226,  82, 155, 216,  38, 200,  55,
+    198,  59, 129, 150, 111,  75,  19, 190,  99,  46, 233, 121, 167, 140,
+    159, 110, 188, 142,  41, 245, 249, 182,  47, 253, 180,  89, 120, 152,
+      6, 106, 231,  70, 113, 186, 212,  37, 171,  66, 136, 162, 141, 250,
+    114,   7, 185,  85, 248, 238, 172,  10,  54,  73,  42, 104,  60,  56,
+    241, 164,  64,  40, 211, 123, 187, 201,  67, 193,  21, 227, 173, 244,
+    119, 199, 128, 158,
+};
+
+static inline uint8_t camellia_rotl8(uint8_t v, unsigned int shift)
+{
+    return (v << shift) | (v >> (8 - shift));
+}
+
+static inline uint8_t camellia_sbox2(uint8_t x)
+{
+    return camellia_rotl8(camellia_sbox1[x], 1);
+}
+
+static inline uint8_t camellia_sbox3(uint8_t x)
+{
+    return camellia_rotl8(camellia_sbox1[x], 7);
+}
+
+static inline uint8_t camellia_sbox4(uint8_t x)
+{
+    return camellia_sbox1[camellia_rotl8(x, 1)];
+}
+
+static uint64_t camellia_f(uint64_t input, uint64_t key)
+{
+    uint64_t x = input ^ key;
+    uint8_t t1 = camellia_sbox1[x >> 56];
+    uint8_t t2 = camellia_sbox2((x >> 48) & 0xff);
+    uint8_t t3 = camellia_sbox3((x >> 40) & 0xff);
+    uint8_t t4 = camellia_sbox4((x >> 32) & 0xff);
+    uint8_t t5 = camellia_sbox2((x >> 24) & 0xff);
+    uint8_t t6 = camellia_sbox3((x >> 16) & 0xff);
+    uint8_t t7 = camellia_sbox4((x >> 8) & 0xff);
+    uint8_t t8 = camellia_sbox1[x & 0xff];
+    uint8_t y1 = t1 ^ t3 ^ t4 ^ t6 ^ t7 ^ t8;
+    uint8_t y2 = t1 ^ t2 ^ t4 ^ t5 ^ t7 ^ t8;
+    uint8_t y3 = t1 ^ t2 ^ t3 ^ t5 ^ t6 ^ t8;
+    uint8_t y4 = t2 ^ t3 ^ t4 ^ t5 ^ t6 ^ t7;
+    uint8_t y5 = t1 ^ t2 ^ t6 ^ t7 ^ t8;
+    uint8_t y6 = t2 ^ t3 ^ t5 ^ t7 ^ t8;
+    uint8_t y7 = t3 ^ t4 ^ t5 ^ t6 ^ t8;
+    uint8_t y8 = t1 ^ t4 ^ t5 ^ t6 ^ t7;
+
+    return ((uint64_t)y1 << 56) | ((uint64_t)y2 << 48) |
+           ((uint64_t)y3 << 40) | ((uint64_t)y4 << 32) |
+           ((uint64_t)y5 << 24) | ((uint64_t)y6 << 16) |
+           ((uint64_t)y7 << 8) | y8;
+}
+
+static uint64_t camellia_fl(uint64_t input, uint64_t key)
+{
+    uint32_t x1 = input >> 32;
+    uint32_t x2 = input;
+    uint32_t k1 = key >> 32;
+    uint32_t k2 = key;
+
+    x2 ^= rol32(x1 & k1, 1);
+    x1 ^= x2 | k2;
+    return ((uint64_t)x1 << 32) | x2;
+}
+
+static uint64_t camellia_flinv(uint64_t input, uint64_t key)
+{
+    uint32_t y1 = input >> 32;
+    uint32_t y2 = input;
+    uint32_t k1 = key >> 32;
+    uint32_t k2 = key;
+
+    y1 ^= y2 | k2;
+    y2 ^= rol32(y1 & k1, 1);
+    return ((uint64_t)y1 << 32) | y2;
+}
+
+static void octeon_camellia_round(MIPSOcteonCryptoState *crypto, uint64_t key)
+{
+    uint64_t left = crypto->aes_result[0];
+    uint64_t right = crypto->aes_result[1];
+
+    crypto->aes_result[0] = right ^ camellia_f(left, key);
+    crypto->aes_result[1] = left;
+}
+
+static void octeon_camellia_fl_layer(MIPSOcteonCryptoState *crypto,
+                                     uint64_t key, bool inverse)
+{
+    uint64_t state = crypto->aes_result[inverse ? 1 : 0];
+
+    crypto->aes_result[inverse ? 1 : 0] = inverse ?
+        camellia_flinv(state, key) :
+        camellia_fl(state, key);
+}
+
 static void octeon_sms4_crypt_common(MIPSOcteonCryptoState *crypto,
                                      bool encrypt, bool cbc)
 {
@@ -2046,6 +2157,12 @@ void helper_octeon_cop2_dmtc2(CPUMIPSState *env, uint64_t value,
     case OCTEON_COP2_SEL_AES_KEYLENGTH:
         crypto->aes_keylen = data;
         break;
+    case OCTEON_COP2_SEL_CAMELLIA_FL:
+        octeon_camellia_fl_layer(crypto, data, false);
+        break;
+    case OCTEON_COP2_SEL_CAMELLIA_FLINV:
+        octeon_camellia_fl_layer(crypto, data, true);
+        break;
     case OCTEON_COP2_SEL_CRC_WRITE_POLYNOMIAL:
     case OCTEON_COP2_SEL_CRC_WRITE_POLYNOMIAL_REFLECT:
         crypto->crc_poly = data;
@@ -2231,6 +2348,9 @@ void helper_octeon_cop2_dmtc2(CPUMIPSState *env, uint64_t value,
         crypto->aes_input[1] = data;
         octeon_aes_decrypt_common(crypto, false);
         break;
+    case OCTEON_COP2_SEL_CAMELLIA_ROUND:
+        octeon_camellia_round(crypto, data);
+        break;
     case OCTEON_COP2_SEL_SMS4_ENC_CBC1:
         crypto->aes_input[1] = data;
         octeon_sms4_crypt_common(crypto, true, true);
diff --git a/target/mips/tcg/octeon_translate.c b/target/mips/tcg/octeon_translate.c
index b4aa4917fd..ad8a38f927 100644
--- a/target/mips/tcg/octeon_translate.c
+++ b/target/mips/tcg/octeon_translate.c
@@ -111,6 +111,8 @@ static bool octeon_cop2_is_supported_dmtc2(uint16_t sel)
     case OCTEON_COP2_SEL_AES_DEC_CBC0:
     case OCTEON_COP2_SEL_AES_DEC0:
     case OCTEON_COP2_SEL_AES_KEYLENGTH:
+    case OCTEON_COP2_SEL_CAMELLIA_FL:
+    case OCTEON_COP2_SEL_CAMELLIA_FLINV:
     case OCTEON_COP2_SEL_CRC_WRITE_POLYNOMIAL:
     case OCTEON_COP2_SEL_CRC_IV:
     case OCTEON_COP2_SEL_CRC_WRITE_LEN:
@@ -206,6 +208,7 @@ static bool octeon_cop2_is_supported_dmtc2(uint16_t sel)
     case OCTEON_COP2_SEL_AES_ENC1:
     case OCTEON_COP2_SEL_AES_DEC_CBC1:
     case OCTEON_COP2_SEL_AES_DEC1:
+    case OCTEON_COP2_SEL_CAMELLIA_ROUND:
     case OCTEON_COP2_SEL_SMS4_ENC_CBC1:
     case OCTEON_COP2_SEL_SMS4_ENC1:
     case OCTEON_COP2_SEL_SMS4_DEC_CBC1:

-- 
2.54.0



^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v6 32/35] target/mips: add Octeon CHORD and LLM COP2 support
  2026-05-11 18:22 [PATCH v6 00/35] target/mips: add missing Octeon user-mode support James Hilliard
                   ` (30 preceding siblings ...)
  2026-05-11 18:23 ` [PATCH v6 31/35] target/mips: add Octeon Camellia " James Hilliard
@ 2026-05-11 18:23 ` James Hilliard
  2026-05-11 18:23 ` [PATCH v6 33/35] target/mips: add Octeon CvmCount RDHWR support James Hilliard
                   ` (2 subsequent siblings)
  34 siblings, 0 replies; 52+ messages in thread
From: James Hilliard @ 2026-05-11 18:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier,
	Philippe Mathieu-Daudé, Aurelien Jarno, Jiaxun Yang,
	Aleksandar Rikalo, Huacai Chen, James Hilliard

Add the Octeon CHORD hardware register access path and the LLM
36-bit and 64-bit read and write windows.

Model both CHORD access forms (the rdhwr $30 path and the legacy dmfc2
alias), and implement sparse backing storage for the two LLM sets so
that user-mode code can save, restore, and probe the architectural
state.

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
---
Changes v1 -> v2:
  - Use neutral selector-slot wording for the LLM/CHORD alias comment.
  - Add selector dispatch updates in octeon_translate.c after moving
    COP2 decode out of translate.c.  (suggested by Philippe
    Mathieu-Daudé)

Changes v5 -> v6:
  - Rename sparse LLM backing fields from llm_narrow/llm_wide to
    llm36/llm64 to match the 36-bit and 64-bit selector windows.
---
 target/mips/cpu.c                  | 67 +++++++++++++++++++++++++++++++++++
 target/mips/cpu.h                  | 20 +++++++++++
 target/mips/helper.h               |  1 +
 target/mips/internal.h             |  3 ++
 target/mips/system/machine.c       | 67 +++++++++++++++++++++++++++++++++++
 target/mips/tcg/octeon_crypto.c    | 72 ++++++++++++++++++++++++++++++++++++++
 target/mips/tcg/octeon_translate.c | 13 +++++++
 target/mips/tcg/op_helper.c        |  6 ++++
 target/mips/tcg/translate.c        |  8 +++++
 9 files changed, 257 insertions(+)

diff --git a/target/mips/cpu.c b/target/mips/cpu.c
index 6e827c72de..9bf9b67202 100644
--- a/target/mips/cpu.c
+++ b/target/mips/cpu.c
@@ -27,6 +27,7 @@
 #include "internal.h"
 #include "kvm_mips.h"
 #include "qemu/module.h"
+#include "qemu/qtree.h"
 #include "system/kvm.h"
 #include "system/qtest.h"
 #include "hw/core/qdev-properties.h"
@@ -183,6 +184,57 @@ static bool mips_cpu_has_work(CPUState *cs)
 
 #include "cpu-defs.c.inc"
 
+static gint mips_octeon_u64_tree_compare(gconstpointer a, gconstpointer b,
+                                         gpointer user_data)
+{
+    uint64_t av = *(const uint64_t *)a;
+    uint64_t bv = *(const uint64_t *)b;
+
+    return (av > bv) - (av < bv);
+}
+
+QTree *mips_octeon_llm_tree_new(void)
+{
+    return q_tree_new_full(mips_octeon_u64_tree_compare,
+                           NULL, g_free, g_free);
+}
+
+uint64_t mips_octeon_llm_load(QTree *tree, uint64_t addr)
+{
+    uint64_t key = addr;
+    uint64_t *value = tree ? q_tree_lookup(tree, &key) : NULL;
+
+    return value ? *value : 0;
+}
+
+void mips_octeon_llm_store(QTree **treep, uint64_t addr, uint64_t value)
+{
+    uint64_t *key;
+    uint64_t *stored;
+
+    if (!*treep) {
+        *treep = mips_octeon_llm_tree_new();
+    }
+
+    key = g_new(uint64_t, 1);
+    stored = g_new(uint64_t, 1);
+    *key = addr;
+    *stored = value;
+    q_tree_replace(*treep, key, stored);
+}
+
+static void mips_octeon_destroy_llm_state(MIPSOcteonCryptoState *crypto)
+{
+    if (crypto->llm36) {
+        q_tree_destroy(crypto->llm36);
+        crypto->llm36 = NULL;
+    }
+    if (crypto->llm64) {
+        q_tree_destroy(crypto->llm64);
+        crypto->llm64 = NULL;
+    }
+}
+
 static void mips_cpu_reset_hold(Object *obj, ResetType type)
 {
     CPUState *cs = CPU(obj);
@@ -194,6 +246,7 @@ static void mips_cpu_reset_hold(Object *obj, ResetType type)
         mcc->parent_phases.hold(obj, type);
     }
 
+    mips_octeon_destroy_llm_state(&env->octeon_crypto);
     memset(env, 0, offsetof(CPUMIPSState, end_reset_fields));
 
     /* Reset registers to their default values */
@@ -248,6 +301,9 @@ static void mips_cpu_reset_hold(Object *obj, ResetType type)
     env->active_fpu.fcr31 = env->cpu_model->CP1_fcr31;
     env->msair = env->cpu_model->MSAIR;
     env->insn_flags = env->cpu_model->insn_flags;
+    if (env->insn_flags & INSN_OCTEON) {
+        env->octeon_crypto.chord = 1;
+    }
 
 #if defined(CONFIG_USER_ONLY)
     env->CP0_Status = (MIPS_HFLAG_UM << CP0St_KSU);
@@ -264,6 +320,9 @@ static void mips_cpu_reset_hold(Object *obj, ResetType type)
      * hardware registers.
      */
     env->CP0_HWREna |= 0x0000000F;
+    if (env->insn_flags & INSN_OCTEON) {
+        env->CP0_HWREna |= 0x40000000u;
+    }
     if (env->CP0_Config1 & (1 << CP0C1_FP)) {
         env->CP0_Status |= (1 << CP0St_CU1);
     }
@@ -422,6 +481,13 @@ static void mips_cpu_reset_hold(Object *obj, ResetType type)
 #endif
 }
 
+static void mips_cpu_finalize(Object *obj)
+{
+    MIPSCPU *cpu = MIPS_CPU(obj);
+
+    mips_octeon_destroy_llm_state(&cpu->env.octeon_crypto);
+}
+
 static void mips_cpu_disas_set_info(const CPUState *cs, disassemble_info *info)
 {
     const MIPSCPU *cpu = MIPS_CPU(cs);
@@ -648,6 +714,7 @@ static const TypeInfo mips_cpu_type_info = {
     .instance_size = sizeof(MIPSCPU),
     .instance_align = __alignof(MIPSCPU),
     .instance_init = mips_cpu_initfn,
+    .instance_finalize = mips_cpu_finalize,
     .abstract = true,
     .class_size = sizeof(MIPSCPUClass),
     .class_init = mips_cpu_class_init,
diff --git a/target/mips/cpu.h b/target/mips/cpu.h
index ba886735d5..b1974c367f 100644
--- a/target/mips/cpu.h
+++ b/target/mips/cpu.h
@@ -11,6 +11,7 @@
 #include "fpu/softfloat-types.h"
 #include "hw/core/clock.h"
 #include "mips-defs.h"
+#include "qemu/qtree.h"
 
 typedef struct CPUMIPSTLBContext CPUMIPSTLBContext;
 
@@ -617,6 +618,21 @@ typedef enum MIPSOcteonCop2Sel {
     OCTEON_COP2_SEL_SMS4_ENC0 = OCTEON_COP2_SEL_AES_ENC0,
     OCTEON_COP2_SEL_SMS4_DEC_CBC0 = OCTEON_COP2_SEL_AES_DEC_CBC0,
     OCTEON_COP2_SEL_SMS4_DEC0 = OCTEON_COP2_SEL_AES_DEC0,
+    /*
+     * Selector 0x0400 is the 36-bit LLM read selector and is also used as a
+     * DMFC2 alias for the CHORD POW tag-switch completion bit.
+     */
+    OCTEON_COP2_SEL_LLM_READ_ADDR0 = 0x0400,
+    OCTEON_COP2_SEL_CHORD = OCTEON_COP2_SEL_LLM_READ_ADDR0,
+    OCTEON_COP2_SEL_LLM_WRITE_ADDR_INTERNAL0 = 0x0401,
+    OCTEON_COP2_SEL_LLM_DATA0 = 0x0402,
+    OCTEON_COP2_SEL_LLM_READ64_ADDR0 = 0x0404,
+    OCTEON_COP2_SEL_LLM_WRITE64_ADDR_INTERNAL0 = 0x0405,
+    OCTEON_COP2_SEL_LLM_READ_ADDR1 = 0x0408,
+    OCTEON_COP2_SEL_LLM_WRITE_ADDR_INTERNAL1 = 0x0409,
+    OCTEON_COP2_SEL_LLM_DATA1 = 0x040a,
+    OCTEON_COP2_SEL_LLM_READ64_ADDR1 = 0x040c,
+    OCTEON_COP2_SEL_LLM_WRITE64_ADDR_INTERNAL1 = 0x040d,
     OCTEON_COP2_SEL_CRC_POLYNOMIAL = 0x0200,
     OCTEON_COP2_SEL_CRC_IV = 0x0201,
     OCTEON_COP2_SEL_CRC_LEN = 0x0202,
@@ -754,6 +770,10 @@ typedef struct MIPSOcteonCryptoState {
     uint32_t zuc_lfsr[16];
     uint32_t zuc_window[3];
     uint32_t zuc_tresult;
+    uint64_t llm_data[2];
+    uint64_t chord;
+    QTree *llm36;
+    QTree *llm64;
 } MIPSOcteonCryptoState;
 
 typedef struct CPUArchState {
diff --git a/target/mips/helper.h b/target/mips/helper.h
index 52fe18a8f8..410a9b8090 100644
--- a/target/mips/helper.h
+++ b/target/mips/helper.h
@@ -202,6 +202,7 @@ DEF_HELPER_1(rdhwr_cc, tl, env)
 DEF_HELPER_1(rdhwr_ccres, tl, env)
 DEF_HELPER_1(rdhwr_performance, tl, env)
 DEF_HELPER_1(rdhwr_xnp, tl, env)
+DEF_HELPER_1(rdhwr_chord, tl, env)
 DEF_HELPER_2(pmon, void, env, int)
 DEF_HELPER_1(wait, void, env)
 
diff --git a/target/mips/internal.h b/target/mips/internal.h
index 23e1ada185..026dc0ea4f 100644
--- a/target/mips/internal.h
+++ b/target/mips/internal.h
@@ -93,6 +93,9 @@ extern const int mips_defs_number;
 
 int mips_cpu_gdb_read_register(CPUState *cpu, GByteArray *buf, int reg);
 int mips_cpu_gdb_write_register(CPUState *cpu, uint8_t *buf, int reg);
+QTree *mips_octeon_llm_tree_new(void);
+uint64_t mips_octeon_llm_load(QTree *tree, uint64_t addr);
+void mips_octeon_llm_store(QTree **treep, uint64_t addr, uint64_t value);
 
 #define USEG_LIMIT      ((target_ulong)(int32_t)0x7FFFFFFFUL)
 #define KSEG0_BASE      ((target_ulong)(int32_t)0x80000000UL)
diff --git a/target/mips/system/machine.c b/target/mips/system/machine.c
index 9bcb066245..1ec05f0600 100644
--- a/target/mips/system/machine.c
+++ b/target/mips/system/machine.c
@@ -131,6 +131,69 @@ static const VMStateDescription vmstate_octeon_multiplier_tc = {
     }
 };
 
+typedef struct OcteonLLMTreePutData {
+    QEMUFile *f;
+} OcteonLLMTreePutData;
+
+static gboolean put_octeon_llm_tree_entry(gpointer key, gpointer value,
+                                          gpointer user_data)
+{
+    OcteonLLMTreePutData *data = user_data;
+
+    qemu_put_be64(data->f, *(uint64_t *)key);
+    qemu_put_be64(data->f, *(uint64_t *)value);
+    return false;
+}
+
+static int put_octeon_llm_tree(QEMUFile *f, void *pv, size_t size,
+                               const VMStateField *field, JSONWriter *vmdesc)
+{
+    QTree *tree = *(QTree **)pv;
+    OcteonLLMTreePutData data = { .f = f };
+    uint32_t nnodes = tree ? q_tree_nnodes(tree) : 0;
+
+    qemu_put_be32(f, nnodes);
+    if (tree) {
+        q_tree_foreach(tree, put_octeon_llm_tree_entry, &data);
+    }
+
+    return 0;
+}
+
+static int get_octeon_llm_tree(QEMUFile *f, void *pv, size_t size,
+                               const VMStateField *field)
+{
+    QTree **treep = pv;
+    uint32_t nnodes = qemu_get_be32(f);
+
+    if (*treep) {
+        q_tree_destroy(*treep);
+    }
+    *treep = mips_octeon_llm_tree_new();
+
+    for (uint32_t i = 0; i < nnodes; i++) {
+        uint64_t addr = qemu_get_be64(f);
+        uint64_t value = qemu_get_be64(f);
+
+        mips_octeon_llm_store(treep, addr, value);
+    }
+
+    return 0;
+}
+
+static const VMStateInfo vmstate_info_octeon_llm_tree = {
+    .name = "octeon_llm_tree",
+    .get = get_octeon_llm_tree,
+    .put = put_octeon_llm_tree,
+};
+
+#define VMSTATE_OCTEON_LLM_TREE(_f, _s) {                         \
+    .name = stringify(_f),                                        \
+    .version_id = 1,                                              \
+    .info = &vmstate_info_octeon_llm_tree,                        \
+    .offset = vmstate_offset_pointer(_s, _f, QTree),              \
+}
+
 /* MVP state */
 
 static const VMStateDescription vmstate_mvp = {
@@ -316,6 +379,10 @@ static const VMStateDescription mips_vmstate_octeon_crypto = {
         VMSTATE_UINT32_ARRAY(env.octeon_crypto.zuc_lfsr, MIPSCPU, 16),
         VMSTATE_UINT32_ARRAY(env.octeon_crypto.zuc_window, MIPSCPU, 3),
         VMSTATE_UINT32(env.octeon_crypto.zuc_tresult, MIPSCPU),
+        VMSTATE_UINT64_ARRAY(env.octeon_crypto.llm_data, MIPSCPU, 2),
+        VMSTATE_UINT64(env.octeon_crypto.chord, MIPSCPU),
+        VMSTATE_OCTEON_LLM_TREE(env.octeon_crypto.llm36, MIPSCPU),
+        VMSTATE_OCTEON_LLM_TREE(env.octeon_crypto.llm64, MIPSCPU),
         VMSTATE_END_OF_LIST()
     }
 };
diff --git a/target/mips/tcg/octeon_crypto.c b/target/mips/tcg/octeon_crypto.c
index 27e34b7f43..b845bdff07 100644
--- a/target/mips/tcg/octeon_crypto.c
+++ b/target/mips/tcg/octeon_crypto.c
@@ -16,6 +16,42 @@
 #include "qemu/bitops.h"
 #include "qemu/host-utils.h"
 
+#define OCTEON_LLM_NARROW_MASK ((1ULL << 36) - 1)
+
+static uint64_t octeon_llm_pack_narrow(uint64_t value)
+{
+    value &= OCTEON_LLM_NARROW_MASK;
+    return value | ((uint64_t)(ctpop64(value) & 1) << 36);
+}
+
+static void octeon_llm_read(MIPSOcteonCryptoState *crypto, unsigned int set,
+                            uint64_t addr, bool wide)
+{
+    uint64_t value;
+
+    if (wide) {
+        value = mips_octeon_llm_load(crypto->llm64, addr);
+    } else {
+        value = octeon_llm_pack_narrow(
+            mips_octeon_llm_load(crypto->llm36, addr));
+    }
+
+    crypto->llm_data[set] = value;
+}
+
+static void octeon_llm_write(MIPSOcteonCryptoState *crypto, unsigned int set,
+                             uint64_t addr, bool wide)
+{
+    uint64_t value = crypto->llm_data[set];
+
+    if (wide) {
+        mips_octeon_llm_store(&crypto->llm64, addr, value);
+    } else {
+        mips_octeon_llm_store(&crypto->llm36, addr,
+                              value & OCTEON_LLM_NARROW_MASK);
+    }
+}
+
 static inline void octeon_set_shared_mode(MIPSOcteonCryptoState *crypto,
                                           MIPSOcteonSharedMode mode)
 {
@@ -2001,6 +2037,12 @@ uint64_t helper_octeon_cop2_dmfc2(CPUMIPSState *env, uint32_t sel)
         return crypto->crc_len;
     case OCTEON_COP2_SEL_CRC_IV_REFLECT:
         return octeon_crc_reflect32_by_byte(crypto->crc_iv);
+    case OCTEON_COP2_SEL_CHORD:
+        return crypto->chord;
+    case OCTEON_COP2_SEL_LLM_DATA0:
+        return crypto->llm_data[0];
+    case OCTEON_COP2_SEL_LLM_DATA1:
+        return crypto->llm_data[1];
     case OCTEON_COP2_SEL_HSH_DATW0:
     case OCTEON_COP2_SEL_HSH_DATW1:
     case OCTEON_COP2_SEL_HSH_DATW2:
@@ -2157,6 +2199,36 @@ void helper_octeon_cop2_dmtc2(CPUMIPSState *env, uint64_t value,
     case OCTEON_COP2_SEL_AES_KEYLENGTH:
         crypto->aes_keylen = data;
         break;
+    case OCTEON_COP2_SEL_LLM_READ_ADDR0:
+        octeon_llm_read(crypto, 0, data, false);
+        break;
+    case OCTEON_COP2_SEL_LLM_WRITE_ADDR_INTERNAL0:
+        octeon_llm_write(crypto, 0, data, false);
+        break;
+    case OCTEON_COP2_SEL_LLM_DATA0:
+        crypto->llm_data[0] = data;
+        break;
+    case OCTEON_COP2_SEL_LLM_READ64_ADDR0:
+        octeon_llm_read(crypto, 0, data, true);
+        break;
+    case OCTEON_COP2_SEL_LLM_WRITE64_ADDR_INTERNAL0:
+        octeon_llm_write(crypto, 0, data, true);
+        break;
+    case OCTEON_COP2_SEL_LLM_READ_ADDR1:
+        octeon_llm_read(crypto, 1, data, false);
+        break;
+    case OCTEON_COP2_SEL_LLM_WRITE_ADDR_INTERNAL1:
+        octeon_llm_write(crypto, 1, data, false);
+        break;
+    case OCTEON_COP2_SEL_LLM_DATA1:
+        crypto->llm_data[1] = data;
+        break;
+    case OCTEON_COP2_SEL_LLM_READ64_ADDR1:
+        octeon_llm_read(crypto, 1, data, true);
+        break;
+    case OCTEON_COP2_SEL_LLM_WRITE64_ADDR_INTERNAL1:
+        octeon_llm_write(crypto, 1, data, true);
+        break;
     case OCTEON_COP2_SEL_CAMELLIA_FL:
         octeon_camellia_fl_layer(crypto, data, false);
         break;
diff --git a/target/mips/tcg/octeon_translate.c b/target/mips/tcg/octeon_translate.c
index ad8a38f927..b10886b199 100644
--- a/target/mips/tcg/octeon_translate.c
+++ b/target/mips/tcg/octeon_translate.c
@@ -78,6 +78,9 @@ static bool octeon_cop2_is_supported_dmfc2(uint16_t sel)
     case OCTEON_COP2_SEL_GFM_RESINP0:
     case OCTEON_COP2_SEL_GFM_RESINP1:
     case OCTEON_COP2_SEL_GFM_POLY:
+    case OCTEON_COP2_SEL_CHORD:
+    case OCTEON_COP2_SEL_LLM_DATA0:
+    case OCTEON_COP2_SEL_LLM_DATA1:
         return true;
     default:
         return false;
@@ -113,6 +116,16 @@ static bool octeon_cop2_is_supported_dmtc2(uint16_t sel)
     case OCTEON_COP2_SEL_AES_KEYLENGTH:
     case OCTEON_COP2_SEL_CAMELLIA_FL:
     case OCTEON_COP2_SEL_CAMELLIA_FLINV:
+    case OCTEON_COP2_SEL_LLM_READ_ADDR0:
+    case OCTEON_COP2_SEL_LLM_WRITE_ADDR_INTERNAL0:
+    case OCTEON_COP2_SEL_LLM_DATA0:
+    case OCTEON_COP2_SEL_LLM_READ64_ADDR0:
+    case OCTEON_COP2_SEL_LLM_WRITE64_ADDR_INTERNAL0:
+    case OCTEON_COP2_SEL_LLM_READ_ADDR1:
+    case OCTEON_COP2_SEL_LLM_WRITE_ADDR_INTERNAL1:
+    case OCTEON_COP2_SEL_LLM_DATA1:
+    case OCTEON_COP2_SEL_LLM_READ64_ADDR1:
+    case OCTEON_COP2_SEL_LLM_WRITE64_ADDR_INTERNAL1:
     case OCTEON_COP2_SEL_CRC_WRITE_POLYNOMIAL:
     case OCTEON_COP2_SEL_CRC_IV:
     case OCTEON_COP2_SEL_CRC_WRITE_LEN:
diff --git a/target/mips/tcg/op_helper.c b/target/mips/tcg/op_helper.c
index 0a892e31a8..67854f08df 100644
--- a/target/mips/tcg/op_helper.c
+++ b/target/mips/tcg/op_helper.c
@@ -412,6 +412,12 @@ target_ulong helper_rdhwr_xnp(CPUMIPSState *env)
     return (env->CP0_Config5 >> CP0C5_XNP) & 1;
 }
 
+target_ulong helper_rdhwr_chord(CPUMIPSState *env)
+{
+    check_hwrena(env, 30, GETPC());
+    return env->octeon_crypto.chord;
+}
+
 void helper_pmon(CPUMIPSState *env, int function)
 {
     function /= 2;
diff --git a/target/mips/tcg/translate.c b/target/mips/tcg/translate.c
index 767d64718a..3e39f3460a 100644
--- a/target/mips/tcg/translate.c
+++ b/target/mips/tcg/translate.c
@@ -10923,6 +10923,14 @@ void gen_rdhwr(DisasContext *ctx, int rt, int rd, int sel)
         }
         break;
 #endif
+    case 30:
+        if (!(ctx->insn_flags & INSN_OCTEON)) {
+            gen_reserved_instruction(ctx);
+            break;
+        }
+        gen_helper_rdhwr_chord(t0, tcg_env);
+        gen_store_gpr(t0, rt);
+        break;
     default:            /* Invalid */
         MIPS_INVAL("rdhwr");
         gen_reserved_instruction(ctx);

-- 
2.54.0
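The narrow (36-bit) LLM path in this patch stores only the low 36 bits and, on read, places a parity bit at bit 36 using ctpop64(). The sketch below isolates that packing step. It is not the patch's code: the names are hypothetical, and the GCC/Clang builtin __builtin_popcountll stands in for QEMU's ctpop64().

```c
#include <assert.h>
#include <stdint.h>

/* Low 36 bits hold the narrow LLM data word. */
#define LLM_NARROW_MASK ((UINT64_C(1) << 36) - 1)

/*
 * Hypothetical standalone mirror of octeon_llm_pack_narrow(): keep the low
 * 36 bits and place their parity (1 when an odd number of bits is set) at
 * bit 36. __builtin_popcountll (GCC/Clang) stands in for QEMU's ctpop64().
 */
static uint64_t llm_pack_narrow(uint64_t value)
{
    value &= LLM_NARROW_MASK;
    return value | ((uint64_t)(__builtin_popcountll(value) & 1) << 36);
}
```

For example, llm_pack_narrow(0x1) yields 0x1000000001 (one bit set, so the parity bit is raised), llm_pack_narrow(0x3) yields 0x3, and input bits above bit 35 are discarded before the parity is computed.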




* [PATCH v6 33/35] target/mips: add Octeon CvmCount RDHWR support
  2026-05-11 18:22 [PATCH v6 00/35] target/mips: add missing Octeon user-mode support James Hilliard
                   ` (31 preceding siblings ...)
  2026-05-11 18:23 ` [PATCH v6 32/35] target/mips: add Octeon CHORD and LLM COP2 support James Hilliard
@ 2026-05-11 18:23 ` James Hilliard
  2026-05-11 18:23 ` [PATCH v6 34/35] tests/tcg/mips: cover Octeon QMAC and CvmCount James Hilliard
  2026-05-11 18:23 ` [PATCH v6 35/35] target/mips: expose Octeon68XX floating-point support James Hilliard
  34 siblings, 0 replies; 52+ messages in thread
From: James Hilliard @ 2026-05-11 18:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier,
	Philippe Mathieu-Daudé, Aurelien Jarno, Jiaxun Yang,
	Aleksandar Rikalo, Huacai Chen, James Hilliard

Octeon exposes CvmCount through RDHWR register 31. Add the Octeon-only
decode path, enable the corresponding HWREna bit for linux-user, and use
an unsigned mask when checking HWREna so that testing bit 31 does not
shift a 1 into the sign bit of a signed int.
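A minimal sketch of the HWREna gate, assuming only what the diff shows: RDHWR is allowed in kernel mode or when the register's HWREna bit is set, and the Octeon reset value now sets bits 30 (CHORD) and 31 (CvmCount). The function and macro names are hypothetical, not QEMU's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* linux-user reset value from the patch: bits 0-3 plus Octeon bits 30 and 31. */
#define OCTEON_USER_HWRENA (UINT32_C(0x0000000F) | UINT32_C(0xc0000000))

/*
 * Hypothetical standalone version of the check_hwrena() test. The mask is
 * built from an unsigned constant, so reg == 31 never shifts a 1 into the
 * sign bit of a signed int.
 */
static bool hwrena_allows(uint32_t hwrena, bool kernel_mode, unsigned int reg)
{
    assert(reg < 32);
    return kernel_mode || (hwrena & (UINT32_C(1) << reg)) != 0;
}
```

With this reset value, user code may read registers 0-3, 30 and 31, while register 29 still raises a Reserved Instruction exception outside kernel mode.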

For user-mode emulation, return host ticks as a monotonic counter source
suitable for existing Octeon userspace code. In system mode, fall back to
the existing CP0 Count value, which is 32 bits wide and zero-extended.
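The two counter sources can be sketched in isolation. This is an illustration under stated assumptions, not the patch: QEMU uses cpu_get_host_ticks() in user mode, while this sketch swaps in POSIX CLOCK_MONOTONIC (which never goes backwards) as a portable stand-in. All function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical user-mode CvmCount stand-in: monotonic time in nanoseconds. */
static uint64_t cvmcount_user_sketch(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
}

/* Same property the patch 34 test checks: back-to-back reads never decrease. */
static int cvmcount_non_decreasing(void)
{
    uint64_t first = cvmcount_user_sketch();
    uint64_t second = cvmcount_user_sketch();

    return second >= first;
}

/* System-mode fallback in the patch: the 32-bit CP0 Count, zero-extended. */
static uint64_t cvmcount_system_sketch(uint32_t cp0_count)
{
    return (uint64_t)cp0_count;
}
```

The system-mode path therefore wraps every 2^32 Count ticks, whereas the user-mode source is a full 64-bit count.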

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
---
 target/mips/cpu.c           |  2 +-
 target/mips/helper.h        |  1 +
 target/mips/tcg/op_helper.c | 13 ++++++++++++-
 target/mips/tcg/translate.c | 11 +++++++++++
 4 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/target/mips/cpu.c b/target/mips/cpu.c
index 9bf9b67202..639ffa77cd 100644
--- a/target/mips/cpu.c
+++ b/target/mips/cpu.c
@@ -321,7 +321,7 @@ static void mips_cpu_reset_hold(Object *obj, ResetType type)
      */
     env->CP0_HWREna |= 0x0000000F;
     if (env->insn_flags & INSN_OCTEON) {
-        env->CP0_HWREna |= 0x40000000u;
+        env->CP0_HWREna |= 0xc0000000u;
     }
     if (env->CP0_Config1 & (1 << CP0C1_FP)) {
         env->CP0_Status |= (1 << CP0St_CU1);
diff --git a/target/mips/helper.h b/target/mips/helper.h
index 410a9b8090..d7a0feb673 100644
--- a/target/mips/helper.h
+++ b/target/mips/helper.h
@@ -203,6 +203,7 @@ DEF_HELPER_1(rdhwr_ccres, tl, env)
 DEF_HELPER_1(rdhwr_performance, tl, env)
 DEF_HELPER_1(rdhwr_xnp, tl, env)
 DEF_HELPER_1(rdhwr_chord, tl, env)
+DEF_HELPER_1(rdhwr_cvmcount, tl, env)
 DEF_HELPER_2(pmon, void, env, int)
 DEF_HELPER_1(wait, void, env)
 
diff --git a/target/mips/tcg/op_helper.c b/target/mips/tcg/op_helper.c
index 67854f08df..55ac877506 100644
--- a/target/mips/tcg/op_helper.c
+++ b/target/mips/tcg/op_helper.c
@@ -25,6 +25,7 @@
 #include "exec/memop.h"
 #include "fpu_helper.h"
 #include "qemu/crc32c.h"
+#include "qemu/timer.h"
 #include <zlib.h>
 
 static inline target_ulong bitswap(target_ulong v)
@@ -366,7 +367,7 @@ target_ulong helper_yield(CPUMIPSState *env, target_ulong arg)
 
 static inline void check_hwrena(CPUMIPSState *env, int reg, uintptr_t pc)
 {
-    if ((env->hflags & MIPS_HFLAG_CP0) || (env->CP0_HWREna & (1 << reg))) {
+    if ((env->hflags & MIPS_HFLAG_CP0) || (env->CP0_HWREna & (1u << reg))) {
         return;
     }
     do_raise_exception(env, EXCP_RI, pc);
@@ -418,6 +419,16 @@ target_ulong helper_rdhwr_chord(CPUMIPSState *env)
     return env->octeon_crypto.chord;
 }
 
+target_ulong helper_rdhwr_cvmcount(CPUMIPSState *env)
+{
+    check_hwrena(env, 31, GETPC());
+#ifdef CONFIG_USER_ONLY
+    return cpu_get_host_ticks();
+#else
+    return (uint32_t)cpu_mips_get_count(env);
+#endif
+}
+
 void helper_pmon(CPUMIPSState *env, int function)
 {
     function /= 2;
diff --git a/target/mips/tcg/translate.c b/target/mips/tcg/translate.c
index 3e39f3460a..7627a4ffb4 100644
--- a/target/mips/tcg/translate.c
+++ b/target/mips/tcg/translate.c
@@ -10931,6 +10931,17 @@ void gen_rdhwr(DisasContext *ctx, int rt, int rd, int sel)
         gen_helper_rdhwr_chord(t0, tcg_env);
         gen_store_gpr(t0, rt);
         break;
+    case 31:
+        if (!(ctx->insn_flags & INSN_OCTEON)) {
+            gen_reserved_instruction(ctx);
+            break;
+        }
+        translator_io_start(&ctx->base);
+        gen_helper_rdhwr_cvmcount(t0, tcg_env);
+        gen_store_gpr(t0, rt);
+        gen_save_pc(ctx->base.pc_next + 4);
+        ctx->base.is_jmp = DISAS_EXIT;
+        break;
     default:            /* Invalid */
         MIPS_INVAL("rdhwr");
         gen_reserved_instruction(ctx);

-- 
2.54.0




* [PATCH v6 34/35] tests/tcg/mips: cover Octeon QMAC and CvmCount
  2026-05-11 18:22 [PATCH v6 00/35] target/mips: add missing Octeon user-mode support James Hilliard
                   ` (32 preceding siblings ...)
  2026-05-11 18:23 ` [PATCH v6 33/35] target/mips: add Octeon CvmCount RDHWR support James Hilliard
@ 2026-05-11 18:23 ` James Hilliard
  2026-05-11 18:23 ` [PATCH v6 35/35] target/mips: expose Octeon68XX floating-point support James Hilliard
  34 siblings, 0 replies; 52+ messages in thread
From: James Hilliard @ 2026-05-11 18:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier,
	Philippe Mathieu-Daudé, Aurelien Jarno, Jiaxun Yang,
	Aleksandar Rikalo, Huacai Chen, James Hilliard

Extend the Octeon linux-user smoke test with QMAC/QMACS fixed-point
accumulator checks and an RDHWR $31 monotonicity check.
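The expected values in the new QMAC/QMACS assertions can be cross-checked with a small C reference model. This is a sketch inferred from the test's expectations, not QEMU's helper: it takes the two already-selected Q15 halfwords directly (the .03/.00 halfword selection is not modeled), adds their doubled product to a signed 32-bit LO, and for the saturating form clamps the sum and reports overflow in bit 32, the same way octeon_qmacs_state() folds HI bit 0 into its result.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical reference model. a and b are Q15 halfwords; their product is
 * doubled into Q31 and added to LO. With saturate set (QMACS), the sum is
 * clamped to the int32_t range and overflow is reported in bit 32.
 */
static uint64_t qmac_model(int16_t a, int16_t b, int32_t lo, int saturate)
{
    int64_t sum = (int64_t)lo + 2 * (int64_t)a * b;
    uint64_t overflow = 0;

    if (saturate && sum > INT32_MAX) {
        sum = INT32_MAX;
        overflow = 1;
    } else if (saturate && sum < INT32_MIN) {
        sum = INT32_MIN;
        overflow = 1;
    }
    return (overflow << 32) | ((uint64_t)sum & UINT64_C(0xffffffff));
}
```

Assuming qmac.03 multiplies halfword 3 of rs (0x0003) by halfword 0 of rt (2), the first test case reduces to qmac_model(3, 2, 1, 0), which gives 1 + 2 * 3 * 2 = 13. Both QMACS cases overflow INT32_MAX and produce 0x17fffffff.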

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
---
 tests/tcg/mips/user/isa/octeon/octeon-insns.c | 57 +++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

diff --git a/tests/tcg/mips/user/isa/octeon/octeon-insns.c b/tests/tcg/mips/user/isa/octeon/octeon-insns.c
index 435ccfa347..1dbdd9f52a 100644
--- a/tests/tcg/mips/user/isa/octeon/octeon-insns.c
+++ b/tests/tcg/mips/user/isa/octeon/octeon-insns.c
@@ -129,6 +129,59 @@ static uint64_t octeon_vmm0(uint64_t mpl0, uint64_t p0,
     return rd;
 }
 
+static uint64_t octeon_qmac_lo(uint64_t rs, uint64_t rt, uint64_t lo)
+{
+    uint64_t rd;
+
+    asm volatile(
+        "move $8, %[rs]\n\t"
+        "move $9, %[rt]\n\t"
+        "mtlo %[lo]\n\t"
+        "mthi $0\n\t"
+        ".word 0x710904d2\n\t" /* qmac.03 $8, $9 */
+        "mflo %[rd]\n\t"
+        : [rd] "=r" (rd)
+        : [rs] "r" (rs), [rt] "r" (rt), [lo] "r" (lo)
+        : "$8", "$9");
+
+    return rd;
+}
+
+static uint64_t octeon_qmacs_state(uint64_t rs, uint64_t rt, uint64_t lo)
+{
+    uint64_t hi, rd;
+
+    asm volatile(
+        "move $8, %[rs]\n\t"
+        "move $9, %[rt]\n\t"
+        "mtlo %[lo]\n\t"
+        "mthi $0\n\t"
+        ".word 0x71090012\n\t" /* qmacs.00 $8, $9 */
+        "mfhi %[hi]\n\t"
+        "mflo %[rd]\n\t"
+        : [hi] "=r" (hi), [rd] "=r" (rd)
+        : [rs] "r" (rs), [rt] "r" (rt), [lo] "r" (lo)
+        : "$8", "$9");
+
+    return ((hi & 1) << 32) | (rd & 0xffffffff);
+}
+
+static uint64_t octeon_rdhwr31_non_decreasing(void)
+{
+    uint64_t first, second;
+
+    asm volatile(
+        ".word 0x7c08f83b\n\t" /* rdhwr $8, $31 */
+        ".word 0x7c09f83b\n\t" /* rdhwr $9, $31 */
+        "move %[first], $8\n\t"
+        "move %[second], $9\n\t"
+        : [first] "=r" (first), [second] "=r" (second)
+        :
+        : "$8", "$9");
+
+    return second >= first;
+}
+
 static uint64_t octeon_vmm0_zeroes_mpl1(void)
 {
     uint64_t rd;
@@ -259,6 +312,10 @@ int main(void)
     assert(octeon_seq(0xabc, 0xdef) == 0);
     assert(octeon_sne(0xabc, 0xabc) == 0);
     assert(octeon_sne(0xabc, 0xdef) == 1);
+    assert(octeon_qmac_lo(0x0003000000000000ULL, 2, 1) == 13);
+    assert(octeon_qmacs_state(1, 1, 0x7ffffffe) == 0x17fffffffULL);
+    assert(octeon_qmacs_state(0x8000, 0x8000, 0) == 0x17fffffffULL);
+    assert(octeon_rdhwr31_non_decreasing());
     assert(octeon_vmulu(5, 7, 11) == 46);
     assert(octeon_vmm0(5, 13, 7, 11) == 59);
     assert(octeon_vmm0_zeroes_mpl1() == 0);

-- 
2.54.0




* [PATCH v6 35/35] target/mips: expose Octeon68XX floating-point support
  2026-05-11 18:22 [PATCH v6 00/35] target/mips: add missing Octeon user-mode support James Hilliard
                   ` (33 preceding siblings ...)
  2026-05-11 18:23 ` [PATCH v6 34/35] tests/tcg/mips: cover Octeon QMAC and CvmCount James Hilliard
@ 2026-05-11 18:23 ` James Hilliard
  34 siblings, 0 replies; 52+ messages in thread
From: James Hilliard @ 2026-05-11 18:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier,
	Philippe Mathieu-Daudé, Aurelien Jarno, Jiaxun Yang,
	Aleksandar Rikalo, Huacai Chen, James Hilliard

Octeon68XX cores implement CP1. Advertise this in the CPU definition by
setting Config1.FP, making the Status.CU1 and Status.FR bits writable,
and providing the FCR0/FCR31 defaults used by this CPU model.

This lets guests observe the expected floating-point feature bits and
use CP1 with -cpu Octeon68XX.
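The bitmask arithmetic behind this change can be checked with a short sketch. The bit positions below follow the standard MIPS Status and FIR layouts and are assumed to match QEMU's CP0St_* and FCR0_* constants; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed MIPS CP0 Status bit positions. */
#define STATUS_CU1_BIT 29
#define STATUS_FR_BIT  26

/* Assumed FCR0 (FIR) feature bit positions. */
#define FIR_F64_BIT 22
#define FIR_L_BIT   21
#define FIR_W_BIT   20
#define FIR_3D_BIT  19
#define FIR_PS_BIT  18
#define FIR_D_BIT   17
#define FIR_S_BIT   16

/* Status bits writable under the new mask (0x36F8FFFF) but not the old one. */
static uint32_t newly_writable_status_bits(void)
{
    return UINT32_C(0x36F8FFFF) & ~UINT32_C(0x12F8FFFF);
}

/* FCR0 value implied by the new CPU definition (PRID and REV are zero). */
static uint32_t octeon68xx_fcr0(void)
{
    return (UINT32_C(1) << FIR_F64_BIT) | (UINT32_C(1) << FIR_3D_BIT) |
           (UINT32_C(1) << FIR_PS_BIT) | (UINT32_C(1) << FIR_L_BIT) |
           (UINT32_C(1) << FIR_W_BIT) | (UINT32_C(1) << FIR_D_BIT) |
           (UINT32_C(1) << FIR_S_BIT);
}
```

The old mask is a subset of the new one, and the difference is 0x24000000, exactly CU1 and FR; under the assumed positions, FCR0 evaluates to 0x007F0000.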

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>

---
Changes v1 -> v2:
  - Move this CPU-model correction into a separate final patch.
    (suggested by Philippe Mathieu-Daudé)
---
 target/mips/cpu-defs.c.inc | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/target/mips/cpu-defs.c.inc b/target/mips/cpu-defs.c.inc
index faefab0473..cc1916232f 100644
--- a/target/mips/cpu-defs.c.inc
+++ b/target/mips/cpu-defs.c.inc
@@ -997,7 +997,8 @@ const mips_def_t mips_defs[] =
         .CP0_PRid = 0x000D9100,
         .CP0_Config0 = MIPS_CONFIG0 | (0x1 << CP0C0_AR) | (0x2 << CP0C0_AT) |
                        (MMU_TYPE_R4000 << CP0C0_MT),
-        .CP0_Config1 = MIPS_CONFIG1 | (0x3F << CP0C1_MMU) |
+        .CP0_Config1 = MIPS_CONFIG1 | (1 << CP0C1_FP) |
+                       (0x3F << CP0C1_MMU) |
                        (1 << CP0C1_IS) | (4 << CP0C1_IL) | (1 << CP0C1_IA) |
                        (1 << CP0C1_DS) | (4 << CP0C1_DL) | (1 << CP0C1_DA) |
                        (1 << CP0C1_PC) | (1 << CP0C1_WR) | (1 << CP0C1_EP),
@@ -1011,7 +1012,12 @@ const mips_def_t mips_defs[] =
         .CP0_PageGrain = (1 << CP0PG_ELPA),
         .SYNCI_Step = 32,
         .CCRes = 2,
-        .CP0_Status_rw_bitmask = 0x12F8FFFF,
+        .CP0_Status_rw_bitmask = 0x36F8FFFF,
+        .CP1_fcr0 = (1 << FCR0_F64) | (1 << FCR0_3D) | (1 << FCR0_PS) |
+                    (1 << FCR0_L) | (1 << FCR0_W) | (1 << FCR0_D) |
+                    (1 << FCR0_S) | (0x00 << FCR0_PRID) | (0x0 << FCR0_REV),
+        .CP1_fcr31 = 0,
+        .CP1_fcr31_rw_bitmask = 0xFF83FFFF,
         .SEGBITS = 42,
         .PABITS = 49,
         .insn_flags = CPU_MIPS64R2 | INSN_OCTEON,

-- 
2.54.0




* Re: [PATCH v6 07/35] target/mips: add Octeon multiplier state
  2026-05-11 18:22 ` [PATCH v6 07/35] target/mips: add Octeon multiplier state James Hilliard
@ 2026-05-11 18:31   ` Richard Henderson
  2026-05-14 10:27   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 52+ messages in thread
From: Richard Henderson @ 2026-05-11 18:31 UTC (permalink / raw)
  To: qemu-devel

On 5/11/26 13:22, James Hilliard wrote:
> Add per-thread Octeon multiplier state for the MPL and P limb banks used
> by the VMULU/VMM0/V3MULU instruction family.
> 
> Octeon3 extends the older MPL0-MPL2/P0-P2 state with high lanes
> MPL3-MPL5/P3-P5, programmed by the two-source MTM/MTP forms. Represent
> both banks as uint64_t arrays so the TC state matches the architected
> 64-bit limb layout used by Octeon68XX user-mode code.
> 
> Migrate the multiplier registers in an Octeon-only subsection so
> non-Octeon CPU models do not grow migration state.
> 
> Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
> ---
> Changes v2 -> v3:
>    - Split the multiplier state out of the combined Octeon arithmetic and
>      memory instruction patch.  (requested by Richard Henderson)
> 
> Changes v3 -> v4:
>    - Document and keep the Octeon3 MPL3-MPL5/P3-P5 high-lane state used by
>      the two-source MTM/MTP forms.
> ---
>   target/mips/cpu.h            | 12 ++++++++++++
>   target/mips/system/machine.c | 33 +++++++++++++++++++++++++++++++++
>   2 files changed, 45 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v6 15/35] target/mips: add Octeon MTM0 instruction
  2026-05-11 18:22 ` [PATCH v6 15/35] target/mips: add Octeon MTM0 instruction James Hilliard
@ 2026-05-11 18:40   ` Richard Henderson
  2026-05-11 20:12     ` James Hilliard
  0 siblings, 1 reply; 52+ messages in thread
From: Richard Henderson @ 2026-05-11 18:40 UTC (permalink / raw)
  To: qemu-devel

On 5/11/26 13:22, James Hilliard wrote:
> +static void octeon_reset_mtm0_mpl_state(void)
> +{
> +    TCGv_i64 zero = tcg_constant_i64(0);
> +
> +    /*
> +     * MTM0 defines MPL1 as zero; model the architecturally unpredictable
> +     * MPL2/MPL4/MPL5 lanes as zero for deterministic emulation.
> +     */
> +    octeon_store_mpl(1, zero);
> +    octeon_store_mpl(2, zero);
> +    octeon_store_mpl(4, zero);
> +    octeon_store_mpl(5, zero);

Where do you get that from?

In the octeon2 documentation, which only has MPL[0-2], MTM0 does *not* modify either MPL1 or
MPL2.  While I don't know what changed with octeon3, I really doubt this statement is 
true.  It just doesn't make any sense.


r~



* Re: [PATCH v6 16/35] target/mips: add Octeon MTP0 instruction
  2026-05-11 18:22 ` [PATCH v6 16/35] target/mips: add Octeon MTP0 instruction James Hilliard
@ 2026-05-11 18:46   ` Richard Henderson
  2026-05-11 20:19     ` James Hilliard
  0 siblings, 1 reply; 52+ messages in thread
From: Richard Henderson @ 2026-05-11 18:46 UTC (permalink / raw)
  To: qemu-devel

On 5/11/26 13:22, James Hilliard wrote:
> +static bool trans_mtp(DisasContext *ctx, arg_r2 *a, unsigned int index)
> +{
> +    TCGv_i64 value = tcg_temp_new_i64();
> +
> +    /*
> +     * Octeon3 two-source MTP forms load lane index from rs and lane index + 3
> +     * from rt.  Legacy one-source forms encode rt as $zero.
> +     */
> +    gen_load_gpr(value, a->rs);
> +    octeon_store_p(index, value);
> +    gen_load_gpr(value, a->rt);
> +    octeon_store_p(index + 3, value);
> +    if (index == 0) {
> +        /*
> +         * The hardware description and register-state table define P1 as zero;
> +         * model P2/P4/P5 as zero for deterministic emulation.
> +         */
> +        TCGv_i64 zero = tcg_constant_i64(0);
> +
> +        octeon_store_p(1, zero);
> +        octeon_store_p(2, zero);
> +        octeon_store_p(4, zero);
> +        octeon_store_p(5, zero);

Likewise, where does this come from?

The octeon2 manual entries for MTP* are quite simple, each writing to just the one register.


r~



* Re: [PATCH v6 15/35] target/mips: add Octeon MTM0 instruction
  2026-05-11 18:40   ` Richard Henderson
@ 2026-05-11 20:12     ` James Hilliard
  0 siblings, 0 replies; 52+ messages in thread
From: James Hilliard @ 2026-05-11 20:12 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Mon, May 11, 2026 at 1:58 PM Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 5/11/26 13:22, James Hilliard wrote:
> > +static void octeon_reset_mtm0_mpl_state(void)
> > +{
> > +    TCGv_i64 zero = tcg_constant_i64(0);
> > +
> > +    /*
> > +     * MTM0 defines MPL1 as zero; model the architecturally unpredictable
> > +     * MPL2/MPL4/MPL5 lanes as zero for deterministic emulation.
> > +     */
> > +    octeon_store_mpl(1, zero);
> > +    octeon_store_mpl(2, zero);
> > +    octeon_store_mpl(4, zero);
> > +    octeon_store_mpl(5, zero);
>
> Where do you get that from?
>
> In the octeon2 documentation, which only has MPL[0-2], MTM0 does *not* modify either MPL1 or
> MPL2.  While I don't know what changed with octeon3, I really doubt this statement is
> true.  It just doesn't make any sense.

CN50XX-HM-0.99E is OCTEON Plus documentation and covers the older
three-lane multiplier state. Its MTM0 definition is MPL0 = rs; P0, P1,
P2 = 0. It does not define the OCTEON III rt source or the high
multiplier lanes MPL3..MPL5.

The behavior modeled here is from CN71XX/OCTEON III. CN71XX defines
MTM0 as loading MPL0 from rs, loading MPL3 from rt, setting MPL1 to
zero, and zeroing P0..P5. It marks MPL2/MPL4/MPL5 architecturally
unpredictable; I modeled those lanes as zero for deterministic emulation.

The MPL1/MPL2/MPL4/MPL5 handling is therefore not from the
CN50XX/OCTEON Plus MTM0 definition. It follows the newer OCTEON III
multiplier-state definition.

The legacy form remains encoding-compatible because the cnMIPS II form
"MTM0 rs" maps to the cnMIPS III form "MTM0 rs, $0", but the modeled
state side effects follow the OCTEON III definition.
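A minimal C model of the MTM0 state transition described above, assuming the CN71XX semantics as summarized in this reply and zeroing the unpredictable lanes as the patch does. The type and function names are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the six-lane OCTEON III multiplier state. */
typedef struct {
    uint64_t mpl[6];
    uint64_t p[6];
} OcteonMulStateModel;

/*
 * MTM0 rs, rt: MPL0 = rs, MPL3 = rt, MPL1 = 0, P0..P5 = 0. MPL2/MPL4/MPL5
 * are architecturally unpredictable; this sketch zeroes them for
 * deterministic results. The legacy "MTM0 rs" form is the case rt == $0.
 */
static void model_mtm0(OcteonMulStateModel *s, uint64_t rs, uint64_t rt)
{
    s->mpl[0] = rs;
    s->mpl[3] = rt;
    s->mpl[1] = 0;
    s->mpl[2] = 0;
    s->mpl[4] = 0;
    s->mpl[5] = 0;
    memset(s->p, 0, sizeof(s->p));
}
```

Calling model_mtm0(&s, rs, 0) reproduces the legacy one-operand form: MPL3 becomes zero along with the other cleared lanes.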

>
>
> r~
>
>



* Re: [PATCH v6 16/35] target/mips: add Octeon MTP0 instruction
  2026-05-11 18:46   ` Richard Henderson
@ 2026-05-11 20:19     ` James Hilliard
  0 siblings, 0 replies; 52+ messages in thread
From: James Hilliard @ 2026-05-11 20:19 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Mon, May 11, 2026 at 1:58 PM Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 5/11/26 13:22, James Hilliard wrote:
> > +static bool trans_mtp(DisasContext *ctx, arg_r2 *a, unsigned int index)
> > +{
> > +    TCGv_i64 value = tcg_temp_new_i64();
> > +
> > +    /*
> > +     * Octeon3 two-source MTP forms load lane index from rs and lane index + 3
> > +     * from rt.  Legacy one-source forms encode rt as $zero.
> > +     */
> > +    gen_load_gpr(value, a->rs);
> > +    octeon_store_p(index, value);
> > +    gen_load_gpr(value, a->rt);
> > +    octeon_store_p(index + 3, value);
> > +    if (index == 0) {
> > +        /*
> > +         * The hardware description and register-state table define P1 as zero;
> > +         * model P2/P4/P5 as zero for deterministic emulation.
> > +         */
> > +        TCGv_i64 zero = tcg_constant_i64(0);
> > +
> > +        octeon_store_p(1, zero);
> > +        octeon_store_p(2, zero);
> > +        octeon_store_p(4, zero);
> > +        octeon_store_p(5, zero);
>
> Likewise, where does this come from?

CN50XX-HM-0.99E is OCTEON Plus documentation and covers the older
three-lane product state. Its MTP0 definition is P0 = rs. It does not
define the OCTEON III rt source or the high product lanes P3..P5.

The behavior modeled here is from CN71XX/OCTEON III. CN71XX defines
MTP0 as loading P0 from rs, loading P3 from rt, and setting P1 to zero
in the description/register-state table. It marks P2/P4/P5
architecturally unpredictable; I modeled those lanes as zero for
deterministic emulation.

The P1/P2/P4/P5 handling is therefore not from the CN50XX/OCTEON Plus
MTP0 definition. It follows the newer OCTEON III product-state
definition.

The legacy form remains encoding-compatible because the cnMIPS II form
"MTP0 rs" maps to the cnMIPS III form "MTP0 rs, $0", but the modeled
state side effects follow the OCTEON III definition.
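A minimal C model of the MTPn behavior in the quoted trans_mtp(), assuming n is 0..2: lane n comes from rs, lane n + 3 comes from rt, and only MTP0 clears P1/P2/P4/P5 (P1 is defined as zero; the others are unpredictable and zeroed for determinism). The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the six-lane OCTEON III product (P) state. */
typedef struct {
    uint64_t p[6];
} OcteonProductStateModel;

/* MTPn rs, rt: P[n] = rs, P[n + 3] = rt; MTP0 also clears P1/P2/P4/P5. */
static void model_mtp(OcteonProductStateModel *s, unsigned int n,
                      uint64_t rs, uint64_t rt)
{
    assert(n < 3);
    s->p[n] = rs;
    s->p[n + 3] = rt;
    if (n == 0) {
        s->p[1] = 0;
        s->p[2] = 0;
        s->p[4] = 0;
        s->p[5] = 0;
    }
}
```

MTP1 and MTP2 leave the other lanes untouched in this model, so the side effects discussed in this thread are confined to the MTP0 case.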

> The octeon2 manual entries for MTP* are quite simple, each writing to just the one register.

I don't think you have an OCTEON 2 manual; you told me off-list that you
only had the CN50XX-HM-0.99E manual, which is OCTEON Plus/OCTEON 1, not
OCTEON 2. I do not have a copy of the OCTEON 2 manual either, although the
OCTEON 3 manual presumably covers just about everything there.

I emailed you a copy of the newer CN71XX-HM-0.991E OCTEON 3 manual
I found yesterday.



* Re: [PATCH v6 08/35] target/mips: add Octeon LBX instruction
  2026-05-11 18:22 ` [PATCH v6 08/35] target/mips: add Octeon LBX instruction James Hilliard
@ 2026-05-14  9:49   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 52+ messages in thread
From: Philippe Mathieu-Daudé @ 2026-05-14  9:49 UTC (permalink / raw)
  To: James Hilliard, qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier, Aurelien Jarno,
	Jiaxun Yang, Aleksandar Rikalo, Huacai Chen, Richard Henderson

On 11/5/26 20:22, James Hilliard wrote:
> LBX performs an indexed signed byte load from base + index and writes the
> sign-extended result to rd.
> 
> Wire the existing indexed-load helper to MO_SB so Octeon user-mode
> binaries can use the signed byte variant alongside the existing LBUX
> path.
> 
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
> ---
> Changes v2 -> v3:
>    - Split LBX out of the combined Octeon arithmetic and memory
>      instruction patch.  (requested by Richard Henderson)
> ---
>   target/mips/tcg/octeon.decode      | 1 +
>   target/mips/tcg/octeon_translate.c | 1 +
>   2 files changed, 2 insertions(+)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



* Re: [PATCH v6 09/35] target/mips: add Octeon LHUX instruction
  2026-05-11 18:22 ` [PATCH v6 09/35] target/mips: add Octeon LHUX instruction James Hilliard
@ 2026-05-14  9:50   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 52+ messages in thread
From: Philippe Mathieu-Daudé @ 2026-05-14  9:50 UTC (permalink / raw)
  To: James Hilliard, qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier, Aurelien Jarno,
	Jiaxun Yang, Aleksandar Rikalo, Huacai Chen, Richard Henderson

On 11/5/26 20:22, James Hilliard wrote:
> LHUX performs an indexed unsigned halfword load from base + index and
> zero-extends the result into rd.
> 
> Add the decode entry and reuse the common indexed-load translator with
> MO_UW.
> 
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
> ---
> Changes v2 -> v3:
>    - Split LHUX out of the combined Octeon arithmetic and memory
>      instruction patch.  (requested by Richard Henderson)
> ---
>   target/mips/tcg/octeon.decode      | 1 +
>   target/mips/tcg/octeon_translate.c | 1 +
>   2 files changed, 2 insertions(+)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



* Re: [PATCH v6 10/35] target/mips: add Octeon LWUX instruction
  2026-05-11 18:22 ` [PATCH v6 10/35] target/mips: add Octeon LWUX instruction James Hilliard
@ 2026-05-14  9:50   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 52+ messages in thread
From: Philippe Mathieu-Daudé @ 2026-05-14  9:50 UTC (permalink / raw)
  To: James Hilliard, qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier, Aurelien Jarno,
	Jiaxun Yang, Aleksandar Rikalo, Huacai Chen, Richard Henderson

On 11/5/26 20:22, James Hilliard wrote:
> LWUX performs an indexed unsigned word load from base + index and
> zero-extends the result into rd.
> 
> Add the decode entry and route it through the common indexed-load
> translator with MO_UL.
> 
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
> ---
> Changes v2 -> v3:
>    - Split LWUX out of the combined Octeon arithmetic and memory
>      instruction patch.  (requested by Richard Henderson)
> ---
>   target/mips/tcg/octeon.decode      | 1 +
>   target/mips/tcg/octeon_translate.c | 1 +
>   2 files changed, 2 insertions(+)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
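As a rough illustration of the indexed-load family touched by these patches (LBX, LHUX, LWUX), here is a standalone C sketch of the architectural effect: load from base + index, then sign- or zero-extend the value into a 64-bit rd. It assumes guest memory is a flat little-endian byte array (mips64el-style) and ignores alignment and address translation. indexed_load() is a hypothetical helper written for this example; it is not QEMU's trans_lx().

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the Octeon indexed loads:
 *   rd = extend(mem[base + index])
 * 'size' (1, 2, 4 or 8 bytes) and 'is_signed' play the role of the
 * MemOp passed to the translator (e.g. MO_SB for LBX, MO_UW for LHUX,
 * MO_UL for LWUX).  Memory is assumed to be little-endian.
 */
static uint64_t indexed_load(const uint8_t *mem, uint64_t base,
                             uint64_t index, unsigned size, bool is_signed)
{
    uint64_t addr = base + index;   /* effective address: base + index */
    uint64_t raw = 0;

    /* Assemble 'size' bytes in little-endian order. */
    for (unsigned i = 0; i < size; i++) {
        raw |= (uint64_t)mem[addr + i] << (8 * i);
    }

    if (is_signed && size < 8) {
        /* Sign-extend from the top bit of the loaded value (LBX, LHX). */
        uint64_t sign = 1ULL << (8 * size - 1);
        raw = (raw ^ sign) - sign;
    }

    /* Unsigned forms (LBUX, LHUX, LWUX) leave the upper bits zero. */
    return raw;
}
```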



* Re: [PATCH v6 14/35] target/mips: add Octeon ZCBT instruction
  2026-05-11 18:22 ` [PATCH v6 14/35] target/mips: add Octeon ZCBT instruction James Hilliard
@ 2026-05-14 10:03   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 52+ messages in thread
From: Philippe Mathieu-Daudé @ 2026-05-14 10:03 UTC (permalink / raw)
  To: James Hilliard, qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier, Aurelien Jarno,
	Jiaxun Yang, Aleksandar Rikalo, Huacai Chen, Richard Henderson

On 11/5/26 20:22, James Hilliard wrote:
> ZCBT has the same user-mode memory effect as ZCB for QEMU's purposes.
> 
> Reuse the ZCB translator so both cache-block-zero forms clear the
> containing 128-byte line.
> 
> Acked-by: Richard Henderson <richard.henderson@linaro.org>
> Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
> ---
> Changes v2 -> v3:
>    - Split ZCBT out of the combined Octeon arithmetic and memory
>      instruction patch.  (requested by Richard Henderson)
> 
> Changes v4 -> v5:
>    - Fold ZCBT into the ZCB decodetree entry with a selector comment
>      instead of adding a separate translator thunk.  (suggested by Richard
>      Henderson)
> ---
>   target/mips/tcg/octeon.decode | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/target/mips/tcg/octeon.decode b/target/mips/tcg/octeon.decode
> index d8a1bfce77..5377f7b3ef 100644
> --- a/target/mips/tcg/octeon.decode
> +++ b/target/mips/tcg/octeon.decode
> @@ -51,6 +51,7 @@ SAAD         011100 ..... ..... 00000 00000 011001 @saa
>   
>   &zcb         base
>   ZCB          011100 base:5 00000 00000 11100 011111 &zcb
> +ZCB          011100 base:5 00000 00000 11101 011111 &zcb  # ZCBT

What about using '-' instead?

ZCB          011100 base:5 00000 00000 1110- 011111 &zcb

Although, if we were using decodetree for the disassembler
output, a separate ZCBT entry would be preferable.

>   
>   &lx          base index rd
>   @lx          ...... base:5 index:5 rd:5 ...... ..... &lx
> 
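To make the '-' suggestion concrete, here is a small standalone C sketch of how a decodetree-style pattern reduces to a fixed mask/value pair: '-' bits drop out of the mask, so a single "1110-" entry matches both the ZCB (11100) and ZCBT (11101) encodings. pattern_to_mask() and insn_matches() are hypothetical helpers written for illustration, not decodetree's generator, and they assume field names such as base:5 have already been spelled out as ".....".

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert a 32-bit decodetree-style pattern into a
 * mask of fixed bits and the value those bits must have.  '0'/'1' are
 * fixed bits; '.' (field bit) and '-' (ignored bit) are left out of the
 * mask.  Spaces are skipped.
 */
static void pattern_to_mask(const char *pat, uint32_t *mask, uint32_t *value)
{
    *mask = 0;
    *value = 0;
    for (; *pat; pat++) {
        if (*pat == ' ') {
            continue;
        }
        *mask <<= 1;
        *value <<= 1;
        if (*pat == '0' || *pat == '1') {
            *mask |= 1;
            *value |= (*pat == '1');
        }
    }
}

/* An instruction matches when all of its fixed bits agree with the pattern. */
static bool insn_matches(uint32_t insn, const char *pat)
{
    uint32_t mask, value;

    pattern_to_mask(pat, &mask, &value);
    return (insn & mask) == value;
}
```

With the original strict pattern, only the 11100 encoding matches; with "1110-" both do, while an unrelated 11110 encoding is still rejected.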




* Re: [PATCH v6 11/35] target/mips: add Octeon SAA instruction
  2026-05-11 18:22 ` [PATCH v6 11/35] target/mips: add Octeon SAA instruction James Hilliard
@ 2026-05-14 10:08   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 52+ messages in thread
From: Philippe Mathieu-Daudé @ 2026-05-14 10:08 UTC (permalink / raw)
  To: James Hilliard, qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier, Aurelien Jarno,
	Jiaxun Yang, Aleksandar Rikalo, Huacai Chen, Richard Henderson

On 11/5/26 20:22, James Hilliard wrote:
> SAA atomically adds rt to the naturally aligned 32-bit word at base and
> discards the old memory value.
> 
> Implement the common SAA/SAAD translator with TCG atomic_fetch_add_i64.
> The MemOp selects the word or doubleword transaction size.  QEMU only has
> one Octeon CPU model today, so keep SAA/SAAD under the existing Octeon
> instruction feature bucket instead of adding a finer-grained Octeon+
> feature bit.
> 
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
> ---
> Changes v2 -> v3:
>    - Split SAA out of the combined Octeon arithmetic and memory
>      instruction patch.  (requested by Richard Henderson)
> 
> Changes v3 -> v4:
>    - Gate SAA/SAAD behind an Octeon+ feature bit.  (reported by Richard
>      Henderson)
>    - Use the i64 TCG atomic add path for both word and doubleword sizes.
>      (suggested by Richard Henderson)
> 
> Changes v4 -> v5:
>    - Drop the separate Octeon+ feature bit; QEMU only has one Octeon CPU
>      model today.  (comment by Richard Henderson)
> ---
>   target/mips/tcg/octeon.decode      |  4 ++++
>   target/mips/tcg/octeon_translate.c | 14 ++++++++++++++
>   2 files changed, 18 insertions(+)
> 
> diff --git a/target/mips/tcg/octeon.decode b/target/mips/tcg/octeon.decode
> index db7d5f55f0..d6b241de42 100644
> --- a/target/mips/tcg/octeon.decode
> +++ b/target/mips/tcg/octeon.decode
> @@ -44,6 +44,10 @@ SNE          011100 ..... ..... ..... 00000 101011 @r3
>   SEQI         011100 rs:5 rt:5 imm:s10 101110 &cmpi
>   SNEI         011100 rs:5 rt:5 imm:s10 101111 &cmpi
>   
> +&saa         base rt
> +@saa         ...... base:5 rt:5 ................ &saa
> +SAA          011100 ..... ..... 00000 00000 011000 @saa
> +
>   &lx          base index rd
>   @lx          ...... base:5 index:5 rd:5 ...... ..... &lx
>   LWX          011111 ..... ..... ..... 00000 001010 @lx
> diff --git a/target/mips/tcg/octeon_translate.c b/target/mips/tcg/octeon_translate.c
> index 401c4bd14b..441d71d57b 100644
> --- a/target/mips/tcg/octeon_translate.c
> +++ b/target/mips/tcg/octeon_translate.c
> @@ -161,6 +161,20 @@ static bool trans_lx(DisasContext *ctx, arg_lx *a, MemOp mop)
>       return true;
>   }
>   
> +static bool trans_saa(DisasContext *ctx, arg_saa *a, MemOp mop)
> +{
> +    TCGv_i64 addr = tcg_temp_new_i64();
> +    TCGv_i64 value = tcg_temp_new_i64();
> +    TCGv_i64 old = tcg_temp_new_i64();
> +    MemOp amo = mo_endian(ctx) | mop | MO_ALIGN;
> +
> +    gen_base_offset_addr(ctx, addr, a->base, 0);
> +    gen_load_gpr(value, a->rt);
> +    tcg_gen_atomic_fetch_add_i64(old, addr, value, ctx->mem_idx, amo);
> +    return true;
> +}
> +
> +TRANS(SAA,  trans_saa, MO_UL);

Signedness doesn't seem relevant here, so, although the two are
equivalent, I'd rather s/MO_UL/MO_32/. I can do that when applying if you ack.

>   TRANS(LBX,  trans_lx, MO_SB);
>   TRANS(LBUX, trans_lx, MO_UB);
>   TRANS(LHX,  trans_lx, MO_SW);
> 




* Re: [PATCH v6 12/35] target/mips: add Octeon SAAD instruction
  2026-05-11 18:22 ` [PATCH v6 12/35] target/mips: add Octeon SAAD instruction James Hilliard
@ 2026-05-14 10:08   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 52+ messages in thread
From: Philippe Mathieu-Daudé @ 2026-05-14 10:08 UTC (permalink / raw)
  To: James Hilliard, qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier, Aurelien Jarno,
	Jiaxun Yang, Aleksandar Rikalo, Huacai Chen, Richard Henderson

On 11/5/26 20:22, James Hilliard wrote:
> SAAD is the doubleword form of SAA: it atomically adds rt to the
> naturally aligned 64-bit doubleword at base and discards the old memory
> value.
> 
> Route it through the common SAA/SAAD translator so the MemOp selects the
> aligned doubleword transaction size.
> 
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
> ---
> Changes v2 -> v3:
>    - Split SAAD out of the combined Octeon arithmetic and memory
>      instruction patch.  (requested by Richard Henderson)
> 
> Changes v3 -> v4:
>    - Note that SAAD shares the Octeon+ gated SAA translator path.
> 
> Changes v4 -> v5:
>    - Drop the Octeon+ gated wording/path and keep SAAD under the existing
>      Octeon feature bucket.
> ---
>   target/mips/tcg/octeon.decode      | 1 +
>   target/mips/tcg/octeon_translate.c | 1 +
>   2 files changed, 2 insertions(+)
> 
> diff --git a/target/mips/tcg/octeon.decode b/target/mips/tcg/octeon.decode
> index d6b241de42..d77717cd50 100644
> --- a/target/mips/tcg/octeon.decode
> +++ b/target/mips/tcg/octeon.decode
> @@ -47,6 +47,7 @@ SNEI         011100 rs:5 rt:5 imm:s10 101111 &cmpi
>   &saa         base rt
>   @saa         ...... base:5 rt:5 ................ &saa
>   SAA          011100 ..... ..... 00000 00000 011000 @saa
> +SAAD         011100 ..... ..... 00000 00000 011001 @saa
>   
>   &lx          base index rd
>   @lx          ...... base:5 index:5 rd:5 ...... ..... &lx
> diff --git a/target/mips/tcg/octeon_translate.c b/target/mips/tcg/octeon_translate.c
> index 441d71d57b..daeaf07072 100644
> --- a/target/mips/tcg/octeon_translate.c
> +++ b/target/mips/tcg/octeon_translate.c
> @@ -175,6 +175,7 @@ static bool trans_saa(DisasContext *ctx, arg_saa *a, MemOp mop)
>   }
>   
>   TRANS(SAA,  trans_saa, MO_UL);
> +TRANS(SAAD, trans_saa, MO_UQ);

Ditto previous patch: s/MO_UQ/MO_64/.

>   TRANS(LBX,  trans_lx, MO_SB);
>   TRANS(LBUX, trans_lx, MO_UB);
>   TRANS(LHX,  trans_lx, MO_SW);
> 




* Re: [PATCH v6 13/35] target/mips: add Octeon ZCB instruction
  2026-05-11 18:22 ` [PATCH v6 13/35] target/mips: add Octeon ZCB instruction James Hilliard
@ 2026-05-14 10:25   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 52+ messages in thread
From: Philippe Mathieu-Daudé @ 2026-05-14 10:25 UTC (permalink / raw)
  To: James Hilliard, qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier, Aurelien Jarno,
	Jiaxun Yang, Aleksandar Rikalo, Huacai Chen, Richard Henderson

On 11/5/26 20:22, James Hilliard wrote:
> ZCB zeros the 128-byte cache block containing the base address.
> 
> Model the user-mode-visible effect by aligning the address down to a
> 128-byte line and storing sixteen zero doublewords to guest memory.
> 
> Acked-by: Richard Henderson <richard.henderson@linaro.org>
> Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
> ---
> Changes v2 -> v3:
>    - Split ZCB out of the combined Octeon arithmetic and memory
>      instruction patch.  (requested by Richard Henderson)
> ---
>   target/mips/tcg/octeon.decode      |  3 +++
>   target/mips/tcg/octeon_translate.c | 24 ++++++++++++++++++++++++
>   2 files changed, 27 insertions(+)
> 
> diff --git a/target/mips/tcg/octeon.decode b/target/mips/tcg/octeon.decode
> index d77717cd50..d8a1bfce77 100644
> --- a/target/mips/tcg/octeon.decode
> +++ b/target/mips/tcg/octeon.decode
> @@ -49,6 +49,9 @@ SNEI         011100 rs:5 rt:5 imm:s10 101111 &cmpi
>   SAA          011100 ..... ..... 00000 00000 011000 @saa
>   SAAD         011100 ..... ..... 00000 00000 011001 @saa
>   
> +&zcb         base
> +ZCB          011100 base:5 00000 00000 11100 011111 &zcb
> +
>   &lx          base index rd
>   @lx          ...... base:5 index:5 rd:5 ...... ..... &lx
>   LWX          011111 ..... ..... ..... 00000 001010 @lx
> diff --git a/target/mips/tcg/octeon_translate.c b/target/mips/tcg/octeon_translate.c
> index daeaf07072..75b28c4338 100644
> --- a/target/mips/tcg/octeon_translate.c
> +++ b/target/mips/tcg/octeon_translate.c
> @@ -174,6 +174,30 @@ static bool trans_saa(DisasContext *ctx, arg_saa *a, MemOp mop)
>       return true;
>   }
>   
> +static bool trans_ZCB(DisasContext *ctx, arg_ZCB *a)
> +{
> +    TCGv_i64 addr = tcg_temp_new_i64();
> +    TCGv_i64 line = tcg_temp_new_i64();
> +    TCGv_i64 zero = tcg_constant_i64(0);

Would it be more efficient to use a TCGv_i128 zero?

> +
> +    gen_base_offset_addr(ctx, addr, a->base, 0);
> +
> +    /*
> +     * QEMU models ZCB/ZCBT as zeroing the containing 128-byte cache line
> +     * in guest memory.
> +     */
> +    tcg_gen_andi_i64(line, addr, ~0x7fULL);
> +
> +    for (int i = 0; i < 16; i++) {
> +        TCGv_i64 slot = tcg_temp_new_i64();
> +
> +        tcg_gen_addi_i64(slot, line, i * 8);
> +        tcg_gen_qemu_st_i64(zero, slot, ctx->mem_idx, mo_endian(ctx) | MO_UQ);

s/MO_UQ/MO_$bits/

> +    }
> +
> +    return true;
> +}
> +
>   TRANS(SAA,  trans_saa, MO_UL);
>   TRANS(SAAD, trans_saa, MO_UQ);
>   TRANS(LBX,  trans_lx, MO_SB);
> 
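For illustration, a standalone C sketch of the ZCB/ZCBT effect described in the commit message, using a flat byte array as guest memory: the address is rounded down to a 128-byte boundary and sixteen 8-byte zero stores clear the line. zcb() is a hypothetical helper; a TCGv_i128-based version would presumably issue eight 16-byte stores instead, but that variant is not shown here.

```c
#include <stdint.h>
#include <string.h>

#define ZCB_LINE_SIZE 128u

/*
 * Hypothetical model of ZCB/ZCBT: zero the 128-byte line containing
 * 'addr', mirroring the sixteen doubleword stores in trans_ZCB().
 */
static void zcb(uint8_t *mem, uint64_t addr)
{
    uint64_t line = addr & ~(uint64_t)(ZCB_LINE_SIZE - 1);  /* addr & ~0x7f */
    const uint64_t zero = 0;

    for (int i = 0; i < 16; i++) {
        memcpy(mem + line + (uint64_t)i * 8, &zero, sizeof(zero));
    }
}
```

Bytes outside the containing line are left untouched, even when the address points into the middle of the line.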




* Re: [PATCH v6 07/35] target/mips: add Octeon multiplier state
  2026-05-11 18:22 ` [PATCH v6 07/35] target/mips: add Octeon multiplier state James Hilliard
  2026-05-11 18:31   ` Richard Henderson
@ 2026-05-14 10:27   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 52+ messages in thread
From: Philippe Mathieu-Daudé @ 2026-05-14 10:27 UTC (permalink / raw)
  To: James Hilliard, qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier, Aurelien Jarno,
	Jiaxun Yang, Aleksandar Rikalo, Huacai Chen

On 11/5/26 20:22, James Hilliard wrote:
> Add per-thread Octeon multiplier state for the MPL and P limb banks used
> by the VMULU/VMM0/V3MULU instruction family.
> 
> Octeon3 extends the older MPL0-MPL2/P0-P2 state with high lanes
> MPL3-MPL5/P3-P5, programmed by the two-source MTM/MTP forms. Represent
> both banks as uint64_t arrays so the TC state matches the architected
> 64-bit limb layout used by Octeon68XX user-mode code.
> 
> Migrate the multiplier registers in an Octeon-only subsection so
> non-Octeon CPU models do not grow migration state.
> 
> Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
> ---
> Changes v2 -> v3:
>    - Split the multiplier state out of the combined Octeon arithmetic and
>      memory instruction patch.  (requested by Richard Henderson)
> 
> Changes v3 -> v4:
>    - Document and keep the Octeon3 MPL3-MPL5/P3-P5 high-lane state used by
>      the two-source MTM/MTP forms.
> ---
>   target/mips/cpu.h            | 12 ++++++++++++
>   target/mips/system/machine.c | 33 +++++++++++++++++++++++++++++++++
>   2 files changed, 45 insertions(+)
> 
> diff --git a/target/mips/cpu.h b/target/mips/cpu.h
> index b478f834c1..346713705a 100644
> --- a/target/mips/cpu.h
> +++ b/target/mips/cpu.h
> @@ -459,6 +459,14 @@ typedef struct mips_def_t mips_def_t;
>   
>   
>   typedef struct TCState TCState;
> +
> +/*
> + * Octeon3 adds a second bank of multiplier/product limbs used by the
> + * two-source MTM/MTP forms: MPL0..2/P0..2 from rs and MPL3..5/P3..5 from rt.
> + */

I'd move this patch to just before the patches adding these MT* instructions.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
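A rough C sketch of the register-bank layout described above, assuming only what the patch comment states: six uint64_t MPL limbs and six P limbs, with the Octeon3 high lanes at indices 3 to 5, and a two-source MTMn that writes MPLn from rs and MPL(n+3) from rt. OcteonMulState and mtm_two_source() are hypothetical names; any further architectural side effects of the real MTM instructions are intentionally left out.

```c
#include <stdint.h>

/* Hypothetical layout: MPL0..MPL5 and P0..P5 as 64-bit limbs. */
typedef struct {
    uint64_t mpl[6];
    uint64_t p[6];
} OcteonMulState;

/*
 * Sketch of the bank mapping only: the low bank (MPL0..2) is loaded
 * from rs and the Octeon3 high bank (MPL3..5) from rt.
 */
static void mtm_two_source(OcteonMulState *s, unsigned n,
                           uint64_t rs, uint64_t rt)
{
    if (n > 2) {
        return;              /* only MTM0..MTM2 exist in this model */
    }
    s->mpl[n] = rs;          /* low bank */
    s->mpl[n + 3] = rt;      /* high bank */
}
```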




* Re: [PATCH v6 04/35] target/mips: fix Octeon arithmetic destination handling
  2026-05-11 18:22 ` [PATCH v6 04/35] target/mips: fix Octeon arithmetic destination handling James Hilliard
@ 2026-05-14 10:30   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 52+ messages in thread
From: Philippe Mathieu-Daudé @ 2026-05-14 10:30 UTC (permalink / raw)
  To: James Hilliard, qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier, Aurelien Jarno,
	Jiaxun Yang, Aleksandar Rikalo, Huacai Chen, Richard Henderson

On 11/5/26 20:22, James Hilliard wrote:
> BADDU and DMUL write their results to rd, not rt.  Route writes through
> gen_store_gpr() so rd == $zero is handled consistently.
> 
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
> ---
> Changes v1 -> v2:
>    - Split the BADDU/DMUL destination handling fix out of the Octeon
>      arithmetic instruction patch.  (suggested by Philippe Mathieu-Daudé)
> 
> Changes v2 -> v3:
>    - Remove the rd == $zero fast paths and let gen_store_gpr() discard
>      writes to $zero.  (suggested by Richard Henderson)
> ---
>   target/mips/tcg/octeon_translate.c | 16 ++++------------
>   1 file changed, 4 insertions(+), 12 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
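A small standalone C sketch of why the rd == $zero fast paths are unnecessary: if the register-store helper drops writes to register 0, as gen_store_gpr() does in the translator, BADDU can store its byte-sized sum to rd unconditionally. store_gpr() and baddu() are hypothetical models of the guest-visible behavior, not QEMU code.

```c
#include <stdint.h>

/*
 * Hypothetical model of a 32-entry GPR file in which a write to
 * register 0 ($zero) is silently discarded.
 */
static void store_gpr(uint64_t gpr[32], unsigned reg, uint64_t value)
{
    if (reg != 0) {
        gpr[reg] = value;
    }
}

/*
 * BADDU rd, rs, rt: unsigned byte add.  The result is the low 8 bits of
 * rs + rt, zero-extended, and it is written to rd (not rt).
 */
static void baddu(uint64_t gpr[32], unsigned rd, unsigned rs, unsigned rt)
{
    store_gpr(gpr, rd, (uint8_t)(gpr[rs] + gpr[rt]));
}
```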



* Re: [PATCH v6 06/35] target/mips: drop Octeon zero-register fast paths
  2026-05-11 18:22 ` [PATCH v6 06/35] target/mips: drop Octeon zero-register fast paths James Hilliard
@ 2026-05-14 10:31   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 52+ messages in thread
From: Philippe Mathieu-Daudé @ 2026-05-14 10:31 UTC (permalink / raw)
  To: James Hilliard, qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier, Aurelien Jarno,
	Jiaxun Yang, Aleksandar Rikalo, Huacai Chen, Richard Henderson

On 11/5/26 20:22, James Hilliard wrote:
> EXTS, CINS, and POP route their destination writes through
> gen_store_gpr(), which already discards writes to $zero. Remove the
> remaining translator fast paths for destination $zero so these Octeon
> instructions follow the same shape as BADDU/DMUL and the generic MIPS
> translator helpers.
> 
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
> ---
> Changes v2 -> v3:
>    - Remove the remaining destination $zero fast paths and let
>      gen_store_gpr() discard writes to $zero.  (suggested by Richard
>      Henderson)
> ---
>   target/mips/tcg/octeon_translate.c | 15 ---------------
>   1 file changed, 15 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>

I'd logically expect this patch to follow #4 "fix Octeon arithmetic
destination handling".



* Re: [PATCH v6 03/35] linux-user/mips, target/mips: honor MIPS_FIXADE for unaligned accesses
  2026-05-11 18:22 ` [PATCH v6 03/35] linux-user/mips, target/mips: honor MIPS_FIXADE for unaligned accesses James Hilliard
@ 2026-05-14 14:36   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 52+ messages in thread
From: Philippe Mathieu-Daudé @ 2026-05-14 14:36 UTC (permalink / raw)
  To: James Hilliard, qemu-devel
  Cc: Laurent Vivier, Helge Deller, Pierrick Bouvier, Aurelien Jarno,
	Jiaxun Yang, Aleksandar Rikalo, Huacai Chen, Richard Henderson

On 11/5/26 20:22, James Hilliard wrote:
> Linux/MIPS enables software fixups for user-mode unaligned scalar
> accesses by default through MIPS_FIXADE/TIF_FIXADE.  QEMU linux-user did
> not model that ABI, so MIPS guests took fatal AdEL/AdES exceptions unless
> translation was forced to use unaligned host accesses.
> 
> Key MIPS translation blocks on the linux-user unaligned policy, implement
> sysmips(MIPS_FIXADE) to toggle that policy, and raise SIGBUS/BUS_ADRALN
> when fixups are disabled.
> 
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
> ---
> Changes v5 -> v6:
>    - Rename the TB flag from TB_FLAG_UNALIGN to TB_FLAG_MIPS_FIXADE
>      to match the MIPS_FIXADE ABI policy.
> 
> Changes v2 -> v3:
>    - Split MIPS_FLUSH_CACHE and MIPS_ATOMIC_SET into preparatory sysmips
>      patches.  (suggested by Richard Henderson)
> ---
>   linux-user/mips/cpu_loop.c         | 5 +++++
>   linux-user/mips/target_syscall.h   | 1 +
>   linux-user/mips64/target_syscall.h | 1 +
>   linux-user/syscall.c               | 8 ++++++++
>   target/mips/cpu.c                  | 8 ++++++--
>   target/mips/cpu.h                  | 4 ++++
>   target/mips/tcg/translate.c        | 6 +++++-
>   7 files changed, 30 insertions(+), 3 deletions(-)


> diff --git a/target/mips/cpu.c b/target/mips/cpu.c
> index f803d47763..6e827c72de 100644
> --- a/target/mips/cpu.c
> +++ b/target/mips/cpu.c
> @@ -565,11 +565,15 @@ static int mips_cpu_mmu_index(CPUState *cs, bool ifunc)
>   static TCGTBCPUState mips_get_tb_cpu_state(CPUState *cs)
>   {
>       CPUMIPSState *env = cpu_env(cs);
> +    uint32_t flags = env->hflags & MIPS_HFLAG_TB_MASK;
> +
> +#ifdef CONFIG_USER_ONLY
> +    flags |= TB_FLAG_MIPS_FIXADE * !cs->prctl_unalign_sigbus;

I'm not a big fan of this optimized style where you need to think
twice about what is being done. The following uses 3 lines but is
a no-brainer:

   if (!cs->prctl_unalign_sigbus) {
       flags |= TB_FLAG_MIPS_FIXADE;
   }

> +#endif
>   
>       return (TCGTBCPUState){
>           .pc = env->active_tc.PC,
> -        .flags = env->hflags & (MIPS_HFLAG_TMASK | MIPS_HFLAG_BMASK |
> -                                MIPS_HFLAG_HWRENA_ULR),
> +        .flags = flags,
>       };
>   }
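For illustration only, a standalone C sketch comparing the two spellings of the FIXADE flag computation discussed above: both set the flag exactly when prctl_unalign_sigbus is false, so the explicit form costs nothing in behavior. EXAMPLE_TB_FLAG_MIPS_FIXADE is a hypothetical stand-in value; the real flag bit is not shown in this excerpt.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for TB_FLAG_MIPS_FIXADE; the real value may differ. */
#define EXAMPLE_TB_FLAG_MIPS_FIXADE (1u << 31)

/* Branchless form from the patch: multiply the flag by 0 or 1. */
static uint32_t flags_branchless(uint32_t flags, bool prctl_unalign_sigbus)
{
    return flags | EXAMPLE_TB_FLAG_MIPS_FIXADE * !prctl_unalign_sigbus;
}

/* Explicit form suggested in review. */
static uint32_t flags_explicit(uint32_t flags, bool prctl_unalign_sigbus)
{
    if (!prctl_unalign_sigbus) {
        flags |= EXAMPLE_TB_FLAG_MIPS_FIXADE;   /* fixups enabled */
    }
    return flags;
}
```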




end of thread, other threads:[~2026-05-14 14:37 UTC | newest]

Thread overview: 52+ messages
2026-05-11 18:22 [PATCH v6 00/35] target/mips: add missing Octeon user-mode support James Hilliard
2026-05-11 18:22 ` [PATCH v6 01/35] linux-user/mips: implement sysmips(MIPS_FLUSH_CACHE) James Hilliard
2026-05-11 18:22 ` [PATCH v6 02/35] linux-user/mips: implement sysmips(MIPS_ATOMIC_SET) James Hilliard
2026-05-11 18:22 ` [PATCH v6 03/35] linux-user/mips, target/mips: honor MIPS_FIXADE for unaligned accesses James Hilliard
2026-05-14 14:36   ` Philippe Mathieu-Daudé
2026-05-11 18:22 ` [PATCH v6 04/35] target/mips: fix Octeon arithmetic destination handling James Hilliard
2026-05-14 10:30   ` Philippe Mathieu-Daudé
2026-05-11 18:22 ` [PATCH v6 05/35] target/mips: split Octeon SEQ/SNE decode James Hilliard
2026-05-11 18:22 ` [PATCH v6 06/35] target/mips: drop Octeon zero-register fast paths James Hilliard
2026-05-14 10:31   ` Philippe Mathieu-Daudé
2026-05-11 18:22 ` [PATCH v6 07/35] target/mips: add Octeon multiplier state James Hilliard
2026-05-11 18:31   ` Richard Henderson
2026-05-14 10:27   ` Philippe Mathieu-Daudé
2026-05-11 18:22 ` [PATCH v6 08/35] target/mips: add Octeon LBX instruction James Hilliard
2026-05-14  9:49   ` Philippe Mathieu-Daudé
2026-05-11 18:22 ` [PATCH v6 09/35] target/mips: add Octeon LHUX instruction James Hilliard
2026-05-14  9:50   ` Philippe Mathieu-Daudé
2026-05-11 18:22 ` [PATCH v6 10/35] target/mips: add Octeon LWUX instruction James Hilliard
2026-05-14  9:50   ` Philippe Mathieu-Daudé
2026-05-11 18:22 ` [PATCH v6 11/35] target/mips: add Octeon SAA instruction James Hilliard
2026-05-14 10:08   ` Philippe Mathieu-Daudé
2026-05-11 18:22 ` [PATCH v6 12/35] target/mips: add Octeon SAAD instruction James Hilliard
2026-05-14 10:08   ` Philippe Mathieu-Daudé
2026-05-11 18:22 ` [PATCH v6 13/35] target/mips: add Octeon ZCB instruction James Hilliard
2026-05-14 10:25   ` Philippe Mathieu-Daudé
2026-05-11 18:22 ` [PATCH v6 14/35] target/mips: add Octeon ZCBT instruction James Hilliard
2026-05-14 10:03   ` Philippe Mathieu-Daudé
2026-05-11 18:22 ` [PATCH v6 15/35] target/mips: add Octeon MTM0 instruction James Hilliard
2026-05-11 18:40   ` Richard Henderson
2026-05-11 20:12     ` James Hilliard
2026-05-11 18:22 ` [PATCH v6 16/35] target/mips: add Octeon MTP0 instruction James Hilliard
2026-05-11 18:46   ` Richard Henderson
2026-05-11 20:19     ` James Hilliard
2026-05-11 18:22 ` [PATCH v6 17/35] target/mips: add Octeon MTP1 instruction James Hilliard
2026-05-11 18:22 ` [PATCH v6 18/35] target/mips: add Octeon MTP2 instruction James Hilliard
2026-05-11 18:22 ` [PATCH v6 19/35] target/mips: add Octeon MTM1 instruction James Hilliard
2026-05-11 18:22 ` [PATCH v6 20/35] target/mips: add Octeon MTM2 instruction James Hilliard
2026-05-11 18:22 ` [PATCH v6 21/35] target/mips: add Octeon VMULU instruction James Hilliard
2026-05-11 18:22 ` [PATCH v6 22/35] target/mips: add Octeon VMM0 instruction James Hilliard
2026-05-11 18:22 ` [PATCH v6 23/35] target/mips: add Octeon V3MULU instruction James Hilliard
2026-05-11 18:22 ` [PATCH v6 24/35] target/mips: add Octeon QMAC instructions James Hilliard
2026-05-11 18:22 ` [PATCH v6 25/35] tests/tcg/mips: add Octeon instruction smoke test James Hilliard
2026-05-11 18:22 ` [PATCH v6 26/35] target/mips: add Octeon LA* atomic instructions James Hilliard
2026-05-11 18:22 ` [PATCH v6 27/35] target/mips: add Octeon COP2 crypto core support James Hilliard
2026-05-11 18:22 ` [PATCH v6 28/35] target/mips: add Octeon SMS4 crypto support James Hilliard
2026-05-11 18:23 ` [PATCH v6 29/35] target/mips: add Octeon SHA3 " James Hilliard
2026-05-11 18:23 ` [PATCH v6 30/35] target/mips: add Octeon ZUC " James Hilliard
2026-05-11 18:23 ` [PATCH v6 31/35] target/mips: add Octeon Camellia " James Hilliard
2026-05-11 18:23 ` [PATCH v6 32/35] target/mips: add Octeon CHORD and LLM COP2 support James Hilliard
2026-05-11 18:23 ` [PATCH v6 33/35] target/mips: add Octeon CvmCount RDHWR support James Hilliard
2026-05-11 18:23 ` [PATCH v6 34/35] tests/tcg/mips: cover Octeon QMAC and CvmCount James Hilliard
2026-05-11 18:23 ` [PATCH v6 35/35] target/mips: expose Octeon68XX floating-point support James Hilliard
