Re: [RFC PATCH 19/66] Hexagon instruction utility functions

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: "Philippe Mathieu-Daudé" <philmd@redhat.com>
To: Taylor Simpson <tsimpson@quicinc.com>, qemu-devel@nongnu.org
Cc: riku.voipio@iki.fi, richard.henderson@linaro.org,
	laurent@vivier.eu, aleksandar.m.mail@gmail.com
Subject: Re: [RFC PATCH 19/66] Hexagon instruction utility functions
Date: Tue, 11 Feb 2020 08:29:27 +0100	[thread overview]
Message-ID: <9831c7d0-a954-6c73-9729-a3eb7041b979@redhat.com> (raw)
In-Reply-To: <1581381644-13678-20-git-send-email-tsimpson@quicinc.com>

On 2/11/20 1:39 AM, Taylor Simpson wrote:
> Utility functions called by various instructions
> 
> Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
> ---
>   target/hexagon/arch.c     | 664 +++++++++++++++++++++++++++++++++
>   target/hexagon/arch.h     |  62 ++++
>   target/hexagon/conv_emu.c | 370 +++++++++++++++++++
>   target/hexagon/conv_emu.h |  50 +++
>   target/hexagon/fma_emu.c  | 918 ++++++++++++++++++++++++++++++++++++++++++++++
>   target/hexagon/fma_emu.h  |  30 ++
>   6 files changed, 2094 insertions(+)
>   create mode 100644 target/hexagon/arch.c
>   create mode 100644 target/hexagon/arch.h
>   create mode 100644 target/hexagon/conv_emu.c
>   create mode 100644 target/hexagon/conv_emu.h
>   create mode 100644 target/hexagon/fma_emu.c
>   create mode 100644 target/hexagon/fma_emu.h
> 
> diff --git a/target/hexagon/arch.c b/target/hexagon/arch.c
> new file mode 100644
> index 0000000..2f22e60
> --- /dev/null
> +++ b/target/hexagon/arch.c
> @@ -0,0 +1,664 @@
> +/*
> + *  Copyright (c) 2019 Qualcomm Innovation Center, Inc. All Rights Reserved.
> + *
> + *  This program is free software; you can redistribute it and/or modify
> + *  it under the terms of the GNU General Public License as published by
> + *  the Free Software Foundation; either version 2 of the License, or
> + *  (at your option) any later version.
> + *
> + *  This program is distributed in the hope that it will be useful,
> + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
> + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + *  GNU General Public License for more details.
> + *
> + *  You should have received a copy of the GNU General Public License
> + *  along with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <stdio.h>

You don't need this one, "qemu/osdep.h" include it for you.
Please include it first however.

> +#include <math.h>
> +#include "qemu/osdep.h"
> +#include "fma_emu.h"
> +#include "arch.h"
> +#include "macros.h"
> +
> +/*
> + * These three tables are used by the cabacdecbin instruction
> + */
> +size1u_t rLPS_table_64x4[64][4] = {
> +    {128, 176, 208, 240},
> +    {128, 167, 197, 227},
> +    {128, 158, 187, 216},
> +    {123, 150, 178, 205},
> +    {116, 142, 169, 195},
> +    {111, 135, 160, 185},
> +    {105, 128, 152, 175},
> +    {100, 122, 144, 166},
> +    {95, 116, 137, 158},
> +    {90, 110, 130, 150},
> +    {85, 104, 123, 142},
> +    {81, 99, 117, 135},
> +    {77, 94, 111, 128},
> +    {73, 89, 105, 122},
> +    {69, 85, 100, 116},
> +    {66, 80, 95, 110},
> +    {62, 76, 90, 104},
> +    {59, 72, 86, 99},
> +    {56, 69, 81, 94},
> +    {53, 65, 77, 89},
> +    {51, 62, 73, 85},
> +    {48, 59, 69, 80},
> +    {46, 56, 66, 76},
> +    {43, 53, 63, 72},
> +    {41, 50, 59, 69},
> +    {39, 48, 56, 65},
> +    {37, 45, 54, 62},
> +    {35, 43, 51, 59},
> +    {33, 41, 48, 56},
> +    {32, 39, 46, 53},
> +    {30, 37, 43, 50},
> +    {29, 35, 41, 48},
> +    {27, 33, 39, 45},
> +    {26, 31, 37, 43},
> +    {24, 30, 35, 41},
> +    {23, 28, 33, 39},
> +    {22, 27, 32, 37},
> +    {21, 26, 30, 35},
> +    {20, 24, 29, 33},
> +    {19, 23, 27, 31},
> +    {18, 22, 26, 30},
> +    {17, 21, 25, 28},
> +    {16, 20, 23, 27},
> +    {15, 19, 22, 25},
> +    {14, 18, 21, 24},
> +    {14, 17, 20, 23},
> +    {13, 16, 19, 22},
> +    {12, 15, 18, 21},
> +    {12, 14, 17, 20},
> +    {11, 14, 16, 19},
> +    {11, 13, 15, 18},
> +    {10, 12, 15, 17},
> +    {10, 12, 14, 16},
> +    {9, 11, 13, 15},
> +    {9, 11, 12, 14},
> +    {8, 10, 12, 14},
> +    {8, 9, 11, 13},
> +    {7, 9, 11, 12},
> +    {7, 9, 10, 12},
> +    {7, 8, 10, 11},
> +    {6, 8, 9, 11},
> +    {6, 7, 9, 10},
> +    {6, 7, 8, 9},
> +    {2, 2, 2, 2}
> +};
> +
> +size1u_t AC_next_state_MPS_64[64] = {
> +    1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
> +    11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
> +    21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
> +    31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
> +    41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
> +    51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
> +    61, 62, 62, 63
> +};
> +
> +
> +size1u_t AC_next_state_LPS_64[64] = {
> +    0, 0, 1, 2, 2, 4, 4, 5, 6, 7,
> +    8, 9, 9, 11, 11, 12, 13, 13, 15, 15,
> +    16, 16, 18, 18, 19, 19, 21, 21, 22, 22,
> +    23, 24, 24, 25, 26, 26, 27, 27, 28, 29,
> +    29, 30, 30, 30, 31, 32, 32, 33, 33, 33,
> +    34, 34, 35, 35, 35, 36, 36, 36, 37, 37,
> +    37, 38, 38, 63
> +};
> +
> +size4u_t fbrevaddr(size4u_t pointer)
> +{
> +    size4u_t offset = pointer & 0xffff;
> +    size4u_t brevoffset = 0;
> +    int i;
> +
> +    for (i = 0; i < 16; i++) {
> +        fSETBIT(i, brevoffset, (fGETBIT((15 - i), offset)));
> +    }
> +    return (pointer & 0xffff0000) | brevoffset;
> +}
> +
> +/*
> + * Counting bits set, Brian Kernighan's way
> + * Brian Kernighan's method goes through as many iterations as there are
> + * set bits. So if we have a 32-bit word with only the high bit set,
> + * then it will only go once through the loop.
> + */
> +size4u_t count_ones_2(size2u_t src)
> +{
> +    int ret;
> +    for (ret = 0; src; ret++) {
> +        src &= src - 1;    /* clear the least significant bit set */
> +    }
> +    return ret;
> +}
> +
> + size4u_t count_ones_4(size4u_t src)
> +{
> +    int ret;
> +    for (ret = 0; src; ret++) {
> +        src &= src - 1;    /* clear the least significant bit set */
> +    }
> +    return ret;
> +}
> +
> +size4u_t count_ones_8(size8u_t src)
> +{
> +    int ret;
> +    for (ret = 0; src; ret++) {
> +        src &= src - 1;    /* clear the least significant bit set */
> +    }
> +    return ret;
> +}
> +
> +size4u_t count_leading_ones_8(size8u_t src)
> +{
> +    int ret;
> +    for (ret = 0; src & 0x8000000000000000LL; src <<= 1) {
> +        ret++;
> +    }
> +    return ret;
> +}
> +
> +
> +size4u_t count_leading_ones_4(size4u_t src)
> +{
> +    int ret;
> +    for (ret = 0; src & 0x80000000; src <<= 1) {
> +        ret++;
> +    }
> +    return ret;
> +}
> +
> +size4u_t count_leading_ones_2(size2u_t src)
> +{
> +    int ret;
> +    for (ret = 0; src & 0x8000; src <<= 1) {
> +        ret++;
> +    }
> +    return ret;
> +}
> +
> +size4u_t count_leading_ones_1(size1u_t src)
> +{
> +    int ret;
> +    for (ret = 0; src & 0x80; src <<= 1) {
> +        ret++;
> +    }
> +    return ret;
> +}
> +
> +#define BITS_MASK_8 0x5555555555555555ULL
> +#define PAIR_MASK_8 0x3333333333333333ULL
> +#define NYBL_MASK_8 0x0f0f0f0f0f0f0f0fULL
> +#define BYTE_MASK_8 0x00ff00ff00ff00ffULL
> +#define HALF_MASK_8 0x0000ffff0000ffffULL
> +#define WORD_MASK_8 0x00000000ffffffffULL
> +
> +size8u_t reverse_bits_8(size8u_t src)
> +{
> +    src = ((src >> 1) & BITS_MASK_8) | ((src & BITS_MASK_8) << 1);
> +    src = ((src >> 2) & PAIR_MASK_8) | ((src & PAIR_MASK_8) << 2);
> +    src = ((src >> 4) & NYBL_MASK_8) | ((src & NYBL_MASK_8) << 4);
> +    src = ((src >> 8) & BYTE_MASK_8) | ((src & BYTE_MASK_8) << 8);
> +    src = ((src >> 16) & HALF_MASK_8) | ((src & HALF_MASK_8) << 16);
> +    src = ((src >> 32) & WORD_MASK_8) | ((src & WORD_MASK_8) << 32);
> +    return src;
> +}
> +
> +
> +#define BITS_MASK_4 0x55555555ULL
> +#define PAIR_MASK_4 0x33333333ULL
> +#define NYBL_MASK_4 0x0f0f0f0fULL
> +#define BYTE_MASK_4 0x00ff00ffULL
> +#define HALF_MASK_4 0x0000ffffULL
> +
> +size4u_t reverse_bits_4(size4u_t src)
> +{
> +    src = ((src >> 1) & BITS_MASK_4) | ((src & BITS_MASK_4) << 1);
> +    src = ((src >> 2) & PAIR_MASK_4) | ((src & PAIR_MASK_4) << 2);
> +    src = ((src >> 4) & NYBL_MASK_4) | ((src & NYBL_MASK_4) << 4);
> +    src = ((src >> 8) & BYTE_MASK_4) | ((src & BYTE_MASK_4) << 8);
> +    src = ((src >> 16) & HALF_MASK_4) | ((src & HALF_MASK_4) << 16);
> +    return src;
> +}
> +
> +#define BITS_MASK_2 0x5555ULL
> +#define PAIR_MASK_2 0x3333ULL
> +#define NYBL_MASK_2 0x0f0fULL
> +#define BYTE_MASK_2 0x00ffULL
> +#define HALF_MASK_2 0x0000ULL
> +
> +size4u_t reverse_bits_2(size2u_t src)
> +{
> +    src = ((src >> 1) & BITS_MASK_2) | ((src & BITS_MASK_2) << 1);
> +    src = ((src >> 2) & PAIR_MASK_2) | ((src & PAIR_MASK_2) << 2);
> +    src = ((src >> 4) & NYBL_MASK_2) | ((src & NYBL_MASK_2) << 4);
> +    src = ((src >> 8) & BYTE_MASK_2) | ((src & BYTE_MASK_2) << 8);
> +    return src;
> +}
> +
> +#define BITS_MASK_1 0x55ULL
> +#define PAIR_MASK_1 0x33ULL
> +#define NYBL_MASK_1 0x0fULL
> +#define BYTE_MASK_1 0x00ULL
> +#define HALF_MASK_1 0x00ULL
> +
> +size4u_t reverse_bits_1(size1u_t src)
> +{
> +    src = ((src >> 1) & BITS_MASK_1) | ((src & BITS_MASK_1) << 1);
> +    src = ((src >> 2) & PAIR_MASK_1) | ((src & PAIR_MASK_1) << 2);
> +    src = ((src >> 4) & NYBL_MASK_1) | ((src & NYBL_MASK_1) << 4);
> +    return src;
> +}
> +
> +size8u_t exchange(size8u_t bits, size4u_t cntrl)
> +{
> +    int i;
> +    size4u_t mask = 1 << 31;
> +    size8u_t mask0 = 0x1LL << 62;
> +    size8u_t mask1 = 0x2LL << 62;
> +    size8u_t outbits = 0;
> +    size8u_t b0, b1;
> +
> +    for (i = 0; i < 32; i++) {
> +        b0 = (bits & mask0) >> (2 * i);
> +        b1 = (bits & mask1) >> (2 * i);
> +        if (cntrl & mask) {
> +            outbits |= (b1 >> 1) | (b0 << 1);
> +        } else {
> +            outbits |= (b1 | b0);
> +        }
> +        outbits <<= 2;
> +        mask >>= 1;
> +        mask0 >>= 2;
> +        mask1 >>= 2;
> +    }
> +    return outbits;
> +}
> +
> +size8u_t interleave(size4u_t odd, size4u_t even)
> +{
> +    /* Convert to long long */
> +    size8u_t myodd = odd;
> +    size8u_t myeven = even;
> +    /* First, spread bits out */
> +    myodd = (myodd | (myodd << 16)) & HALF_MASK_8;
> +    myeven = (myeven | (myeven << 16)) & HALF_MASK_8;
> +    myodd = (myodd | (myodd << 8)) & BYTE_MASK_8;
> +    myeven = (myeven | (myeven << 8)) & BYTE_MASK_8;
> +    myodd = (myodd | (myodd << 4)) & NYBL_MASK_8;
> +    myeven = (myeven | (myeven << 4)) & NYBL_MASK_8;
> +    myodd = (myodd | (myodd << 2)) & PAIR_MASK_8;
> +    myeven = (myeven | (myeven << 2)) & PAIR_MASK_8;
> +    myodd = (myodd | (myodd << 1)) & BITS_MASK_8;
> +    myeven = (myeven | (myeven << 1)) & BITS_MASK_8;
> +    /* Now OR together */
> +    return myeven | (myodd << 1);
> +}
> +
> +size8u_t deinterleave(size8u_t src)
> +{
> +    /* Get odd and even bits */
> +    size8u_t myodd = ((src >> 1) & BITS_MASK_8);
> +    size8u_t myeven = (src & BITS_MASK_8);
> +
> +    /* Unspread bits */
> +    myeven = (myeven | (myeven >> 1)) & PAIR_MASK_8;
> +    myodd = (myodd | (myodd >> 1)) & PAIR_MASK_8;
> +    myeven = (myeven | (myeven >> 2)) & NYBL_MASK_8;
> +    myodd = (myodd | (myodd >> 2)) & NYBL_MASK_8;
> +    myeven = (myeven | (myeven >> 4)) & BYTE_MASK_8;
> +    myodd = (myodd | (myodd >> 4)) & BYTE_MASK_8;
> +    myeven = (myeven | (myeven >> 8)) & HALF_MASK_8;
> +    myodd = (myodd | (myodd >> 8)) & HALF_MASK_8;
> +    myeven = (myeven | (myeven >> 16)) & WORD_MASK_8;
> +    myodd = (myodd | (myodd >> 16)) & WORD_MASK_8;
> +
> +    /* Return odd bits in upper half */
> +    return myeven | (myodd << 32);
> +}
> +
> +size4u_t carry_from_add64(size8u_t a, size8u_t b, size4u_t c)
> +{
> +    size8u_t tmpa, tmpb, tmpc;
> +    tmpa = fGETUWORD(0, a);
> +    tmpb = fGETUWORD(0, b);
> +    tmpc = tmpa + tmpb + c;
> +    tmpa = fGETUWORD(1, a);
> +    tmpb = fGETUWORD(1, b);
> +    tmpc = tmpa + tmpb + fGETUWORD(1, tmpc);
> +    tmpc = fGETUWORD(1, tmpc);
> +    return tmpc;
> +}
> +
> +size4s_t conv_round(size4s_t a, int n)
> +{
> +    size8s_t val;
> +
> +    if (n == 0) {
> +        val = a;
> +    } else if ((a & ((1 << (n - 1)) - 1)) == 0) {    /* N-1..0 all zero? */
> +        /* Add LSB from int part */
> +        val = ((fSE32_64(a)) + (size8s_t) (((size4u_t) ((1 << n) & a)) >> 1));
> +    } else {
> +        val = ((fSE32_64(a)) + (1 << (n - 1)));
> +    }
> +
> +    val = val >> n;
> +    return (size4s_t)val;
> +}
> +
> +size16s_t cast8s_to_16s(size8s_t a)
> +{
> +    size16s_t result = {.hi = 0, .lo = 0};
> +    result.lo = a;
> +    if (a < 0) {
> +        result.hi = -1;
> +    }
> +    return result;
> +}
> +
> +size8s_t cast16s_to_8s(size16s_t a)
> +{
> +    return a.lo;
> +}
> +
> +size4s_t cast16s_to_4s(size16s_t a)
> +{
> +    return (size4s_t)a.lo;
> +}
> +
> +size16s_t add128(size16s_t a, size16s_t b)
> +{
> +    size16s_t result = {.hi = 0, .lo = 0};
> +    result.lo = a.lo + b.lo;
> +    result.hi = a.hi + b.hi;
> +
> +    if (result.lo < b.lo) {
> +        result.hi++;
> +    }
> +
> +    return result;
> +}
> +
> +size16s_t sub128(size16s_t a, size16s_t b)
> +{
> +    size16s_t result = {.hi = 0, .lo = 0};
> +    result.lo = a.lo - b.lo;
> +    result.hi = a.hi - b.hi;
> +    if (result.lo > a.lo) {
> +        result.hi--;
> +    }
> +
> +    return result;
> +}
> +
> +size16s_t shiftr128(size16s_t a, size4u_t n)
> +{
> +    size16s_t result;
> +    result.lo = (a.lo >> n) | (a.hi << (64 - n));
> +    result.hi = a.hi >> n;
> +    return result;
> +}
> +
> +size16s_t shiftl128(size16s_t a, size4u_t n)
> +{
> +    size16s_t result;
> +    result.lo = a.lo << n;
> +    result.hi = (a.hi << n) | (a.lo >> (64 - n));
> +    return result;
> +}
> +
> +size16s_t and128(size16s_t a, size16s_t b)
> +{
> +    size16s_t result;
> +    result.lo = a.lo & b.lo;
> +    result.hi = a.hi & b.hi;
> +    return result;
> +}
> +
> +/* Floating Point Stuff */
> +
> +static const int roundingmodes[] = {
> +    FE_TONEAREST,
> +    FE_TOWARDZERO,
> +    FE_DOWNWARD,
> +    FE_UPWARD
> +};
> +
> +void arch_fpop_start(CPUHexagonState *env)
> +{
> +    fegetenv(&env->fenv);
> +    feclearexcept(FE_ALL_EXCEPT);
> +    fesetround(roundingmodes[fREAD_REG_FIELD(USR, USR_FPRND)]);
> +}
> +
> +#define NOTHING             /* Don't do anything */
> +
> +#define TEST_FLAG(LIBCF, MYF, MYE) \
> +    do { \
> +        if (fetestexcept(LIBCF)) { \
> +            if (GET_USR_FIELD(USR_##MYF) == 0) { \
> +                SET_USR_FIELD(USR_##MYF, 1); \
> +                if (GET_USR_FIELD(USR_##MYE)) { \
> +                    NOTHING \
> +                } \
> +            } \
> +        } \
> +    } while (0)
> +
> +void arch_fpop_end(CPUHexagonState *env)
> +{
> +    if (fetestexcept(FE_ALL_EXCEPT)) {
> +        TEST_FLAG(FE_INEXACT, FPINPF, FPINPE);
> +        TEST_FLAG(FE_DIVBYZERO, FPDBZF, FPDBZE);
> +        TEST_FLAG(FE_INVALID, FPINVF, FPINVE);
> +        TEST_FLAG(FE_OVERFLOW, FPOVFF, FPOVFE);
> +        TEST_FLAG(FE_UNDERFLOW, FPUNFF, FPUNFE);
> +    }
> +    fesetenv(&env->fenv);
> +}
> +
> +#undef TEST_FLAG
> +
> +
> +void arch_raise_fpflag(unsigned int flags)
> +{
> +    feraiseexcept(flags);
> +}
> +
> +int arch_sf_recip_common(size4s_t *Rs, size4s_t *Rt, size4s_t *Rd, int *adjust)
> +{
> +    int n_class;
> +    int d_class;
> +    int n_exp;
> +    int d_exp;
> +    int ret = 0;
> +    size4s_t RsV, RtV, RdV;
> +    int PeV = 0;
> +    RsV = *Rs;
> +    RtV = *Rt;
> +    n_class = fpclassify(fFLOAT(RsV));
> +    d_class = fpclassify(fFLOAT(RtV));
> +    if ((n_class == FP_NAN) && (d_class == FP_NAN)) {
> +        if (fGETBIT(22, RsV & RtV) == 0) {
> +            fRAISEFLAGS(FE_INVALID);
> +        }
> +        RdV = RsV = RtV = fSFNANVAL();
> +    } else if (n_class == FP_NAN) {
> +        if (fGETBIT(22, RsV) == 0) {
> +            fRAISEFLAGS(FE_INVALID);
> +        }
> +        RdV = RsV = RtV = fSFNANVAL();
> +    } else if (d_class == FP_NAN) {
> +        /* EJP: or put NaN in num/den fixup? */
> +        if (fGETBIT(22, RtV) == 0) {
> +            fRAISEFLAGS(FE_INVALID);
> +        }
> +        RdV = RsV = RtV = fSFNANVAL();
> +    } else if ((n_class == FP_INFINITE) && (d_class == FP_INFINITE)) {
> +        /* EJP: or put Inf in num fixup? */
> +        RdV = RsV = RtV = fSFNANVAL();
> +        fRAISEFLAGS(FE_INVALID);
> +    } else if ((n_class == FP_ZERO) && (d_class == FP_ZERO)) {
> +        /* EJP: or put zero in num fixup? */
> +        RdV = RsV = RtV = fSFNANVAL();
> +        fRAISEFLAGS(FE_INVALID);
> +    } else if (d_class == FP_ZERO) {
> +        /* EJP: or put Inf in num fixup? */
> +        RsV = fSFINFVAL(RsV ^ RtV);
> +        RtV = fSFONEVAL(0);
> +        RdV = fSFONEVAL(0);
> +        if (n_class != FP_INFINITE) {
> +            fRAISEFLAGS(FE_DIVBYZERO);
> +        }
> +    } else if (d_class == FP_INFINITE) {
> +        RsV = 0x80000000 & (RsV ^ RtV);
> +        RtV = fSFONEVAL(0);
> +        RdV = fSFONEVAL(0);
> +    } else if (n_class == FP_ZERO) {
> +        /* EJP: Does this just work itself out? */
> +        /* EJP: No, 0/Inf causes problems. */
> +        RsV = 0x80000000 & (RsV ^ RtV);
> +        RtV = fSFONEVAL(0);
> +        RdV = fSFONEVAL(0);
> +    } else if (n_class == FP_INFINITE) {
> +        /* EJP: Does this just work itself out? */
> +        RsV = fSFINFVAL(RsV ^ RtV);
> +        RtV = fSFONEVAL(0);
> +        RdV = fSFONEVAL(0);
> +    } else {
> +        PeV = 0x00;
> +        /* Basic checks passed */
> +        n_exp = fSF_GETEXP(RsV);
> +        d_exp = fSF_GETEXP(RtV);
> +        if ((n_exp - d_exp + fSF_BIAS()) <= fSF_MANTBITS()) {
> +            /* Near quotient underflow / inexact Q */
> +            PeV = 0x80;
> +            RtV = fSF_MUL_POW2(RtV, -64);
> +            RsV = fSF_MUL_POW2(RsV, 64);
> +        } else if ((n_exp - d_exp + fSF_BIAS()) > (fSF_MAXEXP() - 24)) {
> +            /* Near quotient overflow */
> +            PeV = 0x40;
> +            RtV = fSF_MUL_POW2(RtV, 32);
> +            RsV = fSF_MUL_POW2(RsV, -32);
> +        } else if (n_exp <= fSF_MANTBITS() + 2) {
> +            RtV = fSF_MUL_POW2(RtV, 64);
> +            RsV = fSF_MUL_POW2(RsV, 64);
> +        } else if (d_exp <= 1) {
> +            RtV = fSF_MUL_POW2(RtV, 32);
> +            RsV = fSF_MUL_POW2(RsV, 32);
> +        } else if (d_exp > 252) {
> +            RtV = fSF_MUL_POW2(RtV, -32);
> +            RsV = fSF_MUL_POW2(RsV, -32);
> +        }
> +        RdV = 0;
> +        ret = 1;
> +    }
> +    *Rs = RsV;
> +    *Rt = RtV;
> +    *Rd = RdV;
> +    *adjust = PeV;
> +    return ret;
> +}
> +
> +int arch_sf_invsqrt_common(size4s_t *Rs, size4s_t *Rd, int *adjust)
> +{
> +    int r_class;
> +    size4s_t RsV, RdV;
> +    int PeV = 0;
> +    int r_exp;
> +    int ret = 0;
> +    RsV = *Rs;
> +    r_class = fpclassify(fFLOAT(RsV));
> +    if (r_class == FP_NAN) {
> +        if (fGETBIT(22, RsV) == 0) {
> +            fRAISEFLAGS(FE_INVALID);
> +        }
> +        RdV = RsV = fSFNANVAL();
> +    } else if (fFLOAT(RsV) < 0.0) {
> +        /* Negative nonzero values are NaN */
> +        fRAISEFLAGS(FE_INVALID);
> +        RsV = fSFNANVAL();
> +        RdV = fSFNANVAL();
> +    } else if (r_class == FP_INFINITE) {
> +        /* EJP: or put Inf in num fixup? */
> +        RsV = fSFINFVAL(-1);
> +        RdV = fSFINFVAL(-1);
> +    } else if (r_class == FP_ZERO) {
> +        /* EJP: or put zero in num fixup? */
> +        RsV = RsV;
> +        RdV = fSFONEVAL(0);
> +    } else {
> +        PeV = 0x00;
> +        /* Basic checks passed */
> +        r_exp = fSF_GETEXP(RsV);
> +        if (r_exp <= 24) {
> +            RsV = fSF_MUL_POW2(RsV, 64);
> +            PeV = 0xe0;
> +        }
> +        RdV = 0;
> +        ret = 1;
> +    }
> +    *Rs = RsV;
> +    *Rd = RdV;
> +    *adjust = PeV;
> +    return ret;
> +}
> +
> +int arch_recip_lookup(int index)
> +{
> +    index &= 0x7f;
> +    unsigned const int roundrom[128] = {
> +        0x0fe, 0x0fa, 0x0f6, 0x0f2, 0x0ef, 0x0eb, 0x0e7, 0x0e4,
> +        0x0e0, 0x0dd, 0x0d9, 0x0d6, 0x0d2, 0x0cf, 0x0cc, 0x0c9,
> +        0x0c6, 0x0c2, 0x0bf, 0x0bc, 0x0b9, 0x0b6, 0x0b3, 0x0b1,
> +        0x0ae, 0x0ab, 0x0a8, 0x0a5, 0x0a3, 0x0a0, 0x09d, 0x09b,
> +        0x098, 0x096, 0x093, 0x091, 0x08e, 0x08c, 0x08a, 0x087,
> +        0x085, 0x083, 0x080, 0x07e, 0x07c, 0x07a, 0x078, 0x075,
> +        0x073, 0x071, 0x06f, 0x06d, 0x06b, 0x069, 0x067, 0x065,
> +        0x063, 0x061, 0x05f, 0x05e, 0x05c, 0x05a, 0x058, 0x056,
> +        0x054, 0x053, 0x051, 0x04f, 0x04e, 0x04c, 0x04a, 0x049,
> +        0x047, 0x045, 0x044, 0x042, 0x040, 0x03f, 0x03d, 0x03c,
> +        0x03a, 0x039, 0x037, 0x036, 0x034, 0x033, 0x032, 0x030,
> +        0x02f, 0x02d, 0x02c, 0x02b, 0x029, 0x028, 0x027, 0x025,
> +        0x024, 0x023, 0x021, 0x020, 0x01f, 0x01e, 0x01c, 0x01b,
> +        0x01a, 0x019, 0x017, 0x016, 0x015, 0x014, 0x013, 0x012,
> +        0x011, 0x00f, 0x00e, 0x00d, 0x00c, 0x00b, 0x00a, 0x009,
> +        0x008, 0x007, 0x006, 0x005, 0x004, 0x003, 0x002, 0x000,
> +    };
> +    return roundrom[index];
> +};
> +
> +int arch_invsqrt_lookup(int index)
> +{
> +    index &= 0x7f;
> +    unsigned const int roundrom[128] = {
> +        0x069, 0x066, 0x063, 0x061, 0x05e, 0x05b, 0x059, 0x057,
> +        0x054, 0x052, 0x050, 0x04d, 0x04b, 0x049, 0x047, 0x045,
> +        0x043, 0x041, 0x03f, 0x03d, 0x03b, 0x039, 0x037, 0x036,
> +        0x034, 0x032, 0x030, 0x02f, 0x02d, 0x02c, 0x02a, 0x028,
> +        0x027, 0x025, 0x024, 0x022, 0x021, 0x01f, 0x01e, 0x01d,
> +        0x01b, 0x01a, 0x019, 0x017, 0x016, 0x015, 0x014, 0x012,
> +        0x011, 0x010, 0x00f, 0x00d, 0x00c, 0x00b, 0x00a, 0x009,
> +        0x008, 0x007, 0x006, 0x005, 0x004, 0x003, 0x002, 0x001,
> +        0x0fe, 0x0fa, 0x0f6, 0x0f3, 0x0ef, 0x0eb, 0x0e8, 0x0e4,
> +        0x0e1, 0x0de, 0x0db, 0x0d7, 0x0d4, 0x0d1, 0x0ce, 0x0cb,
> +        0x0c9, 0x0c6, 0x0c3, 0x0c0, 0x0be, 0x0bb, 0x0b8, 0x0b6,
> +        0x0b3, 0x0b1, 0x0af, 0x0ac, 0x0aa, 0x0a8, 0x0a5, 0x0a3,
> +        0x0a1, 0x09f, 0x09d, 0x09b, 0x099, 0x097, 0x095, 0x093,
> +        0x091, 0x08f, 0x08d, 0x08b, 0x089, 0x087, 0x086, 0x084,
> +        0x082, 0x080, 0x07f, 0x07d, 0x07b, 0x07a, 0x078, 0x077,
> +        0x075, 0x074, 0x072, 0x071, 0x06f, 0x06e, 0x06c, 0x06b,
> +    };
> +    return roundrom[index];
> +};
> +
> diff --git a/target/hexagon/arch.h b/target/hexagon/arch.h
> new file mode 100644
> index 0000000..665a97f
> --- /dev/null
> +++ b/target/hexagon/arch.h
> @@ -0,0 +1,62 @@
> +/*
> + *  Copyright (c) 2019 Qualcomm Innovation Center, Inc. All Rights Reserved.
> + *
> + *  This program is free software; you can redistribute it and/or modify
> + *  it under the terms of the GNU General Public License as published by
> + *  the Free Software Foundation; either version 2 of the License, or
> + *  (at your option) any later version.
> + *
> + *  This program is distributed in the hope that it will be useful,
> + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
> + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + *  GNU General Public License for more details.
> + *
> + *  You should have received a copy of the GNU General Public License
> + *  along with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef ARCH_H
> +#define ARCH_H
> +
> +#include "cpu.h"
> +#include "hex_arch_types.h"
> +
> +extern size1u_t rLPS_table_64x4[64][4];
> +extern size1u_t AC_next_state_MPS_64[64];
> +extern size1u_t AC_next_state_LPS_64[64];
> +
> +extern size4u_t fbrevaddr(size4u_t pointer);
> +extern size4u_t count_ones_2(size2u_t src);
> +extern size4u_t count_ones_4(size4u_t src);
> +extern size4u_t count_ones_8(size8u_t src);
> +extern size4u_t count_leading_ones_8(size8u_t src);
> +extern size4u_t count_leading_ones_4(size4u_t src);
> +extern size4u_t count_leading_ones_2(size2u_t src);
> +extern size4u_t count_leading_ones_1(size1u_t src);
> +extern size8u_t reverse_bits_8(size8u_t src);
> +extern size4u_t reverse_bits_4(size4u_t src);
> +extern size4u_t reverse_bits_2(size2u_t src);
> +extern size4u_t reverse_bits_1(size1u_t src);
> +extern size8u_t exchange(size8u_t bits, size4u_t cntrl);
> +extern size8u_t interleave(size4u_t odd, size4u_t even);
> +extern size8u_t deinterleave(size8u_t src);
> +extern size4u_t carry_from_add64(size8u_t a, size8u_t b, size4u_t c);
> +extern size4s_t conv_round(size4s_t a, int n);
> +extern size16s_t cast8s_to_16s(size8s_t a);
> +extern size8s_t cast16s_to_8s(size16s_t a);
> +extern size4s_t cast16s_to_4s(size16s_t a);
> +extern size16s_t add128(size16s_t a, size16s_t b);
> +extern size16s_t sub128(size16s_t a, size16s_t b);
> +extern size16s_t shiftr128(size16s_t a, size4u_t n);
> +extern size16s_t shiftl128(size16s_t a, size4u_t n);
> +extern size16s_t and128(size16s_t a, size16s_t b);
> +extern void arch_fpop_start(CPUHexagonState *env);
> +extern void arch_fpop_end(CPUHexagonState *env);
> +extern void arch_raise_fpflag(unsigned int flags);
> +extern int arch_sf_recip_common(size4s_t *Rs, size4s_t *Rt, size4s_t *Rd,
> +                                int *adjust);
> +extern int arch_sf_invsqrt_common(size4s_t *Rs, size4s_t *Rd, int *adjust);
> +extern int arch_recip_lookup(int index);
> +extern int arch_invsqrt_lookup(int index);
> +
> +#endif
> diff --git a/target/hexagon/conv_emu.c b/target/hexagon/conv_emu.c
> new file mode 100644
> index 0000000..79c2823
> --- /dev/null
> +++ b/target/hexagon/conv_emu.c
> @@ -0,0 +1,370 @@
> +/*
> + *  Copyright (c) 2019 Qualcomm Innovation Center, Inc. All Rights Reserved.
> + *
> + *  This program is free software; you can redistribute it and/or modify
> + *  it under the terms of the GNU General Public License as published by
> + *  the Free Software Foundation; either version 2 of the License, or
> + *  (at your option) any later version.
> + *
> + *  This program is distributed in the hope that it will be useful,
> + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
> + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + *  GNU General Public License for more details.
> + *
> + *  You should have received a copy of the GNU General Public License
> + *  along with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "hex_arch_types.h"
> +#include "macros.h"
> +#include <stdio.h>
> +#include <math.h>
> +#include "conv_emu.h"
> +
> +#define isz(X) (fpclassify(X) == FP_ZERO)
> +#define DF_BIAS 1023
> +#define SF_BIAS 127
> +
> +#define LL_MAX_POS 0x7fffffffffffffffULL
> +#define MAX_POS 0x7fffffffU
> +
> +#ifdef VCPP
> +/*
> + * Visual C isn't GNU C and doesn't have __builtin_clzll
> + */
> +
> +static int __builtin_clzll(unsigned long long int input)
> +{
> +    int total = 0;
> +    if (input == 0) {
> +        return 64;
> +    }
> +    total += ((input >> (total + 32)) != 0) ? 32 : 0;
> +    total += ((input >> (total + 16)) != 0) ? 16 : 0;
> +    total += ((input >> (total +  8)) != 0) ?  8 : 0;
> +    total += ((input >> (total +  4)) != 0) ?  4 : 0;
> +    total += ((input >> (total +  2)) != 0) ?  2 : 0;
> +    total += ((input >> (total +  1)) != 0) ?  1 : 0;
> +    return 63 - total;
> +}
> +#endif
> +
> +typedef union {
> +    double f;
> +    size8u_t i;
> +    struct {
> +        size8u_t mant:52;
> +        size8u_t exp:11;
> +        size8u_t sign:1;
> +    } x;
> +} df_t;
> +
> +
> +typedef union {
> +    float f;
> +    size4u_t i;
> +    struct {
> +        size4u_t mant:23;
> +        size4u_t exp:8;
> +        size4u_t sign:1;
> +    } x;
> +} sf_t;
> +
> +
> +#define MAKE_CONV_8U_TO_XF_N(FLOATID, BIGFLOATID, RETTYPE) \
> +static RETTYPE conv_8u_to_##FLOATID##_n(size8u_t in, int negate) \
> +{ \
> +    FLOATID##_t x; \
> +    size8u_t tmp, truncbits, shamt; \
> +    int leading_zeros; \
> +    if (in == 0) { \
> +        return 0.0; \
> +    } \
> +    leading_zeros = __builtin_clzll(in); \
> +    tmp = in << (leading_zeros); \
> +    tmp <<= 1; \
> +    shamt = 64 - f##BIGFLOATID##_MANTBITS(); \
> +    truncbits = tmp & ((1ULL << (shamt)) - 1); \
> +    tmp >>= shamt; \
> +    if (truncbits != 0) { \
> +        feraiseexcept(FE_INEXACT); \
> +        switch (fegetround()) { \
> +        case FE_TOWARDZERO: \
> +            break; \
> +        case FE_DOWNWARD: \
> +            if (negate) { \
> +                tmp += 1; \
> +            } \
> +            break; \
> +        case FE_UPWARD: \
> +            if (!negate) { \
> +                tmp += 1; \
> +            } \
> +            break; \
> +        default: \
> +            if ((truncbits & ((1ULL << (shamt - 1)) - 1)) == 0) { \
> +                tmp += (tmp & 1); \
> +            } else { \
> +                tmp += ((truncbits >> (shamt - 1)) & 1); \
> +            } \
> +            break; \
> +        } \
> +    } \
> +    if (((tmp << shamt) >> shamt) != tmp) { \
> +        leading_zeros--; \
> +    } \
> +    x.x.mant = tmp; \
> +    x.x.exp = BIGFLOATID##_BIAS + f##BIGFLOATID##_MANTBITS() - \
> +              leading_zeros + shamt - 1; \
> +    x.x.sign = negate; \
> +    return x.f; \
> +}
> +
> +MAKE_CONV_8U_TO_XF_N(df, DF, double)
> +MAKE_CONV_8U_TO_XF_N(sf, SF, float)
> +
> +double conv_8u_to_df(size8u_t in)
> +{
> +    return conv_8u_to_df_n(in, 0);
> +}
> +
> +double conv_8s_to_df(size8s_t in)
> +{
> +    if (in == 0x8000000000000000) {
> +        return -0x1p63;
> +    }
> +    if (in < 0) {
> +        return conv_8u_to_df_n(-in, 1);
> +    } else {
> +        return conv_8u_to_df_n(in, 0);
> +    }
> +}
> +
> +double conv_4u_to_df(size4u_t in)
> +{
> +    return conv_8u_to_df((size8u_t) in);
> +}
> +
> +double conv_4s_to_df(size4s_t in)
> +{
> +    return conv_8s_to_df(in);
> +}
> +
> +float conv_8u_to_sf(size8u_t in)
> +{
> +    return conv_8u_to_sf_n(in, 0);
> +}
> +
> +float conv_8s_to_sf(size8s_t in)
> +{
> +    if (in == 0x8000000000000000) {
> +        return -0x1p63;
> +    }
> +    if (in < 0) {
> +        return conv_8u_to_sf_n(-in, 1);
> +    } else {
> +        return conv_8u_to_sf_n(in, 0);
> +    }
> +}
> +
> +float conv_4u_to_sf(size4u_t in)
> +{
> +    return conv_8u_to_sf(in);
> +}
> +
> +float conv_4s_to_sf(size4s_t in)
> +{
> +    return conv_8s_to_sf(in);
> +}
> +
> +
> +static size8u_t conv_df_to_8u_n(double in, int will_negate)
> +{
> +    df_t x;
> +    int fracshift, endshift;
> +    size8u_t tmp, truncbits;
> +    x.f = in;
> +    if (isinf(in)) {
> +        feraiseexcept(FE_INVALID);
> +        if (in > 0.0) {
> +            return ~0ULL;
> +        } else {
> +            return 0ULL;
> +        }
> +    }
> +    if (isnan(in)) {
> +        feraiseexcept(FE_INVALID);
> +        return ~0ULL;
> +    }
> +    if (isz(in)) {
> +        return 0;
> +    }
> +    if (x.x.sign) {
> +        feraiseexcept(FE_INVALID);
> +        return 0;
> +    }
> +    if (in < 0.5) {
> +        /* Near zero, captures large fracshifts, denorms, etc */
> +        feraiseexcept(FE_INEXACT);
> +        switch (fegetround()) {
> +        case FE_DOWNWARD:
> +            if (will_negate) {
> +                return 1;
> +            } else {
> +                return 0;
> +            }
> +        case FE_UPWARD:
> +            if (!will_negate) {
> +                return 1;
> +            } else {
> +                return 0;
> +            }
> +        default:
> +            return 0;    /* nearest or towards zero */
> +        }
> +    }
> +    if ((x.x.exp - DF_BIAS) >= 64) {
> +        /* way too big */
> +        feraiseexcept(FE_INVALID);
> +        return ~0ULL;
> +    }
> +    fracshift = fMAX(0, (fDF_MANTBITS() - (x.x.exp - DF_BIAS)));
> +    endshift = fMAX(0, ((x.x.exp - DF_BIAS - fDF_MANTBITS())));
> +    tmp = x.x.mant | (1ULL << fDF_MANTBITS());
> +    truncbits = tmp & ((1ULL << fracshift) - 1);
> +    tmp >>= fracshift;
> +    if (truncbits) {
> +        /* Apply Rounding */
> +        feraiseexcept(FE_INEXACT);
> +        switch (fegetround()) {
> +        case FE_TOWARDZERO:
> +            break;
> +        case FE_DOWNWARD:
> +            if (will_negate) {
> +                tmp += 1;
> +            }
> +            break;
> +        case FE_UPWARD:
> +            if (!will_negate) {
> +                tmp += 1;
> +            }
> +            break;
> +        default:
> +            if ((truncbits & ((1ULL << (fracshift - 1)) - 1)) == 0) {
> +                /* Exactly .5 */
> +                tmp += (tmp & 1);
> +            } else {
> +                tmp += ((truncbits >> (fracshift - 1)) & 1);
> +            }
> +        }
> +    }
> +    /*
> +     * If we added one and it carried all the way out,
> +     * check to see if overflow
> +     */
> +    if ((tmp & ((1ULL << (fDF_MANTBITS() + 1)) - 1)) == 0) {
> +        if ((x.x.exp - DF_BIAS) == 63) {
> +            feclearexcept(FE_INEXACT);
> +            feraiseexcept(FE_INVALID);
> +            return ~0ULL;
> +        }
> +    }
> +    tmp <<= endshift;
> +    return tmp;
> +}
> +
> +static size4u_t conv_df_to_4u_n(double in, int will_negate)
> +{
> +    size8u_t tmp;
> +    tmp = conv_df_to_8u_n(in, will_negate);
> +    if (tmp > 0x00000000ffffffffULL) {
> +        feclearexcept(FE_INEXACT);
> +        feraiseexcept(FE_INVALID);
> +        return ~0U;
> +    }
> +    return (size4u_t)tmp;
> +}
> +
> +size8u_t conv_df_to_8u(double in)
> +{
> +    return conv_df_to_8u_n(in, 0);
> +}
> +
> +size4u_t conv_df_to_4u(double in)
> +{
> +    return conv_df_to_4u_n(in, 0);
> +}
> +
> +size8s_t conv_df_to_8s(double in)
> +{
> +    size8u_t tmp;
> +    df_t x;
> +    x.f = in;
> +    if (isnan(in)) {
> +        feraiseexcept(FE_INVALID);
> +        return -1;
> +    }
> +    if (x.x.sign) {
> +        tmp = conv_df_to_8u_n(-in, 1);
> +    } else {
> +        tmp = conv_df_to_8u_n(in, 0);
> +    }
> +    if (tmp > (LL_MAX_POS + x.x.sign)) {
> +        feclearexcept(FE_INEXACT);
> +        feraiseexcept(FE_INVALID);
> +        tmp = (LL_MAX_POS + x.x.sign);
> +    }
> +    if (x.x.sign) {
> +        return -tmp;
> +    } else {
> +        return tmp;
> +    }
> +}
> +
> +size4s_t conv_df_to_4s(double in)
> +{
> +    size8u_t tmp;
> +    df_t x;
> +    x.f = in;
> +    if (isnan(in)) {
> +        feraiseexcept(FE_INVALID);
> +        return -1;
> +    }
> +    if (x.x.sign) {
> +        tmp = conv_df_to_8u_n(-in, 1);
> +    } else {
> +        tmp = conv_df_to_8u_n(in, 0);
> +    }
> +    if (tmp > (MAX_POS + x.x.sign)) {
> +        feclearexcept(FE_INEXACT);
> +        feraiseexcept(FE_INVALID);
> +        tmp = (MAX_POS + x.x.sign);
> +    }
> +    if (x.x.sign) {
> +        return -tmp;
> +    } else {
> +        return tmp;
> +    }
> +}
> +
> +size8u_t conv_sf_to_8u(float in)
> +{
> +    return conv_df_to_8u(in);
> +}
> +
> +size4u_t conv_sf_to_4u(float in)
> +{
> +    return conv_df_to_4u(in);
> +}
> +
> +size8s_t conv_sf_to_8s(float in)
> +{
> +    return conv_df_to_8s(in);
> +}
> +
> +size4s_t conv_sf_to_4s(float in)
> +{
> +    return conv_df_to_4s(in);
> +}
> +
> diff --git a/target/hexagon/conv_emu.h b/target/hexagon/conv_emu.h
> new file mode 100644
> index 0000000..a5ee206
> --- /dev/null
> +++ b/target/hexagon/conv_emu.h
> @@ -0,0 +1,50 @@
> +/*
> + *  Copyright (c) 2019 Qualcomm Innovation Center, Inc. All Rights Reserved.
> + *
> + *  This program is free software; you can redistribute it and/or modify
> + *  it under the terms of the GNU General Public License as published by
> + *  the Free Software Foundation; either version 2 of the License, or
> + *  (at your option) any later version.
> + *
> + *  This program is distributed in the hope that it will be useful,
> + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
> + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + *  GNU General Public License for more details.
> + *
> + *  You should have received a copy of the GNU General Public License
> + *  along with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef CONV_EMU_H
> +#define CONV_EMU_H
> +
> +#include "hex_arch_types.h"
> +
> +extern size8u_t conv_sf_to_8u(float in);
> +extern size4u_t conv_sf_to_4u(float in);
> +extern size8s_t conv_sf_to_8s(float in);
> +extern size4s_t conv_sf_to_4s(float in);
> +
> +extern size8u_t conv_df_to_8u(double in);
> +extern size4u_t conv_df_to_4u(double in);
> +extern size8s_t conv_df_to_8s(double in);
> +extern size4s_t conv_df_to_4s(double in);
> +
> +extern double conv_8u_to_df(size8u_t in);
> +extern double conv_4u_to_df(size4u_t in);
> +extern double conv_8s_to_df(size8s_t in);
> +extern double conv_4s_to_df(size4s_t in);
> +
> +extern float conv_8u_to_sf(size8u_t in);
> +extern float conv_4u_to_sf(size4u_t in);
> +extern float conv_8s_to_sf(size8s_t in);
> +extern float conv_4s_to_sf(size4s_t in);
> +
> +extern float conv_df_to_sf(double in);
> +
> +static inline double conv_sf_to_df(float in)
> +{
> +    return in;
> +}
> +
> +#endif
> diff --git a/target/hexagon/fma_emu.c b/target/hexagon/fma_emu.c
> new file mode 100644
> index 0000000..93cac0d
> --- /dev/null
> +++ b/target/hexagon/fma_emu.c
> @@ -0,0 +1,918 @@
> +/*
> + *  Copyright (c) 2019 Qualcomm Innovation Center, Inc. All Rights Reserved.
> + *
> + *  This program is free software; you can redistribute it and/or modify
> + *  it under the terms of the GNU General Public License as published by
> + *  the Free Software Foundation; either version 2 of the License, or
> + *  (at your option) any later version.
> + *
> + *  This program is distributed in the hope that it will be useful,
> + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
> + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + *  GNU General Public License for more details.
> + *
> + *  You should have received a copy of the GNU General Public License
> + *  along with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "qemu/osdep.h"
> +
> +#include <stdio.h>
> +#include <math.h>
> +#include "macros.h"
> +#include "conv_emu.h"
> +#include "fma_emu.h"
> +
> +#define DF_INF_EXP 0x7ff
> +#define DF_BIAS 1023
> +
> +#define SF_INF_EXP 0xff
> +#define SF_BIAS 127
> +
> +#define HF_INF_EXP 0x1f
> +#define HF_BIAS 15
> +
> +#define WAY_BIG_EXP 4096
> +
> +#define isz(X) (fpclassify(X) == FP_ZERO)
> +
> +#define debug_fprintf(...)    /* nothing */
> +
> +typedef union {
> +    double f;
> +    size8u_t i;
> +    struct {
> +        size8u_t mant:52;
> +        size8u_t exp:11;
> +        size8u_t sign:1;
> +    } x;
> +} df_t;
> +
> +typedef union {
> +    float f;
> +    size4u_t i;
> +    struct {
> +        size4u_t mant:23;
> +        size4u_t exp:8;
> +        size4u_t sign:1;
> +    } x;
> +} sf_t;
> +
> +typedef struct {
> +    union {
> +        size8u_t low;
> +        struct {
> +            size4u_t w0;
> +            size4u_t w1;
> +        };
> +    };
> +    union {
> +        size8u_t high;
> +        struct {
> +            size4u_t w3;
> +            size4u_t w2;
> +        };
> +    };
> +} int128_t;
> +
> +typedef struct {
> +    int128_t mant;
> +    size4s_t exp;
> +    size1u_t sign;
> +    size1u_t guard;
> +    size1u_t round;
> +    size1u_t sticky;
> +} xf_t;
> +
> +static inline void xf_debug(const char *msg, xf_t a)
> +{
> +    debug_fprintf(stdout, "%s %c0x%016llx_%016llx /%d/%d/%d p%d\n", msg,
> +                  a.sign ? '-' : '+', a.mant.high, a.mant.low, a.guard,
> +                  a.round, a.sticky, a.exp);
> +}
> +
> +static inline void xf_init(xf_t *p)
> +{
> +    p->mant.low = 0;
> +    p->mant.high = 0;
> +    p->exp = 0;
> +    p->sign = 0;
> +    p->guard = 0;
> +    p->round = 0;
> +    p->sticky = 0;
> +}
> +
> +static inline size8u_t df_getmant(df_t a)
> +{
> +    int class = fpclassify(a.f);
> +    switch (class) {
> +    case FP_NORMAL:
> +    return a.x.mant | 1ULL << 52;
> +    case FP_ZERO:
> +        return 0;
> +    case FP_SUBNORMAL:
> +        return a.x.mant;
> +    default:
> +        return -1;
> +    };
> +}
> +
> +static inline size4s_t df_getexp(df_t a)
> +{
> +    int class = fpclassify(a.f);
> +    switch (class) {
> +    case FP_NORMAL:
> +        return a.x.exp;
> +    case FP_SUBNORMAL:
> +        return a.x.exp + 1;
> +    default:
> +        return -1;
> +    };
> +}
> +
> +static inline size8u_t sf_getmant(sf_t a)
> +{
> +    int class = fpclassify(a.f);
> +    switch (class) {
> +    case FP_NORMAL:
> +        return a.x.mant | 1ULL << 23;
> +    case FP_ZERO:
> +        return 0;
> +    case FP_SUBNORMAL:
> +        return a.x.mant | 0ULL;
> +    default:
> +        return -1;
> +    };
> +}
> +
> +static inline size4s_t sf_getexp(sf_t a)
> +{
> +    int class = fpclassify(a.f);
> +    switch (class) {
> +    case FP_NORMAL:
> +        return a.x.exp;
> +    case FP_SUBNORMAL:
> +        return a.x.exp + 1;
> +    default:
> +        return -1;
> +    };
> +}
> +
> +static inline int128_t int128_mul_6464(size8u_t ai, size8u_t bi)
> +{
> +    int128_t ret;
> +    int128_t a, b;
> +    size8u_t pp0, pp1a, pp1b, pp1s, pp2;
> +
> +    debug_fprintf(stdout, "ai/bi: 0x%016llx/0x%016llx\n", ai, bi);
> +
> +    a.high = b.high = 0;
> +    a.low = ai;
> +    b.low = bi;
> +    pp0 = (size8u_t)a.w0 * (size8u_t)b.w0;
> +    pp1a = (size8u_t)a.w1 * (size8u_t)b.w0;
> +    pp1b = (size8u_t)b.w1 * (size8u_t)a.w0;
> +    pp2 = (size8u_t)a.w1 * (size8u_t)b.w1;
> +
> +    debug_fprintf(stdout,
> +                  "pp2/1b/1a/0: 0x%016llx/0x%016llx/0x%016llx/0x%016llx\n",
> +                  pp2, pp1b, pp1a, pp0);
> +
> +    pp1s = pp1a + pp1b;
> +    if ((pp1s < pp1a) || (pp1s < pp1b)) {
> +        pp2 += (1ULL << 32);
> +    }
> +    ret.low = pp0 + (pp1s << 32);
> +    if ((ret.low < pp0) || (ret.low < (pp1s << 32))) {
> +        pp2 += 1;
> +    }
> +    ret.high = pp2 + (pp1s >> 32);
> +
> +    debug_fprintf(stdout,
> +                  "pp1s/rethi/retlo: 0x%016llx/0x%016llx/0x%016llx\n",
> +                  pp1s, ret.high, ret.low);
> +
> +    return ret;
> +}
> +
> +static inline int128_t int128_shl(int128_t a, size4u_t amt)
> +{
> +    int128_t ret;
> +    if (amt == 0) {
> +        return a;
> +    }
> +    if (amt > 128) {
> +        ret.high = 0;
> +        ret.low = 0;
> +        return ret;
> +    }
> +    if (amt >= 64) {
> +        amt -= 64;
> +        a.high = a.low;
> +        a.low = 0;
> +    }
> +    ret.high = a.high << amt;
> +    ret.high |= (a.low >> (64 - amt));
> +    ret.low = a.low << amt;
> +    return ret;
> +}
> +
> +static inline int128_t int128_shr(int128_t a, size4u_t amt)
> +{
> +    int128_t ret;
> +    if (amt == 0) {
> +        return a;
> +    }
> +    if (amt > 128) {
> +        ret.high = 0;
> +        ret.low = 0;
> +        return ret;
> +    }
> +    if (amt >= 64) {
> +        amt -= 64;
> +        a.low = a.high;
> +        a.high = 0;
> +    }
> +    ret.low = a.low >> amt;
> +    ret.low |= (a.high << (64 - amt));
> +    ret.high = a.high >> amt;
> +    return ret;
> +}
> +
> +static inline int128_t int128_add(int128_t a, int128_t b)
> +{
> +    int128_t ret;
> +    ret.low = a.low + b.low;
> +    if ((ret.low < a.low) || (ret.low < b.low)) {
> +        /* carry into high part */
> +        a.high += 1;
> +    }
> +    ret.high = a.high + b.high;
> +    return ret;
> +}
> +
> +static inline int128_t int128_sub(int128_t a, int128_t b, int borrow)
> +{
> +    int128_t ret;
> +    ret.low = a.low - b.low;
> +    if (ret.low > a.low) {
> +        /* borrow into high part */
> +        a.high -= 1;
> +    }
> +    ret.high = a.high - b.high;
> +    if (borrow == 0) {
> +        return ret;
> +    } else {
> +        a.high = 0;
> +        a.low = 1;
> +        return int128_sub(ret, a, 0);
> +    }
> +}
> +
> +static inline int int128_gt(int128_t a, int128_t b)
> +{
> +    if (a.high == b.high) {
> +        return a.low > b.low;
> +    }
> +    return a.high > b.high;
> +}
> +
> +static inline xf_t xf_norm_left(xf_t a)
> +{
> +    a.exp--;
> +    a.mant = int128_shl(a.mant, 1);
> +    a.mant.low |= a.guard;
> +    a.guard = a.round;
> +    a.round = a.sticky;
> +    return a;
> +}
> +
> +static inline xf_t xf_norm_right(xf_t a, int amt)
> +{
> +    if (amt > 130) {
> +        a.sticky |=
> +            a.round | a.guard | (a.mant.low != 0) | (a.mant.high != 0);
> +        a.guard = a.round = a.mant.high = a.mant.low = 0;
> +        a.exp += amt;
> +        return a;
> +
> +    }
> +    while (amt >= 64) {
> +        a.sticky |= a.round | a.guard | (a.mant.low != 0);
> +        a.guard = (a.mant.low >> 63) & 1;
> +        a.round = (a.mant.low >> 62) & 1;
> +        a.mant.low = a.mant.high;
> +        a.mant.high = 0;
> +        a.exp += 64;
> +        amt -= 64;
> +    }
> +    while (amt > 0) {
> +        a.exp++;
> +        a.sticky |= a.round;
> +        a.round = a.guard;
> +        a.guard = a.mant.low & 1;
> +        a.mant = int128_shr(a.mant, 1);
> +        amt--;
> +    }
> +    return a;
> +}
> +
> +
> +/*
> + * On the add/sub, we need to be able to shift out lots of bits, but need a
> + * sticky bit for what was shifted out, I think.
> + */
> +static xf_t xf_add(xf_t a, xf_t b);
> +
> +static inline xf_t xf_sub(xf_t a, xf_t b, int negate)
> +{
> +    xf_t ret;
> +    xf_init(&ret);
> +    int borrow;
> +
> +    xf_debug("-->Sub/a: ", a);
> +    xf_debug("-->Sub/b: ", b);
> +
> +    if (a.sign != b.sign) {
> +        b.sign = !b.sign;
> +        return xf_add(a, b);
> +    }
> +    if (b.exp > a.exp) {
> +        /* small - big == - (big - small) */
> +        return xf_sub(b, a, !negate);
> +    }
> +    if ((b.exp == a.exp) && (int128_gt(b.mant, a.mant))) {
> +        /* small - big == - (big - small) */
> +        return xf_sub(b, a, !negate);
> +    }
> +
> +    xf_debug("OK: Sub/a: ", a);
> +    xf_debug("OK: Sub/b: ", b);
> +
> +    while (a.exp > b.exp) {
> +        /* Try to normalize exponents: shrink a exponent and grow mantissa */
> +        if (a.mant.high & (1ULL << 62)) {
> +            /* Can't grow a any more */
> +            break;
> +        } else {
> +            a = xf_norm_left(a);
> +        }
> +    }
> +
> +    xf_debug("norm_l: Sub/a: ", a);
> +    xf_debug("norm_l: Sub/b: ", b);
> +
> +    while (a.exp > b.exp) {
> +        /* Try to normalize exponents: grow b exponent and shrink mantissa */
> +        /* Keep around shifted out bits... we might need those later */
> +        b = xf_norm_right(b, a.exp - b.exp);
> +    }
> +
> +    xf_debug("norm_r: Sub/a: ", a);
> +    xf_debug("norm_r: Sub/b: ", b);
> +
> +    if ((int128_gt(b.mant, a.mant))) {
> +        xf_debug("retry: Sub/a: ", a);
> +        xf_debug("retry: Sub/b: ", b);
> +        return xf_sub(b, a, !negate);
> +    }
> +
> +    /* OK, now things should be normalized! */
> +    ret.sign = a.sign;
> +    ret.exp = a.exp;
> +    assert(!int128_gt(b.mant, a.mant));
> +    borrow = (b.round << 2) | (b.guard << 1) | b.sticky;
> +    ret.mant = int128_sub(a.mant, b.mant, (borrow != 0));
> +    borrow = 0 - borrow;
> +    ret.guard = (borrow >> 2) & 1;
> +    ret.round = (borrow >> 1) & 1;
> +    ret.sticky = (borrow >> 0) & 1;
> +    if (negate) {
> +        ret.sign = !ret.sign;
> +    }
> +    return ret;
> +}
> +
> +static xf_t xf_add(xf_t a, xf_t b)
> +{
> +    xf_t ret;
> +    xf_init(&ret);
> +    xf_debug("-->Add/a: ", a);
> +    xf_debug("-->Add/b: ", b);
> +    if (a.sign != b.sign) {
> +        b.sign = !b.sign;
> +        return xf_sub(a, b, 0);
> +    }
> +    if (b.exp > a.exp) {
> +        /* small + big ==  (big + small) */
> +        return xf_add(b, a);
> +    }
> +    if ((b.exp == a.exp) && int128_gt(b.mant, a.mant)) {
> +        /* small + big ==  (big + small) */
> +        return xf_add(b, a);
> +    }
> +
> +    xf_debug("OK? Add/a: ", a);
> +    xf_debug("OK? Add/b: ", b);
> +
> +    while (a.exp > b.exp) {
> +        /* Try to normalize exponents: shrink a exponent and grow mantissa */
> +        if (a.mant.high & (1ULL << 62)) {
> +            /* Can't grow a any more */
> +            break;
> +        } else {
> +            a = xf_norm_left(a);
> +        }
> +    }
> +
> +    xf_debug("norm_l: Add/a: ", a);
> +    xf_debug("norm_l: Add/b: ", b);
> +
> +    while (a.exp > b.exp) {
> +        /* Try to normalize exponents: grow b exponent and shrink mantissa */
> +        /* Keep around shifted out bits... we might need those later */
> +        b = xf_norm_right(b, a.exp - b.exp);
> +    }
> +
> +    xf_debug("norm_r: Add/a: ", a);
> +    xf_debug("norm_r: Add/b: ", b);
> +
> +    /* OK, now things should be normalized! */
> +    if (int128_gt(b.mant, a.mant)) {
> +        xf_debug("retry: Add/a: ", a);
> +        xf_debug("retry: Add/b: ", b);
> +        return xf_add(b, a);
> +    };
> +    ret.sign = a.sign;
> +    ret.exp = a.exp;
> +    assert(!int128_gt(b.mant, a.mant));
> +    ret.mant = int128_add(a.mant, b.mant);
> +    ret.guard = b.guard;
> +    ret.round = b.round;
> +    ret.sticky = b.sticky;
> +    return ret;
> +}
> +
> +/* Return an infinity with the same sign as a */
> +static inline df_t infinite_df_t(xf_t a)
> +{
> +    df_t ret;
> +    ret.x.sign = a.sign;
> +    ret.x.exp = DF_INF_EXP;
> +    ret.x.mant = 0ULL;
> +    return ret;
> +}
> +
> +/* Return a maximum finite value with the same sign as a */
> +static inline df_t maxfinite_df_t(xf_t a)
> +{
> +    df_t ret;
> +    ret.x.sign = a.sign;
> +    ret.x.exp = DF_INF_EXP - 1;
> +    ret.x.mant = 0x000fffffffffffffULL;
> +    return ret;
> +}
> +
> +static inline df_t f2df_t(double in)
> +{
> +    df_t ret;
> +    ret.f = in;
> +    return ret;
> +}
> +
> +/* Return an infinity with the same sign as a */
> +static inline sf_t infinite_sf_t(xf_t a)
> +{
> +    sf_t ret;
> +    ret.x.sign = a.sign;
> +    ret.x.exp = SF_INF_EXP;
> +    ret.x.mant = 0ULL;
> +    return ret;
> +}
> +
> +/* Return a maximum finite value with the same sign as a */
> +static inline sf_t maxfinite_sf_t(xf_t a)
> +{
> +    sf_t ret;
> +    ret.x.sign = a.sign;
> +    ret.x.exp = SF_INF_EXP - 1;
> +    ret.x.mant = 0x007fffffUL;
> +    return ret;
> +}
> +
> +static inline sf_t f2sf_t(float in)
> +{
> +    sf_t ret;
> +    ret.f = in;
> +    return ret;
> +}
> +
> +#define GEN_XF_ROUND(TYPE, MANTBITS, INF_EXP) \
> +static inline TYPE xf_round_##TYPE(xf_t a) \
> +{ \
> +    TYPE ret; \
> +    ret.i = 0; \
> +    ret.x.sign = a.sign; \
> +    if ((a.mant.high == 0) && (a.mant.low == 0) \
> +        && ((a.guard | a.round | a.sticky) == 0)) { \
> +        /* result zero */ \
> +        switch (fegetround()) { \
> +        case FE_DOWNWARD: \
> +            return f2##TYPE(-0.0); \
> +        default: \
> +            return f2##TYPE(0.0); \
> +        } \
> +    } \
> +    /* Normalize right */ \
> +    /* We want MANTBITS bits of mantissa plus the leading one. */ \
> +    /* That means that we want MANTBITS+1 bits, or 0x000000000000FF_FFFF */ \
> +    /* So we need to normalize right while the high word is non-zero and \
> +    * while the low word is nonzero when masked with 0xffe0_0000_0000_0000 */ \
> +    xf_debug("input: ", a); \
> +    while ((a.mant.high != 0) || ((a.mant.low >> (MANTBITS + 1)) != 0)) { \
> +        a = xf_norm_right(a, 1); \
> +    } \
> +    xf_debug("norm_right: ", a); \
> +    /* \
> +     * OK, now normalize left \
> +     * We want to normalize left until we have a leading one in bit 24 \
> +     * Theoretically, we only need to shift a maximum of one to the left if we \
> +     * shifted out lots of bits from B, or if we had no shift / 1 shift sticky \
> +     * shoudl be 0  \
> +     */ \
> +    while ((a.mant.low & (1ULL << MANTBITS)) == 0) { \
> +        a = xf_norm_left(a); \
> +    } \
> +    xf_debug("norm_left: ", a); \
> +    /* \
> +     * OK, now we might need to denormalize because of potential underflow. \
> +     * We need to do this before rounding, and rounding might make us normal \
> +     * again \
> +     */ \
> +    while (a.exp <= 0) { \
> +        a = xf_norm_right(a, 1 - a.exp); \
> +        /* \
> +         * Do we have underflow? \
> +         * That's when we get an inexact answer because we ran out of bits \
> +         * in a denormal. \
> +         */ \
> +        if (a.guard || a.round || a.sticky) { \
> +            feraiseexcept(FE_UNDERFLOW); \
> +        } \
> +    } \
> +    xf_debug("norm_denorm: ", a); \
> +    /* OK, we're relatively canonical... now we need to round */ \
> +    if (a.guard || a.round || a.sticky) { \
> +        feraiseexcept(FE_INEXACT); \
> +        switch (fegetround()) { \
> +        case FE_TOWARDZERO: \
> +            /* Chop and we're done */ \
> +            break; \
> +        case FE_UPWARD: \
> +            if (a.sign == 0) { \
> +                a.mant.low += 1; \
> +            } \
> +            break; \
> +        case FE_DOWNWARD: \
> +            if (a.sign != 0) { \
> +                a.mant.low += 1; \
> +            } \
> +            break; \
> +        default: \
> +            if (a.round || a.sticky) { \
> +                /* round up if guard is 1, down if guard is zero */ \
> +                a.mant.low += a.guard; \
> +            } else if (a.guard) { \
> +                /* exactly .5, round up if odd */ \
> +                a.mant.low += (a.mant.low & 1); \
> +            } \
> +            break; \
> +        } \
> +    } \
> +    xf_debug("post_round: ", a); \
> +    /* \
> +     * OK, now we might have carried all the way up. \
> +     * So we might need to shr once \
> +     * at least we know that the lsb should be zero if we rounded and \
> +     * got a carry out... \
> +     */ \
> +    if ((a.mant.low >> (MANTBITS + 1)) != 0) { \
> +        a = xf_norm_right(a, 1); \
> +    } \
> +    xf_debug("once_norm_right: ", a); \
> +    /* Overflow? */ \
> +    if (a.exp >= INF_EXP) { \
> +        /* Yep, inf result */ \
> +        xf_debug("inf: ", a); \
> +        feraiseexcept(FE_OVERFLOW); \
> +        feraiseexcept(FE_INEXACT); \
> +        switch (fegetround()) { \
> +        case FE_TOWARDZERO: \
> +            return maxfinite_##TYPE(a); \
> +        case FE_UPWARD: \
> +            if (a.sign == 0) { \
> +                return infinite_##TYPE(a); \
> +            } else { \
> +                return maxfinite_##TYPE(a); \
> +            } \
> +        case FE_DOWNWARD: \
> +            if (a.sign != 0) { \
> +                return infinite_##TYPE(a); \
> +            } else { \
> +                return maxfinite_##TYPE(a); \
> +            } \
> +        default: \
> +            return infinite_##TYPE(a); \
> +        } \
> +    } \
> +    /* Underflow? */ \
> +    if (a.mant.low & (1ULL << MANTBITS)) { \
> +        /* Leading one means: No, we're normal. So, we should be done... */ \
> +        xf_debug("norm: ", a); \
> +        ret.x.exp = a.exp; \
> +        ret.x.mant = a.mant.low; \
> +        return ret; \
> +    } \
> +    xf_debug("denorm: ", a); \
> +    if (a.exp != 1) { \
> +        printf("a.exp == %d\n", a.exp); \
> +    } \
> +    assert(a.exp == 1); \
> +    ret.x.exp = 0; \
> +    ret.x.mant = a.mant.low; \
> +    return ret; \
> +}
> +
> +GEN_XF_ROUND(df_t, fDF_MANTBITS(), DF_INF_EXP)
> +GEN_XF_ROUND(sf_t, fSF_MANTBITS(), SF_INF_EXP)
> +
> +static inline double special_fma(df_t a, df_t b, df_t c)
> +{
> +    df_t ret;
> +    ret.i = 0;
> +
> +    /*
> +     * If A multiplied by B is an exact infinity and C is also an infinity
> +     * but with the opposite sign, FMA returns NaN and raises invalid.
> +     */
> +    if (fISINFPROD(a.f, b.f) && isinf(c.f)) {
> +        if ((a.x.sign ^ b.x.sign) != c.x.sign) {
> +            ret.i = fDFNANVAL();
> +            feraiseexcept(FE_INVALID);
> +            return ret.f;
> +        }
> +    }
> +    if ((isinf(a.f) && isz(b.f)) || (isz(a.f) && isinf(b.f))) {
> +        ret.i = fDFNANVAL();
> +        feraiseexcept(FE_INVALID);
> +        return ret.f;
> +    }
> +    /*
> +     * If none of the above checks are true and C is a NaN,
> +     * a NaN shall be returned
> +     * If A or B are NaN, a NAN shall be returned.
> +     */
> +    if (isnan(a.f) || isnan(b.f) || (isnan(c.f))) {
> +        if (isnan(a.f) && (fGETBIT(51, a.i) == 0)) {
> +            feraiseexcept(FE_INVALID);
> +        }
> +        if (isnan(b.f) && (fGETBIT(51, b.i) == 0)) {
> +            feraiseexcept(FE_INVALID);
> +        }
> +        if (isnan(c.f) && (fGETBIT(51, c.i) == 0)) {
> +            feraiseexcept(FE_INVALID);
> +        }
> +        ret.i = fDFNANVAL();
> +        return ret.f;
> +    }
> +    /*
> +     * We have checked for adding opposite-signed infinities.
> +     * Other infinities return infinity with the correct sign
> +     */
> +    if (isinf(c.f)) {
> +        ret.x.exp = DF_INF_EXP;
> +        ret.x.mant = 0;
> +        ret.x.sign = c.x.sign;
> +        return ret.f;
> +    }
> +    if (isinf(a.f) || isinf(b.f)) {
> +        ret.x.exp = DF_INF_EXP;
> +        ret.x.mant = 0;
> +        ret.x.sign = (a.x.sign ^ b.x.sign);
> +        return ret.f;
> +    }
> +    g_assert_not_reached();
> +    ret.x.exp = 0x123;
> +    ret.x.mant = 0xdead;
> +    return ret.f;
> +}
> +
> +static inline float special_fmaf(sf_t a, sf_t b, sf_t c)
> +{
> +    df_t aa, bb, cc;
> +    aa.f = a.f;
> +    bb.f = b.f;
> +    cc.f = c.f;
> +    return special_fma(aa, bb, cc);
> +}
> +
> +double internal_fmax(double a_in, double b_in, double c_in, int scale)
> +{
> +    df_t a, b, c;
> +    xf_t prod;
> +    xf_t acc;
> +    xf_t result;
> +    xf_init(&prod);
> +    xf_init(&acc);
> +    xf_init(&result);
> +    a.f = a_in;
> +    b.f = b_in;
> +    c.f = c_in;
> +
> +    debug_fprintf(stdout,
> +                  "internal_fmax: 0x%016llx * 0x%016llx + 0x%016llx sc: %d\n",
> +                  fUNDOUBLE(a_in), fUNDOUBLE(b_in), fUNDOUBLE(c_in),
> +                  scale);
> +
> +    if (isinf(a.f) || isinf(b.f) || isinf(c.f)) {
> +        return special_fma(a, b, c);
> +    }
> +    if (isnan(a.f) || isnan(b.f) || isnan(c.f)) {
> +        return special_fma(a, b, c);
> +    }
> +    if ((scale == 0) && (isz(a.f) || isz(b.f))) {
> +        return a.f * b.f + c.f;
> +    }
> +
> +    /* (a * 2**b) * (c * 2**d) == a*c * 2**(b+d) */
> +    prod.mant = int128_mul_6464(df_getmant(a), df_getmant(b));
> +
> +    /*
> +     * Note: extracting the mantissa into an int is multiplying by
> +     * 2**52, so adjust here
> +     */
> +    prod.exp = df_getexp(a) + df_getexp(b) - DF_BIAS - 52;
> +    prod.sign = a.x.sign ^ b.x.sign;
> +    xf_debug("prod: ", prod);
> +    if (!isz(c.f)) {
> +        acc.mant = int128_mul_6464(df_getmant(c), 1);
> +        acc.exp = df_getexp(c);
> +        acc.sign = c.x.sign;
> +        xf_debug("acc: ", acc);
> +        result = xf_add(prod, acc);
> +    } else {
> +        result = prod;
> +    }
> +    xf_debug("sum: ", result);
> +    debug_fprintf(stdout, "Scaling: %d\n", scale);
> +    result.exp += scale;
> +    xf_debug("post-scale: ", result);
> +    return xf_round_df_t(result).f;
> +}
> +
> +float internal_fmafx(float a_in, float b_in, float c_in, int scale)
> +{
> +    sf_t a, b, c;
> +    xf_t prod;
> +    xf_t acc;
> +    xf_t result;
> +    xf_init(&prod);
> +    xf_init(&acc);
> +    xf_init(&result);
> +    a.f = a_in;
> +    b.f = b_in;
> +    c.f = c_in;
> +
> +    debug_fprintf(stdout,
> +                  "internal_fmax: 0x%016x * 0x%016x + 0x%016x sc: %d\n",
> +                  fUNFLOAT(a_in), fUNFLOAT(b_in), fUNFLOAT(c_in), scale);
> +    if (isinf(a.f) || isinf(b.f) || isinf(c.f)) {
> +        return special_fmaf(a, b, c);
> +    }
> +    if (isnan(a.f) || isnan(b.f) || isnan(c.f)) {
> +        return special_fmaf(a, b, c);
> +    }
> +    if ((scale == 0) && (isz(a.f) || isz(b.f))) {
> +        return a.f * b.f + c.f;
> +    }
> +
> +    /* (a * 2**b) * (c * 2**d) == a*c * 2**(b+d) */
> +    prod.mant = int128_mul_6464(sf_getmant(a), sf_getmant(b));
> +
> +    /*
> +     * Note: extracting the mantissa into an int is multiplying by
> +     * 2**23, so adjust here
> +     */
> +    prod.exp = sf_getexp(a) + sf_getexp(b) - SF_BIAS - 23;
> +    prod.sign = a.x.sign ^ b.x.sign;
> +    if (isz(a.f) || isz(b.f)) {
> +        prod.exp = -2 * WAY_BIG_EXP;
> +    }
> +    xf_debug("prod: ", prod);
> +    if ((scale > 0) && (fpclassify(c.f) == FP_SUBNORMAL)) {
> +        acc.mant = int128_mul_6464(0, 0);
> +        acc.exp = -WAY_BIG_EXP;
> +        acc.sign = c.x.sign;
> +        acc.sticky = 1;
> +        xf_debug("special denorm acc: ", acc);
> +        result = xf_add(prod, acc);
> +    } else if (!isz(c.f)) {
> +        acc.mant = int128_mul_6464(sf_getmant(c), 1);
> +        acc.exp = sf_getexp(c);
> +        acc.sign = c.x.sign;
> +        xf_debug("acc: ", acc);
> +        result = xf_add(prod, acc);
> +    } else {
> +        result = prod;
> +    }
> +    xf_debug("sum: ", result);
> +    debug_fprintf(stdout, "Scaling: %d\n", scale);
> +    result.exp += scale;
> +    xf_debug("post-scale: ", result);
> +    return xf_round_sf_t(result).f;
> +}
> +
> +
> +float internal_fmaf(float a_in, float b_in, float c_in)
> +{
> +    return internal_fmafx(a_in, b_in, c_in, 0);
> +}
> +
> +double internal_fma(double a_in, double b_in, double c_in)
> +{
> +    return internal_fmax(a_in, b_in, c_in, 0);
> +}
> +
> +float internal_mpyf(float a_in, float b_in)
> +{
> +    if (isz(a_in) || isz(b_in)) {
> +        return a_in * b_in;
> +    }
> +    return internal_fmafx(a_in, b_in, 0.0, 0);
> +}
> +
> +double internal_mpy(double a_in, double b_in)
> +{
> +    if (isz(a_in) || isz(b_in)) {
> +        return a_in * b_in;
> +    }
> +    return internal_fmax(a_in, b_in, 0.0, 0);
> +}
> +
> +static inline double internal_mpyhh_special(double a, double b)
> +{
> +    return a * b;
> +}
> +
> +double internal_mpyhh(double a_in, double b_in,
> +                      unsigned long long int accumulated)
> +{
> +    df_t a, b;
> +    xf_t x;
> +    unsigned long long int prod;
> +    unsigned int sticky;
> +
> +    a.f = a_in;
> +    b.f = b_in;
> +    sticky = accumulated & 1;
> +    accumulated >>= 1;
> +    xf_init(&x);
> +    if (isz(a_in) || isnan(a_in) || isinf(a_in)) {
> +        return internal_mpyhh_special(a_in, b_in);
> +    }
> +    if (isz(b_in) || isnan(b_in) || isinf(b_in)) {
> +        return internal_mpyhh_special(a_in, b_in);
> +    }
> +    x.mant = int128_mul_6464(accumulated, 1);
> +    x.sticky = sticky;
> +    prod = fGETUWORD(1, df_getmant(a)) * fGETUWORD(1, df_getmant(b));
> +    x.mant = int128_add(x.mant, int128_mul_6464(prod, 0x100000000ULL));
> +    x.exp = df_getexp(a) + df_getexp(b) - DF_BIAS - 20;
> +    xf_debug("formed: ", x);
> +    if (!isnormal(a_in) || !isnormal(b_in)) {
> +        /* crush to inexact zero */
> +        x.sticky = 1;
> +        x.exp = -4096;
> +        xf_debug("crushing: ", x);
> +    }
> +    x.sign = a.x.sign ^ b.x.sign;
> +    xf_debug("with sign: ", x);
> +    return xf_round_df_t(x).f;
> +}
> +
> +float conv_df_to_sf(double in_f)
> +{
> +    xf_t x;
> +    df_t in;
> +    if (isz(in_f) || isnan(in_f) || isinf(in_f)) {
> +        return in_f;
> +    }
> +    xf_init(&x);
> +    in.f = in_f;
> +    x.mant = int128_mul_6464(df_getmant(in), 1);
> +    x.exp = df_getexp(in) - DF_BIAS + SF_BIAS - 52 + 23;
> +    x.sign = in.x.sign;
> +    xf_debug("conv to sf: x: ", x);
> +    return xf_round_sf_t(x).f;
> +}
> +
> diff --git a/target/hexagon/fma_emu.h b/target/hexagon/fma_emu.h
> new file mode 100644
> index 0000000..e38cad9
> --- /dev/null
> +++ b/target/hexagon/fma_emu.h
> @@ -0,0 +1,30 @@
> +/*
> + *  Copyright (c) 2019 Qualcomm Innovation Center, Inc. All Rights Reserved.
> + *
> + *  This program is free software; you can redistribute it and/or modify
> + *  it under the terms of the GNU General Public License as published by
> + *  the Free Software Foundation; either version 2 of the License, or
> + *  (at your option) any later version.
> + *
> + *  This program is distributed in the hope that it will be useful,
> + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
> + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + *  GNU General Public License for more details.
> + *
> + *  You should have received a copy of the GNU General Public License
> + *  along with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef FMA_EMU_H
> +#define FMA_EMU_H
> +
> +extern float internal_fmafx(float a_in, float b_in, float c_in, int scale);
> +extern float internal_fmaf(float a_in, float b_in, float c_in);
> +extern double internal_fma(double a_in, double b_in, double c_in);
> +extern double internal_fmax(double a_in, double b_in, double c_in, int scale);
> +extern float internal_mpyf(float a_in, float b_in);
> +extern double internal_mpy(double a_in, double b_in);
> +extern double internal_mpyhh(double a_in, double b_in,
> +                             unsigned long long int accumulated);
> +
> +#endif
>

next prev parent reply	other threads:[~2020-02-11  7:30 UTC|newest]

Thread overview: 94+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-11  0:39 [RFC PATCH 00/66] Hexagon patch series Taylor Simpson
2020-02-11  0:39 ` [RFC PATCH 01/66] Hexagon Maintainers Taylor Simpson
2020-02-11  0:39 ` [RFC PATCH 02/66] Hexagon ELF Machine Definition Taylor Simpson
2020-02-11  7:16   ` Philippe Mathieu-Daudé
2020-02-11  0:39 ` [RFC PATCH 03/66] Hexagon CPU Scalar Core Definition Taylor Simpson
2020-02-11  0:39 ` [RFC PATCH 04/66] Hexagon register names Taylor Simpson
2020-02-11  7:18   ` Philippe Mathieu-Daudé
2020-02-11  0:39 ` [RFC PATCH 05/66] Hexagon Disassembler Taylor Simpson
2020-02-11  7:20   ` Philippe Mathieu-Daudé
2020-02-11  0:39 ` [RFC PATCH 06/66] Hexagon CPU Scalar Core Helpers Taylor Simpson
2020-02-11  0:39 ` [RFC PATCH 07/66] Hexagon GDB Stub Taylor Simpson
2020-02-11  0:39 ` [RFC PATCH 08/66] Hexagon instruction and packet types Taylor Simpson
2020-02-11  0:39 ` [RFC PATCH 09/66] Hexagon architecture types Taylor Simpson
2020-02-11  7:23   ` Philippe Mathieu-Daudé
2020-02-11  0:39 ` [RFC PATCH 10/66] Hexagon register fields Taylor Simpson
2020-02-11 15:29   ` Philippe Mathieu-Daudé
2020-02-11  0:39 ` [RFC PATCH 11/66] Hexagon instruction attributes Taylor Simpson
2020-02-11  0:39 ` [RFC PATCH 12/66] Hexagon register map Taylor Simpson
2020-02-11  7:26   ` Philippe Mathieu-Daudé
2020-02-11  0:39 ` [RFC PATCH 13/66] Hexagon instruction/packet decode Taylor Simpson
2020-02-11  0:39 ` [RFC PATCH 14/66] Hexagon instruction printing Taylor Simpson
2020-02-11  0:39 ` [RFC PATCH 15/66] Hexagon arch import - instruction semantics definitions Taylor Simpson
2020-02-11  0:39 ` [RFC PATCH 16/66] Hexagon arch import - macro definitions Taylor Simpson
2020-02-11  0:39 ` [RFC PATCH 17/66] Hexagon arch import - instruction encoding Taylor Simpson
2020-02-11  0:39 ` [RFC PATCH 18/66] Hexagon instruction class definitions Taylor Simpson
2020-02-11  0:39 ` [RFC PATCH 19/66] Hexagon instruction utility functions Taylor Simpson
2020-02-11  7:29   ` Philippe Mathieu-Daudé [this message]
2020-02-11  0:39 ` [RFC PATCH 20/66] Hexagon generator phase 1 - C preprocessor for semantics Taylor Simpson
2020-02-11  7:30   ` Philippe Mathieu-Daudé
2020-02-11  0:39 ` [RFC PATCH 21/66] Hexagon generator phase 2 - qemu_def_generated.h Taylor Simpson
2020-02-11  7:33   ` Philippe Mathieu-Daudé
2020-02-11  0:40 ` [RFC PATCH 22/66] Hexagon generator phase 2 - qemu_wrap_generated.h Taylor Simpson
2020-02-11  0:40 ` [RFC PATCH 23/66] Hexagon generator phase 2 - opcodes_def_generated.h Taylor Simpson
2020-02-11  0:40 ` [RFC PATCH 24/66] Hexagon generator phase 2 - op_attribs_generated.h Taylor Simpson
2020-02-11  8:01   ` Philippe Mathieu-Daudé
2020-02-11  0:40 ` [RFC PATCH 25/66] Hexagon generator phase 2 - op_regs_generated.h Taylor Simpson
2020-02-11  0:40 ` [RFC PATCH 26/66] Hexagon generator phase 2 - printinsn-generated.h Taylor Simpson
2020-02-11  0:40 ` [RFC PATCH 27/66] Hexagon generator phase 3 - C preprocessor for decode tree Taylor Simpson
2020-02-11  7:35   ` Philippe Mathieu-Daudé
2020-02-11  0:40 ` [RFC PATCH 28/66] Hexagon generater phase 4 - Decode tree Taylor Simpson
2020-02-11  7:37   ` Philippe Mathieu-Daudé
2020-02-11  8:03     ` Philippe Mathieu-Daudé
2020-02-11  0:40 ` [RFC PATCH 29/66] Hexagon opcode data structures Taylor Simpson
2020-02-11  7:40   ` Philippe Mathieu-Daudé
2020-02-12 17:36     ` Taylor Simpson
2020-02-11  0:40 ` [RFC PATCH 30/66] Hexagon macros to interface with the generator Taylor Simpson
2020-02-11  0:40 ` [RFC PATCH 31/66] Hexagon macros referenced in instruction semantics Taylor Simpson
2020-02-11  0:40 ` [RFC PATCH 32/66] Hexagon instruction classes Taylor Simpson
2020-02-11  0:40 ` [RFC PATCH 33/66] Hexagon TCG generation helpers - step 1 Taylor Simpson
2020-02-11 15:22   ` Philippe Mathieu-Daudé
2020-02-11  0:40 ` [RFC PATCH 34/66] Hexagon TCG generation helpers - step 2 Taylor Simpson
2020-02-11  0:40 ` [RFC PATCH 35/66] Hexagon TCG generation helpers - step 3 Taylor Simpson
2020-02-11  0:40 ` [RFC PATCH 36/66] Hexagon TCG generation helpers - step 4 Taylor Simpson
2020-02-11  0:40 ` [RFC PATCH 37/66] Hexagon TCG generation helpers - step 5 Taylor Simpson
2020-02-11  0:40 ` [RFC PATCH 38/66] Hexagon TCG generation - step 01 Taylor Simpson
2020-02-11  0:40 ` [RFC PATCH 39/66] Hexagon TCG generation - step 02 Taylor Simpson
2020-02-11  0:40 ` [RFC PATCH 40/66] Hexagon TCG generation - step 03 Taylor Simpson
2020-02-11  0:40 ` [RFC PATCH 41/66] Hexagon TCG generation - step 04 Taylor Simpson
2020-02-11  0:40 ` [RFC PATCH 42/66] Hexagon TCG generation - step 05 Taylor Simpson
2020-02-11  0:40 ` [RFC PATCH 43/66] Hexagon TCG generation - step 06 Taylor Simpson
2020-02-11  0:40 ` [RFC PATCH 44/66] Hexagon TCG generation - step 07 Taylor Simpson
2020-02-11  0:40 ` [RFC PATCH 45/66] Hexagon TCG generation - step 08 Taylor Simpson
2020-02-11  0:40 ` [RFC PATCH 46/66] Hexagon TCG generation - step 09 Taylor Simpson
2020-02-11  0:40 ` [RFC PATCH 47/66] Hexagon TCG generation - step 10 Taylor Simpson
2020-02-11  0:40 ` [RFC PATCH 48/66] Hexagon TCG generation - step 11 Taylor Simpson
2020-02-11  0:40 ` [RFC PATCH 49/66] Hexagon TCG generation - step 12 Taylor Simpson
2020-02-11  0:40 ` [RFC PATCH 50/66] Hexagon translation Taylor Simpson
2020-02-11  0:40 ` [RFC PATCH 51/66] Hexagon Linux user emulation Taylor Simpson
2020-02-11  0:40 ` [RFC PATCH 52/66] Hexagon build infrastructure Taylor Simpson
2020-02-11  7:15   ` Philippe Mathieu-Daudé
2020-02-11  0:40 ` [RFC PATCH 53/66] Hexagon - Add Hexagon Vector eXtensions (HVX) to core definition Taylor Simpson
2020-02-11  0:40 ` [RFC PATCH 54/66] Hexagon HVX support in gdbstub Taylor Simpson
2020-02-11  0:40 ` [RFC PATCH 55/66] Hexagon HVX import instruction encodings Taylor Simpson
2020-02-11  7:02   ` Philippe Mathieu-Daudé
2020-02-11 14:35     ` Taylor Simpson
2020-02-11 14:40       ` Philippe Mathieu-Daudé
2020-02-11 14:43         ` Philippe Mathieu-Daudé
2020-02-11  0:40 ` [RFC PATCH 56/66] Hexagon HVX import semantics Taylor Simpson
2020-02-11  0:40 ` [RFC PATCH 57/66] Hexagon HVX import macro definitions Taylor Simpson
2020-02-11  0:40 ` [RFC PATCH 58/66] Hexagon HVX semantics generator Taylor Simpson
2020-02-11  0:40 ` [RFC PATCH 59/66] Hexagon HVX instruction decoding Taylor Simpson
2020-02-11  0:40 ` [RFC PATCH 60/66] Hexagon HVX instruction utility functions Taylor Simpson
2020-02-11  7:46   ` Philippe Mathieu-Daudé
2020-02-11  0:40 ` [RFC PATCH 61/66] Hexagon HVX macros to interface with the generator Taylor Simpson
2020-02-11  0:40 ` [RFC PATCH 62/66] Hexagon HVX macros referenced in instruction semantics Taylor Simpson
2020-02-11  0:40 ` [RFC PATCH 63/66] Hexagon HVX helper to commit vector stores (masked and scatter/gather) Taylor Simpson
2020-02-11  0:40 ` [RFC PATCH 64/66] Hexagon HVX TCG generation Taylor Simpson
2020-02-11  0:40 ` [RFC PATCH 65/66] Hexagon HVX translation Taylor Simpson
2020-02-11  0:40 ` [RFC PATCH 66/66] Hexagon HVX build infrastructure Taylor Simpson
2020-02-11  1:31 ` [RFC PATCH 00/66] Hexagon patch series no-reply
2020-02-11  7:49   ` Philippe Mathieu-Daudé
2020-02-11  7:53 ` Philippe Mathieu-Daudé
2020-02-11 15:32 ` Philippe Mathieu-Daudé
2020-02-26 16:13   ` Taylor Simpson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9831c7d0-a954-6c73-9729-a3eb7041b979@redhat.com \
    --to=philmd@redhat.com \
    --cc=aleksandar.m.mail@gmail.com \
    --cc=laurent@vivier.eu \
    --cc=qemu-devel@nongnu.org \
    --cc=richard.henderson@linaro.org \
    --cc=riku.voipio@iki.fi \
    --cc=tsimpson@quicinc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).