From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1MzvrG-00024J-J3
	for qemu-devel@nongnu.org; Mon, 19 Oct 2009 13:17:22 -0400
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1MzvrC-00021X-0i
	for qemu-devel@nongnu.org; Mon, 19 Oct 2009 13:17:21 -0400
Received: from [199.232.76.173] (port=43294 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1MzvrB-00021U-PE
	for qemu-devel@nongnu.org; Mon, 19 Oct 2009 13:17:17 -0400
Received: from cantor2.suse.de ([195.135.220.15]:49892 helo=mx2.suse.de)
	by monty-python.gnu.org with esmtp (Exim 4.60)
	(envelope-from <uli@suse.de>) id 1MzvrB-0000HI-00
	for qemu-devel@nongnu.org; Mon, 19 Oct 2009 13:17:17 -0400
From: Ulrich Hecht <uli@suse.de>
Subject: Re: [Qemu-devel] [PATCH 2/9] S/390 CPU emulation
Date: Mon, 19 Oct 2009 19:17:18 +0200
References: <1255696735-21396-1-git-send-email-uli@suse.de>
	<1255696735-21396-3-git-send-email-uli@suse.de>
	<20091017104256.GA27224@hall.aurel32.net>
In-Reply-To: <20091017104256.GA27224@hall.aurel32.net>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-15"
Content-Disposition: inline
Message-Id: <200910191917.18972.uli@suse.de>
Content-Transfer-Encoding: quoted-printable
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Aurelien Jarno <aurelien@aurel32.net>
Cc: riku.voipio@iki.fi, qemu-devel@nongnu.org, agraf@suse.de

On Saturday 17 October 2009, Aurelien Jarno wrote:
> On Fri, Oct 16, 2009 at 02:38:48PM +0200, Ulrich Hecht wrote:
> First of all a few general comments. Note that I know very few things
> about S390/S390X, so I may have dumb comments/questions. Also as the
> patch is very long, I probably have missed things, we should probably
> iterate with new versions of the patches.
>
> Is it possible given the current implementation to emulate S390 in
> addition to S390X?

Not as it stands. The S/390 instruction set is a subset of zArch=20
(64-bit), but currently only 64-bit addressing is implemented (because=20
that is the only mode used by 64-bit Linux binaries).

> If yes how different would be a S390 only target?=20

It would need support for 31-bit addressing. 24-bit, too, if you want to=20
run the really old stuff.

> If they won't be too different, it probably worth using _tl type
> registers and call it target-s390. A bit the way it is done with
> i386/x86_64, ppc/ppc64, mips/mips64 or sparc/sparc64.

If it's going to be implemented, it will certainly share most code with=20
the 64-bit target, so having a common directory would most likely be a=20
good idea.

There are some cases where _tl types might be appropriate, but generally=20
everything that is shared between S/390 and zArch is 32-bit, and=20
instructions present in both architectures also have the same operand=20
size: L, for instance, is a 32-bit load to a 32-bit register on both=20
architectures.

> Secondly there seems to be a lot of mix between 32-bit and 64-bit
> TCG registers. They should not be mixed. _i32 ops apply to TCGv_i32
> variables, _i64 ops apply to TCGv_i64 variables, and _tl ops to TGCv
> variables. TCGv/_tl types can map either to _i32 or _i64 depending on
> your target. You should try to build the target with
> --enable-debug-tcg

Er, reading the code and observing the behavior of the (AMD64) code=20
generator, it seemed to me that neither frontends nor backends care one=20
bit about that. I'll look into it, though. (I won't comment on those=20
below, except in cases where the solution or, indeed, the problem is not=20
clear to me.)

> It would be nice to split all the disassembling part in a separate
> combined with the related makefile changes. This way it can be applied
> separately.

Can do.

> >  //  switch (info->mach)
> >  //    {
> >  //    case bfd_mach_s390_31:
> > -      current_arch_mask =3D 1 << S390_OPCODE_ESA;
> > +//      current_arch_mask =3D 1 << S390_OPCODE_ESA;
> >  //      break;
> >  //    case bfd_mach_s390_64:
> > -//      current_arch_mask =3D 1 << S390_OPCODE_ZARCH;
> > +      current_arch_mask =3D 1 << S390_OPCODE_ZARCH;
> >  //      break;
> >  //    default:
> >  //      abort ();
>
> While I understand the second part, Why is it necessary to comment the
> bfd_mach_s390_31 case?

It's not necessary, but current_arch_mask can only hold one value...

> > +typedef union FPReg {
> > +    struct {
> > +#ifdef WORDS_BIGENDIAN
> > +        float32 e;
> > +        int32_t __pad;
> > +#else
> > +        int32_t __pad;
> > +        float32 e;
> > +#endif
> > +    };
> > +    float64 d;
> > +    uint64_t i;
> > +} FPReg;
>
> WORDS_BIGENDIAN is wrong here. It should probably be
> HOST_WORDS_BIGENDIAN.

Indeed.

> Also it may be a better idea to=20
> reuse CPU_FloatU and CPU_DoubleU here.

Yes. Didn't know about those.

> > +CPUS390XState *cpu_s390x_init(const char *cpu_model)
>
> This function would probably be better in translate.c, so that
> s390x_translate_init() can be declared static.

I'll check that.

> Please use /* */ for comments (See CODING_STYLE).

OK.

> > +    /* FIXME: reset vector? */
>
> Yes, reset vector, MMU mode, default register values, ...

Works perfectly fine without. ;)

Also, the question remains what kind of reset cpu_reset() is supposed to=20
be. For instance, a CPU Reset or Initial CPU Reset on S/390 leaves the=20
general-purpose registers unchanged, a Clear Reset or Power-On Reset=20
sets them to zero. See the table on p. 4-50 in the zArchitecture=20
Principles of Operation (POP),=20
http://publibfp.boulder.ibm.com/cgi-bin/bookmgr/download/A2278324.pdf,=20
for the gory details.

> > +//#define DEBUG_HELPER
> > +#ifdef DEBUG_HELPER
> > +#define HELPER_LOG(x...) qemu_log(x)
> > +#else
> > +#define HELPER_LOG(x...)
> > +#endif
>
> Small comment for the rest of the file. While I understand HELPER_LOG
> is very handy while developing, I think some calls to it can be
> dropped in really simple functions.

Yes, a bit of cleanup will be in order.

> > +    for (i =3D 0; i <=3D l; i++) {
> > +        x =3D ldub(dest + i) & ldub(src + i);
> > +        if (x) cc =3D 1;
>
> coding style

Maybe you are referring to "Every indented statement is braced; even if=20
the block contains just one statement.", but I figured that that does=20
not apply here because there is no indentation...

> > +    if (v < 0) return 1;
> > +    else if (v > 0) return 2;
>
> coding style

Same here.

> The last 6 helpers can probably be coded easily in TCG (they are very
> similar to some PowerPC code), though merging this patch with the
> helpers is not a problem at all.

Again, that would be detrimental to performance. I have experimented with=
=20
that and found that TCG is well capable of optimizing out a pure helper=20
call, but not a complex piece of code with branches. Condition codes are=20
not used most times, and the overhead of calling a helper in the cases=20
they are is far smaller than doing all the condition code stuff that is=20
not used. Also reduces code size.

> All the branch part would really gain to be coded in TCG, as it will
> allow TB chaining.

I vaguely remember trying that as well, but I don't know if I gave up on=20
it because it was slower, or because I couldn't get it to work...

> > +            }
> > +            else if (r > d) {
>
> coding style

OK, this one is valid. :)

> __uint128_t is probably not supported on all hosts/GCC versions.

Definitely does not work on 32-bit hosts.

> mulu64() should be used instead.

Excellent, didn't know about that either.

> Same here, __uint128_t should not be used, though I don't know what
> should be used instead.

Hmmmm...

> > +/* unsigned string compare (c is string terminator) */
> > +uint32_t HELPER(clst)(uint32_t c, uint32_t r1, uint32_t r2)
> > +{
> > +    uint64_t s1 =3D env->regs[r1];
> > +    uint64_t s2 =3D env->regs[r2];
> > +    uint8_t v1, v2;
> > +    uint32_t cc;
> > +    c =3D c & 0xff;
> > +#ifdef CONFIG_USER_ONLY
> > +    if (!c) {
> > +        HELPER_LOG("%s: comparing '%s' and '%s'\n",
> > +                   __FUNCTION__, (char*)s1, (char*)s2);
> > +    }
> > +#endif
>
> Why CONFIG_USER_ONLY ?

s1 and s2 are target addresses and not valid on the host system in the=20
SOFTMMU case, so this would segfault.

> > +/* union used for splitting/joining 128-bit floats to/from 64-bit
> > FP regs */ +typedef union {
> > +    struct {
> > +#ifdef WORDS_BIGENDIAN
> > +        uint64_t h;
> > +        uint64_t l;
> > +#else
> > +        uint64_t l;
> > +        uint64_t h;
> > +#endif
> > +    };
> > +    float128 x;
> > +} FP128;
>
> WORDS_BIGENDIAN is wrong here. CPU_QuadU can probably be used instead.

Agree.

> > +    if (float32_is_neg(v2)) {
> > +        v1 =3D float32_abs(v2);
> > +    }
> > +    else {
> > +        v1 =3D v2;
> > +    }
>
> I don't see the point of such a test here.

Now that you mention it, I don't really see it either.

> > +    v2.i =3D ldl(a2);
>
> The value should be passed directly instead of loaded here, as ldl is
> is wrong depending on the MMU mode.

Yup.

> > +#ifdef WORDS_BIGENDIAN
>
> This is wrong. It should probably be HOST_WORDS_BIGENDIAN.

I concur.

> > +static TCGv load_reg(int reg)
> > +{
> > +    TCGv r =3D tcg_temp_new_i64();
> > +#ifdef TCGREGS
> > +    sync_reg32(reg);
> > +    tcg_gen_mov_i64(r, tcgregs[reg]);
> > +    return r;
> > +#else
> > +    tcg_gen_ld_i64(r, cpu_env, offsetof(CPUState, regs[reg]));
> > +    return r;
> > +#endif
> > +}
>
> I don't really like implicit TCGv temp allocation. In other targets it
> often has caused missing tcg_temp_free().

I did in fact run into problems with this and was thus forced to make=20
sure that no tcg_temp_free()s are missing... :)

> It should probably be=20
> rewritten as load_reg(TCGv t, int reg).

Later. This is the kind of change that you never get right on the first=20
try, so I'd like to do that after everything went upstream.

> For all register load/store, as already explained in the TCG sync op
> patch, I am in favor of using the ld/st version (TCGREGS not defined).

It doesn't perform.

> > +static void gen_illegal_opcode(DisasContext *s)
> > +{
> > +    TCGv tmp =3D tcg_temp_new_i64();
> > +    tcg_gen_movi_i64(tmp, 42);
>
> tcg_const_i64 could be used instead.

Yes. Using the actual value for a specification exception instead of 42=20
might be a good idea as well. :)

>
> > +    gen_helper_exception(tmp);
>
> Missing tcg_temp_free_i64(tmp);

OK, so I missed _that_ one. Fine. Be like that if you want to. Illegal=20
instructions are not that frequent anyway...

> > +    gen_helper_cmp_s32(cc, v1, tcg_const_i32(v2));
>
> The TCG passed to the helper should be freed.

And that one...

> > +    gen_helper_cmp_u32(cc, v1, tcg_const_i32(v2));
>
> Same.

And that one... :(

> > +    gen_helper_cmp_s64(cc, v1, tcg_const_i64(v2));
>
> Same

And that one... :((

> > +    gen_helper_cmp_u64(cc, v1, tcg_const_i64(v2));
>
> Same

And that one... :(((

> The branches should be handled using brcond and goto_tb and exit_tb
> and not with helpers, to allow TB chaining.

I'll look into that, but I guess it's not entirely trivial.

> > +    TCGv tmp =3D 0, tmp2 =3D 0, tmp3 =3D 0;
>
> This is wrong. 0 maps to global 0 (env in your case). -1 should be
> used instead, or even better the TCGV_UNUSED macro.

OK.

> > +    case 0x12: /* LT R1,D2(X2,B2) [RXY] */
> > +        tmp2 =3D tcg_temp_new_i32();
>
> Wrong TCGv type.
>
> > +        tcg_gen_qemu_ld32s(tmp2, tmp, 1);
>
> tcg_gen_qemu_ld32s loads a 32 bit value in the default size register,
> that is 64-bit here.

Actually, this a purely 32-bit insn, so it should be tcg_gen_qemu_ld32()=20
in the first place.

> > +            tcg_gen_qemu_ld32s(tmp2, tmp, 1);
> > +            tcg_gen_ext32s_i64(tmp2, tmp2);
>
> The sign extension is already done by qemu_ld32s

Just to be on the safe side. :)

> > +        switch (op) {
> > +        case 0x8: case 0x18: gen_set_cc_add64(tmp, tmp2, tmp3);
> > break; +        case 0xa: case 0x1a: gen_helper_set_cc_addu64(cc,
> > tmp, tmp2, tmp3); break; +        default: tcg_abort();
>
> tcg_abort() is wrong here, it is for internal TCG use. Also all the
> real CPU it probably launch an illegal instruction exception, it does
> not power off the machin.

Actually, if default is reached here there's an error in the translator=20
(it should only get here in the explicit cases), so I'd consider=20
tcg_abort() appropriate.

> > +    case 0x1e: /* LRV R1,D2(X2,B2) [RXY] */
> > +        tmp2 =3D tcg_temp_new_i32();
>
> Wrong TCGv type;
>
> > +        tcg_gen_qemu_ld32u(tmp2, tmp, 1);
> > +        tcg_gen_bswap32_i32(tmp2, tmp2);
>
> Wrong TCGv type;
>
> > +        store_reg(r1, tmp2);

This is more wrong than you think, because this is a 32-bit insn, so it=20
shouldn't overwrite the top 32 bits. Rarely used, I guess that's how it=20
could slip through.

> > +    case 0x3e: /* STRV R1,D2(X2,B2) [RXY] */
> > +        tmp2 =3D load_reg32(r1);
> > +        tcg_gen_bswap32_i32(tmp2, tmp2);
> > +        tcg_gen_qemu_st32(tmp2, tmp, 1);
>
> Wrong TCGv type.

What's wrong here? Everything's 32-bit.

> > +    case 0x57: /* XY R1,D2(X2,B2) [RXY] */
> > +        tmp2 =3D load_reg32(r1);
> > +        tmp3 =3D tcg_temp_new_i32();
> > +        tcg_gen_qemu_ld32u(tmp3, tmp, 1);
> > +        tcg_gen_xor_i32(tmp, tmp2, tmp3);
> > +        store_reg32(r1, tmp);
> > +        set_cc_nz_u32(tmp);
>
> Wrong TCGv type.

No need to zero-extend anyway, it's a 32-bit insn.

> > +    case 0x76: /* LB R1,D2(X2,B2) [RXY] */
> > +    case 0x77: /* LGB R1,D2(X2,B2) [RXY] */
> > +        tmp2 =3D tcg_temp_new_i64();
> > +        tcg_gen_qemu_ld8s(tmp2, tmp, 1);
> > +        switch (op) {
> > +        case 0x76:
> > +            tcg_gen_ext8s_i32(tmp2, tmp2);
>
> Wrong TCGv type.
>
> > +            store_reg32(r1, tmp2);
> > +            break;
> > +        case 0x77:
> > +            tcg_gen_ext8s_i64(tmp2, tmp2);
>
> Wrong TCGv type.

The way I see it, there are only 32-bit and 64-bit types, so it can only=20
be wrong in one case, right?

> > +    case 0x78: /* LHY R1,D2(X2,B2) [RXY] */
> > +        tmp2 =3D tcg_temp_new_i32();
>
> Wrong TCGv type.
>
> > +        tcg_gen_qemu_ld16s(tmp2, tmp, 1);
> > +        tcg_gen_ext16s_i32(tmp2, tmp2);
> > +        store_reg32(r1, tmp2);
> > +        break;

Replacing that tcg_gen_qemu_ld16s() with tcg_gen_qemu_ld16() should do=20
it, right?

> > +    case 0x86: /* MLG      R1,D2(X2,B2)     [RXY] */
> > +        tmp2 =3D tcg_temp_new_i64();
> > +        tcg_gen_qemu_ld64(tmp2, tmp, 1);
> > +        tmp =3D tcg_const_i32(r1);
>
> Wrong TCGv type.
>
> > +        gen_helper_mlg(tmp, tmp2);

Huh? It's a register number. There are only 16 GP registers. Fits a=20
32-bit value, last time I checked. The helper also expects a 32-bit=20
value.

> > +    if (tmp2) tcg_temp_free(tmp2);
> > +    if (tmp3) tcg_temp_free(tmp3);
>
> Comparison on TCGv type is not allowed.

Not even to TCGV_UNUSED?

> > +static void disas_ed(DisasContext *s, int op, int r1, int x2, int
> > b2, int d2, int r1b) +{
> > +    TCGv tmp, tmp2, tmp3 =3D 0;
>
> tmp should be declared as TGV_i32 here, so that the types are correct
> for the whole function. Also tmp3 should not be initialized to 0.

OK.

> > +        gen_helper_madb(tmp3, tmp2, tmp);
>
> tmp3 should be freed after the helper.

Yup.

> > +    LOG_DISAS("disas_c0: op 0x%x r1 %d i2 %d\n", op, r1, i2);
> > +    uint64_t target =3D s->pc + i2 * 2;
> > +    /* FIXME: huh? */ target &=3D 0xffffffff;
>
> That should be fixed before a merge.

Tricky, because I have not yet understood why it is necessary. According=20
to the POP, the address calculation in 64-bit addressing mode should be=20
64 bits. In practice, on a real machine running a real Linux kernel, it=20
wraps around at 32 bits, and userland code actually relies on that. No=20
clue what I could be missing here...

> > +static inline uint64_t ld_code2(uint64_t pc) { return
> > (uint64_t)lduw_code(pc); }
>
> Coding style.

OK.

> > +#if 1
> > +    cc =3D tcg_temp_local_new_i32();
> > +    tcg_gen_mov_i32(cc, global_cc);
> > +#else
> > +    cc =3D global_cc;
> > +#endif
>
> Why this?

Because it didn't work otherwise. I don't claim to understand the=20
mysterious ways of the TCG to its full extent...

> > +    } while (!dc.is_jmp && gen_opc_ptr < gen_opc_end && dc.pc <
> > next_page_start && num_insns < max_insns);
>
> The also translation should be stopped if in singlestep mode
> (singlestep variable).

OK.

Phew, that was a big one. :)

Thanks for reviewing. I'll fix what makes sense to me and send a new one=20
when I'm done.

CU
Uli

--=20
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG N=FCrnberg)