From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:39439)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <rth7680@gmail.com>) id 1ZMdli-0000ps-Qa
	for qemu-devel@nongnu.org; Tue, 04 Aug 2015 11:04:48 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <rth7680@gmail.com>) id 1ZMdld-0008EL-3x
	for qemu-devel@nongnu.org; Tue, 04 Aug 2015 11:04:42 -0400
Received: from mail-qk0-x22a.google.com ([2607:f8b0:400d:c09::22a]:33416)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <rth7680@gmail.com>) id 1ZMdlc-0008E9-W1
	for qemu-devel@nongnu.org; Tue, 04 Aug 2015 11:04:37 -0400
Received: by qkdg63 with SMTP id g63so4227255qkd.0
	for <qemu-devel@nongnu.org>; Tue, 04 Aug 2015 08:04:36 -0700 (PDT)
Sender: Richard Henderson <rth7680@gmail.com>
References: <BLU437-SMTP912C2CDD0DC31586DBF95EB9890@phx.gbl>
	<55BF9975.7020002@twiddle.net>
	<BLU437-SMTP798F0610B954E92B7BCF1B9770@phx.gbl>
	<BLU436-SMTP413636C802F1FB97281B3AB9760@phx.gbl>
From: Richard Henderson <rth@twiddle.net>
Message-ID: <55C0D480.2070103@twiddle.net>
Date: Tue, 4 Aug 2015 08:04:32 -0700
MIME-Version: 1.0
In-Reply-To: <BLU436-SMTP413636C802F1FB97281B3AB9760@phx.gbl>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [Consult] tilegx: About floating point instructions
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Chen Gang <xili_gchen_5257@hotmail.com>, Chris Metcalf <cmetcalf@ezchip.com>, Peter Maydell <peter.maydell@linaro.org>, =?UTF-8?Q?Andreas_F=c3=a4rber?= <afaerber@suse.de>, "walt@tilera.com" <walt@tilera.com>
Cc: qemu-devel <qemu-devel@nongnu.org>

On 08/04/2015 06:56 AM, Chen Gang wrote:
> 
> On 8/4/15 04:47, Chen Gang wrote:
>> On 8/4/15 00:40, Richard Henderson wrote:
>>> On 08/01/2015 02:47 AM, Chen Gang wrote:
>>>> I am adding floating point instructions (e.g. fsingle_add1), but
>>>> I cannot find any details about them (the ISA documents give
>>>> only a summary description, not details), e.g.
>>>
>>> The tilegx splits the four/six cycle arithmetic into multiple
>>> black-box instructions.  You need only really implement one of the
>>> four, with the rest of them being implemented as nops or moves.
>>>
>>> Looking at what gcc produces gives the hints:
>>>
>>> fdouble_unpack_min	min, srca, srcb
>>> fdouble_unpack_max	max, srca, srcb
>>> fdouble_add_flags	flg, srca, srcb
>>> fdouble_addsub		max, min, flg
>>> fdouble_pack1		dst, max, flg
>>> fdouble_pack2		dst, max, zero
>>>
>>> The unpack, addsub, and pack2 insns can be ignored; the add_flags
>>> insn can perform the whole operation; and the pack1 insn performs a
>>> move from "flg" to "dst".
>>>
>>> Similarly for the single-precision:
>>>
>>> fsingle_add1		tmp, srca, srcb
>>> fsingle_addsub2		tmp, srca, srcb
>>> fsingle_pack1		flg, tmp
>>> fsingle_pack2		dst, tmp, flg
>>>
>>> The add1 insn performs the whole operation, the addsub2 and pack1
>>> insns are ignored, and the pack2 insn is a move from tmp to dst.
>>>
> 
> After checking tilegx.md completely, it seems to me that we still need to
> implement each of them precisely, or we cannot emulate all cases (e.g. muldf3).

No, you can still implement all of muldf3 in fdouble_mul_flags.
Again, the fdouble_pack1 copies from the flag input to the output.

Yes, there is a 64-bit multiply in there, but the tcg optimizer
should be able to delete all of that as unused, especially if you have
the fdouble_unpack* insns store zero into their destinations.
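
As a rough sketch of the idea, here is the add sequence quoted above
modelled in standalone C.  This is not QEMU/TCG code: the register
array, the emu_* helpers and run_add_sequence are all made-up names
for illustration, and it assumes the host "double" is IEEE binary64.
A real translator would presumably go through softfloat rather than
host arithmetic.  The multiply case would look the same, with the
whole product computed in the fdouble_mul_flags step.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical guest register file, modelled as a plain array. */
static uint64_t regs[64];

/* Reinterpret a 64-bit register value as a host double. */
static double bits_to_double(uint64_t b)
{
    double d;
    assert(sizeof d == sizeof b);   /* assumption: 64-bit host double */
    memcpy(&d, &b, sizeof d);
    return d;
}

/* Reinterpret a host double as a 64-bit register value. */
static uint64_t double_to_bits(double d)
{
    uint64_t b;
    memcpy(&b, &d, sizeof b);
    return b;
}

/* fdouble_unpack_min / fdouble_unpack_max: the result is never really
 * consumed in this scheme, so just store zero into the destination. */
static void emu_fdouble_unpack(int dst)
{
    regs[dst] = 0;
}

/* fdouble_add_flags: perform the entire addition here and leave the
 * finished result in the "flg" register. */
static void emu_fdouble_add_flags(int flg, int srca, int srcb)
{
    double sum = bits_to_double(regs[srca]) + bits_to_double(regs[srcb]);
    regs[flg] = double_to_bits(sum);
}

/* fdouble_pack1: a plain move from the flag input to the output. */
static void emu_fdouble_pack1(int dst, int flg)
{
    regs[dst] = regs[flg];
}

/* fdouble_addsub and fdouble_pack2: ignored. */
static void emu_nop(void)
{
}

/* Run the six-insn sequence gcc emits for a double-precision add. */
double run_add_sequence(double a, double b)
{
    enum { SRCA = 1, SRCB, MIN, MAX, FLG, DST };

    regs[SRCA] = double_to_bits(a);
    regs[SRCB] = double_to_bits(b);

    emu_fdouble_unpack(MIN);                 /* fdouble_unpack_min */
    emu_fdouble_unpack(MAX);                 /* fdouble_unpack_max */
    emu_fdouble_add_flags(FLG, SRCA, SRCB);  /* fdouble_add_flags  */
    emu_nop();                               /* fdouble_addsub     */
    emu_fdouble_pack1(DST, FLG);             /* fdouble_pack1      */
    emu_nop();                               /* fdouble_pack2      */

    return bits_to_double(regs[DST]);
}
```

Calling run_add_sequence(1.5, 2.25) walks the six insns in order and
returns whatever the add_flags step computed; everything else is a
zero store, a move, or a nop.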

Don't get me wrong -- a more accurate implementation of the actual
insns would be nice, especially for debugging.  But if the insns
aren't accurately documented, I don't see what choice we have.

On the plus side, implementing the entire operation as part of the
"flags" step probably results in faster emulation.


r~