From mboxrd@z Thu Jan  1 00:00:00 1970
Sender: Richard Henderson <rth7680@gmail.com>
Message-ID: <515AF541.1040900@twiddle.net>
Date: Tue, 02 Apr 2013 08:12:01 -0700
From: Richard Henderson <rth@twiddle.net>
MIME-Version: 1.0
References: <1364876610-3933-1-git-send-email-rth@twiddle.net>
	<1364876610-3933-18-git-send-email-rth@twiddle.net>
	<F245E1E9-CF0A-4A73-BD6E-A14BF8FF0FE6@suse.de>
	<515AE0B2.5090005@twiddle.net>
	<34B5C19C-785B-43E0-B1A5-F75D16EA6F09@suse.de>
In-Reply-To: <34B5C19C-785B-43E0-B1A5-F75D16EA6F09@suse.de>
Content-Type: multipart/mixed; boundary="------------010505030903020303030306"
Subject: Re: [Qemu-devel] [PATCH v3 17/27] tcg-ppc64: Implement bswap64
To: Alexander Graf <agraf@suse.de>
Cc: "av1474@comtv.ru" <av1474@comtv.ru>, "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>, "aurelien@aurel32.net" <aurelien@aurel32.net>

This is a multi-part message in MIME format.
--------------010505030903020303030306
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

On 2013-04-02 07:41, Alexander Graf wrote:
>> On 2013-04-01 23:34, Alexander Graf wrote:
>>> Is this faster than a load/store with std/ldbrx?
>>
>> Hmm.  Almost certainly not.  And since we've got stack space
>> allocated for function calls, we've got scratch space to do it in.
>>
>> Probably similar for bswap32 too, eh?
>
> Depends - memory load/store doesn't come for free and bswap32 is quite short.
>
>>
>> I'll do a tiny bit of benchmarking for power7.
>
> Cool, thanks a bunch :)

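Heh.  "Almost certainly not" indeed.  Unless I've made some silly mistake,
going through memory stalls badly: every variant that touches memory takes
more than twice as long as the register-only one.  Is there no store-to-load
forwarding on power7?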
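
The memory routes compute the same value as the register-only sequence;
only their cost differs.  A portable C sketch of what f2 (std + ldbrx) and
f3 (stdbrx + ld) in the attached z.c leave in the result register follows.
It assumes the CPU runs in big-endian mode, so std/ld use big-endian byte
order and the byte-reversed forms use little-endian order.  The helper names
are hypothetical, and the model covers only the data movement, not the
store-to-load stall being measured.

```c
#include <assert.h>
#include <stdint.h>

/* Byte images of the scratch slot; p[0] is the lowest address. */
static void store_be64(unsigned char *p, uint64_t x)   /* std (BE mode) */
{
    for (int i = 7; i >= 0; i--) { p[i] = (unsigned char)x; x >>= 8; }
}
static void store_le64(unsigned char *p, uint64_t x)   /* stdbrx */
{
    for (int i = 0; i < 8; i++) { p[i] = (unsigned char)x; x >>= 8; }
}
static uint64_t load_be64(const unsigned char *p)      /* ld (BE mode) */
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) r = (r << 8) | p[i];
    return r;
}
static uint64_t load_le64(const unsigned char *p)      /* ldbrx */
{
    uint64_t r = 0;
    for (int i = 7; i >= 0; i--) r = (r << 8) | p[i];
    return r;
}

/* f2: std x,0(mem); ldbrx r,0,mem */
uint64_t bswap64_f2_model(uint64_t x)
{
    unsigned char slot[8];   /* stands in for the stack scratch slot */
    store_be64(slot, x);
    return load_le64(slot);
}

/* f3: stdbrx x,0,mem; ld r,0(mem) */
uint64_t bswap64_f3_model(uint64_t x)
{
    unsigned char slot[8];
    store_le64(slot, x);
    return load_be64(slot);
}

/* Both routes must agree, and swapping twice must give x back. */
void check_memory_routes(uint64_t x)
{
    assert(bswap64_f2_model(x) == bswap64_f3_model(x));
    assert(bswap64_f2_model(bswap64_f2_model(x)) == x);
}
```

f3 is simply the mirror image of f2 (reverse on the way in rather than on
the way out), which is why the two can be timed against each other.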

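With the attached test case z.c (10^9 calls per run, built once per variant
with -DN=1 through -DN=4), time reports:

f1  rlwinm/rlwimi in registers   2.967s
f2  std + ldbrx                  8.930s
f3  stdbrx + ld                  7.071s
f4  std + ld                     7.166s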
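
Since main() in z.c makes 10^9 calls per run, the times convert directly
into an average cost per call.  A small sketch of that arithmetic, using the
figures from the table (the function names are hypothetical, and every
figure includes the same loop and call/return overhead):

```c
#include <assert.h>

/* Loop count from main() in z.c. */
#define CALLS 1e9

/* Seconds reported by time for f1..f4, taken from the table. */
static const double seconds[] = { 2.967, 8.930, 7.071, 7.166 };

/* Average wall-clock nanoseconds per call of fN, for n = 1..4. */
double ns_per_call(int n)
{
    assert(n >= 1 && n <= 4);
    return seconds[n - 1] * 1e9 / CALLS;
}

/* Extra nanoseconds per call relative to the register-only f1. */
double extra_ns_vs_f1(int n)
{
    return ns_per_call(n) - ns_per_call(1);
}
```

This gives roughly 2.97 ns per call for f1 and 8.93 ns for f2.  f3 and f4
differ by only about 0.1 ns per call, which suggests the byte-reversing store
adds nothing beyond an ordinary store/load round trip (about 4.1 to 4.2 ns
over f1), while ldbrx after std costs roughly another 1.8 ns compared with
the plain ld.  The clock rate isn't given, so these can't be turned into
cycle counts.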

Note that f4 is a plain store/load pair with no byte reversal, included to
estimate the store-to-load forwarding delay on its own.
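
For comparison, the register-only f1 is the three-instruction bswap32 (one
rlwinm, two rlwimi) applied to each half of x, with rldicl ...,32,0 rotating
the halves into place.  The portable C sketch below models those
instructions on the low 32-bit word only, following the intended operand
roles (x as source, t as scratch).  The helpers rotl32, mask32, rlwinm and
rlwimi are illustrative names, not a real library, and mask32 assumes
mb <= me (no wrap-around masks).

```c
#include <assert.h>
#include <stdint.h>

static uint32_t rotl32(uint32_t v, unsigned n)
{
    n &= 31;
    return (v << n) | (v >> ((32 - n) & 31));
}

/* Mask for bits mb..me in IBM numbering (bit 0 = MSB), mb <= me. */
static uint32_t mask32(unsigned mb, unsigned me)
{
    return (0xffffffffu >> mb) & (0xffffffffu << (31 - me));
}

/* rlwinm, low word: rotate, then AND with the mask. */
static uint32_t rlwinm(uint32_t s, unsigned sh, unsigned mb, unsigned me)
{
    return rotl32(s, sh) & mask32(mb, me);
}

/* rlwimi, low word: insert rotated bits under the mask, keep the rest. */
static uint32_t rlwimi(uint32_t ra, uint32_t s, unsigned sh,
                       unsigned mb, unsigned me)
{
    uint32_t m = mask32(mb, me);
    return (rotl32(s, sh) & m) | (ra & ~m);
}

/* The three-instruction bswap32 that f1 uses twice. */
uint32_t bswap32_rot(uint32_t x)
{
    uint32_t r = rlwinm(x, 8, 0, 31);   /* ABCD -> BCDA */
    r = rlwimi(r, x, 24, 0, 7);         /* top byte from DABC -> DCDA */
    r = rlwimi(r, x, 24, 16, 23);       /* third byte from DABC -> DCBA */
    return r;
}

/* All eight instructions of f1 on a 64-bit value. */
uint64_t bswap64_rot(uint64_t x)
{
    /* On ppc64, rlwinm also clears the high word of the result. */
    uint64_t r = bswap32_rot((uint32_t)x);
    r = (r << 32) | (r >> 32);                /* rldicl r,r,32,0 */
    uint64_t t = (x << 32) | (x >> 32);       /* rldicl t,x,32,0 */
    /* The second rlwimi group rewrites only the low word of r. */
    return (r & 0xffffffff00000000ull) | bswap32_rot((uint32_t)t);
}

/* Compare the model with a byte-at-a-time reference. */
void check_bswap64_rot(uint64_t x)
{
    uint64_t ref = 0, v = x;
    for (int i = 0; i < 8; i++) {
        ref = (ref << 8) | (v & 0xff);
        v >>= 8;
    }
    assert(bswap64_rot(x) == ref);
}
```

Eight single-cycle-class ALU operations with no memory traffic is
consistent with f1 being the clear winner in the table.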


r~

--------------010505030903020303030306
Content-Type: text/x-csrc;
 name="z.c"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="z.c"

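/* f1: bswap64 entirely in registers.  The first three insns byte-swap the
   low word of x into r (rlwinm also clears the high word), rldicl moves it
   to the high half, and the last three insns byte-swap the high word of x
   (rotated into t) into the low half of r.  mem is unused; it only keeps
   the signature the same as f2..f4.  Named operands keep x and t straight:
   with positional operands the outputs are numbered first, so x is %2.  */
static long __attribute__((noinline)) f1(long x, long *mem)
{
  long r, t;
  asm volatile (
       "rlwinm %[r],%[x],8,0,31\n\t"
       "rlwimi %[r],%[x],24,0,7\n\t"
       "rlwimi %[r],%[x],24,16,23\n\t"
       "rldicl %[r],%[r],32,0\n\t"
       "rldicl %[t],%[x],32,0\n\t"
       "rlwimi %[r],%[t],8,0,31\n\t"
       "rlwimi %[r],%[t],24,0,7\n\t"
       "rlwimi %[r],%[t],24,16,23"
       /* r is written before x is last read, so it must be early-clobber. */
       : [r] "=&r"(r), [t] "=r"(t)
       : [x] "r"(x));
  return r;
}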

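/* f2: plain store, byte-reversed load (std + ldbrx).  "b" keeps mem out of
   r0, which std would read as a literal 0 base.  The asm writes and reads
   *mem behind the compiler's back, hence the "memory" clobber.  */
static long __attribute__((noinline)) f2(long x, long *mem)
{
  long r;
  asm volatile ("std %1,0(%2); ldbrx %0,0,%2"
                : "=r"(r) : "r"(x), "b"(mem) : "memory");
  return r;
}

/* f3: byte-reversed store, plain load (stdbrx + ld).  */
static long __attribute__((noinline)) f3(long x, long *mem)
{
  long r;
  asm volatile ("stdbrx %1,0,%2; ld %0,0(%2)"
                : "=r"(r) : "r"(x), "b"(mem) : "memory");
  return r;
}

/* f4: plain store/load pair with no byte swap, to measure the
   store-to-load forwarding cost by itself.  */
static long __attribute__((noinline)) f4(long x, long *mem)
{
  long r;
  asm volatile ("std %1,0(%2); ld %0,0(%2)"
                : "=r"(r) : "r"(x), "b"(mem) : "memory");
  return r;
}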

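/* Compile once per variant, e.g. with -DN=2 to time f2; DO(N) pastes the
   number onto "f" (DO(2) -> f2).  Defaults to f1.  */
#ifndef N
#define N 1
#endif
#define D1(x,y) x##y
#define DO(x)   D1(f,x)

int main(void)
{
    long tmp, i;
    /* 10^9 calls of the selected variant, all using the same scratch slot.  */
    for (i = 0; i < 1000000000; ++i)
      DO(N)(i, &tmp);
    return 0;
}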

--------------010505030903020303030306--