From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:41905)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <agraf@suse.de>) id 1UN33h-0002c9-Um
	for qemu-devel@nongnu.org; Tue, 02 Apr 2013 11:23:39 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <agraf@suse.de>) id 1UN33f-0002YT-Ks
	for qemu-devel@nongnu.org; Tue, 02 Apr 2013 11:23:37 -0400
Received: from cantor2.suse.de ([195.135.220.15]:50719 helo=mx2.suse.de)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <agraf@suse.de>) id 1UN33f-0002YL-FZ
	for qemu-devel@nongnu.org; Tue, 02 Apr 2013 11:23:35 -0400
Message-ID: <515AF7F5.1020401@suse.de>
Date: Tue, 02 Apr 2013 17:23:33 +0200
From: Alexander Graf <agraf@suse.de>
MIME-Version: 1.0
References: <1364876610-3933-1-git-send-email-rth@twiddle.net>
	<1364876610-3933-18-git-send-email-rth@twiddle.net>
	<F245E1E9-CF0A-4A73-BD6E-A14BF8FF0FE6@suse.de>
	<515AE0B2.5090005@twiddle.net>
	<34B5C19C-785B-43E0-B1A5-F75D16EA6F09@suse.de>
	<515AF541.1040900@twiddle.net>
In-Reply-To: <515AF541.1040900@twiddle.net>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH v3 17/27] tcg-ppc64: Implement bswap64
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Richard Henderson <rth@twiddle.net>
Cc: "av1474@comtv.ru" <av1474@comtv.ru>, "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>, "aurelien@aurel32.net" <aurelien@aurel32.net>

On 04/02/2013 05:12 PM, Richard Henderson wrote:
> On 2013-04-02 07:41, Alexander Graf wrote:
>>> On 2013-04-01 23:34, Alexander Graf wrote:
>>>> Is this faster than a load/store with std/ldbrx?
>>>
>>> Hmm.  Almost certainly not.  And since we've got stack space
>>> allocated for function calls, we've got scratch space to do it in.
>>>
>>> Probably similar for bswap32 too, eh?
>>
>> Depends - memory load/store doesn't come for free and bswap32 is 
>> quite short.
>>
>>>
>>> I'll do a tiny bit o benchmarking for power7.
>>
>> Cool, thanks a bunch :)
>
> Heh.  "Almost certainly not" indeed.  Unless I've made some silly mistake,
> going through memory stalls badly.  No store buffer forwarding on power7?
>
> With the following test case, time reports:
>
> f1        2.967s
> f2        8.930s
> f3        7.071s
> f4        7.166s
>
> And note that f4 is a normal store/load pair, trying to determine what the
> store buffer forwarding delay might be.

Yeah, doesn't look like it makes any sense at all to do a load/store 
cycle then. What a shame :).

Keep in mind that this only measures the icache-hot case. In real use, 
the long bswap64 sequence could cost us extra icache misses, so the kind 
of memory latency you see here on the data side might also hit the 
instruction stream when it gets executed. But then again, we only care 
about the performance of cache-hot sequences in the first place.
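
For anyone following along, here is a rough C sketch of the two strategies 
being compared. It is not the test case from the quoted mail (that is not 
reproduced here), and the function names bswap64_reg and bswap64_mem are 
made up for illustration. The memory variant only approximates what a 
std + ldbrx pair would do; a compiler is free to turn it into something 
else entirely.

```c
#include <stdint.h>
#include <string.h>

/* Register-only byte swap: three rounds of shift-and-mask, no memory
 * traffic. Hypothetical helper, not the sequence the patch emits. */
uint64_t bswap64_reg(uint64_t x)
{
    /* Swap the two 32-bit halves. */
    x = (x << 32) | (x >> 32);
    /* Swap the 16-bit halves inside each 32-bit word. */
    x = ((x & 0x0000ffff0000ffffULL) << 16) |
        ((x >> 16) & 0x0000ffff0000ffffULL);
    /* Swap the bytes inside each 16-bit halfword. */
    x = ((x & 0x00ff00ff00ff00ffULL) << 8) |
        ((x >> 8) & 0x00ff00ff00ff00ffULL);
    return x;
}

/* Memory round trip: write the value to a scratch slot, then read the
 * bytes back in reverse order. This stands in for a store followed by
 * a byte-reversed load; the result does not depend on host endianness. */
uint64_t bswap64_mem(uint64_t x)
{
    unsigned char in[8], out[8];
    uint64_t r;
    int i;

    memcpy(in, &x, sizeof(in));
    for (i = 0; i < 8; i++) {
        out[i] = in[7 - i];
    }
    memcpy(&r, out, sizeof(r));
    return r;
}
```

Both functions should return the same value for any input; timing each 
one in a tight loop is one way to reproduce the kind of comparison 
discussed above, with the caveat that such a loop is itself icache hot.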


Alex