From mboxrd@z Thu Jan  1 00:00:00 1970
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: Re: [PATCH v2 03/23] x86: zero BSS using stosl instead
	of stosb
Date: Wed, 22 Jul 2015 12:22:18 +0100
Message-ID: <55AF7CEA.6060202@citrix.com>
References: <1437402558-7313-1-git-send-email-daniel.kiper@oracle.com>
	<1437402558-7313-4-git-send-email-daniel.kiper@oracle.com>
	<55AE2F0C02000078000938D2@prv-mh.provo.novell.com>
	<20150721182300.GK3479@olila.local.net-space.pl>
	<55AF35A502000078000D4F8E@prv-mh.provo.novell.com>
	<55AF578F.4070403@citrix.com>
	<55AF86CD0200007800093EC3@prv-mh.provo.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
Received: from mail6.bemta14.messagelabs.com ([193.109.254.103])
	by lists.xen.org with esmtp (Exim 4.72)
	(envelope-from <prvs=6381c0ebe=Andrew.Cooper3@citrix.com>)
	id 1ZHs6R-0001oT-9J
	for xen-devel@lists.xenproject.org; Wed, 22 Jul 2015 11:22:23 +0000
In-Reply-To: <55AF86CD0200007800093EC3@prv-mh.provo.novell.com>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich <JBeulich@suse.com>
Cc: xen-devel@lists.xenproject.org, daniel.kiper@oracle.com, keir@xen.org
List-Id: xen-devel@lists.xenproject.org

On 22/07/15 11:04, Jan Beulich wrote:
>>>> On 22.07.15 at 10:42, <andrew.cooper3@citrix.com> wrote:
>> In the case of having aligned source and destination on a 16-byte
>> boundary (which we can trivially arrange), then ERMSB (to give it its
>> Intel name) and rep stosl differ only in the setup cost; they still
>> scale at the same rate for changes in length.
>>
>> Therefore, assuming we arrange for 16-byte alignment, using rep stosl
>> would appear to be a single 60ish cycle hit over using ERMSB, but would
>> be substantially more efficient than using rep stosb on a non-ERMSB system.
>>
>> Overall, I think 16 byte alignment and rep stosl is the best compromise.
> Or leaving such code alone, with the assumption that over time the
> setup cost (on a growing number of systems) outweighs the benefits
> (on a shrinking set).

The BSS is large - 295k on the last compile I have from staging.  The
setup cost is lost in the noise compared to the elapsed time to write
that many zeroes to memory.

Therefore, on an ERMSB-capable system, the two options will complete in
the same amount of time.
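
For reference, "ERMSB-capable" is something the CPU advertises in
CPUID leaf 7, subleaf 0, EBX bit 9.  A minimal sketch of checking it
from C, assuming a GCC/clang toolchain with <cpuid.h>; the helper name
cpu_has_ermsb() is made up for illustration and is not something in the
tree:

```c
#include <stdbool.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#endif

/* Hypothetical helper: report whether the CPU advertises ERMSB
 * (Enhanced REP MOVSB/STOSB), i.e. CPUID.(EAX=7,ECX=0):EBX bit 9. */
static inline bool cpu_has_ermsb(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned int eax, ebx, ecx, edx;

    /* Leaf 7 only exists if the maximum basic leaf is at least 7. */
    if (__get_cpuid_max(0, 0) < 7)
        return false;

    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    return (ebx >> 9) & 1;
#else
    /* Not an x86 build: no ERMSB to report. */
    return false;
#endif
}
```

The early boot path cannot sensibly branch on this, which is part of
why a single choice of instruction is being discussed at all.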

However, on all AMD hardware and Intel hardware older than Ivy Bridge,
rep stosl is 4 times faster than rep stosb.
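
To make the comparison concrete, here is a rough sketch of the two
variants as GCC-style inline assembly.  This is not the code from the
patch; the function names are invented, and the stosl version assumes
the length is a multiple of 4 (and, for the performance argument above,
that the buffer is 16-byte aligned), which the linker script would have
to guarantee for the real BSS:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: zero len bytes one byte per iteration. */
static inline void zero_with_stosb(void *dst, size_t len)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* (R|E)DI = destination, (R|E)CX = byte count, AL = fill value.
     * The C ABI guarantees the direction flag is clear on entry. */
    __asm__ volatile ("rep stosb"
                      : "+D" (dst), "+c" (len)
                      : "a" (0)
                      : "memory");
#else
    memset(dst, 0, len);   /* portable fallback for non-x86 builds */
#endif
}

/* Hypothetical helper: zero len bytes four bytes per iteration.
 * Assumes len is a multiple of 4; any tail bytes are NOT handled. */
static inline void zero_with_stosl(void *dst, size_t len)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    size_t dwords = len / 4;   /* rep stosl counts 32-bit stores */

    __asm__ volatile ("rep stosl"
                      : "+D" (dst), "+c" (dwords)
                      : "a" (0)
                      : "memory");
#else
    memset(dst, 0, len);   /* portable fallback for non-x86 builds */
#endif
}
```

Both leave the buffer in the same state; the stosl form simply needs a
quarter of the iterations, which is where the difference on
non-ERMSB hardware comes from.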

~Andrew