From mboxrd@z Thu Jan  1 00:00:00 1970
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: Re: [PATCH v2 03/23] x86: zero BSS using stosl instead
	of stosb
Date: Wed, 22 Jul 2015 09:42:55 +0100
Message-ID: <55AF578F.4070403@citrix.com>
References: <1437402558-7313-1-git-send-email-daniel.kiper@oracle.com>
	<1437402558-7313-4-git-send-email-daniel.kiper@oracle.com>
	<55AE2F0C02000078000938D2@prv-mh.provo.novell.com>
	<20150721182300.GK3479@olila.local.net-space.pl>
	<55AF35A502000078000D4F8E@prv-mh.provo.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
Received: from mail6.bemta3.messagelabs.com ([195.245.230.39])
	by lists.xen.org with esmtp (Exim 4.72)
	(envelope-from <amc96@hermes.cam.ac.uk>) id 1ZHpcB-00049d-Kk
	for xen-devel@lists.xenproject.org; Wed, 22 Jul 2015 08:42:59 +0000
In-Reply-To: <55AF35A502000078000D4F8E@prv-mh.provo.novell.com>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich <jbeulich@suse.com>, daniel.kiper@oracle.com
Cc: xen-devel@lists.xenproject.org, keir@xen.org
List-Id: xen-devel@lists.xenproject.org

On 22/07/2015 06:18, Jan Beulich wrote:
>>>> Daniel Kiper <daniel.kiper@oracle.com> 07/21/15 8:23 PM >>>
>> On Tue, Jul 21, 2015 at 03:37:48AM -0600, Jan Beulich wrote:
>>>>>> On 20.07.15 at 16:28, <daniel.kiper@oracle.com> wrote:
>>> ... because of ??? Nowadays - with X86_FEATURE_ERMS - rep stosb
>>> is expected to be faster than rep stosl.
>> OK, I did not know about that. However, as far as I know, this feature
>> was introduced in 2012 with Ivy Bridge, so I suppose that there are
>> still a lot of machines in the wild which do not support it.
>> Anyway, because this code is not performance critical, I am not going
>> to insist on one solution or the other. However, Andrew suggested this
>> change, so please agree with him on which direction we should go.
>> I will do whatever you agree on.
> ISTR having seen a similar patch from him(?), maybe in another area
> of code, before (or was it v1 of this one?), which I responded to with the
> same as above.

Indeed you have, several in fact.  I had not had a chance to delve into
the optimisation manuals before, but have now taken a peek
(Section 3.7.6).

With the source and destination aligned on a 16-byte boundary (which we
can trivially arrange), ERMSB (to give it its Intel name) and rep stosl
differ only in their setup cost; both scale at the same rate as the
length changes.

Therefore, assuming we arrange for 16-byte alignment, using rep stosl
would appear to cost a one-off hit of roughly 60 cycles relative to
ERMSB, but would be substantially more efficient than using rep stosb on
a non-ERMSB system.

Overall, I think 16-byte alignment plus rep stosl is the best compromise.
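
As a rough illustration of that combination (a sketch only, not the
actual boot code: the helper name zero_region_stosl is made up here, it
uses GCC-style inline assembly, and it assumes the caller has already
arranged 16-byte alignment and a length that is a multiple of 4):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: zero `len` bytes starting at `dst` using
 * 32-bit stores (rep stosl).
 *
 * Assumptions made for this sketch:
 *   - dst is 16-byte aligned,
 *   - len is a multiple of 4,
 *   - the direction flag is clear (as the C ABI guarantees).
 */
static void zero_region_stosl(void *dst, size_t len)
{
    size_t count = len / 4;   /* number of 32-bit stores */

    assert(((uintptr_t)dst & 15) == 0);
    assert(len % 4 == 0);

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* EAX holds the value to store, (R|E)CX the count, (R|E)DI the target. */
    __asm__ volatile ("rep stosl"
                      : "+D" (dst), "+c" (count)
                      : "a" (0)
                      : "memory");
#else
    /* Portable fallback so the sketch still builds on non-x86 targets. */
    (void)count;
    memset(dst, 0, len);
#endif
}
```

A caller would pass the (aligned) start of the region and its size in
bytes, e.g. zero_region_stosl(start, end - start).  On an ERMSB-capable
CPU this should still work, just with the extra setup cost paid once.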

~Andrew