From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Cooper
Subject: Re: [PATCH v2 03/23] x86: zero BSS using stosl instead of stosb
Date: Wed, 22 Jul 2015 12:22:18 +0100
Message-ID: <55AF7CEA.6060202@citrix.com>
References: <1437402558-7313-1-git-send-email-daniel.kiper@oracle.com>
 <1437402558-7313-4-git-send-email-daniel.kiper@oracle.com>
 <55AE2F0C02000078000938D2@prv-mh.provo.novell.com>
 <20150721182300.GK3479@olila.local.net-space.pl>
 <55AF35A502000078000D4F8E@prv-mh.provo.novell.com>
 <55AF578F.4070403@citrix.com>
 <55AF86CD0200007800093EC3@prv-mh.provo.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Received: from mail6.bemta14.messagelabs.com ([193.109.254.103])
 by lists.xen.org with esmtp (Exim 4.72) (envelope-from )
 id 1ZHs6R-0001oT-9J for xen-devel@lists.xenproject.org;
 Wed, 22 Jul 2015 11:22:23 +0000
In-Reply-To: <55AF86CD0200007800093EC3@prv-mh.provo.novell.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich
Cc: xen-devel@lists.xenproject.org, daniel.kiper@oracle.com, keir@xen.org
List-Id: xen-devel@lists.xenproject.org

On 22/07/15 11:04, Jan Beulich wrote:
>>>> On 22.07.15 at 10:42, wrote:
>> In the case of having aligned source and destination on a 16-byte
>> boundary (which we can trivially arrange), then ERMSB (to give it its
>> Intel name) and rep stosl differ only in the setup cost; they still
>> scale at the same rate for changes in length.
>>
>> Therefore, assuming we arrange for 16-byte alignment, using rep stosl
>> would appear to be a single 60ish cycle hit over using ERMSB, but would
>> be substantially more efficient than using rep stosb on a non-ERMSB system.
>>
>> Overall, I think 16 byte alignment and rep stosl is the best compromise.
> Or leaving such code alone, with the assumption that over time the
> setup cost (on a growing number of systems) outweighs the benefits
> (on a shrinking set).

The BSS is large - 295k on the last compile I have from staging.  The
setup cost is lost in the noise compared to the elapsed time to write
that many zeroes to memory.

Therefore, on an ERMSB-capable system, the two options will complete in
the same amount of time.  However, on all AMD hardware and Intel
hardware older than IvyBridge, rep stosl is 4 times faster than rep stosb.

~Andrew
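
For illustration only, here is a minimal C sketch of what zeroing a region with rep stosl looks like.  It is not the code from the patch: the helper name zero_region_dwords is hypothetical, it assumes a GCC-compatible compiler for the inline assembly, and it assumes the region is 4-byte aligned and a multiple of 4 bytes long (the 16-byte alignment discussed above would satisfy this).  On non-x86 targets it falls back to a plain loop of 32-bit stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from Xen: zero `bytes` bytes starting
 * at `start` using 32-bit stores.  Assumes `start` is 4-byte aligned
 * and `bytes` is a multiple of 4.
 */
static void zero_region_dwords(void *start, size_t bytes)
{
    uint32_t *dst = start;
    size_t count = bytes / 4;   /* number of 32-bit stores to issue */

    assert(((uintptr_t)start % 4) == 0 && (bytes % 4) == 0);

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /*
     * rep stosl: store %eax at the destination pointer and advance it
     * by 4, repeated `count` times.  The direction flag is assumed to
     * be clear, as the C ABI guarantees at function entry.
     */
    __asm__ volatile ("rep stosl"
                      : "+D" (dst), "+c" (count)
                      : "a" (0)
                      : "memory");
#else
    /* Portable fallback: the same sequence of 32-bit stores in C. */
    while (count--)
        *dst++ = 0;
#endif
}
```

The count register holds a quarter as many iterations as the stosb form would need for the same region.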