From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Cooper
Subject: Re: [PATCH v2 03/23] x86: zero BSS using stosl instead of stosb
Date: Wed, 22 Jul 2015 09:42:55 +0100
Message-ID: <55AF578F.4070403@citrix.com>
References: <1437402558-7313-1-git-send-email-daniel.kiper@oracle.com>
 <1437402558-7313-4-git-send-email-daniel.kiper@oracle.com>
 <55AE2F0C02000078000938D2@prv-mh.provo.novell.com>
 <20150721182300.GK3479@olila.local.net-space.pl>
 <55AF35A502000078000D4F8E@prv-mh.provo.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path:
Received: from mail6.bemta3.messagelabs.com ([195.245.230.39])
 by lists.xen.org with esmtp (Exim 4.72) (envelope-from )
 id 1ZHpcB-00049d-Kk for xen-devel@lists.xenproject.org;
 Wed, 22 Jul 2015 08:42:59 +0000
In-Reply-To: <55AF35A502000078000D4F8E@prv-mh.provo.novell.com>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich , daniel.kiper@oracle.com
Cc: xen-devel@lists.xenproject.org, keir@xen.org
List-Id: xen-devel@lists.xenproject.org

On 22/07/2015 06:18, Jan Beulich wrote:
>>>> Daniel Kiper 07/21/15 8:23 PM >>>
>> On Tue, Jul 21, 2015 at 03:37:48AM -0600, Jan Beulich wrote:
>>>>>> On 20.07.15 at 16:28, wrote:
>>> ... because of ??? Nowadays - with X86_FEATURE_ERMS - rep stosb
>>> is expected to be faster than rep stosl.
>> OK, I did not know about that. However, as I know this feature
>> was introduced in 2012 with Ivy Bridge. So, I suppose that there
>> are still a lot of machines in the wild which does not support it.
>> Anyway, because this code is not performance critical I am not going
>> to insist on one or another solution. However, Andrew suggested that
>> thing, so, please agree with him in which direction we should go.
>> I will do what you agree.
> ISTR having seen a similar patch from him(?), maybe in another area
> of code, before (or was it v1 of this one?), which I responded to with the
> same as above.

Indeed you have, several in fact.

I had not had a chance to delve into the optimisation manuals, but have
taken a peek now. (Section 3.7.6)

In the case of having aligned source and destination on a 16-byte
boundary (which we can trivially arrange), then ERMSB (to give it its
Intel name) and rep stosl differ only in the setup cost; they still
scale at the same rate for changes in length.

Therefore, assuming we arrange for 16-byte alignment, using rep stosl
would appear to be a single 60ish cycle hit over using ERMSB, but would
be substantially more efficient than using rep stosb on a non-ERMSB
system.

Overall, I think 16-byte alignment and rep stosl is the best compromise.

~Andrew
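
For illustration only, here is a minimal C sketch of what "16-byte
alignment and rep stosl" amounts to. It is not the code from the patch:
the helper name zero_bss_dwords() is hypothetical, and it assumes that
the start of the region is 16-byte aligned and that its length is a
multiple of 4 (which padding both ends of the BSS to 16 bytes would
guarantee). It also assumes the direction flag is clear, as the usual C
calling conventions require. On non-x86 builds it falls back to a plain
store loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not Xen code: zero `bytes` bytes starting at `dst`
 * using 32-bit stores.
 *
 * Assumptions (checked below): dst is 16-byte aligned and bytes is a
 * multiple of 4, so the region divides evenly into dwords.
 */
void zero_bss_dwords(void *dst, size_t bytes)
{
    assert(((uintptr_t)dst & 15) == 0);
    assert((bytes & 3) == 0);

    size_t count = bytes / 4;   /* number of 32-bit stores to perform */

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /*
     * rep stosl: store EAX at the address in (R/E)DI, (R/E)CX times,
     * advancing the destination by 4 bytes per store.
     */
    __asm__ volatile ("rep stosl"
                      : "+D" (dst), "+c" (count)
                      : "a" (0)
                      : "memory");
#else
    /* Portable fallback with the same effect. */
    uint32_t *p = dst;
    while (count--)
        *p++ = 0;
#endif
}
```

The only difference from a byte-wise (rep stosb) version is that the
count is divided by 4 up front, which is why the alignment and length
guarantees matter.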