From mboxrd@z Thu Jan 1 00:00:00 1970 From: Linus Torvalds Subject: Re: x86 SHA1: Faster than OpenSSL Date: Mon, 3 Aug 2009 23:40:35 -0700 (PDT) Message-ID: References: <20090804044842.6792.qmail@science.horizon.com> Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Cc: Git Mailing List To: George Spelvin X-From: git-owner@vger.kernel.org Tue Aug 04 08:41:38 2009 Return-path: Envelope-to: gcvg-git-2@gmane.org Received: from vger.kernel.org ([209.132.176.167]) by lo.gmane.org with esmtp (Exim 4.50) id 1MYDiJ-0006Sx-80 for gcvg-git-2@gmane.org; Tue, 04 Aug 2009 08:41:35 +0200 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932218AbZHDGlF (ORCPT ); Tue, 4 Aug 2009 02:41:05 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1755004AbZHDGlE (ORCPT ); Tue, 4 Aug 2009 02:41:04 -0400 Received: from smtp1.linux-foundation.org ([140.211.169.13]:57719 "EHLO smtp1.linux-foundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755084AbZHDGlD (ORCPT ); Tue, 4 Aug 2009 02:41:03 -0400 Received: from imap1.linux-foundation.org (imap1.linux-foundation.org [140.211.169.55]) by smtp1.linux-foundation.org (8.14.2/8.13.5/Debian-3ubuntu1.1) with ESMTP id n746eaPS021248 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 3 Aug 2009 23:40:37 -0700 Received: from localhost (localhost [127.0.0.1]) by imap1.linux-foundation.org (8.13.5.20060308/8.13.5/Debian-3ubuntu1.1) with ESMTP id n746eZSh031546; Mon, 3 Aug 2009 23:40:36 -0700 X-X-Sender: torvalds@localhost.localdomain In-Reply-To: <20090804044842.6792.qmail@science.horizon.com> User-Agent: Alpine 2.01 (LFD 1184 2008-12-16) X-Spam-Status: No, hits=-3.467 required=5 tests=AWL,BAYES_00 X-Spam-Checker-Version: SpamAssassin 3.2.4-osdl_revision__1.47__ X-MIMEDefang-Filter: lf$Revision: 1.188 $ X-Scanned-By: MIMEDefang 2.63 on 140.211.169.13 Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On Mon, 4 Aug 2009, George Spelvin wrote: > +sha1_block_data_order: > + pushl %ebp > + pushl %ebx > + pushl %esi > + pushl %edi > + movl 20(%esp),%edi > + movl 24(%esp),%esi > + movl 28(%esp),%eax > + subl $64,%esp > + shll $6,%eax > + addl %esi,%eax > + movl %eax,92(%esp) > + movl 16(%edi),%ebp > + movl 12(%edi),%edx > +.align 16 > +.L000loop: > + movl (%esi),%ecx > + movl 4(%esi),%ebx > + bswap %ecx > + movl 8(%esi),%eax > + bswap %ebx > + movl %ecx,(%esp) ... Hmm. Does it really help to do the bswap as a separate initial phase? As far as I can tell, you load the result of the bswap just a single time for each value. So the initial "bswap all 64 bytes" seems pointless. > + /* 00_15 0 */ > + movl %edx,%edi > + movl (%esp),%esi Why not do the bswap here instead? Is it because you're running out of registers for scheduling, and want to use the stack pointer rather than the original source? Or does the data dependency end up being so much better that you're better off doing a separate bswap loop? Or is it just because the code was written that way? Intriguing, either way. Linus