From: Linus Torvalds
Subject: Re: block-sha1: improve code on large-register-set machines
Date: Tue, 11 Aug 2009 15:53:28 -0700 (PDT)
To: Nicolas Pitre
Cc: Git Mailing List, Junio C Hamano

On Tue, 11 Aug 2009, Nicolas Pitre wrote:
>
> Well... gcc is really strange in this case (and similar other ones) with
> ARM compilation.  A good indicator of the quality of the code is the
> size of the stack frame.  When using the "+m" then gcc creates an 816
> byte stack frame, the generated binary grows by approx 3000 bytes, and
> performance is almost halved (7.600s).  Looking at the assembly result
> I just can't figure out all the crazy moves taking place.  Even the
> version with no barrier whatsoever produces better assembly with a
> stack frame of 560 bytes.

Ok, that's just crazy. That function has a required stack size of exactly
64 bytes, and anything more than that is just spilling. And if you end up
with a stack frame of 560 bytes, that means that gcc is doing some _crazy_
spilling.

One thing that strikes me is that I've been just testing with gcc-4.4, and
BenH (who did some tests on PPC where SHA1 is just _trivial_ because it
all fits in the normal register space) noticed that older versions of gcc
that he tested did much worse on this. I think Artur also posted (x86)
numbers with older gcc versions doing worse.

Maybe you're seeing some of that?

		Linus
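
For readers following the 64-byte figure: SHA-1 works on a window of 16
32-bit words, so 16 * 4 = 64 bytes is all the function really has to keep
in memory. The sketch below is only an illustration under assumptions, not
the actual block-sha1 source. The names sha1_window_bytes, rol32 and
sha1_mix are made up here, and the empty asm with a "+m" operand is one
plausible reading of the barrier being discussed in the quoted text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* SHA-1 consumes 512-bit blocks: 16 words of 32 bits each. */
#define SHA1_BLOCK_WORDS 16

/* Hypothetical helper: size of the only array the block function needs. */
static size_t sha1_window_bytes(void)
{
	uint32_t W[SHA1_BLOCK_WORDS];

	return sizeof(W);	/* 16 * 4 = 64 bytes */
}

/* Rotate left by n bits; callers must keep 0 < n < 32. */
static uint32_t rol32(uint32_t x, unsigned int n)
{
	assert(n > 0 && n < 32);
	return (x << n) | (x >> (32 - n));
}

/*
 * One message-schedule step on a 16-entry circular window:
 *   W[t] = rol(W[t-3] ^ W[t-8] ^ W[t-14] ^ W[t-16], 1)
 * with all indices taken modulo 16, so slot t overwrites slot t-16.
 */
static uint32_t sha1_mix(uint32_t *W, unsigned int t)
{
	uint32_t v = rol32(W[(t + 13) & 15] ^ W[(t + 8) & 15] ^
			   W[(t + 2) & 15] ^ W[t & 15], 1);

	W[t & 15] = v;
#if defined(__GNUC__)
	/*
	 * Assumed form of the "+m" barrier: an empty asm that claims to
	 * read and write this slot, so the compiler has to treat the
	 * store as real instead of tracking the value in a register.
	 */
	__asm__("" : "+m" (W[t & 15]));
#endif
	return v;
}
```

Anything gcc puts on the stack beyond that 64-byte array is register
spill, which is why the size of the stack frame is a usable proxy for
code quality here.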