From mboxrd@z Thu Jan 1 00:00:00 1970
From: Pádraig Brady
Subject: Linus' sha1 is much faster!
Date: Sat, 15 Aug 2009 00:25:36 +0100
Message-ID: <4A85F270.20703@draigBrady.com>
Mime-Version: 1.0
Content-Type: multipart/mixed;
 boundary="------------010502090801070300010206"
Cc: Git Mailing List
To: Bug-coreutils@gnu.org, Linus Torvalds
User-Agent: Thunderbird 2.0.0.6 (X11/20071008)
X-BeenThere: bug-coreutils@gnu.org
List-Id: "GNU Core Utilities: bug reports and discussion"

--------------010502090801070300010206
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit

I've noticed before that the coreutils hashing utils
were a little behind in performance, but was prompted
to look at them again when I noticed the recently
updated sha1 implementation in git:
http://git.kernel.org/?p=git/git.git;a=history;f=block-sha1;h=d3121f7;hb=pu

Testing that with the attached program, which I wrote
in a couple of mins to try and match sha1sum's system calls,
shows that it takes around 33% less time, as shown below:

$ gcc $(rpm -q --qf="%{OPTFLAGS}\n" coreutils) linus-sha1.c sha1.c -o linus-sha1

$ time ./linus-sha1 300MB_file
df1e19e245fee4f53087b50ef953ca2c8d1644d7  300MB_file

real    0m2.742s
user    0m2.516s
sys     0m0.206s

$ time ~/git/coreutils/src/sha1sum 300MB_file
df1e19e245fee4f53087b50ef953ca2c8d1644d7  300MB_file

real    0m4.166s
user    0m3.846s
sys     0m0.298s

So, could we use that code in coreutils?
Think of all the dead fish it would save.

I've also attached a trivial block-sha1 patch which doesn't affect
performance, but does suppress a signed/unsigned
comparison warning which occurs with -Wextra, for example.

cheers,
Pádraig.

--------------010502090801070300010206
Content-Type: text/x-csrc;
 name="linus-sha1.c"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="linus-sha1.c"

/* gcc -O2 -Wall linus-sha1.c sha1.c -o linus-sha1 */
/* The archive dropped the two <...> header names; <stdio.h> is the one
   this code needs (FILE, fopen, fread_unlocked, printf). */
#include <stdio.h>
#include "sha1.h"

int main(int argc, char** argv)
{
    if (argc != 2) return 1;
    const char* filename = argv[1];
    FILE *fp = fopen (filename, "r");
    if (!fp) return 1;

#define BS 4096 /* match coreutils */

    blk_SHA_CTX ctx;
    blk_SHA1_Init(&ctx);
    size_t nr;
    char buf[BS];
    while ((nr=fread_unlocked(buf, 1, sizeof(buf), fp)))
        blk_SHA1_Update(&ctx, buf, nr);
    unsigned char hash[20];
    blk_SHA1_Final(hash, &ctx);
    int i;
    /* The archive cut the file off after "i<"; the rest is reconstructed
       to produce the "<hex digest>  <filename>" output shown above. */
    for (i=0; i<20; i++)
        printf("%02x", hash[i]);
    printf("  %s\n", filename);
    fclose(fp);
    return 0;
}

--------------010502090801070300010206

>From fa75e818836f763357ff9b7bbde3327e1aabbe47 Mon Sep 17 00:00:00 2001
From: Pádraig Brady
Date: Sat, 15 Aug 2009 00:17:30 +0100
Subject: [PATCH] block-sha1: suppress signed unsigned comparison warning

* block-sha1/sha1.c: Use unsigned ints as the values
will never go negative.
---
 block-sha1/sha1.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/block-sha1/sha1.c b/block-sha1/sha1.c
index d3121f7..be763d8 100644
--- a/block-sha1/sha1.c
+++ b/block-sha1/sha1.c
@@ -231,13 +231,13 @@ void blk_SHA1_Init(blk_SHA_CTX *ctx)
 
 void blk_SHA1_Update(blk_SHA_CTX *ctx, const void *data, unsigned long len)
 {
-	int lenW = ctx->size & 63;
+	unsigned int lenW = ctx->size & 63;
 
 	ctx->size += len;
 
 	/* Read the data into W and process blocks as they get full */
 	if (lenW) {
-		int left = 64 - lenW;
+		unsigned int left = 64 - lenW;
 		if (len < left)
 			left = len;
 		memcpy(lenW + (char *)ctx->W, data, left);
-- 
1.6.2.5

--------------010502090801070300010206--