From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S266819AbUHISRT (ORCPT ); Mon, 9 Aug 2004 14:17:19 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S266816AbUHISRR (ORCPT ); Mon, 9 Aug 2004 14:17:17 -0400 Received: from colin2.muc.de ([193.149.48.15]:27923 "HELO colin2.muc.de") by vger.kernel.org with SMTP id S266833AbUHISQY (ORCPT ); Mon, 9 Aug 2004 14:16:24 -0400 Date: 9 Aug 2004 20:16:22 +0200 Date: Mon, 9 Aug 2004 20:16:22 +0200 From: Andi Kleen To: Bob Deblier Cc: linux-kernel@vger.kernel.org Subject: Re: AES assembler optimizations Message-ID: <20040809181622.GA42722@muc.de> References: <2riR3-7U5-3@gated-at.bofh.it> <1092067328.4332.40.camel@orion> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1092067328.4332.40.camel@orion> User-Agent: Mutt/1.4.1i Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Aug 09, 2004 at 06:02:08PM +0200, Bob Deblier wrote: > On Mon, 2004-08-09 at 16:28, Andi Kleen wrote: > > Bob Deblier writes: > > > > > Just picked up on KernelTrap that there were some problems with > > > optimized AES code; if you wish, I can provide my own LGPL licensed (or > > > I can relicense them for you under GPL), as included in the BeeCrypt > > > Cryptography Library. > > > > > > I have generic i586 code and SSE-optimized code available in GNU > > > assembler format. Latest version is always available on SourceForge > > > (http://sourceforge.net/cvs/?group_id=8924). > > > > Would be interesting. Do you have any benchmarks for your code? > > BeeCrypt contains benchmarks in the 'tests' subdirectory. Running of > 'make bench' will execute them. Benchmarks results below for repeatedly > looping over the same 16K block, produced by 'benchbc', without any > tweaks (YMMV): I guess a cache cold benchmark would be more interesting. AFAIK linux does encryption/decryption usually on cache cold buffers. > P4 2400, with MMX: > ECB encrypted 738304 KB in 10.00 seconds = 73823.02 KB/s > CBC encrypted 659456 KB in 10.00 seconds = 65925.82 KB/s > ECB decrypted 765952 KB in 10.00 seconds = 76564.57 KB/s > CBC decrypted 616448 KB in 10.02 seconds = 61546.33 KB/s > > P4 2400, plain i386: > ECB encrypted 584704 KB in 10.01 seconds = 58435.34 KB/s > CBC encrypted 570368 KB in 10.01 seconds = 56979.82 KB/s > ECB decrypted 444416 KB in 10.02 seconds = 44357.32 KB/s > CBC decrypted 423936 KB in 10.02 seconds = 42304.76 KB/s MMX seems to be fast enough that it's probably a win to use, even with the overhead of kernel_fpu_begin/end It usually annoys the "low latency" people a bit though because it requires disabling kernel preemption during the computation. -Andi