Subject: Re: [PATCH -mm crypto] AES: x86_64 asm implementation optimization
From: "Huang, Ying"
To: Sebastian Siewior
Cc: Herbert Xu, "Adam J. Richter", akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org, mingo@elte.hu, tglx@linutronix.de
Date: Wed, 07 May 2008 13:26:13 +0800
Message-Id: <1210137973.4676.15.camel@caritas-dev.intel.com>
In-Reply-To: <20080429221225.GA5280@Chamillionaire.breakpoint.cc>

Hi, Sebastian,

On Wed, 2008-04-30 at 00:12 +0200, Sebastian Siewior wrote:
> * Huang, Ying | 2008-04-25 11:11:17 [+0800]:
>
> >Hi, Sebastian,
> Hi Huang,
>
> sorry for the delay.
>
> >I changed the patches to group the read or write together instead of
> >interleaving. Can you help me to test these new patches? The new patches
> >is attached with the mail.
> The new results are attached.

It seems that the performance degradation between step4 and step5 has
decreased, but the overall performance degradation between step0 and step7
is still about 5%.

I also tested the patches on Pentium 4 CPUs, and the performance decreased
there too. So I think this optimization is CPU micro-architecture dependent.
While the dependencies between instructions are reduced, more registers (at
most 3) are saved/restored before/after encryption/decryption. If the CPU
has no extra execution units for the newly independent instructions but more
registers are saved/restored, the performance will decrease.

Maybe we should select different implementations based on the
micro-architecture.

Best Regards,
Huang Ying
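
A minimal sketch of the selection idea above, in C. This is only an illustration under stated assumptions, not code from the patches: the enum, the function names and the two "variants" are all hypothetical, and the variant bodies are placeholders that copy one 16-byte block instead of encrypting it. The point is just that the choice is made once, from a detected micro-architecture class, and callers then go through a function pointer.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical micro-architecture classes (assumption, not kernel names). */
enum uarch {
	UARCH_GENERIC,   /* unknown CPU: stay with the existing code        */
	UARCH_NETBURST,  /* Pentium 4: reordered code was measured slower   */
	UARCH_WIDE       /* hypothetical: enough spare execution units      */
};

typedef void (*aes_block_fn)(uint8_t *dst, const uint8_t *src);

/* Placeholder for the existing assembly routine (copies, does not encrypt). */
void aes_enc_baseline(uint8_t *dst, const uint8_t *src)
{
	memcpy(dst, src, 16);
}

/* Placeholder for the variant with fewer instruction dependencies
 * and more saved/restored registers (copies, does not encrypt). */
void aes_enc_reordered(uint8_t *dst, const uint8_t *src)
{
	memcpy(dst, src, 16);
}

/* Pick the reordered variant only where it is known to help;
 * every other class falls back to the baseline. */
aes_block_fn aes_select_enc(enum uarch u)
{
	return (u == UARCH_WIDE) ? aes_enc_reordered : aes_enc_baseline;
}
```

A caller would run aes_select_enc() once at init time and keep the returned pointer. Defaulting to the baseline keeps CPUs that were not measured on the existing code path.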