From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id ; Sun, 5 Jan 2003 20:57:20 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id ; Sun, 5 Jan 2003 20:57:20 -0500 Received: from mailout10.sul.t-online.com ([194.25.134.21]:54164 "EHLO mailout10.sul.t-online.com") by vger.kernel.org with ESMTP id ; Sun, 5 Jan 2003 20:57:19 -0500 Date: Mon, 6 Jan 2003 03:05:42 +0100 From: Andi Kleen To: Linus Torvalds Cc: Andi Kleen , "David S. Miller" , andrew.morton@digeo.com, linux-kernel@vger.kernel.org, Mikael Pettersson Subject: Re: [BENCHMARK] Lmbench 2.5.54-mm2 (impressive improvements) Message-ID: <20030106020542.GA5615@averell> References: <20030105234617.GA4714@averell> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4i Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Jan 06, 2003 at 02:33:28AM +0100, Linus Torvalds wrote: > > I can think of some things to speed it up more. e.g. replace all the > > push / pop in SAVE/RESTORE_ALL with sub $frame,%esp ; movl %reg,offset(%esp) > > and movl offset(%esp),%reg ; addl $frame,%esp. This way the CPU has > > no dependencies between all the load/store options unlike push/pop. > > Last I remember, that only made a difference on Athlons, and Intel CPU's I didn't benchmark it, but as a data point ICC 7 generates the movls instead of pushes now too, (even though it generates bigger code). In fact it is even more aggressive on that than gcc: gcc does it only for more than three or four registers, icc does it for two and more. So I expect it being faster on Intel CPUs - at least on the P4 - too. I doubt they tuned it for Athlons. -Andi