From: Andrew Morton
Date: Fri, 03 Jan 2003 13:32:27 -0800
To: Andi Kleen
CC: davem@redhat.com, linux-kernel@vger.kernel.org
Subject: Re: [BENCHMARK] Lmbench 2.5.54-mm2 (impressive improvements)
Message-ID: <3E16016B.8D6092BE@digeo.com>

Andi Kleen wrote:
>
> Andrew Morton writes:
> >
> > The teeny little microbenchmarks are telling us that the rmap overhead
> > hurts, that the uninlining of copy_*_user may have been a bad idea, that
> > the addition of AIO has cost a little and that the complexity which
> > yielded large improvements in readv(), writev() and SMP throughput were
> > not free. All of this is already known.
>
> If you mean the signal speed regressions they caused - I fixed
> that on x86-64 by inlining 1,2,4,8,10(used by signal fpu frame),16.
> But it should not use the stupid rep ; ... of the old version, but direct
> unrolled moves.

Yes, that would help a bit. We should do that for ia32.

It's a little worrisome that the return value from such a copy_*_user()
implementation will be incorrect if a fault occurs partway through: it is
supposed to return the number of uncopied bytes. Probably doesn't matter.
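A minimal user-space sketch of the constant-size inlining Andi describes, and of why its return value can be imprecise. It is not the code in include/asm-x86_64/uaccess.h: the names user_buf, copy_bytewise, move_chunk and copy_unrolled are hypothetical, and a fault is simulated by a readable-length limit rather than a real page fault with an exception-table fixup.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a user buffer: bytes at offsets >= readable
 * behave as if they were unmapped, so touching them "faults". */
struct user_buf {
    const unsigned char *data;
    size_t readable;
};

/* Precise semantics: copy byte by byte and, on a fault, return the
 * number of bytes NOT copied, as copy_from_user() is documented to do. */
static size_t copy_bytewise(void *to, const struct user_buf *from, size_t n)
{
    unsigned char *dst = to;

    for (size_t i = 0; i < n; i++) {
        if (i >= from->readable)
            return n - i;
        dst[i] = from->data[i];
    }
    return 0;
}

/* One fixed-size move: either the whole chunk is copied or none of it. */
static int move_chunk(unsigned char *dst, const struct user_buf *from,
                      size_t off, size_t len)
{
    if (off + len > from->readable)
        return -1;
    memcpy(dst + off, from->data + off, len);
    return 0;
}

/* Inlined constant-size copy: 1, 2, 4 and 8 bytes take one move, 10 is
 * 8 + 2 and 16 is 8 + 8; other sizes fall back to the generic path.
 * On a fault it reports all n bytes as uncopied, even when the first
 * chunk already landed: the imprecise return value Andrew mentions. */
static size_t copy_unrolled(void *to, const struct user_buf *from, size_t n)
{
    unsigned char *dst = to;

    switch (n) {
    case 1: case 2: case 4: case 8:
        return move_chunk(dst, from, 0, n) ? n : 0;
    case 10:
        return (move_chunk(dst, from, 0, 8) || move_chunk(dst, from, 8, 2)) ? n : 0;
    case 16:
        return (move_chunk(dst, from, 0, 8) || move_chunk(dst, from, 8, 8)) ? n : 0;
    default:
        return copy_bytewise(to, from, n);
    }
}
```

With only 12 readable bytes, copy_unrolled() on a 16-byte request returns 16 even though 8 bytes were copied, while the byte-wise path returns 4.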
Most of the optimisation opportunities wrt signal delivery were soaked up by
replacing the copy_*_user() calls with put_user() and friends.

We could speed up signals heaps by re-lazying the fpu state storage in some
manner.

> The x86-64 version in include/asm-x86_64/uaccess.h could be ported
> to i386; the movqs would just need to be replaced by two movls each.
>
> -Andi
>
> P.S.: regarding recent lmbench slowdowns: I'm a bit
> worried about the two wrmsrs which are in the i386 context switch
> in load_esp0 for sysenter now. Last time I benchmarked WRMSRs on
> Athlon they were really slow and knowing the P4 it is probably
> even slower there. IMHO it would be better to undo that patch
> and use Linus' original trampoline stack.

hm. How slow? Any numbers on that?
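To make Andi's porting note concrete, here is a rough C sketch of how the 8- and 10-byte moves change between x86-64 and i386. The function names are hypothetical; fixed-size memcpy() calls stand in for the explicit mov instructions (the compiler is free to merge or split these accesses), and the exception-table fixups that real copy_*_user() code needs are omitted.

```c
#include <stdint.h>
#include <string.h>

/* x86-64 flavour: an 8-byte chunk moves as one 64-bit load and store,
 * roughly what a movq does in the uaccess.h code. */
static void move8_64(void *dst, const void *src)
{
    uint64_t q;

    memcpy(&q, src, sizeof(q));
    memcpy(dst, &q, sizeof(q));
}

/* i386 flavour: the same 8 bytes as two 32-bit moves, i.e. each movq
 * becomes two movls. */
static void move8_32(void *dst, const void *src)
{
    const unsigned char *s = src;
    unsigned char *d = dst;
    uint32_t lo, hi;

    memcpy(&lo, s, sizeof(lo));
    memcpy(&hi, s + 4, sizeof(hi));
    memcpy(d, &lo, sizeof(lo));
    memcpy(d + 4, &hi, sizeof(hi));
}

/* The 10-byte case (the size Andi notes is used by the signal FPU frame)
 * on i386: two 32-bit moves for the first 8 bytes plus one 16-bit move
 * for the remaining 2. */
static void move10_32(void *dst, const void *src)
{
    const unsigned char *s = src;
    unsigned char *d = dst;
    uint16_t w;

    move8_32(d, s);
    memcpy(&w, s + 8, sizeof(w));
    memcpy(d + 8, &w, sizeof(w));
}
```

Splitting each 8-byte move in two also means a fault can land between the halves, so on i386 the return-value question Andrew raises applies at 4-byte granularity.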