From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S264472AbTLOXg5 (ORCPT ); Mon, 15 Dec 2003 18:36:57 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S264482AbTLOXg4 (ORCPT ); Mon, 15 Dec 2003 18:36:56 -0500 Received: from ppp-217-133-42-200.cust-adsl.tiscali.it ([217.133.42.200]:38318 "EHLO dualathlon.random") by vger.kernel.org with ESMTP id S264472AbTLOXgz (ORCPT ); Mon, 15 Dec 2003 18:36:55 -0500 Date: Tue, 16 Dec 2003 00:37:46 +0100 From: Andrea Arcangeli To: Andrew Morton Cc: wli@holomorphy.com, kernel@kolivas.org, chris@cvine.freeserve.co.uk, riel@redhat.com, linux-kernel@vger.kernel.org, mbligh@aracnet.com Subject: Re: 2.6.0-test9 - poor swap performance on low end machines Message-ID: <20031215233746.GO6730@dualathlon.random> References: <200311031148.40242.kernel@kolivas.org> <200311032113.14462.chris@cvine.freeserve.co.uk> <200311041355.08731.kernel@kolivas.org> <20031208135225.GT19856@holomorphy.com> <20031208194930.GA8667@k3.hellgate.ch> <20031208204817.GA19856@holomorphy.com> <20031210215235.GC11193@dualathlon.random> <20031210220525.GA28912@k3.hellgate.ch> <20031210224445.GE11193@dualathlon.random> <20031215153122.1d915475.akpm@osdl.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20031215153122.1d915475.akpm@osdl.org> User-Agent: Mutt/1.4.1i X-GPG-Key: 1024D/68B9CB43 13D9 8355 295F 4823 7C49 C012 DFA1 686E 68B9 CB43 X-PGP-Key: 1024R/CB4660B9 CC A0 71 81 F4 A0 63 AC C0 4B 81 1D 8C 15 C8 E5 Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Dec 15, 2003 at 03:31:22PM -0800, Andrew Morton wrote: > Single-threaded qsbench is OK on 2.6. Last time I looked it was a little > quicker than 2.4. It's when you go to multiple qsbench instances that > everything goes to crap. > > It's interesting to watch the `top' output during the run. In 2.4 you see > three qsbench instances have consumed 0.1 seconds CPU and the fourth has > consumed 45 seconds and then exits. > > In 2.6 all four processes consume CPU at the same rate. Really, really > slowly. sounds good, so this seems only a fariness issue. 2.6 is more fair but fariness in this case means much inferior performance. The reason 2.4 runs faster could be a more aggressive "young" pagetable heuristic via the swap_out clock algorithm. as soon as one program grows a bit its rss, it will run for longer, and the longer it runs the more pages it marks "young" during a clock scan, and the more pages it marks young the bigger it will grow. This keeps going until it's the by far biggest task and takes almost all available cpu. This is optimal for performance, but not optimal for fariness. So 2.6 may be better or worse depending if fariness payoffs or not, obviously in qsbench it doesn't since it's not even measured.