From: Nick Piggin
To: Eric St-Laurent
CC: Rusty Russell, Fengguang Wu, Dave Jones, Peter Zijlstra, linux-kernel, riel, Andrew Morton, Tim Pepper, Chris Snook
Date: Wed, 25 Jul 2007 15:19:59 +1000
Subject: Re: [PATCH 0/3] readahead drop behind and size adjustment

Eric St-Laurent wrote:
> On Mon, 2007-23-07 at 19:00 +1000, Nick Piggin wrote:
>
>> I don't like this kind of conditional information going from something
>> like readahead into page reclaim. Unless it is for readahead _specific_
>> data such as "I got these all wrong, so you can reclaim them" (which
>> this isn't).
>>
>> But I don't like it as a use-once thing. The VM should be able to get
>> that right.
>
> Question: How does the use-once code in the current kernel work? Is there
> any? It doesn't quite work for me...

What *I* think is supposed to happen is that newly read-in pages get put
on the inactive list, and unless they get accessed again before being
reclaimed, they are allowed to fall off the end of the list without
disturbing active data too much.

I think there is a missing piece here: we used to ease the reclaim
pressure on the active list when the inactive list grew much larger
relative to it (which could indicate a lot of use-once pages in the
system).

Andrew removed that logic, for reasons I don't know, but without it I
can't see use-once being terribly effective today (so your results don't
surprise me too much).

I think I've been banned from touching vmscan.c, but if you're keen to
try a patch, I might be convinced to come out of retirement :)

> See my previous email today; I've done a small test case to demonstrate
> the problem and the effectiveness of Peter's patch. The only piece
> missing is the copy case (read once + write once).
>
> Regardless of how it's implemented, I think a similar mechanism must be
> added. This is a long-standing issue.
>
> In the end, I think it's a pagecache resource allocation problem. The
> VM lacks fair-share limits between processes. The kernel doesn't have
> enough information to make the right decisions.
>
> You can refine or use more advanced page reclaim, but some fair-share
> splitting between processes (like the CPU scheduler does) must be
> present.
> Of course some processes should have large or unlimited VM limits,
> like databases.
>
> Maybe the "containers" patchset and memory controller can help, with
> some specific configuration and/or a userspace daemon to adjust the
> limits on the fly.
>
> Independently, the basic large-file streaming read (or copy) once cases
> should not trash the pagecache. Can we agree on that?

One man's trash is another's treasure: some people will want the files
to remain in cache because they'll use them again (copy them somewhere
else, start editing them after copying, or whatever). But yes, we can
probably do better in the sequential read/write case.

> I say, let's add some code to fix the problem. If we hear about any
> regression in some workloads, we can add a tunable to limit or disable
> its effects, _if_ a better compromise solution cannot be found.

Sure, but let's figure out the workloads and look at all the alternatives
first.

-- 
SUSE Labs, Novell Inc.
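A toy model may make the use-once idea and the "missing piece" easier to see. The Python sketch below is not the kernel's vmscan.c logic. It assumes a single fixed-size cache split into an inactive and an active list, new pages going onto the inactive list, promotion to the active list on a second access, and reclaim from the oldest end of the inactive list. The optional `ratio` guard is a hypothetical stand-in for the removed heuristic: an active page is only moved to the inactive list while the inactive list is smaller than `ratio` times the active list. All names and numbers (`simulate`, a 64-page cache, a 16-page hot set, a 500-page stream) are illustrative assumptions.

```python
from collections import OrderedDict


def simulate(capacity, hot_pages, stream_pages, ratio=None):
    """Toy two-list page cache with use-once insertion (illustrative only).

    Returns how many hot pages are still cached after the stream.
    ratio: hypothetical guard, must be > 0 if given; an active page is only
    deactivated when len(inactive) < ratio * len(active).
    """
    inactive = OrderedDict()  # oldest first: reclaim takes from the front
    active = OrderedDict()    # oldest first: deactivation takes from the front

    def reclaim():
        # Apply pressure to the active list unless the guard says the
        # inactive list is already large enough.
        if active and (ratio is None or len(inactive) < ratio * len(active)):
            page, _ = active.popitem(last=False)
            inactive[page] = True         # deactivated page joins the inactive list
        inactive.popitem(last=False)      # evict the oldest inactive page

    def access(page):
        if page in active:
            active.move_to_end(page)      # refresh its position in the active list
        elif page in inactive:
            del inactive[page]            # second reference: promote to active
            active[page] = True
        else:
            if len(active) + len(inactive) >= capacity:
                reclaim()
            inactive[page] = True         # newly read pages start on the inactive list

    for _ in range(2):                    # touch the hot set twice so it gets promoted
        for p in hot_pages:
            access(p)
    for p in stream_pages:                # large sequential read, each page used once
        access(p)
    return sum(1 for p in hot_pages if p in active or p in inactive)


hot = list(range(16))
stream = range(100, 600)
print(simulate(64, hot, stream))           # unguarded: prints 0
print(simulate(64, hot, stream, ratio=2))  # guarded: prints 16
```

With these assumed sizes, the unguarded variant deactivates one hot page per reclaim and eventually loses the whole hot set to the stream. In the guarded variant the inactive list stays at 48 pages, three times the 16-page active list, so the active list is never scanned and all hot pages survive.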