public inbox for linux-kernel@vger.kernel.org
From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Eric St-Laurent <ericstl34@sympatico.ca>
Cc: Rusty Russell <rusty@rustcorp.com.au>,
	Fengguang Wu <fengguang.wu@gmail.com>,
	Dave Jones <davej@redhat.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	riel <riel@redhat.com>, Andrew Morton <akpm@linux-foundation.org>,
	Tim Pepper <lnxninja@us.ibm.com>, Chris Snook <csnook@redhat.com>
Subject: Re: [PATCH 0/3] readahead drop behind and size adjustment
Date: Wed, 25 Jul 2007 17:09:38 +1000
Message-ID: <46A6F732.3080905@yahoo.com.au>
In-Reply-To: <1185344325.7105.91.camel@perkele>

[-- Attachment #1: Type: text/plain, Size: 2821 bytes --]

Eric St-Laurent wrote:
> On Wed, 2007-25-07 at 15:19 +1000, Nick Piggin wrote:
> 
> 
>>What *I* think is supposed to happen is that newly read in pages get
>>put on the inactive list, and unless they get accessed again before
>>being reclaimed, they are allowed to fall off the end of the list
>>without disturbing active data too much.
>>
>>I think there is a missing piece here, that we used to ease the reclaim
>>pressure off the active list when the inactive list grows relatively
>>much larger than it (which could indicate a lot of use-once pages in
>>the system).
> 
> 
> Maybe a new list should be added to put newly read pages in it. If they
> are not used or used once after a certain period, they can be moved to
> the inactive list (or whatever).
> 
> Newly read pages...
> 
> - ... not used after this period are excessive readahead, we discard
> immediately.
> - ... used only once after this period, we discard soon.
> - ... used many/frequently are moved to active list.
> 
> Surely the scan rate (do I make sense?) should be different for this
> newly-read list and the inactive list. 
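
A minimal sketch of the three-way split proposed above, assuming a hypothetical per-page use counter sampled when the page leaves the "newly read" list. The names `new_page_fate` and `classify_new_page` are invented for illustration and do not exist in the kernel:

```c
/* Hypothetical outcomes for a page leaving the proposed "newly read" list. */
enum new_page_fate {
	FATE_DISCARD_NOW,	/* never used: excessive readahead */
	FATE_DISCARD_SOON,	/* used once: a use-once page */
	FATE_ACTIVATE,		/* used repeatedly: belongs on the active list */
};

/*
 * use_count is the number of accesses seen while the page sat on the
 * new list for the (unspecified) grace period. How such a counter would
 * be maintained is an open question this sketch does not answer.
 */
static enum new_page_fate classify_new_page(unsigned int use_count)
{
	if (use_count == 0)
		return FATE_DISCARD_NOW;
	if (use_count == 1)
		return FATE_DISCARD_SOON;
	return FATE_ACTIVATE;
}
```

The sketch only covers the classification step; it says nothing about how fast each list should be scanned, which is the harder part.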

A new list could be a possibility. One problem with adding lists is just
trying to work out how to balance scanning rates between them, another
problem is CPU overhead of moving pages from one to another... but don't
let me stop you if you want to jump in and try something :)


> I also remember your split mapped/unmapped active list patches from a
> while ago.
> 
> Can someone point me to up-to-date documentation about the Linux VM?
> The books and documents I've seen are outdated.

If you just want to play with page reclaim algorithms, try reading over
mm/vmscan.c. If you don't know much about the Linux VM internals yet,
don't worry too much about the fine details and start by getting an idea
of how pages move between the active and inactive lists.
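
As a starting point for that, here is a deliberately simplified toy model of page movement between the two lists. It is a sketch only: it tracks a single "referenced" flag per page, and every name in it (`toy_page`, `toy_scan_inactive`, and so on) is hypothetical. The real logic in mm/vmscan.c has many more cases.

```c
#include <stdbool.h>

enum toy_list { TOY_NONE, TOY_INACTIVE, TOY_ACTIVE };

struct toy_page {
	enum toy_list list;	/* which list the page currently sits on */
	bool referenced;	/* accessed since it was last scanned */
};

/* A page that has just been read in starts on the inactive list. */
static void toy_page_read_in(struct toy_page *page)
{
	page->list = TOY_INACTIVE;
	page->referenced = false;
}

/* In this toy, an access only sets the flag; list moves happen at scan time. */
static void toy_page_access(struct toy_page *page)
{
	page->referenced = true;
}

/*
 * Scan one inactive page: a referenced page is promoted to the active
 * list, an unreferenced one is reclaimed. Returns true if reclaimed.
 */
static bool toy_scan_inactive(struct toy_page *page)
{
	if (page->list != TOY_INACTIVE)
		return false;
	if (page->referenced) {
		page->referenced = false;
		page->list = TOY_ACTIVE;
		return false;
	}
	page->list = TOY_NONE;
	return true;
}

/*
 * Scan one active page: a referenced page gets another round on the
 * active list, an unreferenced one is demoted to the inactive list.
 */
static void toy_scan_active(struct toy_page *page)
{
	if (page->list != TOY_ACTIVE)
		return;
	if (page->referenced)
		page->referenced = false;
	else
		page->list = TOY_INACTIVE;
}
```

In this model a page that is read in and never touched again is reclaimed on its first inactive scan, which is the use-once behaviour discussed earlier in the thread.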

I have Mel Gorman's book, but I don't recall whether it covers the fine details
of page reclaim. But anyway it is still a good book.


>>I think I've been banned from touching vmscan.c, but if you're keen to
>>try a patch, I might be convinced to come out of retirement :)
> 
> 
> I'm more than willing!  Now that CFS is merged, redirect your energies
> from nicksched to nick-vm ;)
> 
> Patches against any tree (stable, linus, mm, rt) are good. But I prefer
> the last stable release because it narrows down the possible problems
> that a moving target like the development tree may have.
> 
> I test this on my main system, so patches with basic testing and
> reasonable stability are preferred. I just want to avoid data corruption
> bugs. FYI, I used to run the -rt tree most of the time.

OK here is one which just changes the rate at which the active and inactive
lists get scanned. Data corruption bugs should be minimal ;)

-- 
SUSE Labs, Novell Inc.

[-- Attachment #2: inactive-useonce.patch --]
[-- Type: text/plain, Size: 2375 bytes --]

Index: linux-2.6/mm/vmscan.c
===================================================================
--- linux-2.6.orig/mm/vmscan.c
+++ linux-2.6/mm/vmscan.c
@@ -1011,6 +1011,8 @@ static unsigned long shrink_zone(int pri
 {
 	unsigned long nr_active;
 	unsigned long nr_inactive;
+	unsigned long scan_active;
+	unsigned long scan_inactive;
 	unsigned long nr_to_scan;
 	unsigned long nr_reclaimed = 0;
 
@@ -1020,34 +1022,47 @@ static unsigned long shrink_zone(int pri
 	 * Add one to `nr_to_scan' just to make sure that the kernel will
 	 * slowly sift through the active list.
 	 */
-	zone->nr_scan_active +=
-		(zone_page_state(zone, NR_ACTIVE) >> priority) + 1;
-	nr_active = zone->nr_scan_active;
-	if (nr_active >= sc->swap_cluster_max)
+	nr_active = zone_page_state(zone, NR_ACTIVE);
+	nr_inactive = zone_page_state(zone, NR_INACTIVE);
+
+	scan_inactive = (nr_inactive >> priority) + 1;
+
+	if (nr_active >= 4*(nr_inactive*2 + 1))
+		scan_active = 4*scan_inactive;
+	else {
+		unsigned long long tmp;
+
+		tmp = (unsigned long long)scan_inactive * nr_active;
+		do_div(tmp, nr_inactive*2 + 1);
+		scan_active = (unsigned long)tmp + 1;
+	}
+
+	zone->nr_scan_active += scan_active;
+	scan_active = zone->nr_scan_active;
+	if (scan_active >= sc->swap_cluster_max)
 		zone->nr_scan_active = 0;
 	else
-		nr_active = 0;
+		scan_active = 0;
 
-	zone->nr_scan_inactive +=
-		(zone_page_state(zone, NR_INACTIVE) >> priority) + 1;
-	nr_inactive = zone->nr_scan_inactive;
-	if (nr_inactive >= sc->swap_cluster_max)
+	zone->nr_scan_inactive += scan_inactive;
+	scan_inactive = zone->nr_scan_inactive;
+	if (scan_inactive >= sc->swap_cluster_max)
 		zone->nr_scan_inactive = 0;
 	else
-		nr_inactive = 0;
+		scan_inactive = 0;
 
-	while (nr_active || nr_inactive) {
-		if (nr_active) {
-			nr_to_scan = min(nr_active,
+	while (scan_active || scan_inactive) {
+		if (scan_active) {
+			nr_to_scan = min(scan_active,
 					(unsigned long)sc->swap_cluster_max);
-			nr_active -= nr_to_scan;
+			scan_active -= nr_to_scan;
 			shrink_active_list(nr_to_scan, zone, sc, priority);
 		}
 
-		if (nr_inactive) {
-			nr_to_scan = min(nr_inactive,
+		if (scan_inactive) {
+			nr_to_scan = min(scan_inactive,
 					(unsigned long)sc->swap_cluster_max);
-			nr_inactive -= nr_to_scan;
+			scan_inactive -= nr_to_scan;
 			nr_reclaimed += shrink_inactive_list(nr_to_scan, zone,
 								sc);
 		}
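
The scan-rate arithmetic from the hunk above can be tried outside the kernel. The following is a userspace sketch, not kernel code: `struct scan_counts` and `compute_scan` are hypothetical names, plain 64-bit division stands in for `do_div()`, and the accumulation into `zone->nr_scan_active` / `zone->nr_scan_inactive` and the `swap_cluster_max` batching are left out.

```c
#include <stdint.h>

/* Hypothetical container for the two per-call scan counts. */
struct scan_counts {
	unsigned long scan_active;
	unsigned long scan_inactive;
};

/*
 * Mirrors the calculation in the patch: the inactive list is scanned in
 * proportion to its size, and the active scan count is scaled by
 * nr_active / (2 * nr_inactive + 1), capped at 4x the inactive count.
 */
static struct scan_counts compute_scan(unsigned long nr_active,
				       unsigned long nr_inactive,
				       int priority)
{
	struct scan_counts out;
	unsigned long denom = nr_inactive * 2 + 1;

	out.scan_inactive = (nr_inactive >> priority) + 1;

	if (nr_active >= 4 * denom) {
		out.scan_active = 4 * out.scan_inactive;
	} else {
		/* Widen before multiplying, as the patch does, to avoid overflow. */
		uint64_t tmp = (uint64_t)out.scan_inactive * nr_active;

		out.scan_active = (unsigned long)(tmp / denom) + 1;
	}
	return out;
}
```

With equal list sizes, for example `compute_scan(1000, 1000, 0)`, the active count comes out at roughly half the inactive count (501 against 1001). The two counts only become roughly equal when the active list is about twice the size of the inactive list, and once the active list exceeds about eight times the inactive list the 4x cap applies.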
