* [PATCH -mm 01/12] move isolate_lru_page() to vmscan.c
From: Rik van Riel @ 2008-05-29 20:22 UTC
To: linux-kernel
Cc: Andrew Morton, Lee Schermerhorn, Kosaki Motohiro, Nick Piggin
[-- Attachment #1: np-01-move-and-rework-isolate_lru_page-v2.patch --]
[-- Type: text/plain, Size: 7400 bytes --]
From: Nick Piggin <npiggin@suse.de>
isolate_lru_page() logically belongs in vmscan.c rather than in migrate.c.
It is a close call, because the function is not needed without memory
migration, so there is a valid argument for keeping it in migrate.c.
However, a subsequent patch needs to make use of it in the core mm, so we
can happily move it to vmscan.c.
Also, make the function a little more generic by not requiring that it
adds an isolated page to a given list. Callers can do that.
Note that we now have '__isolate_lru_page()', which does
something quite different and is visible outside of vmscan.c
for use with the memory controller. Methinks we need to
rationalize these names/purposes. --lts
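With the more generic interface, isolating a page becomes a two-step
operation at the call site: take the page off the LRU, then put it on
whatever list the caller wants. A minimal sketch of the new calling
convention, mirroring the do_move_pages() hunk below:

        err = isolate_lru_page(page);   /* caller already holds a reference */
        if (!err)
                list_add_tail(&page->lru, &pagelist);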
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
---
V1 -> V2 [lts]:
+ fix botched merge -- add back "get_page_unless_zero()"
From: Nick Piggin <npiggin@suse.de>
To: Linux Memory Management <linux-mm@kvack.org>
Subject: [patch 1/4] mm: move and rework isolate_lru_page
Date: Mon, 12 Mar 2007 07:38:44 +0100 (CET)
include/linux/migrate.h | 3 ---
mm/internal.h | 2 ++
mm/mempolicy.c | 9 +++++++--
mm/migrate.c | 34 +++-------------------------------
mm/vmscan.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
5 files changed, 57 insertions(+), 36 deletions(-)
Index: linux-2.6.26-rc2-mm1/include/linux/migrate.h
===================================================================
--- linux-2.6.26-rc2-mm1.orig/include/linux/migrate.h 2008-05-23 14:21:22.000000000 -0400
+++ linux-2.6.26-rc2-mm1/include/linux/migrate.h 2008-05-23 14:21:32.000000000 -0400
@@ -25,7 +25,6 @@ static inline int vma_migratable(struct
return 1;
}
-extern int isolate_lru_page(struct page *p, struct list_head *pagelist);
extern int putback_lru_pages(struct list_head *l);
extern int migrate_page(struct address_space *,
struct page *, struct page *);
@@ -42,8 +41,6 @@ extern int migrate_vmas(struct mm_struct
static inline int vma_migratable(struct vm_area_struct *vma)
{ return 0; }
-static inline int isolate_lru_page(struct page *p, struct list_head *list)
- { return -ENOSYS; }
static inline int putback_lru_pages(struct list_head *l) { return 0; }
static inline int migrate_pages(struct list_head *l, new_page_t x,
unsigned long private) { return -ENOSYS; }
Index: linux-2.6.26-rc2-mm1/mm/internal.h
===================================================================
--- linux-2.6.26-rc2-mm1.orig/mm/internal.h 2008-05-23 14:21:22.000000000 -0400
+++ linux-2.6.26-rc2-mm1/mm/internal.h 2008-05-23 14:21:32.000000000 -0400
@@ -34,6 +34,8 @@ static inline void __put_page(struct pag
atomic_dec(&page->_count);
}
+extern int isolate_lru_page(struct page *page);
+
extern void __free_pages_bootmem(struct page *page, unsigned int order);
/*
Index: linux-2.6.26-rc2-mm1/mm/migrate.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/mm/migrate.c 2008-05-23 14:21:22.000000000 -0400
+++ linux-2.6.26-rc2-mm1/mm/migrate.c 2008-05-23 14:21:32.000000000 -0400
@@ -37,36 +37,6 @@
#define lru_to_page(_head) (list_entry((_head)->prev, struct page, lru))
/*
- * Isolate one page from the LRU lists. If successful put it onto
- * the indicated list with elevated page count.
- *
- * Result:
- * -EBUSY: page not on LRU list
- * 0: page removed from LRU list and added to the specified list.
- */
-int isolate_lru_page(struct page *page, struct list_head *pagelist)
-{
- int ret = -EBUSY;
-
- if (PageLRU(page)) {
- struct zone *zone = page_zone(page);
-
- spin_lock_irq(&zone->lru_lock);
- if (PageLRU(page) && get_page_unless_zero(page)) {
- ret = 0;
- ClearPageLRU(page);
- if (PageActive(page))
- del_page_from_active_list(zone, page);
- else
- del_page_from_inactive_list(zone, page);
- list_add_tail(&page->lru, pagelist);
- }
- spin_unlock_irq(&zone->lru_lock);
- }
- return ret;
-}
-
-/*
* migrate_prep() needs to be called before we start compiling a list of pages
* to be migrated using isolate_lru_page().
*/
@@ -895,7 +865,9 @@ static int do_move_pages(struct mm_struc
!migrate_all)
goto put_and_set;
- err = isolate_lru_page(page, &pagelist);
+ err = isolate_lru_page(page);
+ if (!err)
+ list_add_tail(&page->lru, &pagelist);
put_and_set:
/*
* Either remove the duplicate refcount from
Index: linux-2.6.26-rc2-mm1/mm/vmscan.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/mm/vmscan.c 2008-05-23 14:21:22.000000000 -0400
+++ linux-2.6.26-rc2-mm1/mm/vmscan.c 2008-05-23 14:21:32.000000000 -0400
@@ -816,6 +816,51 @@ static unsigned long clear_active_flags(
return nr_active;
}
+/**
+ * isolate_lru_page - tries to isolate a page from its LRU list
+ * @page: page to isolate from its LRU list
+ *
+ * Isolates a @page from an LRU list, clears PageLRU and adjusts the
+ * vmstat statistic corresponding to whatever LRU list the page was on.
+ *
+ * Returns 0 if the page was removed from an LRU list.
+ * Returns -EBUSY if the page was not on an LRU list.
+ *
+ * The returned page will have PageLRU() cleared. If it was found on
+ * the active list, it will have PageActive set. That flag may need
+ * to be cleared by the caller before letting the page go.
+ *
+ * The vmstat statistic corresponding to the list on which the page was
+ * found will be decremented.
+ *
+ * Restrictions:
+ * (1) Must be called with an elevated refcount on the page. This is a
+ * fundamental difference from isolate_lru_pages (which is called
+ * without a stable reference).
+ * (2) the lru_lock must not be held.
+ * (3) interrupts must be enabled.
+ */
+int isolate_lru_page(struct page *page)
+{
+ int ret = -EBUSY;
+
+ if (PageLRU(page)) {
+ struct zone *zone = page_zone(page);
+
+ spin_lock_irq(&zone->lru_lock);
+ if (PageLRU(page) && get_page_unless_zero(page)) {
+ ret = 0;
+ ClearPageLRU(page);
+ if (PageActive(page))
+ del_page_from_active_list(zone, page);
+ else
+ del_page_from_inactive_list(zone, page);
+ }
+ spin_unlock_irq(&zone->lru_lock);
+ }
+ return ret;
+}
+
/*
* shrink_inactive_list() is a helper for shrink_zone(). It returns the number
* of reclaimed pages
Index: linux-2.6.26-rc2-mm1/mm/mempolicy.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/mm/mempolicy.c 2008-05-23 14:21:22.000000000 -0400
+++ linux-2.6.26-rc2-mm1/mm/mempolicy.c 2008-05-23 14:21:32.000000000 -0400
@@ -93,6 +93,8 @@
#include <asm/tlbflush.h>
#include <asm/uaccess.h>
+#include "internal.h"
+
/* Internal flags */
#define MPOL_MF_DISCONTIG_OK (MPOL_MF_INTERNAL << 0) /* Skip checks for continuous vmas */
#define MPOL_MF_INVERT (MPOL_MF_INTERNAL << 1) /* Invert check for nodemask */
@@ -758,8 +760,11 @@ static void migrate_page_add(struct page
/*
* Avoid migrating a page that is shared with others.
*/
- if ((flags & MPOL_MF_MOVE_ALL) || page_mapcount(page) == 1)
- isolate_lru_page(page, pagelist);
+ if ((flags & MPOL_MF_MOVE_ALL) || page_mapcount(page) == 1) {
+ if (!isolate_lru_page(page)) {
+ list_add_tail(&page->lru, pagelist);
+ }
+ }
}
static struct page *new_node_page(struct page *page, unsigned long node, int **x)
--
All Rights Reversed
* [PATCH -mm 02/12] Use an indexed array for LRU variables
From: Rik van Riel @ 2008-05-29 20:22 UTC
To: linux-kernel
Cc: Andrew Morton, Lee Schermerhorn, Kosaki Motohiro,
Christoph Lameter
[-- Attachment #1: cl-use-indexed-array-of-lru-lists.patch --]
[-- Type: text/plain, Size: 22261 bytes --]
From: Christoph Lameter <clameter@sgi.com>
Currently we define explicit variables for the inactive and
active lists. An indexed array is more generic and avoids
repeating similar code in several places in the reclaim code.
This saves a few bytes of code size:
Before:
   text    data     bss     dec    hex filename
4097753  573120 4092484 8763357 85b7dd vmlinux

After:
   text    data     bss     dec    hex filename
4097729  573120 4092484 8763333 85b7c5 vmlinux
Having an easy way to add new lru lists may ease future work on
the reclaim code.
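As an illustration, initialization code that used to name each list
explicitly can now just loop over the array; a sketch taken from the
free_area_init hunk below:

        enum lru_list l;

        for_each_lru(l) {
                INIT_LIST_HEAD(&zone->list[l]);
                zone->nr_scan[l] = 0;
        }

Because NR_INACTIVE/NR_ACTIVE are kept in the same order as
LRU_INACTIVE/LRU_ACTIVE, the vmstat counter for list l is simply
NR_INACTIVE + l.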
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
V3 [riel]: memcontrol LRU arrayification
V1 -> V2 [lts]:
+ Remove extraneous __dec_zone_state(zone, NR_ACTIVE) pointed
out by Mel G.
include/linux/memcontrol.h | 17 +-----
include/linux/mm_inline.h | 33 ++++++++----
include/linux/mmzone.h | 17 ++++--
mm/memcontrol.c | 115 ++++++++++++++++---------------------------
mm/page_alloc.c | 9 +--
mm/swap.c | 2
mm/vmscan.c | 120 +++++++++++++++++++++------------------------
mm/vmstat.c | 3 -
8 files changed, 146 insertions(+), 170 deletions(-)
Index: linux-2.6.26-rc2-mm1/include/linux/mmzone.h
===================================================================
--- linux-2.6.26-rc2-mm1.orig/include/linux/mmzone.h 2008-05-23 14:21:21.000000000 -0400
+++ linux-2.6.26-rc2-mm1/include/linux/mmzone.h 2008-05-23 14:21:33.000000000 -0400
@@ -81,8 +81,8 @@ struct zone_padding {
enum zone_stat_item {
/* First 128 byte cacheline (assuming 64 bit words) */
NR_FREE_PAGES,
- NR_INACTIVE,
- NR_ACTIVE,
+ NR_INACTIVE, /* must match order of LRU_[IN]ACTIVE */
+ NR_ACTIVE, /* " " " " " */
NR_ANON_PAGES, /* Mapped anonymous pages */
NR_FILE_MAPPED, /* pagecache pages mapped into pagetables.
only modified from process context */
@@ -107,6 +107,13 @@ enum zone_stat_item {
#endif
NR_VM_ZONE_STAT_ITEMS };
+enum lru_list {
+ LRU_INACTIVE, /* must match order of NR_[IN]ACTIVE */
+ LRU_ACTIVE, /* " " " " " */
+ NR_LRU_LISTS };
+
+#define for_each_lru(l) for (l = 0; l < NR_LRU_LISTS; l++)
+
struct per_cpu_pages {
int count; /* number of pages in the list */
int high; /* high watermark, emptying needed */
@@ -251,10 +258,8 @@ struct zone {
/* Fields commonly accessed by the page reclaim scanner */
spinlock_t lru_lock;
- struct list_head active_list;
- struct list_head inactive_list;
- unsigned long nr_scan_active;
- unsigned long nr_scan_inactive;
+ struct list_head list[NR_LRU_LISTS];
+ unsigned long nr_scan[NR_LRU_LISTS];
unsigned long pages_scanned; /* since last reclaim */
unsigned long flags; /* zone flags, see below */
Index: linux-2.6.26-rc2-mm1/include/linux/mm_inline.h
===================================================================
--- linux-2.6.26-rc2-mm1.orig/include/linux/mm_inline.h 2008-05-23 14:21:21.000000000 -0400
+++ linux-2.6.26-rc2-mm1/include/linux/mm_inline.h 2008-05-23 14:21:33.000000000 -0400
@@ -1,40 +1,51 @@
static inline void
+add_page_to_lru_list(struct zone *zone, struct page *page, enum lru_list l)
+{
+ list_add(&page->lru, &zone->list[l]);
+ __inc_zone_state(zone, NR_INACTIVE + l);
+}
+
+static inline void
+del_page_from_lru_list(struct zone *zone, struct page *page, enum lru_list l)
+{
+ list_del(&page->lru);
+ __dec_zone_state(zone, NR_INACTIVE + l);
+}
+
+static inline void
add_page_to_active_list(struct zone *zone, struct page *page)
{
- list_add(&page->lru, &zone->active_list);
- __inc_zone_state(zone, NR_ACTIVE);
+ add_page_to_lru_list(zone, page, LRU_ACTIVE);
}
static inline void
add_page_to_inactive_list(struct zone *zone, struct page *page)
{
- list_add(&page->lru, &zone->inactive_list);
- __inc_zone_state(zone, NR_INACTIVE);
+ add_page_to_lru_list(zone, page, LRU_INACTIVE);
}
static inline void
del_page_from_active_list(struct zone *zone, struct page *page)
{
- list_del(&page->lru);
- __dec_zone_state(zone, NR_ACTIVE);
+ del_page_from_lru_list(zone, page, LRU_ACTIVE);
}
static inline void
del_page_from_inactive_list(struct zone *zone, struct page *page)
{
- list_del(&page->lru);
- __dec_zone_state(zone, NR_INACTIVE);
+ del_page_from_lru_list(zone, page, LRU_INACTIVE);
}
static inline void
del_page_from_lru(struct zone *zone, struct page *page)
{
+ enum lru_list l = LRU_INACTIVE;
+
list_del(&page->lru);
if (PageActive(page)) {
__ClearPageActive(page);
- __dec_zone_state(zone, NR_ACTIVE);
- } else {
- __dec_zone_state(zone, NR_INACTIVE);
+ l = LRU_ACTIVE;
}
+ __dec_zone_state(zone, NR_INACTIVE + l);
}
Index: linux-2.6.26-rc2-mm1/mm/page_alloc.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/mm/page_alloc.c 2008-05-23 14:21:21.000000000 -0400
+++ linux-2.6.26-rc2-mm1/mm/page_alloc.c 2008-05-23 14:21:33.000000000 -0400
@@ -3444,6 +3444,7 @@ static void __paginginit free_area_init_
for (j = 0; j < MAX_NR_ZONES; j++) {
struct zone *zone = pgdat->node_zones + j;
unsigned long size, realsize, memmap_pages;
+ enum lru_list l;
size = zone_spanned_pages_in_node(nid, j, zones_size);
realsize = size - zone_absent_pages_in_node(nid, j,
@@ -3494,10 +3495,10 @@ static void __paginginit free_area_init_
zone->prev_priority = DEF_PRIORITY;
zone_pcp_init(zone);
- INIT_LIST_HEAD(&zone->active_list);
- INIT_LIST_HEAD(&zone->inactive_list);
- zone->nr_scan_active = 0;
- zone->nr_scan_inactive = 0;
+ for_each_lru(l) {
+ INIT_LIST_HEAD(&zone->list[l]);
+ zone->nr_scan[l] = 0;
+ }
zap_zone_vm_stats(zone);
zone->flags = 0;
if (!size)
Index: linux-2.6.26-rc2-mm1/mm/swap.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/mm/swap.c 2008-05-23 14:21:21.000000000 -0400
+++ linux-2.6.26-rc2-mm1/mm/swap.c 2008-05-23 14:21:33.000000000 -0400
@@ -117,7 +117,7 @@ static void pagevec_move_tail(struct pag
spin_lock(&zone->lru_lock);
}
if (PageLRU(page) && !PageActive(page)) {
- list_move_tail(&page->lru, &zone->inactive_list);
+ list_move_tail(&page->lru, &zone->list[LRU_INACTIVE]);
pgmoved++;
}
}
Index: linux-2.6.26-rc2-mm1/mm/vmscan.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/mm/vmscan.c 2008-05-23 14:21:32.000000000 -0400
+++ linux-2.6.26-rc2-mm1/mm/vmscan.c 2008-05-23 14:21:33.000000000 -0400
@@ -791,10 +791,10 @@ static unsigned long isolate_pages_globa
int active)
{
if (active)
- return isolate_lru_pages(nr, &z->active_list, dst,
+ return isolate_lru_pages(nr, &z->list[LRU_ACTIVE], dst,
scanned, order, mode);
else
- return isolate_lru_pages(nr, &z->inactive_list, dst,
+ return isolate_lru_pages(nr, &z->list[LRU_INACTIVE], dst,
scanned, order, mode);
}
@@ -945,10 +945,7 @@ static unsigned long shrink_inactive_lis
VM_BUG_ON(PageLRU(page));
SetPageLRU(page);
list_del(&page->lru);
- if (PageActive(page))
- add_page_to_active_list(zone, page);
- else
- add_page_to_inactive_list(zone, page);
+ add_page_to_lru_list(zone, page, PageActive(page));
if (!pagevec_add(&pvec, page)) {
spin_unlock_irq(&zone->lru_lock);
__pagevec_release(&pvec);
@@ -1116,8 +1113,8 @@ static void shrink_active_list(unsigned
int pgdeactivate = 0;
unsigned long pgscanned;
LIST_HEAD(l_hold); /* The pages which were snipped off */
- LIST_HEAD(l_inactive); /* Pages to go onto the inactive_list */
- LIST_HEAD(l_active); /* Pages to go onto the active_list */
+ LIST_HEAD(l_active);
+ LIST_HEAD(l_inactive);
struct page *page;
struct pagevec pvec;
int reclaim_mapped = 0;
@@ -1169,7 +1166,7 @@ static void shrink_active_list(unsigned
VM_BUG_ON(!PageActive(page));
ClearPageActive(page);
- list_move(&page->lru, &zone->inactive_list);
+ list_move(&page->lru, &zone->list[LRU_INACTIVE]);
mem_cgroup_move_lists(page, false);
pgmoved++;
if (!pagevec_add(&pvec, page)) {
@@ -1199,7 +1196,7 @@ static void shrink_active_list(unsigned
SetPageLRU(page);
VM_BUG_ON(!PageActive(page));
- list_move(&page->lru, &zone->active_list);
+ list_move(&page->lru, &zone->list[LRU_ACTIVE]);
mem_cgroup_move_lists(page, true);
pgmoved++;
if (!pagevec_add(&pvec, page)) {
@@ -1219,65 +1216,64 @@ static void shrink_active_list(unsigned
pagevec_release(&pvec);
}
+static unsigned long shrink_list(enum lru_list l, unsigned long nr_to_scan,
+ struct zone *zone, struct scan_control *sc, int priority)
+{
+ if (l == LRU_ACTIVE) {
+ shrink_active_list(nr_to_scan, zone, sc, priority);
+ return 0;
+ }
+ return shrink_inactive_list(nr_to_scan, zone, sc);
+}
+
/*
* This is a basic per-zone page freer. Used by both kswapd and direct reclaim.
*/
static unsigned long shrink_zone(int priority, struct zone *zone,
struct scan_control *sc)
{
- unsigned long nr_active;
- unsigned long nr_inactive;
+ unsigned long nr[NR_LRU_LISTS];
unsigned long nr_to_scan;
unsigned long nr_reclaimed = 0;
+ enum lru_list l;
if (scan_global_lru(sc)) {
/*
* Add one to nr_to_scan just to make sure that the kernel
* will slowly sift through the active list.
*/
- zone->nr_scan_active +=
- (zone_page_state(zone, NR_ACTIVE) >> priority) + 1;
- nr_active = zone->nr_scan_active;
- zone->nr_scan_inactive +=
- (zone_page_state(zone, NR_INACTIVE) >> priority) + 1;
- nr_inactive = zone->nr_scan_inactive;
- if (nr_inactive >= sc->swap_cluster_max)
- zone->nr_scan_inactive = 0;
- else
- nr_inactive = 0;
-
- if (nr_active >= sc->swap_cluster_max)
- zone->nr_scan_active = 0;
- else
- nr_active = 0;
+ for_each_lru(l) {
+ zone->nr_scan[l] += (zone_page_state(zone,
+ NR_INACTIVE + l) >> priority) + 1;
+ nr[l] = zone->nr_scan[l];
+ if (nr[l] >= sc->swap_cluster_max)
+ zone->nr_scan[l] = 0;
+ else
+ nr[l] = 0;
+ }
} else {
/*
* This reclaim occurs not because zone memory shortage but
* because memory controller hits its limit.
* Then, don't modify zone reclaim related data.
*/
- nr_active = mem_cgroup_calc_reclaim_active(sc->mem_cgroup,
- zone, priority);
+ nr[LRU_ACTIVE] = mem_cgroup_calc_reclaim(sc->mem_cgroup,
+ zone, priority, LRU_ACTIVE);
- nr_inactive = mem_cgroup_calc_reclaim_inactive(sc->mem_cgroup,
- zone, priority);
+ nr[LRU_INACTIVE] = mem_cgroup_calc_reclaim(sc->mem_cgroup,
+ zone, priority, LRU_INACTIVE);
}
-
- while (nr_active || nr_inactive) {
- if (nr_active) {
- nr_to_scan = min(nr_active,
+ while (nr[LRU_ACTIVE] || nr[LRU_INACTIVE]) {
+ for_each_lru(l) {
+ if (nr[l]) {
+ nr_to_scan = min(nr[l],
(unsigned long)sc->swap_cluster_max);
- nr_active -= nr_to_scan;
- shrink_active_list(nr_to_scan, zone, sc, priority);
- }
+ nr[l] -= nr_to_scan;
- if (nr_inactive) {
- nr_to_scan = min(nr_inactive,
- (unsigned long)sc->swap_cluster_max);
- nr_inactive -= nr_to_scan;
- nr_reclaimed += shrink_inactive_list(nr_to_scan, zone,
- sc);
+ nr_reclaimed += shrink_list(l, nr_to_scan,
+ zone, sc, priority);
+ }
}
}
@@ -1790,6 +1786,7 @@ static unsigned long shrink_all_zones(un
{
struct zone *zone;
unsigned long nr_to_scan, ret = 0;
+ enum lru_list l;
for_each_zone(zone) {
@@ -1799,28 +1796,25 @@ static unsigned long shrink_all_zones(un
if (zone_is_all_unreclaimable(zone) && prio != DEF_PRIORITY)
continue;
- /* For pass = 0 we don't shrink the active list */
- if (pass > 0) {
- zone->nr_scan_active +=
- (zone_page_state(zone, NR_ACTIVE) >> prio) + 1;
- if (zone->nr_scan_active >= nr_pages || pass > 3) {
- zone->nr_scan_active = 0;
+ for_each_lru(l) {
+ /* For pass = 0 we don't shrink the active list */
+ if (pass == 0 && l == LRU_ACTIVE)
+ continue;
+
+ zone->nr_scan[l] +=
+ (zone_page_state(zone, NR_INACTIVE + l)
+ >> prio) + 1;
+ if (zone->nr_scan[l] >= nr_pages || pass > 3) {
+ zone->nr_scan[l] = 0;
nr_to_scan = min(nr_pages,
- zone_page_state(zone, NR_ACTIVE));
- shrink_active_list(nr_to_scan, zone, sc, prio);
+ zone_page_state(zone,
+ NR_INACTIVE + l));
+ ret += shrink_list(l, nr_to_scan, zone,
+ sc, prio);
+ if (ret >= nr_pages)
+ return ret;
}
}
-
- zone->nr_scan_inactive +=
- (zone_page_state(zone, NR_INACTIVE) >> prio) + 1;
- if (zone->nr_scan_inactive >= nr_pages || pass > 3) {
- zone->nr_scan_inactive = 0;
- nr_to_scan = min(nr_pages,
- zone_page_state(zone, NR_INACTIVE));
- ret += shrink_inactive_list(nr_to_scan, zone, sc);
- if (ret >= nr_pages)
- return ret;
- }
}
return ret;
Index: linux-2.6.26-rc2-mm1/mm/vmstat.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/mm/vmstat.c 2008-05-23 14:21:21.000000000 -0400
+++ linux-2.6.26-rc2-mm1/mm/vmstat.c 2008-05-23 14:21:33.000000000 -0400
@@ -772,7 +772,8 @@ static void zoneinfo_show_print(struct s
zone->pages_low,
zone->pages_high,
zone->pages_scanned,
- zone->nr_scan_active, zone->nr_scan_inactive,
+ zone->nr_scan[LRU_ACTIVE],
+ zone->nr_scan[LRU_INACTIVE],
zone->spanned_pages,
zone->present_pages);
Index: linux-2.6.26-rc2-mm1/include/linux/memcontrol.h
===================================================================
--- linux-2.6.26-rc2-mm1.orig/include/linux/memcontrol.h 2008-05-23 14:21:21.000000000 -0400
+++ linux-2.6.26-rc2-mm1/include/linux/memcontrol.h 2008-05-23 14:21:33.000000000 -0400
@@ -67,10 +67,8 @@ extern void mem_cgroup_note_reclaim_prio
extern void mem_cgroup_record_reclaim_priority(struct mem_cgroup *mem,
int priority);
-extern long mem_cgroup_calc_reclaim_active(struct mem_cgroup *mem,
- struct zone *zone, int priority);
-extern long mem_cgroup_calc_reclaim_inactive(struct mem_cgroup *mem,
- struct zone *zone, int priority);
+extern long mem_cgroup_calc_reclaim(struct mem_cgroup *mem, struct zone *zone,
+ int priority, enum lru_list lru);
#else /* CONFIG_CGROUP_MEM_RES_CTLR */
static inline void page_reset_bad_cgroup(struct page *page)
@@ -152,14 +150,9 @@ static inline void mem_cgroup_record_rec
{
}
-static inline long mem_cgroup_calc_reclaim_active(struct mem_cgroup *mem,
- struct zone *zone, int priority)
-{
- return 0;
-}
-
-static inline long mem_cgroup_calc_reclaim_inactive(struct mem_cgroup *mem,
- struct zone *zone, int priority)
+static inline long mem_cgroup_calc_reclaim(struct mem_cgroup *mem,
+ struct zone *zone, int priority,
+ enum lru_list lru)
{
return 0;
}
Index: linux-2.6.26-rc2-mm1/mm/memcontrol.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/mm/memcontrol.c 2008-05-23 14:21:21.000000000 -0400
+++ linux-2.6.26-rc2-mm1/mm/memcontrol.c 2008-05-23 14:21:33.000000000 -0400
@@ -32,6 +32,7 @@
#include <linux/fs.h>
#include <linux/seq_file.h>
#include <linux/vmalloc.h>
+#include <linux/mm_inline.h>
#include <asm/uaccess.h>
@@ -85,22 +86,13 @@ static s64 mem_cgroup_read_stat(struct m
/*
* per-zone information in memory controller.
*/
-
-enum mem_cgroup_zstat_index {
- MEM_CGROUP_ZSTAT_ACTIVE,
- MEM_CGROUP_ZSTAT_INACTIVE,
-
- NR_MEM_CGROUP_ZSTAT,
-};
-
struct mem_cgroup_per_zone {
/*
* spin_lock to protect the per cgroup LRU
*/
spinlock_t lru_lock;
- struct list_head active_list;
- struct list_head inactive_list;
- unsigned long count[NR_MEM_CGROUP_ZSTAT];
+ struct list_head lists[NR_LRU_LISTS];
+ unsigned long count[NR_LRU_LISTS];
};
/* Macro for accessing counter */
#define MEM_CGROUP_ZSTAT(mz, idx) ((mz)->count[(idx)])
@@ -227,7 +219,7 @@ page_cgroup_zoneinfo(struct page_cgroup
}
static unsigned long mem_cgroup_get_all_zonestat(struct mem_cgroup *mem,
- enum mem_cgroup_zstat_index idx)
+ enum lru_list idx)
{
int nid, zid;
struct mem_cgroup_per_zone *mz;
@@ -289,11 +281,9 @@ static void __mem_cgroup_remove_list(str
struct page_cgroup *pc)
{
int from = pc->flags & PAGE_CGROUP_FLAG_ACTIVE;
+ int lru = !!from;
- if (from)
- MEM_CGROUP_ZSTAT(mz, MEM_CGROUP_ZSTAT_ACTIVE) -= 1;
- else
- MEM_CGROUP_ZSTAT(mz, MEM_CGROUP_ZSTAT_INACTIVE) -= 1;
+ MEM_CGROUP_ZSTAT(mz, lru) -= 1;
mem_cgroup_charge_statistics(pc->mem_cgroup, pc->flags, false);
list_del(&pc->lru);
@@ -302,37 +292,35 @@ static void __mem_cgroup_remove_list(str
static void __mem_cgroup_add_list(struct mem_cgroup_per_zone *mz,
struct page_cgroup *pc)
{
- int to = pc->flags & PAGE_CGROUP_FLAG_ACTIVE;
+ int lru = LRU_INACTIVE;
+
+ if (pc->flags & PAGE_CGROUP_FLAG_ACTIVE)
+ lru += LRU_ACTIVE;
+
+ MEM_CGROUP_ZSTAT(mz, lru) += 1;
+ list_add(&pc->lru, &mz->lists[lru]);
- if (!to) {
- MEM_CGROUP_ZSTAT(mz, MEM_CGROUP_ZSTAT_INACTIVE) += 1;
- list_add(&pc->lru, &mz->inactive_list);
- } else {
- MEM_CGROUP_ZSTAT(mz, MEM_CGROUP_ZSTAT_ACTIVE) += 1;
- list_add(&pc->lru, &mz->active_list);
- }
mem_cgroup_charge_statistics(pc->mem_cgroup, pc->flags, true);
}
static void __mem_cgroup_move_lists(struct page_cgroup *pc, bool active)
{
- int from = pc->flags & PAGE_CGROUP_FLAG_ACTIVE;
struct mem_cgroup_per_zone *mz = page_cgroup_zoneinfo(pc);
+ int lru = LRU_INACTIVE;
- if (from)
- MEM_CGROUP_ZSTAT(mz, MEM_CGROUP_ZSTAT_ACTIVE) -= 1;
- else
- MEM_CGROUP_ZSTAT(mz, MEM_CGROUP_ZSTAT_INACTIVE) -= 1;
+ if (pc->flags & PAGE_CGROUP_FLAG_ACTIVE)
+ lru += LRU_ACTIVE;
- if (active) {
- MEM_CGROUP_ZSTAT(mz, MEM_CGROUP_ZSTAT_ACTIVE) += 1;
+ MEM_CGROUP_ZSTAT(mz, lru) -= 1;
+
+ if (active)
pc->flags |= PAGE_CGROUP_FLAG_ACTIVE;
- list_move(&pc->lru, &mz->active_list);
- } else {
- MEM_CGROUP_ZSTAT(mz, MEM_CGROUP_ZSTAT_INACTIVE) += 1;
+ else
pc->flags &= ~PAGE_CGROUP_FLAG_ACTIVE;
- list_move(&pc->lru, &mz->inactive_list);
- }
+
+ lru = !!active;
+ MEM_CGROUP_ZSTAT(mz, lru) += 1;
+ list_move(&pc->lru, &mz->lists[lru]);
}
int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem)
@@ -401,8 +389,8 @@ long mem_cgroup_reclaim_imbalance(struct
{
unsigned long active, inactive;
/* active and inactive are the number of pages. 'long' is ok.*/
- active = mem_cgroup_get_all_zonestat(mem, MEM_CGROUP_ZSTAT_ACTIVE);
- inactive = mem_cgroup_get_all_zonestat(mem, MEM_CGROUP_ZSTAT_INACTIVE);
+ active = mem_cgroup_get_all_zonestat(mem, LRU_ACTIVE);
+ inactive = mem_cgroup_get_all_zonestat(mem, LRU_INACTIVE);
return (long) (active / (inactive + 1));
}
@@ -433,28 +421,17 @@ void mem_cgroup_record_reclaim_priority(
* (see include/linux/mmzone.h)
*/
-long mem_cgroup_calc_reclaim_active(struct mem_cgroup *mem,
- struct zone *zone, int priority)
+long mem_cgroup_calc_reclaim(struct mem_cgroup *mem, struct zone *zone,
+ int priority, enum lru_list lru)
{
- long nr_active;
+ long nr_pages;
int nid = zone->zone_pgdat->node_id;
int zid = zone_idx(zone);
struct mem_cgroup_per_zone *mz = mem_cgroup_zoneinfo(mem, nid, zid);
- nr_active = MEM_CGROUP_ZSTAT(mz, MEM_CGROUP_ZSTAT_ACTIVE);
- return (nr_active >> priority);
-}
-
-long mem_cgroup_calc_reclaim_inactive(struct mem_cgroup *mem,
- struct zone *zone, int priority)
-{
- long nr_inactive;
- int nid = zone->zone_pgdat->node_id;
- int zid = zone_idx(zone);
- struct mem_cgroup_per_zone *mz = mem_cgroup_zoneinfo(mem, nid, zid);
+ nr_pages = MEM_CGROUP_ZSTAT(mz, lru);
- nr_inactive = MEM_CGROUP_ZSTAT(mz, MEM_CGROUP_ZSTAT_INACTIVE);
- return (nr_inactive >> priority);
+ return (nr_pages >> priority);
}
unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
@@ -473,14 +450,11 @@ unsigned long mem_cgroup_isolate_pages(u
int nid = z->zone_pgdat->node_id;
int zid = zone_idx(z);
struct mem_cgroup_per_zone *mz;
+ int lru = !!active;
BUG_ON(!mem_cont);
mz = mem_cgroup_zoneinfo(mem_cont, nid, zid);
- if (active)
- src = &mz->active_list;
- else
- src = &mz->inactive_list;
-
+ src = &mz->lists[lru];
spin_lock(&mz->lru_lock);
scan = 0;
@@ -771,7 +745,7 @@ void mem_cgroup_end_migration(struct pag
#define FORCE_UNCHARGE_BATCH (128)
static void mem_cgroup_force_empty_list(struct mem_cgroup *mem,
struct mem_cgroup_per_zone *mz,
- int active)
+ enum lru_list lru)
{
struct page_cgroup *pc;
struct page *page;
@@ -779,10 +753,7 @@ static void mem_cgroup_force_empty_list(
unsigned long flags;
struct list_head *list;
- if (active)
- list = &mz->active_list;
- else
- list = &mz->inactive_list;
+ list = &mz->lists[lru];
spin_lock_irqsave(&mz->lru_lock, flags);
while (!list_empty(list)) {
@@ -832,11 +803,10 @@ static int mem_cgroup_force_empty(struct
for_each_node_state(node, N_POSSIBLE)
for (zid = 0; zid < MAX_NR_ZONES; zid++) {
struct mem_cgroup_per_zone *mz;
+ enum lru_list l;
mz = mem_cgroup_zoneinfo(mem, node, zid);
- /* drop all page_cgroup in active_list */
- mem_cgroup_force_empty_list(mem, mz, 1);
- /* drop all page_cgroup in inactive_list */
- mem_cgroup_force_empty_list(mem, mz, 0);
+ for_each_lru(l)
+ mem_cgroup_force_empty_list(mem, mz, l);
}
}
ret = 0;
@@ -923,9 +893,9 @@ static int mem_control_stat_show(struct
unsigned long active, inactive;
inactive = mem_cgroup_get_all_zonestat(mem_cont,
- MEM_CGROUP_ZSTAT_INACTIVE);
+ LRU_INACTIVE);
active = mem_cgroup_get_all_zonestat(mem_cont,
- MEM_CGROUP_ZSTAT_ACTIVE);
+ LRU_ACTIVE);
cb->fill(cb, "active", (active) * PAGE_SIZE);
cb->fill(cb, "inactive", (inactive) * PAGE_SIZE);
}
@@ -970,6 +940,7 @@ static int alloc_mem_cgroup_per_zone_inf
{
struct mem_cgroup_per_node *pn;
struct mem_cgroup_per_zone *mz;
+ enum lru_list l;
int zone, tmp = node;
/*
* This routine is called against possible nodes.
@@ -990,9 +961,9 @@ static int alloc_mem_cgroup_per_zone_inf
for (zone = 0; zone < MAX_NR_ZONES; zone++) {
mz = &pn->zoneinfo[zone];
- INIT_LIST_HEAD(&mz->active_list);
- INIT_LIST_HEAD(&mz->inactive_list);
spin_lock_init(&mz->lru_lock);
+ for_each_lru(l)
+ INIT_LIST_HEAD(&mz->lists[l]);
}
return 0;
}
--
All Rights Reversed
* [PATCH -mm 03/12] use an array for the LRU pagevecs
From: Rik van Riel @ 2008-05-29 20:22 UTC
To: linux-kernel
Cc: Andrew Morton, Lee Schermerhorn, Kosaki Motohiro
[-- Attachment #1: pagevec-array.patch --]
[-- Type: text/plain, Size: 9283 bytes --]
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Turn the pagevecs into an array just like the LRUs. This significantly
cleans up the source code and reduces the size of the kernel by about
13kB after all the LRU lists have been created further down in the split
VM patch series.
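The per-CPU pagevecs now mirror the LRU array, so one helper covers
all the lists; a sketch of the resulting interfaces, taken from the
hunks below:

        static DEFINE_PER_CPU(struct pagevec[NR_LRU_LISTS], lru_add_pvecs);

        void __lru_cache_add(struct page *page, enum lru_list lru);

        /* the old entry points become trivial wrappers */
        lru_cache_add(page);            /* __lru_cache_add(page, LRU_INACTIVE) */
        lru_cache_add_active(page);     /* __lru_cache_add(page, LRU_ACTIVE)   */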
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
include/linux/mmzone.h | 13 +++++-
include/linux/pagevec.h | 13 +++++-
include/linux/swap.h | 18 ++++++++-
mm/migrate.c | 11 -----
mm/swap.c | 96 ++++++++++++++++++++++--------------------------
5 files changed, 83 insertions(+), 68 deletions(-)
Index: linux-2.6.26-rc2-mm1/include/linux/mmzone.h
===================================================================
--- linux-2.6.26-rc2-mm1.orig/include/linux/mmzone.h 2008-05-23 14:21:33.000000000 -0400
+++ linux-2.6.26-rc2-mm1/include/linux/mmzone.h 2008-05-23 14:21:33.000000000 -0400
@@ -107,13 +107,22 @@ enum zone_stat_item {
#endif
NR_VM_ZONE_STAT_ITEMS };
+#define LRU_BASE 0
+
enum lru_list {
- LRU_INACTIVE, /* must match order of NR_[IN]ACTIVE */
- LRU_ACTIVE, /* " " " " " */
+ LRU_INACTIVE = LRU_BASE, /* must match order of NR_[IN]ACTIVE */
+ LRU_ACTIVE, /* " " " " " */
NR_LRU_LISTS };
#define for_each_lru(l) for (l = 0; l < NR_LRU_LISTS; l++)
+static inline int is_active_lru(enum lru_list l)
+{
+ return (l == LRU_ACTIVE);
+}
+
+enum lru_list page_lru(struct page *page);
+
struct per_cpu_pages {
int count; /* number of pages in the list */
int high; /* high watermark, emptying needed */
Index: linux-2.6.26-rc2-mm1/mm/swap.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/mm/swap.c 2008-05-23 14:21:33.000000000 -0400
+++ linux-2.6.26-rc2-mm1/mm/swap.c 2008-05-23 14:21:33.000000000 -0400
@@ -34,8 +34,7 @@
/* How many pages do we try to swap or page in/out together? */
int page_cluster;
-static DEFINE_PER_CPU(struct pagevec, lru_add_pvecs) = { 0, };
-static DEFINE_PER_CPU(struct pagevec, lru_add_active_pvecs) = { 0, };
+static DEFINE_PER_CPU(struct pagevec[NR_LRU_LISTS], lru_add_pvecs) = { {0,}, };
static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs) = { 0, };
/*
@@ -96,6 +95,23 @@ void put_pages_list(struct list_head *pa
}
EXPORT_SYMBOL(put_pages_list);
+/**
+ * page_lru - which LRU list should a page be on?
+ * @page: the page to test
+ *
+ * Returns the LRU list a page should be on, as an index
+ * into the array of LRU lists.
+ */
+enum lru_list page_lru(struct page *page)
+{
+ enum lru_list lru = LRU_BASE;
+
+ if (PageActive(page))
+ lru += LRU_ACTIVE;
+
+ return lru;
+}
+
/*
* pagevec_move_tail() must be called with IRQ disabled.
* Otherwise this may cause nasty races.
@@ -186,28 +202,29 @@ void mark_page_accessed(struct page *pag
EXPORT_SYMBOL(mark_page_accessed);
-/**
- * lru_cache_add: add a page to the page lists
- * @page: the page to add
- */
-void lru_cache_add(struct page *page)
+void __lru_cache_add(struct page *page, enum lru_list lru)
{
- struct pagevec *pvec = &get_cpu_var(lru_add_pvecs);
+ struct pagevec *pvec = &get_cpu_var(lru_add_pvecs)[lru];
page_cache_get(page);
if (!pagevec_add(pvec, page))
- __pagevec_lru_add(pvec);
+ ____pagevec_lru_add(pvec, lru);
put_cpu_var(lru_add_pvecs);
}
-void lru_cache_add_active(struct page *page)
+/**
+ * lru_cache_add_lru - add a page to a page list
+ * @page: the page to be added to the LRU.
+ * @lru: the LRU list to which the page is added.
+ */
+void lru_cache_add_lru(struct page *page, enum lru_list lru)
{
- struct pagevec *pvec = &get_cpu_var(lru_add_active_pvecs);
+ if (PageActive(page)) {
+ ClearPageActive(page);
+ }
- page_cache_get(page);
- if (!pagevec_add(pvec, page))
- __pagevec_lru_add_active(pvec);
- put_cpu_var(lru_add_active_pvecs);
+ VM_BUG_ON(PageLRU(page) || PageActive(page));
+ __lru_cache_add(page, lru);
}
/*
@@ -217,15 +234,15 @@ void lru_cache_add_active(struct page *p
*/
static void drain_cpu_pagevecs(int cpu)
{
+ struct pagevec *pvecs = per_cpu(lru_add_pvecs, cpu);
struct pagevec *pvec;
+ int lru;
- pvec = &per_cpu(lru_add_pvecs, cpu);
- if (pagevec_count(pvec))
- __pagevec_lru_add(pvec);
-
- pvec = &per_cpu(lru_add_active_pvecs, cpu);
- if (pagevec_count(pvec))
- __pagevec_lru_add_active(pvec);
+ for_each_lru(lru) {
+ pvec = &pvecs[lru - LRU_BASE];
+ if (pagevec_count(pvec))
+ ____pagevec_lru_add(pvec, lru);
+ }
pvec = &per_cpu(lru_rotate_pvecs, cpu);
if (pagevec_count(pvec)) {
@@ -379,7 +396,7 @@ void __pagevec_release_nonlru(struct pag
* Add the passed pages to the LRU, then drop the caller's refcount
* on them. Reinitialises the caller's pagevec.
*/
-void __pagevec_lru_add(struct pagevec *pvec)
+void ____pagevec_lru_add(struct pagevec *pvec, enum lru_list lru)
{
int i;
struct zone *zone = NULL;
@@ -396,7 +413,9 @@ void __pagevec_lru_add(struct pagevec *p
}
VM_BUG_ON(PageLRU(page));
SetPageLRU(page);
- add_page_to_inactive_list(zone, page);
+ if (is_active_lru(lru))
+ SetPageActive(page);
+ add_page_to_lru_list(zone, page, lru);
}
if (zone)
spin_unlock_irq(&zone->lru_lock);
@@ -404,34 +423,7 @@ void __pagevec_lru_add(struct pagevec *p
pagevec_reinit(pvec);
}
-EXPORT_SYMBOL(__pagevec_lru_add);
-
-void __pagevec_lru_add_active(struct pagevec *pvec)
-{
- int i;
- struct zone *zone = NULL;
-
- for (i = 0; i < pagevec_count(pvec); i++) {
- struct page *page = pvec->pages[i];
- struct zone *pagezone = page_zone(page);
-
- if (pagezone != zone) {
- if (zone)
- spin_unlock_irq(&zone->lru_lock);
- zone = pagezone;
- spin_lock_irq(&zone->lru_lock);
- }
- VM_BUG_ON(PageLRU(page));
- SetPageLRU(page);
- VM_BUG_ON(PageActive(page));
- SetPageActive(page);
- add_page_to_active_list(zone, page);
- }
- if (zone)
- spin_unlock_irq(&zone->lru_lock);
- release_pages(pvec->pages, pvec->nr, pvec->cold);
- pagevec_reinit(pvec);
-}
+EXPORT_SYMBOL(____pagevec_lru_add);
/*
* Try to drop buffers from the pages in a pagevec
Index: linux-2.6.26-rc2-mm1/include/linux/pagevec.h
===================================================================
--- linux-2.6.26-rc2-mm1.orig/include/linux/pagevec.h 2008-05-23 14:21:21.000000000 -0400
+++ linux-2.6.26-rc2-mm1/include/linux/pagevec.h 2008-05-23 14:21:33.000000000 -0400
@@ -23,8 +23,7 @@ struct pagevec {
void __pagevec_release(struct pagevec *pvec);
void __pagevec_release_nonlru(struct pagevec *pvec);
void __pagevec_free(struct pagevec *pvec);
-void __pagevec_lru_add(struct pagevec *pvec);
-void __pagevec_lru_add_active(struct pagevec *pvec);
+void ____pagevec_lru_add(struct pagevec *pvec, enum lru_list lru);
void pagevec_strip(struct pagevec *pvec);
unsigned pagevec_lookup(struct pagevec *pvec, struct address_space *mapping,
pgoff_t start, unsigned nr_pages);
@@ -81,6 +80,16 @@ static inline void pagevec_free(struct p
__pagevec_free(pvec);
}
+static inline void __pagevec_lru_add(struct pagevec *pvec)
+{
+ ____pagevec_lru_add(pvec, LRU_INACTIVE);
+}
+
+static inline void __pagevec_lru_add_active(struct pagevec *pvec)
+{
+ ____pagevec_lru_add(pvec, LRU_ACTIVE);
+}
+
static inline void pagevec_lru_add(struct pagevec *pvec)
{
if (pagevec_count(pvec))
Index: linux-2.6.26-rc2-mm1/include/linux/swap.h
===================================================================
--- linux-2.6.26-rc2-mm1.orig/include/linux/swap.h 2008-05-23 14:21:21.000000000 -0400
+++ linux-2.6.26-rc2-mm1/include/linux/swap.h 2008-05-23 14:21:33.000000000 -0400
@@ -171,8 +171,8 @@ extern unsigned int nr_free_pagecache_pa
/* linux/mm/swap.c */
-extern void lru_cache_add(struct page *);
-extern void lru_cache_add_active(struct page *);
+extern void __lru_cache_add(struct page *, enum lru_list lru);
+extern void lru_cache_add_lru(struct page *, enum lru_list lru);
extern void activate_page(struct page *);
extern void mark_page_accessed(struct page *);
extern void lru_add_drain(void);
@@ -180,6 +180,20 @@ extern int lru_add_drain_all(void);
extern void rotate_reclaimable_page(struct page *page);
extern void swap_setup(void);
+/**
+ * lru_cache_add: add a page to the page lists
+ * @page: the page to add
+ */
+static inline void lru_cache_add(struct page *page)
+{
+ __lru_cache_add(page, LRU_INACTIVE);
+}
+
+static inline void lru_cache_add_active(struct page *page)
+{
+ __lru_cache_add(page, LRU_ACTIVE);
+}
+
/* linux/mm/vmscan.c */
extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
gfp_t gfp_mask);
Index: linux-2.6.26-rc2-mm1/mm/migrate.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/mm/migrate.c 2008-05-23 14:21:32.000000000 -0400
+++ linux-2.6.26-rc2-mm1/mm/migrate.c 2008-05-23 14:21:33.000000000 -0400
@@ -55,16 +55,7 @@ int migrate_prep(void)
static inline void move_to_lru(struct page *page)
{
- if (PageActive(page)) {
- /*
- * lru_cache_add_active checks that
- * the PG_active bit is off.
- */
- ClearPageActive(page);
- lru_cache_add_active(page);
- } else {
- lru_cache_add(page);
- }
+ lru_cache_add_lru(page, page_lru(page));
put_page(page);
}
--
All Rights Reversed
* [PATCH -mm 04/12] free swap space on swap-in/activation
From: Rik van Riel @ 2008-05-29 20:22 UTC
To: linux-kernel
Cc: Andrew Morton, Lee Schermerhorn, Kosaki Motohiro, MinChan Kim
[-- Attachment #1: rvr-00-linux-2.6-swapfree.patch --]
[-- Type: text/plain, Size: 6190 bytes --]
From: Rik van Riel <riel@redhat.com>
Free swap cache entries when swapping in pages if vm_swap_full()
(i.e. more than half of the available swap space is in use). A new
pagevec operation is used to reduce pressure on the locks.
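The reference count test in remove_exclusive_swap_page_count() below
works out as follows: a page that is only in the swap cache, is mapped
by page_mapcount(page) processes, and has the pageout code holding one
extra reference satisfies

        page_count(page) == page_mapcount(page)   /* ptes */
                            + 1                   /* swap cache */
                            + 1                   /* our reference */

which is why remove_exclusive_swap_page_ref() passes
2 + page_mapcount(page) as the expected count.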
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Signed-off-by: MinChan Kim <minchan.kim@gmail.com>
---
include/linux/pagevec.h | 1 +
include/linux/swap.h | 6 ++++++
mm/swap.c | 18 ++++++++++++++++++
mm/swapfile.c | 25 ++++++++++++++++++++++---
mm/vmscan.c | 7 +++++++
5 files changed, 54 insertions(+), 3 deletions(-)
Index: linux-2.6.26-rc2-mm1/mm/vmscan.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/mm/vmscan.c 2008-05-23 14:21:33.000000000 -0400
+++ linux-2.6.26-rc2-mm1/mm/vmscan.c 2008-05-23 14:21:33.000000000 -0400
@@ -619,6 +619,9 @@ free_it:
continue;
activate_locked:
+ /* Not a candidate for swapping, so reclaim swap space. */
+ if (PageSwapCache(page) && vm_swap_full())
+ remove_exclusive_swap_page_ref(page);
SetPageActive(page);
pgactivate++;
keep_locked:
@@ -1203,6 +1206,8 @@ static void shrink_active_list(unsigned
__mod_zone_page_state(zone, NR_ACTIVE, pgmoved);
pgmoved = 0;
spin_unlock_irq(&zone->lru_lock);
+ if (vm_swap_full())
+ pagevec_swap_free(&pvec);
__pagevec_release(&pvec);
spin_lock_irq(&zone->lru_lock);
}
@@ -1212,6 +1217,8 @@ static void shrink_active_list(unsigned
__count_zone_vm_events(PGREFILL, zone, pgscanned);
__count_vm_events(PGDEACTIVATE, pgdeactivate);
spin_unlock_irq(&zone->lru_lock);
+ if (vm_swap_full())
+ pagevec_swap_free(&pvec);
pagevec_release(&pvec);
}
Index: linux-2.6.26-rc2-mm1/mm/swap.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/mm/swap.c 2008-05-23 14:21:33.000000000 -0400
+++ linux-2.6.26-rc2-mm1/mm/swap.c 2008-05-23 14:21:33.000000000 -0400
@@ -443,6 +443,24 @@ void pagevec_strip(struct pagevec *pvec)
}
}
+/*
+ * Try to free swap space from the pages in a pagevec
+ */
+void pagevec_swap_free(struct pagevec *pvec)
+{
+ int i;
+
+ for (i = 0; i < pagevec_count(pvec); i++) {
+ struct page *page = pvec->pages[i];
+
+ if (PageSwapCache(page) && !TestSetPageLocked(page)) {
+ if (PageSwapCache(page))
+ remove_exclusive_swap_page_ref(page);
+ unlock_page(page);
+ }
+ }
+}
+
/**
* pagevec_lookup - gang pagecache lookup
* @pvec: Where the resulting pages are placed
Index: linux-2.6.26-rc2-mm1/include/linux/pagevec.h
===================================================================
--- linux-2.6.26-rc2-mm1.orig/include/linux/pagevec.h 2008-05-23 14:21:33.000000000 -0400
+++ linux-2.6.26-rc2-mm1/include/linux/pagevec.h 2008-05-23 14:21:33.000000000 -0400
@@ -25,6 +25,7 @@ void __pagevec_release_nonlru(struct pag
void __pagevec_free(struct pagevec *pvec);
void ____pagevec_lru_add(struct pagevec *pvec, enum lru_list lru);
void pagevec_strip(struct pagevec *pvec);
+void pagevec_swap_free(struct pagevec *pvec);
unsigned pagevec_lookup(struct pagevec *pvec, struct address_space *mapping,
pgoff_t start, unsigned nr_pages);
unsigned pagevec_lookup_tag(struct pagevec *pvec,
Index: linux-2.6.26-rc2-mm1/include/linux/swap.h
===================================================================
--- linux-2.6.26-rc2-mm1.orig/include/linux/swap.h 2008-05-23 14:21:33.000000000 -0400
+++ linux-2.6.26-rc2-mm1/include/linux/swap.h 2008-05-23 14:21:33.000000000 -0400
@@ -265,6 +265,7 @@ extern sector_t swapdev_block(int, pgoff
extern struct swap_info_struct *get_swap_info_struct(unsigned);
extern int can_share_swap_page(struct page *);
extern int remove_exclusive_swap_page(struct page *);
+extern int remove_exclusive_swap_page_ref(struct page *);
struct backing_dev_info;
/* linux/mm/thrash.c */
@@ -353,6 +354,11 @@ static inline int remove_exclusive_swap_
return 0;
}
+static inline int remove_exclusive_swap_page_ref(struct page *page)
+{
+ return 0;
+}
+
static inline swp_entry_t get_swap_page(void)
{
swp_entry_t entry;
Index: linux-2.6.26-rc2-mm1/mm/swapfile.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/mm/swapfile.c 2008-05-23 14:21:21.000000000 -0400
+++ linux-2.6.26-rc2-mm1/mm/swapfile.c 2008-05-23 14:21:33.000000000 -0400
@@ -343,7 +343,7 @@ int can_share_swap_page(struct page *pag
* Work out if there are any other processes sharing this
* swap cache page. Free it if you can. Return success.
*/
-int remove_exclusive_swap_page(struct page *page)
+static int remove_exclusive_swap_page_count(struct page *page, int count)
{
int retval;
struct swap_info_struct * p;
@@ -356,7 +356,7 @@ int remove_exclusive_swap_page(struct pa
return 0;
if (PageWriteback(page))
return 0;
- if (page_count(page) != 2) /* 2: us + cache */
+ if (page_count(page) != count) /* us + cache + ptes */
return 0;
entry.val = page_private(page);
@@ -369,7 +369,7 @@ int remove_exclusive_swap_page(struct pa
if (p->swap_map[swp_offset(entry)] == 1) {
/* Recheck the page count with the swapcache lock held.. */
write_lock_irq(&swapper_space.tree_lock);
- if ((page_count(page) == 2) && !PageWriteback(page)) {
+ if ((page_count(page) == count) && !PageWriteback(page)) {
__delete_from_swap_cache(page);
SetPageDirty(page);
retval = 1;
@@ -387,6 +387,25 @@ int remove_exclusive_swap_page(struct pa
}
/*
+ * Most of the time the page should have two references: one for the
+ * process and one for the swap cache.
+ */
+int remove_exclusive_swap_page(struct page *page)
+{
+ return remove_exclusive_swap_page_count(page, 2);
+}
+
+/*
+ * The pageout code holds an extra reference to the page. That raises
+ * the reference count to test for to 2 for a page that is only in the
+ * swap cache plus 1 for each process that maps the page.
+ */
+int remove_exclusive_swap_page_ref(struct page *page)
+{
+ return remove_exclusive_swap_page_count(page, 2 + page_mapcount(page));
+}
+
+/*
* Free the swap entry like above, but also try to
* free the page cache entry if it is the last user.
*/
--
All Rights Reversed
* [PATCH -mm 05/12] define page_file_cache() function
From: Rik van Riel @ 2008-05-29 20:22 UTC
To: linux-kernel
Cc: Andrew Morton, Lee Schermerhorn, Kosaki Motohiro, MinChan Kim
[-- Attachment #1: rvr-01-linux-2.6-page_file_cache.patch --]
[-- Type: text/plain, Size: 7664 bytes --]
From: Rik van Riel <riel@redhat.com>
Define page_file_cache() function to answer the question:
is page backed by a file?
Originally part of Rik van Riel's split-lru patch. Extracted
to make it available for other, independent reclaim patches.
The inline function was moved to linux/mm_inline.h, where it will
be needed by the subsequent "split LRU" and "noreclaim" patches.
Unfortunately this needs to use a page flag, since the
PG_swapbacked state needs to be preserved all the way
to the point where the page is last removed from the
LRU. Trying to derive the status from other info in
the page resulted in wrong VM statistics in earlier
split VM patchsets.
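A minimal sketch of the intended use. That the function returns 2
rather than 1 is deliberate: it lets a later patch in this series add
the result directly to an LRU index. The layout implied here, with
each file list sitting two entries after its anon counterpart, is an
assumption based on the following patches:

        /* sketch, assuming the split-LRU layout of the next patch */
        enum lru_list l = LRU_INACTIVE + page_file_cache(page);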
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: MinChan Kim <minchan.kim@gmail.com>
---
include/linux/mm_inline.h | 24 ++++++++++++++++++++++++
include/linux/page-flags.h | 2 ++
mm/memory.c | 3 +++
mm/migrate.c | 2 ++
mm/page_alloc.c | 4 ++++
mm/shmem.c | 1 +
mm/swap_state.c | 3 +++
7 files changed, 39 insertions(+)
Index: linux-2.6.26-rc2-mm1/include/linux/mm_inline.h
===================================================================
--- linux-2.6.26-rc2-mm1.orig/include/linux/mm_inline.h 2008-05-23 14:21:33.000000000 -0400
+++ linux-2.6.26-rc2-mm1/include/linux/mm_inline.h 2008-05-23 14:21:34.000000000 -0400
@@ -1,3 +1,26 @@
+#ifndef LINUX_MM_INLINE_H
+#define LINUX_MM_INLINE_H
+
+/**
+ * page_file_cache - should the page be on a file LRU or anon LRU?
+ * @page: the page to test
+ *
+ * Returns !0 if @page is page cache page backed by a regular filesystem,
+ * or 0 if @page is anonymous, tmpfs or otherwise ram or swap backed.
+ *
+ * We would like to get this info without a page flag, but the state
+ * needs to survive until the page is last deleted from the LRU, which
+ * could be as far down as __page_cache_release.
+ */
+static inline int page_file_cache(struct page *page)
+{
+ if (PageSwapBacked(page))
+ return 0;
+
+ /* The page is page cache backed by a normal filesystem. */
+ return 2;
+}
+
static inline void
add_page_to_lru_list(struct zone *zone, struct page *page, enum lru_list l)
{
@@ -49,3 +72,4 @@ del_page_from_lru(struct zone *zone, str
__dec_zone_state(zone, NR_INACTIVE + l);
}
+#endif
Index: linux-2.6.26-rc2-mm1/mm/shmem.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/mm/shmem.c 2008-05-23 14:21:21.000000000 -0400
+++ linux-2.6.26-rc2-mm1/mm/shmem.c 2008-05-23 14:21:34.000000000 -0400
@@ -1378,6 +1378,7 @@ repeat:
goto failed;
}
+ SetPageSwapBacked(filepage);
spin_lock(&info->lock);
entry = shmem_swp_alloc(info, idx, sgp);
if (IS_ERR(entry))
Index: linux-2.6.26-rc2-mm1/include/linux/page-flags.h
===================================================================
--- linux-2.6.26-rc2-mm1.orig/include/linux/page-flags.h 2008-05-23 14:21:21.000000000 -0400
+++ linux-2.6.26-rc2-mm1/include/linux/page-flags.h 2008-05-23 14:21:34.000000000 -0400
@@ -93,6 +93,7 @@ enum pageflags {
PG_mappedtodisk, /* Has blocks allocated on-disk */
PG_reclaim, /* To be reclaimed asap */
PG_buddy, /* Page is free, on buddy lists */
+ PG_swapbacked, /* Page is backed by RAM/swap */
#ifdef CONFIG_IA64_UNCACHED_ALLOCATOR
PG_uncached, /* Page has been mapped as uncached */
#endif
@@ -160,6 +161,7 @@ PAGEFLAG(Pinned, owner_priv_1) TESTSCFLA
PAGEFLAG(Reserved, reserved) __CLEARPAGEFLAG(Reserved, reserved)
PAGEFLAG(Private, private) __CLEARPAGEFLAG(Private, private)
__SETPAGEFLAG(Private, private)
+PAGEFLAG(SwapBacked, swapbacked) __CLEARPAGEFLAG(SwapBacked, swapbacked)
/*
* Only test-and-set exist for PG_writeback. The unconditional operators are
Index: linux-2.6.26-rc2-mm1/mm/memory.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/mm/memory.c 2008-05-23 14:21:21.000000000 -0400
+++ linux-2.6.26-rc2-mm1/mm/memory.c 2008-05-23 14:21:34.000000000 -0400
@@ -1765,6 +1765,7 @@ gotten:
ptep_clear_flush(vma, address, page_table);
set_pte_at(mm, address, page_table, entry);
update_mmu_cache(vma, address, entry);
+ SetPageSwapBacked(new_page);
lru_cache_add_active(new_page);
page_add_new_anon_rmap(new_page, vma, address);
@@ -2233,6 +2234,7 @@ static int do_anonymous_page(struct mm_s
if (!pte_none(*page_table))
goto release;
inc_mm_counter(mm, anon_rss);
+ SetPageSwapBacked(page);
lru_cache_add_active(page);
page_add_new_anon_rmap(page, vma, address);
set_pte_at(mm, address, page_table, entry);
@@ -2374,6 +2376,7 @@ static int __do_fault(struct mm_struct *
set_pte_at(mm, address, page_table, entry);
if (anon) {
inc_mm_counter(mm, anon_rss);
+ SetPageSwapBacked(page);
lru_cache_add_active(page);
page_add_new_anon_rmap(page, vma, address);
} else {
Index: linux-2.6.26-rc2-mm1/mm/swap_state.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/mm/swap_state.c 2008-05-23 14:21:21.000000000 -0400
+++ linux-2.6.26-rc2-mm1/mm/swap_state.c 2008-05-23 14:21:34.000000000 -0400
@@ -74,6 +74,7 @@ int add_to_swap_cache(struct page *page,
BUG_ON(!PageLocked(page));
BUG_ON(PageSwapCache(page));
BUG_ON(PagePrivate(page));
+ BUG_ON(!PageSwapBacked(page));
error = radix_tree_preload(gfp_mask);
if (!error) {
write_lock_irq(&swapper_space.tree_lock);
@@ -295,6 +296,7 @@ struct page *read_swap_cache_async(swp_e
* May fail (-ENOMEM) if radix-tree node allocation failed.
*/
SetPageLocked(new_page);
+ SetPageSwapBacked(new_page);
err = add_to_swap_cache(new_page, entry, gfp_mask & GFP_KERNEL);
if (!err) {
/*
@@ -304,6 +306,7 @@ struct page *read_swap_cache_async(swp_e
swap_readpage(NULL, new_page);
return new_page;
}
+ ClearPageSwapBacked(new_page);
ClearPageLocked(new_page);
swap_free(entry);
} while (err != -ENOMEM);
Index: linux-2.6.26-rc2-mm1/mm/migrate.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/mm/migrate.c 2008-05-23 14:21:33.000000000 -0400
+++ linux-2.6.26-rc2-mm1/mm/migrate.c 2008-05-23 14:21:34.000000000 -0400
@@ -558,6 +558,8 @@ static int move_to_new_page(struct page
/* Prepare mapping for the new page.*/
newpage->index = page->index;
newpage->mapping = page->mapping;
+ if (PageSwapBacked(page))
+ SetPageSwapBacked(newpage);
mapping = page_mapping(page);
if (!mapping)
Index: linux-2.6.26-rc2-mm1/mm/page_alloc.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/mm/page_alloc.c 2008-05-23 14:21:33.000000000 -0400
+++ linux-2.6.26-rc2-mm1/mm/page_alloc.c 2008-05-23 14:21:34.000000000 -0400
@@ -261,6 +261,7 @@ static void bad_page(struct page *page)
1 << PG_slab |
1 << PG_swapcache |
1 << PG_writeback |
+ 1 << PG_swapbacked |
1 << PG_buddy );
set_page_count(page, 0);
reset_page_mapcount(page);
@@ -494,6 +495,8 @@ static inline int free_pages_check(struc
bad_page(page);
if (PageDirty(page))
__ClearPageDirty(page);
+ if (PageSwapBacked(page))
+ __ClearPageSwapBacked(page);
/*
* For now, we report if PG_reserved was found set, but do not
* clear it, and do not free the page. But we shall soon need
@@ -644,6 +647,7 @@ static int prep_new_page(struct page *pa
1 << PG_swapcache |
1 << PG_writeback |
1 << PG_reserved |
+ 1 << PG_swapbacked |
1 << PG_buddy ))))
bad_page(page);
--
All Rights Reversed
* [PATCH -mm 06/12] split LRU lists into anon & file sets
From: Rik van Riel @ 2008-05-29 20:22 UTC
To: linux-kernel
Cc: Andrew Morton, Lee Schermerhorn, Kosaki Motohiro
[-- Attachment #1: rvr-02-linux-2.6-vm-split-lrus.patch --]
[-- Type: text/plain, Size: 59936 bytes --]
From: Rik van Riel <riel@redhat.com>
Split the LRU lists in two, one set for pages that are backed by
real file systems ("file") and one for pages that are backed by
memory and swap ("anon"). The latter includes tmpfs.
Eventually mlocked pages will be taken off the LRUs altogether.
A patch for that already exists and just needs to be integrated
into this series.
This patch mostly has the infrastructure and a basic policy to
balance how much we scan the anon lists and how much we scan
the file lists. The big policy changes are in separate patches.
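The list layout this implies, inferred from the new statistics
counters visible in the hunks below (NR_INACTIVE_ANON, NR_ACTIVE_ANON,
NR_INACTIVE_FILE, NR_ACTIVE_FILE), is a 2x2 set of lists; a sketch of
the indexing, assuming the file lists sit at a fixed offset of 2 from
the anon lists:

        #define LRU_BASE        0
        #define LRU_ACTIVE      1
        #define LRU_FILE        2

        enum lru_list {
                LRU_INACTIVE_ANON = LRU_BASE,
                LRU_ACTIVE_ANON = LRU_BASE + LRU_ACTIVE,
                LRU_INACTIVE_FILE = LRU_BASE + LRU_FILE,
                LRU_ACTIVE_FILE = LRU_BASE + LRU_FILE + LRU_ACTIVE,
                NR_LRU_LISTS
        };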
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
---
drivers/base/node.c | 56 +++--
fs/cifs/file.c | 4
fs/nfs/dir.c | 2
fs/ntfs/file.c | 4
fs/proc/proc_misc.c | 81 +++++---
fs/ramfs/file-nommu.c | 4
include/linux/memcontrol.h | 2
include/linux/mm_inline.h | 55 ++++-
include/linux/mmzone.h | 33 ++-
include/linux/pagevec.h | 29 ++-
include/linux/swap.h | 20 +-
include/linux/vmstat.h | 10 +
mm/filemap.c | 9
mm/memcontrol.c | 73 +++----
mm/memory.c | 6
mm/page-writeback.c | 8
mm/page_alloc.c | 24 +-
mm/readahead.c | 2
mm/swap.c | 12 -
mm/swap_state.c | 2
mm/vmscan.c | 428 ++++++++++++++++++++-------------------------
mm/vmstat.c | 14 -
22 files changed, 491 insertions(+), 387 deletions(-)
Index: linux-2.6.26-rc2-mm1/fs/proc/proc_misc.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/fs/proc/proc_misc.c 2008-05-23 14:21:21.000000000 -0400
+++ linux-2.6.26-rc2-mm1/fs/proc/proc_misc.c 2008-05-23 14:21:34.000000000 -0400
@@ -132,6 +132,10 @@ static int meminfo_read_proc(char *page,
unsigned long allowed;
struct vmalloc_info vmi;
long cached;
+ unsigned long inactive_anon;
+ unsigned long active_anon;
+ unsigned long inactive_file;
+ unsigned long active_file;
/*
* display in kilobytes.
@@ -150,48 +154,61 @@ static int meminfo_read_proc(char *page,
get_vmalloc_info(&vmi);
+ inactive_anon = global_page_state(NR_INACTIVE_ANON);
+ active_anon = global_page_state(NR_ACTIVE_ANON);
+ inactive_file = global_page_state(NR_INACTIVE_FILE);
+ active_file = global_page_state(NR_ACTIVE_FILE);
+
/*
* Tagged format, for easy grepping and expansion.
*/
len = sprintf(page,
- "MemTotal: %8lu kB\n"
- "MemFree: %8lu kB\n"
- "Buffers: %8lu kB\n"
- "Cached: %8lu kB\n"
- "SwapCached: %8lu kB\n"
- "Active: %8lu kB\n"
- "Inactive: %8lu kB\n"
+ "MemTotal: %8lu kB\n"
+ "MemFree: %8lu kB\n"
+ "Buffers: %8lu kB\n"
+ "Cached: %8lu kB\n"
+ "SwapCached: %8lu kB\n"
+ "Active: %8lu kB\n"
+ "Inactive: %8lu kB\n"
+ "Active(anon): %8lu kB\n"
+ "Inactive(anon): %8lu kB\n"
+ "Active(file): %8lu kB\n"
+ "Inactive(file): %8lu kB\n"
#ifdef CONFIG_HIGHMEM
- "HighTotal: %8lu kB\n"
- "HighFree: %8lu kB\n"
- "LowTotal: %8lu kB\n"
- "LowFree: %8lu kB\n"
-#endif
- "SwapTotal: %8lu kB\n"
- "SwapFree: %8lu kB\n"
- "Dirty: %8lu kB\n"
- "Writeback: %8lu kB\n"
- "AnonPages: %8lu kB\n"
- "Mapped: %8lu kB\n"
- "Slab: %8lu kB\n"
- "SReclaimable: %8lu kB\n"
- "SUnreclaim: %8lu kB\n"
- "PageTables: %8lu kB\n"
- "NFS_Unstable: %8lu kB\n"
- "Bounce: %8lu kB\n"
- "WritebackTmp: %8lu kB\n"
- "CommitLimit: %8lu kB\n"
- "Committed_AS: %8lu kB\n"
- "VmallocTotal: %8lu kB\n"
- "VmallocUsed: %8lu kB\n"
- "VmallocChunk: %8lu kB\n",
+ "HighTotal: %8lu kB\n"
+ "HighFree: %8lu kB\n"
+ "LowTotal: %8lu kB\n"
+ "LowFree: %8lu kB\n"
+#endif
+ "SwapTotal: %8lu kB\n"
+ "SwapFree: %8lu kB\n"
+ "Dirty: %8lu kB\n"
+ "Writeback: %8lu kB\n"
+ "AnonPages: %8lu kB\n"
+ "Mapped: %8lu kB\n"
+ "Slab: %8lu kB\n"
+ "SReclaimable: %8lu kB\n"
+ "SUnreclaim: %8lu kB\n"
+ "PageTables: %8lu kB\n"
+ "NFS_Unstable: %8lu kB\n"
+ "Bounce: %8lu kB\n"
+ "WritebackTmp: %8lu kB\n"
+ "CommitLimit: %8lu kB\n"
+ "Committed_AS: %8lu kB\n"
+ "VmallocTotal: %8lu kB\n"
+ "VmallocUsed: %8lu kB\n"
+ "VmallocChunk: %8lu kB\n",
K(i.totalram),
K(i.freeram),
K(i.bufferram),
K(cached),
K(total_swapcache_pages),
- K(global_page_state(NR_ACTIVE)),
- K(global_page_state(NR_INACTIVE)),
+ K(active_anon + active_file),
+ K(inactive_anon + inactive_file),
+ K(active_anon),
+ K(inactive_anon),
+ K(active_file),
+ K(inactive_file),
#ifdef CONFIG_HIGHMEM
K(i.totalhigh),
K(i.freehigh),
Index: linux-2.6.26-rc2-mm1/fs/cifs/file.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/fs/cifs/file.c 2008-05-23 14:21:21.000000000 -0400
+++ linux-2.6.26-rc2-mm1/fs/cifs/file.c 2008-05-23 14:21:34.000000000 -0400
@@ -1778,7 +1778,7 @@ static void cifs_copy_cache_pages(struct
SetPageUptodate(page);
unlock_page(page);
if (!pagevec_add(plru_pvec, page))
- __pagevec_lru_add(plru_pvec);
+ __pagevec_lru_add_file(plru_pvec);
data += PAGE_CACHE_SIZE;
}
return;
@@ -1912,7 +1912,7 @@ static int cifs_readpages(struct file *f
bytes_read = 0;
}
- pagevec_lru_add(&lru_pvec);
+ pagevec_lru_add_file(&lru_pvec);
/* need to free smb_read_data buf before exit */
if (smb_read_data) {
Index: linux-2.6.26-rc2-mm1/fs/ntfs/file.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/fs/ntfs/file.c 2008-05-23 14:21:21.000000000 -0400
+++ linux-2.6.26-rc2-mm1/fs/ntfs/file.c 2008-05-23 14:21:34.000000000 -0400
@@ -439,7 +439,7 @@ static inline int __ntfs_grab_cache_page
pages[nr] = *cached_page;
page_cache_get(*cached_page);
if (unlikely(!pagevec_add(lru_pvec, *cached_page)))
- __pagevec_lru_add(lru_pvec);
+ __pagevec_lru_add_file(lru_pvec);
*cached_page = NULL;
}
index++;
@@ -2084,7 +2084,7 @@ err_out:
OSYNC_METADATA|OSYNC_DATA);
}
}
- pagevec_lru_add(&lru_pvec);
+ pagevec_lru_add_file(&lru_pvec);
ntfs_debug("Done. Returning %s (written 0x%lx, status %li).",
written ? "written" : "status", (unsigned long)written,
(long)status);
Index: linux-2.6.26-rc2-mm1/fs/nfs/dir.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/fs/nfs/dir.c 2008-05-23 14:21:21.000000000 -0400
+++ linux-2.6.26-rc2-mm1/fs/nfs/dir.c 2008-05-23 14:21:34.000000000 -0400
@@ -1524,7 +1524,7 @@ static int nfs_symlink(struct inode *dir
if (!add_to_page_cache(page, dentry->d_inode->i_mapping, 0,
GFP_KERNEL)) {
pagevec_add(&lru_pvec, page);
- pagevec_lru_add(&lru_pvec);
+ pagevec_lru_add_file(&lru_pvec);
SetPageUptodate(page);
unlock_page(page);
} else
Index: linux-2.6.26-rc2-mm1/fs/ramfs/file-nommu.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/fs/ramfs/file-nommu.c 2008-05-23 14:21:21.000000000 -0400
+++ linux-2.6.26-rc2-mm1/fs/ramfs/file-nommu.c 2008-05-23 14:21:34.000000000 -0400
@@ -111,12 +111,12 @@ static int ramfs_nommu_expand_for_mappin
goto add_error;
if (!pagevec_add(&lru_pvec, page))
- __pagevec_lru_add(&lru_pvec);
+ __pagevec_lru_add_file(&lru_pvec);
unlock_page(page);
}
- pagevec_lru_add(&lru_pvec);
+ pagevec_lru_add_file(&lru_pvec);
return 0;
fsize_exceeded:
Index: linux-2.6.26-rc2-mm1/drivers/base/node.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/drivers/base/node.c 2008-05-23 14:21:21.000000000 -0400
+++ linux-2.6.26-rc2-mm1/drivers/base/node.c 2008-05-23 14:21:34.000000000 -0400
@@ -58,34 +58,44 @@ static ssize_t node_read_meminfo(struct
si_meminfo_node(&i, nid);
n = sprintf(buf, "\n"
- "Node %d MemTotal: %8lu kB\n"
- "Node %d MemFree: %8lu kB\n"
- "Node %d MemUsed: %8lu kB\n"
- "Node %d Active: %8lu kB\n"
- "Node %d Inactive: %8lu kB\n"
+ "Node %d MemTotal: %8lu kB\n"
+ "Node %d MemFree: %8lu kB\n"
+ "Node %d MemUsed: %8lu kB\n"
+ "Node %d Active: %8lu kB\n"
+ "Node %d Inactive: %8lu kB\n"
+ "Node %d Active(anon): %8lu kB\n"
+ "Node %d Inactive(anon): %8lu kB\n"
+ "Node %d Active(file): %8lu kB\n"
+ "Node %d Inactive(file): %8lu kB\n"
#ifdef CONFIG_HIGHMEM
- "Node %d HighTotal: %8lu kB\n"
- "Node %d HighFree: %8lu kB\n"
- "Node %d LowTotal: %8lu kB\n"
- "Node %d LowFree: %8lu kB\n"
+ "Node %d HighTotal: %8lu kB\n"
+ "Node %d HighFree: %8lu kB\n"
+ "Node %d LowTotal: %8lu kB\n"
+ "Node %d LowFree: %8lu kB\n"
#endif
- "Node %d Dirty: %8lu kB\n"
- "Node %d Writeback: %8lu kB\n"
- "Node %d FilePages: %8lu kB\n"
- "Node %d Mapped: %8lu kB\n"
- "Node %d AnonPages: %8lu kB\n"
- "Node %d PageTables: %8lu kB\n"
- "Node %d NFS_Unstable: %8lu kB\n"
- "Node %d Bounce: %8lu kB\n"
- "Node %d WritebackTmp: %8lu kB\n"
- "Node %d Slab: %8lu kB\n"
- "Node %d SReclaimable: %8lu kB\n"
- "Node %d SUnreclaim: %8lu kB\n",
+ "Node %d Dirty: %8lu kB\n"
+ "Node %d Writeback: %8lu kB\n"
+ "Node %d FilePages: %8lu kB\n"
+ "Node %d Mapped: %8lu kB\n"
+ "Node %d AnonPages: %8lu kB\n"
+ "Node %d PageTables: %8lu kB\n"
+ "Node %d NFS_Unstable: %8lu kB\n"
+ "Node %d Bounce: %8lu kB\n"
+ "Node %d WritebackTmp: %8lu kB\n"
+ "Node %d Slab: %8lu kB\n"
+ "Node %d SReclaimable: %8lu kB\n"
+ "Node %d SUnreclaim: %8lu kB\n",
nid, K(i.totalram),
nid, K(i.freeram),
nid, K(i.totalram - i.freeram),
- nid, node_page_state(nid, NR_ACTIVE),
- nid, node_page_state(nid, NR_INACTIVE),
+ nid, node_page_state(nid, NR_ACTIVE_ANON) +
+ node_page_state(nid, NR_ACTIVE_FILE),
+ nid, node_page_state(nid, NR_INACTIVE_ANON) +
+ node_page_state(nid, NR_INACTIVE_FILE),
+ nid, node_page_state(nid, NR_ACTIVE_ANON),
+ nid, node_page_state(nid, NR_INACTIVE_ANON),
+ nid, node_page_state(nid, NR_ACTIVE_FILE),
+ nid, node_page_state(nid, NR_INACTIVE_FILE),
#ifdef CONFIG_HIGHMEM
nid, K(i.totalhigh),
nid, K(i.freehigh),
Index: linux-2.6.26-rc2-mm1/mm/memory.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/mm/memory.c 2008-05-23 14:21:34.000000000 -0400
+++ linux-2.6.26-rc2-mm1/mm/memory.c 2008-05-23 14:21:34.000000000 -0400
@@ -1766,7 +1766,7 @@ gotten:
set_pte_at(mm, address, page_table, entry);
update_mmu_cache(vma, address, entry);
SetPageSwapBacked(new_page);
- lru_cache_add_active(new_page);
+ lru_cache_add_active_anon(new_page);
page_add_new_anon_rmap(new_page, vma, address);
/* Free the old page.. */
@@ -2235,7 +2235,7 @@ static int do_anonymous_page(struct mm_s
goto release;
inc_mm_counter(mm, anon_rss);
SetPageSwapBacked(page);
- lru_cache_add_active(page);
+ lru_cache_add_active_anon(page);
page_add_new_anon_rmap(page, vma, address);
set_pte_at(mm, address, page_table, entry);
@@ -2377,7 +2377,7 @@ static int __do_fault(struct mm_struct *
if (anon) {
inc_mm_counter(mm, anon_rss);
SetPageSwapBacked(page);
- lru_cache_add_active(page);
+ lru_cache_add_active_anon(page);
page_add_new_anon_rmap(page, vma, address);
} else {
inc_mm_counter(mm, file_rss);
Index: linux-2.6.26-rc2-mm1/mm/page_alloc.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/mm/page_alloc.c 2008-05-23 14:21:34.000000000 -0400
+++ linux-2.6.26-rc2-mm1/mm/page_alloc.c 2008-05-23 14:21:34.000000000 -0400
@@ -1908,10 +1908,13 @@ void show_free_areas(void)
}
}
- printk("Active:%lu inactive:%lu dirty:%lu writeback:%lu unstable:%lu\n"
+ printk("Active_anon:%lu active_file:%lu inactive_anon%lu\n"
+ " inactive_file:%lu dirty:%lu writeback:%lu unstable:%lu\n"
" free:%lu slab:%lu mapped:%lu pagetables:%lu bounce:%lu\n",
- global_page_state(NR_ACTIVE),
- global_page_state(NR_INACTIVE),
+ global_page_state(NR_ACTIVE_ANON),
+ global_page_state(NR_ACTIVE_FILE),
+ global_page_state(NR_INACTIVE_ANON),
+ global_page_state(NR_INACTIVE_FILE),
global_page_state(NR_FILE_DIRTY),
global_page_state(NR_WRITEBACK),
global_page_state(NR_UNSTABLE_NFS),
@@ -1934,8 +1937,10 @@ void show_free_areas(void)
" min:%lukB"
" low:%lukB"
" high:%lukB"
- " active:%lukB"
- " inactive:%lukB"
+ " active_anon:%lukB"
+ " inactive_anon:%lukB"
+ " active_file:%lukB"
+ " inactive_file:%lukB"
" present:%lukB"
" pages_scanned:%lu"
" all_unreclaimable? %s"
@@ -1945,8 +1950,10 @@ void show_free_areas(void)
K(zone->pages_min),
K(zone->pages_low),
K(zone->pages_high),
- K(zone_page_state(zone, NR_ACTIVE)),
- K(zone_page_state(zone, NR_INACTIVE)),
+ K(zone_page_state(zone, NR_ACTIVE_ANON)),
+ K(zone_page_state(zone, NR_INACTIVE_ANON)),
+ K(zone_page_state(zone, NR_ACTIVE_FILE)),
+ K(zone_page_state(zone, NR_INACTIVE_FILE)),
K(zone->present_pages),
zone->pages_scanned,
(zone_is_all_unreclaimable(zone) ? "yes" : "no")
@@ -3503,6 +3510,9 @@ static void __paginginit free_area_init_
INIT_LIST_HEAD(&zone->list[l]);
zone->nr_scan[l] = 0;
}
+ zone->recent_rotated_anon = 0;
+ zone->recent_rotated_file = 0;
+ /* TODO: recent_scanned_* ??? */
zap_zone_vm_stats(zone);
zone->flags = 0;
if (!size)
Index: linux-2.6.26-rc2-mm1/mm/swap.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/mm/swap.c 2008-05-23 14:21:33.000000000 -0400
+++ linux-2.6.26-rc2-mm1/mm/swap.c 2008-05-23 14:21:34.000000000 -0400
@@ -108,6 +108,7 @@ enum lru_list page_lru(struct page *page
if (PageActive(page))
lru += LRU_ACTIVE;
+ lru += page_file_cache(page);
return lru;
}
@@ -133,7 +134,8 @@ static void pagevec_move_tail(struct pag
spin_lock(&zone->lru_lock);
}
if (PageLRU(page) && !PageActive(page)) {
- list_move_tail(&page->lru, &zone->list[LRU_INACTIVE]);
+ int lru = page_file_cache(page);
+ list_move_tail(&page->lru, &zone->list[lru]);
pgmoved++;
}
}
@@ -174,9 +176,13 @@ void activate_page(struct page *page)
spin_lock_irq(&zone->lru_lock);
if (PageLRU(page) && !PageActive(page)) {
- del_page_from_inactive_list(zone, page);
+ int lru = LRU_BASE;
+ lru += page_file_cache(page);
+ del_page_from_lru_list(zone, page, lru);
+
SetPageActive(page);
- add_page_to_active_list(zone, page);
+ lru += LRU_ACTIVE;
+ add_page_to_lru_list(zone, page, lru);
__count_vm_event(PGACTIVATE);
mem_cgroup_move_lists(page, true);
}
Index: linux-2.6.26-rc2-mm1/mm/readahead.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/mm/readahead.c 2008-05-23 14:21:21.000000000 -0400
+++ linux-2.6.26-rc2-mm1/mm/readahead.c 2008-05-23 14:21:34.000000000 -0400
@@ -229,7 +229,7 @@ int do_page_cache_readahead(struct addre
*/
unsigned long max_sane_readahead(unsigned long nr)
{
- return min(nr, (node_page_state(numa_node_id(), NR_INACTIVE)
+ return min(nr, (node_page_state(numa_node_id(), NR_INACTIVE_FILE)
+ node_page_state(numa_node_id(), NR_FREE_PAGES)) / 2);
}
Index: linux-2.6.26-rc2-mm1/mm/filemap.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/mm/filemap.c 2008-05-23 14:21:21.000000000 -0400
+++ linux-2.6.26-rc2-mm1/mm/filemap.c 2008-05-23 14:21:34.000000000 -0400
@@ -33,6 +33,7 @@
#include <linux/cpuset.h>
#include <linux/hardirq.h> /* for BUG_ON(!in_atomic()) only */
#include <linux/memcontrol.h>
+#include <linux/mm_inline.h> /* for page_file_cache() */
#include "internal.h"
/*
@@ -489,8 +490,12 @@ int add_to_page_cache_lru(struct page *p
pgoff_t offset, gfp_t gfp_mask)
{
int ret = add_to_page_cache(page, mapping, offset, gfp_mask);
- if (ret == 0)
- lru_cache_add(page);
+ if (ret == 0) {
+ if (page_file_cache(page))
+ lru_cache_add_file(page);
+ else
+ lru_cache_add_active_anon(page);
+ }
return ret;
}
Index: linux-2.6.26-rc2-mm1/mm/vmstat.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/mm/vmstat.c 2008-05-23 14:21:33.000000000 -0400
+++ linux-2.6.26-rc2-mm1/mm/vmstat.c 2008-05-23 14:21:34.000000000 -0400
@@ -695,8 +695,10 @@ const struct seq_operations pagetypeinfo
static const char * const vmstat_text[] = {
/* Zoned VM counters */
"nr_free_pages",
- "nr_inactive",
- "nr_active",
+ "nr_inactive_anon",
+ "nr_active_anon",
+ "nr_inactive_file",
+ "nr_active_file",
"nr_anon_pages",
"nr_mapped",
"nr_file_pages",
@@ -764,7 +766,7 @@ static void zoneinfo_show_print(struct s
"\n min %lu"
"\n low %lu"
"\n high %lu"
- "\n scanned %lu (a: %lu i: %lu)"
+ "\n scanned %lu (aa: %lu ia: %lu af: %lu if: %lu)"
"\n spanned %lu"
"\n present %lu",
zone_page_state(zone, NR_FREE_PAGES),
@@ -772,8 +774,10 @@ static void zoneinfo_show_print(struct s
zone->pages_low,
zone->pages_high,
zone->pages_scanned,
- zone->nr_scan[LRU_ACTIVE],
- zone->nr_scan[LRU_INACTIVE],
+ zone->nr_scan[LRU_ACTIVE_ANON],
+ zone->nr_scan[LRU_INACTIVE_ANON],
+ zone->nr_scan[LRU_ACTIVE_FILE],
+ zone->nr_scan[LRU_INACTIVE_FILE],
zone->spanned_pages,
zone->present_pages);
Index: linux-2.6.26-rc2-mm1/mm/vmscan.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/mm/vmscan.c 2008-05-23 14:21:33.000000000 -0400
+++ linux-2.6.26-rc2-mm1/mm/vmscan.c 2008-05-23 14:21:34.000000000 -0400
@@ -78,7 +78,7 @@ struct scan_control {
unsigned long (*isolate_pages)(unsigned long nr, struct list_head *dst,
unsigned long *scanned, int order, int mode,
struct zone *z, struct mem_cgroup *mem_cont,
- int active);
+ int active, int file);
};
#define lru_to_page(_head) (list_entry((_head)->prev, struct page, lru))
@@ -236,27 +236,6 @@ unsigned long shrink_slab(unsigned long
return ret;
}
-/* Called without lock on whether page is mapped, so answer is unstable */
-static inline int page_mapping_inuse(struct page *page)
-{
- struct address_space *mapping;
-
- /* Page is in somebody's page tables. */
- if (page_mapped(page))
- return 1;
-
- /* Be more reluctant to reclaim swapcache than pagecache */
- if (PageSwapCache(page))
- return 1;
-
- mapping = page_mapping(page);
- if (!mapping)
- return 0;
-
- /* File is mmap'd by somebody? */
- return mapping_mapped(mapping);
-}
-
static inline int is_page_cache_freeable(struct page *page)
{
return page_count(page) - !!PagePrivate(page) == 2;
@@ -518,8 +497,7 @@ static unsigned long shrink_page_list(st
referenced = page_referenced(page, 1, sc->mem_cgroup);
/* In active use or really unfreeable? Activate it. */
- if (sc->order <= PAGE_ALLOC_COSTLY_ORDER &&
- referenced && page_mapping_inuse(page))
+ if (sc->order <= PAGE_ALLOC_COSTLY_ORDER && referenced)
goto activate_locked;
#ifdef CONFIG_SWAP
@@ -550,8 +528,6 @@ static unsigned long shrink_page_list(st
}
if (PageDirty(page)) {
- if (sc->order <= PAGE_ALLOC_COSTLY_ORDER && referenced)
- goto keep_locked;
if (!may_enter_fs)
goto keep_locked;
if (!sc->may_writepage)
@@ -652,7 +628,7 @@ keep:
*
* returns 0 on success, -ve errno on failure.
*/
-int __isolate_lru_page(struct page *page, int mode)
+int __isolate_lru_page(struct page *page, int mode, int file)
{
int ret = -EINVAL;
@@ -668,6 +644,9 @@ int __isolate_lru_page(struct page *page
if (mode != ISOLATE_BOTH && (!PageActive(page) != !mode))
return ret;
+ if (mode != ISOLATE_BOTH && (!page_file_cache(page) != !file))
+ return ret;
+
ret = -EBUSY;
if (likely(get_page_unless_zero(page))) {
/*
@@ -698,12 +677,13 @@ int __isolate_lru_page(struct page *page
* @scanned: The number of pages that were scanned.
* @order: The caller's attempted allocation order
* @mode: One of the LRU isolation modes
+ * @file: True [1] if isolating file [!anon] pages
*
* returns how many pages were moved onto *@dst.
*/
static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
struct list_head *src, struct list_head *dst,
- unsigned long *scanned, int order, int mode)
+ unsigned long *scanned, int order, int mode, int file)
{
unsigned long nr_taken = 0;
unsigned long scan;
@@ -720,7 +700,7 @@ static unsigned long isolate_lru_pages(u
VM_BUG_ON(!PageLRU(page));
- switch (__isolate_lru_page(page, mode)) {
+ switch (__isolate_lru_page(page, mode, file)) {
case 0:
list_move(&page->lru, dst);
nr_taken++;
@@ -763,10 +743,11 @@ static unsigned long isolate_lru_pages(u
break;
cursor_page = pfn_to_page(pfn);
+
/* Check that we have not crossed a zone boundary. */
if (unlikely(page_zone_id(cursor_page) != zone_id))
continue;
- switch (__isolate_lru_page(cursor_page, mode)) {
+ switch (__isolate_lru_page(cursor_page, mode, file)) {
case 0:
list_move(&cursor_page->lru, dst);
nr_taken++;
@@ -791,30 +772,37 @@ static unsigned long isolate_pages_globa
unsigned long *scanned, int order,
int mode, struct zone *z,
struct mem_cgroup *mem_cont,
- int active)
+ int active, int file)
{
+ int lru = LRU_BASE;
if (active)
- return isolate_lru_pages(nr, &z->list[LRU_ACTIVE], dst,
- scanned, order, mode);
- else
- return isolate_lru_pages(nr, &z->list[LRU_INACTIVE], dst,
- scanned, order, mode);
+ lru += LRU_ACTIVE;
+ if (file)
+ lru += LRU_FILE;
+ return isolate_lru_pages(nr, &z->list[lru], dst, scanned, order,
+ mode, !!file);
}
/*
* clear_active_flags() is a helper for shrink_active_list(), clearing
* any active bits from the pages in the list.
*/
-static unsigned long clear_active_flags(struct list_head *page_list)
+static unsigned long clear_active_flags(struct list_head *page_list,
+ unsigned int *count)
{
int nr_active = 0;
+ int lru;
struct page *page;
- list_for_each_entry(page, page_list, lru)
+ list_for_each_entry(page, page_list, lru) {
+ lru = page_file_cache(page);
if (PageActive(page)) {
+ lru += LRU_ACTIVE;
ClearPageActive(page);
nr_active++;
}
+ count[lru]++;
+ }
return nr_active;
}
@@ -852,12 +840,12 @@ int isolate_lru_page(struct page *page)
spin_lock_irq(&zone->lru_lock);
if (PageLRU(page) && get_page_unless_zero(page)) {
+ int lru = LRU_BASE;
ret = 0;
ClearPageLRU(page);
- if (PageActive(page))
- del_page_from_active_list(zone, page);
- else
- del_page_from_inactive_list(zone, page);
+
+ lru += page_file_cache(page) + !!PageActive(page);
+ del_page_from_lru_list(zone, page, lru);
}
spin_unlock_irq(&zone->lru_lock);
}
@@ -869,7 +857,7 @@ int isolate_lru_page(struct page *page)
* of reclaimed pages
*/
static unsigned long shrink_inactive_list(unsigned long max_scan,
- struct zone *zone, struct scan_control *sc)
+ struct zone *zone, struct scan_control *sc, int file)
{
LIST_HEAD(page_list);
struct pagevec pvec;
@@ -886,18 +874,25 @@ static unsigned long shrink_inactive_lis
unsigned long nr_scan;
unsigned long nr_freed;
unsigned long nr_active;
+ unsigned int count[NR_LRU_LISTS] = { 0, };
+ int mode = (sc->order > PAGE_ALLOC_COSTLY_ORDER) ?
+ ISOLATE_BOTH : ISOLATE_INACTIVE;
nr_taken = sc->isolate_pages(sc->swap_cluster_max,
- &page_list, &nr_scan, sc->order,
- (sc->order > PAGE_ALLOC_COSTLY_ORDER)?
- ISOLATE_BOTH : ISOLATE_INACTIVE,
- zone, sc->mem_cgroup, 0);
- nr_active = clear_active_flags(&page_list);
+ &page_list, &nr_scan, sc->order, mode,
+ zone, sc->mem_cgroup, 0, file);
+ nr_active = clear_active_flags(&page_list, count);
__count_vm_events(PGDEACTIVATE, nr_active);
- __mod_zone_page_state(zone, NR_ACTIVE, -nr_active);
- __mod_zone_page_state(zone, NR_INACTIVE,
- -(nr_taken - nr_active));
+ __mod_zone_page_state(zone, NR_ACTIVE_FILE,
+ -count[LRU_ACTIVE_FILE]);
+ __mod_zone_page_state(zone, NR_INACTIVE_FILE,
+ -count[LRU_INACTIVE_FILE]);
+ __mod_zone_page_state(zone, NR_ACTIVE_ANON,
+ -count[LRU_ACTIVE_ANON]);
+ __mod_zone_page_state(zone, NR_INACTIVE_ANON,
+ -count[LRU_INACTIVE_ANON]);
+
if (scan_global_lru(sc))
zone->pages_scanned += nr_scan;
spin_unlock_irq(&zone->lru_lock);
@@ -919,7 +914,7 @@ static unsigned long shrink_inactive_lis
* The attempt at page out may have made some
* of the pages active, mark them inactive again.
*/
- nr_active = clear_active_flags(&page_list);
+ nr_active = clear_active_flags(&page_list, count);
count_vm_events(PGDEACTIVATE, nr_active);
nr_freed += shrink_page_list(&page_list, sc,
@@ -944,11 +939,20 @@ static unsigned long shrink_inactive_lis
* Put back any unfreeable pages.
*/
while (!list_empty(&page_list)) {
+ int lru = LRU_BASE;
page = lru_to_page(&page_list);
VM_BUG_ON(PageLRU(page));
SetPageLRU(page);
list_del(&page->lru);
- add_page_to_lru_list(zone, page, PageActive(page));
+ if (page_file_cache(page)) {
+ lru += LRU_FILE;
+ zone->recent_rotated_file++;
+ } else {
+ zone->recent_rotated_anon++;
+ }
+ if (PageActive(page))
+ lru += LRU_ACTIVE;
+ add_page_to_lru_list(zone, page, lru);
if (!pagevec_add(&pvec, page)) {
spin_unlock_irq(&zone->lru_lock);
__pagevec_release(&pvec);
@@ -979,115 +983,7 @@ static inline void note_zone_scanning_pr
static inline int zone_is_near_oom(struct zone *zone)
{
- return zone->pages_scanned >= (zone_page_state(zone, NR_ACTIVE)
- + zone_page_state(zone, NR_INACTIVE))*3;
-}
-
-/*
- * Determine we should try to reclaim mapped pages.
- * This is called only when sc->mem_cgroup is NULL.
- */
-static int calc_reclaim_mapped(struct scan_control *sc, struct zone *zone,
- int priority)
-{
- long mapped_ratio;
- long distress;
- long swap_tendency;
- long imbalance;
- int reclaim_mapped = 0;
- int prev_priority;
-
- if (scan_global_lru(sc) && zone_is_near_oom(zone))
- return 1;
- /*
- * `distress' is a measure of how much trouble we're having
- * reclaiming pages. 0 -> no problems. 100 -> great trouble.
- */
- if (scan_global_lru(sc))
- prev_priority = zone->prev_priority;
- else
- prev_priority = mem_cgroup_get_reclaim_priority(sc->mem_cgroup);
-
- distress = 100 >> min(prev_priority, priority);
-
- /*
- * The point of this algorithm is to decide when to start
- * reclaiming mapped memory instead of just pagecache. Work out
- * how much memory
- * is mapped.
- */
- if (scan_global_lru(sc))
- mapped_ratio = ((global_page_state(NR_FILE_MAPPED) +
- global_page_state(NR_ANON_PAGES)) * 100) /
- vm_total_pages;
- else
- mapped_ratio = mem_cgroup_calc_mapped_ratio(sc->mem_cgroup);
-
- /*
- * Now decide how much we really want to unmap some pages. The
- * mapped ratio is downgraded - just because there's a lot of
- * mapped memory doesn't necessarily mean that page reclaim
- * isn't succeeding.
- *
- * The distress ratio is important - we don't want to start
- * going oom.
- *
- * A 100% value of vm_swappiness overrides this algorithm
- * altogether.
- */
- swap_tendency = mapped_ratio / 2 + distress + sc->swappiness;
-
- /*
- * If there's huge imbalance between active and inactive
- * (think active 100 times larger than inactive) we should
- * become more permissive, or the system will take too much
- * cpu before it start swapping during memory pressure.
- * Distress is about avoiding early-oom, this is about
- * making swappiness graceful despite setting it to low
- * values.
- *
- * Avoid div by zero with nr_inactive+1, and max resulting
- * value is vm_total_pages.
- */
- if (scan_global_lru(sc)) {
- imbalance = zone_page_state(zone, NR_ACTIVE);
- imbalance /= zone_page_state(zone, NR_INACTIVE) + 1;
- } else
- imbalance = mem_cgroup_reclaim_imbalance(sc->mem_cgroup);
-
- /*
- * Reduce the effect of imbalance if swappiness is low,
- * this means for a swappiness very low, the imbalance
- * must be much higher than 100 for this logic to make
- * the difference.
- *
- * Max temporary value is vm_total_pages*100.
- */
- imbalance *= (vm_swappiness + 1);
- imbalance /= 100;
-
- /*
- * If not much of the ram is mapped, makes the imbalance
- * less relevant, it's high priority we refill the inactive
- * list with mapped pages only in presence of high ratio of
- * mapped pages.
- *
- * Max temporary value is vm_total_pages*100.
- */
- imbalance *= mapped_ratio;
- imbalance /= 100;
-
- /* apply imbalance feedback to swap_tendency */
- swap_tendency += imbalance;
-
- /*
- * Now use this metric to decide whether to start moving mapped
- * memory onto the inactive list.
- */
- if (swap_tendency >= 100)
- reclaim_mapped = 1;
-
- return reclaim_mapped;
+ return zone->pages_scanned >= (zone_lru_pages(zone) * 3);
}
/*
@@ -1110,7 +1006,7 @@ static int calc_reclaim_mapped(struct sc
static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
- struct scan_control *sc, int priority)
+ struct scan_control *sc, int priority, int file)
{
unsigned long pgmoved;
int pgdeactivate = 0;
@@ -1120,16 +1016,13 @@ static void shrink_active_list(unsigned
LIST_HEAD(l_inactive);
struct page *page;
struct pagevec pvec;
- int reclaim_mapped = 0;
-
- if (sc->may_swap)
- reclaim_mapped = calc_reclaim_mapped(sc, zone, priority);
+ enum lru_list lru;
lru_add_drain();
spin_lock_irq(&zone->lru_lock);
pgmoved = sc->isolate_pages(nr_pages, &l_hold, &pgscanned, sc->order,
ISOLATE_ACTIVE, zone,
- sc->mem_cgroup, 1);
+ sc->mem_cgroup, 1, file);
/*
* zone->pages_scanned is used for detect zone's oom
* mem_cgroup remembers nr_scan by itself.
@@ -1137,29 +1030,29 @@ static void shrink_active_list(unsigned
if (scan_global_lru(sc))
zone->pages_scanned += pgscanned;
- __mod_zone_page_state(zone, NR_ACTIVE, -pgmoved);
+ if (file)
+ __mod_zone_page_state(zone, NR_ACTIVE_FILE, -pgmoved);
+ else
+ __mod_zone_page_state(zone, NR_ACTIVE_ANON, -pgmoved);
spin_unlock_irq(&zone->lru_lock);
while (!list_empty(&l_hold)) {
cond_resched();
page = lru_to_page(&l_hold);
list_del(&page->lru);
- if (page_mapped(page)) {
- if (!reclaim_mapped ||
- (total_swap_pages == 0 && PageAnon(page)) ||
- page_referenced(page, 0, sc->mem_cgroup)) {
- list_add(&page->lru, &l_active);
- continue;
- }
- } else if (TestClearPageReferenced(page)) {
+ if (page_referenced(page, 0, sc->mem_cgroup))
list_add(&page->lru, &l_active);
- continue;
- }
- list_add(&page->lru, &l_inactive);
+ else
+ list_add(&page->lru, &l_inactive);
}
+ /*
+ * Now put the pages back on the appropriate [file or anon] inactive
+ * and active lists.
+ */
pagevec_init(&pvec, 1);
pgmoved = 0;
+ lru = LRU_BASE + file * LRU_FILE;
spin_lock_irq(&zone->lru_lock);
while (!list_empty(&l_inactive)) {
page = lru_to_page(&l_inactive);
@@ -1169,11 +1062,12 @@ static void shrink_active_list(unsigned
VM_BUG_ON(!PageActive(page));
ClearPageActive(page);
- list_move(&page->lru, &zone->list[LRU_INACTIVE]);
+ list_move(&page->lru, &zone->list[lru]);
mem_cgroup_move_lists(page, false);
pgmoved++;
if (!pagevec_add(&pvec, page)) {
- __mod_zone_page_state(zone, NR_INACTIVE, pgmoved);
+ __mod_zone_page_state(zone, NR_INACTIVE_ANON + lru,
+ pgmoved);
spin_unlock_irq(&zone->lru_lock);
pgdeactivate += pgmoved;
pgmoved = 0;
@@ -1183,7 +1077,7 @@ static void shrink_active_list(unsigned
spin_lock_irq(&zone->lru_lock);
}
}
- __mod_zone_page_state(zone, NR_INACTIVE, pgmoved);
+ __mod_zone_page_state(zone, NR_INACTIVE_ANON + lru, pgmoved);
pgdeactivate += pgmoved;
if (buffer_heads_over_limit) {
spin_unlock_irq(&zone->lru_lock);
@@ -1192,6 +1086,7 @@ static void shrink_active_list(unsigned
}
pgmoved = 0;
+ lru = LRU_ACTIVE + file * LRU_FILE;
while (!list_empty(&l_active)) {
page = lru_to_page(&l_active);
prefetchw_prev_lru_page(page, &l_active, flags);
@@ -1199,11 +1094,12 @@ static void shrink_active_list(unsigned
SetPageLRU(page);
VM_BUG_ON(!PageActive(page));
- list_move(&page->lru, &zone->list[LRU_ACTIVE]);
+ list_move(&page->lru, &zone->list[lru]);
mem_cgroup_move_lists(page, true);
pgmoved++;
if (!pagevec_add(&pvec, page)) {
- __mod_zone_page_state(zone, NR_ACTIVE, pgmoved);
+ __mod_zone_page_state(zone, NR_INACTIVE_ANON + lru,
+ pgmoved);
pgmoved = 0;
spin_unlock_irq(&zone->lru_lock);
if (vm_swap_full())
@@ -1212,7 +1108,12 @@ static void shrink_active_list(unsigned
spin_lock_irq(&zone->lru_lock);
}
}
- __mod_zone_page_state(zone, NR_ACTIVE, pgmoved);
+ __mod_zone_page_state(zone, NR_INACTIVE_ANON + lru, pgmoved);
+ if (file) {
+ zone->recent_rotated_file += pgmoved;
+ } else {
+ zone->recent_rotated_anon += pgmoved;
+ }
__count_zone_vm_events(PGREFILL, zone, pgscanned);
__count_vm_events(PGDEACTIVATE, pgdeactivate);
@@ -1223,16 +1124,82 @@ static void shrink_active_list(unsigned
pagevec_release(&pvec);
}
-static unsigned long shrink_list(enum lru_list l, unsigned long nr_to_scan,
+static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan,
struct zone *zone, struct scan_control *sc, int priority)
{
- if (l == LRU_ACTIVE) {
- shrink_active_list(nr_to_scan, zone, sc, priority);
+ int file = is_file_lru(lru);
+
+ if (lru == LRU_ACTIVE_ANON || lru == LRU_ACTIVE_FILE) {
+ shrink_active_list(nr_to_scan, zone, sc, priority, file);
return 0;
}
- return shrink_inactive_list(nr_to_scan, zone, sc);
+ return shrink_inactive_list(nr_to_scan, zone, sc, file);
+}
+
+/*
+ * The utility of the anon and file memory corresponds to the fraction
+ * of pages that were recently referenced in each category. Pageout
+ * pressure is distributed according to the size of each set, the fraction
+ * of recently referenced pages (except used-once file pages) and the
+ * swappiness parameter.
+ *
+ * We return the relative pressures as percentages so shrink_zone can
+ * easily use them.
+ */
+static void get_scan_ratio(struct zone *zone, struct scan_control *sc,
+ unsigned long *percent)
+{
+ unsigned long anon, file;
+ unsigned long anon_prio, file_prio;
+ unsigned long rotate_sum;
+ unsigned long ap, fp;
+
+ anon = zone_page_state(zone, NR_ACTIVE_ANON) +
+ zone_page_state(zone, NR_INACTIVE_ANON);
+ file = zone_page_state(zone, NR_ACTIVE_FILE) +
+ zone_page_state(zone, NR_INACTIVE_FILE);
+
+ rotate_sum = zone->recent_rotated_file + zone->recent_rotated_anon;
+
+ /* Keep a floating average of RECENT references. */
+ if (unlikely(rotate_sum > min(anon, file))) {
+ spin_lock_irq(&zone->lru_lock);
+ zone->recent_rotated_file /= 2;
+ zone->recent_rotated_anon /= 2;
+ spin_unlock_irq(&zone->lru_lock);
+ rotate_sum /= 2;
+ }
+
+ /*
+ * With swappiness at 100, anonymous and file have the same priority.
+ * This scanning priority is essentially the inverse of IO cost.
+ */
+ anon_prio = sc->swappiness;
+ file_prio = 200 - sc->swappiness;
+
+ /*
+ *                  anon       recent_rotated_anon
+ * %anon = 100 * ----------- / ------------------- * IO cost
+ *              anon + file        rotate_sum
+ */
+ ap = (anon_prio * anon) / (anon + file + 1);
+ ap *= rotate_sum / (zone->recent_rotated_anon + 1);
+ if (ap == 0)
+ ap = 1;
+ else if (ap > 100)
+ ap = 100;
+ percent[0] = ap;
+
+ fp = (file_prio * file) / (anon + file + 1);
+ fp *= rotate_sum / (zone->recent_rotated_file + 1);
+ if (fp == 0)
+ fp = 1;
+ else if (fp > 100)
+ fp = 100;
+ percent[1] = fp;
}
+
/*
* This is a basic per-zone page freer. Used by both kswapd and direct reclaim.
*/
@@ -1242,36 +1209,38 @@ static unsigned long shrink_zone(int pri
unsigned long nr[NR_LRU_LISTS];
unsigned long nr_to_scan;
unsigned long nr_reclaimed = 0;
+ unsigned long percent[2]; /* anon @ 0; file @ 1 */
enum lru_list l;
- if (scan_global_lru(sc)) {
- /*
- * Add one to nr_to_scan just to make sure that the kernel
- * will slowly sift through the active list.
- */
- for_each_lru(l) {
+ get_scan_ratio(zone, sc, percent);
+
+ for_each_lru(l) {
+ if (scan_global_lru(sc)) {
+ int file = is_file_lru(l);
+ /*
+ * Add one to nr_to_scan just to make sure that the
+ * kernel will slowly sift through the active list.
+ */
zone->nr_scan[l] += (zone_page_state(zone,
- NR_INACTIVE + l) >> priority) + 1;
- nr[l] = zone->nr_scan[l];
+ NR_INACTIVE_ANON + l) >> priority) + 1;
+ nr[l] = zone->nr_scan[l] * percent[file] / 100;
if (nr[l] >= sc->swap_cluster_max)
zone->nr_scan[l] = 0;
else
nr[l] = 0;
+ } else {
+ /*
+ * This reclaim occurs not because zone memory shortage
+ * but because memory controller hits its limit.
+ * Then, don't modify zone reclaim related data.
+ */
+ nr[l] = mem_cgroup_calc_reclaim(sc->mem_cgroup, zone,
+ priority, l);
}
- } else {
- /*
- * This reclaim occurs not because zone memory shortage but
- * because memory controller hits its limit.
- * Then, don't modify zone reclaim related data.
- */
- nr[LRU_ACTIVE] = mem_cgroup_calc_reclaim(sc->mem_cgroup,
- zone, priority, LRU_ACTIVE);
-
- nr[LRU_INACTIVE] = mem_cgroup_calc_reclaim(sc->mem_cgroup,
- zone, priority, LRU_INACTIVE);
}
- while (nr[LRU_ACTIVE] || nr[LRU_INACTIVE]) {
+ while (nr[LRU_ACTIVE_ANON] || nr[LRU_INACTIVE_ANON] ||
+ nr[LRU_ACTIVE_FILE] || nr[LRU_INACTIVE_FILE]) {
for_each_lru(l) {
if (nr[l]) {
nr_to_scan = min(nr[l],
@@ -1344,7 +1313,7 @@ static unsigned long shrink_zones(int pr
return nr_reclaimed;
}
-
+
/*
* This is the main entry point to direct page reclaim.
*
@@ -1385,8 +1354,7 @@ static unsigned long do_try_to_free_page
if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
continue;
- lru_pages += zone_page_state(zone, NR_ACTIVE)
- + zone_page_state(zone, NR_INACTIVE);
+ lru_pages += zone_lru_pages(zone);
}
}
@@ -1586,8 +1554,7 @@ loop_again:
for (i = 0; i <= end_zone; i++) {
struct zone *zone = pgdat->node_zones + i;
- lru_pages += zone_page_state(zone, NR_ACTIVE)
- + zone_page_state(zone, NR_INACTIVE);
+ lru_pages += zone_lru_pages(zone);
}
/*
@@ -1631,8 +1598,7 @@ loop_again:
if (zone_is_all_unreclaimable(zone))
continue;
if (nr_slab == 0 && zone->pages_scanned >=
- (zone_page_state(zone, NR_ACTIVE)
- + zone_page_state(zone, NR_INACTIVE)) * 6)
+ (zone_lru_pages(zone) * 6))
zone_set_flag(zone,
ZONE_ALL_UNRECLAIMABLE);
/*
@@ -1686,7 +1652,7 @@ out:
/*
* The background pageout daemon, started as a kernel thread
- * from the init process.
+ * from the init process.
*
* This basically trickles out pages so that we have _some_
* free memory available even if there is no other activity
@@ -1780,6 +1746,14 @@ void wakeup_kswapd(struct zone *zone, in
wake_up_interruptible(&pgdat->kswapd_wait);
}
+unsigned long global_lru_pages(void)
+{
+ return global_page_state(NR_ACTIVE_ANON)
+ + global_page_state(NR_ACTIVE_FILE)
+ + global_page_state(NR_INACTIVE_ANON)
+ + global_page_state(NR_INACTIVE_FILE);
+}
+
#ifdef CONFIG_PM
/*
* Helper function for shrink_all_memory(). Tries to reclaim 'nr_pages' pages
@@ -1805,17 +1779,18 @@ static unsigned long shrink_all_zones(un
for_each_lru(l) {
/* For pass = 0 we don't shrink the active list */
- if (pass == 0 && l == LRU_ACTIVE)
+ if (pass == 0 &&
+ (l == LRU_ACTIVE_ANON || l == LRU_ACTIVE_FILE))
continue;
zone->nr_scan[l] +=
- (zone_page_state(zone, NR_INACTIVE + l)
+ (zone_page_state(zone, NR_INACTIVE_ANON + l)
>> prio) + 1;
if (zone->nr_scan[l] >= nr_pages || pass > 3) {
zone->nr_scan[l] = 0;
nr_to_scan = min(nr_pages,
zone_page_state(zone,
- NR_INACTIVE + l));
+ NR_INACTIVE_ANON + l));
ret += shrink_list(l, nr_to_scan, zone,
sc, prio);
if (ret >= nr_pages)
@@ -1827,11 +1802,6 @@ static unsigned long shrink_all_zones(un
return ret;
}
-static unsigned long count_lru_pages(void)
-{
- return global_page_state(NR_ACTIVE) + global_page_state(NR_INACTIVE);
-}
-
/*
* Try to free `nr_pages' of memory, system-wide, and return the number of
* freed pages.
@@ -1857,7 +1827,7 @@ unsigned long shrink_all_memory(unsigned
current->reclaim_state = &reclaim_state;
- lru_pages = count_lru_pages();
+ lru_pages = global_lru_pages();
nr_slab = global_page_state(NR_SLAB_RECLAIMABLE);
/* If slab caches are huge, it's better to hit them first */
while (nr_slab >= lru_pages) {
@@ -1900,7 +1870,7 @@ unsigned long shrink_all_memory(unsigned
reclaim_state.reclaimed_slab = 0;
shrink_slab(sc.nr_scanned, sc.gfp_mask,
- count_lru_pages());
+ global_lru_pages());
ret += reclaim_state.reclaimed_slab;
if (ret >= nr_pages)
goto out;
@@ -1917,7 +1887,7 @@ unsigned long shrink_all_memory(unsigned
if (!ret) {
do {
reclaim_state.reclaimed_slab = 0;
- shrink_slab(nr_pages, sc.gfp_mask, count_lru_pages());
+ shrink_slab(nr_pages, sc.gfp_mask, global_lru_pages());
ret += reclaim_state.reclaimed_slab;
} while (ret < nr_pages && reclaim_state.reclaimed_slab > 0);
}
Index: linux-2.6.26-rc2-mm1/mm/swap_state.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/mm/swap_state.c 2008-05-23 14:21:34.000000000 -0400
+++ linux-2.6.26-rc2-mm1/mm/swap_state.c 2008-05-23 14:21:34.000000000 -0400
@@ -302,7 +302,7 @@ struct page *read_swap_cache_async(swp_e
/*
* Initiate read into locked page and return.
*/
- lru_cache_add_active(new_page);
+ lru_cache_add_active_anon(new_page);
swap_readpage(NULL, new_page);
return new_page;
}
Index: linux-2.6.26-rc2-mm1/include/linux/mmzone.h
===================================================================
--- linux-2.6.26-rc2-mm1.orig/include/linux/mmzone.h 2008-05-23 14:21:33.000000000 -0400
+++ linux-2.6.26-rc2-mm1/include/linux/mmzone.h 2008-05-23 14:21:34.000000000 -0400
@@ -81,21 +81,23 @@ struct zone_padding {
enum zone_stat_item {
/* First 128 byte cacheline (assuming 64 bit words) */
NR_FREE_PAGES,
- NR_INACTIVE, /* must match order of LRU_[IN]ACTIVE */
- NR_ACTIVE, /* " " " " " */
+ NR_INACTIVE_ANON, /* must match order of LRU_[IN]ACTIVE_* */
+ NR_ACTIVE_ANON, /* " " " " " */
+ NR_INACTIVE_FILE, /* " " " " " */
+ NR_ACTIVE_FILE, /* " " " " " */
NR_ANON_PAGES, /* Mapped anonymous pages */
NR_FILE_MAPPED, /* pagecache pages mapped into pagetables.
only modified from process context */
NR_FILE_PAGES,
NR_FILE_DIRTY,
NR_WRITEBACK,
- /* Second 128 byte cacheline */
NR_SLAB_RECLAIMABLE,
NR_SLAB_UNRECLAIMABLE,
NR_PAGETABLE, /* used for pagetables */
NR_UNSTABLE_NFS, /* NFS unstable pages */
NR_BOUNCE,
NR_VMSCAN_WRITE,
+ /* Second 128 byte cacheline */
NR_WRITEBACK_TEMP, /* Writeback using temporary buffers */
#ifdef CONFIG_NUMA
NUMA_HIT, /* allocated in intended node */
@@ -107,18 +109,33 @@ enum zone_stat_item {
#endif
NR_VM_ZONE_STAT_ITEMS };
+/*
+ * We do arithmetic on the LRU lists in various places in the code,
+ * so it is important to keep the active lists LRU_ACTIVE higher in
+ * the array than the corresponding inactive lists, and to keep
+ * the *_FILE lists LRU_FILE higher than the corresponding _ANON lists.
+ */
#define LRU_BASE 0
+#define LRU_ACTIVE 1
+#define LRU_FILE 2
enum lru_list {
- LRU_INACTIVE = LRU_BASE, /* must match order of NR_[IN]ACTIVE */
- LRU_ACTIVE, /* " " " " " */
+ LRU_INACTIVE_ANON = LRU_BASE,
+ LRU_ACTIVE_ANON = LRU_BASE + LRU_ACTIVE,
+ LRU_INACTIVE_FILE = LRU_BASE + LRU_FILE,
+ LRU_ACTIVE_FILE = LRU_BASE + LRU_FILE + LRU_ACTIVE,
NR_LRU_LISTS };
#define for_each_lru(l) for (l = 0; l < NR_LRU_LISTS; l++)
+static inline int is_file_lru(enum lru_list l)
+{
+ return (l == LRU_INACTIVE_FILE || l == LRU_ACTIVE_FILE);
+}
+
static inline int is_active_lru(enum lru_list l)
{
- return (l == LRU_ACTIVE);
+ return (l == LRU_ACTIVE_ANON || l == LRU_ACTIVE_FILE);
}
enum lru_list page_lru(struct page *page);
@@ -269,6 +286,10 @@ struct zone {
spinlock_t lru_lock;
struct list_head list[NR_LRU_LISTS];
unsigned long nr_scan[NR_LRU_LISTS];
+
+ unsigned long recent_rotated_anon;
+ unsigned long recent_rotated_file;
+
unsigned long pages_scanned; /* since last reclaim */
unsigned long flags; /* zone flags, see below */
Index: linux-2.6.26-rc2-mm1/include/linux/mm_inline.h
===================================================================
--- linux-2.6.26-rc2-mm1.orig/include/linux/mm_inline.h 2008-05-23 14:21:34.000000000 -0400
+++ linux-2.6.26-rc2-mm1/include/linux/mm_inline.h 2008-05-23 14:21:34.000000000 -0400
@@ -5,7 +5,7 @@
* page_file_cache - should the page be on a file LRU or anon LRU?
* @page: the page to test
*
- * Returns !0 if @page is page cache page backed by a regular filesystem,
+ * Returns LRU_FILE if @page is a page cache page backed by a regular filesystem,
* or 0 if @page is anonymous, tmpfs or otherwise ram or swap backed.
*
* We would like to get this info without a page flag, but the state
@@ -18,58 +18,83 @@ static inline int page_file_cache(struct
return 0;
/* The page is page cache backed by a normal filesystem. */
- return 2;
+ return LRU_FILE;
}
static inline void
add_page_to_lru_list(struct zone *zone, struct page *page, enum lru_list l)
{
list_add(&page->lru, &zone->list[l]);
- __inc_zone_state(zone, NR_INACTIVE + l);
+ __inc_zone_state(zone, NR_INACTIVE_ANON + l);
}
static inline void
del_page_from_lru_list(struct zone *zone, struct page *page, enum lru_list l)
{
list_del(&page->lru);
- __dec_zone_state(zone, NR_INACTIVE + l);
+ __dec_zone_state(zone, NR_INACTIVE_ANON + l);
}
static inline void
-add_page_to_active_list(struct zone *zone, struct page *page)
+add_page_to_inactive_anon_list(struct zone *zone, struct page *page)
{
- add_page_to_lru_list(zone, page, LRU_ACTIVE);
+ add_page_to_lru_list(zone, page, LRU_INACTIVE_ANON);
}
static inline void
-add_page_to_inactive_list(struct zone *zone, struct page *page)
+add_page_to_active_anon_list(struct zone *zone, struct page *page)
{
- add_page_to_lru_list(zone, page, LRU_INACTIVE);
+ add_page_to_lru_list(zone, page, LRU_ACTIVE_ANON);
}
static inline void
-del_page_from_active_list(struct zone *zone, struct page *page)
+add_page_to_inactive_file_list(struct zone *zone, struct page *page)
{
- del_page_from_lru_list(zone, page, LRU_ACTIVE);
+ add_page_to_lru_list(zone, page, LRU_INACTIVE_FILE);
}
static inline void
-del_page_from_inactive_list(struct zone *zone, struct page *page)
+add_page_to_active_file_list(struct zone *zone, struct page *page)
{
- del_page_from_lru_list(zone, page, LRU_INACTIVE);
+ add_page_to_lru_list(zone, page, LRU_ACTIVE_FILE);
+}
+
+static inline void
+del_page_from_inactive_anon_list(struct zone *zone, struct page *page)
+{
+ del_page_from_lru_list(zone, page, LRU_INACTIVE_ANON);
+}
+
+static inline void
+del_page_from_active_anon_list(struct zone *zone, struct page *page)
+{
+ del_page_from_lru_list(zone, page, LRU_ACTIVE_ANON);
+}
+
+static inline void
+del_page_from_inactive_file_list(struct zone *zone, struct page *page)
+{
+ del_page_from_lru_list(zone, page, LRU_INACTIVE_FILE);
+}
+
+static inline void
+del_page_from_active_file_list(struct zone *zone, struct page *page)
+{
+ del_page_from_lru_list(zone, page, LRU_INACTIVE_FILE);
}
static inline void
del_page_from_lru(struct zone *zone, struct page *page)
{
- enum lru_list l = LRU_INACTIVE;
+ enum lru_list l = LRU_INACTIVE_ANON;
list_del(&page->lru);
if (PageActive(page)) {
__ClearPageActive(page);
- l = LRU_ACTIVE;
+ l += LRU_ACTIVE;
}
- __dec_zone_state(zone, NR_INACTIVE + l);
+ l += page_file_cache(page);
+ __dec_zone_state(zone, NR_INACTIVE_ANON + l);
}
#endif
Index: linux-2.6.26-rc2-mm1/include/linux/pagevec.h
===================================================================
--- linux-2.6.26-rc2-mm1.orig/include/linux/pagevec.h 2008-05-23 14:21:33.000000000 -0400
+++ linux-2.6.26-rc2-mm1/include/linux/pagevec.h 2008-05-23 14:21:34.000000000 -0400
@@ -81,20 +81,37 @@ static inline void pagevec_free(struct p
__pagevec_free(pvec);
}
-static inline void __pagevec_lru_add(struct pagevec *pvec)
+static inline void __pagevec_lru_add_anon(struct pagevec *pvec)
{
- ____pagevec_lru_add(pvec, LRU_INACTIVE);
+ ____pagevec_lru_add(pvec, LRU_INACTIVE_ANON);
}
-static inline void __pagevec_lru_add_active(struct pagevec *pvec)
+static inline void __pagevec_lru_add_active_anon(struct pagevec *pvec)
{
- ____pagevec_lru_add(pvec, LRU_ACTIVE);
+ ____pagevec_lru_add(pvec, LRU_ACTIVE_ANON);
}
-static inline void pagevec_lru_add(struct pagevec *pvec)
+static inline void __pagevec_lru_add_file(struct pagevec *pvec)
+{
+ ____pagevec_lru_add(pvec, LRU_INACTIVE_FILE);
+}
+
+static inline void __pagevec_lru_add_active_file(struct pagevec *pvec)
+{
+ ____pagevec_lru_add(pvec, LRU_ACTIVE_FILE);
+}
+
+
+static inline void pagevec_lru_add_file(struct pagevec *pvec)
+{
+ if (pagevec_count(pvec))
+ __pagevec_lru_add_file(pvec);
+}
+
+static inline void pagevec_lru_add_anon(struct pagevec *pvec)
{
if (pagevec_count(pvec))
- __pagevec_lru_add(pvec);
+ __pagevec_lru_add_anon(pvec);
}
#endif /* _LINUX_PAGEVEC_H */
Index: linux-2.6.26-rc2-mm1/include/linux/vmstat.h
===================================================================
--- linux-2.6.26-rc2-mm1.orig/include/linux/vmstat.h 2008-05-23 14:21:21.000000000 -0400
+++ linux-2.6.26-rc2-mm1/include/linux/vmstat.h 2008-05-23 14:21:34.000000000 -0400
@@ -159,6 +159,16 @@ static inline unsigned long zone_page_st
return x;
}
+extern unsigned long global_lru_pages(void);
+
+static inline unsigned long zone_lru_pages(struct zone *zone)
+{
+ return (zone_page_state(zone, NR_ACTIVE_ANON)
+ + zone_page_state(zone, NR_ACTIVE_FILE)
+ + zone_page_state(zone, NR_INACTIVE_ANON)
+ + zone_page_state(zone, NR_INACTIVE_FILE));
+}
+
#ifdef CONFIG_NUMA
/*
* Determine the per node value of a stat item. This function
Index: linux-2.6.26-rc2-mm1/mm/page-writeback.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/mm/page-writeback.c 2008-05-23 14:21:21.000000000 -0400
+++ linux-2.6.26-rc2-mm1/mm/page-writeback.c 2008-05-23 14:21:34.000000000 -0400
@@ -331,9 +331,7 @@ static unsigned long highmem_dirtyable_m
struct zone *z =
&NODE_DATA(node)->node_zones[ZONE_HIGHMEM];
- x += zone_page_state(z, NR_FREE_PAGES)
- + zone_page_state(z, NR_INACTIVE)
- + zone_page_state(z, NR_ACTIVE);
+ x += zone_page_state(z, NR_FREE_PAGES) + zone_lru_pages(z);
}
/*
* Make sure that the number of highmem pages is never larger
@@ -351,9 +349,7 @@ static unsigned long determine_dirtyable
{
unsigned long x;
- x = global_page_state(NR_FREE_PAGES)
- + global_page_state(NR_INACTIVE)
- + global_page_state(NR_ACTIVE);
+ x = global_page_state(NR_FREE_PAGES) + global_lru_pages();
if (!vm_highmem_is_dirtyable)
x -= highmem_dirtyable_memory(x);
Index: linux-2.6.26-rc2-mm1/include/linux/swap.h
===================================================================
--- linux-2.6.26-rc2-mm1.orig/include/linux/swap.h 2008-05-23 14:21:33.000000000 -0400
+++ linux-2.6.26-rc2-mm1/include/linux/swap.h 2008-05-23 14:21:34.000000000 -0400
@@ -184,14 +184,24 @@ extern void swap_setup(void);
* lru_cache_add: add a page to the page lists
* @page: the page to add
*/
-static inline void lru_cache_add(struct page *page)
+static inline void lru_cache_add_anon(struct page *page)
{
- __lru_cache_add(page, LRU_INACTIVE);
+ __lru_cache_add(page, LRU_INACTIVE_ANON);
}
-static inline void lru_cache_add_active(struct page *page)
+static inline void lru_cache_add_active_anon(struct page *page)
{
- __lru_cache_add(page, LRU_ACTIVE);
+ __lru_cache_add(page, LRU_ACTIVE_ANON);
+}
+
+static inline void lru_cache_add_file(struct page *page)
+{
+ __lru_cache_add(page, LRU_INACTIVE_FILE);
+}
+
+static inline void lru_cache_add_active_file(struct page *page)
+{
+ __lru_cache_add(page, LRU_ACTIVE_FILE);
}
/* linux/mm/vmscan.c */
@@ -199,7 +209,7 @@ extern unsigned long try_to_free_pages(s
gfp_t gfp_mask);
extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem,
gfp_t gfp_mask);
-extern int __isolate_lru_page(struct page *page, int mode);
+extern int __isolate_lru_page(struct page *page, int mode, int file);
extern unsigned long shrink_all_memory(unsigned long nr_pages);
extern int vm_swappiness;
extern int remove_mapping(struct address_space *mapping, struct page *page);
Index: linux-2.6.26-rc2-mm1/include/linux/memcontrol.h
===================================================================
--- linux-2.6.26-rc2-mm1.orig/include/linux/memcontrol.h 2008-05-23 14:21:33.000000000 -0400
+++ linux-2.6.26-rc2-mm1/include/linux/memcontrol.h 2008-05-23 14:21:34.000000000 -0400
@@ -41,7 +41,7 @@ extern unsigned long mem_cgroup_isolate_
unsigned long *scanned, int order,
int mode, struct zone *z,
struct mem_cgroup *mem_cont,
- int active);
+ int active, int file);
extern void mem_cgroup_out_of_memory(struct mem_cgroup *mem, gfp_t gfp_mask);
int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem);
Index: linux-2.6.26-rc2-mm1/mm/memcontrol.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/mm/memcontrol.c 2008-05-23 14:21:33.000000000 -0400
+++ linux-2.6.26-rc2-mm1/mm/memcontrol.c 2008-05-23 14:21:34.000000000 -0400
@@ -163,6 +163,7 @@ struct page_cgroup {
};
#define PAGE_CGROUP_FLAG_CACHE (0x1) /* charged as cache */
#define PAGE_CGROUP_FLAG_ACTIVE (0x2) /* page is active in this cgroup */
+#define PAGE_CGROUP_FLAG_FILE (0x4) /* page is file system backed */
static int page_cgroup_nid(struct page_cgroup *pc)
{
@@ -280,8 +281,12 @@ static void unlock_page_cgroup(struct pa
static void __mem_cgroup_remove_list(struct mem_cgroup_per_zone *mz,
struct page_cgroup *pc)
{
- int from = pc->flags & PAGE_CGROUP_FLAG_ACTIVE;
- int lru = !!from;
+ int lru = LRU_BASE;
+
+ if (pc->flags & PAGE_CGROUP_FLAG_ACTIVE)
+ lru += LRU_ACTIVE;
+ if (pc->flags & PAGE_CGROUP_FLAG_FILE)
+ lru += LRU_FILE;
MEM_CGROUP_ZSTAT(mz, lru) -= 1;
@@ -292,10 +297,12 @@ static void __mem_cgroup_remove_list(str
static void __mem_cgroup_add_list(struct mem_cgroup_per_zone *mz,
struct page_cgroup *pc)
{
- int lru = LRU_INACTIVE;
+ int lru = LRU_BASE;
if (pc->flags & PAGE_CGROUP_FLAG_ACTIVE)
lru += LRU_ACTIVE;
+ if (pc->flags & PAGE_CGROUP_FLAG_FILE)
+ lru += LRU_FILE;
MEM_CGROUP_ZSTAT(mz, lru) += 1;
list_add(&pc->lru, &mz->lists[lru]);
@@ -306,10 +313,9 @@ static void __mem_cgroup_add_list(struct
static void __mem_cgroup_move_lists(struct page_cgroup *pc, bool active)
{
struct mem_cgroup_per_zone *mz = page_cgroup_zoneinfo(pc);
- int lru = LRU_INACTIVE;
-
- if (pc->flags & PAGE_CGROUP_FLAG_ACTIVE)
- lru += LRU_ACTIVE;
+ int from = pc->flags & PAGE_CGROUP_FLAG_ACTIVE;
+ int file = pc->flags & PAGE_CGROUP_FLAG_FILE;
+ int lru = LRU_FILE * !!file + !!from;
MEM_CGROUP_ZSTAT(mz, lru) -= 1;
@@ -318,7 +324,7 @@ static void __mem_cgroup_move_lists(stru
else
pc->flags &= ~PAGE_CGROUP_FLAG_ACTIVE;
- lru = !!active;
+ lru = LRU_FILE * !!file + !!active;
MEM_CGROUP_ZSTAT(mz, lru) += 1;
list_move(&pc->lru, &mz->lists[lru]);
}
@@ -380,21 +386,6 @@ int mem_cgroup_calc_mapped_ratio(struct
}
/*
- * This function is called from vmscan.c. In page reclaiming loop. balance
- * between active and inactive list is calculated. For memory controller
- * page reclaiming, we should use using mem_cgroup's imbalance rather than
- * zone's global lru imbalance.
- */
-long mem_cgroup_reclaim_imbalance(struct mem_cgroup *mem)
-{
- unsigned long active, inactive;
- /* active and inactive are the number of pages. 'long' is ok.*/
- active = mem_cgroup_get_all_zonestat(mem, LRU_ACTIVE);
- inactive = mem_cgroup_get_all_zonestat(mem, LRU_INACTIVE);
- return (long) (active / (inactive + 1));
-}
-
-/*
* prev_priority control...this will be used in memory reclaim path.
*/
int mem_cgroup_get_reclaim_priority(struct mem_cgroup *mem)
@@ -439,7 +430,7 @@ unsigned long mem_cgroup_isolate_pages(u
unsigned long *scanned, int order,
int mode, struct zone *z,
struct mem_cgroup *mem_cont,
- int active)
+ int active, int file)
{
unsigned long nr_taken = 0;
struct page *page;
@@ -450,7 +441,7 @@ unsigned long mem_cgroup_isolate_pages(u
int nid = z->zone_pgdat->node_id;
int zid = zone_idx(z);
struct mem_cgroup_per_zone *mz;
- int lru = !!active;
+ int lru = LRU_FILE * !!file + !!active;
BUG_ON(!mem_cont);
mz = mem_cgroup_zoneinfo(mem_cont, nid, zid);
@@ -466,6 +457,9 @@ unsigned long mem_cgroup_isolate_pages(u
if (unlikely(!PageLRU(page)))
continue;
+ /*
+ * TODO: play better with lumpy reclaim, grabbing anything.
+ */
if (PageActive(page) && !active) {
__mem_cgroup_move_lists(pc, true);
continue;
@@ -478,7 +472,7 @@ unsigned long mem_cgroup_isolate_pages(u
scan++;
list_move(&pc->lru, &pc_list);
- if (__isolate_lru_page(page, mode) == 0) {
+ if (__isolate_lru_page(page, mode, file) == 0) {
list_move(&page->lru, dst);
nr_taken++;
}
@@ -590,9 +584,11 @@ retry:
* If a page is accounted as a page cache, insert to inactive list.
* If anon, insert to active list.
*/
- if (ctype == MEM_CGROUP_CHARGE_TYPE_CACHE)
+ if (ctype == MEM_CGROUP_CHARGE_TYPE_CACHE) {
pc->flags = PAGE_CGROUP_FLAG_CACHE;
- else
+ if (page_file_cache(page))
+ pc->flags |= PAGE_CGROUP_FLAG_FILE;
+ } else
pc->flags = PAGE_CGROUP_FLAG_ACTIVE;
lock_page_cgroup(page);
@@ -890,14 +886,21 @@ static int mem_control_stat_show(struct
}
/* showing # of active pages */
{
- unsigned long active, inactive;
+ unsigned long active_anon, inactive_anon;
+ unsigned long active_file, inactive_file;
- inactive = mem_cgroup_get_all_zonestat(mem_cont,
- LRU_INACTIVE);
- active = mem_cgroup_get_all_zonestat(mem_cont,
- LRU_ACTIVE);
- cb->fill(cb, "active", (active) * PAGE_SIZE);
- cb->fill(cb, "inactive", (inactive) * PAGE_SIZE);
+ inactive_anon = mem_cgroup_get_all_zonestat(mem_cont,
+ LRU_INACTIVE_ANON);
+ active_anon = mem_cgroup_get_all_zonestat(mem_cont,
+ LRU_ACTIVE_ANON);
+ inactive_file = mem_cgroup_get_all_zonestat(mem_cont,
+ LRU_INACTIVE_FILE);
+ active_file = mem_cgroup_get_all_zonestat(mem_cont,
+ LRU_ACTIVE_FILE);
+ cb->fill(cb, "active_anon", (active_anon) * PAGE_SIZE);
+ cb->fill(cb, "inactive_anon", (inactive_anon) * PAGE_SIZE);
+ cb->fill(cb, "active_file", (active_file) * PAGE_SIZE);
+ cb->fill(cb, "inactive_file", (inactive_file) * PAGE_SIZE);
}
return 0;
}
--
All Rights Reversed
* Re: [PATCH -mm 06/12] split LRU lists into anon & file sets
2008-05-29 20:22 ` [PATCH -mm 06/12] split LRU lists into anon & file sets Rik van Riel
@ 2008-06-01 15:35 ` Balbir Singh
0 siblings, 0 replies; 15+ messages in thread
From: Balbir Singh @ 2008-06-01 15:35 UTC (permalink / raw)
To: Rik van Riel
Cc: linux-kernel, Andrew Morton, Lee Schermerhorn, Kosaki Motohiro
* Rik van Riel <riel@redhat.com> [2008-05-29 16:22:52]:
> From: Rik van Riel <riel@redhat.com>
>
> Split the LRU lists in two, one set for pages that are backed by
> real file systems ("file") and one for pages that are backed by
> memory and swap ("anon"). The latter includes tmpfs.
>
> Eventually mlocked pages will be taken off the LRUs altogether.
> A patch for that already exists and just needs to be integrated
> into this series.
>
> This patch mostly has the infrastructure and a basic policy to
> balance how much we scan the anon lists and how much we scan
> the file lists. The big policy changes are in separate patches.
>
> Signed-off-by: Rik van Riel <riel@redhat.com>
> Signed-off-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
>
Hi, Rik,
My problems with OOM continue, despite the changes incorporated into
memcontrol.c. While investigating the problem, I noticed that the code
below seems incorrect.
> +static inline void
> +del_page_from_active_file_list(struct zone *zone, struct page *page)
> +{
> + del_page_from_lru_list(zone, page, LRU_INACTIVE_FILE);
> }
Shouldn't this be del_page_from_lru_list(zone, page, LRU_ACTIVE_FILE)?
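i.e., with that one-constant fix the helper would read:

static inline void
del_page_from_active_file_list(struct zone *zone, struct page *page)
{
	del_page_from_lru_list(zone, page, LRU_ACTIVE_FILE);
}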
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
* [PATCH -mm 07/12] second chance replacement for anonymous pages
2008-05-29 20:22 [PATCH -mm 00/12] VM pageout scalability improvements (V9) Rik van Riel
` (5 preceding siblings ...)
2008-05-29 20:22 ` [PATCH -mm 06/12] split LRU lists into anon & file sets Rik van Riel
@ 2008-05-29 20:22 ` Rik van Riel
2008-06-01 20:04 ` Balbir Singh
2008-05-29 20:22 ` [PATCH -mm 08/12] add some sanity checks to get_scan_ratio Rik van Riel
` (4 subsequent siblings)
11 siblings, 1 reply; 15+ messages in thread
From: Rik van Riel @ 2008-05-29 20:22 UTC (permalink / raw)
To: linux-kernel; +Cc: Andrew Morton, Lee Schermerhorn, Kosaki Motohiro
[-- Attachment #1: rvr-03-linux-2.6-vm-anon-clock.patch --]
[-- Type: text/plain, Size: 8223 bytes --]
From: Rik van Riel <riel@redhat.com>
We avoid evicting and scanning anonymous pages for the most part, but
under some workloads we can end up with most of memory filled with
anonymous pages. At that point, we suddenly need to clear the referenced
bits on all of memory, which can take ages on very large memory systems.
We can reduce the maximum number of pages that need to be scanned by
not taking the referenced state into account when deactivating an
anonymous page. After all, every anonymous page starts out referenced,
so why check?
If an anonymous page gets referenced again before it reaches the end
of the inactive list, we move it back to the active list.
To keep the maximum amount of necessary work reasonable, we scale the
active to inactive ratio with the size of memory, using the formula
active:inactive ratio = sqrt(memory in GB * 10).
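For example, a 1GB zone gets a ratio of int_sqrt(10 * 1) = 3, keeping
roughly a quarter of its anonymous pages on the inactive list; a 100GB
zone gets int_sqrt(10 * 100) = 31, capping inactive anon at about 3GB
(see the table in the page_alloc.c hunk below).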
Kswapd CPU use now seems to scale with the amount of pageout bandwidth,
instead of with the amount of memory present in the system.
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
include/linux/mm_inline.h | 12 ++++++++++++
include/linux/mmzone.h | 5 +++++
mm/page_alloc.c | 40 ++++++++++++++++++++++++++++++++++++++++
mm/vmscan.c | 38 +++++++++++++++++++++++++++++++-------
mm/vmstat.c | 6 ++++--
5 files changed, 92 insertions(+), 9 deletions(-)
Index: linux-2.6.26-rc2-mm1/include/linux/mm_inline.h
===================================================================
--- linux-2.6.26-rc2-mm1.orig/include/linux/mm_inline.h 2008-05-23 14:21:34.000000000 -0400
+++ linux-2.6.26-rc2-mm1/include/linux/mm_inline.h 2008-05-28 12:09:06.000000000 -0400
@@ -97,4 +97,16 @@ del_page_from_lru(struct zone *zone, str
__dec_zone_state(zone, NR_INACTIVE_ANON + l);
}
+static inline int inactive_anon_low(struct zone *zone)
+{
+ unsigned long active, inactive;
+
+ active = zone_page_state(zone, NR_ACTIVE_ANON);
+ inactive = zone_page_state(zone, NR_INACTIVE_ANON);
+
+ if (inactive * zone->inactive_ratio < active)
+ return 1;
+
+ return 0;
+}
#endif
Index: linux-2.6.26-rc2-mm1/include/linux/mmzone.h
===================================================================
--- linux-2.6.26-rc2-mm1.orig/include/linux/mmzone.h 2008-05-23 14:21:34.000000000 -0400
+++ linux-2.6.26-rc2-mm1/include/linux/mmzone.h 2008-05-28 12:09:06.000000000 -0400
@@ -311,6 +311,11 @@ struct zone {
*/
int prev_priority;
+ /*
+ * The ratio of active to inactive pages.
+ */
+ unsigned int inactive_ratio;
+
ZONE_PADDING(_pad2_)
/* Rarely used or read-mostly fields */
Index: linux-2.6.26-rc2-mm1/mm/page_alloc.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/mm/page_alloc.c 2008-05-23 14:21:34.000000000 -0400
+++ linux-2.6.26-rc2-mm1/mm/page_alloc.c 2008-05-28 12:09:06.000000000 -0400
@@ -4269,6 +4269,45 @@ void setup_per_zone_pages_min(void)
calculate_totalreserve_pages();
}
+/**
+ * setup_per_zone_inactive_ratio - called when min_free_kbytes changes.
+ *
+ * The inactive anon list should be small enough that the VM never has to
+ * do too much work, but large enough that each inactive page has a chance
+ * to be referenced again before it is swapped out.
+ *
+ * The inactive_anon ratio is the ratio of active to inactive anonymous
+ * pages. I.e. a ratio of 3 means 3:1, so 25% of the anonymous pages are
+ * on the inactive list.
+ *
+ *  total     return    max
+ *  memory    value     inactive anon
+ * -------------------------------------
+ *    10MB       1         5MB
+ *   100MB       1        50MB
+ *     1GB       3       250MB
+ *    10GB      10       0.9GB
+ *   100GB      31         3GB
+ *     1TB     101        10GB
+ *    10TB     320        32GB
+ */
+void setup_per_zone_inactive_ratio(void)
+{
+ struct zone *zone;
+
+ for_each_zone(zone) {
+ unsigned int gb, ratio;
+
+ /* Zone size in gigabytes */
+ gb = zone->present_pages >> (30 - PAGE_SHIFT);
+ ratio = int_sqrt(10 * gb);
+ if (!ratio)
+ ratio = 1;
+
+ zone->inactive_ratio = ratio;
+ }
+}
+
/*
* Initialise min_free_kbytes.
*
@@ -4306,6 +4345,7 @@ static int __init init_per_zone_pages_mi
min_free_kbytes = 65536;
setup_per_zone_pages_min();
setup_per_zone_lowmem_reserve();
+ setup_per_zone_inactive_ratio();
return 0;
}
module_init(init_per_zone_pages_min)
Index: linux-2.6.26-rc2-mm1/mm/vmscan.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/mm/vmscan.c 2008-05-23 14:21:34.000000000 -0400
+++ linux-2.6.26-rc2-mm1/mm/vmscan.c 2008-05-28 12:11:38.000000000 -0400
@@ -114,7 +114,7 @@ struct scan_control {
/*
* From 0 .. 100. Higher means more swappy.
*/
-int vm_swappiness = 60;
+int vm_swappiness = 20;
long vm_total_pages; /* The total number of pages which the VM controls */
static LIST_HEAD(shrinker_list);
@@ -1008,7 +1008,7 @@ static inline int zone_is_near_oom(struc
static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
struct scan_control *sc, int priority, int file)
{
- unsigned long pgmoved;
+ unsigned long pgmoved = 0;
int pgdeactivate = 0;
unsigned long pgscanned;
LIST_HEAD(l_hold); /* The pages which were snipped off */
@@ -1036,17 +1036,32 @@ static void shrink_active_list(unsigned
__mod_zone_page_state(zone, NR_ACTIVE_ANON, -pgmoved);
spin_unlock_irq(&zone->lru_lock);
+ pgmoved = 0;
while (!list_empty(&l_hold)) {
cond_resched();
page = lru_to_page(&l_hold);
list_del(&page->lru);
- if (page_referenced(page, 0, sc->mem_cgroup))
- list_add(&page->lru, &l_active);
- else
+ if (page_referenced(page, 0, sc->mem_cgroup)) {
+ if (file) {
+ /* Referenced file pages stay active. */
+ list_add(&page->lru, &l_active);
+ } else {
+ /* Anonymous pages always get deactivated. */
+ list_add(&page->lru, &l_inactive);
+ pgmoved++;
+ }
+ } else
list_add(&page->lru, &l_inactive);
}
/*
+ * Count the referenced anon pages as rotated, to balance pageout
+ * scan pressure between file and anonymous pages in get_scan_ratio.
+ */
+ if (!file)
+ zone->recent_rotated_anon += pgmoved;
+
+ /*
* Now put the pages back on the appropriate [file or anon] inactive
* and active lists.
*/
@@ -1129,7 +1144,13 @@ static unsigned long shrink_list(enum lr
{
int file = is_file_lru(lru);
- if (lru == LRU_ACTIVE_ANON || lru == LRU_ACTIVE_FILE) {
+ if (lru == LRU_ACTIVE_FILE) {
+ shrink_active_list(nr_to_scan, zone, sc, priority, file);
+ return 0;
+ }
+
+ if (lru == LRU_ACTIVE_ANON &&
+ (!scan_global_lru(sc) || inactive_anon_low(zone))) {
shrink_active_list(nr_to_scan, zone, sc, priority, file);
return 0;
}
@@ -1239,8 +1260,8 @@ static unsigned long shrink_zone(int pri
}
}
- while (nr[LRU_ACTIVE_ANON] || nr[LRU_INACTIVE_ANON] ||
- nr[LRU_ACTIVE_FILE] || nr[LRU_INACTIVE_FILE]) {
+ while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
+ nr[LRU_INACTIVE_FILE]) {
for_each_lru(l) {
if (nr[l]) {
nr_to_scan = min(nr[l],
@@ -1542,6 +1563,14 @@ loop_again:
priority != DEF_PRIORITY)
continue;
+ /*
+ * Do some background aging of the anon list, to give
+ * pages a chance to be referenced before reclaiming.
+ */
+ if (inactive_anon_low(zone))
+ shrink_active_list(SWAP_CLUSTER_MAX, zone,
+ &sc, priority, 0);
+
if (!zone_watermark_ok(zone, order, zone->pages_high,
0, 0)) {
end_zone = i;
Index: linux-2.6.26-rc2-mm1/mm/vmstat.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/mm/vmstat.c 2008-05-23 14:21:34.000000000 -0400
+++ linux-2.6.26-rc2-mm1/mm/vmstat.c 2008-05-28 12:09:06.000000000 -0400
@@ -814,10 +814,12 @@ static void zoneinfo_show_print(struct s
seq_printf(m,
"\n all_unreclaimable: %u"
"\n prev_priority: %i"
- "\n start_pfn: %lu",
+ "\n start_pfn: %lu"
+ "\n inactive_ratio: %u",
zone_is_all_unreclaimable(zone),
zone->prev_priority,
- zone->zone_start_pfn);
+ zone->zone_start_pfn,
+ zone->inactive_ratio);
seq_putc(m, '\n');
}
--
All Rights Reversed
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH -mm 07/12] second chance replacement for anonymous pages
2008-05-29 20:22 ` [PATCH -mm 07/12] second chance replacement for anonymous pages Rik van Riel
@ 2008-06-01 20:04 ` Balbir Singh
0 siblings, 0 replies; 15+ messages in thread
From: Balbir Singh @ 2008-06-01 20:04 UTC (permalink / raw)
To: Rik van Riel
Cc: linux-kernel, Andrew Morton, Lee Schermerhorn, Kosaki Motohiro
* Rik van Riel <riel@redhat.com> [2008-05-29 16:22:53]:
> From: Rik van Riel <riel@redhat.com>
>
> + if (lru == LRU_ACTIVE_ANON &&
> + (!scan_global_lru(sc) || inactive_anon_low(zone))) {
> shrink_active_list(nr_to_scan, zone, sc, priority, file);
> return 0;
> }
This should do the trick. I was experimenting with v8 and came up
with the exact same fix, after spending some time reviewing and
playing around with the code.
Let me bundle up all of V9 and test it.
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH -mm 08/12] add some sanity checks to get_scan_ratio
2008-05-29 20:22 [PATCH -mm 00/12] VM pageout scalability improvements (V9) Rik van Riel
` (6 preceding siblings ...)
2008-05-29 20:22 ` [PATCH -mm 07/12] second chance replacement for anonymous pages Rik van Riel
@ 2008-05-29 20:22 ` Rik van Riel
2008-05-29 20:22 ` [PATCH -mm 09/12] fix pagecache reclaim referenced bit check Rik van Riel
` (3 subsequent siblings)
11 siblings, 0 replies; 15+ messages in thread
From: Rik van Riel @ 2008-05-29 20:22 UTC (permalink / raw)
To: linux-kernel; +Cc: Andrew Morton, Lee Schermerhorn, Kosaki Motohiro
[-- Attachment #1: rvr-04-linux-2.6-scan-ratio-fixes.patch --]
[-- Type: text/plain, Size: 8679 bytes --]
From: Rik van Riel <riel@redhat.com>
The access-ratio-based scan rate determination in get_scan_ratio
works well in most situations, but needs correcting in a few
corner cases:
- if we run out of swap space, do not bother scanning the anon LRUs
- if we have already freed all of the page cache, we need to scan
the anon LRUs
- restore the *actual* access-ratio-based scan rate algorithm; the
previous versions of this patch series had the wrong one (a condensed
sketch follows this list)
- scale the number of pages added to zone->nr_scan[l]
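In outline, the restored computation lowers scan pressure on a list as
more of its recently scanned pages turn out to be referenced again
(condensed from the get_scan_ratio() hunk below; anon_prio and
file_prio are derived from swappiness):

	ap = (anon_prio + 1) * (zone->recent_scanned_anon + 1);
	ap /= zone->recent_rotated_anon + 1;

	fp = (file_prio + 1) * (zone->recent_scanned_file + 1);
	fp /= zone->recent_rotated_file + 1;

	/* normalize to percentages of scan effort */
	percent[0] = 100 * ap / (ap + fp + 1);	/* anon */
	percent[1] = 100 - percent[0];		/* file */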
Signed-off-by: Rik van Riel <riel@redhat.com>
---
include/linux/mmzone.h | 2
mm/page_alloc.c | 3 -
mm/swap.c | 13 +++++-
mm/vmscan.c | 104 ++++++++++++++++++++++++++++++++-----------------
4 files changed, 85 insertions(+), 37 deletions(-)
Index: linux-2.6.26-rc2-mm1/mm/vmscan.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/mm/vmscan.c 2008-05-28 12:11:38.000000000 -0400
+++ linux-2.6.26-rc2-mm1/mm/vmscan.c 2008-05-28 12:11:51.000000000 -0400
@@ -893,8 +893,13 @@ static unsigned long shrink_inactive_lis
__mod_zone_page_state(zone, NR_INACTIVE_ANON,
-count[LRU_INACTIVE_ANON]);
- if (scan_global_lru(sc))
+ if (scan_global_lru(sc)) {
zone->pages_scanned += nr_scan;
+ zone->recent_scanned_anon += count[LRU_ACTIVE_ANON] +
+ count[LRU_INACTIVE_ANON];
+ zone->recent_scanned_file += count[LRU_ACTIVE_FILE] +
+ count[LRU_INACTIVE_FILE];
+ }
spin_unlock_irq(&zone->lru_lock);
nr_scanned += nr_scan;
@@ -944,11 +949,13 @@ static unsigned long shrink_inactive_lis
VM_BUG_ON(PageLRU(page));
SetPageLRU(page);
list_del(&page->lru);
- if (page_file_cache(page)) {
+ if (page_file_cache(page))
lru += LRU_FILE;
- zone->recent_rotated_file++;
- } else {
- zone->recent_rotated_anon++;
+ if (scan_global_lru(sc)) {
+ if (page_file_cache(page))
+ zone->recent_rotated_file++;
+ else
+ zone->recent_rotated_anon++;
}
if (PageActive(page))
lru += LRU_ACTIVE;
@@ -1027,8 +1034,13 @@ static void shrink_active_list(unsigned
* zone->pages_scanned is used for detect zone's oom
* mem_cgroup remembers nr_scan by itself.
*/
- if (scan_global_lru(sc))
+ if (scan_global_lru(sc)) {
zone->pages_scanned += pgscanned;
+ if (file)
+ zone->recent_scanned_file += pgscanned;
+ else
+ zone->recent_scanned_anon += pgscanned;
+ }
if (file)
__mod_zone_page_state(zone, NR_ACTIVE_FILE, -pgmoved);
@@ -1170,9 +1182,8 @@ static unsigned long shrink_list(enum lr
static void get_scan_ratio(struct zone *zone, struct scan_control * sc,
unsigned long *percent)
{
- unsigned long anon, file;
+ unsigned long anon, file, free;
unsigned long anon_prio, file_prio;
- unsigned long rotate_sum;
unsigned long ap, fp;
anon = zone_page_state(zone, NR_ACTIVE_ANON) +
@@ -1180,15 +1191,19 @@ static void get_scan_ratio(struct zone *
file = zone_page_state(zone, NR_ACTIVE_FILE) +
zone_page_state(zone, NR_INACTIVE_FILE);
- rotate_sum = zone->recent_rotated_file + zone->recent_rotated_anon;
-
/* Keep a floating average of RECENT references. */
- if (unlikely(rotate_sum > min(anon, file))) {
+ if (unlikely(zone->recent_scanned_anon > anon / zone->inactive_ratio)) {
spin_lock_irq(&zone->lru_lock);
- zone->recent_rotated_file /= 2;
+ zone->recent_scanned_anon /= 2;
zone->recent_rotated_anon /= 2;
spin_unlock_irq(&zone->lru_lock);
- rotate_sum /= 2;
+ }
+
+ if (unlikely(zone->recent_scanned_file > file / 4)) {
+ spin_lock_irq(&zone->lru_lock);
+ zone->recent_scanned_file /= 2;
+ zone->recent_rotated_file /= 2;
+ spin_unlock_irq(&zone->lru_lock);
}
/*
@@ -1201,23 +1216,33 @@ static void get_scan_ratio(struct zone *
/*
*                  anon       recent_rotated_anon
* %anon = 100 * ----------- / ------------------- * IO cost
- *             anon + file        rotate_sum
+ *             anon + file    recent_scanned_anon
*/
- ap = (anon_prio * anon) / (anon + file + 1);
- ap *= rotate_sum / (zone->recent_rotated_anon + 1);
- if (ap == 0)
- ap = 1;
- else if (ap > 100)
- ap = 100;
- percent[0] = ap;
-
- fp = (file_prio * file) / (anon + file + 1);
- fp *= rotate_sum / (zone->recent_rotated_file + 1);
- if (fp == 0)
- fp = 1;
- else if (fp > 100)
- fp = 100;
- percent[1] = fp;
+ ap = (anon_prio + 1) * (zone->recent_scanned_anon + 1);
+ ap /= zone->recent_rotated_anon + 1;
+
+ fp = (file_prio + 1) * (zone->recent_scanned_file + 1);
+ fp /= zone->recent_rotated_file + 1;
+
+ /* Normalize to percentages */
+ percent[0] = 100 * ap / (ap + fp + 1);
+ percent[1] = 100 - percent[0];
+
+ free = zone_page_state(zone, NR_FREE_PAGES);
+
+ /*
+ * If we have no swap space, do not bother scanning anon pages.
+ */
+ if (nr_swap_pages <= 0) {
+ percent[0] = 0;
+ percent[1] = 100;
+ }
+ /*
+ * If we already freed most file pages, scan the anon pages
+ * regardless of the page access ratios or swappiness setting.
+ */
+ else if (file + free <= zone->pages_high)
+ percent[0] = 100;
}
@@ -1238,13 +1263,17 @@ static unsigned long shrink_zone(int pri
for_each_lru(l) {
if (scan_global_lru(sc)) {
int file = is_file_lru(l);
+ int scan;
/*
* Add one to nr_to_scan just to make sure that the
- * kernel will slowly sift through the active list.
+ * kernel will slowly sift through each list.
*/
- zone->nr_scan[l] += (zone_page_state(zone,
- NR_INACTIVE_ANON + l) >> priority) + 1;
- nr[l] = zone->nr_scan[l] * percent[file] / 100;
+ scan = zone_page_state(zone, NR_INACTIVE_ANON + l);
+ scan >>= priority;
+ scan = (scan * percent[file]) / 100;
+
+ zone->nr_scan[l] += scan + 1;
+ nr[l] = zone->nr_scan[l];
if (nr[l] >= sc->swap_cluster_max)
zone->nr_scan[l] = 0;
else
@@ -1261,7 +1290,7 @@ static unsigned long shrink_zone(int pri
}
while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
- nr[LRU_INACTIVE_FILE]) {
+ nr[LRU_INACTIVE_FILE]) {
for_each_lru(l) {
if (nr[l]) {
nr_to_scan = min(nr[l],
@@ -1274,6 +1303,13 @@ static unsigned long shrink_zone(int pri
}
}
+ /*
+ * Even if we did not try to evict anon pages at all, we want to
+ * rebalance the anon lru active/inactive ratio.
+ */
+ if (scan_global_lru(sc) && inactive_anon_low(zone))
+ shrink_active_list(SWAP_CLUSTER_MAX, zone, sc, priority, 0);
+
throttle_vm_writeout(sc->gfp_mask);
return nr_reclaimed;
}
Index: linux-2.6.26-rc2-mm1/include/linux/mmzone.h
===================================================================
--- linux-2.6.26-rc2-mm1.orig/include/linux/mmzone.h 2008-05-28 12:09:06.000000000 -0400
+++ linux-2.6.26-rc2-mm1/include/linux/mmzone.h 2008-05-28 12:11:51.000000000 -0400
@@ -289,6 +289,8 @@ struct zone {
unsigned long recent_rotated_anon;
unsigned long recent_rotated_file;
+ unsigned long recent_scanned_anon;
+ unsigned long recent_scanned_file;
unsigned long pages_scanned; /* since last reclaim */
unsigned long flags; /* zone flags, see below */
Index: linux-2.6.26-rc2-mm1/mm/page_alloc.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/mm/page_alloc.c 2008-05-28 12:09:06.000000000 -0400
+++ linux-2.6.26-rc2-mm1/mm/page_alloc.c 2008-05-28 12:11:51.000000000 -0400
@@ -3512,7 +3512,8 @@ static void __paginginit free_area_init_
}
zone->recent_rotated_anon = 0;
zone->recent_rotated_file = 0;
-//TODO recent_scanned_* ???
+ zone->recent_scanned_anon = 0;
+ zone->recent_scanned_file = 0;
zap_zone_vm_stats(zone);
zone->flags = 0;
if (!size)
Index: linux-2.6.26-rc2-mm1/mm/swap.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/mm/swap.c 2008-05-28 12:09:06.000000000 -0400
+++ linux-2.6.26-rc2-mm1/mm/swap.c 2008-05-28 12:11:51.000000000 -0400
@@ -176,8 +176,8 @@ void activate_page(struct page *page)
spin_lock_irq(&zone->lru_lock);
if (PageLRU(page) && !PageActive(page)) {
- int lru = LRU_BASE;
- lru += page_file_cache(page);
+ int file = page_file_cache(page);
+ int lru = LRU_BASE + file;
del_page_from_lru_list(zone, page, lru);
SetPageActive(page);
@@ -185,6 +185,15 @@ void activate_page(struct page *page)
add_page_to_lru_list(zone, page, lru);
__count_vm_event(PGACTIVATE);
mem_cgroup_move_lists(page, true);
+
+ if (file) {
+ zone->recent_scanned_file++;
+ zone->recent_rotated_file++;
+ } else {
+ /* Can this happen? Maybe through tmpfs... */
+ zone->recent_scanned_anon++;
+ zone->recent_rotated_anon++;
+ }
}
spin_unlock_irq(&zone->lru_lock);
}
--
All Rights Reversed
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH -mm 09/12] fix pagecache reclaim referenced bit check
2008-05-29 20:22 [PATCH -mm 00/12] VM pageout scalability improvements (V9) Rik van Riel
` (7 preceding siblings ...)
2008-05-29 20:22 ` [PATCH -mm 08/12] add some sanity checks to get_scan_ratio Rik van Riel
@ 2008-05-29 20:22 ` Rik van Riel
2008-05-29 20:22 ` [PATCH -mm 10/12] add newly swapped in pages to the inactive list Rik van Riel
` (2 subsequent siblings)
11 siblings, 0 replies; 15+ messages in thread
From: Rik van Riel @ 2008-05-29 20:22 UTC (permalink / raw)
To: linux-kernel
Cc: Andrew Morton, Lee Schermerhorn, Kosaki Motohiro, Martin Bligh
[-- Attachment #1: rvr-05-linux-2.6-fix-pagecache-reclaim.patch --]
[-- Type: text/plain, Size: 1346 bytes --]
From: Rik van Riel <riel@redhat.com>
The -mm tree contains the patch
vmscan-give-referenced-active-and-unmapped-pages-a-second-trip-around-the-lru.patch
which gives referenced pagecache pages another trip around
the active list. This seems to help keep frequently accessed
pagecache pages in memory.
However, it means that pagecache pages moved to the inactive list
do not have their referenced bit set, so a single subsequent
reference will not move them back to the active list. This patch
sets the referenced bit on pagecache pages
that get deactivated, so the next access to the page will promote
it back to the active list.
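For context, the promotion this relies on is the two-step logic in
mark_page_accessed(); paraphrased from mm/swap.c of this era (verify
against the tree you apply to):

void mark_page_accessed(struct page *page)
{
	if (!PageActive(page) && PageReferenced(page) && PageLRU(page)) {
		/* second access: promote to the active list */
		activate_page(page);
		ClearPageReferenced(page);
	} else if (!PageReferenced(page)) {
		/* first access: just remember it */
		SetPageReferenced(page);
	}
}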
---
mm/vmscan.c | 4 ++++
1 file changed, 4 insertions(+)
Index: linux-2.6.26-rc2-mm1/mm/vmscan.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/mm/vmscan.c 2008-05-28 12:11:51.000000000 -0400
+++ linux-2.6.26-rc2-mm1/mm/vmscan.c 2008-05-28 12:14:34.000000000 -0400
@@ -1062,8 +1062,13 @@ static void shrink_active_list(unsigned
list_add(&page->lru, &l_inactive);
pgmoved++;
}
- } else
+ } else {
list_add(&page->lru, &l_inactive);
+ if (file && !page_mapped(page))
+ /* Bypass use-once, make the next access count.
+ * See mark_page_accessed. */
+ SetPageReferenced(page);
+ }
}
/*
--
All Rights Reversed
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH -mm 10/12] add newly swapped in pages to the inactive list
2008-05-29 20:22 [PATCH -mm 00/12] VM pageout scalability improvements (V9) Rik van Riel
` (8 preceding siblings ...)
2008-05-29 20:22 ` [PATCH -mm 09/12] fix pagecache reclaim referenced bit check Rik van Riel
@ 2008-05-29 20:22 ` Rik van Riel
2008-05-29 20:22 ` [PATCH -mm 11/12] more aggressively use lumpy reclaim Rik van Riel
2008-05-29 20:22 ` [PATCH -mm 12/12] pageflag helpers for configed-out flags Rik van Riel
11 siblings, 0 replies; 15+ messages in thread
From: Rik van Riel @ 2008-05-29 20:22 UTC (permalink / raw)
To: linux-kernel; +Cc: Andrew Morton, Lee Schermerhorn, Kosaki Motohiro
[-- Attachment #1: rvr-swapin-inactive.patch --]
[-- Type: text/plain, Size: 1113 bytes --]
From: Rik van Riel <riel@redhat.com>
swapin_readahead() can read in a lot of data that the processes in
memory never need. Adding swap cache pages to the inactive list
prevents them from putting too much pressure on the working set.
This has the potential to help the programs that are already in
memory, but it could also be a disadvantage to processes that
are trying to get swapped in.
In short, this patch needs testing.
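The mechanics are minimal: lru_cache_add_active_anon() queues the page
for the active anon list, while lru_cache_add_anon() queues it for the
inactive anon list, so a swapped-in page must now earn a reference
before it gets activated.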
Signed-off-by: Rik van Riel <riel@redhat.com>
---
mm/swap_state.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: linux-2.6.26-rc2-mm1/mm/swap_state.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/mm/swap_state.c 2008-05-28 09:40:59.000000000 -0400
+++ linux-2.6.26-rc2-mm1/mm/swap_state.c 2008-05-28 09:42:26.000000000 -0400
@@ -302,7 +302,7 @@ struct page *read_swap_cache_async(swp_e
/*
* Initiate read into locked page and return.
*/
- lru_cache_add_active_anon(new_page);
+ lru_cache_add_anon(new_page);
swap_readpage(NULL, new_page);
return new_page;
}
--
All Rights Reversed
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH -mm 11/12] more aggressively use lumpy reclaim
2008-05-29 20:22 [PATCH -mm 00/12] VM pageout scalability improvements (V9) Rik van Riel
` (9 preceding siblings ...)
2008-05-29 20:22 ` [PATCH -mm 10/12] add newly swapped in pages to the inactive list Rik van Riel
@ 2008-05-29 20:22 ` Rik van Riel
2008-05-29 20:22 ` [PATCH -mm 12/12] pageflag helpers for configed-out flags Rik van Riel
11 siblings, 0 replies; 15+ messages in thread
From: Rik van Riel @ 2008-05-29 20:22 UTC (permalink / raw)
To: linux-kernel; +Cc: Andrew Morton, Lee Schermerhorn, Kosaki Motohiro
[-- Attachment #1: lumpy-reclaim-lower-order.patch --]
[-- Type: text/plain, Size: 2358 bytes --]
From: Rik van Riel <riel@redhat.com>
During an AIM7 run on a 16GB system, fork started failing around
32000 threads, despite the system having plenty of free swap and
15GB of pageable memory.
If normal pageout does not result in contiguous free pages for
kernel stacks, fall back to lumpy reclaim instead of failing fork
or doing excessive pageout IO.
I do not know whether this change is needed due to the extreme
stress test or because the inactive list is a smaller fraction
of system memory on huge systems.
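(For concreteness, assuming the usual PAGE_ALLOC_COSTLY_ORDER of 3 and
DEF_PRIORITY of 12: an order-1 allocation such as an 8KB kernel stack
now falls back to isolating both active and inactive pages once the
reclaim priority drops below 10, i.e. after the first three scan passes
fail to produce contiguous pages.)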
Signed-off-by: Rik van Riel <riel@redhat.com>
---
mm/vmscan.c | 20 ++++++++++++++++----
1 file changed, 16 insertions(+), 4 deletions(-)
Index: linux-2.6.26-rc2-mm1/mm/vmscan.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/mm/vmscan.c 2008-05-28 12:14:34.000000000 -0400
+++ linux-2.6.26-rc2-mm1/mm/vmscan.c 2008-05-28 12:14:43.000000000 -0400
@@ -857,7 +857,8 @@ int isolate_lru_page(struct page *page)
* of reclaimed pages
*/
static unsigned long shrink_inactive_list(unsigned long max_scan,
- struct zone *zone, struct scan_control *sc, int file)
+ struct zone *zone, struct scan_control *sc,
+ int priority, int file)
{
LIST_HEAD(page_list);
struct pagevec pvec;
@@ -875,8 +876,19 @@ static unsigned long shrink_inactive_lis
unsigned long nr_freed;
unsigned long nr_active;
unsigned int count[NR_LRU_LISTS] = { 0, };
- int mode = (sc->order > PAGE_ALLOC_COSTLY_ORDER) ?
- ISOLATE_BOTH : ISOLATE_INACTIVE;
+ int mode = ISOLATE_INACTIVE;
+
+ /*
+ * If we need a large contiguous chunk of memory, or have
+ * trouble getting a small set of contiguous pages, we
+ * will reclaim both active and inactive pages.
+ *
+ * We use the same threshold as pageout congestion_wait below.
+ */
+ if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
+ mode = ISOLATE_BOTH;
+ else if (sc->order && priority < DEF_PRIORITY - 2)
+ mode = ISOLATE_BOTH;
nr_taken = sc->isolate_pages(sc->swap_cluster_max,
&page_list, &nr_scan, sc->order, mode,
@@ -1171,7 +1183,7 @@ static unsigned long shrink_list(enum lr
shrink_active_list(nr_to_scan, zone, sc, priority, file);
return 0;
}
- return shrink_inactive_list(nr_to_scan, zone, sc, file);
+ return shrink_inactive_list(nr_to_scan, zone, sc, priority, file);
}
/*
--
All Rights Reversed
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH -mm 12/12] pageflag helpers for configed-out flags
2008-05-29 20:22 [PATCH -mm 00/12] VM pageout scalability improvements (V9) Rik van Riel
` (10 preceding siblings ...)
2008-05-29 20:22 ` [PATCH -mm 11/12] more aggressively use lumpy reclaim Rik van Riel
@ 2008-05-29 20:22 ` Rik van Riel
11 siblings, 0 replies; 15+ messages in thread
From: Rik van Riel @ 2008-05-29 20:22 UTC (permalink / raw)
To: linux-kernel
Cc: Andrew Morton, Lee Schermerhorn, Kosaki Motohiro,
Lee Schermerhorn
[-- Attachment #1: pageflag-helpers.patch --]
[-- Type: text/plain, Size: 1361 bytes --]
From: Lee Schermerhorn <lee.schermerhorn@hp.com>
Define proper false/noop inline functions for the noreclaim page
flags when !defined(CONFIG_NORECLAIM_LRU).
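As an illustration (the flag name here is hypothetical), declaring

	SETPAGEFLAG_NOOP(Noreclaim)
	TESTCLEARFLAG_FALSE(Noreclaim)

expands to

	static inline void SetPageNoreclaim(struct page *page) { }
	static inline int TestClearPageNoreclaim(struct page *page) { return 0; }

so callers can use the flag operations without #ifdef
CONFIG_NORECLAIM_LRU guards.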
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Rik van Riel <riel@redhat.com>
---
include/linux/page-flags.h | 12 ++++++++++++
1 file changed, 12 insertions(+)
Index: linux-2.6.26-rc2-mm1/include/linux/page-flags.h
===================================================================
--- linux-2.6.26-rc2-mm1.orig/include/linux/page-flags.h 2008-05-28 09:40:59.000000000 -0400
+++ linux-2.6.26-rc2-mm1/include/linux/page-flags.h 2008-05-28 09:42:35.000000000 -0400
@@ -147,6 +147,18 @@ static inline int Page##uname(struct pag
#define TESTSCFLAG(uname, lname) \
TESTSETFLAG(uname, lname) TESTCLEARFLAG(uname, lname)
+#define SETPAGEFLAG_NOOP(uname) \
+static inline void SetPage##uname(struct page *page) { }
+
+#define CLEARPAGEFLAG_NOOP(uname) \
+static inline void ClearPage##uname(struct page *page) { }
+
+#define __CLEARPAGEFLAG_NOOP(uname) \
+static inline void __ClearPage##uname(struct page *page) { }
+
+#define TESTCLEARFLAG_FALSE(uname) \
+static inline int TestClearPage##uname(struct page *page) { return 0; }
+
struct page; /* forward declaration */
PAGEFLAG(Locked, locked) TESTSCFLAG(Locked, locked)
--
All Rights Reversed
^ permalink raw reply [flat|nested] 15+ messages in thread