* [PATCH 01/11] mm: compaction: Allow compaction to isolate dirty pages
2011-12-01 17:36 [PATCH 0/11] Reduce compaction-related stalls and improve asynchronous migration of dirty pages v5 Mel Gorman
@ 2011-12-01 17:36 ` Mel Gorman
2011-12-01 17:36 ` [PATCH 02/11] mm: compaction: Use synchronous compaction for /proc/sys/vm/compact_memory Mel Gorman
` (9 subsequent siblings)
10 siblings, 0 replies; 13+ messages in thread
From: Mel Gorman @ 2011-12-01 17:36 UTC (permalink / raw)
To: Linux-MM
Cc: Andrea Arcangeli, Minchan Kim, Jan Kara, Andy Isaacson,
Johannes Weiner, Mel Gorman, Rik van Riel, Nai Xia, LKML
Commit [39deaf85: mm: compaction: make isolate_lru_page() filter-aware]
noted that compaction does not migrate dirty or writeback pages and
that is was meaningless to pick the page and re-add it to the LRU list.
What was missed during review is that asynchronous migration moves
dirty pages if their ->migratepage callback is migrate_page() because
these can be moved without blocking. This potentially impacted
hugepage allocation success rates by a factor depending on how many
dirty pages are in the system.
This patch partially reverts 39deaf85 to allow migration to isolate
dirty pages again. This increases how much compaction disrupts the
LRU but that is addressed later in the series.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
---
mm/compaction.c | 3 ---
1 files changed, 0 insertions(+), 3 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 899d956..237560e 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -349,9 +349,6 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
continue;
}
- if (!cc->sync)
- mode |= ISOLATE_CLEAN;
-
/* Try isolate the page */
if (__isolate_lru_page(page, mode, 0) != 0)
continue;
--
1.7.3.4
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 02/11] mm: compaction: Use synchronous compaction for /proc/sys/vm/compact_memory
2011-12-01 17:36 [PATCH 0/11] Reduce compaction-related stalls and improve asynchronous migration of dirty pages v5 Mel Gorman
2011-12-01 17:36 ` [PATCH 01/11] mm: compaction: Allow compaction to isolate dirty pages Mel Gorman
@ 2011-12-01 17:36 ` Mel Gorman
2011-12-01 17:36 ` [PATCH 03/11] mm: vmscan: Check if we isolated a compound page during lumpy scan Mel Gorman
` (8 subsequent siblings)
10 siblings, 0 replies; 13+ messages in thread
From: Mel Gorman @ 2011-12-01 17:36 UTC (permalink / raw)
To: Linux-MM
Cc: Andrea Arcangeli, Minchan Kim, Jan Kara, Andy Isaacson,
Johannes Weiner, Mel Gorman, Rik van Riel, Nai Xia, LKML
When asynchronous compaction was introduced, the
/proc/sys/vm/compact_memory handler should have been updated to always
use synchronous compaction. This did not happen so this patch addresses
it. The assumption is if a user writes to /proc/sys/vm/compact_memory,
they are willing for that process to stall.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
---
mm/compaction.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 237560e..615502b 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -666,6 +666,7 @@ static int compact_node(int nid)
.nr_freepages = 0,
.nr_migratepages = 0,
.order = -1,
+ .sync = true,
};
zone = &pgdat->node_zones[zoneid];
--
1.7.3.4
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 03/11] mm: vmscan: Check if we isolated a compound page during lumpy scan
2011-12-01 17:36 [PATCH 0/11] Reduce compaction-related stalls and improve asynchronous migration of dirty pages v5 Mel Gorman
2011-12-01 17:36 ` [PATCH 01/11] mm: compaction: Allow compaction to isolate dirty pages Mel Gorman
2011-12-01 17:36 ` [PATCH 02/11] mm: compaction: Use synchronous compaction for /proc/sys/vm/compact_memory Mel Gorman
@ 2011-12-01 17:36 ` Mel Gorman
2011-12-01 17:36 ` [PATCH 04/11] mm: vmscan: Do not OOM if aborting reclaim to start compaction Mel Gorman
` (7 subsequent siblings)
10 siblings, 0 replies; 13+ messages in thread
From: Mel Gorman @ 2011-12-01 17:36 UTC (permalink / raw)
To: Linux-MM
Cc: Andrea Arcangeli, Minchan Kim, Jan Kara, Andy Isaacson,
Johannes Weiner, Mel Gorman, Rik van Riel, Nai Xia, LKML
From: Andrea Arcangeli <aarcange@redhat.com>
Properly take into account if we isolated a compound page during the
lumpy scan in reclaim and skip over the tail pages when encountered.
This corrects the values given to the tracepoint for number of lumpy
pages isolated and will avoid breaking the loop early if compound
pages smaller than the requested allocation size are requested.
[mgorman@suse.de: Updated changelog]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
---
mm/vmscan.c | 9 ++++++---
1 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a1893c0..3421746 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1183,13 +1183,16 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
break;
if (__isolate_lru_page(cursor_page, mode, file) == 0) {
+ unsigned int isolated_pages;
list_move(&cursor_page->lru, dst);
mem_cgroup_del_lru(cursor_page);
- nr_taken += hpage_nr_pages(page);
- nr_lumpy_taken++;
+ isolated_pages = hpage_nr_pages(page);
+ nr_taken += isolated_pages;
+ nr_lumpy_taken += isolated_pages;
if (PageDirty(cursor_page))
- nr_lumpy_dirty++;
+ nr_lumpy_dirty += isolated_pages;
scan++;
+ pfn += isolated_pages-1;
} else {
/*
* Check if the page is freed already.
--
1.7.3.4
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 04/11] mm: vmscan: Do not OOM if aborting reclaim to start compaction
2011-12-01 17:36 [PATCH 0/11] Reduce compaction-related stalls and improve asynchronous migration of dirty pages v5 Mel Gorman
` (2 preceding siblings ...)
2011-12-01 17:36 ` [PATCH 03/11] mm: vmscan: Check if we isolated a compound page during lumpy scan Mel Gorman
@ 2011-12-01 17:36 ` Mel Gorman
2011-12-01 17:36 ` [PATCH 05/11] mm: compaction: Determine if dirty pages can be migrated without blocking within ->migratepage Mel Gorman
` (6 subsequent siblings)
10 siblings, 0 replies; 13+ messages in thread
From: Mel Gorman @ 2011-12-01 17:36 UTC (permalink / raw)
To: Linux-MM
Cc: Andrea Arcangeli, Minchan Kim, Jan Kara, Andy Isaacson,
Johannes Weiner, Mel Gorman, Rik van Riel, Nai Xia, LKML
When direct reclaim is entered is is possible that reclaim will be
aborted so that compaction can be attempted to satisfy a high-order
allocation. If this decision is made before any pages are reclaimed,
it is possible for 0 to be returned to the page allocator potentially
triggering an OOM. This has not been observed but it is a possibility
so this patch addresses it.
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
mm/vmscan.c | 8 +++++++-
1 files changed, 7 insertions(+), 1 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 3421746..5f4c789 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2222,6 +2222,7 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
struct zoneref *z;
struct zone *zone;
unsigned long writeback_threshold;
+ bool should_abort_reclaim;
get_mems_allowed();
delayacct_freepages_start();
@@ -2233,7 +2234,8 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
sc->nr_scanned = 0;
if (!priority)
disable_swap_token(sc->mem_cgroup);
- if (shrink_zones(priority, zonelist, sc))
+ should_abort_reclaim = shrink_zones(priority, zonelist, sc);
+ if (should_abort_reclaim)
break;
/*
@@ -2301,6 +2303,10 @@ out:
if (oom_killer_disabled)
return 0;
+ /* Aborting reclaim to try compaction? don't OOM, then */
+ if (should_abort_reclaim)
+ return 1;
+
/* top priority shrink_zones still had more to do? don't OOM, then */
if (scanning_global_lru(sc) && !all_unreclaimable(zonelist, sc))
return 1;
--
1.7.3.4
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 05/11] mm: compaction: Determine if dirty pages can be migrated without blocking within ->migratepage
2011-12-01 17:36 [PATCH 0/11] Reduce compaction-related stalls and improve asynchronous migration of dirty pages v5 Mel Gorman
` (3 preceding siblings ...)
2011-12-01 17:36 ` [PATCH 04/11] mm: vmscan: Do not OOM if aborting reclaim to start compaction Mel Gorman
@ 2011-12-01 17:36 ` Mel Gorman
2011-12-01 17:36 ` [PATCH 06/11] mm: compaction: make isolate_lru_page() filter-aware again Mel Gorman
` (5 subsequent siblings)
10 siblings, 0 replies; 13+ messages in thread
From: Mel Gorman @ 2011-12-01 17:36 UTC (permalink / raw)
To: Linux-MM
Cc: Andrea Arcangeli, Minchan Kim, Jan Kara, Andy Isaacson,
Johannes Weiner, Mel Gorman, Rik van Riel, Nai Xia, LKML
Asynchronous compaction is used when allocating transparent hugepages
to avoid blocking for long periods of time. Due to reports of
stalling, there was a debate on disabling synchronous compaction
but this severely impacted allocation success rates. Part of the
reason was because when deciding whether to migrate dirty pages,
the following check is made;
if (PageDirty(page) && !sync &&
mapping->a_ops->migratepage != migrate_page)
rc = -EBUSY;
This skips over all pages using buffer_migrate_page() even though
it is possible to migrate some of these pages without blocking. This
patch updates the ->migratepage callback with a "sync" parameter. It
is the resposibility of the callback to gracefully fail migration of
the page if it would block.
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
fs/btrfs/disk-io.c | 4 +-
fs/nfs/internal.h | 2 +-
fs/nfs/write.c | 4 +-
include/linux/fs.h | 9 ++-
include/linux/migrate.h | 2 +-
mm/migrate.c | 129 +++++++++++++++++++++++++++++++++-------------
6 files changed, 104 insertions(+), 46 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 632f8f3..896b87a 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -872,7 +872,7 @@ static int btree_submit_bio_hook(struct inode *inode, int rw, struct bio *bio,
#ifdef CONFIG_MIGRATION
static int btree_migratepage(struct address_space *mapping,
- struct page *newpage, struct page *page)
+ struct page *newpage, struct page *page, bool sync)
{
/*
* we can't safely write a btree page from here,
@@ -887,7 +887,7 @@ static int btree_migratepage(struct address_space *mapping,
if (page_has_private(page) &&
!try_to_release_page(page, GFP_KERNEL))
return -EAGAIN;
- return migrate_page(mapping, newpage, page);
+ return migrate_page(mapping, newpage, page, sync);
}
#endif
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 3f4d957..8d96ed6 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -330,7 +330,7 @@ void nfs_commit_release_pages(struct nfs_write_data *data);
#ifdef CONFIG_MIGRATION
extern int nfs_migrate_page(struct address_space *,
- struct page *, struct page *);
+ struct page *, struct page *, bool);
#else
#define nfs_migrate_page NULL
#endif
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 1dda78d..33475df 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1711,7 +1711,7 @@ out_error:
#ifdef CONFIG_MIGRATION
int nfs_migrate_page(struct address_space *mapping, struct page *newpage,
- struct page *page)
+ struct page *page, bool sync)
{
/*
* If PagePrivate is set, then the page is currently associated with
@@ -1726,7 +1726,7 @@ int nfs_migrate_page(struct address_space *mapping, struct page *newpage,
nfs_fscache_release_page(page, GFP_KERNEL);
- return migrate_page(mapping, newpage, page);
+ return migrate_page(mapping, newpage, page, sync);
}
#endif
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e313022..07dae2a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -609,9 +609,12 @@ struct address_space_operations {
loff_t offset, unsigned long nr_segs);
int (*get_xip_mem)(struct address_space *, pgoff_t, int,
void **, unsigned long *);
- /* migrate the contents of a page to the specified target */
+ /*
+ * migrate the contents of a page to the specified target. If sync
+ * is false, it must not block.
+ */
int (*migratepage) (struct address_space *,
- struct page *, struct page *);
+ struct page *, struct page *, bool);
int (*launder_page) (struct page *);
int (*is_partially_uptodate) (struct page *, read_descriptor_t *,
unsigned long);
@@ -2578,7 +2581,7 @@ extern int generic_check_addressable(unsigned, u64);
#ifdef CONFIG_MIGRATION
extern int buffer_migrate_page(struct address_space *,
- struct page *, struct page *);
+ struct page *, struct page *, bool);
#else
#define buffer_migrate_page NULL
#endif
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index e39aeec..14e6d2a 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -11,7 +11,7 @@ typedef struct page *new_page_t(struct page *, unsigned long private, int **);
extern void putback_lru_pages(struct list_head *l);
extern int migrate_page(struct address_space *,
- struct page *, struct page *);
+ struct page *, struct page *, bool);
extern int migrate_pages(struct list_head *l, new_page_t x,
unsigned long private, bool offlining,
bool sync);
diff --git a/mm/migrate.c b/mm/migrate.c
index 578e291..a5be362 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -220,6 +220,55 @@ out:
pte_unmap_unlock(ptep, ptl);
}
+#ifdef CONFIG_BLOCK
+/* Returns true if all buffers are successfully locked */
+static bool buffer_migrate_lock_buffers(struct buffer_head *head, bool sync)
+{
+ struct buffer_head *bh = head;
+
+ /* Simple case, sync compaction */
+ if (sync) {
+ do {
+ get_bh(bh);
+ lock_buffer(bh);
+ bh = bh->b_this_page;
+
+ } while (bh != head);
+
+ return true;
+ }
+
+ /* async case, we cannot block on lock_buffer so use trylock_buffer */
+ do {
+ get_bh(bh);
+ if (!trylock_buffer(bh)) {
+ /*
+ * We failed to lock the buffer and cannot stall in
+ * async migration. Release the taken locks
+ */
+ struct buffer_head *failed_bh = bh;
+ put_bh(failed_bh);
+ bh = head;
+ while (bh != failed_bh) {
+ unlock_buffer(bh);
+ put_bh(bh);
+ bh = bh->b_this_page;
+ }
+ return false;
+ }
+
+ bh = bh->b_this_page;
+ } while (bh != head);
+ return true;
+}
+#else
+static inline bool buffer_migrate_lock_buffers(struct buffer_head *head,
+ bool sync)
+{
+ return true;
+}
+#endif /* CONFIG_BLOCK */
+
/*
* Replace the page in the mapping.
*
@@ -229,7 +278,8 @@ out:
* 3 for pages with a mapping and PagePrivate/PagePrivate2 set.
*/
static int migrate_page_move_mapping(struct address_space *mapping,
- struct page *newpage, struct page *page)
+ struct page *newpage, struct page *page,
+ struct buffer_head *head, bool sync)
{
int expected_count;
void **pslot;
@@ -259,6 +309,19 @@ static int migrate_page_move_mapping(struct address_space *mapping,
}
/*
+ * In the async migration case of moving a page with buffers, lock the
+ * buffers using trylock before the mapping is moved. If the mapping
+ * was moved, we later failed to lock the buffers and could not move
+ * the mapping back due to an elevated page count, we would have to
+ * block waiting on other references to be dropped.
+ */
+ if (!sync && head && !buffer_migrate_lock_buffers(head, sync)) {
+ page_unfreeze_refs(page, expected_count);
+ spin_unlock_irq(&mapping->tree_lock);
+ return -EAGAIN;
+ }
+
+ /*
* Now we know that no one else is looking at the page.
*/
get_page(newpage); /* add cache reference */
@@ -415,13 +478,13 @@ EXPORT_SYMBOL(fail_migrate_page);
* Pages are locked upon entry and exit.
*/
int migrate_page(struct address_space *mapping,
- struct page *newpage, struct page *page)
+ struct page *newpage, struct page *page, bool sync)
{
int rc;
BUG_ON(PageWriteback(page)); /* Writeback must be complete */
- rc = migrate_page_move_mapping(mapping, newpage, page);
+ rc = migrate_page_move_mapping(mapping, newpage, page, NULL, sync);
if (rc)
return rc;
@@ -438,28 +501,28 @@ EXPORT_SYMBOL(migrate_page);
* exist.
*/
int buffer_migrate_page(struct address_space *mapping,
- struct page *newpage, struct page *page)
+ struct page *newpage, struct page *page, bool sync)
{
struct buffer_head *bh, *head;
int rc;
if (!page_has_buffers(page))
- return migrate_page(mapping, newpage, page);
+ return migrate_page(mapping, newpage, page, sync);
head = page_buffers(page);
- rc = migrate_page_move_mapping(mapping, newpage, page);
+ rc = migrate_page_move_mapping(mapping, newpage, page, head, sync);
if (rc)
return rc;
- bh = head;
- do {
- get_bh(bh);
- lock_buffer(bh);
- bh = bh->b_this_page;
-
- } while (bh != head);
+ /*
+ * In the async case, migrate_page_move_mapping locked the buffers
+ * with an IRQ-safe spinlock held. In the sync case, the buffers
+ * need to be locked no
+ */
+ if (sync)
+ BUG_ON(!buffer_migrate_lock_buffers(head, sync));
ClearPagePrivate(page);
set_page_private(newpage, page_private(page));
@@ -536,10 +599,13 @@ static int writeout(struct address_space *mapping, struct page *page)
* Default handling if a filesystem does not provide a migration function.
*/
static int fallback_migrate_page(struct address_space *mapping,
- struct page *newpage, struct page *page)
+ struct page *newpage, struct page *page, bool sync)
{
- if (PageDirty(page))
+ if (PageDirty(page)) {
+ if (!sync)
+ return -EBUSY;
return writeout(mapping, page);
+ }
/*
* Buffers may be managed in a filesystem specific way.
@@ -549,7 +615,7 @@ static int fallback_migrate_page(struct address_space *mapping,
!try_to_release_page(page, GFP_KERNEL))
return -EAGAIN;
- return migrate_page(mapping, newpage, page);
+ return migrate_page(mapping, newpage, page, sync);
}
/*
@@ -585,29 +651,18 @@ static int move_to_new_page(struct page *newpage, struct page *page,
mapping = page_mapping(page);
if (!mapping)
- rc = migrate_page(mapping, newpage, page);
- else {
+ rc = migrate_page(mapping, newpage, page, sync);
+ else if (mapping->a_ops->migratepage)
/*
- * Do not writeback pages if !sync and migratepage is
- * not pointing to migrate_page() which is nonblocking
- * (swapcache/tmpfs uses migratepage = migrate_page).
+ * Most pages have a mapping and most filesystems provide a
+ * migratepage callback. Anonymous pages are part of swap
+ * space which also has its own migratepage callback. This
+ * is the most common path for page migration.
*/
- if (PageDirty(page) && !sync &&
- mapping->a_ops->migratepage != migrate_page)
- rc = -EBUSY;
- else if (mapping->a_ops->migratepage)
- /*
- * Most pages have a mapping and most filesystems
- * should provide a migration function. Anonymous
- * pages are part of swap space which also has its
- * own migration function. This is the most common
- * path for page migration.
- */
- rc = mapping->a_ops->migratepage(mapping,
- newpage, page);
- else
- rc = fallback_migrate_page(mapping, newpage, page);
- }
+ rc = mapping->a_ops->migratepage(mapping,
+ newpage, page, sync);
+ else
+ rc = fallback_migrate_page(mapping, newpage, page, sync);
if (rc) {
newpage->mapping = NULL;
--
1.7.3.4
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 06/11] mm: compaction: make isolate_lru_page() filter-aware again
2011-12-01 17:36 [PATCH 0/11] Reduce compaction-related stalls and improve asynchronous migration of dirty pages v5 Mel Gorman
` (4 preceding siblings ...)
2011-12-01 17:36 ` [PATCH 05/11] mm: compaction: Determine if dirty pages can be migrated without blocking within ->migratepage Mel Gorman
@ 2011-12-01 17:36 ` Mel Gorman
2011-12-01 17:36 ` [PATCH 07/11] mm: page allocator: Do not call direct reclaim for THP allocations while compaction is deferred Mel Gorman
` (4 subsequent siblings)
10 siblings, 0 replies; 13+ messages in thread
From: Mel Gorman @ 2011-12-01 17:36 UTC (permalink / raw)
To: Linux-MM
Cc: Andrea Arcangeli, Minchan Kim, Jan Kara, Andy Isaacson,
Johannes Weiner, Mel Gorman, Rik van Riel, Nai Xia, LKML
Commit [39deaf85: mm: compaction: make isolate_lru_page() filter-aware]
noted that compaction does not migrate dirty or writeback pages and
that is was meaningless to pick the page and re-add it to the LRU list.
This had to be partially reverted because some dirty pages can be
migrated by compaction without blocking.
This patch updates "mm: compaction: make isolate_lru_page" by skipping
over pages that migration has no possibility of migrating to minimise
LRU disruption.
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
include/linux/mmzone.h | 2 ++
mm/compaction.c | 3 +++
mm/vmscan.c | 35 +++++++++++++++++++++++++++++++++--
3 files changed, 38 insertions(+), 2 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 188cb2f..ac5b522 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -173,6 +173,8 @@ static inline int is_unevictable_lru(enum lru_list l)
#define ISOLATE_CLEAN ((__force isolate_mode_t)0x4)
/* Isolate unmapped file */
#define ISOLATE_UNMAPPED ((__force isolate_mode_t)0x8)
+/* Isolate for asynchronous migration */
+#define ISOLATE_ASYNC_MIGRATE ((__force isolate_mode_t)0x10)
/* LRU Isolation modes. */
typedef unsigned __bitwise__ isolate_mode_t;
diff --git a/mm/compaction.c b/mm/compaction.c
index 615502b..0379263 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -349,6 +349,9 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
continue;
}
+ if (!cc->sync)
+ mode |= ISOLATE_ASYNC_MIGRATE;
+
/* Try isolate the page */
if (__isolate_lru_page(page, mode, 0) != 0)
continue;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5f4c789..d2b701a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1061,8 +1061,39 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode, int file)
ret = -EBUSY;
- if ((mode & ISOLATE_CLEAN) && (PageDirty(page) || PageWriteback(page)))
- return ret;
+ /*
+ * To minimise LRU disruption, the caller can indicate that it only
+ * wants to isolate pages it will be able to operate on without
+ * blocking - clean pages for the most part.
+ *
+ * ISOLATE_CLEAN means that only clean pages should be isolated. This
+ * is used by reclaim when it is cannot write to backing storage
+ *
+ * ISOLATE_ASYNC_MIGRATE is used to indicate that it only wants to pages
+ * that it is possible to migrate without blocking
+ */
+ if (mode & (ISOLATE_CLEAN|ISOLATE_ASYNC_MIGRATE)) {
+ /* All the caller can do on PageWriteback is block */
+ if (PageWriteback(page))
+ return ret;
+
+ if (PageDirty(page)) {
+ struct address_space *mapping;
+
+ /* ISOLATE_CLEAN means only clean pages */
+ if (mode & ISOLATE_CLEAN)
+ return ret;
+
+ /*
+ * Only pages without mappings or that have a
+ * ->migratepage callback are possible to migrate
+ * without blocking
+ */
+ mapping = page_mapping(page);
+ if (mapping && !mapping->a_ops->migratepage)
+ return ret;
+ }
+ }
if ((mode & ISOLATE_UNMAPPED) && page_mapped(page))
return ret;
--
1.7.3.4
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 07/11] mm: page allocator: Do not call direct reclaim for THP allocations while compaction is deferred
2011-12-01 17:36 [PATCH 0/11] Reduce compaction-related stalls and improve asynchronous migration of dirty pages v5 Mel Gorman
` (5 preceding siblings ...)
2011-12-01 17:36 ` [PATCH 06/11] mm: compaction: make isolate_lru_page() filter-aware again Mel Gorman
@ 2011-12-01 17:36 ` Mel Gorman
2011-12-01 17:36 ` [PATCH 08/11] mm: compaction: Introduce sync-light migration for use by compaction Mel Gorman
` (3 subsequent siblings)
10 siblings, 0 replies; 13+ messages in thread
From: Mel Gorman @ 2011-12-01 17:36 UTC (permalink / raw)
To: Linux-MM
Cc: Andrea Arcangeli, Minchan Kim, Jan Kara, Andy Isaacson,
Johannes Weiner, Mel Gorman, Rik van Riel, Nai Xia, LKML
If compaction is deferred, direct reclaim is used to try free enough
pages for the allocation to succeed. For small high-orders, this has
a reasonable chance of success. However, if the caller has specified
__GFP_NO_KSWAPD to limit the disruption to the system, it makes more
sense to fail the allocation rather than stall the caller in direct
reclaim. This patch skips direct reclaim if compaction is deferred
and the caller specifies __GFP_NO_KSWAPD.
Async compaction only considers a subset of pages so it is possible for
compaction to be deferred prematurely and not enter direct reclaim even
in cases where it should. To compensate for this, this patch also defers
compaction only if sync compaction failed.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Minchan Kim <minchan.kim@gmail.com>
---
mm/page_alloc.c | 45 +++++++++++++++++++++++++++++++++++----------
1 files changed, 35 insertions(+), 10 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9dd443d..d979376 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1886,14 +1886,20 @@ static struct page *
__alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
struct zonelist *zonelist, enum zone_type high_zoneidx,
nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone,
- int migratetype, unsigned long *did_some_progress,
- bool sync_migration)
+ int migratetype, bool sync_migration,
+ bool *deferred_compaction,
+ unsigned long *did_some_progress)
{
struct page *page;
- if (!order || compaction_deferred(preferred_zone))
+ if (!order)
return NULL;
+ if (compaction_deferred(preferred_zone)) {
+ *deferred_compaction = true;
+ return NULL;
+ }
+
current->flags |= PF_MEMALLOC;
*did_some_progress = try_to_compact_pages(zonelist, order, gfp_mask,
nodemask, sync_migration);
@@ -1921,7 +1927,13 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
* but not enough to satisfy watermarks.
*/
count_vm_event(COMPACTFAIL);
- defer_compaction(preferred_zone);
+
+ /*
+ * As async compaction considers a subset of pageblocks, only
+ * defer if the failure was a sync compaction failure.
+ */
+ if (sync_migration)
+ defer_compaction(preferred_zone);
cond_resched();
}
@@ -1933,8 +1945,9 @@ static inline struct page *
__alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
struct zonelist *zonelist, enum zone_type high_zoneidx,
nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone,
- int migratetype, unsigned long *did_some_progress,
- bool sync_migration)
+ int migratetype, bool sync_migration,
+ bool *deferred_compaction,
+ unsigned long *did_some_progress)
{
return NULL;
}
@@ -2084,6 +2097,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
unsigned long pages_reclaimed = 0;
unsigned long did_some_progress;
bool sync_migration = false;
+ bool deferred_compaction = false;
/*
* In the slowpath, we sanity check order to avoid ever trying to
@@ -2164,12 +2178,22 @@ rebalance:
zonelist, high_zoneidx,
nodemask,
alloc_flags, preferred_zone,
- migratetype, &did_some_progress,
- sync_migration);
+ migratetype, sync_migration,
+ &deferred_compaction,
+ &did_some_progress);
if (page)
goto got_pg;
sync_migration = true;
+ /*
+ * If compaction is deferred for high-order allocations, it is because
+ * sync compaction recently failed. In this is the case and the caller
+ * has requested the system not be heavily disrupted, fail the
+ * allocation now instead of entering direct reclaim
+ */
+ if (deferred_compaction && (gfp_mask & __GFP_NO_KSWAPD))
+ goto nopage;
+
/* Try direct reclaim and then allocating */
page = __alloc_pages_direct_reclaim(gfp_mask, order,
zonelist, high_zoneidx,
@@ -2232,8 +2256,9 @@ rebalance:
zonelist, high_zoneidx,
nodemask,
alloc_flags, preferred_zone,
- migratetype, &did_some_progress,
- sync_migration);
+ migratetype, sync_migration,
+ &deferred_compaction,
+ &did_some_progress);
if (page)
goto got_pg;
}
--
1.7.3.4
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 08/11] mm: compaction: Introduce sync-light migration for use by compaction
2011-12-01 17:36 [PATCH 0/11] Reduce compaction-related stalls and improve asynchronous migration of dirty pages v5 Mel Gorman
` (6 preceding siblings ...)
2011-12-01 17:36 ` [PATCH 07/11] mm: page allocator: Do not call direct reclaim for THP allocations while compaction is deferred Mel Gorman
@ 2011-12-01 17:36 ` Mel Gorman
2011-12-01 17:36 ` [PATCH 09/11] mm: vmscan: When reclaiming for compaction, ensure there are sufficient free pages available Mel Gorman
` (2 subsequent siblings)
10 siblings, 0 replies; 13+ messages in thread
From: Mel Gorman @ 2011-12-01 17:36 UTC (permalink / raw)
To: Linux-MM
Cc: Andrea Arcangeli, Minchan Kim, Jan Kara, Andy Isaacson,
Johannes Weiner, Mel Gorman, Rik van Riel, Nai Xia, LKML
This patch adds a lightweight sync migrate operation MIGRATE_SYNC_LIGHT
mode that avoids writing back pages to backing storage. Async
compaction maps to MIGRATE_ASYNC while sync compaction maps to
MIGRATE_SYNC_LIGHT. For other migrate_pages users such as memory
hotplug, MIGRATE_SYNC is used.
This avoids sync compaction stalling for an excessive length of time,
particularly when copying files to a USB stick where there might be
a large number of dirty pages backed by a filesystem that does not
support ->writepages.
[aarcange@redhat.com: This patch is heavily based on Andrea's work]
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
fs/btrfs/disk-io.c | 3 +-
fs/nfs/internal.h | 2 +-
fs/nfs/write.c | 2 +-
include/linux/fs.h | 6 ++-
include/linux/migrate.h | 23 +++++++++++---
mm/compaction.c | 2 +-
mm/memory-failure.c | 2 +-
mm/memory_hotplug.c | 2 +-
mm/mempolicy.c | 2 +-
mm/migrate.c | 78 ++++++++++++++++++++++++++---------------------
10 files changed, 73 insertions(+), 49 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 896b87a..dbe9518 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -872,7 +872,8 @@ static int btree_submit_bio_hook(struct inode *inode, int rw, struct bio *bio,
#ifdef CONFIG_MIGRATION
static int btree_migratepage(struct address_space *mapping,
- struct page *newpage, struct page *page, bool sync)
+ struct page *newpage, struct page *page,
+ enum migrate_mode sync)
{
/*
* we can't safely write a btree page from here,
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 8d96ed6..68b3f20 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -330,7 +330,7 @@ void nfs_commit_release_pages(struct nfs_write_data *data);
#ifdef CONFIG_MIGRATION
extern int nfs_migrate_page(struct address_space *,
- struct page *, struct page *, bool);
+ struct page *, struct page *, enum migrate_mode);
#else
#define nfs_migrate_page NULL
#endif
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 33475df..adb87d9 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1711,7 +1711,7 @@ out_error:
#ifdef CONFIG_MIGRATION
int nfs_migrate_page(struct address_space *mapping, struct page *newpage,
- struct page *page, bool sync)
+ struct page *page, enum migrate_mode sync)
{
/*
* If PagePrivate is set, then the page is currently associated with
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 07dae2a..715b344 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -525,6 +525,7 @@ enum positive_aop_returns {
struct page;
struct address_space;
struct writeback_control;
+enum migrate_mode;
struct iov_iter {
const struct iovec *iov;
@@ -614,7 +615,7 @@ struct address_space_operations {
* is false, it must not block.
*/
int (*migratepage) (struct address_space *,
- struct page *, struct page *, bool);
+ struct page *, struct page *, enum migrate_mode);
int (*launder_page) (struct page *);
int (*is_partially_uptodate) (struct page *, read_descriptor_t *,
unsigned long);
@@ -2581,7 +2582,8 @@ extern int generic_check_addressable(unsigned, u64);
#ifdef CONFIG_MIGRATION
extern int buffer_migrate_page(struct address_space *,
- struct page *, struct page *, bool);
+ struct page *, struct page *,
+ enum migrate_mode);
#else
#define buffer_migrate_page NULL
#endif
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 14e6d2a..775787c 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -6,18 +6,31 @@
typedef struct page *new_page_t(struct page *, unsigned long private, int **);
+/*
+ * MIGRATE_ASYNC means never block
+ * MIGRATE_SYNC_LIGHT in the current implementation means to allow blocking
+ * on most operations but not ->writepage as the potential stall time
+ * is too significant
+ * MIGRATE_SYNC will block when migrating pages
+ */
+enum migrate_mode {
+ MIGRATE_ASYNC,
+ MIGRATE_SYNC_LIGHT,
+ MIGRATE_SYNC,
+};
+
#ifdef CONFIG_MIGRATION
#define PAGE_MIGRATION 1
extern void putback_lru_pages(struct list_head *l);
extern int migrate_page(struct address_space *,
- struct page *, struct page *, bool);
+ struct page *, struct page *, enum migrate_mode);
extern int migrate_pages(struct list_head *l, new_page_t x,
unsigned long private, bool offlining,
- bool sync);
+ enum migrate_mode sync);
extern int migrate_huge_pages(struct list_head *l, new_page_t x,
unsigned long private, bool offlining,
- bool sync);
+ enum migrate_mode sync);
extern int fail_migrate_page(struct address_space *,
struct page *, struct page *);
@@ -36,10 +49,10 @@ extern int migrate_huge_page_move_mapping(struct address_space *mapping,
static inline void putback_lru_pages(struct list_head *l) {}
static inline int migrate_pages(struct list_head *l, new_page_t x,
unsigned long private, bool offlining,
- bool sync) { return -ENOSYS; }
+ enum migrate_mode sync) { return -ENOSYS; }
static inline int migrate_huge_pages(struct list_head *l, new_page_t x,
unsigned long private, bool offlining,
- bool sync) { return -ENOSYS; }
+ enum migrate_mode sync) { return -ENOSYS; }
static inline int migrate_prep(void) { return -ENOSYS; }
static inline int migrate_prep_local(void) { return -ENOSYS; }
diff --git a/mm/compaction.c b/mm/compaction.c
index 0379263..dbe1da0 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -555,7 +555,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
nr_migrate = cc->nr_migratepages;
err = migrate_pages(&cc->migratepages, compaction_alloc,
(unsigned long)cc, false,
- cc->sync);
+ cc->sync ? MIGRATE_SYNC_LIGHT : MIGRATE_ASYNC);
update_nr_listpages(cc);
nr_remaining = cc->nr_migratepages;
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 06d3479..56080ea 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1557,7 +1557,7 @@ int soft_offline_page(struct page *page, int flags)
page_is_file_cache(page));
list_add(&page->lru, &pagelist);
ret = migrate_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL,
- 0, true);
+ 0, MIGRATE_SYNC);
if (ret) {
putback_lru_pages(&pagelist);
pr_info("soft offline: %#lx: migration failed %d, type %lx\n",
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 2168489..6629faf 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -809,7 +809,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
}
/* this function returns # of failed pages */
ret = migrate_pages(&source, hotremove_migrate_alloc, 0,
- true, true);
+ true, MIGRATE_SYNC);
if (ret)
putback_lru_pages(&source);
}
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index adc3954..97009a4 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -933,7 +933,7 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest,
if (!list_empty(&pagelist)) {
err = migrate_pages(&pagelist, new_node_page, dest,
- false, true);
+ false, MIGRATE_SYNC);
if (err)
putback_lru_pages(&pagelist);
}
diff --git a/mm/migrate.c b/mm/migrate.c
index a5be362..44071dc 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -222,12 +222,13 @@ out:
#ifdef CONFIG_BLOCK
/* Returns true if all buffers are successfully locked */
-static bool buffer_migrate_lock_buffers(struct buffer_head *head, bool sync)
+static bool buffer_migrate_lock_buffers(struct buffer_head *head,
+ enum migrate_mode mode)
{
struct buffer_head *bh = head;
/* Simple case, sync compaction */
- if (sync) {
+ if (mode != MIGRATE_ASYNC) {
do {
get_bh(bh);
lock_buffer(bh);
@@ -263,7 +264,7 @@ static bool buffer_migrate_lock_buffers(struct buffer_head *head, bool sync)
}
#else
static inline bool buffer_migrate_lock_buffers(struct buffer_head *head,
- bool sync)
+ enum migrate_mode mode)
{
return true;
}
@@ -279,7 +280,7 @@ static inline bool buffer_migrate_lock_buffers(struct buffer_head *head,
*/
static int migrate_page_move_mapping(struct address_space *mapping,
struct page *newpage, struct page *page,
- struct buffer_head *head, bool sync)
+ struct buffer_head *head, enum migrate_mode mode)
{
int expected_count;
void **pslot;
@@ -315,7 +316,8 @@ static int migrate_page_move_mapping(struct address_space *mapping,
* the mapping back due to an elevated page count, we would have to
* block waiting on other references to be dropped.
*/
- if (!sync && head && !buffer_migrate_lock_buffers(head, sync)) {
+ if (mode == MIGRATE_ASYNC && head &&
+ !buffer_migrate_lock_buffers(head, mode)) {
page_unfreeze_refs(page, expected_count);
spin_unlock_irq(&mapping->tree_lock);
return -EAGAIN;
@@ -478,13 +480,14 @@ EXPORT_SYMBOL(fail_migrate_page);
* Pages are locked upon entry and exit.
*/
int migrate_page(struct address_space *mapping,
- struct page *newpage, struct page *page, bool sync)
+ struct page *newpage, struct page *page,
+ enum migrate_mode mode)
{
int rc;
BUG_ON(PageWriteback(page)); /* Writeback must be complete */
- rc = migrate_page_move_mapping(mapping, newpage, page, NULL, sync);
+ rc = migrate_page_move_mapping(mapping, newpage, page, NULL, mode);
if (rc)
return rc;
@@ -501,17 +504,17 @@ EXPORT_SYMBOL(migrate_page);
* exist.
*/
int buffer_migrate_page(struct address_space *mapping,
- struct page *newpage, struct page *page, bool sync)
+ struct page *newpage, struct page *page, enum migrate_mode mode)
{
struct buffer_head *bh, *head;
int rc;
if (!page_has_buffers(page))
- return migrate_page(mapping, newpage, page, sync);
+ return migrate_page(mapping, newpage, page, mode);
head = page_buffers(page);
- rc = migrate_page_move_mapping(mapping, newpage, page, head, sync);
+ rc = migrate_page_move_mapping(mapping, newpage, page, head, mode);
if (rc)
return rc;
@@ -521,8 +524,8 @@ int buffer_migrate_page(struct address_space *mapping,
* with an IRQ-safe spinlock held. In the sync case, the buffers
* need to be locked no
*/
- if (sync)
- BUG_ON(!buffer_migrate_lock_buffers(head, sync));
+ if (mode != MIGRATE_ASYNC)
+ BUG_ON(!buffer_migrate_lock_buffers(head, mode));
ClearPagePrivate(page);
set_page_private(newpage, page_private(page));
@@ -599,10 +602,11 @@ static int writeout(struct address_space *mapping, struct page *page)
* Default handling if a filesystem does not provide a migration function.
*/
static int fallback_migrate_page(struct address_space *mapping,
- struct page *newpage, struct page *page, bool sync)
+ struct page *newpage, struct page *page, enum migrate_mode mode)
{
if (PageDirty(page)) {
- if (!sync)
+ /* Only writeback pages in full synchronous migration */
+ if (mode != MIGRATE_SYNC)
return -EBUSY;
return writeout(mapping, page);
}
@@ -615,7 +619,7 @@ static int fallback_migrate_page(struct address_space *mapping,
!try_to_release_page(page, GFP_KERNEL))
return -EAGAIN;
- return migrate_page(mapping, newpage, page, sync);
+ return migrate_page(mapping, newpage, page, mode);
}
/*
@@ -630,7 +634,7 @@ static int fallback_migrate_page(struct address_space *mapping,
* == 0 - success
*/
static int move_to_new_page(struct page *newpage, struct page *page,
- int remap_swapcache, bool sync)
+ int remap_swapcache, enum migrate_mode mode)
{
struct address_space *mapping;
int rc;
@@ -651,7 +655,7 @@ static int move_to_new_page(struct page *newpage, struct page *page,
mapping = page_mapping(page);
if (!mapping)
- rc = migrate_page(mapping, newpage, page, sync);
+ rc = migrate_page(mapping, newpage, page, mode);
else if (mapping->a_ops->migratepage)
/*
* Most pages have a mapping and most filesystems provide a
@@ -660,9 +664,9 @@ static int move_to_new_page(struct page *newpage, struct page *page,
* is the most common path for page migration.
*/
rc = mapping->a_ops->migratepage(mapping,
- newpage, page, sync);
+ newpage, page, mode);
else
- rc = fallback_migrate_page(mapping, newpage, page, sync);
+ rc = fallback_migrate_page(mapping, newpage, page, mode);
if (rc) {
newpage->mapping = NULL;
@@ -677,7 +681,7 @@ static int move_to_new_page(struct page *newpage, struct page *page,
}
static int __unmap_and_move(struct page *page, struct page *newpage,
- int force, bool offlining, bool sync)
+ int force, bool offlining, enum migrate_mode mode)
{
int rc = -EAGAIN;
int remap_swapcache = 1;
@@ -686,7 +690,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
struct anon_vma *anon_vma = NULL;
if (!trylock_page(page)) {
- if (!force || !sync)
+ if (!force || mode == MIGRATE_ASYNC)
goto out;
/*
@@ -732,10 +736,12 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
if (PageWriteback(page)) {
/*
- * For !sync, there is no point retrying as the retry loop
- * is expected to be too short for PageWriteback to be cleared
+ * Only in the case of a full syncronous migration is it
+ * necessary to wait for PageWriteback. In the async case,
+ * the retry loop is too short and in the sync-light case,
+ * the overhead of stalling is too much
*/
- if (!sync) {
+ if (mode != MIGRATE_SYNC) {
rc = -EBUSY;
goto uncharge;
}
@@ -806,7 +812,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
skip_unmap:
if (!page_mapped(page))
- rc = move_to_new_page(newpage, page, remap_swapcache, sync);
+ rc = move_to_new_page(newpage, page, remap_swapcache, mode);
if (rc && remap_swapcache)
remove_migration_ptes(page, page);
@@ -829,7 +835,8 @@ out:
* to the newly allocated page in newpage.
*/
static int unmap_and_move(new_page_t get_new_page, unsigned long private,
- struct page *page, int force, bool offlining, bool sync)
+ struct page *page, int force, bool offlining,
+ enum migrate_mode mode)
{
int rc = 0;
int *result = NULL;
@@ -847,7 +854,7 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
if (unlikely(split_huge_page(page)))
goto out;
- rc = __unmap_and_move(page, newpage, force, offlining, sync);
+ rc = __unmap_and_move(page, newpage, force, offlining, mode);
out:
if (rc != -EAGAIN) {
/*
@@ -895,7 +902,8 @@ out:
*/
static int unmap_and_move_huge_page(new_page_t get_new_page,
unsigned long private, struct page *hpage,
- int force, bool offlining, bool sync)
+ int force, bool offlining,
+ enum migrate_mode mode)
{
int rc = 0;
int *result = NULL;
@@ -908,7 +916,7 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
rc = -EAGAIN;
if (!trylock_page(hpage)) {
- if (!force || !sync)
+ if (!force || mode != MIGRATE_SYNC)
goto out;
lock_page(hpage);
}
@@ -919,7 +927,7 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
try_to_unmap(hpage, TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
if (!page_mapped(hpage))
- rc = move_to_new_page(new_hpage, hpage, 1, sync);
+ rc = move_to_new_page(new_hpage, hpage, 1, mode);
if (rc)
remove_migration_ptes(hpage, hpage);
@@ -962,7 +970,7 @@ out:
*/
int migrate_pages(struct list_head *from,
new_page_t get_new_page, unsigned long private, bool offlining,
- bool sync)
+ enum migrate_mode mode)
{
int retry = 1;
int nr_failed = 0;
@@ -983,7 +991,7 @@ int migrate_pages(struct list_head *from,
rc = unmap_and_move(get_new_page, private,
page, pass > 2, offlining,
- sync);
+ mode);
switch(rc) {
case -ENOMEM:
@@ -1013,7 +1021,7 @@ out:
int migrate_huge_pages(struct list_head *from,
new_page_t get_new_page, unsigned long private, bool offlining,
- bool sync)
+ enum migrate_mode mode)
{
int retry = 1;
int nr_failed = 0;
@@ -1030,7 +1038,7 @@ int migrate_huge_pages(struct list_head *from,
rc = unmap_and_move_huge_page(get_new_page,
private, page, pass > 2, offlining,
- sync);
+ mode);
switch(rc) {
case -ENOMEM:
@@ -1159,7 +1167,7 @@ set_status:
err = 0;
if (!list_empty(&pagelist)) {
err = migrate_pages(&pagelist, new_page_node,
- (unsigned long)pm, 0, true);
+ (unsigned long)pm, 0, MIGRATE_SYNC);
if (err)
putback_lru_pages(&pagelist);
}
--
1.7.3.4
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 09/11] mm: vmscan: When reclaiming for compaction, ensure there are sufficient free pages available
2011-12-01 17:36 [PATCH 0/11] Reduce compaction-related stalls and improve asynchronous migration of dirty pages v5 Mel Gorman
` (7 preceding siblings ...)
2011-12-01 17:36 ` [PATCH 08/11] mm: compaction: Introduce sync-light migration for use by compaction Mel Gorman
@ 2011-12-01 17:36 ` Mel Gorman
2011-12-01 17:36 ` [PATCH 10/11] mm: vmscan: Check if reclaim should really abort even if compaction_ready() is true for one zone Mel Gorman
2011-12-01 17:36 ` [PATCH 11/11] mm: Isolate pages for immediate reclaim on their own LRU Mel Gorman
10 siblings, 0 replies; 13+ messages in thread
From: Mel Gorman @ 2011-12-01 17:36 UTC (permalink / raw)
To: Linux-MM
Cc: Andrea Arcangeli, Minchan Kim, Jan Kara, Andy Isaacson,
Johannes Weiner, Mel Gorman, Rik van Riel, Nai Xia, LKML
In commit [e0887c19: vmscan: limit direct reclaim for higher order
allocations], Rik noted that reclaim was too aggressive when THP was
enabled. In his initial patch he used the number of free pages to
decide if reclaim should abort for compaction. My feedback was that
reclaim and compaction should be using the same logic when deciding if
reclaim should be aborted.
Unfortunately, this had the effect of reducing THP success rates when
the workload included something like streaming reads that continually
allocated pages. The window during which compaction could run and return
a THP was too small.
This patch combines Rik's two patches together. compaction_suitable()
is still used to decide if reclaim should be aborted to allow
compaction is used. However, it will also ensure that there is a
reasonable buffer of free pages available. This improves upon the
THP allocation success rates but bounds the number of pages that are
freed for compaction.
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
mm/vmscan.c | 44 +++++++++++++++++++++++++++++++++++++++-----
1 files changed, 39 insertions(+), 5 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index d2b701a..6c7085d 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2122,6 +2122,42 @@ restart:
throttle_vm_writeout(sc->gfp_mask);
}
+/* Returns true if compaction should go ahead for a high-order request */
+static inline bool compaction_ready(struct zone *zone, struct scan_control *sc)
+{
+ unsigned long balance_gap, watermark;
+ bool watermark_ok;
+
+ /* Do not consider compaction for orders reclaim is meant to satisfy */
+ if (sc->order <= PAGE_ALLOC_COSTLY_ORDER)
+ return false;
+
+ /*
+ * Compaction takes time to run and there are potentially other
+ * callers using the pages just freed. Continue reclaiming until
+ * there is a buffer of free pages available to give compaction
+ * a reasonable chance of completing and allocating the page
+ */
+ balance_gap = min(low_wmark_pages(zone),
+ (zone->present_pages + KSWAPD_ZONE_BALANCE_GAP_RATIO-1) /
+ KSWAPD_ZONE_BALANCE_GAP_RATIO);
+ watermark = high_wmark_pages(zone) + balance_gap + (2UL << sc->order);
+ watermark_ok = zone_watermark_ok_safe(zone, 0, watermark, 0, 0);
+
+ /*
+ * If compaction is deferred, reclaim up to a point where
+ * compaction will have a chance of success when re-enabled
+ */
+ if (compaction_deferred(zone))
+ return watermark_ok;
+
+ /* If compaction is not ready to start, keep reclaiming */
+ if (!compaction_suitable(zone, sc->order))
+ return false;
+
+ return watermark_ok;
+}
+
/*
* This is the direct reclaim path, for page-allocating processes. We only
* try to reclaim pages from zones which will satisfy the caller's allocation
@@ -2139,8 +2175,8 @@ restart:
* scan then give up on it.
*
* This function returns true if a zone is being reclaimed for a costly
- * high-order allocation and compaction is either ready to begin or deferred.
- * This indicates to the caller that it should retry the allocation or fail.
+ * high-order allocation and compaction is ready to begin. This indicates to
+ * the caller that it should retry the allocation or fail.
*/
static bool shrink_zones(int priority, struct zonelist *zonelist,
struct scan_control *sc)
@@ -2174,9 +2210,7 @@ static bool shrink_zones(int priority, struct zonelist *zonelist,
* noticable problem, like transparent huge page
* allocations.
*/
- if (sc->order > PAGE_ALLOC_COSTLY_ORDER &&
- (compaction_suitable(zone, sc->order) ||
- compaction_deferred(zone))) {
+ if (compaction_ready(zone, sc)) {
should_abort_reclaim = true;
continue;
}
--
1.7.3.4
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 10/11] mm: vmscan: Check if reclaim should really abort even if compaction_ready() is true for one zone
2011-12-01 17:36 [PATCH 0/11] Reduce compaction-related stalls and improve asynchronous migration of dirty pages v5 Mel Gorman
` (8 preceding siblings ...)
2011-12-01 17:36 ` [PATCH 09/11] mm: vmscan: When reclaiming for compaction, ensure there are sufficient free pages available Mel Gorman
@ 2011-12-01 17:36 ` Mel Gorman
2011-12-01 17:36 ` [PATCH 11/11] mm: Isolate pages for immediate reclaim on their own LRU Mel Gorman
10 siblings, 0 replies; 13+ messages in thread
From: Mel Gorman @ 2011-12-01 17:36 UTC (permalink / raw)
To: Linux-MM
Cc: Andrea Arcangeli, Minchan Kim, Jan Kara, Andy Isaacson,
Johannes Weiner, Mel Gorman, Rik van Riel, Nai Xia, LKML
If compaction can proceed for a given zone, shrink_zones() does not
reclaim any more pages from it. After commit [e0c2327: vmscan: abort
reclaim/compaction if compaction can proceed], do_try_to_free_pages()
tries to finish as soon as possible once one zone can compact.
This was intended to prevent slabs being shrunk unnecessarily but
there are side-effects. One is that a small zone that is ready for
compaction will abort reclaim even if the chances of successfully
allocating a THP from that zone is small. It also means that reclaim
can return too early even though sc->nr_to_reclaim pages were not
reclaimed.
This partially reverts the commit until it is proven that slabs are
really being shrunk unnecessarily but preserves the check to return
1 to avoid OOM if reclaim was aborted prematurely.
[aarcange@redhat.com: This patch replaces a revert from Andrea]
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
mm/vmscan.c | 19 +++++++++----------
1 files changed, 9 insertions(+), 10 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 6c7085d..b0eeec7 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2176,7 +2176,8 @@ static inline bool compaction_ready(struct zone *zone, struct scan_control *sc)
*
* This function returns true if a zone is being reclaimed for a costly
* high-order allocation and compaction is ready to begin. This indicates to
- * the caller that it should retry the allocation or fail.
+ * the caller that it should consider retrying the allocation instead of
+ * further reclaim.
*/
static bool shrink_zones(int priority, struct zonelist *zonelist,
struct scan_control *sc)
@@ -2185,7 +2186,7 @@ static bool shrink_zones(int priority, struct zonelist *zonelist,
struct zone *zone;
unsigned long nr_soft_reclaimed;
unsigned long nr_soft_scanned;
- bool should_abort_reclaim = false;
+ bool aborted_reclaim = false;
for_each_zone_zonelist_nodemask(zone, z, zonelist,
gfp_zone(sc->gfp_mask), sc->nodemask) {
@@ -2211,7 +2212,7 @@ static bool shrink_zones(int priority, struct zonelist *zonelist,
* allocations.
*/
if (compaction_ready(zone, sc)) {
- should_abort_reclaim = true;
+ aborted_reclaim = true;
continue;
}
}
@@ -2233,7 +2234,7 @@ static bool shrink_zones(int priority, struct zonelist *zonelist,
shrink_zone(priority, zone, sc);
}
- return should_abort_reclaim;
+ return aborted_reclaim;
}
static bool zone_reclaimable(struct zone *zone)
@@ -2287,7 +2288,7 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
struct zoneref *z;
struct zone *zone;
unsigned long writeback_threshold;
- bool should_abort_reclaim;
+ bool aborted_reclaim;
get_mems_allowed();
delayacct_freepages_start();
@@ -2299,9 +2300,7 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
sc->nr_scanned = 0;
if (!priority)
disable_swap_token(sc->mem_cgroup);
- should_abort_reclaim = shrink_zones(priority, zonelist, sc);
- if (should_abort_reclaim)
- break;
+ aborted_reclaim = shrink_zones(priority, zonelist, sc);
/*
* Don't shrink slabs when reclaiming memory from
@@ -2368,8 +2367,8 @@ out:
if (oom_killer_disabled)
return 0;
- /* Aborting reclaim to try compaction? don't OOM, then */
- if (should_abort_reclaim)
+ /* Aborted reclaim to try compaction? don't OOM, then */
+ if (aborted_reclaim)
return 1;
/* top priority shrink_zones still had more to do? don't OOM, then */
--
1.7.3.4
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 11/11] mm: Isolate pages for immediate reclaim on their own LRU
2011-12-01 17:36 [PATCH 0/11] Reduce compaction-related stalls and improve asynchronous migration of dirty pages v5 Mel Gorman
` (9 preceding siblings ...)
2011-12-01 17:36 ` [PATCH 10/11] mm: vmscan: Check if reclaim should really abort even if compaction_ready() is true for one zone Mel Gorman
@ 2011-12-01 17:36 ` Mel Gorman
10 siblings, 0 replies; 13+ messages in thread
From: Mel Gorman @ 2011-12-01 17:36 UTC (permalink / raw)
To: Linux-MM
Cc: Andrea Arcangeli, Minchan Kim, Jan Kara, Andy Isaacson,
Johannes Weiner, Mel Gorman, Rik van Riel, Nai Xia, LKML
It was observed that scan rates from direct reclaim during tests
writing to both fast and slow storage were extraordinarily high. The
problem was that while pages were being marked for immediate reclaim
when writeback completed, the same pages were being encountered over
and over again during LRU scanning.
This patch isolates file-backed pages that are to be reclaimed when
clean on their own LRU list.
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
include/linux/mmzone.h | 2 +
include/linux/vm_event_item.h | 1 +
mm/page_alloc.c | 5 ++-
mm/swap.c | 74 ++++++++++++++++++++++++++++++++++++++---
mm/vmscan.c | 11 ++++++
mm/vmstat.c | 2 +
6 files changed, 89 insertions(+), 6 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index ac5b522..80834eb 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -84,6 +84,7 @@ enum zone_stat_item {
NR_ACTIVE_ANON, /* " " " " " */
NR_INACTIVE_FILE, /* " " " " " */
NR_ACTIVE_FILE, /* " " " " " */
+ NR_IMMEDIATE, /* " " " " " */
NR_UNEVICTABLE, /* " " " " " */
NR_MLOCK, /* mlock()ed pages found and moved off LRU */
NR_ANON_PAGES, /* Mapped anonymous pages */
@@ -136,6 +137,7 @@ enum lru_list {
LRU_ACTIVE_ANON = LRU_BASE + LRU_ACTIVE,
LRU_INACTIVE_FILE = LRU_BASE + LRU_FILE,
LRU_ACTIVE_FILE = LRU_BASE + LRU_FILE + LRU_ACTIVE,
+ LRU_IMMEDIATE,
LRU_UNEVICTABLE,
NR_LRU_LISTS
};
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 03b90cdc..9696fda 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -36,6 +36,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
KSWAPD_LOW_WMARK_HIT_QUICKLY, KSWAPD_HIGH_WMARK_HIT_QUICKLY,
KSWAPD_SKIP_CONGESTION_WAIT,
PAGEOUTRUN, ALLOCSTALL, PGROTATED,
+ PGRESCUED,
#ifdef CONFIG_COMPACTION
COMPACTBLOCKS, COMPACTPAGES, COMPACTPAGEFAILED,
COMPACTSTALL, COMPACTFAIL, COMPACTSUCCESS,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d979376..9e3cd8d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2590,7 +2590,7 @@ void show_free_areas(unsigned int filter)
printk("active_anon:%lu inactive_anon:%lu isolated_anon:%lu\n"
" active_file:%lu inactive_file:%lu isolated_file:%lu\n"
- " unevictable:%lu"
+ " immediate:%lu unevictable:%lu"
" dirty:%lu writeback:%lu unstable:%lu\n"
" free:%lu slab_reclaimable:%lu slab_unreclaimable:%lu\n"
" mapped:%lu shmem:%lu pagetables:%lu bounce:%lu\n",
@@ -2600,6 +2600,7 @@ void show_free_areas(unsigned int filter)
global_page_state(NR_ACTIVE_FILE),
global_page_state(NR_INACTIVE_FILE),
global_page_state(NR_ISOLATED_FILE),
+ global_page_state(NR_IMMEDIATE),
global_page_state(NR_UNEVICTABLE),
global_page_state(NR_FILE_DIRTY),
global_page_state(NR_WRITEBACK),
@@ -2627,6 +2628,7 @@ void show_free_areas(unsigned int filter)
" inactive_anon:%lukB"
" active_file:%lukB"
" inactive_file:%lukB"
+ " immediate:%lukB"
" unevictable:%lukB"
" isolated(anon):%lukB"
" isolated(file):%lukB"
@@ -2655,6 +2657,7 @@ void show_free_areas(unsigned int filter)
K(zone_page_state(zone, NR_INACTIVE_ANON)),
K(zone_page_state(zone, NR_ACTIVE_FILE)),
K(zone_page_state(zone, NR_INACTIVE_FILE)),
+ K(zone_page_state(zone, NR_IMMEDIATE)),
K(zone_page_state(zone, NR_UNEVICTABLE)),
K(zone_page_state(zone, NR_ISOLATED_ANON)),
K(zone_page_state(zone, NR_ISOLATED_FILE)),
diff --git a/mm/swap.c b/mm/swap.c
index a91caf7..9973975 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -39,6 +39,7 @@ int page_cluster;
static DEFINE_PER_CPU(struct pagevec[NR_LRU_LISTS], lru_add_pvecs);
static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs);
+static DEFINE_PER_CPU(struct pagevec, lru_putback_immediate_pvecs);
static DEFINE_PER_CPU(struct pagevec, lru_deactivate_pvecs);
/*
@@ -255,24 +256,80 @@ static void pagevec_move_tail(struct pagevec *pvec)
}
/*
+ * Similar pair of functions to pagevec_move_tail except it is called when
+ * moving a page from the LRU_IMMEDIATE to one of the [in]active_[file|anon]
+ * lists
+ */
+static void pagevec_putback_immediate_fn(struct page *page, void *arg)
+{
+ struct zone *zone = page_zone(page);
+
+ if (PageLRU(page)) {
+ enum lru_list lru = page_lru(page);
+ list_move(&page->lru, &zone->lru[lru].list);
+ }
+}
+
+static void pagevec_putback_immediate(struct pagevec *pvec)
+{
+ pagevec_lru_move_fn(pvec, pagevec_putback_immediate_fn, NULL);
+}
+
+/*
* Writeback is about to end against a page which has been marked for immediate
* reclaim. If it still appears to be reclaimable, move it to the tail of the
* inactive list.
*/
void rotate_reclaimable_page(struct page *page)
{
+ struct zone *zone = page_zone(page);
+ struct list_head *page_list;
+ struct pagevec *pvec;
+ unsigned long flags;
+
+ page_cache_get(page);
+ local_irq_save(flags);
+ __mod_zone_page_state(zone, NR_IMMEDIATE, -1);
+
if (!PageLocked(page) && !PageDirty(page) && !PageActive(page) &&
!PageUnevictable(page) && PageLRU(page)) {
- struct pagevec *pvec;
- unsigned long flags;
- page_cache_get(page);
- local_irq_save(flags);
pvec = &__get_cpu_var(lru_rotate_pvecs);
if (!pagevec_add(pvec, page))
pagevec_move_tail(pvec);
- local_irq_restore(flags);
+ } else {
+ pvec = &__get_cpu_var(lru_putback_immediate_pvecs);
+ if (!pagevec_add(pvec, page))
+ pagevec_putback_immediate(pvec);
+ }
+
+ /*
+ * There is a potential race that if a page is set PageReclaim
+ * and moved to the LRU_IMMEDIATE list after writeback completed,
+ * it can be left on the LRU_IMMEDATE list with no way for
+ * reclaim to find it.
+ *
+ * This race should be very rare but count how often it happens.
+ * If it is a continual race, then it's very unsatisfactory as there
+ * is no guarantee that rotate_reclaimable_page() will be called
+ * to rescue these pages but finding them in page reclaim is also
+ * problematic due to the problem of deciding when the right time
+ * to scan this list is.
+ */
+ page_list = &zone->lru[LRU_IMMEDIATE].list;
+ if (!zone_page_state(zone, NR_IMMEDIATE) && !list_empty(page_list)) {
+ struct page *page;
+
+ spin_lock(&zone->lru_lock);
+ while (!list_empty(page_list)) {
+ page = list_entry(page_list->prev, struct page, lru);
+ list_move(&page->lru, &zone->lru[page_lru(page)].list);
+ __count_vm_event(PGRESCUED);
+ }
+ spin_unlock(&zone->lru_lock);
}
+
+ local_irq_restore(flags);
}
static void update_page_reclaim_stat(struct zone *zone, struct page *page,
@@ -475,6 +532,13 @@ static void lru_deactivate_fn(struct page *page, void *arg)
* is _really_ small and it's non-critical problem.
*/
SetPageReclaim(page);
+
+ /*
+ * Move to the LRU_IMMEDIATE list to avoid being scanned
+ * by page reclaim uselessly.
+ */
+ list_move_tail(&page->lru, &zone->lru[LRU_IMMEDIATE].list);
+ __mod_zone_page_state(zone, NR_IMMEDIATE, 1);
} else {
/*
* The page's writeback ends up during pagevec
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b0eeec7..9879ae5 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1404,6 +1404,17 @@ putback_lru_pages(struct zone *zone, struct scan_control *sc,
}
SetPageLRU(page);
lru = page_lru(page);
+
+ /*
+ * If reclaim has tagged a file page reclaim, move it to
+ * a separate LRU lists to avoid it being scanned by other
+ * users. It is expected that as writeback completes that
+ * they are taken back off and moved to the normal LRU
+ */
+ if (lru == LRU_INACTIVE_FILE &&
+ PageReclaim(page) && PageWriteback(page))
+ lru = LRU_IMMEDIATE;
+
add_page_to_lru_list(zone, page, lru);
if (is_active_lru(lru)) {
int file = is_file_lru(lru);
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 8fd603b..dbfec4c 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -688,6 +688,7 @@ const char * const vmstat_text[] = {
"nr_active_anon",
"nr_inactive_file",
"nr_active_file",
+ "nr_immediate",
"nr_unevictable",
"nr_mlock",
"nr_anon_pages",
@@ -756,6 +757,7 @@ const char * const vmstat_text[] = {
"allocstall",
"pgrotated",
+ "pgrescued",
#ifdef CONFIG_COMPACTION
"compact_blocks_moved",
--
1.7.3.4
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 13+ messages in thread