* [RFC PATCH 0/7] Reduce compaction-related stalls and improve asynchronous migration of dirty pages v4r2
From: Mel Gorman @ 2011-11-21 18:36 UTC
To: Linux-MM
Cc: Andrea Arcangeli, Minchan Kim, Jan Kara, Andy Isaacson,
Johannes Weiner, Mel Gorman, Rik van Riel, Nai Xia, LKML
This is still a work-in-progress but I felt it was important to show
what direction I am going with reconciling Andrea's series with
my own. This is against 3.2-rc2 and follows on from discussions on
"mm: Do not stall in synchronous compaction for THP allocations" and
"[RFC PATCH 0/5] Reduce compaction-related stalls".
Initially, the proposed patch eliminated stalls due to compaction,
which sometimes resulted in user-visible interactivity problems in
browsers, by simply never using sync compaction. The downside was that
THP allocation success rates were lower because dirty pages were not
being migrated, as reported by Andrea. However, Andrea's approach was
a bit heavy-handed and reverted fixes Rik merged that reduced the
number of pages THP reclaimed.
This series is an RFC attempting to reconcile the requirements of
maximising THP usage without stalling in a user-visible fashion due
to compaction, or cheating by reclaiming an excessive number of pages.
Patch 1 partially reverts commit 39deaf85 to allow migration to isolate
dirty pages.
Patch 2 notes that the /proc/sys/vm/compact_memory handler is not using
synchronous compaction when it should be.
Patch 3 checks if we isolated a compound page during the lumpy scan.
Patch 4 adds a sync parameter to the migratepage callback. It is up
to the callback to migrate that page without blocking if
sync==false. For example, fallback_migrate_page will not
call ->writepage if sync==false.
Patch 5 restores filter-awareness to isolate_lru_page for migration.
In practice, it means that pages under writeback and pages
without a ->migratepage callback will not be isolated
for migration.
Patch 6 avoids calling direct reclaim if compaction is deferred but
makes sure that compaction is only deferred if sync
compaction was used.
Patch 7 introduces a sync-light migration mechanism that sync compaction
uses. The objective is to allow some stalls but to not call
->writepage which can lead to significant user-visible stalls.
This has been lightly tested and nothing horrible fell out. Of critical
importance was that during a light test, stalls due to compaction were
eliminated even though sync compaction was still allowed. Andrea, I
have not actually tried your test case, but while monitoring THP usage
during a USB copy, I found that THP usage was higher
http://www.csn.ul.ie/~mel/postings/compaction-20111121/thp-comparison-smooth-hydra.png
while memory utilisation was also higher
http://www.csn.ul.ie/~mel/postings/compaction-20111121/memory-usage-comparison-smooth-hydra.png
fs/btrfs/disk-io.c | 5 +-
fs/nfs/internal.h | 2 +-
fs/nfs/write.c | 4 +-
include/linux/fs.h | 11 ++-
include/linux/migrate.h | 23 +++++--
include/linux/mmzone.h | 2 +
mm/compaction.c | 5 +-
mm/memory-failure.c | 2 +-
mm/memory_hotplug.c | 2 +-
mm/mempolicy.c | 2 +-
mm/migrate.c | 171 ++++++++++++++++++++++++++++++++---------------
mm/page_alloc.c | 45 ++++++++++---
mm/vmscan.c | 45 +++++++++++--
13 files changed, 232 insertions(+), 87 deletions(-)
--
1.7.3.4
* [PATCH 1/7] mm: compaction: Allow compaction to isolate dirty pages
From: Mel Gorman @ 2011-11-21 18:36 UTC
To: Linux-MM
Cc: Andrea Arcangeli, Minchan Kim, Jan Kara, Andy Isaacson,
Johannes Weiner, Mel Gorman, Rik van Riel, Nai Xia, LKML
Commit [39deaf85: mm: compaction: make isolate_lru_page() filter-aware]
noted that compaction does not migrate dirty or writeback pages and
that it was meaningless to pick the page and re-add it to the LRU list.
What was missed during review is that asynchronous migration moves
dirty pages if their ->migratepage callback is migrate_page() because
these can be moved without blocking. This potentially impacted
hugepage allocation success rates by a factor depending on how many
dirty pages are in the system.
This patch partially reverts 39deaf85 to allow migration to isolate
dirty pages again. This increases how much compaction disrupts the
LRU but that is addressed later in the series.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
---
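For reference, this is the existing check in move_to_new_page()
(quoted from mm/migrate.c as of this series; see also patch 4) that
makes it safe to migrate these dirty pages asynchronously, because
migrate_page() itself never blocks:

	/*
	 * Dirty pages are only refused in async mode when migrating
	 * them might block; migrate_page() is non-blocking, so
	 * swapcache/tmpfs pages remain migratable.
	 */
	if (PageDirty(page) && !sync &&
	    mapping->a_ops->migratepage != migrate_page)
		rc = -EBUSY;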
mm/compaction.c | 3 ---
1 files changed, 0 insertions(+), 3 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 899d956..237560e 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -349,9 +349,6 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
continue;
}
- if (!cc->sync)
- mode |= ISOLATE_CLEAN;
-
/* Try isolate the page */
if (__isolate_lru_page(page, mode, 0) != 0)
continue;
--
1.7.3.4
* [PATCH 2/7] mm: compaction: Use synchronous compaction for /proc/sys/vm/compact_memory
From: Mel Gorman @ 2011-11-21 18:36 UTC
To: Linux-MM
Cc: Andrea Arcangeli, Minchan Kim, Jan Kara, Andy Isaacson,
Johannes Weiner, Mel Gorman, Rik van Riel, Nai Xia, LKML
When asynchronous compaction was introduced, the
/proc/sys/vm/compact_memory handler should have been updated to always
use synchronous compaction. This did not happen so this patch addresses
it. The assumption is if a user writes to /proc/sys/vm/compact_memory,
they are willing for that process to stall.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
---
mm/compaction.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 237560e..615502b 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -666,6 +666,7 @@ static int compact_node(int nid)
.nr_freepages = 0,
.nr_migratepages = 0,
.order = -1,
+ .sync = true,
};
zone = &pgdat->node_zones[zoneid];
--
1.7.3.4
* [PATCH 3/7] mm: check if we isolated a compound page during lumpy scan
From: Mel Gorman @ 2011-11-21 18:36 UTC
To: Linux-MM
Cc: Andrea Arcangeli, Minchan Kim, Jan Kara, Andy Isaacson,
Johannes Weiner, Mel Gorman, Rik van Riel, Nai Xia, LKML
From: Andrea Arcangeli <aarcange@redhat.com>
Properly take into account whether we isolated a compound page during
the lumpy scan in reclaim and skip over the tail pages when they are
encountered. This corrects the values given to the tracepoint for the
number of lumpy pages isolated and avoids breaking the loop early if
compound pages smaller than the requested allocation size are
encountered.
[mgorman@suse.de: Updated changelog]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
mm/vmscan.c | 9 ++++++---
1 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a1893c0..3421746 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1183,13 +1183,16 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
break;
if (__isolate_lru_page(cursor_page, mode, file) == 0) {
+ unsigned int isolated_pages;
list_move(&cursor_page->lru, dst);
mem_cgroup_del_lru(cursor_page);
- nr_taken += hpage_nr_pages(page);
- nr_lumpy_taken++;
+ isolated_pages = hpage_nr_pages(page);
+ nr_taken += isolated_pages;
+ nr_lumpy_taken += isolated_pages;
if (PageDirty(cursor_page))
- nr_lumpy_dirty++;
+ nr_lumpy_dirty += isolated_pages;
scan++;
+ pfn += isolated_pages-1;
} else {
/*
* Check if the page is freed already.
--
1.7.3.4
* [PATCH 4/7] mm: compaction: Determine if dirty pages can be migrated without blocking within ->migratepage
From: Mel Gorman @ 2011-11-21 18:36 UTC
To: Linux-MM
Cc: Andrea Arcangeli, Minchan Kim, Jan Kara, Andy Isaacson,
Johannes Weiner, Mel Gorman, Rik van Riel, Nai Xia, LKML
Asynchronous compaction is used when allocating transparent hugepages
to avoid blocking for long periods of time. Due to reports of
stalling, synchronous compaction is never used, but this impacts
allocation success rates. When deciding whether to migrate dirty
pages, the following check is made:
if (PageDirty(page) && !sync &&
mapping->a_ops->migratepage != migrate_page)
rc = -EBUSY;
This skips over all pages using buffer_migrate_page() even though
it is possible to migrate some of these pages without blocking. This
patch updates the ->migratepage callback with a "sync" parameter. It
is the responsibility of the callback to gracefully fail migration of
the page if it cannot be achieved without blocking.
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
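As an illustrative sketch only (example_migratepage is hypothetical,
not part of this patch), a filesystem ->migratepage callback under the
new signature would fail gracefully rather than block:

	static int example_migratepage(struct address_space *mapping,
				       struct page *newpage,
				       struct page *page, bool sync)
	{
		/*
		 * Hypothetical callback: if the caller cannot
		 * tolerate blocking and migrating this page would
		 * require it (e.g. the page is dirty), return -EBUSY
		 * so the page is skipped instead of stalling async
		 * compaction.
		 */
		if (!sync && PageDirty(page))
			return -EBUSY;

		return migrate_page(mapping, newpage, page, sync);
	}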
fs/btrfs/disk-io.c | 4 +-
fs/nfs/internal.h | 2 +-
fs/nfs/write.c | 4 +-
include/linux/fs.h | 9 ++-
include/linux/migrate.h | 2 +-
mm/migrate.c | 129 +++++++++++++++++++++++++++++++++-------------
6 files changed, 104 insertions(+), 46 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 62afe5c..f158b5c 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -872,7 +872,7 @@ static int btree_submit_bio_hook(struct inode *inode, int rw, struct bio *bio,
#ifdef CONFIG_MIGRATION
static int btree_migratepage(struct address_space *mapping,
- struct page *newpage, struct page *page)
+ struct page *newpage, struct page *page, bool sync)
{
/*
* we can't safely write a btree page from here,
@@ -887,7 +887,7 @@ static int btree_migratepage(struct address_space *mapping,
if (page_has_private(page) &&
!try_to_release_page(page, GFP_KERNEL))
return -EAGAIN;
- return migrate_page(mapping, newpage, page);
+ return migrate_page(mapping, newpage, page, sync);
}
#endif
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index c1a1bd8..d0c460f 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -328,7 +328,7 @@ void nfs_commit_release_pages(struct nfs_write_data *data);
#ifdef CONFIG_MIGRATION
extern int nfs_migrate_page(struct address_space *,
- struct page *, struct page *);
+ struct page *, struct page *, bool);
#else
#define nfs_migrate_page NULL
#endif
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 1dda78d..33475df 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1711,7 +1711,7 @@ out_error:
#ifdef CONFIG_MIGRATION
int nfs_migrate_page(struct address_space *mapping, struct page *newpage,
- struct page *page)
+ struct page *page, bool sync)
{
/*
* If PagePrivate is set, then the page is currently associated with
@@ -1726,7 +1726,7 @@ int nfs_migrate_page(struct address_space *mapping, struct page *newpage,
nfs_fscache_release_page(page, GFP_KERNEL);
- return migrate_page(mapping, newpage, page);
+ return migrate_page(mapping, newpage, page, sync);
}
#endif
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 0c4df26..034cffb 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -609,9 +609,12 @@ struct address_space_operations {
loff_t offset, unsigned long nr_segs);
int (*get_xip_mem)(struct address_space *, pgoff_t, int,
void **, unsigned long *);
- /* migrate the contents of a page to the specified target */
+ /*
+ * migrate the contents of a page to the specified target. If sync
+ * is false, it must not block.
+ */
int (*migratepage) (struct address_space *,
- struct page *, struct page *);
+ struct page *, struct page *, bool);
int (*launder_page) (struct page *);
int (*is_partially_uptodate) (struct page *, read_descriptor_t *,
unsigned long);
@@ -2577,7 +2580,7 @@ extern int generic_check_addressable(unsigned, u64);
#ifdef CONFIG_MIGRATION
extern int buffer_migrate_page(struct address_space *,
- struct page *, struct page *);
+ struct page *, struct page *, bool);
#else
#define buffer_migrate_page NULL
#endif
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index e39aeec..14e6d2a 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -11,7 +11,7 @@ typedef struct page *new_page_t(struct page *, unsigned long private, int **);
extern void putback_lru_pages(struct list_head *l);
extern int migrate_page(struct address_space *,
- struct page *, struct page *);
+ struct page *, struct page *, bool);
extern int migrate_pages(struct list_head *l, new_page_t x,
unsigned long private, bool offlining,
bool sync);
diff --git a/mm/migrate.c b/mm/migrate.c
index 578e291..a5be362 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -220,6 +220,55 @@ out:
pte_unmap_unlock(ptep, ptl);
}
+#ifdef CONFIG_BLOCK
+/* Returns true if all buffers are successfully locked */
+static bool buffer_migrate_lock_buffers(struct buffer_head *head, bool sync)
+{
+ struct buffer_head *bh = head;
+
+ /* Simple case, sync compaction */
+ if (sync) {
+ do {
+ get_bh(bh);
+ lock_buffer(bh);
+ bh = bh->b_this_page;
+
+ } while (bh != head);
+
+ return true;
+ }
+
+ /* async case, we cannot block on lock_buffer so use trylock_buffer */
+ do {
+ get_bh(bh);
+ if (!trylock_buffer(bh)) {
+ /*
+ * We failed to lock the buffer and cannot stall in
+ * async migration. Release the taken locks
+ */
+ struct buffer_head *failed_bh = bh;
+ put_bh(failed_bh);
+ bh = head;
+ while (bh != failed_bh) {
+ unlock_buffer(bh);
+ put_bh(bh);
+ bh = bh->b_this_page;
+ }
+ return false;
+ }
+
+ bh = bh->b_this_page;
+ } while (bh != head);
+ return true;
+}
+#else
+static inline bool buffer_migrate_lock_buffers(struct buffer_head *head,
+ bool sync)
+{
+ return true;
+}
+#endif /* CONFIG_BLOCK */
+
/*
* Replace the page in the mapping.
*
@@ -229,7 +278,8 @@ out:
* 3 for pages with a mapping and PagePrivate/PagePrivate2 set.
*/
static int migrate_page_move_mapping(struct address_space *mapping,
- struct page *newpage, struct page *page)
+ struct page *newpage, struct page *page,
+ struct buffer_head *head, bool sync)
{
int expected_count;
void **pslot;
@@ -259,6 +309,19 @@ static int migrate_page_move_mapping(struct address_space *mapping,
}
/*
+ * In the async migration case of moving a page with buffers, lock the
+ * buffers using trylock before the mapping is moved. If the mapping
+ * was moved, we later failed to lock the buffers and could not move
+ * the mapping back due to an elevated page count, we would have to
+ * block waiting on other references to be dropped.
+ */
+ if (!sync && head && !buffer_migrate_lock_buffers(head, sync)) {
+ page_unfreeze_refs(page, expected_count);
+ spin_unlock_irq(&mapping->tree_lock);
+ return -EAGAIN;
+ }
+
+ /*
* Now we know that no one else is looking at the page.
*/
get_page(newpage); /* add cache reference */
@@ -415,13 +478,13 @@ EXPORT_SYMBOL(fail_migrate_page);
* Pages are locked upon entry and exit.
*/
int migrate_page(struct address_space *mapping,
- struct page *newpage, struct page *page)
+ struct page *newpage, struct page *page, bool sync)
{
int rc;
BUG_ON(PageWriteback(page)); /* Writeback must be complete */
- rc = migrate_page_move_mapping(mapping, newpage, page);
+ rc = migrate_page_move_mapping(mapping, newpage, page, NULL, sync);
if (rc)
return rc;
@@ -438,28 +501,28 @@ EXPORT_SYMBOL(migrate_page);
* exist.
*/
int buffer_migrate_page(struct address_space *mapping,
- struct page *newpage, struct page *page)
+ struct page *newpage, struct page *page, bool sync)
{
struct buffer_head *bh, *head;
int rc;
if (!page_has_buffers(page))
- return migrate_page(mapping, newpage, page);
+ return migrate_page(mapping, newpage, page, sync);
head = page_buffers(page);
- rc = migrate_page_move_mapping(mapping, newpage, page);
+ rc = migrate_page_move_mapping(mapping, newpage, page, head, sync);
if (rc)
return rc;
- bh = head;
- do {
- get_bh(bh);
- lock_buffer(bh);
- bh = bh->b_this_page;
-
- } while (bh != head);
+ /*
+ * In the async case, migrate_page_move_mapping locked the buffers
+ * with an IRQ-safe spinlock held. In the sync case, the buffers
+ * need to be locked now
+ */
+ if (sync)
+ BUG_ON(!buffer_migrate_lock_buffers(head, sync));
ClearPagePrivate(page);
set_page_private(newpage, page_private(page));
@@ -536,10 +599,13 @@ static int writeout(struct address_space *mapping, struct page *page)
* Default handling if a filesystem does not provide a migration function.
*/
static int fallback_migrate_page(struct address_space *mapping,
- struct page *newpage, struct page *page)
+ struct page *newpage, struct page *page, bool sync)
{
- if (PageDirty(page))
+ if (PageDirty(page)) {
+ if (!sync)
+ return -EBUSY;
return writeout(mapping, page);
+ }
/*
* Buffers may be managed in a filesystem specific way.
@@ -549,7 +615,7 @@ static int fallback_migrate_page(struct address_space *mapping,
!try_to_release_page(page, GFP_KERNEL))
return -EAGAIN;
- return migrate_page(mapping, newpage, page);
+ return migrate_page(mapping, newpage, page, sync);
}
/*
@@ -585,29 +651,18 @@ static int move_to_new_page(struct page *newpage, struct page *page,
mapping = page_mapping(page);
if (!mapping)
- rc = migrate_page(mapping, newpage, page);
- else {
+ rc = migrate_page(mapping, newpage, page, sync);
+ else if (mapping->a_ops->migratepage)
/*
- * Do not writeback pages if !sync and migratepage is
- * not pointing to migrate_page() which is nonblocking
- * (swapcache/tmpfs uses migratepage = migrate_page).
+ * Most pages have a mapping and most filesystems provide a
+ * migratepage callback. Anonymous pages are part of swap
+ * space which also has its own migratepage callback. This
+ * is the most common path for page migration.
*/
- if (PageDirty(page) && !sync &&
- mapping->a_ops->migratepage != migrate_page)
- rc = -EBUSY;
- else if (mapping->a_ops->migratepage)
- /*
- * Most pages have a mapping and most filesystems
- * should provide a migration function. Anonymous
- * pages are part of swap space which also has its
- * own migration function. This is the most common
- * path for page migration.
- */
- rc = mapping->a_ops->migratepage(mapping,
- newpage, page);
- else
- rc = fallback_migrate_page(mapping, newpage, page);
- }
+ rc = mapping->a_ops->migratepage(mapping,
+ newpage, page, sync);
+ else
+ rc = fallback_migrate_page(mapping, newpage, page, sync);
if (rc) {
newpage->mapping = NULL;
--
1.7.3.4
* [PATCH 5/7] mm: compaction: make isolate_lru_page() filter-aware again
From: Mel Gorman @ 2011-11-21 18:36 UTC
To: Linux-MM
Cc: Andrea Arcangeli, Minchan Kim, Jan Kara, Andy Isaacson,
Johannes Weiner, Mel Gorman, Rik van Riel, Nai Xia, LKML
Commit [39deaf85: mm: compaction: make isolate_lru_page() filter-aware]
noted that compaction does not migrate dirty or writeback pages and
that it was meaningless to pick the page and re-add it to the LRU list.
This had to be partially reverted because some dirty pages can be
migrated by compaction without blocking.
This patch updates "mm: compaction: make isolate_lru_page() filter-aware"
by skipping over pages that migration has no possibility of migrating
without blocking, to minimise LRU disruption.
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
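A condensed sketch of the filter this patch adds to
__isolate_lru_page() (only the new ISOLATE_ASYNC_MIGRATE handling is
shown; see the full hunk below):

	if (mode & ISOLATE_ASYNC_MIGRATE) {
		/* The only way to deal with PageWriteback is to block */
		if (PageWriteback(page))
			return -EBUSY;

		if (PageDirty(page)) {
			struct address_space *mapping = page_mapping(page);

			/*
			 * Only a ->migratepage callback can move a
			 * dirty page without blocking, so skip pages
			 * without one.
			 */
			if (!mapping || !mapping->a_ops->migratepage)
				return -EBUSY;
		}
	}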
include/linux/mmzone.h | 2 ++
mm/compaction.c | 3 +++
mm/vmscan.c | 36 ++++++++++++++++++++++++++++++++++--
3 files changed, 39 insertions(+), 2 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 188cb2f..ac5b522 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -173,6 +173,8 @@ static inline int is_unevictable_lru(enum lru_list l)
#define ISOLATE_CLEAN ((__force isolate_mode_t)0x4)
/* Isolate unmapped file */
#define ISOLATE_UNMAPPED ((__force isolate_mode_t)0x8)
+/* Isolate for asynchronous migration */
+#define ISOLATE_ASYNC_MIGRATE ((__force isolate_mode_t)0x10)
/* LRU Isolation modes. */
typedef unsigned __bitwise__ isolate_mode_t;
diff --git a/mm/compaction.c b/mm/compaction.c
index 615502b..0379263 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -349,6 +349,9 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
continue;
}
+ if (!cc->sync)
+ mode |= ISOLATE_ASYNC_MIGRATE;
+
/* Try isolate the page */
if (__isolate_lru_page(page, mode, 0) != 0)
continue;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 3421746..28df0ed 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1061,8 +1061,40 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode, int file)
ret = -EBUSY;
- if ((mode & ISOLATE_CLEAN) && (PageDirty(page) || PageWriteback(page)))
- return ret;
+ /*
+ * To minimise LRU disruption, the caller can indicate that it only
+ * wants to isolate pages it will be able to operate on without
+ * blocking - clean pages for the most part.
+ *
+ * ISOLATE_CLEAN means that only clean pages should be isolated. This
+ * is used by reclaim when it cannot write to backing storage
+ *
+ * ISOLATE_ASYNC_MIGRATE is used to indicate that it only wants pages
+ * that it is possible to migrate without blocking with a ->migratepage
+ * handler
+ */
+ if (mode & (ISOLATE_CLEAN|ISOLATE_ASYNC_MIGRATE)) {
+ /* All the caller can do on PageWriteback is block */
+ if (PageWriteback(page))
+ return ret;
+
+ if (PageDirty(page)) {
+ struct address_space *mapping;
+
+ /* ISOLATE_CLEAN means only clean pages */
+ if (mode & ISOLATE_CLEAN)
+ return ret;
+
+ /*
+ * Only the ->migratepage callback knows if a dirty
+ * page can be migrated without blocking. Skip the
+ * page unless there is a ->migratepage callback.
+ */
+ mapping = page_mapping(page);
+ if (!mapping || !mapping->a_ops->migratepage)
+ return ret;
+ }
+ }
if ((mode & ISOLATE_UNMAPPED) && page_mapped(page))
return ret;
--
1.7.3.4
* [PATCH 6/7] mm: page allocator: Limit when direct reclaim is used when compaction is deferred
From: Mel Gorman @ 2011-11-21 18:36 UTC
To: Linux-MM
Cc: Andrea Arcangeli, Minchan Kim, Jan Kara, Andy Isaacson,
Johannes Weiner, Mel Gorman, Rik van Riel, Nai Xia, LKML
If compaction is deferred, we enter direct reclaim to try to reclaim
the pages that way. For small high-order allocations, this has a
reasonable chance of success. However, if the caller has specified
__GFP_NO_KSWAPD to limit the disruption to the system, it makes more
sense to fail the allocation rather than stall the caller in direct
reclaim. This patch will skip direct reclaim if compaction is deferred
and the caller specifies __GFP_NO_KSWAPD.
Async compaction only considers a subset of pages, so it is possible for
compaction to be deferred prematurely and not enter direct reclaim even
in cases where it should. To compensate for this, this patch also defers
compaction only if sync compaction failed.
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
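The two key changes, condensed from the diff below: a compaction
failure only defers further compaction when sync compaction was used,
and a deferred state combined with __GFP_NO_KSWAPD now fails the
allocation instead of entering direct reclaim:

	/*
	 * As async compaction considers only a subset of pageblocks,
	 * only defer if the failure was a sync compaction failure.
	 */
	if (sync_migration)
		defer_compaction(preferred_zone);

	/*
	 * Later, in __alloc_pages_slowpath(): compaction was deferred
	 * because sync compaction recently failed, so a caller that
	 * asked for minimal disruption (a THP fault, in practice)
	 * fails fast rather than stalling in direct reclaim.
	 */
	if (deferred_compaction && (gfp_mask & __GFP_NO_KSWAPD))
		goto nopage;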
mm/page_alloc.c | 45 +++++++++++++++++++++++++++++++++++----------
1 files changed, 35 insertions(+), 10 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9dd443d..d979376 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1886,14 +1886,20 @@ static struct page *
__alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
struct zonelist *zonelist, enum zone_type high_zoneidx,
nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone,
- int migratetype, unsigned long *did_some_progress,
- bool sync_migration)
+ int migratetype, bool sync_migration,
+ bool *deferred_compaction,
+ unsigned long *did_some_progress)
{
struct page *page;
- if (!order || compaction_deferred(preferred_zone))
+ if (!order)
return NULL;
+ if (compaction_deferred(preferred_zone)) {
+ *deferred_compaction = true;
+ return NULL;
+ }
+
current->flags |= PF_MEMALLOC;
*did_some_progress = try_to_compact_pages(zonelist, order, gfp_mask,
nodemask, sync_migration);
@@ -1921,7 +1927,13 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
* but not enough to satisfy watermarks.
*/
count_vm_event(COMPACTFAIL);
- defer_compaction(preferred_zone);
+
+ /*
+ * As async compaction considers a subset of pageblocks, only
+ * defer if the failure was a sync compaction failure.
+ */
+ if (sync_migration)
+ defer_compaction(preferred_zone);
cond_resched();
}
@@ -1933,8 +1945,9 @@ static inline struct page *
__alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
struct zonelist *zonelist, enum zone_type high_zoneidx,
nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone,
- int migratetype, unsigned long *did_some_progress,
- bool sync_migration)
+ int migratetype, bool sync_migration,
+ bool *deferred_compaction,
+ unsigned long *did_some_progress)
{
return NULL;
}
@@ -2084,6 +2097,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
unsigned long pages_reclaimed = 0;
unsigned long did_some_progress;
bool sync_migration = false;
+ bool deferred_compaction = false;
/*
* In the slowpath, we sanity check order to avoid ever trying to
@@ -2164,12 +2178,22 @@ rebalance:
zonelist, high_zoneidx,
nodemask,
alloc_flags, preferred_zone,
- migratetype, &did_some_progress,
- sync_migration);
+ migratetype, sync_migration,
+ &deferred_compaction,
+ &did_some_progress);
if (page)
goto got_pg;
sync_migration = true;
+ /*
+ * If compaction is deferred for high-order allocations, it is because
+ * sync compaction recently failed. If this is the case and the caller
+ * has requested the system not be heavily disrupted, fail the
+ * allocation now instead of entering direct reclaim
+ */
+ if (deferred_compaction && (gfp_mask & __GFP_NO_KSWAPD))
+ goto nopage;
+
/* Try direct reclaim and then allocating */
page = __alloc_pages_direct_reclaim(gfp_mask, order,
zonelist, high_zoneidx,
@@ -2232,8 +2256,9 @@ rebalance:
zonelist, high_zoneidx,
nodemask,
alloc_flags, preferred_zone,
- migratetype, &did_some_progress,
- sync_migration);
+ migratetype, sync_migration,
+ &deferred_compaction,
+ &did_some_progress);
if (page)
goto got_pg;
}
--
1.7.3.4
* [PATCH 7/7] mm: compaction: Introduce sync-light migration for use by compaction
From: Mel Gorman @ 2011-11-21 18:36 UTC
To: Linux-MM
Cc: Andrea Arcangeli, Minchan Kim, Jan Kara, Andy Isaacson,
Johannes Weiner, Mel Gorman, Rik van Riel, Nai Xia, LKML
This patch adds a lightweight sync migrate operation, MIGRATE_SYNC_LIGHT,
that avoids writing back pages to backing storage. Async
compaction maps to MIGRATE_ASYNC while sync compaction maps to
MIGRATE_SYNC_LIGHT. For other migrate_pages users such as memory
hotplug, MIGRATE_SYNC is used.
This avoids sync compaction stalling for an excessive length of time,
particularly when copying files to a USB stick where there might be
a large number of dirty pages backed by a filesystem that does not
support ->writepages.
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
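As a quick reference, an illustrative helper (not part of the patch)
encoding what blocking each mode tolerates under this series:

	static bool migrate_mode_may_block(enum migrate_mode mode,
					   bool for_writeback)
	{
		if (mode == MIGRATE_ASYNC)
			return false;		/* never block */
		if (mode == MIGRATE_SYNC_LIGHT)
			return !for_writeback;	/* locks yes; ->writepage
						 * and writeback waits no */
		return true;			/* MIGRATE_SYNC blocks fully */
	}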
fs/btrfs/disk-io.c | 3 +-
fs/nfs/internal.h | 2 +-
fs/nfs/write.c | 2 +-
include/linux/fs.h | 6 ++-
include/linux/migrate.h | 23 +++++++++++---
mm/compaction.c | 2 +-
mm/memory-failure.c | 2 +-
mm/memory_hotplug.c | 2 +-
mm/mempolicy.c | 2 +-
mm/migrate.c | 78 ++++++++++++++++++++++++++---------------------
10 files changed, 73 insertions(+), 49 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f158b5c..0476254 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -872,7 +872,8 @@ static int btree_submit_bio_hook(struct inode *inode, int rw, struct bio *bio,
#ifdef CONFIG_MIGRATION
static int btree_migratepage(struct address_space *mapping,
- struct page *newpage, struct page *page, bool sync)
+ struct page *newpage, struct page *page,
+ enum migrate_mode sync)
{
/*
* we can't safely write a btree page from here,
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index d0c460f..5434b19 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -328,7 +328,7 @@ void nfs_commit_release_pages(struct nfs_write_data *data);
#ifdef CONFIG_MIGRATION
extern int nfs_migrate_page(struct address_space *,
- struct page *, struct page *, bool);
+ struct page *, struct page *, enum migrate_mode);
#else
#define nfs_migrate_page NULL
#endif
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 33475df..adb87d9 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1711,7 +1711,7 @@ out_error:
#ifdef CONFIG_MIGRATION
int nfs_migrate_page(struct address_space *mapping, struct page *newpage,
- struct page *page, bool sync)
+ struct page *page, enum migrate_mode sync)
{
/*
* If PagePrivate is set, then the page is currently associated with
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 034cffb..8197a31 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -525,6 +525,7 @@ enum positive_aop_returns {
struct page;
struct address_space;
struct writeback_control;
+enum migrate_mode;
struct iov_iter {
const struct iovec *iov;
@@ -614,7 +615,7 @@ struct address_space_operations {
* is false, it must not block.
*/
int (*migratepage) (struct address_space *,
- struct page *, struct page *, bool);
+ struct page *, struct page *, enum migrate_mode);
int (*launder_page) (struct page *);
int (*is_partially_uptodate) (struct page *, read_descriptor_t *,
unsigned long);
@@ -2580,7 +2581,8 @@ extern int generic_check_addressable(unsigned, u64);
#ifdef CONFIG_MIGRATION
extern int buffer_migrate_page(struct address_space *,
- struct page *, struct page *, bool);
+ struct page *, struct page *,
+ enum migrate_mode);
#else
#define buffer_migrate_page NULL
#endif
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 14e6d2a..775787c 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -6,18 +6,31 @@
typedef struct page *new_page_t(struct page *, unsigned long private, int **);
+/*
+ * MIGRATE_ASYNC means never block
+ * MIGRATE_SYNC_LIGHT in the current implementation means to allow blocking
+ * on most operations but not ->writepage as the potential stall time
+ * is too significant
+ * MIGRATE_SYNC will block when migrating pages
+ */
+enum migrate_mode {
+ MIGRATE_ASYNC,
+ MIGRATE_SYNC_LIGHT,
+ MIGRATE_SYNC,
+};
+
#ifdef CONFIG_MIGRATION
#define PAGE_MIGRATION 1
extern void putback_lru_pages(struct list_head *l);
extern int migrate_page(struct address_space *,
- struct page *, struct page *, bool);
+ struct page *, struct page *, enum migrate_mode);
extern int migrate_pages(struct list_head *l, new_page_t x,
unsigned long private, bool offlining,
- bool sync);
+ enum migrate_mode sync);
extern int migrate_huge_pages(struct list_head *l, new_page_t x,
unsigned long private, bool offlining,
- bool sync);
+ enum migrate_mode sync);
extern int fail_migrate_page(struct address_space *,
struct page *, struct page *);
@@ -36,10 +49,10 @@ extern int migrate_huge_page_move_mapping(struct address_space *mapping,
static inline void putback_lru_pages(struct list_head *l) {}
static inline int migrate_pages(struct list_head *l, new_page_t x,
unsigned long private, bool offlining,
- bool sync) { return -ENOSYS; }
+ enum migrate_mode sync) { return -ENOSYS; }
static inline int migrate_huge_pages(struct list_head *l, new_page_t x,
unsigned long private, bool offlining,
- bool sync) { return -ENOSYS; }
+ enum migrate_mode sync) { return -ENOSYS; }
static inline int migrate_prep(void) { return -ENOSYS; }
static inline int migrate_prep_local(void) { return -ENOSYS; }
diff --git a/mm/compaction.c b/mm/compaction.c
index 0379263..dbe1da0 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -555,7 +555,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
nr_migrate = cc->nr_migratepages;
err = migrate_pages(&cc->migratepages, compaction_alloc,
(unsigned long)cc, false,
- cc->sync);
+ cc->sync ? MIGRATE_SYNC_LIGHT : MIGRATE_ASYNC);
update_nr_listpages(cc);
nr_remaining = cc->nr_migratepages;
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 06d3479..56080ea 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1557,7 +1557,7 @@ int soft_offline_page(struct page *page, int flags)
page_is_file_cache(page));
list_add(&page->lru, &pagelist);
ret = migrate_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL,
- 0, true);
+ 0, MIGRATE_SYNC);
if (ret) {
putback_lru_pages(&pagelist);
pr_info("soft offline: %#lx: migration failed %d, type %lx\n",
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 2168489..6629faf 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -809,7 +809,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
}
/* this function returns # of failed pages */
ret = migrate_pages(&source, hotremove_migrate_alloc, 0,
- true, true);
+ true, MIGRATE_SYNC);
if (ret)
putback_lru_pages(&source);
}
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index adc3954..97009a4 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -933,7 +933,7 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest,
if (!list_empty(&pagelist)) {
err = migrate_pages(&pagelist, new_node_page, dest,
- false, true);
+ false, MIGRATE_SYNC);
if (err)
putback_lru_pages(&pagelist);
}
diff --git a/mm/migrate.c b/mm/migrate.c
index a5be362..44071dc 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -222,12 +222,13 @@ out:
#ifdef CONFIG_BLOCK
/* Returns true if all buffers are successfully locked */
-static bool buffer_migrate_lock_buffers(struct buffer_head *head, bool sync)
+static bool buffer_migrate_lock_buffers(struct buffer_head *head,
+ enum migrate_mode mode)
{
struct buffer_head *bh = head;
/* Simple case, sync compaction */
- if (sync) {
+ if (mode != MIGRATE_ASYNC) {
do {
get_bh(bh);
lock_buffer(bh);
@@ -263,7 +264,7 @@ static bool buffer_migrate_lock_buffers(struct buffer_head *head, bool sync)
}
#else
static inline bool buffer_migrate_lock_buffers(struct buffer_head *head,
- bool sync)
+ enum migrate_mode mode)
{
return true;
}
@@ -279,7 +280,7 @@ static inline bool buffer_migrate_lock_buffers(struct buffer_head *head,
*/
static int migrate_page_move_mapping(struct address_space *mapping,
struct page *newpage, struct page *page,
- struct buffer_head *head, bool sync)
+ struct buffer_head *head, enum migrate_mode mode)
{
int expected_count;
void **pslot;
@@ -315,7 +316,8 @@ static int migrate_page_move_mapping(struct address_space *mapping,
* the mapping back due to an elevated page count, we would have to
* block waiting on other references to be dropped.
*/
- if (!sync && head && !buffer_migrate_lock_buffers(head, sync)) {
+ if (mode == MIGRATE_ASYNC && head &&
+ !buffer_migrate_lock_buffers(head, mode)) {
page_unfreeze_refs(page, expected_count);
spin_unlock_irq(&mapping->tree_lock);
return -EAGAIN;
@@ -478,13 +480,14 @@ EXPORT_SYMBOL(fail_migrate_page);
* Pages are locked upon entry and exit.
*/
int migrate_page(struct address_space *mapping,
- struct page *newpage, struct page *page, bool sync)
+ struct page *newpage, struct page *page,
+ enum migrate_mode mode)
{
int rc;
BUG_ON(PageWriteback(page)); /* Writeback must be complete */
- rc = migrate_page_move_mapping(mapping, newpage, page, NULL, sync);
+ rc = migrate_page_move_mapping(mapping, newpage, page, NULL, mode);
if (rc)
return rc;
@@ -501,17 +504,17 @@ EXPORT_SYMBOL(migrate_page);
* exist.
*/
int buffer_migrate_page(struct address_space *mapping,
- struct page *newpage, struct page *page, bool sync)
+ struct page *newpage, struct page *page, enum migrate_mode mode)
{
struct buffer_head *bh, *head;
int rc;
if (!page_has_buffers(page))
- return migrate_page(mapping, newpage, page, sync);
+ return migrate_page(mapping, newpage, page, mode);
head = page_buffers(page);
- rc = migrate_page_move_mapping(mapping, newpage, page, head, sync);
+ rc = migrate_page_move_mapping(mapping, newpage, page, head, mode);
if (rc)
return rc;
@@ -521,8 +524,8 @@ int buffer_migrate_page(struct address_space *mapping,
* with an IRQ-safe spinlock held. In the sync case, the buffers
* need to be locked no
*/
- if (sync)
- BUG_ON(!buffer_migrate_lock_buffers(head, sync));
+ if (mode != MIGRATE_ASYNC)
+ BUG_ON(!buffer_migrate_lock_buffers(head, mode));
ClearPagePrivate(page);
set_page_private(newpage, page_private(page));
@@ -599,10 +602,11 @@ static int writeout(struct address_space *mapping, struct page *page)
* Default handling if a filesystem does not provide a migration function.
*/
static int fallback_migrate_page(struct address_space *mapping,
- struct page *newpage, struct page *page, bool sync)
+ struct page *newpage, struct page *page, enum migrate_mode mode)
{
if (PageDirty(page)) {
- if (!sync)
+ /* Only writeback pages in full synchronous migration */
+ if (mode != MIGRATE_SYNC)
return -EBUSY;
return writeout(mapping, page);
}
@@ -615,7 +619,7 @@ static int fallback_migrate_page(struct address_space *mapping,
!try_to_release_page(page, GFP_KERNEL))
return -EAGAIN;
- return migrate_page(mapping, newpage, page, sync);
+ return migrate_page(mapping, newpage, page, mode);
}
/*
@@ -630,7 +634,7 @@ static int fallback_migrate_page(struct address_space *mapping,
* == 0 - success
*/
static int move_to_new_page(struct page *newpage, struct page *page,
- int remap_swapcache, bool sync)
+ int remap_swapcache, enum migrate_mode mode)
{
struct address_space *mapping;
int rc;
@@ -651,7 +655,7 @@ static int move_to_new_page(struct page *newpage, struct page *page,
mapping = page_mapping(page);
if (!mapping)
- rc = migrate_page(mapping, newpage, page, sync);
+ rc = migrate_page(mapping, newpage, page, mode);
else if (mapping->a_ops->migratepage)
/*
* Most pages have a mapping and most filesystems provide a
@@ -660,9 +664,9 @@ static int move_to_new_page(struct page *newpage, struct page *page,
* is the most common path for page migration.
*/
rc = mapping->a_ops->migratepage(mapping,
- newpage, page, sync);
+ newpage, page, mode);
else
- rc = fallback_migrate_page(mapping, newpage, page, sync);
+ rc = fallback_migrate_page(mapping, newpage, page, mode);
if (rc) {
newpage->mapping = NULL;
@@ -677,7 +681,7 @@ static int move_to_new_page(struct page *newpage, struct page *page,
}
static int __unmap_and_move(struct page *page, struct page *newpage,
- int force, bool offlining, bool sync)
+ int force, bool offlining, enum migrate_mode mode)
{
int rc = -EAGAIN;
int remap_swapcache = 1;
@@ -686,7 +690,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
struct anon_vma *anon_vma = NULL;
if (!trylock_page(page)) {
- if (!force || !sync)
+ if (!force || mode == MIGRATE_ASYNC)
goto out;
/*
@@ -732,10 +736,12 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
if (PageWriteback(page)) {
/*
- * For !sync, there is no point retrying as the retry loop
- * is expected to be too short for PageWriteback to be cleared
+ * Only in the case of a full synchronous migration is it
+ * necessary to wait for PageWriteback. In the async case,
+ * the retry loop is too short and in the sync-light case,
+ * the overhead of stalling is too much
*/
- if (!sync) {
+ if (mode != MIGRATE_SYNC) {
rc = -EBUSY;
goto uncharge;
}
@@ -806,7 +812,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
skip_unmap:
if (!page_mapped(page))
- rc = move_to_new_page(newpage, page, remap_swapcache, sync);
+ rc = move_to_new_page(newpage, page, remap_swapcache, mode);
if (rc && remap_swapcache)
remove_migration_ptes(page, page);
@@ -829,7 +835,8 @@ out:
* to the newly allocated page in newpage.
*/
static int unmap_and_move(new_page_t get_new_page, unsigned long private,
- struct page *page, int force, bool offlining, bool sync)
+ struct page *page, int force, bool offlining,
+ enum migrate_mode mode)
{
int rc = 0;
int *result = NULL;
@@ -847,7 +854,7 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
if (unlikely(split_huge_page(page)))
goto out;
- rc = __unmap_and_move(page, newpage, force, offlining, sync);
+ rc = __unmap_and_move(page, newpage, force, offlining, mode);
out:
if (rc != -EAGAIN) {
/*
@@ -895,7 +902,8 @@ out:
*/
static int unmap_and_move_huge_page(new_page_t get_new_page,
unsigned long private, struct page *hpage,
- int force, bool offlining, bool sync)
+ int force, bool offlining,
+ enum migrate_mode mode)
{
int rc = 0;
int *result = NULL;
@@ -908,7 +916,7 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
rc = -EAGAIN;
if (!trylock_page(hpage)) {
- if (!force || !sync)
+ if (!force || mode != MIGRATE_SYNC)
goto out;
lock_page(hpage);
}
@@ -919,7 +927,7 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
try_to_unmap(hpage, TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
if (!page_mapped(hpage))
- rc = move_to_new_page(new_hpage, hpage, 1, sync);
+ rc = move_to_new_page(new_hpage, hpage, 1, mode);
if (rc)
remove_migration_ptes(hpage, hpage);
@@ -962,7 +970,7 @@ out:
*/
int migrate_pages(struct list_head *from,
new_page_t get_new_page, unsigned long private, bool offlining,
- bool sync)
+ enum migrate_mode mode)
{
int retry = 1;
int nr_failed = 0;
@@ -983,7 +991,7 @@ int migrate_pages(struct list_head *from,
rc = unmap_and_move(get_new_page, private,
page, pass > 2, offlining,
- sync);
+ mode);
switch(rc) {
case -ENOMEM:
@@ -1013,7 +1021,7 @@ out:
int migrate_huge_pages(struct list_head *from,
new_page_t get_new_page, unsigned long private, bool offlining,
- bool sync)
+ enum migrate_mode mode)
{
int retry = 1;
int nr_failed = 0;
@@ -1030,7 +1038,7 @@ int migrate_huge_pages(struct list_head *from,
rc = unmap_and_move_huge_page(get_new_page,
private, page, pass > 2, offlining,
- sync);
+ mode);
switch(rc) {
case -ENOMEM:
@@ -1159,7 +1167,7 @@ set_status:
err = 0;
if (!list_empty(&pagelist)) {
err = migrate_pages(&pagelist, new_page_node,
- (unsigned long)pm, 0, true);
+ (unsigned long)pm, 0, MIGRATE_SYNC);
if (err)
putback_lru_pages(&pagelist);
}
--
1.7.3.4
* Re: [PATCH 7/7] mm: compaction: Introduce sync-light migration for use by compaction
From: Shaohua Li @ 2011-11-22 6:56 UTC
To: Mel Gorman
Cc: Linux-MM, Andrea Arcangeli, Minchan Kim, Jan Kara, Andy Isaacson,
Johannes Weiner, Rik van Riel, Nai Xia, LKML
On Tue, 2011-11-22 at 02:36 +0800, Mel Gorman wrote:
> This patch adds a lightweight sync migrate operation MIGRATE_SYNC_LIGHT
> mode that avoids writing back pages to backing storage. Async
> compaction maps to MIGRATE_ASYNC while sync compaction maps to
> MIGRATE_SYNC_LIGHT. For other migrate_pages users such as memory
> hotplug, MIGRATE_SYNC is used.
>
> This avoids sync compaction stalling for an excessive length of time,
> particularly when copying files to a USB stick where there might be
> a large number of dirty pages backed by a filesystem that does not
> support ->writepages.
Hi,
from my understanding, with this, even writes
to /proc/sys/vm/compact_memory don't wait for pageout. Is this
intended?
On the other hand, MIGRATE_SYNC_LIGHT now waits on the page lock and
buffer lock, so it could wait on a page read. Page reads and pageouts
have the same latency, so why treat them differently?
Thanks,
Shaohua
* Re: [PATCH 7/7] mm: compaction: Introduce sync-light migration for use by compaction
From: Mel Gorman @ 2011-11-22 10:14 UTC
To: Shaohua Li
Cc: Linux-MM, Andrea Arcangeli, Minchan Kim, Jan Kara, Andy Isaacson,
Johannes Weiner, Rik van Riel, Nai Xia, LKML
On Tue, Nov 22, 2011 at 02:56:51PM +0800, Shaohua Li wrote:
> On Tue, 2011-11-22 at 02:36 +0800, Mel Gorman wrote:
> > This patch adds a lightweight sync migrate operation MIGRATE_SYNC_LIGHT
> > mode that avoids writing back pages to backing storage. Async
> > compaction maps to MIGRATE_ASYNC while sync compaction maps to
> > MIGRATE_SYNC_LIGHT. For other migrate_pages users such as memory
> > hotplug, MIGRATE_SYNC is used.
> >
> > This avoids sync compaction stalling for an excessive length of time,
> > particularly when copying files to a USB stick where there might be
> > a large number of dirty pages backed by a filesystem that does not
> > support ->writepages.
> Hi,
> from my understanding, with this, even writes
> to /proc/sys/vm/compact_memory doesn't wait for pageout, is this
> intended?
For the moment, yes, so that manual and automatic compaction behave
similarly. For example, if one runs a workload that periodically
tries to fault transparent hugepages and it steadily gets X huge
pages and running manual compaction gets more, it can indicate a bug
in how and when compaction runs. If manual compaction is significantly
different, the comparison is not as useful. I know this is a bit weak
as an example but right now there is no strong motivation for manual
compaction to use MIGRATE_SYNC.
> on the other hand, MIGRATE_SYNC_LIGHT now waits for pagelock and buffer
> lock, so could wait on page read. page read and page out have the same
> latency, why takes them different?
>
That's a very reasonable question.
To date, the stalls that were reported to be a problem were related to
heavy writing workloads. Workloads are naturally throttled on reads
but not necessarily on writes, and the IO scheduler prioritises sync
reads over writes, which contributes to keeping stalls due to page
reads low. In my own tests, there have been no significant stalls
due to waiting on page reads. I accept this may be because the stall
threshold I record is too low.
Still, I double checked an old USB copy based test to see what the
compaction-related stalls really were.
58 seconds waiting on PageWriteback
22 seconds waiting on generic_make_request calling ->writepage
These are total times; each stall was about 2-5 seconds and these are
very rough estimates. There were no other sources of stalls that had
compaction in the stacktrace. I'm rerunning to gather more accurate
stall times with a workload similar to Andrea's and will see if page
reads crop up as a major source of stalls.
--
Mel Gorman
SUSE Labs
* Re: [PATCH 7/7] mm: compaction: Introduce sync-light migration for use by compaction
From: Jan Kara @ 2011-11-22 11:54 UTC
To: Mel Gorman
Cc: Shaohua Li, Linux-MM, Andrea Arcangeli, Minchan Kim, Jan Kara,
Andy Isaacson, Johannes Weiner, Rik van Riel, Nai Xia, LKML
On Tue 22-11-11 10:14:51, Mel Gorman wrote:
> On Tue, Nov 22, 2011 at 02:56:51PM +0800, Shaohua Li wrote:
> > On Tue, 2011-11-22 at 02:36 +0800, Mel Gorman wrote:
> > on the other hand, MIGRATE_SYNC_LIGHT now waits for pagelock and buffer
> > lock, so could wait on page read. page read and page out have the same
> > latency, why takes them different?
> >
>
> That's a very reasonable question.
>
> To date, the stalls that were reported to be a problem were related to
> heavy writing workloads. Workloads are naturally throttled on reads
> but not necessarily on writes and the IO scheduler priorities sync
> reads over writes which contributes to keeping stalls due to page
> reads low. In my own tests, there have been no significant stalls
> due to waiting on page reads. I accept this may be because the stall
> threshold I record is too low.
>
> Still, I double checked an old USB copy based test to see what the
> compaction-related stalls really were.
>
> 58 seconds waiting on PageWriteback
> 22 seconds waiting on generic_make_request calling ->writepage
>
> These are total times, each stall was about 2-5 seconds and very rough
> estimates. There were no other sources of stalls that had compaction
> in the stacktrace I'm rerunning to gather more accurate stall times
> and for a workload similar to Andrea's and will see if page reads
> crop up as a major source of stalls.
OK, but the fact that reads do not stall may pretty much depend on the
behavior of the underlying IO scheduler, and we probably don't want to
rely on its behavior too closely. So if you are going to treat reads in
a special way, check with the NOOP or DEADLINE IO schedulers that
read stalls are not a problem with them as well.
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
* Re: [PATCH 7/7] mm: compaction: Introduce sync-light migration for use by compaction
From: Nai Xia @ 2011-11-22 13:59 UTC
To: Jan Kara
Cc: Mel Gorman, Shaohua Li, Linux-MM, Andrea Arcangeli, Minchan Kim,
Andy Isaacson, Johannes Weiner, Rik van Riel, LKML
On Tuesday 22 November 2011 19:54:27 Jan Kara wrote:
> On Tue 22-11-11 10:14:51, Mel Gorman wrote:
> > On Tue, Nov 22, 2011 at 02:56:51PM +0800, Shaohua Li wrote:
> > > On Tue, 2011-11-22 at 02:36 +0800, Mel Gorman wrote:
> > > on the other hand, MIGRATE_SYNC_LIGHT now waits for pagelock and buffer
> > > lock, so could wait on page read. page read and page out have the same
> > > latency, why takes them different?
> > >
> >
> > That's a very reasonable question.
> >
> > To date, the stalls that were reported to be a problem were related to
> > heavy writing workloads. Workloads are naturally throttled on reads
> > but not necessarily on writes and the IO scheduler priorities sync
> > reads over writes which contributes to keeping stalls due to page
> > reads low. In my own tests, there have been no significant stalls
> > due to waiting on page reads. I accept this may be because the stall
> > threshold I record is too low.
> >
> > Still, I double checked an old USB copy based test to see what the
> > compaction-related stalls really were.
> >
> > 58 seconds waiting on PageWriteback
> > 22 seconds waiting on generic_make_request calling ->writepage
> >
> > These are total times; each stall was about 2-5 seconds, and these are
> > very rough estimates. There were no other sources of stalls that had
> > compaction in the stacktrace. I'm rerunning to gather more accurate
> > stall times for a workload similar to Andrea's and will see if page
> > reads crop up as a major source of stalls.
> OK, but the fact that reads do not stall may depend quite heavily on the
> behavior of the underlying IO scheduler, and we probably don't want to rely
> on its behavior too closely. So if you are going to treat reads in a
> special way, check with the NOOP or DEADLINE IO schedulers that read
> stalls are not a problem with them as well.
Compared to the IO scheduler, I actually expect this behavior to be more
related to these two facts:
1) Due to the IO direction, most pages to be read are still on disk,
while most pages to be written are in memory.
2) And as Mel explained, reads tend to be sync and writes tend to be async,
so decent IO schedulers, however they differ from each other, should
almost all agree on favoring reads over writes.
So that amounts to the following quantity, which is what matters for the
statistical stall time of compaction:
page_nr * average_stall_window_time
where average_stall_window_time is the window during which a page goes from
NotUptodate --> Uptodate or Dirty --> Clean, and page_nr is the
number of pages in the stall window for read or write.
So for general cases,
fact 1) may ensure that page_nr is smaller for reads, while
fact 2) may ensure the same for average_stall_window_time.
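Spelling that estimate out per IO direction (my notation; it adds nothing
beyond the argument above):

        T_{stall} \approx N_{read} \cdot t_{read} + N_{write} \cdot t_{write}

Fact 1) keeps N_{read} small and fact 2) keeps t_{read} small, so the read
term should normally be the minor one.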
I am not sure this will be the case for all workloads;
I don't know if Mel has tested large readahead workloads, which
have more async read IO and less writeback.
But theoretically I expect things are not that bad even for large
readahead, because readahead is triggered by the readahead tag in
linear order, which means that for a process generating readahead IO,
its speed is still somewhat governed by the read IO speed, while
a process writing to a file-mapped memory area may well
exceed the write speed of its backing store.
Aside from that, I think the relation between page locking and
page reads is not 1-to-1; in other words, there may be quite a lot of
transient page locking caused by mmap followed by page faults into
already up-to-date pages requiring no IO at all. For these
transient page lockings I think it's reasonable to have a light
wait.
Please correct me if something is wrong in my reasoning. :)
Thanks
Nai
>
> Honza
>
* Re: [PATCH 7/7] mm: compaction: Introduce sync-light migration for use by compaction
2011-11-22 13:59 ` Nai Xia
@ 2011-11-22 15:07 ` Nai Xia
2011-11-22 19:13 ` Jan Kara
1 sibling, 0 replies; 33+ messages in thread
From: Nai Xia @ 2011-11-22 15:07 UTC (permalink / raw)
To: Jan Kara
Cc: Mel Gorman, Shaohua Li, Linux-MM, Andrea Arcangeli, Minchan Kim,
Andy Isaacson, Johannes Weiner, Rik van Riel, LKML
On Tue, Nov 22, 2011 at 9:59 PM, Nai Xia <nai.xia@gmail.com> wrote:
> On Tuesday 22 November 2011 19:54:27 Jan Kara wrote:
>> On Tue 22-11-11 10:14:51, Mel Gorman wrote:
>> > On Tue, Nov 22, 2011 at 02:56:51PM +0800, Shaohua Li wrote:
>> > > On Tue, 2011-11-22 at 02:36 +0800, Mel Gorman wrote:
>> > > On the other hand, MIGRATE_SYNC_LIGHT now waits for the page lock and
>> > > buffer lock, so it could wait on a page read. Page reads and page-outs
>> > > have the same latency, so why treat them differently?
>> > >
>> >
>> > That's a very reasonable question.
>> >
>> > To date, the stalls that were reported to be a problem were related to
>> > heavy writing workloads. Workloads are naturally throttled on reads
>> > but not necessarily on writes, and the IO scheduler prioritizes sync
>> > reads over writes, which contributes to keeping stalls due to page
>> > reads low. In my own tests, there have been no significant stalls
>> > due to waiting on page reads. I accept this may be because the stall
>> > threshold I record is too low.
>> >
>> > Still, I double checked an old USB copy based test to see what the
>> > compaction-related stalls really were.
>> >
>> > 58 seconds waiting on PageWriteback
>> > 22 seconds waiting on generic_make_request calling ->writepage
>> >
>> > These are total times; each stall was about 2-5 seconds, and these are
>> > very rough estimates. There were no other sources of stalls that had
>> > compaction in the stacktrace. I'm rerunning to gather more accurate
>> > stall times for a workload similar to Andrea's and will see if page
>> > reads crop up as a major source of stalls.
>> OK, but the fact that reads do not stall may depend quite heavily on the
>> behavior of the underlying IO scheduler, and we probably don't want to rely
>> on its behavior too closely. So if you are going to treat reads in a
>> special way, check with the NOOP or DEADLINE IO schedulers that read
>> stalls are not a problem with them as well.
>
> Compared to the IO scheduler, I actually expect this behavior to be more
> related to these two facts:
>
> 1) Due to the IO direction, most pages to be read are still on disk,
> while most pages to be written are in memory.
>
> 2) And as Mel explained, reads tend to be sync and writes tend to be async,
> so decent IO schedulers, however they differ from each other, should
> almost all agree on favoring reads over writes.
>
> So that amounts to the following quantity, which is what matters for the
> statistical stall time of compaction:
>
> page_nr * average_stall_window_time
>
> where average_stall_window_time is the window during which a page goes from
> NotUptodate --> Uptodate or Dirty --> Clean, and page_nr is the
> number of pages in the stall window for read or write.
>
> So for general cases,
> fact 1) may ensure that page_nr is smaller for reads, while
> fact 2) may ensure the same for average_stall_window_time.
>
> I am not sure this will be the case for all workloads;
> I don't know if Mel has tested large readahead workloads, which
> have more async read IO and less writeback.
>
> But theoretically I expect things are not that bad even for large
> readahead, because readahead is triggered by the readahead tag in
> linear order, which means that for a process generating readahead IO,
> its speed is still somewhat governed by the read IO speed, while
> a process writing to a file-mapped memory area may well
> exceed the write speed of its backing store.
>
>
> Aside from that, I think the relation between page locking and
> page reads is not 1-to-1; in other words, there may be quite a lot of
> transient page locking caused by mmap followed by page faults into
> already up-to-date pages requiring no IO at all. For these
> transient page lockings I think it's reasonable to have a light
> wait.
BTW, I also suggest that maybe an early PageUptodate test
before page locking could further fine-grain the sync mode; it
can statistically (not 100% reliably, of course, for an early lookup)
distinguish transient page locking from read locking.
Nai
>
> Please correct me if something is wrong in my reasoning. :)
>
>
> Thanks
>
> Nai
>
>>
>> Honza
>>
>
* Re: [PATCH 1/7] mm: compaction: Allow compaction to isolate dirty pages
2011-11-21 18:36 ` [PATCH 1/7] mm: compaction: Allow compaction to isolate dirty pages Mel Gorman
@ 2011-11-22 16:58 ` Minchan Kim
0 siblings, 0 replies; 33+ messages in thread
From: Minchan Kim @ 2011-11-22 16:58 UTC (permalink / raw)
To: Mel Gorman
Cc: Linux-MM, Andrea Arcangeli, Jan Kara, Andy Isaacson,
Johannes Weiner, Rik van Riel, Nai Xia, LKML
On Mon, Nov 21, 2011 at 06:36:42PM +0000, Mel Gorman wrote:
> Commit [39deaf85: mm: compaction: make isolate_lru_page() filter-aware]
> noted that compaction does not migrate dirty or writeback pages and
> that it was meaningless to pick the page and re-add it to the LRU list.
>
> What was missed during review is that asynchronous migration moves
> dirty pages if their ->migratepage callback is migrate_page() because
> these can be moved without blocking. This potentially impacted
> hugepage allocation success rates by a factor depending on how many
> dirty pages are in the system.
>
> This patch partially reverts 39deaf85 to allow migration to isolate
> dirty pages again. This increases how much compaction disrupts the
> LRU but that is addressed later in the series.
>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
> Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Mel, thanks for the fix.
--
Kind regards,
Minchan Kim
* Re: [PATCH 2/7] mm: compaction: Use synchronous compaction for /proc/sys/vm/compact_memory
2011-11-21 18:36 ` [PATCH 2/7] mm: compaction: Use synchronous compaction for /proc/sys/vm/compact_memory Mel Gorman
@ 2011-11-22 17:00 ` Minchan Kim
0 siblings, 0 replies; 33+ messages in thread
From: Minchan Kim @ 2011-11-22 17:00 UTC (permalink / raw)
To: Mel Gorman
Cc: Linux-MM, Andrea Arcangeli, Jan Kara, Andy Isaacson,
Johannes Weiner, Rik van Riel, Nai Xia, LKML
On Mon, Nov 21, 2011 at 06:36:43PM +0000, Mel Gorman wrote:
> When asynchronous compaction was introduced, the
> /proc/sys/vm/compact_memory handler should have been updated to always
> use synchronous compaction. This did not happen, so this patch addresses
> it. The assumption is if a user writes to /proc/sys/vm/compact_memory,
> they are willing for that process to stall.
>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
--
Kind regards,
Minchan Kim
* Re: [PATCH 3/7] mm: check if we isolated a compound page during lumpy scan
2011-11-21 18:36 ` [PATCH 3/7] mm: check if we isolated a compound page during lumpy scan Mel Gorman
@ 2011-11-22 17:05 ` Minchan Kim
0 siblings, 0 replies; 33+ messages in thread
From: Minchan Kim @ 2011-11-22 17:05 UTC (permalink / raw)
To: Mel Gorman
Cc: Linux-MM, Andrea Arcangeli, Jan Kara, Andy Isaacson,
Johannes Weiner, Rik van Riel, Nai Xia, LKML
On Mon, Nov 21, 2011 at 06:36:44PM +0000, Mel Gorman wrote:
> From: Andrea Arcangeli <aarcange@redhat.com>
>
> Properly take into account if we isolated a compound page during the
> lumpy scan in reclaim and skip over the tail pages when encountered.
> This corrects the values given to the tracepoint for number of lumpy
> pages isolated and will avoid breaking the loop early if compound
> pages smaller than the requested allocation size are encountered.
>
> [mgorman@suse.de: Updated changelog]
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
I would like to see the lumpy reclaim parts removed from vmscan.c.
It gets more complicated day by day. :(
Having said that, it looks good to me now.
--
Kind regards,
Minchan Kim
* Re: [PATCH 5/7] mm: compaction: make isolate_lru_page() filter-aware again
2011-11-21 18:36 ` [PATCH 5/7] mm: compaction: make isolate_lru_page() filter-aware again Mel Gorman
@ 2011-11-22 17:30 ` Minchan Kim
2011-11-23 9:19 ` Mel Gorman
0 siblings, 1 reply; 33+ messages in thread
From: Minchan Kim @ 2011-11-22 17:30 UTC (permalink / raw)
To: Mel Gorman
Cc: Linux-MM, Andrea Arcangeli, Jan Kara, Andy Isaacson,
Johannes Weiner, Rik van Riel, Nai Xia, LKML
On Mon, Nov 21, 2011 at 06:36:46PM +0000, Mel Gorman wrote:
> Commit [39deaf85: mm: compaction: make isolate_lru_page() filter-aware]
> noted that compaction does not migrate dirty or writeback pages and
> that it was meaningless to pick the page and re-add it to the LRU list.
> This had to be partially reverted because some dirty pages can be
> migrated by compaction without blocking.
>
> This patch updates "mm: compaction: make isolate_lru_page" by skipping
> over pages that migration has no possibility of migrating to minimise
> LRU disruption.
>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> ---
> include/linux/mmzone.h | 2 ++
> mm/compaction.c | 3 +++
> mm/vmscan.c | 36 ++++++++++++++++++++++++++++++++++--
> 3 files changed, 39 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 188cb2f..ac5b522 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -173,6 +173,8 @@ static inline int is_unevictable_lru(enum lru_list l)
> #define ISOLATE_CLEAN ((__force isolate_mode_t)0x4)
> /* Isolate unmapped file */
> #define ISOLATE_UNMAPPED ((__force isolate_mode_t)0x8)
> +/* Isolate for asynchronous migration */
> +#define ISOLATE_ASYNC_MIGRATE ((__force isolate_mode_t)0x10)
>
> /* LRU Isolation modes. */
> typedef unsigned __bitwise__ isolate_mode_t;
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 615502b..0379263 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -349,6 +349,9 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
> continue;
> }
>
> + if (!cc->sync)
> + mode |= ISOLATE_ASYNC_MIGRATE;
> +
> /* Try isolate the page */
> if (__isolate_lru_page(page, mode, 0) != 0)
> continue;
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 3421746..28df0ed 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1061,8 +1061,40 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode, int file)
>
> ret = -EBUSY;
>
> - if ((mode & ISOLATE_CLEAN) && (PageDirty(page) || PageWriteback(page)))
> - return ret;
> + /*
> + * To minimise LRU disruption, the caller can indicate that it only
> + * wants to isolate pages it will be able to operate on without
> + * blocking - clean pages for the most part.
> + *
> + * ISOLATE_CLEAN means that only clean pages should be isolated. This
> + * is used by reclaim when it cannot write to backing storage
> + *
> + * ISOLATE_ASYNC_MIGRATE is used to indicate that it only wants pages
> + * that it is possible to migrate without blocking with a ->migratepage
> + * handler
> + */
> + if (mode & (ISOLATE_CLEAN|ISOLATE_ASYNC_MIGRATE)) {
> + /* All the caller can do on PageWriteback is block */
> + if (PageWriteback(page))
> + return ret;
> +
> + if (PageDirty(page)) {
> + struct address_space *mapping;
> +
> + /* ISOLATE_CLEAN means only clean pages */
> + if (mode & ISOLATE_CLEAN)
> + return ret;
> +
> + /*
> + * Only the ->migratepage callback knows if a dirty
> + * page can be migrated without blocking. Skip the
> + * page unless there is a ->migratepage callback.
> + */
> + mapping = page_mapping(page);
> + if (!mapping || !mapping->a_ops->migratepage)
I didn't review 4/7 carefully yet.
In the case where page_mapping() is NULL, move_to_new_page() calls
migrate_page(), which is a non-blocking function. So I guess the page
could be migrated without blocking.
--
Kind regards,
Minchan Kim
* Re: [PATCH 6/7] mm: page allocator: Limit when direct reclaim is used when compaction is deferred
2011-11-21 18:36 ` [PATCH 6/7] mm: page allocator: Limit when direct reclaim is used when compaction is deferred Mel Gorman
@ 2011-11-22 17:50 ` Minchan Kim
0 siblings, 0 replies; 33+ messages in thread
From: Minchan Kim @ 2011-11-22 17:50 UTC (permalink / raw)
To: Mel Gorman
Cc: Linux-MM, Andrea Arcangeli, Jan Kara, Andy Isaacson,
Johannes Weiner, Rik van Riel, Nai Xia, LKML
On Mon, Nov 21, 2011 at 06:36:47PM +0000, Mel Gorman wrote:
> If compaction is deferred, we enter direct reclaim to try to reclaim the
> pages that way. For small high-orders, this has a reasonable chance
> of success. However, if the caller has specified __GFP_NO_KSWAPD to
> limit the disruption to the system, it makes more sense to fail the
> allocation rather than stall the caller in direct reclaim. This patch
> will skip direct reclaim if compaction is deferred and the caller
> specifies __GFP_NO_KSWAPD.
>
> Async compaction only considers a subset of pages so it is possible for
> compaction to be deferred prematurely and not enter direct reclaim even
> in cases where it should. To compensate for this, this patch also defers
> compaction only if sync compaction failed.
>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Minchan Kim <minchan.kim@gmail.com>
It does make sense to me.
--
Kind regards,
Minchan Kim
* Re: [PATCH 7/7] mm: compaction: Introduce sync-light migration for use by compaction
2011-11-22 13:59 ` Nai Xia
2011-11-22 15:07 ` Nai Xia
@ 2011-11-22 19:13 ` Jan Kara
2011-11-22 22:44 ` Nai Xia
1 sibling, 1 reply; 33+ messages in thread
From: Jan Kara @ 2011-11-22 19:13 UTC (permalink / raw)
To: Nai Xia
Cc: Jan Kara, Mel Gorman, Shaohua Li, Linux-MM, Andrea Arcangeli,
Minchan Kim, Andy Isaacson, Johannes Weiner, Rik van Riel, LKML
On Tue 22-11-11 21:59:24, Nai Xia wrote:
> On Tuesday 22 November 2011 19:54:27 Jan Kara wrote:
> > On Tue 22-11-11 10:14:51, Mel Gorman wrote:
> > > On Tue, Nov 22, 2011 at 02:56:51PM +0800, Shaohua Li wrote:
> > > > On Tue, 2011-11-22 at 02:36 +0800, Mel Gorman wrote:
> > > > On the other hand, MIGRATE_SYNC_LIGHT now waits for the page lock and
> > > > buffer lock, so it could wait on a page read. Page reads and page-outs
> > > > have the same latency, so why treat them differently?
> > > >
> > >
> > > That's a very reasonable question.
> > >
> > > To date, the stalls that were reported to be a problem were related to
> > > heavy writing workloads. Workloads are naturally throttled on reads
> > > but not necessarily on writes, and the IO scheduler prioritizes sync
> > > reads over writes, which contributes to keeping stalls due to page
> > > reads low. In my own tests, there have been no significant stalls
> > > due to waiting on page reads. I accept this may be because the stall
> > > threshold I record is too low.
> > >
> > > Still, I double checked an old USB copy based test to see what the
> > > compaction-related stalls really were.
> > >
> > > 58 seconds waiting on PageWriteback
> > > 22 seconds waiting on generic_make_request calling ->writepage
> > >
> > > These are total times; each stall was about 2-5 seconds, and these are
> > > very rough estimates. There were no other sources of stalls that had
> > > compaction in the stacktrace. I'm rerunning to gather more accurate
> > > stall times for a workload similar to Andrea's and will see if page
> > > reads crop up as a major source of stalls.
> > OK, but the fact that reads do not stall may depend quite heavily on the
> > behavior of the underlying IO scheduler, and we probably don't want to rely
> > on its behavior too closely. So if you are going to treat reads in a
> > special way, check with the NOOP or DEADLINE IO schedulers that read
> > stalls are not a problem with them as well.
>
> Compared to the IO scheduler, I actually expect this behavior to be more
> related to these two facts:
>
> 1) Due to the IO direction, most pages to be read are still on disk,
> while most pages to be written are in memory.
>
> 2) And as Mel explained, reads tend to be sync and writes tend to be async,
> so decent IO schedulers, however they differ from each other, should
> almost all agree on favoring reads over writes.
This is not true. CFQ heavily prefers read IO over write IO, the deadline
scheduler slightly prefers reads, and the noop IO scheduler has no preference.
As a result, a page which is read from disk is going to be locked for a
shorter time with the CFQ scheduler than with the NOOP scheduler on average.
> So that amounts to the following quantity, which is what matters for the
> statistical stall time of compaction:
>
> page_nr * average_stall_window_time
>
> where average_stall_window_time is the window during which a page goes from
> NotUptodate --> Uptodate or Dirty --> Clean, and page_nr is the
> number of pages in the stall window for read or write.
>
> So for general cases,
> fact 1) may ensure that page_nr is smaller for reads, while
> fact 2) may ensure the same for average_stall_window_time.
Well, page_nr really depends on the load. If the workload is only reads,
clearly the number of read pages is going to be higher than the number of
written pages. Once the workload does heavy writing, I agree the number of
pages under writeback is likely going to be higher.
> I am not sure this will be the case for all workloads;
> I don't know if Mel has tested large readahead workloads, which
> have more async read IO and less writeback.
>
> But theoretically I expect things are not that bad even for large
> readahead, because readahead is triggered by the readahead tag in
> linear order, which means that for a process generating readahead IO,
> its speed is still somewhat governed by the read IO speed, while
> a process writing to a file-mapped memory area may well
> exceed the write speed of its backing store.
>
>
> Aside from that, I think the relation between page locking and
> page reads is not 1-to-1; in other words, there may be quite a lot of
> transient page locking caused by mmap followed by page faults into
> already up-to-date pages requiring no IO at all. For these
> transient page lockings I think it's reasonable to have a light
> wait.
Definitely there are other lockings than for reads. E.g. to write a page,
we lock it first, submit the IO (which can actually block waiting for a
request to get freed), set PageWriteback, and unlock the page. And there
are more transient ones like you mention above...
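Schematically, that write path looks like this minimal sketch (order of
events only; the real path goes through ->writepage and the block layer):

#include <linux/pagemap.h>      /* lock_page(), unlock_page() */
#include <linux/page-flags.h>   /* set_page_writeback() */

/* Sketch of the locking sequence described above, not actual kernel code. */
static void writeback_locking_sketch(struct page *page)
{
        lock_page(page);                /* 1. page lock taken */
        /* 2. IO is submitted here; this can itself block waiting
         *    for a free block-layer request */
        set_page_writeback(page);       /* 3. PageWriteback is set */
        unlock_page(page);              /* 4. page lock dropped; IO completion
                                         *    later calls end_page_writeback() */
}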
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
* Re: [PATCH 7/7] mm: compaction: Introduce sync-light migration for use by compaction
2011-11-22 19:13 ` Jan Kara
@ 2011-11-22 22:44 ` Nai Xia
2011-11-23 11:39 ` Jan Kara
0 siblings, 1 reply; 33+ messages in thread
From: Nai Xia @ 2011-11-22 22:44 UTC (permalink / raw)
To: Jan Kara
Cc: Mel Gorman, Shaohua Li, Linux-MM, Andrea Arcangeli, Minchan Kim,
Andy Isaacson, Johannes Weiner, Rik van Riel, LKML
On Wed, Nov 23, 2011 at 3:13 AM, Jan Kara <jack@suse.cz> wrote:
> On Tue 22-11-11 21:59:24, Nai Xia wrote:
>> On Tuesday 22 November 2011 19:54:27 Jan Kara wrote:
>> > On Tue 22-11-11 10:14:51, Mel Gorman wrote:
>> > > On Tue, Nov 22, 2011 at 02:56:51PM +0800, Shaohua Li wrote:
>> > > > On Tue, 2011-11-22 at 02:36 +0800, Mel Gorman wrote:
>> > > > On the other hand, MIGRATE_SYNC_LIGHT now waits for the page lock and
>> > > > buffer lock, so it could wait on a page read. Page reads and page-outs
>> > > > have the same latency, so why treat them differently?
>> > > >
>> > >
>> > > That's a very reasonable question.
>> > >
>> > > To date, the stalls that were reported to be a problem were related to
>> > > heavy writing workloads. Workloads are naturally throttled on reads
>> > > but not necessarily on writes, and the IO scheduler prioritizes sync
>> > > reads over writes, which contributes to keeping stalls due to page
>> > > reads low. In my own tests, there have been no significant stalls
>> > > due to waiting on page reads. I accept this may be because the stall
>> > > threshold I record is too low.
>> > >
>> > > Still, I double checked an old USB copy based test to see what the
>> > > compaction-related stalls really were.
>> > >
>> > > 58 seconds waiting on PageWriteback
>> > > 22 seconds waiting on generic_make_request calling ->writepage
>> > >
>> > > These are total times; each stall was about 2-5 seconds, and these are
>> > > very rough estimates. There were no other sources of stalls that had
>> > > compaction in the stacktrace. I'm rerunning to gather more accurate
>> > > stall times for a workload similar to Andrea's and will see if page
>> > > reads crop up as a major source of stalls.
>> > OK, but the fact that reads do not stall may depend quite heavily on the
>> > behavior of the underlying IO scheduler, and we probably don't want to rely
>> > on its behavior too closely. So if you are going to treat reads in a
>> > special way, check with the NOOP or DEADLINE IO schedulers that read
>> > stalls are not a problem with them as well.
>>
>> Compared to the IO scheduler, I actually expect this behavior to be more
>> related to these two facts:
>>
>> 1) Due to the IO direction, most pages to be read are still on disk,
>> while most pages to be written are in memory.
>>
>> 2) And as Mel explained, reads tend to be sync and writes tend to be async,
>> so decent IO schedulers, however they differ from each other, should
>> almost all agree on favoring reads over writes.
> This is not true. CFQ heavily prefers read IO over write IO, the deadline
> scheduler slightly prefers reads, and the noop IO scheduler has no preference.
> As a result, a page which is read from disk is going to be locked for a
> shorter time with the CFQ scheduler than with the NOOP scheduler on average.
I just meant that optimized schedulers, whether "slightly" or "heavily",
agree on preferring reads over writes...
But well, I am really not sure how slight that preference can be;
maybe it is not enough to make any difference.
>
>> So that amounts to the following quantity, which is what matters for the
>> statistical stall time of compaction:
>>
>> page_nr * average_stall_window_time
>>
>> where average_stall_window_time is the window during which a page goes from
>> NotUptodate --> Uptodate or Dirty --> Clean, and page_nr is the
>> number of pages in the stall window for read or write.
>>
>> So for general cases,
>> fact 1) may ensure that page_nr is smaller for reads, while
>> fact 2) may ensure the same for average_stall_window_time.
> Well, page_nr really depends on the load. If the workload is only reads,
> clearly the number of read pages is going to be higher than the number of
> written pages. Once the workload does heavy writing, I agree the number of
> pages under writeback is likely going to be higher.
Think of a process A that linearly scans a 100MB mapped-file area
for reading, and another process B that linearly writes to a same-sized area.
If there is no readahead, only *one* read page is in the stall window
in memory at a time. However, 100MB of dirty pages can be held in memory
waiting to be written, which may stall compaction in fallback_migrate_page().
Even with buffer_migrate_page() these pages are much more likely to get
locked by other operations, such as the IO submission you mentioned, etc.
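For scale (illustrative numbers only, assuming 4KB pages): 100MB of dirty
pages is 100MB / 4KB = 25,600 pages sitting in the Dirty --> Clean window,
versus a single page in the NotUptodate --> Uptodate window.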
I was not sure about readahead, of course; I only theoretically
expected that it is still not comparable to the totally async
write behavior.
>
>> I am not sure this will be the case for all workloads;
>> I don't know if Mel has tested large readahead workloads, which
>> have more async read IO and less writeback.
>>
>> But theoretically I expect things are not that bad even for large
>> readahead, because readahead is triggered by the readahead tag in
>> linear order, which means that for a process generating readahead IO,
>> its speed is still somewhat governed by the read IO speed, while
>> a process writing to a file-mapped memory area may well
>> exceed the write speed of its backing store.
>>
>>
>> Aside from that, I think the relation between page locking and
>> page reads is not 1-to-1; in other words, there may be quite a lot of
>> transient page locking caused by mmap followed by page faults into
>> already up-to-date pages requiring no IO at all. For these
>> transient page lockings I think it's reasonable to have a light
>> wait.
> Definitely there are other lockings than for reads. E.g. to write a page,
> we lock it first, submit the IO (which can actually block waiting for a
> request to get freed), set PageWriteback, and unlock the page. And there
> are more transient ones like you mention above...
Yes, you are right.
But I think we were talking about distinguishing page locking from page read
IO?
Well, I might also suggest doing an early dirty test before taking the
lock... but I expect that a page being not uptodate is a much stronger
indication that we are going to block for IO on the following page lock.
The dirty test is not that strong. Do you agree?
Nai
>
> Honza
> --
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR
>
* Re: [PATCH 7/7] mm: compaction: Introduce sync-light migration for use by compaction
2011-11-22 6:56 ` Shaohua Li
2011-11-22 10:14 ` Mel Gorman
@ 2011-11-23 2:01 ` Nai Xia
2011-11-23 2:25 ` Shaohua Li
2011-11-23 11:00 ` Mel Gorman
1 sibling, 2 replies; 33+ messages in thread
From: Nai Xia @ 2011-11-23 2:01 UTC (permalink / raw)
To: Shaohua Li
Cc: Mel Gorman, Linux-MM, Andrea Arcangeli, Minchan Kim, Jan Kara,
Andy Isaacson, Johannes Weiner, Rik van Riel, LKML
On Tue, Nov 22, 2011 at 2:56 PM, Shaohua Li <shaohua.li@intel.com> wrote:
> On Tue, 2011-11-22 at 02:36 +0800, Mel Gorman wrote:
> >> This patch adds a lightweight sync migration mode, MIGRATE_SYNC_LIGHT,
> >> that avoids writing back pages to backing storage. Async
>> compaction maps to MIGRATE_ASYNC while sync compaction maps to
>> MIGRATE_SYNC_LIGHT. For other migrate_pages users such as memory
>> hotplug, MIGRATE_SYNC is used.
>>
>> This avoids sync compaction stalling for an excessive length of time,
>> particularly when copying files to a USB stick where there might be
>> a large number of dirty pages backed by a filesystem that does not
>> support ->writepages.
> Hi,
> > from my understanding, with this, even writes
> > to /proc/sys/vm/compact_memory don't wait for pageout; is this
> > intended?
> > On the other hand, MIGRATE_SYNC_LIGHT now waits for the page lock and
> > buffer lock, so it could wait on a page read. Page reads and page-outs
> > have the same latency, so why treat them differently?
So for the problem you raised, I think my suggestion to Mel is to adopt the
following logic:
if (!trylock_page(page) && !PageUptodate(page))
        we are quite likely to block on a read, so we
        depend on yet another MIGRATE_SYNC mode to decide
        whether we really want to lock_page() and wait for this IO.
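As a rough sketch of the idea (hypothetical and untested; the helper name is
invented and the mode names follow this series):

#include <linux/pagemap.h>      /* PageUptodate() */
#include <linux/migrate.h>      /* migrate mode definitions, per this series */

/*
 * Hypothetical helper: decide whether migration should sleep in
 * lock_page() for an already-locked page. A locked page that is not
 * yet uptodate is most likely waiting on a read, so only the heaviest
 * sync mode would wait for it.
 */
static bool may_wait_for_page_lock(struct page *page, enum migrate_mode mode)
{
        if (mode == MIGRATE_ASYNC)
                return false;                   /* never sleep on the lock */
        if (!PageUptodate(page))
                return mode == MIGRATE_SYNC;    /* probably a read in flight */
        return true;                            /* likely a transient holder */
}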
What do you think?
Thanks,
Nai
>
> Thanks,
> Shaohua
>
>
* Re: [PATCH 7/7] mm: compaction: Introduce sync-light migration for use by compaction
2011-11-23 2:01 ` Nai Xia
@ 2011-11-23 2:25 ` Shaohua Li
2011-11-23 11:00 ` Mel Gorman
1 sibling, 0 replies; 33+ messages in thread
From: Shaohua Li @ 2011-11-23 2:25 UTC (permalink / raw)
To: Nai Xia
Cc: Mel Gorman, Linux-MM, Andrea Arcangeli, Minchan Kim, Jan Kara,
Andy Isaacson, Johannes Weiner, Rik van Riel, LKML
On Wed, 2011-11-23 at 10:01 +0800, Nai Xia wrote:
> On Tue, Nov 22, 2011 at 2:56 PM, Shaohua Li <shaohua.li@intel.com> wrote:
> > On Tue, 2011-11-22 at 02:36 +0800, Mel Gorman wrote:
> >> This patch adds a lightweight sync migration mode, MIGRATE_SYNC_LIGHT,
> >> that avoids writing back pages to backing storage. Async
> >> compaction maps to MIGRATE_ASYNC while sync compaction maps to
> >> MIGRATE_SYNC_LIGHT. For other migrate_pages users such as memory
> >> hotplug, MIGRATE_SYNC is used.
> >>
> >> This avoids sync compaction stalling for an excessive length of time,
> >> particularly when copying files to a USB stick where there might be
> >> a large number of dirty pages backed by a filesystem that does not
> >> support ->writepages.
> > Hi,
> > from my understanding, with this, even writes
> > to /proc/sys/vm/compact_memory don't wait for pageout; is this
> > intended?
> > On the other hand, MIGRATE_SYNC_LIGHT now waits for the page lock and
> > buffer lock, so it could wait on a page read. Page reads and page-outs
> > have the same latency, so why treat them differently?
>
> So for the problem you raised, I think my suggestion to Mel is to adopt the
> following logic:
>
> if (!trylock_page(page) && !PageUptodate(page))
>         we are quite likely to block on a read, so we
>         depend on yet another MIGRATE_SYNC mode to decide
>         whether we really want to lock_page() and wait for this IO.
>
> What do you think?
I assume the PageUptodate() test goes with the check that does 'goto out'.
Yes, that looks reasonable to me. And we need a similar check for the
buffer_head lock.
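Something along these lines, perhaps (equally hypothetical and untested; the
helper name is invented):

#include <linux/buffer_head.h>
#include <linux/migrate.h>

/*
 * Hypothetical buffer_head analogue of the page check above: a locked
 * buffer that is not uptodate is most likely under read IO, so only
 * full MIGRATE_SYNC would wait for its lock.
 */
static bool may_wait_for_buffer_lock(struct buffer_head *bh,
                                     enum migrate_mode mode)
{
        if (mode == MIGRATE_ASYNC)
                return false;
        if (!buffer_uptodate(bh))
                return mode == MIGRATE_SYNC;
        return true;
}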
Thanks,
Shaohua
* Re: [PATCH 5/7] mm: compaction: make isolate_lru_page() filter-aware again
2011-11-22 17:30 ` Minchan Kim
@ 2011-11-23 9:19 ` Mel Gorman
0 siblings, 0 replies; 33+ messages in thread
From: Mel Gorman @ 2011-11-23 9:19 UTC (permalink / raw)
To: Minchan Kim
Cc: Linux-MM, Andrea Arcangeli, Jan Kara, Andy Isaacson,
Johannes Weiner, Rik van Riel, Nai Xia, LKML
On Wed, Nov 23, 2011 at 02:30:18AM +0900, Minchan Kim wrote:
> > <SNIP>
> > + /*
> > + * To minimise LRU disruption, the caller can indicate that it only
> > + * wants to isolate pages it will be able to operate on without
> > + * blocking - clean pages for the most part.
> > + *
> > + * ISOLATE_CLEAN means that only clean pages should be isolated. This
> > + * is used by reclaim when it cannot write to backing storage
> > + *
> > + * ISOLATE_ASYNC_MIGRATE is used to indicate that it only wants pages
> > + * that it is possible to migrate without blocking with a ->migratepage
> > + * handler
> > + */
> > + if (mode & (ISOLATE_CLEAN|ISOLATE_ASYNC_MIGRATE)) {
> > + /* All the caller can do on PageWriteback is block */
> > + if (PageWriteback(page))
> > + return ret;
> > +
> > + if (PageDirty(page)) {
> > + struct address_space *mapping;
> > +
> > + /* ISOLATE_CLEAN means only clean pages */
> > + if (mode & ISOLATE_CLEAN)
> > + return ret;
> > +
> > + /*
> > + * Only the ->migratepage callback knows if a dirty
> > + * page can be migrated without blocking. Skip the
> > + * page unless there is a ->migratepage callback.
> > + */
> > + mapping = page_mapping(page);
> > + if (!mapping || !mapping->a_ops->migratepage)
>
> I didn't review 4/7 carefully yet.
Thanks for reviewing the others.
> In the case where page_mapping() is NULL, move_to_new_page() calls
> migrate_page(), which is a non-blocking function. So I guess the page
> could be migrated without blocking.
>
Well spotted
	/*
	 * Only pages without mappings or that have a
	 * ->migratepage callback are possible to
	 * migrate without blocking
	 */
	mapping = page_mapping(page);
	if (mapping && !mapping->a_ops->migratepage)
		return ret;
--
Mel Gorman
SUSE Labs
* Re: [PATCH 7/7] mm: compaction: Introduce sync-light migration for use by compaction
2011-11-23 2:01 ` Nai Xia
2011-11-23 2:25 ` Shaohua Li
@ 2011-11-23 11:00 ` Mel Gorman
2011-11-23 12:51 ` Nai Xia
2011-11-23 13:05 ` Nai Xia
1 sibling, 2 replies; 33+ messages in thread
From: Mel Gorman @ 2011-11-23 11:00 UTC (permalink / raw)
To: Nai Xia
Cc: Shaohua Li, Linux-MM, Andrea Arcangeli, Minchan Kim, Jan Kara,
Andy Isaacson, Johannes Weiner, Rik van Riel, LKML
On Wed, Nov 23, 2011 at 10:01:53AM +0800, Nai Xia wrote:
> On Tue, Nov 22, 2011 at 2:56 PM, Shaohua Li <shaohua.li@intel.com> wrote:
> > On Tue, 2011-11-22 at 02:36 +0800, Mel Gorman wrote:
> >> This patch adds a lightweight sync migration mode, MIGRATE_SYNC_LIGHT,
> >> that avoids writing back pages to backing storage. Async
> >> compaction maps to MIGRATE_ASYNC while sync compaction maps to
> >> MIGRATE_SYNC_LIGHT. For other migrate_pages users such as memory
> >> hotplug, MIGRATE_SYNC is used.
> >>
> >> This avoids sync compaction stalling for an excessive length of time,
> >> particularly when copying files to a USB stick where there might be
> >> a large number of dirty pages backed by a filesystem that does not
> >> support ->writepages.
> > Hi,
> > from my understanding, with this, even writes
> > to /proc/sys/vm/compact_memory don't wait for pageout; is this
> > intended?
> > On the other hand, MIGRATE_SYNC_LIGHT now waits for the page lock and
> > buffer lock, so it could wait on a page read. Page reads and page-outs
> > have the same latency, so why treat them differently?
>
> So for the problem you raised, I think my suggestion to Mel is to adopt the
> following logic:
>
> if (!trylock_page(page) && !PageUptodate(page))
>         we are quite likely to block on a read, so we
>         depend on yet another MIGRATE_SYNC mode to decide
>         whether we really want to lock_page() and wait for this IO.
>
> What do you think?
>
Where are you adding this check?
If you mean in __unmap_and_move(), the check is unnecessary unless
another subsystem starts using sync-light compaction. With this series,
only direct compaction cares about MIGRATE_SYNC_LIGHT. If the page is
not up to date, it is also locked during the IO and unlocked after
setting Uptodate in the IO completion handler.
As the page is locked, compaction will fail trylock_page, do the
PF_MEMALLOC check and bail, because it is not safe for direct compaction
to call lock_page, as the comment in __unmap_and_move explains. This
should avoid the stall.
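For reference, the bail-out described above reads roughly like this in
__unmap_and_move() (paraphrased from the series; see the patch for the
full comment):

	if (!trylock_page(page)) {
		if (!force || mode == MIGRATE_ASYNC)
			goto out;
		/*
		 * A PF_MEMALLOC task (direct compaction) may already hold
		 * the lock on this page via in-flight buffered IO, so
		 * sleeping in lock_page() could deadlock. Bail out instead.
		 */
		if (current->flags & PF_MEMALLOC)
			goto out;
		lock_page(page);
	}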
Did I misunderstand your suggestion?
--
Mel Gorman
SUSE Labs
* Re: [PATCH 7/7] mm: compaction: Introduce sync-light migration for use by compaction
2011-11-22 22:44 ` Nai Xia
@ 2011-11-23 11:39 ` Jan Kara
2011-11-23 12:20 ` Nai Xia
0 siblings, 1 reply; 33+ messages in thread
From: Jan Kara @ 2011-11-23 11:39 UTC (permalink / raw)
To: Nai Xia
Cc: Jan Kara, Mel Gorman, Shaohua Li, Linux-MM, Andrea Arcangeli,
Minchan Kim, Andy Isaacson, Johannes Weiner, Rik van Riel, LKML
On Wed 23-11-11 06:44:23, Nai Xia wrote:
> >> So that amounts to the following quantity, which is what matters for the
> >> statistical stall time of compaction:
> >>
> >> page_nr * average_stall_window_time
> >>
> >> where average_stall_window_time is the window during which a page goes from
> >> NotUptodate --> Uptodate or Dirty --> Clean, and page_nr is the
> >> number of pages in the stall window for read or write.
> >>
> >> So for general cases,
> >> fact 1) may ensure that page_nr is smaller for reads, while
> >> fact 2) may ensure the same for average_stall_window_time.
> > Well, page_nr really depends on the load. If the workload is only reads,
> > clearly the number of read pages is going to be higher than the number of
> > written pages. Once the workload does heavy writing, I agree the number of
> > pages under writeback is likely going to be higher.
>
> Think of a process A that linearly scans a 100MB mapped-file area
> for reading, and another process B that linearly writes to a same-sized area.
> If there is no readahead, only *one* read page is in the stall window
> in memory at a time.
Yes, I understand this. But in a situation where there is *no* process
writing and a *hundred* processes reading, you clearly have more pages locked
for reading than for writing. All I wanted to say is that your broad
statement that the number of pages read from disk is lower than the number
of pages written is not true in general. It depends on the workload.
> However, 100MB of dirty pages can be held in memory
> waiting to be written, which may stall compaction in fallback_migrate_page().
> Even with buffer_migrate_page() these pages are much more likely to get
> locked by other operations, such as the IO submission you mentioned, etc.
>
> I was not sure about readahead, of course; I only theoretically
> expected that it is still not comparable to the totally async
> write behavior.
>
> >
> >> I am not sure this will be the case for all workloads;
> >> I don't know if Mel has tested large readahead workloads, which
> >> have more async read IO and less writeback.
> >>
> >> But theoretically I expect things are not that bad even for large
> >> readahead, because readahead is triggered by the readahead tag in
> >> linear order, which means that for a process generating readahead IO,
> >> its speed is still somewhat governed by the read IO speed, while
> >> a process writing to a file-mapped memory area may well
> >> exceed the write speed of its backing store.
> >>
> >>
> >> Aside from that, I think the relation between page locking and
> >> page reads is not 1-to-1; in other words, there may be quite a lot of
> >> transient page locking caused by mmap followed by page faults into
> >> already up-to-date pages requiring no IO at all. For these
> >> transient page lockings I think it's reasonable to have a light
> >> wait.
> > Definitely there are other lockings than for reads. E.g. to write a page,
> > we lock it first, submit the IO (which can actually block waiting for a
> > request to get freed), set PageWriteback, and unlock the page. And there
> > are more transient ones like you mention above...
>
> Yes, you are right.
> But I think we were talking about distinguishing page locking from page read
> IO?
>
> Well, I might also suggest doing an early dirty test before taking the
> lock... but I expect that a page being not uptodate is a much stronger
> indication that we are going to block for IO on the following page lock.
> The dirty test is not that strong. Do you agree?
Yes, I agree with this.
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
* Re: [PATCH 7/7] mm: compaction: Introduce sync-light migration for use by compaction
2011-11-23 11:39 ` Jan Kara
@ 2011-11-23 12:20 ` Nai Xia
0 siblings, 0 replies; 33+ messages in thread
From: Nai Xia @ 2011-11-23 12:20 UTC (permalink / raw)
To: Jan Kara
Cc: Mel Gorman, Shaohua Li, Linux-MM, Andrea Arcangeli, Minchan Kim,
Andy Isaacson, Johannes Weiner, Rik van Riel, LKML
On Wed, Nov 23, 2011 at 7:39 PM, Jan Kara <jack@suse.cz> wrote:
> On Wed 23-11-11 06:44:23, Nai Xia wrote:
>> >> So that amounts to the following quantity, which is what matters for the
>> >> statistical stall time of compaction:
>> >>
>> >> page_nr * average_stall_window_time
>> >>
>> >> where average_stall_window_time is the window during which a page goes from
>> >> NotUptodate --> Uptodate or Dirty --> Clean, and page_nr is the
>> >> number of pages in the stall window for read or write.
>> >>
>> >> So for general cases,
>> >> fact 1) may ensure that page_nr is smaller for reads, while
>> >> fact 2) may ensure the same for average_stall_window_time.
>> > Well, page_nr really depends on the load. If the workload is only reads,
>> > clearly the number of read pages is going to be higher than the number of
>> > written pages. Once the workload does heavy writing, I agree the number of
>> > pages under writeback is likely going to be higher.
>>
>> Think of a process A that linearly scans a 100MB mapped-file area
>> for reading, and another process B that linearly writes to a same-sized area.
>> If there is no readahead, only *one* read page is in the stall window
>> in memory at a time.
> Yes, I understand this. But in a situation where there is *no* process
> writing and a *hundred* processes reading, you clearly have more pages locked
> for reading than for writing. All I wanted to say is that your broad
> statement that the number of pages read from disk is lower than the number
> of pages written is not true in general. It depends on the workload.
OK, I agree with you here. I think I did not make my statement
about "general cases" very clear... I actually meant cases where reading is
comparable to writing. Yes, considering the variety of workloads, it's surely
workload dependent. Sorry for my vague statement :)
>
>> However, 100MB of dirty pages can be held in memory
>> waiting to be written, which may stall compaction in fallback_migrate_page().
>> Even with buffer_migrate_page() these pages are much more likely to get
>> locked by other operations, such as the IO submission you mentioned, etc.
>>
>> I was not sure about readahead, of course; I only theoretically
>> expected that it is still not comparable to the totally async
>> write behavior.
>>
>> >
>> >> I am not sure this will be the case for all workloads;
>> >> I don't know if Mel has tested large readahead workloads, which
>> >> have more async read IO and less writeback.
>> >>
>> >> But theoretically I expect things are not that bad even for large
>> >> readahead, because readahead is triggered by the readahead tag in
>> >> linear order, which means that for a process generating readahead IO,
>> >> its speed is still somewhat governed by the read IO speed, while
>> >> a process writing to a file-mapped memory area may well
>> >> exceed the write speed of its backing store.
>> >>
>> >>
>> >> Aside from that, I think the relation between page locking and
>> >> page reads is not 1-to-1; in other words, there may be quite a lot of
>> >> transient page locking caused by mmap followed by page faults into
>> >> already up-to-date pages requiring no IO at all. For these
>> >> transient page lockings I think it's reasonable to have a light
>> >> wait.
>> > Definitely there are other lockings than for reads. E.g. to write a page,
>> > we lock it first, submit the IO (which can actually block waiting for a
>> > request to get freed), set PageWriteback, and unlock the page. And there
>> > are more transient ones like you mention above...
>>
>> Yes, you are right.
>> But I think we were talking about distinguishing page locking from page read
>> IO?
>>
>> Well, I might also suggest doing an early dirty test before taking the
>> lock... but I expect that a page being not uptodate is a much stronger
>> indication that we are going to block for IO on the following page lock.
>> The dirty test is not that strong. Do you agree?
> Yes, I agree with this.
>
> Honza
> --
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR
>
* Re: [PATCH 7/7] mm: compaction: Introduce sync-light migration for use by compaction
2011-11-23 11:00 ` Mel Gorman
@ 2011-11-23 12:51 ` Nai Xia
2011-11-23 13:05 ` Nai Xia
1 sibling, 0 replies; 33+ messages in thread
From: Nai Xia @ 2011-11-23 12:51 UTC (permalink / raw)
To: Mel Gorman
Cc: Shaohua Li, Linux-MM, Andrea Arcangeli, Minchan Kim, Jan Kara,
Andy Isaacson, Johannes Weiner, Rik van Riel, LKML
On Wed, Nov 23, 2011 at 7:00 PM, Mel Gorman <mgorman@suse.de> wrote:
> On Wed, Nov 23, 2011 at 10:01:53AM +0800, Nai Xia wrote:
>> On Tue, Nov 22, 2011 at 2:56 PM, Shaohua Li <shaohua.li@intel.com> wrote:
>> > On Tue, 2011-11-22 at 02:36 +0800, Mel Gorman wrote:
>> >> This patch adds a lightweight sync migration mode, MIGRATE_SYNC_LIGHT,
>> >> that avoids writing back pages to backing storage. Async
>> >> compaction maps to MIGRATE_ASYNC while sync compaction maps to
>> >> MIGRATE_SYNC_LIGHT. For other migrate_pages users such as memory
>> >> hotplug, MIGRATE_SYNC is used.
>> >>
>> >> This avoids sync compaction stalling for an excessive length of time,
>> >> particularly when copying files to a USB stick where there might be
>> >> a large number of dirty pages backed by a filesystem that does not
>> >> support ->writepages.
>> > Hi,
>> > from my understanding, with this, even writes
>> > to /proc/sys/vm/compact_memory don't wait for pageout; is this
>> > intended?
>> > On the other hand, MIGRATE_SYNC_LIGHT now waits for the page lock and
>> > buffer lock, so it could wait on a page read. Page reads and page-outs
>> > have the same latency, so why treat them differently?
>>
>> So for the problem you raised, I think my suggestion to Mel is to adopt the
>> following logic:
>>
>> if (!trylock_page(page) && !PageUptodate(page))
>>         we are quite likely to block on a read, so we
>>         depend on yet another MIGRATE_SYNC mode to decide
>>         whether we really want to lock_page() and wait for this IO.
>>
>> What do you think?
>>
>
> Where are you adding this check?
>
> If you mean in __unmap_and_move(), the check is unnecessary unless
> another subsystem starts using sync-light compaction. With this series,
> only direct compaction cares about MIGRATE_SYNC_LIGHT. If the page is
Oh, yes, I think I did not pay enough attention to the fact that direct
compaction is the *only* user after I saw your comment on MIGRATE_SYNC_LIGHT
about "allow blocking on most operations"... I guess Shaohua missed
this point too...
Then MIGRATE_SYNC_LIGHT is now solely for ruling out writeout of
dirty pages. My suggestion can be reserved for the future, in case anyone
doing originally-async compaction becomes willing to wait some time on
transient page locking to improve the success rate.
> not up to date, it is also locked during the IO and unlocked after
> setting Uptodate in the IO completion handler.
>
> As the page is locked, compaction will fail trylock_page, do the
> PF_MEMALLOC check and bail, because it is not safe for direct compaction
> to call lock_page, as the comment in __unmap_and_move explains. This
> should avoid the stall.
>
> Did I misunderstand your suggestion?
>
> --
> Mel Gorman
> SUSE Labs
>
* Re: [PATCH 7/7] mm: compaction: Introduce sync-light migration for use by compaction
2011-11-23 11:00 ` Mel Gorman
2011-11-23 12:51 ` Nai Xia
@ 2011-11-23 13:05 ` Nai Xia
2011-11-23 13:45 ` Mel Gorman
1 sibling, 1 reply; 33+ messages in thread
From: Nai Xia @ 2011-11-23 13:05 UTC (permalink / raw)
To: Mel Gorman
Cc: Shaohua Li, Linux-MM, Andrea Arcangeli, Minchan Kim, Jan Kara,
Andy Isaacson, Johannes Weiner, Rik van Riel, LKML
On Wed, Nov 23, 2011 at 7:00 PM, Mel Gorman <mgorman@suse.de> wrote:
> On Wed, Nov 23, 2011 at 10:01:53AM +0800, Nai Xia wrote:
>> On Tue, Nov 22, 2011 at 2:56 PM, Shaohua Li <shaohua.li@intel.com> wrote:
>> > On Tue, 2011-11-22 at 02:36 +0800, Mel Gorman wrote:
>> >> This patch adds a lightweight sync migration mode, MIGRATE_SYNC_LIGHT,
>> >> that avoids writing back pages to backing storage. Async
>> >> compaction maps to MIGRATE_ASYNC while sync compaction maps to
>> >> MIGRATE_SYNC_LIGHT. For other migrate_pages users such as memory
>> >> hotplug, MIGRATE_SYNC is used.
>> >>
>> >> This avoids sync compaction stalling for an excessive length of time,
>> >> particularly when copying files to a USB stick where there might be
>> >> a large number of dirty pages backed by a filesystem that does not
>> >> support ->writepages.
>> > Hi,
>> > from my understanding, with this, even writes
>> > to /proc/sys/vm/compact_memory don't wait for pageout; is this
>> > intended?
>> > On the other hand, MIGRATE_SYNC_LIGHT now waits for the page lock and
>> > buffer lock, so it could wait on a page read. Page reads and page-outs
>> > have the same latency, so why treat them differently?
>>
>> So for the problem you raised, I think my suggestion to Mel is to adopt the
>> following logic:
>>
>> if (!trylock_page(page) && !PageUptodate(page))
>>         we are quite likely to block on a read, so we
>>         depend on yet another MIGRATE_SYNC mode to decide
>>         whether we really want to lock_page() and wait for this IO.
>>
>> What do you think?
>>
>
> Where are you adding this check?
>
> If you mean in __unmap_and_move(), the check is unnecessary unless
> another subsystem starts using sync-light compaction. With this series,
> only direct compaction cares about MIGRATE_SYNC_LIGHT. If the page is
But I am still a little bit confused: if MIGRATE_SYNC_LIGHT is only
used by direct compaction, and direct compaction could use another mode,
MIGRATE_ASYNC, which also does not write dirty pages, then why not
do a (current->flags & PF_MEMALLOC) test before writing out pages,
like we already do for the page lock condition, instead of adding a new
mode?
> not up to date, it is also locked during the IO and unlocked after
> setting Uptodate in the IO completion handler.
>
> As the page is locked, compaction will fail trylock_page, do the
> PF_MEMALLOC check and bail, because it is not safe for direct compaction
> to call lock_page, as the comment in __unmap_and_move explains. This
> should avoid the stall.
>
> Did I misunderstand your suggestion?
>
> --
> Mel Gorman
> SUSE Labs
>
* Re: [PATCH 7/7] mm: compaction: Introduce sync-light migration for use by compaction
2011-11-23 13:05 ` Nai Xia
@ 2011-11-23 13:45 ` Mel Gorman
2011-11-23 14:35 ` Nai Xia
0 siblings, 1 reply; 33+ messages in thread
From: Mel Gorman @ 2011-11-23 13:45 UTC (permalink / raw)
To: Nai Xia
Cc: Shaohua Li, Linux-MM, Andrea Arcangeli, Minchan Kim, Jan Kara,
Andy Isaacson, Johannes Weiner, Rik van Riel, LKML
On Wed, Nov 23, 2011 at 09:05:08PM +0800, Nai Xia wrote:
> > <SNIP>
> >
> > Where are you adding this check?
> >
> > If you mean in __unmap_and_move(), the check is unnecessary unless
> > another subsystem starts using sync-light compaction. With this series,
> > only direct compaction cares about MIGRATE_SYNC_LIGHT. <SNIP>
>
> But I am still a little confused: if MIGRATE_SYNC_LIGHT is only used
> by direct compaction, and MIGRATE_ASYNC (which also does not write
> dirty pages) is available to it, why not do a (current->flags &
> PF_MEMALLOC) test before writing out pages, as we already do for the
> page lock condition, instead of adding a new mode?

Why would it be necessary?
Why would it be better than what is there now?

I'm afraid I am missing the significance of your question or how it
might apply to the problem at hand.
--
Mel Gorman
SUSE Labs
* Re: [PATCH 7/7] mm: compaction: Introduce sync-light migration for use by compaction
2011-11-23 13:45 ` Mel Gorman
@ 2011-11-23 14:35 ` Nai Xia
2011-11-23 15:08 ` Mel Gorman
0 siblings, 1 reply; 33+ messages in thread
From: Nai Xia @ 2011-11-23 14:35 UTC (permalink / raw)
To: Mel Gorman
Cc: Shaohua Li, Linux-MM, Andrea Arcangeli, Minchan Kim, Jan Kara,
Andy Isaacson, Johannes Weiner, Rik van Riel, LKML
On Wed, Nov 23, 2011 at 9:45 PM, Mel Gorman <mgorman@suse.de> wrote:
> On Wed, Nov 23, 2011 at 09:05:08PM +0800, Nai Xia wrote:
>> > <SNIP>
>> >
>> > Where are you adding this check?
>> >
>> > If you mean in __unmap_and_move(), the check is unnecessary unless
>> > another subsystem starts using sync-light compaction. With this series,
>> > only direct compaction cares about MIGRATE_SYNC_LIGHT. <SNIP>
>>
>> But I am still a little confused: if MIGRATE_SYNC_LIGHT is only used
>> by direct compaction, and MIGRATE_ASYNC (which also does not write
>> dirty pages) is available to it, why not do a (current->flags &
>> PF_MEMALLOC) test before writing out pages, as we already do for the
>> page lock condition, instead of adding a new mode?
>
> Why would it be necessary?
> Why would it be better than what is there now?

I mean, if

    MIGRATE_SYNC_LIGHT --> (current->flags & PF_MEMALLOC),
    MIGRATE_SYNC_LIGHT --> no dirty writeback,
    (current->flags & PF_MEMALLOC) --> (MIGRATE_SYNC_LIGHT || MIGRATE_ASYNC), and
    MIGRATE_ASYNC --> no dirty writeback,

then why not simply have

    (current->flags & PF_MEMALLOC) --> no dirty writeback

and keep the meaning of sync as it was?

I hope I have made myself clear this time.
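As a sketch of what I mean for the dirty-page case (illustrative only,
with 'sync' being the original boolean):

	if (PageDirty(page)) {
		/*
		 * Direct reclaim and direct compaction run with
		 * PF_MEMALLOC set, so this skips writeback for them
		 * without introducing a third mode.
		 */
		if (!sync || (current->flags & PF_MEMALLOC))
			return -EBUSY;
		return writeout(mapping, page);
	}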
>
> I'm afraid I am missing the significance of your question or how it
> might apply to the problem at hand.
Sorry, it always takes me some effort to make myself understood when
explaining something complicated. That is my fault ;)
Thanks,
Nai
* Re: [PATCH 7/7] mm: compaction: Introduce sync-light migration for use by compaction
2011-11-23 14:35 ` Nai Xia
@ 2011-11-23 15:08 ` Mel Gorman
2011-11-23 15:23 ` Nai Xia
0 siblings, 1 reply; 33+ messages in thread
From: Mel Gorman @ 2011-11-23 15:08 UTC (permalink / raw)
To: Nai Xia
Cc: Shaohua Li, Linux-MM, Andrea Arcangeli, Minchan Kim, Jan Kara,
Andy Isaacson, Johannes Weiner, Rik van Riel, LKML
On Wed, Nov 23, 2011 at 10:35:37PM +0800, Nai Xia wrote:
> On Wed, Nov 23, 2011 at 9:45 PM, Mel Gorman <mgorman@suse.de> wrote:
> > On Wed, Nov 23, 2011 at 09:05:08PM +0800, Nai Xia wrote:
> >> > <SNIP>
> >> >
> >> > Where are you adding this check?
> >> >
> >> > If you mean in __unmap_and_move(), the check is unnecessary unless
> >> > another subsystem starts using sync-light compaction. With this series,
> >> > only direct compaction cares about MIGRATE_SYNC_LIGHT. <SNIP>
> >>
> >> But I am still a little confused: if MIGRATE_SYNC_LIGHT is only used
> >> by direct compaction, and MIGRATE_ASYNC (which also does not write
> >> dirty pages) is available to it, why not do a (current->flags &
> >> PF_MEMALLOC) test before writing out pages, as we already do for the
> >> page lock condition, instead of adding a new mode?
> >
> > Why would it be necessary?
> > Why would it be better than what is there now?
>
> I mean, if
>
>     MIGRATE_SYNC_LIGHT --> (current->flags & PF_MEMALLOC),
>     MIGRATE_SYNC_LIGHT --> no dirty writeback,
>     (current->flags & PF_MEMALLOC) --> (MIGRATE_SYNC_LIGHT || MIGRATE_ASYNC), and
>     MIGRATE_ASYNC --> no dirty writeback,
>
> then why not simply have
>
>     (current->flags & PF_MEMALLOC) --> no dirty writeback
>
> and keep the meaning of sync as it was?
>
Ok, I see what you mean. Instead of making MIGRATE_SYNC_LIGHT part of
the API, we could special-case within migrate.c how to behave if
MIGRATE_SYNC && PF_MEMALLOC.

This would be functionally equivalent and would satisfy THP users,
but I do not see it as easier to understand or maintain than updating
the API. If someone in the future wanted to use migration without
significant stalls while not running under PF_MEMALLOC, they would
need to update the API like this anyway. There are no such users
today, but automatic NUMA migration might want to leverage something
like MIGRATE_SYNC_LIGHT
(http://comments.gmane.org/gmane.linux.kernel.mm/70239).
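For comparison, with the mode in the API the policy is explicit at the
call site. Roughly like this (a sketch along the lines of this series;
the exact code may differ):

	static int fallback_migrate_page(struct address_space *mapping,
		struct page *newpage, struct page *page, enum migrate_mode mode)
	{
		if (PageDirty(page)) {
			/* Only full MIGRATE_SYNC may write pages back */
			if (mode != MIGRATE_SYNC)
				return -EBUSY;
			return writeout(mapping, page);
		}

		/*
		 * Buffers may be managed in a filesystem-specific way;
		 * we must have no buffers or be able to drop them.
		 */
		if (page_has_private(page) &&
		    !try_to_release_page(page, GFP_KERNEL))
			return -EAGAIN;

		return migrate_page(mapping, newpage, page, mode);
	}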
--
Mel Gorman
SUSE Labs
* Re: [PATCH 7/7] mm: compaction: Introduce sync-light migration for use by compaction
2011-11-23 15:08 ` Mel Gorman
@ 2011-11-23 15:23 ` Nai Xia
2011-11-23 15:57 ` Mel Gorman
0 siblings, 1 reply; 33+ messages in thread
From: Nai Xia @ 2011-11-23 15:23 UTC (permalink / raw)
To: Mel Gorman
Cc: Shaohua Li, Linux-MM, Andrea Arcangeli, Minchan Kim, Jan Kara,
Andy Isaacson, Johannes Weiner, Rik van Riel, LKML
On Wed, Nov 23, 2011 at 11:08 PM, Mel Gorman <mgorman@suse.de> wrote:
> On Wed, Nov 23, 2011 at 10:35:37PM +0800, Nai Xia wrote:
>> On Wed, Nov 23, 2011 at 9:45 PM, Mel Gorman <mgorman@suse.de> wrote:
>> > On Wed, Nov 23, 2011 at 09:05:08PM +0800, Nai Xia wrote:
>> >> > <SNIP>
>> >> >
>> >> > Where are you adding this check?
>> >> >
>> >> > If you mean in __unmap_and_move(), the check is unnecessary unless
>> >> > another subsystem starts using sync-light compaction. With this series,
>> >> > only direct compaction cares about MIGRATE_SYNC_LIGHT. <SNIP>
>> >>
>> >> But I am still a little confused: if MIGRATE_SYNC_LIGHT is only used
>> >> by direct compaction, and MIGRATE_ASYNC (which also does not write
>> >> dirty pages) is available to it, why not do a (current->flags &
>> >> PF_MEMALLOC) test before writing out pages, as we already do for the
>> >> page lock condition, instead of adding a new mode?
>> >
>> > Why would it be necessary?
>> > Why would it be better than what is there now?
>>
>> I mean, if
>>
>>     MIGRATE_SYNC_LIGHT --> (current->flags & PF_MEMALLOC),
>>     MIGRATE_SYNC_LIGHT --> no dirty writeback,
>>     (current->flags & PF_MEMALLOC) --> (MIGRATE_SYNC_LIGHT || MIGRATE_ASYNC), and
>>     MIGRATE_ASYNC --> no dirty writeback,
>>
>> then why not simply have
>>
>>     (current->flags & PF_MEMALLOC) --> no dirty writeback
>>
>> and keep the meaning of sync as it was?
>>
>
> Ok, I see what you mean. Instead of making MIGRATE_SYNC_LIGHT part of
> the API, we could special-case within migrate.c how to behave if
> MIGRATE_SYNC && PF_MEMALLOC.
Yeah~
>
> This would be functionally equivalent and would satisfy THP users,
> but I do not see it as easier to understand or maintain than updating
> the API. If someone in the future wanted to use migration without
> significant stalls while not running under PF_MEMALLOC, they would
> need to update the API like this anyway. There are no such users
> today, but automatic NUMA migration might want to leverage something
> like MIGRATE_SYNC_LIGHT
> (http://comments.gmane.org/gmane.linux.kernel.mm/70239).
I see.
So could I say that would be the time, and those the users, for my
suggested page-uptodate check to fit into?
Thanks,
Nai
* Re: [PATCH 7/7] mm: compaction: Introduce sync-light migration for use by compaction
2011-11-23 15:23 ` Nai Xia
@ 2011-11-23 15:57 ` Mel Gorman
0 siblings, 0 replies; 33+ messages in thread
From: Mel Gorman @ 2011-11-23 15:57 UTC (permalink / raw)
To: Nai Xia
Cc: Shaohua Li, Linux-MM, Andrea Arcangeli, Minchan Kim, Jan Kara,
Andy Isaacson, Johannes Weiner, Rik van Riel, LKML
On Wed, Nov 23, 2011 at 11:23:19PM +0800, Nai Xia wrote:
> > <SNIP>
> > This would be functionally equivalent and would satisfy THP users,
> > but I do not see it as easier to understand or maintain than updating
> > the API. If someone in the future wanted to use migration without
> > significant stalls while not running under PF_MEMALLOC, they would
> > need to update the API like this anyway. There are no such users
> > today, but automatic NUMA migration might want to leverage something
> > like MIGRATE_SYNC_LIGHT
> > (http://comments.gmane.org/gmane.linux.kernel.mm/70239).
>
> I see.
> So could I say that would be the time, and those the users, for my
> suggested page-uptodate check to fit into?
>
Yes, at that point checking for PageUptodate may be necessary depending
on their requirements.
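Something like the following in the trylock path would be the obvious
shape for it (illustrative only, not part of this series):

	if (!trylock_page(page)) {
		/*
		 * A locked page that is not uptodate is very likely
		 * locked for read IO. A future sync-light user that
		 * must not stall on reads could bail here instead of
		 * waiting.
		 */
		if (mode == MIGRATE_SYNC_LIGHT && !PageUptodate(page))
			return -EAGAIN;

		lock_page(page);
	}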
--
Mel Gorman
SUSE Labs
End of thread. Thread overview: 33+ messages:
2011-11-21 18:36 [RFC PATCH 0/7] Reduce compaction-related stalls and improve asynchronous migration of dirty pages v4r2 Mel Gorman
2011-11-21 18:36 ` [PATCH 1/7] mm: compaction: Allow compaction to isolate dirty pages Mel Gorman
2011-11-22 16:58 ` Minchan Kim
2011-11-21 18:36 ` [PATCH 2/7] mm: compaction: Use synchronous compaction for /proc/sys/vm/compact_memory Mel Gorman
2011-11-22 17:00 ` Minchan Kim
2011-11-21 18:36 ` [PATCH 3/7] mm: check if we isolated a compound page during lumpy scan Mel Gorman
2011-11-22 17:05 ` Minchan Kim
2011-11-21 18:36 ` [PATCH 4/7] mm: compaction: Determine if dirty pages can be migrated without blocking within ->migratepage Mel Gorman
2011-11-21 18:36 ` [PATCH 5/7] mm: compaction: make isolate_lru_page() filter-aware again Mel Gorman
2011-11-22 17:30 ` Minchan Kim
2011-11-23 9:19 ` Mel Gorman
2011-11-21 18:36 ` [PATCH 6/7] mm: page allocator: Limit when direct reclaim is used when compaction is deferred Mel Gorman
2011-11-22 17:50 ` Minchan Kim
2011-11-21 18:36 ` [PATCH 7/7] mm: compaction: Introduce sync-light migration for use by compaction Mel Gorman
2011-11-22 6:56 ` Shaohua Li
2011-11-22 10:14 ` Mel Gorman
2011-11-22 11:54 ` Jan Kara
2011-11-22 13:59 ` Nai Xia
2011-11-22 15:07 ` Nai Xia
2011-11-22 19:13 ` Jan Kara
2011-11-22 22:44 ` Nai Xia
2011-11-23 11:39 ` Jan Kara
2011-11-23 12:20 ` Nai Xia
2011-11-23 2:01 ` Nai Xia
2011-11-23 2:25 ` Shaohua Li
2011-11-23 11:00 ` Mel Gorman
2011-11-23 12:51 ` Nai Xia
2011-11-23 13:05 ` Nai Xia
2011-11-23 13:45 ` Mel Gorman
2011-11-23 14:35 ` Nai Xia
2011-11-23 15:08 ` Mel Gorman
2011-11-23 15:23 ` Nai Xia
2011-11-23 15:57 ` Mel Gorman