linux-mm.kvack.org archive mirror
* [PATCH v3 0/3] f/madvise(DONTNEED) support
@ 2010-11-29 15:23 Minchan Kim
  2010-11-29 15:23 ` [PATCH v3 1/3] deactivate invalidated pages Minchan Kim
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Minchan Kim @ 2010-11-29 15:23 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, LKML, Ben Gamari, Minchan Kim

Recently there was a report of working set pages being evicted under an
rsync backup workload. Application programmers want to use fadvise for
this, but it is not easy to use. See [1/3] for a detailed description.

 - [1/3] moves an invalidated page that is dirty or under writeback from
   the active list to the head of the inactive list.
 - [2/3] moves an invalidated page to the tail of the inactive list when
   its writeout completes.
 - [3/3] avoids calling mark_page_accessed in the madvise(DONTNEED) case.

Minchan Kim (3):
  deactivate invalidated pages
  Reclaim invalidated page ASAP
  Prevent activation of page in madvise_dontneed

 include/linux/mm.h   |    4 +-
 include/linux/swap.h |    1 +
 mm/madvise.c         |    4 +-
 mm/memory.c          |   38 +++++++++++-------
 mm/mmap.c            |    4 +-
 mm/page-writeback.c  |   12 +++++-
 mm/swap.c            |  102 ++++++++++++++++++++++++++++++++++++++++++++++++++
 mm/truncate.c        |   16 ++++++--
 8 files changed, 155 insertions(+), 26 deletions(-)



* [PATCH v3 1/3] deactivate invalidated pages
  2010-11-29 15:23 [PATCH v3 0/3] f/madvise(DONTNEED) support Minchan Kim
@ 2010-11-29 15:23 ` Minchan Kim
  2010-11-30  1:01   ` KOSAKI Motohiro
  2010-11-30  5:22   ` Johannes Weiner
  2010-11-29 15:23 ` [PATCH 2/3] Reclaim invalidated page ASAP Minchan Kim
  2010-11-29 15:23 ` [PATCH v3 3/3] Prevent activation of page in madvise_dontneed Minchan Kim
  2 siblings, 2 replies; 22+ messages in thread
From: Minchan Kim @ 2010-11-29 15:23 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, LKML, Ben Gamari, Minchan Kim, Peter Zijlstra,
	Wu Fengguang, KOSAKI Motohiro, Johannes Weiner, Nick Piggin,
	Mel Gorman

There has been a recent report about thrashing
(http://marc.info/?l=rsync&m=128885034930933&w=2)
caused by backup workloads (e.g. a nightly rsync).
Such a workload creates use-once pages but touches each page
twice, which promotes the pages to the active list and so
results in working set page eviction.

Some application developers want POSIX_FADV_NOREUSE to be
supported, but other OSes don't support it, either.
(http://marc.info/?l=linux-mm&m=128928979512086&w=2)

As another approach, application developers use POSIX_FADV_DONTNEED.
But it has a problem: if the kernel finds a page under writeback
during invalidate_mapping_pages, the call cannot drop that page.
This makes it very hard for application programmers to use,
because they always have to sync the data before calling
fadvise(..., POSIX_FADV_DONTNEED) to make sure the pages are
discardable. In the end they cannot benefit from the kernel's
deferred writeback and see a performance loss, as the sketch
below the reference illustrates.
(http://insights.oetiker.ch/linux/fadvise.html)
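
For illustration only, here is a minimal user-space sketch of that
workaround (a hypothetical example, not part of this patch; the file
arguments and the copy loop are made up):

#define _XOPEN_SOURCE 600	/* for posix_fadvise */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	char buf[1 << 16];
	ssize_t n;
	int in, out;

	if (argc != 3) {
		fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]);
		return 1;
	}
	in = open(argv[1], O_RDONLY);
	out = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (in < 0 || out < 0)
		return 1;

	while ((n = read(in, buf, sizeof(buf))) > 0)
		write(out, buf, n);

	/*
	 * Without this fsync, the dirty pages of <dst> are silently
	 * skipped by the fadvise below; with it, the application gives
	 * up the kernel's deferred writeback.
	 */
	fsync(out);
	posix_fadvise(in, 0, 0, POSIX_FADV_DONTNEED);
	posix_fadvise(out, 0, 0, POSIX_FADV_DONTNEED);

	close(in);
	close(out);
	return 0;
}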

In fact, invalidation is a big hint to the reclaimer:
it means we will not use the page any more. So let's move
pages that are still being written to the inactive list's head.

Why move the page to the head? A dirty/writeback page will be
flushed sooner or later anyway, and keeping it at the head gives
the flusher time to write it out, preventing writeout from
pageout(), which is less effective than the flusher's writeout.

Originally, I reused Peter's lru_demote with some changes, so I
added his Signed-off-by.

Reported-by: Ben Gamari <bgamari.foss@gmail.com>
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Mel Gorman <mel@csn.ul.ie>

Andrew, before applying this series, please drop the two patches below.
 mm-deactivate-invalidated-pages.patch
 mm-deactivate-invalidated-pages-fix.patch

Changelog since v2:
 - leave mapped pages alone - suggested by Mel
 - move the PG_reclaim related part into the next patch.

Changelog since v1:
 - modify description
 - correct typo
 - add some comment
---
 include/linux/swap.h |    1 +
 mm/swap.c            |   80 ++++++++++++++++++++++++++++++++++++++++++++++++++
 mm/truncate.c        |   16 +++++++--
 3 files changed, 93 insertions(+), 4 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index eba53e7..84375e4 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -213,6 +213,7 @@ extern void mark_page_accessed(struct page *);
 extern void lru_add_drain(void);
 extern int lru_add_drain_all(void);
 extern void rotate_reclaimable_page(struct page *page);
+extern void lru_deactivate_page(struct page *page);
 extern void swap_setup(void);
 
 extern void add_page_to_unevictable_list(struct page *page);
diff --git a/mm/swap.c b/mm/swap.c
index 3f48542..19e0812 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -31,6 +31,7 @@
 #include <linux/backing-dev.h>
 #include <linux/memcontrol.h>
 #include <linux/gfp.h>
+#include <linux/rmap.h>
 
 #include "internal.h"
 
@@ -39,6 +40,8 @@ int page_cluster;
 
 static DEFINE_PER_CPU(struct pagevec[NR_LRU_LISTS], lru_add_pvecs);
 static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs);
+static DEFINE_PER_CPU(struct pagevec, lru_deactivate_pvecs);
+
 
 /*
  * This path almost never happens for VM activity - pages are normally
@@ -267,6 +270,63 @@ void add_page_to_unevictable_list(struct page *page)
 }
 
 /*
+ * This function is used by invalidate_mapping_pages.
+ * If the page can't be invalidated, this function moves the page
+ * into the inactive list's head, because the VM expects the page
+ * to be written out by the flusher. The flusher's writeout is much
+ * more effective than the reclaimer's random writeout.
+ */
+static void __lru_deactivate(struct page *page, struct zone *zone)
+{
+	int lru, file;
+	unsigned long vm_flags;
+
+	if (!PageLRU(page) || !PageActive(page))
+		return;
+
+	/* Some processes are using the page */
+	if (page_mapped(page))
+		return;
+
+	file = page_is_file_cache(page);
+	lru = page_lru_base_type(page);
+	del_page_from_lru_list(zone, page, lru + LRU_ACTIVE);
+	ClearPageActive(page);
+	ClearPageReferenced(page);
+	add_page_to_lru_list(zone, page, lru);
+	__count_vm_event(PGDEACTIVATE);
+
+	update_page_reclaim_stat(zone, page, file, 0);
+}
+
+/*
+ * This function must be called with preemption disabled.
+ */
+static void __pagevec_lru_deactivate(struct pagevec *pvec)
+{
+	int i;
+	struct zone *zone = NULL;
+
+	for (i = 0; i < pagevec_count(pvec); i++) {
+		struct page *page = pvec->pages[i];
+		struct zone *pagezone = page_zone(page);
+
+		if (pagezone != zone) {
+			if (zone)
+				spin_unlock_irq(&zone->lru_lock);
+			zone = pagezone;
+			spin_lock_irq(&zone->lru_lock);
+		}
+		__lru_deactivate(page, zone);
+	}
+	if (zone)
+		spin_unlock_irq(&zone->lru_lock);
+
+	release_pages(pvec->pages, pvec->nr, pvec->cold);
+	pagevec_reinit(pvec);
+}
+
+/*
  * Drain pages out of the cpu's pagevecs.
  * Either "cpu" is the current CPU, and preemption has already been
  * disabled; or "cpu" is being hot-unplugged, and is already dead.
@@ -292,6 +352,26 @@ static void drain_cpu_pagevecs(int cpu)
 		pagevec_move_tail(pvec);
 		local_irq_restore(flags);
 	}
+
+	pvec = &per_cpu(lru_deactivate_pvecs, cpu);
+	if (pagevec_count(pvec))
+		__pagevec_lru_deactivate(pvec);
+}
+
+/*
+ * Forcefully deactivate a page.
+ * This function is used for reclaiming the page ASAP when the page
+ * can't be invalidated because it is dirty or under writeback.
+ */
+void lru_deactivate_page(struct page *page)
+{
+	if (likely(get_page_unless_zero(page))) {
+		struct pagevec *pvec = &get_cpu_var(lru_deactivate_pvecs);
+
+		if (!pagevec_add(pvec, page))
+			__pagevec_lru_deactivate(pvec);
+		put_cpu_var(lru_deactivate_pvecs);
+	}
 }
 
 void lru_add_drain(void)
diff --git a/mm/truncate.c b/mm/truncate.c
index cd94607..09b9748 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -332,7 +332,8 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
 {
 	struct pagevec pvec;
 	pgoff_t next = start;
-	unsigned long ret = 0;
+	unsigned long ret;
+	unsigned long count = 0;
 	int i;
 
 	pagevec_init(&pvec, 0);
@@ -359,8 +360,15 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
 			if (lock_failed)
 				continue;
 
-			ret += invalidate_inode_page(page);
-
+			ret = invalidate_inode_page(page);
+			/*
+			 * If the page was dirty or under writeback we cannot
+			 * invalidate it now.  Move it to the head of the
+			 * inactive LRU for using the flusher's deferred writeback.
+			 */
+			if (!ret)
+				lru_deactivate_page(page);
+			count += ret;
 			unlock_page(page);
 			if (next > end)
 				break;
@@ -369,7 +377,7 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
 		mem_cgroup_uncharge_end();
 		cond_resched();
 	}
-	return ret;
+	return count;
 }
 EXPORT_SYMBOL(invalidate_mapping_pages);
 
-- 
1.7.0.4



* [PATCH 2/3] Reclaim invalidated page ASAP
  2010-11-29 15:23 [PATCH v3 0/3] f/madvise(DONTNEED) support Minchan Kim
  2010-11-29 15:23 ` [PATCH v3 1/3] deactivate invalidated pages Minchan Kim
@ 2010-11-29 15:23 ` Minchan Kim
  2010-11-29 16:57   ` Mel Gorman
  2010-11-30  1:10   ` KOSAKI Motohiro
  2010-11-29 15:23 ` [PATCH v3 3/3] Prevent activation of page in madvise_dontneed Minchan Kim
  2 siblings, 2 replies; 22+ messages in thread
From: Minchan Kim @ 2010-11-29 15:23 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, LKML, Ben Gamari, Minchan Kim, Wu Fengguang,
	KOSAKI Motohiro, Johannes Weiner, Nick Piggin, Mel Gorman

invalidate_mapping_pages is a big hint to the reclaimer:
it means the user doesn't want to use the page any more.
So, in order to prevent working set page eviction, this patch
moves such pages to the tail of the inactive list via PG_reclaim.

Remember that pages on the inactive list are part of the working
set just like those on the active list. If we don't move invalidated
pages to the inactive list's tail, pages near the tail can be evicted
even though we have a strong clue about which pages are useless.
That is a bad outcome.

Currently PG_readahead and PG_reclaim share one page flag.
Commit fe3cba17 added ClearPageReclaim to clear_page_dirty_for_io to
prevent readahead marker pages from being reclaimed too quickly.

In this series, PG_reclaim is also used for invalidated pages.
If the VM finds that a page is invalidated and dirty, it sets
PG_reclaim so the page is reclaimed ASAP. But when the dirty page
is later written back, clear_page_dirty_for_io clears PG_reclaim
unconditionally, which defeats this series' goal.

I think it's okay to clear the flag when the page is dirtied, not at
writeback time. So this patch moves the ClearPageReclaim call from
clear_page_dirty_for_io into set_page_dirty. A toy model of the
resulting flag life cycle follows below.
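
To make the intended flag flow concrete, here is a self-contained
toy model (an illustrative sketch under simplifying assumptions,
not kernel code; the struct and function names are invented):

#include <stdbool.h>
#include <stdio.h>

struct toy_page {
	bool dirty;
	bool reclaim;	/* models the shared PG_reclaim/PG_readahead bit */
};

/* invalidate_mapping_pages() on a dirty page: it can't be dropped,
 * so hint the reclaimer instead (patch 1/3 plus this patch). */
static void failed_invalidate(struct toy_page *p)
{
	if (p->dirty)
		p->reclaim = true;	/* SetPageReclaim */
}

/* set_page_dirty(): this patch clears the hint here ... */
static void page_dirtied(struct toy_page *p)
{
	p->reclaim = false;	/* ClearPageReclaim moved here */
	p->dirty = true;
}

/* ... so clear_page_dirty_for_io() no longer touches the bit and the
 * hint survives until the flusher finishes writing the page ... */
static void writeback_started(struct toy_page *p)
{
	p->dirty = false;
}

/* ... where end_page_writeback() can rotate the page to the tail of
 * the inactive list for immediate reclaim. */
static void writeback_done(struct toy_page *p)
{
	if (p->reclaim) {
		p->reclaim = false;
		printf("rotate page to inactive list's tail\n");
	}
}

int main(void)
{
	struct toy_page p = { false, false };

	page_dirtied(&p);	/* application writes to the page    */
	failed_invalidate(&p);	/* fadvise(DONTNEED) can't drop it   */
	writeback_started(&p);	/* flusher picks the page up         */
	writeback_done(&p);	/* hint survived: reclaim ASAP       */
	return 0;
}

The only behavioral change of this patch in the model is where the
hint is cleared: on redirtying instead of at writeout start, so a
page that stays clean keeps its reclaim hint.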

Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Mel Gorman <mel@csn.ul.ie>

Changelog since v2:
 - put ClearPageReclaim in set_page_dirty - suggested by Wu.
 
Changelog since v1:
 - make the invalidated page reclaim asap - suggested by Andrew.
---
 mm/page-writeback.c |   12 +++++++++++-
 mm/swap.c           |   48 +++++++++++++++++++++++++++++++++++-------------
 2 files changed, 46 insertions(+), 14 deletions(-)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index fc93802..88587a5 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1250,6 +1250,17 @@ int set_page_dirty(struct page *page)
 {
 	struct address_space *mapping = page_mapping(page);
 
+	/*
+	 * readahead/lru_deactivate_page could leave PG_readahead/PG_reclaim
+	 * set due to a race with end_page_writeback.
+	 * For readahead, if the page is written to, the flag would be
+	 * reset anyway, so there is no problem.
+	 * For lru_deactivate_page, if the page is redirtied, the flag
+	 * will be reset as well. But if the page is then used by readahead,
+	 * the stale flag will confuse readahead and make it restart the
+	 * size ramp-up process. That is only a trivial problem, though.
+	 */
+	ClearPageReclaim(page);
 	if (likely(mapping)) {
 		int (*spd)(struct page *) = mapping->a_ops->set_page_dirty;
 #ifdef CONFIG_BLOCK
@@ -1307,7 +1318,6 @@ int clear_page_dirty_for_io(struct page *page)
 
 	BUG_ON(!PageLocked(page));
 
-	ClearPageReclaim(page);
 	if (mapping && mapping_cap_account_dirty(mapping)) {
 		/*
 		 * Yes, Virginia, this is indeed insane.
diff --git a/mm/swap.c b/mm/swap.c
index 19e0812..936b281 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -275,28 +275,50 @@ void add_page_to_unevictable_list(struct page *page)
  * into the inactive list's head, because the VM expects the page
  * to be written out by the flusher. The flusher's writeout is much
  * more effective than the reclaimer's random writeout.
+ *
+ * If the page isn't page_mapped and is dirty/writeback, the page
+ * can be reclaimed ASAP using PG_reclaim.
+ *
+ * 1. active, mapped page -> none
+ * 2. active, dirty/writeback page -> inactive, head, PG_reclaim
+ * 3. inactive, mapped page -> none
+ * 4. inactive, dirty/writeback page -> inactive, head, PG_reclaim
+ * 5. Others -> none
+ *
+ * In case 4, the reason the page moves to the inactive list's head is
+ * that the VM expects it to be written out by the flusher. The flusher's
+ * writeout is much more effective than the reclaimer's random writeout.
  */
 static void __lru_deactivate(struct page *page, struct zone *zone)
 {
 	int lru, file;
-	unsigned long vm_flags;
+	int active = 0;
 
-	if (!PageLRU(page) || !PageActive(page))
+	if (!PageLRU(page))
 		return;
-
 	/* Some processes are using the page */
 	if (page_mapped(page))
 		return;
-
-	file = page_is_file_cache(page);
-	lru = page_lru_base_type(page);
-	del_page_from_lru_list(zone, page, lru + LRU_ACTIVE);
-	ClearPageActive(page);
-	ClearPageReferenced(page);
-	add_page_to_lru_list(zone, page, lru);
-	__count_vm_event(PGDEACTIVATE);
-
-	update_page_reclaim_stat(zone, page, file, 0);
+	if (PageActive(page))
+		active = 1;
+
+	if (PageWriteback(page) || PageDirty(page)) {
+		/*
+		 * PG_reclaim can race with end_page_writeback.
+		 * That can confuse readahead, but the race window
+		 * is _really_ small and it's a non-critical problem.
+		 */
+		SetPageReclaim(page);
+
+		file = page_is_file_cache(page);
+		lru = page_lru_base_type(page);
+		del_page_from_lru_list(zone, page, lru + active);
+		ClearPageActive(page);
+		ClearPageReferenced(page);
+		add_page_to_lru_list(zone, page, lru);
+		__count_vm_event(PGDEACTIVATE);
+		update_page_reclaim_stat(zone, page, file, 0);
+	}
 }
 
 /*
-- 
1.7.0.4



* [PATCH v3 3/3] Prevent activation of page in madvise_dontneed
  2010-11-29 15:23 [PATCH v3 0/3] f/madvise(DONTNEED) support Minchan Kim
  2010-11-29 15:23 ` [PATCH v3 1/3] deactivate invalidated pages Minchan Kim
  2010-11-29 15:23 ` [PATCH 2/3] Reclaim invalidated page ASAP Minchan Kim
@ 2010-11-29 15:23 ` Minchan Kim
  2010-11-30  1:08   ` KOSAKI Motohiro
                     ` (2 more replies)
  2 siblings, 3 replies; 22+ messages in thread
From: Minchan Kim @ 2010-11-29 15:23 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, LKML, Ben Gamari, Minchan Kim, KOSAKI Motohiro,
	Johannes Weiner, Nick Piggin, Mel Gorman, Wu Fengguang

Currently zap_pte_range always activates pages which are pte_young &&
!VM_SequentialReadHint(vma). But when it is called via madvise(MADV_DONTNEED),
that is unnecessary, since the pages will not be used any more; a
user-space sketch of this case follows below.
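
A minimal user-space sketch of the targeted case (illustrative only,
not part of the patch; the file argument and the read loop are made
up):

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	struct stat st;
	volatile char sum = 0;
	char *buf;
	off_t i;
	int fd;

	if (argc != 2)
		return 1;
	fd = open(argv[1], O_RDONLY);
	if (fd < 0 || fstat(fd, &st) < 0 || st.st_size == 0)
		return 1;

	buf = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	if (buf == MAP_FAILED)
		return 1;

	for (i = 0; i < st.st_size; i++)
		sum += buf[i];		/* the ptes become young */

	/*
	 * We are done with the data. Without this patch, zapping the
	 * young ptes here calls mark_page_accessed() on each page cache
	 * page, promoting use-once pages toward the active list.
	 */
	madvise(buf, st.st_size, MADV_DONTNEED);

	munmap(buf, st.st_size);
	close(fd);
	return 0;
}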

Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Wu Fengguang <fengguang.wu@intel.com>

Changelog since v2:
 - remove unnecessary description
Changelog since v1: 
 - change word from promote to activate
 - add activate argument to zap_pte_range and family function

---
 include/linux/mm.h |    4 ++--
 mm/madvise.c       |    4 ++--
 mm/memory.c        |   38 +++++++++++++++++++++++---------------
 mm/mmap.c          |    4 ++--
 4 files changed, 29 insertions(+), 21 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index e097df6..6032881 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -779,11 +779,11 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 int zap_vma_ptes(struct vm_area_struct *vma, unsigned long address,
 		unsigned long size);
 unsigned long zap_page_range(struct vm_area_struct *vma, unsigned long address,
-		unsigned long size, struct zap_details *);
+		unsigned long size, struct zap_details *, bool activate);
 unsigned long unmap_vmas(struct mmu_gather **tlb,
 		struct vm_area_struct *start_vma, unsigned long start_addr,
 		unsigned long end_addr, unsigned long *nr_accounted,
-		struct zap_details *);
+		struct zap_details *, bool activate);
 
 /**
  * mm_walk - callbacks for walk_page_range
diff --git a/mm/madvise.c b/mm/madvise.c
index 319528b..8bc4b2d 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -171,9 +171,9 @@ static long madvise_dontneed(struct vm_area_struct * vma,
 			.nonlinear_vma = vma,
 			.last_index = ULONG_MAX,
 		};
-		zap_page_range(vma, start, end - start, &details);
+		zap_page_range(vma, start, end - start, &details, false);
 	} else
-		zap_page_range(vma, start, end - start, NULL);
+		zap_page_range(vma, start, end - start, NULL, false);
 	return 0;
 }
 
diff --git a/mm/memory.c b/mm/memory.c
index 2c989f3..b8d38bd 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -891,7 +891,8 @@ int copy_page_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 static unsigned long zap_pte_range(struct mmu_gather *tlb,
 				struct vm_area_struct *vma, pmd_t *pmd,
 				unsigned long addr, unsigned long end,
-				long *zap_work, struct zap_details *details)
+				long *zap_work, struct zap_details *details,
+				bool activate)
 {
 	struct mm_struct *mm = tlb->mm;
 	pte_t *pte;
@@ -949,7 +950,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 				if (pte_dirty(ptent))
 					set_page_dirty(page);
 				if (pte_young(ptent) &&
-				    likely(!VM_SequentialReadHint(vma)))
+				    likely(!VM_SequentialReadHint(vma)) &&
+					activate)
 					mark_page_accessed(page);
 				rss[MM_FILEPAGES]--;
 			}
@@ -989,7 +991,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 static inline unsigned long zap_pmd_range(struct mmu_gather *tlb,
 				struct vm_area_struct *vma, pud_t *pud,
 				unsigned long addr, unsigned long end,
-				long *zap_work, struct zap_details *details)
+				long *zap_work, struct zap_details *details,
+				bool activate)
 {
 	pmd_t *pmd;
 	unsigned long next;
@@ -1002,7 +1005,7 @@ static inline unsigned long zap_pmd_range(struct mmu_gather *tlb,
 			continue;
 		}
 		next = zap_pte_range(tlb, vma, pmd, addr, next,
-						zap_work, details);
+					zap_work, details, activate);
 	} while (pmd++, addr = next, (addr != end && *zap_work > 0));
 
 	return addr;
@@ -1011,7 +1014,8 @@ static inline unsigned long zap_pmd_range(struct mmu_gather *tlb,
 static inline unsigned long zap_pud_range(struct mmu_gather *tlb,
 				struct vm_area_struct *vma, pgd_t *pgd,
 				unsigned long addr, unsigned long end,
-				long *zap_work, struct zap_details *details)
+				long *zap_work, struct zap_details *details,
+				bool activate)
 {
 	pud_t *pud;
 	unsigned long next;
@@ -1024,7 +1028,7 @@ static inline unsigned long zap_pud_range(struct mmu_gather *tlb,
 			continue;
 		}
 		next = zap_pmd_range(tlb, vma, pud, addr, next,
-						zap_work, details);
+					zap_work, details, activate);
 	} while (pud++, addr = next, (addr != end && *zap_work > 0));
 
 	return addr;
@@ -1033,7 +1037,8 @@ static inline unsigned long zap_pud_range(struct mmu_gather *tlb,
 static unsigned long unmap_page_range(struct mmu_gather *tlb,
 				struct vm_area_struct *vma,
 				unsigned long addr, unsigned long end,
-				long *zap_work, struct zap_details *details)
+				long *zap_work, struct zap_details *details,
+				bool activate)
 {
 	pgd_t *pgd;
 	unsigned long next;
@@ -1052,7 +1057,7 @@ static unsigned long unmap_page_range(struct mmu_gather *tlb,
 			continue;
 		}
 		next = zap_pud_range(tlb, vma, pgd, addr, next,
-						zap_work, details);
+						zap_work, details, activate);
 	} while (pgd++, addr = next, (addr != end && *zap_work > 0));
 	tlb_end_vma(tlb, vma);
 	mem_cgroup_uncharge_end();
@@ -1075,6 +1080,7 @@ static unsigned long unmap_page_range(struct mmu_gather *tlb,
  * @end_addr: virtual address at which to end unmapping
  * @nr_accounted: Place number of unmapped pages in vm-accountable vma's here
  * @details: details of nonlinear truncation or shared cache invalidation
+ * @activate: whether pages included in the vma should be activated or not
  *
  * Returns the end address of the unmapping (restart addr if interrupted).
  *
@@ -1096,7 +1102,7 @@ static unsigned long unmap_page_range(struct mmu_gather *tlb,
 unsigned long unmap_vmas(struct mmu_gather **tlbp,
 		struct vm_area_struct *vma, unsigned long start_addr,
 		unsigned long end_addr, unsigned long *nr_accounted,
-		struct zap_details *details)
+		struct zap_details *details, bool activate)
 {
 	long zap_work = ZAP_BLOCK_SIZE;
 	unsigned long tlb_start = 0;	/* For tlb_finish_mmu */
@@ -1149,8 +1155,8 @@ unsigned long unmap_vmas(struct mmu_gather **tlbp,
 
 				start = end;
 			} else
-				start = unmap_page_range(*tlbp, vma,
-						start, end, &zap_work, details);
+				start = unmap_page_range(*tlbp, vma, start,
+					end, &zap_work, details, activate);
 
 			if (zap_work > 0) {
 				BUG_ON(start != end);
@@ -1184,9 +1190,10 @@ out:
  * @address: starting address of pages to zap
  * @size: number of bytes to zap
  * @details: details of nonlinear truncation or shared cache invalidation
+ * @activate: whether pages included in the vma should be activated or not
  */
 unsigned long zap_page_range(struct vm_area_struct *vma, unsigned long address,
-		unsigned long size, struct zap_details *details)
+		unsigned long size, struct zap_details *details, bool activate)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	struct mmu_gather *tlb;
@@ -1196,7 +1203,8 @@ unsigned long zap_page_range(struct vm_area_struct *vma, unsigned long address,
 	lru_add_drain();
 	tlb = tlb_gather_mmu(mm, 0);
 	update_hiwater_rss(mm);
-	end = unmap_vmas(&tlb, vma, address, end, &nr_accounted, details);
+	end = unmap_vmas(&tlb, vma, address, end, &nr_accounted,
+		details, activate);
 	if (tlb)
 		tlb_finish_mmu(tlb, address, end);
 	return end;
@@ -1220,7 +1228,7 @@ int zap_vma_ptes(struct vm_area_struct *vma, unsigned long address,
 	if (address < vma->vm_start || address + size > vma->vm_end ||
 	    		!(vma->vm_flags & VM_PFNMAP))
 		return -1;
-	zap_page_range(vma, address, size, NULL);
+	zap_page_range(vma, address, size, NULL, false);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(zap_vma_ptes);
@@ -2481,7 +2489,7 @@ again:
 	}
 
 	restart_addr = zap_page_range(vma, start_addr,
-					end_addr - start_addr, details);
+			end_addr - start_addr, details, true);
 	need_break = need_resched() || spin_needbreak(details->i_mmap_lock);
 
 	if (restart_addr >= end_addr) {
diff --git a/mm/mmap.c b/mm/mmap.c
index b179abb..0ed5ab3 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1904,7 +1904,7 @@ static void unmap_region(struct mm_struct *mm,
 	lru_add_drain();
 	tlb = tlb_gather_mmu(mm, 0);
 	update_hiwater_rss(mm);
-	unmap_vmas(&tlb, vma, start, end, &nr_accounted, NULL);
+	unmap_vmas(&tlb, vma, start, end, &nr_accounted, NULL, true);
 	vm_unacct_memory(nr_accounted);
 	free_pgtables(tlb, vma, prev? prev->vm_end: FIRST_USER_ADDRESS,
 				 next? next->vm_start: 0);
@@ -2278,7 +2278,7 @@ void exit_mmap(struct mm_struct *mm)
 	tlb = tlb_gather_mmu(mm, 1);
 	/* update_hiwater_rss(mm) here? but nobody should be looking */
 	/* Use -1 here to ensure all VMAs in the mm are unmapped */
-	end = unmap_vmas(&tlb, vma, 0, -1, &nr_accounted, NULL);
+	end = unmap_vmas(&tlb, vma, 0, -1, &nr_accounted, NULL, true);
 	vm_unacct_memory(nr_accounted);
 
 	free_pgtables(tlb, vma, FIRST_USER_ADDRESS, 0);
-- 
1.7.0.4



* Re: [PATCH 2/3] Reclaim invalidated page ASAP
  2010-11-29 15:23 ` [PATCH 2/3] Reclaim invalidated page ASAP Minchan Kim
@ 2010-11-29 16:57   ` Mel Gorman
  2010-11-29 22:41     ` Minchan Kim
  2010-11-30  1:10   ` KOSAKI Motohiro
  1 sibling, 1 reply; 22+ messages in thread
From: Mel Gorman @ 2010-11-29 16:57 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-mm, LKML, Ben Gamari, Wu Fengguang,
	KOSAKI Motohiro, Johannes Weiner, Nick Piggin

On Tue, Nov 30, 2010 at 12:23:20AM +0900, Minchan Kim wrote:
> [...]
> diff --git a/mm/swap.c b/mm/swap.c
> index 19e0812..936b281 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -275,28 +275,50 @@ void add_page_to_unevictable_list(struct page *page)
>   * into the inactive list's head, because the VM expects the page
>   * to be written out by the flusher. The flusher's writeout is much
>   * more effective than the reclaimer's random writeout.
> + *
> + * If the page isn't page_mapped and is dirty/writeback, the page
> + * can be reclaimed ASAP using PG_reclaim.
> + *
> + * 1. active, mapped page -> none
> + * 2. active, dirty/writeback page -> inactive, head, PG_reclaim
> + * 3. inactive, mapped page -> none
> + * 4. inactive, dirty/writeback page -> inactive, head, PG_reclaim
> + * 5. Others -> none
> + *
> + * In case 4, the reason the page moves to the inactive list's head is
> + * that the VM expects it to be written out by the flusher. The flusher's
> + * writeout is much more effective than the reclaimer's random writeout.
>   */
>  static void __lru_deactivate(struct page *page, struct zone *zone)
>  {
>  	int lru, file;
> -	unsigned long vm_flags;
> +	int active = 0;
>  
> -	if (!PageLRU(page) || !PageActive(page))
> +	if (!PageLRU(page))
>  		return;
> -
>  	/* Some processes are using the page */
>  	if (page_mapped(page))
>  		return;
> -
> -	file = page_is_file_cache(page);
> -	lru = page_lru_base_type(page);
> -	del_page_from_lru_list(zone, page, lru + LRU_ACTIVE);
> -	ClearPageActive(page);
> -	ClearPageReferenced(page);
> -	add_page_to_lru_list(zone, page, lru);
> -	__count_vm_event(PGDEACTIVATE);
> -
> -	update_page_reclaim_stat(zone, page, file, 0);
> +	if (PageActive(page))
> +		active = 1;
> +
> +	if (PageWriteback(page) || PageDirty(page)) {
> +		/*
> +		 * PG_reclaim can race with end_page_writeback.
> +		 * That can confuse readahead, but the race window
> +		 * is _really_ small and it's a non-critical problem.
> +		 */
> +		SetPageReclaim(page);
> +
> +		file = page_is_file_cache(page);
> +		lru = page_lru_base_type(page);
> +		del_page_from_lru_list(zone, page, lru + active);
> +		ClearPageActive(page);
> +		ClearPageReferenced(page);
> +		add_page_to_lru_list(zone, page, lru);
> +		__count_vm_event(PGDEACTIVATE);

You update PGDEACTIVATE whether the page was active or not.

> +		update_page_reclaim_stat(zone, page, file, 0);
> +	}
>  }
>  
>  /*
> -- 
> 1.7.0.4
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab



* Re: [PATCH 2/3] Reclaim invalidated page ASAP
  2010-11-29 16:57   ` Mel Gorman
@ 2010-11-29 22:41     ` Minchan Kim
  2010-11-30  9:16       ` Mel Gorman
  2010-11-30 11:20       ` Johannes Weiner
  0 siblings, 2 replies; 22+ messages in thread
From: Minchan Kim @ 2010-11-29 22:41 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, linux-mm, LKML, Ben Gamari, Wu Fengguang,
	KOSAKI Motohiro, Johannes Weiner, Nick Piggin

On Mon, Nov 29, 2010 at 04:57:06PM +0000, Mel Gorman wrote:
> On Tue, Nov 30, 2010 at 12:23:20AM +0900, Minchan Kim wrote:
> > [...]
> > +		del_page_from_lru_list(zone, page, lru + active);
> > +		ClearPageActive(page);
> > +		ClearPageReferenced(page);
> > +		add_page_to_lru_list(zone, page, lru);
> > +		__count_vm_event(PGDEACTIVATE);
> 
> You update PGDEACTIVATE whether the page was active or not.

My fault. 
Resend.

Subject: [PATCH v3 2/3] Reclaim invalidated page ASAP

invalidate_mapping_pages is a big hint to the reclaimer:
it means the user doesn't want to use the page any more.
So, in order to prevent working set page eviction, this patch
moves such pages to the tail of the inactive list via PG_reclaim.

Remember that pages on the inactive list are part of the working
set just like those on the active list. If we don't move invalidated
pages to the inactive list's tail, pages near the tail can be evicted
even though we have a strong clue about which pages are useless.
That is a bad outcome.

Currently PG_readahead and PG_reclaim share one page flag.
Commit fe3cba17 added ClearPageReclaim to clear_page_dirty_for_io to
prevent readahead marker pages from being reclaimed too quickly.

In this series, PG_reclaim is also used for invalidated pages.
If the VM finds that a page is invalidated and dirty, it sets
PG_reclaim so the page is reclaimed ASAP. But when the dirty page
is later written back, clear_page_dirty_for_io clears PG_reclaim
unconditionally, which defeats this series' goal.

I think it's okay to clear the flag when the page is dirtied, not at
writeback time. So this patch moves the ClearPageReclaim call from
clear_page_dirty_for_io into set_page_dirty.

Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Mel Gorman <mel@csn.ul.ie>

Changelog since v2:
 - put ClearPageReclaim in set_page_dirty - suggested by Wu.

Changelog since v1:
 - make the invalidated page reclaim asap - suggested by Andrew.
---
 mm/page-writeback.c |   12 +++++++++++-
 mm/swap.c           |   49 ++++++++++++++++++++++++++++++++++++-------------
 2 files changed, 47 insertions(+), 14 deletions(-)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index fc93802..88587a5 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1250,6 +1250,17 @@ int set_page_dirty(struct page *page)
 {
 	struct address_space *mapping = page_mapping(page);
 
+	/*
+	 * readahead/lru_deactivate_page could leave PG_readahead/PG_reclaim
+	 * set due to a race with end_page_writeback.
+	 * For readahead, if the page is written to, the flag would be
+	 * reset anyway, so there is no problem.
+	 * For lru_deactivate_page, if the page is redirtied, the flag
+	 * will be reset as well. But if the page is then used by readahead,
+	 * the stale flag will confuse readahead and make it restart the
+	 * size ramp-up process. That is only a trivial problem, though.
+	 */
+	ClearPageReclaim(page);
 	if (likely(mapping)) {
 		int (*spd)(struct page *) = mapping->a_ops->set_page_dirty;
 #ifdef CONFIG_BLOCK
@@ -1307,7 +1318,6 @@ int clear_page_dirty_for_io(struct page *page)
 
 	BUG_ON(!PageLocked(page));
 
-	ClearPageReclaim(page);
 	if (mapping && mapping_cap_account_dirty(mapping)) {
 		/*
 		 * Yes, Virginia, this is indeed insane.
diff --git a/mm/swap.c b/mm/swap.c
index 19e0812..1f1f435 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -275,28 +275,51 @@ void add_page_to_unevictable_list(struct page *page)
  * into the inactive list's head, because the VM expects the page
  * to be written out by the flusher. The flusher's writeout is much
  * more effective than the reclaimer's random writeout.
+ *
+ * If the page isn't page_mapped and is dirty/writeback, the page
+ * can be reclaimed ASAP using PG_reclaim.
+ *
+ * 1. active, mapped page -> none
+ * 2. active, dirty/writeback page -> inactive, head, PG_reclaim
+ * 3. inactive, mapped page -> none
+ * 4. inactive, dirty/writeback page -> inactive, head, PG_reclaim
+ * 5. Others -> none
+ *
+ * In case 4, the reason the page moves to the inactive list's head is
+ * that the VM expects it to be written out by the flusher. The flusher's
+ * writeout is much more effective than the reclaimer's random writeout.
  */
 static void __lru_deactivate(struct page *page, struct zone *zone)
 {
 	int lru, file;
-	unsigned long vm_flags;
+	int active = 0;
 
-	if (!PageLRU(page) || !PageActive(page))
+	if (!PageLRU(page))
 		return;
-
 	/* Some processes are using the page */
 	if (page_mapped(page))
 		return;
-
-	file = page_is_file_cache(page);
-	lru = page_lru_base_type(page);
-	del_page_from_lru_list(zone, page, lru + LRU_ACTIVE);
-	ClearPageActive(page);
-	ClearPageReferenced(page);
-	add_page_to_lru_list(zone, page, lru);
-	__count_vm_event(PGDEACTIVATE);
-
-	update_page_reclaim_stat(zone, page, file, 0);
+	if (PageActive(page))
+		active = 1;
+
+	if (PageWriteback(page) || PageDirty(page)) {
+		/*
+		 * PG_reclaim can race with end_page_writeback.
+		 * That can confuse readahead, but the race window
+		 * is _really_ small and it's a non-critical problem.
+		 */
+		SetPageReclaim(page);
+
+		file = page_is_file_cache(page);
+		lru = page_lru_base_type(page);
+		del_page_from_lru_list(zone, page, lru + active);
+		ClearPageActive(page);
+		ClearPageReferenced(page);
+		add_page_to_lru_list(zone, page, lru);
+		if (active)
+			__count_vm_event(PGDEACTIVATE);
+		update_page_reclaim_stat(zone, page, file, 0);
+	}
 }
 
 /*
-- 
1.7.0.4



* Re: [PATCH v3 1/3] deactivate invalidated pages
  2010-11-29 15:23 ` [PATCH v3 1/3] deactivate invalidated pages Minchan Kim
@ 2010-11-30  1:01   ` KOSAKI Motohiro
  2010-11-30  5:22   ` Johannes Weiner
  1 sibling, 0 replies; 22+ messages in thread
From: KOSAKI Motohiro @ 2010-11-30  1:01 UTC (permalink / raw)
  To: Minchan Kim
  Cc: kosaki.motohiro, Andrew Morton, linux-mm, LKML, Ben Gamari,
	Peter Zijlstra, Wu Fengguang, Johannes Weiner, Nick Piggin,
	Mel Gorman

> [...]

Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>





* Re: [PATCH v3 3/3] Prevent activation of page in madvise_dontneed
  2010-11-29 15:23 ` [PATCH v3 3/3] Prevent activation of page in madvise_dontneed Minchan Kim
@ 2010-11-30  1:08   ` KOSAKI Motohiro
  2010-11-30 11:35   ` Johannes Weiner
  2010-11-30 18:34   ` Hugh Dickins
  2 siblings, 0 replies; 22+ messages in thread
From: KOSAKI Motohiro @ 2010-11-30  1:08 UTC (permalink / raw)
  To: Minchan Kim
  Cc: kosaki.motohiro, Andrew Morton, linux-mm, LKML, Ben Gamari,
	Johannes Weiner, Nick Piggin, Mel Gorman, Wu Fengguang

> [...]

Looks good to me.
	Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>





* Re: [PATCH 2/3] Reclaim invalidated page ASAP
  2010-11-29 15:23 ` [PATCH 2/3] Reclaim invalidated page ASAP Minchan Kim
  2010-11-29 16:57   ` Mel Gorman
@ 2010-11-30  1:10   ` KOSAKI Motohiro
  2010-11-30  9:18     ` Mel Gorman
  1 sibling, 1 reply; 22+ messages in thread
From: KOSAKI Motohiro @ 2010-11-30  1:10 UTC (permalink / raw)
  To: Minchan Kim
  Cc: kosaki.motohiro, Andrew Morton, linux-mm, LKML, Ben Gamari,
	Wu Fengguang, Johannes Weiner, Nick Piggin, Mel Gorman

> [...]

I still dislike this one. I doubt this trick makes much benefit in real
world workload.





* Re: [PATCH v3 1/3] deactivate invalidated pages
  2010-11-29 15:23 ` [PATCH v3 1/3] deactivate invalidated pages Minchan Kim
  2010-11-30  1:01   ` KOSAKI Motohiro
@ 2010-11-30  5:22   ` Johannes Weiner
  2010-11-30  6:18     ` Minchan Kim
  1 sibling, 1 reply; 22+ messages in thread
From: Johannes Weiner @ 2010-11-30  5:22 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-mm, LKML, Ben Gamari, Peter Zijlstra,
	Wu Fengguang, KOSAKI Motohiro, Nick Piggin, Mel Gorman

On Tue, Nov 30, 2010 at 12:23:19AM +0900, Minchan Kim wrote:
> [...]
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index eba53e7..84375e4 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h

[...]

> @@ -267,6 +270,63 @@ void add_page_to_unevictable_list(struct page *page)
>  }
>  
>  /*
> + * This function is used by invalidate_mapping_pages.
> + * If the page can't be invalidated, this function moves the page
> + * into the inactive list's head, because the VM expects the page
> + * to be written out by the flusher. The flusher's writeout is much
> + * more effective than the reclaimer's random writeout.

The wording is a bit confusing, I find.  It sounds a bit like the
flusher's chance is increased by moving it to the inactive list in the
first place, but the key is that it is moved to the head instead of,
what one would expect, the tail of the list.  How about:

	If the page can not be invalidated, it is moved to the
	inactive list to speed up its reclaim.  It is moved to the
	head of the list, rather than the tail, to give the flusher
	threads some time to write it out, as this is much more
	effective than the single-page writeout from reclaim.

> +static void __lru_deactivate(struct page *page, struct zone *zone)

Do you insist on the underscores? :)

> +{
> +	int lru, file;
> +	unsigned long vm_flags;
> +
> +	if (!PageLRU(page) || !PageActive(page))
> +		return;
> +
> +	/* Some processes are using the page */
> +	if (page_mapped(page))
> +		return;
> +
> +	file = page_is_file_cache(page);
> +	lru = page_lru_base_type(page);
> +	del_page_from_lru_list(zone, page, lru + LRU_ACTIVE);
> +	ClearPageActive(page);
> +	ClearPageReferenced(page);
> +	add_page_to_lru_list(zone, page, lru);
> +	__count_vm_event(PGDEACTIVATE);
> +
> +	update_page_reclaim_stat(zone, page, file, 0);
> +}
> +
> +/*
> + * This function must be called with preemption disabled.

Why is that?  Unless I missed something, the only thing that needs
protection is the per-cpu pagevec reference the only user of this
function passes in.  But this should be the caller's concern and is
not really a requirement of this function per-se, is it?

> +static void __pagevec_lru_deactivate(struct pagevec *pvec)

More underscores!

> +{
> +	int i;
> +	struct zone *zone = NULL;
> +
> +	for (i = 0; i < pagevec_count(pvec); i++) {
> +		struct page *page = pvec->pages[i];
> +		struct zone *pagezone = page_zone(page);
> +
> +		if (pagezone != zone) {
> +			if (zone)
> +				spin_unlock_irq(&zone->lru_lock);
> +			zone = pagezone;
> +			spin_lock_irq(&zone->lru_lock);
> +		}
> +		__lru_deactivate(page, zone);
> +	}
> +	if (zone)
> +		spin_unlock_irq(&zone->lru_lock);
> +
> +	release_pages(pvec->pages, pvec->nr, pvec->cold);
> +	pagevec_reinit(pvec);
> +}
> +
> +/*
>   * Drain pages out of the cpu's pagevecs.
>   * Either "cpu" is the current CPU, and preemption has already been
>   * disabled; or "cpu" is being hot-unplugged, and is already dead.
> @@ -292,6 +352,26 @@ static void drain_cpu_pagevecs(int cpu)
>  		pagevec_move_tail(pvec);
>  		local_irq_restore(flags);
>  	}
> +
> +	pvec = &per_cpu(lru_deactivate_pvecs, cpu);
> +	if (pagevec_count(pvec))
> +		__pagevec_lru_deactivate(pvec);
> +}
> +
> +/*
> + * Forcefully deactivate a page.
> + * This function is used for reclaiming the page ASAP when the page
> + * can't be invalidated because it is dirty or under writeback.

How about:

/**
 * lru_deactivate_page - forcefully deactivate a page
 * @page: page to deactivate
 *
 * This function hints the VM that @page is a good reclaim candidate,
 * for example if its invalidation fails due to the page being dirty
 * or under writeback.
 */

> +void lru_deactivate_page(struct page *page)

I would love that lru_ prefix for most of the API in this file.  In
fact, the file should probably be called lru.c.  But for now, can you
keep the naming consistent and call it deactivate_page?

> +	if (likely(get_page_unless_zero(page))) {
> +		struct pagevec *pvec = &get_cpu_var(lru_deactivate_pvecs);
> +
> +		if (!pagevec_add(pvec, page))
> +			__pagevec_lru_deactivate(pvec);
> +		put_cpu_var(lru_deactivate_pvecs);
> +	}
>  }
>  
>  void lru_add_drain(void)

[...]

> @@ -359,8 +360,15 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
>  			if (lock_failed)
>  				continue;
>  
> -			ret += invalidate_inode_page(page);
> -
> +			ret = invalidate_inode_page(page);
> +			/*
> +			 * If the page was dirty or under writeback we cannot
> +			 * invalidate it now.  Move it to the head of the
> +			 * inactive LRU for using deferred writeback of flusher.

This would also be less confusing if it would say

	Move it to the head of the inactive LRU (rather than the tail)
	for using [...]

But I am not sure that this detail is interesting at this point.  It
would be more interesting to name the reasons for why the page is
moved to the inactive list in the first place:

	If the page is dirty or under writeback, we can not invalidate
	it now.  But we assume that attempted invalidation is a hint
	that the page is no longer of interest and try to speed up its
	reclaim.

> +			 */
> +			if (!ret)
> +				lru_deactivate_page(page);
> +			count += ret;
>  			unlock_page(page);
>  			if (next > end)
>  				break;

	Hannes


* Re: [PATCH v3 1/3] deactivate invalidated pages
  2010-11-30  5:22   ` Johannes Weiner
@ 2010-11-30  6:18     ` Minchan Kim
  0 siblings, 0 replies; 22+ messages in thread
From: Minchan Kim @ 2010-11-30  6:18 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, linux-mm, LKML, Ben Gamari, Peter Zijlstra,
	Wu Fengguang, KOSAKI Motohiro, Nick Piggin, Mel Gorman

Hi Hannes,

On Tue, Nov 30, 2010 at 2:22 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> On Tue, Nov 30, 2010 at 12:23:19AM +0900, Minchan Kim wrote:
>> Recently, there are reported problem about thrashing.
>> (http://marc.info/?l=rsync&m=128885034930933&w=2)
>> It happens by backup workloads(ex, nightly rsync).
>> That's because the workload makes just use-once pages
>> and touches pages twice. It promotes the page into
>> active list so that it results in working set page eviction.
>>
>> Some app developer want to support POSIX_FADV_NOREUSE.
>> But other OSes don't support it, either.
>> (http://marc.info/?l=linux-mm&m=128928979512086&w=2)
>>
>> By other approach, app developers use POSIX_FADV_DONTNEED.
>> But it has a problem. If kernel meets page is writing
>> during invalidate_mapping_pages, it can't work.
>> It is very hard for application programmer to use it.
>> Because they always have to sync data before calling
>> fadivse(..POSIX_FADV_DONTNEED) to make sure the pages could
>> be discardable. At last, they can't use deferred write of kernel
>> so that they could see performance loss.
>> (http://insights.oetiker.ch/linux/fadvise.html)
>>
>> In fact, invalidation is very big hint to reclaimer.
>> It means we don't use the page any more. So let's move
>> the writing page into inactive list's head.
>>
>> Why I need the page to head, Dirty/Writeback page would be flushed
>> sooner or later. It can prevent writeout of pageout which is less
>> effective than flusher's writeout.
>>
>> Originally, I reused lru_demote of Peter with some change so added
>> his Signed-off-by.
>>
>> Reported-by: Ben Gamari <bgamari.foss@gmail.com>
>> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
>> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
>> Acked-by: Rik van Riel <riel@redhat.com>
>> Cc: Wu Fengguang <fengguang.wu@intel.com>
>> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>> Cc: Nick Piggin <npiggin@kernel.dk>
>> Cc: Mel Gorman <mel@csn.ul.ie>
>>
>> Adnrew. Before applying this series, please drop below two patches.
>>  mm-deactivate-invalidated-pages.patch
>>  mm-deactivate-invalidated-pages-fix.patch
>>
>> Changelog since v2:
>>  - mapped page leaves alone - suggested by Mel
>>  - pass part related PG_reclaim in next patch.
>>
>> Changelog since v1:
>>  - modify description
>>  - correct typo
>>  - add some comment
>> ---
>>  include/linux/swap.h |    1 +
>>  mm/swap.c            |   80 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>  mm/truncate.c        |   16 +++++++--
>>  3 files changed, 93 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/linux/swap.h b/include/linux/swap.h
>> index eba53e7..84375e4 100644
>> --- a/include/linux/swap.h
>> +++ b/include/linux/swap.h
>
> [...]
>
>> @@ -267,6 +270,63 @@ void add_page_to_unevictable_list(struct page *page)
>>  }
>>
>>  /*
>> + * This function is used by invalidate_mapping_pages.
>> + * If the page can't be invalidated, this function moves the page
>> + * into inative list's head. Because the VM expects the page would
>> + * be writeout by flusher. The flusher's writeout is much effective
>> + * than reclaimer's random writeout.
>
> The wording is a bit confusing, I find.  It sounds a bit like the
> flusher's chance is increased by moving it to the inactive list in the
> first place, but the key is that it is moved to the head instead of,
> what one would expect, the tail of the list.  How about:
>
>        If the page can not be invalidated, it is moved to the
>        inactive list to speed up its reclaim.  It is moved to the
>        head of the list, rather than the tail, to give the flusher
>        threads some time to write it out, as this is much more
>        effective than the single-page writeout from reclaim.
>

Looks good to me.
I will add your comment instead of my ugly comment.

>> +static void __lru_deactivate(struct page *page, struct zone *zone)
>
> Do you insist on the underscores? :)

Good point.

__lru_deactivate is self-contained, so it is useful enough to be
called from other places as well. I will remove the underscores.

>
>> +{
>> +     int lru, file;
>> +     unsigned long vm_flags;
>> +
>> +     if (!PageLRU(page) || !PageActive(page))
>> +             return;
>> +
>> +     /* Some processes are using the page */
>> +     if (page_mapped(page))
>> +             return;
>> +
>> +     file = page_is_file_cache(page);
>> +     lru = page_lru_base_type(page);
>> +     del_page_from_lru_list(zone, page, lru + LRU_ACTIVE);
>> +     ClearPageActive(page);
>> +     ClearPageReferenced(page);
>> +     add_page_to_lru_list(zone, page, lru);
>> +     __count_vm_event(PGDEACTIVATE);
>> +
>> +     update_page_reclaim_stat(zone, page, file, 0);
>> +}
>> +
>> +/*
>> + * This function must be called with preemption disable.
>
> Why is that?  Unless I missed something, the only thing that needs
> protection is the per-cpu pagevec reference the only user of this
> function passes in.  But this should be the caller's concern and is
> not really a requirement of this function per-se, is it?

Yes, it's unnecessary.
I didn't think it through enough.
Will fix.

>
>> +static void __pagevec_lru_deactivate(struct pagevec *pvec)
>
> More underscores!

Will fix.

>
>> +{
>> +     int i;
>> +     struct zone *zone = NULL;
>> +
>> +     for (i = 0; i < pagevec_count(pvec); i++) {
>> +             struct page *page = pvec->pages[i];
>> +             struct zone *pagezone = page_zone(page);
>> +
>> +             if (pagezone != zone) {
>> +                     if (zone)
>> +                             spin_unlock_irq(&zone->lru_lock);
>> +                     zone = pagezone;
>> +                     spin_lock_irq(&zone->lru_lock);
>> +             }
>> +             __lru_deactivate(page, zone);
>> +     }
>> +     if (zone)
>> +             spin_unlock_irq(&zone->lru_lock);
>> +
>> +     release_pages(pvec->pages, pvec->nr, pvec->cold);
>> +     pagevec_reinit(pvec);
>> +}
>> +
>> +/*
>>   * Drain pages out of the cpu's pagevecs.
>>   * Either "cpu" is the current CPU, and preemption has already been
>>   * disabled; or "cpu" is being hot-unplugged, and is already dead.
>> @@ -292,6 +352,26 @@ static void drain_cpu_pagevecs(int cpu)
>>               pagevec_move_tail(pvec);
>>               local_irq_restore(flags);
>>       }
>> +
>> +     pvec = &per_cpu(lru_deactivate_pvecs, cpu);
>> +     if (pagevec_count(pvec))
>> +             __pagevec_lru_deactivate(pvec);
>> +}
>> +
>> +/*
>> + * Forcefully deactivate a page.
>> + * This function is used for reclaiming the page ASAP when the page
>> + * can't be invalidated by Dirty/Writeback.
>
> How about:
>
> /**
>  * lru_deactivate_page - forcefully deactivate a page
>  * @page: page to deactivate
>  *
>  * This function hints the VM that @page is a good reclaim candidate,
>  * for example if its invalidation fails due to the page being dirty
>  * or under writeback.
>  */
>
>> +void lru_deactivate_page(struct page *page)
>
> I would love that lru_ prefix for most of the API in this file.  In
> fact, the file should probably be called lru.c.  But for now, can you
> keep the naming consistent and call it deactivate_page?

No problem, I can change it. But deactivate_page would be asymmetric
in that it forcefully moves an active page to the inactive list (a
two-step demotion), while activate_page does a one-step activation.
That's why I named it the way I did.

>
>> +     if (likely(get_page_unless_zero(page))) {
>> +             struct pagevec *pvec = &get_cpu_var(lru_deactivate_pvecs);
>> +
>> +             if (!pagevec_add(pvec, page))
>> +                     __pagevec_lru_deactivate(pvec);
>> +             put_cpu_var(lru_deactivate_pvecs);
>> +     }
>>  }
>>
>>  void lru_add_drain(void)
>
> [...]
>
>> @@ -359,8 +360,15 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
>>                       if (lock_failed)
>>                               continue;
>>
>> -                     ret += invalidate_inode_page(page);
>> -
>> +                     ret = invalidate_inode_page(page);
>> +                     /*
>> +                      * If the page was dirty or under writeback we cannot
>> +                      * invalidate it now.  Move it to the head of the
>> +                      * inactive LRU for using deferred writeback of flusher.
>
> This would also be less confusing if it would say
>
>        Move it to the head of the inactive LRU (rather than the tail)
>        for using [...]
>
> But I am not sure that this detail is interesting at this point.  It
> would be more interesting to name the reasons for why the page is
> moved to the inactive list in the first place:
>
>        If the page is dirty or under writeback, we can not invalidate
>        it now.  But we assume that attempted invalidation is a hint
>        that the page is no longer of interest and try to speed up its
>        reclaim.
>

Will fix.
I hope to hear your opinions about [2/3], too. :)

Thanks, Hannes.

>> +                      */
>> +                     if (!ret)
>> +                             lru_deactivate_page(page);
>> +                     count += ret;
>>                       unlock_page(page);
>>                       if (next > end)
>>                               break;
>
>        Hannes
>



-- 
Kind regards,
Minchan Kim


* Re: [PATCH 2/3] Reclaim invalidated page ASAP
  2010-11-29 22:41     ` Minchan Kim
@ 2010-11-30  9:16       ` Mel Gorman
  2010-11-30 14:04         ` Minchan Kim
  2010-11-30 11:20       ` Johannes Weiner
  1 sibling, 1 reply; 22+ messages in thread
From: Mel Gorman @ 2010-11-30  9:16 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-mm, LKML, Ben Gamari, Wu Fengguang,
	KOSAKI Motohiro, Johannes Weiner, Nick Piggin

On Tue, Nov 30, 2010 at 07:41:30AM +0900, Minchan Kim wrote:
> On Mon, Nov 29, 2010 at 04:57:06PM +0000, Mel Gorman wrote:
> > On Tue, Nov 30, 2010 at 12:23:20AM +0900, Minchan Kim wrote:
> > > invalidate_mapping_pages is very big hint to reclaimer.
> > > It means user doesn't want to use the page any more.
> > > So in order to prevent working set page eviction, this patch
> > > move the page into tail of inactive list by PG_reclaim.
> > > 
> > > Please, remember that pages in inactive list are working set
> > > as well as active list. If we don't move pages into inactive list's
> > > tail, pages near by tail of inactive list can be evicted although
> > > we have a big clue about useless pages. It's totally bad.
> > > 
> > > Now PG_readahead/PG_reclaim is shared.
> > > fe3cba17 added ClearPageReclaim into clear_page_dirty_for_io for
> > > preventing fast reclaiming readahead marker page.
> > > 
> > > In this series, PG_reclaim is used by invalidated page, too.
> > > If VM find the page is invalidated and it's dirty, it sets PG_reclaim
> > > to reclaim asap. Then, when the dirty page will be writeback,
> > > clear_page_dirty_for_io will clear PG_reclaim unconditionally.
> > > It disturbs this serie's goal.
> > > 
> > > I think it's okay to clear PG_readahead when the page is dirty, not
> > > writeback time. So this patch moves ClearPageReadahead.
> > > 
> > > Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
> > > Acked-by: Rik van Riel <riel@redhat.com>
> > > Cc: Wu Fengguang <fengguang.wu@intel.com>
> > > Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > > Cc: Nick Piggin <npiggin@kernel.dk>
> > > Cc: Mel Gorman <mel@csn.ul.ie>
> > > 
> > > Changelog since v2:
> > >  - put ClearPageReclaim in set_page_dirty - suggested by Wu.
> > >  
> > > Changelog since v1:
> > >  - make the invalidated page reclaim asap - suggested by Andrew.
> > > ---
> > >  mm/page-writeback.c |   12 +++++++++++-
> > >  mm/swap.c           |   48 +++++++++++++++++++++++++++++++++++-------------
> > >  2 files changed, 46 insertions(+), 14 deletions(-)
> > > 
> > > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > > index fc93802..88587a5 100644
> > > --- a/mm/page-writeback.c
> > > +++ b/mm/page-writeback.c
> > > @@ -1250,6 +1250,17 @@ int set_page_dirty(struct page *page)
> > >  {
> > >  	struct address_space *mapping = page_mapping(page);
> > >  
> > > +	/*
> > > +	 * readahead/lru_deactivate_page could remain
> > > +	 * PG_readahead/PG_reclaim due to race with end_page_writeback
> > > +	 * About readahead, if the page is written, the flags would be
> > > +	 * reset. So no problem.
> > > +	 * About lru_deactivate_page, if the page is redirty, the flag
> > > +	 * will be reset. So no problem. but if the page is used by readahead
> > > +	 * it will confuse readahead and  make it restart the size rampup
> > > +	 * process. But it's a trivial problem.
> > > +	 */
> > > +	ClearPageReclaim(page);
> > >  	if (likely(mapping)) {
> > >  		int (*spd)(struct page *) = mapping->a_ops->set_page_dirty;
> > >  #ifdef CONFIG_BLOCK
> > > @@ -1307,7 +1318,6 @@ int clear_page_dirty_for_io(struct page *page)
> > >  
> > >  	BUG_ON(!PageLocked(page));
> > >  
> > > -	ClearPageReclaim(page);
> > >  	if (mapping && mapping_cap_account_dirty(mapping)) {
> > >  		/*
> > >  		 * Yes, Virginia, this is indeed insane.
> > > diff --git a/mm/swap.c b/mm/swap.c
> > > index 19e0812..936b281 100644
> > > --- a/mm/swap.c
> > > +++ b/mm/swap.c
> > > @@ -275,28 +275,50 @@ void add_page_to_unevictable_list(struct page *page)
> > >   * into inative list's head. Because the VM expects the page would
> > >   * be writeout by flusher. The flusher's writeout is much effective
> > >   * than reclaimer's random writeout.
> > > + *
> > > + * If the page isn't page_mapped and dirty/writeback, the page
> > > + * could reclaim asap using PG_reclaim.
> > > + *
> > > + * 1. active, mapped page -> none
> > > + * 2. active, dirty/writeback page -> inactive, head, PG_reclaim
> > > + * 3. inactive, mapped page -> none
> > > + * 4. inactive, dirty/writeback page -> inactive, head, PG_reclaim
> > > + * 5. Others -> none
> > > + *
> > > + * In 4, why it moves inactive's head, the VM expects the page would
> > > + * be writeout by flusher. The flusher's writeout is much effective than
> > > + * reclaimer's random writeout.
> > >   */
> > >  static void __lru_deactivate(struct page *page, struct zone *zone)
> > >  {
> > >  	int lru, file;
> > > -	unsigned long vm_flags;
> > > +	int active = 0;
> > >  
> > > -	if (!PageLRU(page) || !PageActive(page))
> > > +	if (!PageLRU(page))
> > >  		return;
> > > -
> > >  	/* Some processes are using the page */
> > >  	if (page_mapped(page))
> > >  		return;
> > > -
> > > -	file = page_is_file_cache(page);
> > > -	lru = page_lru_base_type(page);
> > > -	del_page_from_lru_list(zone, page, lru + LRU_ACTIVE);
> > > -	ClearPageActive(page);
> > > -	ClearPageReferenced(page);
> > > -	add_page_to_lru_list(zone, page, lru);
> > > -	__count_vm_event(PGDEACTIVATE);
> > > -
> > > -	update_page_reclaim_stat(zone, page, file, 0);
> > > +	if (PageActive(page))
> > > +		active = 1;
> > > +
> > > +	if (PageWriteback(page) || PageDirty(page)) {
> > > +		/*
> > > +		 * PG_reclaim could be raced with end_page_writeback
> > > +		 * It can make readahead confusing.  But race window
> > > +		 * is _really_ small and  it's non-critical problem.
> > > +		 */
> > > +		SetPageReclaim(page);
> > > +
> > > +		file = page_is_file_cache(page);
> > > +		lru = page_lru_base_type(page);
> > > +		del_page_from_lru_list(zone, page, lru + active);
> > > +		ClearPageActive(page);
> > > +		ClearPageReferenced(page);
> > > +		add_page_to_lru_list(zone, page, lru);
> > > +		__count_vm_event(PGDEACTIVATE);
> > 
> > You update PGDEACTIVATE whether the page was active or not.
> 
> My fault. 
> Resend.
> 
> Subject: [PATCH v3 2/3] Reclaim invalidated page ASAP
> 
> invalidate_mapping_pages is very big hint to reclaimer.
> It means user doesn't want to use the page any more.
> So in order to prevent working set page eviction, this patch
> move the page into tail of inactive list by PG_reclaim.
> 
> Please, remember that pages in inactive list are working set
> as well as active list. If we don't move pages into inactive list's
> tail, pages near by tail of inactive list can be evicted although
> we have a big clue about useless pages. It's totally bad.
> 
> Now PG_readahead/PG_reclaim is shared.
> fe3cba17 added ClearPageReclaim into clear_page_dirty_for_io for
> preventing fast reclaiming readahead marker page.
> 
> In this series, PG_reclaim is used by invalidated page, too.
> If VM find the page is invalidated and it's dirty, it sets PG_reclaim
> to reclaim asap. Then, when the dirty page will be writeback,
> clear_page_dirty_for_io will clear PG_reclaim unconditionally.
> It disturbs this serie's goal.
> 
> I think it's okay to clear PG_readahead when the page is dirty, not
> writeback time. So this patch moves ClearPageReadahead.
> 
> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
> Acked-by: Rik van Riel <riel@redhat.com>
> Cc: Wu Fengguang <fengguang.wu@intel.com>
> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Nick Piggin <npiggin@kernel.dk>
> Cc: Mel Gorman <mel@csn.ul.ie>
> 
> Changelog since v2:
>  - put ClearPageReclaim in set_page_dirty - suggested by Wu.
> 
> Changelog since v1:
>  - make the invalidated page reclaim asap - suggested by Andrew.
> ---
>  mm/page-writeback.c |   12 +++++++++++-
>  mm/swap.c           |   49 ++++++++++++++++++++++++++++++++++++-------------
>  2 files changed, 47 insertions(+), 14 deletions(-)
> 
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index fc93802..88587a5 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -1250,6 +1250,17 @@ int set_page_dirty(struct page *page)
>  {
>  	struct address_space *mapping = page_mapping(page);
>  
> +	/*
> +	 * readahead/lru_deactivate_page could remain
> +	 * PG_readahead/PG_reclaim due to race with end_page_writeback
> +	 * About readahead, if the page is written, the flags would be
> +	 * reset. So no problem.
> +	 * About lru_deactivate_page, if the page is redirty, the flag
> +	 * will be reset. So no problem. but if the page is used by readahead
> +	 * it will confuse readahead and  make it restart the size rampup
> +	 * process. But it's a trivial problem.
> +	 */
> +	ClearPageReclaim(page);
>  	if (likely(mapping)) {
>  		int (*spd)(struct page *) = mapping->a_ops->set_page_dirty;
>  #ifdef CONFIG_BLOCK
> @@ -1307,7 +1318,6 @@ int clear_page_dirty_for_io(struct page *page)
>  
>  	BUG_ON(!PageLocked(page));
>  
> -	ClearPageReclaim(page);
>  	if (mapping && mapping_cap_account_dirty(mapping)) {
>  		/*
>  		 * Yes, Virginia, this is indeed insane.
> diff --git a/mm/swap.c b/mm/swap.c
> index 19e0812..1f1f435 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -275,28 +275,51 @@ void add_page_to_unevictable_list(struct page *page)
>   * into inative list's head. Because the VM expects the page would
>   * be writeout by flusher. The flusher's writeout is much effective
>   * than reclaimer's random writeout.
> + *
> + * If the page isn't page_mapped and dirty/writeback, the page
> + * could reclaim asap using PG_reclaim.
> + *
> + * 1. active, mapped page -> none
> + * 2. active, dirty/writeback page -> inactive, head, PG_reclaim
> + * 3. inactive, mapped page -> none
> + * 4. inactive, dirty/writeback page -> inactive, head, PG_reclaim
> + * 5. Others -> none
> + *
> + * In 4, why it moves inactive's head, the VM expects the page would
> + * be writeout by flusher. The flusher's writeout is much effective than
> + * reclaimer's random writeout.
>   */
>  static void __lru_deactivate(struct page *page, struct zone *zone)
>  {
>  	int lru, file;
> -	unsigned long vm_flags;
> +	int active = 0;
>  
> -	if (!PageLRU(page) || !PageActive(page))
> +	if (!PageLRU(page))
>  		return;
> -
>  	/* Some processes are using the page */
>  	if (page_mapped(page))
>  		return;
> -
> -	file = page_is_file_cache(page);
> -	lru = page_lru_base_type(page);
> -	del_page_from_lru_list(zone, page, lru + LRU_ACTIVE);
> -	ClearPageActive(page);
> -	ClearPageReferenced(page);
> -	add_page_to_lru_list(zone, page, lru);
> -	__count_vm_event(PGDEACTIVATE);
> -
> -	update_page_reclaim_stat(zone, page, file, 0);
> +	if (PageActive(page))
> +		active = 1;
> +

I should have said this the last time, but if you do another revision,
make active a "bool". There is a very slow migration from int to bool
in cases where it makes sense. It's not urgent though.

> +	if (PageWriteback(page) || PageDirty(page)) {
> +		/*
> +		 * PG_reclaim could be raced with end_page_writeback
> +		 * It can make readahead confusing.  But race window
> +		 * is _really_ small and  it's non-critical problem.
> +		 */
> +		SetPageReclaim(page);
> +
> +		file = page_is_file_cache(page);
> +		lru = page_lru_base_type(page);
> +		del_page_from_lru_list(zone, page, lru + active);
> +		ClearPageActive(page);
> +		ClearPageReferenced(page);
> +		add_page_to_lru_list(zone, page, lru);
> +		if (active)
> +			__count_vm_event(PGDEACTIVATE);
> +		update_page_reclaim_stat(zone, page, file, 0);
> +	}
>  }
>  

Whether you update active's type or not;

Acked-by: Mel Gorman <mel@csn.ul.ie>

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


* Re: [PATCH 2/3] Reclaim invalidated page ASAP
  2010-11-30  1:10   ` KOSAKI Motohiro
@ 2010-11-30  9:18     ` Mel Gorman
  2010-11-30 14:01       ` Minchan Kim
  2010-11-30 14:11       ` Ben Gamari
  0 siblings, 2 replies; 22+ messages in thread
From: Mel Gorman @ 2010-11-30  9:18 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Minchan Kim, Andrew Morton, linux-mm, LKML, Ben Gamari,
	Wu Fengguang, Johannes Weiner, Nick Piggin

On Tue, Nov 30, 2010 at 10:10:20AM +0900, KOSAKI Motohiro wrote:
> > invalidate_mapping_pages is very big hint to reclaimer.
> > It means user doesn't want to use the page any more.
> > So in order to prevent working set page eviction, this patch
> > move the page into tail of inactive list by PG_reclaim.
> > 
> > Please, remember that pages in inactive list are working set
> > as well as active list. If we don't move pages into inactive list's
> > tail, pages near by tail of inactive list can be evicted although
> > we have a big clue about useless pages. It's totally bad.
> > 
> > Now PG_readahead/PG_reclaim is shared.
> > fe3cba17 added ClearPageReclaim into clear_page_dirty_for_io for
> > preventing fast reclaiming readahead marker page.
> > 
> > In this series, PG_reclaim is used by invalidated page, too.
> > If VM find the page is invalidated and it's dirty, it sets PG_reclaim
> > to reclaim asap. Then, when the dirty page will be writeback,
> > clear_page_dirty_for_io will clear PG_reclaim unconditionally.
> > It disturbs this serie's goal.
> > 
> > I think it's okay to clear PG_readahead when the page is dirty, not
> > writeback time. So this patch moves ClearPageReadahead.
> > 
> > Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
> > Acked-by: Rik van Riel <riel@redhat.com>
> > Cc: Wu Fengguang <fengguang.wu@intel.com>
> > Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > Cc: Nick Piggin <npiggin@kernel.dk>
> > Cc: Mel Gorman <mel@csn.ul.ie>
> 
> I still dislike this one. I doubt this trick makes much benefit in real
> world workload.
> 

I would agree except as said elsewhere, it's a chicken and egg problem.
We don't have a real world test because fadvise is not useful in its
current iteration. I'm hoping that there will be a test comparing

rsync		on vanilla kernel
rsync		on patched kernel
rsync+patch	on vanilla kernel
rsync+patch	on patched kernel

Are the results of such a test likely to happen?

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


* Re: [PATCH 2/3] Reclaim invalidated page ASAP
  2010-11-29 22:41     ` Minchan Kim
  2010-11-30  9:16       ` Mel Gorman
@ 2010-11-30 11:20       ` Johannes Weiner
  2010-11-30 14:03         ` Minchan Kim
  1 sibling, 1 reply; 22+ messages in thread
From: Johannes Weiner @ 2010-11-30 11:20 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Mel Gorman, Andrew Morton, linux-mm, LKML, Ben Gamari,
	Wu Fengguang, KOSAKI Motohiro, Nick Piggin

On Tue, Nov 30, 2010 at 07:41:30AM +0900, Minchan Kim wrote:

> diff --git a/mm/swap.c b/mm/swap.c
> index 19e0812..1f1f435 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -275,28 +275,51 @@ void add_page_to_unevictable_list(struct page *page)
>   * into inative list's head. Because the VM expects the page would
>   * be writeout by flusher. The flusher's writeout is much effective
>   * than reclaimer's random writeout.
> + *
> + * If the page isn't page_mapped and dirty/writeback, the page
> + * could reclaim asap using PG_reclaim.
> + *
> + * 1. active, mapped page -> none
> + * 2. active, dirty/writeback page -> inactive, head, PG_reclaim
> + * 3. inactive, mapped page -> none
> + * 4. inactive, dirty/writeback page -> inactive, head, PG_reclaim
> + * 5. Others -> none
> + *
> + * In 4, why it moves inactive's head, the VM expects the page would
> + * be writeout by flusher. The flusher's writeout is much effective than
> + * reclaimer's random writeout.
>   */
>  static void __lru_deactivate(struct page *page, struct zone *zone)
>  {
>  	int lru, file;
> -	unsigned long vm_flags;
> +	int active = 0;

vm_flags is never used in this series.

> -	if (!PageLRU(page) || !PageActive(page))
> +	if (!PageLRU(page))
>  		return;
> -
>  	/* Some processes are using the page */
>  	if (page_mapped(page))
>  		return;
> -
> -	file = page_is_file_cache(page);
> -	lru = page_lru_base_type(page);
> -	del_page_from_lru_list(zone, page, lru + LRU_ACTIVE);
> -	ClearPageActive(page);
> -	ClearPageReferenced(page);
> -	add_page_to_lru_list(zone, page, lru);
> -	__count_vm_event(PGDEACTIVATE);
> -
> -	update_page_reclaim_stat(zone, page, file, 0);
> +	if (PageActive(page))
> +		active = 1;

	active = PageActive(page)

> +	if (PageWriteback(page) || PageDirty(page)) {
> +		/*
> +		 * PG_reclaim could be raced with end_page_writeback
> +		 * It can make readahead confusing.  But race window
> +		 * is _really_ small and  it's non-critical problem.
> +		 */
> +		SetPageReclaim(page);
> +
> +		file = page_is_file_cache(page);
> +		lru = page_lru_base_type(page);
> +		del_page_from_lru_list(zone, page, lru + active);
> +		ClearPageActive(page);
> +		ClearPageReferenced(page);
> +		add_page_to_lru_list(zone, page, lru);
> +		if (active)
> +			__count_vm_event(PGDEACTIVATE);
> +		update_page_reclaim_stat(zone, page, file, 0);
> +	}

If we lose the race with writeback, the completion handler won't see
PG_reclaim, won't move the page, and we have an unwanted clean cache
page on the active list.  Given the pagevec caching of those pages it
could be rather likely that IO completes before the above executes.

Shouldn't this be

	if (PageWriteback() || PageDirty()) {
		SetPageReclaim()
		move_to_inactive_head()
	} else {
		move_to_inactive_tail()
	}

instead?
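
Folded into __lru_deactivate, a rough sketch of that (reusing the
locals already computed above; list_move_tail() keeps the page on the
same LRU, just at the tail) could be:

	del_page_from_lru_list(zone, page, lru + active);
	ClearPageActive(page);
	ClearPageReferenced(page);
	add_page_to_lru_list(zone, page, lru);

	if (PageWriteback(page) || PageDirty(page)) {
		/* Dirty: let the flusher write it out, reclaim on completion */
		SetPageReclaim(page);
	} else {
		/* Already clean: make it the next reclaim candidate */
		list_move_tail(&page->lru, &zone->lru[lru].list);
		__count_vm_event(PGROTATED);
	}

	if (active)
		__count_vm_event(PGDEACTIVATE);
	update_page_reclaim_stat(zone, page, file, 0);

Completely untested, of course.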

	Hannes


* Re: [PATCH v3 3/3] Prevent activation of page in madvise_dontneed
  2010-11-29 15:23 ` [PATCH v3 3/3] Prevent activation of page in madvise_dontneed Minchan Kim
  2010-11-30  1:08   ` KOSAKI Motohiro
@ 2010-11-30 11:35   ` Johannes Weiner
  2010-12-01  0:50     ` Minchan Kim
  2010-11-30 18:34   ` Hugh Dickins
  2 siblings, 1 reply; 22+ messages in thread
From: Johannes Weiner @ 2010-11-30 11:35 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-mm, LKML, Ben Gamari, KOSAKI Motohiro,
	Nick Piggin, Mel Gorman, Wu Fengguang

On Tue, Nov 30, 2010 at 12:23:21AM +0900, Minchan Kim wrote:
> Now zap_pte_range alwayas activates pages which are pte_young &&
> !VM_SequentialReadHint(vma). But in case of calling MADV_DONTNEED,
> it's unnecessary since the page wouldn't use any more.
> 
> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
> Acked-by: Rik van Riel <riel@redhat.com>
> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Nick Piggin <npiggin@kernel.dk>
> Cc: Mel Gorman <mel@csn.ul.ie>
> Cc: Wu Fengguang <fengguang.wu@intel.com>
> 
> Changelog since v2:
>  - remove unnecessary description
> Changelog since v1: 
>  - change word from promote to activate
>  - add activate argument to zap_pte_range and family function
> 
> ---
>  include/linux/mm.h |    4 ++--
>  mm/madvise.c       |    4 ++--
>  mm/memory.c        |   38 +++++++++++++++++++++++---------------
>  mm/mmap.c          |    4 ++--
>  4 files changed, 29 insertions(+), 21 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index e097df6..6032881 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -779,11 +779,11 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
>  int zap_vma_ptes(struct vm_area_struct *vma, unsigned long address,
>  		unsigned long size);
>  unsigned long zap_page_range(struct vm_area_struct *vma, unsigned long address,
> -		unsigned long size, struct zap_details *);
> +		unsigned long size, struct zap_details *, bool activate);

I would prefer naming the parameter 'ignore_references' or something
similar, so that it reflects the immediate effect on the zappers'
behaviour, not what mark_page_accessed() might end up doing.
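
I.e., roughly:

	unsigned long zap_page_range(struct vm_area_struct *vma,
			unsigned long address, unsigned long size,
			struct zap_details *, bool ignore_references);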

Other than that, the patch looks good to me.

	Hannes


* Re: [PATCH 2/3] Reclaim invalidated page ASAP
  2010-11-30  9:18     ` Mel Gorman
@ 2010-11-30 14:01       ` Minchan Kim
  2010-11-30 14:11       ` Ben Gamari
  1 sibling, 0 replies; 22+ messages in thread
From: Minchan Kim @ 2010-11-30 14:01 UTC (permalink / raw)
  To: Mel Gorman
  Cc: KOSAKI Motohiro, Andrew Morton, linux-mm, LKML, Ben Gamari,
	Wu Fengguang, Johannes Weiner, Nick Piggin

On Tue, Nov 30, 2010 at 09:18:22AM +0000, Mel Gorman wrote:
> On Tue, Nov 30, 2010 at 10:10:20AM +0900, KOSAKI Motohiro wrote:
> > > invalidate_mapping_pages is very big hint to reclaimer.
> > > It means user doesn't want to use the page any more.
> > > So in order to prevent working set page eviction, this patch
> > > move the page into tail of inactive list by PG_reclaim.
> > > 
> > > Please, remember that pages in inactive list are working set
> > > as well as active list. If we don't move pages into inactive list's
> > > tail, pages near by tail of inactive list can be evicted although
> > > we have a big clue about useless pages. It's totally bad.
> > > 
> > > Now PG_readahead/PG_reclaim is shared.
> > > fe3cba17 added ClearPageReclaim into clear_page_dirty_for_io for
> > > preventing fast reclaiming readahead marker page.
> > > 
> > > In this series, PG_reclaim is used by invalidated page, too.
> > > If VM find the page is invalidated and it's dirty, it sets PG_reclaim
> > > to reclaim asap. Then, when the dirty page will be writeback,
> > > clear_page_dirty_for_io will clear PG_reclaim unconditionally.
> > > It disturbs this serie's goal.
> > > 
> > > I think it's okay to clear PG_readahead when the page is dirty, not
> > > writeback time. So this patch moves ClearPageReadahead.
> > > 
> > > Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
> > > Acked-by: Rik van Riel <riel@redhat.com>
> > > Cc: Wu Fengguang <fengguang.wu@intel.com>
> > > Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > > Cc: Nick Piggin <npiggin@kernel.dk>
> > > Cc: Mel Gorman <mel@csn.ul.ie>
> > 
> > I still dislike this one. I doubt this trick makes much benefit in real
> > world workload.
> > 
> 
> I would agree except as said elsewhere, it's a chicken and egg problem.
> We don't have a real world test because fadvise is not useful in its
> current iteration. I'm hoping that there will be a test comparing
> 
> rsync		on vanilla kernel
> rsync		on patched kernel
> rsync+patch	on vanilla kernel
> rsync+patch	on patched kernel
> 
> Are the results of such a test likely to happen?

Ben, could you get the rsync execution time (user/sys) and the
'cat /proc/vmstat' output before/after?
If you are busy, I will try to gather the data myself, but I will need
some time.

I expect rsync+patch on the patched kernel to show fewer allocstall and
pgscan events, and hence a faster execution time.

> 
> -- 
> Mel Gorman
> Part-time Phd Student                          Linux Technology Center
> University of Limerick                         IBM Dublin Software Lab

-- 
Kind regards,
Minchan Kim


* Re: [PATCH 2/3] Reclaim invalidated page ASAP
  2010-11-30 11:20       ` Johannes Weiner
@ 2010-11-30 14:03         ` Minchan Kim
  0 siblings, 0 replies; 22+ messages in thread
From: Minchan Kim @ 2010-11-30 14:03 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Mel Gorman, Andrew Morton, linux-mm, LKML, Ben Gamari,
	Wu Fengguang, KOSAKI Motohiro, Nick Piggin

On Tue, Nov 30, 2010 at 12:20:41PM +0100, Johannes Weiner wrote:
> On Tue, Nov 30, 2010 at 07:41:30AM +0900, Minchan Kim wrote:
> 
> > diff --git a/mm/swap.c b/mm/swap.c
> > index 19e0812..1f1f435 100644
> > --- a/mm/swap.c
> > +++ b/mm/swap.c
> > @@ -275,28 +275,51 @@ void add_page_to_unevictable_list(struct page *page)
> >   * into inative list's head. Because the VM expects the page would
> >   * be writeout by flusher. The flusher's writeout is much effective
> >   * than reclaimer's random writeout.
> > + *
> > + * If the page isn't page_mapped and dirty/writeback, the page
> > + * could reclaim asap using PG_reclaim.
> > + *
> > + * 1. active, mapped page -> none
> > + * 2. active, dirty/writeback page -> inactive, head, PG_reclaim
> > + * 3. inactive, mapped page -> none
> > + * 4. inactive, dirty/writeback page -> inactive, head, PG_reclaim
> > + * 5. Others -> none
> > + *
> > + * In 4, why it moves inactive's head, the VM expects the page would
> > + * be writeout by flusher. The flusher's writeout is much effective than
> > + * reclaimer's random writeout.
> >   */
> >  static void __lru_deactivate(struct page *page, struct zone *zone)
> >  {
> >  	int lru, file;
> > -	unsigned long vm_flags;
> > +	int active = 0;
> 
> vm_flags is never used in this series.

It's a leftover from an old version of mine that used page_referenced.

> 
> > -	if (!PageLRU(page) || !PageActive(page))
> > +	if (!PageLRU(page))
> >  		return;
> > -
> >  	/* Some processes are using the page */
> >  	if (page_mapped(page))
> >  		return;
> > -
> > -	file = page_is_file_cache(page);
> > -	lru = page_lru_base_type(page);
> > -	del_page_from_lru_list(zone, page, lru + LRU_ACTIVE);
> > -	ClearPageActive(page);
> > -	ClearPageReferenced(page);
> > -	add_page_to_lru_list(zone, page, lru);
> > -	__count_vm_event(PGDEACTIVATE);
> > -
> > -	update_page_reclaim_stat(zone, page, file, 0);
> > +	if (PageActive(page))
> > +		active = 1;
> 
> 	active = PageActive(page)

Will fix. 

> 
> > +	if (PageWriteback(page) || PageDirty(page)) {
> > +		/*
> > +		 * PG_reclaim could be raced with end_page_writeback
> > +		 * It can make readahead confusing.  But race window
> > +		 * is _really_ small and  it's non-critical problem.
> > +		 */
> > +		SetPageReclaim(page);
> > +
> > +		file = page_is_file_cache(page);
> > +		lru = page_lru_base_type(page);
> > +		del_page_from_lru_list(zone, page, lru + active);
> > +		ClearPageActive(page);
> > +		ClearPageReferenced(page);
> > +		add_page_to_lru_list(zone, page, lru);
> > +		if (active)
> > +			__count_vm_event(PGDEACTIVATE);
> > +		update_page_reclaim_stat(zone, page, file, 0);
> > +	}
> 
> If we lose the race with writeback, the completion handler won't see
> PG_reclaim, won't move the page, and we have an unwanted clean cache
> page on the active list.  Given the pagevec caching of those pages it
> could be rather likely that IO completes before the above executes.
> 
> Shouldn't this be
> 
> 	if (PageWriteback() || PageDirty()) {
> 		SetPageReclaim()
> 		move_to_inactive_head()
> 	} else {
> 		move_to_inactive_tail()
> 	}
> 
> instead?

Fair enough.

Thanks, Hannes.
> 
> 	Hannes

-- 
Kind regards,
Minchan Kim


* Re: [PATCH 2/3] Reclaim invalidated page ASAP
  2010-11-30  9:16       ` Mel Gorman
@ 2010-11-30 14:04         ` Minchan Kim
  0 siblings, 0 replies; 22+ messages in thread
From: Minchan Kim @ 2010-11-30 14:04 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, linux-mm, LKML, Ben Gamari, Wu Fengguang,
	KOSAKI Motohiro, Johannes Weiner, Nick Piggin

On Tue, Nov 30, 2010 at 09:16:18AM +0000, Mel Gorman wrote:
> On Tue, Nov 30, 2010 at 07:41:30AM +0900, Minchan Kim wrote:
> > On Mon, Nov 29, 2010 at 04:57:06PM +0000, Mel Gorman wrote:
> > > On Tue, Nov 30, 2010 at 12:23:20AM +0900, Minchan Kim wrote:
> > > > invalidate_mapping_pages is very big hint to reclaimer.
> > > > It means user doesn't want to use the page any more.
> > > > So in order to prevent working set page eviction, this patch
> > > > move the page into tail of inactive list by PG_reclaim.
> > > > 
> > > > Please, remember that pages in inactive list are working set
> > > > as well as active list. If we don't move pages into inactive list's
> > > > tail, pages near by tail of inactive list can be evicted although
> > > > we have a big clue about useless pages. It's totally bad.
> > > > 
> > > > Now PG_readahead/PG_reclaim is shared.
> > > > fe3cba17 added ClearPageReclaim into clear_page_dirty_for_io for
> > > > preventing fast reclaiming readahead marker page.
> > > > 
> > > > In this series, PG_reclaim is used by invalidated page, too.
> > > > If VM find the page is invalidated and it's dirty, it sets PG_reclaim
> > > > to reclaim asap. Then, when the dirty page will be writeback,
> > > > clear_page_dirty_for_io will clear PG_reclaim unconditionally.
> > > > It disturbs this serie's goal.
> > > > 
> > > > I think it's okay to clear PG_readahead when the page is dirty, not
> > > > writeback time. So this patch moves ClearPageReadahead.
> > > > 
> > > > Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
> > > > Acked-by: Rik van Riel <riel@redhat.com>
> > > > Cc: Wu Fengguang <fengguang.wu@intel.com>
> > > > Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > > > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > > > Cc: Nick Piggin <npiggin@kernel.dk>
> > > > Cc: Mel Gorman <mel@csn.ul.ie>
> > > > 
> > > > Changelog since v2:
> > > >  - put ClearPageReclaim in set_page_dirty - suggested by Wu.
> > > >  
> > > > Changelog since v1:
> > > >  - make the invalidated page reclaim asap - suggested by Andrew.
> > > > ---
> > > >  mm/page-writeback.c |   12 +++++++++++-
> > > >  mm/swap.c           |   48 +++++++++++++++++++++++++++++++++++-------------
> > > >  2 files changed, 46 insertions(+), 14 deletions(-)
> > > > 
> > > > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > > > index fc93802..88587a5 100644
> > > > --- a/mm/page-writeback.c
> > > > +++ b/mm/page-writeback.c
> > > > @@ -1250,6 +1250,17 @@ int set_page_dirty(struct page *page)
> > > >  {
> > > >  	struct address_space *mapping = page_mapping(page);
> > > >  
> > > > +	/*
> > > > +	 * readahead/lru_deactivate_page could remain
> > > > +	 * PG_readahead/PG_reclaim due to race with end_page_writeback
> > > > +	 * About readahead, if the page is written, the flags would be
> > > > +	 * reset. So no problem.
> > > > +	 * About lru_deactivate_page, if the page is redirty, the flag
> > > > +	 * will be reset. So no problem. but if the page is used by readahead
> > > > +	 * it will confuse readahead and  make it restart the size rampup
> > > > +	 * process. But it's a trivial problem.
> > > > +	 */
> > > > +	ClearPageReclaim(page);
> > > >  	if (likely(mapping)) {
> > > >  		int (*spd)(struct page *) = mapping->a_ops->set_page_dirty;
> > > >  #ifdef CONFIG_BLOCK
> > > > @@ -1307,7 +1318,6 @@ int clear_page_dirty_for_io(struct page *page)
> > > >  
> > > >  	BUG_ON(!PageLocked(page));
> > > >  
> > > > -	ClearPageReclaim(page);
> > > >  	if (mapping && mapping_cap_account_dirty(mapping)) {
> > > >  		/*
> > > >  		 * Yes, Virginia, this is indeed insane.
> > > > diff --git a/mm/swap.c b/mm/swap.c
> > > > index 19e0812..936b281 100644
> > > > --- a/mm/swap.c
> > > > +++ b/mm/swap.c
> > > > @@ -275,28 +275,50 @@ void add_page_to_unevictable_list(struct page *page)
> > > >   * into inative list's head. Because the VM expects the page would
> > > >   * be writeout by flusher. The flusher's writeout is much effective
> > > >   * than reclaimer's random writeout.
> > > > + *
> > > > + * If the page isn't page_mapped and dirty/writeback, the page
> > > > + * could reclaim asap using PG_reclaim.
> > > > + *
> > > > + * 1. active, mapped page -> none
> > > > + * 2. active, dirty/writeback page -> inactive, head, PG_reclaim
> > > > + * 3. inactive, mapped page -> none
> > > > + * 4. inactive, dirty/writeback page -> inactive, head, PG_reclaim
> > > > + * 5. Others -> none
> > > > + *
> > > > + * In 4, why it moves inactive's head, the VM expects the page would
> > > > + * be writeout by flusher. The flusher's writeout is much effective than
> > > > + * reclaimer's random writeout.
> > > >   */
> > > >  static void __lru_deactivate(struct page *page, struct zone *zone)
> > > >  {
> > > >  	int lru, file;
> > > > -	unsigned long vm_flags;
> > > > +	int active = 0;
> > > >  
> > > > -	if (!PageLRU(page) || !PageActive(page))
> > > > +	if (!PageLRU(page))
> > > >  		return;
> > > > -
> > > >  	/* Some processes are using the page */
> > > >  	if (page_mapped(page))
> > > >  		return;
> > > > -
> > > > -	file = page_is_file_cache(page);
> > > > -	lru = page_lru_base_type(page);
> > > > -	del_page_from_lru_list(zone, page, lru + LRU_ACTIVE);
> > > > -	ClearPageActive(page);
> > > > -	ClearPageReferenced(page);
> > > > -	add_page_to_lru_list(zone, page, lru);
> > > > -	__count_vm_event(PGDEACTIVATE);
> > > > -
> > > > -	update_page_reclaim_stat(zone, page, file, 0);
> > > > +	if (PageActive(page))
> > > > +		active = 1;
> > > > +
> > > > +	if (PageWriteback(page) || PageDirty(page)) {
> > > > +		/*
> > > > +		 * PG_reclaim could be raced with end_page_writeback
> > > > +		 * It can make readahead confusing.  But race window
> > > > +		 * is _really_ small and  it's non-critical problem.
> > > > +		 */
> > > > +		SetPageReclaim(page);
> > > > +
> > > > +		file = page_is_file_cache(page);
> > > > +		lru = page_lru_base_type(page);
> > > > +		del_page_from_lru_list(zone, page, lru + active);
> > > > +		ClearPageActive(page);
> > > > +		ClearPageReferenced(page);
> > > > +		add_page_to_lru_list(zone, page, lru);
> > > > +		__count_vm_event(PGDEACTIVATE);
> > > 
> > > You update PGDEACTIVATE whether the page was active or not.
> > 
> > My fault. 
> > Resend.
> > 
> > Subject: [PATCH v3 2/3] Reclaim invalidated page ASAP
> > 
> > invalidate_mapping_pages is very big hint to reclaimer.
> > It means user doesn't want to use the page any more.
> > So in order to prevent working set page eviction, this patch
> > move the page into tail of inactive list by PG_reclaim.
> > 
> > Please, remember that pages in inactive list are working set
> > as well as active list. If we don't move pages into inactive list's
> > tail, pages near by tail of inactive list can be evicted although
> > we have a big clue about useless pages. It's totally bad.
> > 
> > Now PG_readahead/PG_reclaim is shared.
> > fe3cba17 added ClearPageReclaim into clear_page_dirty_for_io for
> > preventing fast reclaiming readahead marker page.
> > 
> > In this series, PG_reclaim is used by invalidated page, too.
> > If VM find the page is invalidated and it's dirty, it sets PG_reclaim
> > to reclaim asap. Then, when the dirty page will be writeback,
> > clear_page_dirty_for_io will clear PG_reclaim unconditionally.
> > It disturbs this serie's goal.
> > 
> > I think it's okay to clear PG_readahead when the page is dirty, not
> > writeback time. So this patch moves ClearPageReadahead.
> > 
> > Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
> > Acked-by: Rik van Riel <riel@redhat.com>
> > Cc: Wu Fengguang <fengguang.wu@intel.com>
> > Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > Cc: Nick Piggin <npiggin@kernel.dk>
> > Cc: Mel Gorman <mel@csn.ul.ie>
> > 
> > Changelog since v2:
> >  - put ClearPageReclaim in set_page_dirty - suggested by Wu.
> > 
> > Changelog since v1:
> >  - make the invalidated page reclaim asap - suggested by Andrew.
> > ---
> >  mm/page-writeback.c |   12 +++++++++++-
> >  mm/swap.c           |   49 ++++++++++++++++++++++++++++++++++++-------------
> >  2 files changed, 47 insertions(+), 14 deletions(-)
> > 
> > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > index fc93802..88587a5 100644
> > --- a/mm/page-writeback.c
> > +++ b/mm/page-writeback.c
> > @@ -1250,6 +1250,17 @@ int set_page_dirty(struct page *page)
> >  {
> >  	struct address_space *mapping = page_mapping(page);
> >  
> > +	/*
> > +	 * readahead/lru_deactivate_page could remain
> > +	 * PG_readahead/PG_reclaim due to race with end_page_writeback
> > +	 * About readahead, if the page is written, the flags would be
> > +	 * reset. So no problem.
> > +	 * About lru_deactivate_page, if the page is redirty, the flag
> > +	 * will be reset. So no problem. but if the page is used by readahead
> > +	 * it will confuse readahead and  make it restart the size rampup
> > +	 * process. But it's a trivial problem.
> > +	 */
> > +	ClearPageReclaim(page);
> >  	if (likely(mapping)) {
> >  		int (*spd)(struct page *) = mapping->a_ops->set_page_dirty;
> >  #ifdef CONFIG_BLOCK
> > @@ -1307,7 +1318,6 @@ int clear_page_dirty_for_io(struct page *page)
> >  
> >  	BUG_ON(!PageLocked(page));
> >  
> > -	ClearPageReclaim(page);
> >  	if (mapping && mapping_cap_account_dirty(mapping)) {
> >  		/*
> >  		 * Yes, Virginia, this is indeed insane.
> > diff --git a/mm/swap.c b/mm/swap.c
> > index 19e0812..1f1f435 100644
> > --- a/mm/swap.c
> > +++ b/mm/swap.c
> > @@ -275,28 +275,51 @@ void add_page_to_unevictable_list(struct page *page)
> >   * into inative list's head. Because the VM expects the page would
> >   * be writeout by flusher. The flusher's writeout is much effective
> >   * than reclaimer's random writeout.
> > + *
> > + * If the page isn't page_mapped and dirty/writeback, the page
> > + * could reclaim asap using PG_reclaim.
> > + *
> > + * 1. active, mapped page -> none
> > + * 2. active, dirty/writeback page -> inactive, head, PG_reclaim
> > + * 3. inactive, mapped page -> none
> > + * 4. inactive, dirty/writeback page -> inactive, head, PG_reclaim
> > + * 5. Others -> none
> > + *
> > + * In 4, why it moves inactive's head, the VM expects the page would
> > + * be writeout by flusher. The flusher's writeout is much effective than
> > + * reclaimer's random writeout.
> >   */
> >  static void __lru_deactivate(struct page *page, struct zone *zone)
> >  {
> >  	int lru, file;
> > -	unsigned long vm_flags;
> > +	int active = 0;
> >  
> > -	if (!PageLRU(page) || !PageActive(page))
> > +	if (!PageLRU(page))
> >  		return;
> > -
> >  	/* Some processes are using the page */
> >  	if (page_mapped(page))
> >  		return;
> > -
> > -	file = page_is_file_cache(page);
> > -	lru = page_lru_base_type(page);
> > -	del_page_from_lru_list(zone, page, lru + LRU_ACTIVE);
> > -	ClearPageActive(page);
> > -	ClearPageReferenced(page);
> > -	add_page_to_lru_list(zone, page, lru);
> > -	__count_vm_event(PGDEACTIVATE);
> > -
> > -	update_page_reclaim_stat(zone, page, file, 0);
> > +	if (PageActive(page))
> > +		active = 1;
> > +
> 
> I should have said this the last time, but if you do another revision,
> make active a "bool". There is a very slow migration from int to bool
> in cases where it makes sense. It's not urgent though.

Okay, I will fix it.

> 
> > +	if (PageWriteback(page) || PageDirty(page)) {
> > +		/*
> > +		 * PG_reclaim could be raced with end_page_writeback
> > +		 * It can make readahead confusing.  But race window
> > +		 * is _really_ small and  it's non-critical problem.
> > +		 */
> > +		SetPageReclaim(page);
> > +
> > +		file = page_is_file_cache(page);
> > +		lru = page_lru_base_type(page);
> > +		del_page_from_lru_list(zone, page, lru + active);
> > +		ClearPageActive(page);
> > +		ClearPageReferenced(page);
> > +		add_page_to_lru_list(zone, page, lru);
> > +		if (active)
> > +			__count_vm_event(PGDEACTIVATE);
> > +		update_page_reclaim_stat(zone, page, file, 0);
> > +	}
> >  }
> >  
> 
> Whether you update active's type or not;
> 
> Acked-by: Mel Gorman <mel@csn.ul.ie>

Thanks, Mel.

> 
> -- 
> Mel Gorman
> Part-time Phd Student                          Linux Technology Center
> University of Limerick                         IBM Dublin Software Lab

-- 
Kind regards,
Minchan Kim


* Re: [PATCH 2/3] Reclaim invalidated page ASAP
  2010-11-30  9:18     ` Mel Gorman
  2010-11-30 14:01       ` Minchan Kim
@ 2010-11-30 14:11       ` Ben Gamari
  1 sibling, 0 replies; 22+ messages in thread
From: Ben Gamari @ 2010-11-30 14:11 UTC (permalink / raw)
  To: Mel Gorman, KOSAKI Motohiro
  Cc: Minchan Kim, Andrew Morton, linux-mm, LKML, Wu Fengguang,
	Johannes Weiner, Nick Piggin

On Tue, 30 Nov 2010 09:18:22 +0000, Mel Gorman <mel@csn.ul.ie> wrote:
> I would agree except as said elsewhere, it's a chicken and egg problem.
> We don't have a real world test because fadvise is not useful in its
> current iteration. I'm hoping that there will be a test comparing
> 
> rsync		on vanilla kernel
> rsync		on patched kernel
> rsync+patch	on vanilla kernel
> rsync+patch	on patched kernel
> 
> Are the results of such a test likely to happen?
> 
Yes, absolutely, although I'm sorry it has taken so long. Between
Thanksgiving and the impending end of the semester, things have been a
bit hectic. Nevertheless, I just finished putting together a script to
record some metrics from /proc/vmstat and /proc/[pid]/statm, so at this
point I'm ready to finally take some data. Any suggestions for
particular patterns to look for in the numbers, or other metrics to
record, are welcome. Cheers,

- Ben


* Re: [PATCH v3 3/3] Prevent activation of page in madvise_dontneed
  2010-11-29 15:23 ` [PATCH v3 3/3] Prevent activation of page in madvise_dontneed Minchan Kim
  2010-11-30  1:08   ` KOSAKI Motohiro
  2010-11-30 11:35   ` Johannes Weiner
@ 2010-11-30 18:34   ` Hugh Dickins
  2010-12-01  0:49     ` Minchan Kim
  2 siblings, 1 reply; 22+ messages in thread
From: Hugh Dickins @ 2010-11-30 18:34 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-mm, LKML, Ben Gamari, KOSAKI Motohiro,
	Johannes Weiner, Nick Piggin, Mel Gorman, Wu Fengguang

On Tue, 30 Nov 2010, Minchan Kim wrote:

> Now zap_pte_range alwayas activates pages which are pte_young &&
> !VM_SequentialReadHint(vma). But in case of calling MADV_DONTNEED,
> it's unnecessary since the page wouldn't use any more.
> 
> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
> Acked-by: Rik van Riel <riel@redhat.com>
> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Nick Piggin <npiggin@kernel.dk>
> Cc: Mel Gorman <mel@csn.ul.ie>
> Cc: Wu Fengguang <fengguang.wu@intel.com>
> 
> Changelog since v2:
>  - remove unnecessary description
> Changelog since v1: 
>  - change word from promote to activate
>  - add activate argument to zap_pte_range and family function
> 
> ---
>  include/linux/mm.h |    4 ++--
>  mm/madvise.c       |    4 ++--
>  mm/memory.c        |   38 +++++++++++++++++++++++---------------
>  mm/mmap.c          |    4 ++--
>  4 files changed, 29 insertions(+), 21 deletions(-)

Everyone else seems pretty happy with this, and I've not checked
at all whether it achieves your purpose; but personally I'd much
prefer a smaller patch which adds your "activate" or "ignore_references"
flag to struct zap_details, instead of passing this exceptional arg
down lots of levels.  That's precisely the purpose of zap_details,
to gather together a few things that aren't needed in the common case
(though I admit the NULL details defaulting may be ugly).
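
Roughly what I have in mind (existing fields abridged, and the flag
name is just for illustration):

	struct zap_details {
		struct address_space *check_mapping; /* Check page->mapping if set */
		pgoff_t first_index;	/* Lowest page->index to unmap */
		pgoff_t last_index;	/* Highest page->index to unmap */
		/* ... */
		bool ignore_references;	/* Don't mark_page_accessed() */
	};

Then madvise_dontneed() can pass a details with just that flag set,
and none of the zap_*_range() signatures need to change.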

Hugh


* Re: [PATCH v3 3/3] Prevent activation of page in madvise_dontneed
  2010-11-30 18:34   ` Hugh Dickins
@ 2010-12-01  0:49     ` Minchan Kim
  0 siblings, 0 replies; 22+ messages in thread
From: Minchan Kim @ 2010-12-01  0:49 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, linux-mm, LKML, Ben Gamari, KOSAKI Motohiro,
	Johannes Weiner, Nick Piggin, Mel Gorman, Wu Fengguang

Hi Hugh,

On Wed, Dec 1, 2010 at 3:34 AM, Hugh Dickins <hughd@google.com> wrote:
> On Tue, 30 Nov 2010, Minchan Kim wrote:
>
>> Now zap_pte_range always activates pages which are pte_young &&
>> !VM_SequentialReadHint(vma). But in the case of MADV_DONTNEED,
>> this is unnecessary since the page won't be used any more.
>>
>> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
>> Acked-by: Rik van Riel <riel@redhat.com>
>> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>> Cc: Nick Piggin <npiggin@kernel.dk>
>> Cc: Mel Gorman <mel@csn.ul.ie>
>> Cc: Wu Fengguang <fengguang.wu@intel.com>
>>
>> Changelog since v2:
>>  - remove unnecessary description
>> Changelog since v1:
>>  - change word from promote to activate
>>  - add activate argument to zap_pte_range and family function
>>
>> ---
>>  include/linux/mm.h |    4 ++--
>>  mm/madvise.c       |    4 ++--
>>  mm/memory.c        |   38 +++++++++++++++++++++++---------------
>>  mm/mmap.c          |    4 ++--
>>  4 files changed, 29 insertions(+), 21 deletions(-)
>
> Everyone else seems pretty happy with this, and I've not checked
> at all whether it achieves your purpose; but personally I'd much
> prefer a smaller patch which adds your "activate" or "ignore_references"
> flag to struct zap_details, instead of passing this exceptional arg
> down lots of levels.  That's precisely the purpose of zap_details,
> to gather together a few things that aren't needed in the common case
> (though I admit the NULL details defaulting may be ugly).

Before I sent the RFC, I tried that and ran into the NULL details
defaulting you mention. But it's worth looking at again.
Since nobody opposes this patch's goal, I will take some time to
fold the flag into zap_details.

Thanks, Hugh.


> Hugh
>



-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 3/3] Prevent activation of page in madvise_dontneed
  2010-11-30 11:35   ` Johannes Weiner
@ 2010-12-01  0:50     ` Minchan Kim
  0 siblings, 0 replies; 22+ messages in thread
From: Minchan Kim @ 2010-12-01  0:50 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, linux-mm, LKML, Ben Gamari, KOSAKI Motohiro,
	Nick Piggin, Mel Gorman, Wu Fengguang

On Tue, Nov 30, 2010 at 8:35 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> On Tue, Nov 30, 2010 at 12:23:21AM +0900, Minchan Kim wrote:
>> Now zap_pte_range always activates pages which are pte_young &&
>> !VM_SequentialReadHint(vma). But in the case of MADV_DONTNEED,
>> this is unnecessary since the page won't be used any more.
>>
>> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
>> Acked-by: Rik van Riel <riel@redhat.com>
>> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>> Cc: Nick Piggin <npiggin@kernel.dk>
>> Cc: Mel Gorman <mel@csn.ul.ie>
>> Cc: Wu Fengguang <fengguang.wu@intel.com>
>>
>> Changelog since v2:
>>  - remove unnecessary description
>> Changelog since v1:
>>  - change word from promote to activate
>>  - add activate argument to zap_pte_range and family function
>>
>> ---
>>  include/linux/mm.h |    4 ++--
>>  mm/madvise.c       |    4 ++--
>>  mm/memory.c        |   38 +++++++++++++++++++++++---------------
>>  mm/mmap.c          |    4 ++--
>>  4 files changed, 29 insertions(+), 21 deletions(-)
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index e097df6..6032881 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -779,11 +779,11 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
>>  int zap_vma_ptes(struct vm_area_struct *vma, unsigned long address,
>>               unsigned long size);
>>  unsigned long zap_page_range(struct vm_area_struct *vma, unsigned long address,
>> -             unsigned long size, struct zap_details *);
>> +             unsigned long size, struct zap_details *, bool activate);
>
> I would prefer naming the parameter 'ignore_references' or something
> similar, so that it reflects the immediate effect on the zappers'
> behaviour, not what mark_page_accessed() might end up doing.
>
> Other than that, the patch looks good to me.

Fair enough.
Will fix.
It may take a while until I send the next version, though.
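
Just to confirm I read you right, the result would look roughly
like this (a sketch against this v3, not final code):

        unsigned long zap_page_range(struct vm_area_struct *vma,
                        unsigned long address, unsigned long size,
                        struct zap_details *details, bool ignore_references);

with madvise_dontneed() the only caller passing true:

        /* the range is being discarded, so references carry no meaning */
        zap_page_range(vma, start, end - start, NULL, true);

and zap_pte_range() testing the flag directly:

        if (pte_young(ptent) &&
            likely(!VM_SequentialReadHint(vma)) &&
            !ignore_references)
                mark_page_accessed(page);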

Thanks, Hannes.

>
>        Hannes
>



-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2010-12-01  0:50 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-11-29 15:23 [PATCH v3 0/3] f/madivse(DONTNEED) support Minchan Kim
2010-11-29 15:23 ` [PATCH v3 1/3] deactivate invalidated pages Minchan Kim
2010-11-30  1:01   ` KOSAKI Motohiro
2010-11-30  5:22   ` Johannes Weiner
2010-11-30  6:18     ` Minchan Kim
2010-11-29 15:23 ` [PATCH 2/3] Reclaim invalidated page ASAP Minchan Kim
2010-11-29 16:57   ` Mel Gorman
2010-11-29 22:41     ` Minchan Kim
2010-11-30  9:16       ` Mel Gorman
2010-11-30 14:04         ` Minchan Kim
2010-11-30 11:20       ` Johannes Weiner
2010-11-30 14:03         ` Minchan Kim
2010-11-30  1:10   ` KOSAKI Motohiro
2010-11-30  9:18     ` Mel Gorman
2010-11-30 14:01       ` Minchan Kim
2010-11-30 14:11       ` Ben Gamari
2010-11-29 15:23 ` [PATCH v3 3/3] Prevent activation of page in madvise_dontneed Minchan Kim
2010-11-30  1:08   ` KOSAKI Motohiro
2010-11-30 11:35   ` Johannes Weiner
2010-12-01  0:50     ` Minchan Kim
2010-11-30 18:34   ` Hugh Dickins
2010-12-01  0:49     ` Minchan Kim

This is a public inbox; see mirroring instructions
for how to clone and mirror all data and code used for this inbox,
as well as URLs for NNTP newsgroup(s).