linux-mm.kvack.org archive mirror
* [PATCH v2 1/3] deactivate invalidated pages
@ 2010-11-28 15:02 Minchan Kim
  2010-11-28 15:02 ` [PATCH v2 2/3] move ClearPageReclaim Minchan Kim
                   ` (6 more replies)
  0 siblings, 7 replies; 21+ messages in thread
From: Minchan Kim @ 2010-11-28 15:02 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, LKML, Ben Gamari, Minchan Kim, Peter Zijlstra,
	Wu Fengguang, Rik van Riel, KOSAKI Motohiro, Johannes Weiner,
	Nick Piggin, Mel Gorman

This patch is based on mmotm-11-23. 

Recently, a thrashing problem was reported.
(http://marc.info/?l=rsync&m=128885034930933&w=2)
It is triggered by backup workloads (e.g. a nightly rsync).
Such a workload creates use-once pages but touches each page
twice, which promotes the pages to the active list and
results in working set eviction.

Some application developers want POSIX_FADV_NOREUSE, but
other OSes don't support it, either.
(http://marc.info/?l=linux-mm&m=128928979512086&w=2)

As another approach, application developers use POSIX_FADV_DONTNEED.
But it has a problem: if the kernel finds a page under writeback
during invalidate_mapping_pages, it can't invalidate it.
This makes the call very hard for application programmers to use,
because they always have to sync data before calling
fadvise(..., POSIX_FADV_DONTNEED) to make sure the pages are
discardable. In the end, they can't benefit from the kernel's
deferred writes, so they see a performance loss.
(http://insights.oetiker.ch/linux/fadvise.html)
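
For illustration, here is a minimal sketch (not part of this patch;
the helper and its name are hypothetical) of the pattern applications
are forced into today, using the standard sync_file_range and
posix_fadvise calls:

#define _GNU_SOURCE
#include <fcntl.h>

/* Hypothetical helper: drop a file range from the page cache. */
static void drop_cache_range(int fd, off_t off, off_t len)
{
	/*
	 * Force and wait for writeback first; otherwise DONTNEED
	 * silently skips the dirty/writeback pages.
	 */
	sync_file_range(fd, off, len,
			SYNC_FILE_RANGE_WAIT_BEFORE |
			SYNC_FILE_RANGE_WRITE |
			SYNC_FILE_RANGE_WAIT_AFTER);

	/* Only now are the pages clean and thus discardable. */
	posix_fadvise(fd, off, len, POSIX_FADV_DONTNEED);
}

Waiting for writeback like this is exactly how applications lose
the benefit of the kernel's deferred write.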

In fact, invalidation is a big hint to the reclaimer:
it means we won't use the page any more. So let's move
the page under writeback to the inactive list's head.

Why the head? A dirty/writeback page will be flushed sooner or
later anyway. This patch uses the PG_reclaim trick so that the page
is moved to the tail of the inactive list when its writeout completes.

This avoids writeout via pageout(), which is less effective than
the flusher's writeout.
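
For reference, the completion side of this trick already exists in
the kernel: end_page_writeback() rotates PG_reclaim pages to the
inactive list's tail. Roughly, from mm/filemap.c of this era
(comments added here):

void end_page_writeback(struct page *page)
{
	/*
	 * A page marked PG_reclaim is rotated to the tail of the
	 * inactive list, where the reclaimer will find it first.
	 */
	if (TestClearPageReclaim(page))
		rotate_reclaimable_page(page);

	if (!test_clear_page_writeback(page))
		BUG();

	smp_mb__after_clear_bit();
	wake_up_page(page, PG_writeback);
}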

This patch treats page_mapped(page) pages as working set, so
such a page goes to the head of the inactive list and keeps a
chance to be activated.

Originally, I reused Peter's lru_demote with some changes, so I
added his Signed-off-by.

Note:
The PG_reclaim trick on a writeback page can race with
end_page_writeback, so this patch checks PageWriteback once more.
That makes the race window really small; in theory a race remains,
but it's trivial.

Quote from fe3cba17, with some modification:
"If some page is set PG_reclaim unintentionally, it will confuse
readahead and make it restart the size rampup process. But it's a
trivial problem, and can mostly be avoided by checking
PageWriteback(page) first in readahead"

The PG_reclaim trick doesn't work for dirty pages yet, since
clear_page_dirty_for_io always clears PG_reclaim. The next patch
will fix that.

Reported-by: Ben Gamari <bgamari.foss@gmail.com>
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Mel Gorman <mel@csn.ul.ie>

Changelog since v1:
 - modify description
 - correct typo
 - add some comment
 - change deactivation policy
---
 mm/swap.c |   84 +++++++++++++++++++++++++++++++++++++++++++++---------------
 1 files changed, 63 insertions(+), 21 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index 31f5ec4..345eca1 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -268,10 +268,65 @@ void add_page_to_unevictable_list(struct page *page)
 	spin_unlock_irq(&zone->lru_lock);
 }
 
-static void __pagevec_lru_deactive(struct pagevec *pvec)
+/*
+ * This function is used by invalidate_mapping_pages.
+ * If the page can't be invalidated, this function moves the page
+ * to the inactive list's head or tail to reclaim it ASAP without
+ * evicting working set pages.
+ *
+ * PG_reclaim means that when the page's writeback completes, the
+ * page will be moved to the tail of the inactive list for ASAP reclaim.
+ *
+ * 1. active, mapped page -> inactive, head
+ * 2. active, dirty/writeback page -> inactive, head, PG_reclaim
+ * 3. inactive, mapped page -> none
+ * 4. inactive, dirty/writeback page -> inactive, head, PG_reclaim
+ * 5. others -> none
+ *
+ * In case 4, the page is moved to the inactive list's head because
+ * the VM expects the flusher to write it out; the flusher's writeout
+ * is much more effective than the reclaimer's random writeout.
+ */
+static void __lru_deactivate(struct page *page, struct zone *zone)
 {
-	int i, lru, file;
+	int lru, file;
+	int active = 0;
+
+	if (!PageLRU(page))
+		return;
+
+	if (PageActive(page))
+		active = 1;
+	/* Some processes are using the page */
+	if (page_mapped(page) && !active)
+		return;
+
+	else if (PageWriteback(page)) {
+		SetPageReclaim(page);
+		/* Check race with end_page_writeback */
+		if (!PageWriteback(page))
+			ClearPageReclaim(page);
+	} else if (PageDirty(page))
+		SetPageReclaim(page);
+
+	file = page_is_file_cache(page);
+	lru = page_lru_base_type(page);
+	del_page_from_lru_list(zone, page, lru + active);
+	ClearPageActive(page);
+	ClearPageReferenced(page);
+	add_page_to_lru_list(zone, page, lru);
+	if (active)
+		__count_vm_event(PGDEACTIVATE);
+
+	update_page_reclaim_stat(zone, page, file, 0);
+}
 
+/*
+ * This function must be called with preemption disabled.
+ */
+static void __pagevec_lru_deactivate(struct pagevec *pvec)
+{
+	int i;
 	struct zone *zone = NULL;
 
 	for (i = 0; i < pagevec_count(pvec); i++) {
@@ -284,21 +339,7 @@ static void __pagevec_lru_deactive(struct pagevec *pvec)
 			zone = pagezone;
 			spin_lock_irq(&zone->lru_lock);
 		}
-
-		if (PageLRU(page)) {
-			if (PageActive(page)) {
-				file = page_is_file_cache(page);
-				lru = page_lru_base_type(page);
-				del_page_from_lru_list(zone, page,
-						lru + LRU_ACTIVE);
-				ClearPageActive(page);
-				ClearPageReferenced(page);
-				add_page_to_lru_list(zone, page, lru);
-				__count_vm_event(PGDEACTIVATE);
-
-				update_page_reclaim_stat(zone, page, file, 0);
-			}
-		}
+		__lru_deactivate(page, zone);
 	}
 	if (zone)
 		spin_unlock_irq(&zone->lru_lock);
@@ -336,11 +377,13 @@ static void drain_cpu_pagevecs(int cpu)
 
 	pvec = &per_cpu(lru_deactivate_pvecs, cpu);
 	if (pagevec_count(pvec))
-		__pagevec_lru_deactive(pvec);
+		__pagevec_lru_deactivate(pvec);
 }
 
 /*
- * Forecfully demote a page to the tail of the inactive list.
+ * Forcefully deactivate a page.
+ * This function is used to reclaim the page ASAP when it can't
+ * be invalidated because it is dirty or under writeback.
  */
 void lru_deactivate_page(struct page *page)
 {
@@ -348,12 +391,11 @@ void lru_deactivate_page(struct page *page)
 		struct pagevec *pvec = &get_cpu_var(lru_deactivate_pvecs);
 
 		if (!pagevec_add(pvec, page))
-			__pagevec_lru_deactive(pvec);
+			__pagevec_lru_deactivate(pvec);
 		put_cpu_var(lru_deactivate_pvecs);
 	}
 }
 
-
 void lru_add_drain(void)
 {
 	drain_cpu_pagevecs(get_cpu());
-- 
1.7.0.4


* [PATCH v2 2/3] move ClearPageReclaim
  2010-11-28 15:02 [PATCH v2 1/3] deactivate invalidated pages Minchan Kim
@ 2010-11-28 15:02 ` Minchan Kim
  2010-11-29  4:25   ` Rik van Riel
  2010-11-29  7:29   ` Wu Fengguang
  2010-11-28 15:02 ` [PATCH v2 3/3] Prevent promotion of page in madvise_dontneed Minchan Kim
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 21+ messages in thread
From: Minchan Kim @ 2010-11-28 15:02 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, LKML, Ben Gamari, Minchan Kim, Wu Fengguang,
	Rik van Riel, KOSAKI Motohiro, Johannes Weiner, Nick Piggin,
	Mel Gorman

fe3cba17 added ClearPageReclaim to clear_page_dirty_for_io to
prevent fast reclaiming of readahead marker pages.

In this series, PG_reclaim is used for invalidated pages, too.
If the VM finds that a page is invalidated and dirty, it sets
PG_reclaim to reclaim the page ASAP. But then, when the dirty page
enters writeback, clear_page_dirty_for_io clears PG_reclaim
unconditionally, which defeats this series' goal.

I think it's okay to clear PG_readahead when the page is dirtied,
not at writeback time. So this patch moves the ClearPageReclaim call.
This patch needs Wu's opinion.
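
(For context, not part of this patch: PG_readahead is an alias of
PG_reclaim, which is why the readahead marker and the reclaim hint
interact at all. Roughly, from include/linux/page-flags.h:)

enum pageflags {
	...
	PG_reclaim,		/* To be reclaimed asap */
	...
	/* PG_readahead shares its bit with PG_reclaim; a page cannot
	 * be a readahead marker and a reclaim candidate at once. */
	PG_readahead = PG_reclaim,
	...
};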

Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Mel Gorman <mel@csn.ul.ie>
---
 fs/buffer.c         |    1 +
 mm/page-writeback.c |    6 +++++-
 2 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 20a41c6..b920086 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -717,6 +717,7 @@ int __set_page_dirty_buffers(struct page *page)
 	int newly_dirty;
 	struct address_space *mapping = page_mapping(page);
 
+	ClearPageReclaim(page);
 	if (unlikely(!mapping))
 		return !TestSetPageDirty(page);
 
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index fc93802..962b0d8 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1146,6 +1146,7 @@ EXPORT_SYMBOL(write_one_page);
  */
 int __set_page_dirty_no_writeback(struct page *page)
 {
+	ClearPageReclaim(page);
 	if (!PageDirty(page))
 		return !TestSetPageDirty(page);
 	return 0;
@@ -1196,6 +1197,7 @@ EXPORT_SYMBOL(account_page_writeback);
  */
 int __set_page_dirty_nobuffers(struct page *page)
 {
+	ClearPageReclaim(page);
 	if (!TestSetPageDirty(page)) {
 		struct address_space *mapping = page_mapping(page);
 		struct address_space *mapping2;
@@ -1258,6 +1260,8 @@ int set_page_dirty(struct page *page)
 #endif
 		return (*spd)(page);
 	}
+
+	ClearPageReclaim(page);
 	if (!PageDirty(page)) {
 		if (!TestSetPageDirty(page))
 			return 1;
@@ -1280,6 +1284,7 @@ int set_page_dirty_lock(struct page *page)
 {
 	int ret;
 
+	ClearPageReclaim(page);
 	lock_page_nosync(page);
 	ret = set_page_dirty(page);
 	unlock_page(page);
@@ -1307,7 +1312,6 @@ int clear_page_dirty_for_io(struct page *page)
 
 	BUG_ON(!PageLocked(page));
 
-	ClearPageReclaim(page);
 	if (mapping && mapping_cap_account_dirty(mapping)) {
 		/*
 		 * Yes, Virginia, this is indeed insane.
-- 
1.7.0.4


* [PATCH v2 3/3] Prevent promotion of page in madvise_dontneed
  2010-11-28 15:02 [PATCH v2 1/3] deactivate invalidated pages Minchan Kim
  2010-11-28 15:02 ` [PATCH v2 2/3] move ClearPageReclaim Minchan Kim
@ 2010-11-28 15:02 ` Minchan Kim
  2010-11-29  4:28   ` Rik van Riel
  2010-11-28 15:13 ` [PATCH v2 1/3] deactivate invalidated pages Minchan Kim
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Minchan Kim @ 2010-11-28 15:02 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, LKML, Ben Gamari, Minchan Kim, Rik van Riel,
	KOSAKI Motohiro, Johannes Weiner, Nick Piggin, Mel Gorman,
	Wu Fengguang

Now zap_pte_range always promotes pages which are pte_young &&
!VM_SequentialReadHint(vma). But when it is called via MADV_DONTNEED,
that's unnecessary since the pages won't be used any more.

If the page is shared by other processes and it's a real working set

Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Wu Fengguang <fengguang.wu@intel.com>

Changelog since v1:
- change word from promote to activate
- add activate argument to zap_pte_range and family function
---
 include/linux/mm.h |    4 ++--
 mm/madvise.c       |    4 ++--
 mm/memory.c        |   38 +++++++++++++++++++++++---------------
 mm/mmap.c          |    4 ++--
 4 files changed, 29 insertions(+), 21 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index e097df6..6032881 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -779,11 +779,11 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 int zap_vma_ptes(struct vm_area_struct *vma, unsigned long address,
 		unsigned long size);
 unsigned long zap_page_range(struct vm_area_struct *vma, unsigned long address,
-		unsigned long size, struct zap_details *);
+		unsigned long size, struct zap_details *, bool activate);
 unsigned long unmap_vmas(struct mmu_gather **tlb,
 		struct vm_area_struct *start_vma, unsigned long start_addr,
 		unsigned long end_addr, unsigned long *nr_accounted,
-		struct zap_details *);
+		struct zap_details *, bool activate);
 
 /**
  * mm_walk - callbacks for walk_page_range
diff --git a/mm/madvise.c b/mm/madvise.c
index 319528b..8bc4b2d 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -171,9 +171,9 @@ static long madvise_dontneed(struct vm_area_struct * vma,
 			.nonlinear_vma = vma,
 			.last_index = ULONG_MAX,
 		};
-		zap_page_range(vma, start, end - start, &details);
+		zap_page_range(vma, start, end - start, &details, false);
 	} else
-		zap_page_range(vma, start, end - start, NULL);
+		zap_page_range(vma, start, end - start, NULL, false);
 	return 0;
 }
 
diff --git a/mm/memory.c b/mm/memory.c
index 2c989f3..249e23a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -891,7 +891,8 @@ int copy_page_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 static unsigned long zap_pte_range(struct mmu_gather *tlb,
 				struct vm_area_struct *vma, pmd_t *pmd,
 				unsigned long addr, unsigned long end,
-				long *zap_work, struct zap_details *details)
+				long *zap_work, struct zap_details *details,
+				bool activate)
 {
 	struct mm_struct *mm = tlb->mm;
 	pte_t *pte;
@@ -949,7 +950,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 				if (pte_dirty(ptent))
 					set_page_dirty(page);
 				if (pte_young(ptent) &&
-				    likely(!VM_SequentialReadHint(vma)))
+				    likely(!VM_SequentialReadHint(vma)) &&
+					activate)
 					mark_page_accessed(page);
 				rss[MM_FILEPAGES]--;
 			}
@@ -989,7 +991,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 static inline unsigned long zap_pmd_range(struct mmu_gather *tlb,
 				struct vm_area_struct *vma, pud_t *pud,
 				unsigned long addr, unsigned long end,
-				long *zap_work, struct zap_details *details)
+				long *zap_work, struct zap_details *details,
+				bool activate)
 {
 	pmd_t *pmd;
 	unsigned long next;
@@ -1002,7 +1005,7 @@ static inline unsigned long zap_pmd_range(struct mmu_gather *tlb,
 			continue;
 		}
 		next = zap_pte_range(tlb, vma, pmd, addr, next,
-						zap_work, details);
+					zap_work, details, activate);
 	} while (pmd++, addr = next, (addr != end && *zap_work > 0));
 
 	return addr;
@@ -1011,7 +1014,8 @@ static inline unsigned long zap_pmd_range(struct mmu_gather *tlb,
 static inline unsigned long zap_pud_range(struct mmu_gather *tlb,
 				struct vm_area_struct *vma, pgd_t *pgd,
 				unsigned long addr, unsigned long end,
-				long *zap_work, struct zap_details *details)
+				long *zap_work, struct zap_details *details,
+				bool activate)
 {
 	pud_t *pud;
 	unsigned long next;
@@ -1024,7 +1028,7 @@ static inline unsigned long zap_pud_range(struct mmu_gather *tlb,
 			continue;
 		}
 		next = zap_pmd_range(tlb, vma, pud, addr, next,
-						zap_work, details);
+					zap_work, details, activate);
 	} while (pud++, addr = next, (addr != end && *zap_work > 0));
 
 	return addr;
@@ -1033,7 +1037,8 @@ static inline unsigned long zap_pud_range(struct mmu_gather *tlb,
 static unsigned long unmap_page_range(struct mmu_gather *tlb,
 				struct vm_area_struct *vma,
 				unsigned long addr, unsigned long end,
-				long *zap_work, struct zap_details *details)
+				long *zap_work, struct zap_details *details,
+				bool activate)
 {
 	pgd_t *pgd;
 	unsigned long next;
@@ -1052,7 +1057,7 @@ static unsigned long unmap_page_range(struct mmu_gather *tlb,
 			continue;
 		}
 		next = zap_pud_range(tlb, vma, pgd, addr, next,
-						zap_work, details);
+						zap_work, details, activate);
 	} while (pgd++, addr = next, (addr != end && *zap_work > 0));
 	tlb_end_vma(tlb, vma);
 	mem_cgroup_uncharge_end();
@@ -1075,6 +1080,7 @@ static unsigned long unmap_page_range(struct mmu_gather *tlb,
  * @end_addr: virtual address at which to end unmapping
  * @nr_accounted: Place number of unmapped pages in vm-accountable vma's here
  * @details: details of nonlinear truncation or shared cache invalidation
+ * @activate: whether pages included in the vma should be activated or not
  *
  * Returns the end address of the unmapping (restart addr if interrupted).
  *
@@ -1096,7 +1102,7 @@ static unsigned long unmap_page_range(struct mmu_gather *tlb,
 unsigned long unmap_vmas(struct mmu_gather **tlbp,
 		struct vm_area_struct *vma, unsigned long start_addr,
 		unsigned long end_addr, unsigned long *nr_accounted,
-		struct zap_details *details)
+		struct zap_details *details, bool activate)
 {
 	long zap_work = ZAP_BLOCK_SIZE;
 	unsigned long tlb_start = 0;	/* For tlb_finish_mmu */
@@ -1149,8 +1155,8 @@ unsigned long unmap_vmas(struct mmu_gather **tlbp,
 
 				start = end;
 			} else
-				start = unmap_page_range(*tlbp, vma,
-						start, end, &zap_work, details);
+				start = unmap_page_range(*tlbp, vma, start,
+					end, &zap_work, details, activate);
 
 			if (zap_work > 0) {
 				BUG_ON(start != end);
@@ -1184,9 +1190,10 @@ out:
  * @address: starting address of pages to zap
  * @size: number of bytes to zap
  * @details: details of nonlinear truncation or shared cache invalidation
+ * @activate: whether pages included in the vma should be activated or not
  */
 unsigned long zap_page_range(struct vm_area_struct *vma, unsigned long address,
-		unsigned long size, struct zap_details *details)
+		unsigned long size, struct zap_details *details, bool activate)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	struct mmu_gather *tlb;
@@ -1196,7 +1203,8 @@ unsigned long zap_page_range(struct vm_area_struct *vma, unsigned long address,
 	lru_add_drain();
 	tlb = tlb_gather_mmu(mm, 0);
 	update_hiwater_rss(mm);
-	end = unmap_vmas(&tlb, vma, address, end, &nr_accounted, details);
+	end = unmap_vmas(&tlb, vma, address, end, &nr_accounted,
+		details, activate);
 	if (tlb)
 		tlb_finish_mmu(tlb, address, end);
 	return end;
@@ -1220,7 +1228,7 @@ int zap_vma_ptes(struct vm_area_struct *vma, unsigned long address,
 	if (address < vma->vm_start || address + size > vma->vm_end ||
 	    		!(vma->vm_flags & VM_PFNMAP))
 		return -1;
-	zap_page_range(vma, address, size, NULL);
+	zap_page_range(vma, address, size, NULL, false);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(zap_vma_ptes);
@@ -2481,7 +2489,7 @@ again:
 	}
 
 	restart_addr = zap_page_range(vma, start_addr,
-					end_addr - start_addr, details);
+			end_addr - start_addr, details, true);
 	need_break = need_resched() || spin_needbreak(details->i_mmap_lock);
 
 	if (restart_addr >= end_addr) {
diff --git a/mm/mmap.c b/mm/mmap.c
index b179abb..0ed5ab3 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1904,7 +1904,7 @@ static void unmap_region(struct mm_struct *mm,
 	lru_add_drain();
 	tlb = tlb_gather_mmu(mm, 0);
 	update_hiwater_rss(mm);
-	unmap_vmas(&tlb, vma, start, end, &nr_accounted, NULL);
+	unmap_vmas(&tlb, vma, start, end, &nr_accounted, NULL, true);
 	vm_unacct_memory(nr_accounted);
 	free_pgtables(tlb, vma, prev? prev->vm_end: FIRST_USER_ADDRESS,
 				 next? next->vm_start: 0);
@@ -2278,7 +2278,7 @@ void exit_mmap(struct mm_struct *mm)
 	tlb = tlb_gather_mmu(mm, 1);
 	/* update_hiwater_rss(mm) here? but nobody should be looking */
 	/* Use -1 here to ensure all VMAs in the mm are unmapped */
-	end = unmap_vmas(&tlb, vma, 0, -1, &nr_accounted, NULL);
+	end = unmap_vmas(&tlb, vma, 0, -1, &nr_accounted, NULL, true);
 	vm_unacct_memory(nr_accounted);
 
 	free_pgtables(tlb, vma, FIRST_USER_ADDRESS, 0);
-- 
1.7.0.4


* Re: [PATCH v2 1/3] deactivate invalidated pages
  2010-11-28 15:02 [PATCH v2 1/3] deactivate invalidated pages Minchan Kim
  2010-11-28 15:02 ` [PATCH v2 2/3] move ClearPageReclaim Minchan Kim
  2010-11-28 15:02 ` [PATCH v2 3/3] Prevent promotion of page in madvise_dontneed Minchan Kim
@ 2010-11-28 15:13 ` Minchan Kim
  2010-11-28 19:02 ` Rik van Riel
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Minchan Kim @ 2010-11-28 15:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, LKML, Ben Gamari, Minchan Kim, Peter Zijlstra,
	Wu Fengguang, Rik van Riel, KOSAKI Motohiro, Johannes Weiner,
	Nick Piggin, Mel Gorman

This patch is an add-on patch for mmotm-11-23.
Please read the original patch and thread.

http://marc.info/?l=linux-kernel&m=129034986927826&w=3

-- 
Kind regards,
Minchan Kim


* Re: [PATCH v2 1/3] deactivate invalidated pages
  2010-11-28 15:02 [PATCH v2 1/3] deactivate invalidated pages Minchan Kim
                   ` (2 preceding siblings ...)
  2010-11-28 15:13 ` [PATCH v2 1/3] deactivate invalidated pages Minchan Kim
@ 2010-11-28 19:02 ` Rik van Riel
  2010-11-29  0:33 ` KOSAKI Motohiro
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Rik van Riel @ 2010-11-28 19:02 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-mm, LKML, Ben Gamari, Peter Zijlstra,
	Wu Fengguang, KOSAKI Motohiro, Johannes Weiner, Nick Piggin,
	Mel Gorman

On 11/28/2010 10:02 AM, Minchan Kim wrote:
> This patch is based on mmotm-11-23.
>
> Recently, there are reported problem about thrashing.
> (http://marc.info/?l=rsync&m=128885034930933&w=2)
> It happens by backup workloads(ex, nightly rsync).
> That's because the workload makes just use-once pages
> and touches pages twice. It promotes the page into
> active list so that it results in working set page eviction.

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed


* Re: [PATCH v2 1/3] deactivate invalidated pages
  2010-11-28 15:02 [PATCH v2 1/3] deactivate invalidated pages Minchan Kim
                   ` (3 preceding siblings ...)
  2010-11-28 19:02 ` Rik van Riel
@ 2010-11-29  0:33 ` KOSAKI Motohiro
  2010-11-29  1:58   ` Ben Gamari
  2010-11-29  2:13   ` Minchan Kim
  2010-11-29  7:49 ` Wu Fengguang
  2010-11-29 12:07 ` Mel Gorman
  6 siblings, 2 replies; 21+ messages in thread
From: KOSAKI Motohiro @ 2010-11-29  0:33 UTC (permalink / raw)
  To: Minchan Kim
  Cc: kosaki.motohiro, Andrew Morton, linux-mm, LKML, Ben Gamari,
	Peter Zijlstra, Wu Fengguang, Rik van Riel, Johannes Weiner,
	Nick Piggin, Mel Gorman

> ---
>  mm/swap.c |   84 +++++++++++++++++++++++++++++++++++++++++++++---------------
>  1 files changed, 63 insertions(+), 21 deletions(-)
> 
> diff --git a/mm/swap.c b/mm/swap.c
> index 31f5ec4..345eca1 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -268,10 +268,65 @@ void add_page_to_unevictable_list(struct page *page)
>  	spin_unlock_irq(&zone->lru_lock);
>  }
>  
> -static void __pagevec_lru_deactive(struct pagevec *pvec)
> +/*
> + * This function is used by invalidate_mapping_pages.
> + * If the page can't be invalidated, this function moves the page
> + * into inative list's head or tail to reclaim ASAP and evict
> + * working set page.
> + *
> + * PG_reclaim means when the page's writeback completes, the page
> + * will move into tail of inactive for reclaiming ASAP.
> + *
> + * 1. active, mapped page -> inactive, head
> + * 2. active, dirty/writeback page -> inactive, head, PG_reclaim
> + * 3. inactive, mapped page -> none
> + * 4. inactive, dirty/writeback page -> inactive, head, PG_reclaim
> + * 5. others -> none
> + *
> + * In 4, why it moves inactive's head, the VM expects the page would
> + * be writeout by flusher. The flusher's writeout is much effective than
> + * reclaimer's random writeout.
> + */
> +static void __lru_deactivate(struct page *page, struct zone *zone)
>  {
> -	int i, lru, file;
> +	int lru, file;
> +	int active = 0;
> +
> +	if (!PageLRU(page))
> +		return;
> +
> +	if (PageActive(page))
> +		active = 1;
> +	/* Some processes are using the page */
> +	if (page_mapped(page) && !active)
> +		return;
> +
> +	else if (PageWriteback(page)) {
> +		SetPageReclaim(page);
> +		/* Check race with end_page_writeback */
> +		if (!PageWriteback(page))
> +			ClearPageReclaim(page);
> +	} else if (PageDirty(page))
> +		SetPageReclaim(page);
> +
> +	file = page_is_file_cache(page);
> +	lru = page_lru_base_type(page);
> +	del_page_from_lru_list(zone, page, lru + active);
> +	ClearPageActive(page);
> +	ClearPageReferenced(page);
> +	add_page_to_lru_list(zone, page, lru);
> +	if (active)
> +		__count_vm_event(PGDEACTIVATE);
> +
> +	update_page_reclaim_stat(zone, page, file, 0);
> +}

I don't like this change because fadvise(DONT_NEED) is a rarely used
function and this PG_reclaim trick doesn't improve things much. On
the other hand, it increases the VM state mess.

However, I haven't found any fault or reason this patch wouldn't work.



* Re: [PATCH v2 1/3] deactivate invalidated pages
  2010-11-29  0:33 ` KOSAKI Motohiro
@ 2010-11-29  1:58   ` Ben Gamari
  2010-11-29  2:13     ` KOSAKI Motohiro
  2010-11-29  2:13   ` Minchan Kim
  1 sibling, 1 reply; 21+ messages in thread
From: Ben Gamari @ 2010-11-29  1:58 UTC (permalink / raw)
  To: KOSAKI Motohiro, Minchan Kim
  Cc: Andrew Morton, linux-mm, LKML, Peter Zijlstra, Wu Fengguang,
	Rik van Riel, Johannes Weiner, Nick Piggin, Mel Gorman

On Mon, 29 Nov 2010 09:33:38 +0900 (JST), KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> > ---
> >  mm/swap.c |   84 +++++++++++++++++++++++++++++++++++++++++++++---------------
> >  1 files changed, 63 insertions(+), 21 deletions(-)
> > 
> > diff --git a/mm/swap.c b/mm/swap.c
> > index 31f5ec4..345eca1 100644
> > --- a/mm/swap.c
> > +++ b/mm/swap.c
> > @@ -268,10 +268,65 @@ void add_page_to_unevictable_list(struct page *page)
> >  	spin_unlock_irq(&zone->lru_lock);
> >  }
> >  
> > -static void __pagevec_lru_deactive(struct pagevec *pvec)
> > +/*
> > + * This function is used by invalidate_mapping_pages.
> > + * If the page can't be invalidated, this function moves the page
> > + * into inative list's head or tail to reclaim ASAP and evict
> > + * working set page.
> > + *
> > + * PG_reclaim means when the page's writeback completes, the page
> > + * will move into tail of inactive for reclaiming ASAP.
> > + *
> > + * 1. active, mapped page -> inactive, head
> > + * 2. active, dirty/writeback page -> inactive, head, PG_reclaim
> > + * 3. inactive, mapped page -> none
> > + * 4. inactive, dirty/writeback page -> inactive, head, PG_reclaim
> > + * 5. others -> none
> > + *
> > + * In 4, why it moves inactive's head, the VM expects the page would
> > + * be writeout by flusher. The flusher's writeout is much effective than
> > + * reclaimer's random writeout.
> > + */
> > +static void __lru_deactivate(struct page *page, struct zone *zone)
> >  {
> > -	int i, lru, file;
> > +	int lru, file;
> > +	int active = 0;
> > +
> > +	if (!PageLRU(page))
> > +		return;
> > +
> > +	if (PageActive(page))
> > +		active = 1;
> > +	/* Some processes are using the page */
> > +	if (page_mapped(page) && !active)
> > +		return;
> > +
> > +	else if (PageWriteback(page)) {
> > +		SetPageReclaim(page);
> > +		/* Check race with end_page_writeback */
> > +		if (!PageWriteback(page))
> > +			ClearPageReclaim(page);
> > +	} else if (PageDirty(page))
> > +		SetPageReclaim(page);
> > +
> > +	file = page_is_file_cache(page);
> > +	lru = page_lru_base_type(page);
> > +	del_page_from_lru_list(zone, page, lru + active);
> > +	ClearPageActive(page);
> > +	ClearPageReferenced(page);
> > +	add_page_to_lru_list(zone, page, lru);
> > +	if (active)
> > +		__count_vm_event(PGDEACTIVATE);
> > +
> > +	update_page_reclaim_stat(zone, page, file, 0);
> > +}
> 
> I don't like this change because fadvise(DONT_NEED) is rarely used
> function and this PG_reclaim trick doesn't improve so much. In the
> other hand, It increase VM state mess.
> 

Can we please stop appealing to this argument? The reason that
fadvise(DONT_NEED) is currently rarely employed is that the interface as
implemented now is extremely kludgey to use.

Are you proposing that this particular implementation is not worth the
mess (as opposed to putting the pages at the head of the inactive list
as done earlier) or would you rather that we simply leave DONT_NEED in
its current state? Even if today's gains aren't as great as we would
like them to be, we should still make an effort to make fadvise()
usable, if for no other reason than to encourage use in user-space so
that applications can benefit when we finally do figure out how to
properly account for the user's hints.

Cheers,

- Ben


* Re: [PATCH v2 1/3] deactivate invalidated pages
  2010-11-29  0:33 ` KOSAKI Motohiro
  2010-11-29  1:58   ` Ben Gamari
@ 2010-11-29  2:13   ` Minchan Kim
  2010-11-29  2:35     ` KOSAKI Motohiro
  1 sibling, 1 reply; 21+ messages in thread
From: Minchan Kim @ 2010-11-29  2:13 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andrew Morton, linux-mm, LKML, Ben Gamari, Peter Zijlstra,
	Wu Fengguang, Rik van Riel, Johannes Weiner, Nick Piggin,
	Mel Gorman

Hi KOSAKI,

On Mon, Nov 29, 2010 at 9:33 AM, KOSAKI Motohiro
<kosaki.motohiro@jp.fujitsu.com> wrote:
>> ---
>>  mm/swap.c |   84 +++++++++++++++++++++++++++++++++++++++++++++---------------
>>  1 files changed, 63 insertions(+), 21 deletions(-)
>>
>> diff --git a/mm/swap.c b/mm/swap.c
>> index 31f5ec4..345eca1 100644
>> --- a/mm/swap.c
>> +++ b/mm/swap.c
>> @@ -268,10 +268,65 @@ void add_page_to_unevictable_list(struct page *page)
>>       spin_unlock_irq(&zone->lru_lock);
>>  }
>>
>> -static void __pagevec_lru_deactive(struct pagevec *pvec)
>> +/*
>> + * This function is used by invalidate_mapping_pages.
>> + * If the page can't be invalidated, this function moves the page
>> + * into inative list's head or tail to reclaim ASAP and evict
>> + * working set page.
>> + *
>> + * PG_reclaim means when the page's writeback completes, the page
>> + * will move into tail of inactive for reclaiming ASAP.
>> + *
>> + * 1. active, mapped page -> inactive, head
>> + * 2. active, dirty/writeback page -> inactive, head, PG_reclaim
>> + * 3. inactive, mapped page -> none
>> + * 4. inactive, dirty/writeback page -> inactive, head, PG_reclaim
>> + * 5. others -> none
>> + *
>> + * In 4, why it moves inactive's head, the VM expects the page would
>> + * be writeout by flusher. The flusher's writeout is much effective than
>> + * reclaimer's random writeout.
>> + */
>> +static void __lru_deactivate(struct page *page, struct zone *zone)
>>  {
>> -     int i, lru, file;
>> +     int lru, file;
>> +     int active = 0;
>> +
>> +     if (!PageLRU(page))
>> +             return;
>> +
>> +     if (PageActive(page))
>> +             active = 1;
>> +     /* Some processes are using the page */
>> +     if (page_mapped(page) && !active)
>> +             return;
>> +
>> +     else if (PageWriteback(page)) {
>> +             SetPageReclaim(page);
>> +             /* Check race with end_page_writeback */
>> +             if (!PageWriteback(page))
>> +                     ClearPageReclaim(page);
>> +     } else if (PageDirty(page))
>> +             SetPageReclaim(page);
>> +
>> +     file = page_is_file_cache(page);
>> +     lru = page_lru_base_type(page);
>> +     del_page_from_lru_list(zone, page, lru + active);
>> +     ClearPageActive(page);
>> +     ClearPageReferenced(page);
>> +     add_page_to_lru_list(zone, page, lru);
>> +     if (active)
>> +             __count_vm_event(PGDEACTIVATE);
>> +
>> +     update_page_reclaim_stat(zone, page, file, 0);
>> +}
>
> I don't like this change because fadvise(DONT_NEED) is rarely used
> function and this PG_reclaim trick doesn't improve so much. In the
> other hand, It increase VM state mess.

Chicken-and-egg problem.
fadvise(DONT_NEED) is rarely used because it's hard to use
effectively. The mincore + fdatasync + fadvise sequence is very ugly.
This patch's goal is to solve that.

The PG_reclaim trick prevents working set eviction.
If you call fadvise while the invalidated pages are still dirty
in the middle of the inactive LRU, the reclaimer will evict
working set pages from the inactive LRU's tail even though there
are invalidated pages in the LRU.
That's bad.

As for the VM state mess, PG_readahead has already introduced it.
But I admit this patch could make it worse, and that's why I CCed
Wu Fengguang.

The problems it can cause are readahead confusion and working set
eviction after writeback.
I could add a ClearPageReclaim in mark_page_accessed to clear the
flag if the page is accessed during the race, but I didn't add it
in this version because I think that's a very rare case.

I don't want to add a new page flag just for this function, or
revert the patch that merged PG_readahead and PG_reclaim.


>
> However, I haven't found any fault and unworked reason in this patch.
>
Thanks for the good review, KOSAKI. :)


-- 
Kind regards,
Minchan Kim


* Re: [PATCH v2 1/3] deactivate invalidated pages
  2010-11-29  1:58   ` Ben Gamari
@ 2010-11-29  2:13     ` KOSAKI Motohiro
  2010-11-29  2:26       ` Ben Gamari
  0 siblings, 1 reply; 21+ messages in thread
From: KOSAKI Motohiro @ 2010-11-29  2:13 UTC (permalink / raw)
  To: Ben Gamari
  Cc: kosaki.motohiro, Minchan Kim, Andrew Morton, linux-mm, LKML,
	Peter Zijlstra, Wu Fengguang, Rik van Riel, Johannes Weiner,
	Nick Piggin, Mel Gorman

> > I don't like this change because fadvise(DONT_NEED) is rarely used
> > function and this PG_reclaim trick doesn't improve so much. In the
> > other hand, It increase VM state mess.
> > 
> 
> Can we please stop appealing to this argument? The reason that
> fadvise(DONT_NEED) is currently rarely employed is that the interface as
> implemented now is extremely kludgey to use.
> 
> Are you proposing that this particular implementation is not worth the
> mess (as opposed to putting the pages at the head of the inactive list
> as done earlier) or would you rather that we simply leave DONT_NEED in
> its current state? Even if today's gains aren't as great as we would
> like them to be, we should still make an effort to make fadvise()
> usable, if for no other reason than to encourage use in user-space so
> that applications can benefit when we finally do figure out how to
> properly account for the user's hints.

Hi

I'm not against the DONT_NEED feature. I only said the PG_reclaim
trick is not so effective. Every feature has its own pros and cons,
and I think the cons here are too big. Also, nobody has measured the
PG_reclaim performance gain. Did you?




* Re: [PATCH v2 1/3] deactivate invalidated pages
  2010-11-29  2:13     ` KOSAKI Motohiro
@ 2010-11-29  2:26       ` Ben Gamari
  0 siblings, 0 replies; 21+ messages in thread
From: Ben Gamari @ 2010-11-29  2:26 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Minchan Kim, Andrew Morton, linux-mm, LKML, Peter Zijlstra,
	Wu Fengguang, Rik van Riel, Johannes Weiner, Nick Piggin,
	Mel Gorman

On Mon, 29 Nov 2010 11:13:53 +0900 (JST), KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> I'm not againt DONT_NEED feature. I only said PG_reclaim trick is not
> so effective. Every feature has their own pros/cons. I think the cons
> is too big. Also, nobody have mesured PG_reclaim performance gain. Did you?
> 
Not yet. Finally back from vacation here in the States. I'll try sitting
down to put together a test tonight.

- Ben


* Re: [PATCH v2 1/3] deactivate invalidated pages
  2010-11-29  2:13   ` Minchan Kim
@ 2010-11-29  2:35     ` KOSAKI Motohiro
  0 siblings, 0 replies; 21+ messages in thread
From: KOSAKI Motohiro @ 2010-11-29  2:35 UTC (permalink / raw)
  To: Minchan Kim
  Cc: kosaki.motohiro, Andrew Morton, linux-mm, LKML, Ben Gamari,
	Peter Zijlstra, Wu Fengguang, Rik van Riel, Johannes Weiner,
	Nick Piggin, Mel Gorman

Hi

> > I don't like this change because fadvise(DONT_NEED) is rarely used
> > function and this PG_reclaim trick doesn't improve so much. In the
> > other hand, It increase VM state mess.
> 
> Chick-egg problem.
> Why fadvise(DONT_NEED) is rarely used is it's hard to use effective.
> mincore + fdatasync + fadvise series is very ugly.
> This patch's goal is to solve it.

Well, I didn't oppose your previous patch for this reason.
I think everyone agrees the mincore + fdatasync + fadvise mess is ugly.

However, I doubt the PG_reclaim trick is that effective. I mean, _if_ it's
effective, then our current streaming IO heuristics don't work very well.
That's bad, and if so, we should fix it generically. That's the reason I
prefer to use a simple add_page_to_lru_list().

Please remember why we made this change: rsync has a special IO access
pattern, so our streaming IO detection doesn't work well for it, and we
therefore decided to improve the manual knob. But why do we need completely
different behavior for the manual DONT_NEED suggestion and the automatic
DONT_NEED detection?

That's my point.


> PG_reclaim trick would prevent working set eviction.
> If you fadvise call and there are the invalidated page which are
> dirtying in middle of inactive LRU,
> reclaimer would evict working set of inactive LRU's tail even if we
> have a invalidated page in LRU.
> It's bad.
> 
> About VM state mess, PG_readahead already have done it.
> But I admit this patch could make it worse and that's why I Cced Wu Fengguang.
> 
> The problem it can make is readahead confusing and working set
> eviction after writeback.
> I can add ClearPageReclaim of mark_page_accessed for clear flag if the
> page is accessed during race.
> But I didn't add it in this version because I think it's very rare case.
> 
> I don't want to add new page flag due to this function or revert merge
> patch of (PG_readahead and PG_reclaim)
> 
> 
> >
> > However, I haven't found any fault and unworked reason in this patch.
> >
> Thanks for the good review, KOSAKI. :)
> 
> 
> -- 
> Kind regards,
> Minchan Kim




* Re: [PATCH v2 2/3] move ClearPageReclaim
  2010-11-28 15:02 ` [PATCH v2 2/3] move ClearPageReclaim Minchan Kim
@ 2010-11-29  4:25   ` Rik van Riel
  2010-11-29  7:29   ` Wu Fengguang
  1 sibling, 0 replies; 21+ messages in thread
From: Rik van Riel @ 2010-11-29  4:25 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-mm, LKML, Ben Gamari, Wu Fengguang,
	KOSAKI Motohiro, Johannes Weiner, Nick Piggin, Mel Gorman

On 11/28/2010 10:02 AM, Minchan Kim wrote:
> fe3cba17 added ClearPageReclaim into clear_page_dirty_for_io for
> preventing fast reclaiming readahead marker page.
>
> In this series, PG_reclaim is used by invalidated page, too.
> If VM find the page is invalidated and it's dirty, it sets PG_reclaim
> to reclaim asap. Then, when the dirty page will be writeback,
> clear_page_dirty_for_io will clear PG_reclaim unconditionally.
> It disturbs this serie's goal.
>
> I think it's okay to clear PG_readahead when the page is dirty, not
> writeback time. So this patch moves ClearPageReadahead.
> This patch needs Wu's opinion.

Acked-by: Rik van Riel <riel@redhat.com>


-- 
All rights reversed


* Re: [PATCH v2 3/3] Prevent promotion of page in madvise_dontneed
  2010-11-28 15:02 ` [PATCH v2 3/3] Prevent promotion of page in madvise_dontneed Minchan Kim
@ 2010-11-29  4:28   ` Rik van Riel
  2010-11-29  4:30     ` Minchan Kim
  0 siblings, 1 reply; 21+ messages in thread
From: Rik van Riel @ 2010-11-29  4:28 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-mm, LKML, Ben Gamari, KOSAKI Motohiro,
	Johannes Weiner, Nick Piggin, Mel Gorman, Wu Fengguang

On 11/28/2010 10:02 AM, Minchan Kim wrote:
> Now zap_pte_range alwayas promotes pages which are pte_young&&
> !VM_SequentialReadHint(vma). But in case of calling MADV_DONTNEED,
> it's unnecessary since the page wouldn't use any more.
>
> If the page is sharred by other processes and it's real working set

This line seems to be superfluous; I don't see any special
treatment for this case in the code.

> Signed-off-by: Minchan Kim<minchan.kim@gmail.com>
> Cc: Rik van Riel<riel@redhat.com>
> Cc: KOSAKI Motohiro<kosaki.motohiro@jp.fujitsu.com>
> Cc: Johannes Weiner<hannes@cmpxchg.org>
> Cc: Nick Piggin<npiggin@kernel.dk>
> Cc: Mel Gorman<mel@csn.ul.ie>
> Cc: Wu Fengguang<fengguang.wu@intel.com>

The patch itself looks good to me.

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed


* Re: [PATCH v2 3/3] Prevent promotion of page in madvise_dontneed
  2010-11-29  4:28   ` Rik van Riel
@ 2010-11-29  4:30     ` Minchan Kim
  0 siblings, 0 replies; 21+ messages in thread
From: Minchan Kim @ 2010-11-29  4:30 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrew Morton, linux-mm, LKML, Ben Gamari, KOSAKI Motohiro,
	Johannes Weiner, Nick Piggin, Mel Gorman, Wu Fengguang

On Mon, Nov 29, 2010 at 1:28 PM, Rik van Riel <riel@redhat.com> wrote:
> On 11/28/2010 10:02 AM, Minchan Kim wrote:
>>
>> Now zap_pte_range alwayas promotes pages which are pte_young&&
>> !VM_SequentialReadHint(vma). But in case of calling MADV_DONTNEED,
>> it's unnecessary since the page wouldn't use any more.
>>
>> If the page is sharred by other processes and it's real working set
>
> This line seems to be superfluous, I don't see any special
> treatment for this case in the code.

I should remove the lines.
It's my fault.

>
>> Signed-off-by: Minchan Kim<minchan.kim@gmail.com>
>> Cc: Rik van Riel<riel@redhat.com>
>> Cc: KOSAKI Motohiro<kosaki.motohiro@jp.fujitsu.com>
>> Cc: Johannes Weiner<hannes@cmpxchg.org>
>> Cc: Nick Piggin<npiggin@kernel.dk>
>> Cc: Mel Gorman<mel@csn.ul.ie>
>> Cc: Wu Fengguang<fengguang.wu@intel.com>
>
> The patch itself looks good to me.
>
> Acked-by: Rik van Riel <riel@redhat.com>

Thanks, Rik.
-- 
Kind regards,
Minchan Kim


* Re: [PATCH v2 2/3] move ClearPageReclaim
  2010-11-28 15:02 ` [PATCH v2 2/3] move ClearPageReclaim Minchan Kim
  2010-11-29  4:25   ` Rik van Riel
@ 2010-11-29  7:29   ` Wu Fengguang
  2010-11-29  8:16     ` Minchan Kim
  1 sibling, 1 reply; 21+ messages in thread
From: Wu Fengguang @ 2010-11-29  7:29 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-mm, LKML, Ben Gamari, Rik van Riel,
	KOSAKI Motohiro, Johannes Weiner, Nick Piggin, Mel Gorman

On Sun, Nov 28, 2010 at 11:02:56PM +0800, Minchan Kim wrote:
> fe3cba17 added ClearPageReclaim into clear_page_dirty_for_io for
> preventing fast reclaiming readahead marker page.
> 
> In this series, PG_reclaim is used by invalidated page, too.
> If VM find the page is invalidated and it's dirty, it sets PG_reclaim
> to reclaim asap. Then, when the dirty page will be writeback,
> clear_page_dirty_for_io will clear PG_reclaim unconditionally.
> It disturbs this serie's goal.
> 
> I think it's okay to clear PG_readahead when the page is dirty, not
> writeback time. So this patch moves ClearPageReadahead.
> This patch needs Wu's opinion.

It's a safe change. The possibility and consequence of the races are
both small enough. However, couldn't the patch be simplified as follows?

Thanks,
Fengguang
---

--- linux-next.orig/mm/page-writeback.c	2010-11-29 15:14:54.000000000 +0800
+++ linux-next/mm/page-writeback.c	2010-11-29 15:15:02.000000000 +0800
@@ -1330,6 +1330,7 @@ int set_page_dirty(struct page *page)
 {
 	struct address_space *mapping = page_mapping(page);
 
+	ClearPageReclaim(page);
 	if (likely(mapping)) {
 		int (*spd)(struct page *) = mapping->a_ops->set_page_dirty;
 #ifdef CONFIG_BLOCK
@@ -1387,7 +1388,6 @@ int clear_page_dirty_for_io(struct page 
 
 	BUG_ON(!PageLocked(page));
 
-	ClearPageReclaim(page);
 	if (mapping && mapping_cap_account_dirty(mapping)) {
 		/*
 		 * Yes, Virginia, this is indeed insane.


* Re: [PATCH v2 1/3] deactivate invalidated pages
  2010-11-28 15:02 [PATCH v2 1/3] deactivate invalidated pages Minchan Kim
                   ` (4 preceding siblings ...)
  2010-11-29  0:33 ` KOSAKI Motohiro
@ 2010-11-29  7:49 ` Wu Fengguang
  2010-11-29  8:09   ` Minchan Kim
  2010-11-29 12:07 ` Mel Gorman
  6 siblings, 1 reply; 21+ messages in thread
From: Wu Fengguang @ 2010-11-29  7:49 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-mm, LKML, Ben Gamari, Peter Zijlstra,
	Rik van Riel, KOSAKI Motohiro, Johannes Weiner, Nick Piggin,
	Mel Gorman

On Sun, Nov 28, 2010 at 11:02:55PM +0800, Minchan Kim wrote:
> This patch is based on mmotm-11-23. 

I cannot find __pagevec_lru_deactive() in mmotm-11-23.
Do you have any more patches?

> Recently, there are reported problem about thrashing.
> (http://marc.info/?l=rsync&m=128885034930933&w=2)
> It happens by backup workloads(ex, nightly rsync).
> That's because the workload makes just use-once pages
> and touches pages twice. It promotes the page into
> active list so that it results in working set page eviction.
> 
> Some app developer want to support POSIX_FADV_NOREUSE.
> But other OSes don't support it, either.
> (http://marc.info/?l=linux-mm&m=128928979512086&w=2)
> 
> By Other approach, app developer uses POSIX_FADV_DONTNEED.
> But it has a problem. If kernel meets page is writing
> during invalidate_mapping_pages, it can't work.
> It is very hard for application programmer to use it.
> Because they always have to sync data before calling
> fadivse(..POSIX_FADV_DONTNEED) to make sure the pages could
> be discardable. At last, they can't use deferred write of kernel
> so that they could see performance loss.
> (http://insights.oetiker.ch/linux/fadvise.html)
> 
> In fact, invalidation is very big hint to reclaimer.
> It means we don't use the page any more. So let's move
> the writing page into inactive list's head.
> 
> Why I need the page to head, Dirty/Writeback page would be flushed
> sooner or later. This patch uses trick PG_reclaim so the page would
> be moved into tail of inactive list when the page writeout completes.
> 
> It can prevent writeout of pageout which is less effective than
> flusher's writeout.
> 
> This patch considers page_mappged(page) with working set.
> So the page could leave head of inactive to get a change to activate.
> 
> Originally, I reused lru_demote of Peter with some change so added
> his Signed-off-by.
> 
> Note :
> PG_reclaim trick of writeback page could race with end_page_writeback
> so this patch check PageWriteback one more. It makes race window time
> reall small. But by theoretical, it still have a race. But it's a trivial.
> 
> Quote from fe3cba17 and some modification
> "If some page PG_reclaim unintentionally, it will confuse readahead and
> make it restart the size rampup process. But it's a trivial problem, and
> can mostly be avoided by checking PageWriteback(page) first in readahead"
> 
> PG_reclaim trick of dirty page don't work now since clear_page_dirty_for_io
> always clears PG_reclaim. Next patch will fix it.
> 
> Reported-by: Ben Gamari <bgamari.foss@gmail.com>
> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> Cc: Wu Fengguang <fengguang.wu@intel.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Nick Piggin <npiggin@kernel.dk>
> Cc: Mel Gorman <mel@csn.ul.ie>
> 
> Changelog since v1:
>  - modify description
>  - correct typo
>  - add some comment
>  - change deactivation policy
> ---
>  mm/swap.c |   84 +++++++++++++++++++++++++++++++++++++++++++++---------------
>  1 files changed, 63 insertions(+), 21 deletions(-)
> 
> diff --git a/mm/swap.c b/mm/swap.c
> index 31f5ec4..345eca1 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -268,10 +268,65 @@ void add_page_to_unevictable_list(struct page *page)
>  	spin_unlock_irq(&zone->lru_lock);
>  }
>  
> -static void __pagevec_lru_deactive(struct pagevec *pvec)
> +/*
> + * This function is used by invalidate_mapping_pages.
> + * If the page can't be invalidated, this function moves the page
> + * into inative list's head or tail to reclaim ASAP and evict
> + * working set page.
> + *
> + * PG_reclaim means when the page's writeback completes, the page
> + * will move into tail of inactive for reclaiming ASAP.
> + *
> + * 1. active, mapped page -> inactive, head
> + * 2. active, dirty/writeback page -> inactive, head, PG_reclaim
> + * 3. inactive, mapped page -> none
> + * 4. inactive, dirty/writeback page -> inactive, head, PG_reclaim
> + * 5. others -> none
> + *
> + * In 4, why it moves inactive's head, the VM expects the page would
> + * be writeout by flusher. The flusher's writeout is much effective than
> + * reclaimer's random writeout.
> + */
> +static void __lru_deactivate(struct page *page, struct zone *zone)
>  {
> -	int i, lru, file;
> +	int lru, file;
> +	int active = 0;
> +
> +	if (!PageLRU(page))
> +		return;
> +
> +	if (PageActive(page))
> +		active = 1;
> +	/* Some processes are using the page */
> +	if (page_mapped(page) && !active)
> +		return;

It's good to have such protection checks when doing heuristic demotion.
However, if it is requested explicitly by the user, I'm _much more_
inclined to act stupid & dumb and meet the user's expectation. Or will
this code be called by someone other than DONTNEED? Sorry, I don't have
the context of the full code.

> +	else if (PageWriteback(page)) {
> +		SetPageReclaim(page);
> +		/* Check race with end_page_writeback */
> +		if (!PageWriteback(page))
> +			ClearPageReclaim(page);

Does the double check help a lot?

> +	} else if (PageDirty(page))
> +		SetPageReclaim(page);

Typically there are many more dirty pages than writeback pages.
I guess this would be a good place to call bdi_start_inode_writeback(),
which was posted here: http://www.spinics.net/lists/linux-mm/msg10833.html

Thanks,
Fengguang

> +
> +	file = page_is_file_cache(page);
> +	lru = page_lru_base_type(page);
> +	del_page_from_lru_list(zone, page, lru + active);
> +	ClearPageActive(page);
> +	ClearPageReferenced(page);
> +	add_page_to_lru_list(zone, page, lru);
> +	if (active)
> +		__count_vm_event(PGDEACTIVATE);
> +
> +	update_page_reclaim_stat(zone, page, file, 0);
> +}
>  
> +/*
> + * This function must be called with preemption disable.
> + */
> +static void __pagevec_lru_deactivate(struct pagevec *pvec)
> +{
> +	int i;
>  	struct zone *zone = NULL;
>  
>  	for (i = 0; i < pagevec_count(pvec); i++) {
> @@ -284,21 +339,7 @@ static void __pagevec_lru_deactive(struct pagevec *pvec)
>  			zone = pagezone;
>  			spin_lock_irq(&zone->lru_lock);
>  		}
> -
> -		if (PageLRU(page)) {
> -			if (PageActive(page)) {
> -				file = page_is_file_cache(page);
> -				lru = page_lru_base_type(page);
> -				del_page_from_lru_list(zone, page,
> -						lru + LRU_ACTIVE);
> -				ClearPageActive(page);
> -				ClearPageReferenced(page);
> -				add_page_to_lru_list(zone, page, lru);
> -				__count_vm_event(PGDEACTIVATE);
> -
> -				update_page_reclaim_stat(zone, page, file, 0);
> -			}
> -		}
> +		__lru_deactivate(page, zone);
>  	}
>  	if (zone)
>  		spin_unlock_irq(&zone->lru_lock);
> @@ -336,11 +377,13 @@ static void drain_cpu_pagevecs(int cpu)
>  
>  	pvec = &per_cpu(lru_deactivate_pvecs, cpu);
>  	if (pagevec_count(pvec))
> -		__pagevec_lru_deactive(pvec);
> +		__pagevec_lru_deactivate(pvec);
>  }
>  
>  /*
> - * Forecfully demote a page to the tail of the inactive list.
> + * Forcefully deactivate a page.
> + * This function is used for reclaiming the page ASAP when the page
> + * can't be invalidated by Dirty/Writeback.
>   */
>  void lru_deactivate_page(struct page *page)
>  {
> @@ -348,12 +391,11 @@ void lru_deactivate_page(struct page *page)
>  		struct pagevec *pvec = &get_cpu_var(lru_deactivate_pvecs);
>  
>  		if (!pagevec_add(pvec, page))
> -			__pagevec_lru_deactive(pvec);
> +			__pagevec_lru_deactivate(pvec);
>  		put_cpu_var(lru_deactivate_pvecs);
>  	}
>  }
>  
> -
>  void lru_add_drain(void)
>  {
>  	drain_cpu_pagevecs(get_cpu());
> -- 
> 1.7.0.4


* Re: [PATCH v2 1/3] deactivate invalidated pages
  2010-11-29  7:49 ` Wu Fengguang
@ 2010-11-29  8:09   ` Minchan Kim
  0 siblings, 0 replies; 21+ messages in thread
From: Minchan Kim @ 2010-11-29  8:09 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, linux-mm, LKML, Ben Gamari, Peter Zijlstra,
	Rik van Riel, KOSAKI Motohiro, Johannes Weiner, Nick Piggin,
	Mel Gorman

Hi Wu,

On Mon, Nov 29, 2010 at 4:49 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> On Sun, Nov 28, 2010 at 11:02:55PM +0800, Minchan Kim wrote:
>> This patch is based on mmotm-11-23.
>
> I cannot find __pagevec_lru_deactive() in mmotm-11-23.
> Do you have any more patches?

Please see this patch.
http://www.spinics.net/lists/mm-commits/msg80851.html

>
>> Recently, a thrashing problem was reported
>> (http://marc.info/?l=rsync&m=128885034930933&w=2).
>> It is triggered by backup workloads (e.g. nightly rsync):
>> such a workload creates use-once pages but touches each
>> page twice, which promotes the pages to the active list
>> and ends up evicting the working set.
>>
>> Some application developers would like POSIX_FADV_NOREUSE
>> to be supported, but other OSes don't support it either.
>> (http://marc.info/?l=linux-mm&m=128928979512086&w=2)
>>
>> As another approach, application developers use
>> POSIX_FADV_DONTNEED, but it has a problem: if the kernel
>> finds a page under writeback during invalidate_mapping_pages,
>> it can't drop the page. That makes it very hard for
>> application programmers to use, because they always have to
>> sync data before calling fadvise(..., POSIX_FADV_DONTNEED)
>> to make sure the pages are discardable. In the end they
>> can't benefit from the kernel's deferred writes, so they
>> may see a performance loss.
>> (http://insights.oetiker.ch/linux/fadvise.html)
>>
>> In fact, invalidation is a big hint to the reclaimer:
>> it means we don't use the page any more. So let's move
>> a page that is still being written to the inactive list's head.
>>
>> Why the head: a dirty/writeback page will be flushed sooner
>> or later. This patch uses the PG_reclaim trick so that the
>> page is moved to the tail of the inactive list when its
>> writeout completes.
>>
>> This avoids writeout via pageout, which is less effective
>> than the flusher's writeout.
>>
>> This patch takes page_mapped(page) into account as a working
>> set hint, so such a page keeps its chance to be activated.
>>
>> Originally, I reused Peter's lru_demote with some changes,
>> so I added his Signed-off-by.
>>
>> Note:
>> The PG_reclaim trick on a writeback page can race with
>> end_page_writeback, so this patch checks PageWriteback once
>> more. That makes the race window really small; in theory a
>> race remains, but it is trivial.
>>
>> Quote from fe3cba17, with some modification:
>> "If some page gets PG_reclaim unintentionally, it will confuse
>> readahead and make it restart the size rampup process. But
>> it's a trivial problem, and can mostly be avoided by checking
>> PageWriteback(page) first in readahead"
>>
>> The PG_reclaim trick for dirty pages doesn't work yet, since
>> clear_page_dirty_for_io always clears PG_reclaim. The next
>> patch will fix it.
>>
>> Reported-by: Ben Gamari <bgamari.foss@gmail.com>
>> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
>> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
>> Cc: Wu Fengguang <fengguang.wu@intel.com>
>> Cc: Rik van Riel <riel@redhat.com>
>> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>> Cc: Nick Piggin <npiggin@kernel.dk>
>> Cc: Mel Gorman <mel@csn.ul.ie>
>>
>> Changelog since v1:
>>  - modify description
>>  - correct typo
>>  - add some comment
>>  - change deactivation policy
>> ---
>>  mm/swap.c |   84 +++++++++++++++++++++++++++++++++++++++++++++---------------
>>  1 files changed, 63 insertions(+), 21 deletions(-)
>>
>> diff --git a/mm/swap.c b/mm/swap.c
>> index 31f5ec4..345eca1 100644
>> --- a/mm/swap.c
>> +++ b/mm/swap.c
>> @@ -268,10 +268,65 @@ void add_page_to_unevictable_list(struct page *page)
>>       spin_unlock_irq(&zone->lru_lock);
>>  }
>>
>> -static void __pagevec_lru_deactive(struct pagevec *pvec)
>> +/*
>> + * This function is used by invalidate_mapping_pages.
>> + * If the page can't be invalidated, this function moves the page
>> + * to the inactive list's head or tail so that it can be reclaimed
>> + * ASAP without evicting working set pages.
>> + *
>> + * PG_reclaim means that when the page's writeback completes, the
>> + * page will be moved to the tail of the inactive list for fast
>> + * reclaim.
>> + *
>> + * 1. active, mapped page -> inactive, head
>> + * 2. active, dirty/writeback page -> inactive, head, PG_reclaim
>> + * 3. inactive, mapped page -> none
>> + * 4. inactive, dirty/writeback page -> inactive, head, PG_reclaim
>> + * 5. others -> none
>> + *
>> + * In 4, the page moves to the inactive list's head: the VM expects
>> + * the page to be written out by the flusher, whose writeout is much
>> + * more effective than the reclaimer's random writeout.
>> + */
>> +static void __lru_deactivate(struct page *page, struct zone *zone)
>>  {
>> -     int i, lru, file;
>> +     int lru, file;
>> +     int active = 0;
>> +
>> +     if (!PageLRU(page))
>> +             return;
>> +
>> +     if (PageActive(page))
>> +             active = 1;
>> +     /* Some processes are using the page */
>> +     if (page_mapped(page) && !active)
>> +             return;
>
> Such protections are good to check when doing heuristic demotion.
> However, if the demotion is requested explicitly by the user, I'm
> _much more_ inclined to act stupid&dumb and meet the user's
> expectation. Or will this code be called by something other than
> DONTNEED? Sorry, I don't have the context of the full code.

Sorry.

Yes, I expect lru_deactivate_page can be used in other places with
some modification. The first place I have in mind is here:

http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg179576.html
After I make sure this patch is effective, I will try that, too.
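
To illustrate for readers without the rest of the series: the first
user is invalidate_mapping_pages, roughly as below (a simplified,
hypothetical wrapper, not the actual mm/truncate.c loop):

	/*
	 * invalidate_inode_page() returns 0 when the page could not
	 * be dropped, e.g. because it is dirty or under writeback;
	 * that is exactly the case the new hook handles.
	 */
	static void invalidate_or_deactivate(struct page *page)
	{
		if (!invalidate_inode_page(page))
			lru_deactivate_page(page);
	}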


>
>> +     else if (PageWriteback(page)) {
>> +             SetPageReclaim(page);
>> +             /* Check race with end_page_writeback */
>> +             if (!PageWriteback(page))
>> +                     ClearPageReclaim(page);
>
> Does the double check help a lot?
>
>> +     } else if (PageDirty(page))
>> +             SetPageReclaim(page);
>
> Typically there are many more dirty pages than writeback pages.
> I guess it's a good place to call bdi_start_inode_writeback() which
> was posted here: http://www.spinics.net/lists/linux-mm/msg10833.html

It looks good to me.
It would make my code very simple.

I can use it, which means my patch would depend on your patch.
Do you have a plan to merge it?


>
> Thanks,
> Fengguang
>
>> +
>> +     file = page_is_file_cache(page);
>> +     lru = page_lru_base_type(page);
>> +     del_page_from_lru_list(zone, page, lru + active);
>> +     ClearPageActive(page);
>> +     ClearPageReferenced(page);
>> +     add_page_to_lru_list(zone, page, lru);
>> +     if (active)
>> +             __count_vm_event(PGDEACTIVATE);
>> +
>> +     update_page_reclaim_stat(zone, page, file, 0);
>> +}
>>
>> +/*
>> + * This function must be called with preemption disabled.
>> + */
>> +static void __pagevec_lru_deactivate(struct pagevec *pvec)
>> +{
>> +     int i;
>>       struct zone *zone = NULL;
>>
>>       for (i = 0; i < pagevec_count(pvec); i++) {
>> @@ -284,21 +339,7 @@ static void __pagevec_lru_deactive(struct pagevec *pvec)
>>                       zone = pagezone;
>>                       spin_lock_irq(&zone->lru_lock);
>>               }
>> -
>> -             if (PageLRU(page)) {
>> -                     if (PageActive(page)) {
>> -                             file = page_is_file_cache(page);
>> -                             lru = page_lru_base_type(page);
>> -                             del_page_from_lru_list(zone, page,
>> -                                             lru + LRU_ACTIVE);
>> -                             ClearPageActive(page);
>> -                             ClearPageReferenced(page);
>> -                             add_page_to_lru_list(zone, page, lru);
>> -                             __count_vm_event(PGDEACTIVATE);
>> -
>> -                             update_page_reclaim_stat(zone, page, file, 0);
>> -                     }
>> -             }
>> +             __lru_deactivate(page, zone);
>>       }
>>       if (zone)
>>               spin_unlock_irq(&zone->lru_lock);
>> @@ -336,11 +377,13 @@ static void drain_cpu_pagevecs(int cpu)
>>
>>       pvec = &per_cpu(lru_deactivate_pvecs, cpu);
>>       if (pagevec_count(pvec))
>> -             __pagevec_lru_deactive(pvec);
>> +             __pagevec_lru_deactivate(pvec);
>>  }
>>
>>  /*
>> - * Forecfully demote a page to the tail of the inactive list.
>> + * Forcefully deactivate a page.
>> + * This function is used to reclaim the page ASAP when it can't
>> + * be invalidated because it is dirty or under writeback.
>>   */
>>  void lru_deactivate_page(struct page *page)
>>  {
>> @@ -348,12 +391,11 @@ void lru_deactivate_page(struct page *page)
>>               struct pagevec *pvec = &get_cpu_var(lru_deactivate_pvecs);
>>
>>               if (!pagevec_add(pvec, page))
>> -                     __pagevec_lru_deactive(pvec);
>> +                     __pagevec_lru_deactivate(pvec);
>>               put_cpu_var(lru_deactivate_pvecs);
>>       }
>>  }
>>
>> -
>>  void lru_add_drain(void)
>>  {
>>       drain_cpu_pagevecs(get_cpu());
>> --
>> 1.7.0.4
>



-- 
Kind regards,
Minchan Kim


* Re: [PATCH v2 2/3] move ClearPageReclaim
  2010-11-29  7:29   ` Wu Fengguang
@ 2010-11-29  8:16     ` Minchan Kim
  2010-11-29  8:26       ` Wu Fengguang
  0 siblings, 1 reply; 21+ messages in thread
From: Minchan Kim @ 2010-11-29  8:16 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, linux-mm, LKML, Ben Gamari, Rik van Riel,
	KOSAKI Motohiro, Johannes Weiner, Nick Piggin, Mel Gorman

On Mon, Nov 29, 2010 at 4:29 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> On Sun, Nov 28, 2010 at 11:02:56PM +0800, Minchan Kim wrote:
>> fe3cba17 added ClearPageReclaim to clear_page_dirty_for_io to
>> prevent fast reclaim of readahead marker pages.
>>
>> In this series, PG_reclaim is used for invalidated pages, too.
>> If the VM finds that a page is invalidated and dirty, it sets
>> PG_reclaim to reclaim it asap. But when the dirty page enters
>> writeback, clear_page_dirty_for_io clears PG_reclaim
>> unconditionally, which defeats this series' goal.
>>
>> I think it's okay to clear PG_reclaim when the page is dirtied,
>> not at writeback time. So this patch moves ClearPageReclaim.
>> This patch needs Wu's opinion.
>
> It's a safe change. The possibility and consequence of races are both
> small enough. However the patch could be simplified as follows?

If all filesystems use it, I don't mind.
But do all filesystems call set_page_dirty when a page is dirtied?
I wasn't sure about that (it's why I added the Cc. :)
If it doesn't cause a problem, I hope we go this way.

Thanks, Wu.

>
> Thanks,
> Fengguang
> ---
>
> --- linux-next.orig/mm/page-writeback.c 2010-11-29 15:14:54.000000000 +0800
> +++ linux-next/mm/page-writeback.c      2010-11-29 15:15:02.000000000 +0800
> @@ -1330,6 +1330,7 @@ int set_page_dirty(struct page *page)
>  {
>        struct address_space *mapping = page_mapping(page);
>
> +       ClearPageReclaim(page);
>        if (likely(mapping)) {
>                int (*spd)(struct page *) = mapping->a_ops->set_page_dirty;
>  #ifdef CONFIG_BLOCK
> @@ -1387,7 +1388,6 @@ int clear_page_dirty_for_io(struct page
>
>        BUG_ON(!PageLocked(page));
>
> -       ClearPageReclaim(page);
>        if (mapping && mapping_cap_account_dirty(mapping)) {
>                /*
>                 * Yes, Virginia, this is indeed insane.
>



-- 
Kind regards,
Minchan Kim


* Re: [PATCH v2 2/3] move ClearPageReclaim
  2010-11-29  8:16     ` Minchan Kim
@ 2010-11-29  8:26       ` Wu Fengguang
  0 siblings, 0 replies; 21+ messages in thread
From: Wu Fengguang @ 2010-11-29  8:26 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-mm, LKML, Ben Gamari, Rik van Riel,
	KOSAKI Motohiro, Johannes Weiner, Nick Piggin, Mel Gorman

On Mon, Nov 29, 2010 at 04:16:01PM +0800, Minchan Kim wrote:
> On Mon, Nov 29, 2010 at 4:29 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> > On Sun, Nov 28, 2010 at 11:02:56PM +0800, Minchan Kim wrote:
> >> fe3cba17 added ClearPageReclaim to clear_page_dirty_for_io to
> >> prevent fast reclaim of readahead marker pages.
> >>
> >> In this series, PG_reclaim is used for invalidated pages, too.
> >> If the VM finds that a page is invalidated and dirty, it sets
> >> PG_reclaim to reclaim it asap. But when the dirty page enters
> >> writeback, clear_page_dirty_for_io clears PG_reclaim
> >> unconditionally, which defeats this series' goal.
> >>
> >> I think it's okay to clear PG_reclaim when the page is dirtied,
> >> not at writeback time. So this patch moves ClearPageReclaim.
> >> This patch needs Wu's opinion.
> >
> > It's a safe change. The possibility and consequence of races are both
> > small enough. However the patch could be simplified as follows?
> 
> If all filesystems use it, I don't mind.
> But do all filesystems call set_page_dirty when a page is dirtied?
> I wasn't sure about that (it's why I added the Cc. :)
> If it doesn't cause a problem, I hope we go this way.

Please double check, but here are my findings:

__set_page_dirty_buffers() is called by several fs' ->set_page_dirty()
which are all called by set_page_dirty().

set_page_dirty_lock() will call set_page_dirty().

__set_page_dirty_no_writeback(): it has no connection to
end_page_writeback(), so no need to set PG_reclaim.
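
For reference, the wrapper mentioned above is roughly the following
(a from-memory sketch of mm/page-writeback.c from this era; check the
actual tree for the exact locking primitive):

	int set_page_dirty_lock(struct page *page)
	{
		int ret;

		lock_page_nosync(page);
		ret = set_page_dirty(page);
		unlock_page(page);
		return ret;
	}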

Thanks,
Fengguang


> > --- linux-next.orig/mm/page-writeback.c 2010-11-29 15:14:54.000000000 +0800
> > +++ linux-next/mm/page-writeback.c      2010-11-29 15:15:02.000000000 +0800
> > @@ -1330,6 +1330,7 @@ int set_page_dirty(struct page *page)
> >  {
> >  	struct address_space *mapping = page_mapping(page);
> >
> > +	ClearPageReclaim(page);
> >  	if (likely(mapping)) {
> >  		int (*spd)(struct page *) = mapping->a_ops->set_page_dirty;
> >  #ifdef CONFIG_BLOCK
> > @@ -1387,7 +1388,6 @@ int clear_page_dirty_for_io(struct page
> >
> >  	BUG_ON(!PageLocked(page));
> >
> > -	ClearPageReclaim(page);
> >  	if (mapping && mapping_cap_account_dirty(mapping)) {
> >  		/*
> >  		 * Yes, Virginia, this is indeed insane.
> > A  A  A  A  A  A  A  A  * Yes, Virginia, this is indeed insane.
> >
> 
> 
> 
> -- 
> Kind regards,
> Minchan Kim


* Re: [PATCH v2 1/3] deactivate invalidated pages
  2010-11-28 15:02 [PATCH v2 1/3] deactivate invalidated pages Minchan Kim
                   ` (5 preceding siblings ...)
  2010-11-29  7:49 ` Wu Fengguang
@ 2010-11-29 12:07 ` Mel Gorman
  2010-11-29 15:28   ` Minchan Kim
  6 siblings, 1 reply; 21+ messages in thread
From: Mel Gorman @ 2010-11-29 12:07 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-mm, LKML, Ben Gamari, Peter Zijlstra,
	Wu Fengguang, Rik van Riel, KOSAKI Motohiro, Johannes Weiner,
	Nick Piggin

On Mon, Nov 29, 2010 at 12:02:55AM +0900, Minchan Kim wrote:
> This patch is based on mmotm-11-23. 
> 
> Recently, a thrashing problem was reported
> (http://marc.info/?l=rsync&m=128885034930933&w=2).
> It is triggered by backup workloads (e.g. nightly rsync):
> such a workload creates use-once pages but touches each
> page twice, which promotes the pages to the active list
> and ends up evicting the working set.
>
> Some application developers would like POSIX_FADV_NOREUSE
> to be supported, but other OSes don't support it either.
> (http://marc.info/?l=linux-mm&m=128928979512086&w=2)
>
> As another approach, application developers use
> POSIX_FADV_DONTNEED, but it has a problem: if the kernel
> finds a page under writeback during invalidate_mapping_pages,
> it can't drop the page. That makes it very hard for
> application programmers to use, because they always have to
> sync data before calling fadvise(..., POSIX_FADV_DONTNEED)
> to make sure the pages are discardable. In the end they
> can't benefit from the kernel's deferred writes, so they
> may see a performance loss.
> (http://insights.oetiker.ch/linux/fadvise.html)
>
> In fact, invalidation is a big hint to the reclaimer:
> it means we don't use the page any more. So let's move
> a page that is still being written to the inactive list's head.
>
> Why the head: a dirty/writeback page will be flushed sooner
> or later. This patch uses the PG_reclaim trick so that the
> page is moved to the tail of the inactive list when its
> writeout completes.
>
> This avoids writeout via pageout, which is less effective
> than the flusher's writeout.
>
> This patch takes page_mapped(page) into account as a working
> set hint, so such a page keeps its chance to be activated.
>
> Originally, I reused Peter's lru_demote with some changes,
> so I added his Signed-off-by.
>
> Note:
> The PG_reclaim trick on a writeback page can race with
> end_page_writeback, so this patch checks PageWriteback once
> more. That makes the race window really small; in theory a
> race remains, but it is trivial.
>
> Quote from fe3cba17, with some modification:
> "If some page gets PG_reclaim unintentionally, it will confuse
> readahead and make it restart the size rampup process. But
> it's a trivial problem, and can mostly be avoided by checking
> PageWriteback(page) first in readahead"
>
> The PG_reclaim trick for dirty pages doesn't work yet, since
> clear_page_dirty_for_io always clears PG_reclaim. The next
> patch will fix it.
> 
> Reported-by: Ben Gamari <bgamari.foss@gmail.com>
> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> Cc: Wu Fengguang <fengguang.wu@intel.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Nick Piggin <npiggin@kernel.dk>
> Cc: Mel Gorman <mel@csn.ul.ie>
> 
> Changelog since v1:
>  - modify description
>  - correct typo
>  - add some comment
>  - change deactivation policy
> ---
>  mm/swap.c |   84 +++++++++++++++++++++++++++++++++++++++++++++---------------
>  1 files changed, 63 insertions(+), 21 deletions(-)
> 
> diff --git a/mm/swap.c b/mm/swap.c
> index 31f5ec4..345eca1 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -268,10 +268,65 @@ void add_page_to_unevictable_list(struct page *page)
>  	spin_unlock_irq(&zone->lru_lock);
>  }
>  
> -static void __pagevec_lru_deactive(struct pagevec *pvec)
> +/*
> + * This function is used by invalidate_mapping_pages.
> + * If the page can't be invalidated, this function moves the page
> + * to the inactive list's head or tail so that it can be reclaimed
> + * ASAP without evicting working set pages.
> + *
> + * PG_reclaim means that when the page's writeback completes, the
> + * page will be moved to the tail of the inactive list for fast
> + * reclaim.
> + *
> + * 1. active, mapped page -> inactive, head
> + * 2. active, dirty/writeback page -> inactive, head, PG_reclaim
> + * 3. inactive, mapped page -> none
> + * 4. inactive, dirty/writeback page -> inactive, head, PG_reclaim
> + * 5. others -> none
> + *
> + * In 4, the page moves to the inactive list's head: the VM expects
> + * the page to be written out by the flusher, whose writeout is much
> + * more effective than the reclaimer's random writeout.
> + */
> +static void __lru_deactivate(struct page *page, struct zone *zone)
>  {
> -	int i, lru, file;
> +	int lru, file;
> +	int active = 0;
> +
> +	if (!PageLRU(page))
> +		return;
> +
> +	if (PageActive(page))
> +		active = 1;
> +	/* Some processes are using the page */
> +	if (page_mapped(page) && !active)
> +		return;
> +

Why do we move active pages to the inactive list? I thought the decision was
that mapped pages are certainly in use so they should not be affected by
fadvise(). In contrast, I see you leave inactive pages alone.

> +	else if (PageWriteback(page)) {
> +		SetPageReclaim(page);
> +		/* Check race with end_page_writeback */
> +		if (!PageWriteback(page))
> +			ClearPageReclaim(page);

I think this is safe but the comment could be expanded to mention that
the page is locked at this point and explain how it's impossible for
PageReclaim to be set on a !PageWriteback page here.
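
A sketch of what that expanded comment might say (the wording below is
illustrative, not from the patch):

	/*
	 * The page is locked here and PG_writeback is only set while
	 * the page is locked, so writeback cannot newly begin under
	 * us.  If the re-check sees !PageWriteback, then
	 * end_page_writeback() has already run and nothing will
	 * consume PG_reclaim for this writeout; clearing it avoids
	 * leaving a stale flag that readahead could mistake for its
	 * marker.
	 */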

> +	} else if (PageDirty(page))
> +		SetPageReclaim(page);
> +
> +	file = page_is_file_cache(page);
> +	lru = page_lru_base_type(page);
> +	del_page_from_lru_list(zone, page, lru + active);
> +	ClearPageActive(page);
> +	ClearPageReferenced(page);
> +	add_page_to_lru_list(zone, page, lru);
> +	if (active)
> +		__count_vm_event(PGDEACTIVATE);
> +
> +	update_page_reclaim_stat(zone, page, file, 0);
> +}
>  
> +/*
> + * This function must be called with preemption disabled.
> + */
> +static void __pagevec_lru_deactivate(struct pagevec *pvec)
> +{
> +	int i;
>  	struct zone *zone = NULL;
>  
>  	for (i = 0; i < pagevec_count(pvec); i++) {
> @@ -284,21 +339,7 @@ static void __pagevec_lru_deactive(struct pagevec *pvec)
>  			zone = pagezone;
>  			spin_lock_irq(&zone->lru_lock);
>  		}
> -
> -		if (PageLRU(page)) {
> -			if (PageActive(page)) {
> -				file = page_is_file_cache(page);
> -				lru = page_lru_base_type(page);
> -				del_page_from_lru_list(zone, page,
> -						lru + LRU_ACTIVE);
> -				ClearPageActive(page);
> -				ClearPageReferenced(page);
> -				add_page_to_lru_list(zone, page, lru);
> -				__count_vm_event(PGDEACTIVATE);
> -
> -				update_page_reclaim_stat(zone, page, file, 0);
> -			}
> -		}
> +		__lru_deactivate(page, zone);
>  	}
>  	if (zone)
>  		spin_unlock_irq(&zone->lru_lock);
> @@ -336,11 +377,13 @@ static void drain_cpu_pagevecs(int cpu)
>  
>  	pvec = &per_cpu(lru_deactivate_pvecs, cpu);
>  	if (pagevec_count(pvec))
> -		__pagevec_lru_deactive(pvec);
> +		__pagevec_lru_deactivate(pvec);
>  }
>  
>  /*
> - * Forecfully demote a page to the tail of the inactive list.
> + * Forcefully deactivate a page.
> + * This function is used to reclaim the page ASAP when it can't
> + * be invalidated because it is dirty or under writeback.
>   */
>  void lru_deactivate_page(struct page *page)
>  {
> @@ -348,12 +391,11 @@ void lru_deactivate_page(struct page *page)
>  		struct pagevec *pvec = &get_cpu_var(lru_deactivate_pvecs);
>  
>  		if (!pagevec_add(pvec, page))
> -			__pagevec_lru_deactive(pvec);
> +			__pagevec_lru_deactivate(pvec);
>  		put_cpu_var(lru_deactivate_pvecs);
>  	}
>  }
>  
> -

Unnecessary whitespace change there.

>  void lru_add_drain(void)
>  {
>  	drain_cpu_pagevecs(get_cpu());

Functionally, I think this will work (although I'd like a clarification
on why active pages are rotated). It'd be nice if there was a test case
for this but it's a bit of a chicken-and-egg problem :/
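
A minimal userspace sketch of what such a test could drive (sizes and
the file name are arbitrary; the actual pass/fail check, e.g. via
mincore() or /proc/vmstat deltas, is left out):

	#include <fcntl.h>
	#include <unistd.h>

	int main(void)
	{
		char buf[4096] = { 0 };
		int i;
		int fd = open("testfile", O_CREAT | O_TRUNC | O_WRONLY, 0644);

		if (fd < 0)
			return 1;
		/* dirty ~100MB of page cache, rsync-style use-once data */
		for (i = 0; i < 25600; i++)
			if (write(fd, buf, sizeof(buf)) != sizeof(buf))
				return 1;
		/* no fsync: many pages are still dirty or under
		 * writeback, which is the case this patch targets */
		posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
		close(fd);
		return 0;
	}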

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


* Re: [PATCH v2 1/3] deactivate invalidated pages
  2010-11-29 12:07 ` Mel Gorman
@ 2010-11-29 15:28   ` Minchan Kim
  0 siblings, 0 replies; 21+ messages in thread
From: Minchan Kim @ 2010-11-29 15:28 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, linux-mm, LKML, Ben Gamari, Peter Zijlstra,
	Wu Fengguang, Rik van Riel, KOSAKI Motohiro, Johannes Weiner,
	Nick Piggin

Hi Mel,

On Mon, Nov 29, 2010 at 12:07:16PM +0000, Mel Gorman wrote:
> On Mon, Nov 29, 2010 at 12:02:55AM +0900, Minchan Kim wrote:
> > This patch is based on mmotm-11-23. 
> > 
> > Recently, a thrashing problem was reported
> > (http://marc.info/?l=rsync&m=128885034930933&w=2).
> > It is triggered by backup workloads (e.g. nightly rsync):
> > such a workload creates use-once pages but touches each
> > page twice, which promotes the pages to the active list
> > and ends up evicting the working set.
> >
> > Some application developers would like POSIX_FADV_NOREUSE
> > to be supported, but other OSes don't support it either.
> > (http://marc.info/?l=linux-mm&m=128928979512086&w=2)
> >
> > As another approach, application developers use
> > POSIX_FADV_DONTNEED, but it has a problem: if the kernel
> > finds a page under writeback during invalidate_mapping_pages,
> > it can't drop the page. That makes it very hard for
> > application programmers to use, because they always have to
> > sync data before calling fadvise(..., POSIX_FADV_DONTNEED)
> > to make sure the pages are discardable. In the end they
> > can't benefit from the kernel's deferred writes, so they
> > may see a performance loss.
> > (http://insights.oetiker.ch/linux/fadvise.html)
> >
> > In fact, invalidation is a big hint to the reclaimer:
> > it means we don't use the page any more. So let's move
> > a page that is still being written to the inactive list's head.
> >
> > Why the head: a dirty/writeback page will be flushed sooner
> > or later. This patch uses the PG_reclaim trick so that the
> > page is moved to the tail of the inactive list when its
> > writeout completes.
> >
> > This avoids writeout via pageout, which is less effective
> > than the flusher's writeout.
> >
> > This patch takes page_mapped(page) into account as a working
> > set hint, so such a page keeps its chance to be activated.
> >
> > Originally, I reused Peter's lru_demote with some changes,
> > so I added his Signed-off-by.
> >
> > Note:
> > The PG_reclaim trick on a writeback page can race with
> > end_page_writeback, so this patch checks PageWriteback once
> > more. That makes the race window really small; in theory a
> > race remains, but it is trivial.
> >
> > Quote from fe3cba17, with some modification:
> > "If some page gets PG_reclaim unintentionally, it will confuse
> > readahead and make it restart the size rampup process. But
> > it's a trivial problem, and can mostly be avoided by checking
> > PageWriteback(page) first in readahead"
> >
> > The PG_reclaim trick for dirty pages doesn't work yet, since
> > clear_page_dirty_for_io always clears PG_reclaim. The next
> > patch will fix it.
> > 
> > Reported-by: Ben Gamari <bgamari.foss@gmail.com>
> > Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
> > Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> > Cc: Wu Fengguang <fengguang.wu@intel.com>
> > Cc: Rik van Riel <riel@redhat.com>
> > Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > Cc: Nick Piggin <npiggin@kernel.dk>
> > Cc: Mel Gorman <mel@csn.ul.ie>
> > 
> > Changelog since v1:
> >  - modify description
> >  - correct typo
> >  - add some comment
> >  - change deactivation policy
> > ---
> >  mm/swap.c |   84 +++++++++++++++++++++++++++++++++++++++++++++---------------
> >  1 files changed, 63 insertions(+), 21 deletions(-)
> > 
> > diff --git a/mm/swap.c b/mm/swap.c
> > index 31f5ec4..345eca1 100644
> > --- a/mm/swap.c
> > +++ b/mm/swap.c
> > @@ -268,10 +268,65 @@ void add_page_to_unevictable_list(struct page *page)
> >  	spin_unlock_irq(&zone->lru_lock);
> >  }
> >  
> > -static void __pagevec_lru_deactive(struct pagevec *pvec)
> > +/*
> > + * This function is used by invalidate_mapping_pages.
> > + * If the page can't be invalidated, this function moves the page
> > + * to the inactive list's head or tail so that it can be reclaimed
> > + * ASAP without evicting working set pages.
> > + *
> > + * PG_reclaim means that when the page's writeback completes, the
> > + * page will be moved to the tail of the inactive list for fast
> > + * reclaim.
> > + *
> > + * 1. active, mapped page -> inactive, head
> > + * 2. active, dirty/writeback page -> inactive, head, PG_reclaim
> > + * 3. inactive, mapped page -> none
> > + * 4. inactive, dirty/writeback page -> inactive, head, PG_reclaim
> > + * 5. others -> none
> > + *
> > + * In 4, the page moves to the inactive list's head: the VM expects
> > + * the page to be written out by the flusher, whose writeout is much
> > + * more effective than the reclaimer's random writeout.
> > + */
> > +static void __lru_deactivate(struct page *page, struct zone *zone)
> >  {
> > -	int i, lru, file;
> > +	int lru, file;
> > +	int active = 0;
> > +
> > +	if (!PageLRU(page))
> > +		return;
> > +
> > +	if (PageActive(page))
> > +		active = 1;
> > +	/* Some processes are using the page */
> > +	if (page_mapped(page) && !active)
> > +		return;
> > +
> 
> Why do we move active pages to the inactive list? I thought the decision was
> that mapped pages are certainly in use so they should not be affected by
> fadvise(). In contrast, I see you leave inactive pages alone.

I fixed it and will resend a new version.
Please take a look at it.

> 
> > +	else if (PageWriteback(page)) {
> > +		SetPageReclaim(page);
> > +		/* Check race with end_page_writeback */
> > +		if (!PageWriteback(page))
> > +			ClearPageReclaim(page);
> 
> I think this is safe but the comment could be expanded to mention that
> the page is locked at this point and explain how it's impossible for
> PageReclaim to be set on a !PageWriteback page here.

Ditto.

> 
> > +	} else if (PageDirty(page))
> > +		SetPageReclaim(page);
> > +
> > +	file = page_is_file_cache(page);
> > +	lru = page_lru_base_type(page);
> > +	del_page_from_lru_list(zone, page, lru + active);
> > +	ClearPageActive(page);
> > +	ClearPageReferenced(page);
> > +	add_page_to_lru_list(zone, page, lru);
> > +	if (active)
> > +		__count_vm_event(PGDEACTIVATE);
> > +
> > +	update_page_reclaim_stat(zone, page, file, 0);
> > +}
> >  
> > +/*
> > + * This function must be called with preemption disabled.
> > + */
> > +static void __pagevec_lru_deactivate(struct pagevec *pvec)
> > +{
> > +	int i;
> >  	struct zone *zone = NULL;
> >  
> >  	for (i = 0; i < pagevec_count(pvec); i++) {
> > @@ -284,21 +339,7 @@ static void __pagevec_lru_deactive(struct pagevec *pvec)
> >  			zone = pagezone;
> >  			spin_lock_irq(&zone->lru_lock);
> >  		}
> > -
> > -		if (PageLRU(page)) {
> > -			if (PageActive(page)) {
> > -				file = page_is_file_cache(page);
> > -				lru = page_lru_base_type(page);
> > -				del_page_from_lru_list(zone, page,
> > -						lru + LRU_ACTIVE);
> > -				ClearPageActive(page);
> > -				ClearPageReferenced(page);
> > -				add_page_to_lru_list(zone, page, lru);
> > -				__count_vm_event(PGDEACTIVATE);
> > -
> > -				update_page_reclaim_stat(zone, page, file, 0);
> > -			}
> > -		}
> > +		__lru_deactivate(page, zone);
> >  	}
> >  	if (zone)
> >  		spin_unlock_irq(&zone->lru_lock);
> > @@ -336,11 +377,13 @@ static void drain_cpu_pagevecs(int cpu)
> >  
> >  	pvec = &per_cpu(lru_deactivate_pvecs, cpu);
> >  	if (pagevec_count(pvec))
> > -		__pagevec_lru_deactive(pvec);
> > +		__pagevec_lru_deactivate(pvec);
> >  }
> >  
> >  /*
> > - * Forecfully demote a page to the tail of the inactive list.
> > + * Forcefully deactivate a page.
> > + * This function is used to reclaim the page ASAP when it can't
> > + * be invalidated because it is dirty or under writeback.
> >   */
> >  void lru_deactivate_page(struct page *page)
> >  {
> > @@ -348,12 +391,11 @@ void lru_deactivate_page(struct page *page)
> >  		struct pagevec *pvec = &get_cpu_var(lru_deactivate_pvecs);
> >  
> >  		if (!pagevec_add(pvec, page))
> > -			__pagevec_lru_deactive(pvec);
> > +			__pagevec_lru_deactivate(pvec);
> >  		put_cpu_var(lru_deactivate_pvecs);
> >  	}
> >  }
> >  
> > -
> 
> Unnecessary whitespace change there.

Thanks. Fixed.

> 
> >  void lru_add_drain(void)
> >  {
> >  	drain_cpu_pagevecs(get_cpu());
> 
> Functionally, I think this will work (although I'd like a clarification
> on why active pages are rotated). It'd be nice if there was a test case
> for this but it's a bit of a chicken-and-egg problem :/

I hope Ben gets a good result.
Thanks. 

> 
> -- 
> Mel Gorman
> Part-time Phd Student                          Linux Technology Center
> University of Limerick                         IBM Dublin Software Lab

-- 
Kind regards,
Minchan Kim


Thread overview: 21+ messages
2010-11-28 15:02 [PATCH v2 1/3] deactivate invalidated pages Minchan Kim
2010-11-28 15:02 ` [PATCH v2 2/3] move ClearPageReclaim Minchan Kim
2010-11-29  4:25   ` Rik van Riel
2010-11-29  7:29   ` Wu Fengguang
2010-11-29  8:16     ` Minchan Kim
2010-11-29  8:26       ` Wu Fengguang
2010-11-28 15:02 ` [PATCH v2 3/3] Prevent promotion of page in madvise_dontneed Minchan Kim
2010-11-29  4:28   ` Rik van Riel
2010-11-29  4:30     ` Minchan Kim
2010-11-28 15:13 ` [PATCH v2 1/3] deactivate invalidated pages Minchan Kim
2010-11-28 19:02 ` Rik van Riel
2010-11-29  0:33 ` KOSAKI Motohiro
2010-11-29  1:58   ` Ben Gamari
2010-11-29  2:13     ` KOSAKI Motohiro
2010-11-29  2:26       ` Ben Gamari
2010-11-29  2:13   ` Minchan Kim
2010-11-29  2:35     ` KOSAKI Motohiro
2010-11-29  7:49 ` Wu Fengguang
2010-11-29  8:09   ` Minchan Kim
2010-11-29 12:07 ` Mel Gorman
2010-11-29 15:28   ` Minchan Kim
