* [PATCH]vmscan: handle underflow for get_scan_ratio
  From: Shaohua Li @ 2010-03-30  5:53 UTC
  To: linux-mm, linux-kernel
  Cc: akpm, fengguang.wu, kosaki.motohiro

Commit 84b18490d1f1bc7ed5095c929f78bc002eb70f26 introduces a regression.
With it, our tmpfs test always OOMs. The test produces a lot of rotated
anon pages, which makes percent[0] zero. percent[0] should actually be a
small but non-zero value; our integer calculation rounds it down to zero.
The commit then makes vmscan skip anon pages completely, causing the OOM.

One option is to force percent[x] to 1 when get_scan_ratio() computes it
as zero; see the patch below. Note that the offending commit changes
behavior in another way, too: without it, we scan all pages when priority
is zero, and the patch below does not restore that. I don't know whether
that needs fixing as well.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 79c8098..d5cc34e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1604,6 +1604,18 @@ static void get_scan_ratio(struct zone *zone, struct scan_control *sc,
 	/* Normalize to percentages */
 	percent[0] = 100 * ap / (ap + fp + 1);
 	percent[1] = 100 - percent[0];
+	/*
+	 * If percent[x] is small and rounded to 0, it doesn't mean we
+	 * should skip the scan. Give it at least a 1% share.
+	 */
+	if (percent[0] == 0) {
+		percent[0] = 1;
+		percent[1] = 99;
+	}
+	if (percent[1] == 0) {
+		percent[0] = 99;
+		percent[1] = 1;
+	}
 }
 
 /*
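For readers unfamiliar with the arithmetic, here is a minimal userspace
sketch of the rounding (the recent_scanned/recent_rotated values are
hypothetical, chosen to mimic the reported imbalance, and swappiness is
assumed to be the default 60):

#include <stdio.h>

int main(void)
{
	/* anon_prio = swappiness, file_prio = 200 - swappiness */
	unsigned long anon_prio = 60, file_prio = 140;
	/* Hypothetical: anon pages keep rotating, file pages almost never */
	unsigned long recent_scanned[2] = { 40000, 40000 };
	unsigned long recent_rotated[2] = { 39000, 10 };
	unsigned long ap, fp, percent[2];

	/* Same arithmetic as get_scan_ratio() in mm/vmscan.c */
	ap = (anon_prio + 1) * (recent_scanned[0] + 1);
	ap /= recent_rotated[0] + 1;
	fp = (file_prio + 1) * (recent_scanned[1] + 1);
	fp /= recent_rotated[1] + 1;

	percent[0] = 100 * ap / (ap + fp + 1);	/* integer division */
	percent[1] = 100 - percent[0];

	/* ap = 62, fp = 512740: 100 * 62 / 512803 == 0 */
	printf("percent[0]=%lu percent[1]=%lu\n", percent[0], percent[1]);
	return 0;
}

With these numbers the true anon share is about 0.012%, so the integer
division truncates it to zero and vmscan stops scanning anon entirely.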
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio
  From: KOSAKI Motohiro @ 2010-03-30  6:08 UTC
  To: Shaohua Li
  Cc: kosaki.motohiro, linux-mm, linux-kernel, akpm, fengguang.wu

Hi

> Commit 84b18490d1f1bc7ed5095c929f78bc002eb70f26 introduces a regression.
> With it, our tmpfs test always OOMs. The test produces a lot of rotated
> anon pages, which makes percent[0] zero. percent[0] should actually be a
> small but non-zero value; our integer calculation rounds it down to zero.
> The commit then makes vmscan skip anon pages completely, causing the OOM.
>
> One option is to force percent[x] to 1 when get_scan_ratio() computes it
> as zero; see the patch below. Note that the offending commit changes
> behavior in another way, too: without it, we scan all pages when priority
> is zero, and the patch below does not restore that. I don't know whether
> that needs fixing as well.

Can you please post your /proc/meminfo and a reproducer program? I'll dig
into it.

Unfortunately, this patch isn't acceptable as-is. vmscan had similar logic
in the past, and the 1% swap-out it allowed generated lots of bug reports.
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio
  From: Shaohua Li @ 2010-03-30  6:32 UTC
  To: KOSAKI Motohiro
  Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, Wu, Fengguang

On Tue, 2010-03-30 at 14:08 +0800, KOSAKI Motohiro wrote:
> Hi
>
> > Commit 84b18490d1f1bc7ed5095c929f78bc002eb70f26 introduces a regression.
> > [...]
>
> Can you please post your /proc/meminfo

Attached.

> and a reproducer program? I'll dig into it.

Our test is quite simple: mount a tmpfs twice the size of memory, store
several copies of the kernel source in it (memory size * 2 / 1G copies),
and then run a kernel build in each copy. For example, with 3G of memory
the tmpfs is 6G and holds 6 kernel copies.

> Unfortunately, this patch isn't acceptable as-is. vmscan had similar logic
> in the past, and the 1% swap-out it allowed generated lots of bug reports.

Can you elaborate on that? Completely restoring the previous behavior
(do a full scan at priority 0) would be OK for us too.

[-- Attachment: meminfo --]

MemTotal:        3078372 kB
MemFree:           34612 kB
Buffers:             960 kB
Cached:          2577644 kB
SwapCached:        24580 kB
Active:          2125712 kB
Inactive:         541972 kB
Active(anon):    2120276 kB
Inactive(anon):   534696 kB
Active(file):       5436 kB
Inactive(file):     7276 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       6297472 kB
SwapFree:        6184988 kB
Dirty:              1720 kB
Writeback:         24024 kB
AnonPages:         64888 kB
Mapped:             3996 kB
Shmem:           2566004 kB
Slab:             290416 kB
SReclaimable:     113840 kB
SUnreclaim:       176576 kB
KernelStack:        3072 kB
PageTables:         6832 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     7836656 kB
Committed_AS:    2811564 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      289724 kB
VmallocChunk:   34359444080 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       10240 kB
DirectMap2M:     3127296 kB
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio
  From: KOSAKI Motohiro @ 2010-03-30  6:40 UTC
  To: Shaohua Li
  Cc: kosaki.motohiro, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, Wu, Fengguang

> > Can you please post your /proc/meminfo
>
> Attached.
>
> > and a reproducer program? I'll dig into it.
>
> Our test is quite simple: mount a tmpfs twice the size of memory, store
> several copies of the kernel source in it (memory size * 2 / 1G copies),
> and then run a kernel build in each copy. For example, with 3G of memory
> the tmpfs is 6G and holds 6 kernel copies.

Wow, tmpfs size > memory size!

> > Unfortunately, this patch isn't acceptable as-is. vmscan had similar
> > logic in the past, and the 1% swap-out it allowed generated lots of
> > bug reports.
>
> Can you elaborate on that? Completely restoring the previous behavior
> (do a full scan at priority 0) would be OK for us too.

That is an option, but we need to know the root cause anyway; otherwise we
might reintroduce this issue in the future.
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio
  From: Shaohua Li @ 2010-03-30  6:53 UTC
  To: KOSAKI Motohiro
  Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, Wu, Fengguang

On Tue, Mar 30, 2010 at 02:40:07PM +0800, KOSAKI Motohiro wrote:
> > > Unfortunately, this patch isn't acceptable as-is. vmscan had similar
> > > logic in the past, and the 1% swap-out it allowed generated lots of
> > > bug reports.
> >
> > Can you elaborate on that? Completely restoring the previous behavior
> > (do a full scan at priority 0) would be OK for us too.
>
> That is an option, but we need to know the root cause anyway; otherwise
> we might reintroduce this issue in the future.

I thought I mentioned the root cause in the first mail: my debugging shows
recent_rotated[0] is big while recent_rotated[1] is almost zero, which
makes percent[0] zero. But please double-check.
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio
  From: KOSAKI Motohiro @ 2010-03-30  7:31 UTC
  To: Shaohua Li
  Cc: kosaki.motohiro, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, Wu, Fengguang

> I thought I mentioned the root cause in the first mail: my debugging shows
> recent_rotated[0] is big while recent_rotated[1] is almost zero, which
> makes percent[0] zero. But please double-check.

Reverting would rescue the percent[0] == 0 && priority == 0 case, but I
don't think that case should happen at all -- it implies a big latency
problem.

Can you please try the following patch? I'll also prepare a reproduction
environment soon.

---
 mm/vmscan.c |   12 ++++++++----
 1 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 79c8098..abf7f79 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1571,15 +1571,19 @@ static void get_scan_ratio(struct zone *zone, struct scan_control *sc,
 	 */
 	if (unlikely(reclaim_stat->recent_scanned[0] > anon / 4)) {
 		spin_lock_irq(&zone->lru_lock);
-		reclaim_stat->recent_scanned[0] /= 2;
-		reclaim_stat->recent_rotated[0] /= 2;
+		while (reclaim_stat->recent_scanned[0] > anon / 4) {
+			reclaim_stat->recent_scanned[0] /= 2;
+			reclaim_stat->recent_rotated[0] /= 2;
+		}
 		spin_unlock_irq(&zone->lru_lock);
 	}
 
 	if (unlikely(reclaim_stat->recent_scanned[1] > file / 4)) {
 		spin_lock_irq(&zone->lru_lock);
-		reclaim_stat->recent_scanned[1] /= 2;
-		reclaim_stat->recent_rotated[1] /= 2;
+		while (reclaim_stat->recent_scanned[1] > file / 4) {
+			reclaim_stat->recent_scanned[1] /= 2;
+			reclaim_stat->recent_rotated[1] /= 2;
+		}
 		spin_unlock_irq(&zone->lru_lock);
 	}
-- 
1.6.5.2
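For intuition, a tiny sketch (with made-up numbers) of what the loop above
changes: after a scanning burst, recent_scanned can sit far above anon / 4,
and the old one-shot halving can leave it above the threshold, so the
ap/fp ratio stays skewed:

#include <stdio.h>

/* Hypothetical numbers: a burst left recent_scanned far above anon / 4. */
int main(void)
{
	unsigned long anon = 100000;		/* anon LRU size, in pages */
	unsigned long scanned_once = 400000;	/* one-shot halving (old code) */
	unsigned long scanned_loop = 400000;	/* loop halving (patch above) */

	scanned_once /= 2;			/* 200000: still > anon / 4 */

	while (scanned_loop > anon / 4)		/* halve until at threshold */
		scanned_loop /= 2;		/* ends at 25000 */

	printf("threshold=%lu once=%lu loop=%lu\n",
	       anon / 4, scanned_once, scanned_loop);
	return 0;
}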
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio
  From: Shaohua Li @ 2010-03-30  8:13 UTC
  To: KOSAKI Motohiro
  Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, Wu, Fengguang

On Tue, Mar 30, 2010 at 03:31:40PM +0800, KOSAKI Motohiro wrote:
> Reverting would rescue the percent[0] == 0 && priority == 0 case, but I
> don't think that case should happen at all -- it implies a big latency
> problem.
>
> Can you please try the following patch? I'll also prepare a reproduction
> environment soon.

It still OOMs with the patch.
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio
  From: Shaohua Li @ 2010-03-31  4:53 UTC
  To: KOSAKI Motohiro
  Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, Wu, Fengguang

On Tue, Mar 30, 2010 at 02:08:53PM +0800, KOSAKI Motohiro wrote:
> Can you please post your /proc/meminfo and a reproducer program? I'll dig
> into it.
>
> Unfortunately, this patch isn't acceptable as-is. vmscan had similar logic
> in the past, and the 1% swap-out it allowed generated lots of bug reports.

If 1% is still too big, how about the patch below?

Commit 84b18490d1f1bc7ed5095c929f78bc002eb70f26 introduces a regression.
With it, our tmpfs test always OOMs. The test produces a lot of rotated
anon pages, which makes percent[0] zero. percent[0] should actually be a
small but non-zero value; our integer calculation rounds it down to zero,
and the commit then makes vmscan skip anon pages completely, causing the
OOM. To avoid the underflow, stop using percentages and instead directly
calculate how many pages should be scanned.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 79c8098..80a7ed5 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1519,27 +1519,50 @@ static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan,
 }
 
 /*
+ * Smallish @nr_to_scan's are deposited in @nr_saved_scan,
+ * until we collected @swap_cluster_max pages to scan.
+ */
+static unsigned long nr_scan_try_batch(unsigned long nr_to_scan,
+				       unsigned long *nr_saved_scan)
+{
+	unsigned long nr;
+
+	*nr_saved_scan += nr_to_scan;
+	nr = *nr_saved_scan;
+
+	if (nr >= SWAP_CLUSTER_MAX)
+		*nr_saved_scan = 0;
+	else
+		nr = 0;
+
+	return nr;
+}
+
+/*
  * Determine how aggressively the anon and file LRU lists should be
  * scanned.  The relative value of each set of LRU lists is determined
  * by looking at the fraction of the pages scanned we did rotate back
  * onto the active list instead of evict.
  *
- * percent[0] specifies how much pressure to put on ram/swap backed
- * memory, while percent[1] determines pressure on the file LRUs.
+ * nr[x] specifies how many pages should be scanned
  */
-static void get_scan_ratio(struct zone *zone, struct scan_control *sc,
-					unsigned long *percent)
+static void get_scan_count(struct zone *zone, struct scan_control *sc,
+					unsigned long *nr, int priority)
 {
 	unsigned long anon, file, free;
 	unsigned long anon_prio, file_prio;
 	unsigned long ap, fp;
 	struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
+	unsigned long fraction[2], denominator[2];
+	enum lru_list l;
 
 	/* If we have no swap space, do not bother scanning anon pages. */
 	if (!sc->may_swap || (nr_swap_pages <= 0)) {
-		percent[0] = 0;
-		percent[1] = 100;
-		return;
+		fraction[0] = 0;
+		denominator[0] = 1;
+		fraction[1] = 1;
+		denominator[1] = 1;
+		goto out;
 	}
 
 	anon  = zone_nr_lru_pages(zone, sc, LRU_ACTIVE_ANON) +
@@ -1552,9 +1575,11 @@ static void get_scan_ratio(struct zone *zone, struct scan_control *sc,
 		/* If we have very few page cache pages,
 		   force-scan anon pages. */
 		if (unlikely(file + free <= high_wmark_pages(zone))) {
-			percent[0] = 100;
-			percent[1] = 0;
-			return;
+			fraction[0] = 1;
+			denominator[0] = 1;
+			fraction[1] = 0;
+			denominator[1] = 1;
+			goto out;
 		}
 	}
 
@@ -1601,29 +1626,29 @@ static void get_scan_ratio(struct zone *zone, struct scan_control *sc,
 	fp = (file_prio + 1) * (reclaim_stat->recent_scanned[1] + 1);
 	fp /= reclaim_stat->recent_rotated[1] + 1;
 
-	/* Normalize to percentages */
-	percent[0] = 100 * ap / (ap + fp + 1);
-	percent[1] = 100 - percent[0];
-}
-
-/*
- * Smallish @nr_to_scan's are deposited in @nr_saved_scan,
- * until we collected @swap_cluster_max pages to scan.
- */
-static unsigned long nr_scan_try_batch(unsigned long nr_to_scan,
-				       unsigned long *nr_saved_scan)
-{
-	unsigned long nr;
+	fraction[0] = ap;
+	denominator[0] = ap + fp + 1;
+	fraction[1] = fp;
+	denominator[1] = ap + fp + 1;
 
-	*nr_saved_scan += nr_to_scan;
-	nr = *nr_saved_scan;
+out:
+	for_each_evictable_lru(l) {
+		int file = is_file_lru(l);
+		unsigned long scan;
 
-	if (nr >= SWAP_CLUSTER_MAX)
-		*nr_saved_scan = 0;
-	else
-		nr = 0;
+		if (fraction[file] == 0) {
+			nr[l] = 0;
+			continue;
+		}
 
-	return nr;
+		scan = zone_nr_lru_pages(zone, sc, l);
+		if (priority) {
+			scan >>= priority;
+			scan = (scan * fraction[file] / denominator[file]);
+		}
+		nr[l] = nr_scan_try_batch(scan,
+					  &reclaim_stat->nr_saved_scan[l]);
+	}
 }
 
 /*
@@ -1634,31 +1659,11 @@ static void shrink_zone(int priority, struct zone *zone,
 {
 	unsigned long nr[NR_LRU_LISTS];
 	unsigned long nr_to_scan;
-	unsigned long percent[2];	/* anon @ 0; file @ 1 */
 	enum lru_list l;
 	unsigned long nr_reclaimed = sc->nr_reclaimed;
 	unsigned long nr_to_reclaim = sc->nr_to_reclaim;
-	struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
-
-	get_scan_ratio(zone, sc, percent);
 
-	for_each_evictable_lru(l) {
-		int file = is_file_lru(l);
-		unsigned long scan;
-
-		if (percent[file] == 0) {
-			nr[l] = 0;
-			continue;
-		}
-
-		scan = zone_nr_lru_pages(zone, sc, l);
-		if (priority) {
-			scan >>= priority;
-			scan = (scan * percent[file]) / 100;
-		}
-		nr[l] = nr_scan_try_batch(scan,
-					  &reclaim_stat->nr_saved_scan[l]);
-	}
+	get_scan_count(zone, sc, nr, priority);
 
 	while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
 				nr[LRU_INACTIVE_FILE]) {
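To see why computing the counts directly helps, a small userspace
comparison (the ap/fp values are hypothetical, with fp/ap around 1024 as
in the reported workload): the percentage path truncates the anon share
to zero, while the fraction path still produces a non-zero scan count for
a large anon LRU:

#include <stdio.h>

int main(void)
{
	/* Hypothetical values mirroring the report: fp/ap around 1024. */
	unsigned long ap = 62, fp = 512740;
	unsigned long anon_lru = 655360;	/* ~2.5G of 4K anon pages */
	int priority = 1;			/* high reclaim pressure */

	unsigned long scan = anon_lru >> priority;

	/* Old path: normalize to a percentage first (truncates to 0). */
	unsigned long percent = 100 * ap / (ap + fp + 1);
	unsigned long scan_percent = scan * percent / 100;

	/* New path: apply the raw fraction directly. */
	unsigned long scan_fraction = scan * ap / (ap + fp + 1);

	/* prints: percent=0 -> scan=0, fraction -> scan=39 */
	printf("percent=%lu -> scan=%lu, fraction -> scan=%lu\n",
	       percent, scan_percent, scan_fraction);
	return 0;
}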
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio
  From: KOSAKI Motohiro @ 2010-03-31  5:38 UTC
  To: Shaohua Li
  Cc: kosaki.motohiro, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, Wu, Fengguang

> > Unfortunately, this patch isn't acceptable as-is. vmscan had similar
> > logic in the past, and the 1% swap-out it allowed generated lots of
> > bug reports.
>
> If 1% is still too big, how about the patch below?

This patch makes a lot more sense than the previous one. However, I think
a <1% anon ratio shouldn't happen anyway, because it means the file LRU
has no reclaimable pages; <1% seems like a poor reclaim rate.

Perhaps I'll take your patch for the stable tree, but we need to attack
the root cause; i.e., I suspect we need to fix the scan-ratio equation
itself.

> Commit 84b18490d1f1bc7ed5095c929f78bc002eb70f26 introduces a regression.
> [...]
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio
  From: Wu Fengguang @ 2010-03-31  5:51 UTC
  To: KOSAKI Motohiro
  Cc: Li, Shaohua, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org

KOSAKI-san,

On Wed, Mar 31, 2010 at 01:38:12PM +0800, KOSAKI Motohiro wrote:
> > If 1% is still too big, how about the patch below?
>
> This patch makes a lot more sense than the previous one. However, I think
> a <1% anon ratio shouldn't happen anyway, because it means the file LRU
> has no reclaimable pages; <1% seems like a poor reclaim rate.
>
> Perhaps I'll take your patch for the stable tree, but we need to attack
> the root cause; i.e., I suspect we need to fix the scan-ratio equation
> itself.

I tend to regard this patch as a general improvement for both .33-stable
and .34.

I do agree with you that it's desirable to do more testing and analysis,
and to check further for possible hidden problems.

Thanks,
Fengguang

> > Commit 84b18490d1f1bc7ed5095c929f78bc002eb70f26 introduces a regression.
> > [...]
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio
  From: KOSAKI Motohiro @ 2010-03-31  6:00 UTC
  To: Wu Fengguang
  Cc: kosaki.motohiro, Li, Shaohua, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org

> I tend to regard this patch as a general improvement for both .33-stable
> and .34.
>
> I do agree with you that it's desirable to do more testing and analysis,
> and to check further for possible hidden problems.

Yeah, I don't want to ignore .33-stable either. If I can't find the root
cause in 2-3 days, I'll revert the guilty patch anyway.
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio
  From: Wu Fengguang @ 2010-03-31  6:03 UTC
  To: KOSAKI Motohiro
  Cc: Li, Shaohua, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org

On Wed, Mar 31, 2010 at 02:00:52PM +0800, KOSAKI Motohiro wrote:
> Yeah, I don't want to ignore .33-stable either. If I can't find the root
> cause in 2-3 days, I'll revert the guilty patch anyway.

OK, thank you!
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio
  From: Andrew Morton @ 2010-04-01 22:16 UTC
  To: KOSAKI Motohiro
  Cc: Wu Fengguang, Li, Shaohua, linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Wed, 31 Mar 2010 15:00:52 +0900 (JST) KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:

> Yeah, I don't want to ignore .33-stable either. If I can't find the root
> cause in 2-3 days, I'll revert the guilty patch anyway.

It's a good idea to avoid fixing a bug one-way-in-stable,
other-way-in-mainline. Because then we have new code in both trees which
is different. And the -stable guys sensibly like to see code get a bit of
a shakedown in mainline before backporting it.

So it would be better to merge the "simple" patch into mainline, tagged
for -stable backporting. Then we can later implement the larger fix in
mainline, perhaps starting by reverting the "simple" fix.
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio
  From: KOSAKI Motohiro @ 2010-04-02  9:13 UTC
  To: Andrew Morton
  Cc: kosaki.motohiro, Wu Fengguang, Li, Shaohua, linux-mm@kvack.org, linux-kernel@vger.kernel.org

> It's a good idea to avoid fixing a bug one-way-in-stable,
> other-way-in-mainline. Because then we have new code in both trees which
> is different. And the -stable guys sensibly like to see code get a bit of
> a shakedown in mainline before backporting it.
>
> So it would be better to merge the "simple" patch into mainline, tagged
> for -stable backporting. Then we can later implement the larger fix in
> mainline, perhaps starting by reverting the "simple" fix.

OK, I won't stand in the way of your maintainership, although I still
think we need to fix the issue completely.

===================================================================
From 52358cbccdfe94e0381974cd6e937bcc6b1c608b Mon Sep 17 00:00:00 2001
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Date: Fri, 2 Apr 2010 17:13:48 +0900
Subject: [PATCH] Revert "vmscan: get_scan_ratio() cleanup"

Shaohua Li reported that his tmpfs streaming I/O test can lead to OOM.
The test uses a 6G tmpfs in a system with 3G memory. In the tmpfs,
there are 6 copies of the kernel source and the test does a kbuild for
each copy. His investigation shows the test has a lot of rotated anon
pages and quite few file pages, so get_scan_ratio calculates percent[0]
(i.e. the scanning percent for anon) to be zero. Actually percent[0]
should be a big value, but our calculation rounds it to zero.

Before commit 84b18490 we had the same sickness, but the old logic
rescued the percent[0] == 0 case when priority == 0, which hid the real
issue. I didn't think mere streaming I/O could produce the
percent[0] == 0 && priority == 0 situation, but I was wrong. So we
definitely have to fix this tmpfs streaming I/O issue, but let's revert
the regression commit first.

This reverts commit 84b18490d1f1bc7ed5095c929f78bc002eb70f26.

Reported-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/vmscan.c |   23 +++++++++--------------
 1 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 79c8098..cb3947e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1535,13 +1535,6 @@ static void get_scan_ratio(struct zone *zone, struct scan_control *sc,
 	unsigned long ap, fp;
 	struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
 
-	/* If we have no swap space, do not bother scanning anon pages. */
-	if (!sc->may_swap || (nr_swap_pages <= 0)) {
-		percent[0] = 0;
-		percent[1] = 100;
-		return;
-	}
-
 	anon  = zone_nr_lru_pages(zone, sc, LRU_ACTIVE_ANON) +
 		zone_nr_lru_pages(zone, sc, LRU_INACTIVE_ANON);
 	file  = zone_nr_lru_pages(zone, sc, LRU_ACTIVE_FILE) +
@@ -1639,20 +1632,22 @@ static void shrink_zone(int priority, struct zone *zone,
 	unsigned long nr_reclaimed = sc->nr_reclaimed;
 	unsigned long nr_to_reclaim = sc->nr_to_reclaim;
 	struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
+	int noswap = 0;
 
-	get_scan_ratio(zone, sc, percent);
+	/* If we have no swap space, do not bother scanning anon pages. */
+	if (!sc->may_swap || (nr_swap_pages <= 0)) {
+		noswap = 1;
+		percent[0] = 0;
+		percent[1] = 100;
+	} else
+		get_scan_ratio(zone, sc, percent);
 
 	for_each_evictable_lru(l) {
 		int file = is_file_lru(l);
 		unsigned long scan;
 
-		if (percent[file] == 0) {
-			nr[l] = 0;
-			continue;
-		}
-
 		scan = zone_nr_lru_pages(zone, sc, l);
-		if (priority) {
+		if (priority || noswap) {
 			scan >>= priority;
 			scan = (scan * percent[file]) / 100;
 		}
-- 
1.6.5.2
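A sketch of the behavior this revert restores (scan_target below is a
hypothetical helper mirroring the shrink_zone() loop above): at priority 0
with swap available, the scan target is the whole LRU even when rounding
made percent[0] zero, which is what rescued the tmpfs test before commit
84b18490; the noswap case still gets zero:

#include <stdio.h>

/* Hypothetical helper mirroring the reverted per-LRU scan-target logic. */
static unsigned long scan_target(unsigned long lru_pages, int priority,
				 int noswap, unsigned long percent)
{
	unsigned long scan = lru_pages;

	if (priority || noswap) {
		scan >>= priority;
		scan = scan * percent / 100;
	}
	return scan;
}

int main(void)
{
	/* priority 0, swap available, percent rounded to 0: full scan. */
	printf("priority 0, swap:   %lu\n", scan_target(655360, 0, 0, 0));
	/* no swap: percent[0] == 0 still zeroes the anon scan target. */
	printf("priority 0, noswap: %lu\n", scan_target(655360, 0, 1, 0));
	return 0;
}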
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio
  From: Wu Fengguang @ 2010-04-06  1:22 UTC
  To: KOSAKI Motohiro
  Cc: Andrew Morton, Li, Shaohua, linux-mm@kvack.org, linux-kernel@vger.kernel.org

> ===================================================================
> From 52358cbccdfe94e0381974cd6e937bcc6b1c608b Mon Sep 17 00:00:00 2001
> From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Date: Fri, 2 Apr 2010 17:13:48 +0900
> Subject: [PATCH] Revert "vmscan: get_scan_ratio() cleanup"
>
> [...]
> (i.e. the scanning percent for anon) to be zero. Actually percent[0]
> should be a big value, but our calculation rounds it to zero.

It should be a small value :)

Acked-by: Wu Fengguang <fengguang.wu@intel.com>

Thanks,
Fengguang
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio
  From: Rik van Riel @ 2010-04-06  3:36 UTC
  To: KOSAKI Motohiro
  Cc: Andrew Morton, Wu Fengguang, Li, Shaohua, linux-mm@kvack.org, linux-kernel@vger.kernel.org

On 04/02/2010 05:13 AM, KOSAKI Motohiro wrote:
> > So it would be better to merge the "simple" patch into mainline, tagged
> > for -stable backporting. Then we can later implement the larger fix in
> > mainline, perhaps starting by reverting the "simple" fix.
>
> OK, I won't stand in the way of your maintainership, although I still
> think we need to fix the issue completely.

Agreed on the revert.

Acked-by: Rik van Riel <riel@redhat.com>
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio
  From: KOSAKI Motohiro @ 2010-03-31  5:53 UTC
  To: KOSAKI Motohiro
  Cc: Shaohua Li, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, Wu, Fengguang

> > If 1% is still too big, how about the patch below?
>
> This patch makes a lot more sense than the previous one. However, I think
> a <1% anon ratio shouldn't happen anyway, because it means the file LRU
> has no reclaimable pages; <1% seems like a poor reclaim rate.

Oops, what I wrote above is wrong, sorry. Even one page is still too big,
because under a streaming I/O workload the number of anon pages scanned
should be zero. This is a very strong requirement; if it isn't met, a
backup operation will cause a lot of swap-out.

Anyway, I'm still digging into this issue.

> Perhaps I'll take your patch for the stable tree, but we need to attack
> the root cause; i.e., I suspect we need to fix the scan-ratio equation
> itself.
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-03-31 5:53 ` KOSAKI Motohiro @ 2010-04-02 6:50 ` Shaohua Li 2010-04-02 9:14 ` KOSAKI Motohiro ` (2 more replies) 0 siblings, 3 replies; 49+ messages in thread From: Shaohua Li @ 2010-04-02 6:50 UTC (permalink / raw) To: KOSAKI Motohiro Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, Wu, Fengguang On Wed, Mar 31, 2010 at 01:53:27PM +0800, KOSAKI Motohiro wrote: > > > On Tue, Mar 30, 2010 at 02:08:53PM +0800, KOSAKI Motohiro wrote: > > > > Hi > > > > > > > > > Commit 84b18490d1f1bc7ed5095c929f78bc002eb70f26 introduces a regression. > > > > > With it, our tmpfs test always oom. The test has a lot of rotated anon > > > > > pages and cause percent[0] zero. Actually the percent[0] is a very small > > > > > value, but our calculation round it to zero. The commit makes vmscan > > > > > completely skip anon pages and cause oops. > > > > > An option is if percent[x] is zero in get_scan_ratio(), forces it > > > > > to 1. See below patch. > > > > > But the offending commit still changes behavior. Without the commit, we scan > > > > > all pages if priority is zero, below patch doesn't fix this. Don't know if > > > > > It's required to fix this too. > > > > > > > > Can you please post your /proc/meminfo and reproduce program? I'll digg it. > > > > > > > > Very unfortunately, this patch isn't acceptable. In past time, vmscan > > > > had similar logic, but 1% swap-out made lots bug reports. > > > if 1% is still big, how about below patch? > > > > This patch makes a lot of sense than previous. however I think <1% anon ratio > > shouldn't happen anyway because file lru doesn't have reclaimable pages. > > <1% seems no good reclaim rate. > > Oops, the above mention is wrong. sorry. only 1 page is still too big. > because under streaming io workload, the number of scanning anon pages should > be zero. this is very strong requirement. if not, backup operation will makes > a lot of swapping out. Sounds there is no big impact for the workload which you mentioned with the patch. please see below descriptions. I updated the description of the patch as fengguang suggested. Commit 84b18490d introduces a regression. With it, our tmpfs test always oom. The test uses a 6G tmpfs in a system with 3G memory. In the tmpfs, there are 6 copies of kernel source and the test does kbuild for each copy. My investigation shows the test has a lot of rotated anon pages and quite few file pages, so get_scan_ratio calculates percent[0] to be zero. Actually the percent[0] shoule be a very small value, but our calculation round it to zero. The commit makes vmscan completely skip anon pages and cause oops. To avoid underflow, we don't use percentage, instead we directly calculate how many pages should be scaned. In this way, we should get several scan pages for < 1% percent. With this fix, my test doesn't oom any more. Note, this patch doesn't really change logics, but just increase precise. For system with a lot of memory, this might slightly changes behavior. For example, in a sequential file read workload, without the patch, we don't swap any anon pages. With it, if anon memory size is bigger than 16G, we will say one anon page swapped. The 16G is calculated as PAGE_SIZE * priority(4096) * (fp/ap). fp/ap is assumed to be 1024 which is common in this workload. So the impact sounds not a big deal. 
Signed-off-by: Shaohua Li <shaohua.li@intel.com> diff --git a/mm/vmscan.c b/mm/vmscan.c index 79c8098..80a7ed5 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1519,27 +1519,50 @@ static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan, } /* + * Smallish @nr_to_scan's are deposited in @nr_saved_scan, + * until we collected @swap_cluster_max pages to scan. + */ +static unsigned long nr_scan_try_batch(unsigned long nr_to_scan, + unsigned long *nr_saved_scan) +{ + unsigned long nr; + + *nr_saved_scan += nr_to_scan; + nr = *nr_saved_scan; + + if (nr >= SWAP_CLUSTER_MAX) + *nr_saved_scan = 0; + else + nr = 0; + + return nr; +} + +/* * Determine how aggressively the anon and file LRU lists should be * scanned. The relative value of each set of LRU lists is determined * by looking at the fraction of the pages scanned we did rotate back * onto the active list instead of evict. * - * percent[0] specifies how much pressure to put on ram/swap backed - * memory, while percent[1] determines pressure on the file LRUs. + * nr[x] specifies how many pages should be scaned */ -static void get_scan_ratio(struct zone *zone, struct scan_control *sc, - unsigned long *percent) +static void get_scan_count(struct zone *zone, struct scan_control *sc, + unsigned long *nr, int priority) { unsigned long anon, file, free; unsigned long anon_prio, file_prio; unsigned long ap, fp; struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc); + unsigned long fraction[2], denominator[2]; + enum lru_list l; /* If we have no swap space, do not bother scanning anon pages. */ if (!sc->may_swap || (nr_swap_pages <= 0)) { - percent[0] = 0; - percent[1] = 100; - return; + fraction[0] = 0; + denominator[0] = 1; + fraction[1] = 1; + denominator[1] = 1; + goto out; } anon = zone_nr_lru_pages(zone, sc, LRU_ACTIVE_ANON) + @@ -1552,9 +1575,11 @@ static void get_scan_ratio(struct zone *zone, struct scan_control *sc, /* If we have very few page cache pages, force-scan anon pages. */ if (unlikely(file + free <= high_wmark_pages(zone))) { - percent[0] = 100; - percent[1] = 0; - return; + fraction[0] = 1; + denominator[0] = 1; + fraction[1] = 0; + denominator[1] = 1; + goto out; } } @@ -1601,29 +1626,29 @@ static void get_scan_ratio(struct zone *zone, struct scan_control *sc, fp = (file_prio + 1) * (reclaim_stat->recent_scanned[1] + 1); fp /= reclaim_stat->recent_rotated[1] + 1; - /* Normalize to percentages */ - percent[0] = 100 * ap / (ap + fp + 1); - percent[1] = 100 - percent[0]; -} - -/* - * Smallish @nr_to_scan's are deposited in @nr_saved_scan, - * until we collected @swap_cluster_max pages to scan. 
- */ -static unsigned long nr_scan_try_batch(unsigned long nr_to_scan, - unsigned long *nr_saved_scan) -{ - unsigned long nr; + fraction[0] = ap; + denominator[0] = ap + fp + 1; + fraction[1] = fp; + denominator[1] = ap + fp + 1; - *nr_saved_scan += nr_to_scan; - nr = *nr_saved_scan; +out: + for_each_evictable_lru(l) { + int file = is_file_lru(l); + unsigned long scan; - if (nr >= SWAP_CLUSTER_MAX) - *nr_saved_scan = 0; - else - nr = 0; + if (fraction[file] == 0) { + nr[l] = 0; + continue; + } - return nr; + scan = zone_nr_lru_pages(zone, sc, l); + if (priority) { + scan >>= priority; + scan = (scan * fraction[file] / denominator[file]); + } + nr[l] = nr_scan_try_batch(scan, + &reclaim_stat->nr_saved_scan[l]); + } } /* @@ -1634,31 +1659,11 @@ static void shrink_zone(int priority, struct zone *zone, { unsigned long nr[NR_LRU_LISTS]; unsigned long nr_to_scan; - unsigned long percent[2]; /* anon @ 0; file @ 1 */ enum lru_list l; unsigned long nr_reclaimed = sc->nr_reclaimed; unsigned long nr_to_reclaim = sc->nr_to_reclaim; - struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc); - - get_scan_ratio(zone, sc, percent); - for_each_evictable_lru(l) { - int file = is_file_lru(l); - unsigned long scan; - - if (percent[file] == 0) { - nr[l] = 0; - continue; - } - - scan = zone_nr_lru_pages(zone, sc, l); - if (priority) { - scan >>= priority; - scan = (scan * percent[file]) / 100; - } - nr[l] = nr_scan_try_batch(scan, - &reclaim_stat->nr_saved_scan[l]); - } + get_scan_count(zone, sc, nr, priority); while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] || nr[LRU_INACTIVE_FILE]) { -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 49+ messages in thread
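The nr_scan_try_batch() helper moved above can be exercised in isolation; here is a minimal userspace sketch (the per-call request of 5 pages is made up for illustration; SWAP_CLUSTER_MAX is 32 as in the kernel):

#include <stdio.h>

#define SWAP_CLUSTER_MAX 32UL

static unsigned long nr_scan_try_batch(unsigned long nr_to_scan,
                                       unsigned long *nr_saved_scan)
{
        unsigned long nr;

        *nr_saved_scan += nr_to_scan;
        nr = *nr_saved_scan;

        if (nr >= SWAP_CLUSTER_MAX)
                *nr_saved_scan = 0;     /* batch is released */
        else
                nr = 0;                 /* keep saving up */

        return nr;
}

int main(void)
{
        unsigned long saved = 0;
        int i;

        /* asking for 5 pages per call: calls 0-5 return 0,
         * call 6 returns the accumulated 35 (>= 32), then it restarts */
        for (i = 0; i < 10; i++)
                printf("call %d -> scan %lu\n", i,
                       nr_scan_try_batch(5, &saved));
        return 0;
}

This is why sub-SWAP_CLUSTER_MAX scan counts from the new get_scan_count() are not simply lost: they accumulate in nr_saved_scan until a full batch is due.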
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-02 6:50 ` Shaohua Li @ 2010-04-02 9:14 ` KOSAKI Motohiro 2010-04-02 9:24 ` Shaohua Li 2010-04-04 0:48 ` Wu Fengguang 2010-04-06 5:03 ` Wu Fengguang 2 siblings, 1 reply; 49+ messages in thread From: KOSAKI Motohiro @ 2010-04-02 9:14 UTC (permalink / raw) To: Shaohua Li Cc: kosaki.motohiro, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, Wu, Fengguang > > > This patch makes a lot of sense than previous. however I think <1% anon ratio > > > shouldn't happen anyway because file lru doesn't have reclaimable pages. > > > <1% seems no good reclaim rate. > > > > Oops, the above mention is wrong. sorry. only 1 page is still too big. > > because under streaming io workload, the number of scanning anon pages should > > be zero. this is very strong requirement. if not, backup operation will makes > > a lot of swapping out. > Sounds there is no big impact for the workload which you mentioned with the patch. > please see below descriptions. > I updated the description of the patch as fengguang suggested. Umm.. sorry, no. "One fix that introduces another bug" is not a good deal. Instead, I'll revert the guilty commit first, as akpm mentioned. Thanks. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-02 9:14 ` KOSAKI Motohiro @ 2010-04-02 9:24 ` Shaohua Li 2010-04-04 14:19 ` KOSAKI Motohiro 0 siblings, 1 reply; 49+ messages in thread From: Shaohua Li @ 2010-04-02 9:24 UTC (permalink / raw) To: KOSAKI Motohiro Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, Wu, Fengguang On Fri, Apr 02, 2010 at 05:14:38PM +0800, KOSAKI Motohiro wrote: > > > > This patch makes a lot of sense than previous. however I think <1% anon ratio > > > > shouldn't happen anyway because file lru doesn't have reclaimable pages. > > > > <1% seems no good reclaim rate. > > > > > > Oops, the above mention is wrong. sorry. only 1 page is still too big. > > > because under streaming io workload, the number of scanning anon pages should > > > be zero. this is very strong requirement. if not, backup operation will makes > > > a lot of swapping out. > > Sounds there is no big impact for the workload which you mentioned with the patch. > > please see below descriptions. > > I updated the description of the patch as fengguang suggested. > > Umm.. sorry, no. > > "one fix but introduce another one bug" is not good deal. instead, > I'll revert the guilty commit at first as akpm mentioned. Even if we revert the commit, the patch still has its benefit, as it increases calculation precision, right? Thanks, Shaohua -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-02 9:24 ` Shaohua Li @ 2010-04-04 14:19 ` KOSAKI Motohiro 2010-04-06 1:25 ` Shaohua Li 0 siblings, 1 reply; 49+ messages in thread From: KOSAKI Motohiro @ 2010-04-04 14:19 UTC (permalink / raw) To: Shaohua Li Cc: kosaki.motohiro, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, Wu, Fengguang > On Fri, Apr 02, 2010 at 05:14:38PM +0800, KOSAKI Motohiro wrote: > > > > > This patch makes a lot of sense than previous. however I think <1% anon ratio > > > > > shouldn't happen anyway because file lru doesn't have reclaimable pages. > > > > > <1% seems no good reclaim rate. > > > > > > > > Oops, the above mention is wrong. sorry. only 1 page is still too big. > > > > because under streaming io workload, the number of scanning anon pages should > > > > be zero. this is very strong requirement. if not, backup operation will makes > > > > a lot of swapping out. > > > Sounds there is no big impact for the workload which you mentioned with the patch. > > > please see below descriptions. > > > I updated the description of the patch as fengguang suggested. > > > > Umm.. sorry, no. > > > > "one fix but introduce another one bug" is not good deal. instead, > > I'll revert the guilty commit at first as akpm mentioned. > Even we revert the commit, the patch still has its benefit, as it increases > calculation precision, right? No, you shouldn't ignore the regression case. If we can remove the streaming io corner case with another patch, this patch can be considered for merging. Thanks. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-04 14:19 ` KOSAKI Motohiro @ 2010-04-06 1:25 ` Shaohua Li 2010-04-06 1:36 ` KOSAKI Motohiro 2010-04-06 1:50 ` Wu Fengguang 0 siblings, 2 replies; 49+ messages in thread From: Shaohua Li @ 2010-04-06 1:25 UTC (permalink / raw) To: KOSAKI Motohiro Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, Wu, Fengguang On Sun, Apr 04, 2010 at 10:19:06PM +0800, KOSAKI Motohiro wrote: > > On Fri, Apr 02, 2010 at 05:14:38PM +0800, KOSAKI Motohiro wrote: > > > > > > This patch makes a lot of sense than previous. however I think <1% anon ratio > > > > > > shouldn't happen anyway because file lru doesn't have reclaimable pages. > > > > > > <1% seems no good reclaim rate. > > > > > > > > > > Oops, the above mention is wrong. sorry. only 1 page is still too big. > > > > > because under streaming io workload, the number of scanning anon pages should > > > > > be zero. this is very strong requirement. if not, backup operation will makes > > > > > a lot of swapping out. > > > > Sounds there is no big impact for the workload which you mentioned with the patch. > > > > please see below descriptions. > > > > I updated the description of the patch as fengguang suggested. > > > > > > Umm.. sorry, no. > > > > > > "one fix but introduce another one bug" is not good deal. instead, > > > I'll revert the guilty commit at first as akpm mentioned. > > Even we revert the commit, the patch still has its benefit, as it increases > > calculation precision, right? > > no, you shouldn't ignore the regression case. I don't think this is serious. In my calculation, there is only 1 page swapped out for 6G of anonymous memory. 1 page shouldn't have any performance impact. Thanks, Shaohua -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-06 1:25 ` Shaohua Li @ 2010-04-06 1:36 ` KOSAKI Motohiro 0 siblings, 0 replies; 49+ messages in thread From: KOSAKI Motohiro @ 2010-04-06 1:36 UTC (permalink / raw) To: Shaohua Li Cc: kosaki.motohiro, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, Wu, Fengguang > On Sun, Apr 04, 2010 at 10:19:06PM +0800, KOSAKI Motohiro wrote: > > > On Fri, Apr 02, 2010 at 05:14:38PM +0800, KOSAKI Motohiro wrote: > > > > > > > This patch makes a lot of sense than previous. however I think <1% anon ratio > > > > > > > shouldn't happen anyway because file lru doesn't have reclaimable pages. > > > > > > > <1% seems no good reclaim rate. > > > > > > > > > > > > Oops, the above mention is wrong. sorry. only 1 page is still too big. > > > > > > because under streaming io workload, the number of scanning anon pages should > > > > > > be zero. this is very strong requirement. if not, backup operation will makes > > > > > > a lot of swapping out. > > > > > Sounds there is no big impact for the workload which you mentioned with the patch. > > > > > please see below descriptions. > > > > > I updated the description of the patch as fengguang suggested. > > > > > > > > Umm.. sorry, no. > > > > > > > > "one fix but introduce another one bug" is not good deal. instead, > > > > I'll revert the guilty commit at first as akpm mentioned. > > > Even we revert the commit, the patch still has its benefit, as it increases > > > calculation precision, right? > > > > no, you shouldn't ignore the regression case. > I don't think this is serious. In my calculation, there is only 1 page swapped out > for 6G anonmous memory. 1 page should haven't any performance impact. There is. I have received exactly the opposite claim, because shrink_zone() is not called only once; it is called many times. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-06 1:25 ` Shaohua Li 2010-04-06 1:36 ` KOSAKI Motohiro @ 2010-04-06 1:50 ` Wu Fengguang 2010-04-06 2:06 ` KOSAKI Motohiro 1 sibling, 1 reply; 49+ messages in thread From: Wu Fengguang @ 2010-04-06 1:50 UTC (permalink / raw) To: Li, Shaohua Cc: KOSAKI Motohiro, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, Rik van Riel On Tue, Apr 06, 2010 at 09:25:36AM +0800, Li, Shaohua wrote: > On Sun, Apr 04, 2010 at 10:19:06PM +0800, KOSAKI Motohiro wrote: > > > On Fri, Apr 02, 2010 at 05:14:38PM +0800, KOSAKI Motohiro wrote: > > > > > > > This patch makes a lot of sense than previous. however I think <1% anon ratio > > > > > > > shouldn't happen anyway because file lru doesn't have reclaimable pages. > > > > > > > <1% seems no good reclaim rate. > > > > > > > > > > > > Oops, the above mention is wrong. sorry. only 1 page is still too big. > > > > > > because under streaming io workload, the number of scanning anon pages should > > > > > > be zero. this is very strong requirement. if not, backup operation will makes > > > > > > a lot of swapping out. > > > > > Sounds there is no big impact for the workload which you mentioned with the patch. > > > > > please see below descriptions. > > > > > I updated the description of the patch as fengguang suggested. > > > > > > > > Umm.. sorry, no. > > > > > > > > "one fix but introduce another one bug" is not good deal. instead, > > > > I'll revert the guilty commit at first as akpm mentioned. > > > Even we revert the commit, the patch still has its benefit, as it increases > > > calculation precision, right? > > > > no, you shouldn't ignore the regression case. > I don't think this is serious. In my calculation, there is only 1 page swapped out > for 6G anonmous memory. 1 page should haven't any performance impact. 1 anon page scanned for every N file pages scanned? Is N a _huge_ enough ratio so that the anon list will be very lightly scanned? Rik: here is a little background. Under streaming IO, the current get_scan_ratio() will get a percent[0] that is (much) less than 1, so underflows to 0. It has the bad effect of completely disabling the scan of the anon list, which leads to OOM in Shaohua's test case. OTOH, it also has the good side effect of keeping anon pages in memory and totally preventing swap IO. Shaohua's patch improves the computation precision by computing nr[] directly in get_scan_ratio(). This is good in general, however it will enable a light scan of the anon list on streaming IO. Thanks, Fengguang -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
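The rounding Fengguang describes is easy to reproduce in userspace (the ap/fp values below are hypothetical, chosen only to show the truncation):

#include <stdio.h>

int main(void)
{
        /* assumed pressures: tiny anon rotation, heavy file rotation */
        unsigned long ap = 5, fp = 10000;
        unsigned long percent0 = 100 * ap / (ap + fp + 1);

        /* integer division: 500 / 10006 truncates to 0, so the anon
         * lists are skipped entirely even though ap is non-zero */
        printf("percent[0] = %lu\n", percent0);
        return 0;
}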
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-06 1:50 ` Wu Fengguang @ 2010-04-06 2:06 ` KOSAKI Motohiro 2010-04-06 2:30 ` Wu Fengguang 0 siblings, 1 reply; 49+ messages in thread From: KOSAKI Motohiro @ 2010-04-06 2:06 UTC (permalink / raw) To: Wu Fengguang Cc: kosaki.motohiro, Li, Shaohua, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, Rik van Riel > On Tue, Apr 06, 2010 at 09:25:36AM +0800, Li, Shaohua wrote: > > On Sun, Apr 04, 2010 at 10:19:06PM +0800, KOSAKI Motohiro wrote: > > > > On Fri, Apr 02, 2010 at 05:14:38PM +0800, KOSAKI Motohiro wrote: > > > > > > > > This patch makes a lot of sense than previous. however I think <1% anon ratio > > > > > > > > shouldn't happen anyway because file lru doesn't have reclaimable pages. > > > > > > > > <1% seems no good reclaim rate. > > > > > > > > > > > > > > Oops, the above mention is wrong. sorry. only 1 page is still too big. > > > > > > > because under streaming io workload, the number of scanning anon pages should > > > > > > > be zero. this is very strong requirement. if not, backup operation will makes > > > > > > > a lot of swapping out. > > > > > > Sounds there is no big impact for the workload which you mentioned with the patch. > > > > > > please see below descriptions. > > > > > > I updated the description of the patch as fengguang suggested. > > > > > > > > > > Umm.. sorry, no. > > > > > > > > > > "one fix but introduce another one bug" is not good deal. instead, > > > > > I'll revert the guilty commit at first as akpm mentioned. > > > > Even we revert the commit, the patch still has its benefit, as it increases > > > > calculation precision, right? > > > > > > no, you shouldn't ignore the regression case. > > > I don't think this is serious. In my calculation, there is only 1 page swapped out > > for 6G anonmous memory. 1 page should haven't any performance impact. > > 1 anon page scanned for every N file pages scanned? > > Is N a _huge_ enough ratio so that the anon list will be very light scanned? > > Rik: here is a little background. The problem is, the VM is continuously discarding no-longer-used file cache. If we scan even 1 extra anon page, we will observe tons of swap usage after a few days. Please don't think only about benchmarks. > Under streaming IO, the current get_scan_ratio() will get a percent[0] > that is (much) less than 1, so underflows to 0. > > It has the bad effect of completely disabling the scan of anon list, > which leads to OOM in Shaohua's test case. OTOH, it also has the good > side effect of keeping anon pages in memory and totally prevent swap > IO. > > Shaohua's patch improves the computation precision by computing nr[] > directly in get_scan_ratio(). This is good in general, however will > enable light scan of the anon list on streaming IO. In such a case, percent[0] should be big; I think underflow is not the point. His test case is merely a streaming io copy, so why can't we drop the tmpfs cached pages? His /proc/meminfo shows his machine didn't have droppable file cache, so a big percent[1] value seems to make no sense. No? I'm not sure whether we need either of the detections below. I need to investigate more. 1) detect no discardable file cache 2) detect streaming io on tmpfs (as a regular file) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
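A rough sense of scale for the "tons of swap usage after a few days" concern (the call rate here is an assumption for illustration, not a measurement): if sustained streaming reclaim drives shrink_zone() around 10 times a second and each call swaps out just 1 anon page, that is 10 * 86400 = 864000 pages a day, i.e. roughly 3.3GB/day of swap-out with 4K pages.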
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-06 2:06 ` KOSAKI Motohiro @ 2010-04-06 2:30 ` Wu Fengguang 2010-04-06 2:58 ` KOSAKI Motohiro 0 siblings, 1 reply; 49+ messages in thread From: Wu Fengguang @ 2010-04-06 2:30 UTC (permalink / raw) To: KOSAKI Motohiro Cc: Li, Shaohua, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, Rik van Riel On Tue, Apr 06, 2010 at 10:06:19AM +0800, KOSAKI Motohiro wrote: > > On Tue, Apr 06, 2010 at 09:25:36AM +0800, Li, Shaohua wrote: > > > On Sun, Apr 04, 2010 at 10:19:06PM +0800, KOSAKI Motohiro wrote: > > > > > On Fri, Apr 02, 2010 at 05:14:38PM +0800, KOSAKI Motohiro wrote: > > > > > > > > > This patch makes a lot of sense than previous. however I think <1% anon ratio > > > > > > > > > shouldn't happen anyway because file lru doesn't have reclaimable pages. > > > > > > > > > <1% seems no good reclaim rate. > > > > > > > > > > > > > > > > Oops, the above mention is wrong. sorry. only 1 page is still too big. > > > > > > > > because under streaming io workload, the number of scanning anon pages should > > > > > > > > be zero. this is very strong requirement. if not, backup operation will makes > > > > > > > > a lot of swapping out. > > > > > > > Sounds there is no big impact for the workload which you mentioned with the patch. > > > > > > > please see below descriptions. > > > > > > > I updated the description of the patch as fengguang suggested. > > > > > > > > > > > > Umm.. sorry, no. > > > > > > > > > > > > "one fix but introduce another one bug" is not good deal. instead, > > > > > > I'll revert the guilty commit at first as akpm mentioned. > > > > > Even we revert the commit, the patch still has its benefit, as it increases > > > > > calculation precision, right? > > > > > > > > no, you shouldn't ignore the regression case. > > > > > I don't think this is serious. In my calculation, there is only 1 page swapped out > > > for 6G anonmous memory. 1 page should haven't any performance impact. > > > > 1 anon page scanned for every N file pages scanned? > > > > Is N a _huge_ enough ratio so that the anon list will be very light scanned? > > > > Rik: here is a little background. > > The problem is, the VM are couteniously discarding no longer used file > cache. if we are scan extra anon 1 page, we will observe tons swap usage > after few days. > > please don't only think benchmark. OK the days-of-streaming-io typically happen in file servers. Suppose a file server with 16GB memory, 1GB of which is consumed by anonymous pages, others are for page cache. Assume that the exact file:anon ratio computed by the get_scan_ratio() algorithm is 1000:1. In that case percent[0]=0.1 and is rounded down to 0, which keeps the anon pages in memory for the few days. Now with Shaohua's patch, nr[0] = (262144/4096)/1000 = 0.06 will also be rounded down to 0. It only becomes >=1 when - reclaim runs into trouble and priority goes low - anon list goes huge So I guess Shaohua's patch still has reasonable "underflow" threshold :) Thanks, Fengguang > > > Under streaming IO, the current get_scan_ratio() will get a percent[0] > > that is (much) less than 1, so underflows to 0. > > > > It has the bad effect of completely disabling the scan of anon list, > > which leads to OOM in Shaohua's test case. OTOH, it also has the good > > side effect of keeping anon pages in memory and totally prevent swap > > IO. > > > > Shaohua's patch improves the computation precision by computing nr[] > > directly in get_scan_ratio(). 
This is good in general, however will > > enable light scan of the anon list on streaming IO. > > In such case, percent[0] should be big. I think underflowing is not point. > His test case is merely streaming io copy, why can't we drop tmpfs cached > page? his /proc/meminfo describe his machine didn't have droppable file cache. > so, big percent[1] value seems makes no sense. no? > > I'm not sure we need either below detection. I need more investigate. > 1) detect no discardable file cache > 2) detect streaming io on tmpfs (as regular file) > > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
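Fengguang's 0.06 can be checked step by step (this follows the patched get_scan_count() with his assumed 16GB box, 1GB of anon pages and a 1000:1 file:anon ratio):

  anon list = 1GB / 4K      = 262144 pages
  scan      = 262144 >> 12  = 64          (priority = 12)
  nr[0]     = 64 * ap / (ap + fp + 1)
            = 64 / 1001     = 0.06, truncated to 0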
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-06 2:30 ` Wu Fengguang @ 2010-04-06 2:58 ` KOSAKI Motohiro 2010-04-06 3:31 ` Wu Fengguang 0 siblings, 1 reply; 49+ messages in thread From: KOSAKI Motohiro @ 2010-04-06 2:58 UTC (permalink / raw) To: Wu Fengguang Cc: kosaki.motohiro, Li, Shaohua, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, Rik van Riel > On Tue, Apr 06, 2010 at 10:06:19AM +0800, KOSAKI Motohiro wrote: > > > On Tue, Apr 06, 2010 at 09:25:36AM +0800, Li, Shaohua wrote: > > > > On Sun, Apr 04, 2010 at 10:19:06PM +0800, KOSAKI Motohiro wrote: > > > > > > On Fri, Apr 02, 2010 at 05:14:38PM +0800, KOSAKI Motohiro wrote: > > > > > > > > > > This patch makes a lot of sense than previous. however I think <1% anon ratio > > > > > > > > > > shouldn't happen anyway because file lru doesn't have reclaimable pages. > > > > > > > > > > <1% seems no good reclaim rate. > > > > > > > > > > > > > > > > > > Oops, the above mention is wrong. sorry. only 1 page is still too big. > > > > > > > > > because under streaming io workload, the number of scanning anon pages should > > > > > > > > > be zero. this is very strong requirement. if not, backup operation will makes > > > > > > > > > a lot of swapping out. > > > > > > > > Sounds there is no big impact for the workload which you mentioned with the patch. > > > > > > > > please see below descriptions. > > > > > > > > I updated the description of the patch as fengguang suggested. > > > > > > > > > > > > > > Umm.. sorry, no. > > > > > > > > > > > > > > "one fix but introduce another one bug" is not good deal. instead, > > > > > > > I'll revert the guilty commit at first as akpm mentioned. > > > > > > Even we revert the commit, the patch still has its benefit, as it increases > > > > > > calculation precision, right? > > > > > > > > > > no, you shouldn't ignore the regression case. > > > > > > > I don't think this is serious. In my calculation, there is only 1 page swapped out > > > > for 6G anonmous memory. 1 page should haven't any performance impact. > > > > 1 anon page scanned for every N file pages scanned? > > > > Is N a _huge_ enough ratio so that the anon list will be very light scanned? > > > > Rik: here is a little background. > > The problem is, the VM are couteniously discarding no longer used file > > cache. if we are scan extra anon 1 page, we will observe tons swap usage > > after few days. > > > > please don't only think benchmark. > > OK the days-of-streaming-io typically happen in file servers. Suppose > a file server with 16GB memory, 1GB of which is consumed by anonymous > pages, others are for page cache. > > Assume that the exact file:anon ratio computed by the get_scan_ratio() > algorithm is 1000:1. In that case percent[0]=0.1 and is rounded down > to 0, which keeps the anon pages in memory for the few days. > > Now with Shaohua's patch, nr[0] = (262144/4096)/1000 = 0.06 will also > be rounded down to 0. It only becomes >=1 when > - reclaim runs into trouble and priority goes low > - anon list goes huge > > So I guess Shaohua's patch still has reasonable "underflow" threshold :) Again, I didn't say his patch has no worth. I only said we shouldn't ignore the downside. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-06 2:58 ` KOSAKI Motohiro @ 2010-04-06 3:31 ` Wu Fengguang 2010-04-06 3:40 ` Rik van Riel 0 siblings, 1 reply; 49+ messages in thread From: Wu Fengguang @ 2010-04-06 3:31 UTC (permalink / raw) To: KOSAKI Motohiro Cc: Li, Shaohua, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, Rik van Riel On Tue, Apr 06, 2010 at 10:58:43AM +0800, KOSAKI Motohiro wrote: > > On Tue, Apr 06, 2010 at 10:06:19AM +0800, KOSAKI Motohiro wrote: > > > > On Tue, Apr 06, 2010 at 09:25:36AM +0800, Li, Shaohua wrote: > > > > > On Sun, Apr 04, 2010 at 10:19:06PM +0800, KOSAKI Motohiro wrote: > > > > > > > On Fri, Apr 02, 2010 at 05:14:38PM +0800, KOSAKI Motohiro wrote: > > > > > > > > > > > This patch makes a lot of sense than previous. however I think <1% anon ratio > > > > > > > > > > > shouldn't happen anyway because file lru doesn't have reclaimable pages. > > > > > > > > > > > <1% seems no good reclaim rate. > > > > > > > > > > > > > > > > > > > > Oops, the above mention is wrong. sorry. only 1 page is still too big. > > > > > > > > > > because under streaming io workload, the number of scanning anon pages should > > > > > > > > > > be zero. this is very strong requirement. if not, backup operation will makes > > > > > > > > > > a lot of swapping out. > > > > > > > > > Sounds there is no big impact for the workload which you mentioned with the patch. > > > > > > > > > please see below descriptions. > > > > > > > > > I updated the description of the patch as fengguang suggested. > > > > > > > > > > > > > > > > Umm.. sorry, no. > > > > > > > > > > > > > > > > "one fix but introduce another one bug" is not good deal. instead, > > > > > > > > I'll revert the guilty commit at first as akpm mentioned. > > > > > > > Even we revert the commit, the patch still has its benefit, as it increases > > > > > > > calculation precision, right? > > > > > > > > > > > > no, you shouldn't ignore the regression case. > > > > > > > > > I don't think this is serious. In my calculation, there is only 1 page swapped out > > > > > for 6G anonmous memory. 1 page should haven't any performance impact. > > > > > > > > 1 anon page scanned for every N file pages scanned? > > > > > > > > Is N a _huge_ enough ratio so that the anon list will be very light scanned? > > > > > > > > Rik: here is a little background. > > > > > > The problem is, the VM are couteniously discarding no longer used file > > > cache. if we are scan extra anon 1 page, we will observe tons swap usage > > > after few days. > > > > > > please don't only think benchmark. > > > > OK the days-of-streaming-io typically happen in file servers. Suppose > > a file server with 16GB memory, 1GB of which is consumed by anonymous > > pages, others are for page cache. > > > > Assume that the exact file:anon ratio computed by the get_scan_ratio() > > algorithm is 1000:1. In that case percent[0]=0.1 and is rounded down > > to 0, which keeps the anon pages in memory for the few days. > > > > Now with Shaohua's patch, nr[0] = (262144/4096)/1000 = 0.06 will also > > be rounded down to 0. It only becomes >=1 when > > - reclaim runs into trouble and priority goes low > > - anon list goes huge > > > > So I guess Shaohua's patch still has reasonable "underflow" threshold :) > > Again, I didn't said his patch is no worth. I only said we don't have to > ignore the downside. Right, we should document both the upside and downside. The main difference happens when file:anon scan ratio > 100:1. 
For the current percent[] based computing, percent[0]=0 hence nr[0]=0 which disables anon list scan unconditionally, for good or for bad. For the direct nr[] computing, - nr[0] will be 0 for typical file servers, because with priority=12 and anon lru size < 1.6GB, nr[0] = (anon_size/4096)/100 < 1 - nr[0] will be non-zero when priority=1 and anon_size > 100 pages, this stops OOM for Shaohua's test case, however it may not be enough to guarantee safety (your previous reverting patch can provide this guarantee). I liked Shaohua's patch a lot -- it adapts well to both the file-server case and the mostly-anon-pages case :) Thanks, Fengguang -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-06 3:31 ` Wu Fengguang @ 2010-04-06 3:40 ` Rik van Riel 2010-04-06 4:49 ` Wu Fengguang 0 siblings, 1 reply; 49+ messages in thread From: Rik van Riel @ 2010-04-06 3:40 UTC (permalink / raw) To: Wu Fengguang Cc: KOSAKI Motohiro, Li, Shaohua, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org On 04/05/2010 11:31 PM, Wu Fengguang wrote: > On Tue, Apr 06, 2010 at 10:58:43AM +0800, KOSAKI Motohiro wrote: >> Again, I didn't said his patch is no worth. I only said we don't have to >> ignore the downside. > > Right, we should document both the upside and downside. The downside is obvious: streaming IO (used once data that does not fit in the cache) can push out data that is used more often - requiring that it be swapped in at a later point in time. I understand what Shaohua's patch does, but I do not understand the upside. What good does it do to increase the size of the cache for streaming IO data, which is generally touched only once? What kind of performance benefits can we get by doing that? > The main difference happens when file:anon scan ratio> 100:1. > > For the current percent[] based computing, percent[0]=0 hence nr[0]=0 > which disables anon list scan unconditionally, for good or for bad. > > For the direct nr[] computing, > - nr[0] will be 0 for typical file servers, because with priority=12 > and anon lru size< 1.6GB, nr[0] = (anon_size/4096)/100< 0 > - nr[0] will be non-zero when priority=1 and anon_size> 100 pages, > this stops OOM for Shaohua's test case, however may not be enough to > guarantee safety (your previous reverting patch can provide this > guarantee). > > I liked Shaohua's patch a lot -- it adapts well to both the > file-server case and the mostly-anon-pages case :) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-06 3:40 ` Rik van Riel @ 2010-04-06 4:49 ` Wu Fengguang 2010-04-06 5:09 ` Shaohua Li 0 siblings, 1 reply; 49+ messages in thread From: Wu Fengguang @ 2010-04-06 4:49 UTC (permalink / raw) To: Rik van Riel Cc: KOSAKI Motohiro, Li, Shaohua, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org On Tue, Apr 06, 2010 at 11:40:47AM +0800, Rik van Riel wrote: > On 04/05/2010 11:31 PM, Wu Fengguang wrote: > > On Tue, Apr 06, 2010 at 10:58:43AM +0800, KOSAKI Motohiro wrote: > >> Again, I didn't said his patch is no worth. I only said we don't have to > >> ignore the downside. > > > > Right, we should document both the upside and downside. > > The downside is obvious: streaming IO (used once data > that does not fit in the cache) can push out data that > is used more often - requiring that it be swapped in > at a later point in time. > > I understand what Shaohua's patch does, but I do not > understand the upside. What good does it do to increase > the size of the cache for streaming IO data, which is > generally touched only once? Not that bad :) With Shaohua's patch the anon list will typically _never_ get scanned, just like before. If it's mostly use-once IO, file:anon will be 1000 or even 10000, and priority=12. Then only anon lists larger than 16GB or 160GB will get nr[0] >= 1. > What kind of performance benefits can we get by doing > that? So vmscan behavior and performance remain the same as before. For a really large anon list, such a workload is beyond our imagination, so we cannot assert that "don't scan the anon list" will be a benefit. On the other hand, in the test case of "do streaming IO when most memory is occupied by tmpfs pages", it is very bad behavior to refuse to scan the anon list normally and then suddenly start scanning _the whole_ anon list when priority hits 0. Shaohua's patch helps here by gradually increasing the scan nr of the anon list as memory pressure increases. Thanks, Fengguang > > The main difference happens when file:anon scan ratio> 100:1. > > > > For the current percent[] based computing, percent[0]=0 hence nr[0]=0 > > which disables anon list scan unconditionally, for good or for bad. > > > > For the direct nr[] computing, > > - nr[0] will be 0 for typical file servers, because with priority=12 > > and anon lru size< 1.6GB, nr[0] = (anon_size/4096)/100< 0 > > - nr[0] will be non-zero when priority=1 and anon_size> 100 pages, > > this stops OOM for Shaohua's test case, however may not be enough to > > guarantee safety (your previous reverting patch can provide this > > guarantee). > > > > I liked Shaohua's patch a lot -- it adapts well to both the > > file-server case and the mostly-anon-pages case :) > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
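The gradual ramp-up can be illustrated with assumed numbers (a 4GB anon list, i.e. 2^20 4K pages, and a 1000:1 file:anon scan ratio; both hypothetical):

  priority 12: (2^20 >> 12) / 1001 = 256 / 1001    -> 0 pages
  priority  6: (2^20 >>  6) / 1001 = 16384 / 1001  -> 16 pages
  priority  1: (2^20 >>  1) / 1001 = 524288 / 1001 -> 523 pages
  priority  0: the full list size is used (no shift, no ratio)

So instead of the old all-or-nothing jump at priority 0, anon scan pressure rises smoothly as reclaim priority drops.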
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-06 4:49 ` Wu Fengguang @ 2010-04-06 5:09 ` Shaohua Li 0 siblings, 0 replies; 49+ messages in thread From: Shaohua Li @ 2010-04-06 5:09 UTC (permalink / raw) To: Wu, Fengguang Cc: Rik van Riel, KOSAKI Motohiro, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org On Tue, Apr 06, 2010 at 12:49:10PM +0800, Wu, Fengguang wrote: > On Tue, Apr 06, 2010 at 11:40:47AM +0800, Rik van Riel wrote: > > On 04/05/2010 11:31 PM, Wu Fengguang wrote: > > > On Tue, Apr 06, 2010 at 10:58:43AM +0800, KOSAKI Motohiro wrote: > > >> Again, I didn't said his patch is no worth. I only said we don't have to > > >> ignore the downside. > > > > > > Right, we should document both the upside and downside. > > > > The downside is obvious: streaming IO (used once data > > that does not fit in the cache) can push out data that > > is used more often - requiring that it be swapped in > > at a later point in time. > > > > I understand what Shaohua's patch does, but I do not > > understand the upside. What good does it do to increase > > the size of the cache for streaming IO data, which is > > generally touched only once? > > Not that bad :) With Shaohua's patch the anon list will typically > _never_ get scanned, just like before. > > If it's mostly use-once IO, file:anon will be 1000 or even 10000, and > priority=12. Then only anon lists larger than 16GB or 160GB will get > nr[0] >= 1. > > > What kind of performance benefits can we get by doing > > that? > > So vmscan behavior and performance remain the same as before. > > For really large anon list, such workload is beyond our imagination. > So we cannot assert "don't scan anon list" will be a benefit. > > On the other hand, in the test case of "do stream IO when most memory > occupied by tmpfs pages", it is very bad behavior refuse to scan anon > list in normal and suddenly start scanning _the whole_ anon list when > priority hits 0. Shaohua's patch helps it by gradually increasing the > scan nr of anon list as memory pressure increases. Yep, the gradually increasing scan nr is the main advantage in my mind. Thanks, Shaohua > > > The main difference happens when file:anon scan ratio> 100:1. > > > > > > For the current percent[] based computing, percent[0]=0 hence nr[0]=0 > > > which disables anon list scan unconditionally, for good or for bad. > > > > > > For the direct nr[] computing, > > > - nr[0] will be 0 for typical file servers, because with priority=12 > > > and anon lru size< 1.6GB, nr[0] = (anon_size/4096)/100< 0 > > > - nr[0] will be non-zero when priority=1 and anon_size> 100 pages, > > > this stops OOM for Shaohua's test case, however may not be enough to > > > guarantee safety (your previous reverting patch can provide this > > > guarantee). > > > > > > I liked Shaohua's patch a lot -- it adapts well to both the > > > file-server case and the mostly-anon-pages case :) > > > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-02 6:50 ` Shaohua Li 2010-04-02 9:14 ` KOSAKI Motohiro @ 2010-04-04 0:48 ` Wu Fengguang 2010-04-06 1:27 ` Shaohua Li 2010-04-06 5:03 ` Wu Fengguang 2 siblings, 1 reply; 49+ messages in thread From: Wu Fengguang @ 2010-04-04 0:48 UTC (permalink / raw) To: Li, Shaohua Cc: KOSAKI Motohiro, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org On Fri, Apr 02, 2010 at 02:50:52PM +0800, Li, Shaohua wrote: > On Wed, Mar 31, 2010 at 01:53:27PM +0800, KOSAKI Motohiro wrote: > > > > On Tue, Mar 30, 2010 at 02:08:53PM +0800, KOSAKI Motohiro wrote: > > > > > Hi > > > > > > > > > > > Commit 84b18490d1f1bc7ed5095c929f78bc002eb70f26 introduces a regression. > > > > > > With it, our tmpfs test always oom. The test has a lot of rotated anon > > > > > > pages and cause percent[0] zero. Actually the percent[0] is a very small > > > > > > value, but our calculation round it to zero. The commit makes vmscan > > > > > > completely skip anon pages and cause oops. > > > > > > An option is if percent[x] is zero in get_scan_ratio(), forces it > > > > > > to 1. See below patch. > > > > > > But the offending commit still changes behavior. Without the commit, we scan > > > > > > all pages if priority is zero, below patch doesn't fix this. Don't know if > > > > > > It's required to fix this too. > > > > > > > > > > Can you please post your /proc/meminfo and reproduce program? I'll digg it. > > > > > > > > > > Very unfortunately, this patch isn't acceptable. In past time, vmscan > > > > > had similar logic, but 1% swap-out made lots bug reports. > > > > if 1% is still big, how about below patch? > > > > > > This patch makes a lot of sense than previous. however I think <1% anon ratio > > > shouldn't happen anyway because file lru doesn't have reclaimable pages. > > > <1% seems no good reclaim rate. > > > > Oops, the above mention is wrong. sorry. only 1 page is still too big. > > because under streaming io workload, the number of scanning anon pages should > > be zero. this is very strong requirement. if not, backup operation will makes > > a lot of swapping out. > Sounds there is no big impact for the workload which you mentioned with the patch. > please see below descriptions. > I updated the description of the patch as fengguang suggested. > > > > Commit 84b18490d introduces a regression. With it, our tmpfs test always oom. > The test uses a 6G tmpfs in a system with 3G memory. In the tmpfs, there are > 6 copies of kernel source and the test does kbuild for each copy. My > investigation shows the test has a lot of rotated anon pages and quite few > file pages, so get_scan_ratio calculates percent[0] to be zero. Actually > the percent[0] shoule be a very small value, but our calculation round it > to zero. The commit makes vmscan completely skip anon pages and cause oops. > > To avoid underflow, we don't use percentage, instead we directly calculate > how many pages should be scaned. In this way, we should get several scan pages > for < 1% percent. With this fix, my test doesn't oom any more. > > Note, this patch doesn't really change logics, but just increase precise. For > system with a lot of memory, this might slightly changes behavior. For example, > in a sequential file read workload, without the patch, we don't swap any anon > pages. With it, if anon memory size is bigger than 16G, we will say one anon page see? > swapped. The 16G is calculated as PAGE_SIZE * priority(4096) * (fp/ap). 
fp/ap > is assumed to be 1024 which is common in this workload. So the impact sounds not > a big deal. > > Signed-off-by: Shaohua Li <shaohua.li@intel.com> > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 79c8098..80a7ed5 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -1519,27 +1519,50 @@ static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan, > } > > /* > + * Smallish @nr_to_scan's are deposited in @nr_saved_scan, > + * until we collected @swap_cluster_max pages to scan. > + */ > +static unsigned long nr_scan_try_batch(unsigned long nr_to_scan, > + unsigned long *nr_saved_scan) > +{ > + unsigned long nr; > + > + *nr_saved_scan += nr_to_scan; > + nr = *nr_saved_scan; > + > + if (nr >= SWAP_CLUSTER_MAX) > + *nr_saved_scan = 0; > + else > + nr = 0; > + > + return nr; > +} > + > +/* > * Determine how aggressively the anon and file LRU lists should be > * scanned. The relative value of each set of LRU lists is determined > * by looking at the fraction of the pages scanned we did rotate back > * onto the active list instead of evict. > * > - * percent[0] specifies how much pressure to put on ram/swap backed > - * memory, while percent[1] determines pressure on the file LRUs. > + * nr[x] specifies how many pages should be scaned The new comment loses information.. > */ > -static void get_scan_ratio(struct zone *zone, struct scan_control *sc, > - unsigned long *percent) > +static void get_scan_count(struct zone *zone, struct scan_control *sc, > + unsigned long *nr, int priority) > { > unsigned long anon, file, free; > unsigned long anon_prio, file_prio; > unsigned long ap, fp; > struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc); > + unsigned long fraction[2], denominator[2]; denominator[2] can be reduced to denominator. because denominator[0] == denominator[1] always holds. > + enum lru_list l; > > /* If we have no swap space, do not bother scanning anon pages. */ > if (!sc->may_swap || (nr_swap_pages <= 0)) { > - percent[0] = 0; > - percent[1] = 100; > - return; > + fraction[0] = 0; > + denominator[0] = 1; > + fraction[1] = 1; > + denominator[1] = 1; > + goto out; > } > > anon = zone_nr_lru_pages(zone, sc, LRU_ACTIVE_ANON) + > @@ -1552,9 +1575,11 @@ static void get_scan_ratio(struct zone *zone, struct scan_control *sc, > /* If we have very few page cache pages, > force-scan anon pages. */ > if (unlikely(file + free <= high_wmark_pages(zone))) { > - percent[0] = 100; > - percent[1] = 0; > - return; > + fraction[0] = 1; > + denominator[0] = 1; > + fraction[1] = 0; > + denominator[1] = 1; > + goto out; > } > } > > @@ -1601,29 +1626,29 @@ static void get_scan_ratio(struct zone *zone, struct scan_control *sc, > fp = (file_prio + 1) * (reclaim_stat->recent_scanned[1] + 1); > fp /= reclaim_stat->recent_rotated[1] + 1; > > - /* Normalize to percentages */ > - percent[0] = 100 * ap / (ap + fp + 1); > - percent[1] = 100 - percent[0]; > -} > - > -/* > - * Smallish @nr_to_scan's are deposited in @nr_saved_scan, > - * until we collected @swap_cluster_max pages to scan. 
> - */ > -static unsigned long nr_scan_try_batch(unsigned long nr_to_scan, > - unsigned long *nr_saved_scan) > -{ > - unsigned long nr; > + fraction[0] = ap; > + denominator[0] = ap + fp + 1; > + fraction[1] = fp; > + denominator[1] = ap + fp + 1; > > - *nr_saved_scan += nr_to_scan; > - nr = *nr_saved_scan; > +out: > + for_each_evictable_lru(l) { > + int file = is_file_lru(l); > + unsigned long scan; > > - if (nr >= SWAP_CLUSTER_MAX) > - *nr_saved_scan = 0; > - else > - nr = 0; > + if (fraction[file] == 0) { > + nr[l] = 0; > + continue; > + } > > - return nr; > + scan = zone_nr_lru_pages(zone, sc, l); > + if (priority) { > + scan >>= priority; > + scan = (scan * fraction[file] / denominator[file]); The "()" is not necessary here, or better end it here^ Thanks, Fengguang > + } > + nr[l] = nr_scan_try_batch(scan, > + &reclaim_stat->nr_saved_scan[l]); > + } > } > > /* > @@ -1634,31 +1659,11 @@ static void shrink_zone(int priority, struct zone *zone, > { > unsigned long nr[NR_LRU_LISTS]; > unsigned long nr_to_scan; > - unsigned long percent[2]; /* anon @ 0; file @ 1 */ > enum lru_list l; > unsigned long nr_reclaimed = sc->nr_reclaimed; > unsigned long nr_to_reclaim = sc->nr_to_reclaim; > - struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc); > - > - get_scan_ratio(zone, sc, percent); > > - for_each_evictable_lru(l) { > - int file = is_file_lru(l); > - unsigned long scan; > - > - if (percent[file] == 0) { > - nr[l] = 0; > - continue; > - } > - > - scan = zone_nr_lru_pages(zone, sc, l); > - if (priority) { > - scan >>= priority; > - scan = (scan * percent[file]) / 100; > - } > - nr[l] = nr_scan_try_batch(scan, > - &reclaim_stat->nr_saved_scan[l]); > - } > + get_scan_count(zone, sc, nr, priority); > > while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] || > nr[LRU_INACTIVE_FILE]) { -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-04 0:48 ` Wu Fengguang @ 2010-04-06 1:27 ` Shaohua Li 0 siblings, 0 replies; 49+ messages in thread From: Shaohua Li @ 2010-04-06 1:27 UTC (permalink / raw) To: Wu, Fengguang Cc: KOSAKI Motohiro, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org On Sun, Apr 04, 2010 at 08:48:38AM +0800, Wu, Fengguang wrote: > On Fri, Apr 02, 2010 at 02:50:52PM +0800, Li, Shaohua wrote: > > On Wed, Mar 31, 2010 at 01:53:27PM +0800, KOSAKI Motohiro wrote: > > > > > On Tue, Mar 30, 2010 at 02:08:53PM +0800, KOSAKI Motohiro wrote: > > > > > > Hi > > > > > > > > > > > > > Commit 84b18490d1f1bc7ed5095c929f78bc002eb70f26 introduces a regression. > > > > > > > With it, our tmpfs test always oom. The test has a lot of rotated anon > > > > > > > pages and cause percent[0] zero. Actually the percent[0] is a very small > > > > > > > value, but our calculation round it to zero. The commit makes vmscan > > > > > > > completely skip anon pages and cause oops. > > > > > > > An option is if percent[x] is zero in get_scan_ratio(), forces it > > > > > > > to 1. See below patch. > > > > > > > But the offending commit still changes behavior. Without the commit, we scan > > > > > > > all pages if priority is zero, below patch doesn't fix this. Don't know if > > > > > > > It's required to fix this too. > > > > > > > > > > > > Can you please post your /proc/meminfo and reproduce program? I'll digg it. > > > > > > > > > > > > Very unfortunately, this patch isn't acceptable. In past time, vmscan > > > > > > had similar logic, but 1% swap-out made lots bug reports. > > > > > if 1% is still big, how about below patch? > > > > > > > > This patch makes a lot of sense than previous. however I think <1% anon ratio > > > > shouldn't happen anyway because file lru doesn't have reclaimable pages. > > > > <1% seems no good reclaim rate. > > > > > > Oops, the above mention is wrong. sorry. only 1 page is still too big. > > > because under streaming io workload, the number of scanning anon pages should > > > be zero. this is very strong requirement. if not, backup operation will makes > > > a lot of swapping out. > > Sounds there is no big impact for the workload which you mentioned with the patch. > > please see below descriptions. > > I updated the description of the patch as fengguang suggested. > > > > > > > > Commit 84b18490d introduces a regression. With it, our tmpfs test always oom. > > The test uses a 6G tmpfs in a system with 3G memory. In the tmpfs, there are > > 6 copies of kernel source and the test does kbuild for each copy. My > > investigation shows the test has a lot of rotated anon pages and quite few > > file pages, so get_scan_ratio calculates percent[0] to be zero. Actually > > the percent[0] shoule be a very small value, but our calculation round it > > to zero. The commit makes vmscan completely skip anon pages and cause oops. > > > > To avoid underflow, we don't use percentage, instead we directly calculate > > how many pages should be scaned. In this way, we should get several scan pages > > for < 1% percent. With this fix, my test doesn't oom any more. > > > > Note, this patch doesn't really change logics, but just increase precise. For > > system with a lot of memory, this might slightly changes behavior. For example, > > in a sequential file read workload, without the patch, we don't swap any anon > > pages. With it, if anon memory size is bigger than 16G, we will say one anon page > > see? 
Thanks, will send an updated version against -mm since we reverted the offending patch. Thanks, Shaohua -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-02 6:50 ` Shaohua Li 2010-04-02 9:14 ` KOSAKI Motohiro 2010-04-04 0:48 ` Wu Fengguang @ 2010-04-06 5:03 ` Wu Fengguang 2010-04-06 5:36 ` Shaohua Li 2010-04-09 6:51 ` Shaohua Li 2 siblings, 2 replies; 49+ messages in thread From: Wu Fengguang @ 2010-04-06 5:03 UTC (permalink / raw) To: Li, Shaohua Cc: KOSAKI Motohiro, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org Shaohua, > + scan = zone_nr_lru_pages(zone, sc, l); > + if (priority) { > + scan >>= priority; > + scan = (scan * fraction[file] / denominator[file]); Ah, the (scan * fraction[file]) may overflow on a 32-bit kernel! Thanks, Fengguang -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
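The overflow is easy to model in userspace (magnitudes are assumptions: on a 32-bit kernel unsigned long is 32 bits, and fraction[] scales with recent_scanned, which can approach the LRU size):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
        /* uint32_t models a 32-bit unsigned long */
        uint32_t scan = 1UL << 18;              /* 256K pages: ~1GB lru at 4K */
        uint32_t fraction = 3UL << 20;          /* plausible ap magnitude */
        uint32_t denominator = (3UL << 20) + 1;

        uint32_t bad = scan * fraction / denominator;            /* product wraps */
        uint64_t good = (uint64_t)scan * fraction / denominator; /* 64-bit math */

        /* here the wrapped product is 0, so nr[l] would come out 0
         * instead of ~262143 */
        printf("32-bit: %u  64-bit: %llu\n", bad, (unsigned long long)good);
        return 0;
}

Widening fraction[] to u64 and dividing with div64_u64(), as the updated patch later in the thread does, avoids this.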
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-06 5:03 ` Wu Fengguang @ 2010-04-06 5:36 ` Shaohua Li 2010-04-09 6:51 ` Shaohua Li 1 sibling, 0 replies; 49+ messages in thread From: Shaohua Li @ 2010-04-06 5:36 UTC (permalink / raw) To: Wu, Fengguang Cc: KOSAKI Motohiro, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org On Tue, Apr 06, 2010 at 01:03:25PM +0800, Wu, Fengguang wrote: > Shaohua, > > > + scan = zone_nr_lru_pages(zone, sc, l); > > + if (priority) { > > + scan >>= priority; > > + scan = (scan * fraction[file] / denominator[file]); > > Ah, the (scan * fraction[file]) may overflow in 32bit kernel! Good catch, will change it to u64. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-06 5:03 ` Wu Fengguang 2010-04-06 5:36 ` Shaohua Li @ 2010-04-09 6:51 ` Shaohua Li 2010-04-09 21:20 ` Andrew Morton 1 sibling, 1 reply; 49+ messages in thread From: Shaohua Li @ 2010-04-09 6:51 UTC (permalink / raw) To: Wu, Fengguang Cc: KOSAKI Motohiro, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, riel On Tue, Apr 06, 2010 at 01:03:25PM +0800, Wu, Fengguang wrote: > Shaohua, > > > + scan = zone_nr_lru_pages(zone, sc, l); > > + if (priority) { > > + scan >>= priority; > > + scan = (scan * fraction[file] / denominator[file]); > > Ah, the (scan * fraction[file]) may overflow in 32bit kernel! I updated the patch to address the previous issues. Is it possible to put this into the -mm tree to see if anything weird happens? get_scan_ratio() calculates a percentage, and if the percentage is < 1%, it will round the percentage down to 0% and cause us to completely ignore scanning anon/file pages to reclaim memory, even when the total anon/file pages are very big. To avoid the underflow, we don't use a percentage; instead we directly calculate how many pages should be scanned. In this way, we still get several pages to scan for a < 1% ratio. This has some benefits: 1. it increases our calculation precision; 2. it makes our scanning smoother. Without this, if percent[x] underflows, shrink_zone() doesn't scan any pages and then suddenly scans all pages when priority is zero. With this, even when priority isn't zero, shrink_zone() gets a chance to scan some pages. Note, this patch doesn't really change the logic, but just increases precision. For a system with a lot of memory, this might slightly change behavior. For example, in a sequential file read workload, without the patch, we don't swap any anon pages. With it, if anon memory size is bigger than 16G, we will see one anon page swapped. The 16G is calculated as PAGE_SIZE * priority(4096) * (fp/ap); fp/ap is assumed to be 1024, which is common in this workload. So the impact sounds like not a big deal. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Wu Fengguang <fengguang.wu@intel.com> diff --git a/mm/vmscan.c b/mm/vmscan.c index 3ff3311..1070f83 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1519,21 +1519,52 @@ static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan, } /* + * Smallish @nr_to_scan's are deposited in @nr_saved_scan, + * until we collected @swap_cluster_max pages to scan. + */ +static unsigned long nr_scan_try_batch(unsigned long nr_to_scan, + unsigned long *nr_saved_scan) +{ + unsigned long nr; + + *nr_saved_scan += nr_to_scan; + nr = *nr_saved_scan; + + if (nr >= SWAP_CLUSTER_MAX) + *nr_saved_scan = 0; + else + nr = 0; + + return nr; +} + +/* * Determine how aggressively the anon and file LRU lists should be * scanned. The relative value of each set of LRU lists is determined * by looking at the fraction of the pages scanned we did rotate back * onto the active list instead of evict. * - * percent[0] specifies how much pressure to put on ram/swap backed - * memory, while percent[1] determines pressure on the file LRUs.
+ * nr[0] = anon pages to scan; nr[1] = file pages to scan */ -static void get_scan_ratio(struct zone *zone, struct scan_control *sc, - unsigned long *percent) +static void get_scan_count(struct zone *zone, struct scan_control *sc, + unsigned long *nr, int priority) { unsigned long anon, file, free; unsigned long anon_prio, file_prio; unsigned long ap, fp; struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc); + u64 fraction[2], denominator; + enum lru_list l; + int noswap = 0; + + /* If we have no swap space, do not bother scanning anon pages. */ + if (!sc->may_swap || (nr_swap_pages <= 0)) { + noswap = 1; + fraction[0] = 0; + fraction[1] = 1; + denominator = 1; + goto out; + } anon = zone_nr_lru_pages(zone, sc, LRU_ACTIVE_ANON) + zone_nr_lru_pages(zone, sc, LRU_INACTIVE_ANON); @@ -1545,9 +1576,10 @@ static void get_scan_ratio(struct zone *zone, struct scan_control *sc, /* If we have very few page cache pages, force-scan anon pages. */ if (unlikely(file + free <= high_wmark_pages(zone))) { - percent[0] = 100; - percent[1] = 0; - return; + fraction[0] = 1; + fraction[1] = 0; + denominator = 1; + goto out; } } @@ -1594,29 +1626,22 @@ static void get_scan_ratio(struct zone *zone, struct scan_control *sc, fp = (file_prio + 1) * (reclaim_stat->recent_scanned[1] + 1); fp /= reclaim_stat->recent_rotated[1] + 1; - /* Normalize to percentages */ - percent[0] = 100 * ap / (ap + fp + 1); - percent[1] = 100 - percent[0]; -} - -/* - * Smallish @nr_to_scan's are deposited in @nr_saved_scan, - * until we collected @swap_cluster_max pages to scan. - */ -static unsigned long nr_scan_try_batch(unsigned long nr_to_scan, - unsigned long *nr_saved_scan) -{ - unsigned long nr; - - *nr_saved_scan += nr_to_scan; - nr = *nr_saved_scan; - - if (nr >= SWAP_CLUSTER_MAX) - *nr_saved_scan = 0; - else - nr = 0; + fraction[0] = ap; + fraction[1] = fp; + denominator = ap + fp + 1; +out: + for_each_evictable_lru(l) { + int file = is_file_lru(l); + unsigned long scan; - return nr; + scan = zone_nr_lru_pages(zone, sc, l); + if (priority || noswap) { + scan >>= priority; + scan = div64_u64(scan * fraction[file], denominator); + } + nr[l] = nr_scan_try_batch(scan, + &reclaim_stat->nr_saved_scan[l]); + } } /* @@ -1627,33 +1652,11 @@ static void shrink_zone(int priority, struct zone *zone, { unsigned long nr[NR_LRU_LISTS]; unsigned long nr_to_scan; - unsigned long percent[2]; /* anon @ 0; file @ 1 */ enum lru_list l; unsigned long nr_reclaimed = sc->nr_reclaimed; unsigned long nr_to_reclaim = sc->nr_to_reclaim; - struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc); - int noswap = 0; - - /* If we have no swap space, do not bother scanning anon pages. */ - if (!sc->may_swap || (nr_swap_pages <= 0)) { - noswap = 1; - percent[0] = 0; - percent[1] = 100; - } else - get_scan_ratio(zone, sc, percent); - for_each_evictable_lru(l) { - int file = is_file_lru(l); - unsigned long scan; - - scan = zone_nr_lru_pages(zone, sc, l); - if (priority || noswap) { - scan >>= priority; - scan = (scan * percent[file]) / 100; - } - nr[l] = nr_scan_try_batch(scan, - &reclaim_stat->nr_saved_scan[l]); - } + get_scan_count(zone, sc, nr, priority); while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] || nr[LRU_INACTIVE_FILE]) { -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 49+ messages in thread
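As a rough illustration of the precision argument in the changelog, here is a toy userspace model (the sizes and the ap:fp ratio are assumed, loosely matching the changelog's example) contrasting the old percent[] rounding with the new fraction/denominator arithmetic:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* Assumed: 32GB of anon pages (4K each), DEF_PRIORITY, and the
	 * changelog's ap:fp ratio of roughly 1:1024. */
	uint64_t anon_pages = 8ULL << 20;	/* 32GB / 4KB */
	uint64_t ap = 1, fp = 1024;
	int priority = 12;			/* DEF_PRIORITY */

	/* Old scheme: the share rounds down to 0%, nothing is scanned. */
	uint64_t percent = 100 * ap / (ap + fp + 1);	/* == 0 */
	uint64_t old_scan = (anon_pages >> priority) * percent / 100;

	/* New scheme: apply the fraction to the page count directly. */
	uint64_t new_scan = (anon_pages >> priority) * ap / (ap + fp + 1);

	/* Prints "old: 0 pages, new: 1 pages". */
	printf("old: %llu pages, new: %llu pages\n",
	       (unsigned long long)old_scan, (unsigned long long)new_scan);
	return 0;
}

Note that nr_scan_try_batch() would still defer such a single-page result until SWAP_CLUSTER_MAX pages have accumulated, so the extra precision does not by itself force extra reclaim work per pass.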
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-09 6:51 ` Shaohua Li @ 2010-04-09 21:20 ` Andrew Morton 2010-04-09 21:25 ` Rik van Riel 2010-04-12 1:57 ` Shaohua Li 0 siblings, 2 replies; 49+ messages in thread From: Andrew Morton @ 2010-04-09 21:20 UTC (permalink / raw) To: Shaohua Li Cc: Wu, Fengguang, KOSAKI Motohiro, linux-mm@kvack.org, linux-kernel@vger.kernel.org, riel On Fri, 9 Apr 2010 14:51:04 +0800 Shaohua Li <shaohua.li@intel.com> wrote: > get_scan_ratio() calculates a percentage, and if the percentage is < 1%, it > rounds the percentage down to 0% and causes us to completely skip scanning > anon/file pages to reclaim memory, even when the total number of anon/file > pages is very large. > > To avoid the underflow, we don't use a percentage; instead we directly > calculate how many pages should be scanned. In this way, we still get several > pages scanned for a < 1% share. > > This has some benefits: > 1. it increases our calculation precision > 2. it makes our scanning smoother. Without this, if percent[x] underflows, > shrink_zone() doesn't scan any pages and then suddenly scans all pages when > priority reaches zero. With this, shrink_zone() gets a chance to scan some > pages even when priority isn't zero. > > Note, this patch doesn't really change the logic, it just increases precision. > For a system with a lot of memory, this might slightly change behavior. For > example, in a sequential file read workload, without the patch we don't swap > any anon pages. With it, if the anon memory size is bigger than 16G, we will > see one anon page swapped. The 16G is calculated as PAGE_SIZE * priority(4096) > * (fp/ap), where fp/ap is assumed to be 1024, which is common in this > workload. So the impact does not sound like a big deal. I grabbed this. Did we decide that this needed to be backported into 2.6.33.x? If so, some words explaining the reasoning would be needed. Come to that, it's not obvious that we need this in 2.6.34 either. What is the user-visible impact here? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-09 21:20 ` Andrew Morton @ 2010-04-09 21:25 ` Rik van Riel 2010-04-13 1:30 ` KOSAKI Motohiro 2010-04-12 1:57 ` Shaohua Li 1 sibling, 1 reply; 49+ messages in thread From: Rik van Riel @ 2010-04-09 21:25 UTC (permalink / raw) To: Andrew Morton Cc: Shaohua Li, Wu, Fengguang, KOSAKI Motohiro, linux-mm@kvack.org, linux-kernel@vger.kernel.org On 04/09/2010 05:20 PM, Andrew Morton wrote: > Come to that, it's not obvious that we need this in 2.6.34 either. What > is the user-visible impact here? I suspect very little impact, especially during workloads where we can just reclaim clean page cache at DEF_PRIORITY. FWIW, the patch looks good to me, so: Acked-by: Rik van Riel <riel@redhat.com> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-09 21:25 ` Rik van Riel @ 2010-04-13 1:30 ` KOSAKI Motohiro 2010-04-13 2:42 ` Rik van Riel 0 siblings, 1 reply; 49+ messages in thread From: KOSAKI Motohiro @ 2010-04-13 1:30 UTC (permalink / raw) To: Rik van Riel Cc: kosaki.motohiro, Andrew Morton, Shaohua Li, Wu, Fengguang, linux-mm@kvack.org, linux-kernel@vger.kernel.org > On 04/09/2010 05:20 PM, Andrew Morton wrote: > > > Come to that, it's not obvious that we need this in 2.6.34 either. What > > is the user-visible impact here? > > I suspect very little impact, especially during workloads > where we can just reclaim clean page cache at DEF_PRIORITY. > FWIW, the patch looks good to me, so: > > Acked-by: Rik van Riel <riel@redhat.com> > This ack surprises me a bit. Rik, do you have any improvement plan for the streaming I/O detection logic? I think the patch has a rather marginal benefit: it helps the <1% scan ratio case. But it has a big regression: it causes streaming I/O (e.g. a backup operation) to generate tons of swap. So I thought we should do one of: 1) drop this one 2) merge a streaming I/O detection logic improvement first, and merge this one second. Am I missing something? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-13 1:30 ` KOSAKI Motohiro @ 2010-04-13 2:42 ` Rik van Riel 2010-04-13 7:55 ` KOSAKI Motohiro 0 siblings, 1 reply; 49+ messages in thread From: Rik van Riel @ 2010-04-13 2:42 UTC (permalink / raw) To: KOSAKI Motohiro Cc: Andrew Morton, Shaohua Li, Wu, Fengguang, linux-mm@kvack.org, linux-kernel@vger.kernel.org On 04/12/2010 09:30 PM, KOSAKI Motohiro wrote: >> On 04/09/2010 05:20 PM, Andrew Morton wrote: >> >>> Come to that, it's not obvious that we need this in 2.6.34 either. What >>> is the user-visible impact here? >> >> I suspect very little impact, especially during workloads >> where we can just reclaim clean page cache at DEF_PRIORITY. >> FWIW, the patch looks good to me, so: >> >> Acked-by: Rik van Riel <riel@redhat.com> >> > > This ack surprises me a bit. Rik, do you have any improvement plan for > the streaming I/O detection logic? > I think the patch has a rather marginal benefit: it helps the <1% scan > ratio case. But it has a big regression: it causes streaming I/O (e.g. a backup > operation) to generate tons of swap. How? From the description I believe it took 16GB in a zone before we start scanning anon pages when reclaiming at DEF_PRIORITY? Would that cause a problem? > So I thought we should do one of: > 1) drop this one > 2) merge a streaming I/O detection logic improvement first, and > merge this one second. We may need better streaming IO detection, anyway. I have noticed that while heavy sequential reads are fine, the virtual machines on my desktop system do a lot of whole block writes. Presumably, a lot of those writes are to the same blocks, over and over again. This causes the blocks to be promoted to the active file list, which ends up growing the active file list to the point where things from the working set get evicted. All for file pages that may only get WRITTEN to by the guests, because the guests cache their own copy whenever they need to read them! I'll have to check the page cache code to see if it keeps frequently written pages as accessed. We may be better off evicting frequently written pages, and keeping our cache space for data that is read... -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-13 2:42 ` Rik van Riel @ 2010-04-13 7:55 ` KOSAKI Motohiro 2010-04-13 8:55 ` KOSAKI Motohiro 0 siblings, 1 reply; 49+ messages in thread From: KOSAKI Motohiro @ 2010-04-13 7:55 UTC (permalink / raw) To: Rik van Riel Cc: kosaki.motohiro, Andrew Morton, Shaohua Li, Wu, Fengguang, linux-mm@kvack.org, linux-kernel@vger.kernel.org > On 04/12/2010 09:30 PM, KOSAKI Motohiro wrote: > >> On 04/09/2010 05:20 PM, Andrew Morton wrote: > >> > >>> Come to that, it's not obvious that we need this in 2.6.34 either. What > >>> is the user-visible impact here? > >> > >> I suspect very little impact, especially during workloads > >> where we can just reclaim clean page cache at DEF_PRIORITY. > >> FWIW, the patch looks good to me, so: > >> > >> Acked-by: Rik van Riel <riel@redhat.com> > >> > > > > This ack surprises me a bit. Rik, do you have any improvement plan for > > the streaming I/O detection logic? > > I think the patch has a rather marginal benefit: it helps the <1% scan > > ratio case. But it has a big regression: it causes streaming I/O (e.g. a backup > > operation) to generate tons of swap. > > How? From the description I believe it took 16GB in > a zone before we start scanning anon pages when > reclaiming at DEF_PRIORITY? > > Would that cause a problem? Please remember, 2.6.27 has the following +1 scanning modifier. zone->nr_scan_active += (zone_page_state(zone, NR_ACTIVE) >> priority) + 1; ^^^^ and the early (and not yet merged) SplitLRU VM had a similar +1, like scan = zone_nr_lru_pages(zone, sc, l); scan >>= priority; scan = (scan * percent[file]) / 100 + 1; ^^^ We thought scanning only one extra page was no big matter, but that was not correct: we got streaming I/O bug reports. The above +1 caused annoying swap I/O, because some servers run backup operations much, much bigger than physical memory size. For example, if the VM is dropping 1TB of use-once pages, a 0.1% anon scan ratio makes for a 1GB scan, and most such servers only have a few gigabytes of swap although they have >1TB of memory. If my memory is not correct, please correct me. My point is, greater or smaller than 16GB isn't the essential thing. Every patch should be worth more than its downside. The description said "the impact does not sound like a big deal", and nobody disagrees, but its worth is even smaller. I don't imagine this patch improving anything. > > > So I thought we should do one of: > > 1) drop this one > > 2) merge a streaming I/O detection logic improvement first, and > > merge this one second. > > We may need better streaming IO detection, anyway. Agreed, no doubt about that. > I have noticed that while heavy sequential reads are fine, > the virtual machines on my desktop system do a lot of whole > block writes. Presumably, a lot of those writes are to the > same blocks, over and over again. > > This causes the blocks to be promoted to the active file > list, which ends up growing the active file list to the > point where things from the working set get evicted. > > All for file pages that may only get WRITTEN to by the > guests, because the guests cache their own copy whenever > they need to read them! > > I'll have to check the page cache code to see if it > keeps frequently written pages as accessed. We may be > better off evicting frequently written pages, and > keeping our cache space for data that is read... One question: in such a case, don't your guests use Direct I/O? Or are you talking about guest VM behavior? I guess inactive_file_is_low_global() can be improved a lot, but I'm not sure. 
-- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
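KOSAKI's back-of-the-envelope numbers, as a standalone sketch (all figures assumed: a 1TB box, 4K pages, and an anon share of roughly 1:1000 kept across the whole backup run):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* Assumed: a 1TB machine whose backup job streams ~1TB of
	 * use-once file pages through the page cache. */
	uint64_t streamed_pages = 1ULL << 28;	/* 1TB / 4KB */

	/* If anon keeps even a 1:1000 share of the scan pressure, the
	 * accumulated anon scanning over the whole run is on the order
	 * of a gigabyte, comparable to the machine's entire swap. */
	uint64_t anon_scanned = streamed_pages / 1000;

	/* Prints "anon pages scanned: 268435 (~1048 MB)". */
	printf("anon pages scanned: %llu (~%llu MB)\n",
	       (unsigned long long)anon_scanned,
	       (unsigned long long)(anon_scanned * 4096 >> 20));
	return 0;
}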
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-13 7:55 ` KOSAKI Motohiro @ 2010-04-13 8:55 ` KOSAKI Motohiro 2010-04-14 1:27 ` Shaohua Li 0 siblings, 1 reply; 49+ messages in thread From: KOSAKI Motohiro @ 2010-04-13 8:55 UTC (permalink / raw) To: Rik van Riel Cc: kosaki.motohiro, Andrew Morton, Shaohua Li, Wu, Fengguang, linux-mm@kvack.org, linux-kernel@vger.kernel.org > > > This ack surprises me a bit. Rik, do you have any improvement plan for > > > the streaming I/O detection logic? > > > I think the patch has a rather marginal benefit: it helps the <1% scan > > > ratio case. But it has a big regression: it causes streaming I/O (e.g. a backup > > > operation) to generate tons of swap. > > > > How? From the description I believe it took 16GB in > > a zone before we start scanning anon pages when > > reclaiming at DEF_PRIORITY? > > > > Would that cause a problem? > > Please remember, 2.6.27 has the following +1 scanning modifier. > > zone->nr_scan_active += (zone_page_state(zone, NR_ACTIVE) >> priority) + 1; > ^^^^ > > and the early (and not yet merged) SplitLRU VM had a similar +1, like > > scan = zone_nr_lru_pages(zone, sc, l); > scan >>= priority; > scan = (scan * percent[file]) / 100 + 1; > ^^^ > > We thought scanning only one extra page was no big matter, but that was not > correct: we got streaming I/O bug reports. The above +1 caused annoying swap > I/O, because some servers run backup operations much, much bigger than > physical memory size. > > For example, if the VM is dropping 1TB of use-once pages, a 0.1% anon scan > ratio makes for a 1GB scan, and most such servers only have a few gigabytes > of swap although they have >1TB of memory. > > If my memory is not correct, please correct me. > > My point is, greater or smaller than 16GB isn't the essential thing. Every > patch should be worth more than its downside. The description said "the impact > does not sound like a big deal", and nobody disagrees, but its worth is even > smaller. I don't imagine this patch improving anything. And now I've merged this patch into my local vmscan patch queue. After solving the streaming I/O issue, I'll push it to mainline. Thanks. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-13 8:55 ` KOSAKI Motohiro @ 2010-04-14 1:27 ` Shaohua Li 2010-04-15 3:25 ` KOSAKI Motohiro 0 siblings, 1 reply; 49+ messages in thread From: Shaohua Li @ 2010-04-14 1:27 UTC (permalink / raw) To: KOSAKI Motohiro Cc: Rik van Riel, Andrew Morton, Wu, Fengguang, linux-mm@kvack.org, linux-kernel@vger.kernel.org On Tue, Apr 13, 2010 at 04:55:52PM +0800, KOSAKI Motohiro wrote: > > > > This ack surprises me a bit. Rik, do you have any improvement plan for > > > > the streaming I/O detection logic? > > > > I think the patch has a rather marginal benefit: it helps the <1% scan > > > > ratio case. But it has a big regression: it causes streaming I/O (e.g. a backup > > > > operation) to generate tons of swap. > > > > > > How? From the description I believe it took 16GB in > > > a zone before we start scanning anon pages when > > > reclaiming at DEF_PRIORITY? > > > > > > Would that cause a problem? > > > > Please remember, 2.6.27 has the following +1 scanning modifier. > > > > zone->nr_scan_active += (zone_page_state(zone, NR_ACTIVE) >> priority) + 1; > > ^^^^ > > > > and the early (and not yet merged) SplitLRU VM had a similar +1, like > > > > scan = zone_nr_lru_pages(zone, sc, l); > > scan >>= priority; > > scan = (scan * percent[file]) / 100 + 1; > > ^^^ > > > > We thought scanning only one extra page was no big matter, but that was not > > correct: we got streaming I/O bug reports. The above +1 caused annoying swap > > I/O, because some servers run backup operations much, much bigger than > > physical memory size. > > > > For example, if the VM is dropping 1TB of use-once pages, a 0.1% anon scan > > ratio makes for a 1GB scan, and most such servers only have a few gigabytes > > of swap although they have >1TB of memory. > > > > If my memory is not correct, please correct me. > > > > My point is, greater or smaller than 16GB isn't the essential thing. Every > > patch should be worth more than its downside. The description said "the impact > > does not sound like a big deal", and nobody disagrees, but its worth is even > > smaller. I don't imagine this patch improving anything. > > And now I've merged this patch into my local vmscan patch queue. > After solving the streaming I/O issue, I'll push it to mainline. If the streaming I/O issue is a common case, how about the patch below against my last one? We make priority == DEF_PRIORITY an exception. Index: linux/mm/vmscan.c =================================================================== --- linux.orig/mm/vmscan.c 2010-04-14 09:03:28.000000000 +0800 +++ linux/mm/vmscan.c 2010-04-14 09:19:56.000000000 +0800 @@ -1629,6 +1629,22 @@ static void get_scan_count(struct zone * fraction[0] = ap; fraction[1] = fp; denominator = ap + fp + 1; + + /* + * memory pressure isn't high, so we allow the scan ratio to + * underflow. This avoids swapping in the streaming I/O case. + */ + if (priority == DEF_PRIORITY) { + if (fraction[0] * 99 < fraction[1]) { + fraction[0] = 0; + fraction[1] = 1; + denominator = 1; + } else if (fraction[1] * 99 < fraction[0]) { + fraction[0] = 1; + fraction[1] = 0; + denominator = 1; + } + } out: for_each_evictable_lru(l) { int file = is_file_lru(l); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
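The shape of the proposed exception, as a self-contained sketch with made-up rotation weights (allow_underflow() is a hypothetical stand-in for the hunk above, not a kernel function):

#include <stdio.h>
#include <stdint.h>

#define DEF_PRIORITY 12

/* Hypothetical stand-in for the hunk above: at the lowest memory
 * pressure, a <1% share on either list is rounded away entirely. */
static void allow_underflow(int priority, uint64_t frac[2], uint64_t *denom)
{
	if (priority != DEF_PRIORITY)
		return;
	if (frac[0] * 99 < frac[1]) {
		frac[0] = 0;
		frac[1] = 1;
		*denom = 1;
	} else if (frac[1] * 99 < frac[0]) {
		frac[0] = 1;
		frac[1] = 0;
		*denom = 1;
	}
}

int main(void)
{
	/* Made-up rotation weights: anon share of roughly 0.1%. */
	uint64_t frac[2] = { 1, 1024 };
	uint64_t denom = frac[0] + frac[1] + 1;

	allow_underflow(DEF_PRIORITY, frac, &denom);

	/* Prints "anon=0 file=1 denom=1": no anon scanning at all. */
	printf("anon=%llu file=%llu denom=%llu\n",
	       (unsigned long long)frac[0], (unsigned long long)frac[1],
	       (unsigned long long)denom);
	return 0;
}

At any priority below DEF_PRIORITY the check does nothing, so once real memory pressure builds up the precise fraction arithmetic takes over again.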
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-14 1:27 ` Shaohua Li @ 2010-04-15 3:25 ` KOSAKI Motohiro 0 siblings, 0 replies; 49+ messages in thread From: KOSAKI Motohiro @ 2010-04-15 3:25 UTC (permalink / raw) To: Shaohua Li Cc: kosaki.motohiro, Rik van Riel, Andrew Morton, Wu, Fengguang, linux-mm@kvack.org, linux-kernel@vger.kernel.org Hi, sorry for the delay. > > After solving the streaming I/O issue, I'll push it to mainline. > If the streaming I/O issue is a common case, how about the patch below against my last one? > We make priority == DEF_PRIORITY an exception. Your patch seems to work, but it is obviously an ugly band-aid patch. So I'd prefer your previous patch alone rather than combining it with this one, even though dropping both makes more sense than merging both. Please consider attacking the root cause. > Index: linux/mm/vmscan.c > =================================================================== > --- linux.orig/mm/vmscan.c 2010-04-14 09:03:28.000000000 +0800 > +++ linux/mm/vmscan.c 2010-04-14 09:19:56.000000000 +0800 > @@ -1629,6 +1629,22 @@ static void get_scan_count(struct zone * > fraction[0] = ap; > fraction[1] = fp; > denominator = ap + fp + 1; > + > + /* > + * memory pressure isn't high, so we allow the scan ratio to > + * underflow. This avoids swapping in the streaming I/O case. > + */ > + if (priority == DEF_PRIORITY) { > + if (fraction[0] * 99 < fraction[1]) { > + fraction[0] = 0; > + fraction[1] = 1; > + denominator = 1; > + } else if (fraction[1] * 99 < fraction[0]) { > + fraction[0] = 1; > + fraction[1] = 0; > + denominator = 1; > + } > + } > out: > for_each_evictable_lru(l) { > int file = is_file_lru(l); > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-09 21:20 ` Andrew Morton 2010-04-09 21:25 ` Rik van Riel @ 2010-04-12 1:57 ` Shaohua Li 1 sibling, 0 replies; 49+ messages in thread From: Shaohua Li @ 2010-04-12 1:57 UTC (permalink / raw) To: Andrew Morton Cc: Wu, Fengguang, KOSAKI Motohiro, linux-mm@kvack.org, linux-kernel@vger.kernel.org, riel@redhat.com On Sat, Apr 10, 2010 at 05:20:57AM +0800, Andrew Morton wrote: > On Fri, 9 Apr 2010 14:51:04 +0800 > Shaohua Li <shaohua.li@intel.com> wrote: > > > get_scan_ratio() calculates a percentage, and if the percentage is < 1%, it > > rounds the percentage down to 0% and causes us to completely skip scanning > > anon/file pages to reclaim memory, even when the total number of anon/file > > pages is very large. > > > > To avoid the underflow, we don't use a percentage; instead we directly > > calculate how many pages should be scanned. In this way, we still get several > > pages scanned for a < 1% share. > > > > This has some benefits: > > 1. it increases our calculation precision > > 2. it makes our scanning smoother. Without this, if percent[x] underflows, > > shrink_zone() doesn't scan any pages and then suddenly scans all pages when > > priority reaches zero. With this, shrink_zone() gets a chance to scan some > > pages even when priority isn't zero. > > > > Note, this patch doesn't really change the logic, it just increases precision. > > For a system with a lot of memory, this might slightly change behavior. For > > example, in a sequential file read workload, without the patch we don't swap > > any anon pages. With it, if the anon memory size is bigger than 16G, we will > > see one anon page swapped. The 16G is calculated as PAGE_SIZE * priority(4096) > > * (fp/ap), where fp/ap is assumed to be 1024, which is common in this > > workload. So the impact does not sound like a big deal. > > I grabbed this. > > Did we decide that this needed to be backported into 2.6.33.x? If so, > some words explaining the reasoning would be needed. > > Come to that, it's not obvious that we need this in 2.6.34 either. Not needed. > What is the user-visible impact here? Should be very small I think. Thanks, Shaohua -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-03-31 4:53 ` Shaohua Li 2010-03-31 5:38 ` KOSAKI Motohiro @ 2010-03-31 5:41 ` Wu Fengguang 1 sibling, 0 replies; 49+ messages in thread From: Wu Fengguang @ 2010-03-31 5:41 UTC (permalink / raw) To: Li, Shaohua Cc: KOSAKI Motohiro, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org On Wed, Mar 31, 2010 at 12:53:48PM +0800, Li, Shaohua wrote: > On Tue, Mar 30, 2010 at 02:08:53PM +0800, KOSAKI Motohiro wrote: > > Hi > > > > > Commit 84b18490d1f1bc7ed5095c929f78bc002eb70f26 introduces a regression. > > > With it, our tmpfs test always oom. The test has a lot of rotated anon > > > pages and cause percent[0] zero. Actually the percent[0] is a very small > > > value, but our calculation round it to zero. The commit makes vmscan > > > completely skip anon pages and cause oops. > > > An option is if percent[x] is zero in get_scan_ratio(), forces it > > > to 1. See below patch. > > > But the offending commit still changes behavior. Without the commit, we scan > > > all pages if priority is zero, below patch doesn't fix this. Don't know if > > > It's required to fix this too. > > > > Can you please post your /proc/meminfo and reproduce program? I'll digg it. > > > > Very unfortunately, this patch isn't acceptable. In past time, vmscan > > had similar logic, but 1% swap-out made lots bug reports. > if 1% is still big, how about below patch? > > Commit 84b18490d1f1bc7ed5095c929f78bc002eb70f26 introduces a regression. > With it, our tmpfs test always oom. The test has a lot of rotated anon > pages and cause percent[0] zero. Actually the percent[0] is a very small > value, but our calculation round it to zero. The commit makes vmscan > completely skip anon pages and cause oops. > To avoid underflow, we don't use percentage, instead we directly calculate > how many pages should be scaned. The changelog can be improved. For example, to describe these items in separate paragraphs: - the behavior change introduced by 84b18490d1f (which claims to be cleanup) - the tmpfs test case - the root cause - the solution - test result > Signed-off-by: Shaohua Li <shaohua.li@intel.com> > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 79c8098..80a7ed5 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -1519,27 +1519,50 @@ static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan, > } > > /* > + * Smallish @nr_to_scan's are deposited in @nr_saved_scan, > + * until we collected @swap_cluster_max pages to scan. > + */ > +static unsigned long nr_scan_try_batch(unsigned long nr_to_scan, > + unsigned long *nr_saved_scan) > +{ > + unsigned long nr; > + > + *nr_saved_scan += nr_to_scan; > + nr = *nr_saved_scan; > + > + if (nr >= SWAP_CLUSTER_MAX) > + *nr_saved_scan = 0; > + else > + nr = 0; > + > + return nr; > +} > + > +/* > * Determine how aggressively the anon and file LRU lists should be > * scanned. The relative value of each set of LRU lists is determined > * by looking at the fraction of the pages scanned we did rotate back > * onto the active list instead of evict. > * > - * percent[0] specifies how much pressure to put on ram/swap backed > - * memory, while percent[1] determines pressure on the file LRUs. 
> + * nr[x] specifies how many pages should be scaned typo: scanned > */ > -static void get_scan_ratio(struct zone *zone, struct scan_control *sc, > - unsigned long *percent) > +static void get_scan_count(struct zone *zone, struct scan_control *sc, > + unsigned long *nr, int priority) > { > unsigned long anon, file, free; > unsigned long anon_prio, file_prio; > unsigned long ap, fp; > struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc); > + unsigned long fraction[2], denominator[2]; The denominator is shared, so one scalar would be sufficient. Also ap, fp can be removed so fraction[] is used directly. And it's better to retain this comment: /* anon @ 0; file @ 1 */ > + enum lru_list l; > > /* If we have no swap space, do not bother scanning anon pages. */ > if (!sc->may_swap || (nr_swap_pages <= 0)) { > - percent[0] = 0; > - percent[1] = 100; > - return; > + fraction[0] = 0; > + denominator[0] = 1; > + fraction[1] = 1; > + denominator[1] = 1; > + goto out; > } > > anon = zone_nr_lru_pages(zone, sc, LRU_ACTIVE_ANON) + > @@ -1552,9 +1575,11 @@ static void get_scan_ratio(struct zone *zone, struct scan_control *sc, > /* If we have very few page cache pages, > force-scan anon pages. */ > if (unlikely(file + free <= high_wmark_pages(zone))) { > - percent[0] = 100; > - percent[1] = 0; > - return; > + fraction[0] = 1; > + denominator[0] = 1; > + fraction[1] = 0; > + denominator[1] = 1; > + goto out; > } > } > > @@ -1601,29 +1626,29 @@ static void get_scan_ratio(struct zone *zone, struct scan_control *sc, > fp = (file_prio + 1) * (reclaim_stat->recent_scanned[1] + 1); > fp /= reclaim_stat->recent_rotated[1] + 1; > > - /* Normalize to percentages */ > - percent[0] = 100 * ap / (ap + fp + 1); > - percent[1] = 100 - percent[0]; > -} > - > -/* > - * Smallish @nr_to_scan's are deposited in @nr_saved_scan, > - * until we collected @swap_cluster_max pages to scan. 
> - */ > -static unsigned long nr_scan_try_batch(unsigned long nr_to_scan, > - unsigned long *nr_saved_scan) > -{ > - unsigned long nr; > + fraction[0] = ap; > + denominator[0] = ap + fp + 1; > + fraction[1] = fp; > + denominator[1] = ap + fp + 1; > > - *nr_saved_scan += nr_to_scan; > - nr = *nr_saved_scan; > +out: > + for_each_evictable_lru(l) { > + int file = is_file_lru(l); > + unsigned long scan; > > - if (nr >= SWAP_CLUSTER_MAX) > - *nr_saved_scan = 0; > - else > - nr = 0; > + if (fraction[file] == 0) { > + nr[l] = 0; > + continue; > + } > > - return nr; > + scan = zone_nr_lru_pages(zone, sc, l); > + if (priority) { > + scan >>= priority; > + scan = (scan * fraction[file] / denominator[file]); scan = (scan * fraction[file]) / denominator[file]; Thanks, Fengguang > + } > + nr[l] = nr_scan_try_batch(scan, > + &reclaim_stat->nr_saved_scan[l]); > + } > } > > /* > @@ -1634,31 +1659,11 @@ static void shrink_zone(int priority, struct zone *zone, > { > unsigned long nr[NR_LRU_LISTS]; > unsigned long nr_to_scan; > - unsigned long percent[2]; /* anon @ 0; file @ 1 */ > enum lru_list l; > unsigned long nr_reclaimed = sc->nr_reclaimed; > unsigned long nr_to_reclaim = sc->nr_to_reclaim; > - struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc); > - > - get_scan_ratio(zone, sc, percent); > > - for_each_evictable_lru(l) { > - int file = is_file_lru(l); > - unsigned long scan; > - > - if (percent[file] == 0) { > - nr[l] = 0; > - continue; > - } > - > - scan = zone_nr_lru_pages(zone, sc, l); > - if (priority) { > - scan >>= priority; > - scan = (scan * percent[file]) / 100; > - } > - nr[l] = nr_scan_try_batch(scan, > - &reclaim_stat->nr_saved_scan[l]); > - } > + get_scan_count(zone, sc, nr, priority); > > while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] || > nr[LRU_INACTIVE_FILE]) { -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-03-30 5:53 [PATCH]vmscan: handle underflow for get_scan_ratio Shaohua Li 2010-03-30 6:08 ` KOSAKI Motohiro @ 2010-03-30 10:17 ` Minchan Kim 2010-03-30 10:25 ` KOSAKI Motohiro 2010-03-30 11:56 ` Balbir Singh 2 siblings, 1 reply; 49+ messages in thread From: Minchan Kim @ 2010-03-30 10:17 UTC (permalink / raw) To: Shaohua Li; +Cc: linux-mm, linux-kernel, akpm, fengguang.wu, kosaki.motohiro On Tue, Mar 30, 2010 at 2:53 PM, Shaohua Li <shaohua.li@intel.com> wrote: > Commit 84b18490d1f1bc7ed5095c929f78bc002eb70f26 introduces a regression. > With it, our tmpfs test always oom. The test has a lot of rotated anon > pages and cause percent[0] zero. Actually the percent[0] is a very small > value, but our calculation round it to zero. The commit makes vmscan > completely skip anon pages and cause oops. > An option is if percent[x] is zero in get_scan_ratio(), forces it > to 1. See below patch. > But the offending commit still changes behavior. Without the commit, we scan > all pages if priority is zero, below patch doesn't fix this. Don't know if > It's required to fix this too. > > Signed-off-by: Shaohua Li <shaohua.li@intel.com> > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 79c8098..d5cc34e 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -1604,6 +1604,18 @@ static void get_scan_ratio(struct zone *zone, struct scan_control *sc, > /* Normalize to percentages */ > percent[0] = 100 * ap / (ap + fp + 1); > percent[1] = 100 - percent[0]; > + /* > + * if percent[x] is small and rounded to 0, this case doesn't mean we > + * should skip scan. Give it at least 1% share. > + */ > + if (percent[0] == 0) { > + percent[0] = 1; > + percent[1] = 99; > + } > + if (percent[1] == 0) { > + percent[0] = 99; > + percent[1] = 1; > + } > } > > /* > Yes, it made a subtle change, but we should not depend on that change. The current logic seems better and clearer than the old one. I think you were lucky before, thanks to the not-good and not-clear logic. BTW, how about this? diff --git a/mm/vmscan.c b/mm/vmscan.c index 79c8098..f0df563 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1646,11 +1646,6 @@ static void shrink_zone(int priority, struct zone *zone, int file = is_file_lru(l); unsigned long scan; - if (percent[file] == 0) { - nr[l] = 0; - continue; - } - scan = zone_nr_lru_pages(zone, sc, l); if (priority) { scan >>= priority; -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 49+ messages in thread
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-03-30 10:17 ` Minchan Kim @ 2010-03-30 10:25 ` KOSAKI Motohiro 0 siblings, 0 replies; 49+ messages in thread From: KOSAKI Motohiro @ 2010-03-30 10:25 UTC (permalink / raw) To: Minchan Kim Cc: kosaki.motohiro, Shaohua Li, linux-mm, linux-kernel, akpm, fengguang.wu > Yes, it made a subtle change, but we should not depend on that change. > The current logic seems better and clearer than the old one. > I think you were lucky before, thanks to the not-good and not-clear logic. > > BTW, how about this? Unfortunately, memcg needs the code you removed. If it is removed, swap-out might happen even though sc->may_swap==0 when priority==0. Please give me a little time to investigate. > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 79c8098..f0df563 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -1646,11 +1646,6 @@ static void shrink_zone(int priority, struct zone *zone, > int file = is_file_lru(l); > unsigned long scan; > > - if (percent[file] == 0) { > - nr[l] = 0; > - continue; > - } > - > scan = zone_nr_lru_pages(zone, sc, l); > if (priority) { > scan >>= priority; > > > > > -- > Kind regards, > Minchan Kim > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
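The hazard KOSAKI is pointing at, sketched as a simplified userspace model of the shrink_zone() flow (scan_target() is a hypothetical reduction, not the kernel function): with the percent[file] == 0 early-continue removed, priority == 0 bypasses both the shift and the multiply, so a 0% share no longer protects the anon list.

#include <stdio.h>

/* Hypothetical reduction of the shrink_zone() loop body: how many
 * pages one LRU list is asked to scan. */
static unsigned long scan_target(unsigned long lru_pages,
				 unsigned long percent, int priority)
{
	unsigned long scan = lru_pages;

	/* With the "if (percent[file] == 0) continue;" guard removed,
	 * priority == 0 skips both the shift and the multiply... */
	if (priority) {
		scan >>= priority;
		scan = scan * percent / 100;
	}
	return scan;
}

int main(void)
{
	/* noswap case: anon percent is 0; the anon list has 1M pages
	 * (an assumed size).  Prints "priority 12: 0 pages". */
	printf("priority 12: %lu pages\n", scan_target(1UL << 20, 0, 12));
	/* ...so at priority 0 the whole anon list would be scanned even
	 * though sc->may_swap == 0.  Prints "priority 0: 1048576 pages". */
	printf("priority 0: %lu pages\n", scan_target(1UL << 20, 0, 0));
	return 0;
}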
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-03-30 5:53 [PATCH]vmscan: handle underflow for get_scan_ratio Shaohua Li 2010-03-30 6:08 ` KOSAKI Motohiro 2010-03-30 10:17 ` Minchan Kim @ 2010-03-30 11:56 ` Balbir Singh 2 siblings, 0 replies; 49+ messages in thread From: Balbir Singh @ 2010-03-30 11:56 UTC (permalink / raw) To: Shaohua Li; +Cc: linux-mm, linux-kernel, akpm, fengguang.wu, kosaki.motohiro On Tue, Mar 30, 2010 at 11:23 AM, Shaohua Li <shaohua.li@intel.com> wrote: > Commit 84b18490d1f1bc7ed5095c929f78bc002eb70f26 introduces a regression. > With it, our tmpfs test always oom. The test has a lot of rotated anon > pages and cause percent[0] zero. Actually the percent[0] is a very small > value, but our calculation round it to zero. The commit makes vmscan > completely skip anon pages and cause oops. > An option is if percent[x] is zero in get_scan_ratio(), forces it > to 1. See below patch. > But the offending commit still changes behavior. Without the commit, we scan > all pages if priority is zero, below patch doesn't fix this. Don't know if > It's required to fix this too. > > Signed-off-by: Shaohua Li <shaohua.li@intel.com> > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 79c8098..d5cc34e 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -1604,6 +1604,18 @@ static void get_scan_ratio(struct zone *zone, struct scan_control *sc, > /* Normalize to percentages */ > percent[0] = 100 * ap / (ap + fp + 1); > percent[1] = 100 - percent[0]; > + /* > + * if percent[x] is small and rounded to 0, this case doesn't mean we > + * should skip scan. Give it at least 1% share. > + */ > + if (percent[0] == 0) { > + percent[0] = 1; > + percent[1] = 99; > + } > + if (percent[1] == 0) { > + percent[0] = 99; > + percent[1] = 1; > + } > } > Can you please post the meminfo before and after the changes (diff maybe?). Can you also please share the ap and fp data from which the percent figures are being calculated. Is your swappiness set to 60? Can you please share the OOM/panic message as well. Balbir -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
end of thread, other threads:[~2010-04-15 3:25 UTC | newest] Thread overview: 49+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-03-30 5:53 [PATCH]vmscan: handle underflow for get_scan_ratio Shaohua Li 2010-03-30 6:08 ` KOSAKI Motohiro 2010-03-30 6:32 ` Shaohua Li 2010-03-30 6:40 ` KOSAKI Motohiro 2010-03-30 6:53 ` Shaohua Li 2010-03-30 7:31 ` KOSAKI Motohiro 2010-03-30 8:13 ` Shaohua Li 2010-03-31 4:53 ` Shaohua Li 2010-03-31 5:38 ` KOSAKI Motohiro 2010-03-31 5:51 ` Wu Fengguang 2010-03-31 6:00 ` KOSAKI Motohiro 2010-03-31 6:03 ` Wu Fengguang 2010-04-01 22:16 ` Andrew Morton 2010-04-02 9:13 ` KOSAKI Motohiro 2010-04-06 1:22 ` Wu Fengguang 2010-04-06 3:36 ` Rik van Riel 2010-03-31 5:53 ` KOSAKI Motohiro 2010-04-02 6:50 ` Shaohua Li 2010-04-02 9:14 ` KOSAKI Motohiro 2010-04-02 9:24 ` Shaohua Li 2010-04-04 14:19 ` KOSAKI Motohiro 2010-04-06 1:25 ` Shaohua Li 2010-04-06 1:36 ` KOSAKI Motohiro 2010-04-06 1:50 ` Wu Fengguang 2010-04-06 2:06 ` KOSAKI Motohiro 2010-04-06 2:30 ` Wu Fengguang 2010-04-06 2:58 ` KOSAKI Motohiro 2010-04-06 3:31 ` Wu Fengguang 2010-04-06 3:40 ` Rik van Riel 2010-04-06 4:49 ` Wu Fengguang 2010-04-06 5:09 ` Shaohua Li 2010-04-04 0:48 ` Wu Fengguang 2010-04-06 1:27 ` Shaohua Li 2010-04-06 5:03 ` Wu Fengguang 2010-04-06 5:36 ` Shaohua Li 2010-04-09 6:51 ` Shaohua Li 2010-04-09 21:20 ` Andrew Morton 2010-04-09 21:25 ` Rik van Riel 2010-04-13 1:30 ` KOSAKI Motohiro 2010-04-13 2:42 ` Rik van Riel 2010-04-13 7:55 ` KOSAKI Motohiro 2010-04-13 8:55 ` KOSAKI Motohiro 2010-04-14 1:27 ` Shaohua Li 2010-04-15 3:25 ` KOSAKI Motohiro 2010-04-12 1:57 ` Shaohua Li 2010-03-31 5:41 ` Wu Fengguang 2010-03-30 10:17 ` Minchan Kim 2010-03-30 10:25 ` KOSAKI Motohiro 2010-03-30 11:56 ` Balbir Singh