* [PATCH]vmscan: handle underflow for get_scan_ratio
  From: Shaohua Li @ 2010-03-30  5:53 UTC
  To: linux-mm, linux-kernel
  Cc: akpm, fengguang.wu, kosaki.motohiro

Commit 84b18490d1f1bc7ed5095c929f78bc002eb70f26 introduces a regression.
With it, our tmpfs test always OOMs. The test produces a lot of rotated
anon pages, which makes percent[0] zero. percent[0] should actually be a
small but non-zero value; our integer calculation rounds it down to zero.
The commit then makes vmscan skip anon pages completely, causing the OOM.

One option is to force percent[x] to 1 when get_scan_ratio() computes it
as zero; see the patch below. Note that the offending commit changes
behavior in another way, too: without it, we scan all pages when priority
is zero, and the patch below does not restore that. I don't know whether
that needs fixing as well.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 79c8098..d5cc34e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1604,6 +1604,18 @@ static void get_scan_ratio(struct zone *zone, struct scan_control *sc,
 	/* Normalize to percentages */
 	percent[0] = 100 * ap / (ap + fp + 1);
 	percent[1] = 100 - percent[0];
+	/*
+	 * If percent[x] is small and rounded to 0, it doesn't mean we
+	 * should skip the scan. Give it at least a 1% share.
+	 */
+	if (percent[0] == 0) {
+		percent[0] = 1;
+		percent[1] = 99;
+	}
+	if (percent[1] == 0) {
+		percent[0] = 99;
+		percent[1] = 1;
+	}
 }
 
 /*
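For readers unfamiliar with the arithmetic, here is a minimal userspace
sketch of the rounding (the recent_scanned/recent_rotated values are
hypothetical, chosen to mimic the reported imbalance, and swappiness is
assumed to be the default 60):

#include <stdio.h>

int main(void)
{
	/* anon_prio = swappiness, file_prio = 200 - swappiness */
	unsigned long anon_prio = 60, file_prio = 140;
	/* Hypothetical: anon pages keep rotating, file pages almost never */
	unsigned long recent_scanned[2] = { 40000, 40000 };
	unsigned long recent_rotated[2] = { 39000, 10 };
	unsigned long ap, fp, percent[2];

	/* Same arithmetic as get_scan_ratio() in mm/vmscan.c */
	ap = (anon_prio + 1) * (recent_scanned[0] + 1);
	ap /= recent_rotated[0] + 1;
	fp = (file_prio + 1) * (recent_scanned[1] + 1);
	fp /= recent_rotated[1] + 1;

	percent[0] = 100 * ap / (ap + fp + 1);	/* integer division */
	percent[1] = 100 - percent[0];

	/* ap = 62, fp = 512740: 100 * 62 / 512803 == 0 */
	printf("percent[0]=%lu percent[1]=%lu\n", percent[0], percent[1]);
	return 0;
}

With these numbers the true anon share is about 0.012%, so the integer
division truncates it to zero and vmscan stops scanning anon entirely.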
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio
  From: KOSAKI Motohiro @ 2010-03-30  6:08 UTC
  To: Shaohua Li
  Cc: kosaki.motohiro, linux-mm, linux-kernel, akpm, fengguang.wu

Hi

> Commit 84b18490d1f1bc7ed5095c929f78bc002eb70f26 introduces a regression.
> With it, our tmpfs test always OOMs. The test produces a lot of rotated
> anon pages, which makes percent[0] zero. percent[0] should actually be a
> small but non-zero value; our integer calculation rounds it down to zero.
> The commit then makes vmscan skip anon pages completely, causing the OOM.
>
> One option is to force percent[x] to 1 when get_scan_ratio() computes it
> as zero; see the patch below. Note that the offending commit changes
> behavior in another way, too: without it, we scan all pages when priority
> is zero, and the patch below does not restore that. I don't know whether
> that needs fixing as well.

Can you please post your /proc/meminfo and a reproducer program? I'll dig
into it.

Unfortunately, this patch isn't acceptable as-is. vmscan had similar logic
in the past, and the 1% swap-out it allowed generated lots of bug reports.
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio
  From: Shaohua Li @ 2010-03-30  6:32 UTC
  To: KOSAKI Motohiro
  Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, Wu, Fengguang

On Tue, 2010-03-30 at 14:08 +0800, KOSAKI Motohiro wrote:
> Hi
>
> > Commit 84b18490d1f1bc7ed5095c929f78bc002eb70f26 introduces a regression.
> > [...]
>
> Can you please post your /proc/meminfo

Attached.

> and a reproducer program? I'll dig into it.

Our test is quite simple: mount a tmpfs twice the size of memory, store
several copies of the kernel source in it (memory size * 2 / 1G copies),
and then run a kernel build in each copy. For example, with 3G of memory
the tmpfs is 6G and holds 6 kernel copies.

> Unfortunately, this patch isn't acceptable as-is. vmscan had similar logic
> in the past, and the 1% swap-out it allowed generated lots of bug reports.

Can you elaborate on that? Completely restoring the previous behavior
(do a full scan at priority 0) would be OK for us too.

[-- Attachment: meminfo --]

MemTotal:        3078372 kB
MemFree:           34612 kB
Buffers:             960 kB
Cached:          2577644 kB
SwapCached:        24580 kB
Active:          2125712 kB
Inactive:         541972 kB
Active(anon):    2120276 kB
Inactive(anon):   534696 kB
Active(file):       5436 kB
Inactive(file):     7276 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       6297472 kB
SwapFree:        6184988 kB
Dirty:              1720 kB
Writeback:         24024 kB
AnonPages:         64888 kB
Mapped:             3996 kB
Shmem:           2566004 kB
Slab:             290416 kB
SReclaimable:     113840 kB
SUnreclaim:       176576 kB
KernelStack:        3072 kB
PageTables:         6832 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     7836656 kB
Committed_AS:    2811564 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      289724 kB
VmallocChunk:   34359444080 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       10240 kB
DirectMap2M:     3127296 kB
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio
  From: KOSAKI Motohiro @ 2010-03-30  6:40 UTC
  To: Shaohua Li
  Cc: kosaki.motohiro, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, Wu, Fengguang

> > Can you please post your /proc/meminfo
>
> Attached.
>
> > and a reproducer program? I'll dig into it.
>
> Our test is quite simple: mount a tmpfs twice the size of memory, store
> several copies of the kernel source in it (memory size * 2 / 1G copies),
> and then run a kernel build in each copy. For example, with 3G of memory
> the tmpfs is 6G and holds 6 kernel copies.

Wow, tmpfs size > memory size!

> > Unfortunately, this patch isn't acceptable as-is. vmscan had similar
> > logic in the past, and the 1% swap-out it allowed generated lots of
> > bug reports.
>
> Can you elaborate on that? Completely restoring the previous behavior
> (do a full scan at priority 0) would be OK for us too.

That is an option, but we need to know the root cause anyway; otherwise we
might reintroduce this issue in the future.
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio
  From: Shaohua Li @ 2010-03-30  6:53 UTC
  To: KOSAKI Motohiro
  Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, Wu, Fengguang

On Tue, Mar 30, 2010 at 02:40:07PM +0800, KOSAKI Motohiro wrote:
> > > Unfortunately, this patch isn't acceptable as-is. vmscan had similar
> > > logic in the past, and the 1% swap-out it allowed generated lots of
> > > bug reports.
> >
> > Can you elaborate on that? Completely restoring the previous behavior
> > (do a full scan at priority 0) would be OK for us too.
>
> That is an option, but we need to know the root cause anyway; otherwise
> we might reintroduce this issue in the future.

I thought I mentioned the root cause in the first mail: my debugging shows
recent_rotated[0] is big while recent_rotated[1] is almost zero, which
makes percent[0] zero. But please double-check.
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio
  From: KOSAKI Motohiro @ 2010-03-30  7:31 UTC
  To: Shaohua Li
  Cc: kosaki.motohiro, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, Wu, Fengguang

> I thought I mentioned the root cause in the first mail: my debugging shows
> recent_rotated[0] is big while recent_rotated[1] is almost zero, which
> makes percent[0] zero. But please double-check.

Reverting would rescue the percent[0] == 0 && priority == 0 case, but I
don't think that case should happen at all -- it implies a big latency
problem.

Can you please try the following patch? I'll also prepare a reproduction
environment soon.

---
 mm/vmscan.c |   12 ++++++++----
 1 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 79c8098..abf7f79 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1571,15 +1571,19 @@ static void get_scan_ratio(struct zone *zone, struct scan_control *sc,
 	 */
 	if (unlikely(reclaim_stat->recent_scanned[0] > anon / 4)) {
 		spin_lock_irq(&zone->lru_lock);
-		reclaim_stat->recent_scanned[0] /= 2;
-		reclaim_stat->recent_rotated[0] /= 2;
+		while (reclaim_stat->recent_scanned[0] > anon / 4) {
+			reclaim_stat->recent_scanned[0] /= 2;
+			reclaim_stat->recent_rotated[0] /= 2;
+		}
 		spin_unlock_irq(&zone->lru_lock);
 	}
 
 	if (unlikely(reclaim_stat->recent_scanned[1] > file / 4)) {
 		spin_lock_irq(&zone->lru_lock);
-		reclaim_stat->recent_scanned[1] /= 2;
-		reclaim_stat->recent_rotated[1] /= 2;
+		while (reclaim_stat->recent_scanned[1] > file / 4) {
+			reclaim_stat->recent_scanned[1] /= 2;
+			reclaim_stat->recent_rotated[1] /= 2;
+		}
 		spin_unlock_irq(&zone->lru_lock);
 	}
-- 
1.6.5.2
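For intuition, a tiny sketch (with made-up numbers) of what the loop above
changes: after a scanning burst, recent_scanned can sit far above anon / 4,
and the old one-shot halving can leave it above the threshold, so the
ap/fp ratio stays skewed:

#include <stdio.h>

/* Hypothetical numbers: a burst left recent_scanned far above anon / 4. */
int main(void)
{
	unsigned long anon = 100000;		/* anon LRU size, in pages */
	unsigned long scanned_once = 400000;	/* one-shot halving (old code) */
	unsigned long scanned_loop = 400000;	/* loop halving (patch above) */

	scanned_once /= 2;			/* 200000: still > anon / 4 */

	while (scanned_loop > anon / 4)		/* halve until at threshold */
		scanned_loop /= 2;		/* ends at 25000 */

	printf("threshold=%lu once=%lu loop=%lu\n",
	       anon / 4, scanned_once, scanned_loop);
	return 0;
}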
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio
  From: Shaohua Li @ 2010-03-30  8:13 UTC
  To: KOSAKI Motohiro
  Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, Wu, Fengguang

On Tue, Mar 30, 2010 at 03:31:40PM +0800, KOSAKI Motohiro wrote:
> Reverting would rescue the percent[0] == 0 && priority == 0 case, but I
> don't think that case should happen at all -- it implies a big latency
> problem.
>
> Can you please try the following patch? I'll also prepare a reproduction
> environment soon.

It still OOMs with the patch.
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio
  From: Shaohua Li @ 2010-03-31  4:53 UTC
  To: KOSAKI Motohiro
  Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, Wu, Fengguang

On Tue, Mar 30, 2010 at 02:08:53PM +0800, KOSAKI Motohiro wrote:
> Can you please post your /proc/meminfo and a reproducer program? I'll dig
> into it.
>
> Unfortunately, this patch isn't acceptable as-is. vmscan had similar logic
> in the past, and the 1% swap-out it allowed generated lots of bug reports.

If 1% is still too big, how about the patch below?

Commit 84b18490d1f1bc7ed5095c929f78bc002eb70f26 introduces a regression.
With it, our tmpfs test always OOMs. The test produces a lot of rotated
anon pages, which makes percent[0] zero. percent[0] should actually be a
small but non-zero value; our integer calculation rounds it down to zero,
and the commit then makes vmscan skip anon pages completely, causing the
OOM. To avoid the underflow, stop using percentages and instead directly
calculate how many pages should be scanned.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 79c8098..80a7ed5 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1519,27 +1519,50 @@ static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan,
 }
 
 /*
+ * Smallish @nr_to_scan's are deposited in @nr_saved_scan,
+ * until we collected @swap_cluster_max pages to scan.
+ */
+static unsigned long nr_scan_try_batch(unsigned long nr_to_scan,
+				       unsigned long *nr_saved_scan)
+{
+	unsigned long nr;
+
+	*nr_saved_scan += nr_to_scan;
+	nr = *nr_saved_scan;
+
+	if (nr >= SWAP_CLUSTER_MAX)
+		*nr_saved_scan = 0;
+	else
+		nr = 0;
+
+	return nr;
+}
+
+/*
  * Determine how aggressively the anon and file LRU lists should be
  * scanned.  The relative value of each set of LRU lists is determined
  * by looking at the fraction of the pages scanned we did rotate back
  * onto the active list instead of evict.
  *
- * percent[0] specifies how much pressure to put on ram/swap backed
- * memory, while percent[1] determines pressure on the file LRUs.
+ * nr[x] specifies how many pages should be scanned
  */
-static void get_scan_ratio(struct zone *zone, struct scan_control *sc,
-					unsigned long *percent)
+static void get_scan_count(struct zone *zone, struct scan_control *sc,
+					unsigned long *nr, int priority)
 {
 	unsigned long anon, file, free;
 	unsigned long anon_prio, file_prio;
 	unsigned long ap, fp;
 	struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
+	unsigned long fraction[2], denominator[2];
+	enum lru_list l;
 
 	/* If we have no swap space, do not bother scanning anon pages. */
 	if (!sc->may_swap || (nr_swap_pages <= 0)) {
-		percent[0] = 0;
-		percent[1] = 100;
-		return;
+		fraction[0] = 0;
+		denominator[0] = 1;
+		fraction[1] = 1;
+		denominator[1] = 1;
+		goto out;
 	}
 
 	anon  = zone_nr_lru_pages(zone, sc, LRU_ACTIVE_ANON) +
@@ -1552,9 +1575,11 @@ static void get_scan_ratio(struct zone *zone, struct scan_control *sc,
 		/* If we have very few page cache pages,
 		   force-scan anon pages. */
 		if (unlikely(file + free <= high_wmark_pages(zone))) {
-			percent[0] = 100;
-			percent[1] = 0;
-			return;
+			fraction[0] = 1;
+			denominator[0] = 1;
+			fraction[1] = 0;
+			denominator[1] = 1;
+			goto out;
 		}
 	}
 
@@ -1601,29 +1626,29 @@ static void get_scan_ratio(struct zone *zone, struct scan_control *sc,
 	fp = (file_prio + 1) * (reclaim_stat->recent_scanned[1] + 1);
 	fp /= reclaim_stat->recent_rotated[1] + 1;
 
-	/* Normalize to percentages */
-	percent[0] = 100 * ap / (ap + fp + 1);
-	percent[1] = 100 - percent[0];
-}
-
-/*
- * Smallish @nr_to_scan's are deposited in @nr_saved_scan,
- * until we collected @swap_cluster_max pages to scan.
- */
-static unsigned long nr_scan_try_batch(unsigned long nr_to_scan,
-				       unsigned long *nr_saved_scan)
-{
-	unsigned long nr;
+	fraction[0] = ap;
+	denominator[0] = ap + fp + 1;
+	fraction[1] = fp;
+	denominator[1] = ap + fp + 1;
 
-	*nr_saved_scan += nr_to_scan;
-	nr = *nr_saved_scan;
+out:
+	for_each_evictable_lru(l) {
+		int file = is_file_lru(l);
+		unsigned long scan;
 
-	if (nr >= SWAP_CLUSTER_MAX)
-		*nr_saved_scan = 0;
-	else
-		nr = 0;
+		if (fraction[file] == 0) {
+			nr[l] = 0;
+			continue;
+		}
 
-	return nr;
+		scan = zone_nr_lru_pages(zone, sc, l);
+		if (priority) {
+			scan >>= priority;
+			scan = (scan * fraction[file] / denominator[file]);
+		}
+		nr[l] = nr_scan_try_batch(scan,
+					  &reclaim_stat->nr_saved_scan[l]);
+	}
 }
 
 /*
@@ -1634,31 +1659,11 @@ static void shrink_zone(int priority, struct zone *zone,
 {
 	unsigned long nr[NR_LRU_LISTS];
 	unsigned long nr_to_scan;
-	unsigned long percent[2];	/* anon @ 0; file @ 1 */
 	enum lru_list l;
 	unsigned long nr_reclaimed = sc->nr_reclaimed;
 	unsigned long nr_to_reclaim = sc->nr_to_reclaim;
-	struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
-
-	get_scan_ratio(zone, sc, percent);
 
-	for_each_evictable_lru(l) {
-		int file = is_file_lru(l);
-		unsigned long scan;
-
-		if (percent[file] == 0) {
-			nr[l] = 0;
-			continue;
-		}
-
-		scan = zone_nr_lru_pages(zone, sc, l);
-		if (priority) {
-			scan >>= priority;
-			scan = (scan * percent[file]) / 100;
-		}
-		nr[l] = nr_scan_try_batch(scan,
-					  &reclaim_stat->nr_saved_scan[l]);
-	}
+	get_scan_count(zone, sc, nr, priority);
 
 	while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
 				nr[LRU_INACTIVE_FILE]) {
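To see why computing the counts directly helps, a small userspace
comparison (the ap/fp values are hypothetical, with fp/ap around 1024 as
in the reported workload): the percentage path truncates the anon share
to zero, while the fraction path still produces a non-zero scan count for
a large anon LRU:

#include <stdio.h>

int main(void)
{
	/* Hypothetical values mirroring the report: fp/ap around 1024. */
	unsigned long ap = 62, fp = 512740;
	unsigned long anon_lru = 655360;	/* ~2.5G of 4K anon pages */
	int priority = 1;			/* high reclaim pressure */

	unsigned long scan = anon_lru >> priority;

	/* Old path: normalize to a percentage first (truncates to 0). */
	unsigned long percent = 100 * ap / (ap + fp + 1);
	unsigned long scan_percent = scan * percent / 100;

	/* New path: apply the raw fraction directly. */
	unsigned long scan_fraction = scan * ap / (ap + fp + 1);

	/* prints: percent=0 -> scan=0, fraction -> scan=39 */
	printf("percent=%lu -> scan=%lu, fraction -> scan=%lu\n",
	       percent, scan_percent, scan_fraction);
	return 0;
}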
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio
  From: KOSAKI Motohiro @ 2010-03-31  5:38 UTC
  To: Shaohua Li
  Cc: kosaki.motohiro, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, Wu, Fengguang

> > Unfortunately, this patch isn't acceptable as-is. vmscan had similar
> > logic in the past, and the 1% swap-out it allowed generated lots of
> > bug reports.
>
> If 1% is still too big, how about the patch below?

This patch makes a lot more sense than the previous one. However, I think
a <1% anon ratio shouldn't happen anyway, because it means the file LRU
has no reclaimable pages; <1% seems like a poor reclaim rate.

Perhaps I'll take your patch for the stable tree, but we need to attack
the root cause; i.e., I suspect we need to fix the scan-ratio equation
itself.

> Commit 84b18490d1f1bc7ed5095c929f78bc002eb70f26 introduces a regression.
> [...]
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio
  From: Wu Fengguang @ 2010-03-31  5:51 UTC
  To: KOSAKI Motohiro
  Cc: Li, Shaohua, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org

KOSAKI-san,

On Wed, Mar 31, 2010 at 01:38:12PM +0800, KOSAKI Motohiro wrote:
> > If 1% is still too big, how about the patch below?
>
> This patch makes a lot more sense than the previous one. However, I think
> a <1% anon ratio shouldn't happen anyway, because it means the file LRU
> has no reclaimable pages; <1% seems like a poor reclaim rate.
>
> Perhaps I'll take your patch for the stable tree, but we need to attack
> the root cause; i.e., I suspect we need to fix the scan-ratio equation
> itself.

I tend to regard this patch as a general improvement for both .33-stable
and .34.

I do agree with you that it's desirable to do more testing and analysis,
and to check further for possible hidden problems.

Thanks,
Fengguang

> > Commit 84b18490d1f1bc7ed5095c929f78bc002eb70f26 introduces a regression.
> > [...]
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio
  From: KOSAKI Motohiro @ 2010-03-31  6:00 UTC
  To: Wu Fengguang
  Cc: kosaki.motohiro, Li, Shaohua, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org

> I tend to regard this patch as a general improvement for both .33-stable
> and .34.
>
> I do agree with you that it's desirable to do more testing and analysis,
> and to check further for possible hidden problems.

Yeah, I don't want to ignore .33-stable either. If I can't find the root
cause in 2-3 days, I'll revert the guilty patch anyway.
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio
  From: Wu Fengguang @ 2010-03-31  6:03 UTC
  To: KOSAKI Motohiro
  Cc: Li, Shaohua, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org

On Wed, Mar 31, 2010 at 02:00:52PM +0800, KOSAKI Motohiro wrote:
> Yeah, I don't want to ignore .33-stable either. If I can't find the root
> cause in 2-3 days, I'll revert the guilty patch anyway.

OK, thank you!
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio
  From: Andrew Morton @ 2010-04-01 22:16 UTC
  To: KOSAKI Motohiro
  Cc: Wu Fengguang, Li, Shaohua, linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Wed, 31 Mar 2010 15:00:52 +0900 (JST) KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:

> Yeah, I don't want to ignore .33-stable either. If I can't find the root
> cause in 2-3 days, I'll revert the guilty patch anyway.

It's a good idea to avoid fixing a bug one-way-in-stable,
other-way-in-mainline. Because then we have new code in both trees which
is different. And the -stable guys sensibly like to see code get a bit of
a shakedown in mainline before backporting it.

So it would be better to merge the "simple" patch into mainline, tagged
for -stable backporting. Then we can later implement the larger fix in
mainline, perhaps starting by reverting the "simple" fix.
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio
  From: KOSAKI Motohiro @ 2010-04-02  9:13 UTC
  To: Andrew Morton
  Cc: kosaki.motohiro, Wu Fengguang, Li, Shaohua, linux-mm@kvack.org, linux-kernel@vger.kernel.org

> It's a good idea to avoid fixing a bug one-way-in-stable,
> other-way-in-mainline. Because then we have new code in both trees which
> is different. And the -stable guys sensibly like to see code get a bit of
> a shakedown in mainline before backporting it.
>
> So it would be better to merge the "simple" patch into mainline, tagged
> for -stable backporting. Then we can later implement the larger fix in
> mainline, perhaps starting by reverting the "simple" fix.

OK, I won't stand in the way of your maintainership, although I still
think we need to fix the issue completely.

===================================================================
From 52358cbccdfe94e0381974cd6e937bcc6b1c608b Mon Sep 17 00:00:00 2001
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Date: Fri, 2 Apr 2010 17:13:48 +0900
Subject: [PATCH] Revert "vmscan: get_scan_ratio() cleanup"

Shaohua Li reported that his tmpfs streaming I/O test can lead to OOM.
The test uses a 6G tmpfs in a system with 3G memory. In the tmpfs,
there are 6 copies of the kernel source and the test does a kbuild for
each copy. His investigation shows the test has a lot of rotated anon
pages and quite few file pages, so get_scan_ratio calculates percent[0]
(i.e. the scanning percent for anon) to be zero. Actually percent[0]
should be a big value, but our calculation rounds it to zero.

Before commit 84b18490 we had the same sickness, but the old logic
rescued the percent[0] == 0 case when priority == 0, which hid the real
issue. I didn't think mere streaming I/O could produce the
percent[0] == 0 && priority == 0 situation, but I was wrong. So we
definitely have to fix this tmpfs streaming I/O issue, but let's revert
the regression commit first.

This reverts commit 84b18490d1f1bc7ed5095c929f78bc002eb70f26.

Reported-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/vmscan.c |   23 +++++++++--------------
 1 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 79c8098..cb3947e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1535,13 +1535,6 @@ static void get_scan_ratio(struct zone *zone, struct scan_control *sc,
 	unsigned long ap, fp;
 	struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
 
-	/* If we have no swap space, do not bother scanning anon pages. */
-	if (!sc->may_swap || (nr_swap_pages <= 0)) {
-		percent[0] = 0;
-		percent[1] = 100;
-		return;
-	}
-
 	anon  = zone_nr_lru_pages(zone, sc, LRU_ACTIVE_ANON) +
 		zone_nr_lru_pages(zone, sc, LRU_INACTIVE_ANON);
 	file  = zone_nr_lru_pages(zone, sc, LRU_ACTIVE_FILE) +
@@ -1639,20 +1632,22 @@ static void shrink_zone(int priority, struct zone *zone,
 	unsigned long nr_reclaimed = sc->nr_reclaimed;
 	unsigned long nr_to_reclaim = sc->nr_to_reclaim;
 	struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
+	int noswap = 0;
 
-	get_scan_ratio(zone, sc, percent);
+	/* If we have no swap space, do not bother scanning anon pages. */
+	if (!sc->may_swap || (nr_swap_pages <= 0)) {
+		noswap = 1;
+		percent[0] = 0;
+		percent[1] = 100;
+	} else
+		get_scan_ratio(zone, sc, percent);
 
 	for_each_evictable_lru(l) {
 		int file = is_file_lru(l);
 		unsigned long scan;
 
-		if (percent[file] == 0) {
-			nr[l] = 0;
-			continue;
-		}
-
 		scan = zone_nr_lru_pages(zone, sc, l);
-		if (priority) {
+		if (priority || noswap) {
 			scan >>= priority;
 			scan = (scan * percent[file]) / 100;
 		}
-- 
1.6.5.2
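A sketch of the behavior this revert restores (scan_target below is a
hypothetical helper mirroring the shrink_zone() loop above): at priority 0
with swap available, the scan target is the whole LRU even when rounding
made percent[0] zero, which is what rescued the tmpfs test before commit
84b18490; the noswap case still gets zero:

#include <stdio.h>

/* Hypothetical helper mirroring the reverted per-LRU scan-target logic. */
static unsigned long scan_target(unsigned long lru_pages, int priority,
				 int noswap, unsigned long percent)
{
	unsigned long scan = lru_pages;

	if (priority || noswap) {
		scan >>= priority;
		scan = scan * percent / 100;
	}
	return scan;
}

int main(void)
{
	/* priority 0, swap available, percent rounded to 0: full scan. */
	printf("priority 0, swap:   %lu\n", scan_target(655360, 0, 0, 0));
	/* no swap: percent[0] == 0 still zeroes the anon scan target. */
	printf("priority 0, noswap: %lu\n", scan_target(655360, 0, 1, 0));
	return 0;
}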
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio
  From: Wu Fengguang @ 2010-04-06  1:22 UTC
  To: KOSAKI Motohiro
  Cc: Andrew Morton, Li, Shaohua, linux-mm@kvack.org, linux-kernel@vger.kernel.org

> ===================================================================
> From 52358cbccdfe94e0381974cd6e937bcc6b1c608b Mon Sep 17 00:00:00 2001
> From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Date: Fri, 2 Apr 2010 17:13:48 +0900
> Subject: [PATCH] Revert "vmscan: get_scan_ratio() cleanup"
>
> [...]
> (i.e. the scanning percent for anon) to be zero. Actually percent[0]
> should be a big value, but our calculation rounds it to zero.

It should be a small value :)

Acked-by: Wu Fengguang <fengguang.wu@intel.com>

Thanks,
Fengguang
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio
  From: Rik van Riel @ 2010-04-06  3:36 UTC
  To: KOSAKI Motohiro
  Cc: Andrew Morton, Wu Fengguang, Li, Shaohua, linux-mm@kvack.org, linux-kernel@vger.kernel.org

On 04/02/2010 05:13 AM, KOSAKI Motohiro wrote:
> > So it would be better to merge the "simple" patch into mainline, tagged
> > for -stable backporting. Then we can later implement the larger fix in
> > mainline, perhaps starting by reverting the "simple" fix.
>
> OK, I won't stand in the way of your maintainership, although I still
> think we need to fix the issue completely.

Agreed on the revert.

Acked-by: Rik van Riel <riel@redhat.com>
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio
  From: KOSAKI Motohiro @ 2010-03-31  5:53 UTC
  To: KOSAKI Motohiro
  Cc: Shaohua Li, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, Wu, Fengguang

> > If 1% is still too big, how about the patch below?
>
> This patch makes a lot more sense than the previous one. However, I think
> a <1% anon ratio shouldn't happen anyway, because it means the file LRU
> has no reclaimable pages; <1% seems like a poor reclaim rate.

Oops, what I wrote above is wrong, sorry. Even one page is still too big,
because under a streaming I/O workload the number of anon pages scanned
should be zero. This is a very strong requirement; if it isn't met, a
backup operation will cause a lot of swap-out.

Anyway, I'm still digging into this issue.

> Perhaps I'll take your patch for the stable tree, but we need to attack
> the root cause; i.e., I suspect we need to fix the scan-ratio equation
> itself.
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-03-31 5:53 ` KOSAKI Motohiro @ 2010-04-02 6:50 ` Shaohua Li 2010-04-02 9:14 ` KOSAKI Motohiro ` (2 more replies) 0 siblings, 3 replies; 49+ messages in thread From: Shaohua Li @ 2010-04-02 6:50 UTC (permalink / raw) To: KOSAKI Motohiro Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, Wu, Fengguang On Wed, Mar 31, 2010 at 01:53:27PM +0800, KOSAKI Motohiro wrote: > > > On Tue, Mar 30, 2010 at 02:08:53PM +0800, KOSAKI Motohiro wrote: > > > > Hi > > > > > > > > > Commit 84b18490d1f1bc7ed5095c929f78bc002eb70f26 introduces a regression. > > > > > With it, our tmpfs test always oom. The test has a lot of rotated anon > > > > > pages and cause percent[0] zero. Actually the percent[0] is a very small > > > > > value, but our calculation round it to zero. The commit makes vmscan > > > > > completely skip anon pages and cause oops. > > > > > An option is if percent[x] is zero in get_scan_ratio(), forces it > > > > > to 1. See below patch. > > > > > But the offending commit still changes behavior. Without the commit, we scan > > > > > all pages if priority is zero, below patch doesn't fix this. Don't know if > > > > > It's required to fix this too. > > > > > > > > Can you please post your /proc/meminfo and reproduce program? I'll digg it. > > > > > > > > Very unfortunately, this patch isn't acceptable. In past time, vmscan > > > > had similar logic, but 1% swap-out made lots bug reports. > > > if 1% is still big, how about below patch? > > > > This patch makes a lot of sense than previous. however I think <1% anon ratio > > shouldn't happen anyway because file lru doesn't have reclaimable pages. > > <1% seems no good reclaim rate. > > Oops, the above mention is wrong. sorry. only 1 page is still too big. > because under streaming io workload, the number of scanning anon pages should > be zero. this is very strong requirement. if not, backup operation will makes > a lot of swapping out. Sounds there is no big impact for the workload which you mentioned with the patch. please see below descriptions. I updated the description of the patch as fengguang suggested. Commit 84b18490d introduces a regression. With it, our tmpfs test always oom. The test uses a 6G tmpfs in a system with 3G memory. In the tmpfs, there are 6 copies of kernel source and the test does kbuild for each copy. My investigation shows the test has a lot of rotated anon pages and quite few file pages, so get_scan_ratio calculates percent[0] to be zero. Actually the percent[0] shoule be a very small value, but our calculation round it to zero. The commit makes vmscan completely skip anon pages and cause oops. To avoid underflow, we don't use percentage, instead we directly calculate how many pages should be scaned. In this way, we should get several scan pages for < 1% percent. With this fix, my test doesn't oom any more. Note, this patch doesn't really change logics, but just increase precise. For system with a lot of memory, this might slightly changes behavior. For example, in a sequential file read workload, without the patch, we don't swap any anon pages. With it, if anon memory size is bigger than 16G, we will say one anon page swapped. The 16G is calculated as PAGE_SIZE * priority(4096) * (fp/ap). fp/ap is assumed to be 1024 which is common in this workload. So the impact sounds not a big deal. 
Signed-off-by: Shaohua Li <shaohua.li@intel.com> diff --git a/mm/vmscan.c b/mm/vmscan.c index 79c8098..80a7ed5 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1519,27 +1519,50 @@ static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan, } /* + * Smallish @nr_to_scan's are deposited in @nr_saved_scan, + * until we collected @swap_cluster_max pages to scan. + */ +static unsigned long nr_scan_try_batch(unsigned long nr_to_scan, + unsigned long *nr_saved_scan) +{ + unsigned long nr; + + *nr_saved_scan += nr_to_scan; + nr = *nr_saved_scan; + + if (nr >= SWAP_CLUSTER_MAX) + *nr_saved_scan = 0; + else + nr = 0; + + return nr; +} + +/* * Determine how aggressively the anon and file LRU lists should be * scanned. The relative value of each set of LRU lists is determined * by looking at the fraction of the pages scanned we did rotate back * onto the active list instead of evict. * - * percent[0] specifies how much pressure to put on ram/swap backed - * memory, while percent[1] determines pressure on the file LRUs. + * nr[x] specifies how many pages should be scaned */ -static void get_scan_ratio(struct zone *zone, struct scan_control *sc, - unsigned long *percent) +static void get_scan_count(struct zone *zone, struct scan_control *sc, + unsigned long *nr, int priority) { unsigned long anon, file, free; unsigned long anon_prio, file_prio; unsigned long ap, fp; struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc); + unsigned long fraction[2], denominator[2]; + enum lru_list l; /* If we have no swap space, do not bother scanning anon pages. */ if (!sc->may_swap || (nr_swap_pages <= 0)) { - percent[0] = 0; - percent[1] = 100; - return; + fraction[0] = 0; + denominator[0] = 1; + fraction[1] = 1; + denominator[1] = 1; + goto out; } anon = zone_nr_lru_pages(zone, sc, LRU_ACTIVE_ANON) + @@ -1552,9 +1575,11 @@ static void get_scan_ratio(struct zone *zone, struct scan_control *sc, /* If we have very few page cache pages, force-scan anon pages. */ if (unlikely(file + free <= high_wmark_pages(zone))) { - percent[0] = 100; - percent[1] = 0; - return; + fraction[0] = 1; + denominator[0] = 1; + fraction[1] = 0; + denominator[1] = 1; + goto out; } } @@ -1601,29 +1626,29 @@ static void get_scan_ratio(struct zone *zone, struct scan_control *sc, fp = (file_prio + 1) * (reclaim_stat->recent_scanned[1] + 1); fp /= reclaim_stat->recent_rotated[1] + 1; - /* Normalize to percentages */ - percent[0] = 100 * ap / (ap + fp + 1); - percent[1] = 100 - percent[0]; -} - -/* - * Smallish @nr_to_scan's are deposited in @nr_saved_scan, - * until we collected @swap_cluster_max pages to scan. 
- */ -static unsigned long nr_scan_try_batch(unsigned long nr_to_scan, - unsigned long *nr_saved_scan) -{ - unsigned long nr; + fraction[0] = ap; + denominator[0] = ap + fp + 1; + fraction[1] = fp; + denominator[1] = ap + fp + 1; - *nr_saved_scan += nr_to_scan; - nr = *nr_saved_scan; +out: + for_each_evictable_lru(l) { + int file = is_file_lru(l); + unsigned long scan; - if (nr >= SWAP_CLUSTER_MAX) - *nr_saved_scan = 0; - else - nr = 0; + if (fraction[file] == 0) { + nr[l] = 0; + continue; + } - return nr; + scan = zone_nr_lru_pages(zone, sc, l); + if (priority) { + scan >>= priority; + scan = (scan * fraction[file] / denominator[file]); + } + nr[l] = nr_scan_try_batch(scan, + &reclaim_stat->nr_saved_scan[l]); + } } /* @@ -1634,31 +1659,11 @@ static void shrink_zone(int priority, struct zone *zone, { unsigned long nr[NR_LRU_LISTS]; unsigned long nr_to_scan; - unsigned long percent[2]; /* anon @ 0; file @ 1 */ enum lru_list l; unsigned long nr_reclaimed = sc->nr_reclaimed; unsigned long nr_to_reclaim = sc->nr_to_reclaim; - struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc); - - get_scan_ratio(zone, sc, percent); - for_each_evictable_lru(l) { - int file = is_file_lru(l); - unsigned long scan; - - if (percent[file] == 0) { - nr[l] = 0; - continue; - } - - scan = zone_nr_lru_pages(zone, sc, l); - if (priority) { - scan >>= priority; - scan = (scan * percent[file]) / 100; - } - nr[l] = nr_scan_try_batch(scan, - &reclaim_stat->nr_saved_scan[l]); - } + get_scan_count(zone, sc, nr, priority); while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] || nr[LRU_INACTIVE_FILE]) { -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 49+ messages in thread
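The nr_scan_try_batch() helper moved above can be exercised in isolation; here is a minimal userspace sketch (the per-call request of 5 pages is made up for illustration; SWAP_CLUSTER_MAX is 32 as in the kernel):

#include <stdio.h>

#define SWAP_CLUSTER_MAX 32UL

static unsigned long nr_scan_try_batch(unsigned long nr_to_scan,
                                       unsigned long *nr_saved_scan)
{
        unsigned long nr;

        *nr_saved_scan += nr_to_scan;
        nr = *nr_saved_scan;

        if (nr >= SWAP_CLUSTER_MAX)
                *nr_saved_scan = 0;     /* batch is released */
        else
                nr = 0;                 /* keep saving up */

        return nr;
}

int main(void)
{
        unsigned long saved = 0;
        int i;

        /* asking for 5 pages per call: calls 0-5 return 0,
         * call 6 returns the accumulated 35 (>= 32), then it restarts */
        for (i = 0; i < 10; i++)
                printf("call %d -> scan %lu\n", i,
                       nr_scan_try_batch(5, &saved));
        return 0;
}

This is why sub-SWAP_CLUSTER_MAX scan counts from the new get_scan_count() are not simply lost: they accumulate in nr_saved_scan until a full batch is due.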
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-02 6:50 ` Shaohua Li @ 2010-04-02 9:14 ` KOSAKI Motohiro 2010-04-02 9:24 ` Shaohua Li 2010-04-04 0:48 ` Wu Fengguang 2010-04-06 5:03 ` Wu Fengguang 2 siblings, 1 reply; 49+ messages in thread From: KOSAKI Motohiro @ 2010-04-02 9:14 UTC (permalink / raw) To: Shaohua Li Cc: kosaki.motohiro, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, Wu, Fengguang > > > This patch makes a lot of sense than previous. however I think <1% anon ratio > > > shouldn't happen anyway because file lru doesn't have reclaimable pages. > > > <1% seems no good reclaim rate. > > > > Oops, the above mention is wrong. sorry. only 1 page is still too big. > > because under streaming io workload, the number of scanning anon pages should > > be zero. this is very strong requirement. if not, backup operation will makes > > a lot of swapping out. > Sounds there is no big impact for the workload which you mentioned with the patch. > please see below descriptions. > I updated the description of the patch as fengguang suggested. Umm.. sorry, no. "One fix that introduces another bug" is not a good deal. Instead, I'll revert the guilty commit first, as akpm mentioned. Thanks. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-02 9:14 ` KOSAKI Motohiro @ 2010-04-02 9:24 ` Shaohua Li 2010-04-04 14:19 ` KOSAKI Motohiro 0 siblings, 1 reply; 49+ messages in thread From: Shaohua Li @ 2010-04-02 9:24 UTC (permalink / raw) To: KOSAKI Motohiro Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, Wu, Fengguang On Fri, Apr 02, 2010 at 05:14:38PM +0800, KOSAKI Motohiro wrote: > > > > This patch makes a lot of sense than previous. however I think <1% anon ratio > > > > shouldn't happen anyway because file lru doesn't have reclaimable pages. > > > > <1% seems no good reclaim rate. > > > > > > Oops, the above mention is wrong. sorry. only 1 page is still too big. > > > because under streaming io workload, the number of scanning anon pages should > > > be zero. this is very strong requirement. if not, backup operation will makes > > > a lot of swapping out. > > Sounds there is no big impact for the workload which you mentioned with the patch. > > please see below descriptions. > > I updated the description of the patch as fengguang suggested. > > Umm.. sorry, no. > > "one fix but introduce another one bug" is not good deal. instead, > I'll revert the guilty commit at first as akpm mentioned. Even if we revert the commit, the patch still has its benefit, as it increases calculation precision, right? Thanks, Shaohua -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-02 9:24 ` Shaohua Li @ 2010-04-04 14:19 ` KOSAKI Motohiro 2010-04-06 1:25 ` Shaohua Li 0 siblings, 1 reply; 49+ messages in thread From: KOSAKI Motohiro @ 2010-04-04 14:19 UTC (permalink / raw) To: Shaohua Li Cc: kosaki.motohiro, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, Wu, Fengguang > On Fri, Apr 02, 2010 at 05:14:38PM +0800, KOSAKI Motohiro wrote: > > > > > This patch makes a lot of sense than previous. however I think <1% anon ratio > > > > > shouldn't happen anyway because file lru doesn't have reclaimable pages. > > > > > <1% seems no good reclaim rate. > > > > > > > > Oops, the above mention is wrong. sorry. only 1 page is still too big. > > > > because under streaming io workload, the number of scanning anon pages should > > > > be zero. this is very strong requirement. if not, backup operation will makes > > > > a lot of swapping out. > > > Sounds there is no big impact for the workload which you mentioned with the patch. > > > please see below descriptions. > > > I updated the description of the patch as fengguang suggested. > > > > Umm.. sorry, no. > > > > "one fix but introduce another one bug" is not good deal. instead, > > I'll revert the guilty commit at first as akpm mentioned. > Even we revert the commit, the patch still has its benefit, as it increases > calculation precision, right? No, you shouldn't ignore the regression case. If we can remove the streaming io corner case with another patch, this patch can be considered for merging. Thanks. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-04 14:19 ` KOSAKI Motohiro @ 2010-04-06 1:25 ` Shaohua Li 2010-04-06 1:36 ` KOSAKI Motohiro 2010-04-06 1:50 ` Wu Fengguang 0 siblings, 2 replies; 49+ messages in thread From: Shaohua Li @ 2010-04-06 1:25 UTC (permalink / raw) To: KOSAKI Motohiro Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, Wu, Fengguang On Sun, Apr 04, 2010 at 10:19:06PM +0800, KOSAKI Motohiro wrote: > > On Fri, Apr 02, 2010 at 05:14:38PM +0800, KOSAKI Motohiro wrote: > > > > > > This patch makes a lot of sense than previous. however I think <1% anon ratio > > > > > > shouldn't happen anyway because file lru doesn't have reclaimable pages. > > > > > > <1% seems no good reclaim rate. > > > > > > > > > > Oops, the above mention is wrong. sorry. only 1 page is still too big. > > > > > because under streaming io workload, the number of scanning anon pages should > > > > > be zero. this is very strong requirement. if not, backup operation will makes > > > > > a lot of swapping out. > > > > Sounds there is no big impact for the workload which you mentioned with the patch. > > > > please see below descriptions. > > > > I updated the description of the patch as fengguang suggested. > > > > > > Umm.. sorry, no. > > > > > > "one fix but introduce another one bug" is not good deal. instead, > > > I'll revert the guilty commit at first as akpm mentioned. > > Even we revert the commit, the patch still has its benefit, as it increases > > calculation precision, right? > > no, you shouldn't ignore the regression case. I don't think this is serious. In my calculation, there is only 1 page swapped out for 6G of anonymous memory. 1 page shouldn't have any performance impact. Thanks, Shaohua -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-06 1:25 ` Shaohua Li @ 2010-04-06 1:36 ` KOSAKI Motohiro 0 siblings, 0 replies; 49+ messages in thread From: KOSAKI Motohiro @ 2010-04-06 1:36 UTC (permalink / raw) To: Shaohua Li Cc: kosaki.motohiro, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, Wu, Fengguang > On Sun, Apr 04, 2010 at 10:19:06PM +0800, KOSAKI Motohiro wrote: > > > On Fri, Apr 02, 2010 at 05:14:38PM +0800, KOSAKI Motohiro wrote: > > > > > > > This patch makes a lot of sense than previous. however I think <1% anon ratio > > > > > > > shouldn't happen anyway because file lru doesn't have reclaimable pages. > > > > > > > <1% seems no good reclaim rate. > > > > > > > > > > > > Oops, the above mention is wrong. sorry. only 1 page is still too big. > > > > > > because under streaming io workload, the number of scanning anon pages should > > > > > > be zero. this is very strong requirement. if not, backup operation will makes > > > > > > a lot of swapping out. > > > > > Sounds there is no big impact for the workload which you mentioned with the patch. > > > > > please see below descriptions. > > > > > I updated the description of the patch as fengguang suggested. > > > > > > > > Umm.. sorry, no. > > > > > > > > "one fix but introduce another one bug" is not good deal. instead, > > > > I'll revert the guilty commit at first as akpm mentioned. > > > Even we revert the commit, the patch still has its benefit, as it increases > > > calculation precision, right? > > > > no, you shouldn't ignore the regression case. > I don't think this is serious. In my calculation, there is only 1 page swapped out > for 6G anonmous memory. 1 page should haven't any performance impact. There is. I have received exactly the opposite claim, because shrink_zone() is not called only once; it is called many times. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-06 1:25 ` Shaohua Li 2010-04-06 1:36 ` KOSAKI Motohiro @ 2010-04-06 1:50 ` Wu Fengguang 2010-04-06 2:06 ` KOSAKI Motohiro 1 sibling, 1 reply; 49+ messages in thread From: Wu Fengguang @ 2010-04-06 1:50 UTC (permalink / raw) To: Li, Shaohua Cc: KOSAKI Motohiro, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, Rik van Riel On Tue, Apr 06, 2010 at 09:25:36AM +0800, Li, Shaohua wrote: > On Sun, Apr 04, 2010 at 10:19:06PM +0800, KOSAKI Motohiro wrote: > > > On Fri, Apr 02, 2010 at 05:14:38PM +0800, KOSAKI Motohiro wrote: > > > > > > > This patch makes a lot of sense than previous. however I think <1% anon ratio > > > > > > > shouldn't happen anyway because file lru doesn't have reclaimable pages. > > > > > > > <1% seems no good reclaim rate. > > > > > > > > > > > > Oops, the above mention is wrong. sorry. only 1 page is still too big. > > > > > > because under streaming io workload, the number of scanning anon pages should > > > > > > be zero. this is very strong requirement. if not, backup operation will makes > > > > > > a lot of swapping out. > > > > > Sounds there is no big impact for the workload which you mentioned with the patch. > > > > > please see below descriptions. > > > > > I updated the description of the patch as fengguang suggested. > > > > > > > > Umm.. sorry, no. > > > > > > > > "one fix but introduce another one bug" is not good deal. instead, > > > > I'll revert the guilty commit at first as akpm mentioned. > > > Even we revert the commit, the patch still has its benefit, as it increases > > > calculation precision, right? > > > > no, you shouldn't ignore the regression case. > I don't think this is serious. In my calculation, there is only 1 page swapped out > for 6G anonmous memory. 1 page should haven't any performance impact. 1 anon page scanned for every N file pages scanned? Is N a _huge_ enough ratio so that the anon list will be very lightly scanned? Rik: here is a little background. Under streaming IO, the current get_scan_ratio() will get a percent[0] that is (much) less than 1, so underflows to 0. It has the bad effect of completely disabling the scan of the anon list, which leads to OOM in Shaohua's test case. OTOH, it also has the good side effect of keeping anon pages in memory and totally preventing swap IO. Shaohua's patch improves the computation precision by computing nr[] directly in get_scan_ratio(). This is good in general, however it will enable a light scan of the anon list on streaming IO. Thanks, Fengguang -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
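The rounding Fengguang describes is easy to reproduce in userspace (the ap/fp values below are hypothetical, chosen only to show the truncation):

#include <stdio.h>

int main(void)
{
        /* assumed pressures: tiny anon rotation, heavy file rotation */
        unsigned long ap = 5, fp = 10000;
        unsigned long percent0 = 100 * ap / (ap + fp + 1);

        /* integer division: 500 / 10006 truncates to 0, so the anon
         * lists are skipped entirely even though ap is non-zero */
        printf("percent[0] = %lu\n", percent0);
        return 0;
}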
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-06 1:50 ` Wu Fengguang @ 2010-04-06 2:06 ` KOSAKI Motohiro 2010-04-06 2:30 ` Wu Fengguang 0 siblings, 1 reply; 49+ messages in thread From: KOSAKI Motohiro @ 2010-04-06 2:06 UTC (permalink / raw) To: Wu Fengguang Cc: kosaki.motohiro, Li, Shaohua, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, Rik van Riel > On Tue, Apr 06, 2010 at 09:25:36AM +0800, Li, Shaohua wrote: > > On Sun, Apr 04, 2010 at 10:19:06PM +0800, KOSAKI Motohiro wrote: > > > > On Fri, Apr 02, 2010 at 05:14:38PM +0800, KOSAKI Motohiro wrote: > > > > > > > > This patch makes a lot of sense than previous. however I think <1% anon ratio > > > > > > > > shouldn't happen anyway because file lru doesn't have reclaimable pages. > > > > > > > > <1% seems no good reclaim rate. > > > > > > > > > > > > > > Oops, the above mention is wrong. sorry. only 1 page is still too big. > > > > > > > because under streaming io workload, the number of scanning anon pages should > > > > > > > be zero. this is very strong requirement. if not, backup operation will makes > > > > > > > a lot of swapping out. > > > > > > Sounds there is no big impact for the workload which you mentioned with the patch. > > > > > > please see below descriptions. > > > > > > I updated the description of the patch as fengguang suggested. > > > > > > > > > > Umm.. sorry, no. > > > > > > > > > > "one fix but introduce another one bug" is not good deal. instead, > > > > > I'll revert the guilty commit at first as akpm mentioned. > > > > Even we revert the commit, the patch still has its benefit, as it increases > > > > calculation precision, right? > > > > > > no, you shouldn't ignore the regression case. > > > I don't think this is serious. In my calculation, there is only 1 page swapped out > > for 6G anonmous memory. 1 page should haven't any performance impact. > > 1 anon page scanned for every N file pages scanned? > > Is N a _huge_ enough ratio so that the anon list will be very light scanned? > > Rik: here is a little background. The problem is, the VM is continuously discarding no-longer-used file cache. If we scan even 1 extra anon page, we will observe tons of swap usage after a few days. Please don't think only about benchmarks. > Under streaming IO, the current get_scan_ratio() will get a percent[0] > that is (much) less than 1, so underflows to 0. > > It has the bad effect of completely disabling the scan of anon list, > which leads to OOM in Shaohua's test case. OTOH, it also has the good > side effect of keeping anon pages in memory and totally prevent swap > IO. > > Shaohua's patch improves the computation precision by computing nr[] > directly in get_scan_ratio(). This is good in general, however will > enable light scan of the anon list on streaming IO. In such a case, percent[0] should be big; I think underflow is not the point. His test case is merely a streaming io copy, so why can't we drop the tmpfs cached pages? His /proc/meminfo shows his machine didn't have droppable file cache, so a big percent[1] value seems to make no sense. No? I'm not sure whether we need either of the detections below. I need to investigate more. 1) detect no discardable file cache 2) detect streaming io on tmpfs (as a regular file) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
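A rough sense of scale for the "tons of swap usage after a few days" concern (the call rate here is an assumption for illustration, not a measurement): if sustained streaming reclaim drives shrink_zone() around 10 times a second and each call swaps out just 1 anon page, that is 10 * 86400 = 864000 pages a day, i.e. roughly 3.3GB/day of swap-out with 4K pages.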
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-06 2:06 ` KOSAKI Motohiro @ 2010-04-06 2:30 ` Wu Fengguang 2010-04-06 2:58 ` KOSAKI Motohiro 0 siblings, 1 reply; 49+ messages in thread From: Wu Fengguang @ 2010-04-06 2:30 UTC (permalink / raw) To: KOSAKI Motohiro Cc: Li, Shaohua, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, Rik van Riel On Tue, Apr 06, 2010 at 10:06:19AM +0800, KOSAKI Motohiro wrote: > > On Tue, Apr 06, 2010 at 09:25:36AM +0800, Li, Shaohua wrote: > > > On Sun, Apr 04, 2010 at 10:19:06PM +0800, KOSAKI Motohiro wrote: > > > > > On Fri, Apr 02, 2010 at 05:14:38PM +0800, KOSAKI Motohiro wrote: > > > > > > > > > This patch makes a lot of sense than previous. however I think <1% anon ratio > > > > > > > > > shouldn't happen anyway because file lru doesn't have reclaimable pages. > > > > > > > > > <1% seems no good reclaim rate. > > > > > > > > > > > > > > > > Oops, the above mention is wrong. sorry. only 1 page is still too big. > > > > > > > > because under streaming io workload, the number of scanning anon pages should > > > > > > > > be zero. this is very strong requirement. if not, backup operation will makes > > > > > > > > a lot of swapping out. > > > > > > > Sounds there is no big impact for the workload which you mentioned with the patch. > > > > > > > please see below descriptions. > > > > > > > I updated the description of the patch as fengguang suggested. > > > > > > > > > > > > Umm.. sorry, no. > > > > > > > > > > > > "one fix but introduce another one bug" is not good deal. instead, > > > > > > I'll revert the guilty commit at first as akpm mentioned. > > > > > Even we revert the commit, the patch still has its benefit, as it increases > > > > > calculation precision, right? > > > > > > > > no, you shouldn't ignore the regression case. > > > > > I don't think this is serious. In my calculation, there is only 1 page swapped out > > > for 6G anonmous memory. 1 page should haven't any performance impact. > > > > 1 anon page scanned for every N file pages scanned? > > > > Is N a _huge_ enough ratio so that the anon list will be very light scanned? > > > > Rik: here is a little background. > > The problem is, the VM are couteniously discarding no longer used file > cache. if we are scan extra anon 1 page, we will observe tons swap usage > after few days. > > please don't only think benchmark. OK the days-of-streaming-io typically happen in file servers. Suppose a file server with 16GB memory, 1GB of which is consumed by anonymous pages, others are for page cache. Assume that the exact file:anon ratio computed by the get_scan_ratio() algorithm is 1000:1. In that case percent[0]=0.1 and is rounded down to 0, which keeps the anon pages in memory for the few days. Now with Shaohua's patch, nr[0] = (262144/4096)/1000 = 0.06 will also be rounded down to 0. It only becomes >=1 when - reclaim runs into trouble and priority goes low - anon list goes huge So I guess Shaohua's patch still has reasonable "underflow" threshold :) Thanks, Fengguang > > > Under streaming IO, the current get_scan_ratio() will get a percent[0] > > that is (much) less than 1, so underflows to 0. > > > > It has the bad effect of completely disabling the scan of anon list, > > which leads to OOM in Shaohua's test case. OTOH, it also has the good > > side effect of keeping anon pages in memory and totally prevent swap > > IO. > > > > Shaohua's patch improves the computation precision by computing nr[] > > directly in get_scan_ratio(). 
This is good in general, however will > > enable light scan of the anon list on streaming IO. > > In such case, percent[0] should be big. I think underflowing is not point. > His test case is merely streaming io copy, why can't we drop tmpfs cached > page? his /proc/meminfo describe his machine didn't have droppable file cache. > so, big percent[1] value seems makes no sense. no? > > I'm not sure we need either below detection. I need more investigate. > 1) detect no discardable file cache > 2) detect streaming io on tmpfs (as regular file) > > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
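Fengguang's 0.06 can be checked step by step (this follows the patched get_scan_count() with his assumed 16GB box, 1GB of anon pages and a 1000:1 file:anon ratio):

  anon list = 1GB / 4K      = 262144 pages
  scan      = 262144 >> 12  = 64          (priority = 12)
  nr[0]     = 64 * ap / (ap + fp + 1)
            = 64 / 1001     = 0.06, truncated to 0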
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-06 2:30 ` Wu Fengguang @ 2010-04-06 2:58 ` KOSAKI Motohiro 2010-04-06 3:31 ` Wu Fengguang 0 siblings, 1 reply; 49+ messages in thread From: KOSAKI Motohiro @ 2010-04-06 2:58 UTC (permalink / raw) To: Wu Fengguang Cc: kosaki.motohiro, Li, Shaohua, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, Rik van Riel > On Tue, Apr 06, 2010 at 10:06:19AM +0800, KOSAKI Motohiro wrote: > > > On Tue, Apr 06, 2010 at 09:25:36AM +0800, Li, Shaohua wrote: > > > > On Sun, Apr 04, 2010 at 10:19:06PM +0800, KOSAKI Motohiro wrote: > > > > > > On Fri, Apr 02, 2010 at 05:14:38PM +0800, KOSAKI Motohiro wrote: > > > > > > > > > > This patch makes a lot of sense than previous. however I think <1% anon ratio > > > > > > > > > > shouldn't happen anyway because file lru doesn't have reclaimable pages. > > > > > > > > > > <1% seems no good reclaim rate. > > > > > > > > > > > > > > > > > > Oops, the above mention is wrong. sorry. only 1 page is still too big. > > > > > > > > > because under streaming io workload, the number of scanning anon pages should > > > > > > > > > be zero. this is very strong requirement. if not, backup operation will makes > > > > > > > > > a lot of swapping out. > > > > > > > > Sounds there is no big impact for the workload which you mentioned with the patch. > > > > > > > > please see below descriptions. > > > > > > > > I updated the description of the patch as fengguang suggested. > > > > > > > > > > > > > > Umm.. sorry, no. > > > > > > > > > > > > > > "one fix but introduce another one bug" is not good deal. instead, > > > > > > > I'll revert the guilty commit at first as akpm mentioned. > > > > > > Even we revert the commit, the patch still has its benefit, as it increases > > > > > > calculation precision, right? > > > > > > > > > > no, you shouldn't ignore the regression case. > > > > > > > I don't think this is serious. In my calculation, there is only 1 page swapped out > > > > for 6G anonmous memory. 1 page should haven't any performance impact. > > > > 1 anon page scanned for every N file pages scanned? > > > > Is N a _huge_ enough ratio so that the anon list will be very light scanned? > > > > Rik: here is a little background. > > The problem is, the VM are couteniously discarding no longer used file > > cache. if we are scan extra anon 1 page, we will observe tons swap usage > > after few days. > > > > please don't only think benchmark. > > OK the days-of-streaming-io typically happen in file servers. Suppose > a file server with 16GB memory, 1GB of which is consumed by anonymous > pages, others are for page cache. > > Assume that the exact file:anon ratio computed by the get_scan_ratio() > algorithm is 1000:1. In that case percent[0]=0.1 and is rounded down > to 0, which keeps the anon pages in memory for the few days. > > Now with Shaohua's patch, nr[0] = (262144/4096)/1000 = 0.06 will also > be rounded down to 0. It only becomes >=1 when > - reclaim runs into trouble and priority goes low > - anon list goes huge > > So I guess Shaohua's patch still has reasonable "underflow" threshold :) Again, I didn't say his patch has no worth. I only said we shouldn't ignore the downside. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-06 2:58 ` KOSAKI Motohiro @ 2010-04-06 3:31 ` Wu Fengguang 2010-04-06 3:40 ` Rik van Riel 0 siblings, 1 reply; 49+ messages in thread From: Wu Fengguang @ 2010-04-06 3:31 UTC (permalink / raw) To: KOSAKI Motohiro Cc: Li, Shaohua, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, Rik van Riel On Tue, Apr 06, 2010 at 10:58:43AM +0800, KOSAKI Motohiro wrote: > > On Tue, Apr 06, 2010 at 10:06:19AM +0800, KOSAKI Motohiro wrote: > > > > On Tue, Apr 06, 2010 at 09:25:36AM +0800, Li, Shaohua wrote: > > > > > On Sun, Apr 04, 2010 at 10:19:06PM +0800, KOSAKI Motohiro wrote: > > > > > > > On Fri, Apr 02, 2010 at 05:14:38PM +0800, KOSAKI Motohiro wrote: > > > > > > > > > > > This patch makes a lot of sense than previous. however I think <1% anon ratio > > > > > > > > > > > shouldn't happen anyway because file lru doesn't have reclaimable pages. > > > > > > > > > > > <1% seems no good reclaim rate. > > > > > > > > > > > > > > > > > > > > Oops, the above mention is wrong. sorry. only 1 page is still too big. > > > > > > > > > > because under streaming io workload, the number of scanning anon pages should > > > > > > > > > > be zero. this is very strong requirement. if not, backup operation will makes > > > > > > > > > > a lot of swapping out. > > > > > > > > > Sounds there is no big impact for the workload which you mentioned with the patch. > > > > > > > > > please see below descriptions. > > > > > > > > > I updated the description of the patch as fengguang suggested. > > > > > > > > > > > > > > > > Umm.. sorry, no. > > > > > > > > > > > > > > > > "one fix but introduce another one bug" is not good deal. instead, > > > > > > > > I'll revert the guilty commit at first as akpm mentioned. > > > > > > > Even we revert the commit, the patch still has its benefit, as it increases > > > > > > > calculation precision, right? > > > > > > > > > > > > no, you shouldn't ignore the regression case. > > > > > > > > > I don't think this is serious. In my calculation, there is only 1 page swapped out > > > > > for 6G anonmous memory. 1 page should haven't any performance impact. > > > > > > > > 1 anon page scanned for every N file pages scanned? > > > > > > > > Is N a _huge_ enough ratio so that the anon list will be very light scanned? > > > > > > > > Rik: here is a little background. > > > > > > The problem is, the VM are couteniously discarding no longer used file > > > cache. if we are scan extra anon 1 page, we will observe tons swap usage > > > after few days. > > > > > > please don't only think benchmark. > > > > OK the days-of-streaming-io typically happen in file servers. Suppose > > a file server with 16GB memory, 1GB of which is consumed by anonymous > > pages, others are for page cache. > > > > Assume that the exact file:anon ratio computed by the get_scan_ratio() > > algorithm is 1000:1. In that case percent[0]=0.1 and is rounded down > > to 0, which keeps the anon pages in memory for the few days. > > > > Now with Shaohua's patch, nr[0] = (262144/4096)/1000 = 0.06 will also > > be rounded down to 0. It only becomes >=1 when > > - reclaim runs into trouble and priority goes low > > - anon list goes huge > > > > So I guess Shaohua's patch still has reasonable "underflow" threshold :) > > Again, I didn't said his patch is no worth. I only said we don't have to > ignore the downside. Right, we should document both the upside and downside. The main difference happens when file:anon scan ratio > 100:1. 
For the current percent[] based computing, percent[0]=0 hence nr[0]=0 which disables anon list scan unconditionally, for good or for bad. For the direct nr[] computing, - nr[0] will be 0 for typical file servers, because with priority=12 and anon lru size < 1.6GB, nr[0] = (anon_size/4096)/100 < 1 - nr[0] will be non-zero when priority=1 and anon_size > 100 pages, this stops OOM for Shaohua's test case, however it may not be enough to guarantee safety (your previous reverting patch can provide this guarantee). I liked Shaohua's patch a lot -- it adapts well to both the file-server case and the mostly-anon-pages case :) Thanks, Fengguang -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-06 3:31 ` Wu Fengguang @ 2010-04-06 3:40 ` Rik van Riel 2010-04-06 4:49 ` Wu Fengguang 0 siblings, 1 reply; 49+ messages in thread From: Rik van Riel @ 2010-04-06 3:40 UTC (permalink / raw) To: Wu Fengguang Cc: KOSAKI Motohiro, Li, Shaohua, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org On 04/05/2010 11:31 PM, Wu Fengguang wrote: > On Tue, Apr 06, 2010 at 10:58:43AM +0800, KOSAKI Motohiro wrote: >> Again, I didn't said his patch is no worth. I only said we don't have to >> ignore the downside. > > Right, we should document both the upside and downside. The downside is obvious: streaming IO (used once data that does not fit in the cache) can push out data that is used more often - requiring that it be swapped in at a later point in time. I understand what Shaohua's patch does, but I do not understand the upside. What good does it do to increase the size of the cache for streaming IO data, which is generally touched only once? What kind of performance benefits can we get by doing that? > The main difference happens when file:anon scan ratio> 100:1. > > For the current percent[] based computing, percent[0]=0 hence nr[0]=0 > which disables anon list scan unconditionally, for good or for bad. > > For the direct nr[] computing, > - nr[0] will be 0 for typical file servers, because with priority=12 > and anon lru size< 1.6GB, nr[0] = (anon_size/4096)/100< 0 > - nr[0] will be non-zero when priority=1 and anon_size> 100 pages, > this stops OOM for Shaohua's test case, however may not be enough to > guarantee safety (your previous reverting patch can provide this > guarantee). > > I liked Shaohua's patch a lot -- it adapts well to both the > file-server case and the mostly-anon-pages case :) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-06 3:40 ` Rik van Riel @ 2010-04-06 4:49 ` Wu Fengguang 2010-04-06 5:09 ` Shaohua Li 0 siblings, 1 reply; 49+ messages in thread From: Wu Fengguang @ 2010-04-06 4:49 UTC (permalink / raw) To: Rik van Riel Cc: KOSAKI Motohiro, Li, Shaohua, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org On Tue, Apr 06, 2010 at 11:40:47AM +0800, Rik van Riel wrote: > On 04/05/2010 11:31 PM, Wu Fengguang wrote: > > On Tue, Apr 06, 2010 at 10:58:43AM +0800, KOSAKI Motohiro wrote: > >> Again, I didn't said his patch is no worth. I only said we don't have to > >> ignore the downside. > > > > Right, we should document both the upside and downside. > > The downside is obvious: streaming IO (used once data > that does not fit in the cache) can push out data that > is used more often - requiring that it be swapped in > at a later point in time. > > I understand what Shaohua's patch does, but I do not > understand the upside. What good does it do to increase > the size of the cache for streaming IO data, which is > generally touched only once? Not that bad :) With Shaohua's patch the anon list will typically _never_ get scanned, just like before. If it's mostly use-once IO, file:anon will be 1000 or even 10000, and priority=12. Then only anon lists larger than 16GB or 160GB will get nr[0] >= 1. > What kind of performance benefits can we get by doing > that? So vmscan behavior and performance remain the same as before. For a really large anon list, such a workload is beyond our imagination, so we cannot assert that "don't scan the anon list" will be a benefit. On the other hand, in the test case of "do streaming IO when most memory is occupied by tmpfs pages", it is very bad behavior to refuse to scan the anon list normally and then suddenly start scanning _the whole_ anon list when priority hits 0. Shaohua's patch helps here by gradually increasing the scan nr of the anon list as memory pressure increases. Thanks, Fengguang > > The main difference happens when file:anon scan ratio> 100:1. > > > > For the current percent[] based computing, percent[0]=0 hence nr[0]=0 > > which disables anon list scan unconditionally, for good or for bad. > > > > For the direct nr[] computing, > > - nr[0] will be 0 for typical file servers, because with priority=12 > > and anon lru size< 1.6GB, nr[0] = (anon_size/4096)/100< 0 > > - nr[0] will be non-zero when priority=1 and anon_size> 100 pages, > > this stops OOM for Shaohua's test case, however may not be enough to > > guarantee safety (your previous reverting patch can provide this > > guarantee). > > > > I liked Shaohua's patch a lot -- it adapts well to both the > > file-server case and the mostly-anon-pages case :) > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
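The gradual ramp-up can be illustrated with assumed numbers (a 4GB anon list, i.e. 2^20 4K pages, and a 1000:1 file:anon scan ratio; both hypothetical):

  priority 12: (2^20 >> 12) / 1001 = 256 / 1001    -> 0 pages
  priority  6: (2^20 >>  6) / 1001 = 16384 / 1001  -> 16 pages
  priority  1: (2^20 >>  1) / 1001 = 524288 / 1001 -> 523 pages
  priority  0: the full list size is used (no shift, no ratio)

So instead of the old all-or-nothing jump at priority 0, anon scan pressure rises smoothly as reclaim priority drops.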
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-06 4:49 ` Wu Fengguang @ 2010-04-06 5:09 ` Shaohua Li 0 siblings, 0 replies; 49+ messages in thread From: Shaohua Li @ 2010-04-06 5:09 UTC (permalink / raw) To: Wu, Fengguang Cc: Rik van Riel, KOSAKI Motohiro, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org On Tue, Apr 06, 2010 at 12:49:10PM +0800, Wu, Fengguang wrote: > On Tue, Apr 06, 2010 at 11:40:47AM +0800, Rik van Riel wrote: > > On 04/05/2010 11:31 PM, Wu Fengguang wrote: > > > On Tue, Apr 06, 2010 at 10:58:43AM +0800, KOSAKI Motohiro wrote: > > >> Again, I didn't said his patch is no worth. I only said we don't have to > > >> ignore the downside. > > > > > > Right, we should document both the upside and downside. > > > > The downside is obvious: streaming IO (used once data > > that does not fit in the cache) can push out data that > > is used more often - requiring that it be swapped in > > at a later point in time. > > > > I understand what Shaohua's patch does, but I do not > > understand the upside. What good does it do to increase > > the size of the cache for streaming IO data, which is > > generally touched only once? > > Not that bad :) With Shaohua's patch the anon list will typically > _never_ get scanned, just like before. > > If it's mostly use-once IO, file:anon will be 1000 or even 10000, and > priority=12. Then only anon lists larger than 16GB or 160GB will get > nr[0] >= 1. > > > What kind of performance benefits can we get by doing > > that? > > So vmscan behavior and performance remain the same as before. > > For really large anon list, such workload is beyond our imagination. > So we cannot assert "don't scan anon list" will be a benefit. > > On the other hand, in the test case of "do stream IO when most memory > occupied by tmpfs pages", it is very bad behavior refuse to scan anon > list in normal and suddenly start scanning _the whole_ anon list when > priority hits 0. Shaohua's patch helps it by gradually increasing the > scan nr of anon list as memory pressure increases. Yep, the gradually increasing scan nr is the main advantage in my mind. Thanks, Shaohua > > > The main difference happens when file:anon scan ratio> 100:1. > > > > > > For the current percent[] based computing, percent[0]=0 hence nr[0]=0 > > > which disables anon list scan unconditionally, for good or for bad. > > > > > > For the direct nr[] computing, > > > - nr[0] will be 0 for typical file servers, because with priority=12 > > > and anon lru size< 1.6GB, nr[0] = (anon_size/4096)/100< 0 > > > - nr[0] will be non-zero when priority=1 and anon_size> 100 pages, > > > this stops OOM for Shaohua's test case, however may not be enough to > > > guarantee safety (your previous reverting patch can provide this > > > guarantee). > > > > > > I liked Shaohua's patch a lot -- it adapts well to both the > > > file-server case and the mostly-anon-pages case :) > > > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-02 6:50 ` Shaohua Li 2010-04-02 9:14 ` KOSAKI Motohiro @ 2010-04-04 0:48 ` Wu Fengguang 2010-04-06 1:27 ` Shaohua Li 2010-04-06 5:03 ` Wu Fengguang 2 siblings, 1 reply; 49+ messages in thread From: Wu Fengguang @ 2010-04-04 0:48 UTC (permalink / raw) To: Li, Shaohua Cc: KOSAKI Motohiro, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org On Fri, Apr 02, 2010 at 02:50:52PM +0800, Li, Shaohua wrote: > On Wed, Mar 31, 2010 at 01:53:27PM +0800, KOSAKI Motohiro wrote: > > > > On Tue, Mar 30, 2010 at 02:08:53PM +0800, KOSAKI Motohiro wrote: > > > > > Hi > > > > > > > > > > > Commit 84b18490d1f1bc7ed5095c929f78bc002eb70f26 introduces a regression. > > > > > > With it, our tmpfs test always oom. The test has a lot of rotated anon > > > > > > pages and cause percent[0] zero. Actually the percent[0] is a very small > > > > > > value, but our calculation round it to zero. The commit makes vmscan > > > > > > completely skip anon pages and cause oops. > > > > > > An option is if percent[x] is zero in get_scan_ratio(), forces it > > > > > > to 1. See below patch. > > > > > > But the offending commit still changes behavior. Without the commit, we scan > > > > > > all pages if priority is zero, below patch doesn't fix this. Don't know if > > > > > > It's required to fix this too. > > > > > > > > > > Can you please post your /proc/meminfo and reproduce program? I'll digg it. > > > > > > > > > > Very unfortunately, this patch isn't acceptable. In past time, vmscan > > > > > had similar logic, but 1% swap-out made lots bug reports. > > > > if 1% is still big, how about below patch? > > > > > > This patch makes a lot of sense than previous. however I think <1% anon ratio > > > shouldn't happen anyway because file lru doesn't have reclaimable pages. > > > <1% seems no good reclaim rate. > > > > Oops, the above mention is wrong. sorry. only 1 page is still too big. > > because under streaming io workload, the number of scanning anon pages should > > be zero. this is very strong requirement. if not, backup operation will makes > > a lot of swapping out. > Sounds there is no big impact for the workload which you mentioned with the patch. > please see below descriptions. > I updated the description of the patch as fengguang suggested. > > > > Commit 84b18490d introduces a regression. With it, our tmpfs test always oom. > The test uses a 6G tmpfs in a system with 3G memory. In the tmpfs, there are > 6 copies of kernel source and the test does kbuild for each copy. My > investigation shows the test has a lot of rotated anon pages and quite few > file pages, so get_scan_ratio calculates percent[0] to be zero. Actually > the percent[0] shoule be a very small value, but our calculation round it > to zero. The commit makes vmscan completely skip anon pages and cause oops. > > To avoid underflow, we don't use percentage, instead we directly calculate > how many pages should be scaned. In this way, we should get several scan pages > for < 1% percent. With this fix, my test doesn't oom any more. > > Note, this patch doesn't really change logics, but just increase precise. For > system with a lot of memory, this might slightly changes behavior. For example, > in a sequential file read workload, without the patch, we don't swap any anon > pages. With it, if anon memory size is bigger than 16G, we will say one anon page see? > swapped. The 16G is calculated as PAGE_SIZE * priority(4096) * (fp/ap). 
fp/ap > is assumed to be 1024 which is common in this workload. So the impact sounds not > a big deal. > > Signed-off-by: Shaohua Li <shaohua.li@intel.com> > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 79c8098..80a7ed5 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -1519,27 +1519,50 @@ static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan, > } > > /* > + * Smallish @nr_to_scan's are deposited in @nr_saved_scan, > + * until we collected @swap_cluster_max pages to scan. > + */ > +static unsigned long nr_scan_try_batch(unsigned long nr_to_scan, > + unsigned long *nr_saved_scan) > +{ > + unsigned long nr; > + > + *nr_saved_scan += nr_to_scan; > + nr = *nr_saved_scan; > + > + if (nr >= SWAP_CLUSTER_MAX) > + *nr_saved_scan = 0; > + else > + nr = 0; > + > + return nr; > +} > + > +/* > * Determine how aggressively the anon and file LRU lists should be > * scanned. The relative value of each set of LRU lists is determined > * by looking at the fraction of the pages scanned we did rotate back > * onto the active list instead of evict. > * > - * percent[0] specifies how much pressure to put on ram/swap backed > - * memory, while percent[1] determines pressure on the file LRUs. > + * nr[x] specifies how many pages should be scaned The new comment loses information.. > */ > -static void get_scan_ratio(struct zone *zone, struct scan_control *sc, > - unsigned long *percent) > +static void get_scan_count(struct zone *zone, struct scan_control *sc, > + unsigned long *nr, int priority) > { > unsigned long anon, file, free; > unsigned long anon_prio, file_prio; > unsigned long ap, fp; > struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc); > + unsigned long fraction[2], denominator[2]; denominator[2] can be reduced to denominator. because denominator[0] == denominator[1] always holds. > + enum lru_list l; > > /* If we have no swap space, do not bother scanning anon pages. */ > if (!sc->may_swap || (nr_swap_pages <= 0)) { > - percent[0] = 0; > - percent[1] = 100; > - return; > + fraction[0] = 0; > + denominator[0] = 1; > + fraction[1] = 1; > + denominator[1] = 1; > + goto out; > } > > anon = zone_nr_lru_pages(zone, sc, LRU_ACTIVE_ANON) + > @@ -1552,9 +1575,11 @@ static void get_scan_ratio(struct zone *zone, struct scan_control *sc, > /* If we have very few page cache pages, > force-scan anon pages. */ > if (unlikely(file + free <= high_wmark_pages(zone))) { > - percent[0] = 100; > - percent[1] = 0; > - return; > + fraction[0] = 1; > + denominator[0] = 1; > + fraction[1] = 0; > + denominator[1] = 1; > + goto out; > } > } > > @@ -1601,29 +1626,29 @@ static void get_scan_ratio(struct zone *zone, struct scan_control *sc, > fp = (file_prio + 1) * (reclaim_stat->recent_scanned[1] + 1); > fp /= reclaim_stat->recent_rotated[1] + 1; > > - /* Normalize to percentages */ > - percent[0] = 100 * ap / (ap + fp + 1); > - percent[1] = 100 - percent[0]; > -} > - > -/* > - * Smallish @nr_to_scan's are deposited in @nr_saved_scan, > - * until we collected @swap_cluster_max pages to scan. 
> - */ > -static unsigned long nr_scan_try_batch(unsigned long nr_to_scan, > - unsigned long *nr_saved_scan) > -{ > - unsigned long nr; > + fraction[0] = ap; > + denominator[0] = ap + fp + 1; > + fraction[1] = fp; > + denominator[1] = ap + fp + 1; > > - *nr_saved_scan += nr_to_scan; > - nr = *nr_saved_scan; > +out: > + for_each_evictable_lru(l) { > + int file = is_file_lru(l); > + unsigned long scan; > > - if (nr >= SWAP_CLUSTER_MAX) > - *nr_saved_scan = 0; > - else > - nr = 0; > + if (fraction[file] == 0) { > + nr[l] = 0; > + continue; > + } > > - return nr; > + scan = zone_nr_lru_pages(zone, sc, l); > + if (priority) { > + scan >>= priority; > + scan = (scan * fraction[file] / denominator[file]); The "()" is not necessary here, or better end it here^ Thanks, Fengguang > + } > + nr[l] = nr_scan_try_batch(scan, > + &reclaim_stat->nr_saved_scan[l]); > + } > } > > /* > @@ -1634,31 +1659,11 @@ static void shrink_zone(int priority, struct zone *zone, > { > unsigned long nr[NR_LRU_LISTS]; > unsigned long nr_to_scan; > - unsigned long percent[2]; /* anon @ 0; file @ 1 */ > enum lru_list l; > unsigned long nr_reclaimed = sc->nr_reclaimed; > unsigned long nr_to_reclaim = sc->nr_to_reclaim; > - struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc); > - > - get_scan_ratio(zone, sc, percent); > > - for_each_evictable_lru(l) { > - int file = is_file_lru(l); > - unsigned long scan; > - > - if (percent[file] == 0) { > - nr[l] = 0; > - continue; > - } > - > - scan = zone_nr_lru_pages(zone, sc, l); > - if (priority) { > - scan >>= priority; > - scan = (scan * percent[file]) / 100; > - } > - nr[l] = nr_scan_try_batch(scan, > - &reclaim_stat->nr_saved_scan[l]); > - } > + get_scan_count(zone, sc, nr, priority); > > while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] || > nr[LRU_INACTIVE_FILE]) { -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-04 0:48 ` Wu Fengguang @ 2010-04-06 1:27 ` Shaohua Li 0 siblings, 0 replies; 49+ messages in thread From: Shaohua Li @ 2010-04-06 1:27 UTC (permalink / raw) To: Wu, Fengguang Cc: KOSAKI Motohiro, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org On Sun, Apr 04, 2010 at 08:48:38AM +0800, Wu, Fengguang wrote: > On Fri, Apr 02, 2010 at 02:50:52PM +0800, Li, Shaohua wrote: > > On Wed, Mar 31, 2010 at 01:53:27PM +0800, KOSAKI Motohiro wrote: > > > > > On Tue, Mar 30, 2010 at 02:08:53PM +0800, KOSAKI Motohiro wrote: > > > > > > Hi > > > > > > > > > > > > > Commit 84b18490d1f1bc7ed5095c929f78bc002eb70f26 introduces a regression. > > > > > > > With it, our tmpfs test always oom. The test has a lot of rotated anon > > > > > > > pages and cause percent[0] zero. Actually the percent[0] is a very small > > > > > > > value, but our calculation round it to zero. The commit makes vmscan > > > > > > > completely skip anon pages and cause oops. > > > > > > > An option is if percent[x] is zero in get_scan_ratio(), forces it > > > > > > > to 1. See below patch. > > > > > > > But the offending commit still changes behavior. Without the commit, we scan > > > > > > > all pages if priority is zero, below patch doesn't fix this. Don't know if > > > > > > > It's required to fix this too. > > > > > > > > > > > > Can you please post your /proc/meminfo and reproduce program? I'll digg it. > > > > > > > > > > > > Very unfortunately, this patch isn't acceptable. In past time, vmscan > > > > > > had similar logic, but 1% swap-out made lots bug reports. > > > > > if 1% is still big, how about below patch? > > > > > > > > This patch makes a lot of sense than previous. however I think <1% anon ratio > > > > shouldn't happen anyway because file lru doesn't have reclaimable pages. > > > > <1% seems no good reclaim rate. > > > > > > Oops, the above mention is wrong. sorry. only 1 page is still too big. > > > because under streaming io workload, the number of scanning anon pages should > > > be zero. this is very strong requirement. if not, backup operation will makes > > > a lot of swapping out. > > Sounds there is no big impact for the workload which you mentioned with the patch. > > please see below descriptions. > > I updated the description of the patch as fengguang suggested. > > > > > > > > Commit 84b18490d introduces a regression. With it, our tmpfs test always oom. > > The test uses a 6G tmpfs in a system with 3G memory. In the tmpfs, there are > > 6 copies of kernel source and the test does kbuild for each copy. My > > investigation shows the test has a lot of rotated anon pages and quite few > > file pages, so get_scan_ratio calculates percent[0] to be zero. Actually > > the percent[0] shoule be a very small value, but our calculation round it > > to zero. The commit makes vmscan completely skip anon pages and cause oops. > > > > To avoid underflow, we don't use percentage, instead we directly calculate > > how many pages should be scaned. In this way, we should get several scan pages > > for < 1% percent. With this fix, my test doesn't oom any more. > > > > Note, this patch doesn't really change logics, but just increase precise. For > > system with a lot of memory, this might slightly changes behavior. For example, > > in a sequential file read workload, without the patch, we don't swap any anon > > pages. With it, if anon memory size is bigger than 16G, we will say one anon page > > see? 
Thanks, will send an updated version against -mm since we reverted the offending patch. Thanks, Shaohua -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-02 6:50 ` Shaohua Li 2010-04-02 9:14 ` KOSAKI Motohiro 2010-04-04 0:48 ` Wu Fengguang @ 2010-04-06 5:03 ` Wu Fengguang 2010-04-06 5:36 ` Shaohua Li 2010-04-09 6:51 ` Shaohua Li 2 siblings, 2 replies; 49+ messages in thread From: Wu Fengguang @ 2010-04-06 5:03 UTC (permalink / raw) To: Li, Shaohua Cc: KOSAKI Motohiro, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org Shaohua, > + scan = zone_nr_lru_pages(zone, sc, l); > + if (priority) { > + scan >>= priority; > + scan = (scan * fraction[file] / denominator[file]); Ah, the (scan * fraction[file]) may overflow on a 32-bit kernel! Thanks, Fengguang -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
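The overflow is easy to model in userspace (magnitudes are assumptions: on a 32-bit kernel unsigned long is 32 bits, and fraction[] scales with recent_scanned, which can approach the LRU size):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
        /* uint32_t models a 32-bit unsigned long */
        uint32_t scan = 1UL << 18;              /* 256K pages: ~1GB lru at 4K */
        uint32_t fraction = 3UL << 20;          /* plausible ap magnitude */
        uint32_t denominator = (3UL << 20) + 1;

        uint32_t bad = scan * fraction / denominator;            /* product wraps */
        uint64_t good = (uint64_t)scan * fraction / denominator; /* 64-bit math */

        /* here the wrapped product is 0, so nr[l] would come out 0
         * instead of ~262143 */
        printf("32-bit: %u  64-bit: %llu\n", bad, (unsigned long long)good);
        return 0;
}

Widening fraction[] to u64 and dividing with div64_u64(), as the updated patch later in the thread does, avoids this.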
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-06 5:03 ` Wu Fengguang @ 2010-04-06 5:36 ` Shaohua Li 2010-04-09 6:51 ` Shaohua Li 1 sibling, 0 replies; 49+ messages in thread From: Shaohua Li @ 2010-04-06 5:36 UTC (permalink / raw) To: Wu, Fengguang Cc: KOSAKI Motohiro, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org On Tue, Apr 06, 2010 at 01:03:25PM +0800, Wu, Fengguang wrote: > Shaohua, > > > + scan = zone_nr_lru_pages(zone, sc, l); > > + if (priority) { > > + scan >>= priority; > > + scan = (scan * fraction[file] / denominator[file]); > > Ah, the (scan * fraction[file]) may overflow in 32bit kernel! Good catch, will change it to u64. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-06 5:03 ` Wu Fengguang 2010-04-06 5:36 ` Shaohua Li @ 2010-04-09 6:51 ` Shaohua Li 2010-04-09 21:20 ` Andrew Morton 1 sibling, 1 reply; 49+ messages in thread From: Shaohua Li @ 2010-04-09 6:51 UTC (permalink / raw) To: Wu, Fengguang Cc: KOSAKI Motohiro, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, riel On Tue, Apr 06, 2010 at 01:03:25PM +0800, Wu, Fengguang wrote: > Shaohua, > > > + scan = zone_nr_lru_pages(zone, sc, l); > > + if (priority) { > > + scan >>= priority; > > + scan = (scan * fraction[file] / denominator[file]); > > Ah, the (scan * fraction[file]) may overflow in 32bit kernel! I updated the patch to address the previous issues. Is it possible to put this into the -mm tree to see if anything weird happens? get_scan_ratio() calculates a percentage, and if the percentage is < 1%, it will round the percentage down to 0% and cause us to completely ignore scanning anon/file pages to reclaim memory, even when the total anon/file pages are very big. To avoid the underflow, we don't use a percentage; instead we directly calculate how many pages should be scanned. In this way, we still get several pages to scan for a < 1% ratio. This has some benefits: 1. it increases our calculation precision; 2. it makes our scanning smoother. Without this, if percent[x] underflows, shrink_zone() doesn't scan any pages and then suddenly scans all pages when priority is zero. With this, even when priority isn't zero, shrink_zone() gets a chance to scan some pages. Note, this patch doesn't really change the logic, but just increases precision. For a system with a lot of memory, this might slightly change behavior. For example, in a sequential file read workload, without the patch, we don't swap any anon pages. With it, if anon memory size is bigger than 16G, we will see one anon page swapped. The 16G is calculated as PAGE_SIZE * priority(4096) * (fp/ap); fp/ap is assumed to be 1024, which is common in this workload. So the impact sounds like not a big deal. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Wu Fengguang <fengguang.wu@intel.com> diff --git a/mm/vmscan.c b/mm/vmscan.c index 3ff3311..1070f83 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1519,21 +1519,52 @@ static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan, } /* + * Smallish @nr_to_scan's are deposited in @nr_saved_scan, + * until we collected @swap_cluster_max pages to scan. + */ +static unsigned long nr_scan_try_batch(unsigned long nr_to_scan, + unsigned long *nr_saved_scan) +{ + unsigned long nr; + + *nr_saved_scan += nr_to_scan; + nr = *nr_saved_scan; + + if (nr >= SWAP_CLUSTER_MAX) + *nr_saved_scan = 0; + else + nr = 0; + + return nr; +} + +/* * Determine how aggressively the anon and file LRU lists should be * scanned. The relative value of each set of LRU lists is determined * by looking at the fraction of the pages scanned we did rotate back * onto the active list instead of evict. * - * percent[0] specifies how much pressure to put on ram/swap backed - * memory, while percent[1] determines pressure on the file LRUs.
+ * nr[0] = anon pages to scan; nr[1] = file pages to scan */ -static void get_scan_ratio(struct zone *zone, struct scan_control *sc, - unsigned long *percent) +static void get_scan_count(struct zone *zone, struct scan_control *sc, + unsigned long *nr, int priority) { unsigned long anon, file, free; unsigned long anon_prio, file_prio; unsigned long ap, fp; struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc); + u64 fraction[2], denominator; + enum lru_list l; + int noswap = 0; + + /* If we have no swap space, do not bother scanning anon pages. */ + if (!sc->may_swap || (nr_swap_pages <= 0)) { + noswap = 1; + fraction[0] = 0; + fraction[1] = 1; + denominator = 1; + goto out; + } anon = zone_nr_lru_pages(zone, sc, LRU_ACTIVE_ANON) + zone_nr_lru_pages(zone, sc, LRU_INACTIVE_ANON); @@ -1545,9 +1576,10 @@ static void get_scan_ratio(struct zone *zone, struct scan_control *sc, /* If we have very few page cache pages, force-scan anon pages. */ if (unlikely(file + free <= high_wmark_pages(zone))) { - percent[0] = 100; - percent[1] = 0; - return; + fraction[0] = 1; + fraction[1] = 0; + denominator = 1; + goto out; } } @@ -1594,29 +1626,22 @@ static void get_scan_ratio(struct zone *zone, struct scan_control *sc, fp = (file_prio + 1) * (reclaim_stat->recent_scanned[1] + 1); fp /= reclaim_stat->recent_rotated[1] + 1; - /* Normalize to percentages */ - percent[0] = 100 * ap / (ap + fp + 1); - percent[1] = 100 - percent[0]; -} - -/* - * Smallish @nr_to_scan's are deposited in @nr_saved_scan, - * until we collected @swap_cluster_max pages to scan. - */ -static unsigned long nr_scan_try_batch(unsigned long nr_to_scan, - unsigned long *nr_saved_scan) -{ - unsigned long nr; - - *nr_saved_scan += nr_to_scan; - nr = *nr_saved_scan; - - if (nr >= SWAP_CLUSTER_MAX) - *nr_saved_scan = 0; - else - nr = 0; + fraction[0] = ap; + fraction[1] = fp; + denominator = ap + fp + 1; +out: + for_each_evictable_lru(l) { + int file = is_file_lru(l); + unsigned long scan; - return nr; + scan = zone_nr_lru_pages(zone, sc, l); + if (priority || noswap) { + scan >>= priority; + scan = div64_u64(scan * fraction[file], denominator); + } + nr[l] = nr_scan_try_batch(scan, + &reclaim_stat->nr_saved_scan[l]); + } } /* @@ -1627,33 +1652,11 @@ static void shrink_zone(int priority, struct zone *zone, { unsigned long nr[NR_LRU_LISTS]; unsigned long nr_to_scan; - unsigned long percent[2]; /* anon @ 0; file @ 1 */ enum lru_list l; unsigned long nr_reclaimed = sc->nr_reclaimed; unsigned long nr_to_reclaim = sc->nr_to_reclaim; - struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc); - int noswap = 0; - - /* If we have no swap space, do not bother scanning anon pages. */ - if (!sc->may_swap || (nr_swap_pages <= 0)) { - noswap = 1; - percent[0] = 0; - percent[1] = 100; - } else - get_scan_ratio(zone, sc, percent); - for_each_evictable_lru(l) { - int file = is_file_lru(l); - unsigned long scan; - - scan = zone_nr_lru_pages(zone, sc, l); - if (priority || noswap) { - scan >>= priority; - scan = (scan * percent[file]) / 100; - } - nr[l] = nr_scan_try_batch(scan, - &reclaim_stat->nr_saved_scan[l]); - } + get_scan_count(zone, sc, nr, priority); while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] || nr[LRU_INACTIVE_FILE]) { -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 49+ messages in thread
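As a rough illustration of the precision argument in the changelog, here is a toy userspace model (the sizes and the ap:fp ratio are assumed, loosely matching the changelog's example) contrasting the old percent[] rounding with the new fraction/denominator arithmetic:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* Assumed: 32GB of anon pages (4K each), DEF_PRIORITY, and the
	 * changelog's ap:fp ratio of roughly 1:1024. */
	uint64_t anon_pages = 8ULL << 20;	/* 32GB / 4KB */
	uint64_t ap = 1, fp = 1024;
	int priority = 12;			/* DEF_PRIORITY */

	/* Old scheme: the share rounds down to 0%, nothing is scanned. */
	uint64_t percent = 100 * ap / (ap + fp + 1);	/* == 0 */
	uint64_t old_scan = (anon_pages >> priority) * percent / 100;

	/* New scheme: apply the fraction to the page count directly. */
	uint64_t new_scan = (anon_pages >> priority) * ap / (ap + fp + 1);

	/* Prints "old: 0 pages, new: 1 pages". */
	printf("old: %llu pages, new: %llu pages\n",
	       (unsigned long long)old_scan, (unsigned long long)new_scan);
	return 0;
}

Note that nr_scan_try_batch() would still defer such a single-page result until SWAP_CLUSTER_MAX pages have accumulated, so the extra precision does not by itself force extra reclaim work per pass.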
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-09 6:51 ` Shaohua Li @ 2010-04-09 21:20 ` Andrew Morton 2010-04-09 21:25 ` Rik van Riel 2010-04-12 1:57 ` Shaohua Li 0 siblings, 2 replies; 49+ messages in thread From: Andrew Morton @ 2010-04-09 21:20 UTC (permalink / raw) To: Shaohua Li Cc: Wu, Fengguang, KOSAKI Motohiro, linux-mm@kvack.org, linux-kernel@vger.kernel.org, riel On Fri, 9 Apr 2010 14:51:04 +0800 Shaohua Li <shaohua.li@intel.com> wrote: > get_scan_ratio() calculates a percentage, and if the percentage is < 1%, it > rounds the percentage down to 0% and causes us to completely skip scanning > anon/file pages to reclaim memory, even when the total number of anon/file > pages is very large. > > To avoid the underflow, we don't use a percentage; instead we directly > calculate how many pages should be scanned. In this way, we still get several > pages scanned for a < 1% share. > > This has some benefits: > 1. it increases our calculation precision > 2. it makes our scanning smoother. Without this, if percent[x] underflows, > shrink_zone() doesn't scan any pages and then suddenly scans all pages when > priority reaches zero. With this, shrink_zone() gets a chance to scan some > pages even when priority isn't zero. > > Note, this patch doesn't really change the logic, it just increases precision. > For a system with a lot of memory, this might slightly change behavior. For > example, in a sequential file read workload, without the patch we don't swap > any anon pages. With it, if the anon memory size is bigger than 16G, we will > see one anon page swapped. The 16G is calculated as PAGE_SIZE * priority(4096) > * (fp/ap), where fp/ap is assumed to be 1024, which is common in this > workload. So the impact does not sound like a big deal. I grabbed this. Did we decide that this needed to be backported into 2.6.33.x? If so, some words explaining the reasoning would be needed. Come to that, it's not obvious that we need this in 2.6.34 either. What is the user-visible impact here? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-09 21:20 ` Andrew Morton @ 2010-04-09 21:25 ` Rik van Riel 2010-04-13 1:30 ` KOSAKI Motohiro 2010-04-12 1:57 ` Shaohua Li 1 sibling, 1 reply; 49+ messages in thread From: Rik van Riel @ 2010-04-09 21:25 UTC (permalink / raw) To: Andrew Morton Cc: Shaohua Li, Wu, Fengguang, KOSAKI Motohiro, linux-mm@kvack.org, linux-kernel@vger.kernel.org On 04/09/2010 05:20 PM, Andrew Morton wrote: > Come to that, it's not obvious that we need this in 2.6.34 either. What > is the user-visible impact here? I suspect very little impact, especially during workloads where we can just reclaim clean page cache at DEF_PRIORITY. FWIW, the patch looks good to me, so: Acked-by: Rik van Riel <riel@redhat.com> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-09 21:25 ` Rik van Riel @ 2010-04-13 1:30 ` KOSAKI Motohiro 2010-04-13 2:42 ` Rik van Riel 0 siblings, 1 reply; 49+ messages in thread From: KOSAKI Motohiro @ 2010-04-13 1:30 UTC (permalink / raw) To: Rik van Riel Cc: kosaki.motohiro, Andrew Morton, Shaohua Li, Wu, Fengguang, linux-mm@kvack.org, linux-kernel@vger.kernel.org > On 04/09/2010 05:20 PM, Andrew Morton wrote: > > > Come to that, it's not obvious that we need this in 2.6.34 either. What > > is the user-visible impact here? > > I suspect very little impact, especially during workloads > where we can just reclaim clean page cache at DEF_PRIORITY. > FWIW, the patch looks good to me, so: > > Acked-by: Rik van Riel <riel@redhat.com> > This ack surprises me a bit. Rik, do you have any improvement plan for the streaming I/O detection logic? I think the patch has a rather marginal benefit: it helps the <1% scan ratio case. But it has a big regression: it causes streaming I/O (e.g. a backup operation) to generate tons of swap. So I thought we should do one of: 1) drop this one 2) merge a streaming I/O detection logic improvement first, and merge this one second. Am I missing something? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-13 1:30 ` KOSAKI Motohiro @ 2010-04-13 2:42 ` Rik van Riel 2010-04-13 7:55 ` KOSAKI Motohiro 0 siblings, 1 reply; 49+ messages in thread From: Rik van Riel @ 2010-04-13 2:42 UTC (permalink / raw) To: KOSAKI Motohiro Cc: Andrew Morton, Shaohua Li, Wu, Fengguang, linux-mm@kvack.org, linux-kernel@vger.kernel.org On 04/12/2010 09:30 PM, KOSAKI Motohiro wrote: >> On 04/09/2010 05:20 PM, Andrew Morton wrote: >> >>> Come to that, it's not obvious that we need this in 2.6.34 either. What >>> is the user-visible impact here? >> >> I suspect very little impact, especially during workloads >> where we can just reclaim clean page cache at DEF_PRIORITY. >> FWIW, the patch looks good to me, so: >> >> Acked-by: Rik van Riel <riel@redhat.com> >> > > This ack surprises me a bit. Rik, do you have any improvement plan for > the streaming I/O detection logic? > I think the patch has a rather marginal benefit: it helps the <1% scan > ratio case. But it has a big regression: it causes streaming I/O (e.g. a backup > operation) to generate tons of swap. How? From the description I believe it took 16GB in a zone before we start scanning anon pages when reclaiming at DEF_PRIORITY? Would that cause a problem? > So I thought we should do one of: > 1) drop this one > 2) merge a streaming I/O detection logic improvement first, and > merge this one second. We may need better streaming IO detection, anyway. I have noticed that while heavy sequential reads are fine, the virtual machines on my desktop system do a lot of whole block writes. Presumably, a lot of those writes are to the same blocks, over and over again. This causes the blocks to be promoted to the active file list, which ends up growing the active file list to the point where things from the working set get evicted. All for file pages that may only get WRITTEN to by the guests, because the guests cache their own copy whenever they need to read them! I'll have to check the page cache code to see if it keeps frequently written pages as accessed. We may be better off evicting frequently written pages, and keeping our cache space for data that is read... -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-13 2:42 ` Rik van Riel @ 2010-04-13 7:55 ` KOSAKI Motohiro 2010-04-13 8:55 ` KOSAKI Motohiro 0 siblings, 1 reply; 49+ messages in thread From: KOSAKI Motohiro @ 2010-04-13 7:55 UTC (permalink / raw) To: Rik van Riel Cc: kosaki.motohiro, Andrew Morton, Shaohua Li, Wu, Fengguang, linux-mm@kvack.org, linux-kernel@vger.kernel.org > On 04/12/2010 09:30 PM, KOSAKI Motohiro wrote: > >> On 04/09/2010 05:20 PM, Andrew Morton wrote: > >> > >>> Come to that, it's not obvious that we need this in 2.6.34 either. What > >>> is the user-visible impact here? > >> > >> I suspect very little impact, especially during workloads > >> where we can just reclaim clean page cache at DEF_PRIORITY. > >> FWIW, the patch looks good to me, so: > >> > >> Acked-by: Rik van Riel <riel@redhat.com> > >> > > > > This ack surprises me a bit. Rik, do you have any improvement plan for > > the streaming I/O detection logic? > > I think the patch has a rather marginal benefit: it helps the <1% scan > > ratio case. But it has a big regression: it causes streaming I/O (e.g. a backup > > operation) to generate tons of swap. > > How? From the description I believe it took 16GB in > a zone before we start scanning anon pages when > reclaiming at DEF_PRIORITY? > > Would that cause a problem? Please remember, 2.6.27 has the following +1 scanning modifier. zone->nr_scan_active += (zone_page_state(zone, NR_ACTIVE) >> priority) + 1; ^^^^ and the early (and not yet merged) SplitLRU VM had a similar +1, like scan = zone_nr_lru_pages(zone, sc, l); scan >>= priority; scan = (scan * percent[file]) / 100 + 1; ^^^ We thought scanning only one extra page was no big matter, but that was not correct: we got streaming I/O bug reports. The above +1 caused annoying swap I/O, because some servers run backup operations much, much bigger than physical memory size. For example, if the VM is dropping 1TB of use-once pages, a 0.1% anon scan ratio makes for a 1GB scan, and most such servers only have a few gigabytes of swap although they have >1TB of memory. If my memory is not correct, please correct me. My point is, greater or smaller than 16GB isn't the essential thing. Every patch should be worth more than its downside. The description said "the impact does not sound like a big deal", and nobody disagrees, but its worth is even smaller. I don't imagine this patch improving anything. > > > So I thought we should do one of: > > 1) drop this one > > 2) merge a streaming I/O detection logic improvement first, and > > merge this one second. > > We may need better streaming IO detection, anyway. Agreed, no doubt about that. > I have noticed that while heavy sequential reads are fine, > the virtual machines on my desktop system do a lot of whole > block writes. Presumably, a lot of those writes are to the > same blocks, over and over again. > > This causes the blocks to be promoted to the active file > list, which ends up growing the active file list to the > point where things from the working set get evicted. > > All for file pages that may only get WRITTEN to by the > guests, because the guests cache their own copy whenever > they need to read them! > > I'll have to check the page cache code to see if it > keeps frequently written pages as accessed. We may be > better off evicting frequently written pages, and > keeping our cache space for data that is read... One question: in such a case, don't your guests use Direct I/O? Or are you talking about guest VM behavior? I guess inactive_file_is_low_global() can be improved a lot, but I'm not sure. 
-- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
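KOSAKI's back-of-the-envelope numbers, as a standalone sketch (all figures assumed: a 1TB box, 4K pages, and an anon share of roughly 1:1000 kept across the whole backup run):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* Assumed: a 1TB machine whose backup job streams ~1TB of
	 * use-once file pages through the page cache. */
	uint64_t streamed_pages = 1ULL << 28;	/* 1TB / 4KB */

	/* If anon keeps even a 1:1000 share of the scan pressure, the
	 * accumulated anon scanning over the whole run is on the order
	 * of a gigabyte, comparable to the machine's entire swap. */
	uint64_t anon_scanned = streamed_pages / 1000;

	/* Prints "anon pages scanned: 268435 (~1048 MB)". */
	printf("anon pages scanned: %llu (~%llu MB)\n",
	       (unsigned long long)anon_scanned,
	       (unsigned long long)(anon_scanned * 4096 >> 20));
	return 0;
}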
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-13 7:55 ` KOSAKI Motohiro @ 2010-04-13 8:55 ` KOSAKI Motohiro 2010-04-14 1:27 ` Shaohua Li 0 siblings, 1 reply; 49+ messages in thread From: KOSAKI Motohiro @ 2010-04-13 8:55 UTC (permalink / raw) To: Rik van Riel Cc: kosaki.motohiro, Andrew Morton, Shaohua Li, Wu, Fengguang, linux-mm@kvack.org, linux-kernel@vger.kernel.org > > > This ack surprises me a bit. Rik, do you have any improvement plan for > > > the streaming I/O detection logic? > > > I think the patch has a rather marginal benefit: it helps the <1% scan > > > ratio case. But it has a big regression: it causes streaming I/O (e.g. a backup > > > operation) to generate tons of swap. > > > > How? From the description I believe it took 16GB in > > a zone before we start scanning anon pages when > > reclaiming at DEF_PRIORITY? > > > > Would that cause a problem? > > Please remember, 2.6.27 has the following +1 scanning modifier. > > zone->nr_scan_active += (zone_page_state(zone, NR_ACTIVE) >> priority) + 1; > ^^^^ > > and the early (and not yet merged) SplitLRU VM had a similar +1, like > > scan = zone_nr_lru_pages(zone, sc, l); > scan >>= priority; > scan = (scan * percent[file]) / 100 + 1; > ^^^ > > We thought scanning only one extra page was no big matter, but that was not > correct: we got streaming I/O bug reports. The above +1 caused annoying swap > I/O, because some servers run backup operations much, much bigger than > physical memory size. > > For example, if the VM is dropping 1TB of use-once pages, a 0.1% anon scan > ratio makes for a 1GB scan, and most such servers only have a few gigabytes > of swap although they have >1TB of memory. > > If my memory is not correct, please correct me. > > My point is, greater or smaller than 16GB isn't the essential thing. Every > patch should be worth more than its downside. The description said "the impact > does not sound like a big deal", and nobody disagrees, but its worth is even > smaller. I don't imagine this patch improving anything. And now I've merged this patch into my local vmscan patch queue. After solving the streaming I/O issue, I'll push it to mainline. Thanks. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-13 8:55 ` KOSAKI Motohiro @ 2010-04-14 1:27 ` Shaohua Li 2010-04-15 3:25 ` KOSAKI Motohiro 0 siblings, 1 reply; 49+ messages in thread From: Shaohua Li @ 2010-04-14 1:27 UTC (permalink / raw) To: KOSAKI Motohiro Cc: Rik van Riel, Andrew Morton, Wu, Fengguang, linux-mm@kvack.org, linux-kernel@vger.kernel.org On Tue, Apr 13, 2010 at 04:55:52PM +0800, KOSAKI Motohiro wrote: > > > > This ack surprises me a bit. Rik, do you have any improvement plan for > > > > the streaming I/O detection logic? > > > > I think the patch has a rather marginal benefit: it helps the <1% scan > > > > ratio case. But it has a big regression: it causes streaming I/O (e.g. a backup > > > > operation) to generate tons of swap. > > > > > > How? From the description I believe it took 16GB in > > > a zone before we start scanning anon pages when > > > reclaiming at DEF_PRIORITY? > > > > > > Would that cause a problem? > > > > Please remember, 2.6.27 has the following +1 scanning modifier. > > > > zone->nr_scan_active += (zone_page_state(zone, NR_ACTIVE) >> priority) + 1; > > ^^^^ > > > > and the early (and not yet merged) SplitLRU VM had a similar +1, like > > > > scan = zone_nr_lru_pages(zone, sc, l); > > scan >>= priority; > > scan = (scan * percent[file]) / 100 + 1; > > ^^^ > > > > We thought scanning only one extra page was no big matter, but that was not > > correct: we got streaming I/O bug reports. The above +1 caused annoying swap > > I/O, because some servers run backup operations much, much bigger than > > physical memory size. > > > > For example, if the VM is dropping 1TB of use-once pages, a 0.1% anon scan > > ratio makes for a 1GB scan, and most such servers only have a few gigabytes > > of swap although they have >1TB of memory. > > > > If my memory is not correct, please correct me. > > > > My point is, greater or smaller than 16GB isn't the essential thing. Every > > patch should be worth more than its downside. The description said "the impact > > does not sound like a big deal", and nobody disagrees, but its worth is even > > smaller. I don't imagine this patch improving anything. > > And now I've merged this patch into my local vmscan patch queue. > After solving the streaming I/O issue, I'll push it to mainline. If the streaming I/O issue is a common case, how about the patch below against my last one? We make priority == DEF_PRIORITY an exception. Index: linux/mm/vmscan.c =================================================================== --- linux.orig/mm/vmscan.c 2010-04-14 09:03:28.000000000 +0800 +++ linux/mm/vmscan.c 2010-04-14 09:19:56.000000000 +0800 @@ -1629,6 +1629,22 @@ static void get_scan_count(struct zone * fraction[0] = ap; fraction[1] = fp; denominator = ap + fp + 1; + + /* + * memory pressure isn't high, so we allow the scan ratio to + * underflow. This avoids swapping in the streaming I/O case. + */ + if (priority == DEF_PRIORITY) { + if (fraction[0] * 99 < fraction[1]) { + fraction[0] = 0; + fraction[1] = 1; + denominator = 1; + } else if (fraction[1] * 99 < fraction[0]) { + fraction[0] = 1; + fraction[1] = 0; + denominator = 1; + } + } out: for_each_evictable_lru(l) { int file = is_file_lru(l); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
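The shape of the proposed exception, as a self-contained sketch with made-up rotation weights (allow_underflow() is a hypothetical stand-in for the hunk above, not a kernel function):

#include <stdio.h>
#include <stdint.h>

#define DEF_PRIORITY 12

/* Hypothetical stand-in for the hunk above: at the lowest memory
 * pressure, a <1% share on either list is rounded away entirely. */
static void allow_underflow(int priority, uint64_t frac[2], uint64_t *denom)
{
	if (priority != DEF_PRIORITY)
		return;
	if (frac[0] * 99 < frac[1]) {
		frac[0] = 0;
		frac[1] = 1;
		*denom = 1;
	} else if (frac[1] * 99 < frac[0]) {
		frac[0] = 1;
		frac[1] = 0;
		*denom = 1;
	}
}

int main(void)
{
	/* Made-up rotation weights: anon share of roughly 0.1%. */
	uint64_t frac[2] = { 1, 1024 };
	uint64_t denom = frac[0] + frac[1] + 1;

	allow_underflow(DEF_PRIORITY, frac, &denom);

	/* Prints "anon=0 file=1 denom=1": no anon scanning at all. */
	printf("anon=%llu file=%llu denom=%llu\n",
	       (unsigned long long)frac[0], (unsigned long long)frac[1],
	       (unsigned long long)denom);
	return 0;
}

At any priority below DEF_PRIORITY the check does nothing, so once real memory pressure builds up the precise fraction arithmetic takes over again.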
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-14 1:27 ` Shaohua Li @ 2010-04-15 3:25 ` KOSAKI Motohiro 0 siblings, 0 replies; 49+ messages in thread From: KOSAKI Motohiro @ 2010-04-15 3:25 UTC (permalink / raw) To: Shaohua Li Cc: kosaki.motohiro, Rik van Riel, Andrew Morton, Wu, Fengguang, linux-mm@kvack.org, linux-kernel@vger.kernel.org Hi, sorry for the delay. > > After solving the streaming I/O issue, I'll push it to mainline. > If the streaming I/O issue is a common case, how about the patch below against my last one? > We make priority == DEF_PRIORITY an exception. Your patch seems to work, but it is obviously an ugly band-aid patch. So I'd prefer your previous patch alone rather than combining it with this one, even though dropping both makes more sense than merging both. Please consider attacking the root cause. > Index: linux/mm/vmscan.c > =================================================================== > --- linux.orig/mm/vmscan.c 2010-04-14 09:03:28.000000000 +0800 > +++ linux/mm/vmscan.c 2010-04-14 09:19:56.000000000 +0800 > @@ -1629,6 +1629,22 @@ static void get_scan_count(struct zone * > fraction[0] = ap; > fraction[1] = fp; > denominator = ap + fp + 1; > + > + /* > + * memory pressure isn't high, so we allow the scan ratio to > + * underflow. This avoids swapping in the streaming I/O case. > + */ > + if (priority == DEF_PRIORITY) { > + if (fraction[0] * 99 < fraction[1]) { > + fraction[0] = 0; > + fraction[1] = 1; > + denominator = 1; > + } else if (fraction[1] * 99 < fraction[0]) { > + fraction[0] = 1; > + fraction[1] = 0; > + denominator = 1; > + } > + } > out: > for_each_evictable_lru(l) { > int file = is_file_lru(l); > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-04-09 21:20 ` Andrew Morton 2010-04-09 21:25 ` Rik van Riel @ 2010-04-12 1:57 ` Shaohua Li 1 sibling, 0 replies; 49+ messages in thread From: Shaohua Li @ 2010-04-12 1:57 UTC (permalink / raw) To: Andrew Morton Cc: Wu, Fengguang, KOSAKI Motohiro, linux-mm@kvack.org, linux-kernel@vger.kernel.org, riel@redhat.com On Sat, Apr 10, 2010 at 05:20:57AM +0800, Andrew Morton wrote: > On Fri, 9 Apr 2010 14:51:04 +0800 > Shaohua Li <shaohua.li@intel.com> wrote: > > > get_scan_ratio() calculates a percentage, and if the percentage is < 1%, it > > rounds the percentage down to 0% and causes us to completely skip scanning > > anon/file pages to reclaim memory, even when the total number of anon/file > > pages is very large. > > > > To avoid the underflow, we don't use a percentage; instead we directly > > calculate how many pages should be scanned. In this way, we still get several > > pages scanned for a < 1% share. > > > > This has some benefits: > > 1. it increases our calculation precision > > 2. it makes our scanning smoother. Without this, if percent[x] underflows, > > shrink_zone() doesn't scan any pages and then suddenly scans all pages when > > priority reaches zero. With this, shrink_zone() gets a chance to scan some > > pages even when priority isn't zero. > > > > Note, this patch doesn't really change the logic, it just increases precision. > > For a system with a lot of memory, this might slightly change behavior. For > > example, in a sequential file read workload, without the patch we don't swap > > any anon pages. With it, if the anon memory size is bigger than 16G, we will > > see one anon page swapped. The 16G is calculated as PAGE_SIZE * priority(4096) > > * (fp/ap), where fp/ap is assumed to be 1024, which is common in this > > workload. So the impact does not sound like a big deal. > > I grabbed this. > > Did we decide that this needed to be backported into 2.6.33.x? If so, > some words explaining the reasoning would be needed. > > Come to that, it's not obvious that we need this in 2.6.34 either. Not needed. > What is the user-visible impact here? Should be very small I think. Thanks, Shaohua -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-03-31 4:53 ` Shaohua Li 2010-03-31 5:38 ` KOSAKI Motohiro @ 2010-03-31 5:41 ` Wu Fengguang 1 sibling, 0 replies; 49+ messages in thread From: Wu Fengguang @ 2010-03-31 5:41 UTC (permalink / raw) To: Li, Shaohua Cc: KOSAKI Motohiro, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org On Wed, Mar 31, 2010 at 12:53:48PM +0800, Li, Shaohua wrote: > On Tue, Mar 30, 2010 at 02:08:53PM +0800, KOSAKI Motohiro wrote: > > Hi > > > > > Commit 84b18490d1f1bc7ed5095c929f78bc002eb70f26 introduces a regression. > > > With it, our tmpfs test always oom. The test has a lot of rotated anon > > > pages and cause percent[0] zero. Actually the percent[0] is a very small > > > value, but our calculation round it to zero. The commit makes vmscan > > > completely skip anon pages and cause oops. > > > An option is if percent[x] is zero in get_scan_ratio(), forces it > > > to 1. See below patch. > > > But the offending commit still changes behavior. Without the commit, we scan > > > all pages if priority is zero, below patch doesn't fix this. Don't know if > > > It's required to fix this too. > > > > Can you please post your /proc/meminfo and reproduce program? I'll digg it. > > > > Very unfortunately, this patch isn't acceptable. In past time, vmscan > > had similar logic, but 1% swap-out made lots bug reports. > if 1% is still big, how about below patch? > > Commit 84b18490d1f1bc7ed5095c929f78bc002eb70f26 introduces a regression. > With it, our tmpfs test always oom. The test has a lot of rotated anon > pages and cause percent[0] zero. Actually the percent[0] is a very small > value, but our calculation round it to zero. The commit makes vmscan > completely skip anon pages and cause oops. > To avoid underflow, we don't use percentage, instead we directly calculate > how many pages should be scaned. The changelog can be improved. For example, to describe these items in separate paragraphs: - the behavior change introduced by 84b18490d1f (which claims to be cleanup) - the tmpfs test case - the root cause - the solution - test result > Signed-off-by: Shaohua Li <shaohua.li@intel.com> > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 79c8098..80a7ed5 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -1519,27 +1519,50 @@ static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan, > } > > /* > + * Smallish @nr_to_scan's are deposited in @nr_saved_scan, > + * until we collected @swap_cluster_max pages to scan. > + */ > +static unsigned long nr_scan_try_batch(unsigned long nr_to_scan, > + unsigned long *nr_saved_scan) > +{ > + unsigned long nr; > + > + *nr_saved_scan += nr_to_scan; > + nr = *nr_saved_scan; > + > + if (nr >= SWAP_CLUSTER_MAX) > + *nr_saved_scan = 0; > + else > + nr = 0; > + > + return nr; > +} > + > +/* > * Determine how aggressively the anon and file LRU lists should be > * scanned. The relative value of each set of LRU lists is determined > * by looking at the fraction of the pages scanned we did rotate back > * onto the active list instead of evict. > * > - * percent[0] specifies how much pressure to put on ram/swap backed > - * memory, while percent[1] determines pressure on the file LRUs. 
> + * nr[x] specifies how many pages should be scaned typo: scanned > */ > -static void get_scan_ratio(struct zone *zone, struct scan_control *sc, > - unsigned long *percent) > +static void get_scan_count(struct zone *zone, struct scan_control *sc, > + unsigned long *nr, int priority) > { > unsigned long anon, file, free; > unsigned long anon_prio, file_prio; > unsigned long ap, fp; > struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc); > + unsigned long fraction[2], denominator[2]; The denominator is shared, so one scalar would be sufficient. Also ap, fp can be removed so fraction[] is used directly. And it's better to retain this comment: /* anon @ 0; file @ 1 */ > + enum lru_list l; > > /* If we have no swap space, do not bother scanning anon pages. */ > if (!sc->may_swap || (nr_swap_pages <= 0)) { > - percent[0] = 0; > - percent[1] = 100; > - return; > + fraction[0] = 0; > + denominator[0] = 1; > + fraction[1] = 1; > + denominator[1] = 1; > + goto out; > } > > anon = zone_nr_lru_pages(zone, sc, LRU_ACTIVE_ANON) + > @@ -1552,9 +1575,11 @@ static void get_scan_ratio(struct zone *zone, struct scan_control *sc, > /* If we have very few page cache pages, > force-scan anon pages. */ > if (unlikely(file + free <= high_wmark_pages(zone))) { > - percent[0] = 100; > - percent[1] = 0; > - return; > + fraction[0] = 1; > + denominator[0] = 1; > + fraction[1] = 0; > + denominator[1] = 1; > + goto out; > } > } > > @@ -1601,29 +1626,29 @@ static void get_scan_ratio(struct zone *zone, struct scan_control *sc, > fp = (file_prio + 1) * (reclaim_stat->recent_scanned[1] + 1); > fp /= reclaim_stat->recent_rotated[1] + 1; > > - /* Normalize to percentages */ > - percent[0] = 100 * ap / (ap + fp + 1); > - percent[1] = 100 - percent[0]; > -} > - > -/* > - * Smallish @nr_to_scan's are deposited in @nr_saved_scan, > - * until we collected @swap_cluster_max pages to scan. 
> - */ > -static unsigned long nr_scan_try_batch(unsigned long nr_to_scan, > - unsigned long *nr_saved_scan) > -{ > - unsigned long nr; > + fraction[0] = ap; > + denominator[0] = ap + fp + 1; > + fraction[1] = fp; > + denominator[1] = ap + fp + 1; > > - *nr_saved_scan += nr_to_scan; > - nr = *nr_saved_scan; > +out: > + for_each_evictable_lru(l) { > + int file = is_file_lru(l); > + unsigned long scan; > > - if (nr >= SWAP_CLUSTER_MAX) > - *nr_saved_scan = 0; > - else > - nr = 0; > + if (fraction[file] == 0) { > + nr[l] = 0; > + continue; > + } > > - return nr; > + scan = zone_nr_lru_pages(zone, sc, l); > + if (priority) { > + scan >>= priority; > + scan = (scan * fraction[file] / denominator[file]); scan = (scan * fraction[file]) / denominator[file]; Thanks, Fengguang > + } > + nr[l] = nr_scan_try_batch(scan, > + &reclaim_stat->nr_saved_scan[l]); > + } > } > > /* > @@ -1634,31 +1659,11 @@ static void shrink_zone(int priority, struct zone *zone, > { > unsigned long nr[NR_LRU_LISTS]; > unsigned long nr_to_scan; > - unsigned long percent[2]; /* anon @ 0; file @ 1 */ > enum lru_list l; > unsigned long nr_reclaimed = sc->nr_reclaimed; > unsigned long nr_to_reclaim = sc->nr_to_reclaim; > - struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc); > - > - get_scan_ratio(zone, sc, percent); > > - for_each_evictable_lru(l) { > - int file = is_file_lru(l); > - unsigned long scan; > - > - if (percent[file] == 0) { > - nr[l] = 0; > - continue; > - } > - > - scan = zone_nr_lru_pages(zone, sc, l); > - if (priority) { > - scan >>= priority; > - scan = (scan * percent[file]) / 100; > - } > - nr[l] = nr_scan_try_batch(scan, > - &reclaim_stat->nr_saved_scan[l]); > - } > + get_scan_count(zone, sc, nr, priority); > > while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] || > nr[LRU_INACTIVE_FILE]) { -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-03-30 5:53 [PATCH]vmscan: handle underflow for get_scan_ratio Shaohua Li 2010-03-30 6:08 ` KOSAKI Motohiro @ 2010-03-30 10:17 ` Minchan Kim 2010-03-30 10:25 ` KOSAKI Motohiro 2010-03-30 11:56 ` Balbir Singh 2 siblings, 1 reply; 49+ messages in thread From: Minchan Kim @ 2010-03-30 10:17 UTC (permalink / raw) To: Shaohua Li; +Cc: linux-mm, linux-kernel, akpm, fengguang.wu, kosaki.motohiro On Tue, Mar 30, 2010 at 2:53 PM, Shaohua Li <shaohua.li@intel.com> wrote: > Commit 84b18490d1f1bc7ed5095c929f78bc002eb70f26 introduces a regression. > With it, our tmpfs test always oom. The test has a lot of rotated anon > pages and cause percent[0] zero. Actually the percent[0] is a very small > value, but our calculation round it to zero. The commit makes vmscan > completely skip anon pages and cause oops. > An option is if percent[x] is zero in get_scan_ratio(), forces it > to 1. See below patch. > But the offending commit still changes behavior. Without the commit, we scan > all pages if priority is zero, below patch doesn't fix this. Don't know if > It's required to fix this too. > > Signed-off-by: Shaohua Li <shaohua.li@intel.com> > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 79c8098..d5cc34e 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -1604,6 +1604,18 @@ static void get_scan_ratio(struct zone *zone, struct scan_control *sc, > /* Normalize to percentages */ > percent[0] = 100 * ap / (ap + fp + 1); > percent[1] = 100 - percent[0]; > + /* > + * if percent[x] is small and rounded to 0, this case doesn't mean we > + * should skip scan. Give it at least 1% share. > + */ > + if (percent[0] == 0) { > + percent[0] = 1; > + percent[1] = 99; > + } > + if (percent[1] == 0) { > + percent[0] = 99; > + percent[1] = 1; > + } > } > > /* > Yes, it made a subtle change, but we should not depend on that change. The current logic seems better and clearer than the old one. I think you were lucky before, thanks to the not-good and not-clear logic. BTW, how about this? diff --git a/mm/vmscan.c b/mm/vmscan.c index 79c8098..f0df563 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1646,11 +1646,6 @@ static void shrink_zone(int priority, struct zone *zone, int file = is_file_lru(l); unsigned long scan; - if (percent[file] == 0) { - nr[l] = 0; - continue; - } - scan = zone_nr_lru_pages(zone, sc, l); if (priority) { scan >>= priority; -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 49+ messages in thread
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-03-30 10:17 ` Minchan Kim @ 2010-03-30 10:25 ` KOSAKI Motohiro 0 siblings, 0 replies; 49+ messages in thread From: KOSAKI Motohiro @ 2010-03-30 10:25 UTC (permalink / raw) To: Minchan Kim Cc: kosaki.motohiro, Shaohua Li, linux-mm, linux-kernel, akpm, fengguang.wu > Yes, it made a subtle change, but we should not depend on that change. > The current logic seems better and clearer than the old one. > I think you were lucky before, thanks to the not-good and not-clear logic. > > BTW, how about this? Unfortunately, memcg needs the code you removed. If it is removed, swap-out might happen even though sc->may_swap==0 when priority==0. Please give me a little time to investigate. > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 79c8098..f0df563 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -1646,11 +1646,6 @@ static void shrink_zone(int priority, struct zone *zone, > int file = is_file_lru(l); > unsigned long scan; > > - if (percent[file] == 0) { > - nr[l] = 0; > - continue; > - } > - > scan = zone_nr_lru_pages(zone, sc, l); > if (priority) { > scan >>= priority; > > > > > -- > Kind regards, > Minchan Kim > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
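The hazard KOSAKI is pointing at, sketched as a simplified userspace model of the shrink_zone() flow (scan_target() is a hypothetical reduction, not the kernel function): with the percent[file] == 0 early-continue removed, priority == 0 bypasses both the shift and the multiply, so a 0% share no longer protects the anon list.

#include <stdio.h>

/* Hypothetical reduction of the shrink_zone() loop body: how many
 * pages one LRU list is asked to scan. */
static unsigned long scan_target(unsigned long lru_pages,
				 unsigned long percent, int priority)
{
	unsigned long scan = lru_pages;

	/* With the "if (percent[file] == 0) continue;" guard removed,
	 * priority == 0 skips both the shift and the multiply... */
	if (priority) {
		scan >>= priority;
		scan = scan * percent / 100;
	}
	return scan;
}

int main(void)
{
	/* noswap case: anon percent is 0; the anon list has 1M pages
	 * (an assumed size).  Prints "priority 12: 0 pages". */
	printf("priority 12: %lu pages\n", scan_target(1UL << 20, 0, 12));
	/* ...so at priority 0 the whole anon list would be scanned even
	 * though sc->may_swap == 0.  Prints "priority 0: 1048576 pages". */
	printf("priority 0: %lu pages\n", scan_target(1UL << 20, 0, 0));
	return 0;
}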
* Re: [PATCH]vmscan: handle underflow for get_scan_ratio 2010-03-30 5:53 [PATCH]vmscan: handle underflow for get_scan_ratio Shaohua Li 2010-03-30 6:08 ` KOSAKI Motohiro 2010-03-30 10:17 ` Minchan Kim @ 2010-03-30 11:56 ` Balbir Singh 2 siblings, 0 replies; 49+ messages in thread From: Balbir Singh @ 2010-03-30 11:56 UTC (permalink / raw) To: Shaohua Li; +Cc: linux-mm, linux-kernel, akpm, fengguang.wu, kosaki.motohiro On Tue, Mar 30, 2010 at 11:23 AM, Shaohua Li <shaohua.li@intel.com> wrote: > Commit 84b18490d1f1bc7ed5095c929f78bc002eb70f26 introduces a regression. > With it, our tmpfs test always oom. The test has a lot of rotated anon > pages and cause percent[0] zero. Actually the percent[0] is a very small > value, but our calculation round it to zero. The commit makes vmscan > completely skip anon pages and cause oops. > An option is if percent[x] is zero in get_scan_ratio(), forces it > to 1. See below patch. > But the offending commit still changes behavior. Without the commit, we scan > all pages if priority is zero, below patch doesn't fix this. Don't know if > It's required to fix this too. > > Signed-off-by: Shaohua Li <shaohua.li@intel.com> > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 79c8098..d5cc34e 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -1604,6 +1604,18 @@ static void get_scan_ratio(struct zone *zone, struct scan_control *sc, > /* Normalize to percentages */ > percent[0] = 100 * ap / (ap + fp + 1); > percent[1] = 100 - percent[0]; > + /* > + * if percent[x] is small and rounded to 0, this case doesn't mean we > + * should skip scan. Give it at least 1% share. > + */ > + if (percent[0] == 0) { > + percent[0] = 1; > + percent[1] = 99; > + } > + if (percent[1] == 0) { > + percent[0] = 99; > + percent[1] = 1; > + } > } > Can you please post the meminfo before and after the changes (diff maybe?). Can you also please share the ap and fp data from which the percent figures are being calculated. Is your swappiness set to 60? Can you please share the OOM/panic message as well. Balbir -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 49+ messages in thread
end of thread, other threads:[~2010-04-15 3:25 UTC | newest] Thread overview: 49+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-03-30 5:53 [PATCH]vmscan: handle underflow for get_scan_ratio Shaohua Li 2010-03-30 6:08 ` KOSAKI Motohiro 2010-03-30 6:32 ` Shaohua Li 2010-03-30 6:40 ` KOSAKI Motohiro 2010-03-30 6:53 ` Shaohua Li 2010-03-30 7:31 ` KOSAKI Motohiro 2010-03-30 8:13 ` Shaohua Li 2010-03-31 4:53 ` Shaohua Li 2010-03-31 5:38 ` KOSAKI Motohiro 2010-03-31 5:51 ` Wu Fengguang 2010-03-31 6:00 ` KOSAKI Motohiro 2010-03-31 6:03 ` Wu Fengguang 2010-04-01 22:16 ` Andrew Morton 2010-04-02 9:13 ` KOSAKI Motohiro 2010-04-06 1:22 ` Wu Fengguang 2010-04-06 3:36 ` Rik van Riel 2010-03-31 5:53 ` KOSAKI Motohiro 2010-04-02 6:50 ` Shaohua Li 2010-04-02 9:14 ` KOSAKI Motohiro 2010-04-02 9:24 ` Shaohua Li 2010-04-04 14:19 ` KOSAKI Motohiro 2010-04-06 1:25 ` Shaohua Li 2010-04-06 1:36 ` KOSAKI Motohiro 2010-04-06 1:50 ` Wu Fengguang 2010-04-06 2:06 ` KOSAKI Motohiro 2010-04-06 2:30 ` Wu Fengguang 2010-04-06 2:58 ` KOSAKI Motohiro 2010-04-06 3:31 ` Wu Fengguang 2010-04-06 3:40 ` Rik van Riel 2010-04-06 4:49 ` Wu Fengguang 2010-04-06 5:09 ` Shaohua Li 2010-04-04 0:48 ` Wu Fengguang 2010-04-06 1:27 ` Shaohua Li 2010-04-06 5:03 ` Wu Fengguang 2010-04-06 5:36 ` Shaohua Li 2010-04-09 6:51 ` Shaohua Li 2010-04-09 21:20 ` Andrew Morton 2010-04-09 21:25 ` Rik van Riel 2010-04-13 1:30 ` KOSAKI Motohiro 2010-04-13 2:42 ` Rik van Riel 2010-04-13 7:55 ` KOSAKI Motohiro 2010-04-13 8:55 ` KOSAKI Motohiro 2010-04-14 1:27 ` Shaohua Li 2010-04-15 3:25 ` KOSAKI Motohiro 2010-04-12 1:57 ` Shaohua Li 2010-03-31 5:41 ` Wu Fengguang 2010-03-30 10:17 ` Minchan Kim 2010-03-30 10:25 ` KOSAKI Motohiro 2010-03-30 11:56 ` Balbir Singh