* [PATCH] mm: raise MemFree by reverting percpu_pagelist_fraction to 0
From: Hugh Dickins @ 2012-05-11 8:00 UTC
To: Linus Torvalds
Cc: Andrew Morton, Sasha Levin, Minchan Kim, linux-mm, linux-kernel

Why is there less MemFree than there used to be?  It perturbed a test,
so I've just been bisecting linux-next, and now find the offender went
upstream yesterday.

Commit 93278814d359 "mm: fix division by 0 in percpu_pagelist_fraction()"
mistakenly initialized percpu_pagelist_fraction to the sysctl's minimum 8,
which leaves 1/8th of memory on percpu lists (on each cpu??); but most of
us expect it to be left unset at 0 (and it's not then used as a divisor).

  MemTotal: 8061476kB  8061476kB  8061476kB  8061476kB  8061476kB  8061476kB
Repetitive test with percpu_pagelist_fraction 8:
  MemFree:  6948420kB  6237172kB  6949696kB  6840692kB  6949048kB  6862984kB
Same test with percpu_pagelist_fraction back to 0:
  MemFree:  7945000kB  7944908kB  7948568kB  7949060kB  7948796kB  7948812kB

Signed-off-by: Hugh Dickins <hughd@google.com>
---

 mm/page_alloc.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- 3.4-rc6+/mm/page_alloc.c	2012-05-10 22:53:35.362478419 -0700
+++ linux/mm/page_alloc.c	2012-05-11 00:07:31.613657283 -0700
@@ -105,7 +105,7 @@ unsigned long totalreserve_pages __read_
  */
 unsigned long dirty_balance_reserve __read_mostly;
 
-int percpu_pagelist_fraction = 8;
+int percpu_pagelist_fraction;
 gfp_t gfp_allowed_mask __read_mostly = GFP_BOOT_MASK;
 
 #ifdef CONFIG_PM_SLEEP

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
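The figures in the message above can be sanity-checked with a little arithmetic. The following is only a userspace sketch, not kernel code: the function names are made up for illustration, and the inputs are the first column of the MemTotal/MemFree table. It compares the observed drop in MemFree with one eighth of MemTotal.

```c
#include <assert.h>

/*
 * Hypothetical helper: the share of memory that corresponds to a
 * percpu_pagelist_fraction of `fraction`, i.e. total / fraction.
 * A fraction of 0 means "unset" and must never be used as a divisor.
 */
static unsigned long fraction_share_kb(unsigned long total_kb, int fraction)
{
	assert(fraction > 0);
	return total_kb / (unsigned long)fraction;
}

/* Hypothetical helper: MemFree lost between the fraction-0 and fraction-8 runs. */
static unsigned long memfree_drop_kb(unsigned long free_at_0_kb,
				     unsigned long free_at_8_kb)
{
	return free_at_0_kb - free_at_8_kb;
}
```

With the first column of the table, memfree_drop_kb(7945000, 6948420) gives 996580 kB, while fraction_share_kb(8061476, 8) gives 1007684 kB. The two are close, which is consistent with roughly 1/8th of memory sitting on the percpu lists.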
* Re: [PATCH] mm: raise MemFree by reverting percpu_pagelist_fraction to 0
From: Sasha Levin @ 2012-05-11 8:30 UTC
To: Hugh Dickins
Cc: Linus Torvalds, Andrew Morton, Minchan Kim, linux-mm, linux-kernel

On Fri, May 11, 2012 at 10:00 AM, Hugh Dickins <hughd@google.com> wrote:
> Commit 93278814d359 "mm: fix division by 0 in percpu_pagelist_fraction()"
> mistakenly initialized percpu_pagelist_fraction to the sysctl's minimum 8,
> which leaves 1/8th of memory on percpu lists (on each cpu??); but most of
> us expect it to be left unset at 0 (and it's not then used as a divisor).

I'm a bit confused about this, does it mean that once you set
percpu_pagelist_fraction to a value above the minimum, you can no
longer set it back to being 0?
* Re: [PATCH] mm: raise MemFree by reverting percpu_pagelist_fraction to 0
From: Minchan Kim @ 2012-05-11 8:38 UTC
To: Sasha Levin
Cc: Hugh Dickins, Linus Torvalds, Andrew Morton, linux-mm, linux-kernel

On 05/11/2012 05:30 PM, Sasha Levin wrote:
> On Fri, May 11, 2012 at 10:00 AM, Hugh Dickins <hughd@google.com> wrote:
>> Commit 93278814d359 "mm: fix division by 0 in percpu_pagelist_fraction()"
>> mistakenly initialized percpu_pagelist_fraction to the sysctl's minimum 8,
>> which leaves 1/8th of memory on percpu lists (on each cpu??); but most of
>> us expect it to be left unset at 0 (and it's not then used as a divisor).
>
> I'm a bit confused about this, does it mean that once you set
> percpu_pagelist_fraction to a value above the minimum, you can no
> longer set it back to being 0?

Unfortunately, yes. :(
It's rather awkward and needs a fix.

--
Kind regards,
Minchan Kim
* Re: [PATCH] mm: raise MemFree by reverting percpu_pagelist_fraction to 0
From: Minchan Kim @ 2012-05-11 9:01 UTC
Cc: Sasha Levin, Hugh Dickins, Linus Torvalds, Andrew Morton, linux-mm, linux-kernel

On 05/11/2012 05:38 PM, Minchan Kim wrote:
> On 05/11/2012 05:30 PM, Sasha Levin wrote:
>
>> On Fri, May 11, 2012 at 10:00 AM, Hugh Dickins <hughd@google.com> wrote:
>>> Commit 93278814d359 "mm: fix division by 0 in percpu_pagelist_fraction()"
>>> mistakenly initialized percpu_pagelist_fraction to the sysctl's minimum 8,
>>> which leaves 1/8th of memory on percpu lists (on each cpu??); but most of
>>> us expect it to be left unset at 0 (and it's not then used as a divisor).
>>
>> I'm a bit confused about this, does it mean that once you set
>> percpu_pagelist_fraction to a value above the minimum, you can no
>> longer set it back to being 0?
>
>
> Unfortunately, yes. :(
> It's rather awkward and needs a fix.

I didn't have time, so I made a quick patch just to show the concept.
Not tested and not considered carefully.
If nobody objects, I will send a formal patch with cleaner code.

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index f487f25..fabc52c 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -132,7 +132,6 @@ static unsigned long dirty_bytes_min = 2 * PAGE_SIZE;
 /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
 static int maxolduid = 65535;
 static int minolduid;
-static int min_percpu_pagelist_fract = 8;
 
 static int ngroups_max = NGROUPS_MAX;
 static const int cap_last_cap = CAP_LAST_CAP;
@@ -1214,7 +1213,6 @@ static struct ctl_table vm_table[] = {
 		.maxlen = sizeof(percpu_pagelist_fraction),
 		.mode = 0644,
 		.proc_handler = percpu_pagelist_fraction_sysctl_handler,
-		.extra1 = &min_percpu_pagelist_fract,
 	},
 #ifdef CONFIG_MMU
 	{
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a13ded1..cc2353a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5161,12 +5161,30 @@ int percpu_pagelist_fraction_sysctl_handler(ctl_table *table, int write,
 	ret = proc_dointvec_minmax(table, write, buffer, length, ppos);
 	if (!write || (ret == -EINVAL))
 		return ret;
-	for_each_populated_zone(zone) {
-		for_each_possible_cpu(cpu) {
-			unsigned long high;
-			high = zone->present_pages / percpu_pagelist_fraction;
-			setup_pagelist_highmark(
-				per_cpu_ptr(zone->pageset, cpu), high);
+
+	if (percpu_pagelist_fraction < 8 && percpu_pagelist_fraction != 0)
+		return -EINVAL;
+
+	if (percpu_pagelist_fraction != 0) {
+		for_each_populated_zone(zone) {
+			for_each_possible_cpu(cpu) {
+				unsigned long high;
+				high = zone->present_pages / percpu_pagelist_fraction;
+				setup_pagelist_highmark(
+					per_cpu_ptr(zone->pageset, cpu), high);
+			}
+		}
+	}
+	else {
+		for_each_populated_zone(zone) {
+			for_each_possible_cpu(cpu) {
+				struct per_cpu_pageset *p = per_cpu_ptr(zone->pageset, cpu);
+				unsigned long batch = zone_batchsize(zone);
+				struct per_cpu_pages *pcp;
+				pcp = &p->pcp;
+				pcp->high = 6 * batch;
+				pcp->batch = max(1UL, 1 * batch);
+			}
 		}
 	}
 	return 0;

--
Kind regards,
Minchan Kim
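The rule that the concept patch above applies to a written value can be sketched in userspace as follows. This is an illustration under stated assumptions, not the kernel handler: check_fraction() and zone_high() are hypothetical names, and only the comparison against 8 and the division present_pages / percpu_pagelist_fraction are taken from the patch.

```c
#include <assert.h>
#include <errno.h>

/* The sysctl minimum discussed in the thread. */
#define MIN_PERCPU_PAGELIST_FRACTION 8

/*
 * Hypothetical helper mirroring the patch's check: 0 is accepted as
 * "back to the default", values of 8 and above are accepted, anything
 * else (including negative values) is rejected with -EINVAL.
 */
static int check_fraction(int fraction)
{
	if (fraction != 0 && fraction < MIN_PERCPU_PAGELIST_FRACTION)
		return -EINVAL;
	return 0;
}

/*
 * Hypothetical helper: the per-cpu high mark for one zone when the
 * fraction is set. Must not be called with 0, which is not a divisor.
 */
static unsigned long zone_high(unsigned long present_pages, int fraction)
{
	assert(fraction > 0);
	return present_pages / (unsigned long)fraction;
}
```

A caller would first run check_fraction() on the new value and only compute zone_high() when the value is nonzero; the zero case takes the separate default path, as in the else branch of the patch.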
* Re: [PATCH] mm: raise MemFree by reverting percpu_pagelist_fraction to 0
From: Linus Torvalds @ 2012-05-11 16:27 UTC
To: Minchan Kim
Cc: Sasha Levin, Hugh Dickins, Andrew Morton, linux-mm, linux-kernel

On Fri, May 11, 2012 at 2:01 AM, Minchan Kim <minchan@kernel.org> wrote:
>
> I didn't have time, so I made a quick patch just to show the concept.
> Not tested and not considered carefully.
> If nobody objects, I will send a formal patch with cleaner code.

What's so magical about that '8' *anyway*? Why do we have that minimum at all?

At the very least, the 8-vs-0 thing needs to be explained.

                    Linus
* Re: [PATCH] mm: raise MemFree by reverting percpu_pagelist_fraction to 0
From: Sasha Levin @ 2012-05-11 16:35 UTC
To: Linus Torvalds
Cc: Minchan Kim, Hugh Dickins, Andrew Morton, linux-mm, linux-kernel

On Fri, May 11, 2012 at 6:27 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Fri, May 11, 2012 at 2:01 AM, Minchan Kim <minchan@kernel.org> wrote:
>>
>> I didn't have time, so I made a quick patch just to show the concept.
>> Not tested and not considered carefully.
>> If nobody objects, I will send a formal patch with cleaner code.
>
> What's so magical about that '8' *anyway*? Why do we have that minimum at all?
>
> At the very least, the 8-vs-0 thing needs to be explained.

The '0' acts as an "off" switch.

Once it's on, we reserve 1/x of the pages for the pagelists. I'm not
sure why 8 was selected in the first place, but I guess it made sense
that you don't want to reserve 15%+ of your memory for the pagelists.
* Re: [PATCH] mm: raise MemFree by reverting percpu_pagelist_fraction to 0
From: Linus Torvalds @ 2012-05-11 16:54 UTC
To: Sasha Levin
Cc: Minchan Kim, Hugh Dickins, Andrew Morton, linux-mm, linux-kernel

On Fri, May 11, 2012 at 9:35 AM, Sasha Levin <levinsasha928@gmail.com> wrote:
>
> Once it's on, we reserve 1/x of the pages for the pagelists. I'm not
> sure why 8 was selected in the first place, but I guess it made sense
> that you don't want to reserve 15%+ of your memory for the pagelists.

Why not just accept any number, but turn small numbers into the minimum?

And if it's a per-cpu, then the minimum had better depend on number of
CPU's anyway. 15% of memory on a single-cpu already sounds insanely
high, but if you have several cpu's, it's going to be just totally
crazy.

So a minimum of 8 already sounds broken. Exposing that minimum in a
way that makes it impossible to reset it sounds just insane.

                    Linus
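One possible reading of the suggestion above (accept any number, raise small ones to a minimum, and let that minimum depend on the CPU count) can be sketched like this. Everything here is an assumption made for illustration: clamp_fraction() is a hypothetical name, the 8 * ncpus scaling is just one way to make the minimum depend on the number of CPUs, and treating negative input as 0 is an arbitrary choice. None of it is taken from the kernel.

```c
#include <assert.h>

/*
 * Hypothetical policy sketch:
 *  - 0 (or a negative value) leaves the knob off, i.e. default behaviour;
 *  - any positive value below the minimum is silently raised to it;
 *  - the minimum is assumed to be 8 per CPU, so that the share summed
 *    over all CPUs stays at or below 1/8th of a zone.
 */
static int clamp_fraction(int requested, int ncpus)
{
	int min;

	assert(ncpus > 0);
	if (requested <= 0)
		return 0;

	min = 8 * ncpus;	/* assumed scaling, not from the thread */
	return requested < min ? min : requested;
}
```

Under this sketch a write never fails, and writing 0 always gets back to the default, which addresses the "impossible to reset" complaint.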
* Re: [PATCH] mm: raise MemFree by reverting percpu_pagelist_fraction to 0
From: Cong Wang @ 2012-05-14 8:51 UTC
To: Sasha Levin
Cc: Linus Torvalds, Minchan Kim, Hugh Dickins, Andrew Morton, linux-mm, linux-kernel

On 05/12/2012 12:35 AM, Sasha Levin wrote:
> On Fri, May 11, 2012 at 6:27 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>> On Fri, May 11, 2012 at 2:01 AM, Minchan Kim <minchan@kernel.org> wrote:
>>>
>>> I didn't have time, so I made a quick patch just to show the concept.
>>> Not tested and not considered carefully.
>>> If nobody objects, I will send a formal patch with cleaner code.
>>
>> What's so magical about that '8' *anyway*? Why do we have that minimum at all?
>>
>> At the very least, the 8-vs-0 thing needs to be explained.
>
> The '0' acts as an "off" switch.
>
> Once it's on, we reserve 1/x of the pages for the pagelists. I'm not
> sure why 8 was selected in the first place, but I guess it made sense
> that you don't want to reserve 15%+ of your memory for the pagelists.

1/x is not user-friendly, other vm sysctl's use percentage (x%), for
example, overcommit_ratio.
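To relate the 1/x form of the knob to the percentage form mentioned above, here is a minimal conversion sketch. The function name is hypothetical and the code is plain userspace C, not part of any kernel interface.

```c
#include <assert.h>

/*
 * Hypothetical helper: express a fraction setting of x (meaning 1/x of
 * the pages) as a percentage. 0 means "off" and has no percentage.
 */
static double fraction_to_percent(int fraction)
{
	assert(fraction > 0);
	return 100.0 / fraction;
}
```

For the minimum of 8 this gives 12.5%, so any accepted nonzero setting corresponds to at most 12.5% per share; values below 8, such as 6 (about 16.7%), are the ones the minimum rules out.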
* Re: [PATCH] mm: raise MemFree by reverting percpu_pagelist_fraction to 0
From: Hugh Dickins @ 2012-05-11 14:10 UTC
To: Minchan Kim
Cc: Sasha Levin, Linus Torvalds, Andrew Morton, linux-mm, linux-kernel

On Fri, 11 May 2012, Minchan Kim wrote:
> On 05/11/2012 05:30 PM, Sasha Levin wrote:
>
> >> Commit 93278814d359 "mm: fix division by 0 in percpu_pagelist_fraction()"
> >> mistakenly initialized percpu_pagelist_fraction to the sysctl's minimum 8,
> >> which leaves 1/8th of memory on percpu lists (on each cpu??); but most of
> >> us expect it to be left unset at 0 (and it's not then used as a divisor).
> >
> > I'm a bit confused about this, does it mean that once you set
> > percpu_pagelist_fraction to a value above the minimum, you can no
> > longer set it back to being 0?
>
>
> Unfortunately, yes. :(
> It's rather awkward and needs a fix.

It's inelegant, but does that actually need a fix?  Has anybody asked
for that option in the six years of percpu_pagelist_fraction?

Does setting percpu_pagelist_fraction to some large number perhaps
approximate to the default behaviour of percpu_pagelist_fraction 0?

I don't care very much either way - just don't want this discussion
to divert from applying last night's fix to the default behaviour
that most people expect.

Hugh
* Re: [PATCH] mm: raise MemFree by reverting percpu_pagelist_fraction to 0
From: Minchan Kim @ 2012-05-14 1:51 UTC
To: Hugh Dickins
Cc: Sasha Levin, Linus Torvalds, Andrew Morton, linux-mm, linux-kernel

Hi Hugh,

On 05/11/2012 11:10 PM, Hugh Dickins wrote:
> On Fri, 11 May 2012, Minchan Kim wrote:
>> On 05/11/2012 05:30 PM, Sasha Levin wrote:
>>
>>>> Commit 93278814d359 "mm: fix division by 0 in percpu_pagelist_fraction()"
>>>> mistakenly initialized percpu_pagelist_fraction to the sysctl's minimum 8,
>>>> which leaves 1/8th of memory on percpu lists (on each cpu??); but most of
>>>> us expect it to be left unset at 0 (and it's not then used as a divisor).
>>>
>>> I'm a bit confused about this, does it mean that once you set
>>> percpu_pagelist_fraction to a value above the minimum, you can no
>>> longer set it back to being 0?
>>
>>
>> Unfortunately, yes. :(
>> It's rather awkward and needs a fix.
>
> It's inelegant, but does that actually need a fix?  Has anybody asked
> for that option in the six years of percpu_pagelist_fraction?

I haven't heard of anyone asking, but it is very strange that we can't
reset it to 0 once we have set it to 8 or above.
Someone may raise the value in /proc/sys/vm/percpu_pagelist_fraction to
test it, learn what the knob does, and then want to reset it to the
default of 0. But they can't. It's very strange. :(

>
> Does setting percpu_pagelist_fraction to some large number perhaps
> approximate to the default behaviour of percpu_pagelist_fraction 0?

Yes. But it's not intuitive.

>
> I don't care very much either way - just don't want this discussion
> to divert from applying last night's fix to the default behaviour
> that most people expect.

Of course. It's entirely due to my careless review.
Actually, I didn't notice the default value being changed to 8 when I
reviewed the patch. I just focused on proc_dointvec_minmax's error
return value.
Shame on me. :(

Thanks for spotting it.

>
> Hugh

--
Kind regards,
Minchan Kim
* Re: [PATCH] mm: raise MemFree by reverting percpu_pagelist_fraction to 0
From: Linus Torvalds @ 2012-05-11 16:26 UTC
To: Hugh Dickins
Cc: Andrew Morton, Sasha Levin, Minchan Kim, linux-mm, linux-kernel

On Fri, May 11, 2012 at 1:00 AM, Hugh Dickins <hughd@google.com> wrote:
>
> Commit 93278814d359 "mm: fix division by 0 in percpu_pagelist_fraction()"
> mistakenly initialized percpu_pagelist_fraction to the sysctl's minimum 8,
> which leaves 1/8th of memory on percpu lists (on each cpu??); but most of
> us expect it to be left unset at 0 (and it's not then used as a divisor).

Applied.

                   Linus