* [PATCH] mm: raise MemFree by reverting percpu_pagelist_fraction to 0
@ 2012-05-11 8:00 Hugh Dickins
2012-05-11 8:30 ` Sasha Levin
2012-05-11 16:26 ` Linus Torvalds
0 siblings, 2 replies; 11+ messages in thread
From: Hugh Dickins @ 2012-05-11 8:00 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andrew Morton, Sasha Levin, Minchan Kim, linux-mm, linux-kernel
Why is there less MemFree than there used to be? It perturbed a test,
so I've just been bisecting linux-next, and now find the offender went
upstream yesterday.
Commit 93278814d359 "mm: fix division by 0 in percpu_pagelist_fraction()"
mistakenly initialized percpu_pagelist_fraction to the sysctl's minimum 8,
which leaves 1/8th of memory on percpu lists (on each cpu??); but most of
us expect it to be left unset at 0 (and it's not then used as a divisor).
MemTotal: 8061476kB 8061476kB 8061476kB 8061476kB 8061476kB 8061476kB
Repetitive test with percpu_pagelist_fraction 8:
MemFree: 6948420kB 6237172kB 6949696kB 6840692kB 6949048kB 6862984kB
Same test with percpu_pagelist_fraction back to 0:
MemFree: 7945000kB 7944908kB 7948568kB 7949060kB 7948796kB 7948812kB
Signed-off-by: Hugh Dickins <hughd@google.com>
---
mm/page_alloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- 3.4-rc6+/mm/page_alloc.c 2012-05-10 22:53:35.362478419 -0700
+++ linux/mm/page_alloc.c 2012-05-11 00:07:31.613657283 -0700
@@ -105,7 +105,7 @@ unsigned long totalreserve_pages __read_
*/
unsigned long dirty_balance_reserve __read_mostly;
-int percpu_pagelist_fraction = 8;
+int percpu_pagelist_fraction;
gfp_t gfp_allowed_mask __read_mostly = GFP_BOOT_MASK;
#ifdef CONFIG_PM_SLEEP
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [PATCH] mm: raise MemFree by reverting percpu_pagelist_fraction to 0
2012-05-11 8:00 [PATCH] mm: raise MemFree by reverting percpu_pagelist_fraction to 0 Hugh Dickins
@ 2012-05-11 8:30 ` Sasha Levin
2012-05-11 8:38 ` Minchan Kim
2012-05-11 16:26 ` Linus Torvalds
1 sibling, 1 reply; 11+ messages in thread
From: Sasha Levin @ 2012-05-11 8:30 UTC (permalink / raw)
To: Hugh Dickins
Cc: Linus Torvalds, Andrew Morton, Minchan Kim, linux-mm,
linux-kernel
On Fri, May 11, 2012 at 10:00 AM, Hugh Dickins <hughd@google.com> wrote:
> Commit 93278814d359 "mm: fix division by 0 in percpu_pagelist_fraction()"
> mistakenly initialized percpu_pagelist_fraction to the sysctl's minimum 8,
> which leaves 1/8th of memory on percpu lists (on each cpu??); but most of
> us expect it to be left unset at 0 (and it's not then used as a divisor).
I'm a bit confused about this, does it mean that once you set
percpu_pagelist_fraction to a value above the minimum, you can no
longer set it back to being 0?
* Re: [PATCH] mm: raise MemFree by reverting percpu_pagelist_fraction to 0
2012-05-11 8:30 ` Sasha Levin
@ 2012-05-11 8:38 ` Minchan Kim
2012-05-11 9:01 ` Minchan Kim
2012-05-11 14:10 ` Hugh Dickins
0 siblings, 2 replies; 11+ messages in thread
From: Minchan Kim @ 2012-05-11 8:38 UTC (permalink / raw)
To: Sasha Levin
Cc: Hugh Dickins, Linus Torvalds, Andrew Morton, linux-mm,
linux-kernel
On 05/11/2012 05:30 PM, Sasha Levin wrote:
> On Fri, May 11, 2012 at 10:00 AM, Hugh Dickins <hughd@google.com> wrote:
>> Commit 93278814d359 "mm: fix division by 0 in percpu_pagelist_fraction()"
>> mistakenly initialized percpu_pagelist_fraction to the sysctl's minimum 8,
>> which leaves 1/8th of memory on percpu lists (on each cpu??); but most of
>> us expect it to be left unset at 0 (and it's not then used as a divisor).
>
> I'm a bit confused about this, does it mean that once you set
> percpu_pagelist_fraction to a value above the minimum, you can no
> longer set it back to being 0?
Unfortunately, yes. :(
It's rather awkward and needs a fix.
--
Kind regards,
Minchan Kim
* Re: [PATCH] mm: raise MemFree by reverting percpu_pagelist_fraction to 0
2012-05-11 8:38 ` Minchan Kim
@ 2012-05-11 9:01 ` Minchan Kim
2012-05-11 16:27 ` Linus Torvalds
2012-05-11 14:10 ` Hugh Dickins
1 sibling, 1 reply; 11+ messages in thread
From: Minchan Kim @ 2012-05-11 9:01 UTC (permalink / raw)
Cc: Sasha Levin, Hugh Dickins, Linus Torvalds, Andrew Morton,
linux-mm, linux-kernel
On 05/11/2012 05:38 PM, Minchan Kim wrote:
> On 05/11/2012 05:30 PM, Sasha Levin wrote:
>
>> On Fri, May 11, 2012 at 10:00 AM, Hugh Dickins <hughd@google.com> wrote:
>>> Commit 93278814d359 "mm: fix division by 0 in percpu_pagelist_fraction()"
>>> mistakenly initialized percpu_pagelist_fraction to the sysctl's minimum 8,
>>> which leaves 1/8th of memory on percpu lists (on each cpu??); but most of
>>> us expect it to be left unset at 0 (and it's not then used as a divisor).
>>
>> I'm a bit confused about this, does it mean that once you set
>> percpu_pagelist_fraction to a value above the minimum, you can no
>> longer set it back to being 0?
>
>
> Unfortunately, Yes. :(
> It's rather awkward and need fix.
I didn't have much time, so I made a quick patch just to show the concept.
It is untested and not carefully thought through.
If nobody objects, I will send a formal patch with cleaner code.
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index f487f25..fabc52c 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -132,7 +132,6 @@ static unsigned long dirty_bytes_min = 2 * PAGE_SIZE;
/* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
static int maxolduid = 65535;
static int minolduid;
-static int min_percpu_pagelist_fract = 8;
static int ngroups_max = NGROUPS_MAX;
static const int cap_last_cap = CAP_LAST_CAP;
@@ -1214,7 +1213,6 @@ static struct ctl_table vm_table[] = {
.maxlen = sizeof(percpu_pagelist_fraction),
.mode = 0644,
.proc_handler = percpu_pagelist_fraction_sysctl_handler,
- .extra1 = &min_percpu_pagelist_fract,
},
#ifdef CONFIG_MMU
{
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a13ded1..cc2353a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5161,12 +5161,30 @@ int percpu_pagelist_fraction_sysctl_handler(ctl_table *table, int write,
 	ret = proc_dointvec_minmax(table, write, buffer, length, ppos);
 	if (!write || (ret == -EINVAL))
 		return ret;
-	for_each_populated_zone(zone) {
-		for_each_possible_cpu(cpu) {
-			unsigned long high;
-			high = zone->present_pages / percpu_pagelist_fraction;
-			setup_pagelist_highmark(
-				per_cpu_ptr(zone->pageset, cpu), high);
+
+	if (percpu_pagelist_fraction < 8 && percpu_pagelist_fraction != 0)
+		return -EINVAL;
+
+	if (percpu_pagelist_fraction != 0) {
+		for_each_populated_zone(zone) {
+			for_each_possible_cpu(cpu) {
+				unsigned long high;
+				high = zone->present_pages / percpu_pagelist_fraction;
+				setup_pagelist_highmark(
+					per_cpu_ptr(zone->pageset, cpu), high);
+			}
+		}
+	}
+	else {
+		for_each_populated_zone(zone) {
+			for_each_possible_cpu(cpu) {
+				struct per_cpu_pageset *p = per_cpu_ptr(zone->pageset, cpu);
+				unsigned long batch = zone_batchsize(zone);
+				struct per_cpu_pages *pcp;
+				pcp = &p->pcp;
+				pcp->high = 6 * batch;
+				pcp->batch = max(1UL, 1 * batch);
+			}
 		}
 	}
 	return 0;
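
For readers following along outside the kernel tree, the rule this draft
implements can be sketched in userspace C. This is only an illustration:
pcp_fraction_valid() and pcp_high() are hypothetical helper names, and the
6 * batch default is taken from the else branch of the diff above, not from
any confirmed kernel interface.

```c
#include <stdbool.h>

/* Lowest nonzero fraction the draft accepts (the old sysctl minimum). */
#define PCP_FRACTION_MIN 8

/* Hypothetical helper: 0 means "use defaults"; otherwise require >= 8. */
static bool pcp_fraction_valid(int fraction)
{
	return fraction == 0 || fraction >= PCP_FRACTION_MIN;
}

/*
 * Hypothetical helper: per-CPU high mark for one zone.
 * fraction == 0 falls back to the batch-based default (6 * batch),
 * so fraction is never used as a divisor when it is zero.
 * Callers are expected to check pcp_fraction_valid() first.
 */
static unsigned long pcp_high(unsigned long present_pages, int fraction,
			      unsigned long batch)
{
	if (fraction == 0)
		return 6 * batch;
	return present_pages / (unsigned long)fraction;
}
```

Under these assumptions, a zone of 80000 pages with fraction 8 gives a high
mark of 10000 pages per CPU, while fraction 0 with a batch of 31 gives 186.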
--
Kind regards,
Minchan Kim
* Re: [PATCH] mm: raise MemFree by reverting percpu_pagelist_fraction to 0
2012-05-11 8:38 ` Minchan Kim
2012-05-11 9:01 ` Minchan Kim
@ 2012-05-11 14:10 ` Hugh Dickins
2012-05-14 1:51 ` Minchan Kim
1 sibling, 1 reply; 11+ messages in thread
From: Hugh Dickins @ 2012-05-11 14:10 UTC (permalink / raw)
To: Minchan Kim
Cc: Sasha Levin, Linus Torvalds, Andrew Morton, linux-mm,
linux-kernel
On Fri, 11 May 2012, Minchan Kim wrote:
> On 05/11/2012 05:30 PM, Sasha Levin wrote:
>
> >> Commit 93278814d359 "mm: fix division by 0 in percpu_pagelist_fraction()"
> >> mistakenly initialized percpu_pagelist_fraction to the sysctl's minimum 8,
> >> which leaves 1/8th of memory on percpu lists (on each cpu??); but most of
> >> us expect it to be left unset at 0 (and it's not then used as a divisor).
> >
> > I'm a bit confused about this, does it mean that once you set
> > percpu_pagelist_fraction to a value above the minimum, you can no
> > longer set it back to being 0?
>
>
> Unfortunately, Yes. :(
> It's rather awkward and need fix.
It's inelegant, but does that actually need a fix? Has anybody asked
for that option in the six years of percpu_pagelist_fraction?
Does setting percpu_pagelist_fraction to some large number perhaps
approximate to the default behaviour of percpu_pagelist_fraction 0?
I don't care very much either way - just don't want this discussion
to divert from applying last night's fix to the default behaviour
that most people expect.
Hugh
* Re: [PATCH] mm: raise MemFree by reverting percpu_pagelist_fraction to 0
2012-05-11 8:00 [PATCH] mm: raise MemFree by reverting percpu_pagelist_fraction to 0 Hugh Dickins
2012-05-11 8:30 ` Sasha Levin
@ 2012-05-11 16:26 ` Linus Torvalds
1 sibling, 0 replies; 11+ messages in thread
From: Linus Torvalds @ 2012-05-11 16:26 UTC (permalink / raw)
To: Hugh Dickins
Cc: Andrew Morton, Sasha Levin, Minchan Kim, linux-mm, linux-kernel
On Fri, May 11, 2012 at 1:00 AM, Hugh Dickins <hughd@google.com> wrote:
>
> Commit 93278814d359 "mm: fix division by 0 in percpu_pagelist_fraction()"
> mistakenly initialized percpu_pagelist_fraction to the sysctl's minimum 8,
> which leaves 1/8th of memory on percpu lists (on each cpu??); but most of
> us expect it to be left unset at 0 (and it's not then used as a divisor).
Applied.
Linus
* Re: [PATCH] mm: raise MemFree by reverting percpu_pagelist_fraction to 0
2012-05-11 9:01 ` Minchan Kim
@ 2012-05-11 16:27 ` Linus Torvalds
2012-05-11 16:35 ` Sasha Levin
0 siblings, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2012-05-11 16:27 UTC (permalink / raw)
To: Minchan Kim
Cc: Sasha Levin, Hugh Dickins, Andrew Morton, linux-mm, linux-kernel
On Fri, May 11, 2012 at 2:01 AM, Minchan Kim <minchan@kernel.org> wrote:
>
> I didn't have a time so made quick patch to show just concept.
> Not tested and Not consider carefully.
> If anyone doesn't oppose, I will send formal patch which will have more beauty code.
What's so magical about that '8' *anyway*? Why do we have that minimum at all?
At the very least, the 8-vs-0 thing needs to be explained.
Linus
* Re: [PATCH] mm: raise MemFree by reverting percpu_pagelist_fraction to 0
2012-05-11 16:27 ` Linus Torvalds
@ 2012-05-11 16:35 ` Sasha Levin
2012-05-11 16:54 ` Linus Torvalds
2012-05-14 8:51 ` Cong Wang
0 siblings, 2 replies; 11+ messages in thread
From: Sasha Levin @ 2012-05-11 16:35 UTC (permalink / raw)
To: Linus Torvalds
Cc: Minchan Kim, Hugh Dickins, Andrew Morton, linux-mm, linux-kernel
On Fri, May 11, 2012 at 6:27 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Fri, May 11, 2012 at 2:01 AM, Minchan Kim <minchan@kernel.org> wrote:
>>
>> I didn't have a time so made quick patch to show just concept.
>> Not tested and Not consider carefully.
>> If anyone doesn't oppose, I will send formal patch which will have more beauty code.
>
> What's so magical about that '8' *anyway*? Why do we have that minimum at all?
>
> At the very least, the 8-vs-0 thing needs to be explained.
The '0' acts as an "off" switch.
Once it's on, we reserve 1/x of the pages for the pagelists. I'm not
sure why 8 was selected in the first place, but I guess it made sense
that you don't want to reserve 15%+ of your memory for the pagelists.
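
As a rough check of the "1/x" description against the numbers in the first
message, the sketch below converts a memory size to pages and divides by the
fraction. It assumes 4 kB pages and treats all of MemTotal as a single zone,
which the kernel does not do (the real calculation is per zone, from
zone->present_pages); pcp_high_kb() is a hypothetical name.

```c
/* Assumed page size in kB; real systems may differ. */
#define PAGE_KB 4UL

/*
 * Hypothetical helper: upper limit, in kB, of one CPU's page list
 * if a single zone covered total_kb and the fraction were nonzero.
 */
static unsigned long pcp_high_kb(unsigned long total_kb, unsigned long fraction)
{
	unsigned long pages = total_kb / PAGE_KB;

	return (pages / fraction) * PAGE_KB;
}
```

With the 8061476 kB MemTotal reported earlier and a fraction of 8, this gives
1007684 kB (1/8 is 12.5%), which is close to the roughly 1 GB by which MemFree
dropped in Hugh's test.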
* Re: [PATCH] mm: raise MemFree by reverting percpu_pagelist_fraction to 0
2012-05-11 16:35 ` Sasha Levin
@ 2012-05-11 16:54 ` Linus Torvalds
2012-05-14 8:51 ` Cong Wang
1 sibling, 0 replies; 11+ messages in thread
From: Linus Torvalds @ 2012-05-11 16:54 UTC (permalink / raw)
To: Sasha Levin
Cc: Minchan Kim, Hugh Dickins, Andrew Morton, linux-mm, linux-kernel
On Fri, May 11, 2012 at 9:35 AM, Sasha Levin <levinsasha928@gmail.com> wrote:
>
> Once it's on, we reserve 1/x of the pages for the pagelists. I'm not
> sure why 8 was selected in the first place, but I guess it made sense
> that you don't want to reserve 15%+ of your memory for the pagelists.
Why not just accept any number, but turn small numbers into the minimum?
And if it's a per-cpu, then the minimum had better depend on number of
CPU's anyway. 15% of memory on a single-cpu already sounds insanely
high, but if you have several cpu's, it's going to be just totally
crazy.
So a minimum of 8 already sounds broken. Exposing that minimum in a
way that makes it impossible to reset it sounds just insane.
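
One way to read the two suggestions above is sketched below: keep 0 as "off",
accept any positive value, and raise small values to a floor that grows with
the CPU count. The floor of 8 * ncpus and the name pcp_fraction_clamp() are
assumptions made for illustration; nothing in this thread settles on a
particular formula.

```c
/*
 * Hypothetical helper: 0 stays 0 (defaults), negative input is also
 * treated as 0, and small positive values are raised to 8 * ncpus so
 * that all CPUs together are limited to about 1/8 of a zone.
 */
static int pcp_fraction_clamp(int requested, int ncpus)
{
	int min_fraction;

	if (requested <= 0)
		return 0;
	if (ncpus < 1)
		ncpus = 1;
	min_fraction = 8 * ncpus;
	return requested < min_fraction ? min_fraction : requested;
}
```

With this sketch, writing 3 on a single-CPU machine yields 8, writing 8 on a
4-CPU machine yields 32, and writing 0 always turns the knob back off.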
Linus
* Re: [PATCH] mm: raise MemFree by reverting percpu_pagelist_fraction to 0
2012-05-11 14:10 ` Hugh Dickins
@ 2012-05-14 1:51 ` Minchan Kim
0 siblings, 0 replies; 11+ messages in thread
From: Minchan Kim @ 2012-05-14 1:51 UTC (permalink / raw)
To: Hugh Dickins
Cc: Sasha Levin, Linus Torvalds, Andrew Morton, linux-mm,
linux-kernel
Hi Hugh,
On 05/11/2012 11:10 PM, Hugh Dickins wrote:
> On Fri, 11 May 2012, Minchan Kim wrote:
>> On 05/11/2012 05:30 PM, Sasha Levin wrote:
>>
>>>> Commit 93278814d359 "mm: fix division by 0 in percpu_pagelist_fraction()"
>>>> mistakenly initialized percpu_pagelist_fraction to the sysctl's minimum 8,
>>>> which leaves 1/8th of memory on percpu lists (on each cpu??); but most of
>>>> us expect it to be left unset at 0 (and it's not then used as a divisor).
>>>
>>> I'm a bit confused about this, does it mean that once you set
>>> percpu_pagelist_fraction to a value above the minimum, you can no
>>> longer set it back to being 0?
>>
>>
>> Unfortunately, Yes. :(
>> It's rather awkward and need fix.
>
> It's inelegant, but does that actually need a fix? Has anybody asked
> for that option in the six years of percpu_pagelist_fraction?
I haven't heard of anyone asking for it, but it is very strange that we can't
reset the value to 0 once we have set it to 8 or above. Someone may raise the
value in /proc/sys/vm/percpu_pagelist_fraction to test it, see what the knob
does, and then want to reset it to the default value of 0. But he can't.
It's very strange. :(
>
> Does setting percpu_pagelist_fraction to some large number perhaps
> approximate to the default behaviour of percpu_pagelist_fraction 0?
Yes. But it's not intuitive.
>
> I don't care very much either way - just don't want this discussion
> to divert from applying last night's fix to the default behaviour
> that most people expect.
Of course.
It came entirely from my careless review.
Actually, I didn't notice that the default value was changed to 8 when I reviewed the patch.
I just focused on proc_dointvec_minmax's error return value.
Shame on me. :(
Thanks for spotting it.
>
> Hugh
--
Kind regards,
Minchan Kim
* Re: [PATCH] mm: raise MemFree by reverting percpu_pagelist_fraction to 0
2012-05-11 16:35 ` Sasha Levin
2012-05-11 16:54 ` Linus Torvalds
@ 2012-05-14 8:51 ` Cong Wang
1 sibling, 0 replies; 11+ messages in thread
From: Cong Wang @ 2012-05-14 8:51 UTC (permalink / raw)
To: Sasha Levin
Cc: Linus Torvalds, Minchan Kim, Hugh Dickins, Andrew Morton,
linux-mm, linux-kernel
On 05/12/2012 12:35 AM, Sasha Levin wrote:
> On Fri, May 11, 2012 at 6:27 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>> On Fri, May 11, 2012 at 2:01 AM, Minchan Kim<minchan@kernel.org> wrote:
>>>
>>> I didn't have a time so made quick patch to show just concept.
>>> Not tested and Not consider carefully.
>>> If anyone doesn't oppose, I will send formal patch which will have more beauty code.
>>
>> What's so magical about that '8' *anyway*? Why do we have that minimum at all?
>>
>> At the very least, the 8-vs-0 thing needs to be explained.
>
> The '0' acts as an "off" switch.
>
> Once it's on, we reserve 1/x of the pages for the pagelists. I'm not
> sure why 8 was selected in the first place, but I guess it made sense
> that you don't want to reserve 15%+ of your memory for the pagelists.
1/x is not user-friendly; other vm sysctls use a percentage (x%), for
example, overcommit_ratio.
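
To make the comparison concrete, a 1/x setting maps to a percentage as 100/x.
The helper below is a hypothetical illustration, not an existing sysctl or
kernel function; it returns tenths of a percent to avoid floating point.

```c
/*
 * Hypothetical helper: share of a zone that a 1/fraction setting
 * corresponds to, in tenths of a percent (125 means 12.5%).
 * Returns 0 when the knob is off (fraction <= 0).
 */
static unsigned int pcp_fraction_to_tenths_pct(int fraction)
{
	if (fraction <= 0)
		return 0;
	return 1000u / (unsigned int)fraction;
}
```

So the minimum of 8 corresponds to 12.5%, and a value of 100 corresponds to 1%.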