* [RFC PATCH v2 2/2] mm: tuning hardcoded reserved memory
@ 2013-02-27 21:09 Andrew Shewmaker
2013-02-28 22:14 ` Andrew Morton
2013-03-01 0:16 ` Simon Jeons
0 siblings, 2 replies; 4+ messages in thread
From: Andrew Shewmaker @ 2013-02-27 21:09 UTC
To: Andrew Morton; +Cc: linux-mm, linux-kernel
Add a rootuser_reserve_pages knob to allow admins of large memory
systems running with overcommit disabled to change the hardcoded
memory reserve to something other than 3%.
Signed-off-by: Andrew Shewmaker <agshew@gmail.com>
---
Patch based off of mmotm git tree as of February 27th.
I set rootuser_reserve_pages to a default of 1000, and I suppose
I should have initialized it similarly to the way min_free_kbytes is,
scaling it with the size of the box. However, I wanted to get a
simple version of this patch out for feedback to see if it has any
chance of acceptance or if I need to take an entirely different
approach.
Any feedback will be appreciated!
 Documentation/sysctl/vm.txt |  9 +++++++++
 include/linux/mm.h          |  2 ++
 kernel/sysctl.c             |  8 ++++++++
 mm/mmap.c                   | 30 +++++++++++++++++++++++-------
 4 files changed, 42 insertions(+), 7 deletions(-)
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 078701f..3a71de9 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -51,6 +51,7 @@ Currently, these files are in /proc/sys/vm:
 - page-cluster
 - panic_on_oom
 - percpu_pagelist_fraction
+- rootuser_reserve_pages
 - stat_interval
 - swappiness
 - vfs_cache_pressure
@@ -628,6 +629,14 @@ the high water marks for each per cpu page list.

 ==============================================================

+rootuser_reserve_pages
+
+The number of free pages left in the system that should be reserved for users
+with the capability CAP_SYS_ADMIN. The default value is 1000 pages. Changing
+this takes effect whenever an application requests memory.
+
+==============================================================
+
 stat_interval

 The time interval between which vm statistics are updated. The default
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 66e2f7c..af7b39f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1677,6 +1677,8 @@ int in_gate_area_no_mm(unsigned long addr);

 int drop_caches_sysctl_handler(struct ctl_table *, int,
 					void __user *, size_t *, loff_t *);
+int rootuser_reserve_pages_sysctl_handler(struct ctl_table *, int,
+					void __user *, size_t *, loff_t *);
 unsigned long shrink_slab(struct shrink_control *shrink,
 			  unsigned long nr_pages_scanned,
 			  unsigned long lru_pages);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index c88878d..cd1987e 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -96,6 +96,7 @@
 /* External variables not in a header file. */
 extern int sysctl_overcommit_memory;
 extern int sysctl_overcommit_ratio;
+extern int sysctl_rootuser_reserve_pages;
 extern int max_threads;
 extern int suid_dumpable;
 #ifdef CONFIG_COREDUMP
@@ -1413,6 +1414,13 @@ static struct ctl_table vm_table[] = {
 		.extra2		= &one,
 	},
 #endif
+	{
+		.procname	= "rootuser_reserve_pages",
+		.data		= &sysctl_rootuser_reserve_pages,
+		.maxlen		= sizeof(sysctl_rootuser_reserve_pages),
+		.mode		= 0644,
+		.proc_handler	= rootuser_reserve_pages_sysctl_handler,
+	},
 	{ }
 };

diff --git a/mm/mmap.c b/mm/mmap.c
index d1e4124..b58af97 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -32,6 +32,7 @@
 #include <linux/khugepaged.h>
 #include <linux/uprobes.h>
 #include <linux/rbtree_augmented.h>
+#include <linux/sysctl.h>

 #include <asm/uaccess.h>
 #include <asm/cacheflush.h>
@@ -83,6 +84,7 @@ EXPORT_SYMBOL(vm_get_page_prot);
 int sysctl_overcommit_memory __read_mostly = OVERCOMMIT_GUESS; /* heuristic overcommit */
 int sysctl_overcommit_ratio __read_mostly = 50; /* default is 50% */
 int sysctl_max_map_count __read_mostly = DEFAULT_MAX_MAP_COUNT;
+int sysctl_rootuser_reserve_pages __read_mostly = 1000;
 /*
  * Make sure vm_committed_as in one cacheline and not cacheline shared with
  * other variables. It can be updated by several CPUs frequently.
@@ -165,7 +167,7 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
 		 * Leave the last 3% for root
 		 */
 		if (!cap_sys_admin)
-			free -= free / 32;
+			free -= sysctl_rootuser_reserve_pages;

 		if (free > pages)
 			return 0;
@@ -179,9 +181,9 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
 	 * Leave the last 3% for root
 	 */
 	if (!cap_sys_admin)
-		allowed -= allowed / 32;
+		allowed -= sysctl_rootuser_reserve_pages;
 	allowed += total_swap_pages;

 	if (percpu_counter_read_positive(&vm_committed_as) < allowed)
 		return 0;
 error:
@@ -3052,3 +3049,22 @@ void __init mmap_init(void)
 	ret = percpu_counter_init(&vm_committed_as, 0);
 	VM_BUG_ON(ret);
 }
+
+/*
+ * rootuser_reserve_pages_sysctl_handler - just a wrapper around proc_dointvec() so
+ * that we can cap the number of pages to the current number of free pages.
+ */
+int rootuser_reserve_pages_sysctl_handler(ctl_table *table, int write,
+	void __user *buffer, size_t *length, loff_t *ppos)
+{
+	unsigned long free;
+
+	proc_dointvec(table, write, buffer, length, ppos);
+
+	if (write) {
+		free = global_page_state(NR_FREE_PAGES);
+		if (sysctl_rootuser_reserve_pages > free)
+			sysctl_rootuser_reserve_pages = free;
+	}
+	return 0;
+}
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [RFC PATCH v2 2/2] mm: tuning hardcoded reserved memory
2013-02-28 22:14 ` Andrew Morton
@ 2013-02-28 2:19 ` Andrew Shewmaker
0 siblings, 0 replies; 4+ messages in thread
From: Andrew Shewmaker @ 2013-02-28 2:19 UTC
To: Andrew Morton; +Cc: linux-mm, linux-kernel
On Thu, Feb 28, 2013 at 02:14:41PM -0800, Andrew Morton wrote:
> On Wed, 27 Feb 2013 16:09:25 -0500
> Andrew Shewmaker <agshew@gmail.com> wrote:
>
> > Add a rootuser_reserve_pages knob to allow admins of large memory
> > systems running with overcommit disabled to change the hardcoded
> > memory reserve to something other than 3%.
> >
> > Signed-off-by: Andrew Shewmaker <agshew@gmail.com>
> >
> > ---
> >
> > Patch based off of mmotm git tree as of February 27th.
> >
> > I set rootuser_reserve_pages to a default of 1000, and I suppose
> > I should have initialized it similarly to the way min_free_kbytes is,
> > scaling it with the size of the box. However, I wanted to get a
> > simple version of this patch out for feedback to see if it has any
> > chance of acceptance or if I need to take an entirely different
> > approach.
> >
> > Any feedback will be appreciated!
>
> Seems reasonable.
>
> Yes, we should scale the initial value according to the machine size in
> some fashion.
>
> btw, both these patches had the same title. Please avoid this.
> Documentation/SubmittingPatches section 15 has all the details.
Sorry about that. I'll resend correctly formatted patch submissions
with a scaled initial value for rootuser_reserve_pages.
Thanks for the feedback!
* Re: [RFC PATCH v2 2/2] mm: tuning hardcoded reserved memory
2013-02-27 21:09 [RFC PATCH v2 2/2] mm: tuning hardcoded reserved memory Andrew Shewmaker
@ 2013-02-28 22:14 ` Andrew Morton
2013-02-28 2:19 ` Andrew Shewmaker
2013-03-01 0:16 ` Simon Jeons
1 sibling, 1 reply; 4+ messages in thread
From: Andrew Morton @ 2013-02-28 22:14 UTC
To: Andrew Shewmaker; +Cc: linux-mm, linux-kernel
On Wed, 27 Feb 2013 16:09:25 -0500
Andrew Shewmaker <agshew@gmail.com> wrote:
> Add a rootuser_reserve_pages knob to allow admins of large memory
> systems running with overcommit disabled to change the hardcoded
> memory reserve to something other than 3%.
>
> Signed-off-by: Andrew Shewmaker <agshew@gmail.com>
>
> ---
>
> Patch based off of mmotm git tree as of February 27th.
>
> I set rootuser_reserve_pages to a default of 1000, and I suppose
> I should have initialized it similarly to the way min_free_kbytes is,
> scaling it with the size of the box. However, I wanted to get a
> simple version of this patch out for feedback to see if it has any
> chance of acceptance or if I need to take an entirely different
> approach.
>
> Any feedback will be appreciated!
Seems reasonable.
Yes, we should scale the initial value according to the machine size in
some fashion.
btw, both these patches had the same title. Please avoid this.
Documentation/SubmittingPatches section 15 has all the details.
* Re: [RFC PATCH v2 2/2] mm: tuning hardcoded reserved memory
2013-02-27 21:09 [RFC PATCH v2 2/2] mm: tuning hardcoded reserved memory Andrew Shewmaker
2013-02-28 22:14 ` Andrew Morton
@ 2013-03-01 0:16 ` Simon Jeons
1 sibling, 0 replies; 4+ messages in thread
From: Simon Jeons @ 2013-03-01 0:16 UTC
To: Andrew Shewmaker; +Cc: Andrew Morton, linux-mm, linux-kernel
On 02/28/2013 05:09 AM, Andrew Shewmaker wrote:
> Add a rootuser_reserve_pages knob to allow admins of large memory
> systems running with overcommit disabled to change the hardcoded
> memory reserve to something other than 3%.
>
> Signed-off-by: Andrew Shewmaker <agshew@gmail.com>
>
> ---
>
> Patch based off of mmotm git tree as of February 27th.
>
> I set rootuser_reserve_pages to a default of 1000, and I suppose
> I should have initialized it similarly to the way min_free_kbytes is,
> scaling it with the size of the box. However, I wanted to get a
> simple version of this patch out for feedback to see if it has any
> chance of acceptance or if I need to take an entirely different
> approach.
>
> Any feedback will be appreciated!
>
> Documentation/sysctl/vm.txt | 9 +++++++++
> include/linux/mm.h | 2 ++
> kernel/sysctl.c | 8 ++++++++
> mm/mmap.c | 30 +++++++++++++++++++++++-------
> 4 files changed, 42 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
> index 078701f..3a71de9 100644
> --- a/Documentation/sysctl/vm.txt
> +++ b/Documentation/sysctl/vm.txt
> @@ -51,6 +51,7 @@ Currently, these files are in /proc/sys/vm:
> - page-cluster
> - panic_on_oom
> - percpu_pagelist_fraction
> +- rootuser_reserve_pages
> - stat_interval
> - swappiness
> - vfs_cache_pressure
> @@ -628,6 +629,14 @@ the high water marks for each per cpu page list.
>
> ==============================================================
>
> +rootuser_reserve_pages
> +
> +The number of free pages left in the system that should be reserved for users
> +with the capability cap_sys_admin. The default falue is 3% of total system
s/falue/value
> +memory. Changing this takes effect whenever an application requests memory.
> +
> +==============================================================
> +
> stat_interval
>
> The time interval between which vm statistics are updated. The default
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 66e2f7c..af7b39f 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1677,6 +1677,8 @@ int in_gate_area_no_mm(unsigned long addr);
>
> int drop_caches_sysctl_handler(struct ctl_table *, int,
> void __user *, size_t *, loff_t *);
> +int rootuser_reserve_pages_sysctl_handler(struct ctl_table *, int,
> + void __user *, size_t *, loff_t *);
> unsigned long shrink_slab(struct shrink_control *shrink,
> unsigned long nr_pages_scanned,
> unsigned long lru_pages);
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index c88878d..cd1987e 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -96,6 +96,7 @@
> /* External variables not in a header file. */
> extern int sysctl_overcommit_memory;
> extern int sysctl_overcommit_ratio;
> +extern int sysctl_rootuser_reserve_pages;
> extern int max_threads;
> extern int suid_dumpable;
> #ifdef CONFIG_COREDUMP
> @@ -1413,6 +1414,13 @@ static struct ctl_table vm_table[] = {
> .extra2 = &one,
> },
> #endif
> + {
> + .procname = "rootuser_reserve_pages",
> + .data = &sysctl_rootuser_reserve_pages,
> + .maxlen = sizeof(sysctl_rootuser_reserve_pages),
> + .mode = 0644,
> + .proc_handler = rootuser_reserve_pages_sysctl_handler,
> + },
> { }
> };
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index d1e4124..b58af97 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -32,6 +32,7 @@
> #include <linux/khugepaged.h>
> #include <linux/uprobes.h>
> #include <linux/rbtree_augmented.h>
> +#include <linux/sysctl.h>
>
> #include <asm/uaccess.h>
> #include <asm/cacheflush.h>
> @@ -83,6 +84,7 @@ EXPORT_SYMBOL(vm_get_page_prot);
> int sysctl_overcommit_memory __read_mostly = OVERCOMMIT_GUESS; /* heuristic overcommit */
> int sysctl_overcommit_ratio __read_mostly = 50; /* default is 50% */
> int sysctl_max_map_count __read_mostly = DEFAULT_MAX_MAP_COUNT;
> +int sysctl_rootuser_reserve_pages __read_mostly = 1000;
> /*
> * Make sure vm_committed_as in one cacheline and not cacheline shared with
> * other variables. It can be updated by several CPUs frequently.
> @@ -165,7 +167,7 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
> * Leave the last 3% for root
> */
> if (!cap_sys_admin)
> - free -= free / 32;
> + free -= sysctl_rootuser_reserve_pages;
>
> if (free > pages)
> return 0;
> @@ -179,9 +181,9 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
> * Leave the last 3% for root
> */
> if (!cap_sys_admin)
> - allowed -= allowed / 32;
> + allowed -= sysctl_rootuser_reserve_pages;
> allowed += total_swap_pages;
>
> if (percpu_counter_read_positive(&vm_committed_as) < allowed)
> return 0;
> error:
> @@ -3052,3 +3049,22 @@ void __init mmap_init(void)
> ret = percpu_counter_init(&vm_committed_as, 0);
> VM_BUG_ON(ret);
> }
> +
> +/*
> + * rootuser_reserve_pages_sysctl_handler - just a wrapper around proc_dointvec_minmax() so
> + * that we can cap the number of pages to the current number of free pages.
> + */
> +int rootuser_reserve_pages_sysctl_handler(ctl_table *table, int write,
> + void __user *buffer, size_t *length, loff_t *ppos)
> +{
> + unsigned long free;
> +
> + proc_dointvec(table, write, buffer, length, ppos);
> +
> + if (write) {
> + free = global_page_state(NR_FREE_PAGES);
> + if (sysctl_rootuser_reserve_pages > free)
> + sysctl_rootuser_reserve_pages = free;
> + }
> + return 0;
> +}