* [patch 1/2] mm: change dirty limit type specifiers to unsigned long
@ 2008-11-23 18:55 David Rientjes
2008-11-23 18:55 ` [patch 2/2] mm: add dirty_background_bytes and dirty_bytes sysctls David Rientjes
2008-11-24 19:16 ` [patch 1/2] mm: change dirty limit type specifiers to unsigned long Peter Zijlstra
0 siblings, 2 replies; 13+ messages in thread
From: David Rientjes @ 2008-11-23 18:55 UTC (permalink / raw)
To: Andrew Morton
Cc: Peter Zijlstra, Dave Chinner, Christoph Lameter, linux-kernel
The background dirty and dirty limits are better defined with type
specifiers of unsigned long since negative writeback thresholds are not
possible.
These values, as returned by get_dirty_limits(), are normally compared
with ZVC values to determine whether writeback shall commence or be
throttled. Such page counts cannot be negative, so declaring the page
limits as signed is unnecessary.
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: David Rientjes <rientjes@google.com>
---
include/linux/writeback.h | 4 ++--
mm/backing-dev.c | 6 +++---
mm/page-writeback.c | 22 +++++++++++-----------
3 files changed, 16 insertions(+), 16 deletions(-)
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -125,8 +125,8 @@ struct file;
int dirty_writeback_centisecs_handler(struct ctl_table *, int, struct file *,
void __user *, size_t *, loff_t *);
-void get_dirty_limits(long *pbackground, long *pdirty, long *pbdi_dirty,
- struct backing_dev_info *bdi);
+void get_dirty_limits(unsigned long *pbackground, unsigned long *pdirty,
+ unsigned long *pbdi_dirty, struct backing_dev_info *bdi);
void page_writeback_init(void);
void balance_dirty_pages_ratelimited_nr(struct address_space *mapping,
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -24,9 +24,9 @@ static void bdi_debug_init(void)
static int bdi_debug_stats_show(struct seq_file *m, void *v)
{
struct backing_dev_info *bdi = m->private;
- long background_thresh;
- long dirty_thresh;
- long bdi_thresh;
+ unsigned long background_thresh;
+ unsigned long dirty_thresh;
+ unsigned long bdi_thresh;
get_dirty_limits(&background_thresh, &dirty_thresh, &bdi_thresh, bdi);
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -362,13 +362,13 @@ unsigned long determine_dirtyable_memory(void)
}
void
-get_dirty_limits(long *pbackground, long *pdirty, long *pbdi_dirty,
- struct backing_dev_info *bdi)
+get_dirty_limits(unsigned long *pbackground, unsigned long *pdirty,
+ unsigned long *pbdi_dirty, struct backing_dev_info *bdi)
{
int background_ratio; /* Percentages */
int dirty_ratio;
- long background;
- long dirty;
+ unsigned long background;
+ unsigned long dirty;
unsigned long available_memory = determine_dirtyable_memory();
struct task_struct *tsk;
@@ -423,9 +423,9 @@ static void balance_dirty_pages(struct address_space *mapping)
{
long nr_reclaimable, bdi_nr_reclaimable;
long nr_writeback, bdi_nr_writeback;
- long background_thresh;
- long dirty_thresh;
- long bdi_thresh;
+ unsigned long background_thresh;
+ unsigned long dirty_thresh;
+ unsigned long bdi_thresh;
unsigned long pages_written = 0;
unsigned long write_chunk = sync_writeback_pages();
@@ -580,8 +580,8 @@ EXPORT_SYMBOL(balance_dirty_pages_ratelimited_nr);
void throttle_vm_writeout(gfp_t gfp_mask)
{
- long background_thresh;
- long dirty_thresh;
+ unsigned long background_thresh;
+ unsigned long dirty_thresh;
for ( ; ; ) {
get_dirty_limits(&background_thresh, &dirty_thresh, NULL, NULL);
@@ -624,8 +624,8 @@ static void background_writeout(unsigned long _min_pages)
};
for ( ; ; ) {
- long background_thresh;
- long dirty_thresh;
+ unsigned long background_thresh;
+ unsigned long dirty_thresh;
get_dirty_limits(&background_thresh, &dirty_thresh, NULL, NULL);
if (global_page_state(NR_FILE_DIRTY) +
^ permalink raw reply [flat|nested] 13+ messages in thread
* [patch 2/2] mm: add dirty_background_bytes and dirty_bytes sysctls
2008-11-23 18:55 [patch 1/2] mm: change dirty limit type specifiers to unsigned long David Rientjes
@ 2008-11-23 18:55 ` David Rientjes
2008-11-24 19:15 ` Peter Zijlstra
` (2 more replies)
2008-11-24 19:16 ` [patch 1/2] mm: change dirty limit type specifiers to unsigned long Peter Zijlstra
1 sibling, 3 replies; 13+ messages in thread
From: David Rientjes @ 2008-11-23 18:55 UTC (permalink / raw)
To: Andrew Morton
Cc: Peter Zijlstra, Dave Chinner, Christoph Lameter, linux-kernel
This change introduces two new sysctls to /proc/sys/vm:
dirty_background_bytes and dirty_bytes.
dirty_background_bytes is the counterpart to dirty_background_ratio and
dirty_bytes is the counterpart to dirty_ratio.
With growing memory capacities of individual machines, it's no longer
sufficient to specify dirty thresholds as a percentage of the amount of
dirtyable memory over the entire system.
dirty_background_bytes and dirty_bytes specify quantities of memory, in
bytes, that represent the dirty limits for the entire system. If either
of these values is set, its value represents the amount of dirty memory
that is needed to commence either background or direct writeback.
When a `bytes' or `ratio' file is written, its counterpart becomes a
function of the written value. For example, if dirty_bytes is written to
be 8096, 8K of memory is required to commence direct writeback.
dirty_ratio is then functionally equivalent to 8K / the amount of
dirtyable memory:
dirtyable_memory = free pages + mapped pages + file cache
dirty_background_bytes = dirty_background_ratio * dirtyable_memory
-or-
dirty_background_ratio = dirty_background_bytes / dirtyable_memory
AND
dirty_bytes = dirty_ratio * dirtyable_memory
-or-
dirty_ratio = dirty_bytes / dirtyable_memory
Only one of dirty_background_bytes and dirty_background_ratio may be
specified at a time, and only one of dirty_bytes and dirty_ratio may be
specified. When one sysctl is written, the other appears as 0 when read.
The `bytes' files operate on a page size granularity since dirty limits
are compared with ZVC values, which are in page units.
Prior to this change, the minimum dirty_ratio was 5 as implemented by
get_dirty_limits() although /proc/sys/vm/dirty_ratio would show any user
written value between 0 and 100. This restriction is maintained, but
dirty_bytes has a lower limit of only one page.
Also prior to this change, the dirty_background_ratio could not equal or
exceed dirty_ratio. This restriction is maintained in addition to
restricting dirty_background_bytes. If either background threshold equals
or exceeds that of the dirty threshold, it is implicitly set to half the
dirty threshold.
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: David Rientjes <rientjes@google.com>
---
Documentation/filesystems/proc.txt | 26 +++++++++-
Documentation/sysctl/vm.txt | 3 +-
include/linux/writeback.h | 11 ++++
kernel/sysctl.c | 22 +++++++-
mm/page-writeback.c | 102 ++++++++++++++++++++++++++++++------
5 files changed, 145 insertions(+), 19 deletions(-)
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -1381,6 +1381,15 @@ swapcache reclaim. Decreasing vfs_cache_pressure causes the kernel to prefer
to retain dentry and inode caches. Increasing vfs_cache_pressure beyond 100
causes the kernel to prefer to reclaim dentries and inodes.
+dirty_background_bytes
+----------------------
+
+Contains the amount of dirty memory at which the pdflush background writeback
+daemon will start writeback.
+
+If dirty_background_bytes is written, dirty_background_ratio becomes a function
+of its value (dirty_background_bytes / the amount of dirtyable system memory).
+
dirty_background_ratio
----------------------
@@ -1389,14 +1398,29 @@ pages + file cache, not including locked pages and HugePages), the number of
pages at which the pdflush background writeback daemon will start writing out
dirty data.
+If dirty_background_ratio is written, dirty_background_bytes becomes a function
+of its value (dirty_background_ratio * the amount of dirtyable system memory).
+
+dirty_bytes
+-----------
+
+Contains the amount of dirty memory at which a process generating disk writes
+will itself start writeback.
+
+If dirty_bytes is written, dirty_ratio becomes a function of its value
+(dirty_bytes / the amount of dirtyable system memory).
+
dirty_ratio
------------------
+-----------
Contains, as a percentage of the dirtyable system memory (free pages + mapped
pages + file cache, not including locked pages and HugePages), the number of
pages at which a process which is generating disk writes will itself start
writing out dirty data.
+If dirty_ratio is written, dirty_bytes becomes a function of its value
+(dirty_ratio * the amount of dirtyable system memory).
+
dirty_writeback_centisecs
-------------------------
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -41,7 +41,8 @@ Currently, these files are in /proc/sys/vm:
==============================================================
-dirty_ratio, dirty_background_ratio, dirty_expire_centisecs,
+dirty_bytes, dirty_ratio, dirty_background_bytes,
+dirty_background_ratio, dirty_expire_centisecs,
dirty_writeback_centisecs, highmem_is_dirtyable,
vfs_cache_pressure, laptop_mode, block_dump, swap_token_timeout,
drop-caches, hugepages_treat_as_movable:
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -107,7 +107,9 @@ void throttle_vm_writeout(gfp_t gfp_mask);
/* These are exported to sysctl. */
extern int dirty_background_ratio;
+extern unsigned long dirty_background_bytes;
extern int vm_dirty_ratio;
+extern unsigned long vm_dirty_bytes;
extern int dirty_writeback_interval;
extern int dirty_expire_interval;
extern int vm_highmem_is_dirtyable;
@@ -116,9 +118,18 @@ extern int laptop_mode;
extern unsigned long determine_dirtyable_memory(void);
+extern int dirty_background_ratio_handler(struct ctl_table *table, int write,
+ struct file *filp, void __user *buffer, size_t *lenp,
+ loff_t *ppos);
+extern int dirty_background_bytes_handler(struct ctl_table *table, int write,
+ struct file *filp, void __user *buffer, size_t *lenp,
+ loff_t *ppos);
extern int dirty_ratio_handler(struct ctl_table *table, int write,
struct file *filp, void __user *buffer, size_t *lenp,
loff_t *ppos);
+extern int dirty_bytes_handler(struct ctl_table *table, int write,
+ struct file *filp, void __user *buffer, size_t *lenp,
+ loff_t *ppos);
struct ctl_table;
struct file;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -915,12 +915,22 @@ static struct ctl_table vm_table[] = {
.data = &dirty_background_ratio,
.maxlen = sizeof(dirty_background_ratio),
.mode = 0644,
- .proc_handler = &proc_dointvec_minmax,
+ .proc_handler = &dirty_background_ratio_handler,
.strategy = &sysctl_intvec,
.extra1 = &zero,
.extra2 = &one_hundred,
},
{
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "dirty_background_bytes",
+ .data = &dirty_background_bytes,
+ .maxlen = sizeof(dirty_background_bytes),
+ .mode = 0644,
+ .proc_handler = &dirty_background_bytes_handler,
+ .strategy = &sysctl_intvec,
+ .extra1 = &zero,
+ },
+ {
.ctl_name = VM_DIRTY_RATIO,
.procname = "dirty_ratio",
.data = &vm_dirty_ratio,
@@ -932,6 +942,16 @@ static struct ctl_table vm_table[] = {
.extra2 = &one_hundred,
},
{
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "dirty_bytes",
+ .data = &vm_dirty_bytes,
+ .maxlen = sizeof(vm_dirty_bytes),
+ .mode = 0644,
+ .proc_handler = &dirty_bytes_handler,
+ .strategy = &sysctl_intvec,
+ .extra1 = &zero,
+ },
+ {
.procname = "dirty_writeback_centisecs",
.data = &dirty_writeback_interval,
.maxlen = sizeof(dirty_writeback_interval),
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -69,6 +69,12 @@ static inline long sync_writeback_pages(void)
int dirty_background_ratio = 5;
/*
+ * dirty_background_bytes starts at 0 (disabled) so that it is a function of
+ * dirty_background_ratio * the amount of dirtyable memory
+ */
+unsigned long dirty_background_bytes;
+
+/*
* free highmem will not be subtracted from the total free memory
* for calculating free ratios if vm_highmem_is_dirtyable is true
*/
@@ -80,6 +86,12 @@ int vm_highmem_is_dirtyable;
int vm_dirty_ratio = 10;
/*
+ * vm_dirty_bytes starts at 0 (disabled) so that it is a function of
+ * vm_dirty_ratio * the amount of dirtyable memory
+ */
+unsigned long vm_dirty_bytes;
+
+/*
* The interval between `kupdate'-style writebacks, in jiffies
*/
int dirty_writeback_interval = 5 * HZ;
@@ -135,23 +147,75 @@ static int calc_period_shift(void)
{
unsigned long dirty_total;
- dirty_total = (vm_dirty_ratio * determine_dirtyable_memory()) / 100;
+ if (vm_dirty_bytes)
+ dirty_total = vm_dirty_bytes / PAGE_SIZE;
+ else
+ dirty_total = (vm_dirty_ratio * determine_dirtyable_memory()) /
+ 100;
return 2 + ilog2(dirty_total - 1);
}
/*
- * update the period when the dirty ratio changes.
+ * update the period when the dirty threshold changes.
*/
+static void update_completion_period(void)
+{
+ int shift = calc_period_shift();
+ prop_change_shift(&vm_completions, shift);
+ prop_change_shift(&vm_dirties, shift);
+}
+
+int dirty_background_ratio_handler(struct ctl_table *table, int write,
+ struct file *filp, void __user *buffer, size_t *lenp,
+ loff_t *ppos)
+{
+ int ret;
+
+ ret = proc_dointvec_minmax(table, write, filp, buffer, lenp, ppos);
+ if (ret == 0 && write)
+ dirty_background_bytes = 0;
+ return ret;
+}
+
+int dirty_background_bytes_handler(struct ctl_table *table, int write,
+ struct file *filp, void __user *buffer, size_t *lenp,
+ loff_t *ppos)
+{
+ int ret;
+
+ ret = proc_doulongvec_minmax(table, write, filp, buffer, lenp, ppos);
+ if (ret == 0 && write)
+ dirty_background_ratio = 0;
+ return ret;
+}
+
int dirty_ratio_handler(struct ctl_table *table, int write,
struct file *filp, void __user *buffer, size_t *lenp,
loff_t *ppos)
{
int old_ratio = vm_dirty_ratio;
- int ret = proc_dointvec_minmax(table, write, filp, buffer, lenp, ppos);
+ int ret;
+
+ ret = proc_dointvec_minmax(table, write, filp, buffer, lenp, ppos);
if (ret == 0 && write && vm_dirty_ratio != old_ratio) {
- int shift = calc_period_shift();
- prop_change_shift(&vm_completions, shift);
- prop_change_shift(&vm_dirties, shift);
+ update_completion_period();
+ vm_dirty_bytes = 0;
+ }
+ return ret;
+}
+
+
+int dirty_bytes_handler(struct ctl_table *table, int write,
+ struct file *filp, void __user *buffer, size_t *lenp,
+ loff_t *ppos)
+{
+ int old_bytes = vm_dirty_bytes;
+ int ret;
+
+ ret = proc_doulongvec_minmax(table, write, filp, buffer, lenp, ppos);
+ if (ret == 0 && write && vm_dirty_bytes != old_bytes) {
+ update_completion_period();
+ vm_dirty_ratio = 0;
}
return ret;
}
@@ -365,23 +429,29 @@ void
get_dirty_limits(unsigned long *pbackground, unsigned long *pdirty,
unsigned long *pbdi_dirty, struct backing_dev_info *bdi)
{
- int background_ratio; /* Percentages */
- int dirty_ratio;
unsigned long background;
unsigned long dirty;
unsigned long available_memory = determine_dirtyable_memory();
struct task_struct *tsk;
- dirty_ratio = vm_dirty_ratio;
- if (dirty_ratio < 5)
- dirty_ratio = 5;
+ if (vm_dirty_bytes)
+ dirty = (vm_dirty_bytes + PAGE_SIZE) / PAGE_SIZE;
+ else {
+ int dirty_ratio;
- background_ratio = dirty_background_ratio;
- if (background_ratio >= dirty_ratio)
- background_ratio = dirty_ratio / 2;
+ dirty_ratio = vm_dirty_ratio;
+ if (dirty_ratio < 5)
+ dirty_ratio = 5;
+ dirty = (dirty_ratio * available_memory) / 100;
+ }
+
+ if (dirty_background_bytes)
+ background = (dirty_background_bytes + PAGE_SIZE) / PAGE_SIZE;
+ else
+ background = (dirty_background_ratio * available_memory) / 100;
- background = (background_ratio * available_memory) / 100;
- dirty = (dirty_ratio * available_memory) / 100;
+ if (background >= dirty)
+ background = dirty / 2;
tsk = current;
if (tsk->flags & PF_LESS_THROTTLE || rt_task(tsk)) {
background += background / 4;
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [patch 2/2] mm: add dirty_background_bytes and dirty_bytes sysctls
2008-11-23 18:55 ` [patch 2/2] mm: add dirty_background_bytes and dirty_bytes sysctls David Rientjes
@ 2008-11-24 19:15 ` Peter Zijlstra
2008-11-24 23:38 ` Andrew Morton
2009-02-01 1:22 ` [PATCH] mm: Fix dirty_bytes/dirty_background_bytes sysctls on 64bit arches Sven Wegener
2 siblings, 0 replies; 13+ messages in thread
From: Peter Zijlstra @ 2008-11-24 19:15 UTC (permalink / raw)
To: David Rientjes
Cc: Andrew Morton, Dave Chinner, Christoph Lameter, linux-kernel
On Sun, 2008-11-23 at 10:55 -0800, David Rientjes wrote:
> This change introduces two new sysctls to /proc/sys/vm:
> dirty_background_bytes and dirty_bytes.
>
> dirty_background_bytes is the counterpart to dirty_background_ratio and
> dirty_bytes is the counterpart to dirty_ratio.
>
> With growing memory capacities of individual machines, it's no longer
> sufficient to specify dirty thresholds as a percentage of the amount of
> dirtyable memory over the entire system.
>
> dirty_background_bytes and dirty_bytes specify quantities of memory, in
> bytes, that represent the dirty limits for the entire system. If either
> of these values is set, its value represents the amount of dirty memory
> that is needed to commence either background or direct writeback.
>
> When a `bytes' or `ratio' file is written, its counterpart becomes a
> function of the written value. For example, if dirty_bytes is written to
> be 8096, 8K of memory is required to commence direct writeback.
> dirty_ratio is then functionally equivalent to 8K / the amount of
> dirtyable memory:
>
> dirtyable_memory = free pages + mapped pages + file cache
>
> dirty_background_bytes = dirty_background_ratio * dirtyable_memory
> -or-
> dirty_background_ratio = dirty_background_bytes / dirtyable_memory
>
> AND
>
> dirty_bytes = dirty_ratio * dirtyable_memory
> -or-
> dirty_ratio = dirty_bytes / dirtyable_memory
>
> Only one of dirty_background_bytes and dirty_background_ratio may be
> specified at a time, and only one of dirty_bytes and dirty_ratio may be
> specified. When one sysctl is written, the other appears as 0 when read.
>
> The `bytes' files operate on a page size granularity since dirty limits
> are compared with ZVC values, which are in page units.
>
> Prior to this change, the minimum dirty_ratio was 5 as implemented by
> get_dirty_limits() although /proc/sys/vm/dirty_ratio would show any user
> written value between 0 and 100. This restriction is maintained, but
> dirty_bytes has a lower limit of only one page.
>
> Also prior to this change, the dirty_background_ratio could not equal or
> exceed dirty_ratio. This restriction is maintained in addition to
> restricting dirty_background_bytes. If either background threshold equals
> or exceeds that of the dirty threshold, it is implicitly set to half the
> dirty threshold.
>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Dave Chinner <david@fromorbit.com>
> Cc: Christoph Lameter <cl@linux-foundation.org>
> Signed-off-by: David Rientjes <rientjes@google.com>
I don't really like the dual interface, but I guess given the
restrictions this is the best one can do... So
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [patch 1/2] mm: change dirty limit type specifiers to unsigned long
2008-11-23 18:55 [patch 1/2] mm: change dirty limit type specifiers to unsigned long David Rientjes
2008-11-23 18:55 ` [patch 2/2] mm: add dirty_background_bytes and dirty_bytes sysctls David Rientjes
@ 2008-11-24 19:16 ` Peter Zijlstra
2008-11-24 19:49 ` David Rientjes
1 sibling, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2008-11-24 19:16 UTC (permalink / raw)
To: David Rientjes
Cc: Andrew Morton, Dave Chinner, Christoph Lameter, linux-kernel
On Sun, 2008-11-23 at 10:55 -0800, David Rientjes wrote:
> The background dirty and dirty limits are better defined with type
> specifiers of unsigned long since negative writeback thresholds are not
> possible.
>
> These values, as returned by get_dirty_limits(), are normally compared
> with ZVC values to determine whether writeback shall commence or be
> throttled. Such page counts cannot be negative, so declaring the page
> limits as signed is unnecessary.
>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Dave Chinner <david@fromorbit.com>
> Cc: Christoph Lameter <cl@linux-foundation.org>
> Signed-off-by: David Rientjes <rientjes@google.com>
This didn't pop up any weird and wonderfull assumptions on the
signed-ness of these variables I take it :-)
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [patch 1/2] mm: change dirty limit type specifiers to unsigned long
2008-11-24 19:16 ` [patch 1/2] mm: change dirty limit type specifiers to unsigned long Peter Zijlstra
@ 2008-11-24 19:49 ` David Rientjes
0 siblings, 0 replies; 13+ messages in thread
From: David Rientjes @ 2008-11-24 19:49 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Andrew Morton, Dave Chinner, Christoph Lameter, linux-kernel
On Mon, 24 Nov 2008, Peter Zijlstra wrote:
> This didn't pop up any weird and wonderfull assumptions on the
> signed-ness of these variables I take it :-)
>
It didn't; in fact, callers already used them as though they were
unsigned. For example, bdi_debug_stats_show():
seq_printf(m,
"BdiDirtyThresh: %8lu kB\n"
"DirtyThresh: %8lu kB\n"
"BackgroundThresh: %8lu kB\n",
K(bdi_thresh),
K(dirty_thresh),
K(background_thresh));
And get_dirty_limits() itself implicitly uses an unsigned type specifier
for the long *pbdi_dirty limit:
if (bdi) {
u64 bdi_dirty;
...
*pbdi_dirty = bdi_dirty;
}
And, given that the remainder of the get_dirty_limits() callers simply use
these for comparisons to the ZVC global_page_state() values, which are
also unsigned long, this conversion is safe.
Thanks Peter!
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [patch 2/2] mm: add dirty_background_bytes and dirty_bytes sysctls
2008-11-23 18:55 ` [patch 2/2] mm: add dirty_background_bytes and dirty_bytes sysctls David Rientjes
2008-11-24 19:15 ` Peter Zijlstra
@ 2008-11-24 23:38 ` Andrew Morton
2008-11-25 0:52 ` Randy Dunlap
2008-11-25 1:00 ` David Rientjes
2009-02-01 1:22 ` [PATCH] mm: Fix dirty_bytes/dirty_background_bytes sysctls on 64bit arches Sven Wegener
2 siblings, 2 replies; 13+ messages in thread
From: Andrew Morton @ 2008-11-24 23:38 UTC (permalink / raw)
To: David Rientjes; +Cc: peterz, david, cl, linux-kernel
On Sun, 23 Nov 2008 10:55:16 -0800 (PST)
David Rientjes <rientjes@google.com> wrote:
> This change introduces two new sysctls to /proc/sys/vm:
> dirty_background_bytes and dirty_bytes.
>
>
> ...
>
> @@ -365,23 +429,29 @@ void
> get_dirty_limits(unsigned long *pbackground, unsigned long *pdirty,
> unsigned long *pbdi_dirty, struct backing_dev_info *bdi)
> {
> - int background_ratio; /* Percentages */
> - int dirty_ratio;
> unsigned long background;
> unsigned long dirty;
> unsigned long available_memory = determine_dirtyable_memory();
> struct task_struct *tsk;
>
> - dirty_ratio = vm_dirty_ratio;
> - if (dirty_ratio < 5)
> - dirty_ratio = 5;
> + if (vm_dirty_bytes)
> + dirty = (vm_dirty_bytes + PAGE_SIZE) / PAGE_SIZE;
It would be conventional to use DIV_ROUND_UP() here.
> + else {
> + int dirty_ratio;
hm, I wonder why vm_dirty_ratio has a signed type.
> - background_ratio = dirty_background_ratio;
> - if (background_ratio >= dirty_ratio)
> - background_ratio = dirty_ratio / 2;
> + dirty_ratio = vm_dirty_ratio;
> + if (dirty_ratio < 5)
> + dirty_ratio = 5;
> + dirty = (dirty_ratio * available_memory) / 100;
> + }
> +
> + if (dirty_background_bytes)
> + background = (dirty_background_bytes + PAGE_SIZE) / PAGE_SIZE;
DIV_ROUND_UP()?
> + else
> + background = (dirty_background_ratio * available_memory) / 100;
>
> - background = (background_ratio * available_memory) / 100;
> - dirty = (dirty_ratio * available_memory) / 100;
> + if (background >= dirty)
> + background = dirty / 2;
> tsk = current;
> if (tsk->flags & PF_LESS_THROTTLE || rt_task(tsk)) {
> background += background / 4;
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [patch 2/2] mm: add dirty_background_bytes and dirty_bytes sysctls
2008-11-24 23:38 ` Andrew Morton
@ 2008-11-25 0:52 ` Randy Dunlap
2008-11-25 1:05 ` David Rientjes
2008-11-25 1:00 ` David Rientjes
1 sibling, 1 reply; 13+ messages in thread
From: Randy Dunlap @ 2008-11-25 0:52 UTC (permalink / raw)
To: Andrew Morton; +Cc: David Rientjes, peterz, david, cl, linux-kernel
Andrew Morton wrote:
> On Sun, 23 Nov 2008 10:55:16 -0800 (PST)
> David Rientjes <rientjes@google.com> wrote:
>
>> This change introduces two new sysctls to /proc/sys/vm:
>> dirty_background_bytes and dirty_bytes.
>>
>>
>> ...
>>
>> @@ -365,23 +429,29 @@ void
>> get_dirty_limits(unsigned long *pbackground, unsigned long *pdirty,
>> unsigned long *pbdi_dirty, struct backing_dev_info *bdi)
>> {
>> - int background_ratio; /* Percentages */
>> - int dirty_ratio;
>> unsigned long background;
>> unsigned long dirty;
>> unsigned long available_memory = determine_dirtyable_memory();
>> struct task_struct *tsk;
>>
>> - dirty_ratio = vm_dirty_ratio;
>> - if (dirty_ratio < 5)
>> - dirty_ratio = 5;
>> + if (vm_dirty_bytes)
>> + dirty = (vm_dirty_bytes + PAGE_SIZE) / PAGE_SIZE;
>
> It would be conventional to use DIV_ROUND_UP() here.
>
>> + else {
>> + int dirty_ratio;
>
> hm, I wonder why vm_dirty_ratio has a signed type.
>
>> - background_ratio = dirty_background_ratio;
>> - if (background_ratio >= dirty_ratio)
>> - background_ratio = dirty_ratio / 2;
>> + dirty_ratio = vm_dirty_ratio;
>> + if (dirty_ratio < 5)
>> + dirty_ratio = 5;
>> + dirty = (dirty_ratio * available_memory) / 100;
>> + }
>> +
>> + if (dirty_background_bytes)
>> + background = (dirty_background_bytes + PAGE_SIZE) / PAGE_SIZE;
>
> DIV_ROUND_UP()?
Note that DIV_ROUND_UP() would make that
background = (dirty_background_byte + PAGE_SIZE - 1) / PAGE_SIZE;
but that's usually what is needed/intended.
>> + else
>> + background = (dirty_background_ratio * available_memory) / 100;
>>
>> - background = (background_ratio * available_memory) / 100;
>> - dirty = (dirty_ratio * available_memory) / 100;
>> + if (background >= dirty)
>> + background = dirty / 2;
>> tsk = current;
>> if (tsk->flags & PF_LESS_THROTTLE || rt_task(tsk)) {
>> background += background / 4;
--
~Randy
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [patch 2/2] mm: add dirty_background_bytes and dirty_bytes sysctls
2008-11-24 23:38 ` Andrew Morton
2008-11-25 0:52 ` Randy Dunlap
@ 2008-11-25 1:00 ` David Rientjes
1 sibling, 0 replies; 13+ messages in thread
From: David Rientjes @ 2008-11-25 1:00 UTC (permalink / raw)
To: Andrew Morton; +Cc: peterz, david, cl, linux-kernel
On Mon, 24 Nov 2008, Andrew Morton wrote:
> > @@ -365,23 +429,29 @@ void
> > get_dirty_limits(unsigned long *pbackground, unsigned long *pdirty,
> > unsigned long *pbdi_dirty, struct backing_dev_info *bdi)
> > {
> > - int background_ratio; /* Percentages */
> > - int dirty_ratio;
> > unsigned long background;
> > unsigned long dirty;
> > unsigned long available_memory = determine_dirtyable_memory();
> > struct task_struct *tsk;
> >
> > - dirty_ratio = vm_dirty_ratio;
> > - if (dirty_ratio < 5)
> > - dirty_ratio = 5;
> > + if (vm_dirty_bytes)
> > + dirty = (vm_dirty_bytes + PAGE_SIZE) / PAGE_SIZE;
>
> It would be conventional to use DIV_ROUND_UP() here.
>
Ah, indeed. I forgot about those.
> > + else {
> > + int dirty_ratio;
>
> hm, I wonder why vm_dirty_ratio has a signed type.
>
vm_dirty_ratio only has an acceptable range between 0 and 100 since it
uses proc_dointvec_minmax(), so it's not really important. It could
definitely be changed, however.
There's a limitation in the sysctl interface where we don't handle
unsigned ints very well (ignore the reference to unsigned int in the
proc_dointvec comment). We lack an unsigned int handler, so all
proc_dointvec() users are implicitly signed.
For example, if a sysctl's data object were declared with a type specifier
of unsigned int and it uses proc_dointvec() as the handler, it would have
a x86_64 max of 2147483648, that of a signed integer.
> > - background_ratio = dirty_background_ratio;
> > - if (background_ratio >= dirty_ratio)
> > - background_ratio = dirty_ratio / 2;
> > + dirty_ratio = vm_dirty_ratio;
> > + if (dirty_ratio < 5)
> > + dirty_ratio = 5;
> > + dirty = (dirty_ratio * available_memory) / 100;
> > + }
> > +
> > + if (dirty_background_bytes)
> > + background = (dirty_background_bytes + PAGE_SIZE) / PAGE_SIZE;
>
> DIV_ROUND_UP()?
>
> > + else
> > + background = (dirty_background_ratio * available_memory) / 100;
> >
> > - background = (background_ratio * available_memory) / 100;
> > - dirty = (dirty_ratio * available_memory) / 100;
> > + if (background >= dirty)
> > + background = dirty / 2;
> > tsk = current;
> > if (tsk->flags & PF_LESS_THROTTLE || rt_task(tsk)) {
> > background += background / 4;
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [patch 2/2] mm: add dirty_background_bytes and dirty_bytes sysctls
2008-11-25 0:52 ` Randy Dunlap
@ 2008-11-25 1:05 ` David Rientjes
0 siblings, 0 replies; 13+ messages in thread
From: David Rientjes @ 2008-11-25 1:05 UTC (permalink / raw)
To: Randy Dunlap; +Cc: Andrew Morton, peterz, david, cl, linux-kernel
On Mon, 24 Nov 2008, Randy Dunlap wrote:
> >> - if (background_ratio >= dirty_ratio)
> >> - background_ratio = dirty_ratio / 2;
> >> + dirty_ratio = vm_dirty_ratio;
> >> + if (dirty_ratio < 5)
> >> + dirty_ratio = 5;
> >> + dirty = (dirty_ratio * available_memory) / 100;
> >> + }
> >> +
> >> + if (dirty_background_bytes)
> >> + background = (dirty_background_bytes + PAGE_SIZE) / PAGE_SIZE;
> >
> > DIV_ROUND_UP()?
>
> Note that DIV_ROUND_UP() would make that
>
> background = (dirty_background_byte + PAGE_SIZE - 1) / PAGE_SIZE;
>
> but that's usually what is needed/intended.
>
Good point. If we were to do this, then dirty_bytes and
dirty_background_bytes should use 1 instead of 0 as the minimum acceptable
value in proc_dointvec_minmax(). Otherwise, get_dirty_limits() would
return 0 pages for the dirty limit when 0 is written to either file since
integer division truncates and we'd be in a constant state of writeback :)
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH] mm: Fix dirty_bytes/dirty_background_bytes sysctls on 64bit arches
2008-11-23 18:55 ` [patch 2/2] mm: add dirty_background_bytes and dirty_bytes sysctls David Rientjes
2008-11-24 19:15 ` Peter Zijlstra
2008-11-24 23:38 ` Andrew Morton
@ 2009-02-01 1:22 ` Sven Wegener
2009-02-01 9:19 ` Peter Zijlstra
2009-02-02 0:26 ` David Rientjes
2 siblings, 2 replies; 13+ messages in thread
From: Sven Wegener @ 2009-02-01 1:22 UTC (permalink / raw)
To: David Rientjes
Cc: Andrew Morton, Peter Zijlstra, Dave Chinner, Christoph Lameter,
linux-kernel
We need to pass an unsigned long as the minimum, because it gets casted
to an unsigned long in the sysctl handler. If we pass an int, we'll
access four more bytes on 64bit arches, resulting in a random minimum
value.
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
---
kernel/sysctl.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 790f9d7..c5ef44f 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -101,6 +101,7 @@ static int two = 2;
static int zero;
static int one = 1;
+static unsigned long one_ul = 1;
static int one_hundred = 100;
/* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
@@ -974,7 +975,7 @@ static struct ctl_table vm_table[] = {
.mode = 0644,
.proc_handler = &dirty_background_bytes_handler,
.strategy = &sysctl_intvec,
- .extra1 = &one,
+ .extra1 = &one_ul,
},
{
.ctl_name = VM_DIRTY_RATIO,
@@ -995,7 +996,7 @@ static struct ctl_table vm_table[] = {
.mode = 0644,
.proc_handler = &dirty_bytes_handler,
.strategy = &sysctl_intvec,
- .extra1 = &one,
+ .extra1 = &one_ul,
},
{
.procname = "dirty_writeback_centisecs",
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH] mm: Fix dirty_bytes/dirty_background_bytes sysctls on 64bit arches
2009-02-01 1:22 ` [PATCH] mm: Fix dirty_bytes/dirty_background_bytes sysctls on 64bit arches Sven Wegener
@ 2009-02-01 9:19 ` Peter Zijlstra
2009-02-01 10:22 ` Sven Wegener
2009-02-02 0:26 ` David Rientjes
1 sibling, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2009-02-01 9:19 UTC (permalink / raw)
To: Sven Wegener
Cc: David Rientjes, Andrew Morton, Dave Chinner, Christoph Lameter,
linux-kernel
On Sun, 2009-02-01 at 02:22 +0100, Sven Wegener wrote:
> We need to pass an unsigned long as the minimum, because it gets casted
> to an unsigned long in the sysctl handler. If we pass an int, we'll
> access four more bytes on 64bit arches, resulting in a random minimum
> value.
If that's so, how can any of those other limit values still be good?
> Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
> ---
> kernel/sysctl.c | 5 +++--
> 1 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 790f9d7..c5ef44f 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -101,6 +101,7 @@ static int two = 2;
>
> static int zero;
> static int one = 1;
> +static unsigned long one_ul = 1;
> static int one_hundred = 100;
>
> /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
> @@ -974,7 +975,7 @@ static struct ctl_table vm_table[] = {
> .mode = 0644,
> .proc_handler = &dirty_background_bytes_handler,
> .strategy = &sysctl_intvec,
> - .extra1 = &one,
> + .extra1 = &one_ul,
> },
> {
> .ctl_name = VM_DIRTY_RATIO,
> @@ -995,7 +996,7 @@ static struct ctl_table vm_table[] = {
> .mode = 0644,
> .proc_handler = &dirty_bytes_handler,
> .strategy = &sysctl_intvec,
> - .extra1 = &one,
> + .extra1 = &one_ul,
> },
> {
> .procname = "dirty_writeback_centisecs",
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] mm: Fix dirty_bytes/dirty_background_bytes sysctls on 64bit arches
2009-02-01 9:19 ` Peter Zijlstra
@ 2009-02-01 10:22 ` Sven Wegener
0 siblings, 0 replies; 13+ messages in thread
From: Sven Wegener @ 2009-02-01 10:22 UTC (permalink / raw)
To: Peter Zijlstra
Cc: David Rientjes, Andrew Morton, Dave Chinner, Christoph Lameter,
linux-kernel
On Sun, 1 Feb 2009, Peter Zijlstra wrote:
> On Sun, 2009-02-01 at 02:22 +0100, Sven Wegener wrote:
> > We need to pass an unsigned long as the minimum, because it gets casted
> > to an unsigned long in the sysctl handler. If we pass an int, we'll
> > access four more bytes on 64bit arches, resulting in a random minimum
> > value.
>
> If that's so, how can any of those other limit values still be good?
The problem here is that we use proc_doulongvec_minmax(), which expects an
unsigned long. Most other sysctls use the dointvec versions and are safe,
as long as they pass an int. If they pass a char or a short, they're
subject to the same problem.
The other sysctls in sysctl.c that use proc_doulongvec_minmax() don't pass
a min/max value. The others in the kernel I checked do the same or pass a
proper unsigned long.
Sven
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] mm: Fix dirty_bytes/dirty_background_bytes sysctls on 64bit arches
2009-02-01 1:22 ` [PATCH] mm: Fix dirty_bytes/dirty_background_bytes sysctls on 64bit arches Sven Wegener
2009-02-01 9:19 ` Peter Zijlstra
@ 2009-02-02 0:26 ` David Rientjes
1 sibling, 0 replies; 13+ messages in thread
From: David Rientjes @ 2009-02-02 0:26 UTC (permalink / raw)
To: Sven Wegener, Andrew Morton
Cc: Peter Zijlstra, Dave Chinner, Christoph Lameter, linux-kernel
On Sun, 1 Feb 2009, Sven Wegener wrote:
> We need to pass an unsigned long as the minimum, because it gets casted
> to an unsigned long in the sysctl handler. If we pass an int, we'll
> access four more bytes on 64bit arches, resulting in a random minimum
> value.
>
> Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Acked-by: David Rientjes <rientjes@google.com>
You'll also need the following since vm_dirty_bytes is declared as an
unsigned long. Andrew, can you queue Sven's fix for rc4 with this folded
into it?
Signed-off-by: David Rientjes <rientjes@google.com>
---
mm/page-writeback.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -209,7 +209,7 @@ int dirty_bytes_handler(struct ctl_table *table, int write,
struct file *filp, void __user *buffer, size_t *lenp,
loff_t *ppos)
{
- int old_bytes = vm_dirty_bytes;
+ unsigned long old_bytes = vm_dirty_bytes;
int ret;
ret = proc_doulongvec_minmax(table, write, filp, buffer, lenp, ppos);
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2009-02-02 0:27 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-11-23 18:55 [patch 1/2] mm: change dirty limit type specifiers to unsigned long David Rientjes
2008-11-23 18:55 ` [patch 2/2] mm: add dirty_background_bytes and dirty_bytes sysctls David Rientjes
2008-11-24 19:15 ` Peter Zijlstra
2008-11-24 23:38 ` Andrew Morton
2008-11-25 0:52 ` Randy Dunlap
2008-11-25 1:05 ` David Rientjes
2008-11-25 1:00 ` David Rientjes
2009-02-01 1:22 ` [PATCH] mm: Fix dirty_bytes/dirty_background_bytes sysctls on 64bit arches Sven Wegener
2009-02-01 9:19 ` Peter Zijlstra
2009-02-01 10:22 ` Sven Wegener
2009-02-02 0:26 ` David Rientjes
2008-11-24 19:16 ` [patch 1/2] mm: change dirty limit type specifiers to unsigned long Peter Zijlstra
2008-11-24 19:49 ` David Rientjes
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox