public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] mm: prevent divide error for small values of vm_dirty_bytes
@ 2009-04-29  8:29 Andrea Righi
  2009-04-29  8:44 ` Peter Zijlstra
  2009-04-29  9:26 ` David Rientjes
  0 siblings, 2 replies; 9+ messages in thread
From: Andrea Righi @ 2009-04-29  8:29 UTC (permalink / raw)
  To: David Rientjes, akpm
  Cc: Peter Zijlstra, Dave Chinner, Christoph Lameter, linux-kernel,
	Andrea Righi

Avoid to set less than two pages for vm_dirty_byte: this is necessary to avoid
potential division by 0 (like the following) in get_dirty_limits().

[   49.951610] divide error: 0000 [#1] PREEMPT SMP
[   49.952195] last sysfs file: /sys/devices/pci0000:00/0000:00:01.1/host0/target0:0:0/0:0:0:0/block/sda/uevent
[   49.952195] CPU 1
[   49.952195] Modules linked in: pcspkr
[   49.952195] Pid: 3064, comm: dd Not tainted 2.6.30-rc3 #1
[   49.952195] RIP: 0010:[<ffffffff802d39a9>]  [<ffffffff802d39a9>] get_dirty_limits+0xe9/0x2c0
[   49.952195] RSP: 0018:ffff88001de03a98  EFLAGS: 00010202
[   49.952195] RAX: 00000000000000c0 RBX: ffff88001de03b80 RCX: 28f5c28f5c28f5c3
[   49.952195] RDX: 0000000000000000 RSI: 00000000000000c0 RDI: 0000000000000000
[   49.952195] RBP: ffff88001de03ae8 R08: 0000000000000000 R09: 0000000000000000
[   49.952195] R10: ffff88001ddda9a0 R11: 0000000000000001 R12: 0000000000000001
[   49.952195] R13: ffff88001fbc8218 R14: ffff88001de03b70 R15: ffff88001de03b78
[   49.952195] FS:  00007fe9a435b6f0(0000) GS:ffff8800025d9000(0000) knlGS:0000000000000000
[   49.952195] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   49.952195] CR2: 00007fe9a39ab000 CR3: 000000001de38000 CR4: 00000000000006e0
[   49.952195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   49.952195] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   49.952195] Process dd (pid: 3064, threadinfo ffff88001de02000, task ffff88001ddda250)
[   49.952195] Stack:
[   49.952195]  ffff88001fa0de00 ffff88001f2dbd70 ffff88001f9fe800 000080b900000000
[   49.952195]  00000000000000c0 ffff8800027a6100 0000000000000400 ffff88001fbc8218
[   49.952195]  0000000000000000 0000000000000600 ffff88001de03bb8 ffffffff802d3ed7
[   49.952195] Call Trace:
[   49.952195]  [<ffffffff802d3ed7>] balance_dirty_pages_ratelimited_nr+0x1d7/0x3f0
[   49.952195]  [<ffffffff80368f8e>] ? ext3_writeback_write_end+0x9e/0x120
[   49.952195]  [<ffffffff802cc7df>] generic_file_buffered_write+0x12f/0x330
[   49.952195]  [<ffffffff802cce8d>] __generic_file_aio_write_nolock+0x26d/0x460
[   49.952195]  [<ffffffff802cda32>] ? generic_file_aio_write+0x52/0xd0
[   49.952195]  [<ffffffff802cda49>] generic_file_aio_write+0x69/0xd0
[   49.952195]  [<ffffffff80365fa6>] ext3_file_write+0x26/0xc0
[   49.952195]  [<ffffffff803034d1>] do_sync_write+0xf1/0x140
[   49.952195]  [<ffffffff80290d1a>] ? get_lock_stats+0x2a/0x60
[   49.952195]  [<ffffffff80280730>] ? autoremove_wake_function+0x0/0x40
[   49.952195]  [<ffffffff8030411b>] vfs_write+0xcb/0x190
[   49.952195]  [<ffffffff803042d0>] sys_write+0x50/0x90
[   49.952195]  [<ffffffff8022ff6b>] system_call_fastpath+0x16/0x1b
[   49.952195] Code: 00 00 00 2b 05 09 1c 17 01 48 89 c6 49 0f af f4 48 c1 ee 02 48 89 f0 48 f7 e1 48 89 d6 31 d2 48 c1 ee 02 48 0f af 75 d0 48 89 f0 <48> f7 f7 41 8b 95 ac 01 00 00 48 89 c7 49 0f af d4 48 c1 ea 02
[   49.952195] RIP  [<ffffffff802d39a9>] get_dirty_limits+0xe9/0x2c0
[   49.952195]  RSP <ffff88001de03a98>
[   50.096523] ---[ end trace 008d7aa02f244d7b ]---

Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
---
 mm/page-writeback.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 30351f0..9d6c427 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -213,6 +213,12 @@ int dirty_bytes_handler(struct ctl_table *table, int write,
 	int ret;
 
 	ret = proc_doulongvec_minmax(table, write, filp, buffer, lenp, ppos);
+	/*
+	 * Do not allow to set less than two pages for vm_dirty_byte: this is
+	 * necessary to avoid division by 0 in get_dirty_limits().
+	 */
+	vm_dirty_bytes = max_t(typeof(vm_dirty_bytes),
+				vm_dirty_bytes, 2 * PAGE_SIZE);
 	if (ret == 0 && write && vm_dirty_bytes != old_bytes) {
 		update_completion_period();
 		vm_dirty_ratio = 0;
-- 
1.6.0.4


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm: prevent divide error for small values of vm_dirty_bytes
  2009-04-29  8:29 [PATCH] mm: prevent divide error for small values of vm_dirty_bytes Andrea Righi
@ 2009-04-29  8:44 ` Peter Zijlstra
  2009-04-29  9:34   ` Andrea Righi
  2009-04-29  9:26 ` David Rientjes
  1 sibling, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2009-04-29  8:44 UTC (permalink / raw)
  To: Andrea Righi
  Cc: David Rientjes, akpm, Dave Chinner, Christoph Lameter,
	linux-kernel

On Wed, 2009-04-29 at 10:29 +0200, Andrea Righi wrote:
> Avoid to set less than two pages for vm_dirty_byte: this is necessary to avoid
> potential division by 0 (like the following) in get_dirty_limits().

isn't changing the .extra1 in the sysctl table a better fix?

> Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
> ---
>  mm/page-writeback.c |    6 ++++++
>  1 files changed, 6 insertions(+), 0 deletions(-)
> 
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 30351f0..9d6c427 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -213,6 +213,12 @@ int dirty_bytes_handler(struct ctl_table *table, int write,
>  	int ret;
>  
>  	ret = proc_doulongvec_minmax(table, write, filp, buffer, lenp, ppos);
> +	/*
> +	 * Do not allow to set less than two pages for vm_dirty_byte: this is
> +	 * necessary to avoid division by 0 in get_dirty_limits().
> +	 */
> +	vm_dirty_bytes = max_t(typeof(vm_dirty_bytes),
> +				vm_dirty_bytes, 2 * PAGE_SIZE);
>  	if (ret == 0 && write && vm_dirty_bytes != old_bytes) {
>  		update_completion_period();
>  		vm_dirty_ratio = 0;


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm: prevent divide error for small values of vm_dirty_bytes
  2009-04-29  8:29 [PATCH] mm: prevent divide error for small values of vm_dirty_bytes Andrea Righi
  2009-04-29  8:44 ` Peter Zijlstra
@ 2009-04-29  9:26 ` David Rientjes
  2009-04-29  9:40   ` Andrea Righi
  1 sibling, 1 reply; 9+ messages in thread
From: David Rientjes @ 2009-04-29  9:26 UTC (permalink / raw)
  To: Andrea Righi
  Cc: akpm, Peter Zijlstra, Dave Chinner, Christoph Lameter,
	linux-kernel

On Wed, 29 Apr 2009, Andrea Righi wrote:

> Avoid to set less than two pages for vm_dirty_byte: this is necessary to avoid
> potential division by 0 (like the following) in get_dirty_limits().
> 

Where exactly is the divide by 0 in get_dirty_limits()?

Is this because we forced the background threshold to be half of the dirty 
threshold when its setting is greater?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm: prevent divide error for small values of vm_dirty_bytes
  2009-04-29  8:44 ` Peter Zijlstra
@ 2009-04-29  9:34   ` Andrea Righi
  2009-04-29 20:02     ` David Rientjes
  2009-04-29 21:46     ` Andrew Morton
  0 siblings, 2 replies; 9+ messages in thread
From: Andrea Righi @ 2009-04-29  9:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: David Rientjes, akpm, Dave Chinner, Christoph Lameter,
	linux-kernel

On Wed, Apr 29, 2009 at 10:44:36AM +0200, Peter Zijlstra wrote:
> On Wed, 2009-04-29 at 10:29 +0200, Andrea Righi wrote:
> > Avoid to set less than two pages for vm_dirty_byte: this is necessary to avoid
> > potential division by 0 (like the following) in get_dirty_limits().
> 
> isn't changing the .extra1 in the sysctl table a better fix?

I thought about that and probably you're right. But, in the first case
if the user writes a value < 2*PAGE_SIZE vm_dirty_bytes is explicitly
set to 2*PAGE_SIZE. In the other case the value is just ignored.

Maybe the second case could generate some unexpected behaviours for
the user. For example, the user writes X < 2*PAGE_SIZE to
/proc/sys/vm/dirty_bytes, and could think that the system has started to
use this new configuration, but it isn't actually.

So, probably the best way is to change .extra1 and add a note in the
documentation (see below).

---
mm: prevent divide error for small values of vm_dirty_bytes

Avoid to set less than two pages for vm_dirty_bytes: this is necessary to
avoid potential division by 0 (like the following) in get_dirty_limits().

[   49.951610] divide error: 0000 [#1] PREEMPT SMP
[   49.952195] last sysfs file: /sys/devices/pci0000:00/0000:00:01.1/host0/target0:0:0/0:0:0:0/block/sda/uevent
[   49.952195] CPU 1
[   49.952195] Modules linked in: pcspkr
[   49.952195] Pid: 3064, comm: dd Not tainted 2.6.30-rc3 #1
[   49.952195] RIP: 0010:[<ffffffff802d39a9>]  [<ffffffff802d39a9>] get_dirty_limits+0xe9/0x2c0
[   49.952195] RSP: 0018:ffff88001de03a98  EFLAGS: 00010202
[   49.952195] RAX: 00000000000000c0 RBX: ffff88001de03b80 RCX: 28f5c28f5c28f5c3
[   49.952195] RDX: 0000000000000000 RSI: 00000000000000c0 RDI: 0000000000000000
[   49.952195] RBP: ffff88001de03ae8 R08: 0000000000000000 R09: 0000000000000000
[   49.952195] R10: ffff88001ddda9a0 R11: 0000000000000001 R12: 0000000000000001
[   49.952195] R13: ffff88001fbc8218 R14: ffff88001de03b70 R15: ffff88001de03b78
[   49.952195] FS:  00007fe9a435b6f0(0000) GS:ffff8800025d9000(0000) knlGS:0000000000000000
[   49.952195] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   49.952195] CR2: 00007fe9a39ab000 CR3: 000000001de38000 CR4: 00000000000006e0
[   49.952195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   49.952195] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   49.952195] Process dd (pid: 3064, threadinfo ffff88001de02000, task ffff88001ddda250)
[   49.952195] Stack:
[   49.952195]  ffff88001fa0de00 ffff88001f2dbd70 ffff88001f9fe800 000080b900000000
[   49.952195]  00000000000000c0 ffff8800027a6100 0000000000000400 ffff88001fbc8218
[   49.952195]  0000000000000000 0000000000000600 ffff88001de03bb8 ffffffff802d3ed7
[   49.952195] Call Trace:
[   49.952195]  [<ffffffff802d3ed7>] balance_dirty_pages_ratelimited_nr+0x1d7/0x3f0
[   49.952195]  [<ffffffff80368f8e>] ? ext3_writeback_write_end+0x9e/0x120
[   49.952195]  [<ffffffff802cc7df>] generic_file_buffered_write+0x12f/0x330
[   49.952195]  [<ffffffff802cce8d>] __generic_file_aio_write_nolock+0x26d/0x460
[   49.952195]  [<ffffffff802cda32>] ? generic_file_aio_write+0x52/0xd0
[   49.952195]  [<ffffffff802cda49>] generic_file_aio_write+0x69/0xd0
[   49.952195]  [<ffffffff80365fa6>] ext3_file_write+0x26/0xc0
[   49.952195]  [<ffffffff803034d1>] do_sync_write+0xf1/0x140
[   49.952195]  [<ffffffff80290d1a>] ? get_lock_stats+0x2a/0x60
[   49.952195]  [<ffffffff80280730>] ? autoremove_wake_function+0x0/0x40
[   49.952195]  [<ffffffff8030411b>] vfs_write+0xcb/0x190
[   49.952195]  [<ffffffff803042d0>] sys_write+0x50/0x90
[   49.952195]  [<ffffffff8022ff6b>] system_call_fastpath+0x16/0x1b
[   49.952195] Code: 00 00 00 2b 05 09 1c 17 01 48 89 c6 49 0f af f4 48 c1 ee 02 48 89 f0 48 f7 e1 48 89 d6 31 d2 48 c1 ee 02 48 0f af 75 d0 48 89 f0 <48> f7 f7 41 8b 95 ac 01 00 00 48 89 c7 49 0f af d4 48 c1 ea 02
[   49.952195] RIP  [<ffffffff802d39a9>] get_dirty_limits+0xe9/0x2c0
[   49.952195]  RSP <ffff88001de03a98>
[   50.096523] ---[ end trace 008d7aa02f244d7b ]---

Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
---
 Documentation/sysctl/vm.txt |    4 ++++
 kernel/sysctl.c             |    5 ++++-
 2 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 97c4b32..b716d33 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -90,6 +90,10 @@ will itself start writeback.
 If dirty_bytes is written, dirty_ratio becomes a function of its value
 (dirty_bytes / the amount of dirtyable system memory).
 
+Note: the minimum value allowed for dirty_bytes is two pages (in bytes); any
+value lower than this limit will be ignored and the old configuration will be
+retained.
+
 ==============================================================
 
 dirty_expire_centisecs
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index e3d2c7d..ea78fa1 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -103,6 +103,9 @@ static unsigned long one_ul = 1;
 static int one_hundred = 100;
 static int one_thousand = 1000;
 
+/* this is needed for the proc_doulongvec_minmax of vm_dirty_bytes */
+static unsigned long dirty_bytes_min = 2 * PAGE_SIZE;
+
 /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
 static int maxolduid = 65535;
 static int minolduid;
@@ -1006,7 +1009,7 @@ static struct ctl_table vm_table[] = {
 		.mode		= 0644,
 		.proc_handler	= &dirty_bytes_handler,
 		.strategy	= &sysctl_intvec,
-		.extra1		= &one_ul,
+		.extra1		= &dirty_bytes_min,
 	},
 	{
 		.procname	= "dirty_writeback_centisecs",

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm: prevent divide error for small values of vm_dirty_bytes
  2009-04-29  9:26 ` David Rientjes
@ 2009-04-29  9:40   ` Andrea Righi
  0 siblings, 0 replies; 9+ messages in thread
From: Andrea Righi @ 2009-04-29  9:40 UTC (permalink / raw)
  To: David Rientjes
  Cc: akpm, Peter Zijlstra, Dave Chinner, Christoph Lameter,
	linux-kernel

On Wed, Apr 29, 2009 at 02:26:11AM -0700, David Rientjes wrote:
> On Wed, 29 Apr 2009, Andrea Righi wrote:
> 
> > Avoid to set less than two pages for vm_dirty_byte: this is necessary to avoid
> > potential division by 0 (like the following) in get_dirty_limits().
> > 
> 
> Where exactly is the divide by 0 in get_dirty_limits()?

in the evaluation of the BDI's share ratio:
...
	if (bdi) {
		u64 bdi_dirty;
		long numerator, denominator;

		/*
		 * Calculate this BDI's share of the dirty ratio.
		 */
		bdi_writeout_fraction(bdi, &numerator, &denominator);

		bdi_dirty = (dirty * (100 - bdi_min_ratio)) / 100;
		bdi_dirty *= numerator;
		do_div(bdi_dirty, denominator);
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...

More exactly:
/home/righiandr/Software/linux/git/linux-2.6/mm/page-writeback.c:474
    1186:       48 89 f0                mov    %rsi,%rax
    1189:       48 f7 f7                div    %rdi

And looking at the trace:

[   49.952195] RAX: 00000000000000c0 RBX: ffff88001de03b80 RCX: 28f5c28f5c28f5c3
[   49.952195] RDX: 0000000000000000 RSI: 00000000000000c0 RDI: 0000000000000000
                                                           ^^^^^^^^^^^^^^^^^^^^^
> 
> Is this because we forced the background threshold to be half of the dirty 
> threshold when its setting is greater?

No (see above).

-Andrea

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm: prevent divide error for small values of vm_dirty_bytes
  2009-04-29  9:34   ` Andrea Righi
@ 2009-04-29 20:02     ` David Rientjes
  2009-04-29 21:46     ` Andrew Morton
  1 sibling, 0 replies; 9+ messages in thread
From: David Rientjes @ 2009-04-29 20:02 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Peter Zijlstra, akpm, Dave Chinner, Christoph Lameter,
	linux-kernel

On Wed, 29 Apr 2009, Andrea Righi wrote:

> On Wed, Apr 29, 2009 at 10:44:36AM +0200, Peter Zijlstra wrote:
> > On Wed, 2009-04-29 at 10:29 +0200, Andrea Righi wrote:
> > > Avoid to set less than two pages for vm_dirty_byte: this is necessary to avoid
> > > potential division by 0 (like the following) in get_dirty_limits().
> > 
> > isn't changing the .extra1 in the sysctl table a better fix?
> 
> I thought about that and probably you're right. But, in the first case
> if the user writes a value < 2*PAGE_SIZE vm_dirty_bytes is explicitly
> set to 2*PAGE_SIZE. In the other case the value is just ignored.
> 
> Maybe the second case could generate some unexpected behaviours for
> the user. For example, the user writes X < 2*PAGE_SIZE to
> /proc/sys/vm/dirty_bytes, and could think that the system has started to
> use this new configuration, but it isn't actually.
> 
> So, probably the best way is to change .extra1 and add a note in the
> documentation (see below).
> 

This is only caused by bdi, not by the dirty ratio itself; shouldn't the 
appropriate fix be in calc_period_shift()?  In other words, allow 
dirty thresholds to be less than 2 * PAGE_SIZE but prevent the bdi 
fraction from having a zero denominator?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm: prevent divide error for small values of vm_dirty_bytes
  2009-04-29  9:34   ` Andrea Righi
  2009-04-29 20:02     ` David Rientjes
@ 2009-04-29 21:46     ` Andrew Morton
  2009-05-01 14:56       ` Andrea Righi
  1 sibling, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2009-04-29 21:46 UTC (permalink / raw)
  To: Andrea Righi; +Cc: peterz, rientjes, david, cl, linux-kernel

On Wed, 29 Apr 2009 11:34:51 +0200
Andrea Righi <righi.andrea@gmail.com> wrote:

> --- a/Documentation/sysctl/vm.txt
> +++ b/Documentation/sysctl/vm.txt
> @@ -90,6 +90,10 @@ will itself start writeback.
>  If dirty_bytes is written, dirty_ratio becomes a function of its value
>  (dirty_bytes / the amount of dirtyable system memory).
>  
> +Note: the minimum value allowed for dirty_bytes is two pages (in bytes); any
> +value lower than this limit will be ignored and the old configuration will be
> +retained.

Well.  This implies that the write to the procfs file would appear to
succeed.  One hopes that the write would in fact return -EINVAL or
such?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm: prevent divide error for small values of vm_dirty_bytes
  2009-04-29 21:46     ` Andrew Morton
@ 2009-05-01 14:56       ` Andrea Righi
  2009-05-01 19:53         ` Andrew Morton
  0 siblings, 1 reply; 9+ messages in thread
From: Andrea Righi @ 2009-05-01 14:56 UTC (permalink / raw)
  To: Andrew Morton; +Cc: peterz, rientjes, david, cl, linux-kernel

On Wed, Apr 29, 2009 at 02:46:55PM -0700, Andrew Morton wrote:
> On Wed, 29 Apr 2009 11:34:51 +0200
> Andrea Righi <righi.andrea@gmail.com> wrote:
> 
> > --- a/Documentation/sysctl/vm.txt
> > +++ b/Documentation/sysctl/vm.txt
> > @@ -90,6 +90,10 @@ will itself start writeback.
> >  If dirty_bytes is written, dirty_ratio becomes a function of its value
> >  (dirty_bytes / the amount of dirtyable system memory).
> >  
> > +Note: the minimum value allowed for dirty_bytes is two pages (in bytes); any
> > +value lower than this limit will be ignored and the old configuration will be
> > +retained.
> 
> Well.  This implies that the write to the procfs file would appear to
> succeed.  One hopes that the write would in fact return -EINVAL or
> such?

I definitely agree. Just tested the following patch and it looks much
better with the error code.

-Andrea

---
sysctl: return error code if values are not within a valid range

Currently __do_proc_doulongvec_minmax(), as well as
__do_proc_dointvec(), simply skip the invalid values instead of return
-EINVAL.

A more correct behaviour is to report to the userspace that some values
were invalid and they couldn't be written instead of silently drop
them.

For example (vm_dirty_bytes must be greater or equal than 2*PAGE_SIZE):
- before:
  # cat /proc/sys/vm/dirty_bytes
  0
  # /bin/echo 1 > /proc/sys/vm/dirty_bytes
  # cat /proc/sys/vm/dirty_bytes
  0
  # /bin/echo 8192 > /proc/sys/vm/dirty_bytes
  # cat /proc/sys/vm/dirty_bytes
  8192

- after:
  # cat /proc/sys/vm/dirty_bytes
  0
  # /bin/echo 1 > /proc/sys/vm/dirty_bytes
  /bin/echo: write error: Invalid argument
  # cat /proc/sys/vm/dirty_bytes
  0
  # /bin/echo 8192 > /proc/sys/vm/dirty_bytes
  # cat /proc/sys/vm/dirty_bytes
  8192

As a bonus do few minor coding style fixup.

Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
---
 kernel/sysctl.c |   47 +++++++++++++++++++++++++++++++----------------
 1 files changed, 31 insertions(+), 16 deletions(-)

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index ea78fa1..92e56cf 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -2243,19 +2243,19 @@ static int __do_proc_dointvec(void *tbl_data, struct ctl_table *table,
 		  void *data)
 {
 #define TMPBUFLEN 21
-	int *i, vleft, first=1, neg, val;
+	int *i, vleft, first = 1, neg, val, ret = 0;
 	unsigned long lval;
 	size_t left, len;
-	
+
 	char buf[TMPBUFLEN], *p;
 	char __user *s = buffer;
-	
+
 	if (!tbl_data || !table->maxlen || !*lenp ||
 	    (*ppos && !write)) {
 		*lenp = 0;
 		return 0;
 	}
-	
+
 	i = (int *) tbl_data;
 	vleft = table->maxlen / sizeof(*i);
 	left = *lenp;
@@ -2288,26 +2288,31 @@ static int __do_proc_dointvec(void *tbl_data, struct ctl_table *table,
 				neg = 1;
 				p++;
 			}
-			if (*p < '0' || *p > '9')
+			if (*p < '0' || *p > '9') {
+				ret = -EINVAL;
 				break;
+			}
 
 			lval = simple_strtoul(p, &p, 0);
 
 			len = p-buf;
-			if ((len < left) && *p && !isspace(*p))
+			if ((len < left) && *p && !isspace(*p)) {
+				ret = -EINVAL;
 				break;
+			}
 			if (neg)
 				val = -val;
 			s += len;
 			left -= len;
 
-			if (conv(&neg, &lval, i, 1, data))
+			ret = conv(&neg, &lval, i, 1, data);
+			if (ret)
 				break;
 		} else {
 			p = buf;
 			if (!first)
 				*p++ = '\t';
-	
+
 			if (conv(&neg, &lval, i, 0, data))
 				break;
 
@@ -2339,6 +2344,8 @@ static int __do_proc_dointvec(void *tbl_data, struct ctl_table *table,
 	}
 	if (write && first)
 		return -EINVAL;
+	if (write && ret)
+		return ret;
 	*lenp -= left;
 	*ppos += *lenp;
 	return 0;
@@ -2477,23 +2484,23 @@ static int __do_proc_doulongvec_minmax(void *data, struct ctl_table *table, int
 {
 #define TMPBUFLEN 21
 	unsigned long *i, *min, *max, val;
-	int vleft, first=1, neg;
+	int vleft, first = 1, neg, ret = 0;
 	size_t len, left;
 	char buf[TMPBUFLEN], *p;
 	char __user *s = buffer;
-	
+
 	if (!data || !table->maxlen || !*lenp ||
 	    (*ppos && !write)) {
 		*lenp = 0;
 		return 0;
 	}
-	
+
 	i = (unsigned long *) data;
 	min = (unsigned long *) table->extra1;
 	max = (unsigned long *) table->extra2;
 	vleft = table->maxlen / sizeof(unsigned long);
 	left = *lenp;
-	
+
 	for (; left && vleft--; i++, min++, max++, first=0) {
 		if (write) {
 			while (left) {
@@ -2519,12 +2526,16 @@ static int __do_proc_doulongvec_minmax(void *data, struct ctl_table *table, int
 				neg = 1;
 				p++;
 			}
-			if (*p < '0' || *p > '9')
+			if (*p < '0' || *p > '9') {
+				ret = -EINVAL;
 				break;
+			}
 			val = simple_strtoul(p, &p, 0) * convmul / convdiv ;
 			len = p-buf;
-			if ((len < left) && *p && !isspace(*p))
+			if ((len < left) && *p && !isspace(*p)) {
+				ret = -EINVAL;
 				break;
+			}
 			if (neg)
 				val = -val;
 			s += len;
@@ -2532,8 +2543,10 @@ static int __do_proc_doulongvec_minmax(void *data, struct ctl_table *table, int
 
 			if(neg)
 				continue;
-			if ((min && val < *min) || (max && val > *max))
-				continue;
+			if ((min && val < *min) || (max && val > *max)) {
+				ret = -EINVAL;
+				break;
+			}
 			*i = val;
 		} else {
 			p = buf;
@@ -2567,6 +2580,8 @@ static int __do_proc_doulongvec_minmax(void *data, struct ctl_table *table, int
 	}
 	if (write && first)
 		return -EINVAL;
+	if (write && ret)
+		return ret;
 	*lenp -= left;
 	*ppos += *lenp;
 	return 0;

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm: prevent divide error for small values of vm_dirty_bytes
  2009-05-01 14:56       ` Andrea Righi
@ 2009-05-01 19:53         ` Andrew Morton
  0 siblings, 0 replies; 9+ messages in thread
From: Andrew Morton @ 2009-05-01 19:53 UTC (permalink / raw)
  To: Andrea Righi; +Cc: peterz, rientjes, david, cl, linux-kernel

On Fri, 1 May 2009 16:56:40 +0200
Andrea Righi <righi.andrea@gmail.com> wrote:

> On Wed, Apr 29, 2009 at 02:46:55PM -0700, Andrew Morton wrote:
> > On Wed, 29 Apr 2009 11:34:51 +0200
> > Andrea Righi <righi.andrea@gmail.com> wrote:
> > 
> > > --- a/Documentation/sysctl/vm.txt
> > > +++ b/Documentation/sysctl/vm.txt
> > > @@ -90,6 +90,10 @@ will itself start writeback.
> > >  If dirty_bytes is written, dirty_ratio becomes a function of its value
> > >  (dirty_bytes / the amount of dirtyable system memory).
> > >  
> > > +Note: the minimum value allowed for dirty_bytes is two pages (in bytes); any
> > > +value lower than this limit will be ignored and the old configuration will be
> > > +retained.
> > 
> > Well.  This implies that the write to the procfs file would appear to
> > succeed.  One hopes that the write would in fact return -EINVAL or
> > such?
> 
> I definitely agree. Just tested the following patch and it looks much
> better with the error code.
> 
> -Andrea
> 
> ---
> sysctl: return error code if values are not within a valid range
> 
> Currently __do_proc_doulongvec_minmax(), as well as
> __do_proc_dointvec(), simply skip the invalid values instead of return
> -EINVAL.

Oh geeze, I didn't know that.

> A more correct behaviour is to report to the userspace that some values
> were invalid and they couldn't be written instead of silently drop
> them.
> 
> For example (vm_dirty_bytes must be greater or equal than 2*PAGE_SIZE):
> - before:
>   # cat /proc/sys/vm/dirty_bytes
>   0
>   # /bin/echo 1 > /proc/sys/vm/dirty_bytes
>   # cat /proc/sys/vm/dirty_bytes
>   0
>   # /bin/echo 8192 > /proc/sys/vm/dirty_bytes
>   # cat /proc/sys/vm/dirty_bytes
>   8192
> 
> - after:
>   # cat /proc/sys/vm/dirty_bytes
>   0
>   # /bin/echo 1 > /proc/sys/vm/dirty_bytes
>   /bin/echo: write error: Invalid argument
>   # cat /proc/sys/vm/dirty_bytes
>   0
>   # /bin/echo 8192 > /proc/sys/vm/dirty_bytes
>   # cat /proc/sys/vm/dirty_bytes
>   8192
> 

Unfortunately the potential here for breaking existing userspace is
huge.  I think it's too late for us to fix this :(



^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2009-05-01 19:56 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-04-29  8:29 [PATCH] mm: prevent divide error for small values of vm_dirty_bytes Andrea Righi
2009-04-29  8:44 ` Peter Zijlstra
2009-04-29  9:34   ` Andrea Righi
2009-04-29 20:02     ` David Rientjes
2009-04-29 21:46     ` Andrew Morton
2009-05-01 14:56       ` Andrea Righi
2009-05-01 19:53         ` Andrew Morton
2009-04-29  9:26 ` David Rientjes
2009-04-29  9:40   ` Andrea Righi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox