* [PATCH] mm: prevent divide error for small values of vm_dirty_bytes @ 2009-04-29 8:29 Andrea Righi 2009-04-29 8:44 ` Peter Zijlstra 2009-04-29 9:26 ` David Rientjes 0 siblings, 2 replies; 9+ messages in thread From: Andrea Righi @ 2009-04-29 8:29 UTC (permalink / raw) To: David Rientjes, akpm Cc: Peter Zijlstra, Dave Chinner, Christoph Lameter, linux-kernel, Andrea Righi Avoid to set less than two pages for vm_dirty_byte: this is necessary to avoid potential division by 0 (like the following) in get_dirty_limits(). [ 49.951610] divide error: 0000 [#1] PREEMPT SMP [ 49.952195] last sysfs file: /sys/devices/pci0000:00/0000:00:01.1/host0/target0:0:0/0:0:0:0/block/sda/uevent [ 49.952195] CPU 1 [ 49.952195] Modules linked in: pcspkr [ 49.952195] Pid: 3064, comm: dd Not tainted 2.6.30-rc3 #1 [ 49.952195] RIP: 0010:[<ffffffff802d39a9>] [<ffffffff802d39a9>] get_dirty_limits+0xe9/0x2c0 [ 49.952195] RSP: 0018:ffff88001de03a98 EFLAGS: 00010202 [ 49.952195] RAX: 00000000000000c0 RBX: ffff88001de03b80 RCX: 28f5c28f5c28f5c3 [ 49.952195] RDX: 0000000000000000 RSI: 00000000000000c0 RDI: 0000000000000000 [ 49.952195] RBP: ffff88001de03ae8 R08: 0000000000000000 R09: 0000000000000000 [ 49.952195] R10: ffff88001ddda9a0 R11: 0000000000000001 R12: 0000000000000001 [ 49.952195] R13: ffff88001fbc8218 R14: ffff88001de03b70 R15: ffff88001de03b78 [ 49.952195] FS: 00007fe9a435b6f0(0000) GS:ffff8800025d9000(0000) knlGS:0000000000000000 [ 49.952195] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 49.952195] CR2: 00007fe9a39ab000 CR3: 000000001de38000 CR4: 00000000000006e0 [ 49.952195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 49.952195] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 49.952195] Process dd (pid: 3064, threadinfo ffff88001de02000, task ffff88001ddda250) [ 49.952195] Stack: [ 49.952195] ffff88001fa0de00 ffff88001f2dbd70 ffff88001f9fe800 000080b900000000 [ 49.952195] 00000000000000c0 ffff8800027a6100 0000000000000400 ffff88001fbc8218 [ 49.952195] 0000000000000000 0000000000000600 ffff88001de03bb8 ffffffff802d3ed7 [ 49.952195] Call Trace: [ 49.952195] [<ffffffff802d3ed7>] balance_dirty_pages_ratelimited_nr+0x1d7/0x3f0 [ 49.952195] [<ffffffff80368f8e>] ? ext3_writeback_write_end+0x9e/0x120 [ 49.952195] [<ffffffff802cc7df>] generic_file_buffered_write+0x12f/0x330 [ 49.952195] [<ffffffff802cce8d>] __generic_file_aio_write_nolock+0x26d/0x460 [ 49.952195] [<ffffffff802cda32>] ? generic_file_aio_write+0x52/0xd0 [ 49.952195] [<ffffffff802cda49>] generic_file_aio_write+0x69/0xd0 [ 49.952195] [<ffffffff80365fa6>] ext3_file_write+0x26/0xc0 [ 49.952195] [<ffffffff803034d1>] do_sync_write+0xf1/0x140 [ 49.952195] [<ffffffff80290d1a>] ? get_lock_stats+0x2a/0x60 [ 49.952195] [<ffffffff80280730>] ? autoremove_wake_function+0x0/0x40 [ 49.952195] [<ffffffff8030411b>] vfs_write+0xcb/0x190 [ 49.952195] [<ffffffff803042d0>] sys_write+0x50/0x90 [ 49.952195] [<ffffffff8022ff6b>] system_call_fastpath+0x16/0x1b [ 49.952195] Code: 00 00 00 2b 05 09 1c 17 01 48 89 c6 49 0f af f4 48 c1 ee 02 48 89 f0 48 f7 e1 48 89 d6 31 d2 48 c1 ee 02 48 0f af 75 d0 48 89 f0 <48> f7 f7 41 8b 95 ac 01 00 00 48 89 c7 49 0f af d4 48 c1 ea 02 [ 49.952195] RIP [<ffffffff802d39a9>] get_dirty_limits+0xe9/0x2c0 [ 49.952195] RSP <ffff88001de03a98> [ 50.096523] ---[ end trace 008d7aa02f244d7b ]--- Signed-off-by: Andrea Righi <righi.andrea@gmail.com> --- mm/page-writeback.c | 6 ++++++ 1 files changed, 6 insertions(+), 0 deletions(-) diff --git a/mm/page-writeback.c b/mm/page-writeback.c index 30351f0..9d6c427 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -213,6 +213,12 @@ int dirty_bytes_handler(struct ctl_table *table, int write, int ret; ret = proc_doulongvec_minmax(table, write, filp, buffer, lenp, ppos); + /* + * Do not allow to set less than two pages for vm_dirty_byte: this is + * necessary to avoid division by 0 in get_dirty_limits(). + */ + vm_dirty_bytes = max_t(typeof(vm_dirty_bytes), + vm_dirty_bytes, 2 * PAGE_SIZE); if (ret == 0 && write && vm_dirty_bytes != old_bytes) { update_completion_period(); vm_dirty_ratio = 0; -- 1.6.0.4 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] mm: prevent divide error for small values of vm_dirty_bytes 2009-04-29 8:29 [PATCH] mm: prevent divide error for small values of vm_dirty_bytes Andrea Righi @ 2009-04-29 8:44 ` Peter Zijlstra 2009-04-29 9:34 ` Andrea Righi 2009-04-29 9:26 ` David Rientjes 1 sibling, 1 reply; 9+ messages in thread From: Peter Zijlstra @ 2009-04-29 8:44 UTC (permalink / raw) To: Andrea Righi Cc: David Rientjes, akpm, Dave Chinner, Christoph Lameter, linux-kernel On Wed, 2009-04-29 at 10:29 +0200, Andrea Righi wrote: > Avoid to set less than two pages for vm_dirty_byte: this is necessary to avoid > potential division by 0 (like the following) in get_dirty_limits(). isn't changing the .extra1 in the sysctl table a better fix? > Signed-off-by: Andrea Righi <righi.andrea@gmail.com> > --- > mm/page-writeback.c | 6 ++++++ > 1 files changed, 6 insertions(+), 0 deletions(-) > > diff --git a/mm/page-writeback.c b/mm/page-writeback.c > index 30351f0..9d6c427 100644 > --- a/mm/page-writeback.c > +++ b/mm/page-writeback.c > @@ -213,6 +213,12 @@ int dirty_bytes_handler(struct ctl_table *table, int write, > int ret; > > ret = proc_doulongvec_minmax(table, write, filp, buffer, lenp, ppos); > + /* > + * Do not allow to set less than two pages for vm_dirty_byte: this is > + * necessary to avoid division by 0 in get_dirty_limits(). > + */ > + vm_dirty_bytes = max_t(typeof(vm_dirty_bytes), > + vm_dirty_bytes, 2 * PAGE_SIZE); > if (ret == 0 && write && vm_dirty_bytes != old_bytes) { > update_completion_period(); > vm_dirty_ratio = 0; ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] mm: prevent divide error for small values of vm_dirty_bytes 2009-04-29 8:44 ` Peter Zijlstra @ 2009-04-29 9:34 ` Andrea Righi 2009-04-29 20:02 ` David Rientjes 2009-04-29 21:46 ` Andrew Morton 0 siblings, 2 replies; 9+ messages in thread From: Andrea Righi @ 2009-04-29 9:34 UTC (permalink / raw) To: Peter Zijlstra Cc: David Rientjes, akpm, Dave Chinner, Christoph Lameter, linux-kernel On Wed, Apr 29, 2009 at 10:44:36AM +0200, Peter Zijlstra wrote: > On Wed, 2009-04-29 at 10:29 +0200, Andrea Righi wrote: > > Avoid to set less than two pages for vm_dirty_byte: this is necessary to avoid > > potential division by 0 (like the following) in get_dirty_limits(). > > isn't changing the .extra1 in the sysctl table a better fix? I thought about that and probably you're right. But, in the first case if the user writes a value < 2*PAGE_SIZE vm_dirty_bytes is explicitly set to 2*PAGE_SIZE. In the other case the value is just ignored. Maybe the second case could generate some unexpected behaviours for the user. For example, the user writes X < 2*PAGE_SIZE to /proc/sys/vm/dirty_bytes, and could think that the system has started to use this new configuration, but it isn't actually. So, probably the best way is to change .extra1 and add a note in the documentation (see below). --- mm: prevent divide error for small values of vm_dirty_bytes Avoid to set less than two pages for vm_dirty_bytes: this is necessary to avoid potential division by 0 (like the following) in get_dirty_limits(). [ 49.951610] divide error: 0000 [#1] PREEMPT SMP [ 49.952195] last sysfs file: /sys/devices/pci0000:00/0000:00:01.1/host0/target0:0:0/0:0:0:0/block/sda/uevent [ 49.952195] CPU 1 [ 49.952195] Modules linked in: pcspkr [ 49.952195] Pid: 3064, comm: dd Not tainted 2.6.30-rc3 #1 [ 49.952195] RIP: 0010:[<ffffffff802d39a9>] [<ffffffff802d39a9>] get_dirty_limits+0xe9/0x2c0 [ 49.952195] RSP: 0018:ffff88001de03a98 EFLAGS: 00010202 [ 49.952195] RAX: 00000000000000c0 RBX: ffff88001de03b80 RCX: 28f5c28f5c28f5c3 [ 49.952195] RDX: 0000000000000000 RSI: 00000000000000c0 RDI: 0000000000000000 [ 49.952195] RBP: ffff88001de03ae8 R08: 0000000000000000 R09: 0000000000000000 [ 49.952195] R10: ffff88001ddda9a0 R11: 0000000000000001 R12: 0000000000000001 [ 49.952195] R13: ffff88001fbc8218 R14: ffff88001de03b70 R15: ffff88001de03b78 [ 49.952195] FS: 00007fe9a435b6f0(0000) GS:ffff8800025d9000(0000) knlGS:0000000000000000 [ 49.952195] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 49.952195] CR2: 00007fe9a39ab000 CR3: 000000001de38000 CR4: 00000000000006e0 [ 49.952195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 49.952195] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 49.952195] Process dd (pid: 3064, threadinfo ffff88001de02000, task ffff88001ddda250) [ 49.952195] Stack: [ 49.952195] ffff88001fa0de00 ffff88001f2dbd70 ffff88001f9fe800 000080b900000000 [ 49.952195] 00000000000000c0 ffff8800027a6100 0000000000000400 ffff88001fbc8218 [ 49.952195] 0000000000000000 0000000000000600 ffff88001de03bb8 ffffffff802d3ed7 [ 49.952195] Call Trace: [ 49.952195] [<ffffffff802d3ed7>] balance_dirty_pages_ratelimited_nr+0x1d7/0x3f0 [ 49.952195] [<ffffffff80368f8e>] ? ext3_writeback_write_end+0x9e/0x120 [ 49.952195] [<ffffffff802cc7df>] generic_file_buffered_write+0x12f/0x330 [ 49.952195] [<ffffffff802cce8d>] __generic_file_aio_write_nolock+0x26d/0x460 [ 49.952195] [<ffffffff802cda32>] ? generic_file_aio_write+0x52/0xd0 [ 49.952195] [<ffffffff802cda49>] generic_file_aio_write+0x69/0xd0 [ 49.952195] [<ffffffff80365fa6>] ext3_file_write+0x26/0xc0 [ 49.952195] [<ffffffff803034d1>] do_sync_write+0xf1/0x140 [ 49.952195] [<ffffffff80290d1a>] ? get_lock_stats+0x2a/0x60 [ 49.952195] [<ffffffff80280730>] ? autoremove_wake_function+0x0/0x40 [ 49.952195] [<ffffffff8030411b>] vfs_write+0xcb/0x190 [ 49.952195] [<ffffffff803042d0>] sys_write+0x50/0x90 [ 49.952195] [<ffffffff8022ff6b>] system_call_fastpath+0x16/0x1b [ 49.952195] Code: 00 00 00 2b 05 09 1c 17 01 48 89 c6 49 0f af f4 48 c1 ee 02 48 89 f0 48 f7 e1 48 89 d6 31 d2 48 c1 ee 02 48 0f af 75 d0 48 89 f0 <48> f7 f7 41 8b 95 ac 01 00 00 48 89 c7 49 0f af d4 48 c1 ea 02 [ 49.952195] RIP [<ffffffff802d39a9>] get_dirty_limits+0xe9/0x2c0 [ 49.952195] RSP <ffff88001de03a98> [ 50.096523] ---[ end trace 008d7aa02f244d7b ]--- Signed-off-by: Andrea Righi <righi.andrea@gmail.com> --- Documentation/sysctl/vm.txt | 4 ++++ kernel/sysctl.c | 5 ++++- 2 files changed, 8 insertions(+), 1 deletions(-) diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt index 97c4b32..b716d33 100644 --- a/Documentation/sysctl/vm.txt +++ b/Documentation/sysctl/vm.txt @@ -90,6 +90,10 @@ will itself start writeback. If dirty_bytes is written, dirty_ratio becomes a function of its value (dirty_bytes / the amount of dirtyable system memory). +Note: the minimum value allowed for dirty_bytes is two pages (in bytes); any +value lower than this limit will be ignored and the old configuration will be +retained. + ============================================================== dirty_expire_centisecs diff --git a/kernel/sysctl.c b/kernel/sysctl.c index e3d2c7d..ea78fa1 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -103,6 +103,9 @@ static unsigned long one_ul = 1; static int one_hundred = 100; static int one_thousand = 1000; +/* this is needed for the proc_doulongvec_minmax of vm_dirty_bytes */ +static unsigned long dirty_bytes_min = 2 * PAGE_SIZE; + /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */ static int maxolduid = 65535; static int minolduid; @@ -1006,7 +1009,7 @@ static struct ctl_table vm_table[] = { .mode = 0644, .proc_handler = &dirty_bytes_handler, .strategy = &sysctl_intvec, - .extra1 = &one_ul, + .extra1 = &dirty_bytes_min, }, { .procname = "dirty_writeback_centisecs", ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] mm: prevent divide error for small values of vm_dirty_bytes 2009-04-29 9:34 ` Andrea Righi @ 2009-04-29 20:02 ` David Rientjes 2009-04-29 21:46 ` Andrew Morton 1 sibling, 0 replies; 9+ messages in thread From: David Rientjes @ 2009-04-29 20:02 UTC (permalink / raw) To: Andrea Righi Cc: Peter Zijlstra, akpm, Dave Chinner, Christoph Lameter, linux-kernel On Wed, 29 Apr 2009, Andrea Righi wrote: > On Wed, Apr 29, 2009 at 10:44:36AM +0200, Peter Zijlstra wrote: > > On Wed, 2009-04-29 at 10:29 +0200, Andrea Righi wrote: > > > Avoid to set less than two pages for vm_dirty_byte: this is necessary to avoid > > > potential division by 0 (like the following) in get_dirty_limits(). > > > > isn't changing the .extra1 in the sysctl table a better fix? > > I thought about that and probably you're right. But, in the first case > if the user writes a value < 2*PAGE_SIZE vm_dirty_bytes is explicitly > set to 2*PAGE_SIZE. In the other case the value is just ignored. > > Maybe the second case could generate some unexpected behaviours for > the user. For example, the user writes X < 2*PAGE_SIZE to > /proc/sys/vm/dirty_bytes, and could think that the system has started to > use this new configuration, but it isn't actually. > > So, probably the best way is to change .extra1 and add a note in the > documentation (see below). > This is only caused by bdi, not by the dirty ratio itself; shouldn't the appropriate fix be in calc_period_shift()? In other words, allow dirty thresholds to be less than 2 * PAGE_SIZE but prevent the bdi fraction from having a zero denominator? ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] mm: prevent divide error for small values of vm_dirty_bytes 2009-04-29 9:34 ` Andrea Righi 2009-04-29 20:02 ` David Rientjes @ 2009-04-29 21:46 ` Andrew Morton 2009-05-01 14:56 ` Andrea Righi 1 sibling, 1 reply; 9+ messages in thread From: Andrew Morton @ 2009-04-29 21:46 UTC (permalink / raw) To: Andrea Righi; +Cc: peterz, rientjes, david, cl, linux-kernel On Wed, 29 Apr 2009 11:34:51 +0200 Andrea Righi <righi.andrea@gmail.com> wrote: > --- a/Documentation/sysctl/vm.txt > +++ b/Documentation/sysctl/vm.txt > @@ -90,6 +90,10 @@ will itself start writeback. > If dirty_bytes is written, dirty_ratio becomes a function of its value > (dirty_bytes / the amount of dirtyable system memory). > > +Note: the minimum value allowed for dirty_bytes is two pages (in bytes); any > +value lower than this limit will be ignored and the old configuration will be > +retained. Well. This implies that the write to the procfs file would appear to succeed. One hopes that the write would in fact return -EINVAL or such? ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] mm: prevent divide error for small values of vm_dirty_bytes 2009-04-29 21:46 ` Andrew Morton @ 2009-05-01 14:56 ` Andrea Righi 2009-05-01 19:53 ` Andrew Morton 0 siblings, 1 reply; 9+ messages in thread From: Andrea Righi @ 2009-05-01 14:56 UTC (permalink / raw) To: Andrew Morton; +Cc: peterz, rientjes, david, cl, linux-kernel On Wed, Apr 29, 2009 at 02:46:55PM -0700, Andrew Morton wrote: > On Wed, 29 Apr 2009 11:34:51 +0200 > Andrea Righi <righi.andrea@gmail.com> wrote: > > > --- a/Documentation/sysctl/vm.txt > > +++ b/Documentation/sysctl/vm.txt > > @@ -90,6 +90,10 @@ will itself start writeback. > > If dirty_bytes is written, dirty_ratio becomes a function of its value > > (dirty_bytes / the amount of dirtyable system memory). > > > > +Note: the minimum value allowed for dirty_bytes is two pages (in bytes); any > > +value lower than this limit will be ignored and the old configuration will be > > +retained. > > Well. This implies that the write to the procfs file would appear to > succeed. One hopes that the write would in fact return -EINVAL or > such? I definitely agree. Just tested the following patch and it looks much better with the error code. -Andrea --- sysctl: return error code if values are not within a valid range Currently __do_proc_doulongvec_minmax(), as well as __do_proc_dointvec(), simply skip the invalid values instead of return -EINVAL. A more correct behaviour is to report to the userspace that some values were invalid and they couldn't be written instead of silently drop them. For example (vm_dirty_bytes must be greater or equal than 2*PAGE_SIZE): - before: # cat /proc/sys/vm/dirty_bytes 0 # /bin/echo 1 > /proc/sys/vm/dirty_bytes # cat /proc/sys/vm/dirty_bytes 0 # /bin/echo 8192 > /proc/sys/vm/dirty_bytes # cat /proc/sys/vm/dirty_bytes 8192 - after: # cat /proc/sys/vm/dirty_bytes 0 # /bin/echo 1 > /proc/sys/vm/dirty_bytes /bin/echo: write error: Invalid argument # cat /proc/sys/vm/dirty_bytes 0 # /bin/echo 8192 > /proc/sys/vm/dirty_bytes # cat /proc/sys/vm/dirty_bytes 8192 As a bonus do few minor coding style fixup. Signed-off-by: Andrea Righi <righi.andrea@gmail.com> --- kernel/sysctl.c | 47 +++++++++++++++++++++++++++++++---------------- 1 files changed, 31 insertions(+), 16 deletions(-) diff --git a/kernel/sysctl.c b/kernel/sysctl.c index ea78fa1..92e56cf 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -2243,19 +2243,19 @@ static int __do_proc_dointvec(void *tbl_data, struct ctl_table *table, void *data) { #define TMPBUFLEN 21 - int *i, vleft, first=1, neg, val; + int *i, vleft, first = 1, neg, val, ret = 0; unsigned long lval; size_t left, len; - + char buf[TMPBUFLEN], *p; char __user *s = buffer; - + if (!tbl_data || !table->maxlen || !*lenp || (*ppos && !write)) { *lenp = 0; return 0; } - + i = (int *) tbl_data; vleft = table->maxlen / sizeof(*i); left = *lenp; @@ -2288,26 +2288,31 @@ static int __do_proc_dointvec(void *tbl_data, struct ctl_table *table, neg = 1; p++; } - if (*p < '0' || *p > '9') + if (*p < '0' || *p > '9') { + ret = -EINVAL; break; + } lval = simple_strtoul(p, &p, 0); len = p-buf; - if ((len < left) && *p && !isspace(*p)) + if ((len < left) && *p && !isspace(*p)) { + ret = -EINVAL; break; + } if (neg) val = -val; s += len; left -= len; - if (conv(&neg, &lval, i, 1, data)) + ret = conv(&neg, &lval, i, 1, data); + if (ret) break; } else { p = buf; if (!first) *p++ = '\t'; - + if (conv(&neg, &lval, i, 0, data)) break; @@ -2339,6 +2344,8 @@ static int __do_proc_dointvec(void *tbl_data, struct ctl_table *table, } if (write && first) return -EINVAL; + if (write && ret) + return ret; *lenp -= left; *ppos += *lenp; return 0; @@ -2477,23 +2484,23 @@ static int __do_proc_doulongvec_minmax(void *data, struct ctl_table *table, int { #define TMPBUFLEN 21 unsigned long *i, *min, *max, val; - int vleft, first=1, neg; + int vleft, first = 1, neg, ret = 0; size_t len, left; char buf[TMPBUFLEN], *p; char __user *s = buffer; - + if (!data || !table->maxlen || !*lenp || (*ppos && !write)) { *lenp = 0; return 0; } - + i = (unsigned long *) data; min = (unsigned long *) table->extra1; max = (unsigned long *) table->extra2; vleft = table->maxlen / sizeof(unsigned long); left = *lenp; - + for (; left && vleft--; i++, min++, max++, first=0) { if (write) { while (left) { @@ -2519,12 +2526,16 @@ static int __do_proc_doulongvec_minmax(void *data, struct ctl_table *table, int neg = 1; p++; } - if (*p < '0' || *p > '9') + if (*p < '0' || *p > '9') { + ret = -EINVAL; break; + } val = simple_strtoul(p, &p, 0) * convmul / convdiv ; len = p-buf; - if ((len < left) && *p && !isspace(*p)) + if ((len < left) && *p && !isspace(*p)) { + ret = -EINVAL; break; + } if (neg) val = -val; s += len; @@ -2532,8 +2543,10 @@ static int __do_proc_doulongvec_minmax(void *data, struct ctl_table *table, int if(neg) continue; - if ((min && val < *min) || (max && val > *max)) - continue; + if ((min && val < *min) || (max && val > *max)) { + ret = -EINVAL; + break; + } *i = val; } else { p = buf; @@ -2567,6 +2580,8 @@ static int __do_proc_doulongvec_minmax(void *data, struct ctl_table *table, int } if (write && first) return -EINVAL; + if (write && ret) + return ret; *lenp -= left; *ppos += *lenp; return 0; ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] mm: prevent divide error for small values of vm_dirty_bytes 2009-05-01 14:56 ` Andrea Righi @ 2009-05-01 19:53 ` Andrew Morton 0 siblings, 0 replies; 9+ messages in thread From: Andrew Morton @ 2009-05-01 19:53 UTC (permalink / raw) To: Andrea Righi; +Cc: peterz, rientjes, david, cl, linux-kernel On Fri, 1 May 2009 16:56:40 +0200 Andrea Righi <righi.andrea@gmail.com> wrote: > On Wed, Apr 29, 2009 at 02:46:55PM -0700, Andrew Morton wrote: > > On Wed, 29 Apr 2009 11:34:51 +0200 > > Andrea Righi <righi.andrea@gmail.com> wrote: > > > > > --- a/Documentation/sysctl/vm.txt > > > +++ b/Documentation/sysctl/vm.txt > > > @@ -90,6 +90,10 @@ will itself start writeback. > > > If dirty_bytes is written, dirty_ratio becomes a function of its value > > > (dirty_bytes / the amount of dirtyable system memory). > > > > > > +Note: the minimum value allowed for dirty_bytes is two pages (in bytes); any > > > +value lower than this limit will be ignored and the old configuration will be > > > +retained. > > > > Well. This implies that the write to the procfs file would appear to > > succeed. One hopes that the write would in fact return -EINVAL or > > such? > > I definitely agree. Just tested the following patch and it looks much > better with the error code. > > -Andrea > > --- > sysctl: return error code if values are not within a valid range > > Currently __do_proc_doulongvec_minmax(), as well as > __do_proc_dointvec(), simply skip the invalid values instead of return > -EINVAL. Oh geeze, I didn't know that. > A more correct behaviour is to report to the userspace that some values > were invalid and they couldn't be written instead of silently drop > them. > > For example (vm_dirty_bytes must be greater or equal than 2*PAGE_SIZE): > - before: > # cat /proc/sys/vm/dirty_bytes > 0 > # /bin/echo 1 > /proc/sys/vm/dirty_bytes > # cat /proc/sys/vm/dirty_bytes > 0 > # /bin/echo 8192 > /proc/sys/vm/dirty_bytes > # cat /proc/sys/vm/dirty_bytes > 8192 > > - after: > # cat /proc/sys/vm/dirty_bytes > 0 > # /bin/echo 1 > /proc/sys/vm/dirty_bytes > /bin/echo: write error: Invalid argument > # cat /proc/sys/vm/dirty_bytes > 0 > # /bin/echo 8192 > /proc/sys/vm/dirty_bytes > # cat /proc/sys/vm/dirty_bytes > 8192 > Unfortunately the potential here for breaking existing userspace is huge. I think it's too late for us to fix this :( ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] mm: prevent divide error for small values of vm_dirty_bytes 2009-04-29 8:29 [PATCH] mm: prevent divide error for small values of vm_dirty_bytes Andrea Righi 2009-04-29 8:44 ` Peter Zijlstra @ 2009-04-29 9:26 ` David Rientjes 2009-04-29 9:40 ` Andrea Righi 1 sibling, 1 reply; 9+ messages in thread From: David Rientjes @ 2009-04-29 9:26 UTC (permalink / raw) To: Andrea Righi Cc: akpm, Peter Zijlstra, Dave Chinner, Christoph Lameter, linux-kernel On Wed, 29 Apr 2009, Andrea Righi wrote: > Avoid to set less than two pages for vm_dirty_byte: this is necessary to avoid > potential division by 0 (like the following) in get_dirty_limits(). > Where exactly is the divide by 0 in get_dirty_limits()? Is this because we forced the background threshold to be half of the dirty threshold when its setting is greater? ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] mm: prevent divide error for small values of vm_dirty_bytes 2009-04-29 9:26 ` David Rientjes @ 2009-04-29 9:40 ` Andrea Righi 0 siblings, 0 replies; 9+ messages in thread From: Andrea Righi @ 2009-04-29 9:40 UTC (permalink / raw) To: David Rientjes Cc: akpm, Peter Zijlstra, Dave Chinner, Christoph Lameter, linux-kernel On Wed, Apr 29, 2009 at 02:26:11AM -0700, David Rientjes wrote: > On Wed, 29 Apr 2009, Andrea Righi wrote: > > > Avoid to set less than two pages for vm_dirty_byte: this is necessary to avoid > > potential division by 0 (like the following) in get_dirty_limits(). > > > > Where exactly is the divide by 0 in get_dirty_limits()? in the evaluation of the BDI's share ratio: ... if (bdi) { u64 bdi_dirty; long numerator, denominator; /* * Calculate this BDI's share of the dirty ratio. */ bdi_writeout_fraction(bdi, &numerator, &denominator); bdi_dirty = (dirty * (100 - bdi_min_ratio)) / 100; bdi_dirty *= numerator; do_div(bdi_dirty, denominator); ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ... More exactly: /home/righiandr/Software/linux/git/linux-2.6/mm/page-writeback.c:474 1186: 48 89 f0 mov %rsi,%rax 1189: 48 f7 f7 div %rdi And looking at the trace: [ 49.952195] RAX: 00000000000000c0 RBX: ffff88001de03b80 RCX: 28f5c28f5c28f5c3 [ 49.952195] RDX: 0000000000000000 RSI: 00000000000000c0 RDI: 0000000000000000 ^^^^^^^^^^^^^^^^^^^^^ > > Is this because we forced the background threshold to be half of the dirty > threshold when its setting is greater? No (see above). -Andrea ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2009-05-01 19:56 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-04-29 8:29 [PATCH] mm: prevent divide error for small values of vm_dirty_bytes Andrea Righi 2009-04-29 8:44 ` Peter Zijlstra 2009-04-29 9:34 ` Andrea Righi 2009-04-29 20:02 ` David Rientjes 2009-04-29 21:46 ` Andrew Morton 2009-05-01 14:56 ` Andrea Righi 2009-05-01 19:53 ` Andrew Morton 2009-04-29 9:26 ` David Rientjes 2009-04-29 9:40 ` Andrea Righi
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox