* [PATCH] writeback: prevent bandwidth calculation overflow @ 2010-11-18 6:57 Wu Fengguang 2010-11-18 14:27 ` Rik van Riel 0 siblings, 1 reply; 13+ messages in thread From: Wu Fengguang @ 2010-11-18 6:57 UTC (permalink / raw) To: Andrew Morton Cc: Jan Kara, Li, Shaohua, Christoph Hellwig, Dave Chinner, Theodore Ts'o, Chris Mason, Peter Zijlstra, Mel Gorman, Rik van Riel, KOSAKI Motohiro, linux-mm, linux-fsdevel@vger.kernel.org, LKML On 32bit kernel, bdi->write_bandwidth can express at most 4GB/s. However the current calculation code can overflow when disk bandwidth reaches 800MB/s. Fix it by using "long long" and swapping the order of multiplication/division. And further, change its unit to pages/second rather than bytes/second. That allows up to 16TB/s bandwidth in 32bit kernel. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> --- mm/backing-dev.c | 4 ++-- mm/page-writeback.c | 13 ++++++------- 2 files changed, 8 insertions(+), 9 deletions(-) --- linux-next.orig/mm/page-writeback.c 2010-11-18 12:42:58.000000000 +0800 +++ linux-next/mm/page-writeback.c 2010-11-18 14:30:10.000000000 +0800 @@ -494,7 +494,7 @@ void bdi_update_write_bandwidth(struct b unsigned long written; unsigned long elapsed; unsigned long bw; - unsigned long w; + unsigned long long w; if (*bw_written == 0) goto snapshot; @@ -513,7 +513,7 @@ void bdi_update_write_bandwidth(struct b goto snapshot; written = percpu_counter_read(&bdi->bdi_stat[BDI_WRITTEN]) - *bw_written; - bw = (HZ * PAGE_CACHE_SIZE * written + elapsed/2) / elapsed; + bw = (HZ * written + elapsed/2) / elapsed; w = min(elapsed / unit_time, 128UL); bdi->write_bandwidth = (bdi->write_bandwidth * (1024-w) + bw * w) >> 10; bdi->write_bandwidth_update_time = jiffies; @@ -602,8 +602,7 @@ static void balance_dirty_pages(struct a * of dirty pages have been cleaned during our pause time. */ if (nr_dirty < dirty_thresh && - bdi_prev_dirty - bdi_dirty > - bdi->write_bandwidth >> (PAGE_CACHE_SHIFT + 2)) + bdi_prev_dirty - bdi_dirty > bdi->write_bandwidth / 4) break; bdi_prev_dirty = bdi_dirty; @@ -620,13 +619,13 @@ static void balance_dirty_pages(struct a * time when there are lots of dirtiers. */ bw = bdi->write_bandwidth; - bw = bw * (bdi_thresh - bdi_dirty); bw = bw / (bdi_thresh / BDI_SOFT_DIRTY_LIMIT + 1); + bw = bw * (bdi_thresh - bdi_dirty); - bw = bw * (task_thresh - bdi_dirty); bw = bw / (bdi_thresh / TASK_SOFT_DIRTY_LIMIT + 1); + bw = bw * (task_thresh - bdi_dirty); - pause = HZ * (pages_dirtied << PAGE_CACHE_SHIFT) / (bw + 1); + pause = HZ * pages_dirtied / (bw + 1); pause = clamp_val(pause, 1, HZ/10); pause: --- linux-next.orig/mm/backing-dev.c 2010-11-18 14:24:45.000000000 +0800 +++ linux-next/mm/backing-dev.c 2010-11-18 14:27:00.000000000 +0800 @@ -103,7 +103,7 @@ static int bdi_debug_stats_show(struct s (unsigned long) K(bdi_stat(bdi, BDI_RECLAIMABLE)), K(bdi_thresh), K(dirty_thresh), K(background_thresh), (unsigned long) K(bdi_stat(bdi, BDI_WRITTEN)), - (unsigned long) bdi->write_bandwidth >> 10, + (unsigned long) K(bdi->write_bandwidth), nr_dirty, nr_io, nr_more_io, !list_empty(&bdi->bdi_list), bdi->state); #undef K @@ -662,7 +662,7 @@ int bdi_init(struct backing_dev_info *bd goto err; } - bdi->write_bandwidth = 100 << 20; + bdi->write_bandwidth = (100 << 20) / PAGE_CACHE_SIZE; bdi->dirty_exceeded = 0; err = prop_local_init_percpu(&bdi->completions); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] writeback: prevent bandwidth calculation overflow 2010-11-18 6:57 [PATCH] writeback: prevent bandwidth calculation overflow Wu Fengguang @ 2010-11-18 14:27 ` Rik van Riel 2010-11-18 15:44 ` Wu Fengguang 0 siblings, 1 reply; 13+ messages in thread From: Rik van Riel @ 2010-11-18 14:27 UTC (permalink / raw) To: Wu Fengguang Cc: Andrew Morton, Jan Kara, Li, Shaohua, Christoph Hellwig, Dave Chinner, Theodore Ts'o, Chris Mason, Peter Zijlstra, Mel Gorman, KOSAKI Motohiro, linux-mm, linux-fsdevel@vger.kernel.org, LKML On 11/18/2010 01:57 AM, Wu Fengguang wrote: > On 32bit kernel, bdi->write_bandwidth can express at most 4GB/s. > > However the current calculation code can overflow when disk bandwidth > reaches 800MB/s. Fix it by using "long long" and swapping the order of > multiplication/division. And further, change its unit to pages/second > rather than bytes/second. That allows up to 16TB/s bandwidth in 32bit > kernel. > > Signed-off-by: Wu Fengguang<fengguang.wu@intel.com> Acked-by: Rik van Riel <riel@redhat.com> -- All rights reversed -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] writeback: prevent bandwidth calculation overflow 2010-11-18 14:27 ` Rik van Riel @ 2010-11-18 15:44 ` Wu Fengguang 2010-11-18 16:02 ` Peter Zijlstra 0 siblings, 1 reply; 13+ messages in thread From: Wu Fengguang @ 2010-11-18 15:44 UTC (permalink / raw) To: Rik van Riel Cc: Andrew Morton, Jan Kara, Li, Shaohua, Christoph Hellwig, Dave Chinner, Theodore Ts'o, Chris Mason, Peter Zijlstra, Mel Gorman, KOSAKI Motohiro, linux-mm, linux-fsdevel@vger.kernel.org, LKML On Thu, Nov 18, 2010 at 10:27:10PM +0800, Rik van Riel wrote: > On 11/18/2010 01:57 AM, Wu Fengguang wrote: > > On 32bit kernel, bdi->write_bandwidth can express at most 4GB/s. > > > > However the current calculation code can overflow when disk bandwidth > > reaches 800MB/s. Fix it by using "long long" and swapping the order of > > multiplication/division. And further, change its unit to pages/second > > rather than bytes/second. That allows up to 16TB/s bandwidth in 32bit > > kernel. > > > > Signed-off-by: Wu Fengguang<fengguang.wu@intel.com> > > Acked-by: Rik van Riel <riel@redhat.com> Thanks. I find that the swap of multiplication/division leads to underflow, so changed to use "long long". It's free for 64bit gcc anyway :) Thanks, Fengguang -- Subject: writeback: prevent bandwidth calculation overflow Date: Thu Nov 18 12:55:42 CST 2010 On 32bit kernel, bdi->write_bandwidth can express at most 4GB/s. However the current calculation code can overflow when disk bandwidth reaches 800MB/s. Fix it by using "long long" when doing calculations. And further change its unit from bytes/second to pages/second. That allows up to 16TB/s bandwidth in 32bit kernel. Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> --- mm/backing-dev.c | 4 ++-- mm/page-writeback.c | 11 +++++------ 2 files changed, 7 insertions(+), 8 deletions(-) --- linux-next.orig/mm/page-writeback.c 2010-11-18 12:42:58.000000000 +0800 +++ linux-next/mm/page-writeback.c 2010-11-18 22:40:18.000000000 +0800 @@ -494,7 +494,7 @@ void bdi_update_write_bandwidth(struct b unsigned long written; unsigned long elapsed; unsigned long bw; - unsigned long w; + unsigned long long w; if (*bw_written == 0) goto snapshot; @@ -513,7 +513,7 @@ void bdi_update_write_bandwidth(struct b goto snapshot; written = percpu_counter_read(&bdi->bdi_stat[BDI_WRITTEN]) - *bw_written; - bw = (HZ * PAGE_CACHE_SIZE * written + elapsed/2) / elapsed; + bw = (HZ * written + elapsed/2) / elapsed; w = min(elapsed / unit_time, 128UL); bdi->write_bandwidth = (bdi->write_bandwidth * (1024-w) + bw * w) >> 10; bdi->write_bandwidth_update_time = jiffies; @@ -539,7 +539,7 @@ static void balance_dirty_pages(struct a unsigned long dirty_thresh; unsigned long bdi_thresh; unsigned long task_thresh; - unsigned long bw; + unsigned long long bw; unsigned long pause = 0; bool dirty_exceeded = false; struct backing_dev_info *bdi = mapping->backing_dev_info; @@ -602,8 +602,7 @@ static void balance_dirty_pages(struct a * of dirty pages have been cleaned during our pause time. */ if (nr_dirty < dirty_thresh && - bdi_prev_dirty - bdi_dirty > - bdi->write_bandwidth >> (PAGE_CACHE_SHIFT + 2)) + bdi_prev_dirty - bdi_dirty > bdi->write_bandwidth / 4) break; bdi_prev_dirty = bdi_dirty; @@ -626,7 +625,7 @@ static void balance_dirty_pages(struct a bw = bw * (task_thresh - bdi_dirty); bw = bw / (bdi_thresh / TASK_SOFT_DIRTY_LIMIT + 1); - pause = HZ * (pages_dirtied << PAGE_CACHE_SHIFT) / (bw + 1); + pause = HZ * pages_dirtied / (bw + 1); pause = clamp_val(pause, 1, HZ/10); pause: --- linux-next.orig/mm/backing-dev.c 2010-11-18 14:24:45.000000000 +0800 +++ linux-next/mm/backing-dev.c 2010-11-18 14:27:00.000000000 +0800 @@ -103,7 +103,7 @@ static int bdi_debug_stats_show(struct s (unsigned long) K(bdi_stat(bdi, BDI_RECLAIMABLE)), K(bdi_thresh), K(dirty_thresh), K(background_thresh), (unsigned long) K(bdi_stat(bdi, BDI_WRITTEN)), - (unsigned long) bdi->write_bandwidth >> 10, + (unsigned long) K(bdi->write_bandwidth), nr_dirty, nr_io, nr_more_io, !list_empty(&bdi->bdi_list), bdi->state); #undef K @@ -662,7 +662,7 @@ int bdi_init(struct backing_dev_info *bd goto err; } - bdi->write_bandwidth = 100 << 20; + bdi->write_bandwidth = (100 << 20) / PAGE_CACHE_SIZE; bdi->dirty_exceeded = 0; err = prop_local_init_percpu(&bdi->completions); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] writeback: prevent bandwidth calculation overflow 2010-11-18 15:44 ` Wu Fengguang @ 2010-11-18 16:02 ` Peter Zijlstra 2010-11-18 16:06 ` Wu Fengguang 2010-11-18 16:13 ` Wu Fengguang 0 siblings, 2 replies; 13+ messages in thread From: Peter Zijlstra @ 2010-11-18 16:02 UTC (permalink / raw) To: Wu Fengguang Cc: Rik van Riel, Andrew Morton, Jan Kara, Li, Shaohua, Christoph Hellwig, Dave Chinner, Theodore Ts'o, Chris Mason, Mel Gorman, KOSAKI Motohiro, linux-mm, linux-fsdevel@vger.kernel.org, LKML On Thu, 2010-11-18 at 23:44 +0800, Wu Fengguang wrote: > + pause = HZ * pages_dirtied / (bw + 1); Shouldn't that be using something like div64_u64 ? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] writeback: prevent bandwidth calculation overflow 2010-11-18 16:02 ` Peter Zijlstra @ 2010-11-18 16:06 ` Wu Fengguang 2010-11-18 16:22 ` Wu Fengguang 2010-11-18 16:29 ` Peter Zijlstra 2010-11-18 16:13 ` Wu Fengguang 1 sibling, 2 replies; 13+ messages in thread From: Wu Fengguang @ 2010-11-18 16:06 UTC (permalink / raw) To: Peter Zijlstra Cc: Rik van Riel, Andrew Morton, Jan Kara, Li, Shaohua, Christoph Hellwig, Dave Chinner, Theodore Ts'o, Chris Mason, Mel Gorman, KOSAKI Motohiro, linux-mm, linux-fsdevel@vger.kernel.org, LKML On Fri, Nov 19, 2010 at 12:02:01AM +0800, Peter Zijlstra wrote: > On Thu, 2010-11-18 at 23:44 +0800, Wu Fengguang wrote: > > + pause = HZ * pages_dirtied / (bw + 1); > > Shouldn't that be using something like div64_u64 ? OK, but a dumb question: gcc cannot handle this implicitly? Thanks, Fengguang -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] writeback: prevent bandwidth calculation overflow 2010-11-18 16:06 ` Wu Fengguang @ 2010-11-18 16:22 ` Wu Fengguang 2010-11-18 16:36 ` Wu Fengguang 2010-11-18 16:29 ` Peter Zijlstra 1 sibling, 1 reply; 13+ messages in thread From: Wu Fengguang @ 2010-11-18 16:22 UTC (permalink / raw) To: Peter Zijlstra Cc: Rik van Riel, Andrew Morton, Jan Kara, Li, Shaohua, Christoph Hellwig, Dave Chinner, Theodore Ts'o, Chris Mason, Mel Gorman, KOSAKI Motohiro, linux-mm, linux-fsdevel@vger.kernel.org, LKML On Fri, Nov 19, 2010 at 12:06:52AM +0800, Wu Fengguang wrote: > On Fri, Nov 19, 2010 at 12:02:01AM +0800, Peter Zijlstra wrote: > > On Thu, 2010-11-18 at 23:44 +0800, Wu Fengguang wrote: > > > + pause = HZ * pages_dirtied / (bw + 1); > > > > Shouldn't that be using something like div64_u64 ? > > OK, but a dumb question: gcc cannot handle this implicitly? Hmm, I'm tempting to do pause = HZ * pages_dirtied / ((unsigned long)bw + 1); because here "bw" is guaranteed to be smaller than ULONG_MAX. Thanks, Fengguang -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] writeback: prevent bandwidth calculation overflow 2010-11-18 16:22 ` Wu Fengguang @ 2010-11-18 16:36 ` Wu Fengguang 0 siblings, 0 replies; 13+ messages in thread From: Wu Fengguang @ 2010-11-18 16:36 UTC (permalink / raw) To: Peter Zijlstra Cc: Rik van Riel, Andrew Morton, Jan Kara, Li, Shaohua, Christoph Hellwig, Dave Chinner, Theodore Ts'o, Chris Mason, Mel Gorman, KOSAKI Motohiro, linux-mm, linux-fsdevel@vger.kernel.org, LKML On Fri, Nov 19, 2010 at 12:22:22AM +0800, Wu Fengguang wrote: > On Fri, Nov 19, 2010 at 12:06:52AM +0800, Wu Fengguang wrote: > > On Fri, Nov 19, 2010 at 12:02:01AM +0800, Peter Zijlstra wrote: > > > On Thu, 2010-11-18 at 23:44 +0800, Wu Fengguang wrote: > > > > + pause = HZ * pages_dirtied / (bw + 1); > > > > > > Shouldn't that be using something like div64_u64 ? > > > > OK, but a dumb question: gcc cannot handle this implicitly? > > Hmm, I'm tempting to do > > pause = HZ * pages_dirtied / ((unsigned long)bw + 1); > > because here "bw" is guaranteed to be smaller than ULONG_MAX. Here it is :) --- Subject: writeback: prevent bandwidth calculation overflow Date: Thu Nov 18 12:55:42 CST 2010 On 32bit kernel, bdi->write_bandwidth can express at most 4GB/s. However the current calculation code can overflow when disk bandwidth reaches 800MB/s. Fix it by using "long long" and div64_u64() in the calculations. And further change its unit from bytes/second to pages/second. That allows up to 16TB/s bandwidth in 32bit kernel. CC: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> --- mm/backing-dev.c | 4 ++-- mm/page-writeback.c | 11 +++++------ 2 files changed, 7 insertions(+), 8 deletions(-) --- linux-next.orig/mm/page-writeback.c 2010-11-18 12:42:58.000000000 +0800 +++ linux-next/mm/page-writeback.c 2010-11-19 00:22:39.000000000 +0800 @@ -494,7 +494,7 @@ void bdi_update_write_bandwidth(struct b unsigned long written; unsigned long elapsed; unsigned long bw; - unsigned long w; + unsigned long long w; if (*bw_written == 0) goto snapshot; @@ -513,7 +513,7 @@ void bdi_update_write_bandwidth(struct b goto snapshot; written = percpu_counter_read(&bdi->bdi_stat[BDI_WRITTEN]) - *bw_written; - bw = (HZ * PAGE_CACHE_SIZE * written + elapsed/2) / elapsed; + bw = (HZ * written + elapsed/2) / elapsed; w = min(elapsed / unit_time, 128UL); bdi->write_bandwidth = (bdi->write_bandwidth * (1024-w) + bw * w) >> 10; bdi->write_bandwidth_update_time = jiffies; @@ -539,7 +539,7 @@ static void balance_dirty_pages(struct a unsigned long dirty_thresh; unsigned long bdi_thresh; unsigned long task_thresh; - unsigned long bw; + unsigned long long bw; unsigned long pause = 0; bool dirty_exceeded = false; struct backing_dev_info *bdi = mapping->backing_dev_info; @@ -602,8 +602,7 @@ static void balance_dirty_pages(struct a * of dirty pages have been cleaned during our pause time. */ if (nr_dirty < dirty_thresh && - bdi_prev_dirty - bdi_dirty > - bdi->write_bandwidth >> (PAGE_CACHE_SHIFT + 2)) + bdi_prev_dirty - bdi_dirty > bdi->write_bandwidth / 4) break; bdi_prev_dirty = bdi_dirty; @@ -626,7 +625,7 @@ static void balance_dirty_pages(struct a bw = bw * (task_thresh - bdi_dirty); bw = bw / (bdi_thresh / TASK_SOFT_DIRTY_LIMIT + 1); - pause = HZ * (pages_dirtied << PAGE_CACHE_SHIFT) / (bw + 1); + pause = HZ * pages_dirtied / ((unsigned long)bw + 1); pause = clamp_val(pause, 1, HZ/10); pause: --- linux-next.orig/mm/backing-dev.c 2010-11-18 14:24:45.000000000 +0800 +++ linux-next/mm/backing-dev.c 2010-11-18 14:27:00.000000000 +0800 @@ -103,7 +103,7 @@ static int bdi_debug_stats_show(struct s (unsigned long) K(bdi_stat(bdi, BDI_RECLAIMABLE)), K(bdi_thresh), K(dirty_thresh), K(background_thresh), (unsigned long) K(bdi_stat(bdi, BDI_WRITTEN)), - (unsigned long) bdi->write_bandwidth >> 10, + (unsigned long) K(bdi->write_bandwidth), nr_dirty, nr_io, nr_more_io, !list_empty(&bdi->bdi_list), bdi->state); #undef K @@ -662,7 +662,7 @@ int bdi_init(struct backing_dev_info *bd goto err; } - bdi->write_bandwidth = 100 << 20; + bdi->write_bandwidth = (100 << 20) / PAGE_CACHE_SIZE; bdi->dirty_exceeded = 0; err = prop_local_init_percpu(&bdi->completions); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] writeback: prevent bandwidth calculation overflow 2010-11-18 16:06 ` Wu Fengguang 2010-11-18 16:22 ` Wu Fengguang @ 2010-11-18 16:29 ` Peter Zijlstra 2010-11-18 16:34 ` Wu Fengguang 1 sibling, 1 reply; 13+ messages in thread From: Peter Zijlstra @ 2010-11-18 16:29 UTC (permalink / raw) To: Wu Fengguang Cc: Rik van Riel, Andrew Morton, Jan Kara, Li, Shaohua, Christoph Hellwig, Dave Chinner, Theodore Ts'o, Chris Mason, Mel Gorman, KOSAKI Motohiro, linux-mm, linux-fsdevel@vger.kernel.org, LKML On Fri, 2010-11-19 at 00:06 +0800, Wu Fengguang wrote: > On Fri, Nov 19, 2010 at 12:02:01AM +0800, Peter Zijlstra wrote: > > On Thu, 2010-11-18 at 23:44 +0800, Wu Fengguang wrote: > > > + pause = HZ * pages_dirtied / (bw + 1); > > > > Shouldn't that be using something like div64_u64 ? > > OK, but a dumb question: gcc cannot handle this implicitly? it could, but we chose not to implement the symbol it emits for these things so as to cause pain.. that was still assuming the world of 32bit computing was relevant and 64bit divides were expensive ;-) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] writeback: prevent bandwidth calculation overflow 2010-11-18 16:29 ` Peter Zijlstra @ 2010-11-18 16:34 ` Wu Fengguang 0 siblings, 0 replies; 13+ messages in thread From: Wu Fengguang @ 2010-11-18 16:34 UTC (permalink / raw) To: Peter Zijlstra Cc: Rik van Riel, Andrew Morton, Jan Kara, Li, Shaohua, Christoph Hellwig, Dave Chinner, Theodore Ts'o, Chris Mason, Mel Gorman, KOSAKI Motohiro, linux-mm, linux-fsdevel@vger.kernel.org, LKML On Fri, Nov 19, 2010 at 12:29:00AM +0800, Peter Zijlstra wrote: > On Fri, 2010-11-19 at 00:06 +0800, Wu Fengguang wrote: > > On Fri, Nov 19, 2010 at 12:02:01AM +0800, Peter Zijlstra wrote: > > > On Thu, 2010-11-18 at 23:44 +0800, Wu Fengguang wrote: > > > > + pause = HZ * pages_dirtied / (bw + 1); > > > > > > Shouldn't that be using something like div64_u64 ? > > > > OK, but a dumb question: gcc cannot handle this implicitly? > > it could, but we chose not to implement the symbol it emits for these > things so as to cause pain.. that was still assuming the world of 32bit > computing was relevant and 64bit divides were expensive ;-) Good to know that, thanks! So let's avoid it totally :) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] writeback: prevent bandwidth calculation overflow 2010-11-18 16:02 ` Peter Zijlstra 2010-11-18 16:06 ` Wu Fengguang @ 2010-11-18 16:13 ` Wu Fengguang 2010-11-19 16:06 ` Michal Hocko 1 sibling, 1 reply; 13+ messages in thread From: Wu Fengguang @ 2010-11-18 16:13 UTC (permalink / raw) To: Peter Zijlstra Cc: Rik van Riel, Andrew Morton, Jan Kara, Li, Shaohua, Christoph Hellwig, Dave Chinner, Theodore Ts'o, Chris Mason, Mel Gorman, KOSAKI Motohiro, linux-mm, linux-fsdevel@vger.kernel.org, LKML On Fri, Nov 19, 2010 at 12:02:01AM +0800, Peter Zijlstra wrote: > On Thu, 2010-11-18 at 23:44 +0800, Wu Fengguang wrote: > > + pause = HZ * pages_dirtied / (bw + 1); > > Shouldn't that be using something like div64_u64 ? Thanks for review. Here is the updated patch using div64_u64(). --- Subject: writeback: prevent bandwidth calculation overflow Date: Thu Nov 18 12:55:42 CST 2010 On 32bit kernel, bdi->write_bandwidth can express at most 4GB/s. However the current calculation code can overflow when disk bandwidth reaches 800MB/s. Fix it by using "long long" and div64_u64() in the calculations. And further change its unit from bytes/second to pages/second. That allows up to 16TB/s bandwidth in 32bit kernel. CC: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> --- mm/backing-dev.c | 4 ++-- mm/page-writeback.c | 11 +++++------ 2 files changed, 7 insertions(+), 8 deletions(-) --- linux-next.orig/mm/page-writeback.c 2010-11-18 12:42:58.000000000 +0800 +++ linux-next/mm/page-writeback.c 2010-11-19 00:08:23.000000000 +0800 @@ -494,7 +494,7 @@ void bdi_update_write_bandwidth(struct b unsigned long written; unsigned long elapsed; unsigned long bw; - unsigned long w; + unsigned long long w; if (*bw_written == 0) goto snapshot; @@ -513,7 +513,7 @@ void bdi_update_write_bandwidth(struct b goto snapshot; written = percpu_counter_read(&bdi->bdi_stat[BDI_WRITTEN]) - *bw_written; - bw = (HZ * PAGE_CACHE_SIZE * written + elapsed/2) / elapsed; + bw = (HZ * written + elapsed/2) / elapsed; w = min(elapsed / unit_time, 128UL); bdi->write_bandwidth = (bdi->write_bandwidth * (1024-w) + bw * w) >> 10; bdi->write_bandwidth_update_time = jiffies; @@ -539,7 +539,7 @@ static void balance_dirty_pages(struct a unsigned long dirty_thresh; unsigned long bdi_thresh; unsigned long task_thresh; - unsigned long bw; + unsigned long long bw; unsigned long pause = 0; bool dirty_exceeded = false; struct backing_dev_info *bdi = mapping->backing_dev_info; @@ -602,8 +602,7 @@ static void balance_dirty_pages(struct a * of dirty pages have been cleaned during our pause time. */ if (nr_dirty < dirty_thresh && - bdi_prev_dirty - bdi_dirty > - bdi->write_bandwidth >> (PAGE_CACHE_SHIFT + 2)) + bdi_prev_dirty - bdi_dirty > bdi->write_bandwidth / 4) break; bdi_prev_dirty = bdi_dirty; @@ -626,7 +625,7 @@ static void balance_dirty_pages(struct a bw = bw * (task_thresh - bdi_dirty); bw = bw / (bdi_thresh / TASK_SOFT_DIRTY_LIMIT + 1); - pause = HZ * (pages_dirtied << PAGE_CACHE_SHIFT) / (bw + 1); + pause = div64_u64(HZ * pages_dirtied, bw + 1); pause = clamp_val(pause, 1, HZ/10); pause: --- linux-next.orig/mm/backing-dev.c 2010-11-18 14:24:45.000000000 +0800 +++ linux-next/mm/backing-dev.c 2010-11-18 14:27:00.000000000 +0800 @@ -103,7 +103,7 @@ static int bdi_debug_stats_show(struct s (unsigned long) K(bdi_stat(bdi, BDI_RECLAIMABLE)), K(bdi_thresh), K(dirty_thresh), K(background_thresh), (unsigned long) K(bdi_stat(bdi, BDI_WRITTEN)), - (unsigned long) bdi->write_bandwidth >> 10, + (unsigned long) K(bdi->write_bandwidth), nr_dirty, nr_io, nr_more_io, !list_empty(&bdi->bdi_list), bdi->state); #undef K @@ -662,7 +662,7 @@ int bdi_init(struct backing_dev_info *bd goto err; } - bdi->write_bandwidth = 100 << 20; + bdi->write_bandwidth = (100 << 20) / PAGE_CACHE_SIZE; bdi->dirty_exceeded = 0; err = prop_local_init_percpu(&bdi->completions); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] writeback: prevent bandwidth calculation overflow 2010-11-18 16:13 ` Wu Fengguang @ 2010-11-19 16:06 ` Michal Hocko 2010-11-19 18:44 ` Wu Fengguang 0 siblings, 1 reply; 13+ messages in thread From: Michal Hocko @ 2010-11-19 16:06 UTC (permalink / raw) To: Wu Fengguang Cc: Peter Zijlstra, Rik van Riel, Andrew Morton, Jan Kara, Li, Shaohua, Christoph Hellwig, Dave Chinner, Theodore Ts'o, Chris Mason, Mel Gorman, KOSAKI Motohiro, linux-mm, linux-fsdevel@vger.kernel.org, LKML On Fri 19-11-10 00:13:56, Wu Fengguang wrote: > On Fri, Nov 19, 2010 at 12:02:01AM +0800, Peter Zijlstra wrote: > > On Thu, 2010-11-18 at 23:44 +0800, Wu Fengguang wrote: > > > + pause = HZ * pages_dirtied / (bw + 1); > > > > Shouldn't that be using something like div64_u64 ? > > Thanks for review. Here is the updated patch using div64_u64(). > > --- > Subject: writeback: prevent bandwidth calculation overflow > Date: Thu Nov 18 12:55:42 CST 2010 > > On 32bit kernel, bdi->write_bandwidth can express at most 4GB/s. > > However the current calculation code can overflow when disk bandwidth > reaches 800MB/s. Fix it by using "long long" and div64_u64() in the > calculations. > > And further change its unit from bytes/second to pages/second. > That allows up to 16TB/s bandwidth in 32bit kernel. > > CC: Peter Zijlstra <a.p.zijlstra@chello.nl> > Acked-by: Rik van Riel <riel@redhat.com> > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> > --- > mm/backing-dev.c | 4 ++-- > mm/page-writeback.c | 11 +++++------ > 2 files changed, 7 insertions(+), 8 deletions(-) > > --- linux-next.orig/mm/page-writeback.c 2010-11-18 12:42:58.000000000 +0800 > +++ linux-next/mm/page-writeback.c 2010-11-19 00:08:23.000000000 +0800 > @@ -494,7 +494,7 @@ void bdi_update_write_bandwidth(struct b > unsigned long written; > unsigned long elapsed; > unsigned long bw; > - unsigned long w; > + unsigned long long w; > > if (*bw_written == 0) > goto snapshot; > @@ -513,7 +513,7 @@ void bdi_update_write_bandwidth(struct b > goto snapshot; > > written = percpu_counter_read(&bdi->bdi_stat[BDI_WRITTEN]) - *bw_written; > - bw = (HZ * PAGE_CACHE_SIZE * written + elapsed/2) / elapsed; > + bw = (HZ * written + elapsed/2) / elapsed; Sorry for a dumb question, but where did PAGE_CACHE_SIZE part go? > w = min(elapsed / unit_time, 128UL); > bdi->write_bandwidth = (bdi->write_bandwidth * (1024-w) + bw * w) >> 10; > bdi->write_bandwidth_update_time = jiffies; -- Michal Hocko L3 team SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] writeback: prevent bandwidth calculation overflow 2010-11-19 16:06 ` Michal Hocko @ 2010-11-19 18:44 ` Wu Fengguang 2010-11-22 9:54 ` Michal Hocko 0 siblings, 1 reply; 13+ messages in thread From: Wu Fengguang @ 2010-11-19 18:44 UTC (permalink / raw) To: Michal Hocko Cc: Peter Zijlstra, Rik van Riel, Andrew Morton, Jan Kara, Li, Shaohua, Christoph Hellwig, Dave Chinner, Theodore Ts'o, Chris Mason, Mel Gorman, KOSAKI Motohiro, linux-mm, linux-fsdevel@vger.kernel.org, LKML On Sat, Nov 20, 2010 at 12:06:53AM +0800, Michal Hocko wrote: > On Fri 19-11-10 00:13:56, Wu Fengguang wrote: > > On Fri, Nov 19, 2010 at 12:02:01AM +0800, Peter Zijlstra wrote: > > > On Thu, 2010-11-18 at 23:44 +0800, Wu Fengguang wrote: > > > > + pause = HZ * pages_dirtied / (bw + 1); > > > > > > Shouldn't that be using something like div64_u64 ? > > > > Thanks for review. Here is the updated patch using div64_u64(). > > > > --- > > Subject: writeback: prevent bandwidth calculation overflow > > Date: Thu Nov 18 12:55:42 CST 2010 > > > > On 32bit kernel, bdi->write_bandwidth can express at most 4GB/s. > > > > However the current calculation code can overflow when disk bandwidth > > reaches 800MB/s. Fix it by using "long long" and div64_u64() in the > > calculations. > > > > And further change its unit from bytes/second to pages/second. > > That allows up to 16TB/s bandwidth in 32bit kernel. > > > > CC: Peter Zijlstra <a.p.zijlstra@chello.nl> > > Acked-by: Rik van Riel <riel@redhat.com> > > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> > > --- > > mm/backing-dev.c | 4 ++-- > > mm/page-writeback.c | 11 +++++------ > > 2 files changed, 7 insertions(+), 8 deletions(-) > > > > --- linux-next.orig/mm/page-writeback.c 2010-11-18 12:42:58.000000000 +0800 > > +++ linux-next/mm/page-writeback.c 2010-11-19 00:08:23.000000000 +0800 > > @@ -494,7 +494,7 @@ void bdi_update_write_bandwidth(struct b > > unsigned long written; > > unsigned long elapsed; > > unsigned long bw; > > - unsigned long w; > > + unsigned long long w; > > > > if (*bw_written == 0) > > goto snapshot; > > @@ -513,7 +513,7 @@ void bdi_update_write_bandwidth(struct b > > goto snapshot; > > > > written = percpu_counter_read(&bdi->bdi_stat[BDI_WRITTEN]) - *bw_written; > > - bw = (HZ * PAGE_CACHE_SIZE * written + elapsed/2) / elapsed; > > + bw = (HZ * written + elapsed/2) / elapsed; > > Sorry for a dumb question, but where did PAGE_CACHE_SIZE part go? Because write_bandwidth's unit is bumped from bytes/s to pages/s, so that it can express much higher bandwidth. I've updated the patch again, and will submit more fixes tomorrow. Thanks, Fengguang --- Subject: writeback: prevent bandwidth calculation overflow Date: Thu Nov 18 12:55:42 CST 2010 On 32bit kernel, bdi->write_bandwidth can express at most 4GB/s. However the current calculation code can overflow when disk bandwidth reaches 800MB/s. Fix it by using "long long" and div64_u64() in the calculations. And further change its unit from bytes/second to pages/second. That allows up to 16TB/s bandwidth in 32bit kernel. CC: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> --- include/linux/backing-dev.h | 5 +++-- mm/backing-dev.c | 4 ++-- mm/page-writeback.c | 14 +++++++------- 3 files changed, 12 insertions(+), 11 deletions(-) --- linux-next.orig/mm/page-writeback.c 2010-11-19 09:56:14.000000000 +0800 +++ linux-next/mm/page-writeback.c 2010-11-19 22:18:57.000000000 +0800 @@ -494,7 +494,7 @@ void bdi_update_write_bandwidth(struct b unsigned long written; unsigned long elapsed; unsigned long bw; - unsigned long w; + unsigned long long w; if (*bw_written == 0) goto snapshot; @@ -513,9 +513,10 @@ void bdi_update_write_bandwidth(struct b goto snapshot; written = percpu_counter_read(&bdi->bdi_stat[BDI_WRITTEN]) - *bw_written; - bw = (HZ * PAGE_CACHE_SIZE * written + elapsed/2) / elapsed; + bw = (HZ * written + elapsed / 2) / elapsed; w = min(elapsed / unit_time, 128UL); - bdi->write_bandwidth = (bdi->write_bandwidth * (1024-w) + bw * w) >> 10; + bdi->write_bandwidth = (bdi->write_bandwidth * (1024-w) + + bw * w + 1023) >> 10; bdi->write_bandwidth_update_time = jiffies; snapshot: *bw_written = percpu_counter_read(&bdi->bdi_stat[BDI_WRITTEN]); @@ -539,7 +540,7 @@ static void balance_dirty_pages(struct a unsigned long dirty_thresh; unsigned long bdi_thresh; unsigned long task_thresh; - unsigned long bw; + unsigned long long bw; unsigned long pause = 0; bool dirty_exceeded = false; struct backing_dev_info *bdi = mapping->backing_dev_info; @@ -602,8 +603,7 @@ static void balance_dirty_pages(struct a * of dirty pages have been cleaned during our pause time. */ if (nr_dirty < dirty_thresh && - bdi_prev_dirty - bdi_dirty > - bdi->write_bandwidth >> (PAGE_CACHE_SHIFT + 2)) + bdi_prev_dirty - bdi_dirty > (long)bdi->write_bandwidth / 4) break; bdi_prev_dirty = bdi_dirty; @@ -626,7 +626,7 @@ static void balance_dirty_pages(struct a bw = bw * (task_thresh - bdi_dirty); bw = bw / (bdi_thresh / TASK_SOFT_DIRTY_LIMIT + 1); - pause = HZ * (pages_dirtied << PAGE_CACHE_SHIFT) / (bw + 1); + pause = HZ * pages_dirtied / ((unsigned long)bw + 1); pause = clamp_val(pause, 1, HZ/10); pause: --- linux-next.orig/mm/backing-dev.c 2010-11-19 09:56:14.000000000 +0800 +++ linux-next/mm/backing-dev.c 2010-11-19 22:18:19.000000000 +0800 @@ -103,7 +103,7 @@ static int bdi_debug_stats_show(struct s (unsigned long) K(bdi_stat(bdi, BDI_RECLAIMABLE)), K(bdi_thresh), K(dirty_thresh), K(background_thresh), (unsigned long) K(bdi_stat(bdi, BDI_WRITTEN)), - (unsigned long) bdi->write_bandwidth >> 10, + (unsigned long) K(bdi->write_bandwidth), nr_dirty, nr_io, nr_more_io, !list_empty(&bdi->bdi_list), bdi->state); #undef K @@ -662,7 +662,7 @@ int bdi_init(struct backing_dev_info *bd goto err; } - bdi->write_bandwidth = 100 << 20; + bdi->write_bandwidth = (100 << 20) / PAGE_CACHE_SIZE; bdi->dirty_exceeded = 0; err = prop_local_init_percpu(&bdi->completions); --- linux-next.orig/include/linux/backing-dev.h 2010-11-19 19:43:05.000000000 +0800 +++ linux-next/include/linux/backing-dev.h 2010-11-19 22:18:19.000000000 +0800 @@ -74,9 +74,10 @@ struct backing_dev_info { struct percpu_counter bdi_stat[NR_BDI_STAT_ITEMS]; - struct prop_local_percpu completions; + unsigned long write_bandwidth; unsigned long write_bandwidth_update_time; - int write_bandwidth; + + struct prop_local_percpu completions; int dirty_exceeded; unsigned int min_ratio; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] writeback: prevent bandwidth calculation overflow 2010-11-19 18:44 ` Wu Fengguang @ 2010-11-22 9:54 ` Michal Hocko 0 siblings, 0 replies; 13+ messages in thread From: Michal Hocko @ 2010-11-22 9:54 UTC (permalink / raw) To: Wu Fengguang Cc: Peter Zijlstra, Rik van Riel, Andrew Morton, Jan Kara, Li, Shaohua, Christoph Hellwig, Dave Chinner, Theodore Ts'o, Chris Mason, Mel Gorman, KOSAKI Motohiro, linux-mm, linux-fsdevel@vger.kernel.org, LKML On Sat 20-11-10 02:44:08, Wu Fengguang wrote: > On Sat, Nov 20, 2010 at 12:06:53AM +0800, Michal Hocko wrote: > > On Fri 19-11-10 00:13:56, Wu Fengguang wrote: > > > On Fri, Nov 19, 2010 at 12:02:01AM +0800, Peter Zijlstra wrote: > > > > On Thu, 2010-11-18 at 23:44 +0800, Wu Fengguang wrote: > > > > > + pause = HZ * pages_dirtied / (bw + 1); > > > > > > > > Shouldn't that be using something like div64_u64 ? > > > > > > Thanks for review. Here is the updated patch using div64_u64(). > > > > > > --- > > > Subject: writeback: prevent bandwidth calculation overflow > > > Date: Thu Nov 18 12:55:42 CST 2010 > > > > > > On 32bit kernel, bdi->write_bandwidth can express at most 4GB/s. > > > > > > However the current calculation code can overflow when disk bandwidth > > > reaches 800MB/s. Fix it by using "long long" and div64_u64() in the > > > calculations. > > > > > > And further change its unit from bytes/second to pages/second. > > > That allows up to 16TB/s bandwidth in 32bit kernel. > > > > > > CC: Peter Zijlstra <a.p.zijlstra@chello.nl> > > > Acked-by: Rik van Riel <riel@redhat.com> > > > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> > > > --- > > > mm/backing-dev.c | 4 ++-- > > > mm/page-writeback.c | 11 +++++------ > > > 2 files changed, 7 insertions(+), 8 deletions(-) > > > > > > --- linux-next.orig/mm/page-writeback.c 2010-11-18 12:42:58.000000000 +0800 > > > +++ linux-next/mm/page-writeback.c 2010-11-19 00:08:23.000000000 +0800 > > > @@ -494,7 +494,7 @@ void bdi_update_write_bandwidth(struct b > > > unsigned long written; > > > unsigned long elapsed; > > > unsigned long bw; > > > - unsigned long w; > > > + unsigned long long w; > > > > > > if (*bw_written == 0) > > > goto snapshot; > > > @@ -513,7 +513,7 @@ void bdi_update_write_bandwidth(struct b > > > goto snapshot; > > > > > > written = percpu_counter_read(&bdi->bdi_stat[BDI_WRITTEN]) - *bw_written; > > > - bw = (HZ * PAGE_CACHE_SIZE * written + elapsed/2) / elapsed; > > > + bw = (HZ * written + elapsed/2) / elapsed; > > > > Sorry for a dumb question, but where did PAGE_CACHE_SIZE part go? > > Because write_bandwidth's unit is bumped from bytes/s to pages/s, > so that it can express much higher bandwidth. Ahh, I can see it now. Thanks -- Michal Hocko L3 team SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2010-11-22 9:54 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-11-18 6:57 [PATCH] writeback: prevent bandwidth calculation overflow Wu Fengguang 2010-11-18 14:27 ` Rik van Riel 2010-11-18 15:44 ` Wu Fengguang 2010-11-18 16:02 ` Peter Zijlstra 2010-11-18 16:06 ` Wu Fengguang 2010-11-18 16:22 ` Wu Fengguang 2010-11-18 16:36 ` Wu Fengguang 2010-11-18 16:29 ` Peter Zijlstra 2010-11-18 16:34 ` Wu Fengguang 2010-11-18 16:13 ` Wu Fengguang 2010-11-19 16:06 ` Michal Hocko 2010-11-19 18:44 ` Wu Fengguang 2010-11-22 9:54 ` Michal Hocko
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).