* Re: [RFC][PATCH 7/7] writeback: balance_dirty_pages() shall write more than dirtied pages [not found] ` <20090909150601.159061863@intel.com> @ 2009-09-09 15:44 ` Jan Kara 2009-09-10 1:42 ` Wu Fengguang 0 siblings, 1 reply; 19+ messages in thread From: Jan Kara @ 2009-09-09 15:44 UTC (permalink / raw) To: Wu Fengguang Cc: Andrew Morton, Jens Axboe, Dave Chinner, Chris Mason, Peter Zijlstra, Christoph Hellwig, jack, Artem Bityutskiy, LKML, linux-fsdevel On Wed 09-09-09 22:51:48, Wu Fengguang wrote: > Some filesystem may choose to write much more than ratelimit_pages > before calling balance_dirty_pages_ratelimited_nr(). So it is safer to > determine number to write based on real number of dirtied pages. > > The increased write_chunk may make the dirtier more bumpy. This is > filesystem writers' duty not to dirty too much at a time without > checking the ratelimit. I don't get this. balance_dirty_pages_ratelimited_nr() is called when we dirty the page, not when we write it out. So a problem would only happen if filesystem dirties pages by set_page_dirty() and won't call balance_dirty_pages_ratelimited_nr(). But e.g. generic_perform_write() and do_wp_page() takes care of that. So where's the problem? Honza > --- > mm/page-writeback.c | 13 +++++++------ > 1 file changed, 7 insertions(+), 6 deletions(-) > > --- linux.orig/mm/page-writeback.c 2009-09-09 21:19:21.000000000 +0800 > +++ linux/mm/page-writeback.c 2009-09-09 21:25:45.000000000 +0800 > @@ -44,12 +44,12 @@ static long ratelimit_pages = 32; > /* > * When balance_dirty_pages decides that the caller needs to perform some > * non-background writeback, this is how many pages it will attempt to write. > - * It should be somewhat larger than RATELIMIT_PAGES to ensure that reasonably > + * It should be somewhat larger than dirtied pages to ensure that reasonably > * large amounts of I/O are submitted. > */ > -static inline long sync_writeback_pages(void) > +static inline long sync_writeback_pages(unsigned long dirtied) > { > - return ratelimit_pages + ratelimit_pages / 2; > + return dirtied + dirtied / 2; > } > > /* The following parameters are exported via /proc/sys/vm */ > @@ -476,7 +476,8 @@ get_dirty_limits(unsigned long *pbackgro > * If we're over `background_thresh' then pdflush is woken to perform some > * writeout. > */ > -static void balance_dirty_pages(struct address_space *mapping) > +static void balance_dirty_pages(struct address_space *mapping, > + unsigned long write_chunk) > { > long nr_reclaimable, bdi_nr_reclaimable; > long nr_writeback, bdi_nr_writeback; > @@ -484,7 +485,6 @@ static void balance_dirty_pages(struct a > unsigned long dirty_thresh; > unsigned long bdi_thresh; > unsigned long pages_written = 0; > - unsigned long write_chunk = sync_writeback_pages(); > > struct backing_dev_info *bdi = mapping->backing_dev_info; > > @@ -638,9 +638,10 @@ void balance_dirty_pages_ratelimited_nr( > p = &__get_cpu_var(bdp_ratelimits); > *p += nr_pages_dirtied; > if (unlikely(*p >= ratelimit)) { > + ratelimit = sync_writeback_pages(*p); > *p = 0; > preempt_enable(); > - balance_dirty_pages(mapping); > + balance_dirty_pages(mapping, ratelimit); > return; > } > preempt_enable(); > > -- > -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PATCH 7/7] writeback: balance_dirty_pages() shall write more than dirtied pages 2009-09-09 15:44 ` [RFC][PATCH 7/7] writeback: balance_dirty_pages() shall write more than dirtied pages Jan Kara @ 2009-09-10 1:42 ` Wu Fengguang 2009-09-10 12:57 ` Chris Mason 0 siblings, 1 reply; 19+ messages in thread From: Wu Fengguang @ 2009-09-10 1:42 UTC (permalink / raw) To: Jan Kara Cc: Andrew Morton, Jens Axboe, Dave Chinner, Chris Mason, Peter Zijlstra, Christoph Hellwig, Artem Bityutskiy, LKML, linux-fsdevel@vger.kernel.org On Wed, Sep 09, 2009 at 11:44:13PM +0800, Jan Kara wrote: > On Wed 09-09-09 22:51:48, Wu Fengguang wrote: > > Some filesystem may choose to write much more than ratelimit_pages > > before calling balance_dirty_pages_ratelimited_nr(). So it is safer to > > determine number to write based on real number of dirtied pages. > > > > The increased write_chunk may make the dirtier more bumpy. This is > > filesystem writers' duty not to dirty too much at a time without > > checking the ratelimit. > I don't get this. balance_dirty_pages_ratelimited_nr() is called when we > dirty the page, not when we write it out. So a problem would only happen if > filesystem dirties pages by set_page_dirty() and won't call > balance_dirty_pages_ratelimited_nr(). But e.g. generic_perform_write() > and do_wp_page() takes care of that. So where's the problem? It seems that btrfs_file_write() is writing in chunks of up to 1024-pages (1024 is the computed nrptrs value in a 32bit kernel). And it calls balance_dirty_pages_ratelimited_nr() each time it dirtied such a chunk. Thanks, Fengguang > > --- > > mm/page-writeback.c | 13 +++++++------ > > 1 file changed, 7 insertions(+), 6 deletions(-) > > > > --- linux.orig/mm/page-writeback.c 2009-09-09 21:19:21.000000000 +0800 > > +++ linux/mm/page-writeback.c 2009-09-09 21:25:45.000000000 +0800 > > @@ -44,12 +44,12 @@ static long ratelimit_pages = 32; > > /* > > * When balance_dirty_pages decides that the caller needs to perform some > > * non-background writeback, this is how many pages it will attempt to write. > > - * It should be somewhat larger than RATELIMIT_PAGES to ensure that reasonably > > + * It should be somewhat larger than dirtied pages to ensure that reasonably > > * large amounts of I/O are submitted. > > */ > > -static inline long sync_writeback_pages(void) > > +static inline long sync_writeback_pages(unsigned long dirtied) > > { > > - return ratelimit_pages + ratelimit_pages / 2; > > + return dirtied + dirtied / 2; > > } > > > > /* The following parameters are exported via /proc/sys/vm */ > > @@ -476,7 +476,8 @@ get_dirty_limits(unsigned long *pbackgro > > * If we're over `background_thresh' then pdflush is woken to perform some > > * writeout. > > */ > > -static void balance_dirty_pages(struct address_space *mapping) > > +static void balance_dirty_pages(struct address_space *mapping, > > + unsigned long write_chunk) > > { > > long nr_reclaimable, bdi_nr_reclaimable; > > long nr_writeback, bdi_nr_writeback; > > @@ -484,7 +485,6 @@ static void balance_dirty_pages(struct a > > unsigned long dirty_thresh; > > unsigned long bdi_thresh; > > unsigned long pages_written = 0; > > - unsigned long write_chunk = sync_writeback_pages(); > > > > struct backing_dev_info *bdi = mapping->backing_dev_info; > > > > @@ -638,9 +638,10 @@ void balance_dirty_pages_ratelimited_nr( > > p = &__get_cpu_var(bdp_ratelimits); > > *p += nr_pages_dirtied; > > if (unlikely(*p >= ratelimit)) { > > + ratelimit = sync_writeback_pages(*p); > > *p = 0; > > preempt_enable(); > > - balance_dirty_pages(mapping); > > + balance_dirty_pages(mapping, ratelimit); > > return; > > } > > preempt_enable(); > > > > -- > > > -- > Jan Kara <jack@suse.cz> > SUSE Labs, CR ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PATCH 7/7] writeback: balance_dirty_pages() shall write more than dirtied pages 2009-09-10 1:42 ` Wu Fengguang @ 2009-09-10 12:57 ` Chris Mason 2009-09-10 13:21 ` Wu Fengguang 0 siblings, 1 reply; 19+ messages in thread From: Chris Mason @ 2009-09-10 12:57 UTC (permalink / raw) To: Wu Fengguang Cc: Jan Kara, Andrew Morton, Jens Axboe, Dave Chinner, Peter Zijlstra, Christoph Hellwig, Artem Bityutskiy, LKML, linux-fsdevel@vger.kernel.org On Thu, Sep 10, 2009 at 09:42:01AM +0800, Wu Fengguang wrote: > On Wed, Sep 09, 2009 at 11:44:13PM +0800, Jan Kara wrote: > > On Wed 09-09-09 22:51:48, Wu Fengguang wrote: > > > Some filesystem may choose to write much more than ratelimit_pages > > > before calling balance_dirty_pages_ratelimited_nr(). So it is safer to > > > determine number to write based on real number of dirtied pages. > > > > > > The increased write_chunk may make the dirtier more bumpy. This is > > > filesystem writers' duty not to dirty too much at a time without > > > checking the ratelimit. > > I don't get this. balance_dirty_pages_ratelimited_nr() is called when we > > dirty the page, not when we write it out. So a problem would only happen if > > filesystem dirties pages by set_page_dirty() and won't call > > balance_dirty_pages_ratelimited_nr(). But e.g. generic_perform_write() > > and do_wp_page() takes care of that. So where's the problem? > > It seems that btrfs_file_write() is writing in chunks of up to 1024-pages > (1024 is the computed nrptrs value in a 32bit kernel). And it calls > balance_dirty_pages_ratelimited_nr() each time it dirtied such a chunk. I can easily change this to call more often, but we do always call balance_dirty_pages to reflect how much ram we've really sent down. -chris ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PATCH 7/7] writeback: balance_dirty_pages() shall write more than dirtied pages 2009-09-10 12:57 ` Chris Mason @ 2009-09-10 13:21 ` Wu Fengguang 2009-09-10 14:56 ` Peter Zijlstra 0 siblings, 1 reply; 19+ messages in thread From: Wu Fengguang @ 2009-09-10 13:21 UTC (permalink / raw) To: Chris Mason, Jan Kara, Andrew Morton, Jens Axboe, Dave Chinner, Pete On Thu, Sep 10, 2009 at 08:57:42PM +0800, Chris Mason wrote: > On Thu, Sep 10, 2009 at 09:42:01AM +0800, Wu Fengguang wrote: > > On Wed, Sep 09, 2009 at 11:44:13PM +0800, Jan Kara wrote: > > > On Wed 09-09-09 22:51:48, Wu Fengguang wrote: > > > > Some filesystem may choose to write much more than ratelimit_pages > > > > before calling balance_dirty_pages_ratelimited_nr(). So it is safer to > > > > determine number to write based on real number of dirtied pages. > > > > > > > > The increased write_chunk may make the dirtier more bumpy. This is > > > > filesystem writers' duty not to dirty too much at a time without > > > > checking the ratelimit. > > > I don't get this. balance_dirty_pages_ratelimited_nr() is called when we > > > dirty the page, not when we write it out. So a problem would only happen if > > > filesystem dirties pages by set_page_dirty() and won't call > > > balance_dirty_pages_ratelimited_nr(). But e.g. generic_perform_write() > > > and do_wp_page() takes care of that. So where's the problem? > > > > It seems that btrfs_file_write() is writing in chunks of up to 1024-pages > > (1024 is the computed nrptrs value in a 32bit kernel). And it calls > > balance_dirty_pages_ratelimited_nr() each time it dirtied such a chunk. > > I can easily change this to call more often, but we do always call > balance_dirty_pages to reflect how much ram we've really sent down. Btrfs is doing OK. 2MB/4MB looks like reasonable chunk sizes. The need-change part is balance_dirty_pages_ratelimited_nr(), hence this patch :) Thanks, Fengguang ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PATCH 7/7] writeback: balance_dirty_pages() shall write more than dirtied pages 2009-09-10 13:21 ` Wu Fengguang @ 2009-09-10 14:56 ` Peter Zijlstra 2009-09-10 15:14 ` Wu Fengguang 0 siblings, 1 reply; 19+ messages in thread From: Peter Zijlstra @ 2009-09-10 14:56 UTC (permalink / raw) To: Wu Fengguang Cc: Chris Mason, Jan Kara, Andrew Morton, Jens Axboe, Dave Chinner, Christoph Hellwig, Artem Bityutskiy, LKML, linux-fsdevel@vger.kernel.org On Thu, 2009-09-10 at 21:21 +0800, Wu Fengguang wrote: > On Thu, Sep 10, 2009 at 08:57:42PM +0800, Chris Mason wrote: > > On Thu, Sep 10, 2009 at 09:42:01AM +0800, Wu Fengguang wrote: > > > On Wed, Sep 09, 2009 at 11:44:13PM +0800, Jan Kara wrote: > > > > On Wed 09-09-09 22:51:48, Wu Fengguang wrote: > > > > > Some filesystem may choose to write much more than ratelimit_pages > > > > > before calling balance_dirty_pages_ratelimited_nr(). So it is safer to > > > > > determine number to write based on real number of dirtied pages. > > > > > > > > > > The increased write_chunk may make the dirtier more bumpy. This is > > > > > filesystem writers' duty not to dirty too much at a time without > > > > > checking the ratelimit. > > > > I don't get this. balance_dirty_pages_ratelimited_nr() is called when we > > > > dirty the page, not when we write it out. So a problem would only happen if > > > > filesystem dirties pages by set_page_dirty() and won't call > > > > balance_dirty_pages_ratelimited_nr(). But e.g. generic_perform_write() > > > > and do_wp_page() takes care of that. So where's the problem? > > > > > > It seems that btrfs_file_write() is writing in chunks of up to 1024-pages > > > (1024 is the computed nrptrs value in a 32bit kernel). And it calls > > > balance_dirty_pages_ratelimited_nr() each time it dirtied such a chunk. > > > > I can easily change this to call more often, but we do always call > > balance_dirty_pages to reflect how much ram we've really sent down. > > Btrfs is doing OK. 2MB/4MB looks like reasonable chunk sizes. The > need-change part is balance_dirty_pages_ratelimited_nr(), hence this > patch :) I'm not getting it, it calls set_page_dirty() for each page, right? and then it calls into balance_dirty_pages_ratelimited_nr(), that sounds right. What is the problem with that? ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PATCH 7/7] writeback: balance_dirty_pages() shall write more than dirtied pages 2009-09-10 14:56 ` Peter Zijlstra @ 2009-09-10 15:14 ` Wu Fengguang 2009-09-10 15:31 ` Peter Zijlstra 0 siblings, 1 reply; 19+ messages in thread From: Wu Fengguang @ 2009-09-10 15:14 UTC (permalink / raw) To: Peter Zijlstra Cc: Chris Mason, Jan Kara, Andrew Morton, Jens Axboe, Dave Chinner, Christoph Hellwig, Artem Bityutskiy, LKML, linux-fsdevel@vger.kernel.org On Thu, Sep 10, 2009 at 10:56:04PM +0800, Peter Zijlstra wrote: > On Thu, 2009-09-10 at 21:21 +0800, Wu Fengguang wrote: > > On Thu, Sep 10, 2009 at 08:57:42PM +0800, Chris Mason wrote: > > > On Thu, Sep 10, 2009 at 09:42:01AM +0800, Wu Fengguang wrote: > > > > On Wed, Sep 09, 2009 at 11:44:13PM +0800, Jan Kara wrote: > > > > > On Wed 09-09-09 22:51:48, Wu Fengguang wrote: > > > > > > Some filesystem may choose to write much more than ratelimit_pages > > > > > > before calling balance_dirty_pages_ratelimited_nr(). So it is safer to > > > > > > determine number to write based on real number of dirtied pages. > > > > > > > > > > > > The increased write_chunk may make the dirtier more bumpy. This is > > > > > > filesystem writers' duty not to dirty too much at a time without > > > > > > checking the ratelimit. > > > > > I don't get this. balance_dirty_pages_ratelimited_nr() is called when we > > > > > dirty the page, not when we write it out. So a problem would only happen if > > > > > filesystem dirties pages by set_page_dirty() and won't call > > > > > balance_dirty_pages_ratelimited_nr(). But e.g. generic_perform_write() > > > > > and do_wp_page() takes care of that. So where's the problem? > > > > > > > > It seems that btrfs_file_write() is writing in chunks of up to 1024-pages > > > > (1024 is the computed nrptrs value in a 32bit kernel). And it calls > > > > balance_dirty_pages_ratelimited_nr() each time it dirtied such a chunk. > > > > > > I can easily change this to call more often, but we do always call > > > balance_dirty_pages to reflect how much ram we've really sent down. > > > > Btrfs is doing OK. 2MB/4MB looks like reasonable chunk sizes. The > > need-change part is balance_dirty_pages_ratelimited_nr(), hence this > > patch :) > > I'm not getting it, it calls set_page_dirty() for each page, right? and > then it calls into balance_dirty_pages_ratelimited_nr(), that sounds > right. What is the problem with that? It looks like btrfs_file_write() eventually calls __set_page_dirty_buffers() which in turn won't call balance_dirty_pages*(). This is why do_wp_page() calls set_page_dirty_balance() to do balance_dirty_pages*(). So btrfs_file_write() explicitly calls balance_dirty_pages_ratelimited_nr() to get throttled. Thanks, Fengguang ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PATCH 7/7] writeback: balance_dirty_pages() shall write more than dirtied pages 2009-09-10 15:14 ` Wu Fengguang @ 2009-09-10 15:31 ` Peter Zijlstra 2009-09-10 15:41 ` Wu Fengguang 0 siblings, 1 reply; 19+ messages in thread From: Peter Zijlstra @ 2009-09-10 15:31 UTC (permalink / raw) To: Wu Fengguang Cc: Chris Mason, Jan Kara, Andrew Morton, Jens Axboe, Dave Chinner, Christoph Hellwig, Artem Bityutskiy, LKML, linux-fsdevel@vger.kernel.org On Thu, 2009-09-10 at 23:14 +0800, Wu Fengguang wrote: > On Thu, Sep 10, 2009 at 10:56:04PM +0800, Peter Zijlstra wrote: > > On Thu, 2009-09-10 at 21:21 +0800, Wu Fengguang wrote: > > > On Thu, Sep 10, 2009 at 08:57:42PM +0800, Chris Mason wrote: > > > > On Thu, Sep 10, 2009 at 09:42:01AM +0800, Wu Fengguang wrote: > > > > > On Wed, Sep 09, 2009 at 11:44:13PM +0800, Jan Kara wrote: > > > > > > On Wed 09-09-09 22:51:48, Wu Fengguang wrote: > > > > > > > Some filesystem may choose to write much more than ratelimit_pages > > > > > > > before calling balance_dirty_pages_ratelimited_nr(). So it is safer to > > > > > > > determine number to write based on real number of dirtied pages. > > > > > > > > > > > > > > The increased write_chunk may make the dirtier more bumpy. This is > > > > > > > filesystem writers' duty not to dirty too much at a time without > > > > > > > checking the ratelimit. > > > > > > I don't get this. balance_dirty_pages_ratelimited_nr() is called when we > > > > > > dirty the page, not when we write it out. So a problem would only happen if > > > > > > filesystem dirties pages by set_page_dirty() and won't call > > > > > > balance_dirty_pages_ratelimited_nr(). But e.g. generic_perform_write() > > > > > > and do_wp_page() takes care of that. So where's the problem? > > > > > > > > > > It seems that btrfs_file_write() is writing in chunks of up to 1024-pages > > > > > (1024 is the computed nrptrs value in a 32bit kernel). And it calls > > > > > balance_dirty_pages_ratelimited_nr() each time it dirtied such a chunk. > > > > > > > > I can easily change this to call more often, but we do always call > > > > balance_dirty_pages to reflect how much ram we've really sent down. > > > > > > Btrfs is doing OK. 2MB/4MB looks like reasonable chunk sizes. The > > > need-change part is balance_dirty_pages_ratelimited_nr(), hence this > > > patch :) > > > > I'm not getting it, it calls set_page_dirty() for each page, right? and > > then it calls into balance_dirty_pages_ratelimited_nr(), that sounds > > right. What is the problem with that? > > It looks like btrfs_file_write() eventually calls > __set_page_dirty_buffers() which in turn won't call > balance_dirty_pages*(). This is why do_wp_page() calls > set_page_dirty_balance() to do balance_dirty_pages*(). > > So btrfs_file_write() explicitly calls > balance_dirty_pages_ratelimited_nr() to get throttled. Right, so what is wrong with than, and how does this patch fix that? [ the only thing you have to be careful with is that you don't excessively grow the error bound on the dirty limit ] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PATCH 7/7] writeback: balance_dirty_pages() shall write more than dirtied pages 2009-09-10 15:31 ` Peter Zijlstra @ 2009-09-10 15:41 ` Wu Fengguang 2009-09-10 15:54 ` Peter Zijlstra 0 siblings, 1 reply; 19+ messages in thread From: Wu Fengguang @ 2009-09-10 15:41 UTC (permalink / raw) To: Peter Zijlstra Cc: Chris Mason, Jan Kara, Andrew Morton, Jens Axboe, Dave Chinner, Christoph Hellwig, Artem Bityutskiy, LKML, linux-fsdevel@vger.kernel.org On Thu, Sep 10, 2009 at 11:31:38PM +0800, Peter Zijlstra wrote: > On Thu, 2009-09-10 at 23:14 +0800, Wu Fengguang wrote: > > On Thu, Sep 10, 2009 at 10:56:04PM +0800, Peter Zijlstra wrote: > > > On Thu, 2009-09-10 at 21:21 +0800, Wu Fengguang wrote: > > > > On Thu, Sep 10, 2009 at 08:57:42PM +0800, Chris Mason wrote: > > > > > On Thu, Sep 10, 2009 at 09:42:01AM +0800, Wu Fengguang wrote: > > > > > > On Wed, Sep 09, 2009 at 11:44:13PM +0800, Jan Kara wrote: > > > > > > > On Wed 09-09-09 22:51:48, Wu Fengguang wrote: > > > > > > > > Some filesystem may choose to write much more than ratelimit_pages > > > > > > > > before calling balance_dirty_pages_ratelimited_nr(). So it is safer to > > > > > > > > determine number to write based on real number of dirtied pages. > > > > > > > > > > > > > > > > The increased write_chunk may make the dirtier more bumpy. This is > > > > > > > > filesystem writers' duty not to dirty too much at a time without > > > > > > > > checking the ratelimit. > > > > > > > I don't get this. balance_dirty_pages_ratelimited_nr() is called when we > > > > > > > dirty the page, not when we write it out. So a problem would only happen if > > > > > > > filesystem dirties pages by set_page_dirty() and won't call > > > > > > > balance_dirty_pages_ratelimited_nr(). But e.g. generic_perform_write() > > > > > > > and do_wp_page() takes care of that. So where's the problem? > > > > > > > > > > > > It seems that btrfs_file_write() is writing in chunks of up to 1024-pages > > > > > > (1024 is the computed nrptrs value in a 32bit kernel). And it calls > > > > > > balance_dirty_pages_ratelimited_nr() each time it dirtied such a chunk. > > > > > > > > > > I can easily change this to call more often, but we do always call > > > > > balance_dirty_pages to reflect how much ram we've really sent down. > > > > > > > > Btrfs is doing OK. 2MB/4MB looks like reasonable chunk sizes. The > > > > need-change part is balance_dirty_pages_ratelimited_nr(), hence this > > > > patch :) > > > > > > I'm not getting it, it calls set_page_dirty() for each page, right? and > > > then it calls into balance_dirty_pages_ratelimited_nr(), that sounds > > > right. What is the problem with that? > > > > It looks like btrfs_file_write() eventually calls > > __set_page_dirty_buffers() which in turn won't call > > balance_dirty_pages*(). This is why do_wp_page() calls > > set_page_dirty_balance() to do balance_dirty_pages*(). > > > > So btrfs_file_write() explicitly calls > > balance_dirty_pages_ratelimited_nr() to get throttled. > > Right, so what is wrong with than, and how does this patch fix that? > > [ the only thing you have to be careful with is that you don't > excessively grow the error bound on the dirty limit ] Then we could form a loop: btrfs_file_write(): dirty 1024 pages balance_dirty_pages(): write up to 12 pages (= ratelimit_pages * 1.5) in which the writeback rate cannot keep up with dirty rate, and the dirty pages go all the way beyond dirty_thresh. Sorry for writing such a vague changelog! Thanks, Fengguang ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PATCH 7/7] writeback: balance_dirty_pages() shall write more than dirtied pages 2009-09-10 15:41 ` Wu Fengguang @ 2009-09-10 15:54 ` Peter Zijlstra 2009-09-10 16:08 ` Wu Fengguang 0 siblings, 1 reply; 19+ messages in thread From: Peter Zijlstra @ 2009-09-10 15:54 UTC (permalink / raw) To: Wu Fengguang Cc: Chris Mason, Jan Kara, Andrew Morton, Jens Axboe, Dave Chinner, Christoph Hellwig, Artem Bityutskiy, LKML, linux-fsdevel@vger.kernel.org On Thu, 2009-09-10 at 23:41 +0800, Wu Fengguang wrote: > > > So btrfs_file_write() explicitly calls > > > balance_dirty_pages_ratelimited_nr() to get throttled. > > > > Right, so what is wrong with than, and how does this patch fix that? > > > > [ the only thing you have to be careful with is that you don't > > excessively grow the error bound on the dirty limit ] > > Then we could form a loop: > > btrfs_file_write(): dirty 1024 pages > balance_dirty_pages(): write up to 12 pages (= ratelimit_pages * 1.5) > > in which the writeback rate cannot keep up with dirty rate, > and the dirty pages go all the way beyond dirty_thresh. Ah, ok so this is to keep the error bound on the dirty limit bounded, because we can break out of balance_dirty_pages() early, the /* We've done our duty */ break. Which unbalances the duty vs the dirty ratio. I figure that with the task dirty limit stuff we could maybe try to get rid of this break.. worth a try. > Sorry for writing such a vague changelog! np, as long as we get there :-) Change makes sense now, thanks! ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PATCH 7/7] writeback: balance_dirty_pages() shall write more than dirtied pages 2009-09-10 15:54 ` Peter Zijlstra @ 2009-09-10 16:08 ` Wu Fengguang 0 siblings, 0 replies; 19+ messages in thread From: Wu Fengguang @ 2009-09-10 16:08 UTC (permalink / raw) To: Peter Zijlstra Cc: Chris Mason, Jan Kara, Andrew Morton, Jens Axboe, Dave Chinner, Christoph Hellwig, Artem Bityutskiy, LKML, linux-fsdevel@vger.kernel.org On Thu, Sep 10, 2009 at 11:54:29PM +0800, Peter Zijlstra wrote: > On Thu, 2009-09-10 at 23:41 +0800, Wu Fengguang wrote: > > > > So btrfs_file_write() explicitly calls > > > > balance_dirty_pages_ratelimited_nr() to get throttled. > > > > > > Right, so what is wrong with than, and how does this patch fix that? > > > > > > [ the only thing you have to be careful with is that you don't > > > excessively grow the error bound on the dirty limit ] > > > > Then we could form a loop: > > > > btrfs_file_write(): dirty 1024 pages > > balance_dirty_pages(): write up to 12 pages (= ratelimit_pages * 1.5) > > > > in which the writeback rate cannot keep up with dirty rate, > > and the dirty pages go all the way beyond dirty_thresh. > > Ah, ok so this is to keep the error bound on the dirty limit bounded, > because we can break out of balance_dirty_pages() early, the /* We've > done our duty */ break. > > Which unbalances the duty vs the dirty ratio. Right! > I figure that with the task dirty limit stuff we could maybe try to get > rid of this break.. worth a try. Be careful. Without that break, the time a task get throttled in a single trip may go out of control. For example, task B get blocked for 1000 seconds because there is a task A keep dirtying pages, in the mean time task A's dirty thresh going down slowly, but still larger than B's. > > Sorry for writing such a vague changelog! > > np, as long as we get there :-) > > Change makes sense now, thanks! May I add you ack? Thanks, Fengguang ^ permalink raw reply [flat|nested] 19+ messages in thread
[parent not found: <20090909150600.330539880@intel.com>]
* Re: [RFC][PATCH 1/7] writeback: cleanup writeback_single_inode() [not found] ` <20090909150600.330539880@intel.com> @ 2009-09-09 15:45 ` Jan Kara 0 siblings, 0 replies; 19+ messages in thread From: Jan Kara @ 2009-09-09 15:45 UTC (permalink / raw) To: Wu Fengguang Cc: Andrew Morton, Jens Axboe, Dave Chinner, Chris Mason, Peter Zijlstra, Christoph Hellwig, jack, Artem Bityutskiy, LKML, linux-fsdevel On Wed 09-09-09 22:51:42, Wu Fengguang wrote: > Make the if-else straight in writeback_single_inode(). > No behavior change. > > Cc: Michael Rubin <mrubin@google.com> > Cc: Peter Zijlstra <peterz@infradead.org> > Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> The patch looks good. Acked-by: Jan Kara <jack@suse.cz> > --- > fs/fs-writeback.c | 15 +++++++-------- > 1 file changed, 7 insertions(+), 8 deletions(-) > > --- linux.orig/fs/fs-writeback.c 2009-09-09 21:40:41.000000000 +0800 > +++ linux/fs/fs-writeback.c 2009-09-09 21:41:14.000000000 +0800 > @@ -417,8 +417,13 @@ writeback_single_inode(struct inode *ino > spin_lock(&inode_lock); > inode->i_state &= ~I_SYNC; > if (!(inode->i_state & (I_FREEING | I_CLEAR))) { > - if (!(inode->i_state & I_DIRTY) && > - mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) { > + if (inode->i_state & I_DIRTY) { > + /* > + * Someone redirtied the inode while were writing back > + * the pages. > + */ > + redirty_tail(inode); > + } else if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) { > /* > * We didn't write back all the pages. nfs_writepages() > * sometimes bales out without doing anything. Redirty > @@ -462,12 +467,6 @@ writeback_single_inode(struct inode *ino > inode->i_state |= I_DIRTY_PAGES; > redirty_tail(inode); > } > - } else if (inode->i_state & I_DIRTY) { > - /* > - * Someone redirtied the inode while were writing back > - * the pages. > - */ > - redirty_tail(inode); > } else if (atomic_read(&inode->i_count)) { > /* > * The inode is clean, inuse > > -- > -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 19+ messages in thread
[parent not found: <20090909150600.451373732@intel.com>]
* Re: [RFC][PATCH 2/7] writeback: fix queue_io() ordering [not found] ` <20090909150600.451373732@intel.com> @ 2009-09-09 15:53 ` Jan Kara 2009-09-10 1:26 ` Wu Fengguang 0 siblings, 1 reply; 19+ messages in thread From: Jan Kara @ 2009-09-09 15:53 UTC (permalink / raw) To: Wu Fengguang Cc: Andrew Morton, Jens Axboe, Dave Chinner, Chris Mason, Peter Zijlstra, Christoph Hellwig, jack, Artem Bityutskiy, LKML, linux-fsdevel On Wed 09-09-09 22:51:43, Wu Fengguang wrote: > This was not a bug, since b_io is empty for kupdate writeback. > The next patch will do requeue_io() for non-kupdate writeback, > so let's fix it. But doesn't this patch also need your "anti-starvation" patch? Looking into the code, we put inode to b_more_io when nr_to_write drops to zero and this way we'd just start writing it again in the next round... Honza > > CC: Dave Chinner <david@fromorbit.com> > Cc: Martin Bligh <mbligh@google.com> > Cc: Michael Rubin <mrubin@google.com> > Cc: Peter Zijlstra <peterz@infradead.org> > Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> > --- > fs/fs-writeback.c | 7 +++++-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > --- linux.orig/fs/fs-writeback.c 2009-09-09 21:41:14.000000000 +0800 > +++ linux/fs/fs-writeback.c 2009-09-09 21:45:15.000000000 +0800 > @@ -313,11 +313,14 @@ static void move_expired_inodes(struct l > } > > /* > - * Queue all expired dirty inodes for io, eldest first. > + * Queue all expired dirty inodes for io, eldest first: > + * (newly dirtied) => b_dirty inodes > + * => b_more_io inodes > + * => remaining inodes in b_io => (dequeue for sync) > */ > static void queue_io(struct bdi_writeback *wb, unsigned long *older_than_this) > { > - list_splice_init(&wb->b_more_io, wb->b_io.prev); > + list_splice_init(&wb->b_more_io, &wb->b_io); > move_expired_inodes(&wb->b_dirty, &wb->b_io, older_than_this); > } > > > -- > -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PATCH 2/7] writeback: fix queue_io() ordering 2009-09-09 15:53 ` [RFC][PATCH 2/7] writeback: fix queue_io() ordering Jan Kara @ 2009-09-10 1:26 ` Wu Fengguang 2009-09-10 14:14 ` Jan Kara 0 siblings, 1 reply; 19+ messages in thread From: Wu Fengguang @ 2009-09-10 1:26 UTC (permalink / raw) To: Jan Kara Cc: Andrew Morton, Jens Axboe, Dave Chinner, Chris Mason, Peter Zijlstra, Christoph Hellwig, jack@suse.cz, Artem Bityutskiy, LKML, linux-fsdevel@vger.kernel.org On Wed, Sep 09, 2009 at 11:53:30PM +0800, Jan Kara wrote: > On Wed 09-09-09 22:51:43, Wu Fengguang wrote: > > This was not a bug, since b_io is empty for kupdate writeback. > > The next patch will do requeue_io() for non-kupdate writeback, > > so let's fix it. > But doesn't this patch also need your "anti-starvation" patch? Honza, can you show me that patch? > Looking into the code, we put inode to b_more_io when nr_to_write > drops to zero and this way we'd just start writing it again > in the next round... I'm confused. It's OK to start it in next round. Starvation can occur if we start it immediately in the next writeback_inodes() invocation. How can that happen with this patch? Thanks, Fengguang > Honza > > > > CC: Dave Chinner <david@fromorbit.com> > > Cc: Martin Bligh <mbligh@google.com> > > Cc: Michael Rubin <mrubin@google.com> > > Cc: Peter Zijlstra <peterz@infradead.org> > > Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> > > --- > > fs/fs-writeback.c | 7 +++++-- > > 1 file changed, 5 insertions(+), 2 deletions(-) > > > > --- linux.orig/fs/fs-writeback.c 2009-09-09 21:41:14.000000000 +0800 > > +++ linux/fs/fs-writeback.c 2009-09-09 21:45:15.000000000 +0800 > > @@ -313,11 +313,14 @@ static void move_expired_inodes(struct l > > } > > > > /* > > - * Queue all expired dirty inodes for io, eldest first. > > + * Queue all expired dirty inodes for io, eldest first: > > + * (newly dirtied) => b_dirty inodes > > + * => b_more_io inodes > > + * => remaining inodes in b_io => (dequeue for sync) > > */ > > static void queue_io(struct bdi_writeback *wb, unsigned long *older_than_this) > > { > > - list_splice_init(&wb->b_more_io, wb->b_io.prev); > > + list_splice_init(&wb->b_more_io, &wb->b_io); > > move_expired_inodes(&wb->b_dirty, &wb->b_io, older_than_this); > > } > > > > > > -- > > > -- > Jan Kara <jack@suse.cz> > SUSE Labs, CR ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PATCH 2/7] writeback: fix queue_io() ordering 2009-09-10 1:26 ` Wu Fengguang @ 2009-09-10 14:14 ` Jan Kara 2009-09-10 14:17 ` Wu Fengguang 0 siblings, 1 reply; 19+ messages in thread From: Jan Kara @ 2009-09-10 14:14 UTC (permalink / raw) To: Wu Fengguang Cc: Jan Kara, Andrew Morton, Jens Axboe, Dave Chinner, Chris Mason, Peter Zijlstra, Christoph Hellwig, Artem Bityutskiy, LKML, linux-fsdevel@vger.kernel.org On Thu 10-09-09 09:26:24, Wu Fengguang wrote: > On Wed, Sep 09, 2009 at 11:53:30PM +0800, Jan Kara wrote: > > On Wed 09-09-09 22:51:43, Wu Fengguang wrote: > > > This was not a bug, since b_io is empty for kupdate writeback. > > > The next patch will do requeue_io() for non-kupdate writeback, > > > so let's fix it. > > But doesn't this patch also need your "anti-starvation" patch? > > Honza, can you show me that patch? > > > Looking into the code, we put inode to b_more_io when nr_to_write > > drops to zero and this way we'd just start writing it again > > in the next round... > > I'm confused. It's OK to start it in next round. Starvation can > occur if we start it immediately in the next writeback_inodes() > invocation. How can that happen with this patch? Sorry, my fault. For kupdate, we splice the list only once s_io is empty so that's not an issue. So the patch looks good. Acked-by: Jan Kara <jack@suse.cz> > > > CC: Dave Chinner <david@fromorbit.com> > > > Cc: Martin Bligh <mbligh@google.com> > > > Cc: Michael Rubin <mrubin@google.com> > > > Cc: Peter Zijlstra <peterz@infradead.org> > > > Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> > > > --- > > > fs/fs-writeback.c | 7 +++++-- > > > 1 file changed, 5 insertions(+), 2 deletions(-) > > > > > > --- linux.orig/fs/fs-writeback.c 2009-09-09 21:41:14.000000000 +0800 > > > +++ linux/fs/fs-writeback.c 2009-09-09 21:45:15.000000000 +0800 > > > @@ -313,11 +313,14 @@ static void move_expired_inodes(struct l > > > } > > > > > > /* > > > - * Queue all expired dirty inodes for io, eldest first. > > > + * Queue all expired dirty inodes for io, eldest first: > > > + * (newly dirtied) => b_dirty inodes > > > + * => b_more_io inodes > > > + * => remaining inodes in b_io => (dequeue for sync) > > > */ > > > static void queue_io(struct bdi_writeback *wb, unsigned long *older_than_this) > > > { > > > - list_splice_init(&wb->b_more_io, wb->b_io.prev); > > > + list_splice_init(&wb->b_more_io, &wb->b_io); > > > move_expired_inodes(&wb->b_dirty, &wb->b_io, older_than_this); > > > } > > > > > > > > > -- > > > > > -- > > Jan Kara <jack@suse.cz> > > SUSE Labs, CR -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PATCH 2/7] writeback: fix queue_io() ordering 2009-09-10 14:14 ` Jan Kara @ 2009-09-10 14:17 ` Wu Fengguang 0 siblings, 0 replies; 19+ messages in thread From: Wu Fengguang @ 2009-09-10 14:17 UTC (permalink / raw) To: Jan Kara Cc: Andrew Morton, Jens Axboe, Dave Chinner, Chris Mason, Peter Zijlstra, Christoph Hellwig, Artem Bityutskiy, LKML, linux-fsdevel@vger.kernel.org On Thu, Sep 10, 2009 at 10:14:15PM +0800, Jan Kara wrote: > On Thu 10-09-09 09:26:24, Wu Fengguang wrote: > > On Wed, Sep 09, 2009 at 11:53:30PM +0800, Jan Kara wrote: > > > On Wed 09-09-09 22:51:43, Wu Fengguang wrote: > > > > This was not a bug, since b_io is empty for kupdate writeback. > > > > The next patch will do requeue_io() for non-kupdate writeback, > > > > so let's fix it. > > > But doesn't this patch also need your "anti-starvation" patch? > > > > Honza, can you show me that patch? > > > > > Looking into the code, we put inode to b_more_io when nr_to_write > > > drops to zero and this way we'd just start writing it again > > > in the next round... > > > > I'm confused. It's OK to start it in next round. Starvation can > > occur if we start it immediately in the next writeback_inodes() > > invocation. How can that happen with this patch? > Sorry, my fault. For kupdate, we splice the list only once s_io is empty > so that's not an issue. So the patch looks good. > Acked-by: Jan Kara <jack@suse.cz> Thank you :) Regards, Fengguang > > > > CC: Dave Chinner <david@fromorbit.com> > > > > Cc: Martin Bligh <mbligh@google.com> > > > > Cc: Michael Rubin <mrubin@google.com> > > > > Cc: Peter Zijlstra <peterz@infradead.org> > > > > Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> > > > > --- > > > > fs/fs-writeback.c | 7 +++++-- > > > > 1 file changed, 5 insertions(+), 2 deletions(-) > > > > > > > > --- linux.orig/fs/fs-writeback.c 2009-09-09 21:41:14.000000000 +0800 > > > > +++ linux/fs/fs-writeback.c 2009-09-09 21:45:15.000000000 +0800 > > > > @@ -313,11 +313,14 @@ static void move_expired_inodes(struct l > > > > } > > > > > > > > /* > > > > - * Queue all expired dirty inodes for io, eldest first. > > > > + * Queue all expired dirty inodes for io, eldest first: > > > > + * (newly dirtied) => b_dirty inodes > > > > + * => b_more_io inodes > > > > + * => remaining inodes in b_io => (dequeue for sync) > > > > */ > > > > static void queue_io(struct bdi_writeback *wb, unsigned long *older_than_this) > > > > { > > > > - list_splice_init(&wb->b_more_io, wb->b_io.prev); > > > > + list_splice_init(&wb->b_more_io, &wb->b_io); > > > > move_expired_inodes(&wb->b_dirty, &wb->b_io, older_than_this); > > > > } > > > > > > > > > > > > -- > > > > > > > -- > > > Jan Kara <jack@suse.cz> > > > SUSE Labs, CR > -- > Jan Kara <jack@suse.cz> > SUSE Labs, CR ^ permalink raw reply [flat|nested] 19+ messages in thread
[parent not found: <20090909150600.874037375@intel.com>]
* Re: [RFC][PATCH 5/7] writeback: use 64MB MAX_WRITEBACK_PAGES [not found] ` <20090909150600.874037375@intel.com> @ 2009-09-09 23:29 ` Theodore Tso 2009-09-10 0:13 ` Wu Fengguang 2009-09-10 4:53 ` Peter Zijlstra 0 siblings, 2 replies; 19+ messages in thread From: Theodore Tso @ 2009-09-09 23:29 UTC (permalink / raw) To: Wu Fengguang Cc: Andrew Morton, Jens Axboe, Dave Chinner, Chris Mason, Peter Zijlstra, Christoph Hellwig, jack, Artem Bityutskiy, LKML, linux-fsdevel On Wed, Sep 09, 2009 at 10:51:46PM +0800, Wu Fengguang wrote: > + * The maximum number of pages to writeout in a single periodic/background > + * writeback operation. 64MB means I_SYNC may be hold for up to 1 second. > + * This is not a big problem since we normally do kind of trylock on I_SYNC > + * for non-data-integrity writes. Userspace tasks doing throttled writeback > + * do not use this value. What's your justification for using 64MB? Where are you getting 1 second from? On a fast RAID array 64MB can be written in much less than 1 second. More generally, I assume your patches conflict with Jens' per-bdi patches? - Ted ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PATCH 5/7] writeback: use 64MB MAX_WRITEBACK_PAGES 2009-09-09 23:29 ` [RFC][PATCH 5/7] writeback: use 64MB MAX_WRITEBACK_PAGES Theodore Tso @ 2009-09-10 0:13 ` Wu Fengguang 2009-09-10 4:53 ` Peter Zijlstra 1 sibling, 0 replies; 19+ messages in thread From: Wu Fengguang @ 2009-09-10 0:13 UTC (permalink / raw) To: Theodore Tso, Andrew Morton, Jens Axboe, Dave Chinner, Chris Mason On Thu, Sep 10, 2009 at 07:29:38AM +0800, Theodore Tso wrote: > On Wed, Sep 09, 2009 at 10:51:46PM +0800, Wu Fengguang wrote: > > + * The maximum number of pages to writeout in a single periodic/background > > + * writeback operation. 64MB means I_SYNC may be hold for up to 1 second. > > + * This is not a big problem since we normally do kind of trylock on I_SYNC > > + * for non-data-integrity writes. Userspace tasks doing throttled writeback > > + * do not use this value. > > What's your justification for using 64MB? Where are you getting 1 > second from? On a fast RAID array 64MB can be written in much less > than 1 second. Because this value will be used for desktop and server alike, we have to consider the worst case - for a laptop, it takes about 1 second to write 64MB. It's not accurate to say I_SYNC may be hold for up to 1 second, but the same I_SYNC will be taken, dropped because of congestion, retaken, and this loop could go on for 1 second until a large file have been written 64MB. In the mean while, in the normal case of a single kupdate thread per bdi, that file blocks writeout of other files for 1 second. > More generally, I assume your patches conflict with Jens' per-bdi patches? It is based on latest linux-next tree with the vm.max_writeback_mb patch reverted. The changelog only states the need to increase the size, but not mentioned why it should be made tunable. So I tend to just bump up MAX_WRITEBACK_PAGES. It seems that a large value won't hurt anyone. Or are there demand to further increase it a lot for some server configurations? Thanks, Fengguang ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PATCH 5/7] writeback: use 64MB MAX_WRITEBACK_PAGES 2009-09-09 23:29 ` [RFC][PATCH 5/7] writeback: use 64MB MAX_WRITEBACK_PAGES Theodore Tso 2009-09-10 0:13 ` Wu Fengguang @ 2009-09-10 4:53 ` Peter Zijlstra 2009-09-10 7:35 ` Wu Fengguang 1 sibling, 1 reply; 19+ messages in thread From: Peter Zijlstra @ 2009-09-10 4:53 UTC (permalink / raw) To: Theodore Tso Cc: Wu Fengguang, Andrew Morton, Jens Axboe, Dave Chinner, Chris Mason, Christoph Hellwig, jack, Artem Bityutskiy, LKML, linux-fsdevel On Wed, 2009-09-09 at 19:29 -0400, Theodore Tso wrote: > On Wed, Sep 09, 2009 at 10:51:46PM +0800, Wu Fengguang wrote: > > + * The maximum number of pages to writeout in a single periodic/background > > + * writeback operation. 64MB means I_SYNC may be hold for up to 1 second. > > + * This is not a big problem since we normally do kind of trylock on I_SYNC > > + * for non-data-integrity writes. Userspace tasks doing throttled writeback > > + * do not use this value. > > What's your justification for using 64MB? Where are you getting 1 > second from? On a fast RAID array 64MB can be written in much less > than 1 second. Worse, on my 5mb/s usb stick writing out 64m will take forever. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PATCH 5/7] writeback: use 64MB MAX_WRITEBACK_PAGES 2009-09-10 4:53 ` Peter Zijlstra @ 2009-09-10 7:35 ` Wu Fengguang 0 siblings, 0 replies; 19+ messages in thread From: Wu Fengguang @ 2009-09-10 7:35 UTC (permalink / raw) To: Peter Zijlstra Cc: Theodore Tso, Andrew Morton, Jens Axboe, Dave Chinner, Chris Mason, Christoph Hellwig, jack@suse.cz, Artem Bityutskiy, LKML, linux-fsdevel@vger.kernel.org On Thu, Sep 10, 2009 at 12:53:18PM +0800, Peter Zijlstra wrote: > On Wed, 2009-09-09 at 19:29 -0400, Theodore Tso wrote: > > On Wed, Sep 09, 2009 at 10:51:46PM +0800, Wu Fengguang wrote: > > > + * The maximum number of pages to writeout in a single periodic/background > > > + * writeback operation. 64MB means I_SYNC may be hold for up to 1 second. > > > + * This is not a big problem since we normally do kind of trylock on I_SYNC > > > + * for non-data-integrity writes. Userspace tasks doing throttled writeback > > > + * do not use this value. > > > > What's your justification for using 64MB? Where are you getting 1 > > second from? On a fast RAID array 64MB can be written in much less > > than 1 second. > > Worse, on my 5mb/s usb stick writing out 64m will take forever. cp bigfile1 bigfile2 /mnt/usb/ sync In that case the user would notice that kernel keeps writing to one file for up to 13 seconds before switching to another file. A simple fix would look like this. It stops io continuation on one file after 1 second. It will work because when io is congested, it relies on the io continuation logic (based on last_file*) to retry the same file until MAX_WRITEBACK_PAGES is reached. The queue-able requests between congested <=> uncongested states are not very large. For slow devices, the queue-able pages between empty <=> congested states are also not very large. For example, my USB stick has nr_requests=128 and max_sectors_kb=120. It would take less than 12MB to congest this queue. With this patch and my usb stick, the kernel may first sync 12MB for bigfile1 (which takes 1-3 seconds), then sync bigfile2 for 1 second, and then bigfile1 for 1 second, and so on. It seems that we could now safely bump MAX_WRITEBACK_PAGES to even larger values beyond 128MB :) Thanks, Fengguang --- --- linux.orig/fs/fs-writeback.c 2009-09-10 15:02:48.000000000 +0800 +++ linux/fs/fs-writeback.c 2009-09-10 15:07:23.000000000 +0800 @@ -277,7 +277,8 @@ static void requeue_io(struct inode *ino */ static void requeue_partial_io(struct writeback_control *wbc, struct inode *inode) { - if (wbc->last_file_written == 0 || + if (time_before(wbc->last_file_time + HZ, jiffies) || + wbc->last_file_written == 0 || wbc->last_file_written >= MAX_WRITEBACK_PAGES) return requeue_io(inode); @@ -428,6 +429,7 @@ writeback_single_inode(struct inode *ino if (wbc->last_file != inode->i_ino) { wbc->last_file = inode->i_ino; + wbc->last_file_time = jiffies; wbc->last_file_written = nr_to_write - wbc->nr_to_write; } else wbc->last_file_written += nr_to_write - wbc->nr_to_write; --- linux.orig/include/linux/writeback.h 2009-09-10 15:07:28.000000000 +0800 +++ linux/include/linux/writeback.h 2009-09-10 15:08:46.000000000 +0800 @@ -46,6 +46,7 @@ struct writeback_control { long nr_to_write; /* Write this many pages, and decrement this for each page written */ unsigned long last_file; /* Inode number of last written file */ + unsigned long last_file_time; /* First sync time for last file */ long last_file_written; /* Total pages written for last file */ long pages_skipped; /* Pages which were not written */ ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2009-09-10 16:08 UTC | newest] Thread overview: 19+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <20090909145141.293229693@intel.com> [not found] ` <20090909150601.159061863@intel.com> 2009-09-09 15:44 ` [RFC][PATCH 7/7] writeback: balance_dirty_pages() shall write more than dirtied pages Jan Kara 2009-09-10 1:42 ` Wu Fengguang 2009-09-10 12:57 ` Chris Mason 2009-09-10 13:21 ` Wu Fengguang 2009-09-10 14:56 ` Peter Zijlstra 2009-09-10 15:14 ` Wu Fengguang 2009-09-10 15:31 ` Peter Zijlstra 2009-09-10 15:41 ` Wu Fengguang 2009-09-10 15:54 ` Peter Zijlstra 2009-09-10 16:08 ` Wu Fengguang [not found] ` <20090909150600.330539880@intel.com> 2009-09-09 15:45 ` [RFC][PATCH 1/7] writeback: cleanup writeback_single_inode() Jan Kara [not found] ` <20090909150600.451373732@intel.com> 2009-09-09 15:53 ` [RFC][PATCH 2/7] writeback: fix queue_io() ordering Jan Kara 2009-09-10 1:26 ` Wu Fengguang 2009-09-10 14:14 ` Jan Kara 2009-09-10 14:17 ` Wu Fengguang [not found] ` <20090909150600.874037375@intel.com> 2009-09-09 23:29 ` [RFC][PATCH 5/7] writeback: use 64MB MAX_WRITEBACK_PAGES Theodore Tso 2009-09-10 0:13 ` Wu Fengguang 2009-09-10 4:53 ` Peter Zijlstra 2009-09-10 7:35 ` Wu Fengguang
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).