* Re: [PATCH 1/6] writeback: balance_dirty_pages() shall write more than dirtied pages

From: Christoph Hellwig @ 2009-09-23 12:45 UTC
To: Wu Fengguang
Cc: Andrew Morton, Jens Axboe, Jan Kara, Theodore Ts'o, Dave Chinner,
    Chris Mason, Christoph Hellwig, Peter Zijlstra, linux-fsdevel, LKML

On Wed, Sep 23, 2009 at 08:33:39PM +0800, Wu Fengguang wrote:
> Some filesystems may choose to write much more than ratelimit_pages
> before calling balance_dirty_pages_ratelimited_nr(). So it is safer to
> determine the number of pages to write based on the real number of
> dirtied pages.
>
> The increased write_chunk may make the dirtier more bumpy. It is the
> filesystem writers' duty not to dirty too much at a time without
> checking the ratelimit.

For some reason the filesystems tricking around
balance_dirty_pages_ratelimited's semantics seem to get much better
buffered write performance. Are you sure this is not going to destroy
that benefit? (But yes, we should make sure it behaves the same for
all filesystems.)

> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> ---
>  mm/page-writeback.c |   13 +++++++------
>  1 file changed, 7 insertions(+), 6 deletions(-)
>
> --- linux.orig/mm/page-writeback.c	2009-09-23 16:31:46.000000000 +0800
> +++ linux/mm/page-writeback.c	2009-09-23 16:33:19.000000000 +0800
> @@ -44,12 +44,12 @@ static long ratelimit_pages = 32;
>  /*
>   * When balance_dirty_pages decides that the caller needs to perform some
>   * non-background writeback, this is how many pages it will attempt to write.
> - * It should be somewhat larger than RATELIMIT_PAGES to ensure that reasonably
> + * It should be somewhat larger than dirtied pages to ensure that reasonably
>   * large amounts of I/O are submitted.
>   */
> -static inline long sync_writeback_pages(void)
> +static inline long sync_writeback_pages(unsigned long dirtied)
>  {
> -	return ratelimit_pages + ratelimit_pages / 2;
> +	return dirtied + dirtied / 2;
>  }
>
>  /* The following parameters are exported via /proc/sys/vm */
> @@ -476,7 +476,8 @@ get_dirty_limits(unsigned long *pbackgro
>   * If we're over `background_thresh' then pdflush is woken to perform some
>   * writeout.
>   */
> -static void balance_dirty_pages(struct address_space *mapping)
> +static void balance_dirty_pages(struct address_space *mapping,
> +				unsigned long write_chunk)
>  {
>  	long nr_reclaimable, bdi_nr_reclaimable;
>  	long nr_writeback, bdi_nr_writeback;
> @@ -484,7 +485,6 @@ static void balance_dirty_pages(struct a
>  	unsigned long dirty_thresh;
>  	unsigned long bdi_thresh;
>  	unsigned long pages_written = 0;
> -	unsigned long write_chunk = sync_writeback_pages();
>  	unsigned long pause = 1;
>
>  	struct backing_dev_info *bdi = mapping->backing_dev_info;
> @@ -640,9 +640,10 @@ void balance_dirty_pages_ratelimited_nr(
>  		p = &__get_cpu_var(bdp_ratelimits);
>  		*p += nr_pages_dirtied;
>  		if (unlikely(*p >= ratelimit)) {
> +			ratelimit = sync_writeback_pages(*p);
>  			*p = 0;
>  			preempt_enable();
> -			balance_dirty_pages(mapping);
> +			balance_dirty_pages(mapping, ratelimit);
>  			return;
>  		}
>  		preempt_enable();
>
> ---end quoted text---
* Re: [PATCH 1/6] writeback: balance_dirty_pages() shall write more than dirtied pages

From: Wu Fengguang @ 2009-09-23 12:53 UTC
To: Christoph Hellwig
Cc: Andrew Morton, Jens Axboe, Jan Kara, Theodore Ts'o, Dave Chinner,
    Chris Mason, Peter Zijlstra, linux-fsdevel@vger.kernel.org, LKML

On Wed, Sep 23, 2009 at 08:45:29PM +0800, Christoph Hellwig wrote:
> On Wed, Sep 23, 2009 at 08:33:39PM +0800, Wu Fengguang wrote:
> > Some filesystems may choose to write much more than ratelimit_pages
> > before calling balance_dirty_pages_ratelimited_nr(). So it is safer to
> > determine the number of pages to write based on the real number of
> > dirtied pages.
> >
> > The increased write_chunk may make the dirtier more bumpy. It is the
> > filesystem writers' duty not to dirty too much at a time without
> > checking the ratelimit.
>
> For some reason the filesystems tricking around
> balance_dirty_pages_ratelimited's semantics seem to get much better
> buffered write performance.

Yes, I would expect enlarged ratelimit numbers to benefit performance.

> Are you sure this is not going to destroy that benefit? (But yes,
> we should make sure it behaves the same for all filesystems.)

If a filesystem chooses to use a larger ratelimit number, then this
patch will also enlarge the write chunk size (which is now computed
based on the enlarged ratelimit number), which I believe will be
beneficial, too.

Thanks,
Fengguang
* [PATCH 1/6 -v2] writeback: balance_dirty_pages() shall write more than dirtied pages

From: Wu Fengguang @ 2009-09-23 13:56 UTC
To: Andrew Morton, Jens Axboe
Cc: Jan Kara, Theodore Ts'o, Dave Chinner, Chris Mason, Christoph Hellwig,
    Peter Zijlstra, linux-fsdevel@vger.kernel.org, LKML

Some filesystems may choose to write much more than ratelimit_pages
before calling balance_dirty_pages_ratelimited_nr(). So it is safer to
determine the number of pages to write based on the real number of
dirtied pages. Otherwise it is possible that

	loop {
		btrfs_file_write():     dirty 1024 pages
		balance_dirty_pages():  write up to 48 pages
					(= ratelimit_pages * 1.5)
	}

in which the writeback rate cannot keep up with the dirty rate, and
the dirty pages go all the way beyond dirty_thresh.

The increased write_chunk may make the dirtier more bumpy. So
filesystems shall take care not to dirty too much at a time
(e.g. > 4MB) without checking the ratelimit.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/page-writeback.c |   16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

--- linux.orig/mm/page-writeback.c	2009-09-23 21:41:21.000000000 +0800
+++ linux/mm/page-writeback.c	2009-09-23 21:42:17.000000000 +0800
@@ -44,12 +44,15 @@ static long ratelimit_pages = 32;
 /*
  * When balance_dirty_pages decides that the caller needs to perform some
  * non-background writeback, this is how many pages it will attempt to write.
- * It should be somewhat larger than RATELIMIT_PAGES to ensure that reasonably
+ * It should be somewhat larger than dirtied pages to ensure that reasonably
  * large amounts of I/O are submitted.
  */
-static inline long sync_writeback_pages(void)
+static inline long sync_writeback_pages(unsigned long dirtied)
 {
-	return ratelimit_pages + ratelimit_pages / 2;
+	if (dirtied < ratelimit_pages)
+		dirtied = ratelimit_pages;
+
+	return dirtied + dirtied / 2;
 }

 /* The following parameters are exported via /proc/sys/vm */
@@ -476,7 +479,8 @@ get_dirty_limits(unsigned long *pbackgro
  * If we're over `background_thresh' then pdflush is woken to perform some
  * writeout.
  */
-static void balance_dirty_pages(struct address_space *mapping)
+static void balance_dirty_pages(struct address_space *mapping,
+				unsigned long write_chunk)
 {
 	long nr_reclaimable, bdi_nr_reclaimable;
 	long nr_writeback, bdi_nr_writeback;
@@ -484,7 +488,6 @@ static void balance_dirty_pages(struct a
 	unsigned long dirty_thresh;
 	unsigned long bdi_thresh;
 	unsigned long pages_written = 0;
-	unsigned long write_chunk = sync_writeback_pages();
 	unsigned long pause = 1;

 	struct backing_dev_info *bdi = mapping->backing_dev_info;
@@ -639,9 +642,10 @@ void balance_dirty_pages_ratelimited_nr(
 		p = &__get_cpu_var(bdp_ratelimits);
 		*p += nr_pages_dirtied;
 		if (unlikely(*p >= ratelimit)) {
+			ratelimit = sync_writeback_pages(*p);
 			*p = 0;
 			preempt_enable();
-			balance_dirty_pages(mapping);
+			balance_dirty_pages(mapping, ratelimit);
 			return;
 		}
 		preempt_enable();
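To make the v2 arithmetic concrete, here is a minimal stand-alone
sketch (user-space C mirroring the patched helper; ratelimit_pages = 32
matches the default in the hunk above, and the sample values are
illustrative). A caller that dirtied only 12 pages is still asked to
write a 48-page chunk, while a btrfs-style caller that dirtied 1024
pages gets a 1536-page chunk:

#include <stdio.h>

/* User-space mirror of the patched sync_writeback_pages(); the value
 * 32 matches the ratelimit_pages default in mm/page-writeback.c. */
static long ratelimit_pages = 32;

static long sync_writeback_pages(unsigned long dirtied)
{
	if (dirtied < ratelimit_pages)
		dirtied = ratelimit_pages;

	return dirtied + dirtied / 2;
}

int main(void)
{
	unsigned long samples[] = { 12, 32, 1024, 4096 };

	for (int i = 0; i < 4; i++)
		printf("dirtied=%4lu -> write_chunk=%ld\n",
		       samples[i], sync_writeback_pages(samples[i]));
	return 0;
}

The 1536-page chunk for 1024 dirtied pages is exactly the n=1536 seen
in the XFS trace in the follow-up below.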
* Re: [PATCH 1/6 -v2] writeback: balance_dirty_pages() shall write more than dirtied pages

From: Wu Fengguang @ 2009-09-23 13:58 UTC
To: Andrew Morton, Jens Axboe
Cc: Jan Kara, Theodore Ts'o, Dave Chinner, Chris Mason, Christoph Hellwig,
    Peter Zijlstra, linux-fsdevel@vger.kernel.org, LKML

On Wed, Sep 23, 2009 at 09:56:00PM +0800, Wu Fengguang wrote:
> -static inline long sync_writeback_pages(void)
> +static inline long sync_writeback_pages(unsigned long dirtied)
>  {
> -	return ratelimit_pages + ratelimit_pages / 2;
> +	if (dirtied < ratelimit_pages)
> +		dirtied = ratelimit_pages;

Just added the above check. Now balance_dirty_pages() for XFS works
with a much larger chunk size of 1536 pages :)

[   40.374081] redirty_tail() +516: inode=131
[   40.374951] mm/page-writeback.c +543 balance_dirty_pages(): comm=cp pid=3296 n=1536
[   40.377001] global dirty=49818 writeback=17027 nfs=0 flags=CM towrite=0 skipped=0

> +
> +	return dirtied + dirtied / 2;
>  }

Thanks,
Fengguang
* Re: [PATCH 2/6] writeback: stop background writeback when below background threshold

From: Jens Axboe @ 2009-09-23 15:05 UTC
To: Wu Fengguang
Cc: Andrew Morton, Jan Kara, Peter Zijlstra, Theodore Ts'o, Dave Chinner,
    Chris Mason, Christoph Hellwig, linux-fsdevel, LKML

On Wed, Sep 23 2009, Wu Fengguang wrote:
> Treat bdi_start_writeback(0) as a special request to do background
> writeback, and stop such work when we drop below the background dirty
> threshold.
>
> Also simplify the (nr_pages <= 0) checks. Since we already pass in
> nr_pages=LONG_MAX for WB_SYNC_ALL and background writes, we don't
> need to worry about it being decreased to zero.

Good cleanup! Do you want me to collect these fixes, or how shall we
handle this?

--
Jens Axboe
* Re: [PATCH 2/6] writeback: stop background writeback when below background threshold

From: Wu Fengguang @ 2009-09-24 1:24 UTC
To: Jens Axboe
Cc: Andrew Morton, Jan Kara, Peter Zijlstra, Theodore Ts'o, Dave Chinner,
    Chris Mason, Christoph Hellwig, linux-fsdevel@vger.kernel.org, LKML

On Wed, Sep 23, 2009 at 11:05:10PM +0800, Jens Axboe wrote:
> On Wed, Sep 23 2009, Wu Fengguang wrote:
> > Treat bdi_start_writeback(0) as a special request to do background
> > writeback, and stop such work when we drop below the background dirty
> > threshold.
> >
> > Also simplify the (nr_pages <= 0) checks. Since we already pass in
> > nr_pages=LONG_MAX for WB_SYNC_ALL and background writes, we don't
> > need to worry about it being decreased to zero.
>
> Good cleanup! Do you want me to collect these fixes, or how shall we
> handle this?

Thanks. I'm feeling OK with these patches (except the last one), as
well as Jan's busy-loop fixing patch. Can you pull them into your
tree?

Thanks,
Fengguang
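Patch 2/6 itself is not quoted in this thread, so as a rough,
non-authoritative illustration of the rule its description states --
keep doing background writeback only while the system is above the
background dirty threshold, then stop -- here is a hedged user-space
sketch. The page counts, chunk size, and helper name are all invented
for illustration:

#include <stdio.h>

static unsigned long nr_dirty = 60000;          /* invented page count */
static unsigned long background_thresh = 50000; /* invented threshold */

/* Invented helper: are we still above the background threshold? */
static int over_bground_thresh(void)
{
	return nr_dirty >= background_thresh;
}

int main(void)
{
	unsigned long chunk = 1024;	/* pages written per iteration */

	/* Background writeback proceeds in chunks and stops as soon as
	 * we drop below background_thresh, per the patch description. */
	while (over_bground_thresh()) {
		nr_dirty -= chunk;
		printf("wrote %lu pages, %lu still dirty\n", chunk, nr_dirty);
	}
	printf("below background_thresh, stopping background writeback\n");
	return 0;
}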
* Re: [PATCH 5/6] writeback: don't delay inodes redirtied by a fast dirtier

From: Wu Fengguang @ 2009-09-23 13:20 UTC
To: Andrew Morton, Jens Axboe
Cc: Jan Kara, Theodore Tso, Dave Chinner, Chris Mason, Christoph Hellwig,
    Peter Zijlstra, linux-fsdevel@vger.kernel.org, LKML

On Wed, Sep 23, 2009 at 08:33:43PM +0800, Wu Fengguang wrote:
> Debug traces show that in per-bdi writeback, the inode under writeback
> almost always gets redirtied by a busy dirtier. We used to call
> redirty_tail() in this case, which could delay the inode for up to 30s.
>
> This is unacceptable because it now happens so frequently for plain
> cp/dd that the accumulated delays could make writeback of big files
> very slow.
>
> So let's distinguish between data redirty and metadata-only redirty.
> The first one is caused by a busy dirtier, while the latter one could
> happen in XFS, NFS, etc. when they are doing delalloc or updating
> isize.
>
> The inode being busily dirtied will now be requeued for the next io,
> while the inode being redirtied by the fs will continue to be delayed,
> to avoid repeated IO.

Here are some test results on XFS. The workload is a simple

	cp /dev/zero /mnt/xfs/

I saw many repeated

[  344.043711] mm/page-writeback.c +540 balance_dirty_pages(): comm=cp pid=3520 n=12
[  344.043711] global dirty=57051 writeback=12920 nfs=0 flags=CM towrite=0 skipped=0
[  344.043711] redirty_tail() +516: inode=128    => the directory inode
[  344.043711] redirty_tail() +516: inode=131    => the zero file being written to

and then repeated

[  347.408629] fs/fs-writeback.c +813 wb_writeback(): comm=flush-8:0 pid=3298 n=4096
[  347.411045] global dirty=50397 writeback=18065 nfs=0 flags=CM towrite=0 skipped=0
[  347.422077] requeue_io() +468: inode=131
[  347.423809] redirty_tail() +516: inode=128

traces during the copy, and repeated

[  373.326496] redirty_tail() +516: inode=131
[  373.328209] fs/fs-writeback.c +813 wb_writeback(): comm=flush-8:0 pid=3298 n=4096
[  373.330988] global dirty=25213 writeback=20470 nfs=0 flags=CM towrite=0 skipped=0

after the copy is interrupted.

I noticed that

- the write chunk size of balance_dirty_pages() is 12, which is pretty
  small and inefficient;
- during the copy, the inode sometimes goes through redirty_tail() (old
  behavior) and sometimes through requeue_io() (new behavior);
- during the copy, the directory inode will always be synced and then
  put through redirty_tail();
- after the copy, the inode will be redirtied after sync.

It shall not be a problem to use requeue_io for XFS, because whether
it be requeue_io or redirty_tail, write_inode() will be called once
for every 4MB.

It would be inefficient if XFS really tries to write the inode's and
the directory inode's metadata every time it syncs 4MB of pages. If
that write attempt is turned into _real_ IO, that would be bad and
kill performance. Increasing MAX_WRITEBACK_PAGES may help reduce the
frequency of write_inode() though.

Thanks,
Fengguang

> CC: Jan Kara <jack@suse.cz>
> CC: Theodore Ts'o <tytso@mit.edu>
> CC: Dave Chinner <david@fromorbit.com>
> CC: Jens Axboe <jens.axboe@oracle.com>
> CC: Chris Mason <chris.mason@oracle.com>
> CC: Christoph Hellwig <hch@infradead.org>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> ---
>  fs/fs-writeback.c |   17 +++++++++++++----
>  1 file changed, 13 insertions(+), 4 deletions(-)
>
> --- linux.orig/fs/fs-writeback.c	2009-09-23 16:13:41.000000000 +0800
> +++ linux/fs/fs-writeback.c	2009-09-23 16:21:24.000000000 +0800
> @@ -491,10 +493,15 @@ writeback_single_inode(struct inode *ino
>  	spin_lock(&inode_lock);
>  	inode->i_state &= ~I_SYNC;
>  	if (!(inode->i_state & (I_FREEING | I_CLEAR))) {
> -		if (inode->i_state & I_DIRTY) {
> +		if (inode->i_state & I_DIRTY_PAGES) {
>  			/*
> -			 * Someone redirtied the inode while were writing back
> -			 * the pages.
> +			 * More pages get dirtied by a fast dirtier.
> +			 */
> +			goto select_queue;
> +		} else if (inode->i_state & I_DIRTY) {
> +			/*
> +			 * At least XFS will redirty the inode during the
> +			 * writeback (delalloc) and on io completion (isize).
>  			 */
>  			redirty_tail(inode);
>  		} else if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
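The branching this patch introduces can be summarized with a small
user-space sketch. The flag bits and the helper below are illustrative
stand-ins, not the kernel's actual definitions:

#include <stdio.h>

/* Illustrative stand-ins -- not the kernel's real i_state bits. */
#define I_DIRTY_PAGES	(1 << 0)	/* data pages were redirtied */
#define I_DIRTY_SYNC	(1 << 1)	/* inode metadata is dirty */
#define I_DIRTY		(I_DIRTY_PAGES | I_DIRTY_SYNC)

static const char *queue_for(unsigned int i_state)
{
	if (i_state & I_DIRTY_PAGES)	/* busy dirtier: retry soon */
		return "requeue_io";
	if (i_state & I_DIRTY)		/* metadata-only (delalloc/isize) */
		return "redirty_tail";
	return "clean";
}

int main(void)
{
	printf("data redirty     -> %s\n", queue_for(I_DIRTY_PAGES));
	printf("metadata redirty -> %s\n", queue_for(I_DIRTY_SYNC));
	printf("no redirty       -> %s\n", queue_for(0));
	return 0;
}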
* Re: [PATCH 5/6] writeback: don't delay inodes redirtied by a fast dirtier

From: Christoph Hellwig @ 2009-09-23 13:23 UTC
To: Wu Fengguang
Cc: Andrew Morton, Jens Axboe, Jan Kara, Theodore Tso, Dave Chinner,
    Chris Mason, Christoph Hellwig, Peter Zijlstra,
    linux-fsdevel@vger.kernel.org, LKML

On Wed, Sep 23, 2009 at 09:20:08PM +0800, Wu Fengguang wrote:
> I noticed that
>
> - the write chunk size of balance_dirty_pages() is 12, which is pretty
>   small and inefficient;
> - during the copy, the inode sometimes goes through redirty_tail() (old
>   behavior) and sometimes through requeue_io() (new behavior);
> - during the copy, the directory inode will always be synced and then
>   put through redirty_tail();
> - after the copy, the inode will be redirtied after sync.

Yeah, XFS uses generic_file_buffered_write, and the heuristics in
there for balance_dirty_pages turned out to be really bad. So far we
didn't manage to successfully get that fixed, though.

> It shall not be a problem to use requeue_io for XFS, because whether
> it be requeue_io or redirty_tail, write_inode() will be called once
> for every 4MB.
>
> It would be inefficient if XFS really tries to write the inode's and
> the directory inode's metadata every time it syncs 4MB of pages. If
> that write attempt is turned into _real_ IO, that would be bad and
> kill performance. Increasing MAX_WRITEBACK_PAGES may help reduce the
> frequency of write_inode() though.

The way we call write_inode is extremely inefficient for XFS. As you
noticed, XFS tends to redirty the inode on I/O completion, and we also
cluster inode writeouts. For XFS we'd really prefer not to intermix
data and inode writeout, but first do the data writeout and then later
push out the inodes, preferably with as many inodes as possible swept
out in one go.
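As a rough sketch of the ordering described above -- push out all the
data first, then sweep the dirty inodes in one go instead of calling
write_inode() once per 4MB data chunk -- consider the following
user-space simulation. Every name in it is invented for illustration
and nothing corresponds to an actual kernel interface:

#include <stdio.h>

struct sim_inode {
	int ino;
	int dirty_pages;
	int meta_dirty;
};

/* Phase 1: write data for one inode; the metadata gets redirtied on
 * "I/O completion", modelling the XFS isize update mentioned above. */
static void write_data(struct sim_inode *i)
{
	printf("inode %d: wrote %d data pages\n", i->ino, i->dirty_pages);
	i->dirty_pages = 0;
	i->meta_dirty = 1;
}

/* Phase 2: sweep out all dirty inode metadata in a single pass. */
static void write_inodes(struct sim_inode *inodes, int n)
{
	for (int k = 0; k < n; k++) {
		if (inodes[k].meta_dirty) {
			printf("inode %d: metadata written\n", inodes[k].ino);
			inodes[k].meta_dirty = 0;
		}
	}
}

int main(void)
{
	struct sim_inode inodes[] = { { 128, 2048, 0 }, { 131, 4096, 0 } };

	for (int k = 0; k < 2; k++)
		write_data(&inodes[k]);
	write_inodes(inodes, 2);
	return 0;
}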
* Re: [PATCH 5/6] writeback: don't delay inodes redirtied by a fast dirtier

From: Wu Fengguang @ 2009-09-23 13:40 UTC
To: Christoph Hellwig
Cc: Andrew Morton, Jens Axboe, Jan Kara, Theodore Tso, Dave Chinner,
    Chris Mason, Peter Zijlstra, linux-fsdevel@vger.kernel.org, LKML

On Wed, Sep 23, 2009 at 09:23:51PM +0800, Christoph Hellwig wrote:
> Yeah, XFS uses generic_file_buffered_write, and the heuristics in
> there for balance_dirty_pages turned out to be really bad. So far we
> didn't manage to successfully get that fixed, though.

Ah, sorry. It's because of the first patch: it does not always "bump
up" the write chunk. In your case it is obviously decreased (the
original ratelimit_pages=4096 is a much larger value). I'll fix it.

> The way we call write_inode is extremely inefficient for XFS. As you
> noticed, XFS tends to redirty the inode on I/O completion, and we also
> cluster inode writeouts. For XFS we'd really prefer not to intermix
> data and inode writeout, but first do the data writeout and then later
> push out the inodes, preferably with as many inodes as possible swept
> out in one go.

I guess the difficult part would be the possible policy requirements
on the max batch size (the max number of inodes or pages to write
before switching to writing metadata) and on the delay time (between
the sync of data and metadata). It may take a long time to make a
full scan of the dirty list.

Thanks,
Fengguang
* Re: [PATCH 5/6] writeback: don't delay inodes redirtied by a fast dirtier

From: Christoph Hellwig @ 2009-09-26 19:47 UTC
To: Wu Fengguang
Cc: Andrew Morton, Jens Axboe, Jan Kara, Theodore Tso, Dave Chinner,
    Chris Mason, Christoph Hellwig, Peter Zijlstra, linux-fsdevel, LKML

On Wed, Sep 23, 2009 at 08:33:43PM +0800, Wu Fengguang wrote:
> So let's distinguish between data redirty and metadata-only redirty.
> The first one is caused by a busy dirtier, while the latter one could
> happen in XFS, NFS, etc. when they are doing delalloc or updating
> isize.

Btw, I'm not sure the existing and preserved behaviour for that case
is good. In the worst case the inode writeout gets delayed by another
30 seconds, doubling the window until a file is on disk if it was
extended.
* Re: [PATCH 5/6] writeback: don't delay inodes redirtied by a fast dirtier

From: Wu Fengguang @ 2009-09-27 2:02 UTC
To: Christoph Hellwig
Cc: Andrew Morton, Jens Axboe, Jan Kara, Theodore Tso, Dave Chinner,
    Chris Mason, Peter Zijlstra, linux-fsdevel@vger.kernel.org, LKML

On Sun, Sep 27, 2009 at 03:47:47AM +0800, Christoph Hellwig wrote:
> On Wed, Sep 23, 2009 at 08:33:43PM +0800, Wu Fengguang wrote:
> > So let's distinguish between data redirty and metadata-only redirty.
> > The first one is caused by a busy dirtier, while the latter one could
> > happen in XFS, NFS, etc. when they are doing delalloc or updating
> > isize.
>
> Btw, I'm not sure the existing and preserved behaviour for that case
> is good. In the worst case the inode writeout gets delayed by another
> 30 seconds, doubling the window until a file is on disk if it was
> extended.

Yes, the preserved behaviour is not optimal for XFS, but safe.

We could try redirty_tail only when there are no remaining dirty
pages and there is metadata-only dirtiness. Like this:

---
 fs/fs-writeback.c |   20 +++++++-------------
 1 file changed, 7 insertions(+), 13 deletions(-)

--- linux.orig/fs/fs-writeback.c	2009-09-27 09:52:15.000000000 +0800
+++ linux/fs/fs-writeback.c	2009-09-27 09:54:23.000000000 +0800
@@ -477,18 +477,7 @@ writeback_single_inode(struct inode *ino
 	spin_lock(&inode_lock);
 	inode->i_state &= ~I_SYNC;
 	if (!(inode->i_state & (I_FREEING | I_CLEAR))) {
-		if ((inode->i_state & I_DIRTY_PAGES) && wbc->for_kupdate) {
-			/*
-			 * More pages get dirtied by a fast dirtier.
-			 */
-			goto select_queue;
-		} else if (inode->i_state & I_DIRTY) {
-			/*
-			 * At least XFS will redirty the inode during the
-			 * writeback (delalloc) and on io completion (isize).
-			 */
-			redirty_tail(inode);
-		} else if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
+		if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
 			/*
 			 * We didn't write back all the pages.  nfs_writepages()
 			 * sometimes bales out without doing anything. Redirty
@@ -510,7 +499,6 @@ writeback_single_inode(struct inode *ino
 			 * soon as the queue becomes uncongested.
 			 */
 			inode->i_state |= I_DIRTY_PAGES;
-select_queue:
 			if (wbc->nr_to_write <= 0) {
 				/*
 				 * slice used up: queue for next turn
@@ -533,6 +521,12 @@ select_queue:
 				inode->i_state |= I_DIRTY_PAGES;
 				redirty_tail(inode);
 			}
+		} else if (inode->i_state & I_DIRTY) {
+			/*
+			 * At least XFS will redirty the inode during the
+			 * writeback (delalloc) and on io completion (isize).
+			 */
+			redirty_tail(inode);
 		} else if (atomic_read(&inode->i_count)) {
 			/*
 			 * The inode is clean, inuse