public inbox for linux-kernel@vger.kernel.org
From: Wu Fengguang <fengguang.wu@intel.com>
To: Theodore Tso <tytso@mit.edu>, Jens Axboe <jens.axboe@oracle.com>,
	Christoph Hellwig <hch@infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"chris.mason@oracle.com" <chris.mason@oracle.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"jack@suse.cz" <jack@suse.cz>
Subject: Re: [PATCH 0/7] Per-bdi writeback flusher threads v20
Date: Sat, 19 Sep 2009 23:03:51 +0800
Message-ID: <20090919150351.GA19880@localhost>
In-Reply-To: <20090919042607.GA19752@localhost>

On Sat, Sep 19, 2009 at 12:26:07PM +0800, Wu Fengguang wrote:
> On Sat, Sep 19, 2009 at 12:00:51PM +0800, Wu Fengguang wrote:
> > On Sat, Sep 19, 2009 at 11:58:35AM +0800, Wu Fengguang wrote:
> > > On Sat, Sep 19, 2009 at 01:52:52AM +0800, Theodore Tso wrote:
> > > > On Fri, Sep 11, 2009 at 10:39:29PM +0800, Wu Fengguang wrote:
> > > > > 
> > > > > That would be good. Sorry for being late on this. I'll set aside
> > > > > some time in the middle of next week to help review and benchmark
> > > > > the recent writeback work, and hope to get things done in this
> > > > > merge window.
> > > > 
> > > > Did you have a chance to get more work done on your writeback
> > > > patches?
> > > 
> > > Sorry for the delay. I'm now testing the patches with the commands
> > > 
> > >  cp /dev/zero /mnt/test/zero0 &
> > >  dd if=/dev/zero of=/mnt/test/zero1 &
> > > 
> > > and the attached debug patch.
> > > 
> > > One problem I found with ext3/4 is that redirty_tail() is called
> > > repeatedly in the traces, which could slow down inode writeback
> > > significantly.
> > 
> > FYI, it's this redirty_tail() call in writeback_single_inode():
> > 
> >                         /*
> >                          * Someone redirtied the inode while were writing back
> >                          * the pages.
> >                          */
> >                         redirty_tail(inode);
> 
> Hmm, this looks like an old-fashioned problem that got blown up by
> the 128MB MAX_WRITEBACK_PAGES.
> 
> The inode was redirtied by the busy cp/dd processes. Now it takes much
> more time to sync 128MB, so that a heavy dirtier can easily redirty
> the inode in that time window.
> 
> A single invocation of redirty_tail() could hold up the writeback of
> the current inode for up to 30 seconds.

It seems that this patch helps. However, I'm afraid it's too late to
risk merging this kind of patch now.

Thanks,
Fengguang
---

writeback: don't delay redirtied inode by a fast dirtier

The large 128MB MAX_WRITEBACK_PAGES greatly increases the chance
for an inode to be dirtied by a fast dirtier during the writeback.

We used to call redirty_tail() in this case, which could delay inode
writeback for up to 30s. This is now unacceptable even for a simple
dd.

We still delay the inode in these cases:
- only inode metadata is dirtied (by the fs)
- the writeback_index wrapped around
  (to protect against fast dirtiers that do repeated overwrites)

CC: Jan Kara <jack@suse.cz>
CC: Theodore Ts'o <tytso@mit.edu>
CC: Dave Chinner <david@fromorbit.com>
CC: Jens Axboe <jens.axboe@oracle.com>
CC: Chris Mason <chris.mason@oracle.com>
CC: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 fs/fs-writeback.c |   18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

--- linux.orig/fs/fs-writeback.c	2009-09-19 18:09:50.000000000 +0800
+++ linux/fs/fs-writeback.c	2009-09-19 19:00:18.000000000 +0800
@@ -466,6 +466,7 @@ writeback_single_inode(struct inode *ino
 	long last_file_written;
 	long nr_to_write;
 	unsigned dirty;
+	pgoff_t writeback_index;
 	int ret;
 
 	if (!atomic_read(&inode->i_count))
@@ -508,6 +509,7 @@ writeback_single_inode(struct inode *ino
 		last_file_written = wbc->last_file_written;
 	wbc->nr_to_write -= last_file_written;
 	nr_to_write = wbc->nr_to_write;
+	writeback_index = mapping->writeback_index;
 
 	ret = do_writepages(mapping, wbc);
 
@@ -534,10 +536,15 @@ writeback_single_inode(struct inode *ino
 	spin_lock(&inode_lock);
 	inode->i_state &= ~I_SYNC;
 	if (!(inode->i_state & (I_FREEING | I_CLEAR))) {
-		if (inode->i_state & I_DIRTY) {
+		if (inode->i_state & I_DIRTY_PAGES) {
 			/*
-			 * Someone redirtied the inode while were writing back
-			 * the pages.
+			 * More pages get dirtied by a fast dirtier.
+			 */
+			goto select_queue;
+		} else if (inode->i_state & I_DIRTY) {
+			/*
+			 * At least XFS will redirty the inode during the
+			 * writeback (delalloc) and on io completion (isize).
 			 */
 			redirty_tail(inode);
 		} else if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
@@ -546,8 +553,10 @@ writeback_single_inode(struct inode *ino
 			 * sometimes bales out without doing anything.
 			 */
 			inode->i_state |= I_DIRTY_PAGES;
+select_queue:
 			if (wbc->encountered_congestion ||
-			    wbc->nr_to_write <= 0) {
+			    wbc->nr_to_write <= 0 ||
+			    writeback_index < mapping->writeback_index) {
 				/*
 				 * if slice used up, queue for next round;
 				 * otherwise continue this inode after return
@@ -556,6 +565,7 @@ writeback_single_inode(struct inode *ino
 			} else {
 				/*
 				 * somehow blocked: retry later
+				 * also protect against busy rewrites.
 				 */
 				redirty_tail(inode);
 			}


Thread overview: 42+ messages
2009-09-11  7:34 [PATCH 0/7] Per-bdi writeback flusher threads v20 Jens Axboe
2009-09-11  7:34 ` [PATCH 1/7] writeback: get rid of generic_sync_sb_inodes() export Jens Axboe
2009-09-11  7:34 ` [PATCH 2/7] writeback: move dirty inodes from super_block to backing_dev_info Jens Axboe
2009-09-11  7:34 ` [PATCH 3/7] writeback: switch to per-bdi threads for flushing data Jens Axboe
2009-09-11  7:34 ` [PATCH 4/7] writeback: get rid of pdflush completely Jens Axboe
2009-09-11  7:34 ` [PATCH 5/7] writeback: add some debug inode list counters to bdi stats Jens Axboe
2009-09-11  7:34 ` [PATCH 6/7] writeback: add name to backing_dev_info Jens Axboe
2009-09-11  7:34 ` [PATCH 7/7] writeback: check for registered bdi in flusher add and inode dirty Jens Axboe
2009-09-11 13:42 ` [PATCH 0/7] Per-bdi writeback flusher threads v20 Theodore Tso
2009-09-11 13:45   ` Chris Mason
2009-09-11 14:04     ` Jens Axboe
2009-09-11 14:16   ` Christoph Hellwig
2009-09-11 14:29     ` Jens Axboe
2009-09-11 14:39       ` Wu Fengguang
2009-09-18 17:52         ` Theodore Tso
2009-09-19  3:58           ` Wu Fengguang
2009-09-19  4:00             ` Wu Fengguang
2009-09-19  4:26               ` Wu Fengguang
2009-09-19 15:03                 ` Wu Fengguang [this message]
2009-09-20 19:00                   ` Jan Kara
2009-09-21  3:04                     ` Wu Fengguang
2009-09-21  5:35                       ` Wu Fengguang
2009-09-21  9:53                         ` Wu Fengguang
2009-09-21 10:02                           ` Jan Kara
2009-09-21 10:18                             ` Wu Fengguang
2009-09-21 12:42                       ` Jan Kara
2009-09-21 15:12                         ` Wu Fengguang
2009-09-21 16:08                           ` Jan Kara
2009-09-22  5:10                             ` Wu Fengguang
2009-09-21 13:53                 ` Chris Mason
2009-09-22 10:13                   ` Wu Fengguang
2009-09-22 11:30                     ` Chris Mason
2009-09-22 11:45                       ` Jan Kara
2009-09-22 12:47                         ` Wu Fengguang
2009-09-22 17:41                         ` Chris Mason
2009-09-22 13:18                       ` Wu Fengguang
2009-09-22 15:59                         ` Chris Mason
2009-09-23  1:05                           ` Wu Fengguang
2009-09-23 14:08                             ` Chris Mason
2009-09-24  1:32                               ` Wu Fengguang
2009-09-22 11:30                     ` Jan Kara
2009-09-22 13:33                       ` Wu Fengguang
