From: Fengguang Wu <wfg@mail.ustc.edu.cn>
To: Andrew Morton <akpm@osdl.org>
Cc: linux-kernel@vger.kernel.org, David Chinner <dgc@sgi.com>,
	Ken Chen <kenchen@google.com>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Michael Rubin <mrubin@google.com>
Subject: [PATCH 5/5] writeback: introduce writeback_control.more_io to indicate more io
Date: Tue, 02 Oct 2007 16:41:48 +0800
Message-ID: <391315778.00442@ustc.edu.cn>
Message-ID: <20071002090254.987182999@mail.ustc.edu.cn>
In-Reply-To: 20071002084143.110486039@mail.ustc.edu.cn

[-- Attachment #1: writeback-more-data.patch --]
[-- Type: text/plain, Size: 3892 bytes --]

After dirtying a 100M file, the normal behavior is to
start writeback of all the data after a 30s delay. But
sometimes the following happens instead:

	- after 30s:    ~4M
	- after 5s:     ~4M
	- after 5s:     all remaining 92M

Some analysis shows that the internal io dispatch queues go like this:

		s_io            s_more_io
		-------------------------
	1)	100M,1K         0
	2)	1K              96M
	3)	0               96M

1) initial state with a 100M file and a 1K file
2) 4M written, nr_to_write <= 0, so write more
3) 1K written, nr_to_write > 0, no more writes (BUG)
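
The three states above can be reproduced with a small userspace model. This
is only a sketch, not kernel code: struct toy_sb, struct toy_queue,
toy_scan_s_io() and TOY_BUDGET_KB are hypothetical names, sizes are counted
in KB instead of pages, and the per-scan budget is assumed to be about 4M.

```c
#define TOY_BUDGET_KB 4096L	/* assumed size of one writeback batch (~4M) */
#define TOY_MAX 4		/* toy queue capacity, no bounds checking */

struct toy_queue {
	long kb[TOY_MAX];	/* dirty KB left for each queued inode */
	int n;			/* number of queued inodes */
};

struct toy_sb {
	struct toy_queue s_io;		/* inodes to be scanned now */
	struct toy_queue s_more_io;	/* inodes parked with data left */
};

/* One scan of s_io; returns the nr_to_write that is left over, in KB. */
static long toy_scan_s_io(struct toy_sb *sb, long nr_to_write)
{
	int i = 0, kept = 0;

	while (i < sb->s_io.n) {
		long dirty = sb->s_io.kb[i++];
		long written = dirty < nr_to_write ? dirty : nr_to_write;

		nr_to_write -= written;
		if (dirty > written)	/* partially written: park it */
			sb->s_more_io.kb[sb->s_more_io.n++] = dirty - written;
		if (nr_to_write <= 0)
			break;
	}
	/* Inodes not reached by this scan stay on s_io. */
	while (i < sb->s_io.n)
		sb->s_io.kb[kept++] = sb->s_io.kb[i++];
	sb->s_io.n = kept;
	return nr_to_write;
}
```

Starting from s_io = {102400, 1}, the first scan returns 0 and leaves the 1K
inode on s_io with 98304 KB (96M) parked on s_more_io. The second scan
returns 4095 with s_io empty, while s_more_io still holds the 96M.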

nr_to_write > 0 in (3) fools the upper layer into thinking that all data has
been written out. The big dirty file is actually still sitting in s_more_io. We
cannot simply splice s_more_io back to s_io as soon as s_io becomes empty, and
let the loop in generic_sync_sb_inodes() continue: this may starve newly
expired inodes in s_dirty.  It is also not an option to draw inodes from both
s_more_io and s_dirty, and let the loop go on: this might lead to livelocks,
and might also starve other superblocks in sync time (well, kupdate may still
starve some superblocks, that's another bug).

We have to return when a full scan of s_io completes. So nr_to_write > 0 does
not necessarily mean that "all data are written". This patch introduces a flag
writeback_control.more_io to indicate this situation. With it the big dirty file
no longer has to wait for the next kupdate invocation 5s later.
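
The decision the caller makes after each batch can be summarized by the
userspace sketch below. It mirrors the wb_kupdate() hunk of this patch in
simplified form. struct toy_wbc, enum toy_action and toy_kupdate_next() are
hypothetical names used only for illustration, and the outer nr_to_write
accounting of wb_kupdate() is left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the writeback_control fields used here. */
struct toy_wbc {
	long nr_to_write;		/* budget left after the batch */
	bool encountered_congestion;	/* a queue was congested */
	bool more_io;			/* s_more_io was not empty */
};

enum toy_action {
	TOY_WRITE_MORE,		/* budget used up: start the next batch */
	TOY_WAIT_AND_RETRY,	/* kernel: congestion_wait(WRITE, HZ/10) */
	TOY_DONE,		/* all the old data is written */
};

static enum toy_action toy_kupdate_next(const struct toy_wbc *wbc)
{
	if (wbc->nr_to_write <= 0)
		return TOY_WRITE_MORE;
	/* Budget left over: only stop if nothing is parked or congested. */
	if (wbc->encountered_congestion || wbc->more_io)
		return TOY_WAIT_AND_RETRY;
	return TOY_DONE;
}
```

In state (3) above, nr_to_write > 0 with more_io set now gives
TOY_WAIT_AND_RETRY. Without the flag the same state would give TOY_DONE,
which is the bug described here.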

Cc: David Chinner <dgc@sgi.com>
Cc: Ken Chen <kenchen@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
 fs/fs-writeback.c         |    2 ++
 include/linux/writeback.h |    1 +
 mm/page-writeback.c       |    9 ++++++---
 3 files changed, 9 insertions(+), 3 deletions(-)

--- linux-2.6.23-rc8-mm2.orig/include/linux/writeback.h
+++ linux-2.6.23-rc8-mm2/include/linux/writeback.h
@@ -62,6 +62,7 @@ struct writeback_control {
 	unsigned for_reclaim:1;		/* Invoked from the page allocator */
 	unsigned for_writepages:1;	/* This is a writepages() call */
 	unsigned range_cyclic:1;	/* range_start is cyclic */
+	unsigned more_io:1;		/* more io to be dispatched */
 	void *fs_private;               /* For use by ->writepages() */
 };
 
--- linux-2.6.23-rc8-mm2.orig/fs/fs-writeback.c
+++ linux-2.6.23-rc8-mm2/fs/fs-writeback.c
@@ -487,6 +487,8 @@ int generic_sync_sb_inodes(struct super_
 		if (wbc->nr_to_write <= 0)
 			break;
 	}
+	if (!list_empty(&sb->s_more_io))
+		wbc->more_io = 1;
 
 	spin_unlock(&inode_lock);
 	return ret;		/* Leave any unwritten inodes on s_io */
--- linux-2.6.23-rc8-mm2.orig/mm/page-writeback.c
+++ linux-2.6.23-rc8-mm2/mm/page-writeback.c
@@ -553,6 +553,7 @@ static void background_writeout(unsigned
 			global_page_state(NR_UNSTABLE_NFS) < background_thresh
 				&& min_pages <= 0)
 			break;
+		wbc.more_io = 0;
 		wbc.encountered_congestion = 0;
 		wbc.nr_to_write = MAX_WRITEBACK_PAGES;
 		wbc.pages_skipped = 0;
@@ -560,8 +561,9 @@ static void background_writeout(unsigned
 		min_pages -= MAX_WRITEBACK_PAGES - wbc.nr_to_write;
 		if (wbc.nr_to_write > 0 || wbc.pages_skipped > 0) {
 			/* Wrote less than expected */
-			congestion_wait(WRITE, HZ/10);
-			if (!wbc.encountered_congestion)
+			if (wbc.encountered_congestion || wbc.more_io)
+				congestion_wait(WRITE, HZ/10);
+			else
 				break;
 		}
 	}
@@ -626,11 +628,12 @@ static void wb_kupdate(unsigned long arg
 			global_page_state(NR_UNSTABLE_NFS) +
 			(inodes_stat.nr_inodes - inodes_stat.nr_unused);
 	while (nr_to_write > 0) {
+		wbc.more_io = 0;
 		wbc.encountered_congestion = 0;
 		wbc.nr_to_write = MAX_WRITEBACK_PAGES;
 		writeback_inodes(&wbc);
 		if (wbc.nr_to_write > 0) {
-			if (wbc.encountered_congestion)
+			if (wbc.encountered_congestion || wbc.more_io)
 				congestion_wait(WRITE, HZ/10);
 			else
 				break;	/* All the old data is written */

-- 
