From: Fengguang Wu <wfg@mail.ustc.edu.cn>
To: Andrew Morton <akpm@osdl.org>
Cc: linux-kernel@vger.kernel.org, David Chinner <dgc@sgi.com>,
	Ken Chen <kenchen@google.com>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Michael Rubin <mrubin@google.com>
Subject: [PATCH 5/5] writeback: introduce writeback_control.more_io to indicate more io
Date: Tue, 02 Oct 2007 16:41:48 +0800
Message-ID: <391315778.00442@ustc.edu.cn>
Message-ID: <20071002090254.987182999@mail.ustc.edu.cn>
In-Reply-To: 20071002084143.110486039@mail.ustc.edu.cn

[-- Attachment #1: writeback-more-data.patch --]
[-- Type: text/plain, Size: 3892 bytes --]

After dirtying a 100M file, the normal behavior is to
start writeback of all the data after a 30s delay. But
sometimes the following happens instead:

	- after 30s:    ~4M
	- after 5s:     ~4M
	- after 5s:     all remaining 92M

Some analysis shows that the internal io dispatch queues go like this:

		s_io            s_more_io
		-------------------------
	1)	100M,1K         0
	2)	1K              96M
	3)	0               96M

1) initial state with a 100M file and a 1K file
2) 4M written, nr_to_write <= 0, so write more
3) 1K written, nr_to_write > 0, no more writes (BUG)
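
The three states above can be reproduced with a small userspace toy model.
This is only a sketch under stated assumptions, not the kernel code: the
names toy_sb, toy_wbc, toy_sync_sb and TOY_BUDGET_KB are made up for
illustration, sizes are counted in KB instead of pages, and the lists are
plain arrays. It only models "write up to the budget, park the rest on
s_more_io", plus the check added by this patch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_BUDGET_KB  4096L   /* assumed per-pass budget: 4M, in KB */
#define TOY_MAX_INODES 8

struct toy_sb {
	long   s_io[TOY_MAX_INODES];       /* dirty KB of each inode queued for this scan */
	size_t n_io;
	long   s_more_io[TOY_MAX_INODES];  /* dirty KB of each inode parked for a later scan */
	size_t n_more_io;
};

struct toy_wbc {
	long nr_to_write;   /* remaining budget, in KB */
	int  more_io;       /* set when s_more_io is non-empty on return */
};

/* One scan of s_io: write each inode up to the remaining budget. */
static void toy_sync_sb(struct toy_sb *sb, struct toy_wbc *wbc)
{
	size_t i = 0, k, left;

	while (i < sb->n_io) {
		long dirty = sb->s_io[i];
		long chunk = dirty < wbc->nr_to_write ? dirty : wbc->nr_to_write;

		wbc->nr_to_write -= chunk;
		if (dirty - chunk > 0) {
			/* Partially written: park the rest on s_more_io. */
			assert(sb->n_more_io < TOY_MAX_INODES);
			sb->s_more_io[sb->n_more_io++] = dirty - chunk;
		}
		i++;
		if (wbc->nr_to_write <= 0)
			break;
	}

	/* Drop the inodes visited in this scan from s_io. */
	left = sb->n_io - i;
	for (k = 0; k < left; k++)
		sb->s_io[k] = sb->s_io[i + k];
	sb->n_io = left;

	/* The check this patch adds (modelled, not the real list_empty()). */
	if (sb->n_more_io > 0)
		wbc->more_io = 1;
}
```

Starting from s_io = {102400, 1} (in KB) and calling toy_sync_sb() twice
with a fresh TOY_BUDGET_KB each time walks through states (2) and (3): the
second call returns with nr_to_write > 0 while 96M is still parked on
s_more_io, and more_io is the only thing that tells the caller so.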

nr_to_write > 0 in (3) fools the upper layer into thinking that all the data has
been written out, while the big dirty file is actually still sitting in
s_more_io. We cannot simply splice s_more_io back to s_io as soon as s_io
becomes empty and let the loop in generic_sync_sb_inodes() continue: this may
starve newly expired inodes in s_dirty. Nor is it an option to draw inodes from
both s_more_io and s_dirty and let the loop go on: this might lead to
livelocks, and might also starve other superblocks at sync time (well, kupdate
may still starve some superblocks, but that's another bug).

We have to return when a full scan of s_io completes, so nr_to_write > 0 does
not necessarily mean that "all data has been written". This patch introduces
the flag writeback_control.more_io to indicate that situation. With it, the big
dirty file no longer has to wait for the next kupdate invocation 5s later.
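
The caller-side decision after one writeback pass can be summarized as a
small pure function. This is a hedged sketch that mirrors the wb_kupdate()
hunk of this patch. The names toy_pass_result, toy_action and
toy_kupdate_decide are hypothetical and exist only for illustration; the
real code calls congestion_wait() or breaks out of the loop in place.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* What one pass reports back (a stand-in for the writeback_control fields). */
struct toy_pass_result {
	long nr_to_write;             /* budget left over after the pass */
	bool encountered_congestion;
	bool more_io;                 /* inodes are still parked on s_more_io */
};

enum toy_action {
	TOY_NEXT_PASS,       /* budget used up: start another pass right away */
	TOY_WAIT_AND_RETRY,  /* wait briefly (congestion_wait in the kernel), then retry */
	TOY_STOP             /* all the old data is written */
};

static enum toy_action toy_kupdate_decide(const struct toy_pass_result *res)
{
	assert(res != NULL);

	if (res->nr_to_write <= 0)
		return TOY_NEXT_PASS;
	/* Leftover budget alone no longer means "done": also check more_io. */
	if (res->encountered_congestion || res->more_io)
		return TOY_WAIT_AND_RETRY;
	return TOY_STOP;
}
```

Without the more_io test, a pass that ends with leftover budget and no
congestion would map to TOY_STOP even though s_more_io is non-empty, which
is the situation in state (3).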

Cc: David Chinner <dgc@sgi.com>
Cc: Ken Chen <kenchen@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
 fs/fs-writeback.c         |    2 ++
 include/linux/writeback.h |    1 +
 mm/page-writeback.c       |    9 ++++++---
 3 files changed, 9 insertions(+), 3 deletions(-)

--- linux-2.6.23-rc8-mm2.orig/include/linux/writeback.h
+++ linux-2.6.23-rc8-mm2/include/linux/writeback.h
@@ -62,6 +62,7 @@ struct writeback_control {
 	unsigned for_reclaim:1;		/* Invoked from the page allocator */
 	unsigned for_writepages:1;	/* This is a writepages() call */
 	unsigned range_cyclic:1;	/* range_start is cyclic */
+	unsigned more_io:1;		/* more io to be dispatched */
 	void *fs_private;               /* For use by ->writepages() */
 };
 
--- linux-2.6.23-rc8-mm2.orig/fs/fs-writeback.c
+++ linux-2.6.23-rc8-mm2/fs/fs-writeback.c
@@ -487,6 +487,8 @@ int generic_sync_sb_inodes(struct super_
 		if (wbc->nr_to_write <= 0)
 			break;
 	}
+	if (!list_empty(&sb->s_more_io))
+		wbc->more_io = 1;
 
 	spin_unlock(&inode_lock);
 	return ret;		/* Leave any unwritten inodes on s_io */
--- linux-2.6.23-rc8-mm2.orig/mm/page-writeback.c
+++ linux-2.6.23-rc8-mm2/mm/page-writeback.c
@@ -553,6 +553,7 @@ static void background_writeout(unsigned
 			global_page_state(NR_UNSTABLE_NFS) < background_thresh
 				&& min_pages <= 0)
 			break;
+		wbc.more_io = 0;
 		wbc.encountered_congestion = 0;
 		wbc.nr_to_write = MAX_WRITEBACK_PAGES;
 		wbc.pages_skipped = 0;
@@ -560,8 +561,9 @@ static void background_writeout(unsigned
 		min_pages -= MAX_WRITEBACK_PAGES - wbc.nr_to_write;
 		if (wbc.nr_to_write > 0 || wbc.pages_skipped > 0) {
 			/* Wrote less than expected */
-			congestion_wait(WRITE, HZ/10);
-			if (!wbc.encountered_congestion)
+			if (wbc.encountered_congestion || wbc.more_io)
+				congestion_wait(WRITE, HZ/10);
+			else
 				break;
 		}
 	}
@@ -626,11 +628,12 @@ static void wb_kupdate(unsigned long arg
 			global_page_state(NR_UNSTABLE_NFS) +
 			(inodes_stat.nr_inodes - inodes_stat.nr_unused);
 	while (nr_to_write > 0) {
+		wbc.more_io = 0;
 		wbc.encountered_congestion = 0;
 		wbc.nr_to_write = MAX_WRITEBACK_PAGES;
 		writeback_inodes(&wbc);
 		if (wbc.nr_to_write > 0) {
-			if (wbc.encountered_congestion)
+			if (wbc.encountered_congestion || wbc.more_io)
 				congestion_wait(WRITE, HZ/10);
 			else
 				break;	/* All the old data is written */

-- 
