public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Fengguang Wu <wfg@mail.ustc.edu.cn>
To: Andrew Morton <akpm@osdl.org>
Cc: Ken Chen <kenchen@google.com>, Andrew Morton <akpm@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Subject: [PATCH 1/6] writeback: fix time ordering of the per superblock inode lists 8
Date: Sun, 19 Aug 2007 14:53:50 +0800	[thread overview]
Message-ID: <387507684.32621@ustc.edu.cn> (raw)
Message-ID: <20070819071444.499427465@mail.ustc.edu.cn> (raw)
In-Reply-To: 20070819065349.160284305@mail.ustc.edu.cn

[-- Attachment #1: inode-dirty-time-ordering-fix.patch --]
[-- Type: text/plain, Size: 5497 bytes --]

Streamline the management of dirty inode lists and fix time ordering bugs.

The writeback logic used to moving not-yet-expired dirty inodes from s_dirty to
s_io, *only to* move them back. The move-inodes-back-and-forth thing is a mess,
which is eliminated by this patch.

The new scheme is:
- s_dirty acts as a time ordered io delaying queue;
- s_io/s_more_io together acts as an io dispatching queue.

On kupdate writeback, we pull some inodes from s_dirty to s_io at the start of
every full scan of s_io.  Otherwise(i.e. for sync/throttle/background
writeback), we always pull from s_dirty on each run(a partial scan).

Note that the line
	list_splice_init(&sb->s_more_io, &sb->s_io);
is moved to queue_io() to leave s_io empty. Otherwise a big dirtied file will
sit in s_io for a long time, preventing new expired inodes to get in.

Cc: Ken Chen <kenchen@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
 fs/fs-writeback.c |   60 +++++++++++++++++++++++++++-----------------
 1 file changed, 38 insertions(+), 22 deletions(-)

--- linux-2.6.23-rc2-mm2.orig/fs/fs-writeback.c
+++ linux-2.6.23-rc2-mm2/fs/fs-writeback.c
@@ -118,7 +118,7 @@ void __mark_inode_dirty(struct inode *in
 			goto out;
 
 		/*
-		 * If the inode was already on s_dirty or s_io, don't
+		 * If the inode was already on s_dirty/s_io/s_more_io, don't
 		 * reposition it (that would break s_dirty time-ordering).
 		 */
 		if (!was_dirty) {
@@ -172,6 +172,33 @@ static void requeue_io(struct inode *ino
 }
 
 /*
+ * Move expired dirty inodes from @delaying_queue to @dispatch_queue.
+ */
+static void move_expired_inodes(struct list_head *delaying_queue,
+			       struct list_head *dispatch_queue,
+				unsigned long *older_than_this)
+{
+	while (!list_empty(delaying_queue)) {
+		struct inode *inode = list_entry(delaying_queue->prev,
+						struct inode, i_list);
+		if (older_than_this &&
+			time_after(inode->dirtied_when, *older_than_this))
+			break;
+		list_move(&inode->i_list, dispatch_queue);
+	}
+}
+
+/*
+ * Queue all expired dirty inodes for io, eldest first.
+ */
+static void queue_io(struct super_block *sb,
+				unsigned long *older_than_this)
+{
+	list_splice_init(&sb->s_more_io, sb->s_io.prev);
+	move_expired_inodes(&sb->s_dirty, &sb->s_io, older_than_this);
+}
+
+/*
  * Write a single inode's dirty pages and inode data out to disk.
  * If `wait' is set, wait on the writeout.
  *
@@ -221,7 +248,7 @@ __sync_single_inode(struct inode *inode,
 			/*
 			 * We didn't write back all the pages.  nfs_writepages()
 			 * sometimes bales out without doing anything. Redirty
-			 * the inode.  It is moved from s_io onto s_dirty.
+			 * the inode; Move it from s_io onto s_more_io/s_dirty.
 			 */
 			/*
 			 * akpm: if the caller was the kupdate function we put
@@ -234,10 +261,9 @@ __sync_single_inode(struct inode *inode,
 			 */
 			if (wbc->for_kupdate) {
 				/*
-				 * For the kupdate function we leave the inode
-				 * at the head of sb_dirty so it will get more
-				 * writeout as soon as the queue becomes
-				 * uncongested.
+				 * For the kupdate function we move the inode
+				 * to s_more_io so it will get more writeout as
+				 * soon as the queue becomes uncongested.
 				 */
 				inode->i_state |= I_DIRTY_PAGES;
 				requeue_io(inode);
@@ -295,10 +321,10 @@ __writeback_single_inode(struct inode *i
 
 		/*
 		 * We're skipping this inode because it's locked, and we're not
-		 * doing writeback-for-data-integrity.  Move it to the head of
-		 * s_dirty so that writeback can proceed with the other inodes
-		 * on s_io.  We'll have another go at writing back this inode
-		 * when the s_dirty iodes get moved back onto s_io.
+		 * doing writeback-for-data-integrity.  Move it to s_more_io so
+		 * that writeback can proceed with the other inodes on s_io.
+		 * We'll have another go at writing back this inode when we
+		 * completed a full scan of s_io.
 		 */
 		requeue_io(inode);
 
@@ -365,7 +391,7 @@ sync_sb_inodes(struct super_block *sb, s
 	const unsigned long start = jiffies;	/* livelock avoidance */
 
 	if (!wbc->for_kupdate || list_empty(&sb->s_io))
-		list_splice_init(&sb->s_dirty, &sb->s_io);
+		queue_io(sb, wbc->older_than_this);
 
 	while (!list_empty(&sb->s_io)) {
 		struct inode *inode = list_entry(sb->s_io.prev,
@@ -410,13 +436,6 @@ sync_sb_inodes(struct super_block *sb, s
 		if (time_after(inode->dirtied_when, start))
 			break;
 
-		/* Was this inode dirtied too recently? */
-		if (wbc->older_than_this && time_after(inode->dirtied_when,
-						*wbc->older_than_this)) {
-			list_splice_init(&sb->s_io, sb->s_dirty.prev);
-			break;
-		}
-
 		/* Is another pdflush already flushing this queue? */
 		if (current_is_pdflush() && !writeback_acquire(bdi))
 			break;
@@ -446,9 +465,6 @@ sync_sb_inodes(struct super_block *sb, s
 			break;
 	}
 
-	if (list_empty(&sb->s_io))
-		list_splice_init(&sb->s_more_io, &sb->s_io);
-
 	return;		/* Leave any unwritten inodes on s_io */
 }
 
@@ -458,7 +474,7 @@ sync_sb_inodes(struct super_block *sb, s
  * Note:
  * We don't need to grab a reference to superblock here. If it has non-empty
  * ->s_dirty it's hadn't been killed yet and kill_super() won't proceed
- * past sync_inodes_sb() until both the ->s_dirty and ->s_io lists are
+ * past sync_inodes_sb() until the ->s_dirty/s_io/s_more_io lists are all
  * empty. Since __sync_single_inode() regains inode_lock before it finally moves
  * inode from superblock lists we are OK.
  *

-- 

  parent reply	other threads:[~2007-08-19  7:16 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20070819065349.160284305@mail.ustc.edu.cn>
2007-08-19  6:53 ` [PATCH 0/6] dirty inode lists time delay/ordering fixes Fengguang Wu
     [not found] ` <20070819071444.499427465@mail.ustc.edu.cn>
2007-08-19  6:53   ` Fengguang Wu [this message]
     [not found] ` <20070819071444.629894251@mail.ustc.edu.cn>
2007-08-19  6:53   ` [PATCH 2/6] writeback: fix ntfs with sb_has_dirty_inodes() Fengguang Wu
     [not found] ` <20070819071444.757927082@mail.ustc.edu.cn>
2007-08-19  6:53   ` [PATCH 3/6] writeback: remove pages_skipped accounting in __block_write_full_page() Fengguang Wu
     [not found] ` <20070819071444.878093599@mail.ustc.edu.cn>
2007-08-19  6:53   ` [PATCH 4/6] writeback: introduce writeback_control.more_io to indicate more io Fengguang Wu
     [not found] ` <20070819071445.006427549@mail.ustc.edu.cn>
2007-08-19  6:53   ` [PATCH 5/6] check dirty inode list Fengguang Wu
     [not found] ` <20070819071445.137947640@mail.ustc.edu.cn>
2007-08-19  6:53   ` [PATCH 6/6] prevent time-ordering warnings Fengguang Wu
     [not found] <20070812091120.189651872@mail.ustc.edu.cn>
     [not found] ` <20070812092052.558804846@mail.ustc.edu.cn>
2007-08-12  9:11   ` [PATCH 1/6] writeback: fix time ordering of the per superblock inode lists 8 Fengguang Wu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=387507684.32621@ustc.edu.cn \
    --to=wfg@mail.ustc.edu.cn \
    --cc=akpm@linux-foundation.org \
    --cc=akpm@osdl.org \
    --cc=kenchen@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox